
MDB questions

To: openldap-technical@openldap.org
Subject: MDB questions
From: William Brown <william@blackhats.net.au>
Date: Thu, 03 May 2018 16:53:21 +1200

Hi there,

I have a few questions about MDB, and I have some things I'd like to
work on.

The docs make a few references to binary searching. It's not 100%
clear, but I assume this means a binary search of the keys within a
BTree node, not that MDB is a BST.
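Here is a minimal sketch of what a per-node binary search looks like. It assumes a simplified node that holds an array of sorted C strings. The `struct node` type and `node_search` function are hypothetical; real LMDB pages store keys differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified B-tree node: an array of sorted,
 * NUL-terminated keys. This is NOT LMDB's page layout. */
struct node {
    const char **keys;   /* sorted ascending by strcmp() */
    size_t nkeys;
};

/* Binary search within a single node. Returns the index of the first
 * key >= 'key' (nkeys means "greater than every key"); *exact is set
 * to 1 if that key equals 'key', otherwise 0. */
static size_t node_search(const struct node *n, const char *key, int *exact)
{
    size_t lo = 0, hi = n->nkeys;   /* half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(n->keys[mid], key) < 0)
            lo = mid + 1;           /* target lies strictly right of mid */
        else
            hi = mid;               /* keys[mid] >= key: keep as candidate */
    }
    *exact = (lo < n->nkeys && strcmp(n->keys[lo], key) == 0);
    return lo;
}
```

The returned index doubles as the insertion point when the key is absent, which is what an insert into the node would need.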

How does MDB provide crash resilience for the free pages?

According to the man page, free() should only be called on memory
returned by malloc(), but I see that you call free() on mmapped pages
in mdb_dpage_free. There must be something I'm missing here.

Anyway, I have two things I want to work on.

The simple one: when pages are moved from the txn free list to the
env free list (I hope that's correct), it would be good to call
madvise(MADV_REMOVE) on their data section.

The reason is that the madvise call lets supported filesystems punch
holes in the sparse file, so space is reclaimed without MDB needing to
worry about it!
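Here is a rough, Linux-only sketch of that call. `punch_free_range` is a hypothetical helper, not an LMDB function. Per the madvise(2) man page, MADV_REMOVE only works on mappings that are both shared and writable. As far as I know, LMDB maps the file read-only unless MDB_WRITEMAP is set, so a real patch might have to use fallocate(FALLOC_FL_PUNCH_HOLE) on the file descriptor instead.

```c
#define _GNU_SOURCE            /* expose MADV_REMOVE even under strict -std= modes */
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sketch: ask the kernel to free the backing storage for a freed byte
 * range [off, off + len) of a shared, writable mapping starting at
 * 'map' (assumed page-aligned, as mmap() returns). Hypothetical helper;
 * LMDB does not do this today. Returns 0 on success or an errno value,
 * e.g. EOPNOTSUPP if the filesystem cannot punch holes. */
static int punch_free_range(void *map, size_t off, size_t len)
{
    long pg = sysconf(_SC_PAGESIZE);
    if (pg <= 0)
        return EINVAL;
    uintptr_t mask = (uintptr_t)pg - 1;
    uintptr_t start = (uintptr_t)map + off;
    uintptr_t end = start + len;
    /* Only whole OS pages can be released: round start up, end down. */
    uintptr_t a = (start + mask) & ~mask;
    uintptr_t b = end & ~mask;
    if (a >= b)
        return 0;              /* no complete page inside the range */
    if (madvise((void *)a, b - a, MADV_REMOVE) != 0)
        return errno;
    return 0;
}
```

After a successful call, the released pages read back as zeros and the file keeps its size. A caller could treat EOPNOTSUPP as "no reclamation on this filesystem" and carry on.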

The much more invasive change I want to work on is page checksumming.
Basically, there are three cases I have in mind:

* No checksumming (today)
* Metadata checksumming only
* Metadata and data checksumming

These could be used in the following scenarios:

* write checksums but don't verify them at run time
* write checksums, and only verify metadata on read (possibly a good
default option)
* write checksums, and verify metadata and data on read (slowest, but
has strong integrity properties for some applications)
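One way to express the three checksum levels and the three read-time scenarios as configuration is sketched below. All enum and function names are hypothetical, and LMDB has no such options today. The idea is that the write level fixes what is stored on disk, and the verify level can be at most that.

```c
#include <stdbool.h>

/* Hypothetical: what gets checksummed when a page is written. */
enum csum_write  { CSUM_NONE,   CSUM_META,   CSUM_META_DATA };
/* Hypothetical: what gets checked when a page is read back. */
enum csum_verify { VERIFY_NONE, VERIFY_META, VERIFY_META_DATA };

struct csum_policy {
    enum csum_write  write;
    enum csum_verify verify;
};

/* The two enums are ordered in parallel, so a policy is valid only if
 * every checksum it wants to verify is actually being written. */
static bool policy_valid(struct csum_policy p)
{
    return (int)p.verify <= (int)p.write;
}

static bool check_meta_on_read(struct csum_policy p)
{
    return p.verify >= VERIFY_META;
}

static bool check_data_on_read(struct csum_policy p)
{
    return p.verify == VERIFY_META_DATA;
}
```

With this shape, the suggested default is `{ CSUM_META_DATA, VERIFY_META }`: everything is written, so an offline check can still verify data, but reads only pay for header checks.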

In all cases, I also want to add an "mdb_verify" command that would
check all of these offline.

There are a few reasons for this:

* Hardware is unreliable. RAM, disks, cables, and even CPU cache memory
can all exhibit bit flips and other data loss. Changing a bit in a
pointer can damage any data structure, leading to crashes or silent
corruption.
* Software is never perfect. Checksumming allows detection of data
overwritten by overflows or other mistakes that we as humans all make.

I'd opt to use something fast like CRC32C (Intel provides hardware
acceleration for this, which can be enabled with -march=native). The
only issue I see is that this would require an on-disk structure
change, because the current structs don't have space for it, and the
checksums have to be *first*.
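For reference, here is a portable bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78). It is a sketch of the checksum itself, not of any LMDB code. A real implementation would use a table-driven version or the SSE4.2 `crc32` instruction, which computes the same CRC much faster.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
 * Pass crc = 0 to start; pass a previous result to continue the
 * checksum over further bytes. */
static uint32_t crc32c(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    crc = ~crc;                               /* undo the final inversion */
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)           /* one bit at a time, LSB first */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

The standard check value holds: `crc32c(0, "123456789", 9)` is `0xE3069283`. Chaining lets a page header and body be hashed as separate pieces.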

http://www.lmdb.tech/doc/group__internal.html#structMDB__page

The checksum would have to be either the first or the last field of
the page header, so that it can be excluded from the checksummed range
and updated without affecting its own result. The checksum for the
data would have to be stored within the header, so that it is itself
covered by the header checksum.
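Below is a sketch of the header arrangement just described. It assumes fixed 4096-byte pages and a made-up `struct pg_hdr`, which is not LMDB's MDB_page layout; `page_seal` and `page_verify` are likewise hypothetical. The header checksum is the first field and covers the rest of the header, including the data checksum. The data checksum covers the page body.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PG_SIZE 4096u   /* assumed page size for this sketch */

static uint32_t crc32c(const void *buf, size_t len)   /* bitwise CRC-32C */
{
    const unsigned char *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical 24-byte header (no internal padding). hdr_csum is first,
 * so it is excluded from its own range; data_csum is inside the header,
 * so it is protected by hdr_csum. */
struct pg_hdr {
    uint32_t hdr_csum;   /* CRC-32C of header bytes [4, 24) */
    uint32_t data_csum;  /* CRC-32C of page bytes [24, PG_SIZE) */
    uint64_t pgno;
    uint16_t flags;
    uint16_t pad[3];
};

#define HDR_COVERED_OFF offsetof(struct pg_hdr, data_csum)

enum { PAGE_OK = 0, PAGE_BAD_HDR = 1, PAGE_BAD_DATA = 2 };

/* Fill in both checksums before the page is written. 'page' points to
 * PG_SIZE bytes that begin with a struct pg_hdr. */
static void page_seal(unsigned char *page)
{
    struct pg_hdr h;
    memcpy(&h, page, sizeof h);
    /* Data checksum first, so the header checksum can cover it. */
    h.data_csum = crc32c(page + sizeof h, PG_SIZE - sizeof h);
    h.hdr_csum = crc32c((unsigned char *)&h + HDR_COVERED_OFF,
                        sizeof h - HDR_COVERED_OFF);
    memcpy(page, &h, sizeof h);
}

/* Check a page after reading it. With check_data == 0 only the header
 * (metadata) is verified; otherwise the body is re-hashed as well. */
static int page_verify(const unsigned char *page, int check_data)
{
    struct pg_hdr h;
    memcpy(&h, page, sizeof h);
    if (crc32c(page + HDR_COVERED_OFF, sizeof h - HDR_COVERED_OFF) != h.hdr_csum)
        return PAGE_BAD_HDR;
    if (check_data && crc32c(page + sizeof h, PG_SIZE - sizeof h) != h.data_csum)
        return PAGE_BAD_DATA;
    return PAGE_OK;
}
```

In metadata-only mode, a flipped bit in the body goes unnoticed, while full mode catches it. Running `page_verify(page, 1)` over every page in the file is roughly what an offline mdb_verify could do.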

Is this something I should pursue? Would this require an on-disk
format change? Is there something that could be done to avoid this?


Thanks,

William

Follow-Ups:
- Re: LMDB questions
  - From: Howard Chu <hyc@symas.com>
