
Re: After power reset, LMDB sub-DBs' key-pairs are missing



Hi Howard,

On Mon, Mar 14, 2016 at 6:10 PM, Howard Chu <hyc@symas.com> wrote:
> Sounds like ext4 or your SSD is messing with you. The only way you could
> wind up with your original state, 3 sub-DBs with 1 record each, based on the
> processing you described, is if the original DB pages representing that
> state were still recorded in the file. There's no way that those pages would
> not have already been reused by LMDB, after 2210167 transactions had been
> written to the DB. Much less the 4132159 transactions of your post-reboot
> file.
>
> So either the SSD has remapped pages out from under you, or the ext4 journal
> has decided to give you back an older version of the file. In either case I
> doubt that any of your real data is still accessible thru any standard
> filesystem APIs.

Thank you for the very quick response. LMDB has been rock solid in
this use for some time now, so I'd also be inclined to look to ext4
first, and will need to examine their laundry list of mount options
[1] again. This particular file system was mounted
'noatime,nodev,nosuid,noexec', which means it should default to
'data=ordered' for the journaling.
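
To double-check which journaling mode actually applies, something along
these lines could be run on the affected host. It is only a sketch: the
function names are made up for illustration, and it assumes that a
missing 'data=' option means the documented ext4 default of
'data=ordered'. Depending on the kernel version, /proc/mounts may or
may not list the default mode explicitly, and a default stored in the
superblock (e.g. via tune2fs) would not be visible this way.

```python
def ext4_data_mode(mount_options):
    """Return the journaling mode named by a comma-separated option string.

    Assumption: when no data= option is present, ext4 falls back to its
    documented default, data=ordered.
    """
    for opt in mount_options.split(","):
        if opt.startswith("data="):
            return opt.split("=", 1)[1]
    return "ordered"


def ext4_mounts(mounts_text):
    """Yield (mount_point, options) for each ext4 line of /proc/mounts text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "ext4":
            yield fields[1], fields[3]


def current_ext4_modes(path="/proc/mounts"):
    """Map each mounted ext4 file system to its (assumed) data mode."""
    with open(path) as f:
        return {mp: ext4_data_mode(opts) for mp, opts in ext4_mounts(f.read())}
```

Calling current_ext4_modes() returns a dict keyed by mount point, so
the file system holding the .mdb file can be looked up directly.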

If anyone on the list has informed views on which ext4 mount options
to avoid or embrace when using LMDB, I would love to hear them.

As for the present problem, further analysis suggests why those
sub-DBs each contain a lone key: those three keys (one in each sub-DB)
are the defaults the program inserts when it initializes a new LMDB
file to write to. They are inserted whenever the database is believed
to be empty, i.e., when it has zero entries.

So possibly the post-reboot state of the DB file looked empty to LMDB,
either for the main DB or for the three sub-DBs, and the program then
proceeded to insert and commit those default keys. That's more
plausible than a resurrection of the exact initial state (from months
ago) where the original pages would indeed be long gone, as you noted.

I don't suppose there is much hope of finding the previous B+tree
root(s) in the .mdb file and attempting to recover data from any
still-reachable leaves?
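
As a first step in that direction, the two meta pages at the start of
the .mdb file could be inspected to see which transaction ID, entry
count, and main-DB root each one records. The following is a rough
sketch only. It assumes the on-disk layout of a 64-bit little-endian
LMDB 0.9 build as I read it from mdb.c (16-byte page header followed
by the meta struct), and the function names are hypothetical. It does
not locate older roots: LMDB keeps only these two meta pages, so
finding earlier trees would mean scanning the remaining pages, which
is not attempted here.

```python
import struct

MDB_MAGIC = 0xBEEFC0DE
PAGE_HEADER_SIZE = 16  # assumed: pgno (8), pad (2), flags (2), lower/upper (4)

# Assumed meta layout: magic, version, address, mapsize,
# then two DB records (free-list DB, main DB), then last_pgno, txnid.
# Each DB record: pad, flags, depth, branch, leaf, overflow, entries, root.
META_FORMAT = "<IIQQ" + "IHH5Q" * 2 + "QQ"
META_SIZE = struct.calcsize(META_FORMAT)


def read_meta(data, offset):
    """Decode the meta struct of the page at `offset`, or return None."""
    start = offset + PAGE_HEADER_SIZE
    body = data[start:start + META_SIZE]
    if len(body) < META_SIZE:
        return None
    f = struct.unpack(META_FORMAT, body)
    if f[0] != MDB_MAGIC:
        return None
    return {
        "version": f[1],
        "page_size": f[4],      # stored in the free-list DB's pad field
        "main_entries": f[18],  # entry count of the main DB
        "main_root": f[19],     # all bits set means an empty tree
        "last_pgno": f[20],
        "txnid": f[21],
    }


def parse_metas(data):
    """Return the valid meta pages found at page 0 and page 1."""
    metas = []
    first = read_meta(data, 0)
    if first is None:
        return metas  # page size unknown, so page 1 cannot be located
    metas.append(first)
    second = read_meta(data, first["page_size"])
    if second is not None:
        metas.append(second)
    return metas


def newest(metas):
    """Pick the meta page with the highest transaction ID."""
    return max(metas, key=lambda m: m["txnid"])

# Usage (on a copy of the file, never the original):
#   with open("data.mdb", "rb") as f:
#       for meta in parse_metas(f.read(65536)):
#           print(meta)
```

If both meta pages report only a handful of entries and low page
numbers, that would be consistent with the file having looked empty to
LMDB after the reboot.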

> If this filesystem is only being used to store LMDB data, you should use
> ext2 (or some other non-journaling filesystem of your choice). If all your
> txns are being committed synchronously, you should consider using a raw
> block device instead of a filesystem. (Code for this is experimental and not
> yet released. It's slower than using a filesystem, when using asynch
> transactions, but several times faster than any other filesystem for synch
> transactions.)

Thanks for those tips; I will evaluate LMDB on ext2. I would also be
most interested in helping to test the new code for LMDB on a raw
block device. Might that be made available in a Git branch at some
point?

[1] https://www.kernel.org/doc/Documentation/filesystems/ext4.txt

-- 
Arto Bendiken | @bendiken | http://ar.to