Issue 7668 - LMDB enhancement, corruption detection
Summary: LMDB enhancement, corruption detection
Status: RESOLVED TEST
Alias: None
Product: LMDB
Classification: Unclassified
Component: liblmdb
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
Reported: 2013-08-20 13:29 UTC by Howard Chu
Modified: 2021-06-02 12:44 UTC

Description Howard Chu 2013-08-20 13:29:34 UTC
Full_Name: Howard Chu
Version: LMDB 0.9.7
OS: 
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (78.155.233.73)
Submitted by: hyc


Currently LMDB always stores two snapshots of the environment, one in each of
the two meta pages. For ongoing operation it simply uses the one with the higher
transaction number; the older one is ignored.
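
As a rough illustration of the selection rule described above, the following sketch picks the meta page with the higher transaction number. The struct and function names (sketch_meta, sketch_pick_meta) are hypothetical and heavily simplified; they are not LMDB's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a meta page. The real LMDB
 * structure carries more fields than the two shown here. */
typedef struct {
    uint64_t txnid;      /* ID of the transaction that wrote this snapshot */
    uint64_t last_pgno;  /* last used page at the time of that commit */
} sketch_meta;

/* Return the index (0 or 1) of the meta page with the higher txnid.
 * In this sketch a tie resolves to page 0. */
static int sketch_pick_meta(const sketch_meta metas[2])
{
    assert(metas != NULL);
    return metas[1].txnid > metas[0].txnid ? 1 : 0;
}
```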

If a power failure occurs at the instant that a transaction commit is occurring,
it is possible for the meta page write to be corrupted in the storage device.
Storage devices use ECC to detect (and sometimes recover from) these problems;
if the error cannot be corrected then an attempt to read this sector will fail
and the OS may tell the application that the read failed with an I/O error.

In mdb_env_read_header() we currently attempt to read both header pages but
completely fail the mdb_env_open() if an error occurs. Instead, we should always
attempt to read both pages (assuming the file was not truncated, and both pages
really were written before). If both pages exist but we only successfully read
one, we should allow the env_open() to proceed in Read-Only mode, and return an
error code to the caller indicating this situation. E.g., MDB_NEEDS_BACKUP. The
point being, the user should use mdb_copy(1) to make a backup of the environment
ASAP and should not be able to do anything else to the environment in the
meantime.
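
The proposed behavior could be sketched roughly as follows. This is only a decision table over the outcome of the two reads, assuming both meta pages are known to exist in the file; it performs no I/O. All names and numeric values are hypothetical, including SKETCH_NEEDS_BACKUP, which merely stands in for the MDB_NEEDS_BACKUP name floated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes; the numeric values are invented. */
enum sketch_open_rc {
    SKETCH_OK = 0,            /* both meta pages readable: open normally */
    SKETCH_NEEDS_BACKUP = 1,  /* only one readable: open read-only, warn caller */
    SKETCH_FAIL = 2           /* neither readable: refuse to open */
};

typedef struct {
    bool     readable;  /* did the read of this meta page succeed? */
    uint64_t txnid;     /* meaningful only when readable is true */
} sketch_meta_read;

/* Decide how to open, given the outcome of reading both meta pages.
 * On SKETCH_OK or SKETCH_NEEDS_BACKUP, *use_idx is the meta page to use
 * and *read_only says whether the environment must be forced read-only.
 * On SKETCH_FAIL the output parameters are left untouched. */
static enum sketch_open_rc
sketch_decide_open(const sketch_meta_read r[2], int *use_idx, bool *read_only)
{
    assert(r && use_idx && read_only);
    if (r[0].readable && r[1].readable) {
        *use_idx = r[1].txnid > r[0].txnid ? 1 : 0;
        *read_only = false;
        return SKETCH_OK;
    }
    if (r[0].readable || r[1].readable) {
        *use_idx = r[0].readable ? 0 : 1;
        *read_only = true;
        return SKETCH_NEEDS_BACKUP;
    }
    return SKETCH_FAIL;
}
```

A caller receiving SKETCH_NEEDS_BACKUP would be expected to run mdb_copy against the environment before attempting anything else.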

It's been suggested that LMDB should also use its own CRC on the meta page, to
detect more subtle corruptions. We may consider adding this as well, but
obviously this would be an on-disk format change. My personal view is this is
what ECC DRAM is for. The primary argument for it is the possibility (again,
during a power failure) for the contents of the storage device's write buffer to
be corrupted while writing to the meta page. I.e., ECC may have protected the
data all the way from the host to the storage device, but if the device was in
the middle of writing the sector when the power failure occurred, the write may
have completed, but the buffer DRAM may have been losing charge while the write
was happening. Again, I'm skeptical that corruptions of this sort would not be
detected by the device's own ECC checks.
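
For reference, a meta page CRC of the kind suggested might look like the sketch below. It assumes a standard bitwise CRC-32 (reflected polynomial 0xEDB88320) computed over the fields that precede a stored checksum field. The struct layout and all names are hypothetical; nothing here reflects an actual LMDB on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain bitwise CRC-32 (reflected, polynomial 0xEDB88320). */
static uint32_t sketch_crc32(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical meta page layout with a trailing checksum field. */
typedef struct {
    uint64_t txnid;
    uint64_t last_pgno;
    uint32_t crc;  /* covers every byte before this field */
} sketch_meta_crc;

/* Compute and store the checksum just before the page is written. */
static void sketch_meta_seal(sketch_meta_crc *m)
{
    m->crc = sketch_crc32(m, offsetof(sketch_meta_crc, crc));
}

/* Return nonzero if the stored checksum matches the page contents. */
static int sketch_meta_verify(const sketch_meta_crc *m)
{
    return m->crc == sketch_crc32(m, offsetof(sketch_meta_crc, crc));
}
```

A meta page that fails verification could then be treated the same way as one that failed to read at all.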

While we're at it, it may be useful to include an env_open() flag explicitly
requesting use of the older meta page. This flag would only be valid in
conjunction with MDB_RDONLY. It would allow using mdb_copy to make a backup of
the previous environment snapshot.
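
A minimal sketch of how such a flag might be validated and applied is shown below. The flag names and values are hypothetical placeholders: SKETCH_RDONLY stands in for MDB_RDONLY, and SKETCH_PREVMETA for the proposed "use older meta page" flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_RDONLY   0x1u  /* stands in for MDB_RDONLY (value invented) */
#define SKETCH_PREVMETA 0x2u  /* hypothetical "use older meta page" flag */

/* Choose which meta page to open with. Returns 0 and sets *idx on
 * success, or EINVAL if the older page is requested without read-only. */
static int sketch_select_meta(unsigned flags, const uint64_t txnid[2], int *idx)
{
    assert(txnid && idx);
    if ((flags & SKETCH_PREVMETA) && !(flags & SKETCH_RDONLY))
        return EINVAL;
    int newest = txnid[1] > txnid[0] ? 1 : 0;
    *idx = (flags & SKETCH_PREVMETA) ? 1 - newest : newest;
    return 0;
}
```

Rejecting the flag outside read-only mode matters in this sketch because a writer that started from the older snapshot would overwrite pages still referenced by the newer one.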

Comment 1 Quanah Gibson-Mount 2017-04-13 15:29:41 UTC
moved from Incoming to Software Enhancements
Comment 2 Howard Chu 2021-06-02 12:44:37 UTC
The MDB_PREVSNAPSHOT flag was added for LMDB 1.0 to allow opening the environment with the previous meta page instead of the current one.
mdb_env_set_checksum() was added for LMDB 1.0 to allow specifying per-page checksums.