
(ITS#7703) mdb sync() issues vs. ACID

Full_Name: Hallvard B Furuseth
Version: LMDB_0.9.8
Submission from: (NULL) (
Submitted by: hallvard

mdb_env_sync() uses the wrong sync method when syncing a commit
written with a different MDB_WRITEMAP setting in another MDB_env.

Consider two processes using MDB_NOMETASYNC that alternate write
txns.  Each one syncs the meta page written by the other.  If they
use different MDB_WRITEMAP settings, every meta page gets synced
with the wrong method.  This breaks the durability guarantee of
ACID.
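
To illustrate the idea of syncing by how the commit was written
rather than by the syncing env's own setting, here is a minimal
POSIX-only sketch.  The names commit_info, choose_sync_method and
sync_commit are hypothetical and not part of LMDB, and it assumes
the writer's mode is recorded somewhere the syncer can read it.
Whether msync() on a read-only map or fsync() on a read-only
descriptor actually flushes the data is OS-dependent and not
addressed here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical record of how a commit was written (not an LMDB type). */
struct commit_info {
    int written_via_map;   /* nonzero if the writer used a writable mmap */
};

enum sync_method { SYNC_FSYNC, SYNC_MSYNC };

/* Choose the method from the commit's writer, ignoring the caller's
 * own writemap setting. */
enum sync_method choose_sync_method(const struct commit_info *c)
{
    assert(c != NULL);
    return c->written_via_map ? SYNC_MSYNC : SYNC_FSYNC;
}

/* Flush the commit; returns 0 on success or -1 on error, like the
 * underlying system calls. */
int sync_commit(int fd, void *map, size_t len, const struct commit_info *c)
{
    if (choose_sync_method(c) == SYNC_MSYNC)
        return msync(map, len, MS_SYNC);
    return fsync(fd);
}
```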

There is a similar problem if a process crashes after writing
the meta page but before sync succeeds, and mdb_env_open() then
resets the lockfile to refer to the unsynced commit.  Robust
mutexes will introduce a similar problem without mdb_env_open.

I'm not volunteering to figure out how to do this right.  For
example, how do fsync/msync/FlushFileBuffers behave on various
OSes if the file descriptor or memory map is read-only?  Do we
need to set a "need to sync" flag in the lockfile in that case,
for the first writer or write txn to obey?

Another fix: Disable this scenario.  Store the MDB_WRITEMAP
setting in the lockfile when resetting it, even with MDB_RDONLY.
Obey that flag rather than the writemap flag to mdb_env_open()
when not resetting the lockfile.  However, a small program like
mdb_stat can then have a disproportionate effect on another
process which opens the env at the same time.  Also, nested txns
would need to work with MDB_WRITEMAP.
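
A rough sketch of that rule, with made-up names: SKETCH_WRITEMAP
stands in for MDB_WRITEMAP, and lock_header for whatever part of
the lockfile would hold the stored setting.  Neither is actual
LMDB code.

```c
#include <assert.h>

#define SKETCH_WRITEMAP 0x1u   /* hypothetical stand-in for MDB_WRITEMAP */

/* Hypothetical lockfile field holding the env-wide writemap setting. */
struct lock_header {
    unsigned int writemap_flag;
};

/* Return the flags the env should actually run with.  The opener that
 * resets the lockfile records its writemap setting; every later opener
 * obeys the recorded setting instead of its own request. */
unsigned int effective_flags(struct lock_header *lk,
                             unsigned int requested,
                             int resetting_lockfile)
{
    assert(lk != 0);
    if (resetting_lockfile)
        lk->writemap_flag = requested & SKETCH_WRITEMAP;
    return (requested & ~SKETCH_WRITEMAP) | lk->writemap_flag;
}
```

Under this rule, a reader that resets the lockfile without
SKETCH_WRITEMAP forces every later opener to run without it, which
is the disproportionate effect mentioned above.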

For the crash case above and robust mutexes:

Maybe mdb_env_open() should not modify me_txns->mti_txnid if it
refers to the oldest meta page.  That way the possibly unsynced
commit will never be exposed unless the lockfile is removed.
But the next write txn must then reset the "hidden" metapage and
sync before proceeding, similar to what mdb_env_write_meta() does
on failure.  Otherwise removing the lockfile would expose a meta
page referring to data which may have been overwritten, e.g. by
an mdb_abort()ed commit.
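
One possible reading of that rule, as a sketch only.  The function
visible_txnid and its parameters are hypothetical; it models the
txnid recorded in the lockfile and the txnids of the two meta
pages as plain integers and ignores the reset-and-sync step the
next write txn would still have to do.

```c
#include <assert.h>

/* Decide which txnid an opener should treat as the latest commit.
 * lockfile_is_new: nonzero if the lockfile was just created.
 * lock_txnid:      txnid currently recorded in the lockfile.
 * meta_txnid:      txnids stored in the two meta pages. */
unsigned long visible_txnid(int lockfile_is_new,
                            unsigned long lock_txnid,
                            const unsigned long meta_txnid[2])
{
    unsigned long oldest, newest;

    assert(meta_txnid != 0);
    if (meta_txnid[0] < meta_txnid[1]) {
        oldest = meta_txnid[0];
        newest = meta_txnid[1];
    } else {
        oldest = meta_txnid[1];
        newest = meta_txnid[0];
    }

    /* An existing lockfile pointing at the oldest meta page suggests
     * the newest commit may never have been synced: keep it hidden. */
    if (!lockfile_is_new && lock_txnid == oldest)
        return lock_txnid;

    /* Otherwise (new lockfile, or lockfile already at the newest
     * commit) the newest meta page is used. */
    return newest;
}
```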

Another variant would be to sync in mdb_env_open() when resetting
the lockfile, or maybe an MDB_RDONLY env must set a "sync needed"