Re: (ITS#7703) mdb sync() issues vs. ACID

h.b.furuseth@usit.uio.no wrote:
> Full_Name: Hallvard B Furuseth
> Version: LMDB_0.9.8
> OS:
> URL:
> Submission from: (NULL) (
> Submitted by: hallvard
>
> mdb_env_sync() uses the wrong sync method when syncing a commit
> written with a different MDB_WRITEMAP setting in another MDB_env.
> Two processes with MDB_NOMETASYNC, each process doing every 2nd
> write txn, will sync each other's meta pages.  If they have
> different MDB_WRITEMAPs, every meta page gets synced wrongly.
> This breaks durability of ACID.

Sounds like a doc issue. This can only arise if two separate processes are 
using different configurations to access the same MDB environment. Most 
applications will always use identical configurations to access their 
databases, so this won't occur.
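
For readers following the report, here is a minimal sketch of the mismatch being described. It is not LMDB code: SKETCH_WRITEMAP, sketch_pick_sync() and sketch_sync() are hypothetical names, and the flag value is a stand-in rather than LMDB's real MDB_WRITEMAP. It only assumes POSIX msync() and fsync(), and that the flush method is meant to follow the flags of the environment that wrote the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical flag bit standing in for MDB_WRITEMAP; not LMDB's value. */
#define SKETCH_WRITEMAP 0x1u

enum sketch_sync_method { SKETCH_SYNC_FSYNC, SKETCH_SYNC_MSYNC };

/* Pick the flush method from the flags of the environment that WROTE
 * the commit. The report's point is that the syncing process uses its
 * own flags instead, which may differ from the writer's. */
static enum sketch_sync_method sketch_pick_sync(unsigned writer_flags)
{
    return (writer_flags & SKETCH_WRITEMAP) ? SKETCH_SYNC_MSYNC
                                            : SKETCH_SYNC_FSYNC;
}

/* Flush with the picked method. Returns 0 on success, -1 on error
 * (errno is set by msync() or fsync()). */
static int sketch_sync(int fd, void *map, size_t len, unsigned writer_flags)
{
    if (sketch_pick_sync(writer_flags) == SKETCH_SYNC_MSYNC) {
        assert(map != NULL);          /* a mapped writer must pass its map */
        return msync(map, len, MS_SYNC);
    }
    return fsync(fd);
}
```

If every process opens the environment with the same flags, the syncer's flags and the writer's flags are the same, so the two choices cannot diverge.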

> There is a similar problem if a process crashes after writing
> the meta page but before sync succeeds, and mdb_env_open() then
> resets the lockfile to refer to the unsynced commit.  Robust
> mutexes will introduce a similar problem without mdb_env_open.
>
> I'm not volunteering to figure out how to do this right, e.g. how
> do fsync/msync/FlushFileBuffers work on various OSes if the file
> descriptor or memory map is read-only, do we need to set a "need
> to sync" flag in the lockfile in this case for the first writer
> or write txn to obey?
>
> Another fix: Disable this scenario.  Store the MDB_WRITEMAP
> setting in the lockfile when resetting it, even with MDB_RDONLY.
> Obey that flag rather than the writemap flag to mdb_env_open()
> when not resetting the lockfile.  However, now a small program
> like mdb_stat can have disproportionate effect on another process
> which opens the env at the same time.  Also nested txns need to
> work with MDB_WRITEMAP.
>
> For the crash case above and robust mutexes:
> Maybe mdb_env_open() should not modify me_txns->mti_txnid if it
> refers to the oldest meta page.  That way the possibly unsynced
> commit will never be exposed unless the lockfile is removed.
> But next write txn must then reset the "hidden" metapage and sync
> before proceeding, similar to how mdb_env_write_meta() does at
> failure.  Otherwise removing the lockfile would expose a meta
> page referring to data which may have been overwritten, e.g. by
> an mdb_abort()ed commit.
>
> Another variant would be to sync in mdb_env_open() when resetting
> the lockfile, or maybe an MDB_RDONLY env must set a "sync needed"
> flag.

Nothing to fix in code. Doc this as "don't do this." Nobody currently does it 
anyway so it will have no impact.
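
An application that wants to enforce the "don't do this" rule itself could compare flag sets before opening the environment. This is only a sketch under assumptions: the SKETCH_* bits are hypothetical stand-ins for the LMDB flags of the same name (not their real values), and how one process learns the other's flags (for example from a shared config file) is left to the application.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MDB_WRITEMAP, MDB_NOMETASYNC, MDB_RDONLY. */
#define SKETCH_WRITEMAP   0x1u
#define SKETCH_NOMETASYNC 0x2u
#define SKETCH_RDONLY     0x4u

/* Flags that must match across processes sharing one environment.
 * Only the write-map setting is listed, since that is the setting
 * the report identifies as selecting the sync method. */
#define SKETCH_MUST_MATCH (SKETCH_WRITEMAP)

/* True when the two flag sets agree on every must-match bit. */
static bool sketch_flags_compatible(unsigned mine, unsigned theirs)
{
    return ((mine ^ theirs) & SKETCH_MUST_MATCH) == 0;
}

/* Abort early, before opening anything, if the configurations differ. */
static void sketch_require_compatible(unsigned mine, unsigned theirs)
{
    assert(sketch_flags_compatible(mine, theirs));
}
```

For example, sketch_flags_compatible(SKETCH_WRITEMAP, SKETCH_NOMETASYNC) is false, because only one side sets the write-map bit.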

   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/