
Re: (ITS#8475) Feature request: MDB low durability transactions

On 06. aug. 2016 17:38, bentrask@comcast.net wrote:
> Transaction commits are one of the few bottlenecks in MDB, because it has to
> fsync twice, sequentially.
> I think MDB could support mixed low and high durability transactions in the same
> database by adding per-page checksums and a third root page. The idea is that
> when committing a low-durability transaction, no fsyncs are performed. (...)

Yes and no.  We can get rid of fsyncs, but not that way.  Checksumming each
page isn't enough.  We must know it's the right version of the page and
not, e.g., a similar page from a previous aborted transaction.  To commit
a branch or meta page, we'd need to scan its children and checksum each
child's page header (thus including the child's own checksum).  Expensive.
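A minimal Python sketch of the point above. It assumes a hypothetical `Page` model (not LMDB's real page layout) and uses SHA-256 as a stand-in checksum. A page's own checksum only shows the page is internally consistent, so a stale page with the same page number and transaction ID but different contents still passes. Only a parent whose checksum covers each child's header, including the child's checksum, can tell the versions apart, and sealing that parent means reading every child header at commit.

```python
import hashlib
import struct
from dataclasses import dataclass, field

@dataclass
class Page:
    """Hypothetical page model for illustration; not LMDB's MDB_page layout."""
    pgno: int                 # page number
    txnid: int                # transaction that wrote this version
    payload: bytes = b""
    kids: list = field(default_factory=list)  # child Pages (branch/meta pages only)
    checksum: bytes = b""

def child_header(pg):
    """What a parent covers of each child: pgno, txnid and the child's own checksum."""
    return struct.pack("<QQ", pg.pgno, pg.txnid) + pg.checksum

def compute_checksum(pg):
    h = hashlib.sha256()
    h.update(struct.pack("<QQQ", pg.pgno, pg.txnid, len(pg.payload)))
    h.update(pg.payload)
    # The expensive part: every child's header must be read and hashed
    # before this page can be sealed at commit (or verified later).
    for kid in pg.kids:
        h.update(child_header(kid))
    return h.digest()

def seal(pg):
    pg.checksum = compute_checksum(pg)

def verify(pg):
    return pg.checksum == compute_checksum(pg)

# A committed leaf and the branch page that points to it.
leaf = Page(pgno=2, txnid=6, payload=b"committed")
seal(leaf)
root = Page(pgno=1, txnid=6, kids=[leaf])
seal(root)

# Suppose pgno 2 on disk instead holds a page from an aborted transaction,
# with the same pgno and txnid but different contents.
stale = Page(pgno=2, txnid=6, payload=b"aborted")
seal(stale)
root.kids[0] = stale  # what a reader would actually find at pgno 2

print(verify(stale))  # True: the stale page's own checksum is fine
print(verify(root))   # False: only the parent's checksum exposes the wrong version
```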

IIRC there are three things we can do:

- Use and fsync a WAL (write-ahead log) instead of the database pages.
   That can be cheaper because it writes one contiguous region instead
   of a lot of random-access pages.  Requires recovery after a crash.

- Volatile metapages which mdb_env_open() _always_ throws away if no
   other environment is already open.  They are lost if the application
   crashes or exits without doing a final checkpoint.

- Improve that a bit: Put them in a shared memory region, since that
   won't survive a system crash (unlike if we put them in the lockfile).
   That way they'll survive an application crash, provided something does
   a checkpoint before the next system crash.
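The two volatile-metapage options differ only in where the extra, unsynced meta page lives, and therefore in which failures it survives. Below is a rough Python sketch of the choice mdb_env_open() would have to make. It assumes a hypothetical `Meta` record and a hypothetical third, volatile meta slot next to LMDB's two durable meta pages. `open_env`, the two `survives_*` policies and `checkpoint` are illustrative names, not LMDB API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Meta:
    """Hypothetical meta page: the last committed txn and its B-tree root."""
    txnid: int
    root: int

def open_env(durable, volatile: Optional[Meta], volatile_survived: bool) -> Meta:
    """Pick the meta page a (hypothetical) mdb_env_open() would start from."""
    candidates = list(durable)
    if volatile is not None and volatile_survived:
        candidates.append(volatile)
    return max(candidates, key=lambda m: m.txnid)

def survives_option2(other_env_open: bool, system_crashed: bool) -> bool:
    # Volatile metas live only in process memory: they are thrown away
    # unless another environment is still open (system_crashed is ignored).
    return other_env_open

def survives_option3(other_env_open: bool, system_crashed: bool) -> bool:
    # Volatile metas live in shared memory: an application crash or exit
    # keeps them, a system crash wipes them.
    return not system_crashed

def checkpoint(durable, volatile: Meta):
    """Persist the volatile meta by overwriting the older durable slot.
    Real code would first fsync the data pages it points to (not modeled)."""
    newer = max(durable, key=lambda m: m.txnid)
    return [newer, volatile]

durable = [Meta(txnid=10, root=40), Meta(txnid=11, root=42)]
fast = Meta(txnid=15, root=57)  # written by unsynced low-durability commits

# The application crashed and no other process had the environment open.
print(open_env(durable, fast, survives_option2(False, False)).txnid)  # 11
print(open_env(durable, fast, survives_option3(False, False)).txnid)  # 15
# The whole system crashed: fall back to the last synced meta page.
print(open_env(durable, fast, survives_option3(False, True)).txnid)   # 11
# After a checkpoint, txn 15 survives even a system crash.
print(open_env(checkpoint(durable, fast), None, False).txnid)         # 15
```

The sketch does not model the case where unsynced pages are lost without a system crash, e.g. when a disk is unmounted and re-mounted while the shared-memory meta still points at them.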

We've discussed these at times, and there are caveats for some of them
which I don't quite remember.  One issue is that a "system crash" isn't
the only thing which can lose unsynced pages; unmounting and re-mounting
the disk (e.g. a USB disk) can too.
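For comparison, the write-ahead-log option in the same hedged Python style: each commit appends one contiguous record and issues a single fsync, and recovery replays complete records while discarding a torn tail. The record layout, the CRC32 check and the `wal_append`/`wal_replay` names are assumptions for illustration. Folding the log back into the database pages and truncating it are not shown.

```python
import os
import struct
import tempfile
import zlib

def wal_append(f, txnid, pages):
    """Append one commit record to log file f (opened "ab") and fsync once.
    Hypothetical layout: len | txnid, count, (pgno, size, data)* | crc32."""
    body = struct.pack("<QI", txnid, len(pages))
    for pgno, data in sorted(pages.items()):
        body += struct.pack("<QI", pgno, len(data)) + data
    record = struct.pack("<I", len(body)) + body + struct.pack("<I", zlib.crc32(body))
    f.write(record)       # one contiguous, sequential write...
    f.flush()
    os.fsync(f.fileno())  # ...and a single fsync makes the commit durable

def wal_replay(path):
    """Recovery: return (last txnid, {pgno: data}) from complete, intact records."""
    with open(path, "rb") as f:
        buf = f.read()
    pages, last, off = {}, 0, 0
    while off + 4 <= len(buf):
        (n,) = struct.unpack_from("<I", buf, off)
        end = off + 4 + n + 4
        if end > len(buf):
            break  # record cut short by a crash mid-write
        body = buf[off + 4 : off + 4 + n]
        (crc,) = struct.unpack_from("<I", buf, off + 4 + n)
        if zlib.crc32(body) != crc:
            break  # corrupt record: nothing after it can be trusted
        txnid, count = struct.unpack_from("<QI", body, 0)
        pos = 12
        for _ in range(count):
            pgno, size = struct.unpack_from("<QI", body, pos)
            pos += 12
            pages[pgno] = body[pos : pos + size]
            pos += size
        last, off = txnid, end
    return last, pages

path = os.path.join(tempfile.mkdtemp(), "demo.wal")
with open(path, "ab") as f:
    wal_append(f, 6, {7: b"leaf v6", 1: b"root v6"})
    wal_append(f, 7, {7: b"leaf v7"})
    f.write(b"\x40\x00")  # simulate a crash partway through the next record
print(wal_replay(path))  # (7, {1: b'root v6', 7: b'leaf v7'})
```

The trade-off is the one described for this option: commits become a sequential append plus one fsync, at the cost of a recovery pass after a crash.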