
Re: LMDB crash consistency, again

On 01/05/2015 12:58 PM, Howard Chu wrote:
> The LKML thread indicates that this bug was already fixed. The zheng mai
> paper says they used RHEL6, which shipped with kernel 2.6.32 so it apparently
> was too old to have the fix.
>
> All in all a bunch of bogus reporting; claiming that all DBs are broken when
> in fact LMDB is perfectly correct.

True - but often uninteresting from the user's perspective.  So I do
think LMDB on Linux should default to fsync for some years - at least
when the file may have grown.  The Makefile can explain the problem and
provide a variable to always use fdatasync, if the admin knows the
kernel is OK.
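
To illustrate the build-time default, a minimal sketch only.
MDB_TRUST_FDATASYNC and env_data_sync() are names made up for this
example; the Makefile would pass -DMDB_TRUST_FDATASYNC when the admin
knows the kernel is OK:

```c
#include <unistd.h>

/* MDB_TRUST_FDATASYNC is a hypothetical opt-in flag, not an existing
 * LMDB option. Without it, Linux builds fall back to fsync(). */
#ifndef MDB_FDATASYNC
# if defined(__linux__) && !defined(MDB_TRUST_FDATASYNC)
#  define MDB_FDATASYNC fsync
# else
#  define MDB_FDATASYNC fdatasync
# endif
#endif

/* Hypothetical wrapper: flush data with whichever call was selected.
 * Returns 0 on success, -1 on error with errno set. */
static int env_data_sync(int fd)
{
    return MDB_FDATASYNC(fd);
}
```

An explicit -DMDB_FDATASYNC=... on the command line still overrides
both branches.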

As for how to know the synced size, if you want to do more than always
use fsync on an OS where fdatasync is unreliable:

I drafted some code to get around it, but it got messy.  If we
use more code for this than just '#define MDB_FDATASYNC fsync',
I suggest handling it all in mdb_env_sync(), which can fstat():

struct MDB_env:
    off_t   me_size;    /**< file size known to be synced, or 0 */

mdb_env_sync() {
    size_t sz = 0;
    /* Size unknown or changed since the last sync: do a full fsync */
    if (mdb_fsize(env->me_fd, &sz) != MDB_SUCCESS || sz != env->me_size) {
        if (fsync(env->me_fd))
            rc = ErrCode();
        else if (sz)
            env->me_size = sz;  /* remember the size we just synced */
    } else {
        ...normal sync...;
    }
}

mdb_env_open() does not know whether the current file size has been
synced, so it should not set me_size (leave it 0).
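
For reference, the same idea as a standalone POSIX sketch outside LMDB.
sync_with_size_check() is a hypothetical helper, not LMDB code, and
*synced_size stands in for me_size.  It is slightly more conservative
than the draft above: a remembered size of 0 always forces fsync.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sync fd, using fsync() whenever the file size is
 * unknown or differs from the size last known to be synced.
 * *synced_size == 0 means "unknown".
 * Returns 0 on success, -1 on error with errno set by the failing call. */
static int sync_with_size_check(int fd, off_t *synced_size)
{
    struct stat st;
    int have_size;

    assert(synced_size != NULL);

    have_size = (fstat(fd, &st) == 0);

    if (!have_size || *synced_size == 0 || st.st_size != *synced_size) {
        /* The file may have grown: fsync() also flushes the metadata. */
        if (fsync(fd) != 0)
            return -1;
        if (have_size)
            *synced_size = st.st_size;  /* size seen before the fsync */
        return 0;
    }

    /* Size unchanged since the last full sync: fdatasync() suffices. */
    return fdatasync(fd);
}
```

The size is read before the fsync, so if the file grows in between,
the recorded size is too small and the next call simply does another
fsync.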