[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#8431) LMDB: Access newly opened database from another transaction

On Mon, May 30, 2016 at 2:44 PM, Hallvard Breien Furuseth
<h.b.furuseth@usit.uio.no> wrote:
> In ITS#8430, Juerg Bircher wrote:
>> So if you drop ITS#8431 it would at least be very helpful to get an
>> separate error like MDB_INVALID_DPI instead of EINVAL So the
>> application sharing and caching handles (dbi) could handle the
>> situation appropriate like use mdb_env_open() so that the dbi becomes
>> valid in that txn as well.
> This is ITS#8431 in a new guise, so I'm replying here.  We know DBI
> handling is inconvenient.  We'd have fixed it if it were easy without
> adding a bunch of locks.  I'll walk through some of it below, I expect
> it'll be useful to have for future reference anyway:
> You mean like this?
> if (mdb_something(txn, dbi, ...) == MDB_INVALID_DBI) {
>   /* Get <dbi>, which some other txn committed after <txn> began */
>   mdb_dbi_open(txn, "database name of <dbi>", 0, &dbi2);
>   assert(dbi2 == dbi);
> }
> That breaks this requirement from the mdb_dbi_open() doc:
> * A transaction that uses
> * this function must finish (either commit or abort) before
> * any other transaction in the process may use this function.

Just to illustrate the simplified sequence of calls (using MDB_NOTLS):

    mdb_txn_begin(env, NULL, MDB_RDONLY, &txn2);
    mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);

    MDB_dbi dbi;
    mdb_dbi_open(txn, "db1", 0, &dbi);

    // dbi becomes now valid and is propagated to the env

    // At this point dbi for txn2 is invalid as txn2 started before
opening the database
    if (mdb_something(txn2, dbi, ...) == MDB_INVALID_DBI) {
        MDB_dbi dbi2;
        mdb_dbi_open(txn2, "db1", 0, &dbi2);
        assert(dbi2 == dbi);

        // now dbi is valid for txn2 as with the mdb_dbi_open we
propagated it to txn2!
        mdb_something(txn2, dbi, ...);

This code is really not preferred and is only a workaround.

> To fulfill that, you must serialize txns which use mdb_dbi_open().
> (Unless they're all write txns, which LMDB serializes.)  One reason
> you must, is that there is no sync anywhere else in your suggestion.
> Multi-threaded use of structures, like the ones behind DBIs, must be
> synchronized.  But DBIs and mdb_dbi_open() are not mutex-protected,
> nor are read-only txns (after the per-thread setup).  So it's up to
> the user.  And LMDB mostly can't catch users at less than txn-sized
> threading bugs - like the DBI workarounds people keep trying.
> I suppose we should add some of the above more clearly in lmdb.h doc,
> since people keep trying to hack around it.  And "EINVAL/MDB_BAD_DBI
> often indicate that LMDB detected a bug in your program, not that you
> can safely work around the problem when LMDB returns them."
> (Not always, unless we walk carefully through which error codes are
> returned when first.)
> For a txn to pick up a DBI committed after the txn started, LMDB
> would need to mutex-protect DBI operations.  Something like this:
> - Add an env+txn flag MDB_ADMIN_LOCKS.  If it is set in the env,
>   then mdb_dbi_open() fails if it is not also set in the txn.
>   The txn flag wraps DBI handling in a new mutex in txn_begin(),
>   dbi_open(), commit/abort() of a txn which opened a DBI.
>   The env flag wraps dbi_close() in the mutex.
> - Add mdb_dbi_pickup(MDB_txn*, MDB_dbi of a named DB): Pick up a DBI
>   which was committed after the txn began.  It will lock the mutex,
>   and maybe it'll require MDB_ADMIN_LOCKS in the txn.
>   (I don't know if pickup(main DBI) would be safe.)
>   Maybe that's more ACID-friendly - a separate function to step
>   outside and look at the current env.  If I've gotten it right...
>   Of course, if the txn which calls dbi_pickup() began before that
>   named DB was created, then the txn picks up a DBI which it cannot
>   use.  That's OK as far as LMDB is concerned, it does safely catch
>   use of a non-existent named DB.  But makes it less useful to users.
> So I wonder, will people optimize away the mutexing in the way you
> suggested above?  Try to use a DBI in a txn without MDB_ADMIN_LOCKS,
> and if that fails call mdb_dbi_pickup().  If so, they reintroduce
> the heisenbug: The test for whether the DBI was OK, was not mutexed
> vs. mdb_txn_commit(txn which created the DBI).
> Or simpler, the new flag could just tell MDB to wrap a mutex around
> every single txn_begin and txn_commit.  That won't tempt anyone to
> optimize away locks, since there are so much locking already.  But
> then, why are they using LMDB?  LMDB has made a lot of tradeoffs to
> avoid locking in read-only txns.  See Caveats in lmdb.h.
> And it'd still be unsafe to allow overlapping txns to mdb_dbi_open()
> the same DB.  txn1 creates the DBI, txn2 opens it too, txn1 aborts but
> must not close the DBI because txn2 is using it... So MDB_dbx would
> need a refcount of how many existing txns opened the DB during DBI
> creation.  Also I suppose ADMIN_LOCKS would need to disable
> mdb_set_compare() & co.  A new function mdb_dbi_open2() could init the
> entire MDB_dbx while holding the mutex, so other txns would not use
> the half-initialized DBX.
> Alternatively, dbi_open2(named DB) above can take a flag which says
> "make the DBI immediately available in the MDB_env".  That'll be
> unsafe for txns where the user does not synchronize txn_begin against
> the dbi_open2().  And it has its own sets of issues with multiple
> txns opening a DB, nested txns, etc.
> Then there's another approach: "Transactions are isolated from each
> other, provided you use LMDB correctly.  That's all.  Sorry about
> the DBI inconveniences."

If I got it right it is all about protecting against invalid DBIs
which can happen for example if someone closes database "A" which was
assigned DBI "5" and later the database "B" is opened and the free
slot 5 is then assigned to "B". So we could be misled using DBI "5"
having in mind database "A".

So why don't we extend the MDB_DBI by adding the sequence in the lower
or higher bits. When addressing the slot we simply strip the sequence
bits and verify the slot's sequence against the sequence in the
MDB_DBI to ensure the DBI has not changed. So picking up a DPI (as
suggested in the patch) can be verified.
The restriction with mdb_dbi_open() will stay. But the following would work:

    MDB_dbi dbi;

    mdb_txn_begin(env, NULL, MDB_RDONLY, &txn2);
    mdb_txn_begin(env, NULL, MDB_RDONLY, &txn1);

    mdb_dbi_open(txn1, "db1", 0, &dbi);

    // dbi becomes now valid and is propagated to the env
    // ==> and therefore is also valid for any other transaction!

    // use dbi opened in txn1 (which is committed)
    mdb_something(txn2, dbi, ...);



So the patch could look something like that.

static int mdb_update_db_info(MDB_txn *txn, MDB_dbi dbi) {
    unsigned int dbi_index = dbi & MDB_DBI_INDXE_MASK;
    unsigned int dbi_seq = dbi & MDB_DBI_SEQ_MASK << 16;

    if (txn->mt_env->me_dbiseqs[dbi_index] != dbi_seq) {
        return 0;

    /* propagate newly opened db from env to current txn. Mark them as unused */
    if (txn->mt_numdbs < txn->mt_env->me_numdbs) {
        memset(txn->mt_dbflags + txn->mt_numdbs, DB_UNUSED,
txn->mt_env->me_numdbs - txn->mt_numdbs);
    txn->mt_numdbs = txn->mt_env->me_numdbs;

    return dbi_index < txn->mt_numdbs;