
Re: LMDB and multiple processes



Thanks again, Howard, for the help early on.  I'm into the development
of my real application now, and I'm able to make much better use of
the docs.

For my application, I have built a ~650 MB database with nine
sub-databases.  I'll have a writer application to do updates, which
should be pretty easy.  There are a couple of things I want to make
sure I'm doing correctly for the reader:

My application is a web service written in Go, using Go's net/http
package, which creates a new "goroutine" for each incoming request.
Goroutines run concurrently, but the Go scheduler multiplexes them onto
OS threads: many goroutines may share one thread, and a goroutine may
move from one thread to another.  So, I will be using the MDB_NOTLS
flag when opening the environment.  Then, from what I can gather, it
seems I will need to allocate a pool of read-only transactions if I
want to avoid allocating a new transaction for each HTTP request (is
that right?).
Something like the following:


/* Tune this empirically so the pool never runs out in practice */
N_READERS = 512

// Open each named sub-DB once; the DBI handles outlive this txn once it commits
txn = env.BeginTxn(nil, MDB_RDONLY)  // mdb_txn_begin: parent=nil, flags=MDB_RDONLY
for each dbname in dbnames {
    dbis[dbname] = txn.DBIOpen(dbname, 0)  // mdb_dbi_open: name=dbname, flags=0
}
txn.Commit()

for i = 0; i < N_READERS; i++ {
    txn = env.BeginTxn(nil, MDB_RDONLY)
    txnPool.Add(txn)
}


Then, for each HTTP request, I would pull a txn out of the pool, use
it (for multiple sequential queries for a given HTTP request), reset
it, renew it, and put it back in the pool.
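
A minimal Go sketch of such a pool, assuming a binding that exposes
mdb_txn_renew and mdb_txn_reset as Renew and Reset methods.  ReadTxn,
TxnPool, and fakeTxn are hypothetical names for illustration, not
gomdb's actual API.  The sketch renews a handle when a request checks
it out and resets it when the request hands it back, rather than
resetting and renewing back to back; that way idle handles don't hold
an old snapshot open, which would keep the writer from reusing freed
pages.  One sizing caveat: a reset handle still keeps its reader slot,
and LMDB's reader table defaults to 126 slots, so a pool of 512 would
need mdb_env_set_maxreaders called before the environment is opened.

```go
package main

import (
	"errors"
	"fmt"
)

// ReadTxn is a hypothetical stand-in for a binding's read-only txn handle.
// Reset and Renew are assumed to wrap mdb_txn_reset and mdb_txn_renew.
type ReadTxn interface {
	Reset()
	Renew() error
}

// TxnPool hands out pre-allocated read transactions. The buffered channel
// is a goroutine-safe free list: with MDB_NOTLS any goroutine may use a
// handle, as long as only one goroutine uses it at a time.
type TxnPool struct {
	free chan ReadTxn
}

// NewTxnPool expects handles that were begun and then reset, so that
// idle handles do not pin an old snapshot.
func NewTxnPool(txns []ReadTxn) *TxnPool {
	p := &TxnPool{free: make(chan ReadTxn, len(txns))}
	for _, t := range txns {
		p.free <- t
	}
	return p
}

// Get blocks until a handle is free, then renews it so the request sees
// the latest committed data.
func (p *TxnPool) Get() (ReadTxn, error) {
	t := <-p.free
	if err := t.Renew(); err != nil {
		// Assumption: a handle whose Renew failed can be retried later,
		// so this sketch returns it to the pool instead of leaking it.
		p.free <- t
		return nil, err
	}
	return t, nil
}

// Put resets the handle, releasing its snapshot, and returns it to the pool.
func (p *TxnPool) Put(t ReadTxn) {
	t.Reset()
	p.free <- t
}

// fakeTxn is a test double that only tracks whether it is active.
type fakeTxn struct{ active bool }

func (f *fakeTxn) Reset() { f.active = false }

func (f *fakeTxn) Renew() error {
	if f.active {
		return errors.New("renew called on an active txn")
	}
	f.active = true
	return nil
}

func main() {
	pool := NewTxnPool([]ReadTxn{&fakeTxn{}, &fakeTxn{}})

	t, err := pool.Get() // per HTTP request: check a handle out...
	if err != nil {
		panic(err)
	}
	// ...run the request's sequential queries against t here...
	fmt.Println("active after Get:", t.(*fakeTxn).active)
	pool.Put(t) // ...and return it once the response is written
	fmt.Println("active after Put:", t.(*fakeTxn).active)
}
```

The buffered channel also provides back-pressure: if every handle is
checked out, Get simply blocks until a request returns one.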

I've got a proof of concept working with the above strategy, but does
this all sound sane?

        Thanks,
        Brian

P.S. Sorry about the previous non-plaintext e-mails sent to the list.
Somehow my e-mail client reverted to silly mode.

On Wed, Jun 4, 2014 at 2:43 PM, Brian G. Merrell <bgmerrell@gmail.com> wrote:
> On Wed, Jun 4, 2014 at 1:04 PM, Howard Chu <hyc@symas.com> wrote:
>>
>> Brian G. Merrell wrote:
>>>
>>> On Wed, Jun 4, 2014 at 10:22 AM, Howard Chu <hyc@symas.com> wrote:
>>>
>>>     Brian G. Merrell wrote:
>>>
>>>         Hi all,
>>>
>>>         First, I'm having trouble finding resources to answer a question like this
>>>         myself, so please forgive me if I've missed something.
>>>
>>>
>>>     http://symas.com/mdb/doc/
>>>
>>>
>>> Thanks.  I did see and skim the API portion of the docs before asking, but I
>>> was just having trouble knowing how the pieces fit together to solve a problem.
>>
>>
>> Skimming isn't going to cut it.
>
> Fair enough, I probably gave up prematurely.  Blame my inferior
> intellect, but with no prior context on LMDB, I was having trouble
> getting a holistic view of LMDB from the docs.  The information
> you've shared, though, has made the docs much more approachable.  For
> whatever it's worth, I plan to write something up with my findings
> that will hopefully help someone.
>
>>
>>
>>>     Your reader process should be using read transactions.
>>
>>
>>> OK, I interpret this as meaning that I need to pass the MDB_RDONLY flag to
>>> mdb_txn_begin.  Is that correct?
>>
>>
>> Yes.
>>
>>
>>>     In the actual LMDB API read transactions can be reused by their creating
>>>     thread, so they are zero-cost after the first time. I don't know if any of
>>>     the other language wrappers leverage this fact.
>>
>>
>>> This helps a lot.  I will look into whether gomdb does.
>>>
>>>
>>>     Opening a DBI only needs to be done once per process. Opening per
>>>     transaction would be stupid, like reopening a file handle on every request.
>>>
>>>
>>> I suspected so.  The fact that mdb_dbi_open takes a transaction confused
>>> me a bit, because I thought I would need to pass in the new transaction
>>> every time I got a transaction from mdb_txn_begin.
>>
>>
>> mdb_dbi_open takes a txn because it needs one if you're creating a DB for the first time. I.e., it must write metadata for the DB into the environment, and all writes to MDB must be inside a txn. But once that txn is committed, the DBI itself lives on until mdb_dbi_close. This is all already explained in the doc for mdb_dbi_open; if you hadn't skimmed you would have seen it already.
>>
>> Most of this is only a concern when you're using named subDBs. The default unnamed DB always exists, so its DBI is always valid anyway.
>
> I will probably use named subDBs for my real application (instead of 9
> separate databases like I do in LevelDB), so thanks for sharing.
>
>>
>>
>>> I've refactored the reader to look like this:
>>>
>>>
>>> env = NewEnv()
>>> env.Open("/tmp/foo", 0, 0664)
>>> txn = env.BeginTxn(nil, mdb.RDONLY) // parent txn is the nil arg
>>> dbi = txn.DBIOpen(nil, 0)
>>> txn.Abort()
>>
>>
>> You want mdb_txn_reset() here, not abort. Abort frees/destroys the txn handle so it cannot be reused.
>>
>>
>>> while {
>>>       txn = env.BeginTxn(nil, mdb.RDONLY) // parent txn is the nil arg
>>
>> and here you want mdb_txn_renew(), to reuse the txn handle instead of creating a new one.
>
> Ahah!  Thank you.  I had tried this before, but because I had used the
> txn.Abort() above, things did not go well.  Now my benchmark times are
> back to what I would expect.  I.e., they are comparable to the
> performance I was seeing when I had all transaction code outside of
> the loop (but wasn't seeing the data being updated after running my
> writer process).
>
>>
>>
>>>       for i = 0; i < n_entries; i++ {
>>>           key = sprintf("Key-%d", i)
>>>           val = txn.Get(dbi, key)
>>>           printf("%s: %s\n", key, val)
>>>       }
>>>       txn.Commit()
>>
>> and you want mdb_txn_reset() here too, not commit. Commit also frees/destroys the txn handle.
>>
>>>       sleep(5)
>>> }
>>
>>
>> You can abort or commit the txn during your process teardown phase to dispose of it.
>>
>>
>>> env.DBIClose(dbi)
>>>
>>>
>>> Now, I guess the big question is whether BeginTxn inside the loop is zero-cost.
>>>
>>> Thanks for the tips so far Howard; it has been very helpful.
>>
>>
>> --
>>   -- Howard Chu
>>   CTO, Symas Corp.           http://www.symas.com
>>   Director, Highland Sun     http://highlandsun.com/hyc/
>>   Chief Architect, OpenLDAP  http://www.openldap.org/project/