Re: Issues arising from creating powerdns backend based on LMDB

On 22/08/13 23:37, Howard Chu wrote:

1) Can you update the documentation to explain what happens when I call
mdb_cursor_del()? I am assuming it advances the cursor to the next
record (this seems to be the behaviour). However, there is some sort of
bug with this assumption. Basically, I have a loop which jumps
(MDB_SET_RANGE) to a key and then keeps deleting until the key no longer
matches. So I do while (...) { mdb_cursor_del();
mdb_cursor_get(..., MDB_GET_CURRENT); }. This works fine mostly, but
roughly 1% of the time I get EINVAL back when I call MDB_GET_CURRENT
after a delete. It always seems to happen on the same records. I'm not
sure about the memory structure, but could hitting a page boundary
somehow invalidate the cursor?

That's exactly what it does, yes.

Any idea about the EINVAL issue?

None of the memory behavior you just described makes any sense to me. LMDB uses a shared memory map, exclusively. All of the memory growth you see in the process should be shared memory. If it's anywhere else then I'm pretty sure you have a memory leak. With all the valgrind sessions we've run I'm also pretty sure that *we* don't have a memory leak.

As for the random I/O, it also seems a bit suspect. Are you doing a commit on every key, or batching multiple keys per commit?

I'm not doing *any* commits, just one big txn for all the data...

The C below works fine up until i=4m (i.e. 500 MB of resident memory shown in top), then there is a massive slowdown: shared memory (again, as seen in top) increases, it waits about 20-30 seconds, and then the disks get hammered writing 10 MB/s (200 txns) even though they are capable of 100-200 MB/s streaming writes... Does it do the same for you?

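#include <stdio.h>
#include <stdlib.h>
#include <lmdb.h>

/* Exit with LMDB's error string if a call fails. */
#define CHECK(expr) do { int rc_ = (expr); if (rc_ != MDB_SUCCESS) { \
    fprintf(stderr, "%s: %s\n", #expr, mdb_strerror(rc_)); exit(1); } } while (0)

int main(void) {
    int i, count = 100000000;
    MDB_env *env; MDB_dbi dbi; MDB_val key, data; MDB_txn *txn;
    char buf[40] = {0};  /* zero-filled so bytes after the string are deterministic */

    CHECK(mdb_env_create(&env));
    CHECK(mdb_env_set_mapsize(env, (size_t)1024*1024*1024*10));  /* 10 GiB */
    CHECK(mdb_env_open(env, "./testdb", 0, 0664));  /* ./testdb must already exist */
    CHECK(mdb_txn_begin(env, NULL, 0, &txn));       /* one write txn for everything */
    CHECK(mdb_dbi_open(txn, NULL, 0, &dbi));

    for (i = 0; i < count; i++) {
        /* Randomised first field, so keys arrive in unsorted order. */
        snprintf(buf, sizeof(buf), "blah foo %9ld%9d%9d",
                 (long)(random() * (float)count / RAND_MAX) - i, i, i);
        if (i % 100000 == 0)
            printf("%s\n", buf);
        key.mv_size = sizeof(buf);  key.mv_data = buf;
        data.mv_size = sizeof(buf); data.mv_data = buf;
        CHECK(mdb_put(txn, dbi, &key, &data, 0));
    }
    CHECK(mdb_txn_commit(txn));  /* single commit at the very end */
    mdb_dbi_close(env, dbi);
    mdb_env_close(env);
    return 0;
}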

By the way, I've just generated our biggest database (~4.5 GB) from scratch using our standard Perl script. Using Kyoto Cabinet (TreeDB) with various tunings, the build took 18 minutes of real time versus 50 minutes with LMDB (both SSD-backed, on a box with 24 GB of free memory).