
Re: Issues arising from creating powerdns backend based on LMDB

Mark Zealey wrote:
On 22/08/13 23:37, Howard Chu wrote:

1) Can you update the documentation to explain what happens when I call
mdb_cursor_del()? I am assuming it advances the cursor to the next
record (this seems to be the behaviour). However there is some sort of
bug with this assumption. Basically I have a loop which jumps
(MDB_SET_RANGE) to a key and then wants to do a delete until key is like
something else. So I do while(..) { mdb_cursor_del(),
mdb_cursor_get(..., MDB_GET_CURRENT)}. This works fine mostly, but
roughly 1% of the time I get EINVAL returned when I try to
MDB_GET_CURRENT after a delete. This always seems to happen on the same
records - not sure about the memory structure but could it be something
to do with hitting a page boundary somehow invalidating the cursor?

That's exactly what it does, yes.

Any idea about the EINVAL issue?

Yes, as I said already, it does exactly what you said. When you've deleted the last item on the page the cursor no longer points at a valid node, so GET_CURRENT returns EINVAL.
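
The page-boundary case can be illustrated with a toy model. This is only a sketch under stated assumptions, not LMDB's actual code: ToyPage, ToyCursor, toy_cursor_del and toy_get_current are hypothetical names, and a "page" here is just a small array of integer keys. Deleting shifts the later nodes down, so the cursor index lands on the successor, unless the deleted node was the last one on the page, in which case the index points one past the end and the lookup fails.

```c
#include <errno.h>
#include <string.h>

#define PAGE_CAP 4  /* toy capacity; real pages hold many more nodes */

/* Hypothetical stand-ins for a leaf page and a cursor positioned on it. */
typedef struct { int keys[PAGE_CAP]; int nkeys; } ToyPage;
typedef struct { ToyPage *page; int idx; } ToyCursor;

/* Delete the node under the cursor. Later nodes shift down one slot, so
 * idx now refers to the successor, or to one past the end of the page if
 * the deleted node was the last one. */
static int toy_cursor_del(ToyCursor *c) {
    ToyPage *p = c->page;
    if (c->idx >= p->nkeys)
        return EINVAL;
    memmove(&p->keys[c->idx], &p->keys[c->idx + 1],
            (size_t)(p->nkeys - c->idx - 1) * sizeof(int));
    p->nkeys--;
    return 0;
}

/* Return the key under the cursor, or EINVAL if idx is past the last node. */
static int toy_get_current(const ToyCursor *c, int *key) {
    if (c->idx >= c->page->nkeys)
        return EINVAL;
    *key = c->page->keys[c->idx];
    return 0;
}
```

With a page holding 10, 20, 30 and the cursor on 20, the first delete leaves the cursor on 30 and toy_get_current succeeds; deleting 30 as well leaves the cursor past the end and toy_get_current returns EINVAL. One workaround that does not depend on where the cursor lands (a suggestion, not something stated in this thread) is to re-position with MDB_SET_RANGE on the starting key before each delete instead of relying on MDB_GET_CURRENT.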

None of the memory behavior you just described makes any sense to me.
LMDB uses a shared memory map, exclusively. All of the memory growth
you see in the process should be shared memory. If it's anywhere else
then I'm pretty sure you have a memory leak. With all the valgrind
sessions we've run I'm also pretty sure that *we* don't have a memory
leak.

As for the random I/O, it also seems a bit suspect. Are you doing a
commit on every key, or batching multiple keys per commit?

I'm not doing *any* commits, just one big txn for all the data...

The C program below works fine up until i=4m (i.e. 500 MB of resident
memory shown in top), then slows down massively: shared memory (again, as
seen in top) increases, it waits about 20-30 seconds, and then the disks
get hammered writing 10 MB/sec (200txns) when they are capable of
100-200 MB/sec streaming writes... Does it do the same for you?

#include <stdio.h>
#include <stdlib.h>
#include <lmdb.h>

int main(int argc, char *argv[]) {
    int i = 0, j = 0, rc;
    MDB_env *env; MDB_dbi dbi; MDB_val key, data; MDB_txn *txn;
    char buf[40];  /* size assumed; the original declaration is cut off after "char" */
    int count = 100000000;

    rc = mdb_env_create(&env);
    rc = mdb_env_set_mapsize(env, (size_t)1024*1024*1024*10);
    rc = mdb_env_open(env, "./testdb", 0, 0664);
    rc = mdb_txn_begin(env, NULL, 0, &txn);
    rc = mdb_open(txn, NULL, 0, &dbi);

    /* Insert every record inside the single write transaction. */
    for (i = 0; i < count; i++) {
        /* First field is a long, so it needs %ld rather than %d. */
        snprintf(buf, sizeof(buf), "blah foo %9ld%9d%9d",
                 (long)(random() * (float)count / RAND_MAX) - i, i, i);
        if (i % 100000 == 0)
            printf("%s\n", buf);
        key.mv_size = sizeof(buf); key.mv_data = &buf;
        data.mv_size = sizeof(buf); data.mv_data = &buf;
        rc = mdb_put(txn, dbi, &key, &data, 0);
    }
    rc = mdb_txn_commit(txn);
    mdb_close(env, dbi);

    return 0;
}

By the way, I've just generated our biggest database (~4.5 GB) from
scratch using our standard Perl script. Using Kyoto (treedb) with
various tunings it took 18 minutes of real time, versus 50 minutes for
LMDB (both SSD-backed in a box with 24 GB of free memory).

Kyoto writes async by default. You should do the same here, use MDB_NOSYNC on the env_open.
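
To make the sync trade-off concrete, here is a minimal sketch in plain POSIX C rather than LMDB code. append_record is a hypothetical helper, not an LMDB function; its do_sync switch stands in for the difference between a default commit, which flushes to disk, and a commit under MDB_NOSYNC, which leaves flushing to the OS. In the test program above, the corresponding change would be passing MDB_NOSYNC instead of 0 as the flags argument to mdb_env_open.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append one record to fd. With do_sync set, force the
 * data to stable storage before returning (the synchronous behaviour);
 * without it, the OS writes the data back whenever it chooses. */
static int append_record(int fd, const void *buf, size_t len, int do_sync) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            return -1;      /* write error; no retry in this sketch */
        p += n;
        len -= (size_t)n;
    }
    if (do_sync && fsync(fd) != 0)
        return -1;
    return 0;
}
```

Skipping the flush is usually much faster for a bulk load, at the cost that data written since the last flush may be lost if the system crashes.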

  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/