
Issues arising from creating powerdns backend based on LMDB




Hi Howard, I've now got LMDB working with PowerDNS in place of Kyoto
Cabinet - nice and easy to do, thanks! Maximum DNS query load is a
little better - about 10-30% depending on use case - but for us the main
gain is that you can have a writer running at the same time. I was
struggling a bit with how to push updates from a different process using
Kyoto. There are a few issues and things I'd like to comment on though:

1) Can you update the documentation to explain what happens when I call
mdb_cursor_del()? I am assuming it advances the cursor to the next
record (this seems to be the behaviour). However, there is some sort of
bug with this assumption. Basically I have a loop which jumps
(MDB_SET_RANGE) to a key and then deletes records until the key stops
matching. So I do while(..) { mdb_cursor_del();
mdb_cursor_get(..., MDB_GET_CURRENT); }. This mostly works fine, but
roughly 1% of the time I get EINVAL returned when I try
MDB_GET_CURRENT after a delete. This always seems to happen on the same
records. I'm not sure about the memory structure, but could it be
something to do with hitting a page boundary and somehow invalidating
the cursor? At the moment I just catch that and then do an MDB_NEXT to
skip over them, but this will be an issue for us on live. This is from
Perl, so it /may/ be that, or the version of LMDB that is shipped with
it. However, the Perl layer is a very thin wrapper, and looking at the
code I can only think it comes from LMDB.
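
To make the loop concrete, here is a rough sketch in C of the pattern I
mean. It is a toy model over an in-memory sorted array, not LMDB itself:
toy_db, toy_set_range(), toy_del() and toy_delete_prefix() are names I
made up for illustration, and "a delete leaves the cursor on the next
record" is the behaviour I am assuming, not something the LMDB docs
state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a database of sorted keys. NOT the LMDB API. */
#define TOY_MAX 16

struct toy_db {
    const char *keys[TOY_MAX];  /* kept in ascending strcmp() order */
    size_t count;               /* number of keys in use */
};

/* Find the first key >= target (the role MDB_SET_RANGE plays for me).
 * Returns its index, or db->count if there is no such key. */
static size_t toy_set_range(const struct toy_db *db, const char *target)
{
    size_t i = 0;
    while (i < db->count && strcmp(db->keys[i], target) < 0)
        i++;
    return i;
}

/* Delete the key at pos. Later keys shift down by one, so pos then
 * names the following key (the assumed "advance on delete"). */
static void toy_del(struct toy_db *db, size_t pos)
{
    assert(pos < db->count);
    memmove(&db->keys[pos], &db->keys[pos + 1],
            (db->count - pos - 1) * sizeof db->keys[0]);
    db->count--;
}

/* Delete every key that starts with prefix; returns how many went. */
static size_t toy_delete_prefix(struct toy_db *db, const char *prefix)
{
    size_t plen = strlen(prefix);
    size_t pos = toy_set_range(db, prefix);
    size_t deleted = 0;

    /* pos < count is the toy equivalent of "get current succeeded". */
    while (pos < db->count &&
           strncmp(db->keys[pos], prefix, plen) == 0) {
        toy_del(db, pos);
        deleted++;
    }
    return deleted;
}
```

In the toy version the "current record" check can only fail when the
delete removed the last key, which is the sort of edge I suspect I am
hitting at a page boundary, though that is a guess.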

2) Currently, because Kyoto Cabinet didn't support multiple identical
keys, we don't use the DUP options. This leads to quite long keys
(1200-1300 bytes in some cases). In the future, it would be nice to have
a run-time key length specifier or something along those lines.

3) Perhaps add an mdb_cursor_get_key() function (like Kyoto's) which
returns just the key and not the data. As in (2), we store all the data
in the key. I'm not sure how much of a performance difference this would
make, though.

4) Creating a database with non-sequential keys is very slow (on 4 GB
databases, 2x slower than Kyoto, about 1h30, and uses more memory). I
spent quite a bit of time looking at this in Perl and then C. Basically
I create a database, open one txn and then insert a bunch of unordered
keys. Up to about 500 MB it's fine and nice and quick - from Perl about
75k inserts/sec (slow mostly because it's reading from MySQL). However,
after that first 500 MB it starts flushing to disk. In the sequential
insert case the flush is very quick - 100-200 MB/sec or so. However, on
non-sequential inserts I've seen it drop to 4 or 5 MB/sec, as it's
writing data all over the disk rather than doing big sequential writes.
iostat shows the same ~200 tps of writes and 100% utilisation, but only
4-10 MB/sec of bytes being written.
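
As a back-of-envelope check on those iostat numbers, dividing throughput
by write operations per second gives the average size of each write. A
tiny C helper for that (avg_write_kb is just a name I made up, and I am
taking 1 MB = 1000 KB):

```c
#include <assert.h>

/* Average size of one write in KB, given throughput in MB/s and
 * write operations per second as reported by iostat.
 * Assumes 1 MB = 1000 KB; avg_write_kb is an illustrative name. */
static double avg_write_kb(double mb_per_sec, double writes_per_sec)
{
    assert(writes_per_sec > 0.0);
    return mb_per_sec * 1000.0 / writes_per_sec;
}
```

At ~200 tps, 4-10 MB/sec works out to about 20-50 KB per write. If the
sequential flush also runs at around 200 tps (which is how I read the
iostat output), 100-200 MB/sec would be about 500-1000 KB per write.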

However, even when it's not flushing (or when storing data on SSD or
memdisk), after the first 500 MB performance drops off massively, to
perhaps 10-15k inserts/sec. At the same time, looking at `top`, once the
resident memory hits about 500 MB, the 'shared memory' starts being
used and resident just keeps on increasing. I'm not sure if this is
some kind of kernel accounting thing to do with mmap usage, but it
doesn't happen for sequential key inserts (for those, shared mem stays
around 0 and resident stays at 500 MB). I'm using CentOS 6 with various
kernels from the default up to 3.7.5 and the behaviour is the same. I
don't really know how to go about looking for the root cause of this,
but I'm pretty sure that whilst the IO is crippling it in places, there
is something else going on, of which the shared memory increase is a
sign. I've also tried the MDB_WRITEMAP option, which doesn't seem to
affect anything significantly in terms of performance or memory usage.

5) pkg-config files and RPMs would be really nice to have. Or do you
expect it to just be bundled with a project, as e.g. the Perl module
does?

Thanks,

Mark