
Re: Issues arising from creating powerdns backend based on LMDB



Mark Zealey wrote:
On 23/08/13 04:55, Howard Chu wrote:
Howard Chu wrote:
Yes, I see it here, and I see the problem. LMDB was not originally
designed to handle transactions of unlimited size. It originally had a
txn size limit of about 512MB. In 0.9.7 we added some code to raise this
limit, and it's performing quite poorly here. I've tweaked my copy of the
code to alleviate that problem, but your test program still fails here
because the volume of data being written also exceeds the map size. Were
you able to run this to completion?

Two things... I've committed a patch to mdb.master to help this case
out. It sped up my run of your program, using only 10M records, from
19min to 7min.

Additionally, if you change your test program to commit every 2M
records, and so avoid running into the large txn situation, the 10M
records are stored in only 1m51s.
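A minimal sketch of the commit-every-N-records approach, assuming a loader that writes records in a simple loop. The store_ops struct and the counting stub are hypothetical stand-ins so the batching logic can be shown and exercised without LMDB; in a real loader, begin, put, commit and rollback would wrap mdb_txn_begin(), mdb_put(), mdb_txn_commit() and mdb_txn_abort().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical write interface. In a real LMDB program these would wrap
 * mdb_txn_begin(), mdb_put(), mdb_txn_commit() and mdb_txn_abort(). */
typedef struct {
    int  (*begin)(void *ctx);
    int  (*put)(void *ctx, size_t record);
    int  (*commit)(void *ctx);
    void (*rollback)(void *ctx);
    void *ctx;
} store_ops;

/* Store records 0..n-1, committing after every batch_size puts so that
 * no single write transaction accumulates an unbounded set of dirty pages.
 * Returns 0 on success or the first nonzero error code. */
int load_in_batches(const store_ops *ops, size_t n, size_t batch_size)
{
    assert(ops != NULL && batch_size > 0);
    for (size_t i = 0; i < n; i += batch_size) {
        size_t end = (n - i < batch_size) ? n : i + batch_size;
        int rc = ops->begin(ops->ctx);
        if (rc != 0)
            return rc;
        for (size_t j = i; j < end; j++) {
            rc = ops->put(ops->ctx, j);
            if (rc != 0) {
                ops->rollback(ops->ctx);   /* discard the partial batch */
                return rc;
            }
        }
        rc = ops->commit(ops->ctx);        /* start fresh for the next batch */
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Counting stub standing in for LMDB, for illustration only. */
typedef struct { size_t begins, puts, commits, rollbacks; } call_counts;

static int  count_begin(void *c)           { ((call_counts *)c)->begins++; return 0; }
static int  count_put(void *c, size_t rec) { (void)rec; ((call_counts *)c)->puts++; return 0; }
static int  count_commit(void *c)          { ((call_counts *)c)->commits++; return 0; }
static void count_rollback(void *c)        { ((call_counts *)c)->rollbacks++; }

store_ops counting_ops(call_counts *c)
{
    store_ops ops = { count_begin, count_put, count_commit, count_rollback, c };
    return ops;
}
```

With the figures above, load_in_batches(&ops, 10000000, 2000000) would run five write transactions of 2M puts each instead of one 10M-put transaction. A failed commit is deliberately not followed by a rollback: per the LMDB documentation, mdb_txn_commit() frees the transaction handle whether or not it succeeds.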

Running it now with the original 100M count. Will see how it goes.

I never actually ran it to completion (hence the map size issue); it was
more of an unlimited number to investigate the slowdown. 10M seems fine.
I just pulled from git (assuming this was better than the patch you
sent) and rebuilt. It certainly seems a bit better now, although at
around 6M records (ext4) it has some awful I/O: it drops to 1MB/s in
places on our normal disk (the first few writes are 100MB/s, then it
starts writing all over the place). I've tried on both ext4 and xfs with
no special tuning, and pretty much the same thing happens, although
closer to 7M records on xfs. This is with the NOSYNC option too. If I set
the commit gap to 1M records, performance is OK up to around 8.4M records
on ext4, and then it just stops for a minute or two doing small writes.
The same thing happens at about 9.4M. It seems that the patch has pushed
the performance drop-off back a bit and perhaps improved on it, but there
is still an issue there as far as I can see.

Agreed, it's still fairly slow. I reran the 100M-record test committing every 100,000 records, and it finished in 18m26s.

The test program with 10M records, committing every 1M, completes in
1m10s user time but 5m30s real time because of all the pausing for disk
writes (this is ext4, but as above it doesn't seem to make much
difference compared to xfs). The same program with the latest git on an
SSD-backed system (i.e. one where a massive number of small writes
doesn't cause any issues) with a slightly faster CPU: user time 47s,
real time 1min. On the SSD-backed box without any intermediate commits:
5m30s user time, 6min real time.

So committing every 1-2M records is much better. I don't mind using
short transactions (in fact the program doesn't actually need any
transactions). Perhaps it would be good to have an "Allow LMDB to
automatically commit+reopen this transaction for optimal performance"
flag, or some easy way of knowing when the txn should be committed and
reopened, rather than trying to guess roughly how many bytes I've
written since the last commit and committing once that passes a magic
number of 400MB?
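The byte-counting workaround described above can be sketched as a small tracker that the load loop consults after each put. The commit_tracker type and its functions are hypothetical, and the per-record overhead is a guess: bytes of user data are only a rough proxy for the pages a transaction dirties, since random key order and page splits can touch far more pages than the raw data size suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: approximates how much the current write
 * transaction has written and says when to commit and reopen it. */
typedef struct {
    size_t limit;     /* commit once this many bytes are pending */
    size_t overhead;  /* assumed per-record bookkeeping cost (a guess) */
    size_t pending;   /* approximate bytes written since the last commit */
} commit_tracker;

void tracker_init(commit_tracker *t, size_t limit, size_t overhead)
{
    assert(t != NULL && limit > 0);
    t->limit = limit;
    t->overhead = overhead;
    t->pending = 0;
}

/* Account for one put. Returns 1 when the caller should commit the
 * current transaction and begin a new one, else 0. */
int tracker_add(commit_tracker *t, size_t key_len, size_t val_len)
{
    t->pending += key_len + val_len + t->overhead;
    if (t->pending >= t->limit) {
        t->pending = 0;   /* the next transaction starts from zero */
        return 1;
    }
    return 0;
}
```

For example, tracker_init(&t, 400UL * 1024 * 1024, 64) would approximate the 400MB rule of thumb (the 64-byte overhead is an assumption), and a return value of 1 from tracker_add() marks the point to commit and begin a new transaction.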

Also, I don't know how intentional the 512MB limit you mention is, but
perhaps it could be set at runtime. That way I could just set it to half
the box's memory size and be sure I don't need to write anything until I
have the whole thing generated?

By the way, the `free` output seems to imply that `top` is lying about
how much memory the program is using: resident looks like it is capped
at 500MB, but it keeps rising along with shared, which is presumably the
pages of the mmap that are in memory at the moment.

Yes, the shared memory is included in the RSS. It's quite deceptive, especially if you have multiple processes using shared memory.
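On Linux, one way to see this split is /proc/self/statm, whose first three fields are total program size, resident set, and resident shared pages (file-backed plus shared memory), all counted in pages. Pages of a mapped database file count as shared, so resident minus shared approximates the process's private anonymous memory. A minimal Linux-only sketch; the type and function names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* First three fields of /proc/self/statm, all counted in pages. */
typedef struct {
    long size;      /* total virtual size (VmSize) */
    long resident;  /* resident set (VmRSS), includes mapped file pages */
    long shared;    /* resident file-backed and shmem pages */
} statm_pages;

/* Linux-only: returns 0 on success, -1 if statm cannot be read. */
int read_statm(statm_pages *m)
{
    assert(m != NULL);
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;
    int n = fscanf(f, "%ld %ld %ld", &m->size, &m->resident, &m->shared);
    fclose(f);
    return n == 3 ? 0 : -1;
}

/* Resident memory not backed by a file or shared mapping, in bytes. */
long private_resident_bytes(const statm_pages *m)
{
    return (m->resident - m->shared) * sysconf(_SC_PAGESIZE);
}
```

Called periodically during a load, private_resident_bytes() would stay roughly flat while resident and shared grow together as more of the mapped file is paged in.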

Regarding the SSD vs. HDD performance differences, I did see similar
disk write issues with Kyoto, so for that we generate onto a memdisk.
However, it seems a bit strange to have to do this with LMDB, given it's
advertised as a memory database.

LMDB is *not* advertised as a "memory database" - it is advertised as a memory-mapped disk database. It is only people who have no clue what they're talking about who refer to it as a "memory database". Memory databases have no persistence and are limited to the size of RAM. LMDB has neither of those traits. Being a disk-based DB means we're affected by issues like disk seek time.
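The distinction can be illustrated with plain POSIX calls: a store into a MAP_SHARED mapping of a file is an ordinary memory write, yet after msync() the bytes are in the file and survive the process. This is a generic sketch of memory-mapped file I/O, not LMDB's own write path (by default LMDB reads through its map but writes with regular file I/O unless MDB_WRITEMAP is set); the function names and the file path are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Write msg into the file at path through a MAP_SHARED mapping, then
 * msync() so the bytes reach the file. Returns 0 on success, -1 on error. */
int write_via_mmap(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);
    size_t len = strlen(msg);
    if (len == 0)
        return -1;                        /* mmap() rejects zero-length maps */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) { /* file must be big enough to map */
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                            /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, msg, len);                  /* an ordinary memory store... */
    int rc = msync(p, len, MS_SYNC);      /* ...flushed to the file on disk */
    munmap(p, len);
    return rc;
}

/* Read the file back with plain read(), bypassing the mapping entirely. */
ssize_t read_back(const char *path, char *buf, size_t cap)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, cap);
    close(fd);
    return n;
}
```

Because the data lives in the file rather than only in RAM, it persists across restarts and can exceed physical memory, which is also why disk seek time shows up in write-heavy workloads.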

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/