
Re: Issues arising from creating powerdns backend based on LMDB



On 23/08/13 04:55, Howard Chu wrote:
> Howard Chu wrote:
>> Yes, I see it here, and I see the problem. LMDB was not originally
>> designed to handle transactions of unlimited size. It originally had a
>> txn size limit of about 512MB. In 0.9.7 we added some code to raise
>> this limit, and it's performing quite poorly here. I've tweaked my copy
>> of the code to alleviate that problem, but your test program still
>> fails here because the volume of data being written also exceeds the
>> map size. You were able to run this to completion?
>
> Two things... I've committed a patch to mdb.master to help this case
> out. It sped up my run of your program, using only 10M records, from
> 19min to 7min.
>
> Additionally, if you change your test program to commit every 2M
> records, and avoid running into the large txn situation, then the 10M
> records are stored in only 1m51s.
>
> Running it now with the original 100M count. Will see how it goes.
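To make sure I've understood the commit-every-2M suggestion, it amounts to roughly this (a minimal sketch, not my actual test program: the path, map size, and key/value construction are made up, and error handling is mostly omitted):

    #include <stdio.h>
    #include "lmdb.h"

    #define COMMIT_EVERY  2000000L     /* commit every 2M records */
    #define TOTAL_RECORDS 10000000L    /* 10M records */

    int main(void)
    {
        MDB_env *env;
        MDB_dbi dbi;
        MDB_txn *txn;
        MDB_val key, val;
        char kbuf[32], vbuf[64];
        long i;
        int rc;

        mdb_env_create(&env);
        /* map size must cover the whole dataset; 10GB is a guess here
           (assumes a 64-bit build) */
        mdb_env_set_mapsize(env, 10UL * 1024 * 1024 * 1024);
        /* ./testdb must already exist as a directory */
        rc = mdb_env_open(env, "./testdb", MDB_NOSYNC, 0664);
        if (rc) { fprintf(stderr, "mdb_env_open: %s\n", mdb_strerror(rc)); return 1; }

        mdb_txn_begin(env, NULL, 0, &txn);
        mdb_dbi_open(txn, NULL, 0, &dbi);

        for (i = 0; i < TOTAL_RECORDS; i++) {
            key.mv_size = snprintf(kbuf, sizeof kbuf, "key%ld", i);
            key.mv_data = kbuf;
            val.mv_size = snprintf(vbuf, sizeof vbuf, "value%ld", i);
            val.mv_data = vbuf;
            rc = mdb_put(txn, dbi, &key, &val, 0);
            if (rc) { fprintf(stderr, "mdb_put: %s\n", mdb_strerror(rc)); return 1; }

            /* commit and reopen periodically so no single txn grows huge */
            if ((i + 1) % COMMIT_EVERY == 0) {
                mdb_txn_commit(txn);
                mdb_txn_begin(env, NULL, 0, &txn);
            }
        }
        mdb_txn_commit(txn);
        mdb_env_close(env);
        return 0;
    }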

I never actually ran it to completion (hence the map size issue); the 100M count was just an effectively unlimited number to investigate the slowdown, and 10M seems fine. I just pulled from git (assuming that is more current than the patch you sent) and rebuilt. It certainly seems a bit better now, although at around 6M records (ext4) the IO becomes awful: throughput drops to 1MB/s in places on our normal disk (the first few writes run at 100MB/s, then it starts writing all over the place). I've tried both ext4 and xfs with no special tuning and see much the same behaviour, although the dropoff happens closer to 7M records on xfs. This is with the MDB_NOSYNC option, too.

If I set the commit interval to 1M records, performance is fine up to around 8.4M records on ext4 and then the program just stops for a minute or two doing small writes; the same thing happens at about 9.4M. So it seems the patch has pushed the performance dropoff back a bit and perhaps softened it, but as far as I can see there is still an issue there.

The test program with 10M records, committing every 1M, completes in 1m10s user time but 5m30s real time because of all the pausing for disk writes (this was ext4, but as above the filesystem doesn't seem to make much difference compared to xfs). The same program and latest git on an SSD-backed system (i.e. where a massive number of small write transactions doesn't cause any issues) with a slightly faster CPU: 47s user time, 1min real time. On the SSD-backed box without any intermediate commits: 5m30s user time, 6min real time.

So committing every 1-2M records is much better. I don't mind using short transactions (in fact the program doesn't actually need transactions at all). Perhaps it would be good to have an "allow LMDB to automatically commit and reopen this transaction for optimal performance" flag, or some way of easily knowing when the txn should be committed and reopened, rather than trying to guess roughly how many bytes I've written since the last commit and committing once that exceeds some magic number like 400MB?
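For now, the workaround I have in mind is just a byte-counting heuristic along these lines (the helper and the 400MB threshold are mine, not anything LMDB exposes):

    #include <stddef.h>
    #include "lmdb.h"

    /* Guessed threshold: commit once roughly this many bytes have been
       written since the last commit. A magic number on my side; LMDB
       doesn't expose its internal txn size limit. */
    #define TXN_BYTE_LIMIT (400UL * 1024 * 1024)

    /* Call after each mdb_put(): commits and reopens *txn once the byte
       estimate passes the threshold. Ignores page overhead; error
       handling omitted. */
    static void maybe_commit(MDB_env *env, MDB_txn **txn, size_t *bytes,
                             const MDB_val *key, const MDB_val *val)
    {
        *bytes += key->mv_size + val->mv_size;
        if (*bytes > TXN_BYTE_LIMIT) {
            mdb_txn_commit(*txn);
            mdb_txn_begin(env, NULL, 0, txn);
            *bytes = 0;
        }
    }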

Also, I don't know how intentional the 512MB limit you mention is, but perhaps it could be made settable at runtime. That way I could just set it to half the box's memory and ensure nothing needs to be written out until the whole dataset has been generated.
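Something along these lines, say (entirely hypothetical, no such call exists in LMDB today):

    /* hypothetical API: raise the internal txn size limit at runtime */
    mdb_env_set_txnsize(env, total_ram_bytes / 2);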

By the way, looking at `free` output seems to imply that `top` is lying about how much memory the program is using: resident looks like it is capped at 500MB, but it keeps rising along with shared, which is presumably the pages of the mmap that are currently in memory.

Regarding the SSD vs HDD performance differences: I did see similar disk write stalls with Kyoto Cabinet, and for that we generate onto a RAM disk. It seems a bit strange to have to do the same with LMDB, though, given it's advertised as a memory-mapped database.

Mark