Re: LMDB random writes really slow for large data
Chuntao HONG wrote:
I am testing LMDB performance with the benchmark given in
http://www.lmdb.tech/bench/ondisk/. And I noticed that LMDB random writes are
really slow when the data goes beyond memory.
I am using a machine with 4GB DRAM with Intel PCIE SSD. The key size is 10
bytes and value size is 1KB. The benchmark code is given in
http://www.lmdb.tech/bench/ondisk/, and the command line I used is
"./db_bench_mdb --benchmarks=fillrandbatch --threads=1 --stats_interval=1024
--num=10000000 --value_size=1000 --use_existing_db=0 ".
For the first 1GB of data written, the average write rate is 140MB/s. The rate
then drops significantly, averaging 40MB/s over the first 2GB. By the end of
the test, in which 10M values are written, the average rate is just 3MB/s, and
the instantaneous rate is 1MB/s. I know LMDB is not optimized for writes, but I
didn't expect it to be this slow, given that I have a really high-end Intel SSD.
Any flash SSD will get bogged down by a continuous write workload, since it
must do wear-leveling and compaction in the background, and "the background" is
getting too busy.
I also notice that the way LMDB accesses the SSD is really strange. At the
beginning of the test, it writes to the SSD at around 400MB/s but performs no
reads, which is expected. But as more and more data is written, LMDB starts to
read from the SSD. As time goes on, the read throughput rises while the write
throughput drops significantly. At the end of the test, LMDB is constantly
reading at around 190MB/s, while occasionally issuing 100MB bursts of writes at
10-20 second intervals.
1. Is it normal for LMDB to have such low write throughput (1MB/s at the end
of the test) for data stored on SSD?
2. Why is LMDB reading more data than it is writing (about 20MB read per 1MB
written) at the end of the test?
To my understanding, although we have more data than the DRAM can hold, the
branch nodes of the B-tree should still be in DRAM. So for every write, the
only pages we need to fetch from the SSD are the leaf nodes. And when we write
a leaf node, we might also need to write its parents. So there should be more
writes than reads. But it turns out LMDB is reading much more than it is
writing. I think that might be the reason why it is so slow at the end, but I
really cannot understand why.
Rerun the benchmark with --readahead=0. The kernel does 16-page readahead by
default, and on a random-access workload, 15 of those 16 pages are wasted
effort. They also cause useful pages to be evicted from RAM. This is where the
majority of the excess reads come from.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/