Re: Fwd: multiple sequential lmdb readers + spinning media = slow / thrashes?
- To: Matthew Moskewicz <firstname.lastname@example.org>, email@example.com
- Subject: Re: Fwd: multiple sequential lmdb readers + spinning media = slow / thrashes?
- From: Howard Chu <firstname.lastname@example.org>
- Date: Thu, 26 Feb 2015 23:46:43 +0000
- In-reply-to: <CAP_1Qvn+E7n0wqvFdnx6zvBXQz8pUg7SksCQ=3GJcOrYOagjiA@mail.gmail.com>
- References: <CAP_1QvmL1G8P8mpgEzETjvfub_3exodwr99Vn89ajPMgcMOFPA@mail.gmail.com> <CAP_1Qvn+E7n0wqvFdnx6zvBXQz8pUg7SksCQ=3GJcOrYOagjiA@mail.gmail.com>
- User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0 SeaMonkey/2.35a1
Matthew Moskewicz wrote:
> warnings: new to list, first post, lmdb noob.
>
> i'm a caffe user:
>
> in one use case, caffe sequentially streams through >100GB lmdbs at a
> rate of ~30MB/s in blocks of about 40MB. however, if multiple caffe
> processes are reading the same lmdb (opened with MDB_RDONLY), read
> performance becomes limiting (i.e. the processes become IO bound), even
> though the disk has sufficient read bandwidth (say ~180MB/s). some of
> the relevant caffe lmdb code is here:
>
> however, if i *both*
> 1) run blockdev --setra 65536 --setfra 65536 /dev/sdwhatever
> 2) modify lmdb to call posix_madvise(env->me_map, env->me_mapsize,
> POSIX_MADV_SEQUENTIAL)
> then i can get >1 reader to run without being IO limited.
This is quite timing-dependent - if you start your multiple readers at exactly the same time and they run at exactly the same speed, then they will all be using the same cached pages and all of the readers can run at the full bandwidth of the disk. If they're staggered or not running in lockstep, then you'll only get partial performance.
> for (2), see https://github.com/moskewcz/scratch/tree/lmdb_seq_read_opt
>
> similarly, using a sequential read microbenchmark designed to model the
> caffe reads from here:
>
> if i run one reader, i get 180MB/s bandwidth.
> with two readers, but neither (1) nor (2) above, each gets ~30MB/s.
> with (1) and (2) enabled, and two readers, each gets ~90MB/s bandwidth.
The other point to note is that sequential reads in LMDB won't remain truly sequential (as seen by the storage device) after a few rounds of inserts/deletes/updates. Once you get any element of seek/random I/O in here your madvise will be useless.
> PS: backstory (skippable):
>
> caffe originally used LevelDB to get better read performance for
> sequentially loading sets of ~1M 227x227x3 raw images (~200GB data).
> typical processing time is ~2 hours for this data set size, yielding a
> read BW need of 30MB/s or so. it's not really clear if/why LevelDB was
> used aside from the fact that the caffe author was a google intern at
> the time he wrote it, but anecdotally i think the claim is that reading
> the raw .jpgs had perf. issues, although it's unclear exactly what or
> why. i guess it was the usual story about not getting sequential reads
> without using LevelDB. they switched to lmdb a while back.
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/