[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: Fwd: multiple sequential lmdb readers + spinning media = slow / thrashes?



Matthew Moskewicz wrote:

warnings: new to list, first post, lmdb noob.

i'm a caffe user:
https://github.com/BVLC/caffe

in one use case, caffe sequentially streams though >100GB lmdbs at a
rate of ~30MB/s in blocks of about 40MB. however, if multiple caffe
processes are reading the same lmdb (opened with MDB_RDONLY), read
performance becomes limiting (i.e. the processes become IO bound), even
though the disk has sufficient read bandwidth (say ~180MB/s). some of
the relevant caffe lmdb code is here:

https://github.com/BVLC/caffe/blob/master/src/caffe/util/db.cpp

however, if i *both*
1) run  blockdev --setra 65536 --setfra 65536 /dev/sdwhatever
2) modify lmdb to call posix_madvise(env->me_map, env->me_mapsize,
POSIX_MADV_SEQUENTIAL);

then i can get >1 reader to run without being IO limited.

This is quite timing-dependent - if you start your multiple readers at exactly the same time and they run at exactly the same speed, then they will all be using the same cached pages and all of the readers can run at the full bandwidth of the disk. If they're staggered or not running in lockstep, then you'll only get partial performance.

for (2), see https://github.com/moskewcz/scratch/tree/lmdb_seq_read_opt

similarly, using a sequential read microbenchmark designed to model the
caffe reads from here:
https://github.com/moskewcz/boda/blob/master/src/lmdbif.cc

if i run one reader, i get 180MB/s bandwidth.
with two readers, but neither (1) nor (2) above, each gets ~30MB/s
bandwidth.
with (1) and (2) enabled, and two readers, each gets ~90MB/s bandwidth.

The other point to note is that sequential reads in LMDB won't remain truly sequential (as seen by the storage device) after a few rounds of inserts/deletes/updates. Once you get any element of seek/random I/O in here your madvise will be useless.

any advice?

mwm

PS: backstory (skippable):
caffe originally used LevelDB to get better read performance for
sequentially loading sets of ~1M 227x227x3 raw images (~200GB data).
typically processing time is ~2 hours for this data set size, yielding a
read BW need of 30MB/s or so. it's not really clear if/why LevelDB was
uses aside from the fact that the caffe author was a google intern at
the time he wrote it, but anecdotally i think the claim is that reading
the raw .jpgs had perf. issues, although it's unclear exactly what or
why. i guess it was the usual story about not getting sequential reads
without using LevelDB. they switched to lmdb a while back.

<mailto:openldap-devel@openldap.org>



--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/