[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: write-scaling problems in LMDB



Luke Kenneth Casson Leighton wrote:
http://symas.com/mdb/inmem/scaling.html

i just looked at this and i have a sneaking suspicion that you may be
running into the same problem that i encountered when accidentally
opening 10 LMDBs 20 times by forking 30 processes *after* opening the
10 LMDBs... (and also forgetting to close them in the parent).

what i found was that when i reduced the number of LMDBs to 3 or
below, the loadavg on the multi-core system was absolutely fine [ok,
it was around 3 to 4 but that's ok]

adding one more LMDB (remember that's 31 extra file handles opened to
a shm-mmap'd file) increased the loadavg to 6 or 7.  by the time that
was up to 10 the loadavg had completely unreasonably jumped to over
30.  i could log in over ssh - that was not a problem.  editing a file
was ok (opening it) but trying to create new files resulted in
applications (such as vim) was blocked so badly that i often could not
even press ctrl-z in order to background the task, and had to kill the
entire ssh session.

in each test run the amount of work being done was actually relatively small.

basically i suspect a severe bug in the linux kernel which these
extreme circumstances (32 or 16 processes accessing the same mmap'd
file for example) have never been encountered, so the bug is simply...
unknown.

I've occasionally seen the system lockup as well; I'm pretty sure it's due to dirty page writeback. There are plenty of other references on the web about Linux dealing poorly with heavy writeback I/O.

can i make the suggestion that, whilst i am aware that it is generally
not recommended for production environments to run more processes than
there are cores, you try running 128, 256 and even 512 processes all
hitting that 64-core system, and monitor its I/O usage (iostats) and
loadavg whilst doing so?

Sure, I can conduct that test and collect system stats using atop. Will let ya know. By the way, we're using threads here, not processes. But the overall loading behavior should be the same either way.

the hypothesis to test is that the performance, which should scale
reasonably linearly downwards as a ratio of the number of processes to
the number of cores, instead drops like a lead balloon.

l.



--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/