
Re: write-scaling problems in LMDB



On Mon, Oct 20, 2014 at 1:53 PM, Howard Chu <hyc@symas.com> wrote:

>>   then it would be possible to make a direct comparison (against the
>> figures you just sent), against the e.g. 32-threads case.  32 readers,
>> 2 writers.  32 readers, 4 writers.  32 readers, 8 writers and so on.
>> keeping the number of threads (write plus read) at or below the
>> total number of cores avoids any unnecessary context-switching
>
>
> We can do that by running two instances of the benchmark program
> concurrently; one doing a read-only job with a fixed number of threads (32)
> and one doing a write-only job with the increasing number of threads.

 ohh, ok - great.  saves a job doing some programming at least.
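
 a rough sketch of how the two concurrent jobs could be driven, in case
it is useful. the real benchmark command line is not shown in this
thread, so placeholder_job() below is purely hypothetical: it just
starts a python child that sleeps briefly. swap in the actual
read-only and write-only invocations.

```python
import subprocess
import sys
import time


def run_concurrently(commands):
    """Start every command first, then wait for all of them.

    Returns (exit_codes, elapsed_seconds) for the whole batch.
    """
    start = time.monotonic()
    procs = [subprocess.Popen(cmd) for cmd in commands]
    codes = [p.wait() for p in procs]
    return codes, time.monotonic() - start


def placeholder_job(kind, threads):
    # ASSUMPTION: stand-in for the real benchmark program, whose name and
    # flags are not given in the thread. kind and threads are unused here
    # apart from documenting intent.
    return [sys.executable, "-c", "import time; time.sleep(0.1)"]


if __name__ == "__main__":
    # 32 readers held constant, writer count increased step by step.
    for writers in (2, 4, 8):
        codes, secs = run_concurrently(
            [placeholder_job("read", 32), placeholder_job("write", writers)]
        )
        print(f"writers={writers} exit_codes={codes} elapsed={secs:.2f}s")
```

 both children are started before either is waited on, so the read and
write jobs overlap rather than running back to back.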

>>   the hypothesis being tested is that the writers performance overall
>> remains the same, as only one may perform writes at a time.
>
>
>>   i know it sounds silly to do that: it sounds so obvious that yeah it
>> really should not make any difference given that no matter how many
>> writers there are they will always do absolutely nothing (except one
>> of them), and the context switching when one finishes should also be
>> negligible, but i know there's something wrong and i'd like to help
>> find out what it is.
>
>
> My experience from benchmarking OpenLDAP over the years is that mutexes
> scale only up to a point. When you have threads grabbing the same mutex from
> across socket boundaries, things go into the toilet. There's no fix for
> this; that's the nature of inter-socket communication.

 argh.  ok.  so... actually.... accidentally, the design where i used
a single LMDB (one env) shared amongst (20 to 30) processes using
db_open to create (10 or so) databases would mitigate that...
taking a quick look at mdb.c the mutex lock is done on the env not on
the database...

 sooo compared to the previous design there would only be a 20/30-to-1
mutex contention whereas previously there were *10 sets* of 20 or 30
to 1 mutexes all competing... and if mutexes use sockets underneath
that would explain why the inter-process communication (which also
used sockets) was so dreadful.
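
 to make the "lock is on the env, not on the database" point concrete,
here is a toy model in plain python. it is not LMDB and does not use
the LMDB API: ToyEnv, write_txn and run_writers are made-up names, and
threads stand in for processes. the only thing it borrows from LMDB is
the rule that one env allows one write transaction at a time, whichever
named database is being written.

```python
import threading


class ToyEnv:
    """Toy stand-in for one environment: ONE writer lock, many databases."""

    def __init__(self, db_names):
        self.write_lock = threading.Lock()        # shared by every database
        self.dbs = {name: {} for name in db_names}
        self.active_writers = 0
        self.max_active_writers = 0

    def write_txn(self, db_name, key, value):
        # Every writer takes the same env-level lock, regardless of db_name.
        with self.write_lock:
            self.active_writers += 1
            self.max_active_writers = max(
                self.max_active_writers, self.active_writers
            )
            self.dbs[db_name][key] = value
            self.active_writers -= 1


def run_writers(env, n_writers, txns_per_writer):
    """Run n_writers threads, each writing to 'its own' database."""
    names = list(env.dbs)

    def work(writer_id):
        db_name = names[writer_id % len(names)]
        for i in range(txns_per_writer):
            env.write_txn(db_name, (writer_id, i), i)

    threads = [
        threading.Thread(target=work, args=(w,)) for w in range(n_writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    env = ToyEnv([f"db{i}" for i in range(10)])
    run_writers(env, n_writers=8, txns_per_writer=100)
    total = sum(len(d) for d in env.dbs.values())
    print("max concurrent writers:", env.max_active_writers)
    print("total records written:", total)
```

 even though each writer targets a different database, they all queue
on the single env lock, so there is one contended mutex rather than one
per database.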

 huh, how about that.

do you happen to have access to a straight 8-core SMP system, or is it
relatively easy to turn off the NUMA architecture?

l.