
Re: Antw: Re: Slapd running very slow




On 2015-04-28 7:51 PM, Andrew Findlay wrote:
> Did you get to the bottom of this?
Yes.
>
> On Thu, Apr 23, 2015 at 08:29:48PM +1000, Geoff Swan wrote:
>
>> On 2015-04-23 5:56 PM, Howard Chu wrote:
>>> In normal (safe) operation, every transaction commit performs 2
>>> fsyncs. Your 140MB/s throughput spec isn't relevant here, your disk's
>>> IOPS rate is what matters. You can use NOMETASYNC to do only 1 fsync
>>> per commit.
> Decent SAS disks spin at 10,000 or 15,000 RPM so unless there is a non-volatile
> memory cache in there I would expect at most 15000/60 = 250 fsyncs per second per
> drive, giving 125 transaction commits per second per drive.
These are enterprise SAS drives with onboard read and write caches.
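
To put a number on that, the raw synchronous-write IOPS of a volume can
be measured directly with fio (assuming fio is installed; the filename
below is just an example on the volume under test):

    # 4K random writes with an fsync after every write -- roughly the
    # I/O pattern of a transaction commit
    fio --name=commit-sim --filename=/var/lib/ldap/fio.tmp --size=64m \
        --rw=randwrite --bs=4k --ioengine=sync --fsync=1 \
        --runtime=30 --time_based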

>
>> OK. I ran a reduced version of the test script (20 processes, each
>> performing 40 read/write operations) in normal (safe) mode on a test
>> server that has 32GB RAM and is otherwise identical to the server
>> with 128GB.
> So that is just 800 operations taking 60s?
>
>> A quick test using vmstat at 1s intervals gave the following output
>> whilst it was running.
>>
>> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>>  r  b   swpd     free   buff  cache   si   so    bi    bo    in    cs us sy  id wa st
>> 20  0      0 32011144 167764 330416    0    0     1    15    40    56  0  0  99  1  0
>>  0  0      0 31914848 167764 330424    0    0     0  1560  2594  2130  2  1  97  0  0
>>  0  0      0 31914336 167764 330424    0    0     0  1708   754  1277  0  0 100  0  0
>>  0  0      0 31914508 167772 330420    0    0     0  2028   779  1300  0  0  99  1  0
>> The script took about 60s to complete, which is a lot longer than
>> expected. It appears almost entirely I/O-bound, at a fairly slow rate
>> (1500 blocks per second is about 6MB/s at 4K per block).
> As you say, it is IO bound (wa ~= 100%). Stop worrying about MB/s: the data rate is
> irrelevant, what matters is synchronous small-block writes and those are limited by
> rotation speed.
>
> Are you absolutely certain that the disks are SAS? Does your disk controller
> believe it? I had big problems with an HP controller once that refused to run SATA
> drives at anything like their full speed as it waited for each transaction to
> finish and report back before queuing the next one...
Yes, they are SAS drives and the driver recognises them as such,
connected to a C600 controller.
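
For anyone wanting to double-check this on their own hardware, the
kernel's view of the transport can be inspected without vendor tools
(device name is an example; lsscsi and smartmontools assumed installed):

    # list devices with their transport type
    lsscsi -t
    # SAS drives also report their transport protocol directly
    smartctl -i /dev/sda | grep -i transport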

>
> Andrew

Did a lot of testing over the last week or so.
It appears to be fundamentally a Linux block layer problem. An fsync
operation appears to set the FUA (Force Unit Access) flag on the SCSI
write command, forcing it to bypass the drive's write cache. This is a
real problem, since it also defeats the intelligence built into the
SCSI controller for managing that cache, so every 4K block transaction
pays a full seek. The behaviour seems to be hard-wired deep in the
block layer. It would be nice to have a mount option to prevent it on
selected volumes.
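
The closest per-device knob I could find is the sd driver's cache_type
attribute in sysfs, which controls whether the kernel issues cache
flushes and FUA writes for a disk at all. Changing it trades away
power-loss safety, so treat it strictly as a test knob (the sysfs path
is an example):

    # how the kernel currently classifies this drive's cache
    cat /sys/block/sda/device/scsi_disk/*/cache_type
    # tell the kernel (only) that the cache is write-through, so it
    # stops issuing flush/FUA for this device -- UNSAFE on power loss;
    # the "temporary" prefix avoids a MODE SELECT to the drive itself
    echo "temporary write through" | \
        sudo tee /sys/block/sda/device/scsi_disk/*/cache_type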

However, there was a significant improvement with the 3.19.5 kernel,
where multiqueue block I/O (blk-mq) can be enabled for SCSI devices.
It still appears to bypass the drive's write cache, but the performance
is much better.
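
For anyone wanting to try it, the multiqueue SCSI path on these kernels
is selected by a module parameter at boot (the GRUB2 paths below are
assumptions; adjust for your distribution):

    # check whether scsi-mq is currently active (Y/N)
    cat /sys/module/scsi_mod/parameters/use_blk_mq
    # enable it at boot by adding to GRUB_CMDLINE_LINUX in
    # /etc/default/grub:
    #   scsi_mod.use_blk_mq=1
    # then regenerate the config and reboot
    sudo grub2-mkconfig -o /boot/grub2/grub.cfg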

Another area that improved things was VM tuning, although it is fairly
sensitive (like a high-Q bandpass filter). Reducing
vm.dirty_expire_centisecs from 30s to 15s (3000 to 1500) helped in this
environment, which can build up a large backlog of dirty pages; expiring
them a bit sooner makes the cache flushing less bumpy.
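
Concretely, that change is a single sysctl; the value below is the one
that worked here (in centiseconds), but as noted it is sensitive, so
tune it against your own write load:

    # expire dirty pages after 15s instead of the 30s default
    sysctl -w vm.dirty_expire_centisecs=1500
    # persist across reboots
    echo "vm.dirty_expire_centisecs = 1500" >> /etc/sysctl.conf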