
Re: slapd responds very slowly when cpu has 100% usage (but actually low load)

To: openldap-software@openldap.org
Subject: Re: slapd responds very slowly when cpu has 100% usage (but actually low load)
From: Antonis Christofides <anthony@itia.ntua.gr>
Date: Wed, 13 Dec 2006 10:27:11 +0200

(My original message, which presents the problem, is at the bottom.)

Thank you for your responses.  Here is some more information:

> You didn't mention what version of slapd you're running.

That's right, sorry.  I'm running the Debian-packaged slapd 2.2.23-8
with bdb 4.2.52-18.  Everything on my system comes from Debian sarge
packages except the kernel, which is a recompiled Ubuntu 6.06 kernel
(2.6.12 SMP).

> I would expect the call to wait after forking to exec true to be treated
> as blocking IO, but, perhaps on your system, true is an sh builtin.

That's right, true is a shell builtin.  So the two processes never
fork; they just spin.  Some tests I ran just now show that a certain
ldapsearch completes in 4 seconds if the processes have nice 19, in
35-40 seconds if they have nice 10, and takes even longer if they
have nice 0.

And, yes, if I replace "while true" with "while /bin/true", slapd
responds instantly.
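
The difference between the two loops can be checked from the shell
itself.  This is only a small sketch, assuming a POSIX-like sh in which
"true" is a builtin (as it is in dash and bash); the helper name
describe_true is made up for illustration.

```shell
# Ask the shell how it resolves the name "true".
# A builtin runs inside the shell process, so the loop never forks.
# Writing /bin/true instead forces a fork and exec on every iteration,
# which gives the scheduler a blocking wait to work with.
describe_true() {
    type true
}

describe_true
```

If the output says that true is a shell builtin, the "while true" loop
is a pure CPU spin with no system calls that block.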

> nicing a process does not affect its time slice, just where it sits
> in the run queue when it is ready to run.

Your description of how the scheduler works might explain the cause of
the problem if slapd makes a huge number of blocking I/O requests:
each time it makes such a request, it goes to sleep, and the next
process on the run queue (the niced shell in our case) is set to run
and exhausts its time slice.  Then, supposing slapd is ready again, it
is set to run, makes another request, sleeps again, and so on.  If it
needs to make hundreds of I/O requests, could that explain the delay?
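
As a back-of-the-envelope check of that hypothesis, the delay would be
roughly the number of blocking requests times the time slice of the
niced process.  The sketch below is a crude model, not a measurement:
both inputs (300 requests, 100 ms per slice) are hypothetical numbers
chosen only to show the order of magnitude, and estimate_delay_ms is a
made-up helper.

```shell
# Crude model: every blocking request costs slapd one full time slice
# of the competing niced process before slapd gets to run again.
# Both arguments are hypothetical inputs, not measured values.
estimate_delay_ms() {
    requests=$1    # blocking I/O requests needed for one query
    slice_ms=$2    # time slice of the niced process, in milliseconds
    echo $(( requests * slice_ms ))
}

estimate_delay_ms 300 100   # prints 30000, i.e. about 30 seconds
```

Under these assumed numbers the model lands in the same range as the
delays I observe, so a few hundred blocking requests would be enough.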

Here is what ps shows while I'm waiting for slapd to respond:

anthony@acheloos:~$ ps u -m 16204
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root     16204  0.0  0.5 30140 5596 ?        -    11:45   0:00 /usr/sbin/slapd
root         -  0.0    -     -    - -        Ss   11:45   0:00 -
root         -  0.0    -     -    - -        Ss   11:45   0:00 -
root         -  0.0    -     -    - -        Rs   11:46   0:00 -

One of the threads is "Rs"; after slapd delivers its response, it goes
back to "Ss".
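
The per-thread states can also be read straight from /proc, which may
be handier for repeated sampling than ps.  This is a sketch, assuming a
Linux /proc with per-thread task directories; thread_states is a
made-up helper name, and "$$" (the current shell) stands in for the
slapd PID.

```shell
# Print "<tid> <state>" for every thread of a process by reading
# /proc/<pid>/task/<tid>/status (Linux-specific).
thread_states() {
    pid=$1
    for task in /proc/"$pid"/task/*; do
        [ -r "$task/status" ] || continue
        # The State: line looks like "State:  S (sleeping)".
        state=$(sed -n 's/^State:[[:space:]]*//p' "$task/status")
        echo "${task##*/} $state"
    done
}

thread_states "$$"   # replace $$ with the PID of slapd, e.g. 16204
```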

> Maybe you have the memory to let everything rest in memory.  I don't know what
> your two looping shells do to your memory...  If you had some control to never
> swap out ldap, this theory could be tested.

I think I have lots of spare memory:
 
top - 12:53:32 up 60 days,  1:35,  5 users,  load average: 1.38, 0.77, 0.63
Tasks: 262 total,   3 running, 258 sleeping,   1 stopped,   0 zombie
 Cpu0 :  0.7% us,  0.7% sy, 98.7% ni,  0.0% id,  0.0% wa,  0.0% hi, 0.0% si
 Cpu1 :  0.0% us,  0.0% sy, 100.0% ni,  0.0% id,  0.0% wa,  0.0% hi, 0.0% si
Mem:   1035680k total,  1016936k used,    18744k free,    39004k buffers
Swap:  2097144k total,    99888k used,  1997256k free,   577360k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8551 anthony   35  10  4032 1204  916 R 99.9  0.1   0:10.96 sh
 8550 anthony   35  10  4032 1204  916 R 98.8  0.1   0:10.72 sh
16204 root      21   0 30140 5600 3528 S  0.0  0.5   0:00.22 slapd

I tried raising cachesize and idlcachesize (I set both to 10000), but
didn't see any difference, which hardly surprises me given that my
ldap database has only 257 records.

Finally, here is my DB_CONFIG:

set_cachesize   0       2097152         0
set_lg_bsize    524288
set_lk_max_objects      5000
set_lk_max_locks        5000
set_lk_max_lockers      5000

(My slapd.conf does not contain any db-related parameters).
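
For reference, set_cachesize takes <gbytes> <bytes> <ncache>, so the
total cache is gbytes * 1 GiB plus bytes.  The sketch below just does
that arithmetic for the values in the DB_CONFIG above; cache_kib is a
made-up helper name.

```shell
# Total Berkeley DB cache size in KiB for "set_cachesize <gbytes> <bytes> ...".
# The third field (number of cache regions) does not change the total.
cache_kib() {
    gbytes=$1
    bytes=$2
    echo $(( (gbytes * 1073741824 + bytes) / 1024 ))
}

cache_kib 0 2097152   # prints 2048 (KiB), i.e. a 2 MiB cache
```

So the BDB cache here is 2 MiB and the log buffer (set_lg_bsize 524288)
is 512 KiB, which should be ample for a database of 257 records.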



My original message:
> Hi,
> 
> At the almost idle Dual Core machine which runs slapd, I run:
> 
>    nice sh -c 'while true; do true; done' &
>    nice sh -c 'while true; do true; done' &
> 
> (i.e. I'm running this twice).  Then each of the two CPUs always has
> some job to do, so both CPUs have 100% usage, but this is "nice".
> 
> Then, slapd takes too long to respond to queries.  It may take 10 or
> 20 seconds.  If I kill or stop one of the two dummy processes, it
> replies instantly.  If I continue both dummy processes, it's back to
> 10 or 20 seconds.  Needless to say all machine resources seem ok; low
> disk usage, lots of spare memory; and slapd is not niced.
> 
> If it's not something immediately obvious, could you help me debug it?
> I've run slapd with various "-d" options but it gives me results that
> I have trouble understanding.
> 
> The OS is Debian 3.1 (Sarge), with a 2.6.12 SMP Linux kernel.
