[Date Prev][Date Next] [Chronological] [Thread] [Top]

(ITS#4308) poor performance under load



Full_Name: Kevin Spicer
Version: 2.3.14 / 2.3.15
OS: Solaris 9
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (198.178.236.140)


I posted about this on the software list earlier, but have since confirmed the
fault with 2.3.15 as well as 2.3.14 (my first test on 2.3.15 was okay but
running the test subsequently was not)

After a period of running (minutes or hours) slapd starts to chew up CPU and
become very slow to respond, at times slapd constantly uses 100% of one CPU
(these are dual CPU systems).  I had no such problem on 2.3.11 (although I'm not
ruling out a configuration error on my part).  The only change I made to the
configuration was to comment out a 'threads 500' directive (thanks for the
helpful warning message about that BTW!) - that isn't related, as reinstating it
doesn't solve the problem.  I've rerun slapindex in case it was an indexing
issue but that also hasn't helped.  I'm seeing this on several servers and it
tends to happen when there is a reasonable amount of user activity on the server
(the user load isn't very high though, just the problem doesn't appear to happen
when the server is mostly idle).

My setup, briefly

4 Solaris 9 servers
ol 2.3.14
db 4.2.52

Each server has 4 databases organised as a master with three subsidiary db's
glued on, each server acts as master for one of these with the other three
replicated onto it using slurpd.  I'm also using the ppolicy overlay.

compiled with '--prefix=/usr/local' '--enable-bdb' '--enable-crypt'
'--with-threads' '--with-tls' '--without-kerberos' '--enable-wrappers'
'--enable-modules' '--enable-ppolicy=mod' 

I've since done some testing to attempt to replicate the problems with both
2.3.14 and 2.3.13 (without patches), it seems that 2.3.14 is substantially
slower.

To test I kicked off the following command 15 times (concurrently)
ldapsearch >/dev/null &

This returns the 4400 odd entries in my directory that can be viewed with an
anonymous bind and no ssl.  
Admittedly this is a crude test, and there are other apps on the system that
will be making ldap queries - but nothing like this order of magnitude.  I was
running my regular slapd.conf except with logging 0.

I repeated the tests using various values for threads in slapd.conf (the
problems on my production systems seem to have lessened since I raised the
threads parameter to an insane value!).

In all cases the 'user' time was around 6.5 - 7s and the 'sys' time was around
1.2 - 1.5s however the 'real' time varied greatly - see below...

			Threads
		2		4	8	16	32
2.3.13	10.7s		9.1s	8.3s	8.4s	8.1s
2.3.14	171.9s	47s	31s	64.7s	30.6s

Following some advice I took a look at the output of prstat -L, it seems that
one thread is using lots of CPU (2.3.14 16 threads)
prstat -L -p `cat /var/run/slapd/slapd.pid`
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID
 11383 ldap      245M   28M cpu0     0    0   0:00:17  28% slapd/2
 11383 ldap      245M   28M sleep   59    0   0:00:00 1.0% slapd/17
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.7% slapd/15
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.6% slapd/16
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.6% slapd/10
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.6% slapd/14
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.6% slapd/13
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.6% slapd/4
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.6% slapd/7
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.6% slapd/11
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.5% slapd/3
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.5% slapd/5
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.5% slapd/6
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.4% slapd/9
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.4% slapd/8
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.4% slapd/12
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.1% slapd/1
 11383 ldap      245M   28M sleep   59    0   0:00:00 0.0% slapd/18

After the test completes that one thread continues to eat cpu for up to several
minutes.