
Re: slapd becomes unresponsive, using 100% CPU



I see the same problem. As you mentioned, I first reported it in:
http://www.openldap.org/lists/openldap-software/200212/msg00403.html
I now believe my DB_CONFIG is correct, yet it still happens: slapd takes 98-100% CPU and does not respond to ldapsearch queries.
The last time it happened I ran strace -p pid_slapd, but nothing was returned!?
Fortunately I still run 2.0.25 in production, so I can slapcat from it and slapadd into this "unstable" (to me at least) 2.1.12 (BDB 4.1.25_NC) version, which runs fine for a few days and then jumps to 100% CPU usage :-( .


This is why I still cannot upgrade to 2.1.X! I will consider it unstable as long as I get these unexpected 100% CPU problems.

Any definitive conclusion on this problem (a misconfiguration on my part, or a bug?) would be greatly appreciated.

jheiss-openldap@ofb.net wrote:
We're experiencing a problem on our LDAP servers.  They will run fine
for several days and then slapd will begin using all CPU time and
become unresponsive to queries.  This happens on both Red Hat 8.0 and
FreeBSD 4.7.  A restart of slapd generally restores order for another
couple of days.  I have been unable to strace slapd while it was
having trouble on Red Hat (strace just hangs with no output), but was
able to truss slapd on the FreeBSD box.  It's in a loop doing the
following (the exact calls in each loop vary a little):

gettimeofday(0x28328dec,0x0)                     = 0 (0x0)
sigprocmask(0x3,0x28328e78,0x0)                  = 0 (0x0)
sigaltstack(0x283435e0,0x0)                      = 0 (0x0)
poll(0x80ec000,0x33d,0x0)                        = 0 (0x0)
sigreturn(0x91bf864)                             = 0 (0x0)
SIGNAL 27
SIGNAL 27

When things are operating normally truss shows similar calls intermixed
with a number of read, write, fstat, fcntl, setsockopt, accept, close,
etc.
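
To compare a trace taken during a hang with one taken during normal operation, a rough Python sketch like the following could be used. It assumes truss-style lines of the form shown above, and the helper names (syscall_names, looks_like_spin) are made up for illustration. It only checks whether any of the I/O calls listed above appear in a sample; it says nothing about the cause.

```python
import re

# Calls the message lists as showing up during normal operation.
IO_CALLS = {"read", "write", "fstat", "fcntl", "setsockopt", "accept", "close"}

# Syscall name at the start of a truss line, e.g. "poll(0x80ec000,...".
CALL_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\(")


def syscall_names(trace_lines):
    """Return syscall names from truss-style output; SIGNAL lines are skipped."""
    names = []
    for line in trace_lines:
        match = CALL_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def looks_like_spin(trace_lines):
    """True when the sample has syscalls but none of the usual I/O calls."""
    names = syscall_names(trace_lines)
    return bool(names) and not IO_CALLS.intersection(names)


sample = [
    "gettimeofday(0x28328dec,0x0)                     = 0 (0x0)",
    "sigprocmask(0x3,0x28328e78,0x0)                  = 0 (0x0)",
    "sigaltstack(0x283435e0,0x0)                      = 0 (0x0)",
    "poll(0x80ec000,0x33d,0x0)                        = 0 (0x0)",
    "sigreturn(0x91bf864)                             = 0 (0x0)",
    "SIGNAL 27",
    "SIGNAL 27",
]
print(looks_like_spin(sample))
```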

Software versions:

OpenLDAP 2.1.12
Berkeley DB 4.0.14-14 (Red Hat) / Berkeley DB 4.1.25 (FreeBSD)

I've managed to find a few similar problem reports.  There is ITS 2195,
but I don't think it is relevant because we don't use groups in our
ACLs.  There are also these emails to the list:
http://www.openldap.org/lists/openldap-software/200212/msg00403.html
http://www.openldap.org/lists/openldap-software/200302/msg00111.html
Again I don't think they are relevant.  The first was a mis-configuration
that we haven't done and the second seems to have been the same problem
as the ITS report.

I tried going back to ldbm instead of bdb which did seem to reduce the
frequency of problems but did not eliminate them.  I tried compiling slapd
without threads and that seems to have eliminated the problem but
introduced its own problems and doesn't seem like a viable solution.
I tried creating a DB_CONFIG and increasing the BDB cache size from the
default of 256k to 8M, 16M and 64M but that didn't help.
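
For reference, Berkeley DB takes the cache size from a set_cachesize line in DB_CONFIG, whose arguments are gigabytes, additional bytes, and the number of cache regions. A minimal Python sketch that formats such lines for the sizes mentioned above (the helper name cachesize_line is hypothetical, and a single cache region is assumed):

```python
def cachesize_line(megabytes, ncache=1):
    """Format a DB_CONFIG set_cachesize directive: <gbytes> <bytes> <ncache>."""
    total_bytes = megabytes * 1024 * 1024
    # Split the total into whole gigabytes plus the remaining bytes.
    gbytes, remainder = divmod(total_bytes, 1024 ** 3)
    return f"set_cachesize {gbytes} {remainder} {ncache}"


# The three sizes tried in the message: 8M, 16M and 64M.
for size_mb in (8, 16, 64):
    print(cachesize_line(size_mb))
```

The printed line goes into the DB_CONFIG file in the database directory; an existing environment generally has to be recreated (for example by running db_recover while slapd is stopped) before a new cache size takes effect.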

I'd appreciate any suggestions.

Thanks,

Jason




-- 
Jehan Procaccia                          | Ingenieur Systemes & Reseaux
Institut National des Telecommunications | Tel : +33 (0) 160764436
MCI, Moyens Communs Informatiques        | Mail: Jehan.Procaccia@int-evry.fr
9 rue Charles Fourier 91011 Evry France  | Fax : +33 (0) 160764321