Re: slapd hangs up and uses 100% CPU (v2.1.12 release)

To: lists@codeangels.com
Subject: Re: slapd hangs up and uses 100% CPU (v2.1.12 release)
From: Jehan PROCACCIA <Jehan.Procaccia@int-evry.fr>
Date: Thu, 06 Feb 2003 09:48:05 +0100
Cc: OpenLDAP-software@OpenLDAP.org
References: <4349.194.230.157.69.1044451898.squirrel@www.codeangels.com> <3E412C55.1020608@int-evry.fr> <2371.192.168.1.3.1044483728.squirrel@www.codeangels.com>
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

The strace I get when slapd takes 100% CPU is:

$ strace -p 1939
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
....

It looks the same as yours, doesn't it?

Too bad you can't take advantage of BDB. You're right, though: as long as there is no solution to this problem, it is very annoying.

You should read the following thread about that problem. In my case it happened because I had reserved too much memory for BDB:

http://www.openldap.org/lists/openldap-software/200212/msg00435.html
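
For anyone comparing the two DB_CONFIG files quoted further down in this thread, here is a rough Python sketch of how the set_cachesize arguments add up. Berkeley DB documents the three arguments as gbytes, bytes and ncache; the helper names below are made up for illustration and are not part of any OpenLDAP or BDB tool. It only reports the configured size and says nothing about what BDB actually allocates.

```python
GIB = 1 << 30
MIB = 1 << 20

def parse_set_cachesize(line):
    """Parse 'set_cachesize <gbytes> <bytes> <ncache>' into three ints."""
    fields = line.split()
    if len(fields) != 4 or fields[0] != "set_cachesize":
        raise ValueError("not a set_cachesize line: %r" % line)
    gbytes, nbytes, ncache = (int(f) for f in fields[1:])
    return gbytes, nbytes, ncache

def cache_total_bytes(line):
    """Configured cache size in bytes: gbytes * 2^30 + bytes."""
    gbytes, nbytes, _ncache = parse_set_cachesize(line)
    return gbytes * GIB + nbytes

# The two set_cachesize lines quoted in this thread.
for line in ("set_cachesize 0 5120000 2", "set_cachesize 0 209715200 2"):
    total = cache_total_bytes(line)
    print("%-30s %6.1f MiB" % (line, total / MIB))
```

The first line works out to a little under 5 MiB and the second to 200 MiB, each split into 2 pieces, which makes it easier to judge the setting against the RAM in the machine.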

Kirill Ponazdyr wrote:

Greetings,

When I truss (Solaris's strace) the hanging process, I see a loop made of:

/4:     yield()                                         = 0

lines.

I have tried recompiling bdb in its newest version and compiling with -O
instead of -O3 as I usually do. No change at all.

I also tried to change file descriptor soft limit to 1024, systemwide. No
change again.

I have now given up and switched to ldbm on top of gdbm, which works flawlessly.
It really seems to be a problem exclusive to the bdb backend.

Regards

Kirill

I have the same problem, but cannot reproduce it at will.
However, I noticed that it worked better once I stopped playing with
BDB tuning. By playing with BDB, I mean using the cachesize and
checkpoint directives in slapd.conf; if you put in silly values, as I might
have done, this may well trash slapd. Since I put in reasonable
values, it seems to work fine.

My slapd.conf:

#cachesize      6000
checkpoint      100000 360
#dbnosync

and the DB_CONFIG file for my database:

$ cat /var/lib/ldap/int/DB_CONFIG
#set the logfile size to 100MB.
#set_lg_max 104857600
#set the in-memory log buffer size
set_lg_bsize 204800
#temporary while we're slapadding the database
set_flags DB_TXN_NOSYNC
#set the (per db?) cachesize to 0GB + X bytes, split into N pieces of memory
set_cachesize 0 5120000 2


I still don't know which settings are actually used, though: the
slapd.conf directives or the DB_CONFIG ones?

When slapd takes 100%, could you run strace -p pid (pid = the pid of the
slapd at 100%) to check what it is actually doing? For me it was looping
on something; I can't remember what, but it's somewhere in the list.

Let us know if you find an explanation.

Thanks.

Kirill Ponazdyr wrote:

Greetings,

We have a problem with slapd hanging and using 100% CPU time on our
machine when we try to do operations on a tree. It happens in random
places, but we found one case where it happens every time: when we try
to delete a certain object in the tree. We can reproduce the problem as
many times as we wish. Unfortunately slapd then has to be killed with
kill -9, which corrupts our databases, so we have to reload the
directory (PITA).

Thus two questions: why is this happening, and is there a way
to run a consistency check on BDB databases so that a full reload
is not required?

Here are release infos, configs and debug output:

Releases:
-----------------------------------------
Openldap v2.1.12 release
Bdb libraries 4.1.24
Solaris 9 Sparc with latest patch cluster

HW:
-----------------------------------------
Sun Netra T1125 with 1 Gig RAM.


DB_CONFIG
-------------------------------
set_lg_bsize 2097152
set_cachesize 0 209715200 2


slapd.conf:
--------------------------------------------------------------
include                 /etc/openldap/schema/core.schema
include                 /etc/openldap/schema/cosine.schema
include                 /etc/openldap/schema/nis.schema
include                 /etc/openldap/schema/qmail.schema
include                 /etc/openldap/schema/inetorgperson.schema
include                 /etc/openldap/schema/qmailControl.schema
pidfile                 /var/run/slapd.pid
argsfile                /var/run/slapd.args
disallow                bind_anon
allow                   bind_v2

database                bdb
suffix                  "o=Codeangels, c=CH"
directory               /export/ldap-databases/codeangels
rootdn                  ** censored **
rootpw                  ** censored **
index                   cn,sn,uid pres,eq,approx,sub
index                   objectClass eq
... snip ....

Debug:
---------------- snip -------------------
=> access_allowed: write access granted by write(=wrscx)
====> bdb_unlocked_cache_return_entry_r( 526 ): returned (0)
bdb_dn2entry_rw("cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch")
=> bdb_dn2id_matched(
"cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch"
) ====>
bdb_cache_find_entry_dn2id("cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch"):
542 (1 tries)
bdb_cache_entry_db_lock: entry
cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch,
rw 1, rc -30995 ====> bdb_cache_find_entry_id( 542 ): 542 (busy) 2
locker = -2147483031
bdb_cache_entry_db_lock: entry
cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch,
rw 1, rc -30995 ====> bdb_cache_find_entry_id( 542 ): 542 (busy) 2
locker = -2147483031
bdb_cache_entry_db_lock: entry
cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch,
rw 1, rc -30995 ====> bdb_cache_find_entry_id( 542 ): 542 (busy) 2
locker = -2147483031
.... repeat above 2 lines until killed ....
---------------- snip -------------------

---
Kirill Ponazdyr
Technical Director
Codeangels Solutions
Tel: +41 (0)43 844 90 10
Fax: +41 (0)43 844 90 12
