
Re: slapd hangs up and uses 100% CPU (v2.1.12 release) (ITS#2302)







This seems to be related to ITS#2302.

- Jong

-----------------------------------------------------------

Greetings,

When I truss (Solaris's strace) the hanging process, I see a loop made of:

/4:     yield()                                         = 0

lines.
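
To illustrate what a loop of yield() calls can look like from the
application side, here is a toy Python sketch. It is not slapd's code. It
only assumes the pattern suggested by the debug output quoted below: a lock
request that keeps coming back "busy" and is retried right away.
spin_acquire and max_tries are hypothetical names, and the retry cap is an
addition for illustration, to show a bounded loop giving up instead of
spinning forever.

```python
import threading
import time


def spin_acquire(lock, max_tries):
    """Try a non-blocking acquire up to max_tries times.

    Between failed attempts the thread gives up its time slice, which is
    roughly what a yield() call does. Returns the number of attempts used
    on success, or None if the lock was still busy after max_tries.
    """
    for attempt in range(1, max_tries + 1):
        if lock.acquire(blocking=False):
            return attempt
        time.sleep(0)  # yield the CPU, then retry immediately
    return None


if __name__ == "__main__":
    entry_lock = threading.Lock()
    entry_lock.acquire()  # simulate a holder that never releases the entry

    # Without the cap this call would loop forever at 100% CPU.
    print(spin_acquire(entry_lock, max_tries=1000))

    entry_lock.release()
    print(spin_acquire(entry_lock, max_tries=1000))
```

If the holder never releases the lock, an uncapped version of this loop
does nothing but fail and yield, which would match the truss output above.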

I have tried recompiling bdb in its newest version and compiling with -O
instead of -O3 as I usually do; no change at all.

I also tried changing the file descriptor soft limit to 1024, systemwide. No
change again.

Now I gave up and went to ldbm on top of gdbm, and it works flawlessly.
It really seems to be a problem exclusive to the bdb backend.

Regards

Kirill

> I have the same problem, but cannot reproduce it at will ...
> However I noticed that when I stopped playing with bdb tuning it worked
> better ... By playing with bdb, I mean using the cachesize and
> checkpoint directives in slapd.conf; if you put in silly values, as I might
> have done, this will maybe trash slapd ... ? Since I put in reasonable
> values, it now seems to work fine.
>
> my slapd.conf
>
> #cachesize      6000
> checkpoint      100000 360
> #dbnosync
>
> and the DB_CONFIG file for my database
>
> $ cat /var/lib/ldap/int/DB_CONFIG
> #set the logfile size to 100MB.
> #set_lg_max 104857600
> #set the in-memory log buffer size
> set_lg_bsize 204800
> #temporary while we're slapadding the database
> set_flags DB_TXN_NOSYNC
> #set the (per db?) cachesize to 0GB + X bytes, split into N pieces of memory
> set_cachesize 0 5120000 2
>
>
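
For reference, the three numbers after set_cachesize are gigabytes,
additional bytes, and the number of cache regions. Below is a small Python
sketch that works out the totals for the value above and for the 209715200
value in the original report quoted further down. parse_set_cachesize is a
hypothetical helper written for this mail, not part of any OpenLDAP or
Berkeley DB tool.

```python
def parse_set_cachesize(line):
    """Parse a DB_CONFIG line of the form 'set_cachesize <gbytes> <bytes> <ncache>'.

    Returns (total_bytes, ncache).
    """
    fields = line.split()
    if len(fields) != 4 or fields[0] != "set_cachesize":
        raise ValueError("not a set_cachesize line: %r" % line)
    gbytes, nbytes, ncache = (int(f) for f in fields[1:])
    total_bytes = gbytes * 1024 ** 3 + nbytes
    return total_bytes, ncache


if __name__ == "__main__":
    for line in ("set_cachesize 0 5120000 2", "set_cachesize 0 209715200 2"):
        total, ncache = parse_set_cachesize(line)
        regions = max(ncache, 1)  # 0 or 1 both mean a single contiguous cache
        print("%s -> %.1f MiB total, %.1f MiB per region"
              % (line, total / 2 ** 20, total / regions / 2 ** 20))
```

So the first setting asks for a bit under 5 MiB in total, and the second
for exactly 200 MiB, each split into two regions.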
> Although I still don't know which ones are used, the slapd.conf directives
> or the DB_CONFIG ones?
>
> When slapd takes 100%, could you run strace -p pid (pid = pid of the
> slapd at 100%) to check what it is actually doing? For me it was looping
> on something, I can't remember what, but it's somewhere in the list.
>
> Let us know if you find an explanation.
>
> Thanks.
>
> Kirill Ponazdyr wrote:
>> Greetings,
>>
>> We have a problem with slapd hanging up and using 100% CPU time on our
>> machine when we try to do operations on a tree, it happens in random
>> places but we could find one where it happens every time, when we try
>> to delete a certain object in the tree. We can repro the problem as
>> many times as we wish. Unfortunately the slapd has to be killed by
>> kill -9 and this corrupts our databases, so we have to reload a
>> directory (PITA).
>>
>> Thus two questions: Why is this happening? And is there a way
>> to run a consistency check on BDB databases, so that a full reload
>> is not required?
>>
>> Here are release infos, configs and debug output:
>>
>> Releases:
>> -----------------------------------------
>> OpenLDAP v2.1.12 release
>> Bdb libraries 4.1.24
>> Solaris 9 Sparc with latest patch cluster
>>
>> HW:
>> -----------------------------------------
>> Sun Netra T1125 with 1 Gig RAM.
>>
>>
>> DB_CONFIG
>> -------------------------------
>> set_lg_bsize 2097152
>> set_cachesize 0 209715200 2
>>
>>
>> slapd.conf:
>> --------------------------------------------------------------
>> include                 /etc/openldap/schema/core.schema
>> include                 /etc/openldap/schema/cosine.schema
>> include                 /etc/openldap/schema/nis.schema
>> include                 /etc/openldap/schema/qmail.schema
>> include                 /etc/openldap/schema/inetorgperson.schema
>> include                 /etc/openldap/schema/qmailControl.schema
>> pidfile                 /var/run/slapd.pid
>> argsfile                /var/run/slapd.args
>> disallow                bind_anon
>> allow                   bind_v2
>>
>> database                bdb
>> suffix                  "o=Codeangels, c=CH"
>> directory               /export/ldap-databases/codeangels
>> rootdn                  ** censored **
>> rootpw                  ** censored **
>> index                   cn,sn,uid pres,eq,approx,sub
>> index                   objectClass eq
>> ... snip ....
>>
>> Debug:
>> ---------------- snip -------------------
>> => access_allowed: write access granted by write(=wrscx)
>> ====> bdb_unlocked_cache_return_entry_r( 526 ): returned (0)
>> bdb_dn2entry_rw("cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch")
>> => bdb_dn2id_matched( "cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch" )
>> ====> bdb_cache_find_entry_dn2id("cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch"): 542 (1 tries)
>> bdb_cache_entry_db_lock: entry
>> cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch,
>> rw 1, rc -30995 ====> bdb_cache_find_entry_id( 542 ): 542 (busy) 2
>> locker = -2147483031
>> bdb_cache_entry_db_lock: entry
>> cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch,
>> rw 1, rc -30995 ====> bdb_cache_find_entry_id( 542 ): 542 (busy) 2
>> locker = -2147483031
>> bdb_cache_entry_db_lock: entry
>> cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch,
>> rw 1, rc -30995 ====> bdb_cache_find_entry_id( 542 ): 542 (busy) 2
>> locker = -2147483031
>> .... repeat above 2 lines until killed ....
>> ---------------- snip -------------------
>>
>> ---
>> Kirill Ponazdyr
>> Technical Director
>> Codeangels Solutions
>> Tel: +41 (0)43 844 90 10
>> Fax: +41 (0)43 844 90 12
>>
>>





------------------------
Jong Hyuk Choi
IBM Thomas J. Watson Research Center - Enterprise Linux Group
P. O. Box 218, Yorktown Heights, NY 10598
email: jongchoi@us.ibm.com
(phone) 914-945-3979    (fax) 914-945-4425   TL: 862-3979