Re: slapd hangs up and uses 100% CPU (v2.1.12 release)



On Wed, 5 Feb 2003, Kirill Ponazdyr wrote:

> Greetings,
>
> When I truss (Solaris's strace) the hanging process, I see a loop made of:
>
> /4:     yield()                                         = 0
>
> lines.
>
> I have tried recompiling bdb in its newest version and compiling with -O
> instead of -O3 as I usually do; no change at all.
>
> I also tried to change file descriptor soft limit to 1024, systemwide. No
> change again.
>
> Now I gave up and switched to ldbm on top of gdbm, and it works flawlessly.
> It really seems to be a problem exclusive to the bdb backend.
>

I can reproduce this on Solaris 9.
Check out http://www.OpenLDAP.org/its/index.cgi?findid=2195

Do you use 'group' in your acls?
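
One quick way to check, as a minimal sketch: the Python below scans the text
of a slapd.conf for "by group" clauses. The sample ACL and its DN are made up
for illustration (the ACLs in the quoted config were snipped), and
find_group_acls is a hypothetical helper, not part of OpenLDAP.

```python
import re

# Matches "by group=...", "by group.exact=...", "by group/objectclass/attr=...".
GROUP_CLAUSE = re.compile(r"\bby\s+group\b", re.IGNORECASE)


def find_group_acls(conf_text):
    """Return (line number, stripped line) for each line with a 'by group' clause."""
    hits = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        if raw.lstrip().startswith("#"):
            continue  # skip comment lines
        if GROUP_CLAUSE.search(raw):
            hits.append((lineno, raw.strip()))
    return hits


# Hypothetical ACL, only to show what the scan reports.
SAMPLE = '''\
access to *
    by group="cn=managers,ou=example,o=Example,c=CH" write
    by * read
'''

for lineno, text in find_group_acls(SAMPLE):
    print(f"line {lineno}: {text}")
```

To run it against a real config, pass open("/path/to/slapd.conf").read()
instead of SAMPLE.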

-Igor

> Regards
>
> Kirill
>
> > I have the same problem, but cannot reproduce it at will.
> > However, I noticed that when I stopped playing with bdb tuning it worked
> > better. By playing with bdb, I mean using the cachesize and
> > checkpoint directives in slapd.conf; if you put in silly values, as I might
> > have done, this may trash slapd. Since I put in reasonable
> > values, it seems to work fine.
> >
> > my slapd.conf
> >
> > #cachesize      6000
> > checkpoint      100000 360
> > #dbnosync
> >
> > and the DB_CONFIG file for my database
> >
> > $ cat /var/lib/ldap/int/DB_CONFIG
> > #set the logfile size to 100MB.
> > #set_lg_max 104857600
> > #set the in-memory log buffer size
> > set_lg_bsize 204800
> > #temporary while we're slapadding the database
> > set_flags DB_TXN_NOSYNC
> > #set the (per db?) cachesize to 0GB + X bytes, split into N pieces of memory
> > set_cachesize 0 5120000 2
> >
> >
> > Although I still don't know which ones are used, slapd.conf directives
> > or DB_CONFIG ones ??
> >
> > When slapd takes 100%, could you run strace -p pid (pid = pid of
> > slapd at 100%) to check what it is actually doing? For me it was looping
> > on something; I can't remember what, but it's somewhere in the list archives.
> >
> > Let us know if you find an explanation.
> >
> > Thanks.
> >
> > Kirill Ponazdyr wrote:
> >> Greetings,
> >>
> >> We have a problem with slapd hanging up and using 100% CPU time on our
> >> machine when we try to do operations on a tree, it happens in random
> >> places but we could find one where it happens every time, when we try
> >> to delete a certain object in the tree. We can repro the problem as
> >> many times as we wish. Unfortunately the slapd has to be killed by
> >> kill -9 and this corrupts our databases, so we have to reload a
> >> directory (PITA).
> >>
> >> Thus two questions: Why is this happening? And is there a way
> >> to run a consistency check on BDB databases, so that a full reload
> >> is not required?
> >>
> >> Here are release infos, configs and debug output:
> >>
> >> Releases:
> >> -----------------------------------------
> >> OpenLDAP v2.1.12 release
> >> BDB libraries 4.1.24
> >> Solaris 9 Sparc with latest patch cluster
> >>
> >> HW:
> >> -----------------------------------------
> >> Sun Netra T1125 with 1 Gig RAM.
> >>
> >>
> >> DB_CONFIG
> >> -------------------------------
> >> set_lg_bsize 2097152
> >> set_cachesize 0 209715200 2
> >>
> >>
> >> slapd.conf:
> >> --------------------------------------------------------------
> >> include                 /etc/openldap/schema/core.schema
> >> include                 /etc/openldap/schema/cosine.schema
> >> include                 /etc/openldap/schema/nis.schema
> >> include                 /etc/openldap/schema/qmail.schema
> >> include                 /etc/openldap/schema/inetorgperson.schema
> >> include                 /etc/openldap/schema/qmailControl.schema
> >> pidfile                 /var/run/slapd.pid
> >> argsfile                /var/run/slapd.args
> >> disallow                bind_anon
> >> allow                   bind_v2
> >>
> >> database                bdb
> >> suffix                  "o=Codeangels, c=CH"
> >> directory               /export/ldap-databases/codeangels
> >> rootdn                  ** censored **
> >> rootpw                  ** censored **
> >> index                   cn,sn,uid pres,eq,approx,sub
> >> index                   objectClass eq
> >> ... snip ....
> >>
> >> Debug:
> >> ---------------- snip -------------------
> >> => access_allowed: write access granted by write(=wrscx)
> >> ====> bdb_unlocked_cache_return_entry_r( 526 ): returned (0)
> >> bdb_dn2entry_rw("cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch")
> >> => bdb_dn2id_matched( "cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch" )
> >> ====> bdb_cache_find_entry_dn2id("cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch"): 542 (1 tries)
> >> bdb_cache_entry_db_lock: entry cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch, rw 1, rc -30995
> >> ====> bdb_cache_find_entry_id( 542 ): 542 (busy) 2
> >> locker = -2147483031
> >> bdb_cache_entry_db_lock: entry cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch, rw 1, rc -30995
> >> ====> bdb_cache_find_entry_id( 542 ): 542 (busy) 2
> >> locker = -2147483031
> >> bdb_cache_entry_db_lock: entry cn=managers,ou=codeangels.com,ou=mail,ou=itaccounts,o=codeangels,c=ch, rw 1, rc -30995
> >> ====> bdb_cache_find_entry_id( 542 ): 542 (busy) 2
> >> locker = -2147483031
> >> .... repeat above 2 lines until killed ....
> >> ---------------- snip -------------------
> >>
> >> ---
> >> Kirill Ponazdyr
> >> Technical Director
> >> Codeangels Solutions
> >> Tel: +41 (0)43 844 90 10
> >> Fax: +41 (0)43 844 90 12
> >>
> >>
>
>
>
>
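
For comparing the two DB_CONFIG files quoted above: set_cachesize takes
three numbers (gbytes, bytes, ncache), and the total cache is gbytes * 1 GiB
plus bytes, split into ncache pieces. A minimal sketch of that arithmetic in
Python follows; cache_size_bytes is a hypothetical helper written for this
message, not a Berkeley DB or OpenLDAP tool, and it only looks at
uncommented set_cachesize lines.

```python
def cache_size_bytes(db_config_text):
    """Return (total_bytes, ncache) from the last active set_cachesize line, or None."""
    result = None
    for raw in db_config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0] == "set_cachesize" and len(parts) == 4:
            gbytes, nbytes, ncache = (int(p) for p in parts[1:])
            result = (gbytes * 1024**3 + nbytes, ncache)
    return result


# The two set_cachesize settings quoted in this thread.
for label, text in [
    ("Kirill", "set_lg_bsize 2097152\nset_cachesize 0 209715200 2\n"),
    ("earlier reply", "set_lg_bsize 204800\nset_cachesize 0 5120000 2\n"),
]:
    total, ncache = cache_size_bytes(text)
    print(f"{label}: {total / 1024**2:.1f} MiB in {ncache} pieces")
# prints:
# Kirill: 200.0 MiB in 2 pieces
# earlier reply: 4.9 MiB in 2 pieces
```

So the two setups differ by roughly a factor of 40 in BDB cache size, which
may be worth keeping in mind when comparing behaviour between them.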

-- 
Igor