
Re: slapd hangs at 100% cpu in sched_yield (ITS#2030)



At 01:33 AM 2002-08-20, steven.wilton@team.eftel.com wrote:
>How about adding the following lines to the patch you have applied to cvs? 

Because, as far as I can tell from looking at DB4 sources,
LOCK_ID() does not return DB_LOCK_NOTGRANTED.

The kinds of errors LOCK_ID() does return, such as ENOMEM,
are generally mapped to LDAP_OTHER by slapd(8).  LDAP_BUSY
is a possibility here.
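
As a rough sketch of what such a mapping could look like (this is
not the code in CVS): map_lock_id_rc() is a hypothetical helper
name, the result codes are repeated from the LDAP C API so the
fragment stands alone, and returning LDAP_BUSY for ENOMEM is only
the possibility mentioned above, not established behaviour.

```c
#include <errno.h>

/* Result codes as defined by the LDAP C API; repeated here so the
 * sketch compiles on its own. */
#define LDAP_SUCCESS 0x00
#define LDAP_BUSY    0x33
#define LDAP_OTHER   0x50

/*
 * Hypothetical helper: translate the return value of LOCK_ID() into
 * an LDAP result code. The caller invokes LOCK_ID() exactly once and
 * passes the result here; nothing is retried.
 */
static int map_lock_id_rc(int rc)
{
    switch (rc) {
    case 0:
        return LDAP_SUCCESS;
    case ENOMEM:
        /* Assumption: report resource exhaustion as "busy" so the
         * client can decide for itself whether to try again later. */
        return LDAP_BUSY;
    default:
        /* Anything else is reported as a generic failure. */
        return LDAP_OTHER;
    }
}
```

The point of the sketch is that the decision to retry is left to
the client rather than made inside the server.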

I note that looping while waiting for resources to free up generally
makes resource starvation problems worse, not better.
Resource starvation is best resolved by making more resources
available to the process (or by coding changes to reduce the
demand for resources).

Kurt

> If the lock is rejected for the given reason, there is nothing major wrong with the database, but we should retry.  The client program does not know that the ldap server is only having a temporary error getting the data (as opposed to if the lock is rejected due to something like a corrupt database, where we should send an error back to the client).
>
>+retry:
>                rc = LOCK_ID ( bdb->bi_dbenv, &locker );
>                switch(rc) {
>                case 0:
>                        break;
>+               case DB_LOCK_NOTGRANTED:
>+                       ldap_pvt_thread_yield();
>+                       goto retry;
>                default:
>                        return LDAP_OTHER;
>                }
>
>
>We use ldap to authenticate users, and if one of the ldap client programs detects an error, unusual things will happen on the system (some requests will work, while a random number of connections will fail for no good reason).
>
>Steven
>
>On Tue, 20 Aug 2002 09:48:11 Kurt D. Zeilenga wrote:
>>I agree that the return result of LOCK_ID() should be checked.
>>I've added code which causes an LDAP_OTHER error if LOCK_ID()
>>fails, which in a quick check of DB4 code, is consistent with
>>possible error conditions.
>>Kurt