2030 – slapd hangs at 100% cpu in sched_yield

Status:	VERIFIED FIXED

Alias:	None

Product:	OpenLDAP
Classification:	Unclassified
Component:	slapd
Version:	unspecified
Hardware:	All All

Importance:	--- normal
Target Milestone:	---
Assignee:	OpenLDAP project

Reported:	2002-08-19 06:30 UTC by steven.wilton@team.eftel.com
Modified:	2014-08-01 21:06 UTC

Description steven.wilton@team.eftel.com 2002-08-19 06:30:46 UTC

Full_Name: Steven Wilton
Version: 2.1.3
OS: Linux (Debian 3.0)
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (203.24.100.137)


This is actually a fix for the problem where slapd hangs using 100% CPU load
with the busy thread doing sched_yield() continuously.  I ran slapd with the "-d
1" flag, and came up with the following:

=> bdb_back_search
bdb(o=EFTEL): Lock table is out of available locker entries
bdb_dn2entry_rw("o=eftel")
=> bdb_dn2id_matched( "o=eftel" )
====> bdb_cache_find_entry_dn2id("o=eftel"): 1 (1 tries)
bdb(o=EFTEL): Locker does not exist
====> bdb_cache_find_entry_id( 1 ): 1 (busy) 2
locker = 1429
bdb(o=EFTEL): Locker does not exist
====> bdb_cache_find_entry_id( 1 ): 1 (busy) 2
locker = 1429


The last 3 lines then continue endlessly.  The problem is that the locker is not
allocated correctly in the first place, due to the load on the server.  I came
up with the following patch, which seems to have fixed the problem (I can't get
slapd to hang any more under the same load).  I had thought about inserting a
small (100ms) sleep before the sched_yield, but am not sure what the most
portable way of doing this is.

--- openldap-2.1.3/servers/slapd/back-bdb/back-bdb.h.orig       Mon Aug 19 13:01:27 2002
+++ openldap-2.1.3/servers/slapd/back-bdb/back-bdb.h    Mon Aug 19 13:01:56 2002
@@ -153,7 +153,7 @@
 #define TXN_COMMIT(txn,f)                      txn_commit((txn), (f))
 #define        TXN_ABORT(txn)                          txn_abort((txn))
 #define TXN_ID(txn)                                    txn_id(txn)
-#define LOCK_ID(env, locker)           lock_id(env, locker)
+#define LOCK_ID(env, locker)           while(lock_id(env, locker)) {ldap_pvt_thread_yield();}
 #define LOCK_ID_FREE(env, locker)      lock_id_free(env, locker)
 #else
 #define LOCK_DETECT(env,f,t,a)         (env)->lock_detect(env, f, t, a)
@@ -165,7 +165,7 @@
 #define TXN_COMMIT(txn,f)                      (txn)->commit((txn), (f))
 #define TXN_ABORT(txn)                         (txn)->abort((txn))
 #define TXN_ID(txn)                                    (txn)->id(txn)
-#define LOCK_ID(env, locker)           (env)->lock_id(env, locker)
+#define LOCK_ID(env, locker)           while((env)->lock_id(env, locker)) {ldap_pvt_thread_yield();}
 #define LOCK_ID_FREE(env, locker)      (env)->lock_id_free(env, locker)
 #endif
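
On the question of a portable short sleep: one common option on POSIX systems is to call select() with no file descriptors and only a timeout. The following is a minimal sketch under that assumption. The helper name sleep_ms is hypothetical and is not an OpenLDAP function.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper (not part of OpenLDAP): sleep for roughly `ms`
 * milliseconds by calling select() with no descriptors, only a timeout.
 * Returns 0 when the timeout expires, -1 if select() fails
 * (for example when interrupted by a signal). */
int sleep_ms(long ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return select(0, NULL, NULL, NULL, &tv);
}
```

Calling sleep_ms(100) before yielding would give the 100ms pause mentioned above. Platforms without POSIX select() would need a different primitive.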

Comment 1 Kurt Zeilenga 2002-08-19 21:55:11 UTC

moved from Incoming to Software Bugs

Comment 2 Kurt Zeilenga 2002-08-19 23:01:13 UTC

This problem has likely been fixed in HEAD and OPENLDAP_REL_ENG_2_1
(available via CVS). Please test.

I don't think your patch is an appropriate fix.

Comment 3 steven.wilton@team.eftel.com 2002-08-20 00:37:28 UTC

I had a look at the current code for OPENLDAP_REL_ENG_2_1 via web cvs, and it does not look like the cause of the problem has been fixed.  The problem is that the bdb backend may or may not successfully get a lock when you ask for one.  On ~line 74 of servers/slapd/back-bdb/search.c the ldap server runs LOCK_ID(), but does not check the return value, and continues, assuming it has a valid lock.  This lock is passed from function to function until we hit ~lines 849-921 in cache.c, where

state != CACHE_ENTRY_READY

and

rc = bdb_cache_entry_db_lock ( env, locker, ep, rw, 0, lock );

fails because the lock is invalid, so we log, yield and try again.  This is an endless loop unless we can guarantee that we have a valid bdb lock in the first place.
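
The failure mode described above can be reproduced in miniature. This is only a sketch with stand-in functions: mock_lock_id, mock_entry_lock and unchecked_search are hypothetical names, not slapd or Berkeley DB code, and the retry loop is capped so that the demonstration terminates where the real one would not.

```c
#include <errno.h>

#define MOCK_MAX_LOCKERS    2   /* tiny table so exhaustion is easy to hit */
#define MOCK_INVALID_LOCKER 0u

static unsigned mock_lockers_in_use = 0;

/* Stand-in for LOCK_ID(): hands out a locker ID, or fails with ENOMEM
 * and leaves *locker untouched when the table is full. */
int mock_lock_id(unsigned *locker)
{
    if (mock_lockers_in_use >= MOCK_MAX_LOCKERS)
        return ENOMEM;
    *locker = ++mock_lockers_in_use;
    return 0;
}

/* Stand-in for the entry lock call: a locker that was never allocated
 * is rejected every time, no matter how often the caller retries. */
int mock_entry_lock(unsigned locker)
{
    return locker == MOCK_INVALID_LOCKER ? EINVAL : 0;
}

/* Mirrors the unchecked pattern: ignore the allocation result, then
 * retry the entry lock until it succeeds. Capped at max_tries here.
 * Returns the number of failed attempts. */
int unchecked_search(int max_tries)
{
    unsigned locker = MOCK_INVALID_LOCKER;
    int failures = 0;

    (void)mock_lock_id(&locker);        /* return value ignored */
    while (failures < max_tries && mock_entry_lock(locker) != 0)
        failures++;                     /* "log, yield and try again" */
    return failures;
}
```

With a table of two lockers, the first two calls to unchecked_search succeed at once. The third uses up every retry it is allowed, because no amount of retrying makes an unallocated locker valid.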

This only affects busy systems, as we have been running 2.1.3 on a couple of servers that do not get many requests for a while now without any problems.  On busy servers, the back-bdb code runs out of bdb locks under load, and so I would assume that we should check that we get a valid lock every time we ask for one, which is exactly what my patch does (although there may be better ways of doing this).  I will test again when 2.1.4 is released, and let you know whether you have fixed the problem in the openldap source (as I came up with a test on our system which would consistently lock the ldap server within 1-2 minutes of starting it).

Steven  

Comment 4 Howard Chu 2002-08-20 01:24:36 UTC

We will fix the code to check the result from LOCK_ID. In the meantime,
you should configure more BDB locks in your database environment to avoid
running out of them.

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support 

Comment 5 Kurt Zeilenga 2002-08-20 01:48:11 UTC

I agree that the return result of LOCK_ID() should be checked.
I've added code which causes an LDAP_OTHER error if LOCK_ID()
fails, which in a quick check of DB4 code, is consistent with
possible error conditions.

Kurt

Comment 6 steven.wilton@team.eftel.com 2002-08-20 08:32:36 UTC

How about adding the following lines to the patch you have applied to cvs?  If the lock is rejected for the given reason, there is nothing major wrong with the database, but we should retry. The client program does not know that the ldap server is only having a temporary error getting the data (as opposed to if the lock is rejected due to something like a corrupt database, where we should send an error back to the client).

+retry:
                rc = LOCK_ID ( bdb->bi_dbenv, &locker );
                switch(rc) {
                case 0:
                        break;
+               case DB_LOCK_NOTGRANTED:
+                       ldap_pvt_thread_yield();
+                       goto retry;
                default:
                        return LDAP_OTHER;
                }

We use ldap to authenticate users, and if one of the ldap client programs detects an error, unusual things will happen on the system (some requests will work, while a random number of connections will fail for no good reason).

Steven

Comment 7 Kurt Zeilenga 2002-08-20 16:30:55 UTC

At 01:33 AM 2002-08-20, steven.wilton@team.eftel.com wrote:
>How about adding the following lines to the patch you have applied to cvs? 

Because, as far as I can tell from looking at DB4 sources,
LOCK_ID() does not return DB_LOCK_NOTGRANTED.

The kinds of errors LOCK_ID() does return, such as ENOMEM,
are generally mapped to LDAP_OTHER in slapd(8).  LDAP_BUSY
is a possibility here.

I note that looping while waiting for resources to free generally
makes resource starvation problems worse, not better.
Resource starvation is best resolved by making more resources
available to the process (or by coding changes to reduce the
demand for resources).

Kurt

Comment 8 steven.wilton@team.eftel.com 2002-08-21 01:57:37 UTC

Sorry to be a pain, but I really don't like the idea of returning an error to the client under this condition.  I have been running with the first patch I sent to you since just before I submitted it, and it has not caused any problems.  When a locker is unavailable, the program loops ~20-30 times (which takes practically no time), and then continues when a locker is freed.

The reason I am trying to figure out another way around this is that sending an ldap error to the client because the server is too busy (which is not really true, as a locker does become available almost immediately) will cause us errors that are hard to debug, as the ldap server starts randomly rejecting requests as lockers become scarce.  If we increase the number of lockers, we will just delay the problem, as the ldap server becomes busier, and starts using even more lockers, we will hit the limit again.  I would prefer to see worse performance while waiting for a locker to become available rather than having an ldap lookup fail, which will cause problems for us.

I have had a closer look at the db4 source, and it looks like you are right: ENOMEM is returned from the lock_id() routine.
Looking at the possible return codes from the __lock_id function in lock/lock.c, I see:
ret=0 at the top (as the default)
ret=EINVAL
ret = __lock_getlocker(lt, *idp, locker_ndx, 1, &lk);
return (ret);

inside the __lock_getlocker() function we have:
return (ENOMEM); (which is the part of the code I was getting an error from)
return (0); (the default)

So... as far as I can see, lock_id() will return EINVAL, ENOMEM or 0.

ENOMEM is returned when "Lock table is out of available locker entries".  
As far as I can tell (and please correct me if I am wrong), the reason that we run out of locks is because other threads are holding onto them.  
Increasing the number of locks will possibly improve performance (as we don't need to wait for another thread to finish with its lock), but as long as we are getting an ENOMEM error, the database is out of locks (because another thread is holding the lock), and we should loop until the other thread frees the lock.  This certainly fixes the problem on our system, as the first patch I submitted has been running for the past day or two without any problems.

What I am not sure about is how many locker entries may be being held by each thread, and how many are currently enabled in the slapd code.  The defaults should be 1000 (according to the db4 docs), which is a lot more than I thought slapd should use.

thanks

Steven


Comment 9 Kurt Zeilenga 2002-08-21 03:31:20 UTC

Your suggestion will quite likely result in resource deadlock.
It will certainly spend a huge number of cycles unnecessarily
in a busy loop.  A loop which includes a back-off delay and
is finite might be acceptable.
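
A finite loop with a back-off delay could look roughly like the sketch below. It is an illustration under stated assumptions, not the code that went into slapd: the names alloc_fn, alloc_locker_bounded, backoff_sleep and flaky_alloc are all hypothetical, the delay uses POSIX select(), and only ENOMEM is treated as worth retrying.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Shape of a locker allocator: returns 0 on success or an errno value. */
typedef int (*alloc_fn)(void *ctx, unsigned *locker);

/* Sleep for about `ms` milliseconds (POSIX select() with no descriptors). */
static void backoff_sleep(long ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    (void)select(0, NULL, NULL, NULL, &tv);
}

/* Call the allocator at most max_tries times, doubling the delay after
 * each ENOMEM. Any other result is returned immediately. Returns 0 on
 * success, otherwise the last error code (ENOMEM if never attempted). */
int alloc_locker_bounded(alloc_fn alloc, void *ctx, unsigned *locker,
                         int max_tries, long first_delay_ms)
{
    long delay = first_delay_ms;
    int rc = ENOMEM;
    int i;

    for (i = 0; i < max_tries; i++) {
        rc = alloc(ctx, locker);
        if (rc != ENOMEM)
            return rc;              /* success or a non-retryable error */
        if (i + 1 < max_tries) {    /* no point sleeping after the last try */
            backoff_sleep(delay);
            delay *= 2;
        }
    }
    return rc;                      /* still ENOMEM: give up */
}

/* Demo allocator: fails with ENOMEM for the number of calls stored in
 * *ctx, then succeeds. Exists only to exercise the loop above. */
int flaky_alloc(void *ctx, unsigned *locker)
{
    int *remaining_failures = ctx;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return ENOMEM;
    }
    *locker = 1;
    return 0;
}
```

Because the loop is bounded, a thread that cannot get a locker eventually gives up and reports the failure instead of spinning while it holds other resources.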

A few additional comments...

At 06:58 PM 2002-08-20, steven.wilton@team.eftel.com wrote:
>So... as far as I can see, lock_id() will return EINVAL, ENOMEM or 0.

I'm looking at a newer version; it only returns 0, ENOMEM, and,
under some odd circumstances, a range of other system result
codes.  The only one of concern here is ENOMEM.

>ENOMEM is returned when "Lock table is out of available locker entries".

This code is also returned when memory allocation (malloc) fails.

>As far as I can tell (and please correct me if I am wrong), the reason that we run out of locks is because other threads are holding onto them.

Or this thread.

>Increasing the number of locks will possibly improve performance (as we don't need to wait for another thread to finish with its lock),

Performance?  If you are waiting (not in a busy loop), you are
not significantly hindering performance.  The issue is how to
prevent waiting forever... that is, how to prevent resource
deadlock.

>but as long as we are getting an ENOMEM error, the database is out of locks (because another thread is holding the lock)

or this thread.

>, and we should loop until the other thread frees the lock.

The other threads could be doing the same, looping for this
thread to free resources.

>This certainly fixes the problem on our system, as the first patch I submitted has been running for the past day or two without any problems.

You are just lucky in that you have not reached resource deadlock.

>What I am not sure about is how many locker entries may be being held by each thread, and how many are currently enabled in the slapd code.  The defaults should be 1000 (according to the db4 docs), which is a lot more than I thought slapd should use.

Lots of locks are needed for fine grain locking...  I believe
some guidelines for DB settings were posted to the software
list.

Comment 10 steven.wilton@team.eftel.com 2002-08-21 05:01:57 UTC

On Wed, 21 Aug 2002 11:31:20 Kurt D. Zeilenga wrote:
>Your suggestion will quite likely result in resource deadlock.
>It will certainly spend huge amount of cycles unnecessarily
>in a busy loop.  A loop which includes a back-off delay and
>is finite might be acceptable.

This does sound like the best solution.  I prefer the idea of returning LDAP_BUSY, as this error will only occur while the server is under load.

>A few additional comments...
>At 06:58 PM 2002-08-20, steven.wilton@team.eftel.com wrote:
>>So... as far as I can see, lock_id() will return EINVAL, ENOMEM or 0.
>I'm looking at a newer version; it only returns 0, ENOMEM, and,
>under some odd circumstances, a range of other system result
>codes.  The only one of concern here is ENOMEM.
>>ENOMEM is returned when "Lock table is out of available locker entries".
>
>This code is also returned when memory allocation (malloc) fails.

I am looking at bdb 4.0.14, which is the current release.  ENOMEM is returned in different functions for different reasons, but in the __lock_id() function it is only returned in the one case where no lockers are available.

>>As far as I can tell (and please correct me if I am wrong), the reason
>>that we run out of locks is because other threads are holding onto them.
>Or this thread.
>>Increasing the number of locks will possibly improve performance (as we
>>don't need to wait for another thread to finish with its lock),
>Performance?  If you are waiting (not in a busy loop), you are
>not significantly hindering performance.  The issue is how to
>prevent waiting forever... that is, how to prevent resource
>deadlock.
>>but as long as we are getting an ENOMEM error, the database is out of
>>locks (because another thread is holding the lock)
>or this thread.

Ahh, I didn't realise that one thread could hold open more than one locker.  That would make my code bad :)

>>, and we should loop until the other thread frees the lock.
>The other threads could be doing the same, looping for this
>thread to free resources.

Oops, I didn't think of this either.  This makes my code _really_ bad :)

>>This certainly fixes the problem on our system, as the first patch I
>>submitted has been running for the past day or two without any problems.
>You are just lucky in that you have not reached resource deadlock.

Yes, judging from the above I have just been lucky so far.

>>What I am not sure about is how many locker entries may be being held by
>>each thread, and how many are currently enabled in the slapd code.  The
>>defaults should be 1000 (according to the db4 docs), which is a lot more
>>than I thought slapd should use.
>Lots of locks are needed for fine grain locking...  I believe
>some guidelines for DB settings were posted to the software
>list.

I will go and play with the number of locks that are available in the db environment.  Is it worth making some of these db options configuration file options, as people will probably have to play with them once they start using bdb4 database backends?

Comment 11 Kurt Zeilenga 2002-08-21 05:37:05 UTC

At 10:02 PM 2002-08-20, steven.wilton@team.eftel.com wrote:
>On Wed, 21 Aug 2002 11:31:20 Kurt D. Zeilenga wrote:
>>Your suggestion will quite likely result in resource deadlock.
>>It will certainly spend huge amount of cycles unnecessarily
>>in a busy loop.  A loop which includes a back-off delay and
>>is finite might be acceptable.
>
>This does sound like the best solution.  I prefer the idea of returning LDAP_BUSY, as this error will only occur while the server is under load.

We've traditionally returned LDAP_OTHER on resource exhaustion
(thinking that the condition would likely not go away).  But
I'm fine with LDAP_BUSY (especially as we change the server
to better deal with resource exhaustion).
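
As an illustration of that choice, a mapping from the locker allocation result to an LDAP result code might be sketched as follows. This is an assumption about how such a mapping could look, not the code committed to slapd. The function name locker_error_to_ldap is hypothetical, and the constants carry a SKETCH_ prefix so they do not clash with the real macros in ldap.h; their values are the standard LDAP result codes for success (0), busy (51) and other (80).

```c
#include <errno.h>

/* Standard LDAP result code values, prefixed SKETCH_ so they do not
 * collide with the LDAP_SUCCESS, LDAP_BUSY and LDAP_OTHER macros. */
#define SKETCH_LDAP_SUCCESS 0x00
#define SKETCH_LDAP_BUSY    0x33
#define SKETCH_LDAP_OTHER   0x50

/* Hypothetical mapping: running out of lockers (ENOMEM) is reported as
 * a transient "busy" condition, anything else as a generic failure. */
int locker_error_to_ldap(int db_rc)
{
    switch (db_rc) {
    case 0:
        return SKETCH_LDAP_SUCCESS;
    case ENOMEM:
        return SKETCH_LDAP_BUSY;
    default:
        return SKETCH_LDAP_OTHER;
    }
}
```

A client that treats busy as retryable can then tell a loaded server apart from a broken one.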

> Is it worth making some of these db options configuration
> file options, as people will probably have to play with them
> once they start using bdb4 database backends?

No.  See software/devel list discussions.

Comment 12 Howard Chu 2002-08-22 10:17:39 UTC

> -----Original Message-----
> From: owner-openldap-bugs@OpenLDAP.org
> [mailto:owner-openldap-bugs@OpenLDAP.org]On Behalf Of
> steven.wilton@team.eftel.com

> If we increase the number of lockers, we will just
> delay the problem, as the ldap server becomes busier, and starts
> using even more lockers, we will hit the limit again.

No, this is not an unbounded problem. The maximum number of locks needed is
tied to the maximum number of slapd threads. It will also depend on the
number of attribute indices you have configured. Each index is stored in its
own database, and some number of locks are needed to navigate those databases
as well. You can use db_stat to help arrive at the ideal number of locks for
your configuration. Eventually you will reach a high-water mark and no more
locks will be needed.

See the BDB "Configuring locking: sizing the system" doc for more guidelines.
http://www.sleepycat.com/docs/ref/lock/max.html
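
As a rough illustration, the lock table sizes can be raised with a DB_CONFIG file placed in the database environment directory. The directive names below are Berkeley DB's own. The numbers are placeholders only, not recommendations, and should be replaced with values derived from what db_stat reports under your own load.

```
# DB_CONFIG (placeholder values; size these from db_stat output)
set_lk_max_lockers 2000
set_lk_max_locks   2000
set_lk_max_objects 2000
```

Berkeley DB reads this file when the environment is opened, so slapd has to be restarted (and an existing environment re-created) for new sizes to take effect.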

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support

Comment 13 Kurt Zeilenga 2002-08-22 12:43:00 UTC

changed notes
changed state Open to Closed

Comment 14 Howard Chu 2006-06-11 08:51:19 UTC

moved from Software Bugs to Archive.Software Bugs

Comment 15 OpenLDAP project 2014-08-01 21:06:25 UTC

fixed in HEAD
fixed in re21