
Re: slapd hangs at 100% cpu in sched_yield (ITS#2030)



On Wed, 21 Aug 2002 11:31:20 Kurt D. Zeilenga wrote:
>Your suggestion will quite likely result in resource deadlock.
>It will certainly waste a huge number of cycles unnecessarily
>in a busy loop.  A loop which includes a back-off delay and
>is finite might be acceptable.

This does sound like the best solution.  I prefer the idea of returning LDAP_BUSY, as this error will only occur while the server is under load.
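
A finite retry loop with a back-off delay that falls back to LDAP_BUSY might look roughly like the C sketch below. It is not slapd code. `try_get_locker()` is a hypothetical stand-in for a call such as lock_id() that returns ENOMEM when no locker entries are free, and `failures_left` is a hypothetical knob that makes it fail a set number of times. The 1 ms starting delay and the retry limit are arbitrary assumptions. LDAP_BUSY (51) and LDAP_OTHER (80) are the LDAPv3 result codes busy and other.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* LDAPv3 result codes: success(0), busy(51), other(80). */
#define LDAP_SUCCESS 0
#define LDAP_BUSY    51
#define LDAP_OTHER   80

/* Hypothetical knob: how many more calls should fail with ENOMEM. */
static int failures_left = 0;

/* Hypothetical stand-in for lock_id(): 0 on success, ENOMEM when the
 * (simulated) locker table is exhausted. */
static int try_get_locker(unsigned int *id)
{
    if (failures_left > 0) {
        failures_left--;
        return ENOMEM;
    }
    *id = 42;
    return 0;
}

/* Try at most max_tries times. After each ENOMEM, sleep (rather than
 * spin) and double the delay. Give up with LDAP_BUSY instead of
 * looping forever; any other error is reported as LDAP_OTHER. */
static int get_locker_with_backoff(unsigned int *id, int max_tries)
{
    long delay_ns = 1000000L;                 /* first delay: 1 ms */

    for (int attempt = 0; attempt < max_tries; attempt++) {
        int rc = try_get_locker(id);
        if (rc == 0)
            return LDAP_SUCCESS;
        if (rc != ENOMEM)
            return LDAP_OTHER;
        if (attempt + 1 < max_tries) {        /* no sleep after the last try */
            struct timespec ts;
            ts.tv_sec = delay_ns / 1000000000L;
            ts.tv_nsec = delay_ns % 1000000000L;
            nanosleep(&ts, NULL);
            delay_ns *= 2;                    /* exponential back-off */
        }
    }
    return LDAP_BUSY;
}
```

With `failures_left = 2` and `max_tries = 5`, the call succeeds on the third attempt and returns LDAP_SUCCESS. With more failures than tries, it sleeps a bounded total (1 ms + 2 ms for three tries) and then returns LDAP_BUSY. Because each failed attempt sleeps instead of calling sched_yield() in a tight loop, it does not pin the CPU. The finite limit is what lets a stuck operation eventually give up.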

>A few additional comments...
>At 06:58 PM 2002-08-20, steven.wilton@team.eftel.com wrote:
>>So... as far as I can see, lock_id() will return EINVAL, ENOMEM or 0.
>I'm looking at a newer version; it only returns 0, ENOMEM, and,
>under some odd circumstances, a range of other system result
>codes.  The only one of concern here is ENOMEM.
>>ENOMEM is returned when "Lock table is out of available locker entries".
>
>This code is also returned when memory allocation (malloc) fails.

I am looking at bdb 4.0.14, which is the current release.  ENOMEM is returned in different functions for different reasons, but in the __lock_id() function it is only returned in the one case where no lockers are available.

>>As far as I can tell (and please correct me if I am wrong), the reason
>>that we run out of locks is because other threads are holding onto them.
>Or this thread.
>>Increasing the number of locks will possibly improve performance (as we
>>don't need to wait for another thread to finish with its lock),
>Performance?  If you are waiting (not in a busy loop), you are
>not significantly hindering performance.  The issue is how to
>prevent waiting forever... that is, how to prevent resource
>deadlock.
>>but as long as we are getting an ENOMEM error, the database is out of
>>locks (because another thread is holding the lock)
>or this thread.

Ahh, I didn't realise that one thread could hold open more than one locker.  That would make my code bad :)

>>, and we should loop until the other thread frees the lock.
>The other threads could be doing the same, looping for this
>thread to free resources.

Oops, I didn't think of this either.  This makes my code _really_ bad :)
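
The resource deadlock Kurt describes can be reproduced with a toy, single-threaded C model. Two hypothetical workers share a pool of two locker entries, and each needs two to finish. `simulate()`, the pool size and the round-based scheduling are assumptions for illustration only; none of this is slapd or Berkeley DB code. A worker that finds the pool empty and keeps its locker while retrying (the spin loop) never finishes, and neither does the other. A worker that gives up and releases what it holds, as aborting the operation with LDAP_BUSY presumably would, lets both complete.

```c
#include <stdbool.h>
#include <string.h>

enum { POOL_SIZE = 2, NEEDED = 2, WORKERS = 2 };

static int pool_free;            /* unused locker entries in the pool */
static int held[WORKERS];        /* lockers currently held by each worker */
static bool done[WORKERS];       /* worker has finished its job */

/* Take one locker from the pool; false means the pool is exhausted
 * (the ENOMEM case). */
static bool take_locker(int w)
{
    if (pool_free == 0)
        return false;
    pool_free--;
    held[w]++;
    return true;
}

static void release_all(int w)
{
    pool_free += held[w];
    held[w] = 0;
}

/* Run the workers in lock-step rounds and return how many finished.
 * give_up == false: on failure a worker keeps what it holds and tries
 * again next round. give_up == true: it releases everything first. */
static int simulate(bool give_up, int max_rounds)
{
    int finished = 0;

    pool_free = POOL_SIZE;
    memset(held, 0, sizeof held);
    memset(done, 0, sizeof done);

    for (int round = 0; round < max_rounds && finished < WORKERS; round++) {
        for (int w = 0; w < WORKERS; w++) {
            if (done[w])
                continue;
            if (take_locker(w)) {
                if (held[w] == NEEDED) {      /* has everything: finish */
                    release_all(w);
                    done[w] = true;
                    finished++;
                }
            } else if (give_up) {
                release_all(w);               /* drop locks, retry later */
            }
            /* else: hold on and spin again next round */
        }
    }
    return finished;
}
```

Here `simulate(false, 100)` returns 0: after the first round each worker holds one locker and waits forever for the other's. `simulate(true, 100)` returns 2. With real threads, workers that all give up and retry at the same instant can livelock instead. That is one reason to add a delay, ideally a varying one, before retrying.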

>>This certainly fixes the problem on our system, as the first patch I
>>submitted has been running for the past day or two without any problems.
>You are just lucky in that you haven't reached resource deadlock.

Yes, judging from the above I have just been lucky so far.

>>What I am not sure about is how many locker entries may be held by
>>each thread, and how many are currently enabled in the slapd code.  The
>>defaults should be 1000 (according to the db4 docs), which is a lot more
>>than I thought slapd should use.
>Lots of locks are needed for fine-grained locking...  I believe
>some guidelines for DB settings were posted to the software
>list.

I will go and play with the number of locks that are available in the db environment.  Would it be worth exposing some of these db options as configuration file options, since people will probably have to tune them once they start using bdb4 database backends?