Re: (ITS#5508) slapd process consumes all of CPU

Bill MacAllister wrote:
> Howard,
> We just upgraded to 2.3.42 and are seeing the problem again in our
> production environment.  I have attached a backtrace and db_stat output.
> This output is from the busy server.  I will disconnect the network next
> and see what happens.

The db_stat output seems to indicate that entries being fetched for read
access are not getting unlocked. Is this behavior new for you as of 2.3.4x?
What is the last release version in which it does not occur?
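
For reference, the locker ID sanity check from my earlier message (quoted
below) can be sketched in a few lines. This is only an illustration of the
"bit 31 set" rule of thumb, not code from BDB or slapd. The function name
and the sample values are made up for the example, and it assumes the
locker ID has already been read out of the backtrace or db_stat output as
an integer.

```python
# Rule of thumb from this thread: a BDB locker ID that belongs to a
# transaction normally has bit 31 set, so it reads as a very large
# 32-bit number.
TXN_BIT = 1 << 31  # 0x80000000


def looks_like_txn_locker(locker_id: int) -> bool:
    """Return True if bit 31 is set in the 32-bit locker ID."""
    return (locker_id & 0xFFFFFFFF & TXN_BIT) != 0


# Hypothetical sample values, not taken from the attached output.
for sample in (0x80000005, 0x000001A4):
    kind = "transactional" if looks_like_txn_locker(sample) else "not transactional"
    print(f"{sample:#010x}: {kind}")
```

A small locker value inside a transactional code path is the kind of
mismatch that makes an optimized backtrace hard to trust.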
> Bill
> --On Tuesday, May 13, 2008 02:45:06 AM -0700 Howard Chu <hyc@symas.com>
> wrote:
>> Bill MacAllister wrote:
>>> Attached is the output of db4.2_stat -CA of the database.
>>> Thanks for looking at this.
>> So far it just looks like a very busy server. Can you turn off the
>> network access to it and see if it settles down when the query traffic
>> stops?
>> It's a bit odd that a single transaction has so many pages of the
>> suPrivilegeGroup index locked.
>> The backtrace is somewhat suspicious; there are several <value optimized
>> out> items in the trace. In thread 8, frames 5 and 6, the locker value is
>> odd; usually in BDB the locker ID associated with a transaction has bit
>> 31 set, yielding a very large 32 bit number. Also there is no locker with
>> that ID in the db_stat output you provided.
>> It looks like you'll have to try this again with a non-optimized binary
>> to get a reliable backtrace.
>>> Bill
>>> --On Tuesday, May 13, 2008 01:20:49 AM -0700 Howard Chu <hyc@symas.com>
>>> wrote:
>>>> whm@stanford.edu wrote:
>>>>> Full_Name: Bill MacAllister
>>>>> Version: 2.3.41-1su2
>>>>> OS: debian etch kernel 2.6.18-4-amd64
>>>>> URL: http://www.stanford.edu/~whm/ldap-test1-bt.txt
>>>>> Submission from: (NULL) (
>>>>> The slapd process will sometimes consume all of available CPU.  We
>>>>> observed this when we upgraded our production servers from 2.3.35-2su2
>>>>> to 2.3.41-1su2.  The problem was bad enough that we downgraded the
>>>>> production servers to 2.3.35-2su2. We have been trying to provoke the
>>>>> problem in our test environment and have not been successful in
>>>>> making it happen on demand.  Today, we noticed that one of our test
>>>>> servers went completely CPU bound.  I took a backtrace.  It is
>>>>> available at the URL below.  The interesting thing about the problem
>>>>> is that although top shows a pinned CPU and a high load the server is
>>>>> still responsive and continues to answer LDAP searches.  The test
>>>>> server that exhibits the problem is still CPU bound and has been for
>>>>> 2-3 hours now.  We will leave this server in this state in case there
>>>>> is other information that we should harvest in resolving the problem.
>>>> Please also provide the output from db_stat -CA on the database in
>>>> question, thanks.
> +--------------------------------------------------------
> | Bill MacAllister <whm@stanford.edu>
> | Systems Software Programmer, ITS Unix Systems, Stanford University
> +----------------------------------------------------------
> |  "My heart is warm with the friends I make, And better
> |   friends I'll not be knowing; Yet there isn't a train
> |   I wouldn't take, No matter where it's going."
> |                         Edna St. Vincent Millay

   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/