
Re: (ITS#5508) slapd process consumes all of CPU

--On Tuesday, May 13, 2008 02:45:06 AM -0700 Howard Chu <hyc@symas.com> wrote:

> Bill MacAllister wrote:
>> Attached is the output of db4.2_stat -CA of the database.
>> Thanks for looking at this.
> So far it just looks like a very busy server. Can you turn off the
> network access to it and see if it settles down when the query traffic
> stops?

Last night the server tried to do a log rotation.  When I look at the log 
now it is zero length and nothing is getting written to it.  An ldapsearch 
on the server just hangs.

I logged into the console, shut down the network interface, and the CPU 
is still pinned.

> It's a bit odd that a single transaction has so many pages of the
> suPrivilegeGroup index locked.
> The backtrace is somewhat suspicious, there are several <value optimized
> out> items in the trace. In thread 8, frames 5 and 6 the locker value is
> odd; usually in BDB the locker ID associated with a transaction has bit
> 31 set, yielding a very large 32 bit number. Also there is no locker with
> that ID in the db_stat output you provided.
> It looks like you'll have to try this again with a non-optimized binary
> to get a reliable backtrace.
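
For anyone reading the trace later, the bit-31 check described above can be sketched as follows. This is only an illustration of the rule of thumb as stated in the quoted text, not code from BDB; the function name and the sample values are hypothetical. The mask handles debuggers that print the locker as a signed 32-bit integer.

```python
def has_bit_31_set(locker_id: int) -> bool:
    """Return True if bit 31 of a 32-bit locker ID is set.

    Per the quoted rule of thumb, a locker ID that belongs to a
    transaction usually has this bit set. Negative inputs (a debugger
    printing the value as a signed int) are first folded into the
    unsigned 32-bit range.
    """
    unsigned = locker_id & 0xFFFFFFFF
    return (unsigned & 0x80000000) != 0


# Hypothetical values, not taken from the actual backtrace.
print(has_bit_31_set(0x80000005))   # large 32-bit value, bit 31 set
print(has_bit_31_set(0x00001234))   # small value, bit 31 clear
print(has_bit_31_set(-2147483643))  # same as 0x80000005, printed signed
```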

Yes, we were afraid of that.  I will build a debug version of bdb.  The 
real rub is that we don't seem to be able to make this happen on demand.  I 
took the log from the pinned server, turned it into a shell script of 
ldapsearch commands, and pointed it at another server.  I could 
not make the second server go CPU bound.  So, we will just have to deploy 
the debug bdb support on our test servers and wait.
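
A minimal sketch of that kind of log-to-script conversion, for illustration only; it is not the script that was actually used. It assumes the log was written with stats-level logging so that each search appears as a `SRCH base="..." scope=N deref=N filter="..."` line, that anonymous simple binds (`-x`) are acceptable on the target, and that the target URI `ldap://ldap-test2.example.org` is a placeholder. The function name `log_to_commands` is hypothetical.

```python
import re
import shlex

# Matches the search line that slapd writes at stats log level.
SRCH_RE = re.compile(
    r'SRCH base="(?P<base>[^"]*)" scope=(?P<scope>\d) '
    r'deref=\d filter="(?P<filter>.*)"$'
)

# Numeric scope in the log mapped to the ldapsearch -s argument.
SCOPES = {"0": "base", "1": "one", "2": "sub"}


def log_to_commands(lines, uri="ldap://ldap-test2.example.org"):
    """Yield one shell-quoted ldapsearch command per SRCH log line."""
    for line in lines:
        m = SRCH_RE.search(line.rstrip())
        if m is None:
            continue  # not a search line (BIND, RESULT, etc.)
        argv = [
            "ldapsearch", "-x", "-LLL", "-H", uri,
            "-b", m.group("base"),
            "-s", SCOPES.get(m.group("scope"), "sub"),
            m.group("filter"),
        ]
        yield " ".join(shlex.quote(a) for a in argv)


if __name__ == "__main__":
    import sys
    print("#!/bin/sh")
    for cmd in log_to_commands(sys.stdin):
        print(cmd)
```

Run as a filter, it reads a slapd log on standard input and writes a shell script on standard output. A replay like this issues the searches one at a time, so it does not reproduce the concurrency or timing of the original traffic.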


>> Bill
>> --On Tuesday, May 13, 2008 01:20:49 AM -0700 Howard Chu <hyc@symas.com>
>> wrote:
>>> whm@stanford.edu wrote:
>>>> Full_Name: Bill MacAllister
>>>> Version: 2.3.41-1su2
>>>> OS: debian etch kernel 2.6.18-4-amd64
>>>> URL: http://www.stanford.edu/~whm/ldap-test1-bt.txt
>>>> Submission from: (NULL) (
>>>> The slapd process will sometimes consume all of available CPU.  We
>>>> observed this when we upgraded our production servers from 2.3.35-2su2
>>>> to 2.3.41-1su2.  The problem was bad enough that we downgraded the
>>>> production servers to 2.3.35-2su2. We have been trying to provoke the
>>>> problem in our test environment and have not been successful in
>>>> making it happen on demand.  Today, we noticed that one of our test
>>>> servers went completely CPU bound.  I took a backtrace.  It is
>>>> available at the URL below.  The interesting thing about the problem
>>>> is that although top shows a pinned CPU and a high load the server is
>>>> still responsive and continues to answer LDAP searches.  The test
>>>> server that exhibits the problem is still CPU bound and has been for
>>>> 2-3 hours now.  We will leave this server in this state in case there
>>>> is other information that we should harvest in resolving the problem.
>>> Please also provide the output from db_stat -CA on the database in
>>> question, thanks.

Bill MacAllister <whm@stanford.edu>
Systems Programmer, ITS Unix Systems, Stanford University