Re: (ITS#5508) slapd process consumes all of CPU
--On Tuesday, May 13, 2008 02:45:06 AM -0700 Howard Chu <firstname.lastname@example.org> wrote:
> Bill MacAllister wrote:
>> Attached is the output of db4.2_stat -CA of the database.
>> Thanks for looking at this.
> So far it just looks like a very busy server. Can you turn off the
> network access to it and see if it settles down when the query traffic
Last night the server tried to do a log rotation. When I look at the log
now it is zero length and nothing is getting written to it. An ldapsearch
on the server just hangs.
I logged into the console and shut down the network interface, and the CPU
is still pinned.
> It's a bit odd that a single transaction has so many pages of the
> suPrivilegeGroup index locked.
> The backtrace is somewhat suspicious, there are several <value optimized
> out> items in the trace. In thread 8, frames 5 and 6 the locker value is
> odd; usually in BDB the locker ID associated with a transaction has bit
> 31 set, yielding a very large 32 bit number. Also there is no locker with
> that ID in the db_stat output you provided.
> It looks like you'll have to try this again with a non-optimized binary
> to get a reliable backtrace.
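
As a small illustration of the locker-ID observation quoted above, the check
below tests whether bit 31 is set in a 32-bit locker value. This is only a
sketch of the bit test Howard describes; the function name and the sample
values are hypothetical and are not taken from the backtrace or the db_stat
output.

```python
# Bit 31 of a 32-bit value; per the quoted note, transactional locker IDs
# in BDB usually have this bit set, which makes them very large numbers.
TXN_BIT = 1 << 31


def looks_like_txn_locker(locker_id):
    """Return True if bit 31 is set in the given 32-bit locker ID.

    locker_id is expected to be an int, e.g. int("80000a1f", 16) for a
    hex value copied by hand from a backtrace (hypothetical example).
    """
    return bool(locker_id & TXN_BIT)


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    for value in (0x80000A1F, 0x00001234):
        print(hex(value), looks_like_txn_locker(value))
```

A value that fails this test is not proof of corruption; it only flags the
locker as unusual in the sense described above.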
Yes, we were afraid of that. I will build a debug version of bdb. The
real rub is that we don't seem to be able to make this happen on demand. I
took the log from the pinned server, turned it into a shell script of
ldapsearch commands, and pointed it at another server. I could not make
the second server go CPU bound. So, we will just have to deploy the debug
bdb support on our test servers and wait.
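
For anyone wanting to try the same kind of replay, here is a rough sketch of
turning a slapd log into ldapsearch commands. It is not the script used
above. It assumes stats-level log lines of the form
SRCH base="..." scope=N deref=N filter="...", anonymous simple binds (-x),
and a placeholder target URI; the function name is hypothetical. Requested
attributes, binds, and timing between operations are ignored.

```python
import re
import shlex

# Assumed slapd stats-level log format, for example:
#   conn=1001 op=2 SRCH base="dc=example,dc=com" scope=2 deref=0 filter="(uid=jdoe)"
SRCH_RE = re.compile(
    r'SRCH base="(?P<base>[^"]*)" scope=(?P<scope>[012]) .*filter="(?P<filter>.*)"'
)

# Numeric scope in the log mapped to the ldapsearch -s argument.
SCOPES = {"0": "base", "1": "one", "2": "sub"}


def log_to_commands(lines, uri="ldap://ldap-test2.example.org"):
    """Return one shell-quoted ldapsearch command per SRCH line found.

    The default uri is a placeholder, not a real server name.
    """
    commands = []
    for line in lines:
        match = SRCH_RE.search(line)
        if match is None:
            continue  # not a search request line
        args = [
            "ldapsearch", "-x", "-H", uri,
            "-b", match.group("base"),
            "-s", SCOPES[match.group("scope")],
            match.group("filter"),
        ]
        commands.append(" ".join(shlex.quote(arg) for arg in args))
    return commands


if __name__ == "__main__":
    import sys

    # Usage: python replay.py < slapd.log > replay.sh
    print("#!/bin/sh")
    for command in log_to_commands(sys.stdin):
        print(command)
```

Note that a serial replay like this does not reproduce the concurrency of the
original traffic, which may be one reason a replay fails to trigger the
problem.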
>> --On Tuesday, May 13, 2008 01:20:49 AM -0700 Howard Chu <email@example.com> wrote:
>>> firstname.lastname@example.org wrote:
>>>> Full_Name: Bill MacAllister
>>>> Version: 2.3.41-1su2
>>>> OS: debian etch kernel 2.6.18-4-amd64
>>>> URL: http://www.stanford.edu/~whm/ldap-test1-bt.txt
>>>> Submission from: (NULL) (188.8.131.52)
>>>> The slapd process will sometimes consume all of available CPU. We
>>>> observed this when we upgraded our production servers from 2.3.35-2su2
>>>> to 2.3.41-1su2. The problem was bad enough that we downgraded the
>>>> production servers to 2.3.35-2su2. We have been trying to provoke the
>>>> problem in our test environment and have not been successful in
>>>> making it happen on demand. Today, we noticed that one of our test
>>>> servers went completely CPU bound. I took a backtrace. It is
>>>> available at the URL below. The interesting thing about the problem
>>>> is that although top shows a pinned CPU and a high load the server is
>>>> still responsive and continues to answer LDAP searches. The test
>>>> server that exhibits the problem is still CPU bound and has been for
>>>> 2-3 hours now. We will leave this server in this state in case there
>>>> is other information that we should harvest in resolving the problem.
>>> Please also provide the output from db_stat -CA on the database in
>>> question, thanks.
Bill MacAllister <email@example.com>
Systems Programmer, ITS Unix Systems, Stanford University