
Re: Continued instability on Solaris 7



Kurt D. Zeilenga said:
> At 09:27 AM 2002-08-05, Alan Sparks wrote:
>
>>I've upgraded to ENG_RELEASE_2_1 as of about a week ago,
> We've made a number of updates to re21 since then...

Well, I can take a look at the latest re21...

>>and made sure I
>>build with BDB 4.0.14 statically linked.  Built with GCC 2.95.2.
> You might want to verify BDB is working properly.  Sleepycat
> includes a test suite as part of the BerkeleyDB distribution.
> With GCC, use "-O -g" (i.e., no -O2 or greater).

Double-checked: only -O optimization on both the OpenLDAP and BDB builds.
The BDB tests appeared to complete successfully; I'm running them again
just to be totally sure.

>>Unfortunately, nothing updates in the slave (same software build).
> What do the slave's logs say is happening?

I can't find anything in the slave logs.  I'm running at loglevel 256 to
avoid catastrophic overlogging and performance issues, since this is a
production server.  I've started running slurpd as a background job
with -d 255, with output redirected to a file.  Am waiting for a fault....
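
For anyone who wants to reproduce the capture, here is a rough sketch (in
Python) of starting a daemon in the background with its debug output
appended to a file.  The helper name and the log path are my own
assumptions for illustration; only the "slurpd -d 255" invocation comes
from the setup described above.

```python
import subprocess


def start_with_debug_log(command, log_path):
    """Start `command` in the background, appending stdout and stderr
    to `log_path`. Returns the Popen handle so the caller can wait on it."""
    log_file = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # debug output usually goes to stderr
        )
    finally:
        # The child holds its own copy of the descriptor; ours can be closed.
        log_file.close()
    return proc


# Hypothetical usage (the log path is an assumption, not from my setup):
# start_with_debug_log(["slurpd", "-d", "255"], "/var/tmp/slurpd-debug.log")
```

Appending rather than truncating keeps the output from earlier runs, which
matters when waiting for an intermittent fault.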

>
>>Sending a SIGTERM to the slave slapd causes it to stop responding and
>> suddenly begin consuming 80% of CPU, until sent a SIGKILL.
> It's likely trying hard to complete an update.  The SIGKILL,
> of course, causes it to drop the update on the floor.

Hmmmm... wonder why it suddenly got so hard to do... :-/

One thing I had been doing that I've now stopped, to avoid confusing things:
I had been shutting down the master and the slave nightly (one hour apart)
with SIGTERM before slapcat'ting the database for backup.  I don't know
why that might cause problems, but...
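
In case the shape of that nightly job matters, it amounts to roughly the
following Python sketch: send SIGTERM, wait for the process to actually
exit, and only then dump the database.  This is an illustration, not my
actual script; the function names and the timeout are assumptions.  The
one deliberate choice is that it never escalates to SIGKILL, since that
would drop any in-progress update on the floor.

```python
import os
import signal
import subprocess
import time


def pid_is_alive(pid):
    """Return True if a process with this pid still exists.
    Note: an unreaped child of the caller (a zombie) also counts as alive."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def stop_gracefully(pid, timeout=300.0, poll=1.0, is_alive=pid_is_alive):
    """Send SIGTERM and wait up to `timeout` seconds for the process to exit.
    Returns True if it exited, False if it is still running."""
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return True
        time.sleep(poll)
    return not is_alive(pid)


def backup_after_stop(pid, ldif_path, dump_command=("slapcat",)):
    """Stop the server, then write an LDIF dump; refuse to dump if the
    server is still running."""
    if not stop_gracefully(pid):
        raise RuntimeError("slapd (pid %d) did not exit; skipping backup" % pid)
    # slapcat's -l option names the LDIF output file.
    subprocess.run(list(dump_command) + ["-l", ldif_path], check=True)
```

A False return from stop_gracefully() would correspond to the hang
described above, where the slave stops responding after SIGTERM.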

I've also upgraded the system to Sun's latest recommended patchset (it was
slightly out of date) and rebooted.  So far, I've run 24 hours without a
failure; I'll keep watching.

===========
Alan Sparks, UNIX/Linux Systems Administrator
<asparks@doublesparks.net>