Re: slapd crashing "randomly?"
- To: Daniel Henninger <firstname.lastname@example.org>
- Subject: Re: slapd crashing "randomly?"
- From: Daniel Henninger <email@example.com>
- Date: Wed, 11 Apr 2007 13:25:01 -0400
- Cc: firstname.lastname@example.org
- In-reply-to: <123A6D86-1EB9-4E1A-9324-273FE23160B9@ncsu.edu>
- References: <email@example.com> <firstname.lastname@example.org> <4A1A9331E81817A54E2EF9A2@SW-90-717-287-3.stanford.edu> <123A6D86-1EB9-4E1A-9324-273FE23160B9@ncsu.edu>
It's been a while, but I finally caught a core dump. Of course, I'm
not entirely sure why there are so few useful symbols showing, since I
compiled it with debugging symbols and didn't strip it. =/ Anyway,
the information I got from it is interesting:
#0 0x000b4694 in ?? ()
#1 0x000e175c in avl_delete ()
#2 0x000b4c48 in bdb_idl_cache_put ()
#3 0x000b5930 in bdb_idl_fetch_key ()
#4 0x000b796c in bdb_key_read ()
#5 0x000b30b0 in bdb_filter_candidates ()
#6 0x000b3a28 in ?? ()
#7 0x000b3a28 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Is it possible I just have a busted version of Berkeley DB?! What
version are you all using? (I guess it's Oracle DB now...) We are
using version 4.2.52, built with --enable-compat185.
On Feb 12, 2007, at 11:36 AM, Daniel Henninger wrote:
On Feb 6, 2007, at 3:34 PM, Quanah Gibson-Mount wrote:
--On Tuesday, February 06, 2007 1:35 PM -0500 matthew sporleder
On 2/6/07, email@example.com <firstname.lastname@example.org> wrote:
I want to start this message by saying that what I'm about to say is
completely vague and I don't expect to get a solution response. ;)
Basically, I'm out of ideas and am looking for some suggestions
as to how
to debug the issue I'm running into.
Starting about half a year ago, slapd started "just dying" out of the
blue. Not a thing in the logs shows up to indicate what might have
caused it. The last query that I see in the logs before a crash
seems to be nothing special. I don't even see a core dump being
generated yet, but then that may just be because I don't have
setup to get a core dump at this time. We were running the last
upgraded to the latest release of 2.3 to make sure it wasn't an
version" issue. Unfortunately, slapd still dies a fair amount on
appears to be fairly unpredictable. I've seen it crash within 1
of starting up slapd (then a subsequent startup 'takes' just fine).
I've seen it crash when there were a number of network issues
I've seen it crash out of the blue when nothing appeared to be
I don't really have the drive space to turn on max debug logging
until the problem occurs.
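
One way to bound the disk usage is to keep only the tail of the debug
output, so that whatever was logged just before a crash survives. The
following is a minimal Python sketch, not something from the thread: the
function name keep_last_lines and the line limit are hypothetical, and it
assumes the debug output can be piped into a script line by line.

```python
import collections


def keep_last_lines(stream, max_lines):
    """Return the last `max_lines` lines read from `stream`.

    A deque with maxlen drops the oldest entry automatically, so memory
    use stays bounded no matter how much output is produced.
    """
    tail = collections.deque(maxlen=max_lines)
    for line in stream:
        tail.append(line.rstrip("\n"))
    return list(tail)


# Hypothetical usage, with the server's debug output piped to stdin:
#
#     import sys
#     for line in keep_last_lines(sys.stdin, 5000):
#         print(line)
```

The buffer is only written out once the input ends, which here would be
when the server process dies and closes its end of the pipe.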
We're thinking about setting up something to watch all of the
traffic going to one of the boxes until it dies. (assuming we
something with the resources to do that)
That all said... since I have nothing solid to present, do you
any suggestions of what would be the best way to track down
on? I'm literally out of ideas unless my Berkeley DB config is
causing the problem or something like that.
I apologize for the vagueness. =/ Any ideas/suggestions?
After the crash, is your bdb environment clean, or is it needing a
Depending on your OS, you could watch the pid all the time and trap
the last signals received, last files accessed, etc, and that
take tons of resources.
You could try turning on max debugging and simply rotate a lot more
often. (every n minutes or even seconds) This way you could
definitely keep the -last- transactions and just not worry about the
Also, what database backend are you using? Why not build slapd
with debugging symbols so you can get a core?
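
The two suggestions above (trap the last signal received, and get a
core) can be sketched as a small Python wrapper. This is only an
illustration under assumptions: it is POSIX-only, the names
allow_core_dumps and run_and_report are hypothetical, and it assumes the
server is started so that it stays in the foreground rather than
detaching, since otherwise the wrapper only sees the parent exit.
Whether a core file is actually written still depends on the operating
system's own core dump settings.

```python
import resource
import signal
import subprocess
import sys


def allow_core_dumps():
    """Raise the soft core-size limit to the hard limit.

    Runs in the child just before exec, so only the launched
    program is affected.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def run_and_report(argv):
    """Run argv, wait for it to end, and describe how it ended."""
    proc = subprocess.run(argv, preexec_fn=allow_core_dumps)
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was
        # killed by signal number -returncode.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exited with status %d" % proc.returncode


# Hypothetical usage: pass the full server command line as a list, e.g.
#     print(run_and_report(sys.argv[1:]))
```

A result such as "killed by SIGSEGV" or "killed by SIGBUS" would at
least narrow down what kind of crash it is, even before a usable core
is available.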
and I am planning on doing so ;D
What version of 2.3 are you running at the moment? You say you
had upgraded to the latest release at some point, but not what
release that was. Up until around 2.3.28, there were issues in
the connection code that caused random crashes on my servers.
2.3.33 would be your best bet to eliminate that as an issue if you
aren't there yet.
2.3.32 is what we're running right now. I've been sticking with
the version that's labelled as "stable". Do y'all recommend going
with the release instead of the "stable"?
I've been having this issue since at least 2.2.whatever, so it's
been going on for quite some time version-wise. Time-wise, I still
think something may have changed in my world to cause all of this,
but just can't track it down.
Anyway, I'm working on setting up some things with which I can
Principal Software Developer
ITS/Shared Application Services
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html