RE: BDB recovery after power outage

To: Howard Chu <hyc@highlandsun.com>
Subject: RE: BDB recovery after power outage
From: "Luke A. Kanies" <luke@madstop.com>
Date: Sun, 20 Apr 2003 21:04:40 -0500 (CDT)
Cc: openldap-software@OpenLDAP.org
In-reply-to: <002e01c307a7$c23bd5e0$0e01a8c0@CELLO>
References: <002e01c307a7$c23bd5e0$0e01a8c0@CELLO>

On Sun, 20 Apr 2003, Howard Chu wrote:

> > -----Original Message-----
> > From: owner-openldap-software@OpenLDAP.org
> > [mailto:owner-openldap-software@OpenLDAP.org]On Behalf Of Luke A. Kanies
>
> > It's quite an assumption on your part that I haven't read any of the
> > documentation, when it's pretty obvious that one must read at
> > least some
> > documentation in order to get OpenLDAP running at all.
>
> You wrote:
>
> >>>Actually, the first one resulted first in a significant outage, since the
> >>>database apparently wasn't clean or something (no, I'm not that familiar
> >>>with BDB), which caused slapd to conveniently just sit there unable to
> >>>open the database and taking 100% of my CPU; it also wouldn't respond to
> >>>signals or give any feedback.  I finally figured out that the database had
> >>>to be recovered, but said recovery resulted in data loss (and not data
> >>>that was incredibly recent, either).
>
> The slapd-bdb(5) manpage specifically talks about a "checkpoint" keyword,
> used to guarantee that buffered data is flushed to disk. If you had actually
> read the available documentation you would (a) not be unfamiliar with BDB and
> (b) not be dealing with lost data.

Thank you for that; that should hopefully do the trick.
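
For reference, a minimal sketch of what such a setting might look like in
the bdb database section of slapd.conf. The suffix, the directory, and the
two numbers are illustrative assumptions, not recommended values;
slapd-bdb(5) gives the syntax as "checkpoint <kbyte> <min>".

```
# Hypothetical bdb database section; suffix and directory are placeholders.
database    bdb
suffix      "dc=example,dc=com"
directory   /var/lib/ldap

# checkpoint <kbyte> <min>
# Checkpoint after roughly 1024 KB of log data has been written or 5 minutes
# have passed since the last checkpoint, whichever comes first
# (example values only).
checkpoint  1024 5
```

Smaller values should narrow the window of data that can be lost after a
crash, at the cost of more frequent flushing.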

See, I have read the documentation, but I didn't know that manpage
existed.  I've been mostly relying on this page:

http://www.openldap.org/doc/admin21/slapdconfig.html

I somehow missed the mention of slapd-bdb(5) on it.  Mistakes happen,
which is why I was asking for help.  I think I was mostly confused because
the page mentions a number of ldbm directives but no bdb-specific
ones.

I never said that I had read all of the documentation, just that I had read
some and could not find the source of my problem.  For you to retort
that I obviously had not read a damn thing was, again, insulting.

> The reason that slapd was hung on restart is because BerkeleyDB writes its
> lock information into its environment files. These files persist past program
> restarts and system restarts. If a program using a BDB environment exits
> uncleanly, it leaves its lock records in the environment and you must use
> db_recover to clean things up. This also is spelled out in the BDB
> documentation.

The thing is, I don't really have the desire or the time to read the
entirety of the documentation for everything I use.  I've been using BDB
with cfengine for more than six months now, and have never had to look into
any of this, even though some of my cf daemons have died badly many times
(mostly because of testing).  Yes, OpenLDAP has greater requirements than
cfengine, but I figured those would be performance-related requirements,
not data integrity-related, so I figured I could ignore them.  I regret
that decision now.

Thanks for this info.

> In early versions of OpenLDAP 2.1 we had slapd automatically perform recovery
> whenever it started. However, this caused problems if you accidentally
> started a second slapd while one was running - the second recovery would wipe
> out the environment that the first one was using. Since all of the locking
> information is contained inside the BDB environment, there was no locking
> mechanism to prevent this occurrence. So now we no longer do automatic
> recovery; it's up to you to run db_recover by hand or add it to your server
> startup scripts as needed.

So that means it's safe to add it to the startup scripts?  I have read the
BDB docs on db_recover many times, but I personally don't find the BDB
docs very readable.  They seem to assume a lot of knowledge I apparently
don't have, or they're just written in a way that will always confuse me.
My goal in this is not to know everything there is to know about BDB, but
to use it to get done what I need done.  As a result, I cut some corners.
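
One way the startup-script idea could be sketched is below. This is only an
illustration under stated assumptions: the database directory, the pid file
path, and the bare "slapd" invocation are hypothetical placeholders, and the
pid-file test is a heuristic rather than a real lock. It refuses to run
db_recover when a slapd process appears to be alive, since recovering
underneath a running slapd is the failure described above.

```python
import os
import subprocess
import sys

# Hypothetical paths; adjust to match the "directory" and "pidfile"
# settings of the local slapd.conf.
DB_HOME = "/var/lib/ldap"
PID_FILE = "/var/run/slapd.pid"


def slapd_running(pid_file):
    """Return True if pid_file names a process that currently exists."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # No pid file, or unreadable contents: assume nothing is running.
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return False     # stale pid file
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


def recovery_command(db_home):
    """Build the db_recover command line for the given BDB environment."""
    return ["db_recover", "-v", "-h", db_home]


def main():
    if slapd_running(PID_FILE):
        sys.exit("slapd appears to be running; refusing to run db_recover")
    subprocess.run(recovery_command(DB_HOME), check=True)
    subprocess.run(["slapd"], check=True)  # assumed start command


if __name__ == "__main__":
    main()
```

A stale pid file whose number has been reused by an unrelated process would
make this refuse to start, which errs on the side of not touching the
environment.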

Thank you for the help.  Hopefully I won't run into these problems again,
now that I know about the 'checkpoint' keyword.  I highly recommend that
someone add a mention of it to the slapd.conf page that I pointed to
above.

Luke

-- 
"Did you know that black paint is an excellent stain remover?"
                                       - Dogbert
