
RE: back-bdb DB_RECOVER and soft restart



After some more thought I realized that the mechanism I proposed cannot
detect a situation where there are multiple users of a BDB environment and
one or more of them exits unexpectedly, while at least one exits normally.
In that situation it is necessary, in principle, to do a recovery on the BDB
environment immediately after a process crashes, but that can't be done
while any other processes are still using the BDB environment, and so will
require manual intervention.

In this situation recovery would _not_ take place on the next system
restart, either.

After some discussion, Howard Chu and I came up with the following approach.
It has two advantages: it works with lockf-style exclusive locks, and it
makes it possible to tell by (programmatic) inspection of the lock file
whether a process has exited prematurely.

This approach works as follows:

On back-bdb startup, each instance of back-bdb will do the following:

1. Open the lock file called slaplock in the BDB environment directory
   with O_CREAT. This step is only to make sure there is something to lock,
   and it doesn't matter if the file already exists.
2. Wait for a lock on byte 0. This acts as a semaphore keeping other
   instances of back-bdb from starting during this process.
3. For each byte position p in the file whose value is non-zero,
   check for a corresponding lock at position p+1. A missing lock
   indicates a process exited abnormally. There are three cases here that
   are interesting:
   a. Each position in the lock file that had a non-zero value had a
      corresponding lock, or there were no non-zero values in the file.
      This means everything's OK and we go to step 4.
   b. One or more positions in the file had non-zero values but did not
      have corresponding locks, while at least one other non-zero
      position did have its lock. This means one or more processes exited
      abnormally, but the BDB environment is still in use. In this case
      we abort the back-bdb startup, logging a message that says something
      along the lines of "The BDB environment is corrupt".
   c. There were non-zero values in the file, but none of them had
      corresponding locks. This means one or more processes exited
      abnormally, and no other instances of back-bdb are using the BDB
      environment. We therefore set the DB_RECOVER flag, initialize the
      lock file to all zeros, and proceed.
4. Locate a position in the file that has a zero value, increment it,
   set the corresponding file lock, and update the file. Remember the
   position for shutdown.
5. Open the BDB environment, finish the backend initialization and clear
   the lock on byte 0 of the lock file.
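
To make the three cases in step 3 concrete, here is a rough sketch in C
using POSIX fcntl() record locks. It is only an illustration, not the
actual back-bdb code: the names classify_lock_file, locked_by_other and
the enum values are made up for this example, MAX_SLOTS is an assumed
upper bound on concurrent instances, and the caller is assumed to already
hold the lock on byte 0 (step 2). Note that F_GETLK only reports locks
held by other processes, which is what we want during startup.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_SLOTS 256   /* assumption: upper bound on concurrent instances */

/* Hypothetical result codes, one per case in step 3. */
enum env_state { ENV_OK, ENV_CORRUPT_IN_USE, ENV_NEEDS_RECOVERY, ENV_ERROR };

/* Returns 1 if another process holds a lock on byte `off`, 0 if none
 * does, -1 on error. F_GETLK never reports the caller's own locks. */
static int locked_by_other(int fd, off_t off)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = off;
    fl.l_len = 1;
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    return fl.l_type != F_UNLCK;
}

/* Step 3: scan the lock file. The caller must hold the byte 0 lock. */
static enum env_state classify_lock_file(int fd)
{
    unsigned char slots[MAX_SLOTS];
    int live = 0, stale = 0;
    ssize_t n = pread(fd, slots, sizeof slots, 0);

    if (n < 0)
        return ENV_ERROR;
    for (ssize_t p = 0; p < n; p++) {
        int held;

        if (slots[p] == 0)
            continue;                   /* unused position */
        held = locked_by_other(fd, (off_t)p + 1);
        if (held < 0)
            return ENV_ERROR;
        if (held)
            live++;                     /* owner is still running */
        else
            stale++;                    /* owner exited abnormally */
    }
    if (stale == 0)
        return ENV_OK;                  /* case a: go to step 4 */
    return live ? ENV_CORRUPT_IN_USE    /* case b: refuse to start */
                : ENV_NEEDS_RECOVERY;   /* case c: set DB_RECOVER */
}
```

On ENV_NEEDS_RECOVERY the caller would add DB_RECOVER to the flags it
passes when opening the BDB environment and zero the lock file before
continuing with step 4.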


On back-bdb shutdown, each instance of back-bdb will do the following:

1. Wait for a lock on byte 0 of the lock file.
2. Set the byte at the saved position in the lock file to a value of 0.
3. Clear the corresponding file lock (optional, since all locks will be
   cleared on process exit).
4. Exit as usual.
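
Step 4 of the startup sequence and the shutdown sequence above could look
roughly like the following C sketch, again using fcntl() record locks.
The names set_byte_lock, claim_slot and release_slot, and the MAX_SLOTS
limit, are invented for this example and are not existing back-bdb
functions. The sketch takes the lock for a position before writing the
non-zero byte, so that a non-zero value never appears without its lock,
and it releases the byte 0 lock explicitly at the end of shutdown rather
than relying on process exit.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_SLOTS 256   /* assumption: upper bound on concurrent instances */

/* Lock (F_WRLCK) or unlock (F_UNLCK) one byte; block if `wait` is set. */
static int set_byte_lock(int fd, off_t off, short type, int wait)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = off;
    fl.l_len = 1;
    return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}

/* Startup step 4. The caller must already hold the lock on byte 0.
 * Returns the claimed position, or -1 on failure. */
static long claim_slot(int fd)
{
    unsigned char slots[MAX_SLOTS] = {0};   /* bytes past EOF count as 0 */
    unsigned char one = 1;

    if (pread(fd, slots, sizeof slots, 0) < 0)
        return -1;
    for (long p = 0; p < MAX_SLOTS; p++) {
        if (slots[p] != 0)
            continue;                       /* position already taken */
        if (set_byte_lock(fd, p + 1, F_WRLCK, 0) == -1)
            return -1;
        if (pwrite(fd, &one, 1, p) != 1) {
            set_byte_lock(fd, p + 1, F_UNLCK, 0);
            return -1;
        }
        return p;                           /* remember this for shutdown */
    }
    return -1;                              /* no free position */
}

/* Shutdown steps 1-3 for the position saved at startup. */
static int release_slot(int fd, long p)
{
    unsigned char zero = 0;
    int rc;

    if (set_byte_lock(fd, 0, F_WRLCK, 1) == -1)     /* 1. wait for byte 0 */
        return -1;
    rc = (pwrite(fd, &zero, 1, p) == 1) ? 0 : -1;   /* 2. mark clean exit */
    set_byte_lock(fd, p + 1, F_UNLCK, 0);           /* 3. drop slot lock */
    set_byte_lock(fd, 0, F_UNLCK, 0);               /* release byte 0 */
    return rc;
}
```

A process that dies between claim_slot and release_slot leaves a non-zero
byte behind with no lock on it, which is exactly what the step 3 scan
looks for.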

The important thing to note is that the lock on byte 0 is always a
short-lived lock: it is used only to protect the lock file while we change
it, and the BDB environment while we perform the recovery. We use individual
bytes in the file to keep track of normal vs abnormal exits. The
corresponding lock keeps us apprised of whether a process is still active.
There should never be a non-zero value in the file without a corresponding
lock.

This mechanism will allow any instance of back-bdb to tell that the BDB
environment is corrupt and refuse to start. It is further possible to
periodically check the lock file to make sure no back-bdb instance exited
abnormally and send an alert or take some other action. This should
drastically cut down on mysterious hangs due to leftover locks from aborted
processes as well as other database inconsistencies.

Comments?

Matthew Hardin
Symas Corporation
Packaged, certified, and supported LDAP software:
http://www.symas.net/download

> -----Original Message-----
> From: owner-openldap-devel@OpenLDAP.org
> [mailto:owner-openldap-devel@OpenLDAP.org]On Behalf Of Matthew Hardin
> Sent: Friday, September 12, 2003 4:45 PM
> To: openldap-devel@OpenLDAP.org
> Cc: Howard Chu
> Subject: RE: back-bdb DB_RECOVER and soft restart
>
>
> This is a followup to the back-bdb DB_RECOVER thread from last year.
> We want to add automatic recovery to back-bdb and propose to solve the
> problem this way.
>
> The modifications that follow involve changes to the back-bdb
> initialization and shutdown routines. They are intended to detect an
> improper shutdown of back-bdb and initiate a recovery only when there
> are no other instances of back-bdb accessing the db. Further,
> additional instances of back-bdb (i.e., tools) will not complete their
> initialization until the db recovery has been completed.
>
> The mechanism uses a combination of lock files and file locks, and
> works as follows:
>
> On startup each instance of back-bdb will do the following:
>
> 1. Open the lock file in the db directory called slaplock with O_CREAT.
>    This step is only to make sure there is something to lock,
>    and it doesn't matter if the file already exists.
> 2. Attempt to place a write lock on the lock file. If the lock fails,
>    it means another back-bdb instance is either recovering the db
>    or using it, so proceed to step 5.
> 3. Stat the lock file. If the file size is non-zero, it means that
>    no other back-bdb instances are using the db and that the db
>    was not properly closed, so perform the recovery.
> 4. Write one byte to the file (one variation is to write the PID into it
>    so one can tell by inspection which process did it).
> 5. Wait for a read lock on the lock file and leave it there for the
>    life of the back-bdb instance.
> 6. Open the db and finish initialization.
>
>
> On bdb shutdown, each instance of back-bdb will do the following:
>
> 1. Attempt to place a write lock on the lock file. If it fails,
>    it means that other back-bdb instances are using the db file,
>    so go to step 3.
> 2. Perform the DB shutdown and then truncate the lock file to
>    0 bytes. That signals that the db was shut down cleanly.
> 3. Close the lock file and exit normally.
>
>
> This appears to cleanly and portably solve the problem of back-bdb
> DB_RECOVER and soft restart. In addition to slapd itself, any of the slap
> tools that opens a db that was shut down uncleanly will initiate a
> DB_RECOVER, but only if it is the only process accessing the database.
> Once recovery is complete, operation proceeds as normal.
>
> Comments?
>
> Matthew Hardin
> Symas Corporation
> Packaged, certified, and supported LDAP software:
> http://www.symas.net/download
>
> > -----Original Message-----
> > From: owner-openldap-devel@OpenLDAP.org
> > [mailto:owner-openldap-devel@OpenLDAP.org]On Behalf Of Howard Chu
> > Sent: Friday, August 09, 2002 4:16 PM
> > To: openldap-devel@OpenLDAP.org
> > Subject: back-bdb DB_RECOVER and soft restart
> >
> >
> > A couple weeks ago I patched init.c to exclude the DB_RECOVER flag when
> > running a slap tool and initializing the BDB environment. This allowed
> > slapcat to run concurrently with slapd, and actually I see no reason why
> > slapadd or slapindex wouldn't also work since they both operate
> > additively.
> >
> > There's still a problem if you're trying to start a 2nd slapd on an
> > existing database, ala soft restart. I think we need some kind of a
> > semaphore instead, such that whenever any program starts, if it's the
> > only one accessing the BDB environment, it automatically performs a
> > recovery. But if there are two or more active instances, the
> > subsequent programs leave it alone. That should give us the most
> > safety and convenience, and you can still just run db_recover
> > manually if you really need it.
> >
> > Or we can just ditch the auto-recovery completely and always require
> > manual use of db_recover instead.
> >
> >   -- Howard Chu
> >   Chief Architect, Symas Corp.       Director, Highland Sun
> >   http://www.symas.com               http://highlandsun.com/hyc
> >   Symas: Premier OpenSource Development and Support
> >
> >
> >
>
>