
Re: Is putting slapd into read-only mode sufficient for backups?



On Fri, 10 Feb 2012, Buchan Milne wrote:
> On Friday, 10 February 2012 01:48:45 Quanah Gibson-Mount wrote:
...
> > I thought I was very clear on that in my last email.  It is not 
> > sufficient. You need to stop slapd and run *db_recover*, which is more 
> > exhaustive than db_checkpoint, if you want to go the route of backing 
> > up the BDB db.
>
> If you checkpoint, and you backup all the database files (including 
> transaction log files) in the correct order, you should not need to 
> db_recover

If that's all that you require at backup time, then in order to guarantee 
correctness *at restore time* you have to perform "catastrophic" recovery 
(a la db_recover -c) on the restored database before trying to use it. 
That's necessary if a checkpoint occurs between when you start copying .db 
files and when you copy the last transaction log file.
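A minimal sketch of that restore-time choice, assuming the restored environment sits in a single directory. The function name and its "normal" argument are hypothetical; only db_recover and its -c and -h options come from Berkeley DB itself.

```shell
# Hypothetical helper: run recovery on a freshly restored environment.
#   $1 = environment home directory
#   $2 = "normal" only when the backup is known to be safe for normal
#        recovery; anything else falls back to catastrophic recovery.
recover_restored_env() {
    if [ "${2:-}" = normal ]; then
        db_recover -h "$1"       # normal recovery
    else
        db_recover -c -h "$1"    # catastrophic recovery: the safe default
    fi
}
```

Defaulting to -c costs time on restore, but it is the only choice that is correct when nothing was recorded at backup time.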


The optimized procedure that I worked out with Sleepycat's help (for a 
completely different program, but using the "transaction data store") was 
this:

**      Backing up the database environment is done with the following
**      steps:
**      0) all txn log files except the current one are copied to
**         the backup
**      1) a checkpoint is taken
**      2) the list of txn log files that are no longer needed for
**         recovery or txn_abort is obtained
**      3) the LSN of the most recent checkpoint is noted
**      4) all the database table files, including queue extents,
**         are copied to the backup
**      5) all the txn log files that were not copied in step (0)
**         are copied to the backup
**      6) if a checkpoint has *not* taken place since step (3),
**         then the database is marked as not needing catastrophic
**         recovery when restored
**      7) if the list from step (2) is not empty, then those txn
**         log files are removed from the active database environment
**         and are marked in the backup as unnecessary for normal
**         restoration
**
**      Note that the ordering of this is almost completely inflexible.
**      In particular:
**              (0) must precede (5)
**              (1) must precede (2) and (3)
**              (2) and (3) must precede (4)
**              (4) must precede (5)
**              (5) must precede (6) and (7)
**
**      Minimizing the time between (3) and (6) is highly desirable,
**      as that minimizes the window in which a checkpoint could
**      occur that would result in a backup that would require
**      catastrophic recovery when restored.  Restoring such a
**      backup is *much* slower than restoring one that only requires
**      normal recovery.  That's why (0) and (7) are pushed forward
**      and backward to where they are.

For those trying to script this, you can get the LSN of the most recent
checkpoint with
	db_stat -t | awk '$2 ~ /^File\/offset/{print $1; exit}'
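
Putting the steps and that one-liner together, a rough sketch of the whole procedure with the stock Berkeley DB utilities might look like the following. It is not the code the comment above came from. The function names and the two marker files (NORMAL_RECOVERY_OK, UNNEEDED_LOGS) are made up for illustration, and it assumes that logs and databases all live directly in the environment home, that the backup directory starts out empty, and that no file name contains whitespace.

```shell
#!/bin/sh
# Sketch of steps (0)-(7). Layout and marker file names are assumptions.

# Print the LSN (file/offset) of the most recent checkpoint.
checkpoint_lsn() {
    db_stat -t -h "$1" | awk '$2 ~ /^File\/offset/{print $1; exit}'
}

backup_env() {
    home=$1
    dest=$2
    mkdir -p "$dest" || return 1

    # (0) copy every log file except the current (highest-numbered) one
    all_logs=$(db_archive -l -h "$home") || return 1
    for f in $(printf '%s\n' "$all_logs" | sort | sed '$d'); do
        cp -p "$home/$f" "$dest/$f" || return 1
    done

    # (1) force a checkpoint
    db_checkpoint -1 -h "$home" || return 1

    # (2) log files no longer needed for recovery
    unneeded=$(db_archive -h "$home") || return 1

    # (3) note the LSN of the most recent checkpoint
    lsn_before=$(checkpoint_lsn "$home")

    # (4) copy the database files
    #     (queue extent files, if any, are NOT handled by this sketch)
    for f in $(db_archive -s -h "$home"); do
        cp -p "$home/$f" "$dest/$f" || return 1
    done

    # (5) copy the log files that were not copied in step (0)
    for f in $(db_archive -l -h "$home"); do
        [ -e "$dest/$f" ] || cp -p "$home/$f" "$dest/$f" || return 1
    done

    # (6) no checkpoint since (3): normal recovery will do on restore
    lsn_after=$(checkpoint_lsn "$home")
    if [ -n "$lsn_before" ] && [ "$lsn_before" = "$lsn_after" ]; then
        : > "$dest/NORMAL_RECOVERY_OK"
    fi

    # (7) record the unneeded logs in the backup, then remove them
    #     from the live environment (destructive!)
    for f in $unneeded; do
        printf '%s\n' "$f" >> "$dest/UNNEEDED_LOGS"
        rm -f "$home/$f"
    done
}
```

It would be called as something like `backup_env /var/lib/ldap /backup/ldap-today` (both paths are only examples). On restore, the absence of NORMAL_RECOVERY_OK means db_recover -c is required.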


Philip Guenther