
Re: Is putting slapd into read-only mode sufficient for backups?

On Mon, 13 Feb 2012, Brian Reichert wrote:
> On Fri, Feb 10, 2012 at 12:00:29PM -0800, Philip Guenther wrote:
> > The optimized procedure that I worked out with Sleepycat's help (for a 
> > completely different program, but using the "transaction data store") was 
> > this:
> I'm exploring implementing these steps, but I'm running into some
> confusion.
> If you're willing to discuss this in any better detail (or even
> better, provide some script that encapsulates these steps), that'd
> be great.
> For example:
> - In step 3, I'm to 'take note' of the LSN (log sequence number).
>   In my test environment, your awk script yields '1/456298', as a
>   matter of example.  But, I don't see where in these steps, as an
>   implementer, that I ever actually act on this data.
>   Am I, in step 6, re-deriving the LSN, to see if they differ?

Yep.  If there's been a checkpoint between (3) and (6), then that value in 
the db_stat output will have changed.
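The before/after comparison in steps (3) and (6) can be sketched like this (a sketch only -- the environment path is made up, and the exact wording of the db_stat line may vary between BDB versions, so check yours):

```shell
#!/bin/sh
# Hypothetical sketch: detect whether a checkpoint occurred during the
# backup by comparing the last-checkpoint LSN before and after.

# Parse the LSN out of `db_stat -t` output; assumes a line of the form
#   1/456298        File/offset for last checkpoint LSN
extract_lsn() {
    awk '/File\/offset for last checkpoint LSN/ { print $1 }'
}

# In the real backup script (the -h path is an assumption):
# lsn_before=$(db_stat -t -h /var/lib/ldap | extract_lsn)
# ... steps 4/5: copy the .db files and log files into the backup ...
# lsn_after=$(db_stat -t -h /var/lib/ldap | extract_lsn)
# if [ "$lsn_before" != "$lsn_after" ]; then
#     echo "checkpoint occurred during backup" >&2
# fi
```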

> - In step 6, you say 'the database is marked [...]'.  How do I mark
>   the database?  Or is this effect implicitly handled by the backend
>   as it manages checkpoints and transaction logs?

Hmm, I should have written "the *backup* is marked".  Basically, the idea 
is that if such a checkpoint has occurred, then if/when the backup is 
restored you have to run "db_recover -c".  If that didn't happen, then you 
don't need to do that when the backup is restored.  So, you need to define 
some way to communicate between the backup process and the restore 
process.  I found the easiest way to do this was to have the backup script 
create a file named "_quiescent" in the backup if-and-only-if a checkpoint 
did *not* occur.  The restore script would then check for the presence of 
that file and, if it wasn't present, run catastrophic recovery 
("db_recover -c") before putting the restored database back in service.
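On the restore side, that marker might be consumed like this (a sketch; the directory names are made up, and whether plain recovery is needed at all in the quiescent case depends on how your backup was taken):

```shell
#!/bin/sh
# Hypothetical sketch: decide which recovery to run based on the
# "_quiescent" marker left (or not left) by the backup script.

choose_recovery() {
    # $1 = directory the backup was restored into
    if [ -e "$1/_quiescent" ]; then
        echo normal          # no checkpoint during the backup
    else
        echo catastrophic    # checkpoint during backup: need db_recover -c
    fi
}

# Real usage (paths are assumptions):
# case $(choose_recovery /var/restore/ldap) in
#     normal)       db_recover    -h /var/restore/ldap ;;
#     catastrophic) db_recover -c -h /var/restore/ldap ;;
# esac
```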

> - In step 7, you say 'log files ... are marked in the backup [...]'.
>   What sort of mark do I make so they are not involved in a normal
>   backup, but are available if a catastrophic recovery is necessary?
>   (Does this play into how the database is marked in step 6?)

It's a similar idea, though in this case you should think about whether 
you'll *ever* want to use those log files.  Under normal circumstances 
(restoring a known-good backup), you would not do so.  The time you would 
do so is when some software or hardware failure has resulted in some 
corruption to a .db file such that the checkpoint state was inconsistent.
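One way to "mark" those logs is simply to stash them in a side directory inside the backup, so a normal restore never touches them but a catastrophic one can pull them back in. A sketch (the directory name and layout are made up):

```shell
#!/bin/sh
# Hypothetical sketch: move the step-(2) log files into a side directory
# of the backup so a normal restore skips them, but they remain on hand
# for "db_recover -c".

stash_old_logs() {
    # $1 = backup directory, remaining args = log file paths to stash
    backup=$1; shift
    mkdir -p "$backup/logs.catastrophic"
    for log in "$@"; do
        mv "$log" "$backup/logs.catastrophic/"
    done
}

# Real usage (the log list would come from step (2) of the procedure):
# stash_old_logs /backups/ldap/today /backups/ldap/today/log.0000000001
```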

I've seen it happen, where *some* failure resulted in an fsync() claiming 
to succeed but then the data was lost.  The OS (Solaris, in this case), 
detected it as a parity failure (IIRC) on some hardware bus and rebooted.  
Doing catastrophic recovery let us recover the database by rerunning the 
transaction log across the failure, thus fixing up the .db file.  In 
better than a decade of working with Sleepycat DBs, that's the *only* time 
I've seen a need for intentional catastrophic recovery.

Note that if you don't detect the failure soon enough, you can quickly 
reach the state where running catastrophic recovery *would* fix the 
problem...but you can't afford the downtime to do it and can reach a 
working system *much* faster by rebuilding the database from outside 
sources (the most recent LDIF, etc).

This is especially true with something like LDAP, where the *first* 
response to "database failure on the master!" should be "so the first 
replica took over, right?  Why did you wake me?!".

For such a system, there's no benefit to keeping the logs listed in step 
(2).  Just remove them from both the active environment and the backup in 
step (7).  (Yes, yes, you can avoid copying them by moving step (0) to 
after steps (1) and (2), and then only copying files that are *not* in the 
list from the original step (2).  Whatever; just *test* your procedure!)
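If you go that route, the pruning in step (7) is just a deletion from both places. A sketch (paths are assumptions; `db_archive` with no options lists the logs no longer needed for normal recovery, but verify that against whatever your step (2) actually produced):

```shell
#!/bin/sh
# Hypothetical sketch: remove logs that normal recovery no longer needs
# from both the live environment and the backup.

prune_logs() {
    # $1 = live env dir, $2 = backup dir, remaining args = log file names
    env_dir=$1; backup_dir=$2; shift 2
    for name in "$@"; do
        rm -f "$env_dir/$name" "$backup_dir/$name"
    done
}

# Real usage (paths are assumptions):
# old_logs=$(db_archive -h /var/lib/ldap)
# prune_logs /var/lib/ldap /backups/ldap/today $old_logs
```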

Philip Guenther