Re: Backup and bdb-logfile removal

On Fri, 11 Apr 2008, Peter Mogensen wrote:
Howard Chu wrote:
No, "db_recover -c" is for recovering from a catastrophic failure. It's not for creating a backup.

? When reading the docs it seems to me like db_recover -c is an integral part of making a hot backup??

"db_recover -c" says "perform recovery using all of the txn log files that are present instead of only going back to the point named in the last checkpoint". When making a hot backup, you need to do that in case a checkpoint was taken between when you started the copy of the first database file and when the copy of the last txn log completed. That "catastrophic recovery" only needs to be performed on the txn log files that were copied as part of the hot backup and not txn log files that were archivable before the first database file was copied.

In theory, it would be possible to perform full catastrophic recovery of a database from *just* the txn log files starting at log.0000000001 and _no_ database files...but that will probably take more time than you really are interested in spending. The whole point of backing up the database files is to make it unnecessary to save and process the txn log files whose contents have been completely checkpointed to the database files.

What happens if your environment should crash after you have discarded these log files, but before you begin your hot backup?

Their contents have been checkpointed to the database files, so normal recovery is sufficient.

To perform a backup and prune unused logfiles from your active environment:
============= WARNING: Only my guess
1) Run "db_archive" on your active environment to identify unused log files. Copy them somewhere to keep while doing the backup.

These files are not needed in the backup itself. Indeed, they're only needed if any of the database files are lost or corrupted without also losing the txn log files. In my experience, the situations where these files are useful are better handled by recovering from a replica instead of trying to perform database level recovery.

(I once helped a site where a backplane failure managed to make fsync() lie such that a checkpoint completed without the data actually making it to disk for the database files. The txn log files were fine, so performing catastrophic recovery with the not-yet-archived txn logs was sufficient to fix the problem, but that's the *only* time, in 7 years of intensive commercial BDB usage, where I've seen a use for archivable txn log files.)

2) Run "db_archive -s" to identify database files and copy them to your backup location.
3) Run "db_archive -l" on your active environment to identify all log files and copy them to your backup location.

Do be sure to follow the BDB documentation regarding copying of the files. In particular, use dd instead of cp on Solaris (or write your own program that uses read() and not mmap()).
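
As an illustration of the "write your own program that uses read()" option, here is a minimal Python sketch. The function name copy_with_read and the 1 MiB block size are my own assumptions, not anything taken from the BDB documentation; check the docs for any requirement on read sizes relative to the database page size before relying on it.

```python
import os


def copy_with_read(src, dst, block_size=1024 * 1024):
    """Copy src to dst with plain read()/write() system calls (no mmap).

    block_size is an assumed value chosen for illustration only.
    """
    in_fd = os.open(src, os.O_RDONLY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            while True:
                chunk = os.read(in_fd, block_size)
                if not chunk:
                    break  # end of file
                # write() may accept fewer bytes than offered, so loop
                # until the whole chunk has been written.
                view = memoryview(chunk)
                while view:
                    written = os.write(out_fd, view)
                    view = view[written:]
            # Make sure the copy has reached disk before reporting success.
            os.fsync(out_fd)
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)
```

On platforms where cp is known to be safe this buys nothing; it only matters where the system copy utility may use mmap().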

4) Run "db_recover -c" on your backup to make it consistent.
5) Since the backup is offline you can safely delete the unused log files from it. ("db_archive -d")
6) The log files copied in step 1) can now safely be discarded so they don't exist anywhere - including the active environment.

Then it's my impression that, should the active environment crash, you should be able to continue from the backup plus the logfiles from the active environment with minimal data loss?
============== End guess =========

Other than my comments above, this procedure looks good to me. *Do* be sure to test it, both by forcing failures of various parts of it (out of disk space during a copy?) and by actually making sure you have a tested procedure for restoring a backup.
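
For what it's worth, steps 2 through 5 could be scripted roughly as below. This is only a sketch under assumptions: the names hot_backup, run_lines and copy_file are hypothetical, and it assumes the database and log files sit directly in the environment home, so that the names printed by db_archive can be joined onto that directory (a DB_CONFIG that sets separate data or log directories would need different path handling). Steps 1 and 6 are left out, since the archivable logs are not needed in the backup itself.

```python
import os
import subprocess


def run_lines(cmd):
    """Run a command, fail loudly on error, return its non-empty output lines."""
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return [line for line in out.splitlines() if line.strip()]


def copy_file(src, dst, block_size=1024 * 1024):
    """Copy with ordinary read()/write() calls rather than mmap()."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            fout.write(chunk)


def hot_backup(env_home, backup_home, run=run_lines, copy=copy_file):
    """Hot backup sketch: database files first, then logs, then recovery."""
    os.makedirs(backup_home, exist_ok=True)

    # Step 2: copy the database files first.
    for name in run(["db_archive", "-h", env_home, "-s"]):
        copy(os.path.join(env_home, name), os.path.join(backup_home, name))

    # Step 3: copy all txn log files only after the database files are done.
    for name in run(["db_archive", "-h", env_home, "-l"]):
        copy(os.path.join(env_home, name), os.path.join(backup_home, name))

    # Step 4: catastrophic recovery on the copy, never on the live environment.
    run(["db_recover", "-c", "-h", backup_home])

    # Step 5: drop log files the recovered backup no longer needs.
    run(["db_archive", "-d", "-h", backup_home])
```

The run and copy parameters exist so the ordering can be exercised with stand-ins, which is also a cheap way to simulate a copy failing partway through. Any failing command raises an exception, so a half-finished backup is not silently treated as good.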

Philip Guenther