
Re: testing disaster recovery?





--On Wednesday, December 29, 2004 2:40 PM -0600 mjinks@uchicago.edu wrote:

We're testing and tweaking our OpenLDAP 2.2.18 setup, db-4.2.52.NC.
In case it matters we're running all of this on Solaris 9/SPARC
machinery.

One of the things we need to have practiced and documented before
rollout is disaster recovery.  Okay, well I read some bdb docs, then
decided to try playing around with db_recover, but I'd like to have a
repeatable way of putting the database into a deliberately-hosed
condition so that I can watch how db_recover performs when it actually
has something to do.

The obvious thing seemed like issuing slapd a kill -9, but so far on our
almost-entirely-idle testing setup, the next run of slapd has been able
to start up and run as if nothing happened.  I suppose that I'd be more
likely to get a database in damaged state if I arranged to have a number
of ldap_modify operations running when I did the kill, but at this point
it occurs to me to ask whether there's some "neater" way to dirty up the
database, or if I've just got the wrong approach to this altogether.

So, I guess my question is, any recommendations for testing disaster
recovery with a bdb back end?  I'm new here, so I apologize if this is a
FAQ; a search of the archive didn't turn up anything.

Honestly, I don't consider db_recover "disaster recovery". It simply restores the database after an unclean shutdown, provided the data it needs is still available. However, slapd will start up most of the time even after an unclean shutdown. I've had this happen in the past, and you usually don't notice there is an issue until a new slapd process tries to access something in the database that was left in an unclean state. Then you get either a lockup or a segfault. Simply doing a kill -9 on slapd won't tell you much; you want to look at db_stat after killing slapd and see what locks have been left around.
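
As a rough sketch of that check: the helper name check_locks and the default path below are made up for illustration, so substitute the directory named in your slapd.conf. It only wraps db_stat with -h (the database environment) and -c (lock subsystem statistics).

```shell
#!/bin/sh
# Hypothetical helper, not part of OpenLDAP or Berkeley DB.
# DB_HOME is an assumed placeholder; use the "directory" from your slapd.conf.
DB_HOME="${DB_HOME:-/var/openldap-data}"

check_locks() {
    # $1 = database environment directory
    # -h selects the environment, -c prints lock subsystem statistics
    db_stat -h "$1" -c
}

# Usage, run by hand after slapd has been killed:
#   check_locks "$DB_HOME"
```

Comparing the output before and after running db_recover -h on the same directory (with slapd stopped) is one way to see whether recovery actually had anything to clean up.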


As far as disaster recovery goes, I'd look at database backups, and at what to do if the database is lost from all of your servers. I have 10 LDAP servers, so that risk is lower (though not as mitigated as I'd like, as they are all in the same server room ATM). Currently, we export the database on a nightly basis and store it in AFS, which is also backed up to a set of backup servers.
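
A minimal sketch of such a nightly export, assuming slapcat is on the PATH and a single database is configured. The function name, the default directory, and the file naming are placeholders, not our actual setup, and copying the result into AFS or other backed-up storage is left out.

```shell
#!/bin/sh
# Hypothetical nightly export; BACKUP_DIR is an assumed placeholder path.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/ldap}"

nightly_export() {
    # $1 = destination directory
    # Dumps the database to a dated LDIF file and prints the file name
    # only if slapcat succeeded.
    out="$1/ldap-$(date +%Y%m%d).ldif"
    slapcat -l "$out" && echo "$out"
}

# Usage, e.g. from cron:
#   nightly_export "$BACKUP_DIR"
```

Whether slapcat can safely be run against a live slapd depends on the backend and version, so check the slapcat man page for your release. With more than one database configured you would also need to select one, for example with -b and the suffix.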

--Quanah

--
Quanah Gibson-Mount
Principal Software Developer
ITSS/Shared Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html

"These censorship operations against schools and libraries are stronger
than ever in the present religio-political climate. They often focus on
fantasy and sf books, which foster that deadly enemy to bigotry and blind
faith, the imagination." -- Ursula K. Le Guin