A question of replication robustness.

In a recent session working on internal docs, the question came up of how
robust OpenLDAP's replication model is, given that it is a push model. This
concerns me a little, because I'm trying to make sure everything in our
infrastructure is at least somewhat fault tolerant. Here's the problem as I
see it:

Let M be a master server, and S1, S2 and S3 be three of its slaves. Let's
assume that at 9:35am on Monday morning slave S1 dies mysteriously. The
support staff responsible for S1 get it back up and running at 11:45am, and
it's working fine. Except that:
	- at 9:57am user alice changed her LDAP password
	- at 10:35am user bob was added to LDAP
	- at 11:00am user charlie was fired and his account subsequently

When S1 comes back up, it will not see the changes to alice, bob and charlie,
if I understand the current system correctly. Or does slurpd somehow
gracefully handle nodes that have lost contact? What happens here? Does it
keep retrying?

In an ideal world, S1 would somehow contact M at restart time to check for
updates (if it knows about M, say through an updateref or the like). It
would also be nice if the slave knew enough to get a full copy of the
database from a cold start, but that might require a lot more intelligence.
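To make the "contact M at restart" idea concrete, here is a toy Python model
of it. This is only a sketch of the behaviour I'd like, not of anything
slapd or slurpd actually does: the Master and Slave classes, the sequence
numbers and changes_since() are all made up for illustration, and the push
step deliberately has no retry so that the gap shows up.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    """Hypothetical slave that remembers the last change it applied."""
    name: str
    up: bool = True
    last_seq: int = 0
    data: list = field(default_factory=list)

    def receive(self, seq, change):
        self.data.append(change)
        self.last_seq = seq

    def restart(self, master):
        # On restart, pull everything the master logged after last_seq.
        self.up = True
        for seq, change in master.changes_since(self.last_seq):
            self.receive(seq, change)


@dataclass
class Master:
    """Hypothetical master that keeps every change in an ordered log."""
    changelog: list = field(default_factory=list)

    def apply(self, change, slaves):
        # Naive push with no retry: a slave that is down misses the change.
        self.changelog.append(change)
        seq = len(self.changelog)
        for slave in slaves:
            if slave.up:
                slave.receive(seq, change)

    def changes_since(self, seq):
        # Changes are numbered from 1, so changelog[seq:] is "after seq".
        return list(enumerate(self.changelog[seq:], start=seq + 1))


def run_scenario():
    m = Master()
    s1, s2 = Slave("S1"), Slave("S2")
    m.apply("initial load", [s1, s2])
    s1.up = False  # 9:35am: S1 dies
    for change in ("alice: password change",
                   "bob: added",
                   "charlie: account change"):
        m.apply(change, [s1, s2])
    missed = len(s2.data) - len(s1.data)  # changes S1 never saw
    s1.restart(m)  # 11:45am: S1 comes back and catches up
    return missed, s1, s2
```

The point of the sketch is that the slave only needs to remember one number
(the last change it applied) for the master to work out what it missed. A
cold start is then just the special case where that number is zero.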

I'm sure I could write some sort of replay monitor that does one-shots on
the .rej files (and I intend to write a couple of monitors for .rej files
anyway), but I'm wondering whether the master server should handle this
sort of thing in some intelligent manner, rather than having everyone in
the world reinvent the wheel.
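For what it's worth, the replay monitor I have in mind would look roughly
like the Python sketch below. It assumes the reject files sit in a single
directory and end in .rej, and that one-shot mode is invoked as
"slurpd -r <file> -o", which is my reading of the slurpd man page. The
function names are hypothetical, and the sketch only builds the command
lines rather than running them.

```python
from pathlib import Path


def pending_rejects(replica_dir):
    """Return the non-empty *.rej files in replica_dir, sorted by name."""
    return sorted(
        p for p in Path(replica_dir).glob("*.rej")
        if p.is_file() and p.stat().st_size > 0
    )


def one_shot_command(rej_file, slurpd="slurpd"):
    """Build a one-shot replay command for a single reject file.

    The -r and -o flags are assumed from the slurpd man page; check them
    against the installed version before relying on this.
    """
    return [slurpd, "-r", str(rej_file), "-o"]


def replay_plan(replica_dir):
    """List the commands a monitor would run, without executing any."""
    return [one_shot_command(p) for p in pending_rejects(replica_dir)]
```

A cron job could call replay_plan() on the replica directory and hand each
command to subprocess.run(), but what to do with a reject file once it has
been replayed is exactly the kind of policy I'd rather not invent myself.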

Any thoughts? Am I completely off-base? Do I not understand how something
actually works?

Justin Hahn              ProfitLogic
jeh@profitlogic.com      11 Cambridge Center
Systems Administrator    Cambridge, MA 02142
o: 617-218-1986          www.profitlogic.com
m: 617-501-2743
f: 617-218-1901