
Syncrepl vs. replication

This is rather long, but I thought it best to go into detail.

Our directory sort of evolved over the years, as problems became apparent 
and new releases were issued.  The limitations of the slurpd replication 
protocol are now becoming serious, but before I investigate syncrepl I'd 
like to know whether it will indeed work in our rather baroque situation.

First, some background:

We have offices all through Asia; some applications are local to an 
office whilst others are global.  The comms links are not the best (this 
is not so much a slight on the countries as a reflection of trying to get 
*this* country to talk to *that* country, i.e. a mixture of DSL/X.25/ISDN 
etc.), but real-time comms matter less to us than a locator service, i.e. 
we don't care if application ABC on host XYZ cannot be reached, but we do 
need to know that application ABC lives on host XYZ, so that a batch job 
can be queued for it.

After trying various configurations, including some rather long 
replication chains, I finally settled on something which IMHO was rather 
elegant: each office masters its own suffix (e.g. dc=sg,dc=example,dc=com) 
whilst carrying slave copies of the other offices' suffixes; a top-level 
directory hooks them together by handing out referrals back to the local 
box and to a central backup.  Each master replicates to this central 
"replication server", which in turn replicates to the other offices' slaves.
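To make the hub-and-spoke layout concrete, here is a rough Python model of the replication edges it produces. It is illustrative only: the host names and the "replication-server" label are hypothetical, and it assumes exactly one mastered suffix per office.

```python
"""Hypothetical model of the hub-and-spoke replication layout."""
from typing import Dict, List, Tuple

HUB = "replication-server"  # hypothetical name for the central backup


def replication_edges(offices: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (source, destination, suffix) replication edges.

    `offices` maps each office's master host to the one suffix it masters.
    Each master pushes its own suffix to the hub; the hub then pushes that
    suffix on to every *other* office, which holds it as a slave copy.
    """
    edges = []
    for master, suffix in sorted(offices.items()):
        edges.append((master, HUB, suffix))          # master -> hub
        for other in sorted(offices):
            if other != master:
                edges.append((HUB, other, suffix))   # hub -> other slaves
    return edges


if __name__ == "__main__":
    offices = {
        "ldap.sg.example.com": "dc=sg,dc=example,dc=com",
        "ldap.au.example.com": "dc=au,dc=example,dc=com",
        "ldap.hk.example.com": "dc=hk,dc=example,dc=com",
    }
    for src, dst, suffix in replication_edges(offices):
        print(f"{src} -> {dst}: {suffix}")
```

With N offices this gives N master-to-hub edges plus N*(N-1) hub-to-slave edges, so every suffix reaches every office in at most two hops rather than travelling down a long chain.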

Still with me?  It was here that the replication protocol began to show 
some limitations.  Basically, there is an implicit assumption that a 
server will handle only one suffix; this is apparent in the format of the 
slurpd.replog file:

replica: ldap2.au.example.com:389
time: 1117516270.0
dn: ou=Admin,dc=sg,dc=example,dc=com
changetype: modify
replace: st
st: xyzzy

Note that the "key" is the host, not the suffix.
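A rough Python sketch of that keying, illustrative only and not slurpd's actual code: it reads records in the format above and queues each change per replica host. It assumes records are separated by blank lines and ignores LDIF continuation lines. Nothing in a record says which suffix the replica is meant to hold; only the DN of the changed entry is recorded.

```python
"""Sketch: parse slurpd-style replog records and queue them per replica."""
from collections import defaultdict
from typing import Dict, List

SAMPLE = """\
replica: ldap2.au.example.com:389
time: 1117516270.0
dn: ou=Admin,dc=sg,dc=example,dc=com
changetype: modify
replace: st
st: xyzzy
"""


def parse_replog(text: str) -> List[dict]:
    """Split a replog into records (blank-line separated, by assumption)."""
    records = []
    for block in text.strip().split("\n\n"):
        rec = {"replicas": [], "dn": None, "changetype": None}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if not sep:
                continue
            key, value = key.strip().lower(), value.strip()
            if key == "replica":
                rec["replicas"].append(value)  # host:port, not a suffix
            elif key == "dn" and rec["dn"] is None:
                rec["dn"] = value
            elif key == "changetype":
                rec["changetype"] = value
                break  # the rest of the block is the modification itself
        records.append(rec)
    return records


def by_replica(records: List[dict]) -> Dict[str, List[str]]:
    """Group changed DNs by the replica host they are addressed to."""
    queues = defaultdict(list)
    for rec in records:
        for host in rec["replicas"]:
            queues[host].append(rec["dn"])
    return dict(queues)


if __name__ == "__main__":
    print(by_replica(parse_replog(SAMPLE)))
```

The only routing decision available here is "send to ldap2.au.example.com:389"; whether that host actually holds dc=sg,dc=example,dc=com is left to the receiving server.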

It's assumed that "ldap2" holds that suffix, and the trouble starts 
because SLURPD tries to replicate to all servers it knows about from 
slapd.conf.  I tried fiddling with the port (389) but that got too messy; 
instead, I partitioned slapd.conf into several files, fired up a SLURPD for 
each one so that each instance knows only about its relevant slaves, and 
glued the lot together for SLAPD's benefit.
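What the partitioned setup effectively buys is routing keyed on the suffix rather than the host. A minimal Python sketch of that idea, assuming a hand-maintained map from suffix to slave list (the map and host names are hypothetical) and a naive DN comparison that ignores escaped commas and multi-valued RDNs:

```python
"""Sketch: choose replica targets by the suffix a DN falls under."""
from typing import Dict, List, Optional

# Hypothetical suffix -> slaves map; one SLURPD instance per entry, in effect.
SLAVES_BY_SUFFIX: Dict[str, List[str]] = {
    "dc=sg,dc=example,dc=com": ["replication-server", "ldap2.au.example.com"],
    "dc=au,dc=example,dc=com": ["replication-server", "ldap.sg.example.com"],
    "dc=company,dc=com,dc=au": ["whopper.example.com"],
}


def rdns(dn: str) -> List[str]:
    # Naive split: assumes no escaped commas; normalises case and spacing.
    return [p.strip().lower() for p in dn.split(",") if p.strip()]


def under(dn: str, suffix: str) -> bool:
    """True if `dn` is the suffix itself or lies beneath it."""
    d, s = rdns(dn), rdns(suffix)
    return len(d) >= len(s) and d[len(d) - len(s):] == s


def best_suffix(dn: str, suffixes) -> Optional[str]:
    """Longest (most specific) configured suffix containing `dn`, if any."""
    matches = [s for s in suffixes if under(dn, s)]
    return max(matches, key=lambda s: len(rdns(s)), default=None)


def targets(dn: str, slaves_by_suffix: Dict[str, List[str]]) -> List[str]:
    """Slaves that should receive a change to `dn`; empty if none hold it."""
    suffix = best_suffix(dn, slaves_by_suffix)
    if suffix is None:
        return []
    return list(slaves_by_suffix[suffix])


if __name__ == "__main__":
    for dn in ("ou=Admin,dc=sg,dc=example,dc=com",
               "ou=Sales,dc=company,dc=com,dc=au",
               "cn=nobody,dc=example,dc=org"):
        print(dn, "->", targets(dn, SLAVES_BY_SUFFIX))
```

Comparing whole RDNs rather than raw string tails matters: a plain `endswith("dc=com,dc=au")` would also match dc=company,dc=com,dc=au's neighbours by accident.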

Well, that was two years ago (assuming anyone is still reading this far) 
and it's worked ever since.  Now, back to the present...

We're now using multiple disjoint directories (representing companies that 
we've acquired) served on one host, and I'm seeing this replication 
problem again, but from the other end.  Specifically, server "Joe" masters 
both "dc=au,dc=example,dc=com" and "dc=company,dc=com,dc=au", replicating 
to server "Whopper" (fictitious names to protect the guilty, of course).  
Whenever I update the first zone, it tries to update both on the slave; 
one works, and the other gets "ERROR: Referral" logged.  And vice-versa.  
It took me a while to figure out what was happening, because I was seeing 
replication both working and logging errors.
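One plausible mechanism, a guess not confirmed against the actual configuration: if each slave database on Whopper has its own updatedn and updateref, slapd applies a replicated write only when it arrives bound as that database's updatedn and otherwise answers with a referral to the master. If SLURPD sends the same change once per replica line pointing at Whopper, each time bound with a different binddn, exactly one copy succeeds. A hypothetical Python model (all DNs and URLs made up, suffixes assumed disjoint):

```python
"""Hypothetical model of a slurpd-era slave accepting or referring a write.

Assumes each slave database has its own updatedn and updateref and that the
suffixes are disjoint.  A guess at the mechanism, not slapd's code.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class SlaveDB:
    suffix: str
    updatedn: str
    updateref: str


def rdns(dn: str) -> List[str]:
    # Naive split: assumes no escaped commas.
    return [p.strip().lower() for p in dn.split(",") if p.strip()]


def under(dn: str, suffix: str) -> bool:
    d, s = rdns(dn), rdns(suffix)
    return len(d) >= len(s) and d[len(d) - len(s):] == s


def apply_write(dbs: List[SlaveDB], target_dn: str, bound_as: str) -> str:
    """Outcome of a write to `target_dn` by a client bound as `bound_as`."""
    for db in dbs:
        if under(target_dn, db.suffix):
            if rdns(bound_as) == rdns(db.updatedn):
                return "success"
            return "referral " + db.updateref  # not this DB's updatedn
    return "noSuchObject"  # assumes no default referral is configured


WHOPPER = [
    SlaveDB("dc=au,dc=example,dc=com",
            "cn=replicator,dc=au,dc=example,dc=com", "ldap://joe.example.com"),
    SlaveDB("dc=company,dc=com,dc=au",
            "cn=replicator,dc=company,dc=com,dc=au", "ldap://joe.example.com"),
]
CHANGE = "ou=Admin,dc=au,dc=example,dc=com"

if __name__ == "__main__":
    # One push per replica line for Whopper, each with that line's binddn.
    for db in WHOPPER:
        print(db.updatedn, "->", apply_write(WHOPPER, CHANGE, db.updatedn))
```

If this is what is happening, the referral is harmless noise: the copy bound as the right updatedn has already been applied.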

I could do funky things with slapd.conf again, but I suspect I've hit the 
limitations of the replication protocol.  So, would syncrepl be a better 
choice here?  Think of the general case of several servers, mastering 
several disjoint suffixes, replicating out to several arbitrary slaves 
(some of which are the aforesaid servers).

We're running 2.2.26, and will move to 2.3 when it's stable (we don't have 
the resources to hammer non-STABLE versions).

Dave Horsfall  DTM  VK2KFU  daveh@ci.com.au  Ph: +61 2 8425-5508 (d) -5500 (sw)
Corinthian Engineering, Level 1, 401 Pacific Hwy, Artarmon, NSW 2064, Australia