
Re: Replication questions



Clowser, Jeff (Contractor) wrote:
Agreed - having a single "active" master and a "hot"/active but unused
standby master solves most HA issues without introducing the conflicts a
full active-active multimaster setup creates.  But if that master is
there and accepts writes, it's inevitable that someone will some day
write to it out of ignorance, and *may* write a conflicting change, so I
see conflict resolution as a last ditch fallback for this situation (and
nothing more) to prevent corruption or breakage of replication. (Plus, I
like to close up or at least be fully aware of all the edge cases that
exist, so I know how best to avoid them :) ).

Makes sense.

You said at one point that OpenLDAP (2.4.6?) currently does entry-level conflict resolution and does not yet do attribute-level conflict resolution, i.e. if the entry was updated on 2 separate servers with different updates, conflicting or not, the most recently changed version of the *entry* wins. If I change the cn on one master and then (before replication has occurred) change the userPassword on another master, once the sync-up occurs I won't see the entry with both the cn and password changed on all servers; I'll see the entry as it is on the most recently changed master (i.e. in my example, I'll see a changed password, but the cn will revert). Is there a roadmap/timeline for doing attribute-level conflict resolution?

There are no set dates, but I expect it to be later in the 2.4 stream.
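
To make the difference between the two strategies concrete, here is a minimal Python sketch of the cn/userPassword example from the question. It is only an illustration of the two policies, not OpenLDAP's implementation: the `Version` class, the integer timestamps (stand-ins for CSNs), and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Version:
    """One master's copy of an entry (hypothetical model, not slapd's)."""
    attrs: Dict[str, str]   # attribute name -> value
    stamps: Dict[str, int]  # attribute name -> time that attribute last changed

    @property
    def entry_stamp(self) -> int:
        # The entry as a whole is as new as its most recently changed attribute.
        return max(self.stamps.values())


def resolve_entry_level(a: Version, b: Version) -> Dict[str, str]:
    """Entry-level: the most recently changed *entry* wins wholesale."""
    winner = a if a.entry_stamp >= b.entry_stamp else b
    return dict(winner.attrs)


def resolve_attribute_level(a: Version, b: Version) -> Dict[str, str]:
    """Attribute-level: each attribute is taken from whichever copy changed it last."""
    merged = {}
    for name in sorted(set(a.attrs) | set(b.attrs)):
        source = a if a.stamps.get(name, -1) >= b.stamps.get(name, -1) else b
        if name in source.attrs:
            merged[name] = source.attrs[name]
    return merged


# cn changed on master 1 at t=2; userPassword changed on master 2 at t=3.
master1 = Version({"cn": "Jeffrey C", "userPassword": "old"}, {"cn": 2, "userPassword": 1})
master2 = Version({"cn": "Jeff C", "userPassword": "new"}, {"cn": 1, "userPassword": 3})

print(resolve_entry_level(master1, master2))      # {'cn': 'Jeff C', 'userPassword': 'new'}
print(resolve_attribute_level(master1, master2))  # {'cn': 'Jeffrey C', 'userPassword': 'new'}
```

With entry-level resolution the cn change is lost because master 2's copy is newer as a whole; an attribute-level merge would keep both changes.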

Also, I was looking at the admin guide and syncprov man pages on how to
set up replication.  N-Way multi-mastering details are kinda sparse :).
Is there any documentation elsewhere on setting this up?  OR... Is the
setup exactly the same as setting up Mirror-mode (per 2.3.x), but the
2.4.x code just automatically does conflict resolution (i.e. was
mirror-mode a 2.3 feature, with multimaster transparently replacing it
in 2.4 by adding conflict resolution to mirror-mode, using the same
setup?)

Yes, set it up pretty much like MirrorMode. MirrorMode was 2.4.1-2.4.4, which were only alpha releases, not general/public releases.


Is it possible for a consumer to replicate from multiple masters?

Yes in 2.4.

I'm thinking along the lines of a master server at 2 locations (for HA/DR
purposes), plus each location also has multiple read-only slave
consumers.  My first thought is that these slave servers point to the
local master, but if that master goes down, the slaves under that master
stop getting updates.  My second thought is to have a load balancer at
each site, which directs all traffic connecting to a "master ldap" vip
to route connections to the primary master if it's up, or the secondary
master if the primary is unavailable.  But... (I'm still absorbing
syncrepl and rfc 4533) will all the contextCSNs and cookies and so forth
match up well enough to allow this kind of failover for *syncrepl*?  Is
it possible, and what's the best way to set this up, such that I have
multiple masters for DR purposes, and such that the failure of any
single master does not cause some subset of my read-only slave consumers
to stop getting updated?

Syncrepl (in refreshAndPersist mode), as I understand it, generally has
the slave consumer contacting the master server, retrieving an updated
list of changes since the last time it was running (refresh), then
leaves a persistent search running that gets changed entries from the
master server as they happen (persist), so replication is near
real-time.  If the master server crashes and then is restarted or the
connection is broken/dropped (common if a load balancer is in between),
how well does the consumer detect this and reconnect, or do the
consumers tend to have to be restarted after this occurs?  (This is a
broken/dropped connection, *not* one cleanly closed by a master server
clean shutdown or idle timeout, and many apps have trouble detecting
this - the client still thinks it has a valid tcp connection, but
nothing is coming over it, so never gets new updates.  Does the consumer
send keepalive packets or anything to cause it to realize the connection
has died and to reconnect?)

Currently the consumer relies on TCP keepalives. We've discussed adding LDAP-level keepalives so we're not dependent on the kernel TCP timers, but that hasn't been done yet.
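
For readers unfamiliar with kernel TCP keepalives, the following Python sketch shows what enabling them on a client socket generally looks like. It is not slapd code and says nothing about how the consumer configures its own sockets; the helper name `enable_keepalive` and the timer values are illustrative assumptions. The `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options are platform-specific, so they are applied only where the `socket` module exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Ask the kernel to probe an idle TCP connection (illustrative values).

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    probes:   unanswered probes before the connection is declared dead
    """
    # Portable switch: turn keepalive probing on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket timers exist only on some platforms (e.g. Linux).
    # Elsewhere the system-wide kernel defaults apply.
    for opt_name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    ):
        opt = getattr(socket, opt_name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print(bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
    s.close()
```

Without per-socket timers, detection of a silently dropped connection depends on the kernel-wide keepalive settings, which are often measured in hours by default.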


When initializing a consumer using an LDIF backup of the master, should
this be a slapcat export to get everything needed to support syncrepl
(such as contextCSN, entryUUIDs, etc)?

That's the fastest way. But you can also just bring up a consumer with an empty database and let it pull the entire DB down during its refresh pass; it will work either way. Unlike some other replication schemes you may have used, we don't require any special considerations for initial load vs reload or recovery. Turn it on and it works.


--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP     http://www.openldap.org/project/