
RE: Multimaster further work



> -----Original Message-----
> From: Derek Simkowiak [mailto:dereks@itsite.com]

> > such, multi-master replication offers no benefit, while introducing
> > a slew of potential problems.
>
> 	On this point, I disagree.  The goal of load balancing is not
> just to distribute load; it's also to provide high availability and
> ease of management and maintenance.
>
> 	It's far easier to manage a set of homogeneous master nodes in
> a cluster than it is three separate configurations: master-primary,
> master-failover, and slave.  It's that versus just "multimaster".

The issues here aren't much different from the concurrency issues in an SMP
operating system. Any time the possibility for contention arises, you have to
explicitly serialize your processing. In today's code this is done by having
a single master server and configuring slave servers that refer updates back to
the master. This is obviously not a great solution, especially given the
difficulties in referral chasing.
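
To put the analogy in miniature (a pthreads sketch, not slapd code): the
single master plays the same role a mutex plays in an SMP kernel, funneling
all writers through one serialization point.

    #include <pthread.h>

    /* The single master in miniature: a mutex standing in for the one
     * serialization point all writers must pass through. */
    static pthread_mutex_t master_lock = PTHREAD_MUTEX_INITIALIZER;
    static long directory_entry;   /* stands in for a piece of directory data */

    void apply_update(long value)
    {
        pthread_mutex_lock(&master_lock);    /* contention resolved here, once */
        directory_entry = value;
        pthread_mutex_unlock(&master_lock);
    }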

(And by the way, "single master" doesn't mean there can only be one master
server in a cluster of cooperating servers. It means there is only one master
for a given piece of data. You can split the data up such that different
servers master different parts of the tree. This is an essential concept of
distributed information management - distribute the data such that it is
closest to the data's consumers.)
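
In slapd.conf terms that might look something like the following (a sketch in
the slurpd-era configuration style; the DNs and hostnames are made up, and
directive details are from memory, so treat it as illustrative only):

    # This server masters ou=eng and pushes changes to a replica...
    database   bdb
    suffix     "ou=eng,dc=example,dc=com"
    replica    host=slave1.example.com
               binddn="cn=replica,ou=eng,dc=example,dc=com"
               bindmethod=simple credentials=secret

    # ...while acting as a slave for ou=sales, mastered elsewhere.
    database   bdb
    suffix     "ou=sales,dc=example,dc=com"
    updatedn   "cn=replica,ou=sales,dc=example,dc=com"
    updateref  "ldap://sales-master.example.com"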

Since we're dreaming up solutions to the perceived evils of a single point
of failure and other things, let me spell out a different approach:

First of all, the current 2.1 code contains the beginnings of support for a
feature we call "soft restart" - this allows you to fire up a new slapd
instance while an old one is still running, with the new one taking over
connections from the old one. When fully implemented, this will allow
reconfiguration and similar maintenance to be done on the fly, without
clients ever seeing even a hiccup. As such, the need to interrupt service for
software upgrades, database overhauls, etc. will be eliminated.
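
One common way to implement that kind of handoff on Unix is to pass the
listening descriptor to the new process over a Unix domain socket; here is a
minimal sketch of that technique (not necessarily what slapd's soft restart
actually does):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Hand an open listener fd to the new instance over an established
     * Unix domain socket connection (conn_fd), using SCM_RIGHTS. */
    int send_listener(int conn_fd, int listener_fd)
    {
        char dummy = 'F';
        char cbuf[CMSG_SPACE(sizeof(int))];
        struct iovec iov = { &dummy, 1 };
        struct msghdr msg;
        struct cmsghdr *cmsg;

        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = cbuf;
        msg.msg_controllen = sizeof(cbuf);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;           /* we are passing an fd */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &listener_fd, sizeof(int));

        /* The kernel duplicates the descriptor into the receiver; the
         * new slapd can then accept() on it without dropping clients. */
        return sendmsg(conn_fd, &msg, 0) == 1 ? 0 : -1;
    }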

To make clients' lives easier, I would change slapd such that slave servers
accept all updates from clients. But instead of just updating themselves (as
a "multimaster" approach would do), the slave chains the request to the
current master. This maintains serialization, without seriously affecting the
total load on the cluster. As a slight optimization, the slave could update
itself after the update succeeds on the master, and the master can skip
propagating the update back to the slave that submitted the update. Either
way, you don't have to mess with special server addresses.
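
In control-flow terms the slave-side write path would look roughly like this
(a sketch; handle_update(), chain_to_master(), and apply_locally() are names
made up for illustration, not slapd internals):

    /* Hypothetical slave-side write path: chain the write to the
     * master first, then apply it locally. */
    typedef struct update { int op; } update_t;   /* stand-in for a write op */

    static int chain_to_master(update_t *u) { (void)u; return 0; }  /* placeholder */
    static int apply_locally(update_t *u)   { (void)u; return 0; }  /* placeholder */

    int handle_update(update_t *u)
    {
        /* Serialize through the master first; if the master refuses
         * or fails, the client sees that result and we change nothing. */
        if (chain_to_master(u) != 0)
            return -1;

        /* The optimization noted above: apply locally now, so the
         * master can skip propagating this update back to us. */
        return apply_locally(u);
    }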

Just because slapd today doesn't handle automatic slave-to-master promotion
doesn't mean it can't. Given that slaves are chaining updates to the master
server, we now have a situation where every slave maintains a connection to
the master. This gives you automatic detection that a master has failed,
because the slave can notice that the master's connection has stopped
responding. Once you've noticed that the current master is gone, you can
automate selection of a new master just using an ordered list of all of the
cooperating servers in the cluster. If one server is unavailable, the next
one in the list is chosen, and so on down the line. As long as each server uses an
identical list, you maintain perfect consistency, even in the face of partial
network failures.
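
A sketch of that selection rule, where reachable() is a placeholder for
whatever liveness test a server uses - e.g. the state of its chained
connection to the candidate:

    #include <stdio.h>

    /* The identical, ordered master list shared by every server. */
    static const char *master_list[] = { "ldap://A", "ldap://B", "ldap://C" };
    #define NSERVERS (sizeof(master_list) / sizeof(master_list[0]))

    /* Placeholder liveness test; a real server would key this off its
     * open connection to the candidate, or a fresh connect attempt. */
    static int reachable(const char *url) { (void)url; return 1; }

    /* First reachable entry wins; every node applying the same list to
     * the same reachability facts picks the same master. */
    static const char *pick_master(void)
    {
        size_t i;
        for (i = 0; i < NSERVERS; i++)
            if (reachable(master_list[i]))
                return master_list[i];
        return NULL;   /* nothing reachable: we are fully partitioned */
    }

    int main(void)
    {
        printf("acting master: %s\n", pick_master());
        return 0;
    }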

For example, you have servers A, B, and C, and the "master list" is kept in
that order. If server A fails, B becomes the master, no big deal.

Let's say that A is up, but the connection from A to C fails. C will start
chaining updates to B. B is still in contact with A, so B continues the chain
back to A. In this case, performance suffers a little, but overall the
cluster continues to operate consistently. Without any additional logic,
updates to A and B won't propagate back to C until the A-C connection is
restored. But this is no longer an unsolvable conflict resolution problem;
it's a simple reachability problem, of the kind already solved by today's
multicast routing protocols.
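
The A-B-C scenario can be modeled in a few lines - each node applies the same
ordered list to whatever links it still has (purely illustrative):

    #include <stdio.h>

    enum { A, B, C, N };
    static const char *name[N] = { "A", "B", "C" };

    /* link[i][j]: can node i reach node j?  The A-C link is down. */
    static const int link[N][N] = {
        /*        A  B  C */
        /* A */ { 1, 1, 0 },
        /* B */ { 1, 1, 1 },
        /* C */ { 0, 1, 1 },
    };

    /* Each node's master is the first entry of the shared list it can
     * reach; updates chain hop by hop toward that choice. */
    static int pick(int self)
    {
        int i;
        for (i = 0; i < N; i++)
            if (link[self][i])
                return i;
        return -1;
    }

    int main(void)
    {
        int i;
        for (i = 0; i < N; i++)   /* prints: A->A, B->A, C->B */
            printf("%s chains toward %s\n", name[i], name[pick(i)]);
        return 0;
    }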

The idea here is that as long as you can fully serialize your updates,
everything is OK. As soon as serialization is broken, all bets are off;
you're back to the same consistency problems as the current multimaster
approach. E.g., you have servers A and B, with A as the master. The network
between A and B fails, but both servers are still running. B promotes itself
to master and accepts updates while the network is down. When the network is
restored, you get to see the mess that's been made. I suggest that this may
require manual intervention; long sequences of updates that depend on each
other may have already been made, so it's not just a matter of looking at
timestamps or arbitrarily choosing which update to keep. Again, this isn't a
huge problem; tools like CVS deal with it all the time. Often CVS can merge
separately patched source trees without any issue; sometimes it finds a
conflict and you have to fix it by hand.

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support