RE: Multimaster further work
> > The whole idea behind the MAX_LAG_TIME number is that, once that
> > amount of time has passed, we are guaranteed to know
> > that there are no more conflicting writes coming in for a particular
> Actually, the issue here is that it doesn't actually guarantee anything,
> since a master could go down between the time a change is committed to
> the time it can be replicated (among many other scenarios).
Saying that MAX_LAG_TIME "guarantees" anything was misleading, and
I apologize for that. What it really does is let you specify an upper
limit on the time window within which two differing writes count as a "conflict".
If I make a change today, and you make a differing change to the
same DN next week sometime, is that a conflict? Of course not. But what
if there are two masters, with supposedly-sync'd data, where it takes one
week for updates from one to be replicated to the other? Then we could
indeed have a conflict.
In the real world, the time won't be measured in weeks; it will be
measured in milliseconds, but the idea remains. If several masters are
supposed to accept writes for the same DN, then a user sitting in front of
one master will be able to make updates at the same (global) time as a
user sitting in front of another master that is far away--far enough that
the time it takes to replicate the entries between the two masters is a
limiting factor in knowing which occurred first.
Timestamps are a good idea, but have the limitation that your
masters need to have their clocks in sync. If they are off, then I could
make a change now, you could make a change later, but your computer's
clock is set behind so your change would get replicated as being the
"first" one entered. Now, the master I'm sitting next to has already
committed my change; am I supposed to "undo" my previously-committed
entry, then commit your entry, then re-commit my entry? Are you supposed
to commit my entry as being the "latest and greatest" data to commit to
the database (even though your data was the latest to be entered)? What
if a master gets a replication with a timestamp set in the future? Many
hardware clocks have a resolution of only 10 usecs; if we're talking about
a cluster on a Gigabit network, with large volumes of data being
replicated, that is a significant limitation.
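To make the clock-skew problem concrete, here is a minimal Python sketch. The names (`Write`, `order_by_timestamp`) and the numbers are hypothetical and purely for illustration; this is not OpenLDAP code. It shows how a skewed clock can make the later write carry the earlier timestamp, so a "latest timestamp wins" rule keeps the wrong entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    master: str        # which master accepted the write
    true_time: float   # real (global) time of the write, in seconds
    clock_skew: float  # how far that master's clock is off, in seconds
    value: str

    @property
    def timestamp(self) -> float:
        # What the master actually stamps on the change.
        return self.true_time + self.clock_skew


def order_by_timestamp(writes):
    # Sort oldest-first by stamped time; the last element is the one
    # a "latest timestamp wins" rule would keep.
    return sorted(writes, key=lambda w: w.timestamp)


# I write first on master A (accurate clock); you write 2 seconds later
# on master B, whose clock runs 5 seconds slow.
mine = Write("A", true_time=100.0, clock_skew=0.0, value="my change")
yours = Write("B", true_time=102.0, clock_skew=-5.0, value="your change")

winner = order_by_timestamp([mine, yours])[-1]
```

Here `winner` is my change, even though yours was really entered later.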
Granted, in the real world, conflicts don't come up very
frequently (I've never experienced one myself). But what I'm after is a
sound theoretical solution that lets us know _without doubt_ that masters
can keep their dbm files in sync.
> MAX_LAG_TIME seems to add additional complexity while not really
> helping the servers figure out the actual order of changes,
Could you elaborate on this point?
If you don't have an upper time limit on what is or is not a
conflict, then ... what is a conflict?
One definition for "conflict" could be that multiple masters have
different data in their dbms for a given DN. But that happens as soon as
anything is written in a multimaster environment; there is a time lag
between the time I make a change, and when that change has been propagated
to all the other masters. So by that definition, there is a "conflict"
every time somebody writes a change but that change has not yet been
propagated to all the other masters.
I think a "conflict" means that there is different data in the
dbms (not necessarily on disk, just stored in the internal database) that
will remain different, assuming no more writes come in. The fundamental
problem is that we never know if more writes are already en route from
other masters. MAX_LAG_TIME is an attempt to set an upper limit on how
long we wait before deciding no (potentially conflicting) updates are
already en route.
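A rough sketch of that waiting window, in Python. This is only one possible reading, under the assumption that each master holds the writes for a DN as conflict candidates until MAX_LAG_TIME has elapsed since the first one arrived. The class `PendingDN`, the function `is_settled`, and the value of 2 seconds are all hypothetical.

```python
MAX_LAG_TIME = 2.0  # seconds; hypothetical value chosen by the administrator


def is_settled(first_write_time: float, now: float,
               max_lag_time: float = MAX_LAG_TIME) -> bool:
    """Return True once the window for conflicting writes has closed."""
    return now - first_write_time >= max_lag_time


class PendingDN:
    """Writes to one DN that may still conflict with each other."""

    def __init__(self, first_seen: float, value: str):
        self.first_seen = first_seen
        self.candidates = [value]

    def offer(self, arrival_time: float, value: str,
              max_lag_time: float = MAX_LAG_TIME) -> bool:
        """Accept a write as a conflict candidate only inside the window."""
        if not is_settled(self.first_seen, arrival_time, max_lag_time):
            self.candidates.append(value)
            return True
        # Window closed: this is simply a new, later change, not a conflict.
        return False
```

A write arriving 1 second after the first would join the candidates; one arriving 3 seconds after would be treated as an ordinary later update.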
> as well as a process for handling exceptional conditions/conflicts
> (perhaps as simple as notifying an administrator of unresolvable
> conflicts as is done in some loosely synchronized systems).
My goal is to eliminate "unresolvable conflicts" altogether, by
specifying an upper limit on how much time any one of the masters has to
submit (potentially) conflicting data.
> Also, I read your original post and the MD5-related checking could be
> avoided by using change sequence numbers or such that indicate the
> version of the data in the entry.
The problem is that all masters are supposed to store identical
data. So what happens when two conflicting changes occur faster than
those changes can be replicated? Whose "change sequence number or such"
I propose that neither is "more correct" than the other, that they
are both equally correct to the time resolution of the system, and that
MAX_LAG_TIME lets you specify that time resolution of the system. Using
MD5s is just a useful way to arbitrarily pick one of the entries to be
correct, and to have that pick be uniform across all masters.
We could just as easily say "the one closest to the North Pole" is
correct. But then we'd need to maintain GPS data for each of our masters.
Using MD5s seems a little more convenient.
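The arbitrary-but-uniform pick could look something like the following Python sketch. The function name `pick_winner`, the choice of "lowest digest wins", and hashing the entry as a UTF-8 string are all assumptions for illustration; any fixed rule works as long as every master applies the same one.

```python
import hashlib


def pick_winner(candidates):
    """Pick one conflicting entry deterministically by lowest MD5 digest.

    The result depends only on the entries' contents, not on the order
    in which they arrived, so every master makes the same pick.
    """
    return min(
        candidates,
        key=lambda entry: hashlib.md5(entry.encode("utf-8")).hexdigest(),
    )


# Two masters see the same conflicting entries in different orders.
seen_by_master_a = ["mail: bob@example.com", "mail: robert@example.com"]
seen_by_master_b = list(reversed(seen_by_master_a))

same_pick = pick_winner(seen_by_master_a) == pick_winner(seen_by_master_b)
```

No timestamps or synchronized clocks are involved in the pick; `same_pick` holds regardless of arrival order.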
> As for the need for multi-master. Certainly it's a nice tool to have
> in the arsenal, but I've worked with many clients that have gotten
> along fine without it in some very large scale environments.
Yes, I think one reason this problem has not already been solved
in OpenLDAP is that there is no real-world requirement for it. I'm
deploying a multimaster environment right now, with no conflict resolution
scheme. I'm relying on the fact that everything is connected in a very
fast network, and that I have pretty tight control over the end users as to
when and how they make changes. "Good enough" syndrome.
But being able to safely deploy multiple masters across the globe
is a desirable goal for large companies. That means bigger lag times,
with more users in front of different masters, with conflicts being more
likely to occur. The 9-11 event has also brought fear as a motivating
factor, and personally, I think the global economy and corporate
internationalization will bring a need within the next few years. It
would be cool if OpenLDAP could say, without doubt, that any conflict will
be uniformly resolved across all masters.
> Just my thoughts (not being an OpenLDAP server developer or anything,
> but having dealt with many of these hard issues myself from time to
Thank you for the great feedback! This is fun stuff.