
Re: Multimaster further work



> Well, it's "just a master", so scalability doesn't matter that much.

	Ahem... my customers would disagree.  And so would I.  The first
time your boss' laptop gets stolen and you can't revoke his stolen
client-side certificate from the LDAP database, you'll change your mind :)


> > But the point is, if we know that MAX_LAG_TIME has passed since the
> > last write for a particular DN, then we know that there is no
> > conflicting write, because enough time has passed for us to know (for
> > sure) that no other conflicting write has occurred.
>
> No. Absolute minimum would be twice MAX_LAG_TIME. (round-trip)

	I should clarify what I meant by MAX_LAG_TIME.

	It is NOT the amount of time it takes for information to travel
across some network (although, that is one component).  It is the maximum
amount of time it would take for us to receive an update from the furthest
away master.

	If you have a 386sx as one of your masters, then you would need to
include the time it takes for that old thing to process a request, write
it to the dbm, write it out to the slurpd replication log, and then launch
the slurpd child processes to propagate that information to us.  The speed
of the network is only one component, although in practice I expect it to
be the overriding factor.

	The whole idea behind the MAX_LAG_TIME number is that, once that
amount of time has passed, we are guaranteed to know that there are no
more conflicting writes coming in for a particular DN.  In practice I
expect it to be something like the ping time (which is a round trip, btw)
times two or three, to account for network spikes.
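	As a rough sketch of that estimate in code (the 3x safety factor
is just an assumption standing in for "two or three, to account for
network spikes"):

```python
def estimate_max_lag_time(worst_ping_rtt_secs, safety_factor=3):
    """Derive a MAX_LAG_TIME estimate (in seconds) from the worst
    measured round-trip time to any master.  The safety factor is
    an assumed multiplier to absorb network spikes and slow-master
    processing time."""
    return worst_ping_rtt_secs * safety_factor
```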

> But:
> If we don't affect the wait on server A, server B may get a third update
> within it's wait time, that server A will get _after_ it committed one
> of the first two to the backend.

	Good point.  I've actually thought about this, and I think I have
a solution.

	When you write out (and replicate out) a change, we flag that as
time_init for that DN.

	Now, when we get a conflicting write from one of the other
masters, we should wait not just MAX_LAG_TIME, but MAX_LAG_TIME plus the
maximum amount of time it could take for our last write to get to the
other masters -- call it "time_difference".  This way we can be sure that
all the masters have our conflicting write candidate for consideration.
Here it is in pseudocode:

time_init = the_time_the_first_write_occurred_for_this_dn
next_time = the_time_the_potentially_conflicting_write_came_in_for_this_dn

time_difference = next_time - time_init

if time_difference < MAX_LAG_TIME:
	there_is_a_conflict = true

if there_is_a_conflict:
	time_period_required_to_ensure_our_write_was_received = \
		MAX_LAG_TIME - time_difference

	time_to_wait_for_other_conflict_candidates = \
		time_period_required_to_ensure_our_write_was_received + MAX_LAG_TIME

	The var "time_period_required_to_ensure_our_write_was_received"
above will be different for each master, thus "sync'ing" up the servers in
the global time frame.  ...I think.  I'd like to know if that makes sense
to other people.
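	The pseudocode above works out to the following small Python
sketch (the MAX_LAG_TIME value and the shorter variable names are my
own; the logic follows the formulas above):

```python
MAX_LAG_TIME = 5.0  # seconds; assumed value, for illustration only

def conflict_wait(time_init, next_time, max_lag=MAX_LAG_TIME):
    """Return how long to wait for further conflict candidates, or
    None if the second write arrived too late to be a conflict.

    time_init -- when the first write for this DN occurred
    next_time -- when the potentially conflicting write came in
    """
    time_difference = next_time - time_init
    if time_difference >= max_lag:
        return None  # no conflict: the first write is already settled
    # Time remaining until every master is guaranteed to have received
    # our write, plus a full MAX_LAG_TIME for their candidates to reach us.
    time_to_ensure_received = max_lag - time_difference
    return time_to_ensure_received + max_lag
```

	Note that the wait shrinks as time_difference grows, which is
what lines the masters up in the global time frame.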


> I think we may avoid this, by delaying genuine updates, until we have
> committed replicated (and conflicting) ones, but I'm not quite sure
> about that.

	I think that waiting MAX_LAG_TIME before committing and replicating
the first write is pointless; if every server does that, then all you have
done is delay the entire conflict scenario by MAX_LAG_TIME, thus creating
an unresponsive server for the endusers.  The first write should be
committed as soon as it's received, and that DN should have its time_init
reset at that point.

	If, and only if, we get another write for that DN within
MAX_LAG_TIME, do we have a conflict.  It is at that point that we need to

(a) wait long enough to be sure all masters have all conflict candidates
(b) choose one of the candidates
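	A minimal sketch of that per-DN flow (DNState, handle_write, and
the explicit "now" parameter are all my own hypothetical names; commit
and replication are left out as placeholders):

```python
import time

MAX_LAG_TIME = 5.0  # seconds; assumed value, for illustration only

class DNState:
    """Per-DN bookkeeping: when the last committed write happened,
    and any conflict candidates buffered since then."""
    def __init__(self):
        self.time_init = None   # time of the first (committed) write
        self.candidates = []    # writes awaiting conflict resolution

def handle_write(state, write, now=None):
    """Commit a first write immediately and reset time_init; buffer
    any write arriving within MAX_LAG_TIME as a conflict candidate."""
    now = time.time() if now is None else now
    if state.time_init is None or now - state.time_init >= MAX_LAG_TIME:
        state.time_init = now       # reset the window for this DN
        state.candidates = [write]  # committed; no conflict (yet)
        return "committed"
    state.candidates.append(write)  # conflict: wait, then rank
    return "conflict"
```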


> > 2. Once MAX_LAG_TIME has passed, make an MD5 Digest of the data for
> > each conflicting write individually.  This is the write's "score".
> >
> > 3. The DN data with the highest score wins.
>
> I like that idea. :o)

	I've been thinking about this.

	One DN write might update, say, a uid, and nothing else.  But a
conflicting write may update that DN's uid, AND the homeDirectory.  What
if the uid update was actually to the same value?  Is that really a
conflict?  It seems more like an addendum to me.

	Slurpd does not distinguish attribute-level changes for a DN, and
I think writing the code to make that happen is a bad idea.  Therefore, I
propose the following change to my initial proposal:

3. Instead of just committing the "winner" and throwing everything else
away, we just use the MD5 Digest to _rank_ the writes in order, such that
all the conflicting write candidates are in the same order on all masters.
Then, we commit each of the writes in-order.

	This way, if multiple (but different) attributes are changed
between the conflicting writes, all of those attribute changes will get
committed (instead of thrown away).  But if there are conflicting
attribute changes, then all masters will commit the writes in the same
order, thus ensuring that the data is consistent between all masters.
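	The ranking itself is a one-liner.  A sketch, assuming each
candidate's data is available as raw bytes (the function name is mine;
the commit step is left as a placeholder):

```python
import hashlib

def digest_commit_order(candidate_writes):
    """Sort candidate writes by the MD5 digest of their data, highest
    "score" first.  Because the digest depends only on the data, every
    master computes the same order and commits in the same sequence."""
    return sorted(
        candidate_writes,
        key=lambda data: hashlib.md5(data).hexdigest(),
        reverse=True,
    )
```

	The point is that the order is a pure function of the data, so
no coordination between masters is needed to agree on it.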

	Comments?

> Generally I think this topic is quite interesting and bears some fun, so
> please keep me/us uptodate on its progression.

	I think this list makes a great forum.  And I think this topic is
quite interesting also.

	Yesterday I noticed

http://www.ietf.org/html.charters/ldup-charter.html

	but I have not had time to read over it yet.  They are almost two
years behind schedule, but the last update was 9-13 of this year.  I
wonder if they have already come up with a solution to this problem.


> Maybe we should also switch the discussion to -devel list, it's off
> topic here, and there's also less traffic, so I'm not that likely to
> overlook your and others messages.

	I strongly disagree.  I think this is an issue that should have
strong input from the endusers.  Devel lists are somewhat exclusive to
people who are great at writing code, but not necessarily great at
identifying endusers' needs.  I think this discussion is best held out "in
the open", where real-world users of OpenLDAP can make sure their needs
are being taken into account.

	When it comes time to implement, *then* might be a good time to
move to -devel.  :)


Thanks,
Derek Simkowiak