Re: Multimaster further work

On Fri, Nov 01, 2002 at 09:02:18AM -0500, Mark H. Wood wrote:

> On Fri, 1 Nov 2002, Andrew Findlay wrote:
> [snip]
> > If we want to give absolute assurance of database consistency across all
> > servers, it is essential that all servers receive and acknowledge each
> > update *before the result of the operation is returned to the client*.
> But for a directory service this level of consistency is usually
> considered too stringent and much too costly (in terms of performance).
> In the cases I'm familiar with, it is considered best that the service can
> converge quickly (O(seconds)) to a consistent state but that brief
> inconsistency is permissible.  In a directory, new object attribute values
> are almost never dependent on previous values, so strict ordering and
> absolute consistency are not necessary.

I agree, though there are some fairly common applications where great
care is needed to maintain this property. One example that comes up on
this list every few weeks is the use of LDAP as NIS, holding login
account information: a common requirement is to select the 'next'
numerical UID when a new account is created. The usual advice in such
cases is to have an entry that holds the value of the next number to
be used, and to update it in this way:

	repeat {
		CurrentNumber = ReadValue(entry)
		# One LDAP Modify operation containing both changes:
		#   delete value CurrentNumber
		#   add value CurrentNumber + 1
		# The server rejects the whole Modify if the delete no
		# longer matches, i.e. if someone else got there first.
		success = Modify(entry,
		                 delete: CurrentNumber,
		                 add: CurrentNumber + 1)
	} until success

This depends on the delete-insert modification taking effect globally
as an atomic operation. Any partition in the set of masters will
either block such operations or break the assumption on which they
rely.
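To make the retry loop above concrete, here is a minimal sketch in
Python. The `Entry` class and its method names are illustrative
inventions, not any LDAP API: a lock plus a value check stands in for
the server-side guarantee that a Modify containing a delete of the old
value and an add of the new one either succeeds whole or fails whole.

```python
import threading

class Entry:
    """Toy stand-in for the directory entry holding the 'next UID'."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        return self._value

    def delete_insert(self, old, new):
        """Simulate the one-shot LDAP Modify: delete `old`, add `new`.
        Fails (returns False) if the stored value is no longer `old`,
        i.e. another client incremented it in the meantime."""
        with self._lock:
            if self._value != old:
                return False
            self._value = new
            return True

def allocate_uid(entry):
    """The retry loop from the post: read, then attempt the swap."""
    while True:
        current = entry.read()
        if entry.delete_insert(current, current + 1):
            return current   # this UID is now ours alone
```

Note that this only works if the delete-insert really is atomic across
all masters, which is exactly the property a partitioned multi-master
setup cannot provide.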

Of course, with a little ingenuity, it is often possible to work around this
limitation. One solution would be to allocate a range of UIDs to each
server so that UID allocations can proceed without reference to other
sites. Some scheme to avoid DN clashes would also be needed - if this
were done by giving each site its own part of the DIT then we would
not need multi-master at all...
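The per-site range idea can be sketched in a few lines. The base and
block-size numbers below are arbitrary examples; the point is only
that disjoint ranges let each server allocate with no coordination.

```python
def uid_range(server_index, base=10000, block=1000):
    """Give server number `server_index` its own disjoint block of
    UIDs (illustrative parameters, not a recommendation)."""
    start = base + server_index * block
    return range(start, start + block)

class RangeAllocator:
    """Allocates UIDs locally from this server's private block."""
    def __init__(self, server_index):
        self._next = iter(uid_range(server_index))

    def allocate(self):
        # raises StopIteration when the block is exhausted; a real
        # scheme would then need to obtain a further block
        return next(self._next)
```

The obvious cost is that UIDs are no longer dense, and a site that
exhausts its block needs some out-of-band way to get another one.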

>  We are not recalculating bank
> balances here.  What *is* strictly necessary is that all masters will
> eventually reach the same conclusion as to the correct then-current value.

In which case, I would say that the simplest approach is to require
synchronised clocks (not hard to do). Whenever an update comes in with
a timestamp older than the one already in the entry, it triggers a
clash resolution process. This must cope with overlapping updates, so
I suggest the process should be to send a complete new copy of the
entry to the server that is trying to spread old data, to be treated
as a fresh update which will therefore be propagated to all other
servers.

Assuming that all masters can eventually make contact with
all other masters, and that clock resolution is high enough, this
should result in convergence. The effects of some updates may be
completely lost even if they were non-overlapping in terms of the
attributes modified. If there is a worry about clock resolution, then
the MD5 scheme proposed earlier in this thread could be applied as
well (but the clock value should take precedence, to avoid a server
that has been cut off for months wiping out a load of entries when it
comes back).
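The timestamp scheme can be sketched as below. All names here are
hypothetical, not OpenLDAP code: `entry_store` maps a DN to a dict
with a `ts` timestamp and an `attrs` dict, and `send_fresh_copy` is an
assumed callback that re-propagates the winning entry.

```python
import time

def receive_update(entry_store, dn, update, send_fresh_copy):
    """Apply an incoming replicated update, newest timestamp wins.

    If the incoming update is older than what we already hold, keep
    our copy and push the complete newer entry back with a fresh
    timestamp, so the stale server (and everyone downstream) is
    overwritten by it as if it were a brand-new update.
    """
    local = entry_store.get(dn)
    if local is None or update['ts'] > local['ts']:
        entry_store[dn] = update          # newer data wins outright
        return 'applied'
    # Incoming update is stale: clash resolution. Re-stamp our whole
    # entry as a fresh update and send it to the lagging server.
    fresh = {'ts': time.time(), 'attrs': dict(local['attrs'])}
    entry_store[dn] = fresh
    send_fresh_copy(dn, fresh)
    return 'rejected-resent'
```

As noted above, this resolves clashes at entry granularity: a stale
server's non-overlapping attribute changes are discarded along with
its stale ones.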

One problem though: deletions... We would need to keep deleted
entries around (as tombstones) so that they could carry timestamps
for the time they were deleted!
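A minimal sketch of such tombstones, using the same illustrative
timestamp-wins rule as above (class and method names are invented for
this example):

```python
class TombstoningStore:
    """Deleted entries are retained as tombstones carrying their
    deletion timestamp, so a replica pushing a stale copy cannot
    silently resurrect them."""
    def __init__(self):
        # dn -> {'ts': float, 'attrs': dict or None (None = deleted)}
        self._entries = {}

    def put(self, dn, ts, attrs):
        cur = self._entries.get(dn)
        if cur is None or ts > cur['ts']:
            self._entries[dn] = {'ts': ts, 'attrs': attrs}

    def delete(self, dn, ts):
        # keep the entry, but mark it deleted at time `ts`
        self.put(dn, ts, None)

    def get(self, dn):
        e = self._entries.get(dn)
        return None if e is None or e['attrs'] is None else e['attrs']
```

A real implementation would also need a policy for eventually expiring
old tombstones, which reopens the long-offline-server problem noted
above.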

> (For the archive:  this is another reason why directory service !=
> general-purpose DBMS.)

Absolutely right.

|                 From Andrew Findlay, Skills 1st Ltd                 |
| Consultant in large-scale systems, networks, and directory services |
|     http://www.skills-1st.co.uk/                +44 1628 782565     |