
(ITS#6097) MMR problems with deletes

Full_Name: Howard Chu
Version: 2.4/HEAD
OS: Linux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (
Submitted by: hyc

Given two multimaster servers, if an entry is deleted from one server at the
same time that a child of that entry is created on the other server, the two
servers will probably diverge.

In Persist mode:

1) If the delete has the newer CSN, then on its original server the new child
entry will be ignored because it's too old. On the other server, the delete will
fail since the entry has children.

2) If the delete has the older CSN, then on its server the original entry (which
has already been deleted) will be resurrected as a glue entry, and its original
contents will be lost. On the other server the delete will be ignored because
it's too old.
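
The two Persist cases above can be reproduced in a toy model. This is only a
sketch, not slapd code: the Server class, the DNs, and the use of one integer
as "newest CSN seen" are all assumptions made for illustration (real CSNs are
timestamps tracked per server ID).

```python
# Toy model only: not slapd code. CSNs are plain integers and each server
# tracks a single "newest CSN applied" value (both are assumptions).
PARENT = "ou=people,dc=example"
CHILD = "cn=new," + PARENT


class Server:
    def __init__(self, entries):
        self.entries = dict(entries)  # DN -> "real" or "glue"
        self.csn = 0                  # newest CSN applied so far

    def add(self, dn, csn):
        if csn <= self.csn:
            return "ignored (too old)"
        parent = dn.split(",", 1)[1]
        if parent not in self.entries:
            self.entries[parent] = "glue"  # missing parent becomes glue
        self.entries[dn] = "real"
        self.csn = csn
        return "added"

    def delete(self, dn, csn):
        if csn <= self.csn:
            return "ignored (too old)"
        if any(d.endswith("," + dn) for d in self.entries):
            return "failed (has children)"
        self.entries.pop(dn, None)
        self.csn = csn
        return "deleted"


def run(delete_csn, add_csn):
    a = Server({PARENT: "real"})
    b = Server({PARENT: "real"})
    a.delete(PARENT, delete_csn)   # local write on server A
    b.add(CHILD, add_csn)          # local write on server B
    a.add(CHILD, add_csn)          # replicate B's Add to A
    b.delete(PARENT, delete_csn)   # replicate A's Delete to B
    return a.entries, b.entries


print(run(delete_csn=2, add_csn=1))  # case 1: Delete is newer
print(run(delete_csn=1, add_csn=2))  # case 2: Delete is older
```

In this model, case 1 leaves server A empty while server B keeps both entries,
and case 2 leaves server A with a glue parent while server B keeps the real one.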

In Refresh mode, there won't be a divergence, but the result may not make sense:

3) On the deleting server, the original entry will be resurrected as a glue
entry and the child will be added. On the other server, the deleted entry will
be turned into a glue entry. As such, both servers will agree, but they'll both
contain a child entry with a basically invalid parent.

As a first step, we should fix the obvious differences between the servers. Then
we need to figure out what actually makes sense... (I.e., for the moment, we
accept (3) as correct behavior, and so nothing needs to be fixed for Refresh
mode.)

While we usually declare "last writer wins", these semantics don't make sense for
Deletes. E.g. in (1), in a single-master environment the Delete would fail, and
in (2) the Add would fail. But in a working MMR we'd get the opposite results.

To fix the divergence:

1) When the delete fails, the target entry should be changed to a glue entry,
same as in (3).

2) We should never ignore deletes even if they're old. In this case, the Delete
will fail and the target will be turned into a glue entry.

So our policy will be:

A) Deletes always win
B) Outside of A, last writer wins
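
As a minimal sketch, the policy could be written as a single predicate. The
function name, the operation strings, and the integer CSNs are hypothetical
and only illustrate the ordering of rules A and B.

```python
def should_apply(op_type, op_csn, newest_seen_csn):
    """Decide whether a replicated operation is applied (illustrative only)."""
    if op_type == "delete":
        return True                     # A) Deletes always win, even if old
    return op_csn > newest_seen_csn     # B) otherwise last writer wins


# An old Delete is still applied, while an old Add is dropped.
print(should_apply("delete", op_csn=1, newest_seen_csn=2))
print(should_apply("add", op_csn=1, newest_seen_csn=2))
```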

With just these changes, all 3 cases will end up with the same resulting trees.
All of them will have glue entries in the tree that have real child entries
though, which is bogus.

To address this problem, the fix for (1) should delete the entire subtree of the
target entry. Likewise for (2), and also we should not create glue entries when
a parent entry is missing; instead we should just ignore the Add.
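
A rough sketch of the Persist-mode behavior proposed here, in a toy model: a
Delete removes the whole subtree regardless of its CSN, and an Add whose parent
is missing is ignored. The FixedServer class, the DNs, and the integer CSNs are
assumptions for illustration, not slapd internals.

```python
# Toy model only: DNs are plain strings, CSNs are plain integers.
PARENT = "ou=people,dc=example"
CHILD = "cn=new," + PARENT


class FixedServer:
    def __init__(self, dns):
        self.entries = set(dns)
        self.csn = 0  # newest CSN applied so far

    def add(self, dn, csn):
        parent = dn.split(",", 1)[1]
        # Ignore Adds that are too old or whose parent is missing;
        # no glue entry is created.
        if csn <= self.csn or parent not in self.entries:
            return False
        self.entries.add(dn)
        self.csn = csn
        return True

    def delete(self, dn, csn):
        # Deletes always win: never ignored, and they take the subtree along.
        doomed = {d for d in self.entries if d == dn or d.endswith("," + dn)}
        self.entries -= doomed
        self.csn = max(self.csn, csn)
        return bool(doomed)


def run(delete_csn, add_csn):
    a = FixedServer([PARENT])
    b = FixedServer([PARENT])
    a.delete(PARENT, delete_csn)   # local write on server A
    b.add(CHILD, add_csn)          # local write on server B
    a.add(CHILD, add_csn)          # replicate B's Add to A
    b.delete(PARENT, delete_csn)   # replicate A's Delete to B
    return a.entries, b.entries


print(run(delete_csn=2, add_csn=1))  # case 1: Delete is newer
print(run(delete_csn=1, add_csn=2))  # case 2: Delete is older
```

In this model both servers end up with neither the parent nor the child in
either case, so they agree and no glue entry is left behind.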

For (3) we need to purge all glue entries and their children after the refresh
completes successfully.

(Note that in Refresh, we need to allow glue entries to be created, since
children may be received before their parents in a regular refresh. But once the
refresh has completed, there should not be any glue entries left; they should
all have been turned into real entries.)
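
The post-refresh purge for (3) might look roughly like the following sketch. It
assumes a hypothetical in-memory map of DN to "real" or "glue" and compares DNs
by plain string suffix, which skips the DN normalization a real server needs.

```python
def purge_glue(entries):
    """Drop every glue entry and everything beneath it (illustrative only)."""
    glue = [dn for dn, kind in entries.items() if kind == "glue"]

    def under_glue(dn):
        # True for a glue entry itself and for any of its descendants.
        return any(dn == g or dn.endswith("," + g) for g in glue)

    return {dn: kind for dn, kind in entries.items() if not under_glue(dn)}


# State after a refresh that left one glue entry with a real child.
after_refresh = {
    "dc=example": "real",
    "ou=people,dc=example": "glue",
    "cn=new,ou=people,dc=example": "real",
    "ou=groups,dc=example": "real",
}
print(purge_glue(after_refresh))
```

The purge should only run once the refresh has completed successfully, since
glue entries are legitimate while children may still arrive before their
parents.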