Issue 6097 - MMR problems with deletes
Summary: MMR problems with deletes
Status: UNCONFIRMED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
Importance: Highest normal
Target Milestone: 3.0.0
Assignee: Ondřej Kuzník
Keywords: replication
Reported: 2009-05-07 13:02 UTC by Howard Chu
Modified: 2023-11-16 16:59 UTC
Description Howard Chu 2009-05-07 13:02:51 UTC
Full_Name: Howard Chu
Version: 2.4/HEAD
OS: Linux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (76.91.220.157)
Submitted by: hyc


Given two multimaster servers, if an entry is deleted from one server at the
same time that a child of that entry is created on the other server, the two
servers will probably diverge.

In Persist mode:

1) If the delete has the newer CSN, then on its original server the new child
entry will be ignored because it's too old. On the other server, the delete will
fail since the entry has children.

2) If the delete has the older CSN, then on its server the original entry (which
has already been deleted) will be resurrected as a glue entry, and its original
contents will be lost. On the other server the delete will be ignored because
it's too old.
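
The two Persist cases above can be reproduced with a toy model. This is only a sketch, not slapd code: the Server class, the single newest_csn counter per server, the integer CSNs and the DNs are all assumptions made for illustration, and DN matching is done by plain string suffix.

```python
from dataclasses import dataclass, field

PARENT = "ou=people,dc=example,dc=com"   # hypothetical DNs
CHILD = "cn=new," + PARENT


@dataclass
class Server:
    entries: dict = field(default_factory=dict)  # dn -> "real" or "glue"
    newest_csn: int = 0                          # newest CSN seen so far

    def _too_old(self, csn):
        # Last writer wins: anything older than what we have seen is dropped.
        if csn < self.newest_csn:
            return True
        self.newest_csn = csn
        return False

    def add(self, dn, parent, csn):
        if self._too_old(csn):
            return "ignored"
        if parent not in self.entries:
            self.entries[parent] = "glue"   # missing parent becomes glue
        self.entries[dn] = "real"
        return "ok"

    def delete(self, dn, csn):
        if self._too_old(csn):
            return "ignored"
        if any(other.endswith("," + dn) for other in self.entries):
            return "failed"                 # entry has children
        self.entries.pop(dn, None)
        return "ok"


def run(delete_csn, add_csn):
    a = Server({PARENT: "real"})
    b = Server({PARENT: "real"})
    a.delete(PARENT, delete_csn)    # local delete on A
    b.add(CHILD, PARENT, add_csn)   # local add on B
    a.add(CHILD, PARENT, add_csn)   # B's add replicated to A
    b.delete(PARENT, delete_csn)    # A's delete replicated to B
    return a.entries, b.entries


print(run(delete_csn=2, add_csn=1))  # case (1)
print(run(delete_csn=1, add_csn=2))  # case (2)
```

In this model, case (1) leaves A with neither entry while B keeps both, and case (2) leaves A with a glue parent while B keeps the real one, so the two servers disagree either way.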

In Refresh mode, there won't be a divergence, but the result may not make
sense:

3) On the deleting server, the original entry will be resurrected as a glue
entry and the child will be added. On the other server, the deleted entry will
be turned into a glue entry. As such, both servers will agree, but they'll both
contain a child entry with a basically invalid parent.

As a first step, we should fix the obvious differences between each server. Then
we need to figure out what actually makes sense... (I.e., for the moment, we
accept (3) as correct behavior, and so nothing needs to be fixed for Refresh
mode.)

While we usually declare "last writer wins", these semantics don't make sense for
Deletes. E.g. in (1), in a single-master environment the Delete would fail, and
in (2) the Add would fail. But in a working MMR we'd get the opposite results.

To fix the divergence:

1) When the delete fails, the target entry should be changed to a glue entry,
same as in (3).

2) We should never ignore deletes even if they're old. In this case, the Delete
will fail and the target will be turned into a glue entry.

So our policy will be:

A) Deletes always win
B) Outside of A, last writer wins

With just these changes, all 3 cases will end up with the same resulting trees.
All of them will have glue entries in the tree that have real child entries
though, which is bogus.
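
A minimal sketch of policy A/B together with fix (1), assuming the same toy representation as a dict of dn -> "real" or "glue". The function names accepts and apply_delete are hypothetical and do not correspond to anything in the slapd source; it only shows the other server's side of case (2).

```python
PARENT = "ou=people,dc=example,dc=com"   # hypothetical DNs
CHILD = "cn=new," + PARENT


def accepts(op, csn, newest_csn):
    """Policy A: deletes always win. Policy B: otherwise last writer wins."""
    return op == "delete" or csn >= newest_csn


def apply_delete(entries, dn):
    """Fix (1): a delete that fails on a non-leaf turns the target into glue."""
    has_children = any(other.endswith("," + dn) for other in entries)
    if has_children:
        entries[dn] = "glue"
        return "glue"
    entries.pop(dn, None)
    return "deleted"


# Case (2) on the other server: the child was added with CSN 2,
# then the older delete (CSN 1) arrives and is no longer ignored.
entries = {PARENT: "real", CHILD: "real"}
if accepts("delete", csn=1, newest_csn=2):
    apply_delete(entries, PARENT)
print(entries)
```

The parent ends up as glue with a real child, which matches what the deleting server holds in case (2) and is exactly the bogus state described above.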

To address this problem, the fix for (1) should delete the entire subtree of the
target entry. Likewise for (2), and also we should not create glue entries when
a parent entry is missing; instead we should just ignore the Add.
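
The stricter behavior could look roughly like this. Again a sketch under assumptions: entries is a plain dict of dn -> "real" or "glue", delete_subtree and add_entry are hypothetical helpers, and subordinate DNs are found by string suffix rather than by proper DN normalization.

```python
SUFFIX = "dc=example,dc=com"             # hypothetical DNs
PARENT = "ou=people," + SUFFIX
CHILD = "cn=new," + PARENT


def delete_subtree(entries, dn):
    """Delete dn and everything below it; return the removed DNs."""
    doomed = [d for d in entries if d == dn or d.endswith("," + dn)]
    for d in doomed:
        del entries[d]
    return doomed


def add_entry(entries, dn, parent):
    """Ignore the Add when the parent is missing instead of creating glue."""
    if parent not in entries:
        return False
    entries[dn] = "real"
    return True


# Server that received the Add first, then the Delete.
b = {SUFFIX: "real", PARENT: "real", CHILD: "real"}
delete_subtree(b, PARENT)

# Server that performed the Delete first, then received the Add.
a = {SUFFIX: "real"}
add_entry(a, CHILD, PARENT)

print(a == b)
```

Both orders leave only the suffix entry, with no glue entry anywhere.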

For (3) we need to purge all glue entries and their children after the refresh
completes successfully.

(Note that in Refresh, we need to allow glue entries to be created, since
children may be received before their parents in a regular refresh. But once the
refresh has completed, there should not be any glue entries left; they should
all have been turned into real entries.)
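
The post-refresh purge for (3) might be sketched as follows, assuming the same toy dict of dn -> "real" or "glue". The purge_glue helper is hypothetical, and it is meant to run only after the refresh has completed successfully, since glue is still legitimate while the refresh is in progress.

```python
SUFFIX = "dc=example,dc=com"             # hypothetical DNs
PARENT = "ou=people," + SUFFIX
CHILD = "cn=new," + PARENT
OTHER = "ou=groups," + SUFFIX


def purge_glue(entries):
    """Remove every remaining glue entry together with its subtree."""
    glue = [dn for dn, kind in entries.items() if kind == "glue"]
    doomed = {
        dn
        for dn in entries
        for g in glue
        if dn == g or dn.endswith("," + g)
    }
    for dn in doomed:
        del entries[dn]
    return doomed


# State after the refresh in case (3): glue parent with a real child.
tree = {SUFFIX: "real", PARENT: "glue", CHILD: "real", OTHER: "real"}
removed = purge_glue(tree)
print(sorted(removed))
print(sorted(tree))
```

Only the glue parent and its child are removed; unrelated real entries are left alone.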
Comment 1 Howard Chu 2009-06-23 00:24:56 UTC
moved from Incoming to Development
Comment 2 OpenLDAP project 2017-09-11 18:47:26 UTC
See also ITS#6268
Comment 3 Quanah Gibson-Mount 2017-09-11 18:47:26 UTC
changed notes
Comment 4 Ondřej Kuzník 2021-08-19 13:44:34 UTC
So testing confirms the systems converge right now, extremely noisily (the add or delete fails, we go into refresh, etc.), but things settle in the right way: the parent is removed (made into glue) and the child remains. There are situations where it fails, but those are bugs (to be) filed separately.