Issue 9341 - Delta-sync MPR needs to be stable regardless of ordering
Status: UNCONFIRMED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: backends
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: Ondřej Kuzník
URL:
Keywords: replication
Depends on:
Blocks:
 
Reported: 2020-09-08 20:30 UTC by Ondřej Kuzník
Modified: 2023-11-07 17:21 UTC

See Also:


Description Ondřej Kuzník 2020-09-08 20:30:45 UTC
If two or more updates are accepted by different providers before those providers have a chance to learn about each other's changes, all replicas need to arrive at the same content regardless of the order in which the updates arrive.

One example that is broken at the moment:
- (csn a) server 1 accepts a modify
- (csn b) server 2 accepts a delete on the same DN
- (csn c) server 2 accepts an add on that DN again

If a replica receives the operations in the order bca rather than abc, the content of the entry will be different even though the final CSN set is the same, so the replicas will never converge. The ordering bac also needs to result in eventual convergence, possibly at the cost of a refresh or of replication from either provider stalling temporarily.

Merge request with this test case (so far):
https://git.openldap.org/openldap/openldap/-/merge_requests/145
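
To make the modify/delete/add divergence concrete, here is a rough toy model in Python. It is not how syncrepl is implemented: the consumer is assumed to simply apply each operation in arrival order and to silently skip one that cannot apply, and the attribute values, `OPS` and `apply_in_arrival_order` are made up for illustration.

```python
# Toy operations keyed by the labels used in the description.
# All attribute values are hypothetical.
OPS = {
    "a": ("modify", {"description": "changed on server 1"}),
    "b": ("delete", None),
    "c": ("add", {"cn": "entry", "description": "re-added on server 2"}),
}


def apply_in_arrival_order(order, initial):
    """Apply the operations in arrival order on a single entry.

    Assumption: an operation that cannot apply (modify of a missing
    entry, add of an existing one) is silently skipped.
    """
    entry = dict(initial) if initial is not None else None
    for label in order:
        kind, payload = OPS[label]
        if kind == "modify" and entry is not None:
            entry.update(payload)
        elif kind == "delete":
            entry = None
        elif kind == "add" and entry is None:
            entry = dict(payload)
    return entry


initial = {"cn": "entry", "description": "original"}
results = {
    order: apply_in_arrival_order(order, initial)
    for order in ("abc", "bca", "bac")
}

for order, entry in results.items():
    print(order, entry)
```

In this model abc ends with the re-added entry untouched, while bca ends with the re-added entry carrying server 1's modification, so the two replicas hold the same CSN set but different content. The bac ordering only looks harmless here because the model skips the modify on a missing entry; a real consumer has to deal with that failure somehow.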
Comment 1 Ondřej Kuzník 2020-09-08 20:35:16 UTC
Another case:
- (csn a) server 1 renames entry a to a'
- (csn b) server 2 deletes entry a
- optional: (csn c) server 2 renames entry b to a

abc, bac and bca should also eventually converge on the same state.
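
The rename case can be sketched with the same kind of toy model. Again this is only an illustration under stated assumptions, not the actual consumer logic: operations are applied in arrival order, an operation whose source name is missing (or whose target name is taken) is skipped, and `apply_renames` plus the "entry-A"/"entry-B" identities are hypothetical names.

```python
def apply_renames(order):
    """Apply the rename/delete operations in arrival order.

    The toy directory maps a name to the identity of the entry that
    currently lives under it.
    """
    tree = {"a": "entry-A", "b": "entry-B"}
    ops = {
        "a": ("rename", "a", "a'"),  # server 1 renames entry a to a'
        "b": ("delete", "a", None),  # server 2 deletes entry a
        "c": ("rename", "b", "a"),   # server 2 renames entry b to a
    }
    for label in order:
        kind, src, dst = ops[label]
        if src not in tree:
            continue  # assumption: missing source means the op is skipped
        if kind == "delete":
            del tree[src]
        elif dst not in tree:
            tree[dst] = tree.pop(src)
    return tree


for order in ("abc", "bac", "bca"):
    print(order, apply_renames(order))
```

Under these assumptions the three orderings produce three different trees. Worse, in bca the rename from server 1 is applied to what used to be entry b, because by then it is the entry that carries the name a.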
Comment 2 Ondřej Kuzník 2020-09-08 20:42:13 UTC
At the moment, getting a good/stable resolution is further complicated by the fact that there is an additional ordering: the timestamps of the CSNs themselves, which may or may not match the order in which the operations arrive at each consumer. The only thing we can guarantee is that if CSN a < CSN b apply to the same entry and both share the same serverId, they will arrive in the order a, then b, or as a somehow "combined" operation carrying the CSN of b.
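
A small sketch of that guarantee, for reference. It assumes the usual textual CSN layout of timestamp, change count, server ID and modification number separated by `#`, with the last three fields in hex, and it assumes every CSN in the list applies to the same entry. The helper names and the sample CSNs are made up.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: str  # fixed-width, so plain string comparison orders it
    count: int
    sid: int
    mod: int


def parse_csn(text):
    """Split a CSN string of the assumed form time#count#sid#mod."""
    timestamp, count, sid, mod = text.split("#")
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def per_sid_order_holds(arrivals):
    """True if CSNs from each server ID arrive in non-decreasing order."""
    last = {}
    for csn in map(parse_csn, arrivals):
        key = (csn.timestamp, csn.count, csn.mod)
        if csn.sid in last and key < last[csn.sid]:
            return False
        last[csn.sid] = key
    return True


# Hypothetical CSNs: a comes from server 1, b and c from server 2.
a = "20200908203045.000001Z#000000#001#000000"
b = "20200908203045.000002Z#000000#002#000000"
c = "20200908203046.000000Z#000000#002#000000"

print(per_sid_order_holds([a, b, c]))  # abc
print(per_sid_order_holds([b, c, a]))  # bca
print(per_sid_order_holds([c, b, a]))  # c before b, both from server 2
```

The orderings abc, bac and bca all satisfy the per-server guarantee even though only abc matches timestamp order, which is why the guarantee alone does not pin down a single result.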
Comment 3 Quanah Gibson-Mount 2020-11-13 23:14:40 UTC
Saw something similar to this today:

MPR 1 - Gets MOD at time = x
MPR 2 - Gets DEL at time = x + fraction of a second (like 1/1000th)

consumer replicates 2, then replicates 1

consumer goes into REFRESH

consumer recreates entry

consumer deletes entry

consumer goes back into REFRESH

consumer ADDs entry again

consumer ABORTS due to assert when it tries to add the entry a second time into the sessionlog