Issue 9647 - Glue entry creation doesn't replicate properly
Summary: Glue entry creation doesn't replicate properly
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: 2.6.1
Assignee: Ondřej Kuzník
URL:
Keywords: replication
Depends on:
Blocks:
 
Reported: 2021-08-24 15:01 UTC by Ondřej Kuzník
Modified: 2022-01-20 16:52 UTC

Description Ondřej Kuzník 2021-08-24 15:01:27 UTC
In plain syncrepl, when an entry is turned into a glue entry (which is how it gets removed while it still has children), the change won't replicate correctly to the server's consumers: a NEW_COOKIE intermediate message is sent instead.

Scenario:
- 4 servers (A, B, C, D) and a tree with two entries - cn=parent,cn=suffix and its parent, the database suffix
- D replicates from C, C replicates from A and B, no other links set up for this

Now:
1. add an entry "cn=child,cn=parent,cn=suffix" on A
2. remove "cn=parent,cn=suffix" from B

As things settle, cn=parent,cn=suffix is retained on D while being deleted from C.
Comment 1 Ondřej Kuzník 2021-08-25 14:41:32 UTC
Right, the cause is slightly different. When dealing with the delete and turning the entry into a glue entry, the CSN of the operation is not recorded in the modify; that's why the replica ends up ignoring it.
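To illustrate why a missing CSN leads to the change being ignored, here is a minimal sketch in Python. It is a simplified model, not the syncrepl code: `parse_csn` and `consumer_applies` are hypothetical helpers, the CSN values are made up, and the rule "apply only if the incoming entryCSN is strictly newer than the local one" is an assumption standing in for the real consumer logic.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    """Fields of a CSN, in the order they are compared."""
    timestamp: str  # fixed-width, so string comparison matches time order
    count: int
    sid: int
    mod: int


def parse_csn(text: str) -> CSN:
    """Split 'timestamp#count#sid#mod'; the last three fields are hex."""
    timestamp, count, sid, mod = text.split("#")
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def consumer_applies(local_entry_csn: str, incoming_entry_csn: str) -> bool:
    """Hypothetical consumer rule: apply a change only if it is newer."""
    return parse_csn(incoming_entry_csn) > parse_csn(local_entry_csn)


# Made-up CSNs: the entry as originally written (sid 1), and the delete
# that turns it into a glue entry (sid 2).
ORIGINAL_CSN = "20210824150000.000000Z#000000#001#000000"
DELETE_CSN = "20210824150127.000000Z#000000#002#000000"

# The modify carries no new CSN, so the consumer sees nothing newer.
glue_without_fix = consumer_applies(ORIGINAL_CSN, ORIGINAL_CSN)

# The delete's CSN is recorded in the glue entry, so the change is newer.
glue_with_fix = consumer_applies(ORIGINAL_CSN, DELETE_CSN)
```

Under these assumptions `glue_without_fix` is false and `glue_with_fix` is true, which matches the symptom described: without the operation's CSN in the modify, the replica has no reason to treat it as a new change.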

We should replace the entryCSN as part of the operation, but syncprov should probably figure out that it's a delete when we do; otherwise I suspect we push out glue entries that are leaves to consumers, and they never get reclaimed?
Comment 2 Ondřej Kuzník 2021-08-26 12:31:55 UTC
If that issue is fixed (the glue entry has the right entryCSN recorded), the setup still exposes another issue when sessionlog is not configured/available:

In the above scenario, when things settle and B is allowed to replicate from A, cn=parent,cn=suffix is recreated in the cluster (sometimes everywhere except on D, not sure why yet). This is because A has to do a present phase and sends all entries, including cn=parent,cn=suffix (it hasn't seen the delete yet), while B has processed the delete in full, so it no longer has any information it could use to reject the entry (except contextCSN, which we ignore in this case).

We could start judging entries in the present phase by their entryCSN as compared to our cookie, but that would make it harder to fix existing database differences (if replication failed somehow in the past).
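The idea of judging entries by entryCSN against the cookie could look roughly like the Python sketch below. This is only an illustration of the comparison, not a proposed patch: `parse_csn`, `covered_by_cookie` and the CSN values are hypothetical, and the cookie is modelled simply as a list of CSN strings, one per sid.

```python
from typing import List, Tuple


def parse_csn(text: str) -> Tuple[str, int, int, int]:
    """Split 'timestamp#count#sid#mod'; the last three fields are hex."""
    timestamp, count, sid, mod = text.split("#")
    return (timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def covered_by_cookie(entry_csn: str, cookie_csns: List[str]) -> bool:
    """True if our cookie already covers this entry's last change.

    Looks up the cookie CSN with the same sid as the entry and checks
    that the entry's CSN is not newer than it.
    """
    entry = parse_csn(entry_csn)
    for cookie_csn in cookie_csns:
        cookie = parse_csn(cookie_csn)
        if cookie[2] == entry[2]:  # same sid
            return entry <= cookie
    return False  # no CSN for that sid in the cookie: treat as new


# Made-up state on B: it has seen changes from sid 1 and sid 2.
B_COOKIE = [
    "20210824150100.000000Z#000000#001#000000",
    "20210824150127.000000Z#000000#002#000000",
]

# cn=parent,cn=suffix as sent by A during the present phase.
PARENT_ENTRY_CSN = "20210824150000.000000Z#000000#001#000000"

# B no longer has the entry; under this rule it would not recreate it.
recreate_parent = not covered_by_cookie(PARENT_ENTRY_CSN, B_COOKIE)
```

With these values `recreate_parent` is false, so B would not bring the entry back. The same rule is also what makes repairs harder: an entry that went missing because replication failed in the past would carry an old entryCSN too, and would be skipped in exactly the same way.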
Comment 3 Ondřej Kuzník 2021-10-18 14:57:00 UTC
And during the present phase we don't actually know which sid was responsible for this delete. We can't just pick one at random because we might then relay that deletion over a running persist session (or store it inside our sessionlog for later); if we choose wrong, it's possible that server would never hear about the deletion from anyone.

That suggests we also have to taint the whole sessionlog/accesslog and all running persist sessions, forcing everyone into a full refresh. Whether that is safe depends on whether ITS#8125 and similar issues have actually been fully resolved; otherwise we risk desyncs and/or everyone refreshing and reconnecting indefinitely.
Comment 5 Quanah Gibson-Mount 2021-12-13 17:03:53 UTC
  • 8d514517 
by Ondřej Kuzník at 2021-12-09T20:50:02+00:00 
ITS#9647 Record delete's CSN in the glue entry


  • e8f1038d 
by Ondřej Kuzník at 2021-12-09T20:50:02+00:00 
ITS#9647 Treat glue entries as missing

We're using MANAGE_DSAIT control so we get to see them, but they don't
really exist (except for their CSN sometimes).


  • ba37508f 
by Ondřej Kuzník at 2021-12-09T20:50:02+00:00 
ITS#9647 Find correct sid in compare_csns() more of the time
Comment 6 Quanah Gibson-Mount 2021-12-13 17:11:36 UTC
RE26:

  • 0a7e5abc 
by Ondřej Kuzník at 2021-12-13T17:04:15+00:00 
ITS#9647 Record delete's CSN in the glue entry


  • 93487d54 
by Ondřej Kuzník at 2021-12-13T17:04:19+00:00 
ITS#9647 Treat glue entries as missing

We're using MANAGE_DSAIT control so we get to see them, but they don't
really exist (except for their CSN sometimes).


  • fb4f4227 
by Ondřej Kuzník at 2021-12-13T17:04:23+00:00 
ITS#9647 Find correct sid in compare_csns() more of the time