Issue 8768 - Syncprov shouldn't send a new cookie at the end of delete phase
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: 2.5.0
Assignee: OpenLDAP project
Blocks: 6467
Reported: 2017-10-31 17:34 UTC by Ondřej Kuzník
Modified: 2020-10-14 21:14 UTC

Description Ondřej Kuzník 2017-10-31 17:34:05 UTC
Full_Name: Ondrej Kuznik
Version: master/re24
Submission from: (NULL) (82.10.24.68)


If sessionlog data is available and usable, syncprov sends a cookie at the end
of the delete phase, followed by the entries modified since the time recorded in
the client's original cookie.

Some of those entries might have been last modified before the time recorded in
the new cookie. If the connection is severed before they are transmitted, they
would not be re-sent to a client that resumes with the new cookie.
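
The failure mode above can be modelled with a small sketch. Everything in it is
hypothetical: integer stamps stand in for real CSNs, `refresh` and `consume` are
illustrative helpers rather than syncprov code, and the mid-stream cookie is
assumed to carry the provider's newest stamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    dn: str
    csn: int  # last-modified stamp; real CSNs are strings, integers keep this short


def refresh(entries, deleted, client_csn, cookie_after_deletes=True):
    """Yield the messages a simplified provider sends for one refresh.

    Deletes newer than client_csn come first (the delete phase), optionally
    followed by a cookie, then the entries newer than client_csn, then a
    final cookie.
    """
    newest = max([e.csn for e in entries] + [csn for _, csn in deleted])
    for dn, csn in deleted:
        if csn > client_csn:
            yield ("delete", dn)
    if cookie_after_deletes:
        yield ("cookie", newest)
    for e in sorted(entries, key=lambda e: e.csn):
        if e.csn > client_csn:
            yield ("entry", e.dn)
    yield ("cookie", newest)


def consume(messages, cookie, cut_after=None):
    """Apply messages until the connection is cut; return (DNs seen, last cookie)."""
    seen = []
    for i, (kind, value) in enumerate(messages):
        if cut_after is not None and i >= cut_after:
            break  # connection severed here
        if kind == "cookie":
            cookie = value
        else:
            seen.append(value)
    return seen, cookie


entries = [Entry("cn=a", 5), Entry("cn=b", 7)]
deleted = [("cn=gone", 9)]

# Session 1 is cut right after the mid-stream cookie; session 2 resumes from it.
_, cookie = consume(refresh(entries, deleted, 3), 3, cut_after=2)
resumed, _ = consume(refresh(entries, deleted, cookie), cookie)
print(cookie, resumed)  # cn=a and cn=b are never delivered
```

Calling `refresh(..., cookie_after_deletes=False)` in the same scenario leaves the
client's cookie unchanged after the cut, so the resumed search sends the delete
and both entries again.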

To reproduce this issue, configure sessionlog in test043. After the test has
finished, run ldapsearch with a cookie set to the entryCSN from the mod that adds
dc=testdomain2,dc=example,dc=com, and record the cookie returned at the end of
the delete phase (it corresponds to the operation that deletes
cn=Rosco P. Coltrane,ou=Retired,ou=People,dc=example,dc=com).

Then run a syncrepl search with the recorded cookie instead. Taking only the
information from this search and the delete phase of the previous search, the
client will not see modifications to these objects:
- cn=ITD Staff,ou=Groups,dc=example,dc=com
- cn=Gern Jensen,ou=Information Technology Division,ou=People,dc=example,dc=com
- ou=Retired,ou=People,dc=example,dc=com
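
As a rough aid for the reproduction steps, the sketch below builds the two
ldapsearch command lines. It relies on ldapsearch's `-E sync=ro[/<cookie>]`
search extension; the URI, the rid and both CSN strings are made-up placeholders
that have to be replaced with values from an actual test043 run, and
`sync_search_argv` is a hypothetical helper.

```python
import shlex


def sync_search_argv(uri, base, cookie=None):
    """Build an ldapsearch command line for a refreshOnly sync search."""
    sync = "sync=ro" if cookie is None else f"sync=ro/{cookie}"
    return ["ldapsearch", "-x", "-H", uri, "-b", base,
            "-s", "sub", "-E", sync, "(objectClass=*)"]


# Placeholders only: substitute the provider URI used by the test run, the
# entryCSN of the mod that adds dc=testdomain2,dc=example,dc=com, and the
# cookie recorded at the end of the first search's delete phase.
URI = "ldap://localhost:9011/"
FIRST_COOKIE = "rid=000,csn=<entryCSN of the testdomain2 add>"
RECORDED_COOKIE = "rid=000,csn=<CSN from the delete-phase cookie>"

for cookie in (FIRST_COOKIE, RECORDED_COOKIE):
    print(shlex.join(sync_search_argv(URI, "dc=example,dc=com", cookie)))
```

Comparing the entries returned by the second command with the list above shows
which modifications the client would miss.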
Comment 1 Ondřej Kuzník 2017-11-02 15:54:39 UTC
On Tue, Oct 31, 2017 at 05:34:05PM +0000, ondra@openldap.org wrote:
> If sessionlog data is available and found useful, syncprov will send a cookie at
> the end of delete phase, itself followed by the entries modified since time
> recorded in the client's original cookie.
> 
> Some of those entries might have been last modified before the new cookie's
> recorded time and if the connection is severed before this is communicated, they
> would not be re-sent under the new cookie.

There are other problems with this. I have always assumed that the CSN of
each write is globally unique in a well-configured system and that this
is preserved across replication, since MMR needs that to function
properly. This assumption is clearly invalid if UUIDs are sent in a
delete SyncInfo message (a consumer that needs to determine which CSNs
apply can only pick a single CSN for all of the deletes).

So this is a problem in MMR situations where the cookie carries semantic
information between MMR nodes.

An MMR member receiving such a message has to pick a CSN to apply here:
- either the cookie (if present at all), which leads to the problems
  described above
- or some other CSN, in which case the deletes could be lost or propagate
  to other masters as a fresh mod; either outcome smells of replication
  problems down the line
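
The dilemma can be illustrated with a simplified model. `SyncIdSet` is a
hypothetical stand-in for a delete SyncInfo message (a set of UUIDs with at most
one cookie), not the real protocol structure, and the CSN strings are
placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SyncIdSet:
    """Simplified delete SyncInfo message: UUIDs plus at most one cookie CSN."""
    uuids: List[str]
    cookie_csn: Optional[str] = None


def csns_for_deletes(msg: SyncIdSet, fallback_csn: str) -> Dict[str, str]:
    """Assign a CSN to every delete; the message offers one candidate at most."""
    csn = msg.cookie_csn if msg.cookie_csn is not None else fallback_csn
    return {uuid: csn for uuid in msg.uuids}


def all_unique(assigned: Dict[str, str]) -> bool:
    """True if no two deletes ended up sharing a CSN."""
    return len(set(assigned.values())) == len(assigned)


batch = SyncIdSet(uuids=["uuid-1", "uuid-2", "uuid-3"], cookie_csn="csn-from-cookie")
assigned = csns_for_deletes(batch, fallback_csn="locally-generated-csn")
print(all_unique(assigned))  # the three deletes share a single CSN
```

Whichever branch is taken, every delete in the batch receives the same CSN, so
the one-CSN-per-write assumption no longer holds.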

This shouldn't affect deltaMMR environments, though. AFAIK they never use
sessionlog in any way, so batched deletes don't get sent over the wire
at all.

-- 
Ondřej Kuzník
Senior Software Engineer
Symas Corporation                       http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP

Comment 2 Ondřej Kuzník 2017-11-02 17:10:06 UTC
On Thu, Nov 02, 2017 at 03:55:02PM +0000, ondra@mistotebe.net wrote:
> On Tue, Oct 31, 2017 at 05:34:05PM +0000, ondra@openldap.org wrote:
> > If sessionlog data is available and found useful, syncprov will send a cookie at
> > the end of delete phase, itself followed by the entries modified since time
> > recorded in the client's original cookie.
> > 
> > Some of those entries might have been last modified before the new cookie's
> > recorded time and if the connection is severed before this is communicated, they
> > would not be re-sent under the new cookie.
> 
> There are other problems with this. I have always assumed that CSN of
> each write is globally unique in a well-configured system and that this
> is preserved across replication, since MMR needs that to function
> properly. This assumption is clearly invalid if UUIDs are sent in a
> delete SyncInfo message (consumer that needs to determine CSNs that
> apply can only pick a single CSN for all of the deletes).
> 
> So this is a problem in MMR situations where the cookie carries semantic
> information between MMR nodes.
> 
> An MMR member receiving such a message has to pick a CSN to apply here:
> - either the cookie (if present at all) - leads to problems described
>   above
> - or some other CSN - the deletes could be lost or propagate to other
>   masters as a fresh mod, either smells of replication problems down the
>   line

Even assuming we never send a batch delete, sessionlog is a problem in
the MMR case:
- to end up in the sessionlog, we need a CSN for the delete to be
  transmitted
- if we send all deletes first, then modified entries, we can't use the
  cookie to send the information that's needed to create a sessionlog
  entry
- we can't reasonably send all entries in CSN order with the deletes
  interspersed at the relevant place in the stream, that would require
  holding onto the (UUID, CSN) list for a very long time, not to mention
  that we'd need the backend to guarantee search entry ordering in the
  first place
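
To make the last point concrete, here is a minimal sketch of what interleaving
would demand. The tuples and short CSN strings are hypothetical; the point is
that the merge only works if the complete (CSN, UUID) delete list is held in
memory and the entry stream already arrives sorted by CSN, which is exactly the
guarantee the backend does not give.

```python
import heapq


def interleave(deletes, entries):
    """Merge two CSN-sorted streams of (csn, kind, ident) into one CSN-ordered stream.

    heapq.merge assumes both inputs are already sorted by the key; it does
    not sort them itself.
    """
    return list(heapq.merge(deletes, entries, key=lambda item: item[0]))


deletes = [("002", "delete", "uuid-1"), ("005", "delete", "uuid-2")]
entries = [("001", "entry", "cn=a"), ("003", "entry", "cn=b")]

for csn, kind, ident in interleave(deletes, entries):
    print(csn, kind, ident)
```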

Maybe if we track more information in the cookie, MMR nodes might have what
they need to populate sessionlog and standards-compliant syncrepl clients
would keep working as well?

Maybe storing the progress of the delete phase (optional) alongside general
replication progress (the CSN, as usual) in the cookie might do it. The
question is whether that is enough, and whether it can be done without
introducing new problems or making the code even more complex and harder
to maintain.
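
A minimal sketch of that idea follows. The field name `delcsn` is borrowed from
the commit titles later in this issue; the exact cookie layout, the helper names
and the placeholder CSN values are assumptions made for illustration and are not
taken from the committed code.

```python
def parse_cookie(cookie):
    """Split a comma-separated key=value cookie string into a dict."""
    fields = {}
    for part in cookie.split(","):
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


def format_cookie(rid, csn, delcsn=None):
    """Build a cookie; delcsn is optional so older-style cookies stay valid."""
    parts = [f"rid={rid}", f"csn={csn}"]
    if delcsn is not None:
        parts.append(f"delcsn={delcsn}")
    return ",".join(parts)


def after_delete_phase(cookie, last_delete_csn):
    """Record delete-phase progress without advancing the main CSN."""
    fields = parse_cookie(cookie)
    return format_cookie(fields["rid"], fields["csn"], last_delete_csn)


client_cookie = "rid=001,csn=CSN-A"  # placeholder values
print(after_delete_phase(client_cookie, "CSN-B"))
```

A client that is cut off after the delete phase would then resume with its
original `csn`, so entries modified after that point are still sent.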

That, even if workable, wouldn't make it into 2.4, so an upgrade would
have to be a lock-step affair for at least the masters/nodes with
syncprov running.

> This shouldn't affect deltaMMR environments, though, AFAIK they never use
> sessionlog in any way, so batched deletes don't get sent over the wire
> at all.

Quanah mentions that deltaMMR can hit this issue if we have to fall back
to plain syncrepl in the case of a conflict (and we don't want a full
present phase to happen at that point).


Comment 3 OpenLDAP project 2019-04-17 23:28:24 UTC
See also https://github.com/mistotebe/openldap/commits/ITS8486-use-accesslog
Comment 4 Quanah Gibson-Mount 2019-04-17 23:28:24 UTC
changed notes
Comment 6 Quanah Gibson-Mount 2020-06-23 17:02:04 UTC
Commits:
  • d1e874c6 by Ondřej Kuzník at 2020-06-23T16:06:09+00:00
    ITS#8768 Introduce delcsn into our syncrepl cookies
  • 182ec30a by Ondřej Kuzník at 2020-06-23T16:06:09+00:00
    ITS#8768 Accept delcsn from the server
  • e24a6bf5 by Ondřej Kuzník at 2020-06-23T16:06:09+00:00
    ITS#8768 Do not update main CSN during delete phase
Comment 7 Ondřej Kuzník 2020-09-09 10:25:52 UTC
There is another element to this issue: the consumer that is receiving these deletes might also be a provider with active persist sessions. It needs to be able to pass the right information onwards so that its own consumers do not risk diverging either. I had better add some kind of regression test to make sure this is handled, although it might already be fine.