Full_Name: Ondrej Kuznik
Version: master/re24
OS:
URL:
Submission from: (NULL) (82.10.24.68)

If sessionlog data is available and found useful, syncprov will send a cookie at
the end of delete phase, itself followed by the entries modified since time
recorded in the client's original cookie.

Some of those entries might have been last modified before the new cookie's
recorded time and if the connection is severed before this is communicated, they
would not be re-sent under the new cookie.

To replicate this issue, in test043, configure sessionlog and after it has
finished, run ldapsearch with a cookie set to entryCSN from the mod that adds
dc=testdomain2,dc=example,dc=com and record the cookie returned at the end of
delete phase (it is the operation that deletes
cn=Rosco P. Coltrane,ou=Retired,ou=People,dc=example,dc=com).

Then try to run a syncrepl search with the recorded cookie instead. Taking only
the information from this search and delete phase from the previous search, the
client will not see modifications to these objects:
- cn=ITD Staff,ou=Groups,dc=example,dc=com
- cn=Gern Jensen,ou=Information Technology Division,ou=People,dc=example,dc=com
- ou=Retired,ou=People,dc=example,dc=com
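The failure mode described in this report can be modelled with a small sketch. It is not syncprov code: the `Change` type, the integer CSNs, the placeholder DNs, and the rule that the delete-phase cookie carries the CSN of the newest delete are all assumptions made for illustration, chosen to match the ordering described above (deletes, then a cookie, then modified entries).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    dn: str
    csn: int              # integer stand-in for an entryCSN; only ordering matters
    deleted: bool = False


def refresh_stream(changes, client_csn):
    """Messages for one refresh: deletes, then a cookie, then modified entries."""
    newer = [c for c in changes if c.csn > client_csn]
    deletes = [c for c in newer if c.deleted]
    for c in deletes:
        yield ("delete", c.dn)
    if deletes:
        # Assumption: the cookie carries the CSN of the newest delete.
        yield ("cookie", max(c.csn for c in deletes))
    for c in newer:
        if not c.deleted:
            yield ("entry", c.dn)


def consume(messages, cookie, cut_after=None):
    """Apply messages until the connection is cut; return (cookie, DNs seen)."""
    seen = set()
    for index, (kind, value) in enumerate(messages):
        if cut_after is not None and index >= cut_after:
            break
        if kind == "cookie":
            cookie = value
        else:
            seen.add(value)
    return cookie, seen


# Hypothetical data: two modifications that predate a delete.
CHANGES = [
    Change("cn=modified-a,dc=example,dc=com", csn=2),
    Change("cn=modified-b,dc=example,dc=com", csn=3),
    Change("cn=deleted,dc=example,dc=com", csn=4, deleted=True),
]


def missed_after_cut():
    # First session: the connection drops right after the delete and the cookie.
    cookie, seen = consume(refresh_stream(CHANGES, client_csn=1), cookie=1, cut_after=2)
    # Second session resumes from the cookie recorded at the end of the delete phase.
    cookie, seen_again = consume(refresh_stream(CHANGES, client_csn=cookie), cookie=cookie)
    modified = {c.dn for c in CHANGES if not c.deleted}
    return modified - (seen | seen_again)
```

In this model `missed_after_cut()` returns both modified DNs: their CSNs are older than the cookie the client resumed with, so the second session never re-sends them.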
On Tue, Oct 31, 2017 at 05:34:05PM +0000, ondra@openldap.org wrote:
> If sessionlog data is available and found useful, syncprov will send a cookie at
> the end of delete phase, itself followed by the entries modified since time
> recorded in the client's original cookie.
>
> Some of those entries might have been last modified before the new cookie's
> recorded time and if the connection is severed before this is communicated, they
> would not be re-sent under the new cookie.

There are other problems with this. I have always assumed that CSN of
each write is globally unique in a well-configured system and that this
is preserved across replication, since MMR needs that to function
properly. This assumption is clearly invalid if UUIDs are sent in a
delete SyncInfo message (consumer that needs to determine CSNs that
apply can only pick a single CSN for all of the deletes).

So this is a problem in MMR situations where the cookie carries semantic
information between MMR nodes.

An MMR member receiving such a message has to pick a CSN to apply here:
- either the cookie (if present at all) - leads to problems described
  above
- or some other CSN - the deletes could be lost or propagate to other
  masters as a fresh mod, either smells of replication problems down the
  line

This shouldn't affect deltaMMR environments, though, AFAIK they never use
sessionlog in any way, so batched deletes don't get sent over the wire
at all.

--
Ondřej Kuzník
Senior Software Engineer
Symas Corporation                       http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP
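To make the single-CSN point concrete, here is a rough sketch of what a consumer can recover from a batched delete. The `SyncIdSet` fields mirror the syncIdSet choice of the RFC 4533 Sync Info Message (optional cookie, refreshDeletes flag, set of UUIDs); the `SessionlogDelete` type, the helper functions, and all values are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SessionlogDelete:
    uuid: str
    csn: str              # on the provider, each delete had its own CSN


@dataclass
class SyncIdSet:
    # Shape follows the RFC 4533 syncIdSet; note there is no per-UUID CSN.
    cookie: Optional[str]
    refresh_deletes: bool
    sync_uuids: List[str]


def batch_deletes(deletes, cookie):
    """Provider side (hypothetical helper): collapse deletes into one message."""
    return SyncIdSet(cookie=cookie, refresh_deletes=True,
                     sync_uuids=[d.uuid for d in deletes])


def csns_seen_by_consumer(msg):
    """Consumer side: the only CSN on offer is whatever the cookie carries."""
    return {uuid: msg.cookie for uuid in msg.sync_uuids}


# Made-up values: three deletes with three distinct CSNs.
deletes = [
    SessionlogDelete("uuid-1", "csn-1"),
    SessionlogDelete("uuid-2", "csn-2"),
    SessionlogDelete("uuid-3", "csn-3"),
]
message = batch_deletes(deletes, cookie="csn-3")
recovered = csns_seen_by_consumer(message)
```

Three distinct CSNs go in and at most one comes out, so a consumer that is itself an MMR node has nothing better than the cookie (or a freshly generated CSN) to stamp on each delete.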
On Thu, Nov 02, 2017 at 03:55:02PM +0000, ondra@mistotebe.net wrote:
> On Tue, Oct 31, 2017 at 05:34:05PM +0000, ondra@openldap.org wrote:
> > If sessionlog data is available and found useful, syncprov will send a cookie at
> > the end of delete phase, itself followed by the entries modified since time
> > recorded in the client's original cookie.
> >
> > Some of those entries might have been last modified before the new cookie's
> > recorded time and if the connection is severed before this is communicated, they
> > would not be re-sent under the new cookie.
>
> There are other problems with this. I have always assumed that CSN of
> each write is globally unique in a well-configured system and that this
> is preserved across replication, since MMR needs that to function
> properly. This assumption is clearly invalid if UUIDs are sent in a
> delete SyncInfo message (consumer that needs to determine CSNs that
> apply can only pick a single CSN for all of the deletes).
>
> So this is a problem in MMR situations where the cookie carries semantic
> information between MMR nodes.
>
> An MMR member receiving such a message has to pick a CSN to apply here:
> - either the cookie (if present at all) - leads to problems described
>   above
> - or some other CSN - the deletes could be lost or propagate to other
>   masters as a fresh mod, either smells of replication problems down the
>   line

Even assuming we never send a batch delete, sessionlog is a problem in
the MMR case:
- to end up in the sessionlog, we need a CSN for the delete to be
  transmitted
- if we send all deletes first, then modified entries, we can't use the
  cookie to send the information that's needed to create a sessionlog
  entry
- we can't reasonably send all entries in CSN order with the deletes
  interspersed at the relevant place in the stream; that would require
  holding onto the (UUID, CSN) list for a very long time, not to mention
  that we'd need the backend to guarantee search entry ordering in the
  first place

Maybe if we track more information in the cookie, MMR nodes might have
what they need to populate sessionlog and standards-compliant syncrepl
clients would keep working as well? Storing progress of the delete phase
(optional) and general replication progress (CSN as usual) in the cookie
might do it. The question is whether that is enough, doesn't introduce
new problems, and doesn't make the code even more complex and harder to
maintain.

Even if workable, that wouldn't get into 2.4, so an upgrade would have
to be a lock-step affair for at least the masters/nodes with syncprov
running.

> This shouldn't affect deltaMMR environments, though, AFAIK they never use
> sessionlog in any way, so batched deletes don't get sent over the wire
> at all.

Quanah mentions that deltaMMR can hit this issue if we have to fall back
to plain syncrepl in the case of a conflict (and we don't want a full
present phase to happen at that point).

--
Ondřej Kuzník
Senior Software Engineer
Symas Corporation                       http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP
See also https://github.com/mistotebe/openldap/commits/ITS8486-use-accesslog
https://git.openldap.org/openldap/openldap/-/merge_requests/4
Commits:
• d1e874c6 by Ondřej Kuzník at 2020-06-23T16:06:09+00:00
  ITS#8768 Introduce delcsn into our syncrepl cookies
• 182ec30a by Ondřej Kuzník at 2020-06-23T16:06:09+00:00
  ITS#8768 Accept delcsn from the server
• e24a6bf5 by Ondřej Kuzník at 2020-06-23T16:06:09+00:00
  ITS#8768 Do not update main CSN during delete phase
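Going only by the commit titles, the idea appears to be that delete-phase progress travels in a separate cookie field so the consumer's main CSN is left alone until the rest of the refresh has been delivered. The sketch below illustrates that idea and is not the actual patch: the `rid=...,csn=...,delcsn=...` text layout, the `Cookie` and `ConsumerState` names, and the placeholder CSN values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cookie:
    csn: Optional[str] = None       # general replication progress
    delcsn: Optional[str] = None    # progress of the delete phase only


def parse_cookie(text):
    """Parse an assumed 'key=value,key=value' cookie, ignoring unknown keys."""
    fields = dict(part.split("=", 1) for part in text.split(",") if "=" in part)
    return Cookie(csn=fields.get("csn"), delcsn=fields.get("delcsn"))


class ConsumerState:
    """Hypothetical consumer bookkeeping for one syncrepl session."""

    def __init__(self, csn):
        self.csn = csn
        self.delcsn = None
        self.in_delete_phase = False

    def on_cookie(self, text):
        cookie = parse_cookie(text)
        if cookie.delcsn is not None:
            self.delcsn = cookie.delcsn
        # Leave the main CSN untouched while deletes are still arriving, so a
        # dropped connection resumes from the old CSN and entries are re-sent.
        if cookie.csn is not None and not self.in_delete_phase:
            self.csn = cookie.csn


state = ConsumerState(csn="csn-1")
state.in_delete_phase = True
state.on_cookie("rid=001,csn=csn-5,delcsn=csn-5")
csn_during_delete_phase = state.csn      # main CSN has not moved
state.in_delete_phase = False
state.on_cookie("rid=001,csn=csn-7")
```

A client that only understands `csn` would skip the extra field in this layout, which is the compatibility property the earlier discussion asks for; whether the real cookie format behaves that way would need checking against the merge request.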
There is another element to this issue: the consumer that is receiving these deletes might also be a provider with active persist sessions. It needs to be able to pass the right information onwards so that its own consumers do not risk diverging either. I had better add some kind of regression test to make sure this is handled; it might already be fine.