
Re: (ITS#8125) MMR throws away valid changes causing drift

On Mon, Oct 15, 2018 at 01:53:30PM +0000, hyc@symas.com wrote:
> ondra@mistotebe.net wrote:
>> This is my understanding of the above discussion:
>> - deltasync consumer has just switched to full refresh (but is ahead
>>   of this provider in some ways)
>> - provider sends the present list
>> - consumer deletes extra entries, builds a new cookie
>> - problem is that the new cookie is built to reflect the union of both
>>   the local and received cookies even though we may have undone some of
>>   the changes which we then ignore
>> If that's accurate, there are some approaches that could fix it:
>> 1. Simple one is to remember the actual cookie we got from the server
>>    and refuse to delete entries with entryCSN ahead of the provided CSN
>>    set. Problem is that we get even further from being able to replicate
>>    from a generic RFC4533 provider.
>> 2. Instead, when present phase is initiated, we might terminate all
>>    other sessions, adopt the complete CSN set and restart them only once
>>    the new CSN set has been fully established.
> (2) makes sense.
>>    Also, whenever we fall back from deltasync into plain syncrepl, we
>>    should make sure that the accesslog entries we generate from this are
>>    never used for further replication, although that might be considered
>>    a separate issue.
> That should already be the case, since none of these ops will have a valid CSN.
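
A minimal C sketch of the cookie handling discussed above, for illustration only. It assumes CSNs of the form `YYYYmmddHHMMSS.uuuuuuZ#cccccc#sid#mmmmmm` (serverID in hex), so that CSNs from the same serverID compare correctly with strcmp(). The types and the helpers csn_set_union() and may_delete() are hypothetical and are not slapd's actual code. csn_set_union() shows the per-serverID union that lets a consumer keep a CSN for a change it has just undone. may_delete() shows the check from approach 1: keep any entry whose entryCSN is ahead of the provider's cookie for that serverID.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SIDS 16
#define CSN_LEN  64

/* One element of a CSN set: the newest CSN known for a serverID. */
typedef struct {
    int  sid;
    char csn[CSN_LEN];
} csn_elem;

typedef struct {
    int      n;
    csn_elem e[MAX_SIDS];
} csn_set;

/* Assumed CSN layout: "YYYYmmddHHMMSS.uuuuuuZ#cccccc#sid#mmmmmm".
 * Returns the serverID (parsed as hex) or -1 if the CSN is malformed. */
static int csn_sid(const char *csn)
{
    const char *p = strchr(csn, '#');
    unsigned int sid;

    if (p == NULL || (p = strchr(p + 1, '#')) == NULL)
        return -1;
    if (sscanf(p + 1, "%x", &sid) != 1)
        return -1;
    return (int)sid;
}

/* Look up the CSN stored for a serverID, or NULL if the set has none. */
static const char *csn_set_get(const csn_set *s, int sid)
{
    for (int i = 0; i < s->n; i++)
        if (s->e[i].sid == sid)
            return s->e[i].csn;
    return NULL;
}

/* The problematic merge: keep the per-serverID maximum of the local and
 * received sets, even if the refresh just deleted the local change. */
static void csn_set_union(csn_set *local, const csn_set *recv)
{
    for (int i = 0; i < recv->n; i++) {
        const csn_elem *r = &recv->e[i];
        int found = 0;

        for (int j = 0; j < local->n; j++) {
            if (local->e[j].sid == r->sid) {
                found = 1;
                if (strcmp(r->csn, local->e[j].csn) > 0)
                    local->e[j] = *r;
                break;
            }
        }
        if (!found && local->n < MAX_SIDS)
            local->e[local->n++] = *r;
    }
}

/* Approach 1: during the present phase, only delete an entry if its
 * entryCSN is not ahead of the provider's cookie for the same serverID. */
static int may_delete(const char *entry_csn, const csn_set *provider)
{
    int sid = csn_sid(entry_csn);
    const char *p;

    if (sid < 0)
        return 0;
    p = csn_set_get(provider, sid);
    if (p == NULL)
        return 0;   /* provider has seen nothing from this serverID: keep */
    return strcmp(entry_csn, p) <= 0;
}
```

As noted in the thread, this kind of check relies on knowing the provider's exact cookie, which moves further away from what a generic RFC 4533 provider offers.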

I vaguely remember Quanah seeing these accesslog entries used by
consumers at some point, but I might be mistaken.

The more general point is making sure that this replica's potential
syncrepl consumers do not even try to use the accesslog entries we added
before these: the refresh has created a strange gap in the middle (or
worse, duplicated ops if a contextCSN element jumped backwards). But if
we enforced that, how would modifications originating from this replica
get replicated elsewhere, unless we decide they can't be salvaged?
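
A hypothetical C sketch of how a replica might detect the "jumped backwards" case: compare the contextCSN set from before the refresh with the one after it, and list the serverIDs whose element regressed. It assumes CSNs of the form `YYYYmmddHHMMSS.uuuuuuZ#cccccc#sid#mmmmmm`, so that strcmp() orders CSNs from the same serverID. Treating a serverID that vanished from the set as regressed is also an assumption. The names regressed_sids(), csn_set and csn_elem are illustrative, not slapd's.

```c
#include <string.h>

#define MAX_SIDS 16
#define CSN_LEN  64

typedef struct { int sid; char csn[CSN_LEN]; } csn_elem;
typedef struct { int n; csn_elem e[MAX_SIDS]; } csn_set;

/* Write into out[] every serverID whose CSN is older in `after` than in
 * `before`, or missing from `after` altogether (assumption: a vanished
 * element counts as a regression). Returns how many were found. */
static int regressed_sids(const csn_set *before, const csn_set *after, int *out)
{
    int n = 0;

    for (int i = 0; i < before->n; i++) {
        const char *now = NULL;

        for (int j = 0; j < after->n; j++) {
            if (after->e[j].sid == before->e[i].sid) {
                now = after->e[j].csn;
                break;
            }
        }
        if (now == NULL || strcmp(now, before->e[i].csn) < 0)
            out[n++] = before->e[i].sid;
    }
    return n;
}
```

Accesslog entries for the serverIDs reported here are the ones that might be replayed twice by a downstream consumer, so they are the candidates for being hidden from (or refused to) outbound sessions.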

And should the contextCSN reset terminate not just all inbound syncrepl
sessions, but the outbound ones as well?

Ondřej Kuzník
Senior Software Engineer
Symas Corporation                       http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP