Re: (ITS#8125) MMR throws away valid changes causing drift



On Mon, Oct 15, 2018 at 01:53:30PM +0000, hyc@symas.com wrote:
> ondra@mistotebe.net wrote:
>> This is my understanding of the above discussion:
>> - deltasync consumer has just switched to full refresh (but is ahead
>>   from this provider in some ways)
>> - provider sends the present list
>> - consumer deletes extra entries, builds a new cookie
>> - problem is that the new cookie is built to reflect the union of both
>>   the local and received cookies even though we may have undone some of
>>   the changes which we then ignore
>> 
>> If that's accurate, there are some approaches that could fix it:
>> 
>> 1. Simple one is to remember the actual cookie we got from the server
>>    and refuse to delete entries with entryCSN ahead of the provided CSN
>>    set. Problem is that we get even further from being able to replicate
>>    from a generic RFC4533 provider.
>> 
>> 2. Instead, when present phase is initiated, we might terminate all
>>    other sessions, adopt the complete CSN set and restart them only once
>>    the new CSN set has been fully established.
> 
> (2) makes sense.
>> 
>>    Also, whenever we fall back from deltasync into plain syncrepl, we
>>    should make sure that the accesslog entries we generate from this are
>>    never used for further replication which might be thought to be a
>>    separate issue.
> 
> That should already be the case, since none of these ops will have a valid CSN.

I faintly remember Quanah seeing these accesslog entries used by
consumers at some point, but I might be mistaken.

The more general point is making sure that any potential syncrepl consumer
of this replica does not even try to use the accesslog entries we added
before these: the refresh has created a strange gap in the middle (or worse,
duplicated ops if a contextCSN element jumped backwards). But if we enforced
that, the question is how to get modifications originating from this replica
replicated elsewhere, unless we decide they can't be salvaged?

And should the contextCSN reset terminate not just all inbound syncrepl
sessions, but the outbound ones as well?
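
To make the cookie problem from the summary above concrete, here is a
rough sketch in Python (not slapd code). It assumes CSNs can be compared
as plain strings and that a cookie is just a map from SID to the newest
CSN seen for it; the helper names sid_of, union and ahead_of are made up
for illustration, as are the CSN values.

```python
# Illustrative only: CSNs are modelled as strings compared lexicographically
# and a cookie as a dict mapping SID -> newest CSN. None of these helpers
# exist in slapd; they are hypothetical names for this sketch.

def sid_of(csn):
    # Assumed CSN layout: timestamp#count#sid#mod
    return csn.split("#")[2]

def union(local, received):
    """Per-SID maximum of both CSN sets (the cookie described as the problem)."""
    merged = dict(local)
    for sid, csn in received.items():
        if sid not in merged or csn > merged[sid]:
            merged[sid] = csn
    return merged

def ahead_of(entry_csn, cookie):
    """True if the given cookie does not yet cover entry_csn."""
    covered = cookie.get(sid_of(entry_csn))
    return covered is None or entry_csn > covered

# Consumer is ahead of the provider for SID 002, behind for SID 001.
local = {"001": "20181015135000.000000Z#000000#001#000000",
         "002": "20181015135300.000000Z#000000#002#000000"}
received = {"001": "20181015135200.000000Z#000000#001#000000",
            "002": "20181015135100.000000Z#000000#002#000000"}

# An entry the consumer holds but the provider has not seen yet, so it is
# absent from the present list and would be deleted.
entry_csn = "20181015135300.000000Z#000000#002#000000"

merged = union(local, received)

# The merged cookie still claims SID 002 is covered up to this change ...
claims_covered = not ahead_of(entry_csn, merged)

# ... while the provider's own cookie shows it never had it. This is the
# check approach (1) would use to refuse the delete.
must_keep = ahead_of(entry_csn, received)
```

With these values both claims_covered and must_keep come out true: the
merged cookie advertises a change that the present phase would just have
removed. Adopting `received` as-is instead of `merged`, as in (2), avoids
making that claim.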

-- 
Ondřej Kuzník
Senior Software Engineer
Symas Corporation                       http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP