
Re: (ITS#8125) MMR throws away valid changes causing drift



On Mon, Oct 15, 2018 at 02:54:56PM +0000, hyc@symas.com wrote:
> Ondřej Kuzník wrote:
>> On Mon, Oct 15, 2018 at 01:53:30PM +0000, hyc@symas.com wrote:
>>> ondra@mistotebe.net wrote:
>>>>    Also, whenever we fall back from deltasync into plain syncrepl,
>>>>    we should make sure that the accesslog entries we generate from
>>>>    this are never used for further replication which might be
>>>>    thought to be a separate issue.
>>>
>>> That should already be the case, since none of these ops will have a
>>> valid CSN.
>>
>> I faintly remember Quanah seeing these accesslog entries used by
>> consumers at some point, but I might be mistaken.
>>
>> The more general point is making sure that any potential syncrepl consumer
>> does not even try to use the accesslog entries we added before these - the
>> refresh has created a strange gap in the middle (or worse, duplicated
>> ops if a contextCSN element jumped backwards). But if we enforced that,
>> the question is how to get modifications originating from this replica
>> replicated elsewhere - unless we decide they can't be salvaged?
> 
> We could set the replica to reject user mods while in refresh phase.
> Not sure how friendly that is, whether apps would be smart enough to
> retry somewhere else.

The concern here is about changes that happened before we found out we
can't replicate from another server. It is likely that some of these
changes are the reason we couldn't reconcile with our provider, and they
would cause the same failure elsewhere if we decided to push them.

>> And should the contextCSN reset terminate not just all inbound syncrepl
>> sessions, but the outbound ones as well?
> 
> Need to be careful about race conditions here, or you could end up
> with all nodes just terminating each other and everything halting.

Yes, that would actually happen... The existing state seems quite
destructive though: if you have that same situation now (two masters in
the present phase from each other at the same time), you lose data.

The question is what the priority is here. Currently it seems we want
replication to continue at the expense of losing modifications on
conflict. We might at least log that this happened and allow someone to
revert the decision later.
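
To make the "at least log it" idea concrete, here is a rough sketch in
Python. It is a toy model, not slapd code: the Mod type, the
apply_or_record() helper and the "discarded" list are all made up for
illustration, and it assumes CSNs in the usual fixed-width form, which
order correctly as plain strings. A change that loses the CSN comparison
is logged and kept aside instead of silently vanishing.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("conflict-audit")  # hypothetical logger name


@dataclass(frozen=True)
class Mod:
    """Hypothetical stand-in for a replicated modification."""
    dn: str
    csn: str     # e.g. "20181015145456.000000Z#000000#001#000000"
    change: str  # opaque description of the modification


def apply_or_record(entry_csns: dict, mod: Mod, discarded: list) -> bool:
    """Apply mod if it is newer than what we hold for mod.dn.

    Otherwise log it and keep it in `discarded`, so that someone can
    review or revert the decision later. Returns True if applied.
    """
    current = entry_csns.get(mod.dn)
    # Assumption: fixed-width CSN strings compare correctly as text.
    if current is not None and mod.csn <= current:
        log.warning("discarding change on %s: csn %s is not newer than %s",
                    mod.dn, mod.csn, current)
        discarded.append(mod)
        return False
    entry_csns[mod.dn] = mod.csn
    return True
```

Whether the kept-aside changes could ever be replayed safely is exactly
the open question above; the sketch only makes sure they are visible.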

-- 
Ondřej Kuzník
Senior Software Engineer
Symas Corporation                       http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP