
Re: (ITS#8432) Never ending operation modifications with 3+ MMR nodes



quanah@zimbra.com wrote:
> --On Thursday, June 09, 2016 1:19 AM +0100 Howard Chu <hyc@symas.com> wrote:
>
>> quanah@openldap.org wrote:
>>> Full_Name: Quanah Gibson-Mount
>>> Version: 2.4.44
>>> OS: Linux 2.6
>>> URL: ftp://ftp.openldap.org/incoming/
>>> Submission from: (NULL) (75.111.52.177)
>>>
>>>
>>> In MMR node, when there is > 2 nodes, operations can get sent out
>>> endlessly.
>>>
>>> For example, we see this modification occur at 20160603194926.427963Z
>>
>> You seem to have a large clock sync problem.

Summary:

After fixing the clock skew, the problem was still present. Debug logs taken 
with sync+stats+packets show that the offending mods were propagated by 
syncprov without a CSN in the sync cookie. Because the cookie contained no 
CSN, the existing "CSN too old, ignoring" check never ran, so the mods were 
not filtered out as they should have been.
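
To illustrate why a missing cookie CSN defeats the filter, here is a rough 
Python sketch of the consumer-side check. It is not the slapd code: the names 
should_apply and known_csns are made up for this example, and it assumes CSNs 
use the fixed-width timestamp#count#sid#mod layout so that plain string 
comparison orders them.

```python
from typing import Dict, Optional


def csn_sid(csn: str) -> str:
    """Return the server ID field of a CSN (timestamp#count#sid#mod)."""
    return csn.split("#")[2]


def should_apply(cookie_csn: Optional[str], known_csns: Dict[str, str]) -> bool:
    """Hypothetical consumer check: decide whether to apply an incoming mod.

    known_csns maps a server ID to the newest CSN already seen from it.
    """
    if cookie_csn is None:
        # No CSN in the cookie: there is nothing to compare, so the
        # "too old" check is skipped and the mod is applied.
        return True
    newest = known_csns.get(csn_sid(cookie_csn))
    if newest is not None and cookie_csn <= newest:
        # Equivalent of "CSN too old, ignoring".
        return False
    return True
```

With a CSN in the cookie, a mod that was already seen is dropped; the same 
mod arriving with no CSN is applied again, which is how it can keep 
circulating among three or more nodes.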

syncprov sends mods out without a CSN in the cookie when the mod's CSN is 
older than the newest contextCSN. In this particular case, between the time 
that the provider processed the original mod, and the time it was queued up to 
be sent to the relevant consumers, this server's own consumers had received 
newer updates from other providers. So, the mod was older than the current 
contextCSN and was sent without a cookie CSN.

(syncprov usually does this when queued mods get sent out of order; since 
transmission order is not guaranteed to be the same as write/commit order, 
this is a normal occurrence.)
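
The provider-side decision described above can be sketched in the same 
spirit. Again this is only an illustration under assumptions, not the 
syncprov source: cookie_csn_for is a hypothetical helper, and comparing CSNs 
from different server IDs as plain strings is a simplification that works 
only because the timestamp is the leading field.

```python
from typing import Iterable, Optional


def cookie_csn_for(mod_csn: str, context_csns: Iterable[str]) -> Optional[str]:
    """Hypothetical provider rule: pick the CSN to put in the sync cookie.

    Returns None when the mod is older than the newest contextCSN,
    meaning the mod goes out with no CSN in its cookie.
    """
    newest = max(context_csns, default="")
    if mod_csn < newest:
        # The mod was overtaken by newer updates before it was sent.
        return None
    return mod_csn
```

In the case reported here, the server's own consumers advanced the 
contextCSN between commit and transmission, so the first branch was taken.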

It's possible that regular syncrepl+mmr needs a corresponding fix. I haven't 
looked at that yet.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/