Re: (ITS#7274) delta-syncrepl MMR infinite loop



quanah@zimbra.com wrote:
> --On Wednesday, May 16, 2012 10:27 PM +0000 quanah@OpenLDAP.org wrote:
>
>> Full_Name: Quanah Gibson-Mount
>> Version: 2.4.31
>> OS: Linux 2.6
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (75.108.184.39)
>
> We can see that the script turning it into a master ran here:
>
> Thu May 17 16:05:46 2012 *** Running as zimbra user:
> /opt/zimbra/libexec/zmldapenable-mmr -s 2 -m
> ldap://zre-ldap002.eng.vmware.com:389/
>
> so 16:05:46
>
> In the accesslog, we see:
>
> dn: cn=accesslog
> objectClass: auditContainer
> cn: accesslog
> structuralObjectClass: auditContainer
> contextCSN: 20120517225152.913667Z#000000#000#000000
> contextCSN: 20120517230823.615364Z#000000#001#000000
> contextCSN: 20120517230546.409118Z#000000#002#000000
>
> dn: reqStart=20120517230546.000019Z,cn=accesslog
> objectClass: auditAdd
> structuralObjectClass: auditAdd
> reqStart: 20120517230546.000019Z
> reqEnd: 20120517230546.000020Z
> reqType: add
> reqSession: 100
> reqAuthzID: cn=config
> reqDN: cn=zimbra
> reqResult: 0
> reqMod: objectClass:+ organizationalRole
> reqMod: description:+ Zimbra Systems Application Data
> reqMod: cn:+ zimbra
> reqMod: structuralObjectClass:+ organizationalRole
> reqMod: entryUUID:+ 40f78bea-34be-1031-8a5d-e1466f667e19
> reqMod: creatorsName:+ cn=config
> reqMod: createTimestamp:+ 20120517224907Z
> reqMod: entryCSN:+ 20120517224907.221672Z#000000#000#000000
> reqMod: modifiersName:+ cn=config
> reqMod: modifyTimestamp:+ 20120517224907Z
> reqEntryUUID: 40f78bea-34be-1031-8a5d-e1466f667e19
> entryUUID: 948929e2-34c0-1031-9a14-c93bd10ff0f2
> creatorsName: cn=config
> createTimestamp: 20120517224907Z
> entryCSN: 20120517224907.221672Z#000000#000#000000
> modifiersName: cn=config
> modifyTimestamp: 20120517224907Z
>
> so it is tracking "000" as a third master?  This seems to be why the
> original server (which was 000 before being promoted to 001) replicates
> these entries back to itself.

The loop is caused by the patch to ITS#6872, which considers a consumer out of 
date whenever the number of CSNs in its sync request doesn't match the number 
known to the provider.
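
As a rough illustration of that count rule, here is a small Python sketch. It 
is not the syncprov C code; the function name is hypothetical, and the 
consumer's cookie values are assumed to be the SID=1 and SID=2 contextCSN 
values from the accesslog quoted above, since the report does not show the 
actual cookie.

```python
def consumer_looks_stale(consumer_csns, provider_csns):
    """Hypothetical helper: the count rule as described for the ITS#6872 patch.

    The consumer is treated as out of date whenever it presents a different
    number of CSNs than the provider knows about, regardless of their values.
    """
    return len(consumer_csns) != len(provider_csns)


# contextCSN values taken from the accesslog quoted above (SIDs 0, 1 and 2).
provider = [
    "20120517225152.913667Z#000000#000#000000",
    "20120517230823.615364Z#000000#001#000000",
    "20120517230546.409118Z#000000#002#000000",
]

# Assumption: the consumer only presents the SID=1 and SID=2 values.
consumer = provider[1:]

print(consumer_looks_stale(consumer, provider))
```

With two CSNs on one side and three on the other, the counts never converge, 
so the consumer is flagged as stale on every pass.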

The data here is basically invalid: server1 has entries generated using SID=0 
but it has no contextCSN value with SID=0. It only sent SID=1 and SID=2 in its 
sync request. Server2, which just updated from server1, has a contextCSN for 
SID=0 in addition to 1 and 2 (and that's all correct).
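
The inconsistency can be checked mechanically. The following Python sketch 
(helper names are hypothetical, and it assumes the usual four-field CSN layout 
of timestamp#count#SID#mod with a hexadecimal SID) reports any SID that 
appears in an entryCSN but has no matching contextCSN. The contextCSN list 
stands in for server1's state as described above, reusing the SID=1 and SID=2 
values from the quoted accesslog.

```python
def csn_sid(csn):
    """Return the server ID (third '#'-separated field) of a CSN as an int."""
    fields = csn.split("#")
    if len(fields) != 4:
        raise ValueError("unexpected CSN format: %r" % csn)
    return int(fields[2], 16)


def sids_without_context_csn(context_csns, entry_csns):
    """Hypothetical check: SIDs used by entries but absent from contextCSN."""
    known = {csn_sid(c) for c in context_csns}
    used = {csn_sid(c) for c in entry_csns}
    return used - known


# Assumed server1 state: contextCSN values for SID=1 and SID=2 only.
server1_context = [
    "20120517230823.615364Z#000000#001#000000",
    "20120517230546.409118Z#000000#002#000000",
]

# entryCSN of cn=zimbra from the accesslog record quoted above.
server1_entries = ["20120517224907.221672Z#000000#000#000000"]

print(sids_without_context_csn(server1_context, server1_entries))
```

A non-empty result means the database holds entries stamped with a SID that 
the server never advertises in its sync request.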

Server1 should always have had a contextCSN value for SID=0, but it doesn't. 
This problem would not occur if server1 had first been converted from 
standalone into a single master, i.e., load syncprov on it and let it scan the 
DB and generate the first SID=0 contextCSN before turning it into an MMR node.


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/