[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#8490) changes not written to accesslog, causing replicas to loop syncing



--On Thursday, September 01, 2016 9:50 AM +0000 quanah@zimbra.com wrote:

I finally was able to reproduce this in a new lab of servers I set up.  The 
easiest way to reproduce this problem is to have 2 masters, and 6 replicas. 
3 replicas pointing to each master.

In my case, I have ldap07 through ldap014. 07/08 are the masters.  Odd 
number replicas (9,11,13) point to 07, even number replicas point to 08 
(10, 12, 14).

What we find based on the logs is two MODRDNs coming in at approximately 
the same time:

Sep  9 15:29:40 zre-ldap007 slapd[13794]: conn=1389 op=3179 MODRDN 
dn="uid=user.345,ou=people,dc=zre-ldap007,dc=eng,dc=zimbra,dc=com"
Sep  9 15:29:40 zre-ldap007 slapd[13794]: conn=1120 op=2919 MODRDN 
dn="uid=user.250404,ou=people,dc=zre-ldap007,dc=eng,dc=zimbra,dc=com"
Sep  9 15:29:40 zre-ldap007 slapd[13794]: slap_queue_csn: queueing 
0xa046e80 20160909192940.809794Z#000000#001#000000
Sep  9 15:29:40 zre-ldap007 slapd[13794]: slap_queue_csn: queueing 
0xa046a40 20160909192940.809794Z#000000#001#000000
Sep  9 15:29:40 zre-ldap007 slapd[13794]: slap_queue_csn: queueing 
0xa07b740 20160909192940.810205Z#000000#001#000000
Sep  9 15:29:40 zre-ldap007 slapd[13794]: slap_graduate_commit_csn: 
removing 0xa046a40 20160909192940.809794Z#000000#001#000000
Sep  9 15:29:40 zre-ldap007 slapd[13794]: slap_queue_csn: queueing 
0xa07bdc0 20160909192940.810205Z#000000#001#000000
Sep  9 15:29:40 zre-ldap007 slapd[13794]: conn=1389 op=3179 RESULT tag=109 
err=0 etime=0.001278 text=
Sep  9 15:29:40 zre-ldap007 slapd[13794]: slap_graduate_commit_csn: 
removing 0xa046e80 20160909192940.809794Z#000000#001#000000
Sep  9 15:29:40 zre-ldap007 slapd[13794]: slap_graduate_commit_csn: 
removing 0xa07bdc0 20160909192940.810205Z#000000#001#000000
Sep  9 15:29:40 zre-ldap007 slapd[13794]: conn=1120 op=2919 RESULT tag=109 
err=0 etime=0.001597 text=
Sep  9 15:29:40 zre-ldap007 slapd[13794]: slap_graduate_commit_csn: 
removing 0xa07b740 20160909192940.810205Z#000000#001#000000


But in the accesslog, we only find the one MODRDN logged:

dn: reqStart=20160909192940.809465Z,cn=accesslog
objectClass: auditModRDN
structuralObjectClass: auditModRDN
reqStart: 20160909192940.809465Z
reqEnd: 20160909192940.810148Z
reqType: modrdn
reqSession: 1389
reqAuthzID: uid=zimbra,cn=admins,cn=zimbra
reqDN: uid=user.345,ou=people,dc=zre-ldap007,dc=eng,dc=zimbra,dc=com
reqResult: 0
reqMod: entryCSN:= 20160909192940.809794Z#000000#001#000000
reqMod: modifiersName:= uid=zimbra,cn=admins,cn=zimbra
reqMod: modifyTimestamp:= 20160909192940Z
reqNewRDN: uid=renamed-user.345
reqDeleteOldRDN: TRUE
reqEntryUUID: c63b246a-0a20-1036-82cb-a772509d7f53
entryUUID: 7f740e6c-0b0f-1036-8c00-b1b4f08a59fc
creatorsName: cn=config
createTimestamp: 20160909192940Z
entryCSN: 20160909192940.809794Z#000000#001#000000
modifiersName: cn=config
modifyTimestamp: 20160909192940Z


Possibly a DN collision in writing the entry to the accesslog DB that 
causes the second OP to get lost.  Likely the same root issue as ITS#8493.

--Quanah

--

Quanah Gibson-Mount