Full_Name: Quanah Gibson-Mount Version: 2.4.44 OS: Linux 3.13 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (75.111.52.177) I have a MMR setup, with two providers, zre-ldap003 and zre-ldap005. I have a single replica, zre-ldap002, which is configured to pull updates from both providers. However, it continually gets an error 53 from zre-ldap005 after writing an update to zre-ldap003. As far as I can tell, all contextCSN data is correct on all nodes. replica: zimbra@zre-ldap002:~$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s base -b "" contextCSN dn: contextCSN: 20160218220547.523250Z#000000#001#000000 contextCSN: 20160218220604.321502Z#000000#002#000000 Master SID1: [zimbra@zre-ldap005 ~]$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s base -b "" contextCSN dn: contextCSN: 20160218220547.523250Z#000000#001#000000 contextCSN: 20160218220604.321502Z#000000#002#000000 [zimbra@zre-ldap005 ~]$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s base -b "cn=accesslog" contextCSN dn: cn=accesslog contextCSN: 20160217223749.648918Z#000000#001#000000 contextCSN: 20160218220604.321502Z#000000#002#000000 Master SID2: [zimbra@zre-ldap003 ~]$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s base -b "" contextCSN dn: contextCSN: 20160218220547.523250Z#000000#001#000000 contextCSN: 20160218220604.321502Z#000000#002#000000 [zimbra@zre-ldap003 ~]$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s base -b "cn=accesslog" contextCSN dn: cn=accesslog contextCSN: 20160218220547.523250Z#000000#001#000000 contextCSN: 20160218220604.321502Z#000000#002#000000 %%0 Note all contextCSNs have the same value everywhere. However, when the replica attempts to replicate off of zre-ldap005, we get the following error: Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 fd=18 ACCEPT from IP=10.137.242.52:38979 (IP=10.137.242.55:389) Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=0 EXT oid=1.3.6.1.4.1.1466.20037 Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=0 STARTTLS Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=0 RESULT oid= err=0 etime=0.000071 text= Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 fd=18 TLS established tls_ssf=128 ssf=128 tls_proto=TLSv1.2 tls_cipher=SEED-SHA Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=1 BIND dn="uid=zmreplica,cn=admins,cn=zimbra" metho3D3D128 Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=1 BIND dn="uid=zmreplica,cn=admins,cn=zimbra" mech=SIMPLE ssf=0 Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=1 RESULT tag=97 err=0 etime=0.000154 text= Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=2 SRCH base="cn=accesslog" scope=2 deref=0 filter="(&(objectClass=auditWriteObject)(reqResult=0))" Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=2 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=2 SEARCH RESULT tag=101 err=53 etime=0.000177 nentries=0 text=consumer state is newer than provider! Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=3 UNBIND Feb 18 16:50:34 zre-ldap5 5 slapd[10555]: conn=1191 fd=18 closed This started happening after I made a modification on zre-ldap005 followed by a modification on zre-ldap003. In that case, the following was logged on zre-ldap002: Feb 18 16:06:04 zre-ldap002 slapd[11266]: do_syncrep2: rid=101 cookie=rid=101,sid=002,csn=20160218220604.321502Z#000000#002#000000 Feb 18 16:06:04 zre-ldap002 slapd[11266]: slap_queue_csn: queueing 0x3589880 20160218220604.321502Z#000000#002#000000 Feb 18 16:06:04 zre-ldap002 slapd[11266]: do_syncrep2: rid=100 cookie=r%3=100,sid=001,csn=20160218220604.321502Z#000000#002#000000 Feb 18 16:06:04 zre-ldap002 slapd[11266]: slap_graduate_commit_csn: removing 0x3589880 20160218220604.321502Z#000000#002#000000 Feb 18 16:06:04 zre-ldap002 slapd[11266]: syncrepl_message_to_op: rid=101 be_modify cn=admins,cn=zimbra (0) Feb 18 16:06:04 zre-ldap002 slapd[11266]: slap_queue_csn: queueing 0x3589740 20160218220604.321502Z#000000#002#000000 Feb 18 16:06:04 zre-ldap002 slapd[11266]: slap_graduate_commit_csn: removing 0x3589740 20160218220604.321502Z#000000#002#000000 Feb 18 16:06:04 zre-ldap002 slapd[11266]: do_syncrep2: rid=100 CSN pending, ignoring 20160218220604.321502Z#000000#002#000000 (reqStart=20160218220604.326527Z,cn=accesslog) So we see the write accepted from one master, and correctly rejected (not replayed) from the other. After this, there were no more modifications made to either master. However, I *did* make a modification to cn=config on the replica. After that was done, is when the error(53) started getting triggered: Feb 18 16:27:12 zre-ldap002 slapd[11266]: slap_queue_csn: queueing 0x3587a40 20160218222712.155976Z#000000#000#000000 Feb 18 16:27:12 zre-ldap002 slapd[11266]: slap_graduate_commit_csn: removing 0x3587a40 20160218222712.155976Z#000000#000#000000 Feb 18 16:27:12 zre-ldap002 slapd[11266]: do_syncrep2: rid=100 LDAP_RES_SEARCH_RESULT Feb 18 16:27:12 zre-ldap002 slapd[11266]: do_syncrep2: rid=100 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform Feb 18 16:27:12 zre-ldap002 slapd[11266]: do_syncrep2: rid=100 (53) Server is unwilling to perform Feb 18 16:27:12 zre-ldap002 slapd[11266]: do_syncrepl: rid=100 rc -2 retrying
--On Thursday, February 18, 2016 11:01 PM +0000 quanah@openldap.org wrote: After stopping all servers, and restarting the masters, I see the same error (53) being returned when zre-ldap003 tries to connect to zre-ldap005. So this is not specific to the replica or the change to cn=config I made on the replica. --Quanah -- Quanah Gibson-Mount Platform Architect Zimbra, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration A division of Synacor, Inc
--On Thursday, February 18, 2016 11:01 PM +0000 quanah@openldap.org wrote: > Master SID1: > [zimbra@zre-ldap005 ~]$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w > zimbra -s base -b "" contextCSN > dn: > contextCSN: 20160218220547.523250Z#000000#001#000000 > contextCSN: 20160218220604.321502Z#000000#002#000000 > > [zimbra@zre-ldap005 ~]$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w > zimbra -s base -b "cn=accesslog" contextCSN > dn: cn=accesslog > contextCSN: 20160217223749.648918Z#000000#001#000000 > contextCSN: 20160218220604.321502Z#000000#002#000000 Ooops, clearly a difference here between the main db and the accesslog DB. However, nothing was logged, error wise. Here is this modification: Feb 18 16:05:47 zre-ldap005 slapd[10555]: slap_queue_csn: queueing 0x3ce84c0 20160218220547.523250Z#000000#001#000000 Feb 18 16:05:47 zre-ldap005 slapd[10555]: slap_graduate_commit_csn: removing 0x3ce84c0 20160218220547.523250Z#000000#001#000000 Feb 18 16:05:47 zre-ldap005 slapd[10555]: syncprov_sendresp: cookie=rid=100,sid=001 Feb 18 16:05:47 zre-ldap005 slapd[10555]: syncprov_sendresp: to=002, cookie=rid=100,sid=001 On the other master, we have: Feb 18 16:05:47 zre-ldap003 slapd[6947]: do_syncrep2: rid=100 cookie=rid=100,sid=001 Feb 18 16:05:47 zre-ldap003 slapd[6947]: slap_queue_csn: queueing 0x36d58c0 20160218220547.523250Z#000000#001#000000 Feb 18 16:05:47 zre-ldap003 slapd[6947]: slap_queue_csn: queueing 0x36d5800 20160218220547.523250Z#000000#001#000000 Feb 18 16:05:47 zre-ldap003 slapd[6947]: syncprov_matchops: skipping original sid 001 Feb 18 16:05:47 zre-ldap003 slapd[6947]: slap_graduate_commit_csn: removing 0x36d5800 20160218220547.523250Z#000000#001#000000 Feb 18 16:05:47 zre-ldap003 slapd[6947]: slap_graduate_commit_csn: removing 0x36d58c0 20160218220547.523250Z#000000#001#000000 Feb 18 16:05:47 zre-ldap003 slapd[6947]: syncrepl_message_to_op: rid=100 be_modify cn=admins,cn=zimbra (0) Feb 18 16:05:47 zre-ldap003 slapd[6947]: syncprov_sendresp: cookie=rid=101,sid=002,csn=20160218220547.523250Z#000000#001#000000 In the accesslog, we have: dn: reqStart=20160218220547.523064Z,cn=accesslog objectClass: auditModify structuralObjectClass: auditModify reqStart: 20160218220547.523064Z reqEnd: 20160218220547.523558Z reqType: modify reqSession: 1027 reqAuthzID: cn=config reqDN: cn=admins,cn=zimbra reqResult: 0 reqMod: description:= admin accounts reqMod: entryCSN:= 20160218220547.523250Z#000000#001#000000 reqMod: modifiersName:= cn=config reqMod: modifyTimestamp:= 20160218220547Z reqEntryUUID: c3fbf53a-6a0c-1035-9563-ef445d2cc3b6 entryCSN: 20160218220547.523250Z#000000#001#000000 entryUUID: 822e1680-6ad7-1035-8bac-2751986a6fb8 creatorsName: cn=config createTimestamp: 20160218220547Z modifiersName: cn=config modifyTimestamp: 20160218220547Z --Quanah -- Quanah Gibson-Mount Platform Architect Zimbra, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration A division of Synacor, Inc
--On Friday, February 19, 2016 12:37 AM +0000 quanah@zimbra.com wrote: > --On Thursday, February 18, 2016 11:01 PM +0000 quanah@openldap.org wrote: So I can trivially reproduce this. I make a change on each master, and boom, they go out of sync and everything's broken. --Quanah -- Quanah Gibson-Mount Platform Architect Zimbra, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration A division of Synacor, Inc
moved from Incoming to Software Bugs
Fixed by some other fix since the original report
changed notes changed state Open to Closed
--On Friday, February 19, 2016 1:33 AM +0000 quanah@zimbra.com wrote: Played around with this again today, and the issue no longer occurs. It seems one of the many replication related fixes since I reported this ITS has solved the problem. Closing the ITS. --Quanah -- Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: <http://www.symas.com>