Full_Name: Quanah Gibson-Mount Version: 2.4.31 OS: Linux 2.6 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (75.108.184.39) The following problem did not occur for me in previous OpenLDAP releases. However, I can reproduce it 100% of the time with OpenLDAP 2.4.31. Basic configuration scenario is as follows: Set up a stand alone OpenLDAP server -- Just cn=config and primary DB. Basic database for the server is created (note all entries now have SID value 0) Update the server to be delta-sync MMR -- Set sid=1, Mirror Mode, and add accesslog DB. Add syncrepl stanza for a second server Install second server as a delta-sycnrepl MMR pair -- cn=config, accesslog DB, primary DB, SID=2 Servers now go into infinite loops due to the SID=0 entries. Server SID=1 logs: May 16 14:33:24 zre-ldap002 slapd[17422]: do_syncrep2: rid=100 cookie=rid=100,sid=002,csn=20120516212017.543213Z#000000#000#000000 May 16 14:33:24 zre-ldap002 slapd[17422]: slap_queue_csn: queing 0x58d2ced 20120516212017.543213Z#000000#000#000000 May 16 14:33:24 zre-ldap002 slapd[17422]: slap_graduate_commit_csn: removing 0x4699f50 20120516212017.543213Z#000000#000#000000 May 16 14:33:24 zre-ldap002 slapd[17422]: syncrepl_message_to_op: rid=100 be_add cn=zimbra (68) May 16 14:33:24 zre-ldap002 slapd[17422]: do_syncrep2: rid=100 delta-sync lost sync on (reqStart=20120516213237.000019Z,cn=accesslog), switching to REFRESH May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_message_to_entry: rid=100 DN: cn=zimbra, UUID: add167b2-33e8-1031-8f62-f1c6fd57f8b3 May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD) May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 inserted UUID add167b2-33e8-1031-8f62-f1c6fd57f8b3 May 16 14:33:25 zre-ldap002 slapd[17422]: dn_callback : entries have identical CSN cn=zimbra 20120516212017.543213Z#000000#000#000000 May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 be_search (0) May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 cn=zimbra May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 entry unchanged, ignored (cn=zimbra) May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_message_to_entry: rid=100 DN: cn=admins,cn=zimbra, UUID: add18ab2-33e8-1031-8f63-f1c6fd57f8b3 May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD) May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 inserted UUID add18ab2-33e8-1031-8f63-f1c6fd57f8b3 May 16 14:33:25 zre-ldap002 slapd[17422]: dn_callback : entries have identical CSN cn=admins,cn=zimbra 20120516212017.544170Z#000000#000#000000 May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 be_search (0) May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 cn=admins,cn=zimbra May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 entry unchanged, ignored (cn=admins,cn=zimbra) (etc, going through all 000 SID entries). Once it finishes, it starts over: May 16 14:33:25 zre-ldap002 slapd[17422]: dn_callback : entries have identical CSN cn=com_zimbra_click2call_mitel,cn=zimlets,cn=zimbra 20120516212615.543144Z#000000#000#000000 May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 be_search (0) May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 cn=com_zimbra_click2call_mitel,cn=zimlets,cn=zimbra May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 entry unchanged, ignored (cn=com_zimbra_click2call_mitel,cn=zimlets,cn=zimbra) May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_message_to_entry: rid=100 DN: cn=zre-ldap003.eng.vmware.com,cn=servers,cn=zimbra, UUID: 5a64830a-33ea-1031-8fbb-f1c6fd57f8b3 May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD) May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 inserted UUID 5a64830a-33ea-1031-8fbb-f1c6fd57f8b3 May 16 14:33:25 zre-ldap002 slapd[17422]: dn_callback : entries have identical CSN cn=zre-ldap003.eng.vmware.com,cn=servers,cn=zimbra 20120516213323.905859Z#000000#001#000000 May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 be_search (0) May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 cn=zre-ldap003.eng.vmware.com,cn=servers,cn=zimbra May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 entry unchanged, ignored (cn=zre-ldap003.eng.vmware.com,cn=servers,cn=zimbra) May 16 14:33:25 zre-ldap002 slapd[17422]: do_syncrep2: rid=100 LDAP_RES_SEARCH_RESULT May 16 14:33:25 zre-ldap002 slapd[17422]: do_syncrep2: rid=100 cookie=rid=100,sid=002,csn=20120516213323.905859Z#000000#001#000000;20120516213237.680889Z#000000#002#000000 May 16 14:33:25 zre-ldap002 slapd[17422]: slap_queue_csn: queing 0x5805da0 20120516213237.680889Z#000000#002#000000 May 16 14:33:25 zre-ldap002 slapd[17422]: slap_graduate_commit_csn: removing 0x5805620 20120516213237.680889Z#000000#002#000000 May 16 14:33:25 zre-ldap002 slapd[17422]: do_syncrep2: rid=100 cookie=rid=000,sid=002,csn=20120516212017.543213Z#000000#000#000000 May 16 14:33:25 zre-ldap002 slapd[17422]: slap_queue_csn: queing 0x58d2ced 20120516212017.543213Z#000000#000#000000 May 16 14:33:25 zre-ldap002 slapd[17422]: slap_graduate_commit_csn: removing 0x5805bc0 20120516212017.543213Z#000000#000#000000 May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_message_to_op: rid=100 be_add cn=zimbra (68) May 16 14:33:25 zre-ldap002 slapd[17422]: do_syncrep2: rid=100 delta-sync lost sync on (reqStart=20120516213237.000019Z,cn=accesslog), switching to REFRESH May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_message_to_entry: rid=100 DN: cn=zimbra, UUID: add167b2-33e8-1031-8f62-f1c6fd57f8b3 May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD) May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 inserted UUID add167b2-33e8-1031-8f62-f1c6fd57f8b3 May 16 14:33:25 zre-ldap002 slapd[17422]: dn_callback : entries have identical CSN cn=zimbra 20120516212017.543213Z#000000#000#000000 May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 be_search (0) May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 cn=zimbra May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 entry unchanged, ignored (cn=zimbra) On the second MMR master server, it logs: May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138856 SRCH attr=* + May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138856 SEARCH RESULT tag=101 err=0 nentries=0 text= May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138857 SRCH base="cn=accesslog" scope=2 deref=0 filter="(&(objectClass=auditWriteObject)(reqResult=0))" May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138857 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN May 16 14:36:15 zre-ldap003 slapd[26137]: syncprov_search_response: cookie=rid=100,sid=002,csn=20120516212644.426510Z#000000#000#000000;20120516213448.200519Z#000000#001#000000;20120516213237.680085Z#000000#002#000000 May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138858 ABANDON msg=138858 May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138859 SRCH base="" scope=2 deref=0 filter="(objectClass=*)" May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138859 SRCH attr=* + May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138859 SEARCH RESULT tag=101 err=0 nentries=0 text= May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138860 SRCH base="cn=accesslog" scope=2 deref=0 filter="(&(objectClass=auditWriteObject)(reqResult=0))" May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138860 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138861 ABANDON msg=138861 May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138862 SRCH base="" scope=2 deref=0 filter="(objectClass=*)" May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138862 SRCH attr=* + May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138862 SEARCH RESULT tag=101 err=0 nentries=0 text= May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138863 SRCH base="cn=accesslog" scope=2 deref=0 filter="(&(objectClass=auditWriteObject)(reqResult=0))" May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138863 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138864 ABANDON msg=138864 May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138865 SRCH base="" scope=2 deref=0 filter="(objectClass=*)" May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138865 SRCH attr=* + May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138865 SEARCH RESULT tag=101 err=0 nentries=0 text= May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138866 SRCH base="cn=accesslog" scope=2 deref=0 filter="(&(objectClass=auditWriteObject)(reqResult=0))" May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138866 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN May 16 14:36:15 zre-ldap003 slapd[26137]: syncprov_search_response: cookie=rid=100,sid=002,csn=20120516212644.426510Z#000000#000#000000;20120516213448.200519Z#000000#001#000000;20120516213237.680085Z#000000#002#000000 May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138867 ABANDON msg=138867 However, I can get a working delta-sync MMR pair if I do the following: Set up a stand alone OpenLDAP server -- Just cn=config and primary DB. Basic database for the server is created (note all entries now have SID value 0) Update the server to be delta-sync master: add accesslog DB. Install second server as a replica. Update first server to be delta-sync MMR: Set sid=1, mirrormode=true on primary DB, add syncrepl stanza for second server Update second server to be delta-sync MMR: Set sid=2, mirrormode=true on primary DB, add accesslog DB, add syncprel stanza for first server In this scenario everything works as expected. The issue seems to be caused in the first scenario by server 2 replicating the DB with SID=0 bits from the primary server, and that looping back to the first server endlessly, instead of being permanently ignored. --Quanah
--On Wednesday, May 16, 2012 10:27 PM +0000 quanah@OpenLDAP.org wrote: > Full_Name: Quanah Gibson-Mount > Version: 2.4.31 > OS: Linux 2.6 > URL: ftp://ftp.openldap.org/incoming/ > Submission from: (NULL) (75.108.184.39) We can see that the script turning it into a master ran here: Thu May 17 16:05:46 2012 *** Running as zimbra user: /opt/zimbra/libexec/zmldapenable-mmr -s 2 -m ldap://zre-ldap002.eng.vmware.com:389/ so 16:05:46 In the accesslog, we see: dn: cn=accesslog objectClass: auditContainer cn: accesslog structuralObjectClass: auditContainer contextCSN: 20120517225152.913667Z#000000#000#000000 contextCSN: 20120517230823.615364Z#000000#001#000000 contextCSN: 20120517230546.409118Z#000000#002#000000 dn: reqStart=20120517230546.000019Z,cn=accesslog objectClass: auditAdd structuralObjectClass: auditAdd reqStart: 20120517230546.000019Z reqEnd: 20120517230546.000020Z reqType: add reqSession: 100 reqAuthzID: cn=config reqDN: cn=zimbra reqResult: 0 reqMod: objectClass:+ organizationalRole reqMod: description:+ Zimbra Systems Application Data reqMod: cn:+ zimbra reqMod: structuralObjectClass:+ organizationalRole reqMod: entryUUID:+ 40f78bea-34be-1031-8a5d-e1466f667e19 reqMod: creatorsName:+ cn=config reqMod: createTimestamp:+ 20120517224907Z reqMod: entryCSN:+ 20120517224907.221672Z#000000#000#000000 reqMod: modifiersName:+ cn=config reqMod: modifyTimestamp:+ 20120517224907Z reqEntryUUID: 40f78bea-34be-1031-8a5d-e1466f667e19 entryUUID: 948929e2-34c0-1031-9a14-c93bd10ff0f2 creatorsName: cn=config createTimestamp: 20120517224907Z entryCSN: 20120517224907.221672Z#000000#000#000000 modifiersName: cn=config modifyTimestamp: 20120517224907Z so it is tracking "000" as a third master? This seems to be why the original server (which was 000 before being promoted to 001) replicates these entries back to itself. --Quanah -- Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration
quanah@zimbra.com wrote: > --On Wednesday, May 16, 2012 10:27 PM +0000 quanah@OpenLDAP.org wrote: > >> Full_Name: Quanah Gibson-Mount >> Version: 2.4.31 >> OS: Linux 2.6 >> URL: ftp://ftp.openldap.org/incoming/ >> Submission from: (NULL) (75.108.184.39) > > We can see that the script turning it into a master ran here: > > Thu May 17 16:05:46 2012 *** Running as zimbra user: > /opt/zimbra/libexec/zmldapenable-mmr -s 2 -m > ldap://zre-ldap002.eng.vmware.com:389/ > > so 16:05:46 > > In the accesslog, we see: > > dn: cn=accesslog > objectClass: auditContainer > cn: accesslog > structuralObjectClass: auditContainer > contextCSN: 20120517225152.913667Z#000000#000#000000 > contextCSN: 20120517230823.615364Z#000000#001#000000 > contextCSN: 20120517230546.409118Z#000000#002#000000 > > dn: reqStart=20120517230546.000019Z,cn=accesslog > objectClass: auditAdd > structuralObjectClass: auditAdd > reqStart: 20120517230546.000019Z > reqEnd: 20120517230546.000020Z > reqType: add > reqSession: 100 > reqAuthzID: cn=config > reqDN: cn=zimbra > reqResult: 0 > reqMod: objectClass:+ organizationalRole > reqMod: description:+ Zimbra Systems Application Data > reqMod: cn:+ zimbra > reqMod: structuralObjectClass:+ organizationalRole > reqMod: entryUUID:+ 40f78bea-34be-1031-8a5d-e1466f667e19 > reqMod: creatorsName:+ cn=config > reqMod: createTimestamp:+ 20120517224907Z > reqMod: entryCSN:+ 20120517224907.221672Z#000000#000#000000 > reqMod: modifiersName:+ cn=config > reqMod: modifyTimestamp:+ 20120517224907Z > reqEntryUUID: 40f78bea-34be-1031-8a5d-e1466f667e19 > entryUUID: 948929e2-34c0-1031-9a14-c93bd10ff0f2 > creatorsName: cn=config > createTimestamp: 20120517224907Z > entryCSN: 20120517224907.221672Z#000000#000#000000 > modifiersName: cn=config > modifyTimestamp: 20120517224907Z > > so it is tracking "000" as a third master? This seems to be why the > original server (which was 000 before being promoted to 001) replicates > these entries back to itself. The loop is caused by the patch to ITS#6872, which considers a consumer out of date whenever the number of CSNs in its sync request doesn't match the number known to the provider. The data here is basically invalid: server1 has entries generated using SID=0 but it has no contextCSN value with SID=0. It only sent SID=1 and SID=2 in its sync request. Server2, which just updated from server1, has a contextCSN for SID=0 in addition to 1 and 2 (and that's all correct). Server1 should have always had a contextCSN value for SID=0 but doesn't. This problem would not occur if server1 was converted first from standalone into a single-master. I.e., load syncprov on it, let it scan the DB and generate the first sid=0 contextCSN, before turning it intu a MMR node. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
--On Tuesday, May 29, 2012 6:25 PM -0700 Howard Chu <hyc@openldap.org> wrote: >> so it is tracking "000" as a third master? This seems to be why the >> original server (which was 000 before being promoted to 001) replicates >> these entries back to itself. > > The loop is caused by the patch to ITS#6872, which considers a consumer > out of date whenever the number of CSNs in its sync request doesn't match > the number known to the provider. > > The data here is basically invalid: server1 has entries generated using > SID=0 but it has no contextCSN value with SID=0. It only sent SID=1 and > SID=2 in its sync request. Server2, which just updated from server1, has > a contextCSN for SID=0 in addition to 1 and 2 (and that's all correct). > > Server1 should have always had a contextCSN value for SID=0 but doesn't. > This problem would not occur if server1 was converted first from > standalone into a single-master. I.e., load syncprov on it, let it scan > the DB and generate the first sid=0 contextCSN, before turning it intu a > MMR node. For documentation purposes -- The issue occurred because I set olcServerID before I loaded syncprov. Re-ordering my script to load syncprov (which then caused a contextCSN value to be correctly generated on Server1 for SID=0) fixed this loop. --Quanah -- Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration
changed state Open to Closed moved from Incoming to Documentation