[Date Prev][Date Next] [Chronological] [Thread] [Top]

(ITS#7274) delta-syncrepl MMR infinite loop



Full_Name: Quanah Gibson-Mount
Version: 2.4.31
OS: Linux 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (75.108.184.39)


The following problem did not occur for me in previous OpenLDAP releases. 
However, I can reproduce it 100% of the time with OpenLDAP 2.4.31.

Basic configuration scenario is as follows:

Set up a stand alone OpenLDAP server -- Just cn=config and primary DB.

Basic database for the server is created (note all entries now have SID value
0)

Update the server to be delta-sync MMR -- Set sid=1, Mirror Mode, and add
accesslog DB.  Add syncrepl stanza for a second server

Install second server as a delta-sycnrepl MMR pair -- cn=config, accesslog DB,
primary DB, SID=2

Servers now go into infinite loops due to the SID=0 entries.

Server SID=1 logs:

May 16 14:33:24 zre-ldap002 slapd[17422]: do_syncrep2: rid=100
cookie=rid=100,sid=002,csn=20120516212017.543213Z#000000#000#000000
May 16 14:33:24 zre-ldap002 slapd[17422]: slap_queue_csn: queing 0x58d2ced
20120516212017.543213Z#000000#000#000000
May 16 14:33:24 zre-ldap002 slapd[17422]: slap_graduate_commit_csn: removing
0x4699f50 20120516212017.543213Z#000000#000#000000
May 16 14:33:24 zre-ldap002 slapd[17422]: syncrepl_message_to_op: rid=100 be_add
cn=zimbra (68)
May 16 14:33:24 zre-ldap002 slapd[17422]: do_syncrep2: rid=100 delta-sync lost
sync on (reqStart=20120516213237.000019Z,cn=accesslog), switching to REFRESH
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_message_to_entry: rid=100 DN:
cn=zimbra, UUID: add167b2-33e8-1031-8f62-f1c6fd57f8b3
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100
LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 inserted UUID
add167b2-33e8-1031-8f62-f1c6fd57f8b3
May 16 14:33:25 zre-ldap002 slapd[17422]: dn_callback : entries have identical
CSN cn=zimbra 20120516212017.543213Z#000000#000#000000
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 be_search (0)
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 cn=zimbra
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 entry
unchanged, ignored (cn=zimbra)
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_message_to_entry: rid=100 DN:
cn=admins,cn=zimbra, UUID: add18ab2-33e8-1031-8f63-f1c6fd57f8b3
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100
LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 inserted UUID
add18ab2-33e8-1031-8f63-f1c6fd57f8b3
May 16 14:33:25 zre-ldap002 slapd[17422]: dn_callback : entries have identical
CSN cn=admins,cn=zimbra 20120516212017.544170Z#000000#000#000000
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 be_search (0)
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100
cn=admins,cn=zimbra
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 entry
unchanged, ignored (cn=admins,cn=zimbra)

(etc, going through all 000 SID entries).

Once it finishes, it starts over:


May 16 14:33:25 zre-ldap002 slapd[17422]: dn_callback : entries have identical
CSN cn=com_zimbra_click2call_mitel,cn=zimlets,cn=zimbra
20120516212615.543144Z#000000#000#000000
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 be_search (0)
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100
cn=com_zimbra_click2call_mitel,cn=zimlets,cn=zimbra
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 entry
unchanged, ignored (cn=com_zimbra_click2call_mitel,cn=zimlets,cn=zimbra)
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_message_to_entry: rid=100 DN:
cn=zre-ldap003.eng.vmware.com,cn=servers,cn=zimbra, UUID:
5a64830a-33ea-1031-8fbb-f1c6fd57f8b3
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100
LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 inserted UUID
5a64830a-33ea-1031-8fbb-f1c6fd57f8b3
May 16 14:33:25 zre-ldap002 slapd[17422]: dn_callback : entries have identical
CSN cn=zre-ldap003.eng.vmware.com,cn=servers,cn=zimbra
20120516213323.905859Z#000000#001#000000
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 be_search (0)
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100
cn=zre-ldap003.eng.vmware.com,cn=servers,cn=zimbra
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 entry
unchanged, ignored (cn=zre-ldap003.eng.vmware.com,cn=servers,cn=zimbra)
May 16 14:33:25 zre-ldap002 slapd[17422]: do_syncrep2: rid=100
LDAP_RES_SEARCH_RESULT
May 16 14:33:25 zre-ldap002 slapd[17422]: do_syncrep2: rid=100
cookie=rid=100,sid=002,csn=20120516213323.905859Z#000000#001#000000;20120516213237.680889Z#000000#002#000000
May 16 14:33:25 zre-ldap002 slapd[17422]: slap_queue_csn: queing 0x5805da0
20120516213237.680889Z#000000#002#000000
May 16 14:33:25 zre-ldap002 slapd[17422]: slap_graduate_commit_csn: removing
0x5805620 20120516213237.680889Z#000000#002#000000
May 16 14:33:25 zre-ldap002 slapd[17422]: do_syncrep2: rid=100
cookie=rid=000,sid=002,csn=20120516212017.543213Z#000000#000#000000
May 16 14:33:25 zre-ldap002 slapd[17422]: slap_queue_csn: queing 0x58d2ced
20120516212017.543213Z#000000#000#000000
May 16 14:33:25 zre-ldap002 slapd[17422]: slap_graduate_commit_csn: removing
0x5805bc0 20120516212017.543213Z#000000#000#000000
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_message_to_op: rid=100 be_add
cn=zimbra (68)
May 16 14:33:25 zre-ldap002 slapd[17422]: do_syncrep2: rid=100 delta-sync lost
sync on (reqStart=20120516213237.000019Z,cn=accesslog), switching to REFRESH
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_message_to_entry: rid=100 DN:
cn=zimbra, UUID: add167b2-33e8-1031-8f62-f1c6fd57f8b3
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100
LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 inserted UUID
add167b2-33e8-1031-8f62-f1c6fd57f8b3
May 16 14:33:25 zre-ldap002 slapd[17422]: dn_callback : entries have identical
CSN cn=zimbra 20120516212017.543213Z#000000#000#000000
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 be_search (0)
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 cn=zimbra
May 16 14:33:25 zre-ldap002 slapd[17422]: syncrepl_entry: rid=100 entry
unchanged, ignored (cn=zimbra)


On the second MMR master server, it logs:

May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138856 SRCH attr=* +
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138856 SEARCH RESULT
tag=101 err=0 nentries=0 text=
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138857 SRCH
base="cn=accesslog" scope=2 deref=0
filter="(&(objectClass=auditWriteObject)(reqResult=0))"
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138857 SRCH attr=reqDN
reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN
May 16 14:36:15 zre-ldap003 slapd[26137]: syncprov_search_response:
cookie=rid=100,sid=002,csn=20120516212644.426510Z#000000#000#000000;20120516213448.200519Z#000000#001#000000;20120516213237.680085Z#000000#002#000000
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138858 ABANDON
msg=138858
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138859 SRCH base=""
scope=2 deref=0 filter="(objectClass=*)"
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138859 SRCH attr=* +
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138859 SEARCH RESULT
tag=101 err=0 nentries=0 text=
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138860 SRCH
base="cn=accesslog" scope=2 deref=0
filter="(&(objectClass=auditWriteObject)(reqResult=0))"
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138860 SRCH attr=reqDN
reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138861 ABANDON
msg=138861
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138862 SRCH base=""
scope=2 deref=0 filter="(objectClass=*)"
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138862 SRCH attr=* +
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138862 SEARCH RESULT
tag=101 err=0 nentries=0 text=
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138863 SRCH
base="cn=accesslog" scope=2 deref=0
filter="(&(objectClass=auditWriteObject)(reqResult=0))"
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138863 SRCH attr=reqDN
reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138864 ABANDON
msg=138864
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138865 SRCH base=""
scope=2 deref=0 filter="(objectClass=*)"
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138865 SRCH attr=* +
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138865 SEARCH RESULT
tag=101 err=0 nentries=0 text=
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138866 SRCH
base="cn=accesslog" scope=2 deref=0
filter="(&(objectClass=auditWriteObject)(reqResult=0))"
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138866 SRCH attr=reqDN
reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN
May 16 14:36:15 zre-ldap003 slapd[26137]: syncprov_search_response:
cookie=rid=100,sid=002,csn=20120516212644.426510Z#000000#000#000000;20120516213448.200519Z#000000#001#000000;20120516213237.680085Z#000000#002#000000
May 16 14:36:15 zre-ldap003 slapd[26137]: conn=1002 op=138867 ABANDON
msg=138867



However, I can get a working delta-sync MMR pair if I do the following:


Set up a stand alone OpenLDAP server -- Just cn=config and primary DB.

Basic database for the server is created (note all entries now have SID value
0)

Update the server to be delta-sync master: add accesslog DB.

Install second server as a replica.

Update first server to be delta-sync MMR: Set sid=1, mirrormode=true on primary
DB, add syncrepl stanza for second server

Update second server to be delta-sync MMR: Set sid=2, mirrormode=true on primary
DB, add accesslog DB, add syncprel stanza for first server


In this scenario everything works as expected.

The issue seems to be caused in the first scenario by server 2 replicating the
DB with SID=0 bits from the primary server, and that looping back to the first
server endlessly, instead of being permanently ignored.



--Quanah