[Date Prev][Date Next] [Chronological] [Thread] [Top]

Problem with syncrepl and deletion on openldap 2.4.21



Hi folks,

We've been running the latest openldap stable on one of our client sites and have hit an issue earlier in the day with syncrepl changes not being propagated to all consumers. The live system is configured with one master and 4 slaves connected across various LAN/WAN links with all servers running 2.4.21, db-4.7 on RHEL4.

Earlier in the day, we renamed a corrupted ou (that was accidentally created with hidden control characters) to ou=salvage. Since the slapd.conf backend on the master was set to "database bdb", this was done by cloning the data into a new ou=salvage and deleting the old version. This data was cleaned up by the client by copying from ou=salvage into its correct location, and then finally they attempted to delete the complete ou=salvage part of the tree. However, after the deletion phase we found that 2 of our slaves were inconsistent with the master.

As an example, on one slave we noticed that one DN with cn=001901717 had not been deleted from ou=salvage. The relevant parts of the log on the master and slave look like this:


master:

May 17 12:35:08 master slapd[3692]: conn=222514 op=3089 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3089 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3090 DEL dn="cn=001801489,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x824fee10 20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3090 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x82f2e250 20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.089178Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3091 SRCH base="cn=001800462,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3091 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3091 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3092 DEL dn="cn=001800462,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x8e151e10 20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3092 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x7d00cc58 20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.144277Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3093 SRCH base="cn=001801399,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3093 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3093 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3094 DEL dn="cn=001801399,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x8c6f6e10 20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3094 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x82588ca0 20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.192643Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3095 SRCH base="cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3095 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3095 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3096 DEL dn="cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x8cffae10 20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3096 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x7d802e90 20100517113508.234887Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3097 SRCH base="cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3097 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3097 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3098 DEL dn="cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x87ef3e10 20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=149,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3098 RESULT tag=107 err=0 text= May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x7e95abe8 20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=107,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=136,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=103,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: syncprov_sendresp: cookie=rid=154,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:08 master slapd[3692]: conn=222514 op=3099 SRCH base="cn=001901717,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)" May 17 12:35:08 master slapd[3692]: conn=222514 op=3099 SRCH attr=objectclass May 17 12:35:08 master slapd[3692]: conn=222514 op=3099 SEARCH RESULT tag=101 err=0 nentries=0 text= May 17 12:35:08 master slapd[3692]: conn=222514 op=3100 DEL dn="cn=001901717,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com" May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x81cfce10 20100517113508.331930Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 op=3101 ABANDON msg=3101
May 17 12:35:08 master slapd[3692]: conn=222514 op=3102 UNBIND
May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing 0x7e550d68 20100517113508.331930Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 fd=411 closed


slave:

May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x8b867e0 20100517113508.192643Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_delete cn=001801399,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com (0) May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x89bfb70 20100517113508.192643Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x8b69c68 20100517113508.192643Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: do_syncrep2: rid=149 cookie=rid=149,csn=20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_DELETE)
May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_search (0)
May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x8af8bf0 20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x86d0a60 20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_delete cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com (0) May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x8af8bf0 20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x86893e8 20100517113508.234887Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: do_syncrep2: rid=149 cookie=rid=149,csn=20100517113508.277430Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_DELETE)
May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_search (0)
May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x86c0a38 20100517113508.277430Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x8abc018 20100517113508.277430Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_delete cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com (0) May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x86c0a38 20100517113508.277430Z#000000#000#000000 May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing 0x8aedb68 20100517113508.277430Z#000000#000#000000

(logging stops for about 3s before another connection is made)


AFAICT it seems that something happened between the master and slave that caused an ABANDON to be issued and the connection dropped to that slave. The deletion was successful on the master, but the cn=001901717 entry still remains on the slave.

My questions would therefore be:

1) Are there any known issues with syncrepl and deletion in 2.4.21 that may be fixed in 2.4.22?

2) What does an ABANDON actually mean? Did the client or server terminate the connection and can we find out why? When slave reconnected back to the master, why was the incomplete transaction from the master not pushed back to the slave?

3) Would a simple restart of slapd on the slave cause it to resynchronise correctly with the master?

4) Is there any more information that we can provide from the logs on the 2 inconsistent slaves?

Note I'm not sure whether to open an ITS on this since it is a live system and hence the scope for testing/debugging is much more limited. But I can try my best to obtain extra information if asked.


Many thanks,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs