[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: Problem with syncrepl and deletion on openldap 2.4.21



On 17/05/2010 17:45, Mark Cave-Ayland wrote:
Hi folks,

We've been running the latest openldap stable on one of our client sites
and have hit an issue earlier in the day with syncrepl changes not being
propagated to all consumers. The live system is configured with one
master and 4 slaves connected across various LAN/WAN links with all
servers running 2.4.21, db-4.7 on RHEL4.

Earlier in the day, we renamed a corrupted ou (that was accidentally
created with hidden control characters) to ou=salvage. Since the
slapd.conf backend on the master was set to "database bdb", this was
done by cloning the data into a new ou=salvage and deleting the old
version. This data was cleaned up by the client by copying from
ou=salvage into its correct location, and then finally they attempted to
delete the complete ou=salvage part of the tree. However, after the
deletion phase we found that 2 of our slaves were inconsistent with the
master.

As an example, on one slave we noticed that one DN with cn=001901717 had
not been deleted from ou=salvage. The relevant parts of the log on the
master and slave look like this:


master:

May 17 12:35:08 master slapd[3692]: conn=222514 op=3089 SRCH
attr=objectclass
May 17 12:35:08 master slapd[3692]: conn=222514 op=3089 SEARCH RESULT
tag=101 err=0 nentries=0 text=
May 17 12:35:08 master slapd[3692]: conn=222514 op=3090 DEL
dn="cn=001801489,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com"
May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x824fee10
20100517113508.089178Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=149,csn=20100517113508.089178Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 op=3090 RESULT tag=107
err=0 text=
May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing
0x82f2e250 20100517113508.089178Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=107,csn=20100517113508.089178Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=136,csn=20100517113508.089178Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=103,csn=20100517113508.089178Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=154,csn=20100517113508.089178Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 op=3091 SRCH
base="cn=001800462,ou=People,ou=salvage,ou=Access
Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)"
May 17 12:35:08 master slapd[3692]: conn=222514 op=3091 SRCH
attr=objectclass
May 17 12:35:08 master slapd[3692]: conn=222514 op=3091 SEARCH RESULT
tag=101 err=0 nentries=0 text=
May 17 12:35:08 master slapd[3692]: conn=222514 op=3092 DEL
dn="cn=001800462,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com"
May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x8e151e10
20100517113508.144277Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=149,csn=20100517113508.144277Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 op=3092 RESULT tag=107
err=0 text=
May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing
0x7d00cc58 20100517113508.144277Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=107,csn=20100517113508.144277Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=136,csn=20100517113508.144277Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=103,csn=20100517113508.144277Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=154,csn=20100517113508.144277Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 op=3093 SRCH
base="cn=001801399,ou=People,ou=salvage,ou=Access
Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)"
May 17 12:35:08 master slapd[3692]: conn=222514 op=3093 SRCH
attr=objectclass
May 17 12:35:08 master slapd[3692]: conn=222514 op=3093 SEARCH RESULT
tag=101 err=0 nentries=0 text=
May 17 12:35:08 master slapd[3692]: conn=222514 op=3094 DEL
dn="cn=001801399,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com"
May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x8c6f6e10
20100517113508.192643Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=149,csn=20100517113508.192643Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 op=3094 RESULT tag=107
err=0 text=
May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing
0x82588ca0 20100517113508.192643Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=107,csn=20100517113508.192643Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=136,csn=20100517113508.192643Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=103,csn=20100517113508.192643Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=154,csn=20100517113508.192643Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 op=3095 SRCH
base="cn=001800467,ou=People,ou=salvage,ou=Access
Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)"
May 17 12:35:08 master slapd[3692]: conn=222514 op=3095 SRCH
attr=objectclass
May 17 12:35:08 master slapd[3692]: conn=222514 op=3095 SEARCH RESULT
tag=101 err=0 nentries=0 text=
May 17 12:35:08 master slapd[3692]: conn=222514 op=3096 DEL
dn="cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com"
May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x8cffae10
20100517113508.234887Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=107,csn=20100517113508.234887Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=136,csn=20100517113508.234887Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=103,csn=20100517113508.234887Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=154,csn=20100517113508.234887Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=149,csn=20100517113508.234887Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 op=3096 RESULT tag=107
err=0 text=
May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing
0x7d802e90 20100517113508.234887Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 op=3097 SRCH
base="cn=001901714,ou=People,ou=salvage,ou=Access
Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)"
May 17 12:35:08 master slapd[3692]: conn=222514 op=3097 SRCH
attr=objectclass
May 17 12:35:08 master slapd[3692]: conn=222514 op=3097 SEARCH RESULT
tag=101 err=0 nentries=0 text=
May 17 12:35:08 master slapd[3692]: conn=222514 op=3098 DEL
dn="cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com"
May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x87ef3e10
20100517113508.277430Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=149,csn=20100517113508.277430Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 op=3098 RESULT tag=107
err=0 text=
May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing
0x7e95abe8 20100517113508.277430Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=107,csn=20100517113508.277430Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=136,csn=20100517113508.277430Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=103,csn=20100517113508.277430Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: syncprov_sendresp:
cookie=rid=154,csn=20100517113508.277430Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 op=3099 SRCH
base="cn=001901717,ou=People,ou=salvage,ou=Access
Groups,dc=client,dc=com" scope=1 deref=2 filter="(objectClass=*)"
May 17 12:35:08 master slapd[3692]: conn=222514 op=3099 SRCH
attr=objectclass
May 17 12:35:08 master slapd[3692]: conn=222514 op=3099 SEARCH RESULT
tag=101 err=0 nentries=0 text=
May 17 12:35:08 master slapd[3692]: conn=222514 op=3100 DEL
dn="cn=001901717,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com"
May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x81cfce10
20100517113508.331930Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 op=3101 ABANDON msg=3101
May 17 12:35:08 master slapd[3692]: conn=222514 op=3102 UNBIND
May 17 12:35:08 master slapd[3692]: slap_graduate_commit_csn: removing
0x7e550d68 20100517113508.331930Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 fd=411 closed


slave:

May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing
0x8b867e0 20100517113508.192643Z#000000#000#000000
May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_delete
cn=001801399,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com (0)
May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x89bfb70
20100517113508.192643Z#000000#000#000000
May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing
0x8b69c68 20100517113508.192643Z#000000#000#000000
May 17 12:35:32 slave slapd[2393]: do_syncrep2: rid=149
cookie=rid=149,csn=20100517113508.234887Z#000000#000#000000
May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149
LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_DELETE)
May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_search (0)
May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149
cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com
May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x8af8bf0
20100517113508.234887Z#000000#000#000000
May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing
0x86d0a60 20100517113508.234887Z#000000#000#000000
May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_delete
cn=001800467,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com (0)
May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x8af8bf0
20100517113508.234887Z#000000#000#000000
May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing
0x86893e8 20100517113508.234887Z#000000#000#000000
May 17 12:35:32 slave slapd[2393]: do_syncrep2: rid=149
cookie=rid=149,csn=20100517113508.277430Z#000000#000#000000
May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149
LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_DELETE)
May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_search (0)
May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149
cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com
May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x86c0a38
20100517113508.277430Z#000000#000#000000
May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing
0x8abc018 20100517113508.277430Z#000000#000#000000
May 17 12:35:32 slave slapd[2393]: syncrepl_entry: rid=149 be_delete
cn=001901714,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com (0)
May 17 12:35:32 slave slapd[2393]: slap_queue_csn: queing 0x86c0a38
20100517113508.277430Z#000000#000#000000
May 17 12:35:32 slave slapd[2393]: slap_graduate_commit_csn: removing
0x8aedb68 20100517113508.277430Z#000000#000#000000

(logging stops for about 3s before another connection is made)


AFAICT it seems that something happened between the master and slave
that caused an ABANDON to be issued and the connection dropped to that
slave. The deletion was successful on the master, but the cn=001901717
entry still remains on the slave.

Actually, it looks like the dropped connection was the one that did the deletion, not the slave's connection:
May 17 12:35:08 master slapd[3692]: conn=222514 op=3100 DEL dn="cn=001901717,ou=People,ou=salvage,ou=Access Groups,dc=client,dc=com"
May 17 12:35:08 master slapd[3692]: slap_queue_csn: queing 0x81cfce10 20100517113508.331930Z#000000#000#000000
May 17 12:35:08 master slapd[3692]: conn=222514 op=3101 ABANDON msg=3101
May 17 12:35:08 master slapd[3692]: conn=222514 op=3102 UNBIND

This looks very much like ITS#6059.

My questions would therefore be:

1) Are there any known issues with syncrepl and deletion in 2.4.21 that
may be fixed in 2.4.22?

There are issues fixed in 2.4.22 regarding syncrepl. See the release notes: http://www.openldap.org/software/release/changes.html

However, I'm pretty sure they don't address the issue discussed in ITS#6059.

2) What does an ABANDON actually mean? Did the client or server
terminate the connection and can we find out why? When slave reconnected
back to the master, why was the incomplete transaction from the master
not pushed back to the slave?

An ABANDON happens when an operation is explicitly abandoned, or is cancelled, due to a lost connection for example. Typically, if you're running ldapmodify and hit CTRL-C, you get an ABANDON in the logs (well, if you are fast enough to hit it at the right time that is).

3) Would a simple restart of slapd on the slave cause it to
resynchronise correctly with the master?

No, assuming my assumptions above are correct: if the syncrepl consumer didn't lose the connection to the master, other updates will have been propagated, and the consumer's cookie will have been updated, missing out the delete operation.

To resync, start your consumer(s) once with the -c rid=nnn command line option (with nnn being the rid of the syncrepl statement in config).

Hope this helps,
Jonathan

4) Is there any more information that we can provide from the logs on
the 2 inconsistent slaves?

Note I'm not sure whether to open an ITS on this since it is a live
system and hence the scope for testing/debugging is much more limited.
But I can try my best to obtain extra information if asked.


Many thanks,

Mark.



--
--------------------------------------------------------------
Jonathan Clarke - jonathan@phillipoux.net
--------------------------------------------------------------
Ldap Synchronization Connector (LSC) - http://lsc-project.org
--------------------------------------------------------------