(ITS#8281) Syncrepl refresh failure when slapd is restarted midstream



Full_Name: Quanah Gibson-Mount
Version: RE24 Sept 11, 2015
OS: Linux 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (75.111.52.177)


We are seeing a scenario where, if slapd is stopped on a new MMR node while a
full REFRESH is in progress, the state of that refresh is not tracked and the
wrong CSN value is stored.
This dataset has 15,000 users.  In the log below, the refresh gets as far as
user 625 before slapd is stopped:

Oct 20 16:13:09 q2 slapd[18724]: syncrepl_entry: rid=100 be_search (0)
Oct 20 16:13:09 q2 slapd[18724]: syncrepl_entry: rid=100
uid=user625,ou=people,dc=q1,dc=aon,dc=zimbraview,dc=com
Oct 20 16:13:09 q2 slapd[18724]: slap_queue_csn: queueing 0x44c7e30
20151020185526.862768Z#000000#000#000000
Oct 20 16:13:09 q2 slapd[18724]: slap_graduate_commit_csn: removing 0x44c87c0
20151020185526.862768Z#000000#000#000000
Oct 20 16:13:09 q2 slapd[18724]: syncrepl_entry: rid=100 be_add
uid=user625,ou=people,dc=q1,dc=aon,dc=zimbraview,dc=com (0)
Oct 20 16:13:09 q2 slapd[18724]: slapd stopped.


Then when slapd is restarted:

Oct 20 16:13:16 q2 slapd[18970]: do_syncrep2: rid=100
cookie=rid=100,sid=001,csn=20151020201231.263989Z#000000#001#000000
Oct 20 16:13:16 q2 slapd[18970]: slap_queue_csn: queueing 0x309dfd8
20151020201231.263989Z#000000#001#000000
Oct 20 16:13:16 q2 slapd[18970]: slap_queue_csn: queueing 0x5054008
20151020201231.263989Z#000000#001#000000
Oct 20 16:13:16 q2 slapd[18970]: slap_graduate_commit_csn: removing 0x49353c0
20151020201231.263989Z#000000#001#000000
Oct 20 16:13:16 q2 slapd[18970]: slap_graduate_commit_csn: removing 0x4935060
20151020201231.263989Z#000000#001#000000
Oct 20 16:13:16 q2 slapd[18970]: syncrepl_message_to_op: rid=100 be_add
cn=q2.aon.zimbraview.com,cn=servers,cn=zimbra (0)

which causes it to skip the other 14,000+ users.
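
To make the mismatch in the two log excerpts easier to see, here is a minimal
sketch (plain Python, not OpenLDAP code) that parses the two CSNs and compares
them. The names CSN and parse_csn are hypothetical helpers made up for this
illustration. The only assumption is the usual CSN layout of
timestamp#count#sid#mod, with the last three fields in hex.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Hypothetical container for the four CSN fields."""
    time: datetime
    count: int
    sid: int
    mod: int


def parse_csn(text: str) -> CSN:
    # Assumed layout: YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod (last three in hex).
    stamp, count, sid, mod = text.split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ")
    when = when.replace(tzinfo=timezone.utc)
    return CSN(when, int(count, 16), int(sid, 16), int(mod, 16))


# CSN queued for uid=user625, the last entry added before "slapd stopped."
last_entry = parse_csn("20151020185526.862768Z#000000#000#000000")

# CSN in the cookie logged by do_syncrep2 after the restart.
cookie = parse_csn("20151020201231.263989Z#000000#001#000000")

gap = cookie.time - last_entry.time
print("last entry written:", last_entry.time.isoformat(), "sid", last_entry.sid)
print("cookie after restart:", cookie.time.isoformat(), "sid", cookie.sid)
print("cookie is ahead by:", gap)
```

The cookie CSN is more than an hour later than the CSN of the last entry that
was actually written, and it carries a different SID. That is consistent with
the report above: the CSN in play after the restart does not reflect how far
the interrupted refresh had progressed.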