
(ITS#7049) DEL/LDAP_SYNC_DELETE race touching entryCSN

To: openldap-its@OpenLDAP.org
Subject: (ITS#7049) DEL/LDAP_SYNC_DELETE race touching entryCSN
From: ebackes@symas.com
Date: Fri, 23 Sep 2011 04:57:42 GMT
Auto-submitted: auto-generated (OpenLDAP-ITS)

Full_Name: Emily Backes
Version: 2.4.26
OS: any
URL: 
Submission from: (NULL) (76.88.107.46)


Similar to the recent overlay fixes to prevent updating entryCSN/contextCSN on
local changes, delete operations can cause inappropriate CSN setting on remote
servers.

Given a multi-master setup (tested with normal syncrepl), so that each server
has a serverID set, with no overlays loaded other than syncprov, set up two or
more threads of delete operations; three or more seems to reproduce the problem
most reliably on the systems I've tested.
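
One way to generate that kind of delete load is sketched below. It assumes the
standard ldapdelete client is on PATH; the URI, bind DN, password, and DN list
are placeholders rather than values from this report, and the helper names are
hypothetical.

```python
import subprocess
import threading


def build_delete_command(uri, bind_dn, password, dn):
    """Return an ldapdelete argv that deletes one entry using a simple bind."""
    return ["ldapdelete", "-x", "-H", uri, "-D", bind_dn, "-w", password, dn]


def split_round_robin(dns, workers):
    """Deal the DNs out into `workers` lists so each thread gets its own share."""
    return [dns[i::workers] for i in range(workers)]


def delete_worker(uri, bind_dn, password, dns):
    """Delete each DN in turn; failures are ignored so the load keeps flowing."""
    for dn in dns:
        subprocess.run(build_delete_command(uri, bind_dn, password, dn), check=False)


def run_concurrent_deletes(uri, bind_dn, password, dns, workers=3):
    """Run `workers` delete threads against one server at the same time."""
    threads = [
        threading.Thread(target=delete_worker, args=(uri, bind_dn, password, share))
        for share in split_round_robin(dns, workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Calling run_concurrent_deletes() with server1's URI and a list of existing
leaf-entry DNs would issue the deletes from three threads by default.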

As the deletes are happening, the server1 side should of course show its
entryCSN updating:

dn: dc=example,dc=com
contextCSN: 20110923044343.412634Z#000000#001#000000

This should of course be mirrored on the server2 side, with contextCSN exactly
matching the set of CSNs from the server1 side.  Instead, after enough
concurrent deletes to hit the race:

dn: dc=example,dc=com
contextCSN: 20110923044343.412634Z#000000#001#000000
contextCSN: 20110923044349.314803Z#000000#002#000000
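
A small check for this symptom can be sketched as follows, assuming the usual
CSN layout of timestamp#count#SID#mod with the SID in hexadecimal. The function
names are hypothetical, and the set of server IDs that legitimately accept
writes has to be supplied by whoever runs the check.

```python
def parse_csn(csn):
    """Split a CSN of the form timestamp#count#sid#mod into its fields."""
    timestamp, count, sid, mod = csn.split("#")
    return {
        "timestamp": timestamp,
        "count": int(count, 16),
        "sid": int(sid, 16),  # serverID of the server that generated the CSN
        "mod": int(mod, 16),
    }


def unexpected_sids(context_csns, writer_sids):
    """Return the SIDs present in contextCSN that are not known writers."""
    seen = {parse_csn(csn)["sid"] for csn in context_csns}
    return sorted(seen - set(writer_sids))
```

Given the two contextCSN values above and server1 (SID 1) as the only writer,
unexpected_sids() reports SID 2, the server that never took a local write.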

This happens even though server2 has never received any local write operations
(or indeed any connection other than the syncrepl search from server1 and my
searches to retrieve contextCSN).  Again, no overlays are loaded.

This breaks syncrepl's assumptions, and the resulting CSN desync can cause
other replication problems.

Working on tracing out exactly where it goes awry...
