Issue 8800 - MMR: out of date master will ignore history of its own changes
Summary: MMR: out of date master will ignore history of its own changes
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: 2.4.45
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-30 21:04 UTC by Quanah Gibson-Mount
Modified: 2018-03-22 19:26 UTC

Description Quanah Gibson-Mount 2018-01-30 21:04:12 UTC
Full_Name: Quanah Gibson-Mount
Version: 2.4.45
OS: Linux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (47.208.148.239)


Ran the following test:

4-way MMR setup, with each database populated from an initial DB that already
has history to it

Make several thousand MODs to serverID 1 only
Stop serverID 1
Wipe its database
Reload serverID 1 from the initial DB

Expected result:
serverID 1 REFRESHes from the other servers to sync up its database

Actual result:
serverID 1 updates its contextCSN to the last change op, but does not sync any
actual changes back, leaving it out of sync with the rest of the cluster.
Comment 1 Quanah Gibson-Mount 2018-01-30 23:46:46 UTC
Per Howard's suggestion, I commented out lines 2390-2396 in syncprov.c, 
which allows the master to get fed back its own operations on startup.

On the plus side, the database does indeed get all the operations sent back.
There were 1187 entries in the accesslog DBs of all 4 nodes.

On the minus side, while serverIDs 2, 3, and 4 all agreed on the final
resulting contextCSN, serverID 1 did not. That in turn broke the nodes'
ability to communicate with each other (err=53).

Last entry on serverIDs 1/2/3/4:

dn: reqStart=20180130233019.000017Z,cn=accesslog
objectClass: auditModify
reqStart: 20180130233019.000017Z
reqEnd: 20180130233019.000018Z
reqType: modify
reqSession: 1
reqAuthzID: cn=ldaproot,dc=xxx,dc=yyy
reqDN: dc=xxx,dc=yyy
reqResult: 0
reqMod: contextCSN:= 20180130233019.035885Z#000000#001#000000
reqMod: contextCSN:= 20171130222521.056018Z#000000#002#000000
reqMod: contextCSN:= 20171130222318.939265Z#000000#003#000000
reqMod: contextCSN:= 20171203041258.811473Z#000000#004#000000
reqEntryUUID: 156eb8cc-18e9-1027-80e5-d3f2010890dc


contextCSNs on 2/3/4:
base
contextCSN: 20180130233019.035885Z#000000#001#000000
contextCSN: 20171130222521.056018Z#000000#002#000000
contextCSN: 20171130222318.939265Z#000000#003#000000
contextCSN: 20171203041258.811473Z#000000#004#000000
accesslog
contextCSN: 20180130233019.035885Z#000000#001#000000
contextCSN: 20171130222521.056018Z#000000#002#000000
contextCSN: 20171130222318.939265Z#000000#003#000000
contextCSN: 20171203041258.811473Z#000000#004#000000

contextCSNs on 1:
base
contextCSN: 20180130233019.035885Z#000000#001#000000
contextCSN: 20171130222521.056018Z#000000#002#000000
contextCSN: 20171130222318.939265Z#000000#003#000000
contextCSN: 20171203041258.811473Z#000000#004#000000
accesslog
contextCSN: 20180130233016.137867Z#000000#001#000000
contextCSN: 20171130222521.056018Z#000000#002#000000
contextCSN: 20171130222318.939265Z#000000#003#000000
contextCSN: 20171203041258.811473Z#000000#004#000000


Note that on serverID 1 the contextCSN for SID 001 is correct on the database
root (20180130233019.035885Z) but incorrect in the accesslog database, where it
is still 20180130233016.137867Z.
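
The mismatch shown in the listings above can also be checked mechanically. The following is a minimal Python sketch, not part of slapd or any OpenLDAP tooling. The helper names (parse_csn, latest_by_sid, differing_sids) are hypothetical, and it assumes the contextCSN values have already been read from each database's suffix entry (for example with ldapsearch). It relies on the CSN fields being fixed-width, so plain string comparison orders them correctly.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class CSN(NamedTuple):
    """One contextCSN value split into its four '#'-separated fields."""
    timestamp: str  # e.g. "20180130233019.035885Z"
    count: str      # change counter
    sid: str        # server ID, e.g. "001"
    mod: str        # modification counter


def parse_csn(value: str) -> CSN:
    """Split a CSN string such as '2018...Z#000000#001#000000'."""
    parts = value.strip().split("#")
    if len(parts) != 4:
        raise ValueError(f"not a CSN: {value!r}")
    return CSN(*parts)


def latest_by_sid(values: Iterable[str]) -> Dict[str, CSN]:
    """Keep the newest CSN seen for each server ID."""
    latest: Dict[str, CSN] = {}
    for value in values:
        csn = parse_csn(value)
        current = latest.get(csn.sid)
        if current is None or csn > current:
            latest[csn.sid] = csn
    return latest


def differing_sids(
    a: Iterable[str], b: Iterable[str]
) -> Dict[str, Tuple[Optional[CSN], Optional[CSN]]]:
    """Return the server IDs whose CSN differs between two CSN sets."""
    left, right = latest_by_sid(a), latest_by_sid(b)
    return {
        sid: (left.get(sid), right.get(sid))
        for sid in sorted(left.keys() | right.keys())
        if left.get(sid) != right.get(sid)
    }


# Values copied from the serverID 1 listing in this report.
BASE = [
    "20180130233019.035885Z#000000#001#000000",
    "20171130222521.056018Z#000000#002#000000",
    "20171130222318.939265Z#000000#003#000000",
    "20171203041258.811473Z#000000#004#000000",
]
ACCESSLOG = [
    "20180130233016.137867Z#000000#001#000000",
    "20171130222521.056018Z#000000#002#000000",
    "20171130222318.939265Z#000000#003#000000",
    "20171203041258.811473Z#000000#004#000000",
]

diff = differing_sids(BASE, ACCESSLOG)
for sid, (base, log) in diff.items():
    base_ts = base.timestamp if base else "missing"
    log_ts = log.timestamp if log else "missing"
    print(f"SID {sid}: base={base_ts} accesslog={log_ts}")
```

Run against the serverID 1 values, this reports only SID 001 as differing, which matches the manual comparison. The same function can compare the contextCSN sets of two different servers.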

--Quanah


Comment 2 Howard Chu 2018-02-08 00:27:53 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 3 Quanah Gibson-Mount 2018-02-09 18:25:54 UTC
changed notes
changed state Test to Release
Comment 4 OpenLDAP project 2018-03-22 19:26:54 UTC
fixed in master
fixed in RE24 (2.4.46)
Comment 5 Quanah Gibson-Mount 2018-03-22 19:26:54 UTC
changed notes
changed state Release to Closed