Issue 8789 - syncrepl fallback can destabilize delta-sync MMR nodes
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: 2.4.45
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
Reported: 2017-12-11 18:30 UTC by Quanah Gibson-Mount
Modified: 2019-04-17 23:18 UTC

Description Quanah Gibson-Mount 2017-12-11 18:30:35 UTC
Full_Name: Quanah Gibson-Mount
Version: 2.4.45
OS: N/A
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (47.208.148.239)


In an N-way MMR setup, one node falling back to syncrepl REFRESH may destabilize
other nodes, because it incorrectly records in its accesslog the changes it
receives from the master that is taking write ops.  If another node is using the
refreshing node as its source, it will be forced into fallback as well, further
destabilizing the cluster.

I triggered the scenario above in a 4-way MMR setup.  serverID 1 was the only
master receiving write ops.  serverID 3 went into REFRESH for unknown reasons.
serverID 2 was pulling serverID 1's changes from serverID 3, also for unknown
reasons.

serverID 1 recorded the following change:

dn: reqStart=20171206214129.000002Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20171206214129.000002Z
reqEnd: 20171206214129.000003Z
reqType: modify
reqSession: 2209
reqAuthzID: cn=ldaproot,dc=xxx,dc=edu
reqDN: uid=cdxxxxx,ou=user,dc=xxx,dc=edu
reqResult: 0
reqMod: pwdAccountLockedTime:= 20171206214129Z
reqMod: pwdFailureTime:+ 20171206214129.121729Z
reqMod: entryCSN:= 20171206214129.121794Z#000000#001#000000
reqMod: modifiersName:= cn=ldaproot,dc=xxx,dc=edu
reqMod: modifyTimestamp:= 20171206214129Z
reqEntryUUID: 41e02340-18f9-1027-900a-8ac8742d2008
entryUUID: f66c9be6-6f19-1037-9821-a35b555092e3
creatorsName: cn=accesslog
createTimestamp: 20171206214129Z
entryCSN: 20171206214129.121794Z#000000#001#000000
modifiersName: cn=accesslog
modifyTimestamp: 20171206214129Z
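
The entryCSN in the entry above identifies the server where the write
originated.  As a minimal sketch, assuming the usual OpenLDAP CSN layout of
timestamp#count#SID#mod with the last three fields in hex, the serverID can be
pulled out as follows.  The names CSN and parse_csn are made up for this
illustration and are not part of any OpenLDAP tool.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    """Fields of a CSN string (hypothetical helper type for illustration)."""
    timestamp: str  # e.g. 20171206214129.121794Z
    count: int      # change counter within the timestamp
    sid: int        # serverID of the originating master
    mod: int        # modification number within the operation


def parse_csn(value: str) -> CSN:
    """Split a CSN on '#' and decode the three hex fields.

    Assumes the four-field layout described above; raises ValueError
    if the string does not have exactly four '#'-separated fields.
    """
    timestamp, count, sid, mod = value.strip().split("#")
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


csn = parse_csn("20171206214129.121794Z#000000#001#000000")
print(csn.sid)  # serverID that originated this change
```

For the CSN recorded here the SID field is 001, so the change originated on
serverID 1, which matches the setup described above.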


However, serverID 3 recorded the following for this same CSN instead:

dn: reqStart=20171206214354.000006Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20171206214354.000006Z
reqEnd: 20171206214354.000008Z
reqType: modify
reqSession: 1
reqAuthzID: cn=ldaproot,dc=xxx,dc=edu
reqDN: uid=cdxxxxx,ou=user,dc=xxx,dc=edu
reqResult: 0
reqMod: userPassword:-
reqMod: pwdFailureTime:+ 20171206214105.339881Z
reqMod: pwdFailureTime:+ 20171206214105.504629Z
reqMod: pwdFailureTime:+ 20171206214105.756105Z
reqMod: pwdFailureTime:+ 20171206214106.117063Z
reqMod: pwdFailureTime:+ 20171206214106.348441Z
reqMod: pwdFailureTime:+ 20171206214106.575907Z
reqMod: pwdFailureTime:+ 20171206214106.875082Z
reqMod: pwdFailureTime:+ 20171206214107.175699Z
reqMod: pwdFailureTime:+ 20171206214107.655344Z
reqMod: pwdFailureTime:+ 20171206214107.915930Z
reqMod: pwdFailureTime:+ 20171206214108.156601Z
reqMod: pwdFailureTime:+ 20171206214108.431242Z
reqMod: pwdFailureTime:+ 20171206214108.791469Z
reqMod: pwdFailureTime:+ 20171206214109.033924Z
reqMod: pwdFailureTime:+ 20171206214109.318285Z
reqMod: pwdFailureTime:+ 20171206214109.565585Z
reqMod: pwdFailureTime:+ 20171206214109.823744Z
reqMod: pwdFailureTime:+ 20171206214110.110372Z
reqMod: pwdFailureTime:+ 20171206214110.306955Z
reqMod: pwdFailureTime:+ 20171206214110.638527Z
reqMod: pwdFailureTime:+ 20171206214111.014705Z
reqMod: pwdFailureTime:+ 20171206214111.370965Z
reqMod: pwdFailureTime:+ 20171206214111.673694Z
reqMod: pwdFailureTime:+ 20171206214112.011806Z
reqMod: pwdFailureTime:+ 20171206214112.327727Z
reqMod: pwdFailureTime:+ 20171206214112.584305Z
reqMod: pwdFailureTime:+ 20171206214112.930555Z
reqMod: pwdFailureTime:+ 20171206214113.269235Z
reqMod: pwdFailureTime:+ 20171206214113.633844Z
reqMod: pwdFailureTime:+ 20171206214113.928111Z
reqMod: pwdFailureTime:+ 20171206214114.217342Z
reqMod: pwdFailureTime:+ 20171206214114.539026Z
reqMod: pwdFailureTime:+ 20171206214114.888149Z
reqMod: pwdFailureTime:+ 20171206214115.262042Z
reqMod: pwdFailureTime:+ 20171206214115.675217Z
reqMod: pwdFailureTime:+ 20171206214116.030024Z
reqMod: pwdFailureTime:+ 20171206214116.362739Z
reqMod: pwdFailureTime:+ 20171206214116.616784Z
reqMod: pwdFailureTime:+ 20171206214116.987779Z
reqMod: pwdFailureTime:+ 20171206214117.293091Z
reqMod: pwdFailureTime:+ 20171206214117.549392Z
reqMod: pwdFailureTime:+ 20171206214117.838969Z
reqMod: pwdFailureTime:+ 20171206214118.051355Z
reqMod: pwdFailureTime:+ 20171206214118.275629Z
reqMod: pwdFailureTime:+ 20171206214118.583510Z
reqMod: pwdFailureTime:+ 20171206214118.866746Z
reqMod: pwdFailureTime:+ 20171206214119.174928Z
reqMod: pwdFailureTime:+ 20171206214119.483218Z
reqMod: pwdFailureTime:+ 20171206214119.929568Z
reqMod: pwdFailureTime:+ 20171206214120.147090Z
reqMod: pwdFailureTime:+ 20171206214120.549317Z
reqMod: pwdFailureTime:+ 20171206214120.869798Z
reqMod: pwdFailureTime:+ 20171206214121.143126Z
reqMod: pwdFailureTime:+ 20171206214121.476740Z
reqMod: pwdFailureTime:+ 20171206214121.799935Z
reqMod: pwdFailureTime:+ 20171206214122.066816Z
reqMod: pwdFailureTime:+ 20171206214122.405710Z
reqMod: pwdFailureTime:+ 20171206214122.761880Z
reqMod: pwdFailureTime:+ 20171206214123.032806Z
reqMod: pwdFailureTime:+ 20171206214123.280540Z
reqMod: pwdFailureTime:+ 20171206214123.748973Z
reqMod: pwdFailureTime:+ 20171206214124.085579Z
reqMod: pwdFailureTime:+ 20171206214124.340470Z
reqMod: pwdFailureTime:+ 20171206214124.638673Z
reqMod: pwdFailureTime:+ 20171206214124.970374Z
reqMod: pwdFailureTime:+ 20171206214125.302162Z
reqMod: pwdFailureTime:+ 20171206214125.630451Z
reqMod: pwdFailureTime:+ 20171206214125.921736Z
reqMod: pwdFailureTime:+ 20171206214126.232407Z
reqMod: pwdFailureTime:+ 20171206214126.564006Z
reqMod: pwdFailureTime:+ 20171206214126.816303Z
reqMod: pwdFailureTime:+ 20171206214127.168459Z
reqMod: pwdFailureTime:+ 20171206214127.481267Z
reqMod: pwdFailureTime:+ 20171206214127.779584Z
reqMod: pwdFailureTime:+ 20171206214128.176611Z
reqMod: pwdFailureTime:+ 20171206214128.429982Z
reqMod: pwdFailureTime:+ 20171206214128.852280Z
reqMod: pwdFailureTime:+ 20171206214129.121729Z
reqMod: pwdAccountLockedTime:= 20171206214129Z
reqMod: entryCSN:= 20171206214129.121794Z#000000#001#000000
reqMod: modifiersName:= cn=ldaproot,dc=xxx,dc=edu
reqMod: modifyTimestamp:= 20171206214129Z
reqMod: pwdFailureTime:-
reqEntryUUID: 41e02340-18f9-1027-900a-8ac8742d2008
entryCSN: 20171206214129.121794Z#000000#001#000000
entryUUID: 4d10f56e-6f1a-1037-9a98-f5e2e6dad8c2
creatorsName: cn=accesslog
createTimestamp: 20171206214129Z
modifiersName: cn=accesslog
modifyTimestamp: 20171206214129Z


serverID 2 is then unable to apply the change that serverID 3 provides for this
CSN, and falls back to REFRESH mode.
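
One way to spot this kind of divergence is to compare the reqMod lists that
different servers logged for the same entryCSN.  The following is only a rough
sketch, not an OpenLDAP utility: the function names are invented for this
example, the sample data is abbreviated from the two entries above, and the
parser assumes simple LDIF without folded lines or base64 ("::") values.

```python
from collections import defaultdict

CSN_PREFIX = "entryCSN:= "


def parse_entries(ldif_text):
    """Parse blank-line-separated LDIF entries into {attr: [values]} dicts."""
    entries = []
    for block in ldif_text.strip().split("\n\n"):
        entry = defaultdict(list)
        for line in block.splitlines():
            attr, sep, value = line.partition(": ")
            if sep:  # skip lines that are not "attr: value"
                entry[attr].append(value)
        if entry:
            entries.append(dict(entry))
    return entries


def csn_of(entry):
    """Return the entryCSN carried in the entry's reqMod values, if any."""
    for mod in entry.get("reqMod", []):
        if mod.startswith(CSN_PREFIX):
            return mod[len(CSN_PREFIX):]
    return None


def find_divergent(logs):
    """Map each CSN whose reqMod set differs between servers to those sets.

    logs maps a server label to its parsed accesslog entries.  If one server
    logged the same CSN more than once, only the last entry is kept.
    """
    by_csn = defaultdict(dict)
    for server, entries in logs.items():
        for entry in entries:
            csn = csn_of(entry)
            if csn is not None:
                by_csn[csn][server] = sorted(entry["reqMod"])
    return {csn: mods for csn, mods in by_csn.items()
            if len({tuple(m) for m in mods.values()}) > 1}


# Abbreviated versions of the two accesslog entries shown above.
SID1 = """dn: reqStart=20171206214129.000002Z,cn=accesslog
reqType: modify
reqMod: pwdAccountLockedTime:= 20171206214129Z
reqMod: pwdFailureTime:+ 20171206214129.121729Z
reqMod: entryCSN:= 20171206214129.121794Z#000000#001#000000
"""

SID3 = """dn: reqStart=20171206214354.000006Z,cn=accesslog
reqType: modify
reqMod: userPassword:-
reqMod: pwdFailureTime:+ 20171206214129.121729Z
reqMod: pwdAccountLockedTime:= 20171206214129Z
reqMod: entryCSN:= 20171206214129.121794Z#000000#001#000000
reqMod: pwdFailureTime:-
"""

divergent = find_divergent({"serverID 1": parse_entries(SID1),
                            "serverID 3": parse_entries(SID3)})
for csn, mods in divergent.items():
    print(csn, {server: len(m) for server, m in mods.items()})
```

In practice the input would come from an ldapsearch dump of cn=accesslog on
each node; the comparison sorts the reqMod values, so it flags differing
content but not differing order.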
Comment 1 OpenLDAP project 2019-04-17 23:18:31 UTC
See ITS#8790 and ITS#8100.
Should be fixed.
Comment 2 Quanah Gibson-Mount 2019-04-17 23:18:31 UTC
changed notes
changed state Open to Closed