Issue 8927 - slapo-ppolicy is destructive to delta-sync replication
Summary: slapo-ppolicy is destructive to delta-sync replication
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: 2.4.46
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Duplicates: 7578
Depends on:
Blocks:
 
Reported: 2018-10-11 20:25 UTC by Quanah Gibson-Mount
Modified: 2020-03-20 17:21 UTC

Description Quanah Gibson-Mount 2018-10-11 20:25:00 UTC
Full_Name: Quanah Gibson-Mount
Version: 2.4.46
OS: N/A
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (47.208.148.239)


While investigating a report of an issue with slapo-ppolicy in an MMR
environment, I've found that ppolicy is destructive in a delta-sync replicated
environment.

The root cause, of course, is that there is no guidance on how ppolicy should
behave under replication, a deficiency that must be addressed before any
final draft is completed.

Reproduction case:

a) Set up a delta-sync replicated environment with slapo-ppolicy enabled and a
default policy of:

pwdAttribute: userPassword
pwdLockout: TRUE
pwdLockoutDuration: 1800
pwdMaxFailure: 100
pwdFailureCountInterval: 300

b) Bind as a user to master1 with an invalid password

c) Perform an LDAPv3 Password Modify operation against master1 as an
administrative user, resetting the password of the user from step b

When the second action is performed (c), all consumers will go into REFRESH
mode:

Oct 11 11:44:37 anvil2 slapd[5791]: syncrepl_null_callback : error code 0x10
Oct 11 11:44:37 anvil2 slapd[5791]: slap_graduate_commit_csn: removing
0x7faf10106000 20181011184437.093014Z#000000#001#000000
Oct 11 11:44:37 anvil2 slapd[5791]: syncrepl_message_to_op: rid=001 be_modify
uid=user1,ou=user,dc=example,dc=com (16)
Oct 11 11:44:37 anvil2 slapd[5791]: do_syncrep2: rid=001 delta-sync lost sync on
(reqStart=20181011184437.000001Z,cn=accesslog), switching to REFRESH


As noted in ITS#8125, going into REFRESH mode can cause data loss.
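
For anyone checking their own consumers, here is a minimal sketch of how the
fallback could be spotted in syslog. It assumes the message shape shown in the
excerpt above and an accesslog suffix of cn=accesslog; the function name
find_refresh_fallbacks is made up for illustration and is not part of any
OpenLDAP tooling.

```python
import re

# Assumption: the message looks like the do_syncrep2 line in the excerpt above.
# \s+ tolerates the line wrap that mail formatting introduced in the log paste.
LOST_SYNC = re.compile(
    r"do_syncrep2:\s+rid=(?P<rid>\d+)\s+delta-sync\s+lost\s+sync\s+on\s+"
    r"\((?P<entry>reqStart=[^,]+),cn=accesslog\),\s+switching\s+to\s+REFRESH"
)


def find_refresh_fallbacks(log_text):
    """Return (rid, accesslog RDN) pairs for every fallback found in log_text."""
    return [(m.group("rid"), m.group("entry")) for m in LOST_SYNC.finditer(log_text)]


sample = (
    "Oct 11 11:44:37 anvil2 slapd[5791]: do_syncrep2: rid=001 delta-sync lost sync on\n"
    "(reqStart=20181011184437.000001Z,cn=accesslog), switching to REFRESH\n"
)
print(find_refresh_fallbacks(sample))
```

The returned reqStart value names the accesslog entry the consumer could not
apply, which is the entry to inspect on the provider.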
Comment 1 Quanah Gibson-Mount 2018-10-11 20:25:14 UTC
changed notes
Comment 2 Quanah Gibson-Mount 2018-10-12 15:14:26 UTC
--On Thursday, October 11, 2018 9:25 PM +0000 quanah@openldap.org wrote:

> When the second action is performed (c), all consumers will go into
> REFRESH mode:

There appears to be a serious bug in ppolicy.  If I look at the accesslog 
data that was written out, the "pwdFailureTime" attribute is cleared on two 
different entries instead of just the user entry that had its password 
reset. I.e., pwdFailureTime is cleared on the user AND the DN of the 
manager entry that made the change.

dn: reqStart=20181012145703.000000Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20181012145703.000000Z
reqEnd: 20181012145703.000001Z
reqType: modify
reqSession: 1003
reqAuthzID: cn=ldaproot,dc=example,dc=com
reqDN: uid=user1,ou=user,dc=example,dc=com
reqResult: 0
reqMod: pwdFailureTime:+ 20181012145703.125562Z
reqMod: entryCSN:= 20181012145703.125803Z#000000#001#000000
reqMod: modifiersName:= cn=ldaproot,dc=example,dc=com
reqMod: modifyTimestamp:= 20181012145703Z
reqEntryUUID: ac657c60-e60a-412d-b015-522fc451e89a
entryUUID: d2b4a16c-627a-1038-9d4c-dbb80effb9f4
creatorsName: cn=accesslog
createTimestamp: 20181012145703Z
entryCSN: 20181012145703.125803Z#000000#001#000000
modifiersName: cn=accesslog
modifyTimestamp: 20181012145703Z

dn: reqStart=20181012145706.000000Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20181012145706.000000Z
reqEnd: 20181012145706.000001Z
reqType: modify
reqSession: 1003
reqAuthzID: cn=ldaproot,dc=example,dc=com
reqDN: cn=idmgmt,ou=user,ou=service,dc=example,dc=com
reqResult: 0
reqMod: pwdFailureTime:-
reqMod: entryCSN:= 20181012145706.147871Z#000000#001#000000
reqMod: modifiersName:= cn=ldaproot,dc=example,dc=com
reqMod: modifyTimestamp:= 20181012145706Z
reqEntryUUID: bf72bf9a-6079-102b-83cd-8572a998cec3
entryUUID: d4822668-627a-1038-9d4d-dbb80effb9f4
creatorsName: cn=accesslog
createTimestamp: 20181012145706Z
entryCSN: 20181012145706.147871Z#000000#001#000000
modifiersName: cn=accesslog
modifyTimestamp: 20181012145706Z

dn: reqStart=20181012145706.000002Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20181012145706.000002Z
reqEnd: 20181012145706.000003Z
reqType: modify
reqSession: 1003
reqAuthzID: cn=idmgmt,ou=user,ou=service,dc=example,dc=com
reqDN: uid=user1,ou=user,dc=example,dc=com
reqResult: 0
reqMod: userPassword:= {SSHA}y8UHEPuMnrOwrZnufP3XrG7ofbHKRpT0
reqMod: pwdChangedTime:= 20181012145706Z
reqMod: pwdFailureTime:-
reqMod: entryCSN:= 20181012145706.171028Z#000000#001#000000
reqMod: modifiersName:= cn=idmgmt,ou=user,ou=service,dc=example,dc=com
reqMod: modifyTimestamp:= 20181012145706Z
reqEntryUUID: ac657c60-e60a-412d-b015-522fc451e89a
entryUUID: d4845d20-627a-1038-9d4e-dbb80effb9f4
creatorsName: cn=accesslog
createTimestamp: 20181012145706Z
entryCSN: 20181012145706.171028Z#000000#001#000000
modifiersName: cn=accesslog
modifyTimestamp: 20181012145706Z

--Quanah

--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>


Comment 3 Quanah Gibson-Mount 2018-10-12 16:27:35 UTC
--On Friday, October 12, 2018 4:14 PM +0000 quanah@symas.com wrote:

> --On Thursday, October 11, 2018 9:25 PM +0000 quanah@openldap.org wrote:
>
>> When the second action is performed (c), all consumers will go into
>> REFRESH mode:
>
> There appears to be a serious bug in ppolicy.  If I look at the accesslog
> data that was written out, the "pwdFailureTime" attribute is cleared on
> two  different entries instead of just the user entry that had its
> password  reset. I.e., pwdFailureTime is cleared on the user AND the DN
> of the  manager entry that made the change.

OK, this was a red herring.  The idmgmt user had a pwdFailureTime attribute
set.  I removed that, and the underlying err=16 problem still occurs, but
the reset on the idmgmt entry no longer does (which is now correct).

The user entry on the other masters has the following set after the 
password failure:

pwdFailureTime: 20181012161037.177876Z

The MOD op recorded for it on the master accepting changes has:

reqMod: userPassword:= {SSHA}He+QPQcFD+1/j9uGZl617/eP50B3/QKj
reqMod: pwdChangedTime:= 20181012161047Z
reqMod: pwdFailureTime:-
reqMod: entryCSN:= 20181012161047.202928Z#000000#001#000000
reqMod: modifiersName:= cn=idmgmt,ou=user,ou=service,dc=example,dc=com
reqMod: modifyTimestamp:= 20181012161047Z

So this should succeed, and yet it fails.  Need to figure out why.
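
To compare what was logged against what a consumer holds, the reqMod values
can be split mechanically. This is a minimal sketch, assuming the
"attribute:<op> [value]" shape seen in the entries above, with +, -, = and #
read as add, delete, replace and increment; parse_reqmod is a hypothetical
helper, not an OpenLDAP API.

```python
# Assumed mapping of the operator character that follows the colon in reqMod.
REQMOD_OPS = {"+": "add", "-": "delete", "=": "replace", "#": "increment"}


def parse_reqmod(value):
    """Split one reqMod value into (attribute, operation, value or None)."""
    attr, sep, rest = value.partition(":")
    if not sep or not rest or rest[0] not in REQMOD_OPS:
        raise ValueError("not a reqMod value: %r" % value)
    raw = rest[1:]
    # A single space separates the operator from the value, when there is one.
    val = raw[1:] if raw.startswith(" ") else raw
    return attr, REQMOD_OPS[rest[0]], (val or None)


for line in (
    "userPassword:= {SSHA}He+QPQcFD+1/j9uGZl617/eP50B3/QKj",
    "pwdChangedTime:= 20181012161047Z",
    "pwdFailureTime:-",
):
    print(parse_reqmod(line))
```

The last line parses as a delete of pwdFailureTime with no values, i.e. a
request to remove the whole attribute.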

--Quanah





Comment 4 Quanah Gibson-Mount 2018-10-12 17:32:40 UTC
--On Friday, October 12, 2018 5:27 PM +0000 quanah@symas.com wrote:

> So this should succeed, and yet it fails.  Need to figure out why.

I dug into this further with Ondrej, and the issue is that ppolicy was 
never updated to work correctly in a delta-sync MMR environment. ppolicy on 
the receiving server currently has logic to test if it is a shadow (i.e., 
replica) and if so, change its behavior.  But there is no similar logic to 
handle the case if the receiving server is an MMR node (i.e., a shadow and 
a master).

The following 3 changes to the code base for ppolicy would alleviate this 
issue and other potential issues:

- test we're a replicated op, not just on shadow
- issue MOD_REPLACE (concurrent binds could have cleared that attribute on 
the other servers)
- expect MOD_REPLACE as well as MOD_DELETE on replicated ops
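
The reasoning behind the second and third points can be shown with a toy
model. This is only a sketch of generic LDAP modify semantics on a dict-based
entry, not the ppolicy code: a value-less delete of an absent attribute
returns noSuchAttribute (16, the 0x10 in the consumer log), while a value-less
replace succeeds whether or not the attribute is present. The apply_mod helper
is hypothetical.

```python
LDAP_SUCCESS = 0
LDAP_NO_SUCH_ATTRIBUTE = 16  # 0x10, the code reported by syncrepl_null_callback


def apply_mod(entry, op, attr):
    """Apply a value-less modification to a toy entry; return an LDAP result code."""
    if op == "delete":
        if attr not in entry:
            return LDAP_NO_SUCH_ATTRIBUTE  # nothing to delete
        del entry[attr]
        return LDAP_SUCCESS
    if op == "replace":
        # Replace with no values: remove the attribute if present, else no-op.
        entry.pop(attr, None)
        return LDAP_SUCCESS
    raise ValueError("unsupported op: %r" % op)


# A consumer where a concurrent bind has already cleared pwdFailureTime.
consumer = {"uid": ["user1"]}
print(apply_mod(dict(consumer), "delete", "pwdFailureTime"))
print(apply_mod(dict(consumer), "replace", "pwdFailureTime"))
```

With the attribute already gone, the delete returns 16 and the replace
returns 0, which is why a replicated MOD_REPLACE is the safer choice here.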


--Quanah



Comment 5 Ondřej Kuzník 2018-10-16 10:30:01 UTC
On Fri, Oct 12, 2018 at 05:32:52PM +0000, quanah@symas.com wrote:
> --On Friday, October 12, 2018 5:27 PM +0000 quanah@symas.com wrote:
> 
> > So this should succeed, and yet it fails.  Need to figure out why.
> 
> I dug into this further with Ondrej, and the issue is that ppolicy was 
> never updated to work correctly in a delta-sync MMR environment. ppolicy on 
> the receiving server currently has logic to test if it is a shadow (i.e., 
> replica) and if so, change its behavior.  But there is no similar logic to 
> handle the case if the receiving server is an MMR node (i.e., a shadow and 
> a master).
> 
> The following 3 changes to the code base for ppolicy would alleviate this 
> issue and other potential issues:
> 
> - test we're a replicated op, not just on shadow

The patch is available here:
https://github.com/mistotebe/openldap/tree/its8927

> - issue MOD_REPLACE (concurrent binds could have cleared that attribute on 
> the other servers)
> - expect MOD_REPLACE as well as MOD_DELETE on replicated ops

Maybe not exactly required since ppolicy will actually treat these
deletes as soft deletes if needed.

-- 
Ondřej Kuzník
Senior Software Engineer
Symas Corporation                       http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP

Comment 6 Quanah Gibson-Mount 2018-10-16 17:59:27 UTC
--On Tuesday, October 16, 2018 1:30 PM +0200 Ondřej Kuzník 
<ondra@mistotebe.net> wrote:

>> - test we're a replicated op, not just on shadow
>
> The patch is available here:
> https://github.com/mistotebe/openldap/tree/its8927

I still get err 16 with this patch and the servers fall back to REFRESH.

--Quanah




Comment 7 Quanah Gibson-Mount 2018-11-01 18:14:32 UTC
changed notes
changed state Open to Release
moved from Incoming to Software Bugs
Comment 8 Quanah Gibson-Mount 2018-11-07 21:15:37 UTC
--On Tuesday, October 16, 2018 6:59 PM +0000 quanah@symas.com wrote:

> --On Tuesday, October 16, 2018 1:30 PM +0200 Ondřej Kuzník
> <ondra@mistotebe.net> wrote:
>
>>> - test we're a replicated op, not just on shadow
>>
>> The patch is available here:
>> https://github.com/mistotebe/openldap/tree/its8927
>
> I still get err 16 with this patch and the servers fall back to REFRESH.

Note: Finally fixed with 04a52cef40560b9edec8037b23e444c460fe0d40 in master

--Quanah



Comment 9 OpenLDAP project 2018-12-19 17:22:32 UTC
Fixed in master
Fixed in RE24 (2.4.47)
See also ITS#8125
Comment 10 Quanah Gibson-Mount 2018-12-19 17:22:32 UTC
changed notes
changed state Release to Closed
Comment 11 Quanah Gibson-Mount 2020-03-20 17:21:55 UTC
*** Issue 7578 has been marked as a duplicate of this issue. ***