Issue 8185 - Clarification/enhancement request: purging stale pwdFailureTime attributes
Summary: Clarification/enhancement request: purging stale pwdFailureTime attributes
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: 2.4.40
Hardware: All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-02 19:59 UTC by subbarao@computer.org
Modified: 2015-11-30 18:22 UTC (History)
0 users

See Also:


Attachments
pwdfailuretime.pl.txt (1.39 KB, text/plain)
2015-07-06 16:30 UTC, subbarao@computer.org
Details

Description subbarao@computer.org 2015-07-02 19:59:39 UTC
Full_Name: Kartik Subbarao
Version: 2.4.40
OS: Linux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (173.75.228.155)


Reading the slapo-ppolicy man page, I was optimistically expecting that excess
stale pwdFailureTime values might be removed from the entry after pwdMaxFailure
was exceeded. For example, if pwdMaxFailure is 5, then only the most recent 5
pwdFailureTime values would be kept, and the old ones purged as and when new
failed bind attempts were made.

This wording in the slapo-ppolicy man page sounds friendly towards this
interpretation: "Excess timestamps beyond those allowed by pwdMaxFailure may
also be purged."

Looking at the source code though, it doesn't seem that pwdFailureTime values
are actually removed unless a successful bind occurs -- whereupon all values of
course are removed.

I would like to request an enhancement to purge stale pwdFailureTime values as
mentioned above. This would also largely mitigate the issue raised in ITS#7089
without needing to develop more involved code for that. The common theme is to
ensure that pwdFailureTime values can't keep accumulating without bound, due to
broken/misconfigured clients that are beyond the LDAP server administrator's
control.

Comment 1 Michael Ströder 2015-07-02 22:08:39 UTC
subbarao@computer.org wrote:
> Reading the slapo-ppolicy man page, I was optimistically expecting that excess
> stale pwdFailureTime values might be removed from the entry after pwdMaxFailure
> was exceeded. For example, if pwdMaxFailure is 5, then only the most recent 5
> pwdFailureTime values would be kept, and the old ones purged as and when new
> failed bind attempts were made.
> 
> This wording in the slapo-ppolicy man page sounds friendly towards this
> interpretation: "Excess timestamps beyond those allowed by pwdMaxFailure may
> also be purged."
> 
> Looking at the source code though, it doesn't seem that pwdFailureTime values
> are actually removed unless a successful bind occurs -- whereupon all values of
> course are removed.
> 
> I would like to request an enhancement to purge stale pwdFailureTime values as
> mentioned above.

Nope. The number of pwdFailureTime values is also used as the failure lockout
counter. It was actually improved with ITS#7161.

> This would also largely mitigate the issue raised in ITS#7089

I don't see the relation with ITS#7089.

Ciao, Michael.

Comment 2 subbarao@computer.org 2015-07-06 16:12:04 UTC
Hi Michael,

I'm having a bit of difficulty understanding your response, and it looks 
like my initial message was perhaps equally unclear to you :-) Let me 
try to clarify -- please let me know if this still doesn't make sense.

You mention that "pwdFailureTime is also used as a failure lockout 
counter". I don't see how that conflicts with what I am requesting. I'm 
only asking for /excess/ pwdFailureTime values that are above the 
pwdMaxFailure count to be purged. For example, if pwdMaxFailure is 3, 
and pwdFailureTime has the following values:

pwdFailureTime: 20150702184821Z
pwdFailureTime: 20150702185821Z
pwdFailureTime: 20150702190822Z
pwdFailureTime: 20150702191007Z
pwdFailureTime: 20150702191012Z

What I'm requesting is that the /oldest/ two values be deleted from this 
set:

pwdFailureTime: 20150702184821Z
pwdFailureTime: 20150702185821Z

(To be more precise, I'll suggest that when ppolicy_bind_response() 
processes the BIND failure that triggers the addition of 20150702191012Z 
to pwdFailureTime, that's when it could delete the two oldest values. It 
already loops through the entire set of pwdFailureTime values, so adding 
a check to delete older ones above the pwdMaxFailure count could be done 
in that same loop).
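The pruning rule I have in mind can be sketched in a few lines (an
illustrative Python sketch, not the actual slapd C code -- the function
name and shape are mine):

```python
def excess_failure_times(values, pwd_max_failure):
    """Return the pwdFailureTime values that could be purged: the
    oldest ones beyond the pwd_max_failure most recent.

    LDAP generalized-time strings (e.g. "20150702184821Z") sort
    chronologically as plain strings, so no date parsing is needed.
    """
    if pwd_max_failure < 1 or len(values) <= pwd_max_failure:
        return []
    return sorted(values)[: len(values) - pwd_max_failure]


failures = [
    "20150702184821Z",
    "20150702185821Z",
    "20150702190822Z",
    "20150702191007Z",
    "20150702191012Z",
]
# With pwdMaxFailure = 3, the two oldest values are purge candidates.
print(excess_failure_times(failures, 3))
# -> ['20150702184821Z', '20150702185821Z']
```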

I'm not seeing how this would conflict with the password policy 
specification -- am I missing something?

In the particular situation that's prompting this request, it's not just 
two or three values -- for one entry it was over 38000 values that had 
accumulated over time! (and generally high values for many other entries).

ITS#7161 doesn't address this issue -- it adds more precision to the 
timestamp values, but it doesn't purge excess stale values.

Here's how this issue relates to ITS#7089. In ITS#7089, the requester 
was seeing failed bind attempts to entries that didn't have a password 
defined. As a result, pwdFailureTime values were consistently being 
added to these entries. The common theme is that there is no built-in 
way (to my knowledge) in OpenLDAP to protect against pwdFailureTime 
values continually being added to entries indefinitely.

This enhancement would mitigate that problem by putting a cap on the 
number of pwdFailureTime attributes that could ever accumulate on an 
entry -- the pwdMaxFailure count. Just like administrators have control 
over expiring old log files, they would get the ability to ensure that 
pwdFailureTime values couldn't accumulate indefinitely.

Please let me know what you think.

Thanks,

     -Kartik



Comment 3 Quanah Gibson-Mount 2015-07-06 16:18:38 UTC
--On Monday, July 06, 2015 5:12 PM +0000 subbarao@computer.org wrote:

> Hi Michael,
>
> I'm having a bit of difficulty understanding your response, and it looks
> like my initial message was perhaps equally as unclear to you :-) Let me
> try to clarify, please let me know if this still doesn't make sense.
>
> You mention that "pwdFailureTime is also used as a failure lockout
> counter". I don't see how that conflicts with what I am requesting. I'm
> only asking for /excess/ pwdFailureTime values that are above the
> pwdMaxFailure count to be purged. For example, if pwdMaxFailure is 3,
> and pwdFailureTime has the following values:
>
> pwdFailureTime: 20150702184821Z
> pwdFailureTime: 20150702185821Z
> pwdFailureTime: 20150702190822Z
> pwdFailureTime: 20150702191007Z
> pwdFailureTime: 20150702191012Z

I would note that:

IF using delta-syncrepl
AND the data values are replicated
AND authentication attempts can occur against different LDAP masters

You can run into *serious* drift between servers if you try to implement 
this, triggering endless refresh-mode runs that push the servers further 
out of sync.  See 
<http://www.openldap.org/its/index.cgi/?findid=8125>.

More specifically:

If a client (most often a mobile device) has a bad password, and its 
authentication attempts are bouncing between masters, then even with 
high-resolution timestamps you can get collisions in the delete op for 
old values that cannot be reconciled, causing fallback/refresh.


--Quanah

--

Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration

Comment 4 Michael Ströder 2015-07-06 16:25:00 UTC
Kartik Subbarao wrote:
> You mention that "pwdFailureTime is also used as a failure lockout counter". I
> don't see how that conflicts with what I am requesting. I'm only asking for
> /excess/ pwdFailureTime values that are above the pwdMaxFailure count to be
> purged.

Oh, I see. Indeed I did not fully understand your original message.

> For example, if pwdMaxFailure is 3, and pwdFailureTime has the
> following values:
> 
> pwdFailureTime: 20150702184821Z
> pwdFailureTime: 20150702185821Z
> pwdFailureTime: 20150702190822Z
> pwdFailureTime: 20150702191007Z
> pwdFailureTime: 20150702191012Z
> 
> What I'm requesting is that the /oldest/ two values be deleted from this set:

Hmm, still have some doubts: If you want to raise the failure count limit
later you would automatically unlock some accounts you don't want to unlock at
this particular point in time.

Ciao, Michael.

Comment 5 subbarao@computer.org 2015-07-06 16:30:03 UTC
FYI for anyone else who is encountering this problem -- here is a script 
that I wrote as a workaround. It sweeps through all entries in the 
directory that have pwdFailureTime values and deletes stale values beyond 
$maxvalues. Also set $basedn accordingly.

It can be run with '--ldif' to preview the changes, and '--ldap' to 
actually make the changes.

The script binds with SASL EXTERNAL on the ldapi:/// interface, so make 
sure that the Unix user has the 'manage' privilege for the 
pwdFailureTime attribute. For example, to enable this for root:

access to attrs=pwdFailureTime
    by dn.base="gidnumber=0+uidnumber=0,cn=peercred,cn=external,cn=auth" manage
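For anyone adapting the idea to another language, the core of such a 
sweep -- computing the stale values for one entry and emitting an LDIF 
modify that deletes them -- looks roughly like this (a Python sketch 
under my own naming, not the attached Perl script; the DN below is 
hypothetical):

```python
def purge_modify_ldif(dn, values, maxvalues):
    """Build an LDIF 'changetype: modify' record that deletes all
    pwdFailureTime values except the maxvalues most recent.
    Returns "" when the entry has nothing to purge."""
    stale = sorted(values)[: max(0, len(values) - maxvalues)]
    if not stale:
        return ""
    lines = [f"dn: {dn}", "changetype: modify", "delete: pwdFailureTime"]
    lines += [f"pwdFailureTime: {v}" for v in stale]
    lines.append("-")
    return "\n".join(lines) + "\n"


ldif = purge_modify_ldif(
    "uid=app1,ou=apps,dc=example,dc=com",   # hypothetical DN
    ["20150702184821Z", "20150702185821Z", "20150702190822Z"],
    2,
)
print(ldif)
```

The resulting LDIF can be previewed first, then fed to ldapmodify once 
it looks right.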

Regards,

     -Kartik
Comment 6 subbarao@computer.org 2015-07-06 16:56:37 UTC
On 07/06/2015 12:25 PM, Michael Ströder wrote:
> Hmm, still have some doubts: If you want to raise the failure count limit
> later you would automatically unlock some accounts you don't want to unlock at this particular point in time.

Two thoughts on this:

1) If you raise the failure count limit, aren't you inherently making a 
decision to be more lenient in your policy, and thereby accepting that 
some accounts are not going to be locked out as fast as they might be 
under the previous policy? It seems to me that any "inadvertent" 
unlocking due to purged pwdFailureTime values could be embraced under 
this general umbrella of leniency.

2) If pwdFailureCountInterval is set to some reasonably low number, then 
this whole concern becomes moot: Just wait for pwdFailureCountInterval 
seconds after you decide to change the configuration, before actually 
changing the configuration :-)

I guess I haven't come across many sites that set pwdMaxFailure, but 
/don't/ also set pwdFailureCountInterval. But even in those cases, I 
think #1 would be valid :-)

Regards,

     -Kartik
Comment 7 subbarao@computer.org 2015-07-06 17:12:04 UTC
Thanks for the heads-up Quanah. Looks like you've found a serious 
problem with multi-master replication, good to know about. In my case, 
we're just using single-master replication, so we're able to dodge the 
problem you describe for the time being.

Just to clarify though -- once ITS#8125 is resolved, this enhancement 
shouldn't pose any additional problems for MMR sites, right?

Thanks,

     -Kartik


Comment 8 Michael Ströder 2015-07-06 17:30:32 UTC
Kartik Subbarao wrote:
> On 07/06/2015 12:25 PM, Michael Ströder wrote:
>> Hmm, still have some doubts: If you want to raise the failure count limit
>> later you would automatically unlock some accounts you don't want to unlock
>> at this particular point in time.
> 
> Two thoughts on this:
> 
> 1) If you raise the failure count limit, aren't you inherently making a
> decision to be more lenient in your policy, and thereby accepting that some
> accounts are not going to be locked out as fast as they might be under the
> previous policy?

Yes, there could be a situation where you want to deliberately relax the
lockout policy after carefully considering various security aspects of your
particular deployment.

> It seems to me that any "inadvertent" unlocking due to purged
> pwdFailureTime values could be embraced under this general umbrella of leniency.

No! You set a new lockout limit, and most people would expect it to affect
only user accounts that have not yet reached the new limit.

> 2) If pwdFailureCountInterval is set to some reasonably low number, then this
> whole concern becomes moot: Just wait for pwdFailureCountInterval seconds
> after you decide to change the configuration, before actually changing the
> configuration :-)

Consider that you are under ongoing attack with many different accounts
affected by the lockout threshold. Then you cannot simply wait for
pwdFailureCountInterval seconds, because your system is changing all the time.

Such a situation is a real world scenario.

=> @OpenLDAP developers: leave as is!

Ciao, Michael.

P.S.: I'm not a big fan of password lockout anyway, because it's too often
misconfigured due to brain-dead company security policies.


Comment 9 subbarao@computer.org 2015-07-06 17:49:44 UTC
On 07/06/2015 01:30 PM, Michael Ströder wrote:
> Consider that you are under ongoing attack with many different 
> accounts affected by the lockout threshold. Then you cannot simply wait 
> for pwdFailureCountInterval seconds because your system is changing 
> all the time.
>
> Such a situation is a real world scenario.

Ok -- I'm probably not understanding enough about your particular 
scenario to fully appreciate the concerns that you express. But I think 
there could be ways to address them in this enhancement -- for instance, 
by adding optional parameter(s) like ppolicy_purge_failures <nfailures> 
and/or ppolicy_purge_olderthan <timestamp>, which could then be 
configured to accommodate the scenario you describe.

At this point, I think I'll leave it up to the OpenLDAP developers as 
to how they want to proceed on this, and/or to ask for more information.

Thanks for the discussion Michael.

Regards,

     -Kartik

Comment 10 Howard Chu 2015-08-14 14:25:09 UTC
subbarao@computer.org wrote:
> On 07/06/2015 01:30 PM, Michael Ströder wrote:
>> Consider that you are under ongoing attack with many different
>> accounts affected by the lockout threshold. Then you cannot simply wait
>> for pwdFailureCountInterval seconds because your system is changing
>> all the time.
>>
>> Such a situation is a real world scenario.
>
> Ok -- I'm probably not understanding enough about your particular
> scenario to fully appreciate the concerns that you express. But I think
> there could be ways to address them in this enhancement -- for instance,
> by adding optional parameter(s) like ppolicy_purge_failures <nfailures>
> and/or ppolicy_purge_olderthan <timestamp>, which could then be
> configured to accommodate the scenario you describe.
>
> At this point, I think I'll leave it up to the OpenLDAP developers as
> to how they want to proceed on this, and/or to ask for more information.

I've added a pwdMaxRecordedFailure attribute to the policy schema. Overloading 
pwdMaxFailure would be a mistake.

MaxRecordedFailure will default to MaxFailure if that is set. It defaults to 5 
if nothing is set. There's no good reason to allow the timestamps to 
accumulate without bound.

This is now available for testing in git master.
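For those testing, a policy entry using the new attribute might look like
this (the DN, structural classes, and values are illustrative, not taken
from the patch):

```ldif
dn: cn=default,ou=policies,dc=example,dc=com
objectClass: pwdPolicy
objectClass: person
cn: default
sn: default
pwdAttribute: userPassword
pwdMaxFailure: 5
pwdLockout: TRUE
# New in git master (this ITS): cap on recorded failure timestamps.
# Defaults to pwdMaxFailure when that is set, else to 5.
pwdMaxRecordedFailure: 5
```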

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 11 Howard Chu 2015-08-14 14:25:55 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Enhancements
Comment 12 Howard Chu 2015-08-14 14:38:37 UTC
subbarao@computer.org wrote:
> In the particular situation that's prompting this request, it's not just
> two or three values -- for one entry it was over 38000 values that had
> accumulated over time! (and generally high values for many other entries).

If you have entries with tens of thousands of Bind failures being recorded, 
you have a security monitoring problem. The limit applied by the patch for 
this ITS will only mask the problem. The fact that your security auditors 
haven't already noticed these tens of thousands of Bind failures and stopped 
them at their source means you've got a major vulnerability in your network 
security.


Comment 13 subbarao@computer.org 2015-08-14 19:43:01 UTC
On 08/14/2015 10:38 AM, Howard Chu wrote:
> subbarao@computer.org wrote:
>> In the particular situation that's prompting this request, it's not just
>> two or three values -- for one entry it was over 38000 values that had
>> accumulated over time! (and generally high values for many other 
>> entries).
>
> If you have entries with tens of thousands of Bind failures being 
> recorded, you have a security monitoring problem. The limit applied by 
> the patch for this ITS will only mask the problem. The fact that your 
> security auditors haven't already noticed these tens of thousands of 
> Bind failures and stopped them at their source means you've got a 
> major vulnerability in your network security.

Hi Howard, what's happening here is that an application account's 
password has expired, and its owners have neglected to change the 
password. Meanwhile, the old password remains hardcoded in the 
application, which continues to issue BIND requests that fail. The 
functionality of this particular application isn't mission-critical -- 
it may even be deprecated at this point by another application, so it's 
likely not a priority for the account owners given everything else on 
their plates.

In this customer environment, the monitoring of password failures has 
historically been done outside of LDAP, and focuses on user accounts 
which tend to have higher privileges. LDAP application accounts (which 
tend to have minimal privileges) aren't treated the same. These bind 
failures may get cleaned up at some point during a periodic review of 
application accounts with expired passwords, but it's not likely to 
happen soon, given the lower level of risk/impact.

Given that the bind failures themselves aren't causing a problem (the 
system as a whole has ample capacity), the only operations-impacting 
issue is the continually increasing entry size. So for this customer 
environment, I feel the best approach for now is to simply purge the 
stale failure timestamps (which I would prefer to do with a standard 
OpenLDAP configuration setting than with an external script).

As I mentioned in an earlier message, I see the slapo-ppolicy man page 
as being friendly to this feature request: "Excess timestamps beyond 
those allowed by pwdMaxFailure may also be purged." From my perspective, 
it's a valuable hygienic feature for environments such as this one.

Regards,

     -Kartik

Comment 14 subbarao@computer.org 2015-08-20 15:05:17 UTC
On 08/14/2015 10:25 AM, Howard Chu wrote:
> I've added a pwdMaxRecordedFailure attribute to the policy schema. 
> Overloading pwdMaxFailure would be a mistake.
>
> MaxRecordedFailure will default to MaxFailure if that is set. It 
> defaults to 5 if nothing is set. There's no good reason to allow the 
> timestamps to accumulate without bound.
>
> This is now available for testing in git master.

Howard, I just saw this message from you today, when I happened to be 
looking through my gmail spam folder -- no idea why it ended up there! 
On Friday, I only saw your subsequent message and responded to it 
without knowing that you had already implemented this enhancement. So I 
didn't fully understand the context in which you had written that message.

Thanks very much for implementing this enhancement! I will check out the 
code.

Regards,

     -Kartik

Comment 15 subbarao@computer.org 2015-08-20 17:52:17 UTC
On 08/14/2015 10:25 AM, Howard Chu wrote:
> I've added a pwdMaxRecordedFailure attribute to the policy schema. 
> Overloading pwdMaxFailure would be a mistake.
>
> MaxRecordedFailure will default to MaxFailure if that is set. It 
> defaults to 5 if nothing is set. There's no good reason to allow the 
> timestamps to accumulate without bound.
>
> This is now available for testing in git master.

Tested on Ubuntu 14.04, works great. Thanks again Howard!

     -Kartik

Comment 16 Quanah Gibson-Mount 2015-08-21 21:39:13 UTC
changed notes
changed state Test to Release
Comment 17 OpenLDAP project 2015-11-30 18:22:10 UTC
added to master
added to RE25
added to RE24 (2.4.43)
Comment 18 Quanah Gibson-Mount 2015-11-30 18:22:10 UTC
changed notes
changed state Release to Closed