OpenLDAP
Up to top level
Build   Contrib   Development   Documentation   Historical   Incoming   Software Bugs   Software Enhancements   Web  

Logged in as guest

Viewing Software Bugs/6831
Full headers

From: ryans@aweber.com
Subject: Proxycache database corruption
Compose comment
Download message
State:
0 replies:
6 followups: 1 2 3 4 5 6

Major security issue: yes  no

Notes:

Notification:


Date: Tue, 15 Feb 2011 16:58:21 +0000
From: ryans@aweber.com
To: openldap-its@OpenLDAP.org
Subject: Proxycache database corruption
Full_Name: Ryan Steele
Version: 2.4.23
OS: Ubuntu Server
URL: ftp://ftp.openldap.org/incoming/ryan-steele-110215.proxycache-failure.log
Submission from: (NULL) (207.106.239.81)


I use back-ldap + proxycache on many of my servers to reduce network traffic and
to alleviate load on the masters, as well as to maintain service continuity in
the event of a network failure.  However, we have recently been noticing an
issue where the proxycache database claims that it has the data and that the
query is answerable, but fails to read data from any of the indices it thinks
the data is at.  It happens randomly, and to random entries.  We do not cache
negative search results, so the cache should never return nentries=0
authoritatively.  I can temporarily fix it for some broken users by restarting
slapd and clearing the cache (i.e., pcachePersist is set to FALSE), but
inevitably others stop working (or sometimes, the same users stop working). 
When this happens, most entries are still served from the cache just fine, but
the entries that aren't never will unless slapd is restarted.  I have tested
this with 2.4.17, 2.4.21, and 2.4.23, using the amd64 architecture, and with
libdb4.6 and libdb4.7.

Included below is the slapd.conf I use on my back-ldap + proxycache nodes, an
example of the behavior using ldapsearch, and the log messages during a failed
search in which using log level 16383:

## Proxycache slapd configuration

# Schema
include               /etc/ldap/schema/core.schema
include               /etc/ldap/schema/collective.schema
include               /etc/ldap/schema/corba.schema
include               /etc/ldap/schema/cosine.schema
include               /etc/ldap/schema/duaconf.schema
include               /etc/ldap/schema/dyngroup.schema
include               /etc/ldap/schema/inetorgperson.schema
include               /etc/ldap/schema/java.schema
include               /etc/ldap/schema/misc.schema
include               /etc/ldap/schema/openldap.schema
include               /etc/ldap/schema/ppolicy.schema
include               /etc/ldap/schema/examplecom.schema
include               /etc/ldap/schema/rfc2307bis.schema
include               /etc/ldap/schema/samba.schema
include               /etc/ldap/schema/apple_auxiliary.schema
include               /etc/ldap/schema/apple.schema

# System
pidfile               /var/run/slapd/slapd.pid
argsfile              /var/run/slapd/slapd.args
loglevel              stats
TLSCACertificateFile  /etc/ldap/ssl/certs/cacert.pem
TLSCertificateFile    /etc/ldap/ssl/certs/openldap.cert.pem
TLSCertificateKeyFile /etc/ldap/ssl/keys/openldap.key.pem
TLSVerifyClient       never

# Modules
modulepath            /usr/lib/ldap
moduleload            back_ldap.la
moduleload            back_hdb.la
moduleload            pcache.la

# Back-LDAP
database              ldap
uri                   "ldap://ldapmaster.example.com"
suffix                "dc=example,dc=com"
rootdn                "cn=admin,dc=example,dc=com"
rootpw                SECRET
tls                   start

# ACLs
access to attrs=userPassword
   by tls_ssf=128 ssf=128 self write
   by tls_ssf=128 ssf=128 anonymous auth
   by tls_ssf=128 ssf=128
group/groupOfURLs/Member="cn=ops,ou=Groups,dc=example,dc=com" write
   by tls_ssf=128 ssf=128 * compare
access to *
   by tls_ssf=128 ssf=128 self write
   by tls_ssf=128 ssf=128
group/groupOfURLs/Member="cn=ops,ou=Groups,dc=example,dc=com" write
   by tls_ssf=128 ssf=128 * read

# ProxyCache
overlay               pcache
proxycache            hdb 500000 1 5000 86400
directory             /var/lib/ldap/proxycache

index                 cn                                                eq
index                 departmentName                                    eq
index                 entryCSN                                          eq
index                 entryUUID                                         eq
index                 gidNumber                                         eq
index                 mail                                              eq
index                 member                                            eq
index                 memberUid                                         eq
index                 objectClass                                       eq
index                 pcacheQueryid                                     eq
index                 uid                                               eq
index                 uidNumber                                         eq
index                 uniqueMember                                      eq

proxycachequeries     1000000
proxyattrset          0 apple-user-homeDirectory blogCategory cn dateCreated
departmentName departmentNumber description
displayColor employeeNumber gecos getsPages gidNumber givenName homeDirectory
htaccessPasswd isAvailable isPhoneOperator
lastAdminVisit loginShell mail manager member memberUid mobile mobileEmail

Message of length 10404 truncated

Followup 1

Download message
Date: Tue, 15 Feb 2011 15:40:11 -0500
From: Ryan Steele <ryans@aweber.com>
To: openldap-its@OpenLDAP.org
CC: ryans@aweber.com
Subject: Re: (ITS#6831) Proxycache database corruption
This may also be related to ITS#6217
(http://www.openldap.org/its/index.cgi?findid=6217).  In fact, I'm pretty sure
it
is.  In response to the last question in that ITS posed by Howard Chu (Followup
18), the schema on the proxy servers
match the schema available on the primary servers exactly.  If not, it wouldn't
work for some users and not others
simultaneously (that is, users who have the same objectClasses and attributes,
and differ only in the user-specific
attribute values like uid, gecos, phone extension, etc.).  Hopefully the data
and logs I was able to provide (and any
debugger output/cache databases that might be requested of me as the ITS moves
forward) will help to fill in some of the
blanks.



Followup 2

Download message
Date: Thu, 17 Feb 2011 10:38:34 -0500
From: Ryan Steele <ryans@aweber.com>
To: openldap-its@OpenLDAP.org
CC: ryans@aweber.com
Subject: Re: (ITS#6831) Proxycache database corruption
I believe this may be the same issue as in #6217, given this behavior with two
users which have the same objectclasses,
same attributes, and differ only in the attribute values (e.g., their name,
phone number, and so on):


The initial query, where one entry doesn't get returned as it should:
=====================================================================
bash:~# ldapsearch -x -H ldaps://localhost -LLL -b ou=Users,dc=example,dc=com
'(&(|(objectClass=examplecomEmployee)(objectClass=examplecomUtilityUser))(uid=jdoe1))'
uid
dn: uid=jdoe1,ou=Users,dc=example,dc=com
uid: jdoe1

bash:~# ldapsearch -x -H ldaps://localhost -LLL -b ou=Users,dc=example,dc=com
'(&(|(objectClass=examplecomEmployee)(objectClass=examplecomUtilityUser))(uid=jdoe2))'
uid


If I reverse the filter, I get the expected results:
=====================================================================
bash:~# ldapsearch -x -H ldaps://localhost -LLL -b ou=Users,dc=example,dc=com
'(&(|(objectClass=examplecomUtilityUser)(objectClass=examplecomEmployee))(uid=jdoe1))'
uid
dn: uid=jdoe1,ou=Users,dc=example,dc=com
uid: jdoe1

bash:~# ldapsearch -x -H ldaps://localhost -LLL -b ou=Users,dc=example,dc=com
'(&(|(objectClass=examplecomUtilityUser)(objectClass=examplecomEmployee))(uid=jdoe2))'
uid
dn: uid=jdoe2,ou=Users,dc=example,dc=com
uid: jdoe2


Furthermore, if I now try the first query again, it too works:
=====================================================================
bash:~# ldapsearch -x -H ldaps://localhost -LLL -b ou=Users,dc=example,dc=com
'(&(|(objectClass=examplecomEmployee)(objectClass=examplecomUtilityUser))(uid=jdoe1))'
uid
dn: uid=jdoe1,ou=Users,dc=example,dc=com
uid: jdoe1

bash:~# ldapsearch -x -H ldaps://localhost -LLL -b ou=Users,dc=example,dc=com
'(&(|(objectClass=examplecomEmployee)(objectClass=examplecomUtilityUser))(uid=jdoe2))'
uid
dn: uid=jdoe2,ou=Users,dc=example,dc=com
uid: jdoe2


I already have the patch for ITS#5756 applied in all the versions I've tested
(2.4.17, 2.4.21,
and 2.4.23).  I can provide a sample database which exhibits this behavior if
needed, as I am working on providing a
self-contained sanitized example already.  If the developers agree that the
issue described in these two ITS's are one
and the same, I'm fine with marking this bug as a duplicate of ITS#6217 to avoid
fragmentation of debugging efforts.



Followup 3

Download message
Date: Mon, 21 Feb 2011 11:31:37 -0500
From: Ryan Steele <ryans@aweber.com>
To: openldap-its@OpenLDAP.org
CC: ryans@aweber.com
Subject: Re: (ITS#6831) Proxycache database corruption
I just uploaded two log file snippets for consideration.  One is a failed query
to the proxycache database exhibiting
this problem, and the other is a successful query to the proxycache database.  A
gzipp'ed tarball can be found here:
ftp://ftp.openldap.org/incoming/ryan-steele-110221.proxycache-logs-for-comparison.tar.gz.
 Up until the point where the
unsuccessful query fails to read the key for the UID equality candidate, both
the failed query and the successful query
are locking the same exact QC index (0x7f50429860f0), trying to fetch the exact
same keys ([b49d1940], [807e234d],
[8f951c4a]), and comparing the same equality/filter/list candidates ("id=9,
first=2699, last=2731").  This is what I
would expect, since the values for the two objectClasses in the search filter
are the same.

However, once it gets to the point where it compares the bdb_equality_candidates
for the UID, the values of the
bdb_filter_candidates, bdb_equality_candidates, and bdb_list_candidates in logs
for the unsuccessful query are a little
confusing.   I gather that a value of "0" for id/first/last is normal when it
cannot find the index in question (as it
does for the UID value that it said was "ANSWERABLE" but was unable to find),
but the "first=2" and "first=2699" values
seem a little strange, especially since the "first=2699" value supposedly
matches the bdb_equality_candidates value
found at one of the indices for the objectClass searches.  I have no idea where
the value "2" comes from, but it too
seems out of place if it couldn't find the index.  Perhaps those values make
more sense in the context where it thought
it *could* find the values, but then wasn't able to (though I admit that sounds
like a pretty big stretch).

## UID portion of unsuccessful query

Feb 21 10:02:00 blog2 slapd[18797]: bdb_idl_fetch_key: [8f951c4a]
Feb 21 10:02:00 blog2 slapd[18797]: <= bdb_index_read 9 candidates
Feb 21 10:02:00 blog2 slapd[18797]: <= bdb_equality_candidates: id=9,
first=2699, last=2731
Feb 21 10:02:00 blog2 slapd[18797]: <= bdb_filter_candidates: id=9 first=2699
last=2731
Feb 21 10:02:00 blog2 slapd[18797]: <= bdb_list_candidates: id=9 first=2699
last=2731
Feb 21 10:02:00 blog2 slapd[18797]: <= bdb_filter_candidates: id=9 first=2699
last=2731
Feb 21 10:02:00 blog2 slapd[18797]: => bdb_filter_candidates
Feb 21 10:02:00 blog2 slapd[18797]:     EQUALITY
Feb 21 10:02:00 blog2 slapd[18797]: => bdb_equality_candidates (uid)
Feb 21 10:02:00 blog2 slapd[18797]: => key_read
Feb 21 10:02:00 blog2 slapd[18797]: bdb_idl_fetch_key: [efb8b856]
Feb 21 10:02:00 blog2 slapd[18797]: <= bdb_equality_candidates: id=0,
first=0, last=0

Feb 21 10:02:00 blog2 slapd[18797]: <= bdb_filter_candidates: id=0 first=0
last=0

Feb 21 10:02:00 blog2 slapd[18797]: <= bdb_list_candidates: id=0 first=2699
last=0

Feb 21 10:02:00 blog2 slapd[18797]: <= bdb_filter_candidates: id=0 first=2699
last=0

Feb 21 10:02:00 blog2 slapd[18797]: <= bdb_list_candidates: id=0 first=0
last=0

Feb 21 10:02:00 blog2 slapd[18797]: <= bdb_filter_candidates: id=0 first=0
last=0

Feb 21 10:02:00 blog2 slapd[18797]: <= bdb_list_candidates: id=0 first=2
last=0

Feb 21 10:02:00 blog2 slapd[18797]: <= bdb_filter_candidates: id=0 first=2
last=0

Feb 21 10:02:00 blog2 slapd[18797]: bdb_search_candidates: id=0 first=2 last=0

Feb 21 10:02:00 blog2 slapd[18797]: hdb_search: no candidates


## UID portion of successful query

Feb 21 10:06:22 blog2 slapd[18797]: bdb_idl_fetch_key: [8f951c4a]
Feb 21 10:06:22 blog2 slapd[18797]: <= bdb_index_read 9 candidates
Feb 21 10:06:22 blog2 slapd[18797]: <= bdb_equality_candidates: id=9,
first=2699, last=2731
Feb 21 10:06:22 blog2 slapd[18797]: <= bdb_filter_candidates: id=9 first=2699
last=2731
Feb 21 10:06:22 blog2 slapd[18797]: <= bdb_list_candidates: id=9 first=2699
last=2731
Feb 21 10:06:22 blog2 slapd[18797]: <= bdb_filter_candidates: id=9 first=2699
last=2731
Feb 21 10:06:22 blog2 slapd[18797]: => bdb_filter_candidates
Feb 21 10:06:22 blog2 slapd[18797]:     EQUALITY
Feb 21 10:06:22 blog2 slapd[18797]: => bdb_equality_candidates (uid)
Feb 21 10:06:22 blog2 slapd[18797]: => key_read
Feb 21 10:06:22 blog2 slapd[18797]: bdb_idl_fetch_key: [ea38ed84]
Feb 21 10:06:22 blog2 slapd[18797]: <= bdb_index_read 1 candidates
Feb 21 10:06:22 blog2 slapd[18797]: <= bdb_equality_candidates: id=1,
first=2701, last=2701

Feb 21 10:06:22 blog2 slapd[18797]: <= bdb_filter_candidates: id=1 first=2701
last=2701

Feb 21 10:06:22 blog2 slapd[18797]: <= bdb_list_candidates: id=1 first=2701
last=2701

Feb 21 10:06:22 blog2 slapd[18797]: <= bdb_filter_candidates: id=1 first=2701
last=2701

Feb 21 10:06:22 blog2 slapd[18797]: <= bdb_list_candidates: id=1 first=2701
last=2701

Feb 21 10:06:22 blog2 slapd[18797]: <= bdb_filter_candidates: id=1 first=2701
last=2701

Feb 21 10:06:22 blog2 slapd[18797]: <= bdb_list_candidates: id=1 first=2701
last=2701

Feb 21 10:06:22 blog2 slapd[18797]: &

Message of length 5464 truncated


Followup 4

Download message
Date: Mon, 21 Feb 2011 11:53:58 -0500
From: Ryan Steele <ryans@aweber.com>
To: openldap-its@OpenLDAP.org
CC: ryans@aweber.com
Subject: Re: (ITS#6831) Proxycache database corruption
Additionally, today I found myself in a situation where reversing the order of
*just* the objectClasses in the filter -
i.e., from (&(|(objectClass=foo)(objectClass=bar))(uid=jdoe1)) to
(&(|(objectClass=bar)(objectClass=foo))(uid=jdoe1)) -
didn't fix things, but changing the search base (from ou=Users,dc=example,dc=com
to dc=example,dc=com) did.

I was then able to issue the original search with searchbase
ou=Users,dc=example,dc=com and filter
(&(|(objectClass=foo)(objectClass=bar))(uid=jdoe1)) and it worked fine.



Followup 5

Download message
Date: Mon, 21 Feb 2011 15:12:46 -0500
From: Ryan Steele <ryans@aweber.com>
To: openldap-its@OpenLDAP.org
CC: ryans@aweber.com
Subject: Re: (ITS#6831) Proxycache database corruption
Some more information: once I changed the search base from
ou=Users,dc=example,dc=com to dc=example,dc=com to "fix" the
problem for that particular user (after which the original query with the
ou=Users,dc=example,dc=com search base began
working again), the following entries popped up in the log:
ftp://ftp.openldap.org/incoming/ryan-steele-110221.proxycache-log-fixing-broken-entries.log.1

It gets stranger, though.  If I start out by using the root entry as the search
base (dc=example,dc=com) with a user who
is not yet in the cache, not only will it say it is ANSWERABLE (which it clearly
isn't) and then return nothing without
even trying to reach out to the upstream host, but if I then change the search
base to ou=Users,dc=example,dc=com, I get
the following error on the CLI (and this is with a successful bind as the
directory admin):


ldapsearch -x -D cn=admin,dc=example,dc=com -y /etc/ldap.secret -H
ldaps://localhost -LLL -b cn=admin,dc=example,dc=com
'(&(|(objectClass=examplecomUtilityUser)(objectClass=examplecomEmployee))(uid=jdoe4))'
uid
No such object (32)
Matched DN: dc=example,dc=com

Debugging logs generated by that query can be found here:
ftp://ftp.openldap.org/incoming/ryan-steele-110221.proxycache-log-attempt-to-fix-broken-entries-error.log.1



Followup 6

Download message
Date: Fri, 03 Jun 2011 19:14:48 -0700
From: Howard Chu <hyc@symas.com>
To: ryans@aweber.com
CC: openldap-its@openldap.org
Subject: Re: (ITS#6831) Proxycache database corruption
ryans@aweber.com wrote:
> Some more information: once I changed the search base from
ou=Users,dc=example,dc=com to dc=example,dc=com to "fix" the
> problem for that particular user (after which the original query with the
ou=Users,dc=example,dc=com search base began
> working again), the following entries popped up in the log:
> ftp://ftp.openldap.org/incoming/ryan-steele-110221.proxycache-log-fixing-broken-entries.log.1
>
> It gets stranger, though.  If I start out by using the root entry as the
search base (dc=example,dc=com) with a user who
> is not yet in the cache, not only will it say it is ANSWERABLE (which it
clearly isn't) and then return nothing without
> even trying to reach out to the upstream host, but if I then change the
search base to ou=Users,dc=example,dc=com, I get
> the following error on the CLI (and this is with a successful bind as the
directory admin):
>
>
> ldapsearch -x -D cn=admin,dc=example,dc=com -y /etc/ldap.secret -H
ldaps://localhost -LLL -b cn=admin,dc=example,dc=com
> '(&(|(objectClass=examplecomUtilityUser)(objectClass=examplecomEmployee))(uid=jdoe4))'
uid
> No such object (32)
> Matched DN: dc=example,dc=com
>
> Debugging logs generated by that query can be found here:
> ftp://ftp.openldap.org/incoming/ryan-steele-110221.proxycache-log-attempt-to-fix-broken-entries-error.log.1

I'm not sure I reproduced all the behaviors you were seeing, but I've 
reproduced at least one of them and fixed it in git. Please pull the latest 
pcache.c and test, thanks.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


Up to top level
Build   Contrib   Development   Documentation   Historical   Incoming   Software Bugs   Software Enhancements   Web  

Logged in as guest


The OpenLDAP Issue Tracking System uses a hacked version of JitterBug

______________
© Copyright 2013, OpenLDAP Foundation, info@OpenLDAP.org