[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
Logic problem in the LDBM backend
Hello,
I think I've found a login problem in the LDBM backend of slapd
(Relating to the caching of entries).
The problem stems from the fact that DNs are not stored in canonical
from in the database.
The result is that when the DELETE command is issued, cache entries
are not always correctly deleted (out of the cache) -- which has the
potential to cause further problems (mainly at the application level)
when the deleted entries still appear in search results.
(I have an example at the end of this document to demonstrate the problem)
As I see it there are a number of possible ways to fix this:
1) Always store the DNs in canonical form in the database
2) Always canonicalize the DN when loading it from the database
3) In the cache_entrydn_cmp() routine, canonicalize the cache entry
before doing the comparison.
4) Modify the ravl_delete() routine. (This is where the problem
manifests itself).
5) Never normalize/de-normalize a DN once it's in the cache.
(Not doing this has created the problem I'm trying to fix)
6) (HACK) Re-write the cache_delete_entry_internal() routine.
I think this routine should be written slightly differently
anyway, to make it slightly more robust. (However, my re-write
does obscures the bug, and doesn't fix the current memory leak,
but it _does_ do a better cleanup job than the current code.)
The basic problem stems from attempting to compare a normalized
DN to a non-normalized DN. E.g.:
Compare: cn=Sample, ou=Some org, o=Main org, c=Whatever
and: CN=SAMPLE,OU=SOME ORG,O=MAIN ORG,C=WHATEVER
Removing the spaces causes the strcasecmp() routine to
fail, both for sorting (the main problem), and for comparisons.
Does anyone have suggestions on which way I (or someone else) should
pursue this?
- Dan (Daniel Carroll - dan@mesastate.edu)
P.S. I'm submitting the patch for cache_delete_entry_internal() in
a separate message to openldap-devel.
P.P.S. Here's an extended discussion of the problem, and an example of it:
The problem appears using this sequence of steps:
a) An entry is loaded into the cache (structured as an AVL tree)
b) An operation is done on the entry (e.g. a MODIFY or DELETE command),
causing the DN of the entry to be normalized.
The result is that the entry is potentially no longer in the correct
location in the AVL tree. (Meaning that the tree may no longer
have a binary search tree structure -- which means that it
no longer has an AVL structure either.)
c) a Delete operation is performed on the entry (by the time
SLAPD is calling the cache_delete() routine, the DN of the
entry HAS been normalized).
Under certain circumstances, an attempt to call avl_delete()
on this entry (in the cache routines) will fail. This prevents
the entry from:
a) being freed up
b) (potentially) being found in the AVL tree, except during
a full treewalk.
c) being marked as deleted, causing it to show up in in
later searches that happen to match this entry.
here's an example of the specific problem(s) that I encountered:
The following AVL tree was created for the DN cache (All that I've
printed out is the DN, and the depth of each node):
This tree was printed out with the following algorithm:
a) recursively print the entries in the right sub-tree
b) print the current node's DN
c) recursively print the entries in the left sub-tree
Comments:
- The result is a list of DN's printed in lexiographically
descending order)
- The root node has depth '0' (Note #16).
This node has two children: Nodes #7 and #22.
- This is the content of the tree just before attempting to
delete 'CN=RDYKES,OU=PEOPLE,O=MSC,C=US' out of it.
(This is node #26).
Node: Depth: DN:
1 4: ou=Student, ou=Mesa7, ou=Accounts, o=MSC, c=US
2 3: ou=Student, ou=Mesa5, ou=Accounts, o=MSC, c=US
3 4: ou=Student, ou=Mesa2, ou=Accounts, o=MSC, c=US
4 2: ou=Student, ou=Mail, ou=Accounts, o=MSC, c=US
5 4: ou=Staff, ou=Mesa7, ou=Accounts, o=MSC, c=US
6 3: ou=Staff, ou=Mesa5, ou=Accounts, o=MSC, c=US
7 1: ou=Staff, ou=Mesa2, ou=Accounts, o=MSC, c=US
8 4: ou=Staff, ou=Mail, ou=Accounts, o=MSC, c=US
9 3: ou=People, o=MSC, c=US
10 5: ou=mesa7, ou=Accounts, o=MSC, c=US
11 4: ou=mesa5, ou=Accounts, o=MSC, c=US
12 2: ou=mail, ou=Accounts, o=MSC, c=US
13 4: ou=Faculty, ou=Mesa7, ou=Accounts, o=MSC, c=US
14 3: ou=Faculty, ou=Mesa5, ou=Accounts, o=MSC, c=US
15 4: ou=Faculty, ou=Mesa2, ou=Accounts, o=MSC, c=US
16 0: ou=Faculty, ou=Mail, ou=Accounts, o=MSC, c=US
17 4: ou=Admin, ou=Mesa7, ou=Accounts, o=MSC, c=US
18 3: ou=Admin, ou=Mesa5, ou=Accounts, o=MSC, c=US
19 4: ou=Admin, ou=Mesa2, ou=Accounts, o=MSC, c=US
20 2: ou=Admin, ou=Mail, ou=Accounts, o=MSC, c=US
21 3: ou=Accounts, o=MSC, c=US
22 1: o=MSC, c=US
23 4: cn=rdykes, ou=Student, ou=Mesa7, ou=Accounts, o=MSC, c=US
24 3: cn=rdykes, ou=Student, ou=Mesa5, ou=Accounts, o=MSC, c=US
25 4: cn=rdykes, ou=Student, ou=Mail, ou=Accounts, o=MSC, c=US
26 2: CN=RDYKES,OU=PEOPLE,O=MSC,C=US
27 3: cn=admin, o=MSC, c=US
Here's what happens:
The ravl_delete() routine correctly identifies node #26 as the one to delete.
Since Node #26 has two children, there's no simple way to delete it, so
it swaps Node #26 with node #25 (the next larger node in the tree),
and calls ravl_delete() on the right-hand child of #26 (which is #24).
Now it compares 'CN=RDYKES,OU=...' (the value to delete)
'cn=rdykes, ou...' (the value of node #24)
and decides that the value to delete 'CN=RDYKES,OU=...' is larger
than the value in node #24 (because of the space!), and incorrectly
goes to the _right-hand_ child of node #24 (which is node #23).
Since it doesn't find the value it was looking for, and there are no
further sub-trees to go down, it returns a NULL value (meaning that
it didn't find the entry).
This NULL value is passed all the way back out of the avl_delete routine,
(which typically means that the requested value didn't exist in the
AVL tree in the first place).