[Date Prev][Date Next] [Chronological] [Thread] [Top]

Logic problem in the LDBM backend


   I think I've found a login problem in the LDBM backend of slapd
   (Relating to the caching of entries).

   The problem stems from the fact that DNs are not stored in canonical
   from in the database.

   The result is that when the DELETE command is issued, cache entries
   are not always correctly deleted (out of the cache) -- which has the
   potential to cause further problems (mainly at the application level)
   when the deleted entries still appear in search results.

   (I have an example at the end of this document to demonstrate the problem)

   As I see it there are a number of possible ways to fix this:
     1) Always store the DNs in canonical form in the database
     2) Always canonicalize the DN when loading it from the database
     3) In the cache_entrydn_cmp() routine, canonicalize the cache entry
           before doing the comparison.
     4) Modify the ravl_delete() routine.  (This is where the problem
           manifests itself).
     5) Never normalize/de-normalize a DN once it's in the cache.
           (Not doing this has created the problem I'm trying to fix)
     6) (HACK) Re-write the cache_delete_entry_internal() routine.
           I think this routine should be written slightly differently
           anyway, to make it slightly more robust.  (However, my re-write
           does obscures the bug, and doesn't fix the current memory leak,
           but it _does_ do a better cleanup job than the current code.)

     The basic problem stems from attempting to compare a normalized
      DN to a non-normalized DN.  E.g.:
         Compare: cn=Sample, ou=Some org, o=Main org, c=Whatever
      Removing the spaces causes the strcasecmp() routine to
        fail, both for sorting (the main problem), and for comparisons.

     Does anyone have suggestions on which way I (or someone else) should
        pursue this?

	- Dan (Daniel Carroll - dan@mesastate.edu)

P.S.  I'm submitting the patch for cache_delete_entry_internal() in
      a separate message to openldap-devel.

P.P.S. Here's an extended discussion of the problem, and an example of it:

    The problem appears using this sequence of steps:
       a) An entry is loaded into the cache (structured as an AVL tree)
       b) An operation is done on the entry (e.g. a MODIFY or DELETE command),
             causing the DN of the entry to be normalized.
           The result is that the entry is potentially no longer in the correct
             location in the AVL tree.  (Meaning that the tree may no longer
             have a binary search tree structure -- which means that it
             no longer has an AVL structure either.)
       c) a Delete operation is performed on the entry (by the time
             SLAPD is calling the cache_delete() routine, the DN of the
             entry HAS been normalized).
           Under certain circumstances, an attempt to call avl_delete()
             on this entry (in the cache routines) will fail.  This prevents
            the entry from:
              a) being freed up
              b) (potentially) being found in the AVL tree, except during
                     a full treewalk.
              c) being marked as deleted, causing it to show up in in
                     later searches that happen to match this entry.

here's an example of the specific problem(s) that I encountered:
  The following AVL tree was created for the DN cache (All that I've
  printed out is the DN, and the depth of each node):

  This tree was printed out with the following algorithm:
     a) recursively print the entries in the right sub-tree
     b) print the current node's DN
     c) recursively print the entries in the left sub-tree

    - The result is a list of DN's printed in lexiographically
       descending order)
    - The root node has depth '0' (Note #16).
        This node has two children: Nodes #7 and #22.
    - This is the content of the tree just before attempting to
         delete 'CN=RDYKES,OU=PEOPLE,O=MSC,C=US' out of it.
         (This is node #26).

Node:  Depth: DN:
 1     4: ou=Student, ou=Mesa7, ou=Accounts, o=MSC, c=US
 2     3: ou=Student, ou=Mesa5, ou=Accounts, o=MSC, c=US
 3     4: ou=Student, ou=Mesa2, ou=Accounts, o=MSC, c=US
 4     2: ou=Student, ou=Mail, ou=Accounts, o=MSC, c=US
 5     4: ou=Staff, ou=Mesa7, ou=Accounts, o=MSC, c=US
 6     3: ou=Staff, ou=Mesa5, ou=Accounts, o=MSC, c=US
 7     1: ou=Staff, ou=Mesa2, ou=Accounts, o=MSC, c=US
 8     4: ou=Staff, ou=Mail, ou=Accounts, o=MSC, c=US
 9     3: ou=People, o=MSC, c=US
10     5: ou=mesa7, ou=Accounts, o=MSC, c=US
11     4: ou=mesa5, ou=Accounts, o=MSC, c=US
12     2: ou=mail, ou=Accounts, o=MSC, c=US
13     4: ou=Faculty, ou=Mesa7, ou=Accounts, o=MSC, c=US
14     3: ou=Faculty, ou=Mesa5, ou=Accounts, o=MSC, c=US
15     4: ou=Faculty, ou=Mesa2, ou=Accounts, o=MSC, c=US
16     0: ou=Faculty, ou=Mail, ou=Accounts, o=MSC, c=US
17     4: ou=Admin, ou=Mesa7, ou=Accounts, o=MSC, c=US
18     3: ou=Admin, ou=Mesa5, ou=Accounts, o=MSC, c=US
19     4: ou=Admin, ou=Mesa2, ou=Accounts, o=MSC, c=US
20     2: ou=Admin, ou=Mail, ou=Accounts, o=MSC, c=US
21     3: ou=Accounts, o=MSC, c=US
22     1: o=MSC, c=US
23     4: cn=rdykes, ou=Student, ou=Mesa7, ou=Accounts, o=MSC, c=US
24     3: cn=rdykes, ou=Student, ou=Mesa5, ou=Accounts, o=MSC, c=US
25     4: cn=rdykes, ou=Student, ou=Mail, ou=Accounts, o=MSC, c=US
27     3: cn=admin, o=MSC, c=US

Here's what happens:
  The ravl_delete() routine correctly identifies node #26 as the one to delete.
  Since Node #26 has two children, there's no simple way to delete it, so
      it swaps Node #26 with node #25 (the next larger node in the tree),
      and calls ravl_delete() on the right-hand child of #26 (which is #24).
  Now it compares 'CN=RDYKES,OU=...' (the value to delete)
                  'cn=rdykes, ou...' (the value of node #24)
      and decides that the value to delete 'CN=RDYKES,OU=...' is larger
      than the value in node #24 (because of the space!), and incorrectly
      goes to the _right-hand_ child of node #24 (which is node #23).
  Since it doesn't find the value it was looking for, and there are no
      further sub-trees to go down, it returns a NULL value (meaning that
      it didn't find the entry).
  This NULL value is passed all the way back out of the avl_delete routine,
      (which typically means that the requested value didn't exist in the
      AVL tree in the first place).