[Date Prev][Date Next] [Chronological] [Thread] [Top]

HEADS UP: new back-bdb caching code



I'm about ready to check in the hierarchical cache management code. It seems
to work OK for me, but it can certainly use more people banging on it.

The changes to cache.c are rather large; changes elsewhere are less so. The
basic EntryInfo structure is much the same as before, but instead of a single
c_dntree AVL tree for all the DNs in the cache, each EntryInfo has an AVL
tree for its immediate children, as well as a pointer to its parent. I've
moved the EntryInfo structure from its private location in cache.c into
back-bdb.h so that it can be used more easily.

The new organization trims off a lot of wasted AVL tree searches. The former
dn2id code would run an AVL search in the DN tree to find the matching ID,
returning the ID but not the cache pointer. Then the id2entry code would take
this ID and search the cache again, to find the entry. The new code just
passes the EntryInfo pointers around, so the ID search can be skipped after a
successful DN search of the cache.

This layout also allows for quick determination of whether a node is a child
of another node - just follow the parent pointers up to the root and look for
a matching ID. This feature will be particularly useful for search scope
tests, especially with aliases that point outside the original search scope.

The cache also allows shortcutting the dn2id_children routine: if the cache's
child AVL tree is non-NULL then the node definitely has children. (If NULL,
you still need to check the database. Perhaps we should add a flag to the
cache structure that says "this cached node definitely has no children" ...)

This is not the same as the BDB_HIER hierarchical database code, but it is
certainly a necessary step in that direction.

The cache still maintains an LRU list, but modified - only leaf nodes are
placed on the LRU list. It's a real pain to try to traverse a tree and have
your interior nodes expire out from under you... Of course, if an interior
node becomes a leaf node due to all of its children expiring from the cache,
then it will go onto the LRU list.

The cache uses nested transactions to coordinate database updates with cache
updates. Changes are only applied to the cache after all of the nested
database operations have committed successfully. This greatly simplifies the
cache state management.

At present my tests show it to be about the same speed as the previous code.
Hopefully it isn't any slower...

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support