[Date Prev][Date Next] [Chronological] [Thread] [Top]

RE: back-bdb performance

> -----Original Message-----
> From: owner-openldap-devel@OpenLDAP.org
> [mailto:owner-openldap-devel@OpenLDAP.org]On Behalf Of Marijn Meijles

> > Things might be more problematic for something like modrdn, I'll have to
> > look at that later. This is another point in back-hdb's favor,
> the parent is
> > completely independent of changes made to child nodes.
> (Although I have to
> Can you elaborate a bit more on your scheme? Because there might be some
> nasty problems but I can't speculate unless I have some more details..

There are some vulnerabilities, sure. There are two main databases, id2entry
and id2parent. id2entry is pretty much the same as in the other backends,
except that the entry's DN/NDN are never stored. id2parent is keyed on entry
ID and contains parent ID and RDN/NRDN. At db_open time the entire id2parent
database is read into memory in an AVL tree. The pointer to the root of the
namingContext is saved. Then the entire tree is traversed once, connecting
children to parents in another AVL tree per parent. Once this is done the
database is ready for use.

The main problems - control of the tree remains in RAM, so unless shared
memory is used (which I don't), no slap tools are allowed to run while slapd
is running. Also, if the id2parent database is corrupted, your entire tree
structure is lost. Since no full DN copies exist anywhere, you cannot
recreate the hierarchy with only the id2entry information.

The advantages - tree control is all in memory, and is extremely fast.
Onelevel and Subtree indices don't need to be maintained on disk since the
tree structure exists in memory. Multiple nodes can be added concurrently to
the id2entry and id2parent databases with nearly zero contention. Likewise
for any other I/O operations. ModDn can be used to move entire subtrees with
a single hit on one node in the id2parent database. (Write one node's parent
ID, you're done.) This operation is slow-to-impossible with the current
backend designs.

Startup cost and overhead are negligible. With a 10000 node database and
average RDN lengths of 20 characters you're only talking 400KB for RDN/NRDN
storage, plus another hundred KB for the AVL node pointers, and the I/O time
to read this data volume is trivial.

You pay a small penalty in that you must construct the entry DNs by walking
up from the node through all of its parents when you want to give the entry
to some other part of slapd. However, this cost is invisible given the huge
gain in I/O throughput.

The current back-hdb code in the CVS archive does not take full advantage of
the hierarchical structure, since it was shoehorned into the more
traditional organization of back-bdb. With a bit of rewriting, most of the
work of looking up a node's parent for existence checks and other such
nonsense can be eliminated. (Since you must find a node starting from the
root of the tree, you have necessarily already located the parent while
searching for the target.)

Certainly some of the synchronization problems still exist, and you still
use mutexes to address them. The point is that, like in a good filesystem
design, you can deliver far greater performance by keeping the essential
metadata in-memory and minimizing the amount of updating that must go to the
backing store.

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support