
Re: back-bdb performance

You wrote:
> The main problems - control of the tree remains in RAM, so unless shared
> memory is used (which I don't), no slap tools are allowed to run while slapd
> is running. Also, if the id2parent database is corrupted, your entire tree
> structure is lost. Since no full DN copies exist anywhere, you cannot
> recreate the hierarchy with only the id2entry information.
To be honest, I don't see these as serious problems. While slapd is
running, you might as well go through slapd. And if your database gets
corrupted, well, you should have made a backup. That's just as true for
the current scheme.

> The advantages - tree control is all in memory, and is extremely fast.
After you've read through the whole tree ;) That's cheating, dude ;)
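
To illustrate what I mean, here is a minimal Python sketch. It assumes
id2parent is nothing more than a flat mapping from entry ID to
(parent ID, RDN), with 0 as a pseudo-root; build_tree and lookup are
hypothetical names, not actual back-bdb code. The point is that every
record has to be read once before the first lookup can be answered.

```python
from collections import defaultdict

def build_tree(id2parent):
    """Build an in-memory children index from flat (parent_id, rdn) records.

    id2parent maps entry_id -> (parent_id, rdn); parent_id 0 is a pseudo-root.
    Returns (children, records_read).
    """
    children = defaultdict(dict)
    records_read = 0
    for entry_id, (parent_id, rdn) in id2parent.items():
        children[parent_id][rdn] = entry_id
        records_read += 1  # one read per entry: startup cost is O(total entries)
    return children, records_read

def lookup(children, rdns):
    """Resolve a DN given as a root-first list of RDNs; return its ID or None."""
    node = 0
    for rdn in rdns:
        node = children.get(node, {}).get(rdn)
        if node is None:
            return None
    return node

# Hypothetical three-entry database.
id2parent = {
    1: (0, "dc=example"),
    2: (1, "ou=people"),
    3: (2, "cn=alice"),
}
children, records_read = build_tree(id2parent)
```

Lookups are cheap afterwards, but only because records_read already
equals the size of the whole database.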

> Onelevel and Subtree indices don't need to be maintained on disk since the
> tree structure exists in memory. Multiple nodes can be added concurrently to
> the id2entry and id2parent databases with nearly zero contention. Likewise
Well, whatever structure you choose, you still need to lock the parent to
ensure it doesn't vanish while you add a child node. The only advantage
in this area is that you can delete without messing with the parent.

> for any other I/O operations. ModDn can be used to move entire subtrees with
> a single hit on one node in the id2parent database. (Write one node's parent
> ID, you're done.) This operation is slow-to-impossible with the current
> backend designs.
This is indeed a big advantage.
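
A rough sketch of why, again assuming a flat id2parent mapping of
entry ID -> (parent ID, RDN) with 0 as a pseudo-root. The names dn_of
and move_subtree are made up for illustration and this is not how
back-bdb is written; it only shows that a subtree move touches a single
record while every descendant's DN changes implicitly.

```python
def dn_of(id2parent, entry_id):
    """Rebuild a DN (leaf first) by walking parent links up to the pseudo-root 0."""
    rdns = []
    while entry_id != 0:
        parent_id, rdn = id2parent[entry_id]
        rdns.append(rdn)
        entry_id = parent_id
    return ",".join(rdns)

def move_subtree(id2parent, entry_id, new_parent_id):
    """Move entry_id and everything below it by rewriting one parent ID."""
    # Refuse to move an entry underneath itself or one of its descendants.
    node = new_parent_id
    while node != 0:
        if node == entry_id:
            raise ValueError("cannot move an entry underneath itself")
        node = id2parent[node][0]
    _, rdn = id2parent[entry_id]
    id2parent[entry_id] = (new_parent_id, rdn)  # the only record written

# Hypothetical data: move ou=people (and cn=alice below it) under ou=archive.
id2parent = {
    1: (0, "dc=example"),
    2: (1, "ou=people"),
    3: (2, "cn=alice"),
    4: (1, "ou=archive"),
}
before = dn_of(id2parent, 3)
move_subtree(id2parent, 2, 4)
after = dn_of(id2parent, 3)
```

The record for cn=alice is never written, yet its DN goes from
cn=alice,ou=people,dc=example to cn=alice,ou=people,ou=archive,dc=example.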

> Startup cost and overhead are negligible. With a 10000 node database and
> average RDN lengths of 20 characters you're only talking 400KB for RDN/NRDN
> storage, plus another hundred KB for the AVL node pointers, and the I/O time
> to read this data volume is trivial.
Well, by the same arithmetic that's about 500 megabytes for 10 million
entries, not to mention a startup time of 20 minutes.
And that's only for a standard AVL tree. If you want it to scale at all
on SMP, you need way more memory for your AVL tree. Otherwise you'll get
zero concurrency real quick.
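
Spelling the scaling out, using only the figures quoted above: 20-byte
RDNs stored twice (RDN and NRDN), plus roughly 10 bytes of AVL pointers
per node, which is what "another hundred KB" for 10000 nodes works out
to. The function name and the assumption of strictly linear growth are
mine; allocator and locking overhead would come on top of this.

```python
def tree_memory_bytes(entries, avg_rdn_len=20, pointer_bytes_per_node=10):
    """Estimate in-memory tree size from the figures quoted in the thread."""
    rdn_bytes = entries * avg_rdn_len * 2        # RDN plus normalized RDN
    pointer_bytes = entries * pointer_bytes_per_node
    return rdn_bytes + pointer_bytes

small = tree_memory_bytes(10_000)       # 500_000 bytes, the quoted ~500KB
large = tree_memory_bytes(10_000_000)   # 500_000_000 bytes, about 500 MB
```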

> Certainly some of the synchronization problems still exist, and you still
> use mutexes to address them. The point is that, like in a good filesystem
> design, you can deliver far greater performance by keeping the essential
> metadata in-memory and minimizing the amount of updating that must go to the
> backing store.
I fully agree with this point of view, but I wouldn't run a filesystem
that needs to parse ALL my file info to determine my directory structure...

Why don't sheep shrink when it rains?