
RE: back-bdb performance



> -----Original Message-----
> From: owner-openldap-devel@OpenLDAP.org
> [mailto:owner-openldap-devel@OpenLDAP.org] On Behalf Of Marijn Meijles

> You wrote:
> > The main problems - control of the tree remains in RAM, so unless shared
> > memory is used (which I don't), no slap tools are allowed to run while
> > slapd is running. Also, if the id2parent database is corrupted, your
> > entire tree structure is lost. Since no full DN copies exist anywhere,
> > you cannot recreate the hierarchy with only the id2entry information.

> To be honest, these problems are not very problematic in my view. When
> slapd runs, you might as well use slapd. And when your database becomes
> corrupt, well, you should have made a backup. That's also true for the
> current scheme.

True. And with the transaction log you have a good chance of recovery.

> > The advantages - tree control is all in memory, and is extremely fast.
> After you've read through the whole tree ;) That's cheating, dude ;)

Cheating? Me? Never...

> Well, whatever structure you choose, you still need to lock the parent to
> ensure it doesn't vanish while you add a child node. The only advantage
> in this area is that you can delete without messing with the parent.

Yes.
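
To spell that out, here's a minimal sketch of the locking protocol, assuming
a per-node pthread mutex. The names and structures are mine, not the actual
slapd code:

    #include <pthread.h>

    struct tnode {
        pthread_mutex_t  lock;
        int              deleted;       /* set when node is removed */
        struct tnode    *children;      /* head of child list       */
        struct tnode    *next;          /* sibling link             */
    };

    int add_child( struct tnode *parent, struct tnode *child )
    {
        pthread_mutex_lock( &parent->lock );
        if ( parent->deleted ) {        /* parent vanished under us */
            pthread_mutex_unlock( &parent->lock );
            return -1;
        }
        child->next = parent->children; /* link in the new child    */
        parent->children = child;
        pthread_mutex_unlock( &parent->lock );
        return 0;
    }

Deleting a leaf only needs the node's own lock, which is the advantage you
mention.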

> > for any other I/O operations. ModDn can be used to move entire subtrees
> > with a single hit on one node in the id2parent database. (Write one
> > node's parent ID, you're done.) This operation is slow-to-impossible
> > with the current backend designs.
> This is indeed a big advantage.

Yes!
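
To make the cost concrete, here's a rough sketch of the moddn path under an
id2parent scheme. The table layout and names are illustrative, not the
actual back-bdb code; the point is that descendants reference their parents
by ID rather than carrying full DNs, so a subtree move rewrites exactly one
record, no matter how big the subtree is:

    typedef unsigned long ID;

    struct pnode {
        ID    parent;   /* parent entry's ID */
        char *rdn;      /* this entry's RDN  */
    };

    /* moddn with a new superior: one field, one write, done */
    void subtree_move( struct pnode *id2parent, ID entry, ID new_parent )
    {
        id2parent[entry].parent = new_parent;
    }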

> > Startup cost and overhead are negligible. With a 10000 node database
> > and average RDN lengths of 20 characters you're only talking 400KB for
> > RDN/NRDN storage, plus another hundred KB for the AVL node pointers,
> > and the I/O time to read this data volume is trivial.

> Well, that's 5 gigabytes for 10 million entries, not to mention a startup
> time of 20 minutes. And that's only for a standard AVL. If you want to
> make it a bit scalable on SMP, you need way more memory for your AVL.
> Otherwise you'll get zero concurrency real quick.

Hm, your math evades me. 10 million entries means 160MB in AVL
nodes/pointers (four 4-byte pointers per node). Plus 400MB in RDN/NRDN
storage. That's 560MB, not 5GB. Sure, I only have 256MB of physical RAM in
my build system, but 560MB is not a ridiculous requirement for a database of
this size. To put this into perspective, how much dbcache would you need to
configure to get good performance out of a standard backend for this size of
database?
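
Spelling out the arithmetic behind the 560MB (this struct is a sketch of the
layout I'm assuming, with 4-byte pointers throughout; not the actual code):

    struct dnnode {
        struct dnnode *left, *right;   /* AVL children:  8 bytes */
        struct dnnode *parent;         /* up-pointer:    4 bytes */
        char          *rdn;            /* RDN text:      4 bytes */
    };                                 /* = 16 bytes per node    */

    /* 10,000,000 nodes * 16 bytes              = 160MB */
    /* 10,000,000 * ~40 bytes of RDN+NRDN text  = 400MB */
    /*                                    total = 560MB */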

With a dn2id-based database, look at the extreme growth rate of the dn2id
database itself as entry counts rise and the tree gets deeper. That turns
into disk thrashing, because there's no way you could cache it all. On top
of that, the dn2id index itself becomes next to useless: at most levels the
subtree IDLs will have already overflowed and been replaced with ALLIDS.
That means there will be no scoping on your subtree searches at all; they
will all be evaluated over the entire 10 million entries. Many of your
attribute indices will also have overflowed, so you won't get any help there
either. The vast majority of your searches will thus be forced to iterate
over the entire 10 million entries, indexed or not.
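
For anyone who hasn't watched an IDL roll over, a sketch of the degradation.
The constants and layout here are illustrative, not the real IDL code; the
first slot holds the count, and a subtree key near the root of a 10-million
entry tree blows past the block limit and collapses to ALLIDS:

    #define IDL_MAX 131072                 /* assumed block capacity */
    #define ALLIDS  (~0UL)                 /* "every entry matches"  */

    void idl_insert( unsigned long *idl, unsigned long id )
    {
        if ( idl[0] == ALLIDS )            /* already degraded       */
            return;
        if ( idl[0] >= IDL_MAX ) {         /* block is full; all     */
            idl[0] = ALLIDS;               /* scoping info is lost   */
            return;
        }
        idl[0]++;                          /* idl[0] holds the count */
        idl[ idl[0] ] = id;                /* append the new ID      */
    }

Once a key hits ALLIDS, searches under it fall back to scanning everything.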

All of the work involved in managing the hierarchical database is O(log n);
that scales to any size you care to dream up. The work involved in managing
the dn2id database is O(n^2): every entry carries its full DN, and each add
must rewrite the subtree IDLs of all of its ancestors, lists whose sizes
grow with n near the root. That runs into brick walls very quickly.

> > Certainly some of the synchronization problems still exist, and you
> > still use mutexes to address them. The point is that, like in a good
> > filesystem design, you can deliver far greater performance by keeping
> > the essential metadata in-memory and minimizing the amount of updating
> > that must go to the backing store.

> I agree fully with this point of view, but I wouldn't run a fs which
> needs to parse ALL my file info to determine my directory structure...

If you've ever used a MiniDisc then you've already used such an fs. This is
another example of a filesystem that reads the entire directory structure at
start-time (when a disc is inserted into a drive) and keeps it completely in
memory. The approach works extremely well, especially in cases where write
speed to the physical media is ridiculously slow (as on optical media).

I suppose you wouldn't sacrifice too much speed by turning it upside down,
into a more traditional filesystem-style approach. (id2children instead of
id2parent.) That trades in-memory footprint for access time and adds an
extra I/O on the parent for every child modification. At this point I don't
consider it necessary to make that tradeoff. (Your 10 million entry database
translates to probably 20GB of id2entry data. I believe it's reasonable to
require at least 2GB of physical RAM on your server to serve this database.)
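
For reference, the two layouts side by side (hypothetical record shapes, not
actual schemas):

    typedef unsigned long ID;

    struct id2parent_val {     /* key: the child's entry ID       */
        ID   parent;           /* one write moves a whole subtree */
        char rdn[];            /* but the tree lives in RAM       */
    };

    struct id2children_val {   /* key: the parent's entry ID      */
        int  nchildren;        /* children readable straight from */
        ID   children[];       /* disk, but every child add or    */
    };                         /* delete rewrites this record     */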

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support