[Date Prev][Date Next] [Chronological] [Thread] [Top]

RE: back-bdb future



> -----Original Message-----
> From: Kurt D. Zeilenga [mailto:Kurt@OpenLDAP.org]

> At 03:19 AM 3/16/2003, Howard Chu wrote:
> >For 2.2, we should probably consider using lists of IDL
> ranges. Could get
> >ugly, but it might be more useful for large databases.

> I'm not yet convinced that lists of IDL ranges for the general
> indexing case is worth the pain.  It seems to me that the
> "problem" mostly occurs in one particular index, dn2id.
> That is, of course, where the number of entries matching a
> particular key, commonly a subtree key, is too large to eliminate
> the large number of entries whose value don't match that key.
> This problem, of course, can be mitigated by more sensible
> partitioning of the naming context.

I had the objectclass eq index more in mind actually; the back-hdb concept
can largely eliminate the dn2id problem. As long as you recursively descend
the tree during search evaluation rather than returning a single subtree IDL,
anyway...

Oh, to re-cap back-hdb - the current back-hdb design requires an in-memory
tree structure for the entire database. The on-disk dn2parent database
contains records with key parentID+entryID, and with data rdn and normalized
rdn. The in-memory tree is built on database startup from all of the
dn2parent records. This was a direct translation of the first design I'd done
with gdbm, so my layout options were limited...

The new design will not require constructing and maintaining the entire tree
in memory. Instead, the dn2parent index will take advantage of BDB's sorted
duplicates and range search features. The key will be the parent ID, the data
item will be the nrdn+entryID. Because of the sorted dup feature, all of the
child entries of a particular parentID can be stored under a single parentID
key. Because of the range search feature, any child of the parent can be
directly located by nrdn. The in-memory entry cache will mirror this
hierarchical structure to provide the same dn2id lookup capabilities. The LRU
scheme will have to be modified slightly to insure that only leaf nodes get
recycled. (Side note - currently the back-hdb modrdn code is broken; I put an
ifdef that allows ModDN on nodes that have children, because the on-disk
database allows this, but the entry cache mechanism doesn't accomodate it. So
currently this operation corrupts the in-memory cache...)

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support