[Date Prev][Date Next] [Chronological] [Thread] [Top]

RE: slapadd(8), database order

> -----Original Message-----
> From: Kurt D. Zeilenga [mailto:Kurt@OpenLDAP.org]
> Sent: Monday, September 09, 2002 11:16 AM
> To: Howard Chu
> Cc: openldap-devel@OpenLDAP.org
> Subject: Re: slapadd(8), database order
> Maybe the thing to do is:
>   a) have back-bdb check for existance of parents
>   b) provide an LDIF ordering tool
> or, provide a ID defragmentation tool :-).

sigh... While all of that was tumbling thru my head I did an experiment with
the dn2id database. The results were poorer than I expected but I thought I'd
pass along the highlights.

I noticed that dn2id is maintained using prefixes on each DN. This separates
entries that are related to each other, i.e. we really aren't taking
advantage of any locality of reference in the B-tree. My first thought was to
just turn the prefix into a suffix. This way, the base, subtree, and onelevel
index slots for an entry will always be created next to each other in the
database. But still, entries that are adjacent in the DIT will be far apart
because the LDAP DN order doesn't logically sort entries together. So I wrote
some code to flip the rdn order around for the dn2id keys. I also changed the
Prefix characters to a set that sorts in ascending order. (base=!, sub=$,
one=* - this is the order we create the keys in the code, and the characters
are ascending in both ASCII and EBCDIC.)

Statistics from slapadd showed that this approach was slower than the current
approach. I was quite surprised. db_stat showed that the number of page
requests satisfied by the cache was higher; for a 450,000 entry database the
percentage of hits with a 2MB cache went from 88% to 100%, so I sort of
accomplished what I intended, but also the actual number of page requests
increased, so the amount of work being done overall was higher.

The "time" results for the current code:
691.720u 118.380s 18:09.39 74.3% 0+0k 0+0io 1628pf+0w

"time" for the experimental code:
727.810u 90.760s 19:07.57 71.3% 0+0k 0+0io 1630pf+0w

The better cache use saved 27.6 seconds of kernel/IO time, but cost 36
seconds of user time. I guess this approach causes more page splits, but
don't really know.

It occurs to me that this dn2id key layout may still be useful; it makes an
explicit DN_SUBTREE index unnecessary.

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support