[Date Prev][Date Next] [Chronological] [Thread] [Top]

RE: slapadd(8), database order

There are two obvious ways to "reverse the DN"...
one is to just write it backwards (uid=kdz,dc=example,dc=com)
becomes (moc=cd,elpmaxe=cd,zdk=diu) the other is to reverse
the SEQUENCE of RDNs (dc=com,dc=example,uid=kdz).  Both
should provide good better "locality" using default btree
ordering routines, but generation costs are different.
Likely the former is cheaper to generate.  One could use
DN->id, but write a ordering function which worked backwards.

Anways, give a btree with such ordering, one can optimize
search by 1) starting at the baseObject and 2) stopping
on first candidate which is not under the baseObject.

This may speed things up.


At 03:41 PM 2002-09-09, Howard Chu wrote:
>> -----Original Message-----
>> From: Kurt D. Zeilenga [mailto:Kurt@OpenLDAP.org]
>> Sent: Monday, September 09, 2002 11:16 AM
>> To: Howard Chu
>> Cc: openldap-devel@OpenLDAP.org
>> Subject: Re: slapadd(8), database order
>> Maybe the thing to do is:
>>   a) have back-bdb check for existance of parents
>>   b) provide an LDIF ordering tool
>> or, provide a ID defragmentation tool :-).
>sigh... While all of that was tumbling thru my head I did an experiment with
>the dn2id database. The results were poorer than I expected but I thought I'd
>pass along the highlights.
>I noticed that dn2id is maintained using prefixes on each DN. This separates
>entries that are related to each other, i.e. we really aren't taking
>advantage of any locality of reference in the B-tree. My first thought was to
>just turn the prefix into a suffix. This way, the base, subtree, and onelevel
>index slots for an entry will always be created next to each other in the
>database. But still, entries that are adjacent in the DIT will be far apart
>because the LDAP DN order doesn't logically sort entries together. So I wrote
>some code to flip the rdn order around for the dn2id keys. I also changed the
>Prefix characters to a set that sorts in ascending order. (base=!, sub=$,
>one=* - this is the order we create the keys in the code, and the characters
>are ascending in both ASCII and EBCDIC.)
>Statistics from slapadd showed that this approach was slower than the current
>approach. I was quite surprised. db_stat showed that the number of page
>requests satisfied by the cache was higher; for a 450,000 entry database the
>percentage of hits with a 2MB cache went from 88% to 100%, so I sort of
>accomplished what I intended, but also the actual number of page requests
>increased, so the amount of work being done overall was higher.
>The "time" results for the current code:
>691.720u 118.380s 18:09.39 74.3% 0+0k 0+0io 1628pf+0w
>"time" for the experimental code:
>727.810u 90.760s 19:07.57 71.3% 0+0k 0+0io 1630pf+0w
>The better cache use saved 27.6 seconds of kernel/IO time, but cost 36
>seconds of user time. I guess this approach causes more page splits, but
>don't really know.
>It occurs to me that this dn2id key layout may still be useful; it makes an
>explicit DN_SUBTREE index unnecessary.
>  -- Howard Chu
>  Chief Architect, Symas Corp.       Director, Highland Sun
>  http://www.symas.com               http://highlandsun.com/hyc
>  Symas: Premier OpenSource Development and Support