[Date Prev][Date Next] [Chronological] [Thread] [Top]

RE: slapadd(8), database order

To: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
Subject: RE: slapadd(8), database order
From: "Howard Chu" <hyc@highlandsun.com>
Date: Mon, 9 Sep 2002 16:46:28 -0700
Cc: <openldap-devel@OpenLDAP.org>
Importance: Normal
In-reply-to: <5.1.0.14.0.20020909155613.02958908@127.0.0.1>

> -----Original Message-----
> From: Kurt D. Zeilenga [mailto:Kurt@OpenLDAP.org]

> There are two obvious ways to "reverse the DN"...
> one is to just write it backwards (uid=kdz,dc=example,dc=com)
> becomes (moc=cd,elpmaxe=cd,zdk=diu) the other is to reverse
> the SEQUENCE of RDNs (dc=com,dc=example,uid=kdz).  Both
> should provide good better "locality" using default btree
> ordering routines, but generation costs are different.
> Likely the former is cheaper to generate.

Right. I thought about just doing the former but actually tested with the
latter. Maybe I went wrong there, as a simple string reverse requires one
less traversal through the string than resequencing the RDNs.

> One could use
> DN->id, but write a ordering function which worked backwards.

I hadn't considered that, but it probably would have been easier.

> Anways, give a btree with such ordering, one can optimize
> search by 1) starting at the baseObject and 2) stopping
> on first candidate which is not under the baseObject.

Yes, that's what I had in mind. A similar approach would work for OneLevel as
well. But then I realized that this is how back-hdb already works, and it
might make sense to just finish back-hdb instead.

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support
>
> At 03:41 PM 2002-09-09, Howard Chu wrote:
> >> -----Original Message-----
> >> From: Kurt D. Zeilenga [mailto:Kurt@OpenLDAP.org]
> >> Sent: Monday, September 09, 2002 11:16 AM
> >> To: Howard Chu
> >> Cc: openldap-devel@OpenLDAP.org
> >> Subject: Re: slapadd(8), database order
> >>
> >>
> >> Maybe the thing to do is:
> >>   a) have back-bdb check for existance of parents
> >>   b) provide an LDIF ordering tool
> >>
> >> or, provide a ID defragmentation tool :-).
> >
> >sigh... While all of that was tumbling thru my head I did an
> experiment with
> >the dn2id database. The results were poorer than I expected but I
> thought I'd
> >pass along the highlights.
> >
> >I noticed that dn2id is maintained using prefixes on each DN.
> This separates
> >entries that are related to each other, i.e. we really aren't taking
> >advantage of any locality of reference in the B-tree. My first
> thought was to
> >just turn the prefix into a suffix. This way, the base, subtree,
> and onelevel
> >index slots for an entry will always be created next to each other in the
> >database. But still, entries that are adjacent in the DIT will be
> far apart
> >because the LDAP DN order doesn't logically sort entries
> together. So I wrote
> >some code to flip the rdn order around for the dn2id keys. I also
> changed the
> >Prefix characters to a set that sorts in ascending order. (base=!, sub=$,
> >one=* - this is the order we create the keys in the code, and the
> characters
> >are ascending in both ASCII and EBCDIC.)
> >
> >Statistics from slapadd showed that this approach was slower than
> the current
> >approach. I was quite surprised. db_stat showed that the number of page
> >requests satisfied by the cache was higher; for a 450,000 entry
> database the
> >percentage of hits with a 2MB cache went from 88% to 100%, so I sort of
> >accomplished what I intended, but also the actual number of page requests
> >increased, so the amount of work being done overall was higher.
> >
> >The "time" results for the current code:
> >691.720u 118.380s 18:09.39 74.3% 0+0k 0+0io 1628pf+0w
> >
> >"time" for the experimental code:
> >727.810u 90.760s 19:07.57 71.3% 0+0k 0+0io 1630pf+0w
> >
> >The better cache use saved 27.6 seconds of kernel/IO time, but cost 36
> >seconds of user time. I guess this approach causes more page splits, but
> >don't really know.
> >
> >It occurs to me that this dn2id key layout may still be useful;
> it makes an
> >explicit DN_SUBTREE index unnecessary.
> >
> >  -- Howard Chu
> >  Chief Architect, Symas Corp.       Director, Highland Sun
> >  http://www.symas.com               http://highlandsun.com/hyc
> >  Symas: Premier OpenSource Development and Support
>

References:
- RE: slapadd(8), database order
  - From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>

Prev by Date: RE: slapadd(8), database order
Next by Date: back-bdb stuff
Index(es):
- Chronological
- Thread