[Date Prev][Date Next] [Chronological] [Thread] [Top]

RE: indexing



At 03:40 AM 2001-12-07, Howard Chu wrote:
>> -----Original Message-----
>> From: Kurt D. Zeilenga [mailto:Kurt@OpenLDAP.org]
>> Sent: Wednesday, December 05, 2001 7:11 PM
>
>> At 07:02 PM 2001-12-05, Howard Chu wrote:
>> >> >Good point. I have another approach in mind, but I haven't
>> >> tested it yet to
>> >> >see if it actually gains anything - use BDB's DB_MULTIPLE
>> >> support. Instead
>> >> >of storing IDLs as a single Key/Data pair, store individual IDs
>> >> as multiple
>> >> >data items under a single key. This way individual IDs can be
>> >> added/deleted
>> >> >to an index without having to rewrite the entire IDL each time.
>> >> That ought
>> >> >to cut down on a lot of the thrashing and overflow pages that
>> >> are currently
>> >> >being generated.
>> >>
>> >> I considered doing this, but thought that might easily cause
>> >> the database to have too many keys.
>
>> I'm curious as to the key count in databases containing substrings
>> indices.
>
>As I understand it, the number of keys in the database shouldn't change,
>although the number of data items increases (with multiple data items
>per key). Examining a database file with a hex dump confirms that keys are
>only stored once. For my 10011 entry sample, with
>
>        index objectclass eq
>        index cn,sn,uid pres,eq,sub
>
>I get database files:
>
>                dn2id           objectclass     cn              sn              uid
>                (keys/data)
>                (DB size)
>back-ldbm       20039/20039     26/26           48103/48103     34168/34168     48215/48215
>                2.2MB           253KB           3.2MB           2.2MB           3.2MB
>
>back-bdb        20032/40031     6/40022    48098/189327 34163/118602 48210/189330
>                2.1MB           442KB           4.6MB           2.9MB           4.6MB
>
>As you can see, the number of keys is comparable for all except objectclass.
>I believe in this case, back-ldbm has been forced to split off into indirect
>ID_BLOCKs because there are too many IDs to fit into a single block.


>With one base node, 10 ou's under that, and 10000 users under the ou's:
>The number of keys for dn2id should include
>        10011 entries
>        10010 subtrees (10011 minus 1 for skipping the base)
>           11 onelevels (base, 10 ou's)
>
>The number of data items for dn2id should include
>        10011 entries
>        20010 subtrees (skipped the base)
>        10010 onelevels (10 under base, 10000 under ou's)
>
>Again, for back-ldbm the number of keys is probably larger due to ID_BLOCK
>splits. For back-hdb, the attribute index databases are identical to
>back-bdb. (back-hdb doesn't use a dn2id database though.)
>
>Using the DB_MULTIPLE approach uses a bit more space in the attribute
>indices.

Excepting where back-ldbm uses ALLIDS as demonstrated, it appears,
in objectClass case.

>I'm a bit puzzled about why the dn2id index didn't grow by a
>similar factor.
>
>Runtimes for this scenario:
>back-ldbm
>ldadd 6.630u 1.120s 3:08.51 4.1%      0+0k 0+0io 2049pf+0w
>slapd 110.300u 63.490s 3:26.79 84.0%  0+0k 0+0io 7132pf+0w
>
>back-bdb
>ldadd 6.150u 0.960s 3:45.96 3.1%      0+0k 0+0io 609pf+0w
>slapd 94.650u 67.720s 4:15.82 63.4%   0+0k 0+0io 6586pf+0w
>
>back-hdb
>ldadd 6.400u 0.920s 3:26.11 3.5%      0+0k 0+0io 1935pf+0w
>slapd 86.290u 57.310s 3:41.46 64.8%   0+0k 0+0io 6178pf+0w
>
>There is a noticable delay from when ldapadd exits and when slapd exits.
>That time consists entirely of disk activity, flushing whatever buffers/logs
>to the disk. Looking at the CPU time figures, I would guess that most of the
>logging overhead can be hidden by storing the log on a separate physical
>volume. (This run generates just about 110MB of logs in back-bdb, and 102MB
>in back-hdb.)
>
>  -- Howard Chu
>  Chief Architect, Symas Corp.       Director, Highland Sun
>  http://www.symas.com               http://highlandsun.com/hyc
>  Symas: Premier OpenSource Development and Support