Re: Some openldap fixes... (fwd) (fwd)

At 09:54 PM 9/18/00 +0200, Marijn Meijles wrote:
>> >Peter made the rest of the descriptions:
>> >
>> >- removed NEXTID file
>> >        * instead use the value of the last key in the id2entry db
>> 2.0 maintains the next id in a DB file.
>that's what the fix is all about... it's not maintained as such. The
>information you want (the last seq. number + 1) is already present,
>so keeping a second copy of that number only adds program complexity
>and increases the chance of errors.

Recycling IDs is currently viewed as being more dangerous (especially
if not flushing key deletes).
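
To make the recycling concern concrete, here is a minimal sketch. It is
not OpenLDAP code: ToyStore, ToyCounter, and every function name are
hypothetical stand-ins, and the "store" is just a sorted array of live
IDs. It only illustrates why "last key + 1" can hand out an ID that was
already used, while a separately maintained counter cannot.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long ID;

/* Hypothetical stand-in for id2entry: a sorted array of live entry IDs. */
typedef struct {
    ID     ids[16];
    size_t count;
} ToyStore;

/* Scheme A: derive the next ID from the last key in the store. */
static ID next_id_from_last_key(const ToyStore *s)
{
    return s->count == 0 ? 1 : s->ids[s->count - 1] + 1;
}

/* Scheme B: a separately maintained counter that only ever grows. */
typedef struct {
    ID next;
} ToyCounter;

static ID next_id_from_counter(ToyCounter *c)
{
    return c->next++;
}

/* Append an ID; callers must add IDs in increasing order. */
static void toy_add(ToyStore *s, ID id)
{
    assert(s->count < sizeof s->ids / sizeof s->ids[0]);
    assert(s->count == 0 || s->ids[s->count - 1] < id);
    s->ids[s->count++] = id;
}

/* Delete the entry with the highest ID. */
static void toy_delete_last(ToyStore *s)
{
    assert(s->count > 0);
    s->count--;
}
```

If IDs 1, 2, and 3 are added and entry 3 is then deleted, scheme A
returns 3 again, whereas scheme B returns 4. Any index key that still
mentions the old ID 3 (for example, because a key delete was not
flushed) would then point at the wrong entry.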

>> >- removed the explicit dn index
>> >        * couldn't find any use, and the program kept working just fine :)
>> 1.2 (and 2.0) uses DN indexing to properly scope searches.  Removing
>> it has side effects.  2.0 has new DN indexing to improve speed
>> (1.2 used substring indexing, which was horrible and so disabled by
>> default, which meant that you could not place multiple suffixes in
>> one database by default).
>there was only 1 reference to that whole index, and that was where it was
>created,.. why keep an index if it's never to be read ? trust me,.. 
>I 'grep'-ed the whole source tree on this !

subtree_candidates()  [before you hacked it]...

>this enforces the scope. So the candidate tables can contain as many
>OOS entries as they want. The trick of course is to keep those to a minimum :)

The 2.0 code is much cleaner... there are DN indices for base, one-level,
and subtree searches.  These replace 1.2's DN substring index
(for subtree scoping) and id2children (for one-level scoping).  The
indices are consulted via the candidate code (through filter modification).

Note that DN indexing does not eliminate the need to test the scope.
DN indexing just reduces the number of entries which need testing.
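
As an illustration of that last point, here is a rough sketch of a
scope test applied to a candidate list. It is not the slapd code: the
names (toy_scope, dn_in_scope, filter_candidates) are made up, and it
assumes DNs are already normalized strings with a non-empty base and no
escaped commas. Whatever the index returns, each candidate still goes
through a check like this.

```c
#include <assert.h>
#include <string.h>

enum toy_scope { TOY_SCOPE_BASE, TOY_SCOPE_ONELEVEL, TOY_SCOPE_SUBTREE };

/* Returns 1 if dn equals suffix or ends with it on an RDN boundary. */
static int dn_is_suffix(const char *dn, const char *suffix)
{
    size_t dlen = strlen(dn), slen = strlen(suffix);

    if (slen > dlen) return 0;
    if (strcmp(dn + (dlen - slen), suffix) != 0) return 0;
    return dlen == slen || dn[dlen - slen - 1] == ',';
}

/* Returns 1 if dn lies within the given scope of base, else 0. */
static int dn_in_scope(const char *dn, const char *base, enum toy_scope scope)
{
    size_t dlen, blen;

    assert(dn != NULL && base != NULL);
    if (!dn_is_suffix(dn, base)) return 0;

    dlen = strlen(dn);
    blen = strlen(base);
    switch (scope) {
    case TOY_SCOPE_BASE:
        return dlen == blen;
    case TOY_SCOPE_SUBTREE:
        return 1;
    case TOY_SCOPE_ONELEVEL:
        if (dlen == blen) return 0;
        /* Exactly one RDN in front of the base: no comma in that part. */
        return memchr(dn, ',', dlen - blen - 1) == NULL;
    }
    return 0;
}

/* Drops out-of-scope candidates in place; returns how many remain. */
static size_t filter_candidates(const char **cand, size_t n,
                                const char *base, enum toy_scope scope)
{
    size_t i, kept = 0;

    for (i = 0; i < n; i++)
        if (dn_in_scope(cand[i], base, scope))
            cand[kept++] = cand[i];
    return kept;
}
```

A better index hands filter_candidates() a shorter list, but the loop
itself stays; the index only changes how much work it has to do.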

>okay let me be more clear,.. I removed/modified those filters
>because they didn't have any added benefit and only slowed stuph down.

In 1.2, maintaining the DN substr index was expensive and not effective.
In 2.0, maintaining the DN subtree index is much less expensive and
much more effective.

>> >        * fixed that butt-ugly for-loop in ldbm_back_search

Ah, it's not the for-loop you were referring to but the idl_nextid
code.  Yes, the idl_nextid code could stand some improvement.

>> >        * replaced all calls to idl_allids with give_children
>> >
>> >        id2children.c::ID_BLOCK * give_children( Backend *, 
>> >                                                 Entry * base, 
>> >                                                 int scope) 
>> >
>> >          uses the id2children db to construct a list of all id's within the
>> >        specified base/scope pair.
>> id2children was replaced by additional DN indices in 2.0.  The
>> new code supports indices for scope base, one-level, and subtree.
>dewd,.. i tried that myself,.. only yields f*** huge databases
>the trick is to have only one index that has scope support
>an index per scope is too slow/big

It's all in trade-offs and highly dependent upon use.

The fucking huge directories I've dealt with [specifically
ISP user directories] had a small set of containers, each with some
large number of entries.  To allow fucking large numbers (and for speed),
we put each container into an independent backend.  In this
environment, scoping indexes (whether separate or combined with
assertion indexing) are quite pointless.  A further optimization
was to design applications to only require equality assertions
upon values which had low occurrence rates (e.g. uid, mail).  Other
accesses (requiring more expensive lookups) can be offloaded to
other slaves configured for such.

Our out-of-box defaults are meant to be "reasonable" for "most"
users.  As such, they should be tuned for mid-sized directories.
Here, separate scoping and assertion indexes work well.  They
are not overly expensive to maintain and are reasonably effective
over common directory uses.  But maintaining N*M scoped keys
(vs. M assertion keys) gets quite expensive, especially
if the tree is deep.
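
A back-of-the-envelope sketch of that cost, under one reading of N and
M that the text above does not spell out: N is taken to be the number
of scopes (ancestors) an entry falls under, and M the number of indexed
assertion values on the entry. The function names are hypothetical, and
the N + M figure for separate indices is itself an assumption.

```c
#include <assert.h>
#include <limits.h>

/* Keys written per entry when scope and assertion indices are separate
 * (assumed: one key per scope plus one key per assertion value). */
static unsigned long keys_separate(unsigned long n_scopes,
                                   unsigned long m_values)
{
    assert(n_scopes <= ULONG_MAX - m_values);   /* guard against overflow */
    return n_scopes + m_values;
}

/* Keys written per entry when every assertion key is also scoped
 * (assumed: one key per scope/value pair). */
static unsigned long keys_combined(unsigned long n_scopes,
                                   unsigned long m_values)
{
    assert(m_values == 0 || n_scopes <= ULONG_MAX / m_values);
    return n_scopes * m_values;
}
```

With an entry six levels deep carrying ten indexed values, this gives
16 keys for the separate layout against 60 for the combined one, and
the gap widens as the tree gets deeper.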

Again, it's all trade-offs.  There is not one "best way".  The
2.0 approach is okay for 90% of our users.  The other 10% might
need to customize the code.  If those customizations are commonplace
within the user base, we can include them in the distribution
behind appropriate knobs.