
Re: some thoughts on indexing (Was: Some openldap fixes... (fwd))



On Wed, Sep 20, 2000 at 11:12:02AM -0700, Kurt D. Zeilenga wrote:
> To further this discussion, here are some thoughts on what
> I was planning for the replacement backend.
> 
> I had mentioned I was considering using duplicate
> keys (one key/value pair per assertValue/ID pair) instead
> of ID lists (one key -> list of IDs).  I now believe this
> is not workable.
> 
> I've also briefly considered generating a scoped key per
> assertion value (DN+assertValue/ID).  Though such
> allows one to easily skip those keys not in scope, I
> believe this approach is still not workable as you'd
> still be generating a key for each matching entry.
> 
> Peter/Marijn's approach can be viewed as an additional variant
> of the above.  Here you generate scoped key/value pairs
> for parent(DN)+assertValue -> list of IDs.   This reduces
> the number of keys to maintain/fetch where you have
> multiple matching entries within the same container.
> It can degrade to the one key per matched entry... but
> given the characteristics of most directories, this may
> be workable.
kewl :)
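
Just to picture what such a scoped key could look like, here is a
minimal sketch in C -- my own illustration with made-up names, not
the actual back-end code -- where the parent DN and the assertion
value are glued into one index key, so every matching entry under
the same container shares a single key -> IDL pair:

    /* compose "parentDN + NUL + assertValue" into one index key;
     * the NUL separator keeps the two parts unambiguous */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static char *scoped_key(const char *parent_dn, const char *assert_value,
                            size_t *key_len)
    {
        size_t dlen = strlen(parent_dn), vlen = strlen(assert_value);
        char *key = malloc(dlen + 1 + vlen);
        if (key == NULL)
            return NULL;
        memcpy(key, parent_dn, dlen);
        key[dlen] = '\0';                     /* scope/value separator */
        memcpy(key + dlen + 1, assert_value, vlen);
        *key_len = dlen + 1 + vlen;
        return key;
    }

    int main(void)
    {
        size_t len;
        char *k = scoped_key("ou=people,dc=example,dc=com", "babs", &len);
        if (k != NULL) {
            printf("key length: %zu octets\n", len);
            free(k);
        }
        return 0;
    }

All entries under ou=people matching that value would then hang off
this one key, instead of one key per matching entry.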

> 
> I've also been looking (for quite some time) into
> key management issues.  I believe the IDL code needs
> much work.
> 
> IDL splitting:  IDL splitting is designed to produce
> reasonably sized ID blocks.  My gut feeling is that
> they are not worth the effort.  I propose to axe
> splitting.   This will reduce the complexity of the
> code and allow for significant optimizations.
Okay...
Remember, though, that because we now post-sort, the on-disk
IDLs needn't be sorted anymore.

And those blocks do save a lot of IO when the list gets
very long.

Just some thoughts.
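
To spell out the post-sort point: if the write path just appends IDs
to the stored list, one qsort() when the block is read back restores
order.  A rough sketch (plain C, not the slapd IDL code):

    #include <stdio.h>
    #include <stdlib.h>

    typedef unsigned long ID;

    static int id_cmp(const void *a, const void *b)
    {
        ID x = *(const ID *)a, y = *(const ID *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        ID idl[] = { 42, 7, 19, 3, 100 };     /* as appended, unsorted */
        size_t i, n = sizeof(idl) / sizeof(idl[0]);

        qsort(idl, n, sizeof(ID), id_cmp);    /* post-sort once, at read time */

        for (i = 0; i < n; i++)
            printf("%lu ", idl[i]);
        putchar('\n');
        return 0;
    }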
> 
> ALLIDS:  I believe the concept is okay... the implementation
> needs work.  First, ALLIDS should be high.  I see little
> reason not to allow IDLISTS to be large.  16K IDs? 128K IDs?
> 1M IDs?  As before, that can be a knob.
The only time ALLIDS would be _faster_ is when it would require
more IO to load the list than to load the few entries not on it.
That is a very high limit, I think.
If, however, you decide to leave that idea intact, one might
make the threshold a percentage of the DB size.
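
To make the percentage idea concrete -- purely illustrative, the
names are invented -- the ALLIDS cutoff could be computed from the
entry count instead of being a fixed block size:

    #include <stdio.h>

    /* switch an index key to ALLIDS once its IDL would cover more
     * than allids_pct percent of the whole database */
    static unsigned long allids_threshold(unsigned long db_entries,
                                          unsigned int allids_pct)
    {
        unsigned long t = (db_entries / 100) * allids_pct;
        return t < 16 ? 16 : t;    /* keep a sane floor for tiny DBs */
    }

    int main(void)
    {
        printf("1M entries,  5%% -> %lu IDs\n", allids_threshold(1000000, 5));
        printf("10K entries, 5%% -> %lu IDs\n", allids_threshold(10000, 5));
        return 0;
    }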

> 
> I also plan to implement some additional administrative limits:
> 	1) return error to client if number of candidates
> 	exceeds a limit (before testing)
That would break the LDAP specs, I think... although there are a
few fuzzy errors one could abuse for this purpose, like
LDAP_UNWILLING_TO_PERFORM.

Why, though?
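
Here is a toy sketch of what limit 1) could look like -- names are
invented, this is not slapd code -- counting the candidate IDs from
the index and refusing the search before any entry is tested,
returning the "fuzzy" LDAP_UNWILLING_TO_PERFORM result mentioned
above:

    #include <stdio.h>

    #define LDAP_SUCCESS              0x00
    #define LDAP_UNWILLING_TO_PERFORM 0x35    /* result code 53 */

    /* a limit of 0 means "no administrative limit configured" */
    static int check_candidate_limit(unsigned long candidates,
                                     unsigned long admin_limit)
    {
        if (admin_limit != 0 && candidates > admin_limit)
            return LDAP_UNWILLING_TO_PERFORM; /* bail out before testing */
        return LDAP_SUCCESS;
    }

    int main(void)
    {
        printf("%d\n", check_candidate_limit(50000, 10000));  /* -> 53 */
        printf("%d\n", check_candidate_limit(200,   10000));  /* -> 0  */
        return 0;
    }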

> 	2) return error to client if number of tested candidates
> 	exceeds a limit (regardless of test result)
Erm... would this not be the same as 1)?
Because if one ignores the test result, the number of tested
entries will be the same as the number of candidates.  Or am
I missing something here?

> 	3) return error to client if sizelimit is exceeded.
How would this be extra?
I very often get this error :)

regardz,
Peter