[Date Prev][Date Next]
Re: some thoughts on indexing (Was: Some openldap fixes... (fwd))
On Wed, Sep 20, 2000 at 11:12:02AM -0700, Kurt D. Zeilenga wrote:
> To further this discussion, here are some thoughts on what
> I was planning for the replacement backend.
> I had mentioned I was considering using duplicate
> keys (one key/value pair per assertValue/ID pair) instead
> of ID lists (one key -> list of IDs). I now believe this
> is not workable.
> I've also briefly considered generating a scoped key per
> assertion value (DN+assertValue/ID). Though such
> allows one to easily skip those keys not in scope, I
> believe this approach is still not workable as you'd
> still be generating a key for each matching entry.
> Peter/Marijn approach can be viewed an additional variant
> of the above. Here you generate a scoped key/value pairs
> for parent(DN)+assertValue -> list of IDs. This reduces
> the number of keys to maintain/fetch where you have
> multiple matching entries within the same container.
> It can degrade to the one key per matched entry... but
> given the characteristics of most directories, this may
> be workable.
> I've also been looking into (for quite some time) in
> key management issues. I believe the IDL code needs
> much work.
> IDL splitting: IDL splitting is designed to produce
> reasonably sized ID blocks. My gut feeling is that
> they are not worth the effort. I propose to axe
> splitting. This will reduce the complexity of the
> code and allow for significant optimizations.
remember though,. that because we now post-sort the on disk
IDLs needn'd be sorted anymore.
and those blocks do save a lot of IO when the list does
get very long
just some thoughts
> ALLIDS: I believe the concept is okay... the implementation
> needs work. First, ALLIDS should be high. I see little
> reason not to allow IDLISTS to be large. 16KIDs? 128K IDs?
> 1M IDs? As before, that can be knob.
The only time ALLIDS would be _faster_ is when it would require
more IO to load the list than load the few not mentioned entries.
this is a very high limit i think,..
If however you might deside to leave that idea intact, one might
make the threshold a percentage of the db size
> I also plan to implement some additional administrative limits:
> 1) return error to client if number of candidates
> exceeds a limit (before testing)
would break LDAP specs i think,..
allthough,.. there are a few fuzzy errors one could abuse for
this purpose, like: LDAP_UNWILLING_TO_PERFORM
why though ?
> 2) return error to client if number of tested candidates
> exceeds a limit (regardless of test result)
eruhmm,.. would this not be the same as 1) ?
because if one ignores the test-result the number of tested
entries will be the same as the number of candidates. or am
I missing something here,..
> 3) return error to client if sizelimit is exceeded.
how would this be extra ?
I very often get this error :)