some thoughts on indexing (Was: Some openldap fixes... (fwd))



To further this discussion, here are some thoughts on what
I was planning for the replacement backend.

I had mentioned I was considering using duplicate
keys (one key/value pair per assertValue/ID pair) instead
of ID lists (one key -> list of IDs).  I now believe this
is not workable.
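
To make the trade-off concrete, here is a rough sketch of the
two layouts (the ID type and struct names are placeholders,
not anything in the tree):

	/* illustrative only */
	typedef unsigned long ID;

	/* duplicate-key scheme: one key/value pair stored per
	 * assertValue/ID pair, i.e. one record per matching entry */
	struct dup_key {
		char *assert_value;	/* normalized assertion value (key) */
		ID id;			/* one matching entry ID (value) */
	};

	/* ID list scheme: one key maps to the whole list of IDs */
	struct idl_value {
		ID nids;		/* number of IDs in the list */
		ID ids[1];		/* sorted matching entry IDs */
	};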

I've also briefly considered generating a scoped key per
assertion value (DN+assertValue/ID).  Though this makes it
easy to skip keys that are out of scope, I believe the
approach is still not workable, as you'd still be generating
a key for each matching entry.
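
For concreteness, such a key might be composed roughly like
this (the separator and function name are illustrative, not
actual code):

	#include <stdio.h>

	/* compose <entry DN> SEP <assertion value>; note that one
	 * such key still gets generated for every matching entry */
	static int
	scoped_key( char *buf, size_t buflen,
		const char *entry_ndn, const char *assert_value )
	{
		return snprintf( buf, buflen, "%s\037%s",
			entry_ndn, assert_value );
	}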

The Peter/Marijn approach can be viewed as another variant
of the above.  Here you generate scoped key/value pairs of
the form parent(DN)+assertValue -> list of IDs.  This reduces
the number of keys to maintain/fetch where you have multiple
matching entries within the same container.  It can degrade
to one key per matched entry... but given the characteristics
of most directories, this may be workable.
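
Again only as a sketch (names are placeholders), the key/value
pair under this scheme would look something like:

	typedef unsigned long ID;

	/* key is parent(DN)+assertValue; the value is the ID list
	 * of matching entries within that container, so siblings
	 * matching the same value share a single key/IDL */
	struct parent_scoped_idl {
		char *parent_ndn;	/* DN of the containing entry */
		char *assert_value;	/* normalized assertion value */
		ID nids;		/* matching entries in scope */
		ID ids[1];		/* their IDs, sorted */
	};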

I've also been looking (for quite some time) into key
management issues.  I believe the IDL code needs much work.

IDL splitting:  IDL splitting is designed to produce
reasonably sized ID blocks.  My gut feeling is that it is
not worth the effort.  I propose to axe splitting.  This
will reduce the complexity of the code and allow for
significant optimizations.
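
With splitting gone, an index value would map to one
contiguous block, roughly like this (a sketch only, not the
actual slapd representation):

	typedef unsigned long ID;

	/* one sorted block per key; no continuation/indirect
	 * blocks, so a lookup or merge touches a single record */
	struct idl {
		ID nids;		/* IDs actually stored */
		ID maxids;		/* allocated capacity */
		ID ids[1];		/* sorted entry IDs */
	};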

ALLIDS:  I believe the concept is okay... the implementation
needs work.  First, the ALLIDS threshold should be high.  I
see little reason not to allow ID lists to be large.  16K IDs?
128K IDs?  1M IDs?  As before, that can be a knob.
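
A minimal sketch of the idea (the sentinel encoding, names,
and default value are assumptions, not actual slapd code):

	typedef unsigned long ID;

	#define ALLIDS	((ID)~0UL)	/* "every entry matches" */

	static ID idl_allids_threshold = 131072;	/* e.g. 128K IDs */

	struct idl {
		ID nids;
		ID ids[1];
	};

	/* after an insert: once the list exceeds the (high,
	 * configurable) threshold, collapse it to ALLIDS instead
	 * of splitting it into blocks */
	static void
	idl_check_allids( struct idl *idl )
	{
		if ( idl->nids > idl_allids_threshold ) {
			idl->ids[0] = ALLIDS;
			idl->nids = 1;
		}
	}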

I also plan to implement some additional administrative
limits (sketched below):
	1) return error to client if number of candidates
	exceeds a limit (before testing)
	2) return error to client if number of tested candidates
	exceeds a limit (regardless of test result)
	3) return error to client if sizelimit is exceeded.
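
Roughly (the names, error value, and limit structure here are
hypothetical, not existing slapd code):

	#define LIMIT_EXCEEDED	(-1)

	/* the three proposed limits; 0 means "no limit" here */
	struct admin_limits {
		unsigned long max_candidates;	/* limit 1 */
		unsigned long max_tested;	/* limit 2 */
		unsigned long sizelimit;	/* limit 3 */
	};

	static int
	check_limits( const struct admin_limits *lim,
		unsigned long ncandidates,
		unsigned long ntested,
		unsigned long nsent )
	{
		if ( lim->max_candidates && ncandidates > lim->max_candidates )
			return LIMIT_EXCEEDED;	/* before testing */
		if ( lim->max_tested && ntested > lim->max_tested )
			return LIMIT_EXCEEDED;	/* regardless of test result */
		if ( lim->sizelimit && nsent > lim->sizelimit )
			return LIMIT_EXCEEDED;	/* sizelimit exceeded */
		return 0;
	}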

Comments?

Kurt