[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: large multivalued attributes



At 12:09 PM 8/16/2006, Howard Chu wrote:
>This issue has been around for a long time, and I think we're going to need to deal with it soon, as we see sites creating groups with thousands of members.
>
>There's a couple different aspects that need to be addressed.
> 1) finding an individual value in a large multivalued attribute is inefficient(all linear searches).
> 2) updating a single indexed value basically requires regenerating the index for the entire attribute.
>
>The only obvious solution to 1) seems to be to always maintain multivalued attributes in sorted order, so that we can use binary search to locate values. This would make filter comparisons faster, as well as address things like ITS#4600, and locating values to modify in modify operations.

I wonder, for small numbers of values, whether sorting would have
any significant impact on performance. Probably in the noise. 

>The most efficient fix for 2) requires us to use an index mechanism that avoids collisions, so that we can safely remove index keys for individual values without worrying about losing indexing for an unrelated value. I don't see that actually happening, though.

I doubt it actually possible (given subtyping issues).

>The next alternative would be to use an index mechanism that allows duplicate keys, so that we can accumulate them when collisions occur. Then removing a single instance for one value doesn't affect the instances for other values.

One alternative would be, after the number of values exceeds some
limit, to leave old keys in place (just add keys for new values).

>We could accomplish the latter goal by changing the index structure in the DB. Currently we use a hash key and the data is a list of entryIDs for entries that match the hash. We could change the list to be entryID+ counter, so we can keep track of these collisions.
>
>Unfortunately keeping a counter like this would make slapindex a lot harder, we'd somehow have to know not to increment the counter when we're just re-indexing something that was already indexed. (Or we could just empty out/truncate the entire index before processing anything.)
>
>-- 
> -- Howard Chu
> Chief Architect, Symas Corp.  http://www.symas.com
> Director, Highland Sun        http://highlandsun.com/hyc
> OpenLDAP Core Team            http://www.openldap.org/project/