[Date Prev][Date Next] [Chronological] [Thread] [Top]

large multivalued attributes

This issue has been around for a long time, and I think we're going to need to deal with it soon, as we see sites creating groups with thousands of members.

There's a couple different aspects that need to be addressed.
1) finding an individual value in a large multivalued attribute is inefficient(all linear searches).
2) updating a single indexed value basically requires regenerating the index for the entire attribute.

The only obvious solution to 1) seems to be to always maintain multivalued attributes in sorted order, so that we can use binary search to locate values. This would make filter comparisons faster, as well as address things like ITS#4600, and locating values to modify in modify operations.

The most efficient fix for 2) requires us to use an index mechanism that avoids collisions, so that we can safely remove index keys for individual values without worrying about losing indexing for an unrelated value. I don't see that actually happening, though. The next alternative would be to use an index mechanism that allows duplicate keys, so that we can accumulate them when collisions occur. Then removing a single instance for one value doesn't affect the instances for other values.

We could accomplish the latter goal by changing the index structure in the DB. Currently we use a hash key and the data is a list of entryIDs for entries that match the hash. We could change the list to be entryID+ counter, so we can keep track of these collisions.

Unfortunately keeping a counter like this would make slapindex a lot harder, we'd somehow have to know not to increment the counter when we're just re-indexing something that was already indexed. (Or we could just empty out/truncate the entire index before processing anything.)

 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sun        http://highlandsun.com/hyc
 OpenLDAP Core Team            http://www.openldap.org/project/