The hashing algorithm used to produce index keys for attribute indices
frequently produces the same hash key many times for the same attribute.
(Think of a substring index on a name with repeated character sequences like
Mississippi or abracadabra.) This causes a number of problems for us:
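To see where the repeats come from, here is a minimal C sketch that slides a fixed-length window across a value and counts the windows that equal an earlier one. It assumes a window length of 4 and a step of 1 (the real indexer's lengths and step are configurable and may differ). It also compares raw substrings rather than hashes. Because equal substrings always hash to equal keys, every repeat it finds is also a repeated hash key. `count_duplicate_windows` and `SUBSTR_LEN` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical window length for substring keys; the real indexer's
 * key length and step are configurable and may differ. */
#define SUBSTR_LEN 4

/* Count the SUBSTR_LEN-character windows of s (taken at every offset,
 * step 1) that repeat an earlier window. Raw substrings stand in for
 * hash keys: equal substrings always produce equal hash keys. */
size_t count_duplicate_windows(const char *s)
{
    size_t n = strlen(s);
    size_t dups = 0;

    if (n < SUBSTR_LEN)
        return 0;
    for (size_t i = 1; i + SUBSTR_LEN <= n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (memcmp(s + i, s + j, SUBSTR_LEN) == 0) {
                dups++;
                break; /* count each repeated window only once */
            }
        }
    }
    return dups;
}
```

For "Mississippi" the only repeated window is "issi" (offsets 1 and 4), and for "abracadabra" it is "abra" (offsets 0 and 7), so both calls return 1.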
- When adding an array of index keys, adding a key that has already been
  added produces a "key exists, duplicates not allowed" error and the index
  add of that attribute stops.
- When deleting an array of index keys, deleting a key that was already
  deleted produces a "notfound" error and, again, the index delete stops.
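The add-side failure can be reproduced with a toy index that, like a store configured to disallow duplicates, refuses a key it already holds. This is a hypothetical model: `struct toy_index`, `toy_index_add`, `toy_index_value`, and the `TOY_*` codes are illustrative names, not the real backend API. Like the indexer described above, it stops at the first error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define KEY_LEN  4  /* hypothetical substring key length */
#define MAX_KEYS 64 /* capacity of the toy index */

/* Hypothetical result codes; TOY_KEYEXIST mirrors "key exists". */
enum { TOY_OK = 0, TOY_KEYEXIST = -1, TOY_FULL = -2 };

/* Toy no-duplicates index; keys are raw substrings, not hashes. */
struct toy_index {
    char keys[MAX_KEYS][KEY_LEN];
    size_t count;
};

bool toy_index_has(const struct toy_index *ix, const char *key)
{
    for (size_t i = 0; i < ix->count; i++)
        if (memcmp(ix->keys[i], key, KEY_LEN) == 0)
            return true;
    return false;
}

/* Reject a key that is already present, as a no-duplicates store does. */
static int toy_index_add(struct toy_index *ix, const char *key)
{
    if (toy_index_has(ix, key))
        return TOY_KEYEXIST;
    if (ix->count == MAX_KEYS)
        return TOY_FULL;
    memcpy(ix->keys[ix->count++], key, KEY_LEN);
    return TOY_OK;
}

/* Add every KEY_LEN window of value in order, bailing out on the first
 * error: any keys after the failing one are silently never stored. */
int toy_index_value(struct toy_index *ix, const char *value)
{
    size_t n = strlen(value);

    for (size_t i = 0; i + KEY_LEN <= n; i++) {
        int rc = toy_index_add(ix, value + i);
        if (rc != TOY_OK)
            return rc;
    }
    return TOY_OK;
}
```

Indexing "Mississippi" stores "Miss", "issi", "ssis", and "siss", then reaches the second "issi" and returns `TOY_KEYEXIST`, so "ssip", "sipp", and "ippi" are never stored.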
This means the array of index keys is only partly processed, and so certain
attributes are only partially "matchable" when searching for them. A quick
fix would be to ignore any KEYEXIST errors for index adds and NOTFOUND errors
for deletes. Perhaps a better fix would be to have the indexers sort the key
array and strip duplicates. Any suggestions?
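The sort-and-strip approach can be sketched as a `qsort` followed by an in-place compaction pass. This sketch assumes each hash key fits in a `uint32_t`. `sort_and_dedup_keys` and `cmp_key` are hypothetical names, and an array of variable-length keys would need a different comparator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for 32-bit keys; avoids overflow from subtraction. */
static int cmp_key(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sort keys in place, then compact out duplicates. Returns the number
 * of unique keys, which occupy keys[0 .. result-1] in ascending order. */
size_t sort_and_dedup_keys(uint32_t *keys, size_t n)
{
    size_t out = 1;

    if (n == 0)
        return 0;
    qsort(keys, n, sizeof keys[0], cmp_key);
    for (size_t i = 1; i < n; i++) {
        if (keys[i] != keys[out - 1]) /* first of a new run of equal keys */
            keys[out++] = keys[i];
    }
    return out;
}
```

Because every distinct key then appears exactly once, the same cleaned array can be passed to both the add path and the delete path without tripping over a key that was already processed. Ignoring KEYEXIST and NOTFOUND would also get through the whole array, but it still issues one redundant store operation per duplicate.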
In the above example, "Mississippi" might fail to match "*ppi*" because
indexing aborted at the second "issi" hash...
(Ugh. I'm pretty sure we discussed this on this list many months ago, and
dismissed it because it didn't seem harmful at the time. I didn't consider
the case of the duplicates causing indexing to bail out before the entire set
of keys was processed...)
-- Howard Chu
Chief Architect, Symas Corp.; Director, Highland Sun
Symas: Premier OpenSource Development and Support