equality indices & hashes

As far as I can tell, an equality index stores hashes of the
normalized values, not the normalized values themselves.  Also, I think
an equality index can have hash collisions with substring indices.
If so:

If a value's equality hash collides with a substring hash which matches
a lot of values (e.g. the last 4 characters of an e-mail address), it
will be impossible or very slow to look up that value by an equality
filter.  I expect the same can happen with presence indices, though I
haven't checked the code for that.  It will be rare, but if it does
happen, it will be experienced as a _very_ mysterious bug.

So I suggest that 2 bits of the hash value be reserved to indicate
whether it is an equality, presence, substring or approx hash.
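
To make the suggestion concrete, here is a minimal sketch of such a
tagged key.  It is only an illustration: the 32-bit key width, the use
of SHA-1, the tag values and the names index_key() and key_type() are
all assumptions, not what slapd actually does.

```python
import hashlib

# Hypothetical 2-bit tags, one per index type (values are assumptions).
TAG_EQUALITY, TAG_PRESENCE, TAG_SUBSTRING, TAG_APPROX = 0, 1, 2, 3


def index_key(tag: int, data: bytes) -> int:
    """Return a 32-bit key whose top 2 bits hold the index type tag."""
    digest = hashlib.sha1(data).digest()
    raw = int.from_bytes(digest[:4], "big")
    # Keep the low 30 bits of the hash and put the tag in the top 2 bits.
    return (tag << 30) | (raw & 0x3FFFFFFF)


def key_type(key: int) -> int:
    """Recover the index type tag from a key."""
    return key >> 30
```

With this layout, an equality key and a substring key computed from the
same bytes always differ in their top 2 bits, so keys of different index
types cannot collide with each other.  The cost is 2 bits of hash, which
slightly raises the collision rate within each index type.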

A related note: When I learned of 'sizelimit size.unchecked', I set it
to 1 for one of our LDAP databases (with equality indices only).  That
was not a good idea, since there were hash collisions.  I think the
documentation of 'unchecked' needs to mention that indices only contain
hashes and that there may thus be hash collisions.  It should also
mention that for substring filters, the entire filter may not be
checked, with reference to the index_substr_* config options.
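
The effect on the candidate count can be shown with a toy model.  This
is a sketch under stated assumptions, not slapd's code: tiny_hash() is a
deliberately weak stand-in for the real hash, and the entries, the index
layout and the function names are made up for the illustration.

```python
def tiny_hash(value: bytes) -> int:
    """Deliberately weak stand-in hash so that collisions are easy to see."""
    return sum(value) % 16


# Hypothetical entries: entry ID -> normalized attribute value.
entries = {1: b"ab", 2: b"ba", 3: b"cd"}

# The index maps hash -> set of entry IDs; the values are not stored.
index = {}
for entry_id, value in entries.items():
    index.setdefault(tiny_hash(value), set()).add(entry_id)


def candidates(value: bytes) -> set:
    """Entry IDs the index returns, before the filter is tested on them."""
    return index.get(tiny_hash(value), set())


def search(value: bytes) -> set:
    """Entry IDs left after testing the filter against each candidate."""
    return {eid for eid in candidates(value) if entries[eid] == value}
```

Here b"ab" and b"ba" hash to the same key, so an equality lookup for
b"ab" yields two candidates although only one entry matches.  A limit of
1 on the number of candidates would be exceeded by such a search, even
though the final result has a single entry.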