[Date Prev][Date Next]
Re: equality indices & hashes
Hallvard B Furuseth wrote:
As far as I can tell, an equality index is an index of hashes of theYes. Pretty sure I raised this issue a long time ago, when 2.1 was still
in Alpha. I've definitely seen these collisions occur.
normalized values, not of the entire normalized values. Also, I think
an equality index can have hash collisions with substring indices.
The key for presence indexing is not generated by the frontend. In
either back-bdb or back-ldbm the likelihood of a collision is
vanishingly small. Nor is there any consequence if any collision with
the presence key does occur.
If a value's equality hash collides with a substring hash which matches
a lot of values (e.g. the last 4 characters of an e-mail address), it
will be impossible or very slow to look up that value by an equality
filter. I expect the same can happen with presence indices, though I
haven't checked the code for that. It will be rare, but when or if it
happens, it will be experienced as a _very_ mysterious bug.
So I suggest that 2 bits of the hash value are reserved to indicateSounds like a good idea. Right now an indicator is fed into the hash
input, and that obviously is still prone to collisions. Explicitly
masking it onto the hash output would cure that issue. Also sounds like
something that will have to wait for RE23 since it will cause an
incompatible format change.
whether it is an equality, presence, substring or approx hash.
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
Symas: Premier OpenSource Development and Support