[Date Prev][Date Next] [Chronological] [Thread] [Top]

RE: ordering indexing



> -----Original Message-----
> From: Hallvard B Furuseth [mailto:h.b.furuseth@usit.uio.no]

> Howard Chu writes:
> > I never found any order-preserving hash functions that I
> > could see being generally applicable here,
>
> Are general order-preserving hash functions possible?
> I don't see how, except the identify function and functions
> that increase the length of the input string.

Well, if you can generate the hash in a second pass after all the data is
present, you can create a hash function for that data set. But no, I don't
see how you can solve the general case.

> > but we could certainly do some indexing for specific
> > syntaxes that represent timestamps and other limited
> > size/range values. (I.e., we can crunch their normalized
> > value directly into a 32 or 64 bit integer losslessly.)
>
> Note that Generalized Time, the syntax of modifyTimestamp
> etc., supports an arbitrarily large fractional part, which
> you would need to append to the integer.
>
> BTW, the full integer range of Generalized Time takes 39
> bits, or 40-41 bits for quicker conversion.

That's fine. Throw it into a 64 bit integer and you can use the remaining
bits for the fraction.

Personally I wouldn't worry about this right away, since slapd doesn't use
fractional time anywhere. Now, if we convert op->o_time and slap_get_time()
to use a struct timeval, then we would at least carry microseconds around.
That only requires 20 bits.

I'm thinking that for these syntaxes we use an index key with a 3 bit prefix
and 61 bits of key. The prefix carries the index type (equality, substring,
approx, whatever) so that we can avoid overlap without having to use BDB
subdatabases.

Given that indexing isn't required to provide exact answers, we could adopt
this approach for strings as well, where the equality index just uses the
first 7-10 octets of the string as the index key. (For caseignoreIA5 you can
assume only 6 significant bits per character, thus 10 characters can fit into
60 bits of index.) It would mean trading off equality search performance to
gain inequality performance, but I think it may be worthwhile.

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support