[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: IDL bit vectors

Howard Chu writes:
> To avoid wasting those 5 unneeded bits in the base, we could use them as a 
> run-length counter and use multiple vectors per base. But it might be better 
> to slide things over and use just a 24 bit base, and use the bottom 8 bits of 
> the word as a bitmask to represent which following vectors are present. E.g.

With variable-length records you can't do a binary search for an entry
ID.  I don't know how significant that is though.  And you can get
around it, at some expense:

> ......01 means 1 vector follows, representing base+ 0-31.
> ......02 means 1 vector follows, representing base+32-63.

That can be normalized so bottom bit == 1: (base+32, 01).  If we also
don't use entry IDs with (ID & 31) == 0, we can look at the bottom bit
of a word in the IDL to tell if it is a vector header or a vector word.

> That allows us to represent up to 256 entryIDs in only 288 bits
> (instead of the 16384 bits we currently need).

Though in sparse IDLs, worst case will be 2 words per entry ID.

OTOH if we have exact ranges, best case is 2 words for any number of
entries:-) I don't know how they are stored now though, you mentioned
loss of precision so I assume approximate ranges are involved but a
glance at the code doesn't reveal where.

> That would allow us to track about 1.86M entryIDs in the same space
> we currently use to track only 64K entryIDs.
> It seems we would still need some kind of range representation to
> handle IDLs with more than 1.86M entries.

And some way to tell that from vectors, possibly using up another bit as
a flag in the vector header.  Alternatively, don't use IDs where (ID &
31) == 2 (or 31) either, and use that bit in the first vector words as a

Do you have any stats or feeling about how sparse the average IDL of any
size is?  Or how much ranges improve over lists of single IDLs, for that
matter?  I suppose it matters somewhat in temporary IDLs in memory as
well as in files...