[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: IDL bit vectors



Gavin Henry wrote:
Is it worth the effort for 32 bit when most production servers are 64?
Or am I missing the point?

The question is the same regardless - that is, whether/how to efficiently represent IDLs without losing too much precision and without adding too much complexity. The difference between 32 and 64 bit here is a trivial detail.


Currently we lose precision whenever any IDL goes above 65535 entries and we never get it back even if the IDL later shrinks to less than 65535 entries. We ought to have a scheme that can represent the entire range of valid entryIDs with more granularity but without using much more memory.

Currently on a 32 bit machine that means we can have 2^32 bytes of memory in the machine but it takes 4x2^32 bytes to losslessly represent every valid value, so we clearly have an efficiency problem. Likewise on a 64 bit machine there could be 2^64 bytes of memory but it would take 8*2^64 bytes for our current representation. Using a straight bitvector with no particular tricks would bring that down to 2^32 bits (2^29 bytes) or 2^64 bits (2^61 bytes) which is still impractical.

Realistically, at about 2Kbytes per entry, a 32 bit machine could directly address at most ~2^20 entries (~1M) so an IDL that maxes out at 1.86M before losing precision may be perfectly acceptable.

For 64 bit machines it seems 50 bits (1 petabyte) is the largest physical address space currently available. That puts a practical upper bound, assuming 4Kbytes per entry, at around 2^37 entries (~128 billion) directly addressable. It would take 2^34 bytes to handle that directly, 16GBytes, which again is still impractical. So again, we need some kind of range representation in addition to the actual bit vectors.

Gavin.

On 10/19/08, Howard Chu<hyc@symas.com> wrote:
Seems like I keep poking at this idea again every couple years.

If we use a<base,vector>  we can cover a much wider range without losing any
precision. For a 32 bit machine, 32 bits per word would cover 5 bits worth
of
entry IDs. So 27 bits would comprise the base, and the next 32 bits would
represent up to 32 entry IDs.

To avoid wasting those 5 unneeded bits in the base, we could use them as a
run-length counter and use multiple vectors per base. But it might be better
to slide things over and use just a 24 bit base, and use the bottom 8 bits
of
the word as a bitmask to represent which following vectors are present. E.g.

......01 means 1 vector follows, representing base+ 0-31.
......02 means 1 vector follows, representing base+32-63.
......05 means 2 vectors follow, base+0-31 then base+64-95.

That allows us to represent up to 256 entryIDs in only 288 bits (instead of
the 16384 bits we currently need). That would allow us to track about 1.86M
entryIDs in the same space we currently use to track only 64K entryIDs.

It seems we would still need some kind of range representation to handle
IDLs
with more than 1.86M entries.

--
    -- Howard Chu
    CTO, Symas Corp.           http://www.symas.com
    Director, Highland Sun     http://highlandsun.com/hyc/
    Chief Architect, OpenLDAP  http://www.openldap.org/project/




--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/