[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: back-bdb IDL limitations



Howard Chu wrote:
Jonghyuk Choi wrote:
I think this bit-mapped approach provides more workable solution than the range ID solution for the case where IDs in an IDL are rather contiguous. A 32 bit segment bit map covers 8 entries having contiguous IDs without increasing storage requirement and with the ability of pinpointing the exact indexing status of the 8 entries in the contigous range. On the other hand, the range ID consumes double the storage, but it cannot pinpoint the indexing status of the indiviual entries in the range once it is scaled. A 64 bit segment bit map covers 32 entries having contigous IDs with the same amount of storage as in the range ID scheme, but it can provide exact indexing for those 32 entries unlike the range ID approach with scaling.

The time we're losing in back-hdb for sorting IDLs has me looking into this again. I'm only considering 32 bit IDs at the moment because I can't see a need for 64 bit IDs yet and I haven't figured out where to arbitrarily cut off (40, 48 bits?).


I'm thinking of an in-memory structure like this: the bottom 5 bits of the ID occupy the 32 bit map. A chunk of 4096 of these maps is allocated contiguously (comprising the next 12 bits of the ID). An array of 32768 slots points to these chunks (comprising the top 15 bits of the ID). This IDL structure has sufficient dynamic range to represent the full 32 bits worth of IDs. Currently our in-memory IDL is 512KB. In 512KB we can accomodate 128 chunks, so a single 512KB IDL can represent 24 bits worth of IDs, 16,777,216 IDs. I think that's pretty decent for most purposes, and it's not too impractical to increase this on larger machines to span a couple more bits. We still need a representation for ranges or All IDs, but the likelihood of needing ranges will be very very small; it would be nice to eliminate that entirely but the All IDs concept is still useful.

The on-disk format will be 256 bits of map (bottom 8 bits of ID) with a 24 bit prefix, so there are no wasted bits. Fortunately BDB allows us to do partial writes so we can update individual bytes when we're doing single-entry IDL updates.

The actual in-memory definitions would be something like this:

typedef uin32_t IDchunk[4096];
typedef struct IDL {
   char slots[32768];      /* array of ID slots */
   IDchunk chunks[125];   /* ID storage space to be allocated as needed */
   char freechunks[16];   /* bitmap of free chunks */
} IDL;

That's (512K -16368) bytes for an in-memory IDL that can represent 16,384,000 IDs. Perhaps we can come up with a better layout. A slot will contain a value 1-125 pointing to the chunk that it uses, 0 meaning an unused slot.

Another alternative is to keep a totally separate list of chunks, and attach them on demand. With that approach there would be no reason to enforce a hard limit on the total size of an IDL.

--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sun        http://highlandsun.com/hyc
 OpenLDAP Core Team            http://www.openldap.org/project/