
back-bdb IDL limitations

At the risk of seriously complicating the IDL code, I wonder if it's time to investigate a more compact IDL representation - perhaps using ranges exclusively, where each IDL is a list of ranges.
There would still be pathological cases where discontiguous IDs would consume a lot of space, but they should be pretty rare.

IDL[0] would still hold the number of slots in use. Each slot is a lo/hi pair. So we can express any contiguous range with a single pair, and handle discontinuities with additional ranges. An IDL cursor would be a (slot,value) pair instead of the single integer we currently use. A scheme like this could dramatically reduce the size of IDLs overall.
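To make the layout concrete, here is a minimal sketch of what such a range-list IDL and its (slot,value) cursor might look like. All of the names, the capacity, and the lookup helper are illustrative assumptions, not the actual back-bdb types:

```c
#include <assert.h>

#define IDL_RANGES 64              /* assumed capacity, not from the post */

typedef unsigned long ID;

typedef struct idl_range {
    ID lo, hi;                     /* inclusive range of IDs */
} idl_range;

typedef struct range_idl {
    unsigned nranges;              /* "IDL[0]": slots in use */
    unsigned scale;                /* scale factor (see below) */
    idl_range r[IDL_RANGES];
} range_idl;

/* Cursor is a (slot, value) pair instead of a single integer. */
typedef struct idl_cursor {
    unsigned slot;
    ID value;
} idl_cursor;

/* Binary search for the first range whose hi bound is >= id; returns
 * the slot index, or nranges if id is past the end of the list. */
static unsigned idl_find_slot(const range_idl *idl, ID id)
{
    unsigned lo = 0, hi = idl->nranges;
    while (lo < hi) {
        unsigned mid = (lo + hi) / 2;
        if (idl->r[mid].hi < id)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Membership test: id is present iff it falls inside that range. */
static int idl_contains(const range_idl *idl, ID id)
{
    unsigned s = idl_find_slot(idl, id);
    return s < idl->nranges && idl->r[s].lo <= id;
}
```

Since the ranges are kept sorted and non-overlapping, a membership test is a binary search over slots rather than over individual IDs.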

Borrowing from some other data compression techniques - we can also assign a scale factor if needed, to further expand the dynamic range of the IDLs.

For example, assume we have an IDL that can only accommodate 4 slots, and the IDL starts with a scale factor of 1. The first slot carries the range 1-5; the data then skips one ID, so we use a second slot for IDs 7-12. After skipping a few more, slot 3 covers 15-17, and after another skip, slot 4 fills up with the range 20-30. Now we want to insert the value 32, but there are no free slots left. We bump the scale factor up to 2, which means that skips of 2 or less are treated as contiguous. (Think of it as rounding, coarsening the granularity, or shifting off one bit.) The first slot covering 1-5 is now considered contiguous with the second slot at 7-12, so they're combined into one slot. The next slot remains unchanged, and the last slot 20-30 is contiguous with 32, so it becomes 20-32.

So we started with this IDL:
  4,1   [1-5] [7-12] [15-17] [20-30]   4 slots, scale factor 1

And ended with this IDL:
  3,2   [1-12] [15-17] [20-32]         3 slots, scale factor 2
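The coarsening step above can be sketched in a few lines of C. This is a toy illustration under one reading of the post: at scale s, two ranges merge when next.lo - prev.hi <= s; the types, names, and the 4-slot capacity are all hypothetical, and the append helper assumes IDs arrive in increasing order as in the example:

```c
#include <assert.h>

#define IDL_MAXRANGES 4            /* tiny capacity to mirror the example */

typedef unsigned long ID;

typedef struct range_idl {
    unsigned nranges;              /* slots in use */
    unsigned scale;                /* current scale factor */
    struct { ID lo, hi; } r[IDL_MAXRANGES];
} range_idl;

/* Re-merge under a coarser scale: a skip of new_scale or less is now
 * treated as contiguous, so such neighboring ranges collapse together. */
static void idl_coarsen(range_idl *idl, unsigned new_scale)
{
    unsigned i, out = 0;
    for (i = 1; i < idl->nranges; i++) {
        if (idl->r[i].lo - idl->r[out].hi <= (ID)new_scale)
            idl->r[out].hi = idl->r[i].hi;     /* absorb into previous */
        else
            idl->r[++out] = idl->r[i];
    }
    idl->nranges = out + 1;
    idl->scale = new_scale;
}

/* Append an ID larger than everything stored (as in the example);
 * when no slot is free, bump the scale factor and retry. */
static void idl_append(range_idl *idl, ID id)
{
    for (;;) {
        if (idl->nranges &&
            id - idl->r[idl->nranges - 1].hi <= (ID)idl->scale) {
            idl->r[idl->nranges - 1].hi = id;  /* extends the last range */
            return;
        }
        if (idl->nranges < IDL_MAXRANGES) {
            idl->r[idl->nranges].lo = idl->r[idl->nranges].hi = id;
            idl->nranges++;
            return;
        }
        idl_coarsen(idl, idl->scale + 1);      /* full: coarsen and retry */
    }
}
```

Running the worked example through this sketch - start with 4 slots [1-5] [7-12] [15-17] [20-30] at scale 1, then append 32 - yields exactly the 3-slot, scale-2 result above: the coarsen pass merges 1-5 with 7-12, leaves 15-17 alone, and the retry extends 20-30 to 20-32.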

Obviously for the real implementation we would use more than 4 slots. But I don't think a fixed-size implementation like we have now is useful for the database sizes we're seeing more and more often. Nor is collapsing it all into a single range very helpful for indexed lookups; the quality of the indices really suffers. If we use a dynamic scaling approach like I'm describing, there will still be some loss of granularity, but the effectiveness of the indices will degrade much more slowly as database sizes grow.

 -- Howard Chu
 Chief Architect, Symas Corp.       Director, Highland Sun
 http://www.symas.com               http://highlandsun.com/hyc
 Symas: Premier OpenSource Development and Support