[Date Prev][Date Next] [Chronological] [Thread] [Top]

ITS#3611 Index clustering patch for fast slapadd

Some more thoughts on implementing this feature...

Again, rather than change the current IDL format, I think we can just use some level of IDL caching for slapadd -q. In this case, we would always allocate IDLs of BDB_IDL_DB_MAX size and use e.g. bdb_idl_append_one() on the cached IDLs.

I'm not sure if the regular idlcachesize parameter is appropriate here, but it makes sense to start with that for experimenting and introduce a new config parameter later if it's needed. Once the LRU limit is reached, flush the oldest cache blocks out to the database before freeing/re-using them.

As an added refinement, add a "previous size" field to the IDL cache entry, set to the length of the IDL the last time it was read from disk. Then on subsequent flushes only the IDs after prev_size need to be written to disk.

On a side note, I wasn't too happy with the bit vector results I was getting; the in-memory structure I proposed before was too slow. I'm stopping that investigation for now. Also I was only using it in hdb_dn2idl, and needed to convert back to regular IDL format before returning, which was expensive. I suppose a future attempt that uses vectors throughout the backend may be less of a problem. However, the biggest problem seemed to be that I had very long vectors that were mostly empty, and no fast way to get to only the set bits.

It occurs to me now that we can use lookup tables to reduce the enumeration time. E.g., we can enumerate 3 bits at a time with an 8 row table:
0: 0
1: 1
2: 2
3: 1,2
4: 3
5: 1,3
6: 2,3
7: 1,2,3

(For the actual implementation 4 or 8 bits / 16 or 256 rows would make more sense.)
The original code I used simply shifted through 32 bit positions, breaking if the shifted value was zero, which was definitely too slow.

 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sun        http://highlandsun.com/hyc
 OpenLDAP Core Team            http://www.openldap.org/project/