[Date Prev][Date Next] [Chronological] [Thread] [Top]

(ITS#3611) Index clustering patch for fast slapadd



Full_Name: Jong-Hyuk Choi
Version: HEAD
OS: SLES9
URL: ftp://ftp.openldap.org/incoming/fastslapadd1.diff
Submission from: (NULL) (66.90.4.6)


This diff is the first cut of the index clustering patch for faster directory
population which I discussed with Howard a few weeks ago. 

The index clustering is to accelerate slapadd based on the observation that IDs
are added in an ascending order during slapadd. Instead of writing to the index
database upon addition of every single ID, slapadd buffers multiple IDs to make
an ID cluster and writes the cluster as a single unit.

An IDL for an index hash will look like the following:

[Non Range IDL]
Header consisting of NOID and # of IDs in the IDL
ID cluster 1 (IDC1_1, IDC1_2, IDC1_3 ..... IDC1_n)
ID cluster 2 (IDC2_1, IDC2_2, IDC2_3 ..... IDC2_n)
ID cluster 3
...
ID cluster N

[Range IDL]
0 lo hi

The header and ID clusters are keyed with the same hash value and they are
sorted in an ascending order (NOID is treated as the smallest in the dup
comparison function). As a result, the number of writes to the index databases
can be reduced by the factor of the cluster size. For example, if the cluster
size is 1024, the number of writes to the index database can be reduced by up to
1024 and this significantly improves the performance of slapadd.

For non-tool mode addition (ldapadd), if the currently added ID is the greatest
of all IDs in the IDL, it is added as a single ID stored as the end data for the
index key. If the currently added ID is not the greatest, then it is added into
the ID cluster it belongs (where lo <= ID <= hi holds). If the size of the
cluster becomes larger than the max cluster size, it is split into half. To
delete an ID from an IDL, the corresponding ID cluster need be located. The
cluster will be read, modified, and written in order to remove the ID from the
cluster.

The current code passed all test scripts except test026. I didn't look at the
test026 case further since HEAD didn't pass it either. I'm currently evaluating
its performance while optimizing loose ends in terms of performance. I
appreciate any comments to the patch.

- Jong-Hyuk