Re: slapadd/slapindex

Howard Chu wrote:
Some observations regarding slapadd performance... The ideal is to have enough memory to configure a BDB cache large enough for all of the database files. Failing that, it's best to run slapadd and slapindex separately.

For my test database, with a 360MB input LDIF, 285,000 entries, and 15 indexed attributes, using a 512MB BDB cache, slapadd -q with indexing took 1 hour 20 minutes.
With the IDL caching patch in HEAD, and IDL cachesize at 50,000, this dropped to 1 hour even.

Running slapadd -q with no indexing took only 1 minute 15 seconds.

The resulting id2entry database is about 800MB; with all indexing the total size is around 2.1GB.

Running slapindex with this BDB environment is pretty slow. But, by setting BDB to mmap files of 800MB or less, and deleting the environment so that id2entry is mmap'd directly instead of being double-buffered through the BDB cache, the slapindex -q time drops to 26 minutes without IDL caching, and 20 minutes with caching.
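For anyone wanting to try the same settings, here is a rough sketch of how the cache and mmap sizes above might be turned into DB_CONFIG lines. It assumes the sizes are binary megabytes (MiB), a single cache region, and that the mmap limit is set through BDB's set_mp_mmapsize parameter; the helper name db_config_lines is made up for illustration, and the 512MB/800MB pairing is only an example.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def db_config_lines(cache_mb, mmap_mb):
    """Return DB_CONFIG lines for a cache size and an mmap size limit.

    Both arguments are in MiB (an assumption; the post only says "MB").
    set_cachesize takes <gbytes> <bytes> <ncache>; one cache region is assumed.
    """
    gbytes, rest = divmod(cache_mb * MIB, GIB)
    return [
        f"set_cachesize {gbytes} {rest} 1",
        f"set_mp_mmapsize {mmap_mb * MIB}",
    ]


if __name__ == "__main__":
    # Example values: the 512MB cache and the 800MB mmap limit from the post.
    for line in db_config_lines(cache_mb=512, mmap_mb=800):
        print(line)
```

The printed lines would go into the DB_CONFIG file in the database directory before the environment is recreated. Whether a given file is actually mmap'd still depends on how BDB opens it, so treat this as a starting point rather than a recipe.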

Using two threads (my machine is dual-core) the slapindex -q time is now only 10 minutes, using a BDB cache of 768MB and no IDL caching. Adding IDL caching here slows it down to 15 minutes, so I've decided to disable that bit of code by default.
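To put the numbers so far side by side, the following small sketch adds up the two-step approach and compares it with the combined run. The timings are the ones quoted in this message; the dictionary labels and helper names are made up for illustration, and the total assumes the two steps are simply run back to back, even though the slapadd and slapindex figures were measured with different cache settings.

```python
# Timings quoted in this message, in minutes.
timings_min = {
    "slapadd -q with indexing": 80.0,
    "slapadd -q with indexing, IDL cache": 60.0,
    "slapadd -q, no indexing": 1.25,
    "slapindex -q, mmap, 1 thread": 26.0,
    "slapindex -q, mmap, 1 thread, IDL cache": 20.0,
    "slapindex -q, 2 threads": 10.0,
    "slapindex -q, 2 threads, IDL cache": 15.0,
}


def two_step_total(index_label):
    """Minutes for slapadd without indexing followed by the given slapindex run."""
    return timings_min["slapadd -q, no indexing"] + timings_min[index_label]


def speedup(baseline_min, candidate_min):
    """How many times faster the candidate is than the baseline."""
    return baseline_min / candidate_min


if __name__ == "__main__":
    baseline = timings_min["slapadd -q with indexing"]
    for label in timings_min:
        if label.startswith("slapindex"):
            total = two_step_total(label)
            print(f"{label}: {total:.2f} min total, {speedup(baseline, total):.1f}x")
```

With the two-thread slapindex run, the two steps total 11 minutes 15 seconds, roughly 7 times faster than the 80-minute combined run.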

I've added a new global config parameter "tool-threads" (olcToolThreads) to control how many threads the indexer will use. The default value is 1. For multiple threads, all of the attributes that need indexing in an entry are divided up among the threads, so only one entry is in progress at a time. In my tests there was no advantage to using more threads than there are processors. More testing would be welcome...
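The division of work described above can be illustrated with a minimal Python sketch. This is not the slapd code: the round-robin split, the names partition_attrs and index_all, and the index_one callback are all assumptions for illustration. It only shows the shape of the scheme, where each thread takes a fixed slice of the indexed attributes and every slice finishes for one entry before the next entry starts.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_attrs(attrs, n_threads):
    """Split the indexed attributes into one slice per thread (round-robin, assumed)."""
    return [attrs[i::n_threads] for i in range(n_threads)]


def _index_slice(entry, attr_slice, index_one):
    """Index one thread's share of the attributes for a single entry."""
    for attr in attr_slice:
        index_one(entry, attr)


def index_all(entries, attrs, n_threads, index_one):
    """Index entries one at a time, spreading each entry's attributes over threads."""
    slices = partition_attrs(attrs, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for entry in entries:
            futures = [
                pool.submit(_index_slice, entry, attr_slice, index_one)
                for attr_slice in slices
            ]
            # Wait for every slice, so only one entry is in progress at a time.
            for future in futures:
                future.result()


if __name__ == "__main__":
    attrs = [f"attr{i}" for i in range(15)]  # 15 indexed attributes, as in the test database
    seen = []
    index_all(["entry1", "entry2"], attrs, 2, lambda e, a: seen.append((e, a)))
    print(len(seen), [len(s) for s in partition_attrs(attrs, 2)])
```

Since the work is split per attribute, threads beyond the number of indexed attributes would have nothing to do, and threads beyond the number of processors just compete for CPU, which fits the observation that extra threads gave no advantage.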

 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sun        http://highlandsun.com/hyc
 OpenLDAP Core Team            http://www.openldap.org/project/