
slap tools notes



I never gave much thought to the slap tools before, but running them can
obviously be a big hassle with a large database. Investigating ITS#2499 led
me down some interesting paths...

It appears that you may want 3 separate disks, not just 2, to run back-bdb at
top speed, since so much disk I/O occurs in the BDB environment files. Or
throw the environment into shared memory.

I had some concerns about data recoverability using shared memory, but it
appears that this is not a significant issue. The log files carry enough
information to perform a recovery regardless of whether the environment is
still intact. So as long as the log is sync'd up, you're OK.

In the absence of shared memory, you need some way to keep reads and writes
to the environment and to the database files from thrashing the disk. This
requires putting the environment on a separate disk from the database files.
If you only have two disks, the simplest thing to do is to put the
environment where the log files reside. You also have to run with
DB_TXN_NOSYNC or else the log flushes will get you...

Another thing that can save on cache I/O is to let BDB use mmap to access the
id2entry database, instead of reading it into the cache. mmap will only be
used if three conditions hold: the database was opened ReadOnly, it hasn't
already been (partially) read into the cache, and it is smaller than the
limit set by DB_ENV->set_mp_mmapsize, which defaults to 10MB.
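
To make the three mmap conditions concrete, here is a small Python sketch of
the check as described above. It is only an illustration of the rule, not
BDB's actual code; the function name, its parameters, and the exact byte
value used for the "10MB" default are my own assumptions.

```python
# Approximate default from the text; the exact byte count is an assumption.
DEFAULT_MMAPSIZE = 10 * 1024 * 1024


def mmap_eligible(db_size, read_only, in_cache, mmapsize=DEFAULT_MMAPSIZE):
    """Return True only if all three conditions described above hold.

    db_size   -- size of the database file in bytes
    read_only -- True if the database was opened ReadOnly
    in_cache  -- True if any of it has already been read into the cache
    mmapsize  -- the limit configured via set_mp_mmapsize
    """
    return read_only and not in_cache and db_size < mmapsize


# The 823MB id2entry.bdb from this post is far over the default limit,
# but would qualify with a limit of 1GB.
id2entry_size = 823 * 1024 * 1024
print(mmap_eligible(id2entry_size, True, False))
print(mmap_eligible(id2entry_size, True, False, mmapsize=1024 ** 3))
```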

I just patched back-bdb to allow opening the id2entry and dn2id databases in
ReadOnly mode when running slapcat and slapindex, so that takes care of the
first condition. For the second condition, it might be useful just to create
a new environment in a different location from where slapd normally runs.
Obviously this must be done with no other processes using the database. On
Solaris using /var/run works well for this purpose; since /var/run is a tmpfs
filesystem backed by swap it's essentially a RAMdisk and performs very fast.
For the third condition, look at the sizes of your dn2id and id2entry
databases and set a large enough value in the DB_CONFIG file in your
temporary environment.
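
As a rough sketch of the third step, the following Python picks a
set_mp_mmapsize value from the database file sizes and writes it to a
DB_CONFIG file in the temporary environment directory. The helper names, the
10% headroom, and the paths in the usage note are assumptions for
illustration only; set_mp_mmapsize is the DB_CONFIG form of the call named
above.

```python
import os


def required_mmapsize(db_paths, headroom=1.1):
    """Return a byte limit larger than the biggest database file.

    The limit applies per file, so only the largest file matters.
    The headroom factor is an arbitrary safety margin (an assumption).
    """
    largest = max(os.path.getsize(p) for p in db_paths)
    return int(largest * headroom) + 1


def write_db_config(env_dir, mmapsize):
    """Write a DB_CONFIG containing only the mmap size limit."""
    path = os.path.join(env_dir, "DB_CONFIG")
    with open(path, "w") as f:
        f.write("set_mp_mmapsize %d\n" % mmapsize)
    return path
```

Usage would look something like
required_mmapsize(["/your/db/dir/dn2id.bdb", "/your/db/dir/id2entry.bdb"])
followed by write_db_config("/var/run/your-temp-env", size), with both paths
replaced by your own locations.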

If you use an alternate environment like this, you will of course need to use
db_recover to reset your main environment before restarting slapd.
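
A minimal sketch of that reset step, assuming db_recover is on your PATH and
matches the BDB version slapd is linked against. The function names are
hypothetical; -h is db_recover's option for naming the environment home
directory. Run it only while slapd is stopped.

```python
import subprocess


def recover_cmd(env_home):
    """Build the db_recover command line for the given environment home."""
    return ["db_recover", "-h", env_home]


def run_recover(env_home):
    """Run db_recover and raise CalledProcessError if it fails."""
    subprocess.run(recover_cmd(env_home), check=True)
```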

With a 280,000 entry database that uses 823MB in id2entry.bdb and 50-some
indexes, it took 11 hours to run slapindex using the original environment. By
switching the environment to reside in /var/run, it took only ~1 hour, 15
minutes. In the 11 hour run slapindex generally got only 2% of the CPU; the
rest was I/O wait. With the alternate environment, slapindex generally got
97% of the CPU.
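
For a sense of scale, the arithmetic on those two run times works out as
follows (taking the "~1 hour, 15 minutes" figure as exactly 75 minutes, which
is an approximation):

```python
original_minutes = 11 * 60       # slapindex with the original environment
alternate_minutes = 1 * 60 + 15  # slapindex with the environment in /var/run

speedup = original_minutes / alternate_minutes
print("speedup: %.1fx" % speedup)  # prints "speedup: 8.8x"
```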

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support