
RE: Indexing performance



Howard,
Thanks for your reply.
Our DB_CONFIG looks like this:

set_cachesize 3 0 1
set_lg_regionmax 262144
set_lg_bsize 2097152
set_lg_dir /archives/ldap_sec
set_data_dir /ldap_sec_data
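As a quick sanity check of what this DB_CONFIG requests, the sketch below parses the directives and converts the cache and log buffer settings into sizes. It assumes the standard Berkeley DB meaning of set_cachesize (gigabytes, bytes, number of cache regions). The function names are illustrative only, not part of any OpenLDAP or BDB tool, and a directive that appears twice (for example a second set_data_dir) would simply overwrite the first in this simplified parser.

```python
def parse_db_config(text):
    """Map each DB_CONFIG directive name to its list of arguments (last one wins)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '#' comments.
        if not line or line.startswith("#"):
            continue
        name, *args = line.split()
        settings[name] = args
    return settings


def cache_bytes(settings):
    """Requested cache size from 'set_cachesize <gbytes> <bytes> <ncache>'."""
    gbytes, nbytes, _ncache = (int(x) for x in settings["set_cachesize"])
    return gbytes * 2**30 + nbytes


# The DB_CONFIG quoted above.
DB_CONFIG = """\
set_cachesize 3 0 1
set_lg_regionmax 262144
set_lg_bsize 2097152
set_lg_dir /archives/ldap_sec
set_data_dir /ldap_sec_data
"""

cfg = parse_db_config(DB_CONFIG)
print("cache:", cache_bytes(cfg) // 2**20, "MB")                  # cache: 3072 MB
print("log buffer:", int(cfg["set_lg_bsize"][0]) // 2**20, "MB")  # log buffer: 2 MB
```

In other words, this file asks for a single 3GB cache region and a 2MB in-memory log buffer.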

As you can see, we've divided the home, log and data directories across three
separate file systems (separate physical drive arrays).

Running db_stat -m shows no caching problems. Can you explain the
procedure you mentioned:

The development code has added an option to store the cache in shared
memory instead. This can be better if your system can accommodate a large
enough shared memory region.

We would like to give this a try!

Thanks

Todd M Leone

> -----Original Message-----
> From: owner-openldap-software@OpenLDAP.org
> [mailto:owner-openldap-software@OpenLDAP.org]On Behalf Of Leone, Todd

> Currently we're using version 2.17. When I index ~400,000 entries with
> 10 eq indexes it takes 17 minutes, but when I add one sub index it takes
> 4 hours. Is the sub indexing process improved in 2.19? If not, is it
> something that's being addressed?

Substring indexing writes a large amount of data. There's no real way to
reduce the data volume involved. The only way to speed this up is with
careful tuning of the BDB configuration. Use "db_stat -m" to see how
your BDB cache is performing; if you see non-zero values for "pages
forced from cache" then the cache is probably too small. The slapindex
process is extremely cache, memory and I/O intensive because it reads
the entire database and writes to every index. You'll also get the best
speed with a large log buffer size and with NoSync.
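The "pages forced from cache" check can be scripted. This is a minimal sketch that assumes db_stat -m prints each statistic as a value followed by its description on the same line; the helper names and the sample text are hypothetical (the sample is shaped like db_stat output, not captured from a real run). Values are compared as strings so that any abbreviated count still registers as non-zero.

```python
import subprocess


def read_db_stat(home):
    """Run `db_stat -m` against the BDB environment in `home` (not called below)."""
    result = subprocess.run(["db_stat", "-m", "-h", home],
                            capture_output=True, text=True, check=True)
    return result.stdout


def forced_page_counts(db_stat_output):
    """Map each 'pages forced from (the) cache' description to its value string."""
    counts = {}
    for line in db_stat_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue
        value, desc = parts
        lowered = desc.lower()
        if "forced from" in lowered and "cache" in lowered:
            counts[desc] = value
    return counts


def cache_too_small(db_stat_output):
    """True if any 'forced from cache' counter is non-zero."""
    return any(v != "0" for v in forced_page_counts(db_stat_output).values())


# Hypothetical sample shaped like db_stat -m output.
SAMPLE = ("1\tNumber of caches\n"
          "0\tClean pages forced from the cache\n"
          "0\tDirty pages forced from the cache\n")
print(cache_too_small(SAMPLE))  # False
```

Against a live environment one would call cache_too_small(read_db_stat(home)), where home is the directory containing DB_CONFIG.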

In the FAQ http://www.openldap.org/faq/index.cgi?file=893 I recommend a
log buffer size of 2MB to go with the default log file size of 10MB. If
you're indexing a lot of attributes, the log volume generated by
indexing a single entry may actually exceed 2MB, so this can be a factor
in performance as well.
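The interaction between per-entry log volume and the 2MB buffer can be put into rough numbers. Under a simplified model (an assumption for illustration, not taken from the BDB documentation) in which the in-memory buffer must be written out each time it fills, one entry's log records cause ceil(volume / buffer size) buffer writes. The constant and function names are hypothetical.

```python
import math

LG_BSIZE = 2 * 2**20   # 2MB log buffer (set_lg_bsize 2097152), as the FAQ recommends
LG_MAX = 10 * 2**20    # 10MB default log file size, as stated above


def buffer_fills(log_bytes_per_entry, bsize=LG_BSIZE):
    """Times one entry's log records fill the buffer (simplified model)."""
    return math.ceil(log_bytes_per_entry / bsize)


print(buffer_fills(0.5 * 2**20))  # 1: a 0.5MB entry fits in one buffer
print(buffer_fills(6 * 2**20))    # 3: a 6MB entry fills the buffer three times
```

The 6MB case matches the worst single-entry log volume mentioned below; with these defaults such an entry also takes up more than half of one 10MB log file.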

You should use top or iostat to monitor I/O load on the system while
slapindex runs. If you see a large percentage of time being spent in IO
wait, then something isn't configured right or maybe the database is
just too large for the available memory. Because the BDB cache is stored
on disk, the disk where the cache resides can become a major bottleneck.
(The development code has added an option to store the cache in shared
memory instead. This can be better if your system can accommodate a large
enough shared memory region.) The FAQ recommends storing the BDB log
files on a separate disk from the database files. It turns out that you
also want the BDB cache file, and thus the entire BDB environment home
directory, to live on a separate disk from the database files, because
so much of the I/O consists of transferring data pages between the
database files and the BDB cache. When both are on the same disk, the
seek overhead kills throughput.
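One hedged way to check this layout is to compare the device IDs that os.stat reports for the environment home, the data directory and the log directory. A distinct st_dev only shows a distinct filesystem, not a distinct physical drive: two partitions on the same array would pass this check. The helper name is hypothetical.

```python
import os
from itertools import combinations


def shared_filesystems(home, data_dir, log_dir):
    """Return pairs of BDB directories that resolve to the same filesystem.

    os.stat().st_dev identifies a filesystem, not a physical drive.
    """
    dirs = {"data": data_dir, "home": home, "log": log_dir}
    devices = {name: os.stat(path).st_dev for name, path in dirs.items()}
    return [(a, b) for a, b in combinations(sorted(devices), 2)
            if devices[a] == devices[b]]


# Placeholder home path; the data and log paths are the ones from the DB_CONFIG above.
# shared_filesystems("/path/to/bdb/home", "/ldap_sec_data", "/archives/ldap_sec")
```

For a layout like the three-filesystem one described at the start of this thread, the call should return an empty list. A home directory moved onto tmpfs would likewise show up with its own device ID.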

On one database that consumes about 1GB in id2entry and dn2id, with
50-some attributes indexed, it took over 12 hours to run slapindex. By
moving the BDB cache onto a memory-based filesystem (tmpfs, RAMdisk,
whatever you want to call it), the time dropped to 1 hour and 15
minutes. In a few instances I saw a single entry generate over 6MB of
index updates in the transaction log (multivalued attributes with lots
of values, all substring indexed) but these were pretty rare.

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support