Re: need suggestion on indexing big directory
Quanah Gibson-Mount wrote:
> Note, that in *repeated* tests I've done, it was always quicker to
> "slapcat" the entire database and then "slapadd" it back in, than to
> run slapindex. There was some work done at one point to fix this
> problem; I don't recall if it made it into 2.2 or not. IIRC there
> were some unintended side effects, and it was put off for now.
Yes, a few different approaches were tried, none with any positive
effect. Testing for the existence of an entry (so that a redundant add
can be skipped) took as much execution time as just blindly adding it
and catching the error code when the entry already exists.
It's clear that adding an item that already exists in BDB is not a
no-op; in several cases the size of the underlying database changed even
though the transaction was aborted. I would say this is possibly a BDB
bug, but it's hard to trace the real reasons.
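The trade-off described above (probe for existence first, versus insert
unconditionally and catch the duplicate-key error) is a general pattern.
A minimal illustrative sketch, using SQLite rather than BDB and made-up
table/column names, just to show the two strategies side by side:

```python
# Illustrative only -- not OpenLDAP or BDB code.  The "entries" table
# and its columns are invented for this sketch.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, dn TEXT)")
conn.execute("INSERT INTO entries VALUES (1, 'dc=example,dc=com')")

def add_check_first(conn, eid, dn):
    # Strategy 1: probe for existence, then insert -- two operations
    # per add in the common case.
    row = conn.execute("SELECT 1 FROM entries WHERE id = ?", (eid,)).fetchone()
    if row is None:
        conn.execute("INSERT INTO entries VALUES (?, ?)", (eid, dn))
        return True
    return False

def add_blind(conn, eid, dn):
    # Strategy 2: insert unconditionally and catch the duplicate-key
    # error -- one operation in the common case.
    try:
        conn.execute("INSERT INTO entries VALUES (?, ?)", (eid, dn))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_check_first(conn, 1, "dc=example,dc=com"))  # False: already present
print(add_blind(conn, 2, "dc=new,dc=com"))            # True: inserted
```

As the timings above suggest, in BDB the probe costs roughly as much as
the failed insert, so strategy 2 saved nothing.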
Some comparisons on a 330,000-entry DB:
- slapindex, after changing a single attribute to have "sub" as well
  as "eq" indexing: over 26 hours
- slapcat then slapadd of the same DB, with memory cache:
  approx. 2 hours
- slapcat then slapadd of the same DB, with disk cache:
  approx. 6 hours
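For reference, the dump-and-reload procedure might look like the
following sketch. The paths, the backup location, and the init commands
are assumptions for your site, not part of the original post; slapd
must be stopped for the duration:

```shell
# Sketch of a dump-and-reload reindex; adjust paths for your site.
set -e
/etc/init.d/slapd stop                # or your init system's equivalent
slapcat -f /etc/openldap/slapd.conf -l /tmp/backup.ldif
mv /var/lib/ldap /var/lib/ldap.old    # keep the old DB until verified
mkdir /var/lib/ldap
cp /var/lib/ldap.old/DB_CONFIG /var/lib/ldap/   # preserve cache tuning
slapadd -f /etc/openldap/slapd.conf -l /tmp/backup.ldif
/etc/init.d/slapd start
```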
There's also the fact that slapindex places a doubled demand on the BDB
cache - it requires the entry information in the database to be loaded,
crunched into index data, and then written out to the index databases.
In the slapadd case, the entry information is read as plain text, so the
demand on the BDB cache is much less. It can all be used for deferred
writes, whereas for slapindex it must serve both reads and writes. Some
of the overhead can be avoided if the entry information is mapped in
directly from its database files, instead of being copied into the BDB
cache. However, BDB generally will not memory-map files over a certain
size, and it won't do it at all if the file has already been used with
the main cache. So to take advantage of memory mapping, first you would
need to raise the size limit (in DB_CONFIG) to accommodate your id2entry
database, and then you would need to run db_recover to flush out all the
current id2entry pages, before running slapindex.
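Concretely, the two steps might look like this (the 2 GB figure and the
database directory are illustrative assumptions; pick a limit larger
than your id2entry.bdb):

```
# In the database directory's DB_CONFIG -- allow BDB to memory-map
# files up to 2 GB (value is in bytes):
set_mp_mmapsize 2147483648

# Then, with slapd stopped, flush the currently cached pages and
# reindex:
#   db_recover -h /var/lib/ldap
#   slapindex
```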
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
Symas: Premier OpenSource Development and Support