
Re: slapadding a large bdb database



On Mon, Aug 05, 2002 at 11:13:22AM -0700, Howard Chu wrote:
% > Of the 458,904 entries in the ldif, I can only "find" 31,729 of them.
% 
% Not sure what to make of that. If you slapcat the database does the output
% match the LDIF files that you fed into slapadd?

Yup, if I slapcat the database, the number of entries matches exactly.
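
One rough way to make that comparison, sketched in Python: count the lines
that start a DN in each file. The function name and the sample data below
are made up for illustration, and the sketch assumes plain LDIF content
records of the kind slapcat writes.

```python
import io


def count_ldif_entries(lines):
    """Count entries in an LDIF stream by counting lines that begin a DN."""
    count = 0
    for line in lines:
        # "dn:" starts a plain DN and "dn::" a base64-encoded one; both match.
        # Continuation lines begin with a space, so they never match.
        if line.lower().startswith("dn:"):
            count += 1
    return count


# Made-up sample data; in practice pass open file objects for the original
# LDIF and for the slapcat output, then compare the two counts.
sample = io.StringIO(
    "dn: o=frontier\n"
    "objectClass: organization\n"
    "o: frontier\n"
    "\n"
    "dn: cn=test,o=frontier\n"
    "objectClass: person\n"
    "cn: test\n"
    "sn: test\n"
)
print(count_ldif_entries(sample))  # prints 2
```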

For laughs, I threw a larger spindle into this machine and added the entire
ldif in one go. It only returns 263,818 entries in this case (out of ~450k).
Removing all the bdb files other than id2entry and reindexing yields the
same results.

What's also interesting is that I get lots (~45,000) of these:

Aug  8 16:59:47 oh.roc.frontiernet.net slapadd: bdb(o=frontier): Duplicate
data items are not supported with sorted data

when *adding* with slapadd. (I started with a pristine environment - no
files whatsoever.)

Lastly, I noticed that reindexing ~450k entries (330MB of data in ldif
format with about two dozen indices) generates about 6-8GB of BDB
transaction logs. I haven't checked the source: do slap{add,index} perform
all their operations in a single transaction? If so, is there any way to
break these up, perhaps by having slapadd/slapindex use multiple
transactions (i.e., every x entries, recover the database to free up some
transaction logs and start a new transaction)? If not, could I modify the
source to do something like this? I'd like to cut down on disk usage by
purging some transaction logs while slap{add,index} is running.
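
The batching idea can be sketched roughly as follows. This is Python, not
the C that slapadd is written in, and it is only an illustration of the
control flow: begin, add, commit and checkpoint are hypothetical callables
standing in for whatever the backend would really call, and nothing here
reflects how slapadd or slapindex currently work. The assumption is that
once a batch is committed and checkpointed, the older log files are no
longer needed and could be removed while the load continues.

```python
def load_in_batches(entries, batch_size, begin, add, commit, checkpoint):
    """Add entries using one transaction per batch instead of one overall.

    begin, add, commit and checkpoint are hypothetical callables supplied
    by the caller. Returns the number of transactions committed.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    txn = None      # current open transaction, if any
    pending = 0     # entries added in the current transaction
    batches = 0     # transactions committed so far

    for entry in entries:
        if txn is None:
            txn = begin()
        add(txn, entry)
        pending += 1
        if pending == batch_size:
            commit(txn)
            checkpoint()  # assumed point at which old logs become removable
            txn = None
            pending = 0
            batches += 1

    # Commit whatever is left over in a final, smaller batch.
    if txn is not None:
        commit(txn)
        checkpoint()
        batches += 1
    return batches


# Stand-in callables that only record what happened.
events = []
n = load_in_batches(
    range(5),
    2,
    begin=lambda: "txn",
    add=lambda txn, entry: events.append(("add", entry)),
    commit=lambda txn: events.append("commit"),
    checkpoint=lambda: events.append("checkpoint"),
)
print(n)  # prints 3
```

The batch size is the "x" above: smaller values would free logs sooner at
the cost of more commits and checkpoints.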

john
-- 
John Morrissey          _o            /\         ----  __o
jwm@horde.net        _-< \_          /  \       ----  <  \,
www.horde.net/    __(_)/_(_)________/    \_______(_) /_(_)__