
Re: Faster slapadd in presence of multiple indices?

To: Jonghyuk Choi <jongchoi@us.ibm.com>
Subject: Re: Faster slapadd in presence of multiple indices?
From: Howard Chu <hyc@symas.com>
Date: Wed, 07 Sep 2005 17:59:24 -0700
Cc: ando@sys-net.it, openldap-devel@OpenLDAP.org, owner-openldap-devel@OpenLDAP.org
In-reply-to: <OFDB59F6B9.5E2B9485-ON85257075.0054D0D6-85257075.00562D6F@us.ibm.com>
References: <OFDB59F6B9.5E2B9485-ON85257075.0054D0D6-85257075.00562D6F@us.ibm.com>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b4) Gecko/20050829 SeaMonkey/1.1a

Well, now that I'm developing on a dual-core system (AMD64 X2) I guess I'm a bit more interested in re-examining the question, since I now see my system is 50% idle while slapadd is running...

My previous experiment just used two separate threads, one for reading/parsing the LDIF input and one for writing the entry+index data. I suppose, in the context of -q with transactions disabled, we could consider using a separate thread per indexed attribute. This implies activating the thread-pool code in tool mode, but the current code assumes there is no thread-pool in tool mode.
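To make the thread-per-indexed-attribute idea concrete, here is a rough sketch in Python rather than slapd's C. Nothing in it is OpenLDAP code: `parse_ldif`, `index_worker`, and `load` are hypothetical names, the LDIF parsing is deliberately naive (no continuation lines, no base64), and plain dicts stand in for the entry and index databases. Note too that CPython threads will not actually run the indexers in parallel on CPU-bound work; the sketch only shows the hand-off structure.

```python
import queue
import threading

SENTINEL = None  # marks end of input on each worker's queue


def parse_ldif(text):
    """Yield entries as {attr: [values]}. Simplified: blank-line-separated
    blocks of 'attr: value' lines only."""
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            attr, _, value = line.partition(": ")
            entry.setdefault(attr, []).append(value)
        yield entry


def index_worker(attr, inbox, index):
    """Own one attribute's index; no locking needed because no other
    thread touches this dict."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            break
        entry_id, entry = item
        for value in entry.get(attr, []):
            index.setdefault(value, []).append(entry_id)


def load(text, indexed_attrs):
    """Parse in the calling thread, fan entries out to one worker per
    indexed attribute, and return (entries, indices)."""
    indices = {a: {} for a in indexed_attrs}
    inboxes = {a: queue.Queue(maxsize=64) for a in indexed_attrs}
    workers = [
        threading.Thread(target=index_worker, args=(a, inboxes[a], indices[a]))
        for a in indexed_attrs
    ]
    for w in workers:
        w.start()

    entries = {}
    for entry_id, entry in enumerate(parse_ldif(text), start=1):
        entries[entry_id] = entry          # stand-in for the entry write
        for q in inboxes.values():
            q.put((entry_id, entry))       # blocks if an indexer falls behind

    for q in inboxes.values():
        q.put(SENTINEL)
    for w in workers:
        w.join()
    return entries, indices
```

The bounded queues are the one design choice worth noting: they keep the parser from running arbitrarily far ahead of a slow indexer, which is roughly the back-pressure a real implementation would need once the cache starts flushing.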

Since it looks like a large slapadd is mostly CPU bound (assuming a large enough BDB cache) this ought to be a win. And once the cache gets full and starts flushing pages, it should also be a win as we get past I/O waits. I guess I should test with a smaller cache (thus more frequent I/O) and see how it goes.

Jonghyuk Choi wrote:

Actually I did examine this issue too when I was striving to enhance the performance of slapadd a few months ago. What I considered at that time was to combine concurrent slapadding with the index clustering approach (ITS#3611). Index clustering consists of two phases: in-memory index creation and batched writing of the indices to the DB.

If the two phases can be arranged to overlap, it should be possible to keep the CPU busy performing in-memory index creation during the time it would otherwise sit idle waiting for I/O completion. In this way, it would be possible to improve the performance of slapadd even on a single-CPU system (in addition to multi-CPU systems) if I/O bandwidth is abundant (which is true especially when the index clustering approach is used).

This has been on my to-do list since before I switched to examining Berkeley DB. I guess this has good potential to further improve the performance of slapadd beyond what has been achieved by the BDB scalability patch.

- Jong-Hyuk
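The overlap of the two index-clustering phases described in this message could look roughly like the sketch below. It is only an illustration under stated assumptions, not the ITS#3611 code: `build_index` and `flush_batches` are hypothetical names, the `store` dict stands in for the on-disk index database, and `batch_size` is an arbitrary tuning knob. One thread accumulates index entries in memory while a second thread writes the previously filled batch.

```python
import queue
import threading


def flush_batches(batches, store):
    """Writer thread: merge each finished in-memory batch into the store.
    Keys are written in sorted order to mimic clustered (sequential) writes."""
    while True:
        batch = batches.get()
        if batch is None:              # end-of-input marker
            break
        for key in sorted(batch):
            store.setdefault(key, []).extend(batch[key])


def build_index(pairs, batch_size, store):
    """Phase 1 (this thread): collect (key, entry_id) pairs in memory.
    Phase 2 (writer thread): flush the previous batch at the same time."""
    batches = queue.Queue(maxsize=1)   # at most one batch waiting to be written
    writer = threading.Thread(target=flush_batches, args=(batches, store))
    writer.start()

    current, count = {}, 0
    for key, entry_id in pairs:
        current.setdefault(key, []).append(entry_id)
        count += 1
        if count >= batch_size:
            batches.put(current)       # hand off; blocks only if the writer is behind
            current, count = {}, 0

    if current:
        batches.put(current)           # final partial batch
    batches.put(None)
    writer.join()
    return store
```

With a single writer and a FIFO queue, batches reach the store in the order they were filled, so the entry IDs under each key stay in insertion order. Whether the overlap pays off in practice depends on the writer actually spending its time blocked on I/O, which is the open question in this thread.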
------------------------
Pierangelo Masarati wrote:
> I haven't investigated this issue yet but would exploiting multiple CPUs
> allow faster slapadds (with -q, i.e. with less consistency checks) if, for
> instance, the entry and the indices are generated concurrently? Much like
> the ancient ldif2index. This comes from the consideration that on the
> machine we're testing giant slapadds, we have 75% to 87.5% of the CPU
> idle...
>
> Does anybody see any big stopper to this approach?
>

I actually added multi-threading code to slapadd in a previous experiment. It's probably still in CVS, but I removed the code because it yielded no measurable improvement. I think the problem is that this will be I/O bound regardless of what CPU resources are available.


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sun        http://highlandsun.com/hyc
 OpenLDAP Core Team            http://www.openldap.org/project/
