Full_Name: Mark Adamson
Version: devel
OS: Solaris
URL: http://nil.andrew.cmu.edu/openldap/multislapindex
Submission from: (NULL) (128.2.122.223)


I was using the "slapindex" program to create new index files for my LDBM
backended database. It took about 1.5 hrs to run, to reindex a single
attribute. I had several attributes that I wanted to reindex, and it seemed
absurd to do this one at a time; the slapindex program should be able to
reindex multiple attributes in a single run.

I made changes to the latest (devel) build of slapindex to allow this, it was
fairly trivial. I ran it, giving it 6 attributes to index at once. It took
under 2 hrs to run, instead of the 9 hrs it would have taken to run them
serially.

As per the patch submission guidelines, a "diff -luN" of the changes is
available at

http://nil.andrew.cmu.edu/openldap/multislapindex


Share & Enjoy.


-Mark Adamson
 Carnegie Mellon University
moved from Incoming to Software Enhancements
> Full_Name: Mark Adamson
> Version: devel
> OS: Solaris
> URL: http://nil.andrew.cmu.edu/openldap/multislapindex
> Submission from: (NULL) (128.2.122.223)
>
>
> I was using the "slapindex" program to create new index files for my LDBM
> backended database. It took about 1.5 hrs to run, to reindex a single
> attribute. I had several attributes that I wanted to reindex, and it seemed
> absurd to do this one at a time; the slapindex program should be able to
> reindex multiple attributes in a single run.
>
> I made changes to the latest (devel) build of slapindex to allow this, it was
> fairly trivial. I ran it, giving it 6 attributes to index at once. It took
> under 2 hrs to run, instead of the 9 hrs it would have taken to run them
> serially.
>
> As per the patch submission guidelines, a "diff -luN" of the changes is
> available at
>
> http://nil.andrew.cmu.edu/openldap/multislapindex
>
>
> Share & Enjoy.
>
>
> -Mark Adamson
>  Carnegie Mellon University

This is a great idea, but I think you may be incorrect in your assumption
(unless you actually timed it) that indexing all 6 attributes would take
6x the time of the first indexing. Substring (including dn) indexing takes
far longer than indexing that is simply exact or existence.

Randy Kunkee
NeoSoft Inc.
At 10:41 PM 2/21/00 GMT, kunkee@neosoft.com wrote:
>> Full_Name: Mark Adamson
>> Version: devel
>> OS: Solaris
>> URL: http://nil.andrew.cmu.edu/openldap/multislapindex
>> Submission from: (NULL) (128.2.122.223)
>>
>>
>> I was using the "slapindex" program to create new index files for my LDBM
>> backended database. It took about 1.5 hrs to run, to reindex a single
>> attribute. I had several attributes that I wanted to reindex, and it seemed
>> absurd to do this one at a time;

You can call slapindex multiple times. Wes posted a simple script
for 1.2.9 which does this. It will work with slapindex with only
minor changes.

>> the slapindex program should be able to reindex multiple attributes in a
>> single run.

This is of insignificant gain. To get significant gain, you need
to index attributes in parallel.

>> I made changes to the latest (devel) build of slapindex to allow this, it was
>> fairly trivial. I ran it, giving it 6 attributes to index at once. It took
>> under 2 hrs to run, instead of the 9 hrs it would have taken to run them
>> serially.

As Randy pointed out, this is likely based upon some false assumptions.
Your code still creates indexes serially; I doubt your code
runs significantly faster than six separate processes run serially.

If you want significant improvement, you need to run these
processes concurrently in an environment which allows
some portion of the processes to be executed in parallel (such as
when one process is waiting for I/O).

>This is a great idea.

Well, parallel index processing is an okay idea. But
management and error handling are a bit of a pain. I would
rather avoid adding the complexity of parallel index processing
until we sort out other, more critical 2.0 indexing issues.
I would rather not add multiple (serial) index processing as
this would interfere with later development of multiple parallel
index generation.

Kurt
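For readers who want to try the concurrent approach Kurt describes, here is a minimal Python sketch of launching one indexing process per attribute and collecting the exit codes afterwards. It is not the script Wes posted, and it does not assume any particular slapindex command line: `base_cmd` is a hypothetical placeholder that the caller must fill in with whatever invocation their build actually accepts, and the demo uses a stand-in command. Whether several indexing processes can safely run against the same database at once is not settled in this thread, so treat this as an illustration of the process handling only.

```python
import subprocess
import sys


def build_commands(base_cmd, attributes):
    """Return one command per attribute: base_cmd followed by the attribute name.

    base_cmd is a hypothetical placeholder (a list of argv strings); the real
    indexing command and its options are NOT specified here.
    """
    return [list(base_cmd) + [attr] for attr in attributes]


def run_concurrently(commands):
    """Start every command before waiting on any of them.

    Returns the exit codes in the same order as the commands, so a caller can
    tell which attribute's run failed.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Stand-in command that only prints the attribute name; replace it with
    # the real indexing invocation for your installation.
    stand_in = [sys.executable, "-c",
                "import sys; print('would index', sys.argv[1])"]
    attrs = ["cn", "sn", "mail"]
    codes = run_concurrently(build_commands(stand_in, attrs))
    for attr, code in zip(attrs, codes):
        print(attr, "ok" if code == 0 else "failed with exit code %d" % code)
```

Returning the exit codes per attribute is the least that is needed for the error handling Kurt mentions; a real wrapper would also have to decide what to do with a partially built index when one process fails.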
Just a quick note on this issue. Mark's idea of indexing multiple
attributes in one pass is sound. Though we will likely not apply the
patch immediately due to development considerations (the indexing
code is being reworked), we will revisit this later (before the 2.0
release).

Kurt

At 11:25 PM 2/21/00 GMT, Kurt@OpenLDAP.org wrote:
>At 10:41 PM 2/21/00 GMT, kunkee@neosoft.com wrote:
>>> Full_Name: Mark Adamson
>>> Version: devel
>>> OS: Solaris
>>> URL: http://nil.andrew.cmu.edu/openldap/multislapindex
>>> Submission from: (NULL) (128.2.122.223)
>>>
>>>
>>> I was using the "slapindex" program to create new index files for my LDBM
>>> backended database. It took about 1.5 hrs to run, to reindex a single
>>> attribute. I had several attributes that I wanted to reindex, and it seemed
>>> absurd to do this one at a time;
>
>You can call slapindex multiple times. Wes posted a simple script
>for 1.2.9 which does this. It will work with slapindex with only
>minor changes.
>
>>> the slapindex program should be able to reindex multiple attributes in a
>>> single run.
>
>This is of insignificant gain. To get significant gain, you need
>to index attributes in parallel.
>
>>> I made changes to the latest (devel) build of slapindex to allow this, it was
>>> fairly trivial. I ran it, giving it 6 attributes to index at once. It took
>>> under 2 hrs to run, instead of the 9 hrs it would have taken to run them
>>> serially.
>
>As Randy pointed out, this is likely based upon some false assumptions.
>Your code still creates indexes serially; I doubt your code
>runs significantly faster than six separate processes run serially.
>
>If you want significant improvement, you need to run these
>processes concurrently in an environment which allows
>some portion of the processes to be executed in parallel (such as
>when one process is waiting for I/O).
>
>>This is a great idea.
>
>Well, parallel index processing is an okay idea. But
>management and error handling are a bit of a pain. I would
>rather avoid adding the complexity of parallel index processing
>until we sort out other, more critical 2.0 indexing issues.
>I would rather not add multiple (serial) index processing as
>this would interfere with later development of multiple parallel
>index generation.
>
>Kurt
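To make the one-pass idea concrete, the following Python sketch builds an index for several attributes during a single scan of the entries, next to a per-attribute version that rescans the entries each time. This is a toy in-memory model written for illustration: the entry layout (an id paired with a dict of attribute values), the function names, and the simple exact-match index are all assumptions, and nothing here reflects the actual slapindex code or the LDBM on-disk format.

```python
from collections import defaultdict


def index_one_attribute(entries, attr):
    """Scan all entries once and build an exact-match index for one attribute."""
    index = defaultdict(list)
    for entry_id, entry in entries:
        for value in entry.get(attr, []):
            index[value].append(entry_id)
    return index


def index_single_pass(entries, attributes):
    """Scan all entries once and update an index for every requested attribute."""
    indexes = {attr: defaultdict(list) for attr in attributes}
    for entry_id, entry in entries:
        for attr in attributes:
            for value in entry.get(attr, []):
                indexes[attr][value].append(entry_id)
    return indexes


if __name__ == "__main__":
    # Hypothetical sample data: (entry id, attribute -> list of values).
    entries = [
        (1, {"cn": ["Mark Adamson"], "ou": ["CMU"]}),
        (2, {"cn": ["Randy Kunkee"], "ou": ["NeoSoft"]}),
        (3, {"cn": ["Kurt"], "ou": ["OpenLDAP"]}),
    ]
    combined = index_single_pass(entries, ["cn", "ou"])
    # Both approaches yield the same index; only the number of scans differs.
    print(combined["cn"] == index_one_attribute(entries, "cn"))
    print(combined["ou"] == index_one_attribute(entries, "ou"))
```

The two approaches produce the same indexes; the single-pass version reads each entry once regardless of how many attributes are requested. How much time that saves in practice depends on how the cost splits between reading entries and writing each index, which is the point Randy raises about substring indexes being more expensive than exact or existence ones.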
changed notes
changed state Open to Suspended
moved from Software Enhancements to Development
The new slapindex regenerates all indices for all attributes
of all entries within the database. Your contribution, though
appreciated, is no longer relevant (though new patches providing
selective indexing might be).

Again, thanks.

Kurt
changed state Suspended to Closed
Pending index changes