Re: substring index oddity

--On Wednesday, August 24, 2005 12:11 PM -0700 Quanah Gibson-Mount <quanah@stanford.edu> wrote:

--On Wednesday, August 24, 2005 1:46 PM -0500 John Madden
<jmadden@ivytech.edu> wrote:

It is quite clear in the docs that the default minimum substring
indexing starts at 3 characters.  So the "*2" and the "*22" substring
searches will not be using the index at all unless you've tweaked this.

No, I've made no mods. So "*22" shouldn't be using an index, yet it's quite fast. That does explain why "*2" is slow, though.

BTW, if you have your loglevel set to around 256, do you see this:

bdb_substring_candidates: (uid) index_param failed (18)

Nope, no such messages.

So I'm guessing that "*XXX*" is one character short, index-wise.  That
may or may not be by design.

It seems that having the glob on the end of the string is perhaps related to things being slow, although I've done so many tests I don't remember clearly.

I'm guessing it is this:

     index_substr_any_step <integer>
          Specify the steps used in subany index lookups. This
          value sets the offset for the segments of a filter
          string that are processed for a subany index lookup.
          The default is 2. For example, with the default values,
          a search using this filter "cn=*abcdefgh*" would
          generate index lookups for "abcd", "cdef", and "efgh".

So something like "*lee*" would just generate "lee" and "e" if I'm
reading it right, and then the "e" search would fail...

Actually, looking it over, I'm guessing it is this:

     index_substr_any_len <integer>
          Specify the length used for subany indices. An
          attribute value must have at least this many characters
          in order to be processed. Attribute values longer than
          this length will be processed in segments of this
          length. The default is 4. The subany index will also be
          used in subinitial and subfinal index lookups when the
          filter string is longer than the index_substr_if_maxlen
          value.

I bet if you changed that to "3" from "4" it would work right...
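
To make that guess concrete, here is a rough Python model of how I read the two man page entries above. It is only a sketch of the documented behaviour, not slapd's actual code, and the function name subany_segments is made up for illustration. It assumes a filter string shorter than index_substr_any_len produces no subany lookups at all, and that longer strings are cut into segments of index_substr_any_len characters, stepping by index_substr_any_step.

```python
def subany_segments(value, any_len=4, any_step=2):
    """Model of the slapd.conf(5) description, not slapd's real code.

    any_len  stands in for index_substr_any_len  (default 4)
    any_step stands in for index_substr_any_step (default 2)
    """
    if len(value) < any_len:
        # Too short to form even one segment: no subany index lookup.
        return []
    # Take any_len-character windows, advancing any_step characters each time.
    return [value[i:i + any_len]
            for i in range(0, len(value) - any_len + 1, any_step)]


print(subany_segments("abcdefgh"))        # ['abcd', 'cdef', 'efgh']
print(subany_segments("lee"))             # []
print(subany_segments("lee", any_len=3))  # ['lee']
```

If that model is right, "*lee*" gets no index help with the defaults, but would with index_substr_any_len set to 3. As far as I know, the existing indices would have to be regenerated after changing the value.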


