
Re: substring index oddity

To: openldap-software@OpenLDAP.org
Subject: Re: substring index oddity
From: Quanah Gibson-Mount <quanah@stanford.edu>
Date: Wed, 24 Aug 2005 12:11:12 -0700
Cc: John Madden <jmadden@ivytech.edu>
In-reply-to: <54271.10.0.14.156.1124909163.squirrel@mail.ivytech.edu>
References: <53615.10.0.14.156.1124895814.squirrel@mail.ivytech.edu> <430C963E.8040600@unav.es> <53869.10.0.14.156.1124899706.squirrel@mail.ivytech.edu> <01C66B2551044A36D1B34CBD@cadabra-dsl.stanford.edu> <54271.10.0.14.156.1124909163.squirrel@mail.ivytech.edu>

--On Wednesday, August 24, 2005 1:46 PM -0500 John Madden <jmadden@ivytech.edu> wrote:

It is quite clear in the docs that the default minimum substring indexing
starts at 3 characters.  So the "*2" and the "*22" substring searches
will not be using the index at all unless you've tweaked this.


No, I've made no mods.  So "*22" shouldn't be on an index, yet it's quite
fast.  That does explain why "*2" is slow though.

BTW, if you have your loglevel up to around 256, do you see this message?
bdb_substring_candidates: (uid) index_param failed (18)


Nope, no such messages.

So I'm guessing that "*XXX*" is one character short, index-wise.  That may
or may not be by design.


It seems that having the glob on the end of the string is perhaps related
to things being slow, although I've done so many tests I don't remember
clearly.


I'm guessing it is this:

    index_substr_any_step <integer>
         Specify the steps used in subany index lookups. This
         value sets the offset for the segments of a filter
         string that are processed for a subany index lookup.
         The default is 2. For example, with the default values,
         a search using this filter "cn=*abcdefgh*" would
         generate index lookups for "abcd", "cdef", and "efgh".
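
As a rough illustration of the man page example above, here is a minimal Python sketch of how a subany filter string might be cut into index lookups. The function name is made up, and the 4-character segment length is an assumption inferred from the "abcd", "cdef", "efgh" example rather than taken from the slapd source; only the step of 2 is the documented default.

```python
def subany_segments(filter_value, segment_len=4, step=2):
    """Split a subany filter string into fixed-length lookup segments.

    segment_len=4 is an assumption inferred from the man page example;
    step=2 is the documented default of index_substr_any_step.
    """
    if len(filter_value) < segment_len:
        # Too short to form even one full segment in this model.
        return []
    last_start = len(filter_value) - segment_len
    # Take one window of segment_len characters every `step` characters.
    return [filter_value[i:i + segment_len]
            for i in range(0, last_start + 1, step)]


print(subany_segments("abcdefgh"))  # ['abcd', 'cdef', 'efgh']
print(subany_segments("abcdefg"))   # ['abcd', 'cdef']
```

In this model a filter string shorter than the segment length produces no segments at all; whether slapd treats such short filters exactly this way is not something the sketch establishes.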


because these seem like they wouldn't apply:

    index_substr_if_minlen <integer>
         Specify the minimum length for subinitial and subfinal
         indices. An attribute value must have at least this
         many characters in order to be processed by the
         indexing functions. The default is 2.

    index_substr_if_maxlen <integer>
         Specify the maximum length for subinitial and subfinal
         indices. Only this many characters of an attribute
         value will be processed by the indexing functions; any
         excess characters are ignored. The default is 4.
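
For comparison, the two subinitial/subfinal settings above can be sketched in Python as follows. This is only a model: the function name is hypothetical, and the assumption that one key is generated per length between the minimum and the maximum is mine, not something stated in the man page text quoted here.

```python
def subinitial_subfinal_keys(value, minlen=2, maxlen=4):
    """Return (prefix_keys, suffix_keys) for an attribute value.

    minlen and maxlen default to the documented defaults of
    index_substr_if_minlen and index_substr_if_maxlen. Generating one
    key per length in that range is an assumption of this sketch.
    """
    if len(value) < minlen:
        # Values shorter than minlen are not processed at all.
        return [], []
    longest = min(len(value), maxlen)
    prefixes = [value[:n] for n in range(minlen, longest + 1)]
    suffixes = [value[-n:] for n in range(minlen, longest + 1)]
    return prefixes, suffixes


prefixes, suffixes = subinitial_subfinal_keys("jmadden")
print(prefixes)  # ['jm', 'jma', 'jmad']
print(suffixes)  # ['en', 'den', 'dden']
```

If the model holds, a two-character subfinal filter such as "*22" meets the minimum length while a one-character "*2" does not, which would at least be consistent with the timings John reported.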

So something like "*lee*" would just generate "lee" and "e" if I'm reading it right, and then the "e" search would fail...

--Quanah

--
Quanah Gibson-Mount
Principal Software Developer
ITSS/Shared Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html
