
Re: Odd performance problem with substring index



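I think what you are seeing is the effect of the so-called allidsthreshold:
once an index key matches more entries than this threshold, the server stops
keeping a list of matching IDs for that key and treats every entry as a
candidate, so the search degrades to a full scan.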

See http://docs.iplanet.com/docs/manuals/directory/41/admin/index1.htm#1053642

You can raise SLAPD_LDBM_MIN_MAXIDS in include/ldap_defaults.h (and
recompile) to lower the probability of being 'hit' by such queries in
practice, but you cannot avoid it completely.
This is a real problem for large directories (like ours, too :)).
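To make the effect concrete, here is a toy model in Python of an index whose
per-key ID lists collapse to an "all IDs" marker once they grow past a
threshold. Everything in it (the class, the trigram keys, the threshold value)
is hypothetical and is NOT how back-ldbm actually builds its substring keys; it
only shows why a filter whose keys are all very common falls back to examining
every entry.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical sentinel: the key matched too many entries, so its ID list
# was discarded and every entry must be treated as a candidate.
ALLIDS: Optional[Set[int]] = None


def trigrams(value: str) -> Set[str]:
    """Toy index keys: every 3-character substring (NOT back-ldbm's real scheme)."""
    value = value.lower()
    return {value[i:i + 3] for i in range(len(value) - 2)}


class ToySubstringIndex:
    def __init__(self, allids_threshold: int) -> None:
        self.allids_threshold = allids_threshold
        self.keys: Dict[str, Optional[Set[int]]] = {}

    def add(self, key: str, entry_id: int) -> None:
        if key in self.keys and self.keys[key] is ALLIDS:
            return  # already collapsed; no longer tracked per entry
        ids = self.keys.setdefault(key, set())
        ids.add(entry_id)
        if len(ids) > self.allids_threshold:
            self.keys[key] = ALLIDS  # list too long: give up on it

    def candidates(self, keys: Iterable[str], all_ids: Iterable[int]) -> Set[int]:
        """Intersect the ID lists of the filter's keys; ALLIDS keys don't narrow."""
        result: Optional[Set[int]] = None
        for key in keys:
            ids = self.keys.get(key, set())
            if ids is ALLIDS:
                continue
            result = set(ids) if result is None else result & ids
        # No usable key at all: every entry must be examined (a linear scan).
        return set(all_ids) if result is None else result
```

With a million entries sharing the same domain, keys built only from
"domain2.com" would collapse to ALLIDS in this model, while a key containing
the rare local part still narrows the candidate set. Which keys a given glob
placement actually produces depends on the server's own key-generation rules.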

Folks, wouldn't it be a good idea to implement a limit on the maximum
number of entries to be examined?
iPlanet offers such a setting, called lookthroughlimit.
IIRC it simply returns an error whenever more than this number of entries
would have to be examined. This also prevents anyone from killing your
server's performance, e.g. by querying unindexed attributes.
Setting lookthroughlimit <= allidsthreshold would avoid the allidsthreshold
effect, too.
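A rough sketch of the proposed limit, in Python: the server counts the
candidate entries a search would have to examine and refuses the search with
an error before scanning when that count exceeds the limit. The names here
(limited_search, AdminLimitExceeded, lookthrough_limit) are made up for
illustration; this is not iPlanet's or OpenLDAP's code, and reporting it as
LDAP result code 11 (adminLimitExceeded) is an assumption.

```python
from typing import Callable, Dict, Iterable, List


class AdminLimitExceeded(Exception):
    """Hypothetical stand-in for an LDAP adminLimitExceeded (result code 11) error."""


def limited_search(
    candidate_ids: Iterable[int],
    entries: Dict[int, str],
    matches: Callable[[str], bool],
    lookthrough_limit: int,
) -> List[int]:
    """Refuse the search up front if too many entries would have to be examined."""
    candidates = sorted(candidate_ids)
    if len(candidates) > lookthrough_limit:
        raise AdminLimitExceeded(
            f"{len(candidates)} candidates exceed lookthrough limit {lookthrough_limit}"
        )
    # Only now evaluate the filter against each candidate entry.
    return [eid for eid in candidates if matches(entries[eid])]
```

If the limit is no higher than the allidsthreshold, any query whose candidate
set has fallen back to the whole directory (which holds more entries than the
threshold) is refused immediately instead of turning into a multi-minute scan.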


regards,
Markus

Oyvind Moll wrote:
> 
> I have an odd problem with OpenLDAP 2.0.7 that I'd like some input on.
> 
> I'm setting up a catalog with approximately a million nodes in it.  My
> problem is that the performance of a substring index seems half arbitrary,
> dependent on where in the search string I put globs (*) -- _sometimes_
> OpenLDAP seems to do a linear search, even though the attribute searched on
> is indexed.
> 
> To be more concrete, the relevant part of my catalog structure is like
> this:
> 
> dc=com
>   dc=domain1,dc=com
>   dc=domain2,dc=com
>     uid=foo,dc=domain2,dc=com
>      - uid: foo
>      - mail: foo@domain2.com
>      - ...
> 
> Now, I have eq and sub indexes on the mail attribute, so I expect the
> following searches to return the correct result pretty quickly:
> 
> 1. (mail=foo@domain2.com)
> 2. (mail=*foo@domain2.com*)
> 3. (mail=*foo@domain2.com)
> 4. (mail=foo@domain2.com*)
> 5. (mail=foo@doma*in2.com)
> 6. (mail=foo@*domain2.com)
> 7. (mail=f*oo@domain2.com)
> 
> My observation is that only searches 1 and 2 respond quickly (<0.1 sec),
> while the others take several minutes, pretty much the same time as a search
> on a non-indexed attribute.
> 
> However, if I add another glob to searches 5-7, I can get better
> performance.  The following perform quickly:
> 
> (mail=*foo@domain2*.com)
> (mail=*foo@doma*in2.com)
> (mail=f*oo@domain2.com*)
> 
> ...while the following are very slow:
> 
> (mail=*foo*@domain2.com)
> (mail=foo@domain2*.com*)
> 
> Is it a known issue?  If so: is there a general work-around I can rely on?
> 
> Apart from this issue, I'm happy with the performance of OpenLDAP with a
> million nodes.  Even the write performance, with a couple of indexes and
> with the database files on a RAID5 volume, is pretty good.
> 
> --
>    Øyvind Møll
>    oyvindmo@initio.no
>    http://www.initio.no/
Markus Storm
mediaWays GmbH, NMW-T
Postfach 185, 33311 Guetersloh, Germany
Tel: +49 5241 80-7867 / Fax: +49 5241 80-67867
Markus.Storm@mediaWays.net