
Re: Approx search



Hi,

On Tuesday 22 October 2002 13:51, you wrote:
> On Tuesday, 22 October 2002 13:03, Frank Swasey wrote:
> > Today at 12:06pm, Andreas Roedl wrote:
> > > > > ldapsearch -h localhost -x -LLL -b "ou=People,dc=company,dc=com"
> > > > > "(maildrop~=andreas.roedel@company.com)" maildrop
> > > >
> > > > And if you search on: "(maildrop=andreas.r*)" ?
> > >
> > > The same result. The filter code seems to respect the punctuation mark
> > > and stops at it. This should be configurable somehow...
> >
> > Does the definition of the maildrop attribute allow substring searches?
>
> It looks like this (I don't know whether this is right or not):
>
> attributetype ( 1.3.6.1.4.1.10018.1.1.4 NAME 'maildrop'
>         DESC 'RFC822 Mailbox - mail alias'
>         EQUALITY caseIgnoreIA5Match
>         SUBSTR caseIgnoreIA5SubstringsMatch
>         SYNTAX 1.3.6.1.4.1.1466.115.121.1.26{256} )

Yes, it supports substring search.

I do not think this has anything to do with support for substring searches,
since approximate search is a different search operator (and in fact his
search does return results, unfortunately too many).

The problem is that the approximate matching routines seem to stop after the
first period and return success or failure depending only on the first "word".

Browsing through the code (phonetic.c and schema_init.c), I see
that slapd does something similar to 2). It splits the strings into words
separated by " " (or by any of "._ " if SLAPD_APPROX_INITIALS is defined).
After that, it checks whether at least some of the words of the pattern appear
in the same order as in the pattern.
Unfortunately, the word breaks in schema_init.c and phonetic.c
are not synchronized, which may lead to these problems.
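
As a rough illustration of such a mismatch, here is a small Python sketch. It
is not the slapd code: split_words and toy_key are made-up names, and toy_key
is only a stand-in that assumes the phonetic routine stops reading at the
first non-letter, which is a guess based on the behaviour described above.

```python
def split_words(value, delimiters=" "):
    """Split value on any character in delimiters, dropping empty pieces."""
    words, current = [], ""
    for ch in value:
        if ch in delimiters:
            if current:
                words.append(current)
            current = ""
        else:
            current += ch
    if current:
        words.append(current)
    return words


def toy_key(word):
    """Hypothetical stand-in for a phonetic key (NOT slapd's algorithm).

    Assumption: the routine reads only the leading run of letters and
    ignores everything from the first non-letter onwards.
    """
    key = ""
    for ch in word:
        if not ch.isalpha():
            break
        key += ch.lower()
    return key


address = "andreas.roedel@company.com"

# Splitting on " " only: the whole address is a single word.
print(split_words(address))
# Splitting on "._ ": the address breaks up at each period.
print(split_words(address, "._ "))
# The stand-in key of the unsplit address covers only the part before
# the first period, so everything after it is ignored in a comparison.
print(toy_key(address))
```

With the splitter breaking only on spaces and the key routine stopping at the
period, any two addresses that start with "andreas." compare as equal in this
sketch. A hyphenated value behaves the same way under this assumption.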

I tested it with 2.1.8 on the attribute displayName in a directory with
20,000 entries. It worked OK when the words were separated by spaces,
but it treated all hyphenated words as the same if their first part
was the same.
The filter (displayName~=Müller) [yes, I wanted to know what happens
with non-English letters] returned all entries that contained a word
similar to Müller, i.e. Mueller, Mehler, Mahler [not exactly what I expected,
but quite OK], but also Müller-Dehn [seems reasonable too].

Maybe playing a bit with the definitions of SLAPD_APPROX_INITIALS
and SLAPD_APPROX_DELIMITER will help.

I would be interested in your results.

Yours
Peter


-- 
Peter Marschall     |   eMail: peter.marschall@mayn.de
Scheffelstraße 15   |          peter.marschall@is-energy.de
97072 Würzburg      |   Tel:   0931/14721
PGP:  D7 FF 20 FE E6 6B 31 74  D1 10 88 E0 3C FE 28 35