Re: approx match names

At 02:23 PM 12/6/00 -0500, Mark Adamson wrote:
>There is a proposal here at CMU that we try to encourage more hits in
>approximate matching of first names by trying to find hits on short forms
>of names. For example, if I search for givenname~=Joe, it would be nice
>if the server found people whose givenname was either Joe or Joseph. This
>will really help with Email recipient name expansion in a composer's
>To: field.  This could also be used in approx searches of the "cn" field. 
>Making it work with exact matches would be trickier.

I note that such a rule likely only makes sense for a subset
of directory strings, and maybe only for a portion of their values.
That is, I believe that some folks might want a different approximate
rule for 'givenName' than for 'sn' or 'cn'.  Of course, as the rule
is defined for 'name' and inherited, this is problematic in terms of
the schema rules.

I would suggest that if more sophisticated matching is desired,
this NOT be done by modifying a general rule to have attribute-specific
behavior, but by defining additional matching rules which a client
can use as appropriate.

>I'd propose that there be an option to phonetic() to check incoming
>strings to see if they are a short name, and if so, convert it first to
>the long name.  That way all Joe's and Joseph's get turned into
>Joseph. When indexing is done, only the phonetic of "Joseph" is stored,
>and all searches for Joe and Joseph look there.

I suggest that the approximate matching rule implementation be
quite general [phonetic, soundex, other] and that extensible
matching rules be defined for other matching such as you describe.
That is, I suggest that (attr~=value) imply some general-purpose
approximate matching and that (attr:fancyMatchingRule:=value)
imply matching by some fancy rule.
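
To make the distinction concrete, here is a rough sketch of how a
client might build the two kinds of filter strings.  The rule name
"nicknameMatch" and the helper function names are made up for
illustration; no such matching rule is defined anywhere.  The value
escaping follows the string filter syntax of RFC 2254.

```python
def escape_filter_value(value: str) -> str:
    """Escape the characters that RFC 2254 requires escaping in a value."""
    out = []
    for ch in value:
        if ch in '*()\\\x00':
            # Special characters become a backslash plus two hex digits.
            out.append('\\%02x' % ord(ch))
        else:
            out.append(ch)
    return ''.join(out)


def approx_filter(attr: str, value: str) -> str:
    """General-purpose approximate match: (attr~=value)."""
    return '(%s~=%s)' % (attr, escape_filter_value(value))


def extensible_filter(attr: str, rule: str, value: str) -> str:
    """Extensible match naming a specific rule: (attr:rule:=value)."""
    return '(%s:%s:=%s)' % (attr, rule, escape_filter_value(value))


# "nicknameMatch" is a hypothetical rule name, used only as an example.
general = approx_filter('givenName', 'Joe')
fancy = extensible_filter('givenName', 'nicknameMatch', 'Joe')
```

A client that wants nickname expansion would send the second form;
every other client keeps getting the general behavior from ~=.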

>The list of short name to long name conversions may have to be site
>specific and perhaps even stored in an external config file.

Or in the directory.

>Storing them
>in slapd.conf might be a bit cluttersome. But hardcoding the set of names
>into the binary seems unwise for regional and ethnic reasons.

I would avoid hard coding them.  I suggest that a file be used
which would provide short name, long name pairs and that the
name of this file be provided as a slapd.conf(5) option.
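
As a sketch only, such a file could be read and applied along these
lines.  The file format (one whitespace-separated "short long" pair
per line, with '#' starting a comment) and the function names are
assumptions made for the example, not an existing slapd format.  The
real code would then hand the resulting long name to phonetic().

```python
import io


def load_nicknames(fp):
    """Read "short long" pairs from a file-like object into a dict."""
    table = {}
    for line in fp:
        # Drop comments and surrounding whitespace.
        line = line.split('#', 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            continue  # ignore malformed lines in this sketch
        short, long_name = fields
        table[short.lower()] = long_name.lower()
    return table


def canonical_name(name, table):
    """Map a short name to its long form; other names pass through."""
    key = name.lower()
    return table.get(key, key)


# Hypothetical file contents, inlined so the example is self-contained.
SAMPLE = """\
# short  long
joe      joseph
bill     william
"""

table = load_nicknames(io.StringIO(SAMPLE))
```

With this table, canonical_name('Joe', table) and
canonical_name('Joseph', table) yield the same string, so both would
end up under one index key once the phonetic step is applied.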

>So I'd like to ask if people would find this useful, and if so, how and
>where should the set of short->long name conversions be stored by the site?
>Something like share/openldap/nickname
>Switching to this system would require reindexing. It could also slow the
>server during approx searches if the list of nicknames was long.

If done as an extensible matching rule, only the index for the
new rule would have to be generated.  Of course, current tools
require that all indices be generated at once (unless you play
slapd.conf games).