Re: ListMatch Clarification

To: daniel@ncsu.edu
Subject: Re: ListMatch Clarification
From: Steven Legg <steven.legg@adacel.com.au>
Date: Tue, 22 Jun 2004 10:56:14 +1000
Cc: John McMeeking <jmcmeek@us.ibm.com>, ietf-ldapbis@OpenLDAP.org
In-reply-to: <Pine.GSO.4.58.0406211059580.17557@ghidora.unity.ncsu.edu>
References: <OF1A8ED09F.D3A8643D-ON86256EBA.0045DDBA-86256EBA.00490F5E@us.ibm.com> <Pine.GSO.4.58.0406211059580.17557@ghidora.unity.ncsu.edu>
User-agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.3.1) Gecko/20030425


Daniel,

Daniel Henninger wrote:
> John McMeeking wrote:

>> As I understand it, matching applies to the entire address as a
>> concatenated string, excluding the $ separators, except that * --
>> <initial>, <any>, or <final> -- does not span lines in an attribute value.
>> That is, * matches a substring of a line or the entire line, but not parts
>> of two lines.


It is not the intention that the * separators in a Substring Assertion value
are part of the substrings that are matched, and which are required to match
entirely within a line. As Kurt points out, it isn't appropriate to think
of the *'s as wildcards either. I will try to rewrite the text about substring
matching to make this clearer in the next revision.

> So if I'm understanding that correctly, one would have to do:
> *Foo**NC* to match my example below?  I'll get to that.


The substrings to be matched are "Foo" and "NC", so the correct assertion
is "*Foo*NC*". The position of the *'s tells us that "Foo" and "NC" are
both any substrings (as opposed to initial or final substrings).
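The same split can be done in code. This is a minimal sketch, assuming the RFC 4517 LDAP-specific encoding of a Substring Assertion, in which only * and \ are special and a literal * or \ is written \2A or \5C; parse_substring_assertion is a hypothetical name. It returns the initial, any and final substrings with the *'s removed, and it rejects two adjacent *'s because that would require an empty substring, which this sketch treats as invalid.

```python
from typing import List, Optional, Tuple


def parse_substring_assertion(value: str) -> Tuple[Optional[str], List[str], Optional[str]]:
    """Split an LDAP-encoded Substring Assertion into (initial, any, final).

    Assumption: RFC 4517 encoding, where '*' separates substrings and a
    literal '*' or '\\' is written as \\2A or \\5C.
    """
    pieces: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "*":
            # A separator: it ends the current substring and is not data.
            pieces.append("".join(current))
            current = []
            i += 1
        elif ch == "\\":
            code = value[i + 1:i + 3].upper()
            if code == "2A":
                current.append("*")
            elif code == "5C":
                current.append("\\")
            else:
                raise ValueError("unsupported escape at offset %d" % i)
            i += 3
        else:
            # '$' and every other character are ordinary data here.
            current.append(ch)
            i += 1
    pieces.append("".join(current))

    if len(pieces) < 2:
        raise ValueError("a Substring Assertion needs at least one '*'")
    anys = pieces[1:-1]
    if any(not s for s in anys):
        raise ValueError("empty substring between two '*'s")
    initial = pieces[0] or None
    final = pieces[-1] or None
    return initial, anys, final


if __name__ == "__main__":
    print(parse_substring_assertion("*Foo*NC*"))
```

For "*Foo*NC*" this gives no initial, the any substrings ["Foo", "NC"], and no final; "*Foo**NC*" raises ValueError.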

>> Applying this to your later examples:
>>> Hrm. Ok, let me see if I understand this correctly. The address query
>>> is matched on lines, not the string as a whole?  Given address:
>>> 123 Foo Street
>>> Raleigh, NC 12345
>>> aka: 123 Foo Street $ Raleigh, NC 12345
>> Matching will be applied to the concatenated string "123 Foo Street
>> Raleigh, NC 12345" with a hidden separator replacing the $.
> but the hidden separator "stops dead" the *? is that correct?


No, because the * isn't part of a substring to be matched.

>>> If I did a query of *Raleigh*, I would expect it to:
>>> 123 Foo Street      no match
>>> Raleigh, NC 12345   match
>>>   address matches


>> Matches:
>> Initial * matches "123 Foo Street", "Raleigh" matches Raleigh, final *
>> matches ", NC 12345"

> And it only matches Raleigh because Raleigh is right at the beginning of
> the line.  Correct?  If it were *NC*, it would -not- match?


"*NC*" would match because the any substring "NC" completely matches a string
of characters wholly contained within a line of the address (the second line).

>>> If I did a query of *Foo*, I would expect it to:
>>> 123 Foo Street      match
>>> Raleigh, NC 12345   no match
>>>   address matches


>> No match:
>> Initial * matches either a) the entire first line or b) "123"
>> a) "Foo *" does not match second line
>> b) "Foo *" matches "Foo Street" but does not include the next line of the
>> address.


"*Foo*" would match because the any substring "Foo" completely matches a string
of characters wholly contained within a line of the address (the first line).

> Ok, that makes sense based off my understanding from above.  =)

>>> But what if I did *Foo Street*Raleigh*, I would expect it to:
>>> 123 Foo Street      no match
>>> Raleigh, NC 12345   no match
>>>   address doesn't match


>> Matches:
>> Initial "*Foo Street" matches 1st line, 2nd "*" matches empty string,
>> "Raleigh*" matches 2nd line.


The address matches. The any substring "Foo Street" completely matches a
string of characters wholly contained within a line of the address (the
first line) and the any substring "Raleigh" completely matches a subsequent
string of characters wholly contained within a line of the address (the
second line).


>> Personal observations:
>> - Since the middle * is matching an empty string, "*Foo Street Raleigh*"
>> should also match.


"*Foo Street Raleigh*" won't match because the any substring "Foo Street Raleigh"
does not match any string of characters which is wholly contained within any
one line of the address.

>> - "*Foo Street*Raleigh*" could also match addresses like:
>>   123 Foo Street NW
>>   Raleigh, NC 12345
>>
>> Because the first line would match *Foo Street* and the second line
>> would match the Raleigh* portion.


It's the right answer, but the reasoning is that the any substring "Foo Street"
completely matches a string of characters wholly contained within a line of the
address (the first line) and the any substring "Raleigh" completely matches a
subsequent string of characters wholly contained within a line of the address
(the second line).

The examples have substrings matching strings of characters on separate lines
but there is nothing to stop the substrings matching disjoint strings of
characters on the same line.
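The rule above can be sketched as a greedy left-to-right scan: each substring has to be found wholly inside one line, after the end of the previous match, on the same line or a later one; an initial substring must start the first line and a final substring must end the last line. This is a minimal sketch under stated assumptions: the address is already split into lines, the assertion is already split into its initial, any and final parts, list_substrings_match is a hypothetical name, and casefolding plus collapsing spaces is only a rough stand-in for caseIgnore matching and stringprep, not an implementation of them. Taking the earliest match for each any substring is safe because it leaves the most room for the substrings that follow.

```python
from typing import List, Optional, Sequence


def _norm(text: str) -> str:
    # Rough stand-in for caseIgnore + insignificant-space handling (assumption):
    # casefold, trim, and collapse runs of whitespace to one space.
    return " ".join(text.casefold().split())


def list_substrings_match(lines: Sequence[str],
                          initial: Optional[str],
                          anys: Sequence[str],
                          final: Optional[str]) -> bool:
    """Substring match in which no substring may span two lines."""
    norm: List[str] = [_norm(line) for line in lines]
    if not norm:
        return False
    line_no, pos = 0, 0  # current search position: (line index, offset)

    if initial is not None:
        # <initial> must sit at the very start of the first line.
        ini = _norm(initial)
        if not norm[0].startswith(ini):
            return False
        pos = len(ini)

    for sub in anys:
        # Each <any> lies wholly inside one line, after the previous match,
        # either later on the same line or on a subsequent line.
        s = _norm(sub)
        while line_no < len(norm):
            hit = norm[line_no].find(s, pos)
            if hit >= 0:
                pos = hit + len(s)
                break
            line_no, pos = line_no + 1, 0
        else:
            return False

    if final is not None:
        # <final> must sit at the very end of the last line, after earlier matches.
        fin = _norm(final)
        last = len(norm) - 1
        if not norm[last].endswith(fin):
            return False
        if line_no == last and len(norm[last]) - len(fin) < pos:
            return False
    return True


if __name__ == "__main__":
    address = ["123 Foo Street", "Raleigh, NC 12345"]
    for anys in (["Raleigh"], ["NC"], ["Foo"],
                 ["Foo Street", "Raleigh"], ["Foo Street Raleigh"]):
        print("*" + "*".join(anys) + "*", list_substrings_match(address, None, anys, None))
```

Run against the two-line address from this thread, it accepts *Raleigh*, *NC*, *Foo* and *Foo Street*Raleigh*, and rejects *Foo Street Raleigh*.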

Regards,
Steven

>> or
>>   123 Foo Street
>>   North Raleigh, NC 12345
>> And because the first line would match *Foo Street and the second line
>> would match *Raleigh*.
>> Is my interpretation any better?
>
> Yes, that makes sense.  Makes it a lot harder to parse, but hey.  =)
> Thanks!
> Daniel
>
>> John McMeeking
>>
>> owner-ietf-ldapbis@OpenLDAP.org wrote on 06/21/2004 06:55:08 AM:
>>
>>>>> I'm looking into implementing the ListMatch style matches and submitting
>>>>> a patch. I am having some trouble understanding part of the
>>>>> specification. I understand that $ is effectively a newline. The draft
>>>>> seems to indicate that you ignore $ (i.e., pull it out) and escape (\).
>>>>> That said, the part I'm not clear on is the actual search string. Does
>>>>> the $ get handled in the search string as well? For example, let's say
>>>>> I want to find someone on Foo Street, in Raleigh, NC. Would I do a
>>>>> search along the lines of:
>>>>>   postalAddress=*Foo Street $ Raleigh, NC*
>>>>> or
>>>>>   postalAddress=*Foo Street * Raleigh, NC*
>>>>> or both?
>>>>
>>>> The second assertion value is the one that applies. The assertion syntax
>>>> for caseIgnoreListSubstringsMatch is Substring Assertion, for which $ is
>>>> an ordinary character (only * and \ are special). The first assertion
>>>> value would only match if there were an escaped $ (i.e. not a line
>>>> separator) in a line of the address (where only $ and \ are special).
>>>> The unescaped $ line separators in a Postal Address value are an
>>>> artefact of the LDAP-specific encoding and are not matchable character
>>>> data.
>>>
>>> Hrm. Ok, let me see if I understand this correctly. The address query is
>>> matched on lines, not the string as a whole?  Given address:
>>> 123 Foo Street
>>> Raleigh, NC 12345
>>> aka: 123 Foo Street $ Raleigh, NC 12345
>>> If I did a query of *Raleigh*, I would expect it to:
>>> 123 Foo Street      no match
>>> Raleigh, NC 12345   match
>>>   address matches
>>> If I did a query of *Foo*, I would expect it to:
>>> 123 Foo Street      match
>>> Raleigh, NC 12345   no match
>>>   address matches
>>> But what if I did *Foo Street*Raleigh*, I would expect it to:
>>> 123 Foo Street      no match
>>> Raleigh, NC 12345   no match
>>>   address doesn't match
>>> Is this a correct interpretation of how it should work?  If you wanted to
>>> make sure it was Foo Street in Raleigh, NC, how would you go about doing
>>> that?  (&(postalAddress=*Foo Street*)(postalAddress=*Raleigh*)) ?
>>> Thanks!
>>> Daniel
>>>
>>>>> Also is there supposed to be a space on both sides of the $, similar to
>>>>> how $'s are used in schema, or would there be no spaces?
>>>>
>>>> There is no requirement either way, and for caseIgnoreListMatch it makes
>>>> no difference, since each line of the address is matched according to
>>>> caseIgnoreMatch, which ignores leading and trailing space on each line.
>>>> It makes no difference for caseIgnoreListSubstringsMatch either,
>>>> because of the interaction of stringprep and the requirement that
>>>> a substring in an assertion value for caseIgnoreListSubstringsMatch
>>>> does not match characters across multiple lines.
>>>>
>>>> Regards,
>>>> Steven
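Steven's quoted points above, that unescaped $ separators are an artefact of the LDAP-specific encoding and that spaces around $ make no difference to caseIgnoreListMatch, can be illustrated with a small decoder. This is a minimal sketch, assuming the RFC 4517 Postal Address encoding in which a literal $ or \ within a line is written \24 or \5C; decode_postal_address and case_ignore_list_match are hypothetical names, and the case/space normalization is only a rough stand-in for caseIgnoreMatch.

```python
from typing import List


def decode_postal_address(value: str) -> List[str]:
    """Split an LDAP-encoded Postal Address into its lines.

    Assumption: RFC 4517 encoding. An unescaped '$' separates lines and is
    not kept as data; \\24 and \\5C decode to a literal '$' and '\\'.
    """
    lines: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "$":
            lines.append("".join(current))
            current = []
            i += 1
        elif ch == "\\":
            code = value[i + 1:i + 3].upper()
            if code == "24":
                current.append("$")
            elif code == "5C":
                current.append("\\")
            else:
                raise ValueError("unsupported escape at offset %d" % i)
            i += 3
        else:
            current.append(ch)
            i += 1
    lines.append("".join(current))
    return lines


def _case_ignore_key(line: str) -> str:
    # Rough stand-in for caseIgnoreMatch: ignore case, leading/trailing
    # space, and runs of internal spaces.
    return " ".join(line.casefold().split())


def case_ignore_list_match(a: str, b: str) -> bool:
    """Same number of lines, and corresponding lines match case-insensitively."""
    la, lb = decode_postal_address(a), decode_postal_address(b)
    return len(la) == len(lb) and all(
        _case_ignore_key(x) == _case_ignore_key(y) for x, y in zip(la, lb))


if __name__ == "__main__":
    print(decode_postal_address("123 Foo Street $ Raleigh, NC 12345"))
    print(case_ignore_list_match("123 Foo Street $ Raleigh, NC 12345",
                                 "123 foo street$raleigh, nc 12345"))
```

Here "123 Foo Street $ Raleigh, NC 12345" and "123 foo street$raleigh, nc 12345" compare equal, while the same text written as a single line does not, because caseIgnoreListMatch also compares the number of lines.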
>>>>> Daniel
>>>>> --
>>>>> Daniel Henninger                    http://www.vorpalcloud.org/
>>>>> North Carolina State University - Systems Programmer
>>>>> Information Technology <IT>
