
Re: ListMatch Clarification




Daniel,

Daniel Henninger wrote:
> John McMeeking wrote:
As I understand it, matching applies to the entire address as a
concatenated string, excluding the $ separators, except that * --
<initial>, <any>, or <final> -- does not span lines in an attribute value.
That is, * matches a substring of a line or the entire line, but not parts
of two lines.

It is not the intention that the * separators in a Substring Assertion value be part of the substrings that are matched, which are the parts required to match entirely within a line. As Kurt points out, it isn't appropriate to think of the *'s as wildcards either. I will try to rewrite the text about substring matching to make this clearer in the next revision.



So if I'm understanding that correctly, one would have to do:
*Foo**NC* to match my example below?  I'll get to that.

The substrings to be matched are "Foo" and "NC", so the correct assertion is "*Foo*NC*". The position of the *'s tells us that "Foo" and "NC" are both any substrings (as opposed to initial or final substrings).
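To make the point about the *'s concrete: they only mark where one substring ends and the next begins, and their position decides whether a substring is initial, any or final. The following is a minimal Python sketch of that split, assuming the LDAP-specific Substring Assertion encoding later published in RFC 4517 (a literal * written as \2A, a literal \ as \5C). The class and function names are hypothetical. Under that grammar "*Foo**NC*" is not just unnecessary but invalid, because "**" would denote an empty any substring.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubstringAssertion:
    initial: Optional[str] = None                            # anchored at the start
    any_substrings: List[str] = field(default_factory=list)  # matched in order
    final: Optional[str] = None                              # anchored at the end


def _unescape(piece: str) -> str:
    # Assumed encoding: a literal '*' is written \2A and a literal '\' is \5C.
    out, i = [], 0
    while i < len(piece):
        if piece[i] == "\\":
            code = piece[i + 1:i + 3].upper()
            if code not in ("2A", "5C"):
                raise ValueError(f"invalid escape in {piece!r}")
            out.append("*" if code == "2A" else "\\")
            i += 3
        else:
            out.append(piece[i])
            i += 1
    return "".join(out)


def parse_substring_assertion(text: str) -> SubstringAssertion:
    # Every literal '*' is a separator (escaped ones are spelled \2A), so a plain
    # split finds the substrings; the pieces are unescaped afterwards.
    pieces = text.split("*")
    if len(pieces) < 2:
        raise ValueError("a substring assertion contains at least one '*'")
    initial, *middle, final = pieces
    if "" in middle:
        raise ValueError("'**' would be an empty 'any' substring, which is not allowed")
    return SubstringAssertion(
        initial=_unescape(initial) if initial else None,
        any_substrings=[_unescape(p) for p in middle],
        final=_unescape(final) if final else None,
    )
```

For example, parse_substring_assertion("*Foo*NC*") yields no initial or final substring and the any substrings "Foo" and "NC", while "*Foo**NC*" raises ValueError.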



Applying this to your later examples:


Hrm. Ok, let me see if I understand this correctly. The address query is
matched on lines, not the string as a whole?  Given address:
123 Foo Street
Raleigh, NC 12345
aka: 123 Foo Street $ Raleigh, NC 12345

Matching will be applied to the concatenated string "123 Foo Street Raleigh, NC 12345" with a hidden separator replacing the $.


But the hidden separator "stops dead" the *? Is that correct?

No, because the * isn't part of a substring to be matched.
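The hidden-separator reading discussed here can be sketched in Python. This is a minimal illustration, not the draft's normative algorithm: the lines are joined with U+0000 as the hidden separator (an assumption, made safe by rejecting assertion substrings that contain it), and prep is a hypothetical, simplified stand-in for the RFC 4518 preparation used by the caseIgnore rules. Since no substring may contain the separator, none can match across a line boundary, and the *'s themselves never take part in matching.

```python
import unicodedata
from typing import List, Optional

# Assumption: U+0000 serves as the "hidden separator". It is only safe because
# _prep_sub() rejects substrings containing it; a real server may do otherwise.
SEP = "\x00"


def prep(s: str) -> str:
    """Simplified stand-in for RFC 4518 preparation (NOT the full algorithm):
    NFKC normalisation, case folding, and collapsing runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", s).casefold().split())


def _prep_sub(s: str) -> str:
    p = prep(s)
    if SEP in p:
        raise ValueError("assertion substring contains the hidden separator")
    return p


def list_substrings_match(lines: List[str], initial: Optional[str],
                          any_substrings: List[str], final: Optional[str]) -> bool:
    # Concatenate the prepared lines; the separator marks the line boundaries.
    subject = SEP.join(prep(line) for line in lines)
    pos = 0
    if initial is not None:
        p = _prep_sub(initial)
        if not subject.startswith(p):      # initial must start the first line
            return False
        pos = len(p)
    for sub in any_substrings:             # each 'any' must follow the previous match
        p = _prep_sub(sub)
        idx = subject.find(p, pos)
        if idx < 0:
            return False
        pos = idx + len(p)
    if final is not None:
        p = _prep_sub(final)               # final must end the last line, no overlap
        return subject.endswith(p) and len(subject) - len(p) >= pos
    return True


if __name__ == "__main__":
    addr = ["123 Foo Street", "Raleigh, NC 12345"]
    print(list_substrings_match(addr, None, ["Foo Street", "Raleigh"], None))  # True
    print(list_substrings_match(addr, None, ["Foo Street Raleigh"], None))     # False
```

Because $ in an assertion substring is just another character, a substring such as "Foo Street $ Raleigh, NC" simply fails to match this address.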



If I did a query of *Raleigh*, I would expect it to:
123 Foo Street      no match
Raleigh, NC 12345   match
  address matches

Matches: Initial * matches "123 Foo Street", "Raleigh" matches Raleigh, final * matches ", NC 12345"


And it only matches Raleigh because Raleigh is right at the beginning of
the line.  Correct?  If it were *NC*, it would -not- match?

"*NC*" would match because the any substring "NC" completely matches a string of characters wholly contained within a line of the address (the second line).



If I did a query of *Foo*, I would expect it to:
123 Foo Street      match
Raleigh, NC 12345   no match
  address matches

No match: Initial * matches either a) the entire first line or b) "123". In case a), "Foo *" does not match the second line; in case b), "Foo *" matches "Foo Street" but does not include the next line of the address.

"*Foo*" would match because the any substring "Foo" completely matches a string of characters wholly contained within a line of the address (the first line).



Ok, that makes sense based off my understanding from above.  =)


But what if I did *Foo Street*Raleigh*, I would expect it to:
123 Foo Street      no match
Raleigh, NC 12345   no match
  address doesn't match

Matches: Initial "*Foo Street" matches 1st line, 2nd "*" matches empty string, "Raleigh*" matches 2nd line.

The address matches. The any substring "Foo Street" completely matches a string of characters wholly contained within a line of the address (the first line) and the any substring "Raleigh" completely matches a subsequent string of characters wholly contained within a line of the address (the second line).


Personal observations:
- Since the middle * is matching an empty string, "*Foo Street Raleigh*" should also match.

"*Foo Street Raleigh*" won't match because the any substring "Foo Street Raleigh" does not match any string of characters which is wholly contained within any one line of the address.

- "*Foo Street*Raleigh*" could also match addresses like:
  123 Foo Street NW
  Raleigh, NC 12345


Because the first line would match *Foo Street* and the second line
would match the Raleigh* portion.

It's the right answer, but the reasoning is that the any substring "Foo Street" completely matches a string of characters wholly contained within a line of the address (the first line) and the any substring "Raleigh" completely matches a subsequent string of characters wholly contained within a line of the address (the second line).

The examples have substrings matching strings of characters on separate lines
but there is nothing to stop the substrings matching disjoint strings of
characters on the same line.
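The last point, that substrings may match disjoint strings on the same line, shows up clearly if the rule is evaluated line by line with a cursor instead of with a hidden separator. This is a minimal sketch covering only any substrings; match_anys_by_line is a hypothetical helper, and str.lower() is a simplified substitute for the real case-ignore preparation.

```python
from typing import List, Optional, Tuple


def match_anys_by_line(lines: List[str],
                       anys: List[str]) -> Optional[List[Tuple[int, int]]]:
    """Find each 'any' substring, in order, wholly inside a single line.

    Returns (line_index, offset) for each match, or None if there is no match.
    Case-insensitivity uses str.lower() only (a simplification).
    """
    folded = [line.lower() for line in lines]
    line_no, offset = 0, 0
    hits = []
    for sub in anys:
        sub = sub.lower()
        while line_no < len(folded):
            idx = folded[line_no].find(sub, offset)
            if idx >= 0:
                hits.append((line_no, idx))
                offset = idx + len(sub)        # next substring may follow on this line
                break
            line_no, offset = line_no + 1, 0   # otherwise try the next line
        else:
            return None                        # ran out of lines
    return hits
```

With the assertion "*Raleigh*NC*" both substrings land on the second line, at offsets 0 and 9; reversing the order to "*Raleigh*Foo*" fails, because each substring must match after the previous one.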

Regards,
Steven




or
  123 Foo Street
  North Raleigh, NC 12345


And because the first line would match *Foo Street and the second line
would match *Raleigh*.


Is my interpretation any better?


Yes, that makes sense.  Makes it a lot harder to parse, but hey.  =)
Thanks!

Daniel


John McMeeking


owner-ietf-ldapbis@OpenLDAP.org wrote on 06/21/2004 06:55:08 AM:


I'm looking into implementing the ListMatch style matches and submitting a
patch. I am having some trouble understanding part of the specification.

I understand that $ is effectively a newline. The draft seems to indicate
that you ignore $ (i.e., pull it out) and escape (\). That said, the part
I'm not clear on is the actual search string. Does the $ get handled in
the search string as well? For example, let's say I want to find someone
on Foo Street, in Raleigh, NC. Would I do a search along the lines of:

postalAddress=*Foo Street $ Raleigh, NC*
or
postalAddress=*Foo Street * Raleigh, NC*
or both?

The second assertion value is the one that applies. The assertion syntax for
caseIgnoreListSubstringsMatch is Substring Assertion, for which $ is an
ordinary character (only * and \ are special). The first assertion value
would only match if there were an escaped $ (i.e. not a line separator)
in a line of the address (where only $ and \ are special).

The unescaped $ line separators in a Postal Address value are an artefact
of the LDAP-specific encoding and are not matchable character data.
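The point that unescaped $ characters are only an encoding artefact can be illustrated by decoding the value into lines before any matching happens. This is a minimal sketch, assuming the escape forms eventually published in RFC 4517 (\24 for a literal $ and \5C for a literal \); the function name decode_postal_address is hypothetical.

```python
from typing import List


def decode_postal_address(value: str) -> List[str]:
    """Split an LDAP-encoded Postal Address into its lines.

    Assumes '\\24' encodes a literal '$' and '\\5C' a literal '\\'.
    Unescaped '$' characters are line separators, not character data.
    """
    lines, current = [], []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "$":                       # separator: close the current line
            lines.append("".join(current))
            current = []
            i += 1
        elif ch == "\\":                    # escape: keep the character as data
            code = value[i + 1:i + 3].upper()
            if code == "24":
                current.append("$")
            elif code == "5C":
                current.append("\\")
            else:
                raise ValueError(f"invalid escape at position {i}")
            i += 3
        else:
            current.append(ch)
            i += 1
    lines.append("".join(current))
    return lines
```

The spaces around the separator survive decoding (decoding "123 Foo Street $ Raleigh, NC 12345" leaves a trailing space on the first line); it is the caseIgnore preparation applied afterwards that makes them insignificant. A decoded line containing a literal $ is the only case in which the first assertion value in the question could match.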

Hrm. Ok, let me see if I understand this correctly. The address query is
matched on lines, not the string as a whole?  Given address:
123 Foo Street
Raleigh, NC 12345
aka: 123 Foo Street $ Raleigh, NC 12345

If I did a query of *Raleigh*, I would expect it to:
123 Foo Street      no match
Raleigh, NC 12345   match
  address matches

If I did a query of *Foo*, I would expect it to:
123 Foo Street      match
Raleigh, NC 12345   no match
  address matches

But what if I did *Foo Street*Raleigh*, I would expect it to:
123 Foo Street      no match
Raleigh, NC 12345   no match
  address doesn't match

Is this a correct interpretation of how it should work?  If you wanted to
make sure it was Foo Street in Raleigh, NC, how would you go about doing
that?  (&(postalAddress=*Foo Street*)(postalAddress=*Raleigh*)) ?
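Both search strings in the question above can be produced programmatically. This is a minimal sketch, assuming the RFC 4515 filter-string escapes (\5c, \2a, \28, \29, \00) for literal characters inside an assertion value; the helper names are hypothetical. Unescaped * characters act as the substring separators.

```python
from typing import Iterable, List


def escape_filter_value(value: str) -> str:
    """Escape a literal for an RFC 4515 filter string (\\, *, (, ) and NUL)."""
    specials = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\x00": r"\00"}
    return "".join(specials.get(ch, ch) for ch in value)


def any_substrings_filter(attr: str, parts: Iterable[str]) -> str:
    """Build (attr=*p1*p2*...*): every part becomes an 'any' substring."""
    escaped: List[str] = [escape_filter_value(p) for p in parts]
    if not escaped or "" in escaped:
        raise ValueError("need at least one non-empty substring")
    return f"({attr}=*{'*'.join(escaped)}*)"


def and_filter(*filters: str) -> str:
    return "(&" + "".join(filters) + ")"


print(any_substrings_filter("postalAddress", ["Foo Street", "Raleigh"]))
# (postalAddress=*Foo Street*Raleigh*)
print(and_filter(any_substrings_filter("postalAddress", ["Foo Street"]),
                 any_substrings_filter("postalAddress", ["Raleigh"])))
# (&(postalAddress=*Foo Street*)(postalAddress=*Raleigh*))
```

The two forms are not equivalent. The single assertion requires "Foo Street" to precede "Raleigh" within one postalAddress value. In the (&...) form each clause is evaluated on its own, so an entry matches whenever some value contains "Foo Street" and some value (possibly a different one, if the attribute is multi-valued) contains "Raleigh", in either order.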

Thanks!

Daniel


Also is there supposed to be a space on both sides of the $, similar to
how $'s are used in schema, or would there be no spaces?

There is no requirement either way, and for caseIgnoreListMatch it makes no
difference since each line of the address is matched according to
caseIgnoreMatch which ignores leading and trailing space on each line.
It makes no difference for caseIgnoreListSubstringsMatch as well
because of the interaction of stringprep and the requirement that
a substring in an assertion value for caseIgnoreListSubstringsMatch
does not match characters across multiple lines.
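For caseIgnoreListMatch, the behaviour described above amounts to a line-by-line caseIgnoreMatch comparison of two lists of equal length, so spaces around the $ cannot change the result. In the sketch below, case_ignore_prep is a hypothetical, simplified stand-in for the RFC 4518 preparation (NFKC, case folding, trimming and collapsing whitespace), not the full algorithm.

```python
import unicodedata
from typing import List


def case_ignore_prep(s: str) -> str:
    # Simplified stand-in for RFC 4518 preparation: NFKC, case fold,
    # drop leading/trailing whitespace, collapse inner runs of whitespace.
    return " ".join(unicodedata.normalize("NFKC", s).casefold().split())


def case_ignore_list_match(value: List[str], assertion: List[str]) -> bool:
    # Same number of lines, and each pair of lines equal under caseIgnoreMatch.
    return len(value) == len(assertion) and all(
        case_ignore_prep(a) == case_ignore_prep(b) for a, b in zip(value, assertion)
    )
```

For example, the lines decoded from "123 Foo Street $ Raleigh, NC 12345" (which keep the spaces next to the $) match the lines "123 foo street" and "RALEIGH, NC 12345".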

Regards,
Steven


Daniel



--


/\\\----------------------------------------------------------------------///\
\ \\\      Daniel Henninger           http://www.vorpalcloud.org/         /// /
 \_\\\      North Carolina State University - Systems Programmer        ///_/
    \\\      Information Technology <IT>                                ///
     """--------------------------------------------------------------"""