Re: LDAPprep: mapping of " " values




David,

David Wilson wrote:

I'm sorry that I'm a bit late into this discussion (because I'm not
actually a recipient of this list). But I am interested in the question
of spaces at the start and end of substring filter strings.

Something important needs to be remembered in this discussion: that a
substring matching rule has a closely associated equality matching rule.
The association is summarized in X.501 8.8.5:

"With respect to a specific attribute type, the equality and substrings
rule (if both present) shall always be related in at least the following
respect: for all x and y that match according to the equality relation,
then for all values z of the substring relation, the result of
evaluating the assertion against the value x equals the result of
evaluating the assertion against the value y. That is, two values that
are indistinguishable using the equality relation are also
indistinguishable using the substrings relation."

This seems a very reasonable requirement.

I agree.
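
The quoted requirement can be phrased as a brute-force check over sample values and assertions. What follows is only a sketch for illustration: `is_consistent`, `eq`, `naive_initial` and `stripped_initial` are hypothetical names, and the toy rules are assumptions rather than anything defined in X.501 or X.520.

```python
from itertools import combinations

def is_consistent(eq, sub, values, assertions):
    """True if any two values equal under eq give the same result for every assertion under sub."""
    for x, y in combinations(values, 2):
        if eq(x, y):
            for z in assertions:
                if sub(z, x) != sub(z, y):
                    return False
    return True

# Toy equality rule (assumption): leading and trailing spaces are insignificant.
def eq(x, y):
    return x.strip(" ") == y.strip(" ")

# Toy "initial" substring rule that compares raw characters.
def naive_initial(z, value):
    return value.startswith(z)

# Toy "initial" substring rule that first drops outer spaces on both sides.
def stripped_initial(z, value):
    return value.strip(" ").startswith(z.strip(" "))

values = ["foo", "foo ", " foobar", "foobar"]
assertions = ["foo ", " foo"]
print(is_consistent(eq, naive_initial, values, assertions))     # False
print(is_consistent(eq, stripped_initial, values, assertions))  # True
```

The raw comparison fails because "foo" and "foo " are equal under `eq` but only one of them starts with "foo ".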

> But it does impact the
> behaviour of matching a substring filter when there are spaces at the
> start or end of the strings within the filter.

It rules out some interpretations, but there is still more than one way to satisfy the requirement.


X.520 Section 6.1 is not very clear about when spaces are insignificant in the strings within substring filters. It CAN be interpreted to mean that any space at the start or end of the individual strings is ignored, and so discarded, before the process of comparing the sequence of strings in the filter with the value being tested. This interpretation is consistent with the above requirement.
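
A minimal sketch of that "discard the outer spaces" reading, under stated assumptions: the assertion is given as the text after "cn=" and contains at least one "*", values are prepared by removing outer spaces and collapsing inner runs to one space, and `normalize`, `matches_stripped` are hypothetical names, not part of any LDAP library.

```python
import re

def normalize(s):
    # Assumed preparation: drop leading/trailing spaces, collapse inner runs to one space.
    return re.sub(r" +", " ", s.strip(" "))

def matches_stripped(assertion, value):
    # Split "initial*any*...*final"; a missing part is the empty string.
    parts = [normalize(p) for p in assertion.split("*")]
    initial, anys, final = parts[0], parts[1:-1], parts[-1]
    value = normalize(value)
    if not value.startswith(initial):
        return False
    pos = len(initial)
    for a in anys:
        i = value.find(a, pos)      # substrings must match in order, without overlap
        if i < 0:
            return False
        pos = i + len(a)
    return len(value) - pos >= len(final) and value.endswith(final)

print(matches_stripped(" foo*", " foobar"))          # True
print(matches_stripped(" foo*", "foobar"))           # True
print(matches_stripped("John * Smith", "JohnSmith")) # True
```

Under this reading, values that are equal always give the same result, at the cost of matching values such as "JohnSmith" that contain no space at all.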

If one does not discard the spaces at the start, and that seems
desirable from some points of view, then this makes matching more
complicated, because of the requirement.

Firstly, if an "initial" substring starts with spaces, these can be
regarded as insignificant. Clearly:

	(cn= foo*)

should match (using quotation marks to make the spaces clearer)

	cn=" foobar"

but since this matches for equality (because of the insignificant
initial space)

	cn="foobar"

the filter should also match this second value.

Similarly, spaces at the end of a "final" substring are insignificant.

Yes, consistency between the equality and substring matching rules requires that leading spaces in an initial substring and trailing spaces in the final substring are insignificant. But we are getting into difficulty when the substrings are all spaces. The consistency requirement doesn't help us here.


The problem arises with spaces at the end of an "initial" substring, the start of a "final" substring and at the start or end of an "any" substring. In each case, there can be a natural match with a value where the space in question can be an initial space in a value, a final space in a value, or one of a sequence of spaces within a value.

Consider these cases:

	(cn=foo *)

which matches

	cn="foo "

and so MUST match

	cn="foo"

One proposal (which I currently favour) would reduce trailing space in an initial substring to a single space. Trailing space in an attribute value would be removed. This means that (cn=foo *) does not match cn="foo ", but nor does it match cn="foo", so the consistency between equality and substring matching is not broken.


Then:

	(cn=John * Smith)

clearly matches

	cn="John  Smith"

(two spaces in the middle), but the value matches for equality with

	cn="John Smith"

(one space in the middle).

If trailing spaces in an initial substring and leading spaces in a final substring are reduced to a single space (and multiple consecutive spaces in an attribute value are replaced by a single space) then neither of the above values would be matched, and consistency is not broken.
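
The single-space proposal can be sketched as follows. This is an illustration under assumptions, not a specification: the function names are hypothetical, the assertion is the text after "cn=" with at least one "*", "any" substrings are assumed to keep one space at either end, and substrings made up entirely of spaces are handled arbitrarily.

```python
import re

def normalize_value(value):
    # Attribute values: drop leading/trailing spaces, collapse each run to one space.
    return re.sub(r" +", " ", value.strip(" "))

def prep_substring(s, kind):
    # kind is "initial", "any" or "final" (hypothetical labels).
    s = re.sub(r" +", " ", s)   # every run of spaces becomes a single space
    if kind == "initial":
        s = s.lstrip(" ")       # leading spaces of an initial substring are insignificant
    elif kind == "final":
        s = s.rstrip(" ")       # trailing spaces of a final substring are insignificant
    return s

def matches(assertion, value):
    parts = assertion.split("*")
    initial = prep_substring(parts[0], "initial")
    anys = [prep_substring(p, "any") for p in parts[1:-1]]
    final = prep_substring(parts[-1], "final")
    value = normalize_value(value)
    if not value.startswith(initial):
        return False
    pos = len(initial)
    for a in anys:
        i = value.find(a, pos)  # disjoint, in-order matching
        if i < 0:
            return False
        pos = i + len(a)
    return len(value) - pos >= len(final) and value.endswith(final)

print(matches("foo *", "foo "))               # False
print(matches("foo *", "foo"))                # False
print(matches("John * Smith", "John  Smith")) # False
print(matches("John * Smith", "John Smith"))  # False
print(matches("John * Smith", "John Q Smith"))# True
```

Equal values still get equal results; the filter simply needs a separate space in the value for each space it keeps.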

> So the two spaces in the filter need to match
> a SINGLE space in the second value.

I don't like any solution that requires non-disjoint matching against substrings, though with spaces it can be finessed. That is, for attribute values, remove leading and trailing spaces and replace each run of one or more consecutive spaces with *two* space characters. For substrings, remove leading spaces from initial substrings and trailing spaces from final substrings, replace trailing spaces in an initial or any substring and leading spaces in a final or any substring with a *single* space, and replace every other run of one or more consecutive spaces with *two* space characters.

This means that (cn=John * Smith) is unchanged. Both values are normalized to
cn="John  Smith", which the substrings of the assertion match disjointly.

This strategy has the effect of allowing spaces in the substrings to overlap
against the matched values, but I feel it is too unorthodox for substring
matching in the directory.
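
The two-space normalization described above can be sketched like this. It is an illustration only: the function names are hypothetical, the assertion is the text after "cn=" with at least one "*", and the treatment of substrings consisting entirely of spaces is an arbitrary placeholder, since that case is not settled here.

```python
import re

def normalize_value(value):
    # Strip outer spaces, then turn every run of one or more spaces into exactly two.
    return re.sub(r" +", "  ", value.strip(" "))

def prep_substring(s, kind):
    # kind is "initial", "any" or "final" (hypothetical labels).
    keep_lead = s.startswith(" ") and kind != "initial"   # dropped for initial
    keep_trail = s.endswith(" ") and kind != "final"      # dropped for final
    core = re.sub(r" +", "  ", s.strip(" "))              # inner runs become two spaces
    if not core:
        # All-space substring: placeholder behaviour, not taken from the proposal.
        return " " if (keep_lead or keep_trail) else ""
    return (" " if keep_lead else "") + core + (" " if keep_trail else "")

def matches(assertion, value):
    parts = assertion.split("*")
    initial = prep_substring(parts[0], "initial")
    anys = [prep_substring(p, "any") for p in parts[1:-1]]
    final = prep_substring(parts[-1], "final")
    value = normalize_value(value)
    if not value.startswith(initial):
        return False
    pos = len(initial)
    for a in anys:
        i = value.find(a, pos)  # disjoint, in-order matching on the normalized value
        if i < 0:
            return False
        pos = i + len(a)
    return len(value) - pos >= len(final) and value.endswith(final)

print(repr(normalize_value("John Smith")))    # 'John  Smith'
print(matches("John * Smith", "John Smith"))  # True
print(matches("John * Smith", "John  Smith")) # True
print(matches("John * Smith", "JohnSmith"))   # False
```

The doubling is what lets the two filter spaces share what was a single space in the original value while the matcher itself stays disjoint.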


It is possible to write a substring matching rule that does the right thing, but it does have to deal with some interesting cases. You cannot do this in a simple-minded fashion, or you run the risk of matching too few strings. The simpler choice of stripping spaces from the start and end of each substring in the filter has the property of matching, perhaps, too many strings. But that seems a better failing.

But what of substrings that are all spaces?


[So, what about (cn= * )? This has a natural match to any value which starts and ends with a space. But since these spaces are insignificant for the equality match, if this filter matches any value, it matches all values. The insignificant space rule for the equality matching rule means that the corresponding substring rule cannot be used to find values which start or end with spaces.]

The insignificant space rule for the equality matching rule says that a string of all spaces is equivalent to a single space, which leaves wriggle room for an interpretation where (cn= * ) matches " ", without breaking consistency.



Part of the problem, it seems to me, is that people are wanting NOT to
match spaces, but word boundaries.

That does seem to be the case, but I think that is beyond the mandate of LDAPbis.

> In terms of regular expressions, a
> word boundary is a zero-width regular expression. But even
> "word boundary" can be defined in lots of ways. X.520
> (Section 6.5) does define some word matching rules. However, the
> definition of a word is, in the famous exit-strategy of the standards
> writer, "a local matter".

But I don't think that substring filters should be changed to interpret
spaces as word boundary matches. Different matching rules should be used
for word matching semantics, however defined.

I agree.
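
The zero-width nature of a word boundary mentioned above can be seen with Python's `re` module, used here purely as an illustration. Its `\b` is one particular definition of a word boundary (an edge between `\w` and non-`\w` characters), not the one X.520 intends; `boundaries` and `whole_words` are hypothetical helper names.

```python
import re

def boundaries(text):
    # Positions where \b matches; each match consumes no characters.
    return [m.start() for m in re.finditer(r"\b", text)]

def whole_words(word, text):
    # Occurrences of word delimited by boundaries rather than by literal spaces.
    return re.findall(r"\b" + re.escape(word) + r"\b", text)

print(re.search(r"\b", "John Smith").span())        # (0, 0)
print(boundaries("John Smith"))                     # [0, 4, 5, 10]
print(whole_words("Smith", "John Smith, Smithson")) # ['Smith']
```

No space character is consumed by the match, which is quite different from a substring filter in which the space is part of the asserted string.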

Regards,
Steven