
Re: LDAPprep: mapping of " " values



I'm sorry that I'm a bit late into this discussion (because I'm not
actually a recipient of this list). But I am interested in the question
of spaces at the start and end of substring filter strings.

Something important needs to be remembered in this discussion: that a
substring matching rule has a closely associated equality matching rule.
The association is summarized in X.501 8.8.5:

"With respect to a specific attribute type, the equality and substrings
rule (if both present) shall always be related in at least the following
respect: for all x and y that match according to the equality relation,
then for all values z of the substring relation, the result of
evaluating the assertion against the value x equals the result of
evaluating the assertion against the value y. That is, two values that
are indistinguishable using the equality relation are also
indistinguishable using the substrings relation."

This seems a very reasonable requirement. But it does impact the
behaviour of matching a substring filter when there are spaces at the
start or end of the strings within the filter.

X.520 Section 6.1 is not very clear about when spaces are insignificant
in the strings within substring filters. It CAN be interpreted to mean
that any space at the start or end of the individual strings is ignored,
and so discarded, before the process of comparing the sequence of
strings in the filter with the value being tested. This interpretation
is consistent with the above requirement.

If one does not discard these spaces, which seems desirable from some
points of view, then matching becomes more complicated, because the
X.501 requirement quoted above must still hold.

Firstly, if an "initial" substring starts with spaces, these can be
regarded as insignificant. Clearly:

	(cn= foo*)

should match (using quotation marks to make the spaces clearer)

	cn=" foobar"

but since this matches for equality (because of the insignificant
initial space)

	cn="foobar"

the filter should also match this second value.

Similarly, spaces at the end of a "final" substring are insignificant.

The problem arises with spaces at the end of an "initial" substring, the
start of a "final" substring and at the start or end of an "any"
substring. In each case, there can be a natural match with a value where
the space in question can be an initial space in a value, a final space
in a value, or one of a sequence of spaces within a value.

Consider these cases:

	(cn=foo *)

which matches

	cn="foo "

and so MUST match

	cn="foo"

Then:

	(cn=John * Smith)

clearly matches

	cn="John  Smith"

(two spaces in the middle), but the value matches for equality with

	cn="John Smith"

(one space in the middle). So the two spaces in the filter need to match
a SINGLE space in the second value.
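
To make these cases concrete, here is a minimal Python sketch. It is
not taken from any directory implementation. It assumes an equality rule
in the style of caseIgnoreMatch (leading and trailing spaces ignored,
runs of spaces treated as one space, case ignored) and a deliberately
naive substring matcher that compares characters literally. The names
normalize, equal, naive_match and violates are mine, and an assertion
is represented as a hypothetical (initial, any-list, final) tuple.

```python
import re

def normalize(value):
    # Assumed equality semantics: drop leading/trailing spaces,
    # collapse runs of spaces to one space, ignore case.
    return re.sub(r" +", " ", value.strip(" ")).lower()

def equal(x, y):
    return normalize(x) == normalize(y)

def naive_match(assertion, value):
    # Literal matching: every space in the filter is significant.
    initial, anys, final = assertion
    pos = 0
    if initial is not None:
        if not value.startswith(initial):
            return False
        pos = len(initial)
    for part in anys:
        found = value.find(part, pos)
        if found < 0:
            return False
        pos = found + len(part)
    if final is not None:
        # The final substring must not overlap what was already consumed.
        return len(value) - pos >= len(final) and value.endswith(final)
    return True

def violates(assertion, x, y):
    # True when x and y are equal but the substring rule tells them apart,
    # which is what the X.501 requirement forbids.
    return equal(x, y) and naive_match(assertion, x) != naive_match(assertion, y)

print(violates((" foo", [], None), " foobar", "foobar"))             # (cn= foo*)
print(violates(("foo ", [], None), "foo ", "foo"))                   # (cn=foo *)
print(violates(("John ", [], " Smith"), "John  Smith", "John Smith"))  # (cn=John * Smith)
```

Under these assumptions, all three filters above break the requirement
when matched literally.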

It is possible to write a substring matching rule that does the right
thing, but it does have to deal with some interesting cases. You cannot
do this in a simple-minded fashion, or you run the risk of matching too
few strings. The simpler choice of stripping spaces from the start and
end of each substring in the filter has the property of matching,
perhaps, too many strings. But that seems a better failing.
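
As an illustration of that simpler choice, here is a rough Python
sketch. It assumes the same caseIgnoreMatch-style equality semantics
(leading and trailing spaces ignored, runs of spaces collapsed, case
ignored); the names normalize and strip_match, and the
(initial, any-list, final) tuple used for the assertion, are my own
invention and not from any standard or product.

```python
import re

def normalize(value):
    # Assumed equality semantics: drop leading/trailing spaces,
    # collapse runs of spaces to one space, ignore case.
    return re.sub(r" +", " ", value.strip(" ")).lower()

def strip_match(assertion, value):
    # Normalize the value and every substring in the filter, then
    # match the substrings in order against the normalized value.
    initial, anys, final = assertion
    v = normalize(value)
    pos = 0
    if initial is not None:
        ini = normalize(initial)
        if not v.startswith(ini):
            return False
        pos = len(ini)
    for part in anys:
        part = normalize(part)
        found = v.find(part, pos)
        if found < 0:
            return False
        pos = found + len(part)
    if final is not None:
        fin = normalize(final)
        # The final substring must not overlap what was already consumed.
        return len(v) - pos >= len(fin) and v.endswith(fin)
    return True

john = ("John ", [], " Smith")             # (cn=John * Smith)
print(strip_match(john, "John  Smith"))    # two spaces in the value
print(strip_match(john, "John Smith"))     # one space in the value
print(strip_match(john, "JohnSmith"))      # no space at all
```

Because the value is normalized before anything else happens, two
values that are equal always give the same result, so the requirement
holds by construction. The price is visible in the last call: with the
spaces stripped from the filter, "JohnSmith" matches as well.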

[So, what about (cn= * )? This has a natural match to any value which
starts and ends with a space. But since these spaces are insignificant
for the equality match, if this filter matches any value, it matches all
values. The insignificant space rule for the equality matching rule
means that the corresponding substring rule cannot be used to find
values which start or end with spaces.]


Part of the problem, it seems to me, is that people are wanting NOT to
match spaces, but word boundaries. In terms of regular expressions, a
word boundary is a zero-width regular expression. But even a "word
boundary" can be defined in lots of ways. X.520 (Section 6.5) does
define some word matching rules. However, the definition of a word is,
in the famous exit strategy of the standards writer, "a local matter".

But I don't think that substring filters should be changed to interpret
spaces as word boundary matches. Different matching rules should be used
for word matching semantics, however defined.

-- 
David Wilson <David.Wilson@isode.com>