
Re: LDAPprep: mapping of " " values




Kurt,

Just so we're clear, I started out arguing that LDAPprep should allow an
empty string as output to avoid anomalous behaviour in substring matching.
I concede that this may produce more matches than the user was expecting.
It is clear to me now that treating all strings and substrings exactly
the same way is the problem, no matter what that way is. I am now arguing
for LDAPprep and/or syntaxes to be revised so that whitespace treatment is
dependent on the context of the (sub)string. Sometimes that means reducing
a string of all spaces to an empty string, and sometimes it doesn't.
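As a rough illustration of what "treating all strings and substrings exactly the same way" does, here is a minimal Python sketch. It assumes the whitespace rules as they are stated in this thread for the current LDAPprep draft (runs of spaces collapse to one space, leading and trailing spaces are dropped, and an all-space string becomes a single space) and ignores every other LDAPprep step. The function names are hypothetical.

```python
import re


def prep_uniform(s: str) -> str:
    """Assumed space handling of the current draft, applied to every string."""
    collapsed = re.sub(r" +", " ", s)   # runs of spaces -> one space
    trimmed = collapsed.strip(" ")       # leading/trailing spaces dropped
    return trimmed if trimmed else " "   # all-space (or empty) string -> " "


def substring_match(value, initial=None, anys=(), final=None):
    """Match initial/any/final substrings, in order and without overlap,
    after preparing the value and every substring in the same way."""
    v = prep_uniform(value)
    start, end = 0, len(v)
    if initial is not None:
        i = prep_uniform(initial)
        if not v.startswith(i):
            return False
        start = len(i)
    if final is not None:
        f = prep_uniform(final)
        if not v.endswith(f) or len(f) > end - start:
            return False                 # final would overlap the initial
        end -= len(f)
    for a in anys:
        a = prep_uniform(a)
        idx = v.find(a, start, end)      # leftmost occurrence in the gap
        if idx < 0:
            return False
        start = idx + len(a)
    return True


# (l= * * ): initial " ", any " ", final " "
print(substring_match("   ", initial=" ", anys=[" "], final=" "))           # False
print(substring_match("  foo  bar  ", initial=" ", anys=[" "], final=" "))  # False
# (l=* x *): the any substring " x " is reduced to "x"
print(substring_match("box", anys=[" x "]))                                 # True
```

Under these assumptions the single space left from an all-space value cannot serve as initial, any and final at once, and the spaces around "x" disappear before matching.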

More comments below.

Kurt D. Zeilenga wrote:
> At 10:38 PM 11/15/2004, Steven Legg wrote:
>
> > Kurt,
> > Kurt D. Zeilenga wrote:
> >
> > > Steven suggested changing LDAPprep such that a string
> > > comprising only whitespace would be mapped to "" instead
> > > of " ".  I believe this is a poor approach for a number of reasons.
> > > While it can be argued (as Steven has) that such mapping
> > > may make some assertions more intuitive, I argue that such
> > > mapping will make various assertions less intuitive.
> > > More importantly, the assertion (l=* *), which says "match a
> > > significant space in values of l", would no longer behave
> > > properly.  The " " ANY string would be mapped to "", leading
> > > to (l=* *) matching any value instead of only those values
> > > which contained a significant space.  This would likely
> > > break a number of applications.

> > I agree that matching everything in such a case is excessive.
> >
> > > It's my view that the assertion (l= *) says "match a
> > > significant leading space in values of l".  These assertions
> > > intuitively should only match strings which are all whitespace,
> > > as leading whitespace is otherwise insignificant.
> >
> > Wouldn't it be easier to just say (l= )?
>
> Yes, but X.520 allows (l= *) instead.  What does your
> implementation do today?

(l= ) matches any string composed entirely of spaces. (l= *) is equivalent to a presence match.

> In OpenLDAP, this assertion
> will only match values which are composed entirely
> of whitespace.  Others?
>
> The logic here is that, except in one special case,
> leading and trailing spaces are insignificant.  One cannot
> match on insignificant portions of the value without giving
> them significance.

The way I see it I am treating leading space as insignificant. Treating (l= *) like a presence match is saying that leading space is insignificant. One gets the same result whether or not attribute values have leading space.

> And giving leading and trailing spaces
> significance changes the character of the rules in a major
> way.

I think I'm actually doing the opposite.

It is LDAPprep that is sometimes giving whitespace in substring assertions
the wrong significance because the context of the substrings is being ignored.



> > > Likewise
> > > for (l=* ).  Note that this behavior is actually useful.  One
> > > can assert (!(l= *))
> >
> > or (!(l= ))
> >
> > > to match all values which are not
> > > all whitespace.  Having (l= * * ) behave like (l=*)
> > > subtracts value (and likely will break applications, see
> > > above).
> >
> > In the current specifications (l= * * ) will never match anything!
>
> I believe that this is correct.  As the old adage goes:
> ask a stupid question, get a stupid answer.

If it's a stupid question, then why the concern about breaking applications?

> > A value can only match (l= *) or (l=* ) if it is all whitespace.
>
> I believe that this is correct. As every string ("X") is equivalent
> to some string which has insignificant leading and trailing
> whitespace (" X "), these assertions would match the same
> entries as (l=*). The client should simply do (l=*) if that is
> what it wants.
>
> > If it is all whitespace then LDAPprep reduces it to a single space.
> > A single space cannot simultaneously satisfy the initial, any and
> > final substrings.
>
> I believe that this is proper as there is only one significant
> space and the assertion asked whether there are three significant
> spaces.


> > What we are running into, I think, is the problem that whitespace in
> > different parts of an attribute value is treated differently, but the
> > whitespace in each substring of a substring assertion is treated the
> > same. Intuitively, one might expect that (l= * * ) should match a
> > value like "  foo  bar  ".  It doesn't with the current specifications.

It would if whitespace were reduced to nothing, but it would match everything else as well.

> If intuitively one might expect this, then they might also
> expect (l=* * *) to match "x  x"

Given the explicit statement that multiple consecutive spaces are equivalent to a single space, I now think it is quite reasonable to regard (l=* * *) as a request for something significant separated by whitespace from something significant separated by whitespace from something significant, so "x  x" doesn't match.

Following the same point of view, it would be reasonable to regard (l=* x *)
as a request to match something significant separated by whitespace from an
"x" separated by whitespace from something significant. With the current
specifications the request will actually match any string that has an "x" in
it somewhere, regardless of whether it is preceded or followed by whitespace.
This is because LDAPprep applied to the any substring treats the whitespace
as leading and trailing whitespace and removes it leaving only "x".
However, the fact that the user has put the string into an any substring
makes it highly likely the user intended the spaces to be significant.
If we change syntaxes + LDAPprep to conditionally strip leading and trailing
spaces then an any substring of " x " would not be further reduced and the
request would match only those values containing an "x" surrounded by whitespace.
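The difference for (l=* x *) can be sketched in a few lines of Python. The helpers are hypothetical: `any_current` follows the uniform trimming described above for the current draft, while `any_proposed` only collapses runs of spaces, which is one assumed reading of the conditional stripping suggested here. Because (l=* x *) has no initial or final substring, matching reduces to a containment test on the prepared value.

```python
import re


def collapse(s: str) -> str:
    return re.sub(r" +", " ", s)        # runs of spaces -> one space


def prep_value(s: str) -> str:
    # Attribute value: leading/trailing spaces insignificant (both schemes).
    # Assumes a value that is not all spaces, as in the examples below.
    return collapse(s).strip(" ")


def any_current(s: str) -> str:
    # Assumed current behaviour: an any substring is trimmed like a whole value.
    t = collapse(s).strip(" ")
    return t if t else " "


def any_proposed(s: str) -> str:
    # Assumed proposal: an any substring lies inside the value, so its
    # (collapsed) edge spaces are kept as significant.
    return collapse(s)


def matches_any(value: str, sub: str, prep_any) -> bool:
    # (l=*sub*): no initial or final, so just a containment test.
    return prep_any(sub) in prep_value(value)


for value in ["box", "a x b", "a   x  b", "x b"]:
    print(repr(value),
          matches_any(value, " x ", any_current),
          matches_any(value, " x ", any_proposed))
```

With the uniform treatment all four values match; with the conditional treatment "box" and "x b" do not, because their "x" is not surrounded by whitespace.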

> (or (l=*  *) to match "x  x"
> but not "x x").

LDAP and X.520 are pretty clear that multiple consecutive spaces are equivalent to a single space. Anyone who expects "x  x" to match but not "x x" hasn't read the matching rule descriptions.

> If one can match insignificant leading and
> trailing spaces, then it intuitively follows one can match
> insignificant consecutive spaces.

It doesn't automatically follow since it is clear from X.520 that the significance of interior spaces is treated differently to leading and trailing spaces.

> I believe that this is nonsense and that we should redesign
> matching to support matching of insignificant spaces.

I assume you left a "not" out of there.

I now think we should redesign substring matching so that the treatment of
whitespace in a substring is consistent with the part of the attribute value
that substring is expected to match. Thus, leading whitespace is removed from
an initial substring because the part of the attribute value it potentially
matches also has leading whitespace removed. Likewise, the trailing whitespace
is removed from a final substring because the part of the attribute value it
potentially matches also has trailing whitespace removed. All other consecutive
whitespace characters, including trailing whitespace in an initial substring
and leading whitespace in a final substring, are reduced to a single space because
they are expected to match significant whitespace within the attribute value.
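A minimal Python sketch of this positional treatment, assuming it is expressed through the two boolean flags proposed in the quoted message below (named `leading` and `trailing` here; those names and the matcher are hypothetical, and only the space-handling step of LDAPprep is modelled). Under these assumptions an all-space attribute value prepares to the empty string.

```python
import re


def prep(s: str, leading: bool, trailing: bool) -> str:
    """Hypothetical two-flag space handling: collapse runs of spaces, then
    drop edge spaces only where they are leading/trailing spaces of the value."""
    t = re.sub(r" +", " ", s)
    if leading:
        t = t.lstrip(" ")
    if trailing:
        t = t.rstrip(" ")
    return t


def substring_match(value, initial=None, anys=(), final=None):
    v = prep(value, True, True)          # whole value: strip both edges
    start, end = 0, len(v)
    if initial is not None:
        i = prep(initial, True, False)   # only the front of initial is leading
        if not v.startswith(i):
            return False
        start = len(i)
    if final is not None:
        f = prep(final, False, True)     # only the back of final is trailing
        if not v.endswith(f) or len(f) > end - start:
            return False                 # final would overlap the initial
        end -= len(f)
    for a in anys:
        a = prep(a, False, False)        # any substrings are interior
        idx = v.find(a, start, end)
        if idx < 0:
            return False
        start = idx + len(a)
    return True


print(substring_match("  foo  bar  ", initial=" ", anys=[" "], final=" "))  # True
print(substring_match("foo", anys=[" "]))                                    # False
print(substring_match("foo bar", anys=[" "]))                                # True
```

In this sketch (l= * * ) matches "  foo  bar  ", while (l=* *) still requires a significant interior space, so it does not degrade into matching every value.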



> > What we seem to need here is for leading whitespace in the initial substring
> > and trailing whitespace in the final substring to be reduced to nothing,
> > while every other sequence of whitespace characters, in the initial, any or
> > final substring, reduces to a single space.


> If there is a need to match insignificant spaces, a rule which
> is specifically designed to support that matching should be used.
> These rules were designed to ignore insignificant spaces.  We
> should not change that.


> > It would be a modest change to LDAPprep


> What you ask for, IMO, is a change to matching rule to support
> matching of insignificant spaces in certain cases.  I believe
> that such a change is inappropriate and certainly should be
> viewed as a new feature.

I see it differently. It is clear from X.520 that some spaces are significant and some are not. We have a problem in substring matching, as it is currently specified, that it loses track of which is which, leading to unintuitive results. I'm suggesting a way to fix that.



> > to enable something like this.
> > We just need two parameters for each string handed to LDAPprep: a boolean
> > flag that indicates whether whitespace in the initial part of the string
> > is to be treated as leading whitespace, and a boolean flag that indicates
> > whether whitespace in the final part of the string is to be treated as
> > trailing whitespace. The syntaxes draft can then nominate values for the
> > flags for each string or substring it passes to LDAPprep. Alternatively,
> > LDAPprep can just reduce consecutive whitespace to a single space in every
> > case and leave the syntaxes draft to nominate the circumstances under
> > which a leading or trailing space is to be removed.

Regards, Steven