
RE: LDAPprep: mapping of " " values



Isn't the problem that LDAPprep appears to be preparing the value for matching? Surely it should only *clean* the string and leave matching optimisations to the implementations.

-----Original Message-----
From: owner-ietf-ldapbis@OpenLDAP.org
[mailto:owner-ietf-ldapbis@OpenLDAP.org]On Behalf Of Steven Legg
Sent: Wednesday, 17 November 2004 16:11
To: Kurt D. Zeilenga
Cc: ietf-ldapbis@OpenLDAP.org
Subject: Re: LDAPprep: mapping of " " values



Kurt,

Just so we're clear, I started out arguing that LDAPprep should allow an
empty string as output to avoid anomalous behaviour in substring matching.
I concede that this may produce more matches than the user was expecting.
It is clear to me now that treating all strings and substrings exactly
the same way is the problem, no matter what that way is. I am now arguing
for LDAPprep and/or syntaxes to be revised so that whitespace treatment is
dependent on the context of the (sub)string. Sometimes that means reducing
a string of all spaces to an empty string, and sometimes it doesn't.

More comments below.

Kurt D. Zeilenga wrote:
> At 10:38 PM 11/15/2004, Steven Legg wrote:
> 
>>Kurt,
>>Kurt D. Zeilenga wrote:
>>
>>>Steven suggested changing LDAPprep such that a string
>>>comprising only whitespace would be mapped to "" instead
>>>of " ".  I believe this is a poor approach for a number of reasons.
>>>While it can be argued (as Steven has) that such mapping
>>>may make some assertions more intuitive, I argue that such
>>>mapping will make various assertions less intuitive.
>>>More importantly, the assertion (l=* *), which says "match a
>>>significant space in values of l", would no longer behave
>>>properly.  The " " ANY string would be mapped to "", leading
>>>to (l=* *) matching any value instead of only those values
>>>which contained a significant space.  This would likely
>>>break a number of applications.
>>
>>I agree that matching everything in such a case is excessive.
>>
>>
>>>It's my view that the assertion (l= *) says  "match a
>>>significant leading space in values of l".  These assertions
>>>intuitively should only match strings which are all whitespace,
>>>as leading whitespace is otherwise insignificant.
>>
>>Wouldn't it be easier to just say (l= ) ?
> 
> 
> Yes, but X.520 allows (l= *) instead.  What does your
> implementation do today?

(l= ) matches any string composed entirely of spaces.
(l= *) is equivalent to a presence match.

> In OpenLDAP, this assertion
> will only match values which are composed entirely
> of whitespace.  Others?
> 
> The logic here is that, except in one special case,
> leading and trailing spaces are insignificant.  One cannot
> match on insignificant portions of the value without giving
> them significance.

The way I see it I am treating leading space as insignificant.
Treating (l= *) like a presence match is saying that leading space
is insignificant. One gets the same result whether or not attribute
values have leading space.

> And giving leading and trailing spaces
> significance changes the character of the rules in a major
> way.

I think I'm actually doing the opposite.

It is LDAPprep that is sometimes giving whitespace in substring assertions
the wrong significance because the context of the substrings is being ignored.

> 
> 
>>>Likewise
>>>for (l=* ).  Note that this behavior is actually useful.  One
>>>can assert (!(l= *))
>>
>>or (!(l= ))
>>
>>
>>>to match all values which are not
>>>all whitespace.  Having (l= * * ) behave like (l=*)
>>>subtracts value (and likely will break applications, see
>>>above).
>>
>>In the current specifications (l= * * ) will never match anything!
> 
> 
> I believe that this is correct.  As the old adage goes:
> ask a stupid question, get a stupid answer.

If it's a stupid question then why the concern about breaking applications ?

>>A value can only match (l= *) or (l=* ) if it is all whitespace.
> 
> 
> I believe that this is correct.  As every string ("X") is equivalent
> to some string which has insignificant leading and trailing
> whitespace (" X "), these assertions would match the same
> entries as (l=*).   The client should simply do (l=*) if that
> is what it wants.
> 
> 
>>If it is all whitespace then LDAPprep reduces it to a single space.
>>A single space cannot simultaneously satisfy the initial, any and
>>final substrings.
> 
> 
> I believe that this is proper as there is only one significant
> space and the assertion asked whether there are three significant
> spaces.
> 
> 
>>What we are running into I think is the problem that whitespace in
>>different parts of an attribute value is treated differently, but the
>>whitespace in each substring of a substring assertion is treated the
>>same. Intuitively, one might expect that (l= * * ) should match a
>>value like "  foo  bar  ".  It doesn't with the current specifications.
>>It would if whitespace were reduced to nothing, but it would match everything else as well.
> 
> 
> If intuitively one might expect this, then they might also
> expect (l=* * *) to match "x  x"

Given the explicit statement that multiple consecutive spaces are equivalent
to a single space I now think it is quite reasonable to regard (l=* * *) as a
request for something significant separated by whitespace from something
significant separated by whitespace from something significant, so "x  x"
doesn't match.

Following the same point of view, it would be reasonable to regard (l=* x *)
as a request to match something significant separated by whitespace from an
"x" separated by whitespace from something significant. With the current
specifications the request will actually match any string that has an "x" in
it somewhere, regardless of whether it is preceded or followed by whitespace.
This is because LDAPprep applied to the any substring treats the whitespace
as leading and trailing whitespace and removes it leaving only "x".
However, the fact that the user has put the string into an any substring
makes it highly likely the user intended the spaces to be significant.
If we change syntaxes + LDAPprep to conditionally strip leading and trailing
spaces then an any substring of " x " would not be further reduced and the
request would match only those values containing an "x" surrounded by whitespace.
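
The (l=* x *) case above can be illustrated with a small sketch. It is a simplified model of the whitespace handling discussed in this thread, not the normative LDAPprep algorithm: `prep_uniform`, `prep_keep_edges` and `any_match` are hypothetical names, only the space character is treated as whitespace, and an all-space string is assumed to reduce to a single space as the current drafts are described here.

```python
import re

def prep_uniform(s):
    """Assumed current behaviour: strip leading/trailing spaces and
    collapse interior runs, whatever the string's context."""
    if s and s.strip(" ") == "":
        return " "  # a string of only spaces reduces to one space
    return re.sub(r" +", " ", s.strip(" "))

def prep_keep_edges(s):
    """Hypothetical variant for an any substring: collapse runs of
    spaces but keep a leading or trailing space as significant."""
    return re.sub(r" +", " ", s)

def any_match(value, any_sub, prep_sub):
    """True if the prepared value contains the prepared any substring."""
    return prep_sub(any_sub) in prep_uniform(value)

# (l=* x *) under the uniform treatment behaves like (l=*x*)
print(any_match("xyz", " x ", prep_uniform))
# with edges kept, only an "x" surrounded by whitespace matches
print(any_match("xyz", " x ", prep_keep_edges))
print(any_match("a  x  b", " x ", prep_keep_edges))
```

Under these assumptions the first call reports a match and the second does not, which is the difference being argued for.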

> (or (l=*  *) to match "x  x"
> but not "x x").

LDAP and X.520 are pretty clear that multiple consecutive spaces are equivalent
to a single space. Anyone who expects "x  x" to match but not "x x" hasn't
read the matching rule descriptions.

> If one can match insignificant leading and
> trailing spaces, then it intuitively follows one can match
> insignificant consecutive spaces.

It doesn't automatically follow since it is clear from X.520 that the significance
of interior spaces is treated differently to leading and trailing spaces.

> I believe that this is nonsense and that we should redesign
> matching to support matching of insignificant spaces.

I assume you left a "not" out of there.

I now think we should redesign substring matching so that the treatment of
whitespace in a substring is consistent with the part of the attribute value
that substring is expected to match. Thus, leading whitespace is removed from
an initial substring because the part of the attribute value it potentially
matches also has leading whitespace removed. Likewise, the trailing whitespace
is removed from a final substring because the part of the attribute value it
potentially matches also has trailing whitespace removed. All other consecutive
whitespace characters, including trailing whitespace in an initial substring
and leading whitespace in a final substring, are reduced to a single space because
they are expected to match significant whitespace within the attribute value.

> 
> 
>>What we seem to need here is for leading whitespace in the initial substring
>>and trailing whitespace in the final substring to be reduced to nothing,
>>while every other sequence of whitespace characters, in the initial, any or
>>final substring, reduces to a single space.
> 
> 
> If there is a need to match insignificant spaces, a rule which
> is specifically designed to support that matching should be used.
> These rules were designed to ignore insignificant spaces.  We
> should not change that.
> 
> 
>>It would be a modest change to LDAPprep
> 
> 
> What you ask for, IMO, is a change to the matching rule to support
> matching of insignificant spaces in certain cases.  I believe
> that such a change is inappropriate and certainly should be
> viewed as a new feature.

I see it differently. It is clear from X.520 that some spaces are
significant and some are not. We have a problem in substring matching,
as it is currently specified, that it loses track of which is which,
leading to unintuitive results. I'm suggesting a way to fix that.

> 
> 
>>to enable something like this.
>>We just need two parameters for each string handed to LDAPprep: a boolean
>>flag that indicates whether whitespace in the initial part of the string
>>is to be treated as leading whitespace, and a boolean flag that indicates
>>whether whitespace in the final part of the string is to be treated as
>>trailing whitespace. The syntaxes draft can then nominate values for the
>>flags for each string or substring it passes to LDAPprep. Alternatively,
>>LDAPprep can just reduce consecutive whitespace to a single space in every
>>case and leave the syntaxes draft to nominate the circumstances under
>>which a leading or trailing space is to be removed.

Regards,
Steven