
Re: LDAPprep: mapping of " " values




Kurt,

Kurt D. Zeilenga wrote:
Steven suggested changing LDAPprep such that a string
consisting only of whitespace would be mapped to "" instead
of " ".  I believe this is a poor approach for a number of reasons.

While it can be argued (as Steven has) that such mapping
may make some assertions more intuitive, I argue that such
mapping will make various assertions less intuitive.

More importantly, the assertion (l=* *), which says "match a
significant space in values of l", would no longer behave
properly.  The " " ANY string would be mapped to "", leading
to (l=* *) matching any value instead of only those values
which contained a significant space.  This would likely
break a number of applications.

I agree that matching everything in such a case is excessive.

It's my view that the assertion (l= *) says "match a
significant leading space in values of l".  These assertions
intuitively should only match strings which are all whitespace,
as leading whitespace is otherwise insignificant.

Wouldn't it be easier to just say (l= ) ?

> Likewise
for (l=* ).  Note that this behavior is actually useful.  One
can assert (!(l= *))

or (!(l= ))

> to match all values which are not
all whitespace.  Having (l= * * ) behave like (l=*)
subtracts value (and likely will break applications, see
above).

In the current specifications (l= * * ) will never match anything! A value can only match (l= *) or (l=* ) if it is all whitespace. If it is all whitespace then LDAPprep reduces it to a single space. A single space cannot simultaneously satisfy the initial, any and final substrings.
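
A minimal sketch of the behaviour described above, assuming a toy model of the whitespace handling only (trim the ends, collapse inner runs, map an all-whitespace string to a single space). The names prep and substring_match are hypothetical and are not taken from any draft; the real LDAPprep does considerably more than this.

```python
import re

def prep(s, blank=" "):
    """Toy whitespace step: trim both ends, collapse inner runs to one
    space, and map a non-empty all-whitespace string to `blank`."""
    collapsed = re.sub(r"\s+", " ", s).strip(" ")
    if collapsed == "" and s != "":
        return blank
    return collapsed

def substring_match(value, initial, any_parts, final, blank=" "):
    """Match a value against initial/any/final substrings after preparing
    each piece separately. Pass None for an absent initial or final."""
    v = prep(value, blank)
    pos = 0
    if initial is not None:
        i = prep(initial, blank)
        if not v.startswith(i):
            return False
        pos = len(i)
    for part in any_parts:
        p = prep(part, blank)
        found = v.find(p, pos)
        if found < 0:
            return False
        pos = found + len(p)
    if final is not None:
        f = prep(final, blank)
        # The final substring must fit in what is left of the value.
        if len(v) - pos < len(f) or not v.endswith(f):
            return False
    return True
```

Under this model, substring_match(v, " ", [" "], " ") is False for "  foo  bar  ", for "   " and for "foo", which is the (l= * * ) case. Passing blank="" models the alternative mapping: substring_match("foobar", None, [" "], None, blank="") is then True, which is the "matches everything" problem with (l=* *).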

What we are running into, I think, is the problem that whitespace in
different parts of an attribute value is treated differently, but the
whitespace in each substring of a substring assertion is treated the
same. Intuitively, one might expect that (l= * * ) should match a
value like "  foo  bar  ". It doesn't with the current specifications.
It would if whitespace were reduced to nothing, but it would match everything
else as well.

What we seem to need here is for leading whitespace in the initial substring
and trailing whitespace in the final substring to be reduced to nothing,
while every other sequence of whitespace characters, in the initial, any or
final substring, reduces to a single space.

It would be a modest change to LDAPprep to enable something like this.
We just need two parameters for each string handed to LDAPprep: a boolean
flag that indicates whether whitespace in the initial part of the string
is to be treated as leading whitespace, and a boolean flag that indicates
whether whitespace in the final part of the string is to be treated as
trailing whitespace. The syntaxes draft can then nominate values for the
flags for each string or substring it passes to LDAPprep. Alternatively,
LDAPprep can just reduce consecutive whitespace to a single space in every
case and leave the syntaxes draft to nominate the circumstances under
which a leading or trailing space is to be removed.
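
A rough sketch of the two-flag idea, covering the whitespace handling only. The names prep_spaces and matches are hypothetical, and the choice to prepare the attribute value itself with both flags set is an assumption made for this illustration, not something the drafts state.

```python
import re

def prep_spaces(s, leading, trailing):
    """Collapse every whitespace run to one space, then drop the leading
    space if `leading` is set and the trailing space if `trailing` is set."""
    out = re.sub(r"\s+", " ", s)
    if leading and out.startswith(" "):
        out = out[1:]
    if trailing and out.endswith(" "):
        out = out[:-1]
    return out

def matches(value, initial, any_parts, final):
    """Substring match with per-piece flags: leading whitespace is dropped
    only from the initial substring, trailing whitespace only from the
    final substring. Pass None for an absent initial or final."""
    v = prep_spaces(value, True, True)  # assumption: value trimmed at both ends
    pos = 0
    if initial is not None:
        i = prep_spaces(initial, True, False)
        if not v.startswith(i):
            return False
        pos = len(i)
    for part in any_parts:
        p = prep_spaces(part, False, False)
        found = v.find(p, pos)
        if found < 0:
            return False
        pos = found + len(p)
    if final is not None:
        f = prep_spaces(final, False, True)
        if len(v) - pos < len(f) or not v.endswith(f):
            return False
    return True
```

With these flags, matches("  foo  bar  ", " ", [" "], " ") is True while matches("foobar", " ", [" "], " ") is False, so (l= * * ) matches values containing a significant inner space without matching everything. Whitespace at the end of an initial substring stays significant: matches("foo bar", " foo ", [], None) is True, but the same assertion against "foobar" is False.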


Additionally, I believe it important that all outputs of LDAPprep be valid per the syntax of the input. If this is not so, then implementations must be very careful not to apply LDAPprep to the output of LDAPprep. Also, LDAPprep could not be used as a canonicalization function if we were to adopt this mapping.

In the wider context of component matching (and potentially even within the framework of X.500) there are many ways that the output of LDAPprep could be invalid with respect to the syntax, i.e. ASN.1 type, of the abstract value that supplied the input string. It can change the length of the string such that it is no longer an acceptable length: too short, too long (?) or an explicitly disallowed length. It can introduce space characters where space characters are disallowed. It can create a sequence of characters that no longer satisfies a pattern constraint or value constraint. And so on. And what exactly is the output syntax of LDAPprep in ASN.1 terms? A UTF8String? A UniversalString? That clearly doesn't line up with an input that is a TeletexString.

LDAPprep is only used within the LDAP technical specification to prepare
character strings for a comparison routine. It is an internal part
of a function that accepts two values and produces TRUE, FALSE or Undefined
as a result. If someone wants to use it for something else, like canonicalization,
then they have to deal with the consequences, which are far more involved than
dealing with empty strings.

Regards,
Steven


Kurt