Re: [Models] An attribute value should be equal to self




Kurt,

Kurt D. Zeilenga wrote:
At 11:38 PM 3/10/2005, Steven Legg wrote:

I'm asking that LDAPprep not fail.


Given that unassigned code points are to be prohibited
[Section 7, Stringprep] (in at least one of the two involved
strings), and all revisions of Unicode have unassigned
code points, all proper profiles of Stringprep can fail
and will fail in the face of unassigned code points.

Not necessarily. Section 7 says that the "list of unassigned code points MUST be given in a profile, and that list MUST be used by implementations of the profile in step 3 (prohibit)". Step 3, i.e. Section 5, doesn't say anything about the unassigned code points but does suggest that a profile has a degree of discretion as to which characters to treat as prohibited. In the absence of anything definitive, I think Section 7 is more relevant in that it says "Stored strings using the profile MUST NOT contain any unassigned code points. Queries for matching strings MAY contain unassigned code points". This suggests to me that attribute values MUST NOT contain unassigned code points, but assertion values MAY contain unassigned code points. Stringprep seems to be saying that there are two different preparation algorithms: one that is applied to stored strings to normalize them and to check for prohibited characters, including unassigned code points, and another that is applied to query strings to normalize them and to check for prohibited characters, not including unassigned code points. At the moment LDAPprep is not drawing any distinction between a string in an attribute value and a string in an assertion value. However, there are wider implications.
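
To make the stored-versus-query distinction concrete, here is a minimal Python sketch. It is not LDAPprep: the function name prepare, the stored flag and the PrepError class are hypothetical, the map, bidi and insignificant-character steps are left out, and private-use code points (table C.3) merely stand in for whatever else a profile prohibits. The table lookups come from Python's standard stringprep module, which implements the RFC 3454 tables.

```python
import stringprep
from unicodedata import ucd_3_2_0  # Unicode 3.2 database, as Stringprep expects


class PrepError(ValueError):
    """Raised when a string contains a code point the profile prohibits."""


def prepare(value: str, *, stored: bool) -> str:
    """Hypothetical two-mode preparation: normalize, then prohibit."""
    normalized = ucd_3_2_0.normalize("NFKC", value)
    for ch in normalized:
        # Table A.1: code points unassigned in Unicode 3.2.
        # Only stored strings are rejected for containing them.
        if stored and stringprep.in_table_a1(ch):
            raise PrepError(f"unassigned code point U+{ord(ch):04X}")
        # Table C.3 (private use) stands in for the profile's other
        # prohibited characters; both modes reject these.
        if stringprep.in_table_c3(ch):
            raise PrepError(f"prohibited code point U+{ord(ch):04X}")
    return normalized


# U+0378 is unassigned in Unicode 3.2.
query_result = prepare("abc\u0378", stored=False)  # accepted as a query
try:
    prepare("abc\u0378", stored=True)              # rejected for storage
    stored_rejected = False
except PrepError:
    stored_rejected = True
```

With this split, preparation of an assertion value only fails on characters that are prohibited everywhere, never on unassigned code points alone.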

A requirement that R(X, X) == TRUE, with LDAPprep as it is currently
defined, will have the effect of preventing attribute values from
containing prohibited characters, but only for strings that are
examined by the equality matching rule for the attribute. There is
nothing that stops the DESC field in an attribute type description,
for example, from containing prohibited characters (including unassigned
code points). If we are to follow the spirit of stringprep then I
think that means we must prevent prohibited characters appearing in
any part of an attribute value, regardless of the equality matching rule
for the attribute. After all, with component matching any part of any
attribute value can be matched.

Incidentally, I wouldn't be averse to a blanket ban on unassigned
code points in every part of every attribute value, as long as LDAPprep
doesn't fail if they appear in an assertion value. We have discretion
as to what other characters are to be prohibited from attribute values.

Something that isn't clear to me is how stringprep is supposed to
apply to stored values. I guess that the intent is that one should
apply the stringprep profile to each string that is about to be
stored and if that returns an error (because it contains prohibited
characters) then the string should be rejected. However Section 7.1
says "Stored strings MUST NOT contain any code points outside of AO
for the latest version of a profile. That is, they are forbidden to
contain code points from the MN, D, or U categories". I take this to
mean that what one is actually supposed to store is the output of
stringprep, i.e. the normalized string. We aren't doing that.

I think we are falling short of the intent of stringprep.

That is, LDAPprep can fail.  There is no way to escape
this (and still have LDAPprep be a profile of the
StringPrep algorithm).

As you noted, troublesome characters should not be mapped
to nothing as that causes string+garbage to match string.
But mapping those troublesome characters to the replacement
character makes little sense.  If the replacement character
is prohibited, then the mapping has zero impact.  If the
replacement character is allowed, then string+garbage
would be equal to string+replacement.  I think this
is inappropriate as if we say the replacement character
is not a troublesome character, then no troublesome
character should match it.

If garbage is mapped then it has to be mapped to something. The replacement character seems to be the most appropriate because it already carries the sense of something that cannot be properly represented.
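
The two mapping choices under discussion can be compared with a small Python sketch. The helper names are hypothetical, and "troublesome" is assumed here to mean "unassigned in Unicode 3.2" (table A.1 in the standard stringprep module); a real profile could define it differently.

```python
import stringprep

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_troublesome(ch: str) -> bool:
    # Assumption for illustration: garbage means unassigned code points.
    return stringprep.in_table_a1(ch)


def map_to_nothing(value: str) -> str:
    """Drop troublesome characters entirely."""
    return "".join(ch for ch in value if not is_troublesome(ch))


def map_to_replacement(value: str) -> str:
    """Replace each troublesome character with U+FFFD."""
    return "".join(REPLACEMENT if is_troublesome(ch) else ch for ch in value)


garbage = "smith\u0378"  # U+0378 is unassigned in Unicode 3.2

# Mapping to nothing: string+garbage matches string.
nothing_collides = map_to_nothing(garbage) == map_to_nothing("smith")

# Mapping to U+FFFD: string+garbage no longer matches string ...
replacement_differs = map_to_replacement(garbage) != map_to_replacement("smith")

# ... but it does match string+replacement, if U+FFFD itself is allowed.
replacement_collides = map_to_replacement(garbage) == map_to_replacement("smith\ufffd")
```

All three flags come out true, which is both objections in miniature: the first is the string+garbage problem, and the third is the string+replacement collision that arises when U+FFFD is not itself prohibited.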


The issue I was referring to with Normalization is really a code point v. character issue, and the general assumption that matching (of the prepared values) is done in terms of characters, not code points. Ignore this for now. (Also ignore the possibility that transcoding might fail.)

The prohibition on Unassigned code points will cause
StringPrep to fail.

For a stored string but not a query string.


Now, if you want Rule(X,X) != Undefined, then in [Syntaxes] we could say that if LDAPprep fails for either the assertion value or the attribute value, the matching is False. I'd have to think about it a bit, but off-hand, I think it might be better for assertions to return Undefined where an implementation is unable to determine the abstract value represented by either the assertion or attribute value strings (e.g., it cannot prepare the string).

I agree that in the face of failure a matching rule should return Undefined. It's more a question of what causes failure. I would be less concerned if LDAPprep only failed for characters that are never allowed in stored strings anyway, but that currently isn't generally the case. For the versioning reasons put forth in stringprep we should allow unassigned code points in assertion values. If attribute values were prevented from containing unassigned code points by a blanket ban then LDAPprep could simply ignore them. If they appear then it must be because we are preparing an assertion value, and that's okay.

If we know that stored strings never contain prohibited characters
then we could sensibly map an LDAPprep failure to FALSE.
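
The two policies for a preparation failure can be written down as a three-valued match. This is a sketch under stated assumptions, not the behavior of any matching rule in [Syntaxes]: the names Match, prepare and equality_match are hypothetical, and preparation is reduced to rejecting code points that are unassigned in Unicode 3.2 (table A.1 in Python's standard stringprep module).

```python
import stringprep
from enum import Enum


class Match(Enum):
    TRUE = "TRUE"
    FALSE = "FALSE"
    UNDEFINED = "Undefined"


def prepare(value: str) -> str:
    """Stand-in for LDAPprep: fail on any unassigned code point."""
    for ch in value:
        if stringprep.in_table_a1(ch):
            raise ValueError(f"unassigned code point U+{ord(ch):04X}")
    return value


def equality_match(attribute_value: str, assertion_value: str,
                   *, on_failure: Match = Match.UNDEFINED) -> Match:
    """Compare prepared values; on_failure selects the failure policy."""
    try:
        left = prepare(attribute_value)
        right = prepare(assertion_value)
    except ValueError:
        return on_failure
    return Match.TRUE if left == right else Match.FALSE


x = "abc\u0378"  # contains a code point unassigned in Unicode 3.2

r_undefined = equality_match(x, x)                          # R(X, X) is Undefined
r_false = equality_match(x, x, on_failure=Match.FALSE)      # R(X, X) is FALSE
r_true = equality_match("abc", "abc")                       # R(X, X) is TRUE
```

Under either policy R(X, X) is not TRUE for a value that fails preparation, so the choice between Undefined and FALSE only becomes safe once such values are known never to be stored.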

My rationale is that since the implementation cannot determine
what the abstract value is, it cannot determine equality of that
abstract value to any other abstract value.  This behavior is
similar to the behavior when the string does not adhere to the
applicable syntax restrictions.  That is, I don't see much
difference between (cn=garbage) and (timeStamp=20050231Z).  Both
should evaluate to Undefined.

There's an important difference. The time stamp value violates the syntax. The matching rule has nothing to do with it. The value would still be illegal even if the attribute didn't have an equality matching rule. The cn value satisfies the syntax (though perhaps it shouldn't) but the equality matching rule evaluates to undefined.
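
The difference can be illustrated by keeping the syntax check separate from any matching rule. The two checks below are deliberately simplified assumptions with hypothetical names: the first validates only the leading YYYYMMDD of a time value against the calendar, not the full Generalized Time syntax, and the second treats any non-empty string as a valid Directory String.

```python
from datetime import datetime


def date_part_is_valid(value: str) -> bool:
    """Check only the leading YYYYMMDD of a time value against the calendar."""
    try:
        datetime.strptime(value[:8], "%Y%m%d")
    except ValueError:
        return False
    return True


def directory_string_syntax_ok(value: str) -> bool:
    """Simplified assumption: any non-empty string satisfies the syntax."""
    return len(value) > 0


# February 31 does not exist, so the value is illegal whether or not
# the attribute has an equality matching rule.
timestamp_ok = date_part_is_valid("20050231Z")

# A cn value containing an unassigned code point (U+0378) still passes
# the syntax check; only the matching rule has trouble with it.
cn_ok = directory_string_syntax_ok("garbage\u0378")
```

Here timestamp_ok is False and cn_ok is True: the first value is rejected before matching is ever considered, while the second is accepted and only fails later, inside the equality matching rule.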

Regards,
Steven


Kurt