
Re: [Models] An attribute value should be equal to self



At 10:40 PM 3/17/2005, Steven Legg wrote:
>Kurt,
>Kurt D. Zeilenga wrote:
>>At 11:38 PM 3/10/2005, Steven Legg wrote:
>>
>>>I'm asking that LDAPprep not fail. 
>>
>>Given that unassigned code points are to be prohibited
>>[Section 7, Stringprep] (in at least one of the two involved
>>strings), and all revisions of Unicode have unassigned
>>code points, all proper profiles of Stringprep can fail
>>and will fail in the face of unassigned code points.
>
>Not necessarily. Section 7 says that the "list of unassigned code
>points MUST be given in a profile, and that list MUST be used by
>implementations of the profile in step 3 (prohibit)". Step 3, i.e.
>section 5, doesn't say anything about the unassigned code points
>but does suggest that a profile has a degree of discretion as to
>which characters to treat as prohibited. In the absence of anything
>definitive, I think Section 7 is more relevant in that it says
>"Stored strings using the profile MUST NOT contain any unassigned
>code points. Queries for matching strings MAY contain unassigned
>code points".  This suggests to me that attribute values MUST NOT
>contain unassigned code points, but assertion values MAY contain
>unassigned code points.

Correct.  While profiles may prohibit other code points, all profiles
are required to prohibit unassigned code points in "stored"
strings.  At least one of the two strings involved in the
comparison is to be categorized as a "stored" string.  It was my
assumption that we'd categorized, in general, the assertion value as
a "query" string and the attribute value as a "stored" string.

(I say in general because there are some special cases.)
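
To make the stored/query distinction concrete, here is a minimal Python
sketch.  It is not LDAPprep: the function name prepare() and its "stored"
flag are hypothetical, the preparation is reduced to NFKC plus case folding,
and "unassigned" is taken to mean membership in Stringprep Table A.1
(unassigned in Unicode 3.2) as exposed by Python's stringprep module.

```python
import stringprep
import unicodedata


def prepare(s: str, stored: bool) -> str:
    """Toy preparation step (an assumption, not the LDAPprep algorithm).

    Stored strings must not contain unassigned code points; query
    strings may, so the check is skipped for them.
    """
    # Normalize with the Unicode 3.2 database that Stringprep is pinned to.
    out = unicodedata.ucd_3_2_0.normalize("NFKC", s).casefold()
    if stored and any(stringprep.in_table_a1(ch) for ch in out):
        raise ValueError("unassigned code point in stored string")
    return out
```

Under these assumptions, prepare("ab\u0221", stored=False) succeeds while
the same call with stored=True is rejected, since U+0221 is listed in
Table A.1.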

The reason I prohibited them in LDAPprep in both strings (and had
LDAPprep failures lead to Undefined matching) was that the presented
assertion value could contain a code point X, unassigned as far as
the server knows, which the client thinks should be equivalent to some
code point Y that is assigned in both server and client.  The server
evaluates caseIgnoreMatch(prep(X),prep(Y)) to Undefined, as it has no way
of knowing whether or not X and Y are equivalent under some subsequent
Unicode version.

I don't think it makes much sense for caseIgnoreMatch(prep(X),prep(Y))
to be FALSE on one server (implementing current LDAPprep) but TRUE on
another (implementing LDAPprep').
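
The version-skew concern can be illustrated with a real code point.  The
Python sketch below is only an approximation: the two functions are
hypothetical stand-ins for a server that knows Unicode 3.2 and one that
knows a later version, and preparation is reduced to NFKC plus case folding.
X is U+1D2C, which is unassigned in Unicode 3.2 but was later assigned with
a compatibility mapping to "A".

```python
import stringprep
import unicodedata

X = "\u1d2c"  # MODIFIER LETTER CAPITAL A: unassigned in Unicode 3.2
Y = "a"       # assigned in every version


def case_ignore_match_old(assertion: str, attribute: str):
    """Hypothetical server limited to Unicode 3.2; None stands for Undefined."""
    if any(stringprep.in_table_a1(ch) for ch in assertion + attribute):
        return None  # cannot know what X will turn out to mean
    old = unicodedata.ucd_3_2_0
    return (old.normalize("NFKC", assertion).casefold()
            == old.normalize("NFKC", attribute).casefold())


def case_ignore_match_new(assertion: str, attribute: str):
    """Hypothetical server using the Unicode version bundled with Python."""
    if any(unicodedata.category(ch) == "Cn" for ch in assertion + attribute):
        return None
    return (unicodedata.normalize("NFKC", assertion).casefold()
            == unicodedata.normalize("NFKC", attribute).casefold())
```

Here the older server answers Undefined for (X, Y) and the newer one answers
TRUE.  Had the older server answered FALSE instead, the two would give
contradictory definite answers for the same assertion.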

>Stringprep seems to be saying that there
>are two different preparation algorithms. An algorithm that is
>applied to stored strings to normalize them and to check for
>prohibited characters, including unassigned code points, and another
>algorithm that is applied to query strings to normalize them and
>check for prohibited characters, but not including unassigned
>code points. At the moment LDAPprep is not drawing any distinction
>between a string in an attribute value versus a string in an
>assertion value. However there are wider implications.

Yes, there certainly are wider implications.

>A requirement that R(X, X) == TRUE, with LDAPprep as it is currently
>defined, will have the effect of preventing attribute values from
>containing prohibited characters, but only for strings that are
>examined by the equality matching rule for the attribute. There is
>nothing that stops the DESC field in an attribute type description,
>for example, from containing prohibited characters (including unassigned
>code points). If we are to follow the spirit of stringprep then I
>think that means we must prevent prohibited characters appearing in
>any part of an attribute value, regardless of the equality matching rule
>for the attribute. After all, with component matching any part of any
>attribute value can be matched.
>
>Incidentally, I wouldn't be averse to a blanket ban on unassigned
>code points in every part of every attribute value,

Okay.

>as long as LDAPprep doesn't fail if they appear in an assertion value.

That would require eliminating all other prohibitions.

>We have discretion
>as to what other characters are to be prohibited from attribute values.

That we do.

>Something that isn't clear to me is how stringprep is supposed to
>apply to stored values. I guess that the intent is that one should
>apply the stringprep profile to each string that is about to be
>stored and if that returns an error (because it contains prohibited
>characters) then the string should be rejected. However Section 7.1
>says "Stored strings MUST NOT contain any code points outside of AO
>for the latest version of a profile. That is, they are forbidden to
>contain code points from the MN, D, or U categories". I take this to
>mean that what one is actually supposed to store is the output of
>stringprep, i.e. the normalized string. We aren't doing that.
>
>I think we are falling short of the intent of stringprep.

I had a discussion with Patrik last week regarding the rationale behind
Section 7.  Basically, I understand Patrik to say that they
wanted to preclude implementations from creating (for lack of a
better word) stored lookup keys (indices, hashes of values, etc.)
derived from values containing unassigned code points to prevent
problems upon subsequent assignment of code points and normalization
mappings.

One could argue that Stringprep is delving into implementation
details here.

One could argue that instead of LDAPprep, we should just require
the server to apply NFKC normalization to the assertion and
attribute values and then:
        if NFKC(assert) == NFKC(attribute) return TRUE,
        if NFKC(assert) contains unassigned code points return Undefined,
        if NFKC(attribute) contains unassigned code points return Undefined,
        else return FALSE

and then state various implementation considerations regarding lookup keys
involving unassigned code points, including specifically allowing
(if not recommending or mandating) implementations to reject attribute
values containing unassigned code points and noting that, if they
don't, an upgrade to LDAPprep' will require rebuilding lookup
keys for any value containing a previously unassigned code point.

(Note that I am not necessarily advocating this approach.  Just
food for thought.)
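
The NFKC-only alternative outlined above can be sketched in Python as
follows.  The names Match and nfkc_match are hypothetical, and "unassigned"
is approximated by general category Cn in whatever Unicode version the
running Python bundles (Cn also covers noncharacters, so this is slightly
broader than Stringprep's notion).

```python
import unicodedata
from enum import Enum


class Match(Enum):
    TRUE = "TRUE"
    FALSE = "FALSE"
    UNDEFINED = "Undefined"


def _has_unassigned(s: str) -> bool:
    # Category "Cn" marks code points with no assignment in this database.
    return any(unicodedata.category(ch) == "Cn" for ch in s)


def nfkc_match(assertion: str, attribute: str) -> Match:
    a = unicodedata.normalize("NFKC", assertion)
    b = unicodedata.normalize("NFKC", attribute)
    if a == b:                      # equality is tested first
        return Match.TRUE
    if _has_unassigned(a) or _has_unassigned(b):
        return Match.UNDEFINED      # a later Unicode version might equate them
    return Match.FALSE
```

Because equality is tested before the unassigned check, a value containing
an unassigned code point still matches itself, while a comparison against
any different string yields Undefined rather than FALSE.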

>>That is, LDAPprep can fail.  There is no way to escape
>>this (and still have LDAPprep be a profile of the
>>StringPrep algorithm).
>>As you noted, troublesome characters should not be mapped
>>to nothing as that causes string+garbage to match string.
>>But mapping those troublesome characters to the replacement
>>character makes little sense.  If the replacement character
>>is prohibited, then the mapping has zero impact.  If the
>>replacement character is allowed, then string+garbage
>>would be equal to string+replacement.  I think this
>>is inappropriate as if we say the replacement character
>>is not a troublesome character, then no troublesome
>>character should match it.
>
>If garbage is mapped then it has to be mapped to something.
>The replacement character seems to be the most appropriate
>because it already carries the sense of something that
>cannot be properly represented.

My concern is that if the replacement character is
not garbage then no garbage should match the replacement
character.  If it's allowed, it's not garbage.

I would rather allow garbage than map it to something.
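
The effect of the two mapping choices can be shown with a small Python
sketch.  The helper names are hypothetical, and "garbage" is assumed here to
mean code points in Stringprep Table A.1 (unassigned in Unicode 3.2); the
thread itself does not pin down which characters count as troublesome.

```python
import stringprep

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER (an assigned code point)


def is_garbage(ch: str) -> bool:
    # Assumption: treat Table A.1 (unassigned) code points as garbage.
    return stringprep.in_table_a1(ch)


def map_to_nothing(s: str) -> str:
    """Drop garbage entirely."""
    return "".join(ch for ch in s if not is_garbage(ch))


def map_to_replacement(s: str) -> str:
    """Replace each garbage code point with U+FFFD."""
    return "".join(REPLACEMENT if is_garbage(ch) else ch for ch in s)
```

With map_to_nothing, "string\u0221" prepares to the same value as "string".
With map_to_replacement, "string\u0221" prepares to the same value as
"string\ufffd", and also to the same value as "string\u0378", so unrelated
garbage matches both the replacement character and other garbage.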

>>The issue I was referring with Normalization is really
>>a code point v. character issue, and general assumption
>>that matching (of the prepared values) is done in
>>terms of characters not code points.  Ignore this for now.
>>(Also ignore the possibility that transcoding might fail.)
>>The prohibition on Unassigned code points will cause
>>StringPrep to fail.
>
>For a stored string but not a query string.

but transcoding failure can happen in either string.  This
is another reason why I think LDAPprep failure should lead to
Undefined matching.

>>Now, if you want Rule(X,X) != Undefined, then in [Syntaxes]
>>we could say that if LDAPprep fails for either
>>the assertion value or the attribute value, the matching
>>is False.  I'd have to think about it a bit, but off-hand,
>>I think it might be better for assertions where an implementation
>>is unable to determine the abstract value represented (e.g.,
>>prepare the string) by either the assertion or attribute
>>value strings to return Undefined.
>
>I agree that in the face of failure a matching rule should
>return Undefined. It's more a question of what causes failure.
>I would be less concerned if LDAPprep only failed for characters
>that are never allowed in stored strings anyway, but that currently
>isn't generally the case. For the versioning reasons put forth
>in stringprep we should allow unassigned code points in
>assertion values. If attribute values were prevented from
>containing unassigned code points by a blanket ban then LDAPprep
>could simply ignore them. If they appear then it must be because
>we are preparing an assertion value, and that's okay.
>
>If we know that stored strings never contain prohibited characters
>then we could sensibly map an LDAPprep failure to FALSE.

I contend that it is more sensible for an assertion involving an
Unassigned code point in the assertion value to evaluate to
Undefined.

>> My rationale is that since
>>the implementation cannot determine what the abstract value
>>is, it cannot determine equality of that abstract value
>>to any other abstract value.  This behavior is similar to
>>the behavior when the string does not adhere to the
>>applicable syntax restrictions.  That is, I don't see much
>>difference between (cn=garbage) and (timeStamp=20050231Z).  Both
>>should evaluate to Undefined.
>
>There's an important difference. The time stamp value violates
>the syntax. The matching rule has nothing to do with it. The
>value would still be illegal even if the attribute didn't have
>an equality matching rule. The cn value satisfies the syntax
>(though perhaps it shouldn't) but the equality matching rule
>evaluates to undefined.
>
>Regards,
>Steven
>
>>Kurt