
Re: Empty IA5String




On 10-Nov-04, at 6:30 PM, Steven Legg wrote:

Component matching allows any string matching rule to be applied to any
string component of any ASN.1 type. There is nothing to prevent those
string components having zero length. We might be able to force the
IA5 String syntax to have at least one character but we can't do the
same for every string component of every ASN.1 type.

Very true. This is among the reasons I consider the generalised use of stringprep to be unfortunate. Of course, one could define string matching rules which are just like the current ones but which do not use stringprep, or which use another profile. In fact, I believe that will be necessary given the concerns I outlined a couple of days ago.

On a related note, it is not clear that the non-significant characters
being deleted by stringprep are all that non-significant.

I don't know whether or not this is a problem, but: it is legal to follow
a character of class C with a combining character (class M). For
example, the following is a legal Unicode sequence: U+0041 U+200B U+0301
(CAPITAL A, ZERO WIDTH SPACE, COMBINING ACUTE ACCENT)


The U+0301 is a "defective combining sequence" (which is specifically legal,
despite the use of the word defective). Simply deleting the U+200B will
result in a non-canonical string; map-to-nothing therefore has to happen
before normalization (which it does, so that's ok). This turns the above
sequence into U+00C1 (CAPITAL A WITH ACUTE).
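
To make the ordering concrete, here is a minimal sketch in Python, using the standard stringprep and unicodedata modules. The helper names map_to_nothing and prep are made up for illustration, and this covers only the map and normalize steps, not a full stringprep profile. Note that Python's unicodedata uses a newer Unicode database than the one stringprep is specified against, which should not matter for these three code points.

```python
import stringprep
import unicodedata

def map_to_nothing(text):
    # Drop every code point listed in RFC 3454 table B.1
    # ("commonly mapped to nothing"); U+200B is in that table.
    return "".join(ch for ch in text if not stringprep.in_table_b1(ch))

def prep(text):
    # Hypothetical two-step pipeline: map first, then normalize.
    return unicodedata.normalize("NFKC", map_to_nothing(text))

original = "\u0041\u200b\u0301"  # CAPITAL A, ZERO WIDTH SPACE, COMBINING ACUTE

# Mapping before normalizing lets the A and the accent compose.
mapped_first = prep(original)

# Normalizing alone leaves the sequence as it was: the ZERO WIDTH SPACE
# sits between the A and the accent and prevents composition.
normalized_only = unicodedata.normalize("NFKC", original)

print([hex(ord(ch)) for ch in mapped_first])
print([hex(ord(ch)) for ch in normalized_only])
```

With the map step first, the result is the single code point U+00C1; without it, NFKC leaves all three code points in place.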


Arguably, the original input was "bogus", but it is, I believe,
typographically meaningful, since the ZWS is allowed to expand
during justification, and is additionally a word-break character.
Unicode has some odd corners.

The insignificant character deletion which occurs in LDAPprep is
post-normalization, and therefore could theoretically create
non-NFKC normalised strings, if for example a telephone number
were allowed to contain non-IA5 characters. (Is there a reason
why NFKC was chosen rather than NFKD, by the way? I would have
thought that NFKD would be computationally simpler -- even though
the normalised strings are larger -- and I don't see how there
could be any difference in the equivalence classes generated.)
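
The post-normalization case can be sketched the same way. The following assumes, purely for illustration, that space and hyphen are the characters treated as insignificant (as in telephone number matching) and that a value is allowed to carry a combining character; INSIGNIFICANT, delete_insignificant and is_nfkc are hypothetical names, not part of any specification.

```python
import unicodedata

# Assumption: the characters deleted as insignificant after normalization.
INSIGNIFICANT = {" ", "-"}

def delete_insignificant(text):
    # Remove the insignificant characters without renormalizing.
    return "".join(ch for ch in text if ch not in INSIGNIFICANT)

def is_nfkc(text):
    # A string is NFKC-normalized if normalizing it changes nothing.
    return unicodedata.normalize("NFKC", text) == text

value = "A-\u0301"  # CAPITAL A, HYPHEN-MINUS, COMBINING ACUTE ACCENT

normalized = unicodedata.normalize("NFKC", value)  # unchanged by NFKC
stripped = delete_insignificant(normalized)        # A followed by U+0301

print(is_nfkc(normalized))  # the input to the deletion step is NFKC
print(is_nfkc(stripped))    # the output of the deletion step is not
```

Here the hyphen is what keeps the A and the accent apart, so deleting it after normalization leaves a string that a second NFKC pass would change to U+00C1.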