Re: Empty IA5String

To: Steven Legg <steven.legg@eb2bcom.com>
Subject: Re: Empty IA5String
From: Rici Lake <rici@ricilake.net>
Date: Wed, 10 Nov 2004 20:06:45 -0500
Cc: ietf-ldapbis@OpenLDAP.org
In-reply-to: <4192A492.50202@eb2bcom.com>
References: <HBF.20041107p8mt@bombur.uio.no> <418FF619.8040508@eb2bcom.com> <HBF.20041109m19a@bombur.uio.no> <4191602C.3060702@eb2bcom.com> <6.1.2.0.0.20041110112437.02d995d0@127.0.0.1> <HBF.20041110v3gs@bombur.uio.no> <6.1.2.0.0.20041110124324.0302e570@127.0.0.1> <4192A492.50202@eb2bcom.com>


On 10-Nov-04, at 6:30 PM, Steven Legg wrote:

> Component matching allows any string matching rule to be applied to any
> string component of any ASN.1 type. There is nothing to prevent those
> string components having zero length. We might be able to force the
> IA5 String syntax to have at least one character but we can't do the
> same for every string component of every ASN.1 type.


Very true. This is among the reasons I consider the generalised
use of stringprep to be unfortunate. Of course, one could define
string matching rules which are just like the current ones but do
not use stringprep, or use another profile. In fact, I believe that
will be necessary given the concerns I outlined a couple of days ago.

On a related note, it is not clear that the non-significant characters
being deleted by stringprep are all that non-significant.

I don't know whether or not this is a problem, but: it is legal to follow a character of class C with a combining character (class M). For example, the following is a legal Unicode sequence: U+0041 U+200B U+0301 (CAPITAL A, ZERO WIDTH SPACE, COMBINING ACUTE ACCENT)

The U+0301 is a "defective combining sequence" (which is specifically legal, despite the use of the word defective). Simply deleting the U+200B will result in a non-canonical string (U+0041 U+0301); map-to-nothing therefore has to happen before normalization (which it does, so that's OK). This turns the above sequence into U+00C1 (CAPITAL A WITH ACUTE).
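
To make the ordering issue concrete, here is a minimal sketch in Python using the standard unicodedata module. It is not the stringprep or LDAPprep algorithm; the two helper functions are hypothetical names, and the only mapping modelled is the deletion of U+200B.

```python
import unicodedata

ZWSP = "\u200b"                      # ZERO WIDTH SPACE
original = "A" + ZWSP + "\u0301"     # U+0041 U+200B U+0301

def map_then_normalize(s):
    # Hypothetical helper: delete U+200B first, then apply NFKC.
    return unicodedata.normalize("NFKC", s.replace(ZWSP, ""))

def normalize_then_map(s):
    # Hypothetical helper: apply NFKC first, then delete U+200B.
    return unicodedata.normalize("NFKC", s).replace(ZWSP, "")

early = map_then_normalize(original)
late = normalize_then_map(original)

print([hex(ord(c)) for c in early])  # code points when deletion comes first
print([hex(ord(c)) for c in late])   # code points when deletion comes last

# True only if the late-deletion result is still in NFKC form.
print(unicodedata.normalize("NFKC", late) == late)
```

Deleting first lets the A and the accent compose; deleting afterwards leaves the two code points side by side, which is no longer a normalised string.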

Arguably, the original input was "bogus", but it is, I believe,
typographically meaningful, since the ZWS is allowed to expand
during justification, and is additionally a word-break character.
Unicode has some odd corners.

The insignificant character deletion which occurs in LDAPprep is
post-normalization, and therefore could theoretically create
non-NFKC normalised strings, if for example a telephone number
were allowed to contain non-IA5 characters. (Is there a reason
why NFKC was chosen rather than NFKD, by the way? I would have
thought that NFKD would be computationally simpler -- even though
the normalised strings are larger -- and I don't see how there
could be any difference in the equivalence classes generated.)
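
As a spot check on the NFKC/NFKD question, the following Python sketch compares the two forms on a handful of strings. The sample list and the same_class helper are my own choices for illustration; this demonstrates agreement on these samples only and proves nothing in general.

```python
import unicodedata
from itertools import combinations

# Arbitrary samples: precomposed and decomposed letters, a ligature,
# and the ANGSTROM SIGN, which has a singleton canonical decomposition.
samples = ["\u00c1", "A\u0301", "\ufb01", "fi",
           "\u212b", "\u00c5", "A\u030a", "A"]

def same_class(form, a, b):
    # Two strings are equivalent under a form if they normalise identically.
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Pairs on which the two forms disagree about equivalence.
disagreements = [
    (a, b) for a, b in combinations(samples, 2)
    if same_class("NFKC", a, b) != same_class("NFKD", a, b)
]

# Samples whose NFKD form is longer than their NFKC form.
nfkd_longer = [
    s for s in samples
    if len(unicodedata.normalize("NFKD", s)) > len(unicodedata.normalize("NFKC", s))
]

print(disagreements)
print(len(nfkd_longer), "of", len(samples), "samples grow under NFKD")
```

The list of disagreements comes out empty for these samples, while several of the NFKD results are longer, which matches the size trade-off mentioned above.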

