Re: [ldapext] UTF-8 full support in LDIF / LDIF v2

To: Michael Ströder <michael@stroeder.com>
Subject: Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
From: Kurt Zeilenga <Kurt.Zeilenga@Isode.com>
Date: Thu, 18 Jun 2009 07:07:30 -0700
Cc: ldapext@ietf.org
Delivered-to: ldapext@core3.amsl.com
In-reply-to: <4A3A1D81.1000407@stroeder.com>
References: <49C497F9.7010200@zioup.com> <CD3905D4-2A25-4C56-8187-3CE10D46C929@isode.com> <49C870C6.4010803@zioup.com> <E94B7389-9A6D-4CB6-BB2C-649CCD3FD15B@Isode.com> <49CB192E.5050105@zioup.com> <49CB211C.6070108@eb2bcom.com> <49CB87FE.1050809@zioup.com> <49CC01DE.6040506@eb2bcom.com> <4A24557D.7030006@zioup.com> <4A26A05D.8040105@zioup.com> <245BF18B-2066-4E36-9502-16F4A3140D9E@Isode.com> <4A309775.3080406@zioup.com> <4A311ED1.1030202@stroeder.com> <4A31D27B.3090208@zioup.com> <35B2A165-CE5D-4650-AADE-CC233F71470E@Isode.com> <4A35D23D.5040307@zioup.com> <D437E784-4198-4037-A4EA-0300439C3D2C@Isode.com> <4A37BCEB.5040103@zioup.com> <68380E97-521C-4A80-A569-D09F8F626F6F@Isode.com> <4A3A1D81.1000407@stroeder.com>


On Jun 18, 2009, at 3:57 AM, Michael Ströder wrote:

Kurt Zeilenga wrote:
IDNA went through all of this.  They found that they had to place
significant restrictions on Unicode domain components to ensure that a
domain name was well-formed Unicode text.
I'd like to learn more about the term "well-formed Unicode text". Do you
have a reference at hand? [NAMEPREP] and/or [STRINGPREP]?

Unfortunately, I don't have a good reference handy. The Unicode community might actually use different terms. I tried to loosely define the terms in a prior email. (I'm not a Unicode expert, just someone who's been involved in Unicode issues (such as with IDNA) for a number of years.)


To rephrase MY definitions:

"text" merely implies that the sequence of Unicode code points represents a character. In my LDIF example, there is a colon followed by a combining code point. This is an example of a sequence which doesn't represent "text".

"well-formed text" implies that not only is the sequence "text" but that various other rules are met. For instance, the sequence will result in proper directional display of bidirectional text. There are some examples in the LDIF which show that the introduction of line wrapping can break the directional display of the value.

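The two definitions above can be probed mechanically, at least roughly. The following is a minimal sketch, not taken from the thread: the function names, the sample values, and the choice to count fold width in code points rather than bytes are all assumptions made for illustration. The first check flags a combining mark directly after the colon (a sequence that is not "text"). The second reports where LDIF-style folding would fall between two strong right-to-left characters, which is one place where wrapping could plausibly disturb directional display; it does not implement the Unicode bidirectional algorithm.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left bidi classes


def combining_after_colon(line: str) -> bool:
    """True if a combining mark (category M*) directly follows the first colon."""
    _, sep, rest = line.partition(":")
    return bool(sep) and bool(rest) and unicodedata.category(rest[0]).startswith("M")


def fold(line: str, width: int) -> list[str]:
    """Fold a line LDIF-style: each continuation line starts with one space.

    Simplification (assumption): width is counted in code points, not bytes.
    """
    if width < 2:
        raise ValueError("width must be at least 2")
    chunks = [line[:width]]
    rest = line[width:]
    while rest:
        chunks.append(" " + rest[: width - 1])
        rest = rest[width - 1:]
    return chunks


def folds_inside_rtl_run(line: str, width: int) -> list[int]:
    """Indexes in `line` where fold() would separate two strong RTL characters."""
    if width < 2:
        raise ValueError("width must be at least 2")
    positions = []
    pos = width
    while pos < len(line):
        before = unicodedata.bidirectional(line[pos - 1])
        after = unicodedata.bidirectional(line[pos])
        if before in RTL_CLASSES and after in RTL_CLASSES:
            positions.append(pos)
        pos += width - 1
    return positions


# Hypothetical sample values, not the examples from the earlier email.
bad_line = "cn:\u0301abc"                                  # colon, then COMBINING ACUTE ACCENT
hebrew_line = "cn: \u05d0\u05d1\u05d2\u05d3\u05d4\u05d5"   # six Hebrew letters

print(combining_after_colon(bad_line))
print(fold(hebrew_line, 6))
print(folds_inside_rtl_run(hebrew_line, 6))
```

Unfolding (dropping the leading space of each continuation line and concatenating) restores the original code point sequence, so the value itself survives; what the sketch points at is only how the folded file might look to a person reading it.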

I found

http://www.unicode.org/versions/Unicode5.1.0/#Conformance_Changes

which contains a replacement for the text in the Unicode 5.0 standard.
(Strange that one cannot simply download the recent version.)

You can download each chapter of the current version (each has a front page detailing copying restrictions, etc.).

You have not suggested placing similar restrictions on LDIF but simply
removing the ASCII restriction.
Would it help to define similar restrictions?

First, I don't see how any of this helps in LDAP data interchange, the primary purpose of LDIF.

Second, if one were to say that the resulting file has to be Net-Unicode (which I think at least means the file is "text"), you run into "data loss" problems due to unintended transformations.
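
One plausible reading of the "unintended transformations" concern: Net-Unicode (RFC 5198) asks that text be normalized to NFC before transmission. The sketch below, using a made-up attribute value, shows that NFC normalization can change the stored code points, so a value that passes through a normalizing step no longer matches the original octets. This illustrates the general hazard only and is not a claim about any particular LDIF tool.

```python
import unicodedata


def nfc_changes_value(value: str) -> bool:
    """True if NFC normalization would alter the code point sequence."""
    return unicodedata.normalize("NFC", value) != value


# Hypothetical value: "Rene" followed by COMBINING ACUTE ACCENT (decomposed form).
original = "Rene\u0301"
normalized = unicodedata.normalize("NFC", original)

print(nfc_changes_value(original))     # the decomposed form is not NFC
print(original.encode("utf-8"))        # octets as stored in the directory
print(normalized.encode("utf-8"))      # octets after normalization
```

Once only the normalized form is on hand, there is no way to tell whether the directory held the composed or the decomposed form, which is why normalizing on export is lossy for exact data interchange.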

Stepping back a bit from the details of the interesting Unicode issues
posted here I wonder what the general strategy of the IETF regarding
these issues is?


Punt.

I remember discussions on the ietf-pkix mailing list
mentioning problems like these (e.g. when displaying subject names of
X.509 certs) without any real solution.

I think any system which takes (user) input, decodes it to a Unicode
code point sequence and displays it to the user is affected by issues
with BIDI, combining characters and duplicate Unicode points.

Yes. The IETF tends to punt such issues to the user interface development community. The IETF tends to restrict itself to the design of protocols, not the design of user interfaces (though the IETF does try to document user interface issues, especially those with security impact).

I think of LDIF as an alternative encoding of protocol data units, used for out-of-band transfer of data between protocol peers. That is, I punt the "user" as far as I can.

Others see LDIF as a user display format and user input format for LDAP data. I argue that LDIFv1 didn't handle this well for ASCII and that handling this for ASCII (without data loss) is hard. Solving it for Unicode, well, that's very, very hard.


-- Kurt