Re: [ldapext] UTF-8 full support in LDIF / LDIF v2

To: Yves Dorfsman <yves@zioup.com>
Subject: Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
From: Kurt Zeilenga <Kurt.Zeilenga@Isode.com>
Date: Tue, 16 Jun 2009 12:01:41 -0700
Cc: ldapext@ietf.org
Delivered-to: ldapext@core3.amsl.com
In-reply-to: <68380E97-521C-4A80-A569-D09F8F626F6F@Isode.com>
References: <49C497F9.7010200@zioup.com> <CD3905D4-2A25-4C56-8187-3CE10D46C929@isode.com> <49C870C6.4010803@zioup.com> <E94B7389-9A6D-4CB6-BB2C-649CCD3FD15B@Isode.com> <49CB192E.5050105@zioup.com> <49CB211C.6070108@eb2bcom.com> <49CB87FE.1050809@zioup.com> <49CC01DE.6040506@eb2bcom.com> <4A24557D.7030006@zioup.com> <4A26A05D.8040105@zioup.com> <245BF18B-2066-4E36-9502-16F4A3140D9E@Isode.com> <4A309775.3080406@zioup.com> <4A311ED1.1030202@stroeder.com> <4A31D27B.3090208@zioup.com> <35B2A165-CE5D-4650-AADE-CC233F71470E@Isode.com> <4A35D23D.5040307@zioup.com> <D437E784-4198-4037-A4EA-0300439C3D2C@Isode.com> <4A37BCEB.5040103@zioup.com> <68380E97-521C-4A80-A569-D09F8F626F6F@Isode.com>

Another issue...

Though LDIFv1 doesn't actually allow multi-byte UTF-8 anywhere (except
in base64 encoded strings), RFC 2849 did say:

    Implementations SHOULD NOT fold lines in the middle of a multi-byte
    UTF-8 character.

(This was a leftover, I think, from when non-base64 encodings of UTF-8
were being considered for LDIFv1. That is, the IETF seems to have
already considered this issue when it was developing LDIFv1 but chose
to require base64 of multi-byte UTF-8 characters.)

Anyways, if we remove the ASCII restriction and leave the above
SHOULD, that would mean that implementations would have to deal with
folding in the middle of multi-byte UTF-8 sequences, and that means
the file as a whole might not be valid UTF-8, and that would be
problematic.

If we do remove the ASCII restriction (which I don't support), I think
we should install a couple of restrictions.

1) The resulting file MUST only be valid UTF-8 (minimal-length
encodings, a start octet must be followed by continuation octets,
etc.) AND each sequence MUST encode a valid Unicode code point (U+0000
through U+D7FF, U+E000 through U+10FFFF, inclusively) excluding the
NUL (U+0000) code point.

2) Lines longer than 76 octets MUST be folded. Folding of lines MUST
NOT occur in the middle of a multi-byte UTF-8 character.

Implementations need to take care that both of these requirements are
met.

Both of these requirements will significantly increase the complexity
of LDIF encoders. Though the first is similar to the current
requirement to ensure that values encoded as SAFE-STRING meet all of
the SAFE-STRING requirements, those checks are octet by octet.
Checking for valid UTF-8 and Unicode requires both an expansion of
each UTF-8 sequence and a check of the code point represented by that
UTF-8 sequence. And the second requires a bit more math, something
experience has shown implementors tend to get wrong.

We'd also need text explaining that the format does not ensure the
UTF-8 represents well-formed text, or even text, and hence may not be
displayable or editable. And even if it is displayable, users must be
careful not to rely on visual inspection.

Also, users must be careful to avoid passing LDIF data through systems
that might apply Unicode normalization or other transformations. Such
systems will likely lead to unintended changes to the LDAP data
represented by the original LDIF (and/or cause the data to be
non-conforming LDIF). For instance, when emailing an LDIF file, both
the sending and receiving user must be careful that their MUAs (and
MTAs) are not altering the LDIF data. Unfortunately, many MUAs and
MTAs do alter text attachments (to conform to various conventions).
While such alterations may be desirable for text attachments, they are
not desirable for LDIF data (which is commonly sent as text
attachments).

By raising these points, I hope to show that simply removing the ASCII
restriction will lead to problems, and even though some of them (such
as invalid UTF-8) can be addressed by installing various restrictions,
the simple fact that LDIF represents LDAP data, not text, will lead to
various problems (such as unintended text file conversions, inability
to use text processing programs, etc.).
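
To illustrate what the two proposed restrictions would ask of an
encoder, here is a minimal sketch in Python. It is not taken from
RFC 2849 or from any LDIF library: the names check_utf8, fold_line and
unfold are hypothetical, and it assumes that the 76-octet limit counts
the single leading space of a continuation line. The well-formedness
check leans on Python's strict UTF-8 decoder rather than inspecting
each sequence by hand.

```python
from typing import List

MAX_OCTETS = 76  # limit from proposed restriction 2 (assumed to include the fold space)


def check_utf8(data: bytes) -> str:
    """Decode `data`, raising ValueError if restriction 1 is violated."""
    # The strict decoder rejects non-minimal (overlong) encodings, stray or
    # missing continuation octets, surrogates (U+D800..U+DFFF) and anything
    # above U+10FFFF. UnicodeDecodeError is a subclass of ValueError.
    text = data.decode("utf-8", errors="strict")
    if "\x00" in text:
        raise ValueError("NUL (U+0000) is not allowed")
    return text


def fold_line(line: bytes, limit: int = MAX_OCTETS) -> List[bytes]:
    """Split one logical line into physical lines of at most `limit` octets."""
    if limit < 5:
        raise ValueError("limit too small to hold a 4-octet character plus a space")
    check_utf8(line)
    pieces = []
    start = 0
    room = limit  # the first physical line has no leading space
    while len(line) - start > room:
        end = start + room
        # Back up while `end` points at a continuation octet (0b10xxxxxx),
        # so the cut never lands inside a multi-byte character.
        while (line[end] & 0xC0) == 0x80:
            end -= 1
        pieces.append(line[start:end])
        start = end
        room = limit - 1  # continuation lines spend one octet on the space
    pieces.append(line[start:])
    return [pieces[0]] + [b" " + p for p in pieces[1:]]


def unfold(lines: List[bytes]) -> bytes:
    """Reverse fold_line by dropping the leading space of each continuation."""
    return lines[0] + b"".join(p[1:] for p in lines[1:])


if __name__ == "__main__":
    logical = ("cn: x" + "é" * 100).encode("utf-8")
    for physical in fold_line(logical):
        print(len(physical), physical.decode("utf-8"))
```

The "bit more math" is visible in the two places where the available
room changes: the first physical line may use the full limit, every
later one has one octet less, and each cut may have to move back by up
to three octets to reach a character boundary.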

-- Kurt