
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2

To: Yves Dorfsman <yves@zioup.com>
Subject: Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
From: Kurt Zeilenga <Kurt.Zeilenga@Isode.com>
Date: Tue, 16 Jun 2009 06:46:01 -0700
Cc: ldapext@ietf.org
Delivered-to: ldapext@core3.amsl.com
In-reply-to: <4A35D23D.5040307@zioup.com>
References: <49C497F9.7010200@zioup.com> <CD3905D4-2A25-4C56-8187-3CE10D46C929@isode.com> <49C870C6.4010803@zioup.com> <E94B7389-9A6D-4CB6-BB2C-649CCD3FD15B@Isode.com> <49CB192E.5050105@zioup.com> <49CB211C.6070108@eb2bcom.com> <49CB87FE.1050809@zioup.com> <49CC01DE.6040506@eb2bcom.com> <4A24557D.7030006@zioup.com> <4A26A05D.8040105@zioup.com> <245BF18B-2066-4E36-9502-16F4A3140D9E@Isode.com> <4A309775.3080406@zioup.com> <4A311ED1.1030202@stroeder.com> <4A31D27B.3090208@zioup.com> <35B2A165-CE5D-4650-AADE-CC233F71470E@Isode.com> <4A35D23D.5040307@zioup.com>


On Jun 14, 2009, at 9:46 PM, Yves Dorfsman wrote:

Kurt Zeilenga wrote:
I think we need clear problem statements, a proposal for addressing each problem, a summary of why the proposal does address the problem, and a statement of what (known) problems the proposal might introduce.
You have noted the "diffing problem". But here it's not clear whether a) you wish to determine how two LDIF files differ, b) you wish to determine if two LDIF files represent the same LDAP requests, or c) you wish to determine whether the directory information represented by the LDAP requests in the LDIF differs and, if so, how.
For a), one can use file comparison tools to determine how two LDIF files differ.
Yes, but because it displays unreadable characters, it makes it slightly more complicated. The better case (than simply diffing) I have given in the past is:
- the directory is broken
- you export to LDIF
- compare this LDIF with a previous one from when the directory was working.

You don't need UTF-8 for this. A simple text diff tool will tell you that the base64 differs.
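As a rough illustration of this point, the sketch below diffs two tiny LDIF exports with Python's standard difflib. The entry, the attribute values, the file labels, and the helper name ldif_line are all hypothetical, and every value is base64-encoded unconditionally to keep the example short. The diff flags the changed line without anything having to decode or display the value.

```python
import base64
import difflib


def ldif_line(attr, value):
    """Hypothetical helper: emit 'attr:: <base64 of the UTF-8 value>'."""
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"{attr}:: {encoded}"


# Two made-up exports of the same entry: one from when the directory
# worked, one from after it broke.
working = ["dn: cn=test,dc=example,dc=com", ldif_line("sn", "Zo\u00eb")]
broken = ["dn: cn=test,dc=example,dc=com", ldif_line("sn", "Zoe")]

# A plain line-based diff is enough to see that the sn value changed.
diff = list(
    difflib.unified_diff(
        working, broken,
        fromfile="working.ldif", tofile="broken.ldif",
        lineterm="",
    )
)
for line in diff:
    print(line)
```

The diff says that the values differ, not what they are; reading them still takes a base64 decoding step, which is the inconvenience being debated in this thread.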

I personally find that in such a case, being able to read the values makes it simpler and faster.

But now you assume you'll be able to read them. This is a bad assumption. A simple diff tool might show two DIFFERENT values the same way, leading the human to believe there is no difference when there is a significant difference. And then there is the issue that a UTF-8 encoded Unicode file is not well-formed text, and trying to treat it as text will be quite problematic. (see below)
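One common way two different values can look the same is Unicode normalization. The sketch below is only an illustration under that assumption (the message does not name a specific cause): it builds the same visible word in composed (NFC) and decomposed (NFD) form using Python's standard unicodedata module. The sample word is made up.

```python
import base64
import unicodedata

# The same visible word, stored two different ways.
composed = unicodedata.normalize("NFC", "caf\u00e9")    # e-acute as one code point
decomposed = unicodedata.normalize("NFD", "caf\u00e9")  # 'e' followed by U+0301

# Most terminals and editors render these identically.
print(composed, decomposed)

# The code point sequences, and therefore the stored values, differ.
print(composed == decomposed)
print([hex(ord(c)) for c in composed])
print([hex(ord(c)) for c in decomposed])

# The base64 forms differ as well, so a plain text diff on base64-encoded
# LDIF does flag the difference even though a visual check would miss it.
b64_composed = base64.b64encode(composed.encode("utf-8")).decode("ascii")
b64_decomposed = base64.b64encode(decomposed.encode("utf-8")).decode("ascii")
print(b64_composed)
print(b64_decomposed)
```

Whether a directory server treats the two forms as equal depends on the matching rules for the attribute, so a difference like this may or may not be significant in a given deployment.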

Other case: People have mentioned scripts that build LDIF files from other sources, and have mentioned that encoding the values in base64 is an overhead they could do without.

While base64 encoding is an additional step, it's an additional step that is well supported today. If we lift the ASCII restriction now, we'll have some implementations that do support it and some that don't, and that will cause interop problems. I cannot support inducing such interop problems without a strong justification.
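For a sense of how small that additional step is in a script that generates LDIF, here is a minimal sketch. The function name ldif_attrval is hypothetical, and the safety check only approximates the SAFE-STRING rules of RFC 2849 as I understand them (ASCII only; no NUL, LF, or CR; no leading space, colon, or less-than; no trailing space). Line folding and binary attribute values are deliberately left out.

```python
import base64


def ldif_attrval(attr: str, value: str) -> str:
    """Return one LDIF attribute line, base64-encoding the value when needed.

    Hypothetical helper; an approximation of the RFC 2849 rules, not a
    complete LDIF writer.
    """
    raw = value.encode("utf-8")
    if not raw:
        return f"{attr}:"
    safe = (
        all(b < 128 and b not in (0, 10, 13) for b in raw)  # ASCII, no NUL/LF/CR
        and raw[0] not in (ord(" "), ord(":"), ord("<"))    # safe first character
        and not raw.endswith(b" ")                          # no trailing space
    )
    if safe:
        return f"{attr}: {value}"
    return f"{attr}:: {base64.b64encode(raw).decode('ascii')}"


print(ldif_attrval("cn", "Kurt"))
print(ldif_attrval("cn", "Zo\u00eb"))
print(ldif_attrval("description", ":starts with a colon"))
```

Values that are already plain ASCII stay readable, and only the rest are encoded, which matches what existing LDIF consumers already accept.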

It may be you are thinking that a human would be better able to visually detect certain kinds of differences. However, this assumes that removing the ASCII restriction would produce readily displayable Unicode text.
On a modern OS set up properly, Unicode text is displayed properly (my experience is with UTF-8 on Linux and Solaris here).

The key phrase here is "Unicode text". And most such display tools not only require "well-formed" text, but often cannot display all "well-formed" text. But removing the ASCII restriction does not make an LDIF file "Unicode text". It makes it a series of Unicode code points, and hence display of it as text will be quite problematic. And even if it's displayable, you have the problem that two values might display in the same way, making visual diff'ing problematic.


-- Kurt
_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext
