
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2

To: ldapext@ietf.org
Subject: Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
From: Yves Dorfsman <yves@zioup.com>
Date: Tue, 16 Jun 2009 09:40:27 -0600
Delivered-to: ldapext@core3.amsl.com
In-reply-to: <D437E784-4198-4037-A4EA-0300439C3D2C@Isode.com>
References: <49C497F9.7010200@zioup.com> <CD3905D4-2A25-4C56-8187-3CE10D46C929@isode.com> <49C870C6.4010803@zioup.com> <E94B7389-9A6D-4CB6-BB2C-649CCD3FD15B@Isode.com> <49CB192E.5050105@zioup.com> <49CB211C.6070108@eb2bcom.com> <49CB87FE.1050809@zioup.com> <49CC01DE.6040506@eb2bcom.com> <4A24557D.7030006@zioup.com> <4A26A05D.8040105@zioup.com> <245BF18B-2066-4E36-9502-16F4A3140D9E@Isode.com> <4A309775.3080406@zioup.com> <4A311ED1.1030202@stroeder.com> <4A31D27B.3090208@zioup.com> <35B2A165-CE5D-4650-AADE-CC233F71470E@Isode.com> <4A35D23D.5040307@zioup.com> <D437E784-4198-4037-A4EA-0300439C3D2C@Isode.com>
User-agent: Thunderbird 2.0.0.21 (X11/20090409)

Thanks Kurt, and sorry for asking for an answer in another email; it seems we were writing at the same time.



Kurt Zeilenga wrote:

-the directory is broken
-you export to LDIF
-compare this LDIF with a previous one from when the directory was working.
You don't need UTF-8 for this. A simple text diff tool will tell you that the base64 differs.

True, diff will tell you that they are different, and where, but then you need to decode the base64 to find out what the text is in the two files, to help you understand why they are different.

If there were no base64 encoding, then you would know right away, in most cases, making it a much faster process.
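
To make the extra step concrete, here is a minimal sketch of the decoding one has to do before two exports can be compared as text. The function name decode_ldif_line is hypothetical, and the sketch assumes unfolded LDIF lines (it does not handle continuation lines) and treats any value that is not valid UTF-8 as binary and leaves it alone.

```python
import base64


def decode_ldif_line(line: str) -> str:
    """Return the line with a base64 value ("attr:: ...") shown as UTF-8 text.

    Lines that are not base64-encoded, or whose decoded value is not valid
    UTF-8 (for example binary attributes), are returned unchanged.
    Assumption: the line is already unfolded (no LDIF continuation lines).
    """
    attr, sep, value = line.partition(":: ")
    # No "attr:: value" shape, or the "::" sits inside an ordinary value.
    if not sep or ":" in attr:
        return line
    try:
        # binascii.Error and UnicodeDecodeError are both ValueError subclasses.
        text = base64.b64decode(value.strip(), validate=True).decode("utf-8")
    except ValueError:
        return line
    return f"{attr}: {text}"
```

Running every line of both exports through something like this and then diffing the results would show the readable values side by side, at the cost of a conversion pass that plain UTF-8 in LDIF would not need.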

But now you assume you'll be able to read them. This is a bad assumption.

I still don't understand why you are saying this. Can you give precise examples?

A simple diff tool might show two DIFFERENT values the same way, leading the human to believe there is no difference when there is a significant difference.

So, for example, one file contains U+2026 (horizontal ellipsis, "…") while the other contains three U+002E (the full stop character three times, "..."), and more generally the issue of duplicate characters in Unicode (http://en.wikipedia.org/wiki/Duplicate_characters_in_Unicode)?

Well, what do we do today when diff shows us that there is a difference in two ASCII files, but our eyes can't see it? We hex dump the offending line(s), and go "aha, I've got <CR> here, but <CR><LF> there". On the other hand, what percentage of the time do you diff files and run into this problem?

Having those values base64 encoded will make the fact that they are different more obvious, but won't help us when we're trying to understand why they are different; we will still have to go base64 --> UTF-8 --> hex values.
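
As an illustration of that base64 --> UTF-8 --> hex chain, here is a rough sketch that lists the code points behind a base64 value. The helper name code_points is made up for this example, and it assumes the value decodes to valid UTF-8.

```python
import base64
import unicodedata


def code_points(b64_value: str) -> list[str]:
    """List the code points of a base64-encoded UTF-8 value, one per entry.

    Each entry looks like "U+2026 HORIZONTAL ELLIPSIS". Characters without
    a Unicode name (such as control characters) are shown as "<no name>".
    """
    text = base64.b64decode(b64_value).decode("utf-8")
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}" for ch in text]
```

Applied to the ellipsis example, one value comes back as a single U+2026 entry and the other as three U+002E entries, which is the same kind of answer a hex dump gives for <CR> versus <CR><LF> in an ASCII file.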

Data can be different because it comes from different sources, in which case it is likely that people will use different duplicate Unicode code points to express the same thing. But data can also be different because it was changed, which is the case I am more concerned with (backups of LDAP servers, etc.), and in this case I doubt somebody will change the mu symbol to the micro symbol or something similar. It might happen for malicious reasons, but I expect that to be a small percentage.
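
For the rare lookalike case, a tool could flag it rather than rely on the eye. The sketch below is only one possible approach, not something proposed in this thread: it assumes Unicode NFKC normalization is an acceptable test for "same thing, different code points", and the function name classify_difference is hypothetical.

```python
import unicodedata


def classify_difference(old: str, new: str) -> str:
    """Classify two values as identical, lookalike, or genuinely different.

    Assumption: two values that become equal under NFKC normalization are
    treated as the "duplicate character" case (for example MICRO SIGN
    U+00B5 versus GREEK SMALL LETTER MU U+03BC).
    """
    if old == new:
        return "identical"
    if unicodedata.normalize("NFKC", old) == unicodedata.normalize("NFKC", new):
        return "compatibility-equivalent"
    return "different"
```

Both the mu/micro pair and the ellipsis versus three full stops come out as "compatibility-equivalent" here. Lookalikes from different scripts (a Latin and a Cyrillic letter of the same shape, say) are not caught by NFKC, so this covers only part of the problem.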

The key phrase here is "Unicode text". And most such display tools not only require "well-formed" text, but often cannot display all "well-formed" text. But removing the ASCII restriction does not make an LDIF file "Unicode text". It makes it a series of Unicode code points and hence display of it as text will be quite problematic.

I don't understand what you are saying here. Could you give an example of a problematic case, or a link to an explanation?


Why is a series of Unicode code points not text?

What do you mean by "well-formed text", or actually by text that is not well-formed?

And even if it's displayable, you have the problem that two values might display in the same way, making visual diff'ing problematic.


This is the issue above (Unicode duplication), right?


>> Other case: People have mentioned scripts that build LDIF file from
>> other source, and have mentioned that encoding the values in base64 is
>> an overhead they could do without.
>
> While base64 data is an additional step, it's an additional step that
> well supported today.  If we lift the ASCII restriction now, we'll have
> some implementations that do support it and some that don't, and that
> will cause interop problems.   I cannot support inducing such interop
> problems without a strong justification.

Adding a new version of the standard does not remove the old one. You can tell people that you only work with version 1 LDIF files. The advantage of having a standard is that the tools will slowly adopt it and will be able to deal with it. If we don't create a standard, we run into the situation where everybody creates their own format, and their own tools to transform that format into the current LDIF one. If I remember right, Jim mentioned that he already uses his own version of extended LDIF.



--
Yves.
http://www.sollers.ca/