
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2

To: Yves Dorfsman <yves@zioup.com>
Subject: Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
From: Kurt Zeilenga <Kurt.Zeilenga@Isode.com>
Date: Tue, 16 Jun 2009 08:35:28 -0700
Cc: ldapext@ietf.org
Delivered-to: ldapext@core3.amsl.com
In-reply-to: <4A37A5D9.4040901@zioup.com>
References: <49C497F9.7010200@zioup.com> <CD3905D4-2A25-4C56-8187-3CE10D46C929@isode.com> <49C870C6.4010803@zioup.com> <E94B7389-9A6D-4CB6-BB2C-649CCD3FD15B@Isode.com> <49CB192E.5050105@zioup.com> <49CB211C.6070108@eb2bcom.com> <49CB87FE.1050809@zioup.com> <49CC01DE.6040506@eb2bcom.com> <4A24557D.7030006@zioup.com> <4A26A05D.8040105@zioup.com> <245BF18B-2066-4E36-9502-16F4A3140D9E@Isode.com> <4A309775.3080406@zioup.com> <4A311ED1.1030202@stroeder.com> <4A31D27B.3090208@zioup.com> <4A325A40.2050802@stroeder.com> <4A35CDDE.8000604@zioup.com> <4A37719E.3010006@stroeder.com> <4A37A5D9.4040901@zioup.com>


On Jun 16, 2009, at 7:02 AM, Yves Dorfsman wrote:

Michael Ströder wrote:
but was under the impression
you saw value somewhere else.
I see some value. As I said, my web2ldap accepts single-line UTF-8 input
in an LDIF editor multi-line input field but does not display UTF-8.
But if we run into more complicated issues like multi-line UTF-8 data,
I'm against opening a can of worms with a here-document scheme or similar
complicated things.
So being able to read/write directly in UTF-8 in the LDIF file will eliminate one step in the process? Can you give more details?
Listing the cases we have so far (please argue against a case if you think it is not valid):
-comparing at the file level by humans, to fix an issue for example;

Again, two different UTF-8 strings may display the same, meaning humans will NOT be able to visually detect the difference and hence won't be able to fix it.

this implies that the user is using a display properly set up for UTF-8 (Yves Dorfsman).

Again, Unicode display programs are typically designed to display well-formed text, not arbitrary sequences of Unicode code points.
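
The point about visually identical strings can be checked mechanically. The following is a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the helper names `code_points` and `same_after_nfc` are hypothetical and introduced only for illustration. It compares a precomposed and a decomposed "o with diaeresis", which typically render the same but are different code point sequences and different UTF-8 byte strings.

```python
import unicodedata

precomposed = "\u00f6"   # LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # LATIN SMALL LETTER O + COMBINING DIAERESIS


def code_points(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def same_after_nfc(a, b):
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two strings are not equal, although most displays draw them alike.
print(precomposed == decomposed)
print(code_points(precomposed), "|", code_points(decomposed))
print(precomposed.encode("utf-8").hex(), "|", decomposed.encode("utf-8").hex())

# Normalization reconciles this pair, but not look-alikes such as
# HYPHEN-MINUS (U+002D) and HYPHEN (U+2010).
print(same_after_nfc(precomposed, decomposed))
print(same_after_nfc("-", "\u2010"))
```

Normalization is applied here only to show that the two spellings are canonically equivalent; normalizing before exchange would itself change the code points of the value.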

-simplifying scripts that import/export data between LDIF and other formats (Steven Legg).

As I noted before, Steve's suggestion is problematic in that it doesn't preserve Unicode code points in the exchange of LDAP values.

-simplifying online editors (Michael Ströder).

I seriously doubt that removing the ASCII restriction actually simplifies anything. Removing the restriction adds another encoding option, which adds complexity and adds interop problems.
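
For context, the encoding choice that LDIF version 1 (RFC 2849) already requires of a writer can be sketched as follows: a value that fits the SAFE-STRING production is written as-is after a single colon, and anything else is base64-encoded after a double colon. This is a minimal sketch, assuming Python 3; the function names `is_safe_string` and `ldif_line` are hypothetical, and line folding is deliberately left out.

```python
import base64


def is_safe_string(value: bytes) -> bool:
    """True if value can be written unencoded under RFC 2849.

    SAFE-STRING excludes NUL, LF, CR and all bytes above 0x7F; the first
    byte must additionally not be SPACE, ':' or '<'. Values ending in
    SPACE are also treated as unsafe, as the RFC recommends.
    """
    if not value:
        return True
    if value[0] in b" :<":
        return False
    if value.endswith(b" "):
        return False
    return all(b < 0x80 and b not in (0x00, 0x0A, 0x0D) for b in value)


def ldif_line(attr: str, value: str) -> str:
    """Format one attribute line, base64-encoding the value when needed."""
    raw = value.encode("utf-8")
    if is_safe_string(raw):
        return f"{attr}: {raw.decode('ascii')}"
    return f"{attr}:: {base64.b64encode(raw).decode('ascii')}"


print(ldif_line("cn", "funky"))
print(ldif_line("o-diaeresis", "\u00f6"))
print(ldif_line("o-diaeresis-decomposed", "o\u0308"))
```

With this scheme the precomposed and decomposed values produce visibly different base64 strings, so the code points survive exchange unambiguously. A raw UTF-8 option would be a third branch in `ldif_line`, and a third form every reader would have to accept.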

Steven:
Can you confirm this is what you were saying, or otherwise reword it?


Kurt:
My case was case a) in your email. In my last email I argued that:
1) that case has value
2) your argument against it, not being able to display UTF-8 properly, is solved in a modern environment setup for UTF-8

I think your assumptions about "modern environment setup for UTF-8" are flawed. Good UTF-8 display programs (that display UTF-8 as text) generally expect the UTF-8 to be well-formed text. If the input to these programs is not well-formed text, the display will not be well formed (though it may not be obvious to the viewer that it's not well-formed text). And there are lots of "bad" UTF-8 display programs in wide use, ones which cannot handle even all well-formed text (like bidirectional text). And almost every display program suffers from incomplete fonts. And then there are private use code points, and various other things that even the best UTF-8 display programs have problems with.

Here's a contrived file that illustrates just some of the display problems removing the ASCII restriction will lead to. (Apologies if my MUA associates the wrong MIME information with this, but that just illustrates yet another problem folks will face (wrong MIME information) when exchanging such files.) BTW, none of my well set up Unicode "text" display programs handled this file well... because it's simply not well-formed text.

version: 2

dn: cn=funky
bom:﻿
smiley-face:☺
# only SPACE is special
no-break-space: 
zero-width-space:​
word-joiner:⁠
ideographic-space:　
zero-width-no-break-space:﻿
# these hyphens differ but may look the same
hyphen-minus:-
hyphen:‐
non-breaking-hyphen:‑
figure-dash:‒
en-dash:–
minus-sign:−
roman-uncia-sign:𐆑
# these differ but may look the same
o-diaeresis:ö
o-diaeresis-decomposed:ö
# combining character
diaeresis:̈
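
Because several of the values in the file above are invisible or indistinguishable on screen, it can help to rebuild the file from explicit code points and list what each value contains. The following is a minimal sketch, assuming Python 3 and its standard `unicodedata` module. The mapping from attribute names to code points is an assumption inferred from the attribute names, and `ENTRIES`, `describe`, and `build_ldif` are hypothetical names used only for illustration. The comment lines of the original file are omitted.

```python
import unicodedata

# Attribute name -> intended value, inferred from the attribute names.
ENTRIES = [
    ("bom", "\ufeff"),
    ("smiley-face", "\u263a"),
    ("no-break-space", "\u00a0"),
    ("zero-width-space", "\u200b"),
    ("word-joiner", "\u2060"),
    ("ideographic-space", "\u3000"),
    ("zero-width-no-break-space", "\ufeff"),
    ("hyphen-minus", "-"),
    ("hyphen", "\u2010"),
    ("non-breaking-hyphen", "\u2011"),
    ("figure-dash", "\u2012"),
    ("en-dash", "\u2013"),
    ("minus-sign", "\u2212"),
    ("roman-uncia-sign", "\U00010191"),
    ("o-diaeresis", "\u00f6"),
    ("o-diaeresis-decomposed", "o\u0308"),
    ("diaeresis", "\u0308"),
]


def describe(value):
    """List each code point of value with its Unicode character name."""
    return ", ".join(
        f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in value
    )


def build_ldif():
    """Assemble the contrived entry with raw (unencoded) UTF-8 values."""
    lines = ["version: 2", "", "dn: cn=funky"]
    lines += [f"{attr}:{value}" for attr, value in ENTRIES]
    return "\n".join(lines) + "\n"


for attr, value in ENTRIES:
    print(f"{attr}: {describe(value)}")
```

Writing `build_ldif().encode("utf-8")` to a file and opening it in a text viewer is one way to see how a given display program copes with these values.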




-- Kurt

