
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2




On Jun 16, 2009, at 7:02 AM, Yves Dorfsman wrote:

Michael Ströder wrote:
but was under the impression
you saw value somewhere else.
I see some value. As I said, my web2ldap accepts single-line UTF-8 input
in an LDIF editor multi-line input field but does not display UTF-8.
But if we run into more complicated issues like multi-line UTF-8 data,
I'm against opening a can of worms with a here-document scheme or similarly
complicated things.

So being able to read/write UTF-8 directly in the LDIF file will eliminate one step in the process? Can you give more details?


Listing the cases we have so far (please argue against any case you think is not valid):

- comparing at the file level by humans, for example to fix an issue;

Again, two UTF-8 strings may display the same, meaning humans will NOT be able to visually detect the difference and hence won't be able to fix it.

this implies that the user is using a display properly set up for UTF-8 (Yves Dorfsman).

Again, Unicode display programs are typically designed to display well-formed text, not arbitrary sequences of Unicode code points.


- simplifying scripts that convert data between other formats and LDIF (Steven Legg).

As I noted before, Steve's suggestion is problematic in that it doesn't preserve Unicode code points in the exchange of LDAP values.

- simplifying online editors (Michael Ströder).

I seriously doubt that removing the ASCII restriction actually simplifies anything. Removing the restriction adds another encoding option, which adds complexity and adds interop problems.
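For context, the ASCII restriction is the LDIF (RFC 2849) rule that a value may appear in plain form only if it is a SAFE-STRING: ASCII only, no NUL, LF or CR, and not starting with SPACE, ':' or '<'. Anything else, including any non-ASCII UTF-8, is written base64-encoded after "::" (or referenced by URL after ":<"). The Python sketch below illustrates that rule for a single attribute line; `is_safe_string` and `ldif_v1_line` are hypothetical helpers, and line folding, DNs and the URL form are deliberately left out.

```python
import base64

def is_safe_string(value: str) -> bool:
    """Return True if value matches the SAFE-STRING production of RFC 2849."""
    if value == "":
        return True  # SAFE-STRING is optional, so an empty value qualifies
    # SAFE-INIT-CHAR: any char <= 0x7F except NUL, LF, CR, SPACE, ':' and '<'
    if ord(value[0]) > 0x7F or value[0] in "\x00\n\r :<":
        return False
    # SAFE-CHAR: any char <= 0x7F except NUL, LF and CR
    return all(ord(c) <= 0x7F and c not in "\x00\n\r" for c in value[1:])

def ldif_v1_line(attr: str, value: str) -> str:
    """Format one attribute line the way an RFC 2849 writer might."""
    # RFC 2849 also says values ending with SPACE SHOULD be base64-encoded.
    if is_safe_string(value) and not value.endswith(" "):
        return f"{attr}: {value}"
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"{attr}:: {encoded}"

print(ldif_v1_line("cn", "funky"))           # cn: funky
print(ldif_v1_line("cn", "J\u00f6rg"))       # cn:: SsO2cmc=
print(ldif_v1_line("description", " lead"))  # description:: IGxlYWQ=
```

Allowing raw UTF-8 values would add a third form alongside these two, with its own rules for when a writer should choose it.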


Steven:
Can you confirm this is what you were saying, or otherwise reword it?


Kurt:
My case was case a) in your email. In my last email I argued that 1) that case has value, and 2) your argument against it, not being able to display UTF-8 properly, is solved in modern environments set up for UTF-8.

I think your assumptions about "modern environment setup for UTF-8" are flawed. Good UTF-8 display programs (those that display UTF-8 as text) generally expect the UTF-8 to be well-formed text. If the input to these programs is not well-formed text, the display will not be well-formed (though it may not be obvious to the viewer that it's not well-formed text). And there are lots of "bad" UTF-8 display programs in wide use, ones which cannot handle even all well-formed text (like bidirectional text). And almost every display program suffers from incomplete fonts. And then there are private use code points, and various other things that even the best UTF-8 display programs have problems with.

Here's a contrived file that illustrates just some of the display problems that removing the ASCII restriction will lead to. (Apologies if my MUA associates the wrong MIME information with this; that just illustrates yet another problem folks will face when exchanging such files: wrong MIME information.) BTW, none of my well-set-up Unicode "text" display programs handled this file well... because it's simply not well-formed text.

version: 2

dn: cn=funky
bom:﻿
smiley-face:☺
# only SPACE is special
no-break-space: 
zero-width-space:​
word-joiner:⁠
ideographic-space:　
zero-width-no-break-space:﻿
# these hyphens differ but may look the same
hyphen-minus:-
hyphen:‐
non-breaking-hyphen:‑
figure-dash:‒
en-dash:–
minus-sign:−
roman-uncia-sign:𐆑
# these differ but may look the same
o-diaeresis:ö
o-diaeresis-decomposed:ö
# combining character
diaeresis:̈
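Some of these problems can be flagged mechanically, though only partially. The following Python sketch, using only the standard `unicodedata` module, reports characters in a value that commonly display misleadingly. What it flags (format characters, private use code points, spaces other than SPACE, dashes other than HYPHEN-MINUS, a combining mark with no base character, and text not in Unicode Normalization Form C) is an illustrative assumption, not anything LDIF defines, and `suspicious_chars` is a hypothetical helper. It also shows how incomplete such checks are: MINUS SIGN is a math symbol (category Sm), so it passes unflagged.

```python
import unicodedata

def suspicious_chars(value: str) -> list[str]:
    """Describe characters in value that are likely to display misleadingly."""
    findings = []
    for i, ch in enumerate(value):
        cat = unicodedata.category(ch)
        if (cat in ("Cf", "Co")                 # format (e.g. U+200B, U+FEFF) or private use
                or (cat == "Zs" and ch != " ")  # spaces other than SPACE
                or (cat == "Pd" and ch != "-")  # dashes other than HYPHEN-MINUS
                or (cat == "Mn" and i == 0)):   # combining mark with no base character
            name = unicodedata.name(ch, "<unnamed>")
            findings.append(f"U+{ord(ch):04X} {name}")
    # Precomposed and decomposed spellings can look alike but compare unequal.
    if unicodedata.normalize("NFC", value) != value:
        findings.append("not in Unicode Normalization Form C")
    return findings

# A few values from the example above, written as escapes so they stay visible.
examples = {
    "zero-width-space": "\u200b",
    "no-break-space": "\u00a0",
    "hyphen": "\u2010",
    "minus-sign": "\u2212",
    "o-diaeresis": "\u00f6",
    "o-diaeresis-decomposed": "o\u0308",
    "diaeresis": "\u0308",
}
for attr, value in examples.items():
    print(f"{attr}: {suspicious_chars(value)}")

# The two o-diaeresis values differ as code points even if they render identically.
print("\u00f6" == "o\u0308")  # False
```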



-- Kurt
_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext