Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
On Jun 16, 2009, at 7:02 AM, Yves Dorfsman wrote:
Michael Ströder wrote:
but was under the impression
you saw value somewhere else.
I see some value. As I said, my web2ldap accepts single-line UTF-8
input in an LDIF editor's multi-line input field but does not display
UTF-8. But if we run into more complicated issues, like multi-line
UTF-8 data, I'm against opening a can of worms with a here-document
scheme or similarly complicated things.
So being able to read/write UTF-8 directly in the LDIF file will
eliminate one step in the process? Can you give more details?
Listing the cases we have so far (please argue against any case you
think is not valid):
- comparing at the file level by humans, for example to fix an issue;
Again, two different UTF-8 strings may display the same, meaning humans
will NOT be able to visually detect the difference and hence won't be
able to fix it.
This implies that the user is using a display properly set up for
UTF-8 (Yves Dorfsman).
Again, Unicode display programs are typically designed to display
well-formed text, not arbitrary sequences of Unicode code points.
- simplifying scripts that import/export data from/to other formats
to/from LDIF (Steven Legg).
As I noted before, Steve's suggestion is problematic in that it
doesn't preserve Unicode code points in the exchange of LDAP values.
- simplifying online editors (Michael Ströder).
I seriously doubt that removing the ASCII restriction actually
simplifies anything. Removing the restriction adds another encoding
option, which adds complexity and adds interop problems.
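For reference, the existing ASCII-only LDIF (RFC 2849) already has one way to carry such values: any value that is not a SAFE-STRING (ASCII only, no NUL/LF/CR, not starting with SPACE, ":" or "<") is written after "::" as base64 of its octets. Below is a minimal Python sketch of that rule, assuming values are Python strings exchanged as UTF-8; `is_safe_string` and `ldif_line` are hypothetical helper names, and line folding is ignored.

```python
import base64

# RFC 2849 SAFE-CHAR excludes NUL, LF, CR and anything above 127;
# SAFE-INIT-CHAR additionally excludes SPACE, ':' and '<'.
_UNSAFE_ANYWHERE = {0x00, 0x0A, 0x0D}
_UNSAFE_INIT = _UNSAFE_ANYWHERE | {0x20, 0x3A, 0x3C}


def is_safe_string(value: bytes) -> bool:
    """Return True if `value` may appear unencoded in an LDIF v1 attrval line."""
    if not value:
        return True  # an empty SAFE-STRING is allowed
    if value[0] in _UNSAFE_INIT:
        return False
    if value[-1] == 0x20:
        return False  # RFC 2849: values ending in SPACE SHOULD be base64-encoded
    return all(b < 0x80 and b not in _UNSAFE_ANYWHERE for b in value)


def ldif_line(attr: str, value: str) -> str:
    """Format one attribute/value pair, base64-encoding the UTF-8 octets if needed."""
    raw = value.encode("utf-8")
    if is_safe_string(raw):
        return f"{attr}: {value}"
    return f"{attr}:: {base64.b64encode(raw).decode('ascii')}"
```

For example, `ldif_line("o-diaeresis", "\u00f6")` yields `o-diaeresis:: w7Y=`: unreadable to a human, but the exact code points survive the exchange, which is the trade-off being argued about here.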
Steven:
Can you confirm this is what you were saying, or otherwise reword it?
Kurt:
My case was case a) in your email. In my last email I argued that
1) that case has value
2) your argument against it, not being able to display UTF-8
properly, is solved in modern environment setup for UTF-8
I think your assumptions about "modern environment setup for UTF-8"
are flawed. Good UTF-8 display programs (that display UTF-8 as text)
generally expect the UTF-8 to be well-formed text. If the input to
these programs is not well-formed text, the display will not be
well-formed either (though it may not be obvious to the viewer that
it's not well-formed text). And there are lots of "bad" UTF-8 display
programs in wide use, ones which cannot handle even all well-formed
text (like bidirectional text). And almost every display program
suffers from incomplete fonts. And then there are private-use code
points, and various other things that even the best UTF-8 display
programs have problems with.
Here's a contrived file that illustrates just some of the display
problems removing the ASCII restriction will lead to. (Apologies if
my MUA associates the wrong MIME information with this, but that just
illustrates yet another problem folks will face (wrong MIME
information) when exchanging such files.) BTW, none of my well-set-up
Unicode "text" display programs handled this file well... because it's
simply not well-formed text.
version: 2
dn: cn=funky
bom:
smiley-face:☺
# only SPACE is special
no-break-space:
zero-width-space:​
word-joiner:⁠
ideographic-space:　
zero-width-no-break-space:
# these hyphen differ but may look the same
hyphen-minus:-
hyphen:‐
non-breaking-hyphen:‑
figure-dash:‒
en-dash:–
minus-sign:−
roman-uncia-sign:𐆑
# these differ but may look the same
o-diaeresis:ö
o-diaeresis-decomposed:ö
# combining character
diaeresis:̈
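The o-diaeresis pair and the wrong-MIME problem mentioned above can both be checked mechanically. Below is a short Python sketch using only the standard library. It shows that the precomposed and decomposed forms differ in code points and octets, and that NFC normalization makes them equal only by rewriting one value. It also shows what UTF-8 octets turn into when a reader assumes ISO-8859-1 instead (the charset is an assumption chosen for illustration).

```python
import unicodedata

# U+00F6 LATIN SMALL LETTER O WITH DIAERESIS (one code point)
precomposed = "\u00f6"
# 'o' followed by U+0308 COMBINING DIAERESIS (two code points)
decomposed = "o\u0308"

# Usually rendered identically, yet not equal as strings or as UTF-8 octets.
print(precomposed == decomposed)                   # False
print(precomposed.encode("utf-8").hex(" "))        # c3 b6
print(decomposed.encode("utf-8").hex(" "))         # 6f cc 88

# NFC normalization makes them compare equal, but only by rewriting the
# decomposed value, so its original code points are not preserved.
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == precomposed)                   # True
print(normalized == decomposed)                    # False

# Wrong charset metadata: the UTF-8 octets of U+263A WHITE SMILING FACE
# decoded as ISO-8859-1 give three unrelated characters.
mojibake = "\u263a".encode("utf-8").decode("latin-1")
print(repr(mojibake))                              # 'â\x98º'
```

The last case is the same kind of damage a mislabeled copy of the sample file suffers: each multi-octet character turns into several unrelated Latin-1 characters, some of them invisible control characters.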
-- Kurt
_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext