Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
On Jun 14, 2009, at 9:46 PM, Yves Dorfsman wrote:
Kurt Zeilenga wrote:
I think we need clear problem statements, a proposal for addressing
each problem, a summary of why the proposal does address the
problem and a statement of what (known) problems the proposal might
introduce.
You have noted the "diffing problem". But here it's not clear
whether a) you wish to determine how two LDIF files differ, b)
you wish to determine if two LDIF files represent the same LDAP
requests, or c) you wish to determine whether the directory information
represented by the LDAP requests in the LDIF files differs and, if
so, how.
For a), one can use file comparison tools to determine how two LDIF
files differ.
Yes, but because it displays unreadable characters, it makes it
slightly more complicated. The better case (than simply diffing) I
have given in the past is:
-the directory is broken
-you export to LDIF
-compare this LDIF with a previous one from when the directory was
working.
You don't need UTF-8 for this. A simple text diff tool will tell you
that the base64 differs.
I personally find that in such a case, being able to read the values
makes it simpler and faster.
But now you assume you'll be able to read them. This is a bad
assumption. A simple diff tool might show two DIFFERENT values the
same way, leading the human to believe there is no difference when
there is a significant difference. And then there is the issue that a
UTF-8 encoded Unicode file is not well-formed text, and trying to
treat it as text will be quite problematic. (see below)
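To illustrate the point about two DIFFERENT values showing the same way, here is a minimal Python sketch. The sample name and the helper `b64` are made up for illustration and do not come from any LDIF tool. It builds the same visible string from two different code point sequences and shows that the base64 forms still differ.

```python
import base64
import unicodedata

# Two values that most terminals render identically (hypothetical sample data).
composed = "Andr\u00e9"      # "e with acute" as one code point, U+00E9
decomposed = "Andre\u0301"   # plain "e" followed by combining acute, U+0301

def b64(value: str) -> str:
    """Base64 of the UTF-8 encoding, as it would follow '::' in LDIF."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")

# The strings are different sequences of code points ...
values_differ = composed != decomposed
# ... yet they become equal once normalized to the same form.
same_when_normalized = unicodedata.normalize("NFC", decomposed) == composed

b64_composed = b64(composed)
b64_decomposed = b64(decomposed)

print(values_differ, same_when_normalized)
print(b64_composed)
print(b64_decomposed)
```

A text diff of the two base64 strings reports a difference, while a side-by-side view of the raw values may not.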
Other case: People have mentioned scripts that build LDIF files from
other sources, and have mentioned that encoding the values in base64
is an overhead they could do without.
While base64 encoding is an additional step, it's an additional step that
is well supported today. If we lift the ASCII restriction now, we'll
have some implementations that do support it and some that don't, and
that will cause interop problems. I cannot support inducing such
interop problems without a strong justification.
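For scripts that generate LDIF, the extra base64 step can be as small as the sketch below. It is a simplified reading of the RFC 2849 rules (no line folding, string values only), and the function name `ldif_attr_line` is hypothetical, not part of any existing library.

```python
import base64

def ldif_attr_line(name: str, value: str) -> str:
    """Return one LDIF attribute line, base64-encoding the value when needed.

    Simplified assumption: a value is written as plain text only if it is
    all ASCII, has no NUL/LF/CR, does not start with space, ':' or '<',
    and does not end with a space. Long lines are not folded here.
    """
    raw = value.encode("utf-8")
    needs_b64 = (
        any(b > 127 or b in (0, 10, 13) for b in raw)
        or raw[:1] in (b" ", b":", b"<")
        or raw.endswith(b" ")
    )
    if needs_b64:
        return f"{name}:: {base64.b64encode(raw).decode('ascii')}"
    return f"{name}: {value}"

print(ldif_attr_line("cn", "Andre"))
print(ldif_attr_line("cn", "Andr\u00e9"))
```

The output stays pure ASCII, so existing LDIF consumers can read it without change.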
It may be you are thinking that a human would be better able to
visually detect certain kinds of differences. However, this
assumes that removing the ASCII restriction would produce readily
displayable Unicode text.
On a modern OS set up properly, Unicode text is displayed properly
(my experience is with UTF-8 on Linux and Solaris here).
The key phrase here is "Unicode text". And most such display tools
not only require "well-formed" text, but often cannot display all
"well-formed" text. But removing the ASCII restriction does not make
a LDIF file "Unicode text". It makes it a series of Unicode code
points and hence display of it as text will be quite problematic. And
even if it's displayable, you have the problem that two values might
display in the same way, making visual diff'ing problematic.
-- Kurt