Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
Kurt Zeilenga wrote:
I think we need clear problem statements, a proposal for addressing each
problem, a summary of why the proposal does address the problem and a
statement of what (known) problems the proposal might introduce.
You have noted the "diffing problem". But here it's not clear
whether a) you wish to determine how two LDIF files differ, b) you
wish to determine if two LDIF files represent the same LDAP requests, or
c) you wish to determine whether the directory information represented
by the LDIF-represented LDAP requests differs and, if so, how.
For a), one can use file comparison tools to determine how two LDIF
files differ.
Yes, but because the diff then shows unreadable base64-encoded values, the
comparison becomes slightly more complicated. A better case (than simply
diffing), which I have given in the past, is:
- the directory is broken
- you export to LDIF
- you compare this LDIF with a previous one from when the directory was working.
I personally find that in such a case, being able to read the values makes
it simpler and faster.
Other case: People have mentioned scripts that build LDIF files from other
sources, and have said that encoding the values in base64 is an overhead
they could do without.
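To make the overhead concrete, here is a minimal Python sketch of the check such a script has to perform for every value under the current LDIF rules (RFC 2849). The function names needs_base64 and ldif_line are hypothetical, and line folding is left out, so treat this as an illustration rather than a complete LDIF writer.

```python
import base64


def needs_base64(value: bytes) -> bool:
    """Return True if the value cannot be written as a plain SAFE-STRING."""
    if not value:
        return False
    # A SAFE-STRING may not start with SPACE, ':' or '<'.
    if value[0] in b" :<":
        return True
    # RFC 2849 also says values ending in SPACE SHOULD be base64-encoded.
    if value[-1] == 0x20:
        return True
    # NUL, LF, CR and any non-ASCII byte force base64.
    return any(b > 0x7F or b in (0x00, 0x0A, 0x0D) for b in value)


def ldif_line(attr: str, value: str) -> str:
    """Build one unfolded attribute line (hypothetical helper)."""
    raw = value.encode("utf-8")
    if needs_base64(raw):
        return attr + ":: " + base64.b64encode(raw).decode("ascii")
    return attr + ": " + value


print(ldif_line("cn", "Yves"))
print(ldif_line("cn", "Zo\u00eb"))
```

With a UTF-8 value option, the second call could emit the name as readable text instead of a base64 string, which is the readability gain argued for here.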
Simply having a UTF-8 value encoding option doesn't generally solve any
diffing problem. In cases b and c, the LDIF encoding of the value
doesn't matter. In case a), you'd be introducing another way for LDIF
files which represented the same LDAP requests to differ.
If you always do your export with the same tool, with the same options, then
this shouldn't be an issue. A narrow case I admit, but this is one specific
case I was thinking about.
It may be you are thinking that a human would be better able to visually
detect certain kinds of differences. However, this assumes that
removing the ASCII restriction would produce readily displayable Unicode
text.
On a properly configured modern OS, Unicode text is displayed correctly (my
experience here is with UTF-8 on Linux and Solaris).
That, I believe, is a bad assumption. For instance, say a user runs
diff(1) on two LDIF files and gets:
% diff -u ?.ldif
--- 1.ldif 2009-06-11 22:08:21.000000000 -0700
+++ 2.ldif 2009-06-11 22:08:47.000000000 -0700
@@ -1 +1 @@
-a: f??
+a:f??
where ? represents a character not displayable on the user's screen. The
user might assume the values here are the same when they aren't.
This, I hope, illustrates why general file diff'ing tools, like diff(1),
are suitable only for case a) and not for case b).
Ok, but a) is still a valid case.
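For the undisplayable-character concern in the diff example above, one possible approach is to decode each attribute line and print the value with non-ASCII characters escaped, so two values that look alike on screen can still be told apart. The sketch below is in Python, and the names parse_attrval, visible and same_value are hypothetical. It assumes unfolded lines, skips URL ("<") values, and compares raw bytes only. It ignores attribute matching rules, so it only approximates case b).

```python
import base64
from typing import Tuple


def parse_attrval(line: str) -> Tuple[str, bytes]:
    """Split one unfolded LDIF attribute line into (attribute, value bytes)."""
    attr, sep, rest = line.partition(":")
    if not sep:
        raise ValueError("not an attribute line: %r" % line)
    if rest.startswith(":"):
        # "attr:: <base64>" form
        return attr, base64.b64decode(rest[1:].strip())
    if rest.startswith("<"):
        raise ValueError("URL values are not handled in this sketch")
    # "attr: value" form; the spaces after the colon are optional filler.
    return attr, rest.lstrip(" ").encode("utf-8")


def visible(value: bytes) -> str:
    """Render a UTF-8 value with non-ASCII characters as backslash escapes."""
    return value.decode("utf-8").encode("unicode_escape").decode("ascii")


def same_value(line1: str, line2: str) -> bool:
    """Compare two lines by attribute name and decoded bytes, not by spelling."""
    return parse_attrval(line1) == parse_attrval(line2)


# The two spellings differ as text but carry the same value.
print(same_value("a: foo", "a:foo"))
# Two look-alike values that a terminal might both show as "f?".
for line in ("a: f\u00f6", "a:f\u0151"):
    attr, value = parse_attrval(line)
    print(attr, visible(value))
```

A comparison like this treats "a: foo", "a:foo" and "a:: Zm9v" as equal, which is exactly the kind of difference a plain diff(1) of the files would report.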
--
Yves.
http://www.sollers.ca/