
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2




Kurt Zeilenga wrote:

I think we need clear problem statements, a proposal for addressing each problem, a summary of why the proposal does address the problem and a statement of what (known) problems the proposal might introduce.

You have noted the "diffing problem". But here it's not clear whether a) you wish to determine how two LDIF files differ, b) you wish to determine if two LDIF files represent the same LDAP requests, or c) you wish to determine whether the directory information represented by the LDAP requests in the LDIF files differs and, if so, how.

For a), one can use file comparison tools to determine how two LDIF files differ.

Yes, but because the diff then displays unreadable (base64-encoded) values, it makes the comparison slightly more complicated. A better case (than simply diffing), which I have given in the past, is:

-the directory is broken
-you export to LDIF
-compare this LDIF with a previous one from when the directory was working.

I personally find that in such a case, being able to read the values makes it simpler and faster.


Another case: people have mentioned scripts that build LDIF files from other sources, and that encoding the values in base64 is an overhead they could do without.
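
To make that overhead concrete, here is a minimal sketch of what such a script has to do today under the LDIF v1 rules (RFC 2849). The function names are hypothetical, and line folding and the "attr:< url" form are deliberately left out:

```python
import base64


def needs_base64(value: bytes) -> bool:
    """Return True if RFC 2849 does not allow the value as a plain SAFE-STRING."""
    if not value:
        return False
    # A safe string may not start with SPACE, ':' or '<'.
    if value[0] in b" :<":
        return True
    # Values ending with SPACE should be base64-encoded as well.
    if value.endswith(b" "):
        return True
    # NUL, LF, CR and anything above 127 are not allowed in a safe string.
    return any(b > 127 or b in (0, 10, 13) for b in value)


def ldif_attr_line(attr: str, value: str) -> str:
    """Build one (unfolded) LDIF attribute line for a text value."""
    raw = value.encode("utf-8")
    if needs_base64(raw):
        return f"{attr}:: {base64.b64encode(raw).decode('ascii')}"
    return f"{attr}: {value}"


if __name__ == "__main__":
    print(ldif_attr_line("cn", "Yves"))
    print(ldif_attr_line("cn", "Zoé"))
```

With a UTF-8 value encoding option, the second branch would only be needed for the few unsafe characters, not for every non-ASCII value.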


Simply having a UTF-8 value encoding option doesn't generally solve any diffing problem. In cases b and c, the LDIF encoding of the value doesn't matter. In case a), you'd be introducing another way for LDIF files which represented the same LDAP requests to differ.

If you always do your export with the same tool, with the same options, then this shouldn't be an issue. A narrow case I admit, but this is one specific case I was thinking about.
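
For cases b) and c), where the encoding of the value does not matter, a comparison would have to decode the values first. A rough sketch of that idea, assuming simple attribute lines only (hypothetical helper names; record boundaries, change records and the "attr:< url" form are ignored):

```python
import base64


def unfold(text: str) -> list[str]:
    """Join folded lines: a line starting with one space continues the previous line."""
    lines: list[str] = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def decoded_pairs(text: str) -> list[tuple[str, bytes]]:
    """Return (attribute, raw value bytes) pairs, independent of the value encoding."""
    pairs = []
    for line in unfold(text):
        if not line or line.startswith("#"):
            continue  # skip record separators and comments
        attr, _, rest = line.partition(":")
        if rest.startswith(":"):
            value = base64.b64decode(rest[1:].strip())  # "attr:: base64"
        else:
            value = rest.lstrip(" ").encode("utf-8")  # "attr: value"
        pairs.append((attr, value))
    return pairs


def same_content(ldif_a: str, ldif_b: str) -> bool:
    """True if both texts carry the same attribute values in the same order."""
    return decoded_pairs(ldif_a) == decoded_pairs(ldif_b)
```

Here "cn:: Wm/DqQ==" and "cn: Zoé" compare as equal, although a plain file diff reports them as different.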


It may be you are thinking that a human would be better able to visually detect certain kinds of differences. However, this assumes that removing the ASCII restriction would produce readily displayable Unicode text.

On a properly set up modern OS, Unicode text is displayed correctly (my experience here is with UTF-8 on Linux and Solaris).


That, I believe, is a bad assumption. For instance, say a user runs diff(1) on two LDIF files and gets:

% diff -u ?.ldif
--- 1.ldif    2009-06-11 22:08:21.000000000 -0700
+++ 2.ldif    2009-06-11 22:08:47.000000000 -0700
@@ -1 +1 @@
-a: f??
+a:f??

where ? represents a character not displayable on the user's screen. The user might assume the values here are the same when they aren't.

This, I hope, illustrates why general file diff'ing tools, like diff(1), are suitable only for case a but not b.

Ok, but a) is still a valid case.
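
For case a), when the screen cannot display some characters, one possible workaround is to escape every non-ASCII character before looking at the values, so that two values that both render as "f??" can still be told apart. A small sketch (the function name is hypothetical) using Python's standard "backslashreplace" error handler:

```python
def visible(text: str) -> str:
    """Replace every non-ASCII character with a backslash escape such as \\u4e2d."""
    return text.encode("ascii", "backslashreplace").decode("ascii")


if __name__ == "__main__":
    old = "f\u4e2d\u6587"
    new = "f\u4e2d\u6588"
    # Both may show up as "f??" on a terminal without the right font,
    # but the escaped forms differ in the last code point.
    print(visible(old))
    print(visible(new))
```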

--
Yves.
http://www.sollers.ca/
