
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2



Yves,

I think we need clear problem statements, a proposal for addressing each problem, a summary of why the proposal does address the problem and a statement of what (known) problems the proposal might introduce.

You have noted the "diffing problem". But here it's not clear whether a) you wish to determine how two LDIF files differ, b) you wish to determine if two LDIF files represent the same LDAP requests, or c) you wish to determine whether the directory information represented by the LDAP requests in the LDIF differs and, if so, how.

For a), one can use file comparison tools to determine how two LDIF files differ.

For b), one can use an LDIF comparison tool. For instance, such a tool would know that
	a: foo
and
	a:foo
and
	a:f
	 o
	 o
and:
	a::Zm9v

represent the same value for the attribute a.
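
As a rough sketch of what such an LDIF comparison tool does before comparing, here is a minimal normalizer in Python. The function names (unfold, parse_attrval, normalize) are hypothetical, and it assumes the lines carry no display indentation. It handles only line folding, the optional spaces after the colon, and the "::" base64 form; it ignores the ":<" URL form, comments, and everything else a real tool would need.

```python
import base64


def unfold(text):
    """Join folded lines: a line starting with a single space continues the previous one."""
    lines = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]  # drop the one leading space, keep the rest
        else:
            lines.append(raw)
    return lines


def parse_attrval(line):
    """Split 'attr: value' or 'attr:: base64' into (attr, value bytes)."""
    attr, sep, rest = line.partition(":")
    if not sep:
        raise ValueError("not an attribute-value line: %r" % line)
    if rest.startswith(":"):
        # '::' introduces a base64-encoded value
        return attr, base64.b64decode(rest[1:].lstrip(" "))
    # plain form: any spaces after the colon are filler, not part of the value
    return attr, rest.lstrip(" ").encode("utf-8")


def normalize(text):
    """Return the (attr, value) pairs of a fragment, independent of encoding choices."""
    return [parse_attrval(line) for line in unfold(text) if line]


forms = ["a: foo", "a:foo", "a:f\n o\n o", "a::Zm9v"]
results = [normalize(form) for form in forms]
print(all(r == results[0] for r in results))
```

With this kind of normalization, all four forms above reduce to the same pair, attribute a with the value foo.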


For c), a schema-aware tool is needed.  For instance,
	a:foo
and
	b:foo

might be equivalent as b could be an alias for a.

Or
	a:foo
and
	a:FOO

could be different but equivalent values.
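
A minimal sketch of the kind of check a schema-aware tool makes, in Python. The schema here is entirely hypothetical: I simply assume that b is an alias for a and that a uses a case-ignoring equality match. Real LDAP matching rules involve more than case folding, so treat casefold() as a stand-in only.

```python
# Hypothetical schema knowledge, assumed for illustration only.
ALIASES = {"b": "a"}   # assume attribute b is an alias for a
CASE_IGNORE = {"a"}    # assume a's equality match ignores case


def canonical(attr, value):
    """Map an (attr, value) pair to a form suitable for equality comparison."""
    name = attr.lower()
    name = ALIASES.get(name, name)
    if name in CASE_IGNORE:
        value = value.casefold()  # simplification of a real matching rule
    return name, value


def equivalent(x, y):
    """True if two (attr, value) pairs are equivalent under the assumed schema."""
    return canonical(*x) == canonical(*y)


print(equivalent(("a", "foo"), ("b", "foo")))
print(equivalent(("a", "foo"), ("a", "FOO")))
print(equivalent(("a", "foo"), ("a", "bar")))
```

Under those assumptions the first two pairs compare as equivalent and the third does not, none of which a tool without schema knowledge could tell.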

Simply having a UTF-8 value encoding option doesn't generally solve any diffing problem. In cases b) and c), the LDIF encoding of the value doesn't matter. In case a), you'd be introducing another way for LDIF files that represent the same LDAP requests to differ. It seems to me that if you want to use file comparison tools to accomplish b), you need to have a single way to represent a particular LDAP request in LDIF. As long as there are multiple ways to represent a particular LDAP request in LDIF, file comparison tools are inadequate for b).

It may be you are thinking that a human would be better able to visually detect certain kinds of differences. However, this assumes that removing the ASCII restriction would produce readily displayable Unicode text. That, I believe, is a bad assumption. For instance, say a user runs diff(1) on two LDIF files and gets:

% diff -u ?.ldif
--- 1.ldif	2009-06-11 22:08:21.000000000 -0700
+++ 2.ldif	2009-06-11 22:08:47.000000000 -0700
@@ -1 +1 @@
-a: f??
+a:f??

where ? represents a character not displayable on the user's screen. The user might assume the values here are the same when they aren't.

This, I hope, illustrates why general file diff'ing tools, like diff(1), are suitable only for case a) and not for b).
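
To make the display hazard concrete, here is a small Python sketch that renders every non-ASCII character as its code point, so that two lines which both show up as "f??" on a limited terminal can be told apart. The helper name visible and the two sample values are hypothetical; I picked two distinct letters arbitrarily.

```python
def visible(line):
    """Replace every character outside printable ASCII with a <U+XXXX> marker."""
    return "".join(
        c if " " <= c <= "~" else "<U+{:04X}>".format(ord(c))
        for c in line
    )


# Hypothetical values: the two lines use different non-ASCII letters.
old = "a: f\u00f8\u00f8"
new = "a:f\u00f6\u00f6"

print(visible(old))
print(visible(new))
```

Even with such a rendering, the reader still has to know that the space after the colon is insignificant while the code point difference is not, which is again what an LDIF comparison tool is for.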

Do you have a more compelling use case?

-- Kurt
_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext