Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
Yves,
I think we need clear problem statements, a proposal for addressing
each problem, a summary of why the proposal does address the problem
and a statement of what (known) problems the proposal might introduce.
You have noted the "diffing problem". But here it's not clear
whether a) you wish to determine how two LDIF files differ, b) you
wish to determine if two LDIF files represent the same LDAP requests,
or c) you wish to determine whether the directory information
represented by the LDAP requests in the LDIF differs and, if so, how.
For a), one can use file comparison tools to determine how two LDIF
files differ.
For b), one can use an LDIF comparison tool. For instance, such a
tool would know that
a: foo
and
a:foo
and
a:f
 o
 o
and
a::Zm9v
represent the same value for the attribute a.
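
As a rough sketch of what such a comparison tool has to do, the Python below unfolds continuation lines, skips the optional spaces after the colon, and decodes the base64 form before comparing. The function names are mine, not those of any existing tool, and the "attr:< url" form and all error handling are left out.

```python
import base64

def unfold(text):
    """Join LDIF continuation lines (lines that begin with one space)."""
    lines = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]      # drop the single folding space
        else:
            lines.append(raw)
    return lines

def parse_attrval(line):
    """Return (attribute, value as bytes) for one unfolded attrval line."""
    attr, sep, rest = line.partition(":")
    if not sep:
        raise ValueError("not an attrval line: %r" % line)
    if rest.startswith(":"):
        # "attr:: <base64>" form
        return attr, base64.b64decode(rest[1:].strip())
    # "attr: <value>" form; spaces after the colon are not part of the value
    return attr, rest.lstrip(" ").encode("utf-8")

forms = ["a: foo", "a:foo", "a:f\n o\n o", "a::Zm9v"]
parsed = {parse_attrval(unfold(f)[0]) for f in forms}
print(parsed)
```

All four forms collapse to the same (attribute, value) pair, which is what distinguishes this from a byte-for-byte file comparison.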
For c), a schema-aware tool is needed. For instance,
a:foo
and
b:foo
might be equivalent as b could be an alias for a.
Or
a:foo
and
a:FOO
could be different but equivalent values.
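
A minimal sketch of that kind of schema-aware comparison, again in Python. The alias table and the set of case-insensitive attributes are made up for illustration; a real tool would read them from the directory's schema, and real matching rules do more than fold case.

```python
# Hypothetical schema data, for illustration only.
ALIASES = {"b": "a"}       # assume b is another name for attribute a
CASE_IGNORE = {"a"}        # assume a has a case-insensitive equality rule

def canonical(attr, value):
    """Map an (attribute, value) pair to a form suitable for comparison."""
    name = attr.lower()                 # attribute names are case-insensitive
    name = ALIASES.get(name, name)      # resolve aliases to one name
    if name in CASE_IGNORE:
        value = value.casefold()        # crude stand-in for the matching rule
    return name, value

def equivalent(x, y):
    """True if two (attribute, value) pairs match under the toy schema."""
    return canonical(*x) == canonical(*y)

print(equivalent(("a", "foo"), ("b", "foo")))
print(equivalent(("a", "foo"), ("a", "FOO")))
```

Neither equivalence can be seen from the LDIF text alone, whatever encoding is used for the values.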
Simply having a UTF-8 value encoding option doesn't generally solve
any diffing problem. In cases b) and c), the LDIF encoding of the value
doesn't matter. In case a), you'd be introducing another way for LDIF
files which represent the same LDAP requests to differ.
It seems to me that if you want to use file comparison tools to
accomplish b), you need to have a single way to represent in LDIF a
particular LDAP request. As long as there are multiple ways to
represent a particular LDAP request in LDIF, file comparison tools are
inadequate for b).
It may be you are thinking that a human would be better able to
visually detect certain kinds of differences. However, this assumes
that removing the ASCII restriction would produce readily displayable
Unicode text. That, I believe, is a bad assumption. For instance,
say a user runs diff(1) on two LDIF files and gets:
% diff -u ?.ldif
--- 1.ldif 2009-06-11 22:08:21.000000000 -0700
+++ 2.ldif 2009-06-11 22:08:47.000000000 -0700
@@ -1 +1 @@
-a: f??
+a:f??
where ? represents a character not displayable on the user's screen.
The user might assume the values here are the same when they aren't.
This, I hope, illustrates why general file diff'ing tools, like
diff(1), are suitable only for case a) and not for b).
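
To make the display issue concrete, here is a small Python sketch. The two sample values are invented; they share a first character and differ only in characters a terminal may be unable to render. Escaping everything outside ASCII shows the difference regardless of what the screen can display.

```python
def visible(value):
    """Render a string with every non-ASCII character as a backslash escape."""
    return value.encode("ascii", "backslashreplace").decode("ascii")

# Hypothetical values that could both show up as "f??" on a limited display.
v1 = "f\u4e00\u4e01"
v2 = "f\u4e00\u4e02"

print(visible(v1))
print(visible(v2))
print(v1 == v2)
```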
Do you have a more compelling use case?
-- Kurt
_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext