
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2



Kurt Zeilenga wrote:

There are a number of problems with it. Personally, I think what Steven already offered (and likely implemented) is better, though I am

My problem with Steven's solution is that it is half LDIF, half XML. As I have mentioned earlier, I think XML has its place, and maybe DSML should be fixed or re-invented, but for other applications I find the simplicity of LDIF an advantage; unfortunately, having to base64-encode anything that is not 7-bit ASCII takes away some of that simplicity.

concerned about line separators. As Howard's comments kind of suggest, when you have a value which is multi-lined,

I have never run into a situation where I needed a multi-line value in an LDAP directory, and I was surprised by the need, but Steven brought this up earlier in the thread and said that he has a real-world need for it, and that the lack of a syntax for it in my proposal for an updated LDIF format was an issue.


The problem with your proposal, and Steven's, is that LDIF line separators and value line separators are one and the same thing. While that might occasionally be the case, it cannot be expected to be generally the case.

On the contrary, both Steven's solution and mine separate the lines but do not impose a line separator. Steven delimits his lines with the <item></item> syntax, while I let the user choose any line separator out of the half dozen that have been used throughout the history of computing.

Our syntaxes are clear enough to let the import process know that those are separate lines, and the import process or the LDAP server can choose whichever line separator it wants. Making the line separator part of the data will create cross-platform issues. The LDAP server or actually the LDAP client should choose which line separator to use for its context/platform.
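
To make the idea concrete, here is a minimal Python sketch of a consumer that keeps a multi-line value as a list of lines and only picks a separator when it renders the value for its own platform. The function names and the sample address are hypothetical, and the sketch says nothing about how the lines are delimited in the LDIF file itself; it only shows the "each side chooses its own separator" step.

```python
import os

def split_value(text):
    """Split a value into lines, whatever separator the producer used.

    str.splitlines() recognises \n, \r\n, \r and several other
    historical line boundaries, so the separator is not treated as data.
    """
    return text.splitlines()

def render_value(lines, separator=os.linesep):
    """Join the lines with the separator the local platform prefers."""
    return separator.join(lines)

# Hypothetical multi-line attribute value (e.g. a postal address).
address = ["123 Main Street", "Ottawa, Ontario"]

unix_text = render_value(address, "\n")       # what a Unix client might show
windows_text = render_value(address, "\r\n")  # what a Windows client might show

# Both renderings carry the same two lines.
assert split_value(unix_text) == split_value(windows_text) == address
```

The design choice here is that the list of lines is the data, and the separator is a presentation detail decided at the edge.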


Adding UTF-8 support does not appear to be in support of improving LDIF as a proper interchange format. It seems to be driven by other goals, such as trying to make LDIF files displayable.

Yes and no. My main reason for pushing this is diffing. You run into a problem and you want to diff the original and the problematic LDIF export of your directory. Having half of your LDIF file base64-encoded makes it a lot more difficult to pinpoint the problem. If you are right that LDIF is purely for exchanging information between applications, never to be looked at by humans, then why is the current version so human-friendly?
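
The diffing problem can be sketched in a few lines of Python. This is only an illustration: the helper names and the sample `cn` values are made up, and the "plain" lines assume an LDIF variant in which raw UTF-8 is allowed after a single colon.

```python
import base64
import difflib

def b64_line(attr, value):
    """Render a value the way LDIF requires today for non-ASCII data."""
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"{attr}:: {encoded}"

def changed_lines(before, after):
    """Return only the added/removed lines of a unified diff."""
    diff = difflib.unified_diff(before, after, lineterm="")
    return [
        line for line in diff
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))  # skip the file headers
    ]

old_name, new_name = "Hélène", "Hélene"  # hypothetical: one accent lost

# Base64 form: the diff shows two opaque strings.
for line in changed_lines([b64_line("cn", old_name)], [b64_line("cn", new_name)]):
    print(line)

# Raw UTF-8 form: the diff shows the values themselves.
for line in changed_lines([f"cn: {old_name}"], [f"cn: {new_name}"]):
    print(line)
```

In the base64 case the diff still tells you which line changed, but you have to decode both sides before you can see what changed.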


I'm not convinced that removing the ASCII restrictions will be a good thing. Not only do I doubt it will have a net positive on displayability of LDIF for those who have a displayability goal (I don't have this goal), I think it will have a net negative impact on interoperability and user confusion, such as when the user creates a file using one Unicode normalization algorithm, but is trying to set values which require a different Unicode normalization value.

How so?
In the current version, you have to encode your Unicode to UTF-8, and then encode it again to base64. With my proposal, you would get the exact same UTF-8 strings as you do today, but they would not be (or would not have to be) encoded in base64.
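
As a rough illustration of the two encoding steps, the Python sketch below renders the same value the way LDIF requires today and the way it could look if raw UTF-8 were allowed. The function names and sample value are hypothetical, the safe-string check is a simplification of the SAFE-STRING rule in RFC 2849, and the second function assumes a single-line value with no special handling for leading spaces or colons.

```python
import base64

def is_safe_ascii(value):
    """Simplified RFC 2849 SAFE-STRING check (an approximation only)."""
    if value == "":
        return True
    if value[0] in " :<" or value[-1] == " ":
        return False
    return all(0 < ord(c) < 128 and c not in "\r\n" for c in value)

def ldif_v1_line(attr, value):
    """Current LDIF: anything outside safe ASCII is UTF-8, then base64."""
    if is_safe_ascii(value):
        return f"{attr}: {value}"
    utf8_bytes = value.encode("utf-8")                      # first encoding
    encoded = base64.b64encode(utf8_bytes).decode("ascii")  # second encoding
    return f"{attr}:: {encoded}"

def ldif_utf8_line(attr, value):
    """Assumed rendering with raw UTF-8 allowed: same bytes, no base64."""
    return f"{attr}: {value}"

line = ldif_v1_line("cn", "Hélène")
# Decoding the base64 gives back exactly the UTF-8 string that the
# second form writes directly.
assert base64.b64decode(line[len("cn:: "):]).decode("utf-8") == "Hélène"
print(line)
print(ldif_utf8_line("cn", "Hélène"))
```

The bytes that reach the directory are the same in both cases; only the representation in the file differs.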

This is not a rebuttal of your argument; I am truly interested in understanding what you mean here (in the same way I was glad somebody brought up the issue of right-to-left characters, as I had not thought about it). Maybe it is a problem that we can address?


if so, should we help Steven with the xmled RFC ?

What Steven and Andrew have done is define an extension for LDIF to allow XML values to be represented in a human-readable format instead of requiring the use of base64. Unfortunately his proposal has interchange issues (see the I-D's security considerations section). This, I think, is a fatal problem with this extension.

So really this is the issue: should the value of the line separator be part of the data, or should everybody (LDIF importers/exporters, LDAP servers, LDAP clients) treat multi-line entries as just that, several lines, and choose their own line separator? (In case I wasn't clear earlier, I am in favour of the latter.)

--
Yves.
http://www.sollers.ca/
