
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2




Kurt,

Kurt Zeilenga wrote:

On Jun 30, 2009, at 3:13 PM, Michael Ströder wrote:

[snip]

I agree that LDIF is just an alternative encoding of protocol data
units. So lifting the ASCII limitation in LDIF would IMO not introduce
any other problem an LDAP client with a user interface does not already
have today (despite the new-line issue).

Removing the ASCII restriction will mean that users and systems will need to be far more careful in how they transfer LDIF data. Today, one can email LDIF files about, FTP them without putting clients in "binary" mode, etc, because of the ASCII restriction. Without this restriction, users and systems will have to be quite careful to ensure transfers preserve the LDIF data octet-for-octet.

[snip]

I'm broadly in agreement with Michael that extended LDIF and LDAP clients
face much the same issues, but there is one way in which they differ. If
I use a client to edit an attribute value, then the scope of invisible,
inadvertent changes is probably that one value (which I'm changing anyway),
or if it is a poor client, then maybe the whole attribute or whole entry.
If I carelessly edit or transmit an extended LDIF dump, then the scope of
inadvertent changes is the entire collection of entries in the dump.

As a way to mitigate wholesale inadvertent changes I suggest that the
extended format should allow each attribute value with non-ASCII characters
that is presented in the clear to be optionally followed by the same value
in the base64 encoding. The format would be designed so that the following
base64 encoding is clearly paired with the value in the clear rather than it
being the next attribute value that happens to be base64 encoded.
The paired base64 encoding, if present, will always take precedence over
the value in the clear. That is, on parsing, the base64 encoding is used and
the value in the clear is ignored.
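
To make the precedence rule concrete, here is a rough Python sketch of how
a parser might treat such pairs. The pairing syntax is purely an assumption
for illustration: I have invented a marker line starting with "=:: " that
follows the value in the clear, and the names PAIR_MARKER and parse_values
are hypothetical. The sketch also ignores line folding, comments and entry
separators, so it is not a real LDIF parser.

```python
import base64

# Hypothetical pairing syntax, invented for this sketch only.
PAIR_MARKER = "=:: "


def parse_values(lines):
    """Parse simplified LDIF-like lines into (attribute, value_bytes) tuples."""
    values = []
    can_pair = False  # True only directly after a value given in the clear
    for line in lines:
        if line.startswith(PAIR_MARKER):
            if not can_pair:
                raise ValueError("paired base64 without a preceding clear value")
            attr, _ignored_clear_value = values[-1]
            # The paired base64 encoding takes precedence over the clear value.
            values[-1] = (attr, base64.b64decode(line[len(PAIR_MARKER):].strip()))
            can_pair = False
            continue
        attr, sep, rest = line.partition(":")
        if not sep:
            raise ValueError(f"not an attribute line: {line!r}")
        if rest.startswith(":"):
            # Ordinary base64 value ("attr:: ..."), as in existing LDIF.
            values.append((attr, base64.b64decode(rest[1:].strip())))
            can_pair = False
        else:
            # Value in the clear; a paired base64 line may follow it.
            values.append((attr, rest.lstrip(" ").encode("utf-8")))
            can_pair = True
    return values


name = "Michael Ströder"
encoded = base64.b64encode(name.encode("utf-8")).decode("ascii")
sample = [
    "cn: Michael Str?der",   # clear value damaged in transit
    PAIR_MARKER + encoded,   # paired base64, still intact
    "o: Example",            # plain ASCII value, no pair needed
]
parsed = parse_values(sample)
```

Because the marker is only accepted directly after a value in the clear, it
cannot be confused with the next attribute value that happens to be base64
encoded.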

Normal practice for directory servers when outputting non-ASCII values
in the clear would be to pair them up with their base64 encoding.
Anyone manually editing the dump file has to remember to remove the base64
encoding part for a change to the value in the clear to be accepted,
which is a minor inconvenience. However, any inadvertent changes to other
values in the clear won't have any effect because their base64 encoded
parts will still be there, and the extended LDIF format will be just as
robust as normal LDIF for transport around the network.
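
As a rough illustration of the output side, the following Python sketch
emits a value in the clear and, when it contains non-ASCII characters,
follows it with the paired base64 encoding. As before, the "=:: " marker
and the names PAIR_MARKER and emit_value are assumptions of mine and not
part of any specification. The sketch ignores line folding and the other
cases in which existing LDIF already requires base64 (for example values
with a leading space or colon).

```python
import base64

# Hypothetical pairing syntax, invented for this sketch only.
PAIR_MARKER = "=:: "


def emit_value(attr, value):
    """Return the output lines for one attribute value given as a str."""
    lines = [f"{attr}: {value}"]
    if not value.isascii():
        # Pair the clear value with its base64 encoding of the UTF-8 octets.
        encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
        lines.append(PAIR_MARKER + encoded)
    return lines


lines = emit_value("cn", "Michael Ströder")

# Simulate a careless transfer that replaces every non-ASCII character.
damaged = [l.encode("ascii", "replace").decode("ascii") for l in lines]

# The paired line is pure ASCII, so the original value can still be recovered.
recovered = base64.b64decode(damaged[1][len(PAIR_MARKER):]).decode("utf-8")
```

The clear line is mangled by the simulated transfer, but the paired line
survives unchanged, which is the robustness property I am after.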

When composing data from non-directory data sources to import into a
directory, the values can still be conveniently presented in the clear
without needing to generate a base64 encoding for them. When extracting data
from the directory to feed into other systems the values in the clear
can be used and the base64 encoded parts ignored.

Comments, anyone?

Regards,
Steven
_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext