
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2




Kurt,

Kurt Zeilenga wrote:
On Jun 18, 2009, at 6:19 PM, Steven Legg wrote:

The potential for an inadvertent change of normalization in the LDIFv2 if
it is edited doesn't overly concern me. Stringprep takes care of it for
matching purposes

Not for userPassword and the like.

The extended format in the ELDIF specification in its current form can alter
end-of-line characters so I only use it for syntaxes where I know this is
harmless, which basically means XED syntaxes that are known to contain only XML
documents. Since the Octet String syntax doesn't fit this criterion, the
userPassword attribute never uses the extended format. If we generalize the
extended format to allow "here" documents with the unmodified literal LDAP
attribute value, then I would expect the extended format to be limited to
syntaxes that are known to produce exclusively UTF-8 character strings,
which would continue to exclude userPassword.

When I dump userPassword values they are encrypted, so even if the contents
of the octet string were UTF-8 to start with they probably aren't after
the encryption is done with it.


Not for value syntaxes that require a specific normalization to be applied and otherwise result in a syntax error.

I'm not aware of any such syntax.


And, if end-of-line characters appearing in values are not required to be base64'ed or otherwise escaped, there will be inadvertent changes of end-of-line characters to deal with.

It's best to be tolerant of such variations anyway since editing by
ordinary LDAP clients could create such inadvertent changes.
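
To illustrate what being tolerant of end-of-line variations could look like on the consuming side, here is a minimal Python sketch. The helper names (normalize_eol, same_ignoring_eol) are hypothetical and are not taken from the ELDIF specification or any LDAP library; the sketch simply treats CRLF, lone CR and LF as equivalent when comparing two text values.

```python
def normalize_eol(value: str) -> str:
    """Map CRLF and lone CR to LF (hypothetical helper, for illustration)."""
    # Replace CRLF first so that it is not turned into two line feeds.
    return value.replace("\r\n", "\n").replace("\r", "\n")


def same_ignoring_eol(a: str, b: str) -> bool:
    """Compare two text values, ignoring differences in line endings."""
    return normalize_eol(a) == normalize_eol(b)


# A value edited on a system that uses CRLF still compares equal.
print(same_ignoring_eol("line one\r\nline two", "line one\nline two"))
```

This is only suitable for values known to be text; applying it to arbitrary octet strings would alter them.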


LDIFv1 avoided such problems by limiting the characters in values that could appear without being base64'ed to a subset of the ASCII characters. These issues haven't gone away since the introduction of LDIFv1.
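
For readers less familiar with the LDIFv1 rule referred to here, the following Python sketch approximates the SAFE-STRING test from RFC 2849: a value may appear unencoded only if every octet is ASCII other than NUL, LF and CR, and it does not start with a space, colon or "<". The trailing-space check reflects the RFC's recommendation rather than its grammar. The function names are hypothetical, and line folding is deliberately left out.

```python
import base64

# Octets that may not appear anywhere in an unencoded LDIFv1 value.
_UNSAFE_ANYWHERE = (0x00, 0x0A, 0x0D)
# Additional octets that may not start an unencoded value: SPACE, ':', '<'.
_UNSAFE_FIRST = (0x20, 0x3A, 0x3C)


def is_safe_string(value: bytes) -> bool:
    """Approximate the RFC 2849 SAFE-STRING check (illustrative only)."""
    if not value:
        return True
    if value[0] in _UNSAFE_FIRST:
        return False
    if value[-1] == 0x20:
        # RFC 2849 says values ending in SPACE SHOULD be base64-encoded.
        return False
    return all(b <= 0x7F and b not in _UNSAFE_ANYWHERE for b in value)


def ldif_line(attr: str, value: bytes) -> str:
    """Emit 'attr: value' when safe, otherwise 'attr:: <base64>'."""
    if is_safe_string(value):
        return f"{attr}: {value.decode('ascii')}"
    return f"{attr}:: {base64.b64encode(value).decode('ascii')}"


print(ldif_line("cn", b"Steven Legg"))
print(ldif_line("description", b"line one\nline two"))
```

Under this rule a value containing a line break or any non-ASCII octet is always base64-encoded, which is why editing an LDIFv1 file cannot silently change its end-of-line characters or normalization.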

and any client that expects attribute values to be in,
or remain in, a particular normalization form is asking for trouble.

If a technical specification says an attribute value is to be in a particular Unicode normalization form, then all clients supporting that technical specification need to provide the values of that attribute in that Unicode normalization form.

I don't know of any such specification for an existing syntax that produces
exclusively UTF-8 encodings. It would be most unwise for such a requirement
to be placed on the use of an existing syntax (e.g., Directory String)
because of the installed base of software that just wouldn't honour the
requirement. If it's a new syntax, then it wouldn't be known to existing
ELDIF implementations so attributes of the syntax wouldn't use the
extended format. Depending on the details, when I got around to implementing
it (i.e., making it known) I might explicitly exclude it from using the
extended format and/or have my ELDIF parser renormalize the values it is
importing.


The values could be modified by some other client that changes the normalization
during editing and I wouldn't count on every directory implementation
preserving the exact character sequence it is given (though mine does).

If the normalization is specified as part of the LDAP syntax for the attribute value syntax, it follows that there would be a requirement for directory servers to preserve that normalization. Or the value might be stored in an octet string (like userPassword) and the server required to preserve the octets and hence the normalization.

Putting it in an octet string would protect it from using the extended format,
as I envisage its applicability.

Regards,
Steven


If a client needs the values to be in a particular normalization form it
should do the conversion itself.
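
As a rough illustration of a client doing the conversion itself, the Python sketch below uses the standard unicodedata module to bring a value it has read into the normalization form it needs. The choice of NFC and the helper name to_form are assumptions made for the example; nothing in this thread prescribes a particular form.

```python
import unicodedata


def to_form(value: str, form: str = "NFC") -> str:
    """Return value in the requested Unicode normalization form."""
    return unicodedata.normalize(form, value)


# The same visible character, stored two different ways.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# The raw values differ, so a byte-for-byte comparison would fail.
print(composed == decomposed)

# After the client normalizes what it read, the values agree.
print(to_form(composed) == to_form(decomposed))
```

A client that normalizes on read in this way does not depend on the server, or on any intermediate editor, having preserved the original character sequence.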

We already have one standard attribute, userPassword, where values (when text) SHOULD be provided in a particular Unicode normalization.

-- Kurt