
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2

To: Yves Dorfsman <yves@zioup.com>
Subject: Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
From: Kurt Zeilenga <Kurt.Zeilenga@Isode.com>
Date: Thu, 11 Jun 2009 20:54:10 -0700
Cc: ldapext@ietf.org
Delivered-to: ldapext@core3.amsl.com
In-reply-to: <4A309775.3080406@zioup.com>
References: <49C497F9.7010200@zioup.com> <CD3905D4-2A25-4C56-8187-3CE10D46C929@isode.com> <49C870C6.4010803@zioup.com> <E94B7389-9A6D-4CB6-BB2C-649CCD3FD15B@Isode.com> <49CB192E.5050105@zioup.com> <49CB211C.6070108@eb2bcom.com> <49CB87FE.1050809@zioup.com> <49CC01DE.6040506@eb2bcom.com> <4A24557D.7030006@zioup.com> <4A26A05D.8040105@zioup.com> <245BF18B-2066-4E36-9502-16F4A3140D9E@Isode.com> <4A309775.3080406@zioup.com>


On Jun 10, 2009, at 10:34 PM, Yves Dorfsman wrote:

Kurt Zeilenga wrote:
There are a number of problems with it. Personally, I think what Steven already offered (and likely implemented) is better, though I am
My problem with Steven's solution is that it is half LDIF, half XML. As I have mentioned earlier, I think XML has its place, and maybe DSML should be fixed or re-invented, but for other applications, I find the simplicity of LDIF an advantage ; unfortunately, having to base64 encode anything that's not 7 bit ASCII takes away some of its simplicity.
concerned about line separators. As Howard's comments kind of suggest, when you have a value which is multi-lined,
I have never run into the situation where I needed a multi-line value in an LDAP directory and was surprised by the need, but Steven brought this up earlier in the thread and said that he has a real-world need for it, and that the lack of a syntax for it in my proposition for an updated LDIF format was an issue.

The problem is that what line separators to use is syntax specific (or possibly attribute value convention specific). For instance, an LDAP syntax could say multiple lines are to be separated by a particular set of code points (such as '$') or it could simply be a convention that an attribute uses a particular set of characters.

To convert an LDIF-specific line separator to an attribute value line separator requires not only knowledge of the LDAP schema, but knowledge of the attribute value conventions not expressed in the LDAP schema.

LDIF, however, was designed to allow a mechanical conversion of LDIF to LDAP PDUs without such knowledge. Requiring implementations to have additional knowledge is quite problematic.

The problem with your proposal, and Steven's, is that LDIF line separators and value line separators are one and the same thing. While that might be the case occasionally, it cannot be expected to be generally the case.
On the contrary, both Steven's solution and mine separate the lines but do not impose a line separator. Steven delimits his lines with the <item></item> syntax,

I think you confuse XML elements with line separators in XML data. Two very different things.

Steven's proposal represents line separators in the XML data using the <SEP> production.

while I let the user choose any line separator out of the half dozen that have been used throughout the history of computing.

The problem here is how line separators in LDIF relate to line separators in the value.

Your approach assumes that whatever line separator the user chooses to use in the LDIF file is valid per the LDAP value syntax and any attribute type specific restrictions.

Our syntaxes are clear enough to let the import process know that those are separate lines, and the import process or the LDAP server can choose whichever line separator it wants.

That requires knowledge of the LDAP schema and of attribute type specific restrictions.

Making the line separator part of the data will create cross-platform issues.


Yes, but this is what your proposal seems to do.  (See below).

The LDAP server or actually the LDAP client should choose which line separator to use for its context/platform.


Today, LDIF line separators (<SEP>) are not part of the LDAP value.

That is,

foo: X
 Y

is equivalent to:

foo: XY

That is, the LDAP value is merely wrapped over multiple LDIF lines.
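
The folding rule just described can be sketched in a few lines of Python. This is only an illustration of the RFC 2849 behaviour (a line starting with a single space continues the previous line, and the separator plus that space are dropped); the function name unfold_ldif is made up here, and comments, record boundaries, and base64 values are ignored.

```python
import re


def unfold_ldif(text: str) -> list[str]:
    """Join folded LDIF lines into logical lines.

    A physical line that starts with one space is a continuation of the
    previous line; the line separator and that single space are discarded,
    so neither becomes part of the attribute value.
    """
    # LDIF separators are LF or CR LF.
    physical = re.split(r"\r?\n", text)
    if physical and physical[-1] == "":
        physical.pop()  # drop the empty piece after a trailing separator

    logical: list[str] = []
    for line in physical:
        if line.startswith(" ") and logical:
            logical[-1] += line[1:]  # strip exactly one leading space
        else:
            logical.append(line)
    return logical


print(unfold_ldif("foo: X\n Y\n"))
print(unfold_ldif("foo: X\r\n Y\r\n"))
```

Both calls yield the same single logical line, "foo: XY", whichever separator the file happened to use.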

Now maybe you meant:

foo:<<EOT
X
Y
EOT

to also be equivalent to:

foo: XY

I don't see any value in offering yet another way to line wrap an LDAP value.

I took your proposal as representing foo attribute value "X<SEP>Y" where <SEP> was the sequence of characters used in the LDIF to separate X and Y. This is problematic.
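
A small Python sketch of why that reading is problematic: if the file's own separators become part of the value, the same logical LDIF produces different LDAP values depending on whether the file uses LF or CR LF. The here-document syntax is the proposal under discussion, not part of RFC 2849, and the helper heredoc_value is hypothetical and handles only one such block.

```python
import re


def heredoc_value(ldif_bytes: bytes) -> bytes:
    """Extract the value of one 'attr:<<EOT ... EOT' block, keeping the
    file's own line separators between the value lines."""
    # Split on LF / CR LF but keep the separators, so we can see exactly
    # which bytes would end up inside the value.
    parts = re.split(rb"(\r?\n)", ldif_bytes)
    lines = parts[0::2]  # line contents
    seps = parts[1::2]   # seps[i] is the separator that follows lines[i]

    start = next(i for i, l in enumerate(lines) if l.endswith(b":<<EOT"))
    end = lines.index(b"EOT", start + 1)

    value = b""
    for i in range(start + 1, end):
        value += lines[i]
        if i < end - 1:
            value += seps[i]  # the LDIF separator leaks into the value
    return value


unix_file = b"foo:<<EOT\nX\nY\nEOT\n"
dos_file = b"foo:<<EOT\r\nX\r\nY\r\nEOT\r\n"

print(heredoc_value(unix_file))
print(heredoc_value(dos_file))
print(heredoc_value(unix_file) == heredoc_value(dos_file))
```

The two files differ only in their line endings, yet the extracted values are not equal, and neither is guaranteed to be what the attribute's syntax or conventions expect.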

Adding UTF-8 support does not appear to be in support of improving LDIF as a proper interchange format. It seems to be driven by other goals, such as trying to make LDIF files displayable.
Yes and no. My main reason for pushing this is diffing.

Diffing requires knowledge of LDAP schema. One might store "foo" and get back "FOO" (or any other equivalent value) [See LDAP's data preservation requirements].
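
As a rough Python illustration of the point: a byte-for-byte comparison reports "foo" and "FOO" as different, while a comparison that knows the attribute is matched case-insensitively does not. The set CASE_IGNORE_ATTRS and the function values_equivalent are assumptions made up for this sketch; a real tool would read the matching rules from the server's schema, and real matching rules do more than fold case.

```python
# Assumption for this sketch: attributes whose equality matching ignores
# case. A real tool would derive this from the LDAP schema.
CASE_IGNORE_ATTRS = {"cn", "mail"}


def values_equivalent(attr: str, a: str, b: str) -> bool:
    """Compare two values the way a (very simplified) schema-aware diff would."""
    if attr.lower() in CASE_IGNORE_ATTRS:
        return a.casefold() == b.casefold()
    return a == b


stored, returned = "foo", "FOO"

print(stored == returned)                          # naive, schema-unaware diff
print(values_equivalent("cn", stored, returned))   # case-ignore attribute
print(values_equivalent("userPassword", stored, returned))  # exact match
```

A schema-unaware diff of two LDIF exports would flag the first case as a change even though the directory considers the values equal.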

You run into a problem and you want to diff the original and the problematic LDIF export of your directory. Having half of your LDIF file base64 encoded makes it a lot more difficult to pinpoint the problem.

As Michael noted, there exist LDIF diffing tools (most of which are likely not schema aware, and hence show equivalent values as being different).

If you are right, that LDIF is purely for exchanging information between applications, never to be looked at by humans, then why is the current version so human friendly ?


I never claimed LDIF would not be looked at by humans.

I have stated that LDIF, with ASCII restrictions, already suffers from some display/editing issues, namely due to line separator issues. Lifting the ASCII restriction will make these matters far worse. (Line separators are the tip of the displayability/editability iceberg.)

I'm not convinced that removing the ASCII restrictions will be a good thing. Not only do I doubt it will have a net positive effect on displayability of LDIF for those who have a displayability goal (I don't share this goal), I think it will have a net negative impact on interoperability and user confusion, such as when the user creates a file using one Unicode normalization algorithm, but is trying to set values which require a different Unicode normalization value.
How so ?
In the current version, you have to encode your Unicode to UTF-8, and then encode it again to base64. With my proposal, you would get the exact same UTF-8 strings as you do today, but they would not be (or would not have to be) encoded in base64.
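
To make the two points concrete, here is a hedged Python sketch of today's encoding step and of the normalization concern. The function ldif_line is invented for illustration and applies only a simplified version of the RFC 2849 safe-string test (ASCII only, no NUL/CR/LF, no leading space, colon or '<', no trailing space); anything else is written as base64 after "::".

```python
import base64
import unicodedata


def ldif_line(attr: str, value: str) -> str:
    """Render one attribute value roughly the way RFC 2849 requires."""
    data = value.encode("utf-8")
    safe = (
        all(b < 128 and b not in (0, 10, 13) for b in data)
        and not data.startswith((b" ", b":", b"<"))
        and not data.endswith(b" ")
    )
    if safe:
        return f"{attr}: {value}"
    # Non-ASCII (or otherwise unsafe) values: UTF-8 first, then base64.
    return f"{attr}:: {base64.b64encode(data).decode('ascii')}"


# The same visible character in two Unicode normalization forms.
composed = unicodedata.normalize("NFC", "e\u0301")    # one code point
decomposed = unicodedata.normalize("NFD", "\u00e9")   # 'e' + combining accent

print(ldif_line("cn", "plain"))
print(ldif_line("cn", composed))
print(ldif_line("cn", decomposed))
print(composed.encode("utf-8") == decomposed.encode("utf-8"))
```

The two non-ASCII values look identical on screen but are different byte strings, with or without base64, which is the kind of mismatch an editor could introduce without the user noticing.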


I see two kinds of problems.

1) This would result in LDIF files which programs designed to display UTF-8 encoded Unicode text will not be able to display. There is a user expectation that LDIF files be displayable. With the current LDIF format, we do have some display issues (e.g., line separators), but they are limited. If we remove the ASCII restrictions, we'll run into a wide range of display issues.

2) Today we have some separation between (non-ASCII) Unicode LDAP attribute values and their LDIF representation. This separation, I think, has some value in that it instills the idea that LDIF syntactic requirements and LDAP attribute value syntax requirements are independent of each other. Removing this separation, I think, will lead to user confusion.

This is not a rebuttal of your argument, I am truly interested in understanding what you mean here (in the same way I was glad somebody brought up the issue of Right To Left characters, as I had not thought about it). Maybe it is a problem that we can address ?

Keep what separation we do have between LDIF representation of an attribute value and the LDAP syntax for the attribute value.

if so, should we help Steven with the xmled RFC ?
What Steven and Andrew have done is define an extension for LDIF to allow XML values to be represented in a human-readable format instead of requiring the use of base64. Unfortunately his proposal has interchange issues (see the I-D's security considerations section). This, I think, is a fatal problem with this extension.
So really this is the issue, should the value of the line separator be part of the data, or should everybody (LDIF importers/exporters, LDAP servers, LDAP clients) treat multi-line entries as just that, several lines, and choose their own line separator ? (in case I wasn't clear earlier, I am in favour of the latter).


In RFC 2849, <SEP> is never part of the LDAP attribute value.

In the ELDIF extension, certain <SEP>s are part of the LDAP attribute value. This, I think, is problematic.


--
Yves.
http://www.sollers.ca/

_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext


