Re: [ldapext] UTF-8 full support in LDIF / LDIF v2

On Wed, Jun 3, 2009 at 8:44 PM, Kurt Zeilenga <Kurt.Zeilenga@isode.com> wrote:

On Jun 3, 2009, at 9:10 AM, Yves Dorfsman wrote:

Is the idea of a here document syntax too ridiculous ?

There are a number of problems with it. Personally, I think what Steven already offered (and likely implemented) is better, though I am concerned about line separators. As Howard's comments kind of suggest, when you have a value which is multi-lined, it's the syntax that controls what line separators are used, not the LDIF. For instance, in some syntaxes, a $ is used as a line separator.
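As an illustration of a syntax that defines its own line separator, here is a minimal sketch for the Postal Address syntax, assuming the RFC 4517 rules that lines are separated by "$" and that a literal "$" or backslash inside a line is escaped as \24 or \5C. The function name postal_address_lines is hypothetical.

```python
import re


def postal_address_lines(value: str) -> list:
    """Split a Postal Address value into its logical lines.

    Assumes RFC 4517 conventions: "$" separates lines, and "\\24" / "\\5C"
    stand for a literal "$" / backslash inside a line.
    """
    def unescape(line: str) -> str:
        # Replace each escape in a single pass so decoded characters
        # are never re-interpreted as part of another escape.
        return re.sub(r"\\(24|5[Cc])",
                      lambda m: chr(int(m.group(1), 16)),
                      line)

    return [unescape(line) for line in value.split("$")]


print(postal_address_lines("1234 Main St.$Anytown, CA 12345$USA"))
```

The value itself contains no LF or CR at all, so the line structure survives regardless of how the LDIF file's own line endings are written.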

The problem with your proposal, and Steven's, is that LDIF line separators and value line separators are one and the same thing. While that might occasionally be the case, it cannot be expected to be generally the case.

LDIF is first and foremost an interchange format. Conversion from LDAP PDU->LDIF Record->LDAP PDU MUST produce as output the input, octet for octet for every "data" component (the DN, every attribute description and associated values, etc.).
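The round-trip requirement can be sketched as follows. This is an illustration only: it assumes the RFC 2849 rule that a value is written in the clear only when it is a SAFE-STRING and is base64-encoded otherwise, it ignores line folding, and the helper names (is_safe_string, encode_attrval, decode_attrval) are hypothetical.

```python
import base64
from typing import Tuple

UNSAFE_ANYWHERE = {0x00, 0x0A, 0x0D}                  # NUL, LF, CR
UNSAFE_FIRST = UNSAFE_ANYWHERE | {0x20, 0x3A, 0x3C}   # plus SPACE, ":", "<"


def is_safe_string(value: bytes) -> bool:
    """True if the octets may appear unencoded after 'attr: '."""
    if not value:
        return True
    if value[0] in UNSAFE_FIRST or value[-1] == 0x20:
        return False
    return all(b < 0x80 and b not in UNSAFE_ANYWHERE for b in value)


def encode_attrval(attr: str, value: bytes) -> str:
    """Render one attribute value as a single (unfolded) LDIF line."""
    if is_safe_string(value):
        return f"{attr}: {value.decode('ascii')}"
    return f"{attr}:: {base64.b64encode(value).decode('ascii')}"


def decode_attrval(line: str) -> Tuple[str, bytes]:
    """Recover the attribute description and the original octets."""
    attr, _, rest = line.partition(":")
    if rest.startswith(":"):                          # 'attr:: <base64>'
        return attr, base64.b64decode(rest[1:].strip())
    return attr, rest.lstrip(" ").encode("ascii")     # 'attr: <safe string>'


for original in (b"Jensen", b"The two line\ncompany", "Ren\u00e9".encode("utf-8")):
    line = encode_attrval("o", original)
    assert decode_attrval(line) == ("o", original)    # octet-for-octet
```

Because anything containing LF, CR or non-ASCII octets goes through base64, the value's own line separators never meet the file's line separators.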

Is UTF-8 support in LDIF not that important ?

LDIF being a proper interchange format is important. UTF-8 support (other than being able to interchange values whose syntax is UTF-8 encoded) is cosmetic.

Adding UTF-8 support does not appear to be in support of improving LDIF as a proper interchange format. It seems to be driven by other goals, such as trying to make LDIF files displayable. Given that LDAP does not constrain attribute value syntaxes (even directory strings can contain arbitrary sequences of Unicode code points), the goal of making LDIF files displayable is not terribly feasible.

I note that even today, ASCII LDIF files might not display properly without special handling, such as for line separators. But with UTF-8, line separators are only the tip of the iceberg of display problems.

I'm not convinced that removing the ASCII restrictions will be a good thing. Not only do I doubt it will have a net positive effect on the displayability of LDIF for those who have a displayability goal (I don't have this goal), I think it will have a net negative impact on interoperability and increase user confusion, such as when the user creates a file using one Unicode normalization form but is trying to set values which require a different Unicode normalization form.
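The normalization concern can be made concrete with a small sketch using Python's standard unicodedata module. The sample name is made up; the point is only that two strings that display identically can differ octet for octet once UTF-8 encoded.

```python
import unicodedata

name = "Ren\u00e9"                                   # 'é' as one precomposed code point

nfc = unicodedata.normalize("NFC", name).encode("utf-8")
nfd = unicodedata.normalize("NFD", name).encode("utf-8")

print(nfc)   # e-acute encoded as a single code point (2 octets)
print(nfd)   # 'e' followed by a combining acute accent (3 octets)

# Same rendering on screen, different octets on the wire.
assert nfc != nfd
assert unicodedata.normalize("NFC", nfd.decode("utf-8")).encode("utf-8") == nfc
```

An editor that silently saves the file in the other form changes the value without any visible difference, which base64 encoding would have prevented.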

Am I the only one thinking xml is not a good replacement for LDIF,

There already exist a number of XML replacements for LDIF, such as DSML... so I guess at least some do think XML is a good replacement for LDIF.

if so, should we help Steven with the xmled RFC ?

What Steven and Andrew have done is define an extension for LDIF to allow XML values to be represented in a human-readable format instead of requiring the use of base64. Unfortunately their proposal has interchange issues (see the I-D's security considerations section). This, I think, is a fatal problem with this extension.

-- Kurt

Thanks.

Yves Dorfsman wrote:

Steven Legg wrote:

See http://www.xmled.info/drafts/draft-sciberras-xed-eldif-05.txt

I did look at it, personally I find it difficult for humans, for diff'ing etc... XML has its place, but so does pure text.

Yes, I was wondering about that: do we need multi-line values as a workaround because schemas aren't precise enough?

No, we need them because sheets of paper, computer screens and RFCs are
not infinitely wide. :-) Human-readability, line breaks and indenting tend
to go hand-in-hand.

I've been thinking about this and trying a few things. My conclusion is that the best solution would be the good old here document.
objectclass: inetOrgPerson
organizationName:<<EOT
The two line
company
EOT
sn: Jensen
With the following specifications:
Any of the following characters (or sequence in the case of CR+LF) can be used as a separator (<SEP>):
LF (U+000A), CR (U+000D), CR+LF (U+000D followed by U+000A), NEL (U+0085), FF (U+000C), LS (U+2028), PS (U+2029)
Any sequence of characters can be used instead of EOT, but it cannot include a separator character. The same sequence has to be used at the beginning and the end.
Any UTF-8 character, except separators, can be used on each line.
Any separator can be used to separate the lines.
The text starts after EOT<SEP>, and finishes with the last character before <SEP>EOT. The organization name in the example above is exactly two lines; the last separator is not part of the text.
No need or possibility to escape characters, no possibility of folding lines.

--
Yves.
http://www.sollers.ca/

_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext