
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2




Kurt,

Kurt Zeilenga wrote:

On Jun 3, 2009, at 9:10 AM, Yves Dorfsman wrote:
Is the idea of a here document syntax too ridiculous ?

There are a number of problems with it. Personally, I think what Steven already offered (and likely implemented)

I have.

is better, though I am concerned about line separators.

Me too, but at least it doesn't matter for XML.

As Howard's comments kind of suggest, when you have a value which is multi-lined, it's the syntax that controls what line separators are used, not the LDIF. For instance, in some syntaxes, a $ is used as a line separator.
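
To illustrate the point about syntax-defined separators, here is a minimal sketch using the Postal Address syntax from RFC 4517, where lines are separated by "$" and a literal "$" or "\" inside a line is written as \24 or \5C. The function name decode_postal_address is made up for this example, and the sample values are illustrative only.

```python
import re

def decode_postal_address(value):
    """Split a Postal Address value into its lines (hypothetical helper).

    The "$" separator belongs to the attribute syntax, not to LDIF, so the
    LDIF layer never sees these "line breaks" at all.
    """
    def unescape(line):
        # \24 stands for a literal "$", \5C (or \5c) for a literal "\".
        return re.sub(
            r"\\(24|5[Cc])",
            lambda m: "$" if m.group(1) == "24" else "\\",
            line,
        )
    return [unescape(line) for line in value.split("$")]

address = decode_postal_address("1234 Main St.$Anytown, CA 12345$USA")
escaped = decode_postal_address("\\241,000,000 Sweepstakes$PO Box 1000000")
```

Here the value travels as a single LDIF line; only an application that knows the syntax turns it into three display lines.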

The problem with your proposal, and Steven's, is that they treat LDIF line separators and value line separators as one and the same thing. While that might be the case occasionally, it cannot be expected to be generally the case.

LDIF is first and foremost an interchange format. Conversion from LDAP PDU->LDIF Record->LDAP PDU MUST produce as output the input, octet for octet for every "data" component (the DN, every attribute description and associated values, etc.).
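
A minimal sketch of the round-trip requirement, assuming the LDIFv1 base64 form ("attr:: value") and ignoring line folding. The helper names and the attribute name xmlValue are invented for the example. It shows that base64 preserves the octets, while any path that rewrites line endings does not.

```python
import base64

def to_ldif_line(attr, value):
    """Encode raw octets as an LDIFv1 base64 line (no folding in this sketch)."""
    return f"{attr}:: {base64.b64encode(value).decode('ascii')}"

def from_ldif_line(line):
    """Decode a base64 LDIF line back into (attribute, octets)."""
    attr, _, encoded = line.partition(":: ")
    return attr, base64.b64decode(encoded)

# A multi-line value whose own line separator happens to be CR+LF.
original = b"<a>\r\n  <b/>\r\n</a>"

attr, restored = from_ldif_line(to_ldif_line("xmlValue", original))

# What a line-oriented text path might do: quietly rewrite CR+LF to LF.
rewritten = original.replace(b"\r\n", b"\n")

round_trip_ok = restored == original      # base64 path keeps every octet
text_path_ok = rewritten == original      # the rewritten value differs
```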

That's highly desirable for directory to directory interchange, but LDIF
is also used for composing data from various data sources to put in a
directory and to extract data from a directory to put in other data
sources. The octet-for-octet preservation usually doesn't apply in these
other cases and the need to turn line-based data such as XML documents
into base64 encodings is a serious impediment, hence the reason Andrew
and I wrote the Internet-draft.


Is UTF-8 support in LDIF not that important ?

LDIF being a proper interchange format is important. UTF-8 support (other than being able to interchange values whose syntax is UTF-8 encoded) is cosmetic.

Adding UTF-8 support does not appear to be in support of improving LDIF as a proper interchange format. It seems to be driven by other goals, such as trying to make LDIF files displayable. Given that LDAP does not constrain attribute value syntaxes (even directory strings can contain arbitrary sequences of Unicode code points), the goal of making LDIF files displayable is not terribly feasible.

I note that even today, ASCII LDIF files might not display properly without special handling, such as for line separators. But with UTF-8, line separators are only the tip of the iceberg of display problems.

I'm not convinced that removing the ASCII restrictions will be a good thing. Not only do I doubt it will have a net positive on displayability of LDIF for those who have a displayability goal (I don't have this goal), I think it will have a net negative impact on interoperability and user confusion, such as when the user creates a file using one Unicode normalization form, but is trying to set values which require a different Unicode normalization form.
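
A small sketch of the normalization concern, using Python's standard unicodedata module. The sample name is an arbitrary choice for illustration. The same visible string yields different octets under NFC and NFD, and the base64 forms differ as well, so the difference is invisible on screen but real on the wire.

```python
import base64
import unicodedata

name = "Andr\u00e9"  # "André" typed with a precomposed e-acute

# Two editors or tools may save the "same" text in different forms.
nfc = unicodedata.normalize("NFC", name).encode("utf-8")
nfd = unicodedata.normalize("NFD", name).encode("utf-8")

same_on_screen = nfc.decode("utf-8") != nfd.decode("utf-8")  # code points differ
same_octets = nfc == nfd                                     # octets differ too

# Base64 does not hide the difference; it only moves it out of sight.
nfc_b64 = base64.b64encode(nfc).decode("ascii")
nfd_b64 = base64.b64encode(nfd).decode("ascii")
```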

The user is not going to directly enter a base64 encoded value. They
would use a tool that has those same normalization issues to produce
a UTF-8 character stream that has to be passed to another tool to turn
into base64. The issues exist anyway. It is just a question of where.

BTW, it is not my intent to replace LDIFv1 for pure directory to directory
interchange. I just need the option to produce and consume something more
amenable to human editing and batch processing, where appropriate.

Regards,
Steven


Am I the only one thinking xml is not a good replacement for LDIF,

There already exist a number of XML replacements of LDIF, such as DSML... so I guess at least some do think XML is a good replacement for LDIF.

if so, should we help Steven with the xmled RFC ?

What Steven and Andrew have done is define an extension for LDIF to allow XML values to be represented in a human-readable format instead of requiring the use of base64. Unfortunately his proposal has interchange issues (see the I-D's security considerations section). This, I think, is a fatal problem with this extension.

-- Kurt




Thanks.


Yves Dorfsman wrote:
Steven Legg wrote:
See http://www.xmled.info/drafts/draft-sciberras-xed-eldif-05.txt
I did look at it, personally I find it difficult for humans, for diff'ing etc... XML has its place, but so does pure text.
Yes, I was wondering about that: do we need multi-line values as a workaround because schemas aren't precise enough ?

No, we need them because sheets of paper, computer screens and RFCs are
not infinitely wide. :-) Human-readability, line breaks and indenting tend
to go hand-in-hand.
I've been thinking about this and trying a few things. My conclusion is that the best solution would be the good old here document.
objectclass: inetOrgPerson
organizationName:<<EOT
The two line
 company
EOT
sn: Jensen
With the following specifications:
Any of the following characters (or sequence, in the case of CR+LF) can be used as a separator (<SEP>): LF (U+000A), CR (U+000D), CR+LF (U+000D followed by U+000A), NEL (U+0085), FF (U+000C), LS (U+2028), PS (U+2029).
Any sequence of characters can be used instead of EOT, but it cannot include a separator character. The same sequence has to be used at the beginning and the end.
Any UTF-8 character, except separators, can be used on each line.
Any separator can be used to separate the lines.
The text starts after EOT<SEP> and finishes with the last character before <SEP>EOT. The organization name in the example above is exactly two lines; the last separator is not part of the text. There is no need or possibility to escape characters, and no possibility of folding lines.
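
The rules above can be sketched as a small parser. This is only one possible reading of the proposal, not an implementation of any draft: the function name parse_heredoc_ldif is invented, ordinary "attr: value" lines are handled in the simplest way (no base64, no folding), and keeping each inner separator exactly as it appeared in the input is an assumption the proposal does not spell out.

```python
import re

# Separators from the proposal; CR+LF is listed first so it matches as one unit.
SEP = re.compile("(\r\n|[\n\r\x0c\x85\u2028\u2029])")

def parse_heredoc_ldif(text):
    """Return (attribute, value) pairs from a toy LDIF with here documents."""
    parts = SEP.split(text)
    lines, seps = parts[0::2], parts[1::2]   # seps[k] follows lines[k]
    pairs, i = [], 0
    while i < len(lines):
        line = lines[i]
        if not line:                          # skip blank lines in this sketch
            i += 1
            continue
        attr, _, rest = line.partition(":")
        if rest.startswith("<<"):
            tag = rest[2:]                    # e.g. "EOT"
            end = i + 1
            while end < len(lines) and lines[end] != tag:
                end += 1
            if end == len(lines):
                raise ValueError(f"unterminated here document for {attr!r}")
            # Join the body lines with their original separators; the
            # separator just before the closing tag is not part of the value.
            value = "".join(
                lines[k] + (seps[k] if k < end - 1 else "")
                for k in range(i + 1, end)
            )
            i = end + 1
        else:
            value = rest.lstrip(" ")
            i += 1
        pairs.append((attr, value))
    return pairs

sample = (
    "objectclass: inetOrgPerson\n"
    "organizationName:<<EOT\n"
    "The two line\n"
    " company\n"
    "EOT\n"
    "sn: Jensen\n"
)
result = parse_heredoc_ldif(sample)
```

Note that the value keeps whichever separator the file used between its lines, which is exactly the point of contention in this thread: the file's separator and the value's separator become the same thing.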


--
Yves.
http://www.sollers.ca/

_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext
