
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2

To: Kurt Zeilenga <Kurt.Zeilenga@Isode.com>
Subject: Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
From: Steven Legg <steven.legg@eb2bcom.com>
Date: Thu, 04 Jun 2009 17:29:27 +1000
Cc: ldapext@ietf.org
Delivered-to: ldapext@core3.amsl.com
In-reply-to: <245BF18B-2066-4E36-9502-16F4A3140D9E@Isode.com>
References: <49C497F9.7010200@zioup.com> <CD3905D4-2A25-4C56-8187-3CE10D46C929@isode.com> <49C870C6.4010803@zioup.com> <E94B7389-9A6D-4CB6-BB2C-649CCD3FD15B@Isode.com> <49CB192E.5050105@zioup.com> <49CB211C.6070108@eb2bcom.com> <49CB87FE.1050809@zioup.com> <49CC01DE.6040506@eb2bcom.com> <4A24557D.7030006@zioup.com> <4A26A05D.8040105@zioup.com> <245BF18B-2066-4E36-9502-16F4A3140D9E@Isode.com>
User-agent: Thunderbird 2.0.0.21 (Windows/20090302)


Kurt,

Kurt Zeilenga wrote:

On Jun 3, 2009, at 9:10 AM, Yves Dorfsman wrote:
Is the idea of a here document syntax too ridiculous ?
There are a number of problems with it. Personally, I think what Steven already offered (and likely implemented)


I have.

is better, though I am concerned about line separators.


Me too, but at least it doesn't matter for XML.

As Howard's comments kind of suggest, when you have a value which is multi-lined, it's the syntax that controls what line separators are used, not the LDIF. For instance, in some syntaxes, a $ is used as a line separator.
The problem with your proposal, and Steven's, is that LDIF line separators and value line separators are one and the same thing. While that might be the case occasionally, it cannot be expected to be generally the case.
LDIF is first and foremost an interchange format. Conversion from LDAP PDU->LDIF Record->LDAP PDU MUST produce as output the input, octet for octet for every "data" component (the DN, every attribute description and associated values, etc.).
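One concrete instance of a syntax that uses $ as its own line separator is the Postal Address syntax of RFC 4517, where a literal "$" is escaped as \24 and a literal backslash as \5C. The following is only a minimal Python sketch of that convention; the function names are hypothetical and not taken from any LDAP library.

```python
import re
from typing import List

def encode_postal_address(lines: List[str]) -> str:
    """Join address lines with '$', escaping '\\' and '$' inside each line."""
    escaped = [ln.replace("\\", "\\5C").replace("$", "\\24") for ln in lines]
    return "$".join(escaped)

def decode_postal_address(value: str) -> List[str]:
    """Split on the '$' separator, then undo the \\24 and \\5C escapes."""
    def unescape(match: "re.Match[str]") -> str:
        return "$" if match.group(1) == "24" else "\\"
    # A single regex pass avoids re-interpreting the output of one escape
    # as the start of another.
    return [re.sub(r"\\(24|5[Cc])", unescape, ln) for ln in value.split("$")]

address = ["1234 Main St.", "Anytown, CA 12345", "USA"]
stored = encode_postal_address(address)
print(stored)
print(decode_postal_address(stored) == address)
```

The line structure here lives entirely inside the value, so an LDIF file can carry it on a single line without any LDIF-level line separator being involved.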


That's highly desirable for directory to directory interchange, but LDIF
is also used for composing data from various data sources to put in a
directory and to extract data from a directory to put in other data
sources. The octet-for-octet preservation usually doesn't apply in these
other cases and the need to turn line-based data such as XML documents
into base64 encodings is a serious impediment, hence the reason Andrew
and I wrote the Internet-draft.
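To make the impediment concrete: under LDIFv1 (RFC 2849), a value that contains a line break cannot be written as a plain "attr: value" line and has to be emitted as "attr:: <base64>". The Python below is only a rough sketch of that rule with simplified safe-string checks; the helper names and the 76-column fold width are assumptions made for illustration.

```python
import base64
from typing import Tuple

def is_safe_string(value: bytes) -> bool:
    """Simplified RFC 2849 SAFE-STRING test."""
    if not value:
        return True
    if any(b > 127 or b in (0x00, 0x0A, 0x0D) for b in value):
        return False  # non-ASCII, NUL, LF or CR forces base64
    if value[0] in (0x20, 0x3A, 0x3C):  # leading SPACE, ':' or '<'
        return False
    return value[-1] != 0x20  # trailing SPACE should be base64-encoded

def ldif_attrval(attr: str, value: bytes, width: int = 76) -> str:
    """Render one attribute value, folding long lines with a leading space."""
    if is_safe_string(value):
        line = f"{attr}: {value.decode('ascii')}"
    else:
        line = f"{attr}:: {base64.b64encode(value).decode('ascii')}"
    chunks = [line[:width]]
    rest = line[width:]
    while rest:
        chunks.append(" " + rest[: width - 1])
        rest = rest[width - 1:]
    return "\n".join(chunks)

def parse_attrval(text: str) -> Tuple[str, bytes]:
    """Unfold and decode a line produced by ldif_attrval."""
    unfolded = text.replace("\n ", "")
    attr, _, rest = unfolded.partition(":")
    if rest.startswith(":"):
        return attr, base64.b64decode(rest[1:].strip())
    return attr, rest.lstrip(" ").encode("ascii")

xml_value = b"<person>\r\n  <name>Jensen</name>\r\n</person>\n"
rendered = ldif_attrval("description", xml_value)
print(rendered)
print(parse_attrval(rendered) == ("description", xml_value))
```

The round trip preserves every octet, including the CR LF pairs, but the rendered form is no longer something a person can read or edit directly.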

Is UTF-8 support in LDIF not that important ?
LDIF being a proper interchange format is important. UTF-8 support (other than being able to interchange values whose syntax is UTF-8 encoded) is cosmetic.
Adding UTF-8 support does not appear to be in support of improving LDIF as a proper interchange format. It seems to be driven by other goals, such as trying to make LDIF files displayable. Given that LDAP does not constrain attribute value syntaxes (even directory strings can contain arbitrary sequences of Unicode code points), the goal of making LDIF files displayable is not terribly feasible.
I note that even today, ASCII LDIF files might not display properly without special handling, such as for line separators. But with UTF-8, line separators are only the tip of the iceberg of display problems.
I'm not convinced that removing the ASCII restrictions will be a good thing. Not only do I doubt it will have a net positive on displayability of LDIF for those who have a displayability goal (I don't have this goal), I think it will have a net negative impact on interoperability and user confusion, such as when the user creates a file using one Unicode normalization form, but is trying to set values which require a different Unicode normalization form.


The user is not going to directly enter a base64 encoded value. They
would use a tool that has those same normalization issues to produce
a UTF-8 character stream that has to be passed to another tool to turn
into base64. The issues exist anyway. It is just a question of where.
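As a small illustration of why base64 does not make the normalization question go away, the Python sketch below encodes the same visible name under two Unicode normalization forms. The sample name and the helper function are assumptions chosen for the example; only the standard unicodedata and base64 modules are used.

```python
import base64
import unicodedata

def to_ldif_base64(text: str, form: str) -> str:
    """Normalize text to the given form, then base64 its UTF-8 octets."""
    normalized = unicodedata.normalize(form, text)
    return base64.b64encode(normalized.encode("utf-8")).decode("ascii")

name = "Jos\u00e9"  # displays as "José"

nfc_b64 = to_ldif_base64(name, "NFC")  # e-acute as one code point
nfd_b64 = to_ldif_base64(name, "NFD")  # "e" followed by a combining accent

print("cn::", nfc_b64)
print("cn::", nfd_b64)
print(nfc_b64 == nfd_b64)  # False: same text on screen, different octets
```

Whichever tool produced the UTF-8 stream has already chosen the normalization form; the base64 step simply hides that choice from the person reading the file.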

BTW, it is not my intent to replace LDIFv1 for pure directory to directory
interchange. I just need the option to produce and consume something more
amenable to human editing and batch processing, where appropriate.

Regards,
Steven

Am I the only one thinking xml is not a good replacement for LDIF,
There already exist a number of XML replacements of LDIF, such as DSML... so I guess at least some do think XML is a good replacement for LDIF.
if so, should we help Steven with the xmled RFC ?
What Steven and Andrew have done is define an extension for LDIF to allow XML values to be represented in a human-readable format instead of requiring the use of base64. Unfortunately his proposal has interchange issues (see the I-D's security considerations section). This, I think, is a fatal problem with this extension.
-- Kurt
Thanks.


Yves Dorfsman wrote:
Steven Legg wrote:
See http://www.xmled.info/drafts/draft-sciberras-xed-eldif-05.txt
I did look at it, personally I find it difficult for humans, for diff'ing etc... XML has its place, but so does pure text.
Yes I was wondering about that, do we need multi-line values as a workaround because schemas aren't precise enough ?
No, we need them because sheets of paper, computer screens and RFCs are
not infinitely wide. :-) Human-readability, line breaks and indenting tend
to go hand-in-hand.
I've been thinking about this and trying a few things. My conclusion is that the best solution would be the good old here document.
objectclass: inetOrgPerson
organizationName:<<EOT
The two line
 company
EOT
sn: Jensen
With the following specifications:
Any of the following characters (or sequence, in the case of CR+LF) can be used as a separator (<SEP>):
LF (U+000A), CR (U+000D), CR+LF (U+000D followed by U+000A), NEL (U+0085), FF (U+000C), LS (U+2028), PS (U+2029).
Any sequence of characters can be used instead of EOT, but it cannot include a separator character. The same sequence has to be used at the beginning and the end.
Any UTF-8 character, except separators, can be used on each line.
Any separator can be used to separate the lines.
The text starts after EOT<SEP>, and finishes with the last character before <SEP>EOT. The organization name in the example above is exactly two lines; the last separator is not part of the text.
No need or possibility to escape characters, and no possibility of folding lines.
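The rules above can be sketched as a small parser. This is only one possible reading of the proposal, written in Python: it assumes that separators inside the here document are kept in the value exactly as written, that a terminator line must match the opening sequence exactly, and that ordinary lines are plain "attr: value". The function names are hypothetical.

```python
import re
from typing import Iterator, List, Tuple

# CR+LF first so it is matched as one separator, then the single characters:
# LF, CR, NEL (U+0085), FF (U+000C), LS (U+2028), PS (U+2029).
SEP = re.compile("\r\n|[\n\r\u0085\u000c\u2028\u2029]")

def split_lines(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (line, separator) pairs; a final unterminated line gets ''."""
    pos = 0
    for match in SEP.finditer(text):
        yield text[pos:match.start()], match.group()
        pos = match.end()
    if pos < len(text):
        yield text[pos:], ""

def parse_record(text: str) -> List[Tuple[str, str]]:
    """Parse attr: value lines plus attr:<<TAG here documents."""
    attrs: List[Tuple[str, str]] = []
    lines = split_lines(text)
    for line, _sep in lines:
        if not line:
            continue
        attr, _, rest = line.partition(":")
        if not rest.startswith("<<"):
            attrs.append((attr, rest.lstrip(" ")))
            continue
        tag = rest[2:]
        body: List[Tuple[str, str]] = []
        for body_line, body_sep in lines:  # keeps consuming the same iterator
            if body_line == tag:
                break
            body.append((body_line, body_sep))
        else:
            raise ValueError(f"unterminated here document for {attr!r}")
        value = "".join(ln + sep for ln, sep in body)
        if body:
            # The separator just before the closing tag is not part of the text.
            value = value[: len(value) - len(body[-1][1])]
        attrs.append((attr, value))
    return attrs

record = (
    "objectclass: inetOrgPerson\n"
    "organizationName:<<EOT\n"
    "The two line\n"
    " company\n"
    "EOT\n"
    "sn: Jensen\n"
)
for attr, value in parse_record(record):
    print(attr, repr(value))
```

Because nothing is folded or escaped, the leading space in " company" is part of the value. The sketch also makes the concern about separators visible: whichever separator the file happened to use between the two lines becomes part of the stored value.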
--
Yves.
http://www.sollers.ca/

_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext
