
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2

To: Steven Legg <steven.legg@eb2bcom.com>
Subject: Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
From: Kurt Zeilenga <Kurt.Zeilenga@Isode.com>
Date: Thu, 18 Jun 2009 06:34:50 -0700
Cc: ldapext@ietf.org
Delivered-to: ldapext@core3.amsl.com
In-reply-to: <4A39E563.1040107@eb2bcom.com>
References: <49C497F9.7010200@zioup.com> <CD3905D4-2A25-4C56-8187-3CE10D46C929@isode.com> <49C870C6.4010803@zioup.com> <E94B7389-9A6D-4CB6-BB2C-649CCD3FD15B@Isode.com> <49CB192E.5050105@zioup.com> <49CB211C.6070108@eb2bcom.com> <49CB87FE.1050809@zioup.com> <49CC01DE.6040506@eb2bcom.com> <4A24557D.7030006@zioup.com> <4A26A05D.8040105@zioup.com> <245BF18B-2066-4E36-9502-16F4A3140D9E@Isode.com> <4A309775.3080406@zioup.com> <4A311ED1.1030202@stroeder.com> <4A31D27B.3090208@zioup.com> <4A325A40.2050802@stroeder.com> <4A35CDDE.8000604@zioup.com> <4A37719E.3010006@stroeder.com> <4A37A5D9.4040901@zioup.com> <4A3830A6.4030407@eb2bcom.com> <93053DE7-C324-4124-BF8F-B3C7088D66EB@Isode.com> <4A39E563.1040107@eb2bcom.com>


On Jun 17, 2009, at 11:57 PM, Steven Legg wrote:

Kurt,

Kurt Zeilenga wrote:
On Jun 16, 2009, at 4:54 PM, Steven Legg wrote:
Also, X.500 directories already lose something in the translation when outputting as LDIF. For example, the choice in a DirectoryString is lost and if that choice is teletexString then transcoding wipes out the exact octet encoding.
Most of the ad-hoc LDAP string encodings are lossy in some respect.
Such changes are tolerable because the resulting value in LDIF is the same as far as the matching rules are concerned. Unicode normalization of the
extended LDIF output is a similar situation.
The loss you are talking about is inherent in LDAP, not LDIF. That is, LDIF does not lose anything (for the LDAP requests it's designed to represent) in translation to/from LDAP. I don't think it is tolerable for an LDAP intermediate format to "lose" LDAP information.

First, for loss, let's only talk about loss between the data format and LDAP requests. Loss between the data format and DAP or some other protocol ought to be beyond the scope of an LDAP data format. Loss of the DirectoryString CHOICE is an LDAP issue, not an LDAP data format issue.

So when I respond to your reply, I'm only going to consider loss of information between the file and protocol representations of the LDAP request.
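A minimal sketch of what "no loss between the file and the LDAP request" looks like for LDIF v1, assuming the base64 value form (`attr:: ...`) is used for every value and ignoring line folding. The helper names `ldif_encode` and `ldif_decode` are hypothetical and exist only for this illustration:

```python
import base64
from typing import Tuple


def ldif_encode(attr: str, value: bytes) -> str:
    """Write one attribute value as a single, unfolded LDIF v1 line.

    Assumption: always use the base64 form ("attr:: ..."), which LDIF v1
    permits for any value, so no octet of the value is ever reinterpreted.
    """
    return f"{attr}:: {base64.b64encode(value).decode('ascii')}"


def ldif_decode(line: str) -> Tuple[str, bytes]:
    """Recover the attribute description and the exact octets of the value."""
    attr, b64 = line.split(":: ", 1)
    return attr, base64.b64decode(b64)


# A value in decomposed form ('e' + U+0301) that also contains CR LF.
original = "Jose\u0301\r\nline two".encode("utf-8")

line = ldif_encode("description", original)
attr, decoded = ldif_decode(line)

# The octets that would go into the LDAP request are unchanged.
print(decoded == original)
```

The round trip is exact because the file never carries the value as text; the cost is that the value is no longer readable or editable in place.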

A "here" document mechanism where the UTF-8 character sequence between
the end of the introducer and the beginning of the terminator is the
literal directory attribute value without any modification would satisfy
the no loss requirement.

That depends on the particulars of the value and of the "here" mechanism. For instance, line separators are problematic in the mechanism suggested by Yves.
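A small sketch of why line separators are a concern for any literal-text value body. It does not reproduce the mechanism proposed in this thread; it only assumes that the body is read by a tool using ordinary text-mode (universal newline) handling, and `read_here_body` is a hypothetical name:

```python
import io


def read_here_body(raw: bytes) -> str:
    """Read a value body the way a typical text-mode reader would.

    newline=None selects universal-newline handling: CR LF and bare CR
    are both translated to LF on input.
    """
    return io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline=None).read()


# Two distinct directory values that differ only in their line separator.
value_crlf = "line one\r\nline two"
value_lf = "line one\nline two"

a = read_here_body(value_crlf.encode("utf-8"))
b = read_here_body(value_lf.encode("utf-8"))

# After text-mode reading the two values can no longer be told apart.
print(a == b)
print(a == value_crlf)
```

Under this assumption the CR LF value cannot be recovered from the file, so the body is not the literal attribute value "without any modification".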

We could always recommend base64 encoding the
entire file to transfer it across the network, or at least treating it as
a binary file, or suggest using LDIFv1 instead for such purposes.
For me, the convenience of being able to easily view, edit or compose the
content in files


Have you tried to view/edit the UTF-8 file I sent out?

I'm uploading outweighs the need to be careful about
which text handling tools I use.

You will have to be quite careful in the tools you use. If they are "text" tools, you'll have problems, because the LDIF file is not restricted to text. And the tools' encoding of values as text might be problematic (wrong normalization on a per-value basis), etc.

For instance, any "text" tool which normalizes its output is generally unsuitable because that normalization will be incorrect for some set of values.
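A minimal sketch of the normalization problem, assuming a hypothetical "text" tool that rewrites everything it saves in NFC. The values and the dictionary layout are invented for the illustration:

```python
import unicodedata

# Two values as they are meant to be stored in the directory:
# one deliberately decomposed (NFD), one already composed (NFC).
values = {
    "decomposed": "Jose\u0301",  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
    "composed": "Jos\u00e9",     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
}

# Hypothetical tool behaviour: normalize every value to NFC on save.
saved = {name: unicodedata.normalize("NFC", v) for name, v in values.items()}

for name in values:
    before = values[name].encode("utf-8")
    after = saved[name].encode("utf-8")
    # Report whether the octets that would reach the LDAP request changed.
    print(name, before == after, before.hex(), after.hex())
```

The composed value survives, but the decomposed one is silently rewritten to different octets. No single file-wide normalization is right for both values, which is the per-value problem described above.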

If others have similar or overlapping
requirements it would be better for us to use the same specification even
if it never becomes a proposed standard RFC.

I wouldn't object to experimenting here, just so long as it is not confused with standard LDIF.

But if one is going to try to solve a particular problem within the standards community, one should try to solve it generally. That is, we should dismiss solutions which only work well for limited subsets of Unicode.

Would you feel better if we called it the LDAP Data Composition Format (LDCF)
instead of LDIFv2 or Extended LDIF?

Well, if one were to run an experiment to develop an LDAP data format in which the whole content is, say, Net-Unicode, calling it something other than LDIF would be good, to avoid confusing it with LDIF. Things to tackle would include

- how to represent text values needing to be in a different Unicode normalization (in the directory) than the required normalization,
- how to represent non-text Unicode values (arbitrary sequences of Unicode code points),
- if Net-Unicode line separators are to be considered part of the value, how to represent values requiring other line separators, and
- bidirectional values.

Note: not requiring a particular normalization in the file is not a solution to the first item, as that merely leaves it up to tools as to which normalization (if any) to apply. So the first item just becomes how to convey the normalization algorithm required for the value when represented in the LDAP request.

These are not simple problems and are unlikely to be solved well on the first publication, hence the experimental track does seem appropriate for this feature.

I would like LDIF standard extensions work to be limited to improving data interchange capabilities (such as representation of LDAP responses), which is a much more straightforward problem, something we are far more likely to get right on the first publication.


-- Kurt
