
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2




On Jun 17, 2009, at 11:57 PM, Steven Legg wrote:


Kurt,

Kurt Zeilenga wrote:
On Jun 16, 2009, at 4:54 PM, Steven Legg wrote:
Also, X.500 directories already lose something in the translation when outputting as LDIF. For example, the choice in a DirectoryString is lost and if that choice is teletexString then transcoding wipes out the exact octet encoding.
Most of the ad-hoc LDAP string encodings are lossy in some respect.
Such changes are tolerable because the resulting value in LDIF is the same as far as the matching rules are concerned. Unicode normalization of the
extended LDIF output is a similar situation.
The loss you are talking about is inherent in LDAP, not LDIF. That is, LDIF does not lose anything (for the LDAP requests it's designed to represent) in translation to/from LDAP. I don't think it is tolerable for an LDAP intermediate format to "lose" LDAP information.

First, for loss, let's only talk about loss between the data format and LDAP requests. Loss between the data format and DAP or some other protocol ought to be beyond the scope of an LDAP data format. Loss of the DirectoryString CHOICE is an LDAP issue, not an LDAP data format issue.

So when I respond to your reply, I'm only going to consider loss of information between the file and protocol representations of the LDAP request.
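
As a point of reference for "no loss" between the file and protocol representations, here is a minimal sketch of how LDIF's base64 value form ("attr:: ...") carries the exact octets of a value. The helper names and the sample value are hypothetical, and line folding and the rest of the LDIF grammar are deliberately ignored.

```python
import base64
import unicodedata


def ldif_base64_line(attr, octets):
    """Format one attribute line in LDIF's 'attr:: base64' form (no folding)."""
    return f"{attr}:: {base64.b64encode(octets).decode('ascii')}"


def parse_base64_line(line):
    """Recover (attr, octets) from a line produced by ldif_base64_line."""
    attr, _, encoded = line.partition(":: ")
    return attr, base64.b64decode(encoded)


# Hypothetical value held in decomposed form: "e" + U+0301 COMBINING ACUTE ACCENT.
value = "Andre\u0301".encode("utf-8")

line = ldif_base64_line("cn", value)

# The line is pure ASCII, so a text tool that normalizes it changes nothing.
normalized_line = unicodedata.normalize("NFC", line)

attr, octets = parse_base64_line(normalized_line)
print(attr, octets == value)
```

The trade-off is the one under discussion in this thread: the octets survive, but the value is no longer readable or editable as text.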

A "here" document mechanism where the UTF-8 character sequence between
the end of the introducer and the beginning of the terminator is the
literal directory attribute value without any modification would satisfy
the no loss requirement.

That depends on the particulars of the value and of the "here" mechanism. For instance, line separators are problematic in the mechanism suggested by Yves.

We could always recommend base64 encoding the
entire file to transfer it across the network, or at least treating it as
a binary file, or suggest using LDIFv1 instead for such purposes.

For me, the convenience of being able to easily view, edit or compose the
content in files

Have you tried to view/edit the UTF-8 file I sent out?

I'm uploading outweighs the need to be careful about
which text handling tools I use.

You will have to be quite careful in the tools you use. If they are "text" tools, you'll have problems, because the LDIF file is not restricted to text. And the tool's encoding of values as text might be problematic (wrong normalization on a per-value basis), etc.

For instance, any "text" tool which normalizes its output is generally unsuitable because that normalization will be incorrect for some set of values.
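
To make that concrete, here is a small sketch of what a normalizing tool does to one value. The value is a made-up example, and NFC is only an assumption about what such a tool might apply on save.

```python
import unicodedata

# Hypothetical directory value held in decomposed form:
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
stored = "Andre\u0301"

# What a text tool that normalizes to NFC on save would write instead.
saved = unicodedata.normalize("NFC", stored)

stored_octets = stored.encode("utf-8")
saved_octets = saved.encode("utf-8")

print(stored_octets)                  # b'Andre\xcc\x81'
print(saved_octets)                   # b'Andr\xc3\xa9'
print(stored_octets == saved_octets)  # False
```

Both strings display identically, yet the octets that would go into the LDAP request differ. A file-wide normalization is therefore only correct for values that were meant to be in that form.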

If others have similar or overlapping
requirements it would be better for us to use the same specification even
if it never becomes a proposed standard RFC.

I wouldn't object to experimenting here, just so long as it is not confused with standard LDIF.

But if one is going to try to solve a particular problem within the standards community, one should try to solve it generally. That is, we should dismiss solutions which only work well for limited subsets of Unicode.


Would you feel better if we called it the LDAP Data Composition Format (LDCF)
instead of LDIFv2 or Extended LDIF?

Well, if one were to have an experiment to develop an LDAP data format in which the whole content is, say, Net-Unicode, calling it something other than LDIF would be good to avoid confusing it with LDIF. Things to tackle would include

- how to represent text values needing to be in a different Unicode normalization (in the directory) than the required normalization,
- how to represent non-text Unicode values (arbitrary sequences of Unicode code points),
- if Net-Unicode line separators are to be considered part of the value, how to represent values requiring other line separators, and
- bidirectional values.

Note: not requiring a particular normalization in the file is not a solution to the first item, as that merely leaves it up to tools as to which normalization (if any) to apply. So the first item just becomes how to convey the normalization algorithm required for the value when represented in the LDAP request.

These are not simple problems and are unlikely to be solved well on the first publication, hence the experimental track does seem appropriate for this feature.

I would like LDIF standard extensions work to be limited to improving data interchange capabilities (such as representation of LDAP responses), which is a much more straightforward problem, something we are far more likely to get right on the first publication.

-- Kurt
_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext