
Re: [ldapext] UTF-8 full support in LDIF / LDIF v2




On Mar 23, 2009, at 10:33 PM, Yves Dorfsman wrote:
LDIF's primary purpose is the interchange of directory data. Just adding UTF-8 support doesn't allow for any additional interchange of information. The UTF-8 change is for secondary purposes: to allow humans to more easily see what they are interchanging, and to allow humans to directly modify the LDIF.

Good point, but UTF-8 sure helps for debugging ! Diffing base64 is completely useless.
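
To illustrate the debugging point: LDIF as currently specified base64-encodes values that contain non-ASCII octets, marked with "::". The following is a minimal Python sketch, not taken from any draft; the helper name ldif_b64 and the sample names are made up for illustration. It prints two entries whose values differ by one character, so the two encoded lines can be compared by eye.

```python
import base64

def ldif_b64(attr: str, value: str) -> str:
    """Render attr/value in the base64 LDIF form ('attr:: <base64>')."""
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"{attr}:: {encoded}"

# Hypothetical sample values that differ in a single character.
old = ldif_b64("cn", "J\u00fcrgen M\u00fcller")
new = ldif_b64("cn", "J\u00fcrgen M\u00f6ller")

print(old)
print(new)
```

A line-based diff of the two outputs flags the lines as different but gives a human reader no hint of which character changed.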


Also you could argue that the beauty of LDIF is its simplicity for humans. If we don't care about that aspect, we could completely give up LDIF, and go with something more complex like an XML format, or even a binary one.

One of the problems with using UTF-8 directly is that the UTF-8 form used within the human's editor might be quite different from the UTF-8 form the human desires to store in the directory. Or the human's editor might be unable to display every sequence of UTF-8 characters stored in the directory, some of which might not even be well-formed text.


There are other issues, such as how to deal with BIDI values.

While I am sure there are persons in the IETF that have the expertise to get a UTF-8 LDIF right, I am quite unsure they have the time/energy/willingness to contribute to this effort.


I'd rather spend my time on adding features that do allow for the interchange of more directory data. Also, LDIF ought to have extensibility similar to that of LDAP itself.

Let's start a list, and see who wants what in/out...



Extending LDIF to support all LDAP requests, e.g., an LDAP Transaction

Extending LDIF to support all LDAP responses, e.g., Entries, References, Intermediate Responses, and Result returned in response to an LDAP Search request.

Extending LDIF to support XML values.

Really ?

See http://www.xmled.info/drafts/draft-sciberras-xed-eldif-05.txt




There were also suggestions of adding a "charset" specification (yuk).

Is there something we cannot do with Unicode ? I don't understand the advantage of a "charset" keyword. If anything, I'd rather look at an "encoding" keyword to support the different encodings of Unicode, for east asian scripts for example, which apparently take a lot less space when encoded in UTF-16. But UTF-16 in itself is quite a bit more complex than UTF-8, requiring a byte order mark etc... I was thinking this could wait for a future version.

As Alexey noted, I was only listing what someone else suggested. I don't support adding a charset option.



Yes, I believe you need to add a PDF or PS version to the original. I can look at that. Is there a tradition on how to pick the names for the example, or is it completely random ?

I think the ASCII I-D/RFC needs to have examples that use some sort of escaping mechanism (for the purposes of the example)


	value: \C3\BC

where C3 BC is the 2 octet UTF-8 encoding of the Unicode LATIN SMALL LETTER U WITH DIAERESIS (U+00FC) codepoint.

	value: \75\CC\88


where 75 is the 1 octet UTF-8 encoding of the Unicode LATIN SMALL LETTER U (U+0075) codepoint and CC 88 is the 2 octet UTF-8
encoding of the Unicode COMBINING DIAERESIS (U+0308) codepoint.


Note: The hex escaping above is used only for illustration purposes. The LDIF content should contain the actual UTF-8
encoded Unicode codepoints.


The PDF/PS version would not use the escaping.
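
The two example values above can be reproduced with a short Python sketch. The helper hex_escape is hypothetical and exists only to mimic the illustrative \XX notation used in this message; it is not LDIF syntax. The last two lines also show the editor concern raised earlier: the two forms differ octet for octet although they are canonically equivalent under Unicode normalization.

```python
import unicodedata

def hex_escape(value: str) -> str:
    """Show each UTF-8 octet of value as \\XX (illustration only, not LDIF syntax)."""
    return "".join(f"\\{octet:02X}" for octet in value.encode("utf-8"))

precomposed = "\u00fc"   # LATIN SMALL LETTER U WITH DIAERESIS (U+00FC)
decomposed = "u\u0308"   # LATIN SMALL LETTER U (U+0075) + COMBINING DIAERESIS (U+0308)

print("value:", hex_escape(precomposed))   # value: \C3\BC
print("value:", hex_escape(decomposed))    # value: \75\CC\88

# Different octets, same text once normalized to NFC.
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

An editor that silently normalizes on save could therefore write a different octet sequence than the one the human intended to store.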



Removing some paragraphs:
Should we remove some paragraphs that don't seem to be relevant any more, such as:
" The application/directory MIME content-type [RFC2425] is a general
framework and format for conveying directory information, and is
independent of any particular directory service. The LDIF format is
a simpler format which is perhaps easier to create, and may also be
used, as noted, to describe a set of changes to be applied to a
directory.
"
I would oppose removing this text. Any replacement of the LDIF specification needs to consider the impact on use of LDIF in MIME encoded contents of the application/directory type.

It seems to me that this paragraph is giving historical information and justifying some of the reasons behind LDIF... Am I wrong here ?


If I am making an error and it's worth keeping, then that's fine.

Expired draft:
Reference [Armijo00] is a draft that expired in 2001. It is used in example 7. Is this still relevant ?
The revised document should provide more relevant examples. In particular, examples should use standard track controls. So this (informative) reference ought to be replaced accordingly based upon which standard track controls are used in examples.

I could not find any new version of this document, or a replacement for it. Has this just become standard practice ? Or is there a written standard for it ?

My point is that this example should be replaced with one that uses a standard track control, such as the LDAP Assertion Control [RFC4528].





RFC4525:
I noticed that RFC 4525 has updated the LDIF definition. Should this be included in this RFC ?
Yes, as LDIFv1 is formally RFC 2849 as updated by RFC 4525 (that's what UPDATES means).
I have created an extra file with its inclusion.

Ok. I'll just merge the two files and repost then.





--
Yves.
http://www.sollers.ca/

_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext
