Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
On Mar 23, 2009, at 10:33 PM, Yves Dorfsman wrote:
LDIF's primary purpose is for interchange of directory data. Just
adding UTF-8 support doesn't allow for any additional interchange
of information. The UTF-8 change is for secondary purposes, to
allow humans to more easily see what they are interchanging, to
allow humans to directly modify the LDIF.
Good point, but UTF-8 sure helps for debugging ! Diffing base64 is
completely useless.
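To illustrate the debugging point, here is a minimal sketch of how a one-letter change in a non-ASCII value looks once it is base64-encoded, as LDIF v1 requires. The helper name `ldif_line` and the sample names are hypothetical, and the helper always base64-encodes for simplicity (real LDIF writers only do so when the value is not a safe ASCII string).

```python
import base64


def ldif_line(attr: str, value: str) -> str:
    """Render an attribute in the 'attr:: <base64>' form (hypothetical helper)."""
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"{attr}:: {encoded}"


# Two values that differ by a single letter (u-umlaut vs o-umlaut).
old_value = "J\u00fcrgen M\u00fcller"
new_value = "J\u00fcrgen M\u00f6ller"

old_line = ldif_line("cn", old_value)
new_line = ldif_line("cn", new_value)

# A textual diff of these two lines shows changed base64 characters,
# not the letter a human actually edited.
print(old_line)
print(new_line)
```

Reading either line back requires decoding the base64 first, which is the extra step a UTF-8 capable LDIF would avoid.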
Also you could argue that the beauty of LDIF is its simplicity for
humans. If we don't care about that aspect, we could completely give
up LDIF, and go with something more complex like an XML format, or
even a binary one.
One of the problems with using UTF-8 directly is that the UTF-8 form
used within the human's editor might be quite different than the UTF-8
form the human desires to store in the directory. Or the human's
editor cannot display any and all sequences of UTF-8 characters, some
of which might not even be well-formed text, as stored in the directory.
There are other issues, such as how to deal with BIDI values.
While I am sure there are persons in the IETF that have the expertise
to get a UTF-8 LDIF right, I am quite unsure of time/energy/
willingness to contribute to this effort.
I would rather spend my time on adding features that do allow for the
interchange of more directory data. Also LDIF ought to have
similar extensibility as LDAP itself has.
Let's start a list, and see who wants what in/out...
Extending LDIF to support all LDAP requests, e.g., an LDAP
Transaction
Extending LDIF to support all LDAP responses, e.g., Entries,
References, Intermediate Responses, and Result returned in response
to an LDAP Search request.
Extending LDIF to support XML values.
Really ?
See http://www.xmled.info/drafts/draft-sciberras-xed-eldif-05.txt
There were also suggestions of adding a "charset" specification
(yuk).
Is there something we cannot do with Unicode ? I don't understand
the advantage of a "charset" keyword. If anything, I'd rather look
at an "encoding" keyword to support the different encodings of
Unicode, for East Asian scripts for example, which apparently take a
lot less space when encoded in UTF-16. But, UTF-16 in itself is
quite a bit more complex than UTF-8, requiring a byte order mark
etc... I was thinking this could wait for a future version.
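As a rough check of the size claim, the following sketch compares the encoded length of a short Latin string and a short Japanese string. The function name `encoded_sizes` and the sample strings are assumptions made for illustration; Python's `utf-16` codec prepends a byte order mark, while `utf-16-le` does not.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded length in octets for a few Unicode encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16 (with BOM)": len(text.encode("utf-16")),
        "utf-16-le (no BOM)": len(text.encode("utf-16-le")),
    }


latin = "Yves Dorfsman"
japanese = "\u65e5\u672c\u8a9e"  # three CJK ideographs

for label, text in (("latin", latin), ("japanese", japanese)):
    print(label, encoded_sizes(text))
```

For the three ideographs, UTF-8 needs 9 octets and UTF-16 needs 6 plus the 2-octet byte order mark, while for the ASCII-only name UTF-16 roughly doubles the size. The saving is real for CJK-heavy data, but smaller than "a lot" on short values.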
As Alexey noted, I was only listing what someone else suggested. I
don't support adding a charset option.
Yes, I believe you need to add a PDF or PS version to the original.
I can look at that. Is there a tradition on how to pick the names
for the example, or is it completely random ?
I think the ASCII I-D/RFC needs to have examples that use some sort of
escaping mechanism (for the purposes of the example):
value: \C3\BC
where C3 BC is the 2 octet UTF-8 encoding of the Unicode LATIN SMALL
LETTER U WITH DIAERESIS (U+00FC) codepoint.
value: \75\CC\88
where 75 is the 1 octet UTF-8 encoding of the Unicode LATIN SMALL LETTER
U (U+0075) codepoint and CC 88 is the 2 octet UTF-8
encoding of the Unicode COMBINING DIAERESIS (U+0308) codepoint.
Note: The hex escaping above is used only for illustration purposes.
The LDIF content should contain the actual UTF-8
encoded Unicode codepoints.
The PDF/PS version would not use the escaping.
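The two example values above can be reproduced with a short sketch. It prints the octets of each form using the same illustrative backslash-hex notation, and shows that the two forms are different strings that become equal after NFC normalization. The helper name `hex_escape` is hypothetical; the escaping is for display only and is not part of LDIF.

```python
import unicodedata


def hex_escape(text: str) -> str:
    """Show the UTF-8 octets of text as \\XX escapes (illustration only)."""
    return "".join(f"\\{octet:02X}" for octet in text.encode("utf-8"))


precomposed = "\u00fc"   # LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"   # LATIN SMALL LETTER U + COMBINING DIAERESIS

print("value:", hex_escape(precomposed))  # value: \C3\BC
print("value:", hex_escape(decomposed))   # value: \75\CC\88

# The two forms render alike but are not the same sequence of codepoints.
print(precomposed == decomposed)  # False

# Normalizing to NFC composes the second form into the first.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

This is the same concern raised earlier in the thread: an editor may silently save one form while the directory holds the other.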
Removing some paragraphs:
Should we remove some paragraphs that don't seem to be relevant any
more, such as:
" The application/directory MIME content-type [RFC2425] is a general
framework and format for conveying directory information, and is
independent of any particular directory service. The LDIF format is
a simpler format which is perhaps easier to create, and may also be
used, as noted, to describe a set of changes to be applied to a
directory.
"
I would oppose removing this text. Any replacement of the LDIF
specification needs to consider the impact on use of LDIF in MIME
encoded contents of the application/directory type.
It seems to me that this paragraph is giving historical information
and justifying some of the reasons behind LDIF... Am I wrong here ?
If I am making an error and it's worth keeping, then that's fine.
Expired draft:
Reference [Armijo00] is a draft that expired in 2001. It is used in
example 7. Is this still relevant ?
The revised document should provide more relevant examples. In
particular, examples should use standards track controls. So this
(informative) reference ought to be replaced accordingly based upon
which standards track controls are used in examples.
I could not find any new version of this document, or a replacement
for it. Has this just become standard practice ? Or is there a
written standard for it ?
My point is that this example should be replaced with one that uses a
standards track control, such as the LDAP Assertion Control [RFC4528].
RFC4525:
I noticed that RFC 4525 has updated the LDIF definition. Should
this be included in this RFC ?
Yes, as LDIFv1 is formally RFC 2849 as updated by RFC 4525 (that's
what UPDATES means).
I have created an extra file with its inclusion.
Ok. I'll just merge the two files and repost then.
--
Yves.
http://www.sollers.ca/
_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext