Re: [ldapext] UTF-8 full support in LDIF / LDIF v2



Kurt Zeilenga wrote:
> I note my intent is not to discourage you from putting a draft
> together.  I'm just concerned that the result might not adequately
> address the issue of concern.  I suggest you try to describe the
> problem you are trying to solve in the I-D.

Thanks Kurt, I appreciate you raising these issues, as I had not thought of them all. This also makes me think that we should focus on UTF-8 only for this revision, and make sure we look at all the angles.

After thinking about this some more, I've come to the conclusion that the best one can do with UTF-8 and LDIF is to allow values to contain all but a few code points. The file might not always be well-formed text, and hence may not always be displayable. I do not think it is feasible to create an LDIF file representing arbitrary Unicode attribute values using an off-the-shelf Unicode-aware text editor. For instance, one value might require composed characters while the editor only produces decomposed characters, or vice versa.
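
To make the composed/decomposed point concrete, here is a minimal Python sketch. The sample character (e with acute accent) is my own choice for illustration and does not come from any real directory data. It shows that the two forms are different byte sequences in UTF-8, even though they render identically, and that Unicode normalization maps one onto the other.

```python
import unicodedata

# The same visible character, written two ways (illustrative sample only).
composed = "\u00e9"       # single precomposed code point
decomposed = "e\u0301"    # base letter followed by COMBINING ACUTE ACCENT

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# The UTF-8 byte sequences differ, so a byte-for-byte comparison
# treats them as two different values.
same_bytes = composed_bytes == decomposed_bytes

# Normalizing to NFC (composed) or NFD (decomposed) makes them comparable.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
same_after_nfd = unicodedata.normalize("NFD", composed) == decomposed

print(composed_bytes, decomposed_bytes, same_bytes, same_after_nfc, same_after_nfd)
```

An editor that silently normalizes on save would therefore change the bytes of a value, which is the scenario described above.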

I really do think the issue here has to do with editors and, to a certain extent, LDAP clients. But then, I expect that people who need these characters will be equipped with the right software.


I have just tried importing these two LDIF entries (at separate times) into OpenLDAP:
##################################################################
## Ali Baba
dn:: Y2492LnZhNmKINio2KfYqNinLG91PVBlb3BsZSxkYz16aW91cCxkYz1jb20=
cn:: 2LnZhNmKINio2KfYqNin
givenName:: 2KjYp9io2Kc=
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
rfc822Mailbox: Ali.Baba@zioup.com
sn:: 2LnZhNmK
##################################################################
dn: cn=علي بابا,ou=People,dc=zioup,dc=com
cn: علي بابا
givenName: علي
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
rfc822Mailbox: Ali.Baba@zioup.com
sn: بابا
##################################################################
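
For anyone who wants to check what the base64 values in the first entry contain, here is a small Python sketch that decodes them as UTF-8. The dictionary layout is only for illustration; the base64 strings are copied from the entry above.

```python
import base64

# Base64 values copied from the first LDIF entry above.
entry = {
    "dn": "Y2492LnZhNmKINio2KfYqNinLG91PVBlb3BsZSxkYz16aW91cCxkYz1jb20=",
    "cn": "2LnZhNmKINio2KfYqNin",
    "givenName": "2KjYp9io2Kc=",
    "sn": "2LnZhNmK",
}

# Decode each value from base64, then interpret the bytes as UTF-8.
decoded = {
    attr: base64.b64decode(value).decode("utf-8")
    for attr, value in entry.items()
}

for attr, value in decoded.items():
    print(f"{attr}: {value}")
```

The decoded dn ends in ",ou=People,dc=zioup,dc=com", and the cn is the Arabic spelling of the name in the comment line.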



I then accessed OpenLDAP with Mozilla Thunderbird, and in both cases Thunderbird displayed the name properly, from right to left. Note that the vim editor displays the Arabic characters from left to right, which is wrong (there might be a way to make vim work, but that is not the issue here), while the GNOME editor gedit displays them correctly, from right to left. OpenLDAP is happy to import characters that are not base64-encoded, but it follows the RFC strictly when exporting and base64-encodes them.
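
As a rough illustration of the export behaviour, here is a Python sketch of the RFC 2849 rule for when a value has to be base64-encoded. This is my reading of the SAFE-STRING grammar, not OpenLDAP's actual code; the function names needs_base64 and ldif_line are made up for this example, and line folding is ignored.

```python
import base64

def needs_base64(value: bytes) -> bool:
    """Return True if the value cannot be written as an RFC 2849 SAFE-STRING."""
    if not value:
        return False
    # A safe value must not start with SPACE, colon or less-than.
    if value[0] in b" :<":
        return True
    # The RFC also recommends base64 for values ending in a space.
    if value[-1:] == b" ":
        return True
    # NUL, LF, CR and any byte above 0x7F are not safe characters.
    return any(b in (0x00, 0x0A, 0x0D) or b > 0x7F for b in value)

def ldif_line(attr: str, value: str) -> str:
    """Format one attribute line, base64-encoding the UTF-8 bytes when required."""
    raw = value.encode("utf-8")
    if needs_base64(raw):
        return f"{attr}:: {base64.b64encode(raw).decode('ascii')}"
    return f"{attr}: {value}"

print(ldif_line("objectclass", "person"))
print(ldif_line("sn", "\u0639\u0644\u064a"))
```

Under this rule every non-ASCII UTF-8 value ends up base64-encoded on export, since all of its multi-byte sequences use bytes above 0x7F.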


I have not looked at composed characters yet, but will.

If we leave the editor issue aside, is there any reason why UTF-8 would give us problems? LDAP itself supports UTF-8; have there been any problems with it?

If anybody has more practical experience with bidirectional and composed characters in LDAP, I'd appreciate their help with some more testing.

--
Yves.
http://www.sollers.ca/

_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext