
Re: using UTF-8 in openldap 2.0.7



I'd like to add my few cents here. I've been playing with ldap + UTF-8 for a few days. My sole purpose was to have a working rolodex that interfaces with various email clients, but primarily has a web interface for client contacts.

Then I started playing with UTF-8 support, as my rolodex is in English, Czech, and Japanese.

The software:

The server is a RedHat 5.x system that has been upgraded piecemeal and is probably at about 6.x by now, running Apache and OpenLDAP 2.0. I am using ldap-abook (ldap-abook.sourceforge.net) and perl-abook (perl-abook.sourceforge.net). A minor patch was required to add the UTF-8 charset to the Address Book. Perl-abook then correctly handles the encoding and decoding of the charsets.

The client is a RedHat 7.0J system, with Central European fonts added for Czech support.

On the client, in Netscape 4.75J (the Japanese version), single-byte charsets appear to work correctly, but multi-byte (Japanese) characters are displayed garbled. You can enter them into the database just fine; you just can't read them. Using stock Mozilla 0.9 set to UTF-8 encoding, all appears fine. I can mix and match characters with no problem at all, and Mozilla renders them correctly on the screen.

Alas, while the Netscape mail client has an LDAP interface, Mozilla lacks one, so there is no way to retrieve data from an LDAP server using the Mozilla 0.9 email client.

So, my basic evaluation is that while OpenLDAP itself has no obvious problems, the clients so far leave something to be desired.

--Yan

Tardis wrote:

Hello David and all,


10 May 2001, at 12:41, David Olivier wrote:

I've had the same problem as you with French extended chars in
ISO-8859-1 format and, well, I have decided to keep using
ISO-8859-1 data.

That's what I did first and am still doing in my ldap v2 base. But
it's against the rules. Since in ldap v3 the rules exist and are
rather clear (they weren't in v2), it seems best to use them, if
possible.


In my experience clients are not all compatible with raw ISO8859-1
encoding. For instance Netscape Communicator (4.77) does not
interpret it correctly. Try for instance:



Indeed, I did not know that. This is a real shame.

(...)

I have another client, a mail client called Mulberry; its LDAP
address lookup doesn't work correctly with UTF-8 encoded values.
But I suppose they will fix that. I imagine clients will more and
more be UTF-8 compliant. Since an LDAP server is usually meant to
be at least partly world-readable, I think it's better to stick to
the rules.



Well, the LDAP server I'm trying to set up is for the restricted use of the 10 users on our network. The database contains the details of our clients, and we do not wish that to be world-readable ;-)


Seriously, the main reason I prefer ISO8859-1 data is that we use Pegasus Mail as the mailer on all the users' W9x computers. The LDAP client in Pegasus does not interpret UTF-8 characters. The same goes for the pages of our intranet, where I would like to display the results of the queries and collect data to send to the LDAP server without having to program overly complex manipulations.

> But I understand that if you're new to LDAP there can be more
> important issues to clear up first!

Yes, I have loads of questions. Anyway, if someone has an answer or even a hint to my question of May 8, 2001, I would really appreciate the help (even if you have no idea, I would appreciate anything; yes, I'm desperate at the moment).

In any case, ISO-8859-1 to UTF-8 conversion and vice versa is not a problem in Perl. I do:

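#!/usr/bin/perl

use MIME::Base64;
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);

# $sn holds the surname as ISO-8859-1 bytes; convert it to UTF-8.
$sn = to_utf8({ -string => $sn, -charset => 'ISO-8859-1' });
# Base64-encode the UTF-8 bytes. The empty second argument stops
# encode_base64 from adding line breaks, so no chop is needed.
$sn = encode_base64($sn, '');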

The Unicode::MapUTF8 module supports a number of other encodings, including Japanese.
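
For anyone who wants both directions in one place, here is a rough sketch of the same conversion as a round trip. It is not the code above: it assumes a newer Perl where the Encode module is available, and it uses that instead of Unicode::MapUTF8. The helper names latin1_to_ldif_value and ldif_value_to_latin1 and the sample surname are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);                      # assumption: Encode is installed
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: ISO-8859-1 bytes -> base64-encoded UTF-8,
# i.e. the value part of an LDIF "attr:: value" line.
sub latin1_to_ldif_value {
    my ($latin1_bytes) = @_;
    my $utf8_bytes = encode('UTF-8', decode('ISO-8859-1', $latin1_bytes));
    return encode_base64($utf8_bytes, '');         # '' suppresses line breaks
}

# Hypothetical helper: the reverse direction, for clients that
# only understand ISO-8859-1.
sub ldif_value_to_latin1 {
    my ($b64) = @_;
    my $utf8_bytes = decode_base64($b64);
    return encode('ISO-8859-1', decode('UTF-8', $utf8_bytes));
}

my $sn    = "Andr\xe9";                            # sample surname as ISO-8859-1 bytes
my $value = latin1_to_ldif_value($sn);
print "sn:: $value\n";                             # prints "sn:: QW5kcsOp"
```

Note that the second helper is lossy by nature: any character that has no ISO-8859-1 equivalent (Czech or Japanese, for example) comes back as a substitution character, so it only makes sense for data known to be Western European.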


Ciao

--
Tardis