Re: Problem with foreign characters

To: "Stein M. Eliassen" <steinme@kpnqwest.no>
Subject: Re: Problem with foreign characters
From: Paul Gillingwater <paul@lanifex.com>
Date: Mon, 13 Aug 2001 18:38:39 +0200 (CEST)
Cc: "openldap-software@openldap.org (E-Mail)" <openldap-software@OpenLDAP.org>
In-reply-to: <3B78000A.D547212@kpnqwest.no>
References: <3B78000A.D547212@kpnqwest.no>
User-agent: IMP/PHP IMAP webmail program 2.2.3

Quoting "Stein M. Eliassen" <steinme@kpnqwest.no>:

> Hi,
> 
> I'm using OpenLDAP 2.0.11 and I can't use characters like ø and æ in sn
> and cn fields, but they are allowed in dn.
> 
> How can I use these "foreign" characters in the sn and cn fields?

There must be a FAQ for this.  Basically, you need to encode non-ASCII
characters as UTF-8 (and Base64 encode the value when it goes into LDIF).

Here's the "Kurt" response from Kurt Zeilenga:

RFC 2253 details a UTF-8 string representation which is used in LDAPv3
[RFC 2251].  RFC 1779 details a string representation which is
character set/encoding neutral, but RFC 1777 restricts DNs used
in LDAPv2 to IA5 (ASCII).

--end of Kurt's response--

Here's my usual response, with Perl code that does the encoding for you
when you generate the LDIF file:

For those who like Perl, here's a revised snippet of code (a subroutine)
which takes as parameters the LDAP attribute name (e.g., cn) and
the ISO-8859-1 (Latin1) encoded string, and returns the string
in LDIF syntax, using Base64 and UTF-8 (Unicode) encoding if
there are characters with the 8th bit set, i.e., non-ASCII.

This snippet is clearly incomplete, i.e., it doesn't yet handle cases
where there are trailing spaces (which require Base64 encoding), nor 
does it attempt to break long lines as per the LDIF spec, but it should
work for most cases.  Enjoy!

use MIME::Base64;
use Unicode::String qw(utf8 latin1 utf16);

sub mime_encode {

# There are two parameters.  The first is the attribute name,
# without the colon, while the second is the value, e.g.,
# "cn" and "Paul Gillingwater".  We return the concatenated
# string.

# This routine checks the value for non-ASCII characters
# (which we assume to have a value in the range 0x80..0xff,
# since we assume the input is single-byte encoded, not Unicode).

# If we match, then we must return the string Base64 encoded;
# otherwise we return it untouched.  Note that when we do
# Base64 encoding, we need to add a second colon after the
# attribute name.  We also need to convert any ISO-8859-1
# characters from Latin1 to UTF-8, for which we use the
# Unicode::String module.

# Note that encode_base64 by default appends a newline and also
# breaks its output every 76 characters; passing "" as the second
# argument suppresses both, so no chop is needed.

  my ($att,$val) = @_;
  if ($val =~ /[\x80-\xFF]/) {
    my $u = Unicode::String::latin1($val);  # convert from ISO-8859-1 to UTF-8
    $val = encode_base64($u->utf8, "");     # "" = no line breaks in the output
    # we use the double colon to indicate Base64 encoding
    return $att . ":: " . $val;
  } else {
    return $att . ": " . $val;
  }
}
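
The gaps mentioned above (leading or trailing spaces, long lines) could
be closed along the following lines.  This is only a sketch, not a tested
replacement: the name ldif_line is made up for illustration, it uses the
core Encode module instead of Unicode::String, and the fold width of 76
columns is an arbitrary choice for this example.  The "safe value" test
follows my reading of the LDIF spec (RFC 2849).

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use MIME::Base64 qw(encode_base64);

# Hypothetical helper (not part of the snippet above).
# Takes an attribute name and an ISO-8859-1 byte string and returns
# one LDIF attribute line, folded with "\n " continuation lines.
sub ldif_line {
    my ($att, $val) = @_;
    my $line;

    # Base64 is needed if the value contains NUL, CR, LF or any byte
    # above 0x7F, starts with space, colon or '<', or ends with a space.
    if (   $val =~ /[^\x01-\x09\x0B\x0C\x0E-\x7F]/
        || $val =~ /^[ :<]/
        || $val =~ / \z/)
    {
        # Latin1 bytes -> Perl characters -> UTF-8 bytes
        my $utf8 = encode('UTF-8', decode('ISO-8859-1', $val));
        # second argument "" keeps encode_base64 from adding newlines
        $line = $att . ':: ' . encode_base64($utf8, '');
    }
    else {
        $line = $att . ': ' . $val;
    }

    # Fold: first line up to 76 characters, continuation lines start
    # with a single space followed by up to 75 more characters.
    my $out = substr($line, 0, 76, '');   # 4-arg substr cuts the piece out
    while (length $line) {
        $out .= "\n " . substr($line, 0, 75, '');
    }
    return $out;
}

print ldif_line('cn', 'Paul Gillingwater'), "\n";
print ldif_line('sn', "\xF8"), "\n";   # Latin1 o-slash, gets Base64 encoded
```

Folding happens after encoding, so only ASCII text is ever split and a
multi-byte UTF-8 character can't be cut in half.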

If your Perl doesn't contain those modules, you can install them (as root) from 
CPAN using:

perl -MCPAN -e shell
cpan> install MIME::Base64
cpan> install Unicode::String

*********************************
        Paul Gillingwater
        Managing Director
 CSO Lanifex Unternehmensberatung 
 & Softwareentwicklung G.m.b.H.
      NEW BUSINESS CONCEPTS

E-mail:  paul@lanifex.com
Teleph:  +43/1/2198222
Mobile:  +43/699/1922 3085
Webhome: http://www.lanifex.com/
Address: Praterstrasse 60/1/2 
         A-1020 Vienna, Austria
*********************************
