
Re: strange (swedish) characters



Quoting Markus Jardemalm <markus.jardemalm@enea.se>:
> Since UTF-8 encoded ISO 10646-1 is Unicode my characters is in the
> "Latin-1 Supplement". How do I tell openldap (ldapadd) about what Code
> Chart I'm using so it won't complain about my characters?
 
> >$ldapadd -x -D "cn=Manager,o=myorg,c=SE" -W -f my_entry.ldif 
> >Enter LDAP Password: 
> >adding new entry "cn=my_name,o=myorg,c=SE"
> >ldap_add: Invalid syntax
> >        additional info: value contains invalid data
> 
> Any ideas about getting these characters in the ldap database?

Assuming you're loading from an LDIF file: I first run the LDIF file through a 
Perl script I wrote that converts from ISO 8859-1 (Latin-1) to UTF-8, using 
this snippet:

use MIME::Base64;
use Unicode::String;

sub utfencode {
  my ($att,$val) = @_;
  if ($val =~ /[\x80-\xFF]/) {
    # the input bytes are ISO 8859-1, so build the string with latin1()
    my $u = Unicode::String::latin1($val);
    # the empty second argument stops encode_base64 from adding newlines,
    # which would break LDIF values longer than 76 characters
    $val = encode_base64($u->utf8, "");
    # a double colon tells LDIF that the value is Base64-encoded
    return $att . ":: " . $val;
  } else {
    return $att . ": " . $val;
  }
}

This takes the attribute (like CN) as the first parameter and the value as the 
second, then tests whether the value contains a non-ASCII character (a byte 
with hex value > 0x7f).  If so, we assume the value is ISO 8859-1, convert it 
to UTF-8 (Unicode), then encode the result in Base64.  Note that in LDIF any 
Base64-encoded value is introduced by two colons instead of one.
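
If Unicode::String is not installed, roughly the same conversion can be 
sketched with the core Encode module.  This is only an illustration, not the 
script I actually use: the names ldif_line and convert_line are made up for 
this example, and it assumes every input byte is ISO 8859-1 and that each 
"attr: value" pair sits on a single, unfolded line.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: build one LDIF line from an attribute and a value.
sub ldif_line {
    my ($att, $val) = @_;
    # Pure ASCII values pass through unchanged, with a single colon.
    return $att . ': ' . $val unless $val =~ /[\x80-\xFF]/;
    # Assumption: the bytes are ISO 8859-1; decode, then re-encode as UTF-8.
    my $utf8 = encode('UTF-8', decode('ISO-8859-1', $val));
    # The empty second argument suppresses Base64 line breaks.
    return $att . ':: ' . encode_base64($utf8, '');
}

# Hypothetical helper: convert one chomped line of an LDIF file.
sub convert_line {
    my ($line) = @_;
    # Match "attr: value"; skip "attr:: ..." (already Base64) and "attr:< url".
    if ($line =~ /^([A-Za-z][A-Za-z0-9;.-]*):(?![:<]) *(.*)$/) {
        return ldif_line($1, $2);
    }
    return $line;   # comments, blank lines and continuation lines
}

# "J\xF6rgen" is "Jörgen" written as ISO 8859-1 bytes.
print convert_line("cn: J\xF6rgen"), "\n";   # prints: cn:: SsO2cmdlbg==
print convert_line('sn: Svensson'), "\n";    # prints: sn: Svensson
```

To filter a whole file, convert_line could be called on each chomped line of 
input and the result written to a new LDIF file for ldapadd.  Folded 
(continuation) lines are passed through untouched here, so a value that is 
wrapped across lines would need to be unfolded first.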

Hope this helps!

*********************************
        Paul Gillingwater
        Managing Director
 CSO Lanifex Unternehmensberatung 
 & Softwareentwicklung G.m.b.H.
      NEW BUSINESS CONCEPTS

E-mail:  paul@lanifex.com
Mobile:  +43/699/1922 3085
Webhome: http://www.lanifex.com
Address: Praterstrasse 60/1/2 
         A-1020 Vienna, Austria
*********************************