Issue 890 - OpenLDAP 2.0.6 -> 2.0.7 problem
Summary: OpenLDAP 2.0.6 -> 2.0.7 problem
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2000-11-14 15:52 UTC by chabrol@webonomics.de
Modified: 2014-08-01 21:06 UTC
0 users

See Also:


Attachments

Description chabrol@webonomics.de 2000-11-14 15:52:28 UTC
Full_Name: Daniel Chabrol
Version: 2.0.7
OS: Linux (2.2.16)
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (195.226.109.184)


Hello!

I upgraded from 2.0.6 to 2.0.7, but now I'm unable to import my old
LDIF exports. I receive the following error in the syslog:
mail slapd[8264]: conn=1 op=10 RESULT tag=105 err=21 text=value contains invalid data
After some tests I noticed that this is caused by "special" characters like the
German umlauts (äöüÄÖÜ) in cn and sn attributes (objectclass person). Even with
the same configuration I used with 2.0.6, it still doesn't work. What's the
difference? And how can I fix this problem so that I can import my old
directory data?

Daniel
Comment 1 Julio Sanchez 2000-11-14 20:19:45 UTC

chabrol@webonomics.de wrote:

> What's the difference?

The code is becoming more and more RFC-compliant, and in some cases that
compliance breaks uses that were common in the past but were always
dubious, if not plain wrong.

I think you used to have your special characters encoded as ISO 8859-1.
You cannot do that; they have to be in UTF-8 (an encoding of Unicode
and ISO 10646).
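As a minimal illustration of the point above (a Python sketch, not slapd's actual validation code): the umlaut "ö" is a single byte in ISO 8859-1 but two bytes in UTF-8, and that single ISO 8859-1 byte is not well-formed UTF-8. The sample name is only an example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

name = "Klaus Tröndle"                     # illustrative value with an umlaut
latin1_bytes = name.encode("iso-8859-1")  # 'ö' becomes the single byte 0xF6
utf8_bytes = name.encode("utf-8")         # 'ö' becomes the two bytes 0xC3 0xB6

print(latin1_bytes, is_valid_utf8(latin1_bytes))  # b'Klaus Tr\xf6ndle' False
print(utf8_bytes, is_valid_utf8(utf8_bytes))      # b'Klaus Tr\xc3\xb6ndle' True
```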

> And how can I fix this problem to be able to import my old
> directory-data?

Sorry, I think you will have to translate them.  And your old
applications will have to be reconfigured for UTF-8.

Sucks, I know, but OpenLDAP 2.x follows LDAPv3 and that mandates
UTF-8.

Julio
Comment 2 chabrol@webonomics.de 2000-11-14 22:34:49 UTC
Hello!

> I think you used to have your special characters coded as ISO 8859-1.
> You cannot do that, they have to be in UTF-8 (an encoding of Unicode
> and ISO-10646).

The umlauts are normal ISO 8859-1 (ISO Latin 1) characters.

> > And how can I fix this problem to be able to import my old
> > directory-data?
> Sorry, I think you will have to translate them.  And your old
> applications will have to be reconfigured for UTF-8.

attributetype ( 2.5.4.3 NAME ( 'cn' 'commonName' ) SUP name )
attributetype ( 2.5.4.4 NAME ( 'sn' 'surname' ) SUP name )
attributetype ( 2.5.4.41 NAME 'name'
        EQUALITY caseIgnoreMatch
        SUBSTR caseIgnoreSubstringsMatch
        SYNTAX 1.3.6.1.4.1.1466.115.121.1.15{32768} )

According to the documentation, this syntax is a UTF-8 string, so it looks
like I have to convert it. But there is too much data for a manual
conversion. Is there a tool available to convert LDIF data from
ISO-LATIN-1 to UTF-8? And there is an additional problem: the attributes
containing ISO-LATIN-1 characters are "binary" (base64) encoded in the LDIF
data (for example cn:: S2xhdXMgVHL2bmRsZQ==). This data should be correctly
converted to UTF-8. But the attribute userPassword looks like it is a
normal octet string:

attributetype ( 2.5.4.35 NAME 'userPassword'
        EQUALITY octetStringMatch
        SYNTAX 1.3.6.1.4.1.1466.115.121.1.40{128} )

I suppose that if this data is also converted to UTF-8, it will break
user authentication. So the tool should leave the userPassword
attribute untouched. Does somebody have a solution or tip?
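The `::` after `cn` marks a base64-encoded value (RFC 2849), so the character-set conversion has to be applied to the decoded bytes, not to the LDIF text itself. A minimal Python sketch using the sample value above, assuming the stored bytes are ISO 8859-1:

```python
import base64

# Value copied from this report: "cn:: S2xhdXMgVHL2bmRsZQ==".
old_value = "S2xhdXMgVHL2bmRsZQ=="

raw = base64.b64decode(old_value)    # the stored bytes
text = raw.decode("iso-8859-1")      # interpret them as ISO 8859-1
new_raw = text.encode("utf-8")       # re-encode the text as UTF-8
new_value = base64.b64encode(new_raw).decode("ascii")

print(raw)                   # b'Klaus Tr\xf6ndle'
print(text)                  # Klaus Tröndle
print("cn:: " + new_value)   # cn:: S2xhdXMgVHLDtm5kbGU=
```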

Daniel
  






Comment 3 Jean-Philippe Brunon 2000-11-15 08:14:09 UTC
chabrol@webonomics.de wrote:
Hi,

> According to the documentation, this syntax is a UTF-8 string, so it looks
> like I have to convert it. But there is too much data for a manual
> conversion. Is there a tool available to convert LDIF data from
> ISO-LATIN-1 to UTF-8?

    Try the 'iconv' utility: it supports conversion between many
character sets, including of course ISO-LATIN-1 and UTF-8.

> And there is an additional problem: the attributes
> containing ISO-LATIN-1 characters are "binary" (base64) encoded in the LDIF
> data (for example cn:: S2xhdXMgVHL2bmRsZQ==). This data should be correctly
> converted to UTF-8. But the attribute userPassword looks like it is a
> normal octet string:
> 
> attributetype ( 2.5.4.35 NAME 'userPassword'
>         EQUALITY octetStringMatch
>         SYNTAX 1.3.6.1.4.1.1466.115.121.1.40{128} )
> 
> I suppose that if this data is also converted to UTF-8, it will break
> user authentication. So the tool should leave the userPassword
> attribute untouched. Does somebody have a solution or tip?

    It should work as long as the passwords contain only US-ASCII
characters, since those are encoded identically in ISO-LATIN-1 and UTF-8.
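A quick Python check of this reasoning: decoding as ISO 8859-1 and re-encoding as UTF-8 leaves every US-ASCII byte unchanged but rewrites any byte of 0x80 or above. The password values below are hypothetical. ASCII-only passwords survive the conversion; non-ASCII ones would not, which is why leaving userPassword out of the conversion is the safer choice.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Decode bytes as ISO 8859-1 and re-encode the text as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")

# Hypothetical userPassword values, for illustration only.
ascii_password = b"s3cret-Passw0rd"
latin1_password = "geheimnisö".encode("iso-8859-1")

# Every US-ASCII byte (0x00-0x7F) maps to itself, so ASCII values survive.
print(latin1_to_utf8(ascii_password) == ascii_password)    # True
# A byte >= 0x80 becomes two bytes, so the stored value would change.
print(latin1_to_utf8(latin1_password) == latin1_password)  # False
```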

        Jean-Philippe.

-- 
| Jean-Philippe BRUNON - AURORA
Comment 4 Julio Sanchez 2000-11-15 09:23:56 UTC

Daniel Chabrol wrote:

> According to the documentation, this syntax is a UTF-8 string, so it looks
> like I have to convert it. But there is too much data for a manual
> conversion. Is there a tool available to convert LDIF data from
> ISO-LATIN-1 to UTF-8?

Not that I know of. I have a Perl script that does something related: it
takes an LDIF file and rewrites the suffix, translating both the
entry DN and DN-valued attributes. A very crude adaptation
to your problem is:

#!/usr/bin/perl

use strict;
use warnings;
use diagnostics;

use Net::LDAP::LDIF;
use Unicode::String qw(latin1);

my $debug = 0;

my %ds_syntax;

{
    # List here the attribute types with Directory String syntax,
    # in lower case; extend this list to match your schema.
    my @ds_syntax = qw(cn sn o ou);

    %ds_syntax = map { $_ => 1 } @ds_syntax;
}

# "tocho" is the input LDIF file; the converted copy goes to "tocho.new".
my $o_ldif = Net::LDAP::LDIF->new( "tocho", "r", onerror => 'die' );
my $n_ldif = Net::LDAP::LDIF->new( "tocho.new", "w", onerror => 'die' );

while( my $entry = $o_ldif->read_entry() ) {
    # Convert the entry DN from ISO 8859-1 to UTF-8
    my $dn = $entry->dn;
    print "Processing $dn\n" if $debug;
    $entry->dn(latin1($dn)->utf8);
    for my $at ($entry->attributes) {
        # print "Analyzing $at\n" if $debug;
        # Only listed attribute types are translated; everything else
        # (e.g. userPassword) is written back unchanged.
        if ($ds_syntax{lc($at)}) {
            my $vals = $entry->get_value($at, asref => 1);
            for my $i (0..$#$vals) {
                print "Replacing $$vals[$i] by " if $debug;
                $$vals[$i] = latin1($$vals[$i])->utf8;
                print "$$vals[$i]\n" if $debug;
            }
            $entry->replace($at, $vals);
        }
    }

    $n_ldif->write_entry($entry);
}
$o_ldif->done();
$n_ldif->done();
exit 0;

I have not even checked the script above for syntax errors; I just provide
it to show the basic technique, so be careful. Sorry for the
English/Spanish mix; I happen to write like that a lot.

> And there is an additional problem: the attributes
> containing ISO-LATIN-1 characters are "binary" (base64) encoded in the LDIF
> data (for example cn:: S2xhdXMgVHL2bmRsZQ==). This data should be correctly
> converted to UTF-8.

No problem; Net::LDAP::LDIF should decode it correctly.
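For readers not using Perl, the two LDIF details that Net::LDAP::LDIF takes care of here are line folding and base64 (`::`) values. The following is a minimal Python sketch of just those two rules from RFC 2849, not a full LDIF parser (no URL values, no change records); the function names are illustrative.

```python
import base64

def unfold(ldif_text: str) -> list[str]:
    """Join RFC 2849 continuation lines (a line starting with one space)."""
    lines: list[str] = []
    for line in ldif_text.splitlines():
        if line.startswith(" ") and lines:
            lines[-1] += line[1:]      # drop only the single leading space
        else:
            lines.append(line)
    return lines

def parse_attr_line(line: str) -> tuple[str, bytes]:
    """Split 'attr: value' or 'attr:: base64' into (attr, raw bytes).

    Assumes the file was read with encoding="iso-8859-1", so each byte is one
    character. URL values ('attr:< ...') are not handled in this sketch.
    """
    attr, sep, rest = line.partition(":")
    if not sep:
        raise ValueError(f"not an attribute line: {line!r}")
    if rest.startswith(":"):                       # 'attr:: ...' is base64
        return attr, base64.b64decode(rest[1:].strip())
    if rest.startswith("<"):
        raise ValueError("URL values are not supported in this sketch")
    return attr, rest.lstrip(" ").encode("iso-8859-1")

sample = "cn:: S2xhdXMgVHL2\n bmRsZQ==\nsn: Troendle\n"
for line in unfold(sample):
    print(parse_attr_line(line))
```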

> But the attribute userPassword looks like it is a
> normal octet string:
> 
> attributetype ( 2.5.4.35 NAME 'userPassword'
>         EQUALITY octetStringMatch
>         SYNTAX 1.3.6.1.4.1.1466.115.121.1.40{128} )
> 
> I suppose that if this data is also converted to UTF-8, it will break
> user authentication. So the tool should leave the userPassword
> attribute untouched. Does somebody have a solution or tip?

Keep a list of the attribute types to translate, as I did above; anything
not in the list, such as userPassword, is left untouched.
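The allow-list approach can be sketched in Python as follows. The entry representation (a list of attribute/bytes pairs), the function names and the attribute list are assumptions made for this example; as in the Perl script, the DN is converted too, and anything not listed, such as userPassword, is passed through as-is.

```python
# Hypothetical list of Directory String attribute types to transcode;
# extend it to match your schema.
DIRECTORY_STRING_ATTRS = {"cn", "sn", "o", "ou", "givenname", "description"}

def base_type(attr: str) -> str:
    """Lower-case attribute type without options, e.g. 'CN;lang-de' -> 'cn'."""
    return attr.split(";", 1)[0].lower()

def convert_entry(entry: list[tuple[str, bytes]]) -> list[tuple[str, bytes]]:
    """Transcode the DN and listed attributes from ISO 8859-1 to UTF-8."""
    converted = []
    for attr, value in entry:
        if attr.lower() == "dn" or base_type(attr) in DIRECTORY_STRING_ATTRS:
            value = value.decode("iso-8859-1").encode("utf-8")
        converted.append((attr, value))   # unlisted attributes pass through
    return converted

entry = [
    ("dn", b"cn=Klaus Tr\xf6ndle,o=Example"),   # hypothetical DN
    ("cn", b"Klaus Tr\xf6ndle"),
    ("userPassword", b"geheim\xf6"),            # must not be touched
]
for attr, value in convert_entry(entry):
    print(attr, value)
```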

Hope this helps,

Julio
Comment 5 chabrol@webonomics.de 2000-11-15 10:42:31 UTC
Hi!

>     Try the 'iconv' utility: it supports conversion between many
> character sets, including of course ISO-LATIN-1 and UTF-8.

iconv would work if I could find a tool that converts the "binary" (base64)
LDIF encoding (like cn:: S2xhdXMgVHL2bmRsZQ) into normal characters. Is there
such a tool available? If so, I can convert this "special encoding" to normal
ISO-LATIN-1 characters and then convert those to UTF-8 with iconv.
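One caveat with decoding the base64 values to plain text and then running iconv: RFC 2849 only allows plain `attr: value` lines for ASCII "safe strings", so a value containing UTF-8 umlauts should be written back base64-encoded. Some tools may be more lenient, but base64 is the portable choice. A minimal Python sketch of that rule (function names are illustrative):

```python
import base64

def is_safe_string(value: bytes) -> bool:
    """RFC 2849 SAFE-STRING test, plus the 'no trailing space' recommendation."""
    if not value:
        return True
    if value[0] in b" :<":                 # not a SAFE-INIT-CHAR
        return False
    if value.endswith(b" "):               # SHOULD be base64-encoded
        return False
    return all(0 < b < 128 and b not in (0x0A, 0x0D) for b in value)

def ldif_line(attr: str, value: bytes) -> str:
    """Format one attribute line, base64-encoding values that are not safe."""
    if is_safe_string(value):
        return f"{attr}: {value.decode('ascii')}"
    return f"{attr}:: {base64.b64encode(value).decode('ascii')}"

print(ldif_line("sn", b"Troendle"))                        # sn: Troendle
print(ldif_line("cn", "Klaus Tröndle".encode("utf-8")))   # cn:: S2xhdXMgVHLDtm5kbGU=
```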

Daniel

Comment 6 chabrol@webonomics.de 2000-11-15 19:18:22 UTC
Hello!

> Not that I know of. I have a Perl script that does something related: it
> takes an LDIF file and rewrites the suffix, translating both the
> entry DN and DN-valued attributes.

With your Perl script I was able to convert some attributes to UTF-8. The
import worked perfectly. Thank you for your help!

Daniel

Comment 7 Julio Sanchez 2000-11-16 13:07:08 UTC

Daniel Chabrol wrote:
> 
> With your Perl script I was able to convert some attributes to UTF-8. The
> import worked perfectly. Thank you for your help!

Glad to know. It may not work for everyone, though: depending on the
version of perl-ldap, the get/get_value methods in Net::LDAP::Entry
vary in availability and semantics...

Terrific modules anyway.  All praise Graham Barr...

Julio
Comment 8 Kurt Zeilenga 2000-12-27 11:07:26 UTC
changed notes
changed state Open to Closed
Comment 9 OpenLDAP project 2014-08-01 21:06:12 UTC
discussion