
Re: migrating from 2.0 to 2.1



Anders Bruun Olsen writes:
>> Messy.  The DN is latin-1, but the CN is UTF-8.
>> You need to convert the DN to utf-8 (after base64-decoding it).
> 
> That would be quite a good solution.. but how do I do that?

Depends on what OS and which programs you have installed.
This Perl script should do the trick, if you have a recent enough
Perl (or have installed MIME::Base64 "by hand").

As written, the script only converts latin-1 DNs to UTF-8.
If you have other latin-1 data as well which should be UTF-8,
then:

  If you replace 'if (0 || ...)' with 'if (1 || ...)', it will
  convert every value which looks like latin-1 to UTF-8.
  If you have both binary and textual attributes, that could
  mangle the binary values, so you'll instead need to use
  'if ($line =~ /^(dn|attrname1|attrname2):/is)'
  which names the specific textual attributes which may contain
  latin-1.
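
For illustration, here is a small stand-alone sketch of that
attribute-specific test. The attribute names (cn, description) and the
helper name wants_conversion are made up for the example; substitute
the textual attributes you actually have.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical attribute names; replace with your own textual attributes.
my @textual = qw(dn cn description);
my $names = join '|', map { quotemeta($_) } @textual;

# True if a logical LDIF line belongs to one of the named attributes.
# Matches both "cn: value" and the base64 form "cn:: value".
sub wants_conversion {
    my ($line) = @_;
    return $line =~ /^($names):/is ? 1 : 0;
}

# Sample lines: a base64 DN, a plain latin-1 CN, and a binary attribute.
for my $l ("dn:: Y249SvhyZ2Vu\n", "cn: J\xF8rgen\n", "jpegPhoto:: /9j/4AAQ\n") {
    my ($attr) = $l =~ /^([^:]+)/;
    printf "%-10s %s\n", $attr, wants_conversion($l) ? "convert" : "leave alone";
}
```

Only dn and cn are selected here; jpegPhoto is left untouched because
it is not in the list.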

#!/usr/bin/perl -w
use strict;
use MIME::Base64;

my $line = "";
while (<>) {
    print_line() unless /^ /s;
    $line .= $_;
}
print_line();

sub print_line {
    if (0 || $line =~ /^(dn):/is) {
	# Remove continuation line separators
	$line =~ s/\n //g;
	# Decode base64
	my $was_b64 = '';
	$line =~ s/^([-.\;\w]+:): *(.*)(?=\n\z)/$1." ".decode_base64($2)/es
	    and $was_b64 = '\s.*|';

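	# Convert non-UTF-8 - assumed to be Latin-1 - to UTF-8.
	# Non-UTF-8 means: a lead byte not followed by a continuation byte,
	# or a continuation byte not preceded by another high byte.
	if ($line =~ /[\300-\377](?![\200-\277])|(?<![\200-\377])[\200-\277]/) {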
	    $line =~ s/([\200-\377])/
		pack('CC', 0xC0 + (ord($1) >> 6), (ord($1) & 0xBF)) /ge;
	}

	# Convert back to base64
	$line =~ s/^([^\n:]+:) ($was_b64.*?[\0\n\r].*)(?=\n\z)/
	    $1 . ": " . Base64($2) /e;
    }
    print $line;
    $line = "";
}

sub Base64 {
    my $b64 = encode_base64($_[0]);
    $b64 =~ tr/ \n\r//d;
    $b64;
}
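
To see what the conversion step does to a single value, here is a
small sketch which is separate from the script above. The helper name
latin1_to_utf8 and the sample DN are made up for the example; the
arithmetic is the same as in the script, just written with explicit
bit masks.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: convert a Latin-1 byte string to UTF-8 bytes.
# Each byte 0x80-0xFF becomes two bytes: 0xC2 or 0xC3 (from the top
# two bits), followed by 0x80-0xBF (from the low six bits).
sub latin1_to_utf8 {
    my ($s) = @_;
    $s =~ s/([\200-\377])/
        pack('CC', 0xC0 | (ord($1) >> 6), 0x80 | (ord($1) & 0x3F)) /ge;
    return $s;
}

# "cn=J\xF8rgen" is a sample DN in Latin-1; the byte F8 becomes C3 B8.
my $utf8 = latin1_to_utf8("cn=J\xF8rgen");
print join(' ', map { sprintf('%02X', ord($_)) } split(//, $utf8)), "\n";
# prints: 63 6E 3D 4A C3 B8 72 67 65 6E
```

The script itself reads LDIF from the files named on the command line
(or from standard input) and writes the result to standard output, so
it would be run as something like 'perl fixdn.pl old.ldif > new.ldif',
where all three file names are placeholders.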