Re: Rejected NLS in DN

Kjell-Einar Ofstad writes:
>On Wed, 2004-04-14 at 09:36, Pierangelo Masarati wrote:
>> ldapsearch is a "dumb" client, what it does is return data in a format
>> that complies to LDIF and is safe (e.g. doesn't trigger any fancy
>> terminal feature by streaming out fancy values).  
> Well, personally I don't think that national characters are "fancy
> features"... :-) 

They are, compared to ldapsearch's simple function of just dumping
directory data to standard output.  If it were to convert to your
character set, it would among other things need to know the syntaxes of
all your attributes, and in some cases decode the attribute values,
convert the utf-8 portions to latin-1 or whatever, and assemble them
again.
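
The decode, convert and reassemble steps described above might look
roughly like this for a single LDIF line.  This is only a sketch under
stated assumptions: the helper name ldif_line_to_latin1 is made up for
illustration, it relies on Perl's core Encode module rather than on
anything ldapsearch itself does, and it sidesteps the attribute-syntax
problem by treating every base64 value as UTF-8 text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode encode FB_CROAK);

# Hypothetical helper: take one unfolded, chomped LDIF line and return it
# with a base64 value decoded and, when possible, converted to Latin-1.
# Lines that are not of the form "attr:: base64" are returned unchanged.
sub ldif_line_to_latin1 {
    my ($ldif_line) = @_;
    my ($attr, $b64) = $ldif_line =~ /^([-.;\w]+):: *(\S+)$/
        or return $ldif_line;
    my $bytes = decode_base64($b64);

    # Strict decoding dies on invalid UTF-8, and strict encoding dies on
    # characters Latin-1 cannot represent.  In both cases keep the raw
    # bytes.  Work on a copy, since decode() with a CHECK argument may
    # modify its input.
    my $latin1 = eval {
        my $copy  = $bytes;
        my $chars = decode("UTF-8", $copy, FB_CROAK);
        encode("ISO-8859-1", $chars, FB_CROAK);
    };
    return "$attr: " . (defined $latin1 ? $latin1 : $bytes);
}

# "w4VzZQ==" is the base64 form of the UTF-8 bytes C3 85 73 65.
print ldif_line_to_latin1("cn:: w4VzZQ=="), "\n";
```

Falling back to the undecoded bytes keeps binary values and
non-Latin-1 text intact, at the price of mixed encodings in the output.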

>> ldapsearch can return the data in the original format on a
>> file; try playing with -t, or write your own client.
> Thanks! At least the -t outputs the data in a readable form :-).

It's not very hard to decode base64, anyway.
Or to convert from utf-8 to latin-1, for that matter.
I use the following.

Remove the section starting with $no_convert if you have a utf-8
character set; it's Very Wrong but conveniently lazy to mix latin-1
output and utf-8 output like this script does.  Or to assume that all
LDAP data are utf-8, for that matter.  But it works with our server,
which only contains utf-8 data converted from latin-1.

#!/usr/bin/perl -w
# Perform ldapsearch.  Add '-x' option for anonymous search, and maybe '-LLL'.
# Decode base64 output.
# Assume UTF-8 output and convert each line to latin-1 when possible,
# leave lines with the original encoding when conversion is not possible.

use strict;
use MIME::Base64;

unshift(@ARGV, "-x")	unless grep /^-\w*([DORUwXyY]$|[IkKQWx])/, @ARGV;
unshift(@ARGV, "-LLL")	unless grep /^-\w*L/, @ARGV;

open(SEARCH, "-|") or exec "ldapsearch", @ARGV or die "ldapsearch: $!\n";
my $line = "";
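while (<SEARCH>) {
    # A line that does not start with a space begins a new LDIF line,
    # so whatever has been collected so far is complete.
    print_line() unless /^ /s;
    $line .= $_;
}
print_line();   # flush the last collected line
close(SEARCH);

sub print_line {
    $line =~ s/\n //g;   # undo LDIF line folding
    # Turn "attr:: base64" into "attr: decoded value".
    $line =~ s/^([-.\;\w]+:): *(.*)(?=\n\z)/$1 . " " . decode_base64($2)/es;

    # Only the lead bytes 0xC2 and 0xC3 followed by exactly one
    # continuation byte encode a character that exists in latin-1.
    my $no_convert;
    (my $latin1 = $line) =~ s/([\200-\377][\200-\277]*)/do {
        my($first, @rest) = unpack('C*', $1);
        $first -= 0xC2;
        $no_convert ||= (@rest != 1 || ($first & ~1));
        $no_convert ? "" : chr($first * 0x40 + $rest[0]);
    }/eg;
    $line = $latin1 unless $no_convert;

    print $line;
    $line = "";
}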
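
The byte arithmetic in print_line can be checked in isolation.  The
sketch below pulls it out into a standalone helper; the name
utf8_pair_to_latin1 is hypothetical and not part of the script above,
and the explicit range checks are an addition for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the arithmetic in print_line: map the
# two-byte UTF-8 form of U+0080..U+00FF to a single Latin-1 character.
# Returns undef for anything else.
sub utf8_pair_to_latin1 {
    my ($pair) = @_;
    my ($first, @rest) = unpack('C*', $pair);
    # Only lead bytes 0xC2 and 0xC3 encode code points below U+0100.
    return unless @rest == 1 && ($first == 0xC2 || $first == 0xC3);
    return unless $rest[0] >= 0x80 && $rest[0] <= 0xBF;
    # 0xC2 contributes 0 and 0xC3 contributes 0x40; the continuation
    # byte 0x80..0xBF supplies the rest of the code point.
    return chr(($first - 0xC2) * 0x40 + $rest[0]);
}

printf "%02X\n", ord(utf8_pair_to_latin1("\xC3\xA5"));  # prints E5
printf "%02X\n", ord(utf8_pair_to_latin1("\xC2\xA0"));  # prints A0
```

Any other lead byte, or any number of continuation bytes other than
one, means the character has no latin-1 equivalent, which is the case
the script records in $no_convert.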