Re: [LDAP] Using foreign charsets / adding entries base64-encoded
Andreas Kotes wrote:
> Hi!
>
> When I add entries containing umlauts the server checks them in fine, and
> ldapsearch gets them fine, too. But when I try to look up the records with
> Netscape I get a "?" for each umlaut, rendering the result unusable.
>
> How can I tell the server to use latin1-charset, or how can I enter
> base64-encoded data, especially using the Net::LDAPapi-Module for Perl?
> (I think this should work the same way from C) ... I already had a look at
> draft-good-ldap-ldif-00.txt and this helped somehow, but I'm quite unsure
> if this would make any difference for Netscape ... ?
>
> the Count
>
> --
> Andreas Kotes - mailto:count@linux.de - If you need any help, just ask!
> -= "Free speech not only lives, it rocks!" --Oprah Winfrey -=-
> -= Commercial use of my email address NOT allowed. PGP key available. =-
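Sure: if you store Latin-1 you get Latin-1 back, but, as you have seen, the
problem is interoperability. LDAPv3 defines directory strings as UTF-8, so a
UTF-8-aware client such as Netscape presumably decodes the stored bytes as
UTF-8, and a lone Latin-1 umlaut byte is not valid UTF-8.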
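To make the interoperability problem concrete: in Latin-1 the "ü" of "Müller" is the single byte 0xFC, which can never start a UTF-8 sequence, whereas UTF-8 encodes it as the two bytes 0xC3 0xBC. The following C sketch (is_valid_utf8 is a hypothetical helper, not part of any LDAP API) performs a structural UTF-8 check, using the RFC 3629 limits rather than the older RFC 2044 ones, and reports the Latin-1 form as invalid.

```c
#include <stddef.h>

/* Structural UTF-8 check (a sketch): every lead byte must be followed
   by the right number of 10xxxxxx continuation bytes. Overlong 3- and
   4-byte forms, surrogates and code points above U+10FFFF are not
   rejected here. Returns 1 if valid, 0 otherwise. */
int is_valid_utf8 (const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        size_t len;
        if (b < 0x80)      len = 1;   /* ASCII */
        else if (b < 0xC2) return 0;  /* stray continuation or overlong 2-byte lead */
        else if (b < 0xE0) len = 2;
        else if (b < 0xF0) len = 3;
        else if (b < 0xF5) len = 4;
        else               return 0;  /* 0xF5-0xFF never start a character */
        if (len > n - i) return 0;    /* sequence truncated at end of buffer */
        for (size_t k = 1; k < len; k++)
            if ((s[i + k] & 0xC0) != 0x80) return 0;
        i += len;
    }
    return 1;
}
```

For example, is_valid_utf8((const unsigned char *) "M\xFCller", 6) returns 0 because of the 0xFC byte, while the UTF-8 form "M\xC3\xBCller" (7 bytes) returns 1.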
You should convert your data to UTF-8 before you load it with ldapadd (or
ldapmodify). Converting it to base64 wouldn't help much: base64 only changes
how the bytes are written in the LDIF file, not the bytes the server stores.
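The base64 point can be checked directly. In LDIF (draft-good-ldap-ldif, later published as RFC 2849) a value is written as "attr:: <base64>" when it contains bytes outside the safe ASCII range; the base64 text is only a transport wrapping of the same bytes, so a Latin-1 "ü" still reaches the server as 0xFC. The C sketch below, with hypothetical names ldif_safe and ldif_attr_line (not part of any LDAP library), emits one LDIF attribute line and switches to base64 only when needed; folding of long lines is omitted.

```c
#include <string.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* 1 if the value may be written as "attr: value" (a SAFE-STRING in the
   RFC 2849 sense, plus its advice to base64-encode values that end in
   a space), 0 if it must be written as "attr:: base64". */
static int ldif_safe (const unsigned char *v, size_t n)
{
    if (n == 0) return 1;
    if (v[0] == ' ' || v[0] == ':' || v[0] == '<') return 0;
    if (v[n - 1] == ' ') return 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] == 0 || v[i] == '\n' || v[i] == '\r' || v[i] >= 0x80)
            return 0;
    return 1;
}

/* Write one LDIF attribute line into out, which must hold at least
   strlen(attr) + 4 * ((n + 2) / 3) + 4 bytes. */
void ldif_attr_line (char *out, const char *attr,
                     const unsigned char *v, size_t n)
{
    size_t i;
    char *p = out;
    strcpy (p, attr);
    p += strlen (attr);
    *p++ = ':';
    if (ldif_safe (v, n)) {
        *p++ = ' ';
        memcpy (p, v, n);
        p += n;
    } else {
        *p++ = ':';
        *p++ = ' ';
        for (i = 0; i + 2 < n; i += 3) {      /* full 3-byte groups */
            unsigned long w = (unsigned long) v[i] << 16
                            | (unsigned long) v[i + 1] << 8 | v[i + 2];
            *p++ = B64[(w >> 18) & 0x3F];
            *p++ = B64[(w >> 12) & 0x3F];
            *p++ = B64[(w >> 6) & 0x3F];
            *p++ = B64[w & 0x3F];
        }
        if (i < n) {                           /* 1 or 2 bytes left over */
            unsigned long w = (unsigned long) v[i] << 16;
            if (i + 1 < n) w |= (unsigned long) v[i + 1] << 8;
            *p++ = B64[(w >> 18) & 0x3F];
            *p++ = B64[(w >> 12) & 0x3F];
            *p++ = (i + 1 < n) ? B64[(w >> 6) & 0x3F] : '=';
            *p++ = '=';
        }
    }
    *p = '\0';
}
```

For example, the UTF-8 value "M\xC3\xBCller" gives "cn:: TcO8bGxlcg==", while the Latin-1 value "M\xFCller" gives "cn:: TfxsbGVy": both are valid LDIF, but only the first decodes to UTF-8.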
I produced the LDIF file with a Perl script and then piped it through the
following program:
=================================================================
/* Read Latin-1 (ISO-8859-1) characters from stdin, convert them
   to UTF-8, and write the converted characters to stdout.
   UTF-8 is defined by RFC 2044.
*/
#include <stdio.h>

int
main (int argc, char** argv)
{
    int c;
    while ((c = getchar ()) != EOF) {
        if ((c & 0x80) == 0) {
            putchar (c);            /* ASCII (0x00-0x7F) is copied unchanged */
        } else {
            /* 0x80-0xFF becomes two bytes: 110000xx 10xxxxxx */
            putchar (0xC0 | (0x03 & (c >> 6)));
            putchar (0x80 | (0x3F & c));
        }
    }
    if (ferror (stdin)) {           /* getchar() stopped on a read error */
        perror (argv[0]);
        return 1;
    }
    return 0;
}
=================================================================
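Since the question also asks about doing this from C, the same conversion can be applied to an in-memory buffer before a value is handed to the LDAP API, instead of piping LDIF through a filter. This is a minimal sketch with a hypothetical function name, latin1_to_utf8. Worked through for "ü": 0xFC >> 6 = 3, so the first byte is 0xC0 | 3 = 0xC3; 0xFC & 0x3F = 0x3C, so the second byte is 0x80 | 0x3C = 0xBC.

```c
#include <stddef.h>

/* Convert n Latin-1 bytes from in to UTF-8 in out, which must have
   room for 2 * n bytes. Returns the number of bytes written.
   Same bit arithmetic as the filter above: 0x00-0x7F are copied,
   0x80-0xFF become 110000xx 10xxxxxx. */
size_t latin1_to_utf8 (const unsigned char *in, size_t n, unsigned char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[j++] = c;
        } else {
            out[j++] = (unsigned char) (0xC0 | (c >> 6));   /* 0xC2 or 0xC3 */
            out[j++] = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
    return j;
}
```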
Just for the sake of completeness, here is the reverse conversion:
==================================================================
/* Read UTF-8 characters from stdin, convert them to Latin-1
   (ISO-8859-1), and write the converted characters to stdout.
   UTF-8 is defined by RFC 2044.
*/
#include <stdio.h>

/* A map from the most-significant 6 bits of the first byte
   to the total number of bytes in a UTF-8 character
   (0 marks a continuation byte, which cannot start a character).
*/
static const char UTF8len[64] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* erroneous */
    2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6};

int
main (int argc, char** argv)
{
    int c;
    while ((c = getchar ()) != EOF) {
        int len = UTF8len [(c >> 2) & 0x3F];
        unsigned long u = 0;
        int bad = 0;        /* set when the byte sequence is malformed */
        switch (len) {
        case 6: u = c & 0x01; break;
        case 5: u = c & 0x03; break;
        case 4: u = c & 0x07; break;
        case 3: u = c & 0x0F; break;
        case 2: u = c & 0x1F; break;
        case 1: u = c & 0x7F; break;
        case 0: /* erroneous: c is in the middle of a character;
                   skip up to 4 further continuation bytes */
            len = 5; bad = 1; break;
        }
        while (--len && (c = getchar ()) != EOF) {
            if ((c & 0xC0) == 0x80) {
                u = (u << 6) | (c & 0x3F);
            } else { /* unexpected start of a new character */
                ungetc (c, stdin);
                bad = 1;    /* the current character is truncated */
                break;
            }
        }
        if (c == EOF) break;
        if (!bad && u <= 0xFF) {
            putchar ((int) u);
        } else { /* malformed, or not representable in Latin-1 */
            putchar ('?'); /* a reasonable alternative is 0x1A (SUB) */
        }
    }
    if (ferror (stdin)) {   /* getchar() stopped on a read error */
        perror (argv[0]);
        return 1;
    }
    return 0;
}
==================================================================
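For use inside a C client (for example on values returned by a search), the same decoding can be done on a buffer. This is a simplified sketch with a hypothetical function name, utf8_to_latin1: since Latin-1 only covers U+0000 to U+00FF, it decodes only 2-byte sequences with lead bytes 0xC2 to 0xDF and replaces any other non-ASCII sequence with a single '?', the same substitution the filter above uses.

```c
#include <stddef.h>

/* Convert n UTF-8 bytes from in to Latin-1 in out (room for n bytes).
   Characters above U+00FF and malformed sequences become '?'.
   Returns the number of bytes written. */
size_t utf8_to_latin1 (const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, j = 0;
    while (i < n) {
        unsigned char c = in[i];
        if (c < 0x80) {                         /* ASCII */
            out[j++] = c;
            i++;
        } else if (c >= 0xC2 && c <= 0xDF && i + 1 < n
                   && (in[i + 1] & 0xC0) == 0x80) {
            /* 2-byte form 110xxxxx 10xxxxxx: U+0080 to U+07FF */
            unsigned u = ((c & 0x1Fu) << 6) | (in[i + 1] & 0x3Fu);
            out[j++] = (u <= 0xFF) ? (unsigned char) u : '?';
            i += 2;
        } else {
            /* longer, stray or truncated sequence: emit one '?' and
               skip the continuation bytes that follow the lead byte */
            out[j++] = '?';
            i++;
            while (i < n && (in[i] & 0xC0) == 0x80) i++;
        }
    }
    return j;
}
```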
For new implementations I plan to integrate the Unicode-String package into my
Perl installation, but I'm not finished with it yet... :-) :-)
Best regards
G. Baruzzi