
Re: Problems with case folding of UTF-8



On Sat, Dec 22, 2001 at 07:07:42PM +0100, Pierangelo Masarati wrote:
> Can you, Stig and Michael, provide a set of strings that do not
> work, so that I can try to see what's going on? I didn't have 
> any problems with few selected accented letters, while I found
> that strange folding with others so the code is not completely
> broken but there might be different subtleties here and there.

Okay, here is one that fails for me:

adding new entry "cn=Stig Venås, dc=my-domain,dc=com"
ldapadd: update failed: cn=Stig Venås, dc=my-domain,dc=com
ldap_add: Invalid DN syntax (34)
        additional info: invalid DN

The DN in base64 is Y249U3RpZyBWZW7DpXMsIGRjPW15LWRvbWFpbixkYz1jb20
Ã¥ is å (a with ring above), and it should still be one character
when normalized (still 2 bytes in UTF-8).
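
The byte counts can be checked with a short Python sketch. It is not part of OpenLDAP, and the helper name decode_dn is made up for illustration. It decodes the base64 DN quoted above (restoring the trailing '=' padding, which the quoted value lacks) and compares the character and byte lengths:

```python
import base64

# The DN exactly as quoted in the message (unpadded base64).
DN_B64 = "Y249U3RpZyBWZW7DpXMsIGRjPW15LWRvbWFpbixkYz1jb20"


def decode_dn(b64: str) -> bytes:
    """Hypothetical helper: base64-decode, restoring missing '=' padding."""
    return base64.b64decode(b64 + "=" * (-len(b64) % 4))


raw = decode_dn(DN_B64)   # raw UTF-8 bytes of the DN
dn = raw.decode("utf-8")  # the DN as a sequence of characters

print(dn)
print(len(dn), "characters,", len(raw), "bytes")

# U+00E5 (a with ring above) is one character but two bytes in UTF-8.
a_ring = "\u00e5"
print(len(a_ring), a_ring.encode("utf-8").hex())

# Reading the same bytes as Latin-1 gives the two-character display
# seen in the ldapadd output.
print(raw.decode("latin-1"))
```

The last line shows why the DN appears with two characters in place of å when the UTF-8 bytes are displayed by a Latin-1 terminal.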

I can see that UTF8normalize does the right thing:

Breakpoint 1, UTF8normalize (bv=0x80fbf00, casefold=1 '\001') at ucstr.c:9
(gdb) ins *bv
$1 = {bv_len = 11, bv_val = 0x80fbf10 "Stig Venås"}
and at the end:
(gdb) ins out
$2 = 0x80fbf20 "STIG VENÃ\205S"
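
To read that output: gdb shows byte 0xC3 as Ã and byte 0x85 as the octal escape \205. The sketch below, in Python and only for illustration, checks that these two bytes are the UTF-8 encoding of Å (U+00C5), the upper-case form of å. Python's str.upper() and NFC normalization are stand-ins here; they are assumptions for the example, not a description of what UTF8normalize does internally.

```python
import unicodedata

# The bytes behind gdb's display of "out": 0xC3 followed by octal \205.
out = b"STIG VEN\xc3\x85S"
assert 0o205 == 0x85  # gdb's \205 is octal for 0x85

folded = out.decode("utf-8")
print(folded)

# The two bytes C3 85 decode to the single character U+00C5.
print(hex(ord(folded[8])))

# Upper-casing the original value gives the same string (stand-in for
# the case folding; assumption, see the note above).
print(folded == "Stig Ven\u00e5s".upper())

# In a composed form such as NFC the letter stays one character, so
# the value is 10 characters and 11 bytes, matching bv_len = 11.
nfc = unicodedata.normalize("NFC", folded)
print(len(nfc), len(nfc.encode("utf-8")))
```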

backtrace:

(gdb) bt
#0  UTF8normalize (bv=0x80fbf00, casefold=1 '\001') at ucstr.c:214
#1  0x806989b in LDAPDN_rewrite (dn=0x80fbed0, flags=0) at schema_init.c:483
#2  0x80699c9 in dnNormalize (syntax=0x0, val=0x40f9d904,
    normalized=0x40f9d900) at schema_init.c:533
#3  0x805bf67 in dn_normalize (
    dn=0x80fbd40 "cn=Stig Venås, dc=my-domain,dc=com") at dn.c:261
#4  0x8052952 in do_add (conn=0x403659a8, op=0x80fb880) at add.c:83

Stig