
Re: ldap_explode_dn corrupts UTF-8 encoding (ITS#1890)



At 01:46 AM 2002-06-17, ps@psncc.at wrote:
>On Mon, 17 Jun 2002, Pierangelo Masarati wrote:
>> ps@psncc.at writes:
>>
>> > On Mon, 17 Jun 2002, Pierangelo Masarati wrote:
>> >
>> >> > OpenLDAP 2.1.2 seems to corrupt non-ASCII UTF-8 encoded characters.
>> >> > It actually turns unprintable chars (in the ASCII sense) into \<hexcode>.
>> >>
>> >> I think this is a leftover of when we decided to use UTF8 instead
>> >> of the '\' + HEXPAIR representation of non-ascii chars, and initially
>> >> it was intended; of course, when parsing a DN, one wants the correct
>> >> UTF8 encoding.
>> >
>> > Note that the problem does not exist in 2.0.23...
>>
>> DN parsing/handling has been completely rewritten
>>
>> >
>> > To further elaborate the problem: before passing the DN to the
>> > ldap_explode_dn function it is properly (UTF-8) encoded. Afterwards the DN
>> > parts aren't...
>
>Well, the code fragment that broke is:
>
>            char **exploded_dn, *dn;
>            LDAP *ld;
>            LDAPMessage *e;
>
>            [snip]
>
>            dn = ldap_get_dn(ld, e);
>            /* explode DN */
>            exploded_dn = ldap_explode_dn(dn, FALSE);
>
>
>Which is exactly what the man page for ldap_explode_dn suggests. And it is
>straightforward too.

Even in 2.0.23, the value returned by ldap_explode_dn() could
already contain escaped values!  Did your code consider this?

>> They are; but they're represented in another form that is allowed
>> for DNs; it depends on whether you like it or not.  I understand
>
>I just think it is not good to break existing functionality.
>It is mostly a matter of breaking things that used to work.

The change in behavior will not break any application which
is designed to properly handle RFC 2253 RDN strings, which
not only may contain escaped values, but, in some cases,
MUST contain escaped values.

Of those applications which fail to deal with escaped RDNs,
the new ldap_explode_dn() provides the behavior believed to
break the smallest number of them.  Since many applications
were designed in the days when DNs were restricted to ASCII,
the current version ensures that RDN strings only use ASCII
characters to represent values.

If you want access to unescaped values, use ldap_str2dn(3).
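
For code that has to keep calling ldap_explode_dn(), the escapes can
also be undone by hand. What follows is only a minimal sketch, not
anything taken from libldap: rdn_unescape() and hex_value() are
hypothetical helper names. It assumes the RFC 2253 backslash forms
only ('\' + HEXPAIR, or '\' + a single escaped character) and does
not handle quoted values or the '#' hex-string form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: numeric value of one hex digit, or -1. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper: decode RFC 2253 backslash escapes in an RDN
 * string.  "\XX" (two hex digits) becomes the single byte 0xXX, and
 * "\c" for any other character c becomes c itself.
 *
 * The result is written NUL-terminated into out, which holds
 * out_size bytes.  Returns the decoded length, or -1 if out is too
 * small or the input ends in a dangling backslash.
 */
long rdn_unescape(const char *in, char *out, size_t out_size)
{
    size_t len, i = 0, n = 0;

    assert(in != NULL && out != NULL);
    len = strlen(in);

    while (i < len) {
        unsigned char byte;

        if (in[i] == '\\') {
            int hi, lo;

            if (i + 1 >= len)
                return -1;              /* dangling backslash */
            hi = hex_value(in[i + 1]);
            lo = (i + 2 < len) ? hex_value(in[i + 2]) : -1;
            if (hi >= 0 && lo >= 0) {
                byte = (unsigned char)(hi * 16 + lo);
                i += 3;                 /* consumed "\XX" */
            } else {
                byte = (unsigned char)in[i + 1];
                i += 2;                 /* consumed "\c" */
            }
        } else {
            byte = (unsigned char)in[i];
            i += 1;
        }

        if (n + 1 >= out_size)
            return -1;                  /* no room for byte + NUL */
        out[n++] = (char)byte;
    }

    if (out_size == 0)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

Given an exploded RDN such as "cn=J\C3\BCrgen", the helper writes the
same string with the two escaped bytes restored to the UTF-8 sequence
C3 BC. Note that decoding the whole RDN also turns "\," and "\+" back
into bare separators, so the result is suitable for display but should
not be re-parsed as a DN.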

Kurt