Re: ldap_explode_dn corrupts UTF-8 encoding (ITS#1890)



On Mon, 17 Jun 2002, Pierangelo Masarati wrote:


> ps@psncc.at writes:
>
> > On Mon, 17 Jun 2002, Pierangelo Masarati wrote:
> >
> >> > OpenLDAP 2.1.2 seems to corrupt non-ASCII UTF-8 encoded characters.
> >> > It actually turns unprintable chars (in the ASCII sense) into \<hexcode>.
> >>
> >> I think this is a leftover from when we decided to use UTF-8 instead
> >> of the '\' + HEXPAIR representation of non-ASCII chars, and initially
> >> it was intended; of course, when parsing a DN, one wants the correct
> >> UTF-8 encoding.
> >
> > Note that the problem does not exist in 2.0.23...
>
> DN parsing/handling has been completely rewritten
>
> >
> > To further elaborate the problem: before passing the DN to the
> > ldap_explode_dn function it is properly (UTF-8) encoded. Afterwards the DN
> > parts aren't...

Well, the code fragment that broke is:

	    char **exploded_dn, *dn;
	    LDAP *ld;
	    LDAPMessage *e;

	    [snip]

	    dn = ldap_get_dn(ld, e);
	    /* explode DN */
	    exploded_dn = ldap_explode_dn(dn, FALSE);


This is exactly what the man page for ldap_explode_dn suggests, and it is
straightforward too.

> They are; but they're represented in another form that is allowed
> for DNs; it depends on whether you like it or not.  I understand

I just think it is not good to break existing functionality.

> that DN parsing is delicate when UTF-8 is involved.  The point is
> that the ldap_explode_dn API is broken, because it doesn't let you choose
> how to expand a DN (how to represent it in string form).

Well, I do not consider it to be broken, but I am not an LDAP guru... The
functionality is quite clear. I agree that additional functionality is
nice to have, but that is what ldap_str2dn & co. are there for.


> You may use:
>
> 	int i;
> 	LDAPDN *dn;
> 	char **v = NULL;
>
> 	ldap_str2dn( string, &dn, LDAP_DN_FORMAT_LDAP );
> 	for ( i = 0; dn[ 0 ][ i ]; i++ ) {
> 		v = realloc( v, ( i + 2 ) * sizeof( char * ) );
> 		ldap_rdn2str( dn[ 0 ][ i ], &v[ i ],
> 			LDAP_DN_FORMAT_LDAPV3 | LDAP_DN_PRETTY );
> 	}
> 	v[ i ] = NULL;
>

That code looks a lot more complex and incomprehensible than the
straightforward code fragment above... :-(

> see ldap_explode_dn code in libraries/libldap/getdn.c;
> the flag LDAP_DN_PRETTY causes UTF-8 to be preserved in the output.
>
> >
> > Is exploding a DN a conversion wrt codesets? (I would not think it is)
> > Where would one need to specify extra flags? Or is this a purely internal
> > matter?
>
> No internal matter, only a matter of deciding, among the allowed
> choices, which is the most general.  Initially I considered
> '\' + HEXPAIR the most general.

But it requires more parsing on the user side. With the whole world
working towards UTF-8 (think Java, gtk+, ...), it is not nearly as
general as it could be.

It is mostly a matter of breaking things that used to work.

ps