
Re: Problem with NON-ASCII chars in DNs when using OpenLDAP 2.1.x libs (ITS#1923)



Kurt D. Zeilenga wrote:
> At 04:50 AM 2002-07-05, michael@stroeder.com wrote:
> 
>>Full_Name: Michael Ströder
>>Version: REL_ENG_2_1 from CVS
>>OS: SuSE Linux 8.0
>>URL: ftp://ftp.openldap.org/incoming/
>>Submission from: (NULL) (217.1.21.113)
>>
>>
>>I've tested building python-ldap against the OpenLDAP REL_ENG_2_1 libs. The
>>build went just fine and it seems to work at first glance.
>>
>>But when using a search DN with NON-ASCII chars in it, the server returns
>>NO_SUCH_OBJECT. The info field mentions exactly the right UTF-8 representation.
> 
> Can you elaborate on this a bit? Which API call is not
> behaving as expected?

I have now tracked it down: ldap_explode_dn() is erroneous. 
My bug report above was therefore quite misleading. Sorry.

>  How is it behaving?

Note that I'm using python-ldap which more or less directly wraps 
ldap_explode_dn() and returns a Python list containing the DN 
components. Sorry to bother you with Python but it's the level I 
can debug. C usually makes my brain hurt...

Some notes on Python string representation for you to understand 
the debug log below:

'\\' is a single back-slash. '\\\\' would be two back-slashes.

'\xc3\xb6' is the hex-encoding of the UTF-8 encoding for
o:     00f6    LATIN SMALL LETTER O WITH DIAERESIS (RFC1345)

'\\C3\\B6' is simply: \C3\B6

[] is a list of arbitrary data types.
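
The notation above can be checked with a small sketch. It assumes a current Python 3 interpreter, where byte strings are written b'...' (the trace logs in this message come from an older Python where plain strings were byte strings). The variable names are only illustrative.

```python
# Assumption: Python 3; byte strings use the b'...' prefix.

# 'ö' (U+00F6) encoded as UTF-8 is the two bytes C3 B6.
raw = 'ö'.encode('utf-8')
assert raw == b'\xc3\xb6'
assert len(raw) == 2

# '\\' in source code is a single back-slash character.
assert len(b'\\') == 1

# '\\C3\\B6' is the six ASCII characters \C3\B6, not two raw bytes.
escaped = b'\\C3\\B6'
assert len(escaped) == 6
assert escaped != raw
```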

The error: I pass in a DN string with UTF-8 chars in it 
(raw string, no Unicode magic here!). It returns the DN 
components, but as normalized strings in OpenLDAP's internal 
notation, with each non-ASCII byte hex-escaped.

See a python-ldap trace log (lines are wrapped):

*** _ldap.<built-in function explode_dn> (('cn=Michael 
Str\xc3\xb6der+mail=michael@stroeder.com,ou=Testing,dc=stroeder,dc=com', 
0),{})
=> result: ['cn=Michael Str\\C3\\B6der+mail=michael@stroeder.com', 
'ou=Testing', 'dc=stroeder', 'dc=com']

> how do you expect it to behave?

Return the DN components as raw 8-bit strings with UTF-8 encoded 
chars instead of OpenLDAP's normalized ASCII-clean string 
representation. Like this (trace of usage with REL_ENG_2):

*** _ldap.<built-in function explode_dn> (('cn=Michael 
Str\xc3\xb6der+mail=michael@stroeder.com,ou=Testing,dc=stroeder,dc=com', 
0),{})
=> result: ['cn=Michael Str\xc3\xb6der+mail=michael@stroeder.com', 
'ou=Testing', 'dc=stroeder', 'dc=com']
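
As a rough illustration of the difference between the two results, a caller could post-process the 2.1.x output on the Python side. The helper below is purely hypothetical (unescape_high_bytes is not part of python-ldap or OpenLDAP) and assumes Python 3 byte strings. It turns only back-slash hex pairs with a value of 0x80 or above into raw bytes and leaves every other escape untouched, so escaped special characters such as \2C keep their meaning.

```python
import re

# Matches a back-slash followed by either a hex pair or any single character.
_ESCAPE = re.compile(rb'\\(?:([0-9A-Fa-f]{2})|(.))', re.DOTALL)

def unescape_high_bytes(component: bytes) -> bytes:
    """Hypothetical helper: replace \\XX escapes for non-ASCII bytes
    (value >= 0x80) with the raw byte; keep all other escapes as-is."""
    def repl(match):
        hex_pair = match.group(1)
        if hex_pair is not None:
            value = int(hex_pair, 16)
            if value >= 0x80:
                return bytes([value])
        # ASCII hex pairs and single-character escapes stay escaped.
        return match.group(0)
    return _ESCAPE.sub(repl, component)

# The first component from the REL_ENG_2_1 trace above.
escaped_rdn = b'cn=Michael Str\\C3\\B6der+mail=michael@stroeder.com'
raw_rdn = unescape_high_bytes(escaped_rdn)
assert raw_rdn == b'cn=Michael Str\xc3\xb6der+mail=michael@stroeder.com'
```

This is only a sketch of the expected result, not a fix for ldap_explode_dn() itself.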

Ciao, Michael.