
Re: 8bit characters



Peter Peltonen wrote:
> 
> If I try to add entries including 8bit information (like a with umlauts) with
> ldapadd I get the following error:

Convert your string to UTF-8 encoded Unicode. This encoding
produces bytes with bit 7 set (aka 8-bit chars) for every non-ASCII
character. Any string or blob containing 8-bit chars has to be
base64-encoded when producing LDIF (not if you add the entry directly
through an LDAP SDK!). A base64-encoded value is marked in LDIF by a
double colon after the attribute type:
attrtype:: base64-encoded string
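
As a rough sketch of that rule in present-day Python 3 (not the Python 2 used in the session further down): the helper names needs_base64 and ldif_line are made up for illustration, and the safety check assumes the safe-string rules of RFC 2849. Line folding of long values is not handled.

```python
import base64

def needs_base64(raw: bytes) -> bool:
    """Return True if raw cannot be written as a plain LDIF value.

    Assumption: follows the RFC 2849 safe-string rules.
    """
    if not raw:
        return False
    # A plain value must not start with a space, a colon or '<'.
    if raw[0] in b" :<":
        return True
    # A trailing space is easily lost, so encode such values as well.
    if raw[-1:] == b" ":
        return True
    # NUL, LF, CR and every byte with bit 7 set are unsafe anywhere.
    return any(b in (0, 10, 13) or b > 127 for b in raw)

def ldif_line(attrtype: str, value: str) -> str:
    """Build one LDIF line (hypothetical helper), base64 only if needed."""
    raw = value.encode("utf-8")
    if needs_base64(raw):
        encoded = base64.b64encode(raw).decode("ascii")
        return "%s:: %s\n" % (attrtype, encoded)
    return "%s: %s\n" % (attrtype, value)
```

With this, ldif_line('cn', 'Michael') gives a plain "cn: Michael" line, while a value containing an umlaut gets the double colon and a base64 string.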

Here is a verbose example for Python 2.0+ to set things straight. My
name is "Michael Ströder" (encoded in the ISO-8859-1 character set on
my Linux xterm and in my MUA). The example builds the LDIF line for
the cn attribute as a string.

Note:
- Some lines might be wrapped.
- \xc3 denotes the hexadecimal byte C3

Python 2.1 interpreter session:

>>> # Take a string encoded in the local environment - here
>>> # ISO-8859-1 AKA ISO-Latin-1
>>> unicode('Michael Ströder','iso-8859-1')
u'Michael Str\xf6der'
>>> # Encode the string in the Unicode object to UTF-8
>>> unicode('Michael Ströder','iso-8859-1').encode('utf-8')
'Michael Str\xc3\xb6der'
>>> # Create base64-encoded line already including trailing new-line
>>> import base64
>>> base64.encodestring(unicode('Michael Ströder','iso-8859-1').encode('utf-8'))
'TWljaGFlbCBTdHLDtmRlcg==\n'
>>> # Build the LDIF line
>>> 'cn:: %s' % (base64.encodestring(unicode('Michael Ströder','iso-8859-1').encode('utf-8')))
'cn:: TWljaGFlbCBTdHLDtmRlcg==\n'
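
For readers on Python 3, where unicode() and base64.encodestring() no longer exist, the same steps might look roughly like the sketch below. It assumes the input arrives as ISO-8859-1 bytes, as it would from the xterm described above; the variable names are only illustrative. The last step decodes the line again as a sanity check.

```python
import base64

# Bytes as an ISO-8859-1 terminal would deliver them (assumed input).
latin1_bytes = b"Michael Str\xf6der"

# Decode from the local character set into a Unicode string.
text = latin1_bytes.decode("iso-8859-1")

# Encode the Unicode string as UTF-8.
utf8_bytes = text.encode("utf-8")

# base64.b64encode() adds no trailing newline, so append it explicitly.
b64 = base64.b64encode(utf8_bytes).decode("ascii")
line = "cn:: %s\n" % b64

# Round trip: take the value after the double colon and decode it again.
value = line.split(":: ", 1)[1].strip()
decoded = base64.b64decode(value).decode("utf-8")
```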

Now the Perl, C, Java crowd can come up with similar examples.

Ciao, Michael.