Re: UTF-8 support for libldap

Julio Sanchez Fernandez writes:
> I think my approach is different. The change is not on the server, it is
> in the library instead.

That's certainly useful too -- in particular if you are not running
X.500 and an ldapd server:-)

> The philosophy is that you keep UTF8 in your server files.
> (...)
> I gather that storage was supposed to be in T.61, so the
> need to translate it to something more palatable.

Yup.  (In LDAPv2, that is. LDAPv3 uses UTF-8.)

Except that a lot of clients don't handle the charset, so the server
must have the same charset as the user (or application) - typically
latin-1.  And Netscape (& MSIE?) use UTF-8 over LDAPv2.  Is that why you
want UTF-8, to keep clients compatible with Netscape?

> My patch works both for read and write operations.

Mine too.  I just wish it didn't, since I don't trust it enough.
An occasional error on read operations is OK, but not on write ops.

> I have not tried it with binary attributes, though.  I don't really know
> what to do about this.  The standard code seems to be translating
> from/to T.61 regardless of the syntax, so I am now royally confused.

I haven't checked what openldap does, but that is the best umich ldap
could do, since the library has no knowledge about attribute types.  The
client can also temporarily turn off charset handling for an LDAPMessage*
when it knows it's about to handle a binary attribute:
        ldap_enable_translation( ld, e, 0 );
        val = ldap_get_values( ld, e, "audio" );
        ldap_enable_translation( ld, e, 1 );

> So, WHAT is the right approach?  The approach that provides today this
> capability before V3?

For client code which may display any attribute it encounters, a nicer
approach would be to have the library read the list with attribute
syntax (similar to what slapd reads in slapd.conf), and let the client
register the syntax of other attributes if it needs.  Then the library
would know which attributes to translate (by default), and leave the
rest untranslated.  (Though some library routines may have to decode,
translate and re-encode some of their parameters; I haven't checked.)
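
A minimal sketch of such a registry, to make the idea concrete.  Every
name here (attr_register, attr_lookup, attr_should_translate, the
attr_syntax enum) is hypothetical and not part of umich ldap or
OpenLDAP; a real library would also load its initial table from a
syntax file rather than rely only on registration calls.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical syntax classes, invented for this sketch. */
enum attr_syntax { SYNTAX_UNKNOWN, SYNTAX_TEXT, SYNTAX_BINARY };

#define MAX_ATTRS 64

struct attr_entry {
    char name[64];
    enum attr_syntax syntax;
};

static struct attr_entry attr_table[MAX_ATTRS];
static size_t attr_count = 0;

/* Attribute type names compare case-insensitively. */
static int names_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* equal only if both strings ended together */
}

/* Register an attribute type, or update it if already known.
 * Returns 0 on success, -1 if the name is too long or the table is full. */
int attr_register(const char *name, enum attr_syntax syntax)
{
    size_t i;

    if (strlen(name) >= sizeof attr_table[0].name)
        return -1;
    for (i = 0; i < attr_count; i++) {
        if (names_equal(attr_table[i].name, name)) {
            attr_table[i].syntax = syntax;
            return 0;
        }
    }
    if (attr_count == MAX_ATTRS)
        return -1;
    strcpy(attr_table[attr_count].name, name);
    attr_table[attr_count].syntax = syntax;
    attr_count++;
    return 0;
}

/* Look up the registered syntax; SYNTAX_UNKNOWN if never registered. */
enum attr_syntax attr_lookup(const char *name)
{
    size_t i;

    for (i = 0; i < attr_count; i++)
        if (names_equal(attr_table[i].name, name))
            return attr_table[i].syntax;
    return SYNTAX_UNKNOWN;
}

/* Translate only attributes known to be text; binary and unknown
 * attributes are left untranslated. */
int attr_should_translate(const char *name)
{
    return attr_lookup(name) == SYNTAX_TEXT;
}
```

A client would call attr_register("audio", SYNTAX_BINARY) once at
startup, and the value-fetching code would consult
attr_should_translate() before converting anything.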

The "right" approach - which lots of client authors will ignore unless
the library forces them to follow it:-( - would be to write clients to
be charset-aware (or attribute-syntax-aware), and a library interface
which makes this as easy as possible.  Which gets us yet another step
closer to X.500, or at least how X.500 clients should have done it:-)

* Add library code to register attribute types & syntax, as mentioned
  above.

* Some library calls would need an extra "how-to-handle-charsets"
  parameter which would override the default (or the default for the
  attribute in question), so the code above wouldn't have to do 3
  library calls just to extract a binary attribute.

* To avoid adding lots of new library calls, you might instead let
  as many library calls as possible handle the ";binary" attribute
  option - even in LDAPv2 mode.  E.g.
        val = ldap_get_values( ld, e, "audio;binary" );
  would also tell the library to add the LDAPv3 ";binary" option to
  known binary attributes before giving them to the client, and to be
  more aggressive in checking for ";binary" options it receives from the

  And maybe you'll want some library calls to make it convenient for the
  client to add and strip options to attribute names.

  It should be a lot easier to use LDAPv3 and let the server handle
  ";binary" stuff, though:-)
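
The convenience calls for adding and stripping options could look
roughly like the sketch below.  The function names (attr_has_option,
attr_add_option, attr_strip_option) are hypothetical, not existing
library calls, and the code assumes an attribute description is simply
a type name followed by zero or more ";option" parts that compare
case-insensitively.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare the n bytes at a with the whole string b, ignoring ASCII case. */
static int seg_equal(const char *a, size_t n, const char *b)
{
    size_t i;

    if (strlen(b) != n)
        return 0;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Return 1 if desc (e.g. "audio;binary") carries option (e.g. "binary"). */
int attr_has_option(const char *desc, const char *option)
{
    const char *p = strchr(desc, ';');

    while (p != NULL) {
        const char *start = p + 1;
        const char *end = strchr(start, ';');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        if (seg_equal(start, len, option))
            return 1;
        p = end;
    }
    return 0;
}

/* Copy desc to out, appending ";option" unless it is already present.
 * Returns 0 on success, -1 if out is too small. */
int attr_add_option(const char *desc, const char *option,
                    char *out, size_t outsize)
{
    int present = attr_has_option(desc, option);
    size_t need = strlen(desc) + 1;

    if (!present)
        need += 1 + strlen(option);
    if (need > outsize)
        return -1;
    strcpy(out, desc);
    if (!present) {
        strcat(out, ";");
        strcat(out, option);
    }
    return 0;
}

/* Copy desc to out with every occurrence of option removed.
 * Returns 0 on success, -1 if out is too small. */
int attr_strip_option(const char *desc, const char *option,
                      char *out, size_t outsize)
{
    const char *p = strchr(desc, ';');
    size_t used = p ? (size_t)(p - desc) : strlen(desc);

    if (strlen(desc) + 1 > outsize)
        return -1;              /* the result is never longer than desc */
    memcpy(out, desc, used);    /* the bare attribute type */
    while (p != NULL) {
        const char *start = p + 1;
        const char *end = strchr(start, ';');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        if (!seg_equal(start, len, option)) {
            out[used++] = ';';
            memcpy(out + used, start, len);
            used += len;
        }
        p = end;
    }
    out[used] = '\0';
    return 0;
}
```

With these, a client that got "audio;binary" back from the library
could strip the option before comparing the name against its own
attribute list.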

I really hope we can invent something which works well, so that we can
someday expect clients to actually handle the charset correctly...

>> What do you mean?  I assume you don't mean you want to show the user raw
>> T.61?
> I mean, who needs T.61 at all?

Anyone who uses a "correct" LDAPv2 server.  Any client which talks to
X.500 through ldapd.  Any server which wishes to be compatible with
X.500 data.

BTW, another approach (or an additional approach) to the problem is to
run several servers, one with each relevant charset.
We have 3 ldapd servers:

 * one proper T.61 LDAP server        (ldap://katalog.uninett.no:389/),
 * "UTF-8 LDAPv2" server for Netscape (ldap://katalog.uninett.no:3890/),
 * "latin-1 LDAPv2" server for applications that don't handle charset

If you run 3 slapds like that, you will probably have to make two of
them read-only servers, and either modify the replication protocol to
handle charset translation, or replicate by regularly dumping the
master server, converting the charset in the dump, and reloading it
into the slave servers.
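
As an illustration of the conversion step only, here is a sketch for
the simplest charset pair, latin-1 to UTF-8.  The function name
latin1_to_utf8 is made up for this example.  It does not handle T.61,
which needs a real mapping table and special treatment of its
non-spacing diacritics, and a real dump converter would also have to
skip binary values.

```c
#include <stddef.h>

/* Convert inlen bytes of ISO 8859-1 text to NUL-terminated UTF-8.
 * Returns the number of bytes written, not counting the NUL,
 * or (size_t)-1 if out is too small. */
size_t latin1_to_utf8(const unsigned char *in, size_t inlen,
                      unsigned char *out, size_t outsize)
{
    size_t i, o = 0;

    for (i = 0; i < inlen; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* ASCII maps to itself; keep one byte free for the NUL. */
            if (o + 1 >= outsize)
                return (size_t)-1;
            out[o++] = c;
        } else {
            /* 0x80..0xFF become a two-byte sequence: 110000xx 10xxxxxx. */
            if (o + 2 >= outsize)
                return (size_t)-1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    if (o >= outsize)
        return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

Since each latin-1 byte becomes at most two UTF-8 bytes, an output
buffer of 2 * inlen + 1 bytes is always large enough.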