Re: UTF-8 support for libldap

Hallvard B Furuseth wrote:
> Except that a lot of clients don't handle the charset, so the server
> must have the same charset as the user (or application) - typically
> latin-1.  And Netscape (& MSIE?) use UTF-8 over LDAPv2.  Is that why you
> want UTF-8, to keep clients compatible with Netscape?

Absolutely.  My only clients currently are either Netscape or programs
linked to my modified library.  I'll have to try MSIE to see what I get.
I have a 3.something around that does not seem to do anything about

> when it knows it's about to handle a binary attribute:
>         ldap_enable_translation( ld, e, 0 );
>         val = ldap_get_values( ld, e, "audio" );
>         ldap_enable_translation( ld, e, 1 );

kbind does this, IIRC ;-)

>   It should be a lot easier to use LDAPv3 and let the server handle
>   ";binary" stuff, though:-)

I have been wondering about this (since the server IS supposed to know
what is binary and what isn't...)
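
For illustration only, this is roughly how a client or library could tell
that an LDAPv3 attribute description asks for binary transfer, by looking
for the ";binary" option.  attr_has_binary_option() is a made-up helper,
not something that exists in libldap:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of libldap: returns 1 if the attribute
 * description carries a ";binary" option, 0 otherwise. */
static int attr_has_binary_option(const char *desc)
{
    static const char opt[] = "binary";
    const char *p;

    assert(desc != NULL);

    /* Options follow the attribute type, each introduced by ';'. */
    for (p = desc; *p != '\0'; p++) {
        size_t i;

        if (*p != ';')
            continue;

        /* Compare case-insensitively against "binary".  The loop stops
         * at the terminating NUL, so it never reads past the string. */
        for (i = 0; opt[i] != '\0'; i++) {
            if (tolower((unsigned char)p[1 + i]) != opt[i])
                break;
        }

        /* Accept only if the option ends here (end of string or ';'). */
        if (opt[i] == '\0' && (p[1 + i] == '\0' || p[1 + i] == ';'))
            return 1;
    }
    return 0;
}
```

With something like this, translation could be skipped for "audio;binary"
without the caller having to toggle it by hand.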

What is needed for minimal v3-conformance?  An improved ldap_modrdn,
attribute descriptions, relaxing the requirement to bind before
anything, UTF-8, the DSE, probably adding access to the schema...  I
don't see that very far away.

> Anyone who uses a "correct" LDAPv2 server.  Any client which talks to
> X.500 through ldapd.  Any server which wishes to be compatible with
> X.500 data.

Heh, heh.  So somehow we need on wire:

	T.61		Standard v2
	UTF-8		Standard v3 and "Netscape" v2
	Locale charset	To support clients unable to do the translation
			themselves or have it done by the library

The library must give the client (whether it does it with translation or
not is irrelevant for the time being):

	T.61		For clients that will do something useful with it
	UTF-8		Same
	Locale charset	For everyone else

The server will have to store things in T.61, UTF-8 or maybe a locale
charset but provide the appropriate thing on the protocol.  On the other
hand, ldapd will have to translate between T.61 and UTF-8 (and maybe
more things like latin1, argh :-() as needed.
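
To give an idea of the easy end of this, here is a minimal sketch of the
latin1 to UTF-8 direction.  latin1_to_utf8() is a hypothetical name, and
T.61 is deliberately left out since it needs real mapping tables:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert ISO 8859-1 (latin1) bytes to UTF-8.
 * Writes a NUL-terminated result into out.  Returns the number of bytes
 * written, not counting the NUL, or (size_t)-1 if out is too small. */
static size_t latin1_to_utf8(const unsigned char *in, size_t inlen,
                             unsigned char *out, size_t outsize)
{
    size_t i, o = 0;

    assert(in != NULL && out != NULL);

    for (i = 0; i < inlen; i++) {
        unsigned char c = in[i];
        size_t need = (c < 0x80) ? 1 : 2;

        /* Always keep one byte free for the terminating NUL. */
        if (o + need >= outsize)
            return (size_t)-1;

        if (c < 0x80) {
            /* ASCII range is identical in both charsets. */
            out[o++] = c;
        } else {
            /* 0x80..0xFF map to U+0080..U+00FF: two UTF-8 bytes. */
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }

    if (o >= outsize)
        return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

The reverse direction has to decide what to do with anything above
U+00FF, which is where it stops being this simple.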

This is, of course, getting very complex, and negotiating all these
options even more so.  It seems like every component needs the ability
to translate between all combinations...

Can we get rid of the locale-charset-on-wire requirement and use
something along the lines of the changes I made?  Then we would have just
either T.61 or UTF-8 on wire and the library would translate if needed.
Standard v2 can be told from standard v3 without error; "Netscape" v2 is
the one case that can be guessed wrong.  Then the server must settle for one
storage method and translate from/to the other using something based on
your changes...
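
As a rough sketch of the selection I have in mind (all names here are
invented for the example, and the "Netscape" flag would have to come from
configuration or a guess, since the protocol does not tell us):

```c
#include <assert.h>

/* Hypothetical: the only two charsets allowed on the wire. */
enum wire_charset { WIRE_T61, WIRE_UTF8 };

/* Hypothetical helper: choose the on-wire charset from the protocol
 * version.  peer_is_netscape_v2 is an assumption supplied by the caller;
 * nothing on the wire distinguishes "Netscape" v2 from standard v2. */
static enum wire_charset pick_wire_charset(int version,
                                           int peer_is_netscape_v2)
{
    assert(version == 2 || version == 3);

    if (version == 3)
        return WIRE_UTF8;          /* standard v3 */

    /* v2: T.61 by the standard, UTF-8 for "Netscape"-style peers. */
    return peer_is_netscape_v2 ? WIRE_UTF8 : WIRE_T61;
}
```

Whatever the library then hands to the client would be translated from
that one wire charset, instead of negotiating a third one.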

We still have the problem with binary-encoded attributes, but it seems
that only v3 provides a clean solution for that...