Re: UTF-8 support for libldap

Julio Sanchez Fernandez writes:
>Hallvard B Furuseth wrote:
>>   It should be a lot easier to use LDAPv3 and let the server handle
>>   ";binary" stuff, though:-)
> I have been wondering about this (since the server IS supposed to know
> what is binary and what isn't...)

You mean slapd?  slapd is LDAPv2, ";binary" is LDAPv3.  Anyway, I'm not
sure ;binary is the right approach here after all.  "foo" and
"foo;binary" are considered different attributes, as far as I can tell.
Oh well, it was just a thought.

> What is needed for minimal v3-conformance?

I hope someone else can answer that...

> Heh, heh.  So somehow we need on wire:
> 	T.61		Standard v2
> 	UTF-8		Standard v3 and "Netscape" v2
> 	Locale charset	To support clients unable to do the translation
> 			themselves or have it done by the library

Yes, though that UTF-8 entry is not on the same port (and therefore
maybe not the same server).  It's standard v2 and standard v3 that can
be on the same port.

> The library must give the client (whether it does it with translation or
> not is irrelevant for the time being):
> 	T.61		For clients that will do something useful with
> 			it
> 	UTF-8		Same
> 	Locale charset	For everyone else
> The server will have to store things in T.61, UTF-8 or maybe a locale
> charset but provide the appropriate thing on the protocol.

If you want one server to handle all the charsets, yes.  That's of
course more efficient than running 3 servers with the same data in
different charsets.  But if you just want it to work, and you have some
disk space and CPU seconds to spare, it should be simpler to have 3
servers and regularly dump from the master, translate the charset, and
load into the other servers.  Uhm - provided the clients using the
non-master charsets do not need write access.
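
A minimal sketch of the dump/translate/load idea above, in Python.  It
assumes the dump is plain text in a single charset; latin-1 stands in
for the master's charset because Python ships no T.61 codec, and the
function name translate_dump is made up for illustration.  A real dump
with binary values would need those skipped.

```python
def translate_dump(data: bytes, src: str = "latin-1", dst: str = "utf-8") -> bytes:
    """Re-encode a text dump from the master's charset to a replica's.

    Assumption: every byte of `data` is text in `src`.  Binary attribute
    values must be excluded before calling this.
    """
    return data.decode(src).encode(dst)


# Dump from the master (latin-1 here), translated for a UTF-8 replica.
master_dump = "cn: Bjørn\n".encode("latin-1")
replica_dump = translate_dump(master_dump)
```

The replicas are write-once copies in this scheme, which is why it only
works for clients that do not need write access.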

> On the other hand, ldapd will have to translate between T.61 and UTF-8
> (and maybe more things like latin1, argh :-() as needed.

Yes.  (My patch handles just one charset per server, though.  So it
doesn't need to keep track of which client wants which charset.)

> This is, of course, getting very complex and negotiating all these
> options, more so.  It seems like every component needing the ability to
> translate between all combinations...

Hey, no!  Negotiating what options?  There is no "give me a bogus
charset" protocol option.  All this happens because clients are too lazy
to handle charsets properly, not because they want to use nonstandard

The "option" to choose UTF-8 LDAPv2 can only be to bind to a host and
port which provides UTF-8 LDAPv2.  If you want a single server process
to provide both UTF-8 LDAPv2 and T.61 LDAPv2, it must listen to two
ports.

> Can we get rid of the locale-charset-on-wire requirement and use
> something in the line of the changes I made?

No.  Servers need to support "locale-charset-on-wire" if they want to
support lazy clients that don't bother with charset translation.

But your changes don't need to worry about that, since they are on the
client side.  Besides, I'm sure you support it already.  You do allow us
to *not* do charset translation in the library, I hope?:-) If we need to
talk to a latin-1 LDAP server, all we need to do is to build a client
which does *not* do any charset handling.
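
Roughly what I mean by translation being optional on the client side, as
a Python sketch.  The function name convert and the convention that
wire=None means "no charset handling" are assumptions for illustration,
not anything in the library.

```python
from typing import Optional


def convert(value: bytes, local: str, wire: Optional[str]) -> bytes:
    """Translate a value from the local charset to the wire charset.

    wire=None means the client does no charset handling at all, so the
    bytes go out untouched (e.g. when talking to a latin-1 server from
    a latin-1 locale).
    """
    if wire is None or wire == local:
        return value
    return value.decode(local).encode(wire)
```

With wire=None a latin-1 client talks to a latin-1 server unchanged;
with wire="utf-8" the same value is re-encoded before it is sent.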

> Then we would have just
> either T.61 or UTF-8 on wire and the library would translate if needed.
> Standard v2 can be told from standard v3 without error, "Netscape" v2,
> being a possible wrong guess.

Don't guess.  You don't do anyone a favour if you teach people that it
often works fine to ignore charset handling.  That's why this is such a
mess now.  Rely on the user to choose the correct host:port for his

> Then the server must settle for one
> storage method and translate from/to the other using something based on
> your changes...

> We still have the problem with binary-encoded attributes, but it seems
> that only v3 provides a clean solution for that...

Yup.  Not ";binary" though.  I think the answer is to let the client
read the schema from the v3 server, so that it will know the syntax of
the attributes it receives.

But if you control both client and server, you can give the client the
server's list of known attributes.
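
A sketch of that last idea in Python, assuming the client has been
handed a list of which attributes are binary.  The table BINARY_ATTRS
and the function decode_value are hypothetical; the two attribute names
are only examples of values that should not be charset-translated.

```python
# Hypothetical table copied from the server's list of known attributes.
# Attribute names are compared case-insensitively.
BINARY_ATTRS = {"jpegphoto", "usercertificate"}


def decode_value(attr: str, value: bytes, wire_charset: str = "utf-8"):
    """Return binary values untouched and text values decoded to str."""
    # Strip any option such as ";binary" before looking up the name.
    base = attr.split(";", 1)[0].lower()
    if base in BINARY_ATTRS:
        return value
    return value.decode(wire_charset)
```

The point is that the decision comes from the attribute's known syntax,
not from guessing at the bytes.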