
Re: UTF-8 support for libldap

Julio Sánchez Fernández wrote:
> Hallvard B Furuseth wrote:
> > Except that a lot of clients don't handle the charset, so the server
> > must have the same charset as the user (or application) - typically
> > latin-1.  And Netscape (& MSIE?) use UTF-8 over LDAPv2.  Is that why you
> > want UTF-8, to keep clients compatible with Netscape?
> Absolutely.  My only clients currently are either Netscape or programs
> linked to my modified library.  I'll have to try MSIE to see what I get.
> I have a 3.something around that does not seem to do anything about

Outlook Express, actually (not IE itself). I've got Outlook 98
installed, and it can use a directory service for user lookup and for
something to do with news servers (I haven't looked into that yet).

> ...
> > Anyone who uses a "correct" LDAPv2 server.  Any client which talks to
> > X.500 through ldapd.  Any server which wishes to be compatible with
> > X.500 data.
> Heh, heh.  So somehow we need on wire:
>         T.61            Standard v2
>         UTF-8           Standard v3 and "Netscape" v2
>         Locale charset  To support clients unable to do the translation
>                         themselves or have it done by the library
> The library must give the client (whether it does it with translation or
> not is irrelevant for the time being):
>         T.61            For clients that will do something useful with
>                         it
>         UTF-8           Same
>         Locale charset  For everyone else
> The server will have to store things in T.61, UTF-8 or maybe a locale
> charset but provide the appropriate thing on the protocol.  On the other
> hand, ldapd will have to translate between T.61 and UTF-8 (and maybe
> more things like latin1, argh :-() as needed.
> This is, of course, getting very complex, and negotiating all these
> options even more so.  It seems like every component would need the
> ability to translate between all combinations...
> Can we get rid of the locale-charset-on-wire requirement and use
> something in the line of the changes I made?  Then we would have just
> either T.61 or UTF-8 on wire and the library would translate if needed.
> Standard v2 can be told from standard v3 without error, with "Netscape"
> v2 being a possible wrong guess.  Then the server must settle on one
> storage method and translate from/to the other using something based on
> your changes...
> We still have the problem with binary-encoded attributes, but it seems
> that only v3 provides a clean solution for that...
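The "only T.61 or UTF-8 on the wire" proposal above implies a guessing step for v2 peers. A minimal sketch of one possible heuristic, assuming the function name guess_wire_charset is hypothetical and not part of libldap: v3 is taken as UTF-8, and a v2 value is guessed to be "Netscape-style" UTF-8 only when it contains 8-bit bytes that decode as strict UTF-8; otherwise it is treated as standard T.61. As noted in the quoted text, the v2 guess can be wrong.

```python
def guess_wire_charset(ldap_version: int, value: bytes) -> str:
    """Hypothetical heuristic: guess the charset of an attribute value on the wire."""
    if ldap_version >= 3:
        # LDAPv3 strings are UTF-8, so no guessing is needed.
        return "utf-8"
    if not value.isascii():
        # v2 with 8-bit data: strict UTF-8 decoding succeeding suggests a
        # "Netscape" v2 peer; failure suggests standard T.61.
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            return "t.61"
        return "utf-8"
    # Pure 7-bit v2 data: assume standard v2 (T.61).
    return "t.61"
```

For example, the T.61 form of "Sánchez" puts the acute-accent prefix 0xC2 before the base letter (b"S\xc2anchez"), which is not valid UTF-8 and is therefore classified as T.61, while "Sánchez".encode("utf-8") from a v2 peer is classified as UTF-8.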

It seems that many new technologies are tending towards UTF-8 as the
primary encoding, with UCS-2 (2-byte Unicode) sometimes allowed as an
alternative. Note that UTF-8 can represent every UCS-2 character and
therefore encompasses "all" character sets.
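To make the UTF-8/UCS-2 point concrete, a small sketch (the helper name utf8_length is just for illustration) shows that every UCS-2 code point, i.e. anything in U+0000..U+FFFF apart from the surrogate range, encodes to at most three UTF-8 bytes, with ASCII staying one byte.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a single code point."""
    return len(chr(code_point).encode("utf-8"))

# A few sample characters: ASCII, Latin-1 and a symbol outside Latin-1.
for ch in ("A", "é", "€"):
    print(f"U+{ord(ch):04X} -> {ch.encode('utf-8').hex(' ')}")

# Every non-surrogate UCS-2 code point fits in 1 to 3 UTF-8 bytes.
bmp = [cp for cp in range(0x10000) if not 0xD800 <= cp <= 0xDFFF]
print(max(utf8_length(cp) for cp in bmp))
```

This prints "U+0041 -> 41", "U+00E9 -> c3 a9", "U+20AC -> e2 82 ac" and then 3.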

The "preferred" encoding for XML, for example, is UTF-8.

I'd say store in UTF-8 and use conversion layers for the other cases.
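A minimal sketch of what such a conversion layer could look like, assuming values are stored as UTF-8 and each client's charset is known by name; to_client and from_client are hypothetical names. Latin-1 stands in for the client's locale charset here because Python's standard library has no T.61 codec.

```python
def to_client(stored: bytes, client_charset: str) -> bytes:
    """Hypothetical output layer: re-encode a stored UTF-8 value for a client."""
    text = stored.decode("utf-8")
    # errors="replace" substitutes '?' for characters the client charset lacks.
    return text.encode(client_charset, errors="replace")


def from_client(received: bytes, client_charset: str) -> bytes:
    """Hypothetical input layer: normalize a client's value to UTF-8 for storage."""
    return received.decode(client_charset).encode("utf-8")
```

For instance, to_client("Sánchez".encode("utf-8"), "latin-1") yields b"S\xe1nchez", and a euro sign sent to a Latin-1 client becomes b"?". Keeping a single storage encoding means each component needs only one conversion per client charset rather than one per pair of charsets.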


Greg Stein (gstein@lyra.org)