Re: UTF-8 support for libldap
Julio Sánchez Fernández wrote:
> Hallvard B Furuseth escribió:
> > Except that a lot of clients don't handle the charset, so the server
> > must have the same charset as the user (or application) - typically
> > latin-1. And Netscape (& MSIE?) use UTF-8 over LDAPv2. Is that why you
> > want UTF-8, to keep clients compatible with Netscape?
> Absolutely. My only clients currently are either Netscape or programs
> linked to my modified library. I'll have to try MSIE to see what I get.
> I have a 3.something around that does not seem to do anything about
Outlook Express, actually. (not IE itself) I've got Outlook 98
installed and it can use a directory service for user lookup and
something to do with news servers (haven't looked into it yet).
> > Anyone who uses a "correct" LDAPv2 server. Any client which talks to
> > X.500 through ldapd. Any server which wishes to be compatible with
> > X.500 data.
> Heh, heh. So somehow we need on wire:
>   T.61            Standard v2
>   UTF-8           Standard v3 and "Netscape" v2
>   Locale charset  To support clients unable to do the translation
>                   themselves or have it done by the library
> The library must give the client (whether it does it with translation or
> not is irrelevant for the time being):
>   T.61            For clients that will do something useful with it
>   UTF-8           Same
>   Locale charset  For everyone else
> The server will have to store things in T.61, UTF-8 or maybe a locale
> charset but provide the appropriate thing on the protocol. On the other
> hand, ldapd will have to translate between T.61 and UTF-8 (and maybe
> more things like latin1, argh :-() as needed.
> This is, of course, getting very complex, and negotiating all these
> options even more so. It seems like every component needs the ability to
> translate between all combinations...
> Can we get rid of the locale-charset-on-wire requirement and use
> something along the lines of the changes I made? Then we would have just
> either T.61 or UTF-8 on wire and the library would translate if needed.
> Standard v2 can be told from standard v3 without error, "Netscape" v2
> being the one possible wrong guess. Then the server must settle for one
> storage method and translate from/to the other using something based on
> your changes...
> We still have the problem with binary-encoded attributes, but it seems
> that only v3 provides a clean solution for that...
It seems that many new technologies are converging on UTF-8 as the
primary encoding, with UCS-2 (2-byte Unicode) allowed as an alternative.
Note that UTF-8 can represent every UCS-2 character and will, therefore,
encompass "all" character sets.
The "preferred" encoding for XML, for example, is UTF-8.
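To make the UTF-8/UCS-2 point concrete, here is a rough sketch in Python
(not libldap code; the helper name utf8_length is made up for this
example). It assumes the standard UTF-8 byte-length boundaries and shows
that any UCS-2 code point fits in one to three bytes:

```python
def utf8_length(code_point):
    """Number of UTF-8 bytes needed for one UCS-2 code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("not a UCS-2 code point")
    if code_point < 0x80:       # ASCII range: one byte
        return 1
    if code_point < 0x800:      # includes all of latin-1: two bytes
        return 2
    return 3                    # rest of the 16-bit range: three bytes


# Round-trip a few characters and compare with the computed length.
for ch in ("A", "á", "€"):
    encoded = ch.encode("utf-8")
    assert len(encoded) == utf8_length(ord(ch))
    assert encoded.decode("utf-8") == ch
```

So plain ASCII data stays byte-for-byte unchanged, and latin-1 text
grows by one byte per accented character.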
I'd say store in UTF-8 and use conversion layers for the other cases.
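Something like the following is what I mean by a conversion layer. This
is only a sketch in Python, not the libldap API: the function names and
STORAGE_CHARSET are invented here, and latin-1 stands in for the client
charset because, as far as I know, Python's standard codecs do not
include T.61, so that case would need a custom codec.

```python
STORAGE_CHARSET = "utf-8"  # assumption: the server stores everything as UTF-8


def to_storage(raw, client_charset):
    """Convert bytes received from a client into the storage charset."""
    return raw.decode(client_charset).encode(STORAGE_CHARSET)


def from_storage(stored, client_charset):
    """Convert stored bytes into the client's charset.

    Characters the client charset cannot express become '?'.
    """
    text = stored.decode(STORAGE_CHARSET)
    return text.encode(client_charset, errors="replace")


# A latin-1 client writes a name, then reads it back unchanged.
stored = to_storage(b"S\xe1nchez", "latin-1")
assert stored == b"S\xc3\xa1nchez"
assert from_storage(stored, "latin-1") == b"S\xe1nchez"
```

The point of the design is that only the edges know about the other
charsets. Conversion to a smaller charset is lossy, which is why the
read path has to pick a policy for characters it cannot express.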
Greg Stein (email@example.com)