
Re: utf-8 encode



On Mon, 29 May 2000, Julio Sánchez Fernández wrote:

> Date: Mon, 29 May 2000 10:05:15 +0200
> From: Julio Sánchez Fernández <j_sanchez@stl.es>
> To: Juan Miguel de los Ríos Caparrós <jmrios@germinus.com>
> Cc: openldap-general@OpenLDAP.org
> Subject: Re: utf-8 encode
> 
> 
> 
> > Juan Miguel de los Ríos Caparrós wrote:
> > 
> > Where do I tell LDAP to use UTF-8 encoding, or ISO 10646-1? I'm using OpenLDAP 1.2.10 and Red Hat 6.2.
> 
> You don't.  OpenLDAP 1.2.x is mostly character set transparent for those
> encodings that are ASCII-compatible (i.e. no wide characters and no
> encodings that contain NULs and such) so, if you build your directory
> using UTF-8, it will be in UTF-8.  From a formal point of view, this
> is a violation of the standard.  It should be in T.61 (teletexString)
> and nothing else (notice that ISO 8859-1 is no good either).
> 
> However, you will have lots of company in this particular violation of
> the standard.  E.g. Netscape Communicator will assume the directory
> is in UTF-8 by default and I think the default cannot be overridden
> in older versions.  MS software seems to implement some heuristic both
> in clients and servers that will often settle for UTF-8, but I cannot
> provide further help since I simply don't understand what the
> heuristic is.
> 
> In LDAPv3 (which OpenLDAP 1.2.x does *not* implement) it is UTF-8.
> There is an implementation of v3 in the works in the CVS HEAD branch,
> but you should stay away from it (unless you want to help, of course).
> 
> So you might want to plan on building your database in UTF-8 now,
> since it will simplify migration later, and live with the violation
> of the standard for the time being.
> 
> Notice that neither the API nor the servers will do any kind of
> translation whatsoever: if you build with some character set, you will
> have to use that character set always.  It is impossible to do this
> properly without knowing about the schema, since only some attribute
> types should be translated (e.g. you don't want your JPEGs or your
> certificates messed with).
> 
> Julio
> 
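Julio's "ASCII-compatible" condition can be checked mechanically. The following is a minimal Python sketch (Python is my choice here; nothing in this thread uses it), and `check_encoding` is a hypothetical helper name. It tests two properties of an encoding: whether the ASCII characters of a value still come out as their plain ASCII bytes, and whether the encoded bytes contain NUL, which would truncate a value handled as a C string.

```python
from __future__ import annotations


def check_encoding(text: str, encoding: str) -> tuple[bool, bool]:
    """Return (ascii_compatible, contains_nul) for `text` encoded as `encoding`.

    ascii_compatible: the ASCII characters of `text` encode to the same bytes
    they have in plain ASCII, so ASCII-only code sees them unchanged.
    contains_nul: the encoded bytes include 0x00, which truncates C strings.
    """
    data = text.encode(encoding)
    ascii_chars = "".join(ch for ch in text if ord(ch) < 0x80)
    ascii_compatible = ascii_chars.encode(encoding) == ascii_chars.encode("ascii")
    return ascii_compatible, b"\x00" in data


if __name__ == "__main__":
    sample = "Julio S\u00e1nchez Fern\u00e1ndez"
    for enc in ("utf-8", "iso-8859-1", "utf-16-le"):
        print(enc, check_encoding(sample, enc))
```

This prints (True, False) for utf-8 and iso-8859-1 but (False, True) for utf-16-le. Both UTF-8 and ISO 8859-1 pass through unharmed, which is also why the server cannot tell which of the two a client used.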

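Julio's point that translation needs the schema can be illustrated with a small sketch, again in Python and under loudly stated assumptions: `TEXT_ATTRIBUTES` and `translate_entry` are hypothetical, and a real server would derive the set of text attributes from the attribute syntaxes in its schema rather than hard-coding it. Only text attributes are re-encoded; binary values such as `jpegPhoto` are copied byte for byte.

```python
from __future__ import annotations

# Hypothetical list of attribute types that hold text. A real server would
# derive this from the schema (attribute syntaxes), not hard-code it.
TEXT_ATTRIBUTES = {"cn", "sn", "o", "ou", "description"}


def translate_entry(entry: dict[str, list[bytes]], src: str, dst: str) -> dict[str, list[bytes]]:
    """Re-encode text attribute values from `src` to `dst`; copy the rest as-is."""
    result: dict[str, list[bytes]] = {}
    for attr, values in entry.items():
        if attr.lower() in TEXT_ATTRIBUTES:
            result[attr] = [v.decode(src).encode(dst) for v in values]
        else:
            # Binary data (photos, certificates): bytes must stay untouched.
            result[attr] = list(values)
    return result


entry = {
    "cn": ["Juan Miguel de los R\u00edos".encode("iso-8859-1")],
    "jpegPhoto": [b"\xff\xd8\xff\xe0"],  # first bytes of a JPEG file
}
converted = translate_entry(entry, "iso-8859-1", "utf-8")
print(converted["cn"])         # b'Juan Miguel de los R\xc3\xados'
print(converted["jpegPhoto"])  # b'\xff\xd8\xff\xe0'
```

If `jpegPhoto` were pushed through the same decode/encode step, every byte above 0x7F would turn into two bytes and the image would be corrupted, which is exactly the risk Julio describes.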


It seems a bit unclear what is supposed to happen if you mix protocol
versions, for example a v2 client talking to a v3 server.  Per the
specs, the v2 client would expect a T.61 string from the server and
act accordingly.  If the server is to accommodate this expectation,
it should translate from UTF-8 to T.61 whenever it receives a v2
request.  Or not?

For the opposite case, where a v3 client talks to a v2 server, it is
a bit clearer: the v3 client will be forced to downgrade to the v2
protocol, and it could then also switch to expecting T.61.
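The client-side fallback can be sketched as a simulation in Python. This is not a real LDAP API: `bind_with_fallback` and `CHARSET_BY_VERSION` are hypothetical names, and the `server_versions` set merely stands in for whether the server accepts a bind at a given version. The point is only that the charset the client expects follows from the version it ends up speaking (T.61 for LDAPv2, RFC 1777; UTF-8 for LDAPv3, RFC 2251).

```python
from __future__ import annotations

# String encoding each protocol version specifies: T.61 for v2, UTF-8 for v3.
CHARSET_BY_VERSION = {2: "T.61", 3: "UTF-8"}


def bind_with_fallback(server_versions: set[int], preferred: tuple[int, ...] = (3, 2)) -> tuple[int, str]:
    """Simulate a client trying v3 first, then falling back to v2.

    `server_versions` stands in for the server's answer to a bind: a version
    missing from the set models the server rejecting that bind.
    """
    for version in preferred:
        if version in server_versions:
            return version, CHARSET_BY_VERSION[version]
    raise ConnectionError("no protocol version in common with the server")


print(bind_with_fallback({2, 3}))  # (3, 'UTF-8')
print(bind_with_fallback({2}))     # (2, 'T.61')
```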

As long as both clients and servers limit themselves to the 7-bit
US-ASCII character set there would be no problem, as US-ASCII is a
proper subset of both T.61 and UTF-8.
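The mixed-version question can be made concrete with a hypothetical server-side hook, sketched in Python. `value_for_client` and `NeedsT61Translation` are made-up names, the 7-bit shortcut relies on the premise stated above that US-ASCII is common to T.61 and UTF-8, and the sketch deliberately leaves the non-ASCII v2 case unresolved, since that is the open question.

```python
from __future__ import annotations


class NeedsT61Translation(Exception):
    """Raised when a stored UTF-8 value is not 7-bit and the client speaks v2."""


def is_7bit(value: bytes) -> bool:
    """True if every byte is plain US-ASCII (0x00-0x7F)."""
    return all(b < 0x80 for b in value)


def value_for_client(value_utf8: bytes, client_version: int) -> bytes:
    """Hypothetical server hook: what to send a client of the given version."""
    if client_version >= 3:
        return value_utf8  # v3 clients expect UTF-8, which is how it is stored
    if is_7bit(value_utf8):
        return value_utf8  # assumed identical in T.61, per the premise above
    # The open question: convert to T.61 here, or send UTF-8 anyway?
    raise NeedsT61Translation(value_utf8)


print(value_for_client(b"Villy", 2))                     # b'Villy'
print(value_for_client("R\u00edos".encode("utf-8"), 3))  # b'R\xc3\xados'
```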



-- 
Villy