Re: UTF-8 support, take 2
Julio Sanchez Fernandez writes:
> What is not planned to be supported is having, say, ISO 8859-5 on wire
> and ISO 8859-1 at the API provided by the library.
Oh, right. Seems we were saying the same thing in different ways.
>> * T.61<->unicode (...),
>> * local charset <-> unicode (...)
> These two are relatively easy in the forward direction. The reverse
> direction requires sparse tables, however. A good design of the
> translation tables is critical. I tried hash tables but could not
> find efficient hash functions that don't make a mess of the tables
> with collisions when tried with real data extracted from the charset
Maybe it's already solved. Check <URL:http://www.unicode.org/> and
<URL:ftp://ftp.unicode.org/Public/> for a translation library. And/or
ask the Unicode mailing list, firstname.lastname@example.org.
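
As an alternative to hashing for the reverse direction, here is a minimal
sketch of a sparse reverse table kept sorted by code point and searched
with bsearch(). It is not something proposed in this thread. The names
rev_entry, rev_8859_5, ucs_to_local and rev_table_check are made up for
illustration, and the table is only a five-entry excerpt of ISO 8859-5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One entry of a sparse reverse table: Unicode code point -> local byte.
 * (Hypothetical layout, for illustration only.) */
struct rev_entry {
    unsigned long ucs;   /* Unicode code point */
    unsigned char local; /* byte in the local charset */
};

/* Small excerpt of ISO 8859-5; must stay sorted by ucs. */
static const struct rev_entry rev_8859_5[] = {
    { 0x00A7, 0xFD }, /* SECTION SIGN */
    { 0x0401, 0xA1 }, /* CYRILLIC CAPITAL LETTER IO */
    { 0x0410, 0xB0 }, /* CYRILLIC CAPITAL LETTER A */
    { 0x0430, 0xD0 }, /* CYRILLIC SMALL LETTER A */
    { 0x2116, 0xF0 }  /* NUMERO SIGN */
};

#define REV_COUNT (sizeof rev_8859_5 / sizeof rev_8859_5[0])

/* bsearch() comparator: key is an unsigned long, elem is a rev_entry. */
static int cmp_entry(const void *key, const void *elem)
{
    unsigned long k = *(const unsigned long *)key;
    unsigned long e = ((const struct rev_entry *)elem)->ucs;
    return (k > e) - (k < e);
}

/* Debug helper: binary search only works if the table is strictly sorted. */
void rev_table_check(void)
{
    size_t i;
    for (i = 1; i < REV_COUNT; i++)
        assert(rev_8859_5[i - 1].ucs < rev_8859_5[i].ucs);
}

/* Return the local byte for ucs, or -1 if this table has no mapping. */
int ucs_to_local(unsigned long ucs)
{
    const struct rev_entry *hit;

    if (ucs < 0x80)
        return (int)ucs; /* ASCII maps to itself in ISO 8859-5 */
    hit = bsearch(&ucs, rev_8859_5, REV_COUNT,
                  sizeof rev_8859_5[0], cmp_entry);
    return hit ? (int)hit->local : -1;
}
```

Lookup cost is O(log n) with no collisions to tune, at the price of
keeping each table sorted when it is generated.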
>> Not really. Most data will be translatable to latin-1, since that's
>> what most of those who put data in the directory can handle.
> For the time being, it is. But then we are not alone in the planet.
Good point, but still - very often, most or all data will be
translatable to the user's charset, because most directory operations
are on local/national data.
>> We may want to specify in which cases translation is done in the client:
>> * whether or not to translate attributes with DN syntax,
> Why? Can you explain? In any case, DNs in V3 are UTF-8 by definition.
Because DNs are sometimes data and sometimes text. They are data when
used as base DNs for further directory operations - then it's OK to get a
reversible auto-translation. They are data when generating certificates
and such things - then there must be *no* translation.
They are text when displayed, e.g. when we do ldap_explode_dn and
display the RDN. Then we'll often want approximate translation to the
local charset, as with other text data. Client authors are often lazy
and assume - or know - that their clients only work with data which can be
translated to the local charset - then they'll want auto-translation of
DNs to and from the local charset and forget about charset issues.
Well, I suppose a reversible auto-translation is best in that case.
DNs are not the only "both data and text" type, of course. Just the
most prominent one.
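
To make the difference between "approximate" and "reversible" translation
concrete, here is a rough sketch that converts Unicode code points to
Latin-1 in either mode. The escape format (a backslash followed by six hex
digits) and the names xlat_mode and ucs_to_latin1 are assumptions made up
for this example. Nothing in the thread or in the LDAP specs defines them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical translation modes, for illustration only. */
enum xlat_mode {
    XLAT_APPROX,    /* lossy: unmappable characters become '?' */
    XLAT_REVERSIBLE /* lossless: unmappable characters are escaped */
};

/* Translate n code points into a NUL-terminated Latin-1 string.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * the output buffer is too small or a code point is out of range. */
int ucs_to_latin1(const unsigned long *in, size_t n,
                  char *out, size_t outsz, enum xlat_mode mode)
{
    size_t i, o = 0;

    assert(in != NULL && out != NULL && outsz > 0);

    for (i = 0; i < n; i++) {
        unsigned long c = in[i];

        if (c > 0x10FFFFUL)
            return -1; /* not a Unicode code point */

        /* In reversible mode a literal backslash must be escaped too,
         * otherwise the result could not be decoded unambiguously. */
        if (c <= 0xFF && !(mode == XLAT_REVERSIBLE && c == '\\')) {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = (char)c;
        } else if (mode == XLAT_APPROX) {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = '?';
        } else {
            /* Escape as backslash + 6 hex digits: 7 bytes plus NUL. */
            if (o + 7 >= outsz)
                return -1;
            sprintf(out + o, "\\%06lX", c);
            o += 7;
        }
    }
    out[o] = '\0';
    return (int)o;
}
```

The approximate form suits display, e.g. an RDN shown to the user. The
reversible form can be fed back as a base DN after decoding. Neither is
appropriate where the DN must stay untouched, as in the certificate case.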
>> * more generally: which attributes and/or syntaxes to translate,
> Yes. According to the specs, syntaxes have defined representations
> and this *should* be the right method at the server. The client is
> going to need to know about the schema somehow to do this.
I was thinking the opposite way: The client may want to tell the library
to (not) auto-translate certain attributes. Maybe that would simply
mean overriding the syntax of some of the attributes (e.g. setting an
attribute's syntax to "bin" to avoid auto-translation).
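
Roughly what I have in mind, as a sketch only: a small per-client table
of syntax overrides that the library consults before translating a value.
None of these names (xlat_syntax, xlat_set_syntax, xlat_should_translate)
exist in the LDAP API. They are invented here to show the idea.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical syntax classes as seen by the translation layer. */
enum xlat_syntax { XLAT_TEXT, XLAT_DN, XLAT_BIN };

struct xlat_override {
    const char *attr;        /* attribute type, compared case-insensitively */
    enum xlat_syntax syntax; /* syntax the client wants the library to assume */
};

#define MAX_OVERRIDES 16
static struct xlat_override overrides[MAX_OVERRIDES];
static size_t n_overrides;

/* Case-insensitive comparison of two attribute type names. */
static int attr_eq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Override the syntax of one attribute. The caller must keep the
 * attr string alive. Returns 0 on success, -1 if the table is full. */
int xlat_set_syntax(const char *attr, enum xlat_syntax syntax)
{
    size_t i;

    assert(attr != NULL);
    for (i = 0; i < n_overrides; i++) {
        if (attr_eq(overrides[i].attr, attr)) {
            overrides[i].syntax = syntax;
            return 0;
        }
    }
    if (n_overrides == MAX_OVERRIDES)
        return -1;
    overrides[n_overrides].attr = attr;
    overrides[n_overrides].syntax = syntax;
    n_overrides++;
    return 0;
}

/* Decide whether a value of attr should be auto-translated.
 * schema_syntax is what the library would assume without overrides. */
int xlat_should_translate(const char *attr, enum xlat_syntax schema_syntax)
{
    size_t i;
    enum xlat_syntax s = schema_syntax;

    assert(attr != NULL);
    for (i = 0; i < n_overrides; i++) {
        if (attr_eq(overrides[i].attr, attr)) {
            s = overrides[i].syntax;
            break;
        }
    }
    return s != XLAT_BIN;
}
```

A client that feeds seeAlso values into certificate generation could call
xlat_set_syntax("seeAlso", XLAT_BIN) once at startup and then get those
DNs back untranslated, while cn and the rest are still translated.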