Re: National charset support

Niels Baggesen wrote:
> Yes and no. The client cannot convert character sets, but if you give
> the -B option to ldapsearch it will decode the base-64, and IF the
> character set matches your xterm environment you will see the right
> thing.
> Otherwise I can recommend
>    ldapsearch -B .... | tcs -t latin1

Alright, but please notice that not all attribute types should be displayed
or translated.  Think of a JPEG photo or a certificate.  This is essentially the
same approach I use in some of my libldap copies that translate internally
from/to ISO 8859-1 and is just as flawed.

The current situation is that libldap speaks the wire protocol at the API
level (this is explicitly written down in the new draft C API).  It cannot
do anything else, because it does not know the syntaxes of the attribute
types, and that knowledge is necessary to decide whether a value should be
translated.  This seems pretty common: most or all LDAP C API implementations
just pass the octets around.  That means you can get T.61 when talking LDAP
V2 and UTF-8 when talking LDAP V3.  But many servers do no translation
either and will send UTF-8 even when talking LDAP V2, so you might be getting
UTF-8 from the API even when binding with V2 to a V3 server.  Or you might
get something completely different: some servers have been set up with
other charsets.

On the other hand, if the library provides no help here, doing the right
thing is part of the client's job.  And our provided clients do nothing in
this respect.  From the above it should be clear that it is nontrivial for
them to do anything.

No matter what component does the translation, it cannot be done without
schema knowledge.  So that's why I have been working on schema issues.
The slapd in -devel already publishes the schema it knows about through
LDAP and the clients have routines available to help them parse the
answers.  Whether clients have some prior knowledge of the schema or
whether they cache information previously obtained from servers is something
yet to be explored.
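
To make the schema dependency concrete, here is a minimal sketch of how a
client might decide, per attribute, whether a value is text that may be
translated.  The table and the function names (known_attrs,
needs_charset_translation) are hypothetical and not part of libldap; a real
client would fill the table from the schema published by the server.  The
syntax OIDs are the ones from RFC 2252.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Syntax OIDs as defined in RFC 2252. */
#define SYNTAX_CERTIFICATE      "1.3.6.1.4.1.1466.115.121.1.8"
#define SYNTAX_DIRECTORY_STRING "1.3.6.1.4.1.1466.115.121.1.15"
#define SYNTAX_IA5_STRING       "1.3.6.1.4.1.1466.115.121.1.26"
#define SYNTAX_JPEG             "1.3.6.1.4.1.1466.115.121.1.28"

/* Hypothetical, hardwired stand-in for schema obtained from a server. */
struct attr_syntax {
    const char *name;
    const char *syntax_oid;
};

static const struct attr_syntax known_attrs[] = {
    { "cn",              SYNTAX_DIRECTORY_STRING },
    { "gecos",           SYNTAX_IA5_STRING },
    { "jpegPhoto",       SYNTAX_JPEG },
    { "userCertificate", SYNTAX_CERTIFICATE },
};

/* Compare an attribute description with a bare attribute name,
 * ignoring case and any options after ';' (e.g. "cn;lang-da"). */
static int attr_name_matches(const char *desc, const char *name)
{
    while (*name != '\0') {
        if (tolower((unsigned char)*desc) != tolower((unsigned char)*name))
            return 0;
        desc++;
        name++;
    }
    return *desc == '\0' || *desc == ';';
}

/* Return 1 if values of this attribute are Directory Strings and so
 * may be translated to the local charset, 0 otherwise.  IA5 values are
 * left alone here (an EBCDIC host would have to treat them as well),
 * and unknown attributes are passed through untouched. */
int needs_charset_translation(const char *desc)
{
    size_t i;

    assert(desc != NULL);
    for (i = 0; i < sizeof known_attrs / sizeof known_attrs[0]; i++) {
        if (attr_name_matches(desc, known_attrs[i].name))
            return strcmp(known_attrs[i].syntax_oid,
                          SYNTAX_DIRECTORY_STRING) == 0;
    }
    return 0;
}
```

Defaulting to "do not translate" for unknown attribute types is the
conservative choice: a mangled jpegPhoto or certificate is worse than an
untranslated name.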

I am very worried about performance of short-lived clients, since I am a
user of nss_ldap and RFC2307 myself.  Since the set of attribute types used
by nss_ldap is finite and known beforehand, maybe this is a clear candidate
for hardwiring what attribute types are to be translated.  Please notice
that even if all string attribute types in RFC2307 are IA5 and might seem
not to require translation, the fact is that OpenLDAP has been ported or
is going to be ported to hosts that have EBCDIC as their native character
set.  Well, OK, maybe I am getting too carried away: nss_ldap and EBCDIC
probably live in disjoint worlds.  Still, there is at least one place
where nss_ldap needs translation: when defaulting the GECOS field from
the cn attribute if no gecos attribute is present in an entry.
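
As an illustration of that last case, here is a rough sketch of the kind of
translation needed when a cn value is used as the GECOS default.  It
assumes the server sends UTF-8 (as in LDAP V3) and that the host wants
ISO 8859-1; both are assumptions, and the function name utf8_to_latin1 is
hypothetical, not an existing libldap or nss_ldap routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated UTF-8 string to ISO 8859-1.
 * Characters above U+00FF and malformed sequences become '?'.
 * Returns the number of bytes written (excluding the NUL), or
 * (size_t)-1 if the output buffer is too small. */
size_t utf8_to_latin1(const char *in, char *out, size_t outsize)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (outsize == 0)
        return (size_t)-1;

    while (*p != '\0') {
        unsigned char c;

        if (n + 1 >= outsize)           /* keep room for the NUL */
            return (size_t)-1;

        if (*p < 0x80) {
            /* Plain ASCII is identical in both charsets. */
            c = *p++;
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* Two-byte sequence encoding U+0080..U+00FF. */
            c = (unsigned char)(((*p & 0x03) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            /* Not representable or malformed: substitute '?' and
             * skip any continuation bytes that follow. */
            c = '?';
            p++;
            while ((*p & 0xC0) == 0x80)
                p++;
        }
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

A caller defaulting GECOS from cn would run the cn value through such a
routine before handing it to the application, and would still need to know
(from the protocol version or configuration) that the input really is UTF-8.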

The library will probably be extended to help clients with this whole
process.  In principle, translation routines will be accessible to clients
so that they can do the translation themselves in a portable way.  Even some
kind of option through ldap_set_option or a client control might be defined
to simplify the client logic.  And our work could result in one or more
proposals to the IETF LDAPext WG.

I think the above is the most promising approach for fixing this problem for
good.  Most other approaches are workarounds.  Useful workarounds, mind you; I
use a handful of them myself, but workarounds nonetheless.  This is my main
personal itch and what got me into this.

Hope this clarified my point.