Re: Charset handling in the LDAP C API (Was: VM/ESA patches)

"Kurt D. Zeilenga" wrote:

> It would be nice if
> the wire protocol actually tagged character strings differently than arbitrary
> octet streams, but it doesn't.

Oh, I see.  I thought I had you on record implying it could be derived from
the tags and I did not actually check it.

> The ldap_search(3) call would always act exactly per spec, requiring UTF-8
> for LDAPv3, T.61 for LDAPv2.  The strings would be written to wire without
> any translation.

Yes, that's understood; it is unfortunate nonetheless.

> An application, ldapsearch(1) or a CGI app or whatever, would use an
> auxiliary API (ldap_encode/ldap_decode) to convert strings before/after
> using primary API calls.  That is, NO translation is done by any of
> the primary API calls.

Yes, I was talking about the ldapsearch client distributed with OpenLDAP.
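
To make the "convert before calling the primary API" step concrete, here is a
minimal sketch of what a client might do to a locally encoded string before
handing it to a search call.  It assumes the local charset is ISO-8859-1 and
the target is UTF-8 (LDAPv3); the function name sketch_latin1_to_utf8 is
hypothetical and is not the proposed ldap_encode, whose signature has not been
settled in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of any LDAP library: convert a
 * NUL-terminated ISO-8859-1 string to a newly allocated UTF-8 string.
 * Returns NULL on allocation failure; the caller frees the result.
 */
char *sketch_latin1_to_utf8(const char *in)
{
    assert(in != NULL);

    /* Worst case: every input byte becomes two output bytes. */
    size_t len = strlen(in);
    char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            /* Plain ASCII is identical in both encodings. */
            *p++ = (char)c;
        } else {
            /* U+0080..U+00FF always needs exactly two UTF-8 bytes. */
            *p++ = (char)(0xC0 | (c >> 6));   /* lead byte 110000xx */
            *p++ = (char)(0x80 | (c & 0x3F)); /* continuation 10xxxxxx */
        }
    }
    *p = '\0';
    return out;
}
```

With this, the ISO-8859-1 bytes "caf\xE9" come out as "caf\xC3\xA9".  The
client would pass the converted string to the search call and free it
afterwards; the library itself never touches the bytes.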

> Assuming that the server stores strings in one of the two representations,
> translation is required for one or the other protocols.  It must have
> schema knowledge to know which attributes are character strings and hence
> need to be translated.

Umm, do we need more syntaxes?

> Note:
>   I do not know of any server which actually does this translation, most
>   never translate strings.  That is, if you write a string using LDAPv3 and
>   read it with LDAPv2 (or vice versa) they always are equal.

Well, what can I say...  What a lossage...  Moreover, some T.61 strings are
not even legal UTF-8 (and I think the opposite is true as well).  I don't know
if I should laugh or I should cry...
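
To illustrate the "not even legal UTF-8" point, here is a small well-formedness
check, written as a sketch.  The function name sketch_is_valid_utf8 is made up
for this message.  The T.61 example in the note after the code assumes the
usual T.61 convention that a non-spacing accent byte precedes the base letter,
with 0xC2 being the acute accent.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8,
 * 0 otherwise.  Overlong forms, surrogates and values above U+10FFFF
 * are rejected.
 */
int sketch_is_valid_utf8(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;      /* number of continuation bytes expected */
        unsigned long cp; /* code point being assembled */

        if (c < 0x80) {
            i++;          /* single-byte ASCII */
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07;
        } else {
            return 0;     /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < need)
            return 0;     /* sequence is cut off at the end of the buffer */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0; /* not of the form 10xxxxxx */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Reject encodings longer than necessary for the code point. */
        if ((need == 1 && cp < 0x80) ||
            (need == 2 && cp < 0x800) ||
            (need == 3 && cp < 0x10000))
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;     /* UTF-16 surrogate range */
        if (cp > 0x10FFFF)
            return 0;     /* beyond the Unicode range */

        i += need + 1;
    }
    return 1;
}
```

Under the T.61 assumption above, "e with acute" is the byte pair 0xC2 0x65.
The check rejects it because 0xC2 announces a two-byte sequence and 0x65 is not
a continuation byte.  A server that hands stored bytes back untranslated would
therefore give an LDAPv3 client something it cannot decode.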

> Personally, I would like to decree that the frontend<->backend interface
> always utilize UTF-8 encoded character strings.

I think it is reasonable.

> The conversion to/from
> T.61/ASCII (for LDAPv2) would be done by the frontend.

Correct, since the frontend is where protocol version awareness belongs.

> Translation to/from
> non-UTF8 character representation could be done by plugin (be_string_encode/decode)
> or be done in a backend specific manner.

It is foreseeable that some of the backends we provide have to determine this at
runtime and, thus, a configuration option would be needed.  I mean, current
databases can be presumed to be in T.61, while future databases are more likely to
be in UTF-8 and there will be some transition period.
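
As a rough sketch of such a runtime option, a backend could map the value of a
per-database setting to an internal enum and fall back to "unknown" for
anything it does not recognize.  Everything here is hypothetical: no such
directive exists today, and the accepted spellings ("t61", "t.61", "utf8",
"utf-8") and all identifiers are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical: the charset a database's stored strings are in. */
enum sketch_charset {
    SKETCH_CS_UNKNOWN = 0,
    SKETCH_CS_T61,
    SKETCH_CS_UTF8
};

/* Case-insensitive comparison of two NUL-terminated ASCII strings. */
static int sketch_equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Map a configuration value to a charset; unrecognized values are UNKNOWN. */
enum sketch_charset sketch_parse_charset(const char *value)
{
    assert(value != NULL);

    if (sketch_equal_nocase(value, "t61") ||
        sketch_equal_nocase(value, "t.61"))
        return SKETCH_CS_T61;
    if (sketch_equal_nocase(value, "utf8") ||
        sketch_equal_nocase(value, "utf-8"))
        return SKETCH_CS_UTF8;
    return SKETCH_CS_UNKNOWN;
}
```

Returning an explicit "unknown" rather than silently picking a default lets the
backend refuse to start, or at least warn, during the transition period.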

BTW, anyone noticed that strings not in pure ASCII are stored in base64 in ldbm?
Is this really necessary if we are using UTF-8?

> >         - Should the clients default as translating or non-translating?
> This is a per application issue.  We'll eventually need to determine what the
> default should be for OpenLDAP distributed clients such as ldapmodify.  For now,
> I'd rather focus on API issues.

The more I look into it, the more I have the feeling that charset awareness is
going to impact many components in OpenLDAP.  But you are right, let's start
somewhere or we'll stay forever at the precise spot we are now.