
Charset handling in the LDAP C API (Was: VM/ESA patches)



I've been reviewing Neale Ferguson's patches for VM/ESA.  Besides
the usual issues arising from missing (POSIX) interfaces, EBCDIC
support brings a number of charset handling issues to the foreground.

OpenLDAP is a (very incomplete) implementation of the IETF draft
LDAP C API specification.  The specification states that all
strings passed between the application and the implementation
(the library) be represented in the charset native to the protocol.
That is, when the API is used for LDAPv2, strings should be
represented in T.61 or ASCII; for LDAPv3, in UTF-8.  Translation
between the charsets an application requires and the charset the
API requires is left as an exercise for the developer.  (Please see
the LDAPext mailing list archives for background on this requirement.)

So that each developer need not implement their own translations,
we should provide a set of translation functions.  Some of these
are needed to implement the server:
	t61 <-> utf8
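For a sense of what the t61 -> utf8 direction involves, here is a
minimal sketch.  The name t61_to_utf8() and its signature are
hypothetical, and it covers only a tiny subset of T.61: letters,
digits and space (which occupy their ASCII positions) plus the
acute-accented vowels, which T.61 encodes as the non-spacing prefix
0xC2 followed by the base letter.  A real translator needs the full
T.61 tables, including the other diacritic prefixes.

```c
#include <assert.h>
#include <stddef.h>

/* Byte values are written in hex so the code behaves the same when
 * compiled on an EBCDIC host such as VM/ESA. */

/* Unicode code point for T.61 acute prefix (0xC2) + base letter,
 * or 0 if this sketch does not cover the combination. */
static unsigned long t61_acute(unsigned char base)
{
    switch (base) {
    case 0x61: return 0xE1;  /* a -> U+00E1 */
    case 0x65: return 0xE9;  /* e -> U+00E9 */
    case 0x69: return 0xED;  /* i -> U+00ED */
    case 0x6F: return 0xF3;  /* o -> U+00F3 */
    case 0x75: return 0xFA;  /* u -> U+00FA */
    case 0x41: return 0xC1;  /* A -> U+00C1 */
    case 0x45: return 0xC9;  /* E -> U+00C9 */
    case 0x49: return 0xCD;  /* I -> U+00CD */
    case 0x4F: return 0xD3;  /* O -> U+00D3 */
    case 0x55: return 0xDA;  /* U -> U+00DA */
    default:   return 0;
    }
}

/* Bytes this sketch passes through unchanged: space, digits, letters. */
static int t61_plain(unsigned char c)
{
    return c == 0x20 ||                 /* space */
           (c >= 0x30 && c <= 0x39) ||  /* 0-9 */
           (c >= 0x41 && c <= 0x5A) ||  /* A-Z */
           (c >= 0x61 && c <= 0x7A);    /* a-z */
}

/* Write cp as UTF-8 (1 or 2 bytes); returns bytes written, 0 if no room. */
static size_t put_utf8(unsigned long cp, char *out, size_t room)
{
    assert(cp < 0x800);  /* this subset never needs 3-byte sequences */
    if (cp < 0x80) {
        if (room < 1) return 0;
        out[0] = (char)cp;
        return 1;
    }
    if (room < 2) return 0;
    out[0] = (char)(0xC0 | (cp >> 6));
    out[1] = (char)(0x80 | (cp & 0x3F));
    return 2;
}

/* Hypothetical: convert len bytes of T.61 to NUL-terminated UTF-8.
 * Returns the UTF-8 length, or (size_t)-1 on an unsupported byte
 * or a too-small output buffer. */
size_t t61_to_utf8(const unsigned char *in, size_t len,
                   char *out, size_t outlen)
{
    size_t i = 0, o = 0, n;
    unsigned long cp;

    if (outlen == 0) return (size_t)-1;
    while (i < len) {
        if (in[i] == 0xC2 && i + 1 < len &&
            (cp = t61_acute(in[i + 1])) != 0) {
            i += 2;                    /* accent prefix + base letter */
        } else if (t61_plain(in[i])) {
            cp = in[i++];
        } else {
            return (size_t)-1;         /* outside this sketch's subset */
        }
        n = put_utf8(cp, out + o, outlen - 1 - o);  /* reserve NUL */
        if (n == 0) return (size_t)-1;
        o += n;
    }
    out[o] = '\0';
    return o;
}
```

The main wrinkle compared with a single-byte table lookup is that
T.61 accents are prefixes, so the converter must look one byte ahead
and fold the pair into one precomposed character before emitting it
as UTF-8.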

Others are needed to support local character sets; for VM/ESA
these would include:
	ebcdic <-> t61
	ebcdic <-> utf8
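As a rough illustration of the ebcdic -> utf8 direction, the sketch
below assumes EBCDIC code page 037 (an assumption; other EBCDIC code
pages place some punctuation differently) and maps only letters,
digits, space, a few punctuation characters and e-acute.  The name
ebcdic_to_utf8() follows the naming used later in this message; the
signature is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Map one code page 037 byte to a Unicode code point (subset only).
 * Targets are written in hex so the result does not depend on the
 * compiler's own execution charset.  Returns -1 if unsupported. */
static long cp037_to_ucs(unsigned char c)
{
    if (c >= 0x81 && c <= 0x89) return 0x61 + (c - 0x81);  /* a-i */
    if (c >= 0x91 && c <= 0x99) return 0x6A + (c - 0x91);  /* j-r */
    if (c >= 0xA2 && c <= 0xA9) return 0x73 + (c - 0xA2);  /* s-z */
    if (c >= 0xC1 && c <= 0xC9) return 0x41 + (c - 0xC1);  /* A-I */
    if (c >= 0xD1 && c <= 0xD9) return 0x4A + (c - 0xD1);  /* J-R */
    if (c >= 0xE2 && c <= 0xE9) return 0x53 + (c - 0xE2);  /* S-Z */
    if (c >= 0xF0 && c <= 0xF9) return 0x30 + (c - 0xF0);  /* 0-9 */
    switch (c) {
    case 0x40: return 0x20;   /* space */
    case 0x4B: return 0x2E;   /* . */
    case 0x60: return 0x2D;   /* - */
    case 0x6B: return 0x2C;   /* , */
    case 0x7E: return 0x3D;   /* = */
    case 0x51: return 0xE9;   /* e-acute, U+00E9 */
    default:   return -1;
    }
}

/* Hypothetical: convert len EBCDIC bytes to NUL-terminated UTF-8.
 * Returns the UTF-8 length, or (size_t)-1 on an unmapped byte or a
 * too-small output buffer. */
size_t ebcdic_to_utf8(const unsigned char *in, size_t len,
                      char *out, size_t outlen)
{
    size_t i, o = 0;

    if (outlen == 0) return (size_t)-1;
    for (i = 0; i < len; i++) {
        long cp = cp037_to_ucs(in[i]);
        if (cp < 0) return (size_t)-1;
        if (cp < 0x80) {
            if (outlen - o < 2) return (size_t)-1;  /* byte + NUL */
            out[o++] = (char)cp;
        } else {
            assert(cp < 0x800);                     /* 2-byte form only */
            if (outlen - o < 3) return (size_t)-1;  /* 2 bytes + NUL */
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

Since EBCDIC is a single-byte charset, a full implementation would
naturally use a 256-entry table indexed by the input byte; the range
checks here only keep the sketch short.  The reverse direction needs
a UTF-8 decoder plus the inverse table.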

This latter set of routines would NOT be called by the implementation,
but would be made available by the implementation for application
use.

Charset handling actually affects almost all LDAP applications,
regardless of platform.  A number of different translation
interfaces are possible.  Suggestions and comments regarding such
interfaces are welcome.

To get the ball rolling, here are a few possibilities.  (I've
purposely avoided specifying function prototypes for now.)

They could be implemented using two pairs of translators per
local charset.
	ebcdic_to_t61()/t61_to_ebcdic()
	ebcdic_to_utf8()/utf8_to_ebcdic()

or they could be implemented such that the t61 vs. utf8 choice
is specified implicitly by the session handle:
	ebcdic_to_ldap(ld, ...)
	ldap_to_ebcdic(ld, ...)
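A minimal sketch of the session-handle variant, assuming a
hypothetical struct session in place of the real LDAP handle: the
only thing the converter consults is the protocol version, emitting
UTF-8 for LDAPv3 and T.61 for LDAPv2.  The EBCDIC side (code page 037
assumed) is cut down to lowercase letters, space and e-acute, so
e-acute is the only character whose encoding differs between the two
protocols.  The ebcdic_to_ldap() signature is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the session handle. */
struct session {
    int protocol_version;   /* 2 => T.61 on the wire, 3 => UTF-8 */
};

/* Decode a few code page 037 bytes to Unicode; -1 if unsupported. */
static long ebcdic_decode(unsigned char c)
{
    if (c >= 0x81 && c <= 0x89) return 0x61 + (c - 0x81);  /* a-i */
    if (c >= 0x91 && c <= 0x99) return 0x6A + (c - 0x91);  /* j-r */
    if (c >= 0xA2 && c <= 0xA9) return 0x73 + (c - 0xA2);  /* s-z */
    if (c == 0x40) return 0x20;                             /* space */
    if (c == 0x51) return 0xE9;                             /* e-acute */
    return -1;
}

/* Encode cp in the charset required by the session's protocol.
 * Returns bytes written, or 0 if unsupported or out of room. */
static size_t encode_for_session(const struct session *ld, long cp,
                                 unsigned char *out, size_t room)
{
    if (cp < 0x80) {         /* same byte in T.61 and UTF-8 for this subset */
        if (room < 1) return 0;
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp != 0xE9 || room < 2) return 0;  /* only e-acute handled */
    if (ld->protocol_version >= 3) {       /* UTF-8: U+00E9 => C3 A9 */
        out[0] = 0xC3;
        out[1] = 0xA9;
    } else {                               /* T.61: acute prefix, then e */
        out[0] = 0xC2;
        out[1] = 0x65;
    }
    return 2;
}

/* Hypothetical: EBCDIC -> charset chosen implicitly by the session.
 * Output is NUL-terminated; returns its length or (size_t)-1. */
size_t ebcdic_to_ldap(const struct session *ld, const unsigned char *in,
                      size_t len, unsigned char *out, size_t outlen)
{
    size_t i, o = 0, n;

    assert(ld != NULL);
    if (outlen == 0) return (size_t)-1;
    for (i = 0; i < len; i++) {
        long cp = ebcdic_decode(in[i]);
        if (cp < 0) return (size_t)-1;
        n = encode_for_session(ld, cp, out + o, outlen - 1 - o);
        if (n == 0) return (size_t)-1;
        o += n;
    }
    out[o] = '\0';
    return o;
}
```

The appeal of this variant is that the application never has to
track which protocol version the session is using; the cost is that
the same input yields different bytes depending on the handle.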

or they could be implemented such that the local charset was
specified as an argument:
	ldap_encode(ld, "ebcdic", ...)
	ldap_decode(ld, "ebcdic", ...)

or ....


BTW, folks interested in designing or implementing a
charset translation infrastructure for OpenLDAP are
more than welcome to do so.

	Kurt