
Re: String conversions UTF8 <-> ISO-8859-1



At 07:54 AM 5/30/2003, Hallvard B Furuseth wrote:
>Kurt D. Zeilenga writes:
>>At 06:32 AM 5/23/2003, Hallvard B Furuseth wrote:
>>>> It seems far more reasonable just to hand the content of the PDU to
>>>> the application and let it, with its application-specific knowledge,
>>>> apply necessary conversions.
>>>
>>> The thing is, then everyone has to add a lot of code to do this.
>> 
>> Well, to some degree, yes.  But they should have to write
>> their own conversion functions.
>I don't see why.

I left out a "not" here and there in my response.  Sorry.  So let me
try to restart the discussion a bit.  (I certainly don't want to dampen
your desire to make improvements to OpenLDAP.  Just providing comments
for your consideration.)

There are applications which use different character sets and encodings
when interacting with the user than when interacting with the
directory.  Those applications will need access to an appropriate
conversion routine.  Personally, I think applications should
deal with conversion issues at the user interface, not at the LDAP
interface.  This makes more and more sense when you consider that
many applications also have to deal with other libraries designed
for Unicode/UTF-8 interactions.  Anyway, there are certainly plenty
of "legacy" applications that demand conversion be done below
them (and often above them).

Anyway, the LDAP API, at least as currently designed, has little
knowledge of which values are character strings, as the protocol itself
does not impart that knowledge in its encoding but through tokens whose
semantics are defined in user application schema or solely by
applications.  Also, the API is unaware of extensions (attribute
description options, controls, etc.) which might affect the encoding
of strings carried in the protocol.

Even if you were to make the API schema-aware, the API would not
be aware of extensions it does not implement.  Also, the API would
not be aware of encoding conventions which are not reflected in the
schema.

Now, you suggested some sort of callback mechanism.  While callbacks
offer a powerful programming paradigm, particular callback interfaces
generally have very narrow applicability because a) they cover only
a subset of the values needing the function they intend to offer, b)
they don't provide enough context to the application, and/or c) don't
provide applications with enough response options.  Also, callbacks
tend to cover only half the problem: conversion of information being
provided by the directory service, not information being provided
to it.
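
To illustrate the context problem, a callback interface of the sort
being discussed might look like this (hypothetical, not an actual
OpenLDAP interface):

/* A hypothetical conversion callback; note how little context the
 * library could supply: it knows the attribute description and the
 * raw value from the PDU, but not the schema syntax, any extensions
 * in effect, or whether the value is a character string at all. */
#include <ldap.h>

typedef int (*ldap_conv_callback)(
    const char *attr,          /* attribute description, e.g. "cn" */
    struct berval *value,      /* raw value from the PDU */
    struct berval *converted,  /* out: converted value, if any */
    void *params);             /* application-supplied context */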

It has been suggested that another approach would be to have a
higher-level API where strings are passed (in both directions) between
the library and the application in the local character set/encoding.
This library would need to be schema-aware.  It would also need
some mechanism for the application to impart additional knowledge
(such as "passwords I provide are textual").  But, of course, this
would not address values carried outside of the core protocol (such
as in controls), nor would it address changes in semantics of the
core protocol because of the use of extensions.  This solution is
relatively complete; however, it would take a lot of work to implement.
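
Such an interface might be shaped roughly like this (the ldapx_*
names are hypothetical, for illustration only):

#include <ldap.h>

/* Tell the wrapper that values of this attribute are textual even
 * though the schema syntax (e.g. OCTET STRING for userPassword)
 * does not say so. */
int ldapx_set_textual(LDAP *ld, const char *attr);

/* As ldap_get_values_len(), but with values converted to/from the
 * local character set where the schema (or a hint registered
 * above) says the attribute is a character string. */
struct berval **ldapx_get_values_local(LDAP *ld, LDAPMessage *entry,
                                       const char *attr);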

>If not, what exactly do you propose?

I was thinking more of a collection of "tool" (or helper) routines
that act upon structures already returned by the API.  Applications
could use these to make it easier to perform not only
charset/encoding conversions but also other conversions (such
as language translation).

For example, maybe provide a "foreach entry" routine which calls
an application-specified function on each entry in a message
chain (previously provided by the API).  And then a "foreach
attribute" routine... etc..

Kurt