
Re: String conversions UTF8 <-> ISO-8859-1

To: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
Subject: Re: String conversions UTF8 <-> ISO-8859-1
From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Date: Fri, 23 May 2003 15:32:08 +0200
Cc: openldap-devel@OpenLDAP.org
In-reply-to: <5.2.0.9.0.20030523031228.02890ad0@127.0.0.1>
References: <5.2.0.9.0.20030521172025.02a5fb38@127.0.0.1> <HBF.20030430vsve@bombur.uio.no> <HBF.20030429tx53@bombur.uio.no> <006d01c30e8b$7c98af50$0e01a8c0@CELLO> <HBF.20030515ync0@bombur.uio.no> <HBF.20030523tepg@bombur.uio.no> <5.2.0.9.0.20030523031228.02890ad0@127.0.0.1>

Kurt D. Zeilenga writes:
>At 12:08 AM 5/23/2003, Hallvard B Furuseth wrote:
>
>>Plenty of experience with both LDAP and X.500 shows that if the
>>library doesn't offer charset conversion, very often it doesn't
>>get done.
> 
> Well, as we've already discussed, the library does not have the
> application-specific knowledge necessary to reliably determine
> when to do conversion and what conversion to do.

The library alone, no.  The application would have to supply
the attribute types to convert or not convert, at least.

> I assumed that when you suggested adding callbacks, that the
> purpose of this callback would be to provide a mechanism for
> library to provide values with enough protocol information
> so that the application could determine if and what kind of
> conversion was appropriate and then do it.

Yes.  And I'd also write a liblutf.a or something which along with
libiconv would install such a callback and provide conversion of the
attribute types and other stuff which the application configured it to
convert.  That should be sufficient for most applications.
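
For illustration only, here is a minimal sketch of the kind of conversion
such a library would perform, for the ISO-8859-1 to UTF-8 direction.  The
function name latin1_to_utf8 is hypothetical and not part of OpenLDAP or
libiconv; a real implementation would more likely go through libiconv so
that other character sets work too.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated ISO-8859-1 string to a
 * newly allocated UTF-8 string.  The caller frees the result.
 * Returns NULL if memory cannot be allocated. */
static char *latin1_to_utf8(const char *in)
{
    const unsigned char *p = (const unsigned char *)in;
    /* Each Latin-1 byte becomes at most two UTF-8 bytes. */
    unsigned char *out = malloc(2 * strlen(in) + 1);
    unsigned char *o = out;

    if (out == NULL)
        return NULL;

    for (; *p != '\0'; p++) {
        if (*p < 0x80) {
            /* ASCII is identical in both encodings. */
            *o++ = *p;
        } else {
            /* U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx. */
            *o++ = (unsigned char)(0xC0 | (*p >> 6));
            *o++ = (unsigned char)(0x80 | (*p & 0x3F));
        }
    }
    *o = '\0';
    return (char *)out;
}
```

The reverse direction needs an error path, since most UTF-8 text cannot be
represented in ISO-8859-1; that is one of the policy decisions the
application would have to configure.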

> The problem with this approach is that the application may need the
> whole PDU to determine if and what kind of conversion
> should take place.

I don't know of any cases, but I suppose that can be true.  Extended
operations or server/client controls, maybe?  Anyway, if so I don't want
to try to write something which is best for everyone, only something
which can easily be used by most applications.  I want to be able to
tell OpenLDAP users that there is a simple way to add proper charset
support to most applications, and for that matter that they most likely
have no excuse for not doing it.

If we don't do this with callbacks, I'd still like to write a liblutf
library which mirrors the LDAP API but does conversion, as I mentioned
before.  But I think this will face most of the same limitations as the
callback API as to when it will not be useful.

> Now, you might have envisioned some other sort of callback.

No.

> But, I find it hard to think of useful callback mechanism.
> The key here is that to determine when and what conversion
> to do, one needs both knowledge of protocol context
> (...)
> And it also may not be fully aware of protocol context (due
> to various kinds of extensions which can alter character encodings).

It seems to me such extensions must be critical controls, or something
else which the application must explicitly support.  So this is only a
problem for applications that support these controls.  And since they
support them, hopefully they do it right without my help.

Still, I suppose client/server controls could be passed to the callback
routines if we decide there is any need for that.  I suspect that will
make the API too bloated though, so it may be better to decide not to
support something as advanced as character set controls.  We'll see, if
I get that far.

> as well as application-specific context.  I don't see how the library
> can ever know the application-specific context.

I'm not sure which application-specific contexts you mean, other than
which attributes and other elements to convert or not.  E.g. DNs and
octet strings should usually not be converted.  There will be
applications for which this model is not useful, of course, but they
don't need to use it and I don't think they will pay a noticeable
overhead for the 'no, don't convert anything' tests in OpenLDAP.  I can
test the latter and report back if I implement this, if you wish.
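
As a rough sketch of the "which attributes to convert" model, assuming the
application hands over a plain list of attribute type names: the struct
lutf_config and the function lutf_should_convert below are hypothetical
names, not existing OpenLDAP API.  The first test in lutf_should_convert
is the cheap 'no, don't convert anything' check.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical per-application configuration: the attribute types whose
 * values should be converted.  Anything not listed (DNs, octet strings,
 * and so on) is left untouched. */
struct lutf_config {
    const char *const *convert_attrs;  /* NULL-terminated list, or NULL */
};

/* ASCII case-insensitive equality; LDAP attribute type names are
 * case-insensitive. */
static int attr_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both strings ended together */
}

/* Return 1 if values of attr_type should be converted, else 0. */
static int lutf_should_convert(const struct lutf_config *cfg,
                               const char *attr_type)
{
    const char *const *p;

    /* Nothing configured: bail out before any string comparison. */
    if (cfg == NULL || cfg->convert_attrs == NULL || attr_type == NULL)
        return 0;

    for (p = cfg->convert_attrs; *p != NULL; p++) {
        if (attr_equal(*p, attr_type))
            return 1;
    }
    return 0;
}
```

This sketch compares whole names only, so attribute options such as
"cn;lang-en" would need extra handling.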

> It's my opinion that no reasonable way for the library to
> determine in which places conversion is needed and,
> if so, what kind of conversion.  Excessing the necessary
> protocol information in a callback so the application is
> certainly possible, but seems kind of pointless.

I couldn't quite parse that.  If you meant to pass a lot of protocol
information to the callback: Right, I don't want that.  I thought I'd
pass one integer with context (Search filter, attribute in Add
operation, whatever), one string or integer index with attribute type if
any, and an array with the string values to convert.  I hadn't thought
of how much info to pass in the first integer.  I suppose it could be
quite detailed, an (operation | further detail) thingy, where most
applications - and liblutf in particular - would only pay attention to
the 'operation' bits.
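
To make the shape of that callback concrete, here is one possible sketch.
Every name in it (the LUTF_* constants, lutf_convert_cb, lutf_context_op,
example_cb) is made up for illustration, and the bit layout is just one
way to pack (operation | further detail) into an integer.

```c
#include <stddef.h>

/* Hypothetical context word: the low byte names the operation, the
 * higher bits carry further detail that most callbacks can ignore. */
#define LUTF_OP_MASK        0x00ffu
#define LUTF_OP_SEARCH      0x0001u
#define LUTF_OP_ADD         0x0002u
#define LUTF_DETAIL_FILTER  0x0100u  /* value appears in a search filter */
#define LUTF_DETAIL_ATTR    0x0200u  /* value is an attribute value */

/* Hypothetical callback type: one context integer, the attribute type
 * if any (NULL otherwise), and a NULL-terminated array of string values
 * which the callback may replace in place.  Returns the number of
 * values it handled, or a negative value on error. */
typedef int (*lutf_convert_cb)(unsigned context, const char *attr_type,
                               char **values);

/* Extract only the 'operation' bits. */
static unsigned lutf_context_op(unsigned context)
{
    return context & LUTF_OP_MASK;
}

/* Example callback that looks at the operation bits only and claims
 * every value of an Add operation; a real one would convert them. */
static int example_cb(unsigned context, const char *attr_type,
                      char **values)
{
    int n = 0;

    (void)attr_type;  /* this example does not filter on attribute type */
    if (lutf_context_op(context) != LUTF_OP_ADD)
        return 0;
    while (values[n] != NULL)
        n++;
    return n;
}
```

A callback written this way keeps working if more detail bits are defined
later, since it masks them off before comparing.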

> It seems far more reasonable just to hand the content of the PDU to
> the application and let it, with its application-specific knowledge,
> apply necessary conversions.

The thing is, then everyone has to add a lot of code to do this.  Which
they simply don't do.  It's a lot of duplication of work, too.  Unless
you point out something I've missed, I think an API in OpenLDAP will
provide most applications with what they need so they can do it without
adding a lot of charset conversion calls all over their code.

-- 
Hallvard

