Re: String conversions UTF8 <-> ISO-8859-1

Kurt D. Zeilenga writes:
>At 10:58 AM 6/1/2003, Hallvard B Furuseth wrote:
>> First of all, I think you are trying to bite over too much. 
> Actually, I'm trying to bite off very little of "the problem".  In fact,
> I'd prefer just to provide a few helper routines to have application
> developers avoid some tedious coding.

Well, you are trying to provide a little help to almost all
applications.  I'm trying to provide more help to somewhat fewer
applications.

>> Which is why I'm personally only interested in getting it to be
>> useful in _most_ cases.  I think people can do their own conversion
>> in the remaining cases.
> I think _most_ people will have remaining cases.  That is, _most_
> people will still need to do some conversion without the assistance
> of the LDAP API.

I don't, not even with a pretty context-less callback API.  Do you have
an example of some reasonably simple code which would need to do its own
conversion in addition to using ours?  ('Reasonably simple' because I
want to play with it and see if I agree that it would need both.)

But in any case, I'd like the assistance the LDAP API gives to require
the user to add as little code as possible.

>> Meaning, the application should think UTF-8 internally?
> Unless they have a good reason not to use Unicode/UTF-8 internally, yes.
> Otherwise they'll have not only to deal with user<->internal conversions
> but internal<->Internet conversions.

First, as I said, applications that use UTF-8 internally are irrelevant
to this discussion, because they won't use any of the conversion tools
we might provide.  Second, all the world is not Internet.  Third, the
Internet is still primarily Latin-1, not UTF-8, despite all the Unicode
hype.  Fourth, UTF-8 can be a poor internal representation for Unicode
if the application does any amount of text processing, since strings
must be parsed just to find the length of a character, and a UTF-8
character cannot be stored in a variable (be it a char variable or an
unsigned long).  UCS-something can be better internally.  Look at the
Emacs internals if you don't believe me.  So far Emacs uses a
multi-byte encoding other than UTF-8, but in this regard the principle
is the same.
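
To illustrate the fourth point, here is a minimal sketch in C of what
"parsing just to find the length of a character" involves.  The function
names are hypothetical and not part of any LDAP library, and I assume
sequences of at most four bytes (as in RFC 3629) without rejecting
overlong forms.  Note that the decoded code point fits in an unsigned
long, while the encoded form has no fixed width.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence starting with lead byte c,
 * or 0 if c cannot start a sequence (continuation or invalid byte). */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;             /* 0xxxxxxx: ASCII */
    if ((c & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 0;
}

/* Decode one character from the NUL-terminated string s into *out.
 * Returns the number of bytes consumed, or 0 on malformed input. */
static size_t utf8_decode(const char *s, unsigned long *out)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i, len = utf8_seq_len(p[0]);
    unsigned long cp;

    assert(out != NULL);
    if (len == 0) return 0;
    /* Keep only the payload bits of the lead byte. */
    cp = (len == 1) ? p[0] : (unsigned long)(p[0] & (0xFF >> (len + 1)));
    for (i = 1; i < len; i++) {
        /* A NUL here (truncated sequence) also fails this test. */
        if ((p[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (unsigned long)(p[i] & 0x3F);
    }
    *out = cp;
    return len;
}

/* Count characters, not bytes; (size_t)-1 signals malformed input. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0, len;
    unsigned long cp;

    while (*s != '\0') {
        len = utf8_decode(s, &cp);
        if (len == 0) return (size_t)-1;
        s += len;
        n++;
    }
    return n;
}
```

With a fixed-width internal form, the character count is just the array
length and no such scan is needed.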

>> The choice of internal character set in the application is up to the
>> application developer, not to us.
> Of course, as it is they, not us, who live with that choice.

If by 'live with' you mean 'won't be able to use an LDAP API for
conversion very well', I disagree again.

> I think we actually agree that we should provide some "help" to
> application developers who need to do "conversions".

That, yes.

> I think we just disagree over the choice of the API mechanism to use
> to provide that "help".

And what situations it needs to cover, and about UTF-8 in general.  For
example, as I said in the previous message, I don't really believe in
your extensions that modify the character set/encoding - at least not
as something relevant to this discussion.

> The problem with callbacks is coming up with a reasonable way to
> provide enough context so that the application can make the right
> conversion.

I still want to see some concrete examples of what you mean here.
However, if you are talking about controls, these can be passed
to the callback mechanism to initialize a set of parsing commands,
and they can modify a data structure which is passed to the other
callbacks.

>>> For example, maybe provide a "foreach entry" routine which calls
>>> an application-specified function on each entry in a message
>>> chain (previously provided by the API).  And then a "foreach
>>> attribute" routine... etc..
>> This sounds very slow.  Seems to me it would entail a lot of unpacking
>> and repacking of Ber elements in the LDAPMessages.
> If we go with callbacks, un/repacking of BER is exactly what we'll be
> doing.

Why, no.  The LDAP API provides data to the user as strings, not as
pointers into the Ber elements.  The callbacks would happen at the
places where the data is unpacked anyway, like ldap_get_values()
and when 'char*' strings with DNs are created.
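
A rough sketch of the idea, not a proposal for the actual interface: the
callback type convert_fn and the function values_with_callback() below
are hypothetical, and the latter only stands in for the point where the
library already copies values into a NULL-terminated string array.  No
real LDAP calls are used.  The example callback converts UTF-8 to
ISO-8859-1 and fails on anything above U+00FF.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: returns a malloc()ed converted copy of the
 * value, or NULL if the conversion fails. */
typedef char *(*convert_fn)(const char *utf8, void *ctx);

/* Example callback: UTF-8 -> ISO-8859-1. */
static char *utf8_to_latin1(const char *utf8, void *ctx)
{
    const unsigned char *in = (const unsigned char *)utf8;
    char *out = malloc(strlen(utf8) + 1);   /* never longer than input */
    char *o = out;

    (void)ctx;                              /* unused in this sketch */
    if (out == NULL) return NULL;
    while (*in != '\0') {
        if (*in < 0x80) {
            *o++ = (char)*in++;
        } else if ((*in == 0xC2 || *in == 0xC3) && (in[1] & 0xC0) == 0x80) {
            /* Two-byte sequences for U+0080..U+00FF. */
            *o++ = (char)(((in[0] & 0x03) << 6) | (in[1] & 0x3F));
            in += 2;
        } else {
            free(out);                      /* not representable */
            return NULL;
        }
    }
    *o = '\0';
    return out;
}

static void free_values(char **vals)
{
    size_t i;

    if (vals == NULL) return;
    for (i = 0; vals[i] != NULL; i++) free(vals[i]);
    free(vals);
}

/* Stand-in for the unpacking step: each raw value is copied exactly
 * once, through the callback, so nothing is unpacked or repacked twice. */
static char **values_with_callback(const char *const *raw,
                                   convert_fn cb, void *ctx)
{
    size_t i, n = 0;
    char **vals;

    assert(raw != NULL && cb != NULL);
    while (raw[n] != NULL) n++;
    vals = malloc((n + 1) * sizeof *vals);
    if (vals == NULL) return NULL;
    for (i = 0; i <= n; i++) vals[i] = NULL;
    for (i = 0; i < n; i++) {
        vals[i] = cb(raw[i], ctx);
        if (vals[i] == NULL) {              /* conversion failed */
            free_values(vals);
            return NULL;
        }
    }
    return vals;
}
```

The ctx pointer is where any per-request state would travel, if it turns
out that the callback needs some.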

> If we just provide helpers, the application can do conversion
> where they normally do value extraction and hence avoid repacking.

If you are talking about wrapper functions around all the standard LDAP
API calls, yes.  If you mean 'foreach entry' helper functions, the
application must do quite a bit more work.

For example, I looked at ldapsearch and ldapmodify, and found that the
most annoying part of adding charset support - if there is some 'foreach
entry/attribute' API - is to convert all the DNs, if the user wants to
do that, including the checks for whether or not the conversion was
successful, and the free() calls afterwards to free the converted DN.
'foreach entry/attribute' won't help there, but callbacks or a wrapper
API will.  I'm not sure a wrapper API will address your objections about
context better than callbacks will, though.  Gotta play with it a bit.
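
As a first rough cut of what a wrapper would hide, here is a sketch for
the outgoing direction (a Latin-1 DN typed by the user, which must reach
the server as UTF-8).  All names are hypothetical: dn_op_fn stands in
for whatever LDAP call takes the DN, and dn_byte_length() is only a
dummy operation so that the sketch is self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an LDAP operation that takes a UTF-8 DN. */
typedef int (*dn_op_fn)(const char *utf8_dn, void *arg);

/* ISO-8859-1 -> UTF-8.  Returns a malloc()ed string, NULL if out of
 * memory.  Each Latin-1 byte becomes at most two UTF-8 bytes. */
static char *latin1_to_utf8(const char *latin1)
{
    const unsigned char *in = (const unsigned char *)latin1;
    char *out = malloc(2 * strlen(latin1) + 1);
    char *o = out;

    if (out == NULL) return NULL;
    for (; *in != '\0'; in++) {
        if (*in < 0x80) {
            *o++ = (char)*in;
        } else {
            *o++ = (char)(0xC0 | (*in >> 6));
            *o++ = (char)(0x80 | (*in & 0x3F));
        }
    }
    *o = '\0';
    return out;
}

/* The wrapper: convert, check, call, free.  This is the boilerplate
 * the application would otherwise repeat around every DN it passes. */
static int with_utf8_dn(const char *latin1_dn, dn_op_fn op, void *arg)
{
    char *dn;
    int rc;

    assert(op != NULL);
    dn = latin1_to_utf8(latin1_dn);
    if (dn == NULL) return -1;      /* conversion failed */
    rc = op(dn, arg);
    free(dn);
    return rc;
}

/* Dummy operation: stores the byte length of the converted DN. */
static int dn_byte_length(const char *utf8_dn, void *arg)
{
    *(size_t *)arg = strlen(utf8_dn);
    return 0;
}
```

The incoming direction looks the same, except that the conversion can
fail for characters outside Latin-1, so the check matters more there.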