
RE: String conversions UTF8 <-> ISO-8859-1



> -----Original Message-----
> From: owner-openldap-devel@OpenLDAP.org
> [mailto:owner-openldap-devel@OpenLDAP.org]On Behalf Of Hallvard B Furuseth

> Michael Ströder writes:
> > And wouldn't it be necessary to have schema knowledge to determine
> > whether the conversion is applicable at all? E.g. if syntax is
> > OctetString the charset conversion might not be the right thing.
>
> Yes.  In most cases I think it would be enough for the client to know
> which syntaxes should _not_ be converted, though.

Perhaps so; there definitely aren't a lot of Binary or Not-Human-Readable
syntaxes in the standard schema.

> E.g. it wouldn't hurt
> to convert OctetString, since OctetString can't contain non-ASCII
> characters.

OctetString holds arbitrary octets; it can certainly contain non-ASCII bytes.

>  OTOH, if the client used EBCDIC it would need to
> know a bit more...

Indeed. This is quite a headache. I guess since EBCDIC is only an 8-bit
character set we would just map "unmappable" codes to '?' and leave it at
that.
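
A minimal sketch of that '?' fallback, written in Python rather than C for
brevity. It assumes the client uses the cp037 flavour of EBCDIC (the thread
does not say which code page), and the function name to_client is made up
for illustration.

```python
def to_client(utf8_value: bytes, client_charset: str = "cp037") -> bytes:
    """Convert a UTF-8 value to the client charset.

    Characters the client charset cannot represent are replaced by '?'
    (encoded in the client charset), so the conversion is lossy.
    """
    text = utf8_value.decode("utf-8")
    # errors="replace" substitutes '?' for every unmappable character.
    return text.encode(client_charset, errors="replace")
```

The euro sign, for example, has no cp037 code point, so it would come out as
'?', while accented Latin-1 letters survive the trip.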
>
> I suggest ldap.conf could contain lines with
>
>    attr-charset <charset> [<attribute-name> <attribute-name>...]
>    client-charset <charset>
>
> <charset> would normally be "unknown" alias "binary" or "UTF-8".  The
> default attr-charset would be unknown, but an "attr-charset UTF-8" line
> without any attributes would set the default attr-charset to UTF-8.
>
> Then, all that remains is to implement this:-)

Given that conversion utilities such as iconv already exist, I think this is
best left as an exercise for the reader.
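
For anyone taking up the exercise, here is a rough sketch of how the
proposed lines might be read and applied. Nothing like this exists in
ldap.conf today; the directive names come from the proposal above, while
the function names, the "#" comment handling, and the choice to treat a
missing client-charset line as UTF-8 are assumptions. It uses Python's
built-in codecs instead of iconv to stay self-contained.

```python
BINARY = {"unknown", "binary"}  # charsets that mean "do not convert"

def parse_charset_conf(lines):
    """Return (client_charset, default_attr_charset, per_attribute_charsets)."""
    client = "UTF-8"      # assumption: no client-charset line, no conversion
    default = "unknown"   # proposed default: leave values untouched
    per_attr = {}
    for line in lines:
        words = line.split()
        if len(words) < 2 or words[0].startswith("#"):
            continue      # skip blank, comment, and malformed lines
        keyword, charset, attrs = words[0].lower(), words[1], words[2:]
        if keyword == "client-charset":
            client = charset
        elif keyword == "attr-charset":
            if attrs:
                for name in attrs:
                    per_attr[name.lower()] = charset
            else:
                # no attribute names: this line sets the default
                default = charset
    return client, default, per_attr

def value_for_client(attr, value, conf):
    """Convert one attribute value (bytes) for display on the client."""
    client, default, per_attr = conf
    charset = per_attr.get(attr.lower(), default)
    if charset.lower() in BINARY:
        return value      # binary or unknown: pass through unchanged
    # unmappable characters become '?', as discussed earlier in the thread
    return value.decode(charset).encode(client, errors="replace")
```

With "client-charset ISO-8859-1" and "attr-charset UTF-8 cn sn", a cn value
would be transcoded to Latin-1 while jpegPhoto or any unlisted attribute
would pass through byte for byte.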

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support