
Re: invalid syntax when teletexstring

Erwann ABALEA wrote:
2011/7/29 Howard Chu<hyc@symas.com>:
Howard Chu wrote:
Erwann ABALEA wrote:
Do you have any document or pointer to understand the task of
converting to/from T.61, and incompatible character sets you talked
about? I Googled for this, but I'm not sure of what I found (what I
found reminds me of old character sets we used many years ago in
France for the Minitel, with G1/G2 character groups, etc, not that far
from VT consoles).

You can reference this old draft; I wrote Appendix A and B to document the
mapping as we understood it at that time. These Appendices were dropped from
the final version because it was considered futile to attempt to document
T.61 character encoding rules.


You can also read libldap/t61.c; the code has been present in every
release since 2002 but is not compiled or used.

This Guide has a pretty good discussion of the issues.


The section on "Character Sets" is particularly relevant. The section on
"Comparing DNs" is somewhat relevant, though in fact OpenLDAP has already
solved this problem (for all the string types besides T61String) by doing
all matching in UTF-8.

Thank you for the pointers. I appreciate Peter's writings, and already
read this text, some time ago, but wasn't focused on T.61 then.
OpenSSL in its 1.0.0 version internally stores the names in UTF8,
"semi-normalized" form (useless spaces removed, everything is
converted to lowercase, but no NFC/NFD normalization is done).
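
To make the "semi-normalized" form concrete, here is a minimal sketch of that
kind of normalization. It is not OpenSSL's code; the function name
semi_normalize is hypothetical, and "useless spaces" is assumed to mean
leading, trailing, and repeated whitespace. It illustrates why skipping NFC/NFD
matters: a precomposed and a decomposed spelling of the same name still compare
as different.

```python
def semi_normalize(name: str) -> str:
    """Hypothetical helper: trim/collapse whitespace and lowercase.

    Deliberately performs no Unicode normalization (no NFC/NFD step).
    """
    # str.split() with no argument drops leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    collapsed = " ".join(name.split())
    return collapsed.lower()


# U+00E9 is a precomposed e-acute; "e" + U+0301 is the decomposed form.
precomposed = semi_normalize("  Caf\u00e9   Praha ")
decomposed = semi_normalize("Cafe\u0301 Praha")

# Both render the same, but they are different code point sequences.
print(precomposed == decomposed)
```

A comparison that also applied unicodedata.normalize("NFC", ...) to both sides
would treat the two spellings as equal.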

I'm reading now libldap/t61.c. I just read the IETF draft, and the
numerous tables... What a mess. X.680 has a reference to T.61
recommendation, which was deleted some years ago, and I'm not clever
enough to make Google find a copy of the standard. It can't be bought
anymore from ITU, but it's still referenced by later standards. Nice.

The 1988 edition is still downloadable.

It also references T.51:

Unfortunately the 1993 edition of T.61 is gone.

Meanwhile, I still haven't found the Czech CSCA certificate, but I
know what to do with the remaining 1% uncertainty. The CN field is
encoded as T61String, to hold the "CSCA_CZ" value. That fits well
within the 7-bit limit.

Then you should just be using PrintableString. You're required to use the least-inclusive string type, after all.
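
As a rough illustration of the "least-inclusive string type" rule, the sketch
below checks whether a value fits the PrintableString repertoire (letters,
digits, space, and the punctuation ' ( ) + , - . / : = ?). The function name
fits_printable_string is hypothetical and this is not code from OpenLDAP or
OpenSSL. One thing such a check brings out: the underscore is not in that
repertoire, so the literal value "CSCA_CZ" would not pass, even though it is
pure 7-bit.

```python
import string

# PrintableString repertoire as defined for ASN.1: A-Z, a-z, 0-9, space,
# and the eleven punctuation characters listed below.
PRINTABLE_STRING_CHARS = frozenset(
    string.ascii_letters + string.digits + " '()+,-./:=?"
)


def fits_printable_string(value: str) -> bool:
    """Hypothetical helper: True if every character is in the repertoire."""
    return all(ch in PRINTABLE_STRING_CHARS for ch in value)


for candidate in ("CSCA CZ", "CSCA_CZ", "Symas Corp."):
    print(candidate, fits_printable_string(candidate))
```

An encoder following the rule would try the narrowest type first and only fall
back to a wider one when a check like this fails.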

If everything is internally converted to UTF8 and t61.c seems to
provide a lossless T.61 to UTF8 conversion, why isn't it used?

Because it's incomplete. It only handles the original 333-character repertoire of T.61; it doesn't handle shift-in/shift-out to other character sets. I believe in the last version of T.61 there was support for Japanese (JIS), Chinese, and Greek. So quite a lot more logic and tables need to be added, and it looks like a lot of work for something nobody should actually be using.
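
To illustrate the gap, a converter limited to the base repertoire could at
least detect input it cannot handle and refuse it rather than mis-decode it.
The sketch below is not taken from libldap/t61.c. It assumes that character-set
switching in T.61 data shows up as the ISO 2022 control codes ESC, shift-out,
and shift-in, and the function name uses_charset_switching is hypothetical.

```python
ESC = 0x1B  # escape: introduces a character-set designation sequence
SO = 0x0E   # shift-out
SI = 0x0F   # shift-in

SWITCHING_CONTROLS = frozenset((ESC, SO, SI))


def uses_charset_switching(data: bytes) -> bool:
    """Hypothetical check: True if the value contains any switching control.

    A converter that only knows the base repertoire should reject such
    input, because the bytes after a switch belong to another table.
    """
    return any(byte in SWITCHING_CONTROLS for byte in data)


print(uses_charset_switching(b"CSCA_CZ"))
print(uses_charset_switching(b"\x1b$Babc"))
```

Detection is the easy part; actually decoding the switched sets is where the
extra logic and tables mentioned above would be needed.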

  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/