
Re: Grammar nits (Re: [Fwd: I-D ACTION:draft-good-ldap-ldif-04.txt])



At 20:37 26.06.99 +0200, Hallvard B Furuseth wrote:
> Sorry, I lost this:
>
> Harald Tveit Alvestrand writes:
>
> > I recommend basing the grammar on octets, and saying that you define
> > it like this.
>
> I agree, plus a mention that this is after the file has been converted
> to UTF-8 if it was encoded differently (as in my `what is a character?'
> point).  Also: Lines must not be wrapped in the middle of a "multi-octet
> UTF-8 character" (or whatever is the proper phrase), so UTF-8 LDIF files
> can be printed/edited by a program which handles UTF-8.

I agree. If you want a grammar for UTF-8, here's one from ACAP:

UTF8-1             = %x80-BF

UTF8-2             = %xC0-DF UTF8-1

UTF8-3             = %xE0-EF 2UTF8-1

UTF8-4             = %xF0-F7 3UTF8-1

UTF8-5             = %xF8-FB 4UTF8-1

UTF8-6             = %xFC-FD 5UTF8-1

UTF8-CHAR          = TEXT-UTF8-CHAR / CR / LF

SAFE-UTF8-CHAR     = SAFE-CHAR / UTF8-2 / UTF8-3 / UTF8-4 /
                     UTF8-5 / UTF8-6

(See the ACAP RFC, RFC 2244, for SAFE-CHAR; you will probably want to
define your own.)
You can then say that folding cannot occur inside a UTF8-CHAR.
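
To make the folding rule concrete, here is a rough Python sketch. It is
not taken from the draft or from ACAP. It folds one logical line at octet
boundaries and backs up whenever the cut would land on a UTF8-1
(continuation) octet, so no multi-octet character is split. The names
fold_line and is_continuation and the default width of 76 octets are
assumptions made for the example; the single leading space on
continuation lines follows the LDIF folding convention.

```python
from typing import List


def is_continuation(octet: int) -> bool:
    """True for a UTF8-1 octet (%x80-BF), i.e. not the start of a character."""
    return 0x80 <= octet <= 0xBF


def fold_line(line: bytes, width: int = 76) -> List[bytes]:
    """Fold one logical line (already UTF-8) into physical lines.

    Each physical line is at most `width` octets. Continuation lines
    start with one space. A cut is never made inside a multi-octet
    character. `width` is a hypothetical parameter for this sketch.
    """
    if width < 7:
        # Room is needed for the leading space plus a 6-octet character.
        raise ValueError("width must be at least 7 octets")
    pieces = []
    start = 0
    room = width  # the first physical line has no leading space
    while len(line) - start > room:
        cut = start + room
        # Back up until the cut sits on the first octet of a character.
        while cut > start and is_continuation(line[cut]):
            cut -= 1
        if cut == start:
            raise ValueError("malformed UTF-8: run of continuation octets")
        pieces.append(line[start:cut])
        start = cut
        room = width - 1  # later lines carry a one-octet leading space
    pieces.append(line[start:])
    return [pieces[0]] + [b" " + p for p in pieces[1:]]


if __name__ == "__main__":
    # 6 ASCII octets, then a 2-octet character that straddles octet 7.
    sample = ("x" * 6 + "\u00e5" + "y").encode("utf-8")
    print(fold_line(sample, width=7))  # [b'xxxxxx', b' \xc3\xa5y']
```

The sketch works on octets throughout, in line with defining the grammar
on octets, and only needs the UTF8-1 range to find a safe cut point.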

                          Harald

--
Harald Tveit Alvestrand, Maxware, Norway
Harald.Alvestrand@maxware.no