
Re: Grammar nits (Re: [Fwd: I-D ACTION:draft-good-ldap-ldif-04.txt])



Harald Tveit Alvestrand writes:
>>
>>How do we say "any character except NUL, CR or LF" in ABNF when we don't
>>know the max integer code of a character in the parser's character set?
>>Assume iso10646 and say something like `%x01-09/%x0B-0C/%x0E-7FFFFFFF'?
> 
> RFC 2234 is quiet here:

Meaning "we can't"?

> The ACAP specs have chosen to represent their grammar as a grammar of
> octets, meaning that the correct "high value" is 255, or 0xFF.

...or it means "an ACAP grammar describes the file in terms of octets"?
How nice for hosts with 9-bit or 16-bit bytes:-)
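
To make the two readings concrete, here is a rough Python sketch (not
anything from the draft). The code point ranges are the ones from the ABNF
quoted above; the octet variant just caps the last range at 0xFF, as the
ACAP-style reading would. All names are made up for illustration.

```python
# Ranges for "any character except NUL, CR or LF".
# CODE_POINT_RANGES follows the ABNF quoted above (iso10646 reading);
# OCTET_RANGES is the same rule with the top capped at 0xFF (octet reading).
CODE_POINT_RANGES = [(0x01, 0x09), (0x0B, 0x0C), (0x0E, 0x7FFFFFFF)]
OCTET_RANGES = [(0x01, 0x09), (0x0B, 0x0C), (0x0E, 0xFF)]


def in_ranges(value, ranges):
    """Return True if value falls inside any (low, high) pair."""
    return any(low <= value <= high for low, high in ranges)


def matches_as_code_points(text):
    """Check a decoded string, one code point at a time."""
    return all(in_ranges(ord(ch), CODE_POINT_RANGES) for ch in text)


def matches_as_octets(data):
    """Check an encoded byte string, one octet at a time."""
    return all(in_ranges(octet, OCTET_RANGES) for octet in data)
```

Both checks reject NUL, CR and LF. They differ only in what the grammar's
terminals are taken to be: characters, or the octets that encode them.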


> This actually brings out an important question:
> What's the character set of an LDIF file?
> Note 8 to the grammar seems to assume that the character set is UTF-8,

Note 8 says the input file's encoding can be anything, but the generated
LDIF content (the output) must be UTF-8.

I think the input file must be converted to UTF-8 _before_ it is fed to
the grammar, since the grammar describes LDAP strings (= UTF-8 strings).
Maybe the draft should say so.

However, I'm not sure that answers your question even if the file is
UTF-8.  Is a `character' an octet, a sequence of UTF-8 octets which
encodes an iso-10646 character, or that iso-10646 character?
If it is not the former, we can't fold a line in the middle of a
multi-octet encoded iso10646 character.
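
A small Python sketch of the difference, assuming folding means "break the
line and start the continuation with a single space". The function names
and the width parameter are hypothetical. fold_octets() breaks at fixed
octet offsets; fold_chars() backs up so that it never breaks in front of a
UTF-8 continuation octet (10xxxxxx).

```python
def fold_octets(line, width):
    """Fold a byte string at fixed octet offsets, ignoring characters."""
    chunks = [line[:width]]
    rest = line[width:]
    step = width - 1  # continuation lines spend one octet on the space
    for i in range(0, len(rest), step):
        chunks.append(b" " + rest[i:i + step])
    return chunks


def fold_chars(line, width):
    """Fold valid UTF-8 only on character boundaries (assumes width >= 5)."""
    pieces = []
    start = 0
    limit = width
    while len(line) - start > limit:
        end = start + limit
        # Back up while the octet that would start the next line is a
        # UTF-8 continuation octet, i.e. we are inside a character.
        while end > start and (line[end] & 0xC0) == 0x80:
            end -= 1
        if end == start:
            # No boundary found (not valid UTF-8); fall back to a hard break.
            end = start + limit
        pieces.append(line[start:end])
        start = end
        limit = width - 1
    pieces.append(line[start:])
    return [pieces[0]] + [b" " + piece for piece in pieces[1:]]


def unfold(chunks):
    """Undo either fold by dropping the leading space of continuations."""
    return chunks[0] + b"".join(chunk[1:] for chunk in chunks[1:])
```

Both folds unfold to the same octets, so under the "character = octet"
reading either is fine. Only under the other two readings does the first
one produce lines that are not valid UTF-8 on their own.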

> and the changelog says this is "clarified", but I can't find the
> clarification....

I guess note 8 was added in version -01, which is where the changelog
says it was clarified.

> Here are the REAL grammar nits from version 04:
> 
> - missing endquote for "control"

Yup.  See my list of grammar bugs.

> - extra space in front of repetition in second line of same definition

Didn't catch that one.

> DIGIT: Used 1 times, but not defined
> base-64-dn: Used 1 times, but not defined
> base-64-rdn: Used 1 times, but not defined
> base64-rdn: Defined but not used

Yup.

> NUL: Defined but not used

Well, it's used in the description of <safe>.
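
For what it's worth, a cross-reference check of this kind is easy to
sketch. The Python below is my own rough illustration, not the tool that
produced the report above. It assumes each rule starts at the left margin
as `name = ...`, and it skips quoted strings, <prose>, comments and %x
values when collecting names. A checker built this way would report a
rule as unused if its name only appears in a comment or in prose.

```python
import re

# One token per match; only the "name" alternative is a rule reference.
TOKEN = re.compile(r'''
    "[^"]*"                      # quoted string
  | <[^>]*>                      # prose description
  | ;.*                          # comment to end of line
  | %[bdxBDX][0-9A-Fa-f.-]+      # numeric value such as %x0B-0C
  | (?P<name>[A-Za-z][A-Za-z0-9-]*)
''', re.VERBOSE)

# A rule definition: a name at the start of the line followed by = or =/
RULE_HEAD = re.compile(r"([A-Za-z][A-Za-z0-9-]*)\s*=/?")


def check_grammar(text):
    """Return (used but not defined, defined but not used) rule names."""
    defined, used = set(), set()
    for line in text.splitlines():
        head = RULE_HEAD.match(line)
        if head:
            # ABNF rule names are case-insensitive, so compare in lowercase.
            defined.add(head.group(1).lower())
            body = line[head.end():]
        else:
            body = line  # continuation line or comment-only line
        for token in TOKEN.finditer(body):
            if token.group("name"):
                used.add(token.group("name").lower())
    return sorted(used - defined), sorted(defined - used)
```

Note that the top-level rule of any grammar shows up as "defined but not
used" with this approach, so that list always needs a human look.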

-- 
Hallvard