
Re: separator escapement in draft-ietf-ldapbis-syntaxes-00.txt



I agree with this direction of clarification.

>>> "Kurt D. Zeilenga" <Kurt@OpenLDAP.org> 07/30/01 07:12PM >>>
The intent, as I see it, of this paragraph is to provide
a set of general rules for escaping special characters
within directory strings contained within larger productions,
where the set of special characters depends upon
the specification of the larger production.

For <value>s within substring assertions, the special
characters to be escaped are '*' and '\'.  For <dstring>s
within postalAddress, '$' and '\'.  

There must be an explicit statement in the specification
of the larger production to define which special characters
are to be escaped per these rules for each sub-production
needing such.

I suggest the paragraph be replaced with something like
(needs work):

    In encodings where an arbitrary string is used as part of
    a larger production, a quoting or escaping mechanism is
    needed so that separators used in the larger production
    may appear (escaped or quoted) within the arbitrary string.
    For many productions, a backslash quoting mechanism is used.
    To allow for re-use by LDAP syntax designers, a general
    specification is provided below.
 
    The backslash quoting mechanism uses the backslash character ('\')
    followed by two hexadecimal digits (0-9,A-F,a-f) to replace
    any special characters appearing in the arbitrary string.  The
    set of special characters includes the backslash character as well
    as one or more separator (or other) characters as detailed
    in the specification of the larger production. The escaped string
    must conform to the following ABNF:
        string = *( character / backslash HEX HEX )
        character = <any character EXCEPT special>
        special = backslash / separators
        separators = <specific to larger production, MUST be chosen
                from 0x20-0x5b,0x5d-0x7e>

    For example, the string "foo\ $bar" would be represented in a larger
    production where '$' was used as a separator as "foo\5C \24bar"
    or as "foo\5c \24bar".
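
As a rough illustration of the proposed mechanism, here is a minimal
Python sketch. The function names escape() and unescape() are
hypothetical and do not come from any draft or library. The sketch
assumes the escaped string is handled as text, that only the backslash
and the separators named by the larger production are escaped, and
that uppercase hex is emitted while either case is accepted on input.

```python
import string


def escape(value: str, separators: str) -> str:
    """Replace backslash and each separator with backslash + two hex digits."""
    for sep in separators:
        # The proposed ABNF limits separators to 0x20-0x5b and 0x5d-0x7e.
        if not 0x20 <= ord(sep) <= 0x7E or sep == "\\":
            raise ValueError("invalid separator: %r" % sep)
    special = set(separators) | {"\\"}
    out = []
    for ch in value:
        if ch in special:
            out.append("\\%02X" % ord(ch))  # e.g. '$' becomes \24
        else:
            out.append(ch)
    return "".join(out)


def unescape(escaped: str) -> str:
    """Reverse escape(): decode every backslash HEX HEX triple."""
    out = []
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch == "\\":
            pair = escaped[i + 1:i + 3]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError("backslash not followed by two hex digits")
            out.append(chr(int(pair, 16)))
            i += 3
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With '$' as the only separator, escape("foo\\ $bar", "$") yields the
first form given in the example above, and unescape() accepts both the
\5C and \5c spellings.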

Note that the DN string representation, like many other larger
productions, provides its own mechanism.

Kurt



At 01:58 PM 7/30/2001, Jim Sermersheim wrote:
>The text of the paragraph from 4.3 has been moved to Section 2.1 (a good thing I think), but it has been reworded from:
>
>   In encodings where an arbitrary string, not a Distinguished Name, is
>   used as part of a larger production, and other than as part of a
>   Distinguished Name, a backslash quoting mechanism is used to escape
>   the following separator symbol character (such as "'", "$" or "#") if
>   it should occur in that string.  The backslash is followed by a pair
>   of hexadecimal digits representing the next character.  A backslash
>   itself in the string which forms part of a larger syntax is always
>   transmitted as '\5C' or '\5c'. An example is given in section 6.27.
>
>to:
>
>   In cases where an arbitrary string, not a Distinguished Name or part 
>   of one, is used in a value of an attribute, a backslash quoting 
>   mechanism is used to escape the following separator symbol character 
>   (such as "'", "$" or "#") if it should occur in that string.  The 
>   backslash is followed by a pair of hexadecimal digits representing 
>   the next character.  A backslash itself in the string which forms 
>   part of a larger syntax is always represented as '\5C' or '\5c'.  An 
>   example is given in section ?? postalAddress attribute.
>
>Most notably, the wording "as part of a larger production" has changed to "in a value of an attribute". This adds an ambiguity to an already ambiguous statement. As I understand it, this paragraph is applied to multi-part syntaxes, i.e. those that consist of multiple values, separated by some kind of separator character. This knowledge must be inferred to a small degree in RFC 2252, because the terms "arbitrary", "larger production", and "larger syntax" aren't really defined. Now "larger production" is gone, which could cause implementors to simply apply this rule to *all* string-valued syntaxes.
>
>I also infer from the original (and after careful application of logic, the latter) that one only escapes instances of a _following_ separator character in strings. This means in a syntax like:
>dstring '#' dstring
>one must escape instances of '#' in the first dstring, but not the second. Is this correct? If so, I wish it was more explicit.
>Then when talking about escaping the \ character, it appears that it is *always* escaped in multi-part syntaxes, which means it would be escaped in both dstrings in the syntax above. I'm not sure what that's all about, since we only escape separator characters when the string is followed by a separator character--or am I reading it wrong?
>
>Jim