Re: string value encoding and escaping question



Mark Smith wrote:
> 
> The origin of the '$' separator is the Quipu X.500 implementation (it
> used '$' inside various string syntaxes because '$' is not a valid
> character in the T.61 character set which was used for some string
> syntaxes in the olden days).  

Ah! ok, I hadn't realized T.61 was the culprit. I'd had a hunch it was because
X.500 was largely Euro-originated and suspected they chose $ cuz it wasn't
their (whoever "they" exactly were) currency symbol. I hadn't looked at T.61
closely enuff to figure out that it doesn't contain '$'. So my hunch wasn't
that far off actually. 

Do you or anyone else have a URL handy that points to a reference for T.61?
I'd like to stick it in the LDAP Roadmap. 

> Use of '$' has been carried over to some
> of the LDAPv3 syntaxes, so we are stuck with it now.

right.

> In general, you should pick a separator character that makes sense to
> you.  Backslash is clearly an inconvenient choice ;-)

Well, of course. (my 10yr old would say: DUH. ;) 

So, are there any chars other than '\' that're treated specially in the
protocol docs (aka RFCs [2251..2256] + relevant near-RFC I-Ds) that you know
of? My search hasn't turned up any, but I might've left a stone unturned. It
looks to me like the protocol docs ~don't~ treat '$' specially. 

Also, I'd appreciate getting explicit confirmation from LDAP/X.500 mavens on
these other questions I had...

> Jeff.Hodges@Stanford.edu scribbled in netscape.dev.directory newsgroup:
>
> What I'm trying to figure out (sorta outta morbid curiosity) is whether it is
> the libldap (aka "the ldap sdk", "the ldap stub") or the NS DS that is
> recognizing the '\' char and interpreting it as a hex escape. Anyone know? 
> 
> The below RFC 2252 excerpts imply to me that the client side (aka the LDAP
> stub, lib, or whatever) needs to know about this stuff in order to understand
> and properly handle this value syntax. Is this correct? Or not and why?
> 
> Also, I'm curious as to whether there's anything to gain by following X.500's
> lead and using '$' as a separator char? I don't believe that any of the RFCs
> or I-Ds specify treating it specially, so I doubt it will be inadvertently
> specially treated as backslash apparently is. 

If '$' isn't treated specially protocol-wise, then the only value of using it
as a separator is consistency with "tradition" and thus perhaps reuse of some
amount of attribute value parsing code out there, tho we don't really have a
large body of that ourselves. 

thanks,

Jeff

ps: thanks to Mark Wilcox for experimenting with duplicating our attr value
issues. 


> -------------------------------------------------------------------------------
> http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2252.txt
>                         .
>                         .
> 
> 4.1. Common Encoding Aspects
> 
>    For the purposes of defining the encoding rules for attribute
>    syntaxes, the following BNF definitions will be used.  They are based
>    on the BNF styles of RFC 822 [13].
> 
>     a     = "a" / "b" / "c" / "d" / "e" / "f" / "g" / "h" / "i" /
>             "j" / "k" / "l" / "m" / "n" / "o" / "p" / "q" / "r" /
>             "s" / "t" / "u" / "v" / "w" / "x" / "y" / "z" / "A" /
>             "B" / "C" / "D" / "E" / "F" / "G" / "H" / "I" / "J" /
>             "K" / "L" / "M" / "N" / "O" / "P" / "Q" / "R" / "S" /
>             "T" / "U" / "V" / "W" / "X" / "Y" / "Z"
> 
>     d               = "0" / "1" / "2" / "3" / "4" /
>                       "5" / "6" / "7" / "8" / "9"
> 
>     hex-digit       =  d / "a" / "b" / "c" / "d" / "e" / "f" /
>                            "A" / "B" / "C" / "D" / "E" / "F"
> 
>     k               = a / d / "-" / ";"
> 
>     p               = a / d / """ / "(" / ")" / "+" / "," /
>                       "-" / "." / "/" / ":" / "?" / " "
> 
>     letterstring    = 1*a
> 
>     numericstring   = 1*d
> 
>     anhstring       = 1*k
> 
>     keystring       = a [ anhstring ]
> 
>     printablestring = 1*p
> 
>     space           = 1*" "
> 
>     whsp            = [ space ]
> 
>     utf8            = <any sequence of octets formed from the UTF-8 [9]
>                        transformation of a character from ISO10646 [10]>
> 
>     dstring         = 1*utf8
> 
>     qdstring        = whsp "'" dstring "'" whsp
> 
>     qdstringlist    = [ qdstring *( qdstring ) ]
> 
>     qdstrings       = qdstring / ( whsp "(" qdstringlist ")" whsp )
> 
>                         .
>                         .
> 4.3. Syntaxes
>                         .
>                         .
>    In encodings where an arbitrary string, not a Distinguished Name, is
>    used as part of a larger production, and other than as part of a
>    Distinguished Name, a backslash quoting mechanism is used to escape
>    the following separator symbol character (such as "'", "$" or "#") if
>    it should occur in that string.  The backslash is followed by a pair
>    of hexadecimal digits representing the next character.  A backslash
>    itself in the string which forms part of a larger syntax is always
>    transmitted as '\5C' or '\5c'. An example is given in section 6.27.
>                         .
>                         .
> 
> 6.27. Postal Address
> 
>    ( 1.3.6.1.4.1.1466.115.121.1.41 DESC 'Postal Address' )
> 
>    Values in this syntax are encoded according to the following BNF:
> 
>       postal-address = dstring *( "$" dstring )
> 
>    In the above, each dstring component of a postal address value is
>    encoded as a value of type Directory String syntax.  Backslashes and
>    dollar characters, if they occur in the component, are quoted as
>    described in section 4.3.   Many servers limit the postal address to
>    six lines of up to thirty characters.
> 
>    Example:
> 
>       1234 Main St.$Anytown, CA 12345$USA
>       \241,000,000 Sweepstakes$PO Box 1000000$Anytown, CA 12345$USA
> 
>                         .
>                         .
> 
> [ note that "\241,000,000" is intended to resolve to "$1,000,000" (hex 24 is
> the '$' char) once the value string shown above is parsed out into its
> constituent components, which're delimited by the unescaped "$" chars. This
> implies to me that the client needs to know about this in order to understand
> and properly handle this value syntax. I don't know if that assertion is
> exactly true. ]
> 
> -------------------------------------------------------------------------------
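
To make the question concrete, here is a minimal sketch (in Python) of what
client-side handling of the Postal Address syntax quoted above might look
like. It is not taken from libldap or any directory server; the function
names are hypothetical, and it assumes that only ASCII characters such as
'$' and '\' ever appear in escaped form, so each "\XX" pair is mapped
straight to one character.

```python
import re

# A backslash followed by exactly two hex digits, per the RFC 2252
# section 4.3 excerpt quoted above.
_ESCAPE = re.compile(r"\\([0-9A-Fa-f]{2})")


def decode_component(raw):
    """Turn each \\XX escape back into the character it stands for.

    Assumption: escaped values are ASCII (e.g. 24 for '$', 5C for '\\').
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)


def parse_postal_address(value):
    """Split a postal-address value into its decoded lines.

    A literal '$' inside a line arrives as \\24, so every '$' still
    present in the raw value is a separator and a plain split is safe.
    """
    return [decode_component(part) for part in value.split("$")]


def encode_postal_address(lines):
    """Inverse of parse_postal_address: escape, then join with '$'."""
    def escape(line):
        # Escape backslashes first so the backslash added for '$'
        # is not escaped a second time.
        return line.replace("\\", "\\5C").replace("$", "\\24")
    return "$".join(escape(line) for line in lines)


if __name__ == "__main__":
    raw = r"\241,000,000 Sweepstakes$PO Box 1000000$Anytown, CA 12345$USA"
    for line in parse_postal_address(raw):
        print(line)
```

Splitting before decoding is the point of the sketch: whichever layer turns
the value into address lines has to understand the escape, because decoding
first would make the escaped '$' indistinguishable from a separator.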