
Re: string value encoding and escaping question



Jeff,

T.61 is available from http://www.itu.ch.  Note that it is copyrighted.

The '*' character is a special character in some cases.  See section 4.5.1
of RFC 2251.

Cheers,               ....Erik.

---------------------------------
Erik Skovgaard
GeoTrain Corp.
LDAP and X.500 Training and Consulting
http://www.geotrain.com

At 13:20 99/02/11 -0800, Jeff Hodges wrote:
>Mark Smith wrote:
>> 
>> The origin of the '$' separator is the Quipu X.500 implementation (it
>> used '$' inside various string syntaxes because '$' is not a valid
>> character in the T.61 character set which was used for some string
>> syntaxes in the olden days).  
>
>Ah! ok, I hadn't realized T.61 was the culprit. I'd had a hunch it was
>because X.500 was largely Euro-originated and suspected they chose $ cuz it
>wasn't their (whoever "they" exactly were) currency symbol. I hadn't looked
>at T.61 closely enuff to figure out that it doesn't contain '$'. So my hunch
>wasn't that far off actually.
>
>Do you or anyone else have a URL handy that points to a reference for T.61?
>I'd like to stick it in the LDAP Roadmap. 
>
>> Use of '$' has been carried over to some
>> of the LDAPv3 syntaxes, so we are stuck with it now.
>
>right.
>
>> In general, you should pick a separator character that makes sense to
>> you.  Backslash is clearly an inconvenient choice ;-)
>
>Well, of course. (my 10yr old would say: DUH. ;) 
>
>So, are there any chars other than '\' that're treated specially in the
>protocol docs (aka RFCs [2251..2256] + relevant near-RFC I-Ds) that you know
>of? My search hasn't turned up any, but I might've left a stone unturned. It
>looks to me like the protocol docs ~don't~ treat '$' specially. 
>
>Also, I'd appreciate getting explicit confirmation from LDAP/X.500 mavens on
>these other questions I had...
>
>> Jeff.Hodges@Stanford.edu scribbled in netscape.dev.directory newsgroup:
>>
>> What I'm trying to figure out (sorta outta morbid curiosity) is whether
>> it is the libldap (aka "the ldap sdk", "the ldap stub") or the NS DS that
>> is recognizing the '\' char and interpreting it as a hex escape. Anyone
>> know?
>> 
>> The below RFC 2252 excerpts imply to me that the client side (aka the LDAP
>> stub, lib, or whatever) needs to know about this stuff in order to
>> understand and properly handle this value syntax. Is this correct? Or not
>> and why?
>> 
>> Also, I'm curious as to whether there's anything to gain by following
>> X.500's lead and using '$' as a separator char? I don't believe that any
>> of the RFCs or I-Ds specify treating it specially, so I doubt it will be
>> inadvertently specially treated as backslash apparently is.
>
>If '$' isn't treated specially protocol-wise, then the only value of using it
>as a separator is consistency with "tradition" and thus perhaps reuse of some
>amount of attribute value parsing code out there, tho we don't really have a
>large body of that ourselves. 
>
>thanks,
>
>Jeff
>
>ps: thanks to Mark Wilcox for experimenting with duplicating our attr value
>issues. 
>
>
>> -------------------------------------------------------------------------
>> http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2252.txt
>>                         .
>>                         .
>> 
>> 4.1. Common Encoding Aspects
>> 
>>    For the purposes of defining the encoding rules for attribute
>>    syntaxes, the following BNF definitions will be used.  They are based
>>    on the BNF styles of RFC 822 [13].
>> 
>>     a     = "a" / "b" / "c" / "d" / "e" / "f" / "g" / "h" / "i" /
>>             "j" / "k" / "l" / "m" / "n" / "o" / "p" / "q" / "r" /
>>             "s" / "t" / "u" / "v" / "w" / "x" / "y" / "z" / "A" /
>>             "B" / "C" / "D" / "E" / "F" / "G" / "H" / "I" / "J" /
>>             "K" / "L" / "M" / "N" / "O" / "P" / "Q" / "R" / "S" /
>>             "T" / "U" / "V" / "W" / "X" / "Y" / "Z"
>> 
>>     d               = "0" / "1" / "2" / "3" / "4" /
>>                       "5" / "6" / "7" / "8" / "9"
>> 
>>     hex-digit       =  d / "a" / "b" / "c" / "d" / "e" / "f" /
>>                            "A" / "B" / "C" / "D" / "E" / "F"
>> 
>>     k               = a / d / "-" / ";"
>> 
>>     p               = a / d / """ / "(" / ")" / "+" / "," /
>>                       "-" / "." / "/" / ":" / "?" / " "
>> 
>>     letterstring    = 1*a
>> 
>>     numericstring   = 1*d
>> 
>>     anhstring       = 1*k
>> 
>>     keystring       = a [ anhstring ]
>> 
>>     printablestring = 1*p
>> 
>>     space           = 1*" "
>> 
>>     whsp            = [ space ]
>> 
>>     utf8            = <any sequence of octets formed from the UTF-8 [9]
>>                        transformation of a character from ISO10646 [10]>
>> 
>>     dstring         = 1*utf8
>> 
>>     qdstring        = whsp "'" dstring "'" whsp
>> 
>>     qdstringlist    = [ qdstring *( qdstring ) ]
>> 
>>     qdstrings       = qdstring / ( whsp "(" qdstringlist ")" whsp )
>> 
>>                         .
>>                         .
>> 4.3. Syntaxes
>>                         .
>>                         .
>>    In encodings where an arbitrary string, not a Distinguished Name, is
>>    used as part of a larger production, and other than as part of a
>>    Distinguished Name, a backslash quoting mechanism is used to escape
>>    the following separator symbol character (such as "'", "$" or "#") if
>>    it should occur in that string.  The backslash is followed by a pair
>>    of hexadecimal digits representing the next character.  A backslash
>>    itself in the string which forms part of a larger syntax is always
>>    transmitted as '\5C' or '\5c'. An example is given in section 6.27.
>>                         .
>>                         .
>> 
>> 6.27. Postal Address
>> 
>>    ( 1.3.6.1.4.1.1466.115.121.1.41 DESC 'Postal Address' )
>> 
>>    Values in this syntax are encoded according to the following BNF:
>> 
>>       postal-address = dstring *( "$" dstring )
>> 
>>    In the above, each dstring component of a postal address value is
>>    encoded as a value of type Directory String syntax.  Backslashes and
>>    dollar characters, if they occur in the component, are quoted as
>>    described in section 4.3.   Many servers limit the postal address to
>>    six lines of up to thirty characters.
>> 
>>    Example:
>> 
>>       1234 Main St.$Anytown, CA 12345$USA
>>       \241,000,000 Sweepstakes$PO Box 1000000$Anytown, CA 12345$USA
>> 
>>                         .
>>                         .
>> 
>> [ note that "\241,000,000" is intended to resolve to "$1,000,000" (\24
>> being the hex escape for '$') once the value string shown above is parsed
>> out into its constituent components, which're delineated by the "$" chars.
>> This implies to me that the client needs to know about this in order to
>> understand and properly handle this value syntax. I don't know if that
>> assertion is exactly true. ]
>> 
>> -------------------------------------------------------------------------
>
>
>
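
For what it's worth, the quoting rule in the RFC 2252 excerpt above
(sections 4.3 and 6.27) can be sketched in a few lines of Python. This is
only an illustration of the rule as quoted, not code from any LDAP SDK or
server: the function names are made up for the example, and it assumes each
two-digit hex escape stands for a single character below U+0100, which
covers the separators and the backslash.

```python
import re

# A backslash followed by exactly two hex digits, per the section 4.3 text.
_ESCAPE = re.compile(r"\\([0-9A-Fa-f]{2})")


def decode_component(text):
    """Replace each \\XX escape with the character it stands for."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def parse_postal_address(value):
    """Split a Postal Address value into its unescaped lines.

    A literal dollar inside a line is transmitted as \\24, so every bare
    '$' left in the value is a separator and a plain split is safe.
    """
    return [decode_component(part) for part in value.split("$")]


def encode_postal_address(lines):
    """Build a Postal Address value, escaping '\\' and '$' in each line."""
    def escape(line):
        # Escape backslashes first so the backslash introduced for '$'
        # is not escaped a second time.
        return line.replace("\\", "\\5C").replace("$", "\\24")
    return "$".join(escape(line) for line in lines)


if __name__ == "__main__":
    raw = r"\241,000,000 Sweepstakes$PO Box 1000000$Anytown, CA 12345$USA"
    for line in parse_postal_address(raw):
        print(line)
```

Whichever layer ends up doing this (client library, application, or
server), something has to split on the bare '$' first and only then undo
the \XX escapes; doing it in the other order would turn the escaped dollar
back into a separator.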