
Re: string value encoding and escaping question



Mark Smith wrote:
> 
> Jeff Hodges wrote:
> > ...
> > Do you or anyone else have a URL handy that points to a reference for T.61?
> > I'd like to stick it in the LDAP Roadmap.
> 
> I don't have a good reference.  RFC 1345 - "Character Mnemonics &
> Character Sets" lists the characters in T.61, but there isn't a lot of
> extra information there.  

Thanks for the pointer to RFC 1345. I hadn't been aware of it. I'll add it to
the roadmap.

> > ...
> > So, are there any chars other than '\' that're treated specially in the
> > protocol docs (aka RFCs [2251..2256] + relevant near-RFC I-Ds) that you know
> > of? My search hasn't turned up any, but I might've left a stone unturned. It
> > looks to me like the protocol docs ~don't~ treat '$' specially.
> 
> '\' is the main one, but depending on context other characters must be
> escaped as well.  For example, if the value appears in a DN, RFC 2253
> says quite a few characters must be escaped, including, for example, ','
> and '+'.  RFC 2254 says that characters such as '(' and '*' must be
> escaped when used in the string representation of search filters.  And so on.
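
To make the search filter case concrete, here is a minimal Python sketch of
the value escaping that RFC 2254 describes: '*', '(', ')', '\' and NUL are
each replaced by a backslash and two hex digits. The function name
escape_filter_value is hypothetical and not taken from any SDK; DN escaping
under RFC 2253 follows different rules and is not covered here.

```python
# Characters that RFC 2254 requires to be escaped inside a filter value,
# mapped to their backslash + two-hex-digit form.
_FILTER_ESCAPES = {
    "\\": "\\5c",
    "*": "\\2a",
    "(": "\\28",
    ")": "\\29",
    "\x00": "\\00",
}


def escape_filter_value(value: str) -> str:
    """Escape an assertion value for use in a string search filter.

    Hypothetical helper name; a single pass over the input ensures the
    backslashes introduced by escaping are never escaped a second time.
    """
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


if __name__ == "__main__":
    name = "Parens R Us (for all your parenthetical needs)"
    print("(o=" + escape_filter_value(name) + ")")
```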

Right -- I hadn't explicitly specified context. The context I was asking about
was that of a non-distinguished, nominal string value. By "nominal" I mean
that the dir server might not have "schema check" (or equivalent) turned on,
and may not have an explicit syntax declaration for the attribute.

> I don't think the Netscape Directory SDKs do anything special as far as
> escaping except inside search filters and DNs.  Actually, for DNs and
> values both I think we just send along the values you give us.  I am
> kind of puzzled that you encountered problems honestly, unless you used
> the values with '\' inside filters or DN attributes.

Which we didn't. Mark W's quick test showed that the NS SDK doesn't muck with
the attr value, whereas our experience with the UMich libldap suggests that it
does alter the value somehow.

But again, I'm wondering how the authors imagined these sections of RFC 2252
would be implemented...

> 4.3. Syntaxes
>                         .
>                         .
>    In encodings where an arbitrary string, not a Distinguished Name, is
>    used as part of a larger production, and other than as part of a
>    Distinguished Name, a backslash quoting mechanism is used to escape
>    the following separator symbol character (such as "'", "$" or "#") if
>    it should occur in that string.  The backslash is followed by a pair
>    of hexadecimal digits representing the next character.  A backslash
>    itself in the string which forms part of a larger syntax is always
>    transmitted as '\5C' or '\5c'. An example is given in section 6.27.
>                         .
>                         .
> 
> 6.27. Postal Address
> 
>    ( 1.3.6.1.4.1.1466.115.121.1.41 DESC 'Postal Address' )
> 
>    Values in this syntax are encoded according to the following BNF:
> 
>       postal-address = dstring *( "$" dstring )
> 
>    In the above, each dstring component of a postal address value is
>    encoded as a value of type Directory String syntax.  Backslashes and
>    dollar characters, if they occur in the component, are quoted as
>    described in section 4.3.   Many servers limit the postal address to
>    six lines of up to thirty characters.
> 
>    Example:
> 
>       1234 Main St.$Anytown, CA 12345$USA
>       \241,000,000 Sweepstakes$PO Box 1000000$Anytown, CA 12345$USA


...whether on the server or client side. No one's really answered that yet.
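
For reference, here is one possible client-side reading of sections 4.3 and
6.27, sketched in Python. It is only a sketch under stated assumptions: the
function names are hypothetical, it is not how any particular SDK or server
behaves, and the decoder accepts any backslash + two-hex-digit pair rather
than only the '\24' and '\5C' forms the RFC text calls out.

```python
import re

# Matches a backslash followed by exactly two hexadecimal digits.
_HEX_ESCAPE = re.compile(r"\\([0-9A-Fa-f]{2})")


def escape_component(text: str) -> str:
    """Quote backslash and dollar inside one line of a postal address.

    A single pass is used so a backslash introduced by escaping '$'
    is never escaped again.
    """
    return "".join(
        "\\5C" if ch == "\\" else "\\24" if ch == "$" else ch
        for ch in text
    )


def encode_postal_address(lines):
    """Join address lines with '$' after quoting each one (hypothetical name)."""
    return "$".join(escape_component(line) for line in lines)


def decode_postal_address(value):
    """Split on the unescaped '$' separators, then undo the hex quoting.

    Assumption: any remaining '$' is a separator, since literal dollars
    were encoded as backslash-24 by the sender.
    """
    return [
        _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), part)
        for part in value.split("$")
    ]


if __name__ == "__main__":
    address = ["$1,000,000 Sweepstakes", "PO Box 1000000",
               "Anytown, CA 12345", "USA"]
    wire = encode_postal_address(address)
    print(wire)
    print(decode_postal_address(wire) == address)
```

With the second example from section 6.27 as input, the encoder produces the
same string the RFC shows, and decoding it gives back the original four lines.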

thanks,

Jeff