
(ITS#3467) URL parsing routines non symmetric; generated URL strings unusable



Full_Name: Pierangelo Masarati
Version: HEAD, 2.2
OS: irrelevant
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (81.74.43.82)


I noticed that the URL parsing routines do not behave symmetrically: e.g. when
a URL list is parsed via ldap_url_parselist(), the string of URLs that is
generated back from it can differ from the original and be unusable in slapd,
essentially in which chars are URL-escaped.

An example is in back-meta, when using ldapi:// with a non-standard path.  A
path containing "/" requires each "/" to be escaped as "%2F"; however, if the
string is regenerated via ldap_url_list2urls(), the "/" are not re-encoded.
back-meta needs to parse the URI to interpret and strip the DN portion, and
needs to use lists of URIs for redundancy reasons.
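
To make the ldapi:// case concrete, here is a minimal sketch of the encoding
step that the string-generation side would need.  It is only an illustration:
ldapi_escape_path() is a hypothetical helper, not a libldap function, and it
takes the conservative view that every octet outside the RFC 2396 "unreserved"
set is escaped inside the socket path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libldap): percent-encode the socket path
 * of an ldapi:// URL so that '/' is not taken for a URL delimiter.
 * Returns the number of bytes written (excluding the NUL), or -1 if dst
 * is too small. */
static int ldapi_escape_path(const char *path, char *dst, size_t dstlen)
{
    static const char hex[] = "0123456789ABCDEF";
    /* RFC 2396 section 2.3 "mark" characters; together with the
     * alphanumerics they form the "unreserved" set */
    static const char marks[] = "-_.!~*'()";
    size_t o = 0;

    assert(path != NULL && dst != NULL);

    for (; *path != '\0'; path++) {
        unsigned char c = (unsigned char)*path;
        int unreserved = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                         (c >= '0' && c <= '9') || strchr(marks, c) != NULL;

        if (unreserved) {
            /* need room for this octet plus the trailing NUL */
            if (o + 1 >= dstlen) return -1;
            dst[o++] = (char)c;
        } else {
            /* need room for "%XX" plus the trailing NUL */
            if (o + 3 >= dstlen) return -1;
            dst[o++] = '%';
            dst[o++] = hex[c >> 4];
            dst[o++] = hex[c & 0x0F];
        }
    }
    if (o >= dstlen) return -1;
    dst[o] = '\0';
    return (int)o;
}
```

With this, the path "/var/run/ldapi" becomes "%2Fvar%2Frun%2Fldapi", which is
the form that can be placed after "ldapi://" and parsed back without the "/"
being read as the start of the DN.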

I haven't dug too deep into the RFCs on URL syntax, but at least RFC 2396
gives a clear list of what MUST be escaped and what can go unescaped.  I have
modified the URL parse/string representation functions.  RFC 2396 states

2.2. Reserved Characters
 
   Many URI include components consisting of or delimited by, certain
   special characters.  These characters are called "reserved", since
   their usage within the URI component is limited to their reserved
   purpose.  If the data for a URI component would conflict with the
   reserved purpose, then the conflicting data must be escaped before
   forming the URI.

so it is my understanding that when draft-ietf-ldapbis-url states

2.1.  Escaping Using the % Method
                                                                               

   A generated LDAP URL MUST consist only of the restricted set of
   characters included in the uric production that is defined in section
   2 of [RFC2396].  Implementations SHOULD accept other valid UTF-8
   strings [RFC3629] as input.  An octet MUST be escaped using the %
   method described in section 2.4 of [RFC2396] in any of these
   situations:
 
      The octet is not in the reserved set defined in section 2.2 of
      [RFC2396] or in the unreserved set defined in section 2.3 of
      [RFC2396].
 
      It is the single Reserved character '?' and occurs inside a dn,
      filter, or other element of an LDAP URL.
 
      It is a comma character ',' that occurs inside an extension value.
 
by "The octet is not in the reserved set defined in section 2.2 of [RFC2396]" it
means that the octet is not being used for the purpose it is reserved for, and
thus must be escaped to prevent its interpretation as reserved.  This is the
case for the "/" in a local path.
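
The reading above can be sketched as a per-octet predicate.  This is only a
sketch under stated assumptions: enum url_part and must_escape() are
hypothetical names, not libldap API, and the rule for "/" in the host part
encodes my interpretation rather than the literal text of the draft.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical labels for where in the URL an octet occurs */
enum url_part { PART_HOST, PART_DN, PART_FILTER, PART_EXT_VALUE };

/* Return 1 if octet c must be %-escaped when it occurs as data in part. */
static int must_escape(unsigned char c, enum url_part part)
{
    static const char reserved[] = ";/?:@&=+$,";   /* RFC 2396, 2.2 */
    static const char marks[]    = "-_.!~*'()";    /* RFC 2396, 2.3 */
    int is_alnum;

    assert(part >= PART_HOST && part <= PART_EXT_VALUE);

    if (c == '\0')
        return 1;                       /* NUL is in neither set */

    is_alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
               (c >= '0' && c <= '9');

    /* Rule 1: neither reserved nor unreserved */
    if (!is_alnum && strchr(marks, c) == NULL && strchr(reserved, c) == NULL)
        return 1;

    /* Rule 2: '?' inside a dn, filter, or other element */
    if (c == '?')
        return 1;

    /* Rule 3: ',' inside an extension value */
    if (c == ',' && part == PART_EXT_VALUE)
        return 1;

    /* Assumed reading: '/' used as data in the host part (an ldapi path)
     * conflicts with its reserved purpose as a delimiter */
    if (c == '/' && part == PART_HOST)
        return 1;

    return 0;
}
```

For instance, must_escape('/', PART_HOST) is 1, while must_escape(',', PART_DN)
is 0 because the comma separating RDNs is left as is.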

A patch is coming shortly, unless my analysis is incorrect.

p.