
non-UTF8 string Substrings matching



RFC 2251 defines a substrings filter as:

 SubstringFilter ::= SEQUENCE {
   type            AttributeDescription,
   -- at least one must be present
   substrings      SEQUENCE OF CHOICE {
     initial [0] LDAPString,
     any     [1] LDAPString,
     final   [2] LDAPString } }
 LDAPString ::= OCTET STRING

where LDAPString is restricted to the UTF-8 encoding of the
ISO 10646-1 character set.

This implies that octetSubstringsMatch cannot be specified
as the SUBSTR matching rule of any attribute type as the
asserted substrings are not restricted to UTF-8.

This also implies that (cn;binary=*hvalue*) [where hvalue
is the hex-escaped BER encoded value] is invalid as the BER
encoding itself is not restricted to UTF-8.
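To illustrate the point, here is a minimal Python sketch (the
helper names is_valid_utf8 and ber_utf8string are mine and
purely illustrative, not taken from any LDAP library).  It
BER-encodes a UTF8String and checks whether the resulting
octets would pass as an LDAPString.  A short value happens to
pass, but once the content needs the long length form the
header octets alone break UTF-8 validity:

```python
def is_valid_utf8(octets: bytes) -> bool:
    """Return True if the octets decode as strict UTF-8."""
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def ber_utf8string(text: str) -> bytes:
    """BER-encode text as a UTF8String (tag 0x0C), definite length.

    Hypothetical helper for illustration only.
    """
    content = text.encode("utf-8")
    n = len(content)
    if n < 0x80:
        # Short form: a single length octet.
        length = bytes([n])
    else:
        # Long form: 0x80 | number of length octets, then the length.
        len_octets = n.to_bytes((n.bit_length() + 7) // 8, "big")
        length = bytes([0x80 | len(len_octets)]) + len_octets
    return bytes([0x0C]) + length + content


short_value = ber_utf8string("abc")      # 0C 03 61 62 63
long_value = ber_utf8string("a" * 130)   # 0C 81 82 61 61 ...

print(is_valid_utf8(short_value))  # the short form happens to be valid UTF-8
print(is_valid_utf8(long_value))   # 0x81 is a stray continuation octet
print(is_valid_utf8(b"\xff\xfe"))  # arbitrary octets for an octet-string rule
```

So whether a given BER-encoded substring is acceptable as an
LDAPString depends on the octets involved, which is hardly a
sound basis for an assertion syntax.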

To allow non-UTF8 string substring assertions, it might be
appropriate to change the ASN.1 to:

 SubstringFilter ::= SEQUENCE {
   type            AttributeDescription,
   -- at least one must be present
   substrings      SEQUENCE OF CHOICE {
     initial [0] LDAPSubstring,
     any     [1] LDAPSubstring,
     final   [2] LDAPSubstring } }
 LDAPSubstring ::= OCTET STRING

where the actual value held in LDAPSubstring is restricted
to the syntax appropriate for the substrings assertion.

For (cn=*value*), the LDAPSubstring is restricted to
UTF-8.  For (cn;binary=*hvalue*), the LDAPSubstring
must contain the BER-encoded directoryString asserted
value.  For (1.2.3=*value*), where the SUBSTR matching
rule of 1.2.3 is octetSubstringsMatch, the LDAPSubstring
may be any octet string.  Likewise for other substrings
assertion syntaxes.
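
As a rough sketch of what a server-side check might look like
under this proposal (Python; the rule labels "utf8",
"ber-directory-string" and "octet" and both function names are
hypothetical, and the BER check only inspects the tag and a
definite length rather than fully parsing the value):

```python
# Universal tags of the directoryString alternatives: UTF8String,
# PrintableString, TeletexString, UniversalString, BMPString.
DIRECTORY_STRING_TAGS = {0x0C, 0x13, 0x14, 0x1C, 0x1E}


def looks_like_ber_directory_string(octets: bytes) -> bool:
    """Shallow check: known string tag plus a matching definite length."""
    if len(octets) < 2 or octets[0] not in DIRECTORY_STRING_TAGS:
        return False
    first = octets[1]
    if first < 0x80:
        # Short form length.
        header, length = 2, first
    else:
        # Long form; 0x80 alone (indefinite length) is rejected here.
        n = first & 0x7F
        if n == 0 or len(octets) < 2 + n:
            return False
        header = 2 + n
        length = int.from_bytes(octets[2:header], "big")
    return len(octets) == header + length


def substring_is_acceptable(rule: str, octets: bytes) -> bool:
    """Validate one LDAPSubstring against its (hypothetical) rule label."""
    if rule == "utf8":                  # e.g. (cn=*value*)
        try:
            octets.decode("utf-8")
        except UnicodeDecodeError:
            return False
        return True
    if rule == "ber-directory-string":  # e.g. (cn;binary=*hvalue*)
        return looks_like_ber_directory_string(octets)
    if rule == "octet":                 # e.g. (1.2.3=*value*)
        return True
    raise ValueError(f"unknown rule: {rule}")
```

The point is only that the check is chosen per assertion
syntax instead of being fixed to UTF-8 by the ASN.1.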

Basically the proposal is to trade one notational
convenience for another such that we can describe the
full range of behavior allowed by X.500.  That is,
it makes possible the encoding of non-UTF8 substrings
assertions.

Comments?

Kurt