
Re: dnValidate (was: Re: UTF8 case insensitive matching)

At 05:15 PM 12/22/00 -0800, Kurt D. Zeilenga wrote:
>Just a few random comments:
>At 01:27 PM 12/22/00 -0500, David A. Cooper wrote:
> >Based on earlier discussions, I have been working on a version of dnValidate that will read in a distinguished name that was generated based on RFC 2253 (or RFC 1779) and will return a normalized version of that string (compliant with RFC 2253).
>I would prefer that such a routine be provided within the
>client library as clients need to normalize user input.  That is,
>often users type in DNs which don't strictly conform to RFC2253,
>Section 2 (but use some RFC 1779 style variants).

I hadn't thought about using it on the client side, but it makes sense. My main interest in this comes from the desire to run a server. For the server that I'll be running, I expect that the DNs will contain only attribute types from the list in RFC 2253 with attribute values that only use ASCII characters. My only concern was that, having no control over the clients that will be sending in requests, I wanted to be prepared to handle attribute types and values in any form that might possibly arrive (e.g., attribute types specified as OIDs, quoted attribute values, attribute values with characters that have been escaped in one way or another, and BER encoded values). So, for my own purposes, this routine would need to be used at the server even if OpenLDAP clients only sent queries with "normalized" DNs.

> >The code that I produced will handle attribute values of type Directory string whether they are provided as a string, a quoted string, or a BER encoded string.
>When dealing with BER encoded directoryStrings, I would suggest
>limiting yourself to universalString (UCS-4), printableString
>(subset of IA5), and utf8String (UTF-8) choices of the
>directoryString syntax.  That is, I would not waste time dealing
>with teletexString (T.61) strings.

Good. The more I read about TeletexString (what little I can find), the more work it seems that it would be to convert from TeletexString to UTF8. At the moment, I believe that my code treats TeletexStrings as if they are really Latin-1 strings. Since it appears that most people who use the TeletexString tag in X.509 certificates are really providing Latin-1 encoded strings, this should work OK in most cases anyway.

BTW, I did include BMP strings since working with this encoding isn't much different than universalString.
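To make the string handling described above concrete, here is a rough Python sketch (not the actual C code) of converting the contents octets of each string type to UTF-8. The function name and table are made up for illustration. It assumes the BER tag and length have already been stripped, and it treats TeletexString as Latin-1, which is the shortcut described above and not a real T.61 conversion.

```python
# ASN.1 universal tag numbers for the string types discussed in this thread.
UTF8_STRING = 12
PRINTABLE_STRING = 19
TELETEX_STRING = 20
IA5_STRING = 22
UNIVERSAL_STRING = 28
BMP_STRING = 30

# Hypothetical mapping from tag number to the codec used for the contents.
# TeletexString -> Latin-1 is the shortcut from the message, not real T.61.
# BMPString is UCS-2; utf-16-be is used here as a close stand-in.
_CODECS = {
    UTF8_STRING: "utf-8",
    PRINTABLE_STRING: "ascii",
    IA5_STRING: "ascii",
    TELETEX_STRING: "latin-1",
    UNIVERSAL_STRING: "utf-32-be",
    BMP_STRING: "utf-16-be",
}


def string_contents_to_utf8(tag_number, contents):
    """Return the contents octets of a BER string value re-encoded as UTF-8."""
    try:
        codec = _CODECS[tag_number]
    except KeyError:
        raise ValueError("unsupported string tag: %d" % tag_number)
    # decode() raises UnicodeDecodeError on octets that are invalid for the type.
    return contents.decode(codec).encode("utf-8")
```

For example, the Latin-1 octets `b"Ven\xe5s"` tagged as TeletexString come back as the UTF-8 octets `b"Ven\xc3\xa5s"`.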

>In addition, I suggest you do not muck with any attribute
>value of an attribute type not listed in the RFC 2253,
>Section 2.3 table.  This means you only have to deal with
>the directoryString and IA5String BER encodings.
> >It can similarly handle attribute values of type bitstring.
>As there is no type listed in the table which has this syntax,
>the type *should* be listed by OID and the value BER encoded.
>However, as noted above, don't muck with these.

At the moment, the only attribute type that I could find that has a syntax of bitstring is uniqueIdentifier (OID ...). For this, I went ahead and used the string representation for bitstrings from RFC 2252 as the "normalized" version. The alternative would be to DER encode the bitstring. While this might be an option, I found it to be a lot easier to convert an arbitrary BER encoded bitstring into a string according to RFC 2252 than it would have been to convert to a DER encoding.
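As an illustration of the bitstring conversion, here is a minimal Python sketch (again, not the actual C code; the function name is hypothetical). It assumes a primitive BIT STRING encoding whose tag and length have already been removed, so the first contents octet gives the number of unused bits in the last octet. A constructed BER encoding would have to be reassembled first, which this sketch does not attempt.

```python
def bitstring_contents_to_rfc2252(contents):
    """Render primitive BIT STRING contents octets as an RFC 2252 bit string.

    contents[0] is the count of unused bits (0..7) in the final octet; the
    remaining octets carry the bits, most significant bit first.
    """
    if not contents:
        raise ValueError("BIT STRING contents must include the unused-bits octet")
    unused = contents[0]
    data = contents[1:]
    if unused > 7 or (unused and not data):
        raise ValueError("invalid unused-bits count")
    bits = "".join(format(octet, "08b") for octet in data)
    if unused:
        bits = bits[:-unused]  # drop the padding bits at the end
    return "'" + bits + "'B"
```

For example, the contents octets `04 A0` (four unused bits, then `10100000`) become `'1010'B`.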

As you noted, of course, it is unlikely that we'll encounter DNs with attribute values of type bitstring, but I figure it can't hurt for the code to be able to handle such things (particularly since the code to handle it has already been written).

> >Unlike the dnValidate function currently in servers/slapd/schema_init.c, I have added two additional parameters: make_uppercase and compress_whitespace.
>For the client side routine, the user provided values should not
>be mucked with.
>On the server side routine, we have to be careful with when and
>where we muck as in general a directory server should not muck
>with user data.  That is, if a user provides a goofy looking
>DN as a value of say a 'member' attribute type, the server should
>provide the value back to the user when later requested.

Well, it sounds like I did exactly the right thing. If I understand you correctly, compress_whitespace (and sometimes make_uppercase) should be set when normalizing a DN for the purposes of performing matches. On the other hand, neither should be set when prettying a DN. (It's nice to hear that all of the options will be used and that the extra work of allowing these different options won't go to waste.)

From the description below, I believe that my function will operate as desired (for the non-matching case) when neither compress_whitespace nor make_uppercase is set. Please give this a try and let me know if any changes are needed.

Otherwise, I think the next step is to try to ensure that my function will integrate well with the work that Stig Venås is doing.

>However, a DN is a complex attribute syntax.  It is valid for
>a server to convert the DN string on input to BER form and then
>produce a string generated from the BER form upon request.  This
>conversion can be done upfront or as needed.
>So, it would be okay for the server to "normalize" the DN string
>representation as long as it does not alter the user data contained
>in the representation.  That is, the server can "pretty" the DN
>(replace RFC 1779isms with RFC 2253isms including value escaping),
>but it cannot alter any assertion value.  No leading, trailing, consecutive
>space removal, no upper (or lower) casing, etc.  Such "normalization"
>should only be done during DN matching.
>That is:
>         cn = " foo "; o=bar
>can be "prettied" to:
>         CN=\20foo\20,o=bar
>as such a change preserves the assertion values.
> >If make_uppercase is set, then all of the characters in the string are made uppercase (using uctoupper), otherwise the cases of the characters in the attribute values are left unchanged.
> >
> >If compress_whitespace is set, then all leading and trailing whitespace characters are removed from attribute values and sequences of whitespace characters between "words" in an attribute value are replaced by a single space. If compress_whitespace is not set, then only those leading and trailing whitespace characters that are not considered to be part of the attribute value according to RFC 2253 are removed. For example, if compress_whitespace is set, the string 'cn =  \20 David  Cooper \20  ' would be compressed to 'cn=David Cooper', whereas it would become 'cn=\20 David  Cooper \20' if compress_whitespace were not set.
>Such a change alters the assertion value and should not be done
>as part of prettying the DN (but is done for matching).