
Re: LDAPprep



David,

Thanks for raising these issues.  I note the I-D in question
is in the RFC-Editor queue.  I have notified our AD that LDAPBIS
is discussing these issues and will likely propose that a change
be made prior to publication.  I'll work with the AD to
determine how best to accomplish this.

As indicated by my comments above, I concur that there is
at least one issue significant enough to warrant a
last-minute fix.  Details below:

At 11:15 AM 5/4/2006, David Wilson wrote:
>I've been looking at draft-ietf-ldapbis-strprep-07, and there seems to
>be a serious problem in the area of substring matches.
>
>Section 2.6.1 states that if the string contains any non-space
>characters then it is modified to start and finish with a space, and any
>internal sequences of spaces are altered to be two spaces. This appears
>to apply to substring filter strings. (The following paragraph has a
>specific exception for these for the case of only spaces in the value).
>
>But if this is done, and you have, say,
>
>        (cn=*bar*)
>
>that is not going to match a value of "foobar", as the 'any' string
>becomes "<SPACE>bar<SPACE>" by the above rule, the value being matched
>becomes "<SPACE>foobar<SPACE>" which does not contain the substring.
>
>The overall scheme would work, but you need more complicated rules for
>substring filter strings. Inner sequences of spaces become two spaces.
>Leading or trailing sequences become one space, but spaces are NOT added
>at the ends except:
>
>- a space at the start of an initial substring
>- a space at the end of a final substring

I concur.
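
To make the failure concrete, here is a minimal Python sketch of the
current 2.6.1 rule as David describes it.  The name prep_old and the
is_substring flag are hypothetical, a "space" is simplified to a bare
U+0020, and plain substring containment stands in for matching the
'any' component.

```python
def prep_old(s: str, is_substring: bool = False) -> str:
    """Space handling per the current 2.6.1 text (simplified sketch)."""
    # Keep only the non-empty runs of non-space characters.
    words = [w for w in s.split(" ") if w]
    if not words:
        # Only spaces: one SPACE for substrings, two SPACEs otherwise.
        return " " if is_substring else "  "
    # One leading SPACE, one trailing SPACE, two SPACEs per inner run.
    return " " + "  ".join(words) + " "


any_sub = prep_old("bar", is_substring=True)  # from (cn=*bar*)
value = prep_old("foobar")                    # stored attribute value

print(repr(any_sub))     # ' bar '
print(repr(value))       # ' foobar '
print(any_sub in value)  # False: the filter no longer matches
```

The added leading SPACE in the prepared 'any' string is what defeats
the match.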

The fix would be to replace:
 If the input string contains at least one non-space character, then
 the string is modified such that the string starts with exactly one
 space character, ends with exactly one SPACE character, and that
 any inner (non-empty) sequence of space characters is replaced with
 exactly two SPACE characters.  For instance, the input strings
 "foo<SPACE>bar<SPACE><SPACE>", results in the output
 "<SPACE>foo<SPACE><SPACE>bar<SPACE>".

 Otherwise, if the string being prepared is an initial, any, or final
 substring, then the output string is exactly one SPACE character,
 else the output string is exactly two SPACEs.

with:
 For input strings which are attribute values or non-substring
 assertion values:  If the input string contains no non-space
 character, then the output is exactly two SPACEs.   Otherwise
 (the input string contains at least one non-space character)
 then the string is modified such that the string starts
 with exactly one space character, ends with exactly one SPACE
 character, and that any inner (non-empty) sequence of space
 characters is replaced with exactly two SPACE characters.  For
 instance, the input string "foo<SPACE>bar<SPACE><SPACE>"
 results in the output "<SPACE>foo<SPACE><SPACE>bar<SPACE>".

 For input strings which are substring assertion values: If the
 string being prepared contains no non-space characters, then the
 output string is exactly one SPACE.  Otherwise, the following steps
 are taken:
  - If the input string is an initial substring, it is modified to
    start with exactly one SPACE character;
  - If the input string is an initial or an any substring which ends in
    one or more space characters, it is modified to end with exactly
    one SPACE character;
  - If the input string is an any or a final substring which starts
    with one or more space characters, it is modified to start with
    exactly one SPACE character; and
  - If the input string is a final substring, it is modified to end
    with exactly one SPACE character.
 For instance, for the input string "foo<SPACE>bar<SPACE><SPACE>"
 as an initial substring, the output would be
 "<SPACE>foo<SPACE><SPACE>bar<SPACE>".  As an any or final substring,
 the same input would result in "foo<SPACE>bar<SPACE>".
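
The revised handling can be sketched in Python as follows.  This is an
illustration under stated assumptions, not proposed text: the names
prep_value, prep_substring and substring_match are hypothetical, a
"space" is simplified to a bare U+0020, and, following David's
description, inner runs of spaces are assumed to become two SPACEs for
every substring type while leading and trailing runs collapse to one.

```python
def prep_value(s: str) -> str:
    """Attribute values and non-substring assertion values."""
    words = [w for w in s.split(" ") if w]
    if not words:
        return "  "
    return " " + "  ".join(words) + " "


def prep_substring(s: str, kind: str) -> str:
    """Substring assertion values; kind is 'initial', 'any' or 'final'."""
    if kind not in ("initial", "any", "final"):
        raise ValueError(f"unknown substring kind: {kind}")
    words = [w for w in s.split(" ") if w]
    if not words:
        return " "
    # Assumption: inner runs become two SPACEs for all substring kinds.
    core = "  ".join(words)
    # A SPACE is forced at the start of an initial substring and at the
    # end of a final one; elsewhere an existing run collapses to one.
    lead = kind == "initial" or s.startswith(" ")
    trail = kind == "final" or s.endswith(" ")
    return (" " if lead else "") + core + (" " if trail else "")


def substring_match(value, initial=None, any_=(), final=None) -> bool:
    """Match prepared substrings, in order, against the prepared value."""
    v = prep_value(value)
    pos = 0
    if initial is not None:
        p = prep_substring(initial, "initial")
        if not v.startswith(p):
            return False
        pos = len(p)
    for a in any_:
        p = prep_substring(a, "any")
        i = v.find(p, pos)
        if i < 0:
            return False
        pos = i + len(p)
    if final is not None:
        p = prep_substring(final, "final")
        return len(v) - pos >= len(p) and v.endswith(p)
    return True


print(substring_match("foobar", any_=["bar"]))                    # True
print(substring_match("foo bar", initial="foo ", final=" bar"))   # True
print(substring_match("foobar", initial="foo ", final=" bar"))    # False
```

With these rules (cn=*bar*) matches "foobar" again, while
(cn=foo\20*\20bar) still refuses to match "foobar".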


>I have two other minor comments on this draft, not directly related to
>the above.
>
>draft-ietf-ldapbis-syntaxes-11 does not change the definition of the
>telephone number syntax nor the definition of facsimile telephone
>number. In both cases the number is a PrintableString. So, I'm not sure
>why "2.6.3 telephoneNumber Insignificant Character Handling" needs to
>make allowance for non-PrintableString hyphen-type characters.

I note there is a third case: values carried in a Substring Assertion.
That aside...

We certainly could have regarded only U+002D as a hyphen here.
We didn't.  That is, I suggest we not regard this as a
significant technical issue (one requiring that we consider
changes to an approved specification).

>In Appendix B, an alternative scheme for insignificant space handling is
>described. In conjunction with substring matching, this alternative
>scheme tends to make substrings in the filter shorter, by removing
>leading and trailing spaces. Therefore you get matches which you don't
>expect, rather than not getting matches you do expect. In particular,
>the first case erroneously states that (with this mechanism) (cn=foo\20*
>\20bar) would NOT match CN values "foo<SPACE>bar" and
>"foo<SPACE><SPACE>bar", but it would. The initial and final substrings
>are reduced to "foo" and "bar", and so match the prepared values, which
>both become "foo<SPACE>bar". The same applies to the third example,
>which uses the same filter and one of the same value strings.

Here I think that I was treating the space in (cn=foo<SPACE>*) (in
at least some instances) as an inner space.  Aside from dropping
the "would not" part of item 1, I need to rephrase item 3 as
indicative of why another simple approach (just leaving these
single spaces in) is also problematic.  Luckily this is in
non-normative background text.
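
David's point about the Appendix B scheme can be checked with a short
Python sketch.  It assumes the alternative scheme strips leading and
trailing spaces and collapses each inner run to one SPACE, as his
description implies; the names prep_alt and matches_initial_final are
hypothetical, and the "foobar" value is an extra illustration not
taken from the draft.

```python
def prep_alt(s: str) -> str:
    """Alternative scheme: trim the ends, one SPACE per inner run."""
    return " ".join(w for w in s.split(" ") if w)


def matches_initial_final(value: str, initial: str, final: str) -> bool:
    """Match a filter that has only initial and final substrings."""
    v, i, f = prep_alt(value), prep_alt(initial), prep_alt(final)
    # The initial and final parts must not overlap inside the value.
    return len(v) >= len(i) + len(f) and v.startswith(i) and v.endswith(f)


# (cn=foo\20*\20bar): initial is "foo<SPACE>", final is "<SPACE>bar".
for cn in ("foo bar", "foo  bar", "foobar"):
    print(repr(cn), matches_initial_final(cn, "foo ", " bar"))
# 'foo bar' True
# 'foo  bar' True
# 'foobar' True
```

The substrings shrink to "foo" and "bar", so all three values match,
including one that the filter's explicit spaces were meant to exclude.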

Thanks, Kurt