Re: LDAPprep: mapping of " " values




On 30-Nov-04, at 10:42 PM, Kurt D. Zeilenga wrote:

At 07:13 PM 11/30/2004, Steven Legg wrote:
I don't like any solution that requires non-disjoint matching against substrings, though with spaces it can be finessed.

How about this: Prepare the attribute value...

Here is a slightly modified version of a proposal I made a few weeks ago. I will argue that it does not involve non-disjoint matching, does not violate the provision of X.501 8.8.5, produces the same results as Kurt's proposal (which I call the [two-space] proposal in the following text) in the majority of cases, and produces more intuitive results in the remaining cases.


Proposal:

1) The preparation rules are unchanged from the current strprep draft (that is, both attribute and assertion values have initial and final spaces stripped and internal sequences of spaces collapsed into a single space [Note 1], and any value which consists only of zero or more spaces is replaced with a single space) [Note 2].

2) The following is added to the current definition of the substring matching rule (nothing is removed):

  Definition: a space-initial (final, respectively) assertion
  value is an assertion value whose prepared value does not
  consist of a single space, and whose unprepared value
  starts (ends, respectively) with a space.

  -- existing text of substring matching rule

  The rule evaluates to TRUE if and only if the prepared substrings
  of the assertion value match disjoint portions of the prepared
  attribute value character string in the order of the substrings
  in the assertion value, and

  an <initial> substring, if present, matches the beginning of the
  prepared attribute value character string, and

  a <final> substring, if present, matches the end of the prepared
  attribute value character string

  -- proposed addition

  and,

  a space-initial assertion value either matches the beginning of
  the prepared attribute value character string, or matches the
  prepared attribute value character string at a position
  immediately following a space, and

  a space-final assertion value either matches the end of the
  prepared attribute value character string, or matches the prepared
  attribute value character string at a position immediately preceding
  a space.

The difference between this and my previous proposal is that it clarifies
that the proposed additional restrictions do not apply to assertion values
consisting only of spaces, and that they do apply to <initial> and <final>
substrings. ("Clarifies" may be a euphemism here.)
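
To make the proposal concrete, here is a rough Python sketch of the preparation step and the extended substring rule. It is only an illustration under stated assumptions, not text for the draft: the names prepare, boundary_flags and substring_match are mine, only U+0020 is treated as a space, the other strprep steps are omitted, and the assertion value is passed as already-separated <initial>, <any> and <final> strings rather than parsed from a filter.

```python
import re
from typing import Optional, Sequence, Tuple


def prepare(value: str) -> str:
    """Strip outer spaces, collapse inner runs, map all-space (or empty) to ' '."""
    collapsed = re.sub(r" +", " ", value).strip(" ")
    return collapsed if collapsed else " "


def boundary_flags(raw: str) -> Tuple[bool, bool]:
    """Return (space_initial, space_final) for an unprepared assertion string."""
    if prepare(raw) == " ":
        # The extra restrictions do not apply to all-space strings.
        return (False, False)
    return (raw.startswith(" "), raw.endswith(" "))


def position_ok(attr: str, start: int, needle: str, raw: str) -> bool:
    """Check the proposed word-boundary restrictions for one candidate match."""
    space_initial, space_final = boundary_flags(raw)
    end = start + len(needle)
    if space_initial and not (start == 0 or attr[start - 1] == " "):
        return False
    if space_final and not (end == len(attr) or attr[end] == " "):
        return False
    return True


def find_from(attr: str, raw: str, pos: int) -> int:
    """Earliest acceptable match of raw at or after pos, or -1 if none."""
    needle = prepare(raw)
    start = attr.find(needle, pos)
    while start != -1 and not position_ok(attr, start, needle, raw):
        start = attr.find(needle, start + 1)
    return start


def substring_match(attr_raw: str,
                    initial: Optional[str] = None,
                    any_: Sequence[str] = (),
                    final: Optional[str] = None) -> bool:
    """Existing rule (ordered, disjoint matches) plus the proposed addition."""
    attr = prepare(attr_raw)
    pos = 0
    if initial is not None:
        needle = prepare(initial)
        if not attr.startswith(needle) or not position_ok(attr, 0, needle, initial):
            return False
        pos = len(needle)
    for raw in any_:
        # Taking the earliest acceptable match leaves the most room for the rest.
        start = find_from(attr, raw, pos)
        if start == -1:
            return False
        pos = start + len(prepare(raw))
    if final is not None:
        needle = prepare(final)
        start = len(attr) - len(needle)
        if start < pos or not attr.endswith(needle):
            return False
        if not position_ok(attr, start, needle, final):
            return False
    return True


print(substring_match("foo bar", any_=["foo", " ", "bar"]))  # True
print(substring_match("foobar", any_=["foo", " ", "bar"]))   # False
print(substring_match("xfoo", any_=[" foo"]))                # False
print(substring_match("x foo", any_=[" foo"]))               # True
```

Because every prepared assertion string has a fixed length, taking the earliest acceptable position for each <any> string is sufficient; no backtracking is needed.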


1. This proposal does not involve non-disjoint matching. The disjoint
   matching provision is unchanged by the proposal. Rather, the proposal
   further restricts where certain assertion values may match.

2. This obeys the restriction of X.501 8.8.5. If a substring
   is plucked out of the original attribute value and compared with
   any assertion value, it will match exactly as provided for by
   the equality matching rule. In particular, if <x> and <y> are
   attribute values whose prepared values are identical (i.e., they
   are equal under the equality matching rule), and <z> is any
   assertion value (consisting of an ordered sequence of <initial>,
   <any> and <final> strings), then <z> will match <x> if and only
   if <z> matches <y>.

3. The proposal produces the same results as the status quo if no
   string in the assertion value begins or ends with a space. In
   no circumstance does it produce a match which would not be
   produced by the status quo; consequently, the matches it produces
   are a subset of those produced by the status quo.

4. The proposal produces the same results as the [two-space]
   proposal if no string in the assertion value is empty or consists
   only of spaces. There is a proof of this which involves colouring
   initial assertion spaces green, final assertion spaces red, and
   the two-space internal sequence (red, green) and then observing
   that spaces will only match if their colours are the same, except
   for cases involving assertion and attribute values consisting
   entirely of spaces.

5. As with the status quo, the assertion value (at=*foo* *bar*) will
   match an attribute value where a word [Note 3] containing "foo"
   precedes a word containing "bar". It will not match "foobar".
   Furthermore, the assertion value (at=*foo* * *bar*) will match
   where there is at least one intervening word, the assertion value
   (at=*foo* * * *bar*) will match where there are at least two
   intervening words, and so on. The presence or absence of initial/
   final spaces in the strings in the assertion value does not change
   the semantics, except to restrict the position of the assertion
   strings in the words of the attribute value.

6. The [two-space] proposal exhibits much more complex behaviour.
   (at=*foo* *bar*) will match as above, but (at=*foo* * *bar*)
   matches exactly the same set of attribute values;
   (at=*foo* * * *bar*) is necessary to require an intervening word.
   However, if "foo" is changed to "foo " or "bar" is changed to
   " bar", this changes: now, two " " strings are sufficient to
   require an intervening word. If both strings are changed, then one
   " " string is sufficient to require an intervening word, and three
   are sufficient to require two intervening words. I argue that this
   behaviour is more complex and less intuitive than either the status
   quo or my proposal.

[Note 1]: I did not repeat my observations about spaces immediately
followed by Unicode combining characters, but it does seem clear
that these need to be handled differently, as is made clear in
strprep. As I said earlier, the simplest solution is probably to
change such characters to U+00A0 following the prohibition step
and prior to the insignificant space removal step.
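
As a rough illustration of that suggestion, the mapping could look something like the Python below. This is a sketch only: the function name protect_combining_spaces is mine, and I am assuming that "combining character" can be approximated by the Unicode general categories Mn, Mc and Me, which may not be exactly the set strprep intends.

```python
import unicodedata

NBSP = "\u00a0"


def protect_combining_spaces(value: str) -> str:
    """Replace each space that is immediately followed by a combining mark.

    Assumption: a combining mark is any character whose Unicode general
    category starts with 'M' (Mn, Mc or Me).
    """
    chars = list(value)
    for i in range(len(chars) - 1):
        if chars[i] == " " and unicodedata.category(chars[i + 1]).startswith("M"):
            chars[i] = NBSP
    return "".join(chars)


# A space carrying U+0301 (combining acute accent) survives as U+00A0,
# so a later space-collapsing step would leave it alone.
print(protect_combining_spaces("a \u0301b") == "a\u00a0\u0301b")  # True
print(protect_combining_spaces("a b") == "a b")                    # True
```

Only the space directly before the combining mark is changed; any ordinary spaces in front of it are still subject to insignificant space removal.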

[Note 2]: It is not clear to me how you would specify an empty
string as an <initial> or <final> string in a substring match. Or,
alternatively, how you would specify the absence of an <initial>
or <final> string. (In the filter syntax, I mean -- it is clear
how to do it in the protocol.)

[Note 3]: I use the word "word" here to mean any sequence of
characters not including a space (other than a space which starts
a combining sequence, as above). There are other, possibly more
useful and certainly less simple, definitions of what a word is,
but in many cases this definition will prove adequate.