Re: LDAPprep: mapping of " " values
On 30-Nov-04, at 10:42 PM, Kurt D. Zeilenga wrote:
At 07:13 PM 11/30/2004, Steven Legg wrote:
I don't like any solution that requires non-disjoint matching against
substrings, though with spaces it can be finessed.
How about this:
Prepare the attribute value...
Here is a slightly modified version of a proposal I made a few weeks
ago. I will argue that it does not involve non-disjoint matching, does
not violate the provision of X.501 8.8.5, produces the same results as
Kurt's proposal (I call this the [two-space] proposal in the following
text) in the majority of cases, and produces more intuitive results in
the remaining cases.
Proposal:
1) The preparation rules are unchanged from the current strprep draft:
both attribute and assertion values have initial and final spaces
stripped, internal sequences of spaces are collapsed into a single
space [Note 1], and any value which consists only of zero or more
spaces is replaced with a single space [Note 2].
2) The following is added to the current definition of the substring
matching rule (nothing is removed):
Definition: a space-initial (final, respectively) assertion
value is an assertion value whose prepared value does not
consist of a single space, and whose unprepared value
starts (ends, respectively) with a space.
-- existing text of substring matching rule
The rule evaluates to TRUE if and only if the prepared substrings
of the assertion value match disjoint portions of the prepared
attribute value character string in the order of the substrings
in the assertion value, and
an <initial> substring, if present, matches the beginning of the
prepared attribute value character string, and
a <final> substring, if present, matches the end of the prepared
attribute value character string
-- proposed addition
and,
a space-initial assertion value either matches the beginning of
the prepared attribute value character string, or matches the
prepared attribute value character string at a position
immediately following a space, and
a space-final assertion value either matches the end of the
prepared attribute value character string, or matches the prepared
attribute value character string at a position immediately preceding
a space.
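Steps 1 and 2 above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the strprep algorithm: it assumes the other preparation steps (mapping, normalization, prohibition) have already run, treats only U+0020 as a space, and uses hypothetical function names. Setting restrict_spaces=False drops the proposed addition and gives the status quo rule for comparison.

```python
import re
from typing import List, Optional, Tuple


def prepare_spaces(value: str) -> str:
    # Step 1 (space handling only): strip initial/final spaces, collapse
    # internal runs, and map an empty or all-space value to a single space.
    stripped = value.strip(" ")
    return re.sub(" +", " ", stripped) if stripped else " "


def _flags(raw: str, prepared: str) -> Tuple[bool, bool]:
    # Space-initial / space-final: the prepared value is not a single space
    # and the unprepared value starts / ends with a space.
    if prepared == " ":
        return False, False
    return raw.startswith(" "), raw.endswith(" ")


def _position_ok(attr: str, start: int, end: int,
                 sp_init: bool, sp_final: bool) -> bool:
    # Proposed addition: a space-initial string matches at the beginning or
    # right after a space; a space-final string matches at the end or right
    # before a space.
    if sp_init and start > 0 and attr[start - 1] != " ":
        return False
    if sp_final and end < len(attr) and attr[end] != " ":
        return False
    return True


def substrings_match(attr_value: str, initial: Optional[str],
                     any_parts: List[str], final: Optional[str],
                     restrict_spaces: bool = True) -> bool:
    """Hypothetical helper; restrict_spaces=False gives the status quo rule."""
    attr = prepare_spaces(attr_value)

    def prep(raw: str) -> Tuple[str, bool, bool]:
        s = prepare_spaces(raw)
        si, sf = _flags(raw, s) if restrict_spaces else (False, False)
        return s, si, sf

    pos, limit = 0, len(attr)  # <any> strings must lie within attr[pos:limit]
    if initial is not None:
        s, si, sf = prep(initial)
        if not attr.startswith(s) or not _position_ok(attr, 0, len(s), si, sf):
            return False
        pos = len(s)
    if final is not None:
        s, si, sf = prep(final)
        start = len(attr) - len(s)
        # <final> must match the end and must not overlap <initial>.
        if (start < pos or not attr.endswith(s)
                or not _position_ok(attr, start, len(attr), si, sf)):
            return False
        limit = start
    for raw in any_parts:
        s, si, sf = prep(raw)
        # Take the leftmost occurrence that satisfies the restrictions.
        i = attr.find(s, pos, limit)
        while i != -1 and not _position_ok(attr, i, i + len(s), si, sf):
            i = attr.find(s, i + 1, limit)
        if i == -1:
            return False
        pos = i + len(s)  # disjointness: the next string starts after this one
    return True
```

Taking the leftmost qualifying occurrence of each <any> string is safe because the added restrictions depend only on the characters adjacent to that occurrence, so an earlier match never leaves less room for the strings that follow. For example, substrings_match("xbar", None, [" bar"], None) is false under the proposal but true with restrict_spaces=False, in line with point 3 below.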
The difference between this and my previous proposal is that it
clarifies that the proposed additional restrictions do not apply to
assertion values consisting only of spaces, and that they do apply to
<initial> and <final> substrings. ("Clarifies" may be a euphemism here.)
1. This proposal does not involve non-disjoint matching. The disjoint
matching provision is unchanged by the proposal; rather, the proposal
further restricts where certain assertion values may match.
2. This obeys the restriction of X.501 8.8.5. If a substring
is plucked out of the original attribute value and compared with
any assertion value, it will match exactly as provided for by
the equality matching rule. In particular, if <x> and <y> are
attribute values whose prepared values are identical (i.e., they
are equal under the equality matching rule), and <z> is any
assertion value (consisting of an ordered sequence of <initial>,
<any> and <final> strings), then <z> will match <x> if and only
if <z> matches <y>.
3. The proposal produces the same results as the status quo if no
string in the assertion value begins or ends with a space. In
no circumstance does it produce a match which would not be
produced by the status quo; consequently, the set of values it
matches is a strict subset of the set matched by the status quo.
4. The proposal produces the same results as the [two-space]
proposal if no string in the assertion value is empty or consists
only of spaces. There is a proof of this which involves colouring
initial assertion spaces green, final assertion spaces red, and
the two-space internal sequence (red, green) and then observing
that spaces will only match if their colours are the same, except
for cases involving assertion and attribute values consisting
entirely of spaces.
5. As with the status quo, the assertion value (at=*foo* *bar*) will
match an attribute value where a word [Note 3] containing "foo"
precedes a word containing "bar". It will not match "foobar".
Furthermore, the assertion value (at=*foo* * *bar*) will match
where there is at least one intervening word, the assertion value
(at=*foo* * * *bar*) will match where there are at least two
intervening words, and so on. The presence or absence of initial/
final spaces in the strings in the assertion value does not change
the semantics, except to restrict the position of the assertion
strings in the words of the attribute value.
6. The [two-space] proposal exhibits much more complex behaviour.
(at=*foo* *bar*) will match as above, but (at=*foo* * *bar*)
matches exactly the same set of attribute values; (at=*foo* * * *bar*)
is necessary to require an intervening word. However, if "foo"
is changed to "foo " or "bar" is changed to " bar", this changes:
now, two " " strings are sufficient to require an intervening word.
If both assertion strings are changed, then one " " string is
sufficient to require an intervening word, and three are sufficient
to require two intervening words. I argue that this behaviour is
more complex and less intuitive than either the status quo or
my proposal.
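The counting in point 5 depends on how a filter value splits into <initial>, <any> and <final> strings. The sketch below (hypothetical function name; RFC 4515 escape sequences such as \2a are not decoded) splits the value part of a substring filter, i.e. the text after "at=", on "*"; an empty leading or trailing piece means the corresponding <initial> or <final> string is absent.

```python
from typing import List, Optional, Tuple


def split_substring_filter(value: str) -> Tuple[Optional[str], List[str], Optional[str]]:
    """Split e.g. '*foo* * *bar*' into (initial, any_parts, final).

    Sketch only: escape sequences are not decoded, and a value with no '*'
    (an equality filter) is rejected.
    """
    pieces = value.split("*")
    if len(pieces) < 2:
        raise ValueError("not a substring filter value: %r" % value)
    # A leading '*' leaves an empty first piece: no <initial> string.
    initial = pieces[0] or None
    # A trailing '*' leaves an empty last piece: no <final> string.
    final = pieces[-1] or None
    return initial, pieces[1:-1], final
```

For (at=*foo* * *bar*) this yields the <any> strings "foo", " ", " ", "bar". Under the proposal each " " string prepares to a single space and must match a distinct space after "foo" and before "bar"; since prepared values contain no adjacent spaces, n such strings require at least n - 1 intervening words.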
[Note 1]: I did not repeat my observations about spaces immediately
followed by Unicode combining characters, but it does seem clear
that these need to be handled differently, as strprep already
indicates. As I said earlier, the simplest solution is probably to
change such characters to U+00A0 following the prohibition step
and prior to the insignificant space removal step.
[Note 2]: It is not clear to me how you would specify an empty
string as an <initial> or <final> string in a substring match. Or,
alternatively, how you would specify the absence of an <initial>
or <final> string. (In the filter syntax, I mean -- it is clear
how to do it in the protocol.)
[Note 3]: I use the word "word" here to mean any sequence of
characters not including a space (other than a space which starts
a combining sequence, as above). There are other, possibly more
useful and certainly less simple, definitions of what a word is,
but in many cases this definition will prove adequate.
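The mapping suggested in Note 1, together with the notion of "word" in Note 3, might be sketched as follows. Assumptions: the function names are hypothetical, and "combining character" is approximated as any code point whose Unicode general category is a mark (Mn, Mc or Me). The mapping would run after the prohibition step and before insignificant-space handling.

```python
import unicodedata
from typing import List

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def protect_combining_spaces(value: str) -> str:
    """Map a U+0020 that starts a combining sequence to U+00A0."""
    out = []
    for i, ch in enumerate(value):
        nxt = value[i + 1] if i + 1 < len(value) else ""
        # Assumption: a "combining character" is any general category M* code point.
        if ch == " " and nxt and unicodedata.category(nxt).startswith("M"):
            out.append(NBSP)
        else:
            out.append(ch)
    return "".join(out)


def words(prepared: str) -> List[str]:
    """Note 3's "word": a maximal run of characters other than U+0020."""
    # split(" ") on purpose: split() with no argument would also split on U+00A0.
    return [w for w in prepared.split(" ") if w]
```

Splitting on U+0020 explicitly matters here: Python's argument-less str.split() treats U+00A0 as whitespace and would undo the effect of the mapping.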