
Re: ListMatch Clarification




Daniel,

Daniel Henninger wrote:
caseIgnoreListSubstringsMatch has the same basic substrings
semantics as caseIgnoreSubstringsMatch.  Instead of matching
against a single directory string, the former matches against
a list of directory strings (lines).  The initial substring, if
asserted, must match the beginning of the first line.  The
final substring, if asserted, must match the end of the last
line.  Each "any" substring matches somewhere in between.  The matched
portions must not overlap each other, must not span lines,
and must appear in the same order as the substrings.  Multiple
matched portions, however, may be in the same line... and
some lines may have no matched portions.


Ok so, caseIgnoreListSubstringsMatch is no different than
caseIgnoreSubstringsMatch -except- for:

A. the list is effectively concatenated into one line
B. there are "markers" where there were line breaks that no single pattern
 can cross
C. \'s and $'s are handled so that they can be escaped out

Does that sound right?

Yes.


Excellent.  So I've been looking at what it would take to implement this,
and here is how I "think I should implement it".  I'd definitely like
feedback before going through with this.  I also have a side question, but
I'll get to this.  First of all, here's how I propose that I would
implement this:

given: A. is the query string
       B. is the entity within the ldap db I am comparing against

1. convert A to "escaped" format by running that escape command on it
   (assuming that is a real command; I haven't looked at that), escaping
   out $ and \

Our implementation unescapes B and stores it as a list of lines. This is closer to the spirit of X.500. Escaping A should have the same effect as far as I can see, but there are nuances to deal with.
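
A minimal sketch of that "unescape B into a list of lines" step, for illustration only and not our actual code. The thread does not spell out the escape encoding, so this assumes the LDAP Postal Address convention in which lines are separated by $ and a literal $ or \ inside a line is written as \24 or \5C. The function name parse_postal_lines is hypothetical.

```python
from typing import List


def parse_postal_lines(value: str) -> List[str]:
    """Split a '$'-separated value into lines and undo the hex escapes.

    Assumption: a literal '$' is stored as backslash-24 and a literal
    backslash as backslash-5C, so every bare '$' is a line separator.
    """
    lines = []
    for raw in value.split("$"):
        out = []
        i = 0
        while i < len(raw):
            code = raw[i + 1:i + 3].upper()
            if raw[i] == "\\" and code in ("24", "5C"):
                # Decode one escape sequence and skip past it.
                out.append("$" if code == "24" else "\\")
                i += 3
            else:
                out.append(raw[i])
                i += 1
        lines.append("".join(out))
    return lines
```

Splitting first and decoding second is safe here because, under the stated assumption, an unescaped $ can only ever be a separator. Decoding in a single left-to-right pass avoids unescaping the same characters twice.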


2. convert B by turning the line separator $ into a ^M (ctrl-M) (or a \n; not sure which would be better there)

Why change the $ separator? If you have escaped $ in the assertion substrings then none of the substrings can match a literal $ in B. What you will need to do, though, is escape any *'s in B, because they are already escaped in A. All this assumes that the routine for matching substrings is doing a literal character comparison and is ignorant of character escapes.


3. turn the converted A and B over to caseIgnoreSubstringsMatch for comparison

This does mean that an added side effect is that you could actually
-match- those newlines using ^M or \n, depending on which were used.
Personally, I don't think anything is wrong with that.  ;)

It would not be conformant with the specification of caseIgnoreListSubstringsMatch.


If I can do that, then my second question is moot. If I cannot allow the newline to be matchable at all, then how is it possible to do a non-substrings match?

1) Don't map the $'s in B to anything else. Do escape the *'s in B.

OR

2) The description of caseIgnoreListSubstringsMatch depends on the description
of caseIgnoreSubstringsMatch but that doesn't mean it must necessarily be
implemented that way. Our implementation uses a separate function for
caseIgnoreListSubstringsMatch. It attempts to match as many of the assertion
substrings as it can in the first line (where an initial substring must match
the beginning of the first line). If there are any left over it attempts
to match the remainder on the second line. If there are still some left over it
attempts to match the new remainder on the third line, and so on. If all the substrings
get matched, and any final substring matches the end of the last line, then success!

Since we store B as a list of separately terminated lines (and do much the same for A)
there is no need to worry about separators or escaping (except to undo it on input).
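
The line-by-line approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual function: the name list_substrings_match and its parameters are hypothetical, B is assumed to be already unescaped into a list of lines, and case-insensitivity is approximated with Python's str.casefold() rather than full LDAP string preparation (so insignificant-space handling is ignored).

```python
from typing import Optional, Sequence


def list_substrings_match(
    lines: Sequence[str],
    initial: Optional[str] = None,
    any_parts: Sequence[str] = (),
    final: Optional[str] = None,
) -> bool:
    """Greedy, line-by-line substrings match against a list of lines."""
    if not lines:
        return False
    folded = [line.casefold() for line in lines]
    line_no = 0  # line currently being searched
    pos = 0      # first offset in that line not yet consumed by a match

    # An initial substring must match the beginning of the first line.
    if initial is not None:
        init = initial.casefold()
        if not folded[0].startswith(init):
            return False
        pos = len(init)

    # Match each "any" substring as early as possible; when the current
    # line has no room left for it, carry the remainder to the next line.
    for part in any_parts:
        needle = part.casefold()
        while True:
            hit = folded[line_no].find(needle, pos)
            if hit >= 0:
                pos = hit + len(needle)
                break
            line_no += 1
            pos = 0
            if line_no == len(folded):
                return False

    # A final substring must match the end of the last line without
    # overlapping anything already matched on that line.
    if final is not None:
        fin = final.casefold()
        last = folded[-1]
        if not last.endswith(fin):
            return False
        if line_no == len(folded) - 1 and len(last) - len(fin) < pos:
            return False
    return True
```

For example, against the lines "1234 Main St.", "Anytown, CA 12345" and "USA", an assertion with initial "1234", any substrings "main" and "anytown", and final "usa" matches, while a single any substring "st.anytown" does not, because no matched portion may span lines. Taking the earliest possible match for each substring leaves the most room for the ones that follow, which is why no backtracking is needed in this sketch.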

Regards,
Steven


Daniel