
Re: filter preprocessing for performance improvement



Pierangelo Masarati wrote:
Jon Roberts wrote:
I was just today thinking about something along the lines of filter
preprocessing (at the client level actually) that prevented say a
contains search like (telephonenumber=*67530*) on an attribute that the
directory has not indexed for substring searches (case of
telephonenumber). Something at the server level would be better of course.

Something like that was discussed a long time ago when I proposed the "limits" feature (which eventually got into slapd in its current form). It's hard to tell what such a constraint would mean. If one only looks at the presence of a substrings filter in a search, unexpected results may occur; for example:

	(telephonenumber=*67530*)		=> reject

but what about

	(!(telephonenumber=*67530*))		=> ?

or

	(&(uid=foo)(telephonenumber=*67530*))	=> ?

A better approach, which we recently developed for a customer, would be
to define which filters are to be considered acceptable and which are
not, and then analyze the logic of the filter to see if it matches that
of the requirement.  For example, logic analysis could determine
if a filter is surely acceptable, surely unacceptable, or "grey"; then,
decision making could determine what to do in the "grey" cases.
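
As a rough illustration of such a three-way classification (not the
code developed for the customer mentioned above), here is a minimal
Python sketch.  The tuple representation of filters, the SUBSTR_INDEXED
set and the combination rules for AND/OR/NOT are all assumptions chosen
for the example; a real policy would define its own.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "surely acceptable"
    REJECT = "surely unacceptable"
    GREY = "grey"


# Hypothetical policy input: attributes assumed to carry a substring index.
SUBSTR_INDEXED = {"cn", "mail"}


def classify(f):
    """Classify a filter given as nested tuples, e.g.
    ("and", f1, f2), ("or", f1, f2), ("not", f),
    ("sub", attr, pattern) or ("eq", attr, value)."""
    op = f[0]
    if op == "sub":
        # Substring term: acceptable only if the attribute is indexed for it.
        return Verdict.ACCEPT if f[1] in SUBSTR_INDEXED else Verdict.REJECT
    if op == "eq":
        # Assumption: equality terms are always considered acceptable.
        return Verdict.ACCEPT
    if op == "not":
        # Assumption: negation is never decided here; leave it to policy.
        return Verdict.GREY

    kids = [classify(k) for k in f[1:]]
    if op == "and":
        # One acceptable term bounds the whole conjunction.
        if Verdict.ACCEPT in kids:
            return Verdict.ACCEPT
        if all(k is Verdict.REJECT for k in kids):
            return Verdict.REJECT
        return Verdict.GREY
    if op == "or":
        # One unacceptable term taints the whole disjunction.
        if Verdict.REJECT in kids:
            return Verdict.REJECT
        if all(k is Verdict.ACCEPT for k in kids):
            return Verdict.ACCEPT
        return Verdict.GREY
    raise ValueError(f"unknown filter operator: {op!r}")
```

Under these assumed rules, the three filters quoted earlier come out
as reject, grey and accept respectively; what to do with the "grey"
verdict remains a separate decision.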

If what you want to control is searches resulting in large candidate
sets, you need to define what may potentially lead to large candidate
sets.  So you need to define what's "large", and what simple filters
could lead to large candidate sets.

OK, so you want to prevent candidate generation from occurring for filter terms which might result in large candidate sets. First of all, assuming that's even a valid thing to do (noting your issues listed above), I would just define a new limit analogous to sizelimit.unchecked, and skip the probability guessing games. E.g. sizelimit.intermediate, which would be checked at intermediate stages of filter evaluation. That would render sizelimit.unchecked moot.


The implementation would apply this limit to each individual filter term lookup, and fail with ADMINLIMIT_EXCEEDED when any term exceeds the limit.

In practice I think this will cause a lot of harm though; it will cause ANDed filters to fail that would otherwise come in under the unchecked limit.
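
To make the ANDed case concrete, here is a toy Python model of
candidate generation.  It is a sketch under stated assumptions, not
slapd code: entry IDs are plain integers, each term's index lookup is
given as a ready-made set, and intermediate_limit stands in for the
hypothetical sizelimit.intermediate discussed above.

```python
class AdminLimitExceeded(Exception):
    """Stand-in for an LDAP adminLimitExceeded result."""


def and_candidates(term_sets, unchecked_limit, intermediate_limit=None):
    """Intersect the candidate sets of the terms of an AND filter.

    term_sets          -- one set of entry IDs per filter term
    unchecked_limit    -- cap on the final candidate set
    intermediate_limit -- hypothetical per-term cap; None disables it
    """
    result = None
    for ids in term_sets:
        # Per-term check: fails as soon as any single term is too large.
        if intermediate_limit is not None and len(ids) > intermediate_limit:
            raise AdminLimitExceeded("term exceeds intermediate limit")
        result = set(ids) if result is None else result & ids
    result = result if result is not None else set()
    # Final check: only the size of the combined candidate set matters.
    if len(result) > unchecked_limit:
        raise AdminLimitExceeded("candidate set exceeds unchecked limit")
    return result


# (&(uid=foo)(telephonenumber=*67530*)) with made-up index results:
uid_foo = {42}                   # one entry matches uid=foo
phone_substr = set(range(1000))  # the substring term matches many entries
```

With only the unchecked limit set to 100, and_candidates([uid_foo,
phone_substr], 100) returns {42}, since the intersection is tiny.
Adding intermediate_limit=100 makes the same search raise
AdminLimitExceeded, because the telephonenumber term alone yields 1000
candidates.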
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
Chief Architect, OpenLDAP http://www.openldap.org/project/