
Re: SLAP_INDEX_SUBSTR_ANY_LEN & co



Howard Chu writes:
> You already know what IF_MINLEN and ANY_LEN are for; they control both 
> index generation (which occurs when attributes are stored) and index 
> lookup (which occurs when search filters are evaluated). The other two 
> values only affect index lookup:

IF_MAXLEN also affects generation, fortunately.  At least in
octetStringSubstringsIndexer.

> ANY_STEP has to do with the sliding window that is used to generate 
> substring index keys for a value. For example, when indexing the 
> attribute "cn=abcdefgh" with a STEP size of 2 a hash key is generated 
> for these parts:
>       abcd
>         cdef
>           efgh

Hm.  So with ANY_LEN reduced to 3 I'll probably need ANY_STEP of 1 to
keep down the number of false positives.
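
To make the window concrete, here is a rough sketch of it as I understand
it from the description above. The function name and signature are made up
for illustration; this is not the slapd code, and the real indexer hashes
the pieces rather than keeping them as strings.

```python
def window_keys(value, any_len, any_step):
    """Return the substrings a sliding window of any_len characters
    would cover, advancing any_step characters each time.

    Hypothetical helper for illustration only, not OpenLDAP code.
    """
    last_start = len(value) - any_len
    # An empty list results when the value is shorter than any_len.
    return [value[i:i + any_len] for i in range(0, last_start + 1, any_step)]


print(window_keys("abcdefgh", 4, 2))  # ['abcd', 'cdef', 'efgh']
print(window_keys("abcdefgh", 3, 2))  # ['abc', 'cde', 'efg']
print(window_keys("abcdefgh", 3, 1))  # ['abc', 'bcd', 'cde', 'def', 'efg', 'fgh']
```

With a length of 3 and a step of 2, adjacent windows share only one
character, which is why I expect a step of 1 to be needed there.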

Looks like there is room for a lot of tuning here - like setting ANY_LEN to
an array {3, 4}, where filtering uses the largest possible value and an
ANY_STEP of something like (the applied ANY_LEN) - 2.  And making
ANY_STEP dependent on the substring length and the size of the index or
something, but that seems a lot more hairy.
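
A minimal sketch of the simpler variant, assuming ANY_LEN becomes a list of
lengths and the step is derived from whichever length gets applied. The
names are hypothetical and nothing like this exists in slapd today.

```python
def pick_any_params(substr_len, any_lens=(3, 4)):
    """Choose the largest configured ANY_LEN that fits the filter
    substring, and derive ANY_STEP as (applied ANY_LEN) - 2.

    Hypothetical helper sketching the suggestion, not OpenLDAP code.
    """
    usable = [n for n in any_lens if n <= substr_len]
    if not usable:
        return None  # substring too short for any configured length
    any_len = max(usable)
    any_step = max(1, any_len - 2)  # never let the step drop below 1
    return any_len, any_step


print(pick_any_params(3))  # (3, 1)
print(pick_any_params(7))  # (4, 2)
print(pick_any_params(2))  # None
```

That gives a step of 1 for length 3 and a step of 2 for length 4, i.e. the
same two combinations discussed above.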

> I should point out that our patch also fixes the initial/final behavior: 
> if a filter is provided that exceeds the MAXLEN, we no longer ignore the 
> excess characters. Instead we combine them with an ANY substring index 
> lookup, so that
>    cn=abcdefgh*
> is internally equivalent to
>    cn=abcd*defgh*
> 
> Naturally this doesn't work if subany indexing was not used...

Then I suggest you document under 'index .. subinitial/final' that
turning off subany impacts (attr=foo*bar) searches.
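
For the documentation, the behavior could be illustrated roughly like this.
It is only a sketch based on the example quoted above: the function and its
parameters are hypothetical, and the one-character overlap between the two
parts is read off that example (abcd / defgh), not taken from the patch.

```python
def split_long_initial(initial, if_maxlen, any_len, subany_indexed):
    """Split an over-long subinitial filter value into the part the
    initial index can use and the excess handed to the subany index.

    Returns (head, rest); rest is None when the excess cannot be used.
    Hypothetical helper for illustration only, not OpenLDAP code.
    """
    head = initial[:if_maxlen]
    if len(initial) <= if_maxlen or not subany_indexed:
        # Nothing left over, or no subany index to look the excess up in.
        return head, None
    # Assumption: the excess starts at the last character of the head,
    # as in the cn=abcd*defgh* example.
    rest = initial[if_maxlen - 1:]
    if len(rest) < any_len:
        return head, None  # too short for a subany lookup
    return head, rest


print(split_long_initial("abcdefgh", 4, 4, True))   # ('abcd', 'defgh')
print(split_long_initial("abcdefgh", 4, 4, False))  # ('abcd', None)
```

Without subany the index can only use the first IF_MAXLEN characters, which
is the impact I think the documentation should mention.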

Maybe also add a third constant for the IF_MAXLEN value to use when
subany is disabled, with a larger default value (IF_MAXLEN + ANY_LEN -
ANY_STEP?), so the index will handle (attr=foo*bar) approximately
equally well with and without subany.

Thanks a lot for the explanation - and for the coming patch.

-- 
Hallvard