Re: indexing warning considered harmful



Michael Ströder writes:
> We all know the following messages in syslog (loglevel stats):
> 
> mdb_equality_candidates: (foo) not indexed
> 
> At first glance this seems helpful to find indexing issues.
> 
> But IMO
> 1. this is somewhat mis-leading. regarding performance tuning and

True.  But it's also very valuable for identifying what needs
to be indexed, and it does not look easy to delay the message
until we know if it might be useful.

We could clarify the doc, which some people even read, and
maybe reword the message.

> 2. if internal searches are conducted (e.g. by set-based ACLs) the
> amount of the very same indexing warnings is really annoying and costs
> performance due to excessive logging.

We can add a "none" indexing level with no effect other than
to shut up that warning.  I've been thinking of it before,
but never got around to coding it.

> AFAIK a set of search candidates is derived from filter assertions by first
> searching the indexed attributes.

When that looks useful, yes - as with an AND filter.  The index narrows
down the possible candidate entries: the server takes the intersection
of the candidate sets returned by baseDN/scope and by the indexed attrs.

There are some implicit "filters" too: I mentioned baseDN/scope, which
works a bit differently.  Also filtering for objectClass, see below.

> Then the non-indexed assertions are tested but only on the search
> candidate set. Is this correct?

The full filter is tested on each candidate, since the indexes are
inexact.  Typically what gets indexed is a hash of the attribute value.
(Which doesn't imply the index is implemented as a hash table, BTW.)

Yes, for each indexed attribute, an AND filter looks up the
set of entry IDs matching the assertion value.  Then the server can take
the intersection of these.
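
A minimal sketch of that lookup, in Python rather than slapd's C.  The
names (hash_key, build_index, search_and), the 4-bucket hash and the toy
entries are all assumptions made for illustration, not slapd internals.
It shows why the full filter still has to be tested: the index stores
hashes, so a lookup can return entries whose value merely collides.

```python
# Toy model of an index-assisted AND search. Not slapd code.
from collections import defaultdict

BUCKETS = 4  # deliberately tiny so that hash collisions occur


def hash_key(value):
    """Hypothetical lossy hash of an attribute value."""
    return sum(value.encode("utf-8")) % BUCKETS


def build_index(entries, attr):
    """Map hash(value) -> set of entry IDs holding a value with that hash."""
    index = defaultdict(set)
    for entry_id, entry in entries.items():
        for value in entry.get(attr, []):
            index[hash_key(value)].add(entry_id)
    return index


def search_and(entries, indexes, scope_ids, assertions):
    """Evaluate an AND of equality assertions given as (attr, value) pairs."""
    candidates = set(scope_ids)  # implicit baseDN/scope candidate set
    for attr, value in assertions:
        if attr in indexes:  # only indexed attributes narrow the candidates
            candidates &= indexes[attr].get(hash_key(value), set())
    # Test the full filter on every candidate, since the index is inexact.
    matches = sorted(
        entry_id
        for entry_id in candidates
        if all(value in entries[entry_id].get(attr, []) for attr, value in assertions)
    )
    return candidates, matches


# Toy data: "foo" and "oof" hash alike because hash_key ignores byte order.
entries = {
    1: {"objectClass": ["posixAccount"], "uid": ["foo"]},
    2: {"objectClass": ["posixAccount"], "uid": ["bar"]},
    3: {"objectClass": ["posixAccount"], "uid": ["oof"]},
}
indexes = {"uid": build_index(entries, "uid")}  # objectClass left unindexed here
candidates, matches = search_and(
    entries, indexes, entries.keys(),
    [("objectClass", "posixAccount"), ("uid", "foo")],
)
# candidates holds entries 1 and 3 (a hash collision); matches holds only entry 1.
```

The unindexed objectClass assertion contributes no candidate set in this
sketch; it is only checked when the full filter is tested.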

> If yes, then indexing an attribute which is present in many entries
> can lead to large search candidate set even though the amount of final
> search results are small.

Yes.

> Consider the following simple example:
> 
> (&(objectClass=posixAccount)(uid=foo))

Bad example.  The manual says objectClass should be indexed for
performance.  This is because the server may turn your (FILTER) into

  (|(FILTER)(objectClass=alias)(objectClass=referral))

...depending on your search parameters.  If there are aliases or
referrals in the search scope, the server doesn't know if the entries
they refer to match (FILTER).  So it has to find and follow every
alias in the scope to check, and return all referrals in scope.
That's also why it's a bad idea to have lots of aliases.
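
To make the rewrite concrete, here is a small Python sketch that builds
the widened filter string.  The function name widen_filter and its two
boolean parameters are hypothetical stand-ins for "your search
parameters"; when exactly slapd adds each clause is not modelled here.

```python
def widen_filter(user_filter, follow_aliases=True, return_referrals=True):
    """Return user_filter with the implicit objectClass clauses OR-ed in.

    The two flags are hypothetical stand-ins for the search parameters
    that decide whether aliases and referrals have to be considered.
    """
    extra = []
    if follow_aliases:
        extra.append("(objectClass=alias)")
    if return_referrals:
        extra.append("(objectClass=referral)")
    if not extra:
        return user_filter  # nothing implicit to add
    return "(|" + user_filter + "".join(extra) + ")"


widened = widen_filter("(&(objectClass=posixAccount)(uid=foo))")
```

Each added clause is an objectClass equality match, so an unindexed
objectClass costs you even when the filter you wrote never mentions it.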

For attributes other than objectClass you are roughly right, though.

> With lots of user accounts this would lead to two search candidate sets, one
> very large and one with one entry (assuming uid values are unique). Not
> indexing objectClass would one result in *one* search candidate. So indexing
> objectClass might not be very wise.

More accurately, not indexing objectClass leads to just one candidate
*set*, plus the implicit DN/scope candidate set.  So there will be fewer
candidate sets for the server to intersect.

-- 
Hallvard