[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#8875) [Patch] Performance problems in back-mdb with large DITs and many aliases



On Mon, Jul 15, 2019 at 07:44:05PM +0100, Howard Chu wrote:
> Howard Chu wrote:
> > Henrik Bohnenkamp wrote:
> >> On Mon, Jul 15, 2019 at 02:26:59PM +0100, Howard Chu wrote:
> >>>
> >>> Fyi, on our problematic test database with 11M entries and 3.7M aliases, a search with -a always , starting from the
> >>> DB suffix, took 4 minutes without this patch, and 1235 minutes with this patch.
> >>>
> >>> Needless to say, that's not looking good. Still checking other test cases.


I start to understand why this is the case. I believe that our two
approaches to solve the problem each have their Nemesis...  I have
constructed a test case where the patched slapd needs 2 minutes (1
minute on a repeat) for a search for people objects, but the master
branch has yet to finish after 30 minutes (and might not finish today).

The test case is modeled similar to what is done in my company: group
objects have not only member entries for people, but each member has
an alias to the corresponding person object, these aliases being
children to the respective group object. Furthermore, subgroups/group
inclusion is modeled by adding aliases (also as children) to the group
object, pointing to the group objects representing the included
groups. So there is some recursive descent to get all person objects
of all subgroups. In total I have 5 mio entries, 3 mio of which are
aliases.  There are no cycles, but in this test case the depth of the
recursion is absolutely unrealistic.

I believe, in the master-slapd, each step down in the recursion needs a call to
mdb_idscope, which in turn goes through all 3 mio aliases to compute
the intersection with the scope of the group object (that's what I
remember, but I have not looked at the algorithm for a year now).

The  patch traverses in this case just very small sub-trees (although many of those) 
to collect the relevant aliases and continues from there. 
That seems to be much more efficient here. 

If I understand your description above correctly, my patch sucks with
your test case because it does a full traversal of the LDAP tree to
collect all aliases ... with 11 mio entries that takes a while. The
master branch, however, just computes intersection with the scope once
and continues from there, which must be much faster.

I actually do not know what to conclude from all this. I wonder
however, if a combination of both approaches could be beneficial. With
the patch the diskNode of an entry contains not only the number of
children in the  scope, but also the number of aliases. If the
total number of aliases in the database is large, but the search scope
is small, the tree traversal might be faster. If the number of entries in
the scope is big, and there are aliases in the search scope, the
intersection in mdb_idscope might be more efficient. If there are no 
aliases in the search scope, nothing has to be done at all.