Issue 8875 - [Patch] Performance problems in back-mdb with large DITs and many aliases
Summary: [Patch] Performance problems in back-mdb with large DITs and many aliases
Status: UNCONFIRMED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: 3.0.0
Assignee: OpenLDAP project
URL:
Keywords: has_patch
Depends on:
Blocks:
 
Reported: 2018-07-06 22:20 UTC by hbohnenkamp@united-internet.de
Modified: 2022-05-30 10:58 UTC
1 user

See Also:


Attachments
mkalias.c (2.31 KB, text/x-csrc)
2019-07-15 20:12 UTC, Howard Chu
Details
mkalias.c (2.23 KB, text/x-csrc)
2019-07-15 18:44 UTC, Howard Chu
Details

Description hbohnenkamp@united-internet.de 2018-07-06 22:20:42 UTC
Full_Name: Henrik Bohnenkamp
Version: >= 2.4.44, HEAD
OS: Ubuntu 18.04, Coreos 7.5
URL: https://github.com/hbo/openldap-mdb-deref-problem
Submission from: (NULL) (77.176.95.241)


This is a followup to
http://www.openldap.org/lists/openldap-technical/201805/msg00065.html

When back-mdb is used as the backend database with large DITs (O(10^6)
entries) containing many alias entries (O(10^5)), search requests with
deref=always and scope=sub take prohibitively long. Servers with a high
request rate might become utterly unresponsive. This problem is not
present in the HDB backend.
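For illustration, the shape of such a test DIT can be sketched in Python, along the lines of the scripts in the linked repository (this sketch is mine, not the repository's actual code; all DNs and attribute choices are illustrative):

```python
def make_test_dit(suffix="dc=example,dc=com", n_users=1000, n_aliases=100):
    """Generate LDIF text for a toy DIT: n_users person entries plus
    n_aliases alias entries pointing at the first n_aliases users."""
    records = []
    for i in range(n_users):
        records.append("\n".join([
            f"dn: uid=user{i},ou=people,{suffix}",
            "objectClass: inetOrgPerson",
            f"uid: user{i}",
            f"cn: User {i}",
            f"sn: {i}",
        ]))
    for i in range(n_aliases):
        # an alias entry needs the alias objectClass plus aliasedObjectName
        records.append("\n".join([
            f"dn: uid=alias{i},ou=aliases,{suffix}",
            "objectClass: alias",
            "objectClass: extensibleObject",
            f"uid: alias{i}",
            f"aliasedObjectName: uid=user{i},ou=people,{suffix}",
        ]))
    return "\n\n".join(records) + "\n"
```

Dumping the output of make_test_dit() to a file and slapadd-ing it into both backends is the kind of setup the scripts mentioned below automate.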

In this issue I want to contribute two things:
- a means to demonstrate the problem, in the form of two scripts
(bash/python) which set up a large test DIT and start two slapds (one
HDB, one MDB) to allow easy comparison of their performance
- a patch to fix the problem

The patch is certainly not production-ready (or, if it is, it still
needs exhaustive testing to inspire confidence); however, I think it is
far enough along to at least discuss the approach.

Both the scripts and the patch, together with instructions on how to
use the former, can be found here:

https://github.com/hbo/openldap-mdb-deref-problem

I am looking forward to discussing the patch.
Comment 1 Quanah Gibson-Mount 2018-07-06 22:28:33 UTC
--On Friday, July 06, 2018 11:20 PM +0000 hbohnenkamp@united-internet.de 
wrote:

> Full_Name: Henrik Bohnenkamp
> Version: >= 2.4.44, HEAD
> OS: Ubuntu 18.04, Coreos 7.5
> URL: https://github.com/hbo/openldap-mdb-deref-problem
> Submission from: (NULL) (77.176.95.241)
>
>
> This is a followup to
> http://www.openldap.org/lists/openldap-technical/201805/msg00065.html

Note that this is duplicate of ITS#7657.

--Quanah



--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>


Comment 2 Quanah Gibson-Mount 2018-07-06 22:30:13 UTC
changed notes
Comment 3 Howard Chu 2018-07-09 17:28:50 UTC
quanah@symas.com wrote:
> --On Friday, July 06, 2018 11:20 PM +0000 hbohnenkamp@united-internet.de
> wrote:
> 
>> Full_Name: Henrik Bohnenkamp
>> Version: >= 2.4.44, HEAD
>> OS: Ubuntu 18.04, Coreos 7.5
>> URL: https://github.com/hbo/openldap-mdb-deref-problem
>> Submission from: (NULL) (77.176.95.241)
>>
>>
>> This is a followup to
>> http://www.openldap.org/lists/openldap-technical/201805/msg00065.html
> 
> Note that this is duplicate of ITS#7657.

We may have to consider this for 2.5; since it changes the format of the dn2id 
index it's not something we can easily put in at this late stage of 2.4.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 4 hbohnenkamp@united-internet.de 2019-01-25 13:51:49 UTC
Some months have passed but I finally found the time to set up a test
rig for regression testing of my patch under LIVE conditions.  I'll
describe the setup below.

The good news is, today I found the first bug in my patch (handling of
scope=one searches is incorrect). I say "good" news because it shows
that the testing setup is doing something right. 

In related news, it turns out that the implementation of alias
dereferencing in the hdb backend is apparently faulty, too... not
wrt performance, but purely from a functional point of view. We found
a search request which should return around 6700 entries, but only
delivers around 6150. My patched mdb implementation returns closer to
6700 entries, but for obvious reasons that might still be wrong as
well. I have not yet had the time to find out what goes wrong in the
hdb backend. I'll open a bug if I manage to reproduce it somehow
under lab conditions.

My test setup is like this:

I test against the productive LDAP content of my company, about 2.4
million entries, slapcat'ed from the LDAP master. All user password
hashes are replaced with a default hash so that the test tool can bind
successfully. The slapd configuration is essentially the same as for
our productive slaves (esp. the ACLs).

As test cases I use slapd logfiles of our productive clusters (approx.
24 million log lines (stats) per server per 24h), mostly binds and
searches. Those are directed against the hdb-configured slapd and a
patched-mdb-configured slapd (v2.4.47), and the results are compared:
whether binds succeed or fail, and the returned search results. I will
extend the tester to also include other operations, but this is what
is relevant for testing the patch, I guess.
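The comparison step described above can be sketched as follows (hypothetical Python; the actual test rig is not published, so function and key names here are invented):

```python
def compare_search_results(hdb_dns, mdb_dns):
    """Compare the DNs returned by the hdb- and mdb-backed servers for
    the same search request; report what each side failed to return."""
    hdb, mdb = set(hdb_dns), set(mdb_dns)
    return {
        "missing_from_mdb": sorted(hdb - mdb),  # hdb returned, mdb did not
        "missing_from_hdb": sorted(mdb - hdb),  # mdb returned, hdb did not
    }
```

An empty result on both keys for every replayed search is what a clean regression run would look like.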

In the taxonomy of test methods I would classify this as "advanced
monkey testing".

While writing the test scripts I wondered whether somebody else has
ever done something similar. There are of course tests in the tests
directory of the OpenLDAP sources, but are there any (regression)
tests (e.g. for OpenLDAP release candidates) which try to simulate
large productive setups?

-- 
Henrik Bohnenkamp


Comment 5 Quanah Gibson-Mount 2019-01-25 14:31:41 UTC
--On Friday, January 25, 2019 1:51 PM +0000 hbohnenkamp@united-internet.de 
wrote:

> While writing the test scripts I was wondering if somebody else did
> ever do something similar. There are of course tests in the test
> directory of the Openldap sources, but are there any (regression)
> tests made (e.g. for Openldap release candidates) which try to
> simulate large  productive setups?

Hi Henrik,

I've been expanding the regression test cases (See tests/data/regressions) 
and made it possible to run "make its" to trigger executing them prior to 
releases.  However for obvious reasons that clearly isn't going to include 
large datasets.

I've been quite interested in adding tests for large data sets that
are stored and executed outside of the regular set of OpenLDAP tests
and source tree, but so far I haven't had a specific case that
required such a thing.

So if this issue turns out to require testing against a large data set, it 
could be the first item for that work.

Warm regards,
Quanah




Comment 6 Henrik Bohnenkamp 2019-04-02 14:26:33 UTC
On Fri, Jan 25, 2019 at 06:31:41AM -0800, Quanah Gibson-Mount wrote:
> --On Friday, January 25, 2019 1:51 PM +0000 hbohnenkamp@united-internet.de
> wrote:
> 


Hi,

I have now done quite a lot of testing in the way described in my
email from January (followup 3). I used version 2.4.47, not the HEAD
of the OpenLDAP git master branch.

I have found one bug, the one mentioned already in the same email. I
have added a fix for that bug to the patch directory in
https://github.com/hbo/openldap-mdb-deref-problem

I have come across the regression with the back-hdb backend again,
which I also described in my previous email. I am now convinced that
this regression is really a problem with the back-hdb implementation,
and I can show it without using my patched implementation of back-mdb.
The demonstration requires a bit more work to be really presentable
and should in any case go into a different bug report.

Apart from these two issues I have not seen any regressions, and I am
now quite confident that my patch would work reliably in the
environment of my company... no guarantees for anything else :-)

I have seen that Quanah has suggested looking at ITS#8875 for the
2.4.48 release. Whether there will be a fix in 2.4.48, and whether it
will be based on my patch, is of course up to the lead developers (the
change in the DB schema is certainly a strong point against), but this
patch can perhaps serve as something to test the actual fix against.

Best regards
Henrik


--
Henrik Bohnenkamp

Comment 7 Howard Chu 2019-04-04 16:32:16 UTC
henrik.bohnenkamp@ionos.com wrote:
> On Fri, Jan 25, 2019 at 06:31:41AM -0800, Quanah Gibson-Mount wrote:
>> --On Friday, January 25, 2019 1:51 PM +0000 hbohnenkamp@united-internet.de
>> wrote:
>>

Hi, thanks for the report and investigation. I've briefly reviewed your
patches. Please squash your #3 into #2, there's no reason to preserve that.

We don't use "inline" in this code. If the compiler is smart enough it will
inline automatically anyway. Please drop that from your patch.

No need to keep the old mdb_idscope invocation commented out in patch #2.
Just delete it, we have the git history.

In patch #1 delete.c your patch line #145 is incorrect. You should be
using is_entry_alias(e) instead of (op->ora_e.) The entry in question is not
part of the op request, only its DN is in the Delete request.

Not sure why you dropped nsubs from the diskNode definition. Are you sure you
haven't broken the nsubs processing in this patch?




Comment 8 Henrik Bohnenkamp 2019-04-07 09:14:57 UTC
On Thu, Apr 04, 2019 at 05:32:16PM +0100, Howard Chu wrote:
Hi Howard,

thanks for your comments. There is now a new version of the patches
available in my GitHub repository. Apart from addressing your comments,
I have rebased the patches against the current master branch (quite a
lot of activity in the last two months, I have noticed). In particular,
the new function mdb_get_aliases now also uses the global size
variables for IDL dimensions, rather than the previous CPP constants.

As for your comments:


> 
> Hi, thanks for the report and investigation. I've briefly reviewed your
> patches. Please squash your #3 into #2, there's no reason to preserve that.

done 

> 
> We don't use "inline" in this code. If the compiler is smart enough it will
> inline automatically anyway. Please drop that from your patch.

done

> 
> No need to keep the old mdb_idscope invocation commented out in patch #2.
> Just delete it, we have the git history.

done 

> 
> In patch #1 delete.c your patch line #145 is incorrect. You should be
> using is_entry_alias(e) instead of (op->ora_e.) The entry in question is not
> part of the op request, only its DN is in the Delete request.

Ah, that was a mistake when squashing my commits down to the two
patches: patch 1 introduced the faulty line with op->ora_e, and patch 2
changed op->ora_e to e. I have cleaned that up.

> 
> Not sure why you dropped nsubs from the diskNode definition. Are you sure you
> haven't broken the nsubs processing in this patch?

Note that the nsubs declaration was always commented out and thus had
only a decorative function. Anyway, I have added the comment back to
reduce the footprint of the patches.


The current master branch with the two patches builds without
problems. I have run the mdb tests in the tests directory; the last
test was skipped due to missing software. All other 74 tests passed
(Ubuntu 18.04).

Best regards
Henrik
--
Henrik Bohnenkamp

Comment 9 OpenLDAP project 2019-04-18 01:54:23 UTC
has patch
See also ITS#7657
Comment 10 Quanah Gibson-Mount 2019-04-18 01:54:23 UTC
changed notes
Comment 11 Quanah Gibson-Mount 2019-06-28 05:34:19 UTC
--On Sunday, April 07, 2019 10:15 AM +0000 henrik.bohnenkamp@ionos.com 
wrote:

> On Thu, Apr 04, 2019 at 05:32:16PM +0100, Howard Chu wrote:
> Hi Howard,
>
> thanks for your comments. There is now a new version of patches available
> in my  github repository. Apart from addressing your comments, I have
> rebased the patches against  the current master branch (quite a lot
> activity in the last two months, I have noticed).  In particular, the new
> function mdb_get_aliases  now also uses the global size variables for IDL
> dimensions, rather than the previous CPP constants.

Had a customer who was hitting this issue try out these patches -- It 
greatly decreases the search time (from unknown/infinite to 1 minute). 
Unfortunately slapd then segv's.  Working on getting a test database to 
reproduce the issue with for a good backtrace.

--Quanah




Comment 12 Henrik Bohnenkamp 2019-06-28 06:37:10 UTC
On Thu, Jun 27, 2019 at 10:34:19PM -0700, Quanah Gibson-Mount wrote:
> --On Sunday, April 07, 2019 10:15 AM +0000 henrik.bohnenkamp@ionos.com
> wrote:
> 
> > On Thu, Apr 04, 2019 at 05:32:16PM +0100, Howard Chu wrote:
> 
> Had a customer who was hitting this issue try out these patches -- It
> greatly decreases the search time (from unknown/infinite to 1 minute).
> Unfortunately slapd then segv's.  Working on getting a test database to
> reproduce the issue with for a good backtrace.

Oh, that's too bad. I'd be happy to participate in the bug hunt. May I
ask which version your customer used? I have modified the patch to
work with the recent changes on master, but I did not dig in very
deeply to understand those changes. So I must admit that these
modifications to the patch are completely untested (apart from the
tests in the tests subdir).

Things might work better with the 2.4.46/47 patch, if that is an
option for your customer.

Wrt getting a test tree, maybe the Python script I have in the GitHub
repo for the patch can provide one:
https://github.com/hbo/openldap-mdb-deref-problem/blob/master/scripts/write_tree.py

I'm of course very interested in keeping this patch alive, so if you
have details on the segv, please let me know.

Henrik



-- 
Dr. rer. nat. Henrik Bohnenkamp

Senior Sysadmin
IT Operations Monitoring & Infrastructure


Comment 13 Howard Chu 2019-07-15 13:26:59 UTC

Fyi, on our problematic test database with 11M entries and 3.7M
aliases, a search with -a always, starting from the DB suffix, took 4
minutes without this patch and 1235 minutes with this patch.

Needless to say, that's not looking good. Still checking other test cases.


Comment 14 Henrik Bohnenkamp 2019-07-15 15:41:10 UTC
On Mon, Jul 15, 2019 at 02:26:59PM +0100, Howard Chu wrote:
> 
> Fyi, on our problematic test database with 11M entries and 3.7M aliases, a search with -a always , starting from the
> DB suffix, took 4 minutes without this patch, and 1235 minutes with this patch.
> 
> Needless to say, that's not looking good. Still checking other test cases.

Interesting, so the behavior is reversed now :-). I assume you have
found an alternative approach to solving the problem. That's fine with
me; I want the problem solved, not my patch integrated. I'm of course
interested in how you do it. Surely you did not get the 4 minutes with
a stock 2.4.48 slapd?

Comment 15 Howard Chu 2019-07-15 15:53:23 UTC
Henrik Bohnenkamp wrote:
> On Mon, Jul 15, 2019 at 02:26:59PM +0100, Howard Chu wrote:
>>
>> Fyi, on our problematic test database with 11M entries and 3.7M aliases, a search with -a always , starting from the
>> DB suffix, took 4 minutes without this patch, and 1235 minutes with this patch.
>>
>> Needless to say, that's not looking good. Still checking other test cases.
> 
> Interesting, so the behavior is reversed now :-). I assume you have
> found an alternative approach to solve the problem. That's fine with
> me, I want the problem solved, not my patch integrated.  I'm of course
> interested in how you do it. Surely you did not get the 4 minutes with
> a stock 2.4.48 slapd?

For this size of DB we needed the ITS#8977 patches to accommodate
larger IDLs (I used 24 bits for IDLs, i.e. 16.7M slots). Also at this
size, the IDL processing itself is now the main bottleneck. We would
need to switch to bitmaps or trees to avoid it, but that's also a much
larger change than we can consider for this release.
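The slot count is just a power of two; a tiny sketch of the sizing arithmetic (the default IDL exponent of 16 is an assumption here and should be checked against back-mdb's idl.h):

```python
def idl_slots(logn):
    """Number of entry-ID slots in an IDL dimensioned as 2^logn."""
    return 1 << logn

# 24-bit IDLs as in the ITS#8977 test above:
print(idl_slots(24))  # 16777216, i.e. the "16.7M slots"
```

With the assumed default of 16 bits an IDL holds only 65,536 IDs, far fewer than the 3.7M aliases in the test database.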


Comment 16 Howard Chu 2019-07-15 18:44:05 UTC
I've set up a more modest test database along the lines of ITS#7657. It has 500,000 users,
30,000 aliases total, and 435 in ou=alias2 (all the rest under ou=alias1).

For unpatched back-mdb:

time ../clients/tools/ldapsearch -x -H ldap://:9012 -D cn=manager,dc=example,dc=com -w secret -b ou=alias1,dc=example,dc=com -a always
# search result
search: 2
result: 0 Success

# numResponses: 29567
# numEntries: 29566

real    0m42.504s
user    0m1.344s
sys     0m2.996s

time ../clients/tools/ldapsearch -x -H ldap://:9012 -D cn=manager,dc=example,dc=com -w secret -b ou=alias2,dc=example,dc=com -a always
# search result
search: 2
result: 0 Success

# numResponses: 437
# numEntries: 436

real    0m48.406s
user    0m0.040s
sys     0m0.076s

For back-mdb with e90e8c7d3c12d897bb0584ba04dc519d4f23acf9

time ../clients/tools/ldapsearch -x -H ldap://:9012 -D cn=manager,dc=example,dc=com -w secret -b ou=alias1,dc=example,dc=com -a always
# search result
search: 2
result: 0 Success

# numResponses: 29567
# numEntries: 29566

real    0m5.500s
user    0m1.516s
sys     0m2.944s

time ../clients/tools/ldapsearch -x -H ldap://:9012 -D cn=manager,dc=example,dc=com -w secret -b ou=alias2,dc=example,dc=com -a always
# search result
search: 2
result: 0 Success

# numResponses: 437
# numEntries: 436

real    0m0.399s
user    0m0.048s
sys     0m0.060s

For back-mdb with this ITS#8875 patch

time ../clients/tools/ldapsearch -x -H ldap://:9012 -D cn=manager,dc=example,dc=com -w secret -b ou=alias1,dc=example,dc=com -a always
# search result
search: 2
result: 0 Success

# numResponses: 29567
# numEntries: 29566

real    0m6.020s
user    0m1.640s
sys     0m3.372s

time ../clients/tools/ldapsearch -x -H ldap://:9012 -D cn=manager,dc=example,dc=com -w secret -b ou=alias2,dc=example,dc=com -a always
# search result
search: 2
result: 0 Success

# numResponses: 437
# numEntries: 436

real    0m0.203s
user    0m0.052s
sys     0m0.048s

It seems close enough in this case (I didn't do enough repeated runs
to average out any measurement error), while the committed patch
performs better on the really ugly test case.

The tool to generate the test LDIF is attached. It reads an LDIF
containing 500,000 users on stdin and outputs the same LDIF, with
aliases interspersed, on stdout.
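The attached C source is not reproduced here, but the described filter behavior can be approximated with a hypothetical Python sketch (the alias ratio and base DN below are invented for illustration, not taken from mkalias.c):

```python
def intersperse_aliases(ldif_text, every=17,
                        alias_base="ou=alias1,dc=example,dc=com"):
    """Copy LDIF records through, emitting an alias entry after every
    `every`-th record, pointing at the record just emitted (a toy
    stand-in for the attached mkalias.c)."""
    out, n = [], 0
    for record in ldif_text.strip().split("\n\n"):
        out.append(record)
        n += 1
        if n % every == 0:
            # the alias targets the entry just written, matching the
            # ordering tweak described in the next comment
            dn = next(line for line in record.splitlines()
                      if line.startswith("dn: "))[4:]
            out.append("\n".join([
                f"dn: uid=a{n},{alias_base}",
                "objectClass: alias",
                "objectClass: extensibleObject",
                f"uid: a{n}",
                f"aliasedObjectName: {dn}",
            ]))
    return "\n\n".join(out) + "\n"
```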

Comment 17 Howard Chu 2019-07-15 20:12:14 UTC
hyc@symas.com wrote:
> The tool to generate the test LDIF is attached. It reads an LDIF containing 500,000 users on stdin, and outputs the same LDIF,
> with aliases interspersed, on stdout.
> 
Slightly tweaked: it creates the alias after the target entry, in case
the server does referential-integrity checks on loading.

Comment 18 Henrik Bohnenkamp 2019-07-18 17:04:02 UTC
On Mon, Jul 15, 2019 at 07:44:05PM +0100, Howard Chu wrote:
> Howard Chu wrote:
> > Henrik Bohnenkamp wrote:
> >> On Mon, Jul 15, 2019 at 02:26:59PM +0100, Howard Chu wrote:
> >>>
> >>> Fyi, on our problematic test database with 11M entries and 3.7M aliases, a search with -a always , starting from the
> >>> DB suffix, took 4 minutes without this patch, and 1235 minutes with this patch.
> >>>
> >>> Needless to say, that's not looking good. Still checking other test cases.


I am starting to understand why this is the case. I believe that our
two approaches to solving the problem each have their nemesis... I have
constructed a test case where the patched slapd needs 2 minutes (1
minute on a repeat) for a search for people objects, but the master
branch has yet to finish after 30 minutes (and might not finish today).

The test case is modeled on what is done in my company: group objects
have not only member entries for people, but each member also has an
alias to the corresponding person object, these aliases being children
of the respective group object. Furthermore, subgroup/group inclusion
is modeled by adding aliases (also as children) to the group object,
pointing to the group objects representing the included groups. So
there is some recursive descent to get all person objects of all
subgroups. In total I have 5 million entries, 3 million of which are
aliases. There are no cycles, but in this test case the depth of the
recursion is absolutely unrealistic.

I believe that in the master slapd, each step down in the recursion
needs a call to mdb_idscope, which in turn goes through all 3 million
aliases to compute the intersection with the scope of the group object
(that's what I remember, but I have not looked at the algorithm for a
year now).

The patch, in this case, traverses just very small subtrees (although
many of them) to collect the relevant aliases and continues from
there. That seems to be much more efficient here.

If I understand your description above correctly, my patch sucks with
your test case because it does a full traversal of the LDAP tree to
collect all aliases, and with 11 million entries that takes a while.
The master branch, however, computes the intersection with the scope
just once and continues from there, which must be much faster.

I actually do not know what to conclude from all this. I wonder,
however, if a combination of both approaches could be beneficial. With
the patch, the diskNode of an entry contains not only the number of
children in the scope, but also the number of aliases. If the total
number of aliases in the database is large but the search scope is
small, the tree traversal might be faster. If the number of entries in
the scope is big and there are aliases in the search scope, the
intersection in mdb_idscope might be more efficient. If there are no
aliases in the search scope, nothing has to be done at all.
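That combined heuristic could be sketched roughly like this (purely hypothetical; the per-subtree counters would come from the extended diskNode, and the threshold comparison is invented for illustration):

```python
def choose_deref_strategy(aliases_in_scope, entries_in_scope, total_aliases):
    """Hypothetical dispatch between the two approaches discussed above,
    based on per-subtree counters the patch already maintains."""
    if aliases_in_scope == 0:
        return "skip"               # no aliases in scope: nothing to do
    if entries_in_scope < total_aliases:
        return "traverse-subtree"   # the patch's traversal wins for small scopes
    return "idscope-intersection"   # master's intersection wins for big scopes
```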