Full_Name: Ryan Tandy
Version: 2.4.46
OS: Debian
URL: ftp://ftp.openldap.org/incoming/20180511_rtandy_syncrepl-memory-consumer.tgz
Submission from: (NULL) (70.66.128.207)
Submitted by: ryan

When running object-based syncrepl and making changes to groups, the provider slapd uses more and more memory, apparently without bound. We've discussed this issue before, but there was no ITS tracking it specifically.

Original Debian bug: https://bugs.debian.org/725091

A possibly related openldap-technical post:
https://www.openldap.org/lists/openldap-technical/201503/msg00206.html

Reproducer: ftp://ftp.openldap.org/incoming/20180511_rtandy_syncrepl-memory-consumer.tgz

./prepare
./runslapd  (backgrounds a provider slapd and a consumer slapd)
./modify    (makes a number of modifications on the provider)
./clean     (kills both slapds and cleans databases)

Run top in another terminal and watch the memory growth. On my system, the provider grows to over 3 GB resident and does not shrink even after replication completes. With delta-syncrepl enabled, the provider's RSS is only around 10 MB.

Reproduced on Debian unstable with 2.4.46, with both glibc malloc (glibc 2.27-3) and tcmalloc_minimal (2.7-1).
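For watching the growth without keeping top open, a small helper along these lines can sample the provider's resident set from /proc (a sketch, Linux-specific; the PID and polling interval are placeholders, not part of the reproducer tarball):

```python
import re
import time

def parse_vmrss(status_text):
    """Extract the VmRSS value (KiB) from the text of /proc/<pid>/status."""
    m = re.search(r"^VmRSS:\s*(\d+)\s*kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None

def rss_kib(pid):
    """Return the resident set size of `pid` in KiB (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss(f.read())

def watch(pid, interval=5):
    """Print the process's RSS every `interval` seconds until interrupted."""
    while True:
        print(f"RSS: {rss_kib(pid)} KiB")
        time.sleep(interval)
```

Pointing `watch()` at the provider slapd's PID while ./modify runs should show the same unbounded growth that top reports.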
bisect identifies c365ac359e9c9b483b934c2a1f0bc552645c32fa as the commit that introduced this behaviour.

003dfbda574f37bbf1a2240f530ff9fa35ab0801 on RE24 (2.4.20)

commit c365ac359e9c9b483b934c2a1f0bc552645c32fa
Author: Howard Chu <hyc@openldap.org>
Date:   Sun Nov 22 04:42:00 2009 +0000

    ITS#6368 use dup'd entries in response queue
ryan@nardis.ca wrote:
> bisect identifies c365ac359e9c9b483b934c2a1f0bc552645c32fa as the commit
> that introduced this behaviour.
>
> 003dfbda574f37bbf1a2240f530ff9fa35ab0801 on RE24 (2.4.20)
>
> commit c365ac359e9c9b483b934c2a1f0bc552645c32fa
> Author: Howard Chu <hyc@openldap.org>
> Date:   Sun Nov 22 04:42:00 2009 +0000
>
>     ITS#6368 use dup'd entries in response queue

I've run your reproducer and see no memory leak. The response queue will indeed grow without bound if the consumer runs slower than the provider and doesn't read responses fast enough, but in the case of this test script the client eventually finishes and the consumer catches up. The provider's process size may not decrease, but that just means the malloc implementation isn't returning freed memory to the kernel - it's not a leak.

This can be verified using mleak, using SIGPROF to snapshot the provider's memory usage. The simplest way to force the memory use to grow is to first suspend the consumer with SIGSTOP, then let the modify client run to completion. mleak / SIGPROF will show a large amount of memory in use. Resume the consumer with SIGCONT, let it run to completion, and then check the provider with SIGPROF again - all of the response queue memory is freed.

So, conclusively, there is no actual leak. But there is a problem with sustained client modifications when the consumer is too slow. Our options here are to configure a size limit on the response queue, then either hang the client when the limit is hit or return LDAP_BUSY to the client. Neither of these is a very attractive option. Doing batched commits would speed up the consumer, but that feature is only in 2.5.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
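The suspend/resume experiment described above can be scripted. A rough sketch follows; the PIDs are placeholders, and the `send`/`wait` parameters are only there so the sequence of steps can be exercised without live slapds (by default they are os.kill and time.sleep):

```python
import os
import signal
import time

def snapshot_heap(provider_pid, send=os.kill):
    # A slapd built with mleak dumps its current memory usage on
    # SIGPROF, so before/after snapshots can be compared.
    send(provider_pid, signal.SIGPROF)

def run_experiment(provider_pid, consumer_pid, run_modify,
                   send=os.kill, wait=time.sleep):
    # 1. Suspend the consumer so the provider's response queue can only grow.
    send(consumer_pid, signal.SIGSTOP)
    # 2. Drive modifications against the provider (e.g. the ./modify script).
    run_modify()
    # 3. Snapshot: expect a large amount of memory in use (queued responses).
    snapshot_heap(provider_pid, send)
    # 4. Resume the consumer so it can drain the queue.
    send(consumer_pid, signal.SIGCONT)
    wait(60)  # placeholder; really: wait until the consumer has caught up
    # 5. Snapshot again: the response queue memory should all be freed.
    snapshot_heap(provider_pid, send)
```

The point of the injected `send` is just testability of the step ordering; in real use the defaults apply and the PIDs come from the running slapds.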
(In reply to Howard Chu from comment #2)
> Doing batched commits will speed up the consumer, but that feature is only
> in 2.5.

Batched commits for replication were reverted in 79ced664b8597c8c08afcb9d1fd48ca4201fe5f7 and 12dbcc0eb3fd534ba02e3c8ed8fb1e55c964d6af due to issues uncovered in ITS#8752.
Similarly, when I used AWS it was necessary to provision the consumers at 4k IOPS while the providers ran at 3k IOPS. That is, consumers generally need to be faster than providers when processing large sequences of write updates.
It may be possible to improve the diff code for standard syncrepl to improve performance on the consumer side when the attribute is sorted via sortvals; needs investigation.
attr_cmp should check whether the attribute is subject to sortvals and, if so, diff without falling back to a double loop.
Making attr_cmp do a linear sweep for sortvals attributes (instead of the quadratic match it has to do right now) makes the consumer only 7-8x slower than the provider across the board in the environment provided. I might have expected something closer to 3-4x, but that's out of scope for this particular ITS.
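The linear sweep in question can be illustrated in miniature. This is a hedged sketch in Python rather than slapd's C, and `old`/`new` merely stand in for the two sorted value arrays a sortvals attribute maintains:

```python
def diff_sorted_values(old, new):
    """Single merge-style pass over two sorted value lists, returning
    (deleted, added).  O(m+n) comparisons, versus the O(m*n) double
    loop needed when the value lists are unsorted."""
    deleted, added = [], []
    i = j = 0
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            i += 1              # value unchanged, skip in both lists
            j += 1
        elif old[i] < new[j]:
            deleted.append(old[i])   # present before, gone now
            i += 1
        else:
            added.append(new[j])     # newly introduced value
            j += 1
    deleted.extend(old[i:])     # trailing values only in the old list
    added.extend(new[j:])       # trailing values only in the new list
    return deleted, added
```

The sweep relies entirely on both lists being kept in sorted order, which is exactly what sortvals guarantees; without that invariant, each old value would have to be searched for among all new values.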
For comparison, using deltasync (and sortvals!) makes the consumer take a similar amount of CPU time (about 50-90% more than the provider's) to process the 10k value additions, just like Ryan noted earlier. As for the other idea, no clue on whether we can somehow limit the amount of data queued up without severely impairing replication progress.
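As a strawman for limiting the queued data, the two options Howard mentioned earlier (hang the client at the limit, or return LDAP_BUSY) map naturally onto a bounded queue. A toy sketch, not slapd code; the limit value and class names are invented for illustration:

```python
import queue

class BusyError(Exception):
    """Stand-in for returning LDAP_BUSY to the modifying client."""

class ResponseQueue:
    def __init__(self, limit, block_writer=True):
        self._q = queue.Queue(maxsize=limit)
        self._block_writer = block_writer

    def push(self, response):
        if self._block_writer:
            # Option 1: hang the writer until the consumer drains the queue.
            self._q.put(response)
        else:
            # Option 2: refuse immediately, analogous to LDAP_BUSY.
            try:
                self._q.put_nowait(response)
            except queue.Full:
                raise BusyError("response queue full")

    def pop(self):
        # The consumer side: drain one queued response.
        return self._q.get()
```

Either policy caps the provider's memory at roughly limit x entry size, at the cost of stalling or failing writers whenever the consumer lags, which is exactly the "severely impairing replication progress" concern.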
• 8986f99d by Ondřej Kuzník at 2023-11-14T18:09:10+00:00

  ITS#8852 Optimise attr_cmp for sortval attributes