Issue 9146 - incomplete initialization of sessionlog structure may cause session log to grow indefinitely
Summary: incomplete initialization of sessionlog structure may cause session log to gr...
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: 2.4.48
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-01-08 13:09 UTC by maxime.besson@worteks.com
Modified: 2020-04-08 07:26 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description maxime.besson@worteks.com 2020-01-08 13:09:08 UTC
Full_Name: Maxime Besson
Version: 2.4.48
OS: Amazon Linux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (92.184.104.114)


During a cloud provider migration, I have migrated an OpenLDAP instance using
MDB + Syncrepl (in mirror mode, with session log) from Ubuntu to Amazon Linux.

When starting a server with no data, syncrepl kicks in and starts replicating
the other server's database (12G, 7M entries).

In Ubuntu this phase takes about 25 minutes (with dbnosync).  In Amazon Linux,
with the exact same cn=config and daemon options, it took 10 days on first try,
at 100% CPU consumption the whole time.

I observed that the start of the import is pretty fast (data.mdb grows by
10M/s) but import speed slows down rapidly to a crawl. 

After some investigation, I discovered that disabling the sessionlog brings the
import times to normal.

Digging further, I found that most of the CPU time is spent in
syncprov_add_slog, and particularly this line:


            /* Keep the list in csn order. */
            ...
                for ( sep = &sl->sl_head; *sep; sep = &(*sep)->se_next ) {
>>>>>>>             if ( ber_bvcmp( &se->se_csn, &(*sep)->se_csn ) < 0 ) {
                        se->se_next = *sep;
                        *sep = se;
                        break;
                    }
                }

The sessionlog appears to be growing endlessly:

  (gdb) print sl->sl_num
  $1 = 681896
  (gdb) print sl->sl_size
  $3 = 100

Digging around the source, it seems that the session log is supposed to be
cleaned up at the end of syncprov_add_slog, if it's not playing:

    if (!sl->sl_playing) {
        while ( sl->sl_num > sl->sl_size ) {
            ...

However:

  (gdb) print sl->sl_playing
  $4 = 543388517

This looked like an uninitialized value to me, and indeed, sp_cf_gen
doesn't seem to initialize this field:

        if ( !sl ) {
            sl = ch_malloc( sizeof( sessionlog ));
            sl->sl_mincsn = NULL;
            sl->sl_sids = NULL;
            sl->sl_num = 0;
            sl->sl_numcsns = 0;
            sl->sl_head = sl->sl_tail = NULL;
            ldap_pvt_thread_mutex_init( &sl->sl_mutex );
            si->si_logs = sl;
        }


I tried the following patch:

--- openldap-2.4.48.orig/servers/slapd/overlays/syncprov.c      2019-07-23
16:46:22.000000000 +0200
+++ openldap-2.4.48/servers/slapd/overlays/syncprov.c   2020-01-08
11:33:16.770110282 +0100
@@ -3082,6 +3082,7 @@
                        sl = ch_malloc( sizeof( sessionlog ));
                        sl->sl_mincsn = NULL;
                        sl->sl_sids = NULL;
+                       sl->sl_playing = 0;
                        sl->sl_num = 0;
                        sl->sl_numcsns = 0;
                        sl->sl_head = sl->sl_tail = NULL;


And it got rid of the issue. My guess is that most of the time, sl_playing
happens to be 0, but on Amazon Linux, for some reason (patched glibc?), the
sessionlog lands in the wrong chunk of uninitilized memory

Additional info:

# rpm -q glibc
glibc-2.26-32.amzn2.0.2.x86_64

# uname -a
Linux 4.14.152-127.182.amzn2.x86_64 #1 SMP Thu Nov 14 17:32:43 UTC 2019 x86_64
x86_64 x86_64 GNU/Linux

# /usr/local/openldap/libexec/slapd -V
@(#) $OpenLDAP: slapd 2.4.48 (Jan  8 2020 11:39:38) $

(LTB build)
 
  
Comment 1 Howard Chu 2020-01-08 22:23:02 UTC
maxime.besson@worteks.com wrote:
> Full_Name: Maxime Besson
> Version: 2.4.48
> OS: Amazon Linux
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (92.184.104.114)

Thanks for the report, fixed now in git master.
> 
> 
> During a cloud provider migration, I have migrated an OpenLDAP instance using
> MDB + Syncrepl (in mirror mode, with session log) from Ubuntu to Amazon Linux.
> 
> When starting a server with no data, syncrepl kicks in and starts replicating
> the other server's database (12G, 7M entries).
> 
> In Ubuntu this phase takes about 25 minutes (with dbnosync).  In Amazon Linux,
> with the exact same cn=config and daemon options, it took 10 days on first try,
> at 100% CPU consumption the whole time.
> 
> I observed that the start of the import is pretty fast (data.mdb grows by
> 10M/s) but import speed slows down rapidly to a crawl. 
> 
> After some investigation, I discovered that disabling the sessionlog brings the
> import times to normal.
> 
> Digging further, I found that most of the CPU time is spent in
> syncprov_add_slog, and particularly this line:
> 
> 
>             /* Keep the list in csn order. */
>             ...
>                 for ( sep = &sl->sl_head; *sep; sep = &(*sep)->se_next ) {
>>>>>>>>             if ( ber_bvcmp( &se->se_csn, &(*sep)->se_csn ) < 0 ) {
>                         se->se_next = *sep;
>                         *sep = se;
>                         break;
>                     }
>                 }
> 
> The sessionlog appears to be growing endlessly:
> 
>   (gdb) print sl->sl_num
>   $1 = 681896
>   (gdb) print sl->sl_size
>   $3 = 100
> 
> Digging around the source, it seems that the session log is supposed to be
> cleaned up at the end of syncprov_add_slog, if it's not playing:
> 
>     if (!sl->sl_playing) {
>         while ( sl->sl_num > sl->sl_size ) {
>             ...
> 
> However:
> 
>   (gdb) print sl->sl_playing
>   $4 = 543388517
> 
> This looked like an uninitialized value to me, and indeed, sp_cf_gen
> doesn't seem to initialize this field:
> 
>         if ( !sl ) {
>             sl = ch_malloc( sizeof( sessionlog ));
>             sl->sl_mincsn = NULL;
>             sl->sl_sids = NULL;
>             sl->sl_num = 0;
>             sl->sl_numcsns = 0;
>             sl->sl_head = sl->sl_tail = NULL;
>             ldap_pvt_thread_mutex_init( &sl->sl_mutex );
>             si->si_logs = sl;
>         }
> 
> 
> I tried the following patch:
> 
> --- openldap-2.4.48.orig/servers/slapd/overlays/syncprov.c      2019-07-23
> 16:46:22.000000000 +0200
> +++ openldap-2.4.48/servers/slapd/overlays/syncprov.c   2020-01-08
> 11:33:16.770110282 +0100
> @@ -3082,6 +3082,7 @@
>                         sl = ch_malloc( sizeof( sessionlog ));
>                         sl->sl_mincsn = NULL;
>                         sl->sl_sids = NULL;
> +                       sl->sl_playing = 0;
>                         sl->sl_num = 0;
>                         sl->sl_numcsns = 0;
>                         sl->sl_head = sl->sl_tail = NULL;
> 
> 
> And it got rid of the issue. My guess is that most of the time, sl_playing
> happens to be 0, but on Amazon Linux, for some reason (patched glibc?), the
> sessionlog lands in the wrong chunk of uninitilized memory
> 
> Additional info:
> 
> # rpm -q glibc
> glibc-2.26-32.amzn2.0.2.x86_64
> 
> # uname -a
> Linux 4.14.152-127.182.amzn2.x86_64 #1 SMP Thu Nov 14 17:32:43 UTC 2019 x86_64
> x86_64 x86_64 GNU/Linux
> 
> # /usr/local/openldap/libexec/slapd -V
> @(#) $OpenLDAP: slapd 2.4.48 (Jan  8 2020 11:39:38) $
> 
> (LTB build)
>  
>   
> 
> 
> 


-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 Quanah Gibson-Mount 2020-01-09 17:13:07 UTC
changed notes
changed state Open to Release
moved from Incoming to Software Bugs
Comment 3 OpenLDAP project 2020-01-30 18:33:54 UTC
Fixed in master (57dbe995b8f7c80421acf82ad187a4f281e9a4d5)
Fixed in RE24 (2.4.49)
Comment 4 Quanah Gibson-Mount 2020-01-30 18:33:54 UTC
changed notes
changed state Release to Closed