
(ITS#9146) incomplete initialization of sessionlog structure may cause session log to grow indefinitely



Full_Name: Maxime Besson
Version: 2.4.48
OS: Amazon Linux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (92.184.104.114)


During a cloud provider migration, I have migrated an OpenLDAP instance using
MDB + Syncrepl (in mirror mode, with session log) from Ubuntu to Amazon Linux.

When starting a server with no data, syncrepl kicks in and starts replicating
the other server's database (12G, 7M entries).

On Ubuntu this phase takes about 25 minutes (with dbnosync).  On Amazon Linux,
with the exact same cn=config and daemon options, it took 10 days on the first
try, at 100% CPU consumption the whole time.

I observed that the start of the import is pretty fast (data.mdb grows by
10M/s) but import speed slows down rapidly to a crawl. 

After some investigation, I discovered that disabling the sessionlog brings the
import times to normal.

Digging further, I found that most of the CPU time is spent in
syncprov_add_slog, and particularly this line:


            /* Keep the list in csn order. */
            ...
                for ( sep = &sl->sl_head; *sep; sep = &(*sep)->se_next ) {
>>>>>>>             if ( ber_bvcmp( &se->se_csn, &(*sep)->se_csn ) < 0 ) {
                        se->se_next = *sep;
                        *sep = se;
                        break;
                    }
                }
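
To illustrate why this loop gets expensive on a long list, here is a minimal
sketch of a CSN-ordered insert into a singly linked list. It is not the actual
syncprov code: `toy_entry`, `toy_list` and `toy_insert` are hypothetical,
simplified stand-ins, and plain `strcmp` replaces `ber_bvcmp`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a session log entry. */
typedef struct toy_entry {
    const char *csn;            /* CSN as a NUL-terminated string */
    struct toy_entry *next;
} toy_entry;

/* Hypothetical, simplified stand-in for the session log list. */
typedef struct toy_list {
    toy_entry *head;
    toy_entry *tail;
    int num;
} toy_list;

/*
 * Insert e so that the list stays sorted by csn.
 * Returns the number of comparisons made, which is at most the
 * current list length: the cost of one insert grows with the list.
 */
int toy_insert(toy_list *l, toy_entry *e)
{
    toy_entry **ep;
    int cmps = 0;

    e->next = NULL;
    for (ep = &l->head; *ep; ep = &(*ep)->next) {
        cmps++;
        if (strcmp(e->csn, (*ep)->csn) < 0) {
            /* Found the first entry with a larger csn: link in before it. */
            e->next = *ep;
            *ep = e;
            break;
        }
    }
    if (*ep == NULL) {
        /* Walked off the end without inserting: append as the new tail. */
        *ep = e;
        l->tail = e;
    }
    l->num++;
    return cmps;
}
```

Each insert that has to walk the list costs time proportional to the list
length, so a list that is never trimmed makes every such insert slower than
the previous one.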

The sessionlog appears to be growing endlessly:

  (gdb) print sl->sl_num
  $1 = 681896
  (gdb) print sl->sl_size
  $3 = 100

Digging around the source, it seems that the session log is supposed to be
trimmed back to sl_size at the end of syncprov_add_slog, but only if the log
is not currently being played back (sl_playing is zero):

    if (!sl->sl_playing) {
        while ( sl->sl_num > sl->sl_size ) {
            ...

However:

  (gdb) print sl->sl_playing
  $4 = 543388517
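
The effect of a non-zero flag on the trimming step can be sketched as follows.
This is a toy model under my own assumptions, not the syncprov code: `toy_log`
and `toy_add` are hypothetical names, and the entries themselves are reduced to
a counter.

```c
/* Hypothetical, simplified model of the session log counters. */
typedef struct toy_log {
    int num;      /* current number of entries */
    int size;     /* configured maximum number of entries */
    int playing;  /* non-zero while the log is being played back */
} toy_log;

/*
 * Add one entry, then trim the log back to its configured size,
 * but only when the playing flag is zero.
 * Returns the number of entries trimmed by this call.
 */
int toy_add(toy_log *l)
{
    int trimmed = 0;

    l->num++;
    if (!l->playing) {
        while (l->num > l->size) {
            l->num--;
            trimmed++;
        }
    }
    return trimmed;
}
```

With `playing` set to 0 the counter never exceeds `size`; with any non-zero
value, such as the 543388517 seen in gdb, the trimming loop is skipped on every
call and `num` just keeps growing.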

This looked like an uninitialized value to me, and indeed, sp_cf_gen
doesn't seem to initialize this field:

        if ( !sl ) {
            sl = ch_malloc( sizeof( sessionlog ));
            sl->sl_mincsn = NULL;
            sl->sl_sids = NULL;
            sl->sl_num = 0;
            sl->sl_numcsns = 0;
            sl->sl_head = sl->sl_tail = NULL;
            ldap_pvt_thread_mutex_init( &sl->sl_mutex );
            si->si_logs = sl;
        }
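
As a general illustration of how this class of bug can be avoided, here is a
minimal sketch that zero-fills the structure at allocation time instead of
assigning each field by hand. It uses the standard `calloc` and a hypothetical
`toy_sessionlog` type whose fields only loosely mirror the real structure; it
is not a proposed change to syncprov.c.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the sessionlog structure. */
typedef struct toy_sessionlog {
    void *mincsn;
    int  *sids;
    int   numcsns;
    int   num;
    int   size;
    int   playing;
    void *head;
    void *tail;
} toy_sessionlog;

/*
 * Allocate a session log with every byte set to zero, so a field that
 * is added later cannot start out holding whatever was left on the heap.
 * Note: all-bits-zero is a null pointer on mainstream platforms, but the
 * C standard does not strictly guarantee that.
 * Returns NULL if the allocation fails.
 */
toy_sessionlog *toy_sessionlog_new(int size)
{
    toy_sessionlog *sl = calloc(1, sizeof(*sl));

    if (sl == NULL)
        return NULL;
    sl->size = size;   /* the only field that needs a non-zero value */
    return sl;
}
```

The trade-off is that explicit per-field assignment documents intent, while
zero-filling is robust against a forgotten field; the two can also be combined.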


I tried the following patch:

--- openldap-2.4.48.orig/servers/slapd/overlays/syncprov.c      2019-07-23 16:46:22.000000000 +0200
+++ openldap-2.4.48/servers/slapd/overlays/syncprov.c   2020-01-08 11:33:16.770110282 +0100
@@ -3082,6 +3082,7 @@
                        sl = ch_malloc( sizeof( sessionlog ));
                        sl->sl_mincsn = NULL;
                        sl->sl_sids = NULL;
+                       sl->sl_playing = 0;
                        sl->sl_num = 0;
                        sl->sl_numcsns = 0;
                        sl->sl_head = sl->sl_tail = NULL;


This got rid of the issue. My guess is that most of the time sl_playing
happens to be 0, but on Amazon Linux, for some reason (patched glibc?), the
sessionlog lands in a chunk of uninitialized memory that holds a non-zero
value.

Additional info:

# rpm -q glibc
glibc-2.26-32.amzn2.0.2.x86_64

# uname -a
Linux 4.14.152-127.182.amzn2.x86_64 #1 SMP Thu Nov 14 17:32:43 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

# /usr/local/openldap/libexec/slapd -V
@(#) $OpenLDAP: slapd 2.4.48 (Jan  8 2020 11:39:38) $

(LTB build)