[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#9146) incomplete initialization of sessionlog structure may cause session log to grow indefinitely



maxime.besson@worteks.com wrote:
> Full_Name: Maxime Besson
> Version: 2.4.48
> OS: Amazon Linux
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (92.184.104.114)

Thanks for the report, fixed now in git master.
> 
> 
> During a cloud provider migration, I have migrated an OpenLDAP instance using
> MDB + Syncrepl (in mirror mode, with session log) from Ubuntu to Amazon Linux.
> 
> When starting a server with no data, syncrepl kicks in and starts replicating
> the other server's database (12G, 7M entries).
> 
> In Ubuntu this phase takes about 25 minutes (with dbnosync).  In Amazon Linux,
> with the exact same cn=config and daemon options, it took 10 days on first try,
> at 100% CPU consumption the whole time.
> 
> I observed that the start of the import is pretty fast (data.mdb grows by
> 10M/s) but import speed slows down rapidly to a crawl. 
> 
> After some investigation, I discovered that disabling the sessionlog brings the
> import times to normal.
> 
> Digging further, I found that most of the CPU time is spent in
> syncprov_add_slog, and particularly this line:
> 
> 
>             /* Keep the list in csn order. */
>             ...
>                 for ( sep = &sl->sl_head; *sep; sep = &(*sep)->se_next ) {
>>>>>>>>             if ( ber_bvcmp( &se->se_csn, &(*sep)->se_csn ) < 0 ) {
>                         se->se_next = *sep;
>                         *sep = se;
>                         break;
>                     }
>                 }
> 
> The sessionlog appears to be growing endlessly:
> 
>   (gdb) print sl->sl_num
>   $1 = 681896
>   (gdb) print sl->sl_size
>   $3 = 100
> 
> Digging around the source, it seems that the session log is supposed to be
> cleaned up at the end of syncprov_add_slog, if it's not playing:
> 
>     if (!sl->sl_playing) {
>         while ( sl->sl_num > sl->sl_size ) {
>             ...
> 
> However:
> 
>   (gdb) print sl->sl_playing
>   $4 = 543388517
> 
> This looked like an uninitialized value to me, and indeed, sp_cf_gen
> doesn't seem to initialize this field:
> 
>         if ( !sl ) {
>             sl = ch_malloc( sizeof( sessionlog ));
>             sl->sl_mincsn = NULL;
>             sl->sl_sids = NULL;
>             sl->sl_num = 0;
>             sl->sl_numcsns = 0;
>             sl->sl_head = sl->sl_tail = NULL;
>             ldap_pvt_thread_mutex_init( &sl->sl_mutex );
>             si->si_logs = sl;
>         }
> 
> 
> I tried the following patch:
> 
> --- openldap-2.4.48.orig/servers/slapd/overlays/syncprov.c      2019-07-23
> 16:46:22.000000000 +0200
> +++ openldap-2.4.48/servers/slapd/overlays/syncprov.c   2020-01-08
> 11:33:16.770110282 +0100
> @@ -3082,6 +3082,7 @@
>                         sl = ch_malloc( sizeof( sessionlog ));
>                         sl->sl_mincsn = NULL;
>                         sl->sl_sids = NULL;
> +                       sl->sl_playing = 0;
>                         sl->sl_num = 0;
>                         sl->sl_numcsns = 0;
>                         sl->sl_head = sl->sl_tail = NULL;
> 
> 
> And it got rid of the issue. My guess is that most of the time, sl_playing
> happens to be 0, but on Amazon Linux, for some reason (patched glibc?), the
> sessionlog lands in the wrong chunk of uninitilized memory
> 
> Additional info:
> 
> # rpm -q glibc
> glibc-2.26-32.amzn2.0.2.x86_64
> 
> # uname -a
> Linux 4.14.152-127.182.amzn2.x86_64 #1 SMP Thu Nov 14 17:32:43 UTC 2019 x86_64
> x86_64 x86_64 GNU/Linux
> 
> # /usr/local/openldap/libexec/slapd -V
> @(#) $OpenLDAP: slapd 2.4.48 (Jan  8 2020 11:39:38) $
> 
> (LTB build)
>  
>   
> 
> 
> 


-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/