Issue 8050 - [PATCH] test060 slapd-mtread concurrency issues
Summary: [PATCH] test060 slapd-mtread concurrency issues
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-03 19:54 UTC by richton@nbcs.rutgers.edu
Modified: 2015-07-02 17:48 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description richton@nbcs.rutgers.edu 2015-02-03 19:54:33 UTC
Full_Name: Aaron Richton
Version: RE24
OS: Fedora 20
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (128.6.31.135)


While performing RE24 testing observed an issue in slapd-mtread:

==28795== Process terminating with default action of signal 11 (SIGSEGV):
dumping core
==28795==  Access not within mapped region at address 0x2FE9F9D0
==28795==    at 0x3A45E0A1%3: pthread_join (pthread_join.c:47)
==28795==    by 0x4A09005: ??? (in
/usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==28795==    by 0x4A0BBA5: pthread_join (in
/usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==28795==  by 0 0x409A2A: ldap_pvt_thread_join (thr_posix.c:197)
==28795==    by 0x405540: main (slapd-mtread.c:361)

Hmm, not particularly helpful. Gave this a run through helgrind, report below.
So far as I can tell we had one thread in ldap_dup protected by ldcmutex:

542         LDAP_MUTEX_LOCK( &old->ld_ldcmutex );
543         ld->ldc = old->ldc;
544         old->ld_ldcrefcnt++;
545         LDAP_MUTEX_UNLOCK( &old->ld_ldcmutex );

while ldap_get_option(LDAP_OPT_SESSION_REFCNT) is protected by ldo_mutex only.
As a quick attempt:

--- a/libraries/libldap/options.c
+++ b/libraries/libldap/options.c
@@ -123,6 +123,10 @@ ldap_get_option(
                return LDAP_OPT_ERROR;
        }
 
+       if( option == LDAP_OPT_SESSION_REFCNT ) {
+               LDAP_MUTEX_LOCK( &ld->ld_ldcmutex );
+       }
+
        LDAP_MUTEX_LOCK( &lo->ldo_mutex );
 
        switch(option) {
@@ -414,6 +418,11 @@ ldap_get_option(
        }
 
        LDAP_MUTEX_UNLOCK( &lo->ldo_mutex );
+
+       if( option == LDAP_OPT_SESSION_REFCNT ) {
+               LDAP_MUTEX_UNLOCK( &ld->ld_ldcmutex );
+       }
+
        return ( rc );
 }
 
helgrind looks happy with this. The if() is obviously optional but is hopefully
a small performance gain for ldap_get_option(others).

helgrind report:

==28795== Lock at 0x4C393D0 was first observed
==28795==    at 0x4A0BCCF: pthread_mutex_init (in
/usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==28795==    by 0x409B08: ldap_pvt_thread_mutex_init (thr_posix.c:383)
==28795==    by 0x40A25A: ldap_create (open.c:180)
==28795==    by 0x40A3E4: ldap_initialize (open.c:240)
==28795==    by 0x4062B7: do_conn (slapd-mtread.c:584)
==28795==    by 0x40530A: main (slapd-mtread.c:331)
==28795== 
==28795== Lock at 0x4C393A8 was first observed
==28795==    at 0x4A0BCCF: pthread_mutex_init (in
/usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==28795==    by 0x409B08: ldap_pvt_thread_mutex_init (thr_posix.c:283)
==28795==    by 0x409FBF: ldap_create (open.c:129)
==28795%%3=    by 0xA3A3E4: ldap_initialize (open.c:240)
==28795==    by 0x4062B7: do_conn (slapd-mtread.c:584)
==28795==    by 0x40530A: main (slapd-mtread.c:331)
==28795== 
==28795== Possible data race during write of size 4 at 0x4C39238 by thread #%2
==28795== Locks held: 1, at address 0x4C393D0
==28795==    at 0x412AC1: ldap_ld_free (unbind.c:87)
==28795==    by 0x4130EE: ldap_destroy (unbind.c:245)
==28795==    by 0x405BEF: do_onethread (slapd-mtread.c:455)
==28795==    by 0x4A0AECD: ??? (in
/usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==28795==    by 0x3A45E07EE4: start_thread (pthread_create.c:309)
==28795==    by 0x3A45AF4B8C: clone (clone.S:111)
==28795== 
==28795== This conflicts with a previous read of size 4 by thread #29
==28795%%3= Locks held: 1, at address 0x4C393A8
==28795==    at 0x426957: ldap_get_option (options.c:374)
==28795==    by 0x405914: do_onethread (slapd-mtread.c:423)
==28795==    by 0x4A0AECD: ??? (in
/usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==28795==    by 0x3A45E07EE4: start_thread (pthread_create.c:309)
==28795==    by 0x3A45AF4B8C: clone (clone.S:111)
==28795== 
==28795== Address 0x4C39238 is 88 bytes inside a block of size 736 alloc'd
==28795==    at 0x4A083F4: calloc (in
/usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==28795==    by 0x43E4B1: ber_memcalloc_x (memory.c:283)
==28795==    by 0x409F4B: ldap_create (open.c:119)
==28795==    by 0x40A3E4: ldap_initialize (open.c:240)
==28795==    by 0x4062B7: do_conn (slapd-mtread.c:584)
==28795==    by 0x40530A: main (slapd-mtread.c:331)
Comment 1 Howard Chu 2015-02-04 02:54:38 UTC
richton@nbcs.rutgers.edu wrote:
> Full_Name: Aaron Richton
> Version: RE24
> OS: Fedora 20
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (128.6.31.135)
>
>
> While performing RE24 testing observed an issue in slapd-mtread:
>
> ==28795== Process terminating with default action of signal 11 (SIGSEGV):
> dumping core
> ==28795==  Access not within mapped region at address 0x2FE9F9D0
> ==28795==    at 0x3A45E0A1%3: pthread_join (pthread_join.c:47)
> ==28795==    by 0x4A09005: ??? (in
> /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
> ==28795==    by 0x4A0BBA5: pthread_join (in
> /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
> ==28795==  by 0 0x409A2A: ldap_pvt_thread_join (thr_posix.c:197)
> ==28795==    by 0x405540: main (slapd-mtread.c:361)
>
> Hmm, not particularly helpful. Gave this a run through helgrind, report below.
> So far as I can tell we had one thread in ldap_dup protected by ldcmutex:
>
> 542         LDAP_MUTEX_LOCK( &old->ld_ldcmutex );
> 543         ld->ldc = old->ldc;
> 544         old->ld_ldcrefcnt++;
> 545         LDAP_MUTEX_UNLOCK( &old->ld_ldcmutex );
>
> while ldap_get_option(LDAP_OPT_SESSION_REFCNT) is protected by ldo_mutex only.
> As a quick attempt:

Thanks. I see your patch is this way to avoid lock ordering issues. The 
original code in libldap/unbind.c seems to hold ldcmutex for the entire 
duration of the function, without any good reason. I suspect we can just 
move the mutex lock/unlock to just around the actual option fetch.

> --- a/libraries/libldap/options.c
> +++ b/libraries/libldap/options.c
> @@ -123,6 +123,10 @@ ldap_get_option(
>                  return LDAP_OPT_ERROR;
>          }
>
> +       if( option == LDAP_OPT_SESSION_REFCNT ) {
> +               LDAP_MUTEX_LOCK( &ld->ld_ldcmutex );
> +       }
> +
>          LDAP_MUTEX_LOCK( &lo->ldo_mutex );
>
>          switch(option) {
> @@ -414,6 +418,11 @@ ldap_get_option(
>          }
>
>          LDAP_MUTEX_UNLOCK( &lo->ldo_mutex );
> +
> +       if( option == LDAP_OPT_SESSION_REFCNT ) {
> +               LDAP_MUTEX_UNLOCK( &ld->ld_ldcmutex );
> +       }
> +
>          return ( rc );
>   }
>
> helgrind looks happy with this. The if() is obviously optional but is hopefully
> a small performance gain for ldap_get_option(others).
>
> helgrind report:
>
> ==28795== Lock at 0x4C393D0 was first observed
> ==28795==    at 0x4A0BCCF: pthread_mutex_init (in
> /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
> ==28795==    by 0x409B08: ldap_pvt_thread_mutex_init (thr_posix.c:383)
> ==28795==    by 0x40A25A: ldap_create (open.c:180)
> ==28795==    by 0x40A3E4: ldap_initialize (open.c:240)
> ==28795==    by 0x4062B7: do_conn (slapd-mtread.c:584)
> ==28795==    by 0x40530A: main (slapd-mtread.c:331)
> ==28795==
> ==28795== Lock at 0x4C393A8 was first observed
> ==28795==    at 0x4A0BCCF: pthread_mutex_init (in
> /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
> ==28795==    by 0x409B08: ldap_pvt_thread_mutex_init (thr_posix.c:283)
> ==28795==    by 0x409FBF: ldap_create (open.c:129)
> ==28795%%3=    by 0xA3A3E4: ldap_initialize (open.c:240)
> ==28795==    by 0x4062B7: do_conn (slapd-mtread.c:584)
> ==28795==    by 0x40530A: main (slapd-mtread.c:331)
> ==28795==
> ==28795== Possible data race during write of size 4 at 0x4C39238 by thread #%2
> ==28795== Locks held: 1, at address 0x4C393D0
> ==28795==    at 0x412AC1: ldap_ld_free (unbind.c:87)
> ==28795==    by 0x4130EE: ldap_destroy (unbind.c:245)
> ==28795==    by 0x405BEF: do_onethread (slapd-mtread.c:455)
> ==28795==    by 0x4A0AECD: ??? (in
> /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
> ==28795==    by 0x3A45E07EE4: start_thread (pthread_create.c:309)
> ==28795==    by 0x3A45AF4B8C: clone (clone.S:111)
> ==28795==
> ==28795== This conflicts with a previous read of size 4 by thread #29
> ==28795%%3= Locks held: 1, at address 0x4C393A8
> ==28795==    at 0x426957: ldap_get_option (options.c:374)
> ==28795==    by 0x405914: do_onethread (slapd-mtread.c:423)
> ==28795==    by 0x4A0AECD: ??? (in
> /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
> ==28795==    by 0x3A45E07EE4: start_thread (pthread_create.c:309)
> ==28795==    by 0x3A45AF4B8C: clone (clone.S:111)
> ==28795==
> ==28795== Address 0x4C39238 is 88 bytes inside a block of size 736 alloc'd
> ==28795==    at 0x4A083F4: calloc (in
> /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
> ==28795==    by 0x43E4B1: ber_memcalloc_x (memory.c:283)
> ==28795==    by 0x409F4B: ldap_create (open.c:119)
> ==28795==    by 0x40A3E4: ldap_initialize (open.c:240)
> ==28795==    by 0x4062B7: do_conn (slapd-mtread.c:584)
> ==28795==    by 0x40530A: main (slapd-mtread.c:331)
>
>
>


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 Howard Chu 2015-02-04 03:26:31 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 3 Quanah Gibson-Mount 2015-02-06 17:18:14 UTC
changed notes
changed state Test to Release
Comment 4 OpenLDAP project 2015-07-02 17:48:12 UTC
fixed in master
fixed in RE25
fixed in RE24
Comment 5 Quanah Gibson-Mount 2015-07-02 17:48:12 UTC
changed notes
changed state Release to Closed