Issue 8339 - LMDB robust mutex fix
Summary: LMDB robust mutex fix
Status: VERIFIED FIXED
Alias: None
Product: LMDB
Classification: Unclassified
Component: liblmdb (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-20 01:54 UTC by openldap@gulag.ch
Modified: 2020-03-12 15:55 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description openldap@gulag.ch 2015-12-20 01:54:36 UTC
Full_Name: Rolf Stalder
Version: 2.4.43
OS: Solaris 11.2
URL: https://www.gulag.ch/www/download/0001-Solaris-robust-mutex-fix.patch
Submission from: (NULL) (217.162.126.5)


The provided patch (against 2.4.43) might solve a problem with the new robust
POSIX mutex implementation of LMDB 0.9.17 on Solaris 10/11.x. On Solaris 11
slapd fails to open the database:

--
mdb_db_open: database "x": dbenv_open(y).
mdb_db_open: database "x" cannot be opened: z (N). Restore from backup!
backend_startup_one (type=mdb, suffix="x"): bi_db_open failed! (N)
slapd shutdown: initiated
--
Values of 16 (EBUSY) and 22 (EINVAL) have been observed for N.

The following notes can be found in pthread_mutexattr_setrobust(3C) (
https://docs.oracle.com/cd/E36784_01/html/E36874/pthread-mutexattr-setrobust-3c.
html
):

--
The mutex memory must be zeroed before first  initialization
of  a  mutex  with  the  PTHREAD_MUTEX_ROBUST attribute. Any
thread in any process interested in the robust lock can call
pthread_mutex_init()  to potentially initialize it, provided
that all such callers of  pthread_mutex_init()  specify  the
same  set  of  attributes  in their attribute structures. In
this situation, if pthread_mutex_init() is called on a  pre-
viously  initialized  robust mutex, it will not reinitialize
the  mutex  and  will  return  the  error  value  EBUSY.  If
pthread_mutex_init()  is  called on a previously initialized
robust mutex, and if the caller specifies a different set of
attributes  from  those  already in effect for the mutex, it
will not reinitialize the mutex and will  return  the  error
value EINVAL.
--

Solaris 10 only provides the *_np functions (2.4.43 build/compilation fails
altogether) but the above meaning of EBUSY/EINVAL applies as well.

This patch has only be tested on Solaris 10/11.2 with Solaris Studio 12.3.
Comment 1 openldap@gulag.ch 2016-06-01 23:11:10 UTC
Hi

I'm fully aware of the fact that Solaris is not the most widely-used 
platform on earth... Is there any chance to get this patch evaluated? 
I'm glad to help with whatever is needed.

Thank you

Rolf

On 12/20/15 02:54 AM, openldap-its@OpenLDAP.org wrote:
> *** THIS IS AN AUTOMATICALLY GENERATED REPLY ***
>
> Thanks for your report to the OpenLDAP Issue Tracking System.  Your
> report has been assigned the tracking number ITS#8339.
>
> One of our support engineers will look at your report in due course.
> Note that this may take some time because our support engineers
> are volunteers.  They only work on OpenLDAP when they have spare
> time.
>
> If you need to provide additional information in regards to your
> issue report, you may do so by replying to this message.  Note that
> any mail sent to openldap-its@openldap.org with (ITS#8339)
> in the subject will automatically be attached to the issue report.
>
> 	mailto:openldap-its@openldap.org?subject=(ITS#8339)
>
> You may follow the progress of this report by loading the following
> URL in a web browser:
>      http://www.OpenLDAP.org/its/index.cgi?findid=8339
>
> Please remember to retain your issue tracking number (ITS#8339)
> on any further messages you send to us regarding this report.  If
> you don't then you'll just waste our time and yours because we
> won't be able to properly track the report.
>
> Please note that the Issue Tracking System is not intended to
> be used to seek help in the proper use of OpenLDAP Software.
> Such requests will be closed.
>
> OpenLDAP Software is user supported.
> 	http://www.OpenLDAP.org/support/
>
> --------------
> Copyright 1998-2007 The OpenLDAP Foundation, All Rights Reserved.
>


Comment 2 Howard Chu 2016-06-01 23:48:45 UTC
openldap@gulag.ch wrote:
> Hi
>
> I'm fully aware of the fact that Solaris is not the most widely-used
> platform on earth... Is there any chance to get this patch evaluated?
> I'm glad to help with whatever is needed.

My last SPARC/Solaris dev machine was decommissioned sometime last year so I 
haven't had any system available to test this on.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 3 openldap@gulag.ch 2016-06-02 01:50:33 UTC
On 06/02/16 01:48 AM, Howard Chu wrote:
> My last SPARC/Solaris dev machine was decommissioned sometime last year
> so I haven't had any system available to test this on.

Hi Howard

Thanks for the reply. Maybe I can help out and setup two systems for you 
(accessible via Internet):

-Solaris 11.3 Intel with Solaris Studio 12.3/4
-Solaris 10 Intel with Solaris Studio 12.3/4

We already use this patch (applied to LMDB 0.9.18) with OpenLDAP 2.4.44 
and as part of our application in production on 11.3/Intel and 
10/Intel+Sparc. So I could do some further testing for you on 10/Sparc 
(Solaris Studio 12.3/4 as well) if required.

Does this sound feasible?

Thank you

Rolf

Comment 4 Hallvard Furuseth 2016-06-02 05:56:56 UTC
Rolf, please try this instead, like the manpage you referred to
says under Notes.  Do not catch EBUSY/EINVAL, they do mean the
mutex was not (re)inited.

diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c
index 0209c09..5ae93bc 100644
--- a/libraries/liblmdb/mdb.c
+++ b/libraries/liblmdb/mdb.c
@@ -304,7 +304,7 @@ union semun {
 # else
 #  define MDB_USE_ROBUST	1
 /* glibc < 2.12 only provided _np API */
-#  if defined(__GLIBC__) && GLIBC_VER < 0x02000c
+#  if (defined(__GLIBC__) && GLIBC_VER < 0x02000c) || defined(__SunOS_5_10)
 #   define PTHREAD_MUTEX_ROBUST	PTHREAD_MUTEX_ROBUST_NP
 #   define pthread_mutexattr_setrobust(attr, flag)	pthread_mutexattr_setrobust_np(attr, flag)
 #   define pthread_mutex_consistent(mutex)	pthread_mutex_consistent_np(mutex)
@@ -4963,6 +4963,13 @@ mdb_env_setup_locks(MDB_env *env, char *lpath, int mode, int *excl)
 #else	/* MDB_USE_POSIX_MUTEX: */
 		pthread_mutexattr_t mattr;
 
+		/* Solaris needs this before initing a robust mutex.  Otherwise
+		 * it may skip the init and return EBUSY "seems someone already
+		 * inited" or EINVAL "it was inited differently".
+		 */
+		memset(env->me_txns->mti_rmutex, 0, sizeof(*env->me_txns->mti_rmutex));
+		memset(env->me_txns->mti_wmutex, 0, sizeof(*env->me_txns->mti_wmutex));
+
 		if ((rc = pthread_mutexattr_init(&mattr))
 			|| (rc = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED))
 #ifdef MDB_ROBUST_SUPPORTED

Comment 5 openldap@gulag.ch 2016-06-02 18:47:58 UTC
Hi Hallvard

Thanks for the hint. Your are perfectly right, the mutex memory has to 
be zeroed in case of a robust mutex. Applied against 0.9.18 and it seems 
to work. My proposed patch is broken.

There is a slight inconsistency if subsequent calls after a successfull 
pthread_mutexattr_init() fail (in the ORed if statement). This might 
result in a memory leak as the mutex attributes are not destroyed. But 
this is not related to the topic of this ITS.

Thank you

Rolf

On 06/02/16 07:56 AM, Hallvard Breien Furuseth wrote:
> Rolf, please try this instead, like the manpage you referred to
> says under Notes.  Do not catch EBUSY/EINVAL, they do mean the
> mutex was not (re)inited.
>
> diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c
> index 0209c09..5ae93bc 100644
> --- a/libraries/liblmdb/mdb.c
> +++ b/libraries/liblmdb/mdb.c
> @@ -304,7 +304,7 @@ union semun {
>   # else
>   #  define MDB_USE_ROBUST	1
>   /* glibc < 2.12 only provided _np API */
> -#  if defined(__GLIBC__) && GLIBC_VER < 0x02000c
> +#  if (defined(__GLIBC__) && GLIBC_VER < 0x02000c) || defined(__SunOS_5_10)
>   #   define PTHREAD_MUTEX_ROBUST	PTHREAD_MUTEX_ROBUST_NP
>   #   define pthread_mutexattr_setrobust(attr, flag)	pthread_mutexattr_setrobust_np(attr, flag)
>   #   define pthread_mutex_consistent(mutex)	pthread_mutex_consistent_np(mutex)
> @@ -4963,6 +4963,13 @@ mdb_env_setup_locks(MDB_env *env, char *lpath, int mode, int *excl)
>   #else	/* MDB_USE_POSIX_MUTEX: */
>   		pthread_mutexattr_t mattr;
>
> +		/* Solaris needs this before initing a robust mutex.  Otherwise
> +		 * it may skip the init and return EBUSY "seems someone already
> +		 * inited" or EINVAL "it was inited differently".
> +		 */
> +		memset(env->me_txns->mti_rmutex, 0, sizeof(*env->me_txns->mti_rmutex));
> +		memset(env->me_txns->mti_wmutex, 0, sizeof(*env->me_txns->mti_wmutex));
> +
>   		if ((rc = pthread_mutexattr_init(&mattr))
>   			|| (rc = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED))
>   #ifdef MDB_ROBUST_SUPPORTED
>


Comment 6 Howard Chu 2016-06-02 20:15:38 UTC
openldap@gulag.ch wrote:
> Hi Hallvard
>
> Thanks for the hint. Your are perfectly right, the mutex memory has to
> be zeroed in case of a robust mutex. Applied against 0.9.18 and it seems
> to work. My proposed patch is broken.

I've pushed a tweaked version of Hallvard's patch. While the Studio compiler 
defines __SunOS_5_10 it appears that gcc doesn't, so we took a different 
detection route.
>
> There is a slight inconsistency if subsequent calls after a successfull
> pthread_mutexattr_init() fail (in the ORed if statement). This might
> result in a memory leak as the mutex attributes are not destroyed. But
> this is not related to the topic of this ITS.

Feel free to submit another ITS for that, thanks.
>
> Thank you
>
> Rolf
>
> On 06/02/16 07:56 AM, Hallvard Breien Furuseth wrote:
>> Rolf, please try this instead, like the manpage you referred to
>> says under Notes.  Do not catch EBUSY/EINVAL, they do mean the
>> mutex was not (re)inited.
>>
>> diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c
>> index 0209c09..5ae93bc 100644
>> --- a/libraries/liblmdb/mdb.c
>> +++ b/libraries/liblmdb/mdb.c
>> @@ -304,7 +304,7 @@ union semun {
>>    # else
>>    #  define MDB_USE_ROBUST	1
>>    /* glibc < 2.12 only provided _np API */
>> -#  if defined(__GLIBC__) && GLIBC_VER < 0x02000c
>> +#  if (defined(__GLIBC__) && GLIBC_VER < 0x02000c) || defined(__SunOS_5_10)
>>    #   define PTHREAD_MUTEX_ROBUST	PTHREAD_MUTEX_ROBUST_NP
>>    #   define pthread_mutexattr_setrobust(attr, flag)	pthread_mutexattr_setrobust_np(attr, flag)
>>    #   define pthread_mutex_consistent(mutex)	pthread_mutex_consistent_np(mutex)
>> @@ -4963,6 +4963,13 @@ mdb_env_setup_locks(MDB_env *env, char *lpath, int mode, int *excl)
>>    #else	/* MDB_USE_POSIX_MUTEX: */
>>    		pthread_mutexattr_t mattr;
>>
>> +		/* Solaris needs this before initing a robust mutex.  Otherwise
>> +		 * it may skip the init and return EBUSY "seems someone already
>> +		 * inited" or EINVAL "it was inited differently".
>> +		 */
>> +		memset(env->me_txns->mti_rmutex, 0, sizeof(*env->me_txns->mti_rmutex));
>> +		memset(env->me_txns->mti_wmutex, 0, sizeof(*env->me_txns->mti_wmutex));
>> +
>>    		if ((rc = pthread_mutexattr_init(&mattr))
>>    			|| (rc = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED))
>>    #ifdef MDB_ROBUST_SUPPORTED
>>
>
>
>
>
>


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 7 Howard Chu 2016-06-02 20:58:39 UTC
openldap@gulag.ch wrote:
> On 06/02/16 01:48 AM, Howard Chu wrote:
>> My last SPARC/Solaris dev machine was decommissioned sometime last year
>> so I haven't had any system available to test this on.
>
> Hi Howard
>
> Thanks for the reply. Maybe I can help out and setup two systems for you
> (accessible via Internet):
>
> -Solaris 11.3 Intel with Solaris Studio 12.3/4
> -Solaris 10 Intel with Solaris Studio 12.3/4
>
> We already use this patch (applied to LMDB 0.9.18) with OpenLDAP 2.4.44 and as
> part of our application in production on 11.3/Intel and 10/Intel+Sparc. So I
> could do some further testing for you on 10/Sparc (Solaris Studio 12.3/4 as
> well) if required.
>
> Does this sound feasible?

Thanks for the offer, perhaps in the future. I just now acquired access to a 
Solaris VPS so we can handle this issue for now.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 8 openldap@gulag.ch 2016-06-02 22:53:16 UTC
On 06/02/16 10:15 PM, Howard Chu wrote:
> I've pushed a tweaked version of Hallvard's patch. While the Studio
> compiler defines __SunOS_5_10 it appears that gcc doesn't, so we took a
> different detection route.

Perfect and the detection clearly works. Thanks for reviewing this ITS.

--- pthread.h S11 ---
/*
  * Mutex robust attribute values.
  * Keep these in synch with sys/synch.h lock types.
  */
#define PTHREAD_MUTEX_STALLED   0x0
#define PTHREAD_MUTEX_ROBUST    0x40
/*
  * Historical solaris-specific names,
  * from before pthread_mutexattr_getrobust() became standardized
  */
#define PTHREAD_MUTEX_STALL_NP    PTHREAD_MUTEX_STALLED
#define PTHREAD_MUTEX_ROBUST_NP   PTHREAD_MUTEX_ROBUST
---------------------

 > uname -r
5.11
 > nm liblmdb.so |grep setrobust
[161]   |         0|         0|FUNC |GLOB |0    |UNDEF 
|pthread_mutexattr_setrobust

--- pthread.h S10 ---
/*
  * Mutex robustness attribute values. The robustness attribute is a
  * Solaris specific extension to support robust mutexes. Note the _NP 
suffix
  * to indicate these are not part of the current POSIX spec (POSIX 
1003.1 1996),
  * but are platform specific non-portable extensions. Keep these in synch
  * with sys/synch.h lock types.
  */
#define PTHREAD_MUTEX_STALL_NP    0x0
#define PTHREAD_MUTEX_ROBUST_NP   0x40
---------------------

 > uname -r
5.10
 > nm liblmdb.so |grep setrobust
[245]   |         0|         0|FUNC |GLOB |0    |UNDEF 
|pthread_mutexattr_setrobust_np



Comment 9 Quanah Gibson-Mount 2016-06-02 23:10:41 UTC
changed notes
changed state Open to Release
Comment 10 openldap@gulag.ch 2016-06-06 09:34:44 UTC
On 06/02/16 10:15 PM, Howard Chu wrote:
>>
>> There is a slight inconsistency if subsequent calls after a successfull
>> pthread_mutexattr_init() fail (in the ORed if statement). This might
>> result in a memory leak as the mutex attributes are not destroyed. But
>> this is not related to the topic of this ITS.
>
> Feel free to submit another ITS for that, thanks.

Has just been fixed by Hallvard:

http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=commitdiff;h=c4c7833d245fdc4b2ea4a1a459d7069ce3af87f5

Thanks!

Comment 11 OpenLDAP project 2017-06-01 22:08:30 UTC
fixed in master
fixed in 0.9.19
Comment 12 Quanah Gibson-Mount 2017-06-01 22:08:30 UTC
changed notes
changed state Release to Closed
moved from Incoming to Software Bugs