Issue 216 - bug in liblthread/thr_nt.c
Summary: bug in liblthread/thr_nt.c
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 1999-06-30 17:56 UTC by twd@cs.brown.edu
Modified: 2014-08-01 21:07 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Kurt Zeilenga 1999-06-30 17:55:31 UTC
Marked this issue PUBLIC.

twd@cs.brown.edu wrote:
> Full_Name: Tom Doeppner
> Version: 1.2.3
> OS: NT
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (128.148.33.19)
> 
> I'm not sure if anyone cares, but there's a bug in the wrappers used for
> building
> liblthread on top of Windows NT.

The 1.2.3 Windows NT support (libraries only) don't depend upon -lthread.
If the bug is devel code, please direct discussions to the -devel mailing
list.

	Kurt


> In particular, the wrappers for
> ldap_pvt_thread_cond_broadcast and ldap_pvt_thread_cond_wait have a somewhat
> subtle
> race.
> 
> int
> ldap_pvt_thread_cond_wait( ldap_pvt_thread_cond_t *cond,
>                           ldap_pvt_thread_mutex_t *mutex )
> {
>         ReleaseMutex( *mutex );
>         WaitForSingleObject( *cond, INFINITE );
>         WaitForSingleObject( *mutex, INFINITE );
>         return( 0 );
> }
> 
> int
> ldap_pvt_thread_cond_broadcast( ldap_pvt_thread_cond_t *cv )
> {
>         SetEvent( *cv );
>         return( 0 );
> }
> 
> The problem lies in ldap_pvt_thread_cond_wait: the calling thread first releases
> its
> mutex, then waits for the condition variable (represented by an auto-reset
> event
> object) to be signaled or broadcast. However, these two steps must be combined
> into
> a single, atomic operation and must not be implemented separately, as is done
> here.
> What might happen is that the calling thread releases the mutex and then loses
> its
> time slice. If the condition variable is then broadcast (i.e., another thread
> calls ldap_pvt_thread_cond_broadcast) and there is another thread waiting on
> the condition variable, by the time our thread executes again the condition
> variable
> will be in the unsignaled state and our thread will be stuck. This is because
> doing
> the broadcast sets the event object to the signaled state just long enough for
> all
> threads currently waiting on it to be released. As soon as they are all
> released,
> it goes back to the unsignaled state, which is how our thread will find it.
> 
> As an example, consider the following, where x is initially 0:
> 
> f1() {
>         mutex_lock(&m);
>         while(x == 0)
>                 cond_wait(&c, &m);
>         mutex_unlock(&m);
> }
> 
> f2() {
>         mutex_lock(&m);
>         x = 1;
>         cond_broadcast(&c);
>         mutex_unlock(&m);
> }
> 
> Suppose two (different) threads call f1 and one thread calls f2. Eventually,
> all
> three threads will return, _if_ mutexes and condition variables are implemented
> according to POSIX semantics. However, consider what might happen if this is
> done
> on NT with the wrappers in thr_nt.c: The first thread calls f1 and blocks in
> the
> call to cond_wait. The second thread calls f1 and releases the mutex inside of
> cond_wait, but loses its time slice before it gets a chance to wait on the
> event
> object. Now the third thread calls f2 and returns from it after having set both
> x to
> 1 and the event object to signaled.
> 
> The first thread is released from its call to WaitForSingleObject and the event
> object goes back to unsignaled. It then finds x to be 1 and returns from f1.
> The
> second thread finally gets another time slice and now calls
> WaitForSingleObject,
> only to find that it is unsignaled. It's now blocked (forever in this example).
> 
> The solution (which works on NT 4.0 but on neither Windows 95 nor Windows 98) is
> to
> replace ldap_pvt_thread_cond_wait with:
> 
> int
> ldap_pvt_thread_cond_wait( ldap_pvt_thread_cond_t *cond,
>                           ldap_pvt_thread_mutex_t *mutex )
> {
>         ReleaseMutex( *mutex );
>         SignalObjectAndWait(*mutex, *cond, INFINITE, FALSE);
>                 // this routine is implemented only on NT
>         WaitForSingleObject( *mutex, INFINITE );
>         return( 0 );
> }
Comment 1 twd@cs.brown.edu 1999-06-30 17:56:19 UTC
Full_Name: Tom Doeppner
Version: 1.2.3
OS: NT
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (128.148.33.19)


I'm not sure if anyone cares, but there's a bug in the wrappers used for
building
liblthread on top of Windows NT. In particular, the wrappers for
ldap_pvt_thread_cond_broadcast and ldap_pvt_thread_cond_wait have a somewhat
subtle 
race.

int
ldap_pvt_thread_cond_wait( ldap_pvt_thread_cond_t *cond,
                          ldap_pvt_thread_mutex_t *mutex )
{
        ReleaseMutex( *mutex );
        WaitForSingleObject( *cond, INFINITE );
        WaitForSingleObject( *mutex, INFINITE );
        return( 0 );
}

int
ldap_pvt_thread_cond_broadcast( ldap_pvt_thread_cond_t *cv )
{
        SetEvent( *cv );
        return( 0 );
}


The problem lies in ldap_pvt_thread_cond_wait: the calling thread first releases
its
mutex, then waits for the condition variable (represented by an auto-reset
event
object) to be signaled or broadcast. However, these two steps must be combined
into
a single, atomic operation and must not be implemented separately, as is done
here.
What might happen is that the calling thread releases the mutex and then loses
its
time slice. If the condition variable is then broadcast (i.e., another thread
calls ldap_pvt_thread_cond_broadcast) and there is another thread waiting on
the condition variable, by the time our thread executes again the condition
variable
will be in the unsignaled state and our thread will be stuck. This is because
doing
the broadcast sets the event object to the signaled state just long enough for
all
threads currently waiting on it to be released. As soon as they are all
released,
it goes back to the unsignaled state, which is how our thread will find it.

As an example, consider the following, where x is initially 0:

f1() {
	mutex_lock(&m);
	while(x == 0)
		cond_wait(&c, &m);
	mutex_unlock(&m);
}

f2() {
	mutex_lock(&m);
	x = 1;
	cond_broadcast(&c);
	mutex_unlock(&m);
}

Suppose two (different) threads call f1 and one thread calls f2. Eventually,
all
three threads will return, _if_ mutexes and condition variables are implemented
according to POSIX semantics. However, consider what might happen if this is
done
on NT with the wrappers in thr_nt.c: The first thread calls f1 and blocks in
the
call to cond_wait. The second thread calls f1 and releases the mutex inside of
cond_wait, but loses its time slice before it gets a chance to wait on the
event
object. Now the third thread calls f2 and returns from it after having set both
x to
1 and the event object to signaled.

The first thread is released from its call to WaitForSingleObject and the event
object goes back to unsignaled. It then finds x to be 1 and returns from f1.
The
second thread finally gets another time slice and now calls
WaitForSingleObject,
only to find that it is unsignaled. It's now blocked (forever in this example).

The solution (which works on NT 4.0 but on neither Windows 95 nor Windows 98) is
to
replace ldap_pvt_thread_cond_wait with:

int
ldap_pvt_thread_cond_wait( ldap_pvt_thread_cond_t *cond,
                          ldap_pvt_thread_mutex_t *mutex )
{
        ReleaseMutex( *mutex );
	SignalObjectAndWait(*mutex, *cond, INFINITE, FALSE);
		// this routine is implemented only on NT
        WaitForSingleObject( *mutex, INFINITE );
        return( 0 );
}
Comment 2 Kurt Zeilenga 1999-07-08 04:37:21 UTC
moved from Incoming to Software Bugs
Comment 3 Kurt Zeilenga 1999-07-16 20:48:11 UTC
changed notes
changed state Open to Suspended
Comment 4 Kurt Zeilenga 1999-07-23 17:40:29 UTC
moved from Software Bugs to Software Enhancements
Comment 5 Kurt Zeilenga 2000-03-10 16:01:06 UTC
changed notes
changed state Suspended to Closed
Comment 6 OpenLDAP project 2014-08-01 21:07:00 UTC
Applied to devel.