[Date Prev][Date Next] [Chronological] [Thread] [Top]

RE: Reader Writer Locks possible wrong implementation (ITS#1865)



Good catch. The bug is actually that ldap_pvt_thread_cond_broadcast isn't
releasing all the waiters at once as it is intended to. I've got a different
patch to thr_nt.c to address this problem. It is now committed to HEAD
(thr_nt.c 1.21).

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support

> -----Original Message-----
> From: owner-openldap-bugs@OpenLDAP.org
> [mailto:owner-openldap-bugs@OpenLDAP.org]On Behalf Of
> g.aprotosoaie@computer.org
> Sent: Friday, June 07, 2002 8:18 AM
> To: openldap-its@OpenLDAP.org
> Subject: Reader Writer Locks possible wrong implementation (ITS#1865)
>
>
> Full_Name: George Aprotosoaie
> Version: 2.0.23 (and 2.1.1beta)
> OS: Windows, but it may be OS independent
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (209.47.38.244)
>
>
> Hi!
>
>
> Platform: Windows 2000 (I will expect the same behavior on NT4
> and may be other
> OSs)
> Version: 2.0.23 (2.1.1beta also has this problem)
> Build type: built from source
> Effect: server not responding to queries
>
>
> The common understanding for a ReaderWriter lock is that a) after
> a writer is
> released b) if one or more readers are waiting to acquire the
> lock, then all of
> the readers have to be able to acquire the lock.
>
> A possible error for the existing implementation (in
> 'libraries\libldap_r\rdwr.c') is that only one reader is able to
> acquire the
> lock after a writer releases the lock. The issue is that only a
> writer release
> would enable one waiting reader to acquire the lock. When the
> writer releases
> the lock if there are more readers waiting to acquire the lock
> (as in the case
> when there are more searches done at the same time than add or modify
> operations), than all the waiting readers less one will be
> blocked. The blocked
> readers are using up threads from the pool (may generate a pool
> starvation by
> using all the threads and locking up the LDAP server, with the
> server using no
> CPU). The only way to release the blocked readers is to perform
> operations that
> will take the writer lock (one writer released for each blocked
> reader). New
> attempts to take reader locks without a writer owning the lock
> will succeed.
>
> Using the rdwr.c source: when a reader attempts to acquire a
> lock, if a writer
> is 'active', then the reader is blocked in 'rw->ltrw_read' until
> the writer
> broadcast the lock release by calling
> 'ldap_pvt_thread_cond_broadcast( &rw->
> ltrw_read )'. The issue (may be only Windows related) is that the
> implementation
> of the 'ldap_pvt_thread_cond_broadcast( &rw->ltrw_read )'
> releases only one
> waiting thread and not all of them. As a result if there are more than one
> readers waiting on 'rw->ltrw_read' then only one will be
> released, with the
> other ones still waiting. If the are OS where
> 'ldap_pvt_thread_cond_broadcast(
> &rw->ltrw_read )' releases all the waiting threads, than this is
> not an issue on
> that OS.
>
> The solution I propose (I have tested it for Windows and it works
> fine) is when
> a reader is released from waiting on 'rw->ltrw_read' (as a result
> there are no
> 'active' writers), the reader should check if there are more
> waiting readers. If
> there are waiting readers, than the reader should broadcast again the lock
> release using 'ldap_pvt_thread_cond_broadcast( &rw->ltrw_read )'
> (in the same
> way as the writer does when is released). This broadcast will
> release another
> waiting reader that again will check if there are waiting readers
> and if any
> will repeat the cycle until there are no more waiting readers!
> The new function
> looks like (the change is follows the comment 'the following two
> lines were
> added to release possible other waiting readers'):
>
> int ldap_pvt_thread_rdwr_rlock( ldap_pvt_thread_rdwr_t *rwlock )
> {
> 	struct ldap_int_thread_rdwr_s *rw;
>
> 	assert( rwlock != NULL );
> 	rw = *rwlock;
>
> 	assert( rw != NULL );
> 	assert( rw->ltrw_valid == LDAP_PVT_THREAD_RDWR_VALID );
>
> 	if( rw->ltrw_valid != LDAP_PVT_THREAD_RDWR_VALID )
> 		return LDAP_PVT_THREAD_EINVAL;
>
> 	ldap_pvt_thread_mutex_lock( &rw->ltrw_mutex );
>
> 	assert( rw->ltrw_w_active >= 0 );
> 	assert( rw->ltrw_w_wait >= 0 );
> 	assert( rw->ltrw_r_active >= 0 );
> 	assert( rw->ltrw_r_wait >= 0 );
>
> 	if( rw->ltrw_w_active > 0 ) {
> 		/* writer is active */
>
> 		rw->ltrw_r_wait++;
>
> 		do {
> 			ldap_pvt_thread_cond_wait(
> 				&rw->ltrw_read, &rw->ltrw_mutex );
> 		} while( rw->ltrw_w_active > 0 );
>
> 		rw->ltrw_r_wait--;
> 		assert( rw->ltrw_r_wait >= 0 );
> 	}
>
> 	rw->ltrw_r_active++;
>
> // the following two lines were added to release
> // possible other waiting readers
> 	if (rw->ltrw_r_wait > 0)
> 		ldap_pvt_thread_cond_broadcast( &rw->ltrw_read );
>
> 	ldap_pvt_thread_mutex_unlock( &rw->ltrw_mutex );
>
> 	return 0;
> }
>
> Implementation notes: the change uses 'rw->ltrw_r_wait'. As a
> result has to be
> under the 'rw->ltrw_mutex' lock protection.
>
> Note: this issue may be the cause for the Incoming/1764 issue.
>
>
> Regards,
> George