
Re: (ITS#3925) Strange thread pool code



h.b.furuseth@usit.uio.no wrote:
> tpool.c:ldap_pvt_thread_pool_pause () has a bit of strange code:
> 	/* If someone else has already requested a pause, we have to wait */
> 	if (pool->ltp_state == LDAP_INT_THREAD_POOL_PAUSING) {
> 		pool->ltp_pending_count++;
> 		pool->ltp_active_count--;
> 		ldap_pvt_thread_cond_wait(&pool->ltp_cond, &pool->ltp_mutex);
> 		pool->ltp_pending_count--;
> 		pool->ltp_active_count++;
> 	}
> 	/* Wait for everyone else to finish */
> 	pool->ltp_state = LDAP_INT_THREAD_POOL_PAUSING;
> 	...
> Usually one puts such a cond_wait in a while loop, not in an if test.
>
> Once in a while I get segfaults from tpool which I have not seen since
> changing this to a while loop, but OTOH with that change test004 sometimes
> fails and I don't remember seeing that before.  I have not tested much though.
>
> The test004 error with the while loop is
>    slap_startup failed
>    slapadd failed (1)!
> with no logfile in the testrun directory.
>   

I don't understand how this code can affect test004 or slapadd, since 
this function is not called in either case.
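
Whether or not it explains these failures, the while-loop form suggested
in the report can be sketched as below. This is a minimal illustration
using plain POSIX threads, not the actual tpool.c code: struct mini_pool,
wait_for_other_pause(), waiter() and demo_run() are hypothetical names
made up for the example. The point is only that the predicate is
re-checked after every wakeup, so a wakeup that arrives while the state
is still PAUSING sends the thread back to sleep.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, simplified stand-in for the pool in tpool.c. */
enum pool_state { POOL_RUNNING, POOL_PAUSING };

struct mini_pool {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    enum pool_state state;
    int active_count;
    int pending_count;
};

/* Caller must hold pool->mutex and be counted in active_count.
 * Returns only once no other pause is in progress. */
void wait_for_other_pause(struct mini_pool *pool)
{
    while (pool->state == POOL_PAUSING) {   /* 'while', not 'if' */
        pool->pending_count++;
        pool->active_count--;
        pthread_cond_wait(&pool->cond, &pool->mutex);
        pool->pending_count--;
        pool->active_count++;
    }
}

void *waiter(void *arg)
{
    struct mini_pool *pool = arg;

    pthread_mutex_lock(&pool->mutex);
    pool->active_count++;
    wait_for_other_pause(pool);
    /* The loop guarantees the predicate holds here. */
    assert(pool->state == POOL_RUNNING);
    pool->active_count--;
    pthread_mutex_unlock(&pool->mutex);
    return NULL;
}

/* Runs one waiter against a pool that starts out PAUSING.
 * Returns 0 if the counters balance afterwards, -1 on setup failure. */
int demo_run(void)
{
    struct mini_pool pool;
    pthread_t tid;
    int leftover;

    pthread_mutex_init(&pool.mutex, NULL);
    pthread_cond_init(&pool.cond, NULL);
    pool.state = POOL_PAUSING;
    pool.active_count = 0;
    pool.pending_count = 0;

    if (pthread_create(&tid, NULL, waiter, &pool) != 0)
        return -1;

    /* Wake the waiter without changing the state: with an 'if' test
     * it could proceed here while the pool is still PAUSING. */
    pthread_mutex_lock(&pool.mutex);
    pthread_cond_broadcast(&pool.cond);
    pthread_mutex_unlock(&pool.mutex);

    /* Now end the pause for real. */
    pthread_mutex_lock(&pool.mutex);
    pool.state = POOL_RUNNING;
    pthread_cond_broadcast(&pool.cond);
    pthread_mutex_unlock(&pool.mutex);

    pthread_join(tid, NULL);
    leftover = pool.active_count + pool.pending_count;

    pthread_cond_destroy(&pool.cond);
    pthread_mutex_destroy(&pool.mutex);
    return leftover;
}
```

Note that the condition variable is destroyed only after the waiter has
been joined, so no thread can still be blocked on it at that point.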

> Stack trace with the if-test (test003 or test004):
> Command-line: ../servers/slapd/slapd -Ta -f ./testrun/slapd.1.conf -l
>     ./testdata/test-ordered.ldif
>
> Accessing a memory range that crosses a memory segment boundary:
>     pthread_cond_destroy@@GLIBC_2.3.2  [libpthread.so.0]
>     ldap_pvt_thread_cond_destroy  [thr_posix.c]
>     ldap_pvt_thread_pool_destroy  [tpool.c]
>     slapd_daemon_task  [daemon.c]
>     start_thread  [libpthread.so.0]
>
>   * Addressing 0xb656deb8 for 4 bytes ending at 0xb656debc,
>     beginning in the stack which ends at 0xb6571000.
>
>   * Addressing 0xb656deb8 for 4 bytes ending at 0xb656debc,
>     beginning in the stack which ends at 0xb6571000.
>
>   * Addressing 0xb656deb4 for 4 bytes ending at 0xb656deb8,
>     beginning in the stack which ends at 0xb6571000.
>
>   * Addressing 0xb656ded0 for 4 bytes ending at 0xb656ded4,
>     beginning in the stack which ends at 0xb6571000.
>
>   * Addressing 0xb656deb0 for 4 bytes ending at 0xb656deb4,
>     beginning in the stack which ends at 0xb6571000.

I have no idea what this trace is saying. A back trace with line numbers 
would be more useful. How can destroying a valid condition variable 
cause multiple accesses across a stack boundary? (How can it even cause 
a single error of this type?) Since the pool_pause function is not 
invoked during slapadd, you're going to have to look for the culprit 
elsewhere.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/