[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#4943) tpool.c pause vs. finish

OK, I've fixed some of it in HEAD.  See comments in the commits for
tpool.c 1.63 and outwards.

RE23 has the same problems.  Nearly identical code sans the new
semaphore stuff and a few variable names, so fixes should be easy
to import when they've been tested a bit.

Remaining problems:

tpool.c breaks if there are multiple pools.  The simplest
solution seems to be to remove support for multiple pools.
The problem is thread_keys[].  It is shared between pools, but:

- ldap_pvt_thread_pool_purgekey() needs _all_ pools to be
  paused, but ldap_pvt_thread_pool_pause() only pauses one pool.
  If we keep multi-pool support, maybe it only should destroy keys
  from the current pool?

- ltp_max_count (max #threads) is pool-specific, but is used to
  protect from array bounds violation on thread_keys[LDAP_MAXTHR].
  (New simple code from me, previously there was no limit.)

- This '#if 0'ed comment is also wrong with multiple pools:
	/* start up one thread, just so there is one. no need to
	 * lock the mutex right now, since no threads are running.
  Time to remove the whole #if 0?

Other thread_keys[] strangeness:

- setkey does not free the old data with ltk_free.  Possibly that's
  intentional, but if so that requires weird code elsewhere.

- maybe related to how getkey can return the ltk_free function - which
  seems crazy since a caller who uses it must then explicitly reset the
  key afterwards (or better, before), to be sure purgekey doesn't
  see the data and free it again.

Other notes:

- I've inserted new, possibly too aggressive, waits for ltp_pause in
  pool_wrapper.  See the tpool.c 1.65 comment.  Yet another alternative
  than splitting the ltp_pause state might be to remove the requirement
  that purgekey() can only be used by a paused thread.  (Could loop
  until the key is _really_ gone instead, or something.)

- tpool.c uses LDAP_FREE(), can call ldap_pvt_thread_pool_context(),
  which uses thread_keys[].  If one thread does that when another thread
  has paused the pool, that can break since thread_keys[] is reserved
  for ldap_pvt_thread_pool_purgekey() during pauses.

  Not sure if that can happen, haven't checked.  If
  LDAP_MALLOC/LDAP_CALLOC also can do call pool_context(), we do have
  problem in pool_submit().
  One fix is to use malloc/calloc/free instead.  Another, in the
  case of LDAP_FREE(ctx) in pool_submit, is to put the ctx on the
  free list instead.

- ltp_pending_count includes threads sleeping in pool_pause() - should
  it?  In effect, pool_pause() temporarily reduces ltp_max_pending.

- In ldap_pvt_thread_pool_submit():

  This comment:
			/* there is another open thread, so this
			 * context will be handled eventually.
			 * continue on and signal that the context
			 * is waiting.
  is wrong if the previous if (pool->ltp_open_count == 0) was
  taken but the no thread was found inside that.

  Also if ctx is invalid (another thread handled it) and a new
  pending request has gotten the now-unused ctx, then the
  'if (pool->ltp_open_count == 0)' code will free that new
  pending request instead of the old one.

  One fix is to add an is_pending flag=1 in submit() and int
  *ltc_pendingp; in ldap_int_thread_ctx_t.  In submit, if need_thread,
  set ltc_pendingp = &is_pending.  In if(pool->ltp_open_count == 0)
  check is_pending instead of searching for ctx.  Elsewhere set
  *ltc_pendingp = 0 and/or ltc_pendingp = NULL when appropriate.

- Note, I haven't looked at the new semaphore stuff yet.

Will get back to this later, but could use review so far.  In particular
I'd like to know if we'll keep or kill multi-pool support.