Re: (ITS#4943) tpool.c pause vs. finish



hyc@symas.com writes:
> h.b.furuseth@usit.uio.no wrote:
>> * When a thread is finishing, make it go active for context_reset().
>>   Should make slapi module destructors safer, and I don't see that it
>>   can hurt.
>
> I don't understand this.

Let me dig a bit back in the ITS...
  "context_reset() can call connection_fake_destroy(), which via
  slapi_int_free_object_extensions() calls slapi module destructors,
  which can do who-knows-what."

>> * I'd like to rename thread contexts to tasks(?) and user contexts back
>>   to contexts (as in RE23).  The current terminology is confusing.
>
> I don't care about this either way.

Then I will.  I still confuse myself sometimes, and I certainly would if
I came back to this code next year.

>> (...)
> Never use malloc unless absolutely necessary. We have enough problems
> with heap fragmentation already.

Aha.  OK.

>> * Scheduling: If several threads call pool_pause(), then once there is a
>>   pause tpool does not schedule them all.  They could get handled then,
>>   or another thread could undo the pause so tpool would wait to pause
>>   again.  Is that deliberate?
>
> I don't understand the question. The point of the pause is to prevent
> any other thread (in the pool) from running. Why should tpool schedule
> any other threads at this point?

Two threads call pool_pause().  Eventually the pool gets paused, and one
pool_pause() call returns.  When that thread calls pool_resume(), the
other thread waiting in pool_pause() may or may not get scheduled.
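
To make the sequence concrete, here is a sketch against the tpool
pause/resume calls (do_paused_work() is just a placeholder, and the task
shape follows what pool_submit() expects):

  /* Sketch: two tasks in the same pool both want an exclusive pause. */
  static void *
  pausing_task( void *ctx, void *arg )
  {
      ldap_pvt_thread_pool_t *pool = arg;

      /* Both tasks block here; only one returns when the pool pauses. */
      ldap_pvt_thread_pool_pause( pool );

      do_paused_work();     /* placeholder for the exclusive work */

      /* Now the other waiting pool_pause() may be scheduled next, or
       * ordinary queued tasks may run first, so that tpool has to wait
       * to pause all over again. */
      ldap_pvt_thread_pool_resume( pool );
      return NULL;
  }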

>>   - pool_context() breaks if several ldap_pvt_thread_t bit patterns can
>>     represent the same thread ID: TID_HASH() would give different hashes
>>     for the same thread, and pool_context() stops searching when it hits
>>     a slot with ctx == NULL.  (My previous bug report was a bit wrong.)
>
> How can a single thread ID be represented by multiple bit patterns?

A struct/union with padding bytes seems the least unlikely possibility.

Or a pointer type on hardware where several address representations map
to the same physical address, with the compiler handling that by
normalizing pointers when they are compared/subtracted.  Like DOS "huge"
pointers would have been if the compiler had normalized them when
comparing them instead of when incrementing/decrementing them: 20-bit
address bus, 32-bit pointers, physical address = 16 * <segment half of
pointer> + <offset half of pointer>.

Or, not that I take this too seriously: a pthread implementation which
uses e.g. a 32-bit integer type for thread IDs but just ignores the top
22 bits.  The POSIX spec seems to allow that, since thread IDs are only
to be compared with pthread_equal().
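
For illustration, a byte-summing hash in the spirit of TID_HASH() (my
paraphrase, not the exact macro in tpool.c):

  /* Paraphrase of the byte-summing idea; not the exact macro. */
  static unsigned
  tid_hash( ldap_pvt_thread_t tid )
  {
      unsigned char *p = (unsigned char *) &tid;
      unsigned i, h;

      for ( i = 0, h = 0; i < sizeof( tid ); i++ )
          h += p[i];
      return h;
  }

If ldap_pvt_thread_t is a struct with padding, or an integer whose unused
bits may vary, two IDs that pthread_equal() says are equal can still hash
to different values, and the linear search in pool_context() then stops
at the first NULL ctx before it finds the entry.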

>>     The best fix would be to use real thread-specific data instead.
>>     Just one key with the ctx for now, that minimizes the changes.  OTOH
>>     it also means we'll do thread-specific key lookup twice - first in
>>     pthread to get the ctx, and then in ltu_key[] to find our data.
>>     Anyway, I said I'd do that later but might as well get on with it,
>>     at least for pthreads.  Except I don't know if that's OS-dependent
>>     enough that it should wait for RE24?
>
> The best fix may be to just use the address of the thread's stack as the
> thread ID. That's actually perfect for our purpose.

How does a function called from somewhere inside the thread know that
address?  That's almost what the user context is, and thus what
thread_keys[] maps the thread ID to.
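
For reference, the one-key pthread variant I meant looks roughly like
this (sketch only; the two helper names are made up, only
ldap_pvt_thread_pool_context() is a real entry point):

  #include <pthread.h>

  static pthread_key_t ltp_ctx_key;     /* one TSD key, holds the ctx */

  static void
  pool_tsd_init( void )                 /* once, at pool startup */
  {
      pthread_key_create( &ltp_ctx_key, NULL );
  }

  static void
  pool_tsd_set_ctx( void *ctx )         /* in each handler thread */
  {
      pthread_setspecific( ltp_ctx_key, ctx );
  }

  void *
  ldap_pvt_thread_pool_context( void )  /* no thread_keys[] search needed */
  {
      return pthread_getspecific( ltp_ctx_key );
  }

The ltu_key[] lookup inside the ctx stays as it is, which is the doubled
lookup mentioned above.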

>>   - thread_keys is a poor hash table implementation with a poor hash
>>     function.
>
> Ideally it would not be a hash table at all, but a direct lookup based
> on the thread ID.

If we can.  See above.  Anyway, I don't know why I keep kvetching about
the hash table implementation.  I'll just make it chained.  As I finally
figured out, that can be done without mallocs.  Solves the multi-pool
problem too.
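
Roughly what I mean by chaining without malloc (a sketch; the struct,
field and function names are invented, the bucket count is just for
illustration, and locking is omitted):

  #define HASH_BUCKETS 64               /* illustration only */

  typedef struct thread_entry {
      ldap_pvt_thread_t    te_tid;      /* this thread's ID   */
      void                *te_ctx;      /* its user context   */
      struct thread_entry *te_next;     /* hash chain link    */
  } thread_entry_t;

  static thread_entry_t *thread_buckets[HASH_BUCKETS];

  /* The entry is embedded in the handler thread's own context,
   * so linking it into a bucket allocates nothing. */
  static void
  thread_entry_add( thread_entry_t *te, unsigned hash )
  {
      hash %= HASH_BUCKETS;
      te->te_next = thread_buckets[hash];
      thread_buckets[hash] = te;
  }

  static thread_entry_t *
  thread_entry_find( ldap_pvt_thread_t tid, unsigned hash )
  {
      thread_entry_t *te;

      for ( te = thread_buckets[hash % HASH_BUCKETS]; te; te = te->te_next )
          if ( pthread_equal( te->te_tid, tid ) )  /* pthread backend */
              break;
      return te;
  }

One such entry per pool per thread would also cover the multi-pool case.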

>> * I wonder if there are long-lived ldap_pvt_thread_cond_wait(),
>>   [r]mutex_lock() or rdwr_[rw]lock() calls which should make the thread
>>   go inactive first (after waiting for any pause to finish), so that
>>   (a) a pool_pause() will not block waiting for them, and
>>   (b) all threads in the pool can't be blocked in long waits without
>>       tpool knowing about it.
>
> Long-lived lock calls would be a quite noticeable bug. This seems to be
> a non-issue.

Well, it would have to involve code which is very rarely executed, or
unusual configurations.  But yes, I feel relaxed enough about it.

>> * back-bdb/tools.c uses pool_submit() to start long-lived threads:
>>   (...)
> Tool threads are completely separate from slapd threads. This is not an
> issue.

Fine.  I don't understand that in this context, but I'll take your word
for it:-)

>> * slapd/bconfig.c maintains its own idea of the max number of threads
>>   in connection_pool_max and slap_tool_thread_max, which does not match
>>   tpool's idea if the limit is <=0 or larger than LDAP_MAXTHR.
>>
>>   Since slapd acts on its own count, I imagine that can cause breakage.
>>   Simplest fix might be to set max #threads and then read it back,
>>   I haven't checked.  Would need to return LDAP_MAXTHR instead of 0
>>   for "as many as allowed".  Note however also the issue above:
>>   tpool's idea of actually available threads can be too high too.
>
> The practical limits are actually pretty low. Even if you're on a 1024
> processor machine with many terabytes of RAM, it's not a good idea to
> have many thousands of threads... Scheduler overhead would overshadow
> any useful work.

OK.  I forgot how much RAM a slapd thread requires.  But then it won't
hurt to give slapd a max limit lower than LDAP_MAXTHR either.
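
Purely hypothetically, the set-and-read-back fix could look like this,
if ldap_pvt_thread_pool_maxthreads() were changed to hand back the
effective (clamped) limit instead of just a status:

  /* Hypothetical: assumes ldap_pvt_thread_pool_maxthreads() returned
   * the clamped, effective limit instead of just a status code. */
  static void
  sync_pool_max( ldap_pvt_thread_pool_t *pool, int *maxp )
  {
      int effective = ldap_pvt_thread_pool_maxthreads( pool, *maxp );

      if ( effective > 0 )
          *maxp = effective;    /* keep slapd's own count in sync */
  }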

>> * I said my new ltp_pause waits might be too aggressive.  Maybe I should
>>   clarify: I don't see any instances where they can be dropped, but it's
>>   possible that slapd expects pauses to be less aggressive somewhere.
>
> I don't understand this comment.

Clarifying the clarification: I don't know slapd well enough to know if
the extra pauses do something bad to it.

>>   Oh, the comment /* no open threads at all?!? */:  That's not strange.
>>   Happens in the first pool_submit() call(s) if thread_create() fails.
>>   Though in that case there was no danger of removing the wrong ctx
>>   from ltp_pending_list after all (fixed in rev 1.63).
>
> (Sure, but thread_create() should never fail if you have a sane
> configuration.)

And if the rest of the machine doesn't run wild.  I imagine
thread_create() can fail in the same situations where fork() can fail on
some OSes.

-- 
Regards,
Hallvard