Re: (ITS#4943) tpool.c pause vs. finish
> firstname.lastname@example.org wrote:
>> * When a thread is finishing, make it go active for context_reset().
>> Should make slapi module destructors safer, and I don't see that it
>> can hurt.
> I don't understand this.
Let me dig a bit back in the ITS...
"context_reset() can call connection_fake_destroy(), which via
slapi_int_free_object_extensions() calls slapi module destructors,
which can do who-knows-what."
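To illustrate what I mean, a rough sketch (the struct layout and function names here are made up for illustration, not the real tpool.c ones): the finishing thread counts itself as active around the destructor calls, so a concurrent pool_pause() has to wait for them instead of racing whatever the slapi destructors do.

```c
#include <pthread.h>
#include <assert.h>

/* Illustrative stand-ins for the pool state; not the tpool.c structs. */
struct pool {
    pthread_mutex_t mutex;
    int             active;
};

static void context_reset(void *ctx)
{
    /* would call slapi module destructors via
     * slapi_int_free_object_extensions() - can do who-knows-what */
    (void)ctx;
}

static void thread_finish(struct pool *p, void *ctx)
{
    pthread_mutex_lock(&p->mutex);
    p->active++;                /* go active: a pause now waits for us */
    pthread_mutex_unlock(&p->mutex);

    context_reset(ctx);         /* destructors run unpaused but counted */

    pthread_mutex_lock(&p->mutex);
    p->active--;
    pthread_mutex_unlock(&p->mutex);
}
```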
>> * I'd like to rename thread contexts to tasks(?) and user contexts back
>> to contexts (as in RE23). The current terminology is confusing.
> I don't care about this either way.
Then I will. I still confuse myself sometimes, and I certainly would if
I came back to this code next year.
> Never use malloc unless absolutely necessary. We have enough problems
> with heap fragmentation already.
>> * Scheduling: If several threads call pool_pause(), then once there is a
>> pause tpool does not schedule them all. They could get handled then,
>> or another thread could undo the pause so tpool would wait to pause
>> again. Is that deliberate?
> I don't understand the question. The point of the pause is to prevent
> any other thread (in the pool) from running. Why should tpool schedule
> any other threads at this point?
Two threads call pool_pause(). Eventually the pool gets paused, and one
pool_pause() call returns. When that thread calls pool_resume(), the
other thread waiting in pool_pause() may or may not get scheduled.
>> - pool_context() breaks if several ldap_pvt_thread_t bit patterns can
>> represent the same thread ID: TID_HASH() would give different hashes
>> for the same thread, and pool_context() stops searching when it hits
>> a slot with ctx == NULL. (My previous bug report was a bit wrong.)
> How can a single thread ID be represented by multiple bit patterns?
A struct/union with padding bytes seems the least unlikely possibility.
Or a pointer on hardware where several address representations map to the
same physical address, with the compiler handling that by normalizing
pointers when they are compared/subtracted. Like DOS "huge" pointers
would have been if the compiler normalized them when comparing them
instead of when incrementing/decrementing them: 20-bit address bus,
32-bit pointers, physical address = 16 * <segment half of pointer> +
<offset half of pointer>.
Or, not that I take this too seriously: A pthread implementation which
uses e.g. a 32-bit integer type for thread IDs but just ignores the top
22 bits. The POSIX spec seems to allow that, since thread IDs are to be
compared with pthread_equal().
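For the record, the failure mode is easy to state in code. A sketch (tid_hash here is a stand-in for what TID_HASH() effectively does, not the real macro): POSIX only promises pthread_equal(), so if a thread ID can have several bit patterns, hashing the raw bytes can put the same thread in different slots.

```c
#include <pthread.h>
#include <string.h>
#include <assert.h>

/* Stand-in for TID_HASH(): hash the raw bytes of the pthread_t.
 * If the type contains padding or ignored bits, two bit patterns
 * that pthread_equal() treats as the same thread can hash to
 * different values - and pool_context() stops searching at the
 * first slot with ctx == NULL, so the lookup silently fails. */
static unsigned tid_hash(pthread_t tid)
{
    unsigned char bytes[sizeof tid];
    unsigned h = 0;
    size_t i;
    memcpy(bytes, &tid, sizeof tid);
    for (i = 0; i < sizeof tid; i++)
        h = h * 31 + bytes[i];
    return h;
}

/* The only comparison POSIX actually guarantees: */
static int same_thread(pthread_t a, pthread_t b)
{
    return pthread_equal(a, b);  /* equal hashes are NOT implied */
}
```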
>> The best fix would be to use real thread-specific data instead.
>> Just one key with the ctx for now, that minimizes the changes. OTOH
>> it also means we'll do thread-specific key lookup twice - first in
>> pthread to get the ctx, and then in ltu_key to find our data.
>> Anyway, I said I'd do that later but might as well get on with it,
>> at least for pthreads. Except I don't know if that's OS-dependent
>> enough that it should wait for RE24?
> The best fix may be to just use the address of the thread's stack as the
> thread ID. That's actually perfect for our purpose.
How does a function called from somewhere inside the thread know that
address? That's almost what the user context is, and thus what
thread_keys maps the thread ID to.
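What I mean by real thread-specific data, roughly (ltp_ctx_key and the function names are hypothetical, just to show the shape, not a proposed API): one pthread key holding the ctx, so pool_context() becomes a direct pthread_getspecific() lookup with no TID_HASH() search at all.

```c
#include <pthread.h>
#include <stddef.h>
#include <assert.h>

/* One key holding the thread's ctx; names are hypothetical. */
static pthread_key_t  ltp_ctx_key;
static pthread_once_t ltp_ctx_once = PTHREAD_ONCE_INIT;

static void ltp_ctx_key_init(void)
{
    pthread_key_create(&ltp_ctx_key, NULL);
}

/* Called once when a pool thread starts up. */
static void ltp_set_ctx(void *ctx)
{
    pthread_once(&ltp_ctx_once, ltp_ctx_key_init);
    pthread_setspecific(ltp_ctx_key, ctx);
}

/* pool_context() reduces to this: no hash, no thread_keys scan. */
static void *ltp_get_ctx(void)
{
    pthread_once(&ltp_ctx_once, ltp_ctx_key_init);
    return pthread_getspecific(ltp_ctx_key);
}
```

The downside stated above still holds: finding our per-thread data then costs two key lookups, first ltp_ctx_key in pthread, then ltu_key inside the ctx.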
>> - thread_keys is a poor hash table implementation with a poor hash
> Ideally it would not be a hash table at all, but a direct lookup based
> on the thread ID.
If we can. See above. Anyway, I don't know why I keep kvetching about
the hash table implementation. I'll just make it chained. As I finally
figured out, that can be done without mallocs. Solves the multi-pool
issue as well.
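Roughly the chaining I have in mind (names and layout made up for illustration, not the actual tpool.c structs): the chain link lives inside each ctx, so chaining costs no allocation, and a collision can no longer make the lookup stop early at a NULL slot.

```c
#include <pthread.h>
#include <stddef.h>
#include <assert.h>

#define HASH_SIZE 128                   /* assumed table size */

struct ctx {
    pthread_t    tid;
    struct ctx  *hash_next;             /* intrusive link: no malloc */
    /* ... per-thread state ... */
};

static struct ctx *thread_keys[HASH_SIZE];

/* Caller computes the slot from its TID_HASH(); collisions are now
 * harmless because we chain instead of probing until ctx == NULL. */
static void ctx_insert(unsigned slot, struct ctx *c)
{
    c->hash_next = thread_keys[slot];
    thread_keys[slot] = c;
}

static struct ctx *ctx_lookup(unsigned slot, pthread_t tid)
{
    struct ctx *c;
    for (c = thread_keys[slot]; c != NULL; c = c->hash_next)
        if (pthread_equal(c->tid, tid))  /* the portable comparison */
            return c;
    return NULL;
}
```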
>> * I wonder if there are long-lived ldap_pvt_thread_cond_wait(),
>> [r]mutex_lock() or rdwr_[rw]lock() calls which should make the thread
>> go inactive first (after waiting for any pause to finish), so that
>> (a) a pool_pause() will not block waiting for them, and
>> (b) all threads in the pool can't be blocked in long waits without
>> tpool knowing about it.
> Long-lived lock calls would be a quite noticeable bug. This seems to be
> a non-issue.
Well, it would have to involve code that is very rarely executed, or
unusual configurations. But yes, I feel relaxed enough about it.
>> * back-bdb/tools.c uses pool_submit() to start long-lived threads:
> Tool threads are completely separate from slapd threads. This is not an
> issue.
Fine. I don't understand that in this context, but I'll take your word
for it.
>> * slapd/bconfig.c maintains its own idea of the max number of threads
>> in connection_pool_max and slap_tool_thread_max, which does not match
>> tpool's idea if the limit is <=0 or larger than LDAP_MAXTHR.
>> Since slapd acts on its own count, I imagine that can cause breakage.
>> Simplest fix might be to set max #threads and then read it back,
>> I haven't checked. Would need to return LDAP_MAXTHR instead of 0
>> for "as many as allowed". Note however also the issue above:
>> tpool's idea of actually available threads can be too high too.
> The practical limits are actually pretty low. Even if you're on a 1024
> processor machine with many terabytes of RAM, it's not a good idea to
> have many thousands of threads... Scheduler overhead would overshadow
> any useful work.
OK. I forgot how much RAM a slapd thread requires. But then it won't
hurt to give slapd a max limit lower than LDAP_MAXTHR either.
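The "set max #threads and then read it back" idea in miniature (hypothetical names; assuming LDAP_MAXTHR is tpool's hard cap): tpool applies its own clamping on set, and slapd then trusts what tpool reports instead of its raw configured value, so the two counts can't diverge.

```c
#include <assert.h>

#define LDAP_MAXTHR 1024   /* assumption: tpool's hard cap */

static int pool_max;       /* stand-in for the pool's max-thread count */

/* <=0 means "as many as allowed"; values above the cap are clamped.
 * Note this returns LDAP_MAXTHR rather than 0 for "unlimited", as
 * suggested above, so the caller gets a usable number back. */
static void pool_set_max(int n)
{
    if (n <= 0 || n > LDAP_MAXTHR)
        n = LDAP_MAXTHR;
    pool_max = n;
}

static int pool_get_max(void)
{
    return pool_max;
}
```

bconfig.c would then do pool_set_max(connection_pool_max) followed by connection_pool_max = pool_get_max(), instead of keeping its own unclamped copy.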
>> * I said my new ltp_pause waits might be too aggressive. Maybe I should
>> clarify: I don't see any instances where they can be dropped, but it's
>> possible that slapd expects pauses to be less aggressive somewhere.
> I don't understand this comment.
Clarifying the clarification: I don't know slapd well enough to know if
the extra pauses do something bad to it.
>> Oh, the comment /* no open threads at all?!? */: That's not strange.
>> Happens in the first pool_submit() call(s) if thread_create() fails.
>> Though in that case there was no danger of removing the wrong ctx
>> from ltp_pending_list after all (fixed in rev 1.63).
> (Sure, but thread_create() should never fail if you have a sane
> configuration.)
And if the rest of the machine doesn't run wild. I imagine
thread_create() can fail under the same conditions that make fork()
fail on some OSes.