Re: SLAP_LIGHTWEIGHT_LISTENER, using lazy_sem is a bad design
The title sounds like a tabloid headline... :-)
Howard Chu wrote:
This solution will deadlock if all worker threads are stuck in a write
wait. Since the semaphore completely blocks the listener thread, there
will be no way to wake up the waiting writers and free up more
threads. Also, blocking the listener thread like this prevents the
idletimeout checker from working. I.e., you have managed to disable
two key mechanisms for returning server resources to the pool,
precisely when they are needed the most.
The listener thread must never block, period.
It would be better to simply have ldap_pvt_thread_pool_submit return a
result code (e.g. LDAP_BUSY if the submitted op will be queued because
there are no available workers, LDAP_SUCCESS otherwise) that is passed
back to the listener thread. When the listener thread gets this result
it should drop all read descriptors from the event set, but keep
monitoring the wake_sds and the write events.
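To make the proposal concrete, here is a minimal sketch of the control flow, not the actual slapd code. Everything in it is hypothetical: the toy_* names, the TOY_SUCCESS/TOY_BUSY codes (standing in for LDAP_SUCCESS/LDAP_BUSY) and the EV_* interest bits are invented for illustration, and the pool is reduced to two counters. The point is only that submit never blocks, and that the listener reacts to a busy result by dropping read interest while keeping wake and write interest.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes, modelled on the LDAP_SUCCESS / LDAP_BUSY idea. */
enum toy_rc { TOY_SUCCESS = 0, TOY_BUSY = 1 };

/* Hypothetical event-interest bits tracked by the listener. */
enum { EV_READ = 1, EV_WRITE = 2, EV_WAKE = 4 };

/* Toy stand-in for the thread pool: it only counts workers and queued ops. */
struct toy_pool {
    int max_workers;
    int busy_workers;
    int queued;
};

struct toy_listener {
    unsigned interest; /* bitmask of EV_* the listener currently monitors */
};

/* Never blocks. Returns TOY_BUSY when the op had to be queued
 * because no worker was available. */
enum toy_rc toy_pool_submit(struct toy_pool *p)
{
    if (p->busy_workers < p->max_workers) {
        p->busy_workers++;
        return TOY_SUCCESS;
    }
    p->queued++;
    return TOY_BUSY;
}

/* A worker finished its op: it takes a queued op if there is one,
 * otherwise it goes idle. Returns true when the pool has a free worker. */
bool toy_pool_release(struct toy_pool *p)
{
    assert(p->busy_workers > 0);
    if (p->queued > 0)
        p->queued--;        /* same worker continues with the queued op */
    else
        p->busy_workers--;  /* worker becomes idle */
    return p->busy_workers < p->max_workers;
}

/* Listener reaction to the submit result: on TOY_BUSY stop watching
 * read descriptors, but keep the wake and write events. */
void toy_listener_on_submit(struct toy_listener *l, enum toy_rc rc)
{
    if (rc == TOY_BUSY)
        l->interest &= ~(unsigned)EV_READ;
}

/* Called when the listener is woken: re-enable reads once the pool
 * reports a free worker. */
void toy_listener_on_wake(struct toy_listener *l, bool pool_has_room)
{
    if (pool_has_room)
        l->interest |= EV_READ;
}
```

In this sketch the listener thread itself never waits on anything but its event set, so the write-wait wakeups and the idletimeout check can still run while the pool is saturated.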
Alternatively we should stop monitoring write events and just let the
writers unblock themselves. Is there a particular reason why we
monitor write events? I don't see any benefit. Eliminating one set of
event sources would reduce our kernel load by half.
I want to ask the same question, too. What was the original rationale
behind having the listener wake up the write-wait workers via a condition
variable? If there is no compelling reason to have the listener wake up
the writers (this certainly has nothing to do with improving
concurrency), we should consider relying on the writers themselves and
the OS to unblock.
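As a rough illustration of what "letting the writers unblock themselves" could look like, here is a sketch using plain POSIX calls. It is not taken from slapd: toy_write_all is a hypothetical helper, and the sketch assumes the socket is non-blocking and that waiting without a timeout is acceptable. The writer waits in poll() on its own descriptor, so no listener thread and no condition variable are involved in waking it.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes to fd.
 * Returns 0 on success, -1 on error with errno set. */
int toy_write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n >= 0) {
            done += (size_t)n;
            continue;
        }
        if (errno == EINTR)
            continue;               /* interrupted, just retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;              /* real error */

        /* The socket buffer is full: the writer itself sleeps in the
         * kernel until the descriptor becomes writable again. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
        int rc;
        do {
            rc = poll(&pfd, 1, -1); /* -1: no timeout (assumption) */
        } while (rc < 0 && errno == EINTR);
        if (rc < 0)
            return -1;
        /* On POLLERR or POLLHUP the next write() reports the error. */
    }
    return 0;
}
```

A real server would presumably pass a finite timeout to poll() so that a stuck writer can still be reclaimed; that part is left out here.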
Aside from this issue there is definitely a bug in the current
implementation; I see the same event being submitted multiple times in
rapid succession. The CPU usage goes to 100% and there does not appear
to be any end condition that disables the event. This occurs most
often in test033, but should occur in any test that uses syncrepl (or
listener-managed client tasks like syncrepl). It happens after the
syncrepl task has sent a search request to the provider and the first
reply arrives, marking the socket readable. It appears this readable state is not
Will look into this further.