Kurt D. Zeilenga wrote:
At 12:02 AM 10/14/2005, Jong-Hyuk wrote:
>> ITS#3671 demonstrates that even with non-blocking writes we still have
>> threads blocked waiting for writes. Whether the actual write() is
>> blocking, or the thread is just waiting on the c_write_cv, the result
>> is the same: any other threads trying to write on the same connection
>> will also block, and eventually the thread pool will be used up.
>> Is there a particular reason why we monitor write events? I don't see
>> any benefit. Eliminating one set of event sources would reduce our
>> kernel load by half.
> I want to ask the same question, too. What was the original rationale
> behind the listener waking up the write-waiting workers via a condition
> variable?
Given that this approach dates to UMich days, we can only guess at the
rationale. One guess would be that it reduces the number of active
threads. Note that, with the introduction of thread pools, this still
has some relevance: it would be bad to allow N blocked writers to
consume all the active threads. But there are various other ways of
preventing this.
While just having the workers block on write is an option, I would
rather avoid letting workers block (on read or write), as this
necessitates relying on a large number of threads to ensure the whole
system doesn't block. Many threading subsystems do not handle large
numbers of concurrent threads well.
It seems that there is some benefit to selecting for writes, then, but
the current implementation doesn't take advantage of it. When
send_ldap_ber() gets EAGAIN (or when there are already queued writes),
it should dup the outgoing ber, enqueue it, and let the original thread
return immediately. The listener thread should kick off a writer thread
to process the queue as needed.
So, the desired approach is:
1: the listener thread listens for reads on wake_sds, listeners, and
data connections, and listens for writes on data connections, with a
timeout to allow for idle and runqueue processing.
2: if the thread pool gets full, the listener stops listening for
reads on listeners and data connections, but continues to listen for
reads on wake_sds and writes on data connections.
3: (optional) if a single connection has many pending writes queued, the
listener should stop listening for reads on that connection. Currently
we read the connection and put requests on the pending_ops queue. Maybe
we should only stop listening for reads when the number of pending_ops
hits a threshold, not the number of pending writes.
Dropping the readers from the event set (as in step 2) is very expensive
using epoll() in its default mode, since it requires calling epoll_ctl()
for each fd individually. (Once to remove it, and once again to reinsert
it when we want to re-enable listening.) Using epoll() in Edge Triggered
mode only partially mitigates this problem; it won't wake us up again
for old events, but it will keep waking us up for new ones. The only
practical
alternative here is to use cascaded epoll sets, one dedicated to the
listeners / data readers, and a main one for the wake_sd and the data
writers. The listener epoll fd will itself be added or removed from the
main epoll fd. When we want to mask off reads, we remove the listener
epoll fd from the main fd set. When we want to monitor reads, we add it
back in, and then we have to go through a second level loop to find out
which readers are active. This is ugly and stupid, but that's life using
epoll(). (And no, I never had time to write the equeue implementation,
though it would surely come in handy right now.)
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/