

Kurt D. Zeilenga wrote:
At 12:02 AM 10/14/2005, Jong-Hyuk wrote:
Is there a particular reason why we monitor write events? I don't see any benefit. Eliminating one set of event sources would reduce our kernel load by half.
I want to ask the same question, too. What was the original rationale behind the listener's waking up the write-wait workers via condition variables?

Given this approach dates to Umich days, we can only guess at the answer.

One guess would be to reduce the number of active threads.
Note that, with the introduction of thread pools, this still
has some relevance.  It would be bad to allow N blocked writers
to consume all of the active threads.  But there are various
other ways of preventing this.

While just having the workers block on write is an option,
I would rather avoid letting workers block (on read or write),
as that necessitates relying on a large number of threads
to ensure the whole system doesn't block.  Many threading
subsystems do not cope well with large numbers of concurrent
threads.

ITS#3671 demonstrates that even with non-blocking writes we still have threads blocked waiting for writes. Whether the actual write() is blocking, or the thread is just waiting on the c_write_cv, the result is the same: any other thread trying to write on the same connection will also block, and eventually the thread pool will be used up.

It seems that there is some benefit to selecting for writes then, but the current implementation doesn't take advantage of it. When send_ldap_ber() gets EAGAIN (or if there are already writes queued on the connection), it should dup the outgoing ber, enqueue it, and let the original thread return immediately. The listener thread should then kick off a writer thread to process the queue as needed.

So, the desired approach is:
1: the listener thread listens for reads on wake_sds, listeners, and data connections, and listens for writes on data connections, with a timeout to allow for idle and runqueue processing.
2: if the thread pool gets full, the listener stops listening for reads on listeners and data connections, but continues to listen for reads on wake_sds and writes on data connections.
3: (optional) if a single connection has many writers queued, the listener should stop listening for reads on that connection. Currently we read the connection and put requests on the pending_ops queue. Maybe we should only stop listening for reads when the number of pending_ops hits a threshold, not the number of pending writes.

Dropping the readers from the event set (as in step 2) is very expensive using epoll() in its default mode, since it requires calling epoll_ctl() for each fd individually. (Once to remove it, and once again to reinsert it when we want to re-enable listening.) Using epoll() in Edge Triggered mode only partially mitigates this problem; it won't wake us up for old events, but will keep on waking up for new events.

The only practical alternative here is to use cascaded epoll sets: one dedicated to the listeners / data readers, and a main one for the wake_sds and the data writers. The listener epoll fd will itself be added to or removed from the main epoll fd. When we want to mask off reads, we remove the listener epoll fd from the main fd set. When we want to monitor reads, we add it back in, and then we have to go through a second-level loop to find out which readers are active.

This is ugly and stupid, but that's life using epoll(). (And no, I never had time to write the equeue implementation, though it would surely come in handy right now.)


 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sun        http://highlandsun.com/hyc
 OpenLDAP Core Team            http://www.openldap.org/project/