
Re: SLAP_LIGHTWEIGHT_LISTENER



Jong-Hyuk wrote:
The title sounds like a tabloid headline... :-)

I must have been watching too much Fox News lately.....

Alternatively we should stop monitoring write events and just let the writers unblock themselves. Is there a particular reason why we monitor write events? I don't see any benefit. Eliminating one set of event sources would reduce our kernel load by half.

I want to ask the same question, too. What was the original rationale behind the listener waking up the write-wait workers via a condition variable? If there's no compelling reason to have the listener wake up the writers (this certainly has nothing to do with concurrency improvement), we should consider the idea of relying on the writers themselves and the OS to unblock.

There is one case to consider, but it's weak. By centralizing like this, the listener thread can detect slap_sig_shutdown and tell all the waiting threads to give up. But in fact this is unreliable; we've seen that a writer can still block even though the sockets are set nonblocking (due to network errors, pulling out the network cable, etc.). As such, this is of questionable benefit.


We currently set the sockets to nonblocking because ber_get_next() was being called in the main thread, but we don't need to do that now. With blocking sockets we would no longer need to test for EAGAIN/EWOULDBLOCK in either the reader or the writer. We need to leave the sockets blocking; otherwise we would have to add a select loop to the writer threads, and that seems like a lot of unnecessary overhead.
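To illustrate the point about blocking sockets, here is a minimal sketch of a writer loop on a blocking descriptor. It is not slapd code; write_all_blocking() is a hypothetical helper name. The only retry case left is EINTR, and there is no EAGAIN/EWOULDBLOCK branch or select loop, because write() itself sleeps until the kernel can accept more data.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not an OpenLDAP function: write all len bytes to a
 * blocking descriptor. Returns 0 on success, -1 on error with errno set. */
static int write_all_blocking(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: just retry */
            return -1;      /* real error; no EAGAIN case on a blocking fd */
        }
        buf += n;           /* partial write: advance past what was sent */
        len -= (size_t)n;
    }
    return 0;
}
```

The trade-off is the one discussed above: a thread inside write() cannot be told to give up by the listener, so shutdown has to rely on the descriptor being closed or the write failing.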

Aside from this issue there is definitely a bug in the current implementation; I see the same event being submitted multiple times in rapid succession. The CPU usage goes to 100% and there does not appear to be any end condition that disables the event. This occurs most often in test033, but should occur in any test that uses syncrepl (or listener-managed client tasks like syncrepl). It happens after the syncrepl task has sent a search request to the provider and the first reply arrives, marking the socket readable. It appears this readable state is not getting reset.


Will look into this further.

This is with epoll, by the way. When it happens, I see that slap_daemon.sd_index[sd] == -1, which should not be happening. The slapd_suspend() call can't do anything without a valid index, so the event is never cleared/suspended.
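For anyone who wants to reproduce the general effect outside slapd, here is a small Linux-only sketch of level-triggered epoll behavior. It does not show slapd's internals; watch_readable(), suspend_readable() and poll_once() are hypothetical names. The point is only that a readable descriptor keeps being reported on every epoll_wait() until the data is read or the event mask is cleared, so if the suspend step is skipped the listener will spin.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helpers for illustration only; not slapd code. */

/* Register fd for level-triggered read events. */
static int watch_readable(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Clear the event mask so the fd is no longer reported as readable. */
static int suspend_readable(int epfd, int fd)
{
    struct epoll_event ev = { .events = 0, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Return the number of ready events (0 or 1) without blocking. */
static int poll_once(int epfd)
{
    struct epoll_event ev;
    return epoll_wait(epfd, &ev, 1, 0);
}
```

With one unread byte pending on a watched pipe, poll_once() reports the descriptor on every call; after suspend_readable() it reports nothing, even though the byte is still unread.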


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sun        http://highlandsun.com/hyc
 OpenLDAP Core Team            http://www.openldap.org/project/