[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#6276) paused pool can deadlock if writers are waiting



Hallvard B Furuseth wrote:
> This is a quick abstract answer, I haven't dived into the code to see
> what my suggestions would mean in practice.  Anyway:

> It sounds to me like this problem cannot be fully prevented when network
> operations share the same pool queue as everything else, and we have
> operations that can wait for other operations.  All pool threads can be
> occupied with LDAP-level operations, so that network-level operations
> which the LDAP-level operations depend on can get blocked.  If slapd has
> a design which even tries to guarantee forward progress, I'm not aware
> of it.
>
> So LDAP-level operations ought to leave at least one thread free for for
> network-level operations.  Not necessary a designated thread: A thread
> moving from network-level to LDAP-level operation like
> connection_read_thread() does could first check that at least one other
> thread remains available for network-level operation.

Yes...

> Beyond that, my immediate reaction was that pauses are implemented at
> the wrong level and/or need to be split up in different types of pauses.
> There is no good reason cn=config's need to have slapd config variables
> for itself to affect network operations - nor, I hope, affect pool-level
> operations like pool_purgekey().

I suppose implementing things this way can appear to be too blunt. But the 
amount of locks required to do it at a finer level is, IMO, unmanageable. The 
fact that you can change global slapd config parameters (such as the number of 
threads, size of sockbuf buffers, etc.) makes it inherently safe for 
*anything* else to be active while config changes are being made. And sifting 
thru each variable to decide how sensitive they are, and when they are unsafe, 
will inevitably lead to requiring locks on every single piece of configuration 
data. (Want to look up an attribute type, or objectclass? Or a database 
suffix? etc. etc. etc... There are countless things we do arbitrarily that 
simply won't work if we allow arbitrary threads to continue to run while their 
configurations are changed from under them.)

> E.g. cn=config could use a slapd_config_pause() call to ask for lone
> access to the config variables instead of thread_pool_pause() call.
> Slapd operations that must respect such pauses should thus call a
> slapd_config_pausecheck() macro/function, not thread_pool_pausecheck()
> or depend on the pool to respect slapd-level pauses.
> connection_read_thread() can then go ahead and use the connection
> independently of those pauses.  If it reaches connection_operation(),
> it'll need to check for slapd pause.

Yes, I've thought about this approach too. That makes the pause mechanisms 
even messier.

Another possibility is to just try to read 1 byte in the listener thread, to 
detect the hangup there when we have no other means to discover it. We would 
have to make sure to be able to unget this byte back to the bottom of the 
sockbuf stack if there's valid data. This will affect the throughput of the 
listener thread, but it may not be too terrible a hit.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/