[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#6276) paused pool can deadlock if writers are waiting

This is a quick abstract answer, I haven't dived into the code to see
what my suggestions would mean in practice.  Anyway:

It sounds to me like this problem cannot be fully prevented when network
operations share the same pool queue as everything else, and we have
operations that can wait for other operations.  All pool threads can be
occupied with LDAP-level operations, so that network-level operations
which the LDAP-level operations depend on can get blocked.  If slapd has
a design which even tries to guarantee forward progress, I'm not aware
of it.

So LDAP-level operations ought to leave at least one thread free for for
network-level operations.  Not necessary a designated thread: A thread
moving from network-level to LDAP-level operation like
connection_read_thread() does could first check that at least one other
thread remains available for network-level operation.

Beyond that, my immediate reaction was that pauses are implemented at
the wrong level and/or need to be split up in different types of pauses.
There is no good reason cn=config's need to have slapd config variables
for itself to affect network operations - nor, I hope, affect pool-level
operations like pool_purgekey().

E.g. cn=config could use a slapd_config_pause() call to ask for lone
access to the config variables instead of thread_pool_pause() call.
Slapd operations that must respect such pauses should thus call a
slapd_config_pausecheck() macro/function, not thread_pool_pausecheck()
or depend on the pool to respect slapd-level pauses.
connection_read_thread() can then go ahead and use the connection
independently of those pauses.  If it reaches connection_operation(),
it'll need to check for slapd pause.

Alternatively, with different types of pauses/operations: The pool could
have different sets of pauses (a pause-type bitmask?) and if one type of
pauses happens, operations that ignore that type of pause will not be
affected.  This messes up the pool code further though, and I already
hate what pauses do to the otherwise clean pool code:-(