[Date Prev][Date Next] [Chronological] [Thread] [Top]

Fix for back-shell child process deadlocks (ITS#2262)



Full_Name: Simon Brady
Version: 2.0.27
OS: Red Hat Linux 7.3 (x86), Solaris 8 10/01 (Sparc)
URL: ftp://ftp.openldap.org/incoming/simon-brady-030109.patch
Submission from: (NULL) (139.80.123.1)


Shell subprocesses spawned by back-shell can deadlock due to other subprocesses
holding open file descriptors on their pipes. The observed symptom is that two
or more shell subprocesses receive all their expected input but then block
waiting for the pipe connected to their stdin to be closed. lsof reveals that
one or more other shell subprocesses have file descriptors other than stdout
open for write which are connected to the pipe the stalled subprocess is taking
its stdin from.

My site has recently observed this issue on heavily loaded Solaris 8 10/01
(Sparc) and RH Linux 7.x (x86) servers running OpenLDAP 2.0.27. We have tracked
it to a race condition around the forkandexec() routine in back-shell, and I
have uploaded a patch against HEAD to ftp.openldap.org/incoming which made the
problem Go Away on these servers (when applied to 2.0.27). I believe the patch
obviates the need to implement a surrogate parent, and it may even make
back-shell safe for use --with-threads.

The race condition is as follows: threads A and B in the parent slapd process
call forkandexec() reasonably close together, such that both threads execute the
calls to pipe() prior to either of them calling fork(). Consider the write end
of the parent-to-child pipes: suppose that in thread A p2c[1] = 10 and in thread
B p2c[1] = 14. After the fork() each child immediately closes its copy of
p2c[1], and the parent threads close theirs after writing. However, each child
gets a copy of ALL its parent process's file descriptors open at the time of the
fork(). This means that child A has fd 14 opened for writing, and child B has fd
10. Since neither child knows about this extra descriptor it never closes it, so
both input pipes remain stuck open.

My proposed fix is to add a mutex shared across all shell backends such that
forkandexec() can't be called while another thread is writing.

There are two caveats (beyond the usual) to this patch. First, I believe that
the comment in fork.c about "child may deadlock due to resources locked by the
parent" actually refers to this bug and is therefore misleading. I may be wrong.
Second, the patch calls ldap_pvt_thread_mutex_init in the child to clear its
copy of the parent's write mutex - as the comment above the call notes, this may
be incorrect for threading libraries other than pthreads.

Finally, I appreciate that back-shell was not designed to be thread-safe (i.e.
I've read ITS #2160). If you decide to reject this patch on those grounds, then
I implore you to please document back-shell's non-safety in the stable release.
I have been unable to find anything in the Admin Guide, READMEs or source which
might have warned me of this limitation, and as a result I discovered it the
hard way... Thanks!