Back-shell zombies with surrogate parent - another patch (ITS#2140)



Full_Name: Cris Bailiff
Version: 2.1.5
OS: Solaris 2.6
URL: ftp://ftp.openldap.org/incoming/cris-bailiff-021016.patch
Submission from: (NULL) (203.16.202.34)


Whilst testing back_shell with the surrogate parent code on Solaris 2.6 (with
threading), I found I was getting zombies. (See ITS#1973 and ITS#2109 for other
similar reports.)

The surrogate parent was a big improvement on the hanging I was getting before
using it, so I'm a little surprised to see it being removed again in 2.1.6, but
I digress.

The issue seemed quite straightforward - truss shows that nothing in the
surrogate parent process is handling SIGCHLD (or otherwise wait()ing for
children), even though an appropriate signal handler seems to be set up in
main.c before the surrogate is forked off.

I added the attached patch, which duplicates the signal handler from
slapd/main.c in back-shell/fork.c. I wanted a small, quickly working patch, so
I didn't try to make exportable handler routines that could be reused in
back-shell; I just cut and pasted all the parts I needed into fork.c.

This works well for me, on Solaris 2.6 with threads, and has been hit with some
very heavy test loads without failing a connection or fork, and without leaving
any zombies. I'm using a (very) small C routine for the 'backend-backend', so I
get quite a high fork rate!

The patch can be applied to 2.1.5 as-is for anyone in immediate need. For 2.1.6
or head, I'm not sure if it's enough to fix the problem with the surrogate
parent code removed.

Please consider maintaining back-shell as a priority, as it's a very useful
feature of OpenLDAP! Certainly, I think it's important for it to be stable and
efficient across releases!

Cheers,
Cris