[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#3567) slapd terminates unexpectedly



Dagobert Michelsen wrote:

>Hello,
>  
>
>>This will show the select call getting mapped to poll() like the following. In ITS 3400, we saw that the pollset as built from the select() call had a bit set for an FD that Solaris thinks is closed. (My suspicsion is slapd lost track of FDs somehow and left one set for an FD that it had closed.)
>>    
>>
Looks like Chris was close.

>
>I used
>  truss -d -l -f -vpoll -u\!back_perl-2.2 -ua.out:: -o ldap-remapper-2005-03-08.truss -p 5988
>because the backend messages cluttered up the log. The output can be found at
>  http://www.familie-michelsen.de/its3567/ldap-remapper-2005-03-08.truss.bz2 [3.3M]
>  
>
Here is an excerpt from the truss showing the failure:
5988/2:        203.2408    poll(0x7EFFEED8, 7, -1)                Err#89 
ENOSYS
5988/2:            fd=5  ev=POLLRDNORM rev=0
5988/2:            fd=7  ev=POLLRDNORM rev=0
5988/2:            fd=8  ev=POLLRDNORM rev=0
5988/2:            fd=14 ev=POLLRDNORM rev=0
5988/2:            fd=16 ev=POLLRDNORM rev=POLLIN
5988/2:            fd=19 ev=POLLRDNORM rev=0
5988/2:            fd=22 ev=POLLRDNORM rev=0
5988/4:        203.2409    close(14)                    = 0
5988/2:        203.2411    write(2, " d a e m o n :   s e l e".., 53)    
= 53

Here's what happened earlier on; the crucial part is to notice the 
thread IDs and watch what happened to descriptor #14:
5988/2:        202.9282    time()                        = 1110276036
5988/3@3:    202.9284        -> sl_calloc(0x1, 0x18, 0x6c0250, 0x95dac)
5988/2:        202.9285    write(2, " d a e m o n :   s e l e".., 51)    
= 51
5988/3@3:    202.9286          -> sl_malloc(0x18, 0x6c0250, 0x0, 0x0)
5988/2:        202.9287    write(2, " d a e m o n :   s e l e".., 51)    
= 51
5988/3@3:    202.9289          <- sl_malloc() = 0x6d584c
5988/2:        202.9290    poll(0x7EFFEED8, 7, -1)                = 3
5988/2:            fd=5  ev=POLLRDNORM rev=0
5988/2:            fd=7  ev=POLLRDNORM rev=0
5988/2:            fd=8  ev=POLLRDNORM rev=POLLRDNORM
5988/2:            fd=16 ev=POLLRDNORM rev=0
5988/2:            fd=19 ev=POLLRDNORM rev=POLLRDNORM
5988/2:            fd=22 ev=POLLRDNORM rev=0
5988/2:            fd=27 ev=POLLRDNORM rev=POLLRDNORM
5988/3@3:    202.9291        <- sl_calloc() = 0x6d584c
5988/2:        202.9293    write(2, " d a e m o n :   a c t i".., 34)    
= 34
5988/3@3:    202.9295        -> slap_sl_free(0x6d584c, 0x6c0250, 
0x96134, 0x0)
5988/5:        202.9247    close(14)                    = 0
5988/3@3:    202.9298        <- slap_sl_free() = 0x6d584c
5988/2:        202.9303    accept(8, 0x7EFFFD30, 0x7EFFF914, 1)        = 14
5988/3@3:    202.9303        -> slap_sl_free(0x6d582c, 0x6c0250, 
0x96134, 0x0)
5988/3@3:    202.9306        <- slap_sl_free() = 0x6d582c
5988/5:        202.9309    fcntl(28, F_DUP2FD, 0x0000000E)            = 14
5988/2:        202.9310    setsockopt(14, SOL_SOCKET, SO_KEEPALIVE, 
0x7EFFF0E8, 4, 1) = 0
5988/3@3:    202.9311        -> ad_inlist(0x809e88, 0x6d565c, 0xcec00, 
0xce800)
5988/2:        202.9316    setsockopt(14, tcp, TCP_NODELAY, 0x7EFFF0E8, 
4, 1) = 0
5988/2:        202.9318    write(2, " d a e m o n :   n e w  ".., 29)    
= 29
5988/2@2:    202.9320    -> connection_init(0xe, 0xf8628, 0xcbe28, 
0x7efff0f0)
5988/5:        202.9310    close(28)                    = 0
5988/3@3:    202.9324        <- ad_inlist() = 1
5988/5:        202.9324    llseek(14, 0, SEEK_CUR)                Err#29 
ESPIPE
5988/3@3:    202.9327        -> access_allowed(0x82be78, 0xbcdd00, 
0x809e88, 0xbce518)
5988/5:        202.9329    close(14)                    = 0

At time 202.9303 the slapd listener thread (#2) accepted an incoming 
connection, and that was assigned to fd 14. At time 202.9309 thread #5 
explicitly dup'd descriptor 28 onto #14. Thread #5 is your perl code 
(which you can verify by noting the message logged at 202.9019). There 
are interleaved operations on descriptor 14 between thread 2 and 5 after 
that point. Ultimately fd 14 is closed by the perl code.

As I suspected at the outset, the problem is in your back-perl code, not 
in slapd. slapd cannot be expected to defend itself from user code 
closing its file descriptors arbitrarily, and signalling the problem by 
terminating as it does is the safest thing it can do. This ITS will be 
closed.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support