[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#3567) slapd terminates unexpectedly



Hello Howard,

Am 08.03.2005 um 20:05 schrieb Howard Chu:
> Dagobert Michelsen wrote:
>> I used
>>  truss -d -l -f -vpoll -u\!back_perl-2.2 -ua.out:: -o  
>> ldap-remapper-2005-03-08.truss -p 5988
>> because the backend messages cluttered up the log. The output can be  
>> found at
>>   
>> http://www.familie-michelsen.de/its3567/ldap-remapper-2005-03 
>> -08.truss.bz2 [3.3M]
>>
> Here is an excerpt from the truss showing the failure:
> 5988/2:        203.2408    poll(0x7EFFEED8, 7, -1)                 
> Err#89 ENOSYS
> 5988/2:            fd=5  ev=POLLRDNORM rev=0
> 5988/2:            fd=7  ev=POLLRDNORM rev=0
> 5988/2:            fd=8  ev=POLLRDNORM rev=0
> 5988/2:            fd=14 ev=POLLRDNORM rev=0
> 5988/2:            fd=16 ev=POLLRDNORM rev=POLLIN
> 5988/2:            fd=19 ev=POLLRDNORM rev=0
> 5988/2:            fd=22 ev=POLLRDNORM rev=0
> 5988/4:        203.2409    close(14)                    = 0
> 5988/2:        203.2411    write(2, " d a e m o n :   s e l e".., 53)   
>   = 53
>
> Here's what happened earlier on; the crucial part is to notice the  
> thread IDs and watch what happened to descriptor #14:
> 5988/2:        202.9282    time()                        = 1110276036
> 5988/3@3:    202.9284        -> sl_calloc(0x1, 0x18, 0x6c0250, 0x95dac)
> 5988/2:        202.9285    write(2, " d a e m o n :   s e l e".., 51)   
>   = 51
> 5988/3@3:    202.9286          -> sl_malloc(0x18, 0x6c0250, 0x0, 0x0)
> 5988/2:        202.9287    write(2, " d a e m o n :   s e l e".., 51)   
>   = 51
> 5988/3@3:    202.9289          <- sl_malloc() = 0x6d584c
> 5988/2:        202.9290    poll(0x7EFFEED8, 7, -1)                = 3
> 5988/2:            fd=5  ev=POLLRDNORM rev=0
> 5988/2:            fd=7  ev=POLLRDNORM rev=0
> 5988/2:            fd=8  ev=POLLRDNORM rev=POLLRDNORM
> 5988/2:            fd=16 ev=POLLRDNORM rev=0
> 5988/2:            fd=19 ev=POLLRDNORM rev=POLLRDNORM
> 5988/2:            fd=22 ev=POLLRDNORM rev=0
> 5988/2:            fd=27 ev=POLLRDNORM rev=POLLRDNORM
> 5988/3@3:    202.9291        <- sl_calloc() = 0x6d584c
> 5988/2:        202.9293    write(2, " d a e m o n :   a c t i".., 34)   
>   = 34
> 5988/3@3:    202.9295        -> slap_sl_free(0x6d584c, 0x6c0250,  
> 0x96134, 0x0)
> 5988/5:        202.9247    close(14)                    = 0
> 5988/3@3:    202.9298        <- slap_sl_free() = 0x6d584c
> 5988/2:        202.9303    accept(8, 0x7EFFFD30, 0x7EFFF914, 1)         
> = 14
> 5988/3@3:    202.9303        -> slap_sl_free(0x6d582c, 0x6c0250,  
> 0x96134, 0x0)
> 5988/3@3:    202.9306        <- slap_sl_free() = 0x6d582c
> 5988/5:        202.9309    fcntl(28, F_DUP2FD, 0x0000000E)             
> = 14
> 5988/2:        202.9310    setsockopt(14, SOL_SOCKET, SO_KEEPALIVE,  
> 0x7EFFF0E8, 4, 1) = 0
> 5988/3@3:    202.9311        -> ad_inlist(0x809e88, 0x6d565c, 0xcec00,  
> 0xce800)
> 5988/2:        202.9316    setsockopt(14, tcp, TCP_NODELAY,  
> 0x7EFFF0E8, 4, 1) = 0
> 5988/2:        202.9318    write(2, " d a e m o n :   n e w  ".., 29)   
>   = 29
> 5988/2@2:    202.9320    -> connection_init(0xe, 0xf8628, 0xcbe28,  
> 0x7efff0f0)
> 5988/5:        202.9310    close(28)                    = 0
> 5988/3@3:    202.9324        <- ad_inlist() = 1
> 5988/5:        202.9324    llseek(14, 0, SEEK_CUR)                 
> Err#29 ESPIPE
> 5988/3@3:    202.9327        -> access_allowed(0x82be78, 0xbcdd00,  
> 0x809e88, 0xbce518)
> 5988/5:        202.9329    close(14)                    = 0
>
> At time 202.9303 the slapd listener thread (#2) accepted an incoming  
> connection, and that was assigned to fd 14. At time 202.9309 thread #5  
> explicitly dup'd descriptor 28 onto #14. Thread #5 is your perl code  
> (which you can verify by noting the message logged at 202.9019). There  
> are interleaved operations on descriptor 14 between thread 2 and 5  
> after that point. Ultimately fd 14 is closed by the perl code.
>
> As I suspected at the outset, the problem is in your back-perl code,  
> not in slapd. slapd cannot be expected to defend itself from user code  
> closing its file descriptors arbitrarily, and signalling the problem  
> by terminating as it does is the safest thing it can do. This ITS will  
> be closed.

First thanks for the in-depth analysis. However, for me it looks like  
fd #14 is opened straight before the select you mentioned and closed  
without errors right after the poll:

5988/4:         203.2398        putmsg(4, 0x7DFFEFB0, 0x7DFFEFA4, 0)     
         = 0
5988/3@3:       203.2400            <- access_allowed() = 1
5988/4:         203.2401        open("/var/run/syslog_door", O_RDONLY)   
         = 14         <---
5988/2:         203.2346        time()                                   
         = 1110276036
5988/4:         203.2404        door_info(14, 0x7DFFEEE8)                
         = 0
5988/2:         203.2406        write(2, " d a e m o n :   s e l e"..,  
51)      = 51
5988/4:         203.2406        getpid()                                 
         = 5988 [5979]
5988/2:         203.2407        write(2, " d a e m o n :   s e l e"..,  
51)      = 51
5988/4:         203.2408        door_call(14, 0x7DFFEED0)                
         = 0
5988/2:         203.2408        poll(0x7EFFEED8, 7, -1)                  
         Err#89 ENOSYS
5988/2:                 fd=5  ev=POLLRDNORM rev=0
5988/2:                 fd=7  ev=POLLRDNORM rev=0
5988/2:                 fd=8  ev=POLLRDNORM rev=0
5988/2:                 fd=14 ev=POLLRDNORM rev=0
5988/2:                 fd=16 ev=POLLRDNORM rev=POLLIN
5988/2:                 fd=19 ev=POLLRDNORM rev=0
5988/2:                 fd=22 ev=POLLRDNORM rev=0
5988/4:         203.2409        close(14)                                
         = 0
5988/2:         203.2411        write(2, " d a e m o n :   s e l e"..,  
53)      = 53

However there is one more thing I don't understand: if my Perl code  
runs in thread #5 and the fd is duped onto fd #14 in this thread, why  
should openldap bother about issueing select on it? Additionally, the  
perl code never closes a fd explicitly, only perl filehandles and  
Net::LDAP are used.


Best regards

   -- Dagobert