[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
Re: (ITS#3567) slapd terminates unexpectedly
Hello Howard,
Am 08.03.2005 um 20:05 schrieb Howard Chu:
> Dagobert Michelsen wrote:
>> I used
>> truss -d -l -f -vpoll -u\!back_perl-2.2 -ua.out:: -o
>> ldap-remapper-2005-03-08.truss -p 5988
>> because the backend messages cluttered up the log. The output can be
>> found at
>>
>> http://www.familie-michelsen.de/its3567/ldap-remapper-2005-03
>> -08.truss.bz2 [3.3M]
>>
> Here is an excerpt from the truss showing the failure:
> 5988/2: 203.2408 poll(0x7EFFEED8, 7, -1)
> Err#89 ENOSYS
> 5988/2: fd=5 ev=POLLRDNORM rev=0
> 5988/2: fd=7 ev=POLLRDNORM rev=0
> 5988/2: fd=8 ev=POLLRDNORM rev=0
> 5988/2: fd=14 ev=POLLRDNORM rev=0
> 5988/2: fd=16 ev=POLLRDNORM rev=POLLIN
> 5988/2: fd=19 ev=POLLRDNORM rev=0
> 5988/2: fd=22 ev=POLLRDNORM rev=0
> 5988/4: 203.2409 close(14) = 0
> 5988/2: 203.2411 write(2, " d a e m o n : s e l e".., 53)
> = 53
>
> Here's what happened earlier on; the crucial part is to notice the
> thread IDs and watch what happened to descriptor #14:
> 5988/2: 202.9282 time() = 1110276036
> 5988/3@3: 202.9284 -> sl_calloc(0x1, 0x18, 0x6c0250, 0x95dac)
> 5988/2: 202.9285 write(2, " d a e m o n : s e l e".., 51)
> = 51
> 5988/3@3: 202.9286 -> sl_malloc(0x18, 0x6c0250, 0x0, 0x0)
> 5988/2: 202.9287 write(2, " d a e m o n : s e l e".., 51)
> = 51
> 5988/3@3: 202.9289 <- sl_malloc() = 0x6d584c
> 5988/2: 202.9290 poll(0x7EFFEED8, 7, -1) = 3
> 5988/2: fd=5 ev=POLLRDNORM rev=0
> 5988/2: fd=7 ev=POLLRDNORM rev=0
> 5988/2: fd=8 ev=POLLRDNORM rev=POLLRDNORM
> 5988/2: fd=16 ev=POLLRDNORM rev=0
> 5988/2: fd=19 ev=POLLRDNORM rev=POLLRDNORM
> 5988/2: fd=22 ev=POLLRDNORM rev=0
> 5988/2: fd=27 ev=POLLRDNORM rev=POLLRDNORM
> 5988/3@3: 202.9291 <- sl_calloc() = 0x6d584c
> 5988/2: 202.9293 write(2, " d a e m o n : a c t i".., 34)
> = 34
> 5988/3@3: 202.9295 -> slap_sl_free(0x6d584c, 0x6c0250,
> 0x96134, 0x0)
> 5988/5: 202.9247 close(14) = 0
> 5988/3@3: 202.9298 <- slap_sl_free() = 0x6d584c
> 5988/2: 202.9303 accept(8, 0x7EFFFD30, 0x7EFFF914, 1)
> = 14
> 5988/3@3: 202.9303 -> slap_sl_free(0x6d582c, 0x6c0250,
> 0x96134, 0x0)
> 5988/3@3: 202.9306 <- slap_sl_free() = 0x6d582c
> 5988/5: 202.9309 fcntl(28, F_DUP2FD, 0x0000000E)
> = 14
> 5988/2: 202.9310 setsockopt(14, SOL_SOCKET, SO_KEEPALIVE,
> 0x7EFFF0E8, 4, 1) = 0
> 5988/3@3: 202.9311 -> ad_inlist(0x809e88, 0x6d565c, 0xcec00,
> 0xce800)
> 5988/2: 202.9316 setsockopt(14, tcp, TCP_NODELAY,
> 0x7EFFF0E8, 4, 1) = 0
> 5988/2: 202.9318 write(2, " d a e m o n : n e w ".., 29)
> = 29
> 5988/2@2: 202.9320 -> connection_init(0xe, 0xf8628, 0xcbe28,
> 0x7efff0f0)
> 5988/5: 202.9310 close(28) = 0
> 5988/3@3: 202.9324 <- ad_inlist() = 1
> 5988/5: 202.9324 llseek(14, 0, SEEK_CUR)
> Err#29 ESPIPE
> 5988/3@3: 202.9327 -> access_allowed(0x82be78, 0xbcdd00,
> 0x809e88, 0xbce518)
> 5988/5: 202.9329 close(14) = 0
>
> At time 202.9303 the slapd listener thread (#2) accepted an incoming
> connection, and that was assigned to fd 14. At time 202.9309 thread #5
> explicitly dup'd descriptor 28 onto #14. Thread #5 is your perl code
> (which you can verify by noting the message logged at 202.9019). There
> are interleaved operations on descriptor 14 between thread 2 and 5
> after that point. Ultimately fd 14 is closed by the perl code.
>
> As I suspected at the outset, the problem is in your back-perl code,
> not in slapd. slapd cannot be expected to defend itself from user code
> closing its file descriptors arbitrarily, and signalling the problem
> by terminating as it does is the safest thing it can do. This ITS will
> be closed.
First thanks for the in-depth analysis. However, for me it looks like
fd #14 is opened straight before the select you mentioned and closed
without errors right after the poll:
5988/4: 203.2398 putmsg(4, 0x7DFFEFB0, 0x7DFFEFA4, 0)
= 0
5988/3@3: 203.2400 <- access_allowed() = 1
5988/4: 203.2401 open("/var/run/syslog_door", O_RDONLY)
= 14 <---
5988/2: 203.2346 time()
= 1110276036
5988/4: 203.2404 door_info(14, 0x7DFFEEE8)
= 0
5988/2: 203.2406 write(2, " d a e m o n : s e l e"..,
51) = 51
5988/4: 203.2406 getpid()
= 5988 [5979]
5988/2: 203.2407 write(2, " d a e m o n : s e l e"..,
51) = 51
5988/4: 203.2408 door_call(14, 0x7DFFEED0)
= 0
5988/2: 203.2408 poll(0x7EFFEED8, 7, -1)
Err#89 ENOSYS
5988/2: fd=5 ev=POLLRDNORM rev=0
5988/2: fd=7 ev=POLLRDNORM rev=0
5988/2: fd=8 ev=POLLRDNORM rev=0
5988/2: fd=14 ev=POLLRDNORM rev=0
5988/2: fd=16 ev=POLLRDNORM rev=POLLIN
5988/2: fd=19 ev=POLLRDNORM rev=0
5988/2: fd=22 ev=POLLRDNORM rev=0
5988/4: 203.2409 close(14)
= 0
5988/2: 203.2411 write(2, " d a e m o n : s e l e"..,
53) = 53
However there is one more thing I don't understand: if my Perl code
runs in thread #5 and the fd is duped onto fd #14 in this thread, why
should openldap bother about issueing select on it? Additionally, the
perl code never closes a fd explicitly, only perl filehandles and
Net::LDAP are used.
Best regards
-- Dagobert