[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
Re: (ITS#3567) slapd terminates unexpectedly
Dagobert Michelsen wrote:
>Hello,
>
>
>>This will show the select call getting mapped to poll() like the following. In ITS 3400, we saw that the pollset as built from the select() call had a bit set for an FD that Solaris thinks is closed. (My suspicsion is slapd lost track of FDs somehow and left one set for an FD that it had closed.)
>>
>>
Looks like Chris was close.
>
>I used
> truss -d -l -f -vpoll -u\!back_perl-2.2 -ua.out:: -o ldap-remapper-2005-03-08.truss -p 5988
>because the backend messages cluttered up the log. The output can be found at
> http://www.familie-michelsen.de/its3567/ldap-remapper-2005-03-08.truss.bz2 [3.3M]
>
>
Here is an excerpt from the truss showing the failure:
5988/2: 203.2408 poll(0x7EFFEED8, 7, -1) Err#89
ENOSYS
5988/2: fd=5 ev=POLLRDNORM rev=0
5988/2: fd=7 ev=POLLRDNORM rev=0
5988/2: fd=8 ev=POLLRDNORM rev=0
5988/2: fd=14 ev=POLLRDNORM rev=0
5988/2: fd=16 ev=POLLRDNORM rev=POLLIN
5988/2: fd=19 ev=POLLRDNORM rev=0
5988/2: fd=22 ev=POLLRDNORM rev=0
5988/4: 203.2409 close(14) = 0
5988/2: 203.2411 write(2, " d a e m o n : s e l e".., 53)
= 53
Here's what happened earlier on; the crucial part is to notice the
thread IDs and watch what happened to descriptor #14:
5988/2: 202.9282 time() = 1110276036
5988/3@3: 202.9284 -> sl_calloc(0x1, 0x18, 0x6c0250, 0x95dac)
5988/2: 202.9285 write(2, " d a e m o n : s e l e".., 51)
= 51
5988/3@3: 202.9286 -> sl_malloc(0x18, 0x6c0250, 0x0, 0x0)
5988/2: 202.9287 write(2, " d a e m o n : s e l e".., 51)
= 51
5988/3@3: 202.9289 <- sl_malloc() = 0x6d584c
5988/2: 202.9290 poll(0x7EFFEED8, 7, -1) = 3
5988/2: fd=5 ev=POLLRDNORM rev=0
5988/2: fd=7 ev=POLLRDNORM rev=0
5988/2: fd=8 ev=POLLRDNORM rev=POLLRDNORM
5988/2: fd=16 ev=POLLRDNORM rev=0
5988/2: fd=19 ev=POLLRDNORM rev=POLLRDNORM
5988/2: fd=22 ev=POLLRDNORM rev=0
5988/2: fd=27 ev=POLLRDNORM rev=POLLRDNORM
5988/3@3: 202.9291 <- sl_calloc() = 0x6d584c
5988/2: 202.9293 write(2, " d a e m o n : a c t i".., 34)
= 34
5988/3@3: 202.9295 -> slap_sl_free(0x6d584c, 0x6c0250,
0x96134, 0x0)
5988/5: 202.9247 close(14) = 0
5988/3@3: 202.9298 <- slap_sl_free() = 0x6d584c
5988/2: 202.9303 accept(8, 0x7EFFFD30, 0x7EFFF914, 1) = 14
5988/3@3: 202.9303 -> slap_sl_free(0x6d582c, 0x6c0250,
0x96134, 0x0)
5988/3@3: 202.9306 <- slap_sl_free() = 0x6d582c
5988/5: 202.9309 fcntl(28, F_DUP2FD, 0x0000000E) = 14
5988/2: 202.9310 setsockopt(14, SOL_SOCKET, SO_KEEPALIVE,
0x7EFFF0E8, 4, 1) = 0
5988/3@3: 202.9311 -> ad_inlist(0x809e88, 0x6d565c, 0xcec00,
0xce800)
5988/2: 202.9316 setsockopt(14, tcp, TCP_NODELAY, 0x7EFFF0E8,
4, 1) = 0
5988/2: 202.9318 write(2, " d a e m o n : n e w ".., 29)
= 29
5988/2@2: 202.9320 -> connection_init(0xe, 0xf8628, 0xcbe28,
0x7efff0f0)
5988/5: 202.9310 close(28) = 0
5988/3@3: 202.9324 <- ad_inlist() = 1
5988/5: 202.9324 llseek(14, 0, SEEK_CUR) Err#29
ESPIPE
5988/3@3: 202.9327 -> access_allowed(0x82be78, 0xbcdd00,
0x809e88, 0xbce518)
5988/5: 202.9329 close(14) = 0
At time 202.9303 the slapd listener thread (#2) accepted an incoming
connection, and that was assigned to fd 14. At time 202.9309 thread #5
explicitly dup'd descriptor 28 onto #14. Thread #5 is your perl code
(which you can verify by noting the message logged at 202.9019). There
are interleaved operations on descriptor 14 between thread 2 and 5 after
that point. Ultimately fd 14 is closed by the perl code.
As I suspected at the outset, the problem is in your back-perl code, not
in slapd. slapd cannot be expected to defend itself from user code
closing its file descriptors arbitrarily, and signalling the problem by
terminating as it does is the safest thing it can do. This ITS will be
closed.
--
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
http://www.symas.com http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support