[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#4108) LWL causing core dumps in HEAD



hyc@symas.com wrote:
> Running slapd on a single processor machine, I no longer see any 
> asserts. Running on my dual machine, if syslog is enabled, I 
> occasionally see an assert in slapd_set_read() called from 
> connection_read(). With syslog disabled, this problem doesn't occur. No 
> bright ideas come to mind at the moment...
Some context:
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87548 op=1 UNBIND
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87548 fd=15 closed
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 fd=15 ACCEPT from 
IP=192.168.1.22:23726 (IP=0.0.0.0:9000)
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 op=0 BIND 
dn="cn=Sandi Couto,ou=peons,o=foo" method=128
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 op=0 BIND 
dn="cn=Sandi Couto,ou=Peons,o=foo" mech=SIMPLE ssf=0
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 op=0 RESULT 
tag=97 err=0 text=
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 op=1 UNBIND
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 fd=15 closed
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87550 fd=15 ACCEPT from 
IP=192.168.1.22:23727 (IP=0.0.0.0:9000)

daemon: listen=11, new connection on 15
daemon: added 15r
daemon: activity on 2 descriptors
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 1 descriptor
daemon: activity on: 15r
daemon: read active on 15
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 2 descriptors
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 1 descriptor
daemon: activity on: 15r
daemon: read active on 15
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 1 descriptor
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 2 descriptors
daemon: activity on: 15r
daemon: read active on 15
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 busy
daemon: removing 15
daemon: listen=11, new connection on 15
slapd.head: ../../../head/servers/slapd/daemon.c:567: slapd_set_read: 
Assertion `((slap_daemon.sd_index[(s)]) != -1)' failed.
daemon: activity on 1 descriptor

Program received signal SIGABRT, Aborted.
[Switching to Thread 1090517344 (LWP 10355)]
0x00002aaaab6e76cd in raise () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x00002aaaab6e76cd in raise () from /lib64/tls/libc.so.6
#1  0x00002aaaab6e8c7e in abort () from /lib64/tls/libc.so.6
#2  0x00002aaaab6e1420 in __assert_fail () from /lib64/tls/libc.so.6
#3  0x000000000042e7a3 in slapd_set_read (s=15, wake=1)
    at ../../../head/servers/slapd/daemon.c:567
#4  0x00000000004369bf in connection_read (s=15, op=0x40fff480)
    at ../../../head/servers/slapd/connection.c:1665
#5  0x0000000000435efe in connection_read_thread (ctx=0x40fff4c0, argv=0xf)
    at ../../../head/servers/slapd/connection.c:1419
#6  0x0000000000556184 in ldap_int_thread_pool_wrapper (xpool=0x788710)
    at ../../../head/libraries/libldap_r/tpool.c:615
#7  0x00002aaaab5ab649 in start_thread () from /lib64/tls/libpthread.so.0
#8  0x00002aaaab7716c3 in thread_start () from /lib64/tls/libc.so.6
(gdb) info thr
  10 Thread 1149237600 (LWP 10362)  0x00002aaaab5ad55a in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
  9 Thread 1140848992 (LWP 10361)  0x00002aaaab5ad55a in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
  8 Thread 1132460384 (LWP 10360)  0x00002aaaab5ad55a in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
  7 Thread 1124071776 (LWP 10359)  0x00002aaaab5ad55a in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
  6 Thread 1115683168 (LWP 10358)  0x00002aaaab5af8eb in 
__lll_mutex_lock_wait
    () from /lib64/tls/libpthread.so.0
  5 Thread 1107294560 (LWP 10357)  0x00002aaaab5af8eb in 
__lll_mutex_lock_wait
    () from /lib64/tls/libpthread.so.0
  4 Thread 1098905952 (LWP 10356)  0x00002aaaab5ad55a in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
* 3 Thread 1090517344 (LWP 10355)  0x00002aaaab6e76cd in raise ()
   from /lib64/tls/libc.so.6
  2 Thread 1082128736 (LWP 10337)  0x00002aaaab5af8eb in 
__lll_mutex_lock_wait
    () from /lib64/tls/libpthread.so.0
  1 Thread 46912511001792 (LWP 10328)  0x00002aaaab5abd2f in pthread_join ()
   from /lib64/tls/libpthread.so.0
(gdb) thr 6
[Switching to thread 6 (Thread 1115683168 (LWP 10358))]#5  
0x000000000043108a in slap_listener (sl=0x2aaaab71c520) at 
../../../head/servers/slapd/daemon.c:1519
1519            slapd_add( s, 1, NULL );
(gdb) bt
#0  0x00002aaaab5af8eb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000000000000000 in ?? ()
#2  0x0000000000000008 in ?? ()
#3  0x00002aaaab5ac877 in pthread_mutex_lock () from 
/lib64/tls/libpthread.so.0
#4  0x00002aaaab76ddc8 in syslog () from /lib64/tls/libc.so.6
#5  0x000000000043108a in slap_listener (sl=0x2aaaab71c520)
    at ../../../head/servers/slapd/daemon.c:1519
(gdb) thr 5
[Switching to thread 5 (Thread 1107294560 (LWP 10357))]#0  
0x00002aaaab5af8eb in __lll_mutex_lock_wait () from 
/lib64/tls/libpthread.so.0
(gdb) bt 20
#0  0x00002aaaab5af8eb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000000000766fe8 in slap_counters ()
#2  0x0000000000000000 in ?? ()
#3  0x00002aaaab5ac877 in pthread_mutex_lock () from 
/lib64/tls/libpthread.so.0
#4  0x000000000082e7c0 in ?? ()
#5  0x0000000041fff290 in ?? ()
#6  0x000000000058a3e9 in ber_memcalloc (n=46912527370128, s=1107293072)
    at ../../../head/libraries/liblber/memory.c:306
#7  0x00000000004593b9 in do_bind (op=0x2aaaadf02f20, rs=0x41fff3e0)
    at ../../../head/servers/slapd/bind.c:54
#8  0x0000000000435a98 in connection_operation (ctx=0x41fff4c0,
    arg_v=0x2aaaadf02f20) at ../../../head/servers/slapd/connection.c:1292
#9  0x0000000000556184 in ldap_int_thread_pool_wrapper (xpool=0x788710)
    at ../../../head/libraries/libldap_r/tpool.c:615
#10 0x00002aaaab5ab649 in start_thread () from /lib64/tls/libpthread.so.0
#11 0x00002aaaab7716c3 in thread_start () from /lib64/tls/libc.so.6
(gdb) thr 2
[Switching to thread 2 (Thread 1082128736 (LWP 10337))]#0  
0x00002aaaab5af8eb in __lll_mutex_lock_wait () from 
/lib64/tls/libpthread.so.0
(gdb) bt 20
#0  0x00002aaaab5af8eb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000000000000000 in ?? ()
#2  0x0000000000000000 in ?? ()
#3  0x00002aaaab5ac877 in pthread_mutex_lock () from 
/lib64/tls/libpthread.so.0
#4  0x0000000000000001 in ?? ()
#5  0x00000000005962ce in __PRETTY_FUNCTION__.11610 ()
#6  0x0000000000000000 in ?? ()
#7  0x0000000000435ed3 in connection_client_stop (s=0)
    at ../../../head/servers/slapd/connection.c:1404
#8  0x00000000004319a1 in slapd_daemon_task (ptr=0x0)
    at ../../../head/servers/slapd/daemon.c:1734
#9  0x00002aaaab5ab649 in start_thread () from /lib64/tls/libpthread.so.0
#10 0x00002aaaab7716c3 in thread_start () from /lib64/tls/libc.so.6
#11 0x0000000000000000 in ?? ()
#12 0x0000000000000000 in ?? ()
#13 0x0000000000000000 in ?? ()
#14 0x0000000000000000 in ?? ()
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0x0000000000000000 in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
(More stack frames follow...)
(gdb) frame 8
#8  0x00000000004319a1 in slapd_daemon_task (ptr=0x0)
    at ../../../head/servers/slapd/daemon.c:1734
1734                    ldap_pvt_thread_mutex_lock( &slap_daemon.sd_mutex );
(gdb) thr 3
[Switching to thread 3 (Thread 1090517344 (LWP 10355))]#0  
0x00002aaaab6e76cd in raise () from /lib64/tls/libc.so.6
(gdb) frame 4
#4  0x00000000004369bf in connection_read (s=15, op=0x40fff480)
    at ../../../head/servers/slapd/connection.c:1665
1665            slapd_set_read( s, 1 );
(gdb) p *c
$13 = {c_struct_state = 2, c_conn_state = 3, c_conn_idx = 3,
  c_close_reason = 0x596d1b "?", c_mutex = {__m_reserved = 2, __m_count = 0,
    __m_owner = 0x100002873, __m_kind = 0, __m_lock = {__status = 0,
      __spinlock = 0}}, c_sb = 0x2aaaadc02d90, c_starttime = 1130586974,
  c_activitytime = 1130586974, c_connid = 87550, c_peer_domain = {bv_len 
= 7,
    bv_val = 0x2aaaadf03460 "unknown"}, c_peer_name = {bv_len = 21,
    bv_val = 0x2aaaadf034e0 "IP=192.168.1.22:23727"}, c_listener = 0x76a260,
  c_sasl_bind_in_progress = 0, c_sasl_bind_mech = {bv_len = 0, bv_val = 
0x0},
  c_sasl_dn = {bv_len = 0, bv_val = 0x0}, c_sasl_authz_dn = {bv_len = 0,
    bv_val = 0x0}, c_authz_backend = 0x0, c_authz_cookie = 0x0, c_authz = {
    sai_method = 128, sai_mech = {bv_len = 0, bv_val = 0x0}, sai_dn = {
      bv_len = 0, bv_val = 0x0}, sai_ndn = {bv_len = 0, bv_val = 0x0},
    sai_ssf = 0, sai_transport_ssf = 0, sai_tls_ssf = 0, sai_sasl_ssf = 0},
  c_protocol = 0, c_ops = {stqh_first = 0x2aaaadf02f20,
    stqh_last = 0x2aaaadf03078}, c_pending_ops = {stqh_first = 0x0,
    stqh_last = 0x2aaaac8788a8}, c_write_mutex = {__m_reserved = 0,
    __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0,
      __spinlock = 0}}, c_write_cv = {__c_lock = {__status = 8589934592,
      __spinlock = 1}, __c_waiting = 0x1,
    __padding = "\001\000\000\000\000\000\000\000\220\207\207»*\000",
    __align = 0}, c_currentber = 0x2aaaadc00930, c_writewaiter = 0,
  c_is_tls = 0, c_needs_tls_accept = 0, c_sasl_layers = 0, c_sasl_done = 0,
  c_sasl_authctx = 0x2aaaadf00b40, c_sasl_sockctx = 0x0,
  c_sasl_extra = 0x2aaaadf03630, c_sasl_bindop = 0x0, 
c_pagedresults_state = {
    ps_be = 0x0, ps_size = 0, ps_cookie = 0, ps_count = 0},
  c_n_ops_received = 1, c_n_ops_executing = 1, c_n_ops_pending = 0,
  c_n_ops_completed = 0, c_n_get = 1, c_n_read = 1, c_n_write = 0, c_pb 
= 0x0,
  c_extensions = 0x0, c_clientfunc = 0, c_clientarg = 0x0,
  c_send_ldap_result = 0x4490d9 <slap_send_ldap_result>,
  c_send_search_entry = 0x449e34 <slap_send_search_entry>,
  c_send_search_reference = 0x44c0ac <slap_send_search_reference>,
  c_send_ldap_extended = 0x449973 <slap_send_ldap_extended>,
  c_send_ldap_intermediate = 0x449c15 <slap_send_ldap_intermediate>}
(gdb)

What really doesn't make sense here is that the reader thread that 
crashed shouldn't even be running yet. I.e., the debug output shows that 
fd 15 was just removed from slapd control. Then a new connection was 
accepted, using that same fd. But the listener thread hasn't added 15 
back into slapd control yet - that's the slapd_add() call that happens 
right after the StatsLog invocation. So, fd 15 really is still invalid 
at this point. But since it is invalid, there should not be any active 
reader threads using that descriptor yet. Notice that thread 5 has 
already read a Bind operation from that fd, and the conn->c_conn_state 
is SLAP_C_BINDING. Again, it should not be possible for a reader thread 
to have already gotten this far, since the accepting thread hasn't 
enabled the descriptor yet.

The only possibility that comes to mind is that this reader thread is 
leftover from the previous invocation. I.e., it was submitted into the 
pool while the previous connection was active, and the connection was 
closed before the thread got to run. Since the listener thread has just 
called connection_init(), the conn structure is valid when the reader 
thread runs. But since the listener hasn't called slapd_add() yet, the 
socket is still invalid.

Moving the slapd_add() so that it occurs while the conn structure is 
locked appears to fix this problem.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/