[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#5835) master slapd dying on lost writes



--On Friday, November 28, 2008 7:40 PM +0000 quanah@zimbra.com wrote:

>
>
> --On November 28, 2008 11:35:45 AM -0800 Quanah Gibson-Mount
> <quanah@zimbra.com> wrote:

Latest backtrace, with the patch from HEAD about idletimeout, we get:

(gdb) bt
#0  0x0000003fca02e21d in raise () from /lib64/tls/libc.so.6
#1  0x0000003fca02fa1e in abort () from /lib64/tls/libc.so.6
#2  0x0000003fca027ae1 in __assert_fail () from /lib64/tls/libc.so.6
#3  0x000000000042b5c1 in connection_close (c=0x3b69520) at connection.c:877
#4  0x0000000000429e7d in connections_timeout_idle (now=1228167365) at 
connection.c:242
#5  0x0000000000428097 in slapd_daemon_task (ptr=0x0) at daemon.c:2039
#6  0x0000003fca90610a in start_thread () from /lib64/tls/libpthread.so.0
#7  0x0000003fca0c68c3 in clone () from /lib64/tls/libc.so.6
#8  0x0000000000000000 in ?? ()

Looking at the connection, we see:

(gdb) frame 3
#3  0x000000000042b5c1 in connection_close (c=0x3b69520) at connection.c:877
877     connection.c: No such file or directory.
        in connection.c
(gdb) print *c
$1 = {c_struct_state = 2, c_conn_state = 3, c_conn_idx = 28, c_close_reason 
= 0x4bea5b "?", c_mutex = {__m_reserved = 2, __m_count = 0, __m_owner = 
0x100002068, __m_kind = 0, __m_lock = {
      __status = 0, __spinlock = 0}}, c_sb = 0x13b4d860, c_starttime = 
1228167365, c_activitytime = 1228167365, c_connid = 3120, c_peer_domain = 
{bv_len = 7,
    bv_val = 0xdfac1b0 "unknown"}, c_peer_name = {bv_len = 23, bv_val = 
0x108dc558 "IP=192.168.58.225:53050"}, c_listener = 0x88df00, 
c_sasl_bind_in_progress = 0, c_sasl_bind_mech = {
    bv_len = 0, bv_val = 0x0}, c_sasl_dn = {bv_len = 0, bv_val = 0x0}, 
c_sasl_authz_dn = {bv_len = 0, bv_val = 0x0}, c_authz_backend = 0x0, 
c_authz_cookie = 0x0, c_authz = {
    sai_method = 128, sai_mech = {bv_len = 0, bv_val = 0x0}, sai_dn = 
{bv_len = 0, bv_val = 0x0}, sai_ndn = {bv_len = 0, bv_val = 0x0}, sai_ssf = 
0, sai_transport_ssf = 0,
    sai_tls_ssf = 0, sai_sasl_ssf = 0}, c_protocol = 0, c_ops = {stqh_first 
= 0x64abc00, stqh_last = 0x64abd58}, c_pending_ops = {stqh_first = 0x0, 
stqh_last = 0x3b69650},
  c_write_mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, 
__m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}, c_write_cv = 
{__c_lock = {__status = 3556232921088,
      __spinlock = 414}, __c_waiting = 0x19e, __padding = 
"\236\001\000\000\000\000\000\0008\225�\003\000\000\000", __align = 
0}, c_currentber = 0xe032000, c_writewaiter = 0,
  c_is_tls = 0, c_needs_tls_accept = 0, c_sasl_layers = 0, c_sasl_done = 0, 
c_sasl_authctx = 0xe035400, c_sasl_sockctx = 0x0, c_sasl_extra = 0x669cac8, 
c_sasl_bindop = 0x0,
  c_pagedresults_state = {ps_be = 0x0, ps_size = 0, ps_cookie = 0, ps_count 
= 0}, c_n_ops_received = 1, c_n_ops_executing = 1, c_n_ops_pending = 0, 
c_n_ops_completed = 0, c_n_get = 1,
  c_n_read = 1, c_n_write = 0, c_pb = 0x0, c_extensions = 0x0, c_clientfunc 
= 0, c_clientarg = 0x0, c_send_ldap_result = 0x43ea8c 
<slap_send_ldap_result>,
  c_send_search_entry = 0x43f787 <slap_send_search_entry>, 
c_send_search_reference = 0x44188d <slap_send_search_reference>, 
c_send_ldap_extended = 0x43f2fa <slap_send_ldap_extended>,
  c_send_ldap_intermediate = 0x43f57e <slap_send_ldap_intermediate>}

So this was file descriptor 28.

This is definitely the same assert as before in connection.c:

        assert( c->c_conn_state == SLAP_C_CLOSING );

being line 877.

Line 242 is:

                if( difftime( c->c_activitytime+global_idletimeout, now) < 
0 ) {
                        /* close it */
                        connection_closing( c, "idletimeout" );
-->                        connection_close( c );
                        i++;
                }

The log file shows:

Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=7 busy
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on:
Dec  2 03:06:04 ldap slapd[8292]:  28r
Dec  2 03:06:04 ldap slapd[8292]:
Dec  2 03:06:04 ldap slapd[8292]: daemon: read active on 28
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=7 busy
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on:
Dec  2 03:06:04 ldap slapd[8292]:
Dec  2 03:06:04 ldap slapd[8292]: daemon: hangup on 28
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=7 busy
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on:
Dec  2 03:06:04 ldap slapd[8292]:
Dec  2 03:06:04 ldap slapd[8292]: daemon: hangup on 28
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=7 busy
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on:
Dec  2 03:06:04 ldap slapd[8292]:
Dec  2 03:06:04 ldap slapd[8292]: daemon: hangup on 28
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=7 busy
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on:
Dec  2 03:06:04 ldap slapd[8292]:
Dec  2 03:06:04 ldap slapd[8292]: daemon: hangup on 28
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=7 busy
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on:
Dec  2 03:06:04 ldap slapd[8292]:
Dec  2 03:06:04 ldap slapd[8292]: daemon: hangup on 28
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=7 busy
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on:
Dec  2 03:06:04 ldap slapd[8292]:
Dec  2 03:06:04 ldap slapd[8292]: daemon: hangup on 28
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=7 busy
Dec  2 03:06:04 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:06:04 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:06:05 ldap slapd[8292]: daemon: activity on:
Dec  2 03:06:05 ldap slapd[8292]:
Dec  2 03:06:05 ldap slapd[8292]: daemon: hangup on 28
Dec  2 03:06:05 ldap slapd[8292]: connection_closing: readying conn=3049 
sd=28 for close
Dec  2 03:06:05 ldap slapd[8292]: connection_resched: attempting closing 
conn=3049 sd=28
Dec  2 03:06:05 ldap slapd[8292]: daemon: removing 28
Dec  2 03:06:05 ldap slapd[8292]: conn=3049 fd=28 closed (idletimeout)
Dec  2 03:06:05 ldap slapd[8292]: daemon: listen=7, new connection on 28
Dec  2 03:06:05 ldap slapd[8292]: daemon: added 28r (active) listener=(nil)
Dec  2 03:06:05 ldap slapd[8292]: conn=3120 fd=28 ACCEPT from 
IP=192.168.58.225:53050 (IP=192.168.58.179:389)
Dec  2 03:06:05 ldap slapd[8292]: conn=3120 op=0 BIND 
dn="uid=zimbra,cn=admins,cn=zimbra" method=128


And here's the start of the previous connection on descriptor 28:

Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=7 busy
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: daemon: listen=7, new connection on 28
Dec  2 03:03:53 ldap slapd[8292]: daemon: added 28r (active) listener=(nil)
Dec  2 03:03:53 ldap slapd[8292]: conn=3049 fd=28 ACCEPT from 
IP=192.168.61.35:57352 (IP=192.168.58.179:389)
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on 2 descriptors
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on:
Dec  2 03:03:53 ldap slapd[8292]:  28r
Dec  2 03:03:53 ldap slapd[8292]:
Dec  2 03:03:53 ldap slapd[8292]: daemon: read active on 28
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=7 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: conn=3049 op=0 STARTTLS
Dec  2 03:03:53 ldap slapd[8292]: conn=3049 op=0 RESULT oid= err=0 text=
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on:
Dec  2 03:03:53 ldap slapd[8292]:
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=7 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on:
Dec  2 03:03:53 ldap slapd[8292]:  28r
Dec  2 03:03:53 ldap slapd[8292]:
Dec  2 03:03:53 ldap slapd[8292]: daemon: read active on 28
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=7 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on:
Dec  2 03:03:53 ldap slapd[8292]:
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=7 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on:
Dec  2 03:03:53 ldap slapd[8292]:
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=7 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on:
Dec  2 03:03:53 ldap slapd[8292]:  28r
Dec  2 03:03:53 ldap slapd[8292]:
Dec  2 03:03:53 ldap slapd[8292]: daemon: read active on 28
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=7 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: conn=3049 op=1 BIND 
dn="uid=zmreplica,cn=admins,cn=zimbra" method=128
Dec  2 03:03:53 ldap slapd[8292]: conn=3049 op=1 BIND 
dn="uid=zmreplica,cn=admins,cn=zimbra" mech=SIMPLE ssf=0
Dec  2 03:03:53 ldap slapd[8292]: conn=3049 op=1 RESULT tag=97 err=0 text=
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on:
Dec  2 03:03:53 ldap slapd[8292]:
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=7 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on:
Dec  2 03:03:53 ldap slapd[8292]:  28r
Dec  2 03:03:53 ldap slapd[8292]:
Dec  2 03:03:53 ldap slapd[8292]: daemon: read active on 28
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=7 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: daemon: epoll: listen=8 active_threads=1 
tvp=zero
Dec  2 03:03:53 ldap slapd[8292]: conn=3049 op=2 SRCH base="cn=accesslog" 
scope=2 deref=0 filter="(&(objectClass=auditWriteObject)(reqResult=0))"
Dec  2 03:03:53 ldap slapd[8292]: conn=3049 op=2 SRCH attr=reqDN reqType 
reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on 1 descriptor
Dec  2 03:03:53 ldap slapd[8292]: daemon: activity on:
Dec  2 03:03:53 ldap slapd[8292]:


Then nothing logged again until the first segment quoted where it crashed.

--Quanah

--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra ::  the leader in open source messaging and collaboration