[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#7650) Multi-master replication deadlock with TLS



sdj42@live.com wrote:
> Full_Name: Scott Ireland
> Version: 2.4.23
> OS: RHEL 6.4
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (207.34.120.71)
>
>
> A 2-server, multi-master replication setup appears to deadlock during sync.
> After a period of time (sometimes hours), both servers will lock up at the same
> time and fail to handle requests (although they do accept TCP connections).
> This only occurs when the sync occurs with TLS (either StartTLS or ldaps); when
> TLS is not used the sync runs without incident.  Stopping/restarting one of the
> servers clears the lockup on the other server.  Sometimes the issue can be
> triggered immediately when a change is made to the database (in which case the
> client does not see the operation complete), but will also occur on its own as
> periodic replication checks occur.

You're running a very old release, current is 2.4.35.

You're running a Red Hat customized build, contact Red Hat for support.
Closing this ITS.

> Typical log (level 0x4100) before the freeze:
>
> Jul 25 11:09:43 cd1911-o99 slapd[5789]: conn=1380 fd=23 ACCEPT from
> IP=redacted:48388 (IP=redacted:389)
> Jul 25 11:09:43 cd1911-o99 slapd[5789]: conn=1380 op=0 EXT
> oid=1.3.6.1.4.1.1466.20037
> Jul 25 11:09:43 cd1911-o99 slapd[5789]: conn=1380 op=0 STARTTLS
> Jul 25 11:09:43 cd1911-o99 slapd[5789]: conn=1380 op=0 RESULT oid= err=0 text=
> [freezes here]
>
> GDB backtrace (this lock was triggered by a change in cn=config):
>
> (gdb) info threads
>    5 Thread 0x7f7eeb3d0700 (LWP 7048)  0x00007f7eee227f03 in epoll_wait ()
>     from /lib64/libc.so.6
>    4 Thread 0x7f7ed5028700 (LWP 7049)  0x00007f7eee6eb054 in __lll_lock_wait ()
>     from /lib64/libpthread.so.0
>    3 Thread 0x7f7ed4827700 (LWP 7050)  0x00007f7eee6eb54d in read ()
>     from /lib64/libpthread.so.0
>    2 Thread 0x7f7ecffff700 (LWP 7051)  0x00007f7eee6e843c in
> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> * 1 Thread 0x7f7ef0700700 (LWP 7043)  0x00007f7eee6e50ad in pthread_join ()
>     from /lib64/libpthread.so.0
>
> (gdb) bt
> #0  0x00007f7eee6e50ad in pthread_join () from /lib64/libpthread.so.0
> #1  0x00007f7ef075685c in slapd_daemon ()
>      at ../../../servers/slapd/daemon.c:2842
> #2  0x00007f7ef0742b0e in main (argc=5, argv=<value optimized out>)
>      at ../../../servers/slapd/main.c:961
>
> (gdb) thread 2
> [Switching to thread 2 (Thread 0x7f7ecffff700 (LWP 7051))]#0  0x00007f7eee6e843c
>               in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007f7eee6e843c in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib64/libpthread.so.0
> #1  0x00007f7ef085c6fb in handle_pause (tpool=<value optimized out>,
>      do_pause=1) at ../../../libraries/libldap_r/tpool.c:757
> #2  0x00007f7ef074a53c in config_back_modify (op=0x7f7ec81adaa0,
>      rs=0x7f7ecfffe920) at ../../../servers/slapd/bconfig.c:5677
> #3  0x00007f7ef07c8977 in overlay_op_walk (op=0x7f7ec81adaa0,
>      rs=0x7f7ecfffe920, which=op_modify, oi=0x7f7ef1a644c0, on=0x0)
>      at ../../../servers/slapd/backover.c:669
> #4  0x00007f7ef07c93cb in over_op_func (op=0x7f7ec81adaa0,
>      rs=<value optimized out>, which=<value optimized out>)
>      at ../../../servers/slapd/backover.c:721
> #5  0x00007f7ef0775ece in fe_op_modify (op=0x7f7ec81adaa0, rs=0x7f7ecfffe920)
>      at ../../../servers/slapd/modify.c:301
> #6  0x00007f7ef077685c in do_modify (op=0x7f7ec81adaa0, rs=0x7f7ecfffe920)
>      at ../../../servers/slapd/modify.c:175
> #7  0x00007f7ef075c9f9 in connection_operation (ctx=0x7f7ecfffeb70,
>      arg_v=0x7f7ec81adaa0) at ../../../servers/slapd/connection.c:1109
> #8  0x00007f7ef075d34d in connection_read_thread (ctx=0x7f7ecfffeb70,
>      argv=<value optimized out>) at ../../../servers/slapd/connection.c:1245
> #9  0x00007f7ef085cdd8 in ldap_int_thread_pool_wrapper (xpool=0x7f7ef19957b0)
>      at ../../../libraries/libldap_r/tpool.c:685
> #10 0x00007f7eee6e4851 in start_thread () from /lib64/libpthread.so.0
> #11 0x00007f7eee22790d in clone () from /lib64/libc.so.6
>
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x7f7ed4827700 (LWP 7050))]#0  0x00007f7eee6eb54d
> in read () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007f7eee6eb54d in read () from /lib64/libpthread.so.0
> #1  0x00007f7ef088c67c in sb_debug_read (sbiod=0x7f7ec0174cb0,
>      buf=0x7f7ec0191130, len=5) at ../../../libraries/liblber/sockbuf.c:829
> #2  0x00007f7ef0882564 in tlsm_PR_Recv (fd=<value optimized out>,
>      buf=0x7f7ec0191130, len=5, flags=<value optimized out>,
>      timeout=<value optimized out>) at tls_m.c:3032
> #3  0x00007f7eef6ed1ed in ssl_DefRecv (ss=<value optimized out>,
>      buf=<value optimized out>, len=5, flags=<value optimized out>)
>      at ssldef.c:62
> #4  0x00007f7eef6e8480 in ssl3_GatherData (ss=0x7f7ec0190d50, flags=0)
>      at ssl3gthr.c:59
> #5  ssl3_GatherCompleteHandshake (ss=0x7f7ec0190d50, flags=0) at ssl3gthr.c:318
> #6  0x00007f7eef6eaed2 in ssl_GatherRecord1stHandshake (ss=0x7f7ec0190d50)
>      at sslcon.c:1227
> #7  0x00007f7eef6f1135 in ssl_Do1stHandshake (ss=0x7f7ec0190d50)
>      at sslsecur.c:120
> #8  0x00007f7eef6f296f in SSL_ForceHandshake (fd=<value optimized out>)
>      at sslsecur.c:390
> #9  0x00007f7ef0882288 in tlsm_session_accept_or_connect (
>      session=0x7f7ec0174910, is_accept=<value optimized out>) at tls_m.c:2677
> #10 0x00007f7ef0880e8a in ldap_int_tls_connect (ld=0x7f7ec0107ca0,
>      conn=<value optimized out>, host=0x7f7ec0170710 "redacted")
>      at tls2.c:366
> #11 0x00007f7ef088110a in ldap_int_tls_start (ld=0x7f7ec0107ca0,
>      conn=0x7f7ec0107f10, srv=<value optimized out>) at tls2.c:845
> #12 0x00007f7ef088120e in ldap_start_tls_s (ld=0x7f7ec0107ca0,
>      serverctrls=0x0, clientctrls=0x0) at tls2.c:938
> #13 0x00007f7ef075269e in slap_client_connect (ldp=0x7f7ec01709f0,
>      sb=0x7f7ec01707d0) at ../../../servers/slapd/config.c:1921
> #14 0x00007f7ef07c5062 in do_syncrep1 (ctx=<value optimized out>,
>      arg=0x7f7ec0170b60) at ../../../servers/slapd/syncrepl.c:571
> #15 do_syncrepl (ctx=<value optimized out>, arg=0x7f7ec0170b60)
>      at ../../../servers/slapd/syncrepl.c:1422
> #16 0x00007f7ef085cdd8 in ldap_int_thread_pool_wrapper (xpool=0x7f7ef19957b0)
>      at ../../../libraries/libldap_r/tpool.c:685
> #17 0x00007f7eee6e4851 in start_thread () from /lib64/libpthread.so.0
> #18 0x00007f7eee22790d in clone () from /lib64/libc.so.6
>
> (gdb) thread 4
> [Switching to thread 4 (Thread 0x7f7ed5028700 (LWP 7049))]#0  0x00007f7eee6eb054
> in __lll_lock_wait () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007f7eee6eb054 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007f7eee6e6388 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x00007f7eee6e6257 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007f7ef0882280 in tlsm_session_accept_or_connect (
>      session=0x7f7ec4131df0, is_accept=<value optimized out>) at tls_m.c:2674
> #4  0x00007f7ef08812c7 in ldap_pvt_tls_accept (sb=0x7f7ec4140420,
>      ctx_arg=0x7f7ef1a82e70) at tls2.c:433
> #5  0x00007f7ef075d3bb in connection_read (ctx=0x7f7ed5027b70, argv=0x17)
>      at ../../../servers/slapd/connection.c:1326
> #6  connection_read_thread (ctx=0x7f7ed5027b70, argv=0x17)
>      at ../../../servers/slapd/connection.c:1238
> #7  0x00007f7ef085cdd8 in ldap_int_thread_pool_wrapper (xpool=0x7f7ef19957b0)
>      at ../../../libraries/libldap_r/tpool.c:685
> #8  0x00007f7eee6e4851 in start_thread () from /lib64/libpthread.so.0
> #9  0x00007f7eee22790d in clone () from /lib64/libc.so.6
>
> (gdb) thread 5
> [Switching to thread 5 (Thread 0x7f7eeb3d0700 (LWP 7048))]#0  0x00007f7eee227f03
> in epoll_wait () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007f7eee227f03 in epoll_wait () from /lib64/libc.so.6
> #1  0x00007f7ef07576f5 in slapd_daemon_task (ptr=<value optimized out>)
>      at ../../../servers/slapd/daemon.c:2467
> #2  0x00007f7eee6e4851 in start_thread () from /lib64/libpthread.so.0
> #3  0x00007f7eee22790d in clone () from /lib64/libc.so.6
>
>


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/