[Date Prev][Date Next] [Chronological] [Thread] [Top]

(ITS#7650) Multi-master replication deadlock with TLS



Full_Name: Scott Ireland
Version: 2.4.23
OS: RHEL 6.4
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (207.34.120.71)


A 2-server, multi-master replication setup appears to deadlock during sync. 
After a period of time (sometimes hours), both servers will lock up at the same
time and fail to handle requests (although they do accept TCP connections). 
This only occurs when the sync occurs with TLS (either StartTLS or ldaps); when
TLS is not used the sync runs without incident.  Stopping/restarting one of the
servers clears the lockup on the other server.  Sometimes the issue can be
triggered immediately when a change is made to the database (in which case the
client does not see the operation complete), but will also occur on its own as
periodic replication checks occur.

Typical log (level 0x4100) before the freeze:

Jul 25 11:09:43 cd1911-o99 slapd[5789]: conn=1380 fd=23 ACCEPT from
IP=redacted:48388 (IP=redacted:389)
Jul 25 11:09:43 cd1911-o99 slapd[5789]: conn=1380 op=0 EXT
oid=1.3.6.1.4.1.1466.20037
Jul 25 11:09:43 cd1911-o99 slapd[5789]: conn=1380 op=0 STARTTLS
Jul 25 11:09:43 cd1911-o99 slapd[5789]: conn=1380 op=0 RESULT oid= err=0 text=
[freezes here]

GDB backtrace (this lock was triggered by a change in cn=config):

(gdb) info threads
  5 Thread 0x7f7eeb3d0700 (LWP 7048)  0x00007f7eee227f03 in epoll_wait ()
   from /lib64/libc.so.6
  4 Thread 0x7f7ed5028700 (LWP 7049)  0x00007f7eee6eb054 in __lll_lock_wait ()
   from /lib64/libpthread.so.0
  3 Thread 0x7f7ed4827700 (LWP 7050)  0x00007f7eee6eb54d in read ()
   from /lib64/libpthread.so.0
  2 Thread 0x7f7ecffff700 (LWP 7051)  0x00007f7eee6e843c in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x7f7ef0700700 (LWP 7043)  0x00007f7eee6e50ad in pthread_join ()
   from /lib64/libpthread.so.0

(gdb) bt
#0  0x00007f7eee6e50ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f7ef075685c in slapd_daemon ()
    at ../../../servers/slapd/daemon.c:2842
#2  0x00007f7ef0742b0e in main (argc=5, argv=<value optimized out>)
    at ../../../servers/slapd/main.c:961

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f7ecffff700 (LWP 7051))]#0  0x00007f7eee6e843c
             in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f7eee6e843c in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x00007f7ef085c6fb in handle_pause (tpool=<value optimized out>,
    do_pause=1) at ../../../libraries/libldap_r/tpool.c:757
#2  0x00007f7ef074a53c in config_back_modify (op=0x7f7ec81adaa0,
    rs=0x7f7ecfffe920) at ../../../servers/slapd/bconfig.c:5677
#3  0x00007f7ef07c8977 in overlay_op_walk (op=0x7f7ec81adaa0,
    rs=0x7f7ecfffe920, which=op_modify, oi=0x7f7ef1a644c0, on=0x0)
    at ../../../servers/slapd/backover.c:669
#4  0x00007f7ef07c93cb in over_op_func (op=0x7f7ec81adaa0,
    rs=<value optimized out>, which=<value optimized out>)
    at ../../../servers/slapd/backover.c:721
#5  0x00007f7ef0775ece in fe_op_modify (op=0x7f7ec81adaa0, rs=0x7f7ecfffe920)
    at ../../../servers/slapd/modify.c:301
#6  0x00007f7ef077685c in do_modify (op=0x7f7ec81adaa0, rs=0x7f7ecfffe920)
    at ../../../servers/slapd/modify.c:175
#7  0x00007f7ef075c9f9 in connection_operation (ctx=0x7f7ecfffeb70,
    arg_v=0x7f7ec81adaa0) at ../../../servers/slapd/connection.c:1109
#8  0x00007f7ef075d34d in connection_read_thread (ctx=0x7f7ecfffeb70,
    argv=<value optimized out>) at ../../../servers/slapd/connection.c:1245
#9  0x00007f7ef085cdd8 in ldap_int_thread_pool_wrapper (xpool=0x7f7ef19957b0)
    at ../../../libraries/libldap_r/tpool.c:685
#10 0x00007f7eee6e4851 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f7eee22790d in clone () from /lib64/libc.so.6

(gdb) thread 3
[Switching to thread 3 (Thread 0x7f7ed4827700 (LWP 7050))]#0  0x00007f7eee6eb54d
in read () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f7eee6eb54d in read () from /lib64/libpthread.so.0
#1  0x00007f7ef088c67c in sb_debug_read (sbiod=0x7f7ec0174cb0,
    buf=0x7f7ec0191130, len=5) at ../../../libraries/liblber/sockbuf.c:829
#2  0x00007f7ef0882564 in tlsm_PR_Recv (fd=<value optimized out>,
    buf=0x7f7ec0191130, len=5, flags=<value optimized out>,
    timeout=<value optimized out>) at tls_m.c:3032
#3  0x00007f7eef6ed1ed in ssl_DefRecv (ss=<value optimized out>,
    buf=<value optimized out>, len=5, flags=<value optimized out>)
    at ssldef.c:62
#4  0x00007f7eef6e8480 in ssl3_GatherData (ss=0x7f7ec0190d50, flags=0)
    at ssl3gthr.c:59
#5  ssl3_GatherCompleteHandshake (ss=0x7f7ec0190d50, flags=0) at ssl3gthr.c:318
#6  0x00007f7eef6eaed2 in ssl_GatherRecord1stHandshake (ss=0x7f7ec0190d50)
    at sslcon.c:1227
#7  0x00007f7eef6f1135 in ssl_Do1stHandshake (ss=0x7f7ec0190d50)
    at sslsecur.c:120
#8  0x00007f7eef6f296f in SSL_ForceHandshake (fd=<value optimized out>)
    at sslsecur.c:390
#9  0x00007f7ef0882288 in tlsm_session_accept_or_connect (
    session=0x7f7ec0174910, is_accept=<value optimized out>) at tls_m.c:2677
#10 0x00007f7ef0880e8a in ldap_int_tls_connect (ld=0x7f7ec0107ca0,
    conn=<value optimized out>, host=0x7f7ec0170710 "redacted")
    at tls2.c:366
#11 0x00007f7ef088110a in ldap_int_tls_start (ld=0x7f7ec0107ca0,
    conn=0x7f7ec0107f10, srv=<value optimized out>) at tls2.c:845
#12 0x00007f7ef088120e in ldap_start_tls_s (ld=0x7f7ec0107ca0,
    serverctrls=0x0, clientctrls=0x0) at tls2.c:938
#13 0x00007f7ef075269e in slap_client_connect (ldp=0x7f7ec01709f0,
    sb=0x7f7ec01707d0) at ../../../servers/slapd/config.c:1921
#14 0x00007f7ef07c5062 in do_syncrep1 (ctx=<value optimized out>,
    arg=0x7f7ec0170b60) at ../../../servers/slapd/syncrepl.c:571
#15 do_syncrepl (ctx=<value optimized out>, arg=0x7f7ec0170b60)
    at ../../../servers/slapd/syncrepl.c:1422
#16 0x00007f7ef085cdd8 in ldap_int_thread_pool_wrapper (xpool=0x7f7ef19957b0)
    at ../../../libraries/libldap_r/tpool.c:685
#17 0x00007f7eee6e4851 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f7eee22790d in clone () from /lib64/libc.so.6

(gdb) thread 4
[Switching to thread 4 (Thread 0x7f7ed5028700 (LWP 7049))]#0  0x00007f7eee6eb054
in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f7eee6eb054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f7eee6e6388 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f7eee6e6257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f7ef0882280 in tlsm_session_accept_or_connect (
    session=0x7f7ec4131df0, is_accept=<value optimized out>) at tls_m.c:2674
#4  0x00007f7ef08812c7 in ldap_pvt_tls_accept (sb=0x7f7ec4140420,
    ctx_arg=0x7f7ef1a82e70) at tls2.c:433
#5  0x00007f7ef075d3bb in connection_read (ctx=0x7f7ed5027b70, argv=0x17)
    at ../../../servers/slapd/connection.c:1326
#6  connection_read_thread (ctx=0x7f7ed5027b70, argv=0x17)
    at ../../../servers/slapd/connection.c:1238
#7  0x00007f7ef085cdd8 in ldap_int_thread_pool_wrapper (xpool=0x7f7ef19957b0)
    at ../../../libraries/libldap_r/tpool.c:685
#8  0x00007f7eee6e4851 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f7eee22790d in clone () from /lib64/libc.so.6

(gdb) thread 5
[Switching to thread 5 (Thread 0x7f7eeb3d0700 (LWP 7048))]#0  0x00007f7eee227f03
in epoll_wait () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f7eee227f03 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f7ef07576f5 in slapd_daemon_task (ptr=<value optimized out>)
    at ../../../servers/slapd/daemon.c:2467
#2  0x00007f7eee6e4851 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7eee22790d in clone () from /lib64/libc.so.6