[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#8384) LDAP Servers in Multi-master configuration locked up and unresponsive



Hello,

this seems to be a RHEL/CentOS-specific issue. Please contact Red Hat Support.
I think this ITS ticket may be closed.

Regards.
On 03/11/2016 04:01 PM, kevin4sullivan@gmail.com wrote:
> Full_Name: Kevin Sullivan
> Version: 2.4.39-8
> OS: RHEL 6.4
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (38.127.164.138)
>
>
> Setup:
>
> I have two OpenLDAP servers running on separate hosts and replicating via delta
> syncrepl in mirrormode.
>
> Problem:
>
> On rare occasions when one of my servers comes online, both servers will lock up
> and be unresponsive. Any query to either server will time out. The servers will
> stay in this state indefinitely.
>
> Workaround:
>
> Killing either of the servers will resolve this problem.
>
> Observations:
>
> - I don't know how to reproduce this reliably.
> - Both servers have a thread that is stuck in a do_syncrep1() call.
> - Netstat shows that each server has unacknowledged data in one of their
> socket's receive queue.
>
>
> GDB on host1:
>
> (gdb) info threads
>    18 Thread 0x7f8b7b80d700 (LWP 31620)  0x00007f8ba7fa6f03 in epoll_wait () from
> /lib64/libc.so.6
>    17 Thread 0x7f8b7b00c700 (LWP 31621)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    16 Thread 0x7f8b7a80b700 (LWP 31622)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    15 Thread 0x7f8b7a00a700 (LWP 31623)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    14 Thread 0x7f8b79809700 (LWP 31625)  0x00007f8ba846a54d in read () from
> /lib64/libpthread.so.0
>    13 Thread 0x7f8b79008700 (LWP 31626)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    12 Thread 0x7f8b59ffd700 (LWP 25459)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    11 Thread 0x7f8b597fc700 (LWP 10195)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    10 Thread 0x7f8b70b37700 (LWP 28996)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    9 Thread 0x7f8b58ffb700 (LWP 8763)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    8 Thread 0x7f8b4bfff700 (LWP 28614)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    7 Thread 0x7f8b4b7fe700 (LWP 23367)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    6 Thread 0x7f8b4affd700 (LWP 13359)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    5 Thread 0x7f8b4a7fc700 (LWP 29064)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    4 Thread 0x7f8b49ffb700 (LWP 2184)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    3 Thread 0x7f8b497fa700 (LWP 31175)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    2 Thread 0x7f8b48ff9700 (LWP 2316)  0x00007f8ba846a054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
> * 1 Thread 0x7f8baa8db700 (LWP 31615)  0x00007f8ba84640ad in pthread_join ()
> from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007f8ba84640ad in pthread_join () from /lib64/libpthread.so.0
> #1  0x00007f8baa93c969 in slapd_daemon () at
> ../../%./servers/slapd/daemon.c:2929
> #2  0x00007f8baa927733 in main (argc=6, argv=<value optimized out>) at
> ../../../servers/slapd/main.c:1012
> (gdb) thread 2
> [Switching to thread 2 (Thread 0x7f8b48ff9700 (LWP 2316))]#0  0x00007f8ba846a054
> in __lll_lock_wait (929 from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007f8ba8465388 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x00007f8ba8465257 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007f8baa4b96a0 in tlsm_session_accept_or_connect
> (session=0x7f8b2c104110, is_accept=<value optimized out>) at tls_m.c:2656
> #4  0x00007f8baa4b8787 in ldap_pvt_tls_accept (sb=0x7f8b2c104150,
> ctx_arg=0x7f8bac6f8230) at tls2.c:425
> #5  0x00007f8baa943f23 in connection_read (ctx=0x7f8b48ff8b70, argv=0x2f) at
> ../../../servers/slapd/connection.c:1372
> #6  connection_read_thread (ctx=0x7f8b48ff8b70, argv=0x2f) at
> ../../../servers/slapd/connection.c:1284
> #7  0x00007f8baa490a98 in ldap_int_thread_pool_wrapper (xpool=0x7f8bac5c8fb0) at
> ../../../libraries/libldap_r/tpool.c:688
> #8  0x00007f8ba8463851 in start_thread () from /lib64/libpthread.so.0
> #9  0x00007f8ba7fa690d in clone () from /lib64/libc.so.6
> (gdb) thread 14
> [Switching to thread 14 (Thread 0x7f8b79809700 (LWP 31625))]#0
> 0x00007f8ba846a54d in read () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007f8ba846a54d in read () from /lib64/libpthread.so.0
> #1  0x00007f8baa278b1c in sb_debug_read (sbiod=0x7f8b681def80,
> buf=0x7f8b6819c830, len=5) at ../../../libraries/liblber/sockbuf.c:829
> #2  0x00007f8baa4b9984 in tlsm_PR_Recv (fd=<value optimized out>,
> buf=0x7f8b6819c830, len=5, flags=<value optimized out>, timeout=<value optimized
> out>) at tls_m.c:3007
> #3  0x00007f8ba946c1ed in ?? () from /usr/lib64/libssl3.so
> #4  0000007f8ba9467480 in ?? () from /usr/lib64/libssl3.so
> #5  0x00007f8ba9469ed2 in ?? () from /usr/lib64/libssl3.so
> #6  0x00007f8ba9470135 in ?? () from /usr/lib64/libssl3.so
> #7  0x00007f8ba947196f in SSL_ForceHandshake () from /usr/lib64/libssl3o%o
> #8  0x00007f8baa4b96a8 in tlsm_session_accept_or_connect
> (session=0x7f8b681b0f20, is_accept=<value optimized out>) at tls_m.c:2658
> #9  0x00007f8baa4b82e2 in ldap_int_tls_connect (ld=0x7f8b681d9250, conn=<value
> optimized out>) at tls2.c:362
> #10 0x00007f8baa4b857inin ldap_int_tls_start (ld=0x7f8b681d9250,
> conn=0x7f8b681b0ba0, srv=<value optimized out>) at tls2.c:860
> #11 0x00007f8baa4b86ce in ldap_start_tls_s (ld=0x7f8b681d9250, serverctrls=0x0,
> clientctrls=0x0) at tls2.c:1040
> #12 0x00007f8baa937ff6 in slap_client_connect (ldp=0x7f8bac6d35a8,
> sb=0x7f8bac6d3380) at ../../../servers/slapd/config.c:2012
> #13 0x00007f8baa9ae3f5 in do_syncrep1 (ctx=<value optimized out>,
> arg=0x7f8bac6cc430) at ../../../servers/slapd/syncrepl.c:613
> #14 do_syncrepl (ctx=<value optimized out>, arg=0x7f8bac6cc430) at
> ../../../servers/slapd/syncrepl.c:1527
> #15 0x00007f8baa490a98 in ldap_int_thread_pool_wrapper (xpool=0x7f8bac5c8fb0) at
> ../../../libraries/libldap_r/tpool.c:688
> #16 0x00007f8ba8463851 in start_thread () from /lib64/libpthread.so.0
> #17 0x00007f8ba7fa690d in clone () from /lib64/libc.so.6
> (gdb) thread 18
> [Switching to thread 18 (Thread 0x7f8b7b80d700 (LWP 31620))]#0
> 0x00007f8ba7fa6f03 in epoll_wait () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007f8ba7fa6f03 in epoll_wait () from /lib64/libc.so.6
> #1  0x00007f8baa93d9e2 in slapd_daemon_task (ptr=<value optimized out>) at
> ../../../servers/slapd/daemon.c:2536
> #2  0x00007f8ba8463851 in start_thread () from /lib64/libpthread.so.0
> #3  0x00007f8ba7fa690d in clone () from /lib64/libc.so.6
>
>
> GDB on host2:
>
> (gdb) info threads
>    7 Thread 0x7f5c942d5700 (LWP 3869)  0x00007f5cc0797f03 in epoll_wait () from
> /lib64/libc.so.6
>    6 Thread 0x7f5c93ad4700 (LWP 3870)  0x00007f5cc0c5b54d in read () from
> /lib64/libpthread.so.0
>    5 Thread 0x7f5c932d3700 (LWP 3871)  0x00007f5cc0c5b054 in __lll_lock_wait ()
> from /lib64/libpthread.so.0
>    4 Thread 0x7f5c92ad2700 (LWP 3872)  0x00007f5cc0c5843c in
> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>    3 Thread 0x7f5c922d1700 (LWP 3873)  0x00007f5cc0c5843c in
> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>    2 Thread 0x7f5c91ad0700 (LWP 3874)  0x00007f5cc0c5843c in
> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> * 1 Thread 0x7f5cc30cc700 (LWP 3868)  0x00007f5cc0c550ad in pthread_join () from
> /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007f5cc0c550ad in pthread_join () from /lib64/libpthread.so.0
> #1  0x00007f5cc312d969 in slapd_daemon () at
> ../../../servers/slapd/daemon.c:2929
> #2  0x00007f5cc3118733 in main (argc=6, argv=<value optimized out>) at
> ../../../servers/slapd/main.c:1012
> (gdb) thread 2
> [Switching to thread 2 (Thread 0x7f5c91ad0700 (LWP 3874))]#0  0x00007f5cc0c5843c
> in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007f5cc0c5843c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00007f5cc2c81af5 in ldap_int_thread_pool_wrapper (xpool=0x7f5cc3d18030) at
> ../../../libraries/libldap_r/tpool.c:675
> #2  0x00007f5cc0c54851 in start_thread () from /lib64/libpthread.so.0
> #3  0x00007f5cc079790d in clone () from /lib64/libc.so.6
> (gdb) thread 5
> [Switching to thread 5 (Thread 0x7f5c932d3700 (LWP 3871))]#0  0x00007f5cc0c5b054
> in __lll_lock_wait () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007f5cc0c5b054 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007f5cc0c56388 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x00007f5cc0c56257 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007f5cc2caa6a0 in tlsm_session_accept_or_connect
> (session=0x7f5c841165c0, is_accept=<value optimized out>) at tls_m.c:2656
> #4  0x00007f5cc2ca9787 in ldap_pvt_tls_accept (sb=0x7f5c800008c0,
> ctx_arg=0x7f5cc3e3b240) at tls2.c:425
> #5  0x00007f5cc3134f23 in connection_read (ctx=0x7f5c932d2b70, argv=0x10) at
> ../../../servers/slapd/connection.c:1372
> #6  connection_read_thread (ctx=0x7f5c932d2b70, argv=0x10) at
> ../../../servers/slapd/connection.c:1284
> #7  0x00007f5cc2c81a98 in ldap_int_thread_pool_wrapper (xpool=0x7f5cc3d18030) at
> ../../../libraries/libldap_r%tptpool.c:688
> #8  0x00007f5cc0c54851 in start_thread () from /lib64/libpthread.so.0
> #9  0x00007f5cc079790d in clone () from /lib64/libc.so.6
> (gdb) thread 6
> [Switching to thread 6 (Thread 0x7f5c93ad4700 (LWP 3870))]#0  0x00007f5cc0c5b54d
> in read () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007f5cc0c5b54d in read () from /lib64/libpthread.so.0
> #1  0x00007f5cc2a69b1c in sb_debug_read (sbiod=0x7f5c7c185e10,
> buf=0x7f5c7c18a7f0, len=5) at ../../../libraries/liblber/sockbuf.c:829
> #2  0x00007f5cc2caa984 in tlsm_PR_Recv (fd=<value optimized out>,
> buf=0x7f5c7c18a7f0, len=5, flags=<value optimized out>, timeout=<value optimized
> out>) at tls_m.c:3007
> #3  0x00007f5cc1c5d1ed in ?? () from /usr/lib64/libssl3.so
> #4  0x00007f5cc1c58480 in ?? () from /usr/lib64/libssl3.so
> #5  0x00007f5cc1c5aed2 in ?? () from /usr/lib64/libssl3.so
> #6  0x00007f5cc1c61135 in ?? () from /usr/lib64/libssl3.so
> #7  0x00007f5cc1c6296f in SSL_ForceHandshake () from /usr/lib64/libssl3.so
> #8  0x00007f5cc2caa6a8 in tlsm_session_accept_or_connect
> (session=0x7f5c7c182100, is_accept=<value optimized out>) at tls_m.c:2658
> #9  0x00007f5cc2ca92e2 in ldap_int_tls_connect (ld=0x7f5c7c100910, conn=<value
> optimized out>) at tls2.c:362
> #10 0x00007f5cc2ca9574 in ldap_int_tls_start (ld=0x7f5c7c100910,
> conn=0x7f5c7c10a020, srv=<value optimized out>) at tls2.c:860
> #11 0x00007f5cc2ca96ce in ldap_start_tls_s (ld=0x7f5c7c100910, serverctrls=0x0,
> clientctrls=0x0) at tls2.c:1040
> #12 0x00007f5cc3128ff6 in slap_client_connect (ldp=0x7f5cc3e22558,
> sb=0x7f5cc3e22330) at ../../../servers/slapd/config.c:2012
> #13 0x00007f5cc319f3f5 in do_syncrep1 (ctx=<value optimized out>,
> arg=0x7f5cc3e1f410) at ../../../servers/slapd/syncrepl.c:613
> #14 do_syncrepl (ctx=<value optimized out>, arg=0x7f5cc3e1f410) at
> ../../../servers/slapd/syncrepl.c:1527
> #15 0x00007f5cc2c81a98 in ldap_int_thread_pool_wrapper (xpool=0x7f5cc3d18030) at
> ../../../libraries/libldap_r/tpool.c:688
> #16 0x00007f5cc0c54851 in start_thread () from /lib64/libpthread.so.0
> #17 0x00007f5cc079790d in clone () from /lib64/libc.so.6
> (gdb) thread 7
> [Switching to thread 7 (Thread 0x7f5c942d5700 (LWP 3869))]#0  0x00007f5cc0797f03
> in epoll_wait () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007f5cc0797f03 in epoll_wait () from /lib64/libc.so.6
> #1  0x00007f5cc312e9e2 in slapd_daemon_task (ptr=<value optimized out>) at
> ../../../servers/slapd/daemon.c:2536
> #2  0x00007f5cc0c54851 in start_thread () from /lib64/libpthread.so.0
> #3  0x00007f5cc079790d in clone () from /lib64/libc.so.6
>
>
> Relevant configuration information (from host2):
>
> syncrepl rid=001
>      provider=ldap://host1/
>      type=refreshAndPersist
>      retry="10 +"
>      searchbase="dc=example,dc=com"
>      starttls=critical
>      tls_cacert=/etc/openldap/ssl-certs/cacert.pem
>      tls_cert=/etc/openldap/ssl-certs/host1.crt
>      tls_key=/etc/openldap/ssl-certs/host1.key
>      tls_reqcert=demand
>      bindmethod=sasl
>      saslmech=EXTERNAL
>      sizelimit=unlimited
>      keepalive="10:2:10"
>      logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
>      logbase="cn=accesslog"
>      syncdata=accesslog
>
>
> Netstat of host1:
>
> [root@host1 ~]# netstat -anp | grep slapd
> tcp        0      0 0.0.0.0:389                 0.0.0.0:*
> LISTEN      31615/slapd
> tcp       56      0 192.168.1.1:389               192.168.1.2:57276
> ESTABLISHED 31615/slapd
> tcp        0      0 192.168.1.1:45569             192.168.1.2:389
> ESTABLISHED 31615/slapd
> unix  2      [ ]         STREAM     CONNECTED     45346152 31615/slapd
> unix  2      [ ]         DGRAM                    45346144 31615/slapd
>
> Netstat of host2:
>
> [root@host2 ~]# netstat -anp | grep slapd
> tcp        0      0 0.0.0.0:389                 0.0.0.0:*
> LISTEN      3868/slapd
> tcp       56      0 192.168.1.2:389               192.168.1.1:45569
> ESTABLISHED 3868/slapd
> tcp        0      0 192.168.1.2:57276             192.168.1.1:389
> ESTABLISHED 3868/slapd
> unix  2      [ ]         STREAM     CONNECTED     19144  3868/slapd
> unix         [ ]         DGRAM                    19136  3868/slapd
>
>

-- 
Matúš HonÄ?k