[Date Prev][Date Next] [Chronological] [Thread] [Top]

(ITS#8384) LDAP Servers in Multi-master configuration locked up and unresponsive



Full_Name: Kevin Sullivan
Version: 2.4.39-8
OS: RHEL 6.4
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (38.127.164.138)


Setup:

I have two OpenLDAP servers running on separate hosts and replicating via delta
syncrepl in mirrormode. 

Problem:

On rare occasions when one of my servers comes online, both servers will lock up
and be unresponsive. Any query to either server will time out. The servers will
stay in this state indefinitely.

Workaround:

Killing either of the servers will resolve this problem. 

Observations:

- I don't know how to reproduce this reliably.
- Both servers have a thread that is stuck in a do_syncrep1() call.
- Netstat shows that each server has unacknowledged data in one of their
socket's receive queue.


GDB on host1:

(gdb) info threads
  18 Thread 0x7f8b7b80d700 (LWP 31620)  0x00007f8ba7fa6f03 in epoll_wait () from
/lib64/libc.so.6
  17 Thread 0x7f8b7b00c700 (LWP 31621)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  16 Thread 0x7f8b7a80b700 (LWP 31622)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  15 Thread 0x7f8b7a00a700 (LWP 31623)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  14 Thread 0x7f8b79809700 (LWP 31625)  0x00007f8ba846a54d in read () from
/lib64/libpthread.so.0
  13 Thread 0x7f8b79008700 (LWP 31626)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  12 Thread 0x7f8b59ffd700 (LWP 25459)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  11 Thread 0x7f8b597fc700 (LWP 10195)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  10 Thread 0x7f8b70b37700 (LWP 28996)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  9 Thread 0x7f8b58ffb700 (LWP 8763)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  8 Thread 0x7f8b4bfff700 (LWP 28614)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  7 Thread 0x7f8b4b7fe700 (LWP 23367)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  6 Thread 0x7f8b4affd700 (LWP 13359)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  5 Thread 0x7f8b4a7fc700 (LWP 29064)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  4 Thread 0x7f8b49ffb700 (LWP 2184)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  3 Thread 0x7f8b497fa700 (LWP 31175)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  2 Thread 0x7f8b48ff9700 (LWP 2316)  0x00007f8ba846a054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
* 1 Thread 0x7f8baa8db700 (LWP 31615)  0x00007f8ba84640ad in pthread_join ()
from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f8ba84640ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f8baa93c969 in slapd_daemon () at
../../%./servers/slapd/daemon.c:2929
#2  0x00007f8baa927733 in main (argc=6, argv=<value optimized out>) at
../../../servers/slapd/main.c:1012
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f8b48ff9700 (LWP 2316))]#0  0x00007f8ba846a054
in __lll_lock_wait (929 from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8ba8465388 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f8ba8465257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f8baa4b96a0 in tlsm_session_accept_or_connect
(session=0x7f8b2c104110, is_accept=<value optimized out>) at tls_m.c:2656
#4  0x00007f8baa4b8787 in ldap_pvt_tls_accept (sb=0x7f8b2c104150,
ctx_arg=0x7f8bac6f8230) at tls2.c:425
#5  0x00007f8baa943f23 in connection_read (ctx=0x7f8b48ff8b70, argv=0x2f) at
../../../servers/slapd/connection.c:1372
#6  connection_read_thread (ctx=0x7f8b48ff8b70, argv=0x2f) at
../../../servers/slapd/connection.c:1284
#7  0x00007f8baa490a98 in ldap_int_thread_pool_wrapper (xpool=0x7f8bac5c8fb0) at
../../../libraries/libldap_r/tpool.c:688
#8  0x00007f8ba8463851 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f8ba7fa690d in clone () from /lib64/libc.so.6
(gdb) thread 14
[Switching to thread 14 (Thread 0x7f8b79809700 (LWP 31625))]#0 
0x00007f8ba846a54d in read () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f8ba846a54d in read () from /lib64/libpthread.so.0
#1  0x00007f8baa278b1c in sb_debug_read (sbiod=0x7f8b681def80,
buf=0x7f8b6819c830, len=5) at ../../../libraries/liblber/sockbuf.c:829
#2  0x00007f8baa4b9984 in tlsm_PR_Recv (fd=<value optimized out>,
buf=0x7f8b6819c830, len=5, flags=<value optimized out>, timeout=<value optimized
out>) at tls_m.c:3007
#3  0x00007f8ba946c1ed in ?? () from /usr/lib64/libssl3.so
#4  0000007f8ba9467480 in ?? () from /usr/lib64/libssl3.so
#5  0x00007f8ba9469ed2 in ?? () from /usr/lib64/libssl3.so
#6  0x00007f8ba9470135 in ?? () from /usr/lib64/libssl3.so
#7  0x00007f8ba947196f in SSL_ForceHandshake () from /usr/lib64/libssl3o%o
#8  0x00007f8baa4b96a8 in tlsm_session_accept_or_connect
(session=0x7f8b681b0f20, is_accept=<value optimized out>) at tls_m.c:2658
#9  0x00007f8baa4b82e2 in ldap_int_tls_connect (ld=0x7f8b681d9250, conn=<value
optimized out>) at tls2.c:362
#10 0x00007f8baa4b857inin ldap_int_tls_start (ld=0x7f8b681d9250,
conn=0x7f8b681b0ba0, srv=<value optimized out>) at tls2.c:860
#11 0x00007f8baa4b86ce in ldap_start_tls_s (ld=0x7f8b681d9250, serverctrls=0x0,
clientctrls=0x0) at tls2.c:1040
#12 0x00007f8baa937ff6 in slap_client_connect (ldp=0x7f8bac6d35a8,
sb=0x7f8bac6d3380) at ../../../servers/slapd/config.c:2012
#13 0x00007f8baa9ae3f5 in do_syncrep1 (ctx=<value optimized out>,
arg=0x7f8bac6cc430) at ../../../servers/slapd/syncrepl.c:613
#14 do_syncrepl (ctx=<value optimized out>, arg=0x7f8bac6cc430) at
../../../servers/slapd/syncrepl.c:1527
#15 0x00007f8baa490a98 in ldap_int_thread_pool_wrapper (xpool=0x7f8bac5c8fb0) at
../../../libraries/libldap_r/tpool.c:688
#16 0x00007f8ba8463851 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f8ba7fa690d in clone () from /lib64/libc.so.6
(gdb) thread 18
[Switching to thread 18 (Thread 0x7f8b7b80d700 (LWP 31620))]#0 
0x00007f8ba7fa6f03 in epoll_wait () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f8ba7fa6f03 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f8baa93d9e2 in slapd_daemon_task (ptr=<value optimized out>) at
../../../servers/slapd/daemon.c:2536
#2  0x00007f8ba8463851 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f8ba7fa690d in clone () from /lib64/libc.so.6


GDB on host2:

(gdb) info threads
  7 Thread 0x7f5c942d5700 (LWP 3869)  0x00007f5cc0797f03 in epoll_wait () from
/lib64/libc.so.6
  6 Thread 0x7f5c93ad4700 (LWP 3870)  0x00007f5cc0c5b54d in read () from
/lib64/libpthread.so.0
  5 Thread 0x7f5c932d3700 (LWP 3871)  0x00007f5cc0c5b054 in __lll_lock_wait ()
from /lib64/libpthread.so.0
  4 Thread 0x7f5c92ad2700 (LWP 3872)  0x00007f5cc0c5843c in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x7f5c922d1700 (LWP 3873)  0x00007f5cc0c5843c in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7f5c91ad0700 (LWP 3874)  0x00007f5cc0c5843c in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x7f5cc30cc700 (LWP 3868)  0x00007f5cc0c550ad in pthread_join () from
/lib64/libpthread.so.0
(gdb) bt
#0  0x00007f5cc0c550ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f5cc312d969 in slapd_daemon () at
../../../servers/slapd/daemon.c:2929
#2  0x00007f5cc3118733 in main (argc=6, argv=<value optimized out>) at
../../../servers/slapd/main.c:1012
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f5c91ad0700 (LWP 3874))]#0  0x00007f5cc0c5843c
in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f5cc0c5843c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f5cc2c81af5 in ldap_int_thread_pool_wrapper (xpool=0x7f5cc3d18030) at
../../../libraries/libldap_r/tpool.c:675
#2  0x00007f5cc0c54851 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5cc079790d in clone () from /lib64/libc.so.6
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f5c932d3700 (LWP 3871))]#0  0x00007f5cc0c5b054
in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f5cc0c5b054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f5cc0c56388 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f5cc0c56257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f5cc2caa6a0 in tlsm_session_accept_or_connect
(session=0x7f5c841165c0, is_accept=<value optimized out>) at tls_m.c:2656
#4  0x00007f5cc2ca9787 in ldap_pvt_tls_accept (sb=0x7f5c800008c0,
ctx_arg=0x7f5cc3e3b240) at tls2.c:425
#5  0x00007f5cc3134f23 in connection_read (ctx=0x7f5c932d2b70, argv=0x10) at
../../../servers/slapd/connection.c:1372
#6  connection_read_thread (ctx=0x7f5c932d2b70, argv=0x10) at
../../../servers/slapd/connection.c:1284
#7  0x00007f5cc2c81a98 in ldap_int_thread_pool_wrapper (xpool=0x7f5cc3d18030) at
../../../libraries/libldap_r%tptpool.c:688
#8  0x00007f5cc0c54851 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f5cc079790d in clone () from /lib64/libc.so.6
(gdb) thread 6
[Switching to thread 6 (Thread 0x7f5c93ad4700 (LWP 3870))]#0  0x00007f5cc0c5b54d
in read () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f5cc0c5b54d in read () from /lib64/libpthread.so.0
#1  0x00007f5cc2a69b1c in sb_debug_read (sbiod=0x7f5c7c185e10,
buf=0x7f5c7c18a7f0, len=5) at ../../../libraries/liblber/sockbuf.c:829
#2  0x00007f5cc2caa984 in tlsm_PR_Recv (fd=<value optimized out>,
buf=0x7f5c7c18a7f0, len=5, flags=<value optimized out>, timeout=<value optimized
out>) at tls_m.c:3007
#3  0x00007f5cc1c5d1ed in ?? () from /usr/lib64/libssl3.so
#4  0x00007f5cc1c58480 in ?? () from /usr/lib64/libssl3.so
#5  0x00007f5cc1c5aed2 in ?? () from /usr/lib64/libssl3.so
#6  0x00007f5cc1c61135 in ?? () from /usr/lib64/libssl3.so
#7  0x00007f5cc1c6296f in SSL_ForceHandshake () from /usr/lib64/libssl3.so
#8  0x00007f5cc2caa6a8 in tlsm_session_accept_or_connect
(session=0x7f5c7c182100, is_accept=<value optimized out>) at tls_m.c:2658
#9  0x00007f5cc2ca92e2 in ldap_int_tls_connect (ld=0x7f5c7c100910, conn=<value
optimized out>) at tls2.c:362
#10 0x00007f5cc2ca9574 in ldap_int_tls_start (ld=0x7f5c7c100910,
conn=0x7f5c7c10a020, srv=<value optimized out>) at tls2.c:860
#11 0x00007f5cc2ca96ce in ldap_start_tls_s (ld=0x7f5c7c100910, serverctrls=0x0,
clientctrls=0x0) at tls2.c:1040
#12 0x00007f5cc3128ff6 in slap_client_connect (ldp=0x7f5cc3e22558,
sb=0x7f5cc3e22330) at ../../../servers/slapd/config.c:2012
#13 0x00007f5cc319f3f5 in do_syncrep1 (ctx=<value optimized out>,
arg=0x7f5cc3e1f410) at ../../../servers/slapd/syncrepl.c:613
#14 do_syncrepl (ctx=<value optimized out>, arg=0x7f5cc3e1f410) at
../../../servers/slapd/syncrepl.c:1527
#15 0x00007f5cc2c81a98 in ldap_int_thread_pool_wrapper (xpool=0x7f5cc3d18030) at
../../../libraries/libldap_r/tpool.c:688
#16 0x00007f5cc0c54851 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f5cc079790d in clone () from /lib64/libc.so.6
(gdb) thread 7
[Switching to thread 7 (Thread 0x7f5c942d5700 (LWP 3869))]#0  0x00007f5cc0797f03
in epoll_wait () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f5cc0797f03 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f5cc312e9e2 in slapd_daemon_task (ptr=<value optimized out>) at
../../../servers/slapd/daemon.c:2536
#2  0x00007f5cc0c54851 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5cc079790d in clone () from /lib64/libc.so.6


Relevant configuration information (from host2):

syncrepl rid=001
    provider=ldap://host1/
    type=refreshAndPersist
    retry="10 +"
    searchbase="dc=example,dc=com"
    starttls=critical
    tls_cacert=/etc/openldap/ssl-certs/cacert.pem
    tls_cert=/etc/openldap/ssl-certs/host1.crt
    tls_key=/etc/openldap/ssl-certs/host1.key
    tls_reqcert=demand
    bindmethod=sasl
    saslmech=EXTERNAL
    sizelimit=unlimited
    keepalive="10:2:10"
    logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
    logbase="cn=accesslog"
    syncdata=accesslog


Netstat of host1:

[root@host1 ~]# netstat -anp | grep slapd
tcp        0      0 0.0.0.0:389                 0.0.0.0:*                  
LISTEN      31615/slapd
tcp       56      0 192.168.1.1:389               192.168.1.2:57276            
ESTABLISHED 31615/slapd
tcp        0      0 192.168.1.1:45569             192.168.1.2:389              
ESTABLISHED 31615/slapd
unix  2      [ ]         STREAM     CONNECTED     45346152 31615/slapd
unix  2      [ ]         DGRAM                    45346144 31615/slapd

Netstat of host2:

[root@host2 ~]# netstat -anp | grep slapd
tcp        0      0 0.0.0.0:389                 0.0.0.0:*                  
LISTEN      3868/slapd
tcp       56      0 192.168.1.2:389               192.168.1.1:45569            
ESTABLISHED 3868/slapd
tcp        0      0 192.168.1.2:57276             192.168.1.1:389              
ESTABLISHED 3868/slapd
unix  2      [ ]         STREAM     CONNECTED     19144  3868/slapd
unix         [ ]         DGRAM                    19136  3868/slapd