[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#8752) MMR delta-sync deadlock using slapd.conf



On Thu, Oct 05, 2017 at 03:32:38AM +0000, quanah@openldap.org wrote:
> Full_Name: Quanah Gibson-Mount
> Version: 2.4.45
> OS: Linux
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (47.208.148.239)
> 
> 
> There is a reproducible lockup with delta-sync based multimaster replication. 
> The more masters exist the easier it is to trigger, with 100% reproducibility at
> 4 masters.
> 
> Generally:
> 
> Load a database onto a server with no syncprov/accesslog configuration.
> 
> Stop slapd
> 
> Reconfigure slapd.conf to include syncprov/accesslog overlays, and cn=accesslog
> DB for changes, and replication agreements with other servers
> 
> Configure other servers for syncprov/accesslog, but no database
> 
> start slapd on all servers
> 
> execute a MOD operation sequentially across the servers (1, 2, 3, 4)
> 
> slapd will deadlock on server 3 or 4 100% of the time

This is still an issue at least in master - running the its8752
regression test often results in the following deadlock:

(gdb) deadlock
Cycles:
Thread #5 (LWP 8024) in mdb_txn_renew0() at ./../../../libraries/liblmdb/mdb.c:2749
        Thread #4 (LWP 8023) in accesslog_op_mod() at accesslog.c:1988
                deadlock from root
                Thread #3 (LWP 8022) in mdb_txn_renew0() at ./../../../libraries/liblmdb/mdb.c:2749

(gdb) thread apply all bt

Thread 9 (Thread 0x7f9199ffd700 (LWP 15035)):
#3  0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288

Thread 8 (Thread 0x7f91a09fa700 (LWP 15034)):
#3  0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288

Thread 7 (Thread 0x7f91b12b7700 (LWP 8026)):
#3  0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288

Thread 6 (Thread 0x7f91b1ab8700 (LWP 8025)):
#3  0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288

Thread 5 (Thread 0x7f91b22b9700 (LWP 8024)):
#0  0x00007f91b5dda352 in __pthread_mutex_lock_full [waiting on thread 4 (LWP 8023)] (mutex=0x7f91b8114040) at ../nptl/pthread_mutex_lock.c:313
#1  0x000000000051a881 in mdb_txn_renew0 (txn=0x1027ac0) at ./../../../libraries/liblmdb/mdb.c:2749
#2  0x000000000051afab in mdb_txn_begin (env=0xf24e10, parent=0x0, flags=0, ret=0x7f91b22b7190) at ./../../../libraries/liblmdb/mdb.c:2907
#3  0x00000000005792f4 in mdb_opinfo_get (op=Modify request = {...}, mdb=0x7f91b543b010, rdonly=0, moip=0x7f91b22b7178) at id2entry.c:740
#4  0x000000000052f989 in mdb_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at modify.c:584
#5  0x00000000004f6d65 in overlay_op_walk (op=Modify request = {...}, rs=0x7f91b22b89b8, which=op_modify, oi=0xe29b10, on=0x0) at backover.c:706
#6  0x00000000004f9d61 in over_op_func (op=Modify request = {...}, rs=0x7f91b22b89b8, which=op_modify) at backover.c:766
#7  0x00000000004f8fb2 in over_op_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at backover.c:808
#8  0x0000000000472e4e in fe_op_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at modify.c:307
#9  0x0000000000471bf5 in do_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at modify.c:177
#10 0x000000000044e1ce in connection_operation (ctx=0x7f91b22b8b78, arg_v=0x7f919c100fa0) at connection.c:1169
#11 0x000000000044c3a1 in connection_read_thread (ctx=0x7f91b22b8b78, argv=0xf) at connection.c:1326

Thread 4 (Thread 0x7f91b2aba700 (LWP 8023)):
#2  0x00007f91b815f275 in ldap_pvt_thread_mutex_lock [waiting on thread 5 (LWP 8024)] (mutex=0xe7c328) at thr_posix.c:307
#3  0x00007f91b54fffbc in accesslog_op_mod (op=Modify request = {...}, rs=0x7f91b2ab8bd8) at accesslog.c:1988
#4  0x00000000004f6cc1 in overlay_op_walk (op=Modify request = {...}, rs=0x7f91b2ab8bd8, which=op_modify, oi=0xe29b10, on=0xe7c0c0) at backover.c:691
#5  0x00000000004f9d61 in over_op_func (op=Modify request = {...}, rs=0x7f91b2ab8bd8, which=op_modify) at backover.c:766
#6  0x00000000004f8fb2 in over_op_modify (op=Modify request = {...}, rs=0x7f91b2ab8bd8) at backover.c:808
#7  0x00000000004ec259 in syncrepl_updateCookie (si=0xe7aa90, op=Modify request = {...}, syncCookie=0x7f91b2ab91f0, save=1) at syncrepl.c:4051
#8  0x00000000004e978e in do_syncrep2 (op=Modify request = {...}, si=0xe7aa90) at syncrepl.c:1206
#9  0x00000000004e2556 in do_syncrepl (ctx=0x7f91b2ab9b78, arg=0xe7b030) at syncrepl.c:1604

Thread 3 (Thread 0x7f91b32bb700 (LWP 8022)):
#0  0x00007f91b5dda352 in __pthread_mutex_lock_full [waiting on thread 4 (LWP 8023)] (mutex=0x7f91b8114040) at ../nptl/pthread_mutex_lock.c:313
#1  0x000000000051a881 in mdb_txn_renew0 (txn=0x1027ac0) at ./../../../libraries/liblmdb/mdb.c:2749
#2  0x000000000051afab in mdb_txn_begin (env=0xf24e10, parent=0x0, flags=0, ret=0x7f91a4001170) at ./../../../libraries/liblmdb/mdb.c:2907
#3  0x00000000005792f4 in mdb_opinfo_get (op=Search request = {...}, mdb=0x7f91b543b010, rdonly=0, moip=0xe7b348) at id2entry.c:740
#4  0x000000000057989d in mdb_txn (op=Search request = {...}, txnop=1, ptr=0xe7b348) at id2entry.c:812
#5  0x00000000004ee059 in syncrepl_entry (si=0xe7b0f0, op=Search request = {...}, entry="dc=example,dc=com", modlist=0x7f91b32ba1a8, syncstate=1, syncUUID=0x7f91b32ba140, syncCSN=0x0) at syncrepl.c:3087
#6  0x00000000004e8cac in do_syncrep2 (op=Search request = {...}, si=0xe7b0f0) at syncrepl.c:1056
#7  0x00000000004e2556 in do_syncrepl (ctx=0x7f91b32bab78, arg=0xe7b6d0) at syncrepl.c:1604

Thread 2 (Thread 0x7f91b3abc700 (LWP 8021)):
#0  0x00007f91b5d0b207 in epoll_wait (epfd=6, events=0xdebc50, maxevents=1024, timeout=7000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x000000000044591a in slapd_daemon_task (ptr=0x1232700) at daemon.c:2745

Thread 1 (Thread 0x7f91b5761780 (LWP 8006)):
#1  0x00007f91b815f11d in ldap_pvt_thread_join (thread=140263761364736, thread_return=0x0) at thr_posix.c:208
#2  0x0000000000443a1b in slapd_daemon () at daemon.c:3157
#3  0x000000000041ec5b in main (argc=8, argv=0x7ffcee967dc8) at main.c:1018

-- 
OndÅ?ej Kuzník
Senior Software Engineer
Symas Corporation                       http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP