[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#8752) MMR delta-sync deadlock using slapd.conf



ondra@mistotebe.net wrote:
> On Thu, Oct 05, 2017 at 03:32:38AM +0000, quanah@openldap.org wrote:
>> Full_Name: Quanah Gibson-Mount
>> Version: 2.4.45
>> OS: Linux
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (47.208.148.239)
>>
>>
>> There is a reproducible lockup with delta-sync based multimaster replication. 
>> The more masters exist the easier it is to trigger, with 100% reproducibility at
>> 4 masters.
>>
>> Generally:
>>
>> Load a database onto a server with no syncprov/accesslog configuration.
>>
>> Stop slapd
>>
>> Reconfigure slapd.conf to include syncprov/accesslog overlays, and cn=accesslog
>> DB for changes, and replication agreements with other servers
>>
>> Configure other servers for syncprov/accesslog, but no database
>>
>> start slapd on all servers
>>
>> execute a MOD operation sequentially across the servers (1, 2, 3, 4)
>>
>> slapd will deadlock on server 3 or 4 100% of the time
> 
> This is still an issue at least in master - running the its8752
> regression test often results in the following deadlock:

This says that thread 4 is still holding the LMDB writer mutex while running inside the
accesslog overlay. Which it should not be, since back-mdb acquires and commits its LMDB
transactions before letting any other module have control.
> 
> (gdb) deadlock
> Cycles:
> Thread #5 (LWP 8024) in mdb_txn_renew0() at ./../../../libraries/liblmdb/mdb.c:2749
>         Thread #4 (LWP 8023) in accesslog_op_mod() at accesslog.c:1988
>                 deadlock from root
>                 Thread #3 (LWP 8022) in mdb_txn_renew0() at ./../../../libraries/liblmdb/mdb.c:2749
> 
> (gdb) thread apply all bt
> 
> Thread 9 (Thread 0x7f9199ffd700 (LWP 15035)):
> #3  0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288
> 
> Thread 8 (Thread 0x7f91a09fa700 (LWP 15034)):
> #3  0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288
> 
> Thread 7 (Thread 0x7f91b12b7700 (LWP 8026)):
> #3  0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288
> 
> Thread 6 (Thread 0x7f91b1ab8700 (LWP 8025)):
> #3  0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288
> 
> Thread 5 (Thread 0x7f91b22b9700 (LWP 8024)):
> #0  0x00007f91b5dda352 in __pthread_mutex_lock_full [waiting on thread 4 (LWP 8023)] (mutex=0x7f91b8114040) at ../nptl/pthread_mutex_lock.c:313
> #1  0x000000000051a881 in mdb_txn_renew0 (txn=0x1027ac0) at ./../../../libraries/liblmdb/mdb.c:2749
> #2  0x000000000051afab in mdb_txn_begin (env=0xf24e10, parent=0x0, flags=0, ret=0x7f91b22b7190) at ./../../../libraries/liblmdb/mdb.c:2907
> #3  0x00000000005792f4 in mdb_opinfo_get (op=Modify request = {...}, mdb=0x7f91b543b010, rdonly=0, moip=0x7f91b22b7178) at id2entry.c:740
> #4  0x000000000052f989 in mdb_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at modify.c:584
> #5  0x00000000004f6d65 in overlay_op_walk (op=Modify request = {...}, rs=0x7f91b22b89b8, which=op_modify, oi=0xe29b10, on=0x0) at backover.c:706
> #6  0x00000000004f9d61 in over_op_func (op=Modify request = {...}, rs=0x7f91b22b89b8, which=op_modify) at backover.c:766
> #7  0x00000000004f8fb2 in over_op_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at backover.c:808
> #8  0x0000000000472e4e in fe_op_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at modify.c:307
> #9  0x0000000000471bf5 in do_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at modify.c:177
> #10 0x000000000044e1ce in connection_operation (ctx=0x7f91b22b8b78, arg_v=0x7f919c100fa0) at connection.c:1169
> #11 0x000000000044c3a1 in connection_read_thread (ctx=0x7f91b22b8b78, argv=0xf) at connection.c:1326
> 
> Thread 4 (Thread 0x7f91b2aba700 (LWP 8023)):
> #2  0x00007f91b815f275 in ldap_pvt_thread_mutex_lock [waiting on thread 5 (LWP 8024)] (mutex=0xe7c328) at thr_posix.c:307
> #3  0x00007f91b54fffbc in accesslog_op_mod (op=Modify request = {...}, rs=0x7f91b2ab8bd8) at accesslog.c:1988
> #4  0x00000000004f6cc1 in overlay_op_walk (op=Modify request = {...}, rs=0x7f91b2ab8bd8, which=op_modify, oi=0xe29b10, on=0xe7c0c0) at backover.c:691
> #5  0x00000000004f9d61 in over_op_func (op=Modify request = {...}, rs=0x7f91b2ab8bd8, which=op_modify) at backover.c:766
> #6  0x00000000004f8fb2 in over_op_modify (op=Modify request = {...}, rs=0x7f91b2ab8bd8) at backover.c:808
> #7  0x00000000004ec259 in syncrepl_updateCookie (si=0xe7aa90, op=Modify request = {...}, syncCookie=0x7f91b2ab91f0, save=1) at syncrepl.c:4051
> #8  0x00000000004e978e in do_syncrep2 (op=Modify request = {...}, si=0xe7aa90) at syncrepl.c:1206
> #9  0x00000000004e2556 in do_syncrepl (ctx=0x7f91b2ab9b78, arg=0xe7b030) at syncrepl.c:1604
> 
> Thread 3 (Thread 0x7f91b32bb700 (LWP 8022)):
> #0  0x00007f91b5dda352 in __pthread_mutex_lock_full [waiting on thread 4 (LWP 8023)] (mutex=0x7f91b8114040) at ../nptl/pthread_mutex_lock.c:313
> #1  0x000000000051a881 in mdb_txn_renew0 (txn=0x1027ac0) at ./../../../libraries/liblmdb/mdb.c:2749
> #2  0x000000000051afab in mdb_txn_begin (env=0xf24e10, parent=0x0, flags=0, ret=0x7f91a4001170) at ./../../../libraries/liblmdb/mdb.c:2907
> #3  0x00000000005792f4 in mdb_opinfo_get (op=Search request = {...}, mdb=0x7f91b543b010, rdonly=0, moip=0xe7b348) at id2entry.c:740
> #4  0x000000000057989d in mdb_txn (op=Search request = {...}, txnop=1, ptr=0xe7b348) at id2entry.c:812
> #5  0x00000000004ee059 in syncrepl_entry (si=0xe7b0f0, op=Search request = {...}, entry="dc=example,dc=com", modlist=0x7f91b32ba1a8, syncstate=1, syncUUID=0x7f91b32ba140, syncCSN=0x0) at syncrepl.c:3087
> #6  0x00000000004e8cac in do_syncrep2 (op=Search request = {...}, si=0xe7b0f0) at syncrepl.c:1056
> #7  0x00000000004e2556 in do_syncrepl (ctx=0x7f91b32bab78, arg=0xe7b6d0) at syncrepl.c:1604
> 
> Thread 2 (Thread 0x7f91b3abc700 (LWP 8021)):
> #0  0x00007f91b5d0b207 in epoll_wait (epfd=6, events=0xdebc50, maxevents=1024, timeout=7000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
> #1  0x000000000044591a in slapd_daemon_task (ptr=0x1232700) at daemon.c:2745
> 
> Thread 1 (Thread 0x7f91b5761780 (LWP 8006)):
> #1  0x00007f91b815f11d in ldap_pvt_thread_join (thread=140263761364736, thread_return=0x0) at thr_posix.c:208
> #2  0x0000000000443a1b in slapd_daemon () at daemon.c:3157
> #3  0x000000000041ec5b in main (argc=8, argv=0x7ffcee967dc8) at main.c:1018
> 


-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/