Full_Name: Quanah Gibson-Mount Version: 2.4.45 OS: Linux URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (47.208.148.239) There is a reproducible lockup with delta-sync based multimaster replication. The more masters exist the easier it is to trigger, with 100% reproducibility at 4 masters. Generally: Load a database onto a server with no syncprov/accesslog configuration. Stop slapd Reconfigure slapd.conf to include syncprov/accesslog overlays, and cn=accesslog DB for changes, and replication agreements with other servers Configure other servers for syncprov/accesslog, but no database start slapd on all servers execute a MOD operation sequentially across the servers (1, 2, 3, 4) slapd will deadlock on server 3 or 4 100% of the time
changed notes changed state Open to Test moved from Incoming to Software Bugs
changed notes changed state Test to Release
changed notes changed state Release to Closed
On Thu, Oct 05, 2017 at 03:32:38AM +0000, quanah@openldap.org wrote: > Full_Name: Quanah Gibson-Mount > Version: 2.4.45 > OS: Linux > URL: ftp://ftp.openldap.org/incoming/ > Submission from: (NULL) (47.208.148.239) > > > There is a reproducible lockup with delta-sync based multimaster replication. > The more masters exist the easier it is to trigger, with 100% reproducibility at > 4 masters. > > Generally: > > Load a database onto a server with no syncprov/accesslog configuration. > > Stop slapd > > Reconfigure slapd.conf to include syncprov/accesslog overlays, and cn=accesslog > DB for changes, and replication agreements with other servers > > Configure other servers for syncprov/accesslog, but no database > > start slapd on all servers > > execute a MOD operation sequentially across the servers (1, 2, 3, 4) > > slapd will deadlock on server 3 or 4 100% of the time This is still an issue at least in master - running the its8752 regression test often results in the following deadlock: (gdb) deadlock Cycles: Thread #5 (LWP 8024) in mdb_txn_renew0() at ./../../../libraries/liblmdb/mdb.c:2749 Thread #4 (LWP 8023) in accesslog_op_mod() at accesslog.c:1988 deadlock from root Thread #3 (LWP 8022) in mdb_txn_renew0() at ./../../../libraries/liblmdb/mdb.c:2749 (gdb) thread apply all bt Thread 9 (Thread 0x7f9199ffd700 (LWP 15035)): #3 0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288 Thread 8 (Thread 0x7f91a09fa700 (LWP 15034)): #3 0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288 Thread 7 (Thread 0x7f91b12b7700 (LWP 8026)): #3 0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288 Thread 6 (Thread 0x7f91b1ab8700 (LWP 8025)): #3 0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288 Thread 5 (Thread 0x7f91b22b9700 (LWP 8024)): #0 0x00007f91b5dda352 in __pthread_mutex_lock_full [waiting on thread 4 (LWP 8023)] (mutex=0x7f91b8114040) at ../nptl/pthread_mutex_lock.c:313 #1 0x000000000051a881 in mdb_txn_renew0 (txn=0x1027ac0) at ./../../../libraries/liblmdb/mdb.c:2749 #2 0x000000000051afab in mdb_txn_begin (env=0xf24e10, parent=0x0, flags=0, ret=0x7f91b22b7190) at ./../../../libraries/liblmdb/mdb.c:2907 #3 0x00000000005792f4 in mdb_opinfo_get (op=Modify request = {...}, mdb=0x7f91b543b010, rdonly=0, moip=0x7f91b22b7178) at id2entry.c:740 #4 0x000000000052f989 in mdb_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at modify.c:584 #5 0x00000000004f6d65 in overlay_op_walk (op=Modify request = {...}, rs=0x7f91b22b89b8, which=op_modify, oi=0xe29b10, on=0x0) at backover.c:706 #6 0x00000000004f9d61 in over_op_func (op=Modify request = {...}, rs=0x7f91b22b89b8, which=op_modify) at backover.c:766 #7 0x00000000004f8fb2 in over_op_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at backover.c:808 #8 0x0000000000472e4e in fe_op_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at modify.c:307 #9 0x0000000000471bf5 in do_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at modify.c:177 #10 0x000000000044e1ce in connection_operation (ctx=0x7f91b22b8b78, arg_v=0x7f919c100fa0) at connection.c:1169 #11 0x000000000044c3a1 in connection_read_thread (ctx=0x7f91b22b8b78, argv=0xf) at connection.c:1326 Thread 4 (Thread 0x7f91b2aba700 (LWP 8023)): #2 0x00007f91b815f275 in ldap_pvt_thread_mutex_lock [waiting on thread 5 (LWP 8024)] (mutex=0xe7c328) at thr_posix.c:307 #3 0x00007f91b54fffbc in accesslog_op_mod (op=Modify request = {...}, rs=0x7f91b2ab8bd8) at accesslog.c:1988 #4 0x00000000004f6cc1 in overlay_op_walk (op=Modify request = {...}, rs=0x7f91b2ab8bd8, which=op_modify, oi=0xe29b10, on=0xe7c0c0) at backover.c:691 #5 0x00000000004f9d61 in over_op_func (op=Modify request = {...}, rs=0x7f91b2ab8bd8, which=op_modify) at backover.c:766 #6 0x00000000004f8fb2 in over_op_modify (op=Modify request = {...}, rs=0x7f91b2ab8bd8) at backover.c:808 #7 0x00000000004ec259 in syncrepl_updateCookie (si=0xe7aa90, op=Modify request = {...}, syncCookie=0x7f91b2ab91f0, save=1) at syncrepl.c:4051 #8 0x00000000004e978e in do_syncrep2 (op=Modify request = {...}, si=0xe7aa90) at syncrepl.c:1206 #9 0x00000000004e2556 in do_syncrepl (ctx=0x7f91b2ab9b78, arg=0xe7b030) at syncrepl.c:1604 Thread 3 (Thread 0x7f91b32bb700 (LWP 8022)): #0 0x00007f91b5dda352 in __pthread_mutex_lock_full [waiting on thread 4 (LWP 8023)] (mutex=0x7f91b8114040) at ../nptl/pthread_mutex_lock.c:313 #1 0x000000000051a881 in mdb_txn_renew0 (txn=0x1027ac0) at ./../../../libraries/liblmdb/mdb.c:2749 #2 0x000000000051afab in mdb_txn_begin (env=0xf24e10, parent=0x0, flags=0, ret=0x7f91a4001170) at ./../../../libraries/liblmdb/mdb.c:2907 #3 0x00000000005792f4 in mdb_opinfo_get (op=Search request = {...}, mdb=0x7f91b543b010, rdonly=0, moip=0xe7b348) at id2entry.c:740 #4 0x000000000057989d in mdb_txn (op=Search request = {...}, txnop=1, ptr=0xe7b348) at id2entry.c:812 #5 0x00000000004ee059 in syncrepl_entry (si=0xe7b0f0, op=Search request = {...}, entry="dc=example,dc=com", modlist=0x7f91b32ba1a8, syncstate=1, syncUUID=0x7f91b32ba140, syncCSN=0x0) at syncrepl.c:3087 #6 0x00000000004e8cac in do_syncrep2 (op=Search request = {...}, si=0xe7b0f0) at syncrepl.c:1056 #7 0x00000000004e2556 in do_syncrepl (ctx=0x7f91b32bab78, arg=0xe7b6d0) at syncrepl.c:1604 Thread 2 (Thread 0x7f91b3abc700 (LWP 8021)): #0 0x00007f91b5d0b207 in epoll_wait (epfd=6, events=0xdebc50, maxevents=1024, timeout=7000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 #1 0x000000000044591a in slapd_daemon_task (ptr=0x1232700) at daemon.c:2745 Thread 1 (Thread 0x7f91b5761780 (LWP 8006)): #1 0x00007f91b815f11d in ldap_pvt_thread_join (thread=140263761364736, thread_return=0x0) at thr_posix.c:208 #2 0x0000000000443a1b in slapd_daemon () at daemon.c:3157 #3 0x000000000041ec5b in main (argc=8, argv=0x7ffcee967dc8) at main.c:1018 -- Ondřej Kuzník Senior Software Engineer Symas Corporation http://www.symas.com Packaged, certified, and supported LDAP solutions powered by OpenLDAP
ondra@mistotebe.net wrote: > On Thu, Oct 05, 2017 at 03:32:38AM +0000, quanah@openldap.org wrote: >> Full_Name: Quanah Gibson-Mount >> Version: 2.4.45 >> OS: Linux >> URL: ftp://ftp.openldap.org/incoming/ >> Submission from: (NULL) (47.208.148.239) >> >> >> There is a reproducible lockup with delta-sync based multimaster replication. >> The more masters exist the easier it is to trigger, with 100% reproducibility at >> 4 masters. >> >> Generally: >> >> Load a database onto a server with no syncprov/accesslog configuration. >> >> Stop slapd >> >> Reconfigure slapd.conf to include syncprov/accesslog overlays, and cn=accesslog >> DB for changes, and replication agreements with other servers >> >> Configure other servers for syncprov/accesslog, but no database >> >> start slapd on all servers >> >> execute a MOD operation sequentially across the servers (1, 2, 3, 4) >> >> slapd will deadlock on server 3 or 4 100% of the time > > This is still an issue at least in master - running the its8752 > regression test often results in the following deadlock: This says that thread 4 is still holding the LMDB writer mutex while running inside the accesslog overlay. Which it should not be, since back-mdb acquires and commits its LMDB transactions before letting any other module have control. > > (gdb) deadlock > Cycles: > Thread #5 (LWP 8024) in mdb_txn_renew0() at ./../../../libraries/liblmdb/mdb.c:2749 > Thread #4 (LWP 8023) in accesslog_op_mod() at accesslog.c:1988 > deadlock from root > Thread #3 (LWP 8022) in mdb_txn_renew0() at ./../../../libraries/liblmdb/mdb.c:2749 > > (gdb) thread apply all bt > > Thread 9 (Thread 0x7f9199ffd700 (LWP 15035)): > #3 0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288 > > Thread 8 (Thread 0x7f91a09fa700 (LWP 15034)): > #3 0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288 > > Thread 7 (Thread 0x7f91b12b7700 (LWP 8026)): > #3 0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288 > > Thread 6 (Thread 0x7f91b1ab8700 (LWP 8025)): > #3 0x00007f91b815f20d in ldap_pvt_thread_cond_wait (cond=0xe11cb8, mutex=0xe11c90) at thr_posix.c:288 > > Thread 5 (Thread 0x7f91b22b9700 (LWP 8024)): > #0 0x00007f91b5dda352 in __pthread_mutex_lock_full [waiting on thread 4 (LWP 8023)] (mutex=0x7f91b8114040) at ../nptl/pthread_mutex_lock.c:313 > #1 0x000000000051a881 in mdb_txn_renew0 (txn=0x1027ac0) at ./../../../libraries/liblmdb/mdb.c:2749 > #2 0x000000000051afab in mdb_txn_begin (env=0xf24e10, parent=0x0, flags=0, ret=0x7f91b22b7190) at ./../../../libraries/liblmdb/mdb.c:2907 > #3 0x00000000005792f4 in mdb_opinfo_get (op=Modify request = {...}, mdb=0x7f91b543b010, rdonly=0, moip=0x7f91b22b7178) at id2entry.c:740 > #4 0x000000000052f989 in mdb_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at modify.c:584 > #5 0x00000000004f6d65 in overlay_op_walk (op=Modify request = {...}, rs=0x7f91b22b89b8, which=op_modify, oi=0xe29b10, on=0x0) at backover.c:706 > #6 0x00000000004f9d61 in over_op_func (op=Modify request = {...}, rs=0x7f91b22b89b8, which=op_modify) at backover.c:766 > #7 0x00000000004f8fb2 in over_op_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at backover.c:808 > #8 0x0000000000472e4e in fe_op_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at modify.c:307 > #9 0x0000000000471bf5 in do_modify (op=Modify request = {...}, rs=0x7f91b22b89b8) at modify.c:177 > #10 0x000000000044e1ce in connection_operation (ctx=0x7f91b22b8b78, arg_v=0x7f919c100fa0) at connection.c:1169 > #11 0x000000000044c3a1 in connection_read_thread (ctx=0x7f91b22b8b78, argv=0xf) at connection.c:1326 > > Thread 4 (Thread 0x7f91b2aba700 (LWP 8023)): > #2 0x00007f91b815f275 in ldap_pvt_thread_mutex_lock [waiting on thread 5 (LWP 8024)] (mutex=0xe7c328) at thr_posix.c:307 > #3 0x00007f91b54fffbc in accesslog_op_mod (op=Modify request = {...}, rs=0x7f91b2ab8bd8) at accesslog.c:1988 > #4 0x00000000004f6cc1 in overlay_op_walk (op=Modify request = {...}, rs=0x7f91b2ab8bd8, which=op_modify, oi=0xe29b10, on=0xe7c0c0) at backover.c:691 > #5 0x00000000004f9d61 in over_op_func (op=Modify request = {...}, rs=0x7f91b2ab8bd8, which=op_modify) at backover.c:766 > #6 0x00000000004f8fb2 in over_op_modify (op=Modify request = {...}, rs=0x7f91b2ab8bd8) at backover.c:808 > #7 0x00000000004ec259 in syncrepl_updateCookie (si=0xe7aa90, op=Modify request = {...}, syncCookie=0x7f91b2ab91f0, save=1) at syncrepl.c:4051 > #8 0x00000000004e978e in do_syncrep2 (op=Modify request = {...}, si=0xe7aa90) at syncrepl.c:1206 > #9 0x00000000004e2556 in do_syncrepl (ctx=0x7f91b2ab9b78, arg=0xe7b030) at syncrepl.c:1604 > > Thread 3 (Thread 0x7f91b32bb700 (LWP 8022)): > #0 0x00007f91b5dda352 in __pthread_mutex_lock_full [waiting on thread 4 (LWP 8023)] (mutex=0x7f91b8114040) at ../nptl/pthread_mutex_lock.c:313 > #1 0x000000000051a881 in mdb_txn_renew0 (txn=0x1027ac0) at ./../../../libraries/liblmdb/mdb.c:2749 > #2 0x000000000051afab in mdb_txn_begin (env=0xf24e10, parent=0x0, flags=0, ret=0x7f91a4001170) at ./../../../libraries/liblmdb/mdb.c:2907 > #3 0x00000000005792f4 in mdb_opinfo_get (op=Search request = {...}, mdb=0x7f91b543b010, rdonly=0, moip=0xe7b348) at id2entry.c:740 > #4 0x000000000057989d in mdb_txn (op=Search request = {...}, txnop=1, ptr=0xe7b348) at id2entry.c:812 > #5 0x00000000004ee059 in syncrepl_entry (si=0xe7b0f0, op=Search request = {...}, entry="dc=example,dc=com", modlist=0x7f91b32ba1a8, syncstate=1, syncUUID=0x7f91b32ba140, syncCSN=0x0) at syncrepl.c:3087 > #6 0x00000000004e8cac in do_syncrep2 (op=Search request = {...}, si=0xe7b0f0) at syncrepl.c:1056 > #7 0x00000000004e2556 in do_syncrepl (ctx=0x7f91b32bab78, arg=0xe7b6d0) at syncrepl.c:1604 > > Thread 2 (Thread 0x7f91b3abc700 (LWP 8021)): > #0 0x00007f91b5d0b207 in epoll_wait (epfd=6, events=0xdebc50, maxevents=1024, timeout=7000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 > #1 0x000000000044591a in slapd_daemon_task (ptr=0x1232700) at daemon.c:2745 > > Thread 1 (Thread 0x7f91b5761780 (LWP 8006)): > #1 0x00007f91b815f11d in ldap_pvt_thread_join (thread=140263761364736, thread_return=0x0) at thr_posix.c:208 > #2 0x0000000000443a1b in slapd_daemon () at daemon.c:3157 > #3 0x000000000041ec5b in main (argc=8, argv=0x7ffcee967dc8) at main.c:1018 > -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
changed notes changed state Closed to Open
changed notes changed state Open to Release
fixed in master fixed in RE24 (2.4.46) Further fixes in RE24 (2.4.47) Re-opening due to continued issues (at least in master)