Issue 7951 - slapd deadlocks during mod operation
Summary: slapd deadlocks during mod operation
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: 2.4.36
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-26 00:39 UTC by christianclinton@gmail.com
Modified: 2014-12-05 19:57 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description christianclinton@gmail.com 2014-09-26 00:39:17 UTC
Full_Name: Christian Clinton
Version: 2.4.36
OS: CentOS 6.4
URL: ftp://ftp.openldap.org/incoming/christian-clinton-140925.zip
Submission from: (NULL) (216.148.0.72)


Description: Slapd hangs during LDAP modification operation. Subsequent read or
write operations hang from all clients until slapd restart.

slapd logs show:
slapd[11213]: => bdb_idl_insert_key: c_put id failed: BDB0068 DB_LOCK_DEADLOCK:
Locker killed to resolve a deadlock (-30993)
slapd[11213]: conn=12787 op=25: attribute "entryCSN" index add failure
slapd[11213]: => bdb_idl_insert_key: c_put id failed: BDB0068 DB_LOCK_DEADLOCK:
Locker killed to resolve a deadlock (-30993)
slapd[11213]: conn=13382 op=3: attribute "entryCSN" index add failure

This issue has occurred sporadically for several months across different
OpenLDAP versions. I managed to capture of a backtrace for two occurrences of
the crash (excerpt below, full backtrace loloaded to FTP). In both instances, a
couple of threads were acquiring locks which is where I suspect the issue lies.

BT: ftp://ftp.openldap.org/incoming/christian-clinton-140925.zip

Environment:
OS: CentOS 6.4
OpenLDAP 2.4.36
1 Master w/ many syncrepl slaves

Thread 129 (Thread 0x7f10ddffb700 (LWP 16586)):
#0  0x00000037e560b43c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f11a076e623 in __db_hybrid_mutex_suspend () from
/usr/local/openldap/lib/libdb-5.3.so
#2  0x00007f11a076d6eb in __db_tas_mutex_lock () from
/usr/local/openldap/lib/libdb-5.3.so
#3  0x00007f11a08205e6 in __lock_get_internal () from
/usr/local/openldap/lib/libdb-5.3.so
#4  0x00007f11a08208ab in __lock_get () from
/usr/local/openldap/lib/libdb-5.3.so
#5  0x00007f11a0848468 in __db_lget () from
/usr/local/openldap/lib/libdb-5.3.so
#6  0x00007f11a078dff6 in __bam_search () from
/usr/local/openldap/lib/libdb-5.3.so
#7  0x00007f11a0778ed6 in __bamc_search () from
/usr/local/openldap/lib/libdb-5.3.so
#8  0x00007f11a077c637 in __bamc_get () from
/usr/local/openldap/lib/libdb-5.3.so
#9  0x00007f11a083507d in __dbc_iget () from
/usr/local/openldap/lib/libdb-5.3.so
#10 0x00007f11a0845d00 in __dbc_get_pp () from
/usr/local/openldap/lib/libdb-5.3.so
#11 0x00000000005489ed in hdb_idl_delete_key (be=0x7f10ddff9480,
db=0x7f111a3f9610, tid=0x7f10603015b0, key=0x7f10ddff8ec0, id=250418) at
idl.c:947
#12 0x000000000054b455 in hdb_key_change (be=0x7f10ddff9480, db=0x7f111a3f9610,
txn=0x7f10603015b0, k=0x7f1060001ab0, id=250418, op=2) at key.c:97
#13 0x000000000054a4ee in indexer (op=0x7f10603c9f90, txn=0x7f10603015b0,
ad=0x1b6d5e0, atname=0x1b6d4b8, vals=0x7f1060001a88, id=250418, opid=2, mask=4)
at index.c:214
#14 0x000000000054a978 in index_at_values (op=0x7f10603c9f90,
txn=0x7f10603015b0, ad=0x1b6d5e0, type=0x1b6d450, tags=0x1b6d600,
vals=0x7f1060001a88, id=250418, opid=2)
    at index.c:337
#15 0x000000000054aaf6 in hdb_index_values (op=0x7f10603c9f90,
txn=0x7f10603015b0, desc=0x1b6d5e0, vals=0x7f1060001a88, id=250418, opid=2) at
index.c:386
#16 0x00000000004dbb44 in hdb_modify_internal (op=0x7f10603c9f90,
tid=0x7f10603015b0, modlist=0x7f10603f9930, e=0x7f10ddff91d0,
text=0x7f10ddffa990, 
    textbuf=0x7f10ddff9260 "\300\032\070\220\020\177", textlen=256) at
modify.c:358
#17 0x00000000004dcc09 in hdb_modify (op=0x7f10603c9f90, rs=0x7f10ddffa970) at
modify.c:642
#18 0x00000000004bbc11 in overlay_op_walk (op=0x7f10603c9f90, rs=0x7f10ddffa970,
which=op_modify, oi=0x1c9a500, on=0x0) at backover.c:671
#19 0x00000000004bbe28 in over_op_func (op=0x7f10603c9f90, rs=0x7f10ddffa970,
which=op_modify) at backover.c:723
#20 0x00000000004bbf64 in over_op_modify (op=0x7f10603c9f90, rs=0x7f10ddffa970)
at backover.c:762
#21 0x000000000044d5c6 in fe_op_modify (op=0x7f10603c9f90, rs=0x7f10ddffa970) at
modify.c:303
#22 0x000000000044ce98 in do_modify (op=0x7f10603c9f90, rs=0x7f10ddffa970) at
modify.c:177
#23 0x000000000042d068 in connection_operation (ctx=0x7f10ddffaaa0,
arg_v=0x7f10603c9f90) at connection.c:1155
#24 0x000000000042d609 in connection_read_thread (ctx=0x7f10ddffaaa0, argv=0xb7)
at connection.c:1291
#25 0x0000000000596261 in ldap_int_thread_pool_wrapper (xpool=0x1b77350) at
tpool.c:688
#26 0x00000037e5607851 in start_thread () from /lib64/libpthread.so.0
#27 0x00000037e52e890d in clone () from /lib64/libc.so.6

Thread 71 (Thread 0x7f0fedffb700 (LWP 16648)):
#0  0x00000037e560b43c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f11a076e623 in __db_hybrid_mutex_suspend () from
/usr/local/openldap/lib/libdb-5.3.so
#2  0x00007f11a076d6eb in __db_tas_mutex_lock () from
/usr/local/openldap/lib/libdb-5.3.so
#3  0x00007f11a08205e6 in __lock_get_internal () from
/usr/local/openldap/lib/libdb-5.3.so
#4  0x00007f11a08208ab in __lock_get () from
/usr/local/openldap/lib/libdb-5.3.so
#5  0x00007f11a0848468 in __db_lget () from
/usr/local/openldap/lib/libdb-5.3.so
#6  0x00007f11a078dff6 in __bam_search () from
/usr/local/openldap/lib/libdb-5.3.so
#7  0x00007f11a0778ed6 in __bamc_search () from
/usr/local/openldap/lib/libdb-5.3.so
#8  0x00007f11a077cb1f in __bamc_get () from
/usr/local/openldap/lib/libdb-5.3.so
#9  0x00007f11a083507d in __dbc_iget () from
/usr/local/openldap/lib/libdb-5.3.so
#10 0x00007f11a0845d00 in __dbc_get_pp () from
/usr/local/openldap/lib/libdb-5.3.so
#11 0x000000000053f414 in hdb_dn2id_delete (op=0x7f10883ba740,
txn=0x7f10a82e7430, eip=0x7f10505a82b0, e=0x7f117b58b968) at dn2id.c:646
#12 0x000000000053d8f0 in hdb_delete (op=0x7f10883ba740, rs=0x7f0fedffa970) at
dele.c.c:406
#13 0x00000000004bbc11 in overlay_op_walk (op=0x7f10883ba740, rs=0x7f0fedffa970,
which=op_delete, oi=0x1c9a500, on=0x0) at backover.c:671
#14 0x00000000004bbe28 in over_op_func (op=0x7f10883ba740, rs=0x7f0fedffa970,
which=op_delete) at backover.c:723
#15 0x00000000004bbfe2 in over_op_delete (op=0x7f10883ba740, rs=0x7f0fedffa970)
at backover.c:780
#16 0x000000000044ff7c in fe_op_delete (op=0x7f10883ba740, rs=0x7f0fedffa970) at
delete.c:174
#17 0x000000000044fbef in do_delete (op=0x7f10883ba740, rs=0x7f0fedffa970) at
delete.c:95
#18 0x000000000042d068 in connection_operation (ctx=0x7f0fedffaaa0,
arg_v=0x7f10883ba740) at connection.c:1155
#19 0x000000000042d609 in connection_read_thread (ctx=0x7f0fedffaaa0, argv=0x3f)
at connection.cA1A1291
#20 0x0000000000596261 in ldap_int_thread_pool_wrapper (xpool=0x1b77350) at
tpool.c:688
#21 0x00000037e5607851 in start_thread () from /lib64/libpthread.so.0
#22 0x00000037e52e890d in clone () from /lib64/libc.so.6
Comment 1 Quanah Gibson-Mount 2014-09-26 01:01:01 UTC
--On Friday, September 26, 2014 1:39 AM +0000 christianclinton@gmail.com 
wrote:

> Full_Name: Christian Clinton
> Version: 2.4.36
> OS: CentOS 6.4
> URL: ftp://ftp.openldap.org/incoming/christian-clinton-140925.zip
> Submission from: (NULL) (216.148.0.72)
>
>
> Description: Slapd hangs during LDAP modification operation. Subsequent
> read or write operations hang from all clients until slapd restart.

Upgrade to OpenLDAP 2.4.40, use back-mdb with a 64-bit OS, and dump the 
deprecated and useless back-hdb/bdb.

--Quanah


--

Quanah Gibson-Mount
Server Architect
Zimbra, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration

Comment 2 Howard Chu 2014-09-26 01:15:59 UTC
christianclinton@gmail.com wrote:
> Full_Name: Christian Clinton
> Version: 2.4.36
> OS: CentOS 6.4
> URL: ftp://ftp.openldap.org/incoming/christian-clinton-140925.zip
> Submission from: (NULL) (216.148.0.72)
>
>
> Description: Slapd hangs during LDAP modification operation. Subsequent read or
> write operations hang from all clients until slapd restart.

2.4.36 is over a year old. Bugs reported against old releases will not be 
investigated.

> slapd logs show:
> slapd[11213]: => bdb_idl_insert_key: c_put id failed: BDB0068 DB_LOCK_DEADLOCK:
> Locker killed to resolve a deadlock (-30993)
> slapd[11213]: conn=12787 op=25: attribute "entryCSN" index add failure
> slapd[11213]: => bdb_idl_insert_key: c_put id failed: BDB0068 DB_LOCK_DEADLOCK:
> Locker killed to resolve a deadlock (-30993)
> slapd[11213]: conn=13382 op=3: attribute "entryCSN" index add failure

This is a normal occurrence in BerkeleyDB and does not indicate an actual 
error. The bdb/hdb backend code automatically retries when a BDB deadlock is 
reported.

> This issue has occurred sporadically for several months across different
> OpenLDAP versions. I managed to capture of a backtrace for two occurrences of
> the crash (excerpt below, full backtrace loloaded to FTP). In both instances, a
> couple of threads were acquiring locks which is where I suspect the issue lies.
>
> BT: ftp://ftp.openldap.org/incoming/christian-clinton-140925.zip

When talking about BDB deadlock issues you should always obtain the output 
from db_stat -CA on the db directory, in addition to the full stack trace.

You should consider testing with different BDB versions as well, as issues are 
known to slip into various updates.

> Environment:
> OS: CentOS 6.4
> OpenLDAP 2.4.36
> 1 Master w/ many syncrepl slaves
>
> Thread 129 (Thread 0x7f10ddffb700 (LWP 16586)):
> #0  0x00000037e560b43c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00007f11a076e623 in __db_hybrid_mutex_suspend () from
> /usr/local/openldap/lib/libdb-5.3.so
> #2  0x00007f11a076d6eb in __db_tas_mutex_lock () from
> /usr/local/openldap/lib/libdb-5.3.so
> #3  0x00007f11a08205e6 in __lock_get_internal () from
> /usr/local/openldap/lib/libdb-5.3.so
> #4  0x00007f11a08208ab in __lock_get () from
> /usr/local/openldap/lib/libdb-5.3.so
> #5  0x00007f11a0848468 in __db_lget () from
> /usr/local/openldap/lib/libdb-5.3.so
> #6  0x00007f11a078dff6 in __bam_search () from
> /usr/local/openldap/lib/libdb-5.3.so
> #7  0x00007f11a0778ed6 in __bamc_search () from
> /usr/local/openldap/lib/libdb-5.3.so
> #8  0x00007f11a077c637 in __bamc_get () from
> /usr/local/openldap/lib/libdb-5.3.so
> #9  0x00007f11a083507d in __dbc_iget () from
> /usr/local/openldap/lib/libdb-5.3.so
> #10 0x00007f11a0845d00 in __dbc_get_pp () from
> /usr/local/openldap/lib/libdb-5.3.so
> #11 0x00000000005489ed in hdb_idl_delete_key (be=0x7f10ddff9480,
> db=0x7f111a3f9610, tid=0x7f10603015b0, key=0x7f10ddff8ec0, id=250418) at
> idl.c:947
> #12 0x000000000054b455 in hdb_key_change (be=0x7f10ddff9480, db=0x7f111a3f9610,
> txn=0x7f10603015b0, k=0x7f1060001ab0, id=250418, op=2) at key.c:97
> #13 0x000000000054a4ee in indexer (op=0x7f10603c9f90, txn=0x7f10603015b0,
> ad=0x1b6d5e0, atname=0x1b6d4b8, vals=0x7f1060001a88, id=250418, opid=2, mask=4)
> at index.c:214
> #14 0x000000000054a978 in index_at_values (op=0x7f10603c9f90,
> txn=0x7f10603015b0, ad=0x1b6d5e0, type=0x1b6d450, tags=0x1b6d600,
> vals=0x7f1060001a88, id=250418, opid=2)
>      at index.c:337
> #15 0x000000000054aaf6 in hdb_index_values (op=0x7f10603c9f90,
> txn=0x7f10603015b0, desc=0x1b6d5e0, vals=0x7f1060001a88, id=250418, opid=2) at
> index.c:386
> #16 0x00000000004dbb44 in hdb_modify_internal (op=0x7f10603c9f90,
> tid=0x7f10603015b0, modlist=0x7f10603f9930, e=0x7f10ddff91d0,
> text=0x7f10ddffa990,
>      textbuf=0x7f10ddff9260 "\300\032\070\220\020\177", textlen=256) at
> modify.c:358
> #17 0x00000000004dcc09 in hdb_modify (op=0x7f10603c9f90, rs=0x7f10ddffa970) at
> modify.c:642
> #18 0x00000000004bbc11 in overlay_op_walk (op=0x7f10603c9f90, rs=0x7f10ddffa970,
> which=op_modify, oi=0x1c9a500, on=0x0) at backover.c:671
> #19 0x00000000004bbe28 in over_op_func (op=0x7f10603c9f90, rs=0x7f10ddffa970,
> which=op_modify) at backover.c:723
> #20 0x00000000004bbf64 in over_op_modify (op=0x7f10603c9f90, rs=0x7f10ddffa970)
> at backover.c:762
> #21 0x000000000044d5c6 in fe_op_modify (op=0x7f10603c9f90, rs=0x7f10ddffa970) at
> modify.c:303
> #22 0x000000000044ce98 in do_modify (op=0x7f10603c9f90, rs=0x7f10ddffa970) at
> modify.c:177
> #23 0x000000000042d068 in connection_operation (ctx=0x7f10ddffaaa0,
> arg_v=0x7f10603c9f90) at connection.c:1155
> #24 0x000000000042d609 in connection_read_thread (ctx=0x7f10ddffaaa0, argv=0xb7)
> at connection.c:1291
> #25 0x0000000000596261 in ldap_int_thread_pool_wrapper (xpool=0x1b77350) at
> tpool.c:688
> #26 0x00000037e5607851 in start_thread () from /lib64/libpthread.so.0
> #27 0x00000037e52e890d in clone () from /lib64/libc.so.6
>
> Thread 71 (Thread 0x7f0fedffb700 (LWP 16648)):
> #0  0x00000037e560b43c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00007f11a076e623 in __db_hybrid_mutex_suspend () from
> /usr/local/openldap/lib/libdb-5.3.so
> #2  0x00007f11a076d6eb in __db_tas_mutex_lock () from
> /usr/local/openldap/lib/libdb-5.3.so
> #3  0x00007f11a08205e6 in __lock_get_internal () from
> /usr/local/openldap/lib/libdb-5.3.so
> #4  0x00007f11a08208ab in __lock_get () from
> /usr/local/openldap/lib/libdb-5.3.so
> #5  0x00007f11a0848468 in __db_lget () from
> /usr/local/openldap/lib/libdb-5.3.so
> #6  0x00007f11a078dff6 in __bam_search () from
> /usr/local/openldap/lib/libdb-5.3.so
> #7  0x00007f11a0778ed6 in __bamc_search () from
> /usr/local/openldap/lib/libdb-5.3.so
> #8  0x00007f11a077cb1f in __bamc_get () from
> /usr/local/openldap/lib/libdb-5.3.so
> #9  0x00007f11a083507d in __dbc_iget () from
> /usr/local/openldap/lib/libdb-5.3.so
> #10 0x00007f11a0845d00 in __dbc_get_pp () from
> /usr/local/openldap/lib/libdb-5.3.so
> #11 0x000000000053f414 in hdb_dn2id_delete (op=0x7f10883ba740,
> txn=0x7f10a82e7430, eip=0x7f10505a82b0, e=0x7f117b58b968) at dn2id.c:646
> #12 0x000000000053d8f0 in hdb_delete (op=0x7f10883ba740, rs=0x7f0fedffa970) at
> dele.c.c:406
> #13 0x00000000004bbc11 in overlay_op_walk (op=0x7f10883ba740, rs=0x7f0fedffa970,
> which=op_delete, oi=0x1c9a500, on=0x0) at backover.c:671
> #14 0x00000000004bbe28 in over_op_func (op=0x7f10883ba740, rs=0x7f0fedffa970,
> which=op_delete) at backover.c:723
> #15 0x00000000004bbfe2 in over_op_delete (op=0x7f10883ba740, rs=0x7f0fedffa970)
> at backover.c:780
> #16 0x000000000044ff7c in fe_op_delete (op=0x7f10883ba740, rs=0x7f0fedffa970) at
> delete.c:174
> #17 0x000000000044fbef in do_delete (op=0x7f10883ba740, rs=0x7f0fedffa970) at
> delete.c:95
> #18 0x000000000042d068 in connection_operation (ctx=0x7f0fedffaaa0,
> arg_v=0x7f10883ba740) at connection.c:1155
> #19 0x000000000042d609 in connection_read_thread (ctx=0x7f0fedffaaa0, argv=0x3f)
> at connection.cA1A1291
> #20 0x0000000000596261 in ldap_int_thread_pool_wrapper (xpool=0x1b77350) at
> tpool.c:688
> #21 0x00000037e5607851 in start_thread () from /lib64/libpthread.so.0
> #22 0x00000037e52e890d in clone () from /lib64/libc.so.6
>
>


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 3 Howard Chu 2014-12-05 19:57:03 UTC
changed state Open to Closed