[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
(ITS#7633) Slapd hangs on hdb write lock
Full_Name: Dusan Fric
Version: 2.4.32
OS: RHEL 5.7 x64
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (88.212.40.139)
We are experiencing frequent hangs in slapd. Once hung we usualy cannot continue
to
connect until we kill -9 the slapd process and restart it. The directory is used
for 2 applications as user eDir and we are using it in production over 1 month -
we have noted the busier the directory becomes the more often it hangs (now
twice per week).
We have installed identical configurations on 3 environments, each has only one
single server (no replication). There are 30k entries in the directory
(production).
We are running:
RHEL 5.7 x64 (VMWare with NFS mountpoints)
OpenLDAP 2.4.32
Berkeley DB 4.8.30 (4.7.25)
We were starting with DB 4.8.30, after downgrade to version 4.7.25 (according to
ITS#7378 - https://www.openldap.org/its/index.cgi/Incoming?id=7378;selectid=7378)
we are facing the same issue.
The problem occurs when two requests simultaneously try to update attributes of
a record and in most cases of the same DN value.
We can easily reproduce it with a java test program running 2 threads each
connecting to the ldap server and updating the record for a particular DN
value.
It need not be the same DN value but DN values which reside on the same BDB
page.
Configuration
olc cn=config (part):
dn: cn=config
objectClass: olcGlobal
cn: config
olcConcurrency: 0
olcConnMaxPending: 100
olcConnMaxPendingAuth: 1000
olcSockbufMaxIncoming: 262143
olcSockbufMaxIncomingAuth: 16777215
olcThreads: 32
olc DB hdb config (part):
dn: olcDatabase={1}hdb,cn=config
objectClass: olcHdbConfig
objectClass: olcDatabaseConfig
olcDatabase: {1}hdb
olcDbCacheSize: 80000
olcDbCheckpoint: 128 5
olcDbConfig: {0}set_cachesize 0 268435456 1
olcDbConfig: {1}set_lg_max 10485760
olcDbConfig: {2}set_lg_bsize 2097152
olcDbConfig: {3}set_lg_dir /pkg/openldap/dblog
olcDbConfig: {4}set_lg_regionmax 262144
olcDbConfig: {5}set_lk_detect DB_LOCK_EXPIRE
olcDbConfig: {6}set_flags DB_TXN_NOSYNC
olcDbDirtyRead: FALSE
olcDbDNcacheSize: 0
olcDbIDLcacheSize: 240000
olcDbIndex: default eq,sub
olcDbIndex: objectClass eq
olcDbIndex: cn eq,sub
olcDbIndex: uid eq,sub
olcDbIndex: mail pres,eq
olcDbIndex: sn eq,sub
olcDbIndex: member eq
olcDbLinearIndex: FALSE
olcDbMode: 0600
olcDbNoSync: TRUE
olcDbSearchStack: 20
olcDbShmKey: 0
olcLastMod: TRUE
olcMaxDerefDepth: 15
olcMonitoring: TRUE
olcReadOnly: FALSE
We have managed to collect db_stat lock information, which indicates the same
issue with DB write locks on both DB versions.
db_stat -C ol (4.8.30)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Lock REGINFO information:
Lock Region type
5 Region ID
/apps/DECCLASA-1/data/openldap/__db.005 Region name
0x2ab41cdc3000 Region address
0x2ab41cdc3138 Region primary address
0 Region maximum allocation
0 Region allocated
Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
REGION_JOIN_OK Region flags
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by lockers:
Locker Mode Count Status ----------------- Object ---------------
1 dd= 0 locks held 1 write locks 0 pid/thread 30034/47555767609232
1 READ 1 HELD id2entry.bdb handle 0
2 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232
3 dd= 0 locks held 1 write locks 0 pid/thread 30034/47555767609232
3 READ 1 HELD dn2id.bdb handle 0
4 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
5 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232
6 dd= 0 locks held 0 write locks 0 pid/thread 30034/1139579200
7 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
8 dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496
9 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
9 READ 1 HELD objectClass.bdb handle 0
a dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
b dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
c dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
c READ 1 HELD uid.bdb handle 0
d dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
e dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
f dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
10 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
10 READ 1 HELD member.bdb handle 0
11 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
12 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
13 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
13 READ 1 HELD mail.bdb handle 0
14 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
15 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
16 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
17 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
18 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
18 READ 1 HELD sn.bdb handle 0
19 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
1a dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
1b dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
1b READ 1 HELD cn.bdb handle 0
1c dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
1d dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
1e dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
1f dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496
80000003 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232
80000004 dd= 0 locks held 0 write locks 0 pid/thread 30034/1139579200
80000005 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
80000006 dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496
80000007 dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
80000026 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
80000026 READ 1 HELD 0x5e948 len: 9 data: 0x560x0500000000000000
80000027 dd= 0 locks held 1 write locks 0 pid/thread 30034/1131186496
80000027 READ 1 HELD 0x36490 len: 9 data: 0x370000000000000000
80000028 dd= 0 locks held 1 write locks 0 pid/thread 30034/1131186496
80000028 WRITE 1 WAIT mail.bdb page 3
80000028 READ 1 HELD mail.bdb page 3
80000029 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
80000029 WRITE 1 WAIT mail.bdb page 3
80000029 READ 1 HELD mail.bdb page 3
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by object:
Locker Mode Count Status ----------------- Object ---------------
80000027 READ 1 HELD 0x36490 len: 9 data: 0x370000000000000000
1b READ 1 HELD cn.bdb handle 0
10 READ 1 HELD member.bdb handle 0
9 READ 1 HELD objectClass.bdb handle 0
c READ 1 HELD uid.bdb handle 0
80000026 READ 1 HELD 0x5e948 len: 9 data: 0x560x0500000000000000
1 READ 1 HELD id2entry.bdb handle 0
18 READ 1 HELD sn.bdb handle 0
3 READ 1 HELD dn2id.bdb handle 0
80000029 READ 1 HELD mail.bdb page 3
80000028 READ 1 HELD mail.bdb page 3
80000029 WRITE 1 WAIT mail.bdb page 3
80000028 WRITE 1 WAIT mail.bdb page 3
13 READ 1 HELD mail.bdb handle 0
db_stat -C ol (4.7.25)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Lock REGINFO information:
Lock Region type
5 Region ID
/apps/DECCLASA-1/data/openldap/__db.005 Region name
0x2b4a8a577000 Original region address
0x2b4a8a577000 Region address
0x2b4a8a577138 Region primary address
0 Region maximum allocation
0 Region allocated
Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
REGION_JOIN_OK Region flags
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by lockers:
Locker Mode Count Status ----------------- Object ---------------
1 dd= 0 locks held 1 write locks 0 pid/thread 32447/47463753631632
1 READ 1 HELD id2entry.bdb handle 0
2 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632
3 dd= 0 locks held 1 write locks 0 pid/thread 32447/47463753631632
3 READ 1 HELD dn2id.bdb handle 0
4 dd= 0 locks held 0 write locks 0 pid/thread 32447/1114126656
5 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632
6 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
7 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
7 READ 1 HELD objectClass.bdb handle 0
8 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
9 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
a dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
b dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
c dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
c READ 1 HELD cn.bdb handle 0
d dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
e dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
f dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
f READ 1 HELD uid.bdb handle 0
10 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
11 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
12 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
12 READ 1 HELD member.bdb handle 0
13 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
14 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
15 dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952
15 READ 1 HELD mail.bdb handle 0
16 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
17 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
18 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
19 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
1a dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952
1a READ 1 HELD sn.bdb handle 0
1b dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
1c dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
1d dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
1e dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
1f dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
80000003 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632
80000004 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
80000008 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
80000038 dd= 0 locks held 0 write locks 0 pid/thread 32447/1114126656
800000a9 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
800000a9 READ 1 HELD 0x2d228 len: 9 data: 0x370000000000000000
800000aa dd= 0 locks held 2 write locks 0 pid/thread 32447/1097341248
800000aa WRITE 1 WAIT mail.bdb page 3
800000aa READ 1 HELD mail.bdb page 1
800000aa READ 1 HELD mail.bdb page 3
800000ab dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952
800000ab READ 1 HELD 0x43d58 len: 9 data: 0x540x0500000000000000
800000ac dd= 0 locks held 2 write locks 0 pid/thread 32447/1105733952
800000ac WRITE 1 WAIT mail.bdb page 3
800000ac READ 1 HELD mail.bdb page 1
800000ac READ 1 HELD mail.bdb page 3
800000b1 dd= 0 locks held 0 write locks 0 pid/thread 32447/1122519360
800000b2 dd= 0 locks held 0 write locks 0 pid/thread 32447/1130912064
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by object:
Locker Mode Count Status ----------------- Object ---------------
800000a9 READ 1 HELD 0x2d228 len: 9 data: 0x370000000000000000
1 READ 1 HELD id2entry.bdb handle 0
3 READ 1 HELD dn2id.bdb handle 0
7 READ 1 HELD objectClass.bdb handle 0
f READ 1 HELD uid.bdb handle 0
1a READ 1 HELD sn.bdb handle 0
800000aa READ 1 HELD mail.bdb page 3
800000ac READ 1 HELD mail.bdb page 3
800000aa WRITE 1 WAIT mail.bdb page 3
800000ac WRITE 1 WAIT mail.bdb page 3
c READ 1 HELD cn.bdb handle 0
15 READ 1 HELD mail.bdb handle 0
800000aa READ 1 HELD mail.bdb page 1
800000ac READ 1 HELD mail.bdb page 1
12 READ 1 HELD member.bdb handle 0
800000ab READ 1 HELD 0x43d58 len: 9 data: 0x540x0500000000000000
We have also collected the backtrace of threads for both DB versions which I
have uploaded to:
https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.8.30_20130625.txt
https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.7.25_20130625.txt