[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
RE: (ITS#7633) Slapd hangs on hdb write lock
Hi Dusan,
On Fri, 28 Jun 2013, dusan.fric@t-systems.sk wrote:
> Hi Chu,
>
> thanks for your quick response. We will try to contact Oracle to advice here.
>
> However, what is your suggestion here ' try different settings for the deadlock detector' exactly?
>
> BTW - certainly, we are patched DB 4.7.25 (all 4 patches)...
In case you can update to 2.4.35 you might want to test the mdb backend with your workload.
Especially if you can easily reproduce the problem.
Greetings
Christian
>
> Regards,
> Dusan
>
> -----Original Message-----
> From: Howard Chu [mailto:hyc@symas.com]
> Sent: 26. j??na 2013 13:32
> To: Fric, Dusan; openldap-its@openldap.org
> Subject: Re: (ITS#7633) Slapd hangs on hdb write lock
>
> dusan.fric@t-systems.sk wrote:
>> Full_Name: Dusan Fric
>> Version: 2.4.32
>> OS: RHEL 5.7 x64
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (88.212.40.139)
>>
>>
>> We are experiencing frequent hangs in slapd. Once hung we usualy
>> cannot continue to connect until we kill -9 the slapd process and
>> restart it. The directory is used for 2 applications as user eDir and
>> we are using it in production over 1 month - we have noted the busier
>> the directory becomes the more often it hangs (now twice per week).
>>
>> We have installed identical configurations on 3 environments, each has
>> only one single server (no replication). There are 30k entries in the
>> directory (production).
>>
>> We are running:
>>
>> RHEL 5.7 x64 (VMWare with NFS mountpoints) OpenLDAP 2.4.32 Berkeley DB
>> 4.8.30 (4.7.25)
>>
>> We were starting with DB 4.8.30, after downgrade to version 4.7.25
>> (according to
>> ITS#7378 -
>> https://www.openldap.org/its/index.cgi/Incoming?id=7378;selectid=7378)
>> we are facing the same issue.
>>
>> The problem occurs when two requests simultaneously try to update
>> attributes of a record and in most cases of the same DN value.
>> We can easily reproduce it with a java test program running 2 threads
>> each connecting to the ldap server and updating the record for a
>> particular DN value.
>> It need not be the same DN value but DN values which reside on the
>> same BDB page.
>
> Thanks for the traces, but the information you've provided shows that this is a BDB problem, not an OpenLDAP bug. You'll have to contact Oracle for help.
> The BDB deadlock detector is supposed to handle these situations.
>
> For example in your 4.8.30 lock status there are only 2 conflicting transactions 80000028 and 80000029 and they are only contending on a single page. This is one of the most elementary deadlock situations.
>
> You might try different settings for the deadlock detector, but ultimately this appears to be a BDB bug and the choice of detector shouldn't matter.
>
>> Configuration
>>
>> olc cn=config (part):
>> dn: cn=config
>> objectClass: olcGlobal
>> cn: config
>> olcConcurrency: 0
>> olcConnMaxPending: 100
>> olcConnMaxPendingAuth: 1000
>> olcSockbufMaxIncoming: 262143
>> olcSockbufMaxIncomingAuth: 16777215
>> olcThreads: 32
>>
>> olc DB hdb config (part):
>> dn: olcDatabase={1}hdb,cn=config
>> objectClass: olcHdbConfig
>> objectClass: olcDatabaseConfig
>> olcDatabase: {1}hdb
>> olcDbCacheSize: 80000
>> olcDbCheckpoint: 128 5
>> olcDbConfig: {0}set_cachesize 0 268435456 1
>> olcDbConfig: {1}set_lg_max 10485760
>> olcDbConfig: {2}set_lg_bsize 2097152
>> olcDbConfig: {3}set_lg_dir /pkg/openldap/dblog
>> olcDbConfig: {4}set_lg_regionmax 262144
>> olcDbConfig: {5}set_lk_detect DB_LOCK_EXPIRE
>> olcDbConfig: {6}set_flags DB_TXN_NOSYNC
>> olcDbDirtyRead: FALSE
>> olcDbDNcacheSize: 0
>> olcDbIDLcacheSize: 240000
>> olcDbIndex: default eq,sub
>> olcDbIndex: objectClass eq
>> olcDbIndex: cn eq,sub
>> olcDbIndex: uid eq,sub
>> olcDbIndex: mail pres,eq
>> olcDbIndex: sn eq,sub
>> olcDbIndex: member eq
>> olcDbLinearIndex: FALSE
>> olcDbMode: 0600
>> olcDbNoSync: TRUE
>> olcDbSearchStack: 20
>> olcDbShmKey: 0
>> olcLastMod: TRUE
>> olcMaxDerefDepth: 15
>> olcMonitoring: TRUE
>> olcReadOnly: FALSE
>>
>> We have managed to collect db_stat lock information, which indicates
>> the same issue with DB write locks on both DB versions.
>>
>> db_stat -C ol (4.8.30)
>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> Lock REGINFO information:
>> Lock Region type
>> 5 Region ID
>> /apps/DECCLASA-1/data/openldap/__db.005 Region name
>> 0x2ab41cdc3000 Region address
>> 0x2ab41cdc3138 Region primary address
>> 0 Region maximum allocation
>> 0 Region allocated
>> Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
>> REGION_JOIN_OK Region flags
>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> Locks grouped by lockers:
>> Locker Mode Count Status ----------------- Object ---------------
>> 1 dd= 0 locks held 1 write locks 0 pid/thread 30034/47555767609232
>> 1 READ 1 HELD id2entry.bdb handle 0
>> 2 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232
>> 3 dd= 0 locks held 1 write locks 0 pid/thread 30034/47555767609232
>> 3 READ 1 HELD dn2id.bdb handle 0
>> 4 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
>> 5 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232
>> 6 dd= 0 locks held 0 write locks 0 pid/thread 30034/1139579200
>> 7 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
>> 8 dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496
>> 9 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
>> 9 READ 1 HELD objectClass.bdb handle 0
>> a dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
>> b dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
>> c dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
>> c READ 1 HELD uid.bdb handle 0
>> d dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
>> e dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
>> f dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
>> 10 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
>> 10 READ 1 HELD member.bdb handle 0
>> 11 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
>> 12 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
>> 13 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
>> 13 READ 1 HELD mail.bdb handle 0
>> 14 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
>> 15 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
>> 16 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
>> 17 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
>> 18 dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
>> 18 READ 1 HELD sn.bdb handle 0
>> 19 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
>> 1a dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
>> 1b dd= 0 locks held 1 write locks 0 pid/thread 30034/1147971904
>> 1b READ 1 HELD cn.bdb handle 0
>> 1c dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
>> 1d dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
>> 1e dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
>> 1f dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496
>> 80000003 dd= 0 locks held 0 write locks 0 pid/thread 30034/47555767609232
>> 80000004 dd= 0 locks held 0 write locks 0 pid/thread 30034/1139579200
>> 80000005 dd= 0 locks held 0 write locks 0 pid/thread 30034/1147971904
>> 80000006 dd= 0 locks held 0 write locks 0 pid/thread 30034/1131186496
>> 80000007 dd= 0 locks held 0 write locks 0 pid/thread 30034/1122793792
>> 80000026 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
>> 80000026 READ 1 HELD 0x5e948 len: 9 data: 0x560x0500000000000000
>> 80000027 dd= 0 locks held 1 write locks 0 pid/thread 30034/1131186496
>> 80000027 READ 1 HELD 0x36490 len: 9 data: 0x370000000000000000
>> 80000028 dd= 0 locks held 1 write locks 0 pid/thread 30034/1131186496
>> 80000028 WRITE 1 WAIT mail.bdb page 3
>> 80000028 READ 1 HELD mail.bdb page 3
>> 80000029 dd= 0 locks held 1 write locks 0 pid/thread 30034/1122793792
>> 80000029 WRITE 1 WAIT mail.bdb page 3
>> 80000029 READ 1 HELD mail.bdb page 3
>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> Locks grouped by object:
>> Locker Mode Count Status ----------------- Object ---------------
>> 80000027 READ 1 HELD 0x36490 len: 9 data: 0x370000000000000000
>>
>> 1b READ 1 HELD cn.bdb handle 0
>>
>> 10 READ 1 HELD member.bdb handle 0
>>
>> 9 READ 1 HELD objectClass.bdb handle 0
>>
>> c READ 1 HELD uid.bdb handle 0
>>
>> 80000026 READ 1 HELD 0x5e948 len: 9 data: 0x560x0500000000000000
>>
>> 1 READ 1 HELD id2entry.bdb handle 0
>>
>> 18 READ 1 HELD sn.bdb handle 0
>>
>> 3 READ 1 HELD dn2id.bdb handle 0
>>
>> 80000029 READ 1 HELD mail.bdb page 3
>> 80000028 READ 1 HELD mail.bdb page 3
>> 80000029 WRITE 1 WAIT mail.bdb page 3
>> 80000028 WRITE 1 WAIT mail.bdb page 3
>>
>> 13 READ 1 HELD mail.bdb handle 0
>>
>> db_stat -C ol (4.7.25)
>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> Lock REGINFO information:
>> Lock Region type
>> 5 Region ID
>> /apps/DECCLASA-1/data/openldap/__db.005 Region name
>> 0x2b4a8a577000 Original region address
>> 0x2b4a8a577000 Region address
>> 0x2b4a8a577138 Region primary address
>> 0 Region maximum allocation
>> 0 Region allocated
>> Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
>> REGION_JOIN_OK Region flags
>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> Locks grouped by lockers:
>> Locker Mode Count Status ----------------- Object ---------------
>> 1 dd= 0 locks held 1 write locks 0 pid/thread 32447/47463753631632
>> 1 READ 1 HELD id2entry.bdb handle 0
>> 2 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632
>> 3 dd= 0 locks held 1 write locks 0 pid/thread 32447/47463753631632
>> 3 READ 1 HELD dn2id.bdb handle 0
>> 4 dd= 0 locks held 0 write locks 0 pid/thread 32447/1114126656
>> 5 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632
>> 6 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
>> 7 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
>> 7 READ 1 HELD objectClass.bdb handle 0
>> 8 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
>> 9 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
>> a dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
>> b dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
>> c dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
>> c READ 1 HELD cn.bdb handle 0
>> d dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
>> e dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
>> f dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
>> f READ 1 HELD uid.bdb handle 0
>> 10 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
>> 11 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
>> 12 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
>> 12 READ 1 HELD member.bdb handle 0
>> 13 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
>> 14 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
>> 15 dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952
>> 15 READ 1 HELD mail.bdb handle 0
>> 16 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
>> 17 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
>> 18 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
>> 19 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
>> 1a dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952
>> 1a READ 1 HELD sn.bdb handle 0
>> 1b dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
>> 1c dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
>> 1d dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
>> 1e dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
>> 1f dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
>> 80000003 dd= 0 locks held 0 write locks 0 pid/thread 32447/47463753631632
>> 80000004 dd= 0 locks held 0 write locks 0 pid/thread 32447/1097341248
>> 80000008 dd= 0 locks held 0 write locks 0 pid/thread 32447/1105733952
>> 80000038 dd= 0 locks held 0 write locks 0 pid/thread 32447/1114126656
>> 800000a9 dd= 0 locks held 1 write locks 0 pid/thread 32447/1097341248
>> 800000a9 READ 1 HELD 0x2d228 len: 9 data: 0x370000000000000000
>> 800000aa dd= 0 locks held 2 write locks 0 pid/thread 32447/1097341248
>> 800000aa WRITE 1 WAIT mail.bdb page 3
>> 800000aa READ 1 HELD mail.bdb page 1
>> 800000aa READ 1 HELD mail.bdb page 3
>> 800000ab dd= 0 locks held 1 write locks 0 pid/thread 32447/1105733952
>> 800000ab READ 1 HELD 0x43d58 len: 9 data: 0x540x0500000000000000
>> 800000ac dd= 0 locks held 2 write locks 0 pid/thread 32447/1105733952
>> 800000ac WRITE 1 WAIT mail.bdb page 3
>> 800000ac READ 1 HELD mail.bdb page 1
>> 800000ac READ 1 HELD mail.bdb page 3
>> 800000b1 dd= 0 locks held 0 write locks 0 pid/thread 32447/1122519360
>> 800000b2 dd= 0 locks held 0 write locks 0 pid/thread 32447/1130912064
>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> Locks grouped by object:
>> Locker Mode Count Status ----------------- Object ---------------
>> 800000a9 READ 1 HELD 0x2d228 len: 9 data: 0x370000000000000000
>>
>> 1 READ 1 HELD id2entry.bdb handle 0
>>
>> 3 READ 1 HELD dn2id.bdb handle 0
>>
>> 7 READ 1 HELD objectClass.bdb handle 0
>>
>> f READ 1 HELD uid.bdb handle 0
>>
>> 1a READ 1 HELD sn.bdb handle 0
>>
>> 800000aa READ 1 HELD mail.bdb page 3
>> 800000ac READ 1 HELD mail.bdb page 3
>> 800000aa WRITE 1 WAIT mail.bdb page 3
>> 800000ac WRITE 1 WAIT mail.bdb page 3
>>
>> c READ 1 HELD cn.bdb handle 0
>>
>> 15 READ 1 HELD mail.bdb handle 0
>>
>> 800000aa READ 1 HELD mail.bdb page 1
>> 800000ac READ 1 HELD mail.bdb page 1
>>
>> 12 READ 1 HELD member.bdb handle 0
>>
>> 800000ab READ 1 HELD 0x43d58 len: 9 data: 0x540x0500000000000000
>>
>>
>> We have also collected the backtrace of threads for both DB versions
>> which I have uploaded to:
>> https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.8.30_
>> 20130625.txt
>>
>> https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.7.25_
>> 20130625.txt
>>
>>
>
>
> --
> -- Howard Chu
> CTO, Symas Corp. http://www.symas.com
> Director, Highland Sun http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP http://www.openldap.org/project/
>
>
>
--
Christian Kratzer CK Software GmbH
Email: ck@cksoft.de Wildberger Weg 24/2
Phone: +49 7032 893 997 - 0 D-71126 Gaeufelden
Fax: +49 7032 893 997 - 9 HRB 245288, Amtsgericht Stuttgart
Web: http://www.cksoft.de/ Geschaeftsfuehrer: Christian Kratzer