[Date Prev][Date Next] [Chronological] [Thread] [Top]

RE: (ITS#7633) Slapd hangs on hdb write lock



Hi Dusan,

On Fri, 28 Jun 2013, dusan.fric@t-systems.sk wrote:

> Hi Chu,
>
> thanks for your quick response.  We will try to contact Oracle to advice here.
>
> However, what is your suggestion here ' try different settings for the deadlock detector' exactly?
>
> BTW - certainly, we are patched DB 4.7.25 (all 4 patches)...


In case you can update to 2.4.35 you might want to test the mdb backend with your workload.

Especially if you can easily reproduce the problem.

Greetings
Christian



>
> Regards,
> Dusan
>
> -----Original Message-----
> From: Howard Chu [mailto:hyc@symas.com]
> Sent: 26. j??na 2013 13:32
> To: Fric, Dusan; openldap-its@openldap.org
> Subject: Re: (ITS#7633) Slapd hangs on hdb write lock
>
> dusan.fric@t-systems.sk wrote:
>> Full_Name: Dusan Fric
>> Version: 2.4.32
>> OS: RHEL 5.7 x64
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (88.212.40.139)
>>
>>
>> We are experiencing frequent hangs in slapd. Once hung we usualy
>> cannot continue to connect until we kill -9 the slapd process and
>> restart it. The directory is used for 2 applications as user eDir and
>> we are using it in production over 1 month - we have noted the busier
>> the directory becomes the more often it hangs (now twice per week).
>>
>> We have installed identical configurations on 3 environments, each has
>> only one single server (no replication). There are 30k entries in the
>> directory (production).
>>
>> We are running:
>>
>> RHEL 5.7 x64 (VMWare with NFS mountpoints) OpenLDAP 2.4.32 Berkeley DB
>> 4.8.30 (4.7.25)
>>
>> We were starting with DB 4.8.30, after downgrade to version 4.7.25
>> (according to
>> ITS#7378 -
>> https://www.openldap.org/its/index.cgi/Incoming?id=7378;selectid=7378)
>> we are facing the same issue.
>>
>> The problem occurs when two requests simultaneously try to update
>> attributes of a record and in most cases of the same DN value.
>> We can easily reproduce it with a java test program running 2 threads
>> each connecting to the ldap server and updating the record for a
>> particular DN value.
>> It need not be the same DN value but DN values which reside on the
>> same BDB page.
>
> Thanks for the traces, but the information you've provided shows that this is a BDB problem, not an OpenLDAP bug. You'll have to contact Oracle for help.
> The BDB deadlock detector is supposed to handle these situations.
>
> For example in your 4.8.30 lock status there are only 2 conflicting transactions 80000028 and 80000029 and they are only contending on a single page. This is one of the most elementary deadlock situations.
>
> You might try different settings for the deadlock detector, but ultimately this appears to be a BDB bug and the choice of detector shouldn't matter.
>
>> Configuration
>>
>> olc cn=config (part):
>> dn: cn=config
>> objectClass: olcGlobal
>> cn: config
>> olcConcurrency: 0
>> olcConnMaxPending: 100
>> olcConnMaxPendingAuth: 1000
>> olcSockbufMaxIncoming: 262143
>> olcSockbufMaxIncomingAuth: 16777215
>> olcThreads: 32
>>
>> olc DB hdb config (part):
>> dn: olcDatabase={1}hdb,cn=config
>> objectClass: olcHdbConfig
>> objectClass: olcDatabaseConfig
>> olcDatabase: {1}hdb
>> olcDbCacheSize: 80000
>> olcDbCheckpoint: 128 5
>> olcDbConfig: {0}set_cachesize   0       268435456       1
>> olcDbConfig: {1}set_lg_max      10485760
>> olcDbConfig: {2}set_lg_bsize    2097152
>> olcDbConfig: {3}set_lg_dir      /pkg/openldap/dblog
>> olcDbConfig: {4}set_lg_regionmax 262144
>> olcDbConfig: {5}set_lk_detect DB_LOCK_EXPIRE
>> olcDbConfig: {6}set_flags DB_TXN_NOSYNC
>> olcDbDirtyRead: FALSE
>> olcDbDNcacheSize: 0
>> olcDbIDLcacheSize: 240000
>> olcDbIndex: default eq,sub
>> olcDbIndex: objectClass eq
>> olcDbIndex: cn eq,sub
>> olcDbIndex: uid eq,sub
>> olcDbIndex: mail pres,eq
>> olcDbIndex: sn eq,sub
>> olcDbIndex: member eq
>> olcDbLinearIndex: FALSE
>> olcDbMode: 0600
>> olcDbNoSync: TRUE
>> olcDbSearchStack: 20
>> olcDbShmKey: 0
>> olcLastMod: TRUE
>> olcMaxDerefDepth: 15
>> olcMonitoring: TRUE
>> olcReadOnly: FALSE
>>
>> We have managed to collect db_stat lock information, which indicates
>> the same issue with DB write locks on both DB versions.
>>
>> db_stat -C ol (4.8.30)
>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> Lock REGINFO information:
>> Lock    Region type
>> 5       Region ID
>> /apps/DECCLASA-1/data/openldap/__db.005 Region name
>> 0x2ab41cdc3000  Region address
>> 0x2ab41cdc3138  Region primary address
>> 0       Region maximum allocation
>> 0       Region allocated
>> Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
>> REGION_JOIN_OK  Region flags
>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> Locks grouped by lockers:
>> Locker   Mode      Count Status  ----------------- Object ---------------
>>         1 dd= 0 locks held 1    write locks 0    pid/thread 30034/47555767609232
>>         1 READ          1 HELD    id2entry.bdb              handle        0
>>         2 dd= 0 locks held 0    write locks 0    pid/thread 30034/47555767609232
>>         3 dd= 0 locks held 1    write locks 0    pid/thread 30034/47555767609232
>>         3 READ          1 HELD    dn2id.bdb                 handle        0
>>         4 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>>         5 dd= 0 locks held 0    write locks 0    pid/thread 30034/47555767609232
>>         6 dd= 0 locks held 0    write locks 0    pid/thread 30034/1139579200
>>         7 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>>         8 dd= 0 locks held 0    write locks 0    pid/thread 30034/1131186496
>>         9 dd= 0 locks held 1    write locks 0    pid/thread 30034/1122793792
>>         9 READ          1 HELD    objectClass.bdb           handle        0
>>         a dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
>>         b dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
>>         c dd= 0 locks held 1    write locks 0    pid/thread 30034/1122793792
>>         c READ          1 HELD    uid.bdb                   handle        0
>>         d dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
>>         e dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
>>         f dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
>>        10 dd= 0 locks held 1    write locks 0    pid/thread 30034/1147971904
>>        10 READ          1 HELD    member.bdb                handle        0
>>        11 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>>        12 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>>        13 dd= 0 locks held 1    write locks 0    pid/thread 30034/1147971904
>>        13 READ          1 HELD    mail.bdb                  handle        0
>>        14 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>>        15 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>>        16 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>>        17 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>>        18 dd= 0 locks held 1    write locks 0    pid/thread 30034/1147971904
>>        18 READ          1 HELD    sn.bdb                    handle        0
>>        19 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>>        1a dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>>        1b dd= 0 locks held 1    write locks 0    pid/thread 30034/1147971904
>>        1b READ          1 HELD    cn.bdb                    handle        0
>>        1c dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>>        1d dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>>        1e dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
>>        1f dd= 0 locks held 0    write locks 0    pid/thread 30034/1131186496
>> 80000003 dd= 0 locks held 0    write locks 0    pid/thread 30034/47555767609232
>> 80000004 dd= 0 locks held 0    write locks 0    pid/thread 30034/1139579200
>> 80000005 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>> 80000006 dd= 0 locks held 0    write locks 0    pid/thread 30034/1131186496
>> 80000007 dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
>> 80000026 dd= 0 locks held 1    write locks 0    pid/thread 30034/1122793792
>> 80000026 READ          1 HELD    0x5e948 len:   9 data: 0x560x0500000000000000
>> 80000027 dd= 0 locks held 1    write locks 0    pid/thread 30034/1131186496
>> 80000027 READ          1 HELD    0x36490 len:   9 data: 0x370000000000000000
>> 80000028 dd= 0 locks held 1    write locks 0    pid/thread 30034/1131186496
>> 80000028 WRITE         1 WAIT    mail.bdb                  page          3
>> 80000028 READ          1 HELD    mail.bdb                  page          3
>> 80000029 dd= 0 locks held 1    write locks 0    pid/thread 30034/1122793792
>> 80000029 WRITE         1 WAIT    mail.bdb                  page          3
>> 80000029 READ          1 HELD    mail.bdb                  page          3
>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> Locks grouped by object:
>> Locker   Mode      Count Status  ----------------- Object ---------------
>> 80000027 READ          1 HELD    0x36490 len:   9 data: 0x370000000000000000
>>
>>        1b READ          1 HELD    cn.bdb                    handle        0
>>
>>        10 READ          1 HELD    member.bdb                handle        0
>>
>>         9 READ          1 HELD    objectClass.bdb           handle        0
>>
>>         c READ          1 HELD    uid.bdb                   handle        0
>>
>> 80000026 READ          1 HELD    0x5e948 len:   9 data: 0x560x0500000000000000
>>
>>         1 READ          1 HELD    id2entry.bdb              handle        0
>>
>>        18 READ          1 HELD    sn.bdb                    handle        0
>>
>>         3 READ          1 HELD    dn2id.bdb                 handle        0
>>
>> 80000029 READ          1 HELD    mail.bdb                  page          3
>> 80000028 READ          1 HELD    mail.bdb                  page          3
>> 80000029 WRITE         1 WAIT    mail.bdb                  page          3
>> 80000028 WRITE         1 WAIT    mail.bdb                  page          3
>>
>>        13 READ          1 HELD    mail.bdb                  handle        0
>>
>> db_stat -C ol (4.7.25)
>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> Lock REGINFO information:
>> Lock    Region type
>> 5       Region ID
>> /apps/DECCLASA-1/data/openldap/__db.005 Region name
>> 0x2b4a8a577000  Original region address
>> 0x2b4a8a577000  Region address
>> 0x2b4a8a577138  Region primary address
>> 0       Region maximum allocation
>> 0       Region allocated
>> Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
>> REGION_JOIN_OK  Region flags
>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> Locks grouped by lockers:
>> Locker   Mode      Count Status  ----------------- Object ---------------
>>         1 dd= 0 locks held 1    write locks 0    pid/thread 32447/47463753631632
>>         1 READ          1 HELD    id2entry.bdb              handle        0
>>         2 dd= 0 locks held 0    write locks 0    pid/thread 32447/47463753631632
>>         3 dd= 0 locks held 1    write locks 0    pid/thread 32447/47463753631632
>>         3 READ          1 HELD    dn2id.bdb                 handle        0
>>         4 dd= 0 locks held 0    write locks 0    pid/thread 32447/1114126656
>>         5 dd= 0 locks held 0    write locks 0    pid/thread 32447/47463753631632
>>         6 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>>         7 dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
>>         7 READ          1 HELD    objectClass.bdb           handle        0
>>         8 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>>         9 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>>         a dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>>         b dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>>         c dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
>>         c READ          1 HELD    cn.bdb                    handle        0
>>         d dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>>         e dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>>         f dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
>>         f READ          1 HELD    uid.bdb                   handle        0
>>        10 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>>        11 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>>        12 dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
>>        12 READ          1 HELD    member.bdb                handle        0
>>        13 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>>        14 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>>        15 dd= 0 locks held 1    write locks 0    pid/thread 32447/1105733952
>>        15 READ          1 HELD    mail.bdb                  handle        0
>>        16 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>>        17 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>>        18 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>>        19 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>>        1a dd= 0 locks held 1    write locks 0    pid/thread 32447/1105733952
>>        1a READ          1 HELD    sn.bdb                    handle        0
>>        1b dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>>        1c dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>>        1d dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>>        1e dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>>        1f dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>> 80000003 dd= 0 locks held 0    write locks 0    pid/thread 32447/47463753631632
>> 80000004 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>> 80000008 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>> 80000038 dd= 0 locks held 0    write locks 0    pid/thread 32447/1114126656
>> 800000a9 dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
>> 800000a9 READ          1 HELD    0x2d228 len:   9 data: 0x370000000000000000
>> 800000aa dd= 0 locks held 2    write locks 0    pid/thread 32447/1097341248
>> 800000aa WRITE         1 WAIT    mail.bdb                  page          3
>> 800000aa READ          1 HELD    mail.bdb                  page          1
>> 800000aa READ          1 HELD    mail.bdb                  page          3
>> 800000ab dd= 0 locks held 1    write locks 0    pid/thread 32447/1105733952
>> 800000ab READ          1 HELD    0x43d58 len:   9 data: 0x540x0500000000000000
>> 800000ac dd= 0 locks held 2    write locks 0    pid/thread 32447/1105733952
>> 800000ac WRITE         1 WAIT    mail.bdb                  page          3
>> 800000ac READ          1 HELD    mail.bdb                  page          1
>> 800000ac READ          1 HELD    mail.bdb                  page          3
>> 800000b1 dd= 0 locks held 0    write locks 0    pid/thread 32447/1122519360
>> 800000b2 dd= 0 locks held 0    write locks 0    pid/thread 32447/1130912064
>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> Locks grouped by object:
>> Locker   Mode      Count Status  ----------------- Object ---------------
>> 800000a9 READ          1 HELD    0x2d228 len:   9 data: 0x370000000000000000
>>
>>         1 READ          1 HELD    id2entry.bdb              handle        0
>>
>>         3 READ          1 HELD    dn2id.bdb                 handle        0
>>
>>         7 READ          1 HELD    objectClass.bdb           handle        0
>>
>>         f READ          1 HELD    uid.bdb                   handle        0
>>
>>        1a READ          1 HELD    sn.bdb                    handle        0
>>
>> 800000aa READ          1 HELD    mail.bdb                  page          3
>> 800000ac READ          1 HELD    mail.bdb                  page          3
>> 800000aa WRITE         1 WAIT    mail.bdb                  page          3
>> 800000ac WRITE         1 WAIT    mail.bdb                  page          3
>>
>>         c READ          1 HELD    cn.bdb                    handle        0
>>
>>        15 READ          1 HELD    mail.bdb                  handle        0
>>
>> 800000aa READ          1 HELD    mail.bdb                  page          1
>> 800000ac READ          1 HELD    mail.bdb                  page          1
>>
>>        12 READ          1 HELD    member.bdb                handle        0
>>
>> 800000ab READ          1 HELD    0x43d58 len:   9 data: 0x540x0500000000000000
>>
>>
>> We have also collected the backtrace of threads for both DB versions
>> which I have uploaded to:
>> https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.8.30_
>> 20130625.txt
>>
>> https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.7.25_
>> 20130625.txt
>>
>>
>
>
> --
>   -- Howard Chu
>   CTO, Symas Corp.           http://www.symas.com
>   Director, Highland Sun     http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/
>
>
>

-- 
Christian Kratzer                      CK Software GmbH
Email:   ck@cksoft.de                  Wildberger Weg 24/2
Phone:   +49 7032 893 997 - 0          D-71126 Gaeufelden
Fax:     +49 7032 893 997 - 9          HRB 245288, Amtsgericht Stuttgart
Web:     http://www.cksoft.de/         Geschaeftsfuehrer: Christian Kratzer