[Date Prev][Date Next] [Chronological] [Thread] [Top]

RE: (ITS#7633) Slapd hangs on hdb write lock



Hi Chu,

thanks for your quick response.  We will try to contact Oracle to advice here.

However, what is your suggestion here ' try different settings for the deadlock detector' exactly?

BTW - certainly, we are patched DB 4.7.25 (all 4 patches)...

Regards,
Dusan

-----Original Message-----
From: Howard Chu [mailto:hyc@symas.com]
Sent: 26. júna 2013 13:32
To: Fric, Dusan; openldap-its@openldap.org
Subject: Re: (ITS#7633) Slapd hangs on hdb write lock

dusan.fric@t-systems.sk wrote:
> Full_Name: Dusan Fric
> Version: 2.4.32
> OS: RHEL 5.7 x64
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (88.212.40.139)
>
>
> We are experiencing frequent hangs in slapd. Once hung we usualy
> cannot continue to connect until we kill -9 the slapd process and
> restart it. The directory is used for 2 applications as user eDir and
> we are using it in production over 1 month - we have noted the busier
> the directory becomes the more often it hangs (now twice per week).
>
> We have installed identical configurations on 3 environments, each has
> only one single server (no replication). There are 30k entries in the
> directory (production).
>
> We are running:
>
> RHEL 5.7 x64 (VMWare with NFS mountpoints) OpenLDAP 2.4.32 Berkeley DB
> 4.8.30 (4.7.25)
>
> We were starting with DB 4.8.30, after downgrade to version 4.7.25
> (according to
> ITS#7378 -
> https://www.openldap.org/its/index.cgi/Incoming?id=7378;selectid=7378)
> we are facing the same issue.
>
> The problem occurs when two requests simultaneously try to update
> attributes of a record and in most cases of the same DN value.
> We can easily reproduce it with a java test program running 2 threads
> each connecting to the ldap server and updating the record for a
> particular DN value.
> It need not be the same DN value but DN values which reside on the
> same BDB page.

Thanks for the traces, but the information you've provided shows that this is a BDB problem, not an OpenLDAP bug. You'll have to contact Oracle for help.
The BDB deadlock detector is supposed to handle these situations.

For example in your 4.8.30 lock status there are only 2 conflicting transactions 80000028 and 80000029 and they are only contending on a single page. This is one of the most elementary deadlock situations.

You might try different settings for the deadlock detector, but ultimately this appears to be a BDB bug and the choice of detector shouldn't matter.

> Configuration
>
> olc cn=config (part):
> dn: cn=config
> objectClass: olcGlobal
> cn: config
> olcConcurrency: 0
> olcConnMaxPending: 100
> olcConnMaxPendingAuth: 1000
> olcSockbufMaxIncoming: 262143
> olcSockbufMaxIncomingAuth: 16777215
> olcThreads: 32
>
> olc DB hdb config (part):
> dn: olcDatabase={1}hdb,cn=config
> objectClass: olcHdbConfig
> objectClass: olcDatabaseConfig
> olcDatabase: {1}hdb
> olcDbCacheSize: 80000
> olcDbCheckpoint: 128 5
> olcDbConfig: {0}set_cachesize   0       268435456       1
> olcDbConfig: {1}set_lg_max      10485760
> olcDbConfig: {2}set_lg_bsize    2097152
> olcDbConfig: {3}set_lg_dir      /pkg/openldap/dblog
> olcDbConfig: {4}set_lg_regionmax 262144
> olcDbConfig: {5}set_lk_detect DB_LOCK_EXPIRE
> olcDbConfig: {6}set_flags DB_TXN_NOSYNC
> olcDbDirtyRead: FALSE
> olcDbDNcacheSize: 0
> olcDbIDLcacheSize: 240000
> olcDbIndex: default eq,sub
> olcDbIndex: objectClass eq
> olcDbIndex: cn eq,sub
> olcDbIndex: uid eq,sub
> olcDbIndex: mail pres,eq
> olcDbIndex: sn eq,sub
> olcDbIndex: member eq
> olcDbLinearIndex: FALSE
> olcDbMode: 0600
> olcDbNoSync: TRUE
> olcDbSearchStack: 20
> olcDbShmKey: 0
> olcLastMod: TRUE
> olcMaxDerefDepth: 15
> olcMonitoring: TRUE
> olcReadOnly: FALSE
>
> We have managed to collect db_stat lock information, which indicates
> the same issue with DB write locks on both DB versions.
>
> db_stat -C ol (4.8.30)
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Lock REGINFO information:
> Lock    Region type
> 5       Region ID
> /apps/DECCLASA-1/data/openldap/__db.005 Region name
> 0x2ab41cdc3000  Region address
> 0x2ab41cdc3138  Region primary address
> 0       Region maximum allocation
> 0       Region allocated
> Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
> REGION_JOIN_OK  Region flags
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Locks grouped by lockers:
> Locker   Mode      Count Status  ----------------- Object ---------------
>         1 dd= 0 locks held 1    write locks 0    pid/thread 30034/47555767609232
>         1 READ          1 HELD    id2entry.bdb              handle        0
>         2 dd= 0 locks held 0    write locks 0    pid/thread 30034/47555767609232
>         3 dd= 0 locks held 1    write locks 0    pid/thread 30034/47555767609232
>         3 READ          1 HELD    dn2id.bdb                 handle        0
>         4 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>         5 dd= 0 locks held 0    write locks 0    pid/thread 30034/47555767609232
>         6 dd= 0 locks held 0    write locks 0    pid/thread 30034/1139579200
>         7 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>         8 dd= 0 locks held 0    write locks 0    pid/thread 30034/1131186496
>         9 dd= 0 locks held 1    write locks 0    pid/thread 30034/1122793792
>         9 READ          1 HELD    objectClass.bdb           handle        0
>         a dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
>         b dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
>         c dd= 0 locks held 1    write locks 0    pid/thread 30034/1122793792
>         c READ          1 HELD    uid.bdb                   handle        0
>         d dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
>         e dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
>         f dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
>        10 dd= 0 locks held 1    write locks 0    pid/thread 30034/1147971904
>        10 READ          1 HELD    member.bdb                handle        0
>        11 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>        12 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>        13 dd= 0 locks held 1    write locks 0    pid/thread 30034/1147971904
>        13 READ          1 HELD    mail.bdb                  handle        0
>        14 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>        15 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>        16 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>        17 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>        18 dd= 0 locks held 1    write locks 0    pid/thread 30034/1147971904
>        18 READ          1 HELD    sn.bdb                    handle        0
>        19 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>        1a dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>        1b dd= 0 locks held 1    write locks 0    pid/thread 30034/1147971904
>        1b READ          1 HELD    cn.bdb                    handle        0
>        1c dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>        1d dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
>        1e dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
>        1f dd= 0 locks held 0    write locks 0    pid/thread 30034/1131186496
> 80000003 dd= 0 locks held 0    write locks 0    pid/thread 30034/47555767609232
> 80000004 dd= 0 locks held 0    write locks 0    pid/thread 30034/1139579200
> 80000005 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
> 80000006 dd= 0 locks held 0    write locks 0    pid/thread 30034/1131186496
> 80000007 dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
> 80000026 dd= 0 locks held 1    write locks 0    pid/thread 30034/1122793792
> 80000026 READ          1 HELD    0x5e948 len:   9 data: 0x560x0500000000000000
> 80000027 dd= 0 locks held 1    write locks 0    pid/thread 30034/1131186496
> 80000027 READ          1 HELD    0x36490 len:   9 data: 0x370000000000000000
> 80000028 dd= 0 locks held 1    write locks 0    pid/thread 30034/1131186496
> 80000028 WRITE         1 WAIT    mail.bdb                  page          3
> 80000028 READ          1 HELD    mail.bdb                  page          3
> 80000029 dd= 0 locks held 1    write locks 0    pid/thread 30034/1122793792
> 80000029 WRITE         1 WAIT    mail.bdb                  page          3
> 80000029 READ          1 HELD    mail.bdb                  page          3
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Locks grouped by object:
> Locker   Mode      Count Status  ----------------- Object ---------------
> 80000027 READ          1 HELD    0x36490 len:   9 data: 0x370000000000000000
>
>        1b READ          1 HELD    cn.bdb                    handle        0
>
>        10 READ          1 HELD    member.bdb                handle        0
>
>         9 READ          1 HELD    objectClass.bdb           handle        0
>
>         c READ          1 HELD    uid.bdb                   handle        0
>
> 80000026 READ          1 HELD    0x5e948 len:   9 data: 0x560x0500000000000000
>
>         1 READ          1 HELD    id2entry.bdb              handle        0
>
>        18 READ          1 HELD    sn.bdb                    handle        0
>
>         3 READ          1 HELD    dn2id.bdb                 handle        0
>
> 80000029 READ          1 HELD    mail.bdb                  page          3
> 80000028 READ          1 HELD    mail.bdb                  page          3
> 80000029 WRITE         1 WAIT    mail.bdb                  page          3
> 80000028 WRITE         1 WAIT    mail.bdb                  page          3
>
>        13 READ          1 HELD    mail.bdb                  handle        0
>
> db_stat -C ol (4.7.25)
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Lock REGINFO information:
> Lock    Region type
> 5       Region ID
> /apps/DECCLASA-1/data/openldap/__db.005 Region name
> 0x2b4a8a577000  Original region address
> 0x2b4a8a577000  Region address
> 0x2b4a8a577138  Region primary address
> 0       Region maximum allocation
> 0       Region allocated
> Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
> REGION_JOIN_OK  Region flags
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Locks grouped by lockers:
> Locker   Mode      Count Status  ----------------- Object ---------------
>         1 dd= 0 locks held 1    write locks 0    pid/thread 32447/47463753631632
>         1 READ          1 HELD    id2entry.bdb              handle        0
>         2 dd= 0 locks held 0    write locks 0    pid/thread 32447/47463753631632
>         3 dd= 0 locks held 1    write locks 0    pid/thread 32447/47463753631632
>         3 READ          1 HELD    dn2id.bdb                 handle        0
>         4 dd= 0 locks held 0    write locks 0    pid/thread 32447/1114126656
>         5 dd= 0 locks held 0    write locks 0    pid/thread 32447/47463753631632
>         6 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>         7 dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
>         7 READ          1 HELD    objectClass.bdb           handle        0
>         8 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>         9 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>         a dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>         b dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>         c dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
>         c READ          1 HELD    cn.bdb                    handle        0
>         d dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>         e dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>         f dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
>         f READ          1 HELD    uid.bdb                   handle        0
>        10 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>        11 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>        12 dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
>        12 READ          1 HELD    member.bdb                handle        0
>        13 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>        14 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
>        15 dd= 0 locks held 1    write locks 0    pid/thread 32447/1105733952
>        15 READ          1 HELD    mail.bdb                  handle        0
>        16 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>        17 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>        18 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>        19 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>        1a dd= 0 locks held 1    write locks 0    pid/thread 32447/1105733952
>        1a READ          1 HELD    sn.bdb                    handle        0
>        1b dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>        1c dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>        1d dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>        1e dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
>        1f dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
> 80000003 dd= 0 locks held 0    write locks 0    pid/thread 32447/47463753631632
> 80000004 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
> 80000008 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
> 80000038 dd= 0 locks held 0    write locks 0    pid/thread 32447/1114126656
> 800000a9 dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
> 800000a9 READ          1 HELD    0x2d228 len:   9 data: 0x370000000000000000
> 800000aa dd= 0 locks held 2    write locks 0    pid/thread 32447/1097341248
> 800000aa WRITE         1 WAIT    mail.bdb                  page          3
> 800000aa READ          1 HELD    mail.bdb                  page          1
> 800000aa READ          1 HELD    mail.bdb                  page          3
> 800000ab dd= 0 locks held 1    write locks 0    pid/thread 32447/1105733952
> 800000ab READ          1 HELD    0x43d58 len:   9 data: 0x540x0500000000000000
> 800000ac dd= 0 locks held 2    write locks 0    pid/thread 32447/1105733952
> 800000ac WRITE         1 WAIT    mail.bdb                  page          3
> 800000ac READ          1 HELD    mail.bdb                  page          1
> 800000ac READ          1 HELD    mail.bdb                  page          3
> 800000b1 dd= 0 locks held 0    write locks 0    pid/thread 32447/1122519360
> 800000b2 dd= 0 locks held 0    write locks 0    pid/thread 32447/1130912064
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Locks grouped by object:
> Locker   Mode      Count Status  ----------------- Object ---------------
> 800000a9 READ          1 HELD    0x2d228 len:   9 data: 0x370000000000000000
>
>         1 READ          1 HELD    id2entry.bdb              handle        0
>
>         3 READ          1 HELD    dn2id.bdb                 handle        0
>
>         7 READ          1 HELD    objectClass.bdb           handle        0
>
>         f READ          1 HELD    uid.bdb                   handle        0
>
>        1a READ          1 HELD    sn.bdb                    handle        0
>
> 800000aa READ          1 HELD    mail.bdb                  page          3
> 800000ac READ          1 HELD    mail.bdb                  page          3
> 800000aa WRITE         1 WAIT    mail.bdb                  page          3
> 800000ac WRITE         1 WAIT    mail.bdb                  page          3
>
>         c READ          1 HELD    cn.bdb                    handle        0
>
>        15 READ          1 HELD    mail.bdb                  handle        0
>
> 800000aa READ          1 HELD    mail.bdb                  page          1
> 800000ac READ          1 HELD    mail.bdb                  page          1
>
>        12 READ          1 HELD    member.bdb                handle        0
>
> 800000ab READ          1 HELD    0x43d58 len:   9 data: 0x540x0500000000000000
>
>
> We have also collected the backtrace of threads for both DB versions
> which I have uploaded to:
> https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.8.30_
> 20130625.txt
>
> https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.7.25_
> 20130625.txt
>
>


--
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/