[Date Prev][Date Next] [Chronological] [Thread] [Top]

(ITS#7633) Slapd hangs on hdb write lock



Full_Name: Dusan Fric
Version: 2.4.32
OS: RHEL 5.7 x64
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (88.212.40.139)


We are experiencing frequent hangs in slapd. Once hung we usualy cannot continue
to
connect until we kill -9 the slapd process and restart it. The directory is used
for 2 applications as user eDir and we are using it in production over 1 month -
we have noted the busier the directory becomes the more often it hangs (now
twice per week).

We have installed identical configurations on 3 environments, each has only one
single server (no replication). There are 30k entries in the directory
(production).

We are running:

RHEL 5.7 x64 (VMWare with NFS mountpoints)
OpenLDAP 2.4.32
Berkeley DB 4.8.30 (4.7.25)

We were starting with DB 4.8.30, after downgrade to version 4.7.25 (according to
ITS#7378 - https://www.openldap.org/its/index.cgi/Incoming?id=7378;selectid=7378)
we are facing the same issue.

The problem occurs when two requests simultaneously try to update attributes of
a record and in most cases of the same DN value.
We can easily reproduce it with a java test program running 2 threads each
connecting to the ldap server and updating the record for a particular DN
value.
It need not be the same DN value but DN values which reside on the same BDB
page.

Configuration

olc cn=config (part):
dn: cn=config
objectClass: olcGlobal
cn: config
olcConcurrency: 0
olcConnMaxPending: 100
olcConnMaxPendingAuth: 1000
olcSockbufMaxIncoming: 262143
olcSockbufMaxIncomingAuth: 16777215
olcThreads: 32

olc DB hdb config (part):
dn: olcDatabase={1}hdb,cn=config
objectClass: olcHdbConfig
objectClass: olcDatabaseConfig
olcDatabase: {1}hdb
olcDbCacheSize: 80000
olcDbCheckpoint: 128 5
olcDbConfig: {0}set_cachesize   0       268435456       1
olcDbConfig: {1}set_lg_max      10485760
olcDbConfig: {2}set_lg_bsize    2097152
olcDbConfig: {3}set_lg_dir      /pkg/openldap/dblog
olcDbConfig: {4}set_lg_regionmax 262144
olcDbConfig: {5}set_lk_detect DB_LOCK_EXPIRE
olcDbConfig: {6}set_flags DB_TXN_NOSYNC
olcDbDirtyRead: FALSE
olcDbDNcacheSize: 0
olcDbIDLcacheSize: 240000
olcDbIndex: default eq,sub
olcDbIndex: objectClass eq
olcDbIndex: cn eq,sub
olcDbIndex: uid eq,sub
olcDbIndex: mail pres,eq
olcDbIndex: sn eq,sub
olcDbIndex: member eq
olcDbLinearIndex: FALSE
olcDbMode: 0600
olcDbNoSync: TRUE
olcDbSearchStack: 20
olcDbShmKey: 0
olcLastMod: TRUE
olcMaxDerefDepth: 15
olcMonitoring: TRUE
olcReadOnly: FALSE

We have managed to collect db_stat lock information, which indicates the same
issue with DB write locks on both DB versions.

db_stat -C ol (4.8.30)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Lock REGINFO information:
Lock    Region type
5       Region ID
/apps/DECCLASA-1/data/openldap/__db.005 Region name
0x2ab41cdc3000  Region address
0x2ab41cdc3138  Region primary address
0       Region maximum allocation
0       Region allocated
Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
REGION_JOIN_OK  Region flags
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by lockers:
Locker   Mode      Count Status  ----------------- Object ---------------
       1 dd= 0 locks held 1    write locks 0    pid/thread 30034/47555767609232
       1 READ          1 HELD    id2entry.bdb              handle        0
       2 dd= 0 locks held 0    write locks 0    pid/thread 30034/47555767609232
       3 dd= 0 locks held 1    write locks 0    pid/thread 30034/47555767609232
       3 READ          1 HELD    dn2id.bdb                 handle        0
       4 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
       5 dd= 0 locks held 0    write locks 0    pid/thread 30034/47555767609232
       6 dd= 0 locks held 0    write locks 0    pid/thread 30034/1139579200
       7 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
       8 dd= 0 locks held 0    write locks 0    pid/thread 30034/1131186496
       9 dd= 0 locks held 1    write locks 0    pid/thread 30034/1122793792
       9 READ          1 HELD    objectClass.bdb           handle        0
       a dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
       b dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
       c dd= 0 locks held 1    write locks 0    pid/thread 30034/1122793792
       c READ          1 HELD    uid.bdb                   handle        0
       d dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
       e dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
       f dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
      10 dd= 0 locks held 1    write locks 0    pid/thread 30034/1147971904
      10 READ          1 HELD    member.bdb                handle        0
      11 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
      12 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
      13 dd= 0 locks held 1    write locks 0    pid/thread 30034/1147971904
      13 READ          1 HELD    mail.bdb                  handle        0
      14 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
      15 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
      16 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
      17 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
      18 dd= 0 locks held 1    write locks 0    pid/thread 30034/1147971904
      18 READ          1 HELD    sn.bdb                    handle        0
      19 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
      1a dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
      1b dd= 0 locks held 1    write locks 0    pid/thread 30034/1147971904
      1b READ          1 HELD    cn.bdb                    handle        0
      1c dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
      1d dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
      1e dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
      1f dd= 0 locks held 0    write locks 0    pid/thread 30034/1131186496
80000003 dd= 0 locks held 0    write locks 0    pid/thread 30034/47555767609232
80000004 dd= 0 locks held 0    write locks 0    pid/thread 30034/1139579200
80000005 dd= 0 locks held 0    write locks 0    pid/thread 30034/1147971904
80000006 dd= 0 locks held 0    write locks 0    pid/thread 30034/1131186496
80000007 dd= 0 locks held 0    write locks 0    pid/thread 30034/1122793792
80000026 dd= 0 locks held 1    write locks 0    pid/thread 30034/1122793792
80000026 READ          1 HELD    0x5e948 len:   9 data: 0x560x0500000000000000
80000027 dd= 0 locks held 1    write locks 0    pid/thread 30034/1131186496
80000027 READ          1 HELD    0x36490 len:   9 data: 0x370000000000000000
80000028 dd= 0 locks held 1    write locks 0    pid/thread 30034/1131186496
80000028 WRITE         1 WAIT    mail.bdb                  page          3
80000028 READ          1 HELD    mail.bdb                  page          3
80000029 dd= 0 locks held 1    write locks 0    pid/thread 30034/1122793792
80000029 WRITE         1 WAIT    mail.bdb                  page          3
80000029 READ          1 HELD    mail.bdb                  page          3
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by object:
Locker   Mode      Count Status  ----------------- Object ---------------
80000027 READ          1 HELD    0x36490 len:   9 data: 0x370000000000000000

      1b READ          1 HELD    cn.bdb                    handle        0

      10 READ          1 HELD    member.bdb                handle        0

       9 READ          1 HELD    objectClass.bdb           handle        0

       c READ          1 HELD    uid.bdb                   handle        0

80000026 READ          1 HELD    0x5e948 len:   9 data: 0x560x0500000000000000

       1 READ          1 HELD    id2entry.bdb              handle        0

      18 READ          1 HELD    sn.bdb                    handle        0

       3 READ          1 HELD    dn2id.bdb                 handle        0

80000029 READ          1 HELD    mail.bdb                  page          3
80000028 READ          1 HELD    mail.bdb                  page          3
80000029 WRITE         1 WAIT    mail.bdb                  page          3
80000028 WRITE         1 WAIT    mail.bdb                  page          3

      13 READ          1 HELD    mail.bdb                  handle        0

db_stat -C ol (4.7.25)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Lock REGINFO information:
Lock    Region type
5       Region ID
/apps/DECCLASA-1/data/openldap/__db.005 Region name
0x2b4a8a577000  Original region address
0x2b4a8a577000  Region address
0x2b4a8a577138  Region primary address
0       Region maximum allocation
0       Region allocated
Region allocations: 3006 allocations, 0 failures, 0 frees, 1 longest
REGION_JOIN_OK  Region flags
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by lockers:
Locker   Mode      Count Status  ----------------- Object ---------------
       1 dd= 0 locks held 1    write locks 0    pid/thread 32447/47463753631632
       1 READ          1 HELD    id2entry.bdb              handle        0
       2 dd= 0 locks held 0    write locks 0    pid/thread 32447/47463753631632
       3 dd= 0 locks held 1    write locks 0    pid/thread 32447/47463753631632
       3 READ          1 HELD    dn2id.bdb                 handle        0
       4 dd= 0 locks held 0    write locks 0    pid/thread 32447/1114126656
       5 dd= 0 locks held 0    write locks 0    pid/thread 32447/47463753631632
       6 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
       7 dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
       7 READ          1 HELD    objectClass.bdb           handle        0
       8 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
       9 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
       a dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
       b dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
       c dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
       c READ          1 HELD    cn.bdb                    handle        0
       d dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
       e dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
       f dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
       f READ          1 HELD    uid.bdb                   handle        0
      10 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
      11 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
      12 dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
      12 READ          1 HELD    member.bdb                handle        0
      13 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
      14 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
      15 dd= 0 locks held 1    write locks 0    pid/thread 32447/1105733952
      15 READ          1 HELD    mail.bdb                  handle        0
      16 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
      17 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
      18 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
      19 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
      1a dd= 0 locks held 1    write locks 0    pid/thread 32447/1105733952
      1a READ          1 HELD    sn.bdb                    handle        0
      1b dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
      1c dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
      1d dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
      1e dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
      1f dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
80000003 dd= 0 locks held 0    write locks 0    pid/thread 32447/47463753631632
80000004 dd= 0 locks held 0    write locks 0    pid/thread 32447/1097341248
80000008 dd= 0 locks held 0    write locks 0    pid/thread 32447/1105733952
80000038 dd= 0 locks held 0    write locks 0    pid/thread 32447/1114126656
800000a9 dd= 0 locks held 1    write locks 0    pid/thread 32447/1097341248
800000a9 READ          1 HELD    0x2d228 len:   9 data: 0x370000000000000000
800000aa dd= 0 locks held 2    write locks 0    pid/thread 32447/1097341248
800000aa WRITE         1 WAIT    mail.bdb                  page          3
800000aa READ          1 HELD    mail.bdb                  page          1
800000aa READ          1 HELD    mail.bdb                  page          3
800000ab dd= 0 locks held 1    write locks 0    pid/thread 32447/1105733952
800000ab READ          1 HELD    0x43d58 len:   9 data: 0x540x0500000000000000
800000ac dd= 0 locks held 2    write locks 0    pid/thread 32447/1105733952
800000ac WRITE         1 WAIT    mail.bdb                  page          3
800000ac READ          1 HELD    mail.bdb                  page          1
800000ac READ          1 HELD    mail.bdb                  page          3
800000b1 dd= 0 locks held 0    write locks 0    pid/thread 32447/1122519360
800000b2 dd= 0 locks held 0    write locks 0    pid/thread 32447/1130912064
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Locks grouped by object:
Locker   Mode      Count Status  ----------------- Object ---------------
800000a9 READ          1 HELD    0x2d228 len:   9 data: 0x370000000000000000

       1 READ          1 HELD    id2entry.bdb              handle        0

       3 READ          1 HELD    dn2id.bdb                 handle        0

       7 READ          1 HELD    objectClass.bdb           handle        0

       f READ          1 HELD    uid.bdb                   handle        0

      1a READ          1 HELD    sn.bdb                    handle        0

800000aa READ          1 HELD    mail.bdb                  page          3
800000ac READ          1 HELD    mail.bdb                  page          3
800000aa WRITE         1 WAIT    mail.bdb                  page          3
800000ac WRITE         1 WAIT    mail.bdb                  page          3

       c READ          1 HELD    cn.bdb                    handle        0

      15 READ          1 HELD    mail.bdb                  handle        0

800000aa READ          1 HELD    mail.bdb                  page          1
800000ac READ          1 HELD    mail.bdb                  page          1

      12 READ          1 HELD    member.bdb                handle        0

800000ab READ          1 HELD    0x43d58 len:   9 data: 0x540x0500000000000000


We have also collected the backtrace of threads for both DB versions which I
have uploaded to:
https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.8.30_20130625.txt

https://dl.dropboxusercontent.com/u/92115703/dusan_gdb_threads_4.7.25_20130625.txt