OpenLDAP
Viewing Incoming/7378

From: nikolai@net24.co.nz
Subject: Slapd hangs on bdb write lock
Date: Sat, 01 Sep 2012 13:46:02 +0000
From: nikolai@net24.co.nz
To: openldap-its@OpenLDAP.org
Subject: Slapd hangs on bdb write lock
Full_Name: Nikolai Schupbach
Version: 2.4.31
OS: FreeBSD
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (202.78.158.60)


We are experiencing frequent hangs in slapd. Once hung we can continue to
connect, but all searches will just hang indefinitely until we kill -9 the slapd
process and restart it. The directory is used for mail routing and we have been
migrating to it from an existing directory server over the last 3 weeks - we
have noted the busier the directory becomes the more often it hangs (now once
every 2 days).

We have one master and 10 syncrepl read-only replicas - the master is used
mainly for writes and has not hung yet, but most of the replicas have hung at
least once. The replicas receive anywhere between 50 and 300 searches/sec, while
the master would only get 1/sec. There are 45k entries in the directory.

We are running:

FreeBSD 8.3/9.0 x64
OpenLDAP 2.4.31
Berkeley DB 4.6.21

The old directory we are migrating from has the same load and is also running
OpenLDAP, but has been rock solid for 5 years. It is running Berkeley DB 4.3.29
and OpenLDAP 2.3.27.

We have managed to collect db_stat lock information, which indicates the same
issue each time - a write lock on dn2id.bdb.

Locks grouped by object:
Locker   Mode      Count Status  ----------------- Object ---------------
8000a85e READ          1 HELD    0xb26c8 len:   9 data: 60xa800000000000000

      8a READ          1 HELD    id2entry.bdb              handle        0

      8c READ          1 HELD    dn2id.bdb                 handle        0

      96 READ          1 HELD    objectClass.bdb           handle        0

      93 READ          1 HELD    entryCSN.bdb              handle        0

      90 READ          1 HELD    entryUUID.bdb             handle        0

8000a85f WRITE         4 HELD    dn2id.bdb                 page        219

80000782 READ          1 HELD    dn2id.bdb                 page        768
80000a45 READ          1 HELD    dn2id.bdb                 page        768
80000b9e READ          1 HELD    dn2id.bdb                 page        768
800006a0 READ          1 HELD    dn2id.bdb                 page        768
80000771 READ          1 HELD    dn2id.bdb                 page        768
80000534 READ          1 HELD    dn2id.bdb                 page        768
80000a44 READ          1 HELD    dn2id.bdb                 page        768
80000641 READ          1 HELD    dn2id.bdb                 page        768
80001049 READ          1 HELD    dn2id.bdb                 page        768
8000104a READ          1 HELD    dn2id.bdb                 page        768
80001048 READ          1 HELD    dn2id.bdb                 page        768
80000783 READ          1 HELD    dn2id.bdb                 page        768
80000535 READ          1 HELD    dn2id.bdb                 page        768
8000066e READ          1 HELD    dn2id.bdb                 page        768
80000697 READ          1 HELD    dn2id.bdb                 page        768
8000a85f READ          1 HELD    dn2id.bdb                 page        768

8000a85e READ          1 HELD    0xb19a8 len:   9 data: 40xa800000000000000

8000a85f READ          1 HELD    dn2id.bdb                 page        933
8000a85f WRITE         2 HELD    dn2id.bdb                 page        933

80001047 WRITE         1 HELD    dn2id.bdb                 page        559
80000782 READ          1 WAIT    dn2id.bdb                 page        559
80000a45 READ          1 WAIT    dn2id.bdb                 page        559
80000b9e READ          1 WAIT    dn2id.bdb                 page        559
800006a0 READ          1 WAIT    dn2id.bdb                 page        559
80000771 READ          1 WAIT    dn2id.bdb                 page        559
80000534 READ          1 WAIT    dn2id.bdb                 page        559
80000a44 READ          1 WAIT    dn2id.bdb                 page        559
80000641 READ          1 WAIT    dn2id.bdb                 page        559
80001049 READ          1 WAIT    dn2id.bdb                 page        559
8000104a READ          1 WAIT    dn2id.bdb                 page        559
80001048 READ          1 WAIT    dn2id.bdb                 page        559
80000783 READ          1 WAIT    dn2id.bdb                 page        559
80000535 READ          1 WAIT    dn2id.bdb                 page        559
8000066e READ          1 WAIT    dn2id.bdb                 page        559
80000697 READ          1 WAIT    dn2id.bdb                 page        559
8000a85f READ          1 WAIT    dn2id.bdb                 page        559

8000a85f READ          2 HELD    dn2id.bdb                 page       1362
8000a85f WRITE         2 HELD    dn2id.bdb                 page       1362

8000a85f READ          2 HELD    dn2id.bdb                 page       1353
8000a85f WRITE         2 HELD    dn2id.bdb                 page       1353

      b6 READ          1 HELD    uid.bdb                  

Message of length 10582 truncated
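
A minimal sketch of how a "Locks grouped by object" dump like the one above could be scanned for contended pages. It assumes the line layout shown in this report (locker, mode, count, status, file, "page", page number). The names LOCK_RE and find_blockers are hypothetical helpers for illustration and are not part of Berkeley DB or OpenLDAP.

```python
import re
from collections import defaultdict

# Matches page-lock lines such as:
#   80001047 WRITE         1 HELD    dn2id.bdb                 page        559
# (layout assumed from the db_stat output quoted in this report)
LOCK_RE = re.compile(
    r"^\s*(?P<locker>[0-9a-f]+)\s+(?P<mode>READ|WRITE)\s+(?P<count>\d+)\s+"
    r"(?P<status>HELD|WAIT)\s+(?P<file>\S+)\s+page\s+(?P<page>\d+)\s*$"
)


def find_blockers(dump):
    """Return {(file, page): (holders, waiters)} for every page with a waiter."""
    holders = defaultdict(list)
    waiters = defaultdict(list)
    for line in dump.splitlines():
        m = LOCK_RE.match(line)
        if not m:
            continue  # headers, handle locks and non-page objects are skipped
        key = (m["file"], int(m["page"]))
        target = holders if m["status"] == "HELD" else waiters
        target[key].append((m["locker"], m["mode"]))
    return {key: (holders[key], waiters[key]) for key in waiters}


# A few lines copied from the dump above.
SAMPLE = """\
8000a85f WRITE         4 HELD    dn2id.bdb                 page        219
80001047 WRITE         1 HELD    dn2id.bdb                 page        559
80000782 READ          1 WAIT    dn2id.bdb                 page        559
8000a85f READ          1 WAIT    dn2id.bdb                 page        559
"""

if __name__ == "__main__":
    for (name, page), (held, waiting) in find_blockers(SAMPLE).items():
        print(f"{name} page {page}: held by {held}, {len(waiting)} waiting")
```

Run against the full dump, this would single out dn2id.bdb page 559, where locker 80001047 holds a WRITE lock and every other locker listed for that page is waiting for a READ lock.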

Followup 1

Date: Sat, 01 Sep 2012 12:07:09 -0700
From: Quanah Gibson-Mount <quanah@zimbra.com>
To: nikolai@net24.co.nz, openldap-its@openldap.org
Subject: Re: (ITS#7378) Slapd hangs on bdb write lock
--On Saturday, September 01, 2012 1:46 PM +0000 nikolai@net24.co.nz wrote:

> Full_Name: Nikolai Schupbach
> Version: 2.4.31
> OS: FreeBSD
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (202.78.158.60)

Have you confirmed this isn't the same issue as ITS#7222, which was fixed in
OpenLDAP 2.4.32?

--Quanah



--

Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration



Followup 2

Subject: Re: (ITS#7378) Slapd hangs on bdb write lock
From: Nikolai Schupbach <nikolai@net24.co.nz>
Date: Sun, 2 Sep 2012 11:34:35 +1200
Cc: openldap-its@openldap.org
To: Quanah Gibson-Mount <quanah@zimbra.com>
I haven't yet - I wanted to collect information before making any changes. I did
look at that fix and wasn't confident it would solve our problem. You're right
though - I need to test it to rule it out. I will upgrade all the servers to
2.4.32 and report back.





Followup 3

Date: Sun, 02 Sep 2012 15:11:54 +0200
From: Michael Ströder <michael@stroeder.com>
To: openldap-its@openldap.org
Subject: Re: (ITS#7378) Slapd hangs on bdb write lock
A couple of days ago I had a hang with OpenLDAP 2.4.32 / back-hdb running on
Debian Squeeze, self-compiled against BDB 4.8.30. It seemed the database was
locked, as restarting slapd or even rebooting the OS did not help.
Unfortunately I had to bring up the system as fast as possible and could not
examine the problem.

The system has only 200 entries and not much load yet. I had renamed entries
with web2ldap when all 4 masters (4-way MMR) locked up one after the other.

So there seem to be lockup problems in 2.4.32.




Followup 4

Date: Mon, 03 Sep 2012 02:45:37 -0700
From: Howard Chu <hyc@symas.com>
To: nikolai@net24.co.nz
CC: openldap-its@openldap.org
Subject: Re: (ITS#7378) Slapd hangs on bdb write lock
nikolai@net24.co.nz wrote:
> Full_Name: Nikolai Schupbach
> Version: 2.4.31
> OS: FreeBSD
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (202.78.158.60)
> 
> 
> We are experiencing frequent hangs in slapd. Once hung we can continue to
> connect, but all searches will just hang indefinitely until we kill -9 the
> slapd process and restart it. The directory is used for mail routing and we
> have been migrating to it from an existing directory server over the last 3
> weeks - we have noted the busier the directory becomes the more often it
> hangs (now once every 2 days).
>
> We have one master and 10 syncrepl read only replicas - the master is used
> mainly for writes and has not hung yet, but most of the replicas have hung
> at least once. The replicas receive anywhere between 50 to 300 searches/sec,
> while the master would only get 1/sec. There are 45k entries in the
> directory.
>
> We are running:
>
> FreeBSD 8.3/9.0 x64
> OpenLDAP 2.4.31
> Berkeley DB 4.6.21
>
> The old directory we are migrating from has the same load and is also
> running OpenLDAP, but has been rock solid for 5 years. It is running
> Berkeley DB 4.3.29 and OpenLDAP 2.3.27.
>
> We have managed to collect db_stat lock information, which indicates the
> same issue each time - a write lock on dn2id.bdb.

It's more than that. Your db_stat shows that a single thread has 3 active
transactions. This should never happen:

8000a85e dd= 0 locks held 2    write locks 0    pid/thread 88000/34386526336
8000a85e READ          1 HELD    0xb19a8 len:   9 data: 40xa800000000000000
8000a85e READ          1 HELD    0xb26c8 len:   9 data: 60xa800000000000000
8000a85f dd= 0 locks held 8    write locks 4    pid/thread 88000/34386526336
8000a85f READ          1 WAIT    dn2id.bdb                 page        559
8000a85f READ          1 HELD    dn2id.bdb                 page        768
8000a85f WRITE         2 HELD    dn2id.bdb                 page       1362
8000a85f READ          2 HELD    dn2id.bdb                 page       1362
8000a85f WRITE         2 HELD    dn2id.bdb                 page       1353
8000a85f READ          2 HELD    dn2id.bdb                 page       1353
8000a85f WRITE         2 HELD    dn2id.bdb                 page        933
8000a85f READ          1 HELD    dn2id.bdb                 page        933
8000a85f WRITE         4 HELD    dn2id.bdb                 page        219
80001047 dd=28 locks held 1    write locks 1    pid/thread 88000/34386526336
80001047 WRITE         1 HELD    dn2id.bdb                 page        559

I would first recommend changing from BDB 4.6.21 to some other version. There
are no code paths in back-bdb where we would ever return without either
committing or aborting the current transactions, so this appears to be a BDB
bug, not an OpenLDAP bug.
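
The check described above (one thread owning several transactions at once) can be sketched as a small script over the locker summary lines. It assumes the "pid/thread" line layout shown in this message, and it assumes that Berkeley DB transaction locker IDs start at 0x80000000 while lower IDs are non-transactional lockers. The helper names are hypothetical.

```python
import re
from collections import defaultdict

# Matches locker summary lines such as:
#   8000a85f dd= 0 locks held 8    write locks 4    pid/thread 88000/34386526336
# (layout assumed from the db_stat excerpt in this message)
LOCKER_RE = re.compile(
    r"^\s*(?P<locker>[0-9a-f]+)\s+dd=\s*(?P<dd>\d+)\s+locks held\s+(?P<held>\d+)\s+"
    r"write locks\s+(?P<writes>\d+)\s+pid/thread\s+(?P<pid>\d+)/(?P<thread>\d+)"
)

# Assumption: transaction IDs are allocated from 0x80000000 upwards.
TXN_MINIMUM = 0x80000000


def lockers_by_thread(dump):
    """Map (pid, thread) to the locker IDs that db_stat attributes to it."""
    threads = defaultdict(list)
    for line in dump.splitlines():
        m = LOCKER_RE.match(line)
        if m:
            threads[(int(m["pid"]), int(m["thread"]))].append(m["locker"])
    return dict(threads)


def suspicious_threads(dump):
    """Return threads that own more than one transactional locker at once."""
    result = {}
    for thread, lockers in lockers_by_thread(dump).items():
        txns = [l for l in lockers if int(l, 16) >= TXN_MINIMUM]
        if len(txns) > 1:
            result[thread] = txns
    return result


# Locker summary lines copied from the excerpt above.
SAMPLE = """\
8000a85e dd= 0 locks held 2    write locks 0    pid/thread 88000/34386526336
8000a85f dd= 0 locks held 8    write locks 4    pid/thread 88000/34386526336
80001047 dd=28 locks held 1    write locks 1    pid/thread 88000/34386526336
"""

if __name__ == "__main__":
    for (pid, thread), txns in suspicious_threads(SAMPLE).items():
        print(f"pid {pid} thread {thread}: {len(txns)} transactions {txns}")
```

On the three summary lines above, this reports one thread (88000/34386526336) holding lockers 8000a85e, 8000a85f and 80001047 at the same time.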

> We have also collected the backtrace for all the threads which I have
> uploaded to:
> 
> ftp://ftp.openldap.org/incoming/nikolai-gdb-120902.txt
> 
> The full db_stat output is located at:
> 
> ftp://ftp.openldap.org/incoming/nikolai-dbstat-120902.txt

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/



Followup 5

Date: Mon, 03 Sep 2012 03:11:27 -0700
From: Howard Chu <hyc@symas.com>
To: michael@stroeder.com
CC: openldap-its@openldap.org
Subject: Re: (ITS#7378) Slapd hangs on bdb write lock
michael@stroeder.com wrote:
> A couple of days ago I had a hang with OpenLDAP 2.4.32 / back-hdb running on
> Debian Squeeze, self-compiled against BDB 4.8.30. It seemed Database was
> locked as restarting slapd of even rebooting OS did not help. Unfortunately I
> had to bring up the system as fast as possible and could not examine the
> problem.

db_recover will always return the DB to a usable state and reset any DB locks.
(It completely deletes the lock region, so there cannot be any stale locks
after it runs.)

> The system has only 200 entries and not much load yet. I had renamed entries
> with web2ldap when all 4 masters (4-way MMR) locked up one after the other.

> So there seem to be lockup problems in 2.4.32.

The only way to know if you're seeing the same problem as the original poster
is if you provide db_stat -CA and gdb trace output, like the original poster
did.
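
A minimal sketch of collecting that diagnostic output while slapd is still hung and before it is killed. It assumes db_stat and gdb are on the PATH under those names (some systems install the BDB utilities with a version suffix), and the database directory and PID are supplied by the caller. The function names and output file names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def diagnostic_commands(db_home, slapd_pid):
    """Build the two commands: the BDB lock dump and a backtrace of all threads."""
    return [
        # -CA prints all lock region information; -h names the DB environment.
        ["db_stat", "-CA", "-h", str(db_home)],
        # Attach to the running slapd, dump every thread's stack, then detach.
        ["gdb", "-p", str(slapd_pid), "-batch", "-ex", "thread apply all bt"],
    ]


def collect(db_home, slapd_pid, out_dir="."):
    """Run each command and save its output as <tool>.txt in out_dir."""
    out = Path(out_dir)
    for cmd in diagnostic_commands(db_home, slapd_pid):
        # check=False: keep whatever output was produced even on a non-zero exit.
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        (out / f"{cmd[0]}.txt").write_text(result.stdout + result.stderr)


if __name__ == "__main__":
    # Usage: python collect_diag.py <db_home> <slapd_pid>
    if len(sys.argv) == 3:
        collect(sys.argv[1], int(sys.argv[2]))
    else:
        print("usage: collect_diag.py <db_home> <slapd_pid>")
```

Both commands need to run as a user that can read the database environment and attach to the slapd process.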

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/



Followup 6

Subject: Re: (ITS#7378) Slapd hangs on bdb write lock
From: Nikolai Schupbach <nikolai@net24.co.nz>
Date: Tue, 4 Sep 2012 01:43:31 +1200
Cc: openldap-its@openldap.org
To: Howard Chu <hyc@symas.com>
Hi Howard,

Thank you very much for the explanation. What BDB version would you recommend?
Obviously I have quite a few options and would like to use a version that is
known to be very solid.

Sincerely,
Nikolai Schupbach





Followup 7

Date: Mon, 03 Sep 2012 11:35:14 -0700
From: Howard Chu <hyc@symas.com>
To: Nikolai Schupbach <nikolai@net24.co.nz>
CC: openldap-its@openldap.org
Subject: Re: (ITS#7378) Slapd hangs on bdb write lock
Nikolai Schupbach wrote:
> Hi Howard,
> 
> Thank you very much for the explanation. What BDB version would you
> recommend. Obviously I have quite a few options and would like to use a
> version that is known to be very solid.

I believe 4.7.25 + all 4 of its official patches was pretty stable.
http://www.oracle.com/technetwork/products/berkeleydb/patch-088170.html

I've done limited testing with 4.8.30, 5.1.19, and 5.3.21. At this point I'm
no longer tracking BDB revisions since MDB has superior performance while
using 1/4 as much RAM and requiring no tuning.



-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/



Followup 8

Date: Mon, 03 Sep 2012 13:42:58 -0700
From: Quanah Gibson-Mount <quanah@zimbra.com>
To: openldap-its@openldap.org
cc: nikolai@net24.co.nz
Subject: Re: (ITS#7378) Slapd hangs on bdb write lock
--On Monday, September 03, 2012 6:35 PM +0000 hyc@symas.com wrote:

> Nikolai Schupbach wrote:
>> Hi Howard,
>>
>> Thank you very much for the explanation. What BDB version would you
>> recommend. Obviously I have quite a few options and would like to use a
>> version that is known to be very solid.
>
> I believe 4.7.25 + all 4 of its official patches was pretty stable.
> http://www.oracle.com/technetwork/products/berkeleydb/patch-088170.html
>
> I've done limited testing with 4.8.30, 5.1.19, and 5.3.21. At this point
> I'm no longer tracking BDB revisions since MDB has superior performance
> while using 1/4 as much RAM and requiring no tuning.

We've also been using BDB 4.7.25 + all 4 patches without issue for several
years. However, I will note that we are now switching our production services
over to MDB as well, starting with OpenLDAP 2.4.32.

--Quanah


--

Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration



Followup 9

Subject: Re: (ITS#7378) Slapd hangs on bdb write lock
From: Nikolai Schupbach <nikolai@net24.co.nz>
Date: Tue, 4 Sep 2012 21:17:08 +1200
Cc: openldap-its@openldap.org
To: Quanah Gibson-Mount <quanah@zimbra.com>
Thanks guys - I think we will look at going to MDB as well now.





The OpenLDAP Issue Tracking System uses a hacked version of JitterBug

______________
© Copyright 2013, OpenLDAP Foundation, info@OpenLDAP.org