Full_Name: Nikolai Schupbach
Version: 2.4.31
OS: FreeBSD
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (202.78.158.60)


We are experiencing frequent hangs in slapd. Once hung we can continue to connect, but all searches will just hang indefinitely until we kill -9 the slapd process and restart it. The directory is used for mail routing and we have been migrating to it from an existing directory server over the last 3 weeks - we have noted the busier the directory becomes the more often it hangs (now once every 2 days).

We have one master and 10 syncrepl read only replicas - the master is used mainly for writes and has not hung yet, but most of the replicas have hung at least once. The replicas receive anywhere between 50 to 300 searches/sec, while the master would only get 1/sec. There are 45k entries in the directory.

We are running:

FreeBSD 8.3/9.0 x64
OpenLDAP 2.4.31
Berkeley DB 4.6.21

The old directory we are migrating from has the same load and is also running OpenLDAP, but has been rock solid for 5 years. It is running Berkeley DB 4.3.29 and OpenLDAP 2.3.27.

We have managed to collect db_stat lock information, which indicates the same issue each time - a write lock on dn2id.bdb.

Locks grouped by object:
Locker   Mode  Count Status  ----------------- Object ---------------
8000a85e READ  1 HELD 0xb26c8 len: 9 data: 60xa800000000000000
8a       READ  1 HELD id2entry.bdb handle 0
8c       READ  1 HELD dn2id.bdb handle 0
96       READ  1 HELD objectClass.bdb handle 0
93       READ  1 HELD entryCSN.bdb handle 0
90       READ  1 HELD entryUUID.bdb handle 0
8000a85f WRITE 4 HELD dn2id.bdb page 219
80000782 READ  1 HELD dn2id.bdb page 768
80000a45 READ  1 HELD dn2id.bdb page 768
80000b9e READ  1 HELD dn2id.bdb page 768
800006a0 READ  1 HELD dn2id.bdb page 768
80000771 READ  1 HELD dn2id.bdb page 768
80000534 READ  1 HELD dn2id.bdb page 768
80000a44 READ  1 HELD dn2id.bdb page 768
80000641 READ  1 HELD dn2id.bdb page 768
80001049 READ  1 HELD dn2id.bdb page 768
8000104a READ  1 HELD dn2id.bdb page 768
80001048 READ  1 HELD dn2id.bdb page 768
80000783 READ  1 HELD dn2id.bdb page 768
80000535 READ  1 HELD dn2id.bdb page 768
8000066e READ  1 HELD dn2id.bdb page 768
80000697 READ  1 HELD dn2id.bdb page 768
8000a85f READ  1 HELD dn2id.bdb page 768
8000a85e READ  1 HELD 0xb19a8 len: 9 data: 40xa800000000000000
8000a85f READ  1 HELD dn2id.bdb page 933
8000a85f WRITE 2 HELD dn2id.bdb page 933
80001047 WRITE 1 HELD dn2id.bdb page 559
80000782 READ  1 WAIT dn2id.bdb page 559
80000a45 READ  1 WAIT dn2id.bdb page 559
80000b9e READ  1 WAIT dn2id.bdb page 559
800006a0 READ  1 WAIT dn2id.bdb page 559
80000771 READ  1 WAIT dn2id.bdb page 559
80000534 READ  1 WAIT dn2id.bdb page 559
80000a44 READ  1 WAIT dn2id.bdb page 559
80000641 READ  1 WAIT dn2id.bdb page 559
80001049 READ  1 WAIT dn2id.bdb page 559
8000104a READ  1 WAIT dn2id.bdb page 559
80001048 READ  1 WAIT dn2id.bdb page 559
80000783 READ  1 WAIT dn2id.bdb page 559
80000535 READ  1 WAIT dn2id.bdb page 559
8000066e READ  1 WAIT dn2id.bdb page 559
80000697 READ  1 WAIT dn2id.bdb page 559
8000a85f READ  1 WAIT dn2id.bdb page 559
8000a85f READ  2 HELD dn2id.bdb page 1362
8000a85f WRITE 2 HELD dn2id.bdb page 1362
8000a85f READ  2 HELD dn2id.bdb page 1353
8000a85f WRITE 2 HELD dn2id.bdb page 1353
b6       READ  1 HELD uid.bdb handle 0
a5       READ  1 HELD mail.bdb handle 0
af       READ  1 HELD mailLocalAddress.bdb handle 0
9b       READ  1 HELD miLoginid.bdb handle 0
aa       READ  1 HELD mailHost.bdb handle 0
bb       READ  1 HELD miDomainName.bdb handle 0
c0       READ  1 HELD mpMailHost.bdb handle 0
a0       READ  1 HELD mpMailUserType.bdb handle 0

We have also collected the backtrace for all the threads which I have uploaded to:

ftp://ftp.openldap.org/incoming/nikolai-gdb-120902.txt

The full db_stat output is located at:

ftp://ftp.openldap.org/incoming/nikolai-dbstat-120902.txt

Our DB_CONFIG:

# One 512MB cache
set_cachesize 0 536870912 1

# Transaction Log settings
set_lg_regionmax 1048576
set_lg_max 10485760
set_lg_bsize 2097152
set_flags DB_LOG_AUTOREMOVE

# Increase lock maximums
set_lk_max_locks 2000
set_lk_max_lockers 2000
set_lk_max_objects 2000

Our slapd.conf on our replicas:

# Load the following schema files
include /usr/local/etc/openldap/schema/core.schema
include /usr/local/etc/openldap/schema/cosine.schema
include /usr/local/etc/openldap/schema/nis.schema
include /usr/local/etc/openldap/schema/inetorgperson.schema
include /usr/local/etc/openldap/schema/misc.schema
include /usr/local/etc/openldap/schema/mirapoint.schema
include /usr/local/etc/openldap/schema/smp.schema

# Runtime settings for slapd
pidfile /var/run/openldap/slapd.pid
argsfile /var/run/openldap/slapd.args
loglevel none

# TLS security options for slapd.
TLSCipherSuite HIGH
TLSCACertificateFile /usr/local/etc/openldap/tls/ca-cert.pem
TLSCertificateFile /usr/local/etc/openldap/tls/server-cert.pem
TLSCertificateKeyFile /usr/local/etc/openldap/tls/server-key.pem

# This option configures one or more hashes to be used in generation
# of user passwords stored in the userPassword attribute during
# processing of LDAP Password Modify Extended Operations (RFC 3062).
password-hash {SSHA}

# Load dynamic backend modules:
modulepath /usr/local/libexec/openldap
moduleload back_bdb
moduleload back_monitor

# Do not limit size or time of requests.
sizelimit unlimited
timelimit unlimited

# Require authentication prior to directory operations
require authc

###############################################################################
# BDB Database Definitions
#
# The following configuration directives relate to bdb database definitions
###############################################################################

# The remaining configuration directives relate to bdb database definitions
database bdb
suffix "o=top"
rootdn "cn=root,o=top"

# Cleartext passwords, especially for the rootdn, should
# be avoid. See slappasswd(8) and slapd.conf(5) for details.
rootpw {SSHA}**********

# The database directory must exist prior to running slapd and
# should only be accessible by the slapd and slap tools.
directory /var/db/openldap-data

# Indices to maintain
index cn eq,sub,pres
index entryUUID eq
index entryCSN eq
index mail eq,sub,pres
index mailHost eq
index mailLocalAddress eq,sub,pres
index miDomainName eq,sub
index miLoginId eq,pres
index mpMailHost eq
index mpMailUserType eq
index mpSystemRole eq
index objectClass eq,pres
index uid eq,pres

# Specify the number of entries which should be held in memory
cachesize 200000

# Set transactional checkpoint
checkpoint 512 60

###############################################################################
# LDAP Sync Replication
#
# A unique replica id number is required for each replication client
###############################################################################

# LDAP sync replication settings
syncrepl rid=36
  provider=ldaps://ldapmaster/
  type=refreshAndPersist
  retry=30,+
  searchbase="o=top"
  filter="(objectClass=*)"
  scope=sub
  attrs="*"
  sizelimit=unlimited
  timelimit=unlimited
  schemachecking=off
  bindmethod=simple
  binddn="cn=replica,ou=users,ou=directory,o=top"
  credentials=**********

# Where to refer ldap updates to
updateref ldaps://ldapmaster/

###############################################################################
# LDAP Statistics
#
# The OpenLDAP server can be configured to provide real time performance
# statistics through the monitor branch.
###############################################################################

# Enable the statistics monitoring database
database monitor

# Allow access to monitoring user only
access to dn.subtree="cn=monitor"
  by dn.exact="cn=monitor,ou=users,ou=directory,o=top" read
  by * none

Sincerely,
Nikolai Schupbach
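
The lock dump in the report above can also be read mechanically: for every object that has at least one locker in WAIT status, list the lockers that hold a WRITE lock on it. The following is a minimal Python sketch of that idea, assuming each lock line has the shape `locker mode count status object` exactly as pasted in the report. The names `LOCK_LINE` and `find_blockers` are hypothetical and are not part of any OpenLDAP or Berkeley DB tool.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the report, for example:
#   80001047 WRITE 1 HELD dn2id.bdb page 559
LOCK_LINE = re.compile(
    r"^\s*(?P<locker>[0-9a-f]+)\s+(?P<mode>READ|WRITE)\s+(?P<count>\d+)\s+"
    r"(?P<status>HELD|WAIT)\s+(?P<obj>.+?)\s*$"
)


def find_blockers(dump):
    """Return {object: (write_holders, waiters)} for objects with waiters.

    Both values are sorted lists of locker ids (hex strings).
    """
    holders = defaultdict(set)
    waiters = defaultdict(set)
    for line in dump.splitlines():
        m = LOCK_LINE.match(line)
        if not m:
            continue  # header, blank line, or anything unrecognised
        obj = " ".join(m["obj"].split())  # normalise column padding
        if m["status"] == "WAIT":
            waiters[obj].add(m["locker"])
        elif m["mode"] == "WRITE":
            holders[obj].add(m["locker"])
    return {
        obj: (sorted(holders[obj]), sorted(waiters[obj]))
        for obj in waiters
    }


if __name__ == "__main__":
    # A few lines copied from the report; a real run would read the
    # whole db_stat output from a file instead.
    sample = """\
80001047 WRITE 1 HELD dn2id.bdb page 559
80000782 READ 1 WAIT dn2id.bdb page 559
8000a85f READ 1 WAIT dn2id.bdb page 559
8000a85f WRITE 4 HELD dn2id.bdb page 219
"""
    for obj, (held_by, waiting) in find_blockers(sample).items():
        print(obj, "write-held by", held_by, "waited on by", waiting)
```

Fed the lines from the report, this should single out dn2id.bdb page 559, with locker 80001047 as the only write holder and the search lockers plus 8000a85f queued behind it.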
--On Saturday, September 01, 2012 1:46 PM +0000 nikolai@net24.co.nz wrote:

> Full_Name: Nikolai Schupbach
> Version: 2.4.31
> OS: FreeBSD
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (202.78.158.60)

Have you confirmed this isn't the same thing as ITS#7222, fixed in OpenLDAP 2.4.32?

--Quanah

--

Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
I haven't yet - I wanted to collect information before making any changes. I did look at that fix and wasn't confident it would solve our problem. You're right though - I need to test it to rule it out. I will upgrade all the servers to 2.4.32 and report back.

On 2/09/2012, at 7:07 AM, Quanah Gibson-Mount wrote:

> Have you confirmed this isn't the same thing ITS#7222, fixed in OpenLDAP 2.4.32?
A couple of days ago I had a hang with OpenLDAP 2.4.32 / back-hdb running on Debian Squeeze, self-compiled against BDB 4.8.30. It seemed the database was locked, as restarting slapd or even rebooting the OS did not help. Unfortunately I had to bring the system back up as fast as possible and could not examine the problem.

The system has only 200 entries and not much load yet. I had renamed entries with web2ldap when all 4 masters (4-way MMR) locked up one after the other.

So there seem to be lockup problems in 2.4.32.
nikolai@net24.co.nz wrote:
> Full_Name: Nikolai Schupbach
> Version: 2.4.31
> OS: FreeBSD
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (202.78.158.60)
>
>
> We are experiencing frequent hangs in slapd. Once hung we can continue to
> connect, but all searches will just hang indefinitely until we kill -9 the slapd
> process and restart it. The directory is used for mail routing and we have been
> migrating to it from an existing directory server over the last 3 weeks - we
> have noted the busier the directory becomes the more often it hangs (now once
> every 2 days).
>
> We have one master and 10 syncrepl read only replicas - the master is used
> mainly for writes and has not hung yet, but most of the replicas have hung at
> least once. The replicas receive anywhere between 50 to 300 searches/sec, while
> the master would only get 1/sec. There are 45k entries in the directory.
>
> We are running:
>
> FreeBSD 8.3/9.0 x64
> OpenLDAP 2.4.31
> Berkeley DB 4.6.21
>
> The old directory we are migrating from has the same load and is also running
> OpenLDAP, but has been rock solid for 5 years. It is running Berkeley DB 4.3.29
> and OpenLDAP 2.3.27.
>
> We have managed to collect db_stat lock information, which indicates the same
> issue each time - a write lock on dn2id.bdb.

It's more than that. Your db_stat shows that a single thread has 3 active transactions. This should never happen:

8000a85e dd= 0 locks held 2 write locks 0 pid/thread 88000/34386526336
8000a85e READ  1 HELD 0xb19a8 len: 9 data: 40xa800000000000000
8000a85e READ  1 HELD 0xb26c8 len: 9 data: 60xa800000000000000
8000a85f dd= 0 locks held 8 write locks 4 pid/thread 88000/34386526336
8000a85f READ  1 WAIT dn2id.bdb page 559
8000a85f READ  1 HELD dn2id.bdb page 768
8000a85f WRITE 2 HELD dn2id.bdb page 1362
8000a85f READ  2 HELD dn2id.bdb page 1362
8000a85f WRITE 2 HELD dn2id.bdb page 1353
8000a85f READ  2 HELD dn2id.bdb page 1353
8000a85f WRITE 2 HELD dn2id.bdb page 933
8000a85f READ  1 HELD dn2id.bdb page 933
8000a85f WRITE 4 HELD dn2id.bdb page 219
80001047 dd=28 locks held 1 write locks 1 pid/thread 88000/34386526336
80001047 WRITE 1 HELD dn2id.bdb page 559

I would first recommend changing from BDB 4.6.21 to some other version. There are no code paths in back-bdb where we would ever return without either committing or aborting the current transactions, so this appears to be a BDB bug, not an OpenLDAP bug.

> We have also collected the backtrace for all the threads which I have uploaded
> to:
>
> ftp://ftp.openldap.org/incoming/nikolai-gdb-120902.txt
>
> The full db_stat output is located at:
>
> ftp://ftp.openldap.org/incoming/nikolai-dbstat-120902.txt

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
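
The check described above (one thread owning several active transactions) can be automated against the locker summary lines of a db_stat dump. The sketch below is only an illustration under two assumptions: that locker summary lines look exactly like the ones quoted in this reply (`<id> dd=<n> locks held <n> write locks <n> pid/thread <pid>/<tid>`), and that locker ids at or above 0x80000000 denote transactions. The names `LOCKER_LINE`, `TXN_ID_MIN`, `transactions_per_thread` and `suspicious_threads` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of a locker summary line, as quoted in this thread:
#   8000a85f dd= 0 locks held 8 write locks 4 pid/thread 88000/34386526336
LOCKER_LINE = re.compile(
    r"^\s*(?P<locker>[0-9a-f]+)\s+dd=\s*(?P<dd>\d+)\s+"
    r"locks\s+held\s+(?P<held>\d+)\s+write\s+locks\s+(?P<writes>\d+)\s+"
    r"pid/thread\s+(?P<pid>\d+)/(?P<tid>\d+)"
)

# Assumption: ids from 0x80000000 upward are transaction ids; smaller
# ids (such as 8a or 8c in the report) are treated as plain lockers.
TXN_ID_MIN = 0x80000000


def transactions_per_thread(dump):
    """Group transactional locker ids by (pid, thread id)."""
    by_thread = defaultdict(list)
    for line in dump.splitlines():
        m = LOCKER_LINE.match(line)
        if m and int(m["locker"], 16) >= TXN_ID_MIN:
            key = (int(m["pid"]), int(m["tid"]))
            by_thread[key].append(m["locker"])
    return dict(by_thread)


def suspicious_threads(dump):
    """Return only the threads that own more than one transaction."""
    return {
        thread: ids
        for thread, ids in transactions_per_thread(dump).items()
        if len(ids) > 1
    }


if __name__ == "__main__":
    # Locker summary lines copied from this reply.
    sample = """\
8000a85e dd= 0 locks held 2 write locks 0 pid/thread 88000/34386526336
8000a85f dd= 0 locks held 8 write locks 4 pid/thread 88000/34386526336
80001047 dd=28 locks held 1 write locks 1 pid/thread 88000/34386526336
"""
    for (pid, tid), ids in suspicious_threads(sample).items():
        print(f"pid {pid} thread {tid} owns {len(ids)} transactions: {ids}")
```

For the three summary lines quoted here, all three ids fall under thread 34386526336, which is the condition flagged above as something that should never happen.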
michael@stroeder.com wrote:
> A couple of days ago I had a hang with OpenLDAP 2.4.32 / back-hdb running on
> Debian Squeeze, self-compiled against BDB 4.8.30. It seemed Database was
> locked as restarting slapd of even rebooting OS did not help. Unfortunately I
> had to bring up the system as fast as possible and could not examine the
> problem.

db_recover will always return the DB to a usable state and reset any DB locks. (It completely deletes the lock region, so there cannot be any stale locks after it runs.)

> The system has only 200 entries and not much load yet. I had renamed entries
> with web2ldap when all 4 masters (4-way MMR) locked up one after the other.
> So there seem to be lockup problems in 2.4.32.

The only way to know if you're seeing the same problem as the original poster is if you provide db_stat -CA and gdb trace output, like the original poster did.
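
Since a hung server usually has to be restarted quickly, it may help to have the two captures requested above scripted in advance. The following is a rough Python sketch, not an official procedure: it runs `db_stat -CA -h <dir>` and a batch-mode gdb backtrace of all threads, and writes each output to a file. The function names, the output file prefix, and the example directory and pid are assumptions made for illustration.

```python
import subprocess


def diagnostic_commands(db_home, slapd_pid):
    """Build the argv lists for the two captures asked for in the thread."""
    return [
        # Full lock-region dump of the BDB environment in db_home.
        ["db_stat", "-CA", "-h", db_home],
        # Backtrace of every thread in the running (hung) slapd.
        ["gdb", "-p", str(slapd_pid), "-batch", "-ex", "thread apply all bt"],
    ]


def collect(db_home, slapd_pid, out_prefix="slapd-hang"):
    """Run each command and save its output as <out_prefix>-<tool>.txt.

    out_prefix is a hypothetical naming choice; failures are not fatal so
    that one missing tool does not prevent the other capture.
    """
    written = []
    for argv in diagnostic_commands(db_home, slapd_pid):
        out_path = f"{out_prefix}-{argv[0]}.txt"
        with open(out_path, "w") as out:
            subprocess.run(argv, stdout=out, stderr=subprocess.STDOUT, check=False)
        written.append(out_path)
    return written


# Example call (values are placeholders; use your own database directory
# and the pid of the hung slapd):
#   collect("/var/db/openldap-data", 88000)
```

Both captures have to be taken while slapd is still hung, before it is killed and before db_recover is run, because db_recover deletes the lock region that db_stat reports on.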
Hi Howard,

Thank you very much for the explanation. What BDB version would you recommend? Obviously I have quite a few options and would like to use a version that is known to be very solid.

Sincerely,
Nikolai Schupbach

On 3/09/2012, at 9:45 PM, Howard Chu wrote:

> I would first recommend changing from BDB 4.6.21 to some other version. There
> are no code paths in back-bdb where we would ever return without either
> committing or aborting the current transactions, so this appears to be a BDB
> bug, not an OpenLDAP bug.
Nikolai Schupbach wrote:
> Hi Howard,
>
> Thank you very much for the explanation. What BDB version would you recommend. Obviously I have quite a few options and would like to use a version that is known to be very solid.

I believe 4.7.25 + all 4 of its official patches was pretty stable.
http://www.oracle.com/technetwork/products/berkeleydb/patch-088170.html

I've done limited testing with 4.8.30, 5.1.19, and 5.3.21. At this point I'm no longer tracking BDB revisions since MDB has superior performance while using 1/4 as much RAM and requiring no tuning.
--On Monday, September 03, 2012 6:35 PM +0000 hyc@symas.com wrote:

> Nikolai Schupbach wrote:
>> Hi Howard,
>>
>> Thank you very much for the explanation. What BDB version would you
>> recommend. Obviously I have quite a few options and would like to use a
>> version that is known to be very solid.
>
> I believe 4.7.25 + all 4 of its official patches was pretty stable.
> http://www.oracle.com/technetwork/products/berkeleydb/patch-088170.html
>
> I've done limited testing with 4.8.30, 5.1.19, and 5.3.21. At this point
> I'm no longer tracking BDB revisions since MDB has superior performance
> while using 1/4 as much RAM and requiring no tuning.

We've been using BDB 4.7.25+all 4 patches without issue for several years as well. However, I will also note that we are now switching over to MDB as well for our production services starting with OpenLDAP 2.4.32.

--Quanah
Thanks guys - I think we will look at going to MDB as well now.

On 4/09/2012, at 8:42 AM, Quanah Gibson-Mount wrote:

> We've been using BDB 4.7.25+all 4 patches without issue for several years as well. However, I will also note that we are now switching over to MDB as well for our production services starting with OpenLDAP 2.4.32.
changed notes
changed state Open to Closed
BDB bug