Full_Name: Helge Milde
Version: slapd 2.4.11
OS: Debian 5
URL:
Submission from: (NULL) (81.175.2.164)


Hello,

In the last two days, slapd has randomly stopped working on our server, about
once every two or three hours. Before this, slapd had been running without any
problems for about two months. slapd.conf has not been touched since then.

What happens is that slapd stops taking queries (postfix, dovecot, NSS etc.
start getting timeouts), and it uses exactly 100% of one core until we restart
it. After a restart, everything works again (until it happens again).

It seems to happen only when slapd is under "stress" (we have an 8-core system
with 8G RAM, so slapd is really never under much "stress"; the biggest CPU
load I've seen is about 40% on one core). The last LDAP search before slapd
stops working is always the same:

Oct 2 10:57:38 oi-mail slapd[24550]: conn=1543 op=1 SRCH base="ou=groups,dc=web,dc=xxxx,dc=net" scope=1 deref=0 filter="(&(objectClass=posixGroup))"

.. which should return about 1800 entries. Different connections might run
this query simultaneously, and it happens every ~10 seconds on average.

I've tried to reproduce this by running ..

# while [ 1 ]; do ldapsearch -x -H ldaps://.. -b 'ou=groups,dc=web,dc=xxxx,dc=net' '(&(objectClass=posixAccount))'; done

.. for two minutes without any "luck" :). slapd just uses a bit more CPU, but
never stops working.

With "loglevel stats stats2", I get nothing out of the logs:

Oct 2 08:42:15 oi-mail slapd[15836]: conn=34798 op=1 ENTRY dn="cn=xxxx,ou=groups,dc=web,dc=xxxx,dc=net"
... time of "crash" (we're mid-search at this point); nothing gets logged until we do a restart: ...
Oct 2 08:42:30 oi-mail slapd[15836]: daemon: shutdown requested and initiated.
Oct 2 08:42:30 oi-mail slapd[15836]: conn=34391 fd=31 closed (slapd shutdown)
... many entries like the one above, and then I seem to get the next entry from the same connection, right before it closes: ...
Oct 2 08:42:30 oi-mail slapd[15836]: conn=34798 op=1 ENTRY dn="cn=xxxx,ou=groups,dc=web,dc=xxxx,dc=net"
Oct 2 08:42:30 oi-mail slapd[15836]: conn=34798 fd=81 closed (slapd shutdown)

I've tried logging more than just stats, but the interesting loglevels (trace
and args) also cause a big performance hit, and I'm not really sure which of
the other loglevels could be helpful. This is a live LDAP server, on a machine
that also hosts other services, so I simply can't take the performance hit.

I also did a strace -p <slapd pid>, which only gives "futex(0x41c499e0,
FUTEX_WAIT," until I restart slapd, but the "restarting routine" could be
useful, so here it is: http://www.pastie.org/639170

I understand that this probably isn't enough data to see what the problem is,
so I'm hoping to get some help in debugging this properly. As I've said, the
args/trace loglevels take too much of a performance hit, but if that is the
only way, we might reconsider.

Interesting parts of slapd.conf below. We use slapd.conf instead of cn=config,
everything goes over TLS, AFAIK we only use LDAPv3 binds, and we have only one
database entry in slapd.conf.

moduleload back_hdb
sizelimit 99999
tool-threads 8
threads 16
concurrency 32
backend hdb
database hdb
cachesize 60000
dbconfig set_cachesize 0 52428800 0
dbconfig set_lk_max_objects 1500
dbconfig set_lk_max_locks 1500
dbconfig set_lk_max_lockers 1500
lastmod on
checkpoint 512 30

Best regards,
Helge Milde
helge@monsternett.no wrote:
> Full_Name: Helge Milde
> Version: slapd 2.4.11
> OS: Debian 5
> URL:
> Submission from: (NULL) (81.175.2.164)
>
> Hello,
>
> In the last two days, slapd has randomly stopped working on our server,
> about one time every two or three hours. Before this, slapd had been
> running without any problems for about two months. slapd.conf has not
> been touched since then.
>
> What happens is that slapd stops taking queries (postfix, dovecot, NSS
> etc. starts getting timeouts), and it starts using exactly 100% of one
> core until we restart it.
> After restart, everything works again (until this happens again).
>
> It seems to only happen when slapd is under "stress" (we have an 8-core
> system with 8G RAM, and so slapd is really never under much "stress";
> the biggest CPU load I've seen is about 40% on one core). The last ldap
> search before slapd stops working is always the same:
> Oct 2 10:57:38 oi-mail slapd[24550]: conn=1543 op=1 SRCH
> base="ou=groups,dc=web,dc=xxxx,dc=net" scope=1 deref=0
> filter="(&(objectClass=posixGroup))"
> .. which should return about 1800 entries. Different connections might
> query this simultaneously, and it's a query that happens every ~10
> seconds on average.
>
> I've tried to reproduce this by doing ..
> # while [ 1 ]; do ldapsearch -x -H ldaps://.. -b
> 'ou=groups,dc=web,dc=xxxx,dc=net' '(&(objectClass=posixAccount))'; done
> .. for two minutes without any "luck" :). slapd just uses a bit more
> CPU, but never stops working.
>
> With "loglevel stats stats2", I get nothing out of the logs:
> Oct 2 08:42:15 oi-mail slapd[15836]: conn=34798 op=1 ENTRY
> dn="cn=xxxx,ou=groups,dc=web,dc=xxxx,dc=net"
> ... time of "crash" (we're mid-search at this point); nothing gets
> logged until we do a restart: ...
> Oct 2 08:42:30 oi-mail slapd[15836]: daemon: shutdown requested and initiated.
> Oct 2 08:42:30 oi-mail slapd[15836]: conn=34391 fd=31 closed (slapd shutdown)
> ... many entries like the one above, and then I seem to be getting the
> next entry from the same connection, right before it closes: ...
> Oct 2 08:42:30 oi-mail slapd[15836]: conn=34798 op=1 ENTRY
> dn="cn=xxxx,ou=groups,dc=web,dc=xxxx,dc=net"
> Oct 2 08:42:30 oi-mail slapd[15836]: conn=34798 fd=81 closed (slapd shutdown)
>
> I've tried logging more than just stats, but the interesting loglevels
> (trace and args) also cause a big performance hit, and I'm not really
> sure which of the other loglevels could be helpful.
> This is a live ldap server, on a server that also hosts other services,
> so I simply can't take the performance hit.
>
> I also did a strace -p <slapd pid>, which only gives "futex(0x41c499e0,
> FUTEX_WAIT," until I restart slapd, but the "restarting routine" could
> be useful, so here it is: http://www.pastie.org/639170

strace is useless. Use gdb and get a trace of all running threads when
this occurs.

http://www.openldap.org/faq/data/cache/59.html

-- 
  -- Howard Chu
  CTO, Symas Corp.               http://www.symas.com
  Director, Highland Sun         http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP      http://www.openldap.org/project/
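The FAQ procedure Howard points to amounts to attaching gdb to the running slapd and printing every thread's stack. A sketch of the invocation (the pidof lookup and the output file name are assumptions, not from the FAQ):

```sh
# Attach to the hung slapd and dump all thread backtraces.
# Attaching pauses the process briefly; gdb detaches when the batch ends.
gdb -batch \
    -ex "set pagination off" \
    -ex "thread apply all bt" \
    -p "$(pidof slapd)" > slapd-threads.txt 2>&1
```

For the backtrace to show function names and line numbers, the slapd binary must have been built with debugging symbols and not stripped.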
--On Friday, October 02, 2009 8:54 PM +0000 hyc@symas.com wrote:

> strace is useless. Use gdb and get a trace of all running threads when
> this occurs.
>
> http://www.openldap.org/faq/data/cache/59.html

You also need to test against a current release of OpenLDAP, like
OpenLDAP 2.4.18.

--Quanah

--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
changed notes changed state Open to Closed
I have now installed openldap 2.4.19 from source (default configuration
except for --enable-crypt=yes).

# slapd -V
@(#) $OpenLDAP: slapd 2.4.19 (Dec 9 2009 22:46:15) $

But unfortunately, the bug is still there. Nothing has changed at all.
It has crashed 10 times since I did the migration 10 hours ago. It
usually crashes 1-2 times at night, and 10-20 times during work hours
(when the servers have more load).

When I migrated, I recreated the database by typing:

# oldbin/slapcat > db.ldif
# newbin/slapadd < db.ldif    (as the new openldap/slapd user)

So I can't imagine the database is corrupt in any way, which I initially
suspected when I first sent this bug report.

It still dies on the same queries as before, in the middle of iterating
through ou=users,.. or ou=groups,.., which both have ~1.8k entries.
These queries come from proftpd, which does a "getent passwd" and
"getent group" every time a customer logs in.

I tried to reproduce this manually again, launching 8 processes
constantly querying ou=users and ou=groups, 2 using a unix socket
locally and 6 using ldaps://hostname/ remotely, but it still won't break
down when I do that. This causes a lot more load than proftpd does, but
the crashes seem to happen only "randomly".

Any suggestions on what I can do to figure out this problem? The LDAP
server is live and in use on two servers with 2-3k users, so I can't
mess with it *too* much.

On Fri, Oct 02, 2009 at 09:28:25PM +0000, quanah@zimbra.com wrote:
>--On Friday, October 02, 2009 8:54 PM +0000 hyc@symas.com wrote:
>
>> strace is useless. Use gdb and get a trace of all running threads when
>> this occurs.
>>
>> http://www.openldap.org/faq/data/cache/59.html
>
>You also need to test against a current release of OpenLDAP, like
>OpenLDAP 2.4.18.
>
>--Quanah
>
>--
>Quanah Gibson-Mount
>Principal Software Engineer
>Zimbra, Inc
>--------------------
>Zimbra :: the leader in open source messaging and collaboration

-- 
Helge Milde, 69701808
www.monsternett.no
--On Thursday, December 10, 2009 10:44 AM +0100 Helge Milde
<helge@monsternett.no> wrote:

> I have now installed openldap 2.4.19 from source (default configuration
> except for --enable-crypt=yes).
>
> # slapd -V
> @(#) $OpenLDAP: slapd 2.4.19 (Dec 9 2009 22:46:15) $
>
> But unfortunately, the bug is still there. Nothing has changed at all.
> It has crashed 10 times since I did the migration 10 hours ago. It
> usually crashes 1-2 times at night, and 10-20 times during work hours
> (when the servers have more load).

Well, now 2.4.20 is current (just to note).

What we'll need for resolving your issue is:

(a) Your configuration files (minus passwords)
(b) A backtrace from gdb of a crash. Make sure you build OpenLDAP with
debugging symbols (using CFLAGS of -g) and, when you run make install,
make sure you use STRIP="", like:

make install DESTDIR=/my/path/to/install STRIP=""

Thanks,
Quanah

--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
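Putting Quanah's requirements together, a debug build might look like the following sketch (the version, install prefix, and the --enable-crypt flag are carried over from earlier messages in this thread; treat the exact commands as assumptions):

```sh
# Build OpenLDAP with debugging symbols so gdb can resolve
# function names and line numbers in backtraces.
tar xzf openldap-2.4.20.tgz
cd openldap-2.4.20
env CFLAGS="-g" ./configure --prefix=/usr/local/ldap --enable-crypt
make depend && make
# STRIP="" keeps "make install" from stripping the symbols out again.
make install STRIP=""
```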
On Thu, Dec 10, 2009 at 08:21:20AM -0800, Quanah Gibson-Mount wrote:
>Well, now 2.4.20 is current (just to note).

Right. I'm now running this version. The results below are with 2.4.20,
not 2.4.19.

>What we'll need for resolving your issue is:
>
>(a) Your configuration files (minus passwords)

# egrep -v '^#|^$' etc/openldap/slapd.conf
TLSCertificateFile <removed>.crt
TLSCertificateKeyFile <removed>.key
TLSCACertificateFile <removed>.ca
include /usr/local/ldap/etc/openldap/schema/core.schema
include /usr/local/ldap/etc/openldap/schema/cosine.schema
include /usr/local/ldap/etc/openldap/schema/nis.schema
include /usr/local/ldap/etc/openldap/schema/qmail.schema
include /usr/local/ldap/etc/openldap/schema/inetorgperson.schema
include /usr/local/ldap/etc/openldap/schema/mod_vhost_ldap.schema
include /usr/local/ldap/etc/openldap/schema/monsternett_vhost_ldap.schema
include /usr/local/ldap/etc/openldap/schema/sql_user.schema
include /usr/local/ldap/etc/openldap/schema/using_space.schema
pidfile /usr/local/ldap/var/run/slapd/slapd.pid
argsfile /usr/local/ldap/var/run/slapd/slapd.args
loglevel stats
modulepath /usr/local/ldap/lib
moduleload back_hdb
sizelimit 99999
tool-threads 8
threads 16
concurrency 32
backend hdb
database hdb
suffix "dc=<removed>"
directory "/usr/local/ldap/var/openldap-data"
cachesize 60000
dbconfig set_cachesize 0 52428800 0
dbconfig set_lk_max_objects 1500
dbconfig set_lk_max_locks 1500
dbconfig set_lk_max_lockers 1500
index objectclass,sn eq
index cn,uniqueMember,memberUid,uidNumber,gidNumber pres,eq
index uid,mail,associatedDomain,mailAlternateAddress,apacheServerName,apacheServerAlias,apacheSuexecUid pres,eq,sub
lastmod on
checkpoint 512 30
access to attrs=userPassword,shadowLastChange
        by dn="cn=admin,dc=<removed>" write
        by anonymous auth
        by self write
        by * none
access to dn.base=""
        by * read
access to *
        by dn="cn=admin,dc=<removed>" write
        by * read

>(b) A backtrace from gdb of a crash.
Since this is a *live* server, I don't want to run it through gdb if I
don't have to. Also, the server doesn't really crash, it just hangs,
using 100% of one CPU core. We currently have a script that checks
whether ldapsearch succeeds, and if not, restarts slapd. AFAIK, if we
had done this via gdb directly, the ports would still be open and slapd
wouldn't be able to start again, which means I would have had to sit
there for an hour just to be able to run a backtrace quickly enough that
slapd could be restarted instantly :)

So instead, I made the script dump a core (by sending killall -11 to
slapd). I hope this is sufficient.

--- gdb core bt start ---
(gdb) thread apply all bt

Thread 18 (process 7691):
#0  0x00007ff212f77b78 in epoll_wait () from /lib/libc.so.6
#1  0x000000000041c618 in slapd_daemon_task (ptr=<value optimized out>) at daemon.c:2465
#2  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#3  0x00007ff212f775ad in clone () from /lib/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 17 (process 7692):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x410f6670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x2732490, rs=0x41277ca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x2732490, rs=0x41277ca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x2732490, rs=0x41277ca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x2732490, rs=0x41277ca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x41277df0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x41277df0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 16 (process 7694):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x418f7670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x285ba50, rs=0x41a78ca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x285ba50, rs=0x41a78ca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x285ba50, rs=0x41a78ca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x285ba50, rs=0x41a78ca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x41a78df0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x41a78df0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 15 (process 7695):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x42947670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x7ff1e40163b0, rs=0x42ac8ca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x7ff1e40163b0, rs=0x42ac8ca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x7ff1e40163b0, rs=0x42ac8ca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x7ff1e40163b0, rs=0x42ac8ca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x42ac8df0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x42ac8df0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 14 (process 7699):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x43148670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x7ff1f42babf0, rs=0x432c9ca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x7ff1f42babf0, rs=0x432c9ca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x7ff1f42babf0, rs=0x432c9ca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x7ff1f42babf0, rs=0x432c9ca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x432c9df0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x432c9df0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 13 (process 7700):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x43949670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x16e57b0, rs=0x43acaca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x16e57b0, rs=0x43acaca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x16e57b0, rs=0x43acaca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x16e57b0, rs=0x43acaca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x43acadf0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x43acadf0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 12 (process 7715):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x4414a670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x7ff1f42e7260, rs=0x442cbca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x7ff1f42e7260, rs=0x442cbca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x7ff1f42e7260, rs=0x442cbca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x7ff1f42e7260, rs=0x442cbca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x442cbdf0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x442cbdf0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 11 (process 7864):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x4494b670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x1b731b0, rs=0x44accca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x1b731b0, rs=0x44accca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x1b731b0, rs=0x44accca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x1b731b0, rs=0x44accca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x44accdf0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x44accdf0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 10 (process 8395):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x4514c670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x7ff1e4038380, rs=0x452cdca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x7ff1e4038380, rs=0x452cdca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x7ff1e4038380, rs=0x452cdca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x7ff1e4038380, rs=0x452cdca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x452cddf0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x452cddf0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 9 (process 8808):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x4594d670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x7ff1fcac89e0, rs=0x45aceca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x7ff1fcac89e0, rs=0x45aceca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x7ff1fcac89e0, rs=0x45aceca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x7ff1fcac89e0, rs=0x45aceca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x45acedf0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x45acedf0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 8 (process 9706):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x4614e670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x414b4d0, rs=0x462cfca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x414b4d0, rs=0x462cfca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x414b4d0, rs=0x462cfca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x414b4d0, rs=0x462cfca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x462cfdf0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x462cfdf0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 7 (process 10771):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x4694f670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x414c100, rs=0x46ad0ca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x414c100, rs=0x46ad0ca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x414c100, rs=0x46ad0ca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x414c100, rs=0x46ad0ca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x46ad0df0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x46ad0df0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 6 (process 11742):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x47150670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x7ff1f42e8640, rs=0x472d1ca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x7ff1f42e8640, rs=0x472d1ca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x7ff1f42e8640, rs=0x472d1ca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x7ff1f42e8640, rs=0x472d1ca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x472d1df0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x472d1df0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 5 (process 11743):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x47951670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x414c610, rs=0x47ad2ca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x414c610, rs=0x47ad2ca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x414c610, rs=0x47ad2ca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x414c610, rs=0x47ad2ca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x47ad2df0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x47ad2df0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 4 (process 18180):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x48152670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x4376ee0, rs=0x482d3ca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x4376ee0, rs=0x482d3ca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x4376ee0, rs=0x482d3ca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x4376ee0, rs=0x482d3ca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x482d3df0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x482d3df0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 3 (process 18190):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x48953670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x7ff1f42d4510, rs=0x48ad4ca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x7ff1f42d4510, rs=0x48ad4ca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x7ff1f42d4510, rs=0x48ad4ca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x7ff1f42d4510, rs=0x48ad4ca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x48ad4df0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x48ad4df0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 2 (process 18200):
#0  0x00007ff214c53d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x000000000042efc5 in send_ldap_ber (op=<value optimized out>, ber=0x49154670) at result.c:225
#2  0x0000000000430d00 in slap_send_search_entry (op=0x7ff1f42e78e0, rs=0x492d5ca0) at result.c:1260
#3  0x000000000049df00 in hdb_search (op=0x7ff1f42e78e0, rs=0x492d5ca0) at search.c:961
#4  0x0000000000421491 in fe_op_search (op=0x7ff1f42e78e0, rs=0x492d5ca0) at search.c:366
#5  0x0000000000421c8c in do_search (op=0x7ff1f42e78e0, rs=0x492d5ca0) at search.c:217
#6  0x000000000041efff in connection_operation (ctx=0x492d5df0, arg_v=<value optimized out>) at connection.c:1109
#7  0x000000000041fcdd in connection_read_thread (ctx=0x492d5df0, argv=<value optimized out>) at connection.c:1245
#8  0x00000000004eae20 in ldap_int_thread_pool_wrapper (xpool=<value optimized out>) at tpool.c:685
#9  0x00007ff214c4ffc7 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff212f775ad in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

Thread 1 (process 7690):
#0  0x00007ff214c50715 in pthread_join () from /lib/libpthread.so.0
#1  0x0000000000419c4c in slapd_daemon () at daemon.c:2840
#2  0x0000000000407d8a in main (argc=7, argv=0x7fff1d3caa88) at main.c:961
--- gdb core bt end ---

I guess the last frames in the threads are caused by me sending a
SIGSEGV, i.e. every thread was inside clone() after slapd started
hanging? Well, I'll leave the analysis to the experts, I think :-)

-- 
Helge Milde, 69701808
www.monsternett.no
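For reference, the watchdog described at the top of this message (probe with ldapsearch, force a core with SIGSEGV, then restart) could be sketched as below. The probe and recovery commands are parameters so the logic can be exercised without touching a real server; the example production commands in the comment are assumptions, not Helge's actual script:

```shell
#!/bin/sh
# Watchdog sketch: run a probe command; on failure, run a recovery action.
# In production the parameters might be something like (assumptions):
#   probe:   ldapsearch -x -H ldap://localhost -b '' -s base -l 10
#   recover: killall -11 slapd; sleep 5; /etc/init.d/slapd start
# (killall -11 sends SIGSEGV so the hung slapd dumps core for post-mortem
#  analysis, provided "ulimit -c" allows core files.)
check_slapd() {
    probe="$1"; recover="$2"
    if sh -c "$probe" >/dev/null 2>&1; then
        echo "slapd OK"
    else
        echo "slapd hung: dumping core and restarting"
        sh -c "$recover"
    fi
}
```

Run from cron every minute or so; the core file then preserves the hung state for later inspection with gdb.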
Would be nice to get some feedback on this one.. Was using a core dump
sufficient? If so, was the information helpful at all? If there's
anything more I could do, please say so.

-- 
Helge Milde, 69701808
www.monsternett.no
helge@monsternett.no wrote:
> Would be nice to get some feedback on this one..
> Were using a core dump sufficient? If so, was the information helpful at all?
> If there's anything more I could do, please say so.

Sorry, your issue was closed since we hadn't received any feedback from
you in a couple of months.

The last trace you sent shows nothing unusual, just a bunch of threads
all waiting to write on their respective connections.

Have you tried the .21 release yet?

-- 
  -- Howard Chu
  CTO, Symas Corp.               http://www.symas.com
  Director, Highland Sun         http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP      http://www.openldap.org/project/
changed notes changed state Closed to Open
On Tue, Dec 29, 2009 at 11:00:25AM -0800, Howard Chu wrote:
>Sorry, your issue was closed since we hadn't received any feedback
>from you in a couple of months.
>
>The last trace you sent shows nothing unusual, just a bunch of
>threads all waiting to write on their respective connections.

If the core dump backtrace wasn't helpful, then please tell me so; I
thought I had given you sufficient feedback, and that you were working
on it.

>Have you tried the .21 release yet?

No.. I might do that next week. Is there anything else that I can do?
I just enabled the trace loglevel; perhaps that might give us a clue
about what's going on.

>--
>  -- Howard Chu
>  CTO, Symas Corp.               http://www.symas.com
>  Director, Highland Sun         http://highlandsun.com/hyc/
>  Chief Architect, OpenLDAP      http://www.openldap.org/project/

-- 
Helge Milde, 69701808
www.monsternett.no
On Wed, Dec 30, 2009 at 09:04:12AM +0100, Helge Milde wrote:
>I just enabled trace loglevel, perhaps that might give us a clue on
>what's going on.

Slapd just stopped working again. Here's the (annotated) log for
viewing and/or downloading: http://www.pastie.org/761198.txt

Does this give you any clues? Tell me if I can do anything else to help.

-- 
Helge Milde, 69701808
www.monsternett.no
--On Wednesday, December 30, 2009 1:09 PM +0000 helge@monsternett.no wrote:

> On Wed, Dec 30, 2009 at 09:04:12AM +0100, Helge Milde wrote:
>> I just enabled trace loglevel, perhaps that might give us a clue on
>> what's going on.
>
> Slapd just stopped working again. Here's the (annotated) log for viewing
> and/or downloading: http://www.pastie.org/761198.txt
>
> Does this give you any clues?
>
> Tell me if I can do anything else to help.

A fix for this has been committed to HEAD. You can apply it to the
latest source (2.4.21) if you would like to try it out:

<http://www.openldap.org/devel/cvsweb.cgi/servers/slapd/daemon.c.diff?hideattic=1&r1=text&tr1=1.442&r2=text&tr2=1.443&f=u>

--Quanah

--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
--On Tuesday, January 05, 2010 1:39 PM -0800 Quanah Gibson-Mount
<quanah@zimbra.com> wrote:

> --On Wednesday, December 30, 2009 1:09 PM +0000 helge@monsternett.no
> wrote:
>
>> On Wed, Dec 30, 2009 at 09:04:12AM +0100, Helge Milde wrote:
>>> I just enabled trace loglevel, perhaps that might give us a clue on
>>> what's going on.
>>
>> Slapd just stopped working again. Here's the (annotated) log for viewing
>> and/or downloading: http://www.pastie.org/761198.txt
>>
>> Does this give you any clues?
>>
>> Tell me if I can do anything else to help.
>
> A fix for this has been committed to HEAD. You can apply it to the
> latest source (2.4.21) if you would like to try it out:
>
> <http://www.openldap.org/devel/cvsweb.cgi/servers/slapd/daemon.c.diff?hideattic=1&r1=text&tr1=1.442&r2=text&tr2=1.443&f=u>

Hm, correction, it fixes a minor shutdown issue, not your main issue. :/

--Quanah

--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Hello again. On a hunch, we switched to LDAP without SSL last Friday (we
used ldaps://..:636/ before), and the server hasn't crashed since! This
is a good enough workaround for us, but I hope this information can help
with fixing this bug, or help others with the same problem.

On Wed, Dec 30, 2009 at 02:08:00PM +0100, Helge Milde wrote:
>On Wed, Dec 30, 2009 at 09:04:12AM +0100, Helge Milde wrote:
>>I just enabled trace loglevel, perhaps that might give us a clue on
>>what's going on.
>
>Slapd just stopped working again. Here's the (annotated) log for
>viewing and/or downloading: http://www.pastie.org/761198.txt
>
>Does this give you any clues?
>
>Tell me if I can do anything else to help.
>
>--
>Helge Milde, 69701808
>www.monsternett.no

-- 
Helge Milde, 69701808
www.monsternett.no
SSL library issue. No idea which SSL library was in use.