Re: (ITS#3546) Sync rep provider and server crash on SIGTERM
On Thu, 2005-02-17 at 04:56 -0800, Howard Chu wrote:
> The backtrace you provided was a bit inaccurate; you need to compile
> with "-g" (debugging info present) and without optimization in order to
> get a consistent trace.
Yes, they confused me a bit too... here are some new ones with CFLAGS="-g
-O0":
provider:
#0 0x0057f7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x005bf955 in raise () from /lib/tls/libc.so.6
#2 0x005c1319 in abort () from /lib/tls/libc.so.6
#3 0x005b8f41 in __assert_fail () from /lib/tls/libc.so.6
#4 0x08068c65 in connection2anonymous ()
#5 0x080692ec in connection_closing ()
#6 0x0806a4b0 in connection_read ()
#7 0x0806753f in slapd_daemon_destroy ()
#8 0x007df3ae in start_thread () from /lib/tls/libpthread.so.0
#9 0x0065eb6e in clone () from /lib/tls/libc.so.6
consumer:
#0 0x0057f7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x005bf955 in raise () from /lib/tls/libc.so.6
#2 0x005c1319 in abort () from /lib/tls/libc.so.6
#3 0x005b8f41 in __assert_fail () from /lib/tls/libc.so.6
#4 0x080fb765 in ldap_next_message ()
#5 0x080adaa9 in init_syncrepl ()
#6 0x080adeb9 in do_syncrepl ()
#7 0x080f6c9a in ldap_pvt_thread_pool_destroy ()
#8 0x007df3ae in start_thread () from /lib/tls/libpthread.so.0
#9 0x0065eb6e in clone () from /lib/tls/libc.so.6
> I've reproduced part of the problem; the provider is not segfaulting,
Yes, now that you point it out, neither is mine. I had "ulimit -c unlimited" set
on my machine, which seems to generate core dumps in this situation. I
also get "Program terminated with signal 6, Aborted." in my gdb output
for both core files.
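
For illustration only: signal 6 is SIGABRT, which is what abort() raises after a failed assert(), whereas a real segfault would show up as signal 11 (SIGSEGV). The sketch below is not slapd code. It assumes a POSIX system with Python 3 available, and the names CHILD_CODE and terminating_signal are made up for the example. It starts a child process that aborts itself and reports which signal ended it.

```python
import signal
import subprocess
import sys

# Hypothetical child program: disable core files for this demo, then abort.
# os.abort() raises SIGABRT, the same signal a failed C assert() ends with.
CHILD_CODE = (
    "import os, resource\n"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0))\n"
    "os.abort()\n"
)


def terminating_signal():
    """Run the aborting child; return the signal number that killed it, or None."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, subprocess reports death by signal N as returncode == -N.
    return -proc.returncode if proc.returncode < 0 else None


if __name__ == "__main__":
    signum = terminating_signal()
    print(signum, signal.Signals(signum).name)
```

With "ulimit -c unlimited" and without the setrlimit line, the same abort would also leave a core file behind, which is consistent with what I saw here.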
> it is hitting an assert() at connection.c:687. Specifically, the connection
> is being torn down while someone is still waiting to write on it. This
> happens because there is a large search in progress, and data has piled
> up faster than the network can send it. When you terminate the syncrepl
> client, it sends an Unbind request and then closes its side of the
> connection. (In my test, the syncrepl consumer shut down gracefully;
> there was no crash.) The Unbind is received by the provider but
> actually gets Deferred, because it's still waiting for its writes to
> flush. Then the connection actually closes, and the problem occurs. This
> provider-side assert() situation is not unique to syncrepl, it can
> happen whenever any large search request is terminated in the middle.
> We'll definitely have to fix that up.
Thanks. My logs (level=256) if you need them...
Feb 17 15:00:21 mdte slapd[19649]: @(#) $OpenLDAP: slapd 2.2.23 (Feb 17 2005 14:58:42) $ martin@mdte:/home/martin/tasks/openldap/src/openldap-2.2.23/servers/slapd
Feb 17 15:00:21 mdte slapd[19649]: bdb_back_initialize: Sleepycat Software: Berkeley DB 4.2.52: (December 3, 2003)
Feb 17 15:00:21 mdte slapd[19649]: bdb_db_init: Initializing BDB database
Feb 17 15:00:21 mdte slapd[19650]: slapd starting
Feb 17 15:00:23 mdte slapd[19659]: @(#) $OpenLDAP: slapd 2.2.23 (Feb 17 2005 14:58:42) $ martin@mdte:/home/martin/tasks/openldap/src/openldap-2.2.23/servers/slapd
Feb 17 15:00:23 mdte slapd[19659]: bdb_back_initialize: Sleepycat Software: Berkeley DB 4.2.52: (December 3, 2003)
Feb 17 15:00:23 mdte slapd[19659]: bdb_db_init: Initializing BDB database
Feb 17 15:00:24 mdte slapd[19660]: slapd starting
Feb 17 15:00:24 mdte slapd[19650]: conn=0 fd=11 ACCEPT from IP=127.0.0.1:33091 (IP=127.0.0.1:11389)
Feb 17 15:00:24 mdte slapd[19650]: conn=0 op=0 BIND dn="uid=syncrepl,dc=qmul,dc=ac,dc=uk" method=128
Feb 17 15:00:24 mdte slapd[19650]: conn=0 op=0 BIND dn="uid=syncrepl,dc=qmul,dc=ac,dc=uk" mech=SIMPLE ssf=0
Feb 17 15:00:24 mdte slapd[19650]: conn=0 op=0 RESULT tag=97 err=0 text=
Feb 17 15:00:24 mdte slapd[19650]: conn=0 op=1 SRCH base="dc=qmul,dc=ac,dc=uk" scope=2 deref=0 filter="(objectClass=*)"
Feb 17 15:00:24 mdte slapd[19650]: conn=0 op=1 SRCH attr=* +
Feb 17 15:00:31 mdte slapd[19660]: slapd shutdown: waiting for 2 threads to terminate
Feb 17 15:00:31 mdte slapd[19650]: connection_input: conn=0 deferring operation: awaiting write
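
The last log line ("deferring operation: awaiting write") appears to match the sequence described in the quoted explanation: a write is still blocked, the incoming Unbind is deferred, and then the connection is torn down. The toy model below only sketches that ordering. It is not the code in connection.c, and every name in it (ToyConnection, writers_waiting, peer_closed, simulate) is hypothetical.

```python
class ToyConnection:
    """Toy stand-in for a server-side connection; not slapd's data structure."""

    def __init__(self):
        self.writers_waiting = 0   # threads blocked trying to send results
        self.deferred = []         # operations held back until writes flush
        self.closed = False

    def begin_blocked_write(self):
        # A large search has produced data faster than the network can send it.
        self.writers_waiting += 1

    def receive(self, op):
        # While a writer is still waiting, new operations are deferred.
        if self.writers_waiting > 0:
            self.deferred.append(op)
            return "deferred"
        return "processed"

    def peer_closed(self):
        # Teardown assumes nobody is still waiting to write on the connection.
        if self.writers_waiting != 0:
            raise AssertionError("connection torn down while a writer is waiting")
        self.closed = True


def simulate():
    """Replay the sequence: blocked write, deferred Unbind, then peer close."""
    conn = ToyConnection()
    conn.begin_blocked_write()
    status = conn.receive("unbind")
    try:
        conn.peer_closed()
    except AssertionError:
        return status, "assertion tripped"
    return status, "closed cleanly"


if __name__ == "__main__":
    print(simulate())
```

In this model the failure goes away if the blocked writer is released before teardown, but how the real fix should be done in slapd is a separate question that this sketch does not answer.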
> I'll play with this a bit more to see if I can reproduce the
> consumer-side crash.
Thanks.
Martin.
>
> m.d.t.evans@qmul.ac.uk wrote:
>
> >Full_Name: Martin Evans
> >Version: 2.2.23
> >OS: Linux mdte 2.6.10-1.766_FC3.mdte30 #1 Tue Feb 15 13:50:26 GMT 2005 i686 i686 i386 GNU/Linux
> >URL: ftp://ftp.openldap.org/incoming/
> >Submission from: (NULL) (217.42.8.111)
> >
> >
> >While a syncrepl consumer is being populated, if it is sent a TERM signal, both it and
> >the provider segfault. This did not happen in 2.2.17 (I haven't checked
> >intermediate versions). This can be reproduced by removing the consumer's bdb
> >backend files, starting both the provider and consumer, then sending TERM while
> >the consumer is replicating.
> >
> >My provider has a bdb backend.
> >
> >My consumer is refreshAndPersist:
> >syncrepl rid=140
> > provider=ldap://localhost:11389/
> > type=refreshAndPersist
> > searchbase="<hidden>"
> > filter="(objectClass=*)"
> > scope=sub
> > schemachecking=off
> > updatedn="<hidden>"
> > bindmethod=simple
> > binddn="<hidden>"
> > credentials=<hidden>
> >
> >For the provider, gdb bt says:
> >#0 0x0057f7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> >#1 0x005bf955 in raise () from /lib/tls/libc.so.6
> >#2 0x005c1319 in abort () from /lib/tls/libc.so.6
> >#3 0x005b8f41 in __assert_fail () from /lib/tls/libc.so.6
> >#4 0x08066ea4 in connection2anonymous ()
> >#5 0x08067913 in connection_read ()
> >#6 0x08064e67 in slapd_daemon_destroy ()
> >#7 0x007df3ae in start_thread () from /lib/tls/libpthread.so.0
> >#8 0x0065eb6e in clone () from /lib/tls/libc.so.6
> >
> >And for the consumer:
> >#0 0x0057f7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> >#1 0x005bf955 in raise () from /lib/tls/libc.so.6
> >#2 0x005c1319 in abort () from /lib/tls/libc.so.6
> >#3 0x005b8f41 in __assert_fail () from /lib/tls/libc.so.6
> >#4 0x080db4e2 in ldap_next_message ()
> >#5 0x0809e8a4 in do_syncrepl ()
> >#6 0x080d79ef in ldap_int_thread_pool_shutdown ()
> >#7 0x007df3ae in start_thread () from /lib/tls/libpthread.so.0
> >#8 0x0065eb6e in clone () from /lib/tls/libc.so.6
> >
> >This might be related to #3534.
> >
> >Take care,
> >Martin.
> >
> >
> >
> >
> >
> >
>
>
--
-- Dr MDT Evans, Computing Services, Queen Mary, University of London