
Re: (ITS#3546) Sync rep provider and server crash on SIGTERM



The backtrace you provided was a bit inaccurate; you need to compile 
with "-g" (debugging info present) and without optimization in order to 
get a consistent trace.

I've reproduced part of the problem. The provider is not segfaulting; it 
is hitting an assert() at connection.c:687. Specifically, the connection 
is being torn down while someone is still waiting to write on it. This 
happens because there is a large search in progress, and data has piled 
up faster than the network can send it. When you terminate the syncrepl 
client, it sends an Unbind request and then closes its side of the 
connection. (In my test the syncrepl consumer shut down gracefully, 
though; there was no crash.) The provider receives the Unbind but 
defers it, because the provider is still waiting for its own writes to 
flush. Then the connection actually closes, and the assert() fires. This 
provider-side assert() is not unique to syncrepl; it can happen whenever 
any large search request is terminated in the middle. We'll definitely 
have to fix that.
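
To make the sequence above concrete, here is a toy model in C. It is not 
slapd code: the toy_conn struct and every function name are hypothetical, 
and the "drain" variant is only one conceivable way to avoid the assert, 
not necessarily the fix that will go in.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy connection state. Hypothetical; NOT slapd's real Connection. */
typedef struct {
    bool open;             /* connection still exists */
    bool unbind_deferred;  /* Unbind arrived while writes were pending */
    int  writers_waiting;  /* threads blocked waiting to write results */
} toy_conn;

/* Unbind arrives: tear down at once if idle, otherwise defer it
 * until the pending writes have flushed. */
void toy_unbind(toy_conn *c)
{
    if (c->writers_waiting > 0)
        c->unbind_deferred = true;
    else
        c->open = false;
}

/* Peer closed its socket. Naive teardown assumes nobody is still
 * waiting to write; in the scenario described above that assumption
 * is false, so the assert() fires. */
void toy_peer_closed_naive(toy_conn *c)
{
    assert(c->writers_waiting == 0);
    c->open = false;
}

/* One possible alternative (an assumption): release the blocked
 * writers first, then tear the connection down. */
void toy_peer_closed_drain(toy_conn *c)
{
    c->writers_waiting = 0;      /* stand-in for waking writers with an error */
    c->unbind_deferred = false;  /* the deferred Unbind is now moot */
    c->open = false;
}
```

With one writer blocked, toy_unbind() leaves the connection open with the 
Unbind deferred; calling toy_peer_closed_naive() at that point aborts, 
while toy_peer_closed_drain() closes the connection cleanly.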

I'll play with this a bit more to see if I can reproduce the 
consumer-side crash.

m.d.t.evans@qmul.ac.uk wrote:

>Full_Name: Martin Evans
>Version: 2.2.23
>OS: Linux mdte 2.6.10-1.766_FC3.mdte30 #1 Tue Feb 15 13:50:26 GMT 2005 i686 i686 i386 GNU/Linux
>URL: ftp://ftp.openldap.org/incoming/
>Submission from: (NULL) (217.42.8.111)
>
>
>While a syncrep consumer being populated, if it is sent TERM signal, both it and
>the provider segfault. This did not happen in 2.2.17 (I havent checked
>intermediate versions). This can be reproduced by removing the consumers bdb
>backend files, starting both the provider and consumer, then sending TERM while
>the consumer is replicating.
>
>My provider has a bdb backend.
>
>My consumer is refreshAndPersist:
>syncrepl rid=140
>         provider=ldap://localhost:11389/
>         type=refreshAndPersist
>         searchbase="<hidden>"
>         filter="(objectClass=*)"
>         scope=sub
>         schemachecking=off
>         updatedn="<hidden>"
>         bindmethod=simple
>         binddn="<hidden>"
>         credentials=<hidden>
>
>For the provider, gdb bt says:
>#0  0x0057f7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
>#1  0x005bf955 in raise () from /lib/tls/libc.so.6
>#2  0x005c1319 in abort () from /lib/tls/libc.so.6
>#3  0x005b8f41 in __assert_fail () from /lib/tls/libc.so.6
>#4  0x08066ea4 in connection2anonymous ()
>#5  0x08067913 in connection_read ()
>#6  0x08064e67 in slapd_daemon_destroy ()
>#7  0x007df3ae in start_thread () from /lib/tls/libpthread.so.0
>#8  0x0065eb6e in clone () from /lib/tls/libc.so.6
>
>And for the consumer:
>#0  0x0057f7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
>#1  0x005bf955 in raise () from /lib/tls/libc.so.6
>#2  0x005c1319 in abort () from /lib/tls/libc.so.6
>#3  0x005b8f41 in __assert_fail () from /lib/tls/libc.so.6
>#4  0x080db4e2 in ldap_next_message ()
>#5  0x0809e8a4 in do_syncrepl ()
>#6  0x080d79ef in ldap_int_thread_pool_shutdown ()
>#7  0x007df3ae in start_thread () from /lib/tls/libpthread.so.0
>#8  0x0065eb6e in clone () from /lib/tls/libc.so.6
>
>This might be related to #3534.
>
>Take care,
>Martin.
>


-- 
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support