
Re: Please test RE24

Rein Tollevik wrote:
Howard Chu wrote:

And unfortunately I had no time to do any more debugging until now; with
St. Patrick's Day this Tuesday I had gigs all weekend. I also see that
the test050 run I left overnight eventually crashed, with the same
symptoms as in Quanah's report. So there's still more to track down.

Looks as if I might have hit the same thing; see the stack trace at the end.

For reference:

violino:~/OD/hobj/tests/testrun>  grep rid=003 !$
grep rid=003 slapd.1.log
=>do_syncrepl rid=003
do_syncrepl: rid=003 retrying (9 retries left)
=>do_syncrepl rid=003
=>do_syncrep2 rid=003
=>do_syncrepl rid=003
=>do_syncrep2 rid=003
olcSyncrepl: {2}rid=003 provider=ldap://localhost:9013/ binddn="cn=config" bin
=>do_syncrepl rid=003
=>do_syncrep2 rid=003
olcSyncrepl: {2}rid=003 provider=ldap://localhost:9013/ binddn="cn=config" bin
=>do_syncrepl rid=003
=>do_syncrepl rid=003
=>do_syncrep2 rid=003
do_syncrepl: rid=003 quitting

The odd thing here, of course, is that it should never jump from '9
retries left' straight to 'quitting'; there should be at least 9 more
failure/retry messages first. It looks like we have a wild memory
overwrite occurring.

I assume it is quitting due to a config update. It looks to me as if syncinfo structures are being released while still active.

OK. This must be occurring because a connection_client thread is in the thread pool but hasn't started running yet when the config change occurs. So the usual mutexes aren't held yet...


(gdb) where
#0  0x0000002a968d2540 in strlen () from /lib64/tls/libc.so.6
#1  0x0000002a968a4a0b in vfprintf () from /lib64/tls/libc.so.6
#2  0x0000002a968c4434 in vsnprintf () from /lib64/tls/libc.so.6
#3  0x0000002a958c3181 in lutil_debug (debug=<value optimized out>,
      level=<value optimized out>, fmt=0x448076c8 "$") at debug.c:66
#4  0x00000000004957d1 in do_syncrepl (ctx=0x44807e90, arg=0x858150)
      at syncrepl.c:1261
#5  0x0000002a9567e415 in ldap_int_thread_pool_wrapper (
      xpool=<value optimized out>) at tpool.c:663
#6  0x0000002a9675310a in start_thread () from /lib64/tls/libpthread.so.0
#7  0x0000002a969288b3 in clone () from /lib64/tls/libc.so.6
#8  0x0000000000000000 in ?? ()
(gdb) print si
$1 = (syncinfo_t *) 0x0
(gdb) print *rtask
$2 = {next_sched = {tv_sec = 7598733802573148208,
      tv_usec = 14422794207978861}, interval = {tv_sec = 384, tv_usec = 64},
    tnext = {stqe_next = 0x84bc30}, rnext = {stqe_next = 0x858870},
    routine = 0,
    arg = 0x0, tname = 0x505cc0 "do_syncrepl", tspec = 0x857d94 "rid=004"}

(gdb) thr 8
[Switching to thread 8 (process 23265)]#0  0x0000002a968d2540 in strlen ()
     from /lib64/tls/libc.so.6
(gdb) frame 4
#4  0x00000000004957d1 in do_syncrepl (ctx=0x41801e90, arg=0x858a30)
      at syncrepl.c:1261
1261		Debug( LDAP_DEBUG_TRACE, "=>do_syncrepl %s\n", si->si_ridtxt, 0, 0 );
(gdb) print si
$3 = (syncinfo_t *) 0x20
(gdb) print *rtask
$4 = {next_sched = {tv_sec = 7526470944284832317,
      tv_usec = 7598542775770181185}, interval = {tv_sec =
      tv_usec = 3683997482740818493}, tnext = {stqe_next =
    rnext = {stqe_next = 0x333d74756f656d}, routine = 0xc0, arg = 0x20,
    tname = 0x84bc10 "\220\004",
    tspec = 0x69666e6f43657361<Address 0x69666e6f43657361 out of bounds>}

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/