
RE: test008-concurrency coredump (ITS#2866)

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support

> -----Original Message-----
> From: owner-openldap-bugs@OpenLDAP.org
> [mailto:owner-openldap-bugs@OpenLDAP.org]On Behalf Of
> h.b.furuseth@usit.uio.no

> Full_Name: Hallvard B Furuseth
> Version: HEAD
> OS: Solaris

> OpenLDAP was configured with
> CPPFLAGS="-DLDAP_LOCALIZE" ol_cv_bdb_compat=yes ./configure
> and linked with db-4.1.24 and Electric Fence.
>
> The core dump was produced by looping over test008
> with environment variable EF_PROTECT_BELOW=1, though I don't know if
> the latter has any relevance.

> bash$ gdb ../servers/slapd/slapd core
> Core was generated by `../servers/slapd/slapd -s0 -f ./testrun/slapd.1.conf -h ldap://localhost:9011/'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0011d6fc in ber_dupbv_x (dst=0xecc01778, src=0xf12ca000, ctx=0x0)
>     at memory.c:525
> 525		AC_MEMCPY( new->bv_val, src->bv_val, src->bv_len );
> (gdb) bt
> #0  0x0011d6fc in ber_dupbv_x (dst=0xecc01778, src=0xf12ca000, ctx=0x0)
>     at memory.c:525
> #1  0x0011d76c in ber_dupbv (dst=0xecc01778, src=0xf12ca000)
>     at memory.c:536
> #2  0x00099ed0 in slap_get_commit_csn (op=0xecd56000, csn=0xecc01778)
>     at ctxcsn.c:60
> #3  0x000ba1dc in bdb_csn_commit (op=0xecd56000, rs=0xecc01ab0,
>     tid=0xe8b6a000, ei=0xf43c8000, suffix_ei=0xecc0189c, ctxcsn_e=0xecc01898,
>     ctxcsn_added=0xecc01894, locker=2147487653) at ctxcsn.c:68
> #4  0x000bcbe8 in bdb_delete (op=0xecd56000, rs=0xecc01ab0)
>     at delete.c:496
> #5  0x0005286c in do_delete (op=0xecd56000, rs=0xecc01ab0)
>     at delete.c:216
> #6  0x00033934 in connection_operation (ctx=0xecc01b78, arg_v=0xecd56000)
>     at connection.c:1014
> #7  0x000df46c in ldap_int_thread_pool_wrapper (xpool=0xfc00a000)
>     at tpool.c:467
> (gdb)

The most obvious problem here is that the ber_dupbv() call happens outside
the critical section in slap_get_commit_csn. Another thread can therefore
enter slap_graduate_commit_csn and free the selected CSN before ber_dupbv
executes, and that is most likely the cause of this particular crash.
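The fix for the race can be sketched in isolation. This is a minimal, self-contained C model, not slapd code: csn_bv, csn_list, append_csn(), get_commit_csn() and graduate_csn() are hypothetical stand-ins for struct berval, the backend's pending-CSN list, slap_get_commit_csn() and slap_graduate_commit_csn(), and plain pthreads replace slapd's own thread wrappers. It assumes both paths serialize on the same mutex; the point it illustrates is that the selected CSN is duplicated before that mutex is released, so the free in graduate_csn() can never run between selection and copy.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for OpenLDAP's struct berval. */
typedef struct {
    size_t len;
    char  *val;
} csn_bv;

/* Hypothetical node in the list of pending CSNs. */
typedef struct csn_node {
    csn_bv           csn;
    int              committed;   /* nonzero once the transaction commits */
    struct csn_node *next;
} csn_node;

/* Hypothetical list; one mutex guards every read and every free. */
typedef struct {
    pthread_mutex_t lock;
    csn_node       *head;
} csn_list;

/* Deep-copy a value (plays the role of ber_dupbv). 0 on success, -1 on OOM. */
static int bv_dup(csn_bv *dst, const csn_bv *src)
{
    dst->val = malloc(src->len + 1);
    if (dst->val == NULL)
        return -1;
    memcpy(dst->val, src->val, src->len);
    dst->val[src->len] = '\0';
    dst->len = src->len;
    return 0;
}

/* Append a CSN at the tail, preserving list (CSN) order. */
int append_csn(csn_list *l, const char *val, int committed)
{
    csn_bv src = { strlen(val), (char *)val };
    csn_node *n = calloc(1, sizeof *n);
    if (n == NULL || bv_dup(&n->csn, &src) != 0) {
        free(n);
        return -1;
    }
    n->committed = committed;
    pthread_mutex_lock(&l->lock);
    csn_node **pp = &l->head;
    while (*pp != NULL)
        pp = &(*pp)->next;
    *pp = n;
    pthread_mutex_unlock(&l->lock);
    return 0;
}

/* Select a committed CSN and copy it BEFORE unlocking, so a concurrent
 * graduate_csn() cannot free it in between. Returns -1 if none is committed. */
int get_commit_csn(csn_list *l, csn_bv *out)
{
    int rc = -1;
    csn_node *pick = NULL;

    assert(l != NULL && out != NULL);
    pthread_mutex_lock(&l->lock);
    for (csn_node *n = l->head; n != NULL; n = n->next)
        if (n->committed)
            pick = n;                 /* last committed entry, as in the report */
    if (pick != NULL)
        rc = bv_dup(out, &pick->csn); /* the copy happens inside the lock */
    pthread_mutex_unlock(&l->lock);
    return rc;
}

/* Unlink and free a CSN under the same lock. */
void graduate_csn(csn_list *l, const char *val)
{
    pthread_mutex_lock(&l->lock);
    for (csn_node **pp = &l->head; *pp != NULL; pp = &(*pp)->next) {
        csn_node *n = *pp;
        if (strcmp(n->csn.val, val) == 0) {
            *pp = n->next;
            free(n->csn.val);
            free(n);
            break;
        }
    }
    pthread_mutex_unlock(&l->lock);
}
```

Because the caller receives its own copy, it can keep using the CSN after graduate_csn() has released the list entry. The critical section grows only by one small allocation and memcpy.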

A secondary issue is that slap_get_commit_csn returns the last committed CSN
in the list instead of the first. The CSNs are appended to the list in
order, so later entries have higher sequence values. But because of
transaction retries, commits can complete in an arbitrary order. With this
approach, the max_csn stored in the database may be greater than the CSN of
a change that has not actually committed yet. A syncrepl client that updates
with this value will miss those not-yet-committed transactions, yet still
store the high value in its internal state. The next time it queries, it
will ask only for transactions newer than this value, so it will never pick
up the ones it missed.
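The ordering problem can be reproduced with a small model. Again this is a hypothetical sketch rather than slapd code: it assumes CSN strings sort lexically in time order and models each list entry only as pending or committed. last_committed() mirrors the selection described above, and safe_to_publish() (also hypothetical) checks whether any change with a smaller CSN is still pending, which is exactly the situation in which a syncrepl consumer would skip a change.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: entries are appended in CSN order, but each one
 * may commit at any time because of transaction retries. */
enum csn_state { CSN_PENDING, CSN_COMMITTED };

struct pending_csn {
    const char    *csn;    /* assumed to sort lexically in time order */
    enum csn_state state;
};

/* Current selection: return the LAST committed entry, or NULL if none. */
const char *last_committed(const struct pending_csn *list, size_t n)
{
    const char *pick = NULL;

    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (list[i].state == CSN_COMMITTED)
            pick = list[i].csn;
    return pick;
}

/* A CSN is only safe to store as max_csn if no change with a smaller CSN
 * is still pending; otherwise a consumer that saves it will never ask
 * for that change again. Returns 1 if safe, 0 if not. */
int safe_to_publish(const struct pending_csn *list, size_t n, const char *csn)
{
    for (size_t i = 0; i < n; i++)
        if (list[i].state == CSN_PENDING && strcmp(list[i].csn, csn) < 0)
            return 0;
    return 1;
}
```

For example, with csn-1 still being retried while csn-2 and csn-3 have committed, last_committed() returns csn-3, and safe_to_publish() reports that storing it as max_csn would let a consumer skip csn-1.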

I believe this function should always return the first committed CSN in the
list, but Jong should probably look into this to verify.
