
(ITS#5439) syncprov race condition seg. fault



Full_Name: Rein Tollevik
Version: CVS head
OS: CentOS 4.4
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (81.93.160.250)


We have been hit by what looks like a race condition bug in syncprov.  We got
some core dumps, all showing stack traces like the one at the end.  As such nasty
bugs tend to do, it has behaved OK since I restarted slapd with more debug
output :-( (trace + stats + stats2 + sync).

The configuration is a master server with multiple bdb backend databases, all
subordinate to the same glue database, where syncprov is used.  One of the
backends is a syncrepl consumer of another server; the server is master for
the other backends.  There are multiple consumers of the syncprov suffix, which
I assume is what allows the race condition to occur.

Note the a=0xBAD argument to attr_find(), which I expect is the result of some
other thread freeing the attribute list while attr_find() was still walking
it.  The rs->sr_entry->e_attrs value that findpres_cb() passed to attr_find() as
the original "a" argument looks like a perfectly valid structure, as do all the
attributes found by following the a_next pointers.  The list is terminated by an
attribute with a NULL a_next value, and none of the a_next values is 0xBAD.
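
For readers unfamiliar with the code, the kind of list walk described above can be sketched roughly as follows.  This is only an illustration under stated assumptions: desc_t, attr_t, find_attr() and list_length() are hypothetical, simplified stand-ins, not the actual slapd definitions or the code at attr.c:665.  The point is that each iteration dereferences the current pointer, so a garbage value such as 0xbad would presumably fault on the first access, and that checking the list by hand amounts to following a_next until NULL.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for slapd's types (assumption:
 * these are NOT the real definitions, only the fields discussed). */
typedef struct desc { const char *name; } desc_t;

typedef struct attr {
    const desc_t *a_desc;   /* attribute description */
    struct attr  *a_next;   /* next attribute; NULL terminates the list */
} attr_t;

/* Walk the list and return the first attribute whose description
 * matches.  Every iteration dereferences `a`, so an invalid pointer
 * value faults inside this loop. */
static attr_t *find_attr(attr_t *a, const desc_t *desc)
{
    for (; a != NULL; a = a->a_next) {
        if (a->a_desc == desc)
            return a;
    }
    return NULL;
}

/* Count the nodes reachable from `head` by following a_next until
 * NULL, mirroring the manual check done in the debugger. */
static size_t list_length(const attr_t *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->a_next)
        n++;
    return n;
}
```

Nothing in such a loop protects against another thread freeing or relinking nodes mid-walk; any protection would have to come from the caller.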

I'm currently trying to gather more information related to this bug; any
pointers as to what I should look for are appreciated.  I'm posting this bug
report now in the hope that the stack trace will enlighten someone with better
knowledge of the code than I have.

Rein Tollevik
Basefarm AS

#0  0x0807d03a in attr_find (a=0xbad, desc=0x81e8680) at attr.c:665
#1  0xb7a656f6 in findpres_cb (op=0xaf068ba4, rs=0xaf068b68) at syncprov.c:546
#2  0x0808416d in slap_response_play (op=0xaf068ba4, rs=0xaf068b68) at
result.c:307
#3  0x0808555b in slap_send_search_entry (op=0xaf068ba4, rs=0xaf068b68) at
result.c:770
#4  0x080f2cdc in bdb_search (op=0xaf068ba4, rs=0xaf068b68) at search.c:870
#5  0x080db72b in overlay_op_walk (op=0xaf068ba4, rs=0xaf068b68,
which=op_search, oi=0x8274218, on=0x8274318) at backover.c:653
#6  0x080dbcaf in over_op_func (op=0xaf068ba4, rs=0xaf068b68, which=op_search)
at backover.c:705
#7  0x080dbdef in over_op_search (op=0xaf068ba4, rs=0xaf068b68) at
backover.c:727
#8  0x080d9570 in glue_sub_search (op=0xaf068ba4, rs=0xaf068b68, b0=0xaf068ba4,
on=0xaf068ba4) at backglue.c:340
#9  0x080da131 in glue_op_search (op=0xbad, rs=0xaf068b68) at backglue.c:459
#10 0x080db6d5 in overlay_op_walk (op=0xaf068ba4, rs=0xaf068b68,
which=op_search, oi=0x8271860, on=0x8271a60) at backover.c:643
#11 0x080dbcaf in over_op_func (op=0xaf068ba4, rs=0xaf068b68, which=op_search)
at backover.c:705
#12 0x080dbdef in over_op_search (op=0xaf068ba4, rs=0xaf068b68) at
backover.c:727
#13 0xb7a65ff4 in syncprov_findcsn (op=0x85c7e60, mode=FIND_PRESENT) at
syncprov.c:700
#14 0xb7a670a0 in syncprov_op_search (op=0x85c7e60, rs=0xaf06a1c0) at
syncprov.c:2277
#15 0x080db6d5 in overlay_op_walk (op=0x85c7e60, rs=0xaf06a1c0, which=op_search,
oi=0x8271860, on=0x8271b60) at backover.c:643
#16 0x080dbcaf in over_op_func (op=0x85c7e60, rs=0xaf06a1c0, which=op_search) at
backover.c:705
#17 0x080dbdef in over_op_search (op=0x85c7e60, rs=0xaf06a1c0) at
backover.c:727
#18 0x08076554 in fe_op_search (op=0x85c7e60, rs=0xaf06a1c0) at search.c:368
#19 0x080770e4 in do_search (op=0x85c7e60, rs=0xaf06a1c0) at search.c:217
#20 0x08073e28 in connection_operation (ctx=0xaf06a2b8, arg_v=0x85c7e60) at
connection.c:1084
#21 0x08074f14 in connection_read_thread (ctx=0xaf06a2b8, argv=0x59) at
connection.c:1211
#22 0xb7fb5546 in ldap_int_thread_pool_wrapper (xpool=0x81ee240) at tpool.c:663
#23 0xb7c80371 in start_thread () from /lib/tls/libpthread.so.0
#24 0xb7c17ffe in clone () from /lib/tls/libc.so.6