Issue 5439 - syncprov race condition seg. fault
Summary: syncprov race condition seg. fault
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All
OS: All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-03-25 15:34 UTC by Rein
Modified: 2014-08-01 21:04 UTC

See Also:


Attachments

Description Rein 2008-03-25 15:34:55 UTC
Full_Name: Rein Tollevik
Version: CVS head
OS: CentOS 4.4
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (81.93.160.250)


We have been hit by what looks like a race condition bug in syncprov.  We got
some core dumps, all showing stack frames like the one at the end.  As such nasty
bugs tend to do, it has behaved OK since I restarted slapd with more debug
output :-( (trace + stats + stats2 + sync).

The configuration is a master server with multiple bdb backend databases all
being subordinate to the same glue database where syncprov is used.  One of the
backends is a syncrepl consumer from another server; the server is master for
the other backends.  There are multiple consumers of the syncprov suffix, which
I assume is what causes the race condition to happen.

Note the a=0xBAD argument to attr_find(), which I expect is the result of some
other thread freeing the attribute list it was called with while it was
processing it.  The rs->sr_entry->e_attrs argument passed to attr_find() as the
original "a" argument by findpres_cb() looks like a perfectly valid structure,
as are all the attributes found by following the a_next pointer.  The list is
terminated by an attribute with a NULL a_next value, none of the a_next values
are 0xBAD.
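
For reference, attr_find() is essentially just a walk of the a_next chain, so a
bogus list head such as 0xbad faults on the very first dereference.  A rough
sketch, not the exact code in attr.c:

/* types (Attribute, AttributeDescription) as declared in servers/slapd/slap.h */
Attribute *
attr_find( Attribute *a, AttributeDescription *desc )
{
	for ( ; a != NULL; a = a->a_next ) {
		/* with a == 0xbad this is the faulting access; the real
		 * code may compare the descriptions via ad_cmp() instead */
		if ( a->a_desc == desc )
			return a;
	}
	return NULL;
}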

I'm currently trying to gather more information related to this bug; any
pointers as to what I should look for are appreciated.  I'm posting this bug
report now in the hope that the stack trace will enlighten someone with better
knowledge of the code than I have.

Rein Tollevik
Basefarm AS

#0  0x0807d03a in attr_find (a=0xbad, desc=0x81e8680) at attr.c:665
#1  0xb7a656f6 in findpres_cb (op=0xaf068ba4, rs=0xaf068b68) at syncprov.c:546
#2  0x0808416d in slap_response_play (op=0xaf068ba4, rs=0xaf068b68) at result.c:307
#3  0x0808555b in slap_send_search_entry (op=0xaf068ba4, rs=0xaf068b68) at result.c:770
#4  0x080f2cdc in bdb_search (op=0xaf068ba4, rs=0xaf068b68) at search.c:870
#5  0x080db72b in overlay_op_walk (op=0xaf068ba4, rs=0xaf068b68, which=op_search, oi=0x8274218, on=0x8274318) at backover.c:653
#6  0x080dbcaf in over_op_func (op=0xaf068ba4, rs=0xaf068b68, which=op_search) at backover.c:705
#7  0x080dbdef in over_op_search (op=0xaf068ba4, rs=0xaf068b68) at backover.c:727
#8  0x080d9570 in glue_sub_search (op=0xaf068ba4, rs=0xaf068b68, b0=0xaf068ba4, on=0xaf068ba4) at backglue.c:340
#9  0x080da131 in glue_op_search (op=0xbad, rs=0xaf068b68) at backglue.c:459
#10 0x080db6d5 in overlay_op_walk (op=0xaf068ba4, rs=0xaf068b68, which=op_search, oi=0x8271860, on=0x8271a60) at backover.c:643
#11 0x080dbcaf in over_op_func (op=0xaf068ba4, rs=0xaf068b68, which=op_search) at backover.c:705
#12 0x080dbdef in over_op_search (op=0xaf068ba4, rs=0xaf068b68) at backover.c:727
#13 0xb7a65ff4 in syncprov_findcsn (op=0x85c7e60, mode=FIND_PRESENT) at syncprov.c:700
#14 0xb7a670a0 in syncprov_op_search (op=0x85c7e60, rs=0xaf06a1c0) at syncprov.c:2277
#15 0x080db6d5 in overlay_op_walk (op=0x85c7e60, rs=0xaf06a1c0, which=op_search, oi=0x8271860, on=0x8271b60) at backover.c:643
#16 0x080dbcaf in over_op_func (op=0x85c7e60, rs=0xaf06a1c0, which=op_search) at backover.c:705
#17 0x080dbdef in over_op_search (op=0x85c7e60, rs=0xaf06a1c0) at backover.c:727
#18 0x08076554 in fe_op_search (op=0x85c7e60, rs=0xaf06a1c0) at search.c:368
#19 0x080770e4 in do_search (op=0x85c7e60, rs=0xaf06a1c0) at search.c:217
#20 0x08073e28 in connection_operation (ctx=0xaf06a2b8, arg_v=0x85c7e60) at connection.c:1084
#21 0x08074f14 in connection_read_thread (ctx=0xaf06a2b8, argv=0x59) at connection.c:1211
#22 0xb7fb5546 in ldap_int_thread_pool_wrapper (xpool=0x81ee240) at tpool.c:663
#23 0xb7c80371 in start_thread () from /lib/tls/libpthread.so.0
#24 0xb7c17ffe in clone () from /lib/tls/libc.so.6

Comment 1 Howard Chu 2008-03-25 20:56:42 UTC
rein@basefarm.no wrote:
> Full_Name: Rein Tollevik
> Version: CVS head
> OS: CentOS 4.4
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (81.93.160.250)
>
>
> We have been hit by what looks like a race condition bug in syncprov.  We got
> some core dumps, all showing stack frames like the one at the end.  As such nasty
> bugs tend to do, it has behaved OK since I restarted slapd with more debug
> output :-( (trace + stats + stats2 + sync).
>
> The configuration is a master server with multiple bdb backend databases all
> being subordinate to the same glue database where syncprov is used.  One of the
> backends is a syncrepl consumer from another server; the server is master for
> the other backends.  There are multiple consumers of the syncprov suffix, which
> I assume is what causes the race condition to happen.
>
> Note the a=0xBAD argument to attr_find(), which I expect is the result of some
> other thread freeing the attribute list it was called with while it was
> processing it.  The rs->sr_entry->e_attrs argument passed to attr_find() as the
> original "a" argument by findpres_cb() looks like a perfectly valid structure,
> as are all the attributes found by following the a_next pointer.  The list is
> terminated by an attribute with a NULL a_next value, none of the a_next values
> are 0xBAD.

I don't believe that's the cause. Notice that arg0 in stack frame #9 is also 
0xbad, even though it is shown correctly in frames 8 and 10. Something else is 
going on.

> I'm currently trying to gather more information related to this bug; any
> pointers as to what I should look for are appreciated.  I'm posting this bug
> report now in the hope that the stack trace will enlighten someone with better
> knowledge of the code than I have.

Check for stack overruns, compile without optimization and make sure it's not 
a compiler optimization bug, etc.

-- 
   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP     http://www.openldap.org/project/

Comment 2 Rein 2008-03-25 23:53:30 UTC
On Tue, 25 Mar 2008, hyc@symas.com wrote:

> rein@basefarm.no wrote:

>> We have been hit by what looks like a race condition bug in syncprov.  We got
>> some core dumps, all showing stack frames like the one at the end.  As such nasty
>> bugs tend to do, it has behaved OK since I restarted slapd with more debug
>> output :-( (trace + stats + stats2 + sync).
>>
>>
>> Note the a=0xBAD argument to attr_find(), which I expect is the result of some
>> other thread freeing the attribute list it was called with while it was
>> processing it.  The rs->sr_entry->e_attrs argument passed to attr_find() as the
>> original "a" argument by findpres_cb() looks like a perfectly valid structure,
>> as are all the attributes found by following the a_next pointer.  The list is
>> terminated by an attribute with a NULL a_next value, none of the a_next values
>> are 0xBAD.
>
> I don't believe that's the cause. Notice that arg0 in stack frame #9 is also
> 0xbad, even though it is shown correctly in frames 8 and 10. Something else is
> going on.

I don't know how I managed to overlook the strange 0xbad argument in
frame #9, and I really don't have any explanation as to why it shows that
value.  But if I am on the right track with my assumptions below, I
currently write it off as an unexpected result of optimization.

It now looks to me as if the entire rs->sr_entry was released and reused,
and that the bug probably is in back-bdb.  It just always happened while
syncprov was running because this is a master server with mostly writes;
the only reads of note are the ones syncprov does.  Which also means that
the title of the bug report is probably misleading :-(

The reason for my shift of focus is some highly suspicious variables
found in the bdb_search() frame.

First, ei->bei_e seems to be NULL every time this happens, not the same as
the "e" variable as I would expect from looking at the code.  Second,
e->e_id is either 0 or ei->bei_id+1, and the content of *e is the (more or
less completely initialized) entry following the *ei entry in the
database, while I would expect that ei->bei_id==e->e_id and that
ei->bei_e==e.  I don't think the consecutive ID numbers or entries have
anything to do with the bug though; they are probably a result of the
searches reading the entire database.
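
In other words, the invariant I would expect to hold between the EntryInfo and
the entry it caches, written out as a hypothetical check (field names as above,
not actual slapd code):

/* hypothetical helper; EntryInfo and Entry types as in back-bdb.h / slap.h */
static int
bei_consistent( EntryInfo *ei, Entry *e )
{
	/* the cached entry should be the one handed out, and both
	 * structures should agree on the entry ID */
	return ei->bei_e == e && ei->bei_id == e->e_id;
}

Both conditions fail in the cores I have looked at.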

It looks to me as if the EntryInfo referred to by ei was somehow released
while slap_send_search_entry() was running, and that the ei->bei_e passed
as rs->sr_entry was first released and then reused for the next entry.
Alternatively, the structures weren't properly initialized and thereby
appeared free to the second thread.

I have found a core file that adds to my suspicions, this time with two
active threads.  Their stack frames are shown at the end.  In this core
file the id variable in the bdb_search() frame is 9318, the second
thread is loading id==9319.  The value of the "e" variable in
bdb_search() for the first thread equals the "x" variable of the
entry_decode() frame in the second thread.  Since entry_decode() has
just allocated its "x" value it shouldn't have found the same value as
the other thread has unless someone released its "e" entry while it was
still using it.  Why this happened I haven't figured out yet.

The cachesize directive has its default value, which is way below the
optimal value.  The result is that the cache is always full.  I don't
know whether that can have any influence on the bug.  I tried to lower its
value to see if that would help reproduce the problem, but no luck so
far.

>> I'm currently trying to gather more information related to this bug, any
>> pointers as to what I should look for is appreciated.  I'm posting this bug
>> report now in the hope that the stack frame should enlighten someone with better
>> knowledge of the code than what I have.
>
> Check for stack overruns, compile without optimization and make sure it's not
> a compiler optimization bug, etc.

I'll try to run a version compiled with only -g and see if the same
thing happens then.

Rein

Active thread when the segmentation fault occurred:

#0  0x0807d03a in attr_find (a=0xbad, desc=0x81e8680) at attr.c:665
#1  0xb7a656f6 in findpres_cb (op=0xac05cba4, rs=0xac05cb68) at syncprov.c:546
#2  0x0808416d in slap_response_play (op=0xac05cba4, rs=0xac05cb68) at result.c:307
#3  0x0808555b in slap_send_search_entry (op=0xac05cba4, rs=0xac05cb68) at result.c:770
#4  0x080f2cdc in bdb_search (op=0xac05cba4, rs=0xac05cb68) at search.c:870
#5  0x080db72b in overlay_op_walk (op=0xac05cba4, rs=0xac05cb68, which=op_search, oi=0x8274218, on=0x8274318) at backover.c:653
#6  0x080dbcaf in over_op_func (op=0xac05cba4, rs=0xac05cb68, which=op_search) at backover.c:705
#7  0x080dbdef in over_op_search (op=0xac05cba4, rs=0xac05cb68) at backover.c:727
#8  0x080d9570 in glue_sub_search (op=0xac05cba4, rs=0xac05cb68, b0=0xac05cba4, on=0xac05cba4) at backglue.c:340
#9  0x080da131 in glue_op_search (op=0xbad, rs=0xac05cb68) at backglue.c:459
#10 0x080db6d5 in overlay_op_walk (op=0xac05cba4, rs=0xac05cb68, which=op_search, oi=0x8271860, on=0x8271a60) at backover.c:643
#11 0x080dbcaf in over_op_func (op=0xac05cba4, rs=0xac05cb68, which=op_search) at backover.c:705
#12 0x080dbdef in over_op_search (op=0xac05cba4, rs=0xac05cb68) at backover.c:727
#13 0xb7a65ff4 in syncprov_findcsn (op=0x85085b8, mode=FIND_PRESENT) at syncprov.c:700
#14 0xb7a670a0 in syncprov_op_search (op=0x85085b8, rs=0xac05e1c0) at syncprov.c:2277
#15 0x080db6d5 in overlay_op_walk (op=0x85085b8, rs=0xac05e1c0, which=op_search, oi=0x8271860, on=0x8271b60) at backover.c:643
#16 0x080dbcaf in over_op_func (op=0x85085b8, rs=0xac05e1c0, which=op_search) at backover.c:705
#17 0x080dbdef in over_op_search (op=0x85085b8, rs=0xac05e1c0) at backover.c:727
#18 0x08076554 in fe_op_search (op=0x85085b8, rs=0xac05e1c0) at search.c:368
#19 0x080770e4 in do_search (op=0x85085b8, rs=0xac05e1c0) at search.c:217
#20 0x08073e28 in connection_operation (ctx=0xac05e2b8, arg_v=0x85085b8) at connection.c:1084
#21 0x08074f14 in connection_read_thread (ctx=0xac05e2b8, argv=0x3d) at connection.c:1211
#22 0xb7fb5546 in ldap_int_thread_pool_wrapper (xpool=0x81ee240) at tpool.c:663
#23 0xb7c80371 in start_thread () from /lib/tls/libpthread.so.0
#24 0xb7c17ffe in clone () from /lib/tls/libc.so.6

Thread loading the next entry:

#0  0x08123ef6 in avl_find (root=0x81e7940, data=0xae7a62e8, fcmp=0x80b8871 <attr_index_name_cmp>) at avl.c:541
#1  0x080b7852 in at_bvfind (name=0xae7a62e8) at at.c:130
#2  0x080b5fd8 in slap_bv2ad (bv=0xae7a6354, ad=0xae7a6360, text=0xae7a6364) at ad.c:189
#3  0x0807eef5 in entry_decode (eh=0xae7a63bc, e=0xae7a64a8) at entry.c:859
#4  0x0810d849 in bdb_id2entry (be=0xae8666c0, tid=0x0, locker=0xb742da7c, id=9319, e=0xae7a64a8) at id2entry.c:169
#5  0x08104ffb in bdb_cache_find_id (op=0xae866ba4, tid=0x0, id=9319, eip=0xae7a6614, flag=2, locker=0xb742da7c, lock=0xae7a65f4) at cache.c:834
#6  0x080f2682 in bdb_search (op=0xae866ba4, rs=0xae866b68) at search.c:684
#7  0x080db72b in overlay_op_walk (op=0xae866ba4, rs=0xae866b68, which=op_search, oi=0x8274218, on=0x8274318) at backover.c:653
#8  0x080dbcaf in over_op_func (op=0xae866ba4, rs=0xae866b68, which=op_search) at backover.c:705
#9  0x080dbdef in over_op_search (op=0xae866ba4, rs=0xae866b68) at backover.c:727
#10 0x080d9570 in glue_sub_search (op=0xae866ba4, rs=0xae866b68, b0=0x81e7930, on=0xae866ba4) at backglue.c:340
#11 0x080da131 in glue_op_search (op=0xfffffffa, rs=0xae866b68) at backglue.c:459
#12 0x080db6d5 in overlay_op_walk (op=0xae866ba4, rs=0xae866b68, which=op_search, oi=0x8271860, on=0x8271a60) at backover.c:643
#13 0x080dbcaf in over_op_func (op=0xae866ba4, rs=0xae866b68, which=op_search) at backover.c:705
#14 0x080dbdef in over_op_search (op=0xae866ba4, rs=0xae866b68) at backover.c:727
#15 0xb7a65ff4 in syncprov_findcsn (op=0x8508e60, mode=FIND_PRESENT) at syncprov.c:700
#16 0xb7a670a0 in syncprov_op_search (op=0x8508e60, rs=0xae8681c0) at syncprov.c:2277
#17 0x080db6d5 in overlay_op_walk (op=0x8508e60, rs=0xae8681c0, which=op_search, oi=0x8271860, on=0x8271b60) at backover.c:643
#18 0x080dbcaf in over_op_func (op=0x8508e60, rs=0xae8681c0, which=op_search) at backover.c:705
#19 0x080dbdef in over_op_search (op=0x8508e60, rs=0xae8681c0) at backover.c:727
#20 0x08076554 in fe_op_search (op=0x8508e60, rs=0xae8681c0) at search.c:368
#21 0x080770e4 in do_search (op=0x8508e60, rs=0xae8681c0) at search.c:217
#22 0x08073e28 in connection_operation (ctx=0xae8682b8, arg_v=0x8508e60) at connection.c:1084
#23 0x08074f14 in connection_read_thread (ctx=0xae8682b8, argv=0x54) at connection.c:1211
#24 0xb7fb5546 in ldap_int_thread_pool_wrapper (xpool=0x81ee240) at tpool.c:663
#25 0xb7c80371 in start_thread () from /lib/tls/libpthread.so.0
#26 0xb7c17ffe in clone () from /lib/tls/libc.so.6

Comment 3 Howard Chu 2008-03-26 00:45:04 UTC
Rein Tollevik wrote:
> On Tue, 25 Mar 2008, hyc@symas.com wrote:
> It now looks to me as if the entire rs->sr_entry was released and reused,
> and that the bug probably is in back-bdb.  It just always happened while
> syncprov was running because this is a master server with mostly writes;
> the only reads of note are the ones syncprov does.  Which also means that
> the title of the bug report is probably misleading :-(
>
> The reason for my shift of focus is some highly suspicious variables
> found in the bdb_search() frame.

Look at the value of idflag in bdb_search. (see the comment around 672)

Could be a locking bug with ID_NOCACHE in cache.c...


-- 
   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP     http://www.openldap.org/project/

Comment 4 Hallvard Furuseth 2008-03-28 13:41:25 UTC
rein@basefarm.no writes:

> It now looks to me as if the entire rs->sr_entry was released and
> reused, (...)

Could this be ITS#5340?

-- 
Hallvard

Comment 5 Rein 2008-03-31 18:26:33 UTC
On Tue, 25 Mar 2008, Howard Chu wrote:

> Rein Tollevik wrote:
>> On Tue, 25 Mar 2008, hyc@symas.com wrote:
>> It now looks to me as if the entire rs->sr_entry was released and reused,
>> and that the bug probably is in back-bdb.  It just always happened while
>> syncprov was running because this is a master server with mostly writes;
>> the only reads of note are the ones syncprov does.  Which also means that
>> the title of the bug report is probably misleading :-(
>> 
>> The reason for my shift of focus is some highly suspicious variables
>> found in the bdb_search() frame.
>
> Look at the value of idflag in bdb_search. (see the comment around 672)
>
> Could be a locking bug with ID_NOCACHE in cache.c...

Yes, it looks very much as if this was the problem.  We have been running
with the patch at the end since Thursday afternoon, and so far it seems
to have worked :-) Although it is a bit too early to conclude,
especially as we have seen other problems as well.

We did see segmentation faults both Friday and Monday (not very much
going on during the weekend..), but so far those appear to be related to
ITS#5445. A slapd and db_checkpoint (run out of cron) deadlock which I
hope is due to the BerkeleyDB bug mentioned in ITS#5391 has also
occurred, so I'll upgrade from 4.6.18 to 4.6.21.1 to see if that problem
goes away.

Now, back to this bug.  There is a potential bug in bdb_cache_find_id()
in that it inspects and modifies the EntryInfo without it being locked.
I don't think this was the cause here, however, as bdb_search() always
passes a NULL eip, which means that bdb_cache_find_ndn() should have
acquired a lock.  The patch always locks the entryinfo though, as I
believe that to be necessary for the rest of the patch to be correct.

The real problem was in bdb_cache_return_entry_rw(), which released
ei->bei_e without making sure it was the only thread using it.  I.e., it
should have held a write lock on the entry_db (as bdb_cache_lru_purge()
makes sure to have), but instead it freed the entry after releasing the
read lock.
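
Condensed from the code the patch below changes, the old ordering in
bdb_cache_return_entry_rw() was roughly this (simplified):

	/* old code: the entry_db read lock is dropped first ... */
	bdb_cache_entry_db_unlock( bdb, lock );
	if ( ei ) {
		bdb_cache_entryinfo_lock( ei );
		if ( ei->bei_state & CACHE_ENTRY_NOT_CACHED ) {
			ei->bei_e = NULL;
			ei->bei_state ^= CACHE_ENTRY_NOT_CACHED;
			free = 1;
		}
		bdb_cache_entryinfo_unlock( ei );
	}
	if ( free ) {
		e->e_private = NULL;
		/* ... and only then is the entry freed, although another
		 * search thread may already have found and be using it */
		bdb_entry_return( e );
	}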

Instead of write locking the entry before releasing it, my alternative
solution is to only let the thread that loads the entry set the
CACHE_ENTRY_NOT_CACHED flag, and only if no other thread found the
entryinfo while it was loading the entry.  All other threads clear the
flag when they find that they don't need to load the entry.

I also use trylock to lock the entryinfo before setting or inspecting
the cache flag, for two reasons.  First, honoring this flag should not
be important enough to wait for a lock (after all, it should be cleared
anyhow if anyone else is interested in the same entry).  Second (and
most important), to avoid any deadlock situations, which my first
attempt to fix this bug managed to create when I moved the release of
the entryinfo lock to after the entry_db lock had been acquired in 
bdb_cache_find_id().

Rein

Index: OpenLDAP/servers/slapd/back-bdb/cache.c
diff -u OpenLDAP/servers/slapd/back-bdb/cache.c:1.1.1.15 OpenLDAP/servers/slapd/back-bdb/cache.c:1.2
--- OpenLDAP/servers/slapd/back-bdb/cache.c:1.1.1.15	Sat Mar 22 16:47:56 2008
+++ OpenLDAP/servers/slapd/back-bdb/cache.c	Mon Mar 31 17:52:17 2008
@@ -252,16 +252,27 @@
  	int free = 0;

  	ei = e->e_private;
-	bdb_cache_entry_db_unlock( bdb, lock );
-	if ( ei ) {
-		bdb_cache_entryinfo_lock( ei );
+	if ( ei &&
+		( ei->bei_state & CACHE_ENTRY_NOT_CACHED ) &&
+		( bdb_cache_entryinfo_trylock( ei ) == 0 ) ) {
  		if ( ei->bei_state & CACHE_ENTRY_NOT_CACHED ) {
+			/* Releasing the entry can only be done when
+			 * we know that nobody else is using it, i.e we
+			 * should have an entry_db writelock.  But the
+			 * flag is only set by the thread that loads the
+			 * entry, and only if no other threads has found
+			 * it while it was working.  All other threads
+			 * clear the flag, which mean that we should be
+			 * the only thread using the entry if the flag
+			 * is set here.
+			 */
  			ei->bei_e = NULL;
  			ei->bei_state ^= CACHE_ENTRY_NOT_CACHED;
  			free = 1;
  		}
  		bdb_cache_entryinfo_unlock( ei );
  	}
+	bdb_cache_entry_db_unlock( bdb, lock );
  	if ( free ) {
  		e->e_private = NULL;
  		bdb_entry_return( e );
@@ -854,6 +865,11 @@

  	/* Ok, we found the info, do we have the entry? */
  	if ( rc == 0 ) {
+		if ( !( flag & ID_LOCKED )) {
+			bdb_cache_entryinfo_lock ( *eip );
+			flag |= ID_LOCKED;
+		}
+
  		if ( (*eip)->bei_state & CACHE_ENTRY_DELETED ) {
  			rc = DB_NOTFOUND;
  		} else {
@@ -873,13 +889,13 @@
  				(*eip)->bei_state |= CACHE_ENTRY_LOADING;
  			}

-			/* If the entry was loaded before but uncached, and we need
-			 * it again, clear the uncached state
-			 */
-			if ( (*eip)->bei_state & CACHE_ENTRY_NOT_CACHED ) {
-				(*eip)->bei_state ^= CACHE_ENTRY_NOT_CACHED;
-				if ( flag & ID_NOCACHE )
-					flag ^= ID_NOCACHE;
+			if ( !load ) {
+				/* Clear the uncached state if we are not
+				 * loading it, i.e it is already cached or
+				 * another thread is currently loading it.
+				 */
+				(*eip)->bei_state &= ~CACHE_ENTRY_NOT_CACHED;
+				flag &= ~ID_NOCACHE;
  			}

  			if ( flag & ID_LOCKED ) {
@@ -906,10 +922,14 @@
  #endif
  						ep = NULL;
  						bdb_cache_lru_link( bdb, *eip );
-						if ( flag & ID_NOCACHE ) {
-							bdb_cache_entryinfo_lock( *eip );
-							(*eip)->bei_state |= CACHE_ENTRY_NOT_CACHED;
-							bdb_cache_entryinfo_unlock( *eip );
+						if ( ( flag & ID_NOCACHE ) &&
+							( bdb_cache_entryinfo_trylock ( *eip ) == 0 ) ) {
+							/* Set the cached state only if no other thread
+							 * found the info while we was loading the entry.
+							 */
+							if ( (*eip)->bei_finders == 1 )
+								(*eip)->bei_state |= CACHE_ENTRY_NOT_CACHED;
+							bdb_cache_entryinfo_unlock ( *eip );
  						}
  					}
  					if ( rc == 0 ) {
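
To summarize the net effect of the patch (condensed and simplified from the
diff above, not the literal code):

	/* bdb_cache_find_id(): if we did not have to load the entry ourselves,
	 * keep it cached and drop our own no-cache request */
	if ( !load ) {
		(*eip)->bei_state &= ~CACHE_ENTRY_NOT_CACHED;
		flag &= ~ID_NOCACHE;
	}

	/* later, after a successful load: only the thread that actually loaded
	 * the entry may mark it as not cached, and only if nobody else found
	 * the EntryInfo in the meantime */
	if ( ( flag & ID_NOCACHE ) &&
		( bdb_cache_entryinfo_trylock( *eip ) == 0 ) ) {
		if ( (*eip)->bei_finders == 1 )
			(*eip)->bei_state |= CACHE_ENTRY_NOT_CACHED;
		bdb_cache_entryinfo_unlock( *eip );
	}

	/* bdb_cache_return_entry_rw(): the flag is now checked and the entry
	 * detached before the entry_db lock is released; if the flag is still
	 * set, this thread is by construction the only user, so freeing the
	 * entry afterwards is safe */
	if ( ei && ( ei->bei_state & CACHE_ENTRY_NOT_CACHED ) &&
		( bdb_cache_entryinfo_trylock( ei ) == 0 ) ) {
		if ( ei->bei_state & CACHE_ENTRY_NOT_CACHED ) {
			ei->bei_e = NULL;
			ei->bei_state ^= CACHE_ENTRY_NOT_CACHED;
			free = 1;
		}
		bdb_cache_entryinfo_unlock( ei );
	}
	bdb_cache_entry_db_unlock( bdb, lock );
	if ( free ) {
		e->e_private = NULL;
		bdb_entry_return( e );
	}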

Comment 6 Quanah Gibson-Mount 2008-03-31 18:43:09 UTC
--On Monday, March 31, 2008 6:27 PM +0000 rein@basefarm.no wrote:

> We did see segmentation faults both Friday and Monday (not very much
> going on during the weekend..), but so far those appear to be related to
> ITS#5445. A slapd and db_checkpoint (run out of cron) deadlock which I
> hope is due to the BerkeleyDB bug mentioned in ITS#5391 has also
> occurred, so I'll upgrade from 4.6.18 to 4.6.21.1 to see if that problem
> goes away.


Why do you run db_checkpoint out of cron?  Slapd will already take care of 
checkpointing the database for you using the "checkpoint" directive...

--Quanah


--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra ::  the leader in open source messaging and collaboration

Comment 7 Howard Chu 2008-04-03 06:02:26 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 8 Quanah Gibson-Mount 2008-05-05 23:55:59 UTC
changed notes
changed state Test to Release
Comment 9 Quanah Gibson-Mount 2008-05-13 03:23:10 UTC
changed notes
changed state Release to Closed
Comment 10 OpenLDAP project 2014-08-01 21:04:14 UTC
fixed in HEAD
fixed in RE24