
Re: (ITS#4659) Core dump after MOD



quanah@stanford.edu wrote:
> After looking at the timing of everything, it is possible that I did kill 
> -9 on a syncrepl consumer right in the middle of the MOD where the master 
> died.  That may have triggered this core, if the timing was all right down 
> to the nanosecond...
>   
Well, that's not a good reason to core anyway :)

OK, before actually destroying the connection, connection_close() waits 
for the operations queue to be empty; from your dump, it seems that the 
connection being destroyed has no (pending) ops, but the c_write_mutex 
is locked and c_writewaiter is set.  This means that send_ldap_ber() in 
result.c was waiting on

    ldap_pvt_thread_cond_wait( &conn->c_write_cv, &conn->c_mutex );

but the operation was somehow removed from the operations queue.  I'm 
trying to figure out how this could happen.
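
To make the state described above concrete, here is a minimal sketch in C with plain pthreads. It is not the slapd code: demo_conn and every demo_* function are hypothetical stand-ins, and the destroy check is only an assumption about what a safe test could look like. It shows a writer that sets a writewaiter flag and blocks on a condition variable, and why an empty operations queue alone does not prove that nobody is still waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for slapd's Connection (not the real struct). */
typedef struct {
    pthread_mutex_t mutex;       /* plays the role of c_mutex */
    pthread_cond_t  write_cv;    /* plays the role of c_write_cv */
    int             writewaiter; /* plays the role of c_writewaiter */
    int             ops_pending; /* number of operations still queued */
    bool            writable;    /* true once the socket can be written again */
} demo_conn;

void demo_conn_init(demo_conn *c) {
    pthread_mutex_init(&c->mutex, NULL);
    pthread_cond_init(&c->write_cv, NULL);
    c->writewaiter = 0;
    c->ops_pending = 0;
    c->writable = false;
}

/* Writer side: announce that we are waiting, then block until writable. */
void demo_wait_writable(demo_conn *c) {
    pthread_mutex_lock(&c->mutex);
    c->writewaiter = 1;
    while (!c->writable)                          /* guards against spurious wakeups */
        pthread_cond_wait(&c->write_cv, &c->mutex);
    c->writewaiter = 0;
    pthread_mutex_unlock(&c->mutex);
}

/* Wake side: mark the connection writable and signal the waiter. */
void demo_mark_writable(demo_conn *c) {
    pthread_mutex_lock(&c->mutex);
    c->writable = true;
    pthread_cond_signal(&c->write_cv);
    pthread_mutex_unlock(&c->mutex);
}

/* Close side (assumed check): the caller is expected to hold c->mutex.
 * An empty queue is not enough if a writer is still parked on write_cv. */
bool demo_safe_to_destroy(const demo_conn *c) {
    assert(c != NULL);
    return c->ops_pending == 0 && c->writewaiter == 0;
}
```

In this model, the combination seen in the dump (no pending ops, writewaiter set) is exactly the case where demo_safe_to_destroy() returns false.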

One point is that even if killing the consumer aborted a persistent 
search op (which I doubt), no writewaiter should be involved in a 
persistent search...  However, I note that syncprov actually removes 
operations from c_ops, in syncprov_drop_search().  I wonder if by chance 
it's being called in a few cases without the lock held...
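
As an illustration of the locking discipline in question, here is a small sketch in C with pthreads of unlinking an operation from a per-connection list while holding the list's mutex. demo_op, demo_opqueue and the demo_queue_* functions are hypothetical names; this is not what syncprov_drop_search() does, only the general pattern a removal from a shared queue is expected to follow.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operation node; the real Operation struct is far larger. */
typedef struct demo_op {
    int id;
    struct demo_op *next;
} demo_op;

/* Hypothetical per-connection queue guarded by a single mutex. */
typedef struct {
    pthread_mutex_t mutex;
    demo_op *ops;
} demo_opqueue;

void demo_queue_init(demo_opqueue *q) {
    pthread_mutex_init(&q->mutex, NULL);
    q->ops = NULL;
}

/* Link an operation at the head of the list, under the lock. */
void demo_queue_push(demo_opqueue *q, demo_op *op) {
    pthread_mutex_lock(&q->mutex);
    op->next = q->ops;
    q->ops = op;
    pthread_mutex_unlock(&q->mutex);
}

/* Unlink an operation, under the lock. Returns false if it was not queued. */
bool demo_queue_remove(demo_opqueue *q, demo_op *op) {
    bool found = false;
    pthread_mutex_lock(&q->mutex);
    for (demo_op **pp = &q->ops; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == op) {
            *pp = op->next;   /* splice the node out of the list */
            op->next = NULL;
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&q->mutex);
    return found;
}

/* Count queued operations, under the lock. */
size_t demo_queue_len(demo_opqueue *q) {
    size_t n = 0;
    pthread_mutex_lock(&q->mutex);
    for (demo_op *p = q->ops; p != NULL; p = p->next)
        n++;
    pthread_mutex_unlock(&q->mutex);
    return n;
}
```

If a removal like this ran without the mutex, another thread walking or counting the list at the same time could see a queue that looks empty while an operation is still in flight.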

It would be great to know what operation caused this, but since there is 
no c_*ops in the Connection structure, it's going to be impossible.

p.




Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.n.c.
Via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
------------------------------------------
Office:   +39.02.23998309
Mobile:   +39.333.4963172
Email:    pierangelo.masarati@sys-net.it
------------------------------------------