
Re: delta-syncrepl stopped receiving changes



On 2012-07-18 18:01, Gavin Henry wrote:
>>> Hi Dave,
>>>
>>> Have you been able to reproduce it since?
>>>
>>> Thanks.
>>
>> So far I've only had the one failure and I haven't been able to
>> reproduce it since.
>>
> 
> That's tricky then. Did you file an ITS? Will check...

I haven't filed an ITS yet, but another consumer locked up yesterday.

This time I didn't try to modify anything while syncrepl was locked.

The backtrace for the hdb_modify thread is the same as before, but this
time there are also threads locked on binds and searches. It turns out
that these searches and binds are for the user that is being modified.
The consumer shut down, came up normally, and replication continued
without incident.

Here's a bind that is locked:

=====
Thread 5 (Thread 0x474cb950 (LWP 27575)):
#0  0x00007fad5de7eb99 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#1  0x00007fad5e0b34fb in __db_pthread_mutex_lock ()
   from /apps/local/depend/BerkeleyDB-4.7.25p4/lib/libdb-4.7.so
#2  0x00007fad5e131dec in __lock_get_internal ()
   from /apps/local/depend/BerkeleyDB-4.7.25p4/lib/libdb-4.7.so
#3  0x00007fad5e132391 in __lock_get_pp ()
   from /apps/local/depend/BerkeleyDB-4.7.25p4/lib/libdb-4.7.so
#4  0x000000000054dbfa in bdb_cache_entry_db_lock (bdb=0xaec000,
    txn=0x32ee9560, ei=0xea8d680, rw=0, tryOnly=0, lock=0x474ca8b0)
    at cache.c:234
#5  0x000000000054f4d5 in hdb_cache_find_id (op=0x7fac2a014c00,
    tid=0x32ee9560, id=1455322, eip=0x474ca870, flag=0, lock=0x474ca8b0)
    at cache.c:988
#6  0x00000000005564f5 in hdb_dn2entry (op=0x7fac2a014c00, tid=0x32ee9560,
    dn=0x7fac2a014c38, e=0x474ca8e0, matched=1, lock=0x474ca8b0)
    at dn2entry.c:67
#7  0x000000000054d212 in hdb_bind (op=0x7fac2a014c00, rs=0x474cac90)
    at bind.c:70
...
=====

Clearly I need to upgrade and see whether this continues to happen.