
Re: (ITS#5161) delta-syncrepl mutex lockup

quanah@zimbra.com wrote:
> --On Tuesday, October 02, 2007 3:28 AM +0000 quanah@zimbra.com wrote:
>> --On Tuesday, October 02, 2007 2:35 AM +0000 hyc@symas.com wrote:
>>> quanah@zimbra.com wrote:
>>>> --On October 1, 2007 11:22:11 PM +0000 quanah@zimbra.com wrote:
>>>>> The following files will be uploaded to the ftp site, where # will be
>>>>> the assigned ITS number.
>>>> URLs specifically are:
>>>> <ftp://ftp.openldap.org/incoming/5161-pstak.out.2007-10-01>
>>>> <ftp://ftp.openldap.org/incoming/5161-dbstat.delta.out.2007-10-01>
>>>> <ftp://ftp.openldap.org/incoming/5161-db_stat.out.2007-10-01>
>>> The pstack output is a bit odd, is this a regular debug build? With frame
>>> pointers, etc? Can you get a stack trace in gdb?
>> It is a regular build, and they killed and restarted it before getting
>> any gdb information.  We've asked them to please get the gdb information
>> in the future.  Since it has happened twice now for this particular group
>> in about a month, I'm hopeful it'll happen again before too long. ;)
> And here is the last logged operation:
> Oct  1 17:48:21 ldap01 slapd.bin[16121]: conn=62333 op=1 MOD dn="uid=XXXXXXX,ou=people,dc=YYYYYY,dc=com"
> Oct  1 17:48:21 ldap01 slapd.bin[16121]: conn=62333 op=1 MOD attr=zimbraLastLogonTimestamp

Based on the (unreliable) pstack output, it appears that all of the threads are
waiting for the same mutex. This of course shouldn't be possible, since one of
those threads must already own it. We really need gdb access here to inspect
the state of the mutex, see which thread owns it, and then figure out why it's
trying to lock it again. In OpenLDAP 2.3 this pretty much means that some
operation locked the mutex and somehow completed without unlocking it, i.e.
completed without going through the accesslog response callback.

This has nothing to do with BDB, so db_stat isn't relevant here. It's about the
accesslog overlay and any other overlays that may be manipulating the callback
stack, so your slapd.conf is more relevant here.
   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP     http://www.openldap.org/project/