Re: (ITS#7085) mutex lockup issue



Quanah,

Were you able to recreate this issue?

Soichi

On Wed, Nov 16, 2011 at 3:56 PM, Soichi Hayashi <hayashis@indiana.edu> wrote:

> Quanah,
>
> We have compiled OpenLDAP 2.4.26 with BDB 5.2.36. slapd locked up 4
> hours into our testing, in a manner similar to what I reported earlier, so I
> believe this issue still occurs on the latest version.
>
> However, when I used gdb, I did not see the mutex-locked threads that I
> saw with OpenLDAP 2.4.22.
>
> The following is from the locked-up 2.4.26 slapd server.
>
> (gdb) info thread
>   14 Thread 0x418dd940 (LWP 13814)  0x00000037aa4d48a8 in epoll_wait () from /lib64/libc.so.6
>   13 Thread 0x420de940 (LWP 13815)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   12 Thread 0x428df940 (LWP 13816)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   11 Thread 0x430e0940 (LWP 13843)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   10 Thread 0x438e1940 (LWP 13855)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   9 Thread 0x440e2940 (LWP 13856)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   8 Thread 0x448e3940 (LWP 13857)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   7 Thread 0x450e4940 (LWP 13858)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   6 Thread 0x458e5940 (LWP 13859)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   5 Thread 0x460e6940 (LWP 13860)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   4 Thread 0x468e7940 (LWP 2007)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   3 Thread 0x470e8940 (LWP 2008)  0x00000037aa4cd722 in select () from /lib64/libc.so.6
>   2 Thread 0x478e9940 (LWP 2009)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> * 1 Thread 0x2ac6ccfdc930 (LWP 13805)  0x00000037aac07b35 in pthread_join () from /lib64/libpthread.so.0
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x470e8940 (LWP 2008))]#0  0x00000037aa4cd722 in select () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00000037aa4cd722 in select () from /lib64/libc.so.6
> #1  0x000000000054ece5 in ?? ()
> #2  0x000000000054aa15 in ?? ()
> #3  0x0000000000557637 in ?? ()
> #4  0x0000000000557c11 in ?? ()
> #5  0x00000000004b2d93 in ?? ()
> #6  0x00000000004e9d7c in ?? ()
> #7  0x00000037aac0673d in start_thread () from /lib64/libpthread.so.0
> #8  0x00000037aa4d44bd in clone () from /lib64/libc.so.6
>
> It looks like thread 3 is waiting in select(), which never returns when I
> query the server using the ldapsearch command.
>
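
As a cross-check on the listing above, here is a minimal Python sketch that tallies which call each thread is blocked in. The regular expression and the function name `tally_blocking_calls` are illustrative assumptions, and the sample text is simply the `info thread` output quoted above with the mail line wrapping undone. The tally only summarizes the listing: it shows that no thread is reported inside a mutex lock call, but it cannot tell what the `pthread_cond_wait` threads are waiting for, since the slapd frames have no symbols.

```python
import re
from collections import Counter

# The `info thread` output from the report, with wrapped lines rejoined.
GDB_INFO_THREADS = """\
  14 Thread 0x418dd940 (LWP 13814)  0x00000037aa4d48a8 in epoll_wait () from /lib64/libc.so.6
  13 Thread 0x420de940 (LWP 13815)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12 Thread 0x428df940 (LWP 13816)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11 Thread 0x430e0940 (LWP 13843)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10 Thread 0x438e1940 (LWP 13855)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9 Thread 0x440e2940 (LWP 13856)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8 Thread 0x448e3940 (LWP 13857)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 0x450e4940 (LWP 13858)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 0x458e5940 (LWP 13859)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x460e6940 (LWP 13860)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x468e7940 (LWP 2007)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x470e8940 (LWP 2008)  0x00000037aa4cd722 in select () from /lib64/libc.so.6
  2 Thread 0x478e9940 (LWP 2009)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x2ac6ccfdc930 (LWP 13805)  0x00000037aac07b35 in pthread_join () from /lib64/libpthread.so.0
"""

# One thread per line: optional "*", thread id, "Thread <addr> (LWP <n>)",
# program counter, then "in <function>".
THREAD_LINE = re.compile(
    r"^\*?\s*(\d+)\s+Thread\s+\S+\s+\(LWP\s+\d+\)\s+\S+\s+in\s+([^\s(]+)"
)


def tally_blocking_calls(listing):
    """Count threads per innermost function in a gdb `info thread` listing."""
    counts = Counter()
    for line in listing.splitlines():
        match = THREAD_LINE.match(line)
        if match:
            # Drop the symbol-version suffix, e.g. "@@GLIBC_2.3.2".
            counts[match.group(2).split("@@")[0]] += 1
    return counts


if __name__ == "__main__":
    for function, count in tally_blocking_calls(GDB_INFO_THREADS).most_common():
        print(f"{count:2d}  {function}")
```

For this listing the tally covers all 14 threads, and thread 3 is the only one sitting in `select()`.
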
> I ran strace on ldapsearch (on a client machine), and the following is what
> I see at the end of the log.
>
> $ strace ldapsearch -h 129.79.14.152 -p 2180 -l 3 -x -b mds-vo-name=WT2,o=grid "(&(objectClass=GlueLocation)(GlueLocationName=TIMESTAMP))"
>
> ....
> write(1, "\n", 1
> )                       = 1
> write(3, "0l\2\1\2cg\4\26mds-vo-name=WT2,o=grid\n"..., 110) = 110
> poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, -1
>
> I am not sure whether this strace is useful, but after this point ldapsearch
> never returned.
>
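
The last strace line is a `poll()` with a timeout of -1, which means the client waits indefinitely for a reply. The following is a minimal Python sketch of the same wait with a finite timeout, which a monitoring probe could use to notice a hung server instead of hanging with it. The helper name `wait_for_reply`, the request bytes, and the local listener that stands in for the unresponsive slapd are all assumptions for illustration; this is not how ldapsearch itself is implemented, and `select.poll` is not available on Windows.

```python
import select
import socket


def wait_for_reply(sock, timeout_seconds):
    """Return True if `sock` becomes readable before the timeout expires."""
    poller = select.poll()
    poller.register(sock, select.POLLIN | select.POLLPRI)
    # poll() takes milliseconds. A negative value would block forever,
    # which is what the -1 in the strace output does.
    events = poller.poll(int(timeout_seconds * 1000))
    return bool(events)


def demo():
    # Hypothetical stand-in for a hung server: the kernel completes the
    # TCP handshake from the listen backlog, but nothing ever replies.
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=2)
    try:
        client.sendall(b"request")
        answered = wait_for_reply(client, 0.5)
        print("reply received" if answered else "no reply within 0.5 s")
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    demo()
```
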
> Thanks,
> Soichi
>
>
> On Wed, Nov 9, 2011 at 1:13 PM, Quanah Gibson-Mount <quanah@zimbra.com> wrote:
>
>> --On Wednesday, November 09, 2011 2:01 PM +0000 hayashis@indiana.edu wrote:
>>
>>> Full_Name: Soichi Hayashi
>>> Version: 2.4.22
>>>
>>
>> OpenLDAP 2.4.22 is quite old, and had various known issues.  Please use a
>> current release (2.4.26).  This report will not be investigated unless you
>> can reproduce it with a current release of OpenLDAP.  You also fail to note
>> what BDB release you are using, and whether or not it has all the relevant
>> patches applied to it.  If you have a broken policy of only using vendor
>> provided packages, then you will need to send a bug report to RedHat, as it
>> is their job to maintain their vendor packages.
>>
>>
>> Thanks!
>>
>> --Quanah
>>
>> --
>>
>> Quanah Gibson-Mount
>> Sr. Member of Technical Staff
>> Zimbra, Inc
>> A Division of VMware, Inc.
>> --------------------
>> Zimbra ::  the leader in open source messaging and collaboration
>>
>
>
