Re: (ITS#7085) mutex lockup issue
Quanah,

Were you able to recreate this issue?

Soichi

On Wed, Nov 16, 2011 at 3:56 PM, Soichi Hayashi <hayashis@indiana.edu> wrote:
> Quanah,
>
> We have compiled OpenLDAP 2.4.26 with BDB 5.2.36. OpenLDAP locked up 4
> hours into our testing, in a similar manner to what I reported earlier. I
> believe this issue still occurs in the latest version.
>
> However, when I examined it with gdb, I did not see the mutex-locked threads
> that I saw with OpenLDAP 2.4.22.
>
> The following is from the locked-up 2.4.26 slapd server:
>
> (gdb) info thread
>   14 Thread 0x418dd940 (LWP 13814)  0x00000037aa4d48a8 in epoll_wait () from /lib64/libc.so.6
>   13 Thread 0x420de940 (LWP 13815)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   12 Thread 0x428df940 (LWP 13816)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   11 Thread 0x430e0940 (LWP 13843)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   10 Thread 0x438e1940 (LWP 13855)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   9 Thread 0x440e2940 (LWP 13856)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   8 Thread 0x448e3940 (LWP 13857)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   7 Thread 0x450e4940 (LWP 13858)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   6 Thread 0x458e5940 (LWP 13859)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   5 Thread 0x460e6940 (LWP 13860)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   4 Thread 0x468e7940 (LWP 2007)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   3 Thread 0x470e8940 (LWP 2008)  0x00000037aa4cd722 in select () from /lib64/libc.so.6
>   2 Thread 0x478e9940 (LWP 2009)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> * 1 Thread 0x2ac6ccfdc930 (LWP 13805)  0x00000037aac07b35 in pthread_join () from /lib64/libpthread.so.0
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x470e8940 (LWP 2008))]#0  0x00000037aa4cd722 in select () from /lib64/libc.so.6
> (gdb) bt
> #0 0x00000037aa4cd722 in select () from /lib64/libc.so.6
> #1 0x000000000054ece5 in ?? ()
> #2 0x000000000054aa15 in ?? ()
> #3 0x0000000000557637 in ?? ()
> #4 0x0000000000557c11 in ?? ()
> #5 0x00000000004b2d93 in ?? ()
> #6 0x00000000004e9d7c in ?? ()
> #7 0x00000037aac0673d in start_thread () from /lib64/libpthread.so.0
> #8 0x00000037aa4d44bd in clone () from /lib64/libc.so.6
>
> It looks like thread 3 is waiting in select(), which never fires when I
> query the server with the ldapsearch command.
>
> I ran strace on ldapsearch (on a client machine); the following is the end
> of the log:
>
> $ strace ldapsearch -h 129.79.14.152 -p 2180 -l 3 -x -b mds-vo-name=WT2,o=grid "(&(objectClass=GlueLocation)(GlueLocationName=TIMESTAMP))"
>
> ....
> write(1, "\n", 1)                       = 1
> write(3, "0l\2\1\2cg\4\26mds-vo-name=WT2,o=grid\n"..., 110) = 110
> poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, -1
>
> I'm not sure whether this strace output is useful, but after this point
> ldapsearch never returned.
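The last strace line is the key detail: the third argument to poll() is a timeout in milliseconds, and -1 means wait indefinitely, so ldapsearch stays blocked until slapd sends data on fd 3 or the connection is closed. The sketch below illustrates those semantics with a hypothetical helper, wait_for_fd(), which is not part of ldapsearch or libldap; with a finite timeout the same call returns 0 instead of hanging.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h> /* pipe(), write(), close() for the usage example */

/*
 * Hypothetical helper, not part of ldapsearch or libldap.
 * Waits up to timeout_ms milliseconds for fd to report an event.
 * timeout_ms == -1 blocks indefinitely, like the poll() in the strace log.
 * Returns 1 if fd is readable or reported an error/hang-up,
 * 0 if the timeout expired first, and -1 if poll() failed.
 */
static int wait_for_fd(int fd, int timeout_ms)
{
    /* Same event mask that ldapsearch passed to poll(). POLLERR and
     * POLLHUP are reported in revents even when not requested. */
    struct pollfd pfd = {
        .fd = fd,
        .events = POLLIN | POLLPRI | POLLERR | POLLHUP,
        .revents = 0,
    };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
        /* Retry if a signal interrupted the wait (the timeout restarts). */
    } while (rc < 0 && errno == EINTR);

    return rc; /* with a single descriptor, rc is 1, 0 or -1 */
}
```

For example, on the read end of an empty pipe, wait_for_fd(fd, 100) returns 0 after roughly 100 ms; once a byte has been written to the pipe, it returns 1 immediately. Passing -1 instead reproduces the unbounded wait seen in the strace output.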
>
> Thanks,
> Soichi
>
>
> On Wed, Nov 9, 2011 at 1:13 PM, Quanah Gibson-Mount <quanah@zimbra.com> wrote:
>
>> --On Wednesday, November 09, 2011 2:01 PM +0000 hayashis@indiana.edu wrote:
>>
>>> Full_Name: Soichi Hayashi
>>> Version: 2.4.22
>>>
>>
>> OpenLDAP 2.4.22 is quite old, and had various known issues. Please use a
>> current release (2.4.26). This report will not be investigated unless you
>> can reproduce it with a current release of OpenLDAP. You also fail to note
>> what BDB release you are using, and whether or not it has all the relevant
>> patches applied to it. If you have a broken policy of only using vendor
>> provided packages, then you will need to send a bug report to RedHat, as it
>> is their job to maintain their vendor packages.
>>
>>
>> Thanks!
>>
>> --Quanah
>>
>> --
>>
>> Quanah Gibson-Mount
>> Sr. Member of Technical Staff
>> Zimbra, Inc
>> A Division of VMware, Inc.
>> --------------------
>> Zimbra :: the leader in open source messaging and collaboration
>>
>
>