Re: (ITS#7085) mutex lockup issue



Quanah,

We have compiled OpenLDAP 2.4.26 with BDB 5.2.36. slapd locked up 4 hours
into our testing, in a manner similar to what I reported earlier, so I
believe this issue still occurs in the latest version.

However, when I examined it with gdb, I did not see the mutex-locked threads
that I saw with OpenLDAP 2.4.22.

The following is from the locked-up 2.4.26 slapd server:

(gdb) info thread
  14 Thread 0x418dd940 (LWP 13814)  0x00000037aa4d48a8 in epoll_wait () from /lib64/libc.so.6
  13 Thread 0x420de940 (LWP 13815)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12 Thread 0x428df940 (LWP 13816)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11 Thread 0x430e0940 (LWP 13843)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10 Thread 0x438e1940 (LWP 13855)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9 Thread 0x440e2940 (LWP 13856)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8 Thread 0x448e3940 (LWP 13857)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 0x450e4940 (LWP 13858)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 0x458e5940 (LWP 13859)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x460e6940 (LWP 13860)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x468e7940 (LWP 2007)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x470e8940 (LWP 2008)  0x00000037aa4cd722 in select () from /lib64/libc.so.6
  2 Thread 0x478e9940 (LWP 2009)  0x00000037aac0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x2ac6ccfdc930 (LWP 13805)  0x00000037aac07b35 in pthread_join () from /lib64/libpthread.so.0
(gdb) thread 3
[Switching to thread 3 (Thread 0x470e8940 (LWP 2008))]#0  0x00000037aa4cd722 in select () from /lib64/libc.so.6
(gdb) bt
#0  0x00000037aa4cd722 in select () from /lib64/libc.so.6
#1  0x000000000054ece5 in ?? ()
#2  0x000000000054aa15 in ?? ()
#3  0x0000000000557637 in ?? ()
#4  0x0000000000557c11 in ?? ()
#5  0x00000000004b2d93 in ?? ()
#6  0x00000000004e9d7c in ?? ()
#7  0x00000037aac0673d in start_thread () from /lib64/libpthread.so.0
#8  0x00000037aa4d44bd in clone () from /lib64/libc.so.6

It looks like thread 3 is waiting in select(), and it never wakes up when I
query the server with the ldapsearch command.

I ran strace on ldapsearch (on a client machine), and the following is what
I see at the end of the log:

$ strace ldapsearch -h 129.79.14.152 -p 2180 -l 3 -x -b mds-vo-name=WT2,o=grid "(&(objectClass=GlueLocation)(GlueLocationName=TIMESTAMP))"

....
write(1, "\n", 1
)                       = 1
write(3, "0l\2\1\2cg\4\26mds-vo-name=WT2,o=grid\n"..., 110) = 110
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, -1

I'm not sure whether this strace output is useful, but after this point
ldapsearch never returned.

Thanks,
Soichi


On Wed, Nov 9, 2011 at 1:13 PM, Quanah Gibson-Mount <quanah@zimbra.com> wrote:

> --On Wednesday, November 09, 2011 2:01 PM +0000 hayashis@indiana.edu wrote:
>
>  Full_Name: Soichi Hayashi
>> Version: 2.4.22
>>
>
> OpenLDAP 2.4.22 is quite old, and had various known issues.  Please use a
> current release (2.4.26).  This report will not be investigated unless you
> can reproduce it with a current release of OpenLDAP.  You also fail to note
> what BDB release you are using, and whether or not it has all the relevant
> patches applied to it.  If you have a broken policy of only using vendor
> provided packages, then you will need to send a bug report to RedHat, as it
> is their job to maintain their vendor packages.
>
>
> Thanks!
>
> --Quanah
>
> --
>
> Quanah Gibson-Mount
> Sr. Member of Technical Staff
> Zimbra, Inc
> A Division of VMware, Inc.
> --------------------
> Zimbra ::  the leader in open source messaging and collaboration
>
