Issue 679 - test001-slapdadd stalled
Summary: test001-slapdadd stalled
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: build
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
 
Reported: 2000-08-22 02:51 UTC by manabu@iij.ad.jp
Modified: 2014-08-01 21:05 UTC

Description manabu@iij.ad.jp 2000-08-22 02:51:34 UTC
Full_Name: Manabu Kondo
Version: 2.0-gamma
OS: BSD/OS 4.1
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (210.130.1.80)


I tried to build 2.0-gamma on BSD/OS 4.1.
'make depend' and 'make' succeeded, but 'make test' failed.

In 'test001-slapadd', the output was:

Initiating LDAP tests for LDBM...
>>>>> Executing all LDAP tests...
>>>>> Test Directory: .
>>>>> Backend: ldbm
>>>>> Starting test001-slapadd ...
running defines.sh . ldbm
Datadir is ./data
Cleaning up in ./test-db...
Running slapadd to build slapd database...
Starting slapd on TCP/IP port 9009...
Using ldapsearch to retrieve all the entries...

and then it appears to stall.
slapd.log contains:

Aug 22 11:19:43 voodoo slapd[23628]: slapd starting
Aug 22 11:19:43 voodoo slapd[23628]: daemon: conn=0 fd=7 connection \
 from IP=127.0.0.1:49414 (IP=127.0.0.1:9009) accepted.

I used BerkeleyDB 3.1.17 for ldbm.

#Sorry for my poor English...

--manabu

Comment 1 Kurt Zeilenga 2000-08-22 11:22:33 UTC
changed notes
Comment 2 Kurt Zeilenga 2000-08-23 16:42:26 UTC
I suggest you try --without-threads pending investigation
by a BSD/OS developer...


Comment 3 Kurt Zeilenga 2000-08-27 09:47:54 UTC
changed state Open to Suspended
Comment 4 Kurt Zeilenga 2000-09-02 15:01:34 UTC
moved from Incoming to Software Bugs
Comment 5 Kurt Zeilenga 2000-09-05 11:39:54 UTC
changed notes
Comment 6 Kurt Zeilenga 2000-09-06 09:59:30 UTC
Please test OpenLDAP 2.0.1
Comment 7 manabu@iij.ad.jp 2000-09-07 03:11:55 UTC
Date: Wed, 6 Sep 2000 16:59:29 GMT
Subject: Re: test001-slapdadd stalled (ITS#679)
From: Kurt Zeilenga <openldap-its@OpenLDAP.org> sez:

:Please test OpenLDAP 2.0.1

I tried it, but the result is the same: it stalled.

-- 
Manabu Kondo / manabu@iij.ad.jp

Comment 8 Kurt Zeilenga 2000-09-07 17:59:45 UTC
You need to provide additional information to help
us identify the problem.  When slapd "stalls",
is it using CPU or is it idle?  You should
be able to use a debugger to determine where it
is "stalled".  (If using CPU, it's likely in a
busy loop.  If idle, it's likely deadlocked).

Does it always stall in the same place?

Do other tests stall?  You can run them individually
by typing:
  ./scripts/testscript

(where testscript is the name of the script).

Does the problem go away if you configure
--without-threads?

Was the system built with TLS support?  Does the
problem go away if you configure --without-tls?

Kurt
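
The CPU-versus-idle distinction above can be illustrated with a small,
self-contained Python sketch. It is not part of OpenLDAP: the names
cpu_used_while, busy_loop, blocked_wait and classify, and the 0.5
threshold, are hypothetical choices made for illustration. It measures
how much CPU time the process consumes while a worker thread either
spins or sits in a blocking wait.

```python
import threading
import time


def cpu_used_while(target, wall_seconds=0.3):
    """Run target(stop_event) in a thread; return process CPU seconds used."""
    stop = threading.Event()
    worker = threading.Thread(target=target, args=(stop,), daemon=True)
    cpu_before = time.process_time()
    worker.start()
    time.sleep(wall_seconds)
    cpu_used = time.process_time() - cpu_before
    stop.set()
    worker.join(timeout=1.0)
    return cpu_used


def busy_loop(stop):
    # Spins without ever blocking, so it accumulates CPU time.
    while not stop.is_set():
        pass


def blocked_wait(stop):
    # Waits in a blocking call, like a thread stuck on a lock: almost no CPU.
    stop.wait()


def classify(cpu_seconds, wall_seconds, threshold=0.5):
    # Assumed heuristic: a mostly-busy process is looping, a mostly-idle one is blocked.
    if cpu_seconds / wall_seconds >= threshold:
        return "busy loop"
    return "idle (blocked)"
```

On a real system the same question is usually answered by watching the
CPU usage that ps reports for slapd over a few seconds.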


Comment 9 manabu@iij.ad.jp 2000-09-08 05:04:41 UTC
Date: Thu, 07 Sep 2000 10:59:45 -0700
Subject: Re: test001-slapdadd stalled (ITS#679)
From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org> sez:

:You need to provide additional information to help
:us identify the problem.  When slapd "stalls",
:is it using CPU or is it idle?  You should
:be able to use a debugger to determine where it
:is "stalled".  (If using CPU, it's likely in a
:busy loop.  If idle, it's likely deadlocked).

Well, according to 'ps' command, slapd is idle.

:Does it always stall in the same place?

Yes.

:Do other tests stall?  (you can run them individually
:by typing:
:  ./scripts/testscript
:
:(where testscript is the name of the script).

I tried test00[1-4] and test007, and every one of these
test scripts stalls (slapd is idle).

:Does the problem go away if you configure
:--without-threads?

Yes. Everything's gonna be all right.

:Was the system built with TLS support?  Does the
:problem go away if you configure --without-tls?

It was built without TLS.

I also have a master.log for test001-slapadd, run with the full debug option.
I'll email it to you directly (that is, I won't send it to openldap-bugs).

-- 
Manabu Kondo / manabu@iij.ad.jp


Comment 10 Kurt Zeilenga 2000-09-09 16:41:45 UTC
Have you checked to ensure that all relevant service
patches are installed?  In particular, these look to be of
interest.

Mod    : M410-008
Mod    : M410-014
Mod    : M410-021

Kurt

Comment 11 manabu@iij.ad.jp 2000-09-10 01:01:20 UTC
Sat, 09 Sep 2000 09:41:45 -0700,
"Kurt D. Zeilenga" <Kurt@OpenLDAP.org> sez:

:Have you checked to ensure that all relevant service
:patches are installed?  In particular, these look to be of
:interest.
:
:Mod    : M410-008
:Mod    : M410-014
:Mod    : M410-021

All patches (from M410-001 to M410-029) are already applied.

I also tried --with-threads=pth, but got 'Segmentation fault - core dumped'
at test001-slapadd. :-<
Configure options are
    --with-threads=pth
    --enable-crypt=yes
    --with-ldbm-api=berkeley

Here's the 'bt' output from gdb.

---from
<snip>
Core was generated by `slapd'.
Program terminated with signal 11, Segmentation fault.
#0  0x48163d9d in __pth_ring_append ()
(gdb) bt
#0  0x48163d9d in __pth_ring_append ()
#1  0x481679a2 in pth_mutex_acquire ()
#2  0x807b27f in ldap_pvt_thread_mutex_lock (mutex=0x8155f70) at thr_pth.c:130
#3  0x808466c in tls_locking_cb (mode=9, type=2, file=0x81432ac "err.c",
    line=208) at tls.c:73
#4  0x80e79b0 in CRYPTO_lock ()
#5  0x8080cd8 in ldap_int_initialize (gopts=0x814c860, dbglvl=0x814bc24)
    at init.c:447
#6  0x80811bf in ldap_set_option (ld=0x0, option=20481, invalue=0x814bc24)
    at options.c:322
#7  0x804ab5f in main (argc=7, argv=0x804795c) at main.c:278
#8  0x804a8c7 in __start ()
(gdb)
---end

-- 
Manabu Kondo / manabu@iij.ad.jp
Comment 12 Kurt Zeilenga 2000-09-10 01:11:50 UTC
At 01:01 AM 9/10/00 +0000, manabu@iij.ad.jp wrote:
>All patches (from M410-001 to M410-029) are already applied.

Thanks.  I'll try to walk through the code again and see if I
can find anything in slapd's code that might be causing
the deadlock (apparently on c_mutex).  If you have a thread
aware debugger, you can help by attaching to the deadlock'ed
process and obtaining a stack trace for each thread.

>I also tried --with-threads=pth, but got 'Segmentation fault - core dumped'
>at test001-slapadd. :-<

That's a separate issue which should be filed separately.

Comment 13 manabu@iij.ad.jp 2000-09-10 01:20:45 UTC
Sat, 09 Sep 2000 18:11:50 -0700,
"Kurt D. Zeilenga" <Kurt@OpenLDAP.org> sez:

:>All patches (from M410-001 to M410-029) are already applied.
:Thanks.  I'll try to walk through the code again and see if I
:can find anything in slapd's code that might be causing
:the deadlock (apparently on c_mutex).  If you have a thread
:aware debugger, you can help by attaching to the deadlock'ed
:process and obtaining a stack trace for each thread.

OK, I'll try.

:>I also tried --with-threads=pth, but got 'Segmentation fault - core dumped'
:>at test001-slapadd. :-<
:That's a separate issue which should be filed separately.

Yes, I think so.
Is it better to make a new ITS report for this core dump with
--with-threads=pth?

-- 
Manabu Kondo / manabu@iij.ad.jp
Comment 14 Kurt Zeilenga 2000-09-10 01:26:59 UTC
At 10:20 AM 9/10/00 +0900, Manabu Kondo wrote:
>:>I also tried --with-threads=pth, but got 'Segmentation fault - core dumped'
>:>at test001-slapadd. :-<
>:That's a separate issue which should be filed separately.
>
>Yes, I think so.
>Is it better to make a new ITS report for this core dump with
>--with-threads=pth?

You should file a new, separate ITS with appropriate details.

Comment 15 Kurt Zeilenga 2000-09-11 15:55:55 UTC
changed notes
Comment 16 manabu@iij.ad.jp 2000-09-12 16:47:05 UTC
Hi, Kurt.

Date: Sat, 09 Sep 2000 18:11:50 -0700
Subject: Re: test001-slapdadd stalled (ITS#679)
From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org> sez:

:the deadlock (apparently on c_mutex).  If you have a thread
:aware debugger, you can help by attaching to the deadlock'ed
:process and obtaining a stack trace for each thread.

Well, here are the 'bt' results from gdb.

---from  
(gdb) attach 2485
Attaching to program `/var/tmp/openldap-2.0.1/servers/slapd/slapd', process 2485    
0x48187575 in _syscall_sys_select ()
(gdb) bt
#0  0x48187575 in _syscall_sys_select ()
#1  0x48218028 in _thread_aio_poll ()
#2  0x48213b59 in _thread_kern_switch ()
#3  0x48213fc5 in _thread_kern_block ()
#4  0x482144e6 in pthread_mutex_lock ()
#5  0x807b13b in ldap_pvt_thread_mutex_lock (mutex=0x81ce548)
    at thr_posix.c:207
#6  0x805a121 in do_bind (conn=0x81ce540, op=0x81cd580) at bind.c:56
#7  0x804e309 in connection_operation (arg_v=0x81cca60) at connection.c:767
#8  0x807af64 in ldap_int_thread_pool_wrapper (pool=0x81b3080) at tpool.c:377
#9  0x48213d7c in _thread_kern_start ()
(gdb) info threads
* 6 thread 0x8159600  0x48187575 in _syscall_sys_select ()
  5 thread 0x8159500  0x48213c20 in _thread_kern_switch ()
  4 thread 0x8159200  0x48213c20 in _thread_kern_switch ()
(gdb) thread 6
[Switching to thread 0x8159600]
#0  0x48187575 in _syscall_sys_select ()
(gdb) bt
#0  0x48187575 in _syscall_sys_select ()
#1  0x48218028 in _thread_aio_poll ()
#2  0x48213b59 in _thread_kern_switch ()
#3  0x48213fc5 in _thread_kern_block ()
#4  0x482144e6 in pthread_mutex_lock ()
#5  0x807b13b in ldap_pvt_thread_mutex_lock (mutex=0x81ce548)
    at thr_posix.c:207
#6  0x805a121 in do_bind (conn=0x81ce540, op=0x81cd580) at bind.c:56
#7  0x804e309 in connection_operation (arg_v=0x81cca60) at connection.c:767
#8  0x807af64 in ldap_int_thread_pool_wrapper (pool=0x81b3080) at tpool.c:377
#9  0x48213d7c in _thread_kern_start ()
(gdb) thread 5
[Switching to thread 0x8159500]
#0  0x48213c20 in _thread_kern_switch ()
(gdb) bt
#0  0x48213c20 in _thread_kern_switch ()
#1  0x48213fc5 in _thread_kern_block ()
#2  0x482181ea in _thread_aio_suspend ()
#3  0x48218de4 in _thread_sys_read ()
#4  0x808ca91 in sb_stream_read (sbiod=0x81cbb00, buf=0x81d9000, len=16384)
    at sockbuf.c:449
#5  0x808cd07 in sb_rdahead_read (sbiod=0x81cbb20, buf=0x81c4b10, len=1)
    at sockbuf.c:613
#6  0x808d065 in sb_debug_read (sbiod=0x81cbb40, buf=0x81c4b10, len=1)
    at sockbuf.c:779
#7  0x808c98e in ber_int_sb_read (sb=0x81cba20, buf=0x81c4b10, len=1)
    at sockbuf.c:366
#8  0x808b4b1 in ber_get_next (sb=0x81cba20, len=0x4828966c, ber=0x81c4b00)
    at io.c:509
#9  0x804e802 in connection_input (conn=0x81ce540) at connection.c:1024
#10 0x804e6ce in connection_read (s=7) at connection.c:983
#11 0x804cd77 in slapd_daemon_task (ptr=0x0) at daemon.c:1135
#12 0x48213d7c in _thread_kern_start ()
(gdb)  thread 4
[Switching to thread 0x8159200]
#0  0x48213c20 in _thread_kern_switch ()
(gdb) bt
#0  0x48213c20 in _thread_kern_switch ()
#1  0x48213fc5 in _thread_kern_block ()
#2  0x48213908 in pthread_join ()
#3  0x807b0a2 in ldap_pvt_thread_join (thread=0x8159500, thread_return=0x0)
    at thr_posix.c:123
#4  0x804cfb1 in slapd_daemon () at daemon.c:1206
#5  0x804add8 in main (argc=7, argv=0x80478a0) at main.c:425
#6  0x804a827 in __start ()
(gdb) 
---end

-- 
Manabu Kondo / manabu@iij.ad.jp

Comment 17 Kurt Zeilenga 2000-09-12 17:17:56 UTC
Thanks for the traces (which I redirected to the ITS so that
we can appropriately track this issue).

Thread 6 appears to be waiting on a connection's c_mutex.
Thread 5 appears to hold this c_mutex, but is blocked on a read.
Thread 4 is the main thread and is in a normal (join) wait.

So, the question is: why is the stream read blocked?

I don't have an answer to this yet... I will try to look into it
further when I get the chance.  Any stream state information
(server and client side) you could provide would be useful.

Kurt
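
The pattern described above (one thread holding a mutex while blocked in
a read, another thread waiting for that mutex) can be modelled with a
minimal Python sketch. This is an illustration only and not slapd's
code: reproduce_stall, reader and binder are hypothetical names, a
socketpair stands in for the client connection, and the lock timeout
exists only so the demonstration terminates. A plain
pthread_mutex_lock, as in the trace, would wait indefinitely.

```python
import socket
import threading


def reproduce_stall(lock_timeout=0.5):
    """Return True if the 'bind' thread cannot get the mutex in time."""
    server_side, client_side = socket.socketpair()
    c_mutex = threading.Lock()  # stands in for the connection's c_mutex
    reader_has_lock = threading.Event()
    result = {}

    def reader():
        # Like thread 5: take the mutex, then block in a read with no data pending.
        with c_mutex:
            reader_has_lock.set()
            server_side.recv(1)  # returns only when the peer sends or closes

    def binder():
        # Like thread 6: needs the same mutex before it can proceed.
        reader_has_lock.wait()
        acquired = c_mutex.acquire(timeout=lock_timeout)
        result["stalled"] = not acquired
        if acquired:
            c_mutex.release()

    t_reader = threading.Thread(target=reader, daemon=True)
    t_binder = threading.Thread(target=binder, daemon=True)
    t_reader.start()
    t_binder.start()
    t_binder.join()

    # Closing the client end makes recv() return, so the reader releases the lock.
    client_side.close()
    t_reader.join(timeout=1.0)
    server_side.close()
    return result["stalled"]
```

In this model the stall ends as soon as the peer sends data or closes
the connection, which is why the state of the stream on both the client
and server side matters for the diagnosis.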



Comment 18 Kurt Zeilenga 2000-09-15 09:55:18 UTC
Please test 2.0.3.  It contains a number of changes
that might resolve this issue.
Comment 19 Kurt Zeilenga 2000-09-15 09:56:17 UTC
changed state Suspended to Feedback
Comment 20 manabu@iij.ad.jp 2000-09-16 08:02:27 UTC
Hi, Kurt.

Fri, 15 Sep 2000 16:55:17 GMT,
Kurt Zeilenga <openldap-its@OpenLDAP.org> sez:

:Please test 2.0.3.  It contains a number of changes
:that might resolve this issue.

I tested 2.0.3, but the situation didn't change.
slapd still deadlocks while handling the bind.

If you want, I'll email you the gdb backtrace results.

-- 
Manabu Kondo / manabu@iij.ad.jp
Comment 21 Kurt Zeilenga 2000-09-20 05:33:17 UTC
At 08:02 AM 9/16/00 +0000, manabu@iij.ad.jp wrote:
>If you want, I'll email you the gdb backtrace results.

Don't send it to me, send it to <openldap-its@openldap.org>
(in a message containing the above subject line).

Kurt


Comment 22 Kurt Zeilenga 2001-05-02 23:27:04 UTC
changed notes
Comment 23 Kurt Zeilenga 2001-07-28 23:44:12 UTC
changed notes
moved from Software Bugs to Build
Comment 24 Kurt Zeilenga 2001-10-01 09:39:07 UTC
changed notes
Comment 25 Kurt Zeilenga 2001-12-05 01:29:11 UTC
changed notes
changed state Feedback to Closed
Comment 26 OpenLDAP project 2014-08-01 21:05:08 UTC
need additional info
BSD/OS build
Additional sleep needed?
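
If the closing note refers to the delay between starting slapd and
running ldapsearch in the test script (an assumption; the note does not
say), one alternative to a longer fixed sleep is to poll the port until
it accepts a connection. The sketch below is a hypothetical helper, not
part of the OpenLDAP test suite; wait_for_port and its parameters are
invented for illustration.

```python
import socket
import time


def wait_for_port(host, port, timeout=5.0, interval=0.1, connect_timeout=1.0):
    """Poll until a TCP connect to (host, port) succeeds; True on success."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server is listening.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            # Not listening yet (or refused): wait briefly and retry.
            time.sleep(interval)
    return False
```

For the run reported here, the call would be
wait_for_port("127.0.0.1", 9009). Note that slapd.log in this report
shows the connection being accepted, so a readiness check of this kind
would not by itself explain the deadlock seen in the traces.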