4108 – LWL causing core dumps in HEAD

Issue 4108 - LWL causing core dumps in HEAD

Summary: LWL causing core dumps in HEAD

Status:	VERIFIED FIXED

Alias:	None

Product:	OpenLDAP
Classification:	Unclassified
Component:	slapd (show other issues)
Version:	unspecified
Hardware:	All All

Importance:	--- normal
Target Milestone:	---
Assignee:	OpenLDAP project

URL:
Keywords:

Depends on:
Blocks:

Reported:	2005-10-24 18:21 UTC by Quanah Gibson-Mount
Modified:	2014-08-01 21:06 UTC (History)
CC List:	0 users

See Also:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this issue.

Description Quanah Gibson-Mount 2005-10-24 18:21:57 UTC

Full_Name: Quanah Gibson-Mount
Version: HEAD
OS: Solaris 8
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (171.64.19.82)


The recent Light Weight Listener import is causing core dumps in HEAD, which
limits the usability testing I'd like to do.  Backtrace:

#0  0xfee1f2b4 in _lwp_kill () from /usr/lib/libc.so.1
(gdb) bt
#0  0xfee1f2b4 in _lwp_kill () from /usr/lib/libc.so.1
#1  0xfedcbcf4 in raise () from /usr/lib/libc.so.1
#2  0xfedb598c in abort () from /usr/lib/libc.so.1
#3  0xfedb5c30 in _assert () from /usr/lib/libc.so.1
#4  0x00022090 in slapd_clr_read (s=13, wake=0) at daemon.c:543
#5  0x00027b88 in connection_read_activate (s=-1639983344) at connection.c:1440
#6  0x00024aa0 in slapd_daemon_task (ptr=0xffc00) at daemon.c:1991
#7  0xfed64c44 in _lwp_start () from /usr/lib/lwp/libthread.so.1
#8  0xfed64c44 in _lwp_start () from /usr/lib/lwp/libthread.so.1


Howard reports encountering the same issue.

--Quanah

Comment 1 Howard Chu 2005-10-28 21:45:56 UTC

Using this slight modification of Jason's bindstress patch in ITS#3855 I 
can reproduce the crash quite rapidly. On my system (Linux x86_64) it 
tends to crash much sooner using epoll than using select, but in both 
cases it is the same assert failing.

--- bindstress.pl       2005-07-11 22:14:32.000000000 -0700
+++ binds2.pl   2005-10-28 14:40:39.000000000 -0700
@@ -2,22 +2,29 @@
 # bindstress.pl - free software by Matt Richard
 # edit the variables below to reflect your LDAP server

-my $server = "server";
+my $server = "ldap://:9000";
 my $uid_base = "cn=users,dc=example,dc=com";
 my $bind_pass = "abc123";
-my $filter = "(!(|(uid=system)(uid=root)))";
+my $filter = "(&(objectclass=person)(!(|(uid=system)(uid=root))))";
+my $rdn = "";

 # search for users in the database
-my @search = `ldapsearch -x -LLL -A -b "$uid_base" -h "$server" 
"$filter" dn`;
+my @search = `ldapsearch -x -LLL -A -b "$uid_base" -H "$server" 
"$filter" dn`;

 # convert search results into a list of users
 for $searchline (@search)
 {
-    @dn = split(/[=,\ ]/, $searchline);
+       $searchline =~ s/^dn: //;
+
+    @dn = split(/[=,]/, $searchline);

-    if ($dn[2] ne "")
+       if ($rdn eq "")
+       {
+               $rdn = $dn[0];
+       }
+    if ($dn[1] ne "")
     {
-        push (@users, $dn[2]);
+        push (@users, $dn[1]);
     }
 }

@@ -28,7 +35,7 @@
     my $uid = @users [ rand @users ];

     # define the ldap bind command
-    $cmd = "ldapwhoami -x -h $server -D \"uid=$uid,$uid_base\" -w 
\"$bind_pass\
"";
+    $cmd = "ldapwhoami -x -H $server -D \"$rdn=$uid,$uid_base\" -w 
\"$bind_pass
\"";

     # get the current time
     my $date = `date \"+%H:%M:%S\"`;


quanah@stanford.edu wrote:
> Full_Name: Quanah Gibson-Mount
> Version: HEAD
> OS: Solaris 8
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (171.64.19.82)
>
>
> The recent Light Weight Listener import is causing core dumps in HEAD, which
> limits the usability testing I'd like to do.  Backtrace:
>
> #0  0xfee1f2b4 in _lwp_kill () from /usr/lib/libc.so.1
> (gdb) bt
> #0  0xfee1f2b4 in _lwp_kill () from /usr/lib/libc.so.1
> #1  0xfedcbcf4 in raise () from /usr/lib/libc.so.1
> #2  0xfedb598c in abort () from /usr/lib/libc.so.1
> #3  0xfedb5c30 in _assert () from /usr/lib/libc.so.1
> #4  0x00022090 in slapd_clr_read (s=13, wake=0) at daemon.c:543
> #5  0x00027b88 in connection_read_activate (s=-1639983344) at connection.c:1440
> #6  0x00024aa0 in slapd_daemon_task (ptr=0xffc00) at daemon.c:1991
> #7  0xfed64c44 in _lwp_start () from /usr/lib/lwp/libthread.so.1
> #8  0xfed64c44 in _lwp_start () from /usr/lib/lwp/libthread.so.1
>
>
> Howard reports encountering the same issue.
>
> --Quanah
>
>
>
>
>   


-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/

Comment 2 Howard Chu 2005-10-28 22:09:59 UTC

Howard Chu wrote:
> Using this slight modification of Jason's bindstress patch in ITS#3855 
> I can reproduce the crash quite rapidly. On my system (Linux x86_64) 
> it tends to crash much sooner using epoll than using select, but in 
> both cases it is the same assert failing.
>
It appears that the problem existed before. I was able to reproduce the 
same crash against 2.3.11 after running the bindstress script for 
several minutes. I.e., it doesn't happen as quickly in 2.3.11, but it 
eventually hits the same assert.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/

Comment 3 Howard Chu 2005-10-28 22:34:27 UTC

moved from Incoming to Development

Comment 4 Quanah Gibson-Mount 2005-10-29 00:00:32 UTC

--On Friday, October 28, 2005 3:09 PM -0700 Howard Chu <hyc@symas.com> 
wrote:

> Howard Chu wrote:
>> Using this slight modification of Jason's bindstress patch in ITS#3855
>> I can reproduce the crash quite rapidly. On my system (Linux x86_64)
>> it tends to crash much sooner using epoll than using select, but in
>> both cases it is the same assert failing.
>>
> It appears that the problem existed before. I was able to reproduce the
> same crash against 2.3.11 after running the bindstress script for several
> minutes. I.e., it doesn't happen as quickly in 2.3.11, but it eventually
> hits the same assert.

You are filling me with cheer... :P

I think we need a 2.3.12 release soon anyway. ;)

--Quanah

--
Quanah Gibson-Mount
Principal Software Developer
ITSS/Shared Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html

"These censorship operations against schools and libraries are stronger
than ever in the present religio-political climate. They often focus on
fantasy and sf books, which foster that deadly enemy to bigotry and blind
faith, the imagination." -- Ursula K. Le Guin

Comment 5 Howard Chu 2005-10-29 02:21:54 UTC

quanah@stanford.edu wrote:
> --On Friday, October 28, 2005 3:09 PM -0700 Howard Chu <hyc@symas.com> 
> wrote:
>
>   
>> Howard Chu wrote:
>>     
>>> Using this slight modification of Jason's bindstress patch in ITS#3855
>>> I can reproduce the crash quite rapidly. On my system (Linux x86_64)
>>> it tends to crash much sooner using epoll than using select, but in
>>> both cases it is the same assert failing.
>>>
>>>       
>> It appears that the problem existed before. I was able to reproduce the
>> same crash against 2.3.11 after running the bindstress script for several
>> minutes. I.e., it doesn't happen as quickly in 2.3.11, but it eventually
>> hits the same assert.
>>     
>
> You are filling me with cheer... :P
>
> I think we need a 2.3.12 release soon anyway. ;)

So it turns out there were a few race conditions and cases of trying to 
close the same connection twice. (Which was bad news, because the 
descriptor could get reused immediately after the first close.)
I believe this is now fixed in HEAD. Please test. Heavily.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/

Comment 6 Howard Chu 2005-10-29 02:27:32 UTC

changed notes
changed state Open to Test

Comment 7 Howard Chu 2005-10-29 02:27:51 UTC

moved from Development to Software Bugs

Comment 8 Howard Chu 2005-10-29 04:19:17 UTC

hyc@symas.com wrote:
> So it turns out there were a few race conditions and cases of trying to 
> close the same connection twice. (Which was bad news, because the 
> descriptor could get reused immediately after the first close.)
> I believe this is now fixed in HEAD. Please test. Heavily.

Running slapd on a single processor machine, I no longer see any 
asserts. Running on my dual machine, if syslog is enabled, I 
occasionally see an assert in slapd_set_read() called from 
connection_read(). With syslog disabled, this problem doesn't occur. No 
bright ideas come to mind at the moment...

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/

Comment 9 Howard Chu 2005-10-29 13:17:53 UTC

hyc@symas.com wrote:
> Running slapd on a single processor machine, I no longer see any 
> asserts. Running on my dual machine, if syslog is enabled, I 
> occasionally see an assert in slapd_set_read() called from 
> connection_read(). With syslog disabled, this problem doesn't occur. No 
> bright ideas come to mind at the moment...
Some context:
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87548 op=1 UNBIND
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87548 fd=15 closed
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 fd=15 ACCEPT from 
IP=192.168.1.22:23726 (IP=0.0.0.0:9000)
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 op=0 BIND 
dn="cn=Sandi Couto,ou=peons,o=foo" method=128
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 op=0 BIND 
dn="cn=Sandi Couto,ou=Peons,o=foo" mech=SIMPLE ssf=0
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 op=0 RESULT 
tag=97 err=0 text=
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 op=1 UNBIND
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87549 fd=15 closed
Oct 29 04:56:14 mandolin slapd.head[10328]: conn=87550 fd=15 ACCEPT from 
IP=192.168.1.22:23727 (IP=0.0.0.0:9000)

daemon: listen=11, new connection on 15
daemon: added 15r
daemon: activity on 2 descriptors
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 1 descriptor
daemon: activity on: 15r
daemon: read active on 15
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 2 descriptors
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 1 descriptor
daemon: activity on: 15r
daemon: read active on 15
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 1 descriptor
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 active_threads=0 tvp=NULL
daemon: activity on 2 descriptors
daemon: activity on: 15r
daemon: read active on 15
daemon: select: listen=10 active_threads=0 tvp=NULL
daemon: select: listen=11 busy
daemon: removing 15
daemon: listen=11, new connection on 15
slapd.head: ../../../head/servers/slapd/daemon.c:567: slapd_set_read: 
Assertion `((slap_daemon.sd_index[(s)]) != -1)' failed.
daemon: activity on 1 descriptor

Program received signal SIGABRT, Aborted.
[Switching to Thread 1090517344 (LWP 10355)]
0x00002aaaab6e76cd in raise () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x00002aaaab6e76cd in raise () from /lib64/tls/libc.so.6
#1  0x00002aaaab6e8c7e in abort () from /lib64/tls/libc.so.6
#2  0x00002aaaab6e1420 in __assert_fail () from /lib64/tls/libc.so.6
#3  0x000000000042e7a3 in slapd_set_read (s=15, wake=1)
    at ../../../head/servers/slapd/daemon.c:567
#4  0x00000000004369bf in connection_read (s=15, op=0x40fff480)
    at ../../../head/servers/slapd/connection.c:1665
#5  0x0000000000435efe in connection_read_thread (ctx=0x40fff4c0, argv=0xf)
    at ../../../head/servers/slapd/connection.c:1419
#6  0x0000000000556184 in ldap_int_thread_pool_wrapper (xpool=0x788710)
    at ../../../head/libraries/libldap_r/tpool.c:615
#7  0x00002aaaab5ab649 in start_thread () from /lib64/tls/libpthread.so.0
#8  0x00002aaaab7716c3 in thread_start () from /lib64/tls/libc.so.6
(gdb) info thr
  10 Thread 1149237600 (LWP 10362)  0x00002aaaab5ad55a in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
  9 Thread 1140848992 (LWP 10361)  0x00002aaaab5ad55a in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
  8 Thread 1132460384 (LWP 10360)  0x00002aaaab5ad55a in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
  7 Thread 1124071776 (LWP 10359)  0x00002aaaab5ad55a in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
  6 Thread 1115683168 (LWP 10358)  0x00002aaaab5af8eb in 
__lll_mutex_lock_wait
    () from /lib64/tls/libpthread.so.0
  5 Thread 1107294560 (LWP 10357)  0x00002aaaab5af8eb in 
__lll_mutex_lock_wait
    () from /lib64/tls/libpthread.so.0
  4 Thread 1098905952 (LWP 10356)  0x00002aaaab5ad55a in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
* 3 Thread 1090517344 (LWP 10355)  0x00002aaaab6e76cd in raise ()
   from /lib64/tls/libc.so.6
  2 Thread 1082128736 (LWP 10337)  0x00002aaaab5af8eb in 
__lll_mutex_lock_wait
    () from /lib64/tls/libpthread.so.0
  1 Thread 46912511001792 (LWP 10328)  0x00002aaaab5abd2f in pthread_join ()
   from /lib64/tls/libpthread.so.0
(gdb) thr 6
[Switching to thread 6 (Thread 1115683168 (LWP 10358))]#5  
0x000000000043108a in slap_listener (sl=0x2aaaab71c520) at 
../../../head/servers/slapd/daemon.c:1519
1519            slapd_add( s, 1, NULL );
(gdb) bt
#0  0x00002aaaab5af8eb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000000000000000 in ?? ()
#2  0x0000000000000008 in ?? ()
#3  0x00002aaaab5ac877 in pthread_mutex_lock () from 
/lib64/tls/libpthread.so.0
#4  0x00002aaaab76ddc8 in syslog () from /lib64/tls/libc.so.6
#5  0x000000000043108a in slap_listener (sl=0x2aaaab71c520)
    at ../../../head/servers/slapd/daemon.c:1519
(gdb) thr 5
[Switching to thread 5 (Thread 1107294560 (LWP 10357))]#0  
0x00002aaaab5af8eb in __lll_mutex_lock_wait () from 
/lib64/tls/libpthread.so.0
(gdb) bt 20
#0  0x00002aaaab5af8eb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000000000766fe8 in slap_counters ()
#2  0x0000000000000000 in ?? ()
#3  0x00002aaaab5ac877 in pthread_mutex_lock () from 
/lib64/tls/libpthread.so.0
#4  0x000000000082e7c0 in ?? ()
#5  0x0000000041fff290 in ?? ()
#6  0x000000000058a3e9 in ber_memcalloc (n=46912527370128, s=1107293072)
    at ../../../head/libraries/liblber/memory.c:306
#7  0x00000000004593b9 in do_bind (op=0x2aaaadf02f20, rs=0x41fff3e0)
    at ../../../head/servers/slapd/bind.c:54
#8  0x0000000000435a98 in connection_operation (ctx=0x41fff4c0,
    arg_v=0x2aaaadf02f20) at ../../../head/servers/slapd/connection.c:1292
#9  0x0000000000556184 in ldap_int_thread_pool_wrapper (xpool=0x788710)
    at ../../../head/libraries/libldap_r/tpool.c:615
#10 0x00002aaaab5ab649 in start_thread () from /lib64/tls/libpthread.so.0
#11 0x00002aaaab7716c3 in thread_start () from /lib64/tls/libc.so.6
(gdb) thr 2
[Switching to thread 2 (Thread 1082128736 (LWP 10337))]#0  
0x00002aaaab5af8eb in __lll_mutex_lock_wait () from 
/lib64/tls/libpthread.so.0
(gdb) bt 20
#0  0x00002aaaab5af8eb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000000000000000 in ?? ()
#2  0x0000000000000000 in ?? ()
#3  0x00002aaaab5ac877 in pthread_mutex_lock () from 
/lib64/tls/libpthread.so.0
#4  0x0000000000000001 in ?? ()
#5  0x00000000005962ce in __PRETTY_FUNCTION__.11610 ()
#6  0x0000000000000000 in ?? ()
#7  0x0000000000435ed3 in connection_client_stop (s=0)
    at ../../../head/servers/slapd/connection.c:1404
#8  0x00000000004319a1 in slapd_daemon_task (ptr=0x0)
    at ../../../head/servers/slapd/daemon.c:1734
#9  0x00002aaaab5ab649 in start_thread () from /lib64/tls/libpthread.so.0
#10 0x00002aaaab7716c3 in thread_start () from /lib64/tls/libc.so.6
#11 0x0000000000000000 in ?? ()
#12 0x0000000000000000 in ?? ()
#13 0x0000000000000000 in ?? ()
#14 0x0000000000000000 in ?? ()
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0x0000000000000000 in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
(More stack frames follow...)
(gdb) frame 8
#8  0x00000000004319a1 in slapd_daemon_task (ptr=0x0)
    at ../../../head/servers/slapd/daemon.c:1734
1734                    ldap_pvt_thread_mutex_lock( &slap_daemon.sd_mutex );
(gdb) thr 3
[Switching to thread 3 (Thread 1090517344 (LWP 10355))]#0  
0x00002aaaab6e76cd in raise () from /lib64/tls/libc.so.6
(gdb) frame 4
#4  0x00000000004369bf in connection_read (s=15, op=0x40fff480)
    at ../../../head/servers/slapd/connection.c:1665
1665            slapd_set_read( s, 1 );
(gdb) p *c
$13 = {c_struct_state = 2, c_conn_state = 3, c_conn_idx = 3,
  c_close_reason = 0x596d1b "?", c_mutex = {__m_reserved = 2, __m_count = 0,
    __m_owner = 0x100002873, __m_kind = 0, __m_lock = {__status = 0,
      __spinlock = 0}}, c_sb = 0x2aaaadc02d90, c_starttime = 1130586974,
  c_activitytime = 1130586974, c_connid = 87550, c_peer_domain = {bv_len 
= 7,
    bv_val = 0x2aaaadf03460 "unknown"}, c_peer_name = {bv_len = 21,
    bv_val = 0x2aaaadf034e0 "IP=192.168.1.22:23727"}, c_listener = 0x76a260,
  c_sasl_bind_in_progress = 0, c_sasl_bind_mech = {bv_len = 0, bv_val = 
0x0},
  c_sasl_dn = {bv_len = 0, bv_val = 0x0}, c_sasl_authz_dn = {bv_len = 0,
    bv_val = 0x0}, c_authz_backend = 0x0, c_authz_cookie = 0x0, c_authz = {
    sai_method = 128, sai_mech = {bv_len = 0, bv_val = 0x0}, sai_dn = {
      bv_len = 0, bv_val = 0x0}, sai_ndn = {bv_len = 0, bv_val = 0x0},
    sai_ssf = 0, sai_transport_ssf = 0, sai_tls_ssf = 0, sai_sasl_ssf = 0},
  c_protocol = 0, c_ops = {stqh_first = 0x2aaaadf02f20,
    stqh_last = 0x2aaaadf03078}, c_pending_ops = {stqh_first = 0x0,
    stqh_last = 0x2aaaac8788a8}, c_write_mutex = {__m_reserved = 0,
    __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0,
      __spinlock = 0}}, c_write_cv = {__c_lock = {__status = 8589934592,
      __spinlock = 1}, __c_waiting = 0x1,
    __padding = "\001\000\000\000\000\000\000\000\220\207\207¬ª*\000",
    __align = 0}, c_currentber = 0x2aaaadc00930, c_writewaiter = 0,
  c_is_tls = 0, c_needs_tls_accept = 0, c_sasl_layers = 0, c_sasl_done = 0,
  c_sasl_authctx = 0x2aaaadf00b40, c_sasl_sockctx = 0x0,
  c_sasl_extra = 0x2aaaadf03630, c_sasl_bindop = 0x0, 
c_pagedresults_state = {
    ps_be = 0x0, ps_size = 0, ps_cookie = 0, ps_count = 0},
  c_n_ops_received = 1, c_n_ops_executing = 1, c_n_ops_pending = 0,
  c_n_ops_completed = 0, c_n_get = 1, c_n_read = 1, c_n_write = 0, c_pb 
= 0x0,
  c_extensions = 0x0, c_clientfunc = 0, c_clientarg = 0x0,
  c_send_ldap_result = 0x4490d9 <slap_send_ldap_result>,
  c_send_search_entry = 0x449e34 <slap_send_search_entry>,
  c_send_search_reference = 0x44c0ac <slap_send_search_reference>,
  c_send_ldap_extended = 0x449973 <slap_send_ldap_extended>,
  c_send_ldap_intermediate = 0x449c15 <slap_send_ldap_intermediate>}
(gdb)

What really doesn't make sense here is that the reader thread that 
crashed shouldn't even be running yet. I.e., the debug output shows that 
fd 15 was just removed from slapd control. Then a new connection was 
accepted, using that same fd. But the listener thread hasn't added 15 
back into slapd control yet - that's the slapd_add() call that happens 
right after the StatsLog invocation. So, fd 15 really is still invalid 
at this point. But since it is invalid, there should not be any active 
reader threads using that descriptor yet. Notice that thread 5 has 
already read a Bind operation from that fd, and the conn->c_conn_state 
is SLAP_C_BINDING. Again, it should not be possible for a reader thread 
to have already gotten this far, since the accepting thread hasn't 
enabled the descriptor yet.

The only possibility that comes to mind is that this reader thread is 
leftover from the previous invocation. I.e., it was submitted into the 
pool while the previous connection was active, and the connection was 
closed before the thread got to run. Since the listener thread has just 
called connection_init(), the conn structure is valid when the reader 
thread runs. But since the listener hasn't called slapd_add() yet, the 
socket is still invalid.

Moving the slapd_add() so that it occurs while the conn structure is 
locked appears to fix this problem.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/

Comment 10 Howard Chu 2005-10-29 13:32:23 UTC

hyc@symas.com wrote:
> So it turns out there were a few race conditions and cases of trying to 
> close the same connection twice. (Which was bad news, because the 
> descriptor could get reused immediately after the first close.)
> I believe this is now fixed in HEAD. Please test. Heavily.

One more footnote - testing this bug using tests/progs/slapd-bind 
freezes up my kernel when running slapd with debug output on the 
console. It seems to be safe when running from an ssh session. I also 
run out of socket resources on the server pretty quickly if the bind 
client is on the same machine as the server. Otherwise, running remote 
clients, it can run through hundreds of thousands of connections without 
any trouble. (Server was running Linux 2.6.13, now 2.6.14, same lockup 
occurs on both. The Magic SysRq doesn't work, the kernel is totally 
locked up, no way to get any diagnostics or recover at that point.)

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/

Comment 11 Kurt Zeilenga 2005-11-03 18:21:56 UTC

changed notes
changed state Test to Closed

Comment 12 Howard Chu 2009-02-17 05:15:58 UTC

moved from Software Bugs to Archive.Software Bugs

Comment 13 OpenLDAP project 2014-08-01 21:06:40 UTC

HEAD only
fixed