[Date Prev][Date Next] [Chronological] [Thread] [Top]

slapd stops processing thread jobs (ITS#2774)



Full_Name: John Matthews
Version: 2.1.22
OS: Linux 2.4.20
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (203.12.97.112)


I have been investigating a slapd problem where after a few days it will stop
responding to new client requests. Rather oddly the problem always occurs at
about 1am on one of our LDAP servers. Using a debug level that includes
LDAP_DEBUG_CONNS and with some extra debug output in connection.c (see below) I
have tracked the problem down into the threading code tpool.c. 

static int connection_op_activate( Operation *op )
{

...
status = ldap_pvt_thread_pool_submit( &connection_pool,
		connection_operation, (void *) op );

Debug( LDAP_DEBUG_CONNS, "connection_activate %d %s %d\n", op->o_conn->c_connid,
connection_state2str(op->o_conn->c_conn_state), status );

static void *
connection_operation( void *ctx, void *arg_v )
{
...
	Debug( LDAP_DEBUG_CONNS, "connection(%d) op start %d %d\n", conn->c_connid,
conn->c_n_ops_executing, conn->c_n_ops_completed  );

	ldap_pvt_thread_mutex_lock( &num_ops_mutex );

static int
connection_input(
    Connection *conn
)
{

...

	/* Don't process requests when the conn is in the middle of a
	* Bind, or if it's closing. Also, don't let any single conn
	* use up all the available threads, and don't execute if we're
	* currently blocked on output. And don't execute if there are
	* already pending ops, let them go first.
	*/
	if ( conn->c_conn_state == SLAP_C_BINDING
		|| conn->c_conn_state == SLAP_C_CLOSING
		|| conn->c_n_ops_executing >= connection_pool_max/2
		|| conn->c_n_ops_pending
		|| conn->c_writewaiter)
	{
		int max = conn->c_dn.bv_len ? slap_conn_max_pending_auth
			: slap_conn_max_pending;
#ifdef NEW_LOGGING
		LDAP_LOG( CONNECTION, INFO, 
			"connection_input: conn %lu  deferring operation\n",
			conn->c_connid, 0, 0 );
#else
		Debug( LDAP_DEBUG_CONNS, "deferring operation %d %d %d\n", conn->c_conn_state
== SLAP_C_BINDING, conn->c_conn_state == SLAP_C_CLOSING, conn->c_writewaiter );
#endif

I get the following debug output when the problem ocurrs:

Oct 17 00:07:47 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:07:47 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:07:47 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:07:47 bizserve slapd[14564]:  14r
Oct 17 00:07:47 bizserve slapd[14564]:
Oct 17 00:07:47 bizserve slapd[14564]: daemon: read activity on 14
Oct 17 00:07:47 bizserve slapd[14564]: connection_get(14): got connid=1641
Oct 17 00:07:47 bizserve slapd[14564]: connection_read(14): checking for input
on id=1641
Oct 17 00:07:47 bizserve slapd[14564]: connection_activate 1641  0
Oct 17 00:07:47 bizserve slapd[14564]: ber_get_next on fd 14 failed errno=11
(Resource temporarily unavailable)
Oct 17 00:07:47 bizserve slapd[14564]: connection(1641) op start 1 8
Oct 17 00:07:47 bizserve slapd[14564]: connection(1641): do_search
Oct 17 00:07:47 bizserve slapd[14564]: <= bdb_equality_candidates: (objectClass)
index_param failed (18)
Oct 17 00:07:47 bizserve slapd[14564]: <= bdb_equality_candidates: (objectClass)
index_param failed (18)
Oct 17 00:07:47 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:07:47 bizserve slapd[14564]: connection(1641) op finish 0 9
Oct 17 00:09:28 bizserve slapd[14564]: daemon: activity on 2 descriptors
Oct 17 00:09:28 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:09:28 bizserve slapd[14564]:  10r
Oct 17 00:09:28 bizserve slapd[14564]:  14r
Oct 17 00:09:28 bizserve slapd[14564]:
Oct 17 00:09:28 bizserve slapd[14564]: daemon: read activity on 10
Oct 17 00:09:28 bizserve slapd[14564]: connection_get(10): got connid=1640
Oct 17 00:09:28 bizserve slapd[14564]: connection_read(10): checking for input
on id=1640
Oct 17 00:09:29 bizserve slapd[14564]: ber_get_next on fd 10 failed errno=0
(Success)
Oct 17 00:09:29 bizserve slapd[14564]: connection_read(10): input error=-2
id=1640, closing.
Oct 17 00:09:29 bizserve slapd[14564]: connection_closing: readying conn=1640
sd=10 for close
Oct 17 00:09:29 bizserve slapd[14564]: connection_closing 1640
Oct 17 00:09:29 bizserve slapd[14564]: connection_close: conn=1640 sd=10
Oct 17 00:09:29 bizserve slapd[14564]: daemon: removing 10
Oct 17 00:09:29 bizserve slapd[14564]: connection_destroy -1
Oct 17 00:09:29 bizserve slapd[14564]: daemon: read activity on 14
Oct 17 00:09:29 bizserve slapd[14564]: connection_get(14): got connid=1641
Oct 17 00:09:29 bizserve slapd[14564]: connection_read(14): checking for input
on id=1641
Oct 17 00:09:29 bizserve slapd[14564]: ber_get_next on fd 14 failed errno=0
(Success)
Oct 17 00:09:29 bizserve slapd[14564]: connection_read(14): input error=-2
id=1641, closing.
Oct 17 00:09:29 bizserve slapd[14564]: connection_closing: readying conn=1641
sd=14 for close
Oct 17 00:09:29 bizserve slapd[14564]: connection_closing 1641
Oct 17 00:09:29 bizserve slapd[14564]: connection_close: conn=1641 sd=14
Oct 17 00:09:29 bizserve slapd[14564]: daemon: removing 14
Oct 17 00:09:29 bizserve slapd[14564]: connection_destroy -1
Oct 17 00:09:29 bizserve slapd[14564]: daemon: select: listen=6 active_threads=0
tvp=NULL
Oct 17 00:09:29 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:09:29 bizserve slapd[14564]: daemon: select: listen=6 active_threads=0
tvp=NULL
Oct 17 00:11:08 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:11:08 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:11:08 bizserve slapd[14564]:  15r
Oct 17 00:11:08 bizserve slapd[14564]: daemon: read activity on 15
Oct 17 00:11:08 bizserve slapd[14564]: connection_get(15): got connid=1642
Oct 17 00:11:08 bizserve slapd[14564]: connection_read(15): checking for input
on id=1642
Oct 17 00:11:08 bizserve slapd[14564]: ber_get_next on fd 15 failed errno=0
(Success)
Oct 17 00:11:08 bizserve slapd[14564]: connection_read(15): input error=-2
id=1642, closing.
Oct 17 00:11:08 bizserve slapd[14564]: connection_closing: readying conn=1642
sd=15 for close
Oct 17 00:11:08 bizserve slapd[14564]: connection_closing 1642
Oct 17 00:11:08 bizserve slapd[14564]: connection_close: conn=1642 sd=15
Oct 17 00:11:08 bizserve slapd[14564]: daemon: removing 15
Oct 17 00:11:08 bizserve slapd[14564]: connection_destroy -1
Oct 17 00:11:08 bizserve slapd[14564]: daemon: select: listen=6 active_threads=0
tvp=NULL
Oct 17 00:11:08 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:11:08 bizserve slapd[14564]: daemon: select: listen=6 active_threads=0
tvp=NULL
Oct 17 00:25:20 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:25:20 bizserve slapd[14564]: daemon: new connection on 10
Oct 17 00:25:20 bizserve slapd[14564]: connection_init 1643
Oct 17 00:25:20 bizserve slapd[14564]: daemon: added 10r
Oct 17 00:25:20 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:25:20 bizserve slapd[14564]:
Oct 17 00:25:20 bizserve slapd[14564]: daemon: select: listen=6 active_threads=0
tvp=NULL
Oct 17 00:25:20 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:25:20 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:25:20 bizserve slapd[14564]:  10r
Oct 17 00:25:20 bizserve slapd[14564]:
Oct 17 00:25:20 bizserve slapd[14564]: daemon: read activity on 10
Oct 17 00:25:20 bizserve slapd[14564]: connection_get(10): got connid=1643
Oct 17 00:25:20 bizserve slapd[14564]: connection_read(10): checking for input
on id=1643
Oct 17 00:25:20 bizserve slapd[14564]: connection_activate 1643 B 0
Oct 17 00:25:20 bizserve slapd[14564]: ber_get_next on fd 10 failed errno=11
(Resource temporarily unavailable)
Oct 17 00:25:20 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:26:20 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:26:20 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:26:20 bizserve slapd[14564]:  10r
Oct 17 00:26:20 bizserve slapd[14564]:
Oct 17 00:26:20 bizserve slapd[14564]: daemon: read activity on 10
Oct 17 00:26:20 bizserve slapd[14564]: connection_get(10): got connid=1643
Oct 17 00:26:20 bizserve slapd[14564]: connection_read(10): checking for input
on id=1643
Oct 17 00:26:20 bizserve slapd[14564]: deferring operation 1 0 0
Oct 17 00:26:20 bizserve slapd[14564]: Pending 1 1
Oct 17 00:26:20 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:26:20 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:26:20 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:26:20 bizserve slapd[14564]:  10r
Oct 17 00:26:20 bizserve slapd[14564]:
Oct 17 00:26:20 bizserve slapd[14564]: daemon: read activity on 10
Oct 17 00:26:20 bizserve slapd[14564]: connection_get(10): got connid=1643
Oct 17 00:26:20 bizserve slapd[14564]: connection_read(10): checking for input
on id=1643
Oct 17 00:26:20 bizserve slapd[14564]: deferring operation 1 0 0
Oct 17 00:26:20 bizserve slapd[14564]: Pending 2 1
Oct 17 00:26:20 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:26:20 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:26:20 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:26:20 bizserve slapd[14564]:  10r
Oct 17 00:26:20 bizserve slapd[14564]:
Oct 17 00:26:20 bizserve slapd[14564]: daemon: read activity on 10
Oct 17 00:26:20 bizserve slapd[14564]: connection_get(10): got connid=1643
Oct 17 00:26:20 bizserve slapd[14564]: connection_read(10): checking for input
on id=1643
Oct 17 00:26:20 bizserve slapd[14564]: ber_get_next on fd 10 failed errno=0
(Success)
Oct 17 00:26:20 bizserve slapd[14564]: connection_read(10): input error=-2
id=1643, closing.
Oct 17 00:26:20 bizserve slapd[14564]: connection_closing: readying conn=1643
sd=10 for close
Oct 17 00:26:20 bizserve slapd[14564]: connection_closing 1643
Oct 17 00:26:20 bizserve slapd[14564]: connection_close: deferring conn=1643
sd=10
Oct 17 00:26:20 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:26:20 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:26:20 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:27:24 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:27:24 bizserve slapd[14564]: daemon: new connection on 14
Oct 17 00:27:24 bizserve slapd[14564]: connection_init 1644
Oct 17 00:27:24 bizserve slapd[14564]: daemon: added 14r
Oct 17 00:27:24 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:27:24 bizserve slapd[14564]:
Oct 17 00:27:24 bizserve slapd[14564]: daemon: select: listen=6 active_threads=1
tvp=NULL
Oct 17 00:27:24 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:27:24 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:27:24 bizserve slapd[14564]:  14r
Oct 17 00:27:24 bizserve slapd[14564]: daemon: read activity on 14
Oct 17 00:27:24 bizserve slapd[14564]: connection_get(14): got connid=1644
Oct 17 00:27:24 bizserve slapd[14564]: connection_read(14): checking for input
on id=1644
Oct 17 00:27:24 bizserve slapd[14564]: connection_activate 1644 B 0
Oct 17 00:27:24 bizserve slapd[14564]: ber_get_next on fd 14 failed errno=11
(Resource temporarily unavailable)
Oct 17 00:27:24 bizserve slapd[14564]: daemon: select: listen=6 active_threads=2
tvp=NULL
Oct 17 00:28:24 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:28:24 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:28:24 bizserve slapd[14564]:  14r
Oct 17 00:28:24 bizserve slapd[14564]:
Oct 17 00:28:24 bizserve slapd[14564]: daemon: read activity on 14
Oct 17 00:28:24 bizserve slapd[14564]: connection_get(14): got connid=1644
Oct 17 00:28:24 bizserve slapd[14564]: connection_read(14): checking for input
on id=1644
Oct 17 00:28:24 bizserve slapd[14564]: deferring operation 1 0 0
Oct 17 00:28:24 bizserve slapd[14564]: Pending 1 1
Oct 17 00:28:24 bizserve slapd[14564]: daemon: select: listen=6 active_threads=2
tvp=NULL
Oct 17 00:28:24 bizserve slapd[14564]: daemon: activity on 1 descriptors
Oct 17 00:28:24 bizserve slapd[14564]: daemon: activity on:
Oct 17 00:28:24 bizserve slapd[14564]:  14r


**************

Notice how each new operation is deferred in connection_input() because
conn->c_conn_state is "stuck" in a SLAP_C_BINDING state. The number of
"active_threads" in the debug ouput increases for each new request,
connection_op_activate() never gets called again and of course the LDAP clients
never get any further response from the ldap server .. until slapd is
restarted.