[Date Prev][Date Next] [Chronological] [Thread] [Top]

100.000 pr. second: "connection_read(64): no connection" from slapd



Hi all,

 

my slapd (@(#) $OpenLDAP: slapd 2.4.44 (Feb  9 2017 13:38:13) $) has developed a tendency to become unresponsive for 10-30 minutes at a time.

I run it with -dStats,Sync which gets me a fair amount of debugging info. This I pipe thru multilog to get the timestamps below.

 

The two snippets below are from two different dates, due to the difficulty of reproducing the error.

 

What I’m seeing at the time, is that I am getting thousands of these (from an earlier run):

 

2017-03-06 14:15:11.641329500 58bd60df connection_read(64): no connection!

2017-03-06 14:15:11.641331500 58bd60df connection_read(64): no connection!

2017-03-06 14:15:11.641393500 58bd60df connection_read(64): no connection!

2017-03-06 14:15:11.641396500 58bd60df connection_read(64): no connection!

2017-03-06 14:15:11.641398500 58bd60df connection_read(64): no connection!

2017-03-06 14:15:11.641401500 58bd60df connection_read(64): no connection!

2017-03-06 14:15:11.641403500 58bd60df connection_read(64): no connection!

 

Yes: about 2 or 3 microseconds apart! About 100.000 pr. Second.

 

(

I tried post-filtering the “no connection” message away before it hit the log (possible with multilog),

theorizing that the disc simply got swamped by these, but slapd was still unresponsive.

I can’t run the slapd without some debugging, as I need the logtrace, and the “no connection” seems to

be output on any debug flag:

 

From openldap-2.4.44/servers/slapd/connection.c:

 

                Debug( LDAP_DEBUG_ANY,

                        "connection_read(%ld): no connection!\n",

                        (long) s, 0, 0 );

)

 

The (64) varies, but there is always one number responsible for the 100.000/sec and a

few other numbers with less than 10/sec.

 

Sometimes it recovers on itself,  other times I kill slapd and get this (from a later run);

 

2017-03-10 14:39:08.632744500 58c2ac7c daemon: shutdown requested and initiated.

2017-03-10 14:39:08.633288500 58c2ac7c conn=45284802 fd=19 closed (slapd shutdown)

2017-03-10 14:39:08.633348500 58c2ac7c conn=45284896 fd=20 closed (slapd shutdown)

2017-03-10 14:39:08.633468500 58c2ac7c conn=45284823 fd=24 closed (slapd shutdown)

2017-03-10 14:39:08.633519500 58c2ac7c conn=45284825 fd=25 closed (slapd shutdown)

2017-03-10 14:39:08.633558500 58c2ac7c conn=45284829 fd=26 closed (slapd shutdown)

2017-03-10 14:39:08.633596500 58c2ac7c conn=45284838 fd=27 closed (slapd shutdown)

2017-03-10 14:39:08.633684500 58c2ac7c conn=45284845 fd=28 closed (slapd shutdown)

2017-03-10 14:39:08.634018500 58c2ac7c conn=45284890 fd=29 closed (slapd shutdown)

2017-03-10 14:39:08.634823500 58c2ac7c conn=45266816 fd=33 closed (slapd shutdown)

2017-03-10 14:39:08.686058500 58c2ac7c slapd shutdown: waiting for 3643059 operations/tasks to finish

2017-03-10 14:39:17.320835500 58c2ac85 daemon: accept(-1) failed errno=9 (Bad file descriptor)

2017-03-10 14:39:34.750553500 58c2ac96 daemon: accept(-1) failed errno=9 (Bad file descriptor)

2017-03-10 14:40:55.413900500 58c2ace7 slapd stopped.

 

Note that: 3643059 operations/tasks

 

Does anybody have a hint what I am facing here, or where I should look?

I have not (yet) discovered a reliable way to reproduce the errorm normally I just wait for it to happend.

 

Regards, Ole.