slapd connection_read: no connection; tcp time_wait state
This is an interesting one... I have an OpenLDAP 2.4.12 server as a
consumer in a two-node cluster. Its sole function is to answer queries
for our mail hub for recipient validation. We see about 50-300 queries
/ second and occasional spikes.
Unfortunately, our mail hub appliances (vendor name left out to protect
the guilty) are somewhat inefficient in LDAP connection handling and
open a new TCP connection for every single LDAP query. They do this
even when there are multiple recipients in one SMTP session (boggles the
mind!). A percentage of these connections don't get closed properly and
I get the following error in the syslog:
slapd: connection_read(18): no connection!
The reason is that the connections are in a time_wait state because they
were not closed properly. They go away in 60 seconds, but with the load
this server gets we continuously have several hundred tcp connections in
a time_wait state and a system log full of the above errors.
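For anyone who wants to put a number on this, here is a rough sketch that counts sockets in TIME_WAIT on the LDAP port. It assumes a Linux host, where /proc/net/tcp lists each socket with a hex local address:port in the second column and a hex state code in the fourth (06 is TIME_WAIT). The function name count_time_wait and the default port of 389 are my own choices for illustration, not anything from slapd.

```python
# State code for TIME_WAIT as it appears in Linux /proc/net/tcp (assumption: Linux host).
TCP_TIME_WAIT = "06"


def count_time_wait(lines, local_port=389):
    """Count sockets in TIME_WAIT whose local port equals local_port.

    `lines` is any iterable of text lines in /proc/net/tcp format.
    """
    count = 0
    for line in lines:
        fields = line.split()
        # Skip the header row and anything that does not look like a socket entry.
        if len(fields) < 4 or ":" not in fields[1]:
            continue
        port_hex = fields[1].rsplit(":", 1)[1]
        try:
            port = int(port_hex, 16)
        except ValueError:
            continue
        if port == local_port and fields[3] == TCP_TIME_WAIT:
            count += 1
    return count
```

On the server this could be run as count_time_wait(open("/proc/net/tcp")). It only covers IPv4 sockets; IPv6 sockets are listed separately in /proc/net/tcp6.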
I'm attaching two packet captures:
time_wait.cap - filtered a single complete tcp session that ended with
the port in a time_wait condition.
no_time_wait.cap - control capture for reference. This session closed
I can't claim to have the greatest understanding of 3-way / 4-way tcp
open / close handshakes. But, one thing that I did notice that seems to
be consistent among the sessions that end in time_wait is that the
fin-ack is initiated by the server. Possibly I'm reading it wrong, but
doesn't the client normally initiate the close, while the server does a
passive close? So, in theory the server should never have to wait for
Could someone more knowledgeable than me tell me why the server might
initiate the active close?