
slapd connection_read: no connection; tcp time_wait state


This is an interesting one... I have an OpenLDAP 2.4.12 server as a consumer in a two-node cluster. Its sole function is to answer queries for our mail hub for recipient validation. We see about 50-300 queries per second, with occasional spikes.

Unfortunately, our mail hub appliances (vendor name left out to protect the guilty) are somewhat inefficient in LDAP connection handling and open a new TCP connection for every single LDAP query. They do this even when there are multiple recipients in one SMTP session (boggles the mind!). A percentage of these connections don't get closed properly, and I get the following error in the syslog:

slapd[23108]: connection_read(18): no connection!

The reason is that the connections are in a time_wait state because they were not closed properly. They go away after 60 seconds, but with the load this server gets, we continuously have several hundred TCP connections in a time_wait state and a system log full of the above errors.
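
To put a number on "several hundred", the socket table can be tallied by state. The following is a minimal sketch, assuming a Linux host (where /proc/net/tcp lists IPv4 sockets with a hex state code, 06 being TIME_WAIT) and slapd listening on the default port 389. The function name count_states is made up for illustration and is not part of OpenLDAP.

```python
from collections import Counter

# Linux TCP state codes as shown in the "st" column of /proc/net/tcp.
# Only a few are named here; unknown codes are reported as raw hex.
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT",
              "08": "CLOSE_WAIT", "0A": "LISTEN"}

def count_states(proc_net_tcp_text, local_port=389):
    """Count sockets per TCP state for one local port.

    proc_net_tcp_text is the full text of /proc/net/tcp (assumed format:
    one header row, then rows whose 2nd field is HEXIP:HEXPORT and whose
    4th field is the hex state code).
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated rows
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if port == local_port:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts
```

On the server this could be called as count_states(open("/proc/net/tcp").read()). IPv6 listeners would need /proc/net/tcp6 as well, which this sketch does not read.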

I'm attaching two packet captures:

time_wait.cap - filtered a single complete tcp session that ended with the port in a time_wait condition.

no_time_wait.cap - control capture for reference. This session closed properly.

I can't claim to have the greatest understanding of the 3-way / 4-way TCP open / close handshakes. But one thing I did notice, which seems to be consistent among the sessions that end in time_wait, is that the FIN-ACK is initiated by the server. Possibly I'm reading it wrong, but doesn't the client normally initiate the close, with the server doing a passive close? So, in theory, the server should never have to wait for the client.

Could someone more knowledgeable than me tell me why the server might initiate the active close?
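
For reference when reading the captures: in standard TCP, the endpoint that sends the first FIN (the active close) is the one that later sits in time_wait, whichever role it plays. Below is a minimal loopback sketch of the two orderings. It uses plain sockets rather than slapd or LDAP, and the function name active_closer is made up for illustration.

```python
import socket

def active_closer(server_closes_first):
    """Open one loopback TCP connection, close it from one side first,
    and return (role, address, eof_seen) for the side that sent the first FIN."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()

    if server_closes_first:
        role, first, second = "server", server, client
    else:
        role, first, second = "client", client, server
    addr = first.getsockname()            # endpoint expected to enter time_wait

    first.shutdown(socket.SHUT_WR)        # active close: sends the first FIN
    eof_seen = second.recv(1) == b""      # passive side reads end-of-stream
    second.close()                        # passive close: answers with its own FIN
    first.close()
    return role, addr, eof_seen
```

After calling active_closer(True), the returned address is the server-side endpoint, and that is the one a socket listing should show in time_wait for a short while. With active_closer(False) it is the client's ephemeral port instead.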



Attachment: time_wait.cap
Description: application/cap

Attachment: no_time_wait.cap
Description: application/cap