
Socket-level timeouts?

We've noticed hard failures on both our Linux and Mac workstations when an LDAP server fails in a way that causes it to stop responding while leaving the connection open (e.g. lock contention, disk failure). This usually ends up requiring a reboot, because a key system process will probably have made a call that is now blocked in a read() which might take days to fail.

I've created a patch which simply calls setsockopt() to set both SO_SNDTIMEO and SO_RCVTIMEO on the socket when LDAP_OPT_NETWORK_TIMEOUT has been set. This appears to produce the desired result on Linux (both with pam_ldap and the ldap utilities) and OS X (within the DirectoryService plugin).
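For anyone who doesn't want to open the attachment, the idea can be sketched roughly as follows. This is a minimal illustration, not the attached diff: the helper name set_socket_timeouts() is hypothetical, and it assumes the caller has already fetched the configured network timeout as a struct timeval, with NULL or a negative tv_sec meaning "no timeout".

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not the code in the attached patch): apply one
 * timeout to both directions of an already-created socket.
 *
 * Assumption: tv == NULL or tv->tv_sec < 0 means no network timeout was
 * configured, in which case the socket is left untouched.
 *
 * Returns 0 on success, -1 if setsockopt() fails (errno is set).
 */
int set_socket_timeouts(int fd, const struct timeval *tv)
{
    if (tv == NULL || tv->tv_sec < 0)
        return 0;

    /* Bound how long a blocking read()/recv() may wait for data. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, tv, sizeof(*tv)) == -1)
        return -1;

    /* Bound how long a blocking write()/send() may wait for buffer space. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, tv, sizeof(*tv)) == -1)
        return -1;

    return 0;
}
```

With these options set, a read() on a connection to a hung server returns -1 with errno set to EAGAIN or EWOULDBLOCK once the timeout expires, rather than blocking indefinitely, so the caller gets a chance to treat the server as down.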

Is there a drawback to this approach which I've missed? It appears that the issue has come up in the past but there's no solution that I can see (certainly nothing else uses socket-level timeouts). I'd like to find a solution for this as it's by far the biggest source of Linux downtime in our environment.


Attachment: openldap-socket-timeouts.diff
Description: Binary data

Attachment: smime.p7s
Description: S/MIME cryptographic signature