
Re: Socket-level timeouts?




On Apr 8, 2008, at 10:54 AM, Aaron Richton wrote:
I think you might be confusing LDAP_OPT_NETWORK_TIMEOUT and LDAP_OPT_TIMEOUT. (Or maybe I am...) But as I recall, NETWORK_TIMEOUT is for initial connect(), and you're referring to ongoing conversations.

This is correct - I'm proposing extending that to include a timeout for all network communication. Some of the API calls take a timeout but many do not, and this seems cleaner than requiring the client to pass a timeout to every call that could conceivably perform network operations.


For that matter, I'm having a hard time envisioning the situation you describe playing out. Let's say your server dies hard and you reboot it.

This is the only situation which works well currently. The only three failures we've had with slapd, however, were cases where the server simply became unresponsive and anything that touched PAM/NSS hung waiting for read() to return. We've also seen similar problems with mobile and multi-homed systems where a connection was attempted before the defined LDAP server was reachable.
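
A minimal sketch of this failure mode, in Python rather than libldap's C API. It assumes a local listener standing in for a hung slapd: the kernel completes the TCP handshake from the listen backlog, but nothing ever reads the request or replies. The helper name read_with_timeout and the 0.5 second value are hypothetical, chosen only for illustration. Without the timeout, the recv() call would block indefinitely.

```python
import socket

# Hypothetical stand-in for a hung server: the socket listens, so the kernel
# accepts the connection, but no application code ever calls accept() or replies.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
host, port = listener.getsockname()


def read_with_timeout(host, port, timeout_s):
    """Connect, send a request, and wait at most timeout_s for a reply.

    Returns the reply bytes, or None if no reply arrived in time.
    """
    # The timeout given here stays on the socket, so it bounds connect(),
    # sendall(), and recv() alike.
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        conn.sendall(b"request")
        try:
            return conn.recv(1024)
        except socket.timeout:
            return None


result = read_with_timeout(host, port, 0.5)
listener.close()
```

The connect succeeds even though the service is dead, which is why a connect-only timeout does not catch this case.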


Finally, libldap does use TCP keepalive nowadays. In the event of intermediate network path dying hard (which can't be relied upon to nicely produce TCP resets), the underlying keepalive mechanism should pick that up.

This is an improvement, but it wouldn't help with the slapd failures we've observed, because the server's TCP stack can respond to keepalives even when the service is unresponsive. It would definitely help recovery when the server is rebooted, but it uses the system-wide keepalive settings, and values appropriate for a local LDAP server would be far too aggressive for internet connections.


I understand the current situation, but as a user it would feel more correct for LDAP_OPT_NETWORK_TIMEOUT to mean "try the next server if a response is not obtained within this time". That would cover the additional class of failures where an LDAP server is partially up, since we cannot guarantee minute-level admin response times to restart a failing server.
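
The "try the next server" behaviour could look roughly like the following Python sketch. This is not how libldap implements LDAP_OPT_NETWORK_TIMEOUT; it only illustrates the proposed semantics with plain sockets. The helpers start_listener and query_first_responsive, the two local test servers, and the 0.5 second timeout are all hypothetical.

```python
import socket
import threading


def start_listener(reply):
    """Start a local listener. If reply is None it never answers (a hung server)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    srv.listen(1)
    if reply is not None:
        def serve():
            conn, _ = srv.accept()
            with conn:
                conn.recv(1024)
                conn.sendall(reply)
        threading.Thread(target=serve, daemon=True).start()
    return srv


def query_first_responsive(servers, timeout_s):
    """Return (address, reply) from the first server that answers within timeout_s."""
    for addr in servers:
        try:
            # One timeout covers connect, send, and receive for this server.
            with socket.create_connection(addr, timeout=timeout_s) as conn:
                conn.sendall(b"request")
                data = conn.recv(1024)
                if data:
                    return addr, data
        except OSError:
            # Covers refused connections and timeouts: move on to the next server.
            continue
    return None, None


hung = start_listener(None)      # accepts TCP connections but never replies
healthy = start_listener(b"ok")  # replies normally
servers = [hung.getsockname(), healthy.getsockname()]
addr, reply = query_first_responsive(servers, 0.5)
hung.close()
healthy.close()
```

Here the first server is partially up (its TCP stack answers), yet the client still moves on to the second one after the timeout expires.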

Chris
