
(ITS#4708) refreshAndPersist retry parameter isn't meaningful



Full_Name: Aaron Richton
Version: HEAD/RE23
OS: CentOS 4.4
URL: ftp://ftp.openldap.org/incoming/richton-20061011.patch
Submission from: (NULL) (128.6.31.135)


In the event of a loss of connection with the syncrepl server, slapd(8) in its
role as syncrepl client is expected to retry according to any setting in a
"retry" configuration clause. However, after a network failure, a
refreshAndPersist client in connected state will merely wait (forever) for the
lost connection to provide data. Currently neither the application layer nor
the network layer has any awareness of the connection failure apart from "it's
been quiet for a long time," which doesn't make a good algorithm. From a *ix
network stack standpoint, connections remain ESTABLISHED even in the face of
network failure, and slapd(8) doesn't have a clue that it should be retrying.

The linked patch turns on SO_KEEPALIVE if available, making the network layer
aware of the connection failure. When combined with appropriate IP stack
tuning (out of the scope of OpenLDAP), very quick retry times can be
achieved. Without this patch, I have found that no retry ever happens.
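
As a rough illustration of the idea (not the code from the linked patch), a
minimal sketch of turning on SO_KEEPALIVE for a connected socket might look
like the following. The helper names enable_keepalive() and
keepalive_enabled() are hypothetical and do not exist in OpenLDAP; the sketch
assumes a POSIX sockets API.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel to send keepalive probes on fd.
 * Returns 0 on success (or if SO_KEEPALIVE is unavailable), -1 on error. */
int enable_keepalive(int fd)
{
#ifdef SO_KEEPALIVE
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
#else
    (void)fd;   /* option not available on this platform; nothing to do */
    return 0;
#endif
}

/* Hypothetical helper: report whether keepalive is enabled on fd.
 * Returns 1 if enabled, 0 if not, -1 on error or if unsupported. */
int keepalive_enabled(int fd)
{
#ifdef SO_KEEPALIVE
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) != 0)
        return -1;
    return val != 0;
#else
    (void)fd;
    return -1;
#endif
}
```

With the option set, the kernel probes an idle connection and reports an error
on the socket once the probes go unanswered. How long that takes depends on
the system's keepalive settings, which is the IP stack tuning referred to
above.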

To reproduce, set up OpenLDAP with refreshAndPersist, and do something brutal
to the consumer network -- firewall off communication with your master server,
pull the network cable, etc. Wait for the connection to die off on the master
server (slapd/daemon.c already sets SO_KEEPALIVE on the server side), then
restore proper network state to the consumer. netstat on the syncrepl client
will show ESTABLISHED; it's ignorant of your network destruction and will never
retry because it still believes everything is happy.