Re: configurable keepalive setting through libldap?

On Tuesday 05 May 2009 22:48:10, Howard Chu wrote:
> Ralf Haferkamp wrote:
> > On Friday 01 May 2009 11:50:15, masarati@aero.polimi.it wrote:
> >>> Hi,
> >>>
> >>> For quite some time libldap has enabled TCP keepalive, e.g. to detect
> >>> dangling syncrepl connections. However, the default timeout of two hours
> >>> that most systems use might be a bit too long for some applications
> >>> (e.g. I recently had a problem where nscd didn't answer queries anymore
> >>> because nss_ldap was blocking in SSL_read() while the underlying
> >>> connection had been cut off). On the other hand, messing with the
> >>> system-wide settings might not be a good idea either. On Linux it is
> >>> possible to configure the keepalive settings on a per-socket basis
> >>> through the TCP_KEEP* socket options.
> >>>
> >>> Would it be worth adding ldap_set_option() support for those, even if
> >>> they are not really portable?
> >>
> >> I think it would; for archs that do not support it, it could do nothing
> >> (and log accordingly, just in case).
> >
> > Ok, I'll introduce the following new options for keepalive support then:
> > LDAP_OPT_X_KEEPALIVE_IDLE	0x6300
> > LDAP_OPT_X_KEEPALIVE_PROBES	0x6301
> > LDAP_OPT_X_KEEPALIVE_INTERVAL	0x6302
> >
> > We might also think about adding support to set those values for syncrepl
> > and back-ldap/back-meta.
>
> I'd prefer a portable solution vs something so extremely
> platform-dependent. As already discussed many times before, we just need a
> client to send a periodic LDAP no-op message to get the same effect.
> (Abandon 0 will work fine.)
Something like what was proposed in ITS#5133? It seems that was rejected with
a reference to SO_KEEPALIVE already being enabled, though. Should we revisit
that?
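For reference, the periodic no-op ping can be sketched at the protocol level roughly as follows. This is a minimal, hypothetical example, not code from libldap or from ITS#5133: build_abandon0() and wait_or_ping() are illustrative names, the sketch assumes a plain (non-TLS) LDAP connection on an already connected socket, and it only handles the ping's own message IDs 1..127. Inside libldap the request would have to go through the library's own send path (and through TLS when that is active), not a raw send().

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Encode an LDAPv3 AbandonRequest (RFC 4511, section 4.11) that abandons
 * message ID 0. Clients never use ID 0 for requests (RFC 4511 reserves it
 * for unsolicited notifications), so there is nothing to abandon and, as
 * with any Abandon, the server sends no response.
 *
 *   30 06          LDAPMessage SEQUENCE, 6 content bytes
 *      02 01 id    messageID INTEGER: this request's own ID
 *      50 01 00    [APPLICATION 16] AbandonRequest: abandon ID 0
 *
 * Simplification: only own IDs 1..127 (one-byte INTEGER) are supported.
 * Returns the PDU length (8), or -1 if msgid is out of range.
 */
static int build_abandon0(int msgid, unsigned char buf[8])
{
    if (msgid < 1 || msgid > 127)
        return -1;
    buf[0] = 0x30; buf[1] = 0x06;                 /* SEQUENCE, length 6 */
    buf[2] = 0x02; buf[3] = 0x01;                 /* INTEGER, length 1 */
    buf[4] = (unsigned char)msgid;                /* our own message ID */
    buf[5] = 0x50; buf[6] = 0x01; buf[7] = 0x00;  /* abandon message 0 */
    return 8;
}

/*
 * Hypothetical idle check for a plain LDAP socket: wait up to idle_ms for
 * the server to send something; if it stays silent, send the Abandon-0 ping.
 * Returns 1 if fd is readable (data, EOF or error for the caller to read),
 * 0 if a ping was sent, -1 on failure.
 * Real code must also guard against SIGPIPE (e.g. MSG_NOSIGNAL on Linux).
 */
static int wait_or_ping(int fd, int idle_ms, int msgid)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    unsigned char pdu[8];
    int n, len;

    do {
        n = poll(&pfd, 1, idle_ms);
    } while (n < 0 && errno == EINTR);
    if (n < 0)
        return -1;
    if (n > 0)
        return 1;

    len = build_abandon0(msgid, pdu);
    if (len < 0)
        return -1;
    return send(fd, pdu, (size_t)len, 0) == (ssize_t)len ? 0 : -1;
}
```

Since the server never answers an Abandon, the ping adds no reply traffic; its only job is to put unacknowledged data on the wire, so that on a dead connection the kernel's retransmission timeout eventually turns the silence into an error the client can see, without relying on platform-specific socket options.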

My problem was not so much with syncrepl, though; it was nss_ldap that was
causing me trouble.

> While it's not as general purpose as setting a
> keepalive in the socket layer, I think we only need to worry about the
> syncrepl client. back-ldap/meta already have their own retry mechanisms,
> they can take care of themselves.
There seems to be a problem with many retry mechanisms when it comes to the
scenario I described in my original post. On a TLS-protected connection,
SSL_read() (called from ldap_result()) might trigger multiple read() calls. As
there are no select()/poll() calls in between them, one of those read()s might
block indefinitely (or until TCP keepalive kicks in) if the server stops
answering without closing the connection properly (power failure, ...).
I haven't had a good idea yet of how to fix this case easily, apart from
leveraging TCP keepalives.
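What the per-socket tuning amounts to on Linux can be sketched as below; it is presumably what the proposed LDAP_OPT_X_KEEPALIVE_IDLE, LDAP_OPT_X_KEEPALIVE_PROBES and LDAP_OPT_X_KEEPALIVE_INTERVAL options would end up setting on the connection's socket. set_keepalive() is a hypothetical helper, not libldap code. Where TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are not defined, it falls back to plain SO_KEEPALIVE with the system-wide timers and logs that, along the lines suggested earlier in the thread.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: enable TCP keepalive on fd and, where the platform
 * offers per-socket knobs, override the system-wide timers:
 *   idle      seconds of silence before the first probe    (TCP_KEEPIDLE)
 *   interval  seconds between unanswered probes            (TCP_KEEPINTVL)
 *   probes    unanswered probes before the connection drops (TCP_KEEPCNT)
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
static int set_keepalive(int fd, int idle, int interval, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof interval) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) < 0)
        return -1;
#else
    /* No per-socket tuning here: keep SO_KEEPALIVE, but say so. */
    (void)idle; (void)interval; (void)probes;
    fprintf(stderr, "per-socket keepalive tuning not supported; "
                    "using system defaults\n");
#endif
    return 0;
}
```

With idle=60, interval=10 and probes=3, a peer that silently disappears would be noticed after roughly 60 + 3 * 10 = 90 seconds of silence, instead of after the two-hour-plus default mentioned above.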

(According to the docs, SSL_read() would fail with SSL_ERROR_WANT_READ, as
reported by SSL_get_error(), when the underlying BIO is non-blocking. But we're
using blocking I/O. I am unsure how much effort it would take to port that to
non-blocking I/O. I'd think it's a non-trivial task ;)).
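The non-blocking approach mentioned above boils down to: attempt the read, and whenever it would block, wait in poll() with a timeout before retrying, so that no single wait is unbounded. The sketch shows the pattern on a plain file descriptor; set_nonblocking() and read_with_timeout() are hypothetical helpers, not libldap code. With OpenSSL and a non-blocking socket BIO, the EAGAIN branch would correspond to SSL_read() failing with SSL_ERROR_WANT_READ (SSL_ERROR_WANT_WRITE can also occur and would call for POLLOUT instead).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode. 0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Read up to len bytes from a non-blocking fd. Whenever the read would
 * block, wait in poll() for at most timeout_ms before retrying, so a peer
 * that vanished without closing the connection cannot hang us forever.
 * Returns the byte count (0 on EOF), or -1; errno is ETIMEDOUT on timeout.
 */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;

        /* With OpenSSL, this is where SSL_get_error() == SSL_ERROR_WANT_READ
         * would lead: wait for readability, then call SSL_read() again. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int r = poll(&pfd, 1, timeout_ms);
        if (r == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (r < 0 && errno != EINTR)
            return -1;
    }
}
```

The timeout here bounds each wait for data rather than the whole operation; a caller needing an overall deadline would shrink timeout_ms on every retry.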

> So - I'd rather see an option for a periodic LDAP ping added to the
> syncrepl client - that will work uniformly across all platforms.
>
> And in general - I am opposed to any code that causes our feature set /
> behavior to differ from platform to platform.
Understandable, that's why I asked before committing anything. But AFAIK we
have platform-specific issues in other places as well. (Or think about the
various LDAP_OPT_X_TLS settings that differ depending on which underlying SSL
implementation is used.)

-- 
Ralf