
Re: Real idletimeout more than configured idletimeout



Eric Déchaux wrote:
Dear openldap gurus,

I am hitting some strange behavior with the idle session timeout
feature. In my configuration this timeout is set to 60 seconds on 4
slaves that are behind a load balancer. This load balancer times out
idle sessions after 90 seconds, which should be fine. The OpenLDAP version
is the stable one from Debian Etch r3.

I have no idea what Debian or any other distro packages. You should quote specific version numbers for all relevant pieces of software.


I however encounter random connection issues that have been traced to
the load balancer timing out an idle session *before* the LDAP slave does.

I have straced the slapd process and found that the applied idletimeout
was well above the configured one. Please check the two following strace
outputs:


Output 1

[ some uninteresting ldap stuff ]

futex(0x603428, FUTEX_WAKE, 1) = 1
read(12, 0x6f30ff, 8) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x2b0db3b35dc8, FUTEX_WAKE, 1) = 1
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
write(5, "0", 1) = 1
shutdown(12, 2 /* send and receive */) = 0
close(12) = 0

Here we can see 5 select system calls, for a real idletimeout of 75
seconds instead of 60.

This doesn't really surprise me.

Output 2

[ some uninteresting ldap stuff ]

futex(0x2b0db3b35dc8, FUTEX_WAKE, 1) = 1
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
write(5, "0", 1) = 1
shutdown(12, 2 /* send and receive */) = 0
close(12) = 0

Here we have 6 select system calls for a real idletimeout of 90 seconds
which is enough for the session to expire on the load balancer.

This is rather surprising.

I have checked the source code, and the logic that chooses either to
idle-timeout the session or go into a "SLAP_EVENT_WAIT" (select) call is
the following:

from servers/slapd/daemon.c


now = slap_get_time();

if ( ( global_idletimeout > 0 ) &&
     difftime( last_idle_check +
               global_idletimeout / SLAPD_IDLE_CHECK_LIMIT, now ) < 0 )
{
	connections_timeout_idle( now );
	last_idle_check = now;
}


As I understand this, no connection should be tested against the idletimeout before any "event wait loop" takes more time than the idletimeout parameter / 4.

Right, on an otherwise idle server, we don't want to wake up too frequently to check for idle connections. It's OK to check a little late, but we don't want to wake up much too late, which would often occur if the IDLE_CHECK_LIMIT was smaller.


In my case, I need the "event wait loop" to last more than 15 seconds
for connections to be checked against aging.

Basically, yes.

If I am not mistaken, as the difftime call compares seconds, I need the
loop to last at least 16 seconds for the connections_timeout_idle
procedure to be called.

Am I understanding everything the right way?

Sounds like it.

If that is the case, shouldn't the difftime result be tested with <= 0 to
help idle sessions be cleaned up sooner?

I don't think it makes much difference in the long run. Whenever you choose an idletimeout that is not evenly divisible by 4 (IDLE_CHECK_LIMIT) it's going to have extra slop anyway. And none of this explains how your 60 second idletimeout allowed an idle connection to continue for 90 seconds. Frankly I have no idea why that would be.


In the meantime, on an idle server, I don't see any urgency in closing idle connections, because in this case there's no danger of resource starvation. On the other hand, for an active server, the event loop is going to be waking up more frequently anyway due to real activity, in which case the idle checks will happen more frequently. So as the server gets busier, the actual idletimeouts will get much closer to the configured value.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/