
Re: Real idletimeout more than configured idletimeout



On Monday, 7 July 2008 at 02:32 -0700, Howard Chu wrote:
> Eric Déchaux wrote:
> > Dear openldap gurus,
> >
> > I am hitting some strange behavior with the idle sessions timeout
> > feature. In my configuration this timeout is set to 60 seconds on 4
> > slaves that are behind a load balancer. This load balancer times-out
> > idle sessions after 90 seconds, which should be fine. Openldap version
> > is the stable one from Debian Etch r3.
> 
> I have no idea what Debian or any other distro packages. You should quote 
> specific version numbers for all relevant pieces of software.

Sorry about that. Version is 2.3.30.
I also forgot to mention that I am running the whole thing inside a VMware
ESX 301 virtual machine. I don't know whether this has any impact.

> 
> > I however encounter random connection issues that have been traced to
> > the load balancer timing out an idle session *before* the ldap slave.
> 
> > I have straced the slapd process and found that the applied idletimeout
> > was well above the configured one; please check the following two strace
> > outputs:
> >
> >
> > Output 1
> 
> > [ some uninteresting ldap stuff ]
> >
> > futex(0x603428, FUTEX_WAKE, 1) = 1
> > read(12, 0x6f30ff, 8) = -1 EAGAIN (Resource temporarily unavailable)
> > futex(0x2b0db3b35dc8, FUTEX_WAKE, 1) = 1
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > write(5, "0", 1) = 1
> > shutdown(12, 2 /* send and receive */) = 0
> > close(12) = 0
> >
> > Here we can see 5 select system calls, for a real idletimeout of 75
> > seconds instead of 60.
> 
> This doesn't really surprise me.
> 

Me neither.

> > Output 2
> 
> > [ some uninteresting ldap stuff ]
> >
> > futex(0x2b0db3b35dc8, FUTEX_WAKE, 1) = 1
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
> > write(5, "0", 1) = 1
> > shutdown(12, 2 /* send and receive */) = 0
> > close(12) = 0
> >
> > Here we have 6 select system calls for a real idletimeout of 90 seconds
> > which is enough for the session to expire on the load balancer.
> 
> This is rather surprising.
> 
> > I have checked the source code, and the logic that chooses either to
> > idle-timeout the session or to go into a "SLAP_EVENT_WAIT" (select) call
> > is the following:
> >
> > from servers/slapd/daemon.c
> >
> >
> >          now = slap_get_time();
> >
> >          if ( ( global_idletimeout > 0 ) &&
> >               difftime( last_idle_check +
> >                         global_idletimeout/SLAPD_IDLE_CHECK_LIMIT, now ) < 0 )
> >          {
> >                  connections_timeout_idle( now );
> >                  last_idle_check = now;
> >          }
> >
> >
> > As I understand this, no connection is tested against the idletimeout
> > until an "event wait loop" has taken more than idletimeout / 4
> > seconds.
> 
> Right, on an otherwise idle server, we don't want to wake up too frequently to 
> check for idle connections. It's OK to check a little late, but we don't want 
> to wake up much too late, which would often occur if the IDLE_CHECK_LIMIT was 
> smaller.
> 
> > In my case, I need the "event wait loop" to last more than 15 seconds
> > for connections to be checked against aging.
> 
> Basically, yes.
> 
> > If I am not mistaken, as the difftime call compares seconds, I need the
> > loop to last at least 16 seconds for the connections_timeout_idle
> > procedure to be called.
> 
> > Am I understanding everything correctly?
> 
> Sounds like it.
> 
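The 15-versus-16-second point can be checked in isolation. Below is a minimal sketch of the quoted condition, assuming whole-second timestamps such as those returned by slap_get_time(). The helper name idle_check_due and its parameters are made up for this illustration and are not part of slapd.

```c
#include <assert.h>
#include <time.h>

#define SLAPD_IDLE_CHECK_LIMIT 4

/*
 * Hypothetical helper (not in slapd): returns 1 when the condition quoted
 * from daemon.c would trigger an idle check, 0 otherwise.
 * Timestamps are whole seconds, so the strict "< 0" test only passes once
 * now is at least one second past last_idle_check + idletimeout/4.
 */
static int idle_check_due(time_t last_idle_check, time_t now, int idletimeout)
{
    /* Assumption of this sketch: the clock never goes backwards. */
    assert(now >= last_idle_check);

    if (idletimeout <= 0)
        return 0; /* idle timeout disabled */

    return difftime(last_idle_check + idletimeout / SLAPD_IDLE_CHECK_LIMIT,
                    now) < 0;
}
```

With an idletimeout of 60 and a last check at t = 100, the helper reports no check due at t = 115 and a check due at t = 116. Replacing "< 0" with "<= 0" would make the t = 115 case fire as well.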
> > If that is the case, shouldn't the difftime result be tested with <= 0
> > to help idle sessions get cleaned up sooner?
> 
> I don't think it makes much difference in the long run. Whenever you choose an 
> idletimeout that is not evenly divisible by 4 (IDLE_CHECK_LIMIT) it's going to 
> have extra slop anyway. And none of this explains how your 60 second 
> idletimeout allowed an idle connection to continue for 90 seconds. Frankly I 
> have no idea why that would be.
> 

I believe it is possible when an idle check was done on the previous
iteration and the main event loop then takes less than 1 second, not
counting the select timeout. In that case,
difftime(last_idle_check+global_idletimeout/SLAPD_IDLE_CHECK_LIMIT, now)
returns 0, the "< 0" test fails, and no connection aging is checked on
that iteration.
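
A small simulation makes this concrete. The sketch below is a simplified model, not slapd code. It assumes that an otherwise idle loop wakes exactly every idletimeout/4 seconds on select timeout, that timestamps are whole seconds, and that an idle check ran at t = 0. It also assumes that the per-connection test closes a connection only when it has been idle strictly longer than idletimeout, which is not shown in the quoted excerpt. The function name simulate_idle_close is hypothetical.

```c
#include <assert.h>

#define SLAPD_IDLE_CHECK_LIMIT 4

/*
 * Simplified model of an otherwise idle event loop (an assumption, not
 * the real code). last_activity is the time of the connection's last
 * activity, in seconds after an idle check that ran at t = 0.
 * Returns how many seconds the connection stayed idle before being closed.
 */
static long simulate_idle_close(long idletimeout, long last_activity)
{
    assert(idletimeout >= SLAPD_IDLE_CHECK_LIMIT && last_activity >= 0);

    const long step = idletimeout / SLAPD_IDLE_CHECK_LIMIT; /* select timeout */
    long last_idle_check = 0;

    for (long now = step; ; now += step) {
        /* Same strict test as the quoted daemon.c condition. */
        if (last_idle_check + step - now < 0) {
            /* Assumed per-connection test: idle strictly longer than
             * idletimeout. Not taken from the quoted code. */
            if (now - last_activity > idletimeout)
                return now - last_activity;
            last_idle_check = now;
        }
    }
}
```

Under these assumptions the idle check only runs on every second wake-up, that is every 30 seconds for an idletimeout of 60. simulate_idle_close(60, 15) gives 75 and simulate_idle_close(60, 0) gives 90, which lines up with the two strace outputs.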


> In the meantime, on an idle server, I don't see any urgency in closing idle 
> connections, because in this case there's no danger of resource starvation. On 
> the other hand, for an active server, the event loop is going to be waking up 
> more frequently anyway due to real activity, in which case the idle checks 
> will happen more frequently. So as the server gets busier, the actual 
> idletimeouts will get much closer to the configured value.
> 

Got it.
It seems there is no simple workaround on the LDAP side for my issue.
I will search for other options.


Many thanks for your help.

-- 
Eric Déchaux

Ingénieur Kébabiste

Sun Microsystems Services France