
RE: test007-replication failure (ITS#2272)



Also note - the TCP standard specifies a 120 second delay for TIME_WAIT.
Solaris follows the spec. Typically BSD systems only stay in TIME_WAIT for 30
seconds, which means this problem would never show up there because of the
two 15 second sleeps in the middle of the script. I.e., any sockets that
were used by one iteration of the script would be completely cleaned up during
the next iteration. On Solaris we are effectively creating sockets 4 times
faster than they are destroyed, and so eventually run out.
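
To make the arithmetic concrete, here is a rough sketch of the backlog under
a deliberately simple model. It assumes each iteration of the script lasts
exactly 30 seconds (just the two 15 second sleeps) and closes a fixed number
of sockets at its start; the function name and the figure of 10 sockets per
iteration are made up for illustration and are not taken from the test script.

```python
def time_wait_backlog(iteration_seconds, time_wait_seconds,
                      sockets_per_iteration, iterations):
    """Return how many sockets sit in TIME_WAIT at the start of each iteration.

    Simplified model (an assumption, not a measurement): every iteration
    closes `sockets_per_iteration` sockets at its start, and each of them
    stays in TIME_WAIT for exactly `time_wait_seconds`.
    """
    backlog = []
    closed_at = []  # close timestamps of sockets still in TIME_WAIT
    for i in range(iterations):
        now = i * iteration_seconds
        # Sockets closed by this iteration enter TIME_WAIT now.
        closed_at.extend([now] * sockets_per_iteration)
        # Drop sockets whose TIME_WAIT period has fully elapsed.
        closed_at = [t for t in closed_at if now - t < time_wait_seconds]
        backlog.append(len(closed_at))
    return backlog


# 30 second TIME_WAIT (the BSD figure above): one iteration's worth at most.
bsd_like = time_wait_backlog(30, 30, 10, 6)
# 120 second TIME_WAIT (the Solaris figure above): up to four iterations' worth.
solaris_like = time_wait_backlog(30, 120, 10, 6)
```

Under these assumptions the 30 second case never holds more than one
iteration's sockets, while the 120 second case builds up to four iterations'
worth before the oldest entries start to expire.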

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support

> -----Original Message-----
> From: owner-openldap-bugs@OpenLDAP.org
> [mailto:owner-openldap-bugs@OpenLDAP.org]On Behalf Of hyc@highlandsun.com

> Ok, more details - the client is getting connection refused, and netstat
> shows many (~45) connections in TIME_WAIT state. I believe this problem is
> simply a case of the network layer running out of allocatable ports after
> running the script so many times. Setting SO_REUSEADDR may help, may not.
> Also, it makes a difference whether only the server or only the client (or
> both) sets SO_REUSEADDR, as the TIME_WAIT state is entered by the side that
> initiates the TCP close. In the case of an LDAP Unbind sent from the client
> to the server, this is a bit of a race condition because we can't predict
> whether the client or the server will actually get to process its close()
> call first. Not sure this is a problem we need to worry about. Comments?
>
>   -- Howard Chu
>   Chief Architect, Symas Corp.       Director, Highland Sun
>   http://www.symas.com               http://highlandsun.com/hyc
>   Symas: Premier OpenSource Development and Support
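
For reference, a minimal sketch of the two points in the quoted message:
setting SO_REUSEADDR on a socket before bind(), and forcing one side to
initiate the TCP close. It uses only Python's standard socket module on the
loopback interface; the helper names make_listener and client_closes_first
are invented for this illustration and have nothing to do with the slapd or
test script code.

```python
import socket


def make_listener(host="127.0.0.1", port=0, reuse=True):
    """Create a listening TCP socket, optionally with SO_REUSEADDR set."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse:
        # Must be set before bind() to affect the bind.
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))  # port 0 lets the OS pick a free port
    srv.listen(5)
    return srv


def client_closes_first(srv):
    """Connect to srv and have the client initiate the close.

    Because the client calls close() first, its end of the connection is
    the one that enters TIME_WAIT. Returns what the server side reads
    after the client's close, which is empty bytes at end-of-stream.
    """
    cli = socket.create_connection(srv.getsockname())
    conn, _addr = srv.accept()
    cli.close()          # active close: the client sends the first FIN
    data = conn.recv(1)  # blocks until the FIN arrives, then returns b""
    conn.close()         # passive close on the server side
    return data


listener = make_listener()
reuse_flag = listener.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
leftover = client_closes_first(listener)
listener.close()
```

Swapping the order of the two close() calls would move the TIME_WAIT entry
to the server side, which is the ordering the Unbind race leaves undetermined.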