
RE: test007-replication failure (ITS#2272)



OK, more details: the client is getting "connection refused", and netstat
shows many (~45) connections in TIME_WAIT state. I believe this problem is
simply a case of the network layer running out of allocatable ports after
running the script so many times. Setting SO_REUSEADDR may or may not help.
It also matters whether only the server, only the client, or both set
SO_REUSEADDR, since the TIME_WAIT state is entered by the side that
initiates the TCP close. When the client sends an LDAP Unbind to the
server, there is a race: we can't predict whether the client or the server
will process its close() call first. I'm not sure this is a problem we need
to worry about. Comments?
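
For reference, here is a minimal sketch of what setting SO_REUSEADDR on the
listening side looks like. It is written in Python rather than slapd's C, and
make_listener and reuseaddr_enabled are hypothetical helper names for
illustration, not OpenLDAP code. It assumes a plain IPv4 listener on the
loopback address.

```python
import socket


def make_listener(host="127.0.0.1", port=0, reuse=True):
    """Return a listening TCP socket (hypothetical helper, not OpenLDAP code).

    With reuse=True, SO_REUSEADDR is set before bind(); the option has to be
    in place at bind() time to influence whether the bind succeeds.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # port=0 asks the kernel to pick any free port.
    sock.bind((host, port))
    sock.listen(5)
    return sock


def reuseaddr_enabled(sock):
    """Report whether SO_REUSEADDR is currently set on the socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
```

As far as I know, this mainly helps a listener rebind to its own port while
old connections on that port are still in TIME_WAIT. It would not by itself
free up ephemeral ports on the client side, which is why it may not help here.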

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support

> -----Original Message-----
> From: owner-openldap-bugs@OpenLDAP.org
> [mailto:owner-openldap-bugs@OpenLDAP.org]On Behalf Of hyc@highlandsun.com

> I've duplicated this behavior, but don't have a good explanation for it. I
> altered the test007 script to not kill the daemons when this problem occurs,
> so that I could attach to them and see what happened. However, it didn't get
> to that.
>
> Your slurp.log shows that slurpd first tried to bind to the slave using IPv6,
> which failed, and then it tried IPv4, and also failed.
>
> Mine shows the same result. However, since I left the daemons running, slurpd
> retried a few seconds later and connected and bound successfully, and all the
> updates were pushed across.
>
> On the slave.log, you see an anonymous bind on IPv6 that is from the initial
> ldapsearch command that was issued to check that the slave was running. Then
> you see a successful bind from slurpd. What's important to notice here is
> that both binds are on IPv6, the port numbers differ only by one, and there
> is no other intervening connection logged on the slave. There is no
> indication of slurpd's previous failed connection attempt.
>
> I have no explanation for why the connection attempt fails, as the slave
> clearly has listeners open on both IPv6 and IPv4. It is clear to me that this
> problem has nothing to do with slapd - the listeners are there. It is
> possible that this is a libldap problem. Certainly the client never actually
> attempted to connect to the server; if it had, the port number of the
> successful connection would have been (at least) 2 greater than the anonymous
> connection.
>
>   -- Howard Chu
>   Chief Architect, Symas Corp.       Director, Highland Sun
>   http://www.symas.com               http://highlandsun.com/hyc
>   Symas: Premier OpenSource Development and Support
>
> > -----Original Message-----
> > From: owner-openldap-bugs@OpenLDAP.org
> > [mailto:owner-openldap-bugs@OpenLDAP.org]On Behalf Of
> > h.b.furuseth@usit.uio.no
> > Sent: Sunday, January 19, 2003 12:35 PM
> > To: openldap-its@OpenLDAP.org
> > Subject: test007-replication failure (ITS#2272)
> >
> >
> > Full_Name: Hallvard B. Furuseth
> > Version: 2.1.12 and HEAD
> > OS: Solaris 2.8 (sparc)
> > URL: ftp://ftp.openldap.org/incoming/Hallvard-Furuseth-030119.tgz
> > Submission from: (NULL) (129.240.186.42)
> >
> >
> > test007-replication sometimes fails with error code 32 (no such object).
> > It fails about 6% of the time with BDB and 3% of the time with LDBM.
> > To reproduce:
> >
> >    make test		# to make the symlinks in tests/
> >    ^C
> >    (cd tests; while scripts/test007-replication; do :; done)
> >
> > I've put the test-db and test-repl directories from a failed run
> > (with the HEAD branch) in the URL provided with this report.
> > Here is the output:
> >
> > running defines.sh
> > Cleaning up in ./test-db...
> > Cleaning up in ./test-repl...
> > Starting master slapd on TCP/IP port 9009...
> > Starting slave slapd on TCP/IP port 9010...
> > Using ldapsearch to check that master slapd is running...
> > Using ldapsearch to check that slave slapd is running...
> > Starting slurpd...
> > Using ldapadd to populate the master directory...
> > Waiting 15 seconds for slurpd to send changes...
> > Using ldapmodify to modify master directory...
> > Waiting 15 seconds for slurpd to send changes...
> > Using ldapsearch to read all the entries from the master...
> > Using ldapsearch to read all the entries from the slave...
> > ldapsearch failed (32)!
> > 24813 Killed
> >