
RE: test007-replication failure (ITS#2272)



SO_REUSEADDR **should** only be needed on the server so that it
can bind(2) to the local *.*:port address despite there being
other local ADDRESS:port (and hence remote ADDRESS:xxx) uses.
Since the client doesn't bind(2) to a specific local address,
it **should not** need to set SO_REUSEADDR.
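
As a rough illustration of the point above (a minimal sketch, not code
from slapd, slurpd or libldap): the listener sets SO_REUSEADDR before
bind(2), while the client never calls bind(2) and lets the kernel pick
its local port. The helper names and the loopback address are
assumptions made for the example.

```python
import socket


def make_listener(port: int = 0) -> socket.socket:
    """Return a TCP listener on 127.0.0.1 with SO_REUSEADDR set before bind()."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set before bind() to affect the address-in-use check.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(5)
    return srv


def connect_client(port: int) -> socket.socket:
    """Connect without calling bind(); the kernel assigns an ephemeral local port."""
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.settimeout(5)
    cli.connect(("127.0.0.1", port))
    return cli


def restart_after_server_side_close() -> bool:
    """Close the server end of a connection first, then rebind the same port."""
    srv = make_listener()
    port = srv.getsockname()[1]
    cli = connect_client(port)
    conn, _ = srv.accept()
    conn.close()   # server initiates the close, so its end lingers in TIME_WAIT
    cli.recv(1)    # returns b"" once the server's FIN has arrived
    cli.close()
    srv.close()
    # A new listener on the same port; the lingering ADDRESS:port use
    # left by the old connection should not block this bind().
    again = make_listener(port)
    ok = again.getsockname()[1] == port
    again.close()
    return ok
```

Calling restart_after_server_side_close() exercises the server-side
case only; the client side is deliberately left without the option.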

Kurt


At 12:29 PM 2/15/2003, hyc@highlandsun.com wrote:
>Ok, more details - the client is getting connection refused, and netstat
>shows many (~45) connections in TIME_WAIT state. I believe this problem is
>simply a case of the network layer running out of allocatable ports after
>running the script so many times. Setting SO_REUSEADDR may help, may not.
>Also, it makes a difference whether only the server or only the client (or
>both) sets SO_REUSEADDR, as the TIME_WAIT state is entered by the side that
>initiates the TCP close. In the case of an LDAP Unbind sent from the client
>to the server, this is a bit of a race condition because we can't predict
>whether the client or the server will actually get to process its close()
>call first. Not sure this is a problem we need to worry about. Comments?
>
>  -- Howard Chu
>  Chief Architect, Symas Corp.       Director, Highland Sun
>  http://www.symas.com               http://highlandsun.com/hyc
>  Symas: Premier OpenSource Development and Support
>
>> -----Original Message-----
>> From: owner-openldap-bugs@OpenLDAP.org
>> [mailto:owner-openldap-bugs@OpenLDAP.org]On Behalf Of hyc@highlandsun.com
>
>> I've duplicated this behavior, but don't have a good explanation for
>> it. I altered the test007 script to not kill the daemons when this
>> problem occurs, so that I could attach to them and see what happened.
>> However, it didn't get to that.
>>
>> Your slurp.log shows that slurpd first tried to bind to the slave
>> using IPv6, which failed, and then it tried IPv4, and also failed.
>>
>> Mine shows the same result. However, since I left the daemons running,
>> slurpd retried a few seconds later and connected and bound
>> successfully, and all the updates were pushed across.
>>
>> On the slave.log, you see an anonymous bind on IPv6 that is from the
>> initial ldapsearch command that was issued to check that the slave was
>> running. Then you see a successful bind from slurpd. What's important
>> to notice here is that both binds are on IPv6, the port numbers differ
>> only by one, and there is no other intervening connection logged on
>> the slave. There is no indication of slurpd's previous failed
>> connection attempt.
>>
>> I have no explanation for why the connection attempt fails, as the
>> slave clearly has listeners open on both IPv6 and IPv4. It is clear to
>> me that this problem has nothing to do with slapd - the listeners are
>> there. It is possible that this is a libldap problem. Certainly the
>> client never actually attempted to connect to the server; if it had,
>> the port number of the successful connection would have been (at
>> least) 2 greater than the anonymous connection.
>>
>>   -- Howard Chu
>>   Chief Architect, Symas Corp.       Director, Highland Sun
>>   http://www.symas.com               http://highlandsun.com/hyc
>>   Symas: Premier OpenSource Development and Support
>>
>> > -----Original Message-----
>> > From: owner-openldap-bugs@OpenLDAP.org
>> > [mailto:owner-openldap-bugs@OpenLDAP.org]On Behalf Of
>> > h.b.furuseth@usit.uio.no
>> > Sent: Sunday, January 19, 2003 12:35 PM
>> > To: openldap-its@OpenLDAP.org
>> > Subject: test007-replication failure (ITS#2272)
>> >
>> >
>> > Full_Name: Hallvard B. Furuseth
>> > Version: 2.1.12 and HEAD
>> > OS: Solaris 2.8 (sparc)
>> > URL: ftp://ftp.openldap.org/incoming/Hallvard-Furuseth-030119.tgz
>> > Submission from: (NULL) (129.240.186.42)
>> >
>> >
>> > test007-replication sometimes fails with error code 32 (no such object).
>> > It fails about 6% of the time with BDB and 3% of the time with LDBM.
>> > To reproduce:
>> >
>> >    make test                # to make the symlinks in tests/
>> >    ^C
>> >    (cd tests; while scripts/test007-replication; do :; done)
>> >
>> > I've put the test-db and test-repl directories from a failed run
>> > (with the HEAD branch) in the URL provided with this report.
>> > Here is the output:
>> >
>> > running defines.sh
>> > Cleaning up in ./test-db...
>> > Cleaning up in ./test-repl...
>> > Starting master slapd on TCP/IP port 9009...
>> > Starting slave slapd on TCP/IP port 9010...
>> > Using ldapsearch to check that master slapd is running...
>> > Using ldapsearch to check that slave slapd is running...
>> > Starting slurpd...
>> > Using ldapadd to populate the master directory...
>> > Waiting 15 seconds for slurpd to send changes...
>> > Using ldapmodify to modify master directory...
>> > Waiting 15 seconds for slurpd to send changes...
>> > Using ldapsearch to read all the entries from the master...
>> > Using ldapsearch to read all the entries from the slave...
>> > ldapsearch failed (32)!
>> > 24813 Killed