Full_Name: Maxime Besson
Version: 2.4.48
OS: Debian Buster / CentOS7
URL: https://cloud.worteks.com/index.php/s/9CE6ALLaAfrxZW4/download
Submission from: (NULL) (92.184.104.113)


I have two OpenLDAP (2.4.48, Ubuntu) servers running with syncrepl in
mirrormode.

The X509 certificate on one of my servers recently expired, and I noticed that
while it was expired, the other node's connection count kept climbing until it
hit a "Max open files" condition. It seems that when a syncrepl consumer
encounters a certificate error, the outgoing LDAP connection to the provider is
never closed.

Attached to this bug you will find a test case to reproduce this behavior:

    # runs a provider with a bogus certificate
    # and a consumer with retry=3+
    sh test.sh
    ...
    TLS certificate verification: Error, self signed certificate
    TLS trace: SSL3 alert write:fatal:unknown CA
    TLS trace: SSL_connect:error in error
    TLS: can't connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed (self signed certificate).
    5e160966 slap_client_connect: URI=ldaps://127.0.1.1:6636/ DN="cn=manager,dc=example,dc=com" ldap_sasl_bind_s failed (-1)
    5e160966 do_syncrepl: rid=001 rc -1 retrying
    ...

and so on, every 3 seconds.

While the consumer retries, running lsof on its PID reveals the connection
leak:

    42u  TCP localhost:45964->localhost.lan:6636 (CLOSE_WAIT)
    43u  TCP localhost:45966->localhost.lan:6636 (CLOSE_WAIT)
    44u  TCP localhost:45968->localhost.lan:6636 (CLOSE_WAIT)
    45u  TCP localhost:45970->localhost.lan:6636 (CLOSE_WAIT)
    46u  TCP localhost:45972->localhost.lan:6636 (CLOSE_WAIT)
    47u  TCP localhost:45974->localhost.lan:6636 (CLOSE_WAIT)
    48u  TCP localhost:45976->localhost.lan:6636 (CLOSE_WAIT)
    49u  TCP localhost:45978->localhost.lan:6636 (CLOSE_WAIT)
    50u  TCP localhost:45980->localhost.lan:6636 (CLOSE_WAIT)
    51u  TCP localhost:45982->localhost.lan:6636 (CLOSE_WAIT)
    52u  TCP localhost:45984->localhost.lan:6636 (CLOSE_WAIT)
    53u  TCP localhost:45986->localhost.lan:6636 (CLOSE_WAIT)
    54u  TCP localhost:45988->localhost.lan:6636 (CLOSE_WAIT)
    55u  TCP localhost:45990->localhost.lan:6636 (CLOSE_WAIT)
    56u  TCP localhost:45992->localhost.lan:6636 (CLOSE_WAIT)
    57u  TCP localhost:45994->localhost.lan:6636 (CLOSE_WAIT)
    58u  TCP localhost:45996->localhost.lan:6636 (CLOSE_WAIT)
    59u  TCP localhost:45998->localhost.lan:6636 (CLOSE_WAIT)
    60u  TCP localhost:46000->localhost.lan:6636 (CLOSE_WAIT)
    61u  TCP localhost:46002->localhost.lan:6636 (CLOSE_WAIT)
    62u  TCP localhost:46004->localhost.lan:6636 (CLOSE_WAIT)
    63u  TCP localhost:46006->localhost.lan:6636 (CLOSE_WAIT)
    64u  TCP localhost:46008->localhost.lan:6636 (CLOSE_WAIT)

Changing the provider URL in slapd.2.conf to use a wrong port causes the
syncrepl consumer to fail and retry just as often, but without connections
piling up in the CLOSE_WAIT state.

This is not a very critical issue, because it only affects servers that are
already in a degraded condition (broken replication, invalid certificate on the
provider), but I thought it was still worth reporting.

I was able to reproduce this issue on the git master branch and on the released
2.4.48, on CentOS7 and on Debian Buster.
OpenSSL version on the Debian system: 1.1.1d-0+deb10u2
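
The lsof check described above can be scripted to watch for the leak while the consumer retries. What follows is only a minimal sketch, not part of the attached test case: the function name count_tcp_states and the sample text are hypothetical, and the parser assumes each lsof line for a TCP socket ends with the connection state in parentheses, as in the listing from the report.

```python
import re
from collections import Counter

# Matches the TCP state that closes an lsof line, e.g. "(CLOSE_WAIT)".
STATE_RE = re.compile(r"\(([A-Z_0-9]+)\)\s*$")


def count_tcp_states(lsof_output: str) -> Counter:
    """Count TCP connection states found in lsof-style output.

    Hypothetical helper: lines without a standalone "TCP" column or
    without a trailing "(STATE)" are ignored.
    """
    states = Counter()
    for line in lsof_output.splitlines():
        if "TCP" not in line.split():
            continue
        match = STATE_RE.search(line)
        if match:
            states[match.group(1)] += 1
    return states


# Sample taken from the first lines of the listing in the report.
sample = """\
42u TCP localhost:45964->localhost.lan:6636 (CLOSE_WAIT)
43u TCP localhost:45966->localhost.lan:6636 (CLOSE_WAIT)
44u TCP localhost:45968->localhost.lan:6636 (CLOSE_WAIT)
"""

if __name__ == "__main__":
    print(count_tcp_states(sample)["CLOSE_WAIT"])
```

Feeding this the output of lsof for the consumer's PID at two points in time should show whether the CLOSE_WAIT count keeps growing by roughly one per retry interval, as it does in the report, or stays flat, as in the wrong-port case.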
maxime.besson@worteks.com wrote:
> Full_Name: Maxime Besson
> Version: 2.4.48
> OS: Debian Buster / CentOS7
> URL: https://cloud.worteks.com/index.php/s/9CE6ALLaAfrxZW4/download
> Submission from: (NULL) (92.184.104.113)

Thanks for the report and testcase, fixed now in git master.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
changed notes
changed state Open to Test
moved from Incoming to Software Bugs

changed notes
changed state Test to Release

fixed in master
Fixed in RE24 (2.4.49)

changed notes
changed state Release to Closed