(ITS#9147) syncrepl connexion leak when provider uses an expired certificate
Full_Name: Maxime Besson
Version: 2.4.48
OS: Debian Buster / CentOS7
URL: https://cloud.worteks.com/index.php/s/9CE6ALLaAfrxZW4/download
Submission from: (NULL) (92.184.104.113)
I have two OpenLDAP (2.4.48, Ubuntu) servers running with syncrepl in
mirrormode.
One of my servers' X.509 certificates recently expired, and I noticed that
while it was expired, the other node's connection count kept climbing until it
hit a "Max open files" condition. It seems that when a syncrepl consumer
encounters a certificate error, the outgoing LDAP connection to the provider is
never closed.
Attached to this bug you will find a test case to reproduce this behavior:
# runs a provider with a bogus certificate
# and a consumer with retry=3+
sh test.sh
...
TLS certificate verification: Error, self signed certificate
TLS trace: SSL3 alert write:fatal:unknown CA
TLS trace: SSL_connect:error in error
TLS: can't connect: error:1416F086:SSL
routines:tls_process_server_certificate:certificate verify failed (self signed
certificate).
5e160966 slap_client_connect: URI=ldaps://127.0.1.1:6636/
DN="cn=manager,dc=example,dc=com" ldap_sasl_bind_s failed (-1)
5e160966 do_syncrepl: rid=001 rc -1 retrying
...
and so on, every 3 seconds
While the consumer retries, running lsof on its PID will reveal the connection
leak:
42u TCP localhost:45964->localhost.lan:6636 (CLOSE_WAIT)
43u TCP localhost:45966->localhost.lan:6636 (CLOSE_WAIT)
44u TCP localhost:45968->localhost.lan:6636 (CLOSE_WAIT)
45u TCP localhost:45970->localhost.lan:6636 (CLOSE_WAIT)
46u TCP localhost:45972->localhost.lan:6636 (CLOSE_WAIT)
47u TCP localhost:45974->localhost.lan:6636 (CLOSE_WAIT)
48u TCP localhost:45976->localhost.lan:6636 (CLOSE_WAIT)
49u TCP localhost:45978->localhost.lan:6636 (CLOSE_WAIT)
50u TCP localhost:45980->localhost.lan:6636 (CLOSE_WAIT)
51u TCP localhost:45982->localhost.lan:6636 (CLOSE_WAIT)
52u TCP localhost:45984->localhost.lan:6636 (CLOSE_WAIT)
53u TCP localhost:45986->localhost.lan:6636 (CLOSE_WAIT)
54u TCP localhost:45988->localhost.lan:6636 (CLOSE_WAIT)
55u TCP localhost:45990->localhost.lan:6636 (CLOSE_WAIT)
56u TCP localhost:45992->localhost.lan:6636 (CLOSE_WAIT)
57u TCP localhost:45994->localhost.lan:6636 (CLOSE_WAIT)
58u TCP localhost:45996->localhost.lan:6636 (CLOSE_WAIT)
59u TCP localhost:45998->localhost.lan:6636 (CLOSE_WAIT)
60u TCP localhost:46000->localhost.lan:6636 (CLOSE_WAIT)
61u TCP localhost:46002->localhost.lan:6636 (CLOSE_WAIT)
62u TCP localhost:46004->localhost.lan:6636 (CLOSE_WAIT)
63u TCP localhost:46006->localhost.lan:6636 (CLOSE_WAIT)
64u TCP localhost:46008->localhost.lan:6636 (CLOSE_WAIT)
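To watch the leak grow over time, the CLOSE_WAIT entries can be counted rather than read by eye. The following is only a minimal sketch: the function name count_close_wait and the SLAPD_PID variable are hypothetical and are not part of the attached test case. It assumes lsof output in the form shown above is piped in on standard input.

```shell
# Hypothetical helper (not part of the attached test case):
# counts the lines on stdin that mention the CLOSE_WAIT state.
count_close_wait() {
    # grep -c prints the number of matching lines. It exits with
    # status 1 when that number is 0, so mask the status to keep
    # the helper usable under "set -e".
    grep -c 'CLOSE_WAIT' || true
}

# Example usage, assuming SLAPD_PID holds the consumer's PID:
#   lsof -a -n -P -p "$SLAPD_PID" -i TCP | count_close_wait
```

Running the pipeline from the comment every few seconds while the consumer retries should show the count increasing by one per retry if the leak is present, and staying at zero in the wrong-port case.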
Changing the provider URL in slapd.2.conf to use a wrong port causes the
syncrepl consumer to fail and retry just as often, but without connections
piling up in the CLOSE_WAIT state.
This is not a very critical issue because it only affects servers that are
already in a degraded condition (broken replication, invalid certificate on the
provider), but I thought it was still worth reporting.
I was able to reproduce this issue on the git master branch and on the released
2.4.48, on both CentOS 7 and Debian Buster.
OpenSSL version on the Debian system: 1.1.1d-0+deb10u2