9147 – syncrepl connection leak when provider uses an expired certificate

Summary: syncrepl connection leak when provider uses an expired certificate

Status:	VERIFIED FIXED

Alias:	None

Product:	OpenLDAP
Classification:	Unclassified
Component:	slapd
Version:	2.4.48
Hardware:	All All

Importance:	--- normal
Target Milestone:	---
Assignee:	OpenLDAP project

URL:
Keywords:

Depends on:
Blocks:

Reported:	2020-01-08 17:07 UTC by maxime.besson@worteks.com
Modified:	2020-04-08 07:26 UTC
CC List:	1 user

See Also:

Description maxime.besson@worteks.com 2020-01-08 17:07:55 UTC

Full_Name: Maxime Besson
Version: 2.4.48
OS: Debian Buster / CentOS7
URL: https://cloud.worteks.com/index.php/s/9CE6ALLaAfrxZW4/download
Submission from: (NULL) (92.184.104.113)


I have two OpenLDAP (2.4.48, Ubuntu) servers running with syncrepl in
mirrormode.

One of my servers' X509 certificates recently expired, and I noticed that while
it was expired, the other node's connection count kept climbing until it hit a
"Max open files" condition. It seems that when a syncrepl consumer encounters a
certificate error, the outgoing LDAP connection to the provider is never
closed.
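
One way to confirm from the consumer host that the provider's certificate is the problem is to inspect the certificate the provider actually presents. The following is a minimal sketch using the openssl command-line tool; the function names fetch_provider_cert and cert_is_expired are hypothetical helpers, not part of OpenLDAP.

```shell
#!/bin/sh
# Hypothetical helper: fetch the certificate an ldaps:// provider presents.
# s_client prints the server certificate in PEM form amid other output;
# "openssl x509" skips the surrounding text and re-emits only the PEM block.
fetch_provider_cert() {
    host=$1
    port=$2
    openssl s_client -connect "$host:$port" </dev/null 2>/dev/null \
        | openssl x509
}

# Hypothetical helper: read one PEM certificate on stdin.
# Exit status: 0 = already expired, 1 = still valid, 2 = not a certificate.
cert_is_expired() {
    pem=$(cat)
    # Reject input that openssl cannot parse as an X.509 certificate.
    printf '%s\n' "$pem" | openssl x509 -noout >/dev/null 2>&1 || return 2
    # -checkend 0 exits non-zero if the certificate expires within 0 seconds,
    # i.e. if its notAfter date is already in the past.
    if printf '%s\n' "$pem" | openssl x509 -noout -checkend 0 >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

For example, against the provider from the test case: fetch_provider_cert 127.0.1.1 6636 | cert_is_expired && echo "provider certificate has expired". This only checks the validity period; a self-signed certificate like the one used in the test case fails verification for a different reason and would be reported as still valid.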

Attached to this bug is a test case that reproduces this behavior:

    # runs a provider with a bogus certificate
    # and a consumer with retry=3+
    sh test.sh
    ...
    TLS certificate verification: Error, self signed certificate
    TLS trace: SSL3 alert write:fatal:unknown CA
    TLS trace: SSL_connect:error in error
    TLS: can't connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed (self signed certificate).
    5e160966 slap_client_connect: URI=ldaps://127.0.1.1:6636/ DN="cn=manager,dc=example,dc=com" ldap_sasl_bind_s failed (-1)
    5e160966 do_syncrepl: rid=001 rc -1 retrying
    ...
    and so on, every 3 seconds
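
The do_syncrepl retry line is logged once per failed attempt. A small sketch, assuming the slapd log is available as plain text (for example from syslog), that counts those retries for one replica id so the figure can be compared with the number of sockets left open; the function name count_syncrepl_retries is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count failed syncrepl retries for one rid in slapd log
# text read from stdin. Matches lines of the form seen in the test case:
#   5e160966 do_syncrepl: rid=001 rc -1 retrying
count_syncrepl_retries() {
    rid=$1
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps
    # the printed count while ignoring that exit status.
    grep -cE "do_syncrepl: rid=$rid rc -?[0-9]+ retrying" || true
}
```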

While the consumer retries, running lsof on its PID will reveal the connection
leak:

    42u     TCP localhost:45964->localhost.lan:6636 (CLOSE_WAIT)
    43u     TCP localhost:45966->localhost.lan:6636 (CLOSE_WAIT)
    44u     TCP localhost:45968->localhost.lan:6636 (CLOSE_WAIT)
    45u     TCP localhost:45970->localhost.lan:6636 (CLOSE_WAIT)
    46u     TCP localhost:45972->localhost.lan:6636 (CLOSE_WAIT)
    47u     TCP localhost:45974->localhost.lan:6636 (CLOSE_WAIT)
    48u     TCP localhost:45976->localhost.lan:6636 (CLOSE_WAIT)
    49u     TCP localhost:45978->localhost.lan:6636 (CLOSE_WAIT)
    50u     TCP localhost:45980->localhost.lan:6636 (CLOSE_WAIT)
    51u     TCP localhost:45982->localhost.lan:6636 (CLOSE_WAIT)
    52u     TCP localhost:45984->localhost.lan:6636 (CLOSE_WAIT)
    53u     TCP localhost:45986->localhost.lan:6636 (CLOSE_WAIT)
    54u     TCP localhost:45988->localhost.lan:6636 (CLOSE_WAIT)
    55u     TCP localhost:45990->localhost.lan:6636 (CLOSE_WAIT)
    56u     TCP localhost:45992->localhost.lan:6636 (CLOSE_WAIT)
    57u     TCP localhost:45994->localhost.lan:6636 (CLOSE_WAIT)
    58u     TCP localhost:45996->localhost.lan:6636 (CLOSE_WAIT)
    59u     TCP localhost:45998->localhost.lan:6636 (CLOSE_WAIT)
    60u     TCP localhost:46000->localhost.lan:6636 (CLOSE_WAIT)
    61u     TCP localhost:46002->localhost.lan:6636 (CLOSE_WAIT)
    62u     TCP localhost:46004->localhost.lan:6636 (CLOSE_WAIT)
    63u     TCP localhost:46006->localhost.lan:6636 (CLOSE_WAIT)
    64u     TCP localhost:46008->localhost.lan:6636 (CLOSE_WAIT)
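
To watch the leak grow over time rather than taking a single snapshot, the following sketch polls lsof for the consumer's TCP sockets in CLOSE_WAIT. It assumes an lsof build that supports TCP state filtering (-sTCP:CLOSE_WAIT); the helper names count_close_wait and watch_close_wait are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count CLOSE_WAIT sockets in lsof-style output on stdin.
# The lsof header line does not contain "CLOSE_WAIT", so it is not counted.
count_close_wait() {
    grep -c 'CLOSE_WAIT' || true
}

# Hypothetical helper: while process $1 is alive, print a timestamped
# CLOSE_WAIT count every $2 seconds (default 3, the retry interval used
# in the test case).
watch_close_wait() {
    pid=$1
    interval=${2:-3}
    while kill -0 "$pid" 2>/dev/null; do
        # -a ANDs the -p and -i selections; -n/-P skip name lookups.
        n=$(lsof -nP -a -p "$pid" -iTCP -sTCP:CLOSE_WAIT 2>/dev/null | count_close_wait)
        printf '%s CLOSE_WAIT=%s\n' "$(date +%T)" "$n"
        sleep "$interval"
    done
}
```

Run it as watch_close_wait <consumer-pid>. With the leak present the count should keep rising while retries continue; with the wrong-port configuration mentioned below it should stay flat.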


Changing the provider URL in slapd.2.conf to a wrong port makes the syncrepl
consumer fail and retry just as often, but without connections piling up in
the CLOSE_WAIT state.

This is not a very critical issue, because it only affects servers that are
already in a degraded condition (broken replication, invalid certificate on the
provider), but I thought it was still worth reporting.

I was able to reproduce this issue on the git master branch and on the 2.4.48
release, on both CentOS 7 and Debian Buster.
OpenSSL version on the Debian system: 1.1.1d-0+deb10u2

Comment 1 Howard Chu 2020-01-11 04:23:25 UTC

Thanks for the report and testcase, fixed now in git master.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 Howard Chu 2020-01-11 04:26:37 UTC

changed notes
changed state Open to Test
moved from Incoming to Software Bugs

Comment 3 Quanah Gibson-Mount 2020-01-11 23:19:53 UTC

changed notes
changed state Test to Release

Comment 4 OpenLDAP project 2020-01-30 18:33:59 UTC

fixed in master
Fixed in RE24 (2.4.49)

Comment 5 Quanah Gibson-Mount 2020-01-30 18:33:59 UTC

changed notes
changed state Release to Closed