Issue 8385 - Use After Free of struct ldap_common in slap_client_connect
Summary: Use After Free of struct ldap_common in slap_client_connect
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-12 00:47 UTC by mkp37215@gmail.com
Modified: 2017-06-01 22:08 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description mkp37215@gmail.com 2016-03-12 00:47:18 UTC
Full_Name: Maciej Puzio
Version: 2.4.44 and current sources from git repo
OS: Ubuntu 15.10 amd64
URL: 
Submission from: (NULL) (129.112.109.41)


While configuring two servers in a dual-master setup I encountered an issue
causing replication failures, when replication is done on a TLS connection. In
this particular setup the issue manifested itself by replication retries
failing, even when the cause of the original failure no longer existed. For
example, let's consider that dual-master setup consists of servers A and B. If
server A was started before server B, its on-startup replication would fail (as
expected, since B is down). Server B would then be started and its replication
attempt would succeed (since A is already running). However A's subsequent
retries would fail, even though B was now accessible. If I then restarted server
A, its replication will succeed, but B will now lose connection to A and all B's
subsequent retries would fail. This issue has ~50% reproducibility, but if it
occurs once, then it would continue failing. On occasion both servers would
replicate fine, but obviously the whole situation was unacceptable on servers
being prepared for production use.

I tested this issue thoroughly eliminating various hypotheses, and ended up
debugging OpenLDAP code and finding a bug causing the above behavior. In short,
on failure slap_client_connect deallocates LDAP object ld and objects pointed by
it, including ldap_common which contains ld_options. Pointer to ld_options is
held in sb->sb_tls_ctx and reused in subsequent slap_client_connect calls, at
this point referencing freed memory.

This issue occurs in OpenLDAP newest stable version 2.4.44, as well as in
current source code from git repo (as of March 11, 2016), obtained from
git://git.openldap.org/openldap.git

In order to demonstrate that the issue is indeed use-after-free, I made the
following debug modifications to the code. (The diffs are against the source
code from git repo as of 2016/03/11)


--- servers/slapd/config.c.orig 2016-03-11 14:52:14.000000000 -0600
+++ servers/slapd/config.c      2016-03-11 17:20:55.216170535 -0600
@@ -60,6 +60,26 @@
 
 #define ARGS_STEP      512
 
+
+/* DEBUG */
+#include <syslog.h>
+#include <gnutls/gnutls.h>
+#include "../../libraries/libldap/ldap-int.h"
+
+typedef struct tlsg_ctx {
+        struct ldapoptions *lo;
+        gnutls_certificate_credentials_t cred;
+        gnutls_dh_params_t dh_params;
+        unsigned long verify_depth;
+        int refcount;
+        gnutls_priority_t prios;
+#ifdef LDAP_R_COMPILE
+        ldap_pvt_thread_mutex_t ref_mutex;
+#endif
+} tlsg_ctx;
+/* END DEBUG */
+
+
 /*
  * defaults for various global variables
  */
@@ -2001,9 +2021,11 @@
 
 #ifdef HAVE_TLS
        if ( sb->sb_tls_do_init ) {
+               syslog(LOG_DEBUG, "DEBUG: slap_client_connect sb->sb_tls_do_init
branch\n");
                rc = bindconf_tls_set( sb, ld );
 
        } else if ( sb->sb_tls_ctx ) {
+               syslog(LOG_DEBUG, "DEBUG: slap_client_connect sb->sb_tls_ctx
branch\n");
                rc = ldap_set_option( ld, LDAP_OPT_X_TLS_CTX,
                        sb->sb_tls_ctx );
        }
@@ -2015,6 +2037,7 @@
                        sb->sb_uri.bv_val, rc, 0 );
                goto done;
        }
+       syslog(LOG_DEBUG, "DEBUG: slap_client_connect
ldo_tls_require_cert=%X\n",
((tlsg_x*x*)ld->ld_options.ldo_tls_ctx)->lo->ldo_tls_require_cert);
 #endif


--- libraries/libldap/unbind.c.orig     2016-03-11 14:52:14.000000000 -0600
+++ libraries/libldap/unbind.c  2016-03-11 15:49:18.603447766 -0600
@@ -27,6 +27,8 @@
 
 #include "ldap-int.h"
 
+#include <syslog.h>  /* DEBUG */
+
 /* An Unbind Request looks like this:
  *
  *     UnbindRequest ::= [APPLICATION 2] NULL
@@ -208,6 +210,8 @@
 
 #ifdef HAVE_TLS
        ldap_int_tls_destroy( &ld->ld_options );
+       syslog(LOG_DEBUG, "DEBUG: ldap_ld_destroy\n");
+       ld->ld_options.ldo_tls_require_cert = 0xdeadbeef;  /* DEBUG */
 #endif
 
        if ( ld->ld_options.ldo_sctrls != NULL ) {
@@ -233,7 +237,7 @@
 #ifndef NDEBUG
        LDAP_TRASH(ld);
 #endif
-       LDAP_FREE( (char *) ld->ldc );
+       //LDAP_FREE( (char *) ld->ldc );  /* DEBUG */
        LDAP_FREE( (char *) ld );
 
        return( err );


Here, instead of freeing ld->ldc (struct ldap_common), I overwrite its contents
with a magic debug value 0xdeadbeef. I overwrite ldo_tls_require_cert, as an
incorrect value of this particular variable turned out to be the direct cause of
replication failures mentioned at the beginning (more about it later). However,
other fields of ldap_common could be used as well.

File ldap-int.h is included to be able to examine contents of *ld, while the
declaration of struct tlsg_ctx has been copied verbatim from
libraries/libldap/tls_g.c, in order to examine the contents of
ld->ld_options.ldo_tls_ctx. I use GnuTLS library following Ubuntu build options.
But the issue in question is not GnuTLS-related, though it may cause different
symptoms with OpenSSL or Mozilla NSS (I did not test these options).

One would expect that the only effect of these modifications would be a mere
memory leak, but in addition to that we see that the magic value gets logged. As
you can see, the second replication attempts uses memory that, if executing
non-debug code, would be already freed:

The results are as follows (only one server started):
Mar 11 17:42:21 srvA slapd[14360]: slapd starting
Mar 11 17:42:21 srvA slapd[14360]: =>do_syncrepl rid=001
Mar 11 17:42:21 srvA slapd[14360]: DEBUG: slap_client_connect sb->sb_tls_do_init
branch
Mar 11 17:42:21 srvA slapd[14355]: ...done.
Mar 11 17:42:21 srvA slapd[14360]: DEBUG: slap_client_connect
ldo_tls_require_cert=0
Mar 11 17:42:21 srvA slapd[14360]: DEBUG: ldap_ld_destroy
Mar 11 17:42:21 srvA slapd[14360]: do_syncrepl: rid=001 rc -1 retrying (9
retries left)
Mar 11 17:42:51 srvA slapd[14360]: =>do_syncrepl rid=001
Mar 11 17:42:51 srvA slapd[14360]: DEBUG: slap_client_connect sb->sb_tls_ctx
branch
Mar 11 17:42:51 srvA slapd[14360]: DEBUG: slap_client_connect
ldo_tls_require_cert=DEADBEEF
Mar 11 17:42:51 srvA slapd[14360]: DEBUG: ldap_ld_destroy
Mar 11 17:42:51 srvA slapd[14360]: do_syncrepl: rid=001 rc -1 retrying (8
retries left)

When running unmodified non-debug code, on replication retry ld_options contains
either (A) previous values, (B) zeros or (C) random values. Occasionally a
segfault occurs (although I observed them only during debugging). Replication
fails much more readily if, in addition to the issue discussed here, there is a
TLS certificate problem. In this case connection tends to fail in
tlsg_cert_verify, even if tls_reqcert is set to 'allow' or 'never'. This is
because random value of ldo_tls_require_cert will cause the certificate to be
checked regardless of tls_reqcert setting. If certificates are correct, the
issue may well remain hidden, as TLS connection will be established regardless
of whether A, B or C occurs. In this case an occasional segfault may be the only
symptom. Of course, if by chance B occurs, certificate verification will be
silently skipped and this may lead to a security problem.

A quick and dirty fix is to comment out few lines in servers/slapd/config.c so
that  "rc = bindconf_tls_set( sb, ld )" is always executed, which avoids the
reuse of sb->sb_tls_ctx. However, I think a better fix would involve changing
the handling of ld->ldoptions and related objects. Given the complexity of
interactions between various objects involved, and my limited familiarity with
OpenLDAP design, I feel that I am not in position to propose a patch.

For reference, OpenLDAP was build with this configuration:
./configure --prefix=/opt/openldap --enable-debug --enable-dynamic
--enable-syslog --enable-local --enable-slapd --enable-crypt --enable-modules
--enable-backends --enable-ndb=no --enable-shell=no --enable-perl=no
--enable-sql=no --enable-wt=no --enable-overlays --with-tls=gnutls
CPPFLAGS="-Wno-format-extra-args"

The relevant parts of slapd.conf are [redacted to anonymize]:

loglevel        1
serverID        001
moduleload      syncprov
TLSCipherSuite          SECURE256:-VERS-SSL3.0
TLSCACertificateFile    /etc/ldap/ssl/ca.pem
TLSCertificateFile      /etc/ldap/ssl/srvA.pem
TLSCertificateKeyFile   /etc/ldap/ssl/srvA.key
syncrepl rid=001
        provider=ldaps://10.0.0.2
        type=refreshAndPersist
        retry="30 10 300 +" 
        searchbase="dc=test"
        attrs="*,+"
        bindmethod=simple
        binddn="cn=root,dc=test"
        credentials="plaintext-password"
        tls_reqcert=never
        keepalive="240:5:10"
mirrormode  TRUE
overlay     syncprov
syncprov-checkpoint 10 1440

Test machine hardware:
CPU: Intel Atom C2558 @ 2.40GHz
Motherboard: Supermicro A1SRM-2558F
NIC: Intel Corporation Ethernet Connection I354 (rev 03)

Software used:
OS: Ubuntu 15.10
Kernel: 4.2.0-30-generic #36-Ubuntu SMP x86_64
OpenLDAP: slapd 2.4.44 and slapd 2.X (from git repo)
GnuTLS: 3.3.15-5ubuntu2

Please let me know if you need further information. Thank you.

Maciej Puzio
Comment 1 Howard Chu 2016-03-12 10:51:31 UTC
mkp37215@gmail.com wrote:
> Full_Name: Maciej Puzio
> Version: 2.4.44 and current sources from git repo
> OS: Ubuntu 15.10 amd64
> URL:
> Submission from: (NULL) (129.112.109.41)
>
>
> While configuring two servers in a dual-master setup I encountered an issue
> causing replication failures, when replication is done on a TLS connection. In
> this particular setup the issue manifested itself by replication retries
> failing, even when the cause of the original failure no longer existed. For
> example, let's consider that dual-master setup consists of servers A and B. If
> server A was started before server B, its on-startup replication would fail (as
> expected, since B is down). Server B would then be started and its replication
> attempt would succeed (since A is already running). However A's subsequent
> retries would fail, even though B was now accessible. If I then restarted server
> A, its replication will succeed, but B will now lose connection to A and all B's
> subsequent retries would fail. This issue has ~50% reproducibility, but if it
> occurs once, then it would continue failing. On occasion both servers would
> replicate fine, but obviously the whole situation was unacceptable on servers
> being prepared for production use.
>
> I tested this issue thoroughly eliminating various hypotheses, and ended up
> debugging OpenLDAP code and finding a bug causing the above behavior. In short,
> on failure slap_client_connect deallocates LDAP object ld and objects pointed by
> it, including ldap_common which contains ld_options. Pointer to ld_options is
> held in sb->sb_tls_ctx and reused in subsequent slap_client_connect calls, at
> this point referencing freed memory.
>
> This issue occurs in OpenLDAP newest stable version 2.4.44, as well as in
> current source code from git repo (as of March 11, 2016), obtained from
> git://git.openldap.org/openldap.git

> Here, instead of freeing ld->ldc (struct ldap_common), I overwrite its contents
> with a magic debug value 0xdeadbeef. I overwrite ldo_tls_require_cert, as an
> incorrect value of this particular variable turned out to be the direct cause of
> replication failures mentioned at the beginning (more about it later). However,
> other fields of ldap_common could be used as well.
>
> File ldap-int.h is included to be able to examine contents of *ld, while the
> declaration of struct tlsg_ctx has been copied verbatim from
> libraries/libldap/tls_g.c, in order to examine the contents of
> ld->ld_options.ldo_tls_ctx. I use GnuTLS library following Ubuntu build options.
> But the issue in question is not GnuTLS-related, though it may cause different
> symptoms with OpenSSL or Mozilla NSS (I did not test these options).

Thanks for the report. Inspection shows that the issue only exists in 
libldap's GnuTLS support. As always, the Project recommends you use OpenSSL.

It also looks like only the ldo_tls_require_cert field is being used so we can 
probably just copy that flag instead of keeping a pointer to the ldap options 
structure.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 Howard Chu 2016-03-12 11:04:31 UTC
> Thanks for the report. Inspection shows that the issue only exists in
> libldap's GnuTLS support. As always, the Project recommends you use OpenSSL.
>
> It also looks like only the ldo_tls_require_cert field is being used so we can
> probably just copy that flag instead of keeping a pointer to the ldap options
> structure.
>
Fixed now in git master.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 3 Howard Chu 2016-03-12 11:05:36 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 4 mkp37215@gmail.com 2016-03-15 00:34:40 UTC
Howard, thank you very much for quickly fixing this issue. My tests show
that the replication is now working fine.
Regarding your recommendation of OpenSSL, that should rather go to OpenLDAP
package maintainers at Ubuntu and Debian. I use what they build and the
choice of the TLS library is not mine.
Comment 5 Quanah Gibson-Mount 2016-03-15 01:04:53 UTC
--On Tuesday, March 15, 2016 1:34 AM +0000 mkp37215@gmail.com wrote:

> --089e0139fc98189d9d052e0b902f
> Content-Type: text/plain; charset=UTF-8
>
> Howard, thank you very much for quickly fixing this issue. My tests show
> that the replication is now working fine.
> Regarding your recommendation of OpenSSL, that should rather go to
> OpenLDAP package maintainers at Ubuntu and Debian. I use what they build
> and the choice of the TLS library is not mine.

The Debian/Ubuntu maintainers are quite aware of the problems with using 
GnuTLS, and maintain they cannot use OpenSSL due to licensing issues, 
despite the fact every other major distribution uses OpenSSL (except RHEL, 
which switched to MozNSS which resulted in much disaster, and is likely 
going to switch back to OpenSSL).

There is no reason you cannot use OpenSSL, regardless of what 
Debian/Ubuntu's broken decisions are.

--Quanah

--

Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration
A division of Synacor, Inc

Comment 6 mkp37215@gmail.com 2016-03-15 18:41:38 UTC
On Mon, Mar 14, 2016 at 8:04 PM, Quanah Gibson-Mount <quanah@zimbra.com> wrote:
>
> The Debian/Ubuntu maintainers are quite aware of the problems with using
> GnuTLS, and maintain they cannot use OpenSSL due to licensing issues,
> despite the fact every other major distribution uses OpenSSL (except RHEL,
> which switched to MozNSS which resulted in much disaster, and is likely
> going to switch back to OpenSSL).
>
> There is no reason you cannot use OpenSSL, regardless of what
> Debian/Ubuntu's broken decisions are.
>
> --Quanah

Quanah, thank you for your reply. However, I do not want to engage in
a political discussion, particularly on the matter that is beyond my
control. It is also out of topic, as the bug discussed here was in
libldap code, and not in GnuTLS. I would like to again thank Howard
for his work on this bug and coming up with a working fix very
quickly. Unless any new issue related to this bug or the fix comes to
light, I think we can consider the matter closed.

Maciej Puzio

Comment 7 Quanah Gibson-Mount 2016-10-17 17:26:45 UTC
changed notes
changed state Test to Release
Comment 8 OpenLDAP project 2017-06-01 22:08:51 UTC
fixed in master
fixed in RE25
fixed in RE24 (2.4.45)
Comment 9 Quanah Gibson-Mount 2017-06-01 22:08:51 UTC
changed notes
changed state Release to Closed