Issue 9288 - back-ldap triggers assert in bind.c
Summary: back-ldap triggers assert in bind.c
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: backends (show other issues)
Version: 2.4.49
Hardware: x86_64 Linux
: --- blocker
Target Milestone: 2.5.3
Assignee: Ondřej Kuzník
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-07-11 08:34 UTC by tekkitan
Modified: 2021-04-01 04:07 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description tekkitan 2020-07-11 08:34:52 UTC
System: t3.small AWS instance 2VCPU 2GB RAM
OS: Ubuntu 20.04 (was happening on 16.04 as well)
Package Version: 2.4.49+dfsg-2ubuntu1.2

We've been having this weird problem with slapd between Ubuntu 16.04 and Ubuntu 20.04 where slapd will suddenly stop according to syslog but will also generate a crash file within apport. We thought it was due to some weird stuff we were doing in the 16.04 version (2.4.42+dfsg-2ubuntu3.8), but we deployed a new LDAP proxy in 20.04 (2.4.49+dfsg-2ubuntu1.2) which appears to have the same issue so I am reporting it and hoping for guidance as we use this proxy for our corporate VPN.

Below is the backtrace which was the same from both proxy systems, but this one is from the 20.04 version as that is our current system in use:

root@useldap02:~/_usr_sbin_slapd.0.crash# apport-retrace --stdout /var/crash/_usr_sbin_slapd.0.crash
--- stack trace ---
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
        set = {__val = {0, 140488162916432, 140488307736576, 140487911640016, 140487911640117, 140487911640016, 140487911640016, 140487911640147, 140487911640316, 140487911640016, 140487911640316, 0, 0, 0, 0, 0}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#1  0x00007fc5f3043859 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x2, sa_sigaction = 0x2}, sa_mask = {__val = {131, 4, 99, 0, 0, 140488164070405, 0, 21474836480, 140488121483504, 0, 140488164102160, 0, 6703301646461552640, 140488164070405, 140488165695488, 140488164087176}}, sa_flags = -235332728, sa_restorer = 0xbf}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007fc5f3043729 in __assert_fail_base (fmt=0x7fc5f31d9588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7fc5f1f91953 "!LDAP_BACK_CONN_TAINTED( lc )", file=0x7fc5f1f91b88 "../../../../../servers/slapd/back-ldap/bind.c", line=191, function=<optimized out>) at assert.c:92
        str = 0x7fc5c410a280 "@Q\021\344\305\177"
        total = 4096
#3  0x00007fc5f3054f36 in __GI___assert_fail (assertion=0x7fc5f1f91953 "!LDAP_BACK_CONN_TAINTED( lc )", file=0x7fc5f1f91b88 "../../../../../servers/slapd/back-ldap/bind.c", line=191, function=0x7fc5f1f921a0 "ldap_back_conn_delete") at assert.c:101
No locals.
#4  0x00007fc5f1f807c7 in ldap_back_conn_delete () from /usr/lib/ldap/back_ldap-2.4.so.2
No symbol table info available.
#5  0x00007fc5f1f814a4 in ?? () from /usr/lib/ldap/back_ldap-2.4.so.2
No symbol table info available.
#6  0x00007fc5f1f815ad in ldap_back_release_conn_lock () from /usr/lib/ldap/back_ldap-2.4.so.2
No symbol table info available.
#7  0x00007fc5f1f833bc in ldap_back_retry () from /usr/lib/ldap/back_ldap-2.4.so.2
No symbol table info available.
#8  0x00007fc5f1f7ec0b in ldap_back_search () from /usr/lib/ldap/back_ldap-2.4.so.2
No symbol table info available.
#9  0x000055b87105cc88 in overlay_op_walk ()
No symbol table info available.
#10 0x000055b87105cdb7 in ?? ()
No symbol table info available.
#11 0x000055b870fef80d in fe_op_search ()
No symbol table info available.
#12 0x000055b870fef034 in do_search ()
No symbol table info available.
#13 0x000055b870fec6ed in ?? ()
No symbol table info available.
#14 0x000055b870fed22c in ?? ()
No symbol table info available.
#15 0x00007fc5f32e7a03 in ?? () from /lib/x86_64-linux-gnu/libldap_r-2.4.so.2
No symbol table info available.
#16 0x00007fc5f3219609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140488121493248, -8220430158823943899, 140488129874654, 140488129874655, 140488129874784, 140488121490880, 8241811018110734629, 8241811687867814181}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#17 0x00007fc5f3140103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

This is usually what we see within syslog when slapd stops:

Jul 11 00:49:32 useldap02 slapd[5257]: conn=1035 op=239 SRCH base="ou=people,dc=company,dc=com" scope=2 deref=0 filter="(&(|(objectClass=person)(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(?objectClass=fw1Person))(uid=exampleusername))"
Jul 11 00:49:32 useldap02 slapd[5257]: conn=1035 op=239 SRCH attr=cn uid sn mail proxyAddresses userPrincipalName fullName displayName description objectclass fw1hour-range-from fw1hour-range-to fw1expiration-date fw1day fw1allowed-dst fw1allowed-src fw1auth-method userAccountControl fw1userPwdPolicy mobile fw1BadPwdCount fw1lastLoginFailure fw1pwdLastMod fw1auth-server fw1auth-server fw1groupTemplate fw1sr-auth-track fw1enc-methods fw1ISAKMP-EncMethod fw1ISAKMP-AuthMethods fw1ISAKMP-HashMethods fw1ISAKMP-Transform fw1ISAKMP-DataIntegrityMethod fw1ISAKMP-SharedSecret fw1ISAKMP-DataEncMethod givenName surname
Jul 11 00:49:32 useldap02 slapd[5257]: conn=1035 op=240 SRCH base="ou=groups,dc=company,dc=com" scope=2 deref=0 filter="(&(|(objectClass=person)(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(?objectClass=fw1Person))(uid=exampleusername))"
Jul 11 00:49:32 useldap02 slapd[5257]: conn=1035 op=240 SRCH attr=cn uid sn mail proxyAddresses userPrincipalName fullName displayName description objectclass fw1hour-range-from fw1hour-range-to fw1expiration-date fw1day fw1allowed-dst fw1allowed-src fw1auth-method userAccountControl fw1userPwdPolicy mobile fw1BadPwdCount fw1lastLoginFailure fw1pwdLastMod fw1auth-server fw1auth-server fw1groupTemplate fw1sr-auth-track fw1enc-methods fw1ISAKMP-EncMethod fw1ISAKMP-AuthMethods fw1ISAKMP-HashMethods fw1ISAKMP-Transform fw1ISAKMP-DataIntegrityMethod fw1ISAKMP-SharedSecret fw1ISAKMP-DataEncMethod givenName surname
Jul 11 00:49:34 useldap02 slapd[28099]:  * Stopping OpenLDAP slapd
Jul 11 00:49:34 useldap02 slapd[28099]:    ...done.
Jul 11 00:49:34 useldap02 systemd[1]: slapd.service: Succeeded.


Note that a lot of these attributes being searched for (all the fw1* attributes) do not exist within our LDAP. Not sure if that would be the cause.

Thank you for any help that can be provided.
Comment 1 Quanah Gibson-Mount 2020-07-14 15:53:07 UTC
The backtrace shows it hit an assert:

assertion=0x7fc5f1f91953 "!LDAP_BACK_CONN_TAINTED( lc )"

which as the backtrace notes, is from line 191 in bind.c for the ldap backend:

 190         if ( LDAP_BACK_CONN_CACHED( lc ) ) {
 191             assert( !LDAP_BACK_CONN_TAINTED( lc ) );
 192             tmplc = avl_delete( &li->li_conninfo.lai_tree, (caddr_t)lc,
 193                 ldap_back_conndnlc_cmp );
 194             assert( tmplc == lc );
 195             LDAP_BACK_CONN_CACHED_CLEAR( lc );
 196         }


Interestingly, this entire function is unchanged since 2007.
Comment 2 Quanah Gibson-Mount 2021-01-28 17:52:02 UTC
May be duplicate of ITS#9400. Please see if this still occurs with OpenLDAP 2.4.57, thanks.
Comment 3 tero.saarni 2021-02-09 07:38:23 UTC
I might have bumped into a variant of this problem.  In my case, I have a reproducible test for this, and it reproduces crash even after #9400 is applied.

The assert can be triggered by sending InvalidCredentials response from the remote server after back-ldap retries the connection and binds again.

This bug might possibly related to another bug that I'm observing: back-ldap retries bind as anonymous even if `rebind-as-user` is set to yes.  I did not find existing issue for this in bugs.openldap.org yet.


Log file has:

slapd: bind.c:191: ldap_back_conn_delete: Assertion `!LDAP_BACK_CONN_TAINTED( lc )' failed.


And backtrace from gdb:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f587d350859 in __GI_abort () at abort.c:79
#2  0x00007f587d350729 in __assert_fail_base (fmt=0x7f587d4e6588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x55740debf839 "!LDAP_BACK_CONN_TAINTED( lc )", file=0x55740debf7a0 "bind.c", line=191, function=<optimized out>) at assert.c:92
#3  0x00007f587d361f36 in __GI___assert_fail (assertion=0x55740debf839 "!LDAP_BACK_CONN_TAINTED( lc )", file=0x55740debf7a0 "bind.c", line=191, 
    function=0x55740debfff0 <__PRETTY_FUNCTION__.12907> "ldap_back_conn_delete") at assert.c:101
#4  0x000055740dd857a1 in ldap_back_conn_delete (li=0x55740e2a4c90, lc=0x7f586c102fa0) at bind.c:191
#5  0x000055740dd862ad in ldap_back_freeconn (li=0x55740e2a4c90, lc=0x7f586c102fa0, dolock=0) at bind.c:510
#6  0x000055740dd8a795 in ldap_back_retry (lcp=0x7f587ae77810, op=0x7f5870000bb0, rs=0x7f587ae78a30, sendok=LDAP_BACK_DONTSEND) at bind.c:2035
#7  0x000055740dd48efc in ldap_back_search (op=0x7f5870000bb0, rs=0x7f587ae78a30) at search.c:579
#8  0x000055740dc91a31 in fe_op_search (op=0x7f5870000bb0, rs=0x7f587ae78a30) at search.c:406
#9  0x000055740dc91229 in do_search (op=0x7f5870000bb0, rs=0x7f587ae78a30) at search.c:247
#10 0x000055740dc8d6a7 in connection_operation (ctx=0x7f587ae78b90, arg_v=0x7f5870000bb0) at connection.c:1168
#11 0x000055740dc8ddd4 in connection_read_thread (ctx=0x7f587ae78b90, argv=0xb) at connection.c:1319
#12 0x000055740de82e10 in ldap_int_thread_pool_wrapper (xpool=0x55740e243540) at tpool.c:1051
#13 0x00007f587d526609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007f587d44d293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Comment 4 Ondřej Kuzník 2021-02-09 12:05:06 UTC
On Tue, Feb 09, 2021 at 07:38:23AM +0000, openldap-its@openldap.org wrote:
> I might have bumped into a variant of this problem.  In my case, I have a
> reproducible test for this, and it reproduces crash even after #9400 is
> applied.

Hi Tero,
thanks for investigating this, would you be able to provide the test
script here or propose it as an addition to tests/data/regressions?

> The assert can be triggered by sending InvalidCredentials response from the
> remote server after back-ldap retries the connection and binds again.
> 
> This bug might possibly related to another bug that I'm observing: back-ldap
> retries bind as anonymous even if `rebind-as-user` is set to yes.  I did not
> find existing issue for this in bugs.openldap.org yet.

I haven't been able to find a bug referencing this directly, but seem to
remember there was a suggestion there were issues reconnecting, maybe a
part of a larger issue that manifested as ITS#9205 and ITS#8427?

Thanks,
Comment 5 Quanah Gibson-Mount 2021-02-09 16:03:04 UTC
Test case MR: https://git.openldap.org/openldap/openldap/-/merge_requests/223
Comment 6 tero.saarni 2021-02-09 16:38:46 UTC
Here is a notice of origin and rights statement for the merge request

The attached patch file is derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following patch(es) were developed by Tero Saarni tero.saarni@est.tech. I have not assigned rights and/or interest in this work to any party.

Ericsson Software Technology AB hereby place the following modifications to OpenLDAP Software (and only these modifications) into the public domain. Hence, these modifications may be freely used and/or redistributed for any purpose with or without attribution and/or other notice.
Comment 7 Quanah Gibson-Mount 2021-03-02 20:49:43 UTC
Commits: 
  • 1e3e6cdd 
by Tero Saarni at 2021-03-02T20:00:55+00:00 
ITS#9288 crash if back-ldap rebind fails


  • 99efeda0 
by Ondřej Kuzník at 2021-03-02T20:00:55+00:00 
ITS#9288 Do no change tainted status on failed retry

It seems refcnt == 0 connections are tainted only if they were just
removed from cache.