Issue 8648 - sasl_client_init called concurrently causes issues
Summary: sasl_client_init called concurrently causes issues
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-28 04:43 UTC by Ryan Tandy
Modified: 2017-08-10 23:13 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Ryan Tandy 2017-04-28 04:43:33 UTC
Full_Name: Ryan Tandy
Version: RE24 (66d107e)
OS: Debian
URL: 
Submission from: (NULL) (24.68.41.160)
Submitted by: ryan


Test program uploaded to the FTP site:

ftp://ftp.openldap.org/incoming/20170427_rtandy_sasltest.c

I have built cyrus-sasl and openldap myself from clean sources in order to rule
out issues introduced by Debian patches. My environment is Debian unstable with
OpenSSL 1.0.2k.

cyrus-sasl master at 04dd838
./autogen.sh CFLAGS="-g -O0" --with-devrandom=/dev/urandom
make
sudo make install
sudo ldconfig
export SASL_PATH=/usr/local/lib/sasl2

openldap RE24 at 66d107e
./configure CFLAGS="-g -O0" --disable-backends --enable-mdb --disable-overlays
--with-cyrus-sasl --with-tls=openssl
make
sudo make STRIP= install
sudo ldconfig

cat > slapd.conf << 'eof'
include servers/slapd/schema/core.schema
include servers/slapd/schema/cosine.schema

sasl-secprops none
authz-regexp uid=(.*),cn=.*,cn=auth ldap:///dc=example,dc=com??sub?(uid=$1)

database mdb
directory .
suffix dc=example,dc=com
rootdn cn=root,dc=example,dc=com
index objectClass eq

eof

/usr/local/sbin/slapadd -f slapd.conf << eof
dn: dc=example,dc=com
objectClass: domain
dc: example

dn: uid=admin,dc=example,dc=com
objectClass: account
objectClass: simpleSecurityObject
uid: admin
userPassword: admin

eof

/usr/local/libexec/slapd -h ldap://:9000 -f slapd.conf -s0 -dstats

# allow PLAIN
export LDAPSASL_SECPROPS=none

# verify authz-regexp. should return uid=admin,dc=example,dc=com
ldapwhoami -H ldap://:9000 -Y PLAIN -U admin -w admin

# edit sasltest.c and change defines as needed. remove -DSASL to use simple
bind
cc -g -DSASL sasltest.c -pthread -llber -lldap -lsasl2 -o sasltest
./sasltest

On my system, most runs of ./sasltest result in errors like:

rc = -6 (Unknown authentication method)
sasltest: sasltest.c:70: bind_thread: Assertion `rc == LDAP_SUCCESS' failed.
Aborted

With no concurrency (THREADS defined to 1), I see no errors.

I believe this is due to sasl_client_init being called from multiple threads
concurrently. I suspect the global mech list is getting mucked with.

As a proof of concept, I patched libldap to call sasl_client_init during its
global init:

ftp://ftp.openldap.org/incoming/20170427_rtandy_call-sasl_client_init-in-global-init.patch

With this change, I see no errors.

If I'm right about this bug, I suppose the right fix is to wrap sasl_client_init
in a new mutex. I'll post a patch for review soon.
Comment 1 Ryan Tandy 2017-04-28 14:52:34 UTC
On Fri, Apr 28, 2017 at 04:43:33AM +0000, ryan@openldap.org wrote:
>If I'm right about this bug, I suppose the right fix is to wrap sasl_client_init
>in a new mutex.

... of course that idea's half-baked, because libldap doesn't *have* 
mutexes. hmm.

I tried cyrus-sasl-2.1.25 and the issue doesn't seem to happen. I'll see 
if I can isolate the related change.

Comment 2 Ryan Tandy 2017-05-02 05:52:59 UTC
On Fri, Apr 28, 2017 at 02:52:44PM +0000, ryan@nardis.ca wrote:
>I tried cyrus-sasl-2.1.25 and the issue doesn't seem to happen. I'll see
>if I can isolate the related change.

SASL version was a red herring. I accidentally linked with Debian's 
libldap (2.4.31) when I tested that. Mea culpa.

The culprit does indeed seem to be the ITS#6798 change, as Quanah 
suggested. Reverting b483ee1a ("Drop ldap_int_sasl_mutex") on RE24 and 
linking with libldap_r makes my test program work.

Since the proof-of-concept patch I posted above also makes my test 
program work, I'm back to thinking that calling sasl_client_init 
concurrently is the real bug here.

>On Fri, Apr 28, 2017 at 04:43:33AM +0000, ryan@openldap.org wrote:
>>If I'm right about this bug, I suppose the right fix is to wrap sasl_client_init
>>in a new mutex.
>
>... of course that idea's half-baked, because libldap doesn't *have*
>mutexes. hmm.

I'm still not happy with the idea of calling sasl_client_init from 
ldap_int_initialize, as it does a bunch of work (loading plugins and 
such) that non-SASL users don't care about. However, I haven't as yet 
thought of another fix that works for libldap as well as libldap_r.

Would love to hear others' thoughts on this. Maybe fixing for libldap_r 
only (and documenting that) is a good enough compromise?

Comment 3 Howard Chu 2017-05-02 19:32:58 UTC
ryan@nardis.ca wrote:
> On Fri, Apr 28, 2017 at 02:52:44PM +0000, ryan@nardis.ca wrote:
>> I tried cyrus-sasl-2.1.25 and the issue doesn't seem to happen. I'll see
>> if I can isolate the related change.
>
> SASL version was a red herring. I accidentally linked with Debian's
> libldap (2.4.31) when I tested that. Mea culpa.
>
> The culprit does indeed seem to be the ITS#6798 change, as Quanah
> suggested. Reverting b483ee1a ("Drop ldap_int_sasl_mutex") on RE24 and
> linking with libldap_r makes my test program work.
>
> Since the proof-of-concept patch I posted above also makes my test
> program work, I'm back to thinking that calling sasl_client_init
> concurrently is the real bug here.
>
>> On Fri, Apr 28, 2017 at 04:43:33AM +0000, ryan@openldap.org wrote:
>>> If I'm right about this bug, I suppose the right fix is to wrap sasl_client_init
>>> in a new mutex.
>>
>> ... of course that idea's half-baked, because libldap doesn't *have*
>> mutexes. hmm.
>
> I'm still not happy with the idea of calling sasl_client_init from
> ldap_int_initialize, as it does a bunch of work (loading plugins and
> such) that non-SASL users don't care about. However, I haven't as yet
> thought of another fix that works for libldap as well as libldap_r.
>
> Would love to hear others' thoughts on this. Maybe fixing for libldap_r
> only (and documenting that) is a good enough compromise?

Yes, if the problem only exists in concurrent use then it's only relevant to 
libldap_r.


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 4 Ryan Tandy 2017-05-05 03:31:25 UTC
changed notes
Comment 5 Ryan Tandy 2017-05-07 21:58:34 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 6 Ryan Tandy 2017-05-09 15:47:29 UTC
An Ubuntu user confirmed that this change solved their issue with slapd 
and multiple syncrepl clients using GSSAPI.

https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/1688575/comments/6

Comment 7 OpenLDAP project 2017-08-10 23:13:27 UTC
introduced by ITS#6798
fixed in master
fixed in RE24 (2.4.45)
Comment 8 Quanah Gibson-Mount 2017-08-10 23:13:27 UTC
changed notes
changed state Test to Closed