Issue 8997 - openldap-nssov/back ldap segfault
Summary: openldap-nssov/back ldap segfault
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: 2.4.47
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-03-20 21:21 UTC by matt@pallissard.net
Modified: 2019-07-24 19:05 UTC
0 users

See Also:


Attachments
dif.txt (407 bytes, text/plain)
2019-03-28 14:41 UTC, Howard Chu

Description matt@pallissard.net 2019-03-20 21:21:26 UTC
Full_Name: Matthew Pallissard
Version: 2.4.47
OS: Archlinux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (185.236.200.195)


Hi,

We're seeing a lot of segfaults on some of our busier HPC machines. These
typically have hundreds of jobs landing on them at a given time.

We use back-ldap with the nssov overlay and pcache in front of Active Directory.
This is authorization only; authentication is handled via krb5. The relevant
bits of the slightly scrubbed config are at the bottom of this message.


We notice two things:

1. This can be reproduced semi-consistently:
  1. stop slapd, ensure cache is empty
  2. do something dumb like this
    > for i in {1..100}; do ./dumb.sh & done

    > #!/bin/bash
    > # dumb.sh
    > while [[ 1 -ne 2 ]]; do
    >   for i in $(getent passwd  | cut -f 1 -d ':'); do
    >     time id ${i}
    >   done
    > done
  3. start slapd
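
The load in step 2 can also be sketched as a pair of shell functions. This is only an illustration under stated assumptions: `spawn_workers` and `lookup_all_users` are hypothetical names, and unlike the original dumb.sh (which loops forever) each worker makes a single pass, so the run terminates on its own.

```shell
#!/bin/sh
# Hypothetical helper: start N background copies of a command, then wait for all of them.
spawn_workers() {
  sw_count=$1
  shift
  sw_i=0
  while [ "$sw_i" -lt "$sw_count" ]; do
    "$@" &   # '&' already ends the command, so it must not be followed by ';'
    sw_i=$((sw_i + 1))
  done
  wait
}

# One pass over every passwd entry, resolving each account with id(1).
# The original dumb.sh repeats this forever; a single pass keeps the sketch bounded.
lookup_all_users() {
  getent passwd | cut -d ':' -f 1 | while IFS= read -r lu_user; do
    id "$lu_user" >/dev/null 2>&1 || true
  done
}

# Example: 100 concurrent passes, as in the report (uncomment on a test host).
# spawn_workers 100 lookup_all_users
```

Reading the names line by line avoids word splitting on unusual account names, which the `for i in $(...)` form would not.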


2. Turning the log level to 0 /seems/ to make the issue go away. I'll report
back once I can confirm that.
  A note on this: we do have a good handful of 'service accounts' that don't
have all of the POSIX attributes in Active Directory. As such, those entries do
spam the logs a bit.
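
For anyone repeating the log-level experiment on a cn=config setup like the one below, the change could be expressed roughly as follows. This is a sketch, not the reporter's procedure: `loglevel_ldif` is a hypothetical helper, and the commented `ldapmodify` call assumes an ldapi:/// listener with SASL EXTERNAL access to cn=config.

```shell
#!/bin/sh
# Print an LDIF change that sets slapd's global log level (olcLogLevel on cn=config).
loglevel_ldif() {
  printf 'dn: cn=config\nchangetype: modify\nreplace: olcLogLevel\nolcLogLevel: %s\n' "$1"
}

# Example (assumes the ldapi:/// listener and an EXTERNAL authz mapping are configured):
# loglevel_ldif 0 | ldapmodify -Q -Y EXTERNAL -H ldapi:///
```

Printing the LDIF first makes it easy to review the change before piping it to `ldapmodify`.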



# config 2.4.47
dn: cn=module,cn=config
objectClass: olcModuleList
cn: module
olcModulePath: /usr/lib/openldap
olcModuleLoad: pcache
olcModuleLoad: nssov
olcModuleLoad: back_ldap
olcModuleLoad: back_mdb

dn: olcDatabase={1}ldap,cn=config
objectClass: olcDatabaseConfig
objectClass: olcLDAPConfig
olcDatabase: {1}ldap
olcSuffix: dc=ad,dc=domain,dc=edu
olcAddContentAcl: FALSE
olcLastMod: FALSE
olcMaxDerefDepth: 15
olcReadOnly: FALSE
olcSyncUseSubentry: FALSE
olcMonitoring: FALSE
olcRootDN: cn=ldap_rootdn,cn=config
olcDbURI: "ldap://ad.domain.edu"
olcDbStartTLS: none  starttls=no
olcDbACLBind: bindmethod=simple timeout=0 network-timeout=0 binddn="" crede
 ntials="" keepalive=0:0:0
olcDbIDAssertBind: mode=none flags=prescriptive,proxy-authz-non-critical bin
 dmethod=simple timeout=0 network-timeout=0 binddn="" credentials="" keepalive=0:0:0
olcDbIDAssertAuthzFrom: *
olcDbRebindAsUser: FALSE
olcDbChaseReferrals: FALSE
olcDbTFSupport: no
olcDbProxyWhoAmI: FALSE
olcDbProtocolVersion: 3
olcDbSingleConn: FALSE
olcDbCancel: abandon
olcDbUseTemporaryConn: FALSE
olcDbConnectionPoolMax: 16
olcDbSessionTrackingRequest: FALSE
olcDbNoRefs: FALSE
olcDbNoUndefFilter: FALSE
olcDbOnErr: continue
olcDbKeepalive: 0:0:0
structuralObjectClass: olcLDAPConfig

dn: olcOverlay={0}nssov,olcDatabase={1}ldap,cn=config
objectClass: olcOverlayConfig
objectClass: olcNssOvConfig
olcOverlay: {0}nssov
olcNssSsd: group ldap:///dc=ad,dc=domain,dc=edu??sub?(objectClass=posi
 xGroup)
olcNssSsd: passwd ldap:///dc=ad,dc=domain,dc=edu??sub?(objectClass=pos
 ixAccount)
olcNssSsd: shadow ldap:///dc=ad,dc=domain,dc=edu??sub?(objectClass=sha
 dowAccount)
olcNssMap: group uniqueMember member
olcNssMap: passwd gecos title
olcNssMap: passwd homeDirectory unixHomeDirectory
olcNssPam: uid2dn
olcNssPamMinUid: 0
olcNssPamMaxUid: 0
structuralObjectClass: olcNssOvConfig

dn: olcOverlay={1}pcache,olcDatabase={1}ldap,cn=config
objectClass: olcOverlayConfig
objectClass: olcPcacheConfig
olcOverlay: {1}pcache
olcPcache: mdb 1000000 30 1000000 3600
olcPcacheAttrset: 0 uid userPassword uidNumber gidNumber gecos cn homeDirectory loginShell objectClass
olcPcacheAttrset: 1 cn userPassword gidNumber memberUid objectClass member
olcPcacheTemplate: "(&(objectclass=)(|(memberuid=)(member=)))" 1 3600
olcPcacheTemplate: "(&(objectclass=)(|(memberuid=)(uniquemember=)))" 1 3600
olcPcacheTemplate: "(&(objectclass=)(gidnumber=))" 1 3600
olcPcacheTemplate: "(&(objectclass=)(uidnumber=))" 0 3600
olcPcacheTemplate: "(&(objectclass=)(uid=))" 0 3600
olcPcacheTemplate: "(objectclass=)" 0 3600
olcPcacheTemplate: "(objectclass=)" 1 3600
olcPcachePosition: head
olcPcacheMaxQueries: 10000000
olcPcachePersist: FALSE
olcPcacheValidate: FALSE
olcPcacheOffline: TRUE

dn: olcDatabase={0}mdb,olcOverlay={1}pcache,olcDatabase={1}ldap,cn=config
objectClass: olcMdbConfig
objectClass: olcPcacheDatabase
olcDatabase: {0}mdb
olcDbDirectory: /var/lib/openldap/openldap-data
olcDbNoSync: FALSE
olcDbIndex: objectClass eq
olcDbIndex: cn pres,eq,sub
olcDbIndex: uid pres,eq,sub
olcDbIndex: uidNumber eq
olcDbIndex: gidNumber eq
olcDbIndex: memberUid eq
olcDbIndex: sn pres,eq,sub
olcDbIndex: mail pres,eq,sub
olcDbIndex: uniqueMember eq
Comment 1 Howard Chu 2019-03-20 22:01:40 UTC
matt@pallissard.net wrote:

> 2. Turning the log level to 0 /seems/ to make the issue go away. I'll report
> back once I can confirm that.

That would imply the SEGV is specifically due to a NULL pointer being passed in a Debug message.
If that's true, you should be able to reproduce this using whatever accounts aren't fully populated
with POSIX attributes.

>   A note on this; We do have a good handful of 'service accounts' that don't
> have all of the posix attributes in active directory.  As such those entries do
> spam the logs a bit.



-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 matt@pallissard.net 2019-03-21 22:13:30 UTC
On 2019-03-20T22:01:40, Howard Chu wrote:
> matt@pallissard.net wrote:
>
> > 2. Turning the log level to 0 /seems/ to make the issue go away. I'll report
> > back once I can confirm that.
>
> That would imply the SEGV is specifically due to a NULL pointer being passed in a Debug message.
> If that's true, you should be able to reproduce this using whatever accounts aren't fully populated
> with POSIX attributes.
>
> >   A note on this; We do have a good handful of 'service accounts' that don't
> > have all of the posix attributes in active directory.  As such those entries do
> > spam the logs a bit.

I have a backtrace I can submit.  What's the proper way of doing so?  As there is LDAP info in it, I'd rather not have it publicly accessible.
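
One way to reduce what a backtrace exposes before sharing it is to mask DN-like strings and mail addresses first. The filter below is a rough sketch only: `scrub_backtrace` is a hypothetical helper, its patterns are heuristics that may miss data or mask too much, and the result should still be reviewed by hand.

```shell
#!/bin/sh
# Hypothetical scrubber: mask DN-like strings and e-mail addresses in text read from stdin.
# A "DN-like string" here is two or more comma-joined attr=value pairs with no space
# after the comma, so gdb frame arguments such as "op=0x..., ni=0x..." are left alone.
scrub_backtrace() {
  sed -E \
    -e 's/([A-Za-z]+=[^,"]+,)+[A-Za-z]+=[^,"]+/<redacted-dn>/g' \
    -e 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+/<redacted-mail>/g'
}

# Example (file names are placeholders):
# scrub_backtrace < backtrace.txt > backtrace.scrubbed.txt
```

The DN pattern is deliberately greedy: it would rather mask trailing text on the same line than leave part of a DN visible.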

Matt Pallissard
Comment 3 matt@pallissard.net 2019-03-28 13:54:35 UTC
On 2019-03-21T15:13:30, Pallissard, Matthew wrote:
> On 2019-03-20T22:01:40, Howard Chu wrote:
> > matt@pallissard.net wrote:
> >
> > > 2. Turning the log level to 0 /seems/ to make the issue go away. I'll report
> > > back once I can confirm that.

Follow up, turning the log level to 0 does *not* prevent these segfaults.

Matt Pallissard
Comment 4 matt@pallissard.net 2019-03-28 14:16:24 UTC

Actually, now that I'm looking more closely at this output, the last line below looks interesting.

(I probably should have tacked this on the backtrace)
 
Thread 4 "slapd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff0c7c700 (LWP 341)]
nssov_dn2uid (op=0x7ffff0c7b730, ni=0x555555a24420, dn=0x7fffe811eab0, uid=0x7fffe8002450) at passwd.c:137
137     passwd.c: No such file or directory.

Matt Pallissard
Comment 5 matt@pallissard.net 2019-03-28 14:22:46 UTC
> Actually now that I'm looking more closely at this output; the last line below looks interesting.
>
> (I probably should have tacked this on the backtrace)
>
> Thread 4 "slapd" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff0c7c700 (LWP 341)]
> nssov_dn2uid (op=0x7ffff0c7b730, ni=0x555555a24420, dn=0x7fffe811eab0, uid=0x7fffe8002450) at passwd.c:137
> 137     passwd.c: No such file or directory.
>

Sorry, that was from the wrong output, see below for the correct snippet.

Thread 3 "slapd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff277f700 (LWP 170)]
nssov_dn2uid (op=0x7ffff277e730, ni=0x555555a24420, dn=0x7fffe4128430, uid=0x7fffe4002430) at passwd.c:137
137     passwd.c: No such file or directory.
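
The "passwd.c: No such file or directory" line comes from gdb, which could not find the source file to print line 137; it is not an error from slapd. A wrapper along these lines would point gdb at the sources while collecting a full backtrace. It is a sketch under assumptions: `slapd_backtrace` is a hypothetical name, it works from a core file rather than a live session, and all paths in the example are placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper: print a full backtrace of every thread from a core file,
# telling gdb where the nssov sources live so it can display the failing line.
# GDB may be overridden (e.g. GDB=echo) to preview the command without running it.
slapd_backtrace() {
  sb_src=$1
  sb_bin=$2
  sb_core=$3
  "${GDB:-gdb}" -batch \
    -ex "directory $sb_src" \
    -ex "thread apply all bt full" \
    "$sb_bin" "$sb_core"
}

# Example (all three paths are placeholders):
# slapd_backtrace ~/src/openldap-2.4.47/contrib/slapd-modules/nssov /usr/bin/slapd ./core
```

In the OpenLDAP source tree the nssov overlay lives under contrib/slapd-modules/nssov, which is where passwd.c would be found.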

Matt Pallissard
Comment 6 Howard Chu 2019-03-28 14:41:48 UTC
Pallissard, Matthew wrote:
> Sorry, that was from the wrong output, see below for the correct snippet.
> 
> Thread 3 "slapd" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff277f700 (LWP 170)]
> nssov_dn2uid (op=0x7ffff277e730, ni=0x555555a24420, dn=0x7fffe4128430, uid=0x7fffe4002430) at passwd.c:137
> 137     passwd.c: No such file or directory.
> 
> Matt Pallissard
> 
Thanks, that was a crucial piece.

Please try this patch (attached as dif.txt).

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
Comment 7 matt@pallissard.net 2019-03-28 15:17:23 UTC
> Please try this patch.

Right on.  Ran some initial tests, so far so good.

I'll bang on it for a few hours and follow up sometime in the late afternoon.

Thanks a bunch! You guys rock!

Matt Pallissard
Comment 8 Quanah Gibson-Mount 2019-06-06 22:57:34 UTC
changed notes
moved from Incoming to Software Bugs
Comment 9 Quanah Gibson-Mount 2019-06-17 17:26:26 UTC
changed notes
changed state Open to Test
Comment 10 Quanah Gibson-Mount 2019-06-17 17:28:37 UTC
changed notes
changed state Test to Release
Comment 11 OpenLDAP project 2019-07-24 19:05:04 UTC
Fixed in master
Fixed in RE24 (2.4.48)
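
To check whether a given release already contains the fix, the version can be compared against 2.4.48. The helper below is a small sketch: `version_at_least` is a hypothetical name, it relies on `sort -V` being available, and obtaining the installed version string is left to the caller.

```shell
#!/bin/sh
# Succeed if version $1 is greater than or equal to version $2 (relies on 'sort -V').
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: installed_version is a placeholder for the version of the slapd in use.
# version_at_least "$installed_version" 2.4.48 && echo "includes the ITS#8997 fix"
```

Version-aware sorting matters here because a plain string comparison would rank 2.4.9 above 2.4.48.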
Comment 12 Quanah Gibson-Mount 2019-07-24 19:05:04 UTC
changed notes
changed state Release to Closed