Full_Name: Matthew Pallissard
Version: 2.4.47
OS: Archlinux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (185.236.200.195)

Hi,

We're seeing a lot of segfaults on some of our busier HPC machines. These typically have hundreds of jobs landing on them at a given time.

We use back-ldap with an nssov overlay and pcache in front of Active Directory. This is authorization only; authentication is handled via krb5. The relevant bits of slightly scrubbed config are at the bottom of this message.

We notice two things:

1. This can be replicated semi-consistently:
   1. Stop slapd and ensure the cache is empty.
   2. Do something dumb like this:
      > for i in {1..100}; do ./dumb.sh & done

      > #!/bin/bash
      > # dumb.sh
      > while [[ 1 -ne 2 ]]; do
      >     for i in $(getent passwd | cut -f 1 -d ':'); do
      >         time id ${i}
      >     done
      > done
   3. Start slapd.

2. Turning the log level to 0 /seems/ to make the issue go away. I'll report back once I can confirm that.

A note on this: we do have a good handful of 'service accounts' that don't have all of the POSIX attributes in Active Directory. As such, those entries do spam the logs a bit.
# config 2.4.47

dn: cn=module,cn=config
objectClass: olcModuleList
cn: module
olcModulePath: /usr/lib/openldap
olcModuleLoad: pcache
olcModuleLoad: nssov
olcModuleLoad: back_ldap
olcModuleLoad: back_mdb

dn: olcDatabase={1}ldap,cn=config
objectClass: olcDatabaseConfig
objectClass: olcLDAPConfig
olcDatabase: {1}ldap
olcSuffix: dc=ad,dc=domain,dc=edu
olcAddContentAcl: FALSE
olcLastMod: FALSE
olcMaxDerefDepth: 15
olcReadOnly: FALSE
olcSyncUseSubentry: FALSE
olcMonitoring: FALSE
olcRootDN: cn=ldap_rootdn,cn=config
olcDbURI: "ldap://ad.domain.edu"
olcDbStartTLS: none starttls=no
olcDbACLBind: bindmethod=simple timeout=0 network-timeout=0 binddn="" credentials="" keepalive=0:0:0
olcDbIDAssertBind: mode=none flags=prescriptive,proxy-authz-non-critical bindmethod=simple timeout=0 network-timeout=0 binddn="" credentials="" keepalive=0:0:0
olcDbIDAssertAuthzFrom: *
olcDbRebindAsUser: FALSE
olcDbChaseReferrals: FALSE
olcDbTFSupport: no
olcDbProxyWhoAmI: FALSE
olcDbProtocolVersion: 3
olcDbSingleConn: FALSE
olcDbCancel: abandon
olcDbUseTemporaryConn: FALSE
olcDbConnectionPoolMax: 16
olcDbSessionTrackingRequest: FALSE
olcDbNoRefs: FALSE
olcDbNoUndefFilter: FALSE
olcDbOnErr: continue
olcDbKeepalive: 0:0:0
structuralObjectClass: olcLDAPConfig

dn: olcOverlay={0}nssov,olcDatabase={1}ldap,cn=config
objectClass: olcOverlayConfig
objectClass: olcNssOvConfig
olcOverlay: {0}nssov
olcNssSsd: group ldap:///dc=ad,dc=domain,dc=edu??sub?(objectClass=posixGroup)
olcNssSsd: passwd ldap:///dc=ad,dc=domain,dc=edu??sub?(objectClass=posixAccount)
olcNssSsd: shadow ldap:///dc=ad,dc=domain,dc=edu??sub?(objectClass=shadowAccount)
olcNssMap: group uniqueMember member
olcNssMap: passwd gecos title
olcNssMap: passwd homeDirectory unixHomeDirectory
olcNssPam: uid2dn
olcNssPamMinUid: 0
olcNssPamMaxUid: 0
structuralObjectClass: olcNssOvConfig

dn: olcOverlay={1}pcache,olcDatabase={1}ldap,cn=config
objectClass: olcOverlayConfig
objectClass: olcPcacheConfig
olcOverlay: {1}pcache
olcPcache: mdb 1000000 30 1000000 3600
olcPcacheAttrset: 0 uid userPassword uidNumber gidNumber gecos cn homeDirectory loginShell objectClass
olcPcacheAttrset: 1 cn userPassword gidNumber memberUid objectClass member
olcPcacheTemplate: "(&(objectclass=)(|(memberuid=)(member=)))" 1 3600
olcPcacheTemplate: "(&(objectclass=)(|(memberuid=)(uniquemember=)))" 1 3600
olcPcacheTemplate: "(&(objectclass=)(gidnumber=))" 1 3600
olcPcacheTemplate: "(&(objectclass=)(uidnumber=))" 0 3600
olcPcacheTemplate: "(&(objectclass=)(uid=))" 0 3600
olcPcacheTemplate: "(objectclass=)" 0 3600
olcPcacheTemplate: "(objectclass=)" 1 3600
olcPcachePosition: head
olcPcacheMaxQueries: 10000000
olcPcachePersist: FALSE
olcPcacheValidate: FALSE
olcPcacheOffline: TRUE

dn: olcDatabase={0}mdb,olcOverlay={1}pcache,olcDatabase={1}ldap,cn=config
objectClass: olcMdbConfig
objectClass: olcPcacheDatabase
olcDatabase: {0}mdb
olcDbDirectory: /var/lib/openldap/openldap-data
olcDbNoSync: FALSE
olcDbIndex: objectClass eq
olcDbIndex: cn pres,eq,sub
olcDbIndex: uid pres,eq,sub
olcDbIndex: uidNumber eq
olcDbIndex: gidNumber eq
olcDbIndex: memberUid eq
olcDbIndex: sn pres,eq,sub
olcDbIndex: mail pres,eq,sub
olcDbIndex: uniqueMember eq
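
The reproduction steps in the report can be condensed into one bounded script. This is only a sketch of the same idea (many concurrent workers resolving every user through NSS), not the reporter's exact script: the function names and the worker and round counts are hypothetical knobs added so the load stops by itself, whereas the original dumb.sh loops forever.

```shell
#!/bin/bash
# Bounded variant of the dumb.sh stress loop from the report.
# Function names and the workers/rounds parameters are hypothetical.

# Print every user name known to NSS, one per line.
list_users() {
    getent passwd | cut -f 1 -d ':'
}

# Resolve every user "rounds" times; the output of id is discarded.
lookup_worker() {
    local rounds=$1 r user
    for ((r = 0; r < rounds; r++)); do
        while IFS= read -r user; do
            id "$user" > /dev/null 2>&1
        done < <(list_users)
    done
}

# Start "workers" background workers and wait until all have finished.
run_stress() {
    local workers=${1:-100} rounds=${2:-10} w
    for ((w = 0; w < workers; w++)); do
        lookup_worker "$rounds" &
    done
    wait
}
```

To follow the order given in the report, source the file, stop slapd, empty the cache, call something like `run_stress 100 1000`, and then start slapd. The round count should be large enough that the workers are still running when slapd comes back up.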
matt@pallissard.net wrote:
> 2. Turning the log level to 0 /seems/ to make the issue go away. I'll report
> back once I can confirm that.

That would imply the SEGV is specifically due to a NULL pointer being passed in a Debug message.
If that's true, you should be able to reproduce this using whatever accounts aren't fully populated
with POSIX attributes.

> A note on this; We do have a good handful of 'service accounts' that don't
> have all of the posix attributes in active directory. As such those entries do
> spam the logs a bit.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
On 2019-03-20T22:01:40, Howard Chu wrote:
> matt@pallissard.net wrote:
>
> > 2. Turning the log level to 0 /seems/ to make the issue go away. I'll report
> > back once I can confirm that.
>
> That would imply the SEGV is specifically due to a NULL pointer being passed in a Debug message.
> If that's true, you should be able to reproduce this using whatever accounts aren't fully populated
> with POSIX attributes.
>
> > A note on this; We do have a good handful of 'service accounts' that don't
> > have all of the posix attributes in active directory. As such those entries do
> > spam the logs a bit.

I have a backtrace I can submit. What's the proper way of doing so? As there is LDAP info in it, I'd rather not have it publicly accessible.

Matt Pallissard
On 2019-03-21T15:13:30, Pallissard, Matthew wrote:
> On 2019-03-20T22:01:40, Howard Chu wrote:
> > matt@pallissard.net wrote:
> >
> > > 2. Turning the log level to 0 /seems/ to make the issue go away. I'll report
> > > back once I can confirm that.

Follow up, turning the log level to 0 does *not* prevent these segfaults.

Matt Pallissard
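
For anyone repeating the log-level test on a cn=config setup like the one posted, the level is normally changed through the olcLogLevel attribute on cn=config. The thread does not say how the reporter changed it, so the following is a sketch under assumptions: `loglevel_ldif` is a hypothetical helper name, and piping to ldapmodify over ldapi:/// with SASL EXTERNAL assumes local root access to cn=config.

```shell
#!/bin/bash
# Hypothetical helper: print an LDIF change that sets olcLogLevel on cn=config.
# The first argument is the desired level and defaults to 0.
loglevel_ldif() {
    local level=${1:-0}
    cat <<EOF
dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: ${level}
EOF
}

# Usage (assumes ldapi:/// with SASL EXTERNAL is permitted for cn=config):
#   loglevel_ldif 0 | ldapmodify -Y EXTERNAL -H ldapi:///
```

Keeping the LDIF generation separate from the ldapmodify call makes it easy to inspect the change before applying it.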
On 2019-03-28T06:54:35, Pallissard, Matthew wrote:
> On 2019-03-21T15:13:30, Pallissard, Matthew wrote:
> > On 2019-03-20T22:01:40, Howard Chu wrote:
> > > matt@pallissard.net wrote:
> > >
> > > > 2. Turning the log level to 0 /seems/ to make the issue go away. I'll report
> > > > back once I can confirm that.
>
> Follow up, turning the log level to 0 does *not* prevent these segfaults.
>
> Matt Pallissard

Actually now that I'm looking more closely at this output; the last line below looks interesting.

(I probably should have tacked this on the backtrace)

Thread 4 "slapd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff0c7c700 (LWP 341)]
nssov_dn2uid (op=0x7ffff0c7b730, ni=0x555555a24420, dn=0x7fffe811eab0, uid=0x7fffe8002450) at passwd.c:137
137	passwd.c: No such file or directory.

Matt Pallissard
> Actually now that I'm looking more closely at this output; the last line below looks interesting.
>
> (I probably should have tacked this on the backtrace)
>
> Thread 4 "slapd" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff0c7c700 (LWP 341)]
> nssov_dn2uid (op=0x7ffff0c7b730, ni=0x555555a24420, dn=0x7fffe811eab0, uid=0x7fffe8002450) at passwd.c:137
> 137	passwd.c: No such file or directory.
>

Sorry, that was from the wrong output, see below for the correct snippet.

Thread 3 "slapd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff277f700 (LWP 170)]
nssov_dn2uid (op=0x7ffff277e730, ni=0x555555a24420, dn=0x7fffe4128430, uid=0x7fffe4002430) at passwd.c:137
137	passwd.c: No such file or directory.

Matt Pallissard
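
Output like the above can be captured non-interactively by running slapd under gdb in batch mode. What follows is a sketch under assumptions, not the procedure the reporter used: the SLAPD_BIN variable, the file names, and both function names are hypothetical. The scrub step blanks every double-quoted string, which is how gdb prints char* values such as DNs, while leaving frame names, addresses, and line numbers alone.

```shell
#!/bin/bash
# Sketch only: SLAPD_BIN and both function names are hypothetical choices.

# Run slapd under gdb in batch mode and dump all thread backtraces on a crash.
# "-d 0" keeps slapd in the foreground; extra slapd options are passed via "$@".
capture_backtrace() {
    local out=$1
    shift
    gdb -batch \
        -ex run \
        -ex 'thread apply all bt full' \
        --args "${SLAPD_BIN:-slapd}" -d 0 "$@" > "$out" 2>&1
}

# Replace every double-quoted string in a gdb log with a placeholder.
# Strings containing escaped quotes are not handled; review the result by hand.
scrub_backtrace() {
    sed -E 's/"[^"]*"/"<redacted>"/g'
}
```

A possible invocation, with a hypothetical config directory, is `capture_backtrace slapd-bt.txt -F /etc/openldap/slapd.d` followed by `scrub_backtrace < slapd-bt.txt > slapd-bt.scrubbed.txt`. The "passwd.c: No such file or directory" line only means gdb could not find the source file; the function name and line number in the frame are still usable.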
Pallissard, Matthew wrote:
>> Actually now that I'm looking more closely at this output; the last line below looks interesting.
>>
>> (I probably should have tacked this on the backtrace)
>>
>> Thread 4 "slapd" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7ffff0c7c700 (LWP 341)]
>> nssov_dn2uid (op=0x7ffff0c7b730, ni=0x555555a24420, dn=0x7fffe811eab0, uid=0x7fffe8002450) at passwd.c:137
>> 137	passwd.c: No such file or directory.
>>
>
> Sorry, that was from the wrong output, see below for the correct snippet.
>
> Thread 3 "slapd" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff277f700 (LWP 170)]
> nssov_dn2uid (op=0x7ffff277e730, ni=0x555555a24420, dn=0x7fffe4128430, uid=0x7fffe4002430) at passwd.c:137
> 137	passwd.c: No such file or directory.
>
> Matt Pallissard
>

Thanks, that was a crucial piece. Please try this patch.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
> Please try this patch.

Right on. Ran some initial tests, so far so good. I'll bang on it for a few hours and follow up sometime in the late afternoon.

Thanks a bunch! You guys rock!

Matt Pallissard
changed notes moved from Incoming to Software Bugs
changed notes changed state Open to Test
changed notes changed state Test to Release
Fixed in master
Fixed in RE24 (2.4.48)
changed notes changed state Release to Closed