Issue 7636 - slapd crash when multi-master replication (syncrepl) enabled
Summary: slapd crash when multi-master replication (syncrepl) enabled
Status: RESOLVED PARTIAL
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: 2.4.31
Hardware: All All
Assignee: OpenLDAP project
Reported: 2013-07-02 19:16 UTC by kb9vqf@pearsoncomputing.net
Modified: 2014-10-14 11:40 UTC (History)
Attachments
openldap_syncprov_plugin_crash_fix.diff (764 bytes, patch)
2013-07-09 17:51 UTC, kb9vqf@pearsoncomputing.net
diff.txt (942 bytes, text/plain)
2013-07-09 22:56 UTC, Howard Chu

Description kb9vqf@pearsoncomputing.net 2013-07-02 19:16:11 UTC
Full_Name: Timothy Pearson
Version: 2.4.31
OS: Debian Wheezy
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (131.156.2.26)


The setup:
Multi-master syncrepl on two servers
Identical hardware and software between servers
Self-signed TLS using common (private) CA certificate

The problem:
slapd on one server crashes repeatably within a minute of slapd starting on the
other server.  slapd works reliably if and only if the other server is not
running a slapd process.


Backtrace (does not change appreciably from crash to crash):

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe356c700 (LWP 10433)]
0x00007ffff5a32475 in *__GI_raise (sig=<optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff5a32475 in *__GI_raise (sig=<optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff5a356f0 in *__GI_abort () at abort.c:92
#2  0x00007ffff5a6d52b in __libc_message (do_abort=<optimized out>,
fmt=<optimized out>) at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#3  0x00007ffff5a76d76 in malloc_printerr (action=3, str=0x7ffff5b4f170
"munmap_chunk(): invalid pointer", ptr=<optimized out>) at malloc.c:6283
#4  0x00007ffff63d214a in slapi_op_search_callback (op=0x7fffe3569fd0,
rs=<optimized out>, prc=<optimized out>) at
../../../../../servers/slapd/slapi/slapi_overlay.c:313
#5  slapi_op_search_callback (op=0x7fffe3569fd0, rs=<optimized out>,
prc=<optimized out>) at ../../../../../servers/slapd/slapi/slapi_overlay.c:296
#6  0x00007ffff63d304f in slapi_op_func (rs=0x7fffe3569f60, op=<optimized out>)
at ../../../../../servers/slapd/slapi/slapi_overlay.c:631
#7  slapi_op_func (op=0x7fffe3569fd0, rs=0x7fffe3569f60) at
../../../../../servers/slapd/slapi/slapi_overlay.c:556
#8  0x00005555555ff18a in overlay_op_walk (op=op@entry=0x7fffe3569fd0,
rs=0x7fffe3569f60, which=op_search, oi=0x5555559e57f0, on=0x5555559e4510) at
../../../../servers/slapd/backover.c:661
#9  0x00005555555ff31b in over_op_func (op=0x7fffe3569fd0, rs=<optimized out>,
which=<optimized out>) at ../../../../servers/slapd/backover.c:723
#10 0x00007ffff1c4870e in syncprov_findbase (op=op@entry=0x555555dd8560,
fc=fc@entry=0x7fffe356a280) at
../../../../../servers/slapd/overlays/syncprov.c:453
#11 0x00007ffff1c4b1ef in syncprov_op_search (op=0x555555dd8560,
rs=0x7fffe356ba50) at ../../../../../servers/slapd/overlays/syncprov.c:2465
#12 0x00005555555ff18a in overlay_op_walk (op=op@entry=0x555555dd8560,
rs=rs@entry=0x7fffe356ba50, which=which@entry=op_search, oi=0x5555559e57f0,
on=0x5555559e15e0) at ../../../../servers/slapd/backover.c:661
#13 0x00007ffff63d3086 in slapi_op_func (rs=0x7fffe356ba50, op=<optimized out>)
at ../../../../../servers/slapd/slapi/slapi_overlay.c:647
#14 slapi_op_func (op=0x555555dd8560, rs=0x7fffe356ba50) at
../../../../../servers/slapd/slapi/slapi_overlay.c:556
#15 0x00005555555ff18a in overlay_op_walk (op=op@entry=0x555555dd8560,
rs=0x7fffe356ba50, which=op_search, oi=0x5555559e57f0, on=0x5555559e4510) at
../../../../servers/slapd/backover.c:661
#16 0x00005555555ff31b in over_op_func (op=0x555555dd8560, rs=<optimized out>,
which=<optimized out>) at ../../../../servers/slapd/backover.c:723
#17 0x0000555555594641 in fe_op_search (op=0x555555dd8560, rs=0x7fffe356ba50) at
../../../../servers/slapd/search.c:402
#18 0x0000555555593f06 in do_search (op=0x555555dd8560, rs=0x7fffe356ba50) at
../../../../servers/slapd/search.c:247
#19 0x0000555555591961 in connection_operation (ctx=ctx@entry=0x7fffe356bba0,
arg_v=arg_v@entry=0x555555dd8560) at
../../../../servers/slapd/connection.c:1150
#20 0x0000555555591c84 in connection_read_thread (ctx=0x7fffe356bba0,
argv=<optimized out>) at ../../../../servers/slapd/connection.c:1286
#21 0x00007ffff7b9dff3 in ?? () from
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
#22 0x00007ffff5d90b50 in start_thread (arg=<optimized out>) at
pthread_create.c:304
#23 0x00007ffff5adaa7d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#24 0x0000000000000000 in ?? ()


Relevant configuration files:


olcDatabase={0}config.ldif:

dn: olcDatabase={0}config
objectClass: olcDatabaseConfig
olcDatabase: {0}config
olcAccess: {0}to * by
group/groupOfNames/member.exact="cn=realmadmins,ou=groups,ou=core,ou=realm,dc=<redacted>"
write by dn.base="uid=ldapadmin,ou=users,ou=core,ou=realm,dc=<redacted>" write
by dn.base="cn=admin,dc=<redacted>" write by sockurl.regex="^ldapi:///$" write
by dynacl/aci write by * none
olcAddContentAcl: TRUE
olcLastMod: TRUE
olcMaxDerefDepth: 15
olcReadOnly: FALSE
olcRootDN: cn=config
olcRootPW:: REDACTED
olcSyncUseSubentry: FALSE
olcMonitoring: FALSE
structuralObjectClass: olcDatabaseConfig
creatorsName: cn=config
createTimestamp: 20130702170316Z
olcSyncrepl: {0}rid=001 provider=ldaps://ldap002.<redacted>/
binddn="cn=admin,dc=<redacted>" bindmethod=simple credentials="ldapadmin001"
searchbase="cn=config" type=refreshAndPersist retry="5 5 300 5 600 +" timeout=1
tls_reqcert=never tls_cacert="/etc/trinity/ldap/tde-ca/anchors/tdeca.pem"
olcSyncrepl: {1}rid=002 provider=ldaps://ldap003.<redacted>/ binddn="cn=admin,dc=<redacted>"
bindmethod=simple credentials="ldapadmin001" searchbase="cn=config"
type=refreshAndPersist retry="5 5 300 5 600 +" timeout=1 tls_reqcert=never
tls_cacert="/etc/trinity/ldap/tde-ca/anchors/tdeca.pem"
olcMirrorMode: TRUE
entryCSN: 20130702180604.682031Z#000000#002#000000
modifiersName:
modifyTimestamp: 20130702180604Z


olcDatabase={1}hdb.ldif:

dn: olcDatabase={1}hdb
objectClass: olcDatabaseConfig
objectClass: olcHdbConfig
olcDatabase: {1}hdb
olcDbDirectory: /var/lib/ldap
olcSuffix: dc=<redacted>
olcAccess: {0}to attrs=userPassword,shadowLastChange,krb5Key,krb5PrincipalName,krb5KeyVersionNumber,krb5MaxLife,krb5MaxRenew,krb5KDCFlags,privateRootCertificateKey
by group/groupOfNames/member.exact="cn=realmadmins,ou=groups,ou=core,ou=realm,dc=<redacted>"
write by dn.base="uid=ldapadmin,ou=users,ou=core,ou=realm,dc=<redacted>" by
sockurl.regex="^ldapi:///$" write by anonymous auth by self write by * none
olcAccess: {1}to dn.base="" by * read
olcAccess: {2}to * by
group/groupOfNames/member.exact="cn=realmadmins,ou=groups,ou=core,ou=realm,dc=<redacted>"
write by dn.base="uid=ldapadmin,ou=users,ou=core,ou=realm,dc=<redacted>" write
by sockurl.regex="^ldapi:///$" write by dynacl/aci write
olcAddContentAcl: FALSE
olcLastMod: TRUE
olcMaxDerefDepth: 15
olcReadOnly: FALSE
olcRootDN: cn=admin,dc=<redacted>
olcRootPW:: REDACTED
olcMonitoring: TRUE
olcDbCacheSize: 1000
olcDbCheckpoint: 512 30
olcDbConfig: {0}set_cachesize 0 67108864 1
olcDbConfig: {1}set_lg_regionmax 262144
olcDbConfig: {2}set_lg_bsize 2097152
olcDbNoSync: FALSE
olcDbDirtyRead: FALSE
olcDbIDLcacheSize: 0
olcDbIndex: objectClass eq
olcDbIndex: krb5PrincipalName eq,pres
olcDbIndex: cn eq,pres,subinitial
olcDbIndex: mail eq,pres
olcDbIndex: uid pres,eq
olcDbIndex: uidNumber eq
olcDbIndex: gidNumber eq
olcDbLinearIndex: FALSE
olcDbMode: 0600
olcDbSearchStack: 16
olcDbShmKey: 0
olcDbCacheFree: 1
olcDbDNcacheSize: 0
olcPlugin: postoperation /opt/trinity/lib/slapi-acl-manager.so plugin_init
admingroup-dn:=cn=realmadmins,ou=groups,ou=core,ou=realm,dc=<redacted>
realm:=CEET.NIU.EDU aclfile:=/etc/heimdal-kdc/kadmind.acl  builtinadmin:=admin
structuralObjectClass: olcHdbConfig
creatorsName: cn=config
createTimestamp: 20130702170316Z
olcSyncrepl: {0}rid=001 provider=ldaps://ldap002.<redacted>/
binddn="cn=admin,dc=<redacted>" bindmethod=simple credentials="ldapadmin001"
searchbase="dc=<redacted>" type=refreshAndPersist retry="5 5 300 5" timeout=1
tls_reqcert=never tls_cacert="/etc/trinity/ldap/tde-ca/anchors/tdeca.pem"
olcSyncrepl: {1}rid=002 provider=ldaps://ldap003.<redacted>/
binddn="cn=admin,dc=<redacted>" bindmethod=simple credentials="ldapadmin001"
searchbase="dc=<redacted>" type=refreshAndPersist retry="5 5 300 5" timeout=1
tls_reqcert=never tls_cacert="/etc/trinity/ldap/tde-ca/anchors/tdeca.pem"
olcMirrorMode: TRUE
entryCSN: 20130702170511.039863Z#000000#002#000000
modifiersName:
modifyTimestamp: 20130702170511Z


olcDatabase={0}config/olcOverlay={0}syncprov.ldif:

dn: olcOverlay={0}syncprov
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {0}syncprov
structuralObjectClass: olcSyncProvConfig
entryUUID: 14f25934-7785-1032-93bd-0fe5d581c3b6
creatorsName:
createTimestamp: 20130702170337Z
entryCSN: 20130702170337.631532Z#000000#002#000000
modifiersName:
modifyTimestamp: 20130702170337Z


olcDatabase={1}hdb/olcOverlay={0}syncprov.ldif:

dn: olcOverlay={0}syncprov
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {0}syncprov
structuralObjectClass: olcSyncProvConfig
entryUUID: bfa2e530-778d-1032-9b41-e7480f731418
creatorsName:
createTimestamp: 20130702180539Z
entryCSN: 20130702180539.975057Z#000000#002#000000
modifiersName:
modifyTimestamp: 20130702180539Z
Comment 1 kb9vqf@pearsoncomputing.net 2013-07-02 19:22:41 UTC
Since my report above seems to have been somewhat mangled by the
submission process, here is a link to a plain-text version of the same
report:
http://www.ceet.niu.edu/pastebin/openldap_bug_report_7636.txt

Comment 2 Howard Chu 2013-07-02 19:42:53 UTC
kb9vqf@pearsoncomputing.net wrote:
> Full_Name: Timothy Pearson
> Version: 2.4.31
> OS: Debian Wheezy
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (131.156.2.26)
>
>
> The setup:
> Multi-master syncrepl on two servers
> Identical hardware and software between servers
> Self-signed TLS using common (private) CA certificate
>
> The problem:
> slapd on one server crashes repeatably within a minute of slapd starting on the
> other server.  slapd works reliably if and only if the other server is not
> running a slapd process.

1) 2.4.31 is ancient. Current is 2.4.35. Please provide a backtrace against a 
current OpenLDAP release.

2) Your backtrace shows a crash in a slapi plugin. If the bug is in your 
plugin there's nothing we can do about it.

3) Don't touch the individual files inside the slapd configuration database. 
Use "slapcat -n0".


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 3 kb9vqf@pearsoncomputing.net 2013-07-03 16:58:51 UTC
> kb9vqf@pearsoncomputing.net wrote:
>> Full_Name: Timothy Pearson
>> Version: 2.4.31
>> OS: Debian Wheezy
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (131.156.2.26)
>>
>>
>> The setup:
>> Multi-master syncrepl on two servers
>> Identical hardware and software between servers
>> Self-signed TLS using common (private) CA certificate
>>
>> The problem:
>> slapd on one server crashes repeatably within a minute of slapd starting on the
>> other server.  slapd works reliably if and only if the other server is not
>> running a slapd process.
>
> 1) 2.4.31 is ancient. Current is 2.4.35. Please provide a backtrace against a
> current OpenLDAP release.
>
> 2) Your backtrace shows a crash in a slapi plugin. If the bug is in your
> plugin there's nothing we can do about it.
>
> 3) Don't touch the individual files inside the slapd configuration database.
> Use "slapcat -n0".

Thank you for your prompt response.  In the backtrace I posted, which
portion suggests that the crash occurred in a plugin?  I would have
expected to see at least one frame in something other than
../../../../servers/slapd.

I am not directly modifying the config database text files, but did not
know the correct method to dump the existing files.  Thanks for the
command!



Comment 4 Howard Chu 2013-07-03 19:09:06 UTC
kb9vqf@pearsoncomputing.net wrote:
>> kb9vqf@pearsoncomputing.net wrote:
>>> Full_Name: Timothy Pearson
>>> Version: 2.4.31
>>> OS: Debian Wheezy
>>> URL: ftp://ftp.openldap.org/incoming/
>>> Submission from: (NULL) (131.156.2.26)
>>>
>>>
>>> The setup:
>>> Multi-master syncrepl on two servers
>>> Identical hardware and software between servers
>>> Self-signed TLS using common (private) CA certificate
>>>
>>> The problem:
>>> slapd on one server crashes repeatably within a minute of slapd starting on the
>>> other server.  slapd works reliably if and only if the other server is not
>>> running a slapd process.
>>
>> 1) 2.4.31 is ancient. Current is 2.4.35. Please provide a backtrace against a
>> current OpenLDAP release.
>>
>> 2) Your backtrace shows a crash in a slapi plugin. If the bug is in your
>> plugin there's nothing we can do about it.
>>
>> 3) Don't touch the individual files inside the slapd configuration database.
>> Use "slapcat -n0".
>
> Thank you for your prompt response.  In the backtrace I posted, which
> portion suggests that the crash occurred in a plugin?  I would have
> expected to see at least one frame in something other than
> ../../../../servers/slapd.


Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe356c700 (LWP 10433)]
0x00007ffff5a32475 in *__GI_raise (sig=<optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff5a32475 in *__GI_raise (sig=<optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff5a356f0 in *__GI_abort () at abort.c:92
#2  0x00007ffff5a6d52b in __libc_message (do_abort=<optimized out>,
fmt=<optimized out>) at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#3  0x00007ffff5a76d76 in malloc_printerr (action=3, str=0x7ffff5b4f170
"munmap_chunk(): invalid pointer", ptr=<optimized out>) at malloc.c:6283
#4  0x00007ffff63d214a in slapi_op_search_callback (op=0x7fffe3569fd0,
rs=<optimized out>, prc=<optimized out>) at
../../../../../servers/slapd/slapi/slapi_overlay.c:313
#5  slapi_op_search_callback (op=0x7fffe3569fd0, rs=<optimized out>,
prc=<optimized out>) at ../../../../../servers/slapd/slapi/slapi_overlay.c:296
#6  0x00007ffff63d304f in slapi_op_func (rs=0x7fffe3569f60, op=<optimized out>)
at ../../../../../servers/slapd/slapi/slapi_overlay.c:631
#7  slapi_op_func (op=0x7fffe3569fd0, rs=0x7fffe3569f60) at
../../../../../servers/slapd/slapi/slapi_overlay.c:556

The slapi_* functions would not be present in the stack trace unless you had 
configured a slapi plugin on this database.


Comment 5 kb9vqf@pearsoncomputing.net 2013-07-09 16:42:36 UTC
I have done further testing, and it appears that simply enabling a dummy
plugin is enough to cause slapd to crash.  This plugin code alone is
sufficient to cause a crash when syncrepl is enabled:

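#include "slapi-plugin.h"

int plugin_init (Slapi_PBlock *pb);
int internal_plugin_init (Slapi_PBlock *pb);  /* declared but never defined or called */

/* Entry point named in olcPlugin; registers no callbacks and simply reports success. */
__attribute__ ((visibility ("default"))) int plugin_init (Slapi_PBlock *pb) {
    return 0;
}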

Therefore, it would seem that there is actually a bug in the openldap
code, not in a third-party plugin as originally thought.

What do you need to investigate this bug further?

Thanks!

Comment 6 kb9vqf@pearsoncomputing.net 2013-07-09 16:59:14 UTC
Here is the backtrace with the above dummy plugin on the latest OpenLDAP
(2.4.35):

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe34a3700 (LWP 15269)]
0x00007ffff5a32475 in *__GI_raise (sig=<optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff5a32475 in *__GI_raise (sig=<optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff5a356f0 in *__GI_abort () at abort.c:92
#2  0x00007ffff5a6d52b in __libc_message (do_abort=<optimized out>,
fmt=<optimized out>) at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#3  0x00007ffff5a76d76 in malloc_printerr (action=3, str=0x7ffff5b4f170
"munmap_chunk(): invalid pointer", ptr=<optimized out>) at malloc.c:6283
#4  0x00007ffff63d214a in slapi_op_search_callback (op=0x7fffe34a0ff0,
rs=<optimized out>, prc=<optimized out>) at
../../../../../servers/slapd/slapi/slapi_overlay.c:313
#5  slapi_op_search_callback (op=0x7fffe34a0ff0, rs=<optimized out>,
prc=<optimized out>) at
../../../../../servers/slapd/slapi/slapi_overlay.c:296
#6  0x00007ffff63d304f in slapi_op_func (rs=0x7fffe34a0f80, op=<optimized
out>) at ../../../../../servers/slapd/slapi/slapi_overlay.c:631
#7  slapi_op_func (op=0x7fffe34a0ff0, rs=0x7fffe34a0f80) at
../../../../../servers/slapd/slapi/slapi_overlay.c:556
#8  0x00005555555ff18a in overlay_op_walk (op=op@entry=0x7fffe34a0ff0,
rs=0x7fffe34a0f80, which=op_search, oi=0x5555559e5680, on=0x5555559e43a0)
at ../../../../servers/slapd/backover.c:661
#9  0x00005555555ff31b in over_op_func (op=0x7fffe34a0ff0, rs=<optimized
out>, which=<optimized out>) at ../../../../servers/slapd/backover.c:723
#10 0x00007ffff1c4a45b in syncprov_findbase (op=op@entry=0x555555f093c0,
fc=fc@entry=0x7fffe34a1290) at syncprov.c:465
#11 0x00007ffff1c4cd87 in syncprov_op_search (op=0x555555f093c0,
rs=0x7fffe34a2a50) at syncprov.c:2461
#12 0x00005555555ff18a in overlay_op_walk (op=op@entry=0x555555f093c0,
rs=rs@entry=0x7fffe34a2a50, which=which@entry=op_search,
oi=0x5555559e5680, on=0x5555559bfbe0) at
../../../../servers/slapd/backover.c:661
#13 0x00007ffff63d3086 in slapi_op_func (rs=0x7fffe34a2a50, op=<optimized
out>) at ../../../../../servers/slapd/slapi/slapi_overlay.c:647
#14 slapi_op_func (op=0x555555f093c0, rs=0x7fffe34a2a50) at
../../../../../servers/slapd/slapi/slapi_overlay.c:556
#15 0x00005555555ff18a in overlay_op_walk (op=op@entry=0x555555f093c0,
rs=0x7fffe34a2a50, which=op_search, oi=0x5555559e5680, on=0x5555559e43a0)
at ../../../../servers/slapd/backover.c:661
#16 0x00005555555ff31b in over_op_func (op=0x555555f093c0, rs=<optimized
out>, which=<optimized out>) at ../../../../servers/slapd/backover.c:723
#17 0x0000555555594641 in fe_op_search (op=0x555555f093c0,
rs=0x7fffe34a2a50) at ../../../../servers/slapd/search.c:402
#18 0x0000555555593f06 in do_search (op=0x555555f093c0, rs=0x7fffe34a2a50)
at ../../../../servers/slapd/search.c:247
#19 0x0000555555591961 in connection_operation
(ctx=ctx@entry=0x7fffe34a2ba0, arg_v=arg_v@entry=0x555555f093c0) at
../../../../servers/slapd/connection.c:1150
#20 0x0000555555591c84 in connection_read_thread (ctx=0x7fffe34a2ba0,
argv=<optimized out>) at ../../../../servers/slapd/connection.c:1286
#21 0x00007ffff7b9dff3 in ?? () from
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
#22 0x00007ffff5d90b50 in start_thread (arg=<optimized out>) at
pthread_create.c:304
#23 0x00007ffff5adaa7d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#24 0x0000000000000000 in ?? ()
(gdb)

Comment 7 kb9vqf@pearsoncomputing.net 2013-07-09 17:51:14 UTC
I have traced the fault to the syncprov overlay (the provider side of
syncrepl): it passes a static global variable to be_search().  If a plugin
is configured, slapi_op_search_callback() is called, which then attempts to
free the static global search variable passed to be_search(), causing a crash.

The attached patch fixes the problem on my test system.
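
For readers following along, the failure mode described above can be sketched in isolation. This is a minimal, self-contained C illustration and not OpenLDAP code: struct search_op, generic_filter, op_set_filter_copy() and op_replace_filter() are hypothetical names, and the caller-side copy shown here is only one possible approach, not necessarily what the attached patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the filter string carried by a search
 * operation; these are not OpenLDAP's real types. */
struct search_op {
    char *filterstr;   /* may point at static or heap storage */
    int   heap_owned;  /* 1 if filterstr came from malloc() */
};

/* Static storage, like a global filter shared by internal searches. */
static char generic_filter[] = "(objectclass=*)";

/* Give the operation a heap copy of src that a callee may legally
 * free. The operation is modified only on success. */
static int op_set_filter_copy(struct search_op *op, const char *src)
{
    size_t len;
    char *copy;

    assert(op != NULL && src != NULL);
    len = strlen(src) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, src, len);
    op->filterstr = copy;
    op->heap_owned = 1;
    return 0;
}

/* Callee that regenerates the filter string and releases the old one.
 * Calling free() on generic_filter here would be undefined behaviour;
 * in the backtrace above glibc aborts with
 * "munmap_chunk(): invalid pointer". */
static int op_replace_filter(struct search_op *op, const char *newstr)
{
    char *old = op->filterstr;
    int old_owned = op->heap_owned;

    if (op_set_filter_copy(op, newstr) != 0)
        return -1;              /* op left unchanged on failure */
    if (old_owned)
        free(old);              /* only heap memory is ever freed */
    return 0;
}
```

The explicit heap_owned flag is the design choice being illustrated: the callee never has to guess whether a pointer it was handed came from malloc().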
Comment 8 kb9vqf@pearsoncomputing.net 2013-07-09 21:21:53 UTC
Unfortunately the previous patch only fixes some of the most obvious
(immediate) crashes.  There are other places in the code (e.g. syncprov.c
line 647) which pass non-malloc()ed variables to be_search() and thereby
crash slapd.

Has syncrepl even been tested with plugins before?  We really wanted to
use OpenLDAP for our directory server, however if replication cannot be
used with plugins then we may need to go with a Microsoft solution
instead.

Any hints?

Comment 9 Howard Chu 2013-07-09 22:11:30 UTC
kb9vqf@pearsoncomputing.net wrote:
> Unfortunately the previous patch only fixes some of the most obvious
> (immediate) crashes.  There are other places in the code (e.g. syncprov.c
> line 647) which pass non-malloc()ed variables to be_search() and thereby
> crash slapd.
>
> Has syncrepl even been tested with plugins before?  We really wanted to
> use OpenLDAP for our directory server, however if replication cannot be
> used with plugins then we may need to go with a Microsoft solution
> instead.
>
> Any hints?

Seems like the problem should be fixed in the slapi support code. And in 
general, the slapi code receives very little attention since slapd overlays 
are the native plugin mechanism. Patching syncprov.c is definitely the wrong 
way to go though.

What are the plugins you're trying to use? Are you sure there isn't already a 
slapd overlay that provides the feature?


Comment 10 Howard Chu 2013-07-09 22:56:20 UTC
kb9vqf@pearsoncomputing.net wrote:
> Unfortunately the previous patch only fixes some of the most obvious
> (immediate) crashes.  There are other places in the code (e.g. syncprov.c
> line 647) which pass non-malloc()ed variables to be_search() and thereby
> crash slapd.
>
> Has syncrepl even been tested with plugins before?  We really wanted to
> use OpenLDAP for our directory server, however if replication cannot be
> used with plugins then we may need to go with a Microsoft solution
> instead.
>
> Any hints?

Somewhat of a workaround - slapi should not be doing anything here unless the 
filter was actually changed.

syncprov is using (objectclass=*) which frankly no plugin should be rewriting, 
so this will cover it for the most part. Aside from that it looks like this 
area of the slapi interaction needs some redesign.
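
The idea of doing nothing unless the filter was actually changed can be sketched roughly as follows. This is a simplified, self-contained C illustration under stated assumptions, not the contents of the attached diff: struct filter, struct search_op, rewriter_fn, run_rewriter() and the two sample hooks are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified types; not the real slapd structures. */
struct filter { const char *text; };

struct search_op {
    struct filter *filter;     /* parsed filter */
    char          *filterstr;  /* string form; may be static storage */
};

/* Hypothetical rewriter hook: may point op->filter at a new filter. */
typedef void (*rewriter_fn)(struct search_op *op);

/* Run the hook, then rebuild filterstr only when the hook really
 * replaced the filter. Returns 1 if rebuilt, 0 if untouched, -1 on
 * allocation failure. */
static int run_rewriter(struct search_op *op, rewriter_fn hook)
{
    struct filter *before;
    size_t len;
    char *fresh;

    assert(op != NULL && hook != NULL);
    before = op->filter;
    hook(op);

    if (op->filter == before)
        return 0;               /* nothing changed: free nothing */

    len = strlen(op->filter->text) + 1;
    fresh = malloc(len);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, op->filter->text, len);
    /* Still assumes the old string was heap-allocated whenever a
     * rewrite happens, which is why this is only a partial fix. */
    free(op->filterstr);
    op->filterstr = fresh;
    return 1;
}

/* Two sample hooks for illustration. */
static void hook_noop(struct search_op *op) { (void)op; }

static struct filter rewritten = { "(uid=*)" };
static void hook_rewrite(struct search_op *op) { op->filter = &rewritten; }
```

With hook_noop() a statically allocated filterstr is never passed to free(); with hook_rewrite() the old string is released and replaced, so the caller must still supply heap memory in that case.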

Comment 11 kb9vqf@pearsoncomputing.net 2013-07-09 23:32:25 UTC
> kb9vqf@pearsoncomputing.net wrote:
>> Unfortunately the previous patch only fixes some of the most obvious
>> (immediate) crashes.  There are other places in the code (e.g.
>> syncprov.c line 647) which pass non-malloc()ed variables to be_search()
>> and thereby crash slapd.
>>
>> Has syncrepl even been tested with plugins before?  We really wanted to
>> use OpenLDAP for our directory server, however if replication cannot be
>> used with plugins then we may need to go with a Microsoft solution
>> instead.
>>
>> Any hints?
>
> Seems like the problem should be fixed in the slapi support code. And in
> general, the slapi code receives very little attention since slapd
> overlays are the native plugin mechanism. Patching syncprov.c is
> definitely the wrong way to go though.

OK, thanks for the info.  I don't know much about the internals of
OpenLDAP, but I can still see where non-malloc()ed memory is being passed
(eventually) to a free()-type function. ;-)

> What are the plugins you're trying to use? Are you sure there isn't
> already a slapd overlay that provides the feature?

We use a custom plugin to keep Kerberos ACLs in sync with group
memberships.  The plugin only monitors for certain changes to the
directory, it does not modify the directory in any way.



Comment 12 kb9vqf@pearsoncomputing.net 2013-07-10 19:21:06 UTC
> Somewhat of a workaround - slapi should not be doing anything here
> unless the filter was actually changed.
>
> syncprov is using (objectclass=*) which frankly no plugin should be
> rewriting, so this will cover it for the most part. Aside from that it
> looks like this area of the slapi interaction needs some redesign.

Unfortunately the workaround didn't help:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe2ca2700 (LWP 29843)]
0x00007ffff5a32475 in *__GI_raise (sig=<optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff5a32475 in *__GI_raise (sig=<optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff5a356f0 in *__GI_abort () at abort.c:92
#2  0x00007ffff5a6d52b in __libc_message (do_abort=<optimized out>,
fmt=<optimized out>) at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#3  0x00007ffff5a76d76 in malloc_printerr (action=3, str=0x7ffff5b4f170
"munmap_chunk(): invalid pointer", ptr=<optimized out>) at malloc.c:6283
#4  0x00007ffff63d214a in slapi_op_search_callback (op=0x7fffe2c9fff0,
rs=<optimized out>, prc=<optimized out>) at
../../../../../servers/slapd/slapi/slapi_overlay.c:313
#5  slapi_op_search_callback (op=0x7fffe2c9fff0, rs=<optimized out>,
prc=<optimized out>) at
../../../../../servers/slapd/slapi/slapi_overlay.c:296
#6  0x00007ffff63d304f in slapi_op_func (rs=0x7fffe2c9ff80, op=<optimized
out>) at ../../../../../servers/slapd/slapi/slapi_overlay.c:631
#7  slapi_op_func (op=0x7fffe2c9fff0, rs=0x7fffe2c9ff80) at
../../../../../servers/slapd/slapi/slapi_overlay.c:556
#8  0x00005555555ff18a in overlay_op_walk (op=op@entry=0x7fffe2c9fff0,
rs=0x7fffe2c9ff80, which=op_search, oi=0x5555559e5680, on=0x5555559e43a0)
at ../../../../servers/slapd/backover.c:661
#9  0x00005555555ff31b in over_op_func (op=0x7fffe2c9fff0, rs=<optimized
out>, which=<optimized out>) at ../../../../servers/slapd/backover.c:723
#10 0x00007ffff1c4a45b in syncprov_findbase (op=op@entry=0x555555eeff60,
fc=fc@entry=0x7fffe2ca0290) at syncprov.c:465
#11 0x00007ffff1c4cd87 in syncprov_op_search (op=0x555555eeff60,
rs=0x7fffe2ca1a50) at syncprov.c:2461
#12 0x00005555555ff18a in overlay_op_walk (op=op@entry=0x555555eeff60,
rs=rs@entry=0x7fffe2ca1a50, which=which@entry=op_search,
oi=0x5555559e5680, on=0x5555559bfbe0) at
../../../../servers/slapd/backover.c:661
#13 0x00007ffff63d3086 in slapi_op_func (rs=0x7fffe2ca1a50, op=<optimized
out>) at ../../../../../servers/slapd/slapi/slapi_overlay.c:647
#14 slapi_op_func (op=0x555555eeff60, rs=0x7fffe2ca1a50) at
../../../../../servers/slapd/slapi/slapi_overlay.c:556
#15 0x00005555555ff18a in overlay_op_walk (op=op@entry=0x555555eeff60,
rs=0x7fffe2ca1a50, which=op_search, oi=0x5555559e5680, on=0x5555559e43a0)
at ../../../../servers/slapd/backover.c:661
#16 0x00005555555ff31b in over_op_func (op=0x555555eeff60, rs=<optimized
out>, which=<optimized out>) at ../../../../servers/slapd/backover.c:723
#17 0x0000555555594641 in fe_op_search (op=0x555555eeff60,
rs=0x7fffe2ca1a50) at ../../../../servers/slapd/search.c:402
#18 0x0000555555593f06 in do_search (op=0x555555eeff60, rs=0x7fffe2ca1a50)
at ../../../../servers/slapd/search.c:247
#19 0x0000555555591961 in connection_operation
(ctx=ctx@entry=0x7fffe2ca1ba0, arg_v=arg_v@entry=0x555555eeff60) at
../../../../servers/slapd/connection.c:1150
#20 0x0000555555591c84 in connection_read_thread (ctx=0x7fffe2ca1ba0,
argv=<optimized out>) at ../../../../servers/slapd/connection.c:1286
#21 0x00007ffff7b9dff3 in ?? () from
/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2
#22 0x00007ffff5d90b50 in start_thread (arg=<optimized out>) at
pthread_create.c:304
#23 0x00007ffff5adaa7d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#24 0x0000000000000000 in ?? ()
(gdb)

Valgrind doesn't turn up anything strange before the crash, so I guess it
comes down to a design error in a relatively untested code path.

Tim



Comment 13 kb9vqf@pearsoncomputing.net 2013-07-10 19:29:59 UTC
> Somewhat of a workaround - slapi should not be doing anything here
> unless the filter was actually changed.
>
> syncprov is using (objectclass=*) which frankly no plugin should be
> rewriting, so this will cover it for the most part. Aside from that it
> looks like this area of the slapi interaction needs some redesign.

Please ignore my earlier noise: an old (unpatched) libslapi library was
still present on the system from earlier debugging attempts, and that was
what caused the most recent crash.

Your patch seems to work!  Thank you for your quick response; we now have
a shot at going forward with the OpenLDAP deployment.

Tim

Comment 14 Howard Chu 2013-07-10 19:57:41 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 15 Quanah Gibson-Mount 2013-07-29 19:33:20 UTC
changed notes
changed state Test to Partial
Comment 16 OpenLDAP project 2014-08-01 21:04:48 UTC
partial fix in master
partial fix in RE24
Comment 17 Leonid Yuriev 2014-10-14 11:40:37 UTC
Here is one more relevant bug, with a patch:
http://www.OpenLDAP.org/its/index.cgi?findid=7965