Issue 7791 - 3-Way delta MMR segfault
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
Reported: 2014-01-24 09:03 UTC by marvin.mundry@uni-hamburg.de
Modified: 2017-04-03 17:15 UTC

Description marvin.mundry@uni-hamburg.de 2014-01-24 09:03:12 UTC
Full_Name: Marvin Mundry
Version: slapd 2.4.38
OS: Ubuntu 12.04
URL: http://attachment.rrz.uni-hamburg.de/rzrv002/bug.tar.bz
Submission from: (NULL) (134.100.2.183)


I have set up a 3-way delta-MMR replication.
When I add the third machine (zd-ldap-3) to the replication, that machine starts
replicating objects but segfaults at some point during replication.


I have
* built OpenLDAP slapd 2.4.38 with the configuration shown in build.txt (in the attached file)
* run slapadd -S 1 -l /usr/local/openldap/etc/openldap/slapd.ldif -F /usr/local/openldap/etc/openldap/slapd.d -bcn=config (slapd.ldif is contained in the attached file)
* started up zd-ldap-next-1
* imported ~67'000 objects via ldapadd
* on zd-ldap-next-1: slapcat -bcn=config -F /usr/local/openldap/etc/openldap/slapd.d/ -l /tmp/configbk.ldif
* copied /tmp/configbk.ldif via scp from zd-ldap-1 to zd-ldap-2 and zd-ldap-3
* on zd-ldap-2 and zd-ldap-3: slapadd -b cn=config -l /tmp/configbk.ldif -F /usr/local/openldap/etc/openldap/slapd.d
* started up slapd on zd-ldap-2
* waited for the replication to finish (worked fine)
* started up slapd on zd-ldap-3
* waited for the replication to finish ==> segfault after replicating ~40'000 objects (the number differs between attempts to reproduce the bug, i.e. replication does not always fail at the same object)
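
The cn=config copy in the steps above can be condensed into a small shell sketch. The paths and the slapcat/slapadd options are the ones from the report; the function names and the DRY_RUN switch are hypothetical additions for illustration, and by default the functions only print the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the cn=config copy described in the report.
# CONF_DIR and BACKUP come from the report; the function names and
# DRY_RUN are illustrative assumptions, not part of OpenLDAP.
CONF_DIR=/usr/local/openldap/etc/openldap/slapd.d
BACKUP=/tmp/configbk.ldif
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# On the first node: dump cn=config to an LDIF file.
export_config() {
    run slapcat -b cn=config -F "$CONF_DIR" -l "$BACKUP"
}

# On the second and third node: load the copied LDIF into slapd.d.
import_config() {
    run slapadd -b cn=config -l "$BACKUP" -F "$CONF_DIR"
}
```

Calling export_config on the first node and import_config on the other two (after copying the LDIF across) mirrors the order in the list; setting DRY_RUN=0 would actually execute the commands.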
Comment 1 Michael Ströder 2014-01-24 10:56:47 UTC
Hard to tell what's going on without seeing the configuration.

I'd disable overlays one by one to check whether one of those causes the seg
fault.
slapo-rwm is one such candidate.

Ciao, Michael.


Comment 2 jsynacek@redhat.com 2014-01-24 11:24:04 UTC
On 01/24/2014 11:57 AM, michael@stroeder.com wrote:
> Hard to tell what's going on without seeing the configuration.
> 
> I'd disable overlays one by one to check whether one of those causes the seg
> fault.
> slapo-rwm is one such candidate.

If it really turns out that slapo-rwm is the culprit, consider checking the latest patch in ITS#7723.

-- 
Jan Synacek
Software Engineer, Red Hat

Comment 3 marvin.mundry@uni-hamburg.de 2014-01-24 12:21:15 UTC
> Hard to tell what's going on without seeing the configuration.
config is contained in slapd.conf

> I'd disable overlays one by one
Unfortunately, I don't have the time right now to create a minimal config that still shows the problem.
I hope that the core dump is sufficient to determine what's going on.

Here's the interesting part:

Core was generated by `/usr/local/openldap/libexec/slapd -h ldap://zd-ldap-next-3/ ldaps://zd-ldap-nex'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f0a41327497 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) backtrace
#0  0x00007f0a41327497 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00000000004b2ec1 in avl_find ()
#2  0x0000000000475094 in oc_bvfind ()
#3  0x000000000046def4 in ?? ()
#4  0x000000000044ae4e in ordered_value_match ()
#5  0x000000000044d1b7 in ?? ()
#6  0x000000000044d8e1 in test_filter ()
#7  0x000000000044dd12 in test_filter ()
#8  0x00007f0a3d8379a6 in syncprov_matchops (op=0x7f0a34a43ea0, opc=0x7f0a24000a90, saveit=0) at syncprov.c:1317
#9  0x00007f0a3d8386ce in syncprov_op_response (op=0x7f0a34a43ea0, rs=<optimized out>) at syncprov.c:1940
#10 0x000000000043e3b7 in ?? ()
#11 0x000000000043e8ba in ?? ()
#12 0x000000000043f2eb in slap_send_ldap_result ()
#13 0x00007f0a3e51101c in hdb_add (op=0x7f0a34a43ea0, rs=0x7f0a34a43e10) at add.c:515
#14 0x0000000000498637 in overlay_op_walk ()
#15 0x0000000000498718 in ?? ()
#16 0x00007f0a3da451d1 in accesslog_response (op=<optimized out>, rs=<optimized out>) at accesslog.c:1835
#17 0x0000000000497a68 in ?? ()
#18 0x000000000043e3b7 in ?? ()
#19 0x000000000043e8ba in ?? ()
#20 0x000000000043f2eb in slap_send_ldap_result ()
#21 0x00007f0a3e51101c in hdb_add (op=0x7f0a34a452c0, rs=0x7f0a34a449d0) at add.c:515
#22 0x0000000000498637 in overlay_op_walk ()
#23 0x0000000000498718 in ?? ()
#24 0x000000000048cbbe in ?? ()
#25 0x0000000000491024 in ?? ()
#26 0x00007f0a4204689a in ldap_int_thread_pool_wrapper (xpool=0x219c1d0) at tpool.c:688
#27 0x00007f0a415b2e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#28 0x00007f0a412df3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#29 0x0000000000000000 in ?? ()

Best regards,

Marvin
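
To gauge how much of a backtrace like the one above is usable, one can count the frames that gdb could not resolve to a symbol (printed as "in ?? ()"). Of the 30 frames listed by the backtrace command above, 13 are unresolved. A minimal sketch, assuming the backtrace was saved to a text file; the function name is hypothetical:

```shell
#!/bin/sh
# Count frames that gdb printed as "in ?? ()", i.e. without symbol names.
# count_unresolved_frames is a hypothetical helper, not a gdb feature.
count_unresolved_frames() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the helper usable under "set -e".
    grep -cF ' in ?? ()' "$1" || true
}
```

A high count usually indicates a stripped or optimized binary, or missing debug symbols for the libraries involved.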
Comment 4 Frederic Nass 2015-06-27 07:01:21 UTC
Hi, 

We are facing the exact same situation with a 3-way configuration with openldap-2.4.39.2z packaged by Zimbra 8.6 p2. 

Slapd crashes at some point during first replication. We can reproduce this on any of the 3 nodes by stopping ldap service, deleting databases (accesslog and mdb) and restarting ldap service. 

Please tell us how we can be helpful. 

Frédéric Nass 

Sous-direction Infrastructures 
Direction du Numérique 
Université de Lorraine 
Comment 5 Quanah Gibson-Mount 2015-06-27 17:22:46 UTC
--On Saturday, June 27, 2015 8:01 AM +0000 frederic.nass@univ-lorraine.fr 
wrote:

> Hi,
>
> We are facing the exact same situation with a 3-way configuration with
> openldap-2.4.39.2z packaged by Zimbra 8.6 p2.
>
> Slapd crashes at some point during first replication. We can reproduce
> this on any of the 3 nodes by stopping ldap service, deleting databases
> (accesslog and mdb) and restarting ldap service.
>
> Please tell us how we can be helpful.

Provide a gdb backtrace.  <http://www.openldap.org/faq/data/cache/59.html>

--Quanah


--

Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration

Comment 6 Quanah Gibson-Mount 2017-04-03 17:14:48 UTC
Hello,

I can say that I've not seen this issue in numerous current OpenLDAP 
deployments with OpenLDAP 2.4.44.  However, be aware of ITS#8432 which does 
impact 2.4.44.  I'd guess that whatever was causing the problem was fixed 
between 2.4.38 and 2.4.44.  Unfortunately, a clear backtrace with full 
debugging symbols and no optimization of slapd was never provided, so work 
on this ITS can't really go forward.  I will close it for now, given that 
I've not seen this issue impact large volume 3 & 4 way MMR setups with 
current releases.  If you can reproduce with current OpenLDAP, please feel 
free to respond, at which time I will reopen the issue.

Regards,
Quanah

--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>
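
For anyone who can reproduce the crash, the kind of backtrace asked for above is usually obtained from a slapd that was built with debugging symbols and without optimization (for example by passing CFLAGS="-g -O0" to configure, which is an assumption about the build setup, not something stated in this issue) and left unstripped. The sketch below only prints a gdb command line for dumping all threads from a core file; the helper name is hypothetical, the binary path is the one from the report, and the core file path is a placeholder.

```shell
#!/bin/sh
# Print a gdb command line that dumps every thread's stack, including
# local variables, from a core file. backtrace_cmd is a hypothetical
# helper; the default binary path is the one shown in the report and
# the default core file name is a placeholder.
backtrace_cmd() {
    slapd_bin=${1:-/usr/local/openldap/libexec/slapd}
    core_file=${2:-core}
    echo "gdb -batch -ex 'thread apply all bt full' $slapd_bin $core_file"
}
```

Running the printed command and attaching its output to the issue gives the developers resolved frames for every thread rather than only the crashing one.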


Comment 7 OpenLDAP project 2017-04-03 17:15:12 UTC
Does not appear to affect current OpenLDAP releases.
Comment 8 Quanah Gibson-Mount 2017-04-03 17:15:12 UTC
changed notes
changed state Open to Closed
moved from Incoming to Software Bugs