Issue 7655 - segfault during initial mirror of multimaster delta replication
Status: VERIFIED FEEDBACK
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
 
Reported: 2013-08-04 16:27 UTC by hans.freitag@entiretec.com
Modified: 2021-08-03 18:13 UTC

Description hans.freitag@entiretec.com 2013-08-04 16:27:19 UTC
Full_Name: Hans Freitag
Version: 2.4.35 and 33
OS: SLES 11SP2
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (193.200.138.3)


I have a multimaster delta replication setup here with bdb on an 18 GB database.

After a crash due to a full disk I created a new database on one node and
started over.

The empty node started to replicate from the full one, but after a while
(approx. 2 GB) it crashed with a segfault:

Aug  4 11:45:32 mhr-dd-lda-01 kernel: [52189.476209] slapd[10158]: segfault at
20 ip 00007ff97ebfabc0 sp 00007ff6e57e6b38 error 4 in
libc-2.11.1.so[7ff97eb79000+155000] 
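
As a reading aid for kernel lines like the one above: subtracting the mapping's base address from the instruction pointer gives the offset inside the library or binary, which is what a symbol lookup needs once debugging symbols are available. Assuming an x86 Linux kernel, `error 4` is a page-fault code for a user-mode read of an unmapped page, and a fault address of 0x20 is consistent with reading a field through a NULL pointer. A minimal Python sketch; `parse_segfault` is a hypothetical helper, not part of OpenLDAP:

```python
import re

# Matches the "segfault at ... ip ... sp ... error N in obj[base+size]" part
# of a kernel log line (whitespace may include a wrapped line break).
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+)\s+ip (?P<ip>[0-9a-f]+)\s+"
    r"sp (?P<sp>[0-9a-f]+)\s+error (?P<err>\d+) in\s+"
    r"(?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def parse_segfault(line):
    """Return decoded fields of a kernel segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    return {
        "object": m.group("obj"),
        "fault_addr": int(m.group("addr"), 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset": int(m.group("ip"), 16) - int(m.group("base"), 16),
        # x86 page-fault error code bits (assumption: x86 kernel).
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }

line = ("slapd[10158]: segfault at 20 ip 00007ff97ebfabc0 sp 00007ff6e57e6b38 "
        "error 4 in libc-2.11.1.so[7ff97eb79000+155000]")
info = parse_segfault(line)
print(info["object"], hex(info["offset"]))
```

The offset only becomes meaningful together with the exact library build that was running, so it complements a backtrace rather than replacing one.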

So I thought that maybe it was not a good idea to install a package built for
SP2 on a machine running SP1, so my first attempt at a fix was an upgrade.
After the upgrade I got this:

Aug  4 12:46:29 mhr-dd-lda-01 kernel: [ 1414.757587] slapd[3704]: segfault at 20
ip 00007fc82eee6182 sp 00007fc592e0acf0 error 4 in slapd[7fc82ee7a000+1e6000]

So I built a brand-new OpenLDAP 2.4.35 RPM to find out whether the problem is
related to the 2.4.33 version I am running. But that failed too:

Aug  4 13:47:19 mhr-dd-lda-01 kernel: [ 5063.074410] slapd[8749]: segfault at 20
ip 00007fcbc1b537dc sp 00007fc92624fb88 error 4 in slapd[7fcbc1ac8000+1ea000]

For now I have deactivated access logging on the node, which seems to work. I
will know for sure in a few hours. ;-) I can try to reproduce the problem on a
backup node next week, when all the main nodes are up and running again. :)
Comment 1 Quanah Gibson-Mount 2013-08-05 03:11:05 UTC
I would suggest you build with debugging symbols, enable core files, and 
provide a backtrace of the problem.  What you have provided does not give 
any useful information for debugging purposes.  You also fail to state the 
backend you are using (back-bdb or back-hdb).

For information on how to provide a backtrace:

<http://www.openldap.org/faq/data/cache/59.html>
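
One way to capture such a backtrace is `gdb` in batch mode, sketched here in Python. The binary and core file paths are placeholders (assumptions, not taken from this report), and the FAQ entry linked above remains the authoritative procedure:

```python
import shlex
import subprocess

def gdb_backtrace_cmd(binary, core):
    """Build a gdb command line that prints a full backtrace of every thread."""
    return ["gdb", "-batch", "-ex", "thread apply all bt full", binary, core]

def collect_backtrace(binary, core):
    """Run gdb against a core file and return whatever it printed on stdout."""
    result = subprocess.run(
        gdb_backtrace_cmd(binary, core),
        capture_output=True, text=True, check=False,
    )
    return result.stdout

# Hypothetical paths; adjust to where slapd and its core file actually live.
cmd = gdb_backtrace_cmd("/usr/lib/openldap/slapd", "/var/tmp/core.slapd")
print(shlex.join(cmd))
```

The output is only useful if slapd was built with debugging symbols and not stripped.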

Regards,
Quanah

--

Quanah Gibson-Mount
Lead Engineer
Zimbra, Inc
--------------------
Zimbra ::  the leader in open source messaging and collaboration

Comment 2 Howard Chu 2013-08-10 13:00:15 UTC
changed state Open to Feedback
Comment 3 hans.freitag@entiretec.com 2013-09-18 13:13:32 UTC
Hi,

Unfortunately I was not able to reproduce the exact segfault, but after a few
updates we still have the problem that, with replication enabled, slapd freezes
during a write operation.

SETUP DESCRIPTION:

OpenLDAP version 2.4.36
back-mdb (we have had these issues for quite a while, even when we were running on bdb)


All write and read requests are directed to the active node, so the passive
node is replicating.

So, if I have understood correctly, I have two threads: the main thread and
the one doing the replication.



Netstat of the TCP replication connections; the second one is initiated by the
passive system polling the active one:

tcp        0     53 10.169.127.13:389       10.169.126.13:43340     ESTABLISHED
tcp   1905336      0 10.169.127.13:52384     10.169.126.13:389       ESTABLISHED
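
Assuming the usual `netstat -tn` column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), the second connection has about 1.9 MB waiting unread in its receive queue, which would be consistent with the local slapd not reading from that socket. A small Python sketch that flags such connections; the threshold is an arbitrary assumption:

```python
def parse_netstat_line(line):
    """Split one netstat TCP line into named fields (assumed column order)."""
    proto, recv_q, send_q, local, remote, state = line.split()
    return {
        "proto": proto,
        "recv_q": int(recv_q),   # bytes received but not yet read by the process
        "send_q": int(send_q),   # bytes sent but not yet acknowledged by the peer
        "local": local,
        "remote": remote,
        "state": state,
    }

lines = [
    "tcp        0     53 10.169.127.13:389       10.169.126.13:43340     ESTABLISHED",
    "tcp   1905336      0 10.169.127.13:52384     10.169.126.13:389       ESTABLISHED",
]

RECV_Q_THRESHOLD = 1_000_000  # arbitrary cut-off for "suspiciously large"

conns = [parse_netstat_line(l) for l in lines]
stalled = [c for c in conns if c["recv_q"] > RECV_Q_THRESHOLD]
for c in stalled:
    print(c["local"], "<-", c["remote"], "unread bytes:", c["recv_q"])
```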


top -H of the LDAP Processes: 

 7767 ldap      20   0 84.4g 7.1g 6.9g S      1 10.1   1:02.13 slapd
 7768 ldap      20   0 84.4g 7.1g 6.9g S      0 10.1   7:54.44 slapd
 8023 ldap      20   0 84.4g 7.1g 6.9g S      0 10.1   0:32.31 slapd
 7766 ldap      20   0 84.4g 7.1g 6.9g S      0 10.1   0:00.00 slapd
 7769 ldap      20   0 84.4g 7.1g 6.9g S      0 10.1   0:32.81 slapd
 7770 ldap      20   0 84.4g 7.1g 6.9g S      0 10.1   7:44.94 slapd
 8024 ldap      20   0 84.4g 7.1g 6.9g t      0 10.1   0:32.53 slapd
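
Assuming the default `top` column layout (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND; the header row is not part of the paste), the state column can be read per thread. Every thread in this listing is sleeping (`S`) except 8024, which shows `t`, the state procps reports for a task stopped under a tracer; that may simply reflect a debugger being attached when the snapshot was taken. A short Python sketch:

```python
top_lines = """
 7767 ldap      20   0 84.4g 7.1g 6.9g S      1 10.1   1:02.13 slapd
 7768 ldap      20   0 84.4g 7.1g 6.9g S      0 10.1   7:54.44 slapd
 8023 ldap      20   0 84.4g 7.1g 6.9g S      0 10.1   0:32.31 slapd
 7766 ldap      20   0 84.4g 7.1g 6.9g S      0 10.1   0:00.00 slapd
 7769 ldap      20   0 84.4g 7.1g 6.9g S      0 10.1   0:32.81 slapd
 7770 ldap      20   0 84.4g 7.1g 6.9g S      0 10.1   7:44.94 slapd
 8024 ldap      20   0 84.4g 7.1g 6.9g t      0 10.1   0:32.53 slapd
""".strip().splitlines()

STATE_COLUMN = 7  # index of the "S" column in the assumed default layout

# Collect (thread id, state) for every thread that is not plainly sleeping.
non_sleeping = []
for row in top_lines:
    fields = row.split()
    state = fields[STATE_COLUMN]
    if state != "S":
        non_sleeping.append((int(fields[0]), state))

print(non_sleeping)
```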

PASTEBIN:

I put all the backtraces on Pastebin:

http://pastebin.com/vVGEqEUt


I hope this helps to track down the problem.


Kind regards

i.A. Hans Freitag
» Linux Administrator

ENTIRETEC AG . Pforzheimer Strasse 33 . 01189 Dresden . Germany
T: +49.351.41355.0 . M:  . F: +49.351.41355.99
E: hans.freitag@entiretec.com

ENTIRETEC | http://www.entiretec.com
Germany | Switzerland | United Arab Emirates | Malaysia | United States of America
