Full_Name: Hans Freitag
Version: 2.4.35 and 2.4.33
OS: SLES 11SP2
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (193.200.138.3)

I have a multi-master delta replication setup here with bdb on an 18 GB database.

After a crash caused by a full disk, I created a new database on one node and started over.

The empty node started replicating from the full one, but after a while (approx. 2 GB) it crashed with a segfault:

Aug 4 11:45:32 mhr-dd-lda-01 kernel: [52189.476209] slapd[10158]: segfault at 20 ip 00007ff97ebfabc0 sp 00007ff6e57e6b38 error 4 in libc-2.11.1.so[7ff97eb79000+155000]

So I thought it might not be a good idea to install a package built for SP2 on a machine running SP1, and my first attempt at a fix was an upgrade. After the upgrade I got this:

Aug 4 12:46:29 mhr-dd-lda-01 kernel: [ 1414.757587] slapd[3704]: segfault at 20 ip 00007fc82eee6182 sp 00007fc592e0acf0 error 4 in slapd[7fc82ee7a000+1e6000]

So I built a brand-new OpenLDAP 2.4.35 RPM to check whether the problem is specific to the 2.4.33 version I am running. It failed as well:

Aug 4 13:47:19 mhr-dd-lda-01 kernel: [ 5063.074410] slapd[8749]: segfault at 20 ip 00007fcbc1b537dc sp 00007fc92624fb88 error 4 in slapd[7fcbc1ac8000+1ea000]

For now I have deactivated access logging on the node, which seems to work. I will know for sure in a few hours. ;-) I can try to reproduce this on a backup node next week, when all the main nodes are up and running again. :)
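The three kernel lines in the report share the same fault address (0x20) and error code (4). A minimal sketch of decoding such a line in Python, assuming x86-64 and the kernel's page-fault error-code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). Under that assumption, "error 4" is a user-mode read of an unmapped page, and an address as low as 0x20 typically means a NULL pointer dereferenced at a member offset of 0x20. Subtracting the mapping base from `ip` gives an offset into the named object, which for a shared library or position-independent executable whose text starts at file offset 0 could be resolved with `addr2line -e` against a build that has debugging symbols. The names `Segfault` and `decode_segfault` are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches the "segfault at ..." part of a Linux kernel log line like the ones above.
# All numbers in that message are printed in hexadecimal.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


@dataclass
class Segfault:
    fault_addr: int  # address whose access faulted
    ip: int          # instruction pointer at the time of the fault
    obj: str         # binary or library containing ip
    base: int        # start address of that mapping
    size: int        # length of that mapping
    error: int       # x86 page-fault error code

    @property
    def offset(self) -> int:
        """Offset of the faulting instruction inside the reported mapping."""
        return self.ip - self.base

    def describe_error(self) -> str:
        """Decode the low three bits of the x86 page-fault error code."""
        cause = "protection violation" if self.error & 0x1 else "page not present"
        access = "write" if self.error & 0x2 else "read"
        mode = "user mode" if self.error & 0x4 else "kernel mode"
        return f"{mode} {access}, {cause}"


def decode_segfault(line: str) -> Segfault:
    """Parse one kernel segfault line (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    return Segfault(
        fault_addr=int(m["addr"], 16),
        ip=int(m["ip"], 16),
        obj=m["obj"],
        base=int(m["base"], 16),
        size=int(m["size"], 16),
        error=int(m["err"], 16),
    )


if __name__ == "__main__":
    line = ("Aug 4 12:46:29 mhr-dd-lda-01 kernel: [ 1414.757587] slapd[3704]: "
            "segfault at 20 ip 00007fc82eee6182 sp 00007fc592e0acf0 error 4 in "
            "slapd[7fc82ee7a000+1e6000]")
    f = decode_segfault(line)
    print(f"{f.obj}+{f.offset:#x}: access to {f.fault_addr:#x}, {f.describe_error()}")
```

Run against the three lines in the report, this gives offset 0x81bc0 in libc-2.11.1.so and offsets 0x6c182 and 0x8b7dc in two different slapd builds, all with the same decoded error. Because the builds differ, the two slapd offsets cannot be compared with each other, which is one reason a symbolized backtrace from a debug build is still needed.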
--On Sunday, August 04, 2013 4:27 PM +0000 hans.freitag@entiretec.com wrote:

> I have a Multimaster Delta replication setup here with bdb on a 18 GB
> Database.

I would suggest you build with debugging symbols, enable core files, and provide a backtrace of the problem. What you have provided does not give any useful information for debugging purposes. You also fail to state the backend you are using (back-bdb or back-hdb).
For information on how to provide a backtrace:

<http://www.openldap.org/faq/data/cache/59.html>

Regards,
Quanah

--

Quanah Gibson-Mount
Lead Engineer
Zimbra, Inc
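Enabling core files mostly comes down to raising the core-size resource limit (RLIMIT_CORE) of slapd before it starts, which is what the shell's `ulimit -c` does. A minimal Python sketch, assuming slapd is started through a small wrapper script; the function names are hypothetical, and the slapd path and arguments are placeholders supplied on the command line. Where the core file is written is decided separately by the kernel's `/proc/sys/kernel/core_pattern`, and because slapd can switch user with `-u`, `fs.suid_dumpable` may also need adjusting (see prctl(2)).

```python
import os
import resource
import sys


def raise_core_limit() -> tuple[int, int]:
    """Raise the soft RLIMIT_CORE to the hard limit and return the new (soft, hard) pair."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to, but not above, the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def exec_with_cores(argv: list[str]) -> None:
    """Raise the limit, then replace this process with argv[0]; limits survive exec."""
    raise_core_limit()
    os.execv(argv[0], argv)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # e.g. python3 corewrap.py /path/to/slapd -u ldap -h ldap:///  (placeholder path)
        exec_with_cores(sys.argv[1:])
    else:
        soft, hard = raise_core_limit()
        print(f"RLIMIT_CORE soft={soft} hard={hard}")
```

Raising the soft limit to the existing hard limit is the only change an unprivileged wrapper can always make; if the hard limit is 0, it has to be raised by root (or the init script) first. With a core file in hand, the backtrace itself is produced with a debugger as described in the FAQ entry linked above.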
changed state Open to Feedback
Hi,

unfortunately I was not able to reproduce the exact segfault, but after a few updates we still have the problem that, with replication enabled, slapd freezes during a write operation.

SETUP DESCRIPTION:

OpenLDAP version 2.4.36
back-mdb (we have had issues for quite a while, even when we were running on bdb)
All write and read requests are directed to the active node, so the passive node is replicating.

If I understand correctly, there are two threads: the main thread and the one doing the replication.

Netstat of the TCP replication connections (the second one is initiated by the passive system polling from the active one):

Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0     53 10.169.127.13:389       10.169.126.13:43340     ESTABLISHED
tcp  1905336      0 10.169.127.13:52384     10.169.126.13:389       ESTABLISHED

top -H (per-thread view) of the slapd process:

  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 7767 ldap  20  0 84.4g 7.1g 6.9g S    1 10.1  1:02.13 slapd
 7768 ldap  20  0 84.4g 7.1g 6.9g S    0 10.1  7:54.44 slapd
 8023 ldap  20  0 84.4g 7.1g 6.9g S    0 10.1  0:32.31 slapd
 7766 ldap  20  0 84.4g 7.1g 6.9g S    0 10.1  0:00.00 slapd
 7769 ldap  20  0 84.4g 7.1g 6.9g S    0 10.1  0:32.81 slapd
 7770 ldap  20  0 84.4g 7.1g 6.9g S    0 10.1  7:44.94 slapd
 8024 ldap  20  0 84.4g 7.1g 6.9g t    0 10.1  0:32.53 slapd

PASTEBIN:

I pastebinned all the backtraces to: http://pastebin.com/vVGEqEUt

I hope this helps to track down the problem.

Kind regards

Hans Freitag
Linux Administrator
ENTIRETEC AG, Pforzheimer Strasse 33, 01189 Dresden, Germany
E: hans.freitag@entiretec.com

> -----Original Message-----
> From: openldap-bugs-bounces@OpenLDAP.org [mailto:openldap-bugs-bounces@OpenLDAP.org] On Behalf Of quanah@zimbra.com
> Sent: Monday, August 5, 2013 05:15
> To: openldap-its@openldap.org
> Subject: Re: (ITS#7655) segfault during initial mirror of multimaster delta replication
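The netstat output in the follow-up can be read mechanically: for an established TCP socket, Recv-Q counts bytes the kernel has received but the application has not yet read. A minimal sketch, assuming the default `netstat -tn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); `parse_netstat` and `stalled_readers` are hypothetical names. Applied to the two lines in the report, it flags only the connection from 10.169.127.13:52384 to port 389 on 10.169.126.13, whose roughly 1.9 MB backlog is consistent with the replication consumer on the passive node having stopped reading from its provider.

```python
from typing import NamedTuple


class TcpConn(NamedTuple):
    recv_q: int   # bytes received by the kernel but not yet read by the application
    send_q: int   # bytes sent but not yet acknowledged by the peer
    local: str
    foreign: str
    state: str


def parse_netstat(lines: list[str]) -> list[TcpConn]:
    """Parse `netstat -tn` style lines, skipping headers and non-TCP lines."""
    conns = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        conns.append(TcpConn(int(fields[1]), int(fields[2]), fields[3], fields[4], fields[5]))
    return conns


def stalled_readers(conns: list[TcpConn], threshold: int = 0) -> list[TcpConn]:
    """Established sockets whose unread receive queue exceeds the threshold."""
    return [c for c in conns if c.state == "ESTABLISHED" and c.recv_q > threshold]


if __name__ == "__main__":
    sample = [
        "Proto Recv-Q Send-Q Local Address           Foreign Address         State",
        "tcp        0     53 10.169.127.13:389       10.169.126.13:43340     ESTABLISHED",
        "tcp  1905336      0 10.169.127.13:52384     10.169.126.13:389       ESTABLISHED",
    ]
    for c in stalled_readers(parse_netstat(sample)):
        print(f"{c.local} -> {c.foreign}: {c.recv_q} bytes unread")
```

A single snapshot only shows that data is waiting; sampling it a few times and seeing Recv-Q stay high would support the reading that the consumer thread is blocked rather than just momentarily busy.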