Full_Name: Michael Str�der Version: RE24 OS: OpenSUSE Linux 10.3 URL: Submission from: (NULL) (84.163.88.4) HI! I already reported different test scripts failing every now and then. I ran 'make test' a couple of times and finally it failed once in test050 (see below). Running test050 again succeeds. I've also attached the debug logs. Ciao, Michael. >>>>> Starting test050-syncrepl-multimaster ... running defines.sh Initializing server configurations... Starting producer slapd on TCP/IP port 9011... Using ldapsearch to check that producer slapd is running... Inserting syncprov overlay on producer... Starting consumer slapd on TCP/IP port 9012... Using ldapsearch to check that consumer slapd is running... Configuring syncrepl on consumer... Starting consumer2 slapd on TCP/IP port 9013... Using ldapsearch to check that consumer2 slapd is running... Configuring syncrepl on consumer2... Adding schema and databases on producer... Using ldapadd to populate producer... Waiting 20 seconds for syncrepl to receive changes... Using ldapadd to populate consumer... ldapadd failed for consumer database (32)! >>>>> ./scripts/test050-syncrepl-multimaster failed (exit 32) make[2]: *** [hdb-yes] Error 32 make[2]: Leaving directory `/home/michael/src/openldap/OPENLDAP_REL_ENG_2_4/openldap/tests' make[1]: *** [test] Error 2 make[1]: Leaving directory `/home/michael/src/openldap/OPENLDAP_REL_ENG_2_4/openldap/tests' make: *** [test] Error 2 --------------------------- end of test.out ---------------------------- ldap_add: No such object (32) matched DN: ou=People,dc=example,dc=com adding new entry "cn=James A Jones 2,ou=Alumni Association,ou=People,dc=example,dc=com"
See slapd's debug logs attached as tar.gz
michael@stroeder.com wrote: > Full_Name: Michael Ströder > Version: RE24 > OS: OpenSUSE Linux 10.3 > URL: > Submission from: (NULL) (84.163.88.4) > > > HI! > > I already reported different test scripts failing every now and then. > I ran 'make test' a couple of times and finally it failed once in test050 (see > below). Running test050 again succeeds. > > I've also attached the debug logs. As noted in the other email thread, I'm pretty sure this is the same as http://www.openldap.org/lists/openldap-devel/200804/msg00067.html I have committed a patch to syncprov.c for this, but I haven't reproduced the original problem so I can't verify that it is fixed. Running test050 by itself 20 times never showed any problem for me. (Dual core OpenSUSE 10.2) If you can get it to fail more regularly, let me know... > Ciao, Michael. > >>>>>> Starting test050-syncrepl-multimaster ... > running defines.sh > Initializing server configurations... > Starting producer slapd on TCP/IP port 9011... > Using ldapsearch to check that producer slapd is running... > Inserting syncprov overlay on producer... > Starting consumer slapd on TCP/IP port 9012... > Using ldapsearch to check that consumer slapd is running... > Configuring syncrepl on consumer... > Starting consumer2 slapd on TCP/IP port 9013... > Using ldapsearch to check that consumer2 slapd is running... > Configuring syncrepl on consumer2... > Adding schema and databases on producer... > Using ldapadd to populate producer... > Waiting 20 seconds for syncrepl to receive changes... > Using ldapadd to populate consumer... > ldapadd failed for consumer database (32)! >>>>>> ./scripts/test050-syncrepl-multimaster failed (exit 32) > make[2]: *** [hdb-yes] Error 32 > make[2]: Leaving directory > `/home/michael/src/openldap/OPENLDAP_REL_ENG_2_4/openldap/tests' > make[1]: *** [test] Error 2 > make[1]: Leaving directory > `/home/michael/src/openldap/OPENLDAP_REL_ENG_2_4/openldap/tests' > make: *** [test] Error 2 > > --------------------------- end of test.out ---------------------------- > ldap_add: No such object (32) > matched DN: ou=People,dc=example,dc=com > adding new entry "cn=James A Jones 2,ou=Alumni > Association,ou=People,dc=example,dc=com" > > > -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Le Mer 16 avril 2008 10:41, hyc@symas.com a écrit : > michael@stroeder.com wrote: >> Full_Name: Michael Ströder >> Version: RE24 >> OS: OpenSUSE Linux 10.3 >> URL: >> Submission from: (NULL) (84.163.88.4) >> >> >> HI! >> >> I already reported different test scripts failing every now and then. >> I ran 'make test' a couple of times and finally it failed once in >> test050 (see >> below). Running test050 again succeeds. >> >> I've also attached the debug logs. > > As noted in the other email thread, I'm pretty sure this is the same as > http://www.openldap.org/lists/openldap-devel/200804/msg00067.html > > I have committed a patch to syncprov.c for this, but I haven't reproduced > the > original problem so I can't verify that it is fixed. Running test050 by > itself > 20 times never showed any problem for me. (Dual core OpenSUSE 10.2) > > If you can get it to fail more regularly, let me know... All tests passed now twice here, plus 5 times for test050/hdb only. Ubuntu Hardy/i386/dual core. Regards, Raphaël Ouazana.
Howard Chu wrote: > > I have committed a patch to syncprov.c for this, but I haven't > reproduced the original problem so I can't verify that it is fixed. > Running test050 by itself 20 times never showed any problem for me. > (Dual core OpenSUSE 10.2) Note that I reported it especially for RE24 since I've only tested it there. I will check HEAD more thoroughly with your patch in HEAD. Any chance to see exactly this patch in RE24 any time soon so that I can test with this particular code-base? Yes, I could port your patch to RE24 myself but I'd prefer if we have exactly the same code-base. Ciao, Michael.
Re, Le Mer 16 avril 2008 10:41, hyc@symas.com a écrit : > michael@stroeder.com wrote: >> Full_Name: Michael Ströder >> Version: RE24 >> OS: OpenSUSE Linux 10.3 >> URL: >> Submission from: (NULL) (84.163.88.4) >> >> >> HI! >> >> I already reported different test scripts failing every now and then. >> I ran 'make test' a couple of times and finally it failed once in >> test050 (see >> below). Running test050 again succeeds. >> >> I've also attached the debug logs. > > As noted in the other email thread, I'm pretty sure this is the same as > http://www.openldap.org/lists/openldap-devel/200804/msg00067.html > > I have committed a patch to syncprov.c for this, but I haven't reproduced > the > original problem so I can't verify that it is fixed. Running test050 by > itself > 20 times never showed any problem for me. (Dual core OpenSUSE 10.2) > > If you can get it to fail more regularly, let me know... . In fact test050 fails with or without your patch on syncprov.c. Logs attached (RE24+last syncprov patch). Regards, Raphaël Ouazana. Cleaning up test run directory leftover from previous run. Running ./scripts/test050-syncrepl-multimaster... running defines.sh Initializing server configurations... Starting producer slapd on TCP/IP port 9011... Using ldapsearch to check that producer slapd is running... Inserting syncprov overlay on producer... Starting consumer slapd on TCP/IP port 9012... Using ldapsearch to check that consumer slapd is running... Configuring syncrepl on consumer... Starting consumer2 slapd on TCP/IP port 9013... Using ldapsearch to check that consumer2 slapd is running... Configuring syncrepl on consumer2... Adding schema and databases on producer... Using ldapadd to populate producer... Waiting 20 seconds for syncrepl to receive changes... Using ldapadd to populate consumer... ldapadd failed for consumer database (32)!
Raphaël Ouazana-Sustowski wrote: > Re, > > Le Mer 16 avril 2008 10:41, hyc@symas.com a écrit : >> I have committed a patch to syncprov.c for this, but I haven't reproduced >> the >> original problem so I can't verify that it is fixed. Running test050 by >> itself >> 20 times never showed any problem for me. (Dual core OpenSUSE 10.2) >> >> If you can get it to fail more regularly, let me know... > . > In fact test050 fails with or without your patch on syncprov.c. OK, thanks. I actually reproduced the failure once as well. Will need a different workaround. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
changed notes changed state Open to Test moved from Incoming to Software Bugs
Raphaël Ouazana-Sustowski wrote: > Re, > > Le Mer 16 avril 2008 10:41, hyc@symas.com a écrit : >> If you can get it to fail more regularly, let me know... > . > In fact test050 fails with or without your patch on syncprov.c. > > Logs attached (RE24+last syncprov patch). OK, please try again using the latest syncrepl.c patch. The syncprov.c patch has been reverted since it did nothing helpful. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Hi, Le Jeu 17 avril 2008 01:59, Howard Chu a écrit : > Raphaël Ouazana-Sustowski wrote: >> Re, >> >> Le Mer 16 avril 2008 10:41, hyc@symas.com a écrit : >>> If you can get it to fail more regularly, let me know... >> . >> In fact test050 fails with or without your patch on syncprov.c. >> >> Logs attached (RE24+last syncprov patch). > > OK, please try again using the latest syncrepl.c patch. The syncprov.c > patch > has been reverted since it did nothing helpful. Failed after 68 iterations: Cleaning up test run directory leftover from previous run. Running ./scripts/test050-syncrepl-multimaster... running defines.sh Initializing server configurations... Starting producer slapd on TCP/IP port 9011... Using ldapsearch to check that producer slapd is running... Inserting syncprov overlay on producer... Starting consumer slapd on TCP/IP port 9012... Using ldapsearch to check that consumer slapd is running... Configuring syncrepl on consumer... Starting consumer2 slapd on TCP/IP port 9013... Using ldapsearch to check that consumer2 slapd is running... Configuring syncrepl on consumer2... Adding schema and databases on producer... Using ldapadd to populate producer... Waiting 20 seconds for syncrepl to receive changes... Using ldapadd to populate consumer... Waiting 20 seconds for syncrepl to receive changes... Using ldapsearch to check that syncrepl received database changes... Waiting 5 seconds for syncrepl to receive changes... Waiting 5 seconds for syncrepl to receive changes... Waiting 5 seconds for syncrepl to receive changes... Waiting 5 seconds for syncrepl to receive changes... Waiting 5 seconds for syncrepl to receive changes... Waiting 5 seconds for syncrepl to receive changes... ldapsearch failed (32)!
I ran into this same problem and was able to fix it. I had been using the yum version of Berkeley DB 4.5 and updated to Oracle Berkeley 4.6 and then it worked fine. Just make sure that when you run configure that it includes the correct version of db.h
scratch that.. it happened again. :\ fooie
I updated RE24 this morning with a couple of fixes from Howard. See if you can still repo, thanks. --Quanah --On April 21, 2008 5:31:48 PM +0000 phil@cogbill.net wrote: > scratch that.. it happened again. :\ fooie > > -- Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
Hi, Le Lun 21 avril 2008 19:44, quanah@zimbra.com a écrit : > I updated RE24 this morning with a couple of fixes from Howard. > > See if you can still repo, thanks. It still seems to fail, after more than 100 runs: Cleaning up test run directory leftover from previous run. Running ./scripts/test050-syncrepl-multimaster... running defines.sh Initializing server configurations... Starting producer slapd on TCP/IP port 9011... Using ldapsearch to check that producer slapd is running... Inserting syncprov overlay on producer... Starting consumer slapd on TCP/IP port 9012... Using ldapsearch to check that consumer slapd is running... Configuring syncrepl on consumer... Starting consumer2 slapd on TCP/IP port 9013... Using ldapsearch to check that consumer2 slapd is running... Configuring syncrepl on consumer2... Adding schema and databases on producer... Using ldapadd to populate producer... Waiting 20 seconds for syncrepl to receive changes... Using ldapadd to populate consumer... Waiting 20 seconds for syncrepl to receive changes... Using ldapsearch to check that syncrepl received database changes... Using ldapsearch to check that syncrepl received database changes on consumer2... Using ldapsearch to read config from the producer... Using ldapsearch to read config from the consumer... Using ldapsearch to read config from consumer2... Filtering producer results... Filtering consumer results... Filtering consumer2 results... Comparing retrieved configs from producer and consumer... Comparing retrieved configs from producer and consumer2... Using ldapsearch to read all the entries from the producer... Using ldapsearch to read all the entries from the consumer... Using ldapsearch to read all the entries from the consumer2... Filtering producer results... Filtering consumer results... Filtering consumer2 results... Comparing retrieved entries from producer and consumer... Comparing retrieved entries from producer and consumer2... test failed - producer and consumer2 databases differ I sent the testrun directory through FTP: ftp://ftp.openldap.org/incoming/raphael-ouazana-testrun-20080424.tgz Regards, Raphaël Ouazana.
raphael.ouazana@linagora.com wrote: > Hi, > > Le Lun 21 avril 2008 19:44, quanah@zimbra.com a écrit : >> I updated RE24 this morning with a couple of fixes from Howard. >> >> See if you can still repo, thanks. > > It still seems to fail, after more than 100 runs: It appears from your logs that the entry "cn=ITD Staff,ou=Groups,dc=example,dc=com" is being added first, and then removed from server 3, probably related to some issue in sync replication. Can you enable "sync" log when running the tests? SLAPD_DEBUG="sync,stats,args,trace" ./run test050 p. Ing. Pierangelo Masarati OpenLDAP Core Team SysNet s.r.l. via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it --------------------------------------- Office: +39 02 23998309 Mobile: +39 333 4963172 Email: pierangelo.masarati@sys-net.it ---------------------------------------
ando@sys-net.it wrote: > raphael.ouazana@linagora.com wrote: >> Hi, >> >> Le Lun 21 avril 2008 19:44, quanah@zimbra.com a écrit : >>> I updated RE24 this morning with a couple of fixes from Howard. >>> >>> See if you can still repo, thanks. >> It still seems to fail, after more than 100 runs: > > It appears from your logs that the entry > > "cn=ITD Staff,ou=Groups,dc=example,dc=com" > > is being added first, and then removed from server 3, probably related > to some issue in sync replication. Seems like the consumer has received a cookie from one provider, while receiving a newer entry from the other provider. During the nonpresent search the new entry is found and removed. So we need to restrict the filter in the nonpresent search to only return entries older than the cookie context. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Here it is again. HEAD code checked out this morning. I'm able to reproduce the crash fairly often. testrun directory uploaded at ftp://ftp.sys-net.it/luca_scamoni_its5470_20080430.tgz the core file is in testrun/con
Howard Chu wrote: > Thanks. Please try HEAD again. No way. new testrun directory in ftp://ftp.sys-net.it/luca_scamoni_its5470_20080430-new.tgz backtrace attached
luca@OpenLDAP.org wrote: > This is a multi-part message in MIME format. > --------------080809000906010300090306 > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > Content-Transfer-Encoding: 7bit > > Howard Chu wrote: > >> Thanks. Please try HEAD again. >> > No way. > new testrun directory in > ftp://ftp.sys-net.it/luca_scamoni_its5470_20080430-new.tgz > > backtrace attached > recent commits seem to have fixed it (at least, right now I'm not able to reproduce it anymore...)
luca@OpenLDAP.org wrote: > luca@OpenLDAP.org wrote: >> This is a multi-part message in MIME format. >> --------------080809000906010300090306 >> Content-Type: text/plain; charset=ISO-8859-1; format=flowed >> Content-Transfer-Encoding: 7bit >> >> Howard Chu wrote: >> >>> Thanks. Please try HEAD again. >>> >> No way. >> new testrun directory in >> ftp://ftp.sys-net.it/luca_scamoni_its5470_20080430-new.tgz >> >> backtrace attached >> > recent commits seem to have fixed it (at least, right now I'm not able > to reproduce it anymore...) Right. Confirmed here too; I (temporarily) added an assert(0) to the offending branch of code to make sure the patch was actually getting hit. It takes a very particular timing to trigger that code path. I'm not sure how we can reliably test for this down the road. Perhaps we should add a "disabled" config keyword for backends and syncrepl consumers, so that we can start up the individual servers, (which takes an unpredictable amount of time for each) and then enable various parts in a fixed sequence (e.g. 1 second sleeps between ldapmodify/enable requests). Even that's hit or miss, because our test database is so small it's unlikely that we can hit the window of time on demand. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Hi, Le Ven 2 mai 2008 11:01, hyc@symas.com a écrit : > luca@OpenLDAP.org wrote: >> luca@OpenLDAP.org wrote: >>> This is a multi-part message in MIME format. >>> --------------080809000906010300090306 >>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed >>> Content-Transfer-Encoding: 7bit >>> >>> Howard Chu wrote: >>> >>>> Thanks. Please try HEAD again. >>>> >>> No way. >>> new testrun directory in >>> ftp://ftp.sys-net.it/luca_scamoni_its5470_20080430-new.tgz >>> >>> backtrace attached >>> >> recent commits seem to have fixed it (at least, right now I'm not able >> to reproduce it anymore...) > > Right. Confirmed here too; I (temporarily) added an assert(0) to the > offending > branch of code to make sure the patch was actually getting hit. It takes a > very particular timing to trigger that code path. > > I'm not sure how we can reliably test for this down the road. Perhaps we > should add a "disabled" config keyword for backends and syncrepl > consumers, so > that we can start up the individual servers, (which takes an unpredictable > amount of time for each) and then enable various parts in a fixed sequence > (e.g. 1 second sleeps between ldapmodify/enable requests). Even that's hit > or > miss, because our test database is so small it's unlikely that we can hit > the > window of time on demand. I'm testing the last RE24 tag. After 201 successful runs of test050, I got a failure :/ Cleaning up test run directory leftover from previous run. Running ./scripts/test050-syncrepl-multimaster... running defines.sh Initializing server configurations... Starting producer slapd on TCP/IP port 9011... Using ldapsearch to check that producer slapd is running... Inserting syncprov overlay on producer... Starting consumer slapd on TCP/IP port 9012... Using ldapsearch to check that consumer slapd is running... Configuring syncrepl on consumer... Starting consumer2 slapd on TCP/IP port 9013... Using ldapsearch to check that consumer2 slapd is running... Configuring syncrepl on consumer2... Adding schema and databases on producer... Using ldapadd to populate producer... Waiting 20 seconds for syncrepl to receive changes... Using ldapadd to populate consumer... Waiting 20 seconds for syncrepl to receive changes... Using ldapsearch to check that syncrepl received database changes... Waiting 5 seconds for syncrepl to receive changes... Waiting 5 seconds for syncrepl to receive changes... Waiting 5 seconds for syncrepl to receive changes... Waiting 5 seconds for syncrepl to receive changes... Waiting 5 seconds for syncrepl to receive changes... Waiting 5 seconds for syncrepl to receive changes... ldapsearch failed (32)! testrun uploaded in ftp://ftp.openldap.org/incoming/raphael-ouazana-testrun-080505.tgz Regards, Raphaël Ouazana.
Raphaël Ouazana-Sustowski wrote: > Hi, > > Le Ven 2 mai 2008 11:01, hyc@symas.com a écrit : >> luca@OpenLDAP.org wrote: >>> luca@OpenLDAP.org wrote: >>>> This is a multi-part message in MIME format. >>>> --------------080809000906010300090306 >>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed >>>> Content-Transfer-Encoding: 7bit >>>> >>>> Howard Chu wrote: >>>> >>>>> Thanks. Please try HEAD again. >>>>> >>>> No way. >>>> new testrun directory in >>>> ftp://ftp.sys-net.it/luca_scamoni_its5470_20080430-new.tgz >>>> >>>> backtrace attached >>>> >>> recent commits seem to have fixed it (at least, right now I'm not able >>> to reproduce it anymore...) >> Right. Confirmed here too; I (temporarily) added an assert(0) to the >> offending >> branch of code to make sure the patch was actually getting hit. It takes a >> very particular timing to trigger that code path. >> >> I'm not sure how we can reliably test for this down the road. Perhaps we >> should add a "disabled" config keyword for backends and syncrepl >> consumers, so >> that we can start up the individual servers, (which takes an unpredictable >> amount of time for each) and then enable various parts in a fixed sequence >> (e.g. 1 second sleeps between ldapmodify/enable requests). Even that's hit >> or >> miss, because our test database is so small it's unlikely that we can hit >> the >> window of time on demand. > > I'm testing the last RE24 tag. After 201 successful runs of test050, I got > a failure :/ > Cleaning up test run directory leftover from previous run. > Running ./scripts/test050-syncrepl-multimaster... > running defines.sh > Initializing server configurations... > Starting producer slapd on TCP/IP port 9011... > Using ldapsearch to check that producer slapd is running... > Inserting syncprov overlay on producer... > Starting consumer slapd on TCP/IP port 9012... > Using ldapsearch to check that consumer slapd is running... > Configuring syncrepl on consumer... > Starting consumer2 slapd on TCP/IP port 9013... > Using ldapsearch to check that consumer2 slapd is running... > Configuring syncrepl on consumer2... > Adding schema and databases on producer... > Using ldapadd to populate producer... > Waiting 20 seconds for syncrepl to receive changes... > Using ldapadd to populate consumer... > Waiting 20 seconds for syncrepl to receive changes... > Using ldapsearch to check that syncrepl received database changes... > Waiting 5 seconds for syncrepl to receive changes... > Waiting 5 seconds for syncrepl to receive changes... > Waiting 5 seconds for syncrepl to receive changes... > Waiting 5 seconds for syncrepl to receive changes... > Waiting 5 seconds for syncrepl to receive changes... > Waiting 5 seconds for syncrepl to receive changes... > ldapsearch failed (32)! > > testrun uploaded in > ftp://ftp.openldap.org/incoming/raphael-ouazana-testrun-080505.tgz The logs show that the syncrepl consumers all timed out periodically, when trying to bind to a provider. It seems that using a 1 second timeout in the syncrepl configs is too short, or your test machine was too slow during that run. Probably we should remove that timeout now, since the cn=config/thread pause issue has already been resolved. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
changed notes changed state Test to Release
Hi, Le Lun 5 mai 2008 23:08, Howard Chu a écrit : > The logs show that the syncrepl consumers all timed out periodically, when > trying to bind to a provider. It seems that using a 1 second timeout in > the > syncrepl configs is too short, or your test machine was too slow during > that run. > > Probably we should remove that timeout now, since the cn=config/thread > pause > issue has already been resolved. Currently testing new RE24. I have now a slapd process stuck at 100% : \_ /bin/sh ./scripts/test050-syncrepl-multimaster hdb \_ /tmp/openldap/tests/../servers/slapd/slapd -s0 -F slapd.d -h ldap://localhost:9011/ -d 261 \_ /tmp/openldap/tests/../servers/slapd/slapd -s0 -F ./slapd.d -h ldap://localhost:9012/ -d 261 \_ /tmp/openldap/tests/../servers/slapd/slapd -s0 -F ./slapd.d -h ldap://localhost:9013/ -d 261 \_ /bin/sh ./scripts/test050-syncrepl-multimaster hdb \_ ../clients/tools/ldapsearch -P 3 -x -LLL -H ldap://localhost:9013/ -s base -b cn=Ursula Hampster,ou=Alumni Association,ou=People,dc=example,dc=com (objectClass=*) \_ awk /^dn:/ {print "OK"} Do you need some debugging info (strace, gdb, log, or other) ? Regards, Raphael Ouazana.
--On Wednesday, May 07, 2008 12:44 PM +0000 raphael.ouazana@linagora.com wrote: > Currently testing new RE24. I have now a slapd process stuck at 100% : > \_ /bin/sh ./scripts/test050-syncrepl-multimaster hdb > \_ /tmp/openldap/tests/../servers/slapd/slapd -s0 -F slapd.d -h > ldap://localhost:9011/ -d 261 > \_ /tmp/openldap/tests/../servers/slapd/slapd -s0 -F ./slapd.d -h > ldap://localhost:9012/ -d 261 > \_ /tmp/openldap/tests/../servers/slapd/slapd -s0 -F ./slapd.d -h > ldap://localhost:9013/ -d 261 > \_ /bin/sh ./scripts/test050-syncrepl-multimaster hdb > \_ ../clients/tools/ldapsearch -P 3 -x -LLL -H > ldap://localhost:9013/ -s base -b cn=Ursula Hampster,ou=Alumni > Association,ou=People,dc=example,dc=com (objectClass=*) > \_ awk /^dn:/ {print "OK"} > > Do you need some debugging info (strace, gdb, log, or other) ? Please provide a gdb backtrace of the hung process. --Quanah -- Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
Hi, Le Mer 7 mai 2008 16:11, Quanah Gibson-Mount a écrit : > --On Wednesday, May 07, 2008 12:44 PM +0000 raphael.ouazana@linagora.com > wrote: > >> Currently testing new RE24. I have now a slapd process stuck at 100% : >> \_ /bin/sh ./scripts/test050-syncrepl-multimaster hdb >> \_ /tmp/openldap/tests/../servers/slapd/slapd -s0 -F slapd.d -h >> ldap://localhost:9011/ -d 261 >> \_ /tmp/openldap/tests/../servers/slapd/slapd -s0 -F ./slapd.d -h >> ldap://localhost:9012/ -d 261 >> \_ /tmp/openldap/tests/../servers/slapd/slapd -s0 -F ./slapd.d -h >> ldap://localhost:9013/ -d 261 >> \_ /bin/sh ./scripts/test050-syncrepl-multimaster hdb >> \_ ../clients/tools/ldapsearch -P 3 -x -LLL -H >> ldap://localhost:9013/ -s base -b cn=Ursula Hampster,ou=Alumni >> Association,ou=People,dc=example,dc=com (objectClass=*) >> \_ awk /^dn:/ {print "OK"} >> >> Do you need some debugging info (strace, gdb, log, or other) ? > > Please provide a gdb backtrace of the hung process. It is not really hung, it is more in an infinite loop. Here is the backtrace: (gdb) thread apply all bt Thread 5 (Thread 0xb7ad5b90 (LWP 6028)): #0 0xb7eef410 in __kernel_vsyscall () #1 0xb7d14676 in epoll_wait () from /lib/tls/i686/cmov/libc.so.6 #2 0x080600bb in slapd_daemon_task (ptr=0x0) at daemon.c:2281 #3 0xb7da54fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0 #4 0xb7d13e5e in clone () from /lib/tls/i686/cmov/libc.so.6 Thread 4 (Thread 0xb76d4b90 (LWP 6030)): #0 0xb7eef410 in __kernel_vsyscall () #1 0xb7da9aa5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0 #2 0x0807302b in send_ldap_ber (conn=0xb7ad6b28, ber=0xb76d2e98) at result.c:188 #3 0x08075256 in slap_send_search_entry (op=0x8252af0, rs=0xb76d4178) at result.c:1200 #4 0x08056418 in config_send (op=0x8252af0, rs=0xb76d4178, ce=0x828bf48, depth=0) at bconfig.c:3507 #5 0x080563e4 in config_send (op=0x8252af0, rs=0xb76d4178, ce=0x828bf48, depth=1) at bconfig.c:3516 #6 0x080563be in config_send (op=0x8252af0, rs=0xb76d4178, ce=0x82527c0, depth=0) at bconfig.c:3511 #7 0x080563e4 in config_send (op=0x8252af0, rs=0xb76d4178, ce=0x82527c0, depth=1) at bconfig.c:3516 #8 0x080563be in config_send (op=0x8252af0, rs=0xb76d4178, ce=0x824fc80, depth=0) at bconfig.c:3511 #9 0x080564e4 in config_back_search (op=0x8252af0, rs=0xb76d4178) at bconfig.c:5261 #10 0x08065283 in fe_op_search (op=0x8252af0, rs=0xb76d4178) at search.c:366 #11 0x08065af7 in do_search (op=0x8252af0, rs=0xb76d4178) at search.c:217 #12 0x0806312f in connection_operation (ctx=0xb76d4248, arg_v=0x8252af0) at connection.c:1084 #13 0x080639bf in connection_read_thread (ctx=0xb76d4248, argv=0x8) at connection.c:1211 #14 0x0812fb48 in ldap_int_thread_pool_wrapper (xpool=0x822cae8) at tpool.c:663 #15 0xb7da54fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0 #16 0xb7d13e5e in clone () from /lib/tls/i686/cmov/libc.so.6 Thread 3 (Thread 0xb71d2b90 (LWP 6034)): #0 0xb7eef410 in __kernel_vsyscall () #1 0xb7da9aa5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0 #2 0x0812fbb5 in ldap_int_thread_pool_wrapper (xpool=0x822cae8) at tpool.c:654 #3 0xb7da54fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0 #4 0xb7d13e5e in clone () from /lib/tls/i686/cmov/libc.so.6 Thread 2 (Thread 0xb6dd1b90 (LWP 6035)): #0 0xb7eef410 in __kernel_vsyscall () #1 0xb7da9aa5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0 #2 0x0812f4c2 in handle_pause (tpool=<value optimized out>, do_pause=1) at tpool.c:738 #3 0x08054abf in config_back_add (op=0xb6dd0dd4, rs=0xb6dd0728) at bconfig.c:4548 ---Type <return> to continue, or q <return> to quit--- #4 0x080bb70f in syncrepl_entry (si=0x8255888, op=0xb6dd0dd4, entry=0x823ffec, modlist=0xb6dd0b44, syncstate=1, syncUUID=0xb6dd0b0c, syncCSN=0x0) at syncrepl.c:2108 #5 0x080be0d9 in do_syncrep2 (op=0xb6dd0dd4, si=0x8255888) at syncrepl.c:862 #6 0x080bfec4 in do_syncrepl (ctx=0xb6dd1248, arg=0x8255ac8) at syncrepl.c:1276 #7 0x08063a06 in connection_read_thread (ctx=0xb6dd1248, argv=0x9) at connection.c:1213 #8 0x0812fb48 in ldap_int_thread_pool_wrapper (xpool=0x822cae8) at tpool.c:663 #9 0xb7da54fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0 #10 0xb7d13e5e in clone () from /lib/tls/i686/cmov/libc.so.6 Thread 1 (Thread 0xb7c3c6b0 (LWP 6024)): #0 0xb7eef410 in __kernel_vsyscall () #1 0xb7da6775 in pthread_join () from /lib/tls/i686/cmov/libpthread.so.0 #2 0x0805d633 in slapd_daemon () at daemon.c:2644 #3 0x0804c597 in main (argc=8, argv=0xbfaefcc4) at main.c:946 #0 0xb7eef410 in __kernel_vsyscall () (gdb) Regards, Raphaël Ouazana.
raphael.ouazana@linagora.com wrote: > It is not really hung, it is more in an infinite loop. Nothing in this trace indicates a loop; all of the threads are in wait states. Is there anything in the logs that indicates a loop? -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Le Jeu 8 mai 2008 00:03, Howard Chu a écrit : > raphael.ouazana@linagora.com wrote: >> It is not really hung, it is more in an infinite loop. > > Nothing in this trace indicates a loop; all of the threads are in wait > states. > Is there anything in the logs that indicates a loop? Yes, at least two things: - one CPU is about at 100% - logs from slapd.1.log and slapd.2.log are slowly increasing It seems to be an infinite loop between the first slapd and the second slapd, where the second is always active. I'm sorry I won't be able to access the logs until Tuesday, and all the processes are now killed. Regards, Raphaël Ouazana.
changed notes changed state Release to Closed
syncrepl fixed in HEAD fixed in RE24