
Re: HEAD is hanging quite frequently...



This is likely related to the new lightweight dispatcher
code in HEAD...

Kurt

At 12:12 AM 10/16/2005, Pierangelo Masarati wrote:
>I'm seeing frequent occurrences of slapd getting stuck during trivial
>operations; this is, for example, test003.  The problem is quite
>repeatable, although sometimes it hangs in a slightly different place.
>I haven't been able to complete it in tens of attempts, though.
>
>[masarati@ando tests]$ SLAPD_DEBUG=256 ./run test003
>Cleaning up test run directory leftover from previous run.
>Running ./scripts/test003-search...
>running defines.sh
>Running slapadd to build slapd database...
>Running slapindex to index slapd database...
>Starting slapd on TCP/IP port 9011...
>Testing slapd searching...
>Testing exact searching...
>Testing approximate searching...
>Testing OR searching...
>Testing AND matching and ends-with searching...
>Testing NOT searching...
>[hung...]
>
>(gdb) thread apply all bt
>
>Thread 4 (Thread 1082132832 (LWP 31897)):
>#0  0x0000002a962e3f1c in epoll_wait () from /lib64/tls/libc.so.6
>#1  0x0000000000438383 in slapd_daemon_task (ptr=0x0) at daemon.c:1803
>#2  0x0000002a9610f0aa in start_thread ()
>from /lib64/tls/libpthread.so.0
>#3  0x0000002a962e3b43 in clone () from /lib64/tls/libc.so.6
>#4  0x0000000000000000 in ?? ()
>
>Thread 3 (Thread 1090525536 (LWP 31899)):
>#0  0x0000002a961118da in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib64/tls/libpthread.so.0
>#1  0x00000000005b0ce8 in ldap_pvt_thread_cond_wait (cond=0x80bf30,
>    mutex=0x80bf08) at thr_posix.c:282
>#2  0x00000000005afa8f in ldap_int_thread_pool_wrapper (xpool=0x80bf00)
>    at tpool.c:607
>#3  0x0000002a9610f0aa in start_thread ()
>from /lib64/tls/libpthread.so.0
>#4  0x0000002a962e3b43 in clone () from /lib64/tls/libc.so.6
>#5  0x0000000000000000 in ?? ()
>
>Thread 2 (Thread 1098918240 (LWP 31900)):
>#0  0x0000002a961118da in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib64/tls/libpthread.so.0
>#1  0x00000000005b0ce8 in ldap_pvt_thread_cond_wait (cond=0x80bf30,
>    mutex=0x80bf08) at thr_posix.c:282
>#2  0x00000000005afa8f in ldap_int_thread_pool_wrapper (xpool=0x80bf00)
>    at tpool.c:607
>#3  0x0000002a9610f0aa in start_thread ()
>from /lib64/tls/libpthread.so.0
>#4  0x0000002a962e3b43 in clone () from /lib64/tls/libc.so.6
>#5  0x0000000000000000 in ?? ()
>
>Thread 1 (Thread 182910803744 (LWP 31895)):
>#0  0x0000002a9610ff2b in pthread_join ()
>from /lib64/tls/libpthread.so.0
>#1  0x00000000005b0c40 in ldap_pvt_thread_join (thread=1082132832,
>    thread_return=0x0) at thr_posix.c:186
>#2  0x0000000000439086 in slapd_daemon () at daemon.c:2179
>#3  0x0000000000423c08 in main (argc=8, argv=0x7fbffff168) at main.c:785
>
>The output is:
>
>@(#) $OpenLDAP: slapd 2.X (Oct 16 2005 09:00:22) $
>
>cat testrun/slapd.1.log
>masarati@ando:/home/masarati/Lavoro/sysnet/Ldap/ldap/servers/slapd
>bdb_db_open: Warning - No DB_CONFIG file found in
>directory ./testrun/db.1.a: (2)
>Expect poor performance for suffix dc=example,dc=com.
>slapd starting
>conn=0 fd=11 ACCEPT from IP=127.0.0.1:33162 (IP=127.0.0.1:9011)
>conn=0 op=0 BIND dn="" method=128
>conn=0 op=0 RESULT tag=97 err=0 text=
>conn=0 op=1 SRCH base="" scope=0 deref=0 filter="(objectClass=*)"
>conn=0 op=1 SEARCH RESULT tag=101 err=0 nentries=1 text=
>conn=1 fd=12 ACCEPT from IP=127.0.0.1:33163 (IP=127.0.0.1:9011)
>conn=0 op=2 UNBIND
>conn=0 fd=11 closed
>conn=1 op=0 BIND dn="" method=128
>conn=1 op=0 RESULT tag=97 err=0 text=
>conn=1 op=1 SRCH base="dc=example,dc=com" scope=2 deref=0
>filter="(sn=jensen)"
>conn=1 op=1 SEARCH RESULT tag=101 err=0 nentries=2 text=
>conn=1 op=2 UNBIND
>conn=1 fd=12 closed
>connection_read(12): no connection!
>conn=2 fd=12 ACCEPT from IP=127.0.0.1:33164 (IP=127.0.0.1:9011)
>conn=2 op=0 BIND dn="" method=128
>conn=2 op=0 RESULT tag=97 err=0 text=
>conn=2 op=1 SRCH base="dc=example,dc=com" scope=2 deref=0
>filter="(sn~=jensen)"
>conn=2 op=1 SRCH attr=name
><= bdb_approx_candidates: (sn) index_param failed (18)
>conn=2 op=1 SEARCH RESULT tag=101 err=0 nentries=2 text=
>conn=2 op=2 UNBIND
>conn=2 fd=12 closed
>conn=3 fd=12 ACCEPT from IP=127.0.0.1:33165 (IP=127.0.0.1:9011)
>conn=3 op=0 BIND dn="" method=128
>conn=3 op=0 RESULT tag=97 err=0 text=
>get_filter: conn 3 unknown attribute type=undef (17)
>conn=3 op=1 SRCH base="dc=example,dc=com" scope=2 deref=0 filter="(|(givenName=xx*yy*z)(?=undefined)(?=false)(objectClass=groupOfNames)(sn=jones)(member=cn=manager,dc=example,dc=com)(uniqueMember=cn=manager,dc=example,dc=com))"
><= bdb_substring_candidates: (givenName) index_param failed (18)
><= bdb_equality_candidates: (member) index_param failed (18)
><= bdb_equality_candidates: (uniqueMember) index_param failed (18)
>conn=3 op=1 SEARCH RESULT tag=101 err=0 nentries=4 text=
>conn=3 op=2 UNBIND
>conn=3 fd=12 closed
>connection_read(12): no connection!
>conn=4 fd=12 ACCEPT from IP=127.0.0.1:33166 (IP=127.0.0.1:9011)
>conn=4 op=0 BIND dn="" method=128
>conn=4 op=0 RESULT tag=97 err=0 text=
>conn=4 op=1 SRCH base="ou=groups,dc=example,dc=com" scope=1 deref=0
>filter="(&(objectClass=groupOfNames)(cn=a*)(member=cn=mark
>elliot,ou=alumni association,ou=people,dc=example,dc=com))"
><= bdb_equality_candidates: (member) index_param failed (18)
>conn=4 op=1 SEARCH RESULT tag=101 err=0 nentries=2 text=
>conn=4 op=2 UNBIND
>conn=4 fd=12 closed
>connection_read(12): no connection!
>
>I note those strange messages about connection_read (no writing seems to
>be occurring at all, during this test).
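
One way to check how those connection_read messages line up with the
preceding closes is to scan slapd.1.log for them. What follows is a minimal
Python sketch, not part of slapd or its test suite: the function name
stale_reads and both regular expressions are assumptions based only on the
log lines quoted above.

```python
import re

# Patterns inferred from the quoted log lines (assumed, not taken from slapd).
CLOSED = re.compile(r"conn=(\d+) fd=(\d+) closed")
STALE = re.compile(r"connection_read\((\d+)\): no connection!")


def stale_reads(lines):
    """Return (conn, fd) for each close directly followed by a
    'connection_read(fd): no connection!' message on the same fd."""
    hits = []
    last_closed = None  # (conn, fd) from the previous line, if it was a close
    for line in lines:
        m = CLOSED.search(line)
        if m:
            last_closed = (int(m.group(1)), int(m.group(2)))
            continue
        m = STALE.search(line)
        if m and last_closed and int(m.group(1)) == last_closed[1]:
            hits.append(last_closed)
        # Any line other than a close resets the "previous line" state.
        last_closed = None
    return hits


sample = """\
conn=1 fd=12 closed
connection_read(12): no connection!
conn=2 fd=12 closed
conn=3 fd=12 ACCEPT from IP=127.0.0.1:33165 (IP=127.0.0.1:9011)
conn=3 fd=12 closed
connection_read(12): no connection!
""".splitlines()

print(stale_reads(sample))  # [(1, 12), (3, 12)]
```

In the log quoted above, the message directly follows the close of conn=1,
conn=3 and conn=4 (all on fd 12), but not the close of conn=0 or conn=2.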
>
>The software/hardware is CentOS 4.2 (a Red Hat AS4 clone) running on
>AMD64.
>
>Any clues?  I'm not submitting an ITS yet because I suspect something
>trivial on my side.
>
>p.
>
>
>
>    SysNet - via Dossi,8 27100 Pavia Tel: +390382573859 Fax: +390382476497