
Re: (ITS#3574) server hangs on multiple diverse queries



Your suspicion was right. Our LDAP server runs on a Red Hat 9 (Shrike) 
system. I recompiled it on a system running RHEL (AS). The problem 
doesn't show up there, so we will migrate to RHEL.

I suspect the problem was actually a pthreads problem, but I haven't got 
the time to look into it further.

Thanks for the quick response.

KR,
Frank

On Thu, 3 Mar 2005 hyc@symas.com wrote:

> We'll need more detailed information to identify the problem. At the 
> moment I suspect your operating system / glibc since we have performed 
> quite a bit of heavy querying against this code already without any 
> hangs. It would help to have your specific kernel and glibc version number.
> 
> To get some more information, recompile your libldap_r with 
> -DLDAP_RDWR_DEBUG, so that additional debug info will be present, and 
> relink slapd with this library.
> 
> Then, the next time the hang occurs, print the contents of the 
> bdb->bi_cache.c_rwlock. That should tell us which thread is currently 
> holding the lock. Looking at the code and the trace you sent, I don't 
> believe any of those threads would validly have the lock, but this 
> should tell us for certain.
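
The paragraph above relies on a debug build of the lock that records its
holder, so that a debugger can show who owns it during a hang. The block
below is a minimal sketch of that idea only. It is not OpenLDAP's
implementation: the type debug_lock and the functions debug_lock_init,
debug_lock_acquire, debug_lock_release and debug_lock_held_by_me are
hypothetical names, and it wraps a plain pthread mutex for brevity where
the lock in the report is a reader-writer lock.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical debug wrapper: a lock that remembers its current holder.
 * Illustration only; not the OpenLDAP data structure. */
typedef struct {
    pthread_mutex_t mutex;  /* the real lock */
    pthread_t       owner;  /* meaningful only while held is nonzero */
    int             held;   /* 1 while some thread holds the lock */
} debug_lock;

/* Initialise the lock with no recorded holder. Returns 0 on success. */
int debug_lock_init(debug_lock *dl)
{
    memset(dl, 0, sizeof(*dl));
    return pthread_mutex_init(&dl->mutex, NULL);
}

/* Take the lock, then record the calling thread as the holder. */
int debug_lock_acquire(debug_lock *dl)
{
    int rc = pthread_mutex_lock(&dl->mutex);
    if (rc == 0) {
        dl->owner = pthread_self();
        dl->held = 1;
    }
    return rc;
}

/* Clear the holder record while still holding the lock, then release. */
int debug_lock_release(debug_lock *dl)
{
    dl->held = 0;
    return pthread_mutex_unlock(&dl->mutex);
}

/* Return 1 if the calling thread is the recorded holder, else 0.
 * Meant for assertions and debugger inspection, not for synchronisation. */
int debug_lock_held_by_me(const debug_lock *dl)
{
    return dl->held && pthread_equal(dl->owner, pthread_self());
}
```

With a structure like this, printing the lock from a debugger during a
hang shows the owner field, which can then be matched against the thread
list in the backtrace.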
> 
> Frank.Meisschaert@UGent.be wrote:
> 
> >Full_Name: Frank Meisschaert
> >Version: 2.2.23
> >OS: Linux (2.4)
> >URL: ftp://ftp.openldap.org/incoming/
> >Submission from: (NULL) (157.193.44.125)
> >
> >
> >Hello,
> >
> >We recently upgraded our LDAP infrastructure to OpenLDAP v2.2.23 with BDB
> >v4.3.27 as a backend because of server hangs, but the server still hangs. In
> >the log file we can see that the server keeps accepting new connections, but no
> >RESULTs are returned. A gdb backtrace of a hung process is included at the end.
> >Only one thread is in bdb_cache_lru_add, the others are stuck in
> >bdb_cache_find_id. I suspect a cache problem (lru locking), which is consistent
> >with what we see: the problem only occurs when multiple queries are made
> >requesting many different entries. In our case we can replicate the problem by
> >issuing several searches for random patterns in the 'givenName' attribute (not
> >indexed).
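
The reproduction described above amounts to generating many different
substring filters on 'givenName'. The block below is a rough sketch of
such a generator, not the script used in the report. The function name
make_random_filter and its parameters are hypothetical, and the pattern
is restricted to lowercase ASCII letters so that no filter escaping is
needed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write a substring filter such as (givenName=*abc*)
 * into buf, with patlen random lowercase letters between the wildcards.
 * Returns 0 on success, -1 if patlen is 0 or buf is too small. */
int make_random_filter(char *buf, size_t buflen, size_t patlen)
{
    static const char prefix[]  = "(givenName=*";
    static const char suffix[]  = "*)";
    static const char letters[] = "abcdefghijklmnopqrstuvwxyz";
    /* prefix + pattern + suffix + terminating NUL */
    size_t need = (sizeof(prefix) - 1) + patlen + (sizeof(suffix) - 1) + 1;
    size_t pos, i;

    if (patlen == 0 || buflen < need)
        return -1;

    memcpy(buf, prefix, sizeof(prefix) - 1);
    pos = sizeof(prefix) - 1;
    for (i = 0; i < patlen; i++)
        buf[pos++] = letters[rand() % 26];
    memcpy(buf + pos, suffix, sizeof(suffix)); /* copies the NUL as well */
    return 0;
}
```

Each generated filter would be handed to an LDAP search client, with
several clients running at once, so that the server has to load many
different entries concurrently.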
> >
> >Kind Regards,
> >Frank Meisschaert
> >
> >Thread 18 (process 16670):
> >#0  0xffffe002 in ?? ()
> >#1  0x08052eff in slapd_daemon_task (ptr=0x0) at daemon.c:2007
> >#2  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 17 (process 16671):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809f78a in bdb_cache_lru_add (bdb=0x0, locker=181, ei=0x4425ea88) at
> >cache.c:555
> >#2  0x0809faa8 in bdb_cache_find_id (op=0x82b4c68, tid=0x0, id=825,
> >eip=0x409bc348, islocked=0, locker=181, 
> >    lock=0x409bc35c) at cache.c:799
> >#3  0x080973f7 in bdb_do_search (op=0x82b4c68, rs=0x40a7d61c, sop=0x82b4c68,
> >ps_e=0x0, ps_type=0)
> >    at search.c:935
> >#4  0x08095ab3 in bdb_search (op=0xfffffffc, rs=0x40a7d61c) at search.c:384
> >#5  0x08057adb in do_search (op=0x82b4c68, rs=0x40a7d61c) at search.c:412
> >#6  0x080563ca in connection_operation (ctx=0x40a7d69c, arg_v=0x82b4c68) at
> >connection.c:1086
> >#7  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#8  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 16 (process 16672):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x82d7468, tid=0x0, id=825,
> >eip=0x416be2c8, islocked=0, locker=191, 
> >    lock=0x416be2dc) at cache.c:784
> >#2  0x080973f7 in bdb_do_search (op=0x82d7468, rs=0x4177f59c, sop=0x82d7468,
> >ps_e=0x0, ps_type=0)
> >    at search.c:935
> >#3  0x08095ab3 in bdb_search (op=0xfffffff5, rs=0x4177f59c) at search.c:384
> >#4  0x08057adb in do_search (op=0x82d7468, rs=0x4177f59c) at search.c:412
> >#5  0x080563ca in connection_operation (ctx=0x4177f61c, arg_v=0x82d7468) at
> >connection.c:1086
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 15 (process 16673):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x82d64a0, tid=0x0, id=143,
> >eip=0x41d3e248, islocked=0, locker=199, 
> >    lock=0x41d3e25c) at cache.c:784
> >#2  0x080973f7 in bdb_do_search (op=0x82d64a0, rs=0x41dff51c, sop=0x82d64a0,
> >ps_e=0x0, ps_type=0)
> >    at search.c:935
> >#3  0x08095ab3 in bdb_search (op=0xfffffff5, rs=0x41dff51c) at search.c:384
> >#4  0x08057adb in do_search (op=0x82d64a0, rs=0x41dff51c) at search.c:412
> >#5  0x080563ca in connection_operation (ctx=0x41dff59c, arg_v=0x82d64a0) at
> >connection.c:1086
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 14 (process 16674):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x82a3510, tid=0x0, id=823,
> >eip=0x42c721c8, islocked=0, locker=206, 
> >    lock=0x42c721dc) at cache.c:784
> >#2  0x080973f7 in bdb_do_search (op=0x82a3510, rs=0x42d3349c, sop=0x82a3510,
> >ps_e=0x0, ps_type=0)
> >    at search.c:935
> >#3  0x08095ab3 in bdb_search (op=0xfffffffc, rs=0x42d3349c) at search.c:384
> >#4  0x08057adb in do_search (op=0x82a3510, rs=0x42d3349c) at search.c:412
> >#5  0x080563ca in connection_operation (ctx=0x42d3351c, arg_v=0x82a3510) at
> >connection.c:1086
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 13 (process 16708):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x44200aa8, tid=0x0, id=82,
> >eip=0x4463e148, islocked=0, locker=225, 
> >    lock=0x4463e15c) at cache.c:784
> >#2  0x080973f7 in bdb_do_search (op=0x44200aa8, rs=0x446ff41c, sop=0x44200aa8,
> >ps_e=0x0, ps_type=0)
> >    at search.c:935
> >#3  0x08095ab3 in bdb_search (op=0xfffffff5, rs=0x446ff41c) at search.c:384
> >#4  0x08057adb in do_search (op=0x44200aa8, rs=0x446ff41c) at search.c:412
> >#5  0x080563ca in connection_operation (ctx=0x446ff49c, arg_v=0x44200aa8) at
> >connection.c:1086
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 12 (process 16709):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x44202100, tid=0x0, id=22,
> >eip=0x453400c8, islocked=0, locker=226, 
> >    lock=0x453400dc) at cache.c:784
> >#2  0x080973f7 in bdb_do_search (op=0x44202100, rs=0x4540139c, sop=0x44202100,
> >ps_e=0x0, ps_type=0)
> >    at search.c:935
> >#3  0x08095ab3 in bdb_search (op=0xfffffff5, rs=0x4540139c) at search.c:384
> >#4  0x08057adb in do_search (op=0x44202100, rs=0x4540139c) at search.c:412
> >#5  0x080563ca in connection_operation (ctx=0x4540141c, arg_v=0x44202100) at
> >connection.c:1086
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 11 (process 16710):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x4426fdf8, tid=0x0, id=7,
> >eip=0x46103204, islocked=0, locker=227, 
> >    lock=0x4610324c) at cache.c:784
> >#2  0x080a3619 in bdb_dn2entry (op=0x4426fdf8, tid=0x0, dn=0x0, e=0x46103244,
> >matched=1, locker=227, 
> >    lock=0x4610324c) at dn2entry.c:69
> >#3  0x0809ecb0 in bdb_bind (op=0x4426fdf8, rs=0x4610331c) at bind.c:69
> >#4  0x080690c8 in do_bind (op=0x4426fdf8, rs=0x4610331c) at bind.c:622
> >#5  0x0805630d in connection_operation (ctx=0x4610339c, arg_v=0x4426fdf8) at
> >connection.c:1051
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 10 (process 16711):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x4426ff88, tid=0x0, id=7,
> >eip=0x46604184, islocked=0, locker=228, 
> >    lock=0x466041cc) at cache.c:784
> >#2  0x080a3619 in bdb_dn2entry (op=0x4426ff88, tid=0x0, dn=0x0, e=0x466041c4,
> >matched=1, locker=228, 
> >    lock=0x466041cc) at dn2entry.c:69
> >#3  0x0809ecb0 in bdb_bind (op=0x4426ff88, rs=0x4660429c) at bind.c:69
> >#4  0x080690c8 in do_bind (op=0x4426ff88, rs=0x4660429c) at bind.c:622
> >#5  0x0805630d in connection_operation (ctx=0x4660431c, arg_v=0x4426ff88) at
> >connection.c:1051
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 9 (process 16712):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x44270118, tid=0x0, id=7,
> >eip=0x46b05104, islocked=0, locker=229, 
> >    lock=0x46b0514c) at cache.c:784
> >#2  0x080a3619 in bdb_dn2entry (op=0x44270118, tid=0x0, dn=0x0, e=0x46b05144,
> >matched=1, locker=229, 
> >    lock=0x46b0514c) at dn2entry.c:69
> >#3  0x0809ecb0 in bdb_bind (op=0x44270118, rs=0x46b0521c) at bind.c:69
> >#4  0x080690c8 in do_bind (op=0x44270118, rs=0x46b0521c) at bind.c:622
> >#5  0x0805630d in connection_operation (ctx=0x46b0529c, arg_v=0x44270118) at
> >connection.c:1051
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 8 (process 16713):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x44263ef8, tid=0x0, id=7,
> >eip=0x47006084, islocked=0, locker=230, 
> >    lock=0x470060cc) at cache.c:784
> >#2  0x080a3619 in bdb_dn2entry (op=0x44263ef8, tid=0x0, dn=0x0, e=0x470060c4,
> >matched=1, locker=230, 
> >    lock=0x470060cc) at dn2entry.c:69
> >#3  0x0809ecb0 in bdb_bind (op=0x44263ef8, rs=0x4700619c) at bind.c:69
> >#4  0x080690c8 in do_bind (op=0x44263ef8, rs=0x4700619c) at bind.c:622
> >#5  0x0805630d in connection_operation (ctx=0x4700621c, arg_v=0x44263ef8) at
> >connection.c:1051
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 7 (process 16714):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x442ca410, tid=0x0, id=7,
> >eip=0x47507004, islocked=0, locker=231, 
> >    lock=0x4750704c) at cache.c:784
> >#2  0x080a3619 in bdb_dn2entry (op=0x442ca410, tid=0x0, dn=0x0, e=0x47507044,
> >matched=1, locker=231, 
> >    lock=0x4750704c) at dn2entry.c:69
> >#3  0x0809ecb0 in bdb_bind (op=0x442ca410, rs=0x4750711c) at bind.c:69
> >#4  0x080690c8 in do_bind (op=0x442ca410, rs=0x4750711c) at bind.c:622
> >#5  0x0805630d in connection_operation (ctx=0x4750719c, arg_v=0x442ca410) at
> >connection.c:1051
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 6 (process 16715):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x442ca5d8, tid=0x0, id=7,
> >eip=0x47a07f84, islocked=0, locker=232, 
> >    lock=0x47a07fcc) at cache.c:784
> >#2  0x080a3619 in bdb_dn2entry (op=0x442ca5d8, tid=0x0, dn=0x0, e=0x47a07fc4,
> >matched=1, locker=232, 
> >    lock=0x47a07fcc) at dn2entry.c:69
> >#3  0x0809ecb0 in bdb_bind (op=0x442ca5d8, rs=0x47a0809c) at bind.c:69
> >#4  0x080690c8 in do_bind (op=0x442ca5d8, rs=0x47a0809c) at bind.c:622
> >#5  0x0805630d in connection_operation (ctx=0x47a0811c, arg_v=0x442ca5d8) at
> >connection.c:1051
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 5 (process 16716):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x4425b7b0, tid=0x0, id=7,
> >eip=0x47f08f04, islocked=0, locker=233, 
> >    lock=0x47f08f4c) at cache.c:784
> >#2  0x080a3619 in bdb_dn2entry (op=0x4425b7b0, tid=0x0, dn=0x0, e=0x47f08f44,
> >matched=1, locker=233, 
> >    lock=0x47f08f4c) at dn2entry.c:69
> >#3  0x0809ecb0 in bdb_bind (op=0x4425b7b0, rs=0x47f0901c) at bind.c:69
> >#4  0x080690c8 in do_bind (op=0x4425b7b0, rs=0x47f0901c) at bind.c:622
> >#5  0x0805630d in connection_operation (ctx=0x47f0909c, arg_v=0x4425b7b0) at
> >connection.c:1051
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 4 (process 16717):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x442acfb8, tid=0x0, id=7,
> >eip=0x48409e84, islocked=0, locker=234, 
> >    lock=0x48409ecc) at cache.c:784
> >#2  0x080a3619 in bdb_dn2entry (op=0x442acfb8, tid=0x0, dn=0x0, e=0x48409ec4,
> >matched=1, locker=234, 
> >    lock=0x48409ecc) at dn2entry.c:69
> >#3  0x0809ecb0 in bdb_bind (op=0x442acfb8, rs=0x48409f9c) at bind.c:69
> >#4  0x080690c8 in do_bind (op=0x442acfb8, rs=0x48409f9c) at bind.c:622
> >#5  0x0805630d in connection_operation (ctx=0x4840a01c, arg_v=0x442acfb8) at
> >connection.c:1051
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 3 (process 16718):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x442ad1e0, tid=0x0, id=7,
> >eip=0x4890ae04, islocked=0, locker=235, 
> >    lock=0x4890ae4c) at cache.c:784
> >#2  0x080a3619 in bdb_dn2entry (op=0x442ad1e0, tid=0x0, dn=0x0, e=0x4890ae44,
> >matched=1, locker=235, 
> >    lock=0x4890ae4c) at dn2entry.c:69
> >#3  0x0809ecb0 in bdb_bind (op=0x442ad1e0, rs=0x4890af1c) at bind.c:69
> >#4  0x080690c8 in do_bind (op=0x442ad1e0, rs=0x4890af1c) at bind.c:622
> >#5  0x0805630d in connection_operation (ctx=0x4890af9c, arg_v=0x442ad1e0) at
> >connection.c:1051
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 2 (process 16719):
> >#0  0xffffe002 in ?? ()
> >#1  0x0809fa4d in bdb_cache_find_id (op=0x442ad408, tid=0x0, id=7,
> >eip=0x48e0bd84, islocked=0, locker=236, 
> >    lock=0x48e0bdcc) at cache.c:784
> >#2  0x080a3619 in bdb_dn2entry (op=0x442ad408, tid=0x0, dn=0x0, e=0x48e0bdc4,
> >matched=1, locker=236, 
> >    lock=0x48e0bdcc) at dn2entry.c:69
> >#3  0x0809ecb0 in bdb_bind (op=0x442ad408, rs=0x48e0be9c) at bind.c:69
> >#4  0x080690c8 in do_bind (op=0x442ad408, rs=0x48e0be9c) at bind.c:622
> >#5  0x0805630d in connection_operation (ctx=0x48e0bf1c, arg_v=0x442ad408) at
> >connection.c:1051
> >#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
> >#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
> >
> >Thread 1 (process 16669):
> >#0  0xffffe002 in ?? ()
> >#1  0x080545dd in slapd_daemon () at daemon.c:2041
> >#2  0x0804b7b0 in main (argc=7, argv=0xbfffed44) at main.c:713
> >#3  0x42015574 in __libc_start_main () from /lib/tls/libc.so.6
> >