
Re: (ITS#3574) server hangs on multiple diverse queries



We'll need more detailed information to identify the problem. At the 
moment I suspect your operating system / glibc since we have performed 
quite a bit of heavy querying against this code already without any 
hangs. It would help to have your specific kernel and glibc version numbers.

To get some more information, recompile your libldap_r with 
-DLDAP_RDWR_DEBUG, so that additional debug info will be present, and 
relink slapd with this library.

Then, the next time the hang occurs, print the contents of the 
bdb->bi_cache.c_rwlock. That should tell us which thread is currently 
holding the lock. Looking at the code and the trace you sent, I don't 
believe any of those threads would validly have the lock, but this 
should tell us for certain.

Frank.Meisschaert@UGent.be wrote:

>Full_Name: Frank Meisschaert
>Version: 2.2.23
>OS: Linux (2.4)
>URL: ftp://ftp.openldap.org/incoming/
>Submission from: (NULL) (157.193.44.125)
>
>
>Hello,
>
>We recently upgraded our ldap infrastructure to OpenLDAP v2.2.23 with BDB
>v4.3.27 as a backend due to some server hangs, but the server keeps hanging. In
>the log file we can see that the server keeps accepting new connections, but no
>RESULTs are returned. A gdb backtrace of a hung process is included at the end.
>Only one thread is in bdb_cache_lru_add, the others are stuck in
>bdb_cache_find_id. I suspect a cache problem (lru locking), which is consistent
>with what we see: the problem only occurs when multiple queries are made
>requesting many different entries. In our case we can replicate the problem by
>issuing several searches for random patterns in the 'givenName' attribute (not
>indexed).
>
>Kind Regards,
>Frank Meisschaert
>
>Thread 18 (process 16670):
>#0  0xffffe002 in ?? ()
>#1  0x08052eff in slapd_daemon_task (ptr=0x0) at daemon.c:2007
>#2  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 17 (process 16671):
>#0  0xffffe002 in ?? ()
>#1  0x0809f78a in bdb_cache_lru_add (bdb=0x0, locker=181, ei=0x4425ea88) at
>cache.c:555
>#2  0x0809faa8 in bdb_cache_find_id (op=0x82b4c68, tid=0x0, id=825,
>eip=0x409bc348, islocked=0, locker=181, 
>    lock=0x409bc35c) at cache.c:799
>#3  0x080973f7 in bdb_do_search (op=0x82b4c68, rs=0x40a7d61c, sop=0x82b4c68,
>ps_e=0x0, ps_type=0)
>    at search.c:935
>#4  0x08095ab3 in bdb_search (op=0xfffffffc, rs=0x40a7d61c) at search.c:384
>#5  0x08057adb in do_search (op=0x82b4c68, rs=0x40a7d61c) at search.c:412
>#6  0x080563ca in connection_operation (ctx=0x40a7d69c, arg_v=0x82b4c68) at
>connection.c:1086
>#7  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#8  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 16 (process 16672):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x82d7468, tid=0x0, id=825,
>eip=0x416be2c8, islocked=0, locker=191, 
>    lock=0x416be2dc) at cache.c:784
>#2  0x080973f7 in bdb_do_search (op=0x82d7468, rs=0x4177f59c, sop=0x82d7468,
>ps_e=0x0, ps_type=0)
>    at search.c:935
>#3  0x08095ab3 in bdb_search (op=0xfffffff5, rs=0x4177f59c) at search.c:384
>#4  0x08057adb in do_search (op=0x82d7468, rs=0x4177f59c) at search.c:412
>#5  0x080563ca in connection_operation (ctx=0x4177f61c, arg_v=0x82d7468) at
>connection.c:1086
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 15 (process 16673):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x82d64a0, tid=0x0, id=143,
>eip=0x41d3e248, islocked=0, locker=199, 
>    lock=0x41d3e25c) at cache.c:784
>#2  0x080973f7 in bdb_do_search (op=0x82d64a0, rs=0x41dff51c, sop=0x82d64a0,
>ps_e=0x0, ps_type=0)
>    at search.c:935
>#3  0x08095ab3 in bdb_search (op=0xfffffff5, rs=0x41dff51c) at search.c:384
>#4  0x08057adb in do_search (op=0x82d64a0, rs=0x41dff51c) at search.c:412
>#5  0x080563ca in connection_operation (ctx=0x41dff59c, arg_v=0x82d64a0) at
>connection.c:1086
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 14 (process 16674):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x82a3510, tid=0x0, id=823,
>eip=0x42c721c8, islocked=0, locker=206, 
>    lock=0x42c721dc) at cache.c:784
>#2  0x080973f7 in bdb_do_search (op=0x82a3510, rs=0x42d3349c, sop=0x82a3510,
>ps_e=0x0, ps_type=0)
>    at search.c:935
>#3  0x08095ab3 in bdb_search (op=0xfffffffc, rs=0x42d3349c) at search.c:384
>#4  0x08057adb in do_search (op=0x82a3510, rs=0x42d3349c) at search.c:412
>#5  0x080563ca in connection_operation (ctx=0x42d3351c, arg_v=0x82a3510) at
>connection.c:1086
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 13 (process 16708):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x44200aa8, tid=0x0, id=82,
>eip=0x4463e148, islocked=0, locker=225, 
>    lock=0x4463e15c) at cache.c:784
>#2  0x080973f7 in bdb_do_search (op=0x44200aa8, rs=0x446ff41c, sop=0x44200aa8,
>ps_e=0x0, ps_type=0)
>    at search.c:935
>#3  0x08095ab3 in bdb_search (op=0xfffffff5, rs=0x446ff41c) at search.c:384
>#4  0x08057adb in do_search (op=0x44200aa8, rs=0x446ff41c) at search.c:412
>#5  0x080563ca in connection_operation (ctx=0x446ff49c, arg_v=0x44200aa8) at
>connection.c:1086
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 12 (process 16709):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x44202100, tid=0x0, id=22,
>eip=0x453400c8, islocked=0, locker=226, 
>    lock=0x453400dc) at cache.c:784
>#2  0x080973f7 in bdb_do_search (op=0x44202100, rs=0x4540139c, sop=0x44202100,
>ps_e=0x0, ps_type=0)
>    at search.c:935
>#3  0x08095ab3 in bdb_search (op=0xfffffff5, rs=0x4540139c) at search.c:384
>#4  0x08057adb in do_search (op=0x44202100, rs=0x4540139c) at search.c:412
>#5  0x080563ca in connection_operation (ctx=0x4540141c, arg_v=0x44202100) at
>connection.c:1086
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 11 (process 16710):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x4426fdf8, tid=0x0, id=7,
>eip=0x46103204, islocked=0, locker=227, 
>    lock=0x4610324c) at cache.c:784
>#2  0x080a3619 in bdb_dn2entry (op=0x4426fdf8, tid=0x0, dn=0x0, e=0x46103244,
>matched=1, locker=227, 
>    lock=0x4610324c) at dn2entry.c:69
>#3  0x0809ecb0 in bdb_bind (op=0x4426fdf8, rs=0x4610331c) at bind.c:69
>#4  0x080690c8 in do_bind (op=0x4426fdf8, rs=0x4610331c) at bind.c:622
>#5  0x0805630d in connection_operation (ctx=0x4610339c, arg_v=0x4426fdf8) at
>connection.c:1051
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 10 (process 16711):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x4426ff88, tid=0x0, id=7,
>eip=0x46604184, islocked=0, locker=228, 
>    lock=0x466041cc) at cache.c:784
>#2  0x080a3619 in bdb_dn2entry (op=0x4426ff88, tid=0x0, dn=0x0, e=0x466041c4,
>matched=1, locker=228, 
>    lock=0x466041cc) at dn2entry.c:69
>#3  0x0809ecb0 in bdb_bind (op=0x4426ff88, rs=0x4660429c) at bind.c:69
>#4  0x080690c8 in do_bind (op=0x4426ff88, rs=0x4660429c) at bind.c:622
>#5  0x0805630d in connection_operation (ctx=0x4660431c, arg_v=0x4426ff88) at
>connection.c:1051
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 9 (process 16712):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x44270118, tid=0x0, id=7,
>eip=0x46b05104, islocked=0, locker=229, 
>    lock=0x46b0514c) at cache.c:784
>#2  0x080a3619 in bdb_dn2entry (op=0x44270118, tid=0x0, dn=0x0, e=0x46b05144,
>matched=1, locker=229, 
>    lock=0x46b0514c) at dn2entry.c:69
>#3  0x0809ecb0 in bdb_bind (op=0x44270118, rs=0x46b0521c) at bind.c:69
>#4  0x080690c8 in do_bind (op=0x44270118, rs=0x46b0521c) at bind.c:622
>#5  0x0805630d in connection_operation (ctx=0x46b0529c, arg_v=0x44270118) at
>connection.c:1051
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 8 (process 16713):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x44263ef8, tid=0x0, id=7,
>eip=0x47006084, islocked=0, locker=230, 
>    lock=0x470060cc) at cache.c:784
>#2  0x080a3619 in bdb_dn2entry (op=0x44263ef8, tid=0x0, dn=0x0, e=0x470060c4,
>matched=1, locker=230, 
>    lock=0x470060cc) at dn2entry.c:69
>#3  0x0809ecb0 in bdb_bind (op=0x44263ef8, rs=0x4700619c) at bind.c:69
>#4  0x080690c8 in do_bind (op=0x44263ef8, rs=0x4700619c) at bind.c:622
>#5  0x0805630d in connection_operation (ctx=0x4700621c, arg_v=0x44263ef8) at
>connection.c:1051
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 7 (process 16714):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x442ca410, tid=0x0, id=7,
>eip=0x47507004, islocked=0, locker=231, 
>    lock=0x4750704c) at cache.c:784
>#2  0x080a3619 in bdb_dn2entry (op=0x442ca410, tid=0x0, dn=0x0, e=0x47507044,
>matched=1, locker=231, 
>    lock=0x4750704c) at dn2entry.c:69
>#3  0x0809ecb0 in bdb_bind (op=0x442ca410, rs=0x4750711c) at bind.c:69
>#4  0x080690c8 in do_bind (op=0x442ca410, rs=0x4750711c) at bind.c:622
>#5  0x0805630d in connection_operation (ctx=0x4750719c, arg_v=0x442ca410) at
>connection.c:1051
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 6 (process 16715):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x442ca5d8, tid=0x0, id=7,
>eip=0x47a07f84, islocked=0, locker=232, 
>    lock=0x47a07fcc) at cache.c:784
>#2  0x080a3619 in bdb_dn2entry (op=0x442ca5d8, tid=0x0, dn=0x0, e=0x47a07fc4,
>matched=1, locker=232, 
>    lock=0x47a07fcc) at dn2entry.c:69
>#3  0x0809ecb0 in bdb_bind (op=0x442ca5d8, rs=0x47a0809c) at bind.c:69
>#4  0x080690c8 in do_bind (op=0x442ca5d8, rs=0x47a0809c) at bind.c:622
>#5  0x0805630d in connection_operation (ctx=0x47a0811c, arg_v=0x442ca5d8) at
>connection.c:1051
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 5 (process 16716):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x4425b7b0, tid=0x0, id=7,
>eip=0x47f08f04, islocked=0, locker=233, 
>    lock=0x47f08f4c) at cache.c:784
>#2  0x080a3619 in bdb_dn2entry (op=0x4425b7b0, tid=0x0, dn=0x0, e=0x47f08f44,
>matched=1, locker=233, 
>    lock=0x47f08f4c) at dn2entry.c:69
>#3  0x0809ecb0 in bdb_bind (op=0x4425b7b0, rs=0x47f0901c) at bind.c:69
>#4  0x080690c8 in do_bind (op=0x4425b7b0, rs=0x47f0901c) at bind.c:622
>#5  0x0805630d in connection_operation (ctx=0x47f0909c, arg_v=0x4425b7b0) at
>connection.c:1051
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 4 (process 16717):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x442acfb8, tid=0x0, id=7,
>eip=0x48409e84, islocked=0, locker=234, 
>    lock=0x48409ecc) at cache.c:784
>#2  0x080a3619 in bdb_dn2entry (op=0x442acfb8, tid=0x0, dn=0x0, e=0x48409ec4,
>matched=1, locker=234, 
>    lock=0x48409ecc) at dn2entry.c:69
>#3  0x0809ecb0 in bdb_bind (op=0x442acfb8, rs=0x48409f9c) at bind.c:69
>#4  0x080690c8 in do_bind (op=0x442acfb8, rs=0x48409f9c) at bind.c:622
>#5  0x0805630d in connection_operation (ctx=0x4840a01c, arg_v=0x442acfb8) at
>connection.c:1051
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 3 (process 16718):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x442ad1e0, tid=0x0, id=7,
>eip=0x4890ae04, islocked=0, locker=235, 
>    lock=0x4890ae4c) at cache.c:784
>#2  0x080a3619 in bdb_dn2entry (op=0x442ad1e0, tid=0x0, dn=0x0, e=0x4890ae44,
>matched=1, locker=235, 
>    lock=0x4890ae4c) at dn2entry.c:69
>#3  0x0809ecb0 in bdb_bind (op=0x442ad1e0, rs=0x4890af1c) at bind.c:69
>#4  0x080690c8 in do_bind (op=0x442ad1e0, rs=0x4890af1c) at bind.c:622
>#5  0x0805630d in connection_operation (ctx=0x4890af9c, arg_v=0x442ad1e0) at
>connection.c:1051
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 2 (process 16719):
>#0  0xffffe002 in ?? ()
>#1  0x0809fa4d in bdb_cache_find_id (op=0x442ad408, tid=0x0, id=7,
>eip=0x48e0bd84, islocked=0, locker=236, 
>    lock=0x48e0bdcc) at cache.c:784
>#2  0x080a3619 in bdb_dn2entry (op=0x442ad408, tid=0x0, dn=0x0, e=0x48e0bdc4,
>matched=1, locker=236, 
>    lock=0x48e0bdcc) at dn2entry.c:69
>#3  0x0809ecb0 in bdb_bind (op=0x442ad408, rs=0x48e0be9c) at bind.c:69
>#4  0x080690c8 in do_bind (op=0x442ad408, rs=0x48e0be9c) at bind.c:622
>#5  0x0805630d in connection_operation (ctx=0x48e0bf1c, arg_v=0x442ad408) at
>connection.c:1051
>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>
>Thread 1 (process 16669):
>#0  0xffffe002 in ?? ()
>#1  0x080545dd in slapd_daemon () at daemon.c:2041
>#2  0x0804b7b0 in main (argc=7, argv=0xbfffed44) at main.c:713
>#3  0x42015574 in __libc_start_main () from /lib/tls/libc.so.6
>


-- 
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support