
Re: (ITS#3574) server hangs on multiple diverse queries



OK, thanks for the confirmation, this ITS is now closed.

Frank.Meisschaert@UGent.be wrote:

>Your suspicion was right. Our LDAP server runs on a Red Hat 9 (Shrike) 
>system. I recompiled it on a system running RHEL (AS). The problem 
>doesn't show up there, so we will migrate to RHEL.
>
>I suspect the problem was actually a pthreads problem, but I haven't got 
>the time to look into it further.
>
>Thanks for the quick response.
>
>KR,
>Frank
>
>On Thu, 3 Mar 2005 hyc@symas.com wrote:
>
>>We'll need more detailed information to identify the problem. At the 
>>moment I suspect your operating system / glibc since we have performed 
>>quite a bit of heavy querying against this code already without any 
>>hangs. It would help to have your specific kernel and glibc version numbers.
>>
>>To get some more information, recompile your libldap_r with 
>>-DLDAP_RDWR_DEBUG, so that additional debug info will be present, and 
>>relink slapd with this library.
>>
>>Then, the next time the hang occurs, print the contents of 
>>bdb->bi_cache.c_rwlock. That should tell us which thread is currently 
>>holding the lock. Looking at the code and the trace you sent, I don't 
>>believe any of those threads would validly have the lock, but this 
>>should tell us for certain.
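
To illustrate why knowing the lock holder settles the question, here is a minimal Python sketch of a lock that remembers which thread owns it. It is not OpenLDAP code and makes no claim about what -DLDAP_RDWR_DEBUG actually records. It also uses a plain mutex rather than a reader/writer lock, and the names OwnerTrackingLock, owner and demo are hypothetical.

```python
import threading


class OwnerTrackingLock:
    """Hypothetical mutex wrapper that remembers which thread holds it."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None  # name of the holding thread, or None when free

    def acquire(self, timeout=-1):
        # timeout=-1 means "wait forever", as with threading.Lock.acquire.
        acquired = self._lock.acquire(timeout=timeout)
        if acquired:
            self.owner = threading.current_thread().name
        return acquired

    def release(self):
        self.owner = None
        self._lock.release()


def demo():
    """Hold the lock in one thread and report who blocks a second acquirer."""
    lock = OwnerTrackingLock()
    holding = threading.Event()
    done = threading.Event()

    def holder():
        lock.acquire()
        holding.set()   # tell the main thread the lock is now held
        done.wait()     # keep holding until the main thread has looked
        lock.release()

    worker = threading.Thread(target=holder, name="worker-17")
    worker.start()
    holding.wait()

    # The acquire times out, but the recorded owner names the culprit.
    got_it = lock.acquire(timeout=0.1)
    blocked_by = lock.owner

    done.set()
    worker.join()
    return got_it, blocked_by
```

In a hung process the same idea applies in reverse: if the recorded holder is a thread that has no business holding the lock, the fault is more likely below the application, for example in the threading library.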
>>
>>Frank.Meisschaert@UGent.be wrote:
>>
>>>Full_Name: Frank Meisschaert
>>>Version: 2.2.23
>>>OS: Linux (2.4)
>>>URL: ftp://ftp.openldap.org/incoming/
>>>Submission from: (NULL) (157.193.44.125)
>>>
>>>
>>>Hello,
>>>
>>>We recently upgraded our LDAP infrastructure to OpenLDAP v2.2.23 with BDB
>>>v4.3.27 as the backend because of server hangs, but the server still hangs. In
>>>the log file we can see that the server keeps accepting new connections, but no
>>>RESULTs are returned. A gdb backtrace of a hung process is included at the end.
>>>Only one thread is in bdb_cache_lru_add; the others are stuck in
>>>bdb_cache_find_id. I suspect a cache problem (LRU locking), which is consistent
>>>with what we see: the problem only occurs when multiple queries are made
>>>requesting many different entries. In our case we can reproduce the problem by
>>>issuing several searches for random patterns in the 'givenName' attribute
>>>(which is not indexed).
>>>
>>>Kind Regards,
>>>Frank Meisschaert
>>>
>>>Thread 18 (process 16670):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x08052eff in slapd_daemon_task (ptr=0x0) at daemon.c:2007
>>>#2  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 17 (process 16671):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809f78a in bdb_cache_lru_add (bdb=0x0, locker=181, ei=0x4425ea88) at
>>>cache.c:555
>>>#2  0x0809faa8 in bdb_cache_find_id (op=0x82b4c68, tid=0x0, id=825,
>>>eip=0x409bc348, islocked=0, locker=181, 
>>>   lock=0x409bc35c) at cache.c:799
>>>#3  0x080973f7 in bdb_do_search (op=0x82b4c68, rs=0x40a7d61c, sop=0x82b4c68,
>>>ps_e=0x0, ps_type=0)
>>>   at search.c:935
>>>#4  0x08095ab3 in bdb_search (op=0xfffffffc, rs=0x40a7d61c) at search.c:384
>>>#5  0x08057adb in do_search (op=0x82b4c68, rs=0x40a7d61c) at search.c:412
>>>#6  0x080563ca in connection_operation (ctx=0x40a7d69c, arg_v=0x82b4c68) at
>>>connection.c:1086
>>>#7  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#8  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 16 (process 16672):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x82d7468, tid=0x0, id=825,
>>>eip=0x416be2c8, islocked=0, locker=191, 
>>>   lock=0x416be2dc) at cache.c:784
>>>#2  0x080973f7 in bdb_do_search (op=0x82d7468, rs=0x4177f59c, sop=0x82d7468,
>>>ps_e=0x0, ps_type=0)
>>>   at search.c:935
>>>#3  0x08095ab3 in bdb_search (op=0xfffffff5, rs=0x4177f59c) at search.c:384
>>>#4  0x08057adb in do_search (op=0x82d7468, rs=0x4177f59c) at search.c:412
>>>#5  0x080563ca in connection_operation (ctx=0x4177f61c, arg_v=0x82d7468) at
>>>connection.c:1086
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 15 (process 16673):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x82d64a0, tid=0x0, id=143,
>>>eip=0x41d3e248, islocked=0, locker=199, 
>>>   lock=0x41d3e25c) at cache.c:784
>>>#2  0x080973f7 in bdb_do_search (op=0x82d64a0, rs=0x41dff51c, sop=0x82d64a0,
>>>ps_e=0x0, ps_type=0)
>>>   at search.c:935
>>>#3  0x08095ab3 in bdb_search (op=0xfffffff5, rs=0x41dff51c) at search.c:384
>>>#4  0x08057adb in do_search (op=0x82d64a0, rs=0x41dff51c) at search.c:412
>>>#5  0x080563ca in connection_operation (ctx=0x41dff59c, arg_v=0x82d64a0) at
>>>connection.c:1086
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 14 (process 16674):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x82a3510, tid=0x0, id=823,
>>>eip=0x42c721c8, islocked=0, locker=206, 
>>>   lock=0x42c721dc) at cache.c:784
>>>#2  0x080973f7 in bdb_do_search (op=0x82a3510, rs=0x42d3349c, sop=0x82a3510,
>>>ps_e=0x0, ps_type=0)
>>>   at search.c:935
>>>#3  0x08095ab3 in bdb_search (op=0xfffffffc, rs=0x42d3349c) at search.c:384
>>>#4  0x08057adb in do_search (op=0x82a3510, rs=0x42d3349c) at search.c:412
>>>#5  0x080563ca in connection_operation (ctx=0x42d3351c, arg_v=0x82a3510) at
>>>connection.c:1086
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 13 (process 16708):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x44200aa8, tid=0x0, id=82,
>>>eip=0x4463e148, islocked=0, locker=225, 
>>>   lock=0x4463e15c) at cache.c:784
>>>#2  0x080973f7 in bdb_do_search (op=0x44200aa8, rs=0x446ff41c, sop=0x44200aa8,
>>>ps_e=0x0, ps_type=0)
>>>   at search.c:935
>>>#3  0x08095ab3 in bdb_search (op=0xfffffff5, rs=0x446ff41c) at search.c:384
>>>#4  0x08057adb in do_search (op=0x44200aa8, rs=0x446ff41c) at search.c:412
>>>#5  0x080563ca in connection_operation (ctx=0x446ff49c, arg_v=0x44200aa8) at
>>>connection.c:1086
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 12 (process 16709):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x44202100, tid=0x0, id=22,
>>>eip=0x453400c8, islocked=0, locker=226, 
>>>   lock=0x453400dc) at cache.c:784
>>>#2  0x080973f7 in bdb_do_search (op=0x44202100, rs=0x4540139c, sop=0x44202100,
>>>ps_e=0x0, ps_type=0)
>>>   at search.c:935
>>>#3  0x08095ab3 in bdb_search (op=0xfffffff5, rs=0x4540139c) at search.c:384
>>>#4  0x08057adb in do_search (op=0x44202100, rs=0x4540139c) at search.c:412
>>>#5  0x080563ca in connection_operation (ctx=0x4540141c, arg_v=0x44202100) at
>>>connection.c:1086
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 11 (process 16710):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x4426fdf8, tid=0x0, id=7,
>>>eip=0x46103204, islocked=0, locker=227, 
>>>   lock=0x4610324c) at cache.c:784
>>>#2  0x080a3619 in bdb_dn2entry (op=0x4426fdf8, tid=0x0, dn=0x0, e=0x46103244,
>>>matched=1, locker=227, 
>>>   lock=0x4610324c) at dn2entry.c:69
>>>#3  0x0809ecb0 in bdb_bind (op=0x4426fdf8, rs=0x4610331c) at bind.c:69
>>>#4  0x080690c8 in do_bind (op=0x4426fdf8, rs=0x4610331c) at bind.c:622
>>>#5  0x0805630d in connection_operation (ctx=0x4610339c, arg_v=0x4426fdf8) at
>>>connection.c:1051
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 10 (process 16711):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x4426ff88, tid=0x0, id=7,
>>>eip=0x46604184, islocked=0, locker=228, 
>>>   lock=0x466041cc) at cache.c:784
>>>#2  0x080a3619 in bdb_dn2entry (op=0x4426ff88, tid=0x0, dn=0x0, e=0x466041c4,
>>>matched=1, locker=228, 
>>>   lock=0x466041cc) at dn2entry.c:69
>>>#3  0x0809ecb0 in bdb_bind (op=0x4426ff88, rs=0x4660429c) at bind.c:69
>>>#4  0x080690c8 in do_bind (op=0x4426ff88, rs=0x4660429c) at bind.c:622
>>>#5  0x0805630d in connection_operation (ctx=0x4660431c, arg_v=0x4426ff88) at
>>>connection.c:1051
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 9 (process 16712):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x44270118, tid=0x0, id=7,
>>>eip=0x46b05104, islocked=0, locker=229, 
>>>   lock=0x46b0514c) at cache.c:784
>>>#2  0x080a3619 in bdb_dn2entry (op=0x44270118, tid=0x0, dn=0x0, e=0x46b05144,
>>>matched=1, locker=229, 
>>>   lock=0x46b0514c) at dn2entry.c:69
>>>#3  0x0809ecb0 in bdb_bind (op=0x44270118, rs=0x46b0521c) at bind.c:69
>>>#4  0x080690c8 in do_bind (op=0x44270118, rs=0x46b0521c) at bind.c:622
>>>#5  0x0805630d in connection_operation (ctx=0x46b0529c, arg_v=0x44270118) at
>>>connection.c:1051
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 8 (process 16713):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x44263ef8, tid=0x0, id=7,
>>>eip=0x47006084, islocked=0, locker=230, 
>>>   lock=0x470060cc) at cache.c:784
>>>#2  0x080a3619 in bdb_dn2entry (op=0x44263ef8, tid=0x0, dn=0x0, e=0x470060c4,
>>>matched=1, locker=230, 
>>>   lock=0x470060cc) at dn2entry.c:69
>>>#3  0x0809ecb0 in bdb_bind (op=0x44263ef8, rs=0x4700619c) at bind.c:69
>>>#4  0x080690c8 in do_bind (op=0x44263ef8, rs=0x4700619c) at bind.c:622
>>>#5  0x0805630d in connection_operation (ctx=0x4700621c, arg_v=0x44263ef8) at
>>>connection.c:1051
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 7 (process 16714):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x442ca410, tid=0x0, id=7,
>>>eip=0x47507004, islocked=0, locker=231, 
>>>   lock=0x4750704c) at cache.c:784
>>>#2  0x080a3619 in bdb_dn2entry (op=0x442ca410, tid=0x0, dn=0x0, e=0x47507044,
>>>matched=1, locker=231, 
>>>   lock=0x4750704c) at dn2entry.c:69
>>>#3  0x0809ecb0 in bdb_bind (op=0x442ca410, rs=0x4750711c) at bind.c:69
>>>#4  0x080690c8 in do_bind (op=0x442ca410, rs=0x4750711c) at bind.c:622
>>>#5  0x0805630d in connection_operation (ctx=0x4750719c, arg_v=0x442ca410) at
>>>connection.c:1051
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 6 (process 16715):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x442ca5d8, tid=0x0, id=7,
>>>eip=0x47a07f84, islocked=0, locker=232, 
>>>   lock=0x47a07fcc) at cache.c:784
>>>#2  0x080a3619 in bdb_dn2entry (op=0x442ca5d8, tid=0x0, dn=0x0, e=0x47a07fc4,
>>>matched=1, locker=232, 
>>>   lock=0x47a07fcc) at dn2entry.c:69
>>>#3  0x0809ecb0 in bdb_bind (op=0x442ca5d8, rs=0x47a0809c) at bind.c:69
>>>#4  0x080690c8 in do_bind (op=0x442ca5d8, rs=0x47a0809c) at bind.c:622
>>>#5  0x0805630d in connection_operation (ctx=0x47a0811c, arg_v=0x442ca5d8) at
>>>connection.c:1051
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 5 (process 16716):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x4425b7b0, tid=0x0, id=7,
>>>eip=0x47f08f04, islocked=0, locker=233, 
>>>   lock=0x47f08f4c) at cache.c:784
>>>#2  0x080a3619 in bdb_dn2entry (op=0x4425b7b0, tid=0x0, dn=0x0, e=0x47f08f44,
>>>matched=1, locker=233, 
>>>   lock=0x47f08f4c) at dn2entry.c:69
>>>#3  0x0809ecb0 in bdb_bind (op=0x4425b7b0, rs=0x47f0901c) at bind.c:69
>>>#4  0x080690c8 in do_bind (op=0x4425b7b0, rs=0x47f0901c) at bind.c:622
>>>#5  0x0805630d in connection_operation (ctx=0x47f0909c, arg_v=0x4425b7b0) at
>>>connection.c:1051
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 4 (process 16717):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x442acfb8, tid=0x0, id=7,
>>>eip=0x48409e84, islocked=0, locker=234, 
>>>   lock=0x48409ecc) at cache.c:784
>>>#2  0x080a3619 in bdb_dn2entry (op=0x442acfb8, tid=0x0, dn=0x0, e=0x48409ec4,
>>>matched=1, locker=234, 
>>>   lock=0x48409ecc) at dn2entry.c:69
>>>#3  0x0809ecb0 in bdb_bind (op=0x442acfb8, rs=0x48409f9c) at bind.c:69
>>>#4  0x080690c8 in do_bind (op=0x442acfb8, rs=0x48409f9c) at bind.c:622
>>>#5  0x0805630d in connection_operation (ctx=0x4840a01c, arg_v=0x442acfb8) at
>>>connection.c:1051
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 3 (process 16718):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x442ad1e0, tid=0x0, id=7,
>>>eip=0x4890ae04, islocked=0, locker=235, 
>>>   lock=0x4890ae4c) at cache.c:784
>>>#2  0x080a3619 in bdb_dn2entry (op=0x442ad1e0, tid=0x0, dn=0x0, e=0x4890ae44,
>>>matched=1, locker=235, 
>>>   lock=0x4890ae4c) at dn2entry.c:69
>>>#3  0x0809ecb0 in bdb_bind (op=0x442ad1e0, rs=0x4890af1c) at bind.c:69
>>>#4  0x080690c8 in do_bind (op=0x442ad1e0, rs=0x4890af1c) at bind.c:622
>>>#5  0x0805630d in connection_operation (ctx=0x4890af9c, arg_v=0x442ad1e0) at
>>>connection.c:1051
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 2 (process 16719):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x442ad408, tid=0x0, id=7,
>>>eip=0x48e0bd84, islocked=0, locker=236, 
>>>   lock=0x48e0bdcc) at cache.c:784
>>>#2  0x080a3619 in bdb_dn2entry (op=0x442ad408, tid=0x0, dn=0x0, e=0x48e0bdc4,
>>>matched=1, locker=236, 
>>>   lock=0x48e0bdcc) at dn2entry.c:69
>>>#3  0x0809ecb0 in bdb_bind (op=0x442ad408, rs=0x48e0be9c) at bind.c:69
>>>#4  0x080690c8 in do_bind (op=0x442ad408, rs=0x48e0be9c) at bind.c:622
>>>#5  0x0805630d in connection_operation (ctx=0x48e0bf1c, arg_v=0x442ad408) at
>>>connection.c:1051
>>>#6  0x080b372f in ldap_int_thread_pool_wrapper (xpool=0x81af340) at tpool.c:467
>>>#7  0x401202b6 in start_thread () from /lib/tls/libpthread.so.0
>>>
>>>Thread 1 (process 16669):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x080545dd in slapd_daemon () at daemon.c:2041
>>>#2  0x0804b7b0 in main (argc=7, argv=0xbfffed44) at main.c:713
>>>#3  0x42015574 in __libc_start_main () from /lib/tls/libc.so.6
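
A backtrace of this length is easier to read once it is reduced to one function per thread. The following is a minimal Python sketch that does this for gdb output shaped like the trace above. The regular expressions are assumptions based only on the lines shown here, and the helper name innermost_named_frames and the SAMPLE excerpt are hypothetical.

```python
import re
from collections import Counter

# Patterns assumed from the trace format above, e.g.
#   "Thread 17 (process 16671):"
#   "#1  0x0809f78a in bdb_cache_lru_add (bdb=0x0, ..."
THREAD_RE = re.compile(r"^Thread (\d+) \(process \d+\):")
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in (\S+) \(")


def innermost_named_frames(backtrace):
    """Map thread number -> first frame whose function name gdb resolved."""
    result = {}
    current = None
    for raw in backtrace.splitlines():
        line = raw.lstrip("> ")  # drop mail quote markers and indentation
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and current not in result:
            if frame.group(2) != "??":  # skip unresolved frames such as #0
                result[current] = frame.group(2)
    return result


# Short excerpt of the trace above, with the mail quoting left in place.
SAMPLE = """\
>>>Thread 17 (process 16671):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809f78a in bdb_cache_lru_add (bdb=0x0, locker=181, ei=0x4425ea88) at
>>>cache.c:555
>>>Thread 16 (process 16672):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x82d7468, tid=0x0, id=825,
>>>Thread 15 (process 16673):
>>>#0  0xffffe002 in ?? ()
>>>#1  0x0809fa4d in bdb_cache_find_id (op=0x82d64a0, tid=0x0, id=143,
"""

frames = innermost_named_frames(SAMPLE)
summary = Counter(frames.values())
for function, count in summary.most_common():
    print(f"{count:3d}  {function}")
```

Run over the full trace, a summary like this should show the pattern described in the report: a single thread in bdb_cache_lru_add, the worker threads in bdb_cache_find_id, and the two daemon threads on their own.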
>>>


-- 
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support