
slapd hanging - meta backend - Solaris 10



Hello,

I've seen a couple of instances where slapd becomes unresponsive, apparently because its threads are waiting on a back-meta database.  We're running slapd 2.4.23 on Solaris 10 (update 11/06).  We have 128 threads configured, and when I attach with truss I see 130 allocated, most of which look like this:

/53:    lwp_park(0x00000000, 0)         (sleeping...)

When I run pstack against the PID, the stack for almost all of the threads looks like one of the two threads below, both of which are somewhere within meta_back_search:

-----------------  lwp# 20 / thread# 20  --------------------
 fee40408 lwp_park (0, 0, 0)
 00140e64 ldap_build_search_req (110f0c0, 599eb90, 2, 79db010, 79db0a0, 0) + 2c
 001412bc ldap_pvt_search (110f0c0, 599eb90, 2, 79db010, 79db0a0, 0) + d4
 000cd494 ???????? (3f45d70, f4fffd58, f4fff478, f4fff294, 0, 2b822c0)
 000cd7fc meta_back_search (3f45d70, f4fffd58, 2, 2, 24d400, 0) + 1e0
 000a0928 ???????? (3f45d70, f4fffd58, f4fff8e0, 28ff20, 2ac1b8, 14)
 000a12dc ???????? (f4fff8e0, f4fffd58, 2, 2, 24d400, 28ff20)
 000a3950 overlay_op_walk (8000, f4fffd58, 8000, 28fe18, 28ff20, 818) + 4c
 000a3ae4 ???????? (3f45d70, f4fffd58, 2, 1e6000, a3ba0, 28fe18)
 00041e08 fe_op_search (3f45d70, f4fffd58, 3f45e70, f4fffad8, 1ee5f8, 1ee6f0) + 3f8
 00041528 do_search (3f45d70, f4fffd58, fee6cbc0, 1e6000, 16ec00, f4fffad8) + 590
 0003f8e4 ???????? (f4fffe08, 3f45d70, fee6cbc0, fe2d4400, 2683c8, 0)
 0013ca30 ???????? (2683b8, f5000000, 0, 0, 13c8d4, 1)
 fee40368 _lwp_start (0, 0, 0, 0, 0, 0)

-----------------  lwp# 21 / thread# 21  --------------------
 fee40408 lwp_park (0, 0, 0)
 0013e5c8 ldap_result (10ea3b0, 43e, 2, f47ff488, f47ff290, 0) + 3c
 000cddf0 meta_back_search (53fe5b8, f47ffd58, 2, 2, 0, 1) + 7d4
 000a0928 ???????? (53fe5b8, f47ffd58, f47ff8e0, 28ff20, 2ac1b8, 14)
 000a12dc ???????? (f47ff8e0, f47ffd58, 2, 2, 24d400, 28ff20)
 000a3950 overlay_op_walk (8000, f47ffd58, 8000, 28fe18, 28ff20, 818) + 4c
 000a3ae4 ???????? (53fe5b8, f47ffd58, 2, 1e6000, a3ba0, 28fe18)
 00041e08 fe_op_search (53fe5b8, f47ffd58, 53fe6b8, f47ffad8, 1ee5f8, 1ee6f0) + 3f8
 00041528 do_search (53fe5b8, f47ffd58, fee6cbc0, 1e6000, 16ec00, f47ffad8) + 590
 0003f8e4 ???????? (f47ffe08, 53fe5b8, fee6cbc0, fe2d4800, 2683c8, 0)
 0013ca30 ???????? (2683b8, f4800000, 0, 0, 13c8d4, 1)
 fee40368 _lwp_start (0, 0, 0, 0, 0, 0)
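
To check how many threads are parked in each of these two places, the pstack output can be tallied with a short script. The following is only a rough sketch: the names parse_pstack and summarize_blocked are made up for illustration, and it assumes the output format shown above (a dashed "lwp# N / thread# N" header followed by one frame per line).

```python
import re
from collections import Counter

# Header line such as: "-----  lwp# 20 / thread# 20  -----"
HEADER = re.compile(r"^-+\s+lwp#\s+(\d+)\s+/\s+thread#\s+(\d+)\s+-+$")
# Frame line such as: " 0013e5c8 ldap_result (10ea3b0, 43e, ...) + 3c"
FRAME = re.compile(r"^\s*[0-9a-f]+\s+(\S+)\s+\(")


def parse_pstack(text):
    """Return {lwp id: [frame names, innermost first]} from pstack output."""
    threads = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(1))
    return threads


def summarize_blocked(threads, needle="meta_back_search"):
    """Count threads whose stack contains `needle`, grouped by the first
    symbolized frame other than lwp_park (the call they are parked in)."""
    blocked_in = Counter()
    for frames in threads.values():
        if needle not in frames:
            continue
        caller = next(
            (f for f in frames if f not in ("lwp_park", "????????")),
            "unknown",
        )
        blocked_in[caller] += 1
    return blocked_in
```

With the pstack output saved to a file, summarize_blocked(parse_pstack(open("pstack.out").read())) gives a per-function count of the threads that are inside meta_back_search.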

I took a core dump while the process was running, but I'm not sure how to proceed from here.  Is there any way to get more information about what these threads were doing at the time?  Either way, is this a situation that should be handled with a general timeout directive?  We currently have only a network-timeout and a bind timeout specified:

network-timeout 3
timeout bind=3

Thanks,
Lincoln