
slapd hanging - meta backend - Solaris 10

To: openldap-technical@openldap.org
Subject: slapd hanging - meta backend - Solaris 10
From: Lincoln Souzek <lsouzek@gmail.com>
Date: Mon, 5 Dec 2011 18:09:42 +0100

Hello,

I've seen a couple of instances where slapd becomes unresponsive, apparently because its threads are all waiting on a back-meta database. We're running slapd 2.4.23 on Solaris 10 (update 11/06). We have 128 threads configured, and when I attach with truss I see 130 threads allocated, most of which look like this:

/53: lwp_park(0x00000000, 0) (sleeping...)
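To put a number on "most", the truss output can be tallied per LWP. This is a minimal Python sketch, assuming the `truss -p` output was saved as text and each line carries the `/<lwp>: syscall(...)` prefix shown above; `syscalls_by_lwp` and `count_states` are hypothetical helpers, not part of any Solaris tool.

```python
import re
from collections import Counter

# Assumed line format: "/53: lwp_park(0x00000000, 0) (sleeping...)"
# truss prefixes each line with "/<lwp>:" when the process has several LWPs.
TRUSS_LINE = re.compile(r"^/(\d+):\s+(\w+)\(")

def syscalls_by_lwp(text):
    """Map each LWP id to the last system call truss reported for it."""
    last = {}
    for line in text.splitlines():
        m = TRUSS_LINE.match(line.strip())
        if m:
            last[int(m.group(1))] = m.group(2)
    return last

def count_states(text):
    """Count how many LWPs were last seen in each system call."""
    return Counter(syscalls_by_lwp(text).values())
```

If nearly all of the 130 LWPs come back as `lwp_park`, they are parked waiting on a lock or condition variable in user space rather than blocked in network I/O.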

When I run pstack against the PID, almost every thread's stack looks like one of the two below, both of which are somewhere inside meta_back_search:

----------------- lwp# 20 / thread# 20 --------------------
fee40408 lwp_park (0, 0, 0)
00140e64 ldap_build_search_req (110f0c0, 599eb90, 2, 79db010, 79db0a0, 0) + 2c
001412bc ldap_pvt_search (110f0c0, 599eb90, 2, 79db010, 79db0a0, 0) + d4
000cd494 ???????? (3f45d70, f4fffd58, f4fff478, f4fff294, 0, 2b822c0)
000cd7fc meta_back_search (3f45d70, f4fffd58, 2, 2, 24d400, 0) + 1e0
000a0928 ???????? (3f45d70, f4fffd58, f4fff8e0, 28ff20, 2ac1b8, 14)
000a12dc ???????? (f4fff8e0, f4fffd58, 2, 2, 24d400, 28ff20)
000a3950 overlay_op_walk (8000, f4fffd58, 8000, 28fe18, 28ff20, 818) + 4c
000a3ae4 ???????? (3f45d70, f4fffd58, 2, 1e6000, a3ba0, 28fe18)
00041e08 fe_op_search (3f45d70, f4fffd58, 3f45e70, f4fffad8, 1ee5f8, 1ee6f0) + 3f8
00041528 do_search (3f45d70, f4fffd58, fee6cbc0, 1e6000, 16ec00, f4fffad8) + 590
0003f8e4 ???????? (f4fffe08, 3f45d70, fee6cbc0, fe2d4400, 2683c8, 0)
0013ca30 ???????? (2683b8, f5000000, 0, 0, 13c8d4, 1)
fee40368 _lwp_start (0, 0, 0, 0, 0, 0)

----------------- lwp# 21 / thread# 21 --------------------
fee40408 lwp_park (0, 0, 0)
0013e5c8 ldap_result (10ea3b0, 43e, 2, f47ff488, f47ff290, 0) + 3c
000cddf0 meta_back_search (53fe5b8, f47ffd58, 2, 2, 0, 1) + 7d4
000a0928 ???????? (53fe5b8, f47ffd58, f47ff8e0, 28ff20, 2ac1b8, 14)
000a12dc ???????? (f47ff8e0, f47ffd58, 2, 2, 24d400, 28ff20)
000a3950 overlay_op_walk (8000, f47ffd58, 8000, 28fe18, 28ff20, 818) + 4c
000a3ae4 ???????? (53fe5b8, f47ffd58, 2, 1e6000, a3ba0, 28fe18)
00041e08 fe_op_search (53fe5b8, f47ffd58, 53fe6b8, f47ffad8, 1ee5f8, 1ee6f0) + 3f8
00041528 do_search (53fe5b8, f47ffd58, fee6cbc0, 1e6000, 16ec00, f47ffad8) + 590
0003f8e4 ???????? (f47ffe08, 53fe5b8, fee6cbc0, fe2d4800, 2683c8, 0)
0013ca30 ???????? (2683b8, f4800000, 0, 0, 13c8d4, 1)
fee40368 _lwp_start (0, 0, 0, 0, 0, 0)
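The two stack shapes above can be tallied across all 130 threads. This is a minimal Python sketch, assuming the pstack output was saved to a text file in the format shown here; `parse_pstack`, `blocking_site` and `summarize` are hypothetical helpers. For each thread it skips the `lwp_park` frame and any unresolved `????????` frames and reports the first named function, which separates threads parked in ldap_build_search_req from those parked in ldap_result.

```python
import re
from collections import Counter

# Separator pstack prints before each thread, e.g.
# "----------------- lwp# 20 / thread# 20 --------------------"
THREAD_HDR = re.compile(r"^-+\s*lwp#\s*(\d+)\s*/\s*thread#\s*(\d+)\s*-+$")
# Frame line: hex PC, then a symbol name (or ???????? when unresolved).
FRAME = re.compile(r"^[0-9a-f]+\s+(\S+)")

def parse_pstack(text):
    """Return {lwp_id: [symbol, ...]} with frames listed innermost first."""
    threads = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        hdr = THREAD_HDR.match(line)
        if hdr:
            current = int(hdr.group(1))
            threads[current] = []
            continue
        if current is None:
            continue  # ignore anything before the first thread header
        frame = FRAME.match(line)
        if frame:
            threads[current].append(frame.group(1))
    return threads

def blocking_site(frames, skip=("lwp_park", "????????")):
    """First named frame above the low-level park call: where the thread waits."""
    for sym in frames:
        if sym not in skip:
            return sym
    return None

def summarize(text):
    """Count threads by the function they are blocked in."""
    return Counter(blocking_site(f) for f in parse_pstack(text).values())
```

For example, `summarize(open("pstack.txt").read())` (file name hypothetical) gives a per-function count. A large ldap_result group would point at searches waiting on a remote target, while ldap_build_search_req threads are parked before the request is even built.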

I took a core dump of the running process, but I'm not sure how to proceed from here. Is there any way to get more information about what these threads were doing at the time? Either way, is this a situation that should be handled with a general timeout directive? Currently we only have a network-timeout and a bind timeout specified:

network-timeout 3
timeout bind=3

Thanks,
Lincoln

Follow-Ups:
- Re: slapd hanging - meta backend - Solaris 10
  - From: Quanah Gibson-Mount <quanah@zimbra.com>
