
hang due to WAKE_LISTENER in slapd/daemon.c



Version:         OPENLDAP_REL_ENG_2_0_ALPHA3
OS:              Solaris 7
Thread Model:    no threads

Hi All,

I discovered that under certain, admittedly bizarre, circumstances
slapd can hang while writing to the wake_sds socket pair (in the
WAKE_LISTENER macro).  I think that the buffering for the pair is
filling up, so slapd gets permanently stuck in write().
Here's the stack trace:

#0  0xff218008 in _write () from /usr/lib/libc.so.1
#1  0x20a84 in slapd_set_write (s=28, wake=1) at daemon.c:142
#2  0x2bd8c in send_ldap_ber (conn=0xd57e0, ber=0x2d17e68) at result.c:202
#3  0x2d050 in send_search_entry (be=0xc0068, conn=0xd57e0, op=0x2d07240,
    e=0x2df2e00, attrs=0x2cd5870, attrsonly=0, ctrls=0x0) at result.c:707
#4  0x3cb1c in ldbm_back_search (be=0xc0068, conn=0xd57e0, op=0x2d07240,
    base=0x2df2e00 "", scope=1, deref=3, slimit=0, tlimit=0, filter=0x2f633b0,
    filterstr=0x2f57ff8 "(objectclass=*)", attrs=0x2cd5870, attrsonly=0)
    at search.c:284
#5  0x251dc in do_search (conn=0xd57e0, op=0x2d07240) at search.c:209
#6  0x23e04 in connection_operation (arg_v=0x2d14a40) at connection.c:745
#7  0x52b10 in ldap_pvt_thread_create (thread=0x2d07248, detach=1,
    start_routine=0x23d0c <connection_operation>, arg=0x2d14a40)
    at thr_stub.c:48
#8  0x248e8 in connection_op_activate (conn=0xd57e0, op=0x2d07240)
    at connection.c:1075
#9  0x246a4 in connection_input (conn=0xd57e0) at connection.c:979
#10 0x24268 in connection_read (s=230) at connection.c:880
#11 0x22424 in slapd_daemon_task (ptr=0x0) at daemon.c:849
#12 0x52b10 in ldap_pvt_thread_create (thread=0xbb408, detach=0,
    start_routine=0x216e0 <slapd_daemon_task>, arg=0x0) at thr_stub.c:48
#13 0x22638 in slapd_daemon () at daemon.c:901
#14 0x20428 in main (argc=769320, argv=0x99400) at main.c:433
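
The suspected mechanism in frames #0 and #1 can be illustrated in
isolation. The following is a minimal sketch, not slapd code: the
function name fill_socketpair is hypothetical, and it assumes an
AF_UNIX stream socket pair. It writes one byte at a time to a pair
that nobody reads from. The write end is set non-blocking so the
sketch reports the point where a blocking write() would have hung
instead of hanging itself.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns the number of single-byte writes accepted before the
 * socket pair's buffer is full, or -1 if setup fails.
 */
long fill_socketpair(void)
{
    int sds[2];
    long count = 0;
    int flags;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sds) < 0)
        return -1;

    /* Non-blocking write end: a full buffer yields EAGAIN
     * (or EWOULDBLOCK) where a blocking write() would sleep. */
    flags = fcntl(sds[1], F_GETFL, 0);
    if (flags < 0 || fcntl(sds[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(sds[0]);
        close(sds[1]);
        return -1;
    }

    for (;;) {
        ssize_t n = write(sds[1], "0", 1);
        if (n == 1) {
            count++;            /* byte queued, never read */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted, retry */
        break;                  /* buffer full (or other error) */
    }

    close(sds[0]);
    close(sds[1]);
    return count;
}
```

The count returned is finite and platform dependent. With a blocking
descriptor and no reader, the write after the last accepted byte is
the one that would never return.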

I haven't reproduced this exact problem, but I did run some tests that
indicate that, even under moderate load, multiple bytes can accumulate
in the wake_sds pair.  This could be fixed easily by using a counter to
limit the number of bytes that can be written to the pair, and by
completely emptying the socket pair on each select hit.  If there are
no objections, I'll submit a fix with an ITS entry.
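
A minimal sketch of the counter-plus-drain idea, under stated
assumptions: all names here (wake_init, wake_listener, wake_drain,
wake_pending, WAKE_MAX_PENDING) are hypothetical and this is not the
actual daemon.c code or the eventual fix. It assumes the no-threads
build reported above, so the counter is not protected by a lock; a
threaded build would need a mutex around it.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-ins for the real slapd state. */
#define WAKE_MAX_PENDING 1          /* one pending byte is enough to wake select */
static int wake_sds[2] = { -1, -1 };
static int wake_pending = 0;        /* bytes written but not yet drained */

/* Create the pair; the read end is non-blocking so draining can stop
 * as soon as the pair is empty. Returns 0 on success, -1 on error. */
int wake_init(void)
{
    int flags;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, wake_sds) < 0)
        return -1;
    flags = fcntl(wake_sds[0], F_GETFL, 0);
    if (flags < 0 || fcntl(wake_sds[0], F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    wake_pending = 0;
    return 0;
}

/* Wake the listener unless enough bytes are already queued.
 * Returns 1 if a byte was written, 0 if skipped, -1 on error. */
int wake_listener(void)
{
    assert(wake_sds[1] >= 0);       /* wake_init() must have succeeded */
    if (wake_pending >= WAKE_MAX_PENDING)
        return 0;                   /* a wakeup is already pending */
    if (write(wake_sds[1], "0", 1) != 1)
        return -1;
    wake_pending++;
    return 1;
}

/* Call on each select hit for wake_sds[0]: read until the pair is
 * empty, then reset the counter. Returns the number of bytes read. */
int wake_drain(void)
{
    char buf[64];
    ssize_t n;
    int total = 0;

    while ((n = read(wake_sds[0], buf, sizeof buf)) > 0)
        total += (int)n;
    wake_pending = 0;
    return total;
}
```

With the cap in place, at most WAKE_MAX_PENDING bytes can sit in the
pair between two select hits, so the write in wake_listener() should
never find the buffer full.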

-Jeff Romine