Issue 3227 - Finally using 2.2.13! A segfault problem... gdb output included
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
Reported: 2004-07-09 14:32 UTC by borwicjh@wfu.edu
Modified: 2014-08-01 21:05 UTC

Description borwicjh@wfu.edu 2004-07-09 14:32:35 UTC
Full_Name: John Borwick
Version: 2.2.13
OS: Red Hat Workstation 3
URL: http://www.wfu.edu/~borwicjh/examples/openldap-2.2.13-segfault/
Submission from: (NULL) (152.17.53.226)


First, thanks very much for OpenLDAP!  2.2 seems really fast!

I'm running openldap 2.2.13 with BDB 4.2.52.  Both BDB patches have been
applied, along with some crazy patches from Red Hat.  Maybe that's a problem, I
don't know.

After hitting the "o=WFU,c=US" backend (which rewrites to
"ou=Users,dc=wfu,dc=edu") maybe 10000 times, as fast as possible, the server
segfaults.

Here's a running count of the number of LDAP connections and the "backtrace
full" output.  Some symbols are missing; please let me know if this isn't enough
data.  We *did* compile with "--enable-ldap" and "--enable-rewrite".

Please see the URL http://www.wfu.edu/~borwicjh/examples/openldap-2.2.13-segfault/
for information on how to replicate.

Thank you very much!
John

-=-=- while true; do lsof -i :389 | wc -l; sleep 2; done
      0
      2
      2
    129
    264
    663
   1015
   1017
   1017
   1017
   1017
   1017
   1017
   1017
   1017
   1017
   1017
   1017
    990
    683
    731
   1017
   1017
   1017
   1017
   1017
   1017
    926
    632
    319
    382
    293
    350
    350
    350
    350
    350
    350
    350
      0
      0
      0

-=-=- gdb servers/slapd/slapd core -=-=-
#0  0x00000001 in ?? ()
No symbol table info available.
#1  <signal handler called>
No symbol table info available.
#2  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
No symbol table info available.
#3  0xb737c8eb in __write_nocancel () from /lib/tls/libpthread.so.0
No symbol table info available.
#4  0x08108f95 in sb_stream_write (sbiod=0x81ef5b0, buf=0x8fb39950, len=94) at sockbuf.c:549
No locals.
#5  0x08109835 in sb_debug_write (sbiod=0x81ef5c8, buf=0x8fb39950, len=94) at sockbuf.c:846
        ret = -1884291056
#6  0x08108eb1 in ber_int_sb_write (sb=0x81ec6d8, buf=0x8fb39950, len=94) at sockbuf.c:433
        ret = -1884291056
#7  0x08105a9e in ber_flush (sb=0x81ec6d8, ber=0x8fb4ec90, freeit=0) at io.c:243
        towrite = 94
        rc = -1800410192
#8  0x080f0c54 in ldap_int_flush_request (ld=0x81eda20, lr=0x8fb4ed08) at request.c:166
        lc = (LDAPConn *) 0x81ef510
#9  0x080f0fad in ldap_send_server_request (ld=0x81eda20, ber=0x8fb4ec90, msgid=13991, parentreq=0x0, srvlist=0x0, lc=0x81ef510, bind=0x0) at request.c:294
        lr = (LDAPRequest *) 0x8fb4ed08
        incparent = 0
        rc = 0
#10 0x080f0bf7 in ldap_send_initial_request (ld=0x81eda20, msgtype=99, dn=0x8fb97ad8 "ou=Users,dc=wfu,dc=edu", ber=0x8fb4ec90, msgid=13991) at request.c:147
        servers = (LDAPURLDesc *) 0x0
        rc = 136239648
#11 0x080e2011 in ldap_search_ext (ld=0x81eda20, base=0x8fb97ad8 "ou=Users,dc=wfu,dc=edu", scope=2, filter=0x8b78cb00 "(|(cn=sue*)(mail=sue*)(sn=sue*))", attrs=0x0,
    attrsonly=0, sctrls=0x0, cctrls=0x0, timeout=0x94afd7a0, sizelimit=500, msgidp=0x94afd790) at search.c:110
        rc = 0
        ber = (BerElement *) 0x8fb4ec90
        timelimit = 3600
        id = 13991
#12 0x080b342a in ldap_back_search (op=0x8b3992c0, rs=0x94afe870) at search.c:143
        li = (struct ldapinfo *) 0x819a9b8
        lc = (struct ldapconn *) 0x81ee408
        tv = {tv_sec = 3600, tv_usec = 0}
        res = (LDAPMessage *) 0x8099845
        e = (LDAPMessage *) 0x94afd7c8
        rc = 0
        msgid = -1959161152
        match = {bv_len = 0, bv_val = 0x0}
        mapped_attrs = (char **) 0x0
        mbase = {bv_len = 22, bv_val = 0x8fb97ad8 "ou=Users,dc=wfu,dc=edu"}
        mfilter = {bv_len = 32, bv_val = 0x8b78cb00 "(|(cn=sue*)(mail=sue*)(sn=sue*))"}
        dontfreetext = 0
        dc = {rwmap = 0x819a9f4, conn = 0x96a9fc88, ctx = 0x8124e53 "searchBase", rs = 0x94afe870}
#13 0x0805cbab in do_search (op=0x8b3992c0, rs=0x94afe870) at search.c:400
        base = {bv_len = 10, bv_val = 0x86506e47 "o=WFU,c=US"}
        siz = 0
        off = 0
        i = 0
        manageDSAit = 0
        be_manageDSAit = 0
#14 0x0805a551 in connection_operation (ctx=0x94afe900, arg_v=0x8b3992c0) at connection.c:1042
        rc = -1025
        op = (Operation *) 0x8b3992c0
        rs = {sr_type = REP_RESULT, sr_tag = 0, sr_msgid = 0, sr_err = 0, sr_matched = 0x0, sr_text = 0x0, sr_ref = 0x0, sr_ctrls = 0x0, sr_un = {sru_sasl = {
      r_sasldata = 0x0}, sru_extended = {r_rspoid = 0x0, r_rspdata = 0x0}, sru_search = {r_entry = 0x0, r_attrs = 0x0, r_nentries = 0, r_v2ref = 0x0}}, sr_flags = 0}
        tag = 99
        oldtag = 99
        conn = (Connection *) 0x96a9fc88
        memctx = (void *) 0x8206058
        memctx_null = (void *) 0x0
        memsiz = 1048576
#15 0x080de3b6 in ldap_int_thread_pool_wrapper (xpool=0x8154fb8) at tpool.c:467
        pool = (struct ldap_int_thread_pool_s *) 0x8154fb8
        ctx = (ldap_int_thread_ctx_t *) 0x865f94b8
        ltc_key = {{ltk_key = 0x8097a48, ltk_data = 0x8206058, ltk_free = 0x8097a18 <sl_mem_destroy>}, {ltk_key = 0x81e4018, ltk_data = 0x13f,
    ltk_free = 0x80bc6d0 <bdb_locker_id_free>}, {ltk_key = 0x80af37d, ltk_data = 0x890fe008, ltk_free = 0x80af365 <search_stack_free>}, {ltk_key = 0x0,
    ltk_data = 0x0, ltk_free = 0} <repeats 29 times>}
        tid = 2494557104
        i = 734
        keyslot = 734
        hash = 734
#16 0xb7377dac in start_thread () from /lib/tls/libpthread.so.0
No symbol table info available.
#17 0xb7316a8a in clone () from /lib/tls/libc.so.6
No symbol table info available. 

Comment 1 ando@openldap.org 2004-07-12 18:47:32 UTC
According to the link you sent, each instance of your application
is trying to send 512 simultaneous requests to slapd:

        perl load-test.pl --server=server-name --num-forks=512

Since slapd cannot handle more than 1024 file descriptors (as far
as I know, because of an intrinsic limitation in glibc's select),
you're likely to be exhausting system resources.  The core dump
you're showing is meaningless to me, because it shows the error
occurring in a generic glibc internal rather than in some specific
part of slapd, reached through the generic low-level I/O routines
of libldap.  Can you reproduce the problem with a more limited load?

p.

-- 
Pierangelo Masarati
mailto:pierangelo.masarati@sys-net.it


    SysNet - via Dossi,8 27100 Pavia Tel: +390382573859 Fax: +390382476497

Comment 2 ando@openldap.org 2004-07-12 21:04:29 UTC
Also, note that if you submit a large number of simultaneous connections,
those that exceed the number of available threads are queued and remain
pending.  I guess the SIGSEGV is a bug, and it would be nice to be able
to track it down.  I haven't been able to reproduce it on my system, so it
might be something related to your setup, or at least something that
depends on the rest of the environment.  However, in your case, if you
think your production system may come under high load, you might try
to increase the number of available threads.

p.

> Pierangelo Masarati wrote:
>> According to the link you sent, each instance of your application
>> is trying to send 512 simultaneous requests to slapd:
>>
>>         perl load-test.pl --server=server-name --num-forks=512
>>
>> since slapd cannot handle more than 1024 file descriptors (as far
>> as I know, because of an intrinsic limitation in glibc's select)
>> you're likely to be exhausting system resources.  The core dump
>> you're showing is meaningless to me, because it shows the error
>> occurring in an obscure and generic internal of glibc rather than
>> in some specific part of slapd, starting from generic low level I/O
>> routines of libldap.  Can you reproduce the problem with a more
>> limited load?
>
> Yes, with "--num-forks=32" run on each of two machines, the server still
> crashes with the same problem.  It performed fine with "--num-forks=16"
> and "--num-forks=24".
>
> Please consider that the # of file descriptors is at least doubled,
> because the LDAP backend is being used for each request to rewrite from
> "o=WFU,c=US" to "ou=Users,dc=wfu,dc=edu".
>
> With "lsof" monitoring, the pattern seems to be
>    1. normal # conns
>    2. quickly increasing # conns
>    3. hanging until one or both processes killed
>    4. unresponsive until # connections goes down
>    5. normal # conns
>    6. a lockup
>    7. crash
>
> During testing, I may have found a better gdb backtrace, too!  Check out
> the "__assert_fail" statement.  Thank you very much!
>
> #0  0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> No symbol table info available.
> #1  0xb7262a29 in raise () from /lib/tls/libc.so.6
> No symbol table info available.
> #2  0xb7264255 in abort () from /lib/tls/libc.so.6
> No symbol table info available.
> #3  0xb725c559 in __assert_fail () from /lib/tls/libc.so.6
> No symbol table info available.
> #4  0x0806ad32 in slap_op_free (op=0x8b6405d8) at operation.c:66
>          slap_empty_bv_dup = {bv_len = 2433723312, bv_val = 0xb7380b7c
> "xÚ"}
> #5  0x0807391e in do_abandon (op=0x8233d40, rs=0x910fa870) at
> abandon.c:107
>          id = 1971
>          o = (Operation *) 0x8b6405d8
>          i = 7
> #6  0x0805a591 in connection_operation (ctx=0x910fa900, arg_v=0x8233d40)
> at connection.c:1047
>          rc = -1025
>          op = (Operation *) 0x8233d40
>          rs = {sr_type = REP_RESULT, sr_tag = 0, sr_msgid = 0, sr_err =
> 0, sr_matched = 0x0, sr_text = 0x0,
>    sr_ref = 0x0, sr_ctrls = 0x0, sr_un = {sru_sasl = {r_sasldata = 0x0},
> sru_extended = {r_rspoid = 0x0,
>        r_rspdata = 0x0}, sru_search = {r_entry = 0x0, r_attrs = 0x0,
> r_nentries = 0, r_v2ref = 0x0}}, sr_flags = 0}
>          tag = 80
>          oldtag = 80
>          conn = (Connection *) 0x96a9d088
>          memctx = (void *) 0x82d7930
>          memctx_null = (void *) 0x0
>          memsiz = 1048576
> #7  0x080de3b6 in ldap_int_thread_pool_wrapper (xpool=0x8154fc0) at
> tpool.c:467
>          pool = (struct ldap_int_thread_pool_s *) 0x8154fc0
>          ctx = (ldap_int_thread_ctx_t *) 0x90874260
>          ltc_key = {{ltk_key = 0x8097a48, ltk_data = 0x82d7930, ltk_free
> = 0x8097a18 <sl_mem_destroy>}, {
>      ltk_key = 0x81ad600, ltk_data = 0x132, ltk_free = 0x80bc6d0
> <bdb_locker_id_free>}, {ltk_key = 0x80af37d,
>      ltk_data = 0x88dfd008, ltk_free = 0x80af365 <search_stack_free>},
> {ltk_key = 0x0, ltk_data = 0x0,
>      ltk_free = 0} <repeats 29 times>}
>          tid = 2433723312
>          i = 507
>          keyslot = 507
>          hash = 507
> #8  0xb7377dac in start_thread () from /lib/tls/libpthread.so.0
> No symbol table info available.
> #9  0xb7316a8a in clone () from /lib/tls/libc.so.6
> No symbol table info available.
>
>
> --
>             John Borwick
>         Systems Administrator
>        Wake Forest University | web  http://www.wfu.edu/~borwicjh
>        Winston-Salem, NC, USA | GPG key ID               56D60872
>


-- 
Pierangelo Masarati
mailto:pierangelo.masarati@sys-net.it


    SysNet - via Dossi,8 27100 Pavia Tel: +390382573859 Fax: +390382476497
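
A back-of-the-envelope sketch of the descriptor arithmetic discussed in this exchange. It assumes, as stated in the reply quoted above, that every client connection proxied through back-ldap costs at least two descriptors, and that each fork holds one connection open at a time. The 1024 ceiling is the figure given in Comment 1; the helper name and the two-machine setup for the 16- and 24-fork runs are assumptions made for illustration.

```python
# Rough descriptor budget for the load test; all figures are assumptions
# taken from the thread, not measurements of slapd itself.
FD_LIMIT = 1024  # ceiling cited in Comment 1 for a select()-based slapd


def descriptors_needed(machines: int, forks_per_machine: int,
                       fds_per_client: int = 2) -> int:
    """Estimate descriptors held by slapd when every fork is connected.

    fds_per_client=2 models one client socket plus one back-ldap socket
    to the remote server (the "at least doubled" remark in the thread).
    """
    return machines * forks_per_machine * fds_per_client


if __name__ == "__main__":
    for forks in (16, 24, 32, 512):
        need = descriptors_needed(machines=2, forks_per_machine=forks)
        verdict = "over" if need > FD_LIMIT else "within"
        print(f"--num-forks={forks}: ~{need} descriptors ({verdict} the limit)")
```

Under these assumptions only the 512-fork run goes past the ceiling, which suggests that the crash seen at 32 forks is not explained by descriptor exhaustion alone.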

Comment 3 borwicjh@wfu.edu 2004-07-13 15:44:23 UTC
Pierangelo Masarati wrote:
> Also, note that if you submit a large number of simultaneous connections,
> those that exceed the number of available threads are queued and remain
> pending.  I guess the sigsegv is a bug, and it would be nice to be able
> to track it down.  I haven't been able to generate it on my system, so it
> might be something related to your setup, or at least something that
> depends on the rest of the environment.   However, in your case, if you
> think your production system may be undergoing a high load, you might try
> to increase the number of available threads.
> 
> p.

OK.  I recompiled with "--enable-threads=no" and still get crashes. 
Should that eliminate threads as a problem?

If I do the "--num-forks=512" test with the two machines hitting the 
*BDB* backend, there is no crash.  The entire test case completes fine. 
  It seems that only the *LDAP* backend is causing a crash.  (This could 
be due to something else, though, like a linear vs. exponential demand 
on resources.)


What's weird to me is that "libpthread" is still linked in even when 
"--enable-threads=no":

# ldd `which slapd`
         libdb-4.2.so => /usr/lib/libdb-4.2.so (0xb7500000)
         libsasl2.so.2 => /usr/lib/libsasl2.so.2 (0xb74ea000)
         libssl.so.4 => /lib/libssl.so.4 (0xb74b5000)
         libcrypto.so.4 => /lib/libcrypto.so.4 (0xb73c3000)
         libcrypt.so.1 => /lib/libcrypt.so.1 (0xb7396000)
         libresolv.so.2 => /lib/libresolv.so.2 (0xb7384000)
         libpthread.so.0 => /lib/tls/libpthread.so.0 (0xb7373000)
         libc.so.6 => /lib/tls/libc.so.6 (0xb723b000)
         libdl.so.2 => /lib/libdl.so.2 (0xb7238000)
         libgssapi_krb5.so.2 => /usr/kerberos/lib/libgssapi_krb5.so.2 
(0xb7225000)
         libkrb5.so.3 => /usr/kerberos/lib/libkrb5.so.3 (0xb71c7000)
         libcom_err.so.3 => /usr/kerberos/lib/libcom_err.so.3 (0xb71c5000)
         libk5crypto.so.3 => /usr/kerberos/lib/libk5crypto.so.3 (0xb71b4000)
         libz.so.1 => /usr/lib/libz.so.1 (0xb71a6000)
         /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xb75eb000)

Are there any potential incompatibilities with threading between 
OpenLDAP and these libraries?  Are there other libraries I should 
recompile/upgrade/remove to do further testing?

Thank you very very much,
John
-- 
            John Borwick
        Systems Administrator
       Wake Forest University | web  http://www.wfu.edu/~borwicjh
       Winston-Salem, NC, USA | GPG key ID               56D60872
Comment 4 ando@openldap.org 2004-07-13 16:10:15 UTC
> Pierangelo Masarati wrote:
>> Also, note that if you submit a large number of simultaneous
>> connections,
>> those that exceed the number of available threads are queued and remain
>> pending.  I guess the sigsegv is a bug, and it would be nice to be able
>> to track it down.  I haven't been able to generate it on my system, so
>> it
>> might be something related to your setup, or at least something that
>> depends on the rest of the environment.   However, in your case, if you
>> think your production system may be undergoing a high load, you might
>> try
>> to increase the number of available threads.
>>
>> p.
>
> OK.  I recompiled with "--enable-threads=no" and still get crashes.
> Should that eliminate threads as a problem?

There's no --enable-threads switch in OpenLDAP's configure.  There's a
--with-threads one.

In any case, I think back-ldap definitely needs threads and, provided the
system threads are not buggy, their use with slapd should be relatively
safe and beneficial in all cases.  I think you just need to boost the
number of simultaneous threads your slapd can handle.  The default is 16,
and if you want to deal with 512 simultaneous connections you could try
"threads 64" or "threads 128" (if your hardware can stand it, i.e. you are
using a 2/4 CPU system with a lot of ram and overall good performance,
including network bandwidth).  Otherwise, you simply cannot accept so many
simultaneous connections with your hardware, SIGSEGV or not.

>
> If I do the "--num-forks=512" test with the two machines hitting the
> *BDB* backend, there is no crash.  The entire test case completes fine.
>   It seems that only the *LDAP* backend is causing a crash.  (This could
> be due to something else, though, like a linear vs. exponential demand
> on resources.)

Are the back-ldap and back-bdb in the same slapd?  If not, are they
on the same machine?  On my laptop (a much older RH 7.1) when I try
such an intensive test, the system runs out of file descriptors way
before 128 simultaneous processes are started, and slapd hangs after
a while.  However, when I kill the requests and the machine load
decreases a bit, the slapd goes (slowly) back to service.  I used
your config file, and I hit a test database containing a few tens
of entries, but this should not be an issue.

p.

>
>
> What's weird to me is that "libpthread" is still linked in even when
> "--enable-threads=no":
>
> # ldd `which slapd`
>          libdb-4.2.so => /usr/lib/libdb-4.2.so (0xb7500000)
>          libsasl2.so.2 => /usr/lib/libsasl2.so.2 (0xb74ea000)
>          libssl.so.4 => /lib/libssl.so.4 (0xb74b5000)
>          libcrypto.so.4 => /lib/libcrypto.so.4 (0xb73c3000)
>          libcrypt.so.1 => /lib/libcrypt.so.1 (0xb7396000)
>          libresolv.so.2 => /lib/libresolv.so.2 (0xb7384000)
>          libpthread.so.0 => /lib/tls/libpthread.so.0 (0xb7373000)
>          libc.so.6 => /lib/tls/libc.so.6 (0xb723b000)
>          libdl.so.2 => /lib/libdl.so.2 (0xb7238000)
>          libgssapi_krb5.so.2 => /usr/kerberos/lib/libgssapi_krb5.so.2
> (0xb7225000)
>          libkrb5.so.3 => /usr/kerberos/lib/libkrb5.so.3 (0xb71c7000)
>          libcom_err.so.3 => /usr/kerberos/lib/libcom_err.so.3 (0xb71c5000)
>          libk5crypto.so.3 => /usr/kerberos/lib/libk5crypto.so.3
> (0xb71b4000)
>          libz.so.1 => /usr/lib/libz.so.1 (0xb71a6000)
>          /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xb75eb000)
>
> Are there any potential incompatibilities with threading between
> OpenLDAP and these libraries?  Are there other libraries I should
> recompile/upgrade/remove to do further testing?
>
> Thank you very very much,



-- 
Pierangelo Masarati
mailto:pierangelo.masarati@sys-net.it


    SysNet - via Dossi,8 27100 Pavia Tel: +390382573859 Fax: +390382476497

Comment 5 borwicjh@wfu.edu 2004-07-13 17:26:55 UTC
Pierangelo Masarati wrote:
>>OK.  I recompiled with "--enable-threads=no" and still get crashes.
>>Should that eliminate threads as a problem?
> 
> 
> There's no --enable-threads switch in OpenLDAP's configure.  There's a
> --with-threads one.

Whoops!

> In any case, I think back-ldap definitely needs threads and, provided the
> system threads are not buggy, their use with slapd should be relatively
> safe and beneficial in all cases.  I think you just need to boost the
> number of simultaneous threads your slapd can handle.  The default is 16,
> and if you want to deal with 512 simultaneous connections you could try
> "threads 64" or "threads 128" (if your hardware can stand it, i.e. you are
> using a 2/4 CPU system with a lot of ram and overall good performance,
> including network bandwidth).  Otherwise, you cannot simply accept so many
> simultaneous connections with your hardware, sigsegv or not.

Excellent.  With "threads 128" our dual CPU (+SMP) machine handled the 
4500 test queries without crashing!

We've actually been having our production LDAP server crash all the time 
(at least once a day) due to what I'm hoping is this problem.  I'm going 
to increase the number of threads there and see if that helps.

Here is a theory for you:

Does the BDB backend accept queries only as fast as it can actually 
resolve them, whereas the LDAP backend accepts queries as soon as they 
are received and starts queuing them up?

-- 
            John Borwick
        Systems Administrator
       Wake Forest University | web  http://www.wfu.edu/~borwicjh
       Winston-Salem, NC, USA | GPG key ID               56D60872
Comment 6 ando@openldap.org 2004-07-13 17:42:40 UTC
> Pierangelo Masarati wrote:
>>>OK.  I recompiled with "--enable-threads=no" and still get crashes.
>>>Should that eliminate threads as a problem?
>>
>>
>> There's no --enable-threads switch in OpenLDAP's configure.  There's a
>> --with-threads one.
>
> Whoops!
>
>> In any case, I think back-ldap definitely needs threads and, provided
>> the
>> system threads are not buggy, their use with slapd should be relatively
>> safe and beneficial in all cases.  I think you just need to boost the
>> number of simultaneous threads your slapd can handle.  The default is
>> 16,
>> and if you want to deal with 512 simultaneous connections you could try
>> "threads 64" or "threads 128" (if your hardware can stand it, i.e. you
>> are
>> using a 2/4 CPU system with a lot of ram and overall good performance,
>> including network bandwidth).  Otherwise, you cannot simply accept so
>> many
>> simultaneous connections with your hardware, sigsegv or not.
>
> Excellent.  With "threads 128" our dual CPU (+SMP) machine handled the
> 4500 test queries without crashing!
>
> We've actually been having our production LDAP server crash all the time
> (at least once a day) due to what I'm hoping is this problem.  I'm going
> to increase the number of threads there and see if that helps.
>
> Here is a theory for you:
>
> Does the BDB backend accept queries only as fast as it can actually
> resolve them, whereas the LDAP backend accepts queries as soon as they
> are received and starts queuing them up?

I might not be the most appropriate person to answer your question.  As far
as I can tell, the frontend accepts connections and concurrently handles
as many connections as there are threads available in the main pool (that's
one of the reasons for not compiling --without-threads...).  Connections are
handled by calling backends as appropriate.  Back-bdb has to do some work,
while back-ldap forwards requests and waits for the response.  I guess if the
remote server is not very responsive, back-ldap may submit too many
concurrent requests and sit idle while they're answered.  At that point the
frontend starts queuing further connections.  In any case, the frontend that
accepts and queues connections is the same for all backends, so at this
level there should be no difference.

p.

-- 
Pierangelo Masarati
mailto:pierangelo.masarati@sys-net.it


    SysNet - via Dossi,8 27100 Pavia Tel: +390382573859 Fax: +390382476497
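
The queuing behaviour described in this comment can be illustrated with a small simulation. This is only a sketch using Python's standard `ThreadPoolExecutor` as a stand-in for slapd's thread pool, not a model of the actual tpool.c code; the function name `simulate`, the request count, and the 10 ms "remote server" delay are hypothetical values chosen for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def simulate(pool_size: int, n_requests: int, remote_delay: float = 0.01):
    """Push n_requests through a fixed-size pool.

    Returns (completed, peak_in_flight). Each request just sleeps, which
    stands in for a proxying backend waiting on a remote server.
    """
    lock = threading.Lock()
    state = {"in_flight": 0, "peak": 0, "completed": 0}

    def handle(_request_id: int) -> None:
        with lock:
            state["in_flight"] += 1
            state["peak"] = max(state["peak"], state["in_flight"])
        time.sleep(remote_delay)  # idle while the "remote server" answers
        with lock:
            state["in_flight"] -= 1
            state["completed"] += 1

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # submit() does not block: requests beyond pool_size sit in the
        # executor's internal queue until a worker thread becomes free.
        for i in range(n_requests):
            pool.submit(handle, i)
    # Leaving the with-block waits for all queued requests to finish.
    return state["completed"], state["peak"]


if __name__ == "__main__":
    for size in (16, 128):
        done, peak = simulate(size, 512)
        print(f"threads={size}: completed={done}, peak in flight={peak}")
```

In this model the number of requests in flight never exceeds the pool size, and everything else waits in the queue, so a larger pool drains the same backlog in fewer rounds when the work is mostly waiting.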

Comment 7 ando@openldap.org 2004-07-16 13:49:03 UTC
changed notes
Comment 8 Kurt Zeilenga 2004-08-28 05:17:07 UTC
changed state Open to Closed
Comment 9 Howard Chu 2009-02-17 05:25:25 UTC
moved from Incoming to Archive.Incoming
Comment 10 OpenLDAP project 2014-08-01 21:05:50 UTC
could not reproduce; OS/resource exhaustion related?