[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
Re: (ITS#6034) Assert failure - connection_state_closing / send_ldap_ber / slap_send_search_entry / syncprov_sendresp
- To: openldap-its@OpenLDAP.org
- Subject: Re: (ITS#6034) Assert failure - connection_state_closing / send_ldap_ber / slap_send_search_entry / syncprov_sendresp
- From: oskar@deckle.co.za
- Date: Tue, 28 Apr 2009 14:18:45 GMT
- Auto-submitted: auto-generated (OpenLDAP-ITS)
Hi
Apologies - I was incorrect in item 2. We have not yet upgraded to
2.4.16 - that changecontrol is still pending. So the patch I referred
to applies to 2.4.15.
Thanks, and good luck.
Oskar
On 28 Apr 2009, at 15:16, Oskar Pearson wrote:
> Hi David
>
> On 28 Apr 2009, at 12:20, David Angerer wrote:
>
>> Hi there,
>>
>> Already posted this a while ago to openldap-bugs, but havn't got a
>> response yet. As the problem still persist in my environment, I'll
>> give it another try here.
>>
>> We upgraded to openldap 2.4.16 a while ago. Since then we are
>> experiencing relatively frequent core dumps (average uptime is around
>> 2-3 days). Investigating the root cause we came across ITS#6034 which
>> reminded us much of problem we are facing, but should be fixed in
>> 2.4.16.
>
> We've not had a crash related to this (or its #5836) since doing the
> following four things:
>
> 1) Fixing path-mtu issues on the network with the VPN, so that when
> large transactions occurred, the network wouldn't go into infinite
> wait, chewing up threads. Links flapping up and down, or path-mtu-
> discovery icmp filtering can have the same effect and be out of your
> control.
>
> 2) Upgrading to 2.4.16
>
> 3) setting this in slapd.conf (after applying the patch in the next
> step)
> writetimeout 30
>
> 4) Applying the patch here.
> The patch is a hacked up version of the patch relating to #5836
> pulled out of CVS, which removes some of the threading aspects, and
> "clarifies" some of the code in the process.
>
> If I were you, I'd try to pull the #5836 patch out of CVS and see if
> that helps you. If not, or you need a very urgent fix to see if this
> helps, then consider using this patch.
>
> I have to make numerous disclaimers about this patch...
> 4a) some of it may be a less-optimal way of performing the same
> changes that are mentioned in #5836 in trunk. I originally started
> hacking that code as I encountered some "wierdness" in it - where
> straces didn't do what I expected. In retrospect, I think some of
> that may have been optimisations that I didn't understand at the
> time. I tested my hacked-up patch, and it worked, so I've stuck to
> it since. I intend to re-pull the original patch out of the cvs
> source sometime and re-test it, but I've not had time.
> 4b) It works for us, but I make you absolutely no promises - I am
> releasing it under the same licence as openldap (no warantee, etc
> etc etc).
> 4c) it probably only reduces the problem because of some corner-case
> that I am completely unaware of. EG: it may mean the assert-ing code
> is never reached because the connections never reach the expected
> thread state.
> 4d) I've not done any C code for years, nor do I have very much
> experience with the openldap source.
> 4e) I fully expect at least three flames from Howard Chu including
> great detail about why this patch is horrible. :) But at least it
> has some useful comments. :)
>
>
> <oskar-connection-patch-v1>
>
>
> Hope that helps.
>
> Oskar
>
>
>> Setup:
>> Solaris 10
>> Openldap 2.4.16 (64bit)
>> Berkeley DB 4.7
>>
>> 1 Master
>> 5 Replicas with deltasyncrepl
>>
>> Cores only happened to the master so far, replicas are running
>> stable so far.
>>
>> I couldn't find a reference to this ITS in the 2.4.16 changelog
>> (probably listed as other ITS or fixed earlier).
>> Also if the fix made it's way into 2.4.16, I am wondering why we
>> still
>> observe this cores.
>>
>> If you need further information or have any suggestions please let me
>> know. Please find some debugging information below.
>>
>> Kind regards,
>> David
>>
>> (dbx) threads
>> t@1 a l@1 ?() LWP suspended in __lwp_wait()
>> t@2 a l@2 slapd_daemon_task() LWP suspended in __pollsys()
>> t@3 a l@3 ldap_int_thread_pool_wrapper() sleep on
>> 0x1004dd428 in __lwp_park()
>> t@4 a l@4 ldap_int_thread_pool_wrapper() LWP suspended in
>> ___lwp_cond_wait()
>> t@5 a l@5 ldap_int_thread_pool_wrapper() LWP suspended in
>> ___lwp_cond_wait()
>> t@6 a l@6 ldap_int_thread_pool_wrapper() LWP suspended in
>> ___lwp_cond_wait()
>> t@7 a l@7 ldap_int_thread_pool_wrapper() sleep on
>> 0x1004dd670 in __lwp_park()
>> t@8 a l@8 ldap_int_thread_pool_wrapper() LWP suspended in
>> ___lwp_cond_wait()
>> t@9 a l@9 ldap_int_thread_pool_wrapper() sleep on
>> 0x1004a9908 in __lwp_park()
>> o t@10 a l@10 ldap_int_thread_pool_wrapper() signal SIGABRT
>> in
>> __lwp_kill()
>> t@11 a l@11 ldap_int_thread_pool_wrapper() sleep on
>> 0x1004a9908 in __lwp_park()
>> t@12 a l@12 ldap_int_thread_pool_wrapper() LWP suspended in
>> ___lwp_cond_wait()
>>> t@13 a l@13 ldap_int_thread_pool_wrapper() LWP suspended in
>> slap_access_allowed()
>> t@14 a l@14 ldap_int_thread_pool_wrapper() LWP suspended in
>> ___lwp_cond_wait()
>> t@15 a l@15 ldap_int_thread_pool_wrapper() sleep on
>> 0x1004dd1e0 in __lwp_park()
>> t@16 a l@16 ldap_int_thread_pool_wrapper() LWP suspended in
>> ___lwp_cond_wait()
>> t@17 a l@17 ldap_int_thread_pool_wrapper() sleep on
>> 0x1004a9908 in __lwp_park()
>> t@18 a l@18 ldap_int_thread_pool_wrapper() LWP suspended in
>> __lock_getlocker_int()
>> (dbx) lwp l@10
>> Current function is connection_state_closing
>> 686 assert( c->c_struct_state == SLAP_C_USED );
>> t@10 (l@10) stopped in __lwp_kill at 0xffffffff7decea00
>> 0xffffffff7decea00: __lwp_kill+0x0008: bcc,a,pt %icc,__lwp_kill
>> +0x18
>> ! 0xffffffff7decea10
>> (dbx) where
>> [1] __lwp_kill(0x0, 0x6, 0x10004b128, 0x19d480, 0x0, 0x0), at
>> 0xffffffff7decea00 [2] raise(0x6, 0x0, 0x100445d58,
>> 0xffffffffffffffff, 0xffffffff7dfe6000, 0x0), at 0xffffffff7de6b4f4
>> [3] abort(0x1, 0x1b8, 0x10004b128, 0x19d480, 0x0, 0x0), at
>> 0xffffffff7de48c8c [4] __assert(0x100271f48, 0x100271f70, 0x2ae,
>> 0x0,
>> 0x19d138, 0x100069f70), at 0xffffffff7de48f30 =>[5]
>> connection_state_closing(c = 0x1004dce30), line 686 in "connection.c"
>> [6] send_ldap_ber(op = 0xffffffff733ff768, ber = 0xffffffff733ff028),
>> line 148 in "result.c"
>> [7] slap_send_search_entry(op = 0xffffffff733ff768, rs =
>> 0xffffffff733ff2d0), line 1247 in "result.c"
>> [8] syncprov_sendresp(op = 0xffffffff733ff768, opc =
>> 0xffffffff733ff4a0, so = 0x113c0a930, e = 0xffffffff733ff500, mode =
>> 3), line 848 in "syncprov.c"
>> [9] syncprov_qplay(op = 0xffffffff733ff768, so = 0x113c0a930), line
>> 913 in "syncprov.c"
>> [10] syncprov_qtask(ctx = 0xffffffff733ffc20, arg = 0x113c0a930),
>> line 971 in "syncprov.c"
>> [11] ldap_int_thread_pool_wrapper(xpool = 0x10046a250), line 663 in
>> "tpool.c"
>> (dbx) list 680,690
>> 680 int connection_state_closing( Connection *c )
>> 681 {
>> 682 /* c_mutex must be locked by caller */
>> 683
>> 684 int state;
>> 685 assert( c != NULL );
>> 686 assert( c->c_struct_state == SLAP_C_USED );
>> 687
>> 688 state = c->c_conn_state;
>> 689
>> 690 assert( state != SLAP_C_INVALID );
>> (dbx) print c
>> c = 0x1004dce30
>> (dbx) print *c
>> *c = {
>> c_struct_state = 1
>> c_conn_state = 0
>> c_conn_idx = 26
>> c_sd = -1
>> c_close_reason = 0x100271eb0 "?"
>> c_mutex = {
>> __pthread_mutex_flags = {
>> __pthread_mutex_flag1 = 4U
>> __pthread_mutex_flag2 = '\0'
>> __pthread_mutex_ceiling = '\0'
>> __pthread_mutex_type = 0
>> __pthread_mutex_magic = 19800U
>> }
>> __pthread_mutex_lock = {
>> __pthread_mutex_lock64 = {
>> __pthread_mutex_pad = ""
>> }
>> __pthread_mutex_lock32 = {
>> __pthread_ownerpid = 0
>> __pthread_lockword = 0
>> }
>> __pthread_mutex_owner64 = 0
>> }
>> __pthread_mutex_data = 0
>> }
>> c_sb = 0x118772d90
>> c_starttime = 0
>> c_activitytime = 0
>> c_connid = 18446744073709551615U
>> c_peer_domain = {
>> bv_len = 0
>> bv_val = (nil)
>> }
>> c_peer_name = {
>> bv_len = 0
>> bv_val = (nil)
>> }
>> c_listener = (nil)
>> c_sasl_bind_mech = {
>> bv_len = 0
>> bv_val = (nil)
>> }
>> c_sasl_dn = {
>> bv_len = 0
>> bv_val = (nil)
>> }
>> c_sasl_authz_dn = {
>> bv_len = 0
>> bv_val = (nil)
>> }
>> c_authz_backend = (nil)
>> c_authz_cookie = (nil)
>> c_authz = {
>> sai_method = 128U
>> sai_mech = {
>> bv_len = 0
>> bv_val = (nil)
>> }
>> sai_dn = {
>> bv_len = 0
>> bv_val = (nil)
>> }
>> sai_ndn = {
>> bv_len = 0
>> bv_val = (nil)
>> }
>> sai_ssf = 0
>> sai_transport_ssf = 0
>> sai_tls_ssf = 0
>> sai_sasl_ssf = 0
>> }
>> c_protocol = 0
>> c_ops = {
>> stqh_first = (nil)
>> stqh_last = 0x1004dcf38
>> }
>> c_pending_ops = {
>> stqh_first = (nil)
>> stqh_last = 0x1004dcf48
>> }
>> c_write1_mutex = {
>> __pthread_mutex_flags = {
>> __pthread_mutex_flag1 = 4U
>> __pthread_mutex_flag2 = '\0'
>> __pthread_mutex_ceiling = '\0'
>> __pthread_mutex_type = 0
>> __pthread_mutex_magic = 19800U
>> }
>> __pthread_mutex_lock = {
>> __pthread_mutex_lock64 = {
>> __pthread_mutex_pad = ""
>> }
>> __pthread_mutex_lock32 = {
>> __pthread_ownerpid = 0
>> __pthread_lockword = 4278190080U
>> }
>> __pthread_mutex_owner64 = 4278190080U
>> }
>> __pthread_mutex_data = 18446744071451976704U
>> }
>> c_write1_cv = {
>> __pthread_cond_flags = {
>> __pthread_cond_flag = ""
>> __pthread_cond_type = 0
>> __pthread_cond_magic = 17238U
>> }
>> __pthread_cond_data = 0
>> }
>> c_write2_mutex = {
>> __pthread_mutex_flags = {
>> __pthread_mutex_flag1 = 4U
>> __pthread_mutex_flag2 = '\0'
>> __pthread_mutex_ceiling = '\0'
>> __pthread_mutex_type = 0
>> __pthread_mutex_magic = 19800U
>> }
>> __pthread_mutex_lock = {
>> __pthread_mutex_lock64 = {
>> __pthread_mutex_pad = ""
>> }
>> __pthread_mutex_lock32 = {
>> __pthread_ownerpid = 0
>> __pthread_lockword = 0
>> }
>> __pthread_mutex_owner64 = 0
>> }
>> __pthread_mutex_data = 0
>> }
>> c_write2_cv = {
>> __pthread_cond_flags = {
>> __pthread_cond_flag = ""
>> __pthread_cond_type = 0
>> __pthread_cond_magic = 17238U
>> }
>> __pthread_cond_data = 0
>> }
>> c_currentber = (nil)
>> c_writers = 0
>> c_sasl_bind_in_progress = '\0'
>> c_writewaiter = '\0'
>> c_sasl_layers = '\0'
>> c_sasl_done = '\0'
>> c_sasl_authctx = (nil)
>> c_sasl_sockctx = (nil)
>> c_sasl_extra = (nil)
>> c_sasl_bindop = (nil)
>> c_pagedresults_state = {
>> ps_be = (nil)
>> ps_size = 0
>> ps_count = 0
>> ps_cookie = 0
>> ps_cookieval = {
>> bv_len = 0
>> bv_val = (nil)
>> }
>> }
>> c_n_ops_received = 3
>> c_n_ops_executing = 0
>> c_n_ops_pending = 0
>> c_n_ops_completed = 3
>> c_n_get = 4
>> c_n_read = 3
>> c_n_write = 1
>> c_extensions = (nil)
>> c_clientfunc = (nil)
>> c_clientarg = (nil)
>> c_send_ldap_result = 0x10006b900 =
>> &slap_send_ldap_result(Operation *op, SlapReply *rs)
>> c_send_search_entry = 0x10006cda0 =
>> &slap_send_search_entry(Operation *op, SlapReply *rs)
>> c_send_search_reference = 0x10006fa80 =
>> &slap_send_search_reference(Operation *op, SlapReply *rs)
>> c_send_ldap_extended = 0x10006c5f8 =
>> &slap_send_ldap_extended(Operation *op, SlapReply *rs)
>> c_send_ldap_intermediate = 0x10006ca28 =
>> &slap_send_ldap_intermediate(Operation *op, SlapReply *rs) }
>