
Re: slapd crash in back-bdb/ctxcsn.c (ITS#3301)



rhafer@suse.de wrote:

>Full_Name: Ralf Haferkamp
>Version: 2.2.15
>OS: Linux (Kernel 2.6)
>URL: ftp://ftp.openldap.org/incoming/
>Submission from: (NULL) (212.95.102.25)
>
>
>I ran a slightly modified version of "test008-concurrency" on a test server
>with around 10000 entries. The test runs many add, read and modify operations
>in parallel (I adapted slapd-modrdn to do modifies instead of modrdn). After a
>short while the server crashed. I was able to produce the following backtrace:
>
>#0  0x080c8a09 in bdb_csn_commit (op=0x44219800, rs=0x44088870, tid=0x4623df90,
>    ei=0x81842a8, suffix_ei=0x440884d0, ctxcsn_e=0x440884cc, ctxcsn_added=0x440884c8,
>    locker=2147502168) at ctxcsn.c:62
>        bdb = (struct bdb_info *) 0x816aad0
>        ctxcsn_ei = (EntryInfo *) 0x0
>        ctxcsn_lock = {off = 0, ndx = 938, gen = 135781712, mode = 1075070032}
>        max_committed_csn = {bv_len = 135421424, bv_val = 0x4620e4d0 "\017"}
>        suffix_lock = {off = 1176550456, ndx = 0, gen = 1141408840, mode = 135075791}
>        rc = -30995
>        ret = 10427
>        ctxcsn_id = 1176560848
>        e = (Entry *) 0x46237f18
>        textbuf = "....."
>        textlen = 256
>        eip = (EntryInfo *) 0x0
>#1  0x080c4d7e in bdb_add (op=0x44219800, rs=0x44088870) at add.c:441
>        bdb = (struct bdb_info *) 0x816aad0
>        pdn = {bv_len = 11, bv_val = 0x46243feb "o=customers"}
>        p = (Entry *) 0x0
>        ei = (EntryInfo *) 0x81842a8
>        textbuf = "....."
>        textlen = 256
>        children = (AttributeDescription *) 0x81298b0
>        entry = (AttributeDescription *) 0x8129720
>        ltid = (DB_TXN *) 0x4623df90
>        lt2 = (DB_TXN *) 0x462573a0
>        opinfo = {boi_bdb = 0x816a9d0, boi_txn = 0x4623df90, boi_lock = {off = 16,
>    ndx = 1077478705, gen = 1176560872, mode = 1074201072}, boi_err = 0,
>  boi_locker = 2147502168, boi_acl_cache = 0}
>        subentry = 0
>        locker = 2147502168
>        lock = {off = 298840, ndx = 386, gen = 3273, mode = DB_LOCK_READ}
>        num_retries = 0
>        ps_list = (Operation *) 0x10
>        rc = 1176502288
>        suffix_ei = (EntryInfo *) 0x0
>        ctxcsn_e = (Entry *) 0x440884e8
>        ctxcsn_added = 0
>        postread_ctrl = (LDAPControl **) 0x0
>        ctrls = {0x0, 0x4043f4c0, 0x4043f4c0, 0x400, 0x18, 0x4620e4e0}
>        num_ctrls = 0
>#2  0x0806aad2 in do_add (op=0x44219800, rs=0x44088870) at add.c:318
>        update = 0
>        textbuf = "....."
>        textlen = 256
>        cb = {sc_next = 0x0, sc_response = 0x807106a <slap_replog_cb>, sc_cleanup = 0,
>  sc_private = 0x0}
>        repl_user = 0
>        ber = (BerElement *) 0x4623df10
>        last = 0x46230d6f ""
>        dn = {bv_len = 30, bv_val = 0x46230b32 "cn=James A Jones 5,o=customers"}
>        len = 36
>        tag = 4294967295
>        e = (Entry *) 0x4620bc38
>        modlist = (Modifications *) 0x46265498
>        modtail = (Modifications **) 0x4625cbc0
>        tmp = {sml_mod = {sm_op = 1141409752, sm_desc = 0x40324eb0, sm_type = {bv_len = 15,
>      bv_val = 0x46230d4d "telephoneNumber"}, sm_values = 0x46225a90, sm_nvalues = 0x0},
>  sml_next = 0x0}
>        manageDSAit = 0
>#3  0x0806445e in connection_operation (ctx=0x44088900, arg_v=0x44219800)
>    at connection.c:1048
>        rc = 80
>        op = (Operation *) 0x44219800
>        rs = {sr_type = REP_RESULT, sr_tag = 0, sr_msgid = 0, sr_err = 0, sr_matched = 0x0,
>  sr_text = 0x0, sr_ref = 0x0, sr_ctrls = 0x0, sr_un = {sru_sasl = {r_sasldata = 0x0},
>    sru_extended = {r_rspoid = 0x0, r_rspdata = 0x0}, sru_search = {r_entry = 0x0,
>      r_attrs = 0x0, r_nentries = 0, r_v2ref = 0x0}}, sr_flags = 0}
>        tag = 104
>        oldtag = 104
>        conn = (Connection *) 0x42d274bc
>        memctx = (void *) 0x819c1e0
>        memctx_null = (void *) 0x0
>        memsiz = 1048576
>#4  0x4003166d in ldap_int_thread_pool_wrapper (xpool=0x812b520) at tpool.c:467
>        pool = (struct ldap_int_thread_pool_s *) 0x812b520
>        ctx = (ldap_int_thread_ctx_t *) 0x8199c70
>        ltc_key = {{ltk_key = 0x80a3de8, ltk_data = 0x819c1e0,
>    ltk_free = 0x80a3db8 <sl_mem_destroy>}, {ltk_key = 0x817d210, ltk_data = 0xe,
>    ltk_free = 0x80c786f <bdb_locker_id_free>}, {ltk_key = 0x817d211, ltk_data = 0x81a1df0,
>    ltk_free = 0x80c76db <bdb_txn_free>}, {ltk_key = 0x0, ltk_data = 0x0,
>    ltk_free = 0} <repeats 29 times>}
>        tid = 1141410736
>        i = 391
>        keyslot = 391
>        hash = 391
>#5  0x403239ed in start_thread () from /lib/tls/libpthread.so.0
>No symbol table info available.
>#6  0x403e59ca in clone () from /lib/tls/libc.so.6
>No symbol table info available.
>
>So it looks like the dn2entry call in back-bdb/ctxcsn.c:62 is returning
>DB_LOCK_DEADLOCK (rc = -30995 in the backtrace), and therefore ctxcsn_ei is
>still NULL when it is used.
>
>Unfortunately I am not very familiar with this code, so I don't know how to
>fix it correctly, but returning BDB_CSN_RETRY directly after the dn2entry call
>if rc==DB_LOCK_DEADLOCK seems to fix the problem.
>
This is now patched in HEAD, please test.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support