
slapd crash in back-bdb/ctxcsn.c (ITS#3301)



Full_Name: Ralf Haferkamp
Version: 2.2.15
OS: Linux (Kernel 2.6)
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (212.95.102.25)


I ran a slightly modified version of "test008-concurrency" on a test server
with around 10000 entries. The test runs many add, read, and modify operations
in parallel (I adapted slapd-modrdn to do modifies instead of modrdn). After a
short while the server crashed. I was able to produce the following backtrace:

#0  0x080c8a09 in bdb_csn_commit (op=0x44219800, rs=0x44088870, tid=0x4623df90,
    ei=0x81842a8, suffix_ei=0x440884d0, ctxcsn_e=0x440884cc, ctxcsn_added=0x440884c8,
    locker=2147502168) at ctxcsn.c:62
        bdb = (struct bdb_info *) 0x816aad0
        ctxcsn_ei = (EntryInfo *) 0x0
        ctxcsn_lock = {off = 0, ndx = 938, gen = 135781712, mode = 1075070032}
        max_committed_csn = {bv_len = 135421424, bv_val = 0x4620e4d0 "\017"}
        suffix_lock = {off = 1176550456, ndx = 0, gen = 1141408840, mode = 135075791}
        rc = -30995
        ret = 10427
        ctxcsn_id = 1176560848
        e = (Entry *) 0x46237f18
        textbuf = "....."
        textlen = 256
        eip = (EntryInfo *) 0x0
#1  0x080c4d7e in bdb_add (op=0x44219800, rs=0x44088870) at add.c:441
        bdb = (struct bdb_info *) 0x816aad0
        pdn = {bv_len = 11, bv_val = 0x46243feb "o=customers"}
        p = (Entry *) 0x0
        ei = (EntryInfo *) 0x81842a8
        textbuf = "....."
        textlen = 256
        children = (AttributeDescription *) 0x81298b0
        entry = (AttributeDescription *) 0x8129720
        ltid = (DB_TXN *) 0x4623df90
        lt2 = (DB_TXN *) 0x462573a0
        opinfo = {boi_bdb = 0x816a9d0, boi_txn = 0x4623df90, boi_lock = {off = 16,
    ndx = 1077478705, gen = 1176560872, mode = 1074201072}, boi_err = 0,
  boi_locker = 2147502168, boi_acl_cache = 0}
        subentry = 0
        locker = 2147502168
        lock = {off = 298840, ndx = 386, gen = 3273, mode = DB_LOCK_READ}
        num_retries = 0
        ps_list = (Operation *) 0x10
        rc = 1176502288
        suffix_ei = (EntryInfo *) 0x0
        ctxcsn_e = (Entry *) 0x440884e8
        ctxcsn_added = 0
        postread_ctrl = (LDAPControl **) 0x0
        ctrls = {0x0, 0x4043f4c0, 0x4043f4c0, 0x400, 0x18, 0x4620e4e0}
        num_ctrls = 0
#2  0x0806aad2 in do_add (op=0x44219800, rs=0x44088870) at add.c:318
        update = 0
        textbuf = "....."
        textlen = 256
        cb = {sc_next = 0x0, sc_response = 0x807106a <slap_replog_cb>, sc_cleanup = 0,
  sc_private = 0x0}
        repl_user = 0
        ber = (BerElement *) 0x4623df10
        last = 0x46230d6f ""
        dn = {bv_len = 30, bv_val = 0x46230b32 "cn=James A Jones 5,o=customers"}
        len = 36
        tag = 4294967295
        e = (Entry *) 0x4620bc38
        modlist = (Modifications *) 0x46265498
        modtail = (Modifications **) 0x4625cbc0
        tmp = {sml_mod = {sm_op = 1141409752, sm_desc = 0x40324eb0, sm_type = {bv_len = 15,
      bv_val = 0x46230d4d "telephoneNumber"}, sm_values = 0x46225a90, sm_nvalues = 0x0},
  sml_next = 0x0}
        manageDSAit = 0
#3  0x0806445e in connection_operation (ctx=0x44088900, arg_v=0x44219800)
    at connection.c:1048
        rc = 80
        op = (Operation *) 0x44219800
        rs = {sr_type = REP_RESULT, sr_tag = 0, sr_msgid = 0, sr_err = 0, sr_matched = 0x0,
  sr_text = 0x0, sr_ref = 0x0, sr_ctrls = 0x0, sr_un = {sru_sasl = {r_sasldata = 0x0},
    sru_extended = {r_rspoid = 0x0, r_rspdata = 0x0}, sru_search = {r_entry = 0x0,
      r_attrs = 0x0, r_nentries = 0, r_v2ref = 0x0}}, sr_flags = 0}
        tag = 104
        oldtag = 104
        conn = (Connection *) 0x42d274bc
        memctx = (void *) 0x819c1e0
        memctx_null = (void *) 0x0
        memsiz = 1048576
#4  0x4003166d in ldap_int_thread_pool_wrapper (xpool=0x812b520) at tpool.c:467
        pool = (struct ldap_int_thread_pool_s *) 0x812b520
        ctx = (ldap_int_thread_ctx_t *) 0x8199c70
        ltc_key = {{ltk_key = 0x80a3de8, ltk_data = 0x819c1e0,
    ltk_free = 0x80a3db8 <sl_mem_destroy>}, {ltk_key = 0x817d210, ltk_data = 0xe,
    ltk_free = 0x80c786f <bdb_locker_id_free>}, {ltk_key = 0x817d211, ltk_data = 0x81a1df0,
    ltk_free = 0x80c76db <bdb_txn_free>}, {ltk_key = 0x0, ltk_data = 0x0,
    ltk_free = 0} <repeats 29 times>}
        tid = 1141410736
        i = 391
        keyslot = 391
        hash = 391
#5  0x403239ed in start_thread () from /lib/tls/libpthread.so.0
No symbol table info available.
#6  0x403e59ca in clone () from /lib/tls/libc.so.6
No symbol table info available.

So it looks like the dn2entry call at back-bdb/ctxcsn.c:62 is returning
DB_LOCK_DEADLOCK (rc = -30995 in the backtrace) and therefore ctxcsn_ei is still
NULL.

Unfortunately I am not very familiar with this code, so I don't know how to
fix it correctly, but returning BDB_CSN_RETRY directly after the dn2entry call
if rc == DB_LOCK_DEADLOCK seems to fix the problem.
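
The workaround described above can be illustrated with a minimal,
self-contained sketch. This is not the actual back-bdb code and not a patch:
lookup_stub() and csn_commit_sketch() are hypothetical names, EntryInfo is a
stub type, the BDB_CSN_* values are assumed stand-ins for the definitions in
the back-bdb headers, and DB_LOCK_DEADLOCK is hard-coded to the value seen in
the backtrace rather than taken from <db.h>. Treating every other lookup error
as an abort is also an assumption made only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Value observed in the backtrace (rc = -30995); real code would use <db.h>. */
#define DB_LOCK_DEADLOCK (-30995)

/* Assumed stand-ins for the back-bdb return codes, for illustration only. */
enum { BDB_CSN_COMMIT = 0, BDB_CSN_ABORT = 1, BDB_CSN_RETRY = 2 };

/* Hypothetical stub; the real EntryInfo has many more fields. */
typedef struct EntryInfo { int id; } EntryInfo;

/*
 * Hypothetical stand-in for the dn2entry lookup. Like the call in the
 * report, it leaves *ei_out untouched when it fails.
 */
static int lookup_stub(int simulated_rc, EntryInfo *found, EntryInfo **ei_out)
{
    assert(ei_out != NULL);
    if (simulated_rc == 0) {
        *ei_out = found;
    }
    return simulated_rc;
}

/* Sketch of the guarded path: check rc before touching ctxcsn_ei. */
int csn_commit_sketch(int simulated_rc)
{
    static EntryInfo ctx = { 42 };
    EntryInfo *ctxcsn_ei = NULL;
    int rc = lookup_stub(simulated_rc, &ctx, &ctxcsn_ei);

    /* Deadlock: report a retry instead of using the NULL pointer. */
    if (rc == DB_LOCK_DEADLOCK) {
        return BDB_CSN_RETRY;
    }

    /* Any other failure (assumption): give up rather than dereference NULL. */
    if (rc != 0 || ctxcsn_ei == NULL) {
        return BDB_CSN_ABORT;
    }

    /* Only here is it safe to use ctxcsn_ei. */
    return ctxcsn_ei->id == 42 ? BDB_CSN_COMMIT : BDB_CSN_ABORT;
}
```

Calling csn_commit_sketch(DB_LOCK_DEADLOCK) yields BDB_CSN_RETRY without
touching ctxcsn_ei, which mirrors the early return that made the crash go
away in my test. Whether the caller handles the retry correctly in all paths
is something I have not verified.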