
(ITS#7715) SIGBUS when mdb is configured with writemap



Full_Name: Željko Nejašmić
Version: latest git pull of OPENLDAP_REL_ENG_2_4
OS: RedHat 6.3
URL: 
Submission from: (NULL) (213.147.123.33)


While using the ldclt tool to stress test our OpenLDAP mirror sync setup, I
encountered a SIGBUS. Note that the same issue also occurs on a single node,
without sync. I've tested using the aforementioned tool and the same arguments
on both Red Hat 6.3 (2.6.32-279.el6.x86_64) and Ubuntu Server 12.04 (Linux
3.2.0-54-generic x86_64) with exactly the same outcome.
In both cases OpenLDAP was compiled from source
(origin/OPENLDAP_REL_ENG_2_4), configured with --disable-{hdb,bdb},
--prefix=/opt/openldap and --enable-local=yes, using mdb as the backend, tweaked
additionally with:
* nometasync
* writemap

Without the writemap tweak, the SIGBUS does not occur.

The command used was:
    ldclt -h 172.17.101.150 -p 389 -D "cn=xxx,dc=xxx" -w "xxx" \
        -b "ds=USERS,o=STANDARD,dc=xxx" \
        -e object=xxx.txt,rdn='uid:[A=INCRNNOLOOP(200000;999999;6)]' \
        -e add,commoncounter -I 68

...where the xxx.txt has the following content:
    objectclass: xxxUser

The ldclt command uses 10 threads to perform add operations with an incrementing
uid under the base DN ds=USERS,o=STANDARD,dc=xxx.

Ulimits are:
    ulimit -a
    core file size          (blocks, -c) unlimited
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 2066206
    max locked memory       (kbytes, -l) unlimited
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 4096
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) unlimited
    real-time priority              (-r) 0
    stack size              (kbytes, -s) unlimited
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 1024
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited


At first sight, gdb seems to point to mdb_page_alloc:
    Starting program: /opt/openldap/libexec/slapd -h ldap:///\ ldapi:/// -F
        /opt/openldap/etc/openldap/slapd.d -g openldap -u openldap -d 0
    [Thread debugging using libthread_db enabled]
    [New Thread 0x2aaaac764700 (LWP 20415)]
    [Thread 0x2aaaac764700 (LWP 20415) exited]
    [New Thread 0x2aaaac764700 (LWP 20416)]
    [New Thread 0x2ab3ad168700 (LWP 20417)]
    [New Thread 0x2ab3ad969700 (LWP 20418)]
    [New Thread 0x2ab3ae16a700 (LWP 20419)]
    [New Thread 0x2ab3ae96b700 (LWP 20420)]
    [New Thread 0x2ab3b8800700 (LWP 20421)]
    [New Thread 0x2ab3d4800700 (LWP 20422)]

    Program received signal SIGBUS, Bus error.
    [Switching to Thread 0x2ab3b8800700 (LWP 20421)]
    mdb_page_alloc (mc=<value optimized out>, num=1, mp=0x2ab3b87fd8b8)
        at ./../../../libraries/liblmdb/mdb.c:1759
    warning: Source file is more recent than executable.
    1759            np->mp_pgno = pgno;

And the backtrace is:
    #0  mdb_page_alloc (mc=<value optimized out>, num=1, mp=0x2ab3b87fd8b8)
        at ./../../../libraries/liblmdb/mdb.c:1759
    #1  0x00000000004afb19 in mdb_page_touch (mc=0x2ab3bc1103f0)
        at ./../../../libraries/liblmdb/mdb.c:1889
    #2  0x00000000004b1c8c in mdb_cursor_touch (mc=0x2ab3bc1103f0)
        at ./../../../libraries/liblmdb/mdb.c:5597
    #3  0x00000000004b3a85 in mdb_cursor_put (mc=0x2ab3bc1103f0, key=0x2ab3b87ff000,
        data=0x2ab3b87feff0, flags=32) at ./../../../libraries/liblmdb/mdb.c:5727
    #4  0x00000000004f8586 in mdb_idl_insert_keys (be=<value optimized out>, cursor=0x2ab3bc1103f0,
        keys=<value optimized out>, id=13) at idl.c:534
    #5  0x00000000004f9116 in indexer (op=0x2ab3bc10dbd0, txn=<value optimized out>,
        ai=<value optimized out>, ad=0x88e0e0, atname=0x88dfb8, vals=0x2ab3bc110120, id=13, opid=1,
        mask=4) at index.c:219
    #6  0x00000000004f95d1 in index_at_values (op=0x2ab3bc10dbd0, txn=0x2ab3bc10e2f0,
        ad=<value optimized out>, type=0x88df50, tags=0x88e100, vals=0x2ab3bc110120, id=13, opid=1)
        at index.c:337
    #7  0x00000000004f9627 in mdb_index_values (op=<value optimized out>, txn=<value optimized out>,
        desc=<value optimized out>, vals=<value optimized out>, id=<value optimized out>,
        opid=<value optimized out>) at index.c:386
    #8  0x00000000004f96f9 in mdb_index_entry (op=0x2ab3bc10dbd0, txn=0x2ab3bc10e2f0, opid=1,
        e=0x8c18d8) at index.c:558
    #9  0x00000000004ed77e in mdb_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950) at add.c:359
    #10 0x0000000000487ac7 in overlay_op_walk (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950, which=op_add,
        oi=0x932280, on=0x0) at backover.c:671
    #11 0x00000000004884a7 in over_op_func (op=0x2ab3bc10dbd0, rs=<value optimized out>,
        which=<value optimized out>) at backover.c:723
    #12 0x00000000004281c0 in fe_op_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950) at add.c:334
    #13 0x0000000000428a16 in do_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950) at add.c:194
    #14 0x0000000000421259 in connection_operation (ctx=0x2ab3b87ffab0, arg_v=0x2ab3bc10dbd0)
        at connection.c:1155
    #15 0x0000000000421a35 in connection_read_thread (ctx=0x2ab3b87ffab0, argv=<value optimized out>)
        at connection.c:1291
    #16 0x0000000000516380 in ldap_int_thread_pool_wrapper (xpool=0x898160) at tpool.c:688
    #17 0x000000384cc07851 in start_thread () from /lib64/libpthread.so.0
    #18 0x000000384c8e767d in clone () from /lib64/libc.so.6

For context, the assembly around the offending pointer dereference looks like
this:
    0x4af949 <mdb_page_alloc+665>   movslq %r13d,%rax
    0x4af94c <mdb_page_alloc+668>   lea    (%rcx,%rax,1),%rax
    0x4af950 <mdb_page_alloc+672>   mov    %rax,0x10(%r14)
    0x4af954 <mdb_page_alloc+676>   mov    %rcx,0x0(%rbp)
    
The underlying hardware is:
    1) HP ProLiant BL460c Gen8, dual Xeon E5-2658, with an attached D2200sb
       storage blade with SSD RAID, 256GB RAM -- RedHat 6.3 tests
    2) Intel server blade S2400BB, dual Xeon E5-2403, 48GB RAM -- Ubuntu 12.04
       tests

If anything more is required to assist you in troubleshooting, please let me
know.


Zeljko