
Re: Fwd: Re: (ITS#7713) Segmentation fault if the pagesize of the Operating system is not equal to 4096.



On Tue, 1 Oct 2013, Howard Chu wrote:

Fixing this will either require adding a bunch of ugly code, or changing the on-disk format again. Opinions?

Interesting... I think this might be the same issue that hit sparcv9 >= 2.4.32.

http://www.openldap.org/lists/openldap-devel/201207/msg00017.html

I'm going to echo the concept of "fix it right," even if that means dump/reload...

Currently the page in-use offsets mp_lower and mp_upper range from [PAGEHDRSZ to pagesize]. IMO this was a stupid choice, carried over from the original btree code. It should instead have ranged from [0 to pagesize-PAGEHDRSZ] and then we'd have no issue right now. Adjusting this would require only a few minor tweaks to the code, but would require a full dump/reload of existing databases.
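The difference between the two ranges can be sketched as follows. This is only an illustration, assuming 16-bit offset fields (the unsigned short mentioned elsewhere in this thread); SKETCH_HDRSZ and the function names are hypothetical, and the header size of 16 is an assumed value, not taken from mdb.c.

```c
#include <stdint.h>

#define SKETCH_HDRSZ 16u  /* assumed header size, for illustration only */

/* Largest offset under the current scheme: [PAGEHDRSZ, pagesize]. */
static uint32_t max_offset_current(uint32_t pagesize)
{
    return pagesize;
}

/* Largest offset under the proposed scheme: [0, pagesize - PAGEHDRSZ]. */
static uint32_t max_offset_proposed(uint32_t pagesize)
{
    return pagesize - SKETCH_HDRSZ;
}

/* Nonzero if the value can be stored in a 16-bit field without loss. */
static int fits_in_u16(uint32_t v)
{
    return v <= UINT16_MAX;
}
```

With a 65536-byte page, max_offset_current() gives 65536, which does not fit in 16 bits, while max_offset_proposed() gives 65520, which does. With a 4096-byte page both fit.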


-------- Original Message --------
Subject: Re: (ITS#7713) Segmentation fault if the pagesize of the Operating system is not equal to 4096.
Date: Tue, 1 Oct 2013 07:16:11 GMT
From: hyc@symas.com
To: openldap-its@openldap.org

sumantk2@linux.vnet.ibm.com wrote:
Full_Name: sumanth k
Version: 2.4.35 to any recent version with mdb support
OS: Linux - ppc64
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (122.248.161.59)


The page size on Linux for the x86 and s390x architectures is 4096 bytes, whereas on PowerPC (ppc64) it is 65536 bytes by default. When the page size is not equal to 4096, a segmentation fault occurs. I compiled the powerpc64 kernel with a page size of 4096 and the problem disappeared; everything ran smoothly. But by default the powerpc64 architecture runs with a 65536-byte page size, so there seems to be some problem in the mdb_env_open2() function in mdb.c when the page size is not equal to 4096.
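For reference, the page size a given machine is running with can be checked at run time. A minimal sketch, assuming a POSIX system; os_page_size is a hypothetical helper name and is not part of mdb.c.

```c
#include <unistd.h>

/* Returns the OS page size in bytes, or -1 if it cannot be determined. */
static long os_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}
```

From a shell, `getconf PAGESIZE` reports the same value.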

Thanks for the report. I believe you should be able to change the
definition of MDB_PAGESIZE to 65536 instead of forcing your machine to use
4096 byte pages.

There are other problems though; we use an unsigned short for page offsets.
I'm not sure the assert that you tripped will succeed in this case.
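To illustrate the concern about unsigned short offsets, here is a minimal sketch. It assumes a 16-bit offset type and an empty page whose upper offset starts out equal to the page size. sketch_page, SKETCH_HDRSZ and the two helpers are hypothetical names and do not reproduce LMDB's actual definitions.

```c
#include <stdint.h>

#define SKETCH_HDRSZ 16u  /* assumed header size, for illustration only */

/* Hypothetical, simplified stand-in for the page header offsets. */
typedef struct {
    uint16_t lower;  /* end of the used area growing up from the header */
    uint16_t upper;  /* start of the used area growing down from the page end */
} sketch_page;

/* Set up an empty page: lower = header size, upper = page size. */
static void sketch_page_init(sketch_page *p, uint32_t pagesize)
{
    p->lower = (uint16_t)SKETCH_HDRSZ;
    p->upper = (uint16_t)pagesize;  /* 65536 is reduced modulo 2^16 to 0 */
}

/* The invariant an assertion of the form upper >= lower checks. */
static int sketch_page_ok(const sketch_page *p)
{
    return p->upper >= p->lower;
}
```

With a page size of 4096 the invariant holds. With 65536 the stored upper offset becomes 0, so the check fails on a page that is in fact empty.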


These are my observations:


Compiled the source with -O0 optimization.
/home/openldap-2.4.36/tests/../servers/slapd/slapd -s0 -f
/home/openldap-2.4.36/tests/testrun/slapd.1.conf -h ldap://localhost:9011/ -d
0x4105

./scripts/test000-rootdse: line 31: 19059 Aborted (core dumped)
$SLAPD -f $CONF1 -h $URI1 -d $LVL $TIMING > $LOG1 2>&1

Core file :
bash-4.2# cat .gdbinit
b mdb_db_open
b mdb_env_open
b mdb_env_open2
b mdb_txn_begin
b mdb_txn_renew0
b mdb_dbi_open
b mdb_cursor_init
b mdb_cursor_set
b mdb_page_search
b mdb_page_get
b mdb_page_search_root
r -s0 -f /home/openldap-2.4.36/tests/testrun/slapd.1.conf -h
ldap://localhost:9011/ -d 0x4105


# Core was generated by `/home/openldap-2.4.36/tests/../servers/slapd/slapd -s0
-f /home/openldap-2.4.36'.
Program terminated with signal 6, Aborted.
#0  0x00001fffff7adb70 in .raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install
cyrus-sasl-lib-2.1.26-8.mcp8_0.2.ppc64 cyrus-sasl-md5-2.1.26-8.mcp8_0.2.ppc64
glibc-2.17-4.mcp8_0.6.ppc64 keyutils-libs-1.5.5-4.mcp8_0.2.ppc64
krb5-libs-1.11.3-1.mcp8_0.2.ppc64 libcom_err-1.42.7-2.mcp8_0.1.ppc64
libdb-5.3.21-8.mcp8_0.4.ppc64 libselinux-2.1.13-15.mcp8_0.3.ppc64
nss-softokn-freebl-3.14.3-1.mcp8_0.2.ppc64 openssl-libs-1.0.1e-4.mcp8_0.1.ppc64
pcre-8.32-7.mcp8_0.2.ppc64 zlib-1.2.7-10.mcp8_0.1.ppc64
(gdb) where
#0  0x00001fffff7adb70 in .raise () from /lib64/libc.so.6
#1  0x00001fffff7afb64 in .abort () from /lib64/libc.so.6
#2  0x00001fffff7a455c in .__assert_fail_base () from /lib64/libc.so.6
#3  0x00001fffff7a464c in .__assert_fail () from /lib64/libc.so.6
#4  0x00000000101498cc in mdb_node_add (mc=0x3fffe89a7d40, indx=0,
key=0x3fffe89a7d20, data=0x3fffe89a7d30, pgno=0, flags=2) at
./../../../libraries/liblmdb/mdb.c:6160
#5 0x000000001014882c in mdb_cursor_put (mc=0x3fffe89a7d40, key=0x3fffe89a7d20,
data=0x3fffe89a7d30, flags=2) at ./../../../libraries/liblmdb/mdb.c:5877
#6  0x00000000101516a8 in mdb_dbi_open (txn=0x1000f675980, name=0x1027b9c8
"ad2i", flags=262152, dbi=0x1fffff0800a0) at
./../../../libraries/liblmdb/mdb.c:7902
#7 0x0000000010139cd4 in mdb_db_open (be=0x1000f4f83a0, cr=0x3fffe89a81c0) at
init.c:207
#8  0x0000000010050e34 in backend_startup_one (be=0x1000f4f83a0,
cr=0x3fffe89a81c0) at backend.c:224
#9 0x0000000010051588 in backend_startup (be=0x1000f4f83a0) at backend.c:325
#10 0x0000000010089a7c in slap_startup (be=0x0) at init.c:219
#11 0x000000001000a9c8 in main (argc=8, argv=0x3fffe89a8958) at main.c:991



Here is the error message :

#/home/openldap-2.4.36/tests/../servers/slapd/slapd -s0 -f
/home/openldap-2.4.36/tests/testrun/slapd.1.conf -h ldap://localhost:9011/ -d
0x4105
(...some messages...)
524a2fe1 mdb_db_open: database "o=OpenLDAP Project,l=Internet":
dbenv_open(/home/openldap-2.4.36/tests/testrun/db.1.a).
slapd: ./../../../libraries/liblmdb/mdb.c:6160: mdb_node_add: Assertion
`mp->mp_pb.pb.pb_upper >= mp->mp_pb.pb.pb_lower' failed.   <=== fails in assert()
Aborted (core dumped)



Some of my observations:

In libraries/liblmdb/mdb.c, the call
rc = mdb_cursor_set(&mc, &key, &data, MDB_SET, &exact);
returns rc = MDB_SUCCESS on x86, but MDB_NOTFOUND on ppc64. This is due to the fact that md_root != 2.

The value of md_root is 2 for env->me_metas[1]->mm_dbs on x86, but some huge value on ppc64. The value of md_pad is 4096 on x86 and 65536 on ppc64.

in x86:

Breakpoint 2, mdb_txn_begin (env=0x9c5050, parent=0x0, flags=0,
ret=0x7fffffffdf98) at ./../../../libraries/liblmdb/mdb.c:2219
2219		int rc, size, tsize = sizeof(MDB_txn);
(gdb) p env->me_metas[1]->mm_dbs
$1 = {{md_pad = 4096, md_flags = 8, md_depth = 0, md_branch_pages = 0,
md_leaf_pages = 0, md_overflow_pages = 0, md_entries = 0, md_root =
18446744073709551615}, {md_pad = 0, md_flags = 0,
md_depth = 1, md_branch_pages = 0, md_leaf_pages = 1, md_overflow_pages = 0,
md_entries = 4, md_root = 2}}
(gdb) p env->me_metas[0]->mm_dbs
$2 = {{md_pad = 4096, md_flags = 8, md_depth = 0, md_branch_pages = 0,
md_leaf_pages = 0, md_overflow_pages = 0, md_entries = 0, md_root =
18446744073709551615}, {md_pad = 0, md_flags = 0,
md_depth = 0, md_branch_pages = 0, md_leaf_pages = 0, md_overflow_pages = 0,
md_entries = 0, md_root = 18446744073709551615}}


in ppc64:

Breakpoint 2, mdb_txn_begin (env=0x10493ef0, parent=0x0, flags=0,
ret=0x3fffffffe590) at ./../../../libraries/liblmdb/mdb.c:2219
2219            int rc, size, tsize = sizeof(MDB_txn);
(gdb) p env->me_metas[0]->mm_dbs
$1 = {{md_pad = 65536, md_flags = 8, md_depth = 0, md_branch_pages = 0,
md_leaf_pages = 0, md_overflow_pages = 0, md_entries = 0, md_root =
18446744073709551615}, {md_pad = 0, md_flags = 0,
md_depth = 0, md_branch_pages = 0, md_leaf_pages = 0, md_overflow_pages = 0,
md_entries = 0, md_root = 18446744073709551615}}
(gdb) p env->me_metas[1]->mm_dbs
$2 = {{md_pad = 65536, md_flags = 8, md_depth = 0, md_branch_pages = 0,
md_leaf_pages = 0, md_overflow_pages = 0, md_entries = 0, md_root =
18446744073709551615}, {md_pad = 0, md_flags = 0,
md_depth = 0, md_branch_pages = 0, md_leaf_pages = 0, md_overflow_pages = 0,
md_entries = 0, md_root = 18446744073709551615}}




From further investigation, the value of env->me_metas[1] is initialized in
mdb_env_open2() at:

p = (MDB_page *)env->me_map;
env->me_metas[0] = METADATA(p);
env->me_metas[1] = (MDB_meta *)((char *)env->me_metas[0] + meta.mm_psize);

Here meta.mm_psize is 65536 on ppc64, so there seems to be some problem. If the
value of meta.mm_psize is 4096, then everything works fine.


As I don't have deep knowledge of OpenLDAP, some help is needed.

Thank you,
Sumanth K




--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/