Re: (ITS#8898) mdb_get on corrupted LMDB file crash the process



Irad K wrote:
> Hi Howard and thanks for the info.
>
> I've read your insights about the corruption nature and I'd like to see if I can write some sort of integrity check before I try to open the file.
> Perhaps you could guide me where to look for the file format, where can I find all the meta-pages and in which offset can I find the used pages?

Read the source yourself if you want to know the underlying format. Otherwise, just use mdb_env_info().

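The size check asked about above might be sketched roughly as follows. This is only an illustration, not code from the LMDB sources. It assumes the caller has already read the last used page number from the me_last_pgno field that mdb_env_info() fills in, and the page size from the ms_psize field that mdb_env_stat() fills in. The helper names required_bytes and file_covers_pages are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>

/* Bytes the data file must hold if pages 0..last_pgno are all present.
 * last_pgno is a page ID, so the page count is last_pgno + 1. */
uint64_t required_bytes(uint64_t last_pgno, uint64_t page_size)
{
    assert(page_size > 0);
    return (last_pgno + 1) * page_size;
}

/* Compare the on-disk size of `path` with what the meta page claims.
 * Returns 1 if the file is long enough, 0 if it is shorter than the
 * claimed page count, and -1 if stat() fails. */
int file_covers_pages(const char *path, uint64_t last_pgno, uint64_t page_size)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (uint64_t)st.st_size >= required_bytes(last_pgno, page_size) ? 1 : 0;
}
```

Assuming 4096-byte pages, a meta page claiming 11 pages in use (last page number 10) needs 45056 bytes on disk, so a 24k file would be reported as too short.
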
> Basically, I wish to check the latest meta-page and see if the number of pages corresponds with the actual file size.
>
> Second, as I've stated, my target operating system is macOS. Are you familiar with any known issues with file atomicity in this particular OS?
> Perhaps you could tell me how it's implemented under the hood to achieve data consistency if I'm using the database with MDB_NOMETASYNC, so I can further investigate this issue and address it to Apple if necessary.

I've seen other reports of corruptions on MacOSX. Since they're in SQLite I haven't investigated further.

http://sqlite.1065341.n5.nabble.com/SQLITE-vs-OSX-mmap-inevitable-catalog-corruption-td85620.html

http://sqlite.1065341.n5.nabble.com/Re-Database-corruption-and-PRAGMA-fullfsync-on-macOS-td95366i20.html



>
> Irad
>
> On Mon, Sep 10, 2018 at 8:44 PM Howard Chu <hyc@symas.com> wrote:
>
>     iradization@gmail.com wrote:
>     > Full_Name: Irad.k
>     > Version: recent from GitHub.
>     > OS: macOS
>     > URL: ftp://ftp.openldap.org/incoming/
>     > Submission from: (NULL) (82.81.84.130)
>     >
>     >
>     > Hi,
>     >
>     > I'm working with LMDB with the following flags: MDB_NOMETASYNC | MDB_NOSUBDIR
>     > | MDB_NOTLS.
>     >
>     > Somehow, even though the DB should be ACI (without the D), it got corrupted
>     > after recovering from kernel panic, and it crashes my process when trying to
>     > access one of the records (see crash log below).
>     >
>     > Here's a link to the file :
>     >
>     > https://drive.google.com/file/d/12Q3KYYrapiJOgiaccnDL3tQACkqrM3C5/view?usp=sharing
>
>     This file is only 6 pages long, but the latest meta page claims that it's using 11 pages.
>     Looking at the previous meta page, it claims that 9 pages were used.
>
>     Clearly your OS failed to write the complete contents out before it panicked. At the
>     very least the file should have been 9 pages long. Sounds like you have an OS bug.
>
>     > According to the crash log from the process, it can clearly be seen that the
>     > invalid address resides inside the mapped file region which is the lmdb mapped
>     > file, but still I get KERN_MEMORY_ERROR on that address.
>     >
>     > From what I know, an attempt to access an address within the mapped range can
>     > either retrieve the page contents directly from memory (if it's already there),
>     > or trigger a page fault trap that eventually leads to reading the missing data from
>     > disk and returning it to the process as well.
>     >
>     > One thing that raises some concerns is that the file size is only 24k and the
>     > mapping spans over 256M. However, the file's meta data seems to be coherent with
>     > the file contents.
>     >
>     > Any idea how it happened, and what exactly in the file causes this corruption?
>     >
>     >
>     > CRASH LOG :
>     > ----------------------------------------------------------------------
>     > Exception Type:        EXC_BAD_ACCESS (SIGBUS)
>     > Exception Codes:       KERN_MEMORY_ERROR at 0x000000010648800a
>     > Exception Note:        EXC_CORPSE_NOTIFY
>     >
>     > Termination Signal:    Bus error: 10
>     > Termination Reason:    Namespace SIGNAL, Code 0xa
>     > Terminating Process:   exc handler [0]
>     >
>     > VM Regions Near 0x10648800a:
>     >
>     > __LINKEDIT             0000000106464000-000000010647f000 [  108K] r--/rwx SM=COW
>     >  ^Z^C [/usr/lib/dyld]
>     > --> mapped file            000000010647f000-000000011647f000 [256.0M] r--/rwx
>     > SM=PRV  Object_id=9034edd9
>     > STACK GUARD            000070000f2b3000-000070000f2b4000 [    4K] ---/rwx SM=NUL
>     >  stack guard for thread 1
>     >
>     > Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
>     > 0   myprog                     0x0000000101666756 mdb_page_search_root + 39
>     > 1   myprog                     0x00000001016660f7 mdb_page_search + 182
>     > 2   myprog                     0x00000001016614de mdb_cursor_set + 88
>     > 3   myprog                     0x0000000101661476 mdb_get + 134
>     >
>     >
>     >
>
>
>     --
>       -- Howard Chu
>       CTO, Symas Corp.           http://www.symas.com
>       Director, Highland Sun     http://highlandsun.com/hyc/
>       Chief Architect, OpenLDAP  http://www.openldap.org/project/
>


-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/