8181 – LMDB page leaks etc when treating DBs as data

Issue 8181 - LMDB page leaks etc when treating DBs as data

Summary: LMDB page leaks etc when treating DBs as data

Status:	VERIFIED FIXED

Alias:	None

Product:	LMDB
Classification:	Unclassified
Component:	liblmdb (show other issues)
Version:	unspecified
Hardware:	All All

Importance:	--- normal
Target Milestone:	---
Assignee:	OpenLDAP project

URL:
Keywords:

Depends on:
Blocks:

Reported:	2015-06-29 18:44 UTC by Hallvard Furuseth
Modified:	2020-03-12 15:55 UTC (History)
CC List:	0 users

See Also:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this issue.

Description Hallvard Furuseth 2015-06-29 18:44:42 UTC

Full_Name: Hallvard B Furuseth
Version: LMDB_0.9.15
OS: 
URL: 
Submission from: (NULL) (81.191.45.5)
Submitted by: hallvard


The user can overwrite or delete a named DB record
with normal data.  This leaks the pages of the DB.
Also the new non-DB can then be treated as a DB.
A fix is coming (from my branch mdb/bundle).

Comment 1 Howard Chu 2015-06-29 21:05:27 UTC

h.b.furuseth@usit.uio.no wrote:
> Full_Name: Hallvard B Furuseth
> Version: LMDB_0.9.15
> OS:
> URL:
> Submission from: (NULL) (81.191.45.5)
> Submitted by: hallvard
>
>
> The user can overwrite or delete a named DB record
> with normal data.  This leaks the pages of the DB.
> Also the new non-DB can then be treated as a DB.
> A fix is coming (from my branch mdb/bundle).

Seriously, why aren't we just saying "don't do this" and moving on? There are 
lots of stupid things you can do with software. It's a waste of time and 
energy to prevent them all.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 Hallvard Furuseth 2015-06-30 12:39:59 UTC

On 29/06/15 23:05, hyc@symas.com wrote:
> Seriously, why aren't we just saying "don't do this" and moving on?

OK, then document it.  Maybe keeping it simple, call
mixing mainDB data and named DBs a user error and warn this
can break the DB.

> There are
> lots of stupid things you can do with software. It's a waste of time and
> energy to prevent them all.

We totally disagree, starting with what is stupid.  But we knew
that.  That's I asked which parts of "mdb/bundle" to push and
which you rejected, we must have misunderstood each other.  I
thought I was just ITSing this commit properly before pushing.

Comment 3 Howard Chu 2015-06-30 12:52:10 UTC

Hallvard Breien Furuseth wrote:
> On 29/06/15 23:05, hyc@symas.com wrote:
>> Seriously, why aren't we just saying "don't do this" and moving on?
>
> OK, then document it.  Maybe keeping it simple, call
> mixing mainDB data and named DBs a user error and warn this
> can break the DB.

We have already documented this. "Database names are kept as keys in the 
unnamed database." It should be obvious that if you muck with the record of an 
existing key, things will change, therefore you should not muck with those keys.

>> There are
>> lots of stupid things you can do with software. It's a waste of time and
>> energy to prevent them all.
>
> We totally disagree, starting with what is stupid.  But we knew
> that.  That's I asked which parts of "mdb/bundle" to push and
> which you rejected, we must have misunderstood each other.  I
> thought I was just ITSing this commit properly before pushing.

I just don't see this happening in practice. Most applications will never use 
subDBs. It's not like some random person is going to come along and open an 
arbitrary LMDB file created by some other application and start mucking with 
it, without knowing the meaning of what's inside. Embedded database libraries 
are used in specific contexts by their enclosing application. They don't just 
get random ops thrown at them from 3rd party code. The designer of the 
enclosing app will design a schema/whatever set of tables to use and that 
scheme will be fixed for the life of the application.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 4 Howard Chu 2015-06-30 13:09:27 UTC

Hallvard Breien Furuseth wrote:
> On 29/06/15 23:05, hyc@symas.com wrote:
>> Seriously, why aren't we just saying "don't do this" and moving on?
>
> OK, then document it.  Maybe keeping it simple, call
> mixing mainDB data and named DBs a user error and warn this
> can break the DB.
>
>> There are
>> lots of stupid things you can do with software. It's a waste of time and
>> energy to prevent them all.
>
> We totally disagree, starting with what is stupid.  But we knew
> that.

Ultimately it's redundant work. I.e., every restriction you place inside the 
code still requires documentation, otherwise all you've done is changed "why 
doesn't this work" into "why doesn't LMDB let me do what I tried to do?" 
Simpler to only write the doc and leave the code pristine.

The library's purpose is to perform its intended uses efficiently. Misuses are 
not our responsibility. Preventing misuse is the responsibility of the 
documentation, it doesn't belong in the code.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 5 Hallvard Furuseth 2015-06-30 13:28:15 UTC

On 30. juni 2015 14:52, hyc@symas.com wrote:
> We have already documented this. "Database names are kept as keys in the
> unnamed database."

"Hey, that may mean I can drop a DB without having to open it
and mess with LMDB's DBI handling!  Let me try this..."

> It should be obvious that if you muck with the record of an
> existing key, things will change, therefore you should not muck with those keys.

Lots of people think differently than you or me.

We're not going to agree.  Not on this, and not on what you said
in your next mail.  I don't see the point of rehashing those
arguments yet again, and I didn't intend to.  I was just ITSing a
commit before pushing, _after_ asking you about it. I thought.

-- 
Hallvard

Comment 6 Hallvard Furuseth 2015-07-12 06:45:10 UTC

changed state Open to Test
moved from Incoming to Software Bugs

Comment 7 Hallvard Furuseth 2015-07-13 20:59:46 UTC

Added the error checks after all, plus documentation,
after chatting about it.

Comment 8 OpenLDAP project 2015-07-28 14:16:19 UTC

fixed in mdb.master, mdb.RE/0.9, master

Comment 9 Howard Chu 2015-07-28 14:16:19 UTC

changed notes

Comment 10 Quanah Gibson-Mount 2015-07-28 16:06:59 UTC

changed state Test to Release

Comment 11 juerg.bircher@gmail.com 2015-07-30 16:01:49 UTC

Hi Howard,

I am new to lmdb. I have been working with lmdb intensively for one month. I really appreciate your great work. Good efficient C code is not always found!
Well I like to follow up on that reported issue.
I am using multiple databases on the same environment. I was a bit confused about your statement that most application use never subDBs? I think it is a great feature that helps to support multiple indexes.
I ran unintentionally into a related problem as I set the compare function for the main db to an integer based one opposite to the literal compare function which is the default. Therefore when opening a database by its name the wrong database might be returned as the integer compare function might think names are equal as only 96 bits (in my function) are compared. So the compare function only compares the prefix of the database names!
Maybe the database meta should be kept in a private space. But I also agree on your statement to keep things simple. I solved the problem by never using the main db so under no circumstances the database meta is corrupted. I think the price paid for having only named databases is very cheap as I open databases at startup and keep the database index (dbi).

Regards
Jürg

Rockethealth by Helmedica AG
Web: www.rockethealth.ch<http://www.rockethealth.ch/>
Jürg Bircher
Chief Technology Officer
Mail: juerg.bircher@helmedica.ch<mailto:christoph.baumann@helmedica.ch>

Comment 12 juerg.bircher@gmail.com 2015-07-30 16:09:30 UTC

Sorry the mail before was not as plain text. Here again...

Hi Howard,

I am new to lmdb. I have been working with lmdb intensively for one month.
I really appreciate your great work. Good efficient C code is not always
found!
Well I like to follow up on that reported issue.
I am using multiple databases on the same environment. I was a bit
confused about your statement that most application use never subDBs? I
think it is a great feature that helps to support multiple indexes.
I ran unintentionally into a related problem as I set the compare function
for the main db to an integer based one opposite to the literal compare
function which is the default. Therefore when opening a database by its
name the wrong database might be returned as the integer compare function
might think names are equal as only 96 bits (in my function) are compared.
So the compare function only compares the prefix of the database names!
Maybe the database meta should be kept in a private space. But I also
agree on your statement to keep things simple. I solved the problem by
never using the main db so under no circumstances the database meta is
corrupted. I think the price paid for having only named databases is very
cheap as I open databases at startup and keep the database index (dbi).

Regards
Jürg

Rockethealth by Helmedica AG
Web: www.rockethealth.ch
Jürg Bircher
Chief Technology Officer
Mail: juerg.bircher@helmedica.ch

Comment 13 Howard Chu 2015-07-30 16:18:16 UTC

juerg.bircher@helmedica.com wrote:
> --_000_D1E06B6B7E2juergbircherhelmedicacom_
> Content-Type: text/plain; charset="iso-8859-1"
> Content-Transfer-Encoding: quoted-printable
>
> Hi Howard,
>
> I am new to lmdb. I have been working with lmdb intensively for one month. =
> I really appreciate your great work. Good efficient C code is not always fo=
> und!
> Well I like to follow up on that reported issue.
> I am using multiple databases on the same environment. I was a bit confused=
>   about your statement that most application use never subDBs? I think it is=
>   a great feature that helps to support multiple indexes.
> I ran unintentionally into a related problem as I set the compare function =
> for the main db to an integer based one opposite to the literal compare fun=
> ction which is the default. Therefore when opening a database by its name t=
> he wrong database might be returned as the integer compare function might t=
> hink names are equal as only 96 bits (in my function) are compared. So the =
> compare function only compares the prefix of the database names!
> Maybe the database meta should be kept in a private space. But I also agree=
>   on your statement to keep things simple. I solved the problem by never usi=
> ng the main db so under no circumstances the database meta is corrupted. I =
> think the price paid for having only named databases is very cheap as I ope=
> n databases at startup and keep the database index (dbi).

Thanks for the feedback. Yes, this is the best practice - if you're using 
named databases, you should not use the main DB. (Or just make sure you don't 
create name collisions.)

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 14 Quanah Gibson-Mount 2015-08-18 17:41:35 UTC

changed state Release to Closed