Full_Name: David Wilson
Version: LMDB 0.9.11
OS:
URL: https://raw.githubusercontent.com/dw/py-lmdb/master/misc/readers_mrb_env.patch
Submission from: (NULL) (178.238.153.20)


Currently, if a user (wilfully or accidentally, say, through composition of
third-party libs) opens the same LMDB environment twice within a process, and
maintains active read transactions at the time one mdb_env_close() is called,
all reader slots will be deallocated in all environments, due to the logic
around line 4253 that unconditionally clears reader slots based on PID.

I'd like to avoid this in py-lmdb, and there seem to be a few alternatives
for fixing it:

1) Tell users not to do that (doc fix; nobody reads docs).

2) Externally maintain a set of (st_dev, st_ino) keys representing already
   opened environments, and check any attempt to open an environment against
   this list, failing if data.mdb's key already appears.

3) Disable the mdb_env_close() cleanup loop when MDB_NOTLS is enabled. Since
   no reader slot will ever exist in TLS in this mode, the existing
   mdb_env_close() doc ("All transactions, databases, and cursors must
   already be closed before calling this function.") ensures no readers
   exist at close time, and the loop need not run.

4) Modify lock.mdb to include the MDB_env* address within the process, and
   update mdb_env_close() to invalidate only readers associated with the
   environment being closed. I dislike using the MDB_env* as an opaque,
   publicly visible cookie, but perhaps this field might be reused for other
   things in future.

Option 3 lets the user wilfully mix py-lmdb Environment objects within the
process, since py-lmdb always enables NOTLS, but it does not fix the case
where the user is integrating a Python application with a C library with
some opaque interface, and both the application and the library independently
need to access the environment.
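For concreteness, option 2 could be sketched roughly as below. This is a
standalone illustration, not proposed library code; the env_registry_* names
and the fixed-size table are invented for the sketch:

```c
/* Sketch of option 2: a process-wide registry of (st_dev, st_ino) pairs
 * for already-open environments. env_registry_add() would be called from
 * the wrapper before opening an environment, and a matching remove on
 * close. Names here are hypothetical, not part of any LMDB API. */
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>

#define MAX_ENVS 64

struct env_key { dev_t dev; ino_t ino; int used; };
static struct env_key env_registry[MAX_ENVS];

/* Returns 0 on success, EEXIST if data.mdb is already registered in this
 * process, or another errno value on failure. */
static int env_registry_add(const char *datapath)
{
    struct stat st;
    int i, free_slot = -1;

    if (stat(datapath, &st) < 0)
        return errno;
    for (i = 0; i < MAX_ENVS; i++) {
        if (env_registry[i].used) {
            if (env_registry[i].dev == st.st_dev &&
                env_registry[i].ino == st.st_ino)
                return EEXIST;   /* same env already open in this process */
        } else if (free_slot < 0) {
            free_slot = i;
        }
    }
    if (free_slot < 0)
        return ENOMEM;           /* registry full */
    env_registry[free_slot].dev = st.st_dev;
    env_registry[free_slot].ino = st.st_ino;
    env_registry[free_slot].used = 1;
    return 0;
}
```

The (st_dev, st_ino) pair rather than the path is what makes this robust
against the same file being reached via symlinks or different relative paths.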
This kind of scenario regularly crops up in huge software projects; the user
may not even be aware the library uses LMDB internally. In that case, the
user is again exposed to having their reader slots silently deallocated
through no fault of their own.

Option 4 is the least aesthetic, but has all the benefits of option 3 in
addition to preventing the "accidental integration" scenario. I've attached
a patch for it (against 0.9.11), although I completely understand if it does
not get applied. :)

One alternative would be, instead of using the MDB_env* address, using a
monotonically increasing counter based on a static global, say "mrb_token",
but this might break in the case where two copies of LMDB were somehow
linked into the application. (It's easy to argue the user is an idiot in
this case ;)


David
It's worth mentioning that this is not a theoretical issue; I've already hit
it while writing unit tests.
dw@botanicus.net wrote:
> Full_Name: David Wilson
> Version: LMDB 0.9.11
> OS:
> URL: https://raw.githubusercontent.com/dw/py-lmdb/master/misc/readers_mrb_env.patch
> Submission from: (NULL) (178.238.153.20)
>
> Currently if a user (wilfully or accidentally, say, through composition of
> third party libs) opens the same LMDB environment twice within a process, and
> maintains active read transactions at the time one mdb_env_close() is called,
> all reader slots will be deallocated in all environments due to the logic
> around line 4253 that unconditionally clears reader slots based on PID.
>
> I'd like to avoid this in py-lmdb, and seems there are a few alternatives to
> fix it:
>
> 1) Tell users not to do that (doc fix, nobody reads docs)

This is already in the doc - don't open the same env twice within one process.

> 2) Externally maintain a set of (st_dev, st_ino) keys representing already
> opened environments, and check any attempt to open an environment against this
> list, failing if data.mdb's key already appears.

That would require global library state, which is definitely unwanted.

> 3) Disable the mdb_env_close() cleanup loop when MDB_NOTLS is enabled. Since no
> reader slot will ever exist in TLS in this mode, the existing mdb_env_close()
> doc "All transactions, databases, and cursors must already be closed before
> calling this function." ensures no readers exist at close time, and the loop
> need not run.

Could do this, but since it's an incomplete solution, not sure there's any point.

> 4) Modify lock.mdb to include MDB_env* address within the process, and update
> mdb_env_close() to invalidate only readers associated with the environment
> being closed. I dislike using the MDB_env* as an opaque publicly visible
> cookie, but perhaps storing this field might be reused for other things in
> future.

Hmm. I guess this could work. I think storing the MDB_env* is harmless, since
the pointer is meaningless to any other process.
> Option 3 lets the user wilfully mix py-lmdb Environment objects within the
> process, since py-lmdb always enables NOTLS, but it does not fix the case where
> the user is integrating a Python application with a C library with some opaque
> interface, and both the application and the library independently have need to
> access the environment.

Allowing multiple independent libraries to open the same environment would
consume redundant chunks of address space. For large enough DBs this may
simply fail. Unfortunately, the semantics of fcntl locks prevent us from even
detecting that this problem has occurred.

> This kind of scenario regularly crops up in huge software projects, the user
> may not even be aware the library uses LMDB internally. In that case, the user
> is again exposed to having their reader slots silently become deallocated
> through no fault of their own.
>
> Option 4 is the least aesthetic, but has all the benefits of option 3 in
> addition to preventing the "accidental integration" scenario. I've attached a
> patch for it (against 0.9.11), although I completely understand if it does not
> get applied. :)
>
> One alternative would be instead of using MDB_env* address, using a
> monotonically increasing counter based on a static global, say, "mrb_token",
> but this might break in the case two copies of LMDB were somehow linked in the
> application. (It's easy to argue the user is an idiot in this case ;)

Global state is verboten.

> David

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
On 05/14/2014 01:18 AM, hyc@symas.com wrote:
> dw@botanicus.net wrote:
>> Currently if a user (wilfully or accidentally, say, through composition of
>> third party libs) opens the same LMDB environment twice within a process, and
>> maintains active read transactions at the time one mdb_env_close() is called,
>> all reader slots will be deallocated in all environments due to the logic
>> around line 4253 that unconditionally clears reader slots based on PID.
>>
>> I'd like to avoid this in py-lmdb, and seems there are a few alternatives to
>> fix it:
>>
>> 1) Tell users not to do that (doc fix, nobody reads docs)
>
> This is already in the doc - don't open the same env twice within one process.

1.5) Use something other than fcntl locks on the lock-file descriptor to
control whether to re-init the lockfile and its mutexes/semaphores. Like
MDB_NOLOCK. This is needed to get any of the other suggestions off the
ground; see Caveats in lmdb.h.

Once two MDB_envs in a process have the lockfile open, both lose their fcntl
locks when either closes it - and the next mdb_env_open() will reset the
lockfile. Then both readers and writers are screwed, and more care with
reader slots won't help.

Well, one MDB_env could leak its lockfile descriptor instead of closing it,
or it could give the descriptor to the other MDB_env for a deferred close, if
it could find it using some global state. (That could work until someone
wrote an open-close loop which would run into the max-open-FD limit.) Either
way, no fun. Another way would be #ifdef ... more unportable code ...
#endif :-( Surely some OSes have come up with a sane way to address this
problem; there's just no portable way.

>> 2) Externally maintain a set of (st_dev, st_ino) keys representing already
>> opened environments, and check any attempt to open an environment against this
>> list, failing if data.mdb's key already appears.
>
> That would require global library state, which is definitely unwanted.
Though in one sense we already have that, which is the problem: global
(well, per-process) lock state on open files and mutexes.

>> 3) Disable the mdb_env_close() cleanup loop when MDB_NOTLS is enabled. Since no
>> reader slot will ever exist in TLS in this mode, the existing mdb_env_close()
>> doc "All transactions, databases, and cursors must already be closed before
>> calling this function." ensures no readers exist at close time, and the loop
>> need not run.
>
> Could do this, but since it's an incomplete solution, not sure there's any point.

That's MDB_NOLOCK anyway. MDB_NOTLS does have reader table entries.

> (...)

-- 
Hallvard
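For anyone unfamiliar with the fcntl() wart being discussed: the lock loss
can be demonstrated in isolation, with no LMDB involved. POSIX says a
process's record locks on a file are all released when *any* descriptor for
that file is closed by that process. A standalone sketch (demo() and
parent_holds_lock() are illustrative names, not library code):

```c
/* Demonstrates fcntl() record-lock loss on close of an unrelated
 * descriptor for the same file. */
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Fork a child to probe the lock; a process cannot conflict with its own
 * locks, so the query must come from another process. Returns 1 if the
 * parent still holds a write lock on the file, else 0. */
static int parent_holds_lock(const char *path)
{
    int status;
    pid_t pid = fork();
    if (pid == 0) {
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        int fd = open(path, O_RDWR);
        fcntl(fd, F_GETLK, &fl);   /* query, don't take, the lock */
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    waitpid(pid, &status, 0);
    return WEXITSTATUS(status);
}

/* Returns 1 if the surprising behavior occurred: the lock taken via fd1
 * vanished when the unrelated fd2 was closed. */
static int demo(const char *path)
{
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    int before, after;
    int fd1 = open(path, O_RDWR | O_CREAT, 0644);
    int fd2 = open(path, O_RDWR);   /* second descriptor, same file */

    fcntl(fd1, F_SETLK, &fl);           /* lock via fd1 */
    before = parent_holds_lock(path);   /* lock visible */
    close(fd2);                         /* close the *other* descriptor... */
    after = parent_holds_lock(path);    /* ...lock silently gone */
    close(fd1);
    return before == 1 && after == 0;
}
```

This is exactly why one MDB_env closing the lockfile strips the other
MDB_env's locks in the same process.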
(In reply to dw@hmmz.org from comment #0)
> Full_Name: David Wilson
> Version: LMDB 0.9.11
> OS:
> URL: https://raw.githubusercontent.com/dw/py-lmdb/master/misc/readers_mrb_env.patch
> Submission from: (NULL) (178.238.153.20)
>
> Currently if a user (wilfully or accidentally, say, through composition of
> third party libs) opens the same LMDB environment twice within a process, and
> maintains active read transactions at the time one mdb_env_close() is called,
> all reader slots will be deallocated in all environments due to the logic
> around line 4253 that unconditionally clears reader slots based on PID.
>
> I'd like to avoid this in py-lmdb, and seems there are a few alternatives to
> fix it:
>
> 4) Modify lock.mdb to include MDB_env* address within the process, and update
> mdb_env_close() to invalidate only readers associated with the environment
> being closed. I dislike using the MDB_env* as an opaque publicly visible
> cookie, but perhaps storing this field might be reused for other things in
> future.

I was looking at this approach, which initially seems harmless. But
unfortunately it does nothing to address the fact that the fcntl lock on the
lock file will still go away on the first env_close() call. If there are no
other processes with the env open, then the next process that opens the env
will re-init the lock file, because it doesn't know about the existing
process. At that point the existing process is hosed anyway.

So the question is: if a process has made the mistake of opening the same
env twice, what is the anticipated lifetime of each env instance? If the
code was written according to doc guidelines, then both env instances should
only be closed after all processing is done and just before program exit. In
that case it doesn't really matter what happens to the reader table or the
fcntl lock. But if the two instances have drastically different lifetimes,
then all bets are off.
(In reply to Howard Chu from comment #4)
> (In reply to dw@hmmz.org from comment #0)
> > Currently if a user (wilfully or accidentally, say, through composition of
> > third party libs) opens the same LMDB environment twice within a process, and
> > maintains active read transactions at the time one mdb_env_close() is called,
> > all reader slots will be deallocated in all environments due to the logic
> > around line 4253 that unconditionally clears reader slots based on PID.
> >
> > 4) Modify lock.mdb to include MDB_env* address within the process, and update
> > mdb_env_close() to invalidate only readers associated with the environment
> > being closed. I dislike using the MDB_env* as an opaque publicly visible
> > cookie, but perhaps storing this field might be reused for other things in
> > future.
>
> I was looking at this approach, which initially seems harmless. But
> unfortunately it does nothing to address the fact the fcntl lock on the
> lock file will still go away on the first env_close() call. If there are
> no other processes with the env open, then the next process that opens
> the env will re-init the lock file, because it doesn't know about the
> existing process. At that point the existing process is hosed anyway.

Another possibility here is to detect that there were multiple envs for the
same PID, and not close the descriptors in that case. This means
intentionally leaking a file descriptor, but it keeps the locks intact.
Still ugly, but not as bad as inviting corruption. And this would only
happen due to library misuse, so it would still be a rare situation.
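A sketch of that idea, combined with option 4's per-env tag in the reader
table. The struct and function names here only loosely mirror lmdb.c's
internals and are assumptions for illustration, not the actual patch:

```c
/* Sketch: before mdb_env_close() closes the lock-file descriptor, scan
 * the reader table; if another MDB_env in this same process still owns
 * live slots, intentionally leak the descriptor so the process's fcntl
 * locks survive. */
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical reader-table slot: PID plus the MDB_env* tag of option 4. */
struct reader_slot { pid_t mr_pid; void *mr_env; };

/* Returns 1 if some other env in this process still holds a reader slot. */
static int other_envs_alive(const struct reader_slot *readers, int nreaders,
                            pid_t self, const void *closing_env)
{
    int i;
    for (i = 0; i < nreaders; i++)
        if (readers[i].mr_pid == self && readers[i].mr_env != closing_env)
            return 1;
    return 0;
}

static void close_lockfile(int lockfd, const struct reader_slot *readers,
                           int nreaders, const void *closing_env)
{
    if (other_envs_alive(readers, nreaders, getpid(), closing_env))
        return;          /* deliberately leak lockfd; locks stay intact */
    close(lockfd);
}
```

The leak is bounded by how many times the application misuses the library,
which is why it may be an acceptable trade against silent lock loss.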