Issue 8505 - LMDB vs. fork()
Summary: LMDB vs. fork()
Status: VERIFIED FIXED
Alias: None
Product: LMDB
Classification: Unclassified
Component: liblmdb (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-27 05:00 UTC by Hallvard Furuseth
Modified: 2020-03-12 15:56 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Hallvard Furuseth 2016-09-27 05:00:06 UTC
Full_Name: Hallvard B Furuseth
Version: LMDB_0.9.18
OS: Linux
URL: 
Submission from: (NULL) (81.191.45.31)
Submitted by: hallvard


This is from openldap-tecnical thread "Using LMDB safely with fork()
and exec()" by Lorenz Bauer:
<http://www.openldap.org/lists/openldap-technical/201608/msg00062.html>

====

The doc is too restrictive, it implicitly suggests an LMDB process
can't fork another LDMB process (since the old LMDB would still
be open).  We should document the use of FD_CLOEXEC etc.

====

This breaks LMDB:
    if (fork() == 0) pthread_exit(NULL);
because the child's me_txkey destructor releases the reader slot.
This can happen if a single-threaded LMDB process forks a multi-
threaded child, which is one way to deal with threads vs. fork().

Simplest fix: Check if (reader->mr_pid == getpid()) in reader_dest().

We could instead use pthread_atfork(), but there is not way to remove
an atfork handler.  It seems to be implementation-dependent what
happens with an atfork handler if liblmdb is dynamically unloaded.

====

LMDB leaks file descriptors to a fork() child, since the user may
not call mdb_env_close() and there is no way to get at some of the
descriptors.

Partial fix: Set FD_CLOEXEC for me_mfd and the mdb_env_copy() FD.

Should I code up an mdb_env_child_cleanup() function which may be
called in the child if it does not fork()?  That makes sense either
if the parent is single-threaded and forks a more complex child, or
if the child is fairly simple.

====

Branch "mdb/fopen2" in my repo will soon include suggested fixes,
except mdb_env_child_cleanup().  (Fixed version of older "mdb/fopen")
Comment 1 lmb@cloudflare.com 2016-09-28 19:29:23 UTC
I think your proposal is going beyond what I was trying to do.

From the documentation, it sounds like a process using LMDB can not
safely fork, under any circumstances. My idea was to specialise this
to saying that fork() followed immediately by exec() is safe, but that
a plain fork() is still not supported. If a child wants to use the
same db, it goes through the regular mdb_env_open sequence, rather
than somehow inheriting the environment across the fork. I think that
should address your pthread_exit concern, as well?

On your point that a user may not call mdb_env_close before the fork:
the user might not want to, since s/he still wants to use the env in
the original process.

Comment 2 Hallvard Furuseth 2016-09-29 10:39:41 UTC
On 28/09/16 21:29, lmb@cloudflare.com wrote:
> I think your proposal is going beyond what I was trying to do.

Yes. Your concern seems to reduce to a doc bug and an otherwise- 
harmless file descriptor leak which it is too late to fix
completely. So I ended up thinking of resource leaks in general.

> From the documentation, it sounds like a process using LMDB can not
> safely fork, under any circumstances. My idea was to specialise this
> to saying that fork() followed immediately by exec() is safe,

Yes.  That doc bug should be fixed.

> but that
> a plain fork() is still not supported.  If a child wants to use the
> same db, it goes through the regular mdb_env_open sequence, rather
> than somehow inheriting the environment across the fork.  I think that
> should address your pthread_exit concern, as well?

It'd be pretty intrusive of a _library_ to forbid the user to
fork() at all without exec().  A _program_ could specify that.
So I prefer to protect the parent from pthread_exit in the child.
(Note, this has nothing to do with file descriptor leaks).

> On your point that a user may not call mdb_env_close before the fork:
> the user might not want to, since s/he still wants to use the env in
> the original process.

No, I mean a partial env_close to be called _after_ the fork, in the
child.  A child which does not want FD-leaks must in any case do a
close() loop or close the mdb_env_get_fd() descriptor. So maybe we
just as well should give him a more thorough "free resources in child"
call which cleans up a bit for exec()-less programs as well.
(It'll need an argument which says just what is safe to clean up -
e.g. it can't do much much if the parent is already multi-threaded.)

Comment 3 lmb@cloudflare.com 2016-09-29 16:27:40 UTC
On 29 September 2016 at 03:39, Hallvard Breien Furuseth
<h.b.furuseth@usit.uio.no> wrote:

> Yes. Your concern seems to reduce to a doc bug and an
> otherwise- harmless file descriptor leak which it is too late to fix
> completely. So I ended up thinking of resource leaks in general.

Why is it harmless in your opinion?

> It'd be pretty intrusive of a _library_ to forbid the user to
> fork() at all without exec().

From my point of view, that is what the LMDB doc says. I can't use
LMDB and fork(). We're probably approaching this from different angles
though: I want to take something away that is de-facto possible, but
was not documented to be that way.

> No, I mean a partial env_close to be called _after_ the fork, in the
> child.  A child which does not want FD-leaks must in any case do a
> close() loop or close the mdb_env_get_fd() descriptor. So maybe we
> just as well should give him a more thorough "free resources in child"
> call which cleans up a bit for exec()-less programs as well.
> (It'll need an argument which says just what is safe to clean up -
> e.g. it can't do much much if the parent is already multi-threaded.)

Sorry, I misunderstood you here.

If I understand you correctly, LMDB would be safe to use in these
scenarios in an ideal world:

1. mdb_env_open(), fork(), exec() (without fd leaks I'd argue)
2. mdb_env_open(), pthread_create(), fork()
3. mdb_env_open(), fork()

As far as I understand you are after 2 and 3, while I want 1. Case 2
seems unlikely, given that forking a multi-threaded program is so hard
that is rarely makes sense [1]. Case 3 is simply a matter of
reordering the calls.

1: http://thorstenball.com/blog/2014/10/13/why-threads-cant-fork/

Comment 4 Hallvard Furuseth 2016-09-29 18:12:57 UTC
On 29. sep. 2016 18:27, lmb@cloudflare.com wrote:
> <h.b.furuseth@usit.uio.no> wrote:
>
>> Yes. Your concern seems to reduce to a doc bug and an
>> otherwise-harmless file descriptor leak which it is too late to fix
>> completely. So I ended up thinking of resource leaks in general.
>
> Why is it harmless in your opinion?

Because I wrote that documentation, and now I know what it gets wrong:-)

Only the lockfile descriptor is dangerous.  Therefore it already had
FD_CLOEXEC.  The other FD leaks are just leaks, they can't break LMDB.

>> It'd be pretty intrusive of a _library_ to forbid the user to
>> fork() at all without exec().
>
> From my point of view, that is what the LMDB doc says.

Yes.  And that doc is half wrong, so I'll fix it.  And it's half
right with pthread_exit(), so I want to fix LMDB about that.

> If I understand you correctly, LMDB would be safe to use in these
> scenarios in an ideal world:
>
> 1. mdb_env_open(), fork(), exec() (without fd leaks I'd argue)
> 2. mdb_env_open(), pthread_create(), fork()
> 3. mdb_env_open(), fork()
>
> As far as I understand you are after 2 and 3, while I want 1.

I'm after all three.

(1) is already safe.  The mdb_env_get_fd() descriptor will leak
since otherwise programs using it could break.  But you can close()
it yourself in the child.  Well, or mdb_env_open() could take an
MDB_CLOEXEC flag which says to set FD_CLOEXEC on that FD too.

And since the child may need to do something to avoid leaks anyway,
I figure we might as well provide a cleanup function do do it
properly.  So...

> Case 2
> seems unlikely, given that forking a multi-threaded program is so hard
> that is rarely makes sense [1].

It's very limited what you can do in the child, yes.  But sometimes
if you have do then you have to.  E.g. if the exec()ed program may
not exist, and the child has do some simple task as a fallback.
Or if you're not free to rewrite the main program, only some module.

Case 2 will at least have memory leaks, since free() in the child
is unsafe.  But we can get rid of some OS resources (FDs, memory map,
maybe semaphores, need to check about that.)

> Case 3 is simply a matter of reordering the calls.

Only if the program logic allows it.  Cumbersome e.g. if the LMDB
database contains the config controlling the child.

-- 
Hallvard

Comment 5 Hallvard Furuseth 2016-09-29 19:47:45 UTC
I wrote:
>On 29. sep. 2016 18:27, lmb@cloudflare.com wrote:
>> If I understand you correctly, LMDB would be safe to use in these
>> scenarios in an ideal world:
>>
>> 1. mdb_env_open(), fork(), exec() (without fd leaks I'd argue)
>> 2. mdb_env_open(), pthread_create(), fork()
>> 3. mdb_env_open(), fork()
>>
>> As far as I understand you are after 2 and 3, while I want 1.
>
> I'm after all three.

...if the user calls the cleanup function.  But:

Whoops, correction: If you mean using LMDB after fork(), (2)
can't be safe unless the user makes it safe with pthread_atfork().
But at least it can clean up some of the stuff (3) also does,
and he'll have the pthread_atfork option if he really needs it.

Comment 6 Hallvard Furuseth 2016-10-05 05:07:33 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 7 lmb@cloudflare.com 2016-10-13 15:29:05 UTC
I obviously agree that fixing the problems you mention is worthwhile, it
just so far doesn't really get me what I need: safe fork() + exec().

Well, or mdb_env_open() could take an
> MDB_CLOEXEC flag which says to set FD_CLOEXEC on that FD too.
>

If I could get that I'd move off my soapbox immediately. I can contribute a
patch if that sounds reasonable?

-- 
Lorenz Bauer  |  Systems Engineer
25 Lavington St., London SE1 0NZ

www.cloudflare.com
Comment 8 OpenLDAP project 2018-02-09 18:58:54 UTC
Fixed in mdb.master
fixed in 0.9.19
Comment 9 Quanah Gibson-Mount 2018-02-09 18:58:54 UTC
changed notes
changed state Test to Closed