
Re: (ITS#8505) LMDB vs. fork()



On 29. sep. 2016 18:27, lmb@cloudflare.com wrote:
> <h.b.furuseth@usit.uio.no> wrote:
>
>> Yes. Your concern seems to reduce to a doc bug and an
>> otherwise-harmless file descriptor leak which it is too late to fix
>> completely. So I ended up thinking of resource leaks in general.
>
> Why is it harmless in your opinion?

Because I wrote that documentation, and now I know what it gets wrong:-)

Only the lockfile descriptor is dangerous.  Therefore it already has
FD_CLOEXEC.  The other FD leaks are just leaks; they can't break LMDB.

>> It'd be pretty intrusive of a _library_ to forbid the user to
>> fork() at all without exec().
>
> From my point of view, that is what the LMDB doc says.

Yes.  And that doc is half wrong, so I'll fix it.  And it's half
right with pthread_exit(), so I want to fix LMDB about that.

> If I understand you correctly, LMDB would be safe to use in these
> scenarios in an ideal world:
>
> 1. mdb_env_open(), fork(), exec() (without fd leaks I'd argue)
> 2. mdb_env_open(), pthread_create(), fork()
> 3. mdb_env_open(), fork()
>
> As far as I understand you are after 2 and 3, while I want 1.

I'm after all three.

(1) is already safe.  The mdb_env_get_fd() descriptor will leak
since otherwise programs using it could break.  But you can close()
it yourself in the child.  Well, or mdb_env_open() could take an
MDB_CLOEXEC flag which says to set FD_CLOEXEC on that FD too.
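
Until something like that exists, a caller can set the flag itself.
A minimal sketch, assuming POSIX, where the handle from
mdb_env_get_fd(env, &fd) is a plain int file descriptor.  The helper
names set_cloexec(), has_cloexec() and cloexec_demo() are made up for
illustration and are not part of LMDB; the demo uses /dev/null as a
stand-in so that it compiles without lmdb.h.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Set FD_CLOEXEC on fd while preserving its other descriptor flags.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
static int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags == -1 ? -1 : (flags & FD_CLOEXEC) != 0;
}

/* Self-check against /dev/null.  A real caller would instead pass the
 * descriptor obtained from mdb_env_get_fd(env, &fd) (assumption: the
 * handle type is int, as on POSIX systems). */
static int cloexec_demo(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = set_cloexec(fd);
    assert(rc != 0 || has_cloexec(fd) == 1);
    close(fd);
    return rc;
}
```

Note that this is not atomic with respect to other threads: a fork()
and exec() in another thread between mdb_env_open() and set_cloexec()
would still leak the descriptor.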

And since the child may need to do something to avoid leaks anyway,
I figure we might as well provide a cleanup function to do it
properly.  So...

> Case 2
> seems unlikely, given that forking a multi-threaded program is so hard
> that it rarely makes sense [1].

It's very limited what you can do in the child, yes.  But sometimes
if you have to then you have to.  E.g. if the exec()ed program may
not exist, and the child has to do some simple task as a fallback.
Or if you're not free to rewrite the main program, only some module.
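
The exec-with-fallback case might look roughly like the sketch below.
It is only an illustration, not LMDB code: run_or_fallback() and its
parameters are hypothetical names, and the "simple task" is reduced to
writing a message and exiting with a chosen status.  The child sticks
to async-signal-safe calls (execv, write, _exit), which is what a child
forked from a multi-threaded parent is limited to.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and try to exec `path`.  If the exec fails (e.g. the program
 * does not exist), the child runs a minimal fallback and exits with
 * `fallback_status`.  Returns the child's exit status, or -1 on error
 * or if the child did not exit normally. */
static int run_or_fallback(const char *path, int fallback_status)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: only async-signal-safe functions from here on. */
        char *const argv[] = { (char *)path, NULL };
        execv(path, argv);

        /* Only reached if execv() failed. */
        static const char msg[] = "exec failed, running fallback\n";
        if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
            /* Nothing useful to do about a failed write here. */
        }
        _exit(fallback_status);
    }

    /* Parent: wait for the child, retrying if interrupted. */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit() rather than exit() so that atexit handlers and
stdio buffers inherited from the parent are not run or flushed twice.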

Case 2 will at least have memory leaks, since free() in the child
is unsafe.  But we can get rid of some OS resources (FDs, memory map,
maybe semaphores, need to check about that.)

> Case 3 is simply a matter of reordering the calls.

Only if the program logic allows it.  Cumbersome e.g. if the LMDB
database contains the config controlling the child.

-- 
Hallvard