
Re: LMDB dead process detection



Howard Chu writes:
> There's been a long-running discussion about the need to have APIs in
> liblmdb for displaying the reader table and clearing out stale slots.
> Quite a few open questions on the topic:
> (...)
> 3) What approach should be used for automatic detection of stale slots?
>
>     Currently we record the process ID and thread ID of a reader in
> the table.  It's not clear to me that the thread ID has anything more
> than informational value. Since we register a per-thread destructor
> for slots, exiting threads should never be leaving stale slots in the
> first place.

Unless the thread is killed with TerminateThread() on Windows. The doc
has a bunch of dire warnings about that, but I suspect real life may
differ from Microsoft's recommendations.

> I'm also not sure that there are good APIs for an outside
> caller to determine the liveness of a given thread ID.

As far as I can tell: Windows has thread IDs and handles for this.
POSIX does not provide a way for outside callers to get at threads,
either to kill them or to examine them.  Individual OSes may, but then
they likely provide both.  E.g. Linux clone() can create a thread, and
tgkill() can kill it.  These calls use a different ID from the POSIX
thread ID.  I hope we don't want to know...
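
For illustration only, here is a minimal Linux-specific sketch of what
such a check could look like.  The helper name linux_thread_exists() is
hypothetical and not part of liblmdb.  It assumes the caller already
knows the kernel thread ID (not the pthread_t), and like process IDs,
kernel thread IDs can be reused, so a positive answer is only a hint.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (Linux only): returns 1 if kernel thread `tid`
 * exists in process `tgid`, 0 if it does not, -1 on any other error.
 * Signal 0 delivers nothing; the kernel only performs the existence
 * and permission checks. */
static int linux_thread_exists(pid_t tgid, pid_t tid)
{
    assert(tgid > 0 && tid > 0);
    if (syscall(SYS_tgkill, tgid, tid, 0) == 0)
        return 1;
    if (errno == EPERM)      /* exists, but we may not signal it */
        return 1;
    return errno == ESRCH ? 0 : -1;
}
```

The main thread's kernel thread ID equals the process ID, so
linux_thread_exists(getpid(), getpid()) is a simple self-check.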

>     The process ID is also prone to wraparound; it's still very common
> for Linux systems to use 15 bit process IDs. (...)
>
>     A) set a byte range lock for every process attached to the
> environment.
> (...)
>        c) This approach won't tell us if a process is in Zombie state.

Misplaced (c).  This is the approach which does work portably for
Zombies, at least on Unix.  And as we've discussed, on at least some
OSes, approach (B) below can also check for zombies, but it may take
more time.
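
A minimal sketch of approach (A), to show why zombies are not a
problem for it.  The helper names and the choice of the process ID as
the lock offset are assumptions for illustration, not the liblmdb
design.  Two caveats: F_GETLK never reports locks held by the calling
process itself, and a process loses its fcntl() locks on a file as soon
as it closes any descriptor for that file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill in a one-byte write lock description at offset `pid`. */
static void pid_lock_init(struct flock *fl, pid_t pid)
{
    struct flock zero = {0};
    assert(pid > 0);
    *fl = zero;
    fl->l_type = F_WRLCK;
    fl->l_whence = SEEK_SET;
    fl->l_start = (off_t)pid;
    fl->l_len = 1;
}

/* Hypothetical helper: lock "our" byte.  0 on success, -1 on error. */
static int reader_lock_pid(int fd, pid_t pid)
{
    struct flock fl;
    pid_lock_init(&fl, pid);
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: 1 if another process holds the byte for `pid`,
 * 0 if nobody does, -1 on error. */
static int reader_pid_locked(int fd, pid_t pid)
{
    struct flock fl;
    pid_lock_init(&fl, pid);
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    return fl.l_type != F_UNLCK;
}

/* Demo: returns 0 if the child's byte reads as locked while the child
 * runs and as unlocked once the child is an unreaped zombie. */
static int demo_zombie_detection(void)
{
    char path[] = "/tmp/lockdemoXXXXXX";
    int fd = mkstemp(path), pfd[2], before, after;
    char c = 0;
    siginfo_t info;
    pid_t child;

    if (fd == -1 || pipe(pfd) == -1)
        return -1;
    unlink(path);
    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        close(pfd[0]);
        if (reader_lock_pid(fd, getpid()) != 0)
            _exit(1);
        if (write(pfd[1], "x", 1) != 1)   /* tell parent the lock is set */
            _exit(1);
        for (;;)
            pause();
    }
    close(pfd[1]);
    if (read(pfd[0], &c, 1) != 1) {
        waitpid(child, NULL, 0);
        return -1;
    }
    before = reader_pid_locked(fd, child);
    kill(child, SIGKILL);
    /* WNOWAIT waits for the exit but leaves the child as a zombie. */
    if (waitid(P_PID, (id_t)child, &info, WEXITED | WNOWAIT) == -1)
        return -1;
    after = reader_pid_locked(fd, child);
    waitpid(child, NULL, 0);              /* now reap the zombie */
    close(pfd[0]);
    close(fd);
    return (before == 1 && after == 0) ? 0 : -1;
}
```

The kernel releases the lock while tearing the process down, before
the parent reaps it, which is why the second check sees the byte as
free even though the process ID is still occupied by the zombie.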

>     B) check process ID and process start time.
> This appears to be a fairly reliable approach, and reasonably fast,
> but there is no POSIX standard API for obtaining this process
> information.
> (...)
>
> We can implement approach (A) fairly easily, with no major
> repercussions.  For (B) we would need to add a field to the reader
> table records to store the process start time. (Thus a lockfile format
> change.)

We need to change the lockfile version anyway.  Otherwise a process
using the current MDB version and a process using either of these
approaches could sabotage each other.
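
To make approach (B) concrete, here is a rough Linux-only sketch.  It
assumes /proc is mounted and that the reader table would store the
start time read at registration.  The names proc_start_time() and
slot_is_stale() are hypothetical.  Other OSes would need their own
way to obtain the start time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (Linux only): read field 22 (starttime, in clock
 * ticks since boot) from /proc/<pid>/stat.  Returns 0 on success, -1 if
 * the file cannot be read or parsed (e.g. no such process). */
static int proc_start_time(pid_t pid, unsigned long long *start)
{
    char path[64], buf[1024], *p;
    FILE *f;
    size_t n;
    int field;

    assert(pid > 0);
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (comm) may itself contain spaces or ')', so resume
     * parsing after the last ')' in the line. */
    p = strrchr(buf, ')');
    if (!p || p[1] != ' ')
        return -1;
    p++;                          /* at the space before field 3 */
    for (field = 3; field < 22; field++) {
        p = strchr(p + 1, ' ');   /* move to the space before field+1 */
        if (!p)
            return -1;
    }
    return sscanf(p, "%llu", start) == 1 ? 0 : -1;
}

/* Hypothetical check: a slot is stale if its process is gone, or if a
 * process with the same ID exists but started at a different time. */
static int slot_is_stale(pid_t pid, unsigned long long recorded_start)
{
    unsigned long long now;
    if (proc_start_time(pid, &now) != 0)
        return 1;
    return now != recorded_start;
}
```

Comparing the start time as well as the process ID is what guards
against process ID wraparound: a recycled ID will almost certainly
belong to a process with a different start time.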

-- 
Hallvard