Issue 7165 - Killing slap tools breaks the running mdb
Summary: Killing slap tools breaks the running mdb
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: backends (show other issues)
Version: 2.4.47
Hardware: All All
: --- normal
Target Milestone: 2.5.13
Assignee: Howard Chu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-15 06:53 UTC by Hallvard Furuseth
Modified: 2022-07-14 21:18 UTC (History)
0 users

See Also:


Attachments
its7165.diff (4.21 KB, patch)
2020-03-20 05:53 UTC, Quanah Gibson-Mount
Details

Note You need to log in before you can comment on or make changes to this issue.
Description Hallvard Furuseth 2012-02-15 06:53:55 UTC
Full_Name: Hallvard B Furuseth
Version: master, d861910f55cb1beb5e443caa7e961ed760074352
OS: Linux x86_64
URL: 
Submission from: (NULL) (195.1.106.125)
Submitted by: hallvard


Aborting an MDB process can break MDB if another MDB process is
running, because then locks/reader table info are not reset.

The problems below go away when the last slap process terminates
so the lockfile can be reset.  Sometimes kill -KILL is needed.


==== lock.conf ====
include servers/slapd/schema/core.schema
database mdb
suffix o=hi
directory lock.dir


(A) If a process holding a shared mutex dies, other processes' ops
on the mutex can hang/fail:

    rm -rf lock.dir; mkdir lock.dir
    servers/slapd/slapd -f lock.conf -h ldapi://ldapi -d0 &
    MDB_KILL_R=true servers/slapd/slapd -Tcat -f lock.conf
    ldapsearch -xLLH ldapi://ldapi -b o=hi l=x 1.1
    ^C (kills ldapsearch which is waiting for results)
    kill %% (fails to kill slapd)
    kill -KILL %%

The Posix fix, robust mutexes, is outlined below.  With _WIN32,
check return code WAIT_ABANDONED.  __APPLE__ Posix semaphores:
Don't know.  SysV semaphores have SEM_UNDO to cancel the process'
effect if the process dies, but the peers do not seem to be
informed that this happened.

if ((rc = pthread_mutexattr_init(      &mattr                        )) ||
    (rc = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED)) ||
    (rc = pthread_mutexattr_setrobust( &mattr, PTHREAD_MUTEX_ROBUST  )) ||
    (rc = pthread_mutex_init(&mutex,   &mattr                        )))
    return rc;
...
switch (pthread_mutex_lock(&mutex)) {
case 0:
    break;
case EOWNERDEAD:
    Repair the possibly half-updated data protected by <mutex>;
    if (successful repair) {
        pthread_mutex_consistent(&mutex);
        break;
    }
    /* With Posix mutexes, make future use of <mutex> return failure: */
    pthread_mutex_unlock(&mutex);
    /* FALLTHRU */
default:
    return MDB_PANIC;
}
...
pthread_mutex_unlock(&mutex);


BTW, the above also happens with MDB_KILL_W=true slapcat followed
by ldapadd, dunno why slapcat uses a txn without MDB_TXN_RDONLY.
In this case, raise(SIGINT) is sufficient.  With MDB_KILL_R, that
does not kill slapd and we need SIGKILL - it catches SIGINT I
guess.  Don't know if that difference is intentional.


(B1) Repeated kill -KILL slap<tool> while slapd is running can use
up the reader table since it does not clear its table slots,
making the db unusable.

(B2) slapcat report no error after it fails to read the db due to
full reader table, it exits with success status.

(B3) After one "kill -KILL slap<tool>" while slapd is running, I
imagine the stale slot which does not go away can prevent freelist
reuse so the db grows quickly.  But I have not seen that.

    bash$ rm -rf lock.dir; mkdir lock.dir
    bash$ servers/slapd/slapd -f lock.conf -h ldapi://ldapi -d0 &
    bash$ for (( i=0; i<130; i++)) {
        MDB_KILL_UR=true servers/slapd/slapd -Tcat -f lock.conf
    } > /dev/null
    bash$ servers/slapd/slapd -Tcat -f lock.conf -d-1 2>cat.out
    (success result, no output, no errors in the log)
    bash$ ldapsearch -xLLH ldapi://ldapi -b o=hi l=x 1.1
    version: 1

    [R..RLOCKED..R1] Other (e.g., implementation specific) error (80)
    Additional information: internal error
    bash$

Maybe it's easier to have a per-process reader table after all,
and a shared table with just the oldest txn per process.

Or, each transaction chould have an exclusive file region lock.
A writer could try to grab the oldest txn's region lock if that
belongs to another process, and clear the txn it if successful.
A reader, before giving up, would just search for another pid's
txn whose lock it can grab.

Except since the system reuses pids, the ID would either have to
be more exclusive (e.g. pid + mdb_env_open() time) or mdb_env_open
must clear away reader table entries with its own pid.

Speaking of which, I don't see the use of the repeated getpid()
calls.  Should be enough to store the pid in the MDB_env.  fcntl
record locks do not survive fork(), so an MDB_env doesn't either.
And the thread ID is unsed, and must be reset with memset, not
'mr_tid = 0' since pthread_t can be a struct.


(C) Mentioned earlier:  Startup can fail due to a race condition,
when another process starts MDB at the same time and then aborts:
- Process 1 starts MDB, gets write lock in mdb_env_setup_locks().
- Process 2 starts MDB, awaits read lock since write lock is taken.
- Process 1 dies.
- Process 2 gets the read lock and proceeds, but with an uninitialized
  lockfile since it expects that process 1 initialized it.

Fix:  An extra exclusive lock around the current lock setup code.

    kill slapd process, if any
    rm -rf lock.dir; mkdir lock.dir
    MDB_RACE=true servers/slapd/slapd -Tcat -f lock.conf & sleep 1; \
        servers/slapd/slapd -Tcat -f lock.conf

In addition, the above dumps core in mdb_env_close:
    for (i=0; i<env->me_txns->mti_numreaders; i++)
where env->me_txns == (MDB_txninfo *) 0xffffffffffffffff.


(D) And of course ^Z when holding a mutex can block other processes'
threads, but I suppose BDB has the same problem.

    DB_SUSPEND=true servers/slapd/slapd -Tadd -f lock.conf < /dev/null
    servers/slapd/slapd -Tadd -f lock.conf < /dev/null
    (hanging...) ^C
    kill %%

The fix I can see for now is "don't do that" in the doc.  MDB
could in addition use pthread_mutex_timedlock() so slapd at least
can keep running and can be killed without kill -KILL.  Final fix
= a lock-free library, when a useful portable one emerges.
Comment 1 Hallvard Furuseth 2012-02-15 08:38:09 UTC
 I wrote:
> Aborting an MDB process can break MDB if another MDB process is
> running, because then locks/reader table info are not reset.

 I forgot the patch I've been using to play with mdb:

     http://folk.uio.no/hbf/OpenLDAP/its7165.diff

Comment 2 Hallvard Furuseth 2012-02-15 11:19:10 UTC
changed notes
moved from Incoming to Documentation
Comment 3 OpenLDAP project 2014-08-01 21:04:11 UTC
slap tools needs "don't do that" doc
Comment 4 Quanah Gibson-Mount 2020-03-20 05:53:49 UTC
Created attachment 623 [details]
its7165.diff
Comment 5 Howard Chu 2021-06-02 12:35:14 UTC
Most of this bug appears to be obsolete.

(In reply to Hallvard Furuseth from comment #0)
> Full_Name: Hallvard B Furuseth
> Version: master, d861910f55cb1beb5e443caa7e961ed760074352
> OS: Linux x86_64
> URL: 
> Submission from: (NULL) (195.1.106.125)
> Submitted by: hallvard
> 
> 
> Aborting an MDB process can break MDB if another MDB process is
> running, because then locks/reader table info are not reset.

Using mdb_stat -rr will purge stale readers. I suppose we could
tweak back-mdb to also do a reader purge if it ever gets a MDB_READERS_FULL error from mdb_txn_begin.
> 
> The problems below go away when the last slap process terminates
> so the lockfile can be reset.  Sometimes kill -KILL is needed.
> 
> 
> ==== lock.conf ====
> include servers/slapd/schema/core.schema
> database mdb
> suffix o=hi
> directory lock.dir
> 
> 
> (A) If a process holding a shared mutex dies, other processes' ops
> on the mutex can hang/fail:

Robust mutex support was added in a2ac10107e2fb845c4a38a339239063ec4407d84 in 2014.
> 
>     rm -rf lock.dir; mkdir lock.dir
>     servers/slapd/slapd -f lock.conf -h ldapi://ldapi -d0 &
>     MDB_KILL_R=true servers/slapd/slapd -Tcat -f lock.conf
>     ldapsearch -xLLH ldapi://ldapi -b o=hi l=x 1.1
>     ^C (kills ldapsearch which is waiting for results)
>     kill %% (fails to kill slapd)
>     kill -KILL %%
> 
> The Posix fix, robust mutexes, is outlined below.  With _WIN32,
> check return code WAIT_ABANDONED.  __APPLE__ Posix semaphores:
> Don't know.  SysV semaphores have SEM_UNDO to cancel the process'
> effect if the process dies, but the peers do not seem to be
> informed that this happened.
> 
> if ((rc = pthread_mutexattr_init(      &mattr                        )) ||
>     (rc = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED)) ||
>     (rc = pthread_mutexattr_setrobust( &mattr, PTHREAD_MUTEX_ROBUST  )) ||
>     (rc = pthread_mutex_init(&mutex,   &mattr                        )))
>     return rc;
> ...
> switch (pthread_mutex_lock(&mutex)) {
> case 0:
>     break;
> case EOWNERDEAD:
>     Repair the possibly half-updated data protected by <mutex>;
>     if (successful repair) {
>         pthread_mutex_consistent(&mutex);
>         break;
>     }
>     /* With Posix mutexes, make future use of <mutex> return failure: */
>     pthread_mutex_unlock(&mutex);
>     /* FALLTHRU */
> default:
>     return MDB_PANIC;
> }
> ...
> pthread_mutex_unlock(&mutex);
> 
> 
> BTW, the above also happens with MDB_KILL_W=true slapcat followed
> by ldapadd, dunno why slapcat uses a txn without MDB_TXN_RDONLY.

slapcat already opens the env RDONLY. In 3e47e825fd61e1bff001c01f40d70263c89ffabd in 2012 was patched to also create
the txn RDONLY.

> In this case, raise(SIGINT) is sufficient.  With MDB_KILL_R, that
> does not kill slapd and we need SIGKILL - it catches SIGINT I
> guess.  Don't know if that difference is intentional.
> 
> 
> (B1) Repeated kill -KILL slap<tool> while slapd is running can use
> up the reader table since it does not clear its table slots,
> making the db unusable.
> 
> (B2) slapcat report no error after it fails to read the db due to
> full reader table, it exits with success status.

Unable to reproduce, dunno when this was fixed but it returns
an error code now.
> 
> (B3) After one "kill -KILL slap<tool>" while slapd is running, I
> imagine the stale slot which does not go away can prevent freelist
> reuse so the db grows quickly.  But I have not seen that.

Depends on timing I suppose; it's possible that there was no
active read txn at the moment, in which case the stale reader slot
has no effect on page reuse.
> 
>     bash$ rm -rf lock.dir; mkdir lock.dir
>     bash$ servers/slapd/slapd -f lock.conf -h ldapi://ldapi -d0 &
>     bash$ for (( i=0; i<130; i++)) {
>         MDB_KILL_UR=true servers/slapd/slapd -Tcat -f lock.conf
>     } > /dev/null
>     bash$ servers/slapd/slapd -Tcat -f lock.conf -d-1 2>cat.out
>     (success result, no output, no errors in the log)
>     bash$ ldapsearch -xLLH ldapi://ldapi -b o=hi l=x 1.1
>     version: 1
> 
>     [R..RLOCKED..R1] Other (e.g., implementation specific) error (80)
>     Additional information: internal error
>     bash$
> 
> Maybe it's easier to have a per-process reader table after all,
> and a shared table with just the oldest txn per process.
> 
> Or, each transaction chould have an exclusive file region lock.
> A writer could try to grab the oldest txn's region lock if that
> belongs to another process, and clear the txn it if successful.
> A reader, before giving up, would just search for another pid's
> txn whose lock it can grab.

The current code uses an fcntl lock to detect process liveness.
> 
> Except since the system reuses pids, the ID would either have to
> be more exclusive (e.g. pid + mdb_env_open() time) or mdb_env_open
> must clear away reader table entries with its own pid.
> 
> Speaking of which, I don't see the use of the repeated getpid()
> calls.  Should be enough to store the pid in the MDB_env.  fcntl
> record locks do not survive fork(), so an MDB_env doesn't either.
> And the thread ID is unsed, and must be reset with memset, not
> 'mr_tid = 0' since pthread_t can be a struct.
> 
The above is no longer true.
> 
> (C) Mentioned earlier:  Startup can fail due to a race condition,
> when another process starts MDB at the same time and then aborts:
> - Process 1 starts MDB, gets write lock in mdb_env_setup_locks().
> - Process 2 starts MDB, awaits read lock since write lock is taken.
> - Process 1 dies.
> - Process 2 gets the read lock and proceeds, but with an uninitialized
>   lockfile since it expects that process 1 initialized it.

Probably still true, but already doc'd in Caveats.
> 
> Fix:  An extra exclusive lock around the current lock setup code.
> 
>     kill slapd process, if any
>     rm -rf lock.dir; mkdir lock.dir
>     MDB_RACE=true servers/slapd/slapd -Tcat -f lock.conf & sleep 1; \
>         servers/slapd/slapd -Tcat -f lock.conf
> 
> In addition, the above dumps core in mdb_env_close:
>     for (i=0; i<env->me_txns->mti_numreaders; i++)
> where env->me_txns == (MDB_txninfo *) 0xffffffffffffffff.
> 
> 
> (D) And of course ^Z when holding a mutex can block other processes'
> threads, but I suppose BDB has the same problem.
> 
>     DB_SUSPEND=true servers/slapd/slapd -Tadd -f lock.conf < /dev/null
>     servers/slapd/slapd -Tadd -f lock.conf < /dev/null
>     (hanging...) ^C
>     kill %%
> 
> The fix I can see for now is "don't do that" in the doc.  MDB
> could in addition use pthread_mutex_timedlock() so slapd at least
> can keep running and can be killed without kill -KILL.  Final fix
> = a lock-free library, when a useful portable one emerges.

At this point the only potential action I see is to add a check for
MDB_READERS_FULL as noted at the beginning of this reply.
Comment 6 Howard Chu 2022-05-13 15:34:37 UTC
> At this point the only potential action I see is to add a check for
MDB_READERS_FULL as noted at the beginning of this reply.

https://git.openldap.org/openldap/openldap/-/merge_requests/526
Comment 7 Quanah Gibson-Mount 2022-05-16 17:00:56 UTC
HEAD:

  • 205e2f1a 
by Howard Chu at 2022-05-16T13:54:08+00:00 
ITS#7165 back-mdb: check for stale readers on MDB_READERS_FULL


RE26:

  • 7e7f01c3 
by Howard Chu at 2022-05-16T15:09:08+00:00 
ITS#7165 back-mdb: check for stale readers on MDB_READERS_FULL

RE25:

  • f3d89d62 
by Howard Chu at 2022-05-16T15:11:51+00:00 
ITS#7165 back-mdb: check for stale readers on MDB_READERS_FULL