7377 – Poor libmdb error handling

Issue 7377 - Poor libmdb error handling

Summary: Poor libmdb error handling

Status:	RESOLVED PARTIAL

Alias:	None

Product:	OpenLDAP
Classification:	Unclassified
Component:	slapd (show other issues)
Version:	2.4.32
Hardware:	All All

Importance:	--- normal
Target Milestone:	---
Assignee:	OpenLDAP project

URL:
Keywords:

Depends on:
Blocks:

Reported:	2012-09-01 09:32 UTC by Hallvard Furuseth
Modified:	2014-08-01 21:04 UTC (History)
CC List:	0 users

See Also:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this issue.

Description Hallvard Furuseth 2012-09-01 09:32:55 UTC

Full_Name: Hallvard B Furuseth
Version: 2.4.32
OS: Linux x86_64
URL: 
Submission from: (NULL) (193.69.163.163)
Submitted by: hallvard


libmdb omits necessary checks for system call errors, and does
not always clean up after the errors it does detect.

Also mdb could catch more user errors - including "cannot happen"
mutex errors from mutex lock/unlock, since some threading user
errors are be non-obvious from the doc.

At least sem_wait() needs an EINTR loop.  POSIX says "The
sem_wait() function is interruptible by the delivery of a signal."
The Linux manpage adds that a signal handler always interrupts it
regardless of sigaction SA_RESTART.  I don't know about BSD.

Some error checks first need updates to the Windows emulation, or
will look cleaner that way.  E.g. pthread_key_create().

I'll fix mdb_env_open()'s error cleanup, at least.
(Hyc, that moves cleanup out of mdb_env_setup_locks().  Any
preference for using ~25 exit points 'return <ErrCode() or rc>'
vs. ~25 'goto <failure label>' which does the same?)

Maybe add a flag to request error paranoia, like setting
PTHREAD_MUTEX_ERRORCHECK.  Trying to catch if a current mdb
process has lost is locks. (The fcntl F_SETLK is lost if the
process closes another file descriptor to the same file,
e.g. after some other module in the process opened and closed the
mdb lock file.)  Catching a change to the pid from getpid().

Regarding the pid: Repeating getpid() - in mdb_txn_renew0() and
mdb_env_close() - is useless since mdb cannot survive fork()
anyway.  fork drops fcntl locks.  Might as well store the pid (or
just some unique "environment id") in MDB_env and use that.  OTOH
some strategically placed getpid()s can then catch the user error
of having forked, and set e.g. MDB_FATAL_ERROR.

The fork() restriction needs to be documented, as does "don't open
or close the same database twice in the same process".


mdb_env_open() should catch being called twice.  Except, should
it try to allow a 2nd call if 1st mdb_env_open() failed?  It can
easily do that if 1st attempt failed before reading params from
the DB files, but fail if the 1st open got that far.

Comment 1 Hallvard Furuseth 2012-09-01 09:53:13 UTC

I wrote:

> Also mdb could catch more user errors - including "cannot happen"
> mutex errors from mutex lock/unlock, since some threading user
> errors are be non-obvious from the doc.

That is, "cannot happen" when there is no user error.

> The fork() restriction needs to be documented, as does "don't
> open or close the same database twice in the same process".

Correction, "don't have the same database open twice at the same
time in the same process".

Comment 2 Hallvard Furuseth 2012-09-01 09:56:35 UTC

I messed up last reply headers somehow.  Trying again.  I wrote:

> Also mdb could catch more user errors - including "cannot happen"
> mutex errors from mutex lock/unlock, since some threading user
> errors are be non-obvious from the doc.

That is, "cannot happen" when there is no user error.

> The fork() restriction needs to be documented, as does "don't
> open or close the same database twice in the same process".

Correction, "don't have the same database open twice at the same
time in the same process".

Comment 3 Howard Chu 2012-09-01 10:07:19 UTC

h.b.furuseth@usit.uio.no wrote:
> Full_Name: Hallvard B Furuseth
> Version: 2.4.32
> OS: Linux x86_64
> URL: 
> Submission from: (NULL) (193.69.163.163)
> Submitted by: hallvard
> 
> 
> libmdb omits necessary checks for system call errors, and does
> not always clean up after the errors it does detect.
> 
> Also mdb could catch more user errors - including "cannot happen"
> mutex errors from mutex lock/unlock, since some threading user
> errors are be non-obvious from the doc.
> 
> At least sem_wait() needs an EINTR loop.  POSIX says "The
> sem_wait() function is interruptible by the delivery of a signal."
> The Linux manpage adds that a signal handler always interrupts it
> regardless of sigaction SA_RESTART.  I don't know about BSD.
> 
> Some error checks first need updates to the Windows emulation, or
> will look cleaner that way.  E.g. pthread_key_create().
> 
> I'll fix mdb_env_open()'s error cleanup, at least.
> (Hyc, that moves cleanup out of mdb_env_setup_locks().  Any
> preference for using ~25 exit points 'return <ErrCode() or rc>'
> vs. ~25 'goto <failure label>' which does the same?)

In general I prefer that functions have only one exit point, so multiple gotos
is preferable.

> Maybe add a flag to request error paranoia, like setting
> PTHREAD_MUTEX_ERRORCHECK.  Trying to catch if a current mdb
> process has lost is locks. (The fcntl F_SETLK is lost if the
> process closes another file descriptor to the same file,
> e.g. after some other module in the process opened and closed the
> mdb lock file.)  Catching a change to the pid from getpid().

That's really not interesting to me. No other module has any business opening
and reading our lock file. We haven't bothered with this for alock either, and
it hasn't been a problem.

Change of pid... not worth bothering. Just add a note to the docs.

> Regarding the pid: Repeating getpid() - in mdb_txn_renew0() and
> mdb_env_close() - is useless since mdb cannot survive fork()
> anyway.  fork drops fcntl locks.  Might as well store the pid (or
> just some unique "environment id") in MDB_env and use that.

Agreed. I had a patch here that does just that, seems it got lost in some
other edits.

>  OTOH
> some strategically placed getpid()s can then catch the user error
> of having forked, and set e.g. MDB_FATAL_ERROR.

Not really interesting.

> The fork() restriction needs to be documented, as does "don't open
> or close the same database twice in the same process".

Go ahead.

> mdb_env_open() should catch being called twice.  Except, should
> it try to allow a 2nd call if 1st mdb_env_open() failed?  It can
> easily do that if 1st attempt failed before reading params from
> the DB files, but fail if the 1st open got that far.

Called twice on the same env handle? Again, not interesting.

As a rule, we should of course check for failure return codes from normal
operation. We should not add extensive checks for programmer error/misuse of API.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 4 Hallvard Furuseth 2012-09-01 11:02:11 UTC

Howard Chu wrote:
>h.b.furuseth@usit.uio.no wrote:
>> Maybe add a flag to request error paranoia, like setting
>> PTHREAD_MUTEX_ERRORCHECK.  Trying to catch if a current mdb
>> process has lost is locks. (The fcntl F_SETLK is lost if the
>> process closes another file descriptor to the same file,
>> e.g. after some other module in the process opened and closed the
>> mdb lock file.)  Catching a change to the pid from getpid().
> 
> That's really not interesting to me. No other module has any business
> opening and reading our lock file.

I skipped a step there.  The other module could also be using mdb.
"other module" = "a program component which the module using this
MDB_env does not know about".  They may even be linking different
libmdb.so's, so the 'libmdb's can't notify each other via static vars.
There are a few other use cases like a monolith program which also
can backup a directory - which the user could point at the mdb dir.

Anyway, a CAVEATS section in the doc is the obvious fix.

> We haven't bothered with this for alock either, and
> it hasn't been a problem.

Note that this ITS is about libmdb, not back-mdb.

>> The fork() restriction needs to be documented, as does "don't open
>> or close the same database twice in the same process".
> 
> Go ahead.

Hmm.  I suggested a too pessimistic caveats section once, I can dig
that up and try again.

Comment 5 Hallvard Furuseth 2012-09-17 14:30:48 UTC

changed notes
changed state Open to Test
moved from Incoming to Software Bugs

Comment 6 Quanah Gibson-Mount 2013-02-26 20:17:23 UTC

changed notes
changed state Test to Partial

Comment 7 Hallvard Furuseth 2013-06-26 21:56:15 UTC

Changes from a failed liblmdb operation can be visible.  Sometimes
the resulting DB will also be inconsistent.  The operation should
invalidate the affected transaction/cursors or revert the change.
At least I hope it's feasible to invalidate so the user cannot see
surprising results, instead of just documenting "don't do that".

Related issues:

put(MDB_MULTIPLE) can write some of the provided items and then
fail, but be able to keep the txn usable.  It should invalidate the
txn anyway, or document that the caller must check for e.g. return
value MDB_PARTIAL_SUCCESS.  Maybe invalidate unless the the caller
requests support for partial success. put() would update the input
params to indicate which items remain unwritten.

A failed txn could still grow the txn size, unless MDB "un-touches"
pages which memcmp says have not changed, rewinds me_pglast and
me_pghead, etc.  Which seems a lot of work, likely not worth the
trouble.

I think cursors need a C_INVALID flag, so future ops cannot give a
surprising success result instead of failing: get(MDB_NEXT/MDB_PREV)
take ~C_INITIALIZED to mean (re)start from the beginning/end.

Don't know how often reverting is a relevant option. Rearranging
code can help, but maybe not much.  E.g. the mdb_del() in mdb_drop()
can do some harmless touches and then fail cleanly - but mdb_drop()
must invalidate the txn anyway since mdb_drop0() has deleted pages.
Unless it does extra work - remember mt_free_pgs[0] before and after
mdb_drop0(), then delete from mt_free_pgs[] the pages added by
drop0.  Or drop0 could be done last since it can fail cleanly just
by resetting mt_free_pgs[0]. But I don't know if del(MDB_MULTIPLE)
can also do that if the DB used a subpage. drop0 can't examine the
subpage after it got deleted.

-- 
Hallvard

Comment 8 Howard Chu 2013-06-27 00:22:46 UTC

h.b.furuseth@usit.uio.no wrote:
> Changes from a failed liblmdb operation can be visible.  Sometimes
> the resulting DB will also be inconsistent.  The operation should
> invalidate the affected transaction/cursors or revert the change.

The transaction should be invalidated. Nothing more can be done once any error 
occurs in a transaction.

> At least I hope it's feasible to invalidate so the user cannot see
> surprising results, instead of just documenting "don't do that".
>
> Related issues:
>
> put(MDB_MULTIPLE) can write some of the provided items and then
> fail, but be able to keep the txn usable.  It should invalidate the
> txn anyway, or document that the caller must check for e.g. return
> value MDB_PARTIAL_SUCCESS.  Maybe invalidate unless the the caller
> requests support for partial success. put() would update the input
> params to indicate which items remain unwritten.

No, "partial success" is not allowed. A transaction is atomic, all or nothing.

> A failed txn could still grow the txn size, unless MDB "un-touches"
> pages which memcmp says have not changed, rewinds me_pglast and
> me_pghead, etc.  Which seems a lot of work, likely not worth the
> trouble.

pghead/pglast should not persist into the environment until a successful 
commit. Until then any error should just discard them.

>
> I think cursors need a C_INVALID flag, so future ops cannot give a
> surprising success result instead of failing: get(MDB_NEXT/MDB_PREV)
> take ~C_INITIALIZED to mean (re)start from the beginning/end.

Good idea.

> Don't know how often reverting is a relevant option. Rearranging
> code can help, but maybe not much.  E.g. the mdb_del() in mdb_drop()
> can do some harmless touches and then fail cleanly - but mdb_drop()
> must invalidate the txn anyway since mdb_drop0() has deleted pages.
> Unless it does extra work - remember mt_free_pgs[0] before and after
> mdb_drop0(), then delete from mt_free_pgs[] the pages added by
> drop0.  Or drop0 could be done last since it can fail cleanly just
> by resetting mt_free_pgs[0]. But I don't know if del(MDB_MULTIPLE)
> can also do that if the DB used a subpage. drop0 can't examine the
> subpage after it got deleted.
>
Reverting is irrelevant. Once a txn is aborted no changes are visible.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 9 Hallvard Furuseth 2013-06-27 05:11:06 UTC

Howard Chu writes:
> h.b.furuseth@usit.uio.no wrote:
>> Changes from a failed liblmdb operation can be visible.  Sometimes
>> the resulting DB will also be inconsistent.  The operation should
>> invalidate the affected transaction/cursors or revert the change.
>
> The transaction should be invalidated. Nothing more can be done
> once any error  occurs in a transaction.

That would break back-mdb for MDB_NOTFOUND/MDB_KEYEXIST.  And at
least MDB_KEYEXIST can happen after touching pages, which affects
the txn size even if the change "does nothing".

I expected to generally invalidate except in necessary cases, and
from then on make more cases "gentle" at our convenience without
spending much time on it.

>> put(MDB_MULTIPLE) can write some of the provided items and then
>> fail, but be able to keep the txn usable.  (...)
>
> No, "partial success" is not allowed. A transaction is atomic, all
> or nothing.

It would be atomic like a partial OS write().  It'd be up to the
user what to do next.  Not that I care myself if it is supported.

>> A failed txn could still grow the txn size, unless MDB "un-touches"
>> pages which memcmp says have not changed, rewinds me_pglast and
>> me_pghead, etc.  Which seems a lot of work, likely not worth the
>> trouble.
>
> pghead/pglast should not persist into the environment until a successful
> commit. Until then any error should just discard them.

Sorry, I meant a failed _operation_ could grow them and thus affect
the committed freelist size.

>> I think cursors need a C_INVALID flag, so future ops cannot give a
>> surprising success result instead of failing: get(MDB_NEXT/MDB_PREV)
>> take ~C_INITIALIZED to mean (re)start from the beginning/end.
>
> Good idea.

...or C_VALID so a single OR/AND can set/clear them both.

>> Don't know how often reverting is a relevant option. (...)
>
> Reverting is irrelevant. Once a txn is aborted no changes are visible.

True, but I meant while the txn is still live and usable. Hopefully
it's only relevant for "nice to have" cases.

-- 
Hallvard

Comment 10 Hallvard Furuseth 2013-06-27 05:30:53 UTC

I wrote:
> at
> least MDB_KEYEXIST can happen after touching pages, which affects
> the txn size even if the change "does nothing".

Only when updating subDBs, I think.

-- 
Hallvard

Comment 11 OpenLDAP project 2014-08-01 21:04:44 UTC

Some fixes in mdb.master.
Some fixes in master
Some fixes in RE24