Issue 318 - slapd index corruption in low memory situation
Summary: slapd index corruption in low memory situation
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 1999-10-05 06:11 UTC by bernard@messagecare.com
Modified: 2014-08-01 21:06 UTC (History)
0 users

See Also:


Description bernard@messagecare.com 1999-10-05 06:11:09 UTC
Full_Name: Bernard Gardner
Version: 1.2.6
OS: Solaris 7
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (203.108.14.145)


System version:
OpenLDAP 1.2.6 with Sleepycat BDB v2.7.5 (but I can see that this isn't
changed in 1.2.7)

Initial Symptom:
Under circumstances of low memory (system already running one instance of
slapd, tuned to use all available memory), loads via ldap_add progress very
slowly (1 add every three seconds).

Cause:
When Sleepycat is unable to allocate memory during the initial open of the db
files, a zero-length file can be left on disk and an error returned from the
open attempt (error 11 in this case). The error is ignored, and it would seem
the index change is lost. The next time an attempt is made to modify this
index, Sleepycat sees the zero-length file as an attempt by another
thread/process to create the file, so three one-second sleeps are executed,
waiting for the file's creator to write the first page of the file (which
identifies the type) so that the current open can proceed. After the three
seconds, another attempt is made to create the file, the allocation of the
cache fails (again), an error is returned to slapd and ignored (again), and
the cycle continues.
(See db_open in db.c, which is called from within libraries/libldbm/ldbm.c
in the ldbm_open wrapper function, where the returned error is typecast to
void).
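
To illustrate the difference between discarding and propagating the open
error, here is a minimal sketch. It is not the ldbm.c code: sketch_db,
sketch_db_open, open_ignoring_error and open_checking_error are hypothetical
names, and the stub only models an open that fails with ENOMEM when the
requested cache does not fit in memory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a database handle; not the Berkeley DB type. */
typedef struct { int is_open; } sketch_db;

/* Hypothetical stub modelling an open that fails when the requested cache
 * cannot be allocated. Returns 0 on success or an errno value on failure. */
static int sketch_db_open(size_t cachesize, size_t mem_available, sketch_db *db)
{
    assert(db != NULL);
    if (cachesize > mem_available) {
        db->is_open = 0;
        return ENOMEM;
    }
    db->is_open = 1;
    return 0;
}

/* The pattern the report describes: the result is cast to void, so the
 * caller cannot tell that the open failed. */
static sketch_db *open_ignoring_error(size_t cachesize, size_t mem_available,
                                      sketch_db *db)
{
    (void) sketch_db_open(cachesize, mem_available, db);
    return db;   /* returned even when the open failed */
}

/* Alternative: return NULL and set errno so that the caller can react. */
static sketch_db *open_checking_error(size_t cachesize, size_t mem_available,
                                      sketch_db *db)
{
    int rc = sketch_db_open(cachesize, mem_available, db);
    if (rc != 0) {
        errno = rc;
        return NULL;
    }
    return db;
}
```

With a cache request larger than the available memory, the first wrapper
still hands back a handle that was never opened, while the second returns
NULL with errno set to ENOMEM.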

Discussion:
This bug also affects index creation when using ldif2ldbm.

Is there a good reason for the void typecast of the error return of db_open?

I'm currently testing a modification that checks the return value and, if it
looks as if the problem was due to a lack of memory (ENOMEM || EAGAIN),
retries the db_open with db_cachesize=0 (use the default cache size instead
of the user-specified one). My main concerns here are that EAGAIN might be
returned for some other reason (like inability to acquire a lock), and if the
second attempt to open the file fails, well, what should it do then?
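
The retry idea above could look roughly like the following sketch. It is not
the modification under test: stub_open, open_with_fallback and DEFAULT_CACHE
are hypothetical, and the stub simply treats a cache size of 0 as "use the
default", modelled as an arbitrary 64 bytes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed size of the default cache in this model; purely illustrative. */
#define DEFAULT_CACHE 64u

/* Hypothetical stub: succeeds only if the requested cache fits in memory.
 * A cachesize of 0 stands for "use the library default". */
static int stub_open(size_t cachesize, size_t mem_available)
{
    size_t need = (cachesize == 0) ? DEFAULT_CACHE : cachesize;
    return (need > mem_available) ? ENOMEM : 0;
}

/* Try the configured cache size; on ENOMEM or EAGAIN retry exactly once with
 * the default. Returns 0 on success or the errno value of the last attempt.
 * *used_default reports whether the fallback was taken. */
static int open_with_fallback(size_t cachesize, size_t mem_available,
                              int *used_default)
{
    int rc;

    assert(used_default != NULL);
    *used_default = 0;

    rc = stub_open(cachesize, mem_available);
    if ((rc == ENOMEM || rc == EAGAIN) && cachesize != 0) {
        *used_default = 1;
        rc = stub_open(0, mem_available);
    }
    return rc;   /* a second failure is passed up, not retried again */
}
```

The sketch does not settle either concern: it cannot tell an EAGAIN caused by
memory pressure from one caused by something else, and when the second
attempt fails it only hands the error back to the caller.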

Bernard.

Comment 1 Kurt Zeilenga 1999-10-09 02:01:36 UTC
slapd fault handling is wholly inadequate.  We would welcome
improvements made in this area.  I would suggest that any
coding be done against our devel code as it must pass muster
there before being applied to 1.2.

It is quite likely that a number of interfaces will have to
be changed to properly report errors up through the stack
so that appropriate high level handling can occur.

I would suggest that further discussion of this issue be
moved to -devel.

Kurt

At 06:11 AM 10/5/99 GMT, bernardgardner@ozemail.com.au wrote:
>Is there a good reason for the void typecast of the error return of db_open?

Lazy programming.

>I'm currently testing a modification that checks the return value, and
>if it looks as if the problem was due to a lack of memory (ENOMEM || EAGAIN)
>retry the db_open with db_cachesize=0 (use default cache size instead of user
>specified).

I would suggest that no retry be attempted.  The system is misconfigured.
Report it immediately.

>My main concerns here are that EAGAIN might be returned for some
>other reason (like inability to acquire a lock), and if the second attempt to
>open the file fails, well, what should it do then?

slapd should have exclusive access to the db files.  If access
is not granted, slapd should stop (as it's likely some other
application is mucking with the database).
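
A minimal sketch of the "report it and stop" approach, for comparison with
the retry idea. Nothing here is slapd code: open_status, check_open_result
and the file name used in the note below are hypothetical, and the decision
to shut down is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

enum open_status { OPEN_OK = 0, OPEN_FATAL = 1 };

/* Hypothetical helper: classify the result of a database open. Any non-zero
 * code is reported once and treated as fatal; no retry is attempted. */
static enum open_status check_open_result(int rc, const char *path, FILE *log)
{
    assert(path != NULL && log != NULL);
    if (rc == 0)
        return OPEN_OK;
    fprintf(log, "cannot open %s: %s (error %d)\n", path, strerror(rc), rc);
    return OPEN_FATAL;
}
```

A caller would pass the code returned by the open, for example
check_open_result(rc, "example.db", stderr), and begin an orderly shutdown
on OPEN_FATAL. ENOMEM and EAGAIN get the same treatment, so the ambiguity of
EAGAIN no longer matters.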

Kurt

----
Kurt D. Zeilenga		<kurt@boolean.net>
Net Boolean Incorporated	<http://www.boolean.net/>
Comment 2 Kurt Zeilenga 1999-12-12 19:45:00 UTC
moved from Incoming to Software Bugs
Comment 3 Kurt Zeilenga 2000-09-06 17:24:40 UTC
changed notes
changed state Open to Closed
Comment 4 OpenLDAP project 2014-08-01 21:06:53 UTC
slapd does not manage resources, must be given enough