
slapd index corruption in low memory situation (ITS#318)

To: openldap-its@OpenLDAP.org
Subject: slapd index corruption in low memory situation (ITS#318)
From: bernardgardner@ozemail.com.au
Date: Tue, 5 Oct 1999 06:11:10 GMT

Full_Name: Bernard Gardner
Version: 1.2.6
OS: Solaris 7
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (203.108.14.145)


System version:
OpenLDAP 1.2.6 with Sleepycat BDB v2.7.5 (the relevant code appears to be
unchanged in 1.2.7)

Initial Symptom:
Under low-memory conditions (the system was already running one instance of
slapd, tuned to use all available memory), loads via ldap_add progress very
slowly: about one add every three seconds.

Cause:
When Sleepycat is unable to allocate memory during the initial open of a db
file, a zero-length file can be left on disk and an error returned from the
open attempt (error 11, i.e. EAGAIN, in this case). The error is ignored, and
the index change appears to be lost. The next time an attempt is made to
modify this index, Sleepycat takes the zero-length file as a sign that
another thread or process is creating it, so it executes three one-second
sleeps, waiting for the file's creator to write the first page (which
identifies the file type) so that the current open can proceed. After the
three seconds, another attempt is made to create the file, the cache
allocation fails again, an error is returned to slapd and ignored again, and
the cycle repeats.
(See db_open in db.c, which is called from the ldbm_open wrapper function in
libraries/libldbm/ldbm.c, where the returned error is typecast to void.)

Discussion:
This bug also affects index creation when using ldif2ldbm.

Is there a good reason for the void typecast of the error return of db_open?

I'm currently testing a modification that checks the return value and, if
the failure looks like a lack of memory (ENOMEM or EAGAIN), retries db_open
with db_cachesize=0 (i.e. the default cache size instead of the
user-specified one). My main concerns are that EAGAIN might be returned for
some other reason (such as failure to acquire a lock), and that if the second
attempt to open the file also fails, it is not clear what the code should do
then.

Bernard.
