Full_Name: Bernard Gardner
Version: 1.2.6
OS: Solaris 7
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (203.108.14.145)

System version: OpenLDAP 1.2.6 with Sleepycat BDB v2.7.5 (but I can see that this isn't changed in 1.2.7)

Initial Symptom:
Under low-memory conditions (the system is already running one instance of slapd, tuned to use all available memory), loads via ldap_add progress very slowly (one add every three seconds).

Cause:
When Sleepycat is unable to allocate memory during the initial open of the db files, a zero-length file can be left on disk and an error is returned from the open attempt (error 11 in this case). The error is ignored, and it would seem the index change is lost.

The next time an attempt is made to modify this index, Sleepycat interprets the zero-length file as an attempt by another thread/process to create the file, so it executes three one-second sleeps, waiting for the file's creator to write the first page of the file (which identifies the type) to allow the current open to proceed. After the three seconds, another attempt is made to create the file, the allocation of the cache fails (again), an error is returned to slapd and ignored (again), and the cycle continues. (See db_open in db.c, which is called from within libraries/libldbm/ldbm.c in the ldbm_open wrapper function, where the returned error is typecast to void.)

Discussion:
This bug also affects index creation when using ldif2ldbm.

Is there a good reason for the void typecast of the error return of db_open?

I'm currently testing a modification that checks the return value and, if it looks as if the problem was due to a lack of memory (ENOMEM || EAGAIN), retries the db_open with db_cachesize=0 (use the default cache size instead of the user-specified one). My main concerns here are that EAGAIN might be returned for some other reason (like inability to acquire a lock), and if the second attempt to open the file fails, well, what should it do then?

Bernard.
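For illustration only, the modification described above (check the return value, retry once with the default cache size on ENOMEM or EAGAIN) might look roughly like the sketch below. This is not the actual patch and it does not call Berkeley DB: open_fn, fake_open_lowmem, fake_open_busy and open_with_fallback are hypothetical names, and the opener callback merely stands in for the db_open call made in ldbm_open.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the real open call: returns 0 on success or an
 * errno-style value on failure. cachesize == 0 means "use the default". */
typedef int (*open_fn)(const char *path, size_t cachesize, void **handle);

/* Simulated backend: cannot allocate a user-specified cache (ENOMEM),
 * but succeeds with the default cache size. */
int fake_open_lowmem(const char *path, size_t cachesize, void **handle)
{
    static int dummy_db;
    (void)path;
    if (cachesize != 0)
        return ENOMEM;
    *handle = &dummy_db;
    return 0;
}

/* Simulated backend: always fails with EAGAIN (for example, a lock that
 * cannot be acquired), whatever the cache size. */
int fake_open_busy(const char *path, size_t cachesize, void **handle)
{
    (void)path;
    (void)cachesize;
    (void)handle;
    return EAGAIN;
}

/* Check the error instead of casting it to void. On ENOMEM or EAGAIN,
 * retry exactly once with the default cache size. Any remaining error is
 * returned to the caller, and *handle is left NULL. */
int open_with_fallback(open_fn do_open, const char *path,
                       size_t cachesize, void **handle)
{
    int rc;

    assert(do_open != NULL && handle != NULL);
    *handle = NULL;

    rc = do_open(path, cachesize, handle);
    if ((rc == ENOMEM || rc == EAGAIN) && cachesize != 0)
        rc = do_open(path, 0, handle);

    if (rc != 0)
        *handle = NULL;
    return rc;
}
```

The sketch deliberately leaves the second question open: if the retry also fails, the error is simply handed back, and what the caller should do with it is the policy decision under discussion.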
slapd fault handling is wholly inadequate. We would welcome improvements made in this area. I would suggest that any coding be done against our devel code, as it must pass muster there before being applied to 1.2. It is quite likely that a number of interfaces will have to be changed to properly report errors up through the stack so that appropriate high-level handling can occur.

I would suggest that further discussion of this issue be moved to -devel.

Kurt

At 06:11 AM 10/5/99 GMT, bernardgardner@ozemail.com.au wrote:
>Is there a good reason for the void typecast of the error return of db_open?

Lazy programming.

>I'm currently testing a modification that checks the return value, and
>if it looks as if the problem was due to a lack of memory (ENOMEM || EAGAIN)
>retry the db_open with db_cachesize=0 (use default cache size instead of user
>specified).

I would suggest that no retry be attempted. The system is misconfigured. Report it immediately.

>My main concerns here are that EAGAIN might be returned for some
>other reason (like inability to acquire a lock), and if the second attempt to
>open the file fails, well, what should it do then?

slapd should have exclusive access to the db files. If access is not granted, slapd should stop (as it's likely some other application is mucking with the database).

Kurt

----
Kurt D. Zeilenga <kurt@boolean.net>
Net Boolean Incorporated <http://www.boolean.net/>
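As a rough sketch of the alternative suggested in this reply (no retry, report the failure immediately and let the caller stop), something along the following lines could be used. It is an assumption-laden illustration, not slapd code: db_opener, always_eagain and open_or_report are hypothetical names, the callback stands in for the real open call, and the message wording is invented.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real open call: returns 0 on success or an
 * errno-style value on failure. */
typedef int (*db_opener)(const char *path, void **handle);

/* Simulated backend that always fails with EAGAIN. */
int always_eagain(const char *path, void **handle)
{
    (void)path;
    (void)handle;
    return EAGAIN;
}

/* Open exactly once. On failure, report the error right away and return it
 * unchanged so that the caller can propagate it up the stack and shut down;
 * no retry is attempted. */
int open_or_report(db_opener do_open, const char *path, void **handle)
{
    int rc;

    assert(do_open != NULL && path != NULL && handle != NULL);
    *handle = NULL;

    rc = do_open(path, handle);
    if (rc != 0) {
        fprintf(stderr, "cannot open %s: %s (error %d)\n",
                path, strerror(rc), rc);
        *handle = NULL;
    }
    return rc;
}
```

Returning the error rather than exiting inside the helper keeps the decision to stop at a higher level, which matches the point above that interfaces would need to report errors up through the stack.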
moved from Incoming to Software Bugs
changed notes changed state Open to Closed
slapd does not manage resources, must be given enough