RE: BDB corruption on every unclean shutdown



> -----Original Message-----
> From: owner-openldap-software@OpenLDAP.org
> [mailto:owner-openldap-software@OpenLDAP.org]On Behalf Of Marek Szuba

> On Sun, 11 Jan 2004 17:17:23 -0800
> "Howard Chu" <hyc@highlandsun.com> wrote:
>
> > Gosh, that's funny, every time my system power cycles due to a crash
> > or power loss, my filesystems get corrupted too. Luckily my system
> > startup scripts run fsck to repair the damage before anything else can
> > try to use the filesystems.

> I wish it were that easy...
>
> First of all, since the filesystem in question is ext3, apart from it
> being journalled my box *does* run fsck when needed.

It appears you missed the point, so I'll spell it out:

   when a filesystem shuts down uncleanly, you must run a tool to fix it.
      this tool happens to be called fsck.

   when a Berkeley database shuts down uncleanly, you must run a tool to fix it.
      this tool happens to be called db_recover.
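
To make the analogy concrete, here is a minimal sketch of a startup wrapper that does for the BDB environment what the boot scripts do with fsck: run db_recover first, and start slapd only if recovery succeeded. The function names, the /var/lib/ldap path, and the bare "slapd" command are illustrative assumptions, not anything shipped with OpenLDAP. The only things taken from the real tool are db_recover's -h (environment home directory) and -v (verbose) options. db_recover must run while nothing else has the environment open.

```python
import subprocess


def recover_command(db_home, verbose=True):
    """Build the db_recover command line for the BDB environment in db_home.

    db_home is the directory holding the database files (hypothetical here).
    """
    cmd = ["db_recover"]
    if verbose:
        cmd.append("-v")        # report progress while recovering
    cmd += ["-h", db_home]      # -h names the environment home directory
    return cmd


def recover_then_start(db_home, slapd_cmd, run=subprocess.run):
    """Run recovery to completion, then start slapd.

    With the default runner, check=True raises CalledProcessError if
    db_recover exits non-zero, so slapd is never started on top of an
    unrecovered environment. The runner is injectable so the ordering
    can be exercised without touching a real database.
    """
    run(recover_command(db_home), check=True)
    return run(slapd_cmd, check=True)
```

In an init script this would be called as something like recover_then_start("/var/lib/ldap", ["slapd"]), with the path and the slapd arguments adjusted to the local installation.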

> Second, is it normal for slapd to hang under such circumstances?

In short, yes. By default, BDB locks are recorded in files in the
filesystem. If they are not released cleanly, they remain in the BDB
environment until forcibly removed (using db_recover). slapd has deadlock
detection, but that only works when every lock in the environment still has
a thread or process associated with it. If a lock is left over from a crash,
the lock detector doesn't know how to resolve the situation.

If you want to avoid this hassle, in OpenLDAP 2.2 you can configure back-bdb
to use shared memory instead of files; that way no stale locks will remain
after a system crash.

> Third, the database in question is almost entirely read-only (with quite
> a lot of caching even so to take some load off slapd), because the only
> kind of modifications occurring regularly is the users changing their
> passwords - and that doesn't happen so often that it should make a
> serious difference. When there are writes, however, all the transactions
> seem to get committed correctly; what is more, when I examine the
> corrupted databases the data (or at least most of it) appears to be
> there. As if it were a problem with some internal structures being
> destroyed... I tried reindexing the databases, but it didn't help.
>
> > Try reading http://www.openldap.org/faq/index.cgi?file=893
> Basically, I have already read this part of the FAQ. Still, there might
> be something I have missed in the documents it links to, so thanks for
> reminding me.

Perhaps you missed something along the lines of "checkpoints"...

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support