
Re: OpenLDAP + RHEL4 - time to time database crash (BDB backend)



On 21/09/05, Michal Dobroczynski <michal.dobroczynski@gmail.com> wrote:
> On 21/09/05, Quanah Gibson-Mount <quanah@stanford.edu> wrote:
> >
> >
> > --On Wednesday, September 21, 2005 2:40 PM +0200 Michal Dobroczynski
> > <michal.dobroczynski@gmail.com> wrote:
> >
> > > For the past 30 days it happened two times that the database 'just
> > > crashed'. For the first time I noticed that it was possible to
> > > 'slapcat' the database and suddenly at some point it was simply
> > > stopping. Nothing else was possible. I'm making backups of the
> > > database every hour.
> > >

I looked into this in a bit more detail. Before I ran slapadd to reload
the database from the backup LDIF, I copied the contents of /var/lib/ldap
to a separate place. I have now run db_recover on that copy. Below is the
output:

db_recover: Finding last valid log LSN: file: 31 offset 8779253
db_recover: Recovery starting from [1][28]
db_recover: Recovery complete at Wed Sep 21 22:39:50 2005
db_recover: Maximum transaction ID 800082a9 Recovery checkpoint [31][8780885]

It seems that db_recover could repair the database within a moment... but
that is only a repair. I would rather not need repairs at all, so I'd like
to know what is causing the problem.

I found a few things that might be causing the problem, but I do not
have enough experience to judge which one is the most likely:
- BDB database tuning, meaning I should configure checkpointing,
cachesize, idlcachesize etc. - I'll refer to the Faq-O-Matic for some
clues about suitable values;
- running db_checkpoint (with the same uid as slapd), but how often?
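
For the second item, a periodic checkpoint run under slapd's uid could be
scripted roughly as follows. This is only a sketch under assumptions: the
function names are made up for illustration, the utility is assumed to be
installed as "db_checkpoint" (some packages add a version suffix), the
ldap user and group are assumed to be the ones slapd runs as, and the
user=/group= arguments need Python 3.9 or later and a caller that is
allowed to switch users. It does not answer how often to run it.

```python
import subprocess


def checkpoint_command(db_home):
    """Build the argv for a single db_checkpoint run against db_home."""
    # -1 : write one checkpoint and exit
    # -h : home directory of the Berkeley DB environment
    return ["db_checkpoint", "-1", "-h", db_home]


def run_checkpoint(db_home, user="ldap", group="ldap"):
    """Run one checkpoint as the given user and group (assumed names)."""
    return subprocess.run(
        checkpoint_command(db_home),
        user=user,    # drop to slapd's user so new log files get its ownership
        group=group,
        check=True,   # raise CalledProcessError if db_checkpoint fails
    )
```

Calling run_checkpoint("/var/lib/ldap") from a scheduled job would then
perform one checkpoint per invocation.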

I have created a few indices to speed up searching (exactly 9).

I also found out that running slapcat with uid=root every hour might be
a problem.

copy-paste:

slapcat in OpenLDAP 2.2.27 was fixed to prevent any
writes/flushes/checkpoints from occurring. As such, it no longer makes
any difference what user you run as, it will not accidentally change
the ownership of any database files. (In older releases, slapcat
performed a checkpoint before closing the database environment. If
slapcat was running as root, and the checkpoint caused a new log file
to be created, it would be created/owned by root, and other processes
would be unable to write to the log. This was ITS#3703.)

end-copy-paste

I checked the ownership of the database files I saved after the crash -
all of them belong to ldap:ldap, so maybe slapcat was not the cause...
but I don't know.
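
The ownership check itself could be automated so that a stray root-owned
log file is noticed early. The following is a minimal sketch, not a
tested tool: the function names are hypothetical, and it only looks at
regular files directly inside the given directory.

```python
import grp
import os
import pwd


def lookup_ids(user, group):
    """Translate a user name and a group name into numeric (uid, gid)."""
    return pwd.getpwnam(user).pw_uid, grp.getgrnam(group).gr_gid


def files_not_owned_by(db_dir, uid, gid):
    """Return sorted paths of regular files in db_dir with a different owner or group."""
    mismatched = []
    for entry in os.scandir(db_dir):
        # Skip subdirectories and symlinks; only plain files are checked.
        if not entry.is_file(follow_symlinks=False):
            continue
        st = entry.stat(follow_symlinks=False)
        if st.st_uid != uid or st.st_gid != gid:
            mismatched.append(entry.path)
    return sorted(mismatched)
```

For example, files_not_owned_by("/var/lib/ldap", *lookup_ids("ldap", "ldap"))
returns an empty list when every file belongs to ldap:ldap.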

Anyway, I'll change it - from now on slapcat will run as ldap:ldap.

Hints and advice are welcome.

Regards,
Michal Dobroczynski