I had an unexpected and completely undocumented crash of slapd this
morning. I'm looking for some hints on tracking it down.
Here's the background.
We are running 2.3.41 (locally built RPM) on Red Hat EL 4 with four slave
servers (running the same 2.3.41 and RHEL4). We use a nightly update
process: we slapcat the master database, apply the changes from the
systems of record (students, employees, retirees, etc.) to the LDIF,
generate an ldapmodify data stream, and run ldapmodify to apply the changes.
The student system made some massive changes this morning, which caused
us to generate an ldapmodify input file with 31,973 changes (adds,
modifies, modrdns) in it. The ldapmodify on the master took 8 minutes.
The delta-syncrepl to the slave/replica servers took 33 to 44 minutes.
The replica delta-syncrepl processes seem to have averaged about
800 changes per minute, which is much slower than I expected.
Since it took so long for the replicas to get all the changes, they
fell more than 10 minutes behind the master server, and the person on
call got paged (nagios monitors the replica and master CSNs). The
person on call had not been properly trained (my fault) to look for the
syncrepl messages in the syslog on the replica servers, and so they
issued a restart on one of the replicas (thinking that something was
hung). The replica restarted properly, but the master seems to have
crashed silently at the same time. No core file was
generated, and I haven't found anything logged in the syslog on the
master. When slapd was restarted on the master, its startup output
reported that the accesslog database had been shut down uncleanly and
needed to be recovered (the recovery succeeded).
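The paging check described above compares the master's and the replicas' contextCSN values. A minimal sketch of that comparison follows; it assumes the CSN timestamp formats shown in the comments, and the function names, threshold constant, and sample CSNs are hypothetical (this is not the nagios plugin actually in use). Note that it measures how far the replica's newest applied change trails the master's, not wall-clock time.

```python
from datetime import datetime, timezone

# Assumption: contextCSN values look like
#   "20080415123456Z#000000#00#000000"         (2.3-style, seconds)
#   "20080415123456.123456Z#000000#000#000000" (2.4-style, microseconds)
# Only the timestamp before the first '#' is used.
def csn_time(csn: str) -> datetime:
    stamp = csn.split("#", 1)[0].rstrip("Z")
    fmt = "%Y%m%d%H%M%S.%f" if "." in stamp else "%Y%m%d%H%M%S"
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)


def replica_lag_seconds(master_csn: str, replica_csn: str) -> float:
    """Seconds by which the replica's newest change trails the master's."""
    return (csn_time(master_csn) - csn_time(replica_csn)).total_seconds()


LAG_LIMIT_SECONDS = 10 * 60  # the 10-minute paging threshold from the text


def check(master_csn: str, replica_csn: str):
    """Return a nagios-style status word and the measured lag in seconds."""
    lag = replica_lag_seconds(master_csn, replica_csn)
    return ("CRITICAL" if lag > LAG_LIMIT_SECONDS else "OK", lag)


if __name__ == "__main__":
    # Hypothetical values: the replica trails the master by 33 minutes.
    print(check("20080415123300Z#000000#00#000000",
                "20080415120000Z#000000#00#000000"))
```

During a bulk load like this one, a check of this kind will page even though replication is progressing normally, which is why the syncrepl messages in the replica syslog are the better first thing to look at.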
I'm wondering the following things:
1) Is it possible that one of the syncrepl ITSs that will be
included in 2.3.42 would address this crash? Any suggestions on
tracking down why it crashed?
2) Does it appear that I have a configuration problem (the
delta-syncrepl taking about five times as long to get the changes out to
the replicas as it took to apply them on the master)? If so, where would
you suggest I look?
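The "about five times" figure in question 2 follows from the numbers given earlier (31,973 changes, 8 minutes on the master, 33 to 44 minutes on the replicas). A small arithmetic check, using only those figures; the variable names are illustrative:

```python
CHANGES = 31_973          # changes in the ldapmodify input file
MASTER_MINUTES = 8        # time for ldapmodify on the master
REPLICA_MINUTES = (33, 44)  # fastest and slowest replica catch-up times


def rate(changes: int, minutes: float) -> float:
    """Changes applied per minute."""
    return changes / minutes


master_rate = rate(CHANGES, MASTER_MINUTES)  # roughly 3,997 changes/min
for m in REPLICA_MINUTES:
    r = rate(CHANGES, m)
    print(f"{m} min: {r:.0f} changes/min, {master_rate / r:.1f}x slower than master")
# 33 min: 969 changes/min, 4.1x slower than master
# 44 min: 727 changes/min, 5.5x slower than master
```

So the replicas ran roughly 4 to 5.5 times slower than the master, consistent with the average of about 800 changes per minute.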
Frank Swasey | http://www.uvm.edu/~fcs
Sr Systems Administrator | Always remember: You are UNIQUE,
University of Vermont | just like everyone else.
"I am not young enough to know everything." - Oscar Wilde (1854-1900)