
slapd mysteriously quits/crashes/dies without last words



 Hello.

slapd has been dying more and more frequently: from about once a week
last month to 3 to 5 times a day now. Running
"/etc/init.d/slapd start" brings it back instantly, but only for the
next few hours. The very same web application that uses the LDAP
database had been running as-is for about a year without this problem,
and the server machine has not been changed, replaced, or touched.
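
To pin down exactly when each death happens, something like the following could run alongside slapd. It is only a minimal sketch: the function names are made up for illustration, and it assumes slapd listens on TCP port 389 on the local machine (the port shown in the log excerpts in this message). A successful TCP connect only shows that something accepts connections on that port, not that slapd answers LDAP requests.

```python
import socket
import time


def slapd_is_listening(host="127.0.0.1", port=389, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable, etc.
        return False


def watch(interval=10.0, max_checks=None):
    """Print a timestamped line whenever the up/down state changes.

    `interval` is the pause between probes in seconds; `max_checks`
    limits the number of probes (None means run until interrupted).
    """
    last_state = None
    checks = 0
    while max_checks is None or checks < max_checks:
        up = slapd_is_listening()
        if up != last_state:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, "slapd up" if up else "slapd DOWN")
            last_state = up
        checks += 1
        time.sleep(interval)
```

Calling `watch()` from a terminal or a screen session would leave a record of the up/down transitions that can be lined up against syslog and the server monitor.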

What we have tried so far (in this order):

   1. Ran a server monitor to make sure the server load was not high
      (it was below 0.5) when slapd died.
   2. Upgraded to 2.4.11-1+lenny2 (on Debian).
   3. Dumped the most heavily used database (hdb) with slapcat and
      loaded it back in with slapadd.
   4. Did the same for the other database (bdb).
   5. Tracked the log messages at log level 256 (connections,
      operations, results) and found no clue. For example, one time
      the last words were:

      Oct 25 16:06:59 www slapd[11969]: conn=26289 fd=29 ACCEPT from IP=**.**.**.**:56539 (IP=0.0.0.0:389) 
      Oct 25 16:06:59 www slapd[11969]: conn=26289 op=0 BIND dn="cn=admin,dc=*******" method=128 
      Oct 25 16:06:59 www slapd[11969]: conn=26289 op=0 BIND dn="cn=admin,dc=*******" mech=SIMPLE ssf=0 
      Oct 25 16:06:59 www slapd[11969]: conn=26289 op=0 RESULT tag=97 err=0 text= 
      Oct 25 16:06:59 www slapd[11969]: conn=26288 op=8 UNBIND 
      Oct 25 16:06:59 www slapd[11969]: conn=26288 fd=41 closed 
      Oct 25 16:06:59 www slapd[11969]: conn=26289 op=1 UNBIND 
      Oct 25 16:06:59 www slapd[11969]: conn=26289 fd=29 closed

      Another time it ended with:


      Oct 25 16:27:09 www slapd[25691]: conn=2750 fd=51 ACCEPT from IP=**.**.**.**:54846 (IP=0.0.0.0:389) 
      Oct 25 16:27:09 www slapd[25691]: conn=2750 op=1 SRCH base="ou=contacts,ou=china,dc=*******" scope=2 deref=0 filter="(uidNumber=7762)" 
      Oct 25 16:27:10 www slapd[25691]: conn=2750 op=1 SRCH attr=o mail telephonenumber contactperson c st l street postalcode postofficebox facsimiletelephonenumber labeleduri businesscategory description pnglogo changetime lastrecapdate objectclass category	

   6. Tracked the log messages at log level 4 (heavy trace debugging)
      and again found no clue. For example, one time the last words
      were:

      Oct 25 22:23:48 www slapd[723]: connection_get(50) 
      Oct 25 22:23:48 www slapd[723]: SRCH "ou=contacts,ou=china,dc=*******" 2 0
      Oct 25 22:23:48 www slapd[723]:     0 0 0 
      Oct 25 22:23:48 www slapd[723]:     filter: (uidNumber=2) 
      Oct 25 22:23:48 www slapd[723]:     attrs:
      Oct 25 22:23:49 www slapd[723]:  
      Oct 25 22:23:49 www slapd[723]: connection_get(56) 
      Oct 25 22:23:49 www slapd[723]: SRCH "ou=contacts,ou=china,dc=*******" 2 0
      Oct 25 22:23:49 www slapd[723]:     0 0 0 
      Oct 25 22:23:49 www slapd[723]:     filter: (uidNumber=2) 
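
The excerpts above come from different slapd processes (the PID in brackets changes after each restart), so one way to compare the "last words" across many deaths is to collect the final few log lines per PID. The following is a rough sketch, not a tested tool: it assumes the syslog prefix looks exactly like the lines quoted above, and the name `last_words` is only illustrative.

```python
import re
from collections import defaultdict, deque

# Matches the syslog prefix seen in the excerpts above, e.g.
# "Oct 25 16:06:59 www slapd[11969]: conn=26289 fd=29 closed"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"slapd\[(?P<pid>\d+)\]:\s?(?P<msg>.*)$"
)


def last_words(lines, keep=5):
    """Return {pid: [(timestamp, message), ...]} holding the last `keep` lines per slapd PID."""
    tails = defaultdict(lambda: deque(maxlen=keep))
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            pid = int(match.group("pid"))
            tails[pid].append((match.group("ts"), match.group("msg")))
    return {pid: list(tail) for pid, tail in tails.items()}


# Small demonstration using lines copied from the excerpts above.
sample = [
    "Oct 25 16:06:59 www slapd[11969]: conn=26289 op=1 UNBIND",
    "Oct 25 16:06:59 www slapd[11969]: conn=26289 fd=29 closed",
    "Oct 25 16:27:09 www slapd[25691]: conn=2750 fd=51 ACCEPT from IP=**.**.**.**:54846 (IP=0.0.0.0:389)",
]
for pid, tail in last_words(sample, keep=1).items():
    print(pid, tail[-1])
```

Passing an open log file instead of `sample` (for example `last_words(open(path))`, where `path` is wherever syslog writes slapd messages on the machine) would give one tail per slapd incarnation. If every tail ends in the same kind of operation, that would at least narrow down where to look.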


Help, hints, and pointers to the specific documentation we should read
would be highly appreciated. We are also willing to *pay* for help from
someone who can log in remotely and solve this problem (please email me
and my colleague on the 'cc' with a quote). The problem has exhausted
us all.

Thanks in advance!

Zhang Weiwu