
Re: (ITS#5171) hdb txn_checkpoint failures

> No. The BDB transaction log files don't know (or care) anything about IP 
> addresses. Nothing at the slapd layer could have any direct effect on the BDB 
> transaction logs. How exactly did you reconfigure the servers? Did you stop
> and restart them, or did you use cn=config?

echo 192.blahblah master.r.e >> /etc/hosts

The master changed from 128.blahblah to 192.blahblah. Same physical
machine, just a different interface. On slave4 and slave6, I didn't touch slapd.

> Might as well get the db_stat -l output for a few of them to compare.

This isn't going well at all; db_stat just can't join the environment. I
tried on slave1 and it hung. I tried on slave4 under truss and it hung. (We're
talking >30 minutes here.) Although I swear I've run db_stat hot before, I
killed db_stat (ungracefully, sadly) and stopped slapd (gracefully) on slave1,
then ran db_stat again. It hung there too, and corrupted the environment to
the point where I couldn't get db_recover or slapd to run. (I ended up
blowing the slave1 database away; it's refreshing via syncrepl now.)

I've got a few more slaves that I haven't shot in the foot yet, and I only
tried this on one of the suffixes on slave{1,4}. Plenty more
opportunities to screw this up if there's anything else to try... I suppose
I could go for -N, or, if the command line is going to be a pain, I could
attach to the slapd process with dbx and print ->log_stat myself (although I
might need a bit of hand-holding on that)...

[the hang on slave4]
db_stat         ->    libdb-4.2.so:*db_env_create(0xffbffaec, 0x0, 0x17154)
lwp_mutex_lock(0xFF0D0000)      (sleeping...)
         mutex type: USYNC_PROCESS

  ff307248 __db_des_get (29ac0, 29d78, 29d78, ffbff9d0, 0, ffbff9d9) + c0
  ff305780 __db_e_attach (29ac0, ffbffa94, 40400, 40000, 33e021, 29d71) + 6e0
  ff2ff434 __dbenv_open (29ac0, 0, 40400, 0, 0, 0) + 664
  00016514 db_init  (29ac0, 0, 4, 100000, ffbffba0, ff3deb54) + 64
  00011e3c main     (2, ffbffc44, ffbffc50, 29800, 0, 0) + 9a4
  00011470 _start   (0, 0, 0, 0, 0, 0) + 108