
Re: (ITS#5171) hdb txn_checkpoint failures



> itself. Again, we can't really tell without single-stepping thru the BDB 
> library code. It may not be worth the effort, but that's your call.

The lock was

env_region.c:290         MUTEX_LOCK(dbenv, &renv->mutex);

but that wasn't making much sense... and after a couple of minutes in dbx
I realized that I'd been shooting myself in the foot with my attempts at
db_stat. Yesterday's attempts ran db_* binaries built with a wrong (but
compatible) ABI. It'd be nice if Sleepycat had more (or earlier) checks
for that, but oh well...

So anyway, I corrupted base2/slave4 by running the wrong db_stat, but that 
left three other bases on slave4 and all three bases on slave6. I ran 
db_stat -l on them; the output is at:

https://www.nbcs.rutgers.edu/~richton/its5171_dbstatl


BTW, this ABI screwup shouldn't be the root cause of the failures... I 
hadn't run any db tools until I started debugging this. These bases are 
AUTOREMOVE, so db_archive, for instance, is unlikely to have been run.