
Re: (ITS#5171) hdb txn_checkpoint failures



> One more thing to check is just using "ls -l" to see if the actual size of 
> the log files corresponds with the db_stat offsets. E.g. if slave6 base1's 
> log.0000001 is really 8MB but the LSN is only 233KB, then we have to look for 
> a weird in-memory corruption. If not, then somebody reset your logs.

No, it looks like those sizes all match. Actually, "reset logs" may well 
be the case. I still can't imagine how, but I'm willing to just chalk this 
whole thing up to user error (of course the logs show that the user was 
me, which is a shame :). With only one log file active that is hard to 
disprove, with the exception of base2. base2 has multiple log files 
going back:

[slave4]
-rw-------   1 root     root     9999986 Sep  6 18:03 log.0000000001
-rw-------   1 root     root     9999967 Sep 10 14:03 log.0000000002
-rw-------   1 root     root     9999983 Sep 18 16:33 log.0000000003
-rw-------   1 root     root     9429761 Oct  8 05:33 log.0000000004

[slave6]
-rw-------   1 root     root     9999986 Sep  6 18:03 log.0000000001
-rw-------   1 root     root     9999967 Sep 10 14:03 log.0000000002
-rw-------   1 root     root     9999983 Sep 18 16:33 log.0000000003
-rw-------   1 root     root     9429761 Oct  8 05:33 log.0000000004

These of course match the db_stat -l output, but they also extend back 
prior to September 24 according to the filesystem timestamps. I guess the 
argument could be made that log 4 was truncated on September 24. Would 
db_stat detect that, and would it come up sane or come up bad?
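
For anyone repeating the size check discussed above, here is a minimal sketch in Python. It only automates the "ls -l" side: it lists the log.NNNNNNNNNN files in an environment directory with their sizes and modification times, then compares one file's size with an LSN offset. The function names are hypothetical, and the LSN file number and offset are assumed to be read off db_stat -l by hand and passed in; the sketch does not parse db_stat output. Whether an exact byte match is expected depends on whether the log buffer has been flushed, so treat anything other than "larger" as a hint rather than a verdict.

```python
import os
import re
from datetime import datetime

# Log file names as they appear in the listings above: "log." plus ten digits.
LOG_NAME = re.compile(r"^log\.(\d{10})$")


def list_logs(env_dir):
    """Return sorted (file_number, size_bytes, mtime) tuples for each log file."""
    entries = []
    for name in os.listdir(env_dir):
        match = LOG_NAME.match(name)
        if match:
            st = os.stat(os.path.join(env_dir, name))
            mtime = datetime.fromtimestamp(st.st_mtime)
            entries.append((int(match.group(1)), st.st_size, mtime))
    return sorted(entries)


def check_against_lsn(entries, lsn_file, lsn_offset):
    """Compare the on-disk size of log file `lsn_file` with `lsn_offset`.

    Both arguments are assumed to be copied by hand from db_stat -l.
    Returns "missing", "match", "larger" or "smaller" (file vs. offset).
    """
    sizes = {number: size for number, size, _ in entries}
    if lsn_file not in sizes:
        return "missing"
    if sizes[lsn_file] == lsn_offset:
        return "match"
    return "larger" if sizes[lsn_file] > lsn_offset else "smaller"


# Example use (the path and LSN values are placeholders, not real data):
#   logs = list_logs("/path/to/base2")
#   for number, size, mtime in logs:
#       print(number, size, mtime.isoformat(" ", "seconds"))
#   print(check_against_lsn(logs, lsn_file=4, lsn_offset=9429761))
```

A "larger" result is the case described in the quoted text, where the file on disk is much bigger than the LSN says it should be. Running list_logs on two slaves and comparing the (number, size) pairs is a quick way to confirm that the listings agree, as the slave4 and slave6 ones above do.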