Re: (ITS#5171) hdb txn_checkpoint failures

To: openldap-its@OpenLDAP.org
Subject: Re: (ITS#5171) hdb txn_checkpoint failures
From: richton@nbcs.rutgers.edu
Date: Mon, 8 Oct 2007 17:21:41 GMT

> One more thing to check is just using "ls -l" to see if the actual size of 
> the log files corresponds with the db_stat offsets. E.g. if slave6 base1's 
> log.0000001 is really 8MB but the LSN is only 233KB, then we have to look for 
> a weird in-memory corruption. If not, then somebody reset your logs.

No, it looks like those sizes all match. Actually, "reset logs" may well 
be the case. I still can't imagine how, but I'm willing to chalk this 
whole thing up to user error (of course the logs show that the user was 
me, which is a shame :). With only one log file active, that is hard to 
disprove, with the exception of base2. base2 has multiple log files 
going back:

[slave4]
-rw-------   1 root     root     9999986 Sep  6 18:03 log.0000000001
-rw-------   1 root     root     9999967 Sep 10 14:03 log.0000000002
-rw-------   1 root     root     9999983 Sep 18 16:33 log.0000000003
-rw-------   1 root     root     9429761 Oct  8 05:33 log.0000000004

[slave6]
-rw-------   1 root     root     9999986 Sep  6 18:03 log.0000000001
-rw-------   1 root     root     9999967 Sep 10 14:03 log.0000000002
-rw-------   1 root     root     9999983 Sep 18 16:33 log.0000000003
-rw-------   1 root     root     9429761 Oct  8 05:33 log.0000000004

These of course match the db_stat -l output, but according to the 
filesystem timestamps they also extend back before September 24. I guess 
the argument could be made that log 4 was truncated on September 24. 
Would db_stat detect that, or would it come up sane or come up bad?
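
For anyone repeating the size-versus-LSN comparison by hand, here is a rough 
sketch of the same check in Python. It does not call or parse db_stat; the 
current log file number and offset are read off "db_stat -l" by eye and 
passed in. The function names (log_sizes, check_lsn) are made up for this 
example, and the rule that no log file should be numbered higher than the 
current one is my assumption, not something confirmed in this thread.

```python
import os
import re

# Log files are named log.NNNNNNNNNN (ten digits), as in the listings above.
LOG_NAME = re.compile(r"^log\.(\d{10})$")


def log_sizes(env_dir):
    """Return {log file number: size in bytes} for the log files in env_dir."""
    sizes = {}
    for name in os.listdir(env_dir):
        match = LOG_NAME.match(name)
        if match:
            path = os.path.join(env_dir, name)
            sizes[int(match.group(1))] = os.path.getsize(path)
    return sizes


def check_lsn(sizes, lsn_file, lsn_offset):
    """Compare on-disk sizes with an LSN (file number, offset).

    Returns a list of problem descriptions; an empty list means the
    file sizes are consistent with the LSN. The LSN is supplied by the
    caller, e.g. copied by hand from the db_stat -l output.
    """
    problems = []
    if lsn_file not in sizes:
        problems.append("log file %d is missing" % lsn_file)
    elif sizes[lsn_file] < lsn_offset:
        # The file is shorter than the offset the environment believes
        # it has written to, which is what a truncation would look like.
        problems.append(
            "log file %d is %d bytes, shorter than LSN offset %d"
            % (lsn_file, sizes[lsn_file], lsn_offset)
        )
    # Assumption: files numbered above the current LSN file are unexpected.
    for number in sorted(n for n in sizes if n > lsn_file):
        problems.append("log file %d is newer than the LSN file" % number)
    return problems
```

With the base2 sizes listed above and a hypothetical LSN of file 4, offset 
9429761, check_lsn(log_sizes(path), 4, 9429761) returns an empty list. Note 
that this only catches a file that is shorter than the recorded offset; it 
says nothing about whether the contents of the file are intact.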