[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#8475) Feature request: MDB low durability transactions



Ben Trask wrote:
> On 08/07/2016 05:44 PM, Howard Chu wrote:
>> The only way to guarantee integrity is with ordered writes. All SCSI
>> devices support this feature, but e.g. the Linux kernel does not (and
>> neither does SATA, and no idea about PCIe SSDs...).
>>
>> Lacking a portable mechanism for ordered writes, you have two choices
>> for preserving integrity - append-only operation (which forces ordered
>> writes anyway) or at least one synchronous write somewhere.
>>
>> Whenever you decide to reuse existing pages rather than operating as
>> append-only, you create the possibility of overwriting some required
>> data before it was safe to do so. Your 3-root checksum scheme *might*
>> let you detect that the DB is corrupted, but it *won't* let you recover
>> to a clean state. Given that writes occur in unpredictable order,
>> without fsyncs there is no way you can guarantee that anything sane is
>> on the disk.
>
> Consider three roots without any checksums. Each root has a simple flag
> indicating whether it was written durably (fsync write barrier). During
> recovery, non-durable roots are simply ignored/discarded. This is equivalent
> to Hallvard's suggestion for volatile meta-pages. I think it's pretty clear
> this is workable.
>
>  From there, checksums just give you slightly stronger guarantees, although
> they might not be worth the overhead (CPU/storage) and recovery complexity.

Knowing whether or not the root pages are pristine still doesn't tell you 
anything about whether the data pages are intact. The only way to make any of 
these schemes work is to avoid overwriting/reusing any data pages for the last 
N transactions. I.e., reverting to append-only behavior. So the underlying 
question (which we have wrestled with internally for quite some time) which 
you haven't asked or answered - how many of these non-durable transactions 
will you support at any given time?

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/