Re: (ITS#8475) Feature request: MDB low durability transactions



On 08/07/2016 05:44 PM, Howard Chu wrote:
> The only way to guarantee integrity is with ordered writes. All SCSI
> devices support this feature, but e.g. the Linux kernel does not (and
> neither does SATA, and no idea about PCIe SSDs...).
>
> Lacking a portable mechanism for ordered writes, you have two choices
> for preserving integrity - append-only operation (which forces ordered
> writes anyway) or at least one synchronous write somewhere.
>
> Whenever you decide to reuse existing pages rather than operating as
> append-only, you create the possibility of overwriting some required
> data before it was safe to do so. Your 3-root checksum scheme *might*
> let you detect that the DB is corrupted, but it *won't* let you recover
> to a clean state. Given that writes occur in unpredictable order,
> without fsyncs there is no way you can guarantee that anything sane is
> on the disk.

Consider three roots without any checksums. Each root carries a simple
flag indicating whether it was written durably (i.e. behind an fsync
acting as a write barrier). During recovery, non-durable roots are
simply ignored/discarded. This is equivalent to Hallvard's suggestion
for volatile meta-pages. I think it's pretty clear this is workable.
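
To make the recovery rule concrete, here is a minimal sketch in C. The
struct layout and the names (struct root, pick_recovery_root) are made
up for illustration and are not LMDB's actual meta-page format. It also
assumes that a root's flag is only set once everything it references
has been fsynced, and that txnid 0 means "slot never written".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ROOTS 3  /* the three root slots discussed above */

/* Hypothetical root record; NOT the real LMDB meta-page layout. */
struct root {
    uint64_t txnid;      /* txn that wrote this root; 0 = never written */
    uint64_t root_pgno;  /* page number of the tree root */
    int      durable;    /* nonzero if written behind an fsync barrier */
};

/* Recovery: ignore non-durable roots and return the index of the
 * newest durable one, or -1 if no durable root exists. */
int pick_recovery_root(const struct root *roots, size_t n)
{
    int best = -1;

    assert(roots != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!roots[i].durable || roots[i].txnid == 0)
            continue;  /* discard non-durable or unused slots */
        if (best < 0 || roots[i].txnid > roots[best].txnid)
            best = (int)i;
    }
    return best;
}
```

With slots holding txns 10 (durable), 12 (non-durable) and 11
(durable), this picks the slot with txn 11: the newest transaction is
thrown away, but the state recovered is one that was actually synced.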

From there, checksums just give you slightly stronger guarantees,
although they might not be worth the overhead (CPU/storage) and recovery 
complexity.