Re: (ITS#8475) Feature request: MDB low durability transactions
On 08/07/2016 05:44 PM, Howard Chu wrote:
> The only way to guarantee integrity is with ordered writes. All SCSI
> devices support this feature, but e.g. the Linux kernel does not (and
> neither does SATA, and no idea about PCIe SSDs...).
> Lacking a portable mechanism for ordered writes, you have two choices
> for preserving integrity - append-only operation (which forces ordered
> writes anyway) or at least one synchronous write somewhere.
> Whenever you decide to reuse existing pages rather than operating as
> append-only, you create the possibility of overwriting some required
> data before it was safe to do so. Your 3-root checksum scheme *might*
> let you detect that the DB is corrupted, but it *won't* let you recover
> to a clean state. Given that writes occur in unpredictable order,
> without fsyncs there is no way you can guarantee that anything sane is
> on the disk.
Consider three roots without any checksums. Each root has a simple flag
indicating whether it was written durably (i.e. its commit was followed
by an fsync acting as a write barrier). During recovery, non-durable
roots are simply ignored/discarded. This is
equivalent to Hallvard's suggestion for volatile meta-pages. I think
it's pretty clear this is workable.
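
To make the recovery rule above concrete, here is a minimal sketch in
Python. It is not LMDB code: the Root record, its fields (txn_id,
durable, page_no) and the recover() function are hypothetical names
made up for illustration. It only shows root selection, and it assumes
that pages reachable from a durable root are never reused until a newer
durable root exists.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Root:
    """Hypothetical root/meta record; not LMDB's actual meta-page layout."""
    txn_id: int    # transaction that wrote this root
    durable: bool  # True only if the commit was followed by an fsync
    page_no: int   # page number of the B-tree root for this snapshot


def recover(roots: Sequence[Root]) -> Optional[Root]:
    """Return the newest durable root, ignoring all non-durable ones.

    Returns None if no root was ever written durably.
    """
    durable_roots = [r for r in roots if r.durable]
    if not durable_roots:
        return None
    return max(durable_roots, key=lambda r: r.txn_id)


# Two low-durability commits (11, 12) happened after the last durable one (10).
roots = [
    Root(txn_id=10, durable=True, page_no=100),
    Root(txn_id=11, durable=False, page_no=140),
    Root(txn_id=12, durable=False, page_no=155),
]
chosen = recover(roots)
print(chosen)
```

With these sample roots, recovery falls back to transaction 10 and the
two later non-durable commits are discarded, which is the trade-off a
low-durability transaction accepts.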
From there, checksums just give you slightly stronger guarantees,
although they might not be worth the overhead (CPU/storage) and recovery