Re: (ITS#9017) Improving performance of commit sync in Windows



kriszyp@gmail.com wrote:
> Full_Name: Kristopher William Zyp
> Version: LMDB 0.9.23
> OS: Windows
> URL: https://github.com/kriszyp/node-lmdb/commit/7ff525ae57684a163d32af74a0ab9332b7fc4ce9
> Submission from: (NULL) (71.199.6.148)
> 
> 
> We have seen very poor performance on the sync of commits on large databases in
> Windows. On databases with 2GB of data, in writemap mode, the sync of even small
> commits is consistently well over 100ms (without writemap it is faster, but
> still slow). It is expected that a sync should take some time while waiting for
> disk confirmation of the writes, but more concerning is that these sync
> operations (in writemap mode) are instead dominated by nearly 100% system CPU
> utilization, so transactions that require only sub-millisecond b-tree updates
> are then dominated by very large amounts of system CPU cycles during the sync
> phase.
> 
> I think that the fundamental problem is that FlushViewOfFile seems to be an O(n)
> operation where n is the size of the file (or map). I presume that Windows is
> scanning the entire map/file for dirty pages to flush, I'm guessing because it
> doesn't have an internal index of all the dirty pages for every file/map-view in
> the OS disk cache. Therefore, this turns into an extremely expensive, CPU-bound
> operation to find the dirty pages of a large file and initiate their writes,
> which, of course, is contrary to the whole goal of a scalable database system.
> FlushFileBuffers is also relatively slow. We have attempted to batch as many
> operations into a single transaction as possible, but this is still a very
> large overhead.
> 
> The Windows docs for FlushFileBuffers themselves warn about the inefficiencies of
> this function (https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-flushfilebuffers).
> They also point to the solution: it is much faster to write out the dirty
> pages with WriteFile through a sync file handle (FILE_FLAG_WRITE_THROUGH).
> 
> The associated patch
> (https://github.com/kriszyp/node-lmdb/commit/7ff525ae57684a163d32af74a0ab9332b7fc4ce9)
> is my attempt at implementing this solution, for Windows. Fortunately, with the
> design of LMDB, this is relatively straightforward. LMDB already supports
> writing out dirty pages with WriteFile calls. I added a write-through handle for
> sending these writes directly to disk. I then made that file handle
> overlapped (asynchronous), so that all the writes for a commit can be started in
> overlapped mode and (at least theoretically) transfer in parallel to the drive,
> and then used GetOverlappedResult to wait for their completion. So basically
> mdb_page_flush becomes the sync. I extended the writing of dirty pages through
> WriteFile to writemap mode as well (for writing meta too), so that WriteFile
> with write-through can be used to flush the data without ever needing to call
> FlushViewOfFile or FlushFileBuffers. I also implemented support for write
> gathering in writemap mode, where contiguous file positions imply contiguous
> memory (by tracking the starting position with wdp and writing contiguous pages
> in single operations). Sorting of the dirty list is maintained even in writemap
> mode for this purpose.
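
For readers following the thread, the write-gathering step described above can
be sketched roughly as follows. This is a minimal, portable illustration and
not the code from the patch: the page_run type and the gather_runs function
are hypothetical names, and the sketch only assumes that the dirty list is
sorted by page number with no duplicates. Each resulting run could then be
issued as a single write at file offset first_pgno * page_size with length
count * page_size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration; not one of LMDB's own structures. */
typedef struct {
    size_t first_pgno;  /* first page number in the run */
    size_t count;       /* number of contiguous pages in the run */
} page_run;

/*
 * Coalesce a sorted list of dirty page numbers into contiguous runs.
 * Assumes pgnos is sorted ascending with no duplicates, and that runs
 * has room for n entries (the worst case: no two pages are adjacent).
 * Returns the number of runs stored in runs.
 */
size_t gather_runs(const size_t *pgnos, size_t n, page_run *runs)
{
    size_t nruns = 0;

    assert(n == 0 || (pgnos != NULL && runs != NULL));

    for (size_t i = 0; i < n; i++) {
        if (nruns > 0 &&
            runs[nruns - 1].first_pgno + runs[nruns - 1].count == pgnos[i]) {
            /* Page directly follows the current run: extend it. */
            runs[nruns - 1].count++;
        } else {
            /* Gap in page numbers: start a new run. */
            runs[nruns].first_pgno = pgnos[i];
            runs[nruns].count = 1;
            nruns++;
        }
    }
    return nruns;
}
```

With this sketch, a dirty list of pages 3, 4, 5, 9, 10, 20 collapses into
three runs, so three write calls would be started instead of six.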

What is the point of using writemap mode if you still need to use WriteFile
on every individual page?

> The performance benefits of this patch, in my testing, are considerable. Writing
> out/syncing transactions is typically over 5x faster in writemap mode, and 2x
> faster in standard mode. And perhaps more importantly (especially in environments
> with many threads/processes), the efficiency benefits are even larger,
> particularly in writemap mode, where there can be a 50-100x reduction in the
> system CPU usage by using this patch. This brings Windows performance with
> sync'ed transactions in LMDB back into the range of "lightning" performance :).

What is the performance difference between your patch using writemap, and just
not using writemap in the first place?

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/