git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Neeraj Singh <nksingh85@gmail.com>
To: Christoph Hellwig <hch@lst.de>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com>,
	"Git List" <git@vger.kernel.org>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Jeff King" <peff@peff.net>,
	"Jeff Hostetler" <jeffhost@microsoft.com>,
	"Neeraj Singh" <neerajsi@microsoft.com>
Subject: Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
Date: Tue, 31 Aug 2021 12:59:14 -0700	[thread overview]
Message-ID: <CANQDOdfV3omEBHOAq1b2P4Wb5=FCtrtsy22VvRu+FneFx9o9Gw@mail.gmail.com> (raw)
In-Reply-To: <20210828065700.GA31211@lst.de>

On Fri, Aug 27, 2021 at 11:57 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Fri, Aug 27, 2021 at 05:20:44PM -0700, Neeraj Singh wrote:
> > You're right. On re-read of the man page, sync_file_range is listed as
> > an "extremely dangerous"
> > system call.  The opportunity in the linux kernel is to offer an
> > alternative set of flags or separate
> > API that allows for an application like Git to separate a metadata
> > writeback request from the disk flush.
>
> How do you want to do that?  I metadata writeback without a cache flush
> is worse than useless, in fact it is generally actively harmful.
>
> To take XFS as an example:  fsync and fdatasync do the following thing:
>
>  1) writeback all dirty data for file to the data device
>  2) flush the write cache of the data device to ensure they are really
>     on disk before writing back the metadata referring to them
>  3) write out the log up until the log sequence that contained the last
>     modifications to the file
>  4) flush the cache for the log device.
>     If the data device and the log device are the same (they usually are
>     for common setups) and the log device support the FUA bit that writes
>     through the cache, the log writes use that bit and this step can
>     be skipped.
>
> So in general there are very few metadata writes, and it is absolutely
> essential to flush the cache before that, because otherwise your metadata
> could point to data that might not actually have made it to disk.
>
> The best way to optimize such a workload is by first batching all the
> data writeout for multiple fils in step one, then only doing one cache
> flush and one log force (as we call it) to cover all the files.  syncfs
> will do that, but without a good way to pick individual files.

Yes, I think we want to do step (1) of your sequence for all of the files, then
issue steps (2-4) for all files as a group.  Of course, if the log
fills up then we
can flush the intermediate steps.  The unfortunate thing is that
there's no Linux interface
to do step (1) and to also ensure that the relevant data is in the log
stream or is
otherwise available to be part of the durable metadata.

It seems to me that XFS would be compatible with this sequence if the
appropriate
kernel API exists.

>
> > Separately, I'm hoping I can push from the Windows filesystem side to
> > get a barrier primitive put into
> > the NVME standard so that we can offer more useful behavior to
> > applications rather than these painful
> > hardware flushes.
>
> I'm not sure what you mean with barriers, but if you mean the concept
> of implying a global ordering on I/Os as we did in Linux back in the
> bad old days the barrier bio flag, or badly reinvented by this paper:
>
>   https://www.usenix.org/conference/fast18/presentation/won
>
> they might help a little bit with single threaded operations, but will
> heavily degrade I/O performance for multithreaded workloads.  As an
> active member of (but not speaking for) the NVMe technical working group
> with a bit of knowledge of SSD internals I also doubt it will be very
> well received there.

I looked at that paper and definitely agree with you about the questionable
implementation strategy they picked. I don't (yet) have detailed knowledge of
SSD internals, but it's surprising to me that there is little value to
barrier semantics
within the drive as opposed to a full durability sync. At least for
Windows, we have
a database (the Registry) for which any single-threaded latency
improvement would
be welcome.

  reply	other threads:[~2021-08-31 19:59 UTC|newest]

Thread overview: 160+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-25  1:51 [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-08-25  1:51 ` [PATCH 1/2] object-file: use futimes rather than utime Neeraj Singh via GitGitGadget
2021-08-25 13:51   ` Johannes Schindelin
2021-08-25 22:08     ` Neeraj Singh
2021-08-25  1:51 ` [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes Neeraj Singh via GitGitGadget
2021-08-25  5:38   ` Christoph Hellwig
2021-08-25 17:40     ` Neeraj Singh
2021-08-26  5:54       ` Christoph Hellwig
2021-08-25 16:11   ` Ævar Arnfjörð Bjarmason
2021-08-26  0:49     ` Neeraj Singh
2021-08-26  5:50       ` Christoph Hellwig
2021-08-28  0:20         ` Neeraj Singh
2021-08-28  6:57           ` Christoph Hellwig
2021-08-31 19:59             ` Neeraj Singh [this message]
2021-09-01  5:09               ` Christoph Hellwig
2021-08-26  5:57     ` Christoph Hellwig
2021-08-25 18:52   ` Johannes Schindelin
2021-08-25 21:26     ` Junio C Hamano
2021-08-26  1:19     ` Neeraj Singh
2021-08-25 16:58 ` [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles Neeraj Singh
2021-08-27 23:49 ` [PATCH v2 0/6] Implement a batched fsync " Neeraj K. Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 1/6] object-file: use futimens rather than utime Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 2/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 3/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 4/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 5/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 6/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-07 19:44   ` [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
2021-09-07 19:50     ` Ævar Arnfjörð Bjarmason
2021-09-07 19:54     ` Randall S. Becker
2021-09-08  0:54       ` Neeraj Singh
2021-09-08  1:22         ` Ævar Arnfjörð Bjarmason
2021-09-08 14:04           ` Randall S. Becker
2021-09-08 19:01           ` Neeraj Singh
2021-09-08  0:55     ` Neeraj Singh
2021-09-08  6:44       ` Junio C Hamano
2021-09-08  6:49         ` Christoph Hellwig
2021-09-08 13:57           ` Randall S. Becker
2021-09-08 14:13             ` 'Christoph Hellwig'
2021-09-08 14:25               ` Randall S. Becker
2021-09-08 16:34         ` Neeraj Singh
2021-09-08 19:12           ` Junio C Hamano
2021-09-08 19:20             ` Neeraj Singh
2021-09-08 19:23           ` Ævar Arnfjörð Bjarmason
2021-09-14  3:38   ` [PATCH v3 " Neeraj K. Singh via GitGitGadget
2021-09-14  3:38     ` [PATCH v3 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-14  3:38     ` [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-14 10:39       ` Bagas Sanjaya
2021-09-14 19:05         ` Neeraj Singh
2021-09-14 19:34       ` Junio C Hamano
2021-09-14 20:33         ` Junio C Hamano
2021-09-15  4:55         ` Neeraj Singh
2021-09-14  3:38     ` [PATCH v3 3/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-14  3:38     ` [PATCH v3 4/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-14 19:35       ` Junio C Hamano
2021-09-14  3:38     ` [PATCH v3 5/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-14  3:38     ` [PATCH v3 6/6] core.fsyncobjectfiles: enable batch mode for testing Neeraj Singh via GitGitGadget
2021-09-15 16:21       ` Junio C Hamano
2021-09-15 22:43         ` Neeraj Singh
2021-09-15 23:12           ` Junio C Hamano
2021-09-16  6:19             ` Junio C Hamano
2021-09-14  5:49     ` [PATCH v3 0/6] Implement a batched fsync option for core.fsyncObjectFiles Christoph Hellwig
2021-09-20 22:15     ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
2021-09-20 22:15       ` [PATCH v4 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-20 22:15       ` [PATCH v4 2/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-21 23:16         ` Ævar Arnfjörð Bjarmason
2021-09-22  1:23           ` Neeraj Singh
2021-09-22  2:02             ` Ævar Arnfjörð Bjarmason
2021-09-22 19:46             ` Neeraj Singh
2021-09-20 22:15       ` [PATCH v4 3/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-21 23:42         ` Ævar Arnfjörð Bjarmason
2021-09-22  1:23           ` Neeraj Singh
2021-09-20 22:15       ` [PATCH v4 4/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-21 23:46         ` Ævar Arnfjörð Bjarmason
2021-09-22  1:27           ` Neeraj Singh
2021-09-23 22:32             ` Neeraj Singh
2021-09-20 22:15       ` [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-21 23:54         ` Ævar Arnfjörð Bjarmason
2021-09-22  1:30           ` Neeraj Singh
2021-09-22  1:58             ` Ævar Arnfjörð Bjarmason
2021-09-22 17:55               ` Neeraj Singh
2021-09-22 20:01                 ` Ævar Arnfjörð Bjarmason
2021-09-20 22:15       ` [PATCH v4 6/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-24 20:12       ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 1/7] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 2/7] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 3/7] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-24 21:47           ` Neeraj Singh
2021-09-24 20:12         ` [PATCH v5 4/7] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-24 21:49           ` Neeraj Singh
2021-09-24 20:12         ` [PATCH v5 5/7] unpack-objects: " Neeraj Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 6/7] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 7/7] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-24 23:31         ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
2021-09-24 23:53         ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 1/8] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 2/8] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 3/8] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-25  3:15             ` Bagas Sanjaya
2021-09-27  0:27               ` Neeraj Singh
2021-09-24 23:53           ` [PATCH v6 4/8] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-27 20:07             ` Junio C Hamano
2021-09-27 20:55               ` Neeraj Singh
2021-09-27 21:03                 ` Neeraj Singh
2021-09-27 23:53                   ` Junio C Hamano
2021-09-24 23:53           ` [PATCH v6 5/8] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 6/8] unpack-objects: " Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 7/8] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 8/8] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 1/9] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
2021-09-28 23:55               ` Jeff King
2021-09-29  0:10                 ` Neeraj Singh
2021-09-28 23:32             ` [PATCH v7 2/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
2021-09-29  8:41               ` Elijah Newren
2021-09-29 16:40                 ` Neeraj Singh
2021-09-28 23:32             ` [PATCH v7 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 2/9] tmp-objdir: disable ref updates when replacing the primary odb Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-11-15 23:50                 ` [PATCH v9 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
2021-11-30 21:27                   ` Elijah Newren
2021-11-30 21:52                     ` Neeraj Singh
2021-11-30 22:36                       ` Elijah Newren
2021-11-15 23:50                 ` [PATCH v9 2/9] tmp-objdir: disable ref updates when replacing the primary odb Neeraj Singh via GitGitGadget
2021-11-16  7:23                   ` Ævar Arnfjörð Bjarmason
2021-11-16 20:38                     ` Neeraj Singh
2021-11-15 23:50                 ` [PATCH v9 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-11-15 23:50                 ` [PATCH v9 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-11-15 23:50                 ` [PATCH v9 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-11-15 23:51                 ` [PATCH v9 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-11-15 23:51                 ` [PATCH v9 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
2021-11-15 23:51                 ` [PATCH v9 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-11-15 23:51                 ` [PATCH v9 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-11-16  8:02                 ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
2021-11-17  7:06                   ` Neeraj Singh
2021-11-17  7:24                     ` Ævar Arnfjörð Bjarmason
2021-11-18  5:03                       ` Neeraj Singh
2021-12-01 14:15                         ` Ævar Arnfjörð Bjarmason
2022-03-09 23:02                           ` Ævar Arnfjörð Bjarmason
2022-03-10  1:16                             ` Neeraj Singh
2022-03-10 14:01                               ` Ævar Arnfjörð Bjarmason
2022-03-10 17:52                                 ` Neeraj Singh
2022-03-10 18:08                                   ` rsbecker
2022-03-10 18:43                                     ` Neeraj Singh
2022-03-10 18:48                                       ` rsbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANQDOdfV3omEBHOAq1b2P4Wb5=FCtrtsy22VvRu+FneFx9o9Gw@mail.gmail.com' \
    --to=nksingh85@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=hch@lst.de \
    --cc=jeffhost@microsoft.com \
    --cc=neerajsi@microsoft.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).