git@vger.kernel.org mailing list mirror (one of many)
From: Neeraj Singh <nksingh85@gmail.com>
To: Christoph Hellwig <hch@lst.de>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Git List" <git@vger.kernel.org>,
	"Patrick Steinhardt" <ps@pks.im>, "Jeff King" <peff@peff.net>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Neeraj K. Singh" <neerajsi@microsoft.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Eric Wong" <e@80x24.org>,
	"Emily Shaffer" <emilyshaffer@google.com>,
	linux-fsdevel@vger.kernel.org,
	"Amir Goldstein" <amir73il@gmail.com>
Subject: Re: RFC: A configuration design for future-proofing fsync() configuration
Date: Wed, 17 Nov 2021 10:49:20 -0800
Message-ID: <CANQDOdedAoOvPHra0e8PuOO68xt+gOSbbV3tHzGxcyJy5nTm_A@mail.gmail.com>
In-Reply-To: <20211112055421.GA27823@lst.de>

On Thu, Nov 11, 2021 at 9:54 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Nov 10, 2021 at 04:47:24PM -0800, Neeraj Singh wrote:
> > It would be nice to loop in some Linux fs developers to find out what can be
> > done on current implementations to get the durability without terrible
> > performance. From reading the docs and mailing threads it looks like the
> > sync_file_range + bulk fsync approach should actually work on the current XFS
> > implementation.
>
> If you want more than just my advice linux-fsdevel@vger.kernel.org is
> a good place to find a wide range of opinions.
>
> Anyway, I think syncfs is the biggest bang for the buck as it will give
> you very efficient syncing with very little overhead in git, but it does
> have a huge noisy neighbor problem that might make it unattractive
> for multi-tenant file systems or git hosting.

To summarize where we are for linux-fsdevel:
We're working on making Git preserve data added to the repo even if
the system crashes or loses power at some point soon after a Git
command completes. The default behavior of git-for-windows is to set
core.fsyncobjectfiles=true, which at least ensures durability for
loose object files.

The current implementation of core.fsyncobjectfiles inserts an fsync
between writing each new object to a temp name and renaming it to its
final hash-based name. This approach is slow when adding hundreds of
files to the repo [1]. The main cost on the hardware we tested is
actually the CACHE_FLUSH request sent down to the storage hardware.
There is also work in flight by Patrick Steinhardt to sync ref
files [2].
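
As a rough illustration of the per-object pattern described above, here
is a minimal C sketch. It is not Git's actual code: the helper names
(write_all, write_object_fsync) and the file mode are assumptions made
for the example, and EINTR handling is omitted. The point is only that
every object pays for its own fsync, and therefore its own CACHE_FLUSH,
before the rename.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>   /* rename() */
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, retrying short writes. */
int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n < 0)
			return -1;
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: write one object to its temp name, fsync it,
 * and only then rename it to its final name.
 */
int write_object_fsync(const char *tmp, const char *final,
		       const char *buf, size_t len)
{
	int fd;

	assert(tmp && final && buf);
	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	/* One fsync per object: this is the expensive step. */
	if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	if (close(fd) < 0)
		return -1;
	return rename(tmp, final);
}
```

Adding N objects this way costs N fsync calls, which is what the batch
mode tries to avoid.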

In a patch series at [3], I implemented a batch mode that issues
pagecache writeback for each object file as it is written. Then,
before any of the files are renamed to their final destination, we do
an fsync on a dummy file on the same filesystem. On Linux, this uses
sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE_AND_WAIT) to do the
pagecache writeback. According to Amir's thread at [4], this
flag combo should actually trigger the desired writeback. The
expectation is that the fsync of the dummy file should trigger a log
writeback and one or more CACHE_FLUSH commands to harden the block
mapping metadata and directory entries such that the data would be
retrievable after the fsync completes.
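
The batch sequence described above can be sketched in C roughly as
follows. This is an illustration under stated assumptions, not the code
from the patch series: struct pending, writeout_only, batch_commit and
the dummy file path are hypothetical names, a short write is simply
treated as an error, and the fallback to a full fsync when
sync_file_range is unavailable or fails is a choice made for this
sketch. Whether the single fsync really hardens the earlier writeback
is exactly the filesystem-specific question raised in this mail.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for sync_file_range() on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>   /* rename() */
#include <unistd.h>

#if defined(__linux__) && defined(SYNC_FILE_RANGE_WRITE)
# ifndef SYNC_FILE_RANGE_WRITE_AND_WAIT /* missing from older headers */
#  define SYNC_FILE_RANGE_WRITE_AND_WAIT (SYNC_FILE_RANGE_WAIT_BEFORE | \
	SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER)
# endif
# define HAVE_WRITEOUT_ONLY 1
#endif

/* Hypothetical description of one object waiting to be committed. */
struct pending {
	const char *tmp, *final, *data;
	size_t len;
};

/* Push the file's dirty pages to the device without a full flush. */
int writeout_only(int fd)
{
#ifdef HAVE_WRITEOUT_ONLY
	if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE_AND_WAIT) == 0)
		return 0;
#endif
	return fsync(fd); /* assumption: fall back to a full fsync */
}

int batch_commit(const struct pending *p, size_t n, const char *dummy)
{
	size_t i;
	int fd;

	assert(p && n > 0 && dummy);
	/* Phase 1: write each temp file and start/wait for its writeback. */
	for (i = 0; i < n; i++) {
		fd = open(p[i].tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
		if (fd < 0)
			return -1;
		if (write(fd, p[i].data, p[i].len) != (ssize_t)p[i].len ||
		    writeout_only(fd) < 0) {
			close(fd);
			return -1;
		}
		if (close(fd) < 0)
			return -1;
	}
	/* Phase 2: one fsync on a dummy file on the same filesystem. */
	fd = open(dummy, O_WRONLY | O_CREAT, 0644);
	if (fd < 0)
		return -1;
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	/* Phase 3: rename every temp file to its final name. */
	for (i = 0; i < n; i++)
		if (rename(p[i].tmp, p[i].final) < 0)
			return -1;
	return 0;
}
```

With this shape, the number of full fsync calls stays at one per batch
rather than one per object, provided the writeout-only path succeeds.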

The equivalent sequence is specified to work on the common Windows
filesystems [5]. The question I have for the Linux community is
whether the same sequence will work on any of the common extant Linux
filesystems such that it can provide value to Git users on Linux. My
understanding from Christoph Hellwig's comments is that on XFS at
least the sync_file_range, fsync, and rename sequence would allow us
to guarantee that the complete written contents of the file would be
visible if the new name is visible.  I also expect that an additional
fsync to a dummy file after the renames would ensure that the log
is forced again, which should ensure that all of the renames are
visible before a ref file could be written that points at one of the
object names.

I wasn't able to find any clear semantics about the ext4 filesystem,
and I gather from what I've read that the btrfs filesystem does not
support the desired semantics.  Christoph mentioned that syncfs would
efficiently provide a batched CACHE_FLUSH at the cost of picking up
dirty cached data unrelated to Git.
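
For comparison, the syncfs alternative is small enough to sketch in a
few lines of C. The helper name flush_filesystem_of is hypothetical, and
the non-Linux fallback to sync() is an assumption added only so the
example builds elsewhere. The idea would be to write all temp files
without any per-file sync, call this once on the object directory, and
then do the renames.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* syncfs() is a Linux-specific extension */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush the whole filesystem that contains `path`.
 * This also writes back dirty data unrelated to the caller.
 */
int flush_filesystem_of(const char *path)
{
	int fd, ret;

	assert(path);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
#ifdef __linux__
	ret = syncfs(fd);
#else
	sync(); /* assumption: coarse fallback where syncfs() is missing */
	ret = 0;
#endif
	close(fd);
	return ret;
}
```

Because the call covers every dirty page on that filesystem, its cost
depends on what other processes have written, which is the noisy
neighbor problem Christoph mentions.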

Are there any opinions on the Linux side about what APIs we should use
to provide durability across multiple Git files while not completely
tanking performance by adding one CACHE_FLUSH per file modified?  What
are the semantics of the ext4 log (when it is enabled) with regard to
creating a temp file, populating its contents, and then renaming it?
Are they similar enough to XFS's 'log force' that our batch mode
would work there?

Thanks,
Neeraj
Windows Core Filesystem Dev

[1] https://docs.google.com/spreadsheets/d/1uxMBkEXFFnQ1Y3lXKqcKpw6Mq44BzhpCAcPex14T-QQ/edit#gid=1898936117
[2] https://lore.kernel.org/git/cover.1636544377.git.ps@pks.im/
[3] https://lore.kernel.org/git/b9d3d87443266767f00e77c967bd77357fe50484.1633366667.git.gitgitgadget@gmail.com/
[4] https://lore.kernel.org/linux-fsdevel/20190419072938.31320-1-amir73il@gmail.com/
[5] See FLUSH_FLAGS_NO_SYNC -
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex
