From: Neeraj Singh <nksingh85@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "Neeraj K. Singh via GitGitGadget" <gitgitgadget@gmail.com>,
Git List <git@vger.kernel.org>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Jeff King <peff@peff.net>,
Jeff Hostetler <jeffhost@microsoft.com>,
Christoph Hellwig <hch@lst.de>,
"Randall S. Becker" <rsbecker@nexbridge.com>,
Bagas Sanjaya <bagasdotme@gmail.com>,
Elijah Newren <newren@gmail.com>,
"Neeraj K. Singh" <neerajsi@microsoft.com>,
Patrick Steinhardt <ps@pks.im>,
Junio C Hamano <gitster@pobox.com>, Eric Wong <e@80x24.org>
Subject: Re: [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles
Date: Wed, 17 Nov 2021 21:03:16 -0800 [thread overview]
Message-ID: <CANQDOdcKzxM+M7wgxUz831SbpwGWR7gcUC8xLFM14BcCJ+60sA@mail.gmail.com> (raw)
In-Reply-To: <211117.86ee7f8cm4.gmgdl@evledraar.gmail.com>
On Tue, Nov 16, 2021 at 11:28 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Tue, Nov 16 2021, Neeraj Singh wrote:
>
> > On Tue, Nov 16, 2021 at 12:10 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>
> >>
> >> On Mon, Nov 15 2021, Neeraj K. Singh via GitGitGadget wrote:
> >>
> >> > * Per [2], I'm leaving the fsyncObjectFiles configuration as is with
> >> > 'true', 'false', and 'batch'. This makes using old and new versions of
> >> > git with 'batch' mode a little trickier, but hopefully people will
> >> > generally be moving forward in versions.
> >> >
> >> > [1] See
> >> > https://lore.kernel.org/git/pull.1067.git.1635287730.gitgitgadget@gmail.com/
> >> > [2] https://lore.kernel.org/git/xmqqh7cimuxt.fsf@gitster.g/
> >>
> >> I really think leaving that in-place is just being unnecessarily
> >> cavalier. There's a lot of mixed-version environments where git is
> >> deployed in, and we almost never break the configuration in this way (I
> >> think in the past always by mistake).
> >
> >> In this case it's easy to avoid it, and coming up with a less narrow
> >> config model[1] seems like a good idea in any case to unify the various
> >> outstanding work in this area.
> >>
> >> More generally on this series, per the thread ending in [2] I really
> >
> > My primary goal in all of these changes is to move git-for-windows over to
> > a default of batch fsync so that it can get closer to other platforms
> > in performance
> > of 'git add' while still retaining the same level of data integrity.
> > I'm hoping that
> > most end-users are just sticking to defaults here.
> >
> > I'm happy to change the configuration schema again if there's a
> > consensus from the Git
> > community that backwards-compatibility of the configuration is
> > actually important to someone.
> >
> > Also, if we're doing a deeper rethink of the fsync configuration (as
> > prompted by this work and
> > Eric Wong's and Patrick Steinhardts work), do we want to retain a mode
> > where we fsync some
> > parts of the persistent repo data but not others? If we add fsyncing
> > of the index in addition to the refs,
> > I believe we would have covered all of the critical data structures
> > that would be needed to find the
> > data that a user has added to the repo if they complete a series of
> > git commands and then experience
> > a system crash.
>
> Just talking about it is how we'll find consensus, maybe you & Junio
> would like to keep it as-is. I don't see why we'd expose this bad edge
> case in configuration handling to users when it's entirely avoidable,
> and we're still in the design phase.
After trying to figure out an implementation, I have a new proposal,
which I've shared on the other thread [1].
[1] https://lore.kernel.org/git/CANQDOdcdhfGtPg0PxpXQA5gQ4x9VknKDKCCi4HEB0Z1xgnjKzg@mail.gmail.com/
>
> >> don't get why we have code like this:
> >>
> >> @@ -503,10 +504,12 @@ static void unpack_all(void)
> >> if (!quiet)
> >> progress = start_progress(_("Unpacking objects"), nr_objects);
> >> CALLOC_ARRAY(obj_list, nr_objects);
> >> + plug_bulk_checkin();
> >> for (i = 0; i < nr_objects; i++) {
> >> unpack_one(i);
> >> display_progress(progress, i + 1);
> >> }
> >> + unplug_bulk_checkin();
> >> stop_progress(&progress);
> >>
> >> if (delta_list)
> >>
> >> As opposed to doing an fsync on the last object we're
> >> processing. I.e. why do we need the step of intentionally making the
> >> objects unavailable in the tmp-objdir, and creating a "cookie" file to
> >> sync at the start/end, as opposed to fsyncing on the last file (which
> >> we're writing out anyway).
> >>
> >> 1. https://lore.kernel.org/git/211110.86r1bogg27.gmgdl@evledraar.gmail.com/
> >> 2. https://lore.kernel.org/git/20211111000349.GA703@neerajsi-x1.localdomain/
> >
> > It's important to not expose an object's final name until its contents
> > have been fsynced
> > to disk. We want to ensure that wherever we crash, we won't have a
> > loose object that
> > Git may later try to open where the filename doesn't match the content
> > hash. I believe it's
> > okay for a given OID to be missing, since a later command could
> > recreate it, but an object
> > with a wrong hash looks like it would persist until we do a git-fsck.
>
> Yes, we handle that rather badly, as I mentioned in some other threads,
> but not doing the fsync on the last object v.s. a "cookie" file right
> afterwards seems like a hail-mary at best, no?
>
I'm not quite grasping what you're saying here. Are you saying that
using a dummy
file instead of one of the actual objects is less likely to produce
the desired outcome
on actual filesystem implementations?
> > I thought about figuring out how to sync the last object rather than some random
> > "cookie" file, but it wasn't clear to me how I'd figure out which
> > object is actually last
> > from library code in a way that doesn't burden each command with
> > somehow figuring
> > out its last object and communicating that. The 'cookie' approach
> > seems to lead to a cleaner
> > interface for callers.
>
> The above quoted code is looping through nr_objects isn't it? Can't a
> "do fsync" be passed down to unpack_one() when we process the last loose
> object?
Are you proposing that we do something different for unpack_objects
versus update_index
and git-add? I was hoping to keep all of the users of the batch fsync
functionality equivalent.
For the git-add workflow and update-index, we'd need to track the most
recent file so that we
can go back and fsync it. I don't believe that syncing the last
object composes well with the existing
implementation of those commands.
Thanks,
Neeraj
next prev parent reply other threads:[~2021-11-18 5:03 UTC|newest]
Thread overview: 160+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-25 1:51 [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-08-25 1:51 ` [PATCH 1/2] object-file: use futimes rather than utime Neeraj Singh via GitGitGadget
2021-08-25 13:51 ` Johannes Schindelin
2021-08-25 22:08 ` Neeraj Singh
2021-08-25 1:51 ` [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes Neeraj Singh via GitGitGadget
2021-08-25 5:38 ` Christoph Hellwig
2021-08-25 17:40 ` Neeraj Singh
2021-08-26 5:54 ` Christoph Hellwig
2021-08-25 16:11 ` Ævar Arnfjörð Bjarmason
2021-08-26 0:49 ` Neeraj Singh
2021-08-26 5:50 ` Christoph Hellwig
2021-08-28 0:20 ` Neeraj Singh
2021-08-28 6:57 ` Christoph Hellwig
2021-08-31 19:59 ` Neeraj Singh
2021-09-01 5:09 ` Christoph Hellwig
2021-08-26 5:57 ` Christoph Hellwig
2021-08-25 18:52 ` Johannes Schindelin
2021-08-25 21:26 ` Junio C Hamano
2021-08-26 1:19 ` Neeraj Singh
2021-08-25 16:58 ` [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles Neeraj Singh
2021-08-27 23:49 ` [PATCH v2 0/6] Implement a batched fsync " Neeraj K. Singh via GitGitGadget
2021-08-27 23:49 ` [PATCH v2 1/6] object-file: use futimens rather than utime Neeraj Singh via GitGitGadget
2021-08-27 23:49 ` [PATCH v2 2/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-08-27 23:49 ` [PATCH v2 3/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-08-27 23:49 ` [PATCH v2 4/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-08-27 23:49 ` [PATCH v2 5/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-08-27 23:49 ` [PATCH v2 6/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-07 19:44 ` [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
2021-09-07 19:50 ` Ævar Arnfjörð Bjarmason
2021-09-07 19:54 ` Randall S. Becker
2021-09-08 0:54 ` Neeraj Singh
2021-09-08 1:22 ` Ævar Arnfjörð Bjarmason
2021-09-08 14:04 ` Randall S. Becker
2021-09-08 19:01 ` Neeraj Singh
2021-09-08 0:55 ` Neeraj Singh
2021-09-08 6:44 ` Junio C Hamano
2021-09-08 6:49 ` Christoph Hellwig
2021-09-08 13:57 ` Randall S. Becker
2021-09-08 14:13 ` 'Christoph Hellwig'
2021-09-08 14:25 ` Randall S. Becker
2021-09-08 16:34 ` Neeraj Singh
2021-09-08 19:12 ` Junio C Hamano
2021-09-08 19:20 ` Neeraj Singh
2021-09-08 19:23 ` Ævar Arnfjörð Bjarmason
2021-09-14 3:38 ` [PATCH v3 " Neeraj K. Singh via GitGitGadget
2021-09-14 3:38 ` [PATCH v3 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-14 3:38 ` [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-14 10:39 ` Bagas Sanjaya
2021-09-14 19:05 ` Neeraj Singh
2021-09-14 19:34 ` Junio C Hamano
2021-09-14 20:33 ` Junio C Hamano
2021-09-15 4:55 ` Neeraj Singh
2021-09-14 3:38 ` [PATCH v3 3/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-14 3:38 ` [PATCH v3 4/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-14 19:35 ` Junio C Hamano
2021-09-14 3:38 ` [PATCH v3 5/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-14 3:38 ` [PATCH v3 6/6] core.fsyncobjectfiles: enable batch mode for testing Neeraj Singh via GitGitGadget
2021-09-15 16:21 ` Junio C Hamano
2021-09-15 22:43 ` Neeraj Singh
2021-09-15 23:12 ` Junio C Hamano
2021-09-16 6:19 ` Junio C Hamano
2021-09-14 5:49 ` [PATCH v3 0/6] Implement a batched fsync option for core.fsyncObjectFiles Christoph Hellwig
2021-09-20 22:15 ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
2021-09-20 22:15 ` [PATCH v4 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-20 22:15 ` [PATCH v4 2/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-21 23:16 ` Ævar Arnfjörð Bjarmason
2021-09-22 1:23 ` Neeraj Singh
2021-09-22 2:02 ` Ævar Arnfjörð Bjarmason
2021-09-22 19:46 ` Neeraj Singh
2021-09-20 22:15 ` [PATCH v4 3/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-21 23:42 ` Ævar Arnfjörð Bjarmason
2021-09-22 1:23 ` Neeraj Singh
2021-09-20 22:15 ` [PATCH v4 4/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-21 23:46 ` Ævar Arnfjörð Bjarmason
2021-09-22 1:27 ` Neeraj Singh
2021-09-23 22:32 ` Neeraj Singh
2021-09-20 22:15 ` [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-21 23:54 ` Ævar Arnfjörð Bjarmason
2021-09-22 1:30 ` Neeraj Singh
2021-09-22 1:58 ` Ævar Arnfjörð Bjarmason
2021-09-22 17:55 ` Neeraj Singh
2021-09-22 20:01 ` Ævar Arnfjörð Bjarmason
2021-09-20 22:15 ` [PATCH v4 6/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-24 20:12 ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-09-24 20:12 ` [PATCH v5 1/7] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
2021-09-24 20:12 ` [PATCH v5 2/7] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-24 20:12 ` [PATCH v5 3/7] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-24 21:47 ` Neeraj Singh
2021-09-24 20:12 ` [PATCH v5 4/7] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-24 21:49 ` Neeraj Singh
2021-09-24 20:12 ` [PATCH v5 5/7] unpack-objects: " Neeraj Singh via GitGitGadget
2021-09-24 20:12 ` [PATCH v5 6/7] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-24 20:12 ` [PATCH v5 7/7] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-24 23:31 ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
2021-09-24 23:53 ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
2021-09-24 23:53 ` [PATCH v6 1/8] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
2021-09-24 23:53 ` [PATCH v6 2/8] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-24 23:53 ` [PATCH v6 3/8] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-25 3:15 ` Bagas Sanjaya
2021-09-27 0:27 ` Neeraj Singh
2021-09-24 23:53 ` [PATCH v6 4/8] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-27 20:07 ` Junio C Hamano
2021-09-27 20:55 ` Neeraj Singh
2021-09-27 21:03 ` Neeraj Singh
2021-09-27 23:53 ` Junio C Hamano
2021-09-24 23:53 ` [PATCH v6 5/8] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-24 23:53 ` [PATCH v6 6/8] unpack-objects: " Neeraj Singh via GitGitGadget
2021-09-24 23:53 ` [PATCH v6 7/8] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-24 23:53 ` [PATCH v6 8/8] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-28 23:32 ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-09-28 23:32 ` [PATCH v7 1/9] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
2021-09-28 23:55 ` Jeff King
2021-09-29 0:10 ` Neeraj Singh
2021-09-28 23:32 ` [PATCH v7 2/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
2021-09-29 8:41 ` Elijah Newren
2021-09-29 16:40 ` Neeraj Singh
2021-09-28 23:32 ` [PATCH v7 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-28 23:32 ` [PATCH v7 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-28 23:32 ` [PATCH v7 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-28 23:32 ` [PATCH v7 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-28 23:32 ` [PATCH v7 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
2021-09-28 23:32 ` [PATCH v7 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-28 23:32 ` [PATCH v7 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-10-04 16:57 ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-10-04 16:57 ` [PATCH v8 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
2021-10-04 16:57 ` [PATCH v8 2/9] tmp-objdir: disable ref updates when replacing the primary odb Neeraj Singh via GitGitGadget
2021-10-04 16:57 ` [PATCH v8 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-10-04 16:57 ` [PATCH v8 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-10-04 16:57 ` [PATCH v8 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-10-04 16:57 ` [PATCH v8 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-10-04 16:57 ` [PATCH v8 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
2021-10-04 16:57 ` [PATCH v8 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-10-04 16:57 ` [PATCH v8 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-11-15 23:50 ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-11-15 23:50 ` [PATCH v9 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
2021-11-30 21:27 ` Elijah Newren
2021-11-30 21:52 ` Neeraj Singh
2021-11-30 22:36 ` Elijah Newren
2021-11-15 23:50 ` [PATCH v9 2/9] tmp-objdir: disable ref updates when replacing the primary odb Neeraj Singh via GitGitGadget
2021-11-16 7:23 ` Ævar Arnfjörð Bjarmason
2021-11-16 20:38 ` Neeraj Singh
2021-11-15 23:50 ` [PATCH v9 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-11-15 23:50 ` [PATCH v9 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-11-15 23:50 ` [PATCH v9 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-11-15 23:51 ` [PATCH v9 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-11-15 23:51 ` [PATCH v9 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
2021-11-15 23:51 ` [PATCH v9 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-11-15 23:51 ` [PATCH v9 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-11-16 8:02 ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
2021-11-17 7:06 ` Neeraj Singh
2021-11-17 7:24 ` Ævar Arnfjörð Bjarmason
2021-11-18 5:03 ` Neeraj Singh [this message]
2021-12-01 14:15 ` Ævar Arnfjörð Bjarmason
2022-03-09 23:02 ` Ævar Arnfjörð Bjarmason
2022-03-10 1:16 ` Neeraj Singh
2022-03-10 14:01 ` Ævar Arnfjörð Bjarmason
2022-03-10 17:52 ` Neeraj Singh
2022-03-10 18:08 ` rsbecker
2022-03-10 18:43 ` Neeraj Singh
2022-03-10 18:48 ` rsbecker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CANQDOdcKzxM+M7wgxUz831SbpwGWR7gcUC8xLFM14BcCJ+60sA@mail.gmail.com \
--to=nksingh85@gmail.com \
--cc=Johannes.Schindelin@gmx.de \
--cc=avarab@gmail.com \
--cc=bagasdotme@gmail.com \
--cc=e@80x24.org \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=hch@lst.de \
--cc=jeffhost@microsoft.com \
--cc=neerajsi@microsoft.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
--cc=rsbecker@nexbridge.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).