git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Neeraj Singh <nksingh85@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Neeraj Singh via GitGitGadget <gitgitgadget@gmail.com>,
	Git List <git@vger.kernel.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Jeff King <peff@peff.net>,
	Jeff Hostetler <jeffhost@microsoft.com>,
	Christoph Hellwig <hch@lst.de>,
	"Randall S. Becker" <rsbecker@nexbridge.com>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	Neeraj Singh <neerajsi@microsoft.com>
Subject: Re: [PATCH v4 4/6] update-index: use the bulk-checkin infrastructure
Date: Tue, 21 Sep 2021 18:27:25 -0700	[thread overview]
Message-ID: <CANQDOdeV4JuE8jnkzLmK6VfFj1t-+EOzvn=GD-ejPdS6unc66w@mail.gmail.com> (raw)
In-Reply-To: <87ee9h8p0a.fsf@evledraar.gmail.com>

On Tue, Sep 21, 2021 at 4:53 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:
>
> > From: Neeraj Singh <neerajsi@microsoft.com>
> >
> > The update-index functionality is used internally by 'git stash push' to
> > setup the internal stashed commit.
> >
> > This change enables bulk-checkin for update-index infrastructure to
> > speed up adding new objects to the object database by leveraging the
> > pack functionality and the new bulk-fsync functionality. This mode
> > is enabled when passing paths to update-index via the --stdin flag,
> > as is done by 'git stash'.
> >
> > There is some risk with this change, since under batch fsync, the object
> > files will not be available until the update-index is entirely complete.
> > This usage is unlikely, since any tool invoking update-index and
> > expecting to see objects would have to snoop the output of --verbose to
> > find out when update-index has actually processed a given path.
> > Additionally the index is locked for the duration of the update.
>
> Would you really need to sniff the verbose output? If I'm streaming data
> to update-index now it looks like I could assume before that
> update-index would have done the work if I managed to fflush() to it,
> since it's processing a line at a time and doing the work in that
> line-at-a-time loop.
>
> I.e. you could print lines to it, and then do concurrent object lookups
> knowing the data was written already...
>
> I think this is probably fine, but that case seems way likelier than
> someone sniffing back the verbose output, presumably for the "add" in
> update_one(), but that's called in the getline_fn() loop...

Does fflush really guarantee that the reader has picked up the input from
a pipe across all environments?  Even if a reader picks up the input, does
that mean that the reader is done processing it?

Do you think I really need to revise this comment? Maybe leave a terser,
'this usage is thought to be unlikely'?

>
> All of this makes me wonder why this isn't using tmp-objdir.c, i.e. we
> could have our cake and eat it too by writing the "real" objects, and
> then just renaming them between directories instead. But perhaps the
> answer has something to do with the metadata issues I raised.
>
> And well, tmp-objdir.c isn't going to help someone in practice that's
> relying on this "update-index --stdin" behavior, as they won't know
> where we staged the temporary files...
>

One motivation of the current design behind renaming the files is that
some networked filesystems don't seem to like cross-directory renames
much.  It also so happens that ReFS on Windows also prefers renames to
stay within the directory. Actually any filesystem would likely be
slightly faster,
since fewer objects are being modified (one dir versus two).

  reply	other threads:[~2021-09-22  1:27 UTC|newest]

Thread overview: 160+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-25  1:51 [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-08-25  1:51 ` [PATCH 1/2] object-file: use futimes rather than utime Neeraj Singh via GitGitGadget
2021-08-25 13:51   ` Johannes Schindelin
2021-08-25 22:08     ` Neeraj Singh
2021-08-25  1:51 ` [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes Neeraj Singh via GitGitGadget
2021-08-25  5:38   ` Christoph Hellwig
2021-08-25 17:40     ` Neeraj Singh
2021-08-26  5:54       ` Christoph Hellwig
2021-08-25 16:11   ` Ævar Arnfjörð Bjarmason
2021-08-26  0:49     ` Neeraj Singh
2021-08-26  5:50       ` Christoph Hellwig
2021-08-28  0:20         ` Neeraj Singh
2021-08-28  6:57           ` Christoph Hellwig
2021-08-31 19:59             ` Neeraj Singh
2021-09-01  5:09               ` Christoph Hellwig
2021-08-26  5:57     ` Christoph Hellwig
2021-08-25 18:52   ` Johannes Schindelin
2021-08-25 21:26     ` Junio C Hamano
2021-08-26  1:19     ` Neeraj Singh
2021-08-25 16:58 ` [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles Neeraj Singh
2021-08-27 23:49 ` [PATCH v2 0/6] Implement a batched fsync " Neeraj K. Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 1/6] object-file: use futimens rather than utime Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 2/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 3/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 4/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 5/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 6/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-07 19:44   ` [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
2021-09-07 19:50     ` Ævar Arnfjörð Bjarmason
2021-09-07 19:54     ` Randall S. Becker
2021-09-08  0:54       ` Neeraj Singh
2021-09-08  1:22         ` Ævar Arnfjörð Bjarmason
2021-09-08 14:04           ` Randall S. Becker
2021-09-08 19:01           ` Neeraj Singh
2021-09-08  0:55     ` Neeraj Singh
2021-09-08  6:44       ` Junio C Hamano
2021-09-08  6:49         ` Christoph Hellwig
2021-09-08 13:57           ` Randall S. Becker
2021-09-08 14:13             ` 'Christoph Hellwig'
2021-09-08 14:25               ` Randall S. Becker
2021-09-08 16:34         ` Neeraj Singh
2021-09-08 19:12           ` Junio C Hamano
2021-09-08 19:20             ` Neeraj Singh
2021-09-08 19:23           ` Ævar Arnfjörð Bjarmason
2021-09-14  3:38   ` [PATCH v3 " Neeraj K. Singh via GitGitGadget
2021-09-14  3:38     ` [PATCH v3 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-14  3:38     ` [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-14 10:39       ` Bagas Sanjaya
2021-09-14 19:05         ` Neeraj Singh
2021-09-14 19:34       ` Junio C Hamano
2021-09-14 20:33         ` Junio C Hamano
2021-09-15  4:55         ` Neeraj Singh
2021-09-14  3:38     ` [PATCH v3 3/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-14  3:38     ` [PATCH v3 4/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-14 19:35       ` Junio C Hamano
2021-09-14  3:38     ` [PATCH v3 5/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-14  3:38     ` [PATCH v3 6/6] core.fsyncobjectfiles: enable batch mode for testing Neeraj Singh via GitGitGadget
2021-09-15 16:21       ` Junio C Hamano
2021-09-15 22:43         ` Neeraj Singh
2021-09-15 23:12           ` Junio C Hamano
2021-09-16  6:19             ` Junio C Hamano
2021-09-14  5:49     ` [PATCH v3 0/6] Implement a batched fsync option for core.fsyncObjectFiles Christoph Hellwig
2021-09-20 22:15     ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
2021-09-20 22:15       ` [PATCH v4 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-20 22:15       ` [PATCH v4 2/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-21 23:16         ` Ævar Arnfjörð Bjarmason
2021-09-22  1:23           ` Neeraj Singh
2021-09-22  2:02             ` Ævar Arnfjörð Bjarmason
2021-09-22 19:46             ` Neeraj Singh
2021-09-20 22:15       ` [PATCH v4 3/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-21 23:42         ` Ævar Arnfjörð Bjarmason
2021-09-22  1:23           ` Neeraj Singh
2021-09-20 22:15       ` [PATCH v4 4/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-21 23:46         ` Ævar Arnfjörð Bjarmason
2021-09-22  1:27           ` Neeraj Singh [this message]
2021-09-23 22:32             ` Neeraj Singh
2021-09-20 22:15       ` [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-21 23:54         ` Ævar Arnfjörð Bjarmason
2021-09-22  1:30           ` Neeraj Singh
2021-09-22  1:58             ` Ævar Arnfjörð Bjarmason
2021-09-22 17:55               ` Neeraj Singh
2021-09-22 20:01                 ` Ævar Arnfjörð Bjarmason
2021-09-20 22:15       ` [PATCH v4 6/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-24 20:12       ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 1/7] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 2/7] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 3/7] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-24 21:47           ` Neeraj Singh
2021-09-24 20:12         ` [PATCH v5 4/7] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-24 21:49           ` Neeraj Singh
2021-09-24 20:12         ` [PATCH v5 5/7] unpack-objects: " Neeraj Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 6/7] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 7/7] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-24 23:31         ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
2021-09-24 23:53         ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 1/8] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 2/8] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 3/8] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-25  3:15             ` Bagas Sanjaya
2021-09-27  0:27               ` Neeraj Singh
2021-09-24 23:53           ` [PATCH v6 4/8] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-27 20:07             ` Junio C Hamano
2021-09-27 20:55               ` Neeraj Singh
2021-09-27 21:03                 ` Neeraj Singh
2021-09-27 23:53                   ` Junio C Hamano
2021-09-24 23:53           ` [PATCH v6 5/8] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 6/8] unpack-objects: " Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 7/8] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 8/8] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 1/9] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
2021-09-28 23:55               ` Jeff King
2021-09-29  0:10                 ` Neeraj Singh
2021-09-28 23:32             ` [PATCH v7 2/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
2021-09-29  8:41               ` Elijah Newren
2021-09-29 16:40                 ` Neeraj Singh
2021-09-28 23:32             ` [PATCH v7 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 2/9] tmp-objdir: disable ref updates when replacing the primary odb Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-11-15 23:50                 ` [PATCH v9 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
2021-11-30 21:27                   ` Elijah Newren
2021-11-30 21:52                     ` Neeraj Singh
2021-11-30 22:36                       ` Elijah Newren
2021-11-15 23:50                 ` [PATCH v9 2/9] tmp-objdir: disable ref updates when replacing the primary odb Neeraj Singh via GitGitGadget
2021-11-16  7:23                   ` Ævar Arnfjörð Bjarmason
2021-11-16 20:38                     ` Neeraj Singh
2021-11-15 23:50                 ` [PATCH v9 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-11-15 23:50                 ` [PATCH v9 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-11-15 23:50                 ` [PATCH v9 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-11-15 23:51                 ` [PATCH v9 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-11-15 23:51                 ` [PATCH v9 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
2021-11-15 23:51                 ` [PATCH v9 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-11-15 23:51                 ` [PATCH v9 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-11-16  8:02                 ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
2021-11-17  7:06                   ` Neeraj Singh
2021-11-17  7:24                     ` Ævar Arnfjörð Bjarmason
2021-11-18  5:03                       ` Neeraj Singh
2021-12-01 14:15                         ` Ævar Arnfjörð Bjarmason
2022-03-09 23:02                           ` Ævar Arnfjörð Bjarmason
2022-03-10  1:16                             ` Neeraj Singh
2022-03-10 14:01                               ` Ævar Arnfjörð Bjarmason
2022-03-10 17:52                                 ` Neeraj Singh
2022-03-10 18:08                                   ` rsbecker
2022-03-10 18:43                                     ` Neeraj Singh
2022-03-10 18:48                                       ` rsbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANQDOdeV4JuE8jnkzLmK6VfFj1t-+EOzvn=GD-ejPdS6unc66w@mail.gmail.com' \
    --to=nksingh85@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=bagasdotme@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=hch@lst.de \
    --cc=jeffhost@microsoft.com \
    --cc=neerajsi@microsoft.com \
    --cc=peff@peff.net \
    --cc=rsbecker@nexbridge.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).