git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: Derrick Stolee <stolee@gmail.com>, Taylor Blau <me@ttaylorr.com>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, dstolee@microsoft.com
Subject: Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging
Date: Sun, 22 Mar 2020 09:44:39 -0600	[thread overview]
Message-ID: <20200322154439.GA53402@syl.local> (raw)
In-Reply-To: <20200322054916.GB578498@coredump.intra.peff.net>

On Sun, Mar 22, 2020 at 01:49:16AM -0400, Jeff King wrote:
> On Sat, Mar 21, 2020 at 08:23:13PM -0400, Derrick Stolee wrote:
>
> > On 3/21/2020 8:20 PM, Taylor Blau wrote:
> > > On Sat, Mar 21, 2020 at 08:03:01PM -0400, Derrick Stolee wrote:
> > >> On 3/21/2020 2:50 PM, Junio C Hamano wrote:
> > >>> Do we need to worry about INFO_QUICK and SKIP_FETCH_OBJECT in this
> > >>> codepath, by the way?
> > >>
> > >> I was coming back to this thread to bring up these exact flags for
> > >> consideration. The good news is that in a partial clone with any
> > >> amount of filtering we will still have all reachable commits, which
> > >> are necessary for the commit-graph to make sense. The only ones that
> > >> would fail has_object_file() are ones removed by GC, but they may
> > >> still exist on the remote. So without SKIP_FETCH_OBJECT, we would
> > >> generate a network call even if the server has GC'd to remove the
> > >> commits. This gets particularly bad when the server returns all
> > >> reachable objects from that commit!
> > >
> > > That makes sense. Do you think something like this should be applied?
> > > +       int flags = OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT;
> > [...]
> >
> > Yes, I think this is the appropriate update. Thanks!
>
> I'm not so sure.
>
> In a non-partial clone, when would expect QUICK to matter? If we find
> the object is missing, then either:
>
>   - we are racing with repack, and we would benefit from double-checking
>     that the object really is gone
>
>   - we used to have it (since it was mentioned in the graph file) but it
>     went away
>
> Using QUICK means we won't waste time double-checking in the second
> case. But it means we won't catch the first case, and we may generate a
> new graph file that omits the object. They're both optimizations, and I
> don't think we're impacting correctness[1], but it's not clear to me
> whether one is a win over the other. We don't generally expect objects
> we have to go away often.
>
> Skipping fetching seems a little more straight-forward to me. If we had
> it in a graph file, presumably we had the actual object before, too. And
> either we're in the first case above (we really do have it and just need
> to double-check) in which case not saying QUICK would be enough. Or we
> intentionally got rid of it. In which case downloading it just to
> generate a cache is quite silly.

I was going to write that I'm not entirely sure of this, but I tried to
talk myself through it below, and I think that the right flag is *only*
OBJECT_INFO_SKIP_FETCH_OBJECT.

First, I agree with your reasoning that we shouldn't have
OBJECT_INFO_QUICK, that much makes sense to me.

To consider OBJECT_INFO_SKIP_FETCH_OBJECT, let's say that we used to
have some commit, and it got GC'd away. This shouldn't happen unless a
commit is unreachable, so if we ever find ourselves in a situation where
the parent of some other commit is missing, we know that that represents
a true corruption, not the result of a normal 'git gc'.

Now, if we do find ourselves in this case, we know that a 'git
commit-graph write --input=none --split' *would* work because we would
find that the unreachable commit (if it were in the graph) was missing,
and 'OBJECT_INFO_SKIP_FETCH_OBJECT' would tell us not to go looking for
it. Likewise, if the commit weren't already in the graph, this would
work fine, too.

But, most of the time we won't even get a chance to write a new
commit-graph based on existing ones, since 'gc.writeCommitGraph' will
take care of this for us in-process with 'git gc'. This uses
'--reachable' without '--split', so we won't ever invoke
'merge_commit_graph{s,}'.

> So it seems like SKIP_FETCH_OBJECT but _not_ QUICK might be reasonable
> here.

I agree with your original reasoning that OBJECT_INFO_QUICK is the wrong
choice, but I think my reasoning above that
OBJECT_INFO_SKIP_FETCH_OBJECT *is* an appropriate flag is sound.

> -Peff
>
> [1] I'm actually not quite sure about correctness here. It should be
>     fine to generate a graph file without any given commit; readers will
>     just have to load that commit the old-fashioned way. But at this
>     phase of "commit-graph write", I think we'll already have done the
>     close_reachable() check. What does it mean to throw away a commit at
>     this stage? If we're the parent of another commit, then it will have
>     trouble referring to us by a uint32_t. Will the actual writing phase
>     barf, or will we generate an invalid graph file?

Thanks,
Taylor

  parent reply	other threads:[~2020-03-22 15:44 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-21  3:44 [PATCH 0/1] commit-graph: avoid unnecessary tag deference when merging Taylor Blau
2020-03-21  3:44 ` [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference " Taylor Blau
2020-03-21  5:00   ` Jeff King
2020-03-21  6:11     ` Taylor Blau
2020-03-21  6:24       ` Taylor Blau
2020-03-21  7:03       ` Jeff King
2020-03-21 17:27         ` Taylor Blau
2020-03-22  5:36           ` Jeff King
2020-03-22 11:04             ` SZEDER Gábor
2020-03-22 18:45               ` looking up object types quickly, was " Jeff King
2020-03-22 19:18                 ` Jeff King
2020-03-23 20:15               ` Taylor Blau
2020-03-22 16:45             ` Taylor Blau
2020-03-24  6:06               ` Jeff King
2020-03-21 18:50         ` Junio C Hamano
2020-03-22  0:03           ` Derrick Stolee
2020-03-22  0:20             ` Taylor Blau
2020-03-22  0:23               ` Derrick Stolee
2020-03-22  5:49                 ` Jeff King
2020-03-22  6:04                   ` Jeff King
2020-03-22 15:47                     ` Taylor Blau
2020-03-24  6:11                       ` Jeff King
2020-03-24 23:08                         ` Taylor Blau
2020-03-27  8:42                           ` Jeff King
2020-03-27 15:03                             ` Taylor Blau
2020-03-22 15:44                   ` Taylor Blau [this message]
2020-03-24  6:14                     ` Jeff King
2020-03-21  5:01   ` Junio C Hamano
2020-03-21  4:56 ` [PATCH 0/1] commit-graph: avoid unnecessary tag deference " Junio C Hamano
2020-03-21  5:04   ` Jeff King
2020-03-21  6:12     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200322154439.GA53402@syl.local \
    --to=me@ttaylorr.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).