From: Jeff King <peff@peff.net>
To: Derrick Stolee <stolee@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>, Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org, dstolee@microsoft.com
Subject: Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging
Date: Sun, 22 Mar 2020 01:49:16 -0400 [thread overview]
Message-ID: <20200322054916.GB578498@coredump.intra.peff.net> (raw)
In-Reply-To: <1cb561fc-5bce-28f0-e5e1-886f590fba92@gmail.com>
On Sat, Mar 21, 2020 at 08:23:13PM -0400, Derrick Stolee wrote:
> On 3/21/2020 8:20 PM, Taylor Blau wrote:
> > On Sat, Mar 21, 2020 at 08:03:01PM -0400, Derrick Stolee wrote:
> >> On 3/21/2020 2:50 PM, Junio C Hamano wrote:
> >>> Do we need to worry about INFO_QUICK and SKIP_FETCH_OBJECT in this
> >>> codepath, by the way?
> >>
> >> I was coming back to this thread to bring up these exact flags for
> >> consideration. The good news is that in a partial clone with any
> >> amount of filtering we will still have all reachable commits, which
> >> are necessary for the commit-graph to make sense. The only ones that
> >> would fail has_object_file() are ones removed by GC, but they may
> >> still exist on the remote. So without SKIP_FETCH_OBJECT, we would
> >> generate a network call even if the server has GC'd to remove the
> >> commits. This gets particularly bad when the server returns all
> >> reachable objects from that commit!
> >
> > That makes sense. Do you think something like this should be applied?
> > + int flags = OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT;
> [...]
>
> Yes, I think this is the appropriate update. Thanks!
I'm not so sure.
In a non-partial clone, when would expect QUICK to matter? If we find
the object is missing, then either:
- we are racing with repack, and we would benefit from double-checking
that the object really is gone
- we used to have it (since it was mentioned in the graph file) but it
went away
Using QUICK means we won't waste time double-checking in the second
case. But it means we won't catch the first case, and we may generate a
new graph file that omits the object. They're both optimizations, and I
don't think we're impacting correctness[1], but it's not clear to me
whether one is a win over the other. We don't generally expect objects
we have to go away often.
Skipping fetching seems a little more straight-forward to me. If we had
it in a graph file, presumably we had the actual object before, too. And
either we're in the first case above (we really do have it and just need
to double-check) in which case not saying QUICK would be enough. Or we
intentionally got rid of it. In which case downloading it just to
generate a cache is quite silly.
So it seems like SKIP_FETCH_OBJECT but _not_ QUICK might be reasonable
here.
-Peff
[1] I'm actually not quite sure about correctness here. It should be
fine to generate a graph file without any given commit; readers will
just have to load that commit the old-fashioned way. But at this
phase of "commit-graph write", I think we'll already have done the
close_reachable() check. What does it mean to throw away a commit at
this stage? If we're the parent of another commit, then it will have
trouble referring to us by a uint32_t. Will the actual writing phase
barf, or will we generate an invalid graph file?
next prev parent reply other threads:[~2020-03-22 5:49 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-21 3:44 [PATCH 0/1] commit-graph: avoid unnecessary tag deference when merging Taylor Blau
2020-03-21 3:44 ` [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference " Taylor Blau
2020-03-21 5:00 ` Jeff King
2020-03-21 6:11 ` Taylor Blau
2020-03-21 6:24 ` Taylor Blau
2020-03-21 7:03 ` Jeff King
2020-03-21 17:27 ` Taylor Blau
2020-03-22 5:36 ` Jeff King
2020-03-22 11:04 ` SZEDER Gábor
2020-03-22 18:45 ` looking up object types quickly, was " Jeff King
2020-03-22 19:18 ` Jeff King
2020-03-23 20:15 ` Taylor Blau
2020-03-22 16:45 ` Taylor Blau
2020-03-24 6:06 ` Jeff King
2020-03-21 18:50 ` Junio C Hamano
2020-03-22 0:03 ` Derrick Stolee
2020-03-22 0:20 ` Taylor Blau
2020-03-22 0:23 ` Derrick Stolee
2020-03-22 5:49 ` Jeff King [this message]
2020-03-22 6:04 ` Jeff King
2020-03-22 15:47 ` Taylor Blau
2020-03-24 6:11 ` Jeff King
2020-03-24 23:08 ` Taylor Blau
2020-03-27 8:42 ` Jeff King
2020-03-27 15:03 ` Taylor Blau
2020-03-22 15:44 ` Taylor Blau
2020-03-24 6:14 ` Jeff King
2020-03-21 5:01 ` Junio C Hamano
2020-03-21 4:56 ` [PATCH 0/1] commit-graph: avoid unnecessary tag deference " Junio C Hamano
2020-03-21 5:04 ` Jeff King
2020-03-21 6:12 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200322054916.GB578498@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).