git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, dstolee@microsoft.com
Subject: Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging
Date: Tue, 24 Mar 2020 17:08:26 -0600	[thread overview]
Message-ID: <20200324230826.GA42939@syl.local> (raw)
In-Reply-To: <20200324061159.GC610977@coredump.intra.peff.net>

On Tue, Mar 24, 2020 at 02:11:59AM -0400, Jeff King wrote:
> On Sun, Mar 22, 2020 at 09:47:49AM -0600, Taylor Blau wrote:
>
> > > > [1] I'm actually not quite sure about correctness here. It should be
> > > >     fine to generate a graph file without any given commit; readers will
> > > >     just have to load that commit the old-fashioned way. But at this
> > > >     phase of "commit-graph write", I think we'll already have done the
> > > >     close_reachable() check. What does it mean to throw away a commit at
> > > >     this stage? If we're the parent of another commit, then it will have
> > > >     trouble referring to us by a uint32_t. Will the actual writing phase
> > > >     barf, or will we generate an invalid graph file?
> > >
> > > It doesn't seem great. If I instrument Git like this to simulate an
> > > object temporarily "missing" (if it were really missing the whole repo
> > > would be corrupt; we're trying to see what would happen if a race causes
> > > us to momentarily not see it):
> >
> > This is definitely a problem on either side of this patch, which is
> > demonstrated by the fact that you applied your changes without my patch
> > on top (and that my patch isn't changing anything substantial in this
> > area like removing the 'continue' statement).
> >
> > Should we address this before moving on with my patch? I think that we
> > *could*, but I'd rather go forward with what we have for now, since it's
> > only improving the situation, and not introducing a new bug.
>
> I do agree it's a problem before your patch. But I think your patch may
> make it a lot more common, if only because it means we'd _actually_ be
> dropping entries for objects that went away, instead of accidentally
> keeping them due to re-using the graph result. So it probably is worth
> trying to deal with it now, or at least thinking hard about it.
>
> The trouble is that I'm not sure what _should_ happen. Aborting the
> whole commit-graph generation seems overboard (and would be annoying for
> cases where whole swaths of history became unreachable and went away;
> presumably we'd be dropping _all_ of those objects, and the write phase
> would be just fine). I have a feeling the correct solution is to do this
> merging pass earlier, before we check close_reachable(). Or if not a
> true merging pass, at least a pass to check which existing entries are
> still valid. But I don't know how painful that reordering would be.

Maybe, but I'm not sure that moving 'close_reachable' up would
necessarily solve all of our problems. Or, at least, that it wouldn't
create a set of new problems :).

Let's say that you have (1) some connected component of your history
that is written to a commit-graph, where (2) part of that cluster has
been dropped (e.g., due to corruption, a rogue 'git gc', etc), and (3)
you're writing a new graph with '--input=graphed'.

What is 'close_reachable' supposed to do? If some of the now-missing
commits are in the reachability closure of the commits that still exist
in the repository, then we're back to where we started. We *could* have
'close_reachable' take all missing commits and drop their ancestors and
descendants, but this creates quite the headache for 'close_reachable',
which now needs to keep track of such things.

I'm hopeful that this isn't so common in practice, but I'm also not
entirely sure one way or another. I can plan to deploy this patch to
GitHub's servers for a ~month and see if we experience it. If it turns
out to be common, I'll assume that others have this problem, too, in
which case we can go back and think more about it.

But, I'm hopeful that this problem is rare enough that it isn't worth
worrying about for anybody, GitHub or not.

> -Peff

Thanks,
Taylor

  reply	other threads:[~2020-03-24 23:08 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-21  3:44 [PATCH 0/1] commit-graph: avoid unnecessary tag deference when merging Taylor Blau
2020-03-21  3:44 ` [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference " Taylor Blau
2020-03-21  5:00   ` Jeff King
2020-03-21  6:11     ` Taylor Blau
2020-03-21  6:24       ` Taylor Blau
2020-03-21  7:03       ` Jeff King
2020-03-21 17:27         ` Taylor Blau
2020-03-22  5:36           ` Jeff King
2020-03-22 11:04             ` SZEDER Gábor
2020-03-22 18:45               ` looking up object types quickly, was " Jeff King
2020-03-22 19:18                 ` Jeff King
2020-03-23 20:15               ` Taylor Blau
2020-03-22 16:45             ` Taylor Blau
2020-03-24  6:06               ` Jeff King
2020-03-21 18:50         ` Junio C Hamano
2020-03-22  0:03           ` Derrick Stolee
2020-03-22  0:20             ` Taylor Blau
2020-03-22  0:23               ` Derrick Stolee
2020-03-22  5:49                 ` Jeff King
2020-03-22  6:04                   ` Jeff King
2020-03-22 15:47                     ` Taylor Blau
2020-03-24  6:11                       ` Jeff King
2020-03-24 23:08                         ` Taylor Blau [this message]
2020-03-27  8:42                           ` Jeff King
2020-03-27 15:03                             ` Taylor Blau
2020-03-22 15:44                   ` Taylor Blau
2020-03-24  6:14                     ` Jeff King
2020-03-21  5:01   ` Junio C Hamano
2020-03-21  4:56 ` [PATCH 0/1] commit-graph: avoid unnecessary tag deference " Junio C Hamano
2020-03-21  5:04   ` Jeff King
2020-03-21  6:12     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200324230826.GA42939@syl.local \
    --to=me@ttaylorr.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).