git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: "Taylor Blau" <me@ttaylorr.com>,
	"Derrick Stolee" <dstolee@microsoft.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH] upload-pack: disable commit graph more gently for shallow traversal
Date: Wed, 11 Sep 2019 20:18:46 -0400	[thread overview]
Message-ID: <20190912001846.GA31370@sigill.intra.peff.net> (raw)
In-Reply-To: <20190912000414.GA31334@sigill.intra.peff.net>

On Wed, Sep 11, 2019 at 08:04:14PM -0400, Jeff King wrote:

> When the client has asked for certain shallow options like
> "deepen-since", we do a custom rev-list walk that pretends to be
> shallow. Before doing so, we have to disable the commit-graph, since it
> is not compatible with the shallow view of the repository. That's
> handled by 829a321569 (commit-graph: close_commit_graph before shallow
> walk, 2018-08-20). That commit literally closes and frees our
> repo->objects->commit_graph struct.

A few notes and curiosities on my patch and this general area.

The test suite passes with my patch both with and without
GIT_TEST_COMMIT_GRAPH=1. But to my surprise, it also passes if I delete
the close_commit_graph() line added by 829a321569!

So it's not clear to me whether this whole thing is truly unnecessary
(and Stolee was just being overly cautious because the code is related
to shallow-ness, even though it is OK doing a true-parent traversal
itself), or if we just don't have good test coverage for the case that
requires it.

If it _is_ necessary, I'm a little worried there are other problems
lurking. The whole issue is that we've seen and parsed some commits
before we get to this shallow deepen-since code-path. So just disabling
commit-graphs isn't enough. Even without them, we might have parsed some
commits the old-fashioned way and filled in their parent pointers. Is
that a problem?

If so, then I think we either have to "unparse" all of the existing
in-memory commits, or call out to a rev-list process with a fresh memory
space.

One especially interesting bit is this:

> +#  - we must use protocol v2, because it handles the "have" negotiation before
> +#    processing the shallow direectives

In the v0 protocol, we handle shallows at the end of receive_needs(). So
it happens before we call into get_common_commits(), which may do
further traversal. That makes it much more likely for the shallow code
to see a "clean" slate. Should v2 be ordering things in the same way?
But then would the have negotiation see the shallow parents? If so, is
that good or bad?

I'm pretty ignorant of how the shallow bits of upload-pack are supposed
to work.

> That creates an interesting problem for commits that have _already_ been
> parsed using the commit graph. Their commit->object.parsed flag is set,
> their commit->graph_pos is set, but their commit->maybe_tree may still
> be NULL. When somebody later calls repo_get_commit_tree(), we see that
> we haven't loaded the tree oid yet and try to get it from the commit
> graph. But since it has been freed, we segfault!

I was surprised we ever called repo_get_commit_tree() at all, since
we're literally just traversing commits here. It looks like
list-objects.c is very happy to queue pending trees for each commit,
even if we're just going to throw them away when we get to
process_tree()! I wonder if could be checking revs->tree_objects here
and saving ourselves some work.

> The new test in t5500 triggers this segfault, but see the comments there
> for how horribly intimate it has to be with how both upload-pack and
> commit graphs work.

This bug was triggered by a real world case. I'm not entirely clear what
the client-side state was; I had to reverse engineer the whole thing out
of the server-side coredump. The test here is my attempt to cut it down
to a minimum. I don't like having to manually generate the upload-pack
input, but I had trouble finding a fetch command that would trigger it.
And anyway, given how weirdly specific the requirements are for
generating this case, it seemed sensible to me to keep the input close
to the source of the problem.

-Peff

  reply	other threads:[~2019-09-12  0:18 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-12  0:04 [PATCH] upload-pack: disable commit graph more gently for shallow traversal Jeff King
2019-09-12  0:18 ` Jeff King [this message]
2019-09-12  1:11   ` [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set Jeff King
2019-09-12  1:19     ` Jeff King
2019-09-12 12:31     ` Derrick Stolee
2019-09-12 16:52     ` Junio C Hamano
2019-09-12 22:34       ` Jeff King
2019-09-13 18:05         ` Junio C Hamano
2019-09-12  2:08   ` [PATCH] upload-pack: disable commit graph more gently for shallow traversal Taylor Blau
2019-09-12 14:03     ` Jeff King
2019-09-12  2:07 ` Taylor Blau
2019-09-12 11:06   ` SZEDER Gábor
2019-09-12 14:01     ` Jeff King
2019-09-12 12:46   ` Derrick Stolee
2019-09-12 13:59   ` Jeff King
2019-09-12 12:23 ` Derrick Stolee
2019-09-12 14:23   ` Jeff King
2019-09-12 19:27     ` Derrick Stolee
2019-09-12 14:41 ` [PATCH v2] upload-pack commit graph segfault fix Jeff King
2019-09-12 14:43   ` Jeff King
2019-09-12 14:44   ` [PATCH v2 1/2] commit-graph: bump DIE_ON_LOAD check to actual load-time Jeff King
2019-09-12 19:30     ` Derrick Stolee
2019-09-12 14:44   ` [PATCH v2 2/2] upload-pack: disable commit graph more gently for shallow traversal Jeff King
2019-09-13 13:26     ` Derrick Stolee
2019-09-12 16:56   ` [PATCH v2] upload-pack commit graph segfault fix Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190912001846.GA31370@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).