From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>, git@vger.kernel.org
Cc: "Taylor Blau" <me@ttaylorr.com>,
"Derrick Stolee" <dstolee@microsoft.com>,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH] upload-pack: disable commit graph more gently for shallow traversal
Date: Thu, 12 Sep 2019 08:23:49 -0400 [thread overview]
Message-ID: <fdb81ce0-20ee-5880-2a6c-0c8b3f1739b9@gmail.com> (raw)
In-Reply-To: <20190912000414.GA31334@sigill.intra.peff.net>
On 9/11/2019 8:04 PM, Jeff King wrote:
> When the client has asked for certain shallow options like
> "deepen-since", we do a custom rev-list walk that pretends to be
> shallow. Before doing so, we have to disable the commit-graph, since it
> is not compatible with the shallow view of the repository. That's
> handled by 829a321569 (commit-graph: close_commit_graph before shallow
> walk, 2018-08-20). That commit literally closes and frees our
> repo->objects->commit_graph struct.
>
> That creates an interesting problem for commits that have _already_ been
> parsed using the commit graph. Their commit->object.parsed flag is set,
> their commit->graph_pos is set, but their commit->maybe_tree may still
> be NULL. When somebody later calls repo_get_commit_tree(), we see that
> we haven't loaded the tree oid yet and try to get it from the commit
> graph. But since it has been freed, we segfault!
OOPS! That is certainly a bad thing. I'm glad you found it, but I
am sorry for how you (probably) found it.
> So the root of the issue is a data dependency between the commit's
> lazy-load of the tree oid and the fact that the commit graph can go
> away mid-process. How can we resolve it?
>
> There are a couple of general approaches:
>
> 1. The obvious answer is to avoid loading the tree from the graph when
> we see that it's NULL. But then what do we return for the tree oid?
> If we return NULL, our caller in do_traverse() will rightly
> complain that we have no tree. We'd have to fallback to loading the
> actual commit object and re-parsing it. That requires teaching
> parse_commit_buffer() to understand re-parsing (i.e., not starting
> from a clean slate and not leaking any allocated bits like parent
> list pointers).
>
> 2. When we close the commit graph, walk through the set of in-memory
> objects and clear any graph_pos pointers. But this means we also
> have to "unparse" any such commits so that we know they still need
> to open the commit object to fill in their trees. So it's no less
> complicated than (1), and is more expensive (since we clear objects
> we might not later need).
>
> 3. Stop freeing the commit-graph struct. Continue to let it be used
> for lazy-loads of tree oids, but let upload-pack specify that it
> shouldn't be used for further commit parsing.
>
> 4. Push the whole shallow rev-list out to its own sub-process, with
> the commit-graph disabled from the start, giving it a clean memory
> space to work from.
>
> I've chosen (3) here. Options (1) and (2) would work, but are
> non-trivial to implement. Option (4) is more expensive, and I'm not sure
> how complicated it is (shelling out for the actual rev-list part is
> easy, but we do then parse the resulting commits internally, and I'm not
> clear which parts need to be handling shallow-ness).
I agree that (3) is the best option. The commit-graph is just a database
of commit data, and we are choosing to interpret the parent data differently.
The tree data is perfectly good.
> The new test in t5500 triggers this segfault, but see the comments there
> for how horribly intimate it has to be with how both upload-pack and
> commit graphs work.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> commit-graph.c | 10 ++++++++++
> commit-graph.h | 6 ++++++
> t/t5500-fetch-pack.sh | 38 ++++++++++++++++++++++++++++++++++++++
> upload-pack.c | 2 +-
> 4 files changed, 55 insertions(+), 1 deletion(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 9b02d2c426..bc5dd5913f 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -41,6 +41,8 @@
> #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
> + GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
>
> +static int commit_graph_disabled;
Should we be putting this inside the repository struct instead?
> +
> char *get_commit_graph_filename(const char *obj_dir)
> {
> char *filename = xstrfmt("%s/info/commit-graph", obj_dir);
> @@ -472,6 +474,9 @@ static int prepare_commit_graph(struct repository *r)
> die("dying as requested by the '%s' variable on commit-graph load!",
> GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
>
> + if (commit_graph_disabled)
> + return 0;
> +
> if (r->objects->commit_graph_attempted)
> return !!r->objects->commit_graph;
> r->objects->commit_graph_attempted = 1;
> @@ -2101,3 +2106,8 @@ void free_commit_graph(struct commit_graph *g)
> free(g->filename);
> free(g);
> }
> +
> +void disable_commit_graph(void)
> +{
> + commit_graph_disabled = 1;
> +}
Then this would take a struct repository *r.
> diff --git a/commit-graph.h b/commit-graph.h
> index 486e64e591..42d6e7c289 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -107,4 +107,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
> void close_commit_graph(struct raw_object_store *);
> void free_commit_graph(struct commit_graph *);
>
> +/*
> + * Disable further use of the commit graph in this process when parsing a
> + * "struct commit".
> + */
> +void disable_commit_graph(void);
> +
> #endif
> diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> index 8210f63d41..7601664b74 100755
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -783,6 +783,44 @@ test_expect_success 'clone shallow since selects no commits' '
> )
> '
>
> +# A few subtle things about the request in this test:
> +#
> +# - the server must have commit-graphs present and enabled
> +#
> +# - the history is such that our want/have share a common ancestor ("base"
> +# here)
> +#
> +# - we send only a single have, which is fewer than a normal client would
> +# send. This ensures that we don't parse "base" up front with
> +# parse_object(), but rather traverse to it as a parent while deciding if we
> +# can stop the "have" negotiation, and call parse_commit(). The former
> +# sees the actual object data and so always loads the three oid, whereas the
> +# latter will try to lod it lazily.
> +#
> +# - we must use protocol v2, because it handles the "have" negotiation before
> +# processing the shallow direectives
> +#
> +test_expect_success 'shallow since with commit graph and already-seen commit' '
> + test_create_repo shallow-since-graph &&
> + (
> + cd shallow-since-graph &&
> + test_commit base &&
> + test_commit master &&
> + git checkout -b other HEAD^ &&
> + test_commit other &&
> + git commit-graph write --reachable &&
> + git config core.commitGraph true &&
> +
> + GIT_PROTOCOL=version=2 git upload-pack . <<-EOF >/dev/null
> + 0012command=fetch
> + 00010013deepen-since 1
> + 0032want $(git rev-parse other)
> + 0032have $(git rev-parse master)
> + 0000
> + EOF
> + )
> +'
> +
> test_expect_success 'shallow clone exclude tag two' '
> test_create_repo shallow-exclude &&
> (
> diff --git a/upload-pack.c b/upload-pack.c
> index 222cd3ad89..a260b6b50d 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -722,7 +722,7 @@ static void deepen_by_rev_list(struct packet_writer *writer, int ac,
> {
> struct commit_list *result;
>
> - close_commit_graph(the_repository->objects);
> + disable_commit_graph();
> result = get_shallow_commits_by_rev_list(ac, av, SHALLOW, NOT_SHALLOW);
> send_shallow(writer, result);
> free_commit_list(result);
>
Your patch does not seem to actually cover the "I've already parsed some commits"
case, as you are only preventing the commit-graph from being prepared. Instead,
we need to have a short-circuit inside parse_commit() to avoid future parsing
from the commit-graph file.
(I'm going to the rest of the thread messages now to see if I just repeated
someone else's concerns.)
Thanks,
-Stolee
next prev parent reply other threads:[~2019-09-12 12:23 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-12 0:04 [PATCH] upload-pack: disable commit graph more gently for shallow traversal Jeff King
2019-09-12 0:18 ` Jeff King
2019-09-12 1:11 ` [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set Jeff King
2019-09-12 1:19 ` Jeff King
2019-09-12 12:31 ` Derrick Stolee
2019-09-12 16:52 ` Junio C Hamano
2019-09-12 22:34 ` Jeff King
2019-09-13 18:05 ` Junio C Hamano
2019-09-12 2:08 ` [PATCH] upload-pack: disable commit graph more gently for shallow traversal Taylor Blau
2019-09-12 14:03 ` Jeff King
2019-09-12 2:07 ` Taylor Blau
2019-09-12 11:06 ` SZEDER Gábor
2019-09-12 14:01 ` Jeff King
2019-09-12 12:46 ` Derrick Stolee
2019-09-12 13:59 ` Jeff King
2019-09-12 12:23 ` Derrick Stolee [this message]
2019-09-12 14:23 ` Jeff King
2019-09-12 19:27 ` Derrick Stolee
2019-09-12 14:41 ` [PATCH v2] upload-pack commit graph segfault fix Jeff King
2019-09-12 14:43 ` Jeff King
2019-09-12 14:44 ` [PATCH v2 1/2] commit-graph: bump DIE_ON_LOAD check to actual load-time Jeff King
2019-09-12 19:30 ` Derrick Stolee
2019-09-12 14:44 ` [PATCH v2 2/2] upload-pack: disable commit graph more gently for shallow traversal Jeff King
2019-09-13 13:26 ` Derrick Stolee
2019-09-12 16:56 ` [PATCH v2] upload-pack commit graph segfault fix Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fdb81ce0-20ee-5880-2a6c-0c8b3f1739b9@gmail.com \
--to=stolee@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=pclouds@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).