git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] upload-pack: disable commit graph more gently for shallow traversal
@ 2019-09-12  0:04 Jeff King
  2019-09-12  0:18 ` Jeff King
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Jeff King @ 2019-09-12  0:04 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

When the client has asked for certain shallow options like
"deepen-since", we do a custom rev-list walk that pretends to be
shallow. Before doing so, we have to disable the commit-graph, since it
is not compatible with the shallow view of the repository. That's
handled by 829a321569 (commit-graph: close_commit_graph before shallow
walk, 2018-08-20). That commit literally closes and frees our
repo->objects->commit_graph struct.

That creates an interesting problem for commits that have _already_ been
parsed using the commit graph. Their commit->object.parsed flag is set,
their commit->graph_pos is set, but their commit->maybe_tree may still
be NULL. When somebody later calls repo_get_commit_tree(), we see that
we haven't loaded the tree oid yet and try to get it from the commit
graph. But since it has been freed, we segfault!

So the root of the issue is a data dependency between the commit's
lazy-load of the tree oid and the fact that the commit graph can go
away mid-process. How can we resolve it?

There are a couple of general approaches:

  1. The obvious answer is to avoid loading the tree from the graph when
     we see that it's NULL. But then what do we return for the tree oid?
     If we return NULL, our caller in do_traverse() will rightly
     complain that we have no tree. We'd have to fallback to loading the
     actual commit object and re-parsing it. That requires teaching
     parse_commit_buffer() to understand re-parsing (i.e., not starting
     from a clean slate and not leaking any allocated bits like parent
     list pointers).

  2. When we close the commit graph, walk through the set of in-memory
     objects and clear any graph_pos pointers. But this means we also
     have to "unparse" any such commits so that we know they still need
     to open the commit object to fill in their trees. So it's no less
     complicated than (1), and is more expensive (since we clear objects
     we might not later need).

  3. Stop freeing the commit-graph struct. Continue to let it be used
     for lazy-loads of tree oids, but let upload-pack specify that it
     shouldn't be used for further commit parsing.

  4. Push the whole shallow rev-list out to its own sub-process, with
     the commit-graph disabled from the start, giving it a clean memory
     space to work from.

I've chosen (3) here. Options (1) and (2) would work, but are
non-trivial to implement. Option (4) is more expensive, and I'm not sure
how complicated it is (shelling out for the actual rev-list part is
easy, but we do then parse the resulting commits internally, and I'm not
clear which parts need to be handling shallow-ness).

The new test in t5500 triggers this segfault, but see the comments there
for how horribly intimate it has to be with how both upload-pack and
commit graphs work.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c        | 10 ++++++++++
 commit-graph.h        |  6 ++++++
 t/t5500-fetch-pack.sh | 38 ++++++++++++++++++++++++++++++++++++++
 upload-pack.c         |  2 +-
 4 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 9b02d2c426..bc5dd5913f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -41,6 +41,8 @@
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
 			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
+static int commit_graph_disabled;
+
 char *get_commit_graph_filename(const char *obj_dir)
 {
 	char *filename = xstrfmt("%s/info/commit-graph", obj_dir);
@@ -472,6 +474,9 @@ static int prepare_commit_graph(struct repository *r)
 		die("dying as requested by the '%s' variable on commit-graph load!",
 		    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
 
+	if (commit_graph_disabled)
+		return 0;
+
 	if (r->objects->commit_graph_attempted)
 		return !!r->objects->commit_graph;
 	r->objects->commit_graph_attempted = 1;
@@ -2101,3 +2106,8 @@ void free_commit_graph(struct commit_graph *g)
 	free(g->filename);
 	free(g);
 }
+
+void disable_commit_graph(void)
+{
+	commit_graph_disabled = 1;
+}
diff --git a/commit-graph.h b/commit-graph.h
index 486e64e591..42d6e7c289 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -107,4 +107,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 void close_commit_graph(struct raw_object_store *);
 void free_commit_graph(struct commit_graph *);
 
+/*
+ * Disable further use of the commit graph in this process when parsing a
+ * "struct commit".
+ */
+void disable_commit_graph(void);
+
 #endif
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 8210f63d41..7601664b74 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -783,6 +783,44 @@ test_expect_success 'clone shallow since selects no commits' '
 	)
 '
 
+# A few subtle things about the request in this test:
+#
+#  - the server must have commit-graphs present and enabled
+#
+#  - the history is such that our want/have share a common ancestor ("base"
+#    here)
+#
+#  - we send only a single have, which is fewer than a normal client would
+#    send. This ensures that we don't parse "base" up front with
+#    parse_object(), but rather traverse to it as a parent while deciding if we
+#    can stop the "have" negotiation, and call parse_commit(). The former
+#    sees the actual object data and so always loads the three oid, whereas the
+#    latter will try to lod it lazily.
+#
+#  - we must use protocol v2, because it handles the "have" negotiation before
+#    processing the shallow direectives
+#
+test_expect_success 'shallow since with commit graph and already-seen commit' '
+	test_create_repo shallow-since-graph &&
+	(
+	cd shallow-since-graph &&
+	test_commit base &&
+	test_commit master &&
+	git checkout -b other HEAD^ &&
+	test_commit other &&
+	git commit-graph write --reachable &&
+	git config core.commitGraph true &&
+
+	GIT_PROTOCOL=version=2 git upload-pack . <<-EOF >/dev/null
+	0012command=fetch
+	00010013deepen-since 1
+	0032want $(git rev-parse other)
+	0032have $(git rev-parse master)
+	0000
+	EOF
+	)
+'
+
 test_expect_success 'shallow clone exclude tag two' '
 	test_create_repo shallow-exclude &&
 	(
diff --git a/upload-pack.c b/upload-pack.c
index 222cd3ad89..a260b6b50d 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -722,7 +722,7 @@ static void deepen_by_rev_list(struct packet_writer *writer, int ac,
 {
 	struct commit_list *result;
 
-	close_commit_graph(the_repository->objects);
+	disable_commit_graph();
 	result = get_shallow_commits_by_rev_list(ac, av, SHALLOW, NOT_SHALLOW);
 	send_shallow(writer, result);
 	free_commit_list(result);
-- 
2.23.0.663.gbe3d117559

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] upload-pack: disable commit graph more gently for shallow traversal
  2019-09-12  0:04 [PATCH] upload-pack: disable commit graph more gently for shallow traversal Jeff King
@ 2019-09-12  0:18 ` Jeff King
  2019-09-12  1:11   ` [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set Jeff King
  2019-09-12  2:08   ` [PATCH] upload-pack: disable commit graph more gently for shallow traversal Taylor Blau
  2019-09-12  2:07 ` Taylor Blau
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 25+ messages in thread
From: Jeff King @ 2019-09-12  0:18 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On Wed, Sep 11, 2019 at 08:04:14PM -0400, Jeff King wrote:

> When the client has asked for certain shallow options like
> "deepen-since", we do a custom rev-list walk that pretends to be
> shallow. Before doing so, we have to disable the commit-graph, since it
> is not compatible with the shallow view of the repository. That's
> handled by 829a321569 (commit-graph: close_commit_graph before shallow
> walk, 2018-08-20). That commit literally closes and frees our
> repo->objects->commit_graph struct.

A few notes and curiosities on my patch and this general area.

The test suite passes with my patch both with and without
GIT_TEST_COMMIT_GRAPH=1. But to my surprise, it also passes if I delete
the close_commit_graph() line added by 829a321569!

So it's not clear to me whether this whole thing is truly unnecessary
(and Stolee was just being overly cautious because the code is related
to shallow-ness, even though it is OK doing a true-parent traversal
itself), or if we just don't have good test coverage for the case that
requires it.

If it _is_ necessary, I'm a little worried there are other problems
lurking. The whole issue is that we've seen and parsed some commits
before we get to this shallow deepen-since code-path. So just disabling
commit-graphs isn't enough. Even without them, we might have parsed some
commits the old-fashioned way and filled in their parent pointers. Is
that a problem?

If so, then I think we either have to "unparse" all of the existing
in-memory commits, or call out to a rev-list process with a fresh memory
space.

One especially interesting bit is this:

> +#  - we must use protocol v2, because it handles the "have" negotiation before
> +#    processing the shallow direectives

In the v0 protocol, we handle shallows at the end of receive_needs(). So
it happens before we call into get_common_commits(), which may do
further traversal. That makes it much more likely for the shallow code
to see a "clean" slate. Should v2 be ordering things in the same way?
But then would the have negotiation see the shallow parents? If so, is
that good or bad?

I'm pretty ignorant of how the shallow bits of upload-pack are supposed
to work.

> That creates an interesting problem for commits that have _already_ been
> parsed using the commit graph. Their commit->object.parsed flag is set,
> their commit->graph_pos is set, but their commit->maybe_tree may still
> be NULL. When somebody later calls repo_get_commit_tree(), we see that
> we haven't loaded the tree oid yet and try to get it from the commit
> graph. But since it has been freed, we segfault!

I was surprised we ever called repo_get_commit_tree() at all, since
we're literally just traversing commits here. It looks like
list-objects.c is very happy to queue pending trees for each commit,
even if we're just going to throw them away when we get to
process_tree()! I wonder if could be checking revs->tree_objects here
and saving ourselves some work.

> The new test in t5500 triggers this segfault, but see the comments there
> for how horribly intimate it has to be with how both upload-pack and
> commit graphs work.

This bug was triggered by a real world case. I'm not entirely clear what
the client-side state was; I had to reverse engineer the whole thing out
of the server-side coredump. The test here is my attempt to cut it down
to a minimum. I don't like having to manually generate the upload-pack
input, but I had trouble finding a fetch command that would trigger it.
And anyway, given how weirdly specific the requirements are for
generating this case, it seemed sensible to me to keep the input close
to the source of the problem.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set
  2019-09-12  0:18 ` Jeff King
@ 2019-09-12  1:11   ` Jeff King
  2019-09-12  1:19     ` Jeff King
                       ` (2 more replies)
  2019-09-12  2:08   ` [PATCH] upload-pack: disable commit graph more gently for shallow traversal Taylor Blau
  1 sibling, 3 replies; 25+ messages in thread
From: Jeff King @ 2019-09-12  1:11 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On Wed, Sep 11, 2019 at 08:18:46PM -0400, Jeff King wrote:

> > That creates an interesting problem for commits that have _already_ been
> > parsed using the commit graph. Their commit->object.parsed flag is set,
> > their commit->graph_pos is set, but their commit->maybe_tree may still
> > be NULL. When somebody later calls repo_get_commit_tree(), we see that
> > we haven't loaded the tree oid yet and try to get it from the commit
> > graph. But since it has been freed, we segfault!
> 
> I was surprised we ever called repo_get_commit_tree() at all, since
> we're literally just traversing commits here. It looks like
> list-objects.c is very happy to queue pending trees for each commit,
> even if we're just going to throw them away when we get to
> process_tree()! I wonder if could be checking revs->tree_objects here
> and saving ourselves some work.

Indeed, this seems to help quite a bit in the commit-graph case. I think
it's worth doing (and is independent of the other patch).

-- >8 --
Subject: list-objects: don't queue root trees unless revs->tree_objects is set

When traverse_commit_list() processes each commit, it queues the
commit's root tree in the pending array. Then, after all commits are
processed, it calls traverse_trees_and_blobs() to walk over the pending
list, calling process_tree() on each. But if revs->tree_objects is not
set, process_tree() just exists immediately!

We can save ourselves some work by not even bothering to queue these
trees in the first place. There are a few subtle points to make:

  - we also detect commits with a NULL tree pointer here. But this isn't
    an interesting check for broken commits, since the lookup_tree()
    we'd have done during commit parsing doesn't actually check that we
    have the tree on disk. So we're not losing any robustness.

  - besides queueing, we also set the NOT_USER_GIVEN flag on the tree
    object. This is used by the traverse_commit_list_filtered() variant.
    But if we're not exploring trees, then we won't actually care about
    this flag, which is used only inside process_tree() code-paths.

  - queueing trees eventually leads to us queueing blobs, too. But we
    don't need to check revs->blob_objects here. Even in the current
    code, we still wouldn't find those blobs, because we'd never open up
    the tree objects to list their contents.

  - the user-visible impact to the caller is minimal. The pending trees
    are all cleared by the time the function returns anyway, by
    traverse_trees_and_blobs(). We do call a show_commit() callback,
    which technically could be looking at revs->pending during the
    callback. But it seems like a rather unlikely thing to do (if you
    want the tree of the current commit, then accessing the tree struct
    member is a lot simpler).

So this should be safe to do. Let's look at the benefits:

  [before]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
    Time (mean ± σ):      7.651 s ±  0.021 s    [User: 7.399 s, System: 0.252 s]
    Range (min … max):    7.607 s …  7.683 s    10 runs

  [after]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
    Time (mean ± σ):      7.593 s ±  0.023 s    [User: 7.329 s, System: 0.264 s]
    Range (min … max):    7.565 s …  7.634 s    10 runs

Not too impressive, but then we're really just avoiding sticking a
pointer into a growable array. But still, I'll take a free 0.75%
speedup.

Let's try it after running "git commit-graph write":

  [before]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
    Time (mean ± σ):      1.458 s ±  0.011 s    [User: 1.199 s, System: 0.259 s]
    Range (min … max):    1.447 s …  1.481 s    10 runs

  [after]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
    Time (mean ± σ):      1.126 s ±  0.023 s    [User: 896.5 ms, System: 229.0 ms]
    Range (min … max):    1.106 s …  1.181 s    10 runs

Now that's more like it. We saved over 22% of the total time. Part of
that is because the runtime is shorter overall, but the absolute
improvement is also much larger. What's going on?

When we fill in a commit struct using the commit graph, we don't bother
to set the tree pointer, and instead lazy-load it when somebody calls
get_commit_tree(). So we're not only skipping the pointer write to the
pending queue, but we're skipping the lazy-load of the tree entirely.

Signed-off-by: Jeff King <peff@peff.net>
---
 list-objects.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/list-objects.c b/list-objects.c
index b5651ddd5b..c837bcaca8 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -370,7 +370,9 @@ static void do_traverse(struct traversal_context *ctx)
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (get_commit_tree(commit)) {
+		if (!ctx->revs->tree_objects)
+			; /* do not bother loading tree */
+		else if (get_commit_tree(commit)) {
 			struct tree *tree = get_commit_tree(commit);
 			tree->object.flags |= NOT_USER_GIVEN;
 			add_pending_tree(ctx->revs, tree);
-- 
2.23.0.663.gbe3d117559


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set
  2019-09-12  1:11   ` [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set Jeff King
@ 2019-09-12  1:19     ` Jeff King
  2019-09-12 12:31     ` Derrick Stolee
  2019-09-12 16:52     ` Junio C Hamano
  2 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2019-09-12  1:19 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On Wed, Sep 11, 2019 at 09:11:37PM -0400, Jeff King wrote:

> Let's try it after running "git commit-graph write":
> 
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.458 s ±  0.011 s    [User: 1.199 s, System: 0.259 s]
>     Range (min … max):    1.447 s …  1.481 s    10 runs
> 
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.126 s ±  0.023 s    [User: 896.5 ms, System: 229.0 ms]
>     Range (min … max):    1.106 s …  1.181 s    10 runs
> 
> Now that's more like it. We saved over 22% of the total time. Part of
> that is because the runtime is shorter overall, but the absolute
> improvement is also much larger. What's going on?

Another thing I noticed is that rev-list line-buffers when we're writing
to /dev/null. This is actually the doing of glibc's stdio, as it
consider the character device special enough to turn off full buffering
(we also do our own manual flush after each commit).

I think it's probably a fairer test to time it that way (quite often
you'd be writing to a pipe, which would have the same behavior). But our
improvement is even better as a percentage when writing to a file:

  [before]
  Benchmark #1: git -C linux rev-list HEAD >file
  Time (mean ± σ):      1.046 s ±  0.017 s    [User: 922.7 ms, System: 104.3 ms]
  Range (min … max):    1.031 s …  1.087 s    10 runs

  [after]
  Benchmark #1: git -C linux rev-list HEAD >file
  Time (mean ± σ):     741.4 ms ±  14.1 ms    [User: 644.8 ms, System: 75.9 ms]
  Range (min … max):   721.2 ms … 766.8 ms    10 runs

That's a 29% improvement instead of 22% (and shows that write() syscalls
are wasting close to 30% of our runtime, a well).

I wonder if it would be worth teaching rev-list a --buffer option. Or
just kicking it in automatically when we're just printing single oids.
Once upon a time the single-record flushing was useful for:

  git rev-list HEAD -- <pathspec> | git diff-tree ...

to feed incremental results as soon as we have them (imagine we see one
commit which touches the pathspec, then go through 100,000 that don't).
But these days "git log" does that at all internally (and typically
outputs quite a bit more between each flush, though one could argue that
"log --oneline" might want the same behavior).

I dunno. Maybe it's not worth micro-optimizing too hard, but I was
surprised how big a difference it made.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] upload-pack: disable commit graph more gently for shallow traversal
  2019-09-12  0:04 [PATCH] upload-pack: disable commit graph more gently for shallow traversal Jeff King
  2019-09-12  0:18 ` Jeff King
@ 2019-09-12  2:07 ` Taylor Blau
  2019-09-12 11:06   ` SZEDER Gábor
                     ` (2 more replies)
  2019-09-12 12:23 ` Derrick Stolee
  2019-09-12 14:41 ` [PATCH v2] upload-pack commit graph segfault fix Jeff King
  3 siblings, 3 replies; 25+ messages in thread
From: Taylor Blau @ 2019-09-12  2:07 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

Hi Peff,

Thanks in advance for digging into all of this. I feel guilty for having
found the issue myself, and then gotten a headache for just long enough
to have you completely fix the issue by the time that I got back. So,
thanks :-).

On Wed, Sep 11, 2019 at 08:04:15PM -0400, Jeff King wrote:
> When the client has asked for certain shallow options like
> "deepen-since", we do a custom rev-list walk that pretends to be
> shallow. Before doing so, we have to disable the commit-graph, since it
> is not compatible with the shallow view of the repository. That's
> handled by 829a321569 (commit-graph: close_commit_graph before shallow
> walk, 2018-08-20). That commit literally closes and frees our
> repo->objects->commit_graph struct.
>
> That creates an interesting problem for commits that have _already_ been
> parsed using the commit graph. Their commit->object.parsed flag is set,
> their commit->graph_pos is set, but their commit->maybe_tree may still
> be NULL. When somebody later calls repo_get_commit_tree(), we see that
> we haven't loaded the tree oid yet and try to get it from the commit
> graph. But since it has been freed, we segfault!
>
> So the root of the issue is a data dependency between the commit's
> lazy-load of the tree oid and the fact that the commit graph can go
> away mid-process. How can we resolve it?
>
> There are a couple of general approaches:
>
>   1. The obvious answer is to avoid loading the tree from the graph when
>      we see that it's NULL. But then what do we return for the tree oid?
>      If we return NULL, our caller in do_traverse() will rightly
>      complain that we have no tree. We'd have to fallback to loading the
>      actual commit object and re-parsing it. That requires teaching
>      parse_commit_buffer() to understand re-parsing (i.e., not starting
>      from a clean slate and not leaking any allocated bits like parent
>      list pointers).
>
>   2. When we close the commit graph, walk through the set of in-memory
>      objects and clear any graph_pos pointers. But this means we also
>      have to "unparse" any such commits so that we know they still need
>      to open the commit object to fill in their trees. So it's no less
>      complicated than (1), and is more expensive (since we clear objects
>      we might not later need).
>
>   3. Stop freeing the commit-graph struct. Continue to let it be used
>      for lazy-loads of tree oids, but let upload-pack specify that it
>      shouldn't be used for further commit parsing.
>
>   4. Push the whole shallow rev-list out to its own sub-process, with
>      the commit-graph disabled from the start, giving it a clean memory
>      space to work from.
>
> I've chosen (3) here. Options (1) and (2) would work, but are
> non-trivial to implement. Option (4) is more expensive, and I'm not sure
> how complicated it is (shelling out for the actual rev-list part is
> easy, but we do then parse the resulting commits internally, and I'm not
> clear which parts need to be handling shallow-ness).

Option (3) make the most sense to me, too. When I started hacking on a
similar patch, I implemented roughly what you describe in (1), which is
to say that I added an extra:

  && r->objects->commit_graph

To ensure that the 'commit_graph' about to cause a segfault was
non-NULL after calling 'close_commit_graph'. But, as you pointed out:
what are we to return after that? NULL, since we expected it to be in
the commit-graph, but we don't have a commit-graph to check anymore? I
don't think that that makes much sense, and what you implemented here is
clearly better.

> The new test in t5500 triggers this segfault, but see the comments there
> for how horribly intimate it has to be with how both upload-pack and
> commit graphs work.

Thanks for the comment, too. I agree that protocol-level hacking is
somewhat smelly, at least as far as t5500 is concerned, but for this
particular case it seems the only sensible option.

I'm still left scratching my (sore) head about how someone triggered
this in production, but maybe that's a riddle for another day.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  commit-graph.c        | 10 ++++++++++
>  commit-graph.h        |  6 ++++++
>  t/t5500-fetch-pack.sh | 38 ++++++++++++++++++++++++++++++++++++++
>  upload-pack.c         |  2 +-
>  4 files changed, 55 insertions(+), 1 deletion(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 9b02d2c426..bc5dd5913f 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -41,6 +41,8 @@
>  #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
>  			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
>
> +static int commit_graph_disabled;
> +
>  char *get_commit_graph_filename(const char *obj_dir)
>  {
>  	char *filename = xstrfmt("%s/info/commit-graph", obj_dir);
> @@ -472,6 +474,9 @@ static int prepare_commit_graph(struct repository *r)
>  		die("dying as requested by the '%s' variable on commit-graph load!",
>  		    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
>
> +	if (commit_graph_disabled)
> +		return 0;
> +
>  	if (r->objects->commit_graph_attempted)
>  		return !!r->objects->commit_graph;
>  	r->objects->commit_graph_attempted = 1;
> @@ -2101,3 +2106,8 @@ void free_commit_graph(struct commit_graph *g)
>  	free(g->filename);
>  	free(g);
>  }
> +
> +void disable_commit_graph(void)
> +{
> +	commit_graph_disabled = 1;
> +}
> diff --git a/commit-graph.h b/commit-graph.h
> index 486e64e591..42d6e7c289 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -107,4 +107,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
>  void close_commit_graph(struct raw_object_store *);
>  void free_commit_graph(struct commit_graph *);
>
> +/*
> + * Disable further use of the commit graph in this process when parsing a
> + * "struct commit".
> + */
> +void disable_commit_graph(void);
> +
>  #endif
> diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> index 8210f63d41..7601664b74 100755
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -783,6 +783,44 @@ test_expect_success 'clone shallow since selects no commits' '
>  	)
>  '
>
> +# A few subtle things about the request in this test:
> +#
> +#  - the server must have commit-graphs present and enabled

I think "enabled" may be somewhat redundant, at least since some recent
changes to enable this by default. (As an aside, I tried to dig up the
patches that Stolee sent to actually enable this and back up my claim,
but I couldn't find them on 'master'. I'm not sure if that's my poor use
of 'git log', or misremembering the fact that these were enabled by
default.)

> +#
> +#  - the history is such that our want/have share a common ancestor ("base"
> +#    here)
> +#
> +#  - we send only a single have, which is fewer than a normal client would
> +#    send. This ensures that we don't parse "base" up front with
> +#    parse_object(), but rather traverse to it as a parent while deciding if we
> +#    can stop the "have" negotiation, and call parse_commit(). The former
> +#    sees the actual object data and so always loads the three oid, whereas the
> +#    latter will try to lod it lazily.

s/lod/load

> +#
> +#  - we must use protocol v2, because it handles the "have" negotiation before
> +#    processing the shallow direectives
> +#
> +test_expect_success 'shallow since with commit graph and already-seen commit' '
> +	test_create_repo shallow-since-graph &&
> +	(

I'm not sure if this same-level indentation is common, or you're missing
an extra tab here. Either way.

> +	cd shallow-since-graph &&
> +	test_commit base &&
> +	test_commit master &&
> +	git checkout -b other HEAD^ &&
> +	test_commit other &&
> +	git commit-graph write --reachable &&
> +	git config core.commitGraph true &&

Another nit, but do you have any thoughts about using 'test_config' here
instead of a pure 'git config'? I don't think that it really would
matter much (since none of the other tests hopefully have anything to do
with commit-graph, and doubly so if it is enabled by default, _and_
since you're using your own repository), but anyway.
> +
> +	GIT_PROTOCOL=version=2 git upload-pack . <<-EOF >/dev/null
> +	0012command=fetch
> +	00010013deepen-since 1
> +	0032want $(git rev-parse other)
> +	0032have $(git rev-parse master)
> +	0000
> +	EOF
> +	)
> +'
> +

Aside from my nit-picking above, the protocol exchange here looks
correct, and your rationale backs it up. Likewise, this causes a
segfault for me on the tip of 'master', and applying the later part of
your patch fixes it for me. Thanks.

>  test_expect_success 'shallow clone exclude tag two' '
>  	test_create_repo shallow-exclude &&
>  	(
> diff --git a/upload-pack.c b/upload-pack.c
> index 222cd3ad89..a260b6b50d 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -722,7 +722,7 @@ static void deepen_by_rev_list(struct packet_writer *writer, int ac,
>  {
>  	struct commit_list *result;
>
> -	close_commit_graph(the_repository->objects);
> +	disable_commit_graph();

This line made me think to check if we could remove 'close_commit_graph'
all together, but there is a remaining callsite in packfile.c, and
closing the commit-graph _is_ the right thing to do there, so I think we
ought to keep it around.

>  	result = get_shallow_commits_by_rev_list(ac, av, SHALLOW, NOT_SHALLOW);
>  	send_shallow(writer, result);
>  	free_commit_list(result);
> --
> 2.23.0.663.gbe3d117559

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] upload-pack: disable commit graph more gently for shallow traversal
  2019-09-12  0:18 ` Jeff King
  2019-09-12  1:11   ` [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set Jeff King
@ 2019-09-12  2:08   ` Taylor Blau
  2019-09-12 14:03     ` Jeff King
  1 sibling, 1 reply; 25+ messages in thread
From: Taylor Blau @ 2019-09-12  2:08 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On Wed, Sep 11, 2019 at 08:18:46PM -0400, Jeff King wrote:
> On Wed, Sep 11, 2019 at 08:04:14PM -0400, Jeff King wrote:
>
> > When the client has asked for certain shallow options like
> > "deepen-since", we do a custom rev-list walk that pretends to be
> > shallow. Before doing so, we have to disable the commit-graph, since it
> > is not compatible with the shallow view of the repository. That's
> > handled by 829a321569 (commit-graph: close_commit_graph before shallow
> > walk, 2018-08-20). That commit literally closes and frees our
> > repo->objects->commit_graph struct.
>
> A few notes and curiosities on my patch and this general area.
>
> The test suite passes with my patch both with and without
> GIT_TEST_COMMIT_GRAPH=1. But to my surprise, it also passes if I delete
> the close_commit_graph() line added by 829a321569!
>
> So it's not clear to me whether this whole thing is truly unnecessary
> (and Stolee was just being overly cautious because the code is related
> to shallow-ness, even though it is OK doing a true-parent traversal
> itself), or if we just don't have good test coverage for the case that
> requires it.
>
> If it _is_ necessary, I'm a little worried there are other problems
> lurking. The whole issue is that we've seen and parsed some commits
> before we get to this shallow deepen-since code-path. So just disabling
> commit-graphs isn't enough. Even without them, we might have parsed some
> commits the old-fashioned way and filled in their parent pointers. Is
> that a problem?

I am, too, but I don't think we should hold this patch up which is
obviously improving the situation in the meantime while we figure that
out.

> [snip]

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] upload-pack: disable commit graph more gently for shallow traversal
  2019-09-12  2:07 ` Taylor Blau
@ 2019-09-12 11:06   ` SZEDER Gábor
  2019-09-12 14:01     ` Jeff King
  2019-09-12 12:46   ` Derrick Stolee
  2019-09-12 13:59   ` Jeff King
  2 siblings, 1 reply; 25+ messages in thread
From: SZEDER Gábor @ 2019-09-12 11:06 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Jeff King, git, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On Wed, Sep 11, 2019 at 10:07:48PM -0400, Taylor Blau wrote:
> > diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> > index 8210f63d41..7601664b74 100755
> > --- a/t/t5500-fetch-pack.sh
> > +++ b/t/t5500-fetch-pack.sh

> > +#  - we must use protocol v2, because it handles the "have" negotiation before
> > +#    processing the shallow direectives

s/ee/e/

> > +#
> > +test_expect_success 'shallow since with commit graph and already-seen commit' '
> > +	test_create_repo shallow-since-graph &&
> > +	(
> 
> I'm not sure if this same-level indentation is common, or you're missing
> an extra tab here. Either way.
> 
> > +	cd shallow-since-graph &&
> > +	test_commit base &&
> > +	test_commit master &&
> > +	git checkout -b other HEAD^ &&
> > +	test_commit other &&
> > +	git commit-graph write --reachable &&
> > +	git config core.commitGraph true &&
> 
> Another nit, but do you have any thoughts about using 'test_config' here
> instead of a pure 'git config'? I don't think that it really would
> matter much (since none of the other tests hopefully have anything to do
> with commit-graph, and doubly so if it is enabled by default, _and_
> since you're using your own repository), but anyway.

We can't simply replace that 'git config' command with 'test_config',
because it is implemented using 'test_when_finished', which doesn't
work in a subshell [1].  What we could do is:

  test_create_repo shallow-since-graph &&
  test_config -C shallow-since-graph core.commitGraph true &&
  (
     cd shallow-since-graph &&
     ....

Or we could entirely avoid the subshell by passing '-C
shallow-since-graph' to every single command... [2]

However, since this repo was specifically created for this test, it
doesn't really matter in what state it's left behind, so I don't think
it's worth it.


[1] 0968f12a99 (test-lib-functions: detect test_when_finished in
    subshell, 2015-09-05)
[2] Or bite the bullet, and declare that every test case shall start
    in $TRASH_DIRECTORY.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] upload-pack: disable commit graph more gently for shallow traversal
  2019-09-12  0:04 [PATCH] upload-pack: disable commit graph more gently for shallow traversal Jeff King
  2019-09-12  0:18 ` Jeff King
  2019-09-12  2:07 ` Taylor Blau
@ 2019-09-12 12:23 ` Derrick Stolee
  2019-09-12 14:23   ` Jeff King
  2019-09-12 14:41 ` [PATCH v2] upload-pack commit graph segfault fix Jeff King
  3 siblings, 1 reply; 25+ messages in thread
From: Derrick Stolee @ 2019-09-12 12:23 UTC (permalink / raw)
  To: Jeff King, git
  Cc: Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On 9/11/2019 8:04 PM, Jeff King wrote:
> When the client has asked for certain shallow options like
> "deepen-since", we do a custom rev-list walk that pretends to be
> shallow. Before doing so, we have to disable the commit-graph, since it
> is not compatible with the shallow view of the repository. That's
> handled by 829a321569 (commit-graph: close_commit_graph before shallow
> walk, 2018-08-20). That commit literally closes and frees our
> repo->objects->commit_graph struct.
> 
> That creates an interesting problem for commits that have _already_ been
> parsed using the commit graph. Their commit->object.parsed flag is set,
> their commit->graph_pos is set, but their commit->maybe_tree may still
> be NULL. When somebody later calls repo_get_commit_tree(), we see that
> we haven't loaded the tree oid yet and try to get it from the commit
> graph. But since it has been freed, we segfault!

OOPS! That is certainly a bad thing. I'm glad you found it, but I
am sorry for how you (probably) found it.

> So the root of the issue is a data dependency between the commit's
> lazy-load of the tree oid and the fact that the commit graph can go
> away mid-process. How can we resolve it?
> 
> There are a couple of general approaches:
> 
>   1. The obvious answer is to avoid loading the tree from the graph when
>      we see that it's NULL. But then what do we return for the tree oid?
>      If we return NULL, our caller in do_traverse() will rightly
>      complain that we have no tree. We'd have to fallback to loading the
>      actual commit object and re-parsing it. That requires teaching
>      parse_commit_buffer() to understand re-parsing (i.e., not starting
>      from a clean slate and not leaking any allocated bits like parent
>      list pointers).
> 
>   2. When we close the commit graph, walk through the set of in-memory
>      objects and clear any graph_pos pointers. But this means we also
>      have to "unparse" any such commits so that we know they still need
>      to open the commit object to fill in their trees. So it's no less
>      complicated than (1), and is more expensive (since we clear objects
>      we might not later need).
> 
>   3. Stop freeing the commit-graph struct. Continue to let it be used
>      for lazy-loads of tree oids, but let upload-pack specify that it
>      shouldn't be used for further commit parsing.
> 
>   4. Push the whole shallow rev-list out to its own sub-process, with
>      the commit-graph disabled from the start, giving it a clean memory
>      space to work from.
> 
> I've chosen (3) here. Options (1) and (2) would work, but are
> non-trivial to implement. Option (4) is more expensive, and I'm not sure
> how complicated it is (shelling out for the actual rev-list part is
> easy, but we do then parse the resulting commits internally, and I'm not
> clear which parts need to be handling shallow-ness).

I agree that (3) is the best option. The commit-graph is just a database
of commit data, and we are choosing to interpret the parent data differently.
The tree data is perfectly good.

> The new test in t5500 triggers this segfault, but see the comments there
> for how horribly intimate it has to be with how both upload-pack and
> commit graphs work.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  commit-graph.c        | 10 ++++++++++
>  commit-graph.h        |  6 ++++++
>  t/t5500-fetch-pack.sh | 38 ++++++++++++++++++++++++++++++++++++++
>  upload-pack.c         |  2 +-
>  4 files changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 9b02d2c426..bc5dd5913f 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -41,6 +41,8 @@
>  #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
>  			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
>  
> +static int commit_graph_disabled;

Should we be putting this inside the repository struct instead?

> +
>  char *get_commit_graph_filename(const char *obj_dir)
>  {
>  	char *filename = xstrfmt("%s/info/commit-graph", obj_dir);
> @@ -472,6 +474,9 @@ static int prepare_commit_graph(struct repository *r)
>  		die("dying as requested by the '%s' variable on commit-graph load!",
>  		    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
>  
> +	if (commit_graph_disabled)
> +		return 0;
> +
>  	if (r->objects->commit_graph_attempted)
>  		return !!r->objects->commit_graph;
>  	r->objects->commit_graph_attempted = 1;
> @@ -2101,3 +2106,8 @@ void free_commit_graph(struct commit_graph *g)
>  	free(g->filename);
>  	free(g);
>  }
> +
> +void disable_commit_graph(void)
> +{
> +	commit_graph_disabled = 1;
> +}

Then this would take a struct repository *r.

> diff --git a/commit-graph.h b/commit-graph.h
> index 486e64e591..42d6e7c289 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -107,4 +107,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
>  void close_commit_graph(struct raw_object_store *);
>  void free_commit_graph(struct commit_graph *);
>  
> +/*
> + * Disable further use of the commit graph in this process when parsing a
> + * "struct commit".
> + */
> +void disable_commit_graph(void);
> +
>  #endif
> diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> index 8210f63d41..7601664b74 100755
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -783,6 +783,44 @@ test_expect_success 'clone shallow since selects no commits' '
>  	)
>  '
>  
> +# A few subtle things about the request in this test:
> +#
> +#  - the server must have commit-graphs present and enabled
> +#
> +#  - the history is such that our want/have share a common ancestor ("base"
> +#    here)
> +#
> +#  - we send only a single have, which is fewer than a normal client would
> +#    send. This ensures that we don't parse "base" up front with
> +#    parse_object(), but rather traverse to it as a parent while deciding if we
> +#    can stop the "have" negotiation, and call parse_commit(). The former
> +#    sees the actual object data and so always loads the three oid, whereas the
> +#    latter will try to lod it lazily.
> +#
> +#  - we must use protocol v2, because it handles the "have" negotiation before
> +#    processing the shallow direectives
> +#
> +test_expect_success 'shallow since with commit graph and already-seen commit' '
> +	test_create_repo shallow-since-graph &&
> +	(
> +	cd shallow-since-graph &&
> +	test_commit base &&
> +	test_commit master &&
> +	git checkout -b other HEAD^ &&
> +	test_commit other &&
> +	git commit-graph write --reachable &&
> +	git config core.commitGraph true &&
> +
> +	GIT_PROTOCOL=version=2 git upload-pack . <<-EOF >/dev/null
> +	0012command=fetch
> +	00010013deepen-since 1
> +	0032want $(git rev-parse other)
> +	0032have $(git rev-parse master)
> +	0000
> +	EOF
> +	)
> +'
> +
>  test_expect_success 'shallow clone exclude tag two' '
>  	test_create_repo shallow-exclude &&
>  	(
> diff --git a/upload-pack.c b/upload-pack.c
> index 222cd3ad89..a260b6b50d 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -722,7 +722,7 @@ static void deepen_by_rev_list(struct packet_writer *writer, int ac,
>  {
>  	struct commit_list *result;
>  
> -	close_commit_graph(the_repository->objects);
> +	disable_commit_graph();
>  	result = get_shallow_commits_by_rev_list(ac, av, SHALLOW, NOT_SHALLOW);
>  	send_shallow(writer, result);
>  	free_commit_list(result);
> 

Your patch does not seem to actually cover the "I've already parsed some commits"
case, as you are only preventing the commit-graph from being prepared. Instead,
we need to have a short-circuit inside parse_commit() to avoid future parsing
from the commit-graph file.

(I'm going to the rest of the thread messages now to see if I just repeated
someone else's concerns.)

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set
  2019-09-12  1:11   ` [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set Jeff King
  2019-09-12  1:19     ` Jeff King
@ 2019-09-12 12:31     ` Derrick Stolee
  2019-09-12 16:52     ` Junio C Hamano
  2 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee @ 2019-09-12 12:31 UTC (permalink / raw)
  To: Jeff King, git
  Cc: Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On 9/11/2019 9:11 PM, Jeff King wrote:
> On Wed, Sep 11, 2019 at 08:18:46PM -0400, Jeff King wrote:
> 
>>> That creates an interesting problem for commits that have _already_ been
>>> parsed using the commit graph. Their commit->object.parsed flag is set,
>>> their commit->graph_pos is set, but their commit->maybe_tree may still
>>> be NULL. When somebody later calls repo_get_commit_tree(), we see that
>>> we haven't loaded the tree oid yet and try to get it from the commit
>>> graph. But since it has been freed, we segfault!
>>
>> I was surprised we ever called repo_get_commit_tree() at all, since
>> we're literally just traversing commits here. It looks like
>> list-objects.c is very happy to queue pending trees for each commit,
>> even if we're just going to throw them away when we get to
>> process_tree()! I wonder if could be checking revs->tree_objects here
>> and saving ourselves some work.
> 
> Indeed, this seems to help quite a bit in the commit-graph case. I think
> it's worth doing (and is independent of the other patch).

Good find!

> -- >8 --
> Subject: list-objects: don't queue root trees unless revs->tree_objects is set
> 
> When traverse_commit_list() processes each commit, it queues the
> commit's root tree in the pending array. Then, after all commits are
> processed, it calls traverse_trees_and_blobs() to walk over the pending
> list, calling process_tree() on each. But if revs->tree_objects is not
> set, process_tree() just exists immediately!
> 
> We can save ourselves some work by not even bothering to queue these
> trees in the first place. There are a few subtle points to make:
> 
>   - we also detect commits with a NULL tree pointer here. But this isn't
>     an interesting check for broken commits, since the lookup_tree()
>     we'd have done during commit parsing doesn't actually check that we
>     have the tree on disk. So we're not losing any robustness.
> 
>   - besides queueing, we also set the NOT_USER_GIVEN flag on the tree
>     object. This is used by the traverse_commit_list_filtered() variant.
>     But if we're not exploring trees, then we won't actually care about
>     this flag, which is used only inside process_tree() code-paths.
> 
>   - queueing trees eventually leads to us queueing blobs, too. But we
>     don't need to check revs->blob_objects here. Even in the current
>     code, we still wouldn't find those blobs, because we'd never open up
>     the tree objects to list their contents.
> 
>   - the user-visible impact to the caller is minimal. The pending trees
>     are all cleared by the time the function returns anyway, by
>     traverse_trees_and_blobs(). We do call a show_commit() callback,
>     which technically could be looking at revs->pending during the
>     callback. But it seems like a rather unlikely thing to do (if you
>     want the tree of the current commit, then accessing the tree struct
>     member is a lot simpler).

These all look reasonable. We shouldn't need to do any of that any more.

> So this should be safe to do. Let's look at the benefits:
> 
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      7.651 s ±  0.021 s    [User: 7.399 s, System: 0.252 s]
>     Range (min … max):    7.607 s …  7.683 s    10 runs
> 
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      7.593 s ±  0.023 s    [User: 7.329 s, System: 0.264 s]
>     Range (min … max):    7.565 s …  7.634 s    10 runs
> 
> Not too impressive, but then we're really just avoiding sticking a
> pointer into a growable array. But still, I'll take a free 0.75%
> speedup.
> 
> Let's try it after running "git commit-graph write":
> 
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.458 s ±  0.011 s    [User: 1.199 s, System: 0.259 s]
>     Range (min … max):    1.447 s …  1.481 s    10 runs
> 
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.126 s ±  0.023 s    [User: 896.5 ms, System: 229.0 ms]
>     Range (min … max):    1.106 s …  1.181 s    10 runs
> 
> Now that's more like it. We saved over 22% of the total time. Part of
> that is because the runtime is shorter overall, but the absolute
> improvement is also much larger. What's going on?

Very cool.

> When we fill in a commit struct using the commit graph, we don't bother
> to set the tree pointer, and instead lazy-load it when somebody calls
> get_commit_tree(). So we're not only skipping the pointer write to the
> pending queue, but we're skipping the lazy-load of the tree entirely.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  list-objects.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/list-objects.c b/list-objects.c
> index b5651ddd5b..c837bcaca8 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -370,7 +370,9 @@ static void do_traverse(struct traversal_context *ctx)
>  		 * an uninteresting boundary commit may not have its tree
>  		 * parsed yet, but we are not going to show them anyway
>  		 */
> -		if (get_commit_tree(commit)) {
> +		if (!ctx->revs->tree_objects)
> +			; /* do not bother loading tree */
> +		else if (get_commit_tree(commit)) {
>  			struct tree *tree = get_commit_tree(commit);
>  			tree->object.flags |= NOT_USER_GIVEN;
>  			add_pending_tree(ctx->revs, tree);

And a simple code fix. LGTM.

Thanks!
-Stolee

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] upload-pack: disable commit graph more gently for shallow traversal
  2019-09-12  2:07 ` Taylor Blau
  2019-09-12 11:06   ` SZEDER Gábor
@ 2019-09-12 12:46   ` Derrick Stolee
  2019-09-12 13:59   ` Jeff King
  2 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee @ 2019-09-12 12:46 UTC (permalink / raw)
  To: Taylor Blau, Jeff King
  Cc: git, Derrick Stolee, Nguyễn Thái Ngọc Duy

On 9/11/2019 10:07 PM, Taylor Blau wrote:>>
>> +# A few subtle things about the request in this test:
>> +#
>> +#  - the server must have commit-graphs present and enabled
> 
> I think "enabled" may be somewhat redundant, at least since some recent
> changes to enable this by default. (As an aside, I tried to dig up the
> patches that Stolee sent to actually enable this and back up my claim,
> but I couldn't find them on 'master'. I'm not sure if that's my poor use
> of 'git log', or misremembering the fact that these were enabled by
> default.)

Commit 31b1de6 ("commit-graph: turn on commit-graph by default" 2019-08-13)
is part of ds/feature-macros and seems to be in master (at least in gitster/git).

Having core.commitGraph=true does very little if the commit-graph is not
written, but gc.writeCommitGraph is also enabled by default in that commit.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] upload-pack: disable commit graph more gently for shallow traversal
  2019-09-12  2:07 ` Taylor Blau
  2019-09-12 11:06   ` SZEDER Gábor
  2019-09-12 12:46   ` Derrick Stolee
@ 2019-09-12 13:59   ` Jeff King
  2 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2019-09-12 13:59 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Derrick Stolee, Nguyễn Thái Ngọc Duy

On Wed, Sep 11, 2019 at 10:07:48PM -0400, Taylor Blau wrote:

> > The new test in t5500 triggers this segfault, but see the comments there
> > for how horribly intimate it has to be with how both upload-pack and
> > commit graphs work.
> 
> Thanks for the comment, too. I agree that protocol-level hacking is
> somewhat smelly, at least as far as t5500 is concerned, but for this
> particular case it seems the only sensible option.
> 
> I'm still left scratching my (sore) head about how someone triggered
> this in production, but maybe that's a riddle for another day.

I'll admit that part of my motivation was my inability to generate a
working case just using Git commands (so my justification is that if
it's so hard to do, then the test is inherently fragile, but you can
also just accuse me of being lazy).

I think the key is something to do with the shape of history related to
the requests, such that we walk over a commit during the have
negotiation, but then also need to walk over it during the deepen-since.
So maybe I am just missing something obvious, or maybe it just needs to
be a deeper history. I do like that the case I showed is the minimal
history, at least.

> > +# A few subtle things about the request in this test:
> > +#
> > +#  - the server must have commit-graphs present and enabled
> 
> I think "enabled" may be somewhat redundant, at least since some recent
> changes to enable this by default. (As an aside, I tried to dig up the
> patches that Stolee sent to actually enable this and back up my claim,
> but I couldn't find them on 'master'. I'm not sure if that's my poor use
> of 'git log', or misremembering the fact that these were enabled by
> default.)

Yeah, it is redundant these days. I figured this might go to 'maint',
though, and 31b1de6a09 (commit-graph: turn on commit-graph by default,
2019-08-13) is only in master.

> > +#    latter will try to lod it lazily.
> 
> s/lod/load

Thanks, fixed.

> > +#
> > +#  - we must use protocol v2, because it handles the "have" negotiation before
> > +#    processing the shallow direectives
> > +#
> > +test_expect_success 'shallow since with commit graph and already-seen commit' '
> > +	test_create_repo shallow-since-graph &&
> > +	(
> 
> I'm not sure if this same-level indentation is common, or you're missing
> an extra tab here. Either way.

It's not common, but I was matching the surrounding tests for style (and
use of a separate repo, and non-use of test_config).

> > diff --git a/upload-pack.c b/upload-pack.c
> > index 222cd3ad89..a260b6b50d 100644
> > --- a/upload-pack.c
> > +++ b/upload-pack.c
> > @@ -722,7 +722,7 @@ static void deepen_by_rev_list(struct packet_writer *writer, int ac,
> >  {
> >  	struct commit_list *result;
> >
> > -	close_commit_graph(the_repository->objects);
> > +	disable_commit_graph();
> 
> This line made me think to check if we could remove 'close_commit_graph'
> all together, but there is a remaining callsite in packfile.c, and
> closing the commit-graph _is_ the right thing to do there, so I think we
> ought to keep it around.

Yeah, it's sort of inherently dangerous, as it doesn't scrub any
in-memory commit structs that are in this "have a graph_pos but not
maybe_tree" state.

The call in close_object_store() is probably fine, though, as the whole
point there is getting rid of everything related to the object store.
And since 14ba97f81c (alloc: allow arbitrary repositories for alloc
functions, 2018-05-15) or thereabouts, that includes the object structs.

There's another call in write_commit_graph_file(), right before we
rename a new file into place. This generally happens in a separate
"commit-graph write" process, so I think it's OK.  But it could
eventually happen as part of another process, which would maybe be a
problem. I'm actually not convinced the in-memory one needs to be closed
here, but maybe it's a Windows thing (closing the file before renaming
over it)?

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] upload-pack: disable commit graph more gently for shallow traversal
  2019-09-12 11:06   ` SZEDER Gábor
@ 2019-09-12 14:01     ` Jeff King
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2019-09-12 14:01 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Taylor Blau, git, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On Thu, Sep 12, 2019 at 01:06:20PM +0200, SZEDER Gábor wrote:

> > > +#  - we must use protocol v2, because it handles the "have" negotiation before
> > > +#    processing the shallow direectives
> 
> s/ee/e/

Thanks, fixed.

> We can't simply replace that 'git config' command with 'test_config',
> because it is implemented using 'test_when_finished', which doesn't
> work in a subshell [1].  What we could do is:
> 
>   test_create_repo shallow-since-graph &&
>   test_config -C shallow-since-graph core.commitGraph true &&
>   (
>      cd shallow-since-graph &&
>      ....
> 
> Or we could entirely avoid the subshell by passing '-C
> shallow-since-graph' to every single command... [2]
> 
> However, since this repo was specifically created for this test, it
> doesn't really matter in what state it's left behind, so I don't think
> it's worth it.

Yep, agreed on all of this. :)

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] upload-pack: disable commit graph more gently for shallow traversal
  2019-09-12  2:08   ` [PATCH] upload-pack: disable commit graph more gently for shallow traversal Taylor Blau
@ 2019-09-12 14:03     ` Jeff King
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2019-09-12 14:03 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Derrick Stolee, Nguyễn Thái Ngọc Duy

On Wed, Sep 11, 2019 at 10:08:48PM -0400, Taylor Blau wrote:

> > The test suite passes with my patch both with and without
> > GIT_TEST_COMMIT_GRAPH=1. But to my surprise, it also passes if I delete
> > the close_commit_graph() line added by 829a321569!
> >
> > So it's not clear to me whether this whole thing is truly unnecessary
> > (and Stolee was just being overly cautious because the code is related
> > to shallow-ness, even though it is OK doing a true-parent traversal
> > itself), or if we just don't have good test coverage for the case that
> > requires it.
> >
> > If it _is_ necessary, I'm a little worried there are other problems
> > lurking. The whole issue is that we've seen and parsed some commits
> > before we get to this shallow deepen-since code-path. So just disabling
> > commit-graphs isn't enough. Even without them, we might have parsed some
> > commits the old-fashioned way and filled in their parent pointers. Is
> > that a problem?
> 
> I am, too, but I don't think we should hold this patch up which is
> obviously improving the situation in the meantime while we figure that
> out.

Yeah, I'd agree here, unless we determine quickly that we do need the
bigger fix, and somebody with a bit more knowledge of this shallow code
offers a fix. I believe my patch is a strict improvement, and puts the
commit-graph code path on par with the regular one.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] upload-pack: disable commit graph more gently for shallow traversal
  2019-09-12 12:23 ` Derrick Stolee
@ 2019-09-12 14:23   ` Jeff King
  2019-09-12 19:27     ` Derrick Stolee
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff King @ 2019-09-12 14:23 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On Thu, Sep 12, 2019 at 08:23:49AM -0400, Derrick Stolee wrote:

> > That creates an interesting problem for commits that have _already_ been
> > parsed using the commit graph. Their commit->object.parsed flag is set,
> > their commit->graph_pos is set, but their commit->maybe_tree may still
> > be NULL. When somebody later calls repo_get_commit_tree(), we see that
> > we haven't loaded the tree oid yet and try to get it from the commit
> > graph. But since it has been freed, we segfault!
> 
> OOPS! That is certainly a bad thing. I'm glad you found it, but I
> am sorry for how you (probably) found it.

Heh. I'll admit it was quite a slog of debugging, but _most_ of that was
figuring out in which circumstance we'd have actually parsed the object.
Finding the problematic end state was pretty easy from a coredump. :)

> > diff --git a/commit-graph.c b/commit-graph.c
> > index 9b02d2c426..bc5dd5913f 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -41,6 +41,8 @@
> >  #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
> >  			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
> >  
> > +static int commit_graph_disabled;
> 
> Should we be putting this inside the repository struct instead?

Probably. The only caller will just pass the_repository, but it doesn't
hurt to scope it down now.

It could potentially go into the commit_graph itself, but it looks like
with the incremental work we may have multiple such structs. It could
also go into raw_object_store, but I think conceptually it's a
repo-level thing.

So I put it straight into "struct repository".

> Your patch does not seem to actually cover the "I've already parsed some commits"
> case, as you are only preventing the commit-graph from being prepared. Instead,
> we need to have a short-circuit inside parse_commit() to avoid future parsing
> from the commit-graph file.

Maybe I was too clever, then. :)

I didn't want to have to sprinkle "are we disabled" in parse_commit(),
etc. But any such uses of the commit graph have to do:

  if (!prepare_commit_graph(r))
	return;

to lazy-load it. So the logic to prepare becomes (roughly):

  if (disabled)
	return 0;
  if (already_loaded)
	return 1;
  return actually_load() ? 1 : 0;

and "disabled" takes precedence.

I've added this comment in prepare_commit_graph():

        /*
         * This must come before the "already attempted?" check below, because
         * we want to disable even an already-loaded graph file.
         */
        if (r->commit_graph_disabled)
                return 0;

        if (r->objects->commit_graph_attempted)
                return !!r->objects->commit_graph;
        r->objects->commit_graph_attempted = 1;

Does that make more sense?

Unrelated, but I also notice the top of prepare_commit_graph() has:

        if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
                die("dying as requested by the '%s' variable on commit-graph load!",
                    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);

as the very first thing. Meaning we're calling getenv() as part of every
single parse_commit(), rather than just once per process. Seems like an
easy efficiency win.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2] upload-pack commit graph segfault fix
  2019-09-12  0:04 [PATCH] upload-pack: disable commit graph more gently for shallow traversal Jeff King
                   ` (2 preceding siblings ...)
  2019-09-12 12:23 ` Derrick Stolee
@ 2019-09-12 14:41 ` Jeff King
  2019-09-12 14:43   ` Jeff King
                     ` (3 more replies)
  3 siblings, 4 replies; 25+ messages in thread
From: Jeff King @ 2019-09-12 14:41 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

Here's a re-roll of my "disable commit graph more gently" fix. Versus
v1:

  - I've included a preparatory patch that speeds up
    prepare_commit_graph(). It's not strictly related, but there's a
    textual dependency. It could be easily spun off to its own series.

  - a comment points out that the ordering of the "disable" check is
    important

  - disable_commit_graph() now works on a repository struct

  - typo fixes in the test comments

  [1/2]: commit-graph: bump DIE_ON_LOAD check to actual load-time
  [2/2]: upload-pack: disable commit graph more gently for shallow traversal

 commit-graph.c        | 18 +++++++++++++++---
 commit-graph.h        |  6 ++++++
 repository.h          |  3 +++
 t/t5500-fetch-pack.sh | 38 ++++++++++++++++++++++++++++++++++++++
 upload-pack.c         |  2 +-
 5 files changed, 63 insertions(+), 4 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2] upload-pack commit graph segfault fix
  2019-09-12 14:41 ` [PATCH v2] upload-pack commit graph segfault fix Jeff King
@ 2019-09-12 14:43   ` Jeff King
  2019-09-12 14:44   ` [PATCH v2 1/2] commit-graph: bump DIE_ON_LOAD check to actual load-time Jeff King
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2019-09-12 14:43 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On Thu, Sep 12, 2019 at 10:41:22AM -0400, Jeff King wrote:

> Here's a re-roll of my "disable commit graph more gently" fix. Versus
> v1:
> 
>   - I've included a preparatory patch that speeds up
>     prepare_commit_graph(). It's not strictly related, but there's a
>     textual dependency. It could be easily spun off to its own series.

I _didn't_ include here the other speedup from this thread:

  https://public-inbox.org/git/20190912011137.GA23412@sigill.intra.peff.net/

That's worth doing, but is totally independent (conceptually and
textually).

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 1/2] commit-graph: bump DIE_ON_LOAD check to actual load-time
  2019-09-12 14:41 ` [PATCH v2] upload-pack commit graph segfault fix Jeff King
  2019-09-12 14:43   ` Jeff King
@ 2019-09-12 14:44   ` Jeff King
  2019-09-12 19:30     ` Derrick Stolee
  2019-09-12 14:44   ` [PATCH v2 2/2] upload-pack: disable commit graph more gently for shallow traversal Jeff King
  2019-09-12 16:56   ` [PATCH v2] upload-pack commit graph segfault fix Taylor Blau
  3 siblings, 1 reply; 25+ messages in thread
From: Jeff King @ 2019-09-12 14:44 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

Commit 43d3561805 (commit-graph write: don't die if the existing graph
is corrupt, 2019-03-25) added an environment variable we use only in the
test suite, $GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD. But it put the check for
this variable at the very top of prepare_commit_graph(), which is called
every time we want to use the commit graph. Most importantly, it comes
_before_ we check the fast-path "did we already try to load?", meaning
we end up calling getenv() for every single use of the commit graph,
rather than just when we load.

getenv() is allowed to have unexpected side effects, but that shouldn't
be a problem here; we're lazy-loading the graph so it's clear that at
least _one_ invocation of this function is going to call it.

But it is inefficient. getenv() typically has to do a linear search
through the environment space.

We could memoize the call, but it's simpler still to just bump the check
down to the actual loading step. That's fine for our sole user in t5318,
and produces this minor real-world speedup:

  [before]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
    Time (mean ± σ):      1.460 s ±  0.017 s    [User: 1.174 s, System: 0.285 s]
    Range (min … max):    1.440 s …  1.491 s    10 runs

  [after]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
    Time (mean ± σ):      1.391 s ±  0.005 s    [User: 1.118 s, System: 0.273 s]
    Range (min … max):    1.385 s …  1.399 s    10 runs

Of course that actual speedup depends on how big your environment is. We
can game it like this:

  for i in $(seq 10000); do
    export dummy$i=$i
  done

in which case I get:

  [before]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
    Time (mean ± σ):      6.257 s ±  0.061 s    [User: 6.005 s, System: 0.250 s]
    Range (min … max):    6.174 s …  6.337 s    10 runs

  [after]
  Benchmark #1: git -C linux rev-list HEAD >/dev/null
  Time (mean ± σ):      1.403 s ±  0.005 s    [User: 1.146 s, System: 0.256 s]
  Range (min … max):    1.396 s …  1.412 s    10 runs

So this is really more about avoiding the pathological case than
providing a big real-world speedup.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 9b02d2c426..baeaf0d1bf 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -468,14 +468,14 @@ static int prepare_commit_graph(struct repository *r)
 {
 	struct object_directory *odb;
 
-	if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
-		die("dying as requested by the '%s' variable on commit-graph load!",
-		    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
-
 	if (r->objects->commit_graph_attempted)
 		return !!r->objects->commit_graph;
 	r->objects->commit_graph_attempted = 1;
 
+	if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
+		die("dying as requested by the '%s' variable on commit-graph load!",
+		    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
+
 	prepare_repo_settings(r);
 
 	if (!git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
-- 
2.23.0.667.gcccf1fbb03


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 2/2] upload-pack: disable commit graph more gently for shallow traversal
  2019-09-12 14:41 ` [PATCH v2] upload-pack commit graph segfault fix Jeff King
  2019-09-12 14:43   ` Jeff King
  2019-09-12 14:44   ` [PATCH v2 1/2] commit-graph: bump DIE_ON_LOAD check to actual load-time Jeff King
@ 2019-09-12 14:44   ` Jeff King
  2019-09-13 13:26     ` Derrick Stolee
  2019-09-12 16:56   ` [PATCH v2] upload-pack commit graph segfault fix Taylor Blau
  3 siblings, 1 reply; 25+ messages in thread
From: Jeff King @ 2019-09-12 14:44 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

When the client has asked for certain shallow options like
"deepen-since", we do a custom rev-list walk that pretends to be
shallow. Before doing so, we have to disable the commit-graph, since it
is not compatible with the shallow view of the repository. That's
handled by 829a321569 (commit-graph: close_commit_graph before shallow
walk, 2018-08-20). That commit literally closes and frees our
repo->objects->commit_graph struct.

That creates an interesting problem for commits that have _already_ been
parsed using the commit graph. Their commit->object.parsed flag is set,
their commit->graph_pos is set, but their commit->maybe_tree may still
be NULL. When somebody later calls repo_get_commit_tree(), we see that
we haven't loaded the tree oid yet and try to get it from the commit
graph. But since it has been freed, we segfault!

So the root of the issue is a data dependency between the commit's
lazy-load of the tree oid and the fact that the commit graph can go
away mid-process. How can we resolve it?

There are a couple of general approaches:

  1. The obvious answer is to avoid loading the tree from the graph when
     we see that it's NULL. But then what do we return for the tree oid?
     If we return NULL, our caller in do_traverse() will rightly
     complain that we have no tree. We'd have to fallback to loading the
     actual commit object and re-parsing it. That requires teaching
     parse_commit_buffer() to understand re-parsing (i.e., not starting
     from a clean slate and not leaking any allocated bits like parent
     list pointers).

  2. When we close the commit graph, walk through the set of in-memory
     objects and clear any graph_pos pointers. But this means we also
     have to "unparse" any such commits so that we know they still need
     to open the commit object to fill in their trees. So it's no less
     complicated than (1), and is more expensive (since we clear objects
     we might not later need).

  3. Stop freeing the commit-graph struct. Continue to let it be used
     for lazy-loads of tree oids, but let upload-pack specify that it
     shouldn't be used for further commit parsing.

  4. Push the whole shallow rev-list out to its own sub-process, with
     the commit-graph disabled from the start, giving it a clean memory
     space to work from.

I've chosen (3) here. Options (1) and (2) would work, but are
non-trivial to implement. Option (4) is more expensive, and I'm not sure
how complicated it is (shelling out for the actual rev-list part is
easy, but we do then parse the resulting commits internally, and I'm not
clear which parts need to be handling shallow-ness).

The new test in t5500 triggers this segfault, but see the comments there
for how horribly intimate it has to be with how both upload-pack and
commit graphs work.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c        | 12 ++++++++++++
 commit-graph.h        |  6 ++++++
 repository.h          |  3 +++
 t/t5500-fetch-pack.sh | 38 ++++++++++++++++++++++++++++++++++++++
 upload-pack.c         |  2 +-
 5 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index baeaf0d1bf..bbde647f8b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -468,6 +468,13 @@ static int prepare_commit_graph(struct repository *r)
 {
 	struct object_directory *odb;
 
+	/*
+	 * This must come before the "already attempted?" check below, because
+	 * we want to disable even an already-loaded graph file.
+	 */
+	if (r->commit_graph_disabled)
+		return 0;
+
 	if (r->objects->commit_graph_attempted)
 		return !!r->objects->commit_graph;
 	r->objects->commit_graph_attempted = 1;
@@ -2101,3 +2108,8 @@ void free_commit_graph(struct commit_graph *g)
 	free(g->filename);
 	free(g);
 }
+
+void disable_commit_graph(struct repository *r)
+{
+	r->commit_graph_disabled = 1;
+}
diff --git a/commit-graph.h b/commit-graph.h
index 486e64e591..7f5c933fa2 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -107,4 +107,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 void close_commit_graph(struct raw_object_store *);
 void free_commit_graph(struct commit_graph *);
 
+/*
+ * Disable further use of the commit graph in this process when parsing a
+ * "struct commit".
+ */
+void disable_commit_graph(struct repository *r);
+
 #endif
diff --git a/repository.h b/repository.h
index 4da275e73f..84335292cd 100644
--- a/repository.h
+++ b/repository.h
@@ -124,6 +124,9 @@ struct repository {
 	/* A unique-id for tracing purposes. */
 	int trace2_repo_id;
 
+	/* True if commit-graph has been disabled within this process. */
+	int commit_graph_disabled;
+
 	/* Configurations */
 
 	/* Indicate if a repository has a different 'commondir' from 'gitdir' */
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 8210f63d41..244c3e7062 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -783,6 +783,44 @@ test_expect_success 'clone shallow since selects no commits' '
 	)
 '
 
+# A few subtle things about the request in this test:
+#
+#  - the server must have commit-graphs present and enabled
+#
+#  - the history is such that our want/have share a common ancestor ("base"
+#    here)
+#
+#  - we send only a single have, which is fewer than a normal client would
+#    send. This ensures that we don't parse "base" up front with
+#    parse_object(), but rather traverse to it as a parent while deciding if we
+#    can stop the "have" negotiation, and call parse_commit(). The former
+#    sees the actual object data and so always loads the three oid, whereas the
+#    latter will try to load it lazily.
+#
+#  - we must use protocol v2, because it handles the "have" negotiation before
+#    processing the shallow directives
+#
+test_expect_success 'shallow since with commit graph and already-seen commit' '
+	test_create_repo shallow-since-graph &&
+	(
+	cd shallow-since-graph &&
+	test_commit base &&
+	test_commit master &&
+	git checkout -b other HEAD^ &&
+	test_commit other &&
+	git commit-graph write --reachable &&
+	git config core.commitGraph true &&
+
+	GIT_PROTOCOL=version=2 git upload-pack . <<-EOF >/dev/null
+	0012command=fetch
+	00010013deepen-since 1
+	0032want $(git rev-parse other)
+	0032have $(git rev-parse master)
+	0000
+	EOF
+	)
+'
+
 test_expect_success 'shallow clone exclude tag two' '
 	test_create_repo shallow-exclude &&
 	(
diff --git a/upload-pack.c b/upload-pack.c
index 222cd3ad89..135bb3f6cc 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -722,7 +722,7 @@ static void deepen_by_rev_list(struct packet_writer *writer, int ac,
 {
 	struct commit_list *result;
 
-	close_commit_graph(the_repository->objects);
+	disable_commit_graph(the_repository);
 	result = get_shallow_commits_by_rev_list(ac, av, SHALLOW, NOT_SHALLOW);
 	send_shallow(writer, result);
 	free_commit_list(result);
-- 
2.23.0.667.gcccf1fbb03

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set
  2019-09-12  1:11   ` [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set Jeff King
  2019-09-12  1:19     ` Jeff King
  2019-09-12 12:31     ` Derrick Stolee
@ 2019-09-12 16:52     ` Junio C Hamano
  2019-09-12 22:34       ` Jeff King
  2 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2019-09-12 16:52 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

Jeff King <peff@peff.net> writes:

>> I was surprised we ever called repo_get_commit_tree() at all, since
>> we're literally just traversing commits here. It looks like
>> list-objects.c is very happy to queue pending trees for each commit,
>> even if we're just going to throw them away when we get to
>> process_tree()! I wonder if could be checking revs->tree_objects here
>> and saving ourselves some work.
>
> Indeed, this seems to help quite a bit in the commit-graph case. I think
> it's worth doing (and is independent of the other patch).

Yeah, I agree this is very much worth doing and is orthogonal to the
other one.

Thanks for spotting it.  I wonder if it was broken like this forever
since the beginning X-<.

>
> -- >8 --
> Subject: list-objects: don't queue root trees unless revs->tree_objects is set
>
> When traverse_commit_list() processes each commit, it queues the
> commit's root tree in the pending array. Then, after all commits are
> processed, it calls traverse_trees_and_blobs() to walk over the pending
> list, calling process_tree() on each. But if revs->tree_objects is not
> set, process_tree() just exists immediately!
>
> We can save ourselves some work by not even bothering to queue these
> trees in the first place. There are a few subtle points to make:
>
>   - we also detect commits with a NULL tree pointer here. But this isn't
>     an interesting check for broken commits, since the lookup_tree()
>     we'd have done during commit parsing doesn't actually check that we
>     have the tree on disk. So we're not losing any robustness.
>
>   - besides queueing, we also set the NOT_USER_GIVEN flag on the tree
>     object. This is used by the traverse_commit_list_filtered() variant.
>     But if we're not exploring trees, then we won't actually care about
>     this flag, which is used only inside process_tree() code-paths.
>
>   - queueing trees eventually leads to us queueing blobs, too. But we
>     don't need to check revs->blob_objects here. Even in the current
>     code, we still wouldn't find those blobs, because we'd never open up
>     the tree objects to list their contents.
>
>   - the user-visible impact to the caller is minimal. The pending trees
>     are all cleared by the time the function returns anyway, by
>     traverse_trees_and_blobs(). We do call a show_commit() callback,
>     which technically could be looking at revs->pending during the
>     callback. But it seems like a rather unlikely thing to do (if you
>     want the tree of the current commit, then accessing the tree struct
>     member is a lot simpler).
>
> So this should be safe to do. Let's look at the benefits:
>
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      7.651 s ±  0.021 s    [User: 7.399 s, System: 0.252 s]
>     Range (min … max):    7.607 s …  7.683 s    10 runs
>
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      7.593 s ±  0.023 s    [User: 7.329 s, System: 0.264 s]
>     Range (min … max):    7.565 s …  7.634 s    10 runs
>
> Not too impressive, but then we're really just avoiding sticking a
> pointer into a growable array. But still, I'll take a free 0.75%
> speedup.
>
> Let's try it after running "git commit-graph write":
>
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.458 s ±  0.011 s    [User: 1.199 s, System: 0.259 s]
>     Range (min … max):    1.447 s …  1.481 s    10 runs
>
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.126 s ±  0.023 s    [User: 896.5 ms, System: 229.0 ms]
>     Range (min … max):    1.106 s …  1.181 s    10 runs
>
> Now that's more like it. We saved over 22% of the total time. Part of
> that is because the runtime is shorter overall, but the absolute
> improvement is also much larger. What's going on?
>
> When we fill in a commit struct using the commit graph, we don't bother
> to set the tree pointer, and instead lazy-load it when somebody calls
> get_commit_tree(). So we're not only skipping the pointer write to the
> pending queue, but we're skipping the lazy-load of the tree entirely.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  list-objects.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/list-objects.c b/list-objects.c
> index b5651ddd5b..c837bcaca8 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -370,7 +370,9 @@ static void do_traverse(struct traversal_context *ctx)
>  		 * an uninteresting boundary commit may not have its tree
>  		 * parsed yet, but we are not going to show them anyway
>  		 */
> -		if (get_commit_tree(commit)) {
> +		if (!ctx->revs->tree_objects)
> +			; /* do not bother loading tree */
> +		else if (get_commit_tree(commit)) {
>  			struct tree *tree = get_commit_tree(commit);
>  			tree->object.flags |= NOT_USER_GIVEN;
>  			add_pending_tree(ctx->revs, tree);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2] upload-pack commit graph segfault fix
  2019-09-12 14:41 ` [PATCH v2] upload-pack commit graph segfault fix Jeff King
                     ` (2 preceding siblings ...)
  2019-09-12 14:44   ` [PATCH v2 2/2] upload-pack: disable commit graph more gently for shallow traversal Jeff King
@ 2019-09-12 16:56   ` Taylor Blau
  3 siblings, 0 replies; 25+ messages in thread
From: Taylor Blau @ 2019-09-12 16:56 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On Thu, Sep 12, 2019 at 10:41:22AM -0400, Jeff King wrote:
> Here's a re-roll of my "disable commit graph more gently" fix. Versus
> v1:

Thanks for sending a re-roll. I deployed this out to all of our servers
running git at GitHub, and it seems to be working fine.

For what it's worth, I don't have 'core.commitGraph' enabled on any of
those servers for now, but I'll turn it back on shortly.

>   - I've included a preparatory patch that speeds up
>     prepare_commit_graph(). It's not strictly related, but there's a
>     textual dependency. It could be easily spun off to its own series.
>
>   - a comment points out that the ordering of the "disable" check is
>     important
>
>   - disable_commit_graph() now works on a repository struct
>
>   - typo fixes in the test comments
>
>   [1/2]: commit-graph: bump DIE_ON_LOAD check to actual load-time
>   [2/2]: upload-pack: disable commit graph more gently for shallow traversal
>
>  commit-graph.c        | 18 +++++++++++++++---
>  commit-graph.h        |  6 ++++++
>  repository.h          |  3 +++
>  t/t5500-fetch-pack.sh | 38 ++++++++++++++++++++++++++++++++++++++
>  upload-pack.c         |  2 +-
>  5 files changed, 63 insertions(+), 4 deletions(-)
>
> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] upload-pack: disable commit graph more gently for shallow traversal
  2019-09-12 14:23   ` Jeff King
@ 2019-09-12 19:27     ` Derrick Stolee
  0 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee @ 2019-09-12 19:27 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On 9/12/2019 10:23 AM, Jeff King wrote:
> On Thu, Sep 12, 2019 at 08:23:49AM -0400, Derrick Stolee wrote:
> 
>>> That creates an interesting problem for commits that have _already_ been
>>> parsed using the commit graph. Their commit->object.parsed flag is set,
>>> their commit->graph_pos is set, but their commit->maybe_tree may still
>>> be NULL. When somebody later calls repo_get_commit_tree(), we see that
>>> we haven't loaded the tree oid yet and try to get it from the commit
>>> graph. But since it has been freed, we segfault!
>>
>> OOPS! That is certainly a bad thing. I'm glad you found it, but I
>> am sorry for how you (probably) found it.
> 
> Heh. I'll admit it was quite a slog of debugging, but _most_ of that was
> figuring out in which circumstance we'd have actually parsed the object.
> Finding the problematic end state was pretty easy from a coredump. :)
> 
>>> diff --git a/commit-graph.c b/commit-graph.c
>>> index 9b02d2c426..bc5dd5913f 100644
>>> --- a/commit-graph.c
>>> +++ b/commit-graph.c
>>> @@ -41,6 +41,8 @@
>>>  #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
>>>  			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
>>>  
>>> +static int commit_graph_disabled;
>>
>> Should we be putting this inside the repository struct instead?
> 
> Probably. The only caller will just pass the_repository, but it doesn't
> hurt to scope it down now.
> 
> It could potentially go into the commit_graph itself, but it looks like
> with the incremental work we may have multiple such structs. It could
> also go into raw_object_store, but I think conceptually it's a
> repo-level thing.
> 
> So I put it straight into "struct repository".
> 
>> Your patch does not seem to actually cover the "I've already parsed some commits"
>> case, as you are only preventing the commit-graph from being prepared. Instead,
>> we need to have a short-circuit inside parse_commit() to avoid future parsing
>> from the commit-graph file.
> 
> Maybe I was too clever, then. :)
> 
> I didn't want to have to sprinkle "are we disabled" in parse_commit(),
> etc. But any such uses of the commit graph have to do:
> 
>   if (!prepare_commit_graph(r))
> 	return;
> 
> to lazy-load it. So the logic to prepare becomes (roughly):
> 
>   if (disabled)
> 	return 0;
>   if (already_loaded)
> 	return 1;
>   return actually_load() ? 1 : 0;
> 
> and "disabled" takes precedence.
> 
> I've added this comment in prepare_commit_graph():
> 
>         /*
>          * This must come before the "already attempted?" check below, because
>          * we want to disable even an already-loaded graph file.
>          */
>         if (r->commit_graph_disabled)
>                 return 0;
> 
>         if (r->objects->commit_graph_attempted)
>                 return !!r->objects->commit_graph;
>         r->objects->commit_graph_attempted = 1;
> 
> Does that make more sense?

Ah. That does make sense. I now see the connection between parsing and this
change.

> Unrelated, but I also notice the top of prepare_commit_graph() has:
> 
>         if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
>                 die("dying as requested by the '%s' variable on commit-graph load!",
>                     GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
> 
> as the very first thing. Meaning we're calling getenv() as part of every
> single parse_commit(), rather than just once per process. Seems like an
> easy efficiency win.

Absolutely. Move this to after the "have we attempted already?" condition.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/2] commit-graph: bump DIE_ON_LOAD check to actual load-time
  2019-09-12 14:44   ` [PATCH v2 1/2] commit-graph: bump DIE_ON_LOAD check to actual load-time Jeff King
@ 2019-09-12 19:30     ` Derrick Stolee
  0 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee @ 2019-09-12 19:30 UTC (permalink / raw)
  To: Jeff King, git
  Cc: Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On 9/12/2019 10:44 AM, Jeff King wrote:
> Commit 43d3561805 (commit-graph write: don't die if the existing graph
> is corrupt, 2019-03-25) added an environment variable we use only in the
> test suite, $GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD. But it put the check for
> this variable at the very top of prepare_commit_graph(), which is called
> every time we want to use the commit graph. Most importantly, it comes
> _before_ we check the fast-path "did we already try to load?", meaning
> we end up calling getenv() for every single use of the commit graph,
> rather than just when we load.
> 
> getenv() is allowed to have unexpected side effects, but that shouldn't
> be a problem here; we're lazy-loading the graph so it's clear that at
> least _one_ invocation of this function is going to call it.
> 
> But it is inefficient. getenv() typically has to do a linear search
> through the environment space.
> 
> We could memoize the call, but it's simpler still to just bump the check
> down to the actual loading step. That's fine for our sole user in t5318,
> and produces this minor real-world speedup:
> 
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.460 s ±  0.017 s    [User: 1.174 s, System: 0.285 s]
>     Range (min … max):    1.440 s …  1.491 s    10 runs
> 
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.391 s ±  0.005 s    [User: 1.118 s, System: 0.273 s]
>     Range (min … max):    1.385 s …  1.399 s    10 runs

This looks like an important improvement on its own.

> Of course that actual speedup depends on how big your environment is. We
> can game it like this:
> 
>   for i in $(seq 10000); do
>     export dummy$i=$i
>   done
> 
> in which case I get:
> 
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      6.257 s ±  0.061 s    [User: 6.005 s, System: 0.250 s]
>     Range (min … max):    6.174 s …  6.337 s    10 runs
> 
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>   Time (mean ± σ):      1.403 s ±  0.005 s    [User: 1.146 s, System: 0.256 s]
>   Range (min … max):    1.396 s …  1.412 s    10 runs
> 
> So this is really more about avoiding the pathological case than
> providing a big real-world speedup.

This change is stunning. I'm _sure_ someone is hurting with this.
 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  commit-graph.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 9b02d2c426..baeaf0d1bf 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -468,14 +468,14 @@ static int prepare_commit_graph(struct repository *r)
>  {
>  	struct object_directory *odb;
>  
> -	if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
> -		die("dying as requested by the '%s' variable on commit-graph load!",
> -		    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
> -
>  	if (r->objects->commit_graph_attempted)
>  		return !!r->objects->commit_graph;
>  	r->objects->commit_graph_attempted = 1;
>  
> +	if (git_env_bool(GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD, 0))
> +		die("dying as requested by the '%s' variable on commit-graph load!",
> +		    GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD);
> +
>  	prepare_repo_settings(r);
>  
>  	if (!git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&

LGTM, thanks!

-Stolee

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set
  2019-09-12 16:52     ` Junio C Hamano
@ 2019-09-12 22:34       ` Jeff King
  2019-09-13 18:05         ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff King @ 2019-09-12 22:34 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On Thu, Sep 12, 2019 at 09:52:53AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> I was surprised we ever called repo_get_commit_tree() at all, since
> >> we're literally just traversing commits here. It looks like
> >> list-objects.c is very happy to queue pending trees for each commit,
> >> even if we're just going to throw them away when we get to
> >> process_tree()! I wonder if could be checking revs->tree_objects here
> >> and saving ourselves some work.
> >
> > Indeed, this seems to help quite a bit in the commit-graph case. I think
> > it's worth doing (and is independent of the other patch).
> 
> Yeah, I agree this is very much worth doing and is orthogonal to the
> other one.
> 
> Thanks for spotting it.  I wonder if it was broken like this forever
> since the beginning X-<.

Not quite since the beginning; it comes from 8d2dfc49b1
(process_{tree,blob}: show objects without buffering, 2009-04-10). I
suspect nobody noticed because the cost for the normal case is really
just shuffling some pointers around. It's only because it actively works
against the commit-graph optimizations that it's so expensive there.

I was surprised (and pleased) by how much such a simple thing helped.

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/2] upload-pack: disable commit graph more gently for shallow traversal
  2019-09-12 14:44   ` [PATCH v2 2/2] upload-pack: disable commit graph more gently for shallow traversal Jeff King
@ 2019-09-13 13:26     ` Derrick Stolee
  0 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee @ 2019-09-13 13:26 UTC (permalink / raw)
  To: Jeff King, git
  Cc: Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

On 9/12/2019 10:44 AM, Jeff King wrote:
> When the client has asked for certain shallow options like
> "deepen-since", we do a custom rev-list walk that pretends to be
> shallow. Before doing so, we have to disable the commit-graph, since it
> is not compatible with the shallow view of the repository. That's
> handled by 829a321569 (commit-graph: close_commit_graph before shallow
> walk, 2018-08-20). That commit literally closes and frees our
> repo->objects->commit_graph struct.
> 
> That creates an interesting problem for commits that have _already_ been
> parsed using the commit graph. Their commit->object.parsed flag is set,
> their commit->graph_pos is set, but their commit->maybe_tree may still
> be NULL. When somebody later calls repo_get_commit_tree(), we see that
> we haven't loaded the tree oid yet and try to get it from the commit
> graph. But since it has been freed, we segfault!
> 
> So the root of the issue is a data dependency between the commit's
> lazy-load of the tree oid and the fact that the commit graph can go
> away mid-process. How can we resolve it?
> 
> There are a couple of general approaches:
> 
>   1. The obvious answer is to avoid loading the tree from the graph when
>      we see that it's NULL. But then what do we return for the tree oid?
>      If we return NULL, our caller in do_traverse() will rightly
>      complain that we have no tree. We'd have to fallback to loading the
>      actual commit object and re-parsing it. That requires teaching
>      parse_commit_buffer() to understand re-parsing (i.e., not starting
>      from a clean slate and not leaking any allocated bits like parent
>      list pointers).
> 
>   2. When we close the commit graph, walk through the set of in-memory
>      objects and clear any graph_pos pointers. But this means we also
>      have to "unparse" any such commits so that we know they still need
>      to open the commit object to fill in their trees. So it's no less
>      complicated than (1), and is more expensive (since we clear objects
>      we might not later need).
> 
>   3. Stop freeing the commit-graph struct. Continue to let it be used
>      for lazy-loads of tree oids, but let upload-pack specify that it
>      shouldn't be used for further commit parsing.
> 
>   4. Push the whole shallow rev-list out to its own sub-process, with
>      the commit-graph disabled from the start, giving it a clean memory
>      space to work from.
> 
> I've chosen (3) here. Options (1) and (2) would work, but are
> non-trivial to implement. Option (4) is more expensive, and I'm not sure
> how complicated it is (shelling out for the actual rev-list part is
> easy, but we do then parse the resulting commits internally, and I'm not
> clear which parts need to be handling shallow-ness).
> 
> The new test in t5500 triggers this segfault, but see the comments there
> for how horribly intimate it has to be with how both upload-pack and
> commit graphs work.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  commit-graph.c        | 12 ++++++++++++
>  commit-graph.h        |  6 ++++++
>  repository.h          |  3 +++
>  t/t5500-fetch-pack.sh | 38 ++++++++++++++++++++++++++++++++++++++
>  upload-pack.c         |  2 +-
>  5 files changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index baeaf0d1bf..bbde647f8b 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -468,6 +468,13 @@ static int prepare_commit_graph(struct repository *r)
>  {
>  	struct object_directory *odb;
>  
> +	/*
> +	 * This must come before the "already attempted?" check below, because
> +	 * we want to disable even an already-loaded graph file.
> +	 */
> +	if (r->commit_graph_disabled)
> +		return 0;
> +
>  	if (r->objects->commit_graph_attempted)
>  		return !!r->objects->commit_graph;
>  	r->objects->commit_graph_attempted = 1;
> @@ -2101,3 +2108,8 @@ void free_commit_graph(struct commit_graph *g)
>  	free(g->filename);
>  	free(g);
>  }
> +
> +void disable_commit_graph(struct repository *r)
> +{
> +	r->commit_graph_disabled = 1;
> +}

Thanks for the additional comment and using the repo struct. LGTM.

-Stolee

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set
  2019-09-12 22:34       ` Jeff King
@ 2019-09-13 18:05         ` Junio C Hamano
  0 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2019-09-13 18:05 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Taylor Blau, Derrick Stolee,
	Nguyễn Thái Ngọc Duy

Jeff King <peff@peff.net> writes:

>> Thanks for spotting it.  I wonder if it was broken like this forever
>> since the beginning X-<.
>
> Not quite since the beginning; it comes from 8d2dfc49b1
> (process_{tree,blob}: show objects without buffering, 2009-04-10). I
> suspect nobody noticed because the cost for the normal case is really
> just shuffling some pointers around. It's only because it actively works
> against the commit-graph optimizations that it's so expensive there.

Yeah, that is what I meant by "since the beginning" (of
commit-graph, that is).

> I was surprised (and pleased) by how much such a simple thing helped.

Yes, it is very pleasing.  Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2019-09-13 18:05 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-12  0:04 [PATCH] upload-pack: disable commit graph more gently for shallow traversal Jeff King
2019-09-12  0:18 ` Jeff King
2019-09-12  1:11   ` [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set Jeff King
2019-09-12  1:19     ` Jeff King
2019-09-12 12:31     ` Derrick Stolee
2019-09-12 16:52     ` Junio C Hamano
2019-09-12 22:34       ` Jeff King
2019-09-13 18:05         ` Junio C Hamano
2019-09-12  2:08   ` [PATCH] upload-pack: disable commit graph more gently for shallow traversal Taylor Blau
2019-09-12 14:03     ` Jeff King
2019-09-12  2:07 ` Taylor Blau
2019-09-12 11:06   ` SZEDER Gábor
2019-09-12 14:01     ` Jeff King
2019-09-12 12:46   ` Derrick Stolee
2019-09-12 13:59   ` Jeff King
2019-09-12 12:23 ` Derrick Stolee
2019-09-12 14:23   ` Jeff King
2019-09-12 19:27     ` Derrick Stolee
2019-09-12 14:41 ` [PATCH v2] upload-pack commit graph segfault fix Jeff King
2019-09-12 14:43   ` Jeff King
2019-09-12 14:44   ` [PATCH v2 1/2] commit-graph: bump DIE_ON_LOAD check to actual load-time Jeff King
2019-09-12 19:30     ` Derrick Stolee
2019-09-12 14:44   ` [PATCH v2 2/2] upload-pack: disable commit graph more gently for shallow traversal Jeff King
2019-09-13 13:26     ` Derrick Stolee
2019-09-12 16:56   ` [PATCH v2] upload-pack commit graph segfault fix Taylor Blau

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).