git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/1] [PERF BUG] Fix size_mult option in fetch.writeCommitGraph
@ 2020-01-02 16:14 Derrick Stolee via GitGitGadget
  2020-01-02 16:14 ` [PATCH 1/1] fetch: set size_multiple in split_commit_graph_opts Derrick Stolee via GitGitGadget
  0 siblings, 1 reply; 6+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-02 16:14 UTC
  To: git; +Cc: peff, me, szeder.dev, Derrick Stolee, Junio C Hamano

I found this while doing some digging into fetch behavior and split commit
graphs. I had been running fetch.writeCommitGraph=true on my local repos for
a while and noticed that the commit-graph chains were much longer than
expected.

The reason is silly, and the commit message includes all the details.

This behavior has existed since v2.24.0, so I'm not sure if it clears the bar
for v2.25.0 this late in the release cycle. At minimum, the change is very
small and unlikely to cause more pain.

This is only a performance bug, and the effect is relatively small. A large
list of commit-graph files slows down the commit lookup time as we need to
perform a linear number of binary searches. This only affects finding the
first commit(s) in a commit walk, as after that we can navigate quickly to
the correct position using graph_pos. When a user runs gc (with 
gc.writeCommitGraph=true, on by default), the chain collapses to a single
level, fixing the performance problem.
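
[Editor's illustration, not part of the original message] The "linear number
of binary searches" cost can be sketched as follows. This is a minimal model,
assuming each commit-graph file is reduced to a sorted array of integer IDs;
struct layer, cmp_id() and find_in_chain() are hypothetical names, not git's
actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one commit-graph file: a sorted array of IDs. */
struct layer {
	const unsigned *ids;
	size_t nr;
};

/* Three-way comparison of two unsigned IDs, for bsearch(). */
static int cmp_id(const void *a, const void *b)
{
	unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
	return (x > y) - (x < y);
}

/*
 * Return the index of the layer that contains 'id', or -1 if no layer
 * does. '*searches' receives the number of binary searches performed,
 * which grows linearly with the chain length in the worst case.
 */
static int find_in_chain(const struct layer *chain, size_t nr_layers,
			 unsigned id, size_t *searches)
{
	size_t i;

	assert(searches);
	*searches = 0;
	for (i = 0; i < nr_layers; i++) {
		(*searches)++;
		if (chain[i].nr &&
		    bsearch(&id, chain[i].ids, chain[i].nr,
			    sizeof(unsigned), cmp_id))
			return (int)i;
	}
	return -1;
}
```

In this model, a lookup that misses every layer, or hits only the last one,
pays one binary search per file in the chain. A chain that has been collapsed
to a single file needs one search.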

Thanks, -Stolee

Derrick Stolee (1):
  fetch: set size_multiple in split_commit_graph_opts

 builtin/fetch.c | 4 +---
 commit-graph.c  | 4 +++-
 2 files changed, 4 insertions(+), 4 deletions(-)


base-commit: 99c33bed562b41de6ce9bd3fd561303d39645048
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-509%2Fderrickstolee%2Ffetch-write-commit-graph-split-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-509/derrickstolee/fetch-write-commit-graph-split-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/509
-- 
gitgitgadget


* [PATCH 1/1] fetch: set size_multiple in split_commit_graph_opts
  2020-01-02 16:14 [PATCH 0/1] [PERF BUG] Fix size_mult option in fetch.writeCommitGraph Derrick Stolee via GitGitGadget
@ 2020-01-02 16:14 ` Derrick Stolee via GitGitGadget
  2020-01-02 16:20   ` Derrick Stolee
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-02 16:14 UTC
  To: git; +Cc: peff, me, szeder.dev, Derrick Stolee, Junio C Hamano,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In 50f26bd ("fetch: add fetch.writeCommitGraph config setting",
2019-09-02), the fetch builtin added the capability to write a
commit-graph using the "--split" feature. This feature creates
multiple commit-graph files, and those can merge based on a set
of "split options" including a size multiple. The default size
multiple is 2, which intends to provide a log_2 N depth of the
commit-graph chain where N is the number of commits.

However, I noticed during dogfooding that my commit-graph chains
were becoming quite large when left only to builds by 'git fetch'.
It turns out that in split_graph_merge_strategy(), we default the
size_mult variable to 2 except we override it with the context's
split_opts if they exist. In builtin/fetch.c, we create such a
split_opts, but do not populate it with values.

This problem is due to two failures:

 1. It is unclear that we can add the flag COMMIT_GRAPH_WRITE_SPLIT
    with a NULL split_opts.
 2. If we have a non-NULL split_opts, then we override the default
    values even if a zero value is given.

Correct both of these issues. First, do not override size_mult when
the options provide a zero value. Second, stop creating a split_opts
in the fetch builtin.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/fetch.c | 4 +---
 commit-graph.c  | 4 +++-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index f8765b385b..b4c6d921d0 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1866,15 +1866,13 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	    (fetch_write_commit_graph < 0 &&
 	     the_repository->settings.fetch_write_commit_graph)) {
 		int commit_graph_flags = COMMIT_GRAPH_WRITE_SPLIT;
-		struct split_commit_graph_opts split_opts;
-		memset(&split_opts, 0, sizeof(struct split_commit_graph_opts));
 
 		if (progress)
 			commit_graph_flags |= COMMIT_GRAPH_WRITE_PROGRESS;
 
 		write_commit_graph_reachable(get_object_directory(),
 					     commit_graph_flags,
-					     &split_opts);
+					     NULL);
 	}
 
 	close_object_store(the_repository->objects);
diff --git a/commit-graph.c b/commit-graph.c
index e771394aff..b205e65ed1 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1542,7 +1542,9 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
 
 	if (ctx->split_opts) {
 		max_commits = ctx->split_opts->max_commits;
-		size_mult = ctx->split_opts->size_multiple;
+
+		if (ctx->split_opts->size_multiple)
+			size_mult = ctx->split_opts->size_multiple;
 	}
 
 	g = ctx->r->objects->commit_graph;
-- 
gitgitgadget


* Re: [PATCH 1/1] fetch: set size_multiple in split_commit_graph_opts
  2020-01-02 16:14 ` [PATCH 1/1] fetch: set size_multiple in split_commit_graph_opts Derrick Stolee via GitGitGadget
@ 2020-01-02 16:20   ` Derrick Stolee
  2020-01-02 21:49   ` Junio C Hamano
  2020-01-06 19:39   ` Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Derrick Stolee @ 2020-01-02 16:20 UTC
  To: Derrick Stolee via GitGitGadget, git
  Cc: peff, me, szeder.dev, Derrick Stolee, Junio C Hamano

As I was writing this commit message, I changed the plan for how
to fix this. Originally, I was going to set size_multiple = 2 in
builtin/fetch.c, which is how the subject line was created. I forgot
to change that to something more like:

	"commit-graph: prefer default size_mult when given zero"

On 1/2/2020 11:14 AM, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> In 50f26bd ("fetch: add fetch.writeCommitGraph config setting",
> 2019-09-02), the fetch builtin added the capability to write a
> commit-graph using the "--split" feature. This feature creates
> multiple commit-graph files, and those can merge based on a set
> of "split options" including a size multiple. The default size
> multiple is 2, which intends to provide a log_2 N depth of the
> commit-graph chain where N is the number of commits.
> 
> However, I noticed during dogfooding that my commit-graph chains
> were becoming quite large when left only to builds by 'git fetch'.
> It turns out that in split_graph_merge_strategy(), we default the
> size_mult variable to 2 except we override it with the context's
> split_opts if they exist. In builtin/fetch.c, we create such a
> split_opts, but do not populate it with values.
> 
> This problem is due to two failures:
> 
>  1. It is unclear that we can add the flag COMMIT_GRAPH_WRITE_SPLIT
>     with a NULL split_opts.
>  2. If we have a non-NULL split_opts, then we override the default
>     values even if a zero value is given.
> 
> Correct both of these issues. First, do not override size_mult when
> the options provide a zero value. Second, stop creating a split_opts
> in the fetch builtin.

This is a correct description of the actual patch.

Thanks,
-Stolee




* Re: [PATCH 1/1] fetch: set size_multiple in split_commit_graph_opts
  2020-01-02 16:14 ` [PATCH 1/1] fetch: set size_multiple in split_commit_graph_opts Derrick Stolee via GitGitGadget
  2020-01-02 16:20   ` Derrick Stolee
@ 2020-01-02 21:49   ` Junio C Hamano
  2020-01-03 13:07     ` Derrick Stolee
  2020-01-06 19:39   ` Jeff King
  2 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2020-01-02 21:49 UTC
  To: Derrick Stolee via GitGitGadget; +Cc: git, peff, me, szeder.dev, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This problem is due to two failures:
>
>  1. It is unclear that we can add the flag COMMIT_GRAPH_WRITE_SPLIT
>     with a NULL split_opts.
>  2. If we have a non-NULL split_opts, then we override the default
>     values even if a zero value is given.
>
> Correct both of these issues. First, do not override size_mult when
> the options provide a zero value. Second, stop creating a split_opts
> in the fetch builtin.

OK, so there is the hardcoded default 2 in the code, and split_opts
structure *can* override it, but 0 in the field of the structure is
meant to signal "no, I do not have any value to override the
default", not "I do want to set the multiple to 0"?
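
[Editor's illustration, not part of the original message] The "zero means no
override" convention described above can be sketched as a small standalone
function. The struct below is a hypothetical, reduced stand-in for
split_commit_graph_opts, and effective_size_mult() is an invented helper; the
patch itself makes the equivalent check inline in
split_graph_merge_strategy().

```c
#include <stddef.h>

/* The hardcoded default mentioned in the thread. */
#define DEFAULT_SIZE_MULT 2

/* Hypothetical, reduced stand-in for struct split_commit_graph_opts. */
struct split_opts_sketch {
	int size_multiple;
	int max_commits;
};

/*
 * Pick the size multiple to use: the default when no options are
 * given or when the field was left at zero, the caller's value
 * otherwise.
 */
static int effective_size_mult(const struct split_opts_sketch *opts)
{
	int size_mult = DEFAULT_SIZE_MULT;

	if (opts && opts->size_multiple)
		size_mult = opts->size_multiple;
	return size_mult;
}
```

With this shape, a NULL pointer and a zero-filled struct both yield the
default of 2, which covers the case of a caller that zeroes the struct with
memset() and fills in nothing.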

Makes sense.  Will queue.

Thanks.


> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/fetch.c | 4 +---
>  commit-graph.c  | 4 +++-
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index f8765b385b..b4c6d921d0 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1866,15 +1866,13 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  	    (fetch_write_commit_graph < 0 &&
>  	     the_repository->settings.fetch_write_commit_graph)) {
>  		int commit_graph_flags = COMMIT_GRAPH_WRITE_SPLIT;
> -		struct split_commit_graph_opts split_opts;
> -		memset(&split_opts, 0, sizeof(struct split_commit_graph_opts));
>  
>  		if (progress)
>  			commit_graph_flags |= COMMIT_GRAPH_WRITE_PROGRESS;
>  
>  		write_commit_graph_reachable(get_object_directory(),
>  					     commit_graph_flags,
> -					     &split_opts);
> +					     NULL);
>  	}
>  
>  	close_object_store(the_repository->objects);
> diff --git a/commit-graph.c b/commit-graph.c
> index e771394aff..b205e65ed1 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1542,7 +1542,9 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
>  
>  	if (ctx->split_opts) {
>  		max_commits = ctx->split_opts->max_commits;
> -		size_mult = ctx->split_opts->size_multiple;
> +
> +		if (ctx->split_opts->size_multiple)
> +			size_mult = ctx->split_opts->size_multiple;
>  	}
>  
>  	g = ctx->r->objects->commit_graph;


* Re: [PATCH 1/1] fetch: set size_multiple in split_commit_graph_opts
  2020-01-02 21:49   ` Junio C Hamano
@ 2020-01-03 13:07     ` Derrick Stolee
  0 siblings, 0 replies; 6+ messages in thread
From: Derrick Stolee @ 2020-01-03 13:07 UTC
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, peff, me, szeder.dev, Derrick Stolee

On 1/2/2020 4:49 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> This problem is due to two failures:
>>
>>  1. It is unclear that we can add the flag COMMIT_GRAPH_WRITE_SPLIT
>>     with a NULL split_opts.
>>  2. If we have a non-NULL split_opts, then we override the default
>>     values even if a zero value is given.
>>
>> Correct both of these issues. First, do not override size_mult when
>> the options provide a zero value. Second, stop creating a split_opts
>> in the fetch builtin.
> 
> OK, so there is the hardcoded default 2 in the code, and split_opts
> structure *can* override it, but 0 in the field of the structure is
> meant to signal "no, I do not have any value to override the
> default", not "I do want to set the multiple to 0"?

Correct. The multiple 0 makes it so we never merge layers of the
chain, and this was happening accidentally. A caller could still
accomplish this by passing -1, but that is not recommended.
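
[Editor's illustration, not part of the original message] A small simulation
shows why a multiple of 0 never merges layers. It is a simplified model,
assuming the rule documented for "git commit-graph write --split": a new tip
with N commits is merged into the previous tip with M commits when
size_mult * N > M. The function chain_length_after() and the MAX_LAYERS limit
are hypothetical, and max_commits is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LAYERS 1024

/*
 * Simulate 'fetches' incremental writes, each adding 'per_fetch' new
 * commits, and return the resulting chain length. layers[nr - 1] is
 * the current tip of the chain.
 */
static size_t chain_length_after(size_t fetches, size_t per_fetch,
				 size_t size_mult)
{
	size_t layers[MAX_LAYERS];
	size_t nr = 0, i;

	assert(fetches <= MAX_LAYERS);
	for (i = 0; i < fetches; i++) {
		size_t n = per_fetch;

		/* Absorb tips that are too small relative to the new layer. */
		while (nr && size_mult * n > layers[nr - 1])
			n += layers[--nr];
		layers[nr++] = n;
	}
	return nr;
}
```

In this model, eight single-commit fetches with size_mult = 2 leave a chain
of one layer, because the layers merge like carries in a binary counter. With
size_mult = 0 the merge condition can never hold, so the same eight fetches
leave eight layers.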

-Stolee



* Re: [PATCH 1/1] fetch: set size_multiple in split_commit_graph_opts
  2020-01-02 16:14 ` [PATCH 1/1] fetch: set size_multiple in split_commit_graph_opts Derrick Stolee via GitGitGadget
  2020-01-02 16:20   ` Derrick Stolee
  2020-01-02 21:49   ` Junio C Hamano
@ 2020-01-06 19:39   ` Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2020-01-06 19:39 UTC
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, szeder.dev, Derrick Stolee, Junio C Hamano

On Thu, Jan 02, 2020 at 04:14:14PM +0000, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> In 50f26bd ("fetch: add fetch.writeCommitGraph config setting",
> 2019-09-02), the fetch builtin added the capability to write a
> commit-graph using the "--split" feature. This feature creates
> multiple commit-graph files, and those can merge based on a set
> of "split options" including a size multiple. The default size
> multiple is 2, which intends to provide a log_2 N depth of the
> commit-graph chain where N is the number of commits.
> 
> However, I noticed during dogfooding that my commit-graph chains
> were becoming quite large when left only to builds by 'git fetch'.
> It turns out that in split_graph_merge_strategy(), we default the
> size_mult variable to 2 except we override it with the context's
> split_opts if they exist. In builtin/fetch.c, we create such a
> split_opts, but do not populate it with values.
> 
> This problem is due to two failures:
> 
>  1. It is unclear that we can add the flag COMMIT_GRAPH_WRITE_SPLIT
>     with a NULL split_opts.
>  2. If we have a non-NULL split_opts, then we override the default
>     values even if a zero value is given.
> 
> Correct both of these issues. First, do not override size_mult when
> the options provide a zero value. Second, stop creating a split_opts
> in the fetch builtin.

Thanks, the explanation and fix (both parts) look good to me, modulo the
subject correction you already noted.

> ---
>  builtin/fetch.c | 4 +---
>  commit-graph.c  | 4 +++-

Is it worth covering this with a test?

I guess the non-fetch code paths for splitting already cover this pretty
well, and this is just about managing to get the right number into the
commit-graph code. So perhaps it isn't worth it.

-Peff

