git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/1] Write commit-graph on fetch
@ 2019-09-03  2:22 Derrick Stolee via GitGitGadget
  2019-09-03  2:22 ` [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting Derrick Stolee via GitGitGadget
  0 siblings, 1 reply; 18+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-09-03  2:22 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, garimasigit, Junio C Hamano

Instead of waiting for a non-trivial GC command to write the commit-graph,
write one during 'git fetch'. By using the equivalent method for 'git
commit-graph write --split --reachable', we create a commit-graph chain that
includes all reachable commits. Most of the time, these writes only add the
newly-downloaded commits in a small file at the top of the commit-graph
chain. Commits that are no longer reachable still exist in the commit-graph,
but will be cleaned up by a later GC command that forces the commit-graph to
be rewritten completely.

A version of this patch [1] used to be a part of
ds/commit-graph-incremental, but was removed to focus on the incremental
commit-graph file format. Now that the incremental file format has been
shipped in v2.23.0 and some config things have adjusted in
ds/feature-macros, I'm reintroducing the idea.

Ævar had mentioned wanting to do something with "incremental maintenance
during GC" [2]. I haven't seen any patches towards that aim (please point me
in that direction if they have been submitted). I still think it is worth
allowing a write at fetch time, as some users have GC disabled. I know for
sure that users who only interact with their Git repos via Visual Studio
Team Explorer have all Git commands running with GC disabled, and likely
other desktop GUI clients have it disabled to avoid blocking processes.

Aside: VFS for Git users have GC disabled, but the commit-graph is being
written in the background by a monitoring process. We shipped the
incremental commit-graph writes in a recent version and reduced our writes
from ~60 seconds each to less than a second on average. Very rarely, the
layers of the commit-graph chain collapse and return to the old values. This
feature has been performing well with no known issues.

Thanks, -Stolee

[1] 
https://public-inbox.org/git/3c52385e5696887c40cab4a6b9b7923d60a0567c.1557330827.git.gitgitgadget@gmail.com/

[2] 
https://public-inbox.org/git/b1de6af2-c015-098e-a656-e1b68056e037@gmail.com/

Derrick Stolee (1):
  fetch: add fetch.writeCommitGraph config setting

 Documentation/config/feature.txt |  8 ++++++++
 Documentation/config/fetch.txt   | 10 ++++++++++
 builtin/fetch.c                  | 15 +++++++++++++++
 repo-settings.c                  |  4 ++++
 repository.h                     |  1 +
 t/t5510-fetch.sh                 | 13 +++++++++++++
 6 files changed, 51 insertions(+)


base-commit: aaf633c2ad10b47af7623c130ddfe7231658c7e4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-328%2Fderrickstolee%2Ffetch-write-commit-graph-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-328/derrickstolee/fetch-write-commit-graph-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/328
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-03  2:22 [PATCH 0/1] Write commit-graph on fetch Derrick Stolee via GitGitGadget
@ 2019-09-03  2:22 ` Derrick Stolee via GitGitGadget
  2019-09-03 19:05   ` Junio C Hamano
  2019-09-04  3:08   ` Jeff King
  0 siblings, 2 replies; 18+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-09-03  2:22 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, garimasigit, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The commit-graph feature is now on by default, and is being
written during 'git gc' by default. Typically, Git only writes
a commit-graph when a 'git gc --auto' command passes the gc.auto
setting to actualy do work. This means that a commit-graph will
typically fall behind the commits that are being used every day.

To stay updated with the latest commits, add a step to 'git
fetch' to write a commit-graph after fetching new objects. The
fetch.writeCommitGraph config setting enables writing a split
commit-graph, so on average the cost of writing this file is
very small. Occasionally, the commit-graph chain will collapse
to a single level, and this could be slow for very large repos.

For additional use, adjust the default to be true when
feature.experimental is enabled.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/feature.txt |  8 ++++++++
 Documentation/config/fetch.txt   | 10 ++++++++++
 builtin/fetch.c                  | 15 +++++++++++++++
 repo-settings.c                  |  4 ++++
 repository.h                     |  1 +
 t/t5510-fetch.sh                 | 13 +++++++++++++
 6 files changed, 51 insertions(+)

diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
index 545522f306..875f8c8a66 100644
--- a/Documentation/config/feature.txt
+++ b/Documentation/config/feature.txt
@@ -17,6 +17,14 @@ which can improve `git push` performance in repos with many files.
 +
 * `fetch.negotiationAlgorithm=skipping` may improve fetch negotiation times by
 skipping more commits at a time, reducing the number of round trips.
++
+* `fetch.writeCommitGraph=true` writes a commit-graph after every `git fetch`
+command that downloads a pack-file from a remote. Using the `--split` option,
+most executions will create a very small commit-graph file on top of the
+existing commit-graph file(s). Occasionally, these files will merge and the
+write may take longer. Having an updated commit-graph file helps performance
+of many Git commands, including `git merge-base`, `git push -f`, and
+`git log --graph`.
 
 feature.manyFiles::
 	Enable config options that optimize for repos with many files in the
diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index d402110638..e8cb20547c 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -69,3 +69,13 @@ fetch.showForcedUpdates::
 	Set to false to enable `--no-show-forced-updates` in
 	linkgit:git-fetch[1] and linkgit:git-pull[1] commands.
 	Defaults to true.
+
+fetch.writeCommitGraph::
+	Set to true to write a commit-graph after every `git fetch` command
+	that downloads a pack-file from a remote. Using the `--split` option,
+	most executions will create a very small commit-graph file on top of
+	the existing commit-graph file(s). Occasionally, these files will
+	merge and the write may take longer. Having an updated commit-graph
+	file helps performance of many Git commands, including `git merge-base`,
+	`git push -f`, and `git log --graph`. Defaults to false, unless
+	`feature.experimental` is true.
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 53ce99d2bb..d36a403859 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -23,6 +23,7 @@
 #include "packfile.h"
 #include "list-objects-filter-options.h"
 #include "commit-reach.h"
+#include "commit-graph.h"
 
 #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000)
 
@@ -1715,6 +1716,20 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 	string_list_clear(&list, 0);
 
+	prepare_repo_settings(the_repository);
+	if (the_repository->settings.fetch_write_commit_graph) {
+		int commit_graph_flags = COMMIT_GRAPH_SPLIT;
+		struct split_commit_graph_opts split_opts;
+		memset(&split_opts, 0, sizeof(struct split_commit_graph_opts));
+
+		if (progress)
+			commit_graph_flags |= COMMIT_GRAPH_PROGRESS;
+
+		write_commit_graph_reachable(get_object_directory(),
+					     commit_graph_flags,
+					     &split_opts);
+	}
+
 	close_object_store(the_repository->objects);
 
 	if (enable_auto_gc) {
diff --git a/repo-settings.c b/repo-settings.c
index 3779b85c17..05546db98e 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -49,10 +49,14 @@ void prepare_repo_settings(struct repository *r)
 		UPDATE_DEFAULT_BOOL(r->settings.index_version, 4);
 		UPDATE_DEFAULT_BOOL(r->settings.core_untracked_cache, UNTRACKED_CACHE_WRITE);
 	}
+	if (!repo_config_get_bool(r, "fetch.writecommitgraph", &value))
+		r->settings.fetch_write_commit_graph = value;
 	if (!repo_config_get_bool(r, "feature.experimental", &value) && value) {
 		UPDATE_DEFAULT_BOOL(r->settings.pack_use_sparse, 1);
 		UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_SKIPPING);
+		UPDATE_DEFAULT_BOOL(r->settings.fetch_write_commit_graph, 1);
 	}
+	UPDATE_DEFAULT_BOOL(r->settings.fetch_write_commit_graph, 0);
 
 	/* Hack for test programs like test-dump-untracked-cache */
 	if (ignore_untracked_cache_config)
diff --git a/repository.h b/repository.h
index 4da275e73f..fe0b5f5dc6 100644
--- a/repository.h
+++ b/repository.h
@@ -30,6 +30,7 @@ struct repo_settings {
 
 	int core_commit_graph;
 	int gc_write_commit_graph;
+	int fetch_write_commit_graph;
 
 	int index_version;
 	enum untracked_cache_setting core_untracked_cache;
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 139f7106f7..91ede622f0 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -570,6 +570,19 @@ test_expect_success 'LHS of refspec follows ref disambiguation rules' '
 	)
 '
 
+test_expect_success 'fetch.writeCommitGraph' '
+	git clone three write &&
+	(
+		cd three &&
+		test_commit new
+	) &&
+	(
+		cd write &&
+		git -c fetch.writeCommitGraph fetch origin &&
+		test_path_is_file .git/objects/info/commit-graphs/commit-graph-chain
+	)
+'
+
 # configured prune tests
 
 set_config_tristate () {
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-03  2:22 ` [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting Derrick Stolee via GitGitGadget
@ 2019-09-03 19:05   ` Junio C Hamano
  2019-09-03 23:36     ` Derrick Stolee
  2019-09-04  3:08   ` Jeff King
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2019-09-03 19:05 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, peff, avarab, garimasigit, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 53ce99d2bb..d36a403859 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -23,6 +23,7 @@
>  #include "packfile.h"
>  #include "list-objects-filter-options.h"
>  #include "commit-reach.h"
> +#include "commit-graph.h"
>  
>  #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000)
>  
> @@ -1715,6 +1716,20 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  
>  	string_list_clear(&list, 0);
>  
> +	prepare_repo_settings(the_repository);
> +	if (the_repository->settings.fetch_write_commit_graph) {
> +		int commit_graph_flags = COMMIT_GRAPH_SPLIT;
> +		struct split_commit_graph_opts split_opts;
> +		memset(&split_opts, 0, sizeof(struct split_commit_graph_opts));
> +
> +		if (progress)
> +			commit_graph_flags |= COMMIT_GRAPH_PROGRESS;
> +
> +		write_commit_graph_reachable(get_object_directory(),
> +					     commit_graph_flags,
> +					     &split_opts);
> +	}

As a low-impact change this is good.  

For longer term, it feels a bit unfortunate that this is still a
separate phase of the program, though.  We know what new refs we
added, we know what new objects we received, and we even scanned
each and every one of them while running the index-pack step to
store the .pack and compute the .idx file, i.e. it feels that we
have most of the information already in-core to extend the commit
graph for new parts of the history we just received.

Thanks.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-03 19:05   ` Junio C Hamano
@ 2019-09-03 23:36     ` Derrick Stolee
  2019-09-06 21:46       ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Derrick Stolee @ 2019-09-03 23:36 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, peff, avarab, garimasigit, Derrick Stolee

On 9/3/2019 3:05 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> diff --git a/builtin/fetch.c b/builtin/fetch.c
>> index 53ce99d2bb..d36a403859 100644
>> --- a/builtin/fetch.c
>> +++ b/builtin/fetch.c
>> @@ -23,6 +23,7 @@
>>  #include "packfile.h"
>>  #include "list-objects-filter-options.h"
>>  #include "commit-reach.h"
>> +#include "commit-graph.h"
>>  
>>  #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000)
>>  
>> @@ -1715,6 +1716,20 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>>  
>>  	string_list_clear(&list, 0);
>>  
>> +	prepare_repo_settings(the_repository);
>> +	if (the_repository->settings.fetch_write_commit_graph) {
>> +		int commit_graph_flags = COMMIT_GRAPH_SPLIT;
>> +		struct split_commit_graph_opts split_opts;
>> +		memset(&split_opts, 0, sizeof(struct split_commit_graph_opts));
>> +
>> +		if (progress)
>> +			commit_graph_flags |= COMMIT_GRAPH_PROGRESS;
>> +
>> +		write_commit_graph_reachable(get_object_directory(),
>> +					     commit_graph_flags,
>> +					     &split_opts);
>> +	}
> 
> As a low-impact change this is good.  
> 
> For longer term, it feels a bit unfortunate that this is still a
> separate phase of the program, though.  We know what new refs we
> added, we know what new objects we received, and we even scanned
> each and every one of them while running the index-pack step to
> store the .pack and compute the .idx file, i.e. it feels that we
> have most of the information already in-core to extend the commit
> graph for new parts of the history we just received.

You're right that we could isolate the new write to the refs we
just received. We could use the more cumbersome write_commit_graph()
method with a list of commit oids as starting points. I'm happy to
make that change if we see a lot of value there.

However, the current patch also gives us a way to add local refs
to the commit graph and "catch up" to the work the user had done.
With this in mind, I do think the simpler code has another functional
advantage.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-03  2:22 ` [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting Derrick Stolee via GitGitGadget
  2019-09-03 19:05   ` Junio C Hamano
@ 2019-09-04  3:08   ` Jeff King
  2019-09-05 20:37     ` Junio C Hamano
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff King @ 2019-09-04  3:08 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, avarab, garimasigit, Junio C Hamano, Derrick Stolee

On Mon, Sep 02, 2019 at 07:22:02PM -0700, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The commit-graph feature is now on by default, and is being
> written during 'git gc' by default. Typically, Git only writes
> a commit-graph when a 'git gc --auto' command passes the gc.auto
> setting to actualy do work. This means that a commit-graph will
> typically fall behind the commits that are being used every day.
> 
> To stay updated with the latest commits, add a step to 'git
> fetch' to write a commit-graph after fetching new objects. The
> fetch.writeCommitGraph config setting enables writing a split
> commit-graph, so on average the cost of writing this file is
> very small. Occasionally, the commit-graph chain will collapse
> to a single level, and this could be slow for very large repos.
> 
> For additional use, adjust the default to be true when
> feature.experimental is enabled.

Seems like a good time to do it. We'd eventually want a similar option
for updating it on the receiving side of a push, too. I don't insist
that come at the same time, but we should at least have a plan about how
it will look to the user.

Do we want to to have fetch.writeCommitGraph, receive.writeCommitGraph,
and then a master transfer.writeCommitGraph?

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-04  3:08   ` Jeff King
@ 2019-09-05 20:37     ` Junio C Hamano
  2019-09-06 17:00       ` Derrick Stolee
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2019-09-05 20:37 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee via GitGitGadget, git, avarab, garimasigit,
	Derrick Stolee

Jeff King <peff@peff.net> writes:

> Do we want to to have fetch.writeCommitGraph, receive.writeCommitGraph,
> and then a master transfer.writeCommitGraph?

If anything, it may be good for consistency.

I am not sure if it is a good idea to trigger writing the commit
graph when accepting a push, though.  It tends to be a lot finer
grained than fetching, right?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-05 20:37     ` Junio C Hamano
@ 2019-09-06 17:00       ` Derrick Stolee
  2019-09-06 17:56         ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Derrick Stolee @ 2019-09-06 17:00 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King
  Cc: Derrick Stolee via GitGitGadget, git, avarab, garimasigit,
	Derrick Stolee

On 9/5/2019 4:37 PM, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
>> Do we want to to have fetch.writeCommitGraph, receive.writeCommitGraph,
>> and then a master transfer.writeCommitGraph?
> 
> If anything, it may be good for consistency.
> 
> I am not sure if it is a good idea to trigger writing the commit
> graph when accepting a push, though.  It tends to be a lot finer
> grained than fetching, right?

And I expect a push to include many fewer commits than a fetch.
In a server environment, I would expect to have a separate
maintenance task responsible for updating the commit-graph after
receiving new data, but not in an in-line fashion with the push.

Think about the situation of many pushes that happen in a short
burst: one write after all of the pushes would have close to the
same performance benefits as writing with every push, but does
a lot less work.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-06 17:00       ` Derrick Stolee
@ 2019-09-06 17:56         ` Jeff King
  2019-09-06 18:24           ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2019-09-06 17:56 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, git, avarab,
	garimasigit, Derrick Stolee

On Fri, Sep 06, 2019 at 01:00:40PM -0400, Derrick Stolee wrote:

> On 9/5/2019 4:37 PM, Junio C Hamano wrote:
> > Jeff King <peff@peff.net> writes:
> > 
> >> Do we want to to have fetch.writeCommitGraph, receive.writeCommitGraph,
> >> and then a master transfer.writeCommitGraph?
> > 
> > If anything, it may be good for consistency.
> > 
> > I am not sure if it is a good idea to trigger writing the commit
> > graph when accepting a push, though.  It tends to be a lot finer
> > grained than fetching, right?
> 
> And I expect a push to include many fewer commits than a fetch.
> In a server environment, I would expect to have a separate
> maintenance task responsible for updating the commit-graph after
> receiving new data, but not in an in-line fashion with the push.

That's probably how we'll end up doing it at GitHub, because we run a
big server farm that has a job-queueing system for periodic maintenance,
etc.

But keep in mind the "little guy" who is hosting a few repositories for
themselves or their company. They'd presumably want the benefits here,
too, right? We already have options to trigger auto-gc and
update-server-info for them, so why not this maintenance, too?

> Think about the situation of many pushes that happen in a short
> burst: one write after all of the pushes would have close to the
> same performance benefits as writing with every push, but does
> a lot less work.

Sure, but wouldn't that similarly apply to fetching? What is it that
makes bursts of pushes more likely than bursts of fetches?

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-06 17:56         ` Jeff King
@ 2019-09-06 18:24           ` Junio C Hamano
  2019-09-06 19:16             ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2019-09-06 18:24 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget, git, avarab,
	garimasigit, Derrick Stolee

Jeff King <peff@peff.net> writes:

> Sure, but wouldn't that similarly apply to fetching? What is it that
> makes bursts of pushes more likely than bursts of fetches?

Because people tend to use a repository as a gathering point?  You
may periodically fetch from and push to a repository, and you may
even do so at the same interval, but simply because there are more
"other" people than you alone as a single developer in the project,
your fetch tends to grab work from more people than yoru push that
publish your work alone?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-06 18:24           ` Junio C Hamano
@ 2019-09-06 19:16             ` Jeff King
  2019-09-06 20:42               ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2019-09-06 19:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget, git, avarab,
	garimasigit, Derrick Stolee

On Fri, Sep 06, 2019 at 11:24:12AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Sure, but wouldn't that similarly apply to fetching? What is it that
> > makes bursts of pushes more likely than bursts of fetches?
> 
> Because people tend to use a repository as a gathering point?  You
> may periodically fetch from and push to a repository, and you may
> even do so at the same interval, but simply because there are more
> "other" people than you alone as a single developer in the project,
> your fetch tends to grab work from more people than yoru push that
> publish your work alone?

I suppose so. But I think the "stock git without any other job
infrastructure" case would still benefit. How do we know when the burst
is done? We'd effectively be relying on auto-gc to do that, but "enough
packs to merit gc" and "burst is done, now is a good time to update the
commit graph" are two different metrics.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-06 19:16             ` Jeff King
@ 2019-09-06 20:42               ` Junio C Hamano
  2019-09-06 21:04                 ` Derrick Stolee
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2019-09-06 20:42 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget, git, avarab,
	garimasigit, Derrick Stolee

Jeff King <peff@peff.net> writes:

> I suppose so. But I think the "stock git without any other job
> infrastructure" case would still benefit.

Oh, no question about it.

I did question if an automatic writing would benefit the side that
receives a push when you brought up the usual "fetch.* and receive.*
for separate configuration, transfer.* when want to rule them both"
convention, which is a good idea if only for consistency, but the
question was if it helps the receiving end of a push to the same
degree as it would help the repository that fetches.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-06 20:42               ` Junio C Hamano
@ 2019-09-06 21:04                 ` Derrick Stolee
  2019-09-06 21:57                   ` Junio C Hamano
  2019-09-07  4:46                   ` Jeff King
  0 siblings, 2 replies; 18+ messages in thread
From: Derrick Stolee @ 2019-09-06 21:04 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King
  Cc: Derrick Stolee via GitGitGadget, git, avarab, garimasigit,
	Derrick Stolee

On 9/6/2019 4:42 PM, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
>> I suppose so. But I think the "stock git without any other job
>> infrastructure" case would still benefit.
> 
> Oh, no question about it.
> 
> I did question if an automatic writing would benefit the side that
> receives a push when you brought up the usual "fetch.* and receive.*
> for separate configuration, transfer.* when want to rule them both"
> convention, which is a good idea if only for consistency, but the
> question was if it helps the receiving end of a push to the same
> degree as it would help the repository that fetches.

I think the "stock git without any other job infrastructure" is
a very important scenario. Putting the simplest version of
"commit-graph writes in-line with every push" seems to be ripe
for failure under load. I'd rather think deeply about what is
best for this scenario.

Spit-balling: what if we used the same mechanism as running GC
in the background to launch 'git commit-graph write' commands?
And we could have the command give up (without failure) if the
commit-graph.lock file is already created, so concurrent pushes
would not fight each other.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-03 23:36     ` Derrick Stolee
@ 2019-09-06 21:46       ` Junio C Hamano
  2019-09-07  4:51         ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2019-09-06 21:46 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, peff, avarab, garimasigit,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

>>> +		write_commit_graph_reachable(get_object_directory(),
>>> +					     commit_graph_flags,
>>> +					     &split_opts);
>>> +	}
>> 
>> As a low-impact change this is good.  
>> 
>> For longer term, it feels a bit unfortunate that this is still a
>> separate phase of the program, though.  We know what new refs we
>> added, we know what new objects we received, and we even scanned
>> each and every one of them while running the index-pack step to
>> store the .pack and compute the .idx file, i.e. it feels that we
>> have most of the information already in-core to extend the commit
>> graph for new parts of the history we just received.
>
> You're right that we could isolate the new write to the refs we
> just received. We could use the more cumbersome write_commit_graph()
> method with a list of commit oids as starting points. I'm happy to
> make that change if we see a lot of value there.

Well, that is not the kind of information reuse I am talking about.

I was wondering if "index-pack" has enough information in-core after
it receives and processes the incoming pack data, scanned each and
every object in it, in order to write out the commit graph _without_
having to do a lot of duplicate computation and enumeration of the
objects done in the current commit-graph.c::write_commit_graph(), so
that it can learn a "--write-commit-graph" option that performs much
better than running "git fetch && git commit-graph write".

Perhaps that would require too much refactoring of both index-pack
and commit-graph code and infeasible in short to medium term and
that is why I said "for longer term, it feels a bit unfortunate".

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-06 21:04                 ` Derrick Stolee
@ 2019-09-06 21:57                   ` Junio C Hamano
  2019-09-07  4:47                     ` Jeff King
  2019-09-07  4:46                   ` Jeff King
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2019-09-06 21:57 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Jeff King, Derrick Stolee via GitGitGadget, git, avarab,
	garimasigit, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> On 9/6/2019 4:42 PM, Junio C Hamano wrote:
>> Jeff King <peff@peff.net> writes:
>> 
>>> I suppose so. But I think the "stock git without any other job
>>> infrastructure" case would still benefit.
>> 
>> Oh, no question about it.
>> 
>> I did question if an automatic writing would benefit the side that
>> receives a push when you brought up the usual "fetch.* and receive.*
>> for separate configuration, transfer.* when want to rule them both"
>> convention, which is a good idea if only for consistency, but the
>> question was if it helps the receiving end of a push to the same
>> degree as it would help the repository that fetches.
>
> I think the "stock git without any other job infrastructure" is
> a very important scenario. Putting the simplest version of
> "commit-graph writes in-line with every push" seems to be ripe
> for failure under load. I'd rather think deeply about what is
> best for this scenario.

As to what to do on the push side, I suppose we can afford to let it
simmer in the back of our heads while moving this topic forward.
Whether we'd later decide to have receive.writeCommitGraph (in which
case we would add transfer.writeCommitGraph, too) or not, this
change on the fetch side can independently be used, right?

Thanks.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-06 21:04                 ` Derrick Stolee
  2019-09-06 21:57                   ` Junio C Hamano
@ 2019-09-07  4:46                   ` Jeff King
  1 sibling, 0 replies; 18+ messages in thread
From: Jeff King @ 2019-09-07  4:46 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, git, avarab,
	garimasigit, Derrick Stolee

On Fri, Sep 06, 2019 at 05:04:17PM -0400, Derrick Stolee wrote:

> On 9/6/2019 4:42 PM, Junio C Hamano wrote:
> > Jeff King <peff@peff.net> writes:
> > 
> >> I suppose so. But I think the "stock git without any other job
> >> infrastructure" case would still benefit.
> > 
> > Oh, no question about it.
> > 
> > I did question if an automatic writing would benefit the side that
> > receives a push when you brought up the usual "fetch.* and receive.*
> > for separate configuration, transfer.* when want to rule them both"
> > convention, which is a good idea if only for consistency, but the
> > question was if it helps the receiving end of a push to the same
> > degree as it would help the repository that fetches.
> 
> I think the "stock git without any other job infrastructure" is
> a very important scenario. Putting the simplest version of
> "commit-graph writes in-line with every push" seems to be ripe
> for failure under load. I'd rather think deeply about what is
> best for this scenario.

If it's going to be a problem under load, that makes me worry about the
same thing for fetches. Whether you see a lot of either depends on your
workflow. But as long as neither option is enabled by default (or at
least if it becomes common knowledge to _disable_ them if you have a
high rate), it may be OK.

I do agree that the normal "busy" repos you and I have both seen in
enterprise settings (where people are literally pushing multiple times
per second at peak) would want some kind of throttling. But I think
there's a long tail of "push once a week" repos.

> Spit-balling: what if we used the same mechanism as running GC
> in the background to launch 'git commit-graph write' commands?
> And we could have the command give up (without failure) if the
> commit-graph.lock file is already created, so concurrent pushes
> would not fight each other.

I have mixed feelings. It's nice not to stand in the critical path of
the user, but background tasks have a way of finding funny corner cases
(e.g., packfiles moving around, or the issues we've had with deciding
when to shut down auto-gc for a grace period due to warnings/errors).

I think since this is generally additive (adding new commit-graph
files), it might be less finicky, though.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-06 21:57                   ` Junio C Hamano
@ 2019-09-07  4:47                     ` Jeff King
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2019-09-07  4:47 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget, git, avarab,
	garimasigit, Derrick Stolee

On Fri, Sep 06, 2019 at 02:57:55PM -0700, Junio C Hamano wrote:

> > I think the "stock git without any other job infrastructure" is
> > a very important scenario. Putting the simplest version of
> > "commit-graph writes in-line with every push" seems to be ripe
> > for failure under load. I'd rather think deeply about what is
> > best for this scenario.
> 
> As to what to do on the push side, I suppose we can afford to let it
> simmer in the back of our heads while moving this topic forward.
> Whether we'd later decide to have receive.writeCommitGraph (in which
> case we would add transfer.writeCommitGraph, too) or not, this
> change on the fetch side can independently be used, right?

Yeah, I'm OK with proceeding without the receive-pack side for now.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-06 21:46       ` Junio C Hamano
@ 2019-09-07  4:51         ` Jeff King
  2019-09-09 17:53           ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2019-09-07  4:51 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget, git, avarab,
	garimasigit, Derrick Stolee

On Fri, Sep 06, 2019 at 02:46:22PM -0700, Junio C Hamano wrote:

> > You're right that we could isolate the new write to the refs we
> > just received. We could use the more cumbersome write_commit_graph()
> > method with a list of commit oids as starting points. I'm happy to
> > make that change if we see a lot of value there.
> 
> Well, that is not the kind of information reuse I am talking about.
> 
> I was wondering if "index-pack" has enough information in-core after
> it receives and processes the incoming pack data, scanned each and
> every object in it, in order to write out the commit graph _without_
> having to do a lot of duplicate computation and enumeration of the
> objects done in the current commit-graph.c::write_commit_graph(), so
> that it can learn a "--write-commit-graph" option that performs much
> better than running "git fetch && git commit-graph write".
> 
> Perhaps that would require too much refactoring of both index-pack
> and commit-graph code and infeasible in short to medium term and
> that is why I said "for longer term, it feels a bit unfortunate".

I think the basic metadata should be easy. We have each commit expanded
in memory at some point, so parsing it and filing away its parents,
trees, etc isn't too hard.

Generation numbers are a little trickier, though, because they imply an
actual topological traversal. It might actually be easier to couple this
with the connectivity check we do after index-pack finishes (though I've
often wondered if we could drop that check in favor of making index-pack
smarter about finding the boundaries).

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting
  2019-09-07  4:51         ` Jeff King
@ 2019-09-09 17:53           ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2019-09-09 17:53 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget, git, avarab,
	garimasigit, Derrick Stolee

Jeff King <peff@peff.net> writes:

> Generation numbers are a little trickier, though, because they imply an
> actual topological traversal. It might actually be easier to couple this
> with the connectivity check we do after index-pack finishes (though I've
> often wondered if we could drop that check in favor of making index-pack
> smarter about finding the boundaries).

Currently after scanning the incoming objects with the fsck
machinery, we count the number of objects that are pointed at by
these objects in the pack and are not in the pack in the
builtin/index-pack.c::check_object() function; at that point, we no
longer know who points at the object in question, which is needed if
we want to compute the boundary, so we need a bit of work here.

With boundary information, we could be smarter about lazy fetching,
I presume?

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-09-09 17:53 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-03  2:22 [PATCH 0/1] Write commit-graph on fetch Derrick Stolee via GitGitGadget
2019-09-03  2:22 ` [PATCH 1/1] fetch: add fetch.writeCommitGraph config setting Derrick Stolee via GitGitGadget
2019-09-03 19:05   ` Junio C Hamano
2019-09-03 23:36     ` Derrick Stolee
2019-09-06 21:46       ` Junio C Hamano
2019-09-07  4:51         ` Jeff King
2019-09-09 17:53           ` Junio C Hamano
2019-09-04  3:08   ` Jeff King
2019-09-05 20:37     ` Junio C Hamano
2019-09-06 17:00       ` Derrick Stolee
2019-09-06 17:56         ` Jeff King
2019-09-06 18:24           ` Junio C Hamano
2019-09-06 19:16             ` Jeff King
2019-09-06 20:42               ` Junio C Hamano
2019-09-06 21:04                 ` Derrick Stolee
2019-09-06 21:57                   ` Junio C Hamano
2019-09-07  4:47                     ` Jeff King
2019-09-07  4:46                   ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).