git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Maintenance: add pack-refs task
@ 2021-02-08 14:52 Derrick Stolee via GitGitGadget
  2021-02-08 14:52 ` [PATCH 1/2] maintenance: " Derrick Stolee via GitGitGadget
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-08 14:52 UTC (permalink / raw)
  To: git; +Cc: gitster, sluongng, martin.agren, sunshine, Derrick Stolee

This patch series adds a new pack-refs task to the maintenance builtin. This
operation already happens within git gc (and hence the gc task) but it is
easy to extract. Packing refs does not delete any data, only collects loose
objects into a combined file. This makes things faster in subtle ways,
especially when a command needs to iterate through refs (especially tags).

Credit for inspiring this goes to Suolong, who asked for this to be added to
Scalar [1]. I've been waiting instead to add it directly to Git and its
background maintenance. Now is the time!

[1] https://github.com/microsoft/scalar/issues/382

I chose to add it to the incremental maintenance strategy at a weekly
cadence. I'm not sure there is significant value to the difference between
weekly and daily. It just seems to me that weekly is often enough. Feel free
to correct me if you have a different opinion.

My hope is that this patch series could be used as an example for further
extracting tasks out of the gc task and making them be full maintenance
tasks. Doing more of these extractions could be a good project for a new
contributor.

One thing that is not implemented in this series is a notion of the behavior
for the pack-refs task during git maintenance run --auto. This could be
added in the future, but I wanted to focus on getting this behavior into the
incremental maintenance schedule.

Thanks, -Stolee

Derrick Stolee (2):
  maintenance: add pack-refs task
  maintenance: incremental strategy runs pack-refs weekly

 Documentation/config/maintenance.txt |  5 +++--
 Documentation/git-maintenance.txt    |  6 ++++++
 builtin/gc.c                         | 23 +++++++++++++++++++----
 t/t7900-maintenance.sh               | 24 ++++++++++++++++++++++++
 4 files changed, 52 insertions(+), 6 deletions(-)


base-commit: fb7fa4a1fd273f22efcafdd13c7f897814fd1eb9
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-871%2Fderrickstolee%2Fmaintenance%2Fpack-refs-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-871/derrickstolee/maintenance/pack-refs-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/871
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] maintenance: add pack-refs task
  2021-02-08 14:52 [PATCH 0/2] Maintenance: add pack-refs task Derrick Stolee via GitGitGadget
@ 2021-02-08 14:52 ` Derrick Stolee via GitGitGadget
  2021-02-08 22:53   ` Taylor Blau
  2021-02-08 23:06   ` Eric Sunshine
  2021-02-08 14:52 ` [PATCH 2/2] maintenance: incremental strategy runs pack-refs weekly Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 12+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-08 14:52 UTC (permalink / raw)
  To: git
  Cc: gitster, sluongng, martin.agren, sunshine, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It is valuable to collect loose refs into a more compressed form. This
is typically the packed-refs file, although this could be the reftable
in the future. Having packed refs can be extremely valuable in repos
with many tags or remote branches that are not modified by the local
user, but still are necessary for other queries.

For instance, with many exploded refs, commands such as

	git describe --tags --exact-match HEAD

can be very slow (multiple seconds). This command in particular is used
by terminal prompts to show when a detatched HEAD is pointing to an
existing tag, so having it be slow causes significant delays for users.

Add a new 'pack-refs' maintenance task. This is already a sub-step of
the 'gc' task, but users could run this at other intervals if they are
interested. Also, if users opt-in to the default background maintenance
schedule, then the 'gc' task is disabled.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  6 ++++++
 builtin/gc.c                      | 21 +++++++++++++++++----
 t/t7900-maintenance.sh            | 10 ++++++++++
 3 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 3b432171d608..8cc0225b3304 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -145,6 +145,12 @@ incremental-repack::
 	which is a special case that attempts to repack all pack-files
 	into a single pack-file.
 
+pack-refs::
+	The `pack-refs` task collects the loose reference files and
+	collects them into a single file. This speeds up operations that
+	need to iterate across many refereences. See linkgit:git-pack-refs[1]
+	for more information.
+
 OPTIONS
 -------
 --auto::
diff --git a/builtin/gc.c b/builtin/gc.c
index 4c40594d660e..8f13c3bb8607 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -54,7 +54,6 @@ static const char *prune_worktrees_expire = "3.months.ago";
 static unsigned long big_pack_threshold;
 static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;
 
-static struct strvec pack_refs_cmd = STRVEC_INIT;
 static struct strvec reflog = STRVEC_INIT;
 static struct strvec repack = STRVEC_INIT;
 static struct strvec prune = STRVEC_INIT;
@@ -163,6 +162,15 @@ static void gc_config(void)
 	git_config(git_default_config, NULL);
 }
 
+struct maintenance_run_opts;
+static int maintenance_task_pack_refs(struct maintenance_run_opts *opts)
+{
+	struct strvec pack_refs_cmd = STRVEC_INIT;
+	strvec_pushl(&pack_refs_cmd, "pack-refs", "--all", "--prune", NULL);
+
+	return run_command_v_opt(pack_refs_cmd.v, RUN_GIT_CMD);
+}
+
 static int too_many_loose_objects(void)
 {
 	/*
@@ -518,8 +526,8 @@ static void gc_before_repack(void)
 	if (done++)
 		return;
 
-	if (pack_refs && run_command_v_opt(pack_refs_cmd.v, RUN_GIT_CMD))
-		die(FAILED_RUN, pack_refs_cmd.v[0]);
+	if (pack_refs && maintenance_task_pack_refs(NULL))
+		die(FAILED_RUN, "pack-refs");
 
 	if (prune_reflogs && run_command_v_opt(reflog.v, RUN_GIT_CMD))
 		die(FAILED_RUN, reflog.v[0]);
@@ -556,7 +564,6 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_gc_usage, builtin_gc_options);
 
-	strvec_pushl(&pack_refs_cmd, "pack-refs", "--all", "--prune", NULL);
 	strvec_pushl(&reflog, "reflog", "expire", "--all", NULL);
 	strvec_pushl(&repack, "repack", "-d", "-l", NULL);
 	strvec_pushl(&prune, "prune", "--expire", NULL);
@@ -1224,6 +1231,7 @@ enum maintenance_task_label {
 	TASK_INCREMENTAL_REPACK,
 	TASK_GC,
 	TASK_COMMIT_GRAPH,
+	TASK_PACK_REFS,
 
 	/* Leave as final value */
 	TASK__COUNT
@@ -1255,6 +1263,11 @@ static struct maintenance_task tasks[] = {
 		maintenance_task_commit_graph,
 		should_write_commit_graph,
 	},
+	[TASK_PACK_REFS] = {
+		"pack-refs",
+		maintenance_task_pack_refs,
+		NULL,
+	},
 };
 
 static int compare_tasks_by_selection(const void *a_, const void *b_)
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 78ccf4b33f87..ef4527823d29 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -343,6 +343,16 @@ test_expect_success 'maintenance.incremental-repack.auto' '
 	test_subcommand git multi-pack-index write --no-progress <trace-B
 '
 
+test_expect_success 'pack-refs task' '
+	for n in $(test_seq 1 5)
+	do
+		git branch -f to-pack/$n HEAD || return 1
+	done &&
+	git maintenance run --task=pack-refs &&
+	ls .git/refs/heads/ >after &&
+	test_must_be_empty after
+'
+
 test_expect_success '--auto and --schedule incompatible' '
 	test_must_fail git maintenance run --auto --schedule=daily 2>err &&
 	test_i18ngrep "at most one" err
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] maintenance: incremental strategy runs pack-refs weekly
  2021-02-08 14:52 [PATCH 0/2] Maintenance: add pack-refs task Derrick Stolee via GitGitGadget
  2021-02-08 14:52 ` [PATCH 1/2] maintenance: " Derrick Stolee via GitGitGadget
@ 2021-02-08 14:52 ` Derrick Stolee via GitGitGadget
  2021-02-08 22:46 ` [PATCH 0/2] Maintenance: add pack-refs task Taylor Blau
  2021-02-09 13:42 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  3 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-08 14:52 UTC (permalink / raw)
  To: git
  Cc: gitster, sluongng, martin.agren, sunshine, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When the 'maintenance.strategy' config option is set to 'incremental',
a default maintenance schedule is enabled. Add the 'pack-refs' task to
that strategy at the weekly cadence.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt |  5 +++--
 builtin/gc.c                         |  2 ++
 t/t7900-maintenance.sh               | 14 ++++++++++++++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index a5ead09e4bc2..18f056213145 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -15,8 +15,9 @@ maintenance.strategy::
 * `none`: This default setting implies no task are run at any schedule.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
-  task, but runs the `prefetch` and `commit-graph` tasks hourly and the
-  `loose-objects` and `incremental-repack` tasks daily.
+  task, but runs the `prefetch` and `commit-graph` tasks hourly, the
+  `loose-objects` and `incremental-repack` tasks daily, and the `pack-refs`
+  task weekly.
 
 maintenance.<task>.enabled::
 	This boolean config option controls whether the maintenance task
diff --git a/builtin/gc.c b/builtin/gc.c
index 8f13c3bb8607..8d365a59d027 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1352,6 +1352,8 @@ static void initialize_maintenance_strategy(void)
 		tasks[TASK_INCREMENTAL_REPACK].schedule = SCHEDULE_DAILY;
 		tasks[TASK_LOOSE_OBJECTS].enabled = 1;
 		tasks[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY;
+		tasks[TASK_PACK_REFS].enabled = 1;
+		tasks[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY;
 	}
 }
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index ef4527823d29..fa92e01fb2c5 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -406,18 +406,32 @@ test_expect_success 'maintenance.strategy inheritance' '
 		git maintenance run --schedule=hourly --quiet &&
 	GIT_TRACE2_EVENT="$(pwd)/incremental-daily.txt" \
 		git maintenance run --schedule=daily --quiet &&
+	GIT_TRACE2_EVENT="$(pwd)/incremental-weekly.txt" \
+		git maintenance run --schedule=weekly --quiet &&
 
 	test_subcommand git commit-graph write --split --reachable \
 		--no-progress <incremental-hourly.txt &&
 	test_subcommand ! git prune-packed --quiet <incremental-hourly.txt &&
 	test_subcommand ! git multi-pack-index write --no-progress \
 		<incremental-hourly.txt &&
+	test_subcommand ! git pack-refs --all --prune \
+		<incremental-hourly.txt &&
 
 	test_subcommand git commit-graph write --split --reachable \
 		--no-progress <incremental-daily.txt &&
 	test_subcommand git prune-packed --quiet <incremental-daily.txt &&
 	test_subcommand git multi-pack-index write --no-progress \
 		<incremental-daily.txt &&
+	test_subcommand ! git pack-refs --all --prune \
+		<incremental-daily.txt &&
+
+	test_subcommand git commit-graph write --split --reachable \
+		--no-progress <incremental-weekly.txt &&
+	test_subcommand git prune-packed --quiet <incremental-weekly.txt &&
+	test_subcommand git multi-pack-index write --no-progress \
+		<incremental-weekly.txt &&
+	test_subcommand git pack-refs --all --prune \
+		<incremental-weekly.txt &&
 
 	# Modify defaults
 	git config maintenance.commit-graph.schedule daily &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Maintenance: add pack-refs task
  2021-02-08 14:52 [PATCH 0/2] Maintenance: add pack-refs task Derrick Stolee via GitGitGadget
  2021-02-08 14:52 ` [PATCH 1/2] maintenance: " Derrick Stolee via GitGitGadget
  2021-02-08 14:52 ` [PATCH 2/2] maintenance: incremental strategy runs pack-refs weekly Derrick Stolee via GitGitGadget
@ 2021-02-08 22:46 ` Taylor Blau
  2021-02-09 13:42 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  3 siblings, 0 replies; 12+ messages in thread
From: Taylor Blau @ 2021-02-08 22:46 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, sluongng, martin.agren, sunshine, Derrick Stolee

On Mon, Feb 08, 2021 at 02:52:21PM +0000, Derrick Stolee via GitGitGadget wrote:
> I chose to add it to the incremental maintenance strategy at a weekly
> cadence. I'm not sure there is significant value to the difference between
> weekly and daily. It just seems to me that weekly is often enough. Feel free
> to correct me if you have a different opinion.

I don't have any strong opinion about this. At GitHub we run pack-refs
as part of a maintenance-like job, which varies in frequency between
different repositories based on a number of factors. In our busiest
repository networks, this is almost certainly at least a few times a
day, if not "as often as possible."

I'd suspect that for any laptop-sized repository, anywhere between daily
and weekly is just fine.

I suppose weekly is a good starting point, since we can always change
the default to run more often if enough users complain.

> My hope is that this patch series could be used as an example for further
> extracting tasks out of the gc task and making them be full maintenance
> tasks. Doing more of these extractions could be a good project for a new
> contributor.

That's great :-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] maintenance: add pack-refs task
  2021-02-08 14:52 ` [PATCH 1/2] maintenance: " Derrick Stolee via GitGitGadget
@ 2021-02-08 22:53   ` Taylor Blau
  2021-02-09 12:42     ` Derrick Stolee
  2021-02-08 23:06   ` Eric Sunshine
  1 sibling, 1 reply; 12+ messages in thread
From: Taylor Blau @ 2021-02-08 22:53 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, sluongng, martin.agren, sunshine, Derrick Stolee,
	Derrick Stolee

On Mon, Feb 08, 2021 at 02:52:22PM +0000, Derrick Stolee via GitGitGadget wrote:
> +pack-refs::
> +	The `pack-refs` task collects the loose reference files and
> +	collects them into a single file. This speeds up operations that
> +	need to iterate across many refereences. See linkgit:git-pack-refs[1]
> +	for more information.
> +

Do you think it's worth documenting here or in git-gc(1) that running
this has the effect of disabling the same step during gc?

>  OPTIONS
>  -------
>  --auto::
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 4c40594d660e..8f13c3bb8607 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -54,7 +54,6 @@ static const char *prune_worktrees_expire = "3.months.ago";
>  static unsigned long big_pack_threshold;
>  static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;
>
> -static struct strvec pack_refs_cmd = STRVEC_INIT;
>  static struct strvec reflog = STRVEC_INIT;
>  static struct strvec repack = STRVEC_INIT;
>  static struct strvec prune = STRVEC_INIT;
> @@ -163,6 +162,15 @@ static void gc_config(void)
>  	git_config(git_default_config, NULL);
>  }
>
> +struct maintenance_run_opts;
> +static int maintenance_task_pack_refs(struct maintenance_run_opts *opts)

It may be worth calling this "unused", since you explicitly pass NULL
below.

> +{
> +	struct strvec pack_refs_cmd = STRVEC_INIT;
> +	strvec_pushl(&pack_refs_cmd, "pack-refs", "--all", "--prune", NULL);
> +
> +	return run_command_v_opt(pack_refs_cmd.v, RUN_GIT_CMD);
> +}
> +
>  static int too_many_loose_objects(void)
>  {
>  	/*
> @@ -518,8 +526,8 @@ static void gc_before_repack(void)
>  	if (done++)
>  		return;
>
> -	if (pack_refs && run_command_v_opt(pack_refs_cmd.v, RUN_GIT_CMD))
> -		die(FAILED_RUN, pack_refs_cmd.v[0]);
> +	if (pack_refs && maintenance_task_pack_refs(NULL))
> +		die(FAILED_RUN, "pack-refs");

Hmm. Am I missing where opting into the maintenance step disables this
in gc? You suggest that it does in the commit message, but I can't seem
to see it here.

> +test_expect_success 'pack-refs task' '
> +	for n in $(test_seq 1 5)
> +	do
> +		git branch -f to-pack/$n HEAD || return 1
> +	done &&
> +	git maintenance run --task=pack-refs &&
> +	ls .git/refs/heads/ >after &&
> +	test_must_be_empty after

Worth testing that $GIT_DIR/packed-refs exists, too?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] maintenance: add pack-refs task
  2021-02-08 14:52 ` [PATCH 1/2] maintenance: " Derrick Stolee via GitGitGadget
  2021-02-08 22:53   ` Taylor Blau
@ 2021-02-08 23:06   ` Eric Sunshine
  2021-02-09 12:42     ` Derrick Stolee
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Sunshine @ 2021-02-08 23:06 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git List, Junio C Hamano, Son Luong Ngoc, Martin Ågren,
	Derrick Stolee, Derrick Stolee

On Mon, Feb 8, 2021 at 9:52 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> +pack-refs::
> +       The `pack-refs` task collects the loose reference files and
> +       collects them into a single file. This speeds up operations that
> +       need to iterate across many refereences. See linkgit:git-pack-refs[1]
> +       for more information.

s/refereences/references/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] maintenance: add pack-refs task
  2021-02-08 22:53   ` Taylor Blau
@ 2021-02-09 12:42     ` Derrick Stolee
  0 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee @ 2021-02-09 12:42 UTC (permalink / raw)
  To: Taylor Blau, Derrick Stolee via GitGitGadget
  Cc: git, gitster, sluongng, martin.agren, sunshine, Derrick Stolee,
	Derrick Stolee

On 2/8/2021 5:53 PM, Taylor Blau wrote:
> On Mon, Feb 08, 2021 at 02:52:22PM +0000, Derrick Stolee via GitGitGadget wrote:
>> +pack-refs::
>> +	The `pack-refs` task collects the loose reference files and
>> +	collects them into a single file. This speeds up operations that
>> +	need to iterate across many refereences. See linkgit:git-pack-refs[1]
>> +	for more information.
>> +
> 
> Do you think it's worth documenting here or in git-gc(1) that running
> this has the effect of disabling the same step during gc?

It doesn't disable the step during gc.

Perhaps I should use a better term than "extract". The 'gc' task shouldn't
change as we make some of its steps also available as independent tasks.

Instead, users can select a subset of its steps by enabling them directly.
Other such tasks could include:

 * prune-reflog
 * full-repack (as opposed to the existing incremental-repack task)

>> +struct maintenance_run_opts;
>> +static int maintenance_task_pack_refs(struct maintenance_run_opts *opts)
> 
> It may be worth calling this "unused", since you explicitly pass NULL
> below.

Good idea.

>> +	if (pack_refs && maintenance_task_pack_refs(NULL))
>> +		die(FAILED_RUN, "pack-refs");
> 
> Hmm. Am I missing where opting into the maintenance step disables this
> in gc? You suggest that it does in the commit message, but I can't seem
> to see it here.

My commit message suggests wrong.
 
>> +test_expect_success 'pack-refs task' '
>> +	for n in $(test_seq 1 5)
>> +	do
>> +		git branch -f to-pack/$n HEAD || return 1
>> +	done &&
>> +	git maintenance run --task=pack-refs &&
>> +	ls .git/refs/heads/ >after &&
>> +	test_must_be_empty after
> 
> Worth testing that $GIT_DIR/packed-refs exists, too?

That would start breaking if we change the backing store, right? I
want to be sure this doesn't create test breakages with reftable.

I _could_ add a 'test_subcommand' check here, though.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] maintenance: add pack-refs task
  2021-02-08 23:06   ` Eric Sunshine
@ 2021-02-09 12:42     ` Derrick Stolee
  0 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee @ 2021-02-09 12:42 UTC (permalink / raw)
  To: Eric Sunshine, Derrick Stolee via GitGitGadget
  Cc: Git List, Junio C Hamano, Son Luong Ngoc, Martin Ågren,
	Derrick Stolee, Derrick Stolee

On 2/8/2021 6:06 PM, Eric Sunshine wrote:
> On Mon, Feb 8, 2021 at 9:52 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> +pack-refs::
>> +       The `pack-refs` task collects the loose reference files and
>> +       collects them into a single file. This speeds up operations that
>> +       need to iterate across many refereences. See linkgit:git-pack-refs[1]
>> +       for more information.
> 
> s/refereences/references/
 
Thanks!
-Stolee

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 0/2] Maintenance: add pack-refs task
  2021-02-08 14:52 [PATCH 0/2] Maintenance: add pack-refs task Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-02-08 22:46 ` [PATCH 0/2] Maintenance: add pack-refs task Taylor Blau
@ 2021-02-09 13:42 ` Derrick Stolee via GitGitGadget
  2021-02-09 13:42   ` [PATCH v2 1/2] maintenance: " Derrick Stolee via GitGitGadget
                     ` (2 more replies)
  3 siblings, 3 replies; 12+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-09 13:42 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Eric Sunshine, Derrick Stolee, Derrick Stolee

This patch series adds a new pack-refs task to the maintenance builtin. This
operation already happens within git gc (and hence the gc task) but it is
easy to extract. Packing refs does not delete any data, only collects loose
objects into a combined file. This makes things faster in subtle ways,
especially when a command needs to iterate through refs (especially tags).

Credit for inspiring this goes to Suolong, who asked for this to be added to
Scalar [1]. I've been waiting instead to add it directly to Git and its
background maintenance. Now is the time!

[1] https://github.com/microsoft/scalar/issues/382

I chose to add it to the incremental maintenance strategy at a weekly
cadence. I'm not sure there is significant value to the difference between
weekly and daily. It just seems to me that weekly is often enough. Feel free
to correct me if you have a different opinion.

My hope is that this patch series could be used as an example for further
extracting tasks out of the gc task and making them be full maintenance
tasks. Doing more of these extractions could be a good project for a new
contributor.

One thing that is not implemented in this series is a notion of the behavior
for the pack-refs task during git maintenance run --auto. This could be
added in the future, but I wanted to focus on getting this behavior into the
incremental maintenance schedule.


Updates in V2
=============

 * Fixed doc typo. Thanks, Eric!
 * Updated commit messages to make it clear that the 'pack-refs' step will
   still happen within the 'gc' task.
 * Updated the test to check that we run the correct subcommand.
 * maintenance_task_pack_refs() uses MAYBE_UNUSED on its parameter.

Thanks, -Stolee

Cc: gitster@pobox.com Cc: sluongng@gmail.com Cc: martin.agren@gmail.com Cc:
sunshine@sunshineco.com

Derrick Stolee (2):
  maintenance: add pack-refs task
  maintenance: incremental strategy runs pack-refs weekly

 Documentation/config/maintenance.txt |  5 +++--
 Documentation/git-maintenance.txt    |  6 ++++++
 builtin/gc.c                         | 23 +++++++++++++++++++----
 t/t7900-maintenance.sh               | 26 ++++++++++++++++++++++++++
 4 files changed, 54 insertions(+), 6 deletions(-)


base-commit: fb7fa4a1fd273f22efcafdd13c7f897814fd1eb9
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-871%2Fderrickstolee%2Fmaintenance%2Fpack-refs-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-871/derrickstolee/maintenance/pack-refs-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/871

Range-diff vs v1:

 1:  33b7a74af4eb ! 1:  bedaeb548b06 maintenance: add pack-refs task
     @@ Commit message
          by terminal prompts to show when a detatched HEAD is pointing to an
          existing tag, so having it be slow causes significant delays for users.
      
     -    Add a new 'pack-refs' maintenance task. This is already a sub-step of
     -    the 'gc' task, but users could run this at other intervals if they are
     -    interested. Also, if users opt-in to the default background maintenance
     -    schedule, then the 'gc' task is disabled.
     +    Add a new 'pack-refs' maintenance task. It runs 'git pack-refs --all
     +    --prune' to move loose refs into a packed form. For now, that is the
     +    packed-refs file, but could adjust to other file formats in the future.
     +
     +    This is the first of several sub-tasks of the 'gc' task that could be
     +    extracted to their own tasks. In this process, we should not change the
     +    behavior of the 'gc' task since that remains the default way to keep
     +    repositories maintained. Creating a new task for one of these sub-tasks
     +    only provides more customization options for those choosing to not use
     +    the 'gc' task. It is certainly possible to have both the 'gc' and
     +    'pack-refs' tasks enabled and run regularly. While they may repeat
     +    effort, they do not conflict in a destructive way.
     +
     +    The 'auto_condition' function pointer is left NULL for now. We could
     +    extend this in the future to have a condition check if pack-refs should
     +    be run during 'git maintenance run --auto'.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ Documentation/git-maintenance.txt: incremental-repack::
      +pack-refs::
      +	The `pack-refs` task collects the loose reference files and
      +	collects them into a single file. This speeds up operations that
     -+	need to iterate across many refereences. See linkgit:git-pack-refs[1]
     ++	need to iterate across many references. See linkgit:git-pack-refs[1]
      +	for more information.
      +
       OPTIONS
     @@ builtin/gc.c: static void gc_config(void)
       }
       
      +struct maintenance_run_opts;
     -+static int maintenance_task_pack_refs(struct maintenance_run_opts *opts)
     ++static int maintenance_task_pack_refs(MAYBE_UNUSED struct maintenance_run_opts *opts)
      +{
      +	struct strvec pack_refs_cmd = STRVEC_INIT;
      +	strvec_pushl(&pack_refs_cmd, "pack-refs", "--all", "--prune", NULL);
     @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
      +	do
      +		git branch -f to-pack/$n HEAD || return 1
      +	done &&
     -+	git maintenance run --task=pack-refs &&
     ++	GIT_TRACE2_EVENT="$(pwd)/pack-refs.txt" \
     ++		git maintenance run --task=pack-refs &&
      +	ls .git/refs/heads/ >after &&
     -+	test_must_be_empty after
     ++	test_must_be_empty after &&
     ++	test_subcommand git pack-refs --all --prune <pack-refs.txt
      +'
      +
       test_expect_success '--auto and --schedule incompatible' '
 2:  8012d2dc1420 = 2:  c38fc9a4170e maintenance: incremental strategy runs pack-refs weekly

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] maintenance: add pack-refs task
  2021-02-09 13:42 ` [PATCH v2 " Derrick Stolee via GitGitGadget
@ 2021-02-09 13:42   ` Derrick Stolee via GitGitGadget
  2021-02-09 13:42   ` [PATCH v2 2/2] maintenance: incremental strategy runs pack-refs weekly Derrick Stolee via GitGitGadget
  2021-02-10  2:41   ` [PATCH v2 0/2] Maintenance: add pack-refs task Taylor Blau
  2 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-09 13:42 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Eric Sunshine, Derrick Stolee, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It is valuable to collect loose refs into a more compressed form. This
is typically the packed-refs file, although this could be the reftable
in the future. Having packed refs can be extremely valuable in repos
with many tags or remote branches that are not modified by the local
user, but still are necessary for other queries.

For instance, with many exploded refs, commands such as

	git describe --tags --exact-match HEAD

can be very slow (multiple seconds). This command in particular is used
by terminal prompts to show when a detatched HEAD is pointing to an
existing tag, so having it be slow causes significant delays for users.

Add a new 'pack-refs' maintenance task. It runs 'git pack-refs --all
--prune' to move loose refs into a packed form. For now, that is the
packed-refs file, but could adjust to other file formats in the future.

This is the first of several sub-tasks of the 'gc' task that could be
extracted to their own tasks. In this process, we should not change the
behavior of the 'gc' task since that remains the default way to keep
repositories maintained. Creating a new task for one of these sub-tasks
only provides more customization options for those choosing to not use
the 'gc' task. It is certainly possible to have both the 'gc' and
'pack-refs' tasks enabled and run regularly. While they may repeat
effort, they do not conflict in a destructive way.

The 'auto_condition' function pointer is left NULL for now. We could
extend this in the future to have a condition check if pack-refs should
be run during 'git maintenance run --auto'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  6 ++++++
 builtin/gc.c                      | 21 +++++++++++++++++----
 t/t7900-maintenance.sh            | 12 ++++++++++++
 3 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 3b432171d608..80ddd33ceba0 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -145,6 +145,12 @@ incremental-repack::
 	which is a special case that attempts to repack all pack-files
 	into a single pack-file.
 
+pack-refs::
+	The `pack-refs` task collects the loose reference files and
+	collects them into a single file. This speeds up operations that
+	need to iterate across many references. See linkgit:git-pack-refs[1]
+	for more information.
+
 OPTIONS
 -------
 --auto::
diff --git a/builtin/gc.c b/builtin/gc.c
index 4c40594d660e..41bec4f177b3 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -54,7 +54,6 @@ static const char *prune_worktrees_expire = "3.months.ago";
 static unsigned long big_pack_threshold;
 static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;
 
-static struct strvec pack_refs_cmd = STRVEC_INIT;
 static struct strvec reflog = STRVEC_INIT;
 static struct strvec repack = STRVEC_INIT;
 static struct strvec prune = STRVEC_INIT;
@@ -163,6 +162,15 @@ static void gc_config(void)
 	git_config(git_default_config, NULL);
 }
 
+struct maintenance_run_opts;
+static int maintenance_task_pack_refs(MAYBE_UNUSED struct maintenance_run_opts *opts)
+{
+	struct strvec pack_refs_cmd = STRVEC_INIT;
+	strvec_pushl(&pack_refs_cmd, "pack-refs", "--all", "--prune", NULL);
+
+	return run_command_v_opt(pack_refs_cmd.v, RUN_GIT_CMD);
+}
+
 static int too_many_loose_objects(void)
 {
 	/*
@@ -518,8 +526,8 @@ static void gc_before_repack(void)
 	if (done++)
 		return;
 
-	if (pack_refs && run_command_v_opt(pack_refs_cmd.v, RUN_GIT_CMD))
-		die(FAILED_RUN, pack_refs_cmd.v[0]);
+	if (pack_refs && maintenance_task_pack_refs(NULL))
+		die(FAILED_RUN, "pack-refs");
 
 	if (prune_reflogs && run_command_v_opt(reflog.v, RUN_GIT_CMD))
 		die(FAILED_RUN, reflog.v[0]);
@@ -556,7 +564,6 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_gc_usage, builtin_gc_options);
 
-	strvec_pushl(&pack_refs_cmd, "pack-refs", "--all", "--prune", NULL);
 	strvec_pushl(&reflog, "reflog", "expire", "--all", NULL);
 	strvec_pushl(&repack, "repack", "-d", "-l", NULL);
 	strvec_pushl(&prune, "prune", "--expire", NULL);
@@ -1224,6 +1231,7 @@ enum maintenance_task_label {
 	TASK_INCREMENTAL_REPACK,
 	TASK_GC,
 	TASK_COMMIT_GRAPH,
+	TASK_PACK_REFS,
 
 	/* Leave as final value */
 	TASK__COUNT
@@ -1255,6 +1263,11 @@ static struct maintenance_task tasks[] = {
 		maintenance_task_commit_graph,
 		should_write_commit_graph,
 	},
+	[TASK_PACK_REFS] = {
+		"pack-refs",
+		maintenance_task_pack_refs,
+		NULL,
+	},
 };
 
 static int compare_tasks_by_selection(const void *a_, const void *b_)
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 78ccf4b33f87..4a8a78769bd6 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -343,6 +343,18 @@ test_expect_success 'maintenance.incremental-repack.auto' '
 	test_subcommand git multi-pack-index write --no-progress <trace-B
 '
 
+test_expect_success 'pack-refs task' '
+	for n in $(test_seq 1 5)
+	do
+		git branch -f to-pack/$n HEAD || return 1
+	done &&
+	GIT_TRACE2_EVENT="$(pwd)/pack-refs.txt" \
+		git maintenance run --task=pack-refs &&
+	ls .git/refs/heads/ >after &&
+	test_must_be_empty after &&
+	test_subcommand git pack-refs --all --prune <pack-refs.txt
+'
+
 test_expect_success '--auto and --schedule incompatible' '
 	test_must_fail git maintenance run --auto --schedule=daily 2>err &&
 	test_i18ngrep "at most one" err
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] maintenance: incremental strategy runs pack-refs weekly
  2021-02-09 13:42 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  2021-02-09 13:42   ` [PATCH v2 1/2] maintenance: " Derrick Stolee via GitGitGadget
@ 2021-02-09 13:42   ` Derrick Stolee via GitGitGadget
  2021-02-10  2:41   ` [PATCH v2 0/2] Maintenance: add pack-refs task Taylor Blau
  2 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-09 13:42 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Eric Sunshine, Derrick Stolee, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When the 'maintenance.strategy' config option is set to 'incremental',
a default maintenance schedule is enabled. Add the 'pack-refs' task to
that strategy at the weekly cadence.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt |  5 +++--
 builtin/gc.c                         |  2 ++
 t/t7900-maintenance.sh               | 14 ++++++++++++++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index a5ead09e4bc2..18f056213145 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -15,8 +15,9 @@ maintenance.strategy::
 * `none`: This default setting implies no task are run at any schedule.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
-  task, but runs the `prefetch` and `commit-graph` tasks hourly and the
-  `loose-objects` and `incremental-repack` tasks daily.
+  task, but runs the `prefetch` and `commit-graph` tasks hourly, the
+  `loose-objects` and `incremental-repack` tasks daily, and the `pack-refs`
+  task weekly.
 
 maintenance.<task>.enabled::
 	This boolean config option controls whether the maintenance task
diff --git a/builtin/gc.c b/builtin/gc.c
index 41bec4f177b3..6db9cb39e679 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1352,6 +1352,8 @@ static void initialize_maintenance_strategy(void)
 		tasks[TASK_INCREMENTAL_REPACK].schedule = SCHEDULE_DAILY;
 		tasks[TASK_LOOSE_OBJECTS].enabled = 1;
 		tasks[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY;
+		tasks[TASK_PACK_REFS].enabled = 1;
+		tasks[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY;
 	}
 }
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 4a8a78769bd6..286b18db3cc2 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -408,18 +408,32 @@ test_expect_success 'maintenance.strategy inheritance' '
 		git maintenance run --schedule=hourly --quiet &&
 	GIT_TRACE2_EVENT="$(pwd)/incremental-daily.txt" \
 		git maintenance run --schedule=daily --quiet &&
+	GIT_TRACE2_EVENT="$(pwd)/incremental-weekly.txt" \
+		git maintenance run --schedule=weekly --quiet &&
 
 	test_subcommand git commit-graph write --split --reachable \
 		--no-progress <incremental-hourly.txt &&
 	test_subcommand ! git prune-packed --quiet <incremental-hourly.txt &&
 	test_subcommand ! git multi-pack-index write --no-progress \
 		<incremental-hourly.txt &&
+	test_subcommand ! git pack-refs --all --prune \
+		<incremental-hourly.txt &&
 
 	test_subcommand git commit-graph write --split --reachable \
 		--no-progress <incremental-daily.txt &&
 	test_subcommand git prune-packed --quiet <incremental-daily.txt &&
 	test_subcommand git multi-pack-index write --no-progress \
 		<incremental-daily.txt &&
+	test_subcommand ! git pack-refs --all --prune \
+		<incremental-daily.txt &&
+
+	test_subcommand git commit-graph write --split --reachable \
+		--no-progress <incremental-weekly.txt &&
+	test_subcommand git prune-packed --quiet <incremental-weekly.txt &&
+	test_subcommand git multi-pack-index write --no-progress \
+		<incremental-weekly.txt &&
+	test_subcommand git pack-refs --all --prune \
+		<incremental-weekly.txt &&
 
 	# Modify defaults
 	git config maintenance.commit-graph.schedule daily &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] Maintenance: add pack-refs task
  2021-02-09 13:42 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  2021-02-09 13:42   ` [PATCH v2 1/2] maintenance: " Derrick Stolee via GitGitGadget
  2021-02-09 13:42   ` [PATCH v2 2/2] maintenance: incremental strategy runs pack-refs weekly Derrick Stolee via GitGitGadget
@ 2021-02-10  2:41   ` Taylor Blau
  2 siblings, 0 replies; 12+ messages in thread
From: Taylor Blau @ 2021-02-10  2:41 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, Taylor Blau, Eric Sunshine, Derrick Stolee, Derrick Stolee

On Tue, Feb 09, 2021 at 01:42:27PM +0000, Derrick Stolee via GitGitGadget wrote:
> Updates in V2
> =============
>
>  * Fixed doc typo. Thanks, Eric!
>  * Updated commit messages to make it clear that the 'pack-refs' step will
>    still happen within the 'gc' task.
>  * Updated the test to check that we run the correct subcommand.
>  * maintenance_task_pack_refs() uses MAYBE_UNUSED on its parameter.

Thanks, the range-diff looked sane to me.

  Reviewed-by: Taylor Blau <me@ttaylorr.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-02-10  2:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-08 14:52 [PATCH 0/2] Maintenance: add pack-refs task Derrick Stolee via GitGitGadget
2021-02-08 14:52 ` [PATCH 1/2] maintenance: " Derrick Stolee via GitGitGadget
2021-02-08 22:53   ` Taylor Blau
2021-02-09 12:42     ` Derrick Stolee
2021-02-08 23:06   ` Eric Sunshine
2021-02-09 12:42     ` Derrick Stolee
2021-02-08 14:52 ` [PATCH 2/2] maintenance: incremental strategy runs pack-refs weekly Derrick Stolee via GitGitGadget
2021-02-08 22:46 ` [PATCH 0/2] Maintenance: add pack-refs task Taylor Blau
2021-02-09 13:42 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2021-02-09 13:42   ` [PATCH v2 1/2] maintenance: " Derrick Stolee via GitGitGadget
2021-02-09 13:42   ` [PATCH v2 2/2] maintenance: incremental strategy runs pack-refs weekly Derrick Stolee via GitGitGadget
2021-02-10  2:41   ` [PATCH v2 0/2] Maintenance: add pack-refs task Taylor Blau

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).