git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/7] [RFC] Maintenance III: background maintenance
@ 2020-08-19 17:16 Derrick Stolee via GitGitGadget
  2020-08-19 17:16 ` [PATCH 1/7] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
                   ` (8 more replies)
  0 siblings, 9 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-19 17:16 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee

This is based on ds/maintenance-part-2, but with some local updates to
review feedback. It won't apply cleanly right now. This RFC is for early
feedback and not intended to make a new tracking branch until v2.

This RFC is intended to show how I hope to integrate true background
maintenance into Git. As opposed to my original RFC [1], this entirely
integrates with cron (through crontab [-e|-l]) to launch maintenance
commands in the background.

[1] 
https://lore.kernel.org/git/pull.597.git.1585946894.gitgitgadget@gmail.com/

Some preliminary work is done to allow a new --scheduled option that
triggers enabled tasks only if they have not been run in some amount of
time. The timestamp of the previous run is stored in the 
maintenance.<task>.lastRun config value while the interval is stored in the 
maintenance.<task>.schedule config value.

A new for-each-repo builtin runs Git commands on every repo in a given list.
Currently, the list is stored as a config setting, allowing a new 
maintenance.repos config list to store the repositories registered for
background maintenance. Others may want to add a --file=<file> option for
their own workflows, but I focused on making this as simple as possible for
now.

The updates to the git maintenance builtin include new register/unregister 
subcommands and start/stop subcommands. The register subcommand initializes
the config while the start subcommand does everything register does plus 
update the cron table. The unregister and stop commands reverse this
process.

The very last patch is entirely optional. It sets a recommended schedule
based on my own experience with very large repositories. I'm open to other
suggestions, but these are ones that I think work well and don't cause a
"rewrite the world" scenario like running nightly 'gc' would do.

I've been testing this scenario on my macOS laptop for a while and my Linux
machine. I have modified my cron task to provide logging via trace2 so I can
see what's happening. A future direction here would be to add some
maintenance logs to the repository so we can track what is happening and
diagnose whether the maintenance strategy is working on real repos.

It could be helpful for contributors to suggest ways to configure certain
jobs to run "nightly" or "overnight on a weekend" instead of just "whenever
the last run was long enough ago." One way to do this would be to set the 
lastRun config to be at the time of day that the job should run. Another
option would be to make the cron table more complicated with multiple rows,
but I would prefer to avoid that option if there is a simpler mechanism.

Note: git maintenance (start|stop) only works on machines with cron by
design. The proper thing to do on Windows will come later. Perhaps this
command should be marked as unavailable on Windows somehow, or at least a
better error than "cron may not be available on your system". I did find
that that message is helpful sometimes: macOS worker agents for CI builds
typically do not have cron available.

Thanks, -Stolee

Derrick Stolee (7):
  maintenance: optionally skip --auto process
  maintenance: store the "last run" time in config
  maintenance: add --scheduled option and config
  for-each-repo: run subcommands on configured repos
  maintenance: add [un]register subcommands
  maintenance: add start/stop subcommands
  maintenance: recommended schedule in register/start

 .gitignore                           |   1 +
 Documentation/config/maintenance.txt |  19 ++
 Documentation/git-for-each-repo.txt  |  45 +++++
 Documentation/git-maintenance.txt    |  44 ++++-
 Makefile                             |   2 +
 builtin.h                            |   1 +
 builtin/for-each-repo.c              |  58 ++++++
 builtin/gc.c                         | 282 ++++++++++++++++++++++++++-
 git-gvfs-helper                      | Bin 0 -> 11171736 bytes
 git.c                                |   1 +
 run-command.c                        |   8 +
 t/helper/test-crontab.c              |  35 ++++
 t/helper/test-gvfs-protocol          | Bin 0 -> 10946928 bytes
 t/helper/test-tool.c                 |   1 +
 t/helper/test-tool.h                 |   1 +
 t/t0068-for-each-repo.sh             |  30 +++
 t/t7900-maintenance.sh               |  95 ++++++++-
 t/test-lib.sh                        |   6 +
 18 files changed, 625 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/git-for-each-repo.txt
 create mode 100644 builtin/for-each-repo.c
 create mode 100755 git-gvfs-helper
 create mode 100644 t/helper/test-crontab.c
 create mode 100755 t/helper/test-gvfs-protocol
 create mode 100755 t/t0068-for-each-repo.sh


base-commit: 0c43c64dd2fb41ac14038f1c3143bddbc6c35585
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-680%2Fderrickstolee%2Fmaintenance%2Fscheduled-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-680/derrickstolee/maintenance/scheduled-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/680
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/7] maintenance: optionally skip --auto process
  2020-08-19 17:16 [PATCH 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
@ 2020-08-19 17:16 ` Derrick Stolee via GitGitGadget
  2020-08-20  2:06   ` Đoàn Trần Công Danh
  2020-08-19 17:16 ` [PATCH 3/7] maintenance: add --scheduled option and config Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-19 17:16 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Some commands run 'git maintenance run --auto --[no-]quiet' after doing
their normal work, as a way to keep repositories clean as they are used.
Currently, users who do not want this maintenance to occur would set the
'gc.auto' config option to 0 to avoid the 'gc' task from running.
However, this does not stop the extra process invocation. On Windows,
this extra process invocation can be more expensive than necessary.

Allow users to drop this extra process by setting 'maintenance.auto' to
'false'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt |  5 +++++
 run-command.c                        |  8 ++++++++
 t/t7900-maintenance.sh               | 13 +++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index a0706d8f09..06db758172 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -1,3 +1,8 @@
+maintenance.auto::
+	This boolean config option controls whether some commands run
+	`git maintenance run --auto` after doing their normal work. Defaults
+	to true.
+
 maintenance.<task>.enabled::
 	This boolean config option controls whether the maintenance task
 	with name `<task>` is run when no `--task` option is specified to
diff --git a/run-command.c b/run-command.c
index 2ee59acdc8..9c9d5d7f98 100644
--- a/run-command.c
+++ b/run-command.c
@@ -7,6 +7,7 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "quote.h"
+#include "config.h"
 
 void child_process_init(struct child_process *child)
 {
@@ -1868,8 +1869,15 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
 
 int run_auto_maintenance(int quiet)
 {
+	int enabled;
 	struct child_process maint = CHILD_PROCESS_INIT;
 
+	if (!git_config_get_bool("maintenance.auto", &enabled) &&
+	    !enabled) {
+		    fprintf(stderr, "enabled: %d\n", enabled);
+		return 0;
+	    }
+
 	maint.git_cmd = 1;
 	strvec_pushl(&maint.args, "maintenance", "run", "--auto", NULL);
 	strvec_push(&maint.args, quiet ? "--quiet" : "--no-quiet");
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 6f878b0141..e0ba19e1ff 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -26,6 +26,19 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'maintenance.auto config option' '
+	GIT_TRACE2_EVENT="$(pwd)/default" git commit --quiet --allow-empty -m 1 &&
+	test_subcommand git maintenance run --auto --quiet <default &&
+	GIT_TRACE2_EVENT="$(pwd)/true" \
+		git -c maintenance.auto=true \
+		commit --quiet --allow-empty -m 2 &&
+	test_subcommand git maintenance run --auto --quiet  <true &&
+	GIT_TRACE2_EVENT="$(pwd)/false" \
+		git -c maintenance.auto=false \
+		commit --quiet --allow-empty -m 3 &&
+	test_subcommand ! git maintenance run --auto --quiet  <false
+'
+
 test_expect_success 'maintenance.<task>.enabled' '
 	git config maintenance.gc.enabled false &&
 	git config maintenance.commit-graph.enabled true &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 3/7] maintenance: add --scheduled option and config
  2020-08-19 17:16 [PATCH 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
  2020-08-19 17:16 ` [PATCH 1/7] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
@ 2020-08-19 17:16 ` Derrick Stolee via GitGitGadget
  2020-08-20 14:51   ` Đoàn Trần Công Danh
  2020-08-19 17:16 ` [PATCH 4/7] for-each-repo: run subcommands on configured repos Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-19 17:16 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A user may want to run certain maintenance tasks based on frequency, not
conditions given in the repository. For example, the user may want to
perform a 'prefetch' task every hour, or 'gc' task every day. To assist,
update the 'git maintenance run --scheduled' command to check the config
for the last run of that task and add a number of seconds. The task
would then run only if the current time is beyond that minimum
timestamp.

Add a '--scheduled' option to 'git maintenance run' to only run tasks
that have had enough time pass since their last run. This is done for
each enabled task by checking if the current timestamp is at least as
large as the sum of 'maintenance.<task>.lastRun' and
'maintenance.<task>.schedule' in the Git config. This second value is
new to this commit, storing a number of seconds intended between runs.

A user could then set up an hourly maintenance run with the following
cron table:

  0 * * * * git -C <repo> maintenance run --scheduled

Then, the user could configure the repository with the following config
values:

  maintenance.prefetch.schedule  3000
  maintenance.gc.schedule       86000

These numbers are slightly lower than one hour and one day (in seconds).
The cron schedule will enforce the hourly run rate, but we can use these
schedules to ensure the 'gc' task runs once a day. The error is given
because the *.lastRun config option is specified at the _start_ of the
task run. Otherwise, a slow task run could shift the "daily" job of 'gc'
from a 10:00pm run to 11:00pm run, or later.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt |  9 +++++
 Documentation/git-maintenance.txt    | 13 +++++++-
 builtin/gc.c                         | 50 +++++++++++++++++++++++++++-
 t/t7900-maintenance.sh               | 20 +++++++++++
 4 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 8dd34169da..caacacd322 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -15,6 +15,15 @@ maintenance.<task>.lastRun::
 	`<task>` is run. It stores a timestamp representing the most-recent
 	run of the `<task>`.
 
+maintenance.<task>.schedule::
+	This config option controls whether or not the given `<task>` runs
+	during a `git maintenance run --scheduled` command. If the option
+	is an integer value `S`, then the `<task>` is run when the current
+	time is `S` seconds after the timestamp stored in
+	`maintenance.<task>.lastRun`. If the option has no value or a
+	non-integer value, then the task will never run with the `--scheduled`
+	option.
+
 maintenance.commit-graph.auto::
 	This integer config option controls how often the `commit-graph` task
 	should be run as part of `git maintenance run --auto`. If zero, then
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 95a333f000..e8004e7b11 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -110,7 +110,18 @@ OPTIONS
 	only if certain thresholds are met. For example, the `gc` task
 	runs when the number of loose objects exceeds the number stored
 	in the `gc.auto` config setting, or when the number of pack-files
-	exceeds the `gc.autoPackLimit` config setting.
+	exceeds the `gc.autoPackLimit` config setting. Not compatible with
+	the `--scheduled` option.
+
+--scheduled::
+	When combined with the `run` subcommand, run maintenance tasks
+	only if certain time conditions are met, as specified by the
+	`maintenance.<task>.schedule` config value for each `<task>`.
+	This config value specifies a number of seconds since the last
+	time that task ran, according to the `maintenance.<task>.lastRun`
+	config value. The tasks that are tested are those provided by
+	the `--task=<task>` option(s) or those with
+	`maintenance.<task>.enabled` set to true.
 
 --quiet::
 	Do not report progress or other information over `stderr`.
diff --git a/builtin/gc.c b/builtin/gc.c
index 707c144fb9..352948529d 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -711,6 +711,7 @@ static const char * const builtin_maintenance_usage[] = {
 
 struct maintenance_opts {
 	int auto_flag;
+	int scheduled;
 	int quiet;
 };
 
@@ -1226,7 +1227,8 @@ struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
 	maintenance_auto_fn *auto_condition;
-	unsigned enabled:1;
+	unsigned enabled:1,
+		 scheduled:1;
 
 	/* -1 if not selected. */
 	int selected_order;
@@ -1337,6 +1339,9 @@ static int maintenance_run(struct maintenance_opts *opts)
 		     !tasks[i].auto_condition()))
 			continue;
 
+		if (opts->scheduled && !tasks[i].scheduled)
+			continue;
+
 		update_last_run(&tasks[i]);
 
 		trace2_region_enter("maintenance", tasks[i].name, r);
@@ -1351,6 +1356,29 @@ static int maintenance_run(struct maintenance_opts *opts)
 	return result;
 }
 
+static void fill_schedule_info(struct maintenance_task *task,
+			       const char *config_name,
+			       timestamp_t schedule_delay)
+{
+	timestamp_t now = approxidate("now");
+	char *value = NULL;
+	struct strbuf last_run = STRBUF_INIT;
+	int64_t previous_run;
+
+	strbuf_addf(&last_run, "maintenance.%s.lastrun", task->name);
+
+	if (git_config_get_string(last_run.buf, &value))
+		task->scheduled = 1;
+	else {
+		previous_run = git_config_int64(last_run.buf, value);
+		if (now >= previous_run + schedule_delay)
+			task->scheduled = 1;
+	}
+
+	free(value);
+	strbuf_release(&last_run);
+}
+
 static void initialize_task_config(void)
 {
 	int i;
@@ -1359,6 +1387,7 @@ static void initialize_task_config(void)
 
 	for (i = 0; i < TASK__COUNT; i++) {
 		int config_value;
+		char *config_str;
 
 		strbuf_setlen(&config_name, 0);
 		strbuf_addf(&config_name, "maintenance.%s.enabled",
@@ -1366,6 +1395,20 @@ static void initialize_task_config(void)
 
 		if (!git_config_get_bool(config_name.buf, &config_value))
 			tasks[i].enabled = config_value;
+
+		strbuf_setlen(&config_name, 0);
+		strbuf_addf(&config_name, "maintenance.%s.schedule",
+			    tasks[i].name);
+
+		if (!git_config_get_string(config_name.buf, &config_str)) {
+			timestamp_t schedule_delay = git_config_int64(
+							config_name.buf,
+							config_str);
+			fill_schedule_info(&tasks[i],
+						config_name.buf,
+						schedule_delay);
+			free(config_str);
+		}
 	}
 
 	strbuf_release(&config_name);
@@ -1409,6 +1452,8 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 	struct option builtin_maintenance_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
+		OPT_BOOL(0, "scheduled", &opts.scheduled,
+			 N_("run tasks based on time intervals")),
 		OPT_BOOL(0, "quiet", &opts.quiet,
 			 N_("do not report progress or other information over stderr")),
 		OPT_CALLBACK_F(0, "task", NULL, N_("task"),
@@ -1434,6 +1479,9 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 			     builtin_maintenance_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
 
+	if (opts.auto_flag + opts.scheduled > 1)
+		die(_("use at most one of the --auto and --scheduled options"));
+
 	if (argc != 1)
 		usage_with_options(builtin_maintenance_usage,
 				   builtin_maintenance_options);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index a985ce3674..3e0c5f1ca8 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -264,6 +264,11 @@ test_expect_success 'maintenance.incremental-repack.auto' '
 	done
 '
 
+test_expect_success '--auto and --scheduled incompatible' '
+	test_must_fail git maintenance run --auto --scheduled 2>err &&
+	test_i18ngrep "at most one" err
+'
+
 test_expect_success 'tasks update maintenance.<task>.lastRun' '
 	git config --unset maintenance.commit-graph.lastrun &&
 	GIT_TRACE2_EVENT="$(pwd)/run.txt" \
@@ -274,4 +279,19 @@ test_expect_success 'tasks update maintenance.<task>.lastRun' '
 	test_cmp_config 1595000000 maintenance.commit-graph.lastrun
 '
 
+test_expect_success '--scheduled with specific time' '
+	git config maintenance.commit-graph.schedule 100 &&
+	GIT_TRACE2_EVENT="$(pwd)/too-soon.txt" \
+		GIT_TEST_DATE_NOW=1595000099 \
+		git maintenance run --scheduled 2>/dev/null &&
+	test_subcommand ! git commit-graph write --split --reachable \
+		--no-progress <too-soon.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/long-enough.txt" \
+		GIT_TEST_DATE_NOW=1595000100 \
+		git maintenance run --scheduled 2>/dev/null &&
+	test_subcommand git commit-graph write --split --reachable \
+		--no-progress <long-enough.txt &&
+	test_cmp_config 1595000100 maintenance.commit-graph.lastrun
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 4/7] for-each-repo: run subcommands on configured repos
  2020-08-19 17:16 [PATCH 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
  2020-08-19 17:16 ` [PATCH 1/7] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
  2020-08-19 17:16 ` [PATCH 3/7] maintenance: add --scheduled option and config Derrick Stolee via GitGitGadget
@ 2020-08-19 17:16 ` Derrick Stolee via GitGitGadget
  2020-08-20 15:00   ` Đoàn Trần Công Danh
  2020-08-19 17:16 ` [PATCH 5/7] maintenance: add [un]register subcommands Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-19 17:16 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It can be helpful to store a list of repositories in global or system
config and then iterate Git commands on that list. Create a new builtin
that makes this process simple for experts. We will use this builtin to
run scheduled maintenance on all configured repositories in a future
change.

The test is very simple, but does highlight that the "--" argument is
optional.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                          |  1 +
 Documentation/git-for-each-repo.txt | 45 ++++++++++++++++++++++
 Makefile                            |  1 +
 builtin.h                           |  1 +
 builtin/for-each-repo.c             | 58 +++++++++++++++++++++++++++++
 git.c                               |  1 +
 t/t0068-for-each-repo.sh            | 30 +++++++++++++++
 7 files changed, 137 insertions(+)
 create mode 100644 Documentation/git-for-each-repo.txt
 create mode 100644 builtin/for-each-repo.c
 create mode 100755 t/t0068-for-each-repo.sh

diff --git a/.gitignore b/.gitignore
index a5808fa30d..5eb2a2be71 100644
--- a/.gitignore
+++ b/.gitignore
@@ -67,6 +67,7 @@
 /git-filter-branch
 /git-fmt-merge-msg
 /git-for-each-ref
+/git-for-each-repo
 /git-format-patch
 /git-fsck
 /git-fsck-objects
diff --git a/Documentation/git-for-each-repo.txt b/Documentation/git-for-each-repo.txt
new file mode 100644
index 0000000000..83b06db410
--- /dev/null
+++ b/Documentation/git-for-each-repo.txt
@@ -0,0 +1,45 @@
+git-for-each-repo(1)
+====================
+
+NAME
+----
+git-for-each-repo - Run a Git command on a list of repositories
+
+
+SYNOPSIS
+--------
+[verse]
+'git for-each-repo' --config=<config> [--] <arguments>
+
+
+DESCRIPTION
+-----------
+Run a Git commands on a list of repositories. The arguments after the
+known options or `--` indicator are used as the arguments for the Git
+command.
+
+For example, we could run maintenance on each of a list of repositories
+stored in a `maintenance.repo` config variable using
+
+-------------
+git for-each-repo --config=maintenance.repo maintenance run
+-------------
+
+This will run `git -C <repo> maintenance run` for each value `<repo>`
+in the multi-valued config variable `maintenance.repo`.
+
+
+OPTIONS
+-------
+--config=<config>::
+	Use the given config variable as a multi-valued list storing
+	absolute path names. Iterate on that list of paths to run
+	the given arguments.
++
+These config values are loaded from system, global, and local Git config,
+as available. If `git for-each-repo` is run in a directory that is not a
+Git repository, then only the system and global config is used.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 65f8cfb236..7c588ff036 100644
--- a/Makefile
+++ b/Makefile
@@ -1071,6 +1071,7 @@ BUILTIN_OBJS += builtin/fetch-pack.o
 BUILTIN_OBJS += builtin/fetch.o
 BUILTIN_OBJS += builtin/fmt-merge-msg.o
 BUILTIN_OBJS += builtin/for-each-ref.o
+BUILTIN_OBJS += builtin/for-each-repo.o
 BUILTIN_OBJS += builtin/fsck.o
 BUILTIN_OBJS += builtin/gc.o
 BUILTIN_OBJS += builtin/get-tar-commit-id.o
diff --git a/builtin.h b/builtin.h
index 17c1c0ce49..ff7c6e5aa9 100644
--- a/builtin.h
+++ b/builtin.h
@@ -150,6 +150,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix);
 int cmd_fetch_pack(int argc, const char **argv, const char *prefix);
 int cmd_fmt_merge_msg(int argc, const char **argv, const char *prefix);
 int cmd_for_each_ref(int argc, const char **argv, const char *prefix);
+int cmd_for_each_repo(int argc, const char **argv, const char *prefix);
 int cmd_format_patch(int argc, const char **argv, const char *prefix);
 int cmd_fsck(int argc, const char **argv, const char *prefix);
 int cmd_gc(int argc, const char **argv, const char *prefix);
diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c
new file mode 100644
index 0000000000..5bba623ff1
--- /dev/null
+++ b/builtin/for-each-repo.c
@@ -0,0 +1,58 @@
+#include "cache.h"
+#include "config.h"
+#include "builtin.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "string-list.h"
+
+static const char * const for_each_repo_usage[] = {
+	N_("git for-each-repo --config=<config> <command-args>"),
+	NULL
+};
+
+static int run_command_on_repo(const char *path,
+			       void *cbdata)
+{
+	int i;
+	struct child_process child = CHILD_PROCESS_INIT;
+	struct strvec *args = (struct strvec *)cbdata;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "-C", path, NULL);
+
+	for (i = 0; i < args->nr; i++)
+		strvec_push(&child.args, args->v[i]);
+
+	return run_command(&child);
+}
+
+int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
+{
+	static const char *config_key = NULL;
+	int i, result = 0;
+	const struct string_list *values;
+	struct strvec args = STRVEC_INIT;
+
+	const struct option options[] = {
+		OPT_STRING(0, "config", &config_key, N_("config"),
+			   N_("config key storing a list of repository paths")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, for_each_repo_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (!config_key)
+		die(_("missing --config=<config>"));
+
+	for (i = 0; i < argc; i++)
+		strvec_push(&args, argv[i]);
+
+	values = repo_config_get_value_multi(the_repository,
+					     config_key);
+
+	for (i = 0; !result && i < values->nr; i++)
+		result = run_command_on_repo(values->items[i].string, &args);
+
+	return result;
+}
diff --git a/git.c b/git.c
index 24f250d29a..1cab64b5d1 100644
--- a/git.c
+++ b/git.c
@@ -511,6 +511,7 @@ static struct cmd_struct commands[] = {
 	{ "fetch-pack", cmd_fetch_pack, RUN_SETUP | NO_PARSEOPT },
 	{ "fmt-merge-msg", cmd_fmt_merge_msg, RUN_SETUP },
 	{ "for-each-ref", cmd_for_each_ref, RUN_SETUP },
+	{ "for-each-repo", cmd_for_each_repo, RUN_SETUP_GENTLY },
 	{ "format-patch", cmd_format_patch, RUN_SETUP },
 	{ "fsck", cmd_fsck, RUN_SETUP },
 	{ "fsck-objects", cmd_fsck, RUN_SETUP },
diff --git a/t/t0068-for-each-repo.sh b/t/t0068-for-each-repo.sh
new file mode 100755
index 0000000000..136b4ec839
--- /dev/null
+++ b/t/t0068-for-each-repo.sh
@@ -0,0 +1,30 @@
+#!/bin/sh
+
+test_description='git for-each-repo builtin'
+
+. ./test-lib.sh
+
+test_expect_success 'run based on configured value' '
+	git init one &&
+	git init two &&
+	git init three &&
+	git -C two commit --allow-empty -m "DID NOT RUN" &&
+	git config run.key "$TRASH_DIRECTORY/one" &&
+	git config --add run.key "$TRASH_DIRECTORY/three" &&
+	git for-each-repo --config=run.key commit --allow-empty -m "ran" &&
+	git -C one log -1 --pretty=format:%s >message &&
+	grep ran message &&
+	git -C two log -1 --pretty=format:%s >message &&
+	! grep ran message &&
+	git -C three log -1 --pretty=format:%s >message &&
+	grep ran message &&
+	git for-each-repo --config=run.key -- commit --allow-empty -m "ran again" &&
+	git -C one log -1 --pretty=format:%s >message &&
+	grep again message &&
+	git -C two log -1 --pretty=format:%s >message &&
+	! grep again message &&
+	git -C three log -1 --pretty=format:%s >message &&
+	grep again message
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 5/7] maintenance: add [un]register subcommands
  2020-08-19 17:16 [PATCH 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-08-19 17:16 ` [PATCH 4/7] for-each-repo: run subcommands on configured repos Derrick Stolee via GitGitGadget
@ 2020-08-19 17:16 ` Derrick Stolee via GitGitGadget
  2020-08-19 17:16 ` [PATCH 6/7] maintenance: add start/stop subcommands Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-19 17:16 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In preparation for launching background maintenance from the 'git
maintenance' builtin, create register/unregister subcommands. These
commands update the new 'maintenance.repos' config option in the global
config so the background maintenance job knows which repositories to
maintain.

These commands allow users to add a repository to the background
maintenance list without disrupting the actual maintenance mechanism.

For example, a user can run 'git maintenance register' when no
background maintenance is running and it will not start the background
maintenance. A later update to start running background maintenance will
then pick up this repository automatically.

The opposite example is that a user can run 'git maintenance unregister'
to remove the current repository from background maintenance without
halting maintenance for other repositories.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt | 14 ++++++++
 builtin/gc.c                      | 55 ++++++++++++++++++++++++++++++-
 t/t7900-maintenance.sh            | 17 +++++++++-
 3 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index e8004e7b11..ac6fcae678 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -29,6 +29,15 @@ Git repository.
 SUBCOMMANDS
 -----------
 
+register::
+	Initialize Git config values so any scheduled maintenance will
+	start running on this repository. This adds the repository to the
+	`maintenance.repo` config variable in the current user's global
+	config and enables some recommended configuration values for
+	`maintenance.<task>.schedule`. The tasks that are enabled are safe
+	for running in the background without disrupting foreground
+	processes.
+
 run::
 	Run one or more maintenance tasks. If one or more `--task` options
 	are specified, then those tasks are run in that order. Otherwise,
@@ -36,6 +45,11 @@ run::
 	config options are true. By default, only `maintenance.gc.enabled`
 	is true.
 
+unregister::
+	Remove the current repository from background maintenance. This
+	only removes the repository from the configured list. It does not
+	stop the background maintenance processes from running.
+
 TASKS
 -----
 
diff --git a/builtin/gc.c b/builtin/gc.c
index 352948529d..eb8a0a52ab 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -705,7 +705,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 }
 
 static const char * const builtin_maintenance_usage[] = {
-	N_("git maintenance run [<options>]"),
+	N_("git maintenance <subcommand> [<options>]"),
 	NULL
 };
 
@@ -1445,6 +1445,55 @@ static int task_option_parse(const struct option *opt,
 	return 0;
 }
 
+static int maintenance_register(void)
+{
+	struct child_process config_set = CHILD_PROCESS_INIT;
+	struct child_process config_get = CHILD_PROCESS_INIT;
+
+	/* There is no current repository, so skip registering it */
+	if (!the_repository || !the_repository->gitdir)
+		return 0;
+
+	config_get.git_cmd = 1;
+	strvec_pushl(&config_get.args, "config", "--global", "--get", "maintenance.repo",
+		     the_repository->worktree ? the_repository->worktree
+					      : the_repository->gitdir,
+			 NULL);
+	config_get.out = -1;
+
+	if (start_command(&config_get))
+		return error(_("failed to run 'git config'"));
+
+	/* We already have this value in our config! */
+	if (!finish_command(&config_get))
+		return 0;
+
+	config_set.git_cmd = 1;
+	strvec_pushl(&config_set.args, "config", "--add", "--global", "maintenance.repo",
+		     the_repository->worktree ? the_repository->worktree
+					      : the_repository->gitdir,
+		     NULL);
+
+	return run_command(&config_set);
+}
+
+static int maintenance_unregister(void)
+{
+	struct child_process config_unset = CHILD_PROCESS_INIT;
+
+	if (!the_repository || !the_repository->gitdir)
+		return error(_("no current repository to unregister"));
+
+	config_unset.git_cmd = 1;
+	strvec_pushl(&config_unset.args, "config", "--global", "--unset",
+		     "maintenance.repo",
+		     the_repository->worktree ? the_repository->worktree
+					      : the_repository->gitdir,
+		     NULL);
+
+	return run_command(&config_unset);
+}
+
 int cmd_maintenance(int argc, const char **argv, const char *prefix)
 {
 	int i;
@@ -1486,8 +1535,12 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 		usage_with_options(builtin_maintenance_usage,
 				   builtin_maintenance_options);
 
+	if (!strcmp(argv[0], "register"))
+		return maintenance_register();
 	if (!strcmp(argv[0], "run"))
 		return maintenance_run(&opts);
+	if (!strcmp(argv[0], "unregister"))
+		return maintenance_unregister();
 
 	die(_("invalid subcommand: %s"), argv[0]);
 }
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 3e0c5f1ca8..b20ee2d542 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -9,7 +9,7 @@ GIT_TEST_MULTI_PACK_INDEX=0
 
 test_expect_success 'help text' '
 	test_expect_code 129 git maintenance -h 2>err &&
-	test_i18ngrep "usage: git maintenance run" err &&
+	test_i18ngrep "usage: git maintenance <subcommand>" err &&
 	test_expect_code 128 git maintenance barf 2>err &&
 	test_i18ngrep "invalid subcommand: barf" err
 '
@@ -294,4 +294,19 @@ test_expect_success '--scheduled with specific time' '
 	test_cmp_config 1595000100 maintenance.commit-graph.lastrun
 '
 
+test_expect_success 'register and unregister' '
+	test_when_finished git config --global --unset-all maintenance.repo &&
+	git config --global --add maintenance.repo /existing1 &&
+	git config --global --add maintenance.repo /existing2 &&
+	git config --global --get-all maintenance.repo >before &&
+	git maintenance register &&
+	git config --global --get-all maintenance.repo >actual &&
+	cp before after &&
+	pwd >>after &&
+	test_cmp after actual &&
+	git maintenance unregister &&
+	git config --global --get-all maintenance.repo >actual &&
+	test_cmp before actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 6/7] maintenance: add start/stop subcommands
  2020-08-19 17:16 [PATCH 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2020-08-19 17:16 ` [PATCH 5/7] maintenance: add [un]register subcommands Derrick Stolee via GitGitGadget
@ 2020-08-19 17:16 ` Derrick Stolee via GitGitGadget
  2020-08-19 17:16 ` [PATCH 7/7] maintenance: recommended schedule in register/start Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-19 17:16 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Add new subcommands to 'git maintenance' that start or stop background
maintenance using 'cron', when available. This integration is as simple
as I could make it, barring some implementation complications.

For now, the background maintenance is scheduled to run hourly via the
following cron table row (ignore line breaks):

	0 * * * * $p/git --exec-path=$p
		for-each-repo --config=maintenance.repo
		maintenance run --scheduled

Future extensions may want to add more complex schedules or some form of
logging. For now, hourly runs seem frequent enough to satisfy the needs
of tasks like 'prefetch' without being so frequent that users would
complain about many no-op commands.

Here, "$p" is a placeholder for the path to the current Git executable.
This is critical for systems with multiple versions of Git.
Specifically, macOS has a system version at '/usr/bin/git' while the
version that users can install resides at '/usr/local/bin/git' (symlinked
to '/usr/local/libexec/git-core/git'). This will also use your
locally-built version if you build and run this in your development
environment without installing first.

The GIT_TEST_CRONTAB environment variable is not intended for users to
edit, but instead as a way to mock the 'crontab [-l]' command. This
variable is set in test-lib.sh to avoid a future test from accidentally
running anything with the cron integration from modifying the user's
schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our
tests to check how the schedule is modified in 'git maintenance
(start|stop)' commands.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  11 +++
 Makefile                          |   1 +
 builtin/gc.c                      | 117 ++++++++++++++++++++++++++++++
 t/helper/test-crontab.c           |  35 +++++++++
 t/helper/test-tool.c              |   1 +
 t/helper/test-tool.h              |   1 +
 t/t7900-maintenance.sh            |  30 ++++++++
 t/test-lib.sh                     |   6 ++
 8 files changed, 202 insertions(+)
 create mode 100644 t/helper/test-crontab.c

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index ac6fcae678..600272caa5 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -45,6 +45,17 @@ run::
 	config options are true. By default, only `maintenance.gc.enabled`
 	is true.
 
+start::
+	Start running maintenance on the current repository. This performs
+	the same config updates as the `register` subcommand, then updates
+	the background scheduler to run `git maintenance run --scheduled`
+	on an hourly basis.
+
+stop::
+	Halt the background maintenance schedule. The current repository
+	is not removed from the list of maintained repositories, in case
+	the background maintenance is restarted later.
+
 unregister::
 	Remove the current repository from background maintenance. This
 	only removes the repository from the configured list. It does not
diff --git a/Makefile b/Makefile
index 7c588ff036..c39b39bd7d 100644
--- a/Makefile
+++ b/Makefile
@@ -690,6 +690,7 @@ TEST_BUILTINS_OBJS += test-advise.o
 TEST_BUILTINS_OBJS += test-bloom.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
+TEST_BUILTINS_OBJS += test-crontab.o
 TEST_BUILTINS_OBJS += test-ctype.o
 TEST_BUILTINS_OBJS += test-date.o
 TEST_BUILTINS_OBJS += test-delta.o
diff --git a/builtin/gc.c b/builtin/gc.c
index eb8a0a52ab..b09287f0fc 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -32,6 +32,7 @@
 #include "remote.h"
 #include "midx.h"
 #include "object-store.h"
+#include "exec-cmd.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -1494,6 +1495,118 @@ static int maintenance_unregister(void)
 	return run_command(&config_unset);
 }
 
+#define BEGIN_LINE "# BEGIN GIT MAINTENANCE SCHEDULE"
+#define END_LINE "# END GIT MAINTENANCE SCHEDULE"
+
+static int update_background_schedule(int run_maintenance)
+{
+	int result = 0;
+	int in_old_region = 0;
+	struct child_process crontab_list = CHILD_PROCESS_INIT;
+	struct child_process crontab_edit = CHILD_PROCESS_INIT;
+	FILE *cron_list, *cron_in;
+	const char *crontab_name;
+	struct strbuf line = STRBUF_INIT;
+	struct lock_file lk;
+	char *lock_path = xstrfmt("%s/schedule", the_repository->objects->odb->path);
+
+	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0)
+		return error(_("another process is scheduling background maintenance"));
+
+	crontab_name = getenv("GIT_TEST_CRONTAB");
+	if (!crontab_name)
+		crontab_name = "crontab";
+
+	strvec_split(&crontab_list.args, crontab_name);
+	strvec_push(&crontab_list.args, "-l");
+	crontab_list.in = -1;
+	crontab_list.out = dup(lk.tempfile->fd);
+	crontab_list.git_cmd = 0;
+
+	if (start_command(&crontab_list)) {
+		result = error(_("failed to run 'crontab -l'; your system might not support 'cron'"));
+		goto cleanup;
+	}
+
+	/* Ignore exit code, as an empty crontab will return error. */
+	finish_command(&crontab_list);
+
+	/*
+	 * Read from the .lock file, filtering out the old
+	 * schedule while appending the new schedule.
+	 */
+	cron_list = fdopen(lk.tempfile->fd, "r");
+	rewind(cron_list);
+
+	strvec_split(&crontab_edit.args, crontab_name);
+	crontab_edit.in = -1;
+	crontab_edit.git_cmd = 0;
+
+	if (start_command(&crontab_edit)) {
+		result = error(_("failed to run 'crontab'; your system might not support 'cron'"));
+		goto cleanup;
+	}
+
+	cron_in = fdopen(crontab_edit.in, "w");
+	if (!cron_in) {
+		result = error(_("failed to open stdin of 'crontab'"));
+		goto done_editing;
+	}
+
+	while (!strbuf_getline_lf(&line, cron_list)) {
+		if (!in_old_region && !strcmp(line.buf, BEGIN_LINE))
+			in_old_region = 1;
+		if (in_old_region)
+			continue;
+		fprintf(cron_in, "%s\n", line.buf);
+		if (in_old_region && !strcmp(line.buf, END_LINE))
+			in_old_region = 0;
+	}
+
+	if (run_maintenance) {
+		const char *exec_path = git_exec_path();
+
+		fprintf(cron_in, "\n%s\n", BEGIN_LINE);
+		fprintf(cron_in, "# The following schedule was created by Git\n");
+		fprintf(cron_in, "# Any edits made in this region might be\n");
+		fprintf(cron_in, "# replaced in the future by a Git command.\n\n");
+
+		fprintf(cron_in,
+			"0 * * * * \"%s/git\" --exec-path=\"%s\" for-each-repo --config=maintenance.repo maintenance run --scheduled\n",
+			exec_path, exec_path);
+
+		fprintf(cron_in, "\n%s\n", END_LINE);
+	}
+
+	fflush(cron_in);
+	fclose(cron_in);
+	close(crontab_edit.in);
+
+done_editing:
+	if (finish_command(&crontab_edit)) {
+		result = error(_("'crontab' died"));
+		goto cleanup;
+	}
+	fclose(cron_list);
+
+cleanup:
+	rollback_lock_file(&lk);
+	return result;
+}
+
+static int maintenance_start(void)
+{
+	if (maintenance_register())
+		warning(_("failed to add repo to global config"));
+
+	return update_background_schedule(1);
+}
+
+static int maintenance_stop(void)
+{
+	return update_background_schedule(0);
+}
+
 int cmd_maintenance(int argc, const char **argv, const char *prefix)
 {
 	int i;
@@ -1539,6 +1652,10 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 		return maintenance_register();
 	if (!strcmp(argv[0], "run"))
 		return maintenance_run(&opts);
+	if (!strcmp(argv[0], "start"))
+		return maintenance_start();
+	if (!strcmp(argv[0], "stop"))
+		return maintenance_stop();
 	if (!strcmp(argv[0], "unregister"))
 		return maintenance_unregister();
 
diff --git a/t/helper/test-crontab.c b/t/helper/test-crontab.c
new file mode 100644
index 0000000000..f5db6319c6
--- /dev/null
+++ b/t/helper/test-crontab.c
@@ -0,0 +1,35 @@
+#include "test-tool.h"
+#include "cache.h"
+
+/*
+ * Usage: test-tool cron <file> [-l]
+ *
+ * If -l is specified, then write the contents of <file> to stdou.
+ * Otherwise, write from stdin into <file>.
+ */
+int cmd__crontab(int argc, const char **argv)
+{
+	char a;
+	FILE *from, *to;
+
+	if (argc == 3 && !strcmp(argv[2], "-l")) {
+		from = fopen(argv[1], "r");
+		if (!from)
+			return 0;
+		to = stdout;
+	} else if (argc == 2) {
+		from = stdin;
+		to = fopen(argv[1], "w");
+	} else
+		return error("unknown arguments");
+
+	while ((a = fgetc(from)) != EOF)
+		fputc(a, to);
+
+	if (argc == 3)
+		fclose(from);
+	else
+		fclose(to);
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 590b2efca7..432b49d948 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -18,6 +18,7 @@ static struct test_cmd cmds[] = {
 	{ "bloom", cmd__bloom },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
+	{ "crontab", cmd__crontab },
 	{ "ctype", cmd__ctype },
 	{ "date", cmd__date },
 	{ "delta", cmd__delta },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ddc8e990e9..7c3281e071 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -8,6 +8,7 @@ int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
+int cmd__crontab(int argc, const char **argv);
 int cmd__ctype(int argc, const char **argv);
 int cmd__date(int argc, const char **argv);
 int cmd__delta(int argc, const char **argv);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index b20ee2d542..6491031be8 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -309,4 +309,34 @@ test_expect_success 'register and unregister' '
 	test_cmp before actual
 '
 
+test_expect_success 'start from empty cron table' '
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
+
+	# start registers the repo
+	git config --get --global maintenance.repo "$(pwd)" &&
+
+	grep "for-each-repo --config=maintenance.repo maintenance run --scheduled" cron.txt
+'
+
+test_expect_success 'stop from existing schedule' '
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance stop &&
+
+	# stop does not unregister the repo
+	git config --get --global maintenance.repo "$(pwd)" &&
+
+	# The newline is preserved
+	echo >empty &&
+	test_cmp empty cron.txt &&
+
+	# Operation is idempotent
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance stop &&
+	test_cmp empty cron.txt
+'
+
+test_expect_success 'start preserves existing schedule' '
+	echo "Important information!" >cron.txt &&
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
+	grep "Important information!" cron.txt
+'
+
 test_done
diff --git a/t/test-lib.sh b/t/test-lib.sh
index ef31f40037..4a60d1ed76 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1702,3 +1702,9 @@ test_lazy_prereq SHA1 '
 test_lazy_prereq REBASE_P '
 	test -z "$GIT_TEST_SKIP_REBASE_P"
 '
+
+# Ensure that no test accidentally triggers a Git command
+# that runs 'crontab', affecting a user's cron schedule.
+# Tests that verify the cron integration must set this locally
+# to avoid errors.
+GIT_TEST_CRONTAB="exit 1"
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 7/7] maintenance: recommended schedule in register/start
  2020-08-19 17:16 [PATCH 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2020-08-19 17:16 ` [PATCH 6/7] maintenance: add start/stop subcommands Derrick Stolee via GitGitGadget
@ 2020-08-19 17:16 ` Derrick Stolee via GitGitGadget
       [not found] ` <bdc27fa28ee70222ed3c7c9863746ace8ea835e4.1597857409.git.gitgitgadget@gmail.com>
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-19 17:16 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git maintenance (register|start)' subcommands add the current
repository to the global Git config so maintenance will operate on that
repository. It does not specify what maintenance should occur or how
often.

If a user sets any 'maintenance.<task>.scheduled' config value, then
they have chosen a specific schedule for themselves and Git should
respect that.

However, in an effort to recommend a good schedule for repositories of
all sizes, set new config values for recommended tasks that are safe to
run in the background while users run foreground Git commands. These
commands are generally everything but the 'gc' task.

Author's Note: I feel we should do _something_ to recommend a good
schedule to users, but I'm not 100% set on this schedule. This is the
schedule we use in Scalar and VFS for Git for very large repositories
using the GVFS protocol. While the schedule works in that environment,
it is possible that "normal" Git repositories could benefit from
something more obvious (such as running 'gc' once a day). However, this
patch gives us a place to start a conversation on what we should
recommend. For my purposes, Scalar will set these config values so we
can always differ from core Git's recommendations.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  6 +++++
 builtin/gc.c                      | 44 +++++++++++++++++++++++++++++++
 t/t7900-maintenance.sh            |  5 ++++
 3 files changed, 55 insertions(+)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 600272caa5..a4b46ea329 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -37,6 +37,12 @@ register::
 	`maintenance.<task>.schedule`. The tasks that are enabled are safe
 	for running in the background without disrupting foreground
 	processes.
++
+If your repository has no 'maintenance.<task>.schedule' configuration
+values set, then Git will set configuration values to some recommended
+settings. These settings disable foreground maintenance while performing
+maintenance tasks in the background that will not interrupt foreground Git
+operations.
 
 run::
 	Run one or more maintenance tasks. If one or more `--task` options
diff --git a/builtin/gc.c b/builtin/gc.c
index b09287f0fc..080d58735c 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1446,6 +1446,47 @@ static int task_option_parse(const struct option *opt,
 	return 0;
 }
 
+static int has_schedule_config(void)
+{
+	int i, found = 0;
+	struct strbuf config_name = STRBUF_INIT;
+	size_t prefix;
+
+	strbuf_addstr(&config_name, "maintenance.");
+	prefix = config_name.len;
+
+	for (i = 0; !found && i < TASK__COUNT; i++) {
+		int value;
+
+		strbuf_setlen(&config_name, prefix);
+		strbuf_addf(&config_name, "%s.schedule", tasks[i].name);
+
+		if (!git_config_get_int(config_name.buf, &value))
+			found = 1;
+	}
+
+	strbuf_release(&config_name);
+	return found;
+}
+
+static void set_recommended_schedule(void)
+{
+	git_config_set("maintenance.auto", "false");
+	git_config_set("maintenance.gc.enabled", "false");
+
+	git_config_set("maintenance.prefetch.enabled", "true");
+	git_config_set("maintenance.prefetch.schedule", "3500");
+
+	git_config_set("maintenance.commit-graph.enabled", "true");
+	git_config_set("maintenance.commit-graph.schedule", "3500");
+
+	git_config_set("maintenance.loose-objects.enabled", "true");
+	git_config_set("maintenance.loose-objects.schedule", "86000");
+
+	git_config_set("maintenance.incremental-repack.enabled", "true");
+	git_config_set("maintenance.incremental-repack.schedule", "86000");
+}
+
 static int maintenance_register(void)
 {
 	struct child_process config_set = CHILD_PROCESS_INIT;
@@ -1455,6 +1496,9 @@ static int maintenance_register(void)
 	if (!the_repository || !the_repository->gitdir)
 		return 0;
 
+	if (has_schedule_config())
+		set_recommended_schedule();
+
 	config_get.git_cmd = 1;
 	strvec_pushl(&config_get.args, "config", "--global", "--get", "maintenance.repo",
 		     the_repository->worktree ? the_repository->worktree
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 6491031be8..7417e5858a 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -300,6 +300,11 @@ test_expect_success 'register and unregister' '
 	git config --global --add maintenance.repo /existing2 &&
 	git config --global --get-all maintenance.repo >before &&
 	git maintenance register &&
+	test_cmp_config false maintenance.auto &&
+	test_cmp_config false maintenance.gc.enabled &&
+	test_cmp_config true maintenance.prefetch.enabled &&
+	test_cmp_config 3500 maintenance.commit-graph.schedule &&
+	test_cmp_config 86000 maintenance.incremental-repack.schedule &&
 	git config --global --get-all maintenance.repo >actual &&
 	cp before after &&
 	pwd >>after &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/7] maintenance: optionally skip --auto process
  2020-08-19 17:16 ` [PATCH 1/7] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
@ 2020-08-20  2:06   ` Đoàn Trần Công Danh
  2020-08-20 12:12     ` Derrick Stolee
  0 siblings, 1 reply; 41+ messages in thread
From: Đoàn Trần Công Danh @ 2020-08-20  2:06 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

On 2020-08-19 17:16:42+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> @@ -1868,8 +1869,15 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
>  
>  int run_auto_maintenance(int quiet)
>  {
> +	int enabled;
>  	struct child_process maint = CHILD_PROCESS_INIT;
>  
> +	if (!git_config_get_bool("maintenance.auto", &enabled) &&
> +	    !enabled) {
> +		    fprintf(stderr, "enabled: %d\n", enabled);
> +		return 0;
> +	    }

Nit: This block of code is mis-indented (mixed space and tab).

If we're running into inner block, "enabled" will always be "0"
We can just write:

	fprintf(stderr, "enabled: 0\n");

instead. Writing like that, we have one less thing to worry about
(whether "enabled" is initialised in git_config_get_bool or not).


-- 
Danh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/7] maintenance: optionally skip --auto process
  2020-08-20  2:06   ` Đoàn Trần Công Danh
@ 2020-08-20 12:12     ` Derrick Stolee
  0 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee @ 2020-08-20 12:12 UTC (permalink / raw)
  To: Đoàn Trần Công Danh,
	Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

On 8/19/2020 10:06 PM, Đoàn Trần Công Danh wrote:
> On 2020-08-19 17:16:42+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>> @@ -1868,8 +1869,15 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
>>  
>>  int run_auto_maintenance(int quiet)
>>  {
>> +	int enabled;
>>  	struct child_process maint = CHILD_PROCESS_INIT;
>>  
>> +	if (!git_config_get_bool("maintenance.auto", &enabled) &&
>> +	    !enabled) {
>> +		    fprintf(stderr, "enabled: %d\n", enabled);
>> +		return 0;
>> +	    }
> 
> Nit: This block of code is mis-indented (mixed space and tab).

Thanks.
 
> If we're running into inner block, "enabled" will always be "0"
> We can just write:
> 
> 	fprintf(stderr, "enabled: 0\n");
> 
> instead. Writing like that, we have one less thing to worry about
> (whether "enabled" is initialised in git_config_get_bool or not).

Whoops! This fprintf shouldn't even be here, but got accidentally
added during a debugging session. Thanks for noticing!

-Stolee


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] maintenance: store the "last run" time in config
       [not found] ` <bdc27fa28ee70222ed3c7c9863746ace8ea835e4.1597857409.git.gitgitgadget@gmail.com>
@ 2020-08-20 14:34   ` Đoàn Trần Công Danh
  0 siblings, 0 replies; 41+ messages in thread
From: Đoàn Trần Công Danh @ 2020-08-20 14:34 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

On 2020-08-19 17:16:43+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> Users may want to run certain maintenance tasks only so often. Update
> the local config with a new 'maintenance.<task>.lastRun' config option
> that stores the timestamp just before running the maintenance task.
> 
> I selected the timestamp before the task, as opposed to after the task,
> for a couple reasons:
> 
>  1. The time the task takes to execute should not contribute to the
>     interval between running the tasks. If a daily task takes 10 minutes
>     to run, then every day the execution will drift by at least 10
>     minutes.
> 
>  2. If the task fails for some unforseen reason, it would be good to
>     indicate that we _attempted_ the task at a certain timestamp. This
>     will avoid spamming a repository that is in a bad state.
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/config/maintenance.txt |   5 +++++
>  builtin/gc.c                         |  16 ++++++++++++++++
>  git-gvfs-helper                      | Bin 0 -> 11171736 bytes
>  t/helper/test-gvfs-protocol          | Bin 0 -> 10946928 bytes

Look like those 2 files should be added into .gitignore, no?

-- 
Danh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7] maintenance: add --scheduled option and config
  2020-08-19 17:16 ` [PATCH 3/7] maintenance: add --scheduled option and config Derrick Stolee via GitGitGadget
@ 2020-08-20 14:51   ` Đoàn Trần Công Danh
  2020-08-24 14:03     ` Derrick Stolee
  0 siblings, 1 reply; 41+ messages in thread
From: Đoàn Trần Công Danh @ 2020-08-20 14:51 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

On 2020-08-19 17:16:44+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> +static void fill_schedule_info(struct maintenance_task *task,
> +			       const char *config_name,
> +			       timestamp_t schedule_delay)
> +{
> +	timestamp_t now = approxidate("now");

I see this pattern in both previous patch and this patch,
should we create a helper (if not exist) to get current timestamp
instead, parsing "now" every now and then is not a good idea, in my
very opinionated opinion.

> +	char *value = NULL;
> +	struct strbuf last_run = STRBUF_INIT;
> +	int64_t previous_run;
> +
> +	strbuf_addf(&last_run, "maintenance.%s.lastrun", task->name);
> +
> +	if (git_config_get_string(last_run.buf, &value))
> +		task->scheduled = 1;
> +	else {
> +		previous_run = git_config_int64(last_run.buf, value);
> +		if (now >= previous_run + schedule_delay)
> +			task->scheduled = 1;
> +	}
> +
> +	free(value);
> +	strbuf_release(&last_run);
> +}
> +
>  static void initialize_task_config(void)
>  {
>  	int i;
> @@ -1359,6 +1387,7 @@ static void initialize_task_config(void)
>  
>  	for (i = 0; i < TASK__COUNT; i++) {
>  		int config_value;
> +		char *config_str;
>  
>  		strbuf_setlen(&config_name, 0);
>  		strbuf_addf(&config_name, "maintenance.%s.enabled",
> @@ -1366,6 +1395,20 @@ static void initialize_task_config(void)
>  
>  		if (!git_config_get_bool(config_name.buf, &config_value))
>  			tasks[i].enabled = config_value;
> +
> +		strbuf_setlen(&config_name, 0);

It looks like we have a simple and better named alias for this:

	strbuf_reset(&config_name)

_reset has 400+ occurences in this code base, compare to 20 of _setlen


-- 
Danh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/7] for-each-repo: run subcommands on configured repos
  2020-08-19 17:16 ` [PATCH 4/7] for-each-repo: run subcommands on configured repos Derrick Stolee via GitGitGadget
@ 2020-08-20 15:00   ` Đoàn Trần Công Danh
  0 siblings, 0 replies; 41+ messages in thread
From: Đoàn Trần Công Danh @ 2020-08-20 15:00 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

On 2020-08-19 17:16:45+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> It can be helpful to store a list of repositories in global or system
> config and then iterate Git commands on that list. Create a new builtin
> that makes this process simple for experts. We will use this builtin to
> run scheduled maintenance on all configured repositories in a future
> change.

Nice, I like this new command.

However, I'm not sure if we could declare this command as plumbing or
porcelain command.

I guess this command is meant more for scripting purpose, hence, it
should be plumbing, thus we need to define clear protocol for this
command, e.g, where it will redirect other command output, error to,
where for-each-repo write its own output/error.

Also, I think it would be nice to declare this is experimental for now.
Like we declared git-switch and git-restore.

-- 
Danh

> 
> The test is very simple, but does highlight that the "--" argument is
> optional.
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  .gitignore                          |  1 +
>  Documentation/git-for-each-repo.txt | 45 ++++++++++++++++++++++
>  Makefile                            |  1 +
>  builtin.h                           |  1 +
>  builtin/for-each-repo.c             | 58 +++++++++++++++++++++++++++++
>  git.c                               |  1 +
>  t/t0068-for-each-repo.sh            | 30 +++++++++++++++
>  7 files changed, 137 insertions(+)
>  create mode 100644 Documentation/git-for-each-repo.txt
>  create mode 100644 builtin/for-each-repo.c
>  create mode 100755 t/t0068-for-each-repo.sh
> 
> diff --git a/.gitignore b/.gitignore
> index a5808fa30d..5eb2a2be71 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -67,6 +67,7 @@
>  /git-filter-branch
>  /git-fmt-merge-msg
>  /git-for-each-ref
> +/git-for-each-repo
>  /git-format-patch
>  /git-fsck
>  /git-fsck-objects
> diff --git a/Documentation/git-for-each-repo.txt b/Documentation/git-for-each-repo.txt
> new file mode 100644
> index 0000000000..83b06db410
> --- /dev/null
> +++ b/Documentation/git-for-each-repo.txt
> @@ -0,0 +1,45 @@
> +git-for-each-repo(1)
> +====================
> +
> +NAME
> +----
> +git-for-each-repo - Run a Git command on a list of repositories
> +
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'git for-each-repo' --config=<config> [--] <arguments>
> +
> +
> +DESCRIPTION
> +-----------
> +Run a Git commands on a list of repositories. The arguments after the
> +known options or `--` indicator are used as the arguments for the Git
> +command.
> +
> +For example, we could run maintenance on each of a list of repositories
> +stored in a `maintenance.repo` config variable using
> +
> +-------------
> +git for-each-repo --config=maintenance.repo maintenance run
> +-------------
> +
> +This will run `git -C <repo> maintenance run` for each value `<repo>`
> +in the multi-valued config variable `maintenance.repo`.
> +
> +
> +OPTIONS
> +-------
> +--config=<config>::
> +	Use the given config variable as a multi-valued list storing
> +	absolute path names. Iterate on that list of paths to run
> +	the given arguments.
> ++
> +These config values are loaded from system, global, and local Git config,
> +as available. If `git for-each-repo` is run in a directory that is not a
> +Git repository, then only the system and global config is used.
> +
> +GIT
> +---
> +Part of the linkgit:git[1] suite
> diff --git a/Makefile b/Makefile
> index 65f8cfb236..7c588ff036 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1071,6 +1071,7 @@ BUILTIN_OBJS += builtin/fetch-pack.o
>  BUILTIN_OBJS += builtin/fetch.o
>  BUILTIN_OBJS += builtin/fmt-merge-msg.o
>  BUILTIN_OBJS += builtin/for-each-ref.o
> +BUILTIN_OBJS += builtin/for-each-repo.o
>  BUILTIN_OBJS += builtin/fsck.o
>  BUILTIN_OBJS += builtin/gc.o
>  BUILTIN_OBJS += builtin/get-tar-commit-id.o
> diff --git a/builtin.h b/builtin.h
> index 17c1c0ce49..ff7c6e5aa9 100644
> --- a/builtin.h
> +++ b/builtin.h
> @@ -150,6 +150,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix);
>  int cmd_fetch_pack(int argc, const char **argv, const char *prefix);
>  int cmd_fmt_merge_msg(int argc, const char **argv, const char *prefix);
>  int cmd_for_each_ref(int argc, const char **argv, const char *prefix);
> +int cmd_for_each_repo(int argc, const char **argv, const char *prefix);
>  int cmd_format_patch(int argc, const char **argv, const char *prefix);
>  int cmd_fsck(int argc, const char **argv, const char *prefix);
>  int cmd_gc(int argc, const char **argv, const char *prefix);
> diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c
> new file mode 100644
> index 0000000000..5bba623ff1
> --- /dev/null
> +++ b/builtin/for-each-repo.c
> @@ -0,0 +1,58 @@
> +#include "cache.h"
> +#include "config.h"
> +#include "builtin.h"
> +#include "parse-options.h"
> +#include "run-command.h"
> +#include "string-list.h"
> +
> +static const char * const for_each_repo_usage[] = {
> +	N_("git for-each-repo --config=<config> <command-args>"),
> +	NULL
> +};
> +
> +static int run_command_on_repo(const char *path,
> +			       void *cbdata)
> +{
> +	int i;
> +	struct child_process child = CHILD_PROCESS_INIT;
> +	struct strvec *args = (struct strvec *)cbdata;
> +
> +	child.git_cmd = 1;
> +	strvec_pushl(&child.args, "-C", path, NULL);
> +
> +	for (i = 0; i < args->nr; i++)
> +		strvec_push(&child.args, args->v[i]);
> +
> +	return run_command(&child);
> +}
> +
> +int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
> +{
> +	static const char *config_key = NULL;
> +	int i, result = 0;
> +	const struct string_list *values;
> +	struct strvec args = STRVEC_INIT;
> +
> +	const struct option options[] = {
> +		OPT_STRING(0, "config", &config_key, N_("config"),
> +			   N_("config key storing a list of repository paths")),
> +		OPT_END()
> +	};
> +
> +	argc = parse_options(argc, argv, prefix, options, for_each_repo_usage,
> +			     PARSE_OPT_STOP_AT_NON_OPTION);
> +
> +	if (!config_key)
> +		die(_("missing --config=<config>"));
> +
> +	for (i = 0; i < argc; i++)
> +		strvec_push(&args, argv[i]);
> +
> +	values = repo_config_get_value_multi(the_repository,
> +					     config_key);
> +
> +	for (i = 0; !result && i < values->nr; i++)
> +		result = run_command_on_repo(values->items[i].string, &args);
> +
> +	return result;
> +}
> diff --git a/git.c b/git.c
> index 24f250d29a..1cab64b5d1 100644
> --- a/git.c
> +++ b/git.c
> @@ -511,6 +511,7 @@ static struct cmd_struct commands[] = {
>  	{ "fetch-pack", cmd_fetch_pack, RUN_SETUP | NO_PARSEOPT },
>  	{ "fmt-merge-msg", cmd_fmt_merge_msg, RUN_SETUP },
>  	{ "for-each-ref", cmd_for_each_ref, RUN_SETUP },
> +	{ "for-each-repo", cmd_for_each_repo, RUN_SETUP_GENTLY },
>  	{ "format-patch", cmd_format_patch, RUN_SETUP },
>  	{ "fsck", cmd_fsck, RUN_SETUP },
>  	{ "fsck-objects", cmd_fsck, RUN_SETUP },
> diff --git a/t/t0068-for-each-repo.sh b/t/t0068-for-each-repo.sh
> new file mode 100755
> index 0000000000..136b4ec839
> --- /dev/null
> +++ b/t/t0068-for-each-repo.sh
> @@ -0,0 +1,30 @@
> +#!/bin/sh
> +
> +test_description='git for-each-repo builtin'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'run based on configured value' '
> +	git init one &&
> +	git init two &&
> +	git init three &&
> +	git -C two commit --allow-empty -m "DID NOT RUN" &&
> +	git config run.key "$TRASH_DIRECTORY/one" &&
> +	git config --add run.key "$TRASH_DIRECTORY/three" &&
> +	git for-each-repo --config=run.key commit --allow-empty -m "ran" &&
> +	git -C one log -1 --pretty=format:%s >message &&
> +	grep ran message &&
> +	git -C two log -1 --pretty=format:%s >message &&
> +	! grep ran message &&
> +	git -C three log -1 --pretty=format:%s >message &&
> +	grep ran message &&
> +	git for-each-repo --config=run.key -- commit --allow-empty -m "ran again" &&
> +	git -C one log -1 --pretty=format:%s >message &&
> +	grep again message &&
> +	git -C two log -1 --pretty=format:%s >message &&
> +	! grep again message &&
> +	git -C three log -1 --pretty=format:%s >message &&
> +	grep again message
> +'
> +
> +test_done
> -- 
> gitgitgadget
> 

-- 
Danh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/7] maintenance: add --scheduled option and config
  2020-08-20 14:51   ` Đoàn Trần Công Danh
@ 2020-08-24 14:03     ` Derrick Stolee
  0 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee @ 2020-08-24 14:03 UTC (permalink / raw)
  To: Đoàn Trần Công Danh,
	Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

On 8/20/2020 10:51 AM, Đoàn Trần Công Danh wrote:
> On 2020-08-19 17:16:44+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
>> +static void fill_schedule_info(struct maintenance_task *task,
>> +			       const char *config_name,
>> +			       timestamp_t schedule_delay)
>> +{
>> +	timestamp_t now = approxidate("now");
> 
> I see this pattern in both previous patch and this patch,
> should we create a helper (if not exist) to get current timestamp
> instead, parsing "now" every now and then is not a good idea, in my
> very opinionated opinion.

Parsing "now" is not that much work, and it is done only once per
maintenance task. To make a helper that avoids these string comparisons
(specifically to avoid iterating through the "special" array in date.c)
is unlikely to be worth the effort and code duplication.

If you mean it would be good to use a macro here, then that would be
easy:

	#define approxidate_now() approxidate("now")

One important thing for using this over time(NULL) is that we really
want this to work with GIT_TEST_DATE_NOW.

>> +	char *value = NULL;
>> +	struct strbuf last_run = STRBUF_INIT;
>> +	int64_t previous_run;
>> +
>> +	strbuf_addf(&last_run, "maintenance.%s.lastrun", task->name);
>> +
>> +	if (git_config_get_string(last_run.buf, &value))
>> +		task->scheduled = 1;
>> +	else {
>> +		previous_run = git_config_int64(last_run.buf, value);
>> +		if (now >= previous_run + schedule_delay)
>> +			task->scheduled = 1;
>> +	}
>> +
>> +	free(value);
>> +	strbuf_release(&last_run);
>> +}
>> +
>>  static void initialize_task_config(void)
>>  {
>>  	int i;
>> @@ -1359,6 +1387,7 @@ static void initialize_task_config(void)
>>  
>>  	for (i = 0; i < TASK__COUNT; i++) {
>>  		int config_value;
>> +		char *config_str;
>>  
>>  		strbuf_setlen(&config_name, 0);
>>  		strbuf_addf(&config_name, "maintenance.%s.enabled",
>> @@ -1366,6 +1395,20 @@ static void initialize_task_config(void)
>>  
>>  		if (!git_config_get_bool(config_name.buf, &config_value))
>>  			tasks[i].enabled = config_value;
>> +
>> +		strbuf_setlen(&config_name, 0);
> 
> It looks like we have a simple and better named alias for this:
> 
> 	strbuf_reset(&config_name)
> 
> _reset has 400+ occurences in this code base, compare to 20 of _setlen

Makes sense. Thanks.

-Stolee


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 0/7] [RFC] Maintenance III: background maintenance
  2020-08-19 17:16 [PATCH 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
       [not found] ` <bdc27fa28ee70222ed3c7c9863746ace8ea835e4.1597857409.git.gitgitgadget@gmail.com>
@ 2020-08-25 18:39 ` Derrick Stolee via GitGitGadget
  2020-08-25 18:39   ` [PATCH v2 1/7] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
                     ` (7 more replies)
  2020-08-26 12:42 ` [PATCH 0/7] [RFC] Maintenance III: background maintenance Michal Suchánek
  8 siblings, 8 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:39 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee

This is based on v3 of Part II (ds/maintenance-part-2) [1].

[1] 
https://lore.kernel.org/git/pull.696.v3.git.1598380599.gitgitgadget@gmail.com/

This RFC is intended to show how I hope to integrate true background
maintenance into Git. As opposed to my original RFC [2], this entirely
integrates with cron (through crontab [-e|-l]) to launch maintenance
commands in the background.

[2] 
https://lore.kernel.org/git/pull.597.git.1585946894.gitgitgadget@gmail.com/

Some preliminary work is done to allow a new --scheduled option that
triggers enabled tasks only if they have not been run in some amount of
time. The timestamp of the previous run is stored in the 
maintenance.<task>.lastRun config value while the interval is stored in the 
maintenance.<task>.schedule config value.

A new for-each-repo builtin runs Git commands on every repo in a given list.
Currently, the list is stored as a config setting, allowing a new 
maintenance.repos config list to store the repositories registered for
background maintenance. Others may want to add a --file=<file> option for
their own workflows, but I focused on making this as simple as possible for
now.

The updates to the git maintenance builtin include new register/unregister 
subcommands and start/stop subcommands. The register subcommand initializes
the config while the start subcommand does everything register does plus 
update the cron table. The unregister and stop commands reverse this
process.

The very last patch is entirely optional. It sets a recommended schedule
based on my own experience with very large repositories. I'm open to other
suggestions, but these are ones that I think work well and don't cause a
"rewrite the world" scenario like running nightly 'gc' would do.

I've been testing this scenario on my macOS laptop for a while and my Linux
machine. I have modified my cron task to provide logging via trace2 so I can
see what's happening. A future direction here would be to add some
maintenance logs to the repository so we can track what is happening and
diagnose whether the maintenance strategy is working on real repos.

It could be helpful for contributors to suggest ways to configure certain
jobs to run "nightly" or "overnight on a weekend" instead of just "whenever
the last run was long enough ago." One way to do this would be to set the 
lastRun config to be at the time of day that the job should run. Another
option would be to make the cron table more complicated with multiple rows,
but I would prefer to avoid that option if there is a simpler mechanism.

Note: git maintenance (start|stop) only works on machines with cron by
design. The proper thing to do on Windows will come later. Perhaps this
command should be marked as unavailable on Windows somehow, or at least a
better error than "cron may not be available on your system". I did find
that that message is helpful sometimes: macOS worker agents for CI builds
typically do not have cron available.

Updates since RFC v1
====================

 * Some fallout from rewriting the option parsing in "Maintenance I"
   
   
 * This applies cleanly on v3 of "Maintenance II"
   
   
 * Several helpful feedback items from Đoàn Trần Công Danh are applied.
   
   
 * There is an unresolved comment around the use of approxidate("now").
   These calls are untouched from v1.
   
   

Thanks, -Stolee

Derrick Stolee (7):
  maintenance: optionally skip --auto process
  maintenance: store the "last run" time in config
  maintenance: add --scheduled option and config
  for-each-repo: run subcommands on configured repos
  maintenance: add [un]register subcommands
  maintenance: add start/stop subcommands
  maintenance: recommended schedule in register/start

 .gitignore                           |   1 +
 Documentation/config/maintenance.txt |  19 ++
 Documentation/git-for-each-repo.txt  |  59 ++++++
 Documentation/git-maintenance.txt    |  44 ++++-
 Makefile                             |   2 +
 builtin.h                            |   1 +
 builtin/for-each-repo.c              |  58 ++++++
 builtin/gc.c                         | 286 ++++++++++++++++++++++++++-
 command-list.txt                     |   1 +
 git.c                                |   1 +
 run-command.c                        |   6 +
 t/helper/test-crontab.c              |  35 ++++
 t/helper/test-tool.c                 |   1 +
 t/helper/test-tool.h                 |   1 +
 t/t0068-for-each-repo.sh             |  30 +++
 t/t7900-maintenance.sh               |  95 ++++++++-
 t/test-lib.sh                        |   6 +
 17 files changed, 640 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/git-for-each-repo.txt
 create mode 100644 builtin/for-each-repo.c
 create mode 100644 t/helper/test-crontab.c
 create mode 100755 t/t0068-for-each-repo.sh


base-commit: e9bb32f53ade2067f773bfe6e5c13ed1a5d694a6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-680%2Fderrickstolee%2Fmaintenance%2Fscheduled-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-680/derrickstolee/maintenance/scheduled-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/680

Range-diff vs v1:

 1:  90de25d128 ! 1:  5fdd8188b1 maintenance: optionally skip --auto process
     @@ run-command.c: int run_processes_parallel_tr2(int n, get_next_task_fn get_next_t
       	struct child_process maint = CHILD_PROCESS_INIT;
       
      +	if (!git_config_get_bool("maintenance.auto", &enabled) &&
     -+	    !enabled) {
     -+		    fprintf(stderr, "enabled: %d\n", enabled);
     ++	    !enabled)
      +		return 0;
     -+	    }
      +
       	maint.git_cmd = 1;
       	strvec_pushl(&maint.args, "maintenance", "run", "--auto", NULL);
 2:  bdc27fa28e ! 2:  e3ef0b9bea maintenance: store the "last run" time in config
     @@ builtin/gc.c: static int compare_tasks_by_selection(const void *a_, const void *
      +	strbuf_release(&value);
      +}
      +
     - static int maintenance_run(struct maintenance_opts *opts)
     + static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       {
       	int i, found_selected = 0;
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       		     !tasks[i].auto_condition()))
       			continue;
       
     @@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
       		if (tasks[i].fn(opts)) {
       			error(_("task '%s' failed"), tasks[i].name);
      
     - ## git-gvfs-helper (new) ##
     - Binary files /dev/null and git-gvfs-helper differ
     -
     - ## t/helper/test-gvfs-protocol (new) ##
     - Binary files /dev/null and t/helper/test-gvfs-protocol differ
     -
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto' '
       	done
 3:  4473c93b11 ! 3:  c728c57d85 maintenance: add --scheduled option and config
     @@ Documentation/git-maintenance.txt: OPTIONS
       	Do not report progress or other information over `stderr`.
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static const char * const builtin_maintenance_usage[] = {
     +@@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
     + }
     + 
     + static const char * const builtin_maintenance_run_usage[] = {
     +-	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>]"),
     ++	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>] [--scheduled]"),
     + 	NULL
     + };
       
     - struct maintenance_opts {
     + struct maintenance_run_opts {
       	int auto_flag;
      +	int scheduled;
       	int quiet;
     @@ builtin/gc.c: struct maintenance_task {
       
       	/* -1 if not selected. */
       	int selected_order;
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       		     !tasks[i].auto_condition()))
       			continue;
       
     @@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
       		update_last_run(&tasks[i]);
       
       		trace2_region_enter("maintenance", tasks[i].name, r);
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       	return result;
       }
       
     @@ builtin/gc.c: static void initialize_task_config(void)
       		int config_value;
      +		char *config_str;
       
     - 		strbuf_setlen(&config_name, 0);
     +-		strbuf_setlen(&config_name, 0);
     ++		strbuf_reset(&config_name);
       		strbuf_addf(&config_name, "maintenance.%s.enabled",
     -@@ builtin/gc.c: static void initialize_task_config(void)
     + 			    tasks[i].name);
       
       		if (!git_config_get_bool(config_name.buf, &config_value))
       			tasks[i].enabled = config_value;
      +
     -+		strbuf_setlen(&config_name, 0);
     ++		strbuf_reset(&config_name);
      +		strbuf_addf(&config_name, "maintenance.%s.schedule",
      +			    tasks[i].name);
      +
     @@ builtin/gc.c: static void initialize_task_config(void)
       	}
       
       	strbuf_release(&config_name);
     -@@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
     - 	struct option builtin_maintenance_options[] = {
     +@@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char *prefix)
     + 	struct option builtin_maintenance_run_options[] = {
       		OPT_BOOL(0, "auto", &opts.auto_flag,
       			 N_("run tasks based on the state of the repository")),
      +		OPT_BOOL(0, "scheduled", &opts.scheduled,
     @@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefi
       		OPT_BOOL(0, "quiet", &opts.quiet,
       			 N_("do not report progress or other information over stderr")),
       		OPT_CALLBACK_F(0, "task", NULL, N_("task"),
     -@@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
     - 			     builtin_maintenance_usage,
     - 			     PARSE_OPT_KEEP_UNKNOWN);
     +@@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char *prefix)
     + 			     builtin_maintenance_run_usage,
     + 			     PARSE_OPT_STOP_AT_NON_OPTION);
       
      +	if (opts.auto_flag + opts.scheduled > 1)
      +		die(_("use at most one of the --auto and --scheduled options"));
      +
     - 	if (argc != 1)
     - 		usage_with_options(builtin_maintenance_usage,
     - 				   builtin_maintenance_options);
     + 	if (argc != 0)
     + 		usage_with_options(builtin_maintenance_run_usage,
     + 				   builtin_maintenance_run_options);
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto' '
 4:  ccb667dc6f ! 4:  0314258c5c for-each-repo: run subcommands on configured repos
     @@ Documentation/git-for-each-repo.txt (new)
      +
      +DESCRIPTION
      +-----------
     -+Run a Git commands on a list of repositories. The arguments after the
     ++Run a Git command on a list of repositories. The arguments after the
      +known options or `--` indicator are used as the arguments for the Git
     -+command.
     ++subprocess.
     ++
     ++THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
      +
      +For example, we could run maintenance on each of a list of repositories
      +stored in a `maintenance.repo` config variable using
     @@ Documentation/git-for-each-repo.txt (new)
      +as available. If `git for-each-repo` is run in a directory that is not a
      +Git repository, then only the system and global config is used.
      +
     ++
     ++SUBPROCESS BEHAVIOR
     ++-------------------
     ++
     ++If any `git -C <repo> <arguments>` subprocess returns a non-zero exit code,
     ++then the `git for-each-repo` process returns that exit code without running
     ++more subprocesses.
     ++
     ++Each `git -C <repo> <arguments>` subprocess inherits the standard file
     ++descriptors `stdin`, `stdout`, and `stderr`.
     ++
     ++
      +GIT
      +---
      +Part of the linkgit:git[1] suite
     @@ builtin/for-each-repo.c (new)
      +	return result;
      +}
      
     + ## command-list.txt ##
     +@@ command-list.txt: git-fetch-pack                          synchingrepositories
     + git-filter-branch                       ancillarymanipulators
     + git-fmt-merge-msg                       purehelpers
     + git-for-each-ref                        plumbinginterrogators
     ++git-for-each-repo                       plumbinginterrogators
     + git-format-patch                        mainporcelain
     + git-fsck                                ancillaryinterrogators          complete
     + git-gc                                  mainporcelain
     +
       ## git.c ##
      @@ git.c: static struct cmd_struct commands[] = {
       	{ "fetch-pack", cmd_fetch_pack, RUN_SETUP | NO_PARSEOPT },
 5:  f44c6a0f20 ! 5:  c0ce1267a9 maintenance: add [un]register subcommands
     @@ Documentation/git-maintenance.txt: run::
       
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
     - }
     - 
     - static const char * const builtin_maintenance_usage[] = {
     --	N_("git maintenance run [<options>]"),
     -+	N_("git maintenance <subcommand> [<options>]"),
     - 	NULL
     - };
     - 
     -@@ builtin/gc.c: static int task_option_parse(const struct option *opt,
     - 	return 0;
     +@@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char *prefix)
     + 	return maintenance_run_tasks(&opts);
       }
       
     +-static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
      +static int maintenance_register(void)
      +{
      +	struct child_process config_set = CHILD_PROCESS_INIT;
     @@ builtin/gc.c: static int task_option_parse(const struct option *opt,
      +	return run_command(&config_unset);
      +}
      +
     ++static const char builtin_maintenance_usage[] =	N_("git maintenance <subcommand> [<options>]");
     + 
       int cmd_maintenance(int argc, const char **argv, const char *prefix)
       {
     - 	int i;
      @@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
     - 		usage_with_options(builtin_maintenance_usage,
     - 				   builtin_maintenance_options);
       
     -+	if (!strcmp(argv[0], "register"))
     + 	if (!strcmp(argv[1], "run"))
     + 		return maintenance_run(argc - 1, argv + 1, prefix);
     ++	if (!strcmp(argv[1], "register"))
      +		return maintenance_register();
     - 	if (!strcmp(argv[0], "run"))
     - 		return maintenance_run(&opts);
     -+	if (!strcmp(argv[0], "unregister"))
     ++	if (!strcmp(argv[1], "unregister"))
      +		return maintenance_unregister();
       
     - 	die(_("invalid subcommand: %s"), argv[0]);
     + 	die(_("invalid subcommand: %s"), argv[1]);
       }
      
       ## t/t7900-maintenance.sh ##
 6:  2442071fd0 ! 6:  8a7c34035a maintenance: add start/stop subcommands
     @@ builtin/gc.c: static int maintenance_unregister(void)
      +	return update_background_schedule(0);
      +}
      +
     + static const char builtin_maintenance_usage[] =	N_("git maintenance <subcommand> [<options>]");
     + 
       int cmd_maintenance(int argc, const char **argv, const char *prefix)
     - {
     - 	int i;
      @@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
     - 		return maintenance_register();
     - 	if (!strcmp(argv[0], "run"))
     - 		return maintenance_run(&opts);
     -+	if (!strcmp(argv[0], "start"))
     + 
     + 	if (!strcmp(argv[1], "run"))
     + 		return maintenance_run(argc - 1, argv + 1, prefix);
     ++	if (!strcmp(argv[1], "start"))
      +		return maintenance_start();
     -+	if (!strcmp(argv[0], "stop"))
     ++	if (!strcmp(argv[1], "stop"))
      +		return maintenance_stop();
     - 	if (!strcmp(argv[0], "unregister"))
     - 		return maintenance_unregister();
     - 
     + 	if (!strcmp(argv[1], "register"))
     + 		return maintenance_register();
     + 	if (!strcmp(argv[1], "unregister"))
      
       ## t/helper/test-crontab.c (new) ##
      @@
 7:  40b1a0546c ! 7:  9ecabeb055 maintenance: recommended schedule in register/start
     @@ Documentation/git-maintenance.txt: register::
       	Run one or more maintenance tasks. If one or more `--task` options
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int task_option_parse(const struct option *opt,
     - 	return 0;
     +@@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char *prefix)
     + 	return maintenance_run_tasks(&opts);
       }
       
      +static int has_schedule_config(void)

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 1/7] maintenance: optionally skip --auto process
  2020-08-25 18:39 ` [PATCH v2 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
@ 2020-08-25 18:39   ` Derrick Stolee via GitGitGadget
  2020-08-25 21:44     ` Junio C Hamano
  2020-08-25 18:39   ` [PATCH v2 2/7] maintenance: store the "last run" time in config Derrick Stolee via GitGitGadget
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:39 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Some commands run 'git maintenance run --auto --[no-]quiet' after doing
their normal work, as a way to keep repositories clean as they are used.
Currently, users who do not want this maintenance to occur would set the
'gc.auto' config option to 0 to avoid the 'gc' task from running.
However, this does not stop the extra process invocation. On Windows,
this extra process invocation can be more expensive than necessary.

Allow users to drop this extra process by setting 'maintenance.auto' to
'false'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt |  5 +++++
 run-command.c                        |  6 ++++++
 t/t7900-maintenance.sh               | 13 +++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index a0706d8f09..06db758172 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -1,3 +1,8 @@
+maintenance.auto::
+	This boolean config option controls whether some commands run
+	`git maintenance run --auto` after doing their normal work. Defaults
+	to true.
+
 maintenance.<task>.enabled::
 	This boolean config option controls whether the maintenance task
 	with name `<task>` is run when no `--task` option is specified to
diff --git a/run-command.c b/run-command.c
index 2ee59acdc8..ea4d0fb4b1 100644
--- a/run-command.c
+++ b/run-command.c
@@ -7,6 +7,7 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "quote.h"
+#include "config.h"
 
 void child_process_init(struct child_process *child)
 {
@@ -1868,8 +1869,13 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
 
 int run_auto_maintenance(int quiet)
 {
+	int enabled;
 	struct child_process maint = CHILD_PROCESS_INIT;
 
+	if (!git_config_get_bool("maintenance.auto", &enabled) &&
+	    !enabled)
+		return 0;
+
 	maint.git_cmd = 1;
 	strvec_pushl(&maint.args, "maintenance", "run", "--auto", NULL);
 	strvec_push(&maint.args, quiet ? "--quiet" : "--no-quiet");
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 6f878b0141..e0ba19e1ff 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -26,6 +26,19 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'maintenance.auto config option' '
+	GIT_TRACE2_EVENT="$(pwd)/default" git commit --quiet --allow-empty -m 1 &&
+	test_subcommand git maintenance run --auto --quiet <default &&
+	GIT_TRACE2_EVENT="$(pwd)/true" \
+		git -c maintenance.auto=true \
+		commit --quiet --allow-empty -m 2 &&
+	test_subcommand git maintenance run --auto --quiet  <true &&
+	GIT_TRACE2_EVENT="$(pwd)/false" \
+		git -c maintenance.auto=false \
+		commit --quiet --allow-empty -m 3 &&
+	test_subcommand ! git maintenance run --auto --quiet  <false
+'
+
 test_expect_success 'maintenance.<task>.enabled' '
 	git config maintenance.gc.enabled false &&
 	git config maintenance.commit-graph.enabled true &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 2/7] maintenance: store the "last run" time in config
  2020-08-25 18:39 ` [PATCH v2 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
  2020-08-25 18:39   ` [PATCH v2 1/7] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
@ 2020-08-25 18:39   ` Derrick Stolee via GitGitGadget
  2020-08-25 21:52     ` Junio C Hamano
  2020-08-25 18:40   ` [PATCH v2 3/7] maintenance: add --scheduled option and config Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:39 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Users may want to run certain maintenance tasks only so often. Update
the local config with a new 'maintenance.<task>.lastRun' config option
that stores the timestamp just before running the maintenance task.

I selected the timestamp before the task, as opposed to after the task,
for a couple reasons:

 1. The time the task takes to execute should not contribute to the
    interval between running the tasks. If a daily task takes 10 minutes
    to run, then every day the execution will drift by at least 10
    minutes.

 2. If the task fails for some unforseen reason, it would be good to
    indicate that we _attempted_ the task at a certain timestamp. This
    will avoid spamming a repository that is in a bad state.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt |  5 +++++
 builtin/gc.c                         | 16 ++++++++++++++++
 t/t7900-maintenance.sh               | 10 ++++++++++
 3 files changed, 31 insertions(+)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 06db758172..8dd34169da 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -10,6 +10,11 @@ maintenance.<task>.enabled::
 	`--task` option exists. By default, only `maintenance.gc.enabled`
 	is true.
 
+maintenance.<task>.lastRun::
+	This config value is automatically updated by Git when the task
+	`<task>` is run. It stores a timestamp representing the most-recent
+	run of the `<task>`.
+
 maintenance.commit-graph.auto::
 	This integer config option controls how often the `commit-graph` task
 	should be run as part of `git maintenance run --auto`. If zero, then
diff --git a/builtin/gc.c b/builtin/gc.c
index f8459df04c..fb6f231a5c 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1212,6 +1212,20 @@ static int compare_tasks_by_selection(const void *a_, const void *b_)
 	return b->selected_order - a->selected_order;
 }
 
+static void update_last_run(struct maintenance_task *task)
+{
+	timestamp_t now = approxidate("now");
+	struct strbuf config = STRBUF_INIT;
+	struct strbuf value = STRBUF_INIT;
+	strbuf_addf(&config, "maintenance.%s.lastrun", task->name);
+	strbuf_addf(&value, "%"PRItime"", now);
+
+	git_config_set(config.buf, value.buf);
+
+	strbuf_release(&config);
+	strbuf_release(&value);
+}
+
 static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 {
 	int i, found_selected = 0;
@@ -1254,6 +1268,8 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		     !tasks[i].auto_condition()))
 			continue;
 
+		update_last_run(&tasks[i]);
+
 		trace2_region_enter("maintenance", tasks[i].name, r);
 		if (tasks[i].fn(opts)) {
 			error(_("task '%s' failed"), tasks[i].name);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index e0ba19e1ff..a985ce3674 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -264,4 +264,14 @@ test_expect_success 'maintenance.incremental-repack.auto' '
 	done
 '
 
+test_expect_success 'tasks update maintenance.<task>.lastRun' '
+	git config --unset maintenance.commit-graph.lastrun &&
+	GIT_TRACE2_EVENT="$(pwd)/run.txt" \
+		GIT_TEST_DATE_NOW=1595000000 \
+		git maintenance run --task=commit-graph 2>/dev/null &&
+	test_subcommand git commit-graph write --split --reachable \
+		--no-progress <run.txt &&
+	test_cmp_config 1595000000 maintenance.commit-graph.lastrun
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 3/7] maintenance: add --scheduled option and config
  2020-08-25 18:39 ` [PATCH v2 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
  2020-08-25 18:39   ` [PATCH v2 1/7] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
  2020-08-25 18:39   ` [PATCH v2 2/7] maintenance: store the "last run" time in config Derrick Stolee via GitGitGadget
@ 2020-08-25 18:40   ` Derrick Stolee via GitGitGadget
  2020-08-25 22:01     ` Junio C Hamano
  2020-08-25 18:40   ` [PATCH v2 4/7] for-each-repo: run subcommands on configured repos Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:40 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A user may want to run certain maintenance tasks based on frequency, not
conditions given in the repository. For example, the user may want to
perform a 'prefetch' task every hour, or 'gc' task every day. To assist,
update the 'git maintenance run --scheduled' command to check the config
for the last run of that task and add a number of seconds. The task
would then run only if the current time is beyond that minimum
timestamp.

Add a '--scheduled' option to 'git maintenance run' to only run tasks
that have had enough time pass since their last run. This is done for
each enabled task by checking if the current timestamp is at least as
large as the sum of 'maintenance.<task>.lastRun' and
'maintenance.<task>.schedule' in the Git config. This second value is
new to this commit, storing a number of seconds intended between runs.

A user could then set up an hourly maintenance run with the following
cron table:

  0 * * * * git -C <repo> maintenance run --scheduled

Then, the user could configure the repository with the following config
values:

  maintenance.prefetch.schedule  3000
  maintenance.gc.schedule       86000

These numbers are slightly lower than one hour and one day (in seconds).
The cron schedule will enforce the hourly run rate, but we can use these
schedules to ensure the 'gc' task runs once a day. The error is given
because the *.lastRun config option is specified at the _start_ of the
task run. Otherwise, a slow task run could shift the "daily" job of 'gc'
from a 10:00pm run to 11:00pm run, or later.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt |  9 +++++
 Documentation/git-maintenance.txt    | 13 ++++++-
 builtin/gc.c                         | 54 ++++++++++++++++++++++++++--
 t/t7900-maintenance.sh               | 20 +++++++++++
 4 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 8dd34169da..caacacd322 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -15,6 +15,15 @@ maintenance.<task>.lastRun::
 	`<task>` is run. It stores a timestamp representing the most-recent
 	run of the `<task>`.
 
+maintenance.<task>.schedule::
+	This config option controls whether or not the given `<task>` runs
+	during a `git maintenance run --scheduled` command. If the option
+	is an integer value `S`, then the `<task>` is run when the current
+	time is `S` seconds after the timestamp stored in
+	`maintenance.<task>.lastRun`. If the option has no value or a
+	non-integer value, then the task will never run with the `--scheduled`
+	option.
+
 maintenance.commit-graph.auto::
 	This integer config option controls how often the `commit-graph` task
 	should be run as part of `git maintenance run --auto`. If zero, then
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index b44efb05a3..2bc02c65e4 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -107,7 +107,18 @@ OPTIONS
 	only if certain thresholds are met. For example, the `gc` task
 	runs when the number of loose objects exceeds the number stored
 	in the `gc.auto` config setting, or when the number of pack-files
-	exceeds the `gc.autoPackLimit` config setting.
+	exceeds the `gc.autoPackLimit` config setting. Not compatible with
+	the `--scheduled` option.
+
+--scheduled::
+	When combined with the `run` subcommand, run maintenance tasks
+	only if certain time conditions are met, as specified by the
+	`maintenance.<task>.schedule` config value for each `<task>`.
+	This config value specifies a number of seconds since the last
+	time that task ran, according to the `maintenance.<task>.lastRun`
+	config value. The tasks that are tested are those provided by
+	the `--task=<task>` option(s) or those with
+	`maintenance.<task>.enabled` set to true.
 
 --quiet::
 	Do not report progress or other information over `stderr`.
diff --git a/builtin/gc.c b/builtin/gc.c
index fb6f231a5c..5726a9a3b3 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -705,12 +705,13 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 }
 
 static const char * const builtin_maintenance_run_usage[] = {
-	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>]"),
+	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>] [--scheduled]"),
 	NULL
 };
 
 struct maintenance_run_opts {
 	int auto_flag;
+	int scheduled;
 	int quiet;
 };
 
@@ -1157,7 +1158,8 @@ struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
 	maintenance_auto_fn *auto_condition;
-	unsigned enabled:1;
+	unsigned enabled:1,
+		 scheduled:1;
 
 	/* -1 if not selected. */
 	int selected_order;
@@ -1268,6 +1270,9 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		     !tasks[i].auto_condition()))
 			continue;
 
+		if (opts->scheduled && !tasks[i].scheduled)
+			continue;
+
 		update_last_run(&tasks[i]);
 
 		trace2_region_enter("maintenance", tasks[i].name, r);
@@ -1282,6 +1287,29 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 	return result;
 }
 
+static void fill_schedule_info(struct maintenance_task *task,
+			       const char *config_name,
+			       timestamp_t schedule_delay)
+{
+	timestamp_t now = approxidate("now");
+	char *value = NULL;
+	struct strbuf last_run = STRBUF_INIT;
+	int64_t previous_run;
+
+	strbuf_addf(&last_run, "maintenance.%s.lastrun", task->name);
+
+	if (git_config_get_string(last_run.buf, &value))
+		task->scheduled = 1;
+	else {
+		previous_run = git_config_int64(last_run.buf, value);
+		if (now >= previous_run + schedule_delay)
+			task->scheduled = 1;
+	}
+
+	free(value);
+	strbuf_release(&last_run);
+}
+
 static void initialize_task_config(void)
 {
 	int i;
@@ -1290,13 +1318,28 @@ static void initialize_task_config(void)
 
 	for (i = 0; i < TASK__COUNT; i++) {
 		int config_value;
+		char *config_str;
 
-		strbuf_setlen(&config_name, 0);
+		strbuf_reset(&config_name);
 		strbuf_addf(&config_name, "maintenance.%s.enabled",
 			    tasks[i].name);
 
 		if (!git_config_get_bool(config_name.buf, &config_value))
 			tasks[i].enabled = config_value;
+
+		strbuf_reset(&config_name);
+		strbuf_addf(&config_name, "maintenance.%s.schedule",
+			    tasks[i].name);
+
+		if (!git_config_get_string(config_name.buf, &config_str)) {
+			timestamp_t schedule_delay = git_config_int64(
+							config_name.buf,
+							config_str);
+			fill_schedule_info(&tasks[i],
+						config_name.buf,
+						schedule_delay);
+			free(config_str);
+		}
 	}
 
 	strbuf_release(&config_name);
@@ -1340,6 +1383,8 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	struct option builtin_maintenance_run_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
+		OPT_BOOL(0, "scheduled", &opts.scheduled,
+			 N_("run tasks based on time intervals")),
 		OPT_BOOL(0, "quiet", &opts.quiet,
 			 N_("do not report progress or other information over stderr")),
 		OPT_CALLBACK_F(0, "task", NULL, N_("task"),
@@ -1360,6 +1405,9 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 			     builtin_maintenance_run_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
+	if (opts.auto_flag + opts.scheduled > 1)
+		die(_("use at most one of the --auto and --scheduled options"));
+
 	if (argc != 0)
 		usage_with_options(builtin_maintenance_run_usage,
 				   builtin_maintenance_run_options);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index a985ce3674..3e0c5f1ca8 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -264,6 +264,11 @@ test_expect_success 'maintenance.incremental-repack.auto' '
 	done
 '
 
+test_expect_success '--auto and --scheduled incompatible' '
+	test_must_fail git maintenance run --auto --scheduled 2>err &&
+	test_i18ngrep "at most one" err
+'
+
 test_expect_success 'tasks update maintenance.<task>.lastRun' '
 	git config --unset maintenance.commit-graph.lastrun &&
 	GIT_TRACE2_EVENT="$(pwd)/run.txt" \
@@ -274,4 +279,19 @@ test_expect_success 'tasks update maintenance.<task>.lastRun' '
 	test_cmp_config 1595000000 maintenance.commit-graph.lastrun
 '
 
+test_expect_success '--scheduled with specific time' '
+	git config maintenance.commit-graph.schedule 100 &&
+	GIT_TRACE2_EVENT="$(pwd)/too-soon.txt" \
+		GIT_TEST_DATE_NOW=1595000099 \
+		git maintenance run --scheduled 2>/dev/null &&
+	test_subcommand ! git commit-graph write --split --reachable \
+		--no-progress <too-soon.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/long-enough.txt" \
+		GIT_TEST_DATE_NOW=1595000100 \
+		git maintenance run --scheduled 2>/dev/null &&
+	test_subcommand git commit-graph write --split --reachable \
+		--no-progress <long-enough.txt &&
+	test_cmp_config 1595000100 maintenance.commit-graph.lastrun
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 4/7] for-each-repo: run subcommands on configured repos
  2020-08-25 18:39 ` [PATCH v2 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2020-08-25 18:40   ` [PATCH v2 3/7] maintenance: add --scheduled option and config Derrick Stolee via GitGitGadget
@ 2020-08-25 18:40   ` Derrick Stolee via GitGitGadget
  2020-08-25 22:19     ` Junio C Hamano
  2020-08-25 18:40   ` [PATCH v2 5/7] maintenance: add [un]register subcommands Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:40 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It can be helpful to store a list of repositories in global or system
config and then iterate Git commands on that list. Create a new builtin
that makes this process simple for experts. We will use this builtin to
run scheduled maintenance on all configured repositories in a future
change.

The test is very simple, but does highlight that the "--" argument is
optional.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                          |  1 +
 Documentation/git-for-each-repo.txt | 59 +++++++++++++++++++++++++++++
 Makefile                            |  1 +
 builtin.h                           |  1 +
 builtin/for-each-repo.c             | 58 ++++++++++++++++++++++++++++
 command-list.txt                    |  1 +
 git.c                               |  1 +
 t/t0068-for-each-repo.sh            | 30 +++++++++++++++
 8 files changed, 152 insertions(+)
 create mode 100644 Documentation/git-for-each-repo.txt
 create mode 100644 builtin/for-each-repo.c
 create mode 100755 t/t0068-for-each-repo.sh

diff --git a/.gitignore b/.gitignore
index a5808fa30d..5eb2a2be71 100644
--- a/.gitignore
+++ b/.gitignore
@@ -67,6 +67,7 @@
 /git-filter-branch
 /git-fmt-merge-msg
 /git-for-each-ref
+/git-for-each-repo
 /git-format-patch
 /git-fsck
 /git-fsck-objects
diff --git a/Documentation/git-for-each-repo.txt b/Documentation/git-for-each-repo.txt
new file mode 100644
index 0000000000..94bd19da26
--- /dev/null
+++ b/Documentation/git-for-each-repo.txt
@@ -0,0 +1,59 @@
+git-for-each-repo(1)
+====================
+
+NAME
+----
+git-for-each-repo - Run a Git command on a list of repositories
+
+
+SYNOPSIS
+--------
+[verse]
+'git for-each-repo' --config=<config> [--] <arguments>
+
+
+DESCRIPTION
+-----------
+Run a Git command on a list of repositories. The arguments after the
+known options or `--` indicator are used as the arguments for the Git
+subprocess.
+
+THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
+
+For example, we could run maintenance on each of a list of repositories
+stored in a `maintenance.repo` config variable using
+
+-------------
+git for-each-repo --config=maintenance.repo maintenance run
+-------------
+
+This will run `git -C <repo> maintenance run` for each value `<repo>`
+in the multi-valued config variable `maintenance.repo`.
+
+
+OPTIONS
+-------
+--config=<config>::
+	Use the given config variable as a multi-valued list storing
+	absolute path names. Iterate on that list of paths to run
+	the given arguments.
++
+These config values are loaded from system, global, and local Git config,
+as available. If `git for-each-repo` is run in a directory that is not a
+Git repository, then only the system and global config is used.
+
+
+SUBPROCESS BEHAVIOR
+-------------------
+
+If any `git -C <repo> <arguments>` subprocess returns a non-zero exit code,
+then the `git for-each-repo` process returns that exit code without running
+more subprocesses.
+
+Each `git -C <repo> <arguments>` subprocess inherits the standard file
+descriptors `stdin`, `stdout`, and `stderr`.
+
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 65f8cfb236..7c588ff036 100644
--- a/Makefile
+++ b/Makefile
@@ -1071,6 +1071,7 @@ BUILTIN_OBJS += builtin/fetch-pack.o
 BUILTIN_OBJS += builtin/fetch.o
 BUILTIN_OBJS += builtin/fmt-merge-msg.o
 BUILTIN_OBJS += builtin/for-each-ref.o
+BUILTIN_OBJS += builtin/for-each-repo.o
 BUILTIN_OBJS += builtin/fsck.o
 BUILTIN_OBJS += builtin/gc.o
 BUILTIN_OBJS += builtin/get-tar-commit-id.o
diff --git a/builtin.h b/builtin.h
index 17c1c0ce49..ff7c6e5aa9 100644
--- a/builtin.h
+++ b/builtin.h
@@ -150,6 +150,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix);
 int cmd_fetch_pack(int argc, const char **argv, const char *prefix);
 int cmd_fmt_merge_msg(int argc, const char **argv, const char *prefix);
 int cmd_for_each_ref(int argc, const char **argv, const char *prefix);
+int cmd_for_each_repo(int argc, const char **argv, const char *prefix);
 int cmd_format_patch(int argc, const char **argv, const char *prefix);
 int cmd_fsck(int argc, const char **argv, const char *prefix);
 int cmd_gc(int argc, const char **argv, const char *prefix);
diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c
new file mode 100644
index 0000000000..5bba623ff1
--- /dev/null
+++ b/builtin/for-each-repo.c
@@ -0,0 +1,58 @@
+#include "cache.h"
+#include "config.h"
+#include "builtin.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "string-list.h"
+
+static const char * const for_each_repo_usage[] = {
+	N_("git for-each-repo --config=<config> <command-args>"),
+	NULL
+};
+
+static int run_command_on_repo(const char *path,
+			       void *cbdata)
+{
+	int i;
+	struct child_process child = CHILD_PROCESS_INIT;
+	struct strvec *args = (struct strvec *)cbdata;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "-C", path, NULL);
+
+	for (i = 0; i < args->nr; i++)
+		strvec_push(&child.args, args->v[i]);
+
+	return run_command(&child);
+}
+
+int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
+{
+	static const char *config_key = NULL;
+	int i, result = 0;
+	const struct string_list *values;
+	struct strvec args = STRVEC_INIT;
+
+	const struct option options[] = {
+		OPT_STRING(0, "config", &config_key, N_("config"),
+			   N_("config key storing a list of repository paths")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, for_each_repo_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (!config_key)
+		die(_("missing --config=<config>"));
+
+	for (i = 0; i < argc; i++)
+		strvec_push(&args, argv[i]);
+
+	values = repo_config_get_value_multi(the_repository,
+					     config_key);
+
+	for (i = 0; !result && i < values->nr; i++)
+		result = run_command_on_repo(values->items[i].string, &args);
+
+	return result;
+}
diff --git a/command-list.txt b/command-list.txt
index 0e3204e7d1..581499be82 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -94,6 +94,7 @@ git-fetch-pack                          synchingrepositories
 git-filter-branch                       ancillarymanipulators
 git-fmt-merge-msg                       purehelpers
 git-for-each-ref                        plumbinginterrogators
+git-for-each-repo                       plumbinginterrogators
 git-format-patch                        mainporcelain
 git-fsck                                ancillaryinterrogators          complete
 git-gc                                  mainporcelain
diff --git a/git.c b/git.c
index 24f250d29a..1cab64b5d1 100644
--- a/git.c
+++ b/git.c
@@ -511,6 +511,7 @@ static struct cmd_struct commands[] = {
 	{ "fetch-pack", cmd_fetch_pack, RUN_SETUP | NO_PARSEOPT },
 	{ "fmt-merge-msg", cmd_fmt_merge_msg, RUN_SETUP },
 	{ "for-each-ref", cmd_for_each_ref, RUN_SETUP },
+	{ "for-each-repo", cmd_for_each_repo, RUN_SETUP_GENTLY },
 	{ "format-patch", cmd_format_patch, RUN_SETUP },
 	{ "fsck", cmd_fsck, RUN_SETUP },
 	{ "fsck-objects", cmd_fsck, RUN_SETUP },
diff --git a/t/t0068-for-each-repo.sh b/t/t0068-for-each-repo.sh
new file mode 100755
index 0000000000..136b4ec839
--- /dev/null
+++ b/t/t0068-for-each-repo.sh
@@ -0,0 +1,30 @@
+#!/bin/sh
+
+test_description='git for-each-repo builtin'
+
+. ./test-lib.sh
+
+test_expect_success 'run based on configured value' '
+	git init one &&
+	git init two &&
+	git init three &&
+	git -C two commit --allow-empty -m "DID NOT RUN" &&
+	git config run.key "$TRASH_DIRECTORY/one" &&
+	git config --add run.key "$TRASH_DIRECTORY/three" &&
+	git for-each-repo --config=run.key commit --allow-empty -m "ran" &&
+	git -C one log -1 --pretty=format:%s >message &&
+	grep ran message &&
+	git -C two log -1 --pretty=format:%s >message &&
+	! grep ran message &&
+	git -C three log -1 --pretty=format:%s >message &&
+	grep ran message &&
+	git for-each-repo --config=run.key -- commit --allow-empty -m "ran again" &&
+	git -C one log -1 --pretty=format:%s >message &&
+	grep again message &&
+	git -C two log -1 --pretty=format:%s >message &&
+	! grep again message &&
+	git -C three log -1 --pretty=format:%s >message &&
+	grep again message
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 5/7] maintenance: add [un]register subcommands
  2020-08-25 18:39 ` [PATCH v2 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2020-08-25 18:40   ` [PATCH v2 4/7] for-each-repo: run subcommands on configured repos Derrick Stolee via GitGitGadget
@ 2020-08-25 18:40   ` Derrick Stolee via GitGitGadget
  2020-08-25 18:40   ` [PATCH v2 6/7] maintenance: add start/stop subcommands Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:40 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In preparation for launching background maintenance from the 'git
maintenance' builtin, create register/unregister subcommands. These
commands update the new 'maintenance.repos' config option in the global
config so the background maintenance job knows which repositories to
maintain.

These commands allow users to add a repository to the background
maintenance list without disrupting the actual maintenance mechanism.

For example, a user can run 'git maintenance register' when no
background maintenance is running and it will not start the background
maintenance. A later update to start running background maintenance will
then pick up this repository automatically.

The opposite example is that a user can run 'git maintenance unregister'
to remove the current repository from background maintenance without
halting maintenance for other repositories.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt | 14 ++++++++
 builtin/gc.c                      | 55 ++++++++++++++++++++++++++++++-
 t/t7900-maintenance.sh            | 17 +++++++++-
 3 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 2bc02c65e4..c42a176a95 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -29,6 +29,15 @@ Git repository.
 SUBCOMMANDS
 -----------
 
+register::
+	Initialize Git config values so any scheduled maintenance will
+	start running on this repository. This adds the repository to the
+	`maintenance.repo` config variable in the current user's global
+	config and enables some recommended configuration values for
+	`maintenance.<task>.schedule`. The tasks that are enabled are safe
+	for running in the background without disrupting foreground
+	processes.
+
 run::
 	Run one or more maintenance tasks. If one or more `--task` options
 	are specified, then those tasks are run in that order. Otherwise,
@@ -36,6 +45,11 @@ run::
 	config options are true. By default, only `maintenance.gc.enabled`
 	is true.
 
+unregister::
+	Remove the current repository from background maintenance. This
+	only removes the repository from the configured list. It does not
+	stop the background maintenance processes from running.
+
 TASKS
 -----
 
diff --git a/builtin/gc.c b/builtin/gc.c
index 5726a9a3b3..5218d52cb7 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1414,7 +1414,56 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	return maintenance_run_tasks(&opts);
 }
 
-static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
+static int maintenance_register(void)
+{
+	struct child_process config_set = CHILD_PROCESS_INIT;
+	struct child_process config_get = CHILD_PROCESS_INIT;
+
+	/* There is no current repository, so skip registering it */
+	if (!the_repository || !the_repository->gitdir)
+		return 0;
+
+	config_get.git_cmd = 1;
+	strvec_pushl(&config_get.args, "config", "--global", "--get", "maintenance.repo",
+		     the_repository->worktree ? the_repository->worktree
+					      : the_repository->gitdir,
+			 NULL);
+	config_get.out = -1;
+
+	if (start_command(&config_get))
+		return error(_("failed to run 'git config'"));
+
+	/* We already have this value in our config! */
+	if (!finish_command(&config_get))
+		return 0;
+
+	config_set.git_cmd = 1;
+	strvec_pushl(&config_set.args, "config", "--add", "--global", "maintenance.repo",
+		     the_repository->worktree ? the_repository->worktree
+					      : the_repository->gitdir,
+		     NULL);
+
+	return run_command(&config_set);
+}
+
+static int maintenance_unregister(void)
+{
+	struct child_process config_unset = CHILD_PROCESS_INIT;
+
+	if (!the_repository || !the_repository->gitdir)
+		return error(_("no current repository to unregister"));
+
+	config_unset.git_cmd = 1;
+	strvec_pushl(&config_unset.args, "config", "--global", "--unset",
+		     "maintenance.repo",
+		     the_repository->worktree ? the_repository->worktree
+					      : the_repository->gitdir,
+		     NULL);
+
+	return run_command(&config_unset);
+}
+
+static const char builtin_maintenance_usage[] =	N_("git maintenance <subcommand> [<options>]");
 
 int cmd_maintenance(int argc, const char **argv, const char *prefix)
 {
@@ -1423,6 +1472,10 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 
 	if (!strcmp(argv[1], "run"))
 		return maintenance_run(argc - 1, argv + 1, prefix);
+	if (!strcmp(argv[1], "register"))
+		return maintenance_register();
+	if (!strcmp(argv[1], "unregister"))
+		return maintenance_unregister();
 
 	die(_("invalid subcommand: %s"), argv[1]);
 }
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 3e0c5f1ca8..b20ee2d542 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -9,7 +9,7 @@ GIT_TEST_MULTI_PACK_INDEX=0
 
 test_expect_success 'help text' '
 	test_expect_code 129 git maintenance -h 2>err &&
-	test_i18ngrep "usage: git maintenance run" err &&
+	test_i18ngrep "usage: git maintenance <subcommand>" err &&
 	test_expect_code 128 git maintenance barf 2>err &&
 	test_i18ngrep "invalid subcommand: barf" err
 '
@@ -294,4 +294,19 @@ test_expect_success '--scheduled with specific time' '
 	test_cmp_config 1595000100 maintenance.commit-graph.lastrun
 '
 
+test_expect_success 'register and unregister' '
+	test_when_finished git config --global --unset-all maintenance.repo &&
+	git config --global --add maintenance.repo /existing1 &&
+	git config --global --add maintenance.repo /existing2 &&
+	git config --global --get-all maintenance.repo >before &&
+	git maintenance register &&
+	git config --global --get-all maintenance.repo >actual &&
+	cp before after &&
+	pwd >>after &&
+	test_cmp after actual &&
+	git maintenance unregister &&
+	git config --global --get-all maintenance.repo >actual &&
+	test_cmp before actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 6/7] maintenance: add start/stop subcommands
  2020-08-25 18:39 ` [PATCH v2 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2020-08-25 18:40   ` [PATCH v2 5/7] maintenance: add [un]register subcommands Derrick Stolee via GitGitGadget
@ 2020-08-25 18:40   ` Derrick Stolee via GitGitGadget
  2020-08-25 18:40   ` [PATCH v2 7/7] maintenance: recommended schedule in register/start Derrick Stolee via GitGitGadget
  2020-08-28 15:45   ` [PATCH v3 0/6] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
  7 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:40 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Add new subcommands to 'git maintenance' that start or stop background
maintenance using 'cron', when available. This integration is as simple
as I could make it, barring some implementation complications.

For now, the background maintenance is scheduled to run hourly via the
following cron table row (ignore line breaks):

	0 * * * * $p/git --exec-path=$p
		for-each-repo --config=maintenance.repo
		maintenance run --scheduled

Future extensions may want to add more complex schedules or some form of
logging. For now, hourly runs seem frequent enough to satisfy the needs
of tasks like 'prefetch' without being so frequent that users would
complain about many no-op commands.

Here, "$p" is a placeholder for the path to the current Git executable.
This is critical for systems with multiple versions of Git.
Specifically, macOS has a system version at '/usr/bin/git' while the
version that users can install resides at '/usr/local/bin/git' (symlinked
to '/usr/local/libexec/git-core/git'). This will also use your
locally-built version if you build and run this in your development
environment without installing first.

The GIT_TEST_CRONTAB environment variable is not intended for users to
edit, but instead as a way to mock the 'crontab [-l]' command. This
variable is set in test-lib.sh to avoid a future test from accidentally
running anything with the cron integration from modifying the user's
schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our
tests to check how the schedule is modified in 'git maintenance
(start|stop)' commands.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  11 +++
 Makefile                          |   1 +
 builtin/gc.c                      | 117 ++++++++++++++++++++++++++++++
 t/helper/test-crontab.c           |  35 +++++++++
 t/helper/test-tool.c              |   1 +
 t/helper/test-tool.h              |   1 +
 t/t7900-maintenance.sh            |  30 ++++++++
 t/test-lib.sh                     |   6 ++
 8 files changed, 202 insertions(+)
 create mode 100644 t/helper/test-crontab.c

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index c42a176a95..d0316db5ae 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -45,6 +45,17 @@ run::
 	config options are true. By default, only `maintenance.gc.enabled`
 	is true.
 
+start::
+	Start running maintenance on the current repository. This performs
+	the same config updates as the `register` subcommand, then updates
+	the background scheduler to run `git maintenance run --scheduled`
+	on an hourly basis.
+
+stop::
+	Halt the background maintenance schedule. The current repository
+	is not removed from the list of maintained repositories, in case
+	the background maintenance is restarted later.
+
 unregister::
 	Remove the current repository from background maintenance. This
 	only removes the repository from the configured list. It does not
diff --git a/Makefile b/Makefile
index 7c588ff036..c39b39bd7d 100644
--- a/Makefile
+++ b/Makefile
@@ -690,6 +690,7 @@ TEST_BUILTINS_OBJS += test-advise.o
 TEST_BUILTINS_OBJS += test-bloom.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
+TEST_BUILTINS_OBJS += test-crontab.o
 TEST_BUILTINS_OBJS += test-ctype.o
 TEST_BUILTINS_OBJS += test-date.o
 TEST_BUILTINS_OBJS += test-delta.o
diff --git a/builtin/gc.c b/builtin/gc.c
index 5218d52cb7..d97af4e546 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -32,6 +32,7 @@
 #include "remote.h"
 #include "midx.h"
 #include "object-store.h"
+#include "exec-cmd.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -1463,6 +1464,118 @@ static int maintenance_unregister(void)
 	return run_command(&config_unset);
 }
 
+#define BEGIN_LINE "# BEGIN GIT MAINTENANCE SCHEDULE"
+#define END_LINE "# END GIT MAINTENANCE SCHEDULE"
+
+static int update_background_schedule(int run_maintenance)
+{
+	int result = 0;
+	int in_old_region = 0;
+	struct child_process crontab_list = CHILD_PROCESS_INIT;
+	struct child_process crontab_edit = CHILD_PROCESS_INIT;
+	FILE *cron_list, *cron_in;
+	const char *crontab_name;
+	struct strbuf line = STRBUF_INIT;
+	struct lock_file lk;
+	char *lock_path = xstrfmt("%s/schedule", the_repository->objects->odb->path);
+
+	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0)
+		return error(_("another process is scheduling background maintenance"));
+
+	crontab_name = getenv("GIT_TEST_CRONTAB");
+	if (!crontab_name)
+		crontab_name = "crontab";
+
+	strvec_split(&crontab_list.args, crontab_name);
+	strvec_push(&crontab_list.args, "-l");
+	crontab_list.in = -1;
+	crontab_list.out = dup(lk.tempfile->fd);
+	crontab_list.git_cmd = 0;
+
+	if (start_command(&crontab_list)) {
+		result = error(_("failed to run 'crontab -l'; your system might not support 'cron'"));
+		goto cleanup;
+	}
+
+	/* Ignore exit code, as an empty crontab will return error. */
+	finish_command(&crontab_list);
+
+	/*
+	 * Read from the .lock file, filtering out the old
+	 * schedule while appending the new schedule.
+	 */
+	cron_list = fdopen(lk.tempfile->fd, "r");
+	rewind(cron_list);
+
+	strvec_split(&crontab_edit.args, crontab_name);
+	crontab_edit.in = -1;
+	crontab_edit.git_cmd = 0;
+
+	if (start_command(&crontab_edit)) {
+		result = error(_("failed to run 'crontab'; your system might not support 'cron'"));
+		goto cleanup;
+	}
+
+	cron_in = fdopen(crontab_edit.in, "w");
+	if (!cron_in) {
+		result = error(_("failed to open stdin of 'crontab'"));
+		goto done_editing;
+	}
+
+	while (!strbuf_getline_lf(&line, cron_list)) {
+		if (!in_old_region && !strcmp(line.buf, BEGIN_LINE))
+			in_old_region = 1;
+		if (in_old_region)
+			continue;
+		fprintf(cron_in, "%s\n", line.buf);
+		if (in_old_region && !strcmp(line.buf, END_LINE))
+			in_old_region = 0;
+	}
+
+	if (run_maintenance) {
+		const char *exec_path = git_exec_path();
+
+		fprintf(cron_in, "\n%s\n", BEGIN_LINE);
+		fprintf(cron_in, "# The following schedule was created by Git\n");
+		fprintf(cron_in, "# Any edits made in this region might be\n");
+		fprintf(cron_in, "# replaced in the future by a Git command.\n\n");
+
+		fprintf(cron_in,
+			"0 * * * * \"%s/git\" --exec-path=\"%s\" for-each-repo --config=maintenance.repo maintenance run --scheduled\n",
+			exec_path, exec_path);
+
+		fprintf(cron_in, "\n%s\n", END_LINE);
+	}
+
+	fflush(cron_in);
+	fclose(cron_in);
+	close(crontab_edit.in);
+
+done_editing:
+	if (finish_command(&crontab_edit)) {
+		result = error(_("'crontab' died"));
+		goto cleanup;
+	}
+	fclose(cron_list);
+
+cleanup:
+	rollback_lock_file(&lk);
+	return result;
+}
+
+static int maintenance_start(void)
+{
+	if (maintenance_register())
+		warning(_("failed to add repo to global config"));
+
+	return update_background_schedule(1);
+}
+
+static int maintenance_stop(void)
+{
+	return update_background_schedule(0);
+}
+
 static const char builtin_maintenance_usage[] =	N_("git maintenance <subcommand> [<options>]");
 
 int cmd_maintenance(int argc, const char **argv, const char *prefix)
@@ -1472,6 +1585,10 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 
 	if (!strcmp(argv[1], "run"))
 		return maintenance_run(argc - 1, argv + 1, prefix);
+	if (!strcmp(argv[1], "start"))
+		return maintenance_start();
+	if (!strcmp(argv[1], "stop"))
+		return maintenance_stop();
 	if (!strcmp(argv[1], "register"))
 		return maintenance_register();
 	if (!strcmp(argv[1], "unregister"))
diff --git a/t/helper/test-crontab.c b/t/helper/test-crontab.c
new file mode 100644
index 0000000000..f5db6319c6
--- /dev/null
+++ b/t/helper/test-crontab.c
@@ -0,0 +1,35 @@
+#include "test-tool.h"
+#include "cache.h"
+
+/*
+ * Usage: test-tool cron <file> [-l]
+ *
+ * If -l is specified, then write the contents of <file> to stdou.
+ * Otherwise, write from stdin into <file>.
+ */
+int cmd__crontab(int argc, const char **argv)
+{
+	char a;
+	FILE *from, *to;
+
+	if (argc == 3 && !strcmp(argv[2], "-l")) {
+		from = fopen(argv[1], "r");
+		if (!from)
+			return 0;
+		to = stdout;
+	} else if (argc == 2) {
+		from = stdin;
+		to = fopen(argv[1], "w");
+	} else
+		return error("unknown arguments");
+
+	while ((a = fgetc(from)) != EOF)
+		fputc(a, to);
+
+	if (argc == 3)
+		fclose(from);
+	else
+		fclose(to);
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 590b2efca7..432b49d948 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -18,6 +18,7 @@ static struct test_cmd cmds[] = {
 	{ "bloom", cmd__bloom },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
+	{ "crontab", cmd__crontab },
 	{ "ctype", cmd__ctype },
 	{ "date", cmd__date },
 	{ "delta", cmd__delta },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ddc8e990e9..7c3281e071 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -8,6 +8,7 @@ int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
+int cmd__crontab(int argc, const char **argv);
 int cmd__ctype(int argc, const char **argv);
 int cmd__date(int argc, const char **argv);
 int cmd__delta(int argc, const char **argv);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index b20ee2d542..6491031be8 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -309,4 +309,34 @@ test_expect_success 'register and unregister' '
 	test_cmp before actual
 '
 
+test_expect_success 'start from empty cron table' '
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
+
+	# start registers the repo
+	git config --get --global maintenance.repo "$(pwd)" &&
+
+	grep "for-each-repo --config=maintenance.repo maintenance run --scheduled" cron.txt
+'
+
+test_expect_success 'stop from existing schedule' '
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance stop &&
+
+	# stop does not unregister the repo
+	git config --get --global maintenance.repo "$(pwd)" &&
+
+	# The newline is preserved
+	echo >empty &&
+	test_cmp empty cron.txt &&
+
+	# Operation is idempotent
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance stop &&
+	test_cmp empty cron.txt
+'
+
+test_expect_success 'start preserves existing schedule' '
+	echo "Important information!" >cron.txt &&
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
+	grep "Important information!" cron.txt
+'
+
 test_done
diff --git a/t/test-lib.sh b/t/test-lib.sh
index ef31f40037..4a60d1ed76 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1702,3 +1702,9 @@ test_lazy_prereq SHA1 '
 test_lazy_prereq REBASE_P '
 	test -z "$GIT_TEST_SKIP_REBASE_P"
 '
+
+# Ensure that no test accidentally triggers a Git command
+# that runs 'crontab', affecting a user's cron schedule.
+# Tests that verify the cron integration must set this locally
+# to avoid errors.
+GIT_TEST_CRONTAB="exit 1"
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 7/7] maintenance: recommended schedule in register/start
  2020-08-25 18:39 ` [PATCH v2 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2020-08-25 18:40   ` [PATCH v2 6/7] maintenance: add start/stop subcommands Derrick Stolee via GitGitGadget
@ 2020-08-25 18:40   ` Derrick Stolee via GitGitGadget
  2020-08-28 15:45   ` [PATCH v3 0/6] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
  7 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:40 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git maintenance (register|start)' subcommands add the current
repository to the global Git config so maintenance will operate on that
repository. It does not specify what maintenance should occur or how
often.

If a user sets any 'maintenance.<task>.scheduled' config value, then
they have chosen a specific schedule for themselves and Git should
respect that.

However, in an effort to recommend a good schedule for repositories of
all sizes, set new config values for recommended tasks that are safe to
run in the background while users run foreground Git commands. These
commands are generally everything but the 'gc' task.

Author's Note: I feel we should do _something_ to recommend a good
schedule to users, but I'm not 100% set on this schedule. This is the
schedule we use in Scalar and VFS for Git for very large repositories
using the GVFS protocol. While the schedule works in that environment,
it is possible that "normal" Git repositories could benefit from
something more obvious (such as running 'gc' once a day). However, this
patch gives us a place to start a conversation on what we should
recommend. For my purposes, Scalar will set these config values so we
can always differ from core Git's recommendations.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  6 +++++
 builtin/gc.c                      | 44 +++++++++++++++++++++++++++++++
 t/t7900-maintenance.sh            |  5 ++++
 3 files changed, 55 insertions(+)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index d0316db5ae..bba76f0b0d 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -37,6 +37,12 @@ register::
 	`maintenance.<task>.schedule`. The tasks that are enabled are safe
 	for running in the background without disrupting foreground
 	processes.
++
+If your repository has no 'maintenance.<task>.schedule' configuration
+values set, then Git will set configuration values to some recommended
+settings. These settings disable foreground maintenance while performing
+maintenance tasks in the background that will not interrupt foreground Git
+operations.
 
 run::
 	Run one or more maintenance tasks. If one or more `--task` options
diff --git a/builtin/gc.c b/builtin/gc.c
index d97af4e546..037402e47f 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1415,6 +1415,47 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	return maintenance_run_tasks(&opts);
 }
 
+static int has_schedule_config(void)
+{
+	int i, found = 0;
+	struct strbuf config_name = STRBUF_INIT;
+	size_t prefix;
+
+	strbuf_addstr(&config_name, "maintenance.");
+	prefix = config_name.len;
+
+	for (i = 0; !found && i < TASK__COUNT; i++) {
+		int value;
+
+		strbuf_setlen(&config_name, prefix);
+		strbuf_addf(&config_name, "%s.schedule", tasks[i].name);
+
+		if (!git_config_get_int(config_name.buf, &value))
+			found = 1;
+	}
+
+	strbuf_release(&config_name);
+	return found;
+}
+
+static void set_recommended_schedule(void)
+{
+	git_config_set("maintenance.auto", "false");
+	git_config_set("maintenance.gc.enabled", "false");
+
+	git_config_set("maintenance.prefetch.enabled", "true");
+	git_config_set("maintenance.prefetch.schedule", "3500");
+
+	git_config_set("maintenance.commit-graph.enabled", "true");
+	git_config_set("maintenance.commit-graph.schedule", "3500");
+
+	git_config_set("maintenance.loose-objects.enabled", "true");
+	git_config_set("maintenance.loose-objects.schedule", "86000");
+
+	git_config_set("maintenance.incremental-repack.enabled", "true");
+	git_config_set("maintenance.incremental-repack.schedule", "86000");
+}
+
 static int maintenance_register(void)
 {
 	struct child_process config_set = CHILD_PROCESS_INIT;
@@ -1424,6 +1465,9 @@ static int maintenance_register(void)
 	if (!the_repository || !the_repository->gitdir)
 		return 0;
 
+	if (has_schedule_config())
+		set_recommended_schedule();
+
 	config_get.git_cmd = 1;
 	strvec_pushl(&config_get.args, "config", "--global", "--get", "maintenance.repo",
 		     the_repository->worktree ? the_repository->worktree
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 6491031be8..7417e5858a 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -300,6 +300,11 @@ test_expect_success 'register and unregister' '
 	git config --global --add maintenance.repo /existing2 &&
 	git config --global --get-all maintenance.repo >before &&
 	git maintenance register &&
+	test_cmp_config false maintenance.auto &&
+	test_cmp_config false maintenance.gc.enabled &&
+	test_cmp_config true maintenance.prefetch.enabled &&
+	test_cmp_config 3500 maintenance.commit-graph.schedule &&
+	test_cmp_config 86000 maintenance.incremental-repack.schedule &&
 	git config --global --get-all maintenance.repo >actual &&
 	cp before after &&
 	pwd >>after &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/7] maintenance: optionally skip --auto process
  2020-08-25 18:39   ` [PATCH v2 1/7] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
@ 2020-08-25 21:44     ` Junio C Hamano
  2020-08-26 12:29       ` Derrick Stolee
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2020-08-25 21:44 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Some commands run 'git maintenance run --auto --[no-]quiet' after doing
> their normal work, as a way to keep repositories clean as they are used.
> Currently, users who do not want this maintenance to occur would set the
> 'gc.auto' config option to 0 to avoid the 'gc' task from running.
> However, this does not stop the extra process invocation.

OK, that is because the configuration is checked on the other side,
and the new check is implemented on this side before we decide to
spawn the maintenance task.

It sounds like a change worth having without even waiting for the
"git maintenance" to materialize ;-).

> @@ -1868,8 +1869,13 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
>  
>  int run_auto_maintenance(int quiet)
>  {
> +	int enabled;
>  	struct child_process maint = CHILD_PROCESS_INIT;
>  
> +	if (!git_config_get_bool("maintenance.auto", &enabled) &&
> +	    !enabled)
> +		return 0;

So in a repository without this configuration, get_bool would fail
and we do not short-circuit.  Only if the value get_bool sees is
false, we return without running the command.  Makes sense.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/7] maintenance: store the "last run" time in config
  2020-08-25 18:39   ` [PATCH v2 2/7] maintenance: store the "last run" time in config Derrick Stolee via GitGitGadget
@ 2020-08-25 21:52     ` Junio C Hamano
  2020-08-26 13:34       ` Derrick Stolee
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2020-08-25 21:52 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> I selected the timestamp before the task, as opposed to after the task,
> for a couple reasons:
>
>  1. The time the task takes to execute should not contribute to the
>     interval between running the tasks.

... as long as the run time is sufficiently shorter than the
interval, that is.  If a task takes 10-30 minutes depending on how
dirty the repository is, it does not make sense to even try to run
it every 15 minutes.

> If a daily task takes 10 minutes
>     to run, then every day the execution will drift by at least 10
>     minutes.

That is not incorrect per-se, but it does not tell us why drifting
by 10 minutes is a bad thing.

>  2. If the task fails for some unforseen reason, it would be good to
>     indicate that we _attempted_ the task at a certain timestamp. This
>     will avoid spamming a repository that is in a bad state.

Absolutely.

> +static void update_last_run(struct maintenance_task *task)
> +{
> +	timestamp_t now = approxidate("now");
> +	struct strbuf config = STRBUF_INIT;
> +	struct strbuf value = STRBUF_INIT;
> +	strbuf_addf(&config, "maintenance.%s.lastrun", task->name);
> +	strbuf_addf(&value, "%"PRItime"", now);

So is this essentially meant as a human-unreadable opaque value,
like we have in the commit object header lines?  I do not have a
strong opinion, but it would be nice to allow curious to casually
read it.  Perhaps "git config --type=timestamp maintenance.lastrun"
can be taught to pretty print its value?


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 3/7] maintenance: add --scheduled option and config
  2020-08-25 18:40   ` [PATCH v2 3/7] maintenance: add --scheduled option and config Derrick Stolee via GitGitGadget
@ 2020-08-25 22:01     ` Junio C Hamano
  2020-08-26 15:30       ` Derrick Stolee
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2020-08-25 22:01 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> A user may want to run certain maintenance tasks based on frequency, not
> conditions given in the repository. For example, the user may want to
> perform a 'prefetch' task every hour, or 'gc' task every day. To assist,
> update the 'git maintenance run --scheduled' command to check the config
> for the last run of that task and add a number of seconds. The task
> would then run only if the current time is beyond that minimum
> timestamp.
>
> Add a '--scheduled' option to 'git maintenance run' to only run tasks
> that have had enough time pass since their last run. This is done for
> each enabled task by checking if the current timestamp is at least as
> large as the sum of 'maintenance.<task>.lastRun' and
> 'maintenance.<task>.schedule' in the Git config. This second value is
> new to this commit, storing a number of seconds intended between runs.
>
> A user could then set up an hourly maintenance run with the following
> cron table:
>
>   0 * * * * git -C <repo> maintenance run --scheduled

The scheme has one obvious drawback.  An hourly crontab entry means
your maintenance.*.schedule that is finer grained than an hour
increment will not run as expected.  You'd need to take all the
schedule intervals and take their GCD to come up with the frequency
of the single crontab entry.  

Wouldn't it make more sense to have N crontab entries for N tasks
you want to run periodically, each with their own frequency
controlled by crontab?  That way, you do not need to maintain
maintenance.*.schedule configuration variables and the --scheduled
option.  It might make maintenance.*.lastrun timestamps unneeded,
which would be an added plus to simplify the system quite
drastically.  Most importantly, that would be the way crontab users
are most used to in order to schedule their periodical jobs, so it
is one less thing to learn.




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/7] for-each-repo: run subcommands on configured repos
  2020-08-25 18:40   ` [PATCH v2 4/7] for-each-repo: run subcommands on configured repos Derrick Stolee via GitGitGadget
@ 2020-08-25 22:19     ` Junio C Hamano
  2020-08-26 16:03       ` Derrick Stolee
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2020-08-25 22:19 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +SYNOPSIS
> +--------
> +[verse]
> +'git for-each-repo' --config=<config> [--] <arguments>
> + ...
> +--config=<config>::
> +	Use the given config variable as a multi-valued list storing
> +	absolute path names.

Would it make sense to allow this config to be read from the current
repository, I wonder.  It is probably designed to be written to
either ~/.gitconfig or /etc/gitconfig because it is probably a need
that is not per-repository to list repositories for various purposes
specified by the config key, but I suspect there _might_ be a good
use case for storing some custom list of repositories in the
configuration file local to a repository, but it is not quite
obvious what it is.

If we have a good example, we may want to spell it out---that would
help future readers who wonder about this (just like I am doing now).

Also, if we do read from local config, should there be a way to say
"ah, you may have read values from /etc/gitconfig and ~/.gitconfig,
but please forget them---I have a full list I care when you are
running in this repository", i.e. clear the list.  It is purely a
convention and there is no built-in mechanism for this in the config
API, but often it is signalled by giving an empty string as a value.

By the way, I do not have a good concrete suggestion, but can we use
something better than <config> as the placeholder?  I first thought
this was naming the name of a file that lists repositories, not the
config variable name in our usual config namespace.

> +static int run_command_on_repo(const char *path,
> +			       void *cbdata)

Is that on repo or in repo?  When I saw "-C" on the command line, I
immediately thought of "in repo".

> +{
> +	int i;
> +	struct child_process child = CHILD_PROCESS_INIT;
> +	struct strvec *args = (struct strvec *)cbdata;
> +
> +	child.git_cmd = 1;
> +	strvec_pushl(&child.args, "-C", path, NULL);
> +
> +	for (i = 0; i < args->nr; i++)
> +		strvec_push(&child.args, args->v[i]);

Would strvec_pushv() work, or is args->v[] not NULL terminated?

> +	return run_command(&child);
> +}


> +	values = repo_config_get_value_multi(the_repository,
> +					     config_key);

Not your fault, but it is a bit unsatisfactory that we do not have
special "type" meant for paths in the config API, unlike the
parse-options API where there is a "filename" type that is a bit
richer than a vanilla "string" type by allowing "prefix" handling.
For the purposes of this, as the values are limited to absolute/full
pathnames, it does not hurt as much, though.

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/7] maintenance: optionally skip --auto process
  2020-08-25 21:44     ` Junio C Hamano
@ 2020-08-26 12:29       ` Derrick Stolee
  2020-08-26 16:57         ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Derrick Stolee @ 2020-08-26 12:29 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	Derrick Stolee, Derrick Stolee

On 8/25/2020 5:44 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Some commands run 'git maintenance run --auto --[no-]quiet' after doing
>> their normal work, as a way to keep repositories clean as they are used.
>> Currently, users who do not want this maintenance to occur would set the
>> 'gc.auto' config option to 0 to avoid the 'gc' task from running.
>> However, this does not stop the extra process invocation.
> 
> OK, that is because the configuration is checked on the other side,
> and the new check is implemented on this side before we decide to
> spawn the maintenance task.
> 
> It sounds like a change worth having without even waiting for the
> "git maintenance" to materialize ;-).

True. This one could be pulled out of Part III and placed any time
after Part I (outside of test script context conflicts). The only
reason to wait until after Part I is that the name "maintenance.auto"
implies a connection to the maintenance builtin instead of gc.

Before the maintenance builtin, it would be natural to see "gc.auto=0"
and do this same behavior, but that will not be general enough in the
future.

If you prefer, I can pull this out into a series on its own to be
tracked separately.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/7] [RFC] Maintenance III: background maintenance
  2020-08-19 17:16 [PATCH 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2020-08-25 18:39 ` [PATCH v2 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
@ 2020-08-26 12:42 ` Michal Suchánek
  8 siblings, 0 replies; 41+ messages in thread
From: Michal Suchánek @ 2020-08-26 12:42 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	Derrick Stolee

On Wed, Aug 19, 2020 at 05:16:41PM +0000, Derrick Stolee via GitGitGadget wrote:
> This is based on ds/maintenance-part-2, but with some local updates to
> review feedback. It won't apply cleanly right now. This RFC is for early
> feedback and not intended to make a new tracking branch until v2.
> 
> This RFC is intended to show how I hope to integrate true background
> maintenance into Git. As opposed to my original RFC [1], this entirely
> integrates with cron (through crontab [-e|-l]) to launch maintenance
> commands in the background.
> 
> [1] 
> https://lore.kernel.org/git/pull.597.git.1585946894.gitgitgadget@gmail.com/
> 
> Some preliminary work is done to allow a new --scheduled option that
> triggers enabled tasks only if they have not been run in some amount of
> time. The timestamp of the previous run is stored in the 
> maintenance.<task>.lastRun config value while the interval is stored in the 
> maintenance.<task>.schedule config value.
This changes the config file from read-mostly to continuously updated. Is
that desirable?
In particular it significanly increases the risk of race with the user
editing the file.
I think timestamps are not configuration and should be written to some
other file.
Or is there already a core git feature that continuously updates the
config file?

Thanks

Michal

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/7] maintenance: store the "last run" time in config
  2020-08-25 21:52     ` Junio C Hamano
@ 2020-08-26 13:34       ` Derrick Stolee
  2020-08-26 17:03         ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Derrick Stolee @ 2020-08-26 13:34 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	Derrick Stolee, Derrick Stolee

On 8/25/2020 5:52 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> I selected the timestamp before the task, as opposed to after the task,
>> for a couple reasons:
>>
>>  1. The time the task takes to execute should not contribute to the
>>     interval between running the tasks.
> 
> ... as long as the run time is sufficiently shorter than the
> interval, that is.  If a task takes 10-30 minutes depending on how
> dirty the repository is, it does not make sense to even try to run
> it every 15 minutes.

Definitely. The lock on the object database from earlier prevents these
longer-than-anticipated tasks from stacking.

>> If a daily task takes 10 minutes
>>     to run, then every day the execution will drift by at least 10
>>     minutes.
> 
> That is not incorrect per-se, but it does not tell us why drifting
> by 10 minutes is a bad thing.

True.

>>  2. If the task fails for some unforseen reason, it would be good to
>>     indicate that we _attempted_ the task at a certain timestamp. This
>>     will avoid spamming a repository that is in a bad state.
> 
> Absolutely.
> 
>> +static void update_last_run(struct maintenance_task *task)
>> +{
>> +	timestamp_t now = approxidate("now");
>> +	struct strbuf config = STRBUF_INIT;
>> +	struct strbuf value = STRBUF_INIT;
>> +	strbuf_addf(&config, "maintenance.%s.lastrun", task->name);
>> +	strbuf_addf(&value, "%"PRItime"", now);
> 
> So is this essentially meant as a human-unreadable opaque value,
> like we have in the commit object header lines?  I do not have a
> strong opinion, but it would be nice to allow curious to casually
> read it.  Perhaps "git config --type=timestamp maintenance.lastrun"
> can be taught to pretty print its value?

Good idea. I will think on this. Of course, we already have
--type=expiry-date, which does the opposite. Perhaps this config
value should be a human-readable date and then be parsed into a
timestamp in-process using git_config_expiry_date()?

I have mixed feelings on using that format, because it can store
both a fixed or relative datetime. The *.lastRun config really
wants a _fixed_ datetime, but the *.schedule config (in the next
patch) would want a _relative_ datetime. This also allows things
like "now" or "never", so it presents a lot of flexibility for
users. A nightmare to test, but perhaps that flexibility is
useful.

(Of course, in another thread you mentioned multiple `crontab`
lines, which might make this entire discussion irrelevant. I'll
follow up there.)

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 3/7] maintenance: add --scheduled option and config
  2020-08-25 22:01     ` Junio C Hamano
@ 2020-08-26 15:30       ` Derrick Stolee
  2020-08-27 15:47         ` Derrick Stolee
  0 siblings, 1 reply; 41+ messages in thread
From: Derrick Stolee @ 2020-08-26 15:30 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	Derrick Stolee, Derrick Stolee

On 8/25/2020 6:01 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> A user may want to run certain maintenance tasks based on frequency, not
>> conditions given in the repository. For example, the user may want to
>> perform a 'prefetch' task every hour, or 'gc' task every day. To assist,
>> update the 'git maintenance run --scheduled' command to check the config
>> for the last run of that task and add a number of seconds. The task
>> would then run only if the current time is beyond that minimum
>> timestamp.
>>
>> Add a '--scheduled' option to 'git maintenance run' to only run tasks
>> that have had enough time pass since their last run. This is done for
>> each enabled task by checking if the current timestamp is at least as
>> large as the sum of 'maintenance.<task>.lastRun' and
>> 'maintenance.<task>.schedule' in the Git config. This second value is
>> new to this commit, storing a number of seconds intended between runs.
>>
>> A user could then set up an hourly maintenance run with the following
>> cron table:
>>
>>   0 * * * * git -C <repo> maintenance run --scheduled
> 
> The scheme has one obvious drawback.  An hourly crontab entry means
> your maintenance.*.schedule that is finer grained than an hour
> increment will not run as expected.  You'd need to take all the
> schedule intervals and take their GCD to come up with the frequency
> of the single crontab entry.  

My intention for the *.schedule is that it is not an _exact_ frequency,
but instead a lower bound on the frequency. That can be shelved for now
as we discuss this setup:

> Wouldn't it make more sense to have N crontab entries for N tasks
> you want to run periodically, each with their own frequency
> controlled by crontab?  That way, you do not need to maintain
> maintenance.*.schedule configuration variables and the --scheduled
> option.  It might make maintenance.*.lastrun timestamps unneeded,
> which would be an added plus to simplify the system quite
> drastically.  Most importantly, that would be the way crontab users
> are most used to in order to schedule their periodical jobs, so it
> is one less thing to learn.

I had briefly considered setting up crontab entries for each task
(and possibly each repo) but ended up with these complications:

 1. Maintenance frequency differs by task, so we need to split the
    crontab by task. But we can't just split everything because we
    do not want multiple tasks running at the same time on one
    repository. We would need to group the tasks and have one entry
    saying "git maintenance run --task=<task1> --task=<task2> ..."
    for all tasks in the group.

 2. Different repositories might want different tasks at different
    frequencies, so we might need to split the crontab by repository.
    Again, we likely want to group repositories by these frequencies
    because a user could have 100 registered repositories and we don't
    really want to launch 100 parallel processes.

 3. If we want to stop maintenance, then restart it, we need to
    clear the crontab and repopulate it, which would require iterating
    through all "registered" repositories to read their config for
    frequencies.

 4. On macOS, editing the crontab doesn't require "sudo" but it _does_
    create a pop-up(!) to get permission from the user. It would be
    good to minimize how often we edit the crontab and instead use
    config edits to change frequencies.

With these things in mind, here is a suggested alternative design:

Let users specify a schedule frequency among this list: hourly, daily,
weekly, monthly. We then set the following* crontab:

	0 * * * * git for-each-repo --config=maintenance.repos maintenance run --scheduled=hourly
	0 0 * * * git for-each-repo --config=maintenance.repos maintenance run --scheduled=daily
	0 0 * * 0 git for-each-repo --config=maintenance.repos maintenance run --scheduled=weekly
	0 0 0 * * git for-each-repo --config=maintenance.repos maintenance run --scheduled=monthly

*Of course, there is some care around "$path/git --exec-path=$path"
that I drop for ease here.

Then, "git maintenance (start|stop)" can be just as simple as we have
now: write a fixed schedule every time.

The problem here is that cron will launch these processes in parallel,
and then our object-database lock will cause some to fail! If anyone
knows a simple way to tell cron "run hourly _except_ not at midnight"
then we could let the "daily" schedule also run the "hourly" jobs, for
instance. Hopefully that pattern could be extended to the weekly and
monthly collisions.

Alternatively, we could run every hour and then interpret from config
if the current "hour" matches one of the schedules ourselves. So, the
crontab would be this simple:

	0 * * * * git for-each-repo --config=maintenance.repos maintenance run --scheduled

and then we would internally decide "is this the midnight hour?" and
"is this the first day of the week?" and "is this the first day of the
month?" to detect if we should run the daily/weekly/monthly tasks. While
it adds more time-awareness into Git, it does avoid the parallel task
collisions. There are some concerns here related to long-running tasks
delaying sequential runs of "git -C <repo> maintenance run --scheduled"
causing the "is this the midnight hour?" queries to fail and having
nightly/weekly/monthly maintenance be skipped accidentally. This
motivates the *.lastRun config giving us some guarantee of _eventually_
running the tasks, just _not too frequently_.

I hope this launches a good discussion to help us find a good cron
schedule strategy. After we land on a suitable strategy, I'll summarize
all of these subtleties in the commit message for posterity.

Hopefully, the current way that I integrate with crontab and test that
integration (in PATCH 6/7) could also be reviewed in parallel with this
discussion. I'm very curious to see how that could be improved.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/7] for-each-repo: run subcommands on configured repos
  2020-08-25 22:19     ` Junio C Hamano
@ 2020-08-26 16:03       ` Derrick Stolee
  0 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee @ 2020-08-26 16:03 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	Derrick Stolee, Derrick Stolee

On 8/25/2020 6:19 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +SYNOPSIS
>> +--------
>> +[verse]
>> +'git for-each-repo' --config=<config> [--] <arguments>
>> + ...
>> +--config=<config>::
>> +	Use the given config variable as a multi-valued list storing
>> +	absolute path names.
> 
> Would it make sense to allow this config to be read from the current
> repository, I wonder.  It is probably designed to be written to
> either ~/.gitconfig or /etc/gitconfig because it is probably a need
> that is not per-repository to list repositories for various purposes
> specified by the config key, but I suspect there _might_ be a good
> use case for storing some custom list of repositories in the
> configuration file local to a repository, but it is not quite
> obvious what it is.
> 
> If we have a good example, we may want to spell it out---that would
> help future readers who wonder about this (just like I am doing now).
> 
> Also, if we do read from local config, should there be a way to say
> "ah, you may have read values from /etc/gitconfig and ~/.gitconfig,
> but please forget them---I have a full list I care when you are
> running in this repository", i.e. clear the list.  It is purely a
> convention and there is no built-in mechanism for this in the config
> API, but often it is signalled by giving an empty string as a value.

I guess I should test this, but if I ask for a multi-valued config,
will I not get _all_ of the results from /etc/gitconfig, ~/.gitconfig,
AND .git/config? That was my expectation, which is why I don't specify
"local" or "global" config anywhere in the discussion.

> By the way, I do not have a good concrete suggestion, but can we use
> something better than <config> as the placeholder?  I first thought
> this was naming the name of a file that lists repositories, not the
> config variable name in our usual config namespace.
Sure. How about "<key>"?

>> +static int run_command_on_repo(const char *path,
>> +			       void *cbdata)
> 
> Is that on repo or in repo?  When I saw "-C" on the command line, I
> immediately thought of "in repo".

"in" is better.

>> +{
>> +	int i;
>> +	struct child_process child = CHILD_PROCESS_INIT;
>> +	struct strvec *args = (struct strvec *)cbdata;
>> +
>> +	child.git_cmd = 1;
>> +	strvec_pushl(&child.args, "-C", path, NULL);
>> +
>> +	for (i = 0; i < args->nr; i++)
>> +		strvec_push(&child.args, args->v[i]);
> 
> Would strvec_pushv() work, or is args->v[] not NULL terminated?

Yeah, pushv should work.

>> +	return run_command(&child);
>> +}
> 
> 
>> +	values = repo_config_get_value_multi(the_repository,
>> +					     config_key);
> 
> Not your fault, but it is a bit unsatisfactory that we do not have
> special "type" meant for paths in the config API, unlike the
> parse-options API where there is a "filename" type that is a bit
> richer than a vanilla "string" type by allowing "prefix" handling.
> For the purposes of this, as the values are limited to absolute/full
> pathnames, it does not hurt as much, though.

Interesting. Noted.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/7] maintenance: optionally skip --auto process
  2020-08-26 12:29       ` Derrick Stolee
@ 2020-08-26 16:57         ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2020-08-26 16:57 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, sandals, steadmon, jrnieder,
	peff, congdanhqx, phillip.wood123, emilyshaffer, sluongng,
	jonathantanmy, Derrick Stolee, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> Before the maintenance builtin, it would be natural to see "gc.auto=0"
> and do this same behavior, but that will not be general enough in the
> future.

I do think it is an excellent change to move the check done in
the need_to_gc() to check the value of gc.auto to run_auto_gc()
for exactly the reason the log message of this patch gives.  And
after the function is renamed to run_auto_maintenance(), at some
point the variable that gets checked would also be updated, and
we'd eventually reach the same state, I would think.

But it is so small a change that it probably is not worth the
book-keeping burden of remembering that the maintenance topic needs
to build on the patch to update auto-gc.  

> If you prefer, I can pull this out into a series on its own to be
> tracked separately.

So let's leave it as-is.

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/7] maintenance: store the "last run" time in config
  2020-08-26 13:34       ` Derrick Stolee
@ 2020-08-26 17:03         ` Junio C Hamano
  2020-08-27 13:02           ` Derrick Stolee
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2020-08-26 17:03 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, sandals, steadmon, jrnieder,
	peff, congdanhqx, phillip.wood123, emilyshaffer, sluongng,
	jonathantanmy, Derrick Stolee, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

>>>  1. The time the task takes to execute should not contribute to the
>>>     interval between running the tasks.
>> 
>> ... as long as the run time is sufficiently shorter than the
>> interval, that is.  If a task takes 10-30 minutes depending on how
>> dirty the repository is, it does not make sense to even try to run
>> it every 15 minutes.
>
> Definitely. The lock on the object database from earlier prevents these
> longer-than-anticipated tasks from stacking.

Hmph, I actually was (anticipating|hoping) that you would give a
good argument for having maintenance subsystem in change of
scheduling rather than cron, as it can monitor how the already
running job is goind and skip one cycle if needed.  The above is
instead a good argument that independent cron jobs can still
coordinate and there is no central and custom scheduler in the form
of 'maintenance run'.

>>>  2. If the task fails for some unforseen reason, it would be good to
>>>     indicate that we _attempted_ the task at a certain timestamp. This
>>>     will avoid spamming a repository that is in a bad state.
>> 
>> Absolutely.

Somebody already mentioned that using the configuration file for
recordkeeping may not be a good idea, and I tend to agree, by the
way.  I may want to periodically take a snapshot of my configuration
to notice and remember changes I made myself intentionally
(e.g. switched access method of a hosting site from ssh:// to
https://, added a new branch that builds on something else, etc.) by
comparing the snapshot with previous ones (and might even put it
under version-control) and mechanical noise would interfere with it.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/7] maintenance: store the "last run" time in config
  2020-08-26 17:03         ` Junio C Hamano
@ 2020-08-27 13:02           ` Derrick Stolee
  0 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee @ 2020-08-27 13:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, sandals, steadmon, jrnieder,
	peff, congdanhqx, phillip.wood123, emilyshaffer, sluongng,
	jonathantanmy, Derrick Stolee, Derrick Stolee

On 8/26/2020 1:03 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>>>>  1. The time the task takes to execute should not contribute to the
>>>>     interval between running the tasks.
>>>
>>> ... as long as the run time is sufficiently shorter than the
>>> interval, that is.  If a task takes 10-30 minutes depending on how
>>> dirty the repository is, it does not make sense to even try to run
>>> it every 15 minutes.
>>
>> Definitely. The lock on the object database from earlier prevents these
>> longer-than-anticipated tasks from stacking.
> 
> Hmph, I actually was (anticipating|hoping) that you would give a
> good argument for having maintenance subsystem in change of
> scheduling rather than cron, as it can monitor how the already
> running job is goind and skip one cycle if needed.  The above is
> instead a good argument that independent cron jobs can still
> coordinate and there is no central and custom scheduler in the form
> of 'maintenance run'.

While the lock does prevent concurrent 'maintenance run' commands
from colliding and causing unpredictable behavior as they both
modify the object database, this does not help ensure that maintenance
tasks actually happen if certain tasks are fired independently by
cron and consistently collide.

This is the main motivation for me using a single crontab entry.
More discussion of all of the tradeoffs is in [1].

[1] https://lore.kernel.org/git/bd4e18b7-6265-73e7-bc1a-a7d647eafd0a@gmail.com/

>>>>  2. If the task fails for some unforseen reason, it would be good to
>>>>     indicate that we _attempted_ the task at a certain timestamp. This
>>>>     will avoid spamming a repository that is in a bad state.
>>>
>>> Absolutely.
> 
> Somebody already mentioned that using the configuration file for
> recordkeeping may not be a good idea, and I tend to agree, by the
> way.  I may want to periodically take a snapshot of my configuration
> to notice and remember changes I made myself intentionally
> (e.g. switched access method of a hosting site from ssh:// to
> https://, added a new branch that builds on something else, etc.) by
> comparing the snapshot with previous ones (and might even put it
> under version-control) and mechanical noise would interfere with it.
 
I will think of another way to handle this, then. If we cannot infer
that "this task was launched, therefore it is due to run" from an
optimal cron schedule, then I'll probably create a new file in the
.git repository that stores these values. That file would be in the
config format to make parsing easy.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 3/7] maintenance: add --scheduled option and config
  2020-08-26 15:30       ` Derrick Stolee
@ 2020-08-27 15:47         ` Derrick Stolee
  0 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee @ 2020-08-27 15:47 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	Derrick Stolee, Derrick Stolee, Jeff Hostetler

On 8/26/2020 11:30 AM, Derrick Stolee wrote:
> Let users specify a schedule frequency among this list: hourly, daily,
> weekly, monthly. We then set the following* crontab:
> 
> 	0 * * * * git for-each-repo --config=maintenance.repos maintenance run --scheduled=hourly
> 	0 0 * * * git for-each-repo --config=maintenance.repos maintenance run --scheduled=daily
> 	0 0 * * 0 git for-each-repo --config=maintenance.repos maintenance run --scheduled=weekly
> 	0 0 0 * * git for-each-repo --config=maintenance.repos maintenance run --scheduled=monthly
> 
> *Of course, there is some care around "$path/git --exec-path=$path"
> that I drop for ease here.

Jeff Hostetler pointed out the following details in the crontab
documentation [1]:

 Ranges of numbers are allowed.  Ranges are two numbers separated with
 a hyphen.  The specified range is inclusive.  For example, 8-11 for
 an 'hours' entry specifies execution at hours 8, 9, 10, and 11. The
 first number must be less than or equal to the second one.

[1] https://man7.org/linux/man-pages/man5/crontab.5.html

This means we could try this schedule:

 0 1-23 * * * git for-each-repo --config=maintenance.repos maintenance run --scheduled=hourly
 0 0 * * 1-6 git for-each-repo --config=maintenance.repos maintenance run --scheduled=daily
 0 0 1-30 * 0 git for-each-repo --config=maintenance.repos maintenance run --scheduled=weekly
 0 0 0 * * git for-each-repo --config=maintenance.repos maintenance run --scheduled=monthly

And it should behave this way:

 Run --scheduled=hourly every hour, except at midnight. This runs
 all "hourly" tasks.

 Run --scheduled=daily at midnight, except on Sunday. This runs all
 "hourly" and "daily" tasks.

 Run --scheduled=weekly at midnight Sunday, except on the first day
 of the month. This runs all "hourly", "daily", and "weekly" tasks.

 Run --scheduled=monthly at midnight on the first day of the month.
 This runs all scheduled tasks.

There is some subtlety between whether the "weekly" runs should be a
subset of "monthly" and maybe the easiest way to handle that would
be to not support "monthly" and have only "hourly", "daily", and "weekly"
options for now.

This should get around all of the parallel issues and allow us to drop
the *.lastRun config option.

Thoughts?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 0/6] [RFC] Maintenance III: background maintenance
  2020-08-25 18:39 ` [PATCH v2 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2020-08-25 18:40   ` [PATCH v2 7/7] maintenance: recommended schedule in register/start Derrick Stolee via GitGitGadget
@ 2020-08-28 15:45   ` Derrick Stolee via GitGitGadget
  2020-08-28 15:45     ` [PATCH v3 1/6] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
                       ` (5 more replies)
  7 siblings, 6 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-28 15:45 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee

This is based on v3 of Part II (ds/maintenance-part-2) [1].

[1] 
https://lore.kernel.org/git/pull.696.v3.git.1598380599.gitgitgadget@gmail.com/

This RFC is intended to show how I hope to integrate true background
maintenance into Git. As opposed to my original RFC [2], this entirely
integrates with cron (through crontab [-e|-l]) to launch maintenance
commands in the background.

[2] 
https://lore.kernel.org/git/pull.597.git.1585946894.gitgitgadget@gmail.com/

Some preliminary work is done to allow a new --schedule option that tells
the command which tasks to run based on a maintenance.<task>.schedule config
option. The timing is not enforced by Git, but instead is expected to be
provided as a hint from a cron schedule.

A new for-each-repo builtin runs Git commands on every repo in a given list.
Currently, the list is stored as a config setting, allowing a new 
maintenance.repos config list to store the repositories registered for
background maintenance. Others may want to add a --file=<file> option for
their own workflows, but I focused on making this as simple as possible for
now.

The updates to the git maintenance builtin include new register/unregister 
subcommands and start/stop subcommands. The register subcommand initializes
the config while the start subcommand does everything register does plus 
update the cron table. The unregister and stop commands reverse this
process.

The very last patch is entirely optional. It sets a recommended schedule
based on my own experience with very large repositories. I'm open to other
suggestions, but these are ones that I think work well and don't cause a
"rewrite the world" scenario like running nightly 'gc' would do.

I've been testing this scenario on my macOS laptop for a while and my Linux
machine. I have modified my cron task to provide logging via trace2 so I can
see what's happening. A future direction here would be to add some
maintenance logs to the repository so we can track what is happening and
diagnose whether the maintenance strategy is working on real repos.

Note: git maintenance (start|stop) only works on machines with cron by
design. The proper thing to do on Windows will come later. Perhaps this
command should be marked as unavailable on Windows somehow, or at least a
better error than "cron may not be available on your system". I did find
that that message is helpful sometimes: macOS worker agents for CI builds
typically do not have cron available.

Updates since RFC v2
====================

 * Update the cron schedule with three lines saying "run hourly except at
   midnight", "run daily except on first day of week", and "run weekly".
   This avoids parallel processes competing for the object database lock.
   
   
 * Update the --schedule= and 'maintenance..schedule' config options. This
   is reflected in the recommended schedule at the end.
   
   
 * Drop the *.lastRun config option. It was going to trash config files but
   it is also not needed by the new cron schedule.
   
   

I expect this to be my final RFC version before restarting the thread with a
v1 next week. Please throw any and all critique at the plan here!

Updates since RFC v1
====================

 * Some fallout from rewriting the option parsing in "Maintenance I"
   
   
 * This applies cleanly on v3 of "Maintenance II"
   
   
 * Several helpful feedback items from Đoàn Trần Công Danh are applied.
   
   
 * There is an unresolved comment around the use of approxidate("now").
   These calls are untouched from v1.
   
   

Thanks, -Stolee

Cc: sandals@crustytoothpaste.net [sandals@crustytoothpaste.net], 
steadmon@google.com [steadmon@google.com], jrnieder@gmail.com
[jrnieder@gmail.com], peff@peff.net [peff@peff.net], congdanhqx@gmail.com
[congdanhqx@gmail.com], phillip.wood123@gmail.com
[phillip.wood123@gmail.com], emilyshaffer@google.com
[emilyshaffer@google.com], sluongng@gmail.com [sluongng@gmail.com], 
jonathantanmy@google.com [jonathantanmy@google.com]

Derrick Stolee (6):
  maintenance: optionally skip --auto process
  maintenance: add --schedule option and config
  for-each-repo: run subcommands on configured repos
  maintenance: add [un]register subcommands
  maintenance: add start/stop subcommands
  maintenance: recommended schedule in register/start

 .gitignore                           |   1 +
 Documentation/config/maintenance.txt |  10 +
 Documentation/git-for-each-repo.txt  |  59 ++++++
 Documentation/git-maintenance.txt    |  44 +++-
 Makefile                             |   2 +
 builtin.h                            |   1 +
 builtin/for-each-repo.c              |  58 ++++++
 builtin/gc.c                         | 292 ++++++++++++++++++++++++++-
 command-list.txt                     |   1 +
 git.c                                |   1 +
 run-command.c                        |   6 +
 t/helper/test-crontab.c              |  35 ++++
 t/helper/test-tool.c                 |   1 +
 t/helper/test-tool.h                 |   1 +
 t/t0068-for-each-repo.sh             |  30 +++
 t/t7900-maintenance.sh               | 114 ++++++++++-
 t/test-lib.sh                        |   6 +
 17 files changed, 654 insertions(+), 8 deletions(-)
 create mode 100644 Documentation/git-for-each-repo.txt
 create mode 100644 builtin/for-each-repo.c
 create mode 100644 t/helper/test-crontab.c
 create mode 100755 t/t0068-for-each-repo.sh


base-commit: e9bb32f53ade2067f773bfe6e5c13ed1a5d694a6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-680%2Fderrickstolee%2Fmaintenance%2Fscheduled-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-680/derrickstolee/maintenance/scheduled-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/680

Range-diff vs v2:

 1:  5fdd8188b1 = 1:  5fdd8188b1 maintenance: optionally skip --auto process
 2:  e3ef0b9bea < -:  ---------- maintenance: store the "last run" time in config
 3:  c728c57d85 ! 2:  41a067894d maintenance: add --scheduled option and config
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    maintenance: add --scheduled option and config
     +    maintenance: add --schedule option and config
      
          A user may want to run certain maintenance tasks based on frequency, not
          conditions given in the repository. For example, the user may want to
          perform a 'prefetch' task every hour, or 'gc' task every day. To assist,
     -    update the 'git maintenance run --scheduled' command to check the config
     -    for the last run of that task and add a number of seconds. The task
     -    would then run only if the current time is beyond that minimum
     -    timestamp.
     +    update the 'git maintenance run' command to include a
     +    '--schedule=<frequency>' option. The allowed frequencies are 'hourly',
     +    'daily', and 'weekly'. These values are also allowed in a new config
     +    value 'maintenance.<task>.schedule'.
      
     -    Add a '--scheduled' option to 'git maintenance run' to only run tasks
     -    that have had enough time pass since their last run. This is done for
     -    each enabled task by checking if the current timestamp is at least as
     -    large as the sum of 'maintenance.<task>.lastRun' and
     -    'maintenance.<task>.schedule' in the Git config. This second value is
     -    new to this commit, storing a number of seconds intended between runs.
     +    The 'git maintenance run --schedule=<frequency>' checks the '*.schedule'
     +    config value for each enabled task to see if the configured frequency is
     +    at least as frequent as the frequency from the '--schedule' argument. We
     +    use the following order, for full clarity:
      
     -    A user could then set up an hourly maintenance run with the following
     -    cron table:
     +            'hourly' > 'daily' > 'weekly'
      
     -      0 * * * * git -C <repo> maintenance run --scheduled
     +    Use new 'enum schedule_priority' to track these values numerically.
      
     -    Then, the user could configure the repository with the following config
     -    values:
     +    The following cron table would run the scheduled tasks with the correct
     +    frequencies:
      
     -      maintenance.prefetch.schedule  3000
     -      maintenance.gc.schedule       86000
     +      0 1-23 * * *    git -C <repo> maintenance run --scheduled=hourly
     +      0 0    * * 1-6  git -C <repo> maintenance run --scheduled=daily
     +      0 0    * * 0    git -C <repo> maintenance run --scheduled=weekly
      
     -    These numbers are slightly lower than one hour and one day (in seconds).
     -    The cron schedule will enforce the hourly run rate, but we can use these
     -    schedules to ensure the 'gc' task runs once a day. The error is given
     -    because the *.lastRun config option is specified at the _start_ of the
     -    task run. Otherwise, a slow task run could shift the "daily" job of 'gc'
     -    from a 10:00pm run to 11:00pm run, or later.
     +    This cron schedule will run --scheduled=hourly every hour except at
     +    midnight. This avoids a concurrent run with the --scheduled=daily that
     +    runs at midnight every day except the first day of the week. This avoids
     +    a concurrent run with the --scheduled=weekly that runs at midnight on
     +    the first day of the week. Since --scheduled=daily also runs the
     +    'hourly' tasks and --scheduled=weekly runs the 'hourly' and 'daily'
     +    tasks, we will still see all tasks run with the proper frequencies.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/config/maintenance.txt ##
     -@@ Documentation/config/maintenance.txt: maintenance.<task>.lastRun::
     - 	`<task>` is run. It stores a timestamp representing the most-recent
     - 	run of the `<task>`.
     +@@ Documentation/config/maintenance.txt: maintenance.<task>.enabled::
     + 	`--task` option exists. By default, only `maintenance.gc.enabled`
     + 	is true.
       
      +maintenance.<task>.schedule::
      +	This config option controls whether or not the given `<task>` runs
     -+	during a `git maintenance run --scheduled` command. If the option
     -+	is an integer value `S`, then the `<task>` is run when the current
     -+	time is `S` seconds after the timestamp stored in
     -+	`maintenance.<task>.lastRun`. If the option has no value or a
     -+	non-integer value, then the task will never run with the `--scheduled`
     -+	option.
     ++	during a `git maintenance run --schedule=<frequency>` command. The
     ++	value must be one of "hourly", "daily", or "weekly".
      +
       maintenance.commit-graph.auto::
       	This integer config option controls how often the `commit-graph` task
     @@ Documentation/git-maintenance.txt: OPTIONS
       	in the `gc.auto` config setting, or when the number of pack-files
      -	exceeds the `gc.autoPackLimit` config setting.
      +	exceeds the `gc.autoPackLimit` config setting. Not compatible with
     -+	the `--scheduled` option.
     ++	the `--schedule` option.
      +
     -+--scheduled::
     ++--schedule::
      +	When combined with the `run` subcommand, run maintenance tasks
      +	only if certain time conditions are met, as specified by the
      +	`maintenance.<task>.schedule` config value for each `<task>`.
     @@ Documentation/git-maintenance.txt: OPTIONS
      
       ## builtin/gc.c ##
      @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
     + 	return 0;
       }
       
     - static const char * const builtin_maintenance_run_usage[] = {
     +-static const char * const builtin_maintenance_run_usage[] = {
      -	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>]"),
     -+	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>] [--scheduled]"),
     ++static const char *const builtin_maintenance_run_usage[] = {
     ++	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>] [--schedule]"),
       	NULL
       };
       
     ++enum schedule_priority {
     ++	SCHEDULE_NONE = 0,
     ++	SCHEDULE_WEEKLY = 1,
     ++	SCHEDULE_DAILY = 2,
     ++	SCHEDULE_HOURLY = 3,
     ++};
     ++
     ++static enum schedule_priority parse_schedule(const char *value)
     ++{
     ++	if (!value)
     ++		return SCHEDULE_NONE;
     ++	if (!strcasecmp(value, "hourly"))
     ++		return SCHEDULE_HOURLY;
     ++	if (!strcasecmp(value, "daily"))
     ++		return SCHEDULE_DAILY;
     ++	if (!strcasecmp(value, "weekly"))
     ++		return SCHEDULE_WEEKLY;
     ++	return SCHEDULE_NONE;
     ++}
     ++
     ++static int maintenance_opt_schedule(const struct option *opt, const char *arg,
     ++				    int unset)
     ++{
     ++	enum schedule_priority *priority = opt->value;
     ++
     ++	if (unset)
     ++		die(_("--no-schedule is not allowed"));
     ++
     ++	*priority = parse_schedule(arg);
     ++
     ++	if (!*priority)
     ++		die(_("unrecognized --schedule argument '%s'"), arg);
     ++
     ++	return 0;
     ++}
     ++
       struct maintenance_run_opts {
       	int auto_flag;
     -+	int scheduled;
       	int quiet;
     ++	enum schedule_priority schedule;
       };
       
     + /* Remember to update object flag allocation in object.h */
      @@ builtin/gc.c: struct maintenance_task {
     - 	const char *name;
     - 	maintenance_task_fn *fn;
       	maintenance_auto_fn *auto_condition;
     --	unsigned enabled:1;
     -+	unsigned enabled:1,
     -+		 scheduled:1;
     + 	unsigned enabled:1;
       
     ++	enum schedule_priority schedule;
     ++
       	/* -1 if not selected. */
       	int selected_order;
     + };
      @@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
     - 		     !tasks[i].auto_condition()))
       			continue;
       
     -+		if (opts->scheduled && !tasks[i].scheduled)
     + 		if (opts->auto_flag &&
     +-		    (!tasks[i].auto_condition ||
     +-		     !tasks[i].auto_condition()))
     ++		    (!tasks[i].auto_condition || !tasks[i].auto_condition()))
      +			continue;
      +
     - 		update_last_run(&tasks[i]);
     ++		if (opts->schedule && tasks[i].schedule < opts->schedule)
     + 			continue;
       
       		trace2_region_enter("maintenance", tasks[i].name, r);
     -@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
     - 	return result;
     - }
     - 
     -+static void fill_schedule_info(struct maintenance_task *task,
     -+			       const char *config_name,
     -+			       timestamp_t schedule_delay)
     -+{
     -+	timestamp_t now = approxidate("now");
     -+	char *value = NULL;
     -+	struct strbuf last_run = STRBUF_INIT;
     -+	int64_t previous_run;
     -+
     -+	strbuf_addf(&last_run, "maintenance.%s.lastrun", task->name);
     -+
     -+	if (git_config_get_string(last_run.buf, &value))
     -+		task->scheduled = 1;
     -+	else {
     -+		previous_run = git_config_int64(last_run.buf, value);
     -+		if (now >= previous_run + schedule_delay)
     -+			task->scheduled = 1;
     -+	}
     -+
     -+	free(value);
     -+	strbuf_release(&last_run);
     -+}
     -+
     - static void initialize_task_config(void)
     - {
     - 	int i;
      @@ builtin/gc.c: static void initialize_task_config(void)
       
       	for (i = 0; i < TASK__COUNT; i++) {
     @@ builtin/gc.c: static void initialize_task_config(void)
      +			    tasks[i].name);
      +
      +		if (!git_config_get_string(config_name.buf, &config_str)) {
     -+			timestamp_t schedule_delay = git_config_int64(
     -+							config_name.buf,
     -+							config_str);
     -+			fill_schedule_info(&tasks[i],
     -+						config_name.buf,
     -+						schedule_delay);
     ++			tasks[i].schedule = parse_schedule(config_str);
      +			free(config_str);
      +		}
       	}
     @@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char
       	struct option builtin_maintenance_run_options[] = {
       		OPT_BOOL(0, "auto", &opts.auto_flag,
       			 N_("run tasks based on the state of the repository")),
     -+		OPT_BOOL(0, "scheduled", &opts.scheduled,
     -+			 N_("run tasks based on time intervals")),
     ++		OPT_CALLBACK(0, "schedule", &opts.schedule, N_("frequency"),
     ++			     N_("run tasks based on frequency"),
     ++			     maintenance_opt_schedule),
       		OPT_BOOL(0, "quiet", &opts.quiet,
       			 N_("do not report progress or other information over stderr")),
       		OPT_CALLBACK_F(0, "task", NULL, N_("task"),
     @@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char
       			     builtin_maintenance_run_usage,
       			     PARSE_OPT_STOP_AT_NON_OPTION);
       
     -+	if (opts.auto_flag + opts.scheduled > 1)
     -+		die(_("use at most one of the --auto and --scheduled options"));
     ++	if (opts.auto_flag && opts.schedule)
     ++		die(_("use at most one of --auto and --schedule=<frequency>"));
      +
       	if (argc != 0)
       		usage_with_options(builtin_maintenance_run_usage,
     @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
       	done
       '
       
     -+test_expect_success '--auto and --scheduled incompatible' '
     -+	test_must_fail git maintenance run --auto --scheduled 2>err &&
     ++test_expect_success '--auto and --schedule incompatible' '
     ++	test_must_fail git maintenance run --auto --schedule=daily 2>err &&
      +	test_i18ngrep "at most one" err
      +'
      +
     - test_expect_success 'tasks update maintenance.<task>.lastRun' '
     - 	git config --unset maintenance.commit-graph.lastrun &&
     - 	GIT_TRACE2_EVENT="$(pwd)/run.txt" \
     -@@ t/t7900-maintenance.sh: test_expect_success 'tasks update maintenance.<task>.lastRun' '
     - 	test_cmp_config 1595000000 maintenance.commit-graph.lastrun
     - '
     - 
     -+test_expect_success '--scheduled with specific time' '
     -+	git config maintenance.commit-graph.schedule 100 &&
     -+	GIT_TRACE2_EVENT="$(pwd)/too-soon.txt" \
     -+		GIT_TEST_DATE_NOW=1595000099 \
     -+		git maintenance run --scheduled 2>/dev/null &&
     ++test_expect_success 'invalid --schedule value' '
     ++	test_must_fail git maintenance run --schedule=annually 2>err &&
     ++	test_i18ngrep "unrecognized --schedule" err
     ++'
     ++
     ++test_expect_success '--schedule inheritance weekly -> daily -> hourly' '
     ++	git config maintenance.loose-objects.enabled true &&
     ++	git config maintenance.loose-objects.schedule hourly &&
     ++	git config maintenance.commit-graph.enabled true &&
     ++	git config maintenance.commit-graph.schedule daily &&
     ++	git config maintenance.incremental-repack.enabled true &&
     ++	git config maintenance.incremental-repack.schedule weekly &&
     ++
     ++	GIT_TRACE2_EVENT="$(pwd)/hourly.txt" \
     ++		git maintenance run --schedule=hourly 2>/dev/null &&
     ++	test_subcommand git prune-packed --quiet <hourly.txt &&
      +	test_subcommand ! git commit-graph write --split --reachable \
     -+		--no-progress <too-soon.txt &&
     -+	GIT_TRACE2_EVENT="$(pwd)/long-enough.txt" \
     -+		GIT_TEST_DATE_NOW=1595000100 \
     -+		git maintenance run --scheduled 2>/dev/null &&
     ++		--no-progress <hourly.txt &&
     ++	test_subcommand ! git multi-pack-index write --no-progress <hourly.txt &&
     ++
     ++	GIT_TRACE2_EVENT="$(pwd)/daily.txt" \
     ++		git maintenance run --schedule=daily 2>/dev/null &&
     ++	test_subcommand git prune-packed --quiet <daily.txt &&
     ++	test_subcommand git commit-graph write --split --reachable \
     ++		--no-progress <daily.txt &&
     ++	test_subcommand ! git multi-pack-index write --no-progress <daily.txt &&
     ++
     ++	GIT_TRACE2_EVENT="$(pwd)/weekly.txt" \
     ++		git maintenance run --schedule=weekly 2>/dev/null &&
     ++	test_subcommand git prune-packed --quiet <weekly.txt &&
      +	test_subcommand git commit-graph write --split --reachable \
     -+		--no-progress <long-enough.txt &&
     -+	test_cmp_config 1595000100 maintenance.commit-graph.lastrun
     ++		--no-progress <weekly.txt &&
     ++	test_subcommand git multi-pack-index write --no-progress <weekly.txt
      +'
      +
       test_done
 4:  0314258c5c = 3:  b29b68614b for-each-repo: run subcommands on configured repos
 5:  c0ce1267a9 ! 4:  fc741fab5a maintenance: add [un]register subcommands
     @@ t/t7900-maintenance.sh: GIT_TEST_MULTI_PACK_INDEX=0
       	test_expect_code 128 git maintenance barf 2>err &&
       	test_i18ngrep "invalid subcommand: barf" err
       '
     -@@ t/t7900-maintenance.sh: test_expect_success '--scheduled with specific time' '
     - 	test_cmp_config 1595000100 maintenance.commit-graph.lastrun
     +@@ t/t7900-maintenance.sh: test_expect_success '--schedule inheritance weekly -> daily -> hourly' '
     + 	test_subcommand git multi-pack-index write --no-progress <weekly.txt
       '
       
      +test_expect_success 'register and unregister' '
 6:  8a7c34035a ! 5:  e9672c6a6c maintenance: add start/stop subcommands
     @@ Commit message
          maintenance using 'cron', when available. This integration is as simple
          as I could make it, barring some implementation complications.
      
     -    For now, the background maintenance is scheduled to run hourly via the
     -    following cron table row (ignore line breaks):
     +    The schedule is laid out as follows:
      
     -            0 * * * * $p/git --exec-path=$p
     -                    for-each-repo --config=maintenance.repo
     -                    maintenance run --scheduled
     +      0 1-23 * * *   $cmd maintenance run --schedule=hourly
     +      0 0    * * 1-6 $cmd maintenance run --schedule=daily
     +      0 0    * * 0   $cmd maintenance run --schedule=weekly
      
     -    Future extensions may want to add more complex schedules or some form of
     -    logging. For now, hourly runs seem frequent enough to satisfy the needs
     -    of tasks like 'prefetch' without being so frequent that users would
     -    complain about many no-op commands.
     +    where $cmd is a properly-qualified 'git for-each-repo' execution:
      
     -    Here, "$p" is a placeholder for the path to the current Git executable.
     -    This is critical for systems with multiple versions of Git.
     -    Specifically, macOS has a system version at '/usr/bin/git' while the
     -    version that users can install resides at '/usr/local/bin/git' (symlinked
     -    to '/usr/local/libexec/git-core/git'). This will also use your
     -    locally-built version if you build and run this in your development
     +    $cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo
     +
     +    where $path points to the location of the Git executable running 'git
     +    maintenance start'. This is critical for systems with multiple versions
     +    of Git. Specifically, macOS has a system version at '/usr/bin/git' while
     +    the version that users can install resides at '/usr/local/bin/git'
     +    (symlinked to '/usr/local/libexec/git-core/git'). This will also use
     +    your locally-built version if you build and run this in your development
          environment without installing first.
      
     +    This conditional schedule avoids having cron launch multiple 'git
     +    for-each-repo' commands in parallel. Such parallel commands would likely
     +    lead to the 'hourly' and 'daily' tasks competing over the object
     +    database lock. This could lead to to some tasks never being run! Since
     +    the --schedule=<frequency> argument will run all tasks with _at least_
     +    the given frequency, the daily runs will also run the hourly tasks.
     +    Similarly, the weekly runs will also run the daily and hourly tasks.
     +
          The GIT_TEST_CRONTAB environment variable is not intended for users to
          edit, but instead as a way to mock the 'crontab [-l]' command. This
          variable is set in test-lib.sh to avoid a future test from accidentally
     @@ builtin/gc.c: static int maintenance_unregister(void)
      +	}
      +
      +	if (run_maintenance) {
     ++		struct strbuf line_format = STRBUF_INIT;
      +		const char *exec_path = git_exec_path();
      +
     -+		fprintf(cron_in, "\n%s\n", BEGIN_LINE);
     -+		fprintf(cron_in, "# The following schedule was created by Git\n");
     ++		fprintf(cron_in, "%s\n", BEGIN_LINE);
     ++		fprintf(cron_in,
     ++			"# The following schedule was created by Git\n");
      +		fprintf(cron_in, "# Any edits made in this region might be\n");
     -+		fprintf(cron_in, "# replaced in the future by a Git command.\n\n");
     -+
      +		fprintf(cron_in,
     -+			"0 * * * * \"%s/git\" --exec-path=\"%s\" for-each-repo --config=maintenance.repo maintenance run --scheduled\n",
     -+			exec_path, exec_path);
     ++			"# replaced in the future by a Git command.\n\n");
     ++
     ++		strbuf_addf(&line_format,
     ++			    "%%s %%s * * %%s \"%s/git\" --exec-path=\"%s\" for-each-repo --config=maintenance.repo maintenance run --schedule=%%s\n",
     ++			    exec_path, exec_path);
     ++		fprintf(cron_in, line_format.buf, "0", "1-23", "*", "hourly");
     ++		fprintf(cron_in, line_format.buf, "0", "0", "1-6", "daily");
     ++		fprintf(cron_in, line_format.buf, "0", "0", "0", "weekly");
     ++		strbuf_release(&line_format);
      +
      +		fprintf(cron_in, "\n%s\n", END_LINE);
      +	}
     @@ t/t7900-maintenance.sh: test_expect_success 'register and unregister' '
      +	# start registers the repo
      +	git config --get --global maintenance.repo "$(pwd)" &&
      +
     -+	grep "for-each-repo --config=maintenance.repo maintenance run --scheduled" cron.txt
     ++	grep "for-each-repo --config=maintenance.repo maintenance run --schedule=daily" cron.txt &&
     ++	grep "for-each-repo --config=maintenance.repo maintenance run --schedule=hourly" cron.txt &&
     ++	grep "for-each-repo --config=maintenance.repo maintenance run --schedule=weekly" cron.txt
      +'
      +
      +test_expect_success 'stop from existing schedule' '
     @@ t/t7900-maintenance.sh: test_expect_success 'register and unregister' '
      +	# stop does not unregister the repo
      +	git config --get --global maintenance.repo "$(pwd)" &&
      +
     -+	# The newline is preserved
     -+	echo >empty &&
     -+	test_cmp empty cron.txt &&
     -+
      +	# Operation is idempotent
      +	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance stop &&
     -+	test_cmp empty cron.txt
     ++	test_must_be_empty cron.txt
      +'
      +
      +test_expect_success 'start preserves existing schedule' '
 7:  9ecabeb055 ! 6:  62e8db8b2a maintenance: recommended schedule in register/start
     @@ Commit message
          repository. It does not specify what maintenance should occur or how
          often.
      
     -    If a user sets any 'maintenance.<task>.scheduled' config value, then
     +    If a user sets any 'maintenance.<task>.schedule' config value, then
          they have chosen a specific schedule for themselves and Git should
          respect that.
      
     @@ Commit message
          schedule we use in Scalar and VFS for Git for very large repositories
          using the GVFS protocol. While the schedule works in that environment,
          it is possible that "normal" Git repositories could benefit from
     -    something more obvious (such as running 'gc' once a day). However, this
     +    something more obvious (such as running 'gc' weekly). However, this
          patch gives us a place to start a conversation on what we should
          recommend. For my purposes, Scalar will set these config values so we
          can always differ from core Git's recommendations.
     @@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char
      +	prefix = config_name.len;
      +
      +	for (i = 0; !found && i < TASK__COUNT; i++) {
     -+		int value;
     ++		char *value;
      +
      +		strbuf_setlen(&config_name, prefix);
      +		strbuf_addf(&config_name, "%s.schedule", tasks[i].name);
      +
     -+		if (!git_config_get_int(config_name.buf, &value))
     ++		if (!git_config_get_string(config_name.buf, &value)) {
      +			found = 1;
     ++			FREE_AND_NULL(value);
     ++		}
      +	}
      +
      +	strbuf_release(&config_name);
     @@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char
      +	git_config_set("maintenance.gc.enabled", "false");
      +
      +	git_config_set("maintenance.prefetch.enabled", "true");
     -+	git_config_set("maintenance.prefetch.schedule", "3500");
     ++	git_config_set("maintenance.prefetch.schedule", "hourly");
      +
      +	git_config_set("maintenance.commit-graph.enabled", "true");
     -+	git_config_set("maintenance.commit-graph.schedule", "3500");
     ++	git_config_set("maintenance.commit-graph.schedule", "hourly");
      +
      +	git_config_set("maintenance.loose-objects.enabled", "true");
     -+	git_config_set("maintenance.loose-objects.schedule", "86000");
     ++	git_config_set("maintenance.loose-objects.schedule", "daily");
      +
      +	git_config_set("maintenance.incremental-repack.enabled", "true");
     -+	git_config_set("maintenance.incremental-repack.schedule", "86000");
     ++	git_config_set("maintenance.incremental-repack.schedule", "daily");
      +}
      +
       static int maintenance_register(void)
     @@ builtin/gc.c: static int maintenance_register(void)
       	if (!the_repository || !the_repository->gitdir)
       		return 0;
       
     -+	if (has_schedule_config())
     ++	if (!has_schedule_config())
      +		set_recommended_schedule();
      +
       	config_get.git_cmd = 1;
     @@ builtin/gc.c: static int maintenance_register(void)
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'register and unregister' '
     + 	git config --global --add maintenance.repo /existing1 &&
       	git config --global --add maintenance.repo /existing2 &&
       	git config --global --get-all maintenance.repo >before &&
     ++
     ++	# We still have maintenance.<task>.schedule config set,
     ++	# so this does not update the local schedule
     ++	git maintenance register &&
     ++	test_must_fail git config maintenance.auto &&
     ++
     ++	# Clear previous maintenance.<task>.schedule values
     ++	for task in loose-objects commit-graph incremental-repack
     ++	do
     ++		git config --unset maintenance.$task.schedule || return 1
     ++	done &&
       	git maintenance register &&
      +	test_cmp_config false maintenance.auto &&
      +	test_cmp_config false maintenance.gc.enabled &&
      +	test_cmp_config true maintenance.prefetch.enabled &&
     -+	test_cmp_config 3500 maintenance.commit-graph.schedule &&
     -+	test_cmp_config 86000 maintenance.incremental-repack.schedule &&
     ++	test_cmp_config hourly maintenance.commit-graph.schedule &&
     ++	test_cmp_config daily maintenance.incremental-repack.schedule &&
       	git config --global --get-all maintenance.repo >actual &&
       	cp before after &&
       	pwd >>after &&

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 1/6] maintenance: optionally skip --auto process
  2020-08-28 15:45   ` [PATCH v3 0/6] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
@ 2020-08-28 15:45     ` Derrick Stolee via GitGitGadget
  2020-08-28 15:45     ` [PATCH v3 2/6] maintenance: add --schedule option and config Derrick Stolee via GitGitGadget
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-28 15:45 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Some commands run 'git maintenance run --auto --[no-]quiet' after doing
their normal work, as a way to keep repositories clean as they are used.
Currently, users who do not want this maintenance to occur would set the
'gc.auto' config option to 0 to avoid the 'gc' task from running.
However, this does not stop the extra process invocation. On Windows,
this extra process invocation can be more expensive than necessary.

Allow users to drop this extra process by setting 'maintenance.auto' to
'false'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt |  5 +++++
 run-command.c                        |  6 ++++++
 t/t7900-maintenance.sh               | 13 +++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index a0706d8f09..06db758172 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -1,3 +1,8 @@
+maintenance.auto::
+	This boolean config option controls whether some commands run
+	`git maintenance run --auto` after doing their normal work. Defaults
+	to true.
+
 maintenance.<task>.enabled::
 	This boolean config option controls whether the maintenance task
 	with name `<task>` is run when no `--task` option is specified to
diff --git a/run-command.c b/run-command.c
index 2ee59acdc8..ea4d0fb4b1 100644
--- a/run-command.c
+++ b/run-command.c
@@ -7,6 +7,7 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "quote.h"
+#include "config.h"
 
 void child_process_init(struct child_process *child)
 {
@@ -1868,8 +1869,13 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
 
 int run_auto_maintenance(int quiet)
 {
+	int enabled;
 	struct child_process maint = CHILD_PROCESS_INIT;
 
+	if (!git_config_get_bool("maintenance.auto", &enabled) &&
+	    !enabled)
+		return 0;
+
 	maint.git_cmd = 1;
 	strvec_pushl(&maint.args, "maintenance", "run", "--auto", NULL);
 	strvec_push(&maint.args, quiet ? "--quiet" : "--no-quiet");
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 6f878b0141..e0ba19e1ff 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -26,6 +26,19 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'maintenance.auto config option' '
+	GIT_TRACE2_EVENT="$(pwd)/default" git commit --quiet --allow-empty -m 1 &&
+	test_subcommand git maintenance run --auto --quiet <default &&
+	GIT_TRACE2_EVENT="$(pwd)/true" \
+		git -c maintenance.auto=true \
+		commit --quiet --allow-empty -m 2 &&
+	test_subcommand git maintenance run --auto --quiet  <true &&
+	GIT_TRACE2_EVENT="$(pwd)/false" \
+		git -c maintenance.auto=false \
+		commit --quiet --allow-empty -m 3 &&
+	test_subcommand ! git maintenance run --auto --quiet  <false
+'
+
 test_expect_success 'maintenance.<task>.enabled' '
 	git config maintenance.gc.enabled false &&
 	git config maintenance.commit-graph.enabled true &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 2/6] maintenance: add --schedule option and config
  2020-08-28 15:45   ` [PATCH v3 0/6] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
  2020-08-28 15:45     ` [PATCH v3 1/6] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
@ 2020-08-28 15:45     ` Derrick Stolee via GitGitGadget
  2020-08-28 15:45     ` [PATCH v3 3/6] for-each-repo: run subcommands on configured repos Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-28 15:45 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A user may want to run certain maintenance tasks based on frequency, not
conditions given in the repository. For example, the user may want to
perform a 'prefetch' task every hour, or 'gc' task every day. To assist,
update the 'git maintenance run' command to include a
'--schedule=<frequency>' option. The allowed frequencies are 'hourly',
'daily', and 'weekly'. These values are also allowed in a new config
value 'maintenance.<task>.schedule'.

The 'git maintenance run --schedule=<frequency>' checks the '*.schedule'
config value for each enabled task to see if the configured frequency is
at least as frequent as the frequency from the '--schedule' argument. We
use the following order, for full clarity:

	'hourly' > 'daily' > 'weekly'

Use new 'enum schedule_priority' to track these values numerically.

The following cron table would run the scheduled tasks with the correct
frequencies:

  0 1-23 * * *    git -C <repo> maintenance run --scheduled=hourly
  0 0    * * 1-6  git -C <repo> maintenance run --scheduled=daily
  0 0    * * 0    git -C <repo> maintenance run --scheduled=weekly

This cron schedule will run --scheduled=hourly every hour except at
midnight. This avoids a concurrent run with the --scheduled=daily that
runs at midnight every day except the first day of the week. This avoids
a concurrent run with the --scheduled=weekly that runs at midnight on
the first day of the week. Since --scheduled=daily also runs the
'hourly' tasks and --scheduled=weekly runs the 'hourly' and 'daily'
tasks, we will still see all tasks run with the proper frequencies.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt |  5 +++
 Documentation/git-maintenance.txt    | 13 +++++-
 builtin/gc.c                         | 67 +++++++++++++++++++++++++---
 t/t7900-maintenance.sh               | 40 +++++++++++++++++
 4 files changed, 119 insertions(+), 6 deletions(-)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 06db758172..70585564fa 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -10,6 +10,11 @@ maintenance.<task>.enabled::
 	`--task` option exists. By default, only `maintenance.gc.enabled`
 	is true.
 
+maintenance.<task>.schedule::
+	This config option controls whether or not the given `<task>` runs
+	during a `git maintenance run --schedule=<frequency>` command. The
+	value must be one of "hourly", "daily", or "weekly".
+
 maintenance.commit-graph.auto::
 	This integer config option controls how often the `commit-graph` task
 	should be run as part of `git maintenance run --auto`. If zero, then
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index b44efb05a3..3af5907b01 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -107,7 +107,18 @@ OPTIONS
 	only if certain thresholds are met. For example, the `gc` task
 	runs when the number of loose objects exceeds the number stored
 	in the `gc.auto` config setting, or when the number of pack-files
-	exceeds the `gc.autoPackLimit` config setting.
+	exceeds the `gc.autoPackLimit` config setting. Not compatible with
+	the `--schedule` option.
+
+--schedule::
+	When combined with the `run` subcommand, run maintenance tasks
+	only if certain time conditions are met, as specified by the
+	`maintenance.<task>.schedule` config value for each `<task>`.
+	This config value specifies a number of seconds since the last
+	time that task ran, according to the `maintenance.<task>.lastRun`
+	config value. The tasks that are tested are those provided by
+	the `--task=<task>` option(s) or those with
+	`maintenance.<task>.enabled` set to true.
 
 --quiet::
 	Do not report progress or other information over `stderr`.
diff --git a/builtin/gc.c b/builtin/gc.c
index f8459df04c..85a3370692 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -704,14 +704,51 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
-static const char * const builtin_maintenance_run_usage[] = {
-	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>]"),
+static const char *const builtin_maintenance_run_usage[] = {
+	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>] [--schedule]"),
 	NULL
 };
 
+enum schedule_priority {
+	SCHEDULE_NONE = 0,
+	SCHEDULE_WEEKLY = 1,
+	SCHEDULE_DAILY = 2,
+	SCHEDULE_HOURLY = 3,
+};
+
+static enum schedule_priority parse_schedule(const char *value)
+{
+	if (!value)
+		return SCHEDULE_NONE;
+	if (!strcasecmp(value, "hourly"))
+		return SCHEDULE_HOURLY;
+	if (!strcasecmp(value, "daily"))
+		return SCHEDULE_DAILY;
+	if (!strcasecmp(value, "weekly"))
+		return SCHEDULE_WEEKLY;
+	return SCHEDULE_NONE;
+}
+
+static int maintenance_opt_schedule(const struct option *opt, const char *arg,
+				    int unset)
+{
+	enum schedule_priority *priority = opt->value;
+
+	if (unset)
+		die(_("--no-schedule is not allowed"));
+
+	*priority = parse_schedule(arg);
+
+	if (!*priority)
+		die(_("unrecognized --schedule argument '%s'"), arg);
+
+	return 0;
+}
+
 struct maintenance_run_opts {
 	int auto_flag;
 	int quiet;
+	enum schedule_priority schedule;
 };
 
 /* Remember to update object flag allocation in object.h */
@@ -1159,6 +1196,8 @@ struct maintenance_task {
 	maintenance_auto_fn *auto_condition;
 	unsigned enabled:1;
 
+	enum schedule_priority schedule;
+
 	/* -1 if not selected. */
 	int selected_order;
 };
@@ -1250,8 +1289,10 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 			continue;
 
 		if (opts->auto_flag &&
-		    (!tasks[i].auto_condition ||
-		     !tasks[i].auto_condition()))
+		    (!tasks[i].auto_condition || !tasks[i].auto_condition()))
+			continue;
+
+		if (opts->schedule && tasks[i].schedule < opts->schedule)
 			continue;
 
 		trace2_region_enter("maintenance", tasks[i].name, r);
@@ -1274,13 +1315,23 @@ static void initialize_task_config(void)
 
 	for (i = 0; i < TASK__COUNT; i++) {
 		int config_value;
+		char *config_str;
 
-		strbuf_setlen(&config_name, 0);
+		strbuf_reset(&config_name);
 		strbuf_addf(&config_name, "maintenance.%s.enabled",
 			    tasks[i].name);
 
 		if (!git_config_get_bool(config_name.buf, &config_value))
 			tasks[i].enabled = config_value;
+
+		strbuf_reset(&config_name);
+		strbuf_addf(&config_name, "maintenance.%s.schedule",
+			    tasks[i].name);
+
+		if (!git_config_get_string(config_name.buf, &config_str)) {
+			tasks[i].schedule = parse_schedule(config_str);
+			free(config_str);
+		}
 	}
 
 	strbuf_release(&config_name);
@@ -1324,6 +1375,9 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	struct option builtin_maintenance_run_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
+		OPT_CALLBACK(0, "schedule", &opts.schedule, N_("frequency"),
+			     N_("run tasks based on frequency"),
+			     maintenance_opt_schedule),
 		OPT_BOOL(0, "quiet", &opts.quiet,
 			 N_("do not report progress or other information over stderr")),
 		OPT_CALLBACK_F(0, "task", NULL, N_("task"),
@@ -1344,6 +1398,9 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 			     builtin_maintenance_run_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
+	if (opts.auto_flag && opts.schedule)
+		die(_("use at most one of --auto and --schedule=<frequency>"));
+
 	if (argc != 0)
 		usage_with_options(builtin_maintenance_run_usage,
 				   builtin_maintenance_run_options);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index e0ba19e1ff..328bbaa830 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -264,4 +264,44 @@ test_expect_success 'maintenance.incremental-repack.auto' '
 	done
 '
 
+test_expect_success '--auto and --schedule incompatible' '
+	test_must_fail git maintenance run --auto --schedule=daily 2>err &&
+	test_i18ngrep "at most one" err
+'
+
+test_expect_success 'invalid --schedule value' '
+	test_must_fail git maintenance run --schedule=annually 2>err &&
+	test_i18ngrep "unrecognized --schedule" err
+'
+
+test_expect_success '--schedule inheritance weekly -> daily -> hourly' '
+	git config maintenance.loose-objects.enabled true &&
+	git config maintenance.loose-objects.schedule hourly &&
+	git config maintenance.commit-graph.enabled true &&
+	git config maintenance.commit-graph.schedule daily &&
+	git config maintenance.incremental-repack.enabled true &&
+	git config maintenance.incremental-repack.schedule weekly &&
+
+	GIT_TRACE2_EVENT="$(pwd)/hourly.txt" \
+		git maintenance run --schedule=hourly 2>/dev/null &&
+	test_subcommand git prune-packed --quiet <hourly.txt &&
+	test_subcommand ! git commit-graph write --split --reachable \
+		--no-progress <hourly.txt &&
+	test_subcommand ! git multi-pack-index write --no-progress <hourly.txt &&
+
+	GIT_TRACE2_EVENT="$(pwd)/daily.txt" \
+		git maintenance run --schedule=daily 2>/dev/null &&
+	test_subcommand git prune-packed --quiet <daily.txt &&
+	test_subcommand git commit-graph write --split --reachable \
+		--no-progress <daily.txt &&
+	test_subcommand ! git multi-pack-index write --no-progress <daily.txt &&
+
+	GIT_TRACE2_EVENT="$(pwd)/weekly.txt" \
+		git maintenance run --schedule=weekly 2>/dev/null &&
+	test_subcommand git prune-packed --quiet <weekly.txt &&
+	test_subcommand git commit-graph write --split --reachable \
+		--no-progress <weekly.txt &&
+	test_subcommand git multi-pack-index write --no-progress <weekly.txt
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 3/6] for-each-repo: run subcommands on configured repos
  2020-08-28 15:45   ` [PATCH v3 0/6] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
  2020-08-28 15:45     ` [PATCH v3 1/6] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
  2020-08-28 15:45     ` [PATCH v3 2/6] maintenance: add --schedule option and config Derrick Stolee via GitGitGadget
@ 2020-08-28 15:45     ` Derrick Stolee via GitGitGadget
  2020-08-28 15:45     ` [PATCH v3 4/6] maintenance: add [un]register subcommands Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-28 15:45 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

It can be helpful to store a list of repositories in global or system
config and then iterate Git commands on that list. Create a new builtin
that makes this process simple for experts. We will use this builtin to
run scheduled maintenance on all configured repositories in a future
change.

The test is very simple, but does highlight that the "--" argument is
optional.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                          |  1 +
 Documentation/git-for-each-repo.txt | 59 +++++++++++++++++++++++++++++
 Makefile                            |  1 +
 builtin.h                           |  1 +
 builtin/for-each-repo.c             | 58 ++++++++++++++++++++++++++++
 command-list.txt                    |  1 +
 git.c                               |  1 +
 t/t0068-for-each-repo.sh            | 30 +++++++++++++++
 8 files changed, 152 insertions(+)
 create mode 100644 Documentation/git-for-each-repo.txt
 create mode 100644 builtin/for-each-repo.c
 create mode 100755 t/t0068-for-each-repo.sh

diff --git a/.gitignore b/.gitignore
index a5808fa30d..5eb2a2be71 100644
--- a/.gitignore
+++ b/.gitignore
@@ -67,6 +67,7 @@
 /git-filter-branch
 /git-fmt-merge-msg
 /git-for-each-ref
+/git-for-each-repo
 /git-format-patch
 /git-fsck
 /git-fsck-objects
diff --git a/Documentation/git-for-each-repo.txt b/Documentation/git-for-each-repo.txt
new file mode 100644
index 0000000000..94bd19da26
--- /dev/null
+++ b/Documentation/git-for-each-repo.txt
@@ -0,0 +1,59 @@
+git-for-each-repo(1)
+====================
+
+NAME
+----
+git-for-each-repo - Run a Git command on a list of repositories
+
+
+SYNOPSIS
+--------
+[verse]
+'git for-each-repo' --config=<config> [--] <arguments>
+
+
+DESCRIPTION
+-----------
+Run a Git command on a list of repositories. The arguments after the
+known options or `--` indicator are used as the arguments for the Git
+subprocess.
+
+THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
+
+For example, we could run maintenance on each of a list of repositories
+stored in a `maintenance.repo` config variable using
+
+-------------
+git for-each-repo --config=maintenance.repo maintenance run
+-------------
+
+This will run `git -C <repo> maintenance run` for each value `<repo>`
+in the multi-valued config variable `maintenance.repo`.
+
+
+OPTIONS
+-------
+--config=<config>::
+	Use the given config variable as a multi-valued list storing
+	absolute path names. Iterate on that list of paths to run
+	the given arguments.
++
+These config values are loaded from system, global, and local Git config,
+as available. If `git for-each-repo` is run in a directory that is not a
+Git repository, then only the system and global config is used.
+
+
+SUBPROCESS BEHAVIOR
+-------------------
+
+If any `git -C <repo> <arguments>` subprocess returns a non-zero exit code,
+then the `git for-each-repo` process returns that exit code without running
+more subprocesses.
+
+Each `git -C <repo> <arguments>` subprocess inherits the standard file
+descriptors `stdin`, `stdout`, and `stderr`.
+
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 65f8cfb236..7c588ff036 100644
--- a/Makefile
+++ b/Makefile
@@ -1071,6 +1071,7 @@ BUILTIN_OBJS += builtin/fetch-pack.o
 BUILTIN_OBJS += builtin/fetch.o
 BUILTIN_OBJS += builtin/fmt-merge-msg.o
 BUILTIN_OBJS += builtin/for-each-ref.o
+BUILTIN_OBJS += builtin/for-each-repo.o
 BUILTIN_OBJS += builtin/fsck.o
 BUILTIN_OBJS += builtin/gc.o
 BUILTIN_OBJS += builtin/get-tar-commit-id.o
diff --git a/builtin.h b/builtin.h
index 17c1c0ce49..ff7c6e5aa9 100644
--- a/builtin.h
+++ b/builtin.h
@@ -150,6 +150,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix);
 int cmd_fetch_pack(int argc, const char **argv, const char *prefix);
 int cmd_fmt_merge_msg(int argc, const char **argv, const char *prefix);
 int cmd_for_each_ref(int argc, const char **argv, const char *prefix);
+int cmd_for_each_repo(int argc, const char **argv, const char *prefix);
 int cmd_format_patch(int argc, const char **argv, const char *prefix);
 int cmd_fsck(int argc, const char **argv, const char *prefix);
 int cmd_gc(int argc, const char **argv, const char *prefix);
diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c
new file mode 100644
index 0000000000..5bba623ff1
--- /dev/null
+++ b/builtin/for-each-repo.c
@@ -0,0 +1,58 @@
+#include "cache.h"
+#include "config.h"
+#include "builtin.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "string-list.h"
+
+static const char * const for_each_repo_usage[] = {
+	N_("git for-each-repo --config=<config> <command-args>"),
+	NULL
+};
+
+static int run_command_on_repo(const char *path,
+			       void *cbdata)
+{
+	int i;
+	struct child_process child = CHILD_PROCESS_INIT;
+	struct strvec *args = (struct strvec *)cbdata;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "-C", path, NULL);
+
+	for (i = 0; i < args->nr; i++)
+		strvec_push(&child.args, args->v[i]);
+
+	return run_command(&child);
+}
+
+int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
+{
+	static const char *config_key = NULL;
+	int i, result = 0;
+	const struct string_list *values;
+	struct strvec args = STRVEC_INIT;
+
+	const struct option options[] = {
+		OPT_STRING(0, "config", &config_key, N_("config"),
+			   N_("config key storing a list of repository paths")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, for_each_repo_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (!config_key)
+		die(_("missing --config=<config>"));
+
+	for (i = 0; i < argc; i++)
+		strvec_push(&args, argv[i]);
+
+	values = repo_config_get_value_multi(the_repository,
+					     config_key);
+
+	for (i = 0; !result && i < values->nr; i++)
+		result = run_command_on_repo(values->items[i].string, &args);
+
+	return result;
+}
diff --git a/command-list.txt b/command-list.txt
index 0e3204e7d1..581499be82 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -94,6 +94,7 @@ git-fetch-pack                          synchingrepositories
 git-filter-branch                       ancillarymanipulators
 git-fmt-merge-msg                       purehelpers
 git-for-each-ref                        plumbinginterrogators
+git-for-each-repo                       plumbinginterrogators
 git-format-patch                        mainporcelain
 git-fsck                                ancillaryinterrogators          complete
 git-gc                                  mainporcelain
diff --git a/git.c b/git.c
index 24f250d29a..1cab64b5d1 100644
--- a/git.c
+++ b/git.c
@@ -511,6 +511,7 @@ static struct cmd_struct commands[] = {
 	{ "fetch-pack", cmd_fetch_pack, RUN_SETUP | NO_PARSEOPT },
 	{ "fmt-merge-msg", cmd_fmt_merge_msg, RUN_SETUP },
 	{ "for-each-ref", cmd_for_each_ref, RUN_SETUP },
+	{ "for-each-repo", cmd_for_each_repo, RUN_SETUP_GENTLY },
 	{ "format-patch", cmd_format_patch, RUN_SETUP },
 	{ "fsck", cmd_fsck, RUN_SETUP },
 	{ "fsck-objects", cmd_fsck, RUN_SETUP },
diff --git a/t/t0068-for-each-repo.sh b/t/t0068-for-each-repo.sh
new file mode 100755
index 0000000000..136b4ec839
--- /dev/null
+++ b/t/t0068-for-each-repo.sh
@@ -0,0 +1,30 @@
+#!/bin/sh
+
+test_description='git for-each-repo builtin'
+
+. ./test-lib.sh
+
+test_expect_success 'run based on configured value' '
+	git init one &&
+	git init two &&
+	git init three &&
+	git -C two commit --allow-empty -m "DID NOT RUN" &&
+	git config run.key "$TRASH_DIRECTORY/one" &&
+	git config --add run.key "$TRASH_DIRECTORY/three" &&
+	git for-each-repo --config=run.key commit --allow-empty -m "ran" &&
+	git -C one log -1 --pretty=format:%s >message &&
+	grep ran message &&
+	git -C two log -1 --pretty=format:%s >message &&
+	! grep ran message &&
+	git -C three log -1 --pretty=format:%s >message &&
+	grep ran message &&
+	git for-each-repo --config=run.key -- commit --allow-empty -m "ran again" &&
+	git -C one log -1 --pretty=format:%s >message &&
+	grep again message &&
+	git -C two log -1 --pretty=format:%s >message &&
+	! grep again message &&
+	git -C three log -1 --pretty=format:%s >message &&
+	grep again message
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 4/6] maintenance: add [un]register subcommands
  2020-08-28 15:45   ` [PATCH v3 0/6] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2020-08-28 15:45     ` [PATCH v3 3/6] for-each-repo: run subcommands on configured repos Derrick Stolee via GitGitGadget
@ 2020-08-28 15:45     ` Derrick Stolee via GitGitGadget
  2020-08-28 15:45     ` [PATCH v3 5/6] maintenance: add start/stop subcommands Derrick Stolee via GitGitGadget
  2020-08-28 15:45     ` [PATCH v3 6/6] maintenance: recommended schedule in register/start Derrick Stolee via GitGitGadget
  5 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-28 15:45 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In preparation for launching background maintenance from the 'git
maintenance' builtin, create register/unregister subcommands. These
commands update the new 'maintenance.repos' config option in the global
config so the background maintenance job knows which repositories to
maintain.

These commands allow users to add a repository to the background
maintenance list without disrupting the actual maintenance mechanism.

For example, a user can run 'git maintenance register' when no
background maintenance is running and it will not start the background
maintenance. A later update to start running background maintenance will
then pick up this repository automatically.

The opposite example is that a user can run 'git maintenance unregister'
to remove the current repository from background maintenance without
halting maintenance for other repositories.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt | 14 ++++++++
 builtin/gc.c                      | 55 ++++++++++++++++++++++++++++++-
 t/t7900-maintenance.sh            | 17 +++++++++-
 3 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 3af5907b01..78d0d8df91 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -29,6 +29,15 @@ Git repository.
 SUBCOMMANDS
 -----------
 
+register::
+	Initialize Git config values so any scheduled maintenance will
+	start running on this repository. This adds the repository to the
+	`maintenance.repo` config variable in the current user's global
+	config and enables some recommended configuration values for
+	`maintenance.<task>.schedule`. The tasks that are enabled are safe
+	for running in the background without disrupting foreground
+	processes.
+
 run::
 	Run one or more maintenance tasks. If one or more `--task` options
 	are specified, then those tasks are run in that order. Otherwise,
@@ -36,6 +45,11 @@ run::
 	config options are true. By default, only `maintenance.gc.enabled`
 	is true.
 
+unregister::
+	Remove the current repository from background maintenance. This
+	only removes the repository from the configured list. It does not
+	stop the background maintenance processes from running.
+
 TASKS
 -----
 
diff --git a/builtin/gc.c b/builtin/gc.c
index 85a3370692..ec77e8d2fa 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1407,7 +1407,56 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	return maintenance_run_tasks(&opts);
 }
 
-static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
+static int maintenance_register(void)
+{
+	struct child_process config_set = CHILD_PROCESS_INIT;
+	struct child_process config_get = CHILD_PROCESS_INIT;
+
+	/* There is no current repository, so skip registering it */
+	if (!the_repository || !the_repository->gitdir)
+		return 0;
+
+	config_get.git_cmd = 1;
+	strvec_pushl(&config_get.args, "config", "--global", "--get", "maintenance.repo",
+		     the_repository->worktree ? the_repository->worktree
+					      : the_repository->gitdir,
+			 NULL);
+	config_get.out = -1;
+
+	if (start_command(&config_get))
+		return error(_("failed to run 'git config'"));
+
+	/* We already have this value in our config! */
+	if (!finish_command(&config_get))
+		return 0;
+
+	config_set.git_cmd = 1;
+	strvec_pushl(&config_set.args, "config", "--add", "--global", "maintenance.repo",
+		     the_repository->worktree ? the_repository->worktree
+					      : the_repository->gitdir,
+		     NULL);
+
+	return run_command(&config_set);
+}
+
+static int maintenance_unregister(void)
+{
+	struct child_process config_unset = CHILD_PROCESS_INIT;
+
+	if (!the_repository || !the_repository->gitdir)
+		return error(_("no current repository to unregister"));
+
+	config_unset.git_cmd = 1;
+	strvec_pushl(&config_unset.args, "config", "--global", "--unset",
+		     "maintenance.repo",
+		     the_repository->worktree ? the_repository->worktree
+					      : the_repository->gitdir,
+		     NULL);
+
+	return run_command(&config_unset);
+}
+
+static const char builtin_maintenance_usage[] =	N_("git maintenance <subcommand> [<options>]");
 
 int cmd_maintenance(int argc, const char **argv, const char *prefix)
 {
@@ -1416,6 +1465,10 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 
 	if (!strcmp(argv[1], "run"))
 		return maintenance_run(argc - 1, argv + 1, prefix);
+	if (!strcmp(argv[1], "register"))
+		return maintenance_register();
+	if (!strcmp(argv[1], "unregister"))
+		return maintenance_unregister();
 
 	die(_("invalid subcommand: %s"), argv[1]);
 }
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 328bbaa830..272d1605d2 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -9,7 +9,7 @@ GIT_TEST_MULTI_PACK_INDEX=0
 
 test_expect_success 'help text' '
 	test_expect_code 129 git maintenance -h 2>err &&
-	test_i18ngrep "usage: git maintenance run" err &&
+	test_i18ngrep "usage: git maintenance <subcommand>" err &&
 	test_expect_code 128 git maintenance barf 2>err &&
 	test_i18ngrep "invalid subcommand: barf" err
 '
@@ -304,4 +304,19 @@ test_expect_success '--schedule inheritance weekly -> daily -> hourly' '
 	test_subcommand git multi-pack-index write --no-progress <weekly.txt
 '
 
+test_expect_success 'register and unregister' '
+	test_when_finished git config --global --unset-all maintenance.repo &&
+	git config --global --add maintenance.repo /existing1 &&
+	git config --global --add maintenance.repo /existing2 &&
+	git config --global --get-all maintenance.repo >before &&
+	git maintenance register &&
+	git config --global --get-all maintenance.repo >actual &&
+	cp before after &&
+	pwd >>after &&
+	test_cmp after actual &&
+	git maintenance unregister &&
+	git config --global --get-all maintenance.repo >actual &&
+	test_cmp before actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 5/6] maintenance: add start/stop subcommands
  2020-08-28 15:45   ` [PATCH v3 0/6] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2020-08-28 15:45     ` [PATCH v3 4/6] maintenance: add [un]register subcommands Derrick Stolee via GitGitGadget
@ 2020-08-28 15:45     ` Derrick Stolee via GitGitGadget
  2020-08-28 15:45     ` [PATCH v3 6/6] maintenance: recommended schedule in register/start Derrick Stolee via GitGitGadget
  5 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-28 15:45 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Add new subcommands to 'git maintenance' that start or stop background
maintenance using 'cron', when available. This integration is as simple
as I could make it, barring some implementation complications.

The schedule is laid out as follows:

  0 1-23 * * *   $cmd maintenance run --schedule=hourly
  0 0    * * 1-6 $cmd maintenance run --schedule=daily
  0 0    * * 0   $cmd maintenance run --schedule=weekly

where $cmd is a properly-qualified 'git for-each-repo' execution:

$cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo

where $path points to the location of the Git executable running 'git
maintenance start'. This is critical for systems with multiple versions
of Git. Specifically, macOS has a system version at '/usr/bin/git' while
the version that users can install resides at '/usr/local/bin/git'
(symlinked to '/usr/local/libexec/git-core/git'). This will also use
your locally-built version if you build and run this in your development
environment without installing first.

This conditional schedule avoids having cron launch multiple 'git
for-each-repo' commands in parallel. Such parallel commands would likely
lead to the 'hourly' and 'daily' tasks competing over the object
database lock. This could lead to to some tasks never being run! Since
the --schedule=<frequency> argument will run all tasks with _at least_
the given frequency, the daily runs will also run the hourly tasks.
Similarly, the weekly runs will also run the daily and hourly tasks.

The GIT_TEST_CRONTAB environment variable is not intended for users to
edit, but instead as a way to mock the 'crontab [-l]' command. This
variable is set in test-lib.sh to avoid a future test from accidentally
running anything with the cron integration from modifying the user's
schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our
tests to check how the schedule is modified in 'git maintenance
(start|stop)' commands.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  11 +++
 Makefile                          |   1 +
 builtin/gc.c                      | 124 ++++++++++++++++++++++++++++++
 t/helper/test-crontab.c           |  35 +++++++++
 t/helper/test-tool.c              |   1 +
 t/helper/test-tool.h              |   1 +
 t/t7900-maintenance.sh            |  28 +++++++
 t/test-lib.sh                     |   6 ++
 8 files changed, 207 insertions(+)
 create mode 100644 t/helper/test-crontab.c

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 78d0d8df91..7f8c279fe8 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -45,6 +45,17 @@ run::
 	config options are true. By default, only `maintenance.gc.enabled`
 	is true.
 
+start::
+	Start running maintenance on the current repository. This performs
+	the same config updates as the `register` subcommand, then updates
+	the background scheduler to run `git maintenance run --scheduled`
+	on an hourly basis.
+
+stop::
+	Halt the background maintenance schedule. The current repository
+	is not removed from the list of maintained repositories, in case
+	the background maintenance is restarted later.
+
 unregister::
 	Remove the current repository from background maintenance. This
 	only removes the repository from the configured list. It does not
diff --git a/Makefile b/Makefile
index 7c588ff036..c39b39bd7d 100644
--- a/Makefile
+++ b/Makefile
@@ -690,6 +690,7 @@ TEST_BUILTINS_OBJS += test-advise.o
 TEST_BUILTINS_OBJS += test-bloom.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
+TEST_BUILTINS_OBJS += test-crontab.o
 TEST_BUILTINS_OBJS += test-ctype.o
 TEST_BUILTINS_OBJS += test-date.o
 TEST_BUILTINS_OBJS += test-delta.o
diff --git a/builtin/gc.c b/builtin/gc.c
index ec77e8d2fa..9914417e25 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -32,6 +32,7 @@
 #include "remote.h"
 #include "midx.h"
 #include "object-store.h"
+#include "exec-cmd.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -1456,6 +1457,125 @@ static int maintenance_unregister(void)
 	return run_command(&config_unset);
 }
 
+#define BEGIN_LINE "# BEGIN GIT MAINTENANCE SCHEDULE"
+#define END_LINE "# END GIT MAINTENANCE SCHEDULE"
+
+static int update_background_schedule(int run_maintenance)
+{
+	int result = 0;
+	int in_old_region = 0;
+	struct child_process crontab_list = CHILD_PROCESS_INIT;
+	struct child_process crontab_edit = CHILD_PROCESS_INIT;
+	FILE *cron_list, *cron_in;
+	const char *crontab_name;
+	struct strbuf line = STRBUF_INIT;
+	struct lock_file lk;
+	char *lock_path = xstrfmt("%s/schedule", the_repository->objects->odb->path);
+
+	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0)
+		return error(_("another process is scheduling background maintenance"));
+
+	crontab_name = getenv("GIT_TEST_CRONTAB");
+	if (!crontab_name)
+		crontab_name = "crontab";
+
+	strvec_split(&crontab_list.args, crontab_name);
+	strvec_push(&crontab_list.args, "-l");
+	crontab_list.in = -1;
+	crontab_list.out = dup(lk.tempfile->fd);
+	crontab_list.git_cmd = 0;
+
+	if (start_command(&crontab_list)) {
+		result = error(_("failed to run 'crontab -l'; your system might not support 'cron'"));
+		goto cleanup;
+	}
+
+	/* Ignore exit code, as an empty crontab will return error. */
+	finish_command(&crontab_list);
+
+	/*
+	 * Read from the .lock file, filtering out the old
+	 * schedule while appending the new schedule.
+	 */
+	cron_list = fdopen(lk.tempfile->fd, "r");
+	rewind(cron_list);
+
+	strvec_split(&crontab_edit.args, crontab_name);
+	crontab_edit.in = -1;
+	crontab_edit.git_cmd = 0;
+
+	if (start_command(&crontab_edit)) {
+		result = error(_("failed to run 'crontab'; your system might not support 'cron'"));
+		goto cleanup;
+	}
+
+	cron_in = fdopen(crontab_edit.in, "w");
+	if (!cron_in) {
+		result = error(_("failed to open stdin of 'crontab'"));
+		goto done_editing;
+	}
+
+	while (!strbuf_getline_lf(&line, cron_list)) {
+		if (!in_old_region && !strcmp(line.buf, BEGIN_LINE))
+			in_old_region = 1;
+		if (in_old_region)
+			continue;
+		fprintf(cron_in, "%s\n", line.buf);
+		if (in_old_region && !strcmp(line.buf, END_LINE))
+			in_old_region = 0;
+	}
+
+	if (run_maintenance) {
+		struct strbuf line_format = STRBUF_INIT;
+		const char *exec_path = git_exec_path();
+
+		fprintf(cron_in, "%s\n", BEGIN_LINE);
+		fprintf(cron_in,
+			"# The following schedule was created by Git\n");
+		fprintf(cron_in, "# Any edits made in this region might be\n");
+		fprintf(cron_in,
+			"# replaced in the future by a Git command.\n\n");
+
+		strbuf_addf(&line_format,
+			    "%%s %%s * * %%s \"%s/git\" --exec-path=\"%s\" for-each-repo --config=maintenance.repo maintenance run --schedule=%%s\n",
+			    exec_path, exec_path);
+		fprintf(cron_in, line_format.buf, "0", "1-23", "*", "hourly");
+		fprintf(cron_in, line_format.buf, "0", "0", "1-6", "daily");
+		fprintf(cron_in, line_format.buf, "0", "0", "0", "weekly");
+		strbuf_release(&line_format);
+
+		fprintf(cron_in, "\n%s\n", END_LINE);
+	}
+
+	fflush(cron_in);
+	fclose(cron_in);
+	close(crontab_edit.in);
+
+done_editing:
+	if (finish_command(&crontab_edit)) {
+		result = error(_("'crontab' died"));
+		goto cleanup;
+	}
+	fclose(cron_list);
+
+cleanup:
+	rollback_lock_file(&lk);
+	return result;
+}
+
+static int maintenance_start(void)
+{
+	if (maintenance_register())
+		warning(_("failed to add repo to global config"));
+
+	return update_background_schedule(1);
+}
+
+static int maintenance_stop(void)
+{
+	return update_background_schedule(0);
+}
+
 static const char builtin_maintenance_usage[] =	N_("git maintenance <subcommand> [<options>]");
 
 int cmd_maintenance(int argc, const char **argv, const char *prefix)
@@ -1465,6 +1585,10 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 
 	if (!strcmp(argv[1], "run"))
 		return maintenance_run(argc - 1, argv + 1, prefix);
+	if (!strcmp(argv[1], "start"))
+		return maintenance_start();
+	if (!strcmp(argv[1], "stop"))
+		return maintenance_stop();
 	if (!strcmp(argv[1], "register"))
 		return maintenance_register();
 	if (!strcmp(argv[1], "unregister"))
diff --git a/t/helper/test-crontab.c b/t/helper/test-crontab.c
new file mode 100644
index 0000000000..f5db6319c6
--- /dev/null
+++ b/t/helper/test-crontab.c
@@ -0,0 +1,35 @@
+#include "test-tool.h"
+#include "cache.h"
+
+/*
+ * Usage: test-tool cron <file> [-l]
+ *
+ * If -l is specified, then write the contents of <file> to stdou.
+ * Otherwise, write from stdin into <file>.
+ */
+int cmd__crontab(int argc, const char **argv)
+{
+	char a;
+	FILE *from, *to;
+
+	if (argc == 3 && !strcmp(argv[2], "-l")) {
+		from = fopen(argv[1], "r");
+		if (!from)
+			return 0;
+		to = stdout;
+	} else if (argc == 2) {
+		from = stdin;
+		to = fopen(argv[1], "w");
+	} else
+		return error("unknown arguments");
+
+	while ((a = fgetc(from)) != EOF)
+		fputc(a, to);
+
+	if (argc == 3)
+		fclose(from);
+	else
+		fclose(to);
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 590b2efca7..432b49d948 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -18,6 +18,7 @@ static struct test_cmd cmds[] = {
 	{ "bloom", cmd__bloom },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
+	{ "crontab", cmd__crontab },
 	{ "ctype", cmd__ctype },
 	{ "date", cmd__date },
 	{ "delta", cmd__delta },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ddc8e990e9..7c3281e071 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -8,6 +8,7 @@ int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
+int cmd__crontab(int argc, const char **argv);
 int cmd__ctype(int argc, const char **argv);
 int cmd__date(int argc, const char **argv);
 int cmd__delta(int argc, const char **argv);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 272d1605d2..8803fcf621 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -319,4 +319,32 @@ test_expect_success 'register and unregister' '
 	test_cmp before actual
 '
 
+test_expect_success 'start from empty cron table' '
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
+
+	# start registers the repo
+	git config --get --global maintenance.repo "$(pwd)" &&
+
+	grep "for-each-repo --config=maintenance.repo maintenance run --schedule=daily" cron.txt &&
+	grep "for-each-repo --config=maintenance.repo maintenance run --schedule=hourly" cron.txt &&
+	grep "for-each-repo --config=maintenance.repo maintenance run --schedule=weekly" cron.txt
+'
+
+test_expect_success 'stop from existing schedule' '
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance stop &&
+
+	# stop does not unregister the repo
+	git config --get --global maintenance.repo "$(pwd)" &&
+
+	# Operation is idempotent
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance stop &&
+	test_must_be_empty cron.txt
+'
+
+test_expect_success 'start preserves existing schedule' '
+	echo "Important information!" >cron.txt &&
+	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
+	grep "Important information!" cron.txt
+'
+
 test_done
diff --git a/t/test-lib.sh b/t/test-lib.sh
index ef31f40037..4a60d1ed76 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1702,3 +1702,9 @@ test_lazy_prereq SHA1 '
 test_lazy_prereq REBASE_P '
 	test -z "$GIT_TEST_SKIP_REBASE_P"
 '
+
+# Ensure that no test accidentally triggers a Git command
+# that runs 'crontab', affecting a user's cron schedule.
+# Tests that verify the cron integration must set this locally
+# to avoid errors.
+GIT_TEST_CRONTAB="exit 1"
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 6/6] maintenance: recommended schedule in register/start
  2020-08-28 15:45   ` [PATCH v3 0/6] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
                       ` (4 preceding siblings ...)
  2020-08-28 15:45     ` [PATCH v3 5/6] maintenance: add start/stop subcommands Derrick Stolee via GitGitGadget
@ 2020-08-28 15:45     ` Derrick Stolee via GitGitGadget
  5 siblings, 0 replies; 41+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-28 15:45 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git maintenance (register|start)' subcommands add the current
repository to the global Git config so maintenance will operate on that
repository. It does not specify what maintenance should occur or how
often.

If a user sets any 'maintenance.<task>.schedule' config value, then
they have chosen a specific schedule for themselves and Git should
respect that.

However, in an effort to recommend a good schedule for repositories of
all sizes, set new config values for recommended tasks that are safe to
run in the background while users run foreground Git commands. These
commands are generally everything but the 'gc' task.

Author's Note: I feel we should do _something_ to recommend a good
schedule to users, but I'm not 100% set on this schedule. This is the
schedule we use in Scalar and VFS for Git for very large repositories
using the GVFS protocol. While the schedule works in that environment,
it is possible that "normal" Git repositories could benefit from
something more obvious (such as running 'gc' weekly). However, this
patch gives us a place to start a conversation on what we should
recommend. For my purposes, Scalar will set these config values so we
can always differ from core Git's recommendations.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  6 ++++
 builtin/gc.c                      | 46 +++++++++++++++++++++++++++++++
 t/t7900-maintenance.sh            | 16 +++++++++++
 3 files changed, 68 insertions(+)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 7f8c279fe8..364b3e32bf 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -37,6 +37,12 @@ register::
 	`maintenance.<task>.schedule`. The tasks that are enabled are safe
 	for running in the background without disrupting foreground
 	processes.
++
+If your repository has no 'maintenance.<task>.schedule' configuration
+values set, then Git will set configuration values to some recommended
+settings. These settings disable foreground maintenance while performing
+maintenance tasks in the background that will not interrupt foreground Git
+operations.
 
 run::
 	Run one or more maintenance tasks. If one or more `--task` options
diff --git a/builtin/gc.c b/builtin/gc.c
index 9914417e25..5f253d3458 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1408,6 +1408,49 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	return maintenance_run_tasks(&opts);
 }
 
+static int has_schedule_config(void)
+{
+	int i, found = 0;
+	struct strbuf config_name = STRBUF_INIT;
+	size_t prefix;
+
+	strbuf_addstr(&config_name, "maintenance.");
+	prefix = config_name.len;
+
+	for (i = 0; !found && i < TASK__COUNT; i++) {
+		char *value;
+
+		strbuf_setlen(&config_name, prefix);
+		strbuf_addf(&config_name, "%s.schedule", tasks[i].name);
+
+		if (!git_config_get_string(config_name.buf, &value)) {
+			found = 1;
+			FREE_AND_NULL(value);
+		}
+	}
+
+	strbuf_release(&config_name);
+	return found;
+}
+
+static void set_recommended_schedule(void)
+{
+	git_config_set("maintenance.auto", "false");
+	git_config_set("maintenance.gc.enabled", "false");
+
+	git_config_set("maintenance.prefetch.enabled", "true");
+	git_config_set("maintenance.prefetch.schedule", "hourly");
+
+	git_config_set("maintenance.commit-graph.enabled", "true");
+	git_config_set("maintenance.commit-graph.schedule", "hourly");
+
+	git_config_set("maintenance.loose-objects.enabled", "true");
+	git_config_set("maintenance.loose-objects.schedule", "daily");
+
+	git_config_set("maintenance.incremental-repack.enabled", "true");
+	git_config_set("maintenance.incremental-repack.schedule", "daily");
+}
+
 static int maintenance_register(void)
 {
 	struct child_process config_set = CHILD_PROCESS_INIT;
@@ -1417,6 +1460,9 @@ static int maintenance_register(void)
 	if (!the_repository || !the_repository->gitdir)
 		return 0;
 
+	if (!has_schedule_config())
+		set_recommended_schedule();
+
 	config_get.git_cmd = 1;
 	strvec_pushl(&config_get.args, "config", "--global", "--get", "maintenance.repo",
 		     the_repository->worktree ? the_repository->worktree
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 8803fcf621..5a31f3925b 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -309,7 +309,23 @@ test_expect_success 'register and unregister' '
 	git config --global --add maintenance.repo /existing1 &&
 	git config --global --add maintenance.repo /existing2 &&
 	git config --global --get-all maintenance.repo >before &&
+
+	# We still have maintenance.<task>.schedule config set,
+	# so this does not update the local schedule
+	git maintenance register &&
+	test_must_fail git config maintenance.auto &&
+
+	# Clear previous maintenance.<task>.schedule values
+	for task in loose-objects commit-graph incremental-repack
+	do
+		git config --unset maintenance.$task.schedule || return 1
+	done &&
 	git maintenance register &&
+	test_cmp_config false maintenance.auto &&
+	test_cmp_config false maintenance.gc.enabled &&
+	test_cmp_config true maintenance.prefetch.enabled &&
+	test_cmp_config hourly maintenance.commit-graph.schedule &&
+	test_cmp_config daily maintenance.incremental-repack.schedule &&
 	git config --global --get-all maintenance.repo >actual &&
 	cp before after &&
 	pwd >>after &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2020-08-28 15:46 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-19 17:16 [PATCH 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
2020-08-19 17:16 ` [PATCH 1/7] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
2020-08-20  2:06   ` Đoàn Trần Công Danh
2020-08-20 12:12     ` Derrick Stolee
2020-08-19 17:16 ` [PATCH 3/7] maintenance: add --scheduled option and config Derrick Stolee via GitGitGadget
2020-08-20 14:51   ` Đoàn Trần Công Danh
2020-08-24 14:03     ` Derrick Stolee
2020-08-19 17:16 ` [PATCH 4/7] for-each-repo: run subcommands on configured repos Derrick Stolee via GitGitGadget
2020-08-20 15:00   ` Đoàn Trần Công Danh
2020-08-19 17:16 ` [PATCH 5/7] maintenance: add [un]register subcommands Derrick Stolee via GitGitGadget
2020-08-19 17:16 ` [PATCH 6/7] maintenance: add start/stop subcommands Derrick Stolee via GitGitGadget
2020-08-19 17:16 ` [PATCH 7/7] maintenance: recommended schedule in register/start Derrick Stolee via GitGitGadget
     [not found] ` <bdc27fa28ee70222ed3c7c9863746ace8ea835e4.1597857409.git.gitgitgadget@gmail.com>
2020-08-20 14:34   ` [PATCH 2/7] maintenance: store the "last run" time in config Đoàn Trần Công Danh
2020-08-25 18:39 ` [PATCH v2 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
2020-08-25 18:39   ` [PATCH v2 1/7] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
2020-08-25 21:44     ` Junio C Hamano
2020-08-26 12:29       ` Derrick Stolee
2020-08-26 16:57         ` Junio C Hamano
2020-08-25 18:39   ` [PATCH v2 2/7] maintenance: store the "last run" time in config Derrick Stolee via GitGitGadget
2020-08-25 21:52     ` Junio C Hamano
2020-08-26 13:34       ` Derrick Stolee
2020-08-26 17:03         ` Junio C Hamano
2020-08-27 13:02           ` Derrick Stolee
2020-08-25 18:40   ` [PATCH v2 3/7] maintenance: add --scheduled option and config Derrick Stolee via GitGitGadget
2020-08-25 22:01     ` Junio C Hamano
2020-08-26 15:30       ` Derrick Stolee
2020-08-27 15:47         ` Derrick Stolee
2020-08-25 18:40   ` [PATCH v2 4/7] for-each-repo: run subcommands on configured repos Derrick Stolee via GitGitGadget
2020-08-25 22:19     ` Junio C Hamano
2020-08-26 16:03       ` Derrick Stolee
2020-08-25 18:40   ` [PATCH v2 5/7] maintenance: add [un]register subcommands Derrick Stolee via GitGitGadget
2020-08-25 18:40   ` [PATCH v2 6/7] maintenance: add start/stop subcommands Derrick Stolee via GitGitGadget
2020-08-25 18:40   ` [PATCH v2 7/7] maintenance: recommended schedule in register/start Derrick Stolee via GitGitGadget
2020-08-28 15:45   ` [PATCH v3 0/6] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
2020-08-28 15:45     ` [PATCH v3 1/6] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
2020-08-28 15:45     ` [PATCH v3 2/6] maintenance: add --schedule option and config Derrick Stolee via GitGitGadget
2020-08-28 15:45     ` [PATCH v3 3/6] for-each-repo: run subcommands on configured repos Derrick Stolee via GitGitGadget
2020-08-28 15:45     ` [PATCH v3 4/6] maintenance: add [un]register subcommands Derrick Stolee via GitGitGadget
2020-08-28 15:45     ` [PATCH v3 5/6] maintenance: add start/stop subcommands Derrick Stolee via GitGitGadget
2020-08-28 15:45     ` [PATCH v3 6/6] maintenance: recommended schedule in register/start Derrick Stolee via GitGitGadget
2020-08-26 12:42 ` [PATCH 0/7] [RFC] Maintenance III: background maintenance Michal Suchánek

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).