git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jonathan Nieder <jrnieder@gmail.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, sandals@crustytoothpaste.net,
	steadmon@google.com, peff@peff.net, congdanhqx@gmail.com,
	phillip.wood123@gmail.com, emilyshaffer@google.com,
	sluongng@gmail.com, jonathantanmy@google.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 01/11] maintenance: create basic maintenance runner
Date: Thu, 13 Aug 2020 21:05:48 -0400	[thread overview]
Message-ID: <b9db68dc-95a8-f8c3-a015-9d3390e30f7c@gmail.com> (raw)
In-Reply-To: <20200812210326.GA104953@google.com>

On 8/12/2020 5:03 PM, Jonathan Nieder wrote:
> Derrick Stolee wrote:
> 
>> Helped-by: Jonathan Nieder <jrnieder@gmail.com>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  .gitignore                        |  1 +
>>  Documentation/git-maintenance.txt | 57 +++++++++++++++++++++++++++++++
>>  builtin.h                         |  1 +
>>  builtin/gc.c                      | 57 +++++++++++++++++++++++++++++++
>>  git.c                             |  1 +
>>  t/t7900-maintenance.sh            | 19 +++++++++++
>>  t/test-lib-functions.sh           | 33 ++++++++++++++++++
>>  7 files changed, 169 insertions(+)
>>  create mode 100644 Documentation/git-maintenance.txt
>>  create mode 100755 t/t7900-maintenance.sh
> 
> Looks reasonable.
> 
> [...]
>> --- /dev/null
>> +++ b/Documentation/git-maintenance.txt
>> @@ -0,0 +1,57 @@
>> +git-maintenance(1)
>> +==================
>> +
>> +NAME
>> +----
>> +git-maintenance - Run tasks to optimize Git repository data
>> +
>> +
>> +SYNOPSIS
>> +--------
>> +[verse]
>> +'git maintenance' run [<options>]
>> +
>> +
>> +DESCRIPTION
>> +-----------
>> +Run tasks to optimize Git repository data, speeding up other Git commands
>> +and reducing storage requirements for the repository.
>> ++
>> +Git commands that add repository data, such as `git add` or `git fetch`,
>> +are optimized for a responsive user experience. These commands do not take
>> +time to optimize the Git data, since such optimizations scale with the full
>> +size of the repository while these user commands each perform a relatively
>> +small action.
>> ++
>> +The `git maintenance` command provides flexibility for how to optimize the
>> +Git repository.
>> +
>> +SUBCOMMANDS
>> +-----------
>> +
>> +run::
>> +	Run one or more maintenance tasks.
> 
> This is still confusing --- shouldn't it say something like "Run the
> `gc` maintenance task (see below)"?

As I mentioned in the earlier version, I disagree with this.

> [...]
>> +TASKS
>> +-----
>> +
>> +gc::
>> +	Cleanup unnecessary files and optimize the local repository. "GC"
> 
> nit: cleanup is the noun, "clean up" is the verb
> 
>> +	stands for "garbage collection," but this task performs many
>> +	smaller tasks. This task can be rather expensive for large
> 
> nit: the word "rather" is not doing much work here, so we could leave
> it out

Both good.

> 
>> +	repositories, as it repacks all Git objects into a single pack-file.
>> +	It can also be disruptive in some situations, as it deletes stale
>> +	data.
> 
> What stale data is this referring to?  Where can I read more about
> what disruption it causes (or in other words, as a user, how would I
> decide when *not* to run this command)?

For this and the next comment, I prefer (for now) to keep all of the
deep details of the 'gc' task in the git-gc.txt documentation. However,
I should use "linkgit:git-gc[1]" to point users to that for reference.

Eventually, the maintenance builtin might replace the gc builtin, but
only after all of its behavior has been extracted into smaller tasks.
Those tasks would be documented in detail here, allowing us to make the
blurb for the 'gc' task be "Run the 'prune-reflog', 'pack-refs', ...,
and 'full-repack' tasks."

> [...]
>> +OPTIONS
>> +-------
>> +--auto::
>> +	When combined with the `run` subcommand, run maintenance tasks
>> +	only if certain thresholds are met. For example, the `gc` task
>> +	runs when the number of loose objects exceeds the number stored
>> +	in the `gc.auto` config setting, or when the number of pack-files
>> +	exceeds the `gc.autoPackLimit` config setting.
> 
> Hm, today I learned.  I had thought that --auto doesn't only affect
> thresholds but also would affect how aggressive the gc is when
> triggered, but the git-gc(1) manpage agrees with what's said above.
> 
> Is there a way we can share information between the two to avoid one
> falling out of date?  For example, should git-gc.txt point to this
> page for the authoritative description?

I remember Junio mentioning something about how 'gc' interprets '--auto'
as a "quick" mode, but like you I cannot find any reference to that in
the docs. Since it is not documented there and none of the other tasks
I am implementing here use that interpretation, I'll plan to leave this
portion as-is.
 
> [...]
>> --- a/builtin/gc.c
>> +++ b/builtin/gc.c
>> @@ -699,3 +699,60 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>>  
>>  	return 0;
>>  }
>> +
>> +static const char * const builtin_maintenance_usage[] = {
>> +	N_("git maintenance run [<options>]"),
> 
> not about this patch: I wish we could automatically generate a usage
> string in this simple kind of case, to decrease the burden on
> translators.
> 
> [...]
>> +static int maintenance_task_gc(struct maintenance_opts *opts)
>> +{
>> +	struct child_process child = CHILD_PROCESS_INIT;
>> +
>> +	child.git_cmd = 1;
>> +	strvec_push(&child.args, "gc");
>> +
>> +	if (opts->auto_flag)
>> +		strvec_push(&child.args, "--auto");
>> +
>> +	close_object_store(the_repository->objects);
> 
> Interesting --- what does this function call do?

We need to close our file handles on the pack-files (and
commit-graph and multi-pack-index) or else the GC subcommand
cannot delete the files on Windows.

>> +	return run_command(&child);
>> +}
>> +
>> +static int maintenance_run(struct maintenance_opts *opts)
>> +{
>> +	return maintenance_task_gc(opts);
>> +}
>> +
>> +int cmd_maintenance(int argc, const char **argv, const char *prefix)
>> +{
>> +	static struct maintenance_opts opts;
>> +	static struct option builtin_maintenance_options[] = {
>> +		OPT_BOOL(0, "auto", &opts.auto_flag,
>> +			 N_("run tasks based on the state of the repository")),
>> +		OPT_END()
>> +	};
> 
> optional: these could be local instead of static

I see my problem here. The builtin_maintenance_options is static
(copied from another builtin, I'm sure) which is why the 'opts'
struct needs to be static or else this doesn't compile.

>> +
>> +	memset(&opts, 0, sizeof(opts));
> 
> not needed if it's static.  If it's not static, could use `opts = {0}`
> where it's declared to do the same thing in a simpler way.
> 
>> +
>> +	if (argc == 2 && !strcmp(argv[1], "-h"))
>> +		usage_with_options(builtin_maintenance_usage,
>> +				   builtin_maintenance_options);
>> +
>> +	argc = parse_options(argc, argv, prefix,
>> +			     builtin_maintenance_options,
>> +			     builtin_maintenance_usage,
>> +			     PARSE_OPT_KEEP_UNKNOWN);
>> +
>> +	if (argc == 1) {
>> +		if (!strcmp(argv[0], "run"))
>> +			return maintenance_run(&opts);
>> +	}
>> +
>> +	usage_with_options(builtin_maintenance_usage,
>> +			   builtin_maintenance_options);
> 
> optional: could reverse this test to get the uninteresting case out of
> the way first:
> 
> 	if (argc != 1)
> 		usage ...
> 
> 	...
> 
> That would also allow making the usage string when argv[0] is not
> "run" more specific.

Sure. Were you thinking of anything more specific than

	die(_("invalid subcommand: %s"), argv[0]);

? Of course, running "git maintenance barf" would result in

	fatal: invalid subcommand: barf

with an exit code of 128 instead of 129 like the usage string does,
so I'm not sure of the best way to make a custom "usage" error.

> [...]
>> --- /dev/null
>> +++ b/t/t7900-maintenance.sh
>> @@ -0,0 +1,19 @@
>> +#!/bin/sh
>> +
>> +test_description='git maintenance builtin'
>> +
>> +. ./test-lib.sh
>> +
>> +test_expect_success 'help text' '
>> +	test_expect_code 129 git maintenance -h 2>err &&
>> +	test_i18ngrep "usage: git maintenance run" err
>> +'
>> +
>> +test_expect_success 'run [--auto]' '
>> +	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
>> +	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
>> +	test_subcommand git gc <run-no-auto.txt &&
>> +	test_subcommand git gc --auto <run-auto.txt
> 
> Nice.
> 
> [...]
>> --- a/t/test-lib-functions.sh
>> +++ b/t/test-lib-functions.sh
>> @@ -1561,3 +1561,36 @@ test_path_is_hidden () {
>>  	case "$("$SYSTEMROOT"/system32/attrib "$1")" in *H*?:*) return 0;; esac
>>  	return 1
>>  }
>> +
>> +# Check that the given command was invoked as part of the
>> +# trace2-format trace on stdin.
>> +#
>> +#	test_subcommand [!] <command> <args>... < <trace>
>> +#
>> +# For example, to look for an invocation of "git upload-pack
>> +# /path/to/repo"
>> +#
>> +#	GIT_TRACE2_EVENT=event.log git fetch ... &&
>> +#	test_subcommand git upload-pack "$PATH" <event.log
>> +#
>> +# If the first parameter passed is !, this instead checks that
>> +# the given command was not called.
>> +#
>> +test_subcommand () {
>> +	local negate=
>> +	if test "$1" = "!"
>> +	then
>> +		negate=t
>> +		shift
>> +	fi
>> +
>> +	local expr=$(printf '"%s",' "$@")
>> +	expr="${expr%,}"
>> +
>> +	if test -n "$negate"
>> +	then
>> +		! grep "\[$expr\]"
>> +	else
>> +		grep "\[$expr\]"
>> +	fi
>> +}
> 
> With whatever subset of the changes described above makes sense,
> 
> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
> 
> Thanks for your patient work.

Thanks,
-Stolee


  parent reply	other threads:[~2020-08-14  1:05 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-06 15:48 [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
2020-08-07 22:16   ` Martin Ågren
2020-08-12 21:03   ` Jonathan Nieder
2020-08-12 22:07     ` Junio C Hamano
2020-08-12 22:50       ` Jonathan Nieder
2020-08-14  1:05     ` Derrick Stolee [this message]
2020-08-06 15:48 ` [PATCH 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
2020-08-07 22:29   ` Martin Ågren
2020-08-12 13:30     ` Derrick Stolee
2020-08-14 12:23       ` Martin Ågren
2020-08-06 15:48 ` [PATCH 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
2020-08-18 14:22   ` [PATCH v2 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
2020-08-18 14:22   ` [PATCH v2 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
2020-08-18 14:23   ` [PATCH v2 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
2020-08-18 14:23   ` [PATCH v2 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
2020-08-18 23:46     ` Jonathan Tan
2020-08-18 14:23   ` [PATCH v2 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
2020-08-18 23:51     ` Jonathan Tan
2020-08-19 15:04       ` Derrick Stolee
2020-08-19 17:43         ` Jonathan Tan
2020-08-18 14:23   ` [PATCH v2 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
2020-08-19  0:00     ` Jonathan Tan
2020-08-19  0:36       ` Junio C Hamano
2020-08-19 15:09         ` Derrick Stolee
2020-08-19 17:35           ` Jonathan Tan
2020-08-18 14:23   ` [PATCH v2 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
2020-08-19  0:04     ` Jonathan Tan
2020-08-19 15:10       ` Derrick Stolee
2020-08-18 14:23   ` [PATCH v2 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
2020-08-18 14:23   ` [PATCH v2 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
2020-08-18 14:23   ` [PATCH v2 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
2020-08-19  0:09     ` Jonathan Tan
2020-08-19 15:15       ` Derrick Stolee
2020-08-18 14:23   ` [PATCH v2 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
2020-08-19  0:11     ` Jonathan Tan
2020-08-18 20:18   ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Junio C Hamano
2020-08-19 14:51     ` Derrick Stolee
2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
2020-08-26 23:02       ` Jonathan Tan
2020-08-25 18:33     ` [PATCH v3 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
2020-08-26 23:02       ` Jonathan Tan
2020-08-26 23:56         ` Junio C Hamano
2020-08-25 18:33     ` [PATCH v3 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
2020-09-21 13:36           ` Ævar Arnfjörð Bjarmason
2020-09-21 13:43             ` Derrick Stolee
2020-09-21 19:29               ` Junio C Hamano
2020-09-17 18:11         ` [PATCH v5 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
2020-09-17 18:35         ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Junio C Hamano
2020-09-18 13:14           ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b9db68dc-95a8-f8c3-a015-9d3390e30f7c@gmail.com \
    --to=stolee@gmail.com \
    --cc=congdanhqx@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=phillip.wood123@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    --cc=sluongng@gmail.com \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).