git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/6] configuration-based hook management
@ 2019-12-10  2:33 Emily Shaffer
  2019-12-10  2:33 ` [PATCH 1/6] hook: scaffolding for git-hook subcommand Emily Shaffer
                   ` (6 more replies)
  0 siblings, 7 replies; 125+ messages in thread
From: Emily Shaffer @ 2019-12-10  2:33 UTC (permalink / raw)
  To: git
  Cc: Emily Shaffer, brian m. carlson, Jonathan Nieder,
	Ævar Arnfjörð Bjarmason

An implementation of the first piece of the proposal given in
lore.kernel.org/git/20191116011125.GG22855@google.com.

Teaches a new command, 'git hook', which will someday include 'git hook
--add ...', 'git hook --edit ...', and maybe more. For now, just teach
it how to check the config files with 'git hook --list ...'.

The hooks-to-run list is collected in a new library, hook.o, which can
someday reimplement find_hook() or otherwise be invoked to run all hooks
for a given hookname (e.g. "pre-commit").

The change to config.[ch] allows us to display a similar scope name to
the one a user may use to 'git config --add' or later 'git hook --add' a
hook at a certain scope, e.g.:

  $ git hook --list pre-commit
  001	global	~/foo.sh
  $ git hook --add --global pre-commit 005 ~/bar.sh
  Added.
  001	global	~/foo.sh
  005	global	~/bar.sh

There are config examples in many of the commit messages in this chain.

Before I consider "--list" to be done, I also want to add support to
check "hook.runHookDir" and take .git/hooks/* into account. But I wanted
us to spend time chewing on the config format for a while before I got
too far.

It's also very possible (likely, even!) that this feature will sit
behind an experimental flag, which gives us more room to change the
config format before the feature is "done".

In the discussion thread with brian, I also mentioned a self-paced
deprecation of hooks which live in .git/hooks/, which I'm aware some
users may not want to follow. However, it occurred to me that we may be
able to hide a Git-paced deprecation behind a config macro (since those
are new and shiny) which is opt-in, and handles something like:

  hook.runHookDir = true
  hook.warnHookDir = false

  {some months pass, we are sure config-based hooks are working nicely}

  hook.runHookDir = true
  hook.warnHookDir = true

  {so start yelling at users to move away, and wait some more
  months/years}

  hook.runHookDir = false

  {users who have opted into the hookdir phaseout macro are no longer
  using the hookdir}

As it's opt-in (and easily reversible by changing configs) this might be
a good middle ground for the "deprecate or not" discussion brian and I
had.
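
To make the phases above concrete, here is a rough sketch (not code from
this series) of how the two settings might combine into one decision.
Only the config names hook.runHookDir and hook.warnHookDir come from the
proposal, and neither is implemented yet; the struct, enum, and function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical holder for the two proposed settings. */
struct hookdir_config {
	int run_hook_dir;   /* stands in for hook.runHookDir */
	int warn_hook_dir;  /* stands in for hook.warnHookDir */
};

/* Hypothetical outcome for a hook found in .git/hooks/. */
enum hookdir_action {
	HOOKDIR_SKIP,      /* do not run the hookdir hook */
	HOOKDIR_RUN,       /* run it silently */
	HOOKDIR_RUN_WARN   /* run it, and ask the user to migrate */
};

/* Map the two booleans onto the three phases of the phaseout. */
enum hookdir_action decide_hookdir_action(const struct hookdir_config *cfg)
{
	assert(cfg != NULL);

	if (!cfg->run_hook_dir)
		return HOOKDIR_SKIP;
	return cfg->warn_hook_dir ? HOOKDIR_RUN_WARN : HOOKDIR_RUN;
}
```

Under this reading, hook.warnHookDir has no effect once
hook.runHookDir is false, which matches the last phase listed above.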

Thanks.
 - Emily

Emily Shaffer (6):
  hook: scaffolding for git-hook subcommand
  config: add string mapping for enum config_scope
  hook: add --list mode
  hook: support reordering of hook list
  hook: remove prior hook with '---'
  hook: teach --porcelain mode

 .gitignore                    |  1 +
 Documentation/git-hook.txt    | 53 ++++++++++++++++++++
 Makefile                      |  2 +
 builtin.h                     |  1 +
 builtin/hook.c                | 80 ++++++++++++++++++++++++++++++
 config.c                      | 17 +++++++
 config.h                      |  1 +
 git.c                         |  1 +
 hook.c                        | 93 +++++++++++++++++++++++++++++++++++
 hook.h                        | 14 ++++++
 t/t1360-config-based-hooks.sh | 89 +++++++++++++++++++++++++++++++++
 11 files changed, 352 insertions(+)
 create mode 100644 Documentation/git-hook.txt
 create mode 100644 builtin/hook.c
 create mode 100644 hook.c
 create mode 100644 hook.h
 create mode 100755 t/t1360-config-based-hooks.sh

-- 
2.24.0.393.g34dc348eaf-goog



* [PATCH 1/6] hook: scaffolding for git-hook subcommand
  2019-12-10  2:33 [PATCH 0/6] configuration-based hook management Emily Shaffer
@ 2019-12-10  2:33 ` Emily Shaffer
  2019-12-12  9:41   ` Bert Wesarg
  2019-12-12 10:47   ` SZEDER Gábor
  2019-12-10  2:33 ` [PATCH 2/6] config: add string mapping for enum config_scope Emily Shaffer
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 125+ messages in thread
From: Emily Shaffer @ 2019-12-10  2:33 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Introduce infrastructure for a new subcommand, git-hook, which will be
used to ease config-based hook management. This command will handle
parsing configs to compose a list of hooks to run for a given event, as
well as adding or modifying hook configs in an interactive fashion.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 .gitignore                    |  1 +
 Documentation/git-hook.txt    | 19 +++++++++++++++++++
 Makefile                      |  1 +
 builtin.h                     |  1 +
 builtin/hook.c                | 21 +++++++++++++++++++++
 git.c                         |  1 +
 t/t1360-config-based-hooks.sh | 11 +++++++++++
 7 files changed, 55 insertions(+)
 create mode 100644 Documentation/git-hook.txt
 create mode 100644 builtin/hook.c
 create mode 100755 t/t1360-config-based-hooks.sh

diff --git a/.gitignore b/.gitignore
index 89b3b79c1a..9ef59b9baa 100644
--- a/.gitignore
+++ b/.gitignore
@@ -74,6 +74,7 @@
 /git-grep
 /git-hash-object
 /git-help
+/git-hook
 /git-http-backend
 /git-http-fetch
 /git-http-push
diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt
new file mode 100644
index 0000000000..2d50c414cc
--- /dev/null
+++ b/Documentation/git-hook.txt
@@ -0,0 +1,19 @@
+git-hook(1)
+===========
+
+NAME
+----
+git-hook - Manage configured hooks
+
+SYNOPSIS
+--------
+[verse]
+'git hook'
+
+DESCRIPTION
+-----------
+You can list, add, and modify hooks with this command.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 58b92af54b..83263505c0 100644
--- a/Makefile
+++ b/Makefile
@@ -1074,6 +1074,7 @@ BUILTIN_OBJS += builtin/get-tar-commit-id.o
 BUILTIN_OBJS += builtin/grep.o
 BUILTIN_OBJS += builtin/hash-object.o
 BUILTIN_OBJS += builtin/help.o
+BUILTIN_OBJS += builtin/hook.o
 BUILTIN_OBJS += builtin/index-pack.o
 BUILTIN_OBJS += builtin/init-db.o
 BUILTIN_OBJS += builtin/interpret-trailers.o
diff --git a/builtin.h b/builtin.h
index 5cf5df69f7..d4ca2ac9a5 100644
--- a/builtin.h
+++ b/builtin.h
@@ -173,6 +173,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix);
 int cmd_grep(int argc, const char **argv, const char *prefix);
 int cmd_hash_object(int argc, const char **argv, const char *prefix);
 int cmd_help(int argc, const char **argv, const char *prefix);
+int cmd_hook(int argc, const char **argv, const char *prefix);
 int cmd_index_pack(int argc, const char **argv, const char *prefix);
 int cmd_init_db(int argc, const char **argv, const char *prefix);
 int cmd_interpret_trailers(int argc, const char **argv, const char *prefix);
diff --git a/builtin/hook.c b/builtin/hook.c
new file mode 100644
index 0000000000..b2bbc84d4d
--- /dev/null
+++ b/builtin/hook.c
@@ -0,0 +1,21 @@
+#include "cache.h"
+
+#include "builtin.h"
+#include "parse-options.h"
+
+static const char * const builtin_hook_usage[] = {
+	N_("git hook"),
+	NULL
+};
+
+int cmd_hook(int argc, const char **argv, const char *prefix)
+{
+	struct option builtin_hook_options[] = {
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, prefix, builtin_hook_options,
+			     builtin_hook_usage, 0);
+
+	return 0;
+}
diff --git a/git.c b/git.c
index ce6ab0ece2..c8344b9ab7 100644
--- a/git.c
+++ b/git.c
@@ -513,6 +513,7 @@ static struct cmd_struct commands[] = {
 	{ "grep", cmd_grep, RUN_SETUP_GENTLY },
 	{ "hash-object", cmd_hash_object },
 	{ "help", cmd_help },
+	{ "hook", cmd_hook, RUN_SETUP },
 	{ "index-pack", cmd_index_pack, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "init", cmd_init_db },
 	{ "init-db", cmd_init_db },
diff --git a/t/t1360-config-based-hooks.sh b/t/t1360-config-based-hooks.sh
new file mode 100755
index 0000000000..34b0df5216
--- /dev/null
+++ b/t/t1360-config-based-hooks.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+test_description='config-managed multihooks, including git-hook command'
+
+. ./test-lib.sh
+
+test_expect_success 'git hook command does not crash' '
+	git hook
+'
+
+test_done
-- 
2.24.0.393.g34dc348eaf-goog



* [PATCH 2/6] config: add string mapping for enum config_scope
  2019-12-10  2:33 [PATCH 0/6] configuration-based hook management Emily Shaffer
  2019-12-10  2:33 ` [PATCH 1/6] hook: scaffolding for git-hook subcommand Emily Shaffer
@ 2019-12-10  2:33 ` Emily Shaffer
  2019-12-10 11:16   ` Philip Oakley
  2019-12-10  2:33 ` [PATCH 3/6] hook: add --list mode Emily Shaffer
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 125+ messages in thread
From: Emily Shaffer @ 2019-12-10  2:33 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

If a user is interacting with their config files primarily by the 'git
config' command, using the location flags (--global, --system, etc) then
they may be more interested to see the scope of the config file they are
editing, rather than the filepath.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 config.c | 17 +++++++++++++++++
 config.h |  1 +
 2 files changed, 18 insertions(+)

diff --git a/config.c b/config.c
index e7052b3977..a20110e016 100644
--- a/config.c
+++ b/config.c
@@ -3312,6 +3312,23 @@ enum config_scope current_config_scope(void)
 		return current_parsing_scope;
 }
 
+const char *config_scope_to_string(enum config_scope scope)
+{
+	switch (scope) {
+	case CONFIG_SCOPE_SYSTEM:
+		return _("system");
+	case CONFIG_SCOPE_GLOBAL:
+		return _("global");
+	case CONFIG_SCOPE_REPO:
+		return _("repo");
+	case CONFIG_SCOPE_CMDLINE:
+		return _("cmdline");
+	case CONFIG_SCOPE_UNKNOWN:
+	default:
+		return _("unknown");
+	}
+}
+
 int lookup_config(const char **mapping, int nr_mapping, const char *var)
 {
 	int i;
diff --git a/config.h b/config.h
index f0ed464004..612f43acd0 100644
--- a/config.h
+++ b/config.h
@@ -139,6 +139,7 @@ enum config_scope {
 };
 
 enum config_scope current_config_scope(void);
+const char *config_scope_to_string(enum config_scope);
 const char *current_config_origin_type(void);
 const char *current_config_name(void);
 
-- 
2.24.0.393.g34dc348eaf-goog



* [PATCH 3/6] hook: add --list mode
  2019-12-10  2:33 [PATCH 0/6] configuration-based hook management Emily Shaffer
  2019-12-10  2:33 ` [PATCH 1/6] hook: scaffolding for git-hook subcommand Emily Shaffer
  2019-12-10  2:33 ` [PATCH 2/6] config: add string mapping for enum config_scope Emily Shaffer
@ 2019-12-10  2:33 ` Emily Shaffer
  2019-12-12  9:38   ` Bert Wesarg
  2019-12-12 10:58   ` SZEDER Gábor
  2019-12-10  2:33 ` [PATCH 4/6] hook: support reordering of hook list Emily Shaffer
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 125+ messages in thread
From: Emily Shaffer @ 2019-12-10  2:33 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Teach 'git hook --list <hookname>', which checks the known configs in
order to create an ordered list of hooks to run on a given hook event.

The hook config format is "hook.<hookname> = <order>:<path-to-hook>".
This paves the way for multiple hook support; hooks should be run in the
order specified by the user in the config, and in the case of an order
number collision, configuration order should be used (e.g. global hook
004 will run before repo hook 004).

For example:

  $ grep -A2 "\[hook\]" ~/.gitconfig
  [hook]
          pre-commit = 001:~/test.sh
          pre-commit = 999:~/baz.sh

  $ grep -A1 "\[hook\]" ~/git/.git/config
  [hook]
          pre-commit = 900:~/bar.sh

  $ ./bin-wrappers/git hook --list pre-commit
  001     global  ~/test.sh
  900     repo    ~/bar.sh
  999     global  ~/baz.sh

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
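Note for readers: the "<order>:<path-to-hook>" value described above
can be split without a fixed-size buffer. The sketch below is one
possible approach and is not what this patch does (the patch uses
sscanf() into a 256-byte array and flags that itself as a TODO). The
function name is hypothetical. Two behaviours are assumptions of the
sketch: negative order numbers are rejected, and everything after the
first colon is kept as the command, spaces included.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Split "<order>:<command>".
 *
 * On success, returns 0, stores the number in *order and makes *command
 * point at the text after the colon (inside value, nothing is copied).
 * Returns -1 if value does not match the format.
 */
int parse_hook_value(const char *value, int *order, const char **command)
{
	char *end;
	long n;

	assert(value && order && command);

	errno = 0;
	n = strtol(value, &end, 10);

	/* no digits, no colon after the digits, or nothing after the colon */
	if (end == value || *end != ':' || end[1] == '\0')
		return -1;
	/* out of range for long or int; negatives rejected (an assumption) */
	if (errno == ERANGE || n < 0 || n > INT_MAX)
		return -1;

	*order = (int)n;
	*command = end + 1;
	return 0;
}
```

A caller would treat a -1 return as the "doesn't match expected format"
case. Because the command is a pointer into the original value, it must
be copied if it needs to outlive the config callback.
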
 Documentation/git-hook.txt    | 17 +++++++-
 Makefile                      |  1 +
 builtin/hook.c                | 54 ++++++++++++++++++++++-
 hook.c                        | 81 +++++++++++++++++++++++++++++++++++
 hook.h                        | 14 ++++++
 t/t1360-config-based-hooks.sh | 43 ++++++++++++++++++-
 6 files changed, 206 insertions(+), 4 deletions(-)
 create mode 100644 hook.c
 create mode 100644 hook.h

diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt
index 2d50c414cc..a141884239 100644
--- a/Documentation/git-hook.txt
+++ b/Documentation/git-hook.txt
@@ -8,12 +8,27 @@ git-hook - Manage configured hooks
 SYNOPSIS
 --------
 [verse]
-'git hook'
+'git hook' -l | --list <hook-name>
 
 DESCRIPTION
 -----------
 You can list, add, and modify hooks with this command.
 
+This command parses the default configuration files for lines which look like
+"hook.<hook-name> = <order number>:<hook command>", e.g. "hook.pre-commit =
+010:/path/to/script.sh". In this way, multiple scripts can be run during a
+single hook. Hooks are sorted in ascending order by order number; in the event
+of an order number conflict, they are sorted in configuration order.
+
+OPTIONS
+-------
+
+-l::
+--list::
+	List the hooks which have been configured for <hook-name>. Hooks appear
+	in the order they should be run. Output of this command follows the
+	format '<order number> <origin config> <hook command>'.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 83263505c0..21b3a82208 100644
--- a/Makefile
+++ b/Makefile
@@ -892,6 +892,7 @@ LIB_OBJS += hashmap.o
 LIB_OBJS += linear-assignment.o
 LIB_OBJS += help.o
 LIB_OBJS += hex.o
+LIB_OBJS += hook.o
 LIB_OBJS += ident.o
 LIB_OBJS += interdiff.o
 LIB_OBJS += json-writer.o
diff --git a/builtin/hook.c b/builtin/hook.c
index b2bbc84d4d..8261302b27 100644
--- a/builtin/hook.c
+++ b/builtin/hook.c
@@ -1,21 +1,73 @@
 #include "cache.h"
 
 #include "builtin.h"
+#include "config.h"
+#include "hook.h"
 #include "parse-options.h"
+#include "strbuf.h"
 
 static const char * const builtin_hook_usage[] = {
-	N_("git hook"),
+	N_("git hook --list <hookname>"),
 	NULL
 };
 
+enum hook_command {
+	HOOK_NO_COMMAND = 0,
+	HOOK_LIST,
+};
+
+static int print_hook_list(const struct strbuf *hookname)
+{
+	struct list_head *head, *pos;
+	struct hook *item;
+
+	head = hook_list(hookname);
+
+	list_for_each(pos, head) {
+		item = list_entry(pos, struct hook, list);
+		if (item)
+			printf("%.3d\t%s\t%s\n", item->order,
+			       config_scope_to_string(item->origin),
+			       item->command.buf);
+	}
+
+	return 0;
+}
+
 int cmd_hook(int argc, const char **argv, const char *prefix)
 {
+	enum hook_command command = 0;
+	struct strbuf hookname = STRBUF_INIT;
+
 	struct option builtin_hook_options[] = {
+		OPT_CMDMODE('l', "list", &command,
+			    N_("list scripts which will be run for <hookname>"),
+			    HOOK_LIST),
 		OPT_END(),
 	};
 
 	argc = parse_options(argc, argv, prefix, builtin_hook_options,
 			     builtin_hook_usage, 0);
 
+	if (argc < 1) {
+		usage_msg_opt("a hookname must be provided to operate on.",
+			      builtin_hook_usage, builtin_hook_options);
+	}
+
+	strbuf_addstr(&hookname, "hook.");
+	strbuf_addstr(&hookname, argv[0]);
+
+	switch(command) {
+		case HOOK_LIST:
+			return print_hook_list(&hookname);
+			break;
+		default:
+			usage_msg_opt("no command given.", builtin_hook_usage,
+				      builtin_hook_options);
+	}
+
+	clear_hook_list();
+	strbuf_release(&hookname);
+
 	return 0;
 }
diff --git a/hook.c b/hook.c
new file mode 100644
index 0000000000..f8d1109084
--- /dev/null
+++ b/hook.c
@@ -0,0 +1,81 @@
+#include "cache.h"
+
+#include "hook.h"
+#include "config.h"
+
+static LIST_HEAD(hook_head);
+
+void free_hook(struct hook *ptr)
+{
+	if (ptr) {
+		strbuf_release(&ptr->command);
+		free(ptr);
+	}
+}
+
+static void emplace_hook(struct list_head *pos, int order, const char *command)
+{
+	struct hook *to_add = malloc(sizeof(struct hook));
+	to_add->order = order;
+	to_add->origin = current_config_scope();
+	strbuf_init(&to_add->command, 0);
+	strbuf_addstr(&to_add->command, command);
+
+	list_add_tail(&to_add->list, pos);
+}
+
+static void remove_hook(struct list_head *to_remove)
+{
+	struct hook *hook_to_remove = list_entry(to_remove, struct hook, list);
+	list_del(to_remove);
+	free_hook(hook_to_remove);
+}
+
+void clear_hook_list()
+{
+	struct list_head *pos, *tmp;
+	list_for_each_safe(pos, tmp, &hook_head)
+		remove_hook(pos);
+}
+
+static int check_config_for_hooks(const char *var, const char *value, void *hookname)
+{
+	struct list_head *pos, *p;
+	struct hook *item;
+	const struct strbuf *hookname_strbuf = hookname;
+
+	if (!strcmp(var, hookname_strbuf->buf)) {
+		int order = 0;
+		// TODO this is bad - open to overflows
+		char command[256];
+		int added = 0;
+		if (!sscanf(value, "%d:%s", &order, command))
+			die(_("hook config '%s' doesn't match expected format"),
+			    value);
+
+		list_for_each_safe(pos, p, &hook_head) {
+			item = list_entry(pos, struct hook, list);
+
+			/*
+			 * the new entry should go just before the first entry
+			 * which has a higher order number than it.
+			 */
+			if (item->order > order && !added) {
+				emplace_hook(pos, order, command);
+				added = 1;
+			}
+		}
+
+		if (!added)
+			emplace_hook(pos, order, command);
+	}
+
+	return 0;
+}
+
+struct list_head* hook_list(const struct strbuf* hookname)
+{
+	git_config(check_config_for_hooks, (void*)hookname);
+
+	return &hook_head;
+}
diff --git a/hook.h b/hook.h
new file mode 100644
index 0000000000..104df4c088
--- /dev/null
+++ b/hook.h
@@ -0,0 +1,14 @@
+#include "config.h"
+
+struct hook
+{
+	struct list_head list;
+	int order;
+	enum config_scope origin;
+	struct strbuf command;
+};
+
+struct list_head* hook_list(const struct strbuf *hookname);
+
+void free_hook(struct hook *ptr);
+void clear_hook_list();
diff --git a/t/t1360-config-based-hooks.sh b/t/t1360-config-based-hooks.sh
index 34b0df5216..1434051db3 100755
--- a/t/t1360-config-based-hooks.sh
+++ b/t/t1360-config-based-hooks.sh
@@ -4,8 +4,47 @@ test_description='config-managed multihooks, including git-hook command'
 
 . ./test-lib.sh
 
-test_expect_success 'git hook command does not crash' '
-	git hook
+test_expect_success 'git hook rejects commands without a mode' '
+	test_must_fail git hook pre-commit
+'
+
+
+test_expect_success 'git hook rejects commands without a hookname' '
+	test_must_fail git hook --list
+'
+
+test_expect_success 'setup hooks in system, global, and local' '
+	git config --add --global hook.pre-commit "010:/path/def" &&
+	git config --add --global hook.pre-commit "999:/path/uvw" &&
+
+	git config --add --local hook.pre-commit "100:/path/ghi" &&
+	git config --add --local hook.pre-commit "990:/path/rst"
+'
+
+test_expect_success 'git hook --list orders by order number' '
+	cat >expected <<-\EOF &&
+	010	global	/path/def
+	100	repo	/path/ghi
+	990	repo	/path/rst
+	999	global	/path/uvw
+	EOF
+
+	git hook --list pre-commit >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'order number collisions resolved in config order' '
+	cat >expected <<-\EOF &&
+	010	global	/path/def
+	010	repo	/path/abc
+	100	repo	/path/ghi
+	990	repo	/path/rst
+	999	global	/path/uvw
+	EOF
+
+	git config --add --local hook.pre-commit "010:/path/abc" &&
+	git hook --list pre-commit >actual &&
+	test_cmp expected actual
 '
 
 test_done
-- 
2.24.0.393.g34dc348eaf-goog



* [PATCH 4/6] hook: support reordering of hook list
  2019-12-10  2:33 [PATCH 0/6] configuration-based hook management Emily Shaffer
                   ` (2 preceding siblings ...)
  2019-12-10  2:33 ` [PATCH 3/6] hook: add --list mode Emily Shaffer
@ 2019-12-10  2:33 ` Emily Shaffer
  2019-12-11 19:21   ` Junio C Hamano
  2019-12-10  2:33 ` [PATCH 5/6] hook: remove prior hook with '---' Emily Shaffer
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 125+ messages in thread
From: Emily Shaffer @ 2019-12-10  2:33 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

It's possible that in most cases a user wants to run pre-commit hook
'A' at its usual position, but in exactly one repo that user wants to
run 'A' first instead. Teach 'git hook' to support this by allowing a
user to specify a new order number for a hook after the initial hook has
been specified.

For example:

  $ grep -A2 "\[hook\]" ~/.gitconfig
  [hook]
          pre-commit = 001:~/test.sh
          pre-commit = 999:~/baz.sh
  $ grep -A2 "\[hook\]" ~/git/.git/config
  [hook]
          pre-commit = 900:~/bar.sh
          pre-commit = 050:~/baz.sh
  $ ./bin-wrappers/git hook --list pre-commit
  001     global  ~/test.sh
  050     repo    ~/baz.sh
  900     repo    ~/bar.sh

In the above example, '~/baz.sh' is provided in the global config with
order position 999. Then, in the local config, that order is overridden
to 050. Instead of running ~/baz.sh twice (at order 050 and at order
999), only run it once, in the position specified last in config order.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
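Note for readers: the override rule above, together with the
"ties keep config order" rule from the previous patch, can be
illustrated on a small fixed-size array. This is a simplified sketch
under stated assumptions, not the list_head-based code in hook.c; the
type and function names and the MAX_HOOKS limit are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_HOOKS 16 /* arbitrary limit for the sketch */

struct hook_entry {
	int order;
	const char *command; /* borrowed pointer, not copied */
};

struct hook_table {
	struct hook_entry items[MAX_HOOKS];
	size_t nr;
};

/* Drop the entry for command, if present; the rest keep their order. */
static void hook_table_remove(struct hook_table *t, const char *command)
{
	size_t i;

	for (i = 0; i < t->nr; i++) {
		if (!strcmp(t->items[i].command, command)) {
			memmove(&t->items[i], &t->items[i + 1],
				(t->nr - i - 1) * sizeof(t->items[0]));
			t->nr--;
			return;
		}
	}
}

/*
 * Record that command runs at order, replacing any earlier entry for
 * the same command. Returns 0, or -1 if the table is full.
 */
int hook_table_set(struct hook_table *t, int order, const char *command)
{
	size_t pos;

	assert(t && command);

	hook_table_remove(t, command);
	if (t->nr == MAX_HOOKS)
		return -1;

	/* stop at the first strictly higher order, so ties stay in config order */
	for (pos = 0; pos < t->nr; pos++)
		if (t->items[pos].order > order)
			break;

	memmove(&t->items[pos + 1], &t->items[pos],
		(t->nr - pos) * sizeof(t->items[0]));
	t->items[pos].order = order;
	t->items[pos].command = command;
	t->nr++;
	return 0;
}
```

Feeding it the four config lines from the example in config order
(001 test.sh, 999 baz.sh, 900 bar.sh, 050 baz.sh) leaves three entries,
with baz.sh in second place at order 050.
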
 Documentation/git-hook.txt    |  8 ++++++++
 hook.c                        |  7 +++++++
 t/t1360-config-based-hooks.sh | 14 ++++++++++++++
 3 files changed, 29 insertions(+)

diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt
index a141884239..0f7115f826 100644
--- a/Documentation/git-hook.txt
+++ b/Documentation/git-hook.txt
@@ -20,6 +20,14 @@ This command parses the default configuration files for lines which look like
 single hook. Hooks are sorted in ascending order by order number; in the event
 of an order number conflict, they are sorted in configuration order.
 
+The order number of a hook can be changed at a more local scope, e.g.:
+
+  git config --add --global hook.pre-commit "001:/foo.sh"
+  git config --add --local hook.pre-commit "005:/foo.sh"
+
+When the order number is respecified this way, the previously specified hook
+configuration is overridden.
+
 OPTIONS
 -------
 
diff --git a/hook.c b/hook.c
index f8d1109084..a7dcd18a2e 100644
--- a/hook.c
+++ b/hook.c
@@ -64,6 +64,13 @@ static int check_config_for_hooks(const char *var, const char *value, void *hook
 				emplace_hook(pos, order, command);
 				added = 1;
 			}
+
+			/*
+			 * if the command already exists, this entry should be
+			 * replacing it.
+			 */
+			if (!strcmp(item->command.buf, command))
+				remove_hook(pos);
 		}
 
 		if (!added)
diff --git a/t/t1360-config-based-hooks.sh b/t/t1360-config-based-hooks.sh
index 1434051db3..1af43ef18d 100755
--- a/t/t1360-config-based-hooks.sh
+++ b/t/t1360-config-based-hooks.sh
@@ -47,4 +47,18 @@ test_expect_success 'order number collisions resolved in config order' '
 	test_cmp expected actual
 '
 
+test_expect_success 'adding a command with a different number reorders list' '
+	cat >expected <<-\EOF &&
+	010	repo	/path/abc
+	050	repo	/path/def
+	100	repo	/path/ghi
+	990	repo	/path/rst
+	999	global	/path/uvw
+	EOF
+
+	git config --add --local hook.pre-commit "050:/path/def" &&
+	git hook --list pre-commit >actual &&
+	test_cmp expected actual
+'
+
 test_done
-- 
2.24.0.393.g34dc348eaf-goog



* [PATCH 5/6] hook: remove prior hook with '---'
  2019-12-10  2:33 [PATCH 0/6] configuration-based hook management Emily Shaffer
                   ` (3 preceding siblings ...)
  2019-12-10  2:33 ` [PATCH 4/6] hook: support reordering of hook list Emily Shaffer
@ 2019-12-10  2:33 ` Emily Shaffer
  2019-12-10  2:33 ` [PATCH 6/6] hook: teach --porcelain mode Emily Shaffer
  2019-12-11 22:42 ` [PATCH 0/6] configuration-based hook management Junio C Hamano
  6 siblings, 0 replies; 125+ messages in thread
From: Emily Shaffer @ 2019-12-10  2:33 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

It's possible a user may want to run a hook for nearly every repo,
except for one. Rather than requiring the user to specify the hook
locally in all but one repo, teach 'git hook' how to interpret config
lines intended to remove hooks specified earlier during the config
parse. This means a user can specify such a hook at the system or global
level and override it at the local level.

For example:

  $ grep -A2 "\[hook\]" ~/.gitconfig
  [hook]
          pre-commit = 001:~/test.sh
          pre-commit = 999:~/baz.sh
  $ grep -A2 "\[hook\]" ~/git/.git/config
  [hook]
          pre-commit = 900:~/bar.sh
          pre-commit = ---:~/baz.sh
  $ ./bin-wrappers/git hook --list pre-commit
  001     global  ~/test.sh
  900     repo    ~/bar.sh

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
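Note for readers: recognising a removal line can be done with a plain
prefix check before any number parsing. The helper below is a
hypothetical sketch, not the sscanf()-based check in this patch, and it
assumes a removal line with nothing after the colon should be treated
as malformed.

```c
#include <assert.h>
#include <string.h>

/*
 * If value has the form "---:<command>", return a pointer to <command>
 * (inside value, nothing is copied). Otherwise return NULL.
 */
const char *hook_removal_target(const char *value)
{
	assert(value != NULL);

	if (strncmp(value, "---:", 4) != 0 || value[4] == '\0')
		return NULL;
	return value + 4;
}
```

A config callback could call this first and fall back to parsing
"<order>:<command>" only when it returns NULL.
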
 Documentation/git-hook.txt    |  8 ++++++++
 hook.c                        | 15 ++++++++++-----
 t/t1360-config-based-hooks.sh | 13 +++++++++++++
 3 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt
index 0f7115f826..b4a992d43f 100644
--- a/Documentation/git-hook.txt
+++ b/Documentation/git-hook.txt
@@ -28,6 +28,14 @@ The order number of a hook can be changed at a more local scope, e.g.:
 When the order number is respecified this way, the previously specified hook
 configuration is overridden.
 
+A hook specified at a more global scope can be removed by specifying "---"
+instead of an order number, e.g.:
+
+  git config --add --global hook.pre-commit "001:/foo.sh"
+  git config --add --local hook.pre-commit "---:/foo.sh"
+
+When the hook is removed in this way, `/foo.sh` will not be run at all.
+
 OPTIONS
 -------
 
diff --git a/hook.c b/hook.c
index a7dcd18a2e..e7afa140c8 100644
--- a/hook.c
+++ b/hook.c
@@ -49,9 +49,14 @@ static int check_config_for_hooks(const char *var, const char *value, void *hook
 		// TODO this is bad - open to overflows
 		char command[256];
 		int added = 0;
-		if (!sscanf(value, "%d:%s", &order, command))
-			die(_("hook config '%s' doesn't match expected format"),
-			    value);
+		int remove = 0;
+		if (!sscanf(value, "%d:%s", &order, command)) {
+			if (sscanf(value, "---:%s", command))
+				remove = 1;
+			else
+				die(_("hook config '%s' doesn't match expected format"),
+				    value);
+		}
 
 		list_for_each_safe(pos, p, &hook_head) {
 			item = list_entry(pos, struct hook, list);
@@ -60,7 +65,7 @@ static int check_config_for_hooks(const char *var, const char *value, void *hook
 			 * the new entry should go just before the first entry
 			 * which has a higher order number than it.
 			 */
-			if (item->order > order && !added) {
+			if (item->order > order && !added && !remove) {
 				emplace_hook(pos, order, command);
 				added = 1;
 			}
@@ -73,7 +78,7 @@ static int check_config_for_hooks(const char *var, const char *value, void *hook
 				remove_hook(pos);
 		}
 
-		if (!added)
+		if (!added && !remove)
 			emplace_hook(pos, order, command);
 	}
 
diff --git a/t/t1360-config-based-hooks.sh b/t/t1360-config-based-hooks.sh
index 1af43ef18d..66e70ae222 100755
--- a/t/t1360-config-based-hooks.sh
+++ b/t/t1360-config-based-hooks.sh
@@ -61,4 +61,17 @@ test_expect_success 'adding a command with a different number reorders list' '
 	test_cmp expected actual
 '
 
+test_expect_success 'remove a command with "---:/path/to/cmd"' '
+	cat >expected <<-\EOF &&
+	010	repo	/path/abc
+	050	repo	/path/def
+	100	repo	/path/ghi
+	990	repo	/path/rst
+	EOF
+
+	git config --add --local hook.pre-commit "---:/path/uvw" &&
+	git hook --list pre-commit >actual &&
+	test_cmp expected actual
+'
+
 test_done
-- 
2.24.0.393.g34dc348eaf-goog



* [PATCH 6/6] hook: teach --porcelain mode
  2019-12-10  2:33 [PATCH 0/6] configuration-based hook management Emily Shaffer
                   ` (4 preceding siblings ...)
  2019-12-10  2:33 ` [PATCH 5/6] hook: remove prior hook with '---' Emily Shaffer
@ 2019-12-10  2:33 ` Emily Shaffer
  2019-12-11 19:33   ` Junio C Hamano
  2019-12-11 22:42 ` [PATCH 0/6] configuration-based hook management Junio C Hamano
  6 siblings, 1 reply; 125+ messages in thread
From: Emily Shaffer @ 2019-12-10  2:33 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

It might be desirable - for a user script, or a scripted Git command -
to run the appropriate set of hooks from outside of the compiled Git
binary. So, teach --porcelain in a way that enables the following:

  git hook --list --porcelain pre-commit | xargs -I% sh "%"

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
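Note for readers: the two output shapes differ only in which fields are
printed. The standalone sketch below mirrors the format strings used in
builtin/hook.c but writes into a caller-supplied buffer so it can be
checked in isolation; the function name and signature are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format one hook as a line of 'git hook --list' output.
 * Porcelain output is the bare command; the default output is
 * "<order>\t<scope>\t<command>" with the order zero-padded to at least
 * three digits. Returns what snprintf() returns.
 */
int format_hook_line(char *out, size_t outlen, int porcelain,
		     int order, const char *scope, const char *command)
{
	assert(out && scope && command);

	if (porcelain)
		return snprintf(out, outlen, "%s\n", command);
	return snprintf(out, outlen, "%.3d\t%s\t%s\n", order, scope, command);
}
```

Since the porcelain form drops the order number and scope, a consumer
has to rely on the line order alone to know the run order.
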
 Documentation/git-hook.txt    |  5 ++++-
 builtin/hook.c                | 19 +++++++++++++------
 t/t1360-config-based-hooks.sh | 12 ++++++++++++
 3 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt
index b4a992d43f..34276f5bce 100644
--- a/Documentation/git-hook.txt
+++ b/Documentation/git-hook.txt
@@ -8,7 +8,7 @@ git-hook - Manage configured hooks
 SYNOPSIS
 --------
 [verse]
-'git hook' -l | --list <hook-name>
+'git hook' -l | --list [--porcelain] <hook-name>
 
 DESCRIPTION
 -----------
@@ -45,6 +45,9 @@ OPTIONS
 	in the order they should be run. Output of this command follows the
 	format '<order number> <origin config> <hook command>'.
 
+--porcelain::
+	Print in a machine-readable format suitable for scripting.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/hook.c b/builtin/hook.c
index 8261302b27..b76dd3ad8f 100644
--- a/builtin/hook.c
+++ b/builtin/hook.c
@@ -16,7 +16,7 @@ enum hook_command {
 	HOOK_LIST,
 };
 
-static int print_hook_list(const struct strbuf *hookname)
+static int print_hook_list(const struct strbuf *hookname, int porcelain)
 {
 	struct list_head *head, *pos;
 	struct hook *item;
@@ -25,10 +25,14 @@ static int print_hook_list(const struct strbuf *hookname)
 
 	list_for_each(pos, head) {
 		item = list_entry(pos, struct hook, list);
-		if (item)
-			printf("%.3d\t%s\t%s\n", item->order,
-			       config_scope_to_string(item->origin),
-			       item->command.buf);
+		if (item) {
+			if (porcelain)
+				printf("%s\n", item->command.buf);
+			else
+				printf("%.3d\t%s\t%s\n", item->order,
+				       config_scope_to_string(item->origin),
+				       item->command.buf);
+		}
 	}
 
 	return 0;
@@ -38,11 +42,14 @@ int cmd_hook(int argc, const char **argv, const char *prefix)
 {
 	enum hook_command command = 0;
 	struct strbuf hookname = STRBUF_INIT;
+	int porcelain = 0;
 
 	struct option builtin_hook_options[] = {
 		OPT_CMDMODE('l', "list", &command,
 			    N_("list scripts which will be run for <hookname>"),
 			    HOOK_LIST),
+		OPT_BOOL(0, "porcelain", &porcelain,
+			 N_("display in machine parseable format")),
 		OPT_END(),
 	};
 
@@ -59,7 +66,7 @@ int cmd_hook(int argc, const char **argv, const char *prefix)
 
 	switch(command) {
 		case HOOK_LIST:
-			return print_hook_list(&hookname);
+			return print_hook_list(&hookname, porcelain);
 			break;
 		default:
 			usage_msg_opt("no command given.", builtin_hook_usage,
diff --git a/t/t1360-config-based-hooks.sh b/t/t1360-config-based-hooks.sh
index 66e70ae222..6f16ea1dd8 100755
--- a/t/t1360-config-based-hooks.sh
+++ b/t/t1360-config-based-hooks.sh
@@ -33,6 +33,18 @@ test_expect_success 'git hook --list orders by order number' '
 	test_cmp expected actual
 '
 
+test_expect_success 'git hook --list --porcelain' '
+	cat >expected <<-\EOF &&
+	/path/def
+	/path/ghi
+	/path/rst
+	/path/uvw
+	EOF
+
+	git hook --list --porcelain pre-commit >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'order number collisions resolved in config order' '
 	cat >expected <<-\EOF &&
 	010	global	/path/def
-- 
2.24.0.393.g34dc348eaf-goog



* Re: [PATCH 2/6] config: add string mapping for enum config_scope
  2019-12-10  2:33 ` [PATCH 2/6] config: add string mapping for enum config_scope Emily Shaffer
@ 2019-12-10 11:16   ` Philip Oakley
  2019-12-10 17:21     ` Philip Oakley
  0 siblings, 1 reply; 125+ messages in thread
From: Philip Oakley @ 2019-12-10 11:16 UTC (permalink / raw)
  To: Emily Shaffer, git; +Cc: "Matthew Rogers mattr94"

Hi Emily,

On 10/12/2019 02:33, Emily Shaffer wrote:
> If a user is interacting with their config files primarily by the 'git
> config' command, using the location flags (--global, --system, etc) then
> they may be more interested to see the scope of the config file they are
> editing, rather than the filepath.
There's a similar issue being worked on under Git-for-Windows with some
proposed code for this very 'problem'
https://github.com/git-for-windows/git/pull/2399 and a GitGitGadget PR 
https://github.com/gitgitgadget/git/pull/478

cc'ing Matthew to help coordination.

Philip
>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>   config.c | 17 +++++++++++++++++
>   config.h |  1 +
>   2 files changed, 18 insertions(+)
>
> diff --git a/config.c b/config.c
> index e7052b3977..a20110e016 100644
> --- a/config.c
> +++ b/config.c
> @@ -3312,6 +3312,23 @@ enum config_scope current_config_scope(void)
>   		return current_parsing_scope;
>   }
>   
> +const char *config_scope_to_string(enum config_scope scope)
> +{
> +	switch (scope) {
> +	case CONFIG_SCOPE_SYSTEM:
> +		return _("system");
> +	case CONFIG_SCOPE_GLOBAL:
> +		return _("global");
> +	case CONFIG_SCOPE_REPO:
> +		return _("repo");
> +	case CONFIG_SCOPE_CMDLINE:
> +		return _("cmdline");
> +	case CONFIG_SCOPE_UNKNOWN:
> +	default:
> +		return _("unknown");
> +	}
> +}
> +
>   int lookup_config(const char **mapping, int nr_mapping, const char *var)
>   {
>   	int i;
> diff --git a/config.h b/config.h
> index f0ed464004..612f43acd0 100644
> --- a/config.h
> +++ b/config.h
> @@ -139,6 +139,7 @@ enum config_scope {
>   };
>   
>   enum config_scope current_config_scope(void);
> +const char *config_scope_to_string(enum config_scope);
>   const char *current_config_origin_type(void);
>   const char *current_config_name(void);
>   



* Re: [PATCH 2/6] config: add string mapping for enum config_scope
  2019-12-10 11:16   ` Philip Oakley
@ 2019-12-10 17:21     ` Philip Oakley
  0 siblings, 0 replies; 125+ messages in thread
From: Philip Oakley @ 2019-12-10 17:21 UTC (permalink / raw)
  To: Emily Shaffer, git; +Cc: "\"Matthew Rogers\" mattr94"

correcting Matt's email address.
original thread 
https://lore.kernel.org/git/20191210023335.49987-3-emilyshaffer@google.com/

On 10/12/2019 11:16, Philip Oakley wrote:
> On 10/12/2019 02:33, Emily Shaffer wrote:
>> If a user is interacting with their config files primarily by the 'git
>> config' command, using the location flags (--global, --system, etc) then
>> they may be more interested to see the scope of the config file they are
>> editing, rather than the filepath.
> There's a similar issue being worked on under Git-for-Windows with some
> proposed code for this very 'problem'
> https://github.com/git-for-windows/git/pull/2399 and a GitGitGadget PR 
> https://github.com/gitgitgadget/git/pull/478
>
> cc'ing Matthew to help coordination.
Philip


* Re: [PATCH 4/6] hook: support reordering of hook list
  2019-12-10  2:33 ` [PATCH 4/6] hook: support reordering of hook list Emily Shaffer
@ 2019-12-11 19:21   ` Junio C Hamano
  0 siblings, 0 replies; 125+ messages in thread
From: Junio C Hamano @ 2019-12-11 19:21 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Emily Shaffer <emilyshaffer@google.com> writes:

>   $ grep -A2 "\[hook\]" ~/.gitconfig
>   [hook]
>           pre-commit = 001:~/test.sh
>           pre-commit = 999:~/baz.sh
>   $ grep -A2 "\[hook\]" ~/git/.git/config
>   [hook]
>           pre-commit = 900:~/bar.sh
>           pre-commit = 050:~/baz.sh
>   $ ./bin-wrappers/git hook --list pre-commit
>   001     global  ~/test.sh
>   050     repo    ~/baz.sh
>   900     repo    ~/bar.sh
>
> In the above example, '~/baz.sh' is provided in the global config with
> order position 999. Then, in the local config, that order is overridden
> to 050. Instead of running ~/baz.sh twice (at order 050 and at order
> 999), only run it once, in the position specified last in config order.

Doesn't that depend on the nature of the hook?  A hook that is
general enough to be used to inspect if another hook's effect is
sane and reject the result may want to be run after invocation of
each hook that is not itself, so I would prefer to avoid a design
that forbids the same command to be specified twice.

I would love it if it were possible without the precedence order and
instead the order of appearance in git_config() stream were usable
to decide the order these hooks are executed.  Unfortunately, there
is a fixed order that the configuration files are read, and I do not
see a way short of adding <number>: prefix like this design does to
ensure that a hook defined in the local config can run before or
after a hook defined in the global config, so <number>: in the above
design is probably a necessary evil X-<.

Having said that, I have a suspicion that the config file itself
should be kept simple---if a hook appears twice with different
numbers, they would be run twice, for example---and the tooling
around it (e.g. "git hook add/edit/replace/reorder") should
implement such a policy (e.g. "the same hook can run only once, so
remove the other entries when adding the same") if desired.

Which would mean that overriding/disabling an entry in the same
configuration file should be done by replacing or removing the
entry.  Adding another entry for the same command with different
precedence should mean the command would run twice.

And you would need a notation to override or disable an entry in a
different configuration file (e.g. global tells us to run foo.sh at
level 50 with "hook.pre-commit=50:foo.sh"; repository wants to say
not to run it at all, or run it at 80 instead).  I would think you'd
just need a notation to kill an existing entry (e.g. the local one
adds "hook.pre-commit=-50:foo.sh" to countermand the entry in the
earlier example, and then can add another one at level 80 if it
desires).

I am also tempted to say that the precedence level may not remain
the only attribute that a <hookname, executable> pair wants to
keep.  Instead of

	[hook]
                pre-commit = 900:bar.sh

it may have to become more like

	[hook "pre-commit"]
		level = 900
		path = bar.sh

if we do not want to paint ourselves into a corner that we cannot
get out of.  I dunno.

Doesn't this require coordination between the three configuration
sources over how numbers are assigned and used, by the way?  Between the
per-user and the per-repository config, they are set by the same
person anyway, so there is not much to coordinate, but I am not sure
what the expectations are to allow reading from the system-wide
configuration (or, should we just disable reading from the
system-wide configuration for, say, security reasons?)
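
The "keep the config simple" policy described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct hook_entry` and `hook_insert_sorted()` are hypothetical names that do not appear in the series, and a plain array stands in for the linked list the patch uses. Entries are fed in the order git_config() would report them, ties on the order number keep that configuration order, and a command listed twice stays in the list twice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one "hook.<name> = <order>:<command>" entry. */
struct hook_entry {
	int order;            /* the <order> prefix */
	const char *scope;    /* e.g. "global" or "repo" */
	const char *command;  /* the command to run */
};

/*
 * Insert 'e' into the sorted array 'list' holding 'n' entries; the
 * array must have room for at least n + 1 entries.  The new entry is
 * placed after every existing entry whose order is less than or equal
 * to its own, so ties keep configuration order and duplicates of the
 * same command are retained rather than collapsed.
 * Returns the new number of entries.
 */
static size_t hook_insert_sorted(struct hook_entry *list, size_t n,
				 struct hook_entry e)
{
	size_t i = n;

	assert(list != NULL);
	/* Shift strictly larger order numbers one slot to the right. */
	while (i > 0 && list[i - 1].order > e.order) {
		list[i] = list[i - 1];
		i--;
	}
	list[i] = e;
	return n + 1;
}
```

Fed the four entries from the quoted example in configuration order (global 001 and 999, then repo 900 and 050), this keeps both ~/baz.sh entries, at 050 and at 999. Running a command only once would then be a decision for the tooling around the config rather than for the parser.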


* Re: [PATCH 6/6] hook: teach --porcelain mode
  2019-12-10  2:33 ` [PATCH 6/6] hook: teach --porcelain mode Emily Shaffer
@ 2019-12-11 19:33   ` Junio C Hamano
  2019-12-11 22:00     ` Emily Shaffer
  0 siblings, 1 reply; 125+ messages in thread
From: Junio C Hamano @ 2019-12-11 19:33 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Emily Shaffer <emilyshaffer@google.com> writes:

> It might be desirable - for a user script, or a scripted Git command -
> to run the appropriate set of hooks from outside of the compiled Git
> binary. So, teach --porcelain in a way that enables the following:
>
>   git hook --list --porcelain pre-commit | xargs -I% sh "%"
>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---

> +--porcelain::
> +	Print in a machine-readable format suitable for scripting.
> +
> ...
> +static int print_hook_list(const struct strbuf *hookname, int porcelain)
>  {
>  	struct list_head *head, *pos;
>  	struct hook *item;
> @@ -25,10 +25,14 @@ static int print_hook_list(const struct strbuf *hookname)
>  
>  	list_for_each(pos, head) {
>  		item = list_entry(pos, struct hook, list);
> +		if (item) {
> +			if (porcelain)
> +				printf("%s\n", item->command.buf);
> +			else
> +				printf("%.3d\t%s\t%s\n", item->order,
> +				       config_scope_to_string(item->origin),
> +				       item->command.buf);
> +		}

So, a Porcelain script cannot learn where the hook command comes
from, or what the precedence order of each line of the output is?



* Re: [PATCH 6/6] hook: teach --porcelain mode
  2019-12-11 19:33   ` Junio C Hamano
@ 2019-12-11 22:00     ` Emily Shaffer
  2019-12-11 22:07       ` Junio C Hamano
  0 siblings, 1 reply; 125+ messages in thread
From: Emily Shaffer @ 2019-12-11 22:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Dec 11, 2019 at 11:33:38AM -0800, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > It might be desirable - for a user script, or a scripted Git command -
> > to run the appropriate set of hooks from outside of the compiled Git
> > binary. So, teach --porcelain in a way that enables the following:
> >
> >   git hook --list --porcelain pre-commit | xargs -I% sh "%"
> >
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> 
> > +--porcelain::
> > +	Print in a machine-readable format suitable for scripting.
> > +
> > ...
> > +static int print_hook_list(const struct strbuf *hookname, int porcelain)
> >  {
> >  	struct list_head *head, *pos;
> >  	struct hook *item;
> > @@ -25,10 +25,14 @@ static int print_hook_list(const struct strbuf *hookname)
> >  
> >  	list_for_each(pos, head) {
> >  		item = list_entry(pos, struct hook, list);
> > +		if (item) {
> > +			if (porcelain)
> > +				printf("%s\n", item->command.buf);
> > +			else
> > +				printf("%.3d\t%s\t%s\n", item->order,
> > +				       config_scope_to_string(item->origin),
> > +				       item->command.buf);
> > +		}
> 
> So, a Porcelain script cannot learn where the hook command comes
> from,

Not as I had envisioned.

> or what the precedence order of each line of the output is?
> 

They're printed in the order they should be executed; the explicit order
isn't provided.


I suppose I had considered really just the one use case listed in the
commit message, especially since other inquiry into the hooks to be run
can be done against the config files themselves. But - I'm of course
open to use cases. What did you have in mind?

Maybe this can be solved better with a --pretty=format type of argument.


* Re: [PATCH 6/6] hook: teach --porcelain mode
  2019-12-11 22:00     ` Emily Shaffer
@ 2019-12-11 22:07       ` Junio C Hamano
  2019-12-11 23:15         ` Emily Shaffer
  0 siblings, 1 reply; 125+ messages in thread
From: Junio C Hamano @ 2019-12-11 22:07 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Emily Shaffer <emilyshaffer@google.com> writes:

>> So, a Porcelain script cannot learn where the hook command comes
>> from,
>
> Not as I had envisioned.
>
>> or what the precedence order of each line of the output is?
>> 
>
> They're printed in the order they should be executed; the explicit order
> isn't provided.
>
>
> I suppose I had considered really just the one use case listed in the
> commit message, especially since other inquiry into the hooks to be run
> can be done against the config files themselves. But - I'm of course
> open to use cases. What did you have in mind?

A tool to diagnose why the hooks are not firing in the order the
user intended them to, for example?

Or a tool to help editing the list of hooks.



* Re: [PATCH 0/6] configuration-based hook management
  2019-12-10  2:33 [PATCH 0/6] configuration-based hook management Emily Shaffer
                   ` (5 preceding siblings ...)
  2019-12-10  2:33 ` [PATCH 6/6] hook: teach --porcelain mode Emily Shaffer
@ 2019-12-11 22:42 ` Junio C Hamano
  6 siblings, 0 replies; 125+ messages in thread
From: Junio C Hamano @ 2019-12-11 22:42 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, brian m. carlson, Jonathan Nieder,
	Ævar Arnfjörð Bjarmason

Emily Shaffer <emilyshaffer@google.com> writes:

> An implementation of the first piece of the proposal given in
> lore.kernel.org/git/20191116011125.GG22855@google.com.
>
> Teaches a new command, 'git hook', which will someday include 'git hook
> --add ...', 'git hook --edit ...', and maybe more. For now, just teach
> it how to check the config files with 'git hook --list ...'.
>
> The hooks-to-run list is collected in a new library, hook.o, which can
> someday reimplement find_hook() or otherwise be invoked to run all hooks
> for a given hookname (e.g. "pre-commit").

Nice to see the endgame vision upfront.

A few things that I'd like to see in the endgame that you did not
mention here are:

 - We probably do not want to have an authoritative "these are the
   hooks Git runs" catalog, so it would be great if the resulting
   system can operate without one.

 - There are at least two kinds of hooks wrt the style of input they
   take.  Some take their input on their command line, which makes
   it quite easy to run multiple of them in a row.  Others take
   their input from their standard input stream, which probably
   means that there needs to be a cache of the input stream to feed to
   each such hook script (unless Git process itself is generating
   the stream to drive the hook, in which case we could run the
   generation of the stream multiple times) if we want to run
   multiple of them.  

   . With the design goal of *not* having an authoritative catalog,
     we'd probably need some way to annotate each entry in the [hook]
     configuration with the kind of invocation the hook program wants.

   . There may be more than the above two styles.  The system should
     be designed to be extensible to accommodate yet more.
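
For the stdin-style hooks mentioned above, the caching idea might look roughly like the following. It is a minimal sketch, not code from the series: `slurp_stream()` and `replay_input()` are hypothetical helpers, and a plain `FILE *` stands in for whatever pipe would feed each hook's standard input.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read everything from 'in' into a malloc'd buffer so that it can be
 * replayed to several hooks.  Stores the byte count in '*len'.
 * Returns NULL on read or allocation failure; the caller frees the
 * result.
 */
static char *slurp_stream(FILE *in, size_t *len)
{
	size_t cap = 4096, used = 0, got;
	char *buf = malloc(cap);

	assert(in && len);
	if (!buf)
		return NULL;
	while ((got = fread(buf + used, 1, cap - used, in)) > 0) {
		used += got;
		if (used == cap) {
			/* Buffer is full: double it before reading more. */
			char *grown = realloc(buf, cap * 2);
			if (!grown) {
				free(buf);
				return NULL;
			}
			buf = grown;
			cap *= 2;
		}
	}
	if (ferror(in)) {
		free(buf);
		return NULL;
	}
	*len = used;
	return buf;
}

/* Write the cached input to one hook's stdin (any FILE * in this sketch). */
static int replay_input(FILE *hook_stdin, const char *buf, size_t len)
{
	return fwrite(buf, 1, len, hook_stdin) == len ? 0 : -1;
}
```

The caller would slurp the stream once and then call replay_input() once per configured hook. Whether a given hook wants its input this way at all is exactly the per-entry annotation question raised above.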


Thanks.


* Re: [PATCH 6/6] hook: teach --porcelain mode
  2019-12-11 22:07       ` Junio C Hamano
@ 2019-12-11 23:15         ` Emily Shaffer
  0 siblings, 0 replies; 125+ messages in thread
From: Emily Shaffer @ 2019-12-11 23:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Dec 11, 2019 at 02:07:45PM -0800, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> >> So, a Porcelain script cannot learn where the hook command comes
> >> from,
> >
> > Not as I had envisioned.
> >
> >> or what the precedence order of each line of the output is?
> >> 
> >
> > They're printed in the order they should be executed; the explicit order
> > isn't provided.
> >
> >
> > I suppose I had considered really just the one use case listed in the
> > commit message, especially since other inquiry into the hooks to be run
> > can be done against the config files themselves. But - I'm of course
> > open to use cases. What did you have in mind?
> 
> A tool to diagnose why the hooks are not firing in the order the
> user intended them to, for example?
> 
> Or a tool to help editing the list of hooks.
FWIW, the next step for this 'git hook' tool is just such a mode,
although I certainly won't argue with anybody who wants to interact with
them somewhat differently.

Does allowing a format string solve this, then? Maybe it's less
Git-idiomatic, but it seems to me to be a very explicit format contract
that the scripter can write, and probably more useful than guessing what
info one might want when scripting. It also doesn't paint us into a
corner if we add other interesting info later.

Unless you have a complaint about it, I'll try to add that kind of
argument instead of --porcelain for this command.
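
A rough sketch of what such a format argument could expand to, purely as an illustration: the placeholder letters (%o for the order number, %s for the scope, %c for the command, %% for a literal percent sign) and the function name `format_hook()` are invented here and are not part of any posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: expand 'fmt' for one hook into 'out' (size 'outsz').
 * Placeholders, invented for this sketch: %o order, %s scope,
 * %c command, %% a literal percent sign.
 * Returns 0 on success, -1 on an unknown placeholder or if the
 * result does not fit.
 */
static int format_hook(char *out, size_t outsz, const char *fmt,
		       int order, const char *scope, const char *command)
{
	size_t len = 0;

	assert(out && fmt && outsz > 0);
	for (; *fmt; fmt++) {
		char numbuf[16];
		const char *piece;
		size_t piece_len;

		if (*fmt != '%') {
			/* Ordinary character: copy it through. */
			piece = fmt;
			piece_len = 1;
		} else {
			switch (*++fmt) {
			case 'o':
				/* Same zero-padded form as the default listing. */
				snprintf(numbuf, sizeof(numbuf), "%.3d", order);
				piece = numbuf;
				break;
			case 's':
				piece = scope;
				break;
			case 'c':
				piece = command;
				break;
			case '%':
				piece = "%";
				break;
			default:
				return -1; /* also catches a trailing lone '%' */
			}
			piece_len = strlen(piece);
		}
		if (len + piece_len >= outsz)
			return -1;
		memcpy(out + len, piece, piece_len);
		len += piece_len;
	}
	out[len] = '\0';
	return 0;
}
```

With these assumed placeholders, "%o\t%s\t%c" would reproduce the current human-readable line and "%c" alone would reproduce what --porcelain prints today, so a script states exactly which fields it depends on.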

 - Emily


* Re: [PATCH 3/6] hook: add --list mode
  2019-12-10  2:33 ` [PATCH 3/6] hook: add --list mode Emily Shaffer
@ 2019-12-12  9:38   ` Bert Wesarg
  2019-12-12 10:58   ` SZEDER Gábor
  1 sibling, 0 replies; 125+ messages in thread
From: Bert Wesarg @ 2019-12-12  9:38 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git Mailing List

On Tue, Dec 10, 2019 at 3:34 AM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> Teach 'git hook --list <hookname>', which checks the known configs in
> order to create an ordered list of hooks to run on a given hook event.
>
> The hook config format is "hook.<hookname> = <order>:<path-to-hook>".
> This paves the way for multiple hook support; hooks should be run in the
> order specified by the user in the config, and in the case of an order
> number collision, configuration order should be used (e.g. global hook
> 004 will run before repo hook 004).
>
> For example:
>
>   $ grep -A2 "\[hook\]" ~/.gitconfig
>   [hook]
>           pre-commit = 001:~/test.sh
>           pre-commit = 999:~/baz.sh
>
>   $ grep -A1 "\[hook\]" ~/git/.git/config
>   [hook]
>           pre-commit = 900:~/bar.sh
>
>   $ ./bin-wrappers/git hook --list pre-commit
>   001     global  ~/test.sh
>   900     repo    ~/bar.sh
>   999     global  ~/baz.sh
>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>  Documentation/git-hook.txt    | 17 +++++++-
>  Makefile                      |  1 +
>  builtin/hook.c                | 54 ++++++++++++++++++++++-
>  hook.c                        | 81 +++++++++++++++++++++++++++++++++++
>  hook.h                        | 14 ++++++
>  t/t1360-config-based-hooks.sh | 43 ++++++++++++++++++-
>  6 files changed, 206 insertions(+), 4 deletions(-)
>  create mode 100644 hook.c
>  create mode 100644 hook.h
>
> diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt
> index 2d50c414cc..a141884239 100644
> --- a/Documentation/git-hook.txt
> +++ b/Documentation/git-hook.txt
> @@ -8,12 +8,27 @@ git-hook - Manage configured hooks
>  SYNOPSIS
>  --------
>  [verse]
> -'git hook'
> +'git hook' -l | --list <hook-name>
>
>  DESCRIPTION
>  -----------
>  You can list, add, and modify hooks with this command.
>
> +This command parses the default configuration files for lines which look like
> +"hook.<hook-name> = <order number>:<hook command>", e.g. "hook.pre-commit =
> +010:/path/to/script.sh". In this way, multiple scripts can be run during a
> +single hook. Hooks are sorted in ascending order by order number; in the event
> +of an order number conflict, they are sorted in configuration order.
> +
> +OPTIONS
> +-------
> +
> +-l::
> +--list::
> +       List the hooks which have been configured for <hook-name>. Hooks appear
> +       in the order they should be run. Output of this command follows the
> +       format '<order number> <origin config> <hook command>'.
> +
>  GIT
>  ---
>  Part of the linkgit:git[1] suite
> diff --git a/Makefile b/Makefile
> index 83263505c0..21b3a82208 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -892,6 +892,7 @@ LIB_OBJS += hashmap.o
>  LIB_OBJS += linear-assignment.o
>  LIB_OBJS += help.o
>  LIB_OBJS += hex.o
> +LIB_OBJS += hook.o
>  LIB_OBJS += ident.o
>  LIB_OBJS += interdiff.o
>  LIB_OBJS += json-writer.o
> diff --git a/builtin/hook.c b/builtin/hook.c
> index b2bbc84d4d..8261302b27 100644
> --- a/builtin/hook.c
> +++ b/builtin/hook.c
> @@ -1,21 +1,73 @@
>  #include "cache.h"
>
>  #include "builtin.h"
> +#include "config.h"
> +#include "hook.h"
>  #include "parse-options.h"
> +#include "strbuf.h"
>
>  static const char * const builtin_hook_usage[] = {
> -       N_("git hook"),
> +       N_("git hook --list <hookname>"),

It's "<hook-name>" in Documentation/git-hook.txt

>         NULL
>  };
>
> +enum hook_command {
> +       HOOK_NO_COMMAND = 0,
> +       HOOK_LIST,
> +};
> +
> +static int print_hook_list(const struct strbuf *hookname)
> +{
> +       struct list_head *head, *pos;
> +       struct hook *item;
> +
> +       head = hook_list(hookname);
> +
> +       list_for_each(pos, head) {
> +               item = list_entry(pos, struct hook, list);
> +               if (item)
> +                       printf("%.3d\t%s\t%s\n", item->order,
> +                              config_scope_to_string(item->origin),
> +                              item->command.buf);
> +       }
> +
> +       return 0;
> +}
> +
>  int cmd_hook(int argc, const char **argv, const char *prefix)
>  {
> +       enum hook_command command = 0;
> +       struct strbuf hookname = STRBUF_INIT;
> +
>         struct option builtin_hook_options[] = {
> +               OPT_CMDMODE('l', "list", &command,
> +                           N_("list scripts which will be run for <hookname>"),


It's "<hook-name>" in Documentation/git-hook.txt
> +                           HOOK_LIST),
>                 OPT_END(),
>         };
>
>         argc = parse_options(argc, argv, prefix, builtin_hook_options,
>                              builtin_hook_usage, 0);
>
> +       if (argc < 1) {
> +               usage_msg_opt("a hookname must be provided to operate on.",
> +                             builtin_hook_usage, builtin_hook_options);
> +       }
> +
> +       strbuf_addstr(&hookname, "hook.");
> +       strbuf_addstr(&hookname, argv[0]);

The argument is never checked to see whether it names a valid/known hook.

Bert

> +
> +       switch(command) {
> +               case HOOK_LIST:
> +                       return print_hook_list(&hookname);
> +                       break;
> +               default:
> +                       usage_msg_opt("no command given.", builtin_hook_usage,
> +                                     builtin_hook_options);
> +       }
> +
> +       clear_hook_list();
> +       strbuf_release(&hookname);
> +
>         return 0;
>  }
> diff --git a/hook.c b/hook.c
> new file mode 100644
> index 0000000000..f8d1109084
> --- /dev/null
> +++ b/hook.c
> @@ -0,0 +1,81 @@
> +#include "cache.h"
> +
> +#include "hook.h"
> +#include "config.h"
> +
> +static LIST_HEAD(hook_head);
> +
> +void free_hook(struct hook *ptr)
> +{
> +       if (ptr) {
> +               strbuf_release(&ptr->command);
> +               free(ptr);
> +       }
> +}
> +
> +static void emplace_hook(struct list_head *pos, int order, const char *command)
> +{
> +       struct hook *to_add = malloc(sizeof(struct hook));
> +       to_add->order = order;
> +       to_add->origin = current_config_scope();
> +       strbuf_init(&to_add->command, 0);
> +       strbuf_addstr(&to_add->command, command);
> +
> +       list_add_tail(&to_add->list, pos);
> +}
> +
> +static void remove_hook(struct list_head *to_remove)
> +{
> +       struct hook *hook_to_remove = list_entry(to_remove, struct hook, list);
> +       list_del(to_remove);
> +       free_hook(hook_to_remove);
> +}
> +
> +void clear_hook_list()
> +{
> +       struct list_head *pos, *tmp;
> +       list_for_each_safe(pos, tmp, &hook_head)
> +               remove_hook(pos);
> +}
> +
> +static int check_config_for_hooks(const char *var, const char *value, void *hookname)
> +{
> +       struct list_head *pos, *p;
> +       struct hook *item;
> +       const struct strbuf *hookname_strbuf = hookname;
> +
> +       if (!strcmp(var, hookname_strbuf->buf)) {
> +               int order = 0;
> +               // TODO this is bad - open to overflows
> +               char command[256];
> +               int added = 0;
> +               if (!sscanf(value, "%d:%s", &order, command))
> +                       die(_("hook config '%s' doesn't match expected format"),
> +                           value);
> +
> +               list_for_each_safe(pos, p, &hook_head) {
> +                       item = list_entry(pos, struct hook, list);
> +
> +                       /*
> +                        * the new entry should go just before the first entry
> +                        * which has a higher order number than it.
> +                        */
> +                       if (item->order > order && !added) {
> +                               emplace_hook(pos, order, command);
> +                               added = 1;
> +                       }
> +               }
> +
> +               if (!added)
> +                       emplace_hook(pos, order, command);
> +       }
> +
> +       return 0;
> +}
> +
> +struct list_head* hook_list(const struct strbuf* hookname)
> +{
> +       git_config(check_config_for_hooks, (void*)hookname);
> +
> +       return &hook_head;
> +}
> diff --git a/hook.h b/hook.h
> new file mode 100644
> index 0000000000..104df4c088
> --- /dev/null
> +++ b/hook.h
> @@ -0,0 +1,14 @@
> +#include "config.h"
> +
> +struct hook
> +{
> +       struct list_head list;
> +       int order;
> +       enum config_scope origin;
> +       struct strbuf command;
> +};
> +
> +struct list_head* hook_list(const struct strbuf *hookname);
> +
> +void free_hook(struct hook *ptr);
> +void clear_hook_list();
> diff --git a/t/t1360-config-based-hooks.sh b/t/t1360-config-based-hooks.sh
> index 34b0df5216..1434051db3 100755
> --- a/t/t1360-config-based-hooks.sh
> +++ b/t/t1360-config-based-hooks.sh
> @@ -4,8 +4,47 @@ test_description='config-managed multihooks, including git-hook command'
>
>  . ./test-lib.sh
>
> -test_expect_success 'git hook command does not crash' '
> -       git hook
> +test_expect_success 'git hook rejects commands without a mode' '
> +       test_must_fail git hook pre-commit
> +'
> +
> +
> +test_expect_success 'git hook rejects commands without a hookname' '
> +       test_must_fail git hook --list
> +'
> +
> +test_expect_success 'setup hooks in system, global, and local' '
> +       git config --add --global hook.pre-commit "010:/path/def" &&
> +       git config --add --global hook.pre-commit "999:/path/uvw" &&
> +
> +       git config --add --local hook.pre-commit "100:/path/ghi" &&
> +       git config --add --local hook.pre-commit "990:/path/rst"
> +'
> +
> +test_expect_success 'git hook --list orders by order number' '
> +       cat >expected <<-\EOF &&
> +       010     global  /path/def
> +       100     repo    /path/ghi
> +       990     repo    /path/rst
> +       999     global  /path/uvw
> +       EOF
> +
> +       git hook --list pre-commit >actual &&
> +       test_cmp expected actual
> +'
> +
> +test_expect_success 'order number collisions resolved in config order' '
> +       cat >expected <<-\EOF &&
> +       010     global  /path/def
> +       010     repo    /path/abc
> +       100     repo    /path/ghi
> +       990     repo    /path/rst
> +       999     global  /path/uvw
> +       EOF
> +
> +       git config --add --local hook.pre-commit "010:/path/abc" &&
> +       git hook --list pre-commit >actual &&
> +       test_cmp expected actual
>  '
>
>  test_done
> --
> 2.24.0.393.g34dc348eaf-goog
>
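
On the TODO in the quoted check_config_for_hooks(): the fixed 256-byte buffer filled by an unbounded "%s" can overflow, "%s" also stops at the first whitespace, and `!sscanf(...)` only catches a return value of 0 (an empty value yields EOF, and a value with no colon yields 1 with 'command' left unset). One possible way around all three, as a sketch only: `parse_hook_value()` is a hypothetical helper, not something from the series, and it points into the config value instead of copying it.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Split "<order>:<command>" without copying.  On success stores the
 * number in '*order', points '*command' at the text after the colon
 * (inside 'value'), and returns 0.  Returns -1 if the number, the
 * colon, or the command is missing, or if the number is out of range.
 */
static int parse_hook_value(const char *value, int *order,
			    const char **command)
{
	char *end;
	long n;

	assert(value && order && command);
	errno = 0;
	n = strtol(value, &end, 10);
	if (end == value || errno == ERANGE || n < INT_MIN || n > INT_MAX)
		return -1;
	if (*end != ':' || end[1] == '\0')
		return -1;
	*order = (int)n;
	*command = end + 1;
	return 0;
}
```

The caller could then hand the returned pointer straight to strbuf_addstr(), as emplace_hook() already does with its 'command' argument, so no intermediate buffer is needed.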


* Re: [PATCH 1/6] hook: scaffolding for git-hook subcommand
  2019-12-10  2:33 ` [PATCH 1/6] hook: scaffolding for git-hook subcommand Emily Shaffer
@ 2019-12-12  9:41   ` Bert Wesarg
  2019-12-12 10:47   ` SZEDER Gábor
  1 sibling, 0 replies; 125+ messages in thread
From: Bert Wesarg @ 2019-12-12  9:41 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git Mailing List

On Tue, Dec 10, 2019 at 3:34 AM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> Introduce infrastructure for a new subcommand, git-hook, which will be
> used to ease config-based hook management. This command will handle
> parsing configs to compose a list of hooks to run for a given event, as
> well as adding or modifying hook configs in an interactive fashion.
>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>  .gitignore                    |  1 +
>  Documentation/git-hook.txt    | 19 +++++++++++++++++++
>  Makefile                      |  1 +
>  builtin.h                     |  1 +
>  builtin/hook.c                | 21 +++++++++++++++++++++
>  git.c                         |  1 +

How about also adding completion support here?

Bert

>  t/t1360-config-based-hooks.sh | 11 +++++++++++
>  7 files changed, 55 insertions(+)
>  create mode 100644 Documentation/git-hook.txt
>  create mode 100644 builtin/hook.c
>  create mode 100755 t/t1360-config-based-hooks.sh
>
> diff --git a/.gitignore b/.gitignore
> index 89b3b79c1a..9ef59b9baa 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -74,6 +74,7 @@
>  /git-grep
>  /git-hash-object
>  /git-help
> +/git-hook
>  /git-http-backend
>  /git-http-fetch
>  /git-http-push
> diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt
> new file mode 100644
> index 0000000000..2d50c414cc
> --- /dev/null
> +++ b/Documentation/git-hook.txt
> @@ -0,0 +1,19 @@
> +git-hook(1)
> +===========
> +
> +NAME
> +----
> +git-hook - Manage configured hooks
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'git hook'
> +
> +DESCRIPTION
> +-----------
> +You can list, add, and modify hooks with this command.
> +
> +GIT
> +---
> +Part of the linkgit:git[1] suite
> diff --git a/Makefile b/Makefile
> index 58b92af54b..83263505c0 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1074,6 +1074,7 @@ BUILTIN_OBJS += builtin/get-tar-commit-id.o
>  BUILTIN_OBJS += builtin/grep.o
>  BUILTIN_OBJS += builtin/hash-object.o
>  BUILTIN_OBJS += builtin/help.o
> +BUILTIN_OBJS += builtin/hook.o
>  BUILTIN_OBJS += builtin/index-pack.o
>  BUILTIN_OBJS += builtin/init-db.o
>  BUILTIN_OBJS += builtin/interpret-trailers.o
> diff --git a/builtin.h b/builtin.h
> index 5cf5df69f7..d4ca2ac9a5 100644
> --- a/builtin.h
> +++ b/builtin.h
> @@ -173,6 +173,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix);
>  int cmd_grep(int argc, const char **argv, const char *prefix);
>  int cmd_hash_object(int argc, const char **argv, const char *prefix);
>  int cmd_help(int argc, const char **argv, const char *prefix);
> +int cmd_hook(int argc, const char **argv, const char *prefix);
>  int cmd_index_pack(int argc, const char **argv, const char *prefix);
>  int cmd_init_db(int argc, const char **argv, const char *prefix);
>  int cmd_interpret_trailers(int argc, const char **argv, const char *prefix);
> diff --git a/builtin/hook.c b/builtin/hook.c
> new file mode 100644
> index 0000000000..b2bbc84d4d
> --- /dev/null
> +++ b/builtin/hook.c
> @@ -0,0 +1,21 @@
> +#include "cache.h"
> +
> +#include "builtin.h"
> +#include "parse-options.h"
> +
> +static const char * const builtin_hook_usage[] = {
> +       N_("git hook"),
> +       NULL
> +};
> +
> +int cmd_hook(int argc, const char **argv, const char *prefix)
> +{
> +       struct option builtin_hook_options[] = {
> +               OPT_END(),
> +       };
> +
> +       argc = parse_options(argc, argv, prefix, builtin_hook_options,
> +                            builtin_hook_usage, 0);
> +
> +       return 0;
> +}
> diff --git a/git.c b/git.c
> index ce6ab0ece2..c8344b9ab7 100644
> --- a/git.c
> +++ b/git.c
> @@ -513,6 +513,7 @@ static struct cmd_struct commands[] = {
>         { "grep", cmd_grep, RUN_SETUP_GENTLY },
>         { "hash-object", cmd_hash_object },
>         { "help", cmd_help },
> +       { "hook", cmd_hook, RUN_SETUP },
>         { "index-pack", cmd_index_pack, RUN_SETUP_GENTLY | NO_PARSEOPT },
>         { "init", cmd_init_db },
>         { "init-db", cmd_init_db },
> diff --git a/t/t1360-config-based-hooks.sh b/t/t1360-config-based-hooks.sh
> new file mode 100755
> index 0000000000..34b0df5216
> --- /dev/null
> +++ b/t/t1360-config-based-hooks.sh
> @@ -0,0 +1,11 @@
> +#!/bin/bash
> +
> +test_description='config-managed multihooks, including git-hook command'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'git hook command does not crash' '
> +       git hook
> +'
> +
> +test_done
> --
> 2.24.0.393.g34dc348eaf-goog
>


* Re: [PATCH 1/6] hook: scaffolding for git-hook subcommand
  2019-12-10  2:33 ` [PATCH 1/6] hook: scaffolding for git-hook subcommand Emily Shaffer
  2019-12-12  9:41   ` Bert Wesarg
@ 2019-12-12 10:47   ` SZEDER Gábor
  1 sibling, 0 replies; 125+ messages in thread
From: SZEDER Gábor @ 2019-12-12 10:47 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

On Mon, Dec 09, 2019 at 06:33:30PM -0800, Emily Shaffer wrote:
> Introduce infrastructure for a new subcommand, git-hook, which will be
> used to ease config-based hook management. This command will handle
> parsing configs to compose a list of hooks to run for a given event, as
> well as adding or modifying hook configs in an interactive fashion.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>  .gitignore                    |  1 +
>  Documentation/git-hook.txt    | 19 +++++++++++++++++++
>  Makefile                      |  1 +
>  builtin.h                     |  1 +
>  builtin/hook.c                | 21 +++++++++++++++++++++
>  git.c                         |  1 +
>  t/t1360-config-based-hooks.sh | 11 +++++++++++
>  7 files changed, 55 insertions(+)
>  create mode 100644 Documentation/git-hook.txt
>  create mode 100644 builtin/hook.c
>  create mode 100755 t/t1360-config-based-hooks.sh

When adding a new command please don't forget the steps noted in
4ed5562925 (myfirstcontrib: add 'psuh' to command-list.txt,
2019-10-31) ;)


> diff --git a/t/t1360-config-based-hooks.sh b/t/t1360-config-based-hooks.sh
> new file mode 100755
> index 0000000000..34b0df5216
> --- /dev/null
> +++ b/t/t1360-config-based-hooks.sh
> @@ -0,0 +1,11 @@
> +#!/bin/bash

s/ba//

> +
> +test_description='config-managed multihooks, including git-hook command'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'git hook command does not crash' '
> +	git hook
> +'
> +
> +test_done
> -- 
> 2.24.0.393.g34dc348eaf-goog
> 

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 3/6] hook: add --list mode
  2019-12-10  2:33 ` [PATCH 3/6] hook: add --list mode Emily Shaffer
  2019-12-12  9:38   ` Bert Wesarg
@ 2019-12-12 10:58   ` SZEDER Gábor
  1 sibling, 0 replies; 125+ messages in thread
From: SZEDER Gábor @ 2019-12-12 10:58 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

On Mon, Dec 09, 2019 at 06:33:32PM -0800, Emily Shaffer wrote:
> Teach 'git hook --list <hookname>', which checks the known configs in
> order to create an ordered list of hooks to run on a given hook event.
> 
> The hook config format is "hook.<hookname> = <order>:<path-to-hook>".
> This paves the way for multiple hook support; hooks should be run in the
> order specified by the user in the config, and in the case of an order
> number collision, configuration order should be used (e.g. global hook
> 004 will run before repo hook 004).
> 
> For example:
> 
>   $ grep -A2 "\[hook\]" ~/.gitconfig
>   [hook]
>           pre-commit = 001:~/test.sh
>           pre-commit = 999:~/baz.sh
> 
>   $ grep -A1 "\[hook\]" ~/git/.git/config
>   [hook]
>           pre-commit = 900:~/bar.sh
> 
>   $ ./bin-wrappers/git hook --list pre-commit
>   001     global  ~/test.sh
>   900     repo    ~/bar.sh
>   999     global  ~/baz.sh
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>  Documentation/git-hook.txt    | 17 +++++++-
>  Makefile                      |  1 +
>  builtin/hook.c                | 54 ++++++++++++++++++++++-
>  hook.c                        | 81 +++++++++++++++++++++++++++++++++++
>  hook.h                        | 14 ++++++
>  t/t1360-config-based-hooks.sh | 43 ++++++++++++++++++-
>  6 files changed, 206 insertions(+), 4 deletions(-)
>  create mode 100644 hook.c
>  create mode 100644 hook.h
> 
> diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt
> index 2d50c414cc..a141884239 100644
> --- a/Documentation/git-hook.txt
> +++ b/Documentation/git-hook.txt
> @@ -8,12 +8,27 @@ git-hook - Manage configured hooks
>  SYNOPSIS
>  --------
>  [verse]
> -'git hook'
> +'git hook' -l | --list <hook-name>
>  
>  DESCRIPTION
>  -----------
>  You can list, add, and modify hooks with this command.
>  
> +This command parses the default configuration files for lines which look like
> +"hook.<hook-name> = <order number>:<hook command>", e.g. "hook.pre-commit =
> +010:/path/to/script.sh". In this way, multiple scripts can be run during a
> +single hook. Hooks are sorted in ascending order by order number; in the event
> +of an order number conflict, they are sorted in configuration order.
> +
> +OPTIONS
> +-------
> +
> +-l::
> +--list::
> +	List the hooks which have been configured for <hook-name>. Hooks appear
> +	in the order they should be run. Output of this command follows the
> +	format '<order number> <origin config> <hook command>'.
> +
>  GIT
>  ---
>  Part of the linkgit:git[1] suite

> diff --git a/builtin/hook.c b/builtin/hook.c
> index b2bbc84d4d..8261302b27 100644
> --- a/builtin/hook.c
> +++ b/builtin/hook.c
> @@ -1,21 +1,73 @@
>  #include "cache.h"
>  
>  #include "builtin.h"
> +#include "config.h"
> +#include "hook.h"
>  #include "parse-options.h"
> +#include "strbuf.h"
>  
>  static const char * const builtin_hook_usage[] = {
> -	N_("git hook"),
> +	N_("git hook --list <hookname>"),
>  	NULL
>  };
>  
> +enum hook_command {
> +	HOOK_NO_COMMAND = 0,
> +	HOOK_LIST,
> +};
> +
> +static int print_hook_list(const struct strbuf *hookname)
> +{
> +	struct list_head *head, *pos;
> +	struct hook *item;
> +
> +	head = hook_list(hookname);
> +
> +	list_for_each(pos, head) {
> +		item = list_entry(pos, struct hook, list);
> +		if (item)
> +			printf("%.3d\t%s\t%s\n", item->order,
> +			       config_scope_to_string(item->origin),
> +			       item->command.buf);
> +	}
> +
> +	return 0;
> +}
> +
>  int cmd_hook(int argc, const char **argv, const char *prefix)
>  {
> +	enum hook_command command = 0;
> +	struct strbuf hookname = STRBUF_INIT;
> +
>  	struct option builtin_hook_options[] = {
> +		OPT_CMDMODE('l', "list", &command,
> +			    N_("list scripts which will be run for <hookname>"),
> +			    HOOK_LIST),

I'm not sure about '--list' being an option.  I don't know what other
operations you have in mind for this 'git hook' command, but I suppose
that besides listing configured hooks it will be able to at least add,
remove, and reorder them as well.  These seem to be better implemented
as subcommands, along the lines of e.g. how notes and remotes can be
added, removed, etc.

>  		OPT_END(),
>  	};
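
For illustration only, the ordering rule from the quoted commit message (ascending order number, ties broken by configuration order) can be sketched in POSIX shell. The function name `list_hooks_sketch` and its tab-separated input format are assumptions made for this example; they are not part of the patch, which builds the list in C.

```shell
# Hypothetical helper, not part of the patch: reads
# "<scope><TAB><order>:<command>" lines on stdin, in configuration order,
# and prints "<order><TAB><scope><TAB><command>" in run order.
list_hooks_sketch () {
	awk -F '\t' '{
		colon = index($2, ":")
		order = substr($2, 1, colon - 1)
		cmd = substr($2, colon + 1)
		# NR records configuration order; it breaks ties between
		# hooks that share an order number.
		printf "%d\t%d\t%s\t%s\n", order, NR, $1, cmd
	}' |
	sort -t "$(printf '\t')" -k1,1n -k2,2n |
	awk -F '\t' '{ printf "%03d\t%s\t%s\n", $1, $3, $4 }'
}

# Same data as the example in the commit message: global config first,
# then repo config.
printf 'global\t001:~/test.sh\nglobal\t999:~/baz.sh\nrepo\t900:~/bar.sh\n' |
list_hooks_sketch
```

Numbering the input lines before sorting avoids relying on a stable sort, which POSIX `sort` does not guarantee.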

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Notes from Git Contributor Summit, Los Angeles (April 5, 2020)
@ 2020-03-12  3:55 James Ramsay
  2020-03-12  3:56 ` [TOPIC 1/17] Reftable James Ramsay
                   ` (19 more replies)
  0 siblings, 20 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  3:55 UTC (permalink / raw)
  To: git

It was great to see everyone at the Contributor Summit last week, in 
person and virtually.

Particular thanks go to Peff for facilitating, and to GitHub for 
organizing the logistics of the meeting place and food. Thank you!

On the day, the topics below were discussed:

1. Ref table (8 votes)
2. Hooks in the future (7 votes)
3. Obliterate (6 votes)
4. Sparse checkout (5 votes)
5. Partial Clone (6 votes)
6. GC strategies (6 votes)
7. Background operations/maintenance (4 votes)
8. Push performance (4 votes)
9. Obsolescence markers and evolve (4 votes)
10. Expel ‘git shell’? (3 votes)
11. GPL enforcement (3 votes)
12. Test harness improvements (3 votes)
13. Cross implementation test suite (3 votes)
14. Aspects of merge-ort: cool, or crimes against humanity? (2 votes)
15. Reachability checks (2 votes)
16. “I want a reviewer” (2 votes)
17. Security (2 votes)

Notes were taken in the linked Google Doc, but for those who’d rather 
read the notes here, I’ll also send the notes as replies to this 
message.

https://docs.google.com/document/d/15a_MPnKaEPbC92a4jhprlHvkyirDh2CtTtgOxNbnIbA/edit#heading=h.vvhyp0oa4hhz

Regards,
James

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 1/17] Reftable
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
@ 2020-03-12  3:56 ` James Ramsay
  2020-03-12  3:56 ` [TOPIC 2/17] Hooks in the future James Ramsay
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  3:56 UTC (permalink / raw)
  To: git

1. In case you’re not aware what it is. It was introduced in JGit. 
???Prefix table??

2. Gerrit team likes to get this in cgit

3. From the Stump the Experts yesterday, the question was “If you 
could go back and change anything what would it be?”: Loose refs can 
cause difficulties. So it would be nice to make reftables a first-class 
citizen. There are issues with OSes with case-insensitive filesystems. 
Reftables can help with this.

4. Stolee: contributing an entire copy of the source of a library 
elsewhere as one patch makes it hard to review, and doesn’t feel like 
a contribution to Git.

5. Brian: agree. Is it an external library that needs to be pulled in 
every time a new version is added in JGit?

6. Edward: having it as external library moves the maintenance burden

7. Jonathan N: example of xdiff, we have a copy, Mercurial has a copy, 
and they have been patched in different ways. Can we separate these 
concerns? One: patches that can be reviewed separately. Two: licensing. 
Three: ongoing maintenance approach.

8. Peff: benefits of external library are clear. What is the maintenance 
burden of not maintaining this in the core git tree? More concerned 
about niceties in Git that aren’t in other libraries, like strbufs and 
data structures. Lowest common denominator isn’t ideal. Can this cost 
be mitigated?

9. Ed: I have the same concerns. We also have strbufs, but they are not 
the same. We also might run into licensing issues.

10. Stolee: also cross platform compatibility… It might not perform 
well on different platforms.
Peff: It feels to me there are a lot of hairy filesystem details 
reftables need to do.

11. Brian: Atomic renames have issues on Windows.

12. Jonathan N: Han-Wen wanted a more substantial review, and we just 
provided one.

13. Action item for Jonathan: write a summary email to Han-Wen.

14. Brian: (inaudible) Having a reftable library would be interesting to 
test SHA256 changes.

15. Stolee: would be nice to have tests regarding case-sensitivity & 
directory/file conflicts

16. Ed: wait, are we loosening the restriction?

17. Peff: no, for backwards-compatibility we cannot. Would love to get 
rid of that restriction, though.

18. Jonathan N: Immediate benefit wrt D/F conflicts is being able to 
keep reflogs for deleted branches

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 2/17] Hooks in the future
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
  2020-03-12  3:56 ` [TOPIC 1/17] Reftable James Ramsay
@ 2020-03-12  3:56 ` James Ramsay
  2020-03-12 14:16   ` Emily Shaffer
  2020-03-12  3:57 ` [TOPIC 3/17] Obliterate James Ramsay
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 125+ messages in thread
From: James Ramsay @ 2020-03-12  3:56 UTC (permalink / raw)
  To: git

1. Emily: hooks in the config file. Sent a read only patch, but didn’t 
get much traction. Add a new header in the config file, then have prefix 
number so that security scan could be configured at system level that 
would run last, and then hook could also be configured at the project 
level.

2. Peff: Having hooks in the config would be nice. But don’t do it at 
`hooks.prereceive`, but use a subconfig like `hooks.prereceive.command` 
so it’s possible to add options later on.

3. Brian: sometimes they need to be overridden; ordering works for me. For 
Git LFS it would be helpful to have a pre-push hook for LFS, and a 
different one for something else. Want flexibility about finding and 
discovering hooks.

4. Emily: if you want to specify a hook that is in the work tree, then 
it has to be configured after cloning.

5. Jonathan: It’s better to start with something low complexity as 
long as it can be extended/changed later. If there's room to tweak it 
over time then I'm not too worried about the initial version being 
perfect — we can make mistakes and learn from them. A challenge will 
be how hooks interact. Analogy to the challenges of stacked union 
filesystems and security modules in Linux. Analogy to sequence number 
allocation for unit scripts

6. CB: Declare dependencies instead of a sequence number? In theory 
independent hooks can also run in parallel.

7. Peff: Maybe that’s something to not worry about from the start. 
Like, how many hooks do you expect to run anyway.

8. Christian: At booking.com they use a lot of hooks, and they also sent 
patches to the mailing list to improve that.

9. Emily: In-tree hooks?

10. Brian: You can do `git cat-file <ref> | sh` to run a hook.

11. Brandon: Is it possible to globally disable all hooks locally? It 
might be a security concern. Or is it something we might want to add?

12. Peff: No it’s not.
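
CB's suggestion in item 6 (declare dependencies instead of sequence numbers) amounts to a topological sort. A minimal sketch using the POSIX `tsort` utility follows; the function name, the hook names, and the "<hook> <runs-after>" input format are hypothetical and do not correspond to any proposed config syntax.

```shell
# Hypothetical helper: reads "<hook> <runs-after>" pairs on stdin and
# prints one valid run order, one hook per line.
order_hooks_by_deps () {
	# tsort expects "<before> <after>" pairs, so swap the two columns.
	awk '{ print $2, $1 }' | tsort
}

# "test" runs after "lint", and "scan" runs after "test".
printf '%s\n' 'test lint' 'scan test' | order_hooks_by_deps
```

Hooks with no ordering constraint between them may come out in either relative order; those are also the ones that could, in theory, run in parallel.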

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 3/17] Obliterate
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
  2020-03-12  3:56 ` [TOPIC 1/17] Reftable James Ramsay
  2020-03-12  3:56 ` [TOPIC 2/17] Hooks in the future James Ramsay
@ 2020-03-12  3:57 ` James Ramsay
  2020-03-12 18:06   ` Konstantin Ryabitsev
  2020-03-15 22:19   ` Damien Robert
  2020-03-12  3:58 ` [TOPIC 4/17] Sparse checkout James Ramsay
                   ` (16 subsequent siblings)
  19 siblings, 2 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  3:57 UTC (permalink / raw)
  To: git

1. Jonathan N: sometimes people accidentally add a big file they don’t 
need. Have to use BFG and it’s a pain. Next time, maybe you just deal 
with it and ignore. This happened to Chrome. Some huge blob that was in 
the repo, should no longer be in the repo, but don’t want to rewrite 
the history. Other use cases are confidential information, like 
password, credit card number etc. Initial reactions: it’s already out 
there, rotate. Second reaction: if it’s a toxic blob it needs to be 
removed everywhere! What if someone taught kernel repo to

2. James: I’ve been in a lot of meetings with customers where they 
mentioned it’s not possible to rotate the information that was leaked 
into the repo

3. Demetr: How far back do we allow to go to obliterate?

4. Jonathan N: there are indeed horrible real-world examples where 
things to be obliterated are from a long time ago.

5. James: real cost to changing object ids: Git and tools interacting 
with it really assume that history is immutable.

6. Elijah: replace refs helps, but not supported by hosts like GitHub 
etc

     a. Stolee: breaks commit graph because of generation numbers.
     b. Replace refs for blobs, then special packfile, there were edge 
cases.

7. Demetr: Backward compatibility, wouldn’t custom handling be 
problematic for old clients?

8. Jeff H: can we introduce a new type of object -- a "revoked blob" if 
you will that burns the original one but also holds the original SHA in 
the ODB ??

9. Peff: what would this mean for signatures? New opportunity to forge 
signatures.

10. Jonathan N: if a new entity, this means you’ve changed the content 
which we want to avoid. Maybe a list of revoked blobs. If fsck notices 
missing, it should be happy. Protocol support, if someone tries to 
include a patch with it, just ignore it. Not great. Improvement would be 
to send a list of things I deliberately didn’t send. Could also 
communicate blobs to be deleted, but ignore that for v1. Learn from 
Mercurial who have a very complicated signed revocation mechanism.

11. Brian: the remote can’t be trusted, ala leftpad maintainer could 
do something malicious causing repo to become invalid.

12. Jonathan N: main scenario I’m considering is trusted company 
remote.

13. Terry: partial clone and solve large files. Maybe the server could 
handle it by converting normal clone into partial, and then handle the 
error if someone asks for that blob.

14. Jakub: one idea would simply be to treat this as a missing blob in a 
partial clone

15. Michael Haggerty: does this only apply to blobs? (Peff: no, commit 
messages can contain sensitive information; Johannes: trees contain file 
names which also can contain sensitive information)

16. Jonathan N: partial clone is not a solution for the desire to get 
rid of the blob on the server side.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 4/17] Sparse checkout
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (2 preceding siblings ...)
  2020-03-12  3:57 ` [TOPIC 3/17] Obliterate James Ramsay
@ 2020-03-12  3:58 ` James Ramsay
  2020-03-12  4:00 ` [TOPIC 5/17] Partial Clone James Ramsay
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  3:58 UTC (permalink / raw)
  To: git

1. Stolee: built in! We’re making improvements! We’re looking at UX 
- add, remove, state. Harder things like grep needs everything, but not 
expected.

2. Elijah: we do have a strongly worded warning, so we could just change 
it.

3. Jonathan N: I like both modes.

4. Terry: how is GC wired up? If I change a cone, will it be reclaimed?

5. Stolee: GC doesn’t remove reachable objects. Haven’t found people 
need to do this, unless they accidentally rehydrated something massive 
they didn’t really need. Day to day work doesn’t introduce too much.

6. Terry: Android devs have massive special machines. Constantly running 
out of disk space.

7. Stolee: more of a partial clone feature, than a sparse checkout 
feature. If I checkout three branches, go offline, I don’t want GC to 
clean things that I had downloaded.

8. Jonathan N: switching between Word and Powerpoint. Would it be useful 
to attach cone to branch rather than repo.

9. Stolee: Office team is building some kind of magic to automatically 
detect from branch.

10. Brian: can use reflog maybe. Prune based on that? People who run out 
of disk space could have shorter reflog.

11. Elijah: biggest problem people run into doing a rebase/pull, hit 
conflicts, then they need to update sparsity patterns, which they 
can’t do because there are conflicts. Working on a patch.

12. Stolee: Office scoper tool would automatically recalculate 
dependencies and update sparsity config so that they can build.

13. During break, Minh brought up an idea that we could use in-tree data 
to manage the dependency chain: The tree could contain files that 
contain directory names, and users use config to specify the list of 
those files to use for the sparse-checkout definition. When Git updates 
the working directory and those files change, the sparse-checkout can be 
updated to include the union of those directories. Stolee will look into 
how this could work and whether this works for existing customers.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 5/17] Partial Clone
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (3 preceding siblings ...)
  2020-03-12  3:58 ` [TOPIC 4/17] Sparse checkout James Ramsay
@ 2020-03-12  4:00 ` James Ramsay
  2020-03-17  7:38   ` Allowing only blob filtering was: " Christian Couder
  2020-03-12  4:01 ` [TOPIC 6/17] GC strategies James Ramsay
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 125+ messages in thread
From: James Ramsay @ 2020-03-12  4:00 UTC (permalink / raw)
  To: git

1. Stolee: what is the status, who is deploying it, what issues need to 
be handled? Example, downloading tags. Hard to highly recommend it.

2. Taylor: we deployed it. No activity except for internal testing. Some 
more activity, but no crashes. Have been dragging our feet. Chicken egg, 
can’t deploy it because the client may not work, but hoping to hear 
about problems.

3. ZJ: dark launched for mission critical repos. Internal questions 
from CI team, not sure about performance. Build farm hitting it hard 
with different filter specs.

4. Taylor: we have patches we are promising to the list. Blob none and 
limit, for now, but add them incrementally. Bitmap patches are on the 
list

5. James: I’ve been talking to customers who have high interest in 
this. But they are hesitant. Do people have similar situations, like 
shallow clones?

6. Jonathan N: we’re not using it en masse with server farms (see 
Terry’s stats). Performance issues with catchup, long periods of 
downloading and no progress. Missing progress display means a user waits 
and gets worried. On server side, reachability check can be expensive, 
in part because enumerating refs is expensive.

7. Peff: client experience sucks with N+1 situations. If the server 
operator side is tolerable, that way it’s easier to move the client 
side forward. By default, v2 just serves them up, no reachability check. 
Not sure if we’ll do that forever. Often have to inflate blobs that 
are deltas, and then delta compression which is not needed.

8. Stolee: Jonathan built a batch download when changing trees. Possible 
to improve by sending haves.

9. Jonathan N: if you’re in the blob none filter, and say I have a 
commit, I might not actually have what the server expects.

10. Peff: could enumerate blobs

11. Demetr: Partial clones are dangerous for DoS attacks

12. Jonathan: JGit forbids most filters that can't use bitmaps.

13. Peff: just blob filters? Yes, so far.

14. Jonathan: as far as the client experience goes, we’re not batching 
often enough and not showing progress on catch-up fetches. Any other UX 
issues?

15. Jeff: no, those two are what I meant.

16. James: another question for git service providers: Is it a 
replacement for LFS?

17. Brian: some files can compress, others don’t. Repacking can blow 
up if you try to compress something that can’t be compressed. How do 
we identify which objects we compress, and which we don’t?

18. Jonathan N: if you see something already compressed, tell zlib to do 
passthrough compression.

19. Taylor: two problems - which projects do you want to quarantine, 
where do you put them. CDN offloading would be nice.

20. Stolee: reachability bitmaps are tied to a single packfile. Becomes 
more and more expensive. Even just having them in another file requires 
a lot of work.

21. Taylor: we’re looking at some heuristics so that some parts of the 
pack can just be moved over verbatim.

22. Peff: I see three problems: multi pack lookups, bitmaps,

23. Jonathan N: we never generate on the fly deltas

24. Peff: there are pathological cases.

25. Terry: we are seeing 89k partial clones per day. Majority is clone. 
Shallow clone equivalent.

26. Peff: why? Is it better?

27. Jonathan N: initial clone is about the same as shallow. One reason 
we encourage, if you do a follow up, with shallow clone it is expensive 
for the server.

28. Stolee: if you persist the previous shallow clone, it is much much 
cheaper to do incremental fetch.

29. Terry: JGit has enough shallow clone bugs that we often just send 
everything. Make shallow clone obsolete

30. Jonathan N: Jenkins style CI, option for shallow clone. Want to run 
diff or git describe, have to turn it off. Partial clone is simpler.

31. Minh: could the server force the client to partial clone?

32. Brian: risks, working on an airplane. I don’t want to do any kind 
of fetch operation on poor connection. Could be good for CI, but don’t 
want to break things for humans.

33. Jonathan N: if I am going to get on an airplane, is there a way to 
fill it in the background. There are workarounds, like run `git show` 
which needs everything.

34. Elijah: I want to fetch a bunch more stuff, but don’t fetch 
anymore, throw an error rather than hanging.

35. Jonathan: filter blob:none is people's first experience of the 
feature. Make it a first class ui concept, present a user oriented UI 
like git sparse-checkout?

36. Taylor: It looks like it’s simple to use, but there’s a lot to 
do to actually use it. And Scalar is doing that for you.

37. James: Some of our customers would be interested to have a feature 
that pushes down configuration to all the users. It would give them LFS 
by default, without the end-users doing something.

38. Jonathan: We considered enabling a global config at Google. For 
example for 1+GB files.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 6/17] GC strategies
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (4 preceding siblings ...)
  2020-03-12  4:00 ` [TOPIC 5/17] Partial Clone James Ramsay
@ 2020-03-12  4:01 ` James Ramsay
  2020-03-12  4:02 ` [TOPIC 7/17] Background operations/maintenance James Ramsay
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  4:01 UTC (permalink / raw)
  To: git

1. Jonathan N: Git has a flexible packfile format. Compared to CVS where 
things are stored as deltas against the next revision of the same file. 
GC can be a huge operation if it’s not done regularly. "git gc" makes 
one huge pack. Better amortized behavior to have multiple packs with 
exponentially increasing size and combine them when needed (Martin 
Fick's exproll).

2. Jonathan N: There are also unreachable objects to take care about. GC 
can/should delete them. But at the same time someone else might be 
creating history that still needs those objects. To give objects a grace 
period, we turn the unused objects into loose objects and look at the 
creation time. But alternatively there’s the proposal to move these 
unreachable objects into a packfile for all these objects. But this can 
be a problem for older git clients, because they might not know the pack 
is garbage and might move objects across packs. See the hash function 
transition doc for details.

3. Terry: JGit has these unreachable garbage packs

4. Peff: You want to solve this loose objects explosion problem?

5. Peff: what if you reference an object in the garbage pack from an 
object in a non-garbage pack?

6. Jonathan N: At GC time the object from the garbage pack is copied to 
a non-garbage pack. Basically rescue it from the garbage. It only saves 
the referenced objects, not the whole garbage pack.

7. Jonathan N: It has been running in production for >2 years.

8. Peff: There are so many non-atomic operations that can happen. And 
races can happen.

9. Jonathan N: If you find races, please comment on the JGit change that 
describes the algorithm. Happens-before relation and grace period.

10. `git gc --prune-now` should no longer create loose objects first, 
before just deleting them.
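
The idea in item 1 of keeping packs at exponentially increasing sizes can be sketched as a selection rule: starting from the smallest pack, keep absorbing the next pack while it is smaller than twice the combined size so far. This is an illustrative heuristic written for these notes, not the actual algorithm of exproll or `git gc`; the function name and the factor of 2 are assumptions.

```shell
# Hypothetical helper: reads one pack size per line (any unit) on stdin and
# prints "<count> <combined-size>" for the smallest packs worth combining,
# or "0 0" when the packs already grow by at least a factor of 2.
packs_to_roll_up () {
	sort -n |
	awk '
		NR == 1 { total = $1; count = 1; next }
		# Absorb the next pack while it is less than twice the
		# combined size of the packs selected so far.
		!done && $1 < 2 * total { total += $1; count++; next }
		{ done = 1 }
		END { if (count > 1) print count, total; else print 0, 0 }
	'
}

# Five packs; the three smallest (1 + 1 + 3) get combined into one of size 5.
printf '%s\n' 100 1 3 10 1 | packs_to_roll_up
```

Because each combined pack is at most a constant factor larger than what it replaces, the number of packs stays small without rewriting the largest pack on every run.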

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 7/17] Background operations/maintenance
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (5 preceding siblings ...)
  2020-03-12  4:01 ` [TOPIC 6/17] GC strategies James Ramsay
@ 2020-03-12  4:02 ` James Ramsay
  2020-03-12  4:03 ` [TOPIC 8/17] Push performance James Ramsay
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  4:02 UTC (permalink / raw)
  To: git

1. Stolee: Are we interested in having a background process doing 
things?

2. Emily: There are a lot of different ways to do this. Even only 
looking at Linux there are different ways.

3. Stolee: Without looking at how? What background operations would we 
like to have?

4. Emily: Is it a good candidate for `git undo`? To keep track of what 
the user was doing and to make it possible to roll back?

5. Brian: It can run into scalability issues. Also there might be repos 
on my disk that never change and don’t need background processing. At 
GitHub we do maintenance based on the number of pushes.

6. Stolee: The kind of maintenance will differ between client and server; 
interests are different. For Scalar we have this one process looking at 
all repos and will do operations on them.

7. Peff: On server-side you’ll have millions of repos and even one 
process looking at all repos would have an impact on the system. Most hosting 
providers already have services taking care of this, so I think this 
feature is only interesting for client-side.

8. Brian: We should be careful. For example I’m constantly creating 
test repos in /tmp.

9. Stolee: Thanks for the input, we’ll do research and come back to 
this.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 8/17] Push performance
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (6 preceding siblings ...)
  2020-03-12  4:02 ` [TOPIC 7/17] Background operations/maintenance James Ramsay
@ 2020-03-12  4:03 ` James Ramsay
  2020-03-12  4:04 ` [TOPIC 9/17] Obsolescence markers and evolve James Ramsay
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  4:03 UTC (permalink / raw)
  To: git

1. Terry: Chrome has 500MB file pushed up. Using Gerrit, feature work 
becomes stale over a few days, then push. For a few months pushes would 
push gigabytes of data.

2. Stolee: where we do the tree walk, we are doing it from the merge 
base.
Jonathan N: Minh rescued us by advertising more .have refs to avoid it 
being pushed. In protocol V2 for push there are 3 major changes 
proposed: one, abbreviating ref advertisement; two, adding negotiation; 
three, push to fast moving ref if you don’t care if it’s a fast 
forward. Are there other cases?

3. Minh: performance on reachability. Would help to know what branch you 
are pushing.

4. Peff: I might be pushing a random sha, without a branch.

5. Brian: I’ve seen cases with 80k refs, we tried then to send minimal 
amounts of objects. We spend a lot of time negotiating, to eventually 
only send 4 objects. It’s not very efficient, you could just spend 
less time on that and send a few more objects.

6. Minh: can we invert the pattern? Just send the new thing, and then 
the server says give me more.

7. Peff: You’ll get N+1 issues.

8. Jonathan N: I like Jeff Hostetler’s idea in Zoom chat. You can look 
at the branch and see when the author changes and use that as a crude 
heuristic to ask the server if they have that commit.
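
The heuristic in item 8 can be sketched as a small filter. It assumes input lines of the form "<commit> <author-email>", newest first, such as `git log --format='%H %ae' <branch>` would produce; the function name is hypothetical, and how the result would be offered to the server is left out.

```shell
# Hypothetical helper: prints the first commit (walking back from the tip)
# whose author differs from the author of the tip commit.
first_foreign_commit () {
	awk '
		NR == 1 { tip_author = $2; next }
		$2 != tip_author { print $1; exit }
	'
}

# Canned input standing in for `git log --format="%H %ae"` output.
printf '%s\n' \
	'c3 me@example.com' \
	'c2 me@example.com' \
	'c1 other@example.com' \
	'c0 me@example.com' |
first_foreign_commit
```

The commit printed is a crude guess at the newest commit the server is likely to have already, since it was probably fetched from there rather than written locally.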

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 9/17] Obsolescence markers and evolve
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (7 preceding siblings ...)
  2020-03-12  4:03 ` [TOPIC 8/17] Push performance James Ramsay
@ 2020-03-12  4:04 ` James Ramsay
  2020-05-09 21:31   ` Noam Soloveichik
  2020-03-12  4:05 ` [TOPIC 10/17] Expel ‘git shell’? James Ramsay
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 125+ messages in thread
From: James Ramsay @ 2020-03-12  4:04 UTC (permalink / raw)
  To: git

1. Brandon: I thought it would be interesting to have a similar feature 
as Mercurial has. Mercurial evolve will help you do a big rebase commit 
by commit. Giving you more insights how commits change over time.

2. Peff: This has been discussed many times on the list already.

3. Jonathan N: It will help with Googlers productivity, but it’s 
smaller compared to other performance fixes.

4. Brian: It’s a great feature and I would like to have it, but I’m 
not sure it gives enough value to someone to sit down and implement it.

5. Emily: Is it a good candidate for GSoC?

6. Brian: If we have a good design.

7. Stolee: It should be easier to use than interactive rebase.

8. Stolee: It would be nice if, instead of fixup commits, I could 
send you new commits which mark your original commits as obsolete.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 10/17] Expel ‘git shell’?
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (8 preceding siblings ...)
  2020-03-12  4:04 ` [TOPIC 9/17] Obsolescence markers and evolve James Ramsay
@ 2020-03-12  4:05 ` James Ramsay
  2020-03-12  4:07 ` [TOPIC 11/17] GPL enforcement James Ramsay
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  4:05 UTC (permalink / raw)
  To: git

1. Jonathan N: Cannot use safely on its own. So why do we still have it?

2. Jonathan N: It’s not an interactive shell, it’s a login shell. To 
give the user only access to a git repo.

3. Jonathan N: Gitolite is the only sensible thing that uses git-shell. 
If that is the only good use case, can we donate it to them?

4. Peff: It’s a tool for security, but if no one is using it, it’s 
dangerous to have it around. It’s mostly stand-alone, so that should be 
possible.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 11/17] GPL enforcement
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (9 preceding siblings ...)
  2020-03-12  4:05 ` [TOPIC 10/17] Expel ‘git shell’? James Ramsay
@ 2020-03-12  4:07 ` James Ramsay
  2020-03-12  4:08 ` [TOPIC 12/17] Test harness improvements James Ramsay
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  4:07 UTC (permalink / raw)
  To: git

1. Peff: Hypothetically, if a company did not distribute the sources of 
a modified version of Git that they ship, what should we do about it? 
Should we take legal action and make them aware that they are doing 
something they should not do? That would also help make sure they treat 
other projects better.

2. Brandon: Would we gather together with other projects affected by 
this company?

3. Peff: How hard do we want to be on this company?

4. Ed: I’m also bothered by this. And they just sent me a tarball. At 
Microsoft we are aggressive about doing this right.

5. Jonathan N: Can we make it really easy to comply, e.g. by making a 
build target that contains everything?

6. ZJ: They could just push to GitHub. That would be fine.

7. Peff: I brought this up, because we were made aware of this by the 
Conservancy. So I wanted to hear how people are feeling about it.

8. Brian: A more aggressive approach would be appropriate if we have 
made them aware of the issue and they decided to not comply on purpose.

9. Peff: Code change doesn’t matter, whether it’s a security fix or 
feature. And I’m fine giving them a bit of lag time, like a day. Not a 
day.

10. Peff: You’re not obliged to send the source code, but you should 
provide the offer to share the source. In this case, they sent us a 
tarball, but the sources are not on their open-source. So they probably 
do not yet comply.

11. CB: But everyone on Mac can request to send you the source code. We 
could release a form somewhere to give people an easy option to request 
this.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 12/17] Test harness improvements
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (10 preceding siblings ...)
  2020-03-12  4:07 ` [TOPIC 11/17] GPL enforcement James Ramsay
@ 2020-03-12  4:08 ` James Ramsay
  2020-03-12  4:09 ` [TOPIC 13/17] Cross implementation test suite James Ramsay
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  4:08 UTC (permalink / raw)
  To: git

1. Jonathan N: Test harness is an important part of the development 
process, shapes what kinds of tests people write. What can we improve?

2. Peff: I love our test harness

3. Brian: It’s amazing for integration tests. For our C code it’s a 
lot harder to do unit tests. We sometimes have a portability issue, 
about POSIX shell vs others.

4. ZJ: I like how it also acts as a piece of documentation.

5. Jonathan N: If we had more unit tests: if I am working on refs, I 
might like to run all tests related to that. And now we have this lack 
of dependency graph between this

6. Peff: I’m super nervous about that. Tools like code coverage could 
do this. But I’ve seen cases where all new tests are green, and the 
tests in the area I expected to be affected succeed, but something in 
some far corner fails. So if you’re optimizing for speed, you might be 
losing correctness. I’m biased because I can run all tests on my 
computer in 1 minute. But on Windows that does not seem to be the case.

7. Peff: We can spend time on speeding up things, making it better 
parallelized for example. I’ll send some patches out on this.

8. Jonathan N: Really nice contribution to Git by David Barr, whose 
background was as a Java developer and thus the code was written in a 
Java way with clear API boundaries and unit tests.

9. Brian: Yes, if your function is doing too much, it should be split 
up, making it possible to test the separate pieces and then have a 
function that calls those and tests the end result.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 13/17] Cross implementation test suite
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (11 preceding siblings ...)
  2020-03-12  4:08 ` [TOPIC 12/17] Test harness improvements James Ramsay
@ 2020-03-12  4:09 ` James Ramsay
  2020-03-12  4:11 ` [TOPIC 14/17] Aspects of merge-ort: cool, or crimes against humanity? James Ramsay
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  4:09 UTC (permalink / raw)
  To: git

1. Carlos: some aspects are underspecified, or work in very specific 
ways, but need agreement on correct behaviour. For example implementing 
a command line tool that will have expectations, or expected repo state 
so another tool can generate the right output. For example libgit2 
keeping up with ignore rules. How does JGit handle this?

2. Jonathan N: JGit has some tests of matching behavior which I do not 
like. They invoke git-grep, generate patterns and compare output. Having 
non-deterministic tests is not great. I like the idea of table driven 
tests, common data, but different manifestations of how you test those 
things.

3. Patrick: config formatted tests, need to write drivers for other 
projects. Stopped because writing all the tests in this format was not 
fun. Basics work though. Spoke to Peff 2 years ago, likely easy to write 
drivers for Git.

4. Peff: already replaced tests with table driven, and prefer that. 
There are table driven tests for attribute matching.

5. Brian: valuable for LFS. Know attribute matching is not up to spec. 
Could benefit from the tests to help identify gaps. We are MIT licensed, 
so we can’t just drop them in, but we could import them in CI.

6. Peff: make whatever is in Git as authority, add tables, and then 
these can be used by other projects.

7. Jonathan N: example is diff tests
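
The table-driven idea above can be sketched as follows. This is only an 
illustration, not code from Git, libgit2, or JGit: the CASES rows, 
run_table(), and the fnmatch-based glob_matcher() are hypothetical 
stand-ins, and real ignore rules are considerably richer than plain 
shell globbing. The point is that the table is shared data, while each 
implementation supplies its own small driver.

```python
from fnmatch import fnmatchcase

# Shared, implementation-neutral table: (pattern, path, expected match).
# These rows are illustrative only, not taken from any real test suite.
CASES = [
    ("*.o", "main.o", True),
    ("*.o", "main.c", False),
    ("build", "build", True),
    ("doc?.txt", "doc1.txt", True),
    ("doc?.txt", "doc12.txt", False),
]


def run_table(matcher, cases=CASES):
    """Run every row through `matcher` and return the rows that disagree."""
    failures = []
    for pattern, path, expected in cases:
        actual = matcher(pattern, path)
        if actual != expected:
            failures.append((pattern, path, expected, actual))
    return failures


def glob_matcher(pattern, path):
    # Stand-in driver; a real driver would call the implementation under test.
    return fnmatchcase(path, pattern)


failures = run_table(glob_matcher)
print("failures:", failures)
```

A second implementation would reuse CASES unchanged and pass its own 
matcher to run_table(), so disagreements show up as rows rather than as 
diverging hand-written tests.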

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 14/17] Aspects of merge-ort: cool, or crimes against humanity?
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (12 preceding siblings ...)
  2020-03-12  4:09 ` [TOPIC 13/17] Cross implementation test suite James Ramsay
@ 2020-03-12  4:11 ` James Ramsay
  2020-03-12  4:13 ` [TOPIC 15/17] Reachability checks James Ramsay
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  4:11 UTC (permalink / raw)
  To: git

1. Elijah: ORT stands for Ostensibly Recursive’s Twin. As a merge 
strategy, just like you can call ‘git merge -s recursive’ you can 
call ‘git merge -s ort’.  Git’s option parsing doesn’t require 
the space after the ‘-s’.

2. Major question is about performance & possible layering violations. 
Merge recursive calls unpack_trees() to walk trees, then needs to get 
all file and directory names and so walks the trees on the right again, 
and then the trees on the left again. Then diff needs to walk sets of 
trees twice, and then insert_stage_data() does a bunch more narrow tree 
walks for rename detection. Lots of tree walking. Replaced that with two 
tree walks.

3. Using traverse_trees() instead of unpack_trees(), and avoid the index 
entirely (not even touching or creating cache_entry’s), and building 
up information as I need. I’m not calling diffcore_std(), but instead 
directly calling diffcore_rename(). Is this horrifying? Or is it 
justified by the performance gains?

4. Peff: both, some of it sounds like an improvement, but maybe there 
were hidden benefits previously.

5. Elijah: I write to a tree before I do anything.

6. Peff: I like that. Seems like a clean up to me. We have written 
libgit2-like code for merging server-side

7. Elijah: I’ve been adding tests for the past few years, more to add, 
feel good about it.

8. Jonathan N: If you are using a lower-layer thing, I would not say 
you’re doing anything you shouldn’t. But if the docs say you should 
not use diffcore_rename(), you can update the docs to say that it’s 
fine to use it.

9. Elijah: three places directly write tree objects. All have different 
data structures they are writing from. Should I pull them out? But then 
my data structure was also different, so I’d have a fourth.

10. Peff: not worried because trees are simple. Worried about policy 
logic. Can’t write a tree entry with a double slash. Want this to be 
enforced everywhere, but no idea how hard that would be to implement. 
Not about lines of code, but consistency of policy. Fearful that only 
one place does it.

11. Elijah: I know merge-ort checks this, but it’s not nearby, so it 
could change.

12. Peff: as bad as it is to round trip through the index, it may bypass 
quality checks, which you will need to manually implement.

13. Elijah: usability side, with the tree I’ve created, I could have 
.git/AUTOMERGE. I have an old tree, a new tree, and a checkout can get 
me there. Fixed a whole bunch of bugs for sparsity and submodules.

14. Elijah: If we use this to on-the-fly remerge as part of git-log in 
order to compare the merge commit to what the automatic merging would 
have done, where/how should we write objects as we go?

15. Jonathan N: can end up with proliferation of packs, would be nice to 
have something similar to fast import, with an in-memory store. Dream is 
to never have loose files written.

16. Peff: I like your dream. But fast import packs are bad. We assume 
that packs are good, and thus need to use GC aggressively. This makes 
that problem worse. If Git knows about objects that are not written to 
disk, there is a risk that you write objects that are broken, but Git 
doesn’t notice because it thinks it has the object when it’s only in 
memory. Log is conceptually a read operation, but this would create the 
need for writes.

17. Elijah: you could write into a temporary directory. Worried about 
`gc --auto` in the middle of my operation. If I write to a temp pack I 
could potentially avoid it.

18. Elijah: large files. Rename detection might not work efficiently OR 
correctly for sufficiently large files (binary or not). Limited bucket 
size means that completely different files treated as renames when both 
are over 8MB. Should big files just not be compared?

19. Peff: maybe we should fix the hash…

20. Elijah: present situation is broken, maybe we can cheat in the short 
term, and avoid fixing?

21. Peff: seems more correct for now, but we’d need to document

22. Elijah: checkout --overwrite-ignore flag. Should merge have the same 
flag?

23. Jonathan N: gitignore original use case was build outputs which can 
be regenerated. But then some people want to ignore `.hg` which is much 
more precious.

24. Peff: we can plumb it through later to other commands

25. Brian: CI doesn’t really care. Moving between branches it would 
complain. For checkout and merge it makes sense to support just 
destroying.
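
Peff's point in item 10 about keeping path policy consistent can be 
illustrated with a single shared check that every tree writer calls. 
The sketch below is hypothetical: check_tree_path() and the exact set of 
rules are assumptions for illustration, not Git's actual validation 
code, which lives elsewhere and covers more cases.

```python
def check_tree_path(path):
    """Return None if `path` passes this hypothetical policy, else a reason."""
    if not path:
        return "empty path"
    if path.startswith("/") or path.endswith("/"):
        return "leading or trailing slash"
    for component in path.split("/"):
        if component == "":
            # "a//b" splits into ["a", "", "b"]
            return "empty component (double slash)"
        if component in (".", ".."):
            return "dot component"
        if component.lower() == ".git":
            return ".git component"
    return None


for candidate in ("src/merge.c", "src//merge.c", "a/../b", "x/.GIT/config"):
    print(candidate, "->", check_tree_path(candidate))
```

If each of the places that write tree objects called one function like 
this before emitting an entry, the policy would be defined once rather 
than re-implemented per data structure.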

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 15/17] Reachability checks
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (13 preceding siblings ...)
  2020-03-12  4:11 ` [TOPIC 14/17] Aspects of merge-ort: cool, or crimes against humanity? James Ramsay
@ 2020-03-12  4:13 ` James Ramsay
  2020-03-12  4:14 ` [TOPIC 16/17] “I want a reviewer” James Ramsay
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  4:13 UTC (permalink / raw)
  To: git

1. Jonathan N: seed the idea that it would be nice to hint the ref that 
your commit might be reachable from to help the server avoid iterating 
over all refs. Also, any strategies for speeding up reachability checks?

2. Demitr: reachability by user, or would you consider open to everyone?

3. Stolee: we don’t do branch level security, but we do tailor ref 
list to default, favorites and those you’ve pushed. There is also a 
full endpoint.

4. Brian: security model we have to have is that we assume everyone has 
read to everything. There are too many ways to attack. Useful for 
performance reasons, but not sure reachability checks provide much 
benefit. Don’t think it’s difficult to automate.

5. Demitr: what about security issues

6. Stolee: we’d say find another way.

7. Terry: we have a mono repo, easier to test everything. JGit goes down 
to object level.

8. Peff: Git doesn’t go down to that level, doesn’t validate haves.

9. Jonathan: two lessons, no one except Gerrit cares strongly about 
this; second if we like the model by branch permissions, worth making it 
work well in Git to prevent distance between JGit and Git.

10. Terry: can remove a branch very quickly and prevent new people 
getting it

11. Peff: don’t deny its usefulness, but the performance implication 
is concerning. Trying to keep objects private from determined attackers. 
But pushing a malicious commit to Linux, a user can see it, and won’t 
understand reachability doesn’t imply endorsement.

12. Jonathan: if Git has an easy cheap way to do it, people would use 
it.

13. Peff: have flirted with it, but might have to open 50GB of 
packfiles, or bitmap has corner cases. There are some obvious ways to 
improve, but a lot of work. V2 spec says you’re not allowed to check 
reachability.

14. Jonathan N: nah, it says you don't advertise a capability describing 
whether it is checking reachability.

15. Peff: submodule, but then the commit disappears and becomes 
unreachable. How do you handle?

16. Jonathan N: encourage folks to do fast forward only updates. In 
hooks instead of the git layer

17. Peff: you might not know what ref has reachability to that commit. I 
like the hint thing, if it’s just a hint.
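
The hint idea from items 1 and 17 can be sketched roughly as below. 
This is a toy model under stated assumptions: the commit graph is a 
plain dict of parent lists, and reachable_from() and is_reachable() are 
hypothetical names, not Git or JGit APIs. The hint only changes which 
ref is tried first; every ref is still consulted if the hint is wrong, 
so it stays "just a hint".

```python
def reachable_from(parents, tip, target):
    """Walk ancestry from `tip`; return True if `target` is found."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit == target:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return False


def is_reachable(parents, refs, target, hint=None):
    """Try the hinted ref first, then fall back to every other ref.

    Returns the name of the first ref that reaches `target`, or None.
    """
    order = list(refs)
    if hint in refs:
        order.remove(hint)
        order.insert(0, hint)
    for name in order:
        if reachable_from(parents, refs[name], target):
            return name
    return None


# Toy history: main is c1 <- c2 <- c3, topic is c1 <- x1 <- x2.
parents = {"c3": ["c2"], "c2": ["c1"], "c1": [], "x2": ["x1"], "x1": ["c1"]}
refs = {"refs/heads/main": "c3", "refs/heads/topic": "x2"}
print(is_reachable(parents, refs, "x1", hint="refs/heads/topic"))
```

With a good hint the walk over main is skipped entirely; with a bad or 
missing hint the result is the same, only slower.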

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 16/17] “I want a reviewer”
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (14 preceding siblings ...)
  2020-03-12  4:13 ` [TOPIC 15/17] Reachability checks James Ramsay
@ 2020-03-12  4:14 ` James Ramsay
  2020-03-12 13:31   ` Emily Shaffer
  2020-03-13 21:25   ` Eric Wong
  2020-03-12  4:16 ` [TOPIC 17/17] Security James Ramsay
                   ` (3 subsequent siblings)
  19 siblings, 2 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  4:14 UTC (permalink / raw)
  To: git

1. Jonathan N: An experience some folks have is, sending a patch and 
hearing nothing. That must mean patch is awesome. But then realize I 
need to do something to get a review. In Git, people like Peff are good 
about responding to newcomers. As an author it can be hard to excite 
people enough to review your patch. Relatedly, you might get a review, 
but it doesn’t give you the feedback you wanted. As a reviewer, you 
want to help people to grow and make progress quickly, but it might not 
be easy to identify patches where this will be possible.

2. Emily: A few months ago we started doing code review book club. Git 
devel IRC, and mailing list, could we be more public about these? I 
queue my patch to a list of things that have been idle and need a 
review, then a bot pops something off the list to increase attention for 
people to review?

3. Jonathan Tan: during book club we discuss and review together. 
Everyone can benefit from review experience and expertise. Emily is 
hoping for similar knowledge transfer in the IRC channel.

4. Brian: general case that patches don’t get lost. There is the 
git-contacts script, but I am now a reviewer because I have touched 
everything for SHA256. But we are losing patches and bug reports because 
things get missed. What tool would we use? How would we do it?

5. Jonathan N: patchwork exists, need to learn how to use it :)

6. Peff: this is all possible on the mailing list. I see things that 
look interesting, and have a to do folder. If someone replies, I’ll 
take it off the list. Once a week go through all the items. I like the 
book club idea, instead of it being ad hoc, or by me, a group of a few 
people review the list in the queue. You might want to use a separate 
tool, like IRC, but it would be good to bring it back to the mailing 
list as a summary. Public inbox could be better, but someone 
needs to write it. Maybe nerd snipe Eric?

7. Stolee: not just about doing reviews, but training reviewers.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [TOPIC 17/17] Security
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (15 preceding siblings ...)
  2020-03-12  4:14 ` [TOPIC 16/17] “I want a reviewer” James Ramsay
@ 2020-03-12  4:16 ` James Ramsay
  2020-03-12 14:38 ` Notes from Git Contributor Summit, Los Angeles (April 5, 2020) Derrick Stolee
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 125+ messages in thread
From: James Ramsay @ 2020-03-12  4:16 UTC (permalink / raw)
  To: git

1. Demtr: what are people doing to prevent security issues? For example, 
not allowing things into trees that would be problematic for various 
filesystems.

2. Jonathan N: enable transfer.fsckObjects by default, to validate at 
the trust boundary (in case some code paths at use time are missing some 
validation)

3. Peff: we have had buffer overflows, most are logic errors, and mostly 
paths related. Recently we’ve tightened up which paths are allowed. 
Forbidding things that might be valid on Linux, but problems on Windows. 
Can’t catch everything though, because Windows is so so complex

4. Stolee: I am fearful, and do not know all the rules.

5. Peff: I don’t think it is possible.

6. Demetr: only latin chars, numbers and a few other characters. Do not 
allow any special symbols.

7. Brian: that’s going to break lots of existing projects. Some 
projects have never been on Windows, and therefore people have no 
concern about Windows. People deliberately check in strange files to 
test strange files in their own software. If Windows has an 
API to test filepath, there is not much we can do to protect it. 
Compatibility is important.

8. Peff: probably some cleanup needed, maybe can’t clone git.git. Some 
paths that are innocuous, are a problem in strange situations.

9. Jonathan N: what in Git's design scares the crap out of you?

10. ZJ: GitLab shells out for everything. We had injections. Now we have 
a DSL to verify things. Looking at --end-of-options.

11. Peff: C is terrifying. Rust rewrite please. Still have integer 
overflow risks. Tried to deal with it a few years ago, and found some 
more a few months back. A happy story: OID array uses signed integer, 
because no-one has more than 2 billion objects. Someone had 3 billion 
objects. Just the SHA1s are 60GB. Found it because it triggered overflow 
in st_add. As soon as they wrapped around, it crashed, preventing 
under-allocation.

12. Jeff H: communication between processes

13. <musical interlude>

14. Peff: I feel good about where we read and write strings to each 
other. Maybe if we were using JSON encode/decode it might be easier to 
handle obscure cases
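
The arithmetic behind the overflow story in item 11 can be checked with 
a short sketch. checked_add() is a hypothetical Python stand-in written 
in the spirit of the st_add helper mentioned above, not a copy of it; 
the 64-bit size_t and 32-bit signed int limits are assumptions, and 
since Python integers never wrap, the limit is modelled explicitly.

```python
SIZE_MAX = 2**64 - 1     # assumed 64-bit size_t
INT32_MAX = 2**31 - 1    # assumed 32-bit signed int


def checked_add(a, b, limit=SIZE_MAX):
    """Add two non-negative sizes, failing loudly instead of wrapping."""
    if a > limit - b:
        raise OverflowError(f"size overflow: {a} + {b}")
    return a + b


objects = 3_000_000_000
sha1_bytes = 20

print(objects > INT32_MAX)            # True: the count exceeds a signed 32-bit int
print(objects * sha1_bytes / 10**9)   # 60.0: the raw SHA-1s alone, in GB
```

Failing at the addition is what turns a silent under-allocation into an 
immediate, visible crash.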

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 16/17] “I want a reviewer”
  2020-03-12  4:14 ` [TOPIC 16/17] “I want a reviewer” James Ramsay
@ 2020-03-12 13:31   ` Emily Shaffer
  2020-03-12 17:31     ` Konstantin Ryabitsev
  2020-03-17  0:43     ` Philippe Blain
  2020-03-13 21:25   ` Eric Wong
  1 sibling, 2 replies; 125+ messages in thread
From: Emily Shaffer @ 2020-03-12 13:31 UTC (permalink / raw)
  To: James Ramsay; +Cc: git

On Thu, Mar 12, 2020 at 03:14:25PM +1100, James Ramsay wrote:
> 5. Jonathan N: patchwork exists, need to learn how to use it :)

We've actually got a meeting with some Patchwork folks today - if
anybody has a burning need they want filled via Patchwork, just say so,
and we'll try to ask.

 - Emily

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 2/17] Hooks in the future
  2020-03-12  3:56 ` [TOPIC 2/17] Hooks in the future James Ramsay
@ 2020-03-12 14:16   ` Emily Shaffer
  2020-03-13 17:56     ` Junio C Hamano
  0 siblings, 1 reply; 125+ messages in thread
From: Emily Shaffer @ 2020-03-12 14:16 UTC (permalink / raw)
  To: James Ramsay; +Cc: git

On Thu, Mar 12, 2020 at 02:56:53PM +1100, James Ramsay wrote:
> 1. Emily: hooks in the config file. Sent a read only patch, but didn’t get
> much traction. Add a new header in the config file, then have prefix number
> so that security scan could be configured at system level that would run
> last, and then hook could also be configured at the project level.
> 
> 2. Peff: Having hooks in the config would be nice. But don’t do it at
> `hooks.prereceive`, but use a subconfig like `hooks.prereceive.command` so
> it’s possible to add options later on.
> 
> 3. Brian: sometimes the need to overridden, ordering works for me. For Git
> LFS it would be helpful to have a pre-push hook for LFS, and a different one
> for something else. Want flexibility about finding and discovering hooks.
> 
> 4. Emily: if you want to specify a hook that is in the work tree, then it
> has to be configured after cloning.
> 
> 5. Jonathan: It’s better to start with something low complexity as long as
> it can be extended/changed later. If there's room to tweak it over time then
> I'm not too worried about the initial version being perfect — we can make
> mistakes and learn from them. A challenge will be how hooks interact.
> Analogy to the challenges of stacked union filesystems and security modules
> in Linux. Analogy to sequence number allocation for unit scripts
> 
> 6. CB: Declare dependencies instead of a sequence number? In theory
> independent hooks can also run in parallel.
> 
> 7. Peff: Maybe that’s something to not worry about from the start. Like, how
> many hooks do you expect to run anyway.
> 
> 8. Christian: At booking.com they use a lot of hooks, and they also sent
> patches to the mailing list to improve that.
> 
> 9. Emily: In-tree hooks?
> 
> 10. Brian: You can do `git cat-file <ref> | sh` to run a hook.
> 
> 11. Brandon: Is it possible to globally to disable all hooks locally? It
> might be a security concern. Or is it something we might want to add?
> 
> 12. Peff: No it’s not.

Thanks for the notes, James.

I came away with the understanding that we want the config hook to look
something like this (barring misunderstanding of config file syntax,
plus or minus naming quibbles):

[hook "/path/to/executable.sh"]
	event = pre-commit

The idea being that by using a subsection, we can extend the format
later much more easily, but by starting simply, we can start using it
and see what we need or don't want. We can use config order to begin
with.

This means that we could do something like this:

[hook "/path/to/executable.sh"]
	event = pre-commit
	order = 123
	mustSucceed = false
	parallelizable = true

etc, etc as needed.
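
As a rough sketch of how a consumer might read that layout (nothing 
here is an existing Git API; split_key() and hooks_by_event() are 
hypothetical, and defaulting a missing "order" to 0 is an assumption), 
keys of the form hook.<path>.<variable> can be split at the first and 
last dot, grouped by event, and sorted by order:

```python
from collections import defaultdict


def split_key(key):
    """Split 'section.subsection.variable'; the subsection may contain dots."""
    section, _, rest = key.partition(".")
    subsection, dot, variable = rest.rpartition(".")
    return section, (subsection if dot else None), variable


def hooks_by_event(entries):
    """Group hook paths by event and sort each group by its 'order' value."""
    events = defaultdict(list)   # path -> events it is registered for
    order = {}                   # path -> numeric order
    for key, value in entries:
        section, path, variable = split_key(key)
        if section != "hook" or path is None:
            continue
        if variable == "event":
            events[path].append(value)
        elif variable == "order":
            order[path] = int(value)
    grouped = defaultdict(list)
    for path, names in events.items():
        for name in names:
            # Assumption: a hook without an explicit order sorts as 0.
            grouped[name].append((order.get(path, 0), path))
    return {name: [p for _, p in sorted(items)] for name, items in grouped.items()}


entries = [
    ("hook./path/to/executable.sh.event", "pre-commit"),
    ("hook./path/to/executable.sh.order", "123"),
    ("hook./etc/git-secrets/git-secrets.event", "pre-commit"),
    ("hook./etc/git-secrets/git-secrets.event", "prepare-commit-msg"),
    ("hook./etc/git-secrets/git-secrets.order", "5"),
]
print(hooks_by_event(entries))
```

Note that in this layout a path carries one order value that applies to 
every event it is registered for.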

But I wonder if we also want to be able to do something like this:

[hook "/etc/git-secrets/git-secrets"]
	event = pre-commit
	event = prepare-commit-msg
	...

I guess the point is that we can choose to allow this, or not. I could
see there being some trouble if you wanted the execution order to work
differently (e.g. run it first for pre-commit but last for
prepare-commit-msg)...

I think, though, that something like
hook.pre-commit."path/to/executable.sh" won't work. It doesn't seem like
multiple subsections are OK in config syntax, as far as I can see. I'd
be interested to know I'm wrong :)

Will try and get some work on this soon, but honestly my hope is to get
bugreport squared away first.

 - Emily

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: Notes from Git Contributor Summit, Los Angeles (April 5, 2020)
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (16 preceding siblings ...)
  2020-03-12  4:16 ` [TOPIC 17/17] Security James Ramsay
@ 2020-03-12 14:38 ` Derrick Stolee
  2020-03-13 20:47 ` Jeff King
  2020-03-15 18:42 ` Jakub Narebski
  19 siblings, 0 replies; 125+ messages in thread
From: Derrick Stolee @ 2020-03-12 14:38 UTC (permalink / raw)
  To: James Ramsay, git

On 3/11/2020 11:55 PM, James Ramsay wrote:
> It was great to see everyone at the Contributor Summit last week, in person and virtually.
> 
> Particular thanks go to Peff for facilitating, and to GitHub for organizing the logistics of the meeting place and food. Thank you!

Thanks for taking excellent notes!

> On the day, the topics below were discussed:
> 
> 1. Ref table (8 votes)
> 2. Hooks in the future (7 votes)
> 3. Obliterate (6 votes)
> 4. Sparse checkout (5 votes)
> 5. Partial Clone (6 votes)
> 6. GC strategies (6 votes)
> 7. Background operations/maintenance (4 votes)
> 8. Push performance (4 votes)
> 9. Obsolescence markers and evolve (4 votes)
> 10. Expel ‘git shell’? (3 votes)
> 11. GPL enforcement (3 votes)
> 12. Test harness improvements (3 votes)
> 13. Cross implementation test suite (3 votes)
> 14. Aspects of merge-ort: cool, or crimes against humanity? (2 votes)
> 15. Reachability checks (2 votes)
> 16. “I want a reviewer” (2 votes)
> 17. Security (2 votes)

Wow, this split into separate emails was a fantastic idea to control the multi-threaded discussion. Kudos!

-Stolee

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 16/17] “I want a reviewer”
  2020-03-12 13:31   ` Emily Shaffer
@ 2020-03-12 17:31     ` Konstantin Ryabitsev
  2020-03-12 17:42       ` Jonathan Nieder
  2020-03-17  0:43     ` Philippe Blain
  1 sibling, 1 reply; 125+ messages in thread
From: Konstantin Ryabitsev @ 2020-03-12 17:31 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: James Ramsay, git

On Thu, Mar 12, 2020 at 06:31:27AM -0700, Emily Shaffer wrote:
> On Thu, Mar 12, 2020 at 03:14:25PM +1100, James Ramsay wrote:
> > 5. Jonathan N: patchwork exists, need to learn how to use it :)
> 
> We've actually got a meeting with some Patchwork folks today - if
> anybody has a burning need they want filled via Patchwork, just say so,
> and we'll try to ask.

Just to highlight this -- a long while ago someone asked me to set up a 
patchwork instance for Git, but I believe they never used it:

https://patchwork.kernel.org/project/git/list/

-K

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 16/17] “I want a reviewer”
  2020-03-12 17:31     ` Konstantin Ryabitsev
@ 2020-03-12 17:42       ` Jonathan Nieder
  2020-03-12 18:00         ` Konstantin Ryabitsev
  0 siblings, 1 reply; 125+ messages in thread
From: Jonathan Nieder @ 2020-03-12 17:42 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Emily Shaffer, James Ramsay, git

Hi!

Konstantin Ryabitsev wrote:
> On Thu, Mar 12, 2020 at 06:31:27AM -0700, Emily Shaffer wrote:

>> We've actually got a meeting with some Patchwork folks today - if
>> anybody has a burning need they want filled via Patchwork, just say so,
>> and we'll try to ask.
>
> Just to highlight this -- a long while ago someone asked me to set up a
> patchwork instance for Git, but I believe they never used it:
>
> https://patchwork.kernel.org/project/git/list/

That was me.  In fact, we are using it, but mostly read-only (similar
to lore patchwork) so far.  I'm hoping we can learn more about how to
automatically close reviews when a patch has landed, assign delegates
to reviews, set up bundles, etc and write some docs so it becomes
useful to more people.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 16/17] “I want a reviewer”
  2020-03-12 17:42       ` Jonathan Nieder
@ 2020-03-12 18:00         ` Konstantin Ryabitsev
  0 siblings, 0 replies; 125+ messages in thread
From: Konstantin Ryabitsev @ 2020-03-12 18:00 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Emily Shaffer, James Ramsay, git

On Thu, Mar 12, 2020 at 10:42:12AM -0700, Jonathan Nieder wrote:
> >> We've actually got a meeting with some Patchwork folks today - if
> >> anybody has a burning need they want filled via Patchwork, just say so,
> >> and we'll try to ask.
> >
> > Just to highlight this -- a long while ago someone asked me to set up a
> > patchwork instance for Git, but I believe they never used it:
> >
> > https://patchwork.kernel.org/project/git/list/
> 
> That was me.  In fact, we are using it, but mostly read-only (similar
> to lore patchwork) so far.  I'm hoping we can learn more about how to
> automatically close reviews when a patch has landed, assign delegates
> to reviews, set up bundles, etc and write some docs so it becomes
> useful to more people.

FYI, I can set it up with git-patchwork-bot, which does some of the 
above. You can read more here:

https://korg.wiki.kernel.org/userdoc/patchwork#adding_patchwork-bot_integration

If that's something you would like to see, please send a request per 
that doc.

-K

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-12  3:57 ` [TOPIC 3/17] Obliterate James Ramsay
@ 2020-03-12 18:06   ` Konstantin Ryabitsev
  2020-03-15 22:19   ` Damien Robert
  1 sibling, 0 replies; 125+ messages in thread
From: Konstantin Ryabitsev @ 2020-03-12 18:06 UTC (permalink / raw)
  To: James Ramsay; +Cc: git

On Thu, Mar 12, 2020 at 02:57:24PM +1100, James Ramsay wrote:
> 8. Jeff H: can we introduce a new type of object -- a "revoked blob" if you
> will that burns the original one but also holds the original SHA in the ODB
> ??
> 
> 9. Peff: what would this mean for signatures? New opportunity to forge
> signatures.

Easy, you just quickly find a collision for that blob's sha1 and put 
that in place of the offending original. ;)

(Fully tongue-in-cheek.)

-K

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 2/17] Hooks in the future
  2020-03-12 14:16   ` Emily Shaffer
@ 2020-03-13 17:56     ` Junio C Hamano
  2020-04-07 23:01       ` Emily Shaffer
  0 siblings, 1 reply; 125+ messages in thread
From: Junio C Hamano @ 2020-03-13 17:56 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: James Ramsay, git

Emily Shaffer <emilyshaffer@google.com> writes:

> This means that we could do something like this:
>
> [hook "/path/to/executable.sh"]
> 	event = pre-commit
> 	order = 123
> 	mustSucceed = false
> 	parallelizable = true
>
> etc, etc as needed.

You can do

    [hook "pre-commit"]
	order = 123
	path = "/path/to/executable.sh"

    [hook "pre-commit"]
	order = 234
	path = "/path/to/another-executable.sh"

as well, and using the second level for what hook the (sub)section
is about, instead of "we have this path that is used for a hook.
What hook is it?", feels (at least to me) more natural.

> But I wonder if we also want to be able to do something like this:
>
> [hook "/etc/git-secrets/git-secrets"]
> 	event = pre-commit
> 	event = prepare-commit-msg

Once you start going this route, it no longer makes sense to give
priority (you called it "order") to a path and have that same number
used in contexts of different hooks.  Your git-secrets script may
want to be called early among pre-commit hooks but late among the
prepare-commit-msg hooks, for example.

> I think, though, that something like
> hook.pre-commit."path/to/executable.sh" won't work.

That is why Peff already suggested in the TOPIC notes to use
"command" in the message you are responding to (I used "path" in the
above description).

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: Notes from Git Contributor Summit, Los Angeles (April 5, 2020)
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (17 preceding siblings ...)
  2020-03-12 14:38 ` Notes from Git Contributor Summit, Los Angeles (April 5, 2020) Derrick Stolee
@ 2020-03-13 20:47 ` Jeff King
  2020-03-15 18:42 ` Jakub Narebski
  19 siblings, 0 replies; 125+ messages in thread
From: Jeff King @ 2020-03-13 20:47 UTC (permalink / raw)
  To: James Ramsay; +Cc: git

On Thu, Mar 12, 2020 at 02:55:21PM +1100, James Ramsay wrote:

> It was great to see everyone at the Contributor Summit last week, in person
> and virtually.
> 
> Particular thanks go to Peff for facilitating, and to GitHub for organizing
> the logistics of the meeting place and food. Thank you!

Thanks very much to you and others who took these notes! It's nice to
have a more permanent record of these discussions.

-Peff

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 16/17] “I want a reviewer”
  2020-03-12  4:14 ` [TOPIC 16/17] “I want a reviewer” James Ramsay
  2020-03-12 13:31   ` Emily Shaffer
@ 2020-03-13 21:25   ` Eric Wong
  2020-03-14 17:27     ` Jeff King
  1 sibling, 1 reply; 125+ messages in thread
From: Eric Wong @ 2020-03-13 21:25 UTC (permalink / raw)
  To: James Ramsay, Jeff King; +Cc: git

James Ramsay <james@jramsay.com.au> wrote:

James: first off, thank you for these accessible summaries for
non-JS users and those who could not attend for various reasons(*)

<snip>

> 6. Peff: this is all possible on the mailing list. I see things that look
> interesting, and have a to do folder. If someone replies, I’ll take it off
> the list. Once a week go through all the items. I like the book club idea,
> instead of it being ad hoc, or by me, a group of a few people review the
> list in the queue. You might want to use a separate tool, like IRC, but it
> would be good to have it bring it back to the mailing list as a summary.
> Public inbox could be better, but someone needs to write it. Maybe nerd
> snipe Eric?

What now? :o

There's a lot of things it could be better at, but a more
concrete idea of what you want would help.

Right now I only have enough resources to do bugfixes along with scalability
and performance improvements so more people can run it and keep
it 100% reproducible and centralization resistant.

I'm also planning on some local tooling along the lines of
notmuch/mairix which is NNTP/HTTPS-aware but not sure when I'll
be able to do that...


(*) I stopped attending events over a decade ago for privacy reasons
    (facial recognition, invasive airport searches, etc.)

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 16/17] “I want a reviewer”
  2020-03-13 21:25   ` Eric Wong
@ 2020-03-14 17:27     ` Jeff King
  2020-03-15  0:36       ` inbox indexing wishlist [was: [TOPIC 16/17] “I want a reviewer”] Eric Wong
  0 siblings, 1 reply; 125+ messages in thread
From: Jeff King @ 2020-03-14 17:27 UTC (permalink / raw)
  To: Eric Wong; +Cc: James Ramsay, git

On Fri, Mar 13, 2020 at 09:25:31PM +0000, Eric Wong wrote:

> > 6. Peff: this is all possible on the mailing list. I see things that look
> > interesting, and have a to do folder. If someone replies, I’ll take it off
> > the list. Once a week go through all the items. I like the book club idea,
> > instead of it being ad hoc, or by me, a group of a few people review the
> > list in the queue. You might want to use a separate tool, like IRC, but it
> > would be good to have it bring it back to the mailing list as a summary.
> > Public inbox could be better, but someone needs to write it. Maybe nerd
> > snipe Eric?
> 
> What now? :o
> 
> There's a lot of things it could be better at, but a more
> concrete idea of what you want would help.

short answer: searching for threads that only one person participated in

The discussion here was around people finding useful things to do on the
list: triaging or fixing bugs, responding to questions, etc. And I said
my mechanism for doing that was to hold interesting-looking but
not-yet-responded-to mails in my git-list inbox, treating it like a todo
list, and then eventually:

  1. I sweep through and spend time on each one.

  2. I see that somebody else responded, and I drop it from my queue.

  3. It ages out and I figure that it must not have been that important
     (I do this less individually, and more by occasionally declaring
     bankruptcy).

That's easy for me because I use mutt, and I basically keep my own list
archive anyway. But it would probably be possible to use an existing
archive and just search for "threads with only one author from the last
7 days". And people could sweep through that[1].

You already allow date-based searches, so it would really just be adding
the "thread has only one author" search. It's conceptually simple, but
it might be hard to index (because of course it may change as messages
are added to the archive, though any updates are bounded to the set of
threads the new messages are in).
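
To make the query concrete, here is a minimal sketch of "threads with
only one author from the last 7 days" over an in-memory list. It is not
how public-inbox indexes anything; Message and single_author_threads
are hypothetical names, and it assumes every message already carries a
thread id.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Message(NamedTuple):
    """Minimal message record (hypothetical, for illustration only)."""
    thread_id: str
    author: str
    date: datetime


def single_author_threads(messages, now, days=7):
    """Thread ids with exactly one distinct author and a message newer than the cutoff."""
    authors = {}  # thread id -> set of authors seen in that thread
    newest = {}   # thread id -> date of the newest message in that thread
    for m in messages:
        authors.setdefault(m.thread_id, set()).add(m.author)
        if m.thread_id not in newest or m.date > newest[m.thread_id]:
            newest[m.thread_id] = m.date
    cutoff = now - timedelta(days=days)
    return sorted(
        t for t, seen in authors.items()
        if len(seen) == 1 and newest[t] >= cutoff
    )


now = datetime(2020, 3, 14, tzinfo=timezone.utc)
messages = [
    Message("t1", "alice", now - timedelta(days=1)),
    Message("t2", "alice", now - timedelta(days=2)),
    Message("t2", "bob", now - timedelta(days=1)),
    Message("t3", "carol", now - timedelta(days=30)),
]
print(single_author_threads(messages, now))
```

This version rescans everything on each call. An index would instead
keep the per-thread author set and update it only for the thread a new
message lands in, which is the bounded update mentioned above.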

But to be clear, I don't think you have any obligation here. I just
wondered if it might be interesting enough that you would implement it
for fun. :) As far as I'm concerned, if you never implemented another
feature for public-inbox, what you've done already has been a great
service to the community.

-Peff

[1] The obvious thing this lacks compared to my workflow is a way to
    mark threads as "seen" or "not interesting". But that implies
    per-user storage.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* inbox indexing wishlist [was: [TOPIC 16/17] “I want a reviewer”]
  2020-03-14 17:27     ` Jeff King
@ 2020-03-15  0:36       ` Eric Wong
  0 siblings, 0 replies; 125+ messages in thread
From: Eric Wong @ 2020-03-15  0:36 UTC (permalink / raw)
  To: Jeff King; +Cc: James Ramsay, git, meta

Jeff King <peff@peff.net> wrote:
> On Fri, Mar 13, 2020 at 09:25:31PM +0000, Eric Wong wrote:
> 
> > > 6. Peff: this is all possible on the mailing list. I see things that look
> > > interesting, and have a to do folder. If someone replies, I’ll take it off
> > > the list. Once a week go through all the items. I like the book club idea,
> > > instead of it being ad hoc, or by me, a group of a few people review the
> > > list in the queue. You might want to use a separate tool, like IRC, but it
> > > would be good to have it bring it back to the mailing list as a summary.
> > > Public inbox could be better, but someone needs to write it. Maybe nerd
> > > snipe Eric?
> > 
> > What now? :o
> > 
> > There's a lot of things it could be better at, but a more
> > concrete idea of what you want would help.
> 
> short answer: searching for threads that only one person participated in

+Cc meta@public-inbox.org

OK, something I've thought of doing anyways in the past...

> The discussion here was around people finding useful things to do on the
> list: triaging or fixing bugs, responding to questions, etc. And I said
> my mechanism for doing that was to hold interesting-looking but
> not-yet-responded-to mails in my git-list inbox, treating it like a todo
> list, and then eventually:
> 
>   1. I sweep through and spend time on each one.
> 
>   2. I see that somebody else responded, and I drop it from my queue.
> 
>   3. It ages out and I figure that it must not have been that important
>      (I do this less individually, and more by occasionally declaring
>      bankruptcy).
> 
> That's easy for me because I use mutt, and I basically keep my own list
> archive anyway. But it would probably be possible to use an existing
> archive and just search for "threads with only one author from the last
> 7 days". And people could sweep through that[1].
> 
> You already allow date-based searches, so it would really just be adding
> the "thread has only one author" search. It's conceptually simple, but
> it might be hard to index (because of course it may change as messages
> are added to the archive, though any updates are bounded to the set of
> threads the new messages are in).

Exactly on being conceptually simple but requiring some deeper
changes to the way indexing works.  I'll have to think about it
a bit, but it should be doable without being too intrusive,
invasive or expensive for existing users.

> But to be clear, I don't think you have any obligation here. I just
> wondered if it might be interesting enough that you would implement it
> for fun. :) As far as I'm concerned, if you never implemented another
> feature for public-inbox, what you've done already has been a great
> service to the community.

Thanks.  I'll keep that index change in mind and it should be
doable if I remain alive and society doesn't collapse...

> [1] The obvious thing this lacks compared to my workflow is a way to
>     mark threads as "seen" or "not interesting". But that implies
>     per-user storage.

Yeah, that would be part of the local tools bit I've been
thinking about (user labels such as "important", "seen",
"replied", "new", "ignore", ... flags).

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: Notes from Git Contributor Summit, Los Angeles (April 5, 2020)
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (18 preceding siblings ...)
  2020-03-13 20:47 ` Jeff King
@ 2020-03-15 18:42 ` Jakub Narebski
  2020-03-16 19:31   ` Jeff King
  19 siblings, 1 reply; 125+ messages in thread
From: Jakub Narebski @ 2020-03-15 18:42 UTC (permalink / raw)
  To: James Ramsay; +Cc: git

"James Ramsay" <james@jramsay.com.au> writes:

> It was great to see everyone at the Contributor Summit last week, in
> person and virtually.
>
> Particular thanks go to Peff for facilitating, and to GitHub for
> organizing the logistics of the meeting place and food. Thank you!
>
> On the day, the topics below were discussed:
>
> 1. Ref table (8 votes)
> 2. Hooks in the future (7 votes)
> 3. Obliterate (6 votes)
> 4. Sparse checkout (5 votes)
> 5. Partial Clone (6 votes)
> 6. GC strategies (6 votes)
> 7. Background operations/maintenance (4 votes)
> 8. Push performance (4 votes)
> 9. Obsolescence markers and evolve (4 votes)
> 10. Expel ‘git shell’? (3 votes)
> 11. GPL enforcement (3 votes)
> 12. Test harness improvements (3 votes)
> 13. Cross implementation test suite (3 votes)
> 14. Aspects of merge-ort: cool, or crimes against humanity? (2 votes)
> 15. Reachability checks (2 votes)
> 16. “I want a reviewer” (2 votes)
> 17. Security (2 votes)

Thank you very much for sending split writeup to the mailing list.

One question to all participating live (in person): how those topics
were proposed, and how they were voted for?  This was done before remote
access was turned on, I think.

Thanks in advance,
-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-12  3:57 ` [TOPIC 3/17] Obliterate James Ramsay
  2020-03-12 18:06   ` Konstantin Ryabitsev
@ 2020-03-15 22:19   ` Damien Robert
  2020-03-16 12:55     ` Konstantin Tokarev
                       ` (3 more replies)
  1 sibling, 4 replies; 125+ messages in thread
From: Damien Robert @ 2020-03-15 22:19 UTC (permalink / raw)
  To: James Ramsay; +Cc: git

From James Ramsay, Thu 12 Mar 2020 at 14:57:24 (+1100) :
> 6. Elijah: replace refs helps, but not supported by hosts like GitHub etc
>     a. Stolee: breaks commit graph because of generation numbers.
>     b. Replace refs for blobs, then special packfile, there were edge cases.

I am interested in more details on how to handle this using replace.

My situation: coworkers push big files by mistake, I don't want to rewrite
history because they are not too well versed with git, but I want to keep
*my* repo clean.

Partial solution:
- identify the large blobs (easy)
- write a replace ref (easy):
  $ git replace b5f74037bb91 $(git hash-object -w -t blob /dev/null)
  and replace the file (if it is still in the repo) by an empty file.

Now the pain points start:
- first the index does not handle replace (I think), so the replaced file
  appears as changed in git status, even though e.g. git diff shows nothing.

=> Solution: configure .git/info/sparse-checkout

- secondly, I want to remove the large blob from my repo.

Ideally I'd like to repack everything but filter this blob, except that
repack does not understand --filter. So I need to use `git pack-objects`
directly and then do the naming and clean up that repack usually does
manually, which is error prone.

Furthermore, while `git pack-objects` accepts --filter, I can only filter on
blob size, not blob oid. (there is filter=sparse:oid where I could reuse my
sparse checkout file, but I would need to make a blob of it first). And if I
have one large file I want to keep, I cannot filter by blob size.

Another solution would be to use `git unpack-objects` to unpack all objects
(except I would need to do that in an empty git dir), remove the blob, and
then repack everything.

Am I missing a simpler solution?

- finally, checking out a ref that includes the replaced (now missing) blob
  gives error messages of the form:
error: invalid object 100644 b5f74037bb91c45606b233b0ad6aad86f8e3875e for 'Silverman-Height-NonTorsion.pdf'

On the one hand it is reassuring that git checks that the real object
(rather than only the replaced object) is still there, on the other hand it
would be nice to ask git to completely forget about the original object
(except fsck of course).

Thanks,
Damien

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-15 22:19   ` Damien Robert
@ 2020-03-16 12:55     ` Konstantin Tokarev
  2020-03-26 22:27       ` Damien Robert
  2020-03-16 16:32     ` Elijah Newren
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 125+ messages in thread
From: Konstantin Tokarev @ 2020-03-16 12:55 UTC (permalink / raw)
  To: Damien Robert, James Ramsay; +Cc: git@vger.kernel.org



16.03.2020, 02:13, "Damien Robert" <damien.olivier.robert@gmail.com>:
> From James Ramsay, Thu 12 Mar 2020 at 14:57:24 (+1100) :
>>  6. Elijah: replace refs helps, but not supported by hosts like GitHub etc
>>      a. Stolee: breaks commit graph because of generation numbers.
>>      b. Replace refs for blobs, then special packfile, there were edge cases.
>
> I am interested in more details on how to handle this using replace.
>
> My situation: coworkers push big files by mistake, I don't want to rewrite
> history because they are not too well versed with git, but I want to keep
> *my* repo clean.

Wouldn't it be better to prevent *them* from such mistakes, e.g. by using
pre-push review system like Gerrit?

-- 
Regards,
Konstantin


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-15 22:19   ` Damien Robert
  2020-03-16 12:55     ` Konstantin Tokarev
@ 2020-03-16 16:32     ` Elijah Newren
  2020-03-26 22:30       ` Damien Robert
  2020-03-16 18:32     ` Phillip Susi
  2020-03-16 20:01     ` Philip Oakley
  3 siblings, 1 reply; 125+ messages in thread
From: Elijah Newren @ 2020-03-16 16:32 UTC (permalink / raw)
  To: Damien Robert; +Cc: James Ramsay, Git Mailing List

On Sun, Mar 15, 2020 at 4:16 PM Damien Robert
<damien.olivier.robert@gmail.com> wrote:
>
> From James Ramsay, Thu 12 Mar 2020 at 14:57:24 (+1100) :
> > 6. Elijah: replace refs helps, but not supported by hosts like GitHub etc
> >     a. Stolee: breaks commit graph because of generation numbers.
> >     b. Replace refs for blobs, then special packfile, there were edge cases.
>
> I am interested in more details on how to handle this using replace.

This comment at the conference was in reference to how people rewrite
history to remove the big blobs, but then run into issues because
there are many places outside of git that reference old commit IDs
(wiki pages, old emails, issues/tickets, etc.) that are now broken.

replace refs can help in that situation, because replace refs can be
used to not only replace existing objects with something else, they
can be used to replace non-existing objects with something else,
essentially setting up an alias for an object.

git filter-repo uses this when it rewrites history to give you a way
to access NEW commit hashes using OLD commit hashes, despite the old
commit hashes not being stored in the repository.  The old commit
hashes are just replace refs that replace non-existing objects (at
least within the newly rewritten repo) that happen to match old commit
hashes and map to the new commit hashes.  Unfortunately this isn't
quite a perfect solution, there are still three known downsides:

  * replace refs cannot be abbreviated, unlike real object ids.  Thus,
if you have an abbreviated old commit hash, git won't recognize it in
such a setup.
  * commit-graph apparently assumes that the existence of replace refs
implies that commit objects in the repo have likely been replaced
(even though that is not the case for this situation), and thus is
disabled when such refs are present.
  * external GUI programs such as GitHub and Gerrit and likely others
do not honor replace refs, instead showing you some form of "Not
Found" error.


As for using replace refs to attempt to alleviate problems without
rewriting history, that's an even bigger can of worms and it doesn't
solve clone/fetch/gc/fsck nor the many other places you highlighted in
your email.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-15 22:19   ` Damien Robert
  2020-03-16 12:55     ` Konstantin Tokarev
  2020-03-16 16:32     ` Elijah Newren
@ 2020-03-16 18:32     ` Phillip Susi
  2020-03-26 22:37       ` Damien Robert
  2020-03-16 20:01     ` Philip Oakley
  3 siblings, 1 reply; 125+ messages in thread
From: Phillip Susi @ 2020-03-16 18:32 UTC (permalink / raw)
  To: Damien Robert; +Cc: James Ramsay, git


Damien Robert writes:

> My situation: coworkers push big files by mistake, I don't want to rewrite
> history because they are not too well versed with git, but I want to keep
> *my* repo clean.
>
> Partial solution:
> - identify the large blobs (easy)
> - write a replace ref (easy):
>   $ git replace b5f74037bb91 $(git hash-object -w -t blob /dev/null)
>   and replace the file (if it is still in the repo) by an empty file.
>
> Now the pain points start:
> - first the index does not handle replace (I think), so the replaced file
>   appears as changed in git status, even though e.g. git diff shows nothing.

Instead of replacing the blob with an empty file, why not replace the
tree that references it with one that does not?  That way you won't have
the file in your checkout at all, and the index won't list it so status
won't show it as changed.


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: Notes from Git Contributor Summit, Los Angeles (April 5, 2020)
  2020-03-15 18:42 ` Jakub Narebski
@ 2020-03-16 19:31   ` Jeff King
  0 siblings, 0 replies; 125+ messages in thread
From: Jeff King @ 2020-03-16 19:31 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: James Ramsay, git

On Sun, Mar 15, 2020 at 07:42:19PM +0100, Jakub Narebski wrote:

> One question to all participating live (in person): how those topics
> were proposed, and how they were voted for?  This was done before remote
> access was turned on, I think.

During breakfast people wrote topics on a whiteboard and people voted on
them by putting a tick-mark on the board. The topics and votes were
transferred to the Google Doc for notes. Next time I think we'll just go
straight to the online doc to save time and make things friendlier for
remote folks (the whiteboard is a holdover from when we didn't have any
remotes). And I'll make it clear that remote people are welcome to add
topics and vote via the doc.

-Peff

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-15 22:19   ` Damien Robert
                       ` (2 preceding siblings ...)
  2020-03-16 18:32     ` Phillip Susi
@ 2020-03-16 20:01     ` Philip Oakley
  2020-05-16  2:21       ` nbelakovski
  3 siblings, 1 reply; 125+ messages in thread
From: Philip Oakley @ 2020-03-16 20:01 UTC (permalink / raw)
  To: Damien Robert, James Ramsay; +Cc: git

Hi Damien, James, (and 4 others who voted for the topic)

I had been thinking about 'missing' blobs for a long while as an earlier
'partial clone' concept (unpublished)

On 15/03/2020 22:19, Damien Robert wrote:
> From James Ramsay, Thu 12 Mar 2020 at 14:57:24 (+1100) :
>> 6. Elijah: replace refs helps, but not supported by hosts like GitHub etc
>>     a. Stolee: breaks commit graph because of generation numbers.
>>     b. Replace refs for blobs, then special packfile, there were edge cases.
> I am interested in more details on how to handle this using replace.
>
> My situation: coworkers push big files by mistake, I don't want to rewrite
> history because they are not too well versed with git, but I want to keep
> *my* repo clean.
>
> Partial solution:
> - identify the large blobs (easy)
> - write a replace ref (easy):
>   $ git replace b5f74037bb91 $(git hash-object -w -t blob /dev/null)
>   and replace the file (if it is still in the repo) by an empty file.

Here, my idea was to create a deliberately malformed blob object that
would allow self reference to say "this blob is deliberately missing".
(i.e. the same content would exist under two oids, one valid, one
invalid) The change would require extra code (more below).

Managing the verification of the replacement is a bigger problem,
especially if already pushed to a server.
> Now the pain points start:
> - first the index does not handle replace (I think), so the replaced file
>   appears as changed in git status, even though e.g. git diff shows nothing.
>
> => Solution: configure .git/info/sparse-checkout
>
> - secondly, I want to remove the large blob from my repo.
>
> Ideally I'd like to repack everything but filter this blob, except that
> repack does not understand --filter. So I need to use `git pack-objects`
> directly and then do the naming and clean up that repack usually does
> manually, which is error prone.
>
> Furthermore, while `git pack-objects` accepts --filter, I can only filter on
> blob size, not blob oid. (there is filter=sparse:oid where I could reuse my
> sparse checkout file, but I would need to make a blob of it first). And if I
> have one large file I want to keep, I cannot filter by blob size.
>
> Another solution would be to use `git unpack-objects` to unpack all objects
> (except I would need to do that in an empty git dir), remove the blob, and
> then repack everything.
>
> Am I missing a simpler solution?
>
> - finally, checking out a ref that includes the replaced (now missing) blob
>   gives error messages of the form:
> error: invalid object 100644 b5f74037bb91c45606b233b0ad6aad86f8e3875e for 'Silverman-Height-NonTorsion.pdf'
>
> On the one hand it is reassuring that git checks that the real object
> (rather than only the replaced object) is still there, on the other hand it
> would be nice to ask git to completely forget about the original object
> (except fsck of course).
>
> Thanks,
> Damien
My notes on the "3. Obliterate" ideas.

1. If the object is in the wild & is dangerous : Stop: Failed: Damage
limitation.
2. If the object is external, but still tame : Seek and recapture;
either treat as internal, or treat as wild [1].
3. The object is in captivity, even if distributed around an enclosure.
Proceed to vaccination [4].
4. Create new blob object with exact content "Git revoke: <oid>" (or
similar) This object includes the embedded object type coding as part of
the object format. This object is/becomes part of the git signature/oid
commit hierarchy. This should (ultimately) be on 'master' branch as it
is the verifier for the obliteration.
5. In the old revoked object <oid>, replace the object content (after
zlib etc) with the same content as created in step 4. This deliberately
_malformed_ object would normally cause fsck to barf. see [6]
6. However here we/fsck would detect the length and prefix of the
(barfed) object contents and so determine its oid (the oid of the
content). This results in an oid equal to that found in 4. which can be
looked up and determined to be a self referral to this obliterated oid,
so an fsck 'pass:obliterated' result is returned. This content could
actually be stored in any removed file if checked out!
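
Steps 4 to 6 can be sketched roughly as below. This is a toy model, not
an fsck implementation: the dict stands in for the object store, the
marker text follows the "Git revoke: <oid>" wording above, and
revocation_content and check are hypothetical names. The only Git fact
relied on is that a SHA-1 blob oid is the hash of "blob <len>\0"
followed by the content.

```python
import hashlib


def blob_oid(content: bytes) -> str:
    """SHA-1 object id Git assigns to a blob holding `content`."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def revocation_content(revoked_oid: str) -> bytes:
    """Marker text from step 4 (the exact wording is an assumption)."""
    return b"Git revoke: " + revoked_oid.encode("ascii")


def check(store, oid):
    """Classify the object stored under `oid` in a toy {oid: content} store."""
    content = store[oid]
    actual = blob_oid(content)
    if actual == oid:
        return "ok"
    # Step 6: the content hashes to another oid; accept it only if that
    # object exists and its text refers back to the oid being checked.
    if actual in store and content == revocation_content(oid):
        return "pass:obliterated"
    return "corrupt"


old = "b5f74037bb91c45606b233b0ad6aad86f8e3875e"
marker = revocation_content(old)
store = {blob_oid(marker): marker, old: marker}
print(check(store, old))
print(check(store, blob_oid(marker)))
```

The same content is thus reachable under two oids, one valid (the step 4
blob) and one deliberately invalid (the revoked oid).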

Consequences:
Packs and other served object contents no longer contain the revoked oid.
Hygiene/vaccination needs to be applied to other distributed recipients
of the former defective object.

Possible attacks: Attacker removes other important commits/blobs/trees
by adding a 'revoke' which propagates to other users: Separate the
hygiene cycle from the initial server revocation.

For trees(?) and commits the message is "revoke <oid> <use-oid>". But
where to 'hold' the commit & tree (maybe require that tree revoke is
treated as a commit revoke, so the new tree is got for free)? We
still need the new commit to be walked by fsck/gc, and the old oid
contents to be gc'd.
For a 'commit' revocation it (the new msg/trees/revision) could maybe be
a 2nd (or third parent after {0}) so a 'normal' walk finds it, but
probably that's just a recipe for disaster.
Maybe a revocation reflog that doesn't expire? or can be rebuilt (fsck
would extend its lost/found to include a revoked list).

The new (XY) problem is now one of tying in the new revoked blob to the
'old' commit/tree hierarchy which only handles tracked files! Maybe it's
a .revoked file (like a .gitignore) which has a list of the old oids and
has actual blobs attached under a .revoked tree.

Also need to make sure that re-packing is done if the blob/tree/commit
was a delta-base at the point of obliteration. Also need to prompt the
local user, just in case it's a spoof! Plus need a way of 'sending' the
revocation (and a flag for what to do about a fetch pack containing a
revocation for which we have the original, esp. if we have it as a pack
that will take a long time to recreate). Need a way of writing the
'defective' object (more code).

Newhash transition. When histories are rewritten, then the obliterated
artefacts are truly removed. For new repos using the newhash then the
revocation mechanism is essentially the same other than extending the
nominal size of the revocation objects.

Perhaps use the 'submodule' commit object type (i.e. 'stuff held
elsewhere') for the holder of the revoked ID (for commits & trees).
This could be locked into the history (details not fully thought
through...).

If there is a design error within Git, it's the lack of an 'after the
fact' redaction mechanism (and how it is spread across branches and
distributed users/servers) - not easy.

Philip

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 16/17] “I want a reviewer”
  2020-03-12 13:31   ` Emily Shaffer
  2020-03-12 17:31     ` Konstantin Ryabitsev
@ 2020-03-17  0:43     ` Philippe Blain
  1 sibling, 0 replies; 125+ messages in thread
From: Philippe Blain @ 2020-03-17  0:43 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: James Ramsay, git, Johannes Schindelin

Hi Emily,

> Le 12 mars 2020 à 09:31, Emily Shaffer <emilyshaffer@google.com> a écrit :
> 
> On Thu, Mar 12, 2020 at 03:14:25PM +1100, James Ramsay wrote:
>> 5. Jonathan N: patchwork exists, need to learn how to use it :)
> 
> We've actually got a meeting with some Patchwork folks today - if
> anybody has a burning need they want filled via Patchwork, just say so,
> and we'll try to ask.

I just read this so I don't know if it's too late, but patchwork does not cope well with how Gitgitgadget uses the same email address for all submissions.
I reported that here:  https://lore.kernel.org/git/75987318-A9A7-4235-8B1D-315B29B644E8@gmail.com/, but haven't opened an issue yet on patchwork's bug tracker.
I'm not sure either if the best course of action is on the GGG or the patchwork side, though, as per Junio's suggestion in the above thread...

Philippe.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Allowing only blob filtering was: [TOPIC 5/17] Partial Clone
  2020-03-12  4:00 ` [TOPIC 5/17] Partial Clone James Ramsay
@ 2020-03-17  7:38   ` Christian Couder
  2020-03-17 20:39     ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Taylor Blau
  0 siblings, 1 reply; 125+ messages in thread
From: Christian Couder @ 2020-03-17  7:38 UTC (permalink / raw)
  To: Taylor Blau, Jeff King; +Cc: git, James Ramsay

Hi Taylor and Peff,

On Thu, Mar 12, 2020 at 5:01 AM James Ramsay <james@jramsay.com.au> wrote:
>
> 1. Stolee: what is the status, who is deploying it, what issues need to
> be handled? Example, downloading tags. Hard to highly recommend it.
>
> 2. Taylor: we deployed it. No activity except for internal testing. Some
> more activity, but no crashes. Have been dragging our feet. Chicken egg,
> can’t deploy it because the client may not work, but hoping to hear
> about problems.
>
> 3. ZJ: dark launched for mission critical repos. Internal questions
> from CI team, not sure about performance. Build farm hitting it hard
> with different filter specs.
>
> 4. Taylor: we have patches we are promising to the list. Blob none and
> limit, for now, but add them incrementally. Bitmap patches are on the
> list

We (GitLab) would be interested in seeing the patches you already have
that only allow blob filtering.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-17  7:38   ` Allowing only blob filtering was: " Christian Couder
@ 2020-03-17 20:39     ` Taylor Blau
  2020-03-17 20:39       ` [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name' Taylor Blau
                         ` (2 more replies)
  0 siblings, 3 replies; 125+ messages in thread
From: Taylor Blau @ 2020-03-17 20:39 UTC (permalink / raw)
  To: git; +Cc: christian.couder, peff, james

Hi Christian,

Of course, I would be happy to send along our patches. They are included
in the series below, and correspond roughly to what we are running at
GitHub. (For us, there have been a few more clean-ups and additional
patches, but I squashed them into 2/2 below).

The approach is roughly that we have:

  - 'uploadpack.filter.allow' -> specifying the default for unspecified
    filter choices, itself defaulting to true in order to maintain
    backwards compatibility, and

  - 'uploadpack.filter.<filter>.allow' -> specifying whether each
    filter kind is allowed. (Originally this was given as 'git
    config uploadpack.filter=blob:none.allow true', but this '=' is
    ambiguous to configuration given over '-c', which itself uses an '='
    to separate keys from values.)

I noted in the second patch that there is the unfortunate possibility of
encountering a SIGPIPE when trying to write the ERR sideband back to a
client who requested a non-supported filter. Peff and I have had some
discussion off-list about resurrecting SZEDER's work which makes room
in the buffer by reading one packet back from the client when the server
encounters a SIGPIPE. It is for this reason that I am marking the series
as 'RFC'.

For reference, our configuration at GitHub looks something like:

  [uploadpack]
    allowAnySHA1InWant = true
    allowFilter = true
  [uploadpack "filter"]
    allow = false
  [uploadpack "filter.blob:limit"]
    allow = true
  [uploadpack "filter.blob:none"]
    allow = true

with a few irrelevant details elided for the purposes of the list :-).
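
To make the lookup order concrete, here is a minimal sketch in C. It is
not the code from the series: 'struct filter_rule' and
'filter_allowed()' are hypothetical names, and a plain array stands in
for whatever table upload-pack keeps. An explicit per-filter entry wins;
anything else falls back to the value of 'uploadpack.filter.allow'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one 'uploadpack.filter.<filter>.allow' entry. */
struct filter_rule {
	const char *kind;	/* e.g. "blob:none" */
	int allow;		/* 1 = allowed, 0 = banned */
};

/*
 * Return the explicit setting for 'kind' if there is one, otherwise
 * 'fallback' (the value of 'uploadpack.filter.allow').
 */
int filter_allowed(const struct filter_rule *rules, size_t nr,
		   int fallback, const char *kind)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(rules[i].kind, kind))
			return rules[i].allow;
	return fallback;
}
```

With rules mirroring the configuration above (blob:none and blob:limit
allowed, fallback false), a request for "tree:depth" would be rejected
while "blob:none" would be accepted.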

I'd be happy to take in any comments that you or others might have
before dropping the 'RFC' status.

Taylor Blau (2):
  list_objects_filter_options: introduce 'list_object_filter_config_name'
  upload-pack.c: allow banning certain object filter(s)

 Documentation/config/uploadpack.txt | 12 ++++++
 list-objects-filter-options.c       | 25 +++++++++++
 list-objects-filter-options.h       |  6 +++
 t/t5616-partial-clone.sh            | 23 ++++++++++
 upload-pack.c                       | 67 +++++++++++++++++++++++++++++
 5 files changed, 133 insertions(+)

--
2.26.0.rc2.2.g888d9484cf


* [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name'
  2020-03-17 20:39     ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Taylor Blau
@ 2020-03-17 20:39       ` Taylor Blau
  2020-03-17 20:53         ` Eric Sunshine
  2020-03-17 20:39       ` [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s) Taylor Blau
  2020-03-18 10:18       ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Jeff King
  2 siblings, 1 reply; 125+ messages in thread
From: Taylor Blau @ 2020-03-17 20:39 UTC (permalink / raw)
  To: git; +Cc: christian.couder, peff, james

In a subsequent commit, we will add configuration options that are
specific to each kind of object filter, in which case it is handy to
have a function that translates between 'enum
list_objects_filter_choice' and an appropriate configuration-friendly
string.
---
 list-objects-filter-options.c | 25 +++++++++++++++++++++++++
 list-objects-filter-options.h |  6 ++++++
 2 files changed, 31 insertions(+)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 256bcfbdfe..6b6aa0b3ec 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -15,6 +15,31 @@ static int parse_combine_filter(
 	const char *arg,
 	struct strbuf *errbuf);
 
+const char *list_object_filter_config_name(enum list_objects_filter_choice c)
+{
+	switch (c) {
+	case LOFC_BLOB_NONE:
+		return "blob:none";
+	case LOFC_BLOB_LIMIT:
+		return "blob:limit";
+	case LOFC_TREE_DEPTH:
+		return "tree:depth";
+	case LOFC_SPARSE_OID:
+		return "sparse:oid";
+	case LOFC_COMBINE:
+		return "combine";
+	case LOFC_DISABLED:
+	case LOFC__COUNT:
+		/*
+		 * Include these to catch all enumerated values, but
+		 * break to treat them as a bug. Any new values of this
+		 * enum will cause a compiler error, as desired.
+		 */
+		break;
+	}
+	BUG("list_object_filter_config_name: invalid argument '%d'", c);
+}
+
 /*
  * Parse value of the argument to the "filter" keyword.
  * On the command line this looks like:
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 2ffb39222c..e5259e4ac6 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -17,6 +17,12 @@ enum list_objects_filter_choice {
 	LOFC__COUNT /* must be last */
 };
 
+/*
+ * Returns a configuration key suitable for describing the given object filter,
+ * e.g.: "blob:none", "combine", etc.
+ */
+const char *list_object_filter_config_name(enum list_objects_filter_choice c);
+
 struct list_objects_filter_options {
 	/*
 	 * 'filter_spec' is the raw argument value given on the command line
-- 
2.26.0.rc2.2.g888d9484cf



* [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s)
  2020-03-17 20:39     ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Taylor Blau
  2020-03-17 20:39       ` [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name' Taylor Blau
@ 2020-03-17 20:39       ` Taylor Blau
  2020-03-17 21:11         ` Eric Sunshine
  2020-03-18 11:18         ` Philip Oakley
  2020-03-18 10:18       ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Jeff King
  2 siblings, 2 replies; 125+ messages in thread
From: Taylor Blau @ 2020-03-17 20:39 UTC (permalink / raw)
  To: git; +Cc: christian.couder, peff, james

Git clients may ask the server for a partial set of objects, where the
set of objects being requested is refined by one or more object filters.
Server administrators can configure 'git upload-pack' to allow or ban
these filters by setting the 'uploadpack.allowFilter' variable to
'true' or 'false', respectively.

However, administrators using bitmaps may wish to allow certain kinds of
object filters, but ban others. Specifically, they may wish to allow
object filters that can be optimized by the use of bitmaps, while
rejecting other object filters which aren't and represent a perceived
performance degradation (as well as an increased load factor on the
server).

Allow configuring 'git upload-pack' to support object filters on a
case-by-case basis by introducing a new configuration variable and
section:

  - 'uploadpack.filter.allow'

  - 'uploadpack.filter.<kind>.allow'

where '<kind>' may be one of 'blob:none', 'blob:limit', 'tree:depth',
and so on. The additional '.' between 'filter' and '<kind>' is part of
the sub-section.

Setting the second configuration variable for any valid value of
'<kind>' explicitly allows or disallows restricting that kind of object
filter.

If a client requests the object filter <kind> and the respective
configuration value is not set, 'git upload-pack' will default to the
value of 'uploadpack.filter.allow', which itself defaults to 'true' to
maintain backwards compatibility. Note that this differs from
'uploadpack.allowfilter', which controls whether or not the 'filter'
capability is advertised.

NB: this introduces an unfortunate possibility that attempt to write the
ERR sideband will cause a SIGPIPE. This can be prevented by some of
SZEDZER's previous work, but it is silenced in 't' for now.
---
 Documentation/config/uploadpack.txt | 12 ++++++
 t/t5616-partial-clone.sh            | 23 ++++++++++
 upload-pack.c                       | 67 +++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+)

diff --git a/Documentation/config/uploadpack.txt b/Documentation/config/uploadpack.txt
index ed1c835695..6213bd619c 100644
--- a/Documentation/config/uploadpack.txt
+++ b/Documentation/config/uploadpack.txt
@@ -57,6 +57,18 @@ uploadpack.allowFilter::
 	If this option is set, `upload-pack` will support partial
 	clone and partial fetch object filtering.
 
+uploadpack.filter.allow::
+	Provides a default value for unspecified object filters (see: the
+	below configuration variable).
+	Defaults to `true`.
+
+uploadpack.filter.<filter>.allow::
+	Explicitly allow or ban the object filter corresponding to `<filter>`,
+	where `<filter>` may be one of: `blob:none`, `blob:limit`, `tree:depth`,
+	`sparse:oid`, or `combine`. If using combined filters, both `combine`
+	and all of the nested filter kinds must be allowed.
+	Defaults to `uploadpack.filter.allow`.
+
 uploadpack.allowRefInWant::
 	If this option is set, `upload-pack` will support the `ref-in-want`
 	feature of the protocol version 2 `fetch` command.  This feature
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 77bb91e976..ee1af9b682 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -235,6 +235,29 @@ test_expect_success 'implicitly construct combine: filter with repeated flags' '
 	test_cmp unique_types.expected unique_types.actual
 '
 
+test_expect_success 'upload-pack fails banned object filters' '
+	# Ensure that configuration keys are normalized by capitalizing
+	# "blob:None" below:
+	test_config -C srv.bare uploadpack.filter.blob:None.allow false &&
+	test_must_fail ok=sigpipe git clone --no-checkout --filter=blob:none \
+		"file://$(pwd)/srv.bare" pc3
+'
+
+test_expect_success 'upload-pack fails banned combine object filters' '
+	test_config -C srv.bare uploadpack.filter.allow false &&
+	test_config -C srv.bare uploadpack.filter.combine.allow true &&
+	test_config -C srv.bare uploadpack.filter.tree:depth.allow true &&
+	test_config -C srv.bare uploadpack.filter.blob:none.allow false &&
+	test_must_fail ok=sigpipe git clone --no-checkout --filter=tree:1 \
+		--filter=blob:none "file://$(pwd)/srv.bare" pc3
+'
+
+test_expect_success 'upload-pack fails banned object filters with fallback' '
+	test_config -C srv.bare uploadpack.filter.allow false &&
+	test_must_fail ok=sigpipe git clone --no-checkout --filter=blob:none \
+		"file://$(pwd)/srv.bare" pc3
+'
+
 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
 	rm -rf src dst &&
 	git init src &&
diff --git a/upload-pack.c b/upload-pack.c
index c53249cac1..81f2701f99 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -69,6 +69,8 @@ static int filter_capability_requested;
 static int allow_filter;
 static int allow_ref_in_want;
 static struct list_objects_filter_options filter_options;
+static struct string_list allowed_filters = STRING_LIST_INIT_DUP;
+static int allow_filter_fallback = 1;
 
 static int allow_sideband_all;
 
@@ -848,6 +850,45 @@ static int process_deepen_not(const char *line, struct string_list *deepen_not,
 	return 0;
 }
 
+static int allows_filter_choice(enum list_objects_filter_choice c)
+{
+	const char *key = list_object_filter_config_name(c);
+	struct string_list_item *item = string_list_lookup(&allowed_filters,
+							   key);
+	if (item)
+		return (intptr_t) item->util;
+	return allow_filter_fallback;
+}
+
+static struct list_objects_filter_options *banned_filter(
+	struct list_objects_filter_options *opts)
+{
+	size_t i;
+
+	if (!allows_filter_choice(opts->choice))
+		return opts;
+
+	if (opts->choice == LOFC_COMBINE)
+		for (i = 0; i < opts->sub_nr; i++) {
+			struct list_objects_filter_options *sub = &opts->sub[i];
+			if (banned_filter(sub))
+				return sub;
+		}
+	return NULL;
+}
+
+static void die_if_using_banned_filter(struct packet_writer *w,
+				       struct list_objects_filter_options *opts)
+{
+	struct list_objects_filter_options *banned = banned_filter(opts);
+	if (!banned)
+		return;
+
+	packet_writer_error(w, _("filter '%s' not supported\n"),
+			    list_object_filter_config_name(banned->choice));
+	die(_("git upload-pack: banned object filter requested"));
+}
+
 static void receive_needs(struct packet_reader *reader, struct object_array *want_obj)
 {
 	struct object_array shallows = OBJECT_ARRAY_INIT;
@@ -885,6 +926,7 @@ static void receive_needs(struct packet_reader *reader, struct object_array *wan
 				die("git upload-pack: filtering capability not negotiated");
 			list_objects_filter_die_if_populated(&filter_options);
 			parse_list_objects_filter(&filter_options, arg);
+			die_if_using_banned_filter(&writer, &filter_options);
 			continue;
 		}
 
@@ -1044,6 +1086,9 @@ static int find_symref(const char *refname, const struct object_id *oid,
 
 static int upload_pack_config(const char *var, const char *value, void *unused)
 {
+	const char *sub, *key;
+	int sub_len;
+
 	if (!strcmp("uploadpack.allowtipsha1inwant", var)) {
 		if (git_config_bool(var, value))
 			allow_unadvertised_object_request |= ALLOW_TIP_SHA1;
@@ -1065,6 +1110,26 @@ static int upload_pack_config(const char *var, const char *value, void *unused)
 			keepalive = -1;
 	} else if (!strcmp("uploadpack.allowfilter", var)) {
 		allow_filter = git_config_bool(var, value);
+	} else if (!parse_config_key(var, "uploadpack", &sub, &sub_len, &key) &&
+		   key && !strcmp(key, "allow")) {
+		if (sub && skip_prefix(sub, "filter.", &sub) && sub_len >= 7) {
+			struct string_list_item *item;
+			char *spec;
+
+			/*
+			 * normalize the filter, and chomp off '.allow' from the
+			 * end
+			 */
+			spec = xstrdup_tolower(sub);
+			spec[sub_len - 7] = 0;
+
+			item = string_list_insert(&allowed_filters, spec);
+			item->util = (void *) (intptr_t) git_config_bool(var, value);
+
+			free(spec);
+		} else if (!strcmp("uploadpack.filter.allow", var)) {
+			allow_filter_fallback = git_config_bool(var, value);
+		}
 	} else if (!strcmp("uploadpack.allowrefinwant", var)) {
 		allow_ref_in_want = git_config_bool(var, value);
 	} else if (!strcmp("uploadpack.allowsidebandall", var)) {
@@ -1308,6 +1373,8 @@ static void process_args(struct packet_reader *request,
 		if (allow_filter && skip_prefix(arg, "filter ", &p)) {
 			list_objects_filter_die_if_populated(&filter_options);
 			parse_list_objects_filter(&filter_options, p);
+			die_if_using_banned_filter(&data->writer,
+						   &filter_options);
 			continue;
 		}
 
-- 
2.26.0.rc2.2.g888d9484cf


* Re: [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name'
  2020-03-17 20:39       ` [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name' Taylor Blau
@ 2020-03-17 20:53         ` Eric Sunshine
  2020-03-18 10:03           ` Jeff King
  2020-03-18 21:05           ` Taylor Blau
  0 siblings, 2 replies; 125+ messages in thread
From: Eric Sunshine @ 2020-03-17 20:53 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Git List, Christian Couder, Jeff King, james

On Tue, Mar 17, 2020 at 4:40 PM Taylor Blau <me@ttaylorr.com> wrote:
> In a subsequent commit, we will add configuration options that are
> specific to each kind of object filter, in which case it is handy to
> have a function that translates between 'enum
> list_objects_filter_choice' and an appropriate configuration-friendly
> string.
> ---

Missing sign-off (but perhaps that's intentional since this is RFC).

> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> @@ -15,6 +15,31 @@ static int parse_combine_filter(
> +const char *list_object_filter_config_name(enum list_objects_filter_choice c)
> +{
> +       switch (c) {
> +       case LOFC_BLOB_NONE:
> +               return "blob:none";
> +       case LOFC_BLOB_LIMIT:
> +               return "blob:limit";
> +       case LOFC_TREE_DEPTH:
> +               return "tree:depth";
> +       case LOFC_SPARSE_OID:
> +               return "sparse:oid";
> +       case LOFC_COMBINE:
> +               return "combine";
> +       case LOFC_DISABLED:
> +       case LOFC__COUNT:
> +               /*
> +                * Include these to catch all enumerated values, but
> +                * break to treat them as a bug. Any new values of this
> +                * enum will cause a compiler error, as desired.
> +                */

In general, people will see a warning, not an error, unless they
specifically use -Werror (or such) to turn the warning into an error,
so this statement is misleading. Also, while some compilers may
complain, others may not. So, although the comment claims that we will
notice an unhandled enum constant at compile-time, that isn't
necessarily the case.

Moreover, the comment itself, in its present form, is rather
superfluous since it's merely repeating what the BUG() invocation just
below it already tells me. In fact, as a reader of this code, I would
be more interested in knowing why those two cases do not have string
equivalents which are returned (although perhaps even that would be
obvious to someone familiar with the code, hence the comment can
probably be dropped altogether).

> +               break;
> +       }
> +       BUG("list_object_filter_choice_name: invalid argument '%d'", c);
> +}


* Re: [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s)
  2020-03-17 20:39       ` [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s) Taylor Blau
@ 2020-03-17 21:11         ` Eric Sunshine
  2020-03-18 21:18           ` Taylor Blau
  2020-03-18 11:18         ` Philip Oakley
  1 sibling, 1 reply; 125+ messages in thread
From: Eric Sunshine @ 2020-03-17 21:11 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Git List, Christian Couder, Jeff King, james

On Tue, Mar 17, 2020 at 4:40 PM Taylor Blau <me@ttaylorr.com> wrote:
> NB: this introduces an unfortunate possibility that attempt to write the
> ERR sideband will cause a SIGPIPE. This can be prevented by some of
> SZEDZER's previous work, but it is silenced in 't' for now.

s/SZEDZER/SZEDER/

> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> @@ -235,6 +235,29 @@ test_expect_success 'implicitly construct combine: filter with repeated flags' '
> +test_expect_success 'upload-pack fails banned object filters' '
> +       # Ensure that configuration keys are normalized by capitalizing
> +       # "blob:None" below:
> +       test_config -C srv.bare uploadpack.filter.blob:None.allow false &&

I found the wording of the comment more confusing than clarifying.
Perhaps rewriting it like this could help:

    Test case-insensitivity by intentional use of "blob:None" rather than
    "blob:none".

or something.

> +       test_must_fail ok=sigpipe git clone --no-checkout --filter.blob:none \
> +               "file://$(pwd)/srv.bare" pc3
> +'


* Re: [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name'
  2020-03-17 20:53         ` Eric Sunshine
@ 2020-03-18 10:03           ` Jeff King
  2020-03-18 19:40             ` Junio C Hamano
  2020-03-18 22:38             ` Eric Sunshine
  2020-03-18 21:05           ` Taylor Blau
  1 sibling, 2 replies; 125+ messages in thread
From: Jeff King @ 2020-03-18 10:03 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Taylor Blau, Git List, Christian Couder, james

On Tue, Mar 17, 2020 at 04:53:44PM -0400, Eric Sunshine wrote:

> > +       case LOFC_DISABLED:
> > +       case LOFC__COUNT:
> > +               /*
> > +                * Include these to catch all enumerated values, but
> > +                * break to treat them as a bug. Any new values of this
> > +                * enum will cause a compiler error, as desired.
> > +                */
> 
> In general, people will see a warning, not an error, unless they
> specifically use -Werror (or such) to turn the warning into an error,
> so this statement is misleading. Also, while some compilers may
> complain, others may not. So, although the comment claims that we will
> notice an unhandled enum constant at compile-time, that isn't
> necessarily the case.

Yes, but that's the best we can do, isn't it?

There's sort of a meta-issue here which Taylor and I discussed off-list
and which led to this comment. We quite often write switch statements
over enums like this:

  switch (foo) {
  case FOO_ONE:
	...do something...
  case FOO_TWO:
	...something else...
  default:
	BUG("I don't know what to do with %d", foo);
  }

That's reasonable and does the right thing at runtime if we ever hit
this case. But it has the unfortunate side effect that we lose any
-Wswitch warning that could tell us at compile time that we're missing a
case. Not everybody would see such a warning, as you note, but
developers on gcc and clang generally would (it's part of -Wall).

But we can't just remove the default case. Even though enums don't
generally take on other values, it's legal for them to do so. So we do
want to make sure we BUG() in that instance.

This is awkward to solve in the general case[1]. But because we're
returning in each case arm here, it's easy to just put the BUG() after
the switch. Anything that didn't return is unhandled, and we get the
best of both: -Wswitch warnings when we need to add a new filter type,
and a BUG() in the off chance that we see an unexpected value.

But the cost is that we have to enumerate the set of values that are
defined but not handled here (LOFC__COUNT, for instance, isn't a real
enum value but rather a placeholder to let other code know how many
filter types there are).
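
A self-contained sketch of that pattern, using a made-up 'enum color'
rather than the real filter enum, and with fprintf() plus abort()
standing in for Git's BUG(): every real value returns from its case
arm, the placeholder is listed explicitly so that -Wswitch stays quiet,
and anything that falls out of the switch is treated as a bug.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical enum, shaped like 'enum list_objects_filter_choice'. */
enum color {
	COLOR_RED,
	COLOR_GREEN,
	COLOR__COUNT /* placeholder, must be last */
};

const char *color_name(enum color c)
{
	/* No 'default:' arm, so -Wswitch can flag a missing enumerator. */
	switch (c) {
	case COLOR_RED:
		return "red";
	case COLOR_GREEN:
		return "green";
	case COLOR__COUNT:
		/* defined but not a real value: fall out to the abort */
		break;
	}
	/* Reached only for unhandled or out-of-range values. */
	fprintf(stderr, "BUG: unexpected color %d\n", (int)c);
	abort();
}
```

Adding a COLOR_BLUE enumerator without a matching case arm should then
draw a -Wswitch warning from gcc or clang under -Wall, while a stray
value at runtime still aborts instead of being silently ignored.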

So...I dunno. Worth it as a general technique?

-Peff

[1] In the general case where you don't return, you have to somehow know
    whether the value was actually handled or not (and BUG() if it
    wasn't). Presumably by keeping a separate flag variable, which is
    pretty ugly. -Wswitch-enum is supposed to deal with this by
    requiring that you list all of the values even if you have a default
    case. But it triggers in a lot of other places in the code that I
    think would be made much harder to read by having to list out the
    enumerated possibilities.


* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-17 20:39     ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Taylor Blau
  2020-03-17 20:39       ` [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name' Taylor Blau
  2020-03-17 20:39       ` [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s) Taylor Blau
@ 2020-03-18 10:18       ` Jeff King
  2020-03-18 18:26         ` Re*: " Junio C Hamano
                           ` (2 more replies)
  2 siblings, 3 replies; 125+ messages in thread
From: Jeff King @ 2020-03-18 10:18 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, christian.couder, james

On Tue, Mar 17, 2020 at 02:39:05PM -0600, Taylor Blau wrote:

> Hi Christian,
> 
> Of course, I would be happy to send along our patches. They are included
> in the series below, and correspond roughly to what we are running at
> GitHub. (For us, there have been a few more clean-ups and additional
> patches, but I squashed them into 2/2 below).
> 
> The approach is roughly that we have:
> 
>   - 'uploadpack.filter.allow' -> specifying the default for unspecified
>     filter choices, itself defaulting to true in order to maintain
>     backwards compatibility, and
> 
>   - 'uploadpack.filter.<filter>.allow' -> specifying whether or not each
>     filter kind is allowed or not. (Originally this was given as 'git
>     config uploadpack.filter=blob:none.allow true', but this '=' is
>     ambiguous to configuration given over '-c', which itself uses an '='
>     to separate keys from values.)

One thing that's a little ugly here is the embedded dot in the
subsection (i.e., "filter.<filter>"). It makes it look like a four-level
key, but really there is no such thing in Git.  But everything else we
tried was even uglier.
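
For illustration, the split can be sketched as below. This is a sketch
under assumptions rather than Git's parse_config_key(): 'struct
key_parts' and 'split_config_key()' are made-up names. The section ends
at the first dot and the variable name starts after the last dot, so
everything in between, embedded dots included, is the subsection.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for the sketch. */
struct key_parts {
	const char *subsection;	/* NULL when there is none */
	size_t subsection_len;	/* subsection is not NUL-terminated */
	const char *name;	/* text after the last '.' */
};

/* Returns 0 on success, -1 if 'var' contains no '.' at all. */
int split_config_key(const char *var, struct key_parts *out)
{
	const char *first = strchr(var, '.');
	const char *last = strrchr(var, '.');

	if (!first)
		return -1;
	out->name = last + 1;
	if (first == last) {
		/* two-level key such as "uploadpack.allowfilter" */
		out->subsection = NULL;
		out->subsection_len = 0;
	} else {
		out->subsection = first + 1;
		out->subsection_len = (size_t)(last - first - 1);
	}
	return 0;
}
```

Fed "uploadpack.filter.blob:none.allow", this yields the 16-byte
subsection "filter.blob:none" and the name "allow", which is why the
key only looks like it has four levels.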

I think we want to declare a real subsection for each filter and not
just "uploadpack.filter.<filter>". That gives us room to expand to other
config options besides "allow" later on if we need to.

We don't want to claim "uploadpack.allow" and "uploadpack.<filter>.allow";
that's too generic.

Likewise "filter.allow" is too generic.

We could do "uploadpackfilter.allow" and "uploadpackfilter.<filter>.allow",
but that's both ugly _and_ separates these options from the rest of
uploadpack.*.

We could use a character besides ".", which would reduce confusion. But
what? Using colon is kind of ugly, because it's already syntactically
significant in filter names, and you get:

  uploadpack.filter:blob:none.allow

We tried equals, like:

  uploadpack.filter=blob:none.allow

but there's an interesting side effect. Doing:

  git -c uploadpack.filter=blob:none.allow=true upload-pack ...

doesn't work, because the "-c" parser ends the key at the first "=". As
it should, because otherwise we'd get confused by an "=" in a value.
This is a failing of the "-c" syntax; it can't represent values with
"=". Fixing it would be awkward, and I've never seen it come up in
practice outside of this (you _could_ have a branch with a funny name
and try to do "git -c branch.my=funny=branch.remote=origin" or
something, but the lack of bug reports suggests nobody is that
masochistic).
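
A minimal sketch of that behavior, assuming nothing about the real
parser beyond "split at the first '='" ('split_at_first_equals()' is a
hypothetical helper, not a Git function):

```c
#include <stddef.h>
#include <string.h>

/*
 * Split "key=value" at the first '='. Stores the key length in
 * '*key_len' and returns a pointer to the value, or NULL if 'arg'
 * contains no '=' at all.
 */
const char *split_at_first_equals(const char *arg, size_t *key_len)
{
	const char *eq = strchr(arg, '=');

	if (!eq) {
		*key_len = strlen(arg);
		return NULL;
	}
	*key_len = (size_t)(eq - arg);
	return eq + 1;
}
```

Given "uploadpack.filter=blob:none.allow=true", the key comes out as
"uploadpack.filter" and the value as "blob:none.allow=true", which is
not what the caller meant.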

So...maybe the extra dot is the least bad thing?

> I noted in the second patch that there is the unfortunate possibility of
> encountering a SIGPIPE when trying to write the ERR sideband back to a
> client who requested a non-supported filter. Peff and I have had some
> discussion off-list about resurrecting SZEDZER's work which makes room
> in the buffer by reading one packet back from the client when the server
> encounters a SIGPIPE. It is for this reason that I am marking the series
> as 'RFC'.

For reference, the patch I was thinking of was this:

  https://lore.kernel.org/git/20190830121005.GI8571@szeder.dev/

-Peff


* Re: [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s)
  2020-03-17 20:39       ` [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s) Taylor Blau
  2020-03-17 21:11         ` Eric Sunshine
@ 2020-03-18 11:18         ` Philip Oakley
  2020-03-18 21:20           ` Taylor Blau
  1 sibling, 1 reply; 125+ messages in thread
From: Philip Oakley @ 2020-03-18 11:18 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: christian.couder, peff, james

Hi
On 17/03/2020 20:39, Taylor Blau wrote:
> Git clients may ask the server for a partial set of objects, where the
> set of objects being requested is refined by one or more object filters.
> Server administrators can configure 'git upload-pack' to allow or ban
> these filters by setting the 'uploadpack.allowFilter' variable to
> 'true' or 'false', respectively.
>
> However, administrators using bitmaps may wish to allow certain kinds of
> object filters, but ban others. Specifically, they may wish to allow
> object filters that can be optimized by the use of bitmaps, while
> rejecting other object filters which aren't and represent a perceived
> performance degradation (as well as an increased load factor on the
> server).
>
> Allow configuring 'git upload-pack' to support object filters on a
> case-by-case basis by introducing a new configuration variable and
> section:
>
>   - 'uploadpack.filter.allow'
>
>   - 'uploadpack.filter.<kind>.allow'
>
> where '<kind>' may be one of 'blob:none', 'blob:limit', 'tree:depth',
> and so on. The additional '.' between 'filter' and '<kind>' is part of
> the sub-section.
>
> Setting the second configuration variable for any valid value of
> '<kind>' explicitly allows or disallows restricting that kind of object
> filter.
>
> If a client requests the object filter <kind> and the respective
> configuration value is not set, 'git upload-pack' will default to the
> value of 'uploadpack.filter.allow', which itself defaults to 'true' to
> maintain backwards compatibility. Note that this differs from
> 'uploadpack.allowfilter', which controls whether or not the 'filter'
> capability is advertised.
>
> NB: this introduces an unfortunate possibility that attempt to write the
> ERR sideband will cause a SIGPIPE. This can be prevented by some of
> SZEDZER's previous work, but it is silenced in 't' for now.
> ---
>  Documentation/config/uploadpack.txt | 12 ++++++
>  t/t5616-partial-clone.sh            | 23 ++++++++++
>  upload-pack.c                       | 67 +++++++++++++++++++++++++++++
>  3 files changed, 102 insertions(+)
>
> diff --git a/Documentation/config/uploadpack.txt b/Documentation/config/uploadpack.txt
> index ed1c835695..6213bd619c 100644
> --- a/Documentation/config/uploadpack.txt
> +++ b/Documentation/config/uploadpack.txt
> @@ -57,6 +57,18 @@ uploadpack.allowFilter::
>  	If this option is set, `upload-pack` will support partial
>  	clone and partial fetch object filtering.
>  
> +uploadpack.filter.allow::
> +	Provides a default value for unspecified object filters (see: the
> +	below configuration variable).
> +	Defaults to `true`.
> +
> +uploadpack.filter.<filter>.allow::
> +	Explicitly allow or ban the object filter corresponding to `<filter>`,
> +	where `<filter>` may be one of: `blob:none`, `blob:limit`, `tree:depth`,
> +	`sparse:oid`, or `combine`. If using combined filters, both `combine`
> +	and all of the nested filter kinds must be allowed.

Doesn't the man page at least need the part from the commit message, "The
additional '.' between 'filter' and '<kind>' is part of
the sub-section.", as it's not a common mechanism (other comments
notwithstanding)?

Philip
> +	Defaults to `uploadpack.filter.allow`.
> +
>  uploadpack.allowRefInWant::
>  	If this option is set, `upload-pack` will support the `ref-in-want`
>  	feature of the protocol version 2 `fetch` command.  This feature
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index 77bb91e976..ee1af9b682 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -235,6 +235,29 @@ test_expect_success 'implicitly construct combine: filter with repeated flags' '
>  	test_cmp unique_types.expected unique_types.actual
>  '
>  
> +test_expect_success 'upload-pack fails banned object filters' '
> +	# Ensure that configuration keys are normalized by capitalizing
> +	# "blob:None" below:
> +	test_config -C srv.bare uploadpack.filter.blob:None.allow false &&
> +	test_must_fail ok=sigpipe git clone --no-checkout --filter.blob:none \
> +		"file://$(pwd)/srv.bare" pc3
> +'
> +
> +test_expect_success 'upload-pack fails banned combine object filters' '
> +	test_config -C srv.bare uploadpack.filter.allow false &&
> +	test_config -C srv.bare uploadpack.filter.combine.allow true &&
> +	test_config -C srv.bare uploadpack.filter.tree:depth.allow true &&
> +	test_config -C srv.bare uploadpack.filter.blob:none.allow false &&
> +	test_must_fail ok=sigpipe git clone --no-checkout --filter=tree:1 \
> +		--filter=blob:none "file://$(pwd)/srv.bare" pc3
> +'
> +
> +test_expect_success 'upload-pack fails banned object filters with fallback' '
> +	test_config -C srv.bare uploadpack.filter.allow false &&
> +	test_must_fail ok=sigpipe git clone --no-checkout --filter=blob:none \
> +		"file://$(pwd)/srv.bare" pc3
> +'
> +
>  test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
>  	rm -rf src dst &&
>  	git init src &&
> diff --git a/upload-pack.c b/upload-pack.c
> index c53249cac1..81f2701f99 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -69,6 +69,8 @@ static int filter_capability_requested;
>  static int allow_filter;
>  static int allow_ref_in_want;
>  static struct list_objects_filter_options filter_options;
> +static struct string_list allowed_filters = STRING_LIST_INIT_DUP;
> +static int allow_filter_fallback = 1;
>  
>  static int allow_sideband_all;
>  
> @@ -848,6 +850,45 @@ static int process_deepen_not(const char *line, struct string_list *deepen_not,
>  	return 0;
>  }
>  
> +static int allows_filter_choice(enum list_objects_filter_choice c)
> +{
> +	const char *key = list_object_filter_config_name(c);
> +	struct string_list_item *item = string_list_lookup(&allowed_filters,
> +							   key);
> +	if (item)
> +		return (intptr_t) item->util;
> +	return allow_filter_fallback;
> +}
> +
> +static struct list_objects_filter_options *banned_filter(
> +	struct list_objects_filter_options *opts)
> +{
> +	size_t i;
> +
> +	if (!allows_filter_choice(opts->choice))
> +		return opts;
> +
> +	if (opts->choice == LOFC_COMBINE)
> +		for (i = 0; i < opts->sub_nr; i++) {
> +			struct list_objects_filter_options *sub = &opts->sub[i];
> +			if (banned_filter(sub))
> +				return sub;
> +		}
> +	return NULL;
> +}
> +
> +static void die_if_using_banned_filter(struct packet_writer *w,
> +				       struct list_objects_filter_options *opts)
> +{
> +	struct list_objects_filter_options *banned = banned_filter(opts);
> +	if (!banned)
> +		return;
> +
> +	packet_writer_error(w, _("filter '%s' not supported\n"),
> +			    list_object_filter_config_name(banned->choice));
> +	die(_("git upload-pack: banned object filter requested"));
> +}
> +
>  static void receive_needs(struct packet_reader *reader, struct object_array *want_obj)
>  {
>  	struct object_array shallows = OBJECT_ARRAY_INIT;
> @@ -885,6 +926,7 @@ static void receive_needs(struct packet_reader *reader, struct object_array *wan
>  				die("git upload-pack: filtering capability not negotiated");
>  			list_objects_filter_die_if_populated(&filter_options);
>  			parse_list_objects_filter(&filter_options, arg);
> +			die_if_using_banned_filter(&writer, &filter_options);
>  			continue;
>  		}
>  
> @@ -1044,6 +1086,9 @@ static int find_symref(const char *refname, const struct object_id *oid,
>  
>  static int upload_pack_config(const char *var, const char *value, void *unused)
>  {
> +	const char *sub, *key;
> +	int sub_len;
> +
>  	if (!strcmp("uploadpack.allowtipsha1inwant", var)) {
>  		if (git_config_bool(var, value))
>  			allow_unadvertised_object_request |= ALLOW_TIP_SHA1;
> @@ -1065,6 +1110,26 @@ static int upload_pack_config(const char *var, const char *value, void *unused)
>  			keepalive = -1;
>  	} else if (!strcmp("uploadpack.allowfilter", var)) {
>  		allow_filter = git_config_bool(var, value);
> +	} else if (!parse_config_key(var, "uploadpack", &sub, &sub_len, &key) &&
> +		   key && !strcmp(key, "allow")) {
> +		if (sub && skip_prefix(sub, "filter.", &sub) && sub_len >= 7) {
> +			struct string_list_item *item;
> +			char *spec;
> +
> +			/*
> +			 * normalize the filter, and chomp off '.allow' from the
> +			 * end
> +			 */
> +			spec = xstrdup_tolower(sub);
> +			spec[sub_len - 7] = 0;
> +
> +			item = string_list_insert(&allowed_filters, spec);
> +			item->util = (void *) (intptr_t) git_config_bool(var, value);
> +
> +			free(spec);
> +		} else if (!strcmp("uploadpack.filter.allow", var)) {
> +			allow_filter_fallback = git_config_bool(var, value);
> +		}
>  	} else if (!strcmp("uploadpack.allowrefinwant", var)) {
>  		allow_ref_in_want = git_config_bool(var, value);
>  	} else if (!strcmp("uploadpack.allowsidebandall", var)) {
> @@ -1308,6 +1373,8 @@ static void process_args(struct packet_reader *request,
>  		if (allow_filter && skip_prefix(arg, "filter ", &p)) {
>  			list_objects_filter_die_if_populated(&filter_options);
>  			parse_list_objects_filter(&filter_options, p);
> +			die_if_using_banned_filter(&data->writer,
> +						   &filter_options);
>  			continue;
>  		}
>  


* Re*: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-18 10:18       ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Jeff King
@ 2020-03-18 18:26         ` Junio C Hamano
  2020-03-19 17:03           ` Jeff King
  2020-03-18 21:28         ` Taylor Blau
  2020-04-17  9:41         ` Christian Couder
  2 siblings, 1 reply; 125+ messages in thread
From: Junio C Hamano @ 2020-03-18 18:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, christian.couder, james

Jeff King <peff@peff.net> writes:

>>   - 'uploadpack.filter.<filter>.allow' -> specifying whether or not each
>>     filter kind is allowed or not. (Originally this was given as 'git
>>     config uploadpack.filter=blob:none.allow true', but this '=' is
>>     ambiguous to configuration given over '-c', which itself uses an '='
>>     to separate keys from values.)
>
> One thing that's a little ugly here is the embedded dot in the
> subsection (i.e., "filter.<filter>"). It makes it look like a four-level
> key, but really there is no such thing in Git.  But everything else we
> tried was even uglier.

I think this gives us the best arrangement by upfront forcing all
the configuration handlers for "<subcommand>.*.<token>" namespace,
current and future, to use "<group-prefix>" before the unbounded set
of user-specifiable values that affects the <subcommand> (which is
"uploadpack").

So far, the configuration variables that need to be grouped by
unbounded set of user-specifiable values we supported happened to
have only one sensible such set for each <subcommand>, so we could
get away without such <group-prefix> and it was perfectly OK to
have, say "guitool.<name>.cmd".

Syntactically, the convention to always end such <group-prefix> with
a dot "." may look unusual, or once readers' eyes get used to them,
may look natural.  One tiny sad thing about it is that it cannot be
mechanically enforced, but that is minor.

> We could do "uploadpackfilter.allow" and "uploadpackfilter.<filter>.allow",
> but that's both ugly _and_ separates these options from the rest of
> uploadpack.*.

There is an existing instance of a configuration that affects
<subcommand> that uses a different word after <subcommand>, which is
credentialCache.ignoreSIGHUP, and I tend to agree that it is ugly.

By the way, I noticed the following while I was studying the current
practice, so before I forget...

-- >8 --
Subject: [PATCH] separate tar.* config to its own source file

Even though there is only one configuration variable in the
namespace, it is not quite right to have tar.umask described
among the variables for tag.* namespace.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt     | 2 ++
 Documentation/config/tag.txt | 7 -------
 Documentation/config/tar.txt | 6 ++++++
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 08b13ba72b..2450589a0e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -447,6 +447,8 @@ include::config/submodule.txt[]
 
 include::config/tag.txt[]
 
+include::config/tar.txt[]
+
 include::config/trace2.txt[]
 
 include::config/transfer.txt[]
diff --git a/Documentation/config/tag.txt b/Documentation/config/tag.txt
index 6d9110d84c..5062a057ff 100644
--- a/Documentation/config/tag.txt
+++ b/Documentation/config/tag.txt
@@ -15,10 +15,3 @@ tag.gpgSign::
 	convenient to use an agent to avoid typing your gpg passphrase
 	several times. Note that this option doesn't affect tag signing
 	behavior enabled by "-u <keyid>" or "--local-user=<keyid>" options.
-
-tar.umask::
-	This variable can be used to restrict the permission bits of
-	tar archive entries.  The default is 0002, which turns off the
-	world write bit.  The special value "user" indicates that the
-	archiving user's umask will be used instead.  See umask(2) and
-	linkgit:git-archive[1].
diff --git a/Documentation/config/tar.txt b/Documentation/config/tar.txt
new file mode 100644
index 0000000000..de8ff48ea9
--- /dev/null
+++ b/Documentation/config/tar.txt
@@ -0,0 +1,6 @@
+tar.umask::
+	This variable can be used to restrict the permission bits of
+	tar archive entries.  The default is 0002, which turns off the
+	world write bit.  The special value "user" indicates that the
+	archiving user's umask will be used instead.  See umask(2) and
+	linkgit:git-archive[1].

* Re: [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name'
  2020-03-18 10:03           ` Jeff King
@ 2020-03-18 19:40             ` Junio C Hamano
  2020-03-18 22:38             ` Eric Sunshine
  1 sibling, 0 replies; 125+ messages in thread
From: Junio C Hamano @ 2020-03-18 19:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Sunshine, Taylor Blau, Git List, Christian Couder, james

Jeff King <peff@peff.net> writes:

> But the cost is that we have to enumerate the set of values that are
> defined but not handled here (LOFC__COUNT, for instance, isn't a real
> enum value but rather a placeholder to let other code know how many
> filter types there are).
>
> So...I dunno. Worth it as a general technique?

"This is a possible value in the enum we are switching on, so I
write a case arm for it, but we do nothing for it here" is OK, but
if it were "we do nothing for it here or anywhere" (i.e. the maximum
enum value defined as a sentinel), the resulting code would be ugly.

I am not sure if the tradeoff is good to force such an ugliness on
readers' eyes to squelch the -Wswitch warnings.

So, I dunno.

* Re: [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name'
  2020-03-17 20:53         ` Eric Sunshine
  2020-03-18 10:03           ` Jeff King
@ 2020-03-18 21:05           ` Taylor Blau
  1 sibling, 0 replies; 125+ messages in thread
From: Taylor Blau @ 2020-03-18 21:05 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Taylor Blau, Git List, Christian Couder, Jeff King, james

Hi Eric,

On Tue, Mar 17, 2020 at 04:53:44PM -0400, Eric Sunshine wrote:
> On Tue, Mar 17, 2020 at 4:40 PM Taylor Blau <me@ttaylorr.com> wrote:
> > In a subsequent commit, we will add configuration options that are
> > specific to each kind of object filter, in which case it is handy to
> > have a function that translates between 'enum
> > list_objects_filter_choice' and an appropriate configuration-friendly
> > string.
> > ---
>
> Missing sign-off (but perhaps that's intentional since this is RFC).

Yes, the missing sign-off (in this patch as well as 2/2) is intentional,
since this is an RFC. Sorry for not calling this out more clearly in my
cover.

Thanks,
Taylor

* Re: [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s)
  2020-03-17 21:11         ` Eric Sunshine
@ 2020-03-18 21:18           ` Taylor Blau
  0 siblings, 0 replies; 125+ messages in thread
From: Taylor Blau @ 2020-03-18 21:18 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Taylor Blau, Git List, Christian Couder, Jeff King, james

On Tue, Mar 17, 2020 at 05:11:42PM -0400, Eric Sunshine wrote:
> On Tue, Mar 17, 2020 at 4:40 PM Taylor Blau <me@ttaylorr.com> wrote:
> > NB: this introduces an unfortunate possibility that attempt to write the
> > ERR sideband will cause a SIGPIPE. This can be prevented by some of
> > SZEDZER's previous work, but it is silenced in 't' for now.
>
> s/SZEDZER/SZEDER/

Thank you for pointing this out, and my apologies to SZEDER.

> > diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> > @@ -235,6 +235,29 @@ test_expect_success 'implicitly construct combine: filter with repeated flags' '
> > +test_expect_success 'upload-pack fails banned object filters' '
> > +       # Ensure that configuration keys are normalized by capitalizing
> > +       # "blob:None" below:
> > +       test_config -C srv.bare uploadpack.filter.blob:None.allow false &&
>
> I found the wording of the comment more confusing than clarifying.
> Perhaps rewriting it like this could help:
>
>     Test case-insensitivity by intentional use of "blob:None" rather than
>     "blob:none".
>
> or something.

Sure, your suggestion does clarify things. I'll apply it to my fork.

> > +       test_must_fail ok=sigpipe git clone --no-checkout --filter=blob:none \
> > +               "file://$(pwd)/srv.bare" pc3
> > +'

Thanks,
Taylor

* Re: [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s)
  2020-03-18 11:18         ` Philip Oakley
@ 2020-03-18 21:20           ` Taylor Blau
  0 siblings, 0 replies; 125+ messages in thread
From: Taylor Blau @ 2020-03-18 21:20 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Taylor Blau, git, christian.couder, peff, james

Hi Philip,

On Wed, Mar 18, 2020 at 11:18:16AM +0000, Philip Oakley wrote:
> Hi
> On 17/03/2020 20:39, Taylor Blau wrote:
> > Git clients may ask the server for a partial set of objects, where the
> > set of objects being requested is refined by one or more object filters.
> > Server administrators can configure 'git upload-pack' to allow or ban
> > these filters by setting the 'uploadpack.allowFilter' variable to
> > 'true' or 'false', respectively.
> >
> > However, administrators using bitmaps may wish to allow certain kinds of
> > object filters, but ban others. Specifically, they may wish to allow
> > object filters that can be optimized by the use of bitmaps, while
> > rejecting other object filters which aren't and represent a perceived
> > performance degradation (as well as an increased load factor on the
> > server).
> >
> > Allow configuring 'git upload-pack' to support object filters on a
> > case-by-case basis by introducing a new configuration variable and
> > section:
> >
> >   - 'uploadpack.filter.allow'
> >
> >   - 'uploadpack.filter.<kind>.allow'
> >
> > where '<kind>' may be one of 'blob:none', 'blob:limit', 'tree:depth',
> > and so on. The additional '.' between 'filter' and '<kind>' is part of
> > the sub-section.
> >
> > Setting the second configuration variable for any valid value of
> > '<kind>' explicitly allows or disallows restricting that kind of object
> > filter.
> >
> > If a client requests the object filter <kind> and the respective
> > configuration value is not set, 'git upload-pack' will default to the
> > value of 'uploadpack.filter.allow', which itself defaults to 'true' to
> > maintain backwards compatibility. Note that this differs from
> > 'uploadpack.allowfilter', which controls whether or not the 'filter'
> > capability is advertised.
> >
> > NB: this introduces an unfortunate possibility that attempt to write the
> > ERR sideband will cause a SIGPIPE. This can be prevented by some of
> > SZEDER's previous work, but it is silenced in 't' for now.
> > ---
> >  Documentation/config/uploadpack.txt | 12 ++++++
> >  t/t5616-partial-clone.sh            | 23 ++++++++++
> >  upload-pack.c                       | 67 +++++++++++++++++++++++++++++
> >  3 files changed, 102 insertions(+)
> >
> > diff --git a/Documentation/config/uploadpack.txt b/Documentation/config/uploadpack.txt
> > index ed1c835695..6213bd619c 100644
> > --- a/Documentation/config/uploadpack.txt
> > +++ b/Documentation/config/uploadpack.txt
> > @@ -57,6 +57,18 @@ uploadpack.allowFilter::
> >  	If this option is set, `upload-pack` will support partial
> >  	clone and partial fetch object filtering.
> >
> > +uploadpack.filter.allow::
> > +	Provides a default value for unspecified object filters (see: the
> > +	below configuration variable).
> > +	Defaults to `true`.
> > +
> > +uploadpack.filter.<filter>.allow::
> > +	Explicitly allow or ban the object filter corresponding to `<filter>`,
> > +	where `<filter>` may be one of: `blob:none`, `blob:limit`, `tree:depth`,
> > +	`sparse:oid`, or `combine`. If using combined filters, both `combine`
> > +	and all of the nested filter kinds must be allowed.
>
> Doesn't the man page at least need the part from the commit message "The
> additional '.' between 'filter' and '<kind>' is part of
> the sub-section." as it's not a common mechanism (other comments
> notwithstanding)

Thanks, you're certainly right. I wrote the man pages back when the
configuration was spelled:

  $ git config uploadpack.filter=blob:none.allow true

But now that there is the extra '.', it's worth calling out here, too.
I'll make sure that this is addressed based on the outcome of the
discussion below when these patches hit non-RFC status.
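
For readers following along, the lookup rule described in the commit
message (a per-filter setting wins, otherwise 'uploadpack.filter.allow'
applies, and a combined filter is rejected if 'combine' or any nested
kind is banned) can be sketched as a small standalone C fragment. This
is only an illustration under assumptions: 'struct filter_rule',
'struct filter_request', 'kind_allowed()' and 'first_banned()' are
hypothetical names, not the ones used in the patch, and a plain array
stands in for the string_list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the structures in the patch. */
struct filter_rule {
	const char *kind;	/* e.g. "blob:none", "tree:depth", "combine" */
	int allow;		/* value of uploadpack.filter.<kind>.allow */
};

struct filter_request {
	const char *kind;
	const struct filter_request *sub;	/* nested filters ("combine") */
	size_t sub_nr;
};

/* A per-kind rule wins; otherwise fall back to uploadpack.filter.allow. */
int kind_allowed(const struct filter_rule *rules, size_t nr,
		 int fallback, const char *kind)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(rules[i].kind, kind))
			return rules[i].allow;
	return fallback;
}

/* Return the first banned filter kind in the request, or NULL if none. */
const char *first_banned(const struct filter_rule *rules, size_t nr,
			 int fallback, const struct filter_request *req)
{
	size_t i;

	if (!kind_allowed(rules, nr, fallback, req->kind))
		return req->kind;
	for (i = 0; i < req->sub_nr; i++) {
		const char *banned = first_banned(rules, nr, fallback,
						  &req->sub[i]);
		if (banned)
			return banned;
	}
	return NULL;
}
```

With 'blob:none' allowed, 'tree:depth' banned and a fallback of true, a
'combine' request over those two kinds is rejected and 'tree:depth' is
the kind reported. With a fallback of false the same request is
rejected at 'combine' itself, because it has no explicit rule.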

> Philip

Thanks,
Taylor

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-18 10:18       ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Jeff King
  2020-03-18 18:26         ` Re*: " Junio C Hamano
@ 2020-03-18 21:28         ` Taylor Blau
  2020-03-18 22:41           ` Junio C Hamano
  2020-03-19 17:09           ` Jeff King
  2020-04-17  9:41         ` Christian Couder
  2 siblings, 2 replies; 125+ messages in thread
From: Taylor Blau @ 2020-03-18 21:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, christian.couder, james

On Wed, Mar 18, 2020 at 06:18:25AM -0400, Jeff King wrote:
> On Tue, Mar 17, 2020 at 02:39:05PM -0600, Taylor Blau wrote:
>
> > Hi Christian,
> >
> > Of course, I would be happy to send along our patches. They are included
> > in the series below, and correspond roughly to what we are running at
> > GitHub. (For us, there have been a few more clean-ups and additional
> > patches, but I squashed them into 2/2 below).
> >
> > The approach is roughly that we have:
> >
> >   - 'uploadpack.filter.allow' -> specifying the default for unspecified
> >     filter choices, itself defaulting to true in order to maintain
> >     backwards compatibility, and
> >
> >   - 'uploadpack.filter.<filter>.allow' -> specifying whether or not each
> >     filter kind is allowed or not. (Originally this was given as 'git
> >     config uploadpack.filter=blob:none.allow true', but this '=' is
> >     ambiguous to configuration given over '-c', which itself uses an '='
> >     to separate keys from values.)
>
> One thing that's a little ugly here is the embedded dot in the
> subsection (i.e., "filter.<filter>"). It makes it look like a four-level
> key, but really there is no such thing in Git.  But everything else we
> tried was even uglier.
>
> I think we want to declare a real subsection for each filter and not
> just "uploadpack.filter.<filter>". That gives us room to expand to other
> config options besides "allow" later on if we need to.
>
> We don't want to claim "uploadpack.allow" and "uploadpack.<filter>.allow";
> that's too generic.
>
> Likewise "filter.allow" is too generic.

I wonder. A multi-valued 'uploadpack.filter.allow' *might* solve some
problems, but the more I turn it over in my head, the more that I think
that it's creating more headaches for us than it's removing.

On the pro side, we could have this be a multi-valued key
where each value is the name of an allowed filter. I guess that would
solve the subsection-naming problem, but it is admittedly generic, not
to mention the fact that we already *use* this key to specify a default
value for missing 'uploadpack.filter.<filter>.allow' values. For that
reason, it seems like a non-starter to me.

> We could do "uploadpackfilter.allow" and "uploadpackfilter.<filter>.allow",
> but that's both ugly _and_ separates these options from the rest of
> uploadpack.*.
>
> We could use a character besides ".", which would reduce confusion. But
> what? Using colon is kind of ugly, because it's already syntactically
> significant in filter names, and you get:
>
>   uploadpack.filter:blob:none.allow
>
> We tried equals, like:
>
>   uploadpack.filter=blob:none.allow
>
> but there's an interesting side effect. Doing:
>
>   git -c uploadpack.filter=blob:none.allow=true upload-pack ...
>
> doesn't work, because the "-c" parser ends the key at the first "=". As
> it should, because otherwise we'd get confused by an "=" in a value.
> This is a failing of the "-c" syntax; it can't represent values with
> "=". Fixing it would be awkward, and I've never seen it come up in
> practice outside of this (you _could_ have a branch with a funny name
> and try to do "git -c branch.my=funny=branch.remote=origin" or
> something, but the lack of bug reports suggests nobody is that
> masochistic).

Thanks for adding some more detail to this decision.
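
To make the failure mode concrete, here is a minimal sketch of a
"split at the first '='" rule like the one described above. It is not
Git's actual "-c" parser: 'split_at_first_equals()' is a hypothetical
helper, and treating an argument without any '=' as an error is a
simplification made for this illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given "key=value", report the length of the key
 * and a pointer to the value, splitting at the FIRST '='.
 * Returns 0 on success, -1 if there is no '=' at all.
 */
int split_at_first_equals(const char *arg, size_t *key_len,
			  const char **value)
{
	const char *eq = strchr(arg, '=');

	if (!eq)
		return -1;
	*key_len = (size_t)(eq - arg);
	*value = eq + 1;
	return 0;
}
```

For "uploadpack.filter=blob:none.allow=true" this yields the key
"uploadpack.filter" and the value "blob:none.allow=true", which is not
what was intended.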

Another thing we could do is just simply use a different character. It
may be a little odd, but it keeps the filter-related variables in their
own sub-section, allowing us to add more configuration sub-variables in
the future. I guess that calling it something like:

  $ git config uploadpack.filter@blob:none.allow <true|false>

is a little strange (i.e., why '@' over '#'? There's certainly no
precedent here that I can think of...), but maybe it is slightly
less-weird than a pseudo-four-level key.

> So...maybe the extra dot is the last bad thing?
>
> > I noted in the second patch that there is the unfortunate possibility of
> > encountering a SIGPIPE when trying to write the ERR sideband back to a
> > client who requested a non-supported filter. Peff and I have had some
> > discussion off-list about resurrecting SZEDER's work which makes room
> > in the buffer by reading one packet back from the client when the server
> > encounters a SIGPIPE. It is for this reason that I am marking the series
> > as 'RFC'.
>
> For reference, the patch I was thinking of was this:
>
>   https://lore.kernel.org/git/20190830121005.GI8571@szeder.dev/

Thanks.

> -Peff

Thanks,
Taylor

* Re: [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name'
  2020-03-18 10:03           ` Jeff King
  2020-03-18 19:40             ` Junio C Hamano
@ 2020-03-18 22:38             ` Eric Sunshine
  2020-03-19 17:15               ` Jeff King
  1 sibling, 1 reply; 125+ messages in thread
From: Eric Sunshine @ 2020-03-18 22:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Git List, Christian Couder, james

On Wed, Mar 18, 2020 at 6:03 AM Jeff King <peff@peff.net> wrote:
> On Tue, Mar 17, 2020 at 04:53:44PM -0400, Eric Sunshine wrote:
> > > +       case LOFC__COUNT:
> > > +               /*
> > > +                * Include these to catch all enumerated values, but
> > > +                * break to treat them as a bug. Any new values of this
> > > +                * enum will cause a compiler error, as desired.
> > > +                */
> >
> > In general, people will see a warning, not an error, unless they
> > specifically use -Werror (or such) to turn the warning into an error,
> > so this statement is misleading. Also, while some compilers may
> > complain, others may not. So, although the comment claims that we will
> > notice an unhandled enum constant at compile-time, that isn't
> > necessarily the case.
>
> Yes, but that's the best we can do, isn't it?

To be clear, I wasn't questioning the code structure at all. I was
specifically referring to the comment talking about "error" when it
should say "warning" or "possible warning".

Moreover, normally, we use comments to highlight something in the code
which is not obvious or straightforward, so I was questioning whether
this comment is even helpful since the code seems reasonably clear.
And...

> But we can't just remove the default case. Even though enums don't
> generally take on other values, it's legal for them to do so. So we do
> want to make sure we BUG() in that instance.
>
> This is awkward to solve in the general case[1]. But because we're
> returning in each case arm here, it's easy to just put the BUG() after
> the switch. Anything that didn't return is unhandled, and we get the
> best of both: -Wswitch warnings when we need to add a new filter type,
> and a BUG() in the off chance that we see an unexpected value.
>
> So...I dunno. Worth it as a general technique?

...if this is or will become an idiom we want in this codebase, then
it would be silly to write an explanatory comment every place we
employ it. Instead, a document such as CodingGuidelines would likely
be a better fit for such knowledge.

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-18 21:28         ` Taylor Blau
@ 2020-03-18 22:41           ` Junio C Hamano
  2020-03-19 17:10             ` Jeff King
  2020-03-19 17:09           ` Jeff King
  1 sibling, 1 reply; 125+ messages in thread
From: Junio C Hamano @ 2020-03-18 22:41 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, git, christian.couder, james

Taylor Blau <me@ttaylorr.com> writes:

>> We tried equals, like:
>>
>>   uploadpack.filter=blob:none.allow
>>
>> but there's an interesting side effect. Doing:
>>
>>   git -c uploadpack.filter=blob:none.allow=true upload-pack ...
>>
>> doesn't work, because the "-c" parser ends the key at the first "=". As
>> it should, because otherwise we'd get confused by an "=" in a value.
>> This is a failing of the "-c" syntax; it can't represent values with
>> "=". 

s/value/key/ I presume ;-)

* Re: Re*: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-18 18:26         ` Re*: " Junio C Hamano
@ 2020-03-19 17:03           ` Jeff King
  0 siblings, 0 replies; 125+ messages in thread
From: Jeff King @ 2020-03-19 17:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, christian.couder, james

On Wed, Mar 18, 2020 at 11:26:00AM -0700, Junio C Hamano wrote:

> > One thing that's a little ugly here is the embedded dot in the
> > subsection (i.e., "filter.<filter>"). It makes it look like a four-level
> > key, but really there is no such thing in Git.  But everything else we
> > tried was even uglier.
> 
> I think this gives us the best arrangement by upfront forcing all
> the configuration handlers for "<subcommand>.*.<token>" namespace,
> current and future, to use "<group-prefix>" before the unbounded set
> of user-specifiable values that affects the <subcommand> (which is
> "uploadpack").
> 
> So far, the configuration variables that need to be grouped by
> unbounded set of user-specifiable values we supported happened to
> have only one sensible such set for each <subcommand>, so we could
> get away without such <group-prefix> and it was perfectly OK to
> have, say "guitool.<name>.cmd".

Yeah. We have often just split those out into a separate hierarchy from
<subcommand>. E.g., tar.<format>.command, which is really feeding the
git-archive command. We could do that here, too, but I wasn't sure of a
good name (this really is upload-pack specific, though I guess in theory
other commands could grow a need to look at or restrict "remote object
filters").

> Syntactically, the convention to always end such <group-prefix> with
> a dot "." may look unusual, or once readers' eyes get used to them,
> may look natural.  One tiny sad thing about it is that it cannot be
> mechanically enforced, but that is minor.

The biggest downside to implying a 4-level key is that the
case-sensitivity rules may be different. I.e., you can say:

  UploadPack.filter.blob:none.Allow

but not:

  UploadPack.Filter.blob:none.Allow

Since "filter" is part of the subsection, it's case sensitive. We could
match it case-insensitively in upload_pack_config(), but it would crop
up in other places (e.g., "git config --unset" would still care).
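
The rule at work here (section and variable name are case-insensitive,
the subsection in between is not) can be illustrated with a small
sketch. This is not Git's code: 'canonicalize_key()' is a hypothetical
helper that lowercases everything before the first dot and after the
last dot, and leaves whatever sits between them untouched.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not git's API): lowercase the section (up to the
 * first dot) and the variable name (after the last dot) in place, and
 * leave any subsection in between untouched.
 * Returns 0 on success, -1 if the key contains no dot at all.
 */
int canonicalize_key(char *key)
{
	char *first = strchr(key, '.');
	char *last = strrchr(key, '.');
	char *p;

	if (!first)
		return -1;
	for (p = key; p < first; p++)
		*p = (char)tolower((unsigned char)*p);
	for (p = last + 1; *p; p++)
		*p = (char)tolower((unsigned char)*p);
	return 0;
}
```

Both spellings above end up with the section "uploadpack" and the
variable name "allow", but "filter.blob:none" and "Filter.blob:none"
stay distinct subsections, so the two keys do not compare equal.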

> > We could do "uploadpackfilter.allow" and "uploadpackfilter.<filter>.allow",
> > but that's both ugly _and_ separates these options from the rest of
> > uploadpack.*.
> 
> There is an existing instance of a configuration that affects
> <subcommand> that uses a different word after <subcommand>, which is
> credentialCache.ignoreSIGHUP, and I tend to agree that it is ugly.

I don't think that's what's going on here. It affects only the
credential-cache subcommand, but we avoid hyphens in our key names.
So it really is the subcommand; it's just that the name is a superset of
another command name. :)

> By the way, I noticed the following while I was studying the current
> practice, so before I forget...
> 
> -- >8 --
> Subject: [PATCH] separate tar.* config to its own source file
> 
> Even though there is only one configuration variable in the
> namespace, it is not quite right to have tar.umask described
> among the variables for tag.* namespace.

Yeah, this is definitely an improvement. But I was surprised that
tar.<format>.* wasn't covered here. It is documented in git-archive.
Probably worth moving or duplicating it in git-config.

-Peff

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-18 21:28         ` Taylor Blau
  2020-03-18 22:41           ` Junio C Hamano
@ 2020-03-19 17:09           ` Jeff King
  1 sibling, 0 replies; 125+ messages in thread
From: Jeff King @ 2020-03-19 17:09 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, christian.couder, james

On Wed, Mar 18, 2020 at 03:28:18PM -0600, Taylor Blau wrote:

> I wonder. A multi-valued 'uploadpack.filter.allow' *might* solve some
> problems, but the more I turn it over in my head, the more that I think
> that it's creating more headaches for us than it's removing.

IMHO we should avoid multi-valued keys when there's not a compelling
reason. There are a lot of corner cases they introduce (e.g., there's no
standard way to override them rather than adding to the list).

> Another thing we could do is just simply use a different character. It
> may be a little odd, but it keeps the filter-related variables in their
> own sub-section, allowing us to add more configuration sub-variables in
> the future. I guess that calling it something like:
> 
>   $ git config uploadpack.filter@blob:none.allow <true|false>
> 
> is a little strange (i.e., why '@' over '#'? There's certainly no
> precedent here that I can think of...), but maybe it is slightly
> less-weird than a pseudo-four-level key.

I guess it's subjective, but the "@" just feels odd because it's
associated with so many other meanings. Likewise "#".

-Peff

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-18 22:41           ` Junio C Hamano
@ 2020-03-19 17:10             ` Jeff King
  0 siblings, 0 replies; 125+ messages in thread
From: Jeff King @ 2020-03-19 17:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, christian.couder, james

On Wed, Mar 18, 2020 at 03:41:51PM -0700, Junio C Hamano wrote:

> Taylor Blau <me@ttaylorr.com> writes:
> 
> >> We tried equals, like:
> >>
> >>   uploadpack.filter=blob:none.allow
> >>
> >> but there's an interesting side effect. Doing:
> >>
> >>   git -c uploadpack.filter=blob:none.allow=true upload-pack ...
> >>
> >> doesn't work, because the "-c" parser ends the key at the first "=". As
> >> it should, because otherwise we'd get confused by an "=" in a value.
> >> This is a failing of the "-c" syntax; it can't represent values with
> >> "=". 
> 
> s/value/key/ I presume ;-)

Yes. :)

-Peff

* Re: [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name'
  2020-03-18 22:38             ` Eric Sunshine
@ 2020-03-19 17:15               ` Jeff King
  0 siblings, 0 replies; 125+ messages in thread
From: Jeff King @ 2020-03-19 17:15 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Taylor Blau, Git List, Christian Couder, james

On Wed, Mar 18, 2020 at 06:38:49PM -0400, Eric Sunshine wrote:

> To be clear, I wasn't questioning the code structure at all. I was
> specifically referring to the comment talking about "error" when it
> should say "warning" or "possible warning".
> 
> Moreover, normally, we use comments to highlight something in the code
> which is not obvious or straightforward, so I was questioning whether
> this comment is even helpful since the code seems reasonably clear.
> And...

OK, I agree with all that. :)

> ...if this is or will become an idiom we want in this codebase, then
> it would be silly to write an explanatory comment every place we
> employ it. Instead, a document such as CodingGuidelines would likely
> be a better fit for such knowledge.

Yeah, that makes sense. If we do use this technique, though, we'll have
to explicitly list "case" lines for the enum values which are meant to
break out to the BUG(). And there it _is_ worth commenting on "yes, we
know about this value but it is not handled here because...". Which is
what you asked for in your original message. :)

Something like:

  switch (c) {
  case LOFC_BLOB_NONE:
	return "blob:none";
  ..etc...
  case LOFC__COUNT:
	/* not a real filter type; just a marker for counting the number */
	break;
  case LOFC_DISABLED:
	/* we have no name for "no filter at all" */
	break;
  }
  BUG(...);

-Peff

* Re: [TOPIC 3/17] Obliterate
  2020-03-16 12:55     ` Konstantin Tokarev
@ 2020-03-26 22:27       ` Damien Robert
  0 siblings, 0 replies; 125+ messages in thread
From: Damien Robert @ 2020-03-26 22:27 UTC (permalink / raw)
  To: Konstantin Tokarev; +Cc: James Ramsay, git@vger.kernel.org

From Konstantin Tokarev, Mon 16 Mar 2020 at 15:55:39 (+0300) :
> > My situation: coworkers push big files by mistake, I don't want to rewrite
> > history because they are not too well versed with git, but I want to keep
> > *my* repo clean.

> Wouldn't it be better to prevent *them* from such mistakes, e.g. by using
> pre-push review system like Gerrit?

So my coworkers are mathematicians, and not all of them are comfortable
with dvcs, and I already have a hard time convincing them to use git rather
than dropbox. I take it upon myself to make it as easy as possible to use
git (by telling them to push to a different branch when there is a conflict
so that I can handle the conflict myself).

I don't think Gerrit is a solution there...

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-16 16:32     ` Elijah Newren
@ 2020-03-26 22:30       ` Damien Robert
  0 siblings, 0 replies; 125+ messages in thread
From: Damien Robert @ 2020-03-26 22:30 UTC (permalink / raw)
  To: Elijah Newren; +Cc: James Ramsay, Git Mailing List

From Elijah Newren, Mon 16 Mar 2020 at 09:32:45 (-0700) :
> > I am interested in more details on how to handle this using replace.

> This comment at the conference was in reference to how people rewrite
> history to remove the big blobs, but then run into issues because
> there are many places outside of git that reference old commit IDs
> (wiki pages, old emails, issues/tickets, etc.) that are now broken.
[...]

Interesting, thanks for the context!

> As for using replace refs to attempt to alleviate problems without
> rewriting history, that's an even bigger can of worms and it doesn't
> solve clone/fetch/gc/fsck nor the many other places you highlighted in
> your email.

I agree, but one part that makes it easier in my context is that I don't
need to distribute the replaced references, I just need them for myself.
This alleviates a lot of problems already, and as I outlined in my email the
combination of replace refs and sparse checkout is almost enough.

-- 
Damien Robert
http://www.normalesup.org/~robert/pro

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-16 18:32     ` Phillip Susi
@ 2020-03-26 22:37       ` Damien Robert
  0 siblings, 0 replies; 125+ messages in thread
From: Damien Robert @ 2020-03-26 22:37 UTC (permalink / raw)
  To: Phillip Susi; +Cc: James Ramsay, git

From Phillip Susi, Mon 16 Mar 2020 at 14:32:46 (-0400) :
> Instead of replacing the blob with an empty file, why not replace the
> tree that references it with one that does not?  That way you won't have
> the file in your checkout at all, and the index won't list it so status
> won't show it as changed.

That's an interesting solution, but it only works if the tree itself does
not change.

- This is the case when these large objects were uploaded and then removed
  in another commit [*].
  But in this case I (usually) won't check out this tree again anyway, so the
  error due to the missing blob is not a big problem.

- When my coauthors use git as a dropbox alternative where they upload big
  pdf files (rather than only source code or .tex files), they also want to
  keep them there. If they had uploaded these files to the special literature/
  folder I had made for them, I could just replace the literature/ tree by
  an empty one, but they managed to upload them in the root folder which is
  subject to change unfortunately, and it would be annoying to make a new
  replace ref each time.

[*] by myself usually, for instance when people commit spurious tex
generated files like eg *.synctex, despite my .gitignore (I don't know how
they manage this...)

-- 
Damien Robert
http://www.normalesup.org/~robert/pro

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 2/17] Hooks in the future
  2020-03-13 17:56     ` Junio C Hamano
@ 2020-04-07 23:01       ` Emily Shaffer
  2020-04-07 23:51         ` Emily Shaffer
  0 siblings, 1 reply; 125+ messages in thread
From: Emily Shaffer @ 2020-04-07 23:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: James Ramsay, git

On Fri, Mar 13, 2020 at 10:56:59AM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:

Phew - now that git-bugreport looks to be on the path to 'next' I get to
work on hooks again :) Forgive me for the late reply.

> 
> > This means that we could do something like this:
> >
> > [hook "/path/to/executable.sh"]
> > 	event = pre-commit
> > 	order = 123
> > 	mustSucceed = false
> > 	parallelizable = true
> >
> > etc, etc as needed.
> 
> You can do
> 
>     [hook "pre-commit"]
> 	order = 123
> 	path = "/path/to/executable.sh"
> 
>     [hook "pre-commit"]
> 	order = 234
> 	path = "/path/to/another-executable.sh"
> 
> as well, and using the second level for what hook the (sub)section
> is about, instead of "we have this path that is used for a hook.
> What hook is it?", feels (at least to me) more natural.

Yeah, I see what you mean, and it's true I misread the notes and
misremembered Peff's suggestion. I was reworking my RFC patch some
today, and noticed that the following two configs:

A.gitconfig:
  [hook "pre-commit"]
    command = "/path/to/executable.sh"
    option = foo
  [hook "pre-commit"]
    command = "/path/to/another-executable.sh"

B.gitconfig:
  [hook "pre-commit"]
    command = "/path/to/executable.sh"
  [hook "pre-commit"]
    option = foo
    command = "/path/to/another-executable.sh"

are indistinguishable during the config parse - both show up during the
config callback looking like:

  var = "hook.pre-commit.command"; value = "/path/to/executable.sh"
  var = "hook.pre-commit.option"; value = "foo"
  var = "hook.pre-commit.command"; value = "/path/to/another-executable.sh"

I didn't see anything to get around this in the config parser library;
if I missed it I'd love to know.

Using the hook path as the subsection still doesn't help, of course. I
think the only way I see around it is to require a specific value at the
beginning of each hook config section, e.g. "each hook entry must begin
with 'command'"; that means that the config parser callback can look
something like:

  parse section, subsection, key
  if section.subsection = "hook.pre-commit":
    if key = "command":
      add a new hook to the hook list
    else:
      operate on the tail of the hook list
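
In C, a rough sketch of that callback could look like the following. It
is only an illustration under my own assumptions: struct hook_entry,
struct hook_list and hook_config_cb() are made-up names, the list is a
fixed-size array to keep the example short, and only the "pre-commit"
keys from the example above are handled. The callback has the usual
(var, value, data) shape and relies purely on the order in which the
pairs arrive:

```c
#include <string.h>

#define MAX_HOOKS 16

/* Hypothetical record for one configured hook. */
struct hook_entry {
	const char *command;
	const char *option; /* NULL when unset */
};

/* Hypothetical fixed-size list, to keep the sketch short. */
struct hook_list {
	struct hook_entry items[MAX_HOOKS];
	int nr;
};

/*
 * Called once per (var, value) pair, in config order.
 * Returns 0 on success, -1 if the sequence cannot be interpreted.
 */
static int hook_config_cb(const char *var, const char *value, void *data)
{
	struct hook_list *list = data;

	if (!strcmp(var, "hook.pre-commit.command")) {
		/* "command" always opens a new entry */
		if (list->nr == MAX_HOOKS)
			return -1;
		list->items[list->nr].command = value;
		list->items[list->nr].option = NULL;
		list->nr++;
	} else if (!strcmp(var, "hook.pre-commit.option")) {
		/* anything else operates on the tail of the list */
		if (!list->nr)
			return -1; /* option seen before any command */
		list->items[list->nr - 1].option = value;
	}
	return 0;
}
```

Fed the three pairs shown earlier, this yields the A.gitconfig reading:
"foo" sticks to /path/to/executable.sh, because the option arrived
before the second "command".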

The price of this is poor user experience for those handcrafting their
own hook configs, but I don't think it's poorer than carefully spelling
out "123:~/my-hook-path.sh:whatever:other:options" or something. I'll
add that I had planned to teach 'git-hook' to write and modify config
files for the user with an interactive-rebase-like UI, so a brittle
config layout might not be the end of the world.

Or, I suppose, we could teach the config parser how to understand
"structlike" configs like this where repeated header entries need to be
collated together. That seems to be contrary to the semantics of the
config file right now, though, and it looks like it'd require a rework
of the config_set implementation: today config_set_element looks like

  struct config_set_element {
          struct hashmap_entry ent;
          char *key; /* "hook.pre-commit.command" */
          struct string_list value_list; /* "/path/to/executable.sh"
                                          * "/path/to/another-executable.sh"
                                          */
  };

I'm not very keen on the idea of changing the way configs are stored for
everyone, although if folks are unsatisfied with the way it is now and
want to do that, I guess it's an option. But it's certainly more
overhead than my earlier suggestion.

Thoughts?

 - Emily

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 2/17] Hooks in the future
  2020-04-07 23:01       ` Emily Shaffer
@ 2020-04-07 23:51         ` Emily Shaffer
  2020-04-08  0:40           ` Junio C Hamano
  2020-04-10 21:31           ` Jeff King
  0 siblings, 2 replies; 125+ messages in thread
From: Emily Shaffer @ 2020-04-07 23:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: James Ramsay, git

On Tue, Apr 07, 2020 at 04:01:32PM -0700, Emily Shaffer wrote:
> Thoughts?

Jonathan Nieder and I discussed this a little bit offline, and he
suggested another thought:

[hook "unique-name"]
  pre-commit = ~/path-to-hook.sh args-for-precommit
  pre-push = ~/path-to-hook.sh
  order = 001

Then, in another config:

hook.unique-name.pre-push-order = 123

or,

hook.unique-name.enable = false
hook.unique-name.pre-commit-enable = true

To pick it apart a little more:

 - Let's give each logical action a unique name, e.g. "git-secrets".
 - Users can sign up for a certain event by providing the command to
   run, e.g. `hook.git-secrets.pre-commit = git-secrets pre-commit`.
 - Users can set up defaults for the logical action, e.g.
   `hook.git-secrets.before = gerrit` (where "gerrit" is the unique name
   for another logical action), and then change it on a per-hook basis
   e.g. `hook.git-secrets.pre-commit-before = clang-tidy`

There's some benefit:
 - We don't have to kludge something new (multiple sections with the
   same name, but logically disparate) into the config semantics where
   it doesn't really fit.
 - Users could, for example, turn off all "git-secrets" invocations in a
   repo without knowing which hooks it's attached to, e.g.
   `hook.git-secrets.enable = false`
 - We still have the option to add and remove parameters like 'order' or
   'before'/'after' or 'parallelizable' or etc., on a per-hook basis or
   for all flavors of a logical action such as "git-secrets"
 - It may be easier for a config-authoring iteration of 'git-hook' to
   modify existing configs than it would be if the ordering of config
   entries is vital.

One drawback I can think of is that these unique names could be either
difficult to autogenerate and guarantee uniqueness, or difficult for
humans to parse. I'd have to rethink the UI for writing or editing with
git-hook (rather than editing the config by hand), although I think with
the mood shifting away from configs looking like
"hook.pre-commit=123:~/path-to-thing.sh" my UI mockups are all invalid
anyways :)

We also considered something like:

[hook "git-secrets-pre-commit"]
  command = ~/path-to-hook.sh args-for-precommit
  order = 001

[hook "git-secrets-pre-push"]
  command = ~/path-to-hook.sh
  order = 123

but concluded that it's more verbose without adding additional value
over the earlier proposal above. This syntax can achieve a subset of
goals but is missing extra value like an easy path to disable all hooks
in that logical action, or nice defaults when you don't expect the order
or parallelism to change.

Definitely interested in hearing more ideas :)

 - Emily

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 2/17] Hooks in the future
  2020-04-07 23:51         ` Emily Shaffer
@ 2020-04-08  0:40           ` Junio C Hamano
  2020-04-08  1:09             ` Emily Shaffer
  2020-04-10 21:31           ` Jeff King
  1 sibling, 1 reply; 125+ messages in thread
From: Junio C Hamano @ 2020-04-08  0:40 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: James Ramsay, git

Emily Shaffer <emilyshaffer@google.com> writes:

> [hook "unique-name"]
>   pre-commit = ~/path-to-hook.sh args-for-precommit
>   pre-push = ~/path-to-hook.sh
>   order = 001
>
> Then, in another config:
>
> hook.unique-name.pre-push-order = 123
>
> or,
>
> hook.unique-name.enable = false
> hook.unique-name.pre-commit-enable = true
>
> To pick it apart a little more:
>
>  - Let's give each logical action a unique name, e.g. "git-secrets".
>  - Users can sign up for a certain event by providing the command to
>    run, e.g. `hook.git-secrets.pre-commit = git-secrets pre-commit`.
>  - Users can set up defaults for the logical action, e.g.
>    `hook.git-secrets.before = gerrit` (where "gerrit" is the unique name
>    for another logical action), and then change it on a per-hook basis
>    e.g. `hook.git-secrets.pre-commit-before = clang-tidy`

Sorry, but the description and the tokens used in there are so
detached from the current reality that I am having a hard time
trying to even guess what you two were talking about.  

For example, how would I express that I am using program X as my
'push-to-checkout' hook in a way consistent with the above
description?  Would "push" correspond to your "git-secrets" and
"checkout" to your "pre-commit", or would these be placed where you
wrote "unique-name"?




^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 2/17] Hooks in the future
  2020-04-08  0:40           ` Junio C Hamano
@ 2020-04-08  1:09             ` Emily Shaffer
  0 siblings, 0 replies; 125+ messages in thread
From: Emily Shaffer @ 2020-04-08  1:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: James Ramsay, git

On Tue, Apr 07, 2020 at 05:40:14PM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > [hook "unique-name"]
> >   pre-commit = ~/path-to-hook.sh args-for-precommit
> >   pre-push = ~/path-to-hook.sh
> >   order = 001
> >
> > Then, in another config:
> >
> > hook.unique-name.pre-push-order = 123
> >
> > or,
> >
> > hook.unique-name.enable = false
> > hook.unique-name.pre-commit-enable = true
> >
> > To pick it apart a little more:
> >
> >  - Let's give each logical action a unique name, e.g. "git-secrets".
> >  - Users can sign up for a certain event by providing the command to
> >    run, e.g. `hook.git-secrets.pre-commit = git-secrets pre-commit`.
> >  - Users can set up defaults for the logical action, e.g.
> >    `hook.git-secrets.before = gerrit` (where "gerrit" is the unique name
> >    for another logical action), and then change it on a per-hook basis
> >    e.g. `hook.git-secrets.pre-commit-before = clang-tidy`
> 
> Sorry, but the description and the tokens used in there are so
> detached from the current reality that I am having a hard time
> trying to even guess what you two were talking about.  

Ack, sorry about that. Point taken.

> 
> For example, how would I express that I am using program X as my
> 'push-to-checkout' hook in a way consistent with the above
> description?  Would "push" correspond to your "git-secrets" and
> "checkout" to your "pre-commit", or would these be placed where you
> wrote "unique-name"?

If you are using program X, which lives at /bin/x, and you want to use
it as your push-to-checkout hook:

[hook "x"]
  push-to-checkout = /bin/x

"unique-name" is unique and arbitrary, which is why I mentioned it could
be either difficult to machine-generate or difficult to human-read.

 - Emily

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 2/17] Hooks in the future
  2020-04-07 23:51         ` Emily Shaffer
  2020-04-08  0:40           ` Junio C Hamano
@ 2020-04-10 21:31           ` Jeff King
  2020-04-13 19:15             ` Emily Shaffer
  1 sibling, 1 reply; 125+ messages in thread
From: Jeff King @ 2020-04-10 21:31 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Junio C Hamano, James Ramsay, git

On Tue, Apr 07, 2020 at 04:51:16PM -0700, Emily Shaffer wrote:

> On Tue, Apr 07, 2020 at 04:01:32PM -0700, Emily Shaffer wrote:
> > Thoughts?
> 
> Jonathan Nieder and I discussed this a little bit offline, and he
> suggested another thought:
> 
> [hook "unique-name"]
>   pre-commit = ~/path-to-hook.sh args-for-precommit
>   pre-push = ~/path-to-hook.sh
>   order = 001

Yeah, giving each block a unique name lets you give them each an order.
It seems kind of weird to me that you'd define multiple hook types for a
given name. And it doesn't leave a lot of room for defining
per-hook-type options; you have to make new keys like pre-push-order
(though that does work because the hook names are a finite set that
conforms to our config key names).

What if we added a layer of indirection: have a section for each type of
hook, defining keys for that type. And then for each hook command we
define there, it can have its own section, too. Maybe better explained
with an example:

  [hook "pre-receive"]
  # put any pre-receive related options here; e.g., a rule for what to
  # do with hook exit codes (e.g., stop running, run all but return exit
  # code, ignore failures, etc)
  fail = stop

  # And we can define actual hook commands. This one refers to the
  # hookcmd block below.
  command = foo

  # But if there's no such hookcmd block, we could just do something
  # sensible, like defaulting hookcmd.X.command to "X"
  command = /path/to/some-hook.sh

  [hookcmd "foo"]
  # the actual hook command to run
  command = /path/to/another-hook
  # other hook options, like order priority
  order = 123

I think both this schema and the one you wrote above can express the
same set of things. But you don't _have_ to pick a unique name if you
don't want to. Just doing:

  [hook "pre-receive"]
  command = /some/script

would be valid and useful (and that's as far as 99% of use cases would
need to go).
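
To make the lookup rule concrete, here is a small sketch. The struct and
the function name resolve_hook_command() are invented for illustration,
and it assumes the hookcmd sections have already been parsed into an
array. A "command" value is first tried as the name of a hookcmd block,
and falls back to being the command line itself:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed form of a [hookcmd "<name>"] section. */
struct hookcmd {
	const char *name;
	const char *command; /* NULL if the section sets no "command" */
};

/*
 * Resolve one hook.<event>.command value. If a hookcmd section with
 * that name exists and sets "command", use it; otherwise treat the
 * value itself as the command to run.
 */
static const char *resolve_hook_command(const char *value,
					const struct hookcmd *cmds,
					size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!strcmp(cmds[i].name, value))
			return cmds[i].command ? cmds[i].command : value;
	}
	return value;
}
```

With the config above, "foo" resolves to /path/to/another-hook, while
/path/to/some-hook.sh has no hookcmd block and resolves to itself.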

-Peff

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 2/17] Hooks in the future
  2020-04-10 21:31           ` Jeff King
@ 2020-04-13 19:15             ` Emily Shaffer
  2020-04-13 21:52               ` Jeff King
  0 siblings, 1 reply; 125+ messages in thread
From: Emily Shaffer @ 2020-04-13 19:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, James Ramsay, git

On Fri, Apr 10, 2020 at 05:31:46PM -0400, Jeff King wrote:
> On Tue, Apr 07, 2020 at 04:51:16PM -0700, Emily Shaffer wrote:
> 
> > On Tue, Apr 07, 2020 at 04:01:32PM -0700, Emily Shaffer wrote:
> > > Thoughts?
> > 
> > Jonathan Nieder and I discussed this a little bit offline, and he
> > suggested another thought:
> > 
> > [hook "unique-name"]
> >   pre-commit = ~/path-to-hook.sh args-for-precommit
> >   pre-push = ~/path-to-hook.sh
> >   order = 001
> 
> Yeah, giving each block a unique name lets you give them each an order.
> It seems kind of weird to me that you'd define multiple hook types for a
> given name.

Not so odd - git-secrets configures itself for pre-commit,
prepare-commit-msg, and commit-msg. The invocation is slightly
different ('git-secrets pre-commit', 'git-secrets prepare-commit-msg',
etc) but to me it still makes some sense to treat it as a single logical
unit.

> And it doesn't leave a lot of room for defining
> per-hook-type options; you have to make new keys like pre-push-order
> (though that does work because the hook names are a finite set that
> conforms to our config key names).

Oh, interesting. I think you're saying "what if option 'frotz' only
makes sense for prepare-commit-msg; then there's no reason to allow
'frotz' and 'prepare-commit-msg-frotz' and 'post-commit-frotz' and so
on"? I think I didn't do a great job explaining myself in that mail, but
my idea was to let an unqualified option name in a hook block set the
default, and then allow it to be overridden by qualifying it with the
name of the hook in question:

[hook "unique-name"]
  option = "some default"
  post-commit-option = "post-commit specific version"
  pre-push = ~/foo.sh pre-push
  post-commit = ~/foo.sh post-commit

Then when post-commit is invoked, option = "post-commit specific
version"; when pre-push is invoked, option = "some default". My
intention was to generate the hook-specific option key on the fly during
setup.
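
Roughly, the lookup I have in mind is sketched below. The names (struct
kv, find_key(), hook_option()) are invented for this mail, and it
assumes one [hook "unique-name"] section has already been flattened into
key/value pairs. The qualified key "<hookname>-<option>" is built on the
fly and tried first, then the unqualified key is used as the default:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened view of one [hook "unique-name"] section. */
struct kv {
	const char *key;
	const char *value;
};

/* Return the last value stored under "key", or NULL if it is absent. */
static const char *find_key(const struct kv *kvs, size_t nr, const char *key)
{
	const char *found = NULL;
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(kvs[i].key, key))
			found = kvs[i].value; /* last one wins */
	return found;
}

/*
 * Look up "option" for the hook being run: prefer the qualified key
 * "<hookname>-<option>", fall back to the unqualified default.
 */
static const char *hook_option(const struct kv *kvs, size_t nr,
			       const char *hookname, const char *option)
{
	char qualified[256];
	const char *v;
	int len = snprintf(qualified, sizeof(qualified), "%s-%s",
			   hookname, option);

	if (len < 0 || (size_t)len >= sizeof(qualified))
		return NULL; /* key too long for this sketch */
	v = find_key(kvs, nr, qualified);
	return v ? v : find_key(kvs, nr, option);
}
```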

> 
> What if we added a layer of indirection: have a section for each type of
> hook, defining keys for that type. And then for each hook command we
> define there, it can have its own section, too. Maybe better explained
> with an example:
> 
>   [hook "pre-receive"]
>   # put any pre-receive related options here; e.g., a rule for what to
>   # do with hook exit codes (e.g., stop running, run all but return exit
>   # code, ignore failures, etc)
>   fail = stop

Interesting - so this is a default for all pre-receive hooks, that I can
set at whichever scope I wish.

> 
>   # And we can define actual hook commands. This one refers to the
>   # hookcmd block below.
>   command = foo
> 
>   # But if there's no such hookcmd block, we could just do something
>   # sensible, like defaulting hookcmd.X.command to "X"
>   command = /path/to/some-hook.sh

I like this idea a lot!

> 
>   [hookcmd "foo"]
>   # the actual hook command to run
>   command = /path/to/another-hook
>   # other hook options, like order priority
>   order = 123

Looks familiar enough. Now I worry - what if I specify 'fail' here too?

It seems like I may be saying "let's set a default per hookcmd" and you
may be saying "let's set a default per hook". Maybe you're saying "some
options are hook-specific and some options are command-specific." You
might be saying "we shouldn't need to set multiple option values for a
single command," and I think I disagree with that based on the
git-secrets value alone; if I'm getting ready to commit, I want
git-secrets to run last so it can look at changes other hooks made to my
commit, but if I'm getting ready to push, I want git-secrets to run
first so I don't wait around for a test suite just to find that my
commit is invalid anyways. Although, I guess with your schema the former
would be in [hookcmd "git-secrets-committing"] and the latter
would be in [hookcmd "git-secrets-pushing"], so I can set the ordering
how I wish.

(This might be an OK problem to punt on. I don't think there are any
options we have in mind just yet - even "order", we aren't sure whether
to prefer config order or an explicit number. I think if we make no
decision on how to treat per-hook options today, it doesn't stop us from
deciding on some schema tomorrow. Once we do decide, then we put it in
documentation and need to stick to it, but for now I think it's OK to
leave it undefined. We might not even need it.)

This schema also means it's easy to reorder or remove hooks later on,
which I like. A single line in my worktree config is clear:

  hookcmd.git-secrets-committing.skip = true
  hookcmd.git-secrets-pushing.order = 001

> 
> I think both this schema and the one you wrote above can express the
> same set of things. But you don't _have_ to pick a unique name if you
> don't want to. Just doing:
> 
>   [hook "pre-receive"]
>   command = /some/script
> 
> would be valid and useful (and that's as far as 99% of use cases would
> need to go).

Yeah, I see what you mean, and again I really like that. That lets us
run multiples in config order easily:

[hook "pre-receive"]
  command = /some/script
  command = /some/other-script
  command = some-hookcmd-header

If we add a little repeated-name detection then we can also reorder
easily this way if that's the direction we want for ordering:

{global}
hook.pre-receive.command = a.sh
hook.pre-receive.command = b.sh

{local}
hook.pre-receive.command = c.sh
hook.pre-receive.command = a.sh

for a final order of {b.sh, c.sh, a.sh}.
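
A sketch of the repeated-name detection, under my own assumptions:
struct cmd_list and cmd_list_add() are made-up names, the list is a
fixed-size array, and "the most recent mention wins its position" is
just one possible rule. Each command seen during the config parse is
appended, and an earlier mention of the same command is dropped first:

```c
#include <string.h>

#define MAX_CMDS 32

/* Hypothetical ordered list of commands for one hook event. */
struct cmd_list {
	const char *items[MAX_CMDS];
	int nr;
};

/*
 * Append "cmd". If it is already listed, remove the earlier mention
 * first, so the most recent mention decides where it runs.
 * Returns 0 on success, -1 if the list is full.
 */
static int cmd_list_add(struct cmd_list *list, const char *cmd)
{
	int i;

	for (i = 0; i < list->nr; i++) {
		if (!strcmp(list->items[i], cmd)) {
			/* close the gap left by the earlier mention */
			memmove(&list->items[i], &list->items[i + 1],
				(list->nr - i - 1) * sizeof(list->items[0]));
			list->nr--;
			break;
		}
	}
	if (list->nr == MAX_CMDS)
		return -1;
	list->items[list->nr++] = cmd;
	return 0;
}
```

Feeding it a.sh, b.sh (global) and then c.sh, a.sh (local) gives the
order listed above.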

Very nice, IMO.

I wonder - I think even something like this would work:

{global}
[hook "pre-receive"]
  command = no-hookcmd-entry.sh

{local for repo "zork"}
[hookcmd "no-hookcmd-entry.sh"]
  skip = true

For most repos, now I simply invoke no-hookcmd-entry.sh on pre-receive,
but when I'm parsing the config in "zork", now I see a populated hookcmd
entry, and when I look it up with the key I found in the global config,
I see that it's supposed to be skipped.

Although I might need to do something hacky if I have multiple hooks
pointing to the same simple invocation:

{global}
[hook "pre-receive"]
  command = no-hookcmd-entry.sh

[hook "post-commit"]
  command = no-hookcmd-entry.sh

{local}
hookcmd.no-hookcmd-entry.sh.skip = true

[hook "pre-receive"]
  command = modified-no-hookcmd.sh

[hookcmd "modified-no-hookcmd.sh"]
  command = no-hookcmd-entry.sh

That is, I think this makes it kind of tricky to shut off one invocation
for only one hook.  Maybe it makes sense to honor something like:

hookcmd.foo.skip-pre-receive = true

?

I wonder if I'm getting buried in the weeds of stuff we won't ever have
to worry about ;)

 - Emily

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 2/17] Hooks in the future
  2020-04-13 19:15             ` Emily Shaffer
@ 2020-04-13 21:52               ` Jeff King
  2020-04-14  0:54                 ` [RFC PATCH v2 0/2] configuration-based hook management (was: [TOPIC 2/17] Hooks in the future) Emily Shaffer
  2020-04-15  3:45                 ` [TOPIC 2/17] Hooks in the future Jonathan Nieder
  0 siblings, 2 replies; 125+ messages in thread
From: Jeff King @ 2020-04-13 21:52 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Junio C Hamano, James Ramsay, git

On Mon, Apr 13, 2020 at 12:15:15PM -0700, Emily Shaffer wrote:

> > Yeah, giving each block a unique name lets you give them each an order.
> > It seems kind of weird to me that you'd define multiple hook types for a
> > given name.
> 
> Not so odd - git-secrets configures itself for pre-commit,
> prepare-commit-msg, and commit-msg. The invocation is slightly
> different ('git-secrets pre-commit', 'git-secrets prepare-commit-msg',
> etc) but to me it still makes some sense to treat it as a single logical
> unit.

Yeah, I do see how that use case makes sense. I wonder how common it is
versus having separate one-off hooks. And whether setting the order
priority for all hooks at once is that useful (e.g., I can easily
imagine a case where the pre-commit hook for program A must go before B,
but it's the other way around for another hook).

I'm just speculating, but my instinct is that it's worth trying to make
the simple things as simple as possible, while still allowing the more
complex things.

> > And it doesn't leave a lot of room for defining
> > per-hook-type options; you have to make new keys like pre-push-order
> > (though that does work because the hook names are a finite set that
> > conforms to our config key names).
> 
> Oh, interesting. I think you're saying "what if option 'frotz' only
> makes sense for prepare-commit-msg; then there's no reason to allow
> 'frotz' and 'prepare-commit-msg-frotz' and 'post-commit-frotz' and so
> on"?

No, what I meant was just that if we had a hook "foo/bar", then the
natural option to control its hook-specific order would be:

  [hook "whatever"]
  order-foo/bar = 123

which isn't allowed ("/" is not valid in a key name). But that should be
OK since we control the names of hooks and can decide not to make one
with an invalid character in it. We might also support hooks for
third-party programs (e.g., if a porcelain wrapper wanted to have its
own "pre-switch-branches" hook or something), but it's not too much of
an imposition to say that the hook name should be a valid config key.

> I think I didn't do a great job explaining myself in that mail, but
> my idea was to let an unqualified option name in a hook block set the
> default, and then allow it to be overridden by qualifying it with the
> name of the hook in question:
> 
> [hook "unique-name"]
>   option = "some default"
>   post-commit-option = "post-commit specific version"
>   pre-push = ~/foo.sh pre-push
>   post-commit = ~/foo.sh post-commit

Yeah, that overriding system makes sense to me. Any other option "frotz"
would have the same constraint, though I don't think that's too big a
deal.

> Then when post-commit is invoked, option = "post-commit specific
> version"; when pre-push is invoked, option = "some default". My
> intention was to generate the hook-specific option key on the fly during
> setup.

Right, that makes sense to me.

> >   [hook "pre-receive"]
> >   # put any pre-receive related options here; e.g., a rule for what to
> >   # do with hook exit codes (e.g., stop running, run all but return exit
> >   # code, ignore failures, etc)
> >   fail = stop
> 
> Interesting - so this is a default for all pre-receive hooks, that I can
> set at whichever scope I wish.

Yes. Though I had imagined "fail" as semantics for operating on the
whole list of "pre-receive" hooks, you could define it in a per-command
way, too. I was thinking of it as "this is the strategy when a command
fails". But you could also think of it as "what to do when this
particular command fails".

> >   [hookcmd "foo"]
> >   # the actual hook command to run
> >   command = /path/to/another-hook
> >   # other hook options, like order priority
> >   order = 123
> 
> Looks familiar enough. Now I worry - what if I specify 'fail' here too?

If there's a per-command version of "fail", then presumably it would
override any per-hook. I.e., I'd expect code to resolve this at
run-time, like:

  struct hook *hook = get_hook("pre-receive");
  for (i = 0; i < hook->nr; i++) {
          struct hookcmd *cmd = hook->cmds[i];

          if (run_hook(cmd->prog) != 0) {
                  enum failure_strategy f = cmd->failure_strategy;
                  if (f == FAILURE_STRATEGY_UNSET)
                          f = hook->failure_strategy;
                  switch (f) {
                  ...do whatever...
                  }
          }
  }
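
The fallback part on its own, as a compilable sketch. The enum values
other than UNSET and the name effective_failure_strategy() are invented
here, and the structs are cut down to the one field that matters:

```c
/* Hypothetical strategies; UNSET means "inherit from the hook". */
enum failure_strategy {
	FAILURE_STRATEGY_UNSET = 0,
	FAILURE_STRATEGY_STOP,
	FAILURE_STRATEGY_IGNORE
};

/* Cut-down stand-ins for the structs in the loop above. */
struct hookcmd {
	const char *prog;
	enum failure_strategy failure_strategy;
};

struct hook {
	enum failure_strategy failure_strategy;
};

/* A per-command setting overrides the per-hook one. */
static enum failure_strategy
effective_failure_strategy(const struct hook *hook, const struct hookcmd *cmd)
{
	if (cmd->failure_strategy != FAILURE_STRATEGY_UNSET)
		return cmd->failure_strategy;
	return hook->failure_strategy;
}
```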

> It seems like I may be saying "let's set a default per hookcmd" and you
> may be saying "let's set a default per hook". Maybe you're saying "some
> options are hook-specific and some options are command-specific."

Yeah, the latter. Or it might even be that an option is sometimes
hook-specific and sometimes command-specific.

> You
> might be saying "we shouldn't need to set multiple option values for a
> single command," and I think I disagree with that based on the
> git-secrets value alone; if I'm getting ready to commit, I want
> git-secrets to run last so it can look at changes other hooks made to my
> commit, but if I'm getting ready to push, I want git-secrets to run
> first so I don't wait around for a test suite just to find that my
> commit is invalid anyways. Although, I guess with your schema the former
> would be in [hookcmd "git-secrets-committing"] and the latter
> would be in [hookcmd "git-secrets-pushing"], so I can set the ordering
> how I wish.

I think all of this is _possible_ in either scheme. We're encoding
potentially tabular data into a hierarchical config structure. In either
case I can set hook->cmd->option or cmd->hook->option. The question is
just which arrangement makes it simplest to do the most common things.

> Yeah, I see what you mean, and again I really like that. That lets us
> run multiples in config order easily:
> 
> [hook "pre-receive"]
>   command = /some/script
>   command = /some/other-script
>   command = some-hookcmd-header

Yep, config order makes sense as a default (though I think you could
make an argument for lexical order by command-name, which allows naming
things "000foo" if the user really wants to).

> If we add a little repeated-name detection then we can also reorder
> easily this way if that's the direction we want for ordering:
> 
> {global}
> hook.pre-receive.command = a.sh
> hook.pre-receive.command = b.sh
> 
> {local}
> hook.pre-receive.command = c.sh
> hook.pre-receive.command = a.sh
> 
> for a final order of {b.sh, c.sh, a.sh}.

I'm not sure what I'd expect a repeated mention of "a.sh" to do, but as
long as it's well-defined I don't really care. :)

> I wonder - I think even something like this would work:
> 
> {global}
> [hook "pre-receive"]
>   command = no-hookcmd-entry.sh
> 
> {local for repo "zork"}
> [hookcmd "no-hookcmd-entry.sh"]
>   skip = true
>
> For most repos, now I simply invoke no-hookcmd-entry.sh on pre-receive,
> but when I'm parsing the config in "zork", now I see a populated hookcmd
> entry, and when I look it up with the key I found in the global config,
> I see that it's supposed to be skipped.

Yes, exactly. The config parsing procedure is really just filling in a
"struct hookcmd" as it goes, so we don't care that they're in two
separate files.

> Although I might need to do something hacky if I have multiple hooks
> pointing to the same simple invocation:
> 
> {global}
> [hook "pre-receive"]
>   command = no-hookcmd-entry.sh
> 
> [hook "post-commit"]
>   command = no-hookcmd-entry.sh

I think your commands would be:

  command = "no-hookcmd-entry.sh pre-receive"

etc in that case, so they'd have different hookcmd blocks. You
_wouldn't_ be able to just turn off all of them with one config command,
though.

> I wonder if I'm getting buried in the weeds of stuff we won't ever have
> to worry about ;)

Yeah. I don't mind a little over-engineering as long as the easy things
remain simple, and the hard things remain possible. But that also means
we might be able to grow the hard things later (or never) as long as we
have a reasonable plan for them.

-Peff


* [RFC PATCH v2 0/2] configuration-based hook management (was: [TOPIC 2/17] Hooks in the future)
  2020-04-13 21:52               ` Jeff King
@ 2020-04-14  0:54                 ` Emily Shaffer
  2020-04-14  0:54                   ` [RFC PATCH v2 1/2] hook: scaffolding for git-hook subcommand Emily Shaffer
                                     ` (2 more replies)
  2020-04-15  3:45                 ` [TOPIC 2/17] Hooks in the future Jonathan Nieder
  1 sibling, 3 replies; 125+ messages in thread
From: Emily Shaffer @ 2020-04-14  0:54 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer, Junio C Hamano, James Ramsay, Jeff King

Not much to look at compared to the original RFC I sent some months ago.
This implements Peff's suggestion of using the "hookcmd" section as a
layer of indirection. The scope is a little smaller than the original
RFC as it doesn't have a way to remove hooks from downstream (yet), and
ordering numbers are dropped (for now).

One thing that's missing, as evidenced by the TODO, is a way to handle
arbitrary options given within a "hookcmd" unit. I think this can be
achieved with a callback, since it seems plausible that "pre-receive"
might want a different set of options than "post-commit" and so on. I
imagine a follow-on teaching git-hook how to remove a hook with
something like "hookcmd.foo.skip = true" will give an OK indication of
how that might look.

Overall though, I think this is simpler than the first version of the
RFC because I was reminded by wiser folks than I to "keep it simple,
stupid." ;)

I think it's feasible that with these couple patches applied, someone
who wanted to jump in early could replace their
.git/hooks/whatever-hookname with some boilerplate like

  xargs -n 1 'sh -c' <<<"$(git hook --list whatever-hookname)"

and give it a shot. Untested snippet. :)
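
A slightly more defensive variant of the same idea, as a sketch in POSIX
shell: read the list line by line so that commands containing spaces
survive, and stop at the first failure. The function name
run_listed_hooks is hypothetical. The sketch assumes one command per
line, which is what the --list mode in this series prints.

```shell
# run_listed_hooks: read one hook command per line on stdin and run each
# with "sh -c", in order. Stops at the first failing command and returns
# its exit status. Hypothetical helper, not part of the patches.
run_listed_hooks() {
	while IFS= read -r hook_cmd
	do
		# Skip blank lines defensively.
		test -n "$hook_cmd" || continue
		# Detach stdin so a hook cannot swallow the rest of the list.
		sh -c "$hook_cmd" </dev/null || return $?
	done
}
```

A hook script could then call
"git hook --list whatever-hookname | run_listed_hooks". Two caveats:
hook arguments and stdin are not forwarded, and with these patches an
empty list prints a "no commands configured" message on stdout, which
this loop would try to execute.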

CI run: https://github.com/gitgitgadget/git/pull/611/checks

 - Emily

Emily Shaffer (2):
  hook: scaffolding for git-hook subcommand
  hook: add --list mode

 .gitignore                    |  1 +
 Documentation/git-hook.txt    | 53 ++++++++++++++++++++
 Makefile                      |  2 +
 builtin.h                     |  1 +
 builtin/hook.c                | 77 +++++++++++++++++++++++++++++
 git.c                         |  1 +
 hook.c                        | 92 +++++++++++++++++++++++++++++++++++
 hook.h                        | 13 +++++
 t/t1360-config-based-hooks.sh | 58 ++++++++++++++++++++++
 9 files changed, 298 insertions(+)
 create mode 100644 Documentation/git-hook.txt
 create mode 100644 builtin/hook.c
 create mode 100644 hook.c
 create mode 100644 hook.h
 create mode 100755 t/t1360-config-based-hooks.sh

-- 
2.26.0.110.g2183baf09c-goog



* [RFC PATCH v2 1/2] hook: scaffolding for git-hook subcommand
  2020-04-14  0:54                 ` [RFC PATCH v2 0/2] configuration-based hook management (was: [TOPIC 2/17] Hooks in the future) Emily Shaffer
@ 2020-04-14  0:54                   ` Emily Shaffer
  2020-04-14  0:54                   ` [RFC PATCH v2 2/2] hook: add --list mode Emily Shaffer
  2020-04-14 15:15                   ` [RFC PATCH v2 0/2] configuration-based hook management Phillip Wood
  2 siblings, 0 replies; 125+ messages in thread
From: Emily Shaffer @ 2020-04-14  0:54 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Introduce infrastructure for a new subcommand, git-hook, which will be
used to ease config-based hook management. This command will handle
parsing configs to compose a list of hooks to run for a given event, as
well as adding or modifying hook configs in an interactive fashion.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 .gitignore                    |  1 +
 Documentation/git-hook.txt    | 19 +++++++++++++++++++
 Makefile                      |  1 +
 builtin.h                     |  1 +
 builtin/hook.c                | 21 +++++++++++++++++++++
 git.c                         |  1 +
 t/t1360-config-based-hooks.sh | 11 +++++++++++
 7 files changed, 55 insertions(+)
 create mode 100644 Documentation/git-hook.txt
 create mode 100644 builtin/hook.c
 create mode 100755 t/t1360-config-based-hooks.sh

diff --git a/.gitignore b/.gitignore
index 188bd1c3de..0f8b74f651 100644
--- a/.gitignore
+++ b/.gitignore
@@ -74,6 +74,7 @@
 /git-grep
 /git-hash-object
 /git-help
+/git-hook
 /git-http-backend
 /git-http-fetch
 /git-http-push
diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt
new file mode 100644
index 0000000000..2d50c414cc
--- /dev/null
+++ b/Documentation/git-hook.txt
@@ -0,0 +1,19 @@
+git-hook(1)
+===========
+
+NAME
+----
+git-hook - Manage configured hooks
+
+SYNOPSIS
+--------
+[verse]
+'git hook'
+
+DESCRIPTION
+-----------
+You can list, add, and modify hooks with this command.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index ef1ff2228f..7b9670c205 100644
--- a/Makefile
+++ b/Makefile
@@ -1079,6 +1079,7 @@ BUILTIN_OBJS += builtin/get-tar-commit-id.o
 BUILTIN_OBJS += builtin/grep.o
 BUILTIN_OBJS += builtin/hash-object.o
 BUILTIN_OBJS += builtin/help.o
+BUILTIN_OBJS += builtin/hook.o
 BUILTIN_OBJS += builtin/index-pack.o
 BUILTIN_OBJS += builtin/init-db.o
 BUILTIN_OBJS += builtin/interpret-trailers.o
diff --git a/builtin.h b/builtin.h
index 2b25a80cde..c4cd252f61 100644
--- a/builtin.h
+++ b/builtin.h
@@ -173,6 +173,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix);
 int cmd_grep(int argc, const char **argv, const char *prefix);
 int cmd_hash_object(int argc, const char **argv, const char *prefix);
 int cmd_help(int argc, const char **argv, const char *prefix);
+int cmd_hook(int argc, const char **argv, const char *prefix);
 int cmd_index_pack(int argc, const char **argv, const char *prefix);
 int cmd_init_db(int argc, const char **argv, const char *prefix);
 int cmd_interpret_trailers(int argc, const char **argv, const char *prefix);
diff --git a/builtin/hook.c b/builtin/hook.c
new file mode 100644
index 0000000000..b2bbc84d4d
--- /dev/null
+++ b/builtin/hook.c
@@ -0,0 +1,21 @@
+#include "cache.h"
+
+#include "builtin.h"
+#include "parse-options.h"
+
+static const char * const builtin_hook_usage[] = {
+	N_("git hook"),
+	NULL
+};
+
+int cmd_hook(int argc, const char **argv, const char *prefix)
+{
+	struct option builtin_hook_options[] = {
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, prefix, builtin_hook_options,
+			     builtin_hook_usage, 0);
+
+	return 0;
+}
diff --git a/git.c b/git.c
index b07198fe03..c79a9192d6 100644
--- a/git.c
+++ b/git.c
@@ -513,6 +513,7 @@ static struct cmd_struct commands[] = {
 	{ "grep", cmd_grep, RUN_SETUP_GENTLY },
 	{ "hash-object", cmd_hash_object },
 	{ "help", cmd_help },
+	{ "hook", cmd_hook, RUN_SETUP },
 	{ "index-pack", cmd_index_pack, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "init", cmd_init_db },
 	{ "init-db", cmd_init_db },
diff --git a/t/t1360-config-based-hooks.sh b/t/t1360-config-based-hooks.sh
new file mode 100755
index 0000000000..34b0df5216
--- /dev/null
+++ b/t/t1360-config-based-hooks.sh
@@ -0,0 +1,11 @@
+#!/bin/sh
+
+test_description='config-managed multihooks, including git-hook command'
+
+. ./test-lib.sh
+
+test_expect_success 'git hook command does not crash' '
+	git hook
+'
+
+test_done
-- 
2.26.0.110.g2183baf09c-goog



* [RFC PATCH v2 2/2] hook: add --list mode
  2020-04-14  0:54                 ` [RFC PATCH v2 0/2] configuration-based hook management (was: [TOPIC 2/17] Hooks in the future) Emily Shaffer
  2020-04-14  0:54                   ` [RFC PATCH v2 1/2] hook: scaffolding for git-hook subcommand Emily Shaffer
@ 2020-04-14  0:54                   ` Emily Shaffer
  2020-04-14 15:15                   ` [RFC PATCH v2 0/2] configuration-based hook management Phillip Wood
  2 siblings, 0 replies; 125+ messages in thread
From: Emily Shaffer @ 2020-04-14  0:54 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Teach 'git hook --list <hookname>', which checks the known configs in
order to create an ordered list of hooks to run on a given hook event.

Multiple commands can be specified for a given hook by providing
multiple "hook.<hookname>.command = <path-to-hook>" lines. Hooks will be
run in config order. If more properties need to be set on a given hook
in the future, commands can also be specified by providing
"hook.<hookname>.command = <hookcmd-name>", as well as a "[hookcmd
<hookcmd-name>]" subsection; at minimum, this subsection must contain a
"hookcmd.<hookcmd-name>.command = <path-to-hook>" line.

For example:

  $ git config --list | grep ^hook
  hook.pre-commit.command=baz
  hook.pre-commit.command=~/bar.sh
  hookcmd.baz.command=~/baz/from/hookcmd.sh

  $ git hook --list pre-commit
  ~/baz/from/hookcmd.sh
  ~/bar.sh

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 Documentation/git-hook.txt    | 36 +++++++++++++-
 Makefile                      |  1 +
 builtin/hook.c                | 58 +++++++++++++++++++++-
 hook.c                        | 92 +++++++++++++++++++++++++++++++++++
 hook.h                        | 15 ++++++
 t/t1360-config-based-hooks.sh | 51 ++++++++++++++++++-
 6 files changed, 249 insertions(+), 4 deletions(-)
 create mode 100644 hook.c
 create mode 100644 hook.h

diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt
index 2d50c414cc..aafc762ea2 100644
--- a/Documentation/git-hook.txt
+++ b/Documentation/git-hook.txt
@@ -8,12 +8,46 @@ git-hook - Manage configured hooks
 SYNOPSIS
 --------
 [verse]
-'git hook'
+'git hook' -l | --list <hook-name>
 
 DESCRIPTION
 -----------
 You can list, add, and modify hooks with this command.
 
+This command parses the default configuration files for sections "hook" and
+"hookcmd". "hook" is used to describe the commands which will be run during a
+particular hook event; commands are run in config order. "hookcmd" is used to
+describe attributes of a specific command. If additional attributes don't need
+to be specified, a command to run can be specified directly in the "hook"
+section; if a "hookcmd" by that name isn't found, Git will attempt to run the
+provided value directly. For example:
+
+Global config
+----
+  [hook "post-commit"]
+    command = "linter"
+    command = "~/typocheck.sh"
+
+  [hookcmd "linter"]
+    command = "/bin/linter --c"
+----
+
+Local config
+----
+  [hook "prepare-commit-msg"]
+    command = "linter"
+  [hook "post-commit"]
+    command = "python ~/run-test-suite.py"
+----
+
+OPTIONS
+-------
+
+-l::
+--list::
+	List the hooks which have been configured for <hook-name>. Hooks appear
+	in the order they should be run.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 7b9670c205..5f170f885b 100644
--- a/Makefile
+++ b/Makefile
@@ -896,6 +896,7 @@ LIB_OBJS += hashmap.o
 LIB_OBJS += linear-assignment.o
 LIB_OBJS += help.o
 LIB_OBJS += hex.o
+LIB_OBJS += hook.o
 LIB_OBJS += ident.o
 LIB_OBJS += interdiff.o
 LIB_OBJS += json-writer.o
diff --git a/builtin/hook.c b/builtin/hook.c
index b2bbc84d4d..60617578fb 100644
--- a/builtin/hook.c
+++ b/builtin/hook.c
@@ -1,21 +1,77 @@
 #include "cache.h"
 
 #include "builtin.h"
+#include "config.h"
+#include "hook.h"
 #include "parse-options.h"
+#include "strbuf.h"
 
 static const char * const builtin_hook_usage[] = {
-	N_("git hook"),
+	N_("git hook --list <hookname>"),
 	NULL
 };
 
+enum hook_command {
+	HOOK_NO_COMMAND = 0,
+	HOOK_LIST,
+};
+
+static int print_hook_list(const struct strbuf *hookname)
+{
+	struct list_head *head, *pos;
+	struct hook *item;
+
+	head = hook_list(hookname);
+
+	if (!head) {
+		printf(_("no commands configured for hook '%s'\n"),
+		       hookname->buf);
+		return 0;
+	}
+
+	list_for_each(pos, head) {
+		item = list_entry(pos, struct hook, list);
+		if (item)
+			printf("%s\n",
+			       item->command.buf);
+	}
+
+	return 0;
+}
+
 int cmd_hook(int argc, const char **argv, const char *prefix)
 {
+	enum hook_command command = 0;
+	struct strbuf hookname = STRBUF_INIT;
+
 	struct option builtin_hook_options[] = {
+		OPT_CMDMODE('l', "list", &command,
+			    N_("list scripts which will be run for <hookname>"),
+			    HOOK_LIST),
 		OPT_END(),
 	};
 
 	argc = parse_options(argc, argv, prefix, builtin_hook_options,
 			     builtin_hook_usage, 0);
 
+	if (argc < 1) {
+		usage_msg_opt("a hookname must be provided to operate on.",
+			      builtin_hook_usage, builtin_hook_options);
+	}
+
+	strbuf_addstr(&hookname, argv[0]);
+
+	switch(command) {
+		case HOOK_LIST:
+			return print_hook_list(&hookname);
+			break;
+		default:
+			usage_msg_opt("no command given.", builtin_hook_usage,
+				      builtin_hook_options);
+	}
+
+	clear_hook_list();
+	strbuf_release(&hookname);
+
 	return 0;
 }
diff --git a/hook.c b/hook.c
new file mode 100644
index 0000000000..a31943a25e
--- /dev/null
+++ b/hook.c
@@ -0,0 +1,92 @@
+#include "cache.h"
+
+#include "hook.h"
+#include "config.h"
+
+static LIST_HEAD(hook_head);
+
+void free_hook(struct hook *ptr)
+{
+	if (ptr) {
+		strbuf_release(&ptr->command);
+		free(ptr);
+	}
+}
+
+static void emplace_hook(struct list_head *pos, const char *command)
+{
+	struct hook *to_add = xmalloc(sizeof(struct hook));
+	to_add->origin = current_config_scope();
+	strbuf_init(&to_add->command, 0);
+	strbuf_addstr(&to_add->command, command);
+
+	list_add_tail(&to_add->list, pos);
+}
+
+static void remove_hook(struct list_head *to_remove)
+{
+	struct hook *hook_to_remove = list_entry(to_remove, struct hook, list);
+	list_del(to_remove);
+	free_hook(hook_to_remove);
+}
+
+void clear_hook_list(void)
+{
+	struct list_head *pos, *tmp;
+	list_for_each_safe(pos, tmp, &hook_head)
+		remove_hook(pos);
+}
+
+struct list_head* hook_list(const struct strbuf* hookname)
+{
+	struct strbuf hook_key = STRBUF_INIT;
+	const struct string_list *commands = NULL;
+	struct string_list_item *it = NULL;
+	struct list_head *pos = NULL, *tmp = NULL;
+	struct strbuf hookcmd_name = STRBUF_INIT;
+	struct hook *hook = NULL;
+
+	if (!hookname)
+		return NULL;
+
+	strbuf_addf(&hook_key, "hook.%s.command", hookname->buf);
+
+	commands = git_config_get_value_multi(hook_key.buf);
+
+	if (!commands)
+		return NULL;
+
+	for_each_string_list_item(it, commands) {
+		const char *command = it->string;
+
+		strbuf_reset(&hookcmd_name);
+		strbuf_addf(&hookcmd_name, "hookcmd.%s.command", command);
+
+		/* If no hookcmd with that name exists, &command is untouched */
+		git_config_get_value(hookcmd_name.buf, &command);
+
+		if (!command)
+			return NULL;
+
+		/*
+		 * TODO: implement an option-getting callback, e.g.
+		 *   get configs by pattern hookcmd.$value.*
+		 *   for each key+value, do_callback(key, value, cb_data)
+		 */
+
+		list_for_each_safe(pos, tmp, &hook_head) {
+			hook = list_entry(pos, struct hook, list);
+			/*
+			 * The list of hooks to run can be reordered by being redeclared
+			 * in the config. Options about hook ordering should be checked
+			 * here.
+			 */
+			if (0 == strcmp(hook->command.buf, command))
+				remove_hook(pos);
+		}
+		emplace_hook(pos, command);
+
+	}
+
+	return &hook_head;
+}
diff --git a/hook.h b/hook.h
new file mode 100644
index 0000000000..aaf6511cff
--- /dev/null
+++ b/hook.h
@@ -0,0 +1,15 @@
+#include "config.h"
+#include "list.h"
+#include "strbuf.h"
+
+struct hook
+{
+	struct list_head list;
+	enum config_scope origin;
+	struct strbuf command;
+};
+
+struct list_head* hook_list(const struct strbuf *hookname);
+
+void free_hook(struct hook *ptr);
+void clear_hook_list(void);
diff --git a/t/t1360-config-based-hooks.sh b/t/t1360-config-based-hooks.sh
index 34b0df5216..2e6a5e09d3 100755
--- a/t/t1360-config-based-hooks.sh
+++ b/t/t1360-config-based-hooks.sh
@@ -4,8 +4,55 @@ test_description='config-managed multihooks, including git-hook command'
 
 . ./test-lib.sh
 
-test_expect_success 'git hook command does not crash' '
-	git hook
+test_expect_success 'git hook rejects commands without a mode' '
+	test_must_fail git hook pre-commit
+'
+
+
+test_expect_success 'git hook rejects commands without a hookname' '
+	test_must_fail git hook --list
+'
+
+test_expect_success 'setup hooks in global, and local' '
+	git config --add --local hook.pre-commit.command "/path/ghi" &&
+	git config --add --global hook.pre-commit.command "/path/def"
+'
+
+test_expect_success 'git hook --list orders by config order' '
+	cat >expected <<-\EOF &&
+	/path/def
+	/path/ghi
+	EOF
+
+	git hook --list pre-commit >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'git hook --list dereferences a hookcmd' '
+	git config --add --local hook.pre-commit.command "abc" &&
+	git config --add --global hookcmd.abc.command "/path/abc" &&
+
+	cat >expected <<-\EOF &&
+	/path/def
+	/path/ghi
+	/path/abc
+	EOF
+
+	git hook --list pre-commit >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'git hook --list reorders on duplicate commands' '
+	git config --add --local hook.pre-commit.command "/path/def" &&
+
+	cat >expected <<-\EOF &&
+	/path/ghi
+	/path/abc
+	/path/def
+	EOF
+
+	git hook --list pre-commit >actual &&
+	test_cmp expected actual
 '
 
 test_done
-- 
2.26.0.110.g2183baf09c-goog



* Re: [RFC PATCH v2 0/2] configuration-based hook management
  2020-04-14  0:54                 ` [RFC PATCH v2 0/2] configuration-based hook management (was: [TOPIC 2/17] Hooks in the future) Emily Shaffer
  2020-04-14  0:54                   ` [RFC PATCH v2 1/2] hook: scaffolding for git-hook subcommand Emily Shaffer
  2020-04-14  0:54                   ` [RFC PATCH v2 2/2] hook: add --list mode Emily Shaffer
@ 2020-04-14 15:15                   ` Phillip Wood
  2020-04-14 19:24                     ` Emily Shaffer
                                       ` (2 more replies)
  2 siblings, 3 replies; 125+ messages in thread
From: Phillip Wood @ 2020-04-14 15:15 UTC (permalink / raw)
  To: Emily Shaffer, git; +Cc: Junio C Hamano, James Ramsay, Jeff King

Hi Emily

Thanks for working on this; having a way to manage multiple commands per
hook without using an external framework would be really useful.

On 14/04/2020 01:54, Emily Shaffer wrote:
> Not much to look at compared to the original RFC I sent some months ago.
> This implements Peff's suggestion of using the "hookcmd" section as a
> layer of indirection.

I'm not really clear what the advantage of this indirection is. It seems 
unlikely to me that different hooks will share exactly the same command 
line or other options. In the 'git secrets' example earlier in this 
thread each hook needs to use a different command line. In general a 
command cannot tell which hook it is being invoked as without a flag of 
some kind. (In some cases it can use the number of arguments if that is 
different for each hook that it handles, but that is not true in general.)

Without the indirection one could have
   hook.pre-commit.linter.command = my-command
   hook.pre-commit.check-whitespace.command = 'git diff --check --cached'

and other keys can be added for ordering etc. e.g.
   hook.pre-commit.linter.before = check-whitespace

With the indirection one needs to set
   hook.pre-commit.command = linter
   hook.pre-commit.check-whitespace = 'git diff --check --cached'
   hookcmd.linter.command = my-command
   hookcmd.linter.pre-commit-before = check-whitespace

which involves setting an extra key and checking it each time the hook 
is invoked without any benefit that I can see. I suspect which one seems 
more logical depends on how one thinks of setting hooks - I tend to 
think "I want to set a pre-commit hook" not "I want to set a git-secrets 
hook". If you've got an example where this indirection is helpful or 
necessary that would be really useful to see.

Best Wishes

Phillip


> The scope is a little smaller than the original
> RFC as it doesn't have a way to remove hooks from downstream (yet), and
> ordering numbers are dropped (for now).
> 
> One thing that's missing, as evidenced by the TODO, is a way to handle
> arbitrary options given within a "hookcmd" unit. I think this can be
> achieved with a callback, since it seems plausible that "pre-receive"
> might want a different set of options than "post-commit" and so on. I
> imagine a follow-on teaching git-hook how to remove a hook with
> something like "hookcmd.foo.skip = true" will give an OK indication of
> how that might look.
> 
> Overall though, I think this is simpler than the first version of the
> RFC because I was reminded by wiser folks than I to "keep it simple,
> stupid." ;)
> 
> I think it's feasible that with these couple patches applied, someone
> who wanted to jump in early could replace their
> .git/hooks/whatever-hookname with some boilerplate like
> 
>    xargs -n 1 'sh -c' <<<"$(git hook --list whatever-hookname)"
> 
> and give it a shot. Untested snippet. :)
> CI run: https://github.com/gitgitgadget/git/pull/611/checks
> 
>   - Emily
> 
> Emily Shaffer (2):
>    hook: scaffolding for git-hook subcommand
>    hook: add --list mode
> 
>   .gitignore                    |  1 +
>   Documentation/git-hook.txt    | 53 ++++++++++++++++++++
>   Makefile                      |  2 +
>   builtin.h                     |  1 +
>   builtin/hook.c                | 77 +++++++++++++++++++++++++++++
>   git.c                         |  1 +
>   hook.c                        | 92 +++++++++++++++++++++++++++++++++++
>   hook.h                        | 13 +++++
>   t/t1360-config-based-hooks.sh | 58 ++++++++++++++++++++++
>   9 files changed, 298 insertions(+)
>   create mode 100644 Documentation/git-hook.txt
>   create mode 100644 builtin/hook.c
>   create mode 100644 hook.c
>   create mode 100644 hook.h
>   create mode 100755 t/t1360-config-based-hooks.sh
> 


* Re: [RFC PATCH v2 0/2] configuration-based hook management
  2020-04-14 15:15                   ` [RFC PATCH v2 0/2] configuration-based hook management Phillip Wood
@ 2020-04-14 19:24                     ` Emily Shaffer
  2020-04-14 20:27                       ` Jeff King
  2020-04-14 20:03                     ` Josh Steadmon
  2020-04-14 20:32                     ` Jeff King
  2 siblings, 1 reply; 125+ messages in thread
From: Emily Shaffer @ 2020-04-14 19:24 UTC (permalink / raw)
  To: phillip.wood; +Cc: git, Junio C Hamano, James Ramsay, Jeff King

On Tue, Apr 14, 2020 at 04:15:11PM +0100, Phillip Wood wrote:
> Hi Emily
> 
> Thanks for working on this, having a way to manage multiple commands per
> hook without using an external framework would be really useful
> 
> On 14/04/2020 01:54, Emily Shaffer wrote:
> > Not much to look at compared to the original RFC I sent some months ago.
> > This implements Peff's suggestion of using the "hookcmd" section as a
> > layer of indirection.
> 
> I'm not really clear what the advantage of this indirection is. It seems
> unlikely to me that different hooks will share exactly the same command line
> or other options. In the 'git secrets' example earlier in this thread each
> hook needs to use a different command line. In general a command cannot tell
> which hook it is being invoked as without a flag of some kind. (In some
> cases it can use the number of arguments if that is different for each hook
> that it handles but that is not true in general)
> 
> Without the redirection one could have
>   hook.pre-commit.linter.command = my-command
>   hook.pre-commit.check-whitespace.command = 'git diff --check --cached'

I think this isn't supported by the config semantics. Have a look at
config.h:parse_config_key:

  /*
   * Match and parse a config key of the form:
   *
   *   section.(subsection.)?key
   *
   * (i.e., what gets handed to a config_fn_t). The caller provides the section;
   * we return -1 if it does not match, 0 otherwise. The subsection and key
   * out-parameters are filled by the function (and *subsection is NULL if it is
   * missing).
   *
   * If the subsection pointer-to-pointer passed in is NULL, returns 0 only if
   * there is no subsection at all.
   */
  int parse_config_key(const char *var,
                       const char *section,
                       const char **subsection, int *subsection_len,
                       const char **key);

We'd need to fudge one of these fields to include the extra section, I
think. Unfortunate, because I find your example very tidy, but in
practice maybe not very neat. The closest thing I can find to a nice way
of writing it might be:

  [hook.pre-commit "linter"]
    command = my-command
    before = check-whitespace
  [hook.pre-commit "check-whitespace"]
    command = 'git diff --check --cached'

But this is kind of a lie; the sections aren't "hook", "pre-commit", and
"linter" as you'd expect. Whether it's OK to lie like this, though, I
don't know - I suspect it might make it awkward for others trying to
parse the config. (my Vim syntax highlighter had kind of a hard time.)

> 
> and other keys can be added for ordering etc. e.g.
>   hook.pre-commit.linter.before = check-whitespace
> 
> With the indirection one needs to set
>   hook.pre-commit.command = linter
>   hook.pre-commit.check-whitespace = 'git diff --check --cached'
>   hookcmd.linter.command = my-command
>   hookcmd.linter.pre-commit-before = check-whitespace
> 
> which involves setting an extra key and checking it each time the hook is
> invoked without any benefit that I can see. I suspect which one seems more
> logical depends on how one thinks of setting hooks - I tend to think "I want
> to set a pre-commit hook" not "I want to set a git-secrets hook". If you've
> got an example where this indirection is helpful or necessary that would be
> really useful to see.

Thanks for sharing your workflow; as always, it's hard to understand the
ways others work differently from yourself, so I'm glad to hear from
you. Let me think some more on it and reply back again.

 - Emily


* Re: [RFC PATCH v2 0/2] configuration-based hook management
  2020-04-14 15:15                   ` [RFC PATCH v2 0/2] configuration-based hook management Phillip Wood
  2020-04-14 19:24                     ` Emily Shaffer
@ 2020-04-14 20:03                     ` Josh Steadmon
  2020-04-15 10:08                       ` Phillip Wood
  2020-04-14 20:32                     ` Jeff King
  2 siblings, 1 reply; 125+ messages in thread
From: Josh Steadmon @ 2020-04-14 20:03 UTC (permalink / raw)
  To: phillip.wood; +Cc: Emily Shaffer, git, Junio C Hamano, James Ramsay, Jeff King

On 2020.04.14 16:15, Phillip Wood wrote:
> Hi Emily
> 
> Thanks for working on this, having a way to manage multiple commands per
> hook without using an external framework would be really useful
> 
> On 14/04/2020 01:54, Emily Shaffer wrote:
> > Not much to look at compared to the original RFC I sent some months ago.
> > This implements Peff's suggestion of using the "hookcmd" section as a
> > layer of indirection.
> 
> I'm not really clear what the advantage of this indirection is. It seems
> unlikely to me that different hooks will share exactly the same command line
> or other options. In the 'git secrets' example earlier in this thread each
> hook needs to use a different command line. In general a command cannot tell
> which hook it is being invoked as without a flag of some kind. (In some
> cases it can use the number of arguments if that is different for each hook
> that it handles but that is not true in general)
> 
> Without the redirection one could have
>   hook.pre-commit.linter.command = my-command
>   hook.pre-commit.check-whitespace.command = 'git diff --check --cached'
> 
> and other keys can be added for ordering etc. e.g.
>   hook.pre-commit.linter.before = check-whitespace
> 
> With the indirection one needs to set
>   hook.pre-commit.command = linter
>   hook.pre-commit.check-whitespace = 'git diff --check --cached'
>   hookcmd.linter.command = my-command
>   hookcmd.linter.pre-commit-before = check-whitespace
> 
> which involves setting an extra key and checking it each time the hook is
> invoked without any benefit that I can see. I suspect which one seems more
> logical depends on how one thinks of setting hooks - I tend to think "I want
> to set a pre-commit hook" not "I want to set a git-secrets hook". If you've
> got an example where this indirection is helpful or necessary that would be
> really useful to see.
> 
> Best Wishes
> 
> Phillip

Indexing repo content (see [1] for a detailed discussion) is one use
case where you have a single command that runs identically from
post-commit, post-merge, and post-checkout.

Also, I suspect that many users don't have a firm enough grasp on the
various git hooks options to know ahead of time which ones they want to
set to accomplish a given task (without diving into the docs first). I'm
not trying to say that your workflow is incorrect, but my gut feeling is
that most Git users would work in the opposite direction. Every time I
have needed to automate something, I generally had a rough script in
place first, and then looked up which hook(s) would be appropriate
triggers for the script.


[1]: https://tbaggery.com/2011/08/08/effortless-ctags-with-git.html


* Re: [RFC PATCH v2 0/2] configuration-based hook management
  2020-04-14 19:24                     ` Emily Shaffer
@ 2020-04-14 20:27                       ` Jeff King
  2020-04-15 10:01                         ` Phillip Wood
  0 siblings, 1 reply; 125+ messages in thread
From: Jeff King @ 2020-04-14 20:27 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: phillip.wood, git, Junio C Hamano, James Ramsay

On Tue, Apr 14, 2020 at 12:24:18PM -0700, Emily Shaffer wrote:

> > Without the redirection one could have
> >   hook.pre-commit.linter.command = my-command
> >   hook.pre-commit.check-whitespace.command = 'git diff --check --cached'
> [...]
> We'd need to fudge one of these fields to include the extra section, I
> think. Unfortunate, because I find your example very tidy, but in
> practice maybe not very neat. The closest thing I can find to a nice way
> of writing it might be:
> 
>   [hook.pre-commit "linter"]
>     command = my-command
>     before = check-whitespace
>   [hook.pre-commit "check-whitespace"]
>     command = 'git diff --check --cached'

Syntactically the whole section between the outer dots is the
subsection. So it's:

  [hook "pre-commit.check-whitespace"]
  command = ...

And I don't think we want to change the config syntax at this point.
Even in the neater dotted notation, we must keep that whole thing as a
subsection, because existing subsections may contain dots, too.
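
As an illustration of that splitting rule, here is a small POSIX shell
sketch: the section is everything before the first dot, the key is
everything after the last dot, and the subsection is whatever sits
between them. The function name split_config_key is made up for this
example, and it assumes the input contains at least one dot.

```shell
# split_config_key KEY: print section, subsection and key name, one per
# line. Hypothetical helper; assumes KEY contains at least one dot.
split_config_key() {
	full=$1
	section=${full%%.*}	# up to the first dot
	name=${full##*.}	# after the last dot
	rest=${full#*.}		# drop "section."
	case "$rest" in
	*.*)	subsection=${rest%.*} ;;	# between the outer dots
	*)	subsection= ;;			# two-level key: no subsection
	esac
	printf '%s\n' "$section" "$subsection" "$name"
}

split_config_key hook.pre-commit.check-whitespace.command
```

The call above prints "hook", "pre-commit.check-whitespace" and
"command" on three lines, so the inner dot stays inside the subsection.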

> But this is kind of a lie; the sections aren't "hook", "pre-commit", and
> "linter" as you'd expect. Whether it's OK to lie like this, though, I
> don't know - I suspect it might make it awkward for others trying to
> parse the config. (my Vim syntax highlighter had kind of a hard time.)

I think we should avoid it if possible. There are some subtleties there,
like the fact that subsections are case-sensitive, but sections and keys
are not.
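
A short sketch of what that case rule means when comparing keys, in
POSIX shell: lowercase the section and the key name, and leave the
subsection alone. The function name normalize_config_key is
hypothetical, and the sketch assumes the key contains at least one dot.

```shell
# normalize_config_key KEY: lowercase the section and the key name,
# keep the subsection byte-for-byte. Hypothetical helper.
normalize_config_key() {
	full=$1
	section=$(printf '%s' "${full%%.*}" | tr '[:upper:]' '[:lower:]')
	name=$(printf '%s' "${full##*.}" | tr '[:upper:]' '[:lower:]')
	rest=${full#*.}
	case "$rest" in
	*.*)	printf '%s.%s.%s\n' "$section" "${rest%.*}" "$name" ;;
	*)	printf '%s.%s\n' "$section" "$name" ;;
	esac
}
```

With this, "Hook.Pre-Commit.Linter.Command" normalizes to
"hook.Pre-Commit.Linter.command": a hook name folded into the
subsection keeps its case, so "Pre-Commit" and "pre-commit" would be
different subsections.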

-Peff


* Re: [RFC PATCH v2 0/2] configuration-based hook management
  2020-04-14 15:15                   ` [RFC PATCH v2 0/2] configuration-based hook management Phillip Wood
  2020-04-14 19:24                     ` Emily Shaffer
  2020-04-14 20:03                     ` Josh Steadmon
@ 2020-04-14 20:32                     ` Jeff King
  2020-04-15 10:01                       ` Phillip Wood
  2 siblings, 1 reply; 125+ messages in thread
From: Jeff King @ 2020-04-14 20:32 UTC (permalink / raw)
  To: phillip.wood; +Cc: Emily Shaffer, git, Junio C Hamano, James Ramsay

On Tue, Apr 14, 2020 at 04:15:11PM +0100, Phillip Wood wrote:

> On 14/04/2020 01:54, Emily Shaffer wrote:
> > Not much to look at compared to the original RFC I sent some months ago.
> > This implements Peff's suggestion of using the "hookcmd" section as a
> > layer of indirection.
> 
> I'm not really clear what the advantage of this indirection is. It seems
> unlikely to me that different hooks will share exactly the same command line
> or other options. In the 'git secrets' example earlier in this thread each
> hook needs to use a different command line. In general a command cannot tell
> which hook it is being invoked as without a flag of some kind. (In some
> cases it can use the number of arguments if that is different for each hook
> that it handles but that is not true in general)
> 
> Without the redirection one could have
>   hook.pre-commit.linter.command = my-command
>   hook.pre-commit.check-whitespace.command = 'git diff --check --cached'
> 
> and other keys can be added for ordering etc. e.g.
>   hook.pre-commit.linter.before = check-whitespace
> 
> With the indirection one needs to set
>   hook.pre-commit.command = linter
>   hook.pre-commit.check-whitespace = 'git diff --check --cached'
>   hookcmd.linter.command = my-command
>   hookcmd.linter.pre-commit-before = check-whitespace

In the proposal I gave, you could do:

  hook.pre-commit.command = my-command
  hook.pre-commit.command = git diff --check --cached

If you want to refer to commands in ordering options (like your
"before"), then you'd have to refer to their names. For "my-command"
that's not too bad. For the longer one, it's a bit awkward. You _could_
do:

  hookcmd.my-command.before = git diff --check --cached

which is the same number of lines as yours. But I'd probably give it a
name, like:

  hookcmd.check-whitespace.command = git diff --check --cached
  hookcmd.my-command.before = check-whitespace

That's one more line than yours, but I think it separates the concerns
more clearly. And it extends naturally to more options specific to
check-whitespace.
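
A minimal Python sketch of how such a lookup could work, under stated
assumptions: each hook.<hookname>.command value is treated either as
the name of a hookcmd entry or, if no such entry exists, as a literal
command line. The function name and the (key, value) list format are
hypothetical, and nothing here is taken from the actual patches.

```python
def resolve_hook_commands(entries, hookname):
    """Return the command lines to run for `hookname`, in config order.

    `entries` is a list of (key, value) pairs in the order they were read.
    """
    prefix, suffix = "hookcmd.", ".command"
    hookcmds = {}
    for key, value in entries:
        if (key.startswith(prefix) and key.endswith(suffix)
                and len(key) > len(prefix) + len(suffix)):
            name = key[len(prefix):-len(suffix)]
            hookcmds[name] = value  # a later definition overrides an earlier one
    wanted = "hook.%s.command" % hookname
    # Assumption: a value naming a hookcmd is replaced by that hookcmd's
    # command; any other value is used as the command line itself.
    return [hookcmds.get(value, value) for key, value in entries if key == wanted]


entries = [
    ("hook.pre-commit.command", "my-command"),
    ("hook.pre-commit.command", "check-whitespace"),
    ("hookcmd.check-whitespace.command", "git diff --check --cached"),
]
print(resolve_hook_commands(entries, "pre-commit"))
# ['my-command', 'git diff --check --cached']
```

The one-off case stays a single line of config, while a named entry
gives later options (ordering and so on) something short to refer to.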

-Peff


* Re: [TOPIC 2/17] Hooks in the future
  2020-04-13 21:52               ` Jeff King
  2020-04-14  0:54                 ` [RFC PATCH v2 0/2] configuration-based hook management (was: [TOPIC 2/17] Hooks in the future) Emily Shaffer
@ 2020-04-15  3:45                 ` Jonathan Nieder
  2020-04-15 20:59                   ` Emily Shaffer
  2020-04-15 22:42                   ` [TOPIC 2/17] Hooks in the future Jeff King
  1 sibling, 2 replies; 125+ messages in thread
From: Jonathan Nieder @ 2020-04-15  3:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Emily Shaffer, Junio C Hamano, James Ramsay, git

Hi,

Jeff King wrote:
> On Mon, Apr 13, 2020 at 12:15:15PM -0700, Emily Shaffer wrote:
>> Jeff King wrote:

>>> Yeah, giving each block a unique name lets you give them each an order.
>>> It seems kind of weird to me that you'd define multiple hook types for a
>>> given name.
>>
>> Not so odd - git-secrets configures itself for pre-commit,
>> prepare-commit-msg, and commit-msg.
[...]
> Yeah, I do see how that use case makes sense. I wonder how common it is
> versus having separate one-off hooks.

I think separately from the frequency question, we should look at the
"what model do we want to present to the user" question.

It's not too unusual for a project with their source code in a Git
repository to have conventions they want to nudge users toward.  I'd
expect them to use a combination of hooks for this:

	prepare-commit-msg
	commit-msg
	pre-push

Git LFS installs multiple hooks:

	pre-push
	post-checkout
	post-commit
	post-merge

git-secrets installs multiple hooks, as already mentioned.

We've also had some instances over time of one hook replacing another,
to improve the interface.  A program wanting to install hooks would
then be likely to migrate from the older interface to the better one.

What I mean to get at is that, rather than thinking of them in terms of
individual hooks, the user model assumed by these programs is to think
of them as plugins hooking into Git.  The individual hooks are events
that the plugin listens on.  If I am trying to disable a plugin, I
don't want to have to learn which events it cared about.

>                                       And whether setting the order
> priority for all hooks at once is that useful (e.g., I can easily
> imagine a case where the pre-commit hook for program A must go before B,
> but it's the other way around for another hook).

This I agree about.  Actually I'm skeptical about ordering
dependencies being something that is meaningful for users to work with
in general, except in the case of closely cooperating hook authors.

That doesn't mean we shouldn't try to futureproof for that, but I
don't think we need to overfit on it.

[...]
>>> And it doesn't leave a lot of room for defining
>>> per-hook-type options; you have to make new keys like pre-push-order
>>> (though that does work because the hook names are a finite set that
>>> conforms to our config key names).

Exactly: field names like prePushOrder should work okay, even if
they're a bit noisy.

[...]
>>>   [hook "pre-receive"]
>>>   # put any pre-receive related options here; e.g., a rule for what to
>>>   # do with hook exit codes (e.g., stop running, run all but return exit
>>>   # code, ignore failures, etc)
>>>   fail = stop
>>
>> Interesting - so this is a default for all pre-receive hooks, that I can
>> set at whichever scope I wish.

If I have the mental model of "these are plugins, and particular hooks
are events they listen to", then it seems hard to make use of this
broader setting.

But scoped to a particular (plugin, event) pair it sounds very handy.

My two cents,
Jonathan


* Re: [RFC PATCH v2 0/2] configuration-based hook management
  2020-04-14 20:27                       ` Jeff King
@ 2020-04-15 10:01                         ` Phillip Wood
  0 siblings, 0 replies; 125+ messages in thread
From: Phillip Wood @ 2020-04-15 10:01 UTC (permalink / raw)
  To: Jeff King, Emily Shaffer; +Cc: phillip.wood, git, Junio C Hamano, James Ramsay

On 14/04/2020 21:27, Jeff King wrote:
> On Tue, Apr 14, 2020 at 12:24:18PM -0700, Emily Shaffer wrote:
> 
>>> Without the redirection one could have
>>>   hook.pre-commit.linter.command = my-command
>>>   hook.pre-commit.check-whitespace.command = 'git diff --check --cached'
>> [...]
>> We'd need to fudge one of these fields to include the extra section, I
>> think. Unfortunate, because I find your example very tidy, but in
>> practice maybe not very neat. The closest thing I can find to a nice way
>> of writing it might be:
>>
>>   [hook.pre-commit "linter"]
>>     command = my-command
>>     before = check-whitespace
>>   [hook.pre-commit "check-whitespace"]
>>     command = 'git diff --check --cached'
> 
> Syntactically the whole section between the outer dots is the
> subsection. So it's:
> 
>   [hook "pre-commit.check-whitespace"]
>   command = ...
> 
> And I don't think we want to change the config syntax at this point.
> Even in the neater dotted notation, we must keep that whole thing as a
> subsection, because existing subsections may contain dots, too.

Thanks for clarifying that, I agree we don't want to change the config
syntax and break existing subsections.

Best Wishes

Phillip

>> But this is kind of a lie; the sections aren't "hook", "pre-commit", and
>> "linter" as you'd expect. Whether it's OK to lie like this, though, I
>> don't know - I suspect it might make it awkward for others trying to
>> parse the config. (my Vim syntax highlighter had kind of a hard time.)
> 
> I think we should avoid it if possible. There are some subtleties there,
> like the fact that subsections are case-sensitive, but sections and keys
> are not.
>
> -Peff
> 



* Re: [RFC PATCH v2 0/2] configuration-based hook management
  2020-04-14 20:32                     ` Jeff King
@ 2020-04-15 10:01                       ` Phillip Wood
  2020-04-15 14:51                         ` Junio C Hamano
  0 siblings, 1 reply; 125+ messages in thread
From: Phillip Wood @ 2020-04-15 10:01 UTC (permalink / raw)
  To: Jeff King, phillip.wood; +Cc: Emily Shaffer, git, Junio C Hamano, James Ramsay

On 14/04/2020 21:32, Jeff King wrote:
> On Tue, Apr 14, 2020 at 04:15:11PM +0100, Phillip Wood wrote:
> 
>> On 14/04/2020 01:54, Emily Shaffer wrote:
>>> Not much to look at compared to the original RFC I sent some months ago.
>>> This implements Peff's suggestion of using the "hookcmd" section as a
>>> layer of indirection.
>>
>> I'm not really clear what the advantage of this indirection is. It seems
>> unlikely to me that different hooks will share exactly the same command line
>> or other options. In the 'git secrets' example earlier in this thread each
>> hook needs to use a different command line. In general a command cannot tell
>> which hook it is being invoked as without a flag of some kind. (In some
>> cases it can use the number of arguments if that is different for each hook
>> that it handles but that is not true in general)
>>
>> Without the redirection one could have
>>   hook.pre-commit.linter.command = my-command
>>   hook.pre-commit.check-whitespace.command = 'git diff --check --cached'
>>
>> and other keys can be added for ordering etc. e.g.
>>   hook.pre-commit.linter.before = check-whitespace
>>
>> With the indirection one needs to set
>>   hook.pre-commit.command = linter
>>   hook.pre-commit.check-whitespace = 'git diff --check --cached'
>>   hookcmd.linter.command = my-command
>>   hookcmd.linter.pre-commit-before = check-whitespace
> 
> In the proposal I gave, you could do:
> 
>   hook.pre-commit.command = my-command
>   hook.pre-commit.command = git diff --check --cached
> 
> If you want to refer to commands in ordering options (like your
> "before"), then you'd have to refer to their names. For "my-command"
> that's not too bad. For the longer one, it's a bit awkward. You _could_
> do:
> 
>   hookcmd.my-command.before = git diff --check --cached
> 
> which is the same number of lines as yours. But I'd probably give it a
> name, like:
> 
>   hookcmd.check-whitespace.command = git diff --check --cached
>   hookcmd.my-command.before = check-whitespace
> 
> That's one more line than yours, but I think it separates the concerns
> more clearly. And it extends naturally to more options specific to
> check-whitespace.

I agree that using a name rather than the command line makes things
clearer here.

Best Wishes

Phillip

> -Peff
> 



* Re: [RFC PATCH v2 0/2] configuration-based hook management
  2020-04-14 20:03                     ` Josh Steadmon
@ 2020-04-15 10:08                       ` Phillip Wood
  0 siblings, 0 replies; 125+ messages in thread
From: Phillip Wood @ 2020-04-15 10:08 UTC (permalink / raw)
  To: Josh Steadmon, phillip.wood, Emily Shaffer, git, Junio C Hamano,
	James Ramsay, Jeff King

On 14/04/2020 21:03, Josh Steadmon wrote:
> On 2020.04.14 16:15, Phillip Wood wrote:
>> Hi Emily
>>
>> Thanks for working on this, having a way to manage multiple commands per
>> hook without using an external framework would be really useful
>>
>> On 14/04/2020 01:54, Emily Shaffer wrote:
>>> Not much to look at compared to the original RFC I sent some months ago.
>>> This implements Peff's suggestion of using the "hookcmd" section as a
>>> layer of indirection.
>>
>> I'm not really clear what the advantage of this indirection is. It seems
>> unlikely to me that different hooks will share exactly the same command line
>> or other options. In the 'git secrets' example earlier in this thread each
>> hook needs to use a different command line. In general a command cannot tell
>> which hook it is being invoked as without a flag of some kind. (In some
>> cases it can use the number of arguments if that is different for each hook
>> that it handles but that is not true in general)
>>
>> Without the redirection one could have
>>   hook.pre-commit.linter.command = my-command
>>   hook.pre-commit.check-whitespace.command = 'git diff --check --cached'
>>
>> and other keys can be added for ordering etc. e.g.
>>   hook.pre-commit.linter.before = check-whitespace
>>
>> With the indirection one needs to set
>>   hook.pre-commit.command = linter
>>   hook.pre-commit.check-whitespace = 'git diff --check --cached'
>>   hookcmd.linter.command = my-command
>>   hookcmd.linter.pre-commit-before = check-whitespace
>>
>> which involves setting an extra key and checking it each time the hook is
>> invoked without any benefit that I can see. I suspect which one seems more
>> logical depends on how one thinks of setting hooks - I tend to think "I want
>> to set a pre-commit hook" not "I want to set a git-secrets hook". If you've
>> got an example where this indirection is helpful or necessary that would be
>> really useful to see.
>>
>> Best Wishes
>>
>> Phillip
> 
> Indexing repo content (see [1] for a detailed discussion) is one use
> case where you have a single command that runs identically from
> post-commit, post-merge, and post-checkout.

Thanks for sharing that, it is a useful reference point

> Also, I suspect that many users don't have a firm enough grasp on the
> various git hooks options to know ahead of time which ones they want to
> set to accomplish a given task (without diving into the docs first). 

I agree with this, especially as setting up a hook is probably an
infrequent task for most people

> I'm
> not trying to say that your workflow is incorrect, but my gut feeling is
> that most Git users would work in the opposite direction. Every time I
> have needed to automate something, I generally had a rough script in
> place first, and then looked up which hook(s) would be appropriate
> triggers for the script.

As you say, once they have a script they still have to look up which
hooks they want to hook it up to. The indirection does not avoid that;
it just means they have to look up how to set up a hookcmd as well as
which hooks they want to use.

Best Wishes

Phillip

> [1]: https://tbaggery.com/2011/08/08/effortless-ctags-with-git.html
> 



* Re: [RFC PATCH v2 0/2] configuration-based hook management
  2020-04-15 10:01                       ` Phillip Wood
@ 2020-04-15 14:51                         ` Junio C Hamano
  2020-04-15 20:30                           ` Emily Shaffer
  0 siblings, 1 reply; 125+ messages in thread
From: Junio C Hamano @ 2020-04-15 14:51 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Jeff King, phillip.wood, Emily Shaffer, git, James Ramsay

Phillip Wood <phillip.wood123@gmail.com> writes:

>> If you want to refer to commands in ordering options (like your
>> "before"), then you'd have to refer to their names. For "my-command"
>> that's not too bad. For the longer one, it's a bit awkward. You _could_
>> do:
>> 
>>   hookcmd.my-command.before = git diff --check --cached
>> 
>> which is the same number of lines as yours. But I'd probably give it a
>> name, like:
>> 
>>   hookcmd.check-whitespace.command = git diff --check --cached
>>   hookcmd.my-command.before = check-whitespace
>> 
>> That's one more line than yours, but I think it separates the concerns
>> more clearly. And it extends naturally to more options specific to
>> check-whitespace.
>
> I agree that using a name rather than the command line makes things
> clearer here

True.   

These ways call for a different attitude toward dealing with errors
than the approach of ordering them with numbers, though.

If your approach is to order by a number attached to each hook, the only
possible errors you'd need to worry about are (1) what to do when
the user forgets to give a number to a hook and (2) what to do when
the user gives the same number by accident to multiple hooks, and
both can even be made non-errors by declaring that an unnumbered
hook has a default number, and that two hooks with the same number
execute in an unspecified and unstable order.

On the other hand, the approach to specify relative ordering among
hooks can break more easily.  E.g. when a hook that used to be
before "my-command" got removed.  It is harder to find a "sensible"
default behaviour for such situations.
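
The contrast can be sketched in Python as follows. This is only an
illustration under assumptions: the default number (50), the function
names, and the data shapes are all hypothetical and not part of any
proposed patch. Numeric ordering has a total fallback, whereas relative
ordering needs an explicit check for references to hooks that no longer
exist.

```python
DEFAULT_ORDER = 50  # hypothetical default for hooks without a number


def order_by_number(hooks):
    """Sort (name, number-or-None) pairs; a missing number uses the default."""
    def sort_key(hook):
        name, number = hook
        return DEFAULT_ORDER if number is None else number
    # sorted() is stable, so ties keep config order here; a real
    # implementation would be free to leave tie order unspecified.
    return [name for name, _ in sorted(hooks, key=sort_key)]


def dangling_before_refs(before, configured):
    """Return sorted (hook, target) pairs whose "before" target is missing."""
    return sorted(
        (hook, target)
        for hook, target in before.items()
        if target not in configured
    )


print(order_by_number([("linter", 10), ("ctags", None), ("whitespace", 5)]))
# ['whitespace', 'linter', 'ctags']
print(dangling_before_refs({"my-command": "check-whitespace"}, {"my-command"}))
# [('my-command', 'check-whitespace')]
```

What to do with a non-empty result from the second function (error
out, warn, or ignore the constraint) is exactly the policy question
raised here.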

I am perfectly fine with having more possible error cases rather than
allowing a misconfigured system to silently do a wrong thing, so...





* Re: [RFC PATCH v2 0/2] configuration-based hook management
  2020-04-15 14:51                         ` Junio C Hamano
@ 2020-04-15 20:30                           ` Emily Shaffer
  2020-04-15 22:19                             ` Junio C Hamano
  0 siblings, 1 reply; 125+ messages in thread
From: Emily Shaffer @ 2020-04-15 20:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood, Jeff King, phillip.wood, git, James Ramsay

On Wed, Apr 15, 2020 at 07:51:14AM -0700, Junio C Hamano wrote:
> 
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
> >> If you want to refer to commands in ordering options (like your
> >> "before"), then you'd have to refer to their names. For "my-command"
> >> that's not too bad. For the longer one, it's a bit awkward. You _could_
> >> do:
> >> 
> >>   hookcmd.my-command.before = git diff --check --cached
> >> 
> >> which is the same number of lines as yours. But I'd probably give it a
> >> name, like:
> >> 
> >>   hookcmd.check-whitespace.command = git diff --check --cached
> >>   hookcmd.my-command.before = check-whitespace
> >> 
> >> That's one more line than yours, but I think it separates the concerns
> >> more clearly. And it extends naturally to more options specific to
> >> check-whitespace.
> >
> > I agree that using a name rather than the command line makes things
> > clearer here
> 
> True.   
> 
> These ways call for a different attitude to deal with errors
> compared to the approach to order them with numbers, though.  
> 
> If your approach is to order by number attached to each hook, only
> possible errors you'd need to worry about are (1) what to do when
> the user forgets to give a number to a hook and (2) what to do when
> the user gives the same number by accident to multiple hooks, and
> both can even be made non-errors by declaring that an unnumbered
> hook has a default number, and that two hooks with the same number
> execute in an unspecified and unstable order.
> 
> On the other hand, the approach to specify relative ordering among
> hooks can break more easily.  E.g. when a hook that used to be
> before "my-command" got removed.  It is harder to find a "sensible"
> default behaviour for such situations.

To be clear, the examples listed (both numbered order and relational
order) were more for illustration purposes. At the contributor summit, I
think Peff's suggestion was to stick with config ordering until we
discover something more robust is needed, which is fine by me. At that
time, I don't see a problem with doing something like:

[hook]
  ordering = numerical

[hookcmd "my-command"]
  command = ~/my-command.sh
  order = 001

(which means others can still rely on config ordering if they want.)

Or, to put it another way, I don't think we need to solve the config
ordering problem today - as long as we don't make it impossible for us
to change tomorrow :)

 - Emily


* Re: [TOPIC 2/17] Hooks in the future
  2020-04-15  3:45                 ` [TOPIC 2/17] Hooks in the future Jonathan Nieder
@ 2020-04-15 20:59                   ` Emily Shaffer
  2020-04-20 23:53                     ` [PATCH] doc: propose hooks managed by the config Emily Shaffer
  2020-04-15 22:42                   ` [TOPIC 2/17] Hooks in the future Jeff King
  1 sibling, 1 reply; 125+ messages in thread
From: Emily Shaffer @ 2020-04-15 20:59 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jeff King, Junio C Hamano, James Ramsay, git

On Tue, Apr 14, 2020 at 08:45:50PM -0700, Jonathan Nieder wrote:
> 
> Hi,
> 
> Jeff King wrote:
> > On Mon, Apr 13, 2020 at 12:15:15PM -0700, Emily Shaffer wrote:
> >> Jeff King wrote:
> 
> >>> Yeah, giving each block a unique name lets you give them each an order.
> >>> It seems kind of weird to me that you'd define multiple hook types for a
> >>> given name.
> >>
> >> Not so odd - git-secrets configures itself for pre-commit,
> >> prepare-commit-msg, and commit-msg.
> [...]
> > Yeah, I do see how that use case makes sense. I wonder how common it is
> > versus having separate one-off hooks.
> 
> I think separately from the frequency question, we should look at the
> "what model do we want to present to the user" question.
> 
> It's not too unusual for a project with their source code in a Git
> repository to have conventions they want to nudge users toward.  I'd
> expect them to use a combination of hooks for this:
> 
> 	prepare-commit-msg
> 	commit-msg
> 	pre-push
> 
> Git LFS installs multiple hooks:
> 
> 	pre-push
> 	post-checkout
> 	post-commit
> 	post-merge
> 
> git-secrets installs multiple hooks, as already mentioned.
> 
> We've also had some instances over time of one hook replacing another,
> to improve the interface.  A program wanting to install hooks would
> then be likely to migrate from the older interface to the better one.

I find this argument particularly compelling :)

> 
> What I mean to get at is that, rather than thinking of them in terms of
> individual hooks, the user model assumed by these programs is to think
> of them as plugins hooking into Git.  The individual hooks are events
> that the plugin listens on.  If I am trying to disable a plugin, I
> don't want to have to learn which events it cared about.
> 
> >                                       And whether setting the order
> > priority for all hooks at once is that useful (e.g., I can easily
> > imagine a case where the pre-commit hook for program A must go before B,
> > but it's the other way around for another hook).
> 
> This I agree about.  Actually I'm skeptical about ordering
> dependencies being something that is meaningful for users to work with
> in general, except in the case of closely cooperating hook authors.
> 
> That doesn't mean we shouldn't try to futureproof for that, but I
> don't think we need to overfit on it.
> 
> [...]
> >>> And it doesn't leave a lot of room for defining
> >>> per-hook-type options; you have to make new keys like pre-push-order
> >>> (though that does work because the hook names are a finite set that
> >>> conforms to our config key names).
> 
> Exactly: field names like prePushOrder should work okay, even if
> they're a bit noisy.
> 
> [...]
> >>>   [hook "pre-receive"]
> >>>   # put any pre-receive related options here; e.g., a rule for what to
> >>>   # do with hook exit codes (e.g., stop running, run all but return exit
> >>>   # code, ignore failures, etc)
> >>>   fail = stop
> >>
> >> Interesting - so this is a default for all pre-receive hooks, that I can
> >> set at whichever scope I wish.
> 
> If I have the mental model of "these are plugins, and particular hooks
> are events they listen to", then it seems hard to make use of this
> broader setting.
> 
> But scoped to a particular (plugin, event) pair it sounds very handy.

Striking out on finding another place to fit into the thread, I wonder
if the reason some of us are thinking "I'm going to write a pre-receive
hook" rather than "I'm going to write a linter hook" may be because of
the prior single-script-per-hook limitation. As a result, when you want
to add another function to your hook, you think, "I'll modify my
pre-receive hook". I think part of this RFC is a subtle paradigm shift
away from hooks-as-units-of-work and towards hooks-as-events.

That observation doesn't really provide much guidance though, except
maybe to point out we should think about what the glossary entries would
say for terms like "hook" and "hook command" now... and I think figuring
out those definitions might help us settle on what is most logical in
the config.

(That makes me think I had better write a design doc next, before I get
too much further with RFC patches. I made one pass at one a while ago,
but it was more focused on history and choosing between alternatives;
since we seem to have agreed on an approach, I'll make another attempt
focusing on design and definition instead. I'll try to have something to
the list by next week.)

 - Emily


* Re: [RFC PATCH v2 0/2] configuration-based hook management
  2020-04-15 20:30                           ` Emily Shaffer
@ 2020-04-15 22:19                             ` Junio C Hamano
  0 siblings, 0 replies; 125+ messages in thread
From: Junio C Hamano @ 2020-04-15 22:19 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Phillip Wood, Jeff King, phillip.wood, git, James Ramsay

Emily Shaffer <emilyshaffer@google.com> writes:

> Or, to put it another way, I don't think we need to solve the config
> ordering problem today - as long as we don't make it impossible for us
> to change tomorrow :)

OK.


* Re: [TOPIC 2/17] Hooks in the future
  2020-04-15  3:45                 ` [TOPIC 2/17] Hooks in the future Jonathan Nieder
  2020-04-15 20:59                   ` Emily Shaffer
@ 2020-04-15 22:42                   ` Jeff King
  2020-04-15 22:48                     ` Emily Shaffer
  1 sibling, 1 reply; 125+ messages in thread
From: Jeff King @ 2020-04-15 22:42 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Emily Shaffer, Junio C Hamano, James Ramsay, git

On Tue, Apr 14, 2020 at 08:45:50PM -0700, Jonathan Nieder wrote:

> > Yeah, I do see how that use case makes sense. I wonder how common it is
> > versus having separate one-off hooks.
> 
> I think separately from the frequency question, we should look at the
> "what model do we want to present to the user" question.

I sort of agree. The mental model is important, but we should avoid
presenting a model that is overly complex to a user who only wants to do
simple things. So how common that simple thing is impacts the answer to
your question.

> [...]
> What I mean to get at is that, rather than thinking of them in terms of
> individual hooks, the user model assumed by these programs is to think
> of them as plugins hooking into Git.  The individual hooks are events
> that the plugin listens on.  If I am trying to disable a plugin, I
> don't want to have to learn which events it cared about.

Sure, I agree that's a perfectly reasonable mental model. But for
somebody who just wants to do a one-off hook, they're now saddled with a
thing they don't care about: defining a plugin group for their hook.

The examples you gave are all reasonable, but personally I've never used
anything other than one-off hooks.

On the other hand, I've very rarely used hooks at all myself.

To be clear, I don't _really_ care all that much, and this isn't a hill
I particularly care to die on. I was mostly just clarifying my earlier
suggestion. (I _am_ somewhat amazed that the simple concept of "I would
like to run this shell command instead of $GIT_DIR/hooks/foo" has
generated so much discussion. So really I am in favor of whatever lets
me stop thinking about this as soon as possible).

> >                                       And whether setting the order
> > priority for all hooks at once is that useful (e.g., I can easily
> > imagine a case where the pre-commit hook for program A must go before B,
> > but it's the other way around for another hook).
> 
> This I agree about.  Actually I'm skeptical about ordering
> dependencies being something that is meaningful for users to work with
> in general, except in the case of closely cooperating hook authors.
>
> That doesn't mean we shouldn't try to futureproof for that, but I
> don't think we need to overfit on it.

I share that skepticism (and also agree that avoiding painting ourselves
into a corner is the main thing).

> >>> And it doesn't leave a lot of room for defining
> >>> per-hook-type options; you have to make new keys like pre-push-order
> >>> (though that does work because the hook names are a finite set that
> >>> conforms to our config key names).
> 
> Exactly: field names like prePushOrder should work okay, even if
> they're a bit noisy.

A side note:

Here you've done a custom munging of pre-push into prePush. I'm fine
with that, but would we ever want to allow third-party scripts to define
their own hooks using this mechanism? E.g., if there's a git-hooks
command could I run "git hooks run foo" to run the foo hook? If so, then
it might be simpler to just use the name as-is rather than defining the
exact munging rules.

-Peff


* Re: [TOPIC 2/17] Hooks in the future
  2020-04-15 22:42                   ` [TOPIC 2/17] Hooks in the future Jeff King
@ 2020-04-15 22:48                     ` Emily Shaffer
  2020-04-15 22:57                       ` Jeff King
  0 siblings, 1 reply; 125+ messages in thread
From: Emily Shaffer @ 2020-04-15 22:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, Junio C Hamano, James Ramsay, git

On Wed, Apr 15, 2020 at 06:42:44PM -0400, Jeff King wrote:
> 
> On Tue, Apr 14, 2020 at 08:45:50PM -0700, Jonathan Nieder wrote:
> > >>> And it doesn't leave a lot of room for defining
> > >>> per-hook-type options; you have to make new keys like pre-push-order
> > >>> (though that does work because the hook names are a finite set that
> > >>> conforms to our config key names).
> > 
> > Exactly: field names like prePushOrder should work okay, even if
> > they're a bit noisy.
> 
> A side note:
> 
> Here you've done a custom munging of pre-push into prePush. I'm fine
> with that, but would we ever want to allow third-party scripts to define
> their own hooks using this mechanism? E.g., if there's a git-hooks
> command could I run "git hooks run foo" to run the foo hook? If so, then
> it might be simpler to just use the name as-is rather than defining the
> exact munging rules.

I did envision that kind of thing, or at very least something like
`git hook --list --porcelain foo | xargs -n 1 sh -c`. When I saw
Jonathan's suggestion I wondered if using the hookname as is (pre-push)
was not idiomatic to the config, and maybe I should change it. But I
would rather leave it identical to the hookname, personally.

 - Emily


* Re: [TOPIC 2/17] Hooks in the future
  2020-04-15 22:48                     ` Emily Shaffer
@ 2020-04-15 22:57                       ` Jeff King
  0 siblings, 0 replies; 125+ messages in thread
From: Jeff King @ 2020-04-15 22:57 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Jonathan Nieder, Junio C Hamano, James Ramsay, git

On Wed, Apr 15, 2020 at 03:48:52PM -0700, Emily Shaffer wrote:

> > Here you've done a custom munging of pre-push into prePush. I'm fine
> > with that, but would we ever want to allow third-party scripts to define
> > their own hooks using this mechanism? E.g., if there's a git-hooks
> > command could I run "git hooks run foo" to run the foo hook? If so, then
> > it might be simpler to just use the name as-is rather than defining the
> > exact munging rules.
> 
> I did envision that kind of thing, or at very least something like
> `git hook --list --porcelain foo | xargs -n 1 sh -c`. When I saw
> Jonathan's suggestion I wondered if using the hookname as is (pre-push)
> was not idiomatic to the config, and maybe I should change it. But I
> would rather leave it identical to the hookname, personally.

You do still have to communicate to users of git-hook that their hook
names are limited to the characters used in config keys. But that seems
simpler to me than describing any special dash-and-capitalization
conversion.
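
As a rough Python sketch of that restriction: Git's documented rule for
config variable names is that they consist only of alphanumeric
characters and '-', and start with an alphabetic character. The
assumption here is that a hook name embedded in a key (as in
pre-push-order) must satisfy the same rule; the function name is
hypothetical.

```python
import re

# Alphabetic first character, then alphanumerics or '-'.
_CONFIG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_valid_hook_name(name):
    """Return True if `name` could be used verbatim inside a config key."""
    return _CONFIG_NAME.fullmatch(name) is not None


for candidate in ("pre-push", "foo", "my_hook", "2fast"):
    print(candidate, is_valid_hook_name(candidate))
# pre-push True
# foo True
# my_hook False
# 2fast False
```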

-Peff


* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-18 10:18       ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Jeff King
  2020-03-18 18:26         ` Re*: " Junio C Hamano
  2020-03-18 21:28         ` Taylor Blau
@ 2020-04-17  9:41         ` Christian Couder
  2020-04-17 17:40           ` Taylor Blau
  2 siblings, 1 reply; 125+ messages in thread
From: Christian Couder @ 2020-04-17  9:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, James Ramsay

Hi Taylor and Peff,

On Wed, Mar 18, 2020 at 11:18 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Mar 17, 2020 at 02:39:05PM -0600, Taylor Blau wrote:
>
> > Of course, I would be happy to send along our patches. They are included
> > in the series below, and correspond roughly to what we are running at
> > GitHub. (For us, there have been a few more clean-ups and additional
> > patches, but I squashed them into 2/2 below).

Thanks for the patches, and sorry for the delay in responding!

> > The approach is roughly that we have:
> >
> >   - 'uploadpack.filter.allow' -> specifying the default for unspecified
> >     filter choices, itself defaulting to true in order to maintain
> >     backwards compatibility, and
> >
> >   - 'uploadpack.filter.<filter>.allow' -> specifying whether or not each
> >     filter kind is allowed or not. (Originally this was given as 'git
> >     config uploadpack.filter=blob:none.allow true', but this '=' is
> >     ambiguous to configuration given over '-c', which itself uses an '='
> >     to separate keys from values.)
>
> One thing that's a little ugly here is the embedded dot in the
> subsection (i.e., "filter.<filter>"). It makes it look like a four-level
> key, but really there is no such thing in Git.  But everything else we
> tried was even uglier.
>
> I think we want to declare a real subsection for each filter and not
> just "uploadpack.filter.<filter>". That gives us room to expand to other
> config options besides "allow" later on if we need to.
>
> We don't want to claim "uploadpack.allow" and "uploadpack.<filter>.allow";
> that's too generic.
>
> Likewise "filter.allow" is too generic.
>
> We could do "uploadpackfilter.allow" and "uploadpackfilter.<filter>.allow",
> but that's both ugly _and_ separates these options from the rest of
> uploadpack.*.

What do you think about something like:

[promisorFilter "noBlobs"]
        type = blob:none
        uploadpack = true # maybe "allow" could also mean "true" here
        ...
?

> > I noted in the second patch that there is the unfortunate possibility of
> > encountering a SIGPIPE when trying to write the ERR sideband back to a
> > client who requested a non-supported filter. Peff and I have had some
> > discussion off-list about resurrecting SZEDER's work which makes room
> > in the buffer by reading one packet back from the client when the server
> > encounters a SIGPIPE. It is for this reason that I am marking the series
> > as 'RFC'.
>
> For reference, the patch I was thinking of was this:
>
>   https://lore.kernel.org/git/20190830121005.GI8571@szeder.dev/

Are you using the patches in this series with or without something
like the above patch? I am ok to resend this patch series including
the above patch (crediting Szeder) if you use something like it.

Thanks,
Christian.

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-04-17  9:41         ` Christian Couder
@ 2020-04-17 17:40           ` Taylor Blau
  2020-04-17 18:06             ` Jeff King
  2020-04-21 12:17             ` Christian Couder
  0 siblings, 2 replies; 125+ messages in thread
From: Taylor Blau @ 2020-04-17 17:40 UTC (permalink / raw)
  To: Christian Couder; +Cc: Jeff King, Taylor Blau, git, James Ramsay

On Fri, Apr 17, 2020 at 11:41:48AM +0200, Christian Couder wrote:
> Hi Taylor and Peff,
>
> On Wed, Mar 18, 2020 at 11:18 AM Jeff King <peff@peff.net> wrote:
> >
> > On Tue, Mar 17, 2020 at 02:39:05PM -0600, Taylor Blau wrote:
> >
> > > Of course, I would be happy to send along our patches. They are included
> > > in the series below, and correspond roughly to what we are running at
> > > GitHub. (For us, there have been a few more clean-ups and additional
> > > patches, but I squashed them into 2/2 below).
>
> Thanks for the patches, and sorry for the delay in responding!

No need to apologize. Clearly these had slipped my mind, too :).

> > > The approach is roughly that we have:
> > >
> > >   - 'uploadpack.filter.allow' -> specifying the default for unspecified
> > >     filter choices, itself defaulting to true in order to maintain
> > >     backwards compatibility, and
> > >
> > >   - 'uploadpack.filter.<filter>.allow' -> specifying whether or not each
> > >     filter kind is allowed or not. (Originally this was given as 'git
> > >     config uploadpack.filter=blob:none.allow true', but this '=' is
> > >     ambiguous to configuration given over '-c', which itself uses an '='
> > >     to separate keys from values.)
> >
> > One thing that's a little ugly here is the embedded dot in the
> > subsection (i.e., "filter.<filter>"). It makes it look like a four-level
> > key, but really there is no such thing in Git.  But everything else we
> > tried was even uglier.
> >
> > I think we want to declare a real subsection for each filter and not
> > just "uploadpack.filter.<filter>". That gives us room to expand to other
> > config options besides "allow" later on if we need to.
> >
> > We don't want to claim "uploadpack.allow" and "uploadpack.<filter>.allow";
> > that's too generic.
> >
> > Likewise "filter.allow" is too generic.
> >
> > We could do "uploadpackfilter.allow" and "uploadpackfilter.<filter>.allow",
> > but that's both ugly _and_ separates these options from the rest of
> > uploadpack.*.
>
> What do you think about something like:
>
> [promisorFilter "noBlobs"]
>         type = blob:none
>         uploadpack = true # maybe "allow" could also mean "true" here
>         ...
> ?

I'm not sure about introducing a layer of indirection here with
"noBlobs". It's nice that it could perhaps be enabled/disabled for
different builtins (e.g., by adding 'revList = false', say), but I'm not
convinced that this is improving all of those cases, either.

For example, what happens if I have something like:

  [uploadpack "filter.tree"]
    maxDepth = 1
    allow = true

but I want to use a different value of maxDepth for, say, rev-list? I'd
rather have two sections (each for the 'tree' filter, but scoped to
'upload-pack' and 'rev-list' separately) than write something like:

  [promisorFilter "treeDepth"]
          type = tree
          uploadpack = true
          uploadpackMaxDepth = 1
          revList = true
          revListMaxDepth = 0
          ...

So, yeah, the current system is not great because it has the '.' in the
second component. I am definitely eager to hear other suggestions about
naming it differently, but I think that the general structure is on
track.

One thing that I can think of (other than replacing the '.' with another
delimiting character other than '=') is renaming the key from
'uploadPack' to 'uploadPackFilter'. I believe that this was suggested by
Junio (?) earlier in the thread. I think that it's a fine resolution to
this, but I'm also not opposed to what is currently written in the two
patches above.

> > > I noted in the second patch that there is the unfortunate possibility of
> > > encountering a SIGPIPE when trying to write the ERR sideband back to a
> > > client who requested a non-supported filter. Peff and I have had some
> > > discussion off-list about resurrecting SZEDER's work which makes room
> > > in the buffer by reading one packet back from the client when the server
> > > encounters a SIGPIPE. It is for this reason that I am marking the series
> > > as 'RFC'.
> >
> > For reference, the patch I was thinking of was this:
> >
> >   https://lore.kernel.org/git/20190830121005.GI8571@szeder.dev/
>
> Are you using the patches in this series with or without something
> like the above patch? I am ok to resend this patch series including
> the above patch (crediting Szeder) if you use something like it.

We're not using them, but without them we suffer from a problem: if we
get a SIGPIPE when writing the "sorry, I don't support that filter"
message back to the client, then they won't receive it.

Szeder's patches help address that issue by catching the SIGPIPE and
popping off enough from the client buffer so that we can write the
message out before dying.

I appreciate your offer to resubmit the series on my behalf, but I was
already planning on doing this myself and wouldn't want to burden you
with another to-do. I'll be happy to take it on myself, probably within
a week or so.

> Thanks,
> Christian.

Thanks,
Taylor

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-04-17 17:40           ` Taylor Blau
@ 2020-04-17 18:06             ` Jeff King
  2020-04-21 12:34               ` Christian Couder
  2020-04-22 20:42               ` Taylor Blau
  2020-04-21 12:17             ` Christian Couder
  1 sibling, 2 replies; 125+ messages in thread
From: Jeff King @ 2020-04-17 18:06 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Christian Couder, git, James Ramsay

On Fri, Apr 17, 2020 at 11:40:30AM -0600, Taylor Blau wrote:

> > What do you think about something like:
> >
> > [promisorFilter "noBlobs"]
> >         type = blob:none
> >         uploadpack = true # maybe "allow" could also mean "true" here
> >         ...
> > ?
> 
> I'm not sure about introducing a layer of indirection here with
> "noBlobs". It's nice that it could perhaps be enabled/disabled for
> different builtins (e.g., by adding 'revList = false', say), but I'm not
> convinced that this is improving all of those cases, either.

Yeah, I don't like forcing the user to invent a subsection name. My
first thought was to suggest:

  [promisorFilter "blob:none"]
  uploadpack = true

but your tree example shows why that gets awkward: there are more keys
than just "allow this".

> One thing that I can think of (other than replacing the '.' with another
> delimiting character other than '=') is renaming the key from
> 'uploadPack' to 'uploadPackFilter'. I believe that this was suggested by

Yeah, that proposal isn't bad. To me the two viable options seem like:

 - uploadpack.filter.<filter>.*: this has the ugly fake multilevel
   subsection, but stays under uploadpack.*

 - uploadpackfilter.<filter>.*: more natural subsection, but not grouped
   syntactically with other uploadpack stuff

I am actually leaning towards the second. It should make the parsing
code less confusing, and it's not like there aren't already other config
sections that impact uploadpack.
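
To make the parsing difference concrete, here is a rough, standalone
sketch. It is not Git's actual config parser; `split_config_key()` and
`strip_filter_prefix()` are hypothetical helpers that split a key at its
first and last dot and compare section names case-sensitively, which is a
simplification. With "uploadpackfilter" the subsection is already the
filter name; with "uploadpack" the caller has to peel off a "filter."
prefix in a second step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pieces of a "section.subsection.variable" key; pointers alias the input. */
struct key_parts {
	const char *subsection;
	size_t subsection_len;
	const char *variable;	/* text after the last dot */
};

/*
 * Hypothetical helper: split `key` at its first and last dot. Everything
 * in between is the subsection, so dots inside it are preserved.
 * Returns 0 on success, -1 if `key` is not "<section>.<subsection>.<var>".
 */
int split_config_key(const char *key, const char *section,
		     struct key_parts *out)
{
	size_t section_len;
	const char *last_dot;

	assert(key && section && out);

	section_len = strlen(section);
	/* the key must begin with "<section>." */
	if (strncmp(key, section, section_len) || key[section_len] != '.')
		return -1;

	last_dot = strrchr(key, '.');
	/* only one dot means "<section>.<var>": no subsection at all */
	if (last_dot == key + section_len)
		return -1;

	out->subsection = key + section_len + 1;
	out->subsection_len = (size_t)(last_dot - out->subsection);
	out->variable = last_dot + 1;
	return 0;
}

/*
 * Extra step needed only for the "uploadpack.filter.<filter>.*" layout:
 * the subsection comes back as "filter.<filter>", so strip the prefix.
 * Returns a pointer to the filter name, or NULL if the prefix is absent.
 */
const char *strip_filter_prefix(const struct key_parts *parts,
				size_t *filter_len)
{
	static const char prefix[] = "filter.";
	size_t prefix_len = sizeof(prefix) - 1;

	assert(parts && filter_len);

	if (parts->subsection_len <= prefix_len ||
	    strncmp(parts->subsection, prefix, prefix_len))
		return NULL;
	*filter_len = parts->subsection_len - prefix_len;
	return parts->subsection + prefix_len;
}
```

For "uploadpackfilter.blob:none.allow" a single call to
`split_config_key()` yields the filter "blob:none" and the variable
"allow". For "uploadpack.filter.tree.maxDepth" the same call yields
"filter.tree", and only `strip_filter_prefix()` recovers "tree".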

> > > For reference, the patch I was thinking of was this:
> > >
> > >   https://lore.kernel.org/git/20190830121005.GI8571@szeder.dev/
> >
> > Are you using the patches in this series with or without something
> > like the above patch? I am ok to resend this patch series including
> > the above patch (crediting Szeder) if you use something like it.
> 
> We're not using them, but without them we suffer from a problem: if we
> get a SIGPIPE when writing the "sorry, I don't support that filter"
> message back to the client, then they won't receive it.
> 
> Szeder's patches help address that issue by catching the SIGPIPE and
> popping off enough from the client buffer so that we can write the
> message out before dying.

I definitely think we should pursue that patch, but it really can be
done orthogonally. It's an existing bug that affects other instances
where upload-pack returns an error. The tests can work around it with
"test_must_fail ok=sigpipe" in the meantime.

-Peff

* [PATCH] doc: propose hooks managed by the config
  2020-04-15 20:59                   ` Emily Shaffer
@ 2020-04-20 23:53                     ` Emily Shaffer
  2020-04-21  0:22                       ` Emily Shaffer
  2020-04-25 20:57                       ` brian m. carlson
  0 siblings, 2 replies; 125+ messages in thread
From: Emily Shaffer @ 2020-04-20 23:53 UTC (permalink / raw)
  To: git
  Cc: Emily Shaffer, Jeff King, Junio C Hamano, James Ramsay,
	Jonathan Nieder, brian m. carlson,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Josh Steadmon

Begin a design document for config-based hooks, managed via git-hook.
Focus on an overview of the implementation and motivation for design
decisions. Briefly discuss the alternatives considered before this
point. Also, attempt to redefine terms to fit into a multihook world.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
Hi all,

I wasn't sure whether it made more sense to leave the design doc in the
conversation or not, but I figured it fit well into the context. I tried
to also add relevant IDs to the "References" headers to this mail.

Hopefully this is complete enough that we can discuss it directly until
we feel comfortable getting ready for implementation. I'm planning to
send a reply today with some comments, too.

 - Emily

 Documentation/Makefile                        |   1 +
 .../technical/config-based-hooks.txt          | 317 ++++++++++++++++++
 2 files changed, 318 insertions(+)
 create mode 100644 Documentation/technical/config-based-hooks.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 8fe829cc1b..301111f236 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -79,6 +79,7 @@ SP_ARTICLES += $(API_DOCS)
 TECH_DOCS += MyFirstContribution
 TECH_DOCS += MyFirstObjectWalk
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += technical/config-based-hooks
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
diff --git a/Documentation/technical/config-based-hooks.txt b/Documentation/technical/config-based-hooks.txt
new file mode 100644
index 0000000000..38893423be
--- /dev/null
+++ b/Documentation/technical/config-based-hooks.txt
@@ -0,0 +1,317 @@
+Configuration-based hook management
+===================================
+
+== Motivation
+
+Treat hooks as a first-class citizen by replacing the .git/hooks/hookname path as
+the only source of hooks to execute, in a way which is friendly to users with
+multiple repos which have similar needs.
+
+Redefine "hook" as an event rather than a single script, allowing users to
+perform unrelated actions on a single event.
+
+Take a step closer to safety when copying zipped Git repositories from untrusted
+users.
+
+Make it easier for users to discover Git's hook feature and automate their
+workflows.
+
+== User interfaces
+
+=== Config schema
+
+Hooks can be introduced by editing the configuration manually. There are two new
+sections added, `hook` and `hookcmd`.
+
+==== `hook`
+
+Primarily contains subsections for each hook event. These subsections define
+hook command execution order; hook commands can be specified by passing the
+command directly if no additional configuration is needed, or by passing the
+name of a `hookcmd`. If Git does not find a `hookcmd` whose subsection matches
+the value of the given command string, Git will try to execute the string
+directly. Hook event subsections can also contain per-hook-event settings.
+
+Also contains top-level hook execution settings, for example,
+`hook.warnHookDir`, `hook.runHookDir`, or `hook.disableAll`.
+
+----
+[hook "pre-commit"]
+  command = perl-linter
+  command = /usr/bin/git-secrets --pre-commit
+
+[hook "pre-applypatch"]
+  command = perl-linter
+  error = ignore
+
+[hook]
+  warnHookDir = true
+  runHookDir = prompt
+----
+
+==== `hookcmd`
+
+Defines a hook command and its attributes, which will be used when a hook event
+occurs. Unqualified attributes are assumed to apply to this hook during all hook
+events, but event-specific attributes can also be supplied. The example runs
+`/usr/bin/lint-it --language=perl <args passed by Git>`, but for repos which
+include this config, the hook command will be skipped for all events to which
+it's normally subscribed _except_ `pre-commit`.
+
+----
+[hookcmd "perl-linter"]
+  command = /usr/bin/lint-it --language=perl
+  skip = true
+  pre-commit-skip = false
+----
+
+=== Command-line API
+
+Users should be able to view, reorder, and create hook commands via the command
+line. External tools should be able to view a list of hooks in the correct order
+to run.
+
+*`git hook list <hook-event>`*
+
+*`git hook list (--system|--global|--local|--worktree)`*
+
+*`git hook edit <hook-event>`*
+
+*`git hook add <hook-command> <hook-event> <options...>`*
+
+=== Hook editor
+
+The tool which is presented by `git hook edit <hook-command>`. Ideally, this
+tool should be easier to use than manually editing the config, and then produce
+a concise config afterwards. It may take a form similar to `git rebase
+--interactive`.
+
+== Implementation
+
+=== Library
+
+`hook.c` and `hook.h` are responsible for interacting with the config files. In
+the case when the code generating a hook event doesn't have special concerns
+about how to run the hooks, the hook library will provide a basic API to call
+all hooks in config order with an `argv_array` provided by the code which
+generates the hook event:
+
+*`int run_hooks(const char *hookname, struct argv_array *args)`*
+
+This call includes the hook command provided by `run-command.h:find_hook()`;
+eventually, this legacy hook will be gated by a config `hook.runHookDir`. The
+config is checked against a number of cases:
+
+- "no": the legacy hook will not be run
+- "interactive": Git will prompt the user before running the legacy hook
+- "warn": Git will print a warning to stderr before running the legacy hook
+- "yes" (default): Git will silently run the legacy hook
+
+If `hook.runHookDir` is provided more than once, Git will use the most
+restrictive setting provided, for security reasons.
+
+If the caller wants to do something more complicated, the hook library can also
+provide a callback API:
+
+*`int for_each_hookcmd(const char *hookname, hookcmd_function *cb)`*
+
+Finally, to facilitate the builtin, the library will also provide the following
+APIs to interact with the config:
+
+----
+int set_hook_commands(const char *hookname, struct string_list *commands,
+	enum config_scope scope);
+int set_hookcmd(const char *hookcmd, struct hookcmd options);
+
+int list_hook_commands(const char *hookname, struct string_list *commands);
+int list_hooks_in_scope(enum config_scope scope, struct string_list *commands);
+----
+
+`struct hookcmd` is expected to grow in size over time as more functionality is
+added to hooks; so that other parts of the code don't need to understand the
+config schema, `struct hookcmd` should contain logical values instead of string
+pairs.
+
+----
+struct hookcmd {
+  const char *name;
+  const char *command;
+
+  /* for illustration only; not planned at present */
+  int parallelizable;
+  const char *hookcmd_before;
+  const char *hookcmd_after;
+  enum recovery_action on_fail;
+};
+----
+
+=== Builtin
+
+`builtin/hook.c` is responsible for providing the frontend. It's responsible for
+formatting user-provided data and then calling the library API to set the
+configs as appropriate. The builtin frontend is not responsible for calling the
+config directly, so that other areas of Git can rely on the hook library to
+understand the most recent config schema for hooks.
+
+=== Migration path
+
+==== Stage 0
+
+Hooks are called by running `run-command.h:find_hook()` with the hookname and
+executing the result. The hook library and builtin do not exist. Hooks only
+exist as specially named scripts within `.git/hooks/`.
+
+==== Stage 1
+
+`git hook list --porcelain <hook-event>` is implemented. Users can replace their
+`.git/hooks/<hook-event>` scripts with a trampoline based on `git hook list`'s
+output. Modifier commands like `git hook add` and `git hook edit` can be
+implemented around this time as well.
+
+==== Stage 2
+
+`hook.h:run_hooks()` is taught to include `run-command.h:find_hook()` at the
+end; calls to `find_hook()` are replaced with calls to `run_hooks()`. Users can
+opt-in to config-based hooks simply by creating some in their config; otherwise
+users should remain unaffected by the change.
+
+==== Stage 3
+
+The call to `find_hook()` inside of `run_hooks()` learns to check for a config,
+`hook.runHookDir`. Users can opt into managing their hooks completely via the
+config this way.
+
+==== Stage 4
+
+`.git/hooks` is removed from the template and the hook directory is considered
+deprecated. To avoid breaking older repos, the default of `hook.runHookDir` is
+not changed, and `find_hook()` is not removed.
+
+== Caveats
+
+=== Security and repo config
+
+Part of the motivation behind this refactor is to mitigate hooks as an attack
+vector;footnote:[https://lore.kernel.org/git/20171002234517.GV19555@aiede.mtv.corp.google.com/]
+however, as the design stands, users can still provide hooks in the repo-level
+config, which is included when a repo is zipped and sent elsewhere.  The
+security of the repo-level config is still under discussion; this design
+generally assumes the repo-level config is secure, which is not true yet. The
+goal is to avoid an overcomplicated design to work around a problem which has
+ceased to exist.
+
+=== Ease of use
+
+The config schema is nontrivial; that's why it's important for the `git hook`
+modifier commands to be usable. Contributors with UX expertise are encouraged to
+share their suggestions.
+
+== Alternative approaches
+
+A previous summary of alternatives exists in the
+archives.footnote:[https://lore.kernel.org/git/20191116011125.GG22855@google.com]
+
+=== Status quo
+
+Today users can implement multihooks themselves by using a "trampoline script"
+as their hook, and pointing that script to a directory or list of other scripts
+they wish to run.
+
+=== Hook directories
+
+Other contributors have suggested Git learn about the existence of a directory
+such as `.git/hooks/<hookname>.d` and execute those hooks in alphabetical order.
+
+=== Comparison table
+
+.Comparison of alternatives
+|===
+|Feature |Config-based hooks |Hook directories |Status quo
+
+|Supports multiple hooks
+|Natively
+|Natively
+|With user effort
+
+|Safer for zipped repos
+|A little
+|No
+|No
+
+|Previous hooks just work
+|If configured
+|Yes
+|Yes
+
+|Can install one hook to many repos
+|Yes
+|No
+|No
+
+|Discoverability
+|Better (in `git help git`)
+|Same as before
+|Same as before
+
+|Hard to run unexpected hook
+|If configured
+|No
+|No
+|===
+
+== Future work
+
+=== Execution ordering
+
+We may find that config order is insufficient for some users; for example,
+config order makes it difficult to add a new hook to the system or global config
+which runs at the end of the hook list. A new ordering schema should be:
+
+1) Specified by a `hook.order` config, so that users will not unexpectedly see
+their order change;
+
+2) Either dependency or numerically based.
+
+Dependency-based ordering is prone to classic linked-list problems, like
+cycles and handling of missing dependencies. But, it paves the way for enabling
+parallelization if some tasks truly depend on others.
+
+Numerical ordering makes it tricky for Git to generate suggested ordering
+numbers for each command, but is easy to determine a definitive order.
+
+=== Parallelization
+
+Users with many hooks might want to run them simultaneously, if the hooks don't
+modify state; if one hook depends on another's output, then users will want to
+specify those dependencies. If we decide to solve this problem, we may want to
+look to modern build systems for inspiration on how to manage dependencies and
+parallel tasks.
+
+=== Securing hookdir hooks
+
+With the design as written in this doc, it's still possible for a malicious user
+to modify `.git/config` to include `hook.pre-receive.command = rm -rf /`, then
+zip their repo and send it to another user. It may be necessary to teach Git to
+only allow one-line hooks like this if they were configured outside of the local
+scope; or another approach, like a list of safe projects, might be useful. It
+may also be sufficient (or at least useful) to teach a `hook.disableAll` config
+or similar flag to the Git executable.
+
+=== Submodule inheritance
+
+It's possible some submodules may want to run the identical set of hooks that
+their superrepo runs. While a globally-configured hook set is helpful, it's not
+a great solution for users who have multiple repos-with-submodules under the
+same user. It would be useful for submodules to learn how to run hooks from
+their superrepo's config, or inherit that hook setting.
+
+== Glossary
+
+*hook event*
+
+A point during Git's execution where user scripts may be run, for example,
+_prepare-commit-msg_ or _pre-push_.
+
+*hook command*
+
+A user script or executable which will be run on one or more hook events.
-- 
2.26.1.301.g55bc3eb7cb9-goog


* Re: [PATCH] doc: propose hooks managed by the config
  2020-04-20 23:53                     ` [PATCH] doc: propose hooks managed by the config Emily Shaffer
@ 2020-04-21  0:22                       ` Emily Shaffer
  2020-04-21  1:20                         ` Junio C Hamano
  2020-04-25 20:57                       ` brian m. carlson
  1 sibling, 1 reply; 125+ messages in thread
From: Emily Shaffer @ 2020-04-21  0:22 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Junio C Hamano, James Ramsay, Jonathan Nieder,
	brian m. carlson, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Josh Steadmon

On Mon, Apr 20, 2020 at 04:53:10PM -0700, Emily Shaffer wrote:
> 
> Begin a design document for config-based hooks, managed via git-hook.
> Focus on an overview of the implementation and motivation for design
> decisions. Briefly discuss the alternatives considered before this
> point. Also, attempt to redefine terms to fit into a multihook world.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> Hi all,
> 
> I wasn't sure whether it made more sense to leave the design doc in the
> conversation or not, but I figured it fit well into the context. I tried
> to also add relevant IDs to the "References" headers to this mail.
> 
> Hopefully this is complete enough that we can discuss it directly until
> we feel comfortable getting ready for implementation. I'm planning to
> send a reply today with some comments, too.
> 
>  - Emily
> 
>  Documentation/Makefile                        |   1 +
>  .../technical/config-based-hooks.txt          | 317 ++++++++++++++++++
>  2 files changed, 318 insertions(+)
>  create mode 100644 Documentation/technical/config-based-hooks.txt
> 
> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index 8fe829cc1b..301111f236 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -79,6 +79,7 @@ SP_ARTICLES += $(API_DOCS)
>  TECH_DOCS += MyFirstContribution
>  TECH_DOCS += MyFirstObjectWalk
>  TECH_DOCS += SubmittingPatches
> +TECH_DOCS += technical/config-based-hooks
>  TECH_DOCS += technical/hash-function-transition
>  TECH_DOCS += technical/http-protocol
>  TECH_DOCS += technical/index-format
> diff --git a/Documentation/technical/config-based-hooks.txt b/Documentation/technical/config-based-hooks.txt
> new file mode 100644
> index 0000000000..38893423be
> --- /dev/null
> +++ b/Documentation/technical/config-based-hooks.txt
> @@ -0,0 +1,317 @@
> +Configuration-based hook management
> +===================================
> +
> +== Motivation
> +
> +Treat hooks as a first-class citizen by replacing the .git/hooks/hookname path as
> +the only source of hooks to execute, in a way which is friendly to users with
> +multiple repos which have similar needs.
> +
> +Redefine "hook" as an event rather than a single script, allowing users to
> +perform unrelated actions on a single event.
> +
> +Take a step closer to safety when copying zipped Git repositories from untrusted
> +users.
> +
> +Make it easier for users to discover Git's hook feature and automate their
> +workflows.
> +
> +== User interfaces
> +
> +=== Config schema
> +
> +Hooks can be introduced by editing the configuration manually. There are two new
> +sections added, `hook` and `hookcmd`.
> +
> +==== `hook`
> +
> +Primarily contains subsections for each hook event. These subsections define
> +hook command execution order; hook commands can be specified by passing the
> +command directly if no additional configuration is needed, or by passing the
> +name of a `hookcmd`. If Git does not find a `hookcmd` whose subsection matches
> +the value of the given command string, Git will try to execute the string
> +directly. Hook event subsections can also contain per-hook-event settings.
> +
> +Also contains top-level hook execution settings, for example,
> +`hook.warnHookDir`, `hook.runHookDir`, or `hook.disableAll`.
> +
> +----
> +[hook "pre-commit"]
> +  command = perl-linter
> +  command = /usr/bin/git-secrets --pre-commit
> +
> +[hook "pre-applypatch"]
> +  command = perl-linter
> +  error = ignore
> +
> +[hook]
> +  warnHookDir = true
> +  runHookDir = prompt

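Whoops, just realized this doesn't match the proposal below: "prompt" isn't
one of the `hook.runHookDir` values listed in the Library section ("no",
"interactive", "warn", "yes"). Wrote these on different days :)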

> +----
> +
> +==== `hookcmd`
> +
> +Defines a hook command and its attributes, which will be used when a hook event
> +occurs. Unqualified attributes are assumed to apply to this hook during all hook
> +events, but event-specific attributes can also be supplied. The example runs
> +`/usr/bin/lint-it --language=perl <args passed by Git>`, but for repos which
> +include this config, the hook command will be skipped for all events to which
> +it's normally subscribed _except_ `pre-commit`.
> +
> +----
> +[hookcmd "perl-linter"]
> +  command = /usr/bin/lint-it --language=perl
> +  skip = true
> +  pre-commit-skip = false
> +----
> +
> +=== Command-line API
> +
> +Users should be able to view, reorder, and create hook commands via the command
> +line. External tools should be able to view a list of hooks in the correct order
> +to run.
> +
> +*`git hook list <hook-event>`*
> +
> +*`git hook list (--system|--global|--local|--worktree)`*
> +
> +*`git hook edit <hook-event>`*
> +
> +*`git hook add <hook-command> <hook-event> <options...>`*
> +
> +=== Hook editor
> +
> +The tool which is presented by `git hook edit <hook-command>`. Ideally, this
> +tool should be easier to use than manually editing the config, and then produce
> +a concise config afterwards. It may take a form similar to `git rebase
> +--interactive`.

This section is a little thin because I'm hoping to meet with some UX
folks on our end and get a better suggestion. Suggestions welcome from
upstream too - I'm having trouble coming up with anything that's better
than modifying the config directly.

> +
> +== Implementation
> +
> +=== Library
> +
> +`hook.c` and `hook.h` are responsible for interacting with the config files. In
> +the case when the code generating a hook event doesn't have special concerns
> +about how to run the hooks, the hook library will provide a basic API to call
> +all hooks in config order with an `argv_array` provided by the code which
> +generates the hook event:
> +
> +*`int run_hooks(const char *hookname, struct argv_array *args)`*
> +
> +This call includes the hook command provided by `run-command.h:find_hook()`;
> +eventually, this legacy hook will be gated by a config `hook.runHookDir`. The
> +config is checked against a number of cases:
> +
> +- "no": the legacy hook will not be run
> +- "interactive": Git will prompt the user before running the legacy hook
> +- "warn": Git will print a warning to stderr before running the legacy hook
> +- "yes" (default): Git will silently run the legacy hook
> +
> +If `hook.runHookDir` is provided more than once, Git will use the most
> +restrictive setting provided, for security reasons.
> +
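
For illustration, the "most restrictive setting wins" rule could be sketched
as below. This is only a sketch under assumptions: the enum, its ordering, and
the function names are hypothetical and not part of the patch, and treating an
unrecognized value as "no" is an assumed fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ranking: a lower value is more restrictive. */
enum run_hookdir {
	RUN_HOOKDIR_NO = 0,
	RUN_HOOKDIR_INTERACTIVE,
	RUN_HOOKDIR_WARN,
	RUN_HOOKDIR_YES
};

/* Map one config value to its rank; unknown values count as "no" (assumption). */
enum run_hookdir parse_run_hookdir(const char *value)
{
	assert(value);
	if (!strcmp(value, "yes"))
		return RUN_HOOKDIR_YES;
	if (!strcmp(value, "warn"))
		return RUN_HOOKDIR_WARN;
	if (!strcmp(value, "interactive"))
		return RUN_HOOKDIR_INTERACTIVE;
	return RUN_HOOKDIR_NO;
}

/* Fold every value seen for hook.runHookDir, starting from the default "yes". */
enum run_hookdir resolve_run_hookdir(const char **values, size_t nr)
{
	enum run_hookdir result = RUN_HOOKDIR_YES;
	size_t i;

	for (i = 0; i < nr; i++) {
		enum run_hookdir v = parse_run_hookdir(values[i]);
		if (v < result)
			result = v;
	}
	return result;
}
```

With the values "yes", "warn" and "interactive" seen in that order, the
result is "interactive"; a later, looser value never overrides a stricter one.
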
> +If the caller wants to do something more complicated, the hook library can also
> +provide a callback API:
> +
> +*`int for_each_hookcmd(const char *hookname, hookcmd_function *cb)`*

Another alternative is to do this by providing a linked-list of
structs or even just an ordered string_list; that means the caller
becomes responsible for config syntax and parallelization, which I
didn't want. I'm open to hearing more argument. (on the rest of the doc
too... but also here. :) )
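
To make the trade-off concrete, here is a rough sketch of what a caller of
the callback API might look like. Assumptions, none of which come from the
patch: the `cb_data` parameter, the trimmed-down `struct hookcmd`, the
in-memory `sample_config` table standing in for real config parsing, and the
convention that a nonzero return from the callback stops the iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down version of the proposed struct hookcmd. */
struct hookcmd {
	const char *name;
	const char *command;
};

/* Callback type; the cb_data pointer is an assumption, not in the proposal. */
typedef int hookcmd_function(const struct hookcmd *cmd, void *cb_data);

/* Stand-in for parsed config: (hook event, hook command) in config order. */
struct hook_entry {
	const char *hookname;
	struct hookcmd cmd;
};

static const struct hook_entry sample_config[] = {
	{ "pre-commit", { "perl-linter", "/usr/bin/lint-it --language=perl" } },
	{ "pre-commit", { "git-secrets", "/usr/bin/git-secrets --pre-commit" } },
	{ "pre-applypatch", { "perl-linter", "/usr/bin/lint-it --language=perl" } },
};

/* Call cb for each command of hookname, in order; stop at the first nonzero return. */
int for_each_hookcmd(const char *hookname, hookcmd_function *cb, void *cb_data)
{
	size_t i;

	assert(hookname && cb);
	for (i = 0; i < sizeof(sample_config) / sizeof(sample_config[0]); i++) {
		int ret;

		if (strcmp(sample_config[i].hookname, hookname))
			continue;
		ret = cb(&sample_config[i].cmd, cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count the commands that would run. */
int count_hookcmd(const struct hookcmd *cmd, void *cb_data)
{
	(void)cmd;
	(*(int *)cb_data)++;
	return 0;
}
```

A caller that only wants to count the configured commands passes
`count_hookcmd` and a pointer to an int; it never sees config keys or ordering
rules, which stay inside the library.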

> +
> +Finally, to facilitate the builtin, the library will also provide the following
> +APIs to interact with the config:
> +
> +----
> +int set_hook_commands(const char *hookname, struct string_list *commands,
> +	enum config_scope scope);
> +int set_hookcmd(const char *hookcmd, struct hookcmd options);
> +
> +int list_hook_commands(const char *hookname, struct string_list *commands);
> +int list_hooks_in_scope(enum config_scope scope, struct string_list *commands);
> +----
> +
> +`struct hookcmd` is expected to grow in size over time as more functionality is
> +added to hooks; so that other parts of the code don't need to understand the
> +config schema, `struct hookcmd` should contain logical values instead of string
> +pairs.
> +
> +----
> +struct hookcmd {
> +  const char *name;
> +  const char *command;
> +
> +  /* for illustration only; not planned at present */
> +  int parallelizable;
> +  const char *hookcmd_before;
> +  const char *hookcmd_after;
> +  enum recovery_action on_fail;
> +};
> +----
> +
> +=== Builtin
> +
> +`builtin/hook.c` is responsible for providing the frontend. It's responsible for
> +formatting user-provided data and then calling the library API to set the
> +configs as appropriate. The builtin frontend is not responsible for calling the
> +config directly, so that other areas of Git can rely on the hook library to
> +understand the most recent config schema for hooks.
> +
> +=== Migration path
> +
> +==== Stage 0
> +
> +Hooks are called by running `run-command.h:find_hook()` with the hookname and
> +executing the result. The hook library and builtin do not exist. Hooks only
> +exist as specially named scripts within `.git/hooks/`.
> +
> +==== Stage 1
> +
> +`git hook list --porcelain <hook-event>` is implemented. Users can replace their
> +`.git/hooks/<hook-event>` scripts with a trampoline based on `git hook list`'s
> +output. Modifier commands like `git hook add` and `git hook edit` can be
> +implemented around this time as well.
> +
> +==== Stage 2
> +
> +`hook.h:run_hooks()` is taught to include `run-command.h:find_hook()` at the
> +end; calls to `find_hook()` are replaced with calls to `run_hooks()`. Users can
> +opt-in to config-based hooks simply by creating some in their config; otherwise
> +users should remain unaffected by the change.
> +
> +==== Stage 3
> +
> +The call to `find_hook()` inside of `run_hooks()` learns to check for a config,
> +`hook.runHookDir`. Users can opt into managing their hooks completely via the
> +config this way.
> +
> +==== Stage 4
> +
> +`.git/hooks` is removed from the template and the hook directory is considered
> +deprecated. To avoid breaking older repos, the default of `hook.runHookDir` is
> +not changed, and `find_hook()` is not removed.
> +
> +== Caveats
> +
> +=== Security and repo config
> +
> +Part of the motivation behind this refactor is to mitigate hooks as an attack
> +vector;footnote:[https://lore.kernel.org/git/20171002234517.GV19555@aiede.mtv.corp.google.com/]
> +however, as the design stands, users can still provide hooks in the repo-level
> +config, which is included when a repo is zipped and sent elsewhere.  The
> +security of the repo-level config is still under discussion; this design
> +generally assumes the repo-level config is secure, which is not true yet. The
> +goal is to avoid an overcomplicated design to work around a problem which has
> +ceased to exist.
> +
> +=== Ease of use
> +
> +The config schema is nontrivial; that's why it's important for the `git hook`
> +modifier commands to be usable. Contributors with UX expertise are encouraged to
> +share their suggestions.
> +
> +== Alternative approaches
> +
> +A previous summary of alternatives exists in the
> +archives.footnote:[https://lore.kernel.org/git/20191116011125.GG22855@google.com]
> +
> +=== Status quo
> +
> +Today users can implement multihooks themselves by using a "trampoline script"
> +as their hook, and pointing that script to a directory or list of other scripts
> +they wish to run.
> +
> +=== Hook directories
> +
> +Other contributors have suggested Git learn about the existence of a directory
> +such as `.git/hooks/<hookname>.d` and execute those hooks in alphabetical order.
> +
> +=== Comparison table
> +
> +.Comparison of alternatives
> +|===
> +|Feature |Config-based hooks |Hook directories |Status quo
> +
> +|Supports multiple hooks
> +|Natively
> +|Natively
> +|With user effort
> +
> +|Safer for zipped repos
> +|A little
> +|No
> +|No
> +
> +|Previous hooks just work
> +|If configured
> +|Yes
> +|Yes
> +
> +|Can install one hook to many repos
> +|Yes
> +|No
> +|No
> +
> +|Discoverability
> +|Better (in `git help git`)
> +|Same as before
> +|Same as before
> +
> +|Hard to run unexpected hook
> +|If configured
> +|No
> +|No
> +|===

Please share more features that come to your mind; I took most of this
list from the RFC I sent last fall:
https://lore.kernel.org/git/20191116011125.GG22855@google.com

> +
> +== Future work
> +
> +=== Execution ordering
> +
> +We may find that config order is insufficient for some users; for example,
> +config order makes it difficult to add a new hook to the system or global config
> +which runs at the end of the hook list. A new ordering schema should be:
> +
> +1) Specified by a `hook.order` config, so that users will not unexpectedly see
> +their order change;
> +
> +2) Either dependency or numerically based.
> +
> +Dependency-based ordering is prone to classic linked-list problems, like
> +cycles and handling of missing dependencies. But it paves the way for enabling
> +parallelization if some tasks truly depend on others.
> +
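
As a rough illustration of those linked-list problems, the check below walks
each command's dependency chain and reports a missing dependency or a cycle.
It assumes a hypothetical model where each command names at most one
predecessor (along the lines of the `hookcmd_after` field shown for
illustration above); the struct, function names and return codes are invented
for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: a command and the one command it must run after (or NULL). */
struct dep_cmd {
	const char *name;
	const char *after;
};

/* Return the index of name in cmds, or -1 when it is not configured. */
int find_cmd(const struct dep_cmd *cmds, size_t nr, const char *name)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(cmds[i].name, name))
			return (int)i;
	return -1;
}

/*
 * Return 0 when every chain of "after" links terminates,
 * -1 when a command depends on a missing command,
 * -2 when a chain loops back on itself.
 */
int check_deps(const struct dep_cmd *cmds, size_t nr)
{
	size_t i;

	assert(cmds || !nr);
	for (i = 0; i < nr; i++) {
		const struct dep_cmd *cur = &cmds[i];
		size_t steps = 0;

		while (cur->after) {
			int next = find_cmd(cmds, nr, cur->after);

			if (next < 0)
				return -1;
			/* An acyclic chain has fewer than nr links. */
			if (++steps > nr)
				return -2;
			cur = &cmds[next];
		}
	}
	return 0;
}
```
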
> +Numerical ordering makes it tricky for Git to generate suggested ordering
> +numbers for each command, but makes it easy to determine a definitive order.
> +
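
Numeric ordering, by contrast, reduces to a sort. A minimal sketch, assuming
a hypothetical per-command order number like the 001/005 prefixes in the cover
letter's `git hook --list` example, with config position as the tie-breaker
(both the struct and the tie-break rule are assumptions):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical entry: numeric position plus where it appeared in the config. */
struct ordered_hook {
	int order;      /* e.g. 1 for "001", 5 for "005" */
	int config_pos; /* tie-breaker, since qsort() is not stable */
	const char *command;
};

/* Compare by order number first, then by config position. */
int cmp_ordered_hook(const void *a, const void *b)
{
	const struct ordered_hook *x = a, *y = b;

	if (x->order != y->order)
		return x->order < y->order ? -1 : 1;
	if (x->config_pos != y->config_pos)
		return x->config_pos < y->config_pos ? -1 : 1;
	return 0;
}

/* Sort hooks in place into their definitive run order. */
void sort_hooks(struct ordered_hook *hooks, size_t nr)
{
	assert(hooks || !nr);
	if (nr)
		qsort(hooks, nr, sizeof(*hooks), cmp_ordered_hook);
}
```
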
> +=== Parallelization
> +
> +Users with many hooks might want to run them simultaneously, if the hooks don't
> +modify state; if one hook depends on another's output, then users will want to
> +specify those dependencies. If we decide to solve this problem, we may want to
> +look to modern build systems for inspiration on how to manage dependencies and
> +parallel tasks.
> +
> +=== Securing hookdir hooks
> +
> +With the design as written in this doc, it's still possible for a malicious user
> +to modify `.git/config` to include `hook.pre-receive.command = rm -rf /`, then
> +zip their repo and send it to another user. It may be necessary to teach Git to
> +only allow one-line hooks like this if they were configured outside of the local
> +scope; or another approach, like a list of safe projects, might be useful. It
> +may also be sufficient (or at least useful) to teach a `hook.disableAll` config
> +or similar flag to the Git executable.
> +
> +=== Submodule inheritance
> +
> +It's possible some submodules may want to run the identical set of hooks that
> +their superrepo runs. While a globally-configured hook set is helpful, it's not
> +a great solution for users who have multiple repos-with-submodules under the
> +same user. It would be useful for submodules to learn how to run hooks from
> +their superrepo's config, or inherit that hook setting.
> +
> +== Glossary
> +
> +*hook event*
> +
> +A point during Git's execution where user scripts may be run, for example,
> +_prepare-commit-msg_ or _pre-push_.
> +
> +*hook command*
> +
> +A user script or executable which will be run on one or more hook events.

If other terms in the design doc are surprising to you, let me know and
I'll define them here too.

> -- 
> 2.26.1.301.g55bc3eb7cb9-goog
> 

* Re: [PATCH] doc: propose hooks managed by the config
  2020-04-21  0:22                       ` Emily Shaffer
@ 2020-04-21  1:20                         ` Junio C Hamano
  2020-04-24 23:14                           ` Emily Shaffer
  0 siblings, 1 reply; 125+ messages in thread
From: Junio C Hamano @ 2020-04-21  1:20 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, Jeff King, James Ramsay, Jonathan Nieder, brian m. carlson,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Josh Steadmon

Emily Shaffer <emilyshaffer@google.com> writes:

> Whoops, just realized this doesn't match the proposal below. Wrote these
> on different days :)

It often is a good idea to attempt writing anything in one sitting
for coherency, and proofread the result on a separate day before
sending it out ;-)

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-04-17 17:40           ` Taylor Blau
  2020-04-17 18:06             ` Jeff King
@ 2020-04-21 12:17             ` Christian Couder
  1 sibling, 0 replies; 125+ messages in thread
From: Christian Couder @ 2020-04-21 12:17 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, git, James Ramsay

On Fri, Apr 17, 2020 at 7:40 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Fri, Apr 17, 2020 at 11:41:48AM +0200, Christian Couder wrote:

> > What do you think about something like:
> >
> > [promisorFilter "noBlobs"]
> >         type = blob:none
> >         uploadpack = true # maybe "allow" could also mean "true" here
> >         ...
> > ?
>
> I'm not sure about introducing a layer of indirection here with
> "noBlobs". It's nice that it could perhaps be enabled/disabled for
> different builtins (e.g., by adding 'revList = false', say), but I'm not
> convinced that this is improving all of those cases, either.
>
> For example, what happens if I have something like:
>
>   [uploadpack "filter.tree"]
>     maxDepth = 1
>     allow = true
>
> but I want to use a different value of maxDepth for, say, rev-list? I'd
> rather have two sections (each for the 'tree' filter, but scoped to
> 'upload-pack' and 'rev-list' separately) than write something like:
>
>   [promisorFilter "treeDepth"]
>           type = tree
>           uploadpack = true
>           uploadpackMaxDepth = 1
>           revList = true
>           revListMaxDepth = 0
>           ...

You can have two sections using:

[promisorFilter "treeDepth1"]
          type = tree
          uploadpack = true
          maxDepth = 1

[promisorFilter "treeDepth0"]
          type = tree
          revList = true
          maxDepth = 0

(Of course "treeDepth1" for example could be also spelled
"treeDepthOneLevel" or however the user prefers.)

> So, yeah, the current system is not great because it has the '.' in the
> second component. I am definitely eager to hear other suggestions about
> naming it differently, but I think that the general structure is on
> track.
>
> One thing that I can think of (other than replacing the '.' with another
> delimiting character other than '=') is renaming the key from
> 'uploadPack' to 'uploadPackFilter'.

I don't like either of those very much. I think an upload-pack filter
is not very different from a rev-list filter. They are all promisor
(or partial clone) filters, so there is no real reason to differentiate
at the top level of the key name hierarchy.

I also think that users are likely to want to use the same filters for
both upload-pack filters and rev-list filters, so using 'uploadPack'
or 'uploadPackFilter' might necessitate duplicating entries with other
keys for rev-list filters or other filters.

> > > For reference, the patch I was thinking of was this:
> > >
> > >   https://lore.kernel.org/git/20190830121005.GI8571@szeder.dev/
> >
> > Are you using the patches in this series with or without something
> > like the above patch? I am ok to resend this patch series including
> > the above patch (crediting Szeder) if you use something like it.
>
> We're not using them, but without them we suffer from a problem: if
> we get a SIGPIPE when writing the "sorry, I don't support that
> filter" message back to the client, then they won't receive it.
>
> Szeder's patches help address that issue by catching the SIGPIPE and
> popping off enough from the client buffer so that we can write the
> message out before dying.
>
> I appreciate your offer to resubmit the series on my behalf, but I was
> already planning on doing this myself and wouldn't want to burden you
> with another to-do. I'll be happy to take it on myself, probably within
> a week or so.

Ok, I am happy that you will resubmit then.

Thanks,
Christian.

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-04-17 18:06             ` Jeff King
@ 2020-04-21 12:34               ` Christian Couder
  2020-04-22 20:41                 ` Taylor Blau
  2020-04-22 20:42               ` Taylor Blau
  1 sibling, 1 reply; 125+ messages in thread
From: Christian Couder @ 2020-04-21 12:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, James Ramsay

On Fri, Apr 17, 2020 at 8:06 PM Jeff King <peff@peff.net> wrote:
>
> On Fri, Apr 17, 2020 at 11:40:30AM -0600, Taylor Blau wrote:
>
> > > What do you think about something like:
> > >
> > > [promisorFilter "noBlobs"]
> > >         type = blob:none
> > >         uploadpack = true # maybe "allow" could also mean "true" here
> > >         ...
> > > ?
> >
> > I'm not sure about introducing a layer of indirection here with
> > "noBlobs". It's nice that it could perhaps be enabled/disabled for
> > different builtins (e.g., by adding 'revList = false', say), but I'm not
> > convinced that this is improving all of those cases, either.
>
> Yeah, I don't like forcing the user to invent a subsection name. My
> first thought was to suggest:
>
>   [promisorFilter "blob:none"]
>   uploadpack = true
>
> but your tree example shows why that gets awkward: there are more keys
> than just "allow this".

I like your first thought better than something that starts with
"uploadPack". And I think if we let people pick a subsection name (as
I suggest) they might indeed end up with something like:

[promisorFilter "blob:none"]
     type = blob:none
     uploadpack = true

as they might lack inspiration. As filters are becoming more and more
complex though, people might find it much simpler to use the
subsection name in commands if we let them do that. For example we
already allow:

git rev-list --filter=combine:<filter1>+<filter2>+...<filterN> ...

which could be simplified to:

git rev-list --filter=combinedFilter ...

(where "combinedFilter" is defined in the config with
"type=combine:<filter1>+<filter2>+...<filterN>".)

[...]

> > We're not using them, but without them we suffer from a problem: if
> > we get a SIGPIPE when writing the "sorry, I don't support that
> > filter" message back to the client, then they won't receive it.
> >
> > Szeder's patches help address that issue by catching the SIGPIPE and
> > popping off enough from the client buffer so that we can write the
> > message out before dying.
>
> I definitely think we should pursue that patch, but it really can be
> done orthogonally. It's an existing bug that affects other instances
> where upload-pack returns an error. The tests can work around it with
> "test_must_fail ok=sigpipe" in the meantime.

Ok, maybe I will take a look at this one then.

Thanks,
Christian.

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-04-21 12:34               ` Christian Couder
@ 2020-04-22 20:41                 ` Taylor Blau
  0 siblings, 0 replies; 125+ messages in thread
From: Taylor Blau @ 2020-04-22 20:41 UTC (permalink / raw)
  To: Christian Couder; +Cc: Jeff King, Taylor Blau, git, James Ramsay

On Tue, Apr 21, 2020 at 02:34:18PM +0200, Christian Couder wrote:
> On Fri, Apr 17, 2020 at 8:06 PM Jeff King <peff@peff.net> wrote:
> >
> > On Fri, Apr 17, 2020 at 11:40:30AM -0600, Taylor Blau wrote:
> >
> > > > What do you think about something like:
> > > >
> > > > [promisorFilter "noBlobs"]
> > > >         type = blob:none
> > > >         uploadpack = true # maybe "allow" could also mean "true" here
> > > >         ...
> > > > ?
> > >
> > > I'm not sure about introducing a layer of indirection here with
> > > "noBlobs". It's nice that it could perhaps be enabled/disabled for
> > > different builtins (e.g., by adding 'revList = false', say), but I'm not
> > > convinced that this is improving all of those cases, either.
> >
> > Yeah, I don't like forcing the user to invent a subsection name. My
> > first thought was to suggest:
> >
> >   [promisorFilter "blob:none"]
> >   uploadpack = true
> >
> > but your tree example shows why that gets awkward: there are more keys
> > than just "allow this".
>
> I like your first thought better than something that starts with
> "uploadPack". And I think if we let people pick a subsection name (as
> I suggest) they might indeed end up with something like:
>
> [promisorFilter "blob:none"]
>      type = blob:none
>      uploadpack = true
>
> as they might lack inspiration. As filters are becoming more and more
> complex though, people might find it much simpler to use the
> subsection name in commands if we let them do that. For example we
> already allow:
>
> git rev-list --filter=combine:<filter1>+<filter2>+...<filterN> ...
>
> which could be simplified to:
>
> git rev-list --filter=combinedFilter ...
>
> (where "combinedFilter" is defined in the config with
> "type=combine:<filter1>+<filter2>+...<filterN>".)
>
> [...]

I really think that we're getting ahead of ourselves here. For now, I
don't think that we have powerful enough filters that it makes sense to
put them together with combine and give them meaningful names. At least,
no one has asked about such a thing on the list, which I take to mean
that people don't have a use for it.

I'm also skeptical about relying on named filters when working with a
server. If the server defines the filter names (as we at GitHub would
do under this proposal), then what use are they to the client? For the
server, I'm not at all convinced that this is beneficial: the extra
layer of indirection through the configuration makes this brittle and
hard-to-follow.

Not to mention that the server could just as easily spell out the whole
filter.

I'm not trying to give you a too-simple proposal, but I think yours
introduces additional complexity that is trying to enable use-cases that
we don't have in practice.

> > > We're not using them, but without them we suffer from a problem: if
> > > we get a SIGPIPE when writing the "sorry, I don't support that
> > > filter" message back to the client, then they won't receive it.
> > >
> > > Szeder's patches help address that issue by catching the SIGPIPE and
> > > popping off enough from the client buffer so that we can write the
> > > message out before dying.
> >
> > I definitely think we should pursue that patch, but it really can be
> > done orthogonally. It's an existing bug that affects other instances
> > where upload-pack returns an error. The tests can work around it with
> > "test_must_fail ok=sigpipe" in the meantime.
>
> Ok, maybe I will take a look at this one then.

Thanks.

> Thanks,
> Christian.

Thanks,
Taylor

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-04-17 18:06             ` Jeff King
  2020-04-21 12:34               ` Christian Couder
@ 2020-04-22 20:42               ` Taylor Blau
  1 sibling, 0 replies; 125+ messages in thread
From: Taylor Blau @ 2020-04-22 20:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Christian Couder, git, James Ramsay

On Fri, Apr 17, 2020 at 02:06:45PM -0400, Jeff King wrote:
> On Fri, Apr 17, 2020 at 11:40:30AM -0600, Taylor Blau wrote:
>
> > > What do you think about something like:
> > >
> > > [promisorFilter "noBlobs"]
> > >         type = blob:none
> > >         uploadpack = true # maybe "allow" could also mean "true" here
> > >         ...
> > > ?
> >
> > I'm not sure about introducing a layer of indirection here with
> > "noBlobs". It's nice that it could perhaps be enabled/disabled for
> > different builtins (e.g., by adding 'revList = false', say), but I'm not
> > convinced that this is improving all of those cases, either.
>
> Yeah, I don't like forcing the user to invent a subsection name. My
> first thought was to suggest:
>
>   [promisorFilter "blob:none"]
>   uploadpack = true
>
> but your tree example shows why that gets awkward: there are more keys
> than just "allow this".
>
> > One thing that I can think of (other than replacing the '.' with another
> > delimiting character other than '=') is renaming the key from
> > 'uploadPack' to 'uploadPackFilter'. I believe that this was suggested by
>
> Yeah, that proposal isn't bad. To me the two viable options seem like:
>
>  - uploadpack.filter.<filter>.*: this has the ugly fake multilevel
>    subsection, but stays under uploadpack.*
>
>  - uploadpackfilter.<filter>.*: more natural subsection, but not grouped
>    syntactically with other uploadpack stuff
>
> I am actually leaning towards the second. It should make the parsing
> code less confusing, and it's not like there aren't already other config
> sections that impact uploadpack.

Me too.

> > > > For reference, the patch I was thinking of was this:
> > > >
> > > >   https://lore.kernel.org/git/20190830121005.GI8571@szeder.dev/
> > >
> > > Are you using the patches in this series with or without something
> > > like the above patch? I am ok to resend this patch series including
> > > the above patch (crediting Szeder) if you use something like it.
> >
> > We're not using them, but without them we suffer from a problem: if
> > we get a SIGPIPE when writing the "sorry, I don't support that
> > filter" message back to the client, then they won't receive it.
> >
> > Szeder's patches help address that issue by catching the SIGPIPE and
> > popping off enough from the client buffer so that we can write the
> > message out before dying.
>
> I definitely think we should pursue that patch, but it really can be
> done orthogonally. It's an existing bug that affects other instances
> where upload-pack returns an error. The tests can work around it with
> "test_must_fail ok=sigpipe" in the meantime.

Yes, I agree. My main hesitation is that it would be uncouth of me to
send a patch that includes 'test_must_fail ok=sigpipe' to the list, but
if you (and others) feel that this is an OK intermediate step (given
that we can easily remove it once SZEDER's patch lands), then I am OK
with it, too.

And I see that Christian already posted such a patch to the list.

> -Peff

Thanks,
Taylor

* Re: [PATCH] doc: propose hooks managed by the config
  2020-04-21  1:20                         ` Junio C Hamano
@ 2020-04-24 23:14                           ` Emily Shaffer
  0 siblings, 0 replies; 125+ messages in thread
From: Emily Shaffer @ 2020-04-24 23:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, James Ramsay, Jonathan Nieder, brian m. carlson,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Josh Steadmon

On Mon, Apr 20, 2020 at 06:20:00PM -0700, Junio C Hamano wrote:
> 
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > Whoops, just realized this doesn't match the proposal below. Wrote these
> > on different days :)
> 
> It often is a good idea to attempt writing anything in one sitting
> for coherency, and proofread the result on a separate day before
> sending it out ;-)

Agreed for next time :)

I didn't make it very clear in my initial comment that the only problem
here is the code snippets and the difference is very minor - I don't
think it's worth a reroll on its own without hearing feedback about the
rest. Or, to put it another way, if any interested reader said "I'll
wait to review" - don't ;)

 - Emily

* Re: [PATCH] doc: propose hooks managed by the config
  2020-04-20 23:53                     ` [PATCH] doc: propose hooks managed by the config Emily Shaffer
  2020-04-21  0:22                       ` Emily Shaffer
@ 2020-04-25 20:57                       ` brian m. carlson
  2020-05-06 21:33                         ` Emily Shaffer
  1 sibling, 1 reply; 125+ messages in thread
From: brian m. carlson @ 2020-04-25 20:57 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, Jeff King, Junio C Hamano, James Ramsay, Jonathan Nieder,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Josh Steadmon

On 2020-04-20 at 23:53:10, Emily Shaffer wrote:
> +=== Config schema
> +
> +Hooks can be introduced by editing the configuration manually. There are two new
> +sections added, `hook` and `hookcmd`.
> +
> +==== `hook`
> +
> +Primarily contains subsections for each hook event. These subsections define
> +hook command execution order; hook commands can be specified by passing the
> +command directly if no additional configuration is needed, or by passing the
> +name of a `hookcmd`. If Git does not find a `hookcmd` whose subsection matches
> +the value of the given command string, Git will try to execute the string
> +directly. Hook event subsections can also contain per-hook-event settings.

Can we say explicitly that the commands are invoked by the shell?  Or is
the plan to try to parse them without passing to the shell?
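
Leaving the shell question aside, the lookup described in the quoted
paragraph could be sketched roughly like this. The struct and function name
are hypothetical, and the `defs` array stands in for the `hookcmd` sections
read from the config:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one [hookcmd "<name>"] section with its command. */
struct hookcmd_def {
	const char *name;
	const char *command;
};

/*
 * Resolve one hook.<event>.command value: if it names a configured
 * hookcmd, return that hookcmd's command; otherwise return the value
 * itself, to be executed directly.
 */
const char *resolve_hook_command(const struct hookcmd_def *defs, size_t nr,
				 const char *value)
{
	size_t i;

	assert(value);
	for (i = 0; i < nr; i++)
		if (!strcmp(defs[i].name, value))
			return defs[i].command;
	return value;
}
```

With the example config, "perl-linter" resolves to
`/usr/bin/lint-it --language=perl`, while
`/usr/bin/git-secrets --pre-commit` matches no `hookcmd` and is returned
unchanged.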

> +Also contains top-level hook execution settings, for example,
> +`hook.warnHookDir`, `hook.runHookDir`, or `hook.disableAll`.
> +
> +----
> +[hook "pre-commit"]
> +  command = perl-linter
> +  command = /usr/bin/git-secrets --pre-commit
> +
> +[hook "pre-applypatch"]
> +  command = perl-linter
> +  error = ignore
> +
> +[hook]
> +  warnHookDir = true
> +  runHookDir = prompt
> +----
> +
> +==== `hookcmd`
> +
> +Defines a hook command and its attributes, which will be used when a hook event
> +occurs. Unqualified attributes are assumed to apply to this hook during all hook
> +events, but event-specific attributes can also be supplied. The example runs
> +`/usr/bin/lint-it --language=perl <args passed by Git>`, but for repos which
> +include this config, the hook command will be skipped for all events to which
> +it's normally subscribed _except_ `pre-commit`.
> +
> +----
> +[hookcmd "perl-linter"]
> +  command = /usr/bin/lint-it --language=perl
> +  skip = true
> +  pre-commit-skip = false
> +----

This seems fine to me.  I like this design and it seems sane.
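
For reference, one way to read the `skip` / `pre-commit-skip` example is that
an event-specific attribute overrides the unqualified one. A small sketch of
that reading; the types and names are made up for illustration, and the
override list stands in for parsed config:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one "<event>-skip = <bool>" entry of a hookcmd section. */
struct skip_override {
	const char *event;
	int skip;
};

/* Hypothetical: the skip-related attributes of one hookcmd. */
struct hookcmd_skip {
	int skip; /* unqualified "skip", applies to all events */
	const struct skip_override *overrides;
	size_t nr;
};

/* Return nonzero if the hook command should be skipped for this event. */
int hookcmd_should_skip(const struct hookcmd_skip *cmd, const char *event)
{
	size_t i;

	assert(cmd && event);
	for (i = 0; i < cmd->nr; i++)
		if (!strcmp(cmd->overrides[i].event, event))
			return cmd->overrides[i].skip;
	return cmd->skip;
}
```

For the "perl-linter" example this gives "run" for `pre-commit` and "skip"
for `pre-applypatch`, matching the description in the quoted text.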

> +== Implementation
> +
> +=== Library
> +
> +`hook.c` and `hook.h` are responsible for interacting with the config files. In
> +the case when the code generating a hook event doesn't have special concerns
> +about how to run the hooks, the hook library will provide a basic API to call
> +all hooks in config order with an `argv_array` provided by the code which
> +generates the hook event:
> +
> +*`int run_hooks(const char *hookname, struct argv_array *args)`*
> +
> +This call includes the hook command provided by `run-command.h:find_hook()`;
> +eventually, this legacy hook will be gated by a config `hook.runHookDir`. The
> +config is checked against a number of cases:
> +
> +- "no": the legacy hook will not be run
> +- "interactive": Git will prompt the user before running the legacy hook
> +- "warn": Git will print a warning to stderr before running the legacy hook
> +- "yes" (default): Git will silently run the legacy hook
> +
> +If `hook.runHookDir` is provided more than once, Git will use the most
> +restrictive setting provided, for security reasons.

I don't think this is consistent with the way the rest of our options
work.  What if someone generally wants to disable legacy hooks but then
works with a program in a repository that requires them?

> +== Caveats
> +
> +=== Security and repo config
> +
> +Part of the motivation behind this refactor is to mitigate hooks as an attack
> +vector;footnote:[https://lore.kernel.org/git/20171002234517.GV19555@aiede.mtv.corp.google.com/]
> +however, as the design stands, users can still provide hooks in the repo-level
> +config, which is included when a repo is zipped and sent elsewhere.  The
> +security of the repo-level config is still under discussion; this design
> +generally assumes the repo-level config is secure, which is not true yet. The
> +goal is to avoid an overcomplicated design to work around a problem which has
> +ceased to exist.

I want to be clear that I'm very much opposed to trying to "secure" the
config as a whole.  I believe that it's going to ultimately lead to a
variety of new and interesting attack vectors and will lead to Git
becoming a CVE factory.  Vim has this problem with modelines, for
example.

I think we should maintain the status quo that the only safe things you
can do with an untrusted repository are clone and fetch because it sets
a clear security boundary.

Having said that, I'm otherwise pretty happy with this design and I'm
looking forward to seeing it implemented.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

* Re: [PATCH] doc: propose hooks managed by the config
  2020-04-25 20:57                       ` brian m. carlson
@ 2020-05-06 21:33                         ` Emily Shaffer
  2020-05-06 23:13                           ` brian m. carlson
  2020-05-19 20:10                           ` Emily Shaffer
  0 siblings, 2 replies; 125+ messages in thread
From: Emily Shaffer @ 2020-05-06 21:33 UTC (permalink / raw)
  To: brian m. carlson, git, Jeff King, Junio C Hamano, James Ramsay,
	Jonathan Nieder, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Josh Steadmon

On Sat, Apr 25, 2020 at 08:57:27PM +0000, brian m. carlson wrote:
> 
> On 2020-04-20 at 23:53:10, Emily Shaffer wrote:
> > +=== Config schema
> > +
> > +Hooks can be introduced by editing the configuration manually. There are two new
> > +sections added, `hook` and `hookcmd`.
> > +
> > +==== `hook`
> > +
> > +Primarily contains subsections for each hook event. These subsections define
> > +hook command execution order; hook commands can be specified by passing the
> > +command directly if no additional configuration is needed, or by passing the
> > +name of a `hookcmd`. If Git does not find a `hookcmd` whose subsection matches
> > +the value of the given command string, Git will try to execute the string
> > +directly. Hook event subsections can also contain per-hook-event settings.
> 
> Can we say explicitly that the commands are invoked by the shell?  Or is
> the plan to try to parse them without passing to the shell?

Sure. If I didn't make it clear it was by mistake, not by intent.

> 
> > +Also contains top-level hook execution settings, for example,
> > +`hook.warnHookDir`, `hook.runHookDir`, or `hook.disableAll`.
> > +
> > +----
> > +[hook "pre-commit"]
> > +  command = perl-linter
> > +  command = /usr/bin/git-secrets --pre-commit
> > +
> > +[hook "pre-applypatch"]
> > +  command = perl-linter
> > +  error = ignore
> > +
> > +[hook]
> > +  warnHookDir = true
> > +  runHookDir = prompt
> > +----
> > +
> > +==== `hookcmd`
> > +
> > +Defines a hook command and its attributes, which will be used when a hook event
> > +occurs. Unqualified attributes are assumed to apply to this hook during all hook
> > +events, but event-specific attributes can also be supplied. The example runs
> > +`/usr/bin/lint-it --language=perl <args passed by Git>`, but for repos which
> > +include this config, the hook command will be skipped for all events to which
> > +it's normally subscribed _except_ `pre-commit`.
> > +
> > +----
> > +[hookcmd "perl-linter"]
> > +  command = /usr/bin/lint-it --language=perl
> > +  skip = true
> > +  pre-commit-skip = false
> > +----
> 
> This seems fine to me.  I like this design and it seems sane.
> 
> > +== Implementation
> > +
> > +=== Library
> > +
> > +`hook.c` and `hook.h` are responsible for interacting with the config files. In
> > +the case when the code generating a hook event doesn't have special concerns
> > +about how to run the hooks, the hook library will provide a basic API to call
> > +all hooks in config order with an `argv_array` provided by the code which
> > +generates the hook event:
> > +
> > +*`int run_hooks(const char *hookname, struct argv_array *args)`*
> > +
> > +This call includes the hook command provided by `run-command.h:find_hook()`;
> > +eventually, this legacy hook will be gated by a config `hook.runHookDir`. The
> > +config is checked against a number of cases:
> > +
> > +- "no": the legacy hook will not be run
> > +- "interactive": Git will prompt the user before running the legacy hook
> > +- "warn": Git will print a warning to stderr before running the legacy hook
> > +- "yes" (default): Git will silently run the legacy hook
> > +
> > +If `hook.runHookDir` is provided more than once, Git will use the most
> > +restrictive setting provided, for security reasons.
> 
> I don't think this is consistent with the way the rest of our options
> work.  What if someone generally wants to disable legacy hooks but then
> works with a program in a repository that requires them?

Unfortunately this is something I think my end will want to hold firm
on. In general we disagree with your statement later about not wanting
to make the .git/config secure. I see your use case, and I anticipate
two possible workarounds I'd present:

1) If working in that repo for the short term, run `git -c
hook.runHookDir=yes <command> <arg...>` (and therefore allow the config
from command line scope, which I'm happy with in general). Maybe
someone would want to use an alias, hookgit or hg? Just kidding.. ;P

2) If you're stuck with that repo for the long term, add
`hook.<hookname>.command = /path/.git/hooks/<hookname>` lines to the local
config.

Yes, those are both somewhat user-unfriendly, and I think we can do
better... I'll have to think more and see what I can come up with.
Suggestions welcome.
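
For reference, the "most restrictive wins" rule could look roughly like the
Python sketch below. The ordering of the four values from the doc and the
special handling of the command-line scope are assumptions about how this
might be wired up, and `most_restrictive_run_hook_dir` is a hypothetical name.

```python
# Values from the design doc, ordered from most to least restrictive.
# The ordering itself is an assumption made for this sketch.
RESTRICTIVENESS = ["no", "interactive", "warn", "yes"]


def most_restrictive_run_hook_dir(settings, default="yes"):
    """Pick the effective hook.runHookDir from (scope, value) pairs.

    A value given with `git -c` (scope "command") is honored as-is;
    otherwise the most restrictive value seen in any config file wins.
    """
    command_line = [value for scope, value in settings if scope == "command"]
    if command_line:
        return command_line[-1]
    values = [value for _scope, value in settings]
    if not values:
        return default
    return min(values, key=RESTRICTIVENESS.index)
```

So a global "no" plus a local "yes" yields "no", and only adding
`-c hook.runHookDir=yes` on the command line turns the legacy hook back on.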

> 
> > +== Caveats
> > +
> > +=== Security and repo config
> > +
> > +Part of the motivation behind this refactor is to mitigate hooks as an attack
> > +vector;footnote:[https://lore.kernel.org/git/20171002234517.GV19555@aiede.mtv.corp.google.com/]
> > +however, as the design stands, users can still provide hooks in the repo-level
> > +config, which is included when a repo is zipped and sent elsewhere.  The
> > +security of the repo-level config is still under discussion; this design
> > +generally assumes the repo-level config is secure, which is not true yet. The
> > +goal is to avoid an overcomplicated design to work around a problem which has
> > +ceased to exist.
> 
> I want to be clear that I'm very much opposed to trying to "secure" the
> config as a whole.  I believe that it's going to ultimately lead to a
> variety of new and interesting attack vectors and will lead to Git
> becoming a CVE factory.  Vim has this problem with modelines, for
> example.

I'm really interested to hear more - it seems like security and config
efforts will end up on my plate before the end of the year, so I'd like
to know what is on your mind.

> 
> I think we should maintain the status quo that the only safe things you
> can do with an untrusted repository are clone and fetch because it sets
> a clear security boundary.

I wish there was a way to make that more apparent. The trouble is that
while you and I and the sysadmin know the dangers, the high schooler
making a website might not. Talking about how to warn users is
definitely out-of-scope for this conversation, but it's on my mind.

> 
> Having said that, I'm otherwise pretty happy with this design and I'm
> looking forward to seeing it implemented.

Thanks very much for the feedback and for reading it through! :)

 - Emily

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH] doc: propose hooks managed by the config
  2020-05-06 21:33                         ` Emily Shaffer
@ 2020-05-06 23:13                           ` brian m. carlson
  2020-05-19 20:10                           ` Emily Shaffer
  1 sibling, 0 replies; 125+ messages in thread
From: brian m. carlson @ 2020-05-06 23:13 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, Jeff King, Junio C Hamano, James Ramsay, Jonathan Nieder,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Josh Steadmon

On 2020-05-06 at 21:33:54, Emily Shaffer wrote:
> On Sat, Apr 25, 2020 at 08:57:27PM +0000, brian m. carlson wrote:
> > 
> > On 2020-04-20 at 23:53:10, Emily Shaffer wrote:
> > > +== Caveats
> > > +
> > > +=== Security and repo config
> > > +
> > > +Part of the motivation behind this refactor is to mitigate hooks as an attack
> > > +vector;footnote:[https://lore.kernel.org/git/20171002234517.GV19555@aiede.mtv.corp.google.com/]
> > > +however, as the design stands, users can still provide hooks in the repo-level
> > > +config, which is included when a repo is zipped and sent elsewhere.  The
> > > +security of the repo-level config is still under discussion; this design
> > > +generally assumes the repo-level config is secure, which is not true yet. The
> > > +goal is to avoid an overcomplicated design to work around a problem which has
> > > +ceased to exist.
> > 
> > I want to be clear that I'm very much opposed to trying to "secure" the
> > config as a whole.  I believe that it's going to ultimately lead to a
> > variety of new and interesting attack vectors and will lead to Git
> > becoming a CVE factory.  Vim has this problem with modelines, for
> > example.
> 
> I'm really interested to hear more - it seems like security and config
> efforts will end up on my plate before the end of the year, so I'd like
> to know what is on your mind.

In general, having untrusted configuration is enormously difficult and
is typically only possible as a designed-in feature with extremely
limited options.  We have not designed that feature in from the
beginning and our config parsing is far too ad-hoc to support any
reasonable security posture.  We've also written a program entirely in
C, which has all of the fun memory safety problems.

If we try to secure the config and allow people to use untrusted
repositories securely, we've changed the security posture of the project
very significantly.  The number of keys we can safely trust comes down to
probably core.repositoryformatversion and extensions.objectformat, and
I'm not even sure that the latter can be trusted because there are all
sorts of fun behaviors one can produce by setting the wrong hash
algorithm.

That's just one example of a potential source of security problems, but
I anticipate people can use other options as well.  Setting the rename
limit can be a DoS.  Changing the colors of diff or log output could be
used to hide malicious code from inspection.  We obviously can't trust
anything containing a URL, since an attacker could try to make "git pull
origin" point to their server instead, which means having remotes is out
of the question.  Most of our recent security issues have involved the
.gitmodules file, which, despite being extremely limited, is indeed an
untrusted config file.
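
To illustrate how little would survive, here is a Python sketch of filtering
an untrusted config down to an allowlist. The two allowed keys are the ones
named above; `filter_untrusted_config` and the flat key/value representation
are hypothetical and not anything Git implements.

```python
# The only keys suggested above as plausibly trustworthy. Section and
# variable names in Git config are case-insensitive, so compare in lower
# case. Keys with a (case-sensitive) subsection are not modeled here.
TRUSTED_KEYS = {"core.repositoryformatversion", "extensions.objectformat"}


def filter_untrusted_config(entries):
    """Split (key, value) pairs into kept and dropped lists."""
    kept, dropped = [], []
    for key, value in entries:
        target = kept if key.lower() in TRUSTED_KEYS else dropped
        target.append((key, value))
    return kept, dropped


entries = [
    ("core.repositoryFormatVersion", "1"),
    ("extensions.objectFormat", "sha256"),
    ("remote.origin.url", "https://attacker.example/repo.git"),
    ("diff.renameLimit", "999999"),
]
kept, dropped = filter_untrusted_config(entries)
```

Here `kept` holds only the first two entries; the remote URL and the rename
limit both land in `dropped`, as would every hook-related key.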

The scope of potential vulnerabilities explodes as you allow users to
have untrusted config.  I don't think there's any reasonable set of
useful configuration we can have on a per-repo basis that doesn't open
us up to a whole set of security vulnerabilities.  It seems to me that
we're setting ourselves up to either have a feature so limited nobody
uses it or a massive, never-ending set of CVEs as everybody finds new
ways to attack things.  I just don't think promising that feature to
users is honest because I don't think we can practically achieve it in
Git.  Most projects don't even try it as an option.

On the other hand, what we promise now, which is to restrict untrusted
repositories to cloning and fetching, while surprising to users,
dramatically reduces the scope because it's basically what we promise
over the network.  The interface is highly restricted, well known, and
reasonably secure.  We've also limited attack surface to a much smaller
number of binaries.

So while I think the intention is good and the idea, if implementable,
would be beneficial to users, I think it's practically going to be
unachievable.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 9/17] Obsolescence markers and evolve
  2020-03-12  4:04 ` [TOPIC 9/17] Obsolescence markers and evolve James Ramsay
@ 2020-05-09 21:31   ` Noam Soloveichik
  2020-05-15 22:26     ` Jeff King
  0 siblings, 1 reply; 125+ messages in thread
From: Noam Soloveichik @ 2020-05-09 21:31 UTC (permalink / raw)
  To: git, peff, sandals; +Cc: emilyshaffer, james, jrnieder


On 12/03/2020 6:04, James Ramsay wrote:
> 1. Brandon: I thought it would be interesting to have a similar feature
> as Mercurial has. Mercurial evolve will help you do a big rebase
> commit by commit. Giving you more insights how commits change over time.
>
> 2. Peff: This has been discussed a lot of time on the list already.
Since I'm very interested in this topic, can you link me to some key
discussions you remember? Most of what I've found is Stefan Xenos having
a take on implementing it.

Also, I read some discussions on tools trying to record rebase history,
such as git-series, available on GitHub.
> 3. Jonathan N: It will help with Googlers productivity, but it’s
> smaller compared to other performance fixes.
>
> 4. Brian: It’s a great feature and I would like to have it, but I’m
> not sure it gives enough value to someone to sit down and implement it.
I personally am very interested in this and am considering contributing to
it myself, although it sounds very complex and intricate to perfect.
> 5. Emily: Is it a good candidate for GSoC?
>
> 6. Brian: If we have a good design.

There's a design proposal from last year:
https://public-inbox.org/git/20190215043105.163688-1-sxenos@google.com/

Did you get a chance to have a look at it?

> 7. Stolee: It should be easier to use than interactive rebase.
>
> 8. Stolee: It would be nice to have instead of fixup commits I would
> send to you new commits which mark your original commits are obsolete.

Hi all,

Seeing this post really encourages me to try my best and tackle this issue.
The main reason I'm interested in this is that such a feature encourages
people to craft very high quality commit history.

It's a great relief to see people discussing the topic, since I've searched
the web and have not found much talk about it so far.

I want to contribute but since it's a big feature proposal with project-wide
consequences and not some simple bug fix, I'm not sure where to start.

Do you guys have any tips?

Anyway, I think part of the deal of making this feature materialize is to
raise awareness, so if anybody reading this is on board, please share.
Thanks!
Noam



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 9/17] Obsolescence markers and evolve
  2020-05-09 21:31   ` Noam Soloveichik
@ 2020-05-15 22:26     ` Jeff King
  0 siblings, 0 replies; 125+ messages in thread
From: Jeff King @ 2020-05-15 22:26 UTC (permalink / raw)
  To: Noam Soloveichik; +Cc: git, sandals, emilyshaffer, james, jrnieder

On Sun, May 10, 2020 at 12:31:37AM +0300, Noam Soloveichik wrote:

> On 12/03/2020 6:04, James Ramsay wrote:
> > 1. Brandon: I thought it would be interesting to have a similar feature
> > as Mercurial has. Mercurial evolve will help you do a big rebase
> > commit by commit. Giving you more insights how commits change over time.
> >
> > 2. Peff: This has been discussed a lot of time on the list already.
> Since I'm very interested in this topic, can you link me to some key
> discussions you remember? Most of what I've found is Stefan Xenos having
> a take on implementing it.

Sorry, I don't have much to offer. The discussion from Stefan is the only one
I remember talking about evolve itself.

I think my comment may have been specifically about the extra graph
pointers that would be needed to represent rebases, etc. The general
concept of a parent pointer that doesn't imply reachability has come up
over the years. I don't have any links handy, though, and searching for
"parent" in the list archive is not likely to be all that helpful.

Hmm, searching for "parent" and "reachable" also turns up a lot, but
this one is probably relevant:

  https://lore.kernel.org/git/20060425035421.18382.51677.stgit@localhost.localdomain/

Something that old is as likely to hurt as help, though. ;)

-Peff

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-16 20:01     ` Philip Oakley
@ 2020-05-16  2:21       ` nbelakovski
  0 siblings, 0 replies; 125+ messages in thread
From: nbelakovski @ 2020-05-16  2:21 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Nickolai Belakovski, git

From: Nickolai Belakovski <nbelakovski@gmail.com>

Hi guys,

Sorry I missed you at the contributor summit, but this is an idea I've been
thinking of on my own for some time now, mostly in the context of dealing with
large files as opposed to security issues. I've come to a lot of the same
conclusions that this group has already come up with, namely

* Using git replace functionality is a very obvious path forward here

* A list of 'revoked' (could I propose the word 'obliterated', in order to make
  the names consistent?) hashes needs to be maintained so that any
  functionality expecting the original object can figure out that it's not
  available.

* This needs support from GitHub, GitLab, etc. in order for it to work. I'm
  thinking that git prune gets updated to remove 'obliterated' objects, and
  when a git hosting service receives an updated list of obliterated objects,
  it just runs prune. Of course, there would need to be support for replace
  refs as well.

I've started working on a prototype/proof of concept. In the v1 it will do the
following upon receiving a hash (i.e. git obliterate abc123):

* Add it to the list of obliterated objects (I'm thinking just .git/obliterate,
  any issues with that?)

* Create a new blob containing the content "This file was obliterated by
  $git.user on $today" and create a replace ref from the provided hash to the
  hash of this new blob (so instead of an empty file, there's some info as to
  why the file is missing)

* Run git prune (which will be modified to delete obliterated objects)

It should just take a couple of days. If anyone is interested in joining, I'm
livestreaming my work on twitch at https://www.twitch.tv/actinium226 from 1pm
to 5pm Pacific time on weekdays. This version will still have some issues. As
Damien pointed out, index doesn't handle replace, so the file will look
modified, but I hope that having an initial prototype will help further
discussion and get this feature closer to a state of being completed.
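
The three v1 steps listed above can be sketched in Python against a toy
in-memory object store. This is only a model of the idea, not the prototype
itself: `ToyRepo`, its attributes, and the placeholder wording with a trailing
newline are assumptions for illustration, and no real Git plumbing is invoked.

```python
import hashlib
from datetime import date


def blob_id(content: bytes) -> str:
    """Compute a Git-style SHA-1 blob ID for `content`."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ToyRepo:
    """In-memory stand-in for an object store; not real Git plumbing."""

    def __init__(self):
        self.objects = {}         # object ID -> content
        self.replace_refs = {}    # original ID -> replacement ID
        self.obliterated = set()  # stand-in for the proposed .git/obliterate

    def add_blob(self, content: bytes) -> str:
        oid = blob_id(content)
        self.objects[oid] = content
        return oid

    def obliterate(self, oid: str, user: str, today: date) -> str:
        # Step 1: record the hash in the obliterated list.
        self.obliterated.add(oid)
        # Step 2: create a placeholder blob and a replace ref pointing to it.
        note = f"This file was obliterated by {user} on {today.isoformat()}\n"
        placeholder = self.add_blob(note.encode())
        self.replace_refs[oid] = placeholder
        # Step 3: prune, which in this model also deletes obliterated objects.
        self.prune()
        return placeholder

    def prune(self):
        for oid in self.obliterated:
            self.objects.pop(oid, None)

    def read(self, oid: str) -> bytes:
        # Follow a replace ref if one exists, as object lookup does for
        # refs/replace/.
        return self.objects[self.replace_refs.get(oid, oid)]
```

After `repo.obliterate(oid, ...)`, the original content is gone from
`repo.objects`, and `repo.read(oid)` returns the placeholder note instead.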

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH] doc: propose hooks managed by the config
  2020-05-06 21:33                         ` Emily Shaffer
  2020-05-06 23:13                           ` brian m. carlson
@ 2020-05-19 20:10                           ` Emily Shaffer
  1 sibling, 0 replies; 125+ messages in thread
From: Emily Shaffer @ 2020-05-19 20:10 UTC (permalink / raw)
  To: brian m. carlson, git, Jeff King, Junio C Hamano, James Ramsay,
	Jonathan Nieder, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Josh Steadmon

On Wed, May 06, 2020 at 02:33:54PM -0700, Emily Shaffer wrote:
> 
> On Sat, Apr 25, 2020 at 08:57:27PM +0000, brian m. carlson wrote:
> > 
> > On 2020-04-20 at 23:53:10, Emily Shaffer wrote:
> > > +=== Config schema
> > > +
> > > +Hooks can be introduced by editing the configuration manually. There are two new
> > > +sections added, `hook` and `hookcmd`.
> > > +
> > > +==== `hook`
> > > +
> > > +Primarily contains subsections for each hook event. These subsections define
> > > +hook command execution order; hook commands can be specified by passing the
> > > +command directly if no additional configuration is needed, or by passing the
> > > +name of a `hookcmd`. If Git does not find a `hookcmd` whose subsection matches
> > > +the value of the given command string, Git will try to execute the string
> > > +directly. Hook event subsections can also contain per-hook-event settings.
> > 
> > Can we say explicitly that the commands are invoked by the shell?  Or is
> > the plan to try to parse them without passing to the shell?
> 
> Sure. If I didn't make it clear it was by mistake, not by intent.
> 
> > 
> > > +Also contains top-level hook execution settings, for example,
> > > +`hook.warnHookDir`, `hook.runHookDir`, or `hook.disableAll`.
> > > +
> > > +----
> > > +[hook "pre-commit"]
> > > +  command = perl-linter
> > > +  command = /usr/bin/git-secrets --pre-commit
> > > +
> > > +[hook "pre-applypatch"]
> > > +  command = perl-linter
> > > +  error = ignore
> > > +
> > > +[hook]
> > > +  warnHookDir = true
> > > +  runHookDir = prompt
> > > +----
> > > +
> > > +==== `hookcmd`
> > > +
> > > +Defines a hook command and its attributes, which will be used when a hook event
> > > +occurs. Unqualified attributes are assumed to apply to this hook during all hook
> > > +events, but event-specific attributes can also be supplied. The example runs
> > > +`/usr/bin/lint-it --language=perl <args passed by Git>`, but for repos which
> > > +include this config, the hook command will be skipped for all events to which
> > > +it's normally subscribed _except_ `pre-commit`.
> > > +
> > > +----
> > > +[hookcmd "perl-linter"]
> > > +  command = /usr/bin/lint-it --language=perl
> > > +  skip = true
> > > +  pre-commit-skip = false
> > > +----
> > 
> > This seems fine to me.  I like this design and it seems sane.
> > 
> > > +== Implementation
> > > +
> > > +=== Library
> > > +
> > > +`hook.c` and `hook.h` are responsible for interacting with the config files. In
> > > +the case when the code generating a hook event doesn't have special concerns
> > > +about how to run the hooks, the hook library will provide a basic API to call
> > > +all hooks in config order with an `argv_array` provided by the code which
> > > +generates the hook event:
> > > +
> > > +*`int run_hooks(const char *hookname, struct argv_array *args)`*
> > > +
> > > +This call includes the hook command provided by `run-command.h:find_hook()`;
> > > +eventually, this legacy hook will be gated by a config `hook.runHookDir`. The
> > > +config is checked against a number of cases:
> > > +
> > > +- "no": the legacy hook will not be run
> > > +- "interactive": Git will prompt the user before running the legacy hook
> > > +- "warn": Git will print a warning to stderr before running the legacy hook
> > > +- "yes" (default): Git will silently run the legacy hook
> > > +
> > > +If `hook.runHookDir` is provided more than once, Git will use the most
> > > +restrictive setting provided, for security reasons.
> > 
> > I don't think this is consistent with the way the rest of our options
> > work.  What if someone generally wants to disable legacy hooks but then
> > works with a program in a repository that requires them?
> 
> Unfortunately this is something I think my end will want to hold firm
> on. In general we disagree with your statement later about not wanting
> to make the .git/config secure. I see your use case, and I anticipate
> two possible workarounds I'd present:
> 
> 1) If working in that repo for the short term, run `git -c
> hook.runHookDir=yes <command> <arg...>` (and therefore allow the config
> from command line scope, which I'm happy with in general). Maybe
> someone would want to use an alias, hookgit or hg? Just kidding.. ;P
> 
> 2) If you're stuck with that repo for the long term, add
> `hook.<hookname>.command = /path/.git/hooks/<hookname>` lines to the local
> config.
> 
> Yes, those are both somewhat user-unfriendly, and I think we can do
> better... I'll have to think more and see what I can come up with.
> Suggestions welcome.

I thought more about this and today I'm revisiting this work (and
starting on patches!) so I figured I'd close the loop, since it'll be
buried in the next round of the design doc.

Refusing to trust the local config is actually contrary to one of the
tenets I was trying to use when designing this - that we should assume
the .git/config is safe, so that we don't end up with bloat later if
.git/config does become safe. The suggestion I made here to disallow
overrides doesn't fit, so I'll drop it. The implementation will allow a
more local config to turn hookdir hooks back on.
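
In other words, `hook.runHookDir` would resolve like any other single-valued
config: the most specific scope wins. A minimal Python sketch of that rule,
assuming settings arrive as (scope, value) pairs; `last_wins_run_hook_dir` is
a hypothetical name, not part of the patches.

```python
# Git reads config from least to most specific scope; for single-valued
# keys, later values override earlier ones.
SCOPE_ORDER = ["system", "global", "local", "worktree", "command"]


def last_wins_run_hook_dir(settings, default="yes"):
    """Return the value from the most specific scope, or `default`."""
    # sorted() is stable, so within one scope the later entry still wins.
    ordered = sorted(settings, key=lambda item: SCOPE_ORDER.index(item[0]))
    return ordered[-1][1] if ordered else default
```

With a global "no" and a local "yes", this returns "yes", so a repository
that needs its hookdir hooks can re-enable them in its own config.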

Thanks, all. By way of status update, I think I'll be able to start
working on this more actively starting this week.

 - Emily

^ permalink raw reply	[flat|nested] 125+ messages in thread

end of thread, other threads:[~2020-05-19 20:10 UTC | newest]

Thread overview: 125+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
2020-03-12  3:56 ` [TOPIC 1/17] Reftable James Ramsay
2020-03-12  3:56 ` [TOPIC 2/17] Hooks in the future James Ramsay
2020-03-12 14:16   ` Emily Shaffer
2020-03-13 17:56     ` Junio C Hamano
2020-04-07 23:01       ` Emily Shaffer
2020-04-07 23:51         ` Emily Shaffer
2020-04-08  0:40           ` Junio C Hamano
2020-04-08  1:09             ` Emily Shaffer
2020-04-10 21:31           ` Jeff King
2020-04-13 19:15             ` Emily Shaffer
2020-04-13 21:52               ` Jeff King
2020-04-14  0:54                 ` [RFC PATCH v2 0/2] configuration-based hook management (was: [TOPIC 2/17] Hooks in the future) Emily Shaffer
2020-04-14  0:54                   ` [RFC PATCH v2 1/2] hook: scaffolding for git-hook subcommand Emily Shaffer
2020-04-14  0:54                   ` [RFC PATCH v2 2/2] hook: add --list mode Emily Shaffer
2020-04-14 15:15                   ` [RFC PATCH v2 0/2] configuration-based hook management Phillip Wood
2020-04-14 19:24                     ` Emily Shaffer
2020-04-14 20:27                       ` Jeff King
2020-04-15 10:01                         ` Phillip Wood
2020-04-14 20:03                     ` Josh Steadmon
2020-04-15 10:08                       ` Phillip Wood
2020-04-14 20:32                     ` Jeff King
2020-04-15 10:01                       ` Phillip Wood
2020-04-15 14:51                         ` Junio C Hamano
2020-04-15 20:30                           ` Emily Shaffer
2020-04-15 22:19                             ` Junio C Hamano
2020-04-15  3:45                 ` [TOPIC 2/17] Hooks in the future Jonathan Nieder
2020-04-15 20:59                   ` Emily Shaffer
2020-04-20 23:53                     ` [PATCH] doc: propose hooks managed by the config Emily Shaffer
2020-04-21  0:22                       ` Emily Shaffer
2020-04-21  1:20                         ` Junio C Hamano
2020-04-24 23:14                           ` Emily Shaffer
2020-04-25 20:57                       ` brian m. carlson
2020-05-06 21:33                         ` Emily Shaffer
2020-05-06 23:13                           ` brian m. carlson
2020-05-19 20:10                           ` Emily Shaffer
2020-04-15 22:42                   ` [TOPIC 2/17] Hooks in the future Jeff King
2020-04-15 22:48                     ` Emily Shaffer
2020-04-15 22:57                       ` Jeff King
2020-03-12  3:57 ` [TOPIC 3/17] Obliterate James Ramsay
2020-03-12 18:06   ` Konstantin Ryabitsev
2020-03-15 22:19   ` Damien Robert
2020-03-16 12:55     ` Konstantin Tokarev
2020-03-26 22:27       ` Damien Robert
2020-03-16 16:32     ` Elijah Newren
2020-03-26 22:30       ` Damien Robert
2020-03-16 18:32     ` Phillip Susi
2020-03-26 22:37       ` Damien Robert
2020-03-16 20:01     ` Philip Oakley
2020-05-16  2:21       ` nbelakovski
2020-03-12  3:58 ` [TOPIC 4/17] Sparse checkout James Ramsay
2020-03-12  4:00 ` [TOPIC 5/17] Partial Clone James Ramsay
2020-03-17  7:38   ` Allowing only blob filtering was: " Christian Couder
2020-03-17 20:39     ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Taylor Blau
2020-03-17 20:39       ` [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name' Taylor Blau
2020-03-17 20:53         ` Eric Sunshine
2020-03-18 10:03           ` Jeff King
2020-03-18 19:40             ` Junio C Hamano
2020-03-18 22:38             ` Eric Sunshine
2020-03-19 17:15               ` Jeff King
2020-03-18 21:05           ` Taylor Blau
2020-03-17 20:39       ` [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s) Taylor Blau
2020-03-17 21:11         ` Eric Sunshine
2020-03-18 21:18           ` Taylor Blau
2020-03-18 11:18         ` Philip Oakley
2020-03-18 21:20           ` Taylor Blau
2020-03-18 10:18       ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Jeff King
2020-03-18 18:26         ` Re*: " Junio C Hamano
2020-03-19 17:03           ` Jeff King
2020-03-18 21:28         ` Taylor Blau
2020-03-18 22:41           ` Junio C Hamano
2020-03-19 17:10             ` Jeff King
2020-03-19 17:09           ` Jeff King
2020-04-17  9:41         ` Christian Couder
2020-04-17 17:40           ` Taylor Blau
2020-04-17 18:06             ` Jeff King
2020-04-21 12:34               ` Christian Couder
2020-04-22 20:41                 ` Taylor Blau
2020-04-22 20:42               ` Taylor Blau
2020-04-21 12:17             ` Christian Couder
2020-03-12  4:01 ` [TOPIC 6/17] GC strategies James Ramsay
2020-03-12  4:02 ` [TOPIC 7/17] Background operations/maintenance James Ramsay
2020-03-12  4:03 ` [TOPIC 8/17] Push performance James Ramsay
2020-03-12  4:04 ` [TOPIC 9/17] Obsolescence markers and evolve James Ramsay
2020-05-09 21:31   ` Noam Soloveichik
2020-05-15 22:26     ` Jeff King
2020-03-12  4:05 ` [TOPIC 10/17] Expel ‘git shell’? James Ramsay
2020-03-12  4:07 ` [TOPIC 11/17] GPL enforcement James Ramsay
2020-03-12  4:08 ` [TOPIC 12/17] Test harness improvements James Ramsay
2020-03-12  4:09 ` [TOPIC 13/17] Cross implementation test suite James Ramsay
2020-03-12  4:11 ` [TOPIC 14/17] Aspects of merge-ort: cool, or crimes against humanity? James Ramsay
2020-03-12  4:13 ` [TOPIC 15/17] Reachability checks James Ramsay
2020-03-12  4:14 ` [TOPIC 16/17] “I want a reviewer” James Ramsay
2020-03-12 13:31   ` Emily Shaffer
2020-03-12 17:31     ` Konstantin Ryabitsev
2020-03-12 17:42       ` Jonathan Nieder
2020-03-12 18:00         ` Konstantin Ryabitsev
2020-03-17  0:43     ` Philippe Blain
2020-03-13 21:25   ` Eric Wong
2020-03-14 17:27     ` Jeff King
2020-03-15  0:36       ` inbox indexing wishlist [was: [TOPIC 16/17] “I want a reviewer”] Eric Wong
2020-03-12  4:16 ` [TOPIC 17/17] Security James Ramsay
2020-03-12 14:38 ` Notes from Git Contributor Summit, Los Angeles (April 5, 2020) Derrick Stolee
2020-03-13 20:47 ` Jeff King
2020-03-15 18:42 ` Jakub Narebski
2020-03-16 19:31   ` Jeff King
  -- strict thread matches above, loose matches on Subject: below --
2019-12-10  2:33 [PATCH 0/6] configuration-based hook management Emily Shaffer
2019-12-10  2:33 ` [PATCH 1/6] hook: scaffolding for git-hook subcommand Emily Shaffer
2019-12-12  9:41   ` Bert Wesarg
2019-12-12 10:47   ` SZEDER Gábor
2019-12-10  2:33 ` [PATCH 2/6] config: add string mapping for enum config_scope Emily Shaffer
2019-12-10 11:16   ` Philip Oakley
2019-12-10 17:21     ` Philip Oakley
2019-12-10  2:33 ` [PATCH 3/6] hook: add --list mode Emily Shaffer
2019-12-12  9:38   ` Bert Wesarg
2019-12-12 10:58   ` SZEDER Gábor
2019-12-10  2:33 ` [PATCH 4/6] hook: support reordering of hook list Emily Shaffer
2019-12-11 19:21   ` Junio C Hamano
2019-12-10  2:33 ` [PATCH 5/6] hook: remove prior hook with '---' Emily Shaffer
2019-12-10  2:33 ` [PATCH 6/6] hook: teach --porcelain mode Emily Shaffer
2019-12-11 19:33   ` Junio C Hamano
2019-12-11 22:00     ` Emily Shaffer
2019-12-11 22:07       ` Junio C Hamano
2019-12-11 23:15         ` Emily Shaffer
2019-12-11 22:42 ` [PATCH 0/6] configuration-based hook management Junio C Hamano
