git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH] hooks: propose repository owner configured hooks
@ 2021-03-18 22:03 Albert Cui via GitGitGadget
  2021-03-18 22:29 ` Junio C Hamano
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Albert Cui via GitGitGadget @ 2021-03-18 22:03 UTC (permalink / raw)
  To: git; +Cc: Albert Cui, Albert Cui

From: Albert Cui <albertqcui@gmail.com>

Hooks today are configured at the repository level, making it difficult to
share hooks across repositories. Configuration-based hook management, by
moving hooks configuration to the config, makes this much easier. However,
there is still no good way for project maintainers to encourage or enforce
adoption of specific hook commands on specific hook events in a repository.
As such, there are many tools that provide this functionality on top of Git.

We propose adding native Git functionality to allow project maintainers to
specify hooks that a user ought to install and utilize in their development
workflows. This patch documents the requirements we propose for this feature
as well as a design sketch for implementation.

Signed-off-by: Albert Cui <albertqcui@gmail.com>
Helped-by: Emily Shaffer <emilyshaffer@google.com>
---
    hooks: propose repository owner configured hooks
    
    Hooks today are configured at the repository level, making it difficult
    to share hooks across repositories. Configuration-based hook management,
    by moving hooks configuration to the config, makes this much easier.
    However, there is still no good way for project maintainers to encourage
    or enforce adoption of specific hook commands on specific hook events in
    a repository. As such, there are many tools that provide this
    functionality on top of Git.
    
    We propose adding native Git functionality to allow project maintainers
    to specify hooks that a user ought to install and utilize in their
    development workflows. This patch documents the requirements we propose
    for this feature as well as a design sketch for implementation.
    
    Signed-off-by: Albert Cui albertqcui@gmail.com Helped-by: Emily Shaffer
    emilyshaffer@google.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-908%2Falbertcui%2Fhooks-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-908/albertcui/hooks-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/908

 Documentation/Makefile                        |   1 +
 .../technical/repository-owner-hooks.txt      | 294 ++++++++++++++++++
 2 files changed, 295 insertions(+)
 create mode 100644 Documentation/technical/repository-owner-hooks.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 81d1bf7a049b..cdd52143f883 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -96,6 +96,7 @@ TECH_DOCS += technical/protocol-common
 TECH_DOCS += technical/protocol-v2
 TECH_DOCS += technical/racy-git
 TECH_DOCS += technical/reftable
+TECH_DOCS += technical/repository-owner-hooks
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
 TECH_DOCS += technical/signature-format
diff --git a/Documentation/technical/repository-owner-hooks.txt b/Documentation/technical/repository-owner-hooks.txt
new file mode 100644
index 000000000000..9d80c7b4c4c3
--- /dev/null
+++ b/Documentation/technical/repository-owner-hooks.txt
@@ -0,0 +1,294 @@
+Repository Owner Hooks Administration
+-------------------------------------
+
+Background
+~~~~~~~~~~
+
+Context
+^^^^^^^
+
+Git has https://git-scm.com/docs/githooks[hooks] functionality to allow users to
+execute commands or scripts when certain Git events occur. Some use cases
+include:
+
+* The `pre-commit` hook event: before committing, a developer may want to lint
+their changes to enforce certain code style and quality. If there are style
+issues, the developer may want the commit to fail so that they can fix the
+issues.
+
+* The `commit-msg` hook event: after committing, repository owners may want to
+enforce a certain commit message style. This may be out of necessity:
+https://www.gerritcodereview.com/[Gerrit Code Review], for example, requires
+each commit to have a Change-Id in the footer.
+
+* The `pre-push` hook: before pushing, repository owners may want to verify that
+pushes are going to the correct remote to prevent leaks.
+
+A common thread between these use cases is to enforce certain behavior across
+many developers working in the same code base, encouraging best practices and
+healthy code quality.
+
+Hooks today are configured at the repository level, making it difficult to share
+hooks across repositories.
+https://lore.kernel.org/git/20210311021037.3001235-2-emilyshaffer@google.com[Configuration-based
+hook management], by moving hooks configuration to the config, makes this much
+easier. However, there is still no good way for project maintainers to encourage
+or enforce adoption of specific hook commands on specific hook events in a
+repository. As such, there are many tools that provide this functionality on top
+of Git (see <<prior-art, Prior Art>>).
+
+We propose adding native Git functionality to allow project maintainers to
+specify hooks that a user ought to install and utilize in their development
+workflows.
+
+User Goals / Critical User Journeys
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* As a repository owner / project maintainer,
+
+    ** I want to enforce code style so I require developers to use a
+    standardized linter.
+
+    ** I want to prevent leaks / share code widely so I check that developers
+    are uploading code to the right place.
+
+    ** I want this to just work for all the developers in my repository, without
+    needing to support them through configuration.
+
+* As a developer developing in a repository,
+
+    ** I want to set up my workspace
+
+    ** I want control over what code runs on my machine
+
+    ** I want to skip hooks from running (for various reasons)
+
+Design Principles
+~~~~~~~~~~~~~~~~~
+
+* *Make it easy:* Developers just want to get their work done. Introducing
+friction is bad, especially when it prevents development from even starting e.g.
+workspace set up.
+
+* *Treat hooks as software, not configuration:* We take seriously the
+responsibility that comes with causing arbitrary code to run on a user's
+machine. Such code needs to come from a reputable source, be automatically
+updated, and run with user consent.
+
+Feature Requirements
+~~~~~~~~~~~~~~~~~~~~
+
+Minimum Feature Set
+^^^^^^^^^^^^^^^^^^^
+
+* A repository owner can specify a configuration for what hook commands are
+enabled for what hook events
+
+* A repository owner can specify, in this configuration, where the hook
+commands reside
+
+    ** This could be a path to a script/binary within the repository
+
+    ** This could be a path to a script/binary contained within submodules of
+    the repository
+
+    ** This could be a user installed command or script/binary that exists
+    outside of the repository and is present in `$PATH`
+
+* Users must explicitly approve hook execution at least once (yes/no)
+
+    ** This could happen during setup or at execution time
+
+    ** When a hook command changes, a user should re-approve execution (note:
+    implementation should not interfere with requirement listed in “Fast
+    Follows")
+
+    * Users do not need to run setup scripts to install hooks --- the setup flow
+    happens automatically at clone time
+
+* Automation is able to continue to use clone and other commands
+non-interactively
+
+* Works across Windows/Linux/macOS
+
+Fast Follows
+^^^^^^^^^^^^
+
+* When prompted to execute a hook, users can specify always or never, even if
+the hook updates
+
+Nice to Haves
+^^^^^^^^^^^^^
+
+* A method to skip hook execution i.e. `--no-verify` works everywhere
+
+* Support a “warnings only mode” where hooks run but don’t block commands from
+executing
+
+Out of Scope
+^^^^^^^^^^^^
+
+* Ensuring the user has installed software that isn't distributed with the repo
+
+Implementation Exploration: Check "magic" branch for configs at fetch time
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Example workflow
+^^^^^^^^^^^^^^^^
+
+* Perform fetch as normal
+
+* After fetch is complete, Git checks for a "magic" config branch (e.g.
++origin/refs/recommended-config+) which contains information about config lines
+an end-user may want (including hooks).
+
+* As part of the fetch subcommand, Git prompts users to install the configs
+contained there.
+
+    ** User responses to that prompt could be "sticky" - e.g. a user could reply
+    "no (this time)", "no (never)", "yes (this time)", or "yes (always)".
+    Always/never indicate that the user trusts the remote this config is coming
+    from, and should not apply to configs fetched from other remotes.
+
+Later, we might want to do this before the initial clone is performed; that
+workflow looks like:
+
+* During clone, perform ls-refs as normal
+
+* If the server has a "magic" config branch, fetch only that config branch.
+
+* Prompt users as described above.
+
+* Perform the rest of the clone.
+
+Pros
+^^^^
+
+* Repository owners have a method for providing recommended config for
+contributors.
+
+* Installation flow happens without additional user intervention.
+
+* Keeping config branch and history separate from code branch and history means
+it is versioned, but not tied to user's checkout.
+
+* Letting users specify "always" or "never" reduces amount of pain introduced by
+interactive "configuration wizard" flow.
+
+Cons
+^^^^
+
+* Requires addition of step to fetch (and later clone).
+
+* Turning a "set and forget" command like clone into an interactive session with
+the user is not ideal; care must be taken to avoid breaking bots.
+
+* Inflating configs and executables from a remote tracking branch which is never
+checked out could be slow.
+
+Future Work
+~~~~~~~~~~~
+
+* Extending this to allow repository owners to specify specific configurations
+in general e.g. this repository should use partial-clone with these parameters.
+
+* Extending this to support submodules: We want to make sure this works in a way
+that's easy to adapt to submodules, who would likely need to run the same hooks
+as the superproject; for example, submodules could inherit the superproject
+config.
+
+* A way to perform general set up of a code base e.g. installing dependencies,
+any first-build related tasks
+
+[[prior-art]]
+Prior Art
+~~~~~~~~~
+
+Husky
+^^^^^
+
+* Add it as a dev dependency in package.json
+
+* Supports out-of-the-box configuration: Adding a `prepare` script in
+package.json with `husky install` will automate the installation of the husky
+script in the .git directory.
+
+pre-commit
+^^^^^^^^^^
+
+* Acts as a package manager for linting, installing the required linters as
+needed
+
+* `pre-commit install` installs the pre-commit hook shim
+
+* Config is checked into the repository, allowing owners to set versions for
+linters
+
+* Provides some out-of-the-box hooks:
+https://github.com/pre-commit/pre-commit-hooks[https://github.com/pre-commit/pre-commit-hooks]
+
+Repo Hooks
+^^^^^^^^^^
+
+* A Git repository containing hooks commands is specified in the manifest and
+checked out during `repo init`.
+
+* A `repo-hooks` element specifies the Git repository where the hooks code lives
+and which hook to enable. This is typically
+https://android.googlesource.com/platform/tools/repohooks/.
+
+* The only hook event currently supported is `pre-upload`, running when people
+run +repo upload+.
+
+* The hooks code file name must be the same as the hook event name e.g.
+`pre-upload.py`.
+
+* When a hook is used for the first time, the user is prompted
+to approve execution
+
+    ** For manifests fetched via secure protocols (e.g. https://), the user is
+    prompted once.
+
+    ** For insecure protocols (e.g. http://), the user is prompted whenever the
+    registered repohooks project is updated and a hook is triggered.
+
+* Repo hooks must be python modules (Python 2 or 3, with 2 support deprecated)
+
+* Repo hooks run from the top of the repo directory
+
+    ** if the repo client is under `~/tree/`, then that is where the hook runs,
+    even if you ran repo in a git repository at `~/tree/src/foo/`, or in a
+    subdirectory of that git repository in `~/tree/src/foo/bar/`.
+
+* `--no-verify` allows developers to bypass hooks.
+
+* `--ignore-hooks` allows developers to run hooks without blocking
+
+// The '+' in Gerrit URL frightens asciidoctor.
+:repohook-read: https://android.googlesource.com/platform/tools/repohooks/+/refs/heads/master/rh/config.py
+:repohook-config: https://android.googlesource.com/platform/tools/repohooks/+/refs/heads/master#config-files
+
+* The hooks {repohook-read}[read] a {repohook-config}[config file] to determine
+what code to actually execute.
+
+    ** Config can be global e.g. at the top of the Repo workspace and local, and
+    they get merged together.
+
+    ** Local config: looks for the closest config file to where you ran repo,
+    allowing each repository to configure its own hooks
+
+* Example code that gets run:
+
+    ** pylint3 (requiring this to be installed on the host)
+
+    ** commit_msg_bug_field: require “Bug:” to be present in the commit message
+    (built into repohooks)
+
+Glossary
+~~~~~~~~
+
+*hook event*: A point during Git’s execution where user scripts may be run, for
+example, `prepare-commit-msg` or `pre-push`.
+
+*hook command*: A user script or executable which will be run on one or more
+hook events.

base-commit: a5828ae6b52137b913b978e16cd2334482eb4c1f
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hooks: propose repository owner configured hooks
  2021-03-18 22:03 [PATCH] hooks: propose repository owner configured hooks Albert Cui via GitGitGadget
@ 2021-03-18 22:29 ` Junio C Hamano
  2021-03-18 23:45   ` Albert Cui
  2021-03-19  1:28 ` brian m. carlson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2021-03-18 22:29 UTC (permalink / raw)
  To: Albert Cui via GitGitGadget; +Cc: git, Albert Cui

"Albert Cui via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Albert Cui <albertqcui@gmail.com>
>
> Hooks today are configured at the repository level, making it difficult to
> share hooks across repositories. Configuration-based hook management, by
> moving hooks configuration to the config, makes this much easier. However,
> there is still no good way for project maintainers to encourage or enforce
> adoption of specific hook commands on specific hook events in a repository.
> As such, there are many tools that provide this functionality on top of Git.
>
> We propose adding native Git functionality to allow project maintainers to
> specify hooks that a user ought to install and utilize in their development
> workflows. This patch documents the requirements we propose for this feature
> as well as a design sketch for implementation.
>
> Signed-off-by: Albert Cui <albertqcui@gmail.com>
> Helped-by: Emily Shaffer <emilyshaffer@google.com>
> ---

A copy of Android or Chromium project I have on my disk is owned by
me in the distributed world---open projects do not and do not have
to care who makes and has copies, even though they may care who can
push to the project's own repository.

So avoid "repository owner configured", when you mean "project
configured" or "project controlled".

On the other side of the coin is that this document should avoid
reference to a "repository" in an ambiguous way, as it can refer to
the central meeting place the project controls, lets developers to
clone and fetch from, and push into, and it can also refer to the
copy of that central meeting place individual contributors work in.

In our own documentation, we often refer to the former as "the
central repository", and the latter as "a clone" (as in "you start
working in your own clone").

>     hooks: propose repository owner configured hooks
> ...
>     for this feature as well as a design sketch for implementation.

No need to duplicate this text twice.  

> @@ -0,0 +1,294 @@
> +Repository Owner Hooks Administration
> +-------------------------------------

Ditto and throughout the document.

> +Background
> +~~~~~~~~~~
> +
> +Context
> +^^^^^^^
> +
> +Git has https://git-scm.com/docs/githooks[hooks] functionality to allow users to
> +execute commands or scripts when certain Git events occur. Some use cases
> +include:
> +
> +* The `pre-commit` hook event: before committing, a developer may want to lint
> +their changes to enforce certain code style and quality. If there are style
> +issues, the developer may want the commit to fail so that they can fix the
> +issues.

This is irrelevant in the context of this proposal, no?  It is not
that "the developer may want".

Rather, it is "the project may want the commit to fail so that they
won't upload commits that violate the house style", no?

> +* The `commit-msg` hook event: after committing, repository owners may want to
> +enforce a certain commit message style. This may be out of necessity:
> +https://www.gerritcodereview.com/[Gerrit Code Review], for example, requires
> +each commit to have a Change-Id in the footer.

This is a more honest descrition ;-)

> +We propose adding native Git functionality to allow project maintainers to
> +specify hooks that a user ought to install and utilize in their development
> +workflows.

OK.

> +User Goals / Critical User Journeys
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +* As a repository owner / project maintainer,
> +
> +    ** I want to enforce code style so I require developers to use a
> +    standardized linter.
> +
> +    ** I want to prevent leaks / share code widely so I check that developers
> +    are uploading code to the right place.

I understand "You want to prevent leaks", but not "I want to share
code widely".  Perhaps you meant s/widely/narrowly/?

> +    ** I want this to just work for all the developers in my repository, without
> +    needing to support them through configuration.
> +
> +* As a developer developing in a repository,
> +
> +    ** I want to set up my workspace
> +
> +    ** I want control over what code runs on my machine
> +
> +    ** I want to skip hooks from running (for various reasons)

Also I want to run hooks other people may not run.  And that is one
thing Emily's config based stuff makes easier.


> +Design Principles
> +~~~~~~~~~~~~~~~~~
> +
> +* *Make it easy:* Developers just want to get their work done. Introducing
> +friction is bad, especially when it prevents development from even starting e.g.
> +workspace set up.
> +
> +* *Treat hooks as software, not configuration:* We take seriously the
> +responsibility that comes with causing arbitrary code to run on a user's
> +machine. Such code needs to come from a reputable source, be automatically
> +updated, and run with user consent.

OK.

> +Feature Requirements
> +~~~~~~~~~~~~~~~~~~~~
> +
> +Minimum Feature Set
> +^^^^^^^^^^^^^^^^^^^
> +
> +* A repository owner can specify a configuration for what hook commands are
> +enabled for what hook events
> +
> +* A repository owner can specify, in this configuration, where the hook
> +commands reside
> +
> +    ** This could be a path to a script/binary within the repository
> +
> +    ** This could be a path to a script/binary contained within submodules of
> +    the repository
> +
> +    ** This could be a user installed command or script/binary that exists
> +    outside of the repository and is present in `$PATH`
> +
> +* Users must explicitly approve hook execution at least once (yes/no)
> +
> +    ** This could happen during setup or at execution time
> +
> +    ** When a hook command changes, a user should re-approve execution (note:
> +    implementation should not interfere with requirement listed in “Fast
> +    Follows")
> +
> +    * Users do not need to run setup scripts to install hooks --- the setup flow
> +    happens automatically at clone time

This one is probably unacceptable, as it makes it impossible to
perform unattended cloning.  A better alternative may be to make it
part of the build procedure.

> +* Automation is able to continue to use clone and other commands
> +non-interactively

This directly contradicts with the "setup flow happens
automatically", doesn't it?  The user can pretend to be (or the
"automation" detection may incorrectly misidentify the users to be)
an automated client when cloning the project, and would not trigger
any setup.  A separate setup procedure needs to be there to salvage
such users anyway.


> +* Works across Windows/Linux/macOS
> +
> +Fast Follows
> +^^^^^^^^^^^^
> +
> +* When prompted to execute a hook, users can specify always or never, even if
> +the hook updates

This contradicts the earlier claim to take the responsibility
seriously, doesn't it?  I think the convenience feature is useful,
but then we should tone down the claim---we allow users to be loose
and blindly trust their own project, instead of taking it always
seriously.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hooks: propose repository owner configured hooks
  2021-03-18 22:29 ` Junio C Hamano
@ 2021-03-18 23:45   ` Albert Cui
  0 siblings, 0 replies; 26+ messages in thread
From: Albert Cui @ 2021-03-18 23:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Albert Cui via GitGitGadget, git

On Thu, Mar 18, 2021 at 3:29 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> So avoid "repository owner configured", when you mean "project
> configured" or "project controlled".
>
> On the other side of the coin is that this document should avoid
> reference to a "repository" in an ambiguous way, as it can refer to
> the central meeting place the project controls, lets developers to
> clone and fetch from, and push into, and it can also refer to the
> copy of that central meeting place individual contributors work in.
>
> In our own documentation, we often refer to the former as "the
> central repository", and the latter as "a clone" (as in "you start
> working in your own clone").
>

Ack, thanks for the correct terminology.

> > +* The `pre-commit` hook event: before committing, a developer may want to lint
> > +their changes to enforce certain code style and quality. If there are style
> > +issues, the developer may want the commit to fail so that they can fix the
> > +issues.
>
> This is irrelevant in the context of this proposal, no?  It is not
> that "the developer may want".
>
> Rather, it is "the project may want the commit to fail so that they
> won't upload commits that violate the house style", no?

Good point

> > +User Goals / Critical User Journeys
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > +* As a repository owner / project maintainer,
> > +
> > +    ** I want to prevent leaks / share code widely so I check that developers
> > +    are uploading code to the right place.
>
> I understand "You want to prevent leaks", but not "I want to share
> code widely".  Perhaps you meant s/widely/narrowly/?
>

There are two opposite intentions here and the slash is making it confusing.
Some projects have internal and external central repositories, and they want to
encourage developers to push to the external repositories when possible.

> > +Design Principles
> > +~~~~~~~~~~~~~~~~~
> >
> > +* *Treat hooks as software, not configuration:* We take seriously the
> > +responsibility that comes with causing arbitrary code to run on a user's
> > +machine. Such code needs to come from a reputable source, be automatically
> > +updated, and run with user consent.

> > +Feature Requirements
> > +~~~~~~~~~~~~~~~~~~~~
> > +
> > +Minimum Feature Set
> > +^^^^^^^^^^^^^^^^^^^
> > +    * Users do not need to run setup scripts to install hooks --- the setup flow
> > +    happens automatically at clone time
>
> This one is probably unacceptable, as it makes it impossible to
> perform unattended cloning.  A better alternative may be to make it
> part of the build procedure.
>
> > +* Automation is able to continue to use clone and other commands
> > +non-interactively
>
> This directly contradicts with the "setup flow happens
> automatically", doesn't it?  The user can pretend to be (or the
> "automation" detection may incorrectly misidentify the users to be)
> an automated client when cloning the project, and would not trigger
> any setup.  A separate setup procedure needs to be there to salvage
> such users anyway.

I don't think there's a contradiction here. Users should always be able to
decline hooks since it's their own machine, so the goal shouldn't be to prevent
uses from declining setup.

That said, most users want to do the right thing so that their patches
get accepted
e.g. adopt the right code styles via linting, so the goal should be to
make it as easy
as possible for them to do that.

In the ideal case: automation detection is great and there are few
false positives :)

If not, we can't break unattended clones, so the default clone
behavior would always have
to be no-hooks-setup... maybe that speaks to `git clone --recommended-setup`?

+1 we'd need a method for a user to trigger a set up if they change
their minds or
need to recover.

> > +* Works across Windows/Linux/macOS
> > +
> > +Fast Follows
> > +^^^^^^^^^^^^
> > +
> > +* When prompted to execute a hook, users can specify always or never, even if
> > +the hook updates
>
> This contradicts the earlier claim to take the responsibility
> seriously, doesn't it?  I think the convenience feature is useful,
> but then we should tone down the claim---we allow users to be loose
> and blindly trust their own project, instead of taking it always
> seriously.
>

I don't think this is contradicting since the user is consenting, but
maybe the principle
can be clearer:

"Execution of code must require user consent, and users should clearly
understand
what they are consenting to." That is, I would imagine for a prompt
for "always" we'd

1) have clear help text saying that hooks from $REMOTE will be
automatically installed
when they change
2) give an FYI if hooks the hooks do change

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hooks: propose repository owner configured hooks
  2021-03-18 22:03 [PATCH] hooks: propose repository owner configured hooks Albert Cui via GitGitGadget
  2021-03-18 22:29 ` Junio C Hamano
@ 2021-03-19  1:28 ` brian m. carlson
  2021-03-19 10:27 ` Ævar Arnfjörð Bjarmason
  2021-03-26  1:43 ` [PATCH v2] hooks: propose project " Albert Cui via GitGitGadget
  3 siblings, 0 replies; 26+ messages in thread
From: brian m. carlson @ 2021-03-19  1:28 UTC (permalink / raw)
  To: Albert Cui via GitGitGadget; +Cc: git, Albert Cui

[-- Attachment #1: Type: text/plain, Size: 8563 bytes --]

On 2021-03-18 at 22:03:35, Albert Cui via GitGitGadget wrote:
> +User Goals / Critical User Journeys
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +* As a repository owner / project maintainer,
> +
> +    ** I want to enforce code style so I require developers to use a
> +    standardized linter.

This sounds like you're proposing pre-commit hooks for this purpose.
Let me quote you this passage from the FAQ that I wrote that explains my
position perfectly:

  [S]ome advanced users find `pre-commit` hooks to be an impediment to
  workflows that use temporary commits to stage work in progress or that create
  fixup commits, so it's better to push these kinds of checks to the server
  anyway.

It also outlines that pre-commit hooks are not an effective control and
for effective implementation, this must live on the server.

I can never imagine myself wanting to use a pre-commit hook for these
reasons.  It would interrupt my typical workflow which uses many, many
fixup and squash commits and near constant rebases.

Similarly, a pre-push hook prevents me from pushing up in-progress code
to a remote so my colleagues and other interested parties can follow my
progress, contribute design feedback, and, if I need to be out of the
office for a while, pick up my work and continue it.

> +    ** I want to prevent leaks / share code widely so I check that developers
> +    are uploading code to the right place.

This is also not an effective control.  Any user can bypass this
trivially, so there's little point in shipping a hook that does this.

> +    ** I want this to just work for all the developers in my repository, without
> +    needing to support them through configuration.

Unfortunately, in most cases, what hooks are to be used and whether they
should be used is a developer choice, and developers should be able to
use or not use hooks much the same way as they choose an editor.  Thus, a
proposal that a project maintainer should specify a set of hooks to be
used on a developer machine is much like that maintainer proposing that
a developer use a specific editor.

While there are situations where that's done, in general, it's not a
best practice to do so, so I'm generally negative on the idea of
automatically deploying hooks to developer machines.  I think doing so
is a fundamental mistake in the developer process in most circumstances.
Certainly there are exceptions, but I think they are so few that the
burden of running a script is very minor in comparison.

> +Design Principles
> +~~~~~~~~~~~~~~~~~
> +
> +* *Make it easy:* Developers just want to get their work done. Introducing
> +friction is bad, especially when it prevents development from even starting e.g.
> +workspace set up.

I think this assumes that developers need these hooks to be productive,
and I'm not in agreement with that assumption.  For the reasons I've
outlined above, I think that's a misunderstanding about best practices
and how codebases should be structured.

> +* *Treat hooks as software, not configuration:* We take seriously the
> +responsibility that comes with causing arbitrary code to run on a user's
> +machine. Such code needs to come from a reputable source, be automatically
> +updated, and run with user consent.

I don't think we can securely automatically update hooks from a remote
source.  That sounds like a terrible idea from a security perspective.

If users really want this, they can add a post-checkout hook to do so.

> +Feature Requirements

> +* Users must explicitly approve hook execution at least once (yes/no)
> +
> +    ** This could happen during setup or at execution time
> +
> +    ** When a hook command changes, a user should re-approve execution (note:
> +    implementation should not interfere with requirement listed in “Fast
> +    Follows")
> +
> +    * Users do not need to run setup scripts to install hooks --- the setup flow
> +    happens automatically at clone time

This sounds like you're proposing automatically installing hooks, which
would of course be a security vulnerability.  If that's not what you're
proposing, then perhaps you'll want to rephrase this point.

> +* Automation is able to continue to use clone and other commands
> +non-interactively
> +
> +* Works across Windows/Linux/macOS

Git supports other platforms as well.

> +Implementation Exploration: Check "magic" branch for configs at fetch time
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Example workflow
> +^^^^^^^^^^^^^^^^
> +
> +* Perform fetch as normal
> +
> +* After fetch is complete, Git checks for a "magic" config branch (e.g.
> ++origin/refs/recommended-config+) which contains information about config lines
> +an end-user may want (including hooks).

As I explain further below, shipping config is a security problem.

> +* As part of the fetch subcommand, Git prompts users to install the configs
> +contained there.

I don't want to be prompted to install hooks, because I never want to
use hooks from a repository like this.

Moreover, how could a user possibly make an educated decision?  They
can't actually see the hooks at this point because no checkout has
occurred, so the user must make a leap of faith based on the purported
reputation of the repository owner.

In similar situations, such as piping curl to bash, the recommendation
is always to download the script manually, inspect it for any problems
or vulnerabilities, and only then execute it.  This proposal doesn't
allow that.

> +    ** User responses to that prompt could be "sticky" - e.g. a user could reply
> +    "no (this time)", "no (never)", "yes (this time)", or "yes (always)".
> +    Always/never indicate that the user trusts the remote this config is coming
> +    from, and should not apply to configs fetched from other remotes.

"Yes (always)" is almost certainly a security problem.

> +Future Work
> +~~~~~~~~~~~
> +
> +* Extending this to allow repository owners to specify specific configurations
> +in general e.g. this repository should use partial-clone with these parameters.

I don't think any sort of automatic configuration is going to be secure.
Git has had numerous vulnerabilities with the .gitmodules file and it is
highly restricted in its contents.  So I'm pretty much completely
opposed to this since I think it's likely to result in Git becoming a
CVE magnet.

Even if the entire rest of the project disagreed with me and adopted
this, the config would need to be restricted to an allowlist, not a
denylist, because we have good and sufficient evidence that the latter
is not secure.  Even things as simple as changing diff colors can cause
a security problem (e.g., hiding malicious code in a code review), so
I'm having trouble imagining how we could come up with more than a tiny
set of options that could be legitimately shipped securely, at which
point the extra maintenance and development burden becomes excessive.

> +* A way to perform general set up of a code base e.g. installing dependencies,
> +any first-build related tasks

There are already a variety of proposals for this, such as
https://github.com/github/scripts-to-rule-them-all.  I don't believe Git
needs to be in the business of doing this, and all of my concerns about
security are appropriate here as well.

In general, I have grave concerns about this proposal.  It makes
assumptions that users should or will want to use the repository owner's
hooks, which I disagree with.  I think prompting users to install hooks
will encourage them to be much more likely to do so, especially for less
skilled users, which is likely to lead to social engineering attacks.
And overall, this proposal sounds like it does not give enough
consideration to the giant security risks it introduces.  I don't think
this proposal as it stands should be accepted, and I am having a hard
time thinking of any similar proposal I could support.

I appreciate that this proposal does make things slightly easier for a
subset of users and environments, but I think picking a standard set of
scripts for those users and environments and letting users execute them
if and when they want is only slightly more difficult and much more
secure.  I've used that approach at work and it works great, so I feel
confident it can work in most other environments as well.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hooks: propose repository owner configured hooks
  2021-03-18 22:03 [PATCH] hooks: propose repository owner configured hooks Albert Cui via GitGitGadget
  2021-03-18 22:29 ` Junio C Hamano
  2021-03-19  1:28 ` brian m. carlson
@ 2021-03-19 10:27 ` Ævar Arnfjörð Bjarmason
  2021-04-06  0:35   ` Albert Cui
  2021-03-26  1:43 ` [PATCH v2] hooks: propose project " Albert Cui via GitGitGadget
  3 siblings, 1 reply; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-19 10:27 UTC (permalink / raw)
  To: Albert Cui via GitGitGadget; +Cc: git, Albert Cui


On Thu, Mar 18 2021, Albert Cui via GitGitGadget wrote:

> We propose adding native Git functionality to allow project maintainers to
> specify hooks that a user ought to install and utilize in their development
> workflows. This patch documents the requirements we propose for this feature
> as well as a design sketch for implementation.

I like this goal. It's something I proposed (off-list) before and did a
small write-up of here:
https://lore.kernel.org/git/87zi6eakkt.fsf@evledraar.gmail.com/

As far as I recall the response in the room at the time was the expected
combination of "sure that would be neat" and "let's see the
patches". I.e. I don't think there's hard objections to the existence of
such a facility, but of course the devel is in the details, and
obviously I never followed-up with actually trying to implement it.

With config-based hooks this'll be much easier for the hook case, and
I've tried to help that along[A]. I'd be interested in reviewing any
effort in this area as well. The ".githooks/*" case in that proposal
goes away with config-based hooks (since they wouldn't be special
anymore, just config).

> +Example workflow
> +^^^^^^^^^^^^^^^^
> +
> +* Perform fetch as normal
> +
> +* After fetch is complete, Git checks for a "magic" config branch (e.g.
> ++origin/refs/recommended-config+) which contains information about config lines
> +an end-user may want (including hooks).

Why collapse the many-to-many branch/config relationship to many-one
this way instead of having a .gitconfig at the top-level? Then you can
seamlessly have per-branch config, and it'll work if you later locally
clone the repo or just transport that branch (e.g. via bundle).

But reading ahead...

> +* As part of the fetch subcommand, Git prompts users to install the configs
> +contained there.
> +
> +    ** User responses to that prompt could be "sticky" - e.g. a user could reply
> +    "no (this time)", "no (never)", "yes (this time)", or "yes (always)".
> +    Always/never indicate that the user trusts the remote this config is coming
> +    from, and should not apply to configs fetched from other remotes.

As noted in the proposal I linked I think anyone thinking about this
would do well to examine the trade-off Emacs's implementation of this
uses, since it manages to safely open arbitrary files that'll
potentially run arbitrary code on-open, but only if the user opts-in.

> +Later, we might want to do this before the initial clone is performed; that
> +workflow looks like:
> +
> +* During clone, perform ls-refs as normal
> +
> +* If the server has a "magic" config branch, fetch only that config branch.

...the reason for the magic branch is this before-clone use-case?

Unless there's a really strong use-case for that I'd think the
per-branch .gitconfig would be a better trade-off, but even then there
would be ways to make that work obviously in the many-many case, and
still e.g. have a branch advertise a config blob for its "main" branch
as a special case or something.

I also suspect an unstated constraint of having this in a magic branch
is the limitation of some git hosting provider's ACL
implementations. More on that later...

> +* Prompt users as described above.
> +
> +* Perform the rest of the clone.
> [...]
> +* Repository owners have a method for providing recommended config for
> +contributors.
> +
> +* Installation flow happens without additional user intervention.
> +
> +* Keeping config branch and history separate from code branch and history means
> +it is versioned, but not tied to user's checkout.
> +
> +* Letting users specify "always" or "never" reduces amount of pain introduced by
> +interactive "configuration wizard" flow.
> +
> +Cons
> +^^^^
> +
> +* Requires addition of step to fetch (and later clone).
> +
> +* Turning a "set and forget" command like clone into an interactive session with
> +the user is not ideal; care must be taken to avoid breaking bots.
> +
> +* Inflating configs and executables from a remote tracking branch which is never
> +checked out could be slow.
> +
> +Future Work
> +~~~~~~~~~~~
> +
> +* Extending this to allow repository owners to specify specific configurations
> +in general e.g. this repository should use partial-clone with these parameters.

I don't see why such a proposal should narrowly confine itself to hooks.

Once we have config-based hooks then hooks are just configuration.

If we're going to pick up arbitrary configuration from remotes/repos on
request then we'd be starting with the most dangerous configuration.

I think a better way to start such an effort incrementally would be:

1. Audit git's config parser for the safety of parsing arbitrary config,
   we parse .gitmodules now, do we feel it's safe enough to parse
   arbitrary config (probably, but worth checking).

2. Add reflection to git's own config variables. Right now we have this
   in the binary generated via a grep from the documentation. Similar to
   Emacs's implementation we should/could categorize all our known
   config variables by safety.

   Hooks are the least safe, something like a diff.context=N setting the
   repo wants to suggest a -U<n> setting much safer (just tweaking our
   existing diff algorithm) etc. But even in those cases we'd want to
   proceed slowly, e.g. is that config parsing for that -U<n> defensive
   enough to be safe for arbitrary data?

3. Users should be able to e.g. configure "yes, for any repo I clone
   they can tweak 'safe'" variables. This would reduce user dialog
   fatigue, and thus increase safety. I.e. a repo who wants to set a
   thing like hooks would stand out, but something that e.g. wants to
   change the diff order would pass on existing global configuration.

4. Smarter ACL that magic remote+magic branch: It seems like an obvious
   thing to me to want that if I clone e.g. a random clone of git.git
   that I'd want to setup config for it IFF the .gitconfig in it is
   reachable from a tag GPG signed by <approved key>.

   Git's a distributed system, so while I don't think it's bad to have
   some feature like "I always trust config from this remote" (e.g. a
   corporate environment where you know its .gitconfig is
   guarded/audited) we should think about more distributed-friendly
   solutions first and if possible guide users towards those.

A. https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2] hooks: propose project configured hooks
  2021-03-18 22:03 [PATCH] hooks: propose repository owner configured hooks Albert Cui via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-03-19 10:27 ` Ævar Arnfjörð Bjarmason
@ 2021-03-26  1:43 ` Albert Cui via GitGitGadget
  2021-03-29 23:20   ` Emily Shaffer
                     ` (3 more replies)
  3 siblings, 4 replies; 26+ messages in thread
From: Albert Cui via GitGitGadget @ 2021-03-26  1:43 UTC (permalink / raw)
  To: git
  Cc: Albert Cui, brian m. carlson,
	Ævar Arnfjörð Bjarmason, Albert Cui, Albert Cui

From: Albert Cui <albertqcui@gmail.com>

Hooks today are configured at the repository level, making it difficult to
share hooks across repositories. Configuration-based hook management, by
moving hooks configuration to the config, makes this much easier. However,
there is still no good way for project maintainers to encourage or enforce
adoption of specific hook commands on specific hook events in a repository.
As such, there are many tools that provide this functionality on top of Git.

This patch documents the requirements we propose for this feature as well as
a design sketch for implementation.

Signed-off-by: Albert Cui <albertqcui@gmail.com>
Helped-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I5f6747524b97c51dfe5fa28e48ea03981b2da5b8
---
    [RFC] hooks: propose project configured hooks
    
    V2:
    
     * Clarify usage of the word repository; "repository owner" is
       specifically confusing so changed to "project"
     * Add section about local vs remote hooks, and why think why there's
       still use case for local hooks
     * Change design principles section to be more explicit about security
       considerations
     * Introduce requirement for a command to set up hooks
     * Add UX examples
    
    Signed-off-by: Albert Cui albertqcui@gmail.com Helped-by: Emily Shaffer
    emilyshaffer@google.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-908%2Falbertcui%2Fhooks-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-908/albertcui/hooks-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/908

Range-diff vs v1:

 1:  2a558d06245d ! 1:  473e14edd441 hooks: propose repository owner configured hooks
     @@ Metadata
      Author: Albert Cui <albertqcui@gmail.com>
      
       ## Commit message ##
     -    hooks: propose repository owner configured hooks
     +    hooks: propose project configured hooks
      
          Hooks today are configured at the repository level, making it difficult to
          share hooks across repositories. Configuration-based hook management, by
     @@ Commit message
          adoption of specific hook commands on specific hook events in a repository.
          As such, there are many tools that provide this functionality on top of Git.
      
     -    We propose adding native Git functionality to allow project maintainers to
     -    specify hooks that a user ought to install and utilize in their development
     -    workflows. This patch documents the requirements we propose for this feature
     -    as well as a design sketch for implementation.
     +    This patch documents the requirements we propose for this feature as well as
     +    a design sketch for implementation.
      
          Signed-off-by: Albert Cui <albertqcui@gmail.com>
          Helped-by: Emily Shaffer <emilyshaffer@google.com>
     +    Change-Id: I5f6747524b97c51dfe5fa28e48ea03981b2da5b8
      
       ## Documentation/Makefile ##
      @@ Documentation/Makefile: TECH_DOCS += technical/protocol-common
       TECH_DOCS += technical/protocol-v2
       TECH_DOCS += technical/racy-git
       TECH_DOCS += technical/reftable
     -+TECH_DOCS += technical/repository-owner-hooks
     ++TECH_DOCS += technical/project-configured-hooks
       TECH_DOCS += technical/send-pack-pipeline
       TECH_DOCS += technical/shallow
       TECH_DOCS += technical/signature-format
      
     - ## Documentation/technical/repository-owner-hooks.txt (new) ##
     + ## Documentation/technical/project-configured-hooks.txt (new) ##
      @@
     -+Repository Owner Hooks Administration
     -+-------------------------------------
     ++Project Configured Hooks
     ++------------------------
      +
      +Background
      +~~~~~~~~~~
     @@ Documentation/technical/repository-owner-hooks.txt (new)
      +execute commands or scripts when certain Git events occur. Some use cases
      +include:
      +
     -+* The `pre-commit` hook event: before committing, a developer may want to lint
     -+their changes to enforce certain code style and quality. If there are style
     -+issues, the developer may want the commit to fail so that they can fix the
     -+issues.
     ++* The `pre-commit` hook event:
      +
     -+* The `commit-msg` hook event: after committing, repository owners may want to
     -+enforce a certain commit message style. This may be out of necessity:
     ++  ** A developer may want to lint their changes to enforce certain code style
     ++  and quality. The project may want the commit to fail so that commits that
     ++  violate the project's style don't get uploaded.
     ++
     ++  ** The project may want to prevent developers from committing passwords or
     ++  other sensitive information e.g. via
     ++  https://github.com/awslabs/git-secrets[git-secrets].
     ++
     ++* The `commit-msg` hook event: the project may want to enforce a certain commit
     ++message style. This may be out of necessity:
      +https://www.gerritcodereview.com/[Gerrit Code Review], for example, requires
      +each commit to have a Change-Id in the footer.
      +
     -+* The `pre-push` hook: before pushing, repository owners may want to verify that
     -+pushes are going to the correct remote to prevent leaks.
     ++* The `pre-push` hook: the project may want to verify that pushes are going to
     ++the correct central repository to prevent leaks.
      +
      +A common thread between these use cases is to enforce certain behavior across
      +many developers working in the same code base, encouraging best practices and
      +healthy code quality.
      +
     -+Hooks today are configured at the repository level, making it difficult to share
     -+hooks across repositories.
     ++Hooks today are configured individually in each clone, making it difficult for
     ++project maintainers to enforce hooks usage across them.
      +https://lore.kernel.org/git/20210311021037.3001235-2-emilyshaffer@google.com[Configuration-based
     -+hook management], by moving hooks configuration to the config, makes this much
     -+easier. However, there is still no good way for project maintainers to encourage
     ++hook management], by moving hooks to the config, makes this easier; individuals
     ++can at least configure hooks to be used across multiple workspaces on their
     ++host. However, there is still no good way for project maintainers to encourage
      +or enforce adoption of specific hook commands on specific hook events in a
     -+repository. As such, there are many tools that provide this functionality on top
     -+of Git (see <<prior-art, Prior Art>>).
     ++clone. As such, there are many tools that provide this functionality on top of
     ++Git (see <<prior-art, Prior Art>>).
      +
      +We propose adding native Git functionality to allow project maintainers to
      +specify hooks that a user ought to install and utilize in their development
      +workflows.
      +
     ++Server-side vs Local Checks
     ++^^^^^^^^^^^^^^^^^^^^^^^^^^^
     ++
     ++A large motivation for this change is providing projects a method to enable
     ++local checks of code e.g. linting. As documented in
     ++https://git-scm.com/docs/gitfaq#_hooks[gitfaq], in some cases, server-side
     ++checks may be more appropriate, especially since developers can always skip
     ++local hook execution. We think there are still benefits to local checks:
     ++
     ++* Ensuring commit message style and preventing the committing of sensitive data,
     ++as described above. In particular, if CI results are public, as with many open
     ++source projects, server side checks are useless for guarding against leaking
     ++sensitive data.
     ++
     ++* Helps developers catch issues earlier: typically developers need to push to
     ++the remote to trigger server-side checks. Local hooks can be run anytime the
     ++developer wants. This is especially useful if the project has slow
     ++server-checks; catching issues locally can save the developer a lot of time
     ++waiting for CI. They are also useful for locally reproducing an issue identified
     ++in CI, helping resolve issues faster.
     ++
     ++* Since the typical goal of developers to submit changes to a central
     ++repository, their interests are aligned with the project maintainers'; while
     ++they can choose to skip local hook execution, it is in their interest to allow
     ++hooks to run at least before proposing code for submission to the central
     ++repository to increase the chances of the code getting accepted.
     ++
     ++In the ideal world, developers and project maintainers use both local and server
     ++side checks in their workflow. However, for many smaller projects, this may not
     ++be possible: CI may be too expensive to run or configure. The number of local
     ++solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
     ++Bringing this natively to Git can give all these developers a well-supported,
     ++secure implementation opposed to the fragmentation we see today.
     ++
      +User Goals / Critical User Journeys
      +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      +
     -+* As a repository owner / project maintainer,
     ++* As a project maintainer,
      +
      +    ** I want to enforce code style so I require developers to use a
      +    standardized linter.
      +
     -+    ** I want to prevent leaks / share code widely so I check that developers
     -+    are uploading code to the right place.
     ++    ** I want to prevent sensitive data from getting checked in.
      +
     -+    ** I want this to just work for all the developers in my repository, without
     ++    ** I want to prevent leaks so I check that developers are uploading code to
     ++    the right private central repository. Conversely, I may want to encourage
     ++    more open source development and encourage developers to push to public
     ++    central repositories.
     ++
     ++    ** I want this to just work for all the developers in my project, without
      +    needing to support them through configuration.
      +
     -+* As a developer developing in a repository,
     ++* As a developer developing in a local clone,
     ++
     ++    ** I want to set up my workspace.
     ++
     ++    ** I want control over what code runs on my machine.
     ++
     ++    ** I want to set up my own hooks, in addition to or in lieu of project
     ++    configured hooks.
     ++
     ++    ** I want to skip hooks from running (for various reasons).
     ++
     ++Security Considerations and Design Principles
     ++~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      +
     -+    ** I want to set up my workspace
     ++We must balance the desire to make hooks setup easy for developers --- allowing
     ++them to get hooks set up with low friction, and hence increasing the probability
     ++of them adopting these hooks --- with protecting users from the security risks
     ++of arbitrary code execution on their hosts.
      +
     -+    ** I want control over what code runs on my machine
     ++To inform the design, we propose these design principles:
      +
     -+    ** I want to skip hooks from running (for various reasons)
     ++* User consent: Users must explicitly agree to hooks usage; no hooks should
     ++execute without consent, and users should re-consent if hooks update. Users can
     ++opt-out of hooks.
      +
     -+Design Principles
     -+~~~~~~~~~~~~~~~~~
     ++* Trust comes from the central repository:
     ++  ** Most users don't have the time or expertise to properly audit every hook
     ++  and what it does. There must be trust between the user and the remote that the
     ++  code came from, and the Git project should ensure trust to the degree it can
     ++  e.g. enforce HTTPS for its integrity guarantees.
      +
     -+* *Make it easy:* Developers just want to get their work done. Introducing
     -+friction is bad, especially when it prevents development from even starting e.g.
     -+workspace set up.
     ++  ** Since developers will likely build their local clone in their development
     ++  process, at some point, arbitrary code from the repository will be executed.
     ++  In this sense, hooks _with user consent_ do not introduce a new attack surface.
      +
     -+* *Treat hooks as software, not configuration:* We take seriously the
     -+responsibility that comes with causing arbitrary code to run on a user's
     -+machine. Such code needs to come from a reputable source, be automatically
     -+updated, and run with user consent.
     ++* Give users visibility: Git must allow users to make informed decisions. This
     ++means surfacing essential information to the user in a visible manner e.g. what
     ++remotes the hooks are coming from, whether the hooks have changed in the latest
     ++checkout.
      +
      +Feature Requirements
      +~~~~~~~~~~~~~~~~~~~~
     @@ Documentation/technical/repository-owner-hooks.txt (new)
      +Minimum Feature Set
      +^^^^^^^^^^^^^^^^^^^
      +
     -+* A repository owner can specify a configuration for what hook commands are
     ++* A repository can specify a configuration for what hook commands are
      +enabled for what hook events
      +
     -+* A repository owner can specify, in this configuration, where the hook
     ++* A repository can specify, in this configuration, where the hook
      +commands reside
      +
      +    ** This could be a path to a script/binary within the repository
     @@ Documentation/technical/repository-owner-hooks.txt (new)
      +    ** This could be a user installed command or script/binary that exists
      +    outside of the repository and is present in `$PATH`
      +
     -+* Users must explicitly approve hook execution at least once (yes/no)
     ++* This configuration should only apply if it was received over HTTPS
      +
     -+    ** This could happen during setup or at execution time
     ++* A setup command for users to set up hooks
     ++
     ++    ** Hook setup could happen at clone time assuming the user has consented
     ++    e.g. if `--setup-hooks` is passed to `git clone`
     ++
     ++* Users must explicitly approve hooks at least once
     ++
     ++    ** Running the setup command should count as approval, including if the user
     ++    consented during the clone
      +
      +    ** When a hook command changes, a user should re-approve execution (note:
      +    implementation should not interfere with requirement listed in “Fast
      +    Follows")
      +
     -+    * Users do not need to run setup scripts to install hooks --- the setup flow
     -+    happens automatically at clone time
     -+
      +* Automation is able to continue to use clone and other commands
      +non-interactively
      +
     @@ Documentation/technical/repository-owner-hooks.txt (new)
      +Implementation Exploration: Check "magic" branch for configs at fetch time
      +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      +
     -+Example workflow
     -+^^^^^^^^^^^^^^^^
     ++Example User Experience
     ++^^^^^^^^^^^^^^^^^^^^^^^
     ++
     ++===== Case 1: Consent through clone
     ++
     ++....
     ++$ git clone --setup-hooks
     ++...
     ++
     ++The following hooks were installed from remote `origin` ($ORIGIN_URL):
     ++
     ++pre-commit: git-secrets --pre_commit_hook
     ++pre-push:  $GIT_ROOT/pre_push.sh
     ++....
     ++
     ++===== Case 2: Prompting after clone
     ++....
     ++$ git clone
     ++...
     ++
     ++Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:
     ++
     ++pre-commit: git-secrets --pre_commit_hook
     ++pre-push:  $GIT_ROOT/pre_push.sh
     ++
     ++# instead of prompting, we could give users commands to run instead
     ++# see case 3
     ++
     ++Do you wish to install them?
     ++1. Yes (this time)
     ++2. Yes (always from origin)
     ++3. No (not this time)
     ++4. No (never)
     ++....
     ++
     ++===== Case 3: Re-prompting when hooks change
     ++....
     ++$ git pull
     ++
     ++The following hooks were updated from remote `origin` ($ORIGIN_URL):
     ++
     ++pre-push:  $GIT_ROOT/pre_push.sh
     ++
     ++If you wish to install them, run `git hook setup origin`.
     ++
     ++If you wish to always accept hooks from `origin`, run `git hook setup --always
     ++origin`. You should only do this if you trust code changes from origin.
     ++
     ++To always ignore hooks from `origin`, run `git hook ignore origin`.
     ++....
     ++
     ++===== Case 4: Nudging when hooks weren't installed
     ++....
     ++$ git commit
     ++advice: The repository owner has recommended a 'pre-commit' hook that was not run.
     ++To view it, run `git show origin/refs/recommended-config:some-pre-commit`. To install it, run `git hook setup origin pre-commit`
     ++
     ++Turn off this advice by setting config variable advice.missingHook to false."
     ++....
     ++
     ++
     ++Implementation Sketch
     ++^^^^^^^^^^^^^^^^^^^^^
      +
      +* Perform fetch as normal
      +
     @@ Documentation/technical/repository-owner-hooks.txt (new)
      +* A way to perform general set up of a code base e.g. installing dependencies,
      +any first-build related tasks
      +
     ++* Sandboxing hook execution to provide higher levels of security.
     ++
      +[[prior-art]]
      +Prior Art
      +~~~~~~~~~


 Documentation/Makefile                        |   1 +
 .../technical/project-configured-hooks.txt    | 426 ++++++++++++++++++
 2 files changed, 427 insertions(+)
 create mode 100644 Documentation/technical/project-configured-hooks.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 81d1bf7a049b..06ba5945c428 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -96,6 +96,7 @@ TECH_DOCS += technical/protocol-common
 TECH_DOCS += technical/protocol-v2
 TECH_DOCS += technical/racy-git
 TECH_DOCS += technical/reftable
+TECH_DOCS += technical/project-configured-hooks
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
 TECH_DOCS += technical/signature-format
diff --git a/Documentation/technical/project-configured-hooks.txt b/Documentation/technical/project-configured-hooks.txt
new file mode 100644
index 000000000000..e880dd7af300
--- /dev/null
+++ b/Documentation/technical/project-configured-hooks.txt
@@ -0,0 +1,426 @@
+Project Configured Hooks
+------------------------
+
+Background
+~~~~~~~~~~
+
+Context
+^^^^^^^
+
+Git has https://git-scm.com/docs/githooks[hooks] functionality to allow users to
+execute commands or scripts when certain Git events occur. Some use cases
+include:
+
+* The `pre-commit` hook event:
+
+  ** A developer may want to lint their changes to enforce certain code style
+  and quality. The project may want the commit to fail so that commits that
+  violate the project's style don't get uploaded.
+
+  ** The project may want to prevent developers from committing passwords or
+  other sensitive information e.g. via
+  https://github.com/awslabs/git-secrets[git-secrets].
+
+* The `commit-msg` hook event: the project may want to enforce a certain commit
+message style. This may be out of necessity:
+https://www.gerritcodereview.com/[Gerrit Code Review], for example, requires
+each commit to have a Change-Id in the footer.
+
+* The `pre-push` hook: the project may want to verify that pushes are going to
+the correct central repository to prevent leaks.
+
+A common thread between these use cases is to enforce certain behavior across
+many developers working in the same code base, encouraging best practices and
+healthy code quality.
+
+Hooks today are configured individually in each clone, making it difficult for
+project maintainers to enforce hooks usage across them.
+https://lore.kernel.org/git/20210311021037.3001235-2-emilyshaffer@google.com[Configuration-based
+hook management], by moving hooks to the config, makes this easier; individuals
+can at least configure hooks to be used across multiple workspaces on their
+host. However, there is still no good way for project maintainers to encourage
+or enforce adoption of specific hook commands on specific hook events in a
+clone. As such, there are many tools that provide this functionality on top of
+Git (see <<prior-art, Prior Art>>).
+
+We propose adding native Git functionality to allow project maintainers to
+specify hooks that a user ought to install and utilize in their development
+workflows.
+
+Server-side vs Local Checks
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A large motivation for this change is providing projects a method to enable
+local checks of code e.g. linting. As documented in
+https://git-scm.com/docs/gitfaq#_hooks[gitfaq], in some cases, server-side
+checks may be more appropriate, especially since developers can always skip
+local hook execution. We think there are still benefits to local checks:
+
+* Ensuring commit message style and preventing the committing of sensitive data,
+as described above. In particular, if CI results are public, as with many open
+source projects, server side checks are useless for guarding against leaking
+sensitive data.
+
+* Helps developers catch issues earlier: typically developers need to push to
+the remote to trigger server-side checks. Local hooks can be run anytime the
+developer wants. This is especially useful if the project has slow
+server-checks; catching issues locally can save the developer a lot of time
+waiting for CI. They are also useful for locally reproducing an issue identified
+in CI, helping resolve issues faster.
+
+* Since the typical goal of developers to submit changes to a central
+repository, their interests are aligned with the project maintainers'; while
+they can choose to skip local hook execution, it is in their interest to allow
+hooks to run at least before proposing code for submission to the central
+repository to increase the chances of the code getting accepted.
+
+In the ideal world, developers and project maintainers use both local and server
+side checks in their workflow. However, for many smaller projects, this may not
+be possible: CI may be too expensive to run or configure. The number of local
+solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
+Bringing this natively to Git can give all these developers a well-supported,
+secure implementation opposed to the fragmentation we see today.
+
+User Goals / Critical User Journeys
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* As a project maintainer,
+
+    ** I want to enforce code style so I require developers to use a
+    standardized linter.
+
+    ** I want to prevent sensitive data from getting checked in.
+
+    ** I want to prevent leaks so I check that developers are uploading code to
+    the right private central repository. Conversely, I may want to encourage
+    more open source development and encourage developers to push to public
+    central repositories.
+
+    ** I want this to just work for all the developers in my project, without
+    needing to support them through configuration.
+
+* As a developer developing in a local clone,
+
+    ** I want to set up my workspace.
+
+    ** I want control over what code runs on my machine.
+
+    ** I want to set up my own hooks, in addition to or in lieu of project
+    configured hooks.
+
+    ** I want to skip hooks from running (for various reasons).
+
+Security Considerations and Design Principles
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We must balance the desire to make hooks setup easy for developers --- allowing
+them to get hooks set up with low friction, and hence increasing the probability
+of them adopting these hooks --- with protecting users from the security risks
+of arbitrary code execution on their hosts.
+
+To inform the design, we propose these design principles:
+
+* User consent: Users must explicitly agree to hooks usage; no hooks should
+execute without consent, and users should re-consent if hooks update. Users can
+opt-out of hooks.
+
+* Trust comes from the central repository:
+  ** Most users don't have the time or expertise to properly audit every hook
+  and what it does. There must be trust between the user and the remote that the
+  code came from, and the Git project should ensure trust to the degree it can
+  e.g. enforce HTTPS for its integrity guarantees.
+
+  ** Since developers will likely build their local clone in their development
+  process, at some point, arbitrary code from the repository will be executed.
+  In this sense, hooks _with user consent_ do not introduce a new attack surface.
+
+* Give users visibility: Git must allow users to make informed decisions. This
+means surfacing essential information to the user in a visible manner e.g. what
+remotes the hooks are coming from, whether the hooks have changed in the latest
+checkout.
+
+Feature Requirements
+~~~~~~~~~~~~~~~~~~~~
+
+Minimum Feature Set
+^^^^^^^^^^^^^^^^^^^
+
+* A repository can specify a configuration for what hook commands are
+enabled for what hook events
+
+* A repository can specify, in this configuration, where the hook
+commands reside
+
+    ** This could be a path to a script/binary within the repository
+
+    ** This could be a path to a script/binary contained within submodules of
+    the repository
+
+    ** This could be a user installed command or script/binary that exists
+    outside of the repository and is present in `$PATH`
+
+* This configuration should only apply if it was received over HTTPS
+
+* A setup command for users to set up hooks
+
+    ** Hook setup could happen at clone time assuming the user has consented
+    e.g. if `--setup-hooks` is passed to `git clone`
+
+* Users must explicitly approve hooks at least once
+
+    ** Running the setup command should count as approval, including if the user
+    consented during the clone
+
+    ** When a hook command changes, a user should re-approve execution (note:
+    implementation should not interfere with requirement listed in “Fast
+    Follows")
+
+* Automation is able to continue to use clone and other commands
+non-interactively
+
+* Works across Windows/Linux/macOS
+
+Fast Follows
+^^^^^^^^^^^^
+
+* When prompted to execute a hook, users can specify always or never, even if
+the hook updates
+
+Nice to Haves
+^^^^^^^^^^^^^
+
+* A method to skip hook execution i.e. `--no-verify` works everywhere
+
+* Support a “warnings only mode” where hooks run but don’t block commands from
+executing
+
+Out of Scope
+^^^^^^^^^^^^
+
+* Ensuring the user has installed software that isn't distributed with the repo
+
+Implementation Exploration: Check "magic" branch for configs at fetch time
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Example User Experience
+^^^^^^^^^^^^^^^^^^^^^^^
+
+===== Case 1: Consent through clone
+
+....
+$ git clone --setup-hooks
+...
+
+The following hooks were installed from remote `origin` ($ORIGIN_URL):
+
+pre-commit: git-secrets --pre_commit_hook
+pre-push:  $GIT_ROOT/pre_push.sh
+....
+
+===== Case 2: Prompting after clone
+....
+$ git clone
+...
+
+Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:
+
+pre-commit: git-secrets --pre_commit_hook
+pre-push:  $GIT_ROOT/pre_push.sh
+
+# instead of prompting, we could give users commands to run instead
+# see case 3
+
+Do you wish to install them?
+1. Yes (this time)
+2. Yes (always from origin)
+3. No (not this time)
+4. No (never)
+....
+
+===== Case 3: Re-prompting when hooks change
+....
+$ git pull
+
+The following hooks were updated from remote `origin` ($ORIGIN_URL):
+
+pre-push:  $GIT_ROOT/pre_push.sh
+
+If you wish to install them, run `git hook setup origin`.
+
+If you wish to always accept hooks from `origin`, run `git hook setup --always
+origin`. You should only do this if you trust code changes from origin.
+
+To always ignore hooks from `origin`, run `git hook ignore origin`.
+....
+
+===== Case 4: Nudging when hooks weren't installed
+....
+$ git commit
+advice: The repository owner has recommended a 'pre-commit' hook that was not run.
+To view it, run `git show origin/refs/recommended-config:some-pre-commit`. To install it, run `git hook setup origin pre-commit`
+
+Turn off this advice by setting config variable advice.missingHook to false."
+....
+
+
+Implementation Sketch
+^^^^^^^^^^^^^^^^^^^^^
+
+* Perform fetch as normal
+
+* After fetch is complete, Git checks for a "magic" config branch (e.g.
++origin/refs/recommended-config+) which contains information about config lines
+an end-user may want (including hooks).
+
+* As part of the fetch subcommand, Git prompts users to install the configs
+contained there.
+
+    ** User responses to that prompt could be "sticky" - e.g. a user could reply
+    "no (this time)", "no (never)", "yes (this time)", or "yes (always)".
+    Always/never indicate that the user trusts the remote this config is coming
+    from, and should not apply to configs fetched from other remotes.
+
+Later, we might want to do this before the initial clone is performed; that
+workflow looks like:
+
+* During clone, perform ls-refs as normal
+
+* If the server has a "magic" config branch, fetch only that config branch.
+
+* Prompt users as described above.
+
+* Perform the rest of the clone.
+
+Pros
+^^^^
+
+* Repository owners have a method for providing recommended config for
+contributors.
+
+* Installation flow happens without additional user intervention.
+
+* Keeping config branch and history separate from code branch and history means
+it is versioned, but not tied to user's checkout.
+
+* Letting users specify "always" or "never" reduces amount of pain introduced by
+interactive "configuration wizard" flow.
+
+Cons
+^^^^
+
+* Requires addition of step to fetch (and later clone).
+
+* Turning a "set and forget" command like clone into an interactive session with
+the user is not ideal; care must be taken to avoid breaking bots.
+
+* Inflating configs and executables from a remote tracking branch which is never
+checked out could be slow.
+
+Future Work
+~~~~~~~~~~~
+
+* Extending this to allow repository owners to specify specific configurations
+in general e.g. this repository should use partial-clone with these parameters.
+
+* Extending this to support submodules: We want to make sure this works in a way
+that's easy to adapt to submodules, who would likely need to run the same hooks
+as the superproject; for example, submodules could inherit the superproject
+config.
+
+* A way to perform general set up of a code base e.g. installing dependencies,
+any first-build related tasks
+
+* Sandboxing hook execution to provide higher levels of security.
+
+[[prior-art]]
+Prior Art
+~~~~~~~~~
+
+Husky
+^^^^^
+
+* Add it as a dev dependency in package.json
+
+* Supports out-of-the-box configuration: Adding a `prepare` script in
+package.json with `husky install` will automate the installation of the husky
+script in the .git directory.
+
+pre-commit
+^^^^^^^^^^
+
+* Acts as a package manager for linting, installing the required linters as
+needed
+
+* `pre-commit install` installs the pre-commit hook shim
+
+* Config is checked into the repository, allowing owners to set versions for
+linters
+
+* Provides some out-of-the-box hooks:
+https://github.com/pre-commit/pre-commit-hooks[https://github.com/pre-commit/pre-commit-hooks]
+
+Repo Hooks
+^^^^^^^^^^
+
+* A Git repository containing hooks commands is specified in the manifest and
+checked out during `repo init`.
+
+* A `repo-hooks` element specifies the Git repository where the hooks code lives
+and which hook to enable. This is typically
+https://android.googlesource.com/platform/tools/repohooks/.
+
+* The only hook event currently supported is `pre-upload`, running when people
+run +repo upload+.
+
+* The hooks code file name must be the same as the hook event name e.g.
+`pre-upload.py`.
+
+* When a hook is used for the first time, the user is prompted
+to approve execution
+
+    ** For manifests fetched via secure protocols (e.g. https://), the user is
+    prompted once.
+
+    ** For insecure protocols (e.g. http://), the user is prompted whenever the
+    registered repohooks project is updated and a hook is triggered.
+
+* Repo hooks must be python modules (Python 2 or 3, with 2 support deprecated)
+
+* Repo hooks run from the top of the repo directory
+
+    ** if the repo client is under `~/tree/`, then that is where the hook runs,
+    even if you ran repo in a git repository at `~/tree/src/foo/`, or in a
+    subdirectory of that git repository in `~/tree/src/foo/bar/`.
+
+* `--no-verify` allows developers to bypass hooks.
+
+* `--ignore-hooks` allows developers to run hooks without blocking
+
+// The '+' in Gerrit URL frightens asciidoctor.
+:repohook-read: https://android.googlesource.com/platform/tools/repohooks/+/refs/heads/master/rh/config.py
+:repohook-config: https://android.googlesource.com/platform/tools/repohooks/+/refs/heads/master#config-files
+
+* The hooks {repohook-read}[read] a {repohook-config}[config file] to determine
+what code to actually execute.
+
+    ** Config can be global e.g. at the top of the Repo workspace and local, and
+    they get merged together.
+
+    ** Local config: looks for the closest config file to where you ran repo,
+    allowing each repository to configure its own hooks
+
+* Example code that gets run:
+
+    ** pylint3 (requiring this to be installed on the host)
+
+    ** commit_msg_bug_field: require “Bug:” to be present in the commit message
+    (built into repohooks)
+
+Glossary
+~~~~~~~~
+
+*hook event*: A point during Git’s execution where user scripts may be run, for
+example, `prepare-commit-msg` or `pre-push`.
+
+*hook command*: A user script or executable which will be run on one or more
+hook events.

base-commit: a5828ae6b52137b913b978e16cd2334482eb4c1f
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-03-26  1:43 ` [PATCH v2] hooks: propose project " Albert Cui via GitGitGadget
@ 2021-03-29 23:20   ` Emily Shaffer
  2021-04-01 20:02     ` Albert Cui
  2021-03-30 15:24   ` Derrick Stolee
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Emily Shaffer @ 2021-03-29 23:20 UTC (permalink / raw)
  To: Albert Cui via GitGitGadget
  Cc: git, Albert Cui, brian m. carlson,
	Ævar Arnfjörð Bjarmason

On Fri, Mar 26, 2021 at 01:43:36AM +0000, Albert Cui via GitGitGadget wrote:

> Change-Id: I5f6747524b97c51dfe5fa28e48ea03981b2da5b8
Oops :)

I avoid this by setting gerrit.createChangeId = false in my global
config and adding an alias:
  alias.gerrit-commit = "-c gerrit.createChangeId=true commit"

> +Server-side vs Local Checks
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +* Helps developers catch issues earlier: typically developers need to push to
> +the remote to trigger server-side checks. Local hooks can be run anytime the
> +developer wants. This is especially useful if the project has slow
> +server-checks; catching issues locally can save the developer a lot of time
> +waiting for CI. They are also useful for locally reproducing an issue identified
> +in CI, helping resolve issues faster.

Big +1 to this - I hate having to wait for a push and CI build, possibly
queued behind someone else's work or an earlier mistaken push, to check
whether my stuff is right. :)

> +In the ideal world, developers and project maintainers use both local and server
> +side checks in their workflow. However, for many smaller projects, this may not
> +be possible: CI may be too expensive to run or configure. The number of local
> +solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
> +Bringing this natively to Git can give all these developers a well-supported,
            ^~~~
This is a little vague here. It sounds like you might be suggesting to
standardize server-side CI config in Git-controlled projects.
> +secure implementation opposed to the fragmentation we see today.

The point about solution fragmentation is a strong one and I wonder
whether it's being emphasized enough. There is obviously a need, or else
people wouldn't keep writing all these things in the Prior Art section
:)

> +Security Considerations and Design Principles
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
[snip]
> +  ** Since developers will likely build their local clone in their development
> +  process, at some point, arbitrary code from the repository will be executed.
> +  In this sense, hooks _with user consent_ do not introduce a new attack surface.

It might be worth saying that we want to make configuration of
project-configured hooks to be approximately as easy/automatic as
building (that is, the user still has to explicitly run a build, and
isn't prompted at the end of their clone whether they want to build it
right away).
> +
> +* Give users visibility: Git must allow users to make informed decisions. This
> +means surfacing essential information to the user in a visible manner e.g. what
> +remotes the hooks are coming from, whether the hooks have changed in the latest
> +checkout.
   ^~~~~~~~
Better say "fetch", if we are proposing this magic branch thing.

> +* This configuration should only apply if it was received over HTTPS

Meaning, non-HTTPS fetches should just not update this special branch?

> +* A setup command for users to set up hooks
AIUI, this is proposed to be part of `git hook`, right?

I don't think it needs to be part of this doc but it'd be nice to also
support installing just a subset, like:

  git hook setup pre-commit
  git hook setup --interactive

> +* Users must explicitly approve hooks at least once
> +
> +    ** Running the setup command should count as approval, including if the user
> +    consented during the clone
> +
> +    ** When a hook command changes, a user should re-approve execution (note:
> +    implementation should not interfere with requirement listed in “Fast
> +    Follows")
> +
> +* Automation is able to continue to use clone and other commands
> +non-interactively

One interesting point - by using an advice instead of an interactive
prompt at clone time, we get this for free.

> +Fast Follows
> +^^^^^^^^^^^^
> +
> +* When prompted to execute a hook, users can specify always or never, even if
> +the hook updates

I think we want to base this on the remote URL, right? I know we talked
a little offline about how to mitigate vs. malicious maintainer (for
example this whole mess with The Great Suspender) and I'm not sure what
solution there might be.

I wonder if it's worth it to notify users that their always-okayed hooks
were updated during fetch?

> +
> +Nice to Haves
> +^^^^^^^^^^^^^
> +
> +* A method to skip hook execution i.e. `--no-verify` works everywhere

This part I'd like to discuss more on-list - I think it would need to
happen as an argument to git.c (e.g. git --no-verify commit blah), or
else we'd have the problems we have with --no-verify today. But is that
too ugly? I think everything else (even teaching parse-options to grab
--no-verify regardless, which, ick) would still be prone to issues,
since not everybody uses parse-options and not every subcommand
implementor knows their subcommand will invoke a hook. (For example, the
nice surprise when rebase started using some different strategy and
invoking the post-commit hook way more often, off the top of my head so
details may not be correct.)

> +* Support a “warnings only mode” where hooks run but don’t block commands from
> +executing

Same as --no-verify. I wonder whether it's "good enough" to do these two
as configs? hook.skip-all=true, hook.ignore-result=true?

> +Implementation Exploration: Check "magic" branch for configs at fetch time
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Example User Experience
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +===== Case 1: Consent through clone
> +
> +....
> +$ git clone --setup-hooks
> +...
> +
> +The following hooks were installed from remote `origin` ($ORIGIN_URL):
> +
> +pre-commit: git-secrets --pre_commit_hook
> +pre-push:  $GIT_ROOT/pre_push.sh

Hm, I thought we wanted to consider storing the hook body in the magic
branch as well? To avoid changing hook implementation during bisect, for
example?

> +....
> +
> +===== Case 2: Prompting after clone
> +....
> +$ git clone
> +...
> +
> +Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:
> +
> +pre-commit: git-secrets --pre_commit_hook
> +pre-push:  $GIT_ROOT/pre_push.sh
> +
> +# instead of prompting, we could give users commands to run instead
> +# see case 3

Yep, I think this is a better idea - I glued together the two UXen below
:)

> +
> +Do you wish to install them?
> +1. Yes (this time)
> +2. Yes (always from origin)
> +3. No (not this time)
> +4. No (never)
> +....

Offline when we discussed this, it seems like users will just smash 2
("whatever gets you to stop bothering me") regardless of whether the
hooks are actually coming from a source the user trusts. So I would
prefer something like:

  $ git clone
  ....
  Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:

  pre-commit: git-secrets --pre_commit_hook
  pre-push:  $GIT_ROOT/pre_push.sh

  If you wish to install them, run `git hook setup origin`.

> +===== Case 3: Re-prompting when hooks change
> +....
> +$ git pull
> +
> +The following hooks were updated from remote `origin` ($ORIGIN_URL):
> +
> +pre-push:  $GIT_ROOT/pre_push.sh
> +
> +If you wish to install them, run `git hook setup origin`.
> +
> +If you wish to always accept hooks from `origin`, run `git hook setup --always
> +origin`. You should only do this if you trust code changes from origin.
> +
> +To always ignore hooks from `origin`, run `git hook ignore origin`.
> +....
> +
> +===== Case 4: Nudging when hooks weren't installed
> +....
> +$ git commit
> +advice: The repository owner has recommended a 'pre-commit' hook that was not run.
> +To view it, run `git show origin/refs/recommended-config:some-pre-commit`. To install it, run `git hook setup origin pre-commit`
> +
> +Turn off this advice by setting config variable advice.missingHook to false."
> +....

(Full disclosure: this was my idea.)
I realize that some folks upstream may find this is too chatty for
general use. I'm hoping being able to shut off the advice globally might
be enough of a mitigation; maybe we can gate it behind an experimental
config or something if folks aren't so sure?

> +Implementation Sketch
> +^^^^^^^^^^^^^^^^^^^^^
> +
> +* Perform fetch as normal
> +
> +* After fetch is complete, Git checks for a "magic" config branch (e.g.
> ++origin/refs/recommended-config+) which contains information about config lines
> +an end-user may want (including hooks).
> +
> +* As part of the fetch subcommand, Git prompts users to install the configs
> +contained there.

Like I mentioned above, I think we probably want to drop the entire
interactive installer wizard concept...

> +    ** User responses to that prompt could be "sticky" - e.g. a user could reply
> +    "no (this time)", "no (never)", "yes (this time)", or "yes (always)".
> +    Always/never indicate that the user trusts the remote this config is coming
> +    from, and should not apply to configs fetched from other remotes.

...which also means that we can drop trying to express this briefly and
instead say something wordy in a flag to `git hook setup` (or whatever
we call it).

> +Later, we might want to do this before the initial clone is performed; that
> +workflow looks like:
> +
> +* During clone, perform ls-refs as normal
> +
> +* If the server has a "magic" config branch, fetch only that config branch.
> +
> +* Prompt users as described above.
> +
> +* Perform the rest of the clone.

This part I'm still interested in, although I'm not sure how to
reconcile not wanting an interactive prompt with wanting an early step
like this during clone. Maybe that's what this `git clone --setup-hooks`
(or maybe, `git clone --with-recommended-configs`) is for?

> +Pros
> +^^^^
> +
> +* Repository owners have a method for providing recommended config for
> +contributors.
> +
> +* Installation flow happens without additional user intervention.

I think when we wrote this bullet point it was to express "the user
doesn't have to run something else to discover these hooks exist". But I
don't think "without additional user intervention" fully describes
what's proposed here, either. Hrm.

> +
> +* Keeping config branch and history separate from code branch and history means
> +it is versioned, but not tied to user's checkout.

Probably worth discussing/including that we intend hook contents to also
live in the config branch, to make sure we're running the same hook
regardless of checkout/bisect state/inspection/have been working on a
feature for 6 months and have been fetching but not rebasing/etc. I'm
not sure I see that explicitly called out here...

Actually, I found the following (pasting from much earlier in the doc):

  +    ** This could be a path to a script/binary within the repository
  +
  +    ** This could be a path to a script/binary contained within
  submodules of
  +    the repository
  +
  +    ** This could be a user installed command or script/binary that
  exists
  +    outside of the repository and is present in `$PATH`

Maybe this part needs to be modified to explicitly refer to the hook
executable being tracked in the magic branch?

> +Cons
> +^^^^
[snip]
> +* Turning a "set and forget" command like clone into an interactive session with
> +the user is not ideal; care must be taken to avoid breaking bots.

If we notify and nag, but don't interactively prompt, then we get happy
bots for free ;)

> +
> +* Inflating configs and executables from a remote tracking branch which is never
> +checked out could be slow.

I wonder about this. This seems to me like something that might be
drastically slower or faster depending on platform. Hmmm.

> +Future Work
> +~~~~~~~~~~~
> +
> +* Extending this to allow repository owners to specify specific configurations
> +in general e.g. this repository should use partial-clone with these parameters.

Offline I think there was a little discussion with Stolee about whether
it made more sense to *only* approach this specific problem with this
document, as the hooks are also config, and so they could come later.
But I think if we want to store the executable in the magic branch (and
I do... since I keep bringing it up :) ) then it doesn't make sense to
say "build it for config and everything else will follow".

> +* Extending this to support submodules: We want to make sure this works in a way
> +that's easy to adapt to submodules, who would likely need to run the same hooks
> +as the superproject; for example, submodules could inherit the superproject
> +config.

I'm hoping to send an RFC patch introducing such an inherited
superproject config ... very soon. I hope. So there wasn't much detail
provided here, intentionally.

> +* Sandboxing hook execution to provide higher levels of security.

I think this says: "Can we run a user hook in a container that only has
access to the repo in question?"

It sounds like a complicated answer. I could see legitimate reasons to
want wider access than just the container - for example, some
hook-specific configuration that doesn't fit the Git config format, or
even something like updating a stats file to keep a record of how many
commits I made/pushed/whatever every day, stored in a central location
for reference at performance review time :) But I also don't know alllll
that much about containerization - I think there are ways to hand over
access to other needed files like this, right?

But then, I also feel yucky thinking about Debian telling me that my Git
install also needs me to install Docker... :)

Worth thinking about and discussing at a later date, I'd guess.

> +[[prior-art]]
> +Prior Art
> +~~~~~~~~~

I wonder whether it's useful to mention (in mails, I guess, not in
the checked in doc) why these are bad - do they duplicate work between
each other? Are they engaging in bad practices when interfacing with
Git? etc.?

It would be a lot of work to collect, so maybe it's not that useful..


Thanks for writing up v2 / mailing it, Albert.

 - Emily

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-03-26  1:43 ` [PATCH v2] hooks: propose project " Albert Cui via GitGitGadget
  2021-03-29 23:20   ` Emily Shaffer
@ 2021-03-30 15:24   ` Derrick Stolee
  2021-04-05 22:45     ` Albert Cui
  2021-04-02  9:59   ` Ævar Arnfjörð Bjarmason
  2021-04-02 10:30   ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 26+ messages in thread
From: Derrick Stolee @ 2021-03-30 15:24 UTC (permalink / raw)
  To: Albert Cui via GitGitGadget, git
  Cc: Albert Cui, brian m. carlson, Ævar Arnfjörð Bjarmason

On 3/25/2021 9:43 PM, Albert Cui via GitGitGadget wrote:
> From: Albert Cui <albertqcui@gmail.com>
> 
> Hooks today are configured at the repository level, making it difficult to
> share hooks across repositories. Configuration-based hook management, by
> moving hooks configuration to the config, makes this much easier. However,
> there is still no good way for project maintainers to encourage or enforce
> adoption of specific hook commands on specific hook events in a repository.
> As such, there are many tools that provide this functionality on top of Git.
> 
> This patch documents the requirements we propose for this feature as well as
> a design sketch for implementation.

Sorry for being so late in reviewing this.

My first reaction is that this feature is suggesting multiple security
vulnerabilities as core functionality. It also seems to be tied to
niche projects (in number of projects, not necessarily the size of those
projects).

I was recommended in conversation to think of this as a way to take
existing ad-hoc behavior and standardize it with a "Git-blessed"
solution. I'm not sure this proposal makes a strong enough case for
why having a "configure-hooks.sh" script in the base of the repo is
not enough. It simultaneously does not use existing precedents like
.gitattributes or .gitignore as direction in using the worktree at
HEAD as a mechanism for communicating details. I find using a separate
ref for hooks to be a non-starter and the design should be rebuilt from
scratch.

I also expect that a significant portion of users will see a message
like "this repository needs hooks" and will just say "yes" to get rid
of the prompt. There needs to be sufficient opportunity for users to
inspect the hook configuration and avoid frustrated or distracted users
from doing the wrong thing.

Server-side checks should always exist, so users who don't follow the
project's guidelines using the recommended hooks will be blocked. The
important thing is that there is an easy way for willing participants
to install the correct hooks. This doesn't mean we should make it
almost automatic.

Also, please proactively pursue a security review of the feature,
including non-technical risks such as social engineering, forks, or
other possible attacks. This idea seems so risky that I would be
against accepting it unless a security expert has done a thorough
review.

> +We propose adding native Git functionality to allow project maintainers to
> +specify hooks that a user ought to install and utilize in their development
> +workflows.

I think providing a way for repository owners to _recommend_ how cloners
should interact with the repository is a good idea. I think starting with
hooks is perhaps a significant jump to the most complicated version of
that idea.

As you think of this design, it might be good to think about how some
recommended Git config (within an allow-list) might fit into this system
as well. I would have started there, with things like "Use partial clone"
or "use sparse-checkout". Those are really things that need to happen at
clone time, they can't really happen after-the-fact, which helps justifying
a modification to 'git clone'. The hook configuration doesn't _need_ to
happen during 'git clone'. More on this timing later.

The .gitattributes file is the closest analogue I could find in current
functionality, but it operates on a path-based scope, not repository scope.

> +Server-side vs Local Checks
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
> +In the ideal world, developers and project maintainers use both local and server
> +side checks in their workflow. However, for many smaller projects, this may not
> +be possible: CI may be too expensive to run or configure. The number of local
> +solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
> +Bringing this natively to Git can give all these developers a well-supported,
> +secure implementation opposed to the fragmentation we see today.

I'm not sure this is a good selling point for small projects. If they
are small, then the CI to verify commits is cheap(er).

Local hooks should never be used as a replacement for server-side checks.
A user could always use a repository without the local hooks and push
commits that have not been vetted locally. The extreme example is to have
a commit hook that compiles the code and runs all the tests. Would you
then remove all CI builds?

Making it easier to adopt local hooks can avoid some pain points when
users are blocked by the server-side checks.

> +Server-side vs Local Checks
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
> +User Goals / Critical User Journeys
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...

I appreciate the motivation in this document. However, the motivation
doesn't really justify why this should be baked into Git itself, since
a "configure-repo" script in the base of the repo would suffice to
achieve that functionality.

The reason to put this in Git is to standardize this process so it
is not different in each repository. It might be good to spend time
justifying that angle.

> +Security Considerations and Design Principles
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +We must balance the desire to make hooks setup easy for developers --- allowing
> +them to get hooks set up with low friction, and hence increasing the probability
> +of them adopting these hooks --- with protecting users from the security risks
> +of arbitrary code execution on their hosts.
> +
> +To inform the design, we propose these design principles:
> +
> +* User consent: Users must explicitly agree to hooks usage; no hooks should
> +execute without consent, and users should re-consent if hooks update. Users can
> +opt-out of hooks.
> +
> +* Trust comes from the central repository:
> +  ** Most users don't have the time or expertise to properly audit every hook
> +  and what it does. There must be trust between the user and the remote that the
> +  code came from, and the Git project should ensure trust to the degree it can
> +  e.g. enforce HTTPS for its integrity guarantees.
> +
> +  ** Since developers will likely build their local clone in their development
> +  process, at some point, arbitrary code from the repository will be executed.
> +  In this sense, hooks _with user consent_ do not introduce a new attack surface.

It is critical that users are presented with this consent at the correct
times. For instance, I believe configuring local hooks should only be
done _after_ "git clone" completes. That allows a user to inspect the
worktree to their content instead of in the middle of an interactive shell
session or something. (The "git clone" command could output a message to
stderr saying "This repository recommends configuring local hooks. Run
'git <tbd>' to inspect the hooks and configure them.")

We've had enough code-execution bugs with "git clone" that I want to
completely avoid that possibility here.

> +* Give users visibility: Git must allow users to make informed decisions. This
> +means surfacing essential information to the user in a visible manner e.g. what
> +remotes the hooks are coming from, whether the hooks have changed in the latest
> +checkout.

As a user moves HEAD, we should similarly avoid updating the hooks
automatically, but instead present a message to the user to update their
hooks using an intentional command.

> +    ** This could be a path to a script/binary within the repository

Binaries will be tricky if you want users of multiple platforms to
interact with your repository. And scripts can be slower than binaries.

How could someone build hooks from source using your workflow? Perhaps
users are expected to locally compile the code before configuring the
hooks?

> +    ** This could be a path to a script/binary contained within submodules of
> +    the repository

This gives me significant chills. Proceed with caution here.

I understand the reason to want this feature: you could have a suite
of repositories using a common hook set that lives in each as a
submodule. I just want to point out that this adds yet another
dimension for attack.

> +    ** This could be a user installed command or script/binary that exists
> +    outside of the repository and is present in `$PATH`

Like `rm -rf ~/*`? I'm trying to think of dangerous things to do without
elevation. It could help here to clarify the intended user pattern here:
"This repository requires that you install tool X." This seems unlikely
to be necessarily true at clone time, so the users will have a broken
state if they don't run some extra steps. How will that be communicated?

Requirements like these make me think that these repositories would be
better off with a script that configures the hooks after checking if
these things actually exist on the PATH (and installs them if not). I
would lower the priority of this one for now.

> +* This configuration should only apply if it was received over HTTPS

Avoiding http:// and git:// makes sense. Why not SSH?

> +* A setup command for users to set up hooks
> +
> +    ** Hook setup could happen at clone time assuming the user has consented
> +    e.g. if `--setup-hooks` is passed to `git clone`

This is not enough consent.

> +* Users must explicitly approve hooks at least once
> +
> +    ** Running the setup command should count as approval, including if the user
> +    consented during the clone
> +
> +    ** When a hook command changes, a user should re-approve execution (note:
> +    implementation should not interfere with requirement listed in “Fast
> +    Follows")

Users should explicitly approve hooks any time they would change.
They should also be able to explore the source of the change using
whatever editors and tools they want, so the worktree should change
to its new state without new hooks, _then_ the user could consider
updating hooks based on that new state.

> +Fast Follows
> +^^^^^^^^^^^^
> +
> +* When prompted to execute a hook, users can specify always or never, even if
> +the hook updates

I don't understand what this means. "when prompted to execute a hook" are you
saying that the user will get a message saying "Git will now run the pre-commit
hook, are you ok with that?"

"even if the hook updates": I've made my stance clear that the user should be
in complete control of when the hooks update.

> +Out of Scope
> +^^^^^^^^^^^^
> +
> +* Ensuring the user has installed software that isn't distributed with the repo

If you are going to allow hooks to run something on the PATH, then Git
should probably check that such an executable exists before setting the
config and causing problems.

> +Implementation Exploration: Check "magic" branch for configs at fetch time
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Example User Experience
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +===== Case 1: Consent through clone
> +
> +....
> +$ git clone --setup-hooks
> +...
> +
> +The following hooks were installed from remote `origin` ($ORIGIN_URL):
> +
> +pre-commit: git-secrets --pre_commit_hook
> +pre-push:  $GIT_ROOT/pre_push.sh
> +....

Nope. I think this workflow is a non-starter.

> +===== Case 2: Prompting after clone
> +....
> +$ git clone
> +...
> +
> +Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:
> +
> +pre-commit: git-secrets --pre_commit_hook
> +pre-push:  $GIT_ROOT/pre_push.sh

Yes, this works for me.

> +# instead of prompting, we could give users commands to run instead
> +# see case 3
> +
> +Do you wish to install them?
> +1. Yes (this time)
> +2. Yes (always from origin)
> +3. No (not this time)
> +4. No (never)

I'd rather see the installation as a separate step. That gives more
weight to the users' consent. Even if you do have a prompt here that
says Yes/No, *do not* include "always from origin".

> +===== Case 3: Re-prompting when hooks change
> +....
> +$ git pull
> +
> +The following hooks were updated from remote `origin` ($ORIGIN_URL):
> +
> +pre-push:  $GIT_ROOT/pre_push.sh
> +
> +If you wish to install them, run `git hook setup origin`.

Good. Stop here. Perhaps also describe this as something that happens
with "git checkout" because it matters when HEAD updates, even if the
commit was fetched earlier.

> +===== Case 4: Nudging when hooks weren't installed
> +....
> +$ git commit
> +advice: The repository owner has recommended a 'pre-commit' hook that was not run.
> +To view it, run `git show origin/refs/recommended-config:some-pre-commit`. To install it, run `git hook setup origin pre-commit`
> +
> +Turn off this advice by setting config variable advice.missingHook to false."
> +....

These nudges seem like a good pattern, especially with the advice config.

> +Implementation Sketch
> +^^^^^^^^^^^^^^^^^^^^^
> +
> +* Perform fetch as normal
> +
> +* After fetch is complete, Git checks for a "magic" config branch (e.g.
> ++origin/refs/recommended-config+) which contains information about config lines
> +an end-user may want (including hooks).

I think this is the wrong direction to go. You are recommending a few things:

1. Some branch names are more special than others.
2. Hooks live in a separate history than the rest of the repository.
3. Users cannot inspect the hooks in their worktree before installation.

Instead, think about things like .gitignore and .gitattributes, as they can
change as the repository changes. Make a special _filename_ or directory:
for example ".githooks/".

> +* As part of the fetch subcommand, Git prompts users to install the configs
> +contained there.

Prompt users that they are available and can be configured using another
command.

I summarized my thoughts at the top.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-03-29 23:20   ` Emily Shaffer
@ 2021-04-01 20:02     ` Albert Cui
  0 siblings, 0 replies; 26+ messages in thread
From: Albert Cui @ 2021-04-01 20:02 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Albert Cui via GitGitGadget, git, brian m. carlson,
	Ævar Arnfjörð Bjarmason

On Mon, Mar 29, 2021 at 4:20 PM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> > +Security Considerations and Design Principles
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> [snip]
> > +  ** Since developers will likely build their local clone in their development
> > +  process, at some point, arbitrary code from the repository will be executed.
> > +  In this sense, hooks _with user consent_ do not introduce a new attack surface.
>
> It might be worth saying that we want to make configuration of
> project-configured hooks to be approximately as easy/automatic as
> building (that is, the user still has to explicitly run a build, and
> isn't prompted at the end of their clone whether they want to build it
> right away).

+1, I like phrasing it this way.
> > +
> > +* Give users visibility: Git must allow users to make informed decisions. This
> > +means surfacing essential information to the user in a visible manner e.g. what
> > +remotes the hooks are coming from, whether the hooks have changed in the latest
> > +checkout.
>    ^~~~~~~~
> Better say "fetch", if we are proposing this magic branch thing.
>
> > +* This configuration should only apply if it was received over HTTPS
>
> Meaning, non-HTTPS fetches should just not update this special branch?

Yes, though I erroneously forgot to include SSH as well. I think the
main issue is
person-in-the-middle type attacks.

> > +* A setup command for users to set up hooks
> AIUI, this is proposed to be part of `git hook`, right?
>
> I don't think it needs to be part of this doc but it'd be nice to also
> support installing just a subset, like:
>
>   git hook setup pre-commit
>   git hook setup --interactive
>
Correct, I think `git hook` is a natural evolution. This is a nice to
have that we can document.

>
> > +Fast Follows
> > +^^^^^^^^^^^^
> > +
> > +* When prompted to execute a hook, users can specify always or never, even if
> > +the hook updates
>
> I think we want to base this on the remote URL, right? I know we talked
> a little offline about how to mitigate vs. malicious maintainer (for
> example this whole mess with The Great Suspender) and I'm not sure what
> solution there might be.
>
> I wonder if it's worth it to notify users that their always-okayed hooks
> were updated during fetch?
>
It definitely aligns with the security principles to notify, even if
they have OK'd updates.

> > +Implementation Exploration: Check "magic" branch for configs at fetch time
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Example User Experience
> > +^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > +===== Case 1: Consent through clone
> > +
> > +....
> > +$ git clone --setup-hooks
> > +...
> > +
> > +The following hooks were installed from remote `origin` ($ORIGIN_URL):
> > +
> > +pre-commit: git-secrets --pre_commit_hook
> > +pre-push:  $GIT_ROOT/pre_push.sh
>
> Hm, I thought we wanted to consider storing the hook body in the magic
> branch as well? To avoid changing hook implementation during bisect, for
> example?
>

Good question. If we consider this as an extension of config-based
hooks, then I think
it's logical to still support hooks in the repo itself. In
documentation, we might suggest
that people who want to use this feature store the hook in the magic
branch for that
reason.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-03-26  1:43 ` [PATCH v2] hooks: propose project " Albert Cui via GitGitGadget
  2021-03-29 23:20   ` Emily Shaffer
  2021-03-30 15:24   ` Derrick Stolee
@ 2021-04-02  9:59   ` Ævar Arnfjörð Bjarmason
  2021-04-05 23:42     ` Albert Cui
  2021-04-02 10:30   ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-02  9:59 UTC (permalink / raw)
  To: Albert Cui via GitGitGadget; +Cc: git, brian m. carlson, Albert Cui


On Fri, Mar 26 2021, Albert Cui via GitGitGadget wrote:

> From: Albert Cui <albertqcui@gmail.com>

Formatting nit:

> +Git has https://git-scm.com/docs/githooks[hooks] functionality to allow users to
> [...]
> +local checks of code e.g. linting. As documented in
> +https://git-scm.com/docs/gitfaq#_hooks[gitfaq], in some cases, server-side
> +checks may be more appropriate, especially since developers can always skip

This should be a linkgit:* even in the technical/ directory, should it
not? We build docs on git-scm.com (among others), but in our source tree
we should be using linking syntax, not linking to our own already-built
docs on some website.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-03-26  1:43 ` [PATCH v2] hooks: propose project " Albert Cui via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-04-02  9:59   ` Ævar Arnfjörð Bjarmason
@ 2021-04-02 10:30   ` Ævar Arnfjörð Bjarmason
  2021-04-03  0:58     ` Albert Cui
  3 siblings, 1 reply; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-02 10:30 UTC (permalink / raw)
  To: Albert Cui via GitGitGadget; +Cc: git, brian m. carlson, Albert Cui


On Fri, Mar 26 2021, Albert Cui via GitGitGadget wrote:

> From: Albert Cui <albertqcui@gmail.com>
>
> Hooks today are configured at the repository level, making it difficult to
> share hooks across repositories. Configuration-based hook management, by
> moving hooks configuration to the config, makes this much easier. However,
> there is still no good way for project maintainers to encourage or enforce
> adoption of specific hook commands on specific hook events in a repository.
> As such, there are many tools that provide this functionality on top of Git.
>
> This patch documents the requirements we propose for this feature as well as
> a design sketch for implementation.

I had comments on the v1 that are still unanswered by this re-roll:
https://lore.kernel.org/git/87im5nzbe6.fsf@evledraar.gmail.com/

I think a more productive way to handle such proposals is to reply to
such general discussion and /then/ re-roll the series.

I'm not going to repeat the outstanding points there (but would like you
to reply to it, and having just read Derrick Stolee's E-Mail downthread
there's another significant point of feedback to reply to.

So just comments on the range-diff below. I think for any one paragraph
reading ahead helps, because it's a bit of stream-of-conciousness as I
try to make sense of things that are then hinted at in later parts of
the document (which itself is a suggestion to maybe re-arrange some of
it).

>      -+* The `pre-commit` hook event: before committing, a developer may want to lint
>      -+their changes to enforce certain code style and quality. If there are style
>      -+issues, the developer may want the commit to fail so that they can fix the
>      -+issues.
>      ++* The `pre-commit` hook event:
>       +
>      -+* The `commit-msg` hook event: after committing, repository owners may want to
>      -+enforce a certain commit message style. This may be out of necessity:
>      ++  ** A developer may want to lint their changes to enforce certain code style
>      ++  and quality. The project may want the commit to fail so that commits that
>      ++  violate the project's style don't get uploaded.
>      ++
>      ++  ** The project may want to prevent developers from committing passwords or
>      ++  other sensitive information e.g. via
>      ++  https://github.com/awslabs/git-secrets[git-secrets].

Why does the project want to prevent developers from *committing* such
information by default? I commit such stuff all the time, it's only a
problem once it's pushed.

The genesis of this series seems to be need within the
Google-plex. Having operated a similar central-repository corporate
setup before I'm surprised that you're even trying to prevent such
things on local computers, surely you'll have to re-do such checks
on-push on the server, so just have them live there?

But I think this is answered below:

>      ++Server-side vs Local Checks
>      ++^^^^^^^^^^^^^^^^^^^^^^^^^^^
>      ++
>      ++A large motivation for this change is providing projects a method to enable
>      ++local checks of code e.g. linting. As documented in
>      ++https://git-scm.com/docs/gitfaq#_hooks[gitfaq], in some cases, server-side
>      ++checks may be more appropriate, especially since developers can always skip
>      ++local hook execution. We think there are still benefits to local checks:
>      ++
>      ++* Ensuring commit message style and preventing the committing of sensitive data,
>      ++as described above. In particular, if CI results are public, as with many open
>      ++source projects, server side checks are useless for guarding against leaking
>      ++sensitive data.

So what you really mean is you want a pre-commit hook as an alternative
to some "once we've accpted the push and CI runs we notice naughty
data", not as a pre-receive hook alternative?

>      ++
>      ++* Helps developers catch issues earlier: typically developers need to push to
>      ++the remote to trigger server-side checks. Local hooks can be run anytime the
>      ++developer wants. This is especially useful if the project has slow
>      ++server-checks; catching issues locally can save the developer a lot of time
>      ++waiting for CI. They are also useful for locally reproducing an issue identified
>      ++in CI, helping resolve issues faster.
>      ++
>      ++* Since the typical goal of developers to submit changes to a central
>      ++repository, their interests are aligned with the project maintainers'; while
>      ++they can choose to skip local hook execution, it is in their interest to allow
>      ++hooks to run at least before proposing code for submission to the central
>      ++repository to increase the chances of the code getting accepted.

This all makes sense, but is really missing how this problem isn't adequately solved by:

    $ HACKING
    Welcome to project XYZ, first you'll be much more productive with
    our hooks, run:

        ./setup-hooks.sh

    [...]

Presumably developers working on some central repo are the ones least
served by this type of thing, since such setups usually have established
setup scripts etc. that you (mostly) go through once.

>      ++In the ideal world, developers and project maintainers use both local and server
>      ++side checks in their workflow. However, for many smaller projects, this may not
>      ++be possible: CI may be too expensive to run or configure. The number of local
>      ++solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
>      ++Bringing this natively to Git can give all these developers a well-supported,
>      ++secure implementation opposed to the fragmentation we see today.

The end-goal of this series combined with Emily's configurable hook is
basically to have less friction when it comes to setting up and
maintaining hooks.

I don't see how it would help with "CI may be too expensive to run or
configure" though. We're basically talking about auto-updating a script
in .git/hooks, which today is essentially a ./setup-hooks.sh, and the
script checking for a new version of itself when it runs at
origin/myscripts:myname.sh, no?

>      -+* As a repository owner / project maintainer,
>      ++* As a project maintainer,
>       +
>       +    ** I want to enforce code style so I require developers to use a
>       +    standardized linter.
>       +
>      -+    ** I want to prevent leaks / share code widely so I check that developers
>      -+    are uploading code to the right place.
>      ++    ** I want to prevent sensitive data from getting checked in.
>       +
>      -+    ** I want this to just work for all the developers in my repository, without
>      ++    ** I want to prevent leaks so I check that developers are uploading code to
>      ++    the right private central repository. Conversely, I may want to encourage
>      ++    more open source development and encourage developers to push to public
>      ++    central repositories.

I think I'm beginning to get the gist of this, so it's really also a
proposed workaround for projects that host on platforms like github.com
and whatnot where you can run arbitrary code in a CI, but you can't
install a custom pre-receive hook?

I think it might be worth explicitly spelling that out. E.g. if those
platforms gained a feature (which isn't that hard to imagine) of hiding
your ref until after some pre-receive part of a CI check has run (which
would not have public logs, so you could "safely" check/push passwords)
it seems to me that a large part of the immediate need for this would go
away.

Which doesn't per-se mean we shouldn't do it, just that assumptions,
constaints etc. should be documented.

>      ++* Trust comes from the central repository:
>      ++  ** Most users don't have the time or expertise to properly audit every hook
>      ++  and what it does. There must be trust between the user and the remote that the
>      ++  code came from, and the Git project should ensure trust to the degree it can
>      ++  e.g. enforce HTTPS for its integrity guarantees.

I won't belabor the point but in my feedback on v1 I suggested an
alternate mechanism of doing this which I think is more
distributed-friendly and more safe:
https://lore.kernel.org/git/87im5nzbe6.fsf@evledraar.gmail.com/

...

>      ++Example User Experience
>      ++^^^^^^^^^^^^^^^^^^^^^^^
>      ++
>      ++===== Case 1: Consent through clone
>      ++
>      ++....
>      ++$ git clone --setup-hooks
>      ++...
>      ++
>      ++The following hooks were installed from remote `origin` ($ORIGIN_URL):
>      ++
>      ++pre-commit: git-secrets --pre_commit_hook
>      ++pre-push:  $GIT_ROOT/pre_push.sh
>      ++....
>      ++
>      ++===== Case 2: Prompting after clone
>      ++....
>      ++$ git clone
>      ++...
>      ++
>      ++Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:
>      ++
>      ++pre-commit: git-secrets --pre_commit_hook
>      ++pre-push:  $GIT_ROOT/pre_push.sh
>      ++
>      ++# instead of prompting, we could give users commands to run instead
>      ++# see case 3
>      ++
>      ++Do you wish to install them?
>      ++1. Yes (this time)
>      ++2. Yes (always from origin)
>      ++3. No (not this time)
>      ++4. No (never)
>      ++....
>      ++
>      ++===== Case 3: Re-prompting when hooks change
>      ++....
>      ++$ git pull
>      ++
>      ++The following hooks were updated from remote `origin` ($ORIGIN_URL):
>      ++
>      ++pre-push:  $GIT_ROOT/pre_push.sh
>      ++
>      ++If you wish to install them, run `git hook setup origin`.
>      ++
>      ++If you wish to always accept hooks from `origin`, run `git hook setup --always
>      ++origin`. You should only do this if you trust code changes from origin.
>      ++
>      ++To always ignore hooks from `origin`, run `git hook ignore origin`.
>      ++....
>      ++

That alternate security model I suggested above would make this much
less painful. I.e. you could say "I'll trust updates as long as the
whole chain is signed by XYZ authority".

> [...]

Snip the rest of the doc, which as noted I've god unreplied-to feedback
on in https://lore.kernel.org/git/87im5nzbe6.fsf@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-04-02 10:30   ` Ævar Arnfjörð Bjarmason
@ 2021-04-03  0:58     ` Albert Cui
  0 siblings, 0 replies; 26+ messages in thread
From: Albert Cui @ 2021-04-03  0:58 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Albert Cui via GitGitGadget, git, brian m. carlson

On Fri, Apr 2, 2021 at 3:30 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> I had comments on the v1 that are still unanswered by this re-roll:
> https://lore.kernel.org/git/87im5nzbe6.fsf@evledraar.gmail.com/
>
> I think a more productive way to handle such proposals is to reply to
> such general discussion and /then/ re-roll the series.
>
> I'm not going to repeat the outstanding points there (but would like you
> to reply to it, and having just read Derrick Stolee's E-Mail downthread
> there's another significant point of feedback to reply to.
>
Thanks for the feedback! I'm new to this process. I'll make sure to
reply to both.

[...]

> >      ++  ** The project may want to prevent developers from committing passwords or
> >      ++  other sensitive information e.g. via
> >      ++  https://github.com/awslabs/git-secrets[git-secrets].
>
> Why does the project want to prevent developers from *committing* such
> information by default? I commit such stuff all the time, it's only a
> problem once it's pushed.
>
> But I think this is answered below:
>
> >      ++Server-side vs Local Checks
> >      ++^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >      ++
> >      ++A large motivation for this change is providing projects a method to enable
> >      ++local checks of code e.g. linting. As documented in
> >      ++https://git-scm.com/docs/gitfaq#_hooks[gitfaq], in some cases, server-side
> >      ++checks may be more appropriate, especially since developers can always skip
> >      ++local hook execution. We think there are still benefits to local checks:
> >      ++
> >      ++* Ensuring commit message style and preventing the committing of sensitive data,
> >      ++as described above. In particular, if CI results are public, as with many open
> >      ++source projects, server side checks are useless for guarding against leaking
> >      ++sensitive data.
>
> So what you really mean is you want a pre-commit hook as an alternative
> to some "once we've accpted the push and CI runs we notice naughty
> data", not as a pre-receive hook alternative?
>

I think you're identifying that "prevent developers from commiting" is
the wrong wording.
Maybe a better phrasing is "prevent sensitive information from getting pushed or
checked in."

Yes, this could be done in CI, but as noted below, catching this
before the push (or commit)
happens is more productive for developers:

> >      ++
> >      ++* Helps developers catch issues earlier: typically developers need to push to
> >      ++the remote to trigger server-side checks. Local hooks can be run anytime the
> >      ++developer wants. This is especially useful if the project has slow
> >      ++server-checks; catching issues locally can save the developer a lot of time
> >      ++waiting for CI. They are also useful for locally reproducing an issue identified
> >      ++in CI, helping resolve issues faster.
> >      ++

An additional point I should also call out is that for large
repositories, pushes themselves
can be slow, so even if server-side checks are fast, being able to
avoid unnecessary pushes
can help developer velocity.

>
> This all makes sense, but is really missing how this problem isn't adequately solved by:
>
>     $ HACKING
>     Welcome to project XYZ, first you'll be much more productive with
>     our hooks, run:
>
>         ./setup-hooks.sh
>
>     [...]
>
> Presumably developers working on some central repo are the ones least
> served by this type of thing, since such setups usually have established
> setup scripts etc. that you (mostly) go through once.
>

One issue that immediately comes to mind with a setup script is that
developers could
easily miss out on updates to the hooks. One nice thing about this
proposal is that we
try to address that problem.

Anothing issue is that people in general are bad at reading
instructions; that's why I'm
trying to get as close as possible to `git clone` doing set up for you
while balancing the
security concerns (I know Derrick Stolee finds issue with this in his
reply, which I'll try to
address in my reply there.)

> >      ++In the ideal world, developers and project maintainers use both local and server
> >      ++side checks in their workflow. However, for many smaller projects, this may not
> >      ++be possible: CI may be too expensive to run or configure. The number of local
> >      ++solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
> >      ++Bringing this natively to Git can give all these developers a well-supported,
> >      ++secure implementation opposed to the fragmentation we see today.
>
> The end-goal of this series combined with Emily's configurable hook is
> basically to have less friction when it comes to setting up and
> maintaining hooks.
>
> I don't see how it would help with "CI may be too expensive to run or
> configure" though. We're basically talking about auto-updating a script
> in .git/hooks, which today is essentially a ./setup-hooks.sh, and the
> script checking for a new version of itself when it runs at
> origin/myscripts:myname.sh, no?
>

By "expensive", I mean from both a money and a time perspective: for small
projects, you may not set up any server-side checks and instead rely
purely on local checks,
and this helps by, as you said, reducing the friction to set up these
local checks.

From my own experience: When I start a new weekend project, setting up
CI is not at the top
of the list of things I want to spend time on; I'm usually not even
writing tests. Maybe
I'm just a bad developer :D But I do set up local checks: linting,
code formatters.

To your point about updating hooks (we're thinking on the same
lines!): that's putting the
responsibility on the developer to implement. From a
maximizing-global-productivity
point of view, isn't it more elegant for us to extend Git's
functionality and provide good support
for this use case?

> >      -+* As a repository owner / project maintainer,
> >      ++* As a project maintainer,
> >       +
> >       +    ** I want to enforce code style so I require developers to use a
> >       +    standardized linter.
> >       +
> >      -+    ** I want to prevent leaks / share code widely so I check that developers
> >      -+    are uploading code to the right place.
> >      ++    ** I want to prevent sensitive data from getting checked in.
> >       +
> >      -+    ** I want this to just work for all the developers in my repository, without
> >      ++    ** I want to prevent leaks so I check that developers are uploading code to
> >      ++    the right private central repository. Conversely, I may want to encourage
> >      ++    more open source development and encourage developers to push to public
> >      ++    central repositories.
>
> I think I'm beginning to get the gist of this, so it's really also a
> proposed workaround for projects that host on platforms like github.com
> and whatnot where you can run arbitrary code in a CI, but you can't
> install a custom pre-receive hook?
>
> I think it might be worth explicitly spelling that out. E.g. if those
> platforms gained a feature (which isn't that hard to imagine) of hiding
> your ref until after some pre-receive part of a CI check has run (which
> would not have public logs, so you could "safely" check/push passwords)
> it seems to me that a large part of the immediate need for this would go
> away.
>
> Which doesn't per-se mean we shouldn't do it, just that assumptions,
> constaints etc. should be documented.
>

Agree we can call this out, but as I'm saying above, I don't think
it's fair to assume
everyone is already using server-side checks and there's still benefit
to doing the same
checks locally even if you have server-side checks.

[...]

> Snip the rest of the doc, which as noted I've god unreplied-to feedback
> on in https://lore.kernel.org/git/87im5nzbe6.fsf@evledraar.gmail.com/

Really appreciate the engagement! I'll try to reply early next week.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-03-30 15:24   ` Derrick Stolee
@ 2021-04-05 22:45     ` Albert Cui
  2021-04-05 23:09       ` Junio C Hamano
  2021-04-06 23:15       ` brian m. carlson
  0 siblings, 2 replies; 26+ messages in thread
From: Albert Cui @ 2021-04-05 22:45 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Albert Cui via GitGitGadget, git, brian m. carlson,
	Ævar Arnfjörð Bjarmason

On Tue, Mar 30, 2021 at 8:24 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/25/2021 9:43 PM, Albert Cui via GitGitGadget wrote:
> > From: Albert Cui <albertqcui@gmail.com>
> >
> > Hooks today are configured at the repository level, making it difficult to
> > share hooks across repositories. Configuration-based hook management, by
> > moving hooks configuration to the config, makes this much easier. However,
> > there is still no good way for project maintainers to encourage or enforce
> > adoption of specific hook commands on specific hook events in a repository.
> > As such, there are many tools that provide this functionality on top of Git.
> >
> > This patch documents the requirements we propose for this feature as well as
> > a design sketch for implementation.
>
> Sorry for being so late in reviewing this.
>
> My first reaction is that this feature is suggesting multiple security
> vulnerabilities as core functionality. It also seems to be tied to
> niche projects (in number of projects, not necessarily the size of those
> projects).
>

According to it's GitHub page, Husky [0] has 22M million monthly downloads and
is used by 437K projects. Many projects, both big (e.g. Android [1],
ChromeOS [2],
Angular [3], Webpack [4]) and small have use cases around hooks configuration,
so I don't think it's fair to say this is only needed by niche projects.

> I was recommended in conversation to think of this as a way to take
> existing ad-hoc behavior and standardize it with a "Git-blessed"
> solution. I'm not sure this proposal makes a strong enough case for
> why having a "configure-hooks.sh" script in the base of the repo is
> not enough. It simultaneously does not use existing precedents like
> .gitattributes or .gitignore as direction in using the worktree at
> HEAD as a mechanism for communicating details. I find using a separate
> ref for hooks to be a non-starter and the design should be rebuilt from
> scratch.
>

Right, this entire proposal is trying to get to a "Git-blessed" solution,
and I should make the need clearer. A few reasons for standardizing
this come to mind:

1. Many existing "standard" solutions. A multitude of existing solutions for
this use case speaks to the fact that a basic config script is not sufficient.
I mentioned Husky above, but here are a few more; basically each
popular programming language environment has a solution for this.

https://github.com/sds/overcommit - Ruby
https://github.com/pre-commit/pre-commit - Python
https://github.com/Arkweid/lefthook - Go
https://github.com/shibapm/Komondor - Swift
https://github.com/typicode/husky - Node

These solutions all handle the installation and updating of hooks. A
"configure-hooks.sh" script doesn't handle hook updates, unless you go through
the trouble yourself of implementing and maintaining that.

2. External dependencies. The existing solutions require users to
either have those
toolchains already installed or an explicit installation step via an OS package
manager, so there's a lot of friction for users, even when using these
standard solutions.
This functionality shouldn't require external dependencies, and most
certainly _how_ you
set up hooks shouldn't change depending on your coding environment.
Fixing this is only
possible in a world where Git supports this functionality natively.

3. Improving security. As you mentioned, hooks are difficult to get
right from a security
perspective, and standardizing on a single implementation allows us to
give developers
a well-vetted solution with a better security model than what exists
today. For example,
we're proposing making it very clear to users whenever there's a hook
update. This isn't
something that existing solutions do.

I'll also say in general, the Git project is much more likely to get
security right than smaller
projects, where oftentimes even popular projects end up unmaintained.

For the design feedback: what I'm hearing is that you'd prefer a
design that keeps configuration in the worktree. I'll leave discussion about
implementation to those on the list better suited :)

However, one thing that I should have mentioned in patch about the design
proposal (as Emily Shaffer touched upon in her response) is that since the
configuration is separate from the code base, it allows you to go back
in history
or to older branches while preserving "improvements" to the hooks configuration
e.g. maybe the project has shifted to using a newer version of a linter.

This functionality has pros and cons. Bringing improvements to checks
can be useful
e.g. it's arguably better to bring older code up to newer code style.
But it also makes
it less possible to replicate the exact state of older history. This
is a problem that
server-side checks also have.

> I also expect that a significant portion of users will see a message
> like "this repository needs hooks" and will just say "yes" to get rid
> of the prompt. There needs to be sufficient opportunity for users to
> inspect the hook configuration and avoid frustrated or distracted users
> from doing the wrong thing.
>
> Server-side checks should always exist, so users who don't follow the
> project's guidelines using the recommended hooks will be blocked. The
> important thing is that there is an easy way for willing participants
> to install the correct hooks. This doesn't mean we should make it
> almost automatic.
>

What I'm hearing here is that you don't like the interactive prompt, as you
noted in the UX feedback below. Allowing users the chance to inspect the hook
configuration makes sense to call out explicitly as part of the security model.

> Also, please proactively pursue a security review of the feature,
> including non-technical risks such as social engineering, forks, or
> other possible attacks. This idea seems so risky that I would be
> against accepting it unless a security expert has done a thorough
> review.
>

Agreed. We already did a security review internally at Google. The main
feedback was:

* We need an explicit opt-in opposed to setting hooks up automatically,
e.g. a command line flag like --accept-hooks at minimum. This is primarily
to distinguish people who are just cloning a repository to browse the code
from people who are developing.

* The average user doesn't have the ability to review hooks in general
(security is hard and obscuration is easy), and if the user has
already opted into
this feature because they are engaged in development, it's very likely
that they're
already running build scripts, so the additional attack vector here doesn't seem
like a big issue.

If there are other security folks I should seek advice from, please let me know.

[...]
> I think providing a way for repository owners to _recommend_ how cloners
> should interact with the repository is a good idea. I think starting with
> hooks is perhaps a significant jump to the most complicated version of
> that idea.
>
> As you think of this design, it might be good to think about how some
> recommended Git config (within an allow-list) might fit into this system
> as well. I would have started there, with things like "Use partial clone"
> or "use sparse-checkout". Those are really things that need to happen at
> clone time, they can't really happen after-the-fact, which helps justifying
> a modification to 'git clone'. The hook configuration doesn't _need_ to
> happen during 'git clone'. More on this timing later.
>
> The .gitattributes file is the closest analogue I could find in current
> functionality, but it operates on a path-based scope, not repository scope.

I am happy to extend this patch to talk about configuration in general with
a specific use case of suggesting hooks. However, I do want to separate out
"what should we do when?" from "should we do this?" Based on this feedback,
I'm hearing "we should have a design for suggested configuration in general,"
but I don't think that necessitates that we should implement, for
example, partial
clone configuration before hooks configuration.

>
> > +Server-side vs Local Checks
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ...
> > +In the ideal world, developers and project maintainers use both local and server
> > +side checks in their workflow. However, for many smaller projects, this may not
> > +be possible: CI may be too expensive to run or configure. The number of local
> > +solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
> > +Bringing this natively to Git can give all these developers a well-supported,
> > +secure implementation opposed to the fragmentation we see today.
>
> I'm not sure this is a good selling point for small projects. If they
> are small, then the CI to verify commits is cheap(er).
>
> Local hooks should never be used as a replacement for server-side checks.
> A user could always use a repository without the local hooks and push
> commits that have not been vetted locally. The extreme example is to have
> a commit hook that compiles the code and runs all the tests. Would you
> then remove all CI builds?
>
> Making it easier to adopt local hooks can avoid some pain points when
> users are blocked by the server-side checks.
>

Sorry, this wasn't meant to come off as "small projects don't need CI." It was
more to say, "small projects today are already primarily relying on
local checks,
so we should make that easier for them." I clarified that a bit here as well:
https://lore.kernel.org/git/CAMbkP-Q4_z-nQzJwr2ZeM16deHKmzr=Z4908UzgOyk9FA-gKEA@mail.gmail.com/T/#u

[...]

> I appreciate the motivation in this document. However, the motivation
> doesn't really justify why this should be baked into Git itself, since
> a "configure-repo" script in the base of the repo would suffice to
> achieve that functionality.
>
> The reason to put this in Git is to standardize this process so it
> is not different in each repository. It might be good to spend time
> justifying that angle.
>

Thank you for the feedback. I hope what I wrote above helps.

[...]
> > +* Trust comes from the central repository:
> > +  ** Most users don't have the time or expertise to properly audit every hook
> > +  and what it does. There must be trust between the user and the remote that the
> > +  code came from, and the Git project should ensure trust to the degree it can
> > +  e.g. enforce HTTPS for its integrity guarantees.
> > +
> > +  ** Since developers will likely build their local clone in their development
> > +  process, at some point, arbitrary code from the repository will be executed.
> > +  In this sense, hooks _with user consent_ do not introduce a new attack surface.
>
> It is critical that users are presented with this consent at the correct
> times. For instance, I believe configuring local hooks should only be
> done _after_ "git clone" completes. That allows a user to inspect the
> worktree to their content instead of in the middle of an interactive shell
> session or something. (The "git clone" command could output a message to
> stderr saying "This repository recommends configuring local hooks. Run
> 'git <tbd>' to inspect the hooks and configure them.")
>
> We've had enough code-execution bugs with "git clone" that I want to
> completely avoid that possibility here.
>

Responding to this below in the "Consent through clone" section.

> > +* Give users visibility: Git must allow users to make informed decisions. This
> > +means surfacing essential information to the user in a visible manner e.g. what
> > +remotes the hooks are coming from, whether the hooks have changed in the latest
> > +checkout.
>
> As a user moves HEAD, we should similarly avoid updating the hooks
> automatically, but instead present a message to the user to update their
> hooks using an intentional command.
>

Responding to this below.

> > +    ** This could be a path to a script/binary within the repository
>
> Binaries will be tricky if you want users of multiple platforms to
> interact with your repository. And scripts can be slower than binaries.
>
> How could someone build hooks from source using your workflow? Perhaps
> users are expected to locally compile the code before configuring the
> hooks?
>

"Binaries" was primarily referring to packages like pylint which are installed
at the OS level. In an enterprise environment, these could be
installed automatically
for the user.

> > +    ** This could be a path to a script/binary contained within submodules of
> > +    the repository
>
> This gives me significant chills. Proceed with caution here.
>
> I understand the reason to want this feature: you could have a suite
> of repositories using a common hook set that lives in each as a
> submodule. I just want to point out that this adds yet another
> dimension for attack.
>

If we notify users about changes, both to hook commands and to the
underlying source files, we can make this safer.

> > +    ** This could be a user installed command or script/binary that exists
> > +    outside of the repository and is present in `$PATH`
>
> Like `rm -rf ~/*`? I'm trying to think of dangerous things to do without
> elevation. It could help here to clarify the intended user pattern here:
> "This repository requires that you install tool X." This seems unlikely
> to be necessarily true at clone time, so the users will have a broken
> state if they don't run some extra steps. How will that be communicated?
>
> Requirements like these make me think that these repositories would be
> better off with a script that configures the hooks after checking if
> these things actually exist on the PATH (and installs them if not). I
> would lower the priority of this one for now.
>

As mentioned, for enterprise deployments, this can be solved by administrators
installing any necessary software automatically.

Otherwise, I think ensuring the tool is installed feels out-of-scope
(as written in
the doc); it's not like Git makes sure compilers or build tools are
installed today,
and even today, users could set up Husky hooks that rely on $PATH tools, so
we're not introducing a new problem.

Some day, maybe we could have a `post-clone` and `post-checkout` hooks,
but I think this is all out-of-scope of the immediate discussion.

> > +* This configuration should only apply if it was received over HTTPS
>
> Avoiding http:// and git:// makes sense. Why not SSH?
>

Left off SSH accidentally! Definitely makes sense to include.

> > +* A setup command for users to set up hooks
> > +
> > +    ** Hook setup could happen at clone time assuming the user has consented
> > +    e.g. if `--setup-hooks` is passed to `git clone`
>
> This is not enough consent.
>

I hear what you're saying. How would you feel if this specific
functionality was behind
config (that enterprise environments could ship)? What I'm thinking:

```
#~/.gitconfig
[hook]
   allowCloneInstallFromRemote = $REMOTE
```

IFF $REMOTE matches the config, then `git clone $REMOTE --setup-hooks` works.

(We could even get rid of --setup-hooks and rename the config to
hook.installOnCloneFromRemote but I think requiring user consent is
still best here.)

> > +* Users must explicitly approve hooks at least once
> > +
> > +    ** Running the setup command should count as approval, including if the user
> > +    consented during the clone
> > +
> > +    ** When a hook command changes, a user should re-approve execution (note:
> > +    implementation should not interfere with requirement listed in “Fast
> > +    Follows")
>
> Users should explicitly approve hooks any time they would change.
> They should also be able to explore the source of the change using
> whatever editors and tools they want, so the worktree should change
> to its new state without new hooks, _then_ the user could consider
> updating hooks based on that new state.
>

As mentioned above, developers will execute code locally anyway when building.
Do you think allowing users to opt-into hook updates significantly
increases the attack
surface? I should note that existing solutions already do hook updates
automatically and
silently (since they use a trampoline script); this proposal is safer
from that given
1) we ask users to opt into auto updates and 2) we warn users about changes.

> > +Fast Follows
> > +^^^^^^^^^^^^
> > +
> > +* When prompted to execute a hook, users can specify always or never, even if
> > +the hook updates
>
> I don't understand what this means. "when prompted to execute a hook" are you
> saying that the user will get a message saying "Git will now run the pre-commit
> hook, are you ok with that?"
>

This should say "When prompted to install a hook, users can specify always or
never, even if the hook updates. For security, this consent should be tied to
a specific remote."

[trimming some duplicate points]

> > +# instead of prompting, we could give users commands to run instead
> > +# see case 3
> > +
> > +Do you wish to install them?
> > +1. Yes (this time)
> > +2. Yes (always from origin)
> > +3. No (not this time)
> > +4. No (never)
>
> I'd rather see the installation as a separate step. That gives more
> weight to the users' consent. Even if you do have a prompt here that
> says Yes/No, *do not* include "always from origin".
>
> > +===== Case 3: Re-prompting when hooks change
> > +....
> > +$ git pull
> > +
> > +The following hooks were updated from remote `origin` ($ORIGIN_URL):
> > +
> > +pre-push:  $GIT_ROOT/pre_push.sh
> > +
> > +If you wish to install them, run `git hook setup origin`.
>
> Good. Stop here. Perhaps also describe this as something that happens
> with "git checkout" because it matters when HEAD updates, even if the
> commit was fetched earlier.
>

Sure, case 3 seems like a reasonable path to pursue. Sounds like auto-updating
is the main point of contention here.

[...]

> > +Implementation Sketch
> > +^^^^^^^^^^^^^^^^^^^^^
> > +
> > +* Perform fetch as normal
> > +
> > +* After fetch is complete, Git checks for a "magic" config branch (e.g.
> > ++origin/refs/recommended-config+) which contains information about config lines
> > +an end-user may want (including hooks).
>
> I think this is the wrong direction to go. You are recommending a few things:
>
> 1. Some branch names are more special than others.
> 2. Hooks live in a separate history than the rest of the repository.
> 3. Users cannot inspect the hooks in their worktree before installation.
>
> Instead, think about things like .gitignore and .gitattributes, as they can
> change as the repository changes. Make a special _filename_ or directory:
> for example ".githooks/".
>

#2 is a feature, as mentioned above, but certainly just a
"nice-to-have." I agree we'd
need a solution to #3.

Thanks for all the feedback!

Albert

[0] https://github.com/typicode/husky
[1] https://android.googlesource.com/platform/tools/repohooks/
[2] https://chromium.googlesource.com/chromiumos/docs/+/HEAD/contributing.md#Upload-changes
[3] https://github.com/angular/angular/tree/master/.husky
[4] https://github.com/webpack/webpack/tree/master/.husky

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-04-05 22:45     ` Albert Cui
@ 2021-04-05 23:09       ` Junio C Hamano
  2021-04-05 23:40         ` Albert Cui
  2021-04-06 23:15       ` brian m. carlson
  1 sibling, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2021-04-05 23:09 UTC (permalink / raw)
  To: Albert Cui
  Cc: Derrick Stolee, Albert Cui via GitGitGadget, git,
	brian m. carlson, Ævar Arnfjörð Bjarmason

Albert Cui <albertqcui@gmail.com> writes:

>> Requirements like these make me think that these repositories would be
>> better off with a script that configures the hooks after checking if
>> these things actually exist on the PATH (and installs them if not). I
>> would lower the priority of this one for now.
>>
>
> As mentioned, for enterprise deployments, this can be solved by administrators
> installing any necessary software automatically.
>
> Otherwise, I think ensuring the tool is installed feels out-of-scope
> (as written in
> the doc); it's not like Git makes sure compilers or build tools are
> installed today,
> and even today, users could set up Husky hooks that rely on $PATH tools, so
> we're not introducing a new problem.

I am afraid that this compares apples and oranges.  

I may "git clone" and try "make" to find out that I needed a special
compiler, and that would not be the end of the world.  It is
guaranteed that "git clean -f -x -d" followed by installation of
necessary toolchain followed by "make" would work.  And that is
partly because "git clone" does not do any more than just clone and
checkout the initial tree.

If a new version of "git clone" told me "I can install the project
recommended hooks to use", I answer "yes", and then failed while
installing and configuring the project-recommended hooks because of
missing dependencies, then I wouldn't know in what state the result
would be in.  In some projects, it may be enough to just install the
missing dependencies, and in some others, it may not be enough and I
have a broken half-configured mess depending on how the "installing
and configuring" step failed.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-04-05 23:09       ` Junio C Hamano
@ 2021-04-05 23:40         ` Albert Cui
  2021-04-06  0:13           ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Albert Cui @ 2021-04-05 23:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Albert Cui via GitGitGadget, git,
	brian m. carlson, Ævar Arnfjörð Bjarmason

On Mon, Apr 5, 2021 at 4:09 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Albert Cui <albertqcui@gmail.com> writes:
>
> >> Requirements like these make me think that these repositories would be
> >> better off with a script that configures the hooks after checking if
> >> these things actually exist on the PATH (and installs them if not). I
> >> would lower the priority of this one for now.
> >>
> >
> > As mentioned, for enterprise deployments, this can be solved by administrators
> > installing any necessary software automatically.
> >
> > Otherwise, I think ensuring the tool is installed feels out-of-scope
> > (as written in
> > the doc); it's not like Git makes sure compilers or build tools are
> > installed today,
> > and even today, users could set up Husky hooks that rely on $PATH tools, so
> > we're not introducing a new problem.
>
> I am afraid that this compares apples and oranges.
>
> I may "git clone" and try "make" to find out that I needed a special
> compiler, and that would not be the end of the world.  It is
> guaranteed that "git clean -f -x -d" followed by installation of
> necessary toolchain followed by "make" would work.  And that is
> partly because "git clone" does not do any more than just clone and
> checkout the initial tree.
>
> If a new version of "git clone" told me "I can install the project
> recommended hooks to use", I answer "yes", and then failed while
> installing and configuring the project-recommended hooks because of
> missing dependencies, then I wouldn't know in what state the result
> would be in.  In some projects, it may be enough to just install the
> missing dependencies, and in some others, it may not be enough and I
> have a broken half-configured mess depending on how the "installing
> and configuring" step failed.

I'm a little confused, and maybe it's because we have different
definitions of what
"installing hooks" means. By installing hooks, I meant the addition of
the hook command to the config, e.g the outcome of:

`git config --add hook.pre-commit.command pylint`

This works today; Git won't complain if I don't have pylint installed, so
I don't see how we'd get into a "broken half-configured mess."

It will complain when it tries to execute the hook, and this is where I see it
as the same as the 'I may "git clone" and try "make" to find out that
I needed a special compiler, and that would not be the end of the world' case.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-04-02  9:59   ` Ævar Arnfjörð Bjarmason
@ 2021-04-05 23:42     ` Albert Cui
  0 siblings, 0 replies; 26+ messages in thread
From: Albert Cui @ 2021-04-05 23:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Albert Cui via GitGitGadget, git, brian m. carlson

On Fri, Apr 2, 2021 at 2:59 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Fri, Mar 26 2021, Albert Cui via GitGitGadget wrote:
>
> > From: Albert Cui <albertqcui@gmail.com>
>
> Formatting nit:
>
> > +Git has https://git-scm.com/docs/githooks[hooks] functionality to allow users to
> > [...]
> > +local checks of code e.g. linting. As documented in
> > +https://git-scm.com/docs/gitfaq#_hooks[gitfaq], in some cases, server-side
> > +checks may be more appropriate, especially since developers can always skip
>
> This should be a linkgit:* even in the technical/ directory, should it
> not? We build docs on git-scm.com (among others), but in our source tree
> we should be using linking syntax, not linking to our own already-built
> docs on some website.

Yes, thanks. Good catch.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-04-05 23:40         ` Albert Cui
@ 2021-04-06  0:13           ` Junio C Hamano
  2021-04-06  0:27             ` Albert Cui
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2021-04-06  0:13 UTC (permalink / raw)
  To: Albert Cui
  Cc: Derrick Stolee, Albert Cui via GitGitGadget, git,
	brian m. carlson, Ævar Arnfjörð Bjarmason

Albert Cui <albertqcui@gmail.com> writes:

> I'm a little confused, and maybe it's because we have different
> definitions of what "installing hooks" means. By installing hooks,
> I meant the addition of the hook command to the config, e.g the
> outcome of:
>
> `git config --add hook.pre-commit.command pylint`

Yeah, I was envisioning that it won't be as a simple, mechanical and
unconditional single command invocation.  Rather, the 'pylint' part
would have to become different depending on the environment (e.g.
what operating system you are on, what editor you prefer, etc.).
And the part that decides what kind of environment you are on would
have to be written by the project that controls the "project-managed
hooks"---unfortunately that would most likely to be an error-prone
part.




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-04-06  0:13           ` Junio C Hamano
@ 2021-04-06  0:27             ` Albert Cui
  0 siblings, 0 replies; 26+ messages in thread
From: Albert Cui @ 2021-04-06  0:27 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Albert Cui via GitGitGadget, git,
	brian m. carlson, Ævar Arnfjörð Bjarmason

On Mon, Apr 5, 2021 at 5:14 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Albert Cui <albertqcui@gmail.com> writes:
>
> > I'm a little confused, and maybe it's because we have different
> > definitions of what "installing hooks" means. By installing hooks,
> > I meant the addition of the hook command to the config, e.g the
> > outcome of:
> >
> > `git config --add hook.pre-commit.command pylint`
>
> Yeah, I was envisioning that it won't be as a simple, mechanical and
> unconditional single command invocation.  Rather, the 'pylint' part
> would have to become different depending on the environment (e.g.
> what operating system you are on, what editor you prefer, etc.).
> And the part that decides what kind of environment you are on would
> have to be written by the project that controls the "project-managed
> hooks"---unfortunately that would most likely to be an error-prone
> part.
>

I don't see a need for that. At least, this doesn't feel part of the
"minimum feature set."

I don't think any existing solutions for hooks management handle this,
so it feels fine to push this part to the project. As you said, if
they wanted this, I'd expect `git config --add hook.pre-commit.command
lint.sh` where lint.sh does whatever cross-platform magic is required.

I wouldn't expect a lot of need for this anyway; I'd expect most hooks
to depend on tools that already work cross-platform (e.g. pylint) or
be hooks written in the programming environment of the project, so the
toolchain should already be in place to execute the hooks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hooks: propose repository owner configured hooks
  2021-03-19 10:27 ` Ævar Arnfjörð Bjarmason
@ 2021-04-06  0:35   ` Albert Cui
  2021-04-07 22:47     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 26+ messages in thread
From: Albert Cui @ 2021-04-06  0:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Albert Cui via GitGitGadget, git

On Fri, Mar 19, 2021 at 3:27 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Thu, Mar 18 2021, Albert Cui via GitGitGadget wrote:
>
> > We propose adding native Git functionality to allow project maintainers to
> > specify hooks that a user ought to install and utilize in their development
> > workflows. This patch documents the requirements we propose for this feature
> > as well as a design sketch for implementation.
>
> I like this goal. It's something I proposed (off-list) before and did a
> small write-up of here:
> https://lore.kernel.org/git/87zi6eakkt.fsf@evledraar.gmail.com/
>
> As far as I recall the response in the room at the time was the expected
> combination of "sure that would be neat" and "let's see the
> patches". I.e. I don't think there's hard objections to the existence of
> such a facility, but of course the devel is in the details, and
> obviously I never followed-up with actually trying to implement it.
>
> With config-based hooks this'll be much easier for the hook case, and
> I've tried to help that along[A]. I'd be interested in reviewing any
> effort in this area as well. The ".githooks/*" case in that proposal
> goes away with config-based hooks (since they wouldn't be special
> anymore, just config).
>

Thanks for the context and the support. Agreed, I think this is a
natural evolution of config-based hooks.

[...]
> > +* As part of the fetch subcommand, Git prompts users to install the configs
> > +contained there.
> > +
> > +    ** User responses to that prompt could be "sticky" - e.g. a user could reply
> > +    "no (this time)", "no (never)", "yes (this time)", or "yes (always)".
> > +    Always/never indicate that the user trusts the remote this config is coming
> > +    from, and should not apply to configs fetched from other remotes.
>
> As noted in the proposal I linked I think anyone thinking about this
> would do well to examine the trade-off Emacs's implementation of this
> uses, since it manages to safely open arbitrary files that'll
> potentially run arbitrary code on-open, but only if the user opts-in.
>

In [0], I proposed only allowing pre-clone opt-in to suggested hooks
if allowlisted in the config for the given remote, which seems similar
to your previous proposal. Extending this idea to any config settings
seems very reasonable. I'd love other people's thoughts about this.

In your proposal, you wrote:

> It would work really well with includeIf, e.g. I could clone all my work repos to a "safe" area in ~/work which is allowed to set more options, e.g. aliases.

It seems much safer to do this for a given remote, opposed to a local
file path, no?

> > +Later, we might want to do this before the initial clone is performed; that
> > +workflow looks like:
> > +
> > +* During clone, perform ls-refs as normal
> > +
> > +* If the server has a "magic" config branch, fetch only that config branch.
>
> ...the reason for the magic branch is this before-clone use-case?
>
> Unless there's a really strong use-case for that I'd think the
> per-branch .gitconfig would be a better trade-off, but even then there
> would be ways to make that work obviously in the many-many case, and
> still e.g. have a branch advertise a config blob for its "main" branch
> as a special case or something.
>
> I also suspect an unstated constraint of having this in a magic branch
> is the limitation of some git hosting provider's ACL
> implementations. More on that later...
>

Mentioned in [0], the primary motivation for a magic config branch
that lived outside of the worktree was "since the configuration is
separate from the code base, it allows you to go back in history or to
older branches while preserving "improvements" to the hooks
configuration e.g. maybe the project has shifted to using a newer
version of a linter."

[...]
> > +Future Work
> > +~~~~~~~~~~~
> > +
> > +* Extending this to allow repository owners to specify specific configurations
> > +in general e.g. this repository should use partial-clone with these parameters.
>
> I don't see why such a proposal should narrowly confine itself to hooks.
>
> Once we have config-based hooks then hooks are just configuration.
>
> If we're going to pick up arbitrary configuration from remotes/repos on
> request then we'd be starting with the most dangerous configuration.
>

Summarizing from [0]: Based on this feedback, I'm hearing "we should
have a design for suggested configuration in general," but I don't
think that necessitates that we should pursue generic configuration
before hooks configuration.

> I think a better way to start such an effort incrementally would be:
>
> 1. Audit git's config parser for the safety of parsing arbitrary config,
>    we parse .gitmodules now, do we feel it's safe enough to parse
>    arbitrary config (probably, but worth checking).
>
> 2. Add reflection to git's own config variables. Right now we have this
>    in the binary generated via a grep from the documentation. Similar to
>    Emacs's implementation we should/could categorize all our known
>    config variables by safety.
>

To clarify, are you saying, today, git's config variables are pulled
from the documentation? I.e. the documentation is the source of truth
for what config variables are supported? o.0

>    Hooks are the least safe, something like a diff.context=N setting the
>    repo wants to suggest a -U<n> setting much safer (just tweaking our
>    existing diff algorithm) etc. But even in those cases we'd want to
>    proceed slowly, e.g. is that config parsing for that -U<n> defensive
>    enough to be safe for arbitrary data?
>

To clarify, this proposal is just to set the hooks config that
config-based hooks enabled e.g. running `git config --add
hook.pre-commit.command pylint` on behalf of the user, so I'm not sure
what "arbitrary data" you're referring to. At least, any problems I'd
think we'd already address with config-based hooks.

> 3. Users should be able to e.g. configure "yes, for any repo I clone
>    they can tweak 'safe'" variables. This would reduce user dialog
>    fatigue, and thus increase safety. I.e. a repo who wants to set a
>    thing like hooks would stand out, but something that e.g. wants to
>    change the diff order would pass on existing global configuration.
>
> 4. Smarter ACL that magic remote+magic branch: It seems like an obvious
>    thing to me to want that if I clone e.g. a random clone of git.git
>    that I'd want to setup config for it IFF the .gitconfig in it is
>    reachable from a tag GPG signed by <approved key>.
>
>    Git's a distributed system, so while I don't think it's bad to have
>    some feature like "I always trust config from this remote" (e.g. a
>    corporate environment where you know its .gitconfig is
>    guarded/audited) we should think about more distributed-friendly
>    solutions first and if possible guide users towards those.

This seems like an OK alternative to allow-listing based on remote,
but are you expecting users to add a GPG key to their .gitconfig?
Remote URLs seem much more user friendly (think IP address vs URL).

[0]: https://lore.kernel.org/git/CAMbkP-S-9cccMpU4HG0Wurqap-WkTmD2zk50nKd9kJ_oWO__qw@mail.gmail.com/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-04-05 22:45     ` Albert Cui
  2021-04-05 23:09       ` Junio C Hamano
@ 2021-04-06 23:15       ` brian m. carlson
  2021-04-07  7:53         ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 26+ messages in thread
From: brian m. carlson @ 2021-04-06 23:15 UTC (permalink / raw)
  To: Albert Cui
  Cc: Derrick Stolee, Albert Cui via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 7638 bytes --]

On 2021-04-05 at 22:45:10, Albert Cui wrote:
> Right, this entire proposal is trying to get to a "Git-blessed" solution,
> and I should make the need clearer. A few reasons for standardizing
> this come to mind:
> 
> 1. Many existing "standard" solutions. A multitude of existing solutions for
> this use case speaks to the fact that a basic config script is not sufficient.
> I mentioned Husky above, but here are a few more; basically each
> popular programming language environment has a solution for this.
> 
> https://github.com/sds/overcommit - Ruby
> https://github.com/pre-commit/pre-commit - Python
> https://github.com/Arkweid/lefthook - Go
> https://github.com/shibapm/Komondor - Swift
> https://github.com/typicode/husky - Node
> 
> These solutions all handle the installation and updating of hooks. A
> "configure-hooks.sh" script doesn't handle hook updates, unless you go through
> the trouble yourself of implementing and maintaining that.

I think part of the problem is that an automated process to update hooks
is generally a security vulnerability, since it means that untrusted
remote code will automatically run on your computer.

I want to be clear that I understand the desire for this feature, even
though it's not a feature I would personally use, and the fact that
there are many approaches means that clearly there are many people that
do want this functionality.  I have in the past shared hooks with others
and we have mutually benefitted enormously from that fact.  My concerns
here are solely about the security aspects of this feature.

> 3. Improving security. As you mentioned, hooks are difficult to get
> right from a security
> perspective, and standardizing on a single implementation allows us to
> give developers
> a well-vetted solution with a better security model than what exists
> today. For example,
> we're proposing making it very clear to users whenever there's a hook
> update. This isn't
> something that existing solutions do.

I don't think this materially improves security.  All of these options
have the same security problems, and that's inherent in the solution.
What we're doing here is basically giving people a built-in feature that
is the equivalent of piping curl to bash and blessing it as secure when
it's not.

> I'll also say in general, the Git project is much more likely to get
> security right than smaller
> projects, where oftentimes even popular projects end up unmaintained.

I agree that Git tries to be careful about security.  It is for these
reasons that I think Derrick and I have provided you the feedback we
have here.

> Agreed. We already did a security review internally at Google. The main
> feedback was:
> 
> * We need an explicit opt-in opposed to setting hooks up automatically,
> e.g. a command line flag like --accept-hooks at minimum. This is primarily
> to distinguish people who are just cloning a repository to browse the code
> from people who are developing.
> 
> * The average user doesn't have the ability to review hooks in general
> (security is hard and obscuration is easy), and if the user has
> already opted into
> this feature because they are engaged in development, it's very likely
> that they're
> already running build scripts, so the additional attack vector here doesn't seem
> like a big issue.

I think you've hit the nail on the head here, but drawn a mistaken
conclusion.  The average user doesn't have the ability to review hooks
in general and therefore cannot make an informed decision about whether
to enable them, so the behavior we need to have is not to lead them to
doing things which are risky from a security perspective.

If my goal is to just build a product and not to run its tests, which I
do with a decent number of projects, then I can audit a Go or Rust
project trivially and determine if it executes arbitrary code or not
during the build process and if so, inspect it and gain confidence in
it.  In fact, there are many projects which don't execute build scripts
during the process, and therefore which are completely safe.  This hook
design changes that calculus dramatically.

I also want to point out that people clone repositories for a variety of
reasons.  At GitHub, every team has its own repository with
documentation.  Literally every employee at the company, regardless of
role, interacts with a Git repository, even people who do normally
nontechnical tasks such as our in-house lawyers and our event planners.
Many of these people are nontechnical, and almost none of these
repositories has any software development involvement.  There are also
numerous people elsewhere who may work on projects such as books or
other non-software in repositories who are nontechnical.  Under the
current model, the biggest problem these people face is accidentally
corrupting their local repository and losing data.  With a design that
prompts them to install hooks, they face the possibility of arbitrary
code execution.

The reason I proposed the FAQ we have in our documentation is because I
answer a decent number of questions on Stack Overflow, in addition to
questions that involve users that I get pulled into at work.
Overwhelmingly, the vast majority of users, even developers, are not
completely comfortable with Git and are unsure about how to use it
effectively (cf. https://ohshitgit.com/).  If we propose to a user that
they should do something like enable hooks by adding a prompt, many
users will automatically say "yes" because (a) they don't understand and
they trust that Git is prompting them to do something beneficial and (b)
because they don't know or care and just want to get on with their
lives.  As a result, we're exposing people to giant social engineering
attacks on behalf of potentially unscrupulous repository maintainers.

This is made worse by the fact that we will prompt users even when
cloning a repo that they have no intention of performing development on
means that we will have users who are misled here where otherwise
nothing would happen.

There is a huge problem with social engineering attacks and phishing on
the Internet today and I'm concerned that this is going in exactly the
wrong direction.

I would want to see a comprehensive security analysis feature taking
into consideration social engineering attacks, the skill level and
comfort with Git of the majority of Git users, and the fact that people
clone repositories for many reasons other than software development.
It's easy to look at this from the perspective of the typical employee
at a major tech company and assume that users are generally security
conscious, comfortable with Git, and primarily engaged in software
development on the projects they clone, but I'm not sure any of those
cases are generally true, and anyway there are many counterexamples in
the real world whose use cases we need to take into account.

I continue to have serious reservations about this series and approach,
and I'm not sure that any proposal we can adopt here will address the
security concerns.  To be frank, I don't think this proposal should move
forward in its current state or otherwise, since I think the security
problems are inherent in this approach and fundamentally can't be fixed.

This is, as should be obvious from my email address, my personal
opinion, despite my reference to my employer above.  Unless otherwise
stated, I don't speak for my employer and they don't speak for me.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-04-06 23:15       ` brian m. carlson
@ 2021-04-07  7:53         ` Ævar Arnfjörð Bjarmason
  2021-04-07 13:09           ` Derrick Stolee
  0 siblings, 1 reply; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-07  7:53 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Albert Cui, Derrick Stolee, Albert Cui via GitGitGadget, git


On Wed, Apr 07 2021, brian m. carlson wrote:

> On 2021-04-05 at 22:45:10, Albert Cui wrote:
>> Right, this entire proposal is trying to get to a "Git-blessed" solution,
>> and I should make the need clearer. A few reasons for standardizing
>> this come to mind:
>> 
>> 1. Many existing "standard" solutions. A multitude of existing solutions for
>> this use case speaks to the fact that a basic config script is not sufficient.
>> I mentioned Husky above, but here are a few more; basically each
>> popular programming language environment has a solution for this.
>> 
>> https://github.com/sds/overcommit - Ruby
>> https://github.com/pre-commit/pre-commit - Python
>> https://github.com/Arkweid/lefthook - Go
>> https://github.com/shibapm/Komondor - Swift
>> https://github.com/typicode/husky - Node
>> 
>> These solutions all handle the installation and updating of hooks. A
>> "configure-hooks.sh" script doesn't handle hook updates, unless you go through
>> the trouble yourself of implementing and maintaining that.
>
> I think part of the problem is that an automated process to update hooks
> is generally a security vulnerability, since it means that untrusted
> remote code will automatically run on your computer.
>
> I want to be clear that I understand the desire for this feature, even
> though it's not a feature I would personally use, and the fact that
> there are many approaches means that clearly there are many people that
> do want this functionality.  I have in the past shared hooks with others
> and we have mutually benefitted enormously from that fact.  My concerns
> here are solely about the security aspects of this feature.
>
>> 3. Improving security. As you mentioned, hooks are difficult to get
>> right from a security
>> perspective, and standardizing on a single implementation allows us to
>> give developers
>> a well-vetted solution with a better security model than what exists
>> today. For example,
>> we're proposing making it very clear to users whenever there's a hook
>> update. This isn't
>> something that existing solutions do.
>
> I don't think this materially improves security.  All of these options
> have the same security problems, and that's inherent in the solution.
> What we're doing here is basically giving people a built-in feature that
> is the equivalent of piping curl to bash and blessing it as secure when
> it's not.
>
>> I'll also say in general, the Git project is much more likely to get
>> security right than smaller
>> projects, where oftentimes even popular projects end up unmaintained.
>
> I agree that Git tries to be careful about security.  It is for these
> reasons that I think Derrick and I have provided you the feedback we
> have here.
>
>> Agreed. We already did a security review internally at Google. The main
>> feedback was:
>> 
>> * We need an explicit opt-in opposed to setting hooks up automatically,
>> e.g. a command line flag like --accept-hooks at minimum. This is primarily
>> to distinguish people who are just cloning a repository to browse the code
>> from people who are developing.
>> 
>> * The average user doesn't have the ability to review hooks in general
>> (security is hard and obscuration is easy), and if the user has
>> already opted into
>> this feature because they are engaged in development, it's very likely
>> that they're
>> already running build scripts, so the additional attack vector here doesn't seem
>> like a big issue.
>
> I think you've hit the nail on the head here, but drawn a mistaken
> conclusion.  The average user doesn't have the ability to review hooks
> in general and therefore cannot make an informed decision about whether
> to enable them, so the behavior we need to have is not to lead them to
> doing things which are risky from a security perspective.
>
> If my goal is to just build a product and not to run its tests, which I
> do with a decent number of projects, then I can audit a Go or Rust
> project trivially and determine if it executes arbitrary code or not
> during the build process and if so, inspect it and gain confidence in
> it.  In fact, there are many projects which don't execute build scripts
> during the process, and therefore which are completely safe.  This hook
> design changes that calculus dramatically.
>
> I also want to point out that people clone repositories for a variety of
> reasons.  At GitHub, every team has its own repository with
> documentation.  Literally every employee at the company, regardless of
> role, interacts with a Git repository, even people who do normally
> nontechnical tasks such as our in-house lawyers and our event planners.
> Many of these people are nontechnical, and almost none of these
> repositories has any software development involvement.  There are also
> numerous people elsewhere who may work on projects such as books or
> other non-software in repositories who are nontechnical.  Under the
> current model, the biggest problem these people face is accidentally
> corrupting their local repository and losing data.  With a design that
> prompts them to install hooks, they face the possibility of arbitrary
> code execution.
>
> The reason I proposed the FAQ we have in our documentation is because I
> answer a decent number of questions on Stack Overflow, in addition to
> questions that involve users that I get pulled into at work.
> Overwhelmingly, the vast majority of users, even developers, are not
> completely comfortable with Git and are unsure about how to use it
> effectively (cf. https://ohshitgit.com/).  If we propose to a user that
> they should do something like enable hooks by adding a prompt, many
> users will automatically say "yes" because (a) they don't understand and
> they trust that Git is prompting them to do something beneficial and (b)
> because they don't know or care and just want to get on with their
> lives.  As a result, we're exposing people to giant social engineering
> attacks on behalf of potentially unscrupulous repository maintainers.
>
> This is made worse by the fact that we will prompt users even when
> cloning a repo that they have no intention of performing development on
> means that we will have users who are misled here where otherwise
> nothing would happen.
>
> There is a huge problem with social engineering attacks and phishing on
> the Internet today and I'm concerned that this is going in exactly the
> wrong direction.
>
> I would want to see a comprehensive security analysis feature taking
> into consideration social engineering attacks, the skill level and
> comfort with Git of the majority of Git users, and the fact that people
> clone repositories for many reasons other than software development.
> It's easy to look at this from the perspective of the typical employee
> at a major tech company and assume that users are generally security
> conscious, comfortable with Git, and primarily engaged in software
> development on the projects they clone, but I'm not sure any of those
> cases are generally true, and anyway there are many counterexamples in
> the real world whose use cases we need to take into account.
>
> I continue to have serious reservations about this series and approach,
> and I'm not sure that any proposal we can adopt here will address the
> security concerns.  To be frank, I don't think this proposal should move
> forward in its current state or otherwise, since I think the security
> problems are inherent in this approach and fundamentally can't be fixed.
>
> This is, as should be obvious from my email address, my personal
> opinion, despite my reference to my employer above.  Unless otherwise
> stated, I don't speak for my employer and they don't speak for me.

I agree with pretty much every word you said, in particular the social
engineering aspect of this. In past mails I've referred to elsewhere
I've proposed some Emacs-like "ask" facility for git, but you've
convinced me that that default would be a bad idea for the "user just
clicks yes no matter what" reasons you noted.

Still, I do think there is a way to make this work in a way that's
probably acceptable for everyone:

 * We don't ever ask the user to install hooks, it's something that
   they'd have to know about and pro-actively set up in advance.

 * The security model is entirely focused on not approving changes as
   you "pull" them, but e.g. GPG-verifying the whole chain with some
   pre-setup key.

The use-case (and I've had this use-case in the past) would be something
like a BigCorp which automates its servers/laptops, but would prefer not
to patch/build/ship something like git itself.

So when you "git clone" your corporate repos you get relevant
config/hooks, but not otherwise. We'd of course have a way to discover
that you can set these up & do so after "clone", but it would be
something more like check-mailmap, not something we'd prompt you to do.

I'm personally much more interested in doing something like this for an
in-repo .gitconfig, with us shipping a graduals whitelist of known
config values at differeng safety levels.

That sort of thing really *is* something we could imagine asking the
user about, or even doing by default, e.g. applying a "diff -U<n>"
setting for that repo, picking up non-executable aliases for
non-data-changing git programs etc.

But as you note hooks are really on the extreme other side of that
security curve, which is why in some earlier thread discussing this I
suggested that a much more productive way to start an effort like this
would be the in-repo .gitconfig route. We could start with our N most
safe config variables, and work from there...


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-04-07  7:53         ` Ævar Arnfjörð Bjarmason
@ 2021-04-07 13:09           ` Derrick Stolee
  2021-04-07 18:40             ` Albert Cui
  0 siblings, 1 reply; 26+ messages in thread
From: Derrick Stolee @ 2021-04-07 13:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, brian m. carlson
  Cc: Albert Cui, Albert Cui via GitGitGadget, git

On 4/7/2021 3:53 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Apr 07 2021, brian m. carlson wrote:
>>
>> I continue to have serious reservations about this series and approach,
>> and I'm not sure that any proposal we can adopt here will address the
>> security concerns.  To be frank, I don't think this proposal should move
>> forward in its current state or otherwise, since I think the security
>> problems are inherent in this approach and fundamentally can't be fixed.
>>
>> This is, as should be obvious from my email address, my personal
>> opinion, despite my reference to my employer above.  Unless otherwise
>> stated, I don't speak for my employer and they don't speak for me.
> 
> I agree with pretty much every word you said, in particular the social
> engineering aspect of this. In past mails I've referred to elsewhere
> I've proposed some Emacs-like "ask" facility for git, but you've
> convinced me that that default would be a bad idea for the "user just
> clicks yes no matter what" reasons you noted.

These replies definitely speak from a perspective common to mine.
This is very dangerous territory and should be handled carefully.

There is also a legitimate user need to use hooks _to contribute_
to some repositories. Hooks are not needed to read the repositories
or interact with them as a document.

The current mechanisms require ad-hoc approaches that are custom to
each project, so there would be value in creating a standard inside
the Git client itself. I think the proposal goes too far in making
this an automatic configuration, either because it assumes trust or
assumes sufficient skepticism on behalf of the users. Either is not
acceptable for the Git project.

Here are the hard lines I draw:

1. This should not happen in "git clone" (other than maybe a message
   over stderr that hooks are available to be configured through a
   different command).

2. Hooks should not update in "git checkout" (other than a message
   that hooks have updated).

3. Whatever document triggers a hook configuration should live at
   HEAD and should not be configured or updated until HEAD has been
   updated by one Git command (git clone, git checkout), time
   passes for the user to inspect the worktree, then _another_
   command (git hooks?) is run manually to reconfigure the hooks.

I think there is a potential way forward if these items are followed.

But I'd like to ask a different question: What problems are these
custom hooks solving, and can Git solve those problems in-core?

If we care about checking commits for format or something, is that
a common enough problem that we could implement it in Git itself and
enable it through a Git config option? It might be interesting to
pursue this direction and maybe we'll solve 80% of the need with
extensions like that.

I'm aware of some hooks that insert things like a Gerrit change-id
that would probably not be appropriate for such an in-core change.

There is always the extreme option of requiring users to use a
specific fork of Git in order to work with your repository. That
has its own pains, believe me. But, it does allow for the ultimate
flexibility in how these things are done. Optional config can be
enabled by default. Hooks can be replaced with in-core functionality.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-04-07 13:09           ` Derrick Stolee
@ 2021-04-07 18:40             ` Albert Cui
  2021-04-07 20:02               ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Albert Cui @ 2021-04-07 18:40 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Ævar Arnfjörð Bjarmason, brian m. carlson,
	Albert Cui via GitGitGadget, git

On Wed, Apr 7, 2021 at 6:09 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/7/2021 3:53 AM, Ævar Arnfjörð Bjarmason wrote:
> >
> > On Wed, Apr 07 2021, brian m. carlson wrote:
> >>
> >> I continue to have serious reservations about this series and approach,
> >> and I'm not sure that any proposal we can adopt here will address the
> >> security concerns.  To be frank, I don't think this proposal should move
> >> forward in its current state or otherwise, since I think the security
> >> problems are inherent in this approach and fundamentally can't be fixed.
> >>
> >> This is, as should be obvious from my email address, my personal
> >> opinion, despite my reference to my employer above.  Unless otherwise
> >> stated, I don't speak for my employer and they don't speak for me.
> >
> > I agree with pretty much every word you said, in particular the social
> > engineering aspect of this. In past mails I've referred to elsewhere
> > I've proposed some Emacs-like "ask" facility for git, but you've
> > convinced me that that default would be a bad idea for the "user just
> > clicks yes no matter what" reasons you noted.
>
> These replies definitely speak from a perspective common to mine.
> This is very dangerous territory and should be handled carefully.
>
> There is also a legitimate user need to use hooks _to contribute_
> to some repositories. Hooks are not needed to read the repositories
> or interact with them as a document.
>
> The current mechanisms require ad-hoc approaches that are custom to
> each project, so there would be value in creating a standard inside
> the Git client itself. I think the proposal goes too far in making
> this an automatic configuration, either because it assumes trust or
> assumes sufficient skepticism on behalf of the users. Either is not
> acceptable for the Git project.
>
> Here are the hard lines I draw:
>
> 1. This should not happen in "git clone" (other than maybe a message
>    over stderr that hooks are available to be configured through a
>    different command).
>
> 2. Hooks should not update in "git checkout" (other than a message
>    that hooks have updated).
>

To Ævar's point, maybe it would help to separate the two user bases of
project configured hooks.
(1) Employee working at BigCorp. They are cloning from a trusted
remote on company machines where the company controls what gets
installed and how Git is configured. Their motivation is to make
changes to their local clone and submit changes to a central
repository.
(2) Git user cloning from any remote e.g. GitHub. They could have many
motivations: to make changes, to inspect the code, to simply just
build.

I agree that this feature should not get in the way of users (2), or
expose them to new attack surfaces, users who may have no desire to
have project configured hooks. That said, I think we can still get
into a world that better serves users (1). I proposed this upthread
and would like feedback on it (I realize these examples still assume
one config for every branch, but you get the gist):

Case 1. Opt-into clone setup via config
```
#~/.gitconfig
[hook]
   allowCloneInstallFromRemote = $REMOTE
```

IFF $REMOTE matches the config, then `git clone $REMOTE --setup-hooks` works:

```
$ git clone $remote --setup-hooks
The following hooks were installed from `origin` ($ORIGIN_URL):
pre-push: $GIT_ROOT/pre_push.sh
```

Case 2. Without the config opt-in for clone setup
```
$ git clone $remote # using --setup-hooks here wouldn't change
behavior since there's no config opt-in
Remote `origin` ($ORIGIN_URL) suggests the following hooks:
pre-push: $GIT_ROOT/pre_push.sh

If you wish to install them, run `git hook setup origin`.
To always ignore hooks from `origin`, run `git hook ignore origin`.
```

Case 3. Opting into updates
You could imagine a similar config, e.g. allowAutoUpdateFromRemote
that allows Git to prompt users to consent to auto-updating hooks on
"git checkout" with this type of behavior:

....
$ git checkout

The following hooks were updated from remote `origin` ($ORIGIN_URL):

pre-push: $GIT_ROOT/pre_push.sh

If you wish to install them, run `git hook setup origin`.

# The below only appears if allowAutoUpdateFromRemote is set for $ORIGIN_URL
If you wish to always accept hooks from `origin`, run `git hook setup --always
origin`. You should only do this if you trust code changes from origin.

To always ignore hooks from `origin`, run `git hook ignore origin`.
....

> 3. Whatever document triggers a hook configuration should live at
>    HEAD and should not be configured or updated until HEAD has been
>    updated by one Git command (git clone, git checkout), time
>    passes for the user to inspect the worktree, then _another_
>    command (git hooks?) is run manually to reconfigure the hooks.
>

I want to separate the requirement from the implementation. What I'm
hearing is that "users should have a chance to inspect the suggested
hook before consenting to installing it." That doesn't necessarily
require the configuration to be in HEAD.

Again, that's reasonable for users (2) but doesn't seem necessary for
users (1) if we have the correct opt-ins.

> I think there is a potential way forward if these items are followed.
>
> But I'd like to ask a different question: What problems are these
> custom hooks solving, and can Git solve those problems in-core?
>
> If we care about checking commits for format or something, is that
> a common enough problem that we could implement it in Git itself and
> enable it through a Git config option? It might be interesting to
> pursue this direction and maybe we'll solve 80% of the need with
> extensions like that.
>
> I'm aware of some hooks that insert things like a Gerrit change-id
> that would probably not be appropriate for such an in-core change.
>

A `git lint` command would cover a lot of the use cases, but to your
point, there are others.

Thanks,
Albert

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-04-07 18:40             ` Albert Cui
@ 2021-04-07 20:02               ` Junio C Hamano
  2021-04-07 22:23                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2021-04-07 20:02 UTC (permalink / raw)
  To: Albert Cui
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	brian m. carlson, Albert Cui via GitGitGadget, git

Albert Cui <albertqcui@gmail.com> writes:

>> Here are the hard lines I draw:
>>
>> 1. This should not happen in "git clone" (other than maybe a message
>>    over stderr that hooks are available to be configured through a
>>    different command).
>>
>> 2. Hooks should not update in "git checkout" (other than a message
>>    that hooks have updated).
>>
>
> To Ævar's point, maybe it would help to separate the two user bases of
> project configured hooks.
> (1) Employee working at BigCorp. They are cloning from a trusted
> remote on company machines where the company controls what gets
> installed and how Git is configured. Their motivation is to make
> changes to their local clone and submit changes to a central
> repository.

Hmph.

If the assumption is that their configuration is controlled by
BigCorp when they arrive at their desk, why do you even need any
change to upstream Git, especially with Emily's "configuration file
tells Git what hook scripts to run" in sight?

Wouldn't it be just a matter of the BigCorp installing
/etc/gitconfig on their BigCorpLinux installations?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] hooks: propose project configured hooks
  2021-04-07 20:02               ` Junio C Hamano
@ 2021-04-07 22:23                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-07 22:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Albert Cui, Derrick Stolee, brian m. carlson,
	Albert Cui via GitGitGadget, git


On Wed, Apr 07 2021, Junio C Hamano wrote:

> Albert Cui <albertqcui@gmail.com> writes:
>
>>> Here are the hard lines I draw:
>>>
>>> 1. This should not happen in "git clone" (other than maybe a message
>>>    over stderr that hooks are available to be configured through a
>>>    different command).
>>>
>>> 2. Hooks should not update in "git checkout" (other than a message
>>>    that hooks have updated).
>>>
>>
>> To Ævar's point, maybe it would help to separate the two user bases of
>> project configured hooks.
>> (1) Employee working at BigCorp. They are cloning from a trusted
>> remote on company machines where the company controls what gets
>> installed and how Git is configured. Their motivation is to make
>> changes to their local clone and submit changes to a central
>> repository.
>
> Hmph.
>
> If the assumption is that their configuration is controlled by
> BigCorp when they arrive at their desk, why do you even need any
> change to upstream Git, especially with Emily's "configuration file
> tells Git what hook scripts to run" in sight?

FWIW I've been assuming that this is wanted for BigCorp people who have
devs using vanilla OS's / XCode etc. Having managed laptops with a
managed /etc/gitconfig certainly makes some things easier...

> Wouldn't it be just a matter of the BigCorp installing
> /etc/gitconfig on their BigCorpLinux installations?

Whether you're at BigCorp or not you've got the problem with such a
/etc/gitconfig that it applies to all repos, but someone at BigCorp
might clone both a BigCorp repo (wants custom config, hooks etc.), and
also some random node.js github.com project (should have no such custom
config).

Having a .gitconfig or otherwise making the repo suggest/control the
config/hooks is an elegant way around that issue.

You otherwise need to do something like have config includes apply a
rule to ~/work/, and then hope your users follow your suggestion to
clone repos in the ~/work/ folder.

I think extending config includes to e.g. operate on a glob of the
remote URI was suggested at some point, which would make use-cases like
that easier than they are now, and would be a less invasive change than
what's being discussed here.

But right now we don't have that, so we have a gap there...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] hooks: propose repository owner configured hooks
  2021-04-06  0:35   ` Albert Cui
@ 2021-04-07 22:47     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-07 22:47 UTC (permalink / raw)
  To: Albert Cui; +Cc: Albert Cui via GitGitGadget, git


On Tue, Apr 06 2021, Albert Cui wrote:

> On Fri, Mar 19, 2021 at 3:27 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>
>> On Thu, Mar 18 2021, Albert Cui via GitGitGadget wrote:

Not replying to everything here for lack of time, might need to loop
back to this, please headsup me if there's outstanding things you'd like

>> As noted in the proposal I linked I think anyone thinking about this
>> would do well to examine the trade-off Emacs's implementation of this
>> uses, since it manages to safely open arbitrary files that'll
>> potentially run arbitrary code on-open, but only if the user opts-in.
>>
>
> In [0], I proposed only allowing pre-clone opt-in to suggested hooks
> if allowlisted in the config for the given remote, which seems similar
> to your previous proposal. Extending this idea to any config settings
> seems very reasonable. I'd love other people's thoughts about this.
>
> In your proposal, you wrote:
>
>> It would work really well with includeIf, e.g. I could clone all my work repos to a "safe" area in ~/work which is allowed to set more options, e.g. aliases.
>
> It seems much safer to do this for a given remote, opposed to a local
> file path, no?

Yes, probably. I don't think I thought much about it at the time.

>> > +Later, we might want to do this before the initial clone is performed; that
>> > +workflow looks like:
>> > +
>> > +* During clone, perform ls-refs as normal
>> > +
>> > +* If the server has a "magic" config branch, fetch only that config branch.
>>
>> ...the reason for the magic branch is this before-clone use-case?
>>
>> Unless there's a really strong use-case for that I'd think the
>> per-branch .gitconfig would be a better trade-off, but even then there
>> would be ways to make that work obviously in the many-many case, and
>> still e.g. have a branch advertise a config blob for its "main" branch
>> as a special case or something.
>>
>> I also suspect an unstated constraint of having this in a magic branch
>> is the limitation of some git hosting provider's ACL
>> implementations. More on that later...
>>
>
> Mentioned in [0], the primary motivation for a magic config branch
> that lived outside of the worktree was "since the configuration is
> separate from the code base, it allows you to go back in history or to
> older branches while preserving "improvements" to the hooks
> configuration e.g. maybe the project has shifted to using a newer
> version of a linter."

It also means that your hooks will forever need to be aware of the union
of all revisions in the project to work properly, or more likely they'll
simply break on older revisions as e.g. a hook that needs to handle a
test directory handles just "t", but it used to be called "tests".

It's also just un-git-y in *requiring* a remote. A .mailmap,
.gitattributes etc. all work with a repo you find on-disk, why does
config & hooks need to be different?

How would a user of such a repo suggest changes to a hook? Now it's
fairly easy for e.g. .gitattributes, you change it, push a branch, ask
for it to be merged etc.

If you want the same hook for all revisions ever having some light logic
in the hook itself to check/cache that (it's executing arbitrary code
after all) seems like a saner thing for those who have this "magic
branch" use-case than to make it the default.


> [...]
>> > +Future Work
>> > +~~~~~~~~~~~
>> > +
>> > +* Extending this to allow repository owners to specify specific configurations
>> > +in general e.g. this repository should use partial-clone with these parameters.
>>
>> I don't see why such a proposal should narrowly confine itself to hooks.
>>
>> Once we have config-based hooks then hooks are just configuration.
>>
>> If we're going to pick up arbitrary configuration from remotes/repos on
>> request then we'd be starting with the most dangerous configuration.
>>
>
> Summarizing from [0]: Based on this feedback, I'm hearing "we should
> have a design for suggested configuration in general," but I don't
> think that necessitates that we should pursue generic configuration
> before hooks configuration.

It's the principle that if we think you might stab your eye
accidentally, maybe we should start with a plastic fork instead of going
directly to a lightsaber :)

I.e. hooks are a *lot* more dangerous to get wrong. I think this sort of
proposal would be better received if it started with a small whitelist
of our least dangerous (but still useful to set in-repo .gitconfig)
config variables. Then the worst thing that'll happen is you'll confuse
git-diff a bit, not an RCE.

>> I think a better way to start such an effort incrementally would be:
>>
>> 1. Audit git's config parser for the safety of parsing arbitrary config,
>>    we parse .gitmodules now, do we feel it's safe enough to parse
>>    arbitrary config (probably, but worth checking).
>>
>> 2. Add reflection to git's own config variables. Right now we have this
>>    in the binary generated via a grep from the documentation. Similar to
>>    Emacs's implementation we should/could categorize all our known
>>    config variables by safety.
>>
>
> To clarify, are you saying, today, git's config variables are pulled
> from the documentation? I.e. the documentation is the source of truth
> for what config variables are supported? o.0

Yes, see generate-configlist.sh.

But we don't hard round-trip it, and we couldn't have an exhaustive list
anyway due to things like userdiff, sendemail.identity etc.

But I'm mentioning it because the next logical step from there is to for
some/most pre-declare them, and at that point it could be some struct
that lists the danger level of each one, and from there whitelisting
them for in-repo .gitconfig would be a logical next step...

>>    Hooks are the least safe, something like a diff.context=N setting the
>>    repo wants to suggest a -U<n> setting much safer (just tweaking our
>>    existing diff algorithm) etc. But even in those cases we'd want to
>>    proceed slowly, e.g. is that config parsing for that -U<n> defensive
>>    enough to be safe for arbitrary data?
>>
>
> To clarify, this proposal is just to set the hooks config that
> config-based hooks enabled e.g. running `git config --add
> hook.pre-commit.command pylint` on behalf of the user, so I'm not sure
> what "arbitrary data" you're referring to. At least, any problems I'd
> think we'd already address with config-based hooks.

As noted by brian m. carlson etc. in the side-thread in
<YGzrfaSC4xd75j2U@camp.crustytoothpaste.net> the danger is that by
making this a supported feature git becomes the social-engineering
vector to fool users into trusting a command like that which they
otherwise might not have.

>> 3. Users should be able to e.g. configure "yes, for any repo I clone
>>    they can tweak 'safe'" variables. This would reduce user dialog
>>    fatigue, and thus increase safety. I.e. a repo who wants to set a
>>    thing like hooks would stand out, but something that e.g. wants to
>>    change the diff order would pass on existing global configuration.
>>
>> 4. Smarter ACL that magic remote+magic branch: It seems like an obvious
>>    thing to me to want that if I clone e.g. a random clone of git.git
>>    that I'd want to setup config for it IFF the .gitconfig in it is
>>    reachable from a tag GPG signed by <approved key>.
>>
>>    Git's a distributed system, so while I don't think it's bad to have
>>    some feature like "I always trust config from this remote" (e.g. a
>>    corporate environment where you know its .gitconfig is
>>    guarded/audited) we should think about more distributed-friendly
>>    solutions first and if possible guide users towards those.
>
> This seems like an OK alternative to allow-listing based on remote,
> but are you expecting users to add a GPG key to their .gitconfig?

That instead of saying you trust https://github.com/git/git your primary
means of interaction with this feature would be saying you, as an
example, trust Junio's GPG key.

And if you clone a git.git from *anywhere* and have already opted-in to
whatever scary config we'd expose for tihs we'll trust and setup the
hooks, if and only if they're identical to hooks we can find that were
last signed by the trusted GPG key.

> Remote URLs seem much more user friendly (think IP address vs URL).

Git is content-addressable, something that caters directly to that is
more consistent and user friendly than something that makes URLs magical
as a model for trust.

If I want a trusted kernel I don't really care what URL I clone
linux.git from, I care that I can resolve the content to something Linus
signed.

So if I trust Junio's key and would like this to Just Work it would be
better if I clone a mirror of git.git that it works, and I don't have to
maintain a list of all mirrors myself. Trusting based on content and GPG
keys gives you that, magic URLs don't.

It seems to me that in the BigCorp use-case this would also be
useful. Presumably you just want blessed reviewed hooks to execute
arbitrary code on your laptop, but if your co-worker forks a repo and
has control of your magic branch you don't want to setup the potential
harmful RCE. They might have malicious hooks.

But that's also annoying because in your BigCorp repo you'll presumably
want the hooks to check/validate/whatever. Just because you cloned a
fork you'd still like that.

If you trust (signed) content instead of URLs you can have your cake and
eat it too. As long as your co-worker didn't modify the hook config in
his fork it'll Just Work.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-04-07 22:47 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-18 22:03 [PATCH] hooks: propose repository owner configured hooks Albert Cui via GitGitGadget
2021-03-18 22:29 ` Junio C Hamano
2021-03-18 23:45   ` Albert Cui
2021-03-19  1:28 ` brian m. carlson
2021-03-19 10:27 ` Ævar Arnfjörð Bjarmason
2021-04-06  0:35   ` Albert Cui
2021-04-07 22:47     ` Ævar Arnfjörð Bjarmason
2021-03-26  1:43 ` [PATCH v2] hooks: propose project " Albert Cui via GitGitGadget
2021-03-29 23:20   ` Emily Shaffer
2021-04-01 20:02     ` Albert Cui
2021-03-30 15:24   ` Derrick Stolee
2021-04-05 22:45     ` Albert Cui
2021-04-05 23:09       ` Junio C Hamano
2021-04-05 23:40         ` Albert Cui
2021-04-06  0:13           ` Junio C Hamano
2021-04-06  0:27             ` Albert Cui
2021-04-06 23:15       ` brian m. carlson
2021-04-07  7:53         ` Ævar Arnfjörð Bjarmason
2021-04-07 13:09           ` Derrick Stolee
2021-04-07 18:40             ` Albert Cui
2021-04-07 20:02               ` Junio C Hamano
2021-04-07 22:23                 ` Ævar Arnfjörð Bjarmason
2021-04-02  9:59   ` Ævar Arnfjörð Bjarmason
2021-04-05 23:42     ` Albert Cui
2021-04-02 10:30   ` Ævar Arnfjörð Bjarmason
2021-04-03  0:58     ` Albert Cui

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git