git@vger.kernel.org mailing list mirror (one of many)
From: Emily Shaffer <emilyshaffer@google.com>
To: Albert Cui via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, "Albert Cui" <albertqcui@gmail.com>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v2] hooks: propose project configured hooks
Date: Mon, 29 Mar 2021 16:20:35 -0700
Message-ID: <YGJgw5QPKFyv4HSG@google.com>
In-Reply-To: <pull.908.v2.git.1616723016659.gitgitgadget@gmail.com>

On Fri, Mar 26, 2021 at 01:43:36AM +0000, Albert Cui via GitGitGadget wrote:

> Change-Id: I5f6747524b97c51dfe5fa28e48ea03981b2da5b8
Oops :)

I avoid this by setting gerrit.createChangeId = false in my global
config and adding an alias:
  alias.gerrit-commit = "-c gerrit.createChangeId=true commit"
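
For anyone wanting to try that setup, both settings can be written with
`git config`. This is a minimal sketch: it points HOME at a scratch
directory so it can be run without touching a real global config (drop
those lines to apply it for real). `gerrit.createChangeId` is the
setting that Gerrit's commit-msg hook checks.

```shell
# Scratch HOME so the real ~/.gitconfig is untouched (demo assumption).
HOME=$(mktemp -d)
export HOME
# Make sure --global resolves to $HOME/.gitconfig in this demo.
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

# Off by default...
git config --global gerrit.createChangeId false
# ...and switched back on only when committing through the alias.
git config --global alias.gerrit-commit '-c gerrit.createChangeId=true commit'

# Show what was stored.
git config --global --get gerrit.createChangeId
git config --global --get alias.gerrit-commit
```

After this, `git gerrit-commit` adds a Change-Id (when the Gerrit hook is
installed) and plain `git commit` does not.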

> +Server-side vs Local Checks
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +* Helps developers catch issues earlier: typically developers need to push to
> +the remote to trigger server-side checks. Local hooks can be run anytime the
> +developer wants. This is especially useful if the project has slow
> +server-checks; catching issues locally can save the developer a lot of time
> +waiting for CI. They are also useful for locally reproducing an issue identified
> +in CI, helping resolve issues faster.

Big +1 to this - I hate having to wait for a push and CI build, possibly
queued behind someone else's work or an earlier mistaken push, to check
whether my stuff is right. :)

> +In the ideal world, developers and project maintainers use both local and server
> +side checks in their workflow. However, for many smaller projects, this may not
> +be possible: CI may be too expensive to run or configure. The number of local
> +solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
> +Bringing this natively to Git can give all these developers a well-supported,
            ^~~~
This is a little vague here. It sounds like you might be suggesting that
we standardize server-side CI config in Git-controlled projects.
> +secure implementation opposed to the fragmentation we see today.

The point about solution fragmentation is a strong one and I wonder
whether it's being emphasized enough. There is obviously a need, or else
people wouldn't keep writing all these things in the Prior Art section
:)

> +Security Considerations and Design Principles
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
[snip]
> +  ** Since developers will likely build their local clone in their development
> +  process, at some point, arbitrary code from the repository will be executed.
> +  In this sense, hooks _with user consent_ do not introduce a new attack surface.

It might be worth saying that we want configuring project-configured
hooks to be approximately as easy/automatic as building (that is, the
user still has to explicitly run a build, and isn't prompted at the end
of their clone whether they want to build it right away).
> +
> +* Give users visibility: Git must allow users to make informed decisions. This
> +means surfacing essential information to the user in a visible manner e.g. what
> +remotes the hooks are coming from, whether the hooks have changed in the latest
> +checkout.
   ^~~~~~~~
Better say "fetch", if we are proposing this magic branch thing.

> +* This configuration should only apply if it was received over HTTPS

Meaning, non-HTTPS fetches should just not update this special branch?
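
If that is the intent, the check could be as simple as looking at the
URL scheme before touching the special branch. This is a sketch only:
`accepts_recommended_config` is a made-up helper, and a real
implementation would presumably live in the transport code rather than
inspect the URL string.

```shell
# accepts_recommended_config <url>: hypothetical policy check that
# succeeds only for https:// URLs.
accepts_recommended_config() {
    case "$1" in
        https://*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Example URLs are placeholders.
for url in https://example.com/repo.git git://example.com/repo.git \
           ssh://git@example.com/repo.git http://example.com/repo.git; do
    if accepts_recommended_config "$url"; then
        echo "update config branch: $url"
    else
        echo "skip config branch:   $url"
    fi
done
```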

> +* A setup command for users to set up hooks
AIUI, this is proposed to be part of `git hook`, right?

I don't think it needs to be part of this doc but it'd be nice to also
support installing just a subset, like:

  git hook setup pre-commit
  git hook setup --interactive

> +* Users must explicitly approve hooks at least once
> +
> +    ** Running the setup command should count as approval, including if the user
> +    consented during the clone
> +
> +    ** When a hook command changes, a user should re-approve execution (note:
> +    implementation should not interfere with requirement listed in “Fast
> +    Follows")
> +
> +* Automation is able to continue to use clone and other commands
> +non-interactively

One interesting point - by using an advice message instead of an
interactive prompt at clone time, we get this for free.

> +Fast Follows
> +^^^^^^^^^^^^
> +
> +* When prompted to execute a hook, users can specify always or never, even if
> +the hook updates

I think we want to base this on the remote URL, right? I know we talked
a little offline about how to mitigate against a malicious maintainer
(for example this whole mess with The Great Suspender) and I'm not sure
what solution there might be.

I wonder if it's worth it to notify users that their always-okayed hooks
were updated during fetch?

> +
> +Nice to Haves
> +^^^^^^^^^^^^^
> +
> +* A method to skip hook execution i.e. `--no-verify` works everywhere

This part I'd like to discuss more on-list - I think it would need to
happen as an argument to git.c (e.g. git --no-verify commit blah), or
else we'd have the problems we have with --no-verify today. But is that
too ugly? I think everything else (even teaching parse-options to grab
--no-verify regardless, which, ick) would still be prone to issues,
since not everybody uses parse-options and not every subcommand
implementor knows their subcommand will invoke a hook. (For example, the
nice surprise when rebase started using a different strategy and
invoking the post-commit hook way more often; that's off the top of my
head, so details may not be correct.)
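
For comparison, a top-level switch can already be approximated today
with `-c core.hooksPath=...`, which git.c parses before the subcommand
runs. This is a rough sketch, assuming a throwaway repository and a
deliberately failing `pre-commit` hook; pointing `core.hooksPath` at
`/dev/null` is a workaround, not the proposed `git --no-verify`.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A pre-commit hook that always rejects the commit.
mkdir -p .git/hooks
printf '#!/bin/sh\nexit 1\n' > .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

echo hello > file.txt
git add file.txt

# Normal commit: rejected by the hook.
if git commit -q -m "first try"; then blocked=no; else blocked=yes; fi

# Same subcommand with hooks disabled at the top level; the subcommand
# itself needs no --no-verify support for this to work.
git -c core.hooksPath=/dev/null commit -q -m "second try"
skipped=$?

echo "blocked=$blocked skipped_exit=$skipped"
```

Because the option is handled before the subcommand is dispatched, it
applies to any hook the subcommand ends up invoking, which is the
property a git.c-level `--no-verify` would be after.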

> +* Support a “warnings only mode” where hooks run but don’t block commands from
> +executing

Same as --no-verify. I wonder whether it's "good enough" to do these two
as configs? hook.skip-all=true, hook.ignore-result=true?
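
To make the config idea concrete, here is a sketch of how a hook runner
might honor the two settings. `hook.skip-all` and `hook.ignore-result`
are just the names floated above, not existing Git configs, and
`run_hook` is a hypothetical helper standing in for whatever Git would do
internally.

```shell
# run_hook <name>: hypothetical wrapper that runs .git/hooks/<name>,
# subject to the two proposed (not existing) config knobs.
run_hook() {
    hook="$(git rev-parse --git-dir)/hooks/$1"
    # hook.skip-all: do not run hooks at all.
    if [ "$(git config --bool --get hook.skip-all)" = "true" ]; then
        return 0
    fi
    # No executable hook installed: nothing to do.
    [ -x "$hook" ] || return 0
    "$hook"
    status=$?
    # hook.ignore-result: the hook ran, but a failure only warns.
    if [ "$status" -ne 0 ] &&
       [ "$(git config --bool --get hook.ignore-result)" = "true" ]; then
        echo "warning: $1 hook exited with $status (ignored)" >&2
        return 0
    fi
    return "$status"
}

# Throwaway repo with a hook that always fails.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir -p .git/hooks
printf '#!/bin/sh\nexit 1\n' > .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

run_hook pre-commit; plain=$?
git config hook.ignore-result true
run_hook pre-commit; warn_only=$?
git config hook.skip-all true
run_hook pre-commit; skipped=$?
echo "plain=$plain warn_only=$warn_only skipped=$skipped"
```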

> +Implementation Exploration: Check "magic" branch for configs at fetch time
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Example User Experience
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +===== Case 1: Consent through clone
> +
> +....
> +$ git clone --setup-hooks
> +...
> +
> +The following hooks were installed from remote `origin` ($ORIGIN_URL):
> +
> +pre-commit: git-secrets --pre_commit_hook
> +pre-push:  $GIT_ROOT/pre_push.sh

Hm, I thought we wanted to consider storing the hook body in the magic
branch as well? To avoid changing hook implementation during bisect, for
example?

> +....
> +
> +===== Case 2: Prompting after clone
> +....
> +$ git clone
> +...
> +
> +Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:
> +
> +pre-commit: git-secrets --pre_commit_hook
> +pre-push:  $GIT_ROOT/pre_push.sh
> +
> +# instead of prompting, we could give users commands to run instead
> +# see case 3

Yep, I think this is a better idea - I glued together the two UXen below
:)

> +
> +Do you wish to install them?
> +1. Yes (this time)
> +2. Yes (always from origin)
> +3. No (not this time)
> +4. No (never)
> +....

When we discussed this offline, it seemed like users will just smash 2
("whatever gets you to stop bothering me") regardless of whether the
hooks are actually coming from a source the user trusts. So I would
prefer something like:

  $ git clone
  ....
  Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:

  pre-commit: git-secrets --pre_commit_hook
  pre-push:  $GIT_ROOT/pre_push.sh

  If you wish to install them, run `git hook setup origin`.

> +===== Case 3: Re-prompting when hooks change
> +....
> +$ git pull
> +
> +The following hooks were updated from remote `origin` ($ORIGIN_URL):
> +
> +pre-push:  $GIT_ROOT/pre_push.sh
> +
> +If you wish to install them, run `git hook setup origin`.
> +
> +If you wish to always accept hooks from `origin`, run `git hook setup --always
> +origin`. You should only do this if you trust code changes from origin.
> +
> +To always ignore hooks from `origin`, run `git hook ignore origin`.
> +....
> +
> +===== Case 4: Nudging when hooks weren't installed
> +....
> +$ git commit
> +advice: The repository owner has recommended a 'pre-commit' hook that was not run.
> +To view it, run `git show origin/refs/recommended-config:some-pre-commit`. To install it, run `git hook setup origin pre-commit`
> +
> +Turn off this advice by setting config variable advice.missingHook to false."
> +....

(Full disclosure: this was my idea.)
I realize that some folks upstream may find this too chatty for
general use. I'm hoping being able to shut off the advice globally might
be enough of a mitigation; maybe we can gate it behind an experimental
config or something if folks aren't so sure?

> +Implementation Sketch
> +^^^^^^^^^^^^^^^^^^^^^
> +
> +* Perform fetch as normal
> +
> +* After fetch is complete, Git checks for a "magic" config branch (e.g.
> ++origin/refs/recommended-config+) which contains information about config lines
> +an end-user may want (including hooks).
> +
> +* As part of the fetch subcommand, Git prompts users to install the configs
> +contained there.

Like I mentioned above, I think we probably want to drop the entire
interactive installer wizard concept...

> +    ** User responses to that prompt could be "sticky" - e.g. a user could reply
> +    "no (this time)", "no (never)", "yes (this time)", or "yes (always)".
> +    Always/never indicate that the user trusts the remote this config is coming
> +    from, and should not apply to configs fetched from other remotes.

...which also means that we can drop trying to express this briefly and
instead say something wordy in a flag to `git hook setup` (or whatever
we call it).

> +Later, we might want to do this before the initial clone is performed; that
> +workflow looks like:
> +
> +* During clone, perform ls-refs as normal
> +
> +* If the server has a "magic" config branch, fetch only that config branch.
> +
> +* Prompt users as described above.
> +
> +* Perform the rest of the clone.

This part I'm still interested in, although I'm not sure how to
reconcile not wanting an interactive prompt with wanting an early step
like this during clone. Maybe that's what this `git clone --setup-hooks`
(or maybe, `git clone --with-recommended-configs`) is for?

> +Pros
> +^^^^
> +
> +* Repository owners have a method for providing recommended config for
> +contributors.
> +
> +* Installation flow happens without additional user intervention.

I think when we wrote this bullet point it was to express "the user
doesn't have to run something else to discover these hooks exist". But I
don't think "without additional user intervention" fully describes
what's proposed here, either. Hrm.

> +
> +* Keeping config branch and history separate from code branch and history means
> +it is versioned, but not tied to user's checkout.

Probably worth discussing/including that we intend hook contents to also
live in the config branch, to make sure we're running the same hook
regardless of checkout/bisect state/inspection/have been working on a
feature for 6 months and have been fetching but not rebasing/etc. I'm
not sure I see that explicitly called out here...

Actually, I found the following (pasting from much earlier in the doc):

  +    ** This could be a path to a script/binary within the repository
  +
  +    ** This could be a path to a script/binary contained within
  submodules of
  +    the repository
  +
  +    ** This could be a user installed command or script/binary that
  exists
  +    outside of the repository and is present in `$PATH`

Maybe this part needs to be modified to explicitly refer to the hook
executable being tracked in the magic branch?
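
To illustrate why keeping the executable in the magic branch helps: a
ref outside `refs/heads/` can carry the hook script, and it can be read
with `git show` no matter what is checked out. This is a sketch using
plumbing in a throwaway repo; the ref name `refs/recommended-config` and
the file name `pre-commit` are assumptions loosely based on the example
in the doc, and the hook body is a placeholder.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Ordinary project history on the checked-out branch.
echo code > main.c
git add main.c
git commit -q -m "project code"

# Build a separate, parentless commit that holds only the hook script.
blob=$(printf '#!/bin/sh\necho "running recommended pre-commit"\n' |
       git hash-object -w --stdin)
tree=$(printf '100755 blob %s\tpre-commit\n' "$blob" | git mktree)
commit=$(git commit-tree "$tree" -m "recommended hooks")
git update-ref refs/recommended-config "$commit"

# The hook body is readable regardless of checkout or bisect state;
# it never appears in the working tree.
git show refs/recommended-config:pre-commit
```

Since the script is resolved through the ref rather than the working
tree, checking out an old commit or bisecting does not change which
version of the hook would run.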

> +Cons
> +^^^^
[snip]
> +* Turning a "set and forget" command like clone into an interactive session with
> +the user is not ideal; care must be taken to avoid breaking bots.

If we notify and nag, but don't interactively prompt, then we get happy
bots for free ;)

> +
> +* Inflating configs and executables from a remote tracking branch which is never
> +checked out could be slow.

I wonder about this. This seems to me like something that might be
drastically slower or faster depending on platform. Hmmm.

> +Future Work
> +~~~~~~~~~~~
> +
> +* Extending this to allow repository owners to specify specific configurations
> +in general e.g. this repository should use partial-clone with these parameters.

Offline I think there was a little discussion with Stolee about whether
it made more sense to *only* approach this specific problem with this
document, as the hooks are also config, and so they could come later.
But I think if we want to store the executable in the magic branch (and
I do... since I keep bringing it up :) ) then it doesn't make sense to
say "build it for config and everything else will follow".

> +* Extending this to support submodules: We want to make sure this works in a way
> +that's easy to adapt to submodules, who would likely need to run the same hooks
> +as the superproject; for example, submodules could inherit the superproject
> +config.

I'm hoping to send an RFC patch introducing such an inherited
superproject config ... very soon. I hope. So there wasn't much detail
provided here, intentionally.

> +* Sandboxing hook execution to provide higher levels of security.

I think this says: "Can we run a user hook in a container that only has
access to the repo in question?"

It sounds like a complicated answer. I could see legitimate reasons to
want wider access than just the container - for example, some
hook-specific configuration that doesn't fit the Git config format, or
even something like updating a stats file to keep a record of how many
commits I made/pushed/whatever every day, stored in a central location
for reference at performance review time :) But I also don't know alllll
that much about containerization - I think there are ways to hand over
access to other needed files like this, right?

But then, I also feel yucky thinking about Debian telling me that my Git
install also needs me to install Docker... :)

Worth thinking about and discussing at a later date, I'd guess.

> +[[prior-art]]
> +Prior Art
> +~~~~~~~~~

I wonder whether it's useful to mention (in mails, I guess, not in
the checked-in doc) why these are bad - do they duplicate work between
each other? Are they engaging in bad practices when interfacing with
Git? etc.?

It would be a lot of work to collect, so maybe it's not that useful.


Thanks for writing up v2 / mailing it, Albert.

 - Emily
