Subject: Re: [PATCH v2] hooks: propose project configured hooks
From: Derrick Stolee
To: Albert Cui via GitGitGadget, git@vger.kernel.org
Cc: Albert Cui, "brian m. carlson", Ævar Arnfjörð Bjarmason
Date: Tue, 30 Mar 2021 11:24:29 -0400

On 3/25/2021 9:43 PM, Albert Cui via GitGitGadget wrote:
> From: Albert Cui
>
> Hooks today are configured at the repository level, making it difficult to
> share hooks across repositories.
> Configuration-based hook management, by
> moving hooks configuration to the config, makes this much easier. However,
> there is still no good way for project maintainers to encourage or enforce
> adoption of specific hook commands on specific hook events in a repository.
> As such, there are many tools that provide this functionality on top of Git.
>
> This patch documents the requirements we propose for this feature as well as
> a design sketch for implementation.

Sorry for being so late in reviewing this.

My first reaction is that this feature is suggesting multiple security
vulnerabilities as core functionality. It also seems to be tied to
niche projects (in number of projects, not necessarily the size of
those projects).

It was recommended to me in conversation that I think of this as a way
to take existing ad-hoc behavior and standardize it with a
"Git-blessed" solution. I'm not sure this proposal makes a strong
enough case for why having a "configure-hooks.sh" script in the base
of the repo is not enough. It simultaneously does not use existing
precedents like .gitattributes or .gitignore as direction in using the
worktree at HEAD as a mechanism for communicating details. I find
using a separate ref for hooks to be a non-starter, and I think the
design should be rebuilt from scratch.

I also expect that a significant portion of users will see a message
like "this repository needs hooks" and will just say "yes" to get rid
of the prompt. There needs to be sufficient opportunity for users to
inspect the hook configuration, to keep frustrated or distracted users
from doing the wrong thing.

Server-side checks should always exist, so users who don't follow the
project's guidelines using the recommended hooks will be blocked. The
important thing is that there is an easy way for willing participants
to install the correct hooks. This doesn't mean we should make it
almost automatic.
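
To make the "script in the base of the repo" alternative concrete, here
is a minimal sketch in Python of what such a script might do. It is an
illustration under stated assumptions, not part of the proposal: the
`.githooks` directory name and the function names are hypothetical. The
only Git feature it relies on is the existing `core.hooksPath` config
setting. Listing and installing are kept as separate steps so that
nothing is enabled without an intentional action.

```python
import subprocess
from pathlib import Path

# Hypothetical in-tree directory holding the recommended hooks.
HOOKS_DIR = ".githooks"


def list_recommended_hooks(repo_root):
    """Return sorted (hook name, path) pairs found in the hooks directory."""
    hooks_dir = Path(repo_root) / HOOKS_DIR
    if not hooks_dir.is_dir():
        return []
    return sorted((p.name, p) for p in hooks_dir.iterdir() if p.is_file())


def describe_hooks(repo_root):
    """Build a message listing the hooks so the user can inspect them first."""
    hooks = list_recommended_hooks(repo_root)
    if not hooks:
        return "This repository does not recommend any hooks."
    lines = ["This repository recommends the following hooks:"]
    lines += [f"  {name}: {path}" for name, path in hooks]
    lines.append("Nothing is installed until you opt in explicitly.")
    return "\n".join(lines)


def install_hooks(repo_root):
    """Point Git at the in-tree hooks directory; call only on explicit request."""
    subprocess.run(
        ["git", "-C", str(repo_root), "config", "core.hooksPath", HOOKS_DIR],
        check=True,
    )
```

Calling `print(describe_hooks("."))` only reports what is in the
worktree; `install_hooks(".")` is the separate, deliberate step.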

Also, please proactively pursue a security review of the feature,
including non-technical risks such as social engineering, forks, or
other possible attacks. This idea seems so risky that I would be
against accepting it unless a security expert has done a thorough
review.

> +We propose adding native Git functionality to allow project maintainers to
> +specify hooks that a user ought to install and utilize in their development
> +workflows.

I think providing a way for repository owners to _recommend_ how
cloners should interact with the repository is a good idea. I think
starting with hooks is perhaps a significant jump to the most
complicated version of that idea.

As you think of this design, it might be good to think about how some
recommended Git config (within an allow-list) might fit into this
system as well. I would have started there, with things like "Use
partial clone" or "use sparse-checkout". Those are really things that
need to happen at clone time; they can't really happen after the fact,
which helps justify a modification to 'git clone'.

The hook configuration doesn't _need_ to happen during 'git clone'.
More on this timing later.

The .gitattributes file is the closest analogue I could find in
current functionality, but it operates on a path-based scope, not
repository scope.

> +Server-side vs Local Checks
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
> +In the ideal world, developers and project maintainers use both local and server
> +side checks in their workflow. However, for many smaller projects, this may not
> +be possible: CI may be too expensive to run or configure. The number of local
> +solutions to this use case speaks to this need (see <>).
> +Bringing this natively to Git can give all these developers a well-supported,
> +secure implementation opposed to the fragmentation we see today.

I'm not sure this is a good selling point for small projects. If they
are small, then the CI to verify commits is cheap(er).

Local hooks should never be used as a replacement for server-side
checks. A user could always use a repository without the local hooks
and push commits that have not been vetted locally.

The extreme example is to have a commit hook that compiles the code
and runs all the tests. Would you then remove all CI builds?

Making it easier to adopt local hooks can avoid some pain points when
users are blocked by the server-side checks.

> +Server-side vs Local Checks
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
> +User Goals / Critical User Journeys
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...

I appreciate the motivation in this document. However, the motivation
doesn't really justify why this should be baked into Git itself, since
a "configure-repo" script in the base of the repo would suffice to
achieve that functionality. The reason to put this in Git is to
standardize this process so it is not different in each repository. It
might be good to spend time justifying that angle.

> +Security Considerations and Design Principles
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +We must balance the desire to make hooks setup easy for developers --- allowing
> +them to get hooks set up with low friction, and hence increasing the probability
> +of them adopting these hooks --- with protecting users from the security risks
> +of arbitrary code execution on their hosts.
> +
> +To inform the design, we propose these design principles:
> +
> +* User consent: Users must explicitly agree to hooks usage; no hooks should
> +execute without consent, and users should re-consent if hooks update. Users can
> +opt-out of hooks.
> +
> +* Trust comes from the central repository:
> + ** Most users don't have the time or expertise to properly audit every hook
> + and what it does. There must be trust between the user and the remote that the
> + code came from, and the Git project should ensure trust to the degree it can
> + e.g. enforce HTTPS for its integrity guarantees.
> +
> + ** Since developers will likely build their local clone in their development
> + process, at some point, arbitrary code from the repository will be executed.
> + In this sense, hooks _with user consent_ do not introduce a new attack surface.

It is critical that users are asked for this consent at the correct
times. For instance, I believe configuring local hooks should only be
done _after_ "git clone" completes. That allows a user to inspect the
worktree to their heart's content instead of in the middle of an
interactive shell session or something. (The "git clone" command could
output a message to stderr saying "This repository recommends
configuring local hooks. Run 'git ' to inspect the hooks and configure
them.")

We've had enough code-execution bugs with "git clone" that I want to
completely avoid that possibility here.

> +* Give users visibility: Git must allow users to make informed decisions. This
> +means surfacing essential information to the user in a visible manner e.g. what
> +remotes the hooks are coming from, whether the hooks have changed in the latest
> +checkout.

As a user moves HEAD, we should similarly avoid updating the hooks
automatically, but instead present a message to the user to update
their hooks using an intentional command.

> + ** This could be a path to a script/binary within the repository

Binaries will be tricky if you want users of multiple platforms to
interact with your repository. And scripts can be slower than
binaries. How could someone build hooks from source using your
workflow? Perhaps users are expected to locally compile the code
before configuring the hooks?

> + ** This could be a path to a script/binary contained within submodules of
> + the repository

This gives me significant chills. Proceed with caution here.

I understand the reason to want this feature: you could have a suite
of repositories using a common hook set that lives in each as a
submodule.
I just want to point out that this adds yet another dimension for
attack.

> + ** This could be a user installed command or script/binary that exists
> + outside of the repository and is present in `$PATH`

Like `rm -rf ~/*`? I'm trying to think of dangerous things to do
without elevation.

It could help to clarify the intended user pattern here: "This
repository requires that you install tool X." That requirement is
unlikely to be satisfied at clone time, so users will be in a broken
state if they don't run some extra steps. How will that be
communicated?

Requirements like these make me think that these repositories would be
better off with a script that configures the hooks after checking if
these things actually exist on the PATH (and installs them if not).

I would lower the priority of this one for now.

> +* This configuration should only apply if it was received over HTTPS

Avoiding http:// and git:// makes sense. Why not SSH?

> +* A setup command for users to set up hooks
> +
> + ** Hook setup could happen at clone time assuming the user has consented
> + e.g. if `--setup-hooks` is passed to `git clone`

This is not enough consent.

> +* Users must explicitly approve hooks at least once
> +
> + ** Running the setup command should count as approval, including if the user
> + consented during the clone
> +
> + ** When a hook command changes, a user should re-approve execution (note:
> + implementation should not interfere with requirement listed in “Fast
> + Follows")

Users should explicitly approve hooks any time they would change. They
should also be able to explore the source of the change using whatever
editors and tools they want, so the worktree should change to its new
state without new hooks, _then_ the user could consider updating hooks
based on that new state.

> +Fast Follows
> +^^^^^^^^^^^^
> +
> +* When prompted to execute a hook, users can specify always or never, even if
> +the hook updates

I don't understand what this means.

"when prompted to execute a hook": are you saying that the user will
get a message saying "Git will now run the pre-commit hook, are you ok
with that?"

"even if the hook updates": I've made my stance clear that the user
should be in complete control of when the hooks update.

> +Out of Scope
> +^^^^^^^^^^^^
> +
> +* Ensuring the user has installed software that isn't distributed with the repo

If you are going to allow hooks to run something on the PATH, then Git
should probably check that such an executable exists before setting
the config and causing problems.

> +Implementation Exploration: Check "magic" branch for configs at fetch time
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Example User Experience
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +===== Case 1: Consent through clone
> +
> +....
> +$ git clone --setup-hooks
> +...
> +
> +The following hooks were installed from remote `origin` ($ORIGIN_URL):
> +
> +pre-commit: git-secrets --pre_commit_hook
> +pre-push: $GIT_ROOT/pre_push.sh
> +....

Nope. I think this workflow is a non-starter.

> +===== Case 2: Prompting after clone
> +....
> +$ git clone
> +...
> +
> +Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:
> +
> +pre-commit: git-secrets --pre_commit_hook
> +pre-push: $GIT_ROOT/pre_push.sh

Yes, this works for me.

> +# instead of prompting, we could give users commands to run instead
> +# see case 3
> +
> +Do you wish to install them?
> +1. Yes (this time)
> +2. Yes (always from origin)
> +3. No (not this time)
> +4. No (never)

I'd rather see the installation as a separate step. That gives more
weight to the users' consent. Even if you do have a prompt here that
says Yes/No, *do not* include "always from origin".

> +===== Case 3: Re-prompting when hooks change
> +....
> +$ git pull
> +
> +The following hooks were updated from remote `origin` ($ORIGIN_URL):
> +
> +pre-push: $GIT_ROOT/pre_push.sh
> +
> +If you wish to install them, run `git hook setup origin`.

Good. Stop here. Perhaps also describe this as something that happens
with "git checkout" because it matters when HEAD updates, even if the
commit was fetched earlier.

> +===== Case 4: Nudging when hooks weren't installed
> +....
> +$ git commit
> +advice: The repository owner has recommended a 'pre-commit' hook that was not run.
> +To view it, run `git show origin/refs/recommended-config:some-pre-commit`. To install it, run `git hook setup origin pre-commit`
> +
> +Turn off this advice by setting config variable advice.missingHook to false."
> +....

These nudges seem like a good pattern, especially with the advice
config.

> +Implementation Sketch
> +^^^^^^^^^^^^^^^^^^^^^
> +
> +* Perform fetch as normal
> +
> +* After fetch is complete, Git checks for a "magic" config branch (e.g.
> ++origin/refs/recommended-config+) which contains information about config lines
> +an end-user may want (including hooks).

I think this is the wrong direction to go. You are recommending a few
things:

 1. Some branch names are more special than others.
 2. Hooks live in a separate history from the rest of the repository.
 3. Users cannot inspect the hooks in their worktree before
    installation.

Instead, think about things like .gitignore and .gitattributes, as
they can change as the repository changes. Make a special _filename_
or directory: for example ".githooks/".

> +* As part of the fetch subcommand, Git prompts users to install the configs
> +contained there.

Tell users that they are available and can be configured using another
command.

I summarized my thoughts at the top.

Thanks,
-Stolee
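
Two of the checks suggested in this review can be illustrated with a
short Python sketch: noticing that the in-tree hooks differ from what
the user last approved (so that a message is shown instead of updating
anything automatically), and confirming that a command a hook depends
on actually exists on the PATH. This is only a sketch under
assumptions. The function names are hypothetical, and so is the idea of
remembering an "approved fingerprint"; Git has no such mechanism today.

```python
import hashlib
import shutil
from pathlib import Path


def hooks_fingerprint(hooks_dir):
    """Hash the relative names and contents of every file under hooks_dir."""
    digest = hashlib.sha256()
    root = Path(hooks_dir)
    if not root.is_dir():
        return digest.hexdigest()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def hooks_advice(hooks_dir, approved_fingerprint):
    """Return an advice message if the hooks changed since approval, else None.

    Nothing is installed or updated here; the user decides what to do next.
    """
    if hooks_fingerprint(hooks_dir) == approved_fingerprint:
        return None
    return (
        "advice: the recommended hooks differ from the ones you approved.\n"
        "Inspect them in your worktree before choosing to update."
    )


def missing_commands(commands):
    """Return the commands from the given list that are not found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]
```

A caller could run `hooks_advice()` whenever HEAD moves and print the
result if it is not `None`, and run `missing_commands()` before writing
any hook configuration.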