git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH] [RFC] setup.c: make bare repo discovery optional
@ 2022-05-06 18:30 Glen Choo via GitGitGadget
  2022-05-06 20:33 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Glen Choo via GitGitGadget @ 2022-05-06 18:30 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, brian m. carlson, Derrick Stolee, Junio C Hamano,
	Emily Shaffer, Glen Choo, Glen Choo

From: Glen Choo <chooglen@google.com>

Add a config variable, `safe.barerepository`, that tells Git whether or
not to recognize bare repositories when it is trying to discover the
repository. This only affects repository discovery, thus it has no
effect if discovery was not done (e.g. `--git-dir` was passed).

This is motivated by the fact that some workflows don't use bare
repositories at all, and users may prefer to opt out of bare repository
discovery altogether:

- An easy assumption for a user to make is that Git commands run
  anywhere inside a repository's working tree will use the same
  repository. However, if the working tree contains a bare repository
  below the root-level (".git" is preferred at the root-level), any
  operations inside that bare repository use the bare repository
  instead.

  In the worst case, attackers can use this confusion to trick users
  into running arbitrary code (see [1] for a deeper discussion). But
  even in benign situations (e.g. a user renames ".git/" to ".git.old/"
  and commits it for archival purposes), disabling bare repository
  discovery can be a simpler mode of operation (e.g. because the user
  doesn't actually want to use ".git.old/") [2].

- Git won't "accidentally" recognize a directory that wasn't meant to be
  a bare repository, but happens to resemble one. While such accidents
  are probably very rare in practice, this lets users reduce the chance
  to zero.

This config is designed to be used like an allow-list, but it is not yet
clear what a good format for this allow-list would be. As such, this
patch limits the config value to a tri-state of [true|false|unset]:

- [*|(unset)] recognize all bare repositories (like Git does today)
- (empty) recognize no bare repositories

and leaves the full format to be determined later.

[1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com
[2]: I don't personally know anyone who does this as part of their
normal workflow, but a cursory search on GitHub suggests that there is a
not insubstantial number of people who munge ".git" in order to store
its contents.

https://github.com/search?l=&o=desc&p=1&q=ref+size%3A%3C1000+filename%3AHEAD&s=indexed&type=Code
(aka search for the text "ref", size:<1000, filename:HEAD)

Signed-off-by: Glen Choo <chooglen@google.com>
---
    RFC setup.c: make bare repo discovery optional
    
    (Forgive the non-standard RFC tag, I haven't figured out how to send as
    RFC using GGG. I also didn't realize that /preview would also respect
    CC...)
    
    = Description
    
    A relatively easy win that came out of the discussions around embedded
    bare repos [1], is to just let users opt-out of discovering bare repos.
    This patch does exactly that, by adding a 'boolean' config variable,
    safe.barerepository.
    
    safe.barerepository is presented to users as an allow-list of
    directories that Git will recognize as a bare repository during the
    repository discovery process (much like safe.directory), but this patch
    only implements (and permits) boolean behavior (i.e. on, off and unset).
    Hopefully, this gives us some room to discuss and experiment with
    possible formats.
    
    Thanks to Taylor for suggesting the allow-list idea :)
    
    I think the core concept of letting users toggle bare repo discovery is
    solid, but I'm sending this as RFC for the following reasons:
    
     * I don't love the name safe.barerepository, because it feels like Git
       is saying that bare repos are unsafe and consequently, that bare repo
       users are behaving unsafely. On the other hand, this is quite similar
       to safe.directory in a few respects, so it might make sense for the
       naming to reflect that.
    
     * The *-gcc CI jobs don't pass. I haven't discerned any kind of pattern
       yet.
    
    = How this relates to embedded bare repos
    
    This does not change the default behavior (i.e. Git will still discover
    all bare repos by default) because that would be catastrophic for bare
    repo users [2]. As such, this patch isn't intended to solve the problem
    of embedded bare repos for all users once and for all, but I think it
    does improve the our stance on the matter:
    
     * In the short-term, users who know they won't need bare repos (or
       those who are willing to set GIT_DIR for all of their bare repos) can
       opt-in to a safer, easier to reason about mode of operation.
    
     * In the longer-term, we might identify a usable-enough default that we
       can give opt-out protection that works for the vast majority of
       users.
    
    = Other questions/Concerns
    
     * Maybe it's more informative for the user if we die() (or warn()) when
       we find a bare repo instead of silently ignoring it?
    
     * I wonder if it makes sense to separate the toggle for bare repo
       discovery and the allow-list of bare repositories. Something like
       core.barediscovery or discovery.barerepository has a lot less baggage
       than safe.*, and boolean enable/disable is a lot simpler, but this
       isn't good from an extensibility perspective.
    
     * Is there any reason why safe.barerepository shouldn't use the same
       format as (its obvious inspiration) safe.directory?
    
     * Are the docs clear enough? I found those hard to put into words, so
       I'd especially appreciate wording suggestions :)
    
    = Future work
    
     * Like safe.directory, safe.barerepository is only read from system and
       global config. I anticipate that this is too restrictive; there has
       already been some discussion of adding a GIT_SAFE_DIRECTORIES
       environment variable for safe.directory [3], and it would be useful
       to have the same thing for safe.barerepository.
    
    [1]
    https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com
    [2] In https://lore.kernel.org/git/xmqqh76ucdg6.fsf@gitster.g, Junio
    experimented with switching off bare repo discovery altogether and
    relying solely on GIT_DIR. The resulting fallout was deemed too big to
    be feasible. [3] https://lore.kernel.org/git/xmqqee1il09v.fsf@gitster.g/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1261%2Fchooglen%2Fsetup%2Fdisable-bare-repo-config-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1261/chooglen/setup/disable-bare-repo-config-v1
Pull-Request: https://github.com/git/git/pull/1261

 Documentation/config/safe.txt | 24 +++++++++++++++
 setup.c                       | 36 ++++++++++++++++++++++-
 t/t1510-repo-setup.sh         | 55 +++++++++++++++++++++++++++++++++++
 3 files changed, 114 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/safe.txt b/Documentation/config/safe.txt
index 6d764fe0ccf..02032251ffd 100644
--- a/Documentation/config/safe.txt
+++ b/Documentation/config/safe.txt
@@ -1,3 +1,27 @@
+safe.barerepository::
+	This config entry specifies directories that Git can recognize as
+	a bare repository when looking for the repository (aka repository
+	discovery). This has no effect if repository discovery is not
+	performed e.g. the path to the repository is set via `--git-dir`
+	(see linkgit:git[1]).
++
+It is recommended that you set this value so that Git will only use the bare
+repositories you intend it to. This prevents certain types of security and
+non-security problems, such as:
+
+* `git clone`-ing a repository containing a maliciously bare repository
+  inside it.
+* Git recognizing a directory that isn't mean to be a bare repository,
+  but happens to look like one.
++
+The currently supported values are `*` (Git recognizes all bare
+repositories) and the empty value (Git never recognizes bare repositories).
+Defaults to `*`.
++
+This config setting is only respected when specified in a system or global
+config, not when it is specified in a repository config or via the command
+line option `-c safe.barerepository=<path>`.
+
 safe.directory::
 	These config entries specify Git-tracked directories that are
 	considered safe even if they are owned by someone other than the
diff --git a/setup.c b/setup.c
index a7b36f3ffbf..9b5dd877273 100644
--- a/setup.c
+++ b/setup.c
@@ -1133,6 +1133,40 @@ static int ensure_valid_ownership(const char *path)
 	return data.is_safe;
 }
 
+/*
+ * This is similar to safe_directory_data, but only supports true/false.
+ */
+struct safe_bare_repository_data {
+	int is_safe;
+};
+
+static int safe_bare_repository_cb(const char *key, const char *value, void *d)
+{
+	struct safe_bare_repository_data *data = d;
+
+	if (strcmp(key, "safe.barerepository"))
+		return 0;
+
+	if (!value || !strcmp(value, "*")) {
+		data->is_safe = 1;
+		return 0;
+	}
+	if (!*value) {
+		data->is_safe = 0;
+		return 0;
+	}
+	return -1;
+}
+
+static int should_detect_bare(void)
+{
+	struct safe_bare_repository_data data;
+
+	read_very_early_config(safe_bare_repository_cb, &data);
+
+	return data.is_safe;
+}
+
 enum discovery_result {
 	GIT_DIR_NONE = 0,
 	GIT_DIR_EXPLICIT,
@@ -1238,7 +1272,7 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
 			return GIT_DIR_DISCOVERED;
 		}
 
-		if (is_git_directory(dir->buf)) {
+		if (should_detect_bare() && is_git_directory(dir->buf)) {
 			if (!ensure_valid_ownership(dir->buf))
 				return GIT_DIR_INVALID_OWNERSHIP;
 			strbuf_addstr(gitdir, ".");
diff --git a/t/t1510-repo-setup.sh b/t/t1510-repo-setup.sh
index 591505a39c0..3ce8f776921 100755
--- a/t/t1510-repo-setup.sh
+++ b/t/t1510-repo-setup.sh
@@ -541,6 +541,61 @@ test_expect_success '#16e: bareness preserved by --bare' '
 	)
 '
 
+# Test the tri-state of [(unset)|""|"*"].
+test_expect_success '#16f: bare repo in worktree' '
+	test_when_finished "git config --global --unset safe.barerepository" &&
+	setup_repo 16f unset "" unset &&
+
+	git init --bare 16f/default/bare &&
+	git init --bare 16f/default/bare/bare &&
+	try_case 16f/default/bare unset unset \
+		. "(null)" "$here/16f/default/bare" "(null)" &&
+	try_case 16f/default/bare/bare unset unset \
+		. "(null)" "$here/16f/default/bare/bare" "(null)" &&
+
+	git config --global safe.barerepository "*" &&
+	git init --bare 16f/all/bare &&
+	git init --bare 16f/all/bare/bare &&
+	try_case 16f/all/bare unset unset \
+		. "(null)" "$here/16f/all/bare" "(null)" &&
+	try_case 16f/all/bare/bare unset unset \
+		. "(null)" "$here/16f/all/bare/bare" "(null)" &&
+
+	git config --global safe.barerepository "" &&
+	git init --bare 16f/never/bare &&
+	git init --bare 16f/never/bare/bare &&
+	try_case 16f/never/bare unset unset \
+		".git" "$here/16f" "$here/16f" "never/bare/" &&
+	try_case 16f/never/bare/bare unset unset \
+		".git" "$here/16f" "$here/16f" "never/bare/bare/"
+'
+
+test_expect_success '#16g: inside .git with safe.barerepository' '
+	test_when_finished "git config --global --unset safe.barerepository" &&
+
+	# Omit the "default" case; it is covered by 16a.
+
+	git config --global safe.barerepository "*" &&
+	setup_repo 16g/all unset "" unset &&
+	mkdir -p 16g/all/.git/wt/sub &&
+	try_case 16g/all/.git unset unset \
+		. "(null)" "$here/16g/all/.git" "(null)" &&
+	try_case 16g/all/.git/wt unset unset \
+		"$here/16g/all/.git" "(null)" "$here/16g/all/.git/wt" "(null)" &&
+	try_case 16g/all/.git/wt/sub unset unset \
+		"$here/16g/all/.git" "(null)" "$here/16g/all/.git/wt/sub" "(null)" &&
+
+	git config --global safe.barerepository "" &&
+	setup_repo 16g/never unset "" unset &&
+	mkdir -p 16g/never/.git/wt/sub &&
+	try_case 16g/never/.git unset unset \
+		".git" "$here/16g/never" "$here/16g/never" ".git/" &&
+	try_case 16g/never/.git/wt unset unset \
+		".git" "$here/16g/never" "$here/16g/never" ".git/wt/" &&
+	try_case 16g/never/.git/wt/sub unset unset \
+		".git" "$here/16g/never" "$here/16g/never" ".git/wt/sub/"
+'
+
 test_expect_success '#17: GIT_WORK_TREE without explicit GIT_DIR is accepted (bare case)' '
 	# Just like #16.
 	setup_repo 17a unset "" true &&

base-commit: 0f828332d5ac36fc63b7d8202652efa152809856
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] [RFC] setup.c: make bare repo discovery optional
  2022-05-06 18:30 [PATCH] [RFC] setup.c: make bare repo discovery optional Glen Choo via GitGitGadget
@ 2022-05-06 20:33 ` Junio C Hamano
  2022-05-09 21:42 ` Taylor Blau
  2022-05-13 23:37 ` [PATCH v2 0/2] " Glen Choo via GitGitGadget
  2 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2022-05-06 20:33 UTC (permalink / raw)
  To: Glen Choo via GitGitGadget
  Cc: git, Taylor Blau, brian m. carlson, Derrick Stolee,
	Emily Shaffer, Glen Choo

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Glen Choo <chooglen@google.com>
>
> Add a config variable, `safe.barerepository`, that tells Git whether or
> not to recognize bare repositories when it is trying to discover the
> repository. This only affects repository discovery, thus it has no
> effect if discovery was not done (e.g. `--git-dir` was passed).

> +safe.barerepository::
> +	This config entry specifies directories that Git can recognize as
> +	a bare repository when looking for the repository (aka repository
> +	discovery). This has no effect if repository discovery is not
> +	performed e.g. the path to the repository is set via `--git-dir`
> +	(see linkgit:git[1]).
> ++
> +It is recommended that you set this value so that Git will only use the bare
> +repositories you intend it to. This prevents certain types of security and
> +non-security problems, such as:
> +
> +* `git clone`-ing a repository containing a maliciously bare repository
> +  inside it.

"maliciously bare"? "malicious bare" probably.

> +* Git recognizing a directory that isn't mean to be a bare repository,

"mean to be" -> "meant to be".

> +  but happens to look like one.

> diff --git a/setup.c b/setup.c
> index a7b36f3ffbf..9b5dd877273 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1133,6 +1133,40 @@ static int ensure_valid_ownership(const char *path)
>  	return data.is_safe;
>  }
>  
> +/*
> + * This is similar to safe_directory_data, but only supports true/false.
> + */
> +struct safe_bare_repository_data {
> +	int is_safe;
> +};
> +
> +static int safe_bare_repository_cb(const char *key, const char *value, void *d)
> +{
> +	struct safe_bare_repository_data *data = d;
> +
> +	if (strcmp(key, "safe.barerepository"))
> +		return 0;
> +
> +	if (!value || !strcmp(value, "*")) {
> +		data->is_safe = 1;
> +		return 0;
> +	}
> +	if (!*value) {
> +		data->is_safe = 0;
> +		return 0;
> +	}
> +	return -1;
> +}
> +
> +static int should_detect_bare(void)
> +{
> +	struct safe_bare_repository_data data;
> +
> +	read_very_early_config(safe_bare_repository_cb, &data);
> +
> +	return data.is_safe;
> +}
> +
>  enum discovery_result {
>  	GIT_DIR_NONE = 0,
>  	GIT_DIR_EXPLICIT,
> @@ -1238,7 +1272,7 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
>  			return GIT_DIR_DISCOVERED;
>  		}
>  
> -		if (is_git_directory(dir->buf)) {
> +		if (should_detect_bare() && is_git_directory(dir->buf)) {
>  			if (!ensure_valid_ownership(dir->buf))
>  				return GIT_DIR_INVALID_OWNERSHIP;
>  			strbuf_addstr(gitdir, ".");

This is in a loop, which will go up and try the parent directory if
the body of this block is not entered, so it is calling the new
should_detect_bare() helper over and over if it returns false.

Not a very good idea.

Perhaps this would help?  I dunno.

static int should_detect_bare(void)
{
	static int should = -1; /* unknown yet */

	if (should < 0) {
		struct safe_bare_repository_data data = { 0 };
		read_very_early_config(safe_bare_repository_cb, &data);
		should = data.is_safe;
	}
	return should;
}

In any case, I very much appreciate the fact that this touches the
setup_git_directory_gently_1() codepath only minimally, as we have
other plans to update the code further soonish.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] [RFC] setup.c: make bare repo discovery optional
  2022-05-06 18:30 [PATCH] [RFC] setup.c: make bare repo discovery optional Glen Choo via GitGitGadget
  2022-05-06 20:33 ` Junio C Hamano
@ 2022-05-09 21:42 ` Taylor Blau
  2022-05-09 22:54   ` Junio C Hamano
  2022-05-10 22:00   ` Glen Choo
  2022-05-13 23:37 ` [PATCH v2 0/2] " Glen Choo via GitGitGadget
  2 siblings, 2 replies; 26+ messages in thread
From: Taylor Blau @ 2022-05-09 21:42 UTC (permalink / raw)
  To: Glen Choo via GitGitGadget
  Cc: git, brian m. carlson, Derrick Stolee, Junio C Hamano,
	Emily Shaffer, Glen Choo

Hi Glen,

On Fri, May 06, 2022 at 06:30:10PM +0000, Glen Choo via GitGitGadget wrote:
> From: Glen Choo <chooglen@google.com>
>
> Add a config variable, `safe.barerepository`, that tells Git whether or
> not to recognize bare repositories when it is trying to discover the
> repository. This only affects repository discovery, thus it has no
> effect if discovery was not done (e.g. `--git-dir` was passed).

Thanks for working on this! I'm excited to see some patches here, though
I'm not totally convinced of this direction. More below.

To summarize, this proposal attempts to work around the problem of
embedding bare repositories in non-bare checkouts by providing a way to
opt-out of bare repository discovery (which is to only discover things
that are listed in the safe.bareRepository configuration).

I agree that this would prevent the problem you're trying to solve, but
I have significant concerns that this patch is going too far (at the
risk of future damage to unrelated workflows) in order to accomplish
that goal.

My concern is that if we ever flipped the default (i.e. that
"safe.bareRepository" might someday be ""), that many legitimate cases
of using bare repositories would be broken. I think there are many such
legitimate use cases that _do_ rely on discovering bare repositories
(i.e., Git invocations that do not have a `--git-dir` in their
command-line). One such example would be forges, but I imagine that
there are many other uses we don't even know about, and I would like to
avoid breaking those if we ended up changing the default.

If it's possible to pursue a more targeted fix that leaves non-embedded
bare repositories alone, I'd like to try and focus these efforts on a
more narrow fix that would address just the case of embedded bare
repositories. I think that the direction I outlined in:

    https://lore.kernel.org/git/Ylobp7sntKeWTLDX@nand.local/

could be a good place to start (see the paragraph beginning with "Here's
an alternative approach" and below for the details).

One potential problem with that approach (that this patch doesn't suffer
from) is that any discovery which finds a bare repository would have to
continue up to the root of the volume in order to figure out whether or
not that bare repository is embedded in another non-bare one. That is
probably a non-starter due to performance, but I think you could easily
work around with a top-level setting that controls whether or not you
even _care_ about embedded bare repositories.

For example, if I set safe.bareRepository='*' in my top-level
/etc/gitconfig, then we can avoid having to continue discovery for bare
repositories altogether because we know we'll allow it anyway.

To pursue a change that targets just embedded bare repositories, I think
you fundamentally have to do an exhaustive repository discovery in order
to figure out whether the (bare) repository you're dealing with is
embedded or not. So having an opt-out for users that either (a) don't
care or (b) can't accept the performance degradation that Emily
mentioned as a result of doing unbounded filesystem traversal would be
sensible.

Playing devil's advocate for a moment, though, even if we had something
like the proposal I outlined, flipping the top-level default from '*' to
some value that implies we stop working in embedded bare repositories
will break existing workflows. But that breakage would just be limited
to embedded bare repositories, and not non-embedded ones. So I think on
balance that breakage would affect fewer real-world users, while still
being just as easy to recover from.

>     safe.barerepository is presented to users as an allow-list of
>     directories that Git will recognize as a bare repository during the
>     repository discovery process (much like safe.directory), but this patch
>     only implements (and permits) boolean behavior (i.e. on, off and unset).
>     Hopefully, this gives us some room to discuss and experiment with
>     possible formats.
>
>     Thanks to Taylor for suggesting the allow-list idea :)

I did suggest an allow-list, but not this one ;-).

>     I think the core concept of letting users toggle bare repo discovery is
>     solid, but I'm sending this as RFC for the following reasons:
>
>      * I don't love the name safe.barerepository, because it feels like Git
>        is saying that bare repos are unsafe and consequently, that bare repo
>        users are behaving unsafely. On the other hand, this is quite similar
>        to safe.directory in a few respects, so it might make sense for the
>        naming to reflect that.

Yes, the concerns I outlined above are definitely echoing this
sentiment. Another way to say it is that this feels like too big of a
hammer (i.e., it is targeting _all_ bare repositories, not just embedded
ones) for too small of a nail (embedded bare repositories). As you're
probably sick of hearing me say by now, I would strongly prefer a more
targeted solution (perhaps what I outlined, or perhaps something else,
so long as it doesn't break non-embedded bare repositories if/ever we
decided to change the default value of safe.bareRepository).

>      * The *-gcc CI jobs don't pass. I haven't discerned any kind of pattern
>        yet.

Interesting. I wouldn't expect this to be the case (since the default is
to allow everything right now).

>      * In the longer-term, we might identify a usable-enough default that we
>        can give opt-out protection that works for the vast majority of
>        users.

Perhaps, and I think if this were the case then I would feel differently
about this patch. But I don't want us to paint ourselves into a corner,
either. It would be unfortunate to, say, find ourselves in a position
where the only protection against some novel embedded bare repository
attack is to change a default that would break many existing workflows
for _non_-embedded bare repositories.

>     = Other questions/Concerns
>
>      * Maybe it's more informative for the user if we die() (or warn()) when
>        we find a bare repo instead of silently ignoring it?

We should definitely provide more feedback to the user. If I set
`safe.bareRepository` to the empty string via a global config, and then
execute a Git command in a non-embedded bare repository, I get:

    $ git.compile config --get --global --default='*' safe.bareRepository

    $ git.compile rev-parse --absolute-git-dir
    fatal: not a git repository (or any of the parent directories): .git

whereas on the last release of Git, I get instead:

    $ git rev-parse --absolute-git-dir
    /home/ttaylorr/repo.git

I'm still not convinced that just reading repository extensions while
ignoring the rest of config and hooks is too confusing, so I'd be more
in favor of something like:

    $ git.compile rev-parse --absolute-git-dir
    warning: ignoring repository config and hooks
    advice: to permit bare repository discovery (which
    advice: will read config and hooks), consider running:
    advice:
    advice:   $ git config --global --add safe.bareRepository /home/ttaylorr/repo.git
    /home/ttaylorr/repo.git

(though I still feel strongly that we should pursue a more targeted
approach here).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] [RFC] setup.c: make bare repo discovery optional
  2022-05-09 21:42 ` Taylor Blau
@ 2022-05-09 22:54   ` Junio C Hamano
  2022-05-09 23:57     ` Taylor Blau
  2022-05-10 22:00   ` Glen Choo
  1 sibling, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2022-05-09 22:54 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Glen Choo via GitGitGadget, git, brian m. carlson,
	Derrick Stolee, Emily Shaffer, Glen Choo

Taylor Blau <me@ttaylorr.com> writes:

> Thanks for working on this! I'm excited to see some patches here, though
> I'm not totally convinced of this direction. More below.
>
> To summarize, this proposal attempts to work around the problem of
> embedding bare repositories in non-bare checkouts by providing a way to
> opt-out of bare repository discovery (which is to only discover things
> that are listed in the safe.bareRepository configuration).
>
> I agree that this would prevent the problem you're trying to solve, but
> I have significant concerns that this patch is going too far (at the
> risk of future damage to unrelated workflows) in order to accomplish
> that goal.
>
> My concern is that if we ever flipped the default (i.e. that
> "safe.bareRepository" might someday be ""), that many legitimate cases
> of using bare repositories would be broken. I think there are many such
> legitimate use cases that _do_ rely on discovering bare repositories
> (i.e., Git invocations that do not have a `--git-dir` in their
> command-line).

I think 99% of such use is to chdir into the directory with HEAD,
refs/ and objects/ in it and let git recognise the cwd is a git
directory.  Am I mistaken, or are there tools that chdir into
objects/08/ and rely on setup_git_directory_gently_1() to find the
parent directory of that 'objects' directory to be a git directory?

I am wondering if another knob to help that particular use case
easier may be sufficient.  If you are a forge operator, you'd just
set a boolean configuration variable to say "it is sufficient to
chdir into a directory to use it a bare repository without exporting
the environment variable GIT_DIR=."

It is likely that end-user human users would not want to enable such
a variable, of course, but I wonder if a simple single knob would be
sufficient to help other use cases you are worried about?

While I wish "extensions and nothing else", i.e. we use "degraded
access", not "refuse to give access at all", were workable, I am
pessimistic that it would work well in practice.

Saying "nothing else" is easy, but we do "if X exists, use it" for
hook, and to implement "nothing else", you'd need to find such a
code and say "even if X exists, because we are in this strange
embedded bare thing, ignore this part of the logic" for every X.
We've been casually saying "potentially risky config" and then
started mixing "hooks" in the discussion, but who knows what other
things are used from the repository by third-party tools that we
need to yet add to the mix?

> I'm still not convinced that just reading repository extensions while
> ignoring the rest of config and hooks is too confusing, so I'd be more
> in favor of something like:

I do not think it would be confusing.  I am worried about it being
error prone.

>     $ git.compile rev-parse --absolute-git-dir
>     warning: ignoring repository config and hooks
>     advice: to permit bare repository discovery (which
>     advice: will read config and hooks), consider running:
>     advice:
>     advice:   $ git config --global --add safe.bareRepository /home/ttaylorr/repo.git
>     /home/ttaylorr/repo.git

Is the last line meant to be an output from "rev-parse --absolute-git-dir"?
IOW, the warning says you are ignoring, but we are still recognising
it as a repository?

By the way, do we need safe.bareRepository?  Shouldn't
safe.directory cover the same purpose?  

If a directory is on the latter, you are saying that (1) the
directory is OK to use as a repository, and (2) it is so even if the
directory is owned by somebody else, not you.

Theoretically you can argue that there can be cases where you only
want (1) and not (2), but as long as you control such a directory
(like an embedded repository in your project's checkout) yourself,
you do not have to worry about the "ok even if it is owned by
somebody else" part.





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] [RFC] setup.c: make bare repo discovery optional
  2022-05-09 22:54   ` Junio C Hamano
@ 2022-05-09 23:57     ` Taylor Blau
  2022-05-10  0:23       ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Taylor Blau @ 2022-05-09 23:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Taylor Blau, Glen Choo via GitGitGadget, git, brian m. carlson,
	Derrick Stolee, Emily Shaffer, Glen Choo

On Mon, May 09, 2022 at 03:54:08PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Thanks for working on this! I'm excited to see some patches here, though
> > I'm not totally convinced of this direction. More below.
> >
> > To summarize, this proposal attempts to work around the problem of
> > embedding bare repositories in non-bare checkouts by providing a way to
> > opt-out of bare repository discovery (which is to only discover things
> > that are listed in the safe.bareRepository configuration).
> >
> > I agree that this would prevent the problem you're trying to solve, but
> > I have significant concerns that this patch is going too far (at the
> > risk of future damage to unrelated workflows) in order to accomplish
> > that goal.
> >
> > My concern is that if we ever flipped the default (i.e. that
> > "safe.bareRepository" might someday be ""), that many legitimate cases
> > of using bare repositories would be broken. I think there are many such
> > legitimate use cases that _do_ rely on discovering bare repositories
> > (i.e., Git invocations that do not have a `--git-dir` in their
> > command-line).
>
> I think 99% of such use is to chdir into the directory with HEAD,
> refs/ and objects/ in it and let git recognise the cwd is a git
> directory.  Am I mistaken, or are there tools that chdir into
> objects/08/ and rely on setup_git_directory_gently_1() to find the
> parent directory of that 'objects' directory to be a git directory?

If you took this change, and then at some point in the future we changed
the default value of safe.bareRepository to "", wouldn't that break that
99% of use cases you are talking about?

When I read your "I think 99% of such use is ...", it makes me think
that this change won't disrupt bare repo discovery when we only traverse
one layer above $CWD. But this change disrupts the case where we don't
need to traverse at all to do discovery (i.e., when $CWD is the root of
a bare repository).

> I am wondering if another knob to help that particular use case
> easier may be sufficient.  If you are a forge operator, you'd just
> set a boolean configuration variable to say "it is sufficient to
> chdir into a directory to use it a bare repository without exporting
> the environment variable GIT_DIR=."

Yes, GitHub would almost certainly set safe.bareRepository to "*"
regardless of what Git's own default would be.

> It is likely that end-user human users would not want to enable such
> a variable, of course, but I wonder if a simple single knob would be
> sufficient to help other use cases you are worried about?

I'm not sure I agree that end-users wouldn't want to touch this knob. If
they have embedded bare repositories that they rely on as test fixtures,
for example, wouldn't safe.bareRepository need to be tweaked?

(On a separate but somewhat-related note, I still think that this
setting should be read from the repository config, too, i.e., it seems
odd that we'd force a user to set safe.bareRepository to some deeply
nested repository (in the embedded case) via their global config.)

> While I wish "extensions and nothing else", i.e. we use "degraded
> access", not "refuse to give access at all", were workable, I am
> pessimistic that it would work well in practice.
>
> Saying "nothing else" is easy, but we do "if X exists, use it" for
> hook, and to implement "nothing else", you'd need to find such a
> code and say "even if X exists, because we are in this strange
> embedded bare thing, ignore this part of the logic" for every X.
> We've been casually saying "potentially risky config" and then
> started mixing "hooks" in the discussion, but who knows what other
> things are used from the repository by third-party tools that we
> need to yet add to the mix?
>
> > I'm still not convinced that just reading repository extensions while
> > ignoring the rest of config and hooks is too confusing, so I'd be more
> > in favor of something like:
>
> I do not think it would be confusing.  I am worried about it being
> error prone.

Yeah, on this and the above quoted hunk, I am fine if our behavior
eventually became "call die()" for when we are in an embedded bare
repository. But I do think this transition should be gradual, i.e., we
should likely emit a warning in those cases that would be broken in the
future to say "this will break, run this `git config` invocation if you
want it to remain working".

> >     $ git.compile rev-parse --absolute-git-dir
> >     warning: ignoring repository config and hooks
> >     advice: to permit bare repository discovery (which
> >     advice: will read config and hooks), consider running:
> >     advice:
> >     advice:   $ git config --global --add safe.bareRepository /home/ttaylorr/repo.git
> >     /home/ttaylorr/repo.git
>
> Is the last line meant to be an output from "rev-parse --absolute-git-dir"?
> IOW, the warning says you are ignoring, but we are still recognising
> it as a repository?

In this example, yes. But again, I'm not so deeply attached to the idea
that we *have* to run in those cases. So I would equally be OK with the
above s/warning/fatal and minus the last line, too (i.e., that we call
die(), obviously we'd have to emit the advice before calling die()).

> By the way, do we need safe.bareRepository?  Shouldn't
> safe.directory cover the same purpose?
>
> If a directory is on the latter, you are saying that (1) the
> directory is OK to use as a repository, and (2) it is so even if the
> directory is owned by somebody else, not you.
>
> Theoretically you can argue that there can be cases where you only
> want (1) and not (2), but as long as you control such a directory
> (like an embedded repository in your project's checkout) yourself,
> you do not have to worry about the "ok even if it is owned by
> somebody else" part.

I'm not sure yet, but will think more about it.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] [RFC] setup.c: make bare repo discovery optional
  2022-05-09 23:57     ` Taylor Blau
@ 2022-05-10  0:23       ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2022-05-10  0:23 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Glen Choo via GitGitGadget, git, brian m. carlson,
	Derrick Stolee, Emily Shaffer, Glen Choo

Taylor Blau <me@ttaylorr.com> writes:

>> > My concern is that if we ever flipped the default (i.e. that
>> > "safe.bareRepository" might someday be ""), that many legitimate cases
>> > of using bare repositories would be broken. I think there are many such
>> > legitimate use cases that _do_ rely on discovering bare repositories
>> > (i.e., Git invocations that do not have a `--git-dir` in their
>> > command-line).
>>
>> I think 99% of such use is to chdir into the directory with HEAD,
>> refs/ and objects/ in it and let git recognise the cwd is a git
>> directory.  Am I mistaken, or are there tools that chdir into
>> objects/08/ and rely on setup_git_directory_gently_1() to find the
>> parent directory of that 'objects' directory to be a git directory?
>
> If you took this change, and then at some point in the future we changed
> the default value of safe.bareRepository to "", wouldn't that break that
> 99% of use cases you are talking about?

Our spawning (e.g. "fetch" run_command()s "upload-pack" in a local
repository, or "fetch" runs "upload-pack" over ssh connection, or
http gateway runs "upload-pack" after learning which repository the
request is fetching from) of subcommands can and should be fixed by
exporting "GIT_DIR=." when we spawn them in the target directory,
and such a fix should be more or less trivial.  It must happen
before such a switch of default happens (if it is what we plan to
do, that is).  Also, the trivial fix must be conveyed to third-party
tool authors and give them time to adjust their ware.

That's part of the usual migration process, and I am not so worried
about it.

If some third-party tool for whatever reason wants to start from a
random subdirectory in a bare repository, that is a different story.
Fixing such a third-party tool would be more involved than "more or
less trivial".

> When I read your "I think 99% of such use is ...", it makes me think
> that this change won't disrupt bare repo discovery when we only traverse
> one layer above $CWD. But this change disrupts the case where we don't
> need to traverse at all to do discovery (i.e., when $CWD is the root of
> a bare repository).

By "this change" you mean what Glen proposes?  I think it was
designed to break the use case where you go there to signal that you
want to use the directory as a repository.

>> I am wondering if another knob to help that particular use case
>> easier may be sufficient.  If you are a forge operator, you'd just
>> set a boolean configuration variable to say "it is sufficient to
>> chdir into a directory to use it a bare repository without exporting
>> the environment variable GIT_DIR=."

And such a boolean, without safe.bareRepository setting, should be
sufficient to cover that 99% of such use, because it disables that
deliberate refusal of treating CWD as a repository without
explicitly saying that is what you want with "GIT_DIR=.".  One thing
I wasn't sure about was if that 99% number is close to reality,
hence my question.

> Yes, GitHub would almost certainly set safe.bareRepository to "*"
> regardless of what Git's own default would be.

And with such a boolean, I am hoping that GitHub do not have to make
such a wildly open setting.  Only $CWD that is the top of a repository,
without allowing it to be any random subdirectory, would be allowed.

> I'm not sure I agree that end-users wouldn't want to touch this knob. If
> they have embedded bare repositories that they rely on as test fixtures,
> for example, wouldn't safe.bareRepository need to be tweaked?

But not in the "My $CWD is always fine" knob, whose only reason is
to simplify things without opening you up unnecessarily too widely
for hosting sites.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] [RFC] setup.c: make bare repo discovery optional
  2022-05-09 21:42 ` Taylor Blau
  2022-05-09 22:54   ` Junio C Hamano
@ 2022-05-10 22:00   ` Glen Choo
  1 sibling, 0 replies; 26+ messages in thread
From: Glen Choo @ 2022-05-10 22:00 UTC (permalink / raw)
  To: Taylor Blau, Glen Choo via GitGitGadget
  Cc: git, brian m. carlson, Derrick Stolee, Junio C Hamano, Emily Shaffer

Hi Taylor,

Taylor Blau <me@ttaylorr.com> writes:

> Hi Glen,
>
> On Fri, May 06, 2022 at 06:30:10PM +0000, Glen Choo via GitGitGadget wrote:
>> From: Glen Choo <chooglen@google.com>
>>
>> Add a config variable, `safe.barerepository`, that tells Git whether or
>> not to recognize bare repositories when it is trying to discover the
>> repository. This only affects repository discovery, thus it has no
>> effect if discovery was not done (e.g. `--git-dir` was passed).
>
> To summarize, this proposal attempts to work around the problem of
> embedding bare repositories in non-bare checkouts by providing a way to
> opt-out of bare repository discovery (which is to only discover things
> that are listed in the safe.bareRepository configuration).
>
> I agree that this would prevent the problem you're trying to solve, but
> I have significant concerns that this patch is going too far (at the
> risk of future damage to unrelated workflows) in order to accomplish
> that goal.

Thanks again for the careful read. As I understand it, your concern is
that making bare repository discovery configurable and then flipping the
default to e.g. never detecting bare repositories is too disruptive to
fix the embedded bare repository problem. And to avoid disrupting
non-embedded bare repositories, you would prefer to pursue a more
targeted fix.

If the problem statement were limited to embedded bare repositories,
then I agree that this is way more than overkill, and that a targeted
solution would be preferable.

More generally however, the problem of embedded bare repositories seems
to suggest that bare repository discovery doesn't serve all users well,
and in fact, may even be a net negative for a subset of users. I'd be
interested in hearing your thoughts from that perspective, e.g.

- Should bare repository discovery should be configurable?
- What is a good default for bare repository discovery? (regardless of
  how feasible changing the default is)

This is a somewhat different direction from how the conversation started
(I hope it doesn't look like I'm shifting the goal posts), but I think
it's a good opportunity to step back and simplify something that we
wished we got right in the beginning.

And even if we don't flip the default, shipping the config value still
seems useful e.g. there's a good amount of interest in disabling bare
repository discovery at $DAYJOB (and I think we'll get a lot of
interesting results once we do).

>>     safe.barerepository is presented to users as an allow-list of
>>     directories that Git will recognize as a bare repository during the
>>     repository discovery process (much like safe.directory), but this patch
>>     only implements (and permits) boolean behavior (i.e. on, off and unset).
>>     Hopefully, this gives us some room to discuss and experiment with
>>     possible formats.
>>
>>     Thanks to Taylor for suggesting the allow-list idea :)
>
> I did suggest an allow-list, but not this one ;-).

Ah, yes. Oops. Sorry if it looked like I was putting words in your
mouth.

What I really meant was that an allow-list (untethered from any specific
purpose) seems like a useful 'UI primitive', so thanks for bringing up
the option.

>>     I think the core concept of letting users toggle bare repo discovery is
>>     solid, but I'm sending this as RFC for the following reasons:
>>
>>      * I don't love the name safe.barerepository, because it feels like Git
>>        is saying that bare repos are unsafe and consequently, that bare repo
>>        users are behaving unsafely. On the other hand, this is quite similar
>>        to safe.directory in a few respects, so it might make sense for the
>>        naming to reflect that.
>
> Yes, the concerns I outlined above are definitely echoing this
> sentiment. Another way to say it is that this feels like too big of a
> hammer (i.e., it is targeting _all_ bare repositories, not just embedded
> ones) for too small of a nail (embedded bare repositories). As you're
> probably sick of hearing me say by now, I would strongly prefer a more
> targeted solution (perhaps what I outlined, or perhaps something else,
> so long as it doesn't break non-embedded bare repositories if/ever we
> decided to change the default value of safe.bareRepository).

Ok, yeah I think safe.barerepository is a terrible way to achieve my
purported goal of 'making bare repository discovery
configurable/simpler/' - using the "safe." namespace makes it impossible
to see this as anything other than protection against dangerous, unknown
bare repositories. I'll drop the idea of safe-listing known bare
repositories for now, that seems unproductive.

'Optionally disable bare repository discovery' still sounds like it's on
the table though, but probably with a different kind of UX e.g.
"discovery.barerepository" with the options:

- always: always discover bare repos
- never: never discover bare repos
- cwd-only: only discover bare repos if they are the cwd
- dotgit-only: only discover bare repos if they are a descendant of
  .git/

>>      * The *-gcc CI jobs don't pass. I haven't discerned any kind of pattern
>>        yet.
>
> Interesting. I wouldn't expect this to be the case (since the default is
> to allow everything right now).

This might be a false alarm - I saw similar failures on an unrelated
patch. I think my "master" is just out of date :(

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 0/2] setup.c: make bare repo discovery optional
  2022-05-06 18:30 [PATCH] [RFC] setup.c: make bare repo discovery optional Glen Choo via GitGitGadget
  2022-05-06 20:33 ` Junio C Hamano
  2022-05-09 21:42 ` Taylor Blau
@ 2022-05-13 23:37 ` Glen Choo via GitGitGadget
  2022-05-13 23:37   ` [PATCH v2 1/2] " Glen Choo via GitGitGadget
                     ` (4 more replies)
  2 siblings, 5 replies; 26+ messages in thread
From: Glen Choo via GitGitGadget @ 2022-05-13 23:37 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, brian m. carlson, Derrick Stolee, Junio C Hamano,
	Emily Shaffer, Glen Choo

Thanks all for the comments on v1, I've expanded this series somewhat to
address them, notably:

 * The config value is now named discovery.bare and is an enum (instead of
   an allow-list). This hopefully moves us away from "bare repos are unsafe
   and need to be guarded against" and towards "bare repos can be made
   optional if it serves your needs better".
 * discovery.bare now causes git to die() instead of silently ignoring the
   bare repo.
 * Add an option that allows a bare repo if it is the CWD, since this is
   presumably a reasonable default for 99% of bare repo users [1].

= Questions/Concerns

 * die()-ing is necessary if we're trying to flip the default value of
   discovery.bare. We'd expect many bare repo users to be broken, and it's
   more helpful to fail loudly than to silently ignore the bare repo.
   
   But in the long term, long after we've flipped the default and users know
   that they need to opt into bare repo discovery, would it be a better UX
   to just silently ignore the bare repo?

= Patch organization

 * Patch 1 introduces discovery.bare with allowed values [always|never].

 * Patch 2 adds discover.bare=cwd, which is useful when users don't always
   set GIT_DIR e.g. their workflow really depends on it, they are in the
   midst of migration.

= Series history

Changes since v1:

 * Rename safe.barerepository to discovery.bare and make it die()
 * Move tests into t/t0034-discovery-bare.sh
 * Avoid unnecessary config reading by using a static variable
 * Add discovery.bare=cwd
 * Fix typos

[1] I tried this 'cwd' setting on our test suite, with some pretty promising
results.

https://github.com/chooglen/git/actions/runs/2321914777

Out of the 8 failing scripts:

 * 6 are of the form "make sure we 'do the right thing' inside a
   subdirectory of a bare repo" (which typically means .git) e.g.
   t9903-bash-prompt.sh. We should be setting discovery.bare=always for
   these tests, so this is a non-issue.
 * t5323-pack-redundant.sh can be rewritten to -C into the root of the bare
   repo instead of a subdirectory.
 * t3310-notes-merge-manual-resolve.sh: not sure what the test is checking
   in particular, but I think this can be rewritten.

IOW, I don't think we have any commands that require that CWD is a
subdirectory of a bare repo, and we could use discovery.bare without much
hassle.

Glen Choo (2):
  setup.c: make bare repo discovery optional
  setup.c: learn discovery.bareRepository=cwd

 Documentation/config/discovery.txt | 26 +++++++++
 setup.c                            | 89 ++++++++++++++++++++++++++++--
 t/t0034-discovery-bare.sh          | 69 +++++++++++++++++++++++
 3 files changed, 178 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/config/discovery.txt
 create mode 100755 t/t0034-discovery-bare.sh


base-commit: e8005e4871f130c4e402ddca2032c111252f070a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1261%2Fchooglen%2Fsetup%2Fdisable-bare-repo-config-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1261/chooglen/setup/disable-bare-repo-config-v2
Pull-Request: https://github.com/git/git/pull/1261

Range-diff vs v1:

 1:  3370258c4b3 ! 1:  22b10bf9da8 [RFC] setup.c: make bare repo discovery optional
     @@ Metadata
      Author: Glen Choo <chooglen@google.com>
      
       ## Commit message ##
     -    [RFC] setup.c: make bare repo discovery optional
     +    setup.c: make bare repo discovery optional
      
     -    Add a config variable, `safe.barerepository`, that tells Git whether or
     -    not to recognize bare repositories when it is trying to discover the
     -    repository. This only affects repository discovery, thus it has no
     +    Add a config variable, `discovery.bare`, that tells Git whether or not
     +    it should work with the bare repository it has discovered i.e. Git will
     +    die() if it discovers a bare repository, but it is not allowed by
     +    `discovery.bare`. This only affects repository discovery, thus it has no
          effect if discovery was not done (e.g. `--git-dir` was passed).
      
          This is motivated by the fact that some workflows don't use bare
     @@ Commit message
            are probably very rare in practice, this lets users reduce the chance
            to zero.
      
     -    This config is designed to be used like an allow-list, but it is not yet
     -    clear what a good format for this allow-list would be. As such, this
     -    patch limits the config value to a tri-state of [true|false|unset]:
     +    This config is an enum of:
      
     -    - [*|(unset)] recognize all bare repositories (like Git does today)
     -    - (empty) recognize no bare repositories
     +    - ["always"|(unset)]: always recognize bare repositories (like Git does
     +      today)
     +    - "never": never recognize bare repositories
      
     -    and leaves the full format to be determined later.
     +    More values are expected to be added later, and the default is expected
     +    to change (i.e. to something other than "always").
      
          [1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com
          [2]: I don't personally know anyone who does this as part of their
     @@ Commit message
      
          Signed-off-by: Glen Choo <chooglen@google.com>
      
     - ## Documentation/config/safe.txt ##
     +    WIP setup.c: make discovery.bare die on failure
     +
     +    Signed-off-by: Glen Choo <chooglen@google.com>
     +
     + ## Documentation/config/discovery.txt (new) ##
      @@
     -+safe.barerepository::
     -+	This config entry specifies directories that Git can recognize as
     -+	a bare repository when looking for the repository (aka repository
     ++discovery.bare::
     ++	Specifies what kinds of directories Git can recognize as a bare
     ++	repository when looking for the repository (aka repository
      +	discovery). This has no effect if repository discovery is not
      +	performed e.g. the path to the repository is set via `--git-dir`
      +	(see linkgit:git[1]).
      ++
     -+It is recommended that you set this value so that Git will only use the bare
     -+repositories you intend it to. This prevents certain types of security and
     -+non-security problems, such as:
     -+
     -+* `git clone`-ing a repository containing a maliciously bare repository
     -+  inside it.
     -+* Git recognizing a directory that isn't mean to be a bare repository,
     -+  but happens to look like one.
     -++
     -+The currently supported values are `*` (Git recognizes all bare
     -+repositories) and the empty value (Git never recognizes bare repositories).
     -+Defaults to `*`.
     -++
      +This config setting is only respected when specified in a system or global
      +config, not when it is specified in a repository config or via the command
     -+line option `-c safe.barerepository=<path>`.
     ++line option `-c discovery.bare=<value>`.
     +++
     ++The currently supported values are `always` (Git always recognizes bare
     ++repositories) and `never` (Git never recognizes bare repositories).
     ++This defaults to `always`, but this default is likely to change.
     +++
     ++If your workflow does not rely on bare repositories, it is recommended that
     ++you set this value to `never`. This makes repository discovery easier to
     ++reason about and prevents certain types of security and non-security
     ++problems, such as:
      +
     - safe.directory::
     - 	These config entries specify Git-tracked directories that are
     - 	considered safe even if they are owned by someone other than the
     ++* `git clone`-ing a repository containing a malicious bare repository
     ++  inside it.
     ++* Git recognizing a directory that isn't meant to be a bare repository,
     ++  but happens to look like one.
      
       ## setup.c ##
     +@@
     + static int inside_git_dir = -1;
     + static int inside_work_tree = -1;
     + static int work_tree_config_is_bogus;
     ++enum discovery_bare_config {
     ++	DISCOVERY_BARE_UNKNOWN = -1,
     ++	DISCOVERY_BARE_NEVER = 0,
     ++	DISCOVERY_BARE_ALWAYS,
     ++};
     ++static enum discovery_bare_config discovery_bare_config =
     ++	DISCOVERY_BARE_UNKNOWN;
     + 
     + static struct startup_info the_startup_info;
     + struct startup_info *startup_info = &the_startup_info;
      @@ setup.c: static int ensure_valid_ownership(const char *path)
       	return data.is_safe;
       }
       
     -+/*
     -+ * This is similar to safe_directory_data, but only supports true/false.
     -+ */
     -+struct safe_bare_repository_data {
     -+	int is_safe;
     -+};
     -+
     -+static int safe_bare_repository_cb(const char *key, const char *value, void *d)
     ++static int discovery_bare_cb(const char *key, const char *value, void *d)
      +{
     -+	struct safe_bare_repository_data *data = d;
     -+
     -+	if (strcmp(key, "safe.barerepository"))
     ++	if (strcmp(key, "discovery.bare"))
      +		return 0;
      +
     -+	if (!value || !strcmp(value, "*")) {
     -+		data->is_safe = 1;
     ++	if (!strcmp(value, "never")) {
     ++		discovery_bare_config = DISCOVERY_BARE_NEVER;
      +		return 0;
      +	}
     -+	if (!*value) {
     -+		data->is_safe = 0;
     ++	if (!strcmp(value, "always")) {
     ++		discovery_bare_config = DISCOVERY_BARE_ALWAYS;
      +		return 0;
      +	}
      +	return -1;
      +}
      +
     -+static int should_detect_bare(void)
     ++static int check_bare_repo_allowed(void)
      +{
     -+	struct safe_bare_repository_data data;
     -+
     -+	read_very_early_config(safe_bare_repository_cb, &data);
     ++	if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN) {
     ++		read_very_early_config(discovery_bare_cb, NULL);
     ++		/* We didn't find a value; use the default. */
     ++		if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN)
     ++			discovery_bare_config = DISCOVERY_BARE_ALWAYS;
     ++	}
     ++	switch (discovery_bare_config) {
     ++	case DISCOVERY_BARE_NEVER:
     ++		return 0;
     ++	case DISCOVERY_BARE_ALWAYS:
     ++		return 1;
     ++	default:
     ++		BUG("invalid discovery_bare_config %d", discovery_bare_config);
     ++	}
     ++}
      +
     -+	return data.is_safe;
     ++static const char *discovery_bare_config_to_string(void)
     ++{
     ++	switch (discovery_bare_config) {
     ++	case DISCOVERY_BARE_NEVER:
     ++		return "never";
     ++	case DISCOVERY_BARE_ALWAYS:
     ++		return "always";
     ++	default:
     ++		BUG("invalid discovery_bare_config %d", discovery_bare_config);
     ++	}
      +}
      +
       enum discovery_result {
       	GIT_DIR_NONE = 0,
       	GIT_DIR_EXPLICIT,
     +@@ setup.c: enum discovery_result {
     + 	GIT_DIR_HIT_CEILING = -1,
     + 	GIT_DIR_HIT_MOUNT_POINT = -2,
     + 	GIT_DIR_INVALID_GITFILE = -3,
     +-	GIT_DIR_INVALID_OWNERSHIP = -4
     ++	GIT_DIR_INVALID_OWNERSHIP = -4,
     ++	GIT_DIR_DISALLOWED_BARE = -5
     + };
     + 
     + /*
      @@ setup.c: static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
     - 			return GIT_DIR_DISCOVERED;
       		}
       
     --		if (is_git_directory(dir->buf)) {
     -+		if (should_detect_bare() && is_git_directory(dir->buf)) {
     + 		if (is_git_directory(dir->buf)) {
     ++			if (!check_bare_repo_allowed())
     ++				return GIT_DIR_DISALLOWED_BARE;
       			if (!ensure_valid_ownership(dir->buf))
       				return GIT_DIR_INVALID_OWNERSHIP;
       			strbuf_addstr(gitdir, ".");
     +@@ setup.c: const char *setup_git_directory_gently(int *nongit_ok)
     + 		}
     + 		*nongit_ok = 1;
     + 		break;
     ++	case GIT_DIR_DISALLOWED_BARE:
     ++		if (!nongit_ok) {
     ++			die(_("cannot use bare repository '%s' (discovery.bare is '%s')"),
     ++			    dir.buf,
     ++			    discovery_bare_config_to_string());
     ++		}
     ++		*nongit_ok = 1;
     ++		break;
     + 	case GIT_DIR_NONE:
     + 		/*
     + 		 * As a safeguard against setup_git_directory_gently_1 returning
      
     - ## t/t1510-repo-setup.sh ##
     -@@ t/t1510-repo-setup.sh: test_expect_success '#16e: bareness preserved by --bare' '
     - 	)
     - '
     - 
     -+# Test the tri-state of [(unset)|""|"*"].
     -+test_expect_success '#16f: bare repo in worktree' '
     -+	test_when_finished "git config --global --unset safe.barerepository" &&
     -+	setup_repo 16f unset "" unset &&
     + ## t/t0034-discovery-bare.sh (new) ##
     +@@
     ++#!/bin/sh
      +
     -+	git init --bare 16f/default/bare &&
     -+	git init --bare 16f/default/bare/bare &&
     -+	try_case 16f/default/bare unset unset \
     -+		. "(null)" "$here/16f/default/bare" "(null)" &&
     -+	try_case 16f/default/bare/bare unset unset \
     -+		. "(null)" "$here/16f/default/bare/bare" "(null)" &&
     ++test_description='verify discovery.bare checks'
      +
     -+	git config --global safe.barerepository "*" &&
     -+	git init --bare 16f/all/bare &&
     -+	git init --bare 16f/all/bare/bare &&
     -+	try_case 16f/all/bare unset unset \
     -+		. "(null)" "$here/16f/all/bare" "(null)" &&
     -+	try_case 16f/all/bare/bare unset unset \
     -+		. "(null)" "$here/16f/all/bare/bare" "(null)" &&
     ++. ./test-lib.sh
      +
     -+	git config --global safe.barerepository "" &&
     -+	git init --bare 16f/never/bare &&
     -+	git init --bare 16f/never/bare/bare &&
     -+	try_case 16f/never/bare unset unset \
     -+		".git" "$here/16f" "$here/16f" "never/bare/" &&
     -+	try_case 16f/never/bare/bare unset unset \
     -+		".git" "$here/16f" "$here/16f" "never/bare/bare/"
     -+'
     ++pwd="$(pwd)"
      +
     -+test_expect_success '#16g: inside .git with safe.barerepository' '
     -+	test_when_finished "git config --global --unset safe.barerepository" &&
     ++expect_allowed () {
     ++	git rev-parse --absolute-git-dir >actual &&
     ++	echo "$pwd/outer-repo/bare-repo" >expected &&
     ++	test_cmp expected actual
     ++}
      +
     -+	# Omit the "default" case; it is covered by 16a.
     ++expect_rejected () {
     ++	test_must_fail git rev-parse --absolute-git-dir 2>err &&
     ++	grep "discovery.bare" err
     ++}
      +
     -+	git config --global safe.barerepository "*" &&
     -+	setup_repo 16g/all unset "" unset &&
     -+	mkdir -p 16g/all/.git/wt/sub &&
     -+	try_case 16g/all/.git unset unset \
     -+		. "(null)" "$here/16g/all/.git" "(null)" &&
     -+	try_case 16g/all/.git/wt unset unset \
     -+		"$here/16g/all/.git" "(null)" "$here/16g/all/.git/wt" "(null)" &&
     -+	try_case 16g/all/.git/wt/sub unset unset \
     -+		"$here/16g/all/.git" "(null)" "$here/16g/all/.git/wt/sub" "(null)" &&
     ++test_expect_success 'setup bare repo in worktree' '
     ++	git init outer-repo &&
     ++	git init --bare outer-repo/bare-repo
     ++'
     ++
     ++test_expect_success 'discovery.bare unset' '
     ++	(
     ++		cd outer-repo/bare-repo &&
     ++		expect_allowed &&
     ++		cd refs/ &&
     ++		expect_allowed
     ++	)
     ++'
     ++
     ++test_expect_success 'discovery.bare=always' '
     ++	git config --global discovery.bare always &&
     ++	(
     ++		cd outer-repo/bare-repo &&
     ++		expect_allowed &&
     ++		cd refs/ &&
     ++		expect_allowed
     ++	)
     ++'
      +
     -+	git config --global safe.barerepository "" &&
     -+	setup_repo 16g/never unset "" unset &&
     -+	mkdir -p 16g/never/.git/wt/sub &&
     -+	try_case 16g/never/.git unset unset \
     -+		".git" "$here/16g/never" "$here/16g/never" ".git/" &&
     -+	try_case 16g/never/.git/wt unset unset \
     -+		".git" "$here/16g/never" "$here/16g/never" ".git/wt/" &&
     -+	try_case 16g/never/.git/wt/sub unset unset \
     -+		".git" "$here/16g/never" "$here/16g/never" ".git/wt/sub/"
     ++test_expect_success 'discovery.bare=never' '
     ++	git config --global discovery.bare never &&
     ++	(
     ++		cd outer-repo/bare-repo &&
     ++		expect_rejected &&
     ++		cd refs/ &&
     ++		expect_rejected
     ++	) &&
     ++	(
     ++		GIT_DIR=outer-repo/bare-repo &&
     ++		export GIT_DIR &&
     ++		expect_allowed
     ++	)
      +'
      +
     - test_expect_success '#17: GIT_WORK_TREE without explicit GIT_DIR is accepted (bare case)' '
     - 	# Just like #16.
     - 	setup_repo 17a unset "" true &&
     ++test_done
 -:  ----------- > 2:  62070aab7eb setup.c: learn discovery.bareRepository=cwd

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 1/2] setup.c: make bare repo discovery optional
  2022-05-13 23:37 ` [PATCH v2 0/2] " Glen Choo via GitGitGadget
@ 2022-05-13 23:37   ` Glen Choo via GitGitGadget
  2022-05-16 18:12     ` Glen Choo
  2022-05-16 18:46     ` Derrick Stolee
  2022-05-13 23:37   ` [PATCH v2 2/2] setup.c: learn discovery.bareRepository=cwd Glen Choo via GitGitGadget
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Glen Choo via GitGitGadget @ 2022-05-13 23:37 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, brian m. carlson, Derrick Stolee, Junio C Hamano,
	Emily Shaffer, Glen Choo, Glen Choo

From: Glen Choo <chooglen@google.com>

Add a config variable, `discovery.bare`, that tells Git whether or not
it should work with the bare repository it has discovered i.e. Git will
die() if it discovers a bare repository, but it is not allowed by
`discovery.bare`. This only affects repository discovery, thus it has no
effect if discovery was not done (e.g. `--git-dir` was passed).

This is motivated by the fact that some workflows don't use bare
repositories at all, and users may prefer to opt out of bare repository
discovery altogether:

- An easy assumption for a user to make is that Git commands run
  anywhere inside a repository's working tree will use the same
  repository. However, if the working tree contains a bare repository
  below the root-level (".git" is preferred at the root-level), any
  operations inside that bare repository use the bare repository
  instead.

  In the worst case, attackers can use this confusion to trick users
  into running arbitrary code (see [1] for a deeper discussion). But
  even in benign situations (e.g. a user renames ".git/" to ".git.old/"
  and commits it for archival purposes), disabling bare repository
  discovery can be a simpler mode of operation (e.g. because the user
  doesn't actually want to use ".git.old/") [2].

- Git won't "accidentally" recognize a directory that wasn't meant to be
  a bare repository, but happens to resemble one. While such accidents
  are probably very rare in practice, this lets users reduce the chance
  to zero.

This config is an enum of:

- ["always"|(unset)]: always recognize bare repositories (like Git does
  today)
- "never": never recognize bare repositories

More values are expected to be added later, and the default is expected
to change (i.e. to something other than "always").

[1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com
[2]: I don't personally know anyone who does this as part of their
normal workflow, but a cursory search on GitHub suggests that there is a
not insubstantial number of people who munge ".git" in order to store
its contents.

https://github.com/search?l=&o=desc&p=1&q=ref+size%3A%3C1000+filename%3AHEAD&s=indexed&type=Code
(aka search for the text "ref", size:<1000, filename:HEAD)

Signed-off-by: Glen Choo <chooglen@google.com>

WIP setup.c: make discovery.bare die on failure

Signed-off-by: Glen Choo <chooglen@google.com>
---
 Documentation/config/discovery.txt | 24 +++++++++++
 setup.c                            | 66 +++++++++++++++++++++++++++++-
 t/t0034-discovery-bare.sh          | 59 ++++++++++++++++++++++++++
 3 files changed, 148 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/config/discovery.txt
 create mode 100755 t/t0034-discovery-bare.sh

diff --git a/Documentation/config/discovery.txt b/Documentation/config/discovery.txt
new file mode 100644
index 00000000000..761cabe6e70
--- /dev/null
+++ b/Documentation/config/discovery.txt
@@ -0,0 +1,24 @@
+discovery.bare::
+	Specifies what kinds of directories Git can recognize as a bare
+	repository when looking for the repository (aka repository
+	discovery). This has no effect if repository discovery is not
+	performed e.g. the path to the repository is set via `--git-dir`
+	(see linkgit:git[1]).
++
+This config setting is only respected when specified in a system or global
+config, not when it is specified in a repository config or via the command
+line option `-c discovery.bare=<value>`.
++
+The currently supported values are `always` (Git always recognizes bare
+repositories) and `never` (Git never recognizes bare repositories).
+This defaults to `always`, but this default is likely to change.
++
+If your workflow does not rely on bare repositories, it is recommended that
+you set this value to `never`. This makes repository discovery easier to
+reason about and prevents certain types of security and non-security
+problems, such as:
+
+* `git clone`-ing a repository containing a malicious bare repository
+  inside it.
+* Git recognizing a directory that isn't meant to be a bare repository,
+  but happens to look like one.
diff --git a/setup.c b/setup.c
index a7b36f3ffbf..cee01d86f0c 100644
--- a/setup.c
+++ b/setup.c
@@ -10,6 +10,13 @@
 static int inside_git_dir = -1;
 static int inside_work_tree = -1;
 static int work_tree_config_is_bogus;
+enum discovery_bare_config {
+	DISCOVERY_BARE_UNKNOWN = -1,
+	DISCOVERY_BARE_NEVER = 0,
+	DISCOVERY_BARE_ALWAYS,
+};
+static enum discovery_bare_config discovery_bare_config =
+	DISCOVERY_BARE_UNKNOWN;
 
 static struct startup_info the_startup_info;
 struct startup_info *startup_info = &the_startup_info;
@@ -1133,6 +1140,52 @@ static int ensure_valid_ownership(const char *path)
 	return data.is_safe;
 }
 
+static int discovery_bare_cb(const char *key, const char *value, void *d)
+{
+	if (strcmp(key, "discovery.bare"))
+		return 0;
+
+	if (!strcmp(value, "never")) {
+		discovery_bare_config = DISCOVERY_BARE_NEVER;
+		return 0;
+	}
+	if (!strcmp(value, "always")) {
+		discovery_bare_config = DISCOVERY_BARE_ALWAYS;
+		return 0;
+	}
+	return -1;
+}
+
+static int check_bare_repo_allowed(void)
+{
+	if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN) {
+		read_very_early_config(discovery_bare_cb, NULL);
+		/* We didn't find a value; use the default. */
+		if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN)
+			discovery_bare_config = DISCOVERY_BARE_ALWAYS;
+	}
+	switch (discovery_bare_config) {
+	case DISCOVERY_BARE_NEVER:
+		return 0;
+	case DISCOVERY_BARE_ALWAYS:
+		return 1;
+	default:
+		BUG("invalid discovery_bare_config %d", discovery_bare_config);
+	}
+}
+
+static const char *discovery_bare_config_to_string(void)
+{
+	switch (discovery_bare_config) {
+	case DISCOVERY_BARE_NEVER:
+		return "never";
+	case DISCOVERY_BARE_ALWAYS:
+		return "always";
+	default:
+		BUG("invalid discovery_bare_config %d", discovery_bare_config);
+	}
+}
+
 enum discovery_result {
 	GIT_DIR_NONE = 0,
 	GIT_DIR_EXPLICIT,
@@ -1142,7 +1195,8 @@ enum discovery_result {
 	GIT_DIR_HIT_CEILING = -1,
 	GIT_DIR_HIT_MOUNT_POINT = -2,
 	GIT_DIR_INVALID_GITFILE = -3,
-	GIT_DIR_INVALID_OWNERSHIP = -4
+	GIT_DIR_INVALID_OWNERSHIP = -4,
+	GIT_DIR_DISALLOWED_BARE = -5
 };
 
 /*
@@ -1239,6 +1293,8 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
 		}
 
 		if (is_git_directory(dir->buf)) {
+			if (!check_bare_repo_allowed())
+				return GIT_DIR_DISALLOWED_BARE;
 			if (!ensure_valid_ownership(dir->buf))
 				return GIT_DIR_INVALID_OWNERSHIP;
 			strbuf_addstr(gitdir, ".");
@@ -1385,6 +1441,14 @@ const char *setup_git_directory_gently(int *nongit_ok)
 		}
 		*nongit_ok = 1;
 		break;
+	case GIT_DIR_DISALLOWED_BARE:
+		if (!nongit_ok) {
+			die(_("cannot use bare repository '%s' (discovery.bare is '%s')"),
+			    dir.buf,
+			    discovery_bare_config_to_string());
+		}
+		*nongit_ok = 1;
+		break;
 	case GIT_DIR_NONE:
 		/*
 		 * As a safeguard against setup_git_directory_gently_1 returning
diff --git a/t/t0034-discovery-bare.sh b/t/t0034-discovery-bare.sh
new file mode 100755
index 00000000000..9c774872c4e
--- /dev/null
+++ b/t/t0034-discovery-bare.sh
@@ -0,0 +1,59 @@
+#!/bin/sh
+
+test_description='verify discovery.bare checks'
+
+. ./test-lib.sh
+
+pwd="$(pwd)"
+
+expect_allowed () {
+	git rev-parse --absolute-git-dir >actual &&
+	echo "$pwd/outer-repo/bare-repo" >expected &&
+	test_cmp expected actual
+}
+
+expect_rejected () {
+	test_must_fail git rev-parse --absolute-git-dir 2>err &&
+	grep "discovery.bare" err
+}
+
+test_expect_success 'setup bare repo in worktree' '
+	git init outer-repo &&
+	git init --bare outer-repo/bare-repo
+'
+
+test_expect_success 'discovery.bare unset' '
+	(
+		cd outer-repo/bare-repo &&
+		expect_allowed &&
+		cd refs/ &&
+		expect_allowed
+	)
+'
+
+test_expect_success 'discovery.bare=always' '
+	git config --global discovery.bare always &&
+	(
+		cd outer-repo/bare-repo &&
+		expect_allowed &&
+		cd refs/ &&
+		expect_allowed
+	)
+'
+
+test_expect_success 'discovery.bare=never' '
+	git config --global discovery.bare never &&
+	(
+		cd outer-repo/bare-repo &&
+		expect_rejected &&
+		cd refs/ &&
+		expect_rejected
+	) &&
+	(
+		GIT_DIR=outer-repo/bare-repo &&
+		export GIT_DIR &&
+		expect_allowed
+	)
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 2/2] setup.c: learn discovery.bareRepository=cwd
  2022-05-13 23:37 ` [PATCH v2 0/2] " Glen Choo via GitGitGadget
  2022-05-13 23:37   ` [PATCH v2 1/2] " Glen Choo via GitGitGadget
@ 2022-05-13 23:37   ` Glen Choo via GitGitGadget
  2022-05-16 18:49     ` Derrick Stolee
  2022-05-16 16:40   ` [PATCH v2 0/2] setup.c: make bare repo discovery optional Junio C Hamano
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Glen Choo via GitGitGadget @ 2022-05-13 23:37 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, brian m. carlson, Derrick Stolee, Junio C Hamano,
	Emily Shaffer, Glen Choo, Glen Choo

From: Glen Choo <chooglen@google.com>

Add a 'cwd' option to discovery.bareRepository, which allows a bare
repository to be used if and only if the cwd is the root of a bare
repository. This covers the common case where a user works with a bare
repository by cd-ing into the repository's root.

Signed-off-by: Glen Choo <chooglen@google.com>
---
 Documentation/config/discovery.txt |  6 ++++--
 setup.c                            | 27 ++++++++++++++++++++-------
 t/t0034-discovery-bare.sh          | 10 ++++++++++
 3 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/Documentation/config/discovery.txt b/Documentation/config/discovery.txt
index 761cabe6e70..d7cdee3a5e1 100644
--- a/Documentation/config/discovery.txt
+++ b/Documentation/config/discovery.txt
@@ -10,8 +10,10 @@ config, not when it is specified in a repository config or via the command
 line option `-c discovery.bare=<value>`.
 +
 The currently supported values are `always` (Git always recognizes bare
-repositories) and `never` (Git never recognizes bare repositories).
-This defaults to `always`, but this default is likely to change.
+repositories), `cwd` (Git only recognizes bare repositories if they are the
+current working directory) and `never` (Git never recognizes bare
+repositories). This defaults to `always`, but this default is likely to
+change.
 +
 If your workflow does not rely on bare repositories, it is recommended that
 you set this value to `never`. This makes repository discovery easier to
diff --git a/setup.c b/setup.c
index cee01d86f0c..ead999f404c 100644
--- a/setup.c
+++ b/setup.c
@@ -14,6 +14,7 @@ enum discovery_bare_config {
 	DISCOVERY_BARE_UNKNOWN = -1,
 	DISCOVERY_BARE_NEVER = 0,
 	DISCOVERY_BARE_ALWAYS,
+	DISCOVERY_BARE_CWD,
 };
 static enum discovery_bare_config discovery_bare_config =
 	DISCOVERY_BARE_UNKNOWN;
@@ -1153,10 +1154,14 @@ static int discovery_bare_cb(const char *key, const char *value, void *d)
 		discovery_bare_config = DISCOVERY_BARE_ALWAYS;
 		return 0;
 	}
+	if (!strcmp(value, "cwd")) {
+		discovery_bare_config = DISCOVERY_BARE_CWD;
+		return 0;
+	}
 	return -1;
 }
 
-static int check_bare_repo_allowed(void)
+static int check_bare_repo_allowed(const char *cwd, const char *path)
 {
 	if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN) {
 		read_very_early_config(discovery_bare_cb, NULL);
@@ -1169,6 +1174,8 @@ static int check_bare_repo_allowed(void)
 		return 0;
 	case DISCOVERY_BARE_ALWAYS:
 		return 1;
+	case DISCOVERY_BARE_CWD:
+		return !strcmp(cwd, path);
 	default:
 		BUG("invalid discovery_bare_config %d", discovery_bare_config);
 	}
@@ -1181,6 +1188,8 @@ static const char *discovery_bare_config_to_string(void)
 		return "never";
 	case DISCOVERY_BARE_ALWAYS:
 		return "always";
+	case DISCOVERY_BARE_CWD:
+		return "cwd";
 	default:
 		BUG("invalid discovery_bare_config %d", discovery_bare_config);
 	}
@@ -1212,7 +1221,8 @@ enum discovery_result {
  * the discovered .git/ directory, if any. If `gitdir` is not absolute, it
  * is relative to `dir` (i.e. *not* necessarily the cwd).
  */
-static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
+static enum discovery_result setup_git_directory_gently_1(struct strbuf *cwd,
+							  struct strbuf *dir,
 							  struct strbuf *gitdir,
 							  int die_on_error)
 {
@@ -1293,7 +1303,7 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
 		}
 
 		if (is_git_directory(dir->buf)) {
-			if (!check_bare_repo_allowed())
+			if (!check_bare_repo_allowed(cwd->buf, dir->buf))
 				return GIT_DIR_DISALLOWED_BARE;
 			if (!ensure_valid_ownership(dir->buf))
 				return GIT_DIR_INVALID_OWNERSHIP;
@@ -1319,16 +1329,18 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
 int discover_git_directory(struct strbuf *commondir,
 			   struct strbuf *gitdir)
 {
-	struct strbuf dir = STRBUF_INIT, err = STRBUF_INIT;
+	struct strbuf cwd = STRBUF_INIT, dir = STRBUF_INIT, err = STRBUF_INIT;
 	size_t gitdir_offset = gitdir->len, cwd_len;
 	size_t commondir_offset = commondir->len;
 	struct repository_format candidate = REPOSITORY_FORMAT_INIT;
 
-	if (strbuf_getcwd(&dir))
+	if (strbuf_getcwd(&cwd))
 		return -1;
+	strbuf_addbuf(&dir, &cwd);
 
 	cwd_len = dir.len;
-	if (setup_git_directory_gently_1(&dir, gitdir, 0) <= 0) {
+	if (setup_git_directory_gently_1(&cwd, &dir, gitdir, 0) <= 0) {
+		strbuf_release(&cwd);
 		strbuf_release(&dir);
 		return -1;
 	}
@@ -1351,6 +1363,7 @@ int discover_git_directory(struct strbuf *commondir,
 	strbuf_reset(&dir);
 	strbuf_addf(&dir, "%s/config", commondir->buf + commondir_offset);
 	read_repository_format(&candidate, dir.buf);
+	strbuf_release(&cwd);
 	strbuf_release(&dir);
 
 	if (verify_repository_format(&candidate, &err) < 0) {
@@ -1400,7 +1413,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 		die_errno(_("Unable to read current working directory"));
 	strbuf_addbuf(&dir, &cwd);
 
-	switch (setup_git_directory_gently_1(&dir, &gitdir, 1)) {
+	switch (setup_git_directory_gently_1(&cwd, &dir, &gitdir, 1)) {
 	case GIT_DIR_EXPLICIT:
 		prefix = setup_explicit_git_dir(gitdir.buf, &cwd, &repo_fmt, nongit_ok);
 		break;
diff --git a/t/t0034-discovery-bare.sh b/t/t0034-discovery-bare.sh
index 9c774872c4e..ba44cf19c99 100755
--- a/t/t0034-discovery-bare.sh
+++ b/t/t0034-discovery-bare.sh
@@ -56,4 +56,14 @@ test_expect_success 'discovery.bare=never' '
 	)
 '
 
+test_expect_success 'discovery.bare=cwd' '
+	git config --global discovery.bare cwd &&
+	(
+		cd outer-repo/bare-repo &&
+		expect_allowed &&
+		cd refs/ &&
+		expect_rejected
+	)
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/2] setup.c: make bare repo discovery optional
  2022-05-13 23:37 ` [PATCH v2 0/2] " Glen Choo via GitGitGadget
  2022-05-13 23:37   ` [PATCH v2 1/2] " Glen Choo via GitGitGadget
  2022-05-13 23:37   ` [PATCH v2 2/2] setup.c: learn discovery.bareRepository=cwd Glen Choo via GitGitGadget
@ 2022-05-16 16:40   ` Junio C Hamano
  2022-05-16 18:36     ` Glen Choo
  2022-05-16 16:43   ` Junio C Hamano
  2022-05-16 19:07   ` Derrick Stolee
  4 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2022-05-16 16:40 UTC (permalink / raw)
  To: Glen Choo via GitGitGadget
  Cc: git, Taylor Blau, brian m. carlson, Derrick Stolee,
	Emily Shaffer, Glen Choo

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  * die()-ing is necessary if we're trying to flip the default value of
>    discovery.bare. We'd expect many bare repo users to be broken, and it's
>    more helpful to fail loudly than to silently ignore the bare repo.
>
>    But in the long term, long after we've flipped the default and users know
>    that they need to opt into bare repo discovery, would it be a better UX
>    to just silently ignore the bare repo?

Would a middle-ground of giving a warning() message help?  Can it be
loud and annoying enough to knudge the users to adjust without
breaking the functionality?

The longer-term default should be "cwd is allowed, but we do not
bother going up from object/04 subdirectory of a bare repository",
not "bare repositories should not be usable at all without GIT_DIR".

>      +    Add a config variable, `discovery.bare`, that tells Git whether or not
>      +    it should work with the bare repository it has discovered i.e. Git will
>      +    die() if it discovers a bare repository, but it is not allowed by

Missing comma before "i.e."

>      ++discovery.bare::
>      ++	Specifies what kinds of directories Git can recognize as a bare
>      ++	repository when looking for the repository (aka repository
>       +	discovery). This has no effect if repository discovery is not
>       +	performed e.g. the path to the repository is set via `--git-dir`
>       +	(see linkgit:git[1]).
>       ++
>       +This config setting is only respected when specified in a system or global
>       +config, not when it is specified in a repository config or via the command
>      ++line option `-c discovery.bare=<value>`.

;-)

>      +++
>      ++The currently supported values are `always` (Git always recognizes bare
>      ++repositories) and `never` (Git never recognizes bare repositories).
>      ++This defaults to `always`, but this default is likely to change.
>      +++
>      ++If your workflow does not rely on bare repositories, it is recommended that
>      ++you set this value to `never`. This makes repository discovery easier to
>      ++reason about and prevents certain types of security and non-security
>      ++problems, such as:

Hopefully "git fetch" over ssh:// and file:/// would run the other
side with GIT_DIR explicitly set?  As long as this recommendation
does not break these use cases, I think we are OK, but I do not yet
find these "problems, such as..." so convincing.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/2] setup.c: make bare repo discovery optional
  2022-05-13 23:37 ` [PATCH v2 0/2] " Glen Choo via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-05-16 16:40   ` [PATCH v2 0/2] setup.c: make bare repo discovery optional Junio C Hamano
@ 2022-05-16 16:43   ` Junio C Hamano
  2022-05-16 19:07   ` Derrick Stolee
  4 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2022-05-16 16:43 UTC (permalink / raw)
  To: Glen Choo via GitGitGadget
  Cc: git, Taylor Blau, brian m. carlson, Derrick Stolee,
	Emily Shaffer, Glen Choo

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  t/t0034-discovery-bare.sh          | 69 +++++++++++++++++++++++

This number is already in use by an in-flight topic, if I am not
mistaken.  Please make it a habit to always check your topic works
well when merged to 'next' and to 'seen'.

Thanks.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] setup.c: make bare repo discovery optional
  2022-05-13 23:37   ` [PATCH v2 1/2] " Glen Choo via GitGitGadget
@ 2022-05-16 18:12     ` Glen Choo
  2022-05-16 18:46     ` Derrick Stolee
  1 sibling, 0 replies; 26+ messages in thread
From: Glen Choo @ 2022-05-16 18:12 UTC (permalink / raw)
  To: Glen Choo via GitGitGadget, git
  Cc: Taylor Blau, brian m. carlson, Derrick Stolee, Junio C Hamano,
	Emily Shaffer

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Glen Choo <chooglen@google.com>
>
> Add a config variable, `discovery.bare`, that tells Git whether or not
> it should work with the bare repository it has discovered i.e. Git will
> die() if it discovers a bare repository, but it is not allowed by
> `discovery.bare`. This only affects repository discovery, thus it has no
> effect if discovery was not done (e.g. `--git-dir` was passed).
>
> This is motivated by the fact that some workflows don't use bare
> repositories at all, and users may prefer to opt out of bare repository
> discovery altogether:
>
> - An easy assumption for a user to make is that Git commands run
>   anywhere inside a repository's working tree will use the same
>   repository. However, if the working tree contains a bare repository
>   below the root-level (".git" is preferred at the root-level), any
>   operations inside that bare repository use the bare repository
>   instead.
>
>   In the worst case, attackers can use this confusion to trick users
>   into running arbitrary code (see [1] for a deeper discussion). But
>   even in benign situations (e.g. a user renames ".git/" to ".git.old/"
>   and commits it for archival purposes), disabling bare repository
>   discovery can be a simpler mode of operation (e.g. because the user
>   doesn't actually want to use ".git.old/") [2].
>
> - Git won't "accidentally" recognize a directory that wasn't meant to be
>   a bare repository, but happens to resemble one. While such accidents
>   are probably very rare in practice, this lets users reduce the chance
>   to zero.
>
> This config is an enum of:
>
> - ["always"|(unset)]: always recognize bare repositories (like Git does
>   today)
> - "never": never recognize bare repositories
>
> More values are expected to be added later, and the default is expected
> to change (i.e. to something other than "always").
>
> [1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com
> [2]: I don't personally know anyone who does this as part of their
> normal workflow, but a cursory search on GitHub suggests that there is a
> not insubstantial number of people who munge ".git" in order to store
> its contents.
>
> https://github.com/search?l=&o=desc&p=1&q=ref+size%3A%3C1000+filename%3AHEAD&s=indexed&type=Code
> (aka search for the text "ref", size:<1000, filename:HEAD)
>
> Signed-off-by: Glen Choo <chooglen@google.com>

The intended commit message ends here...

> WIP setup.c: make discovery.bare die on failure
>
> Signed-off-by: Glen Choo <chooglen@google.com>

Ugh, dumb mistake (bad squash). Fortunately this was one of my more
professional-sounding WIP commit messages.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/2] setup.c: make bare repo discovery optional
  2022-05-16 16:40   ` [PATCH v2 0/2] setup.c: make bare repo discovery optional Junio C Hamano
@ 2022-05-16 18:36     ` Glen Choo
  2022-05-16 19:16       ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Glen Choo @ 2022-05-16 18:36 UTC (permalink / raw)
  To: Junio C Hamano, Glen Choo via GitGitGadget
  Cc: git, Taylor Blau, brian m. carlson, Derrick Stolee,
	Emily Shaffer, rsbecker

Junio C Hamano <gitster@pobox.com> writes:

> "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>>  * die()-ing is necessary if we're trying to flip the default value of
>>    discovery.bare. We'd expect many bare repo users to be broken, and it's
>>    more helpful to fail loudly than to silently ignore the bare repo.
>>
>>    But in the long term, long after we've flipped the default and users know
>>    that they need to opt into bare repo discovery, would it be a better UX
>>    to just silently ignore the bare repo?
>
> Would a middle-ground of giving a warning() message help?  Can it be
> loud and annoying enough to knudge the users to adjust without
> breaking the functionality?

Personally, when my tool changes its behavior, I would strongly prefer
it to die than to "change behavior + warn". I'd feel more comfortable
knowing that the tool did nothing as opposed to doing the wrong thing
and only being informed after the fact. Also, I sometimes ignore
warnings ;)

When we _do_ transition away from die(), ignore + warning() sounds like
a good first step.

But if any of this flies in the face of the project's conventions, let
me know as such.

>>      +    Add a config variable, `discovery.bare`, that tells Git whether or not
>>      +    it should work with the bare repository it has discovered i.e. Git will
>>      +    die() if it discovers a bare repository, but it is not allowed by
>
> Missing comma before "i.e."

Thanks.

>>      +++
>>      ++The currently supported values are `always` (Git always recognizes bare
>>      ++repositories) and `never` (Git never recognizes bare repositories).
>>      ++This defaults to `always`, but this default is likely to change.
>>      +++
>>      ++If your workflow does not rely on bare repositories, it is recommended that
>>      ++you set this value to `never`. This makes repository discovery easier to
>>      ++reason about and prevents certain types of security and non-security
>>      ++problems, such as:
>
> Hopefully "git fetch" over ssh:// and file:/// would run the other
> side with GIT_DIR explicitly set?

Ah, I'll check this and get back to you.

>                                                        I do not yet
> find these "problems, such as..." so convincing.

What would be a convincing rationale to you? I'll capture that here.

I'm assuming that you already have such an rationale in mind when you
say that the longer-term default is that "we respect bare repositories
only if they are the cwd.". I'm also assuming that this rationale is
something other than embedded bare repos, because "cwd-only" does not
protect against that.

Perhaps "never" sounds better to folks who don't ever expect bare
repositories and want to lock down the environment. Randall (cc-ed)
suggests one such use case in [1].

(To Randall: Oops, I actually meant to cc you earlier, since you were
the first to suggest a practical use case for never allowing bare repos.
It must've slipped my mind).

[1] https://lore.kernel.org/git/005d01d84ad0$782e8fc0$688baf40$@nexbridge.com.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] setup.c: make bare repo discovery optional
  2022-05-13 23:37   ` [PATCH v2 1/2] " Glen Choo via GitGitGadget
  2022-05-16 18:12     ` Glen Choo
@ 2022-05-16 18:46     ` Derrick Stolee
  2022-05-16 22:25       ` Taylor Blau
  2022-05-17 20:24       ` Glen Choo
  1 sibling, 2 replies; 26+ messages in thread
From: Derrick Stolee @ 2022-05-16 18:46 UTC (permalink / raw)
  To: Glen Choo via GitGitGadget, git
  Cc: Taylor Blau, brian m. carlson, Junio C Hamano, Emily Shaffer, Glen Choo

On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
> From: Glen Choo <chooglen@google.com>
> 
> Add a config variable, `discovery.bare`, that tells Git whether or not
> it should work with the bare repository it has discovered i.e. Git will
> die() if it discovers a bare repository, but it is not allowed by
> `discovery.bare`. This only affects repository discovery, thus it has no
> effect if discovery was not done (e.g. `--git-dir` was passed).

> This config is an enum of:
> 
> - ["always"|(unset)]: always recognize bare repositories (like Git does
>   today)
> - "never": never recognize bare repositories
> 
> More values are expected to be added later, and the default is expected
> to change (i.e. to something other than "always").

I think it is fine to include the "never" option for users to opt-in to
this super-protected state, but I want to make it very clear that we
should never move to it as a new default. This phrasing of 'something
other than "always"' is key, but it might be good to point out that
"never" is very unlikely to be that default.

> WIP setup.c: make discovery.bare die on failure
> 
> Signed-off-by: Glen Choo <chooglen@google.com>

Accidental concatenation of squashed commit?

> diff --git a/Documentation/config/discovery.txt b/Documentation/config/discovery.txt
> new file mode 100644
> index 00000000000..761cabe6e70
> --- /dev/null
> +++ b/Documentation/config/discovery.txt
> @@ -0,0 +1,24 @@
> +discovery.bare::
> +	Specifies what kinds of directories Git can recognize as a bare
> +	repository when looking for the repository (aka repository
> +	discovery). This has no effect if repository discovery is not
> +	performed e.g. the path to the repository is set via `--git-dir`
> +	(see linkgit:git[1]).

Avoid "e.g." here.

	This has no effect if the repository is specified directly via
	the --git-dir command-line option or the GIT_DIR environment
	variable.

> +This config setting is only respected when specified in a system or global
> +config, not when it is specified in a repository config or via the command
> +line option `-c discovery.bare=<value>`.

We are sprinkling config options that have these same restrictions throughout
the config documentation. It might be time to define a term like "protected
config" at the top of git-config.txt and then refer to that from these other
locations.

> +The currently supported values are `always` (Git always recognizes bare
> +repositories) and `never` (Git never recognizes bare repositories).

This sentence structure is likely to change in the future, and as it stands
will become complicated. A bulleted list will have easier edits in the future.

> +This defaults to `always`, but this default is likely to change.

For now, I would say "but this default may change in the future." instead.

> +If your workflow does not rely on bare repositories, it is recommended that
> +you set this value to `never`. This makes repository discovery easier to
> +reason about and prevents certain types of security and non-security
> +problems, such as:
> +

(You might need a "+" here.)

> +* `git clone`-ing a repository containing a malicious bare repository
> +  inside it.
> +* Git recognizing a directory that isn't meant to be a bare repository,
> +  but happens to look like one.

I think these last bits recommending the 'never' option are a bit
distracting. It doesn't make repository discovery "easier to reason
about" because we still discover the bare repo and die() instead of
skipping it and looking higher for a non-bare repository in the
parent directories. The case of an "accidentally-recognized bare
repo" is so unlikely it is probably not worth mention in these docs.

Instead, I think something like this might be better:

  If you do not use bare repositories in your workflow, then it may
  be beneficial to set `discovery.bare` to `never` in your global
  config. This will protect you from attacks that involve cloning a
  repository that contains a bare repository and running a Git
  command within that directory.

> +static int check_bare_repo_allowed(void)
> +{
> +	if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN) {
> +		read_very_early_config(discovery_bare_cb, NULL);

This will add the third place where we use read_very_early_config(),
adding to the existing calls in tr2_sysenv_load() and
ensure_valid_ownership(). If I understand it correctly, that means
that every Git execution in a bare repository will now parse the
system and global config three times.

This doesn't count the check for uploadpack.packobjectshook in
upload-pack.c that uses current_config_scope() to restrict its
value to the system and global config.

We are probably at the point where we need to instead create a
configset that stores this "protected config" and allow us to
lookup config keys directly from that configset instead of
iterating through these config files repeatedly.

> +		/* We didn't find a value; use the default. */
> +		if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN)
> +			discovery_bare_config = DISCOVERY_BARE_ALWAYS;

This could also be done in advance of the config parsing
by setting discovery_bare_config = DISCOVERY_BARE_ALWAYS before
calling read_very_early_config(). Avoids an if and a comment
here, which might be nice.

> +	}
> +	switch (discovery_bare_config) {
> +	case DISCOVERY_BARE_NEVER:
> +		return 0;
> +	case DISCOVERY_BARE_ALWAYS:
> +		return 1;
> +	default:
> +		BUG("invalid discovery_bare_config %d", discovery_bare_config);
> +	}

You return -1 in discovery_bare_cb when the key matches, but
the value is not understood. Should we check the return value
of read_very_early_config(), too?
> +static const char *discovery_bare_config_to_string(void)
> +{
> +	switch (discovery_bare_config) {
> +	case DISCOVERY_BARE_NEVER:
> +		return "never";
> +	case DISCOVERY_BARE_ALWAYS:
> +		return "always";
> +	default:
> +		BUG("invalid discovery_bare_config %d", discovery_bare_config);

In general, I'm not sure these BUG() statements are helpful,
but they aren't hurting anything. I wonder if it would be
better to use DISCOVERY_BARE_UNKNOWN instead of default,
because then the compiler should notice that the switch needs
updating when a new enum mode is added.

> @@ -1142,7 +1195,8 @@ enum discovery_result {
>  	GIT_DIR_HIT_CEILING = -1,
>  	GIT_DIR_HIT_MOUNT_POINT = -2,
>  	GIT_DIR_INVALID_GITFILE = -3,
> -	GIT_DIR_INVALID_OWNERSHIP = -4
> +	GIT_DIR_INVALID_OWNERSHIP = -4,
> +	GIT_DIR_DISALLOWED_BARE = -5

I think that you can add a comma at the end of this enum to avoid the
changed line the next time the enum needs to be expanded.

>  };
>  
>  /*
> @@ -1239,6 +1293,8 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
>  		}
>  
>  		if (is_git_directory(dir->buf)) {
> +			if (!check_bare_repo_allowed())
> +				return GIT_DIR_DISALLOWED_BARE;

Won't this fail if someone runs a Git command inside of a .git/
directory for a non-bare repository? I just want to be sure that
we hit this error instead:

	fatal: this operation must be run in a work tree

I see that this error is tested in t0008-ignores.sh, but that's
with the default "always" value. It would be good to explicitly
check that this is the right error when using the "never" config.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/2] setup.c: learn discovery.bareRepository=cwd
  2022-05-13 23:37   ` [PATCH v2 2/2] setup.c: learn discovery.bareRepository=cwd Glen Choo via GitGitGadget
@ 2022-05-16 18:49     ` Derrick Stolee
  0 siblings, 0 replies; 26+ messages in thread
From: Derrick Stolee @ 2022-05-16 18:49 UTC (permalink / raw)
  To: Glen Choo via GitGitGadget, git
  Cc: Taylor Blau, brian m. carlson, Junio C Hamano, Emily Shaffer, Glen Choo

On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
> From: Glen Choo <chooglen@google.com>
> 
> Add a 'cwd' option to discovery.bareRepository, which allows a bare
> repository to be used if and only if the cwd is the root of a bare
> repository. This covers the common case where a user works with a bare
> repository by cd-ing into the repository's root.

I don't consider this case valuable. In addition to allowing
the most-common use case, it also allows the most-common route
that an attacker would use to try to get a user to run a Git
command in a malicious embedded bare repo. I think we are
better off without it.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/2] setup.c: make bare repo discovery optional
  2022-05-13 23:37 ` [PATCH v2 0/2] " Glen Choo via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-05-16 16:43   ` Junio C Hamano
@ 2022-05-16 19:07   ` Derrick Stolee
  2022-05-16 22:43     ` Taylor Blau
                       ` (2 more replies)
  4 siblings, 3 replies; 26+ messages in thread
From: Derrick Stolee @ 2022-05-16 19:07 UTC (permalink / raw)
  To: Glen Choo via GitGitGadget, git
  Cc: Taylor Blau, brian m. carlson, Junio C Hamano, Emily Shaffer, Glen Choo

On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
> Thanks all for the comments on v1, I've expanded this series somewhat to
> address them,...

Please include a full cover letter with each version, so reviewers
can respond to the full series goals.

Your series here intends to start protecting against malicious
embedded bare repositories by allowing users to opt-in to a more
protected state. When the 'discovery.bare' option is set, then
Git may die() on a bare repository that is discovered based on
the current working directory (these protections are ignored if
the user specifies the directory directly through --git-dir or
$GIT_DIR).

The 'discovery.bare' option has these values at the end of your
series:

* 'always' (default) allows all bare repos, matching the current
  behavior of Git.

* 'never' avoids operating in bare repositories altogether.

* 'cwd' operates in a bare repository only if the current directory
  is exactly the root of the bare repository.

It is important that we keep 'always' as the default at first,
because we do not want to introduce a breaking change without
warning (at least for an issue like this that has been around
for a long time).

The 'never' option is a good one for very security-conscious
users who really want to avoid problems. I don't anticipate that
users who know about this option and set it themselves are the
type that would fall for the social engineering required to
attack using this vector, but I can imagine an IT department
installing the value in system config across a fleet of machines.

I find the 'cwd' option to not be valuable. It unblocks most
existing users, but also almost completely removes the protection
that the option was supposed to provide.

I find neither the 'never' or 'cwd' options an acceptable choice
for a future default.

I also think that this protection is too rigid: it restricts
_all_ bare repositories, not just embedded ones. There is no check
to see if the parent directory of the bare repository is inside a
non-bare repository.

This leads to what I think would be a valuable replacement for
the 'cwd' option:

* 'no-embedded' allows non-embedded bare repositories. An
  _embedded bare repository_ is a bare repository whose parent
  directory is contained in the worktree of a non-bare Git
  repository. When in this mode, embedded bare repositories are
  not allowed unless the parent non-bare Git repository has a
  'safe.embedded' config value storing the path to the current
  embedded bare repository.

That was certainly difficult to write, but here it is as
pseudo-code to hopefully remove some doubt as to how this might
work:

  if repo is bare:
    if value == "always":
       return ALLOWED
    if value == "never":
       return FORBIDDEN;

    path = get_parent_repo()

    if !path:
       return ALLOWED
    
    if config_file_has_value("{path}/.git/config", "safe.embedded", repo):
       return ALLOWED

    return FORBIDDEN

With this kind of option, we can protect users from these
social engineering attacks while providing an opt-in protection
for scenarios where embedded bare repos are currently being used
(while also not breaking anyone using non-embedded bare repos).

I think Taylor was mentioning something like this in his previous
replies, perhaps even to the previous thread on this topic.

This 'no-embedded' option is something that I could see as a
potential new default, after it has proven itself in a released
version of Git.

There are performance drawbacks to checking the parent path for
a Git repo, which is why it is only done when in "no-embedded"
mode.

I mentioned some other concerns in your PATCH 1 about how we
are now adding the third use of read_very_early_config() and that
we should probably refactor that before adding the third option,
in order to avoid additional performance costs as well as it
being difficult to audit which config options are only checked
from these "protected" config files.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/2] setup.c: make bare repo discovery optional
  2022-05-16 18:36     ` Glen Choo
@ 2022-05-16 19:16       ` Junio C Hamano
  2022-05-16 20:27         ` Glen Choo
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2022-05-16 19:16 UTC (permalink / raw)
  To: Glen Choo
  Cc: Glen Choo via GitGitGadget, git, Taylor Blau, brian m. carlson,
	Derrick Stolee, Emily Shaffer, rsbecker

Glen Choo <chooglen@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>>  * die()-ing is necessary if we're trying to flip the default value of
>>>    discovery.bare. We'd expect many bare repo users to be broken, and it's
>>>    more helpful to fail loudly than to silently ignore the bare repo.
>>>
>>>    But in the long term, long after we've flipped the default and users know
>>>    that they need to opt into bare repo discovery, would it be a better UX
>>>    to just silently ignore the bare repo?
>>
>> Would a middle-ground of giving a warning() message help?  Can it be
>> loud and annoying enough to knudge the users to adjust without
>> breaking the functionality?
>
> Personally, when my tool changes its behavior, I would strongly prefer
> it to die than to "change behavior + warn". I'd feel more comfortable
> knowing that the tool did nothing as opposed to doing the wrong thing
> and only being informed after the fact. Also, I sometimes ignore
> warnings ;)

Heh, personally I would try very hard not to change the behaviour
without explicitly asked by the users with configuration or command
line option.  Flipping the default has traditionally been done in
two or three phases.

 (1) We start by giving a loud and annoying warning to those who
     haven't configured and tell them the default *will* change, how
     to keep the current behaviour forever, and how to live in the
     future by adopting the future default early.

 (2) After a while, we flip the default.  Those who haven't
     configured are given a notice that the default has changed, how
     to keep the old behaviour forever, and how to explicitly choose
     the same value as the default to squelch the notice.

 (3) After yet another while, we stop giving the notice.  If we
     omitted (2), here is where we flip the default.

Strictly speaking, we can have (1) in one release and then could
directly jump to (3), but some distros may skip the releases that
has (1), and (2) is an attempt to help users of such distros.

>> Hopefully "git fetch" over ssh:// and file:/// would run the other
>> side with GIT_DIR explicitly set?
>
> Ah, I'll check this and get back to you.
>
>>                                                        I do not yet
>> find these "problems, such as..." so convincing.
>
> What would be a convincing rationale to you? I'll capture that here.

That is a wrong question.  You are the one pushing for castrating
the bare repositories.

> I'm assuming that you already have such an rationale in mind when you
> say that the longer-term default is that "we respect bare repositories
> only if they are the cwd.". I'm also assuming that this rationale is
> something other than embedded bare repos, because "cwd-only" does not
> protect against that.

No, I do not have such a "different" rationale to justify the change
proposed in this patch.  I was saying that the claim "embedded bare
repos are risky", backed by your two examples, did not sound all
that serious a problem.  Presented with a more serious brekage
scenario, it may make the description more convincing.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/2] setup.c: make bare repo discovery optional
  2022-05-16 19:16       ` Junio C Hamano
@ 2022-05-16 20:27         ` Glen Choo
  2022-05-16 22:16           ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Glen Choo @ 2022-05-16 20:27 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Glen Choo via GitGitGadget, git, Taylor Blau, brian m. carlson,
	Derrick Stolee, Emily Shaffer, rsbecker

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>>  * die()-ing is necessary if we're trying to flip the default value of
>>>>    discovery.bare. We'd expect many bare repo users to be broken, and it's
>>>>    more helpful to fail loudly than to silently ignore the bare repo.
>>>>
>>>>    But in the long term, long after we've flipped the default and users know
>>>>    that they need to opt into bare repo discovery, would it be a better UX
>>>>    to just silently ignore the bare repo?
>>>
>>> Would a middle-ground of giving a warning() message help?  Can it be
>>> loud and annoying enough to knudge the users to adjust without
>>> breaking the functionality?
>>
>> Personally, when my tool changes its behavior, I would strongly prefer
>> it to die than to "change behavior + warn". I'd feel more comfortable
>> knowing that the tool did nothing as opposed to doing the wrong thing
>> and only being informed after the fact. Also, I sometimes ignore
>> warnings ;)
>
> Heh, personally I would try very hard not to change the behaviour
> without explicitly asked by the users with configuration or command
> line option.  Flipping the default has traditionally been done in
> two or three phases.
>
>  (1) We start by giving a loud and annoying warning to those who
>      haven't configured and tell them the default *will* change, how
>      to keep the current behaviour forever, and how to live in the
>      future by adopting the future default early.
>
>  (2) After a while, we flip the default.  Those who haven't
>      configured are given a notice that the default has changed, how
>      to keep the old behaviour forever, and how to explicitly choose
>      the same value as the default to squelch the notice.
>
>  (3) After yet another while, we stop giving the notice.  If we
>      omitted (2), here is where we flip the default.
>
> Strictly speaking, we can have (1) in one release and then could
> directly jump to (3), but some distros may skip the releases that
> has (1), and (2) is an attempt to help users of such distros.

Ah, that is very helpful. Thanks. It's pretty clear that I misunderstood
what you meant by "giving a warning() message" - the warning() is there
to prepare users in advance of the change; we don't actually want the
warning() in the long term.

For something as disruptive as discovering bare repos, having all of
(1), (2) and (3) sounds appropriate.

>>> Hopefully "git fetch" over ssh:// and file:/// would run the other
>>> side with GIT_DIR explicitly set?
>>
>> Ah, I'll check this and get back to you.
>>
>>>                                                        I do not yet
>>> find these "problems, such as..." so convincing.
>>
>> What would be a convincing rationale to you? I'll capture that here.
>
> That is a wrong question.  You are the one pushing for castrating
> the bare repositories.

Let me clarify in case this wasn't received the way I intended. Earlier
in the thread, you mentioned:

  The longer-term default should be "cwd is allowed, but we do not
  bother going up from object/04 subdirectory of a bare repository",
  [...]

which I took to mean "Junio thinks that, by default, Git should stop
walking up to find a bare repo, and thinks this is better because of
rationale X.", and not, "Junio does not think that the default needs to
change, but is just suggesting a better default than Glen's".

If it is the former, then there is obviously some thought process here
that is worth sharing.

If it the latter, then I'm in favor of taking Stolee's suggestion to
drop "cwd", since nobody else finds it useful enough. (I like the
'simplification' story, but not enough to push "cwd" through, especially
since it does quite little security-wise.)

>> I'm assuming that you already have such an rationale in mind when you
>> say that the longer-term default is that "we respect bare repositories
>> only if they are the cwd.". I'm also assuming that this rationale is
>> something other than embedded bare repos, because "cwd-only" does not
>> protect against that.
>
> No, I do not have such a "different" rationale to justify the change
> proposed in this patch.  I was saying that the claim "embedded bare
> repos are risky", backed by your two examples, did not sound all
> that serious a problem.  Presented with a more serious brekage
> scenario, it may make the description more convincing.

Fair. I'll mull over this.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/2] setup.c: make bare repo discovery optional
  2022-05-16 20:27         ` Glen Choo
@ 2022-05-16 22:16           ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2022-05-16 22:16 UTC (permalink / raw)
  To: Glen Choo
  Cc: Glen Choo via GitGitGadget, git, Taylor Blau, brian m. carlson,
	Derrick Stolee, Emily Shaffer, rsbecker

Glen Choo <chooglen@google.com> writes:

> which I took to mean "Junio thinks that, by default, Git should stop
> walking up to find a bare repo, and thinks this is better because of
> rationale X."

The X is "it would not break existing use case too badly, just to
address a 'security' story whose severity is not so clearly
expressed".

> If it the latter, then I'm in favor of taking Stolee's suggestion to
> drop "cwd", since nobody else finds it useful enough. (I like the
> 'simplification' story, but not enough to push "cwd" through, especially
> since it does quite little security-wise.)

As long as you'll be there to answer the angry mob that complain
loudly (and irritatingly enough, the only do so after a release is
made to flip the default), I do not care too much either way ;-).

Thanks.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] setup.c: make bare repo discovery optional
  2022-05-16 18:46     ` Derrick Stolee
@ 2022-05-16 22:25       ` Taylor Blau
  2022-05-17 20:24       ` Glen Choo
  1 sibling, 0 replies; 26+ messages in thread
From: Taylor Blau @ 2022-05-16 22:25 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Glen Choo via GitGitGadget, git, brian m. carlson,
	Junio C Hamano, Emily Shaffer, Glen Choo

On Mon, May 16, 2022 at 02:46:55PM -0400, Derrick Stolee wrote:
> On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
> > From: Glen Choo <chooglen@google.com>
> >
> > Add a config variable, `discovery.bare`, that tells Git whether or not
> > it should work with the bare repository it has discovered i.e. Git will
> > die() if it discovers a bare repository, but it is not allowed by
> > `discovery.bare`. This only affects repository discovery, thus it has no
> > effect if discovery was not done (e.g. `--git-dir` was passed).
>
> > This config is an enum of:
> >
> > - ["always"|(unset)]: always recognize bare repositories (like Git does
> >   today)
> > - "never": never recognize bare repositories
> >
> > More values are expected to be added later, and the default is expected
> > to change (i.e. to something other than "always").
>
> I think it is fine to include the "never" option for users to opt-in to
> this super-protected state, but I want to make it very clear that we
> should never move to it as a new default. This phrasing of 'something
> other than "always"' is key, but it might be good to point out that
> "never" is very unlikely to be that default.

I am confused, then.

What does a user who has some legitimate (non-embedded) bare
repositories do if they are skeptical of other bare repositories? I
suspect the best answer we would be able to provide with these patches
is "use `--git-dir`".

What happens to a user who has a combination of legitimate bare
repositories, embedded bare repositories that they trust, and other
embedded bare repositories that they don't?

As far as I can tell, our recommendation with these tools would be to:

  - run `git config --global discovery.bare never`, and
  - include `--git-dir=$(pwd)` in any git invocations in bare
    repositories that they do trust

This gets at my concerns from [1] and [2] (mostly [2], in this case)
that we're trying to close the embedded bare repos problem with an
overly broad solution, at the expense of usability.

I can't shake the feeling that something like I described towards the
bottom of [2] would give you all of the security guarantees you're after
without compromising on usability for non-embedded bare repositories.

I'm happy to explore this direction more myself if you don't want to. I
would just much rather see us adopt an approach that doesn't break more
use-cases than it has to if such a thing can be avoided.

I cannot endorse these patches as-is.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/Ylobp7sntKeWTLDX@nand.local/
[2]: https://lore.kernel.org/git/YnmKwLoQCorBnMe2@nand.local/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/2] setup.c: make bare repo discovery optional
  2022-05-16 19:07   ` Derrick Stolee
@ 2022-05-16 22:43     ` Taylor Blau
  2022-05-16 23:19     ` Junio C Hamano
  2022-05-17 18:56     ` Glen Choo
  2 siblings, 0 replies; 26+ messages in thread
From: Taylor Blau @ 2022-05-16 22:43 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Glen Choo via GitGitGadget, git, brian m. carlson,
	Junio C Hamano, Emily Shaffer, Glen Choo

On Mon, May 16, 2022 at 03:07:35PM -0400, Derrick Stolee wrote:
> On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
> > Thanks all for the comments on v1, I've expanded this series somewhat to
> > address them,...
>
> Please include a full cover letter with each version, so reviewers
> can respond to the full series goals.
>
> Your series here intends to start protecting against malicious
> embedded bare repositories by allowing users to opt-in to a more
> protected state. [...]

Thanks for the summary, which I think will be especially helpful to
others looking at this series for the very first time.

> The 'never' option is a good one for very security-conscious
> users who really want to avoid problems. I don't anticipate that
> users who know about this option and set it themselves are the
> type that would fall for the social engineering required to
> attack using this vector, but I can imagine an IT department
> installing the value in system config across a fleet of machines.

When I first read this, I disagreed, since presumably that same crowd
has legitimate bare repositories that they want to continue being able
to operate in without having to pass `--git-dir` or `$GIT_DIR` in.

In fact...

> I also think that this protection is too rigid: it restricts
> _all_ bare repositories, not just embedded ones. There is no check
> to see if the parent directory of the bare repository is inside a
> non-bare repository.

...this resonates quite a bit more with me. "never" isn't a good option
unless you aren't a user of bare repositories _and_ don't have any
embedded bare repositories (either at all, or any ones that you trust).

> This leads to what I think would be a valuable replacement for
> the 'cwd' option:
>
> * 'no-embedded' allows non-embedded bare repositories. An
>   _embedded bare repository_ is a bare repository whose parent
>   directory is contained in the worktree of a non-bare Git
>   repository. When in this mode, embedded bare repositories are
>   not allowed unless the parent non-bare Git repository has a
>   'safe.embedded' config value storing the path to the current
>   embedded bare repository.
>
> That was certainly difficult to write, but here it is as
> pseudo-code to hopefully remove some doubt as to how this might
> work:
>
>   if repo is bare:
>     if value == "always":
>        return ALLOWED
>     if value == "never":
>        return FORBIDDEN;

This is indeed very similar to a proposal I had made upthread (which you
note lower down in this email). One thing that's nice is that we only
have to traverse up to the parent repo when in the "no-embedded" mode.
That may be slow (since it's unbounded all the way up to the filesystem
root or a ceiling directory, whichever we encounter first), but I think
it's unavoidable if you need to distinguish between embedded and
non-embedded bare repositories.

>     path = get_parent_repo()
>
>     if !path:
>        return ALLOWED
>
>     if config_file_has_value("{path}/.git/config", "safe.embedded", repo):
>        return ALLOWED
>
>     return FORBIDDEN
>
> With this kind of option, we can protect users from these
> social engineering attacks while providing an opt-in protection
> for scenarios where embedded bare repos are currently being used
> (while also not breaking anyone using non-embedded bare repos).
>
> I think Taylor was mentioning something like this in his previous
> replies, perhaps even to the previous thread on this topic.

Yep, see: https://lore.kernel.org/git/Ylobp7sntKeWTLDX@nand.local/.a

> This 'no-embedded' option is something that I could see as a
> potential new default, after it has proven itself in a released
> version of Git.

I would be totally happy to see "no-embedded" become the default. It
might be nice to issue a warning when the top-level config is unset,
to give users a heads up about cases that may be broken, perhaps like:

    if repo is bare:
      switch (value) {
      case "always":
        return ALLOWED;
      case "never":
        return FORBIDDEN;
      case "no-embedded": # fallthrough
      case "":
        path = get_parent_repo()
        if !path
          return ALLOWED;

        if config_file_has_value("{path}/.git/config", "safe.embedded", repo)
          return ALLOWED;

        if value == "no-embedded":
          return FORBIDDEN;

        # otherwise, we're in an embedded bare repository with an unset
        # discovery.bare config.
        #
        # warn that this will break in the future...
        warning(_("%s is embedded within %s"), the_repository.path, path);
        advise(_("to allow discovery for this embedded repo, either run"));
        advise(_(""));
        advise(_("  $ git config --global discovery.bare always, or"));
        advise(_("  $ git -C '%s' config --local safe.embedded '%s'"),
               path, relpath(path, the_repository.path));

        # ...but allow the invocation for now until the default is
        # changed.
        return ALLOWED;
      default:
        die(_("unrecognized value of discovery.bare: '%s'"), value);
      }

...where relpath is similar to Go's path/filepath.Rel function.

With an appropriate deprecation period, I think we could even get away
from the "continue executing, but don't read config+hooks", which in
retrospect is more error-prone and difficult to reason about than I
initially had given it credit for.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/2] setup.c: make bare repo discovery optional
  2022-05-16 19:07   ` Derrick Stolee
  2022-05-16 22:43     ` Taylor Blau
@ 2022-05-16 23:19     ` Junio C Hamano
  2022-05-17 18:56     ` Glen Choo
  2 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2022-05-16 23:19 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Glen Choo via GitGitGadget, git, Taylor Blau, brian m. carlson,
	Emily Shaffer, Glen Choo

Derrick Stolee <derrickstolee@github.com> writes:

> * 'no-embedded' allows non-embedded bare repositories. An
>   _embedded bare repository_ is a bare repository whose parent
>   directory is contained in the worktree of a non-bare Git
>   repository. When in this mode, embedded bare repositories are
>   not allowed unless the parent non-bare Git repository has a
>   'safe.embedded' config value storing the path to the current
>   embedded bare repository.

Sounds sensible.  I wonder how expensive this will be in practice,
but the behaviour seems well thought out.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/2] setup.c: make bare repo discovery optional
  2022-05-16 19:07   ` Derrick Stolee
  2022-05-16 22:43     ` Taylor Blau
  2022-05-16 23:19     ` Junio C Hamano
@ 2022-05-17 18:56     ` Glen Choo
  2 siblings, 0 replies; 26+ messages in thread
From: Glen Choo @ 2022-05-17 18:56 UTC (permalink / raw)
  To: Derrick Stolee, Glen Choo via GitGitGadget, git
  Cc: Taylor Blau, brian m. carlson, Junio C Hamano, Emily Shaffer


Thanks, I think this has advanced the conversation quite a bit.

Derrick Stolee <derrickstolee@github.com> writes:

> On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
>> Thanks all for the comments on v1, I've expanded this series somewhat to
>> address them,...
>
> Please include a full cover letter with each version, so reviewers
> can respond to the full series goals.
>
> Your series here intends to start protecting against malicious
> embedded bare repositories by allowing users to opt-in to a more
> protected state. When the 'discovery.bare' option is set, then
> Git may die() on a bare repository that is discovered based on
> the current working directory (these protections are ignored if
> the user specifies the directory directly through --git-dir or
> $GIT_DIR).
>
> The 'discovery.bare' option has these values at the end of your
> series:
>
> * 'always' (default) allows all bare repos, matching the current
>   behavior of Git.
>
> * 'never' avoids operating in bare repositories altogether.
>
> * 'cwd' operates in a bare repository only if the current directory
>   is exactly the root of the bare repository.

My mistake, I should have prepared this summary myself. Thanks again.

> It is important that we keep 'always' as the default at first,
> because we do not want to introduce a breaking change without
> warning (at least for an issue like this that has been around
> for a long time).

Yes.

> The 'never' option is a good one for very security-conscious
> users who really want to avoid problems. I don't anticipate that
> users who know about this option and set it themselves are the
> type that would fall for the social engineering required to
> attack using this vector, but I can imagine an IT department
> installing the value in system config across a fleet of machines.

Yes. Setting the 'never' option in a system config is the use case that
motivated this.

> I find the 'cwd' option to not be valuable. It unblocks most
> existing users, but also almost completely removes the protection
> that the option was supposed to provide.

Ok, I agree that it provides next-to-no protection. I'll drop it in this
series; it's easy enough to reimplement if users really want it anyway.

> This leads to what I think would be a valuable replacement for
> the 'cwd' option:
>
> * 'no-embedded' allows non-embedded bare repositories. An
>   _embedded bare repository_ is a bare repository whose parent
>   directory is contained in the worktree of a non-bare Git
>   repository. When in this mode, embedded bare repositories are
>   not allowed unless the parent non-bare Git repository has a
>   'safe.embedded' config value storing the path to the current
>   embedded bare repository.
>
> That was certainly difficult to write, but here it is as
> pseudo-code to hopefully remove some doubt as to how this might
> work:
>
>   if repo is bare:
>     if value == "always":
>        return ALLOWED
>     if value == "never":
>        return FORBIDDEN;
>
>     path = get_parent_repo()
>
>     if !path:
>        return ALLOWED
>     
>     if config_file_has_value("{path}/.git/config", "safe.embedded", repo):
>        return ALLOWED
>
>     return FORBIDDEN
>
> With this kind of option, we can protect users from these
> social engineering attacks while providing an opt-in protection
> for scenarios where embedded bare repos are currently being used
> (while also not breaking anyone using non-embedded bare repos).

[...]

> This 'no-embedded' option is something that I could see as a
> potential new default, after it has proven itself in a released
> version of Git.

I agree, this sounds like a good default that should work for most
users.

That said, I don't think I will implement it, and even if I do, it won't
be in this series. I have serious doubts that I'd be able to deliver it
in a reasonable amount of time (I tried preparing patches to this effect
and failed [1]), and 'never' is sufficient for $DAYJOB's current needs.

I would be very happy to see this come to fruition though. I have no
objections to anyone preparing patches for this, and I'll gladly review
those if that's helpful.

[1] The specific trouble I had was figuring out whether or not the
 'parent' repo was tracking the bare repo, since an untracked bare repo
 in the working tree isn't (in some sense) really "embedded" and it
 can't have come from a remote.

 But maybe the tracking check is unnecessary. We would break a few more
 users without it, but 'safe.embedded' is an easy enough way for a user
 to unbreak themselves.

> I mentioned some other concerns in your PATCH 1 about how we
> are now adding the third use of read_very_early_config() and that
> we should probably refactor that before adding the third option,
> in order to avoid additional performance costs as well as it
> being difficult to audit which config options are only checked
> from these "protected" config files.

Makes sense. I'll ask about specifics on that subthread.

>
> Thanks,
> -Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] setup.c: make bare repo discovery optional
  2022-05-16 18:46     ` Derrick Stolee
  2022-05-16 22:25       ` Taylor Blau
@ 2022-05-17 20:24       ` Glen Choo
  2022-05-17 21:51         ` Glen Choo
  1 sibling, 1 reply; 26+ messages in thread
From: Glen Choo @ 2022-05-17 20:24 UTC (permalink / raw)
  To: Derrick Stolee, Glen Choo via GitGitGadget, git
  Cc: Taylor Blau, brian m. carlson, Junio C Hamano, Emily Shaffer

Thanks for being thorough, I find it really helpful.

For brevity, I won't reply to comments that I think are obviously good,
so you can assume I'll incorproate anything that isn't commented on.

Derrick Stolee <derrickstolee@github.com> writes:

> On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
>> From: Glen Choo <chooglen@google.com>
>> 
>> +This config setting is only respected when specified in a system or global
>> +config, not when it is specified in a repository config or via the command
>> +line option `-c discovery.bare=<value>`.
>
> We are sprinkling config options that have these same restrictions throughout
> the config documentation. It might be time to define a term like "protected
> config" at the top of git-config.txt and then refer to that from these other
> locations.

Agree, and I think defining the term will be useful in future on-list
discussions.

>> +static int check_bare_repo_allowed(void)
>> +{
>> +	if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN) {
>> +		read_very_early_config(discovery_bare_cb, NULL);
>
> This will add the third place where we use read_very_early_config(),
> adding to the existing calls in tr2_sysenv_load() and
> ensure_valid_ownership(). If I understand it correctly, that means
> that every Git execution in a bare repository will now parse the
> system and global config three times.
>
> This doesn't count the check for uploadpack.packobjectshook in
> upload-pack.c that uses current_config_scope() to restrict its
> value to the system and global config.
>
> We are probably at the point where we need to instead create a
> configset that stores this "protected config" and allow us to
> lookup config keys directly from that configset instead of
> iterating through these config files repeatedly.

Looking at all of the read_very_early_config() calls,

- check_bare_repo_allowed() can use git_configset_get_string()
- ensure_valid_ownership() can use git_configset_get_value_multi()
- tr2_sysenv_load() reads every value with the "trace2." prefix. AFAICT
  configsets only support exact key lookups and I don't see an easy way
  teach configsets to support prefix lookups.

(I didn't look too closely at uploadpack.packobjectshook because I don't
know enough about config scopes to comment.)

So using a configset, we'll still need to read the config files at least
twice. That's better than thrice, but it doesn't cover the
tr2_sysenv_load() use case, and we'll run into this yet again if add
function that reads all config values with a given prefix.

An hacky alternative that covers all of these use cases would be to read
all protected config in a single pass, e.g.

  static struct protected_config {
         struct safe_directory_data safe_directory_data;
         const char *discovery_bare;
         struct string_list tr2_sysenv;
  };

  static int protected_config_cb()
  {
    /* Parse EVERYTHING that belongs in protected_config. */
  }

but protected_config_cb() would have to parse too many unrelated things
for my liking.

So I'll use the configset for the cases where the key is known, and
perhaps we'll punt on tr2_sysenv_load().

>> +	}
>> +	switch (discovery_bare_config) {
>> +	case DISCOVERY_BARE_NEVER:
>> +		return 0;
>> +	case DISCOVERY_BARE_ALWAYS:
>> +		return 1;
>> +	default:
>> +		BUG("invalid discovery_bare_config %d", discovery_bare_config);
>> +	}
>
> You return -1 in discovery_bare_cb when the key matches, but
> the value is not understood. Should we check the return value
> of read_very_early_config(), too?

This comment doesn't apply because unlike most other config reading
functions, read_very_early_config() and read_early_config() die when the
callback returns -1.

I'm not sure why this is the case though, and maybe you think there is
value in having a non-die()-ing variant, e.g.
read_very_early_config_gently()?

>>  };
>>  
>>  /*
>> @@ -1239,6 +1293,8 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
>>  		}
>>  
>>  		if (is_git_directory(dir->buf)) {
>> +			if (!check_bare_repo_allowed())
>> +				return GIT_DIR_DISALLOWED_BARE;
>
> Won't this fail if someone runs a Git command inside of a .git/
> directory for a non-bare repository? I just want to be sure that
> we hit this error instead:
>
> 	fatal: this operation must be run in a work tree
>
> I see that this error is tested in t0008-ignores.sh, but that's
> with the default "always" value. It would be good to explicitly
> check that this is the right error when using the "never" config.

Yes, it will fail if run inside of a .git/ directory. "never" prevents
you from working from inside .git/ unless you set GIT_DIR.

IIRC, we don't show "fatal: this operation must be run in a work
tree" for every Git command, e.g. "git log" works just fine. It makes
sense to show this warning when the CWD supports 'some, but not all' Git
commands, but I don't think this is valuable if we forbid *all* Git
commands.

Instead of trying to make "never" accomodate this use case, perhaps what
we want is a "dotgit-only" option that allows a bare repository if it is
below a .git/ directory. Since we forbid .git in the index, this seems
somewhat safe, but I hadn't proposed this sooner because I don't know if
we need it yet, and I'm certain that there are less secure edge cases
that need to be thought through.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] setup.c: make bare repo discovery optional
  2022-05-17 20:24       ` Glen Choo
@ 2022-05-17 21:51         ` Glen Choo
  0 siblings, 0 replies; 26+ messages in thread
From: Glen Choo @ 2022-05-17 21:51 UTC (permalink / raw)
  To: Derrick Stolee, Glen Choo via GitGitGadget, git
  Cc: Taylor Blau, brian m. carlson, Junio C Hamano, Emily Shaffer

Glen Choo <chooglen@google.com> writes:

>>> +static int check_bare_repo_allowed(void)
>>> +{
>>> +	if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN) {
>>> +		read_very_early_config(discovery_bare_cb, NULL);
>>
>> This will add the third place where we use read_very_early_config(),
>> adding to the existing calls in tr2_sysenv_load() and
>> ensure_valid_ownership(). If I understand it correctly, that means
>> that every Git execution in a bare repository will now parse the
>> system and global config three times.
>>
>> This doesn't count the check for uploadpack.packobjectshook in
>> upload-pack.c that uses current_config_scope() to restrict its
>> value to the system and global config.
>>
>> We are probably at the point where we need to instead create a
>> configset that stores this "protected config" and allow us to
>> lookup config keys directly from that configset instead of
>> iterating through these config files repeatedly.
>
> Looking at all of the read_very_early_config() calls,
>
> - check_bare_repo_allowed() can use git_configset_get_string()
> - ensure_valid_ownership() can use git_configset_get_value_multi()
> - tr2_sysenv_load() reads every value with the "trace2." prefix. AFAICT
>   configsets only support exact key lookups and I don't see an easy way
>   teach configsets to support prefix lookups.
>
> (I didn't look too closely at uploadpack.packobjectshook because I don't
> know enough about config scopes to comment.)
>
> So using a configset, we'll still need to read the config files at least
> twice. That's better than thrice, but it doesn't cover the
> tr2_sysenv_load() use case, and we'll run into this yet again if add
> function that reads all config values with a given prefix.
>
> An hacky alternative that covers all of these use cases would be to read
> all protected config in a single pass, e.g.
>
>   static struct protected_config {
>          struct safe_directory_data safe_directory_data;
>          const char *discovery_bare;
>          struct string_list tr2_sysenv;
>   };
>
>   static int protected_config_cb()
>   {
>     /* Parse EVERYTHING that belongs in protected_config. */
>   }
>
> but protected_config_cb() would have to parse too many unrelated things
> for my liking.
>
> So I'll use the configset for the cases where the key is known, and
> perhaps we'll punt on tr2_sysenv_load().

Since I'm trying to replace read_very_early_config() anyway, is this a
good time to teach git to respect "-c safe.directory"?

My understanding of [1] is that we only ignore "-c safe.directory"
because read_very_early_config() doesn't support it, but we would prefer
to support it if we could.

[1] https://lore.kernel.org/git/xmqqlevabcsu.fsf@gitster.g/

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-05-17 21:52 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-06 18:30 [PATCH] [RFC] setup.c: make bare repo discovery optional Glen Choo via GitGitGadget
2022-05-06 20:33 ` Junio C Hamano
2022-05-09 21:42 ` Taylor Blau
2022-05-09 22:54   ` Junio C Hamano
2022-05-09 23:57     ` Taylor Blau
2022-05-10  0:23       ` Junio C Hamano
2022-05-10 22:00   ` Glen Choo
2022-05-13 23:37 ` [PATCH v2 0/2] " Glen Choo via GitGitGadget
2022-05-13 23:37   ` [PATCH v2 1/2] " Glen Choo via GitGitGadget
2022-05-16 18:12     ` Glen Choo
2022-05-16 18:46     ` Derrick Stolee
2022-05-16 22:25       ` Taylor Blau
2022-05-17 20:24       ` Glen Choo
2022-05-17 21:51         ` Glen Choo
2022-05-13 23:37   ` [PATCH v2 2/2] setup.c: learn discovery.bareRepository=cwd Glen Choo via GitGitGadget
2022-05-16 18:49     ` Derrick Stolee
2022-05-16 16:40   ` [PATCH v2 0/2] setup.c: make bare repo discovery optional Junio C Hamano
2022-05-16 18:36     ` Glen Choo
2022-05-16 19:16       ` Junio C Hamano
2022-05-16 20:27         ` Glen Choo
2022-05-16 22:16           ` Junio C Hamano
2022-05-16 16:43   ` Junio C Hamano
2022-05-16 19:07   ` Derrick Stolee
2022-05-16 22:43     ` Taylor Blau
2022-05-16 23:19     ` Junio C Hamano
2022-05-17 18:56     ` Glen Choo

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).