git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Glen Choo <chooglen@google.com>
To: Calvin Wan <calvinwan@google.com>, git@vger.kernel.org
Cc: Josh Steadmon <steadmon@google.com>,
	peff@peff.net, gitster@pobox.com,
	Calvin Wan <calvinwan@google.com>
Subject: Re: [PATCH v2 6/6] add: reject nested repositories
Date: Mon, 06 Mar 2023 18:04:08 -0800	[thread overview]
Message-ID: <kl6lilfdiag7.fsf@chooglen-macbookpro.roam.corp.google.com> (raw)
In-Reply-To: <20230228185642.2357806-6-calvinwan@google.com>

Calvin Wan <calvinwan@google.com> writes:

> At $dayjob, we have found that even this advice is insufficient to stop
> users from committing unclonable embedded repos in shared projects.
> This causes toil for the owners of the top-level project repository as
> they must clean up the resulting gitlinks. Additionally, these mistakes
> are often made by partners outside of $dayjob, which means that a simple
> organization-wide change to the default Git config would be insufficient
> to prevent these mistakes.
>
> Due to this experience, we believe that Git's default behavior should be
> changed to disallow adding embedded repositories. This commit changes
> the existing warning into a fatal error, rewrites the advice given, and
> deprecates `--no-warn-embedded-repo` in favor of `--allow-embedded-repo`
> to bypass the fatal error.

Sounds good, thanks.

> ---no-warn-embedded-repo::
> -	By default, `git add` will warn when adding an embedded
> +--allow-embedded-repo::
> +	By default, `git add` will error out when adding an embedded
>  	repository to the index without using `git submodule add` to
> -	create an entry in `.gitmodules`. This option will suppress the
> -	warning (e.g., if you are manually performing operations on
> -	submodules).
> +	create an entry in `.gitmodules`. This option will allow the
> +	embedded repository to be added. (e.g., if you are manually
> +	performing operations on submodules).

Okay.

> +--no-warn-embedded-repo::
> +	This option is deprecated in favor of '--add-embedded-repo'.
> +	Passing this option still suppresses advice but does not bypass
> +	the error.

Hm, why would a user want to suppress the warning but still have "git
add" fail?

If this is for backwards compatibility, i.e. users get the same behavior
if they pass "--no-warn-embedded-repo" before and after this patch, then
shouldn't this also allow the embedded repo to be added (i.e. it is an
alias of --allow-embedded-repo)?

> @@ -409,48 +412,53 @@ static int add_config(const char *var, const char *value, void *cb)
>  }
>  
>  static const char embedded_advice[] = N_(
> -"You've added another git repository inside your current repository.\n"
> +"You attempted to add another git repository inside your current repository.\n"
>  "Clones of the outer repository will not contain the contents of\n"
>  "the embedded repository and will not know how to obtain it.\n"
>  "If you meant to add a submodule, use:\n"
>  "\n"
>  "	git submodule add <url> %s\n"
>  "\n"
> -"If you added this path by mistake, you can remove it from the\n"
> -"index with:\n"
> +"See \"git help submodule\" for more information.\n"
>  "\n"
> -"	git rm --cached %s\n"
> +"If you cannot use submodules, you may bypass this check with:\n"
>  "\n"
> -"See \"git help submodule\" for more information."
> +"	git add --allow-embedded-repo %s\n"
>  );

Is there a particular reason you reordered the

  "See \"git help submodule\" for more information.\n"

line? I personally like it at the end, and if a user is already used to
looking for it at the end, they can keep looking there.

> -static void check_embedded_repo(const char *path)
> +static int check_embedded_repo(const char *path)
>  {
> +	int ret = 0;
>  	struct strbuf name = STRBUF_INIT;
>  	static int adviced_on_embedded_repo = 0;
>  
> -	if (!warn_on_embedded_repo)
> -		return;
> +	if (allow_embedded_repo)
> +		goto cleanup;
>  	if (!ends_with(path, "/"))
> -		return;
> +		goto cleanup;
> +
> +	ret = 1;
>  
>  	/* Drop trailing slash for aesthetics */
>  	strbuf_addstr(&name, path);
>  	strbuf_strip_suffix(&name, "/");
>  
> -	warning(_("adding embedded git repository: %s"), name.buf);
> +	error(_("cannot add embedded git repository: %s"), name.buf);
>  	if (!adviced_on_embedded_repo &&
> -	    advice_enabled(ADVICE_ADD_EMBEDDED_REPO)) {
> +		warn_on_embedded_repo &&
> +		advice_enabled(ADVICE_ADD_EMBEDDED_REPO)) {
>  		advise(embedded_advice, name.buf, name.buf);
>  		adviced_on_embedded_repo = 1;
>  	}
>  
> +cleanup:
>  	strbuf_release(&name);
> +	return ret;
>  }

So we give an error message when we first encounter the embedded repo,
okay...

>  static int add_files(struct dir_struct *dir, int flags)
>  {
> -	int i, exit_status = 0;
> +	int i, exit_status = 0, embedded_repo = 0;
>  	struct string_list matched_sparse_paths = STRING_LIST_INIT_NODUP;
>  
>  	if (dir->ignored_nr) {
> @@ -476,10 +484,13 @@ static int add_files(struct dir_struct *dir, int flags)
>  				die(_("adding files failed"));
>  			exit_status = 1;
>  		} else {
> -			check_embedded_repo(dir->entries[i]->name);
> +			embedded_repo |= check_embedded_repo(dir->entries[i]->name);
>  		}
>  	}
>  
> +	if (embedded_repo)
> +		die(_("refusing to add embedded git repositories"));
> +

And then we die(), giving a similar message again. That feels like
overkill, but I'm not sure if it's unidiomatic.

Is there a reason why we shouldn't just die() the first time we
encounter the embedded repo? The difference is that in this patch, we
actually add all of the non-submodule files before die()-ing, but I'm
not sure if that's what users would expect. Personally at least, I'd
expect "git add" to abort the moment it encountered something wrong.

> diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
> index c70d11bc91..a53dac5931 100755
> --- a/t/t0008-ignores.sh
> +++ b/t/t0008-ignores.sh
> @@ -191,7 +191,7 @@ test_expect_success 'setup' '
>  		git add a &&
>  		git commit -m"commit in submodule"
>  	) &&
> -	git add a/submodule &&
> +	git add --allow-embedded-repo a/submodule &&
>  	cat <<-\EOF >.gitignore &&
>  		one
>  		ignored-*

Given that --allow-embedded-repo is meant to be an escape hatch, and
we've already adjusted so many tests to avoid using it, I'd imagine that
any uses of it in the test suite have to be quite well-justified. It
would have been helpful to see such a justification in the comments on
each of the changes or some general principles in the commit message.

I've tried testing most of these with "git submodule add". This one
works around a test failure in test 346 "submodule", which checks that
ignores should fail if the path is in the submodule, like

	test_check_ignore "a/submodule/one" 128 &&
	test_stderr "fatal: Pathspec '\''a/submodule/one'\'' is in submodule '\''a/submodule'\''"

I don't understand why git-check-ignore should succeed on submodules but
not an embedded repo. I'll investigate more later.

> diff --git a/t/t2103-update-index-ignore-missing.sh b/t/t2103-update-index-ignore-missing.sh
> index 11bc136f6e..1ce4fc49fa 100755
> --- a/t/t2103-update-index-ignore-missing.sh
> +++ b/t/t2103-update-index-ignore-missing.sh
> @@ -36,7 +36,7 @@ test_expect_success basics '
>  		git add file &&
>  		git commit -m "sub initial"
>  	) &&
> -	git add ./xyzzy &&
> +	git add --allow-embedded-repo ./xyzzy &&
>  
>  	test_tick &&
>  	git commit -m initial &&

This seems to pass with "git submodule add", but maybe we don't want to
introduce .gitmodules into a low level test.

> diff --git a/t/t4035-diff-quiet.sh b/t/t4035-diff-quiet.sh
> index 76f8034c60..bfd87891f4 100755
> --- a/t/t4035-diff-quiet.sh
> +++ b/t/t4035-diff-quiet.sh
> @@ -66,7 +66,7 @@ test_expect_success 'git diff-index --cached HEAD^' '
>  test_expect_success 'git diff-index --cached HEAD^' '
>  	echo text >>b &&
>  	echo 3 >c &&
> -	git add . &&
> +	git add --allow-embedded-repo . &&
>  	test_expect_code 1 git diff-index --quiet --cached HEAD^ >cnt &&
>  	test_line_count = 0 cnt
>  '

This is also a low-level test for gitlinks in the index, so eliminating
noise from .gitmodules also seems okay I think.

> diff --git a/t/t6430-merge-recursive.sh b/t/t6430-merge-recursive.sh
> index 07067bb347..ae435fa492 100755
> --- a/t/t6430-merge-recursive.sh
> +++ b/t/t6430-merge-recursive.sh
> @@ -677,7 +677,7 @@ test_expect_success 'merging with triple rename across D/F conflict' '
>  	echo content3 >sub2/file3 &&
>  	mkdir simple &&
>  	echo base >simple/bar &&
> -	git add -A &&
> +	git add -A --allow-embedded-repo &&
>  	test_tick &&
>  	git commit -m base &&
>  

Similarly, this tests low-level gitlink merging, so eliminating
.gitmodules sounds okay.

> diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
> index eae6a46ef3..18ef9141b7 100755
> --- a/t/t7400-submodule-basic.sh
> +++ b/t/t7400-submodule-basic.sh
> @@ -118,7 +118,7 @@ test_expect_success 'setup - repository in init subdirectory' '
>  test_expect_success 'setup - commit with gitlink' '
>  	echo a >a &&
>  	echo z >z &&
> -	git add a init z &&
> +	git add --allow-embedded-repo a init z &&
>  	git commit -m "super commit 1"
>  '
>  

This is needed because test 38

  test_expect_success 'status should fail for unmapped paths' '
    test_must_fail git submodule status
  '

presumably requires the submodule to not be in .gitmodules. This needs a
comment, I think.

> @@ -771,7 +771,7 @@ test_expect_success 'set up for relative path tests' '
>  			git init &&
>  			test_commit foo
>  		) &&
> -		git add sub &&
> +		git add --allow-embedded-repo sub &&
>  		git config -f .gitmodules submodule.sub.path sub &&
>  		git config -f .gitmodules submodule.sub.url ../subrepo &&
>  		cp .git/config pristine-.git-config &&

In the next test, we're checking that "git submodule init" fixes a
gitlink without .gitmodules. Okay. Might benefit from a comment.

> diff --git a/t/t7412-submodule-absorbgitdirs.sh b/t/t7412-submodule-absorbgitdirs.sh
> index 2859695c6d..d1662aa23c 100755
> --- a/t/t7412-submodule-absorbgitdirs.sh
> +++ b/t/t7412-submodule-absorbgitdirs.sh
> @@ -100,7 +100,7 @@ test_expect_success 'absorb the git dir in a nested submodule' '
>  test_expect_success 'setup a gitlink with missing .gitmodules entry' '
>  	git init sub2 &&
>  	test_commit -C sub2 first &&
> -	git add sub2 &&
> +	git add --allow-embedded-repo sub2 &&
>  	git commit -m superproject
>  '

The test is literally called 'setup a gitlink with missing .gitmodules
entry', so okay.

> diff --git a/t/t7450-bad-git-dotfiles.sh b/t/t7450-bad-git-dotfiles.sh
> index ba1f569bcb..4b3010c9e2 100755
> --- a/t/t7450-bad-git-dotfiles.sh
> +++ b/t/t7450-bad-git-dotfiles.sh
> @@ -307,7 +307,7 @@ test_expect_success 'git dirs of sibling submodules must not be nested' '
>  		EOF
>  		git clone . thing1 &&
>  		git clone . thing2 &&
> -		git add .gitmodules thing1 thing2 &&
> +		git add --allow-embedded-repo .gitmodules thing1 thing2 &&
>  		test_tick &&
>  		git commit -m nested
>  	) &&

This one can be adjusted. The important thing is that .gitmodules is bad
_after_ we have "git submodule add"-ed the submodules, so this works:

	git init nested &&
	test_commit -C nested nested &&
	(
		cd nested &&
		git clone . thing1 &&
		git clone . thing2 &&
		git submodule add ./thing1 &&
		git submodule add ./thing2 &&
		cat >.gitmodules <<-EOF &&
		[submodule "hippo"]
			url = .
			path = thing1
		[submodule "hippo/hooks"]
			url = .
			path = thing2
		EOF
		git add .gitmodules &&
		test_tick &&
		git commit -m nested
	) &&
	test_must_fail git clone --recurse-submodules nested clone 2>err &&
	test_i18ngrep "is inside git dir" err

> -- 
> 2.39.2.722.g9855ee24e9-goog

      reply	other threads:[~2023-03-07  2:04 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-13 18:21 [RFC PATCH 0/6] add: block invalid submodules Calvin Wan
2023-02-13 18:21 ` [RFC PATCH 1/6] leak fix: cache_put_path Calvin Wan
2023-02-13 19:23   ` Junio C Hamano
2023-02-14 19:56     ` Calvin Wan
2023-02-14 21:08       ` Junio C Hamano
2023-02-14 21:39         ` Calvin Wan
2023-02-14 21:59           ` Junio C Hamano
2023-02-13 18:21 ` [RFC PATCH 2/6] t4041, t4060: modernize test style Calvin Wan
2023-02-13 19:41   ` Junio C Hamano
2023-02-14 20:22     ` Calvin Wan
2023-02-13 18:21 ` [RFC PATCH 3/6] tests: Use `git submodule add` instead of `git add` Calvin Wan
2023-02-13 18:21 ` [RFC PATCH 4/6] tests: use `git submodule add` and fix expected diffs Calvin Wan
2023-02-13 23:07   ` Junio C Hamano
2023-02-13 23:19     ` Junio C Hamano
2023-02-13 18:21 ` [RFC PATCH 5/6] tests: use `git submodule add` and fix expected status Calvin Wan
2023-02-13 18:21 ` [RFC PATCH 6/6] add: reject nested repositories Calvin Wan
2023-02-13 20:42   ` Jeff King
2023-02-14  2:17     ` Junio C Hamano
2023-02-14 16:07       ` Jeff King
2023-02-14 16:32         ` Junio C Hamano
2023-02-14 21:45           ` Calvin Wan
2023-02-28 18:52 ` [PATCH v2 0/6] add: block invalid submodules Calvin Wan
2023-02-28 18:56   ` [PATCH v2 1/6] t4041, t4060: modernize test style Calvin Wan
2023-03-06 19:32     ` Glen Choo
2023-03-06 20:40       ` Calvin Wan
2023-02-28 18:56   ` [PATCH v2 2/6] tests: Use `git submodule add` instead of `git add` Calvin Wan
2023-02-28 23:30     ` Junio C Hamano
2023-03-03  0:16       ` Calvin Wan
2023-03-06 21:26     ` Glen Choo
2023-02-28 18:56   ` [PATCH v2 3/6] tests: use `git submodule add` and fix expected diffs Calvin Wan
2023-03-06 23:34     ` Glen Choo
2023-03-06 23:57       ` Junio C Hamano
2023-02-28 18:56   ` [PATCH v2 4/6] tests: use `git submodule add` and fix expected status Calvin Wan
2023-03-07  0:15     ` Glen Choo
2023-02-28 18:56   ` [PATCH v2 5/6] tests: remove duplicate .gitmodules path Calvin Wan
2023-02-28 23:35     ` Junio C Hamano
2023-03-02 23:09       ` Calvin Wan
2023-03-07  0:51     ` Glen Choo
2023-02-28 18:56   ` [PATCH v2 6/6] add: reject nested repositories Calvin Wan
2023-03-07  2:04     ` Glen Choo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=kl6lilfdiag7.fsf@chooglen-macbookpro.roam.corp.google.com \
    --to=chooglen@google.com \
    --cc=calvinwan@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).