git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
From: Elijah Newren <newren@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Rafael Ascensão" <rafa.almas@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Samuel Lijin" <sxlijin@gmail.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v3 10/12] clean: avoid removing untracked files in a nested git repository
Date: Thu, 12 Sep 2019 15:12:38 -0700
Message-ID: <20190912221240.18057-11-newren@gmail.com> (raw)
In-Reply-To: <20190912221240.18057-1-newren@gmail.com>

Users expect files in a nested git repository to be left alone unless
sufficiently forced (with two -f's).  Unfortunately, in certain
circumstances, git would delete both tracked (and possibly dirty) files
and untracked files within a nested repository.  To explain how this
happens, let's contrast a couple cases.  First, take the following
example setup (which assumes we are already within a git repo):

   git init nested
   cd nested
   >tracked
   git add tracked
   git commit -m init
   >untracked
   cd ..

In this setup, everything works as expected; running 'git clean -fd'
will result in fill_directory() returning the following paths:
   nested/
   nested/tracked
   nested/untracked
and then correct_untracked_entries() would notice this can be compressed
to
   nested/
and then since "nested/" is a directory, we would call
remove_dirs("nested/", ...), which would
check is_nonbare_repository_dir() and then decide to skip it.

However, if someone also creates an ignored file:
   >nested/ignored
then running 'git clean -fd' would result in fill_directory() returning
the same paths:
   nested/
   nested/tracked
   nested/untracked
but correct_untracked_entries() will notice that we had ignored entries
under nested/ and thus simplify this list to
   nested/tracked
   nested/untracked
Since these are not directories, we do not call remove_dirs() which was
the only place that had the is_nonbare_repository_dir() safety check --
resulting in us deleting both the untracked file and the tracked (and
possibly dirty) file.

One possible fix for this issue would be walking the parent directories
of each path and checking if they represent nonbare repositories, but
that would be wasteful.  Even if we added caching of some sort, it's
still a waste because we should have been able to check that "nested/"
represented a nonbare repository before even descending into it in the
first place.  Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use
it to prevent fill_directory() and friends from descending into nested
git repos.

With this change, we also modify two regression tests added in commit
91479b9c72f1 ("t7300: add tests to document behavior of clean and nested
git", 2015-06-15).  That commit, nor its series, nor the six previous
iterations of that series on the mailing list discussed why those tests
coded the expectation they did.  In fact, it appears their purpose was
simply to test _existing_ behavior to make sure that the performance
changes didn't change the behavior.  However, these two tests directly
contradicted the manpage's claims that two -f's were required to delete
files/directories under a nested git repository.  While one could argue
that the user gave an explicit path which matched files/directories that
were within a nested repository, there's a slippery slope that becomes
very difficult for users to understand once you go down that route (e.g.
what if they specified "git clean -f -d '*.c'"?)  It would also be hard
to explain what the exact behavior was; avoid such problems by making it
really simple.

Also, clean up some grammar errors describing this functionality in the
git-clean manpage.

Finally, there are still a couple bugs with -ffd not cleaning out enough
(e.g.  missing the nested .git) and with -ffdX possibly cleaning out the
wrong files (paying attention to outer .gitignore instead of inner).
This patch does not address these cases at all (and does not change the
behavior relative to those flags), it only fixes the handling when given
a single -f.  See
https://public-inbox.org/git/20190905212043.GC32087@szeder.dev/ for more
discussion of the -ffd[X?] bugs.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-clean.txt |  6 +++---
 builtin/clean.c             |  2 ++
 dir.c                       | 10 ++++++++++
 dir.h                       |  3 ++-
 t/t7300-clean.sh            | 10 +++++-----
 5 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt
index 3ab749b921..ba31d8d166 100644
--- a/Documentation/git-clean.txt
+++ b/Documentation/git-clean.txt
@@ -37,9 +37,9 @@ OPTIONS
 --force::
 	If the Git configuration variable clean.requireForce is not set
 	to false, 'git clean' will refuse to delete files or directories
-	unless given -f or -i. Git will refuse to delete directories
-	with .git sub directory or file unless a second -f
-	is given.
+	unless given -f or -i.  Git will refuse to modify untracked
+	nested git repositories (directories with a .git subdirectory)
+	unless a second -f is given.
 
 -i::
 --interactive::
diff --git a/builtin/clean.c b/builtin/clean.c
index 68d70e41c0..3a7a63ae71 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -946,6 +946,8 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 
 	if (force > 1)
 		rm_flags = 0;
+	else
+		dir.flags |= DIR_SKIP_NESTED_GIT;
 
 	dir.flags |= DIR_SHOW_OTHER_DIRECTORIES;
 
diff --git a/dir.c b/dir.c
index 3b2fe1701c..7ff79170fc 100644
--- a/dir.c
+++ b/dir.c
@@ -1451,6 +1451,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_none;
 
 	case index_nonexistent:
+		if (dir->flags & DIR_SKIP_NESTED_GIT) {
+			int nested_repo;
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addstr(&sb, dirname);
+			nested_repo = is_nonbare_repository_dir(&sb);
+			strbuf_release(&sb);
+			if (nested_repo)
+				return path_none;
+		}
+
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
 			break;
 		if (exclude &&
diff --git a/dir.h b/dir.h
index 46c238ab49..739aea7c96 100644
--- a/dir.h
+++ b/dir.h
@@ -156,7 +156,8 @@ struct dir_struct {
 		DIR_SHOW_IGNORED_TOO = 1<<5,
 		DIR_COLLECT_KILLED_ONLY = 1<<6,
 		DIR_KEEP_UNTRACKED_CONTENTS = 1<<7,
-		DIR_SHOW_IGNORED_TOO_MODE_MATCHING = 1<<8
+		DIR_SHOW_IGNORED_TOO_MODE_MATCHING = 1<<8,
+		DIR_SKIP_NESTED_GIT = 1<<9
 	} flags;
 	struct dir_entry **entries;
 	struct dir_entry **ignored;
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 530dfdab34..6e6d24c1c3 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -549,7 +549,7 @@ test_expect_failure 'nested (non-empty) bare repositories should be cleaned even
 	test_path_is_missing strange_bare
 '
 
-test_expect_success 'giving path in nested git work tree will remove it' '
+test_expect_success 'giving path in nested git work tree will NOT remove it' '
 	rm -fr repo &&
 	mkdir repo &&
 	(
@@ -561,7 +561,7 @@ test_expect_success 'giving path in nested git work tree will remove it' '
 	git clean -f -d repo/bar/baz &&
 	test_path_is_file repo/.git/HEAD &&
 	test_path_is_dir repo/bar/ &&
-	test_path_is_missing repo/bar/baz
+	test_path_is_file repo/bar/baz/hello.world
 '
 
 test_expect_success 'giving path to nested .git will not remove it' '
@@ -579,7 +579,7 @@ test_expect_success 'giving path to nested .git will not remove it' '
 	test_path_is_dir untracked/
 '
 
-test_expect_success 'giving path to nested .git/ will remove contents' '
+test_expect_success 'giving path to nested .git/ will NOT remove contents' '
 	rm -fr repo untracked &&
 	mkdir repo untracked &&
 	(
@@ -589,7 +589,7 @@ test_expect_success 'giving path to nested .git/ will remove contents' '
 	) &&
 	git clean -f -d repo/.git/ &&
 	test_path_is_dir repo/.git &&
-	test_dir_is_empty repo/.git &&
+	test_path_is_file repo/.git/HEAD &&
 	test_path_is_dir untracked/
 '
 
@@ -671,7 +671,7 @@ test_expect_success 'git clean -d skips untracked dirs containing ignored files'
 	test_path_is_missing foo/b/bb
 '
 
-test_expect_failure 'git clean -d skips nested repo containing ignored files' '
+test_expect_success 'git clean -d skips nested repo containing ignored files' '
 	test_when_finished "rm -rf nested-repo-with-ignored-file" &&
 
 	git init nested-repo-with-ignored-file &&
-- 
2.23.0.173.gad11b3a635.dirty


  parent reply index

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-25 18:59 [PATCH] t7300-clean: demonstrate deleting nested repo with an ignored file breakage SZEDER Gábor
2019-08-25 20:34 ` SZEDER Gábor
2019-08-25 22:32 ` Philip Oakley
2019-08-26  7:48   ` SZEDER Gábor
2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 01/12] t7300: Add some testcases showing failure to clean specified pathspecs Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 02/12] dir: fix typo in comment Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 03/12] dir: fix off-by-one error in match_pathspec_item Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 04/12] dir: Directories should be checked for matching pathspecs too Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 05/12] dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule case Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 06/12] dir: If our pathspec might match files under a dir, recurse into it Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 07/12] dir: add commentary explaining match_pathspec_item's return value Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 09/12] clean: disambiguate the definition of -d Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 10/12] clean: avoid removing untracked files in a nested git repository Elijah Newren
2019-09-05 21:20     ` SZEDER Gábor
2019-09-05 15:47   ` [RFC PATCH v2 11/12] clean: rewrap overly long line Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 12/12] clean: fix theoretical path corruption Elijah Newren
2019-09-05 19:27     ` SZEDER Gábor
2019-09-07  0:34       ` Elijah Newren
2019-09-05 19:01   ` [RFC PATCH v2 00/12] Fix some git clean issues SZEDER Gábor
2019-09-07  0:33     ` Elijah Newren
2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
2019-09-12 22:12     ` [PATCH v3 01/12] t7300: add testcases showing failure to clean specified pathspecs Elijah Newren
2019-09-13 18:54       ` Junio C Hamano
2019-09-13 19:10         ` Elijah Newren
2019-09-13 20:29           ` Junio C Hamano
2019-09-12 22:12     ` [PATCH v3 02/12] dir: fix typo in comment Elijah Newren
2019-09-12 22:12     ` [PATCH v3 03/12] dir: fix off-by-one error in match_pathspec_item Elijah Newren
2019-09-13 19:05       ` Junio C Hamano
2019-09-12 22:12     ` [PATCH v3 04/12] dir: also check directories for matching pathspecs Elijah Newren
2019-09-12 22:12     ` [PATCH v3 05/12] dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case Elijah Newren
2019-09-12 22:12     ` [PATCH v3 06/12] dir: if our pathspec might match files under a dir, recurse into it Elijah Newren
2019-09-13 19:45       ` Junio C Hamano
2019-09-12 22:12     ` [PATCH v3 07/12] dir: add commentary explaining match_pathspec_item's return value Elijah Newren
2019-09-13 20:04       ` Junio C Hamano
2019-09-12 22:12     ` [PATCH v3 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run Elijah Newren
2019-09-12 22:12     ` [PATCH v3 09/12] clean: disambiguate the definition of -d Elijah Newren
2019-09-12 22:12     ` Elijah Newren [this message]
2019-09-12 22:12     ` [PATCH v3 11/12] clean: rewrap overly long line Elijah Newren
2019-09-12 22:12     ` [PATCH v3 12/12] clean: fix theoretical path corruption Elijah Newren
2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
2019-09-17 16:34       ` [PATCH v4 01/12] t7300: add testcases showing failure to clean specified pathspecs Elijah Newren
2019-09-17 16:34       ` [PATCH v4 02/12] dir: fix typo in comment Elijah Newren
2019-09-17 16:34       ` [PATCH v4 03/12] dir: fix off-by-one error in match_pathspec_item Elijah Newren
2019-09-17 16:34       ` [PATCH v4 04/12] dir: also check directories for matching pathspecs Elijah Newren
2019-09-25 20:39         ` [BUG] git is segfaulting, was " Denton Liu
2019-09-25 21:28           ` Elijah Newren
2019-09-25 21:55             ` Denton Liu
2019-09-26 20:35               ` Denton Liu
2019-09-27  0:12                 ` Elijah Newren
2019-09-27  1:09           ` SZEDER Gábor
2019-09-27  2:17             ` SZEDER Gábor
2019-09-27 17:10               ` Denton Liu
2019-09-30 19:11                 ` [PATCH] dir: special case check for the possibility that pathspec is NULL Elijah Newren
2019-09-30 22:31                   ` Denton Liu
2019-10-01  7:01                     ` Elijah Newren
2019-10-01 18:30                   ` [PATCH v2] " Elijah Newren
2019-10-01 18:40                     ` Denton Liu
2019-10-01 18:54                       ` Elijah Newren
2019-10-01 18:55                       ` [PATCH v3] " Elijah Newren
2019-10-01 19:35                         ` Denton Liu
2019-10-01 19:39                           ` Elijah Newren
2019-10-02 15:51                             ` Elijah Newren
2019-10-07 18:04                         ` SZEDER Gábor
2019-09-17 16:34       ` [PATCH v4 05/12] dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case Elijah Newren
2019-09-17 16:34       ` [PATCH v4 06/12] dir: if our pathspec might match files under a dir, recurse into it Elijah Newren
2019-09-17 16:34       ` [PATCH v4 07/12] dir: add commentary explaining match_pathspec_item's return value Elijah Newren
2019-09-17 16:35       ` [PATCH v4 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run Elijah Newren
2019-09-17 16:35       ` [PATCH v4 09/12] clean: disambiguate the definition of -d Elijah Newren
2019-09-17 16:35       ` [PATCH v4 10/12] clean: avoid removing untracked files in a nested git repository Elijah Newren
2019-09-17 16:35       ` [PATCH v4 11/12] clean: rewrap overly long line Elijah Newren
2019-09-17 16:35       ` [PATCH v4 12/12] clean: fix theoretical path corruption Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190912221240.18057-11-newren@gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=rafa.almas@gmail.com \
    --cc=sxlijin@gmail.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git