git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH] t7300-clean: demonstrate deleting nested repo with an ignored file breakage
@ 2019-08-25 18:59 SZEDER Gábor
  2019-08-25 20:34 ` SZEDER Gábor
                   ` (2 more replies)
  0 siblings, 3 replies; 73+ messages in thread
From: SZEDER Gábor @ 2019-08-25 18:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Samuel Lijin, SZEDER Gábor

'git clean -fd' must not delete an untracked directory if it belongs
to a different Git repository or worktree.  Unfortunately, if a
'.gitignore' rule in the outer repository happens to match a file in a
nested repository or worktree, then something goes awry and 'git clean
-fd' does delete the content of the nested repository's worktree
except that ignored file, potentially leading to data loss.

Add a test to 't7300-clean.sh' to demonstrate this breakage.

This issue is a regression introduced in 6b1db43109 (clean: teach
clean -d to preserve ignored paths, 2017-05-23).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---

BEWARE: Our toplevel '.gitignore' currently contains the '*.manifest'
rule [1], which ignores the file 'compat/win32/git.manifest' [2], so
if you use nested worktrees in your git repo, then a 'git clean -fd'
will delete them.

[1] 516dfb8416 (.gitignore: touch up the entries regarding Visual
    Studio, 2019-07-29)
[2] fe90397604 (mingw: embed a manifest to trick UAC into Doing The
    Right Thing, 2019-06-27)


 t/t7300-clean.sh | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index a2c45d1902..d01fd120ab 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -669,6 +669,28 @@ test_expect_success 'git clean -d skips untracked dirs containing ignored files'
 	test_path_is_missing foo/b/bb
 '
 
+test_expect_failure 'git clean -d skips nested repo containing ignored files' '
+	test_when_finished "rm -rf nested-repo-with-ignored-file" &&
+
+	git init nested-repo-with-ignored-file &&
+	(
+		cd nested-repo-with-ignored-file &&
+		>file &&
+		git add file &&
+		git commit -m Initial &&
+
+		# This file is ignored by a .gitignore rule in the outer repo
+		# added in the previous test.
+		>ignoreme
+	) &&
+
+	git clean -fd &&
+
+	test_path_is_file nested-repo-with-ignored-file/.git/index &&
+	test_path_is_file nested-repo-with-ignored-file/ignoreme &&
+	test_path_is_file nested-repo-with-ignored-file/file
+'
+
 test_expect_success MINGW 'handle clean & core.longpaths = false nicely' '
 	test_config core.longpaths false &&
 	a50=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa &&
-- 
2.23.0.331.g4e51dcdf11


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] t7300-clean: demonstrate deleting nested repo with an ignored file breakage
  2019-08-25 18:59 [PATCH] t7300-clean: demonstrate deleting nested repo with an ignored file breakage SZEDER Gábor
@ 2019-08-25 20:34 ` SZEDER Gábor
  2019-08-25 22:32 ` Philip Oakley
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
  2 siblings, 0 replies; 73+ messages in thread
From: SZEDER Gábor @ 2019-08-25 20:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Samuel Lijin

On Sun, Aug 25, 2019 at 08:59:18PM +0200, SZEDER Gábor wrote:
> 'git clean -fd' must not delete an untracked directory if it belongs
> to a different Git repository or worktree.  Unfortunately, if a
> '.gitignore' rule in the outer repository happens to match a file in a
> nested repository or worktree, then something goes awry and 'git clean
> -fd' does delete the content of the nested repository's worktree
> except that ignored file, potentially leading to data loss.
> 
> Add a test to 't7300-clean.sh' to demonstrate this breakage.
> 
> This issue is a regression introduced in 6b1db43109 (clean: teach
> clean -d to preserve ignored paths, 2017-05-23).
> 
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
> 
> BEWARE: Our toplevel '.gitignore' currently contains the '*.manifest'
> rule [1], which ignores the file 'compat/win32/git.manifest' [2], so
> if you use nested worktrees in your git repo, then a 'git clean -fd'
> will delete them.

OK, singling out that manifest file is just nonsense: any object file,
etc. in the nested worktree/repo can trigger the same issue just as
well.

(It just so happened that when I ran 'git clean -fd' I had a nested
worktree where I hadn't built anything yet, and besides the .git file
only that 'git.manifest' file survived in the nested worktree, and then
I got misled by it.)
 


* Re: [PATCH] t7300-clean: demonstrate deleting nested repo with an ignored file breakage
  2019-08-25 18:59 [PATCH] t7300-clean: demonstrate deleting nested repo with an ignored file breakage SZEDER Gábor
  2019-08-25 20:34 ` SZEDER Gábor
@ 2019-08-25 22:32 ` Philip Oakley
  2019-08-26  7:48   ` SZEDER Gábor
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
  2 siblings, 1 reply; 73+ messages in thread
From: Philip Oakley @ 2019-08-25 22:32 UTC (permalink / raw)
  To: SZEDER Gábor, Junio C Hamano; +Cc: git, Samuel Lijin

Hi Szeder,

On 25/08/2019 19:59, SZEDER Gábor wrote:
> 'git clean -fd' must not delete an untracked directory if it belongs
s/untracked//
I don't believe it should matter either way for a sub-module 
(sub-directory).
> to a different Git repository or worktree.
maybe split the assertion from the fault explanation.
>   Unfortunately, if a
> '.gitignore' rule in the outer repository happens to match a file in a
> nested repository or worktree, then something goes awry and 'git clean
> -fd' does delete the content of the nested repository's worktree
good so far.
> except that ignored file, potentially leading to data loss.
this appears at cross purposes as the description has changed from 
'ignored/untracked directory' to 'ignored file'. I'm not sure which part 
the data loss is meant to refer to.
>
> Add a test to 't7300-clean.sh' to demonstrate this breakage.
>
> This issue is a regression introduced in 6b1db43109 (clean: teach
> clean -d to preserve ignored paths, 2017-05-23).
>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>
> BEWARE: Our toplevel '.gitignore' currently contains the '*.manifest'
> rule [1], which ignores the file 'compat/win32/git.manifest' [2], so
> if you use nested worktrees in your git repo, then a 'git clean -fd'
> will delete them.
>
> [1] 516dfb8416 (.gitignore: touch up the entries regarding Visual
>      Studio, 2019-07-29)
> [2] fe90397604 (mingw: embed a manifest to trick UAC into Doing The
>      Right Thing, 2019-06-27)
>
>
>   t/t7300-clean.sh | 22 ++++++++++++++++++++++
>   1 file changed, 22 insertions(+)
>
> diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> index a2c45d1902..d01fd120ab 100755
> --- a/t/t7300-clean.sh
> +++ b/t/t7300-clean.sh
> @@ -669,6 +669,28 @@ test_expect_success 'git clean -d skips untracked dirs containing ignored files'
>   	test_path_is_missing foo/b/bb
>   '
>   
> +test_expect_failure 'git clean -d skips nested repo containing ignored files' '
> +	test_when_finished "rm -rf nested-repo-with-ignored-file" &&
> +
> +	git init nested-repo-with-ignored-file &&
> +	(
> +		cd nested-repo-with-ignored-file &&
> +		>file &&
> +		git add file &&
> +		git commit -m Initial &&
> +
> +		# This file is ignored by a .gitignore rule in the outer repo
> +		# added in the previous test.
> +		>ignoreme
> +	) &&
> +
> +	git clean -fd &&
> +
> +	test_path_is_file nested-repo-with-ignored-file/.git/index &&
> +	test_path_is_file nested-repo-with-ignored-file/ignoreme &&
> +	test_path_is_file nested-repo-with-ignored-file/file
> +'
> +
>   test_expect_success MINGW 'handle clean & core.longpaths = false nicely' '
>   	test_config core.longpaths false &&
>   	a50=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa &&



* Re: [PATCH] t7300-clean: demonstrate deleting nested repo with an ignored file breakage
  2019-08-25 22:32 ` Philip Oakley
@ 2019-08-26  7:48   ` SZEDER Gábor
  0 siblings, 0 replies; 73+ messages in thread
From: SZEDER Gábor @ 2019-08-26  7:48 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Junio C Hamano, git, Samuel Lijin

On Sun, Aug 25, 2019 at 11:32:28PM +0100, Philip Oakley wrote:
> Hi Szeder,
> 
> On 25/08/2019 19:59, SZEDER Gábor wrote:
> >'git clean -fd' must not delete an untracked directory if it belongs
> s/untracked//
> I don't believe it should matter either way for a sub-module
> (sub-directory).

I just paraphrased the documentation of the '-d' option for a bit of
context.

   Remove untracked directories in addition to untracked files. If an
   untracked directory is managed by a different Git repository, it is
   not removed by default. Use -f option twice if you really want to
   remove such a directory.

> >to a different Git repository or worktree.
> maybe split the assertion from the fault explanation.
> >  Unfortunately, if a
> >'.gitignore' rule in the outer repository happens to match a file in a
> >nested repository or worktree, then something goes awry and 'git clean
> >-fd' does delete the content of the nested repository's worktree
> good so far.
> >except that ignored file, potentially leading to data loss.
> this appears at cross purposes as the description has changed from
> 'ignored/untracked directory' to 'ignored file'.

The description does not mention any ignored directories.

> I'm not sure which part the
> data loss is meant to refer to.

Well, there is only one part where the description talks about stuff
getting deleted... and that's what it refers to :)

> >Add a test to 't7300-clean.sh' to demonstrate this breakage.
> >
> >This issue is a regression introduced in 6b1db43109 (clean: teach
> >clean -d to preserve ignored paths, 2017-05-23).
> >
> >Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> >---

On a related note, 'git clean -fdx' does leave the nested repository
or worktree intact in the same situation, as it should.

> >  t/t7300-clean.sh | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> >
> >diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> >index a2c45d1902..d01fd120ab 100755
> >--- a/t/t7300-clean.sh
> >+++ b/t/t7300-clean.sh
> >@@ -669,6 +669,28 @@ test_expect_success 'git clean -d skips untracked dirs containing ignored files'
> >  	test_path_is_missing foo/b/bb
> >  '
> >+test_expect_failure 'git clean -d skips nested repo containing ignored files' '
> >+	test_when_finished "rm -rf nested-repo-with-ignored-file" &&
> >+
> >+	git init nested-repo-with-ignored-file &&
> >+	(
> >+		cd nested-repo-with-ignored-file &&
> >+		>file &&
> >+		git add file &&
> >+		git commit -m Initial &&
> >+
> >+		# This file is ignored by a .gitignore rule in the outer repo
> >+		# added in the previous test.
> >+		>ignoreme
> >+	) &&
> >+
> >+	git clean -fd &&
> >+
> >+	test_path_is_file nested-repo-with-ignored-file/.git/index &&
> >+	test_path_is_file nested-repo-with-ignored-file/ignoreme &&
> >+	test_path_is_file nested-repo-with-ignored-file/file
> >+'
> >+
> >  test_expect_success MINGW 'handle clean & core.longpaths = false nicely' '
> >  	test_config core.longpaths false &&
> >  	a50=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa &&
> 


* [RFC PATCH v2 00/12] Fix some git clean issues
  2019-08-25 18:59 [PATCH] t7300-clean: demonstrate deleting nested repo with an ignored file breakage SZEDER Gábor
  2019-08-25 20:34 ` SZEDER Gábor
  2019-08-25 22:32 ` Philip Oakley
@ 2019-09-05 15:47 ` Elijah Newren
  2019-09-05 15:47   ` [RFC PATCH v2 01/12] t7300: Add some testcases showing failure to clean specified pathspecs Elijah Newren
                     ` (13 more replies)
  2 siblings, 14 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-05 15:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin,
	Elijah Newren

NOTE: This series builds on sg/clean-nested-repo-with-ignored, as it
      (among other things) modifies his testcase from expect_failure
      to expect_success.  Also, Peff is probably the only one who
      remembers v1 (and even he may have forgotten it): v1 was posted
      a year and a half ago.

This patch series fixes a few issues with git-clean:
  * Failure to clean when multiple pathspecs are specified, reported both
    in April 2018[1] and again in May 2019[2].
  * Failure to preserve both tracked and untracked files within a nested
    Git repository reported a few weeks ago by SZEDER[3].

[1] https://public-inbox.org/git/20180405173446.32372-4-newren@gmail.com/
[2] https://public-inbox.org/git/20190531183651.10067-1-rafa.almas@gmail.com/
[3] https://public-inbox.org/git/20190825185918.3909-1-szeder.dev@gmail.com/

I still never got answers to some questions in v1 of my RFC, so after
considerable thought I eventually decided to:

  * Declare the existing documentation to be ambiguous and hard to
    interpret correctly, and modify it to clearly document 'correct
    behavior' and how the different pieces interact.
    
  * Overrule four regression tests as having the wrong *expectation*,
    and modify them to have a correct one.  That sounds like a
    backward compatibility issue BUT: The tests were written to check
    for issues that were orthogonal to the pieces that mattered in this
    series and thus couldn't be viewed as actually having an opinion
    on correct behavior on my issues; rather, they were simply
    reinforcing existing (buggy) implementation results.

  * Add a few tests which actually check relevant interactions of
    parameters and setup, to make this area less ambiguous.  (Though
    one of them was added by SZEDER before my patches, and I should
    probably add a couple more tests...)

Help from reviewers:

The biggest area where I need help from reviewers is the commit
messages for patches 9 and 10: do folks agree with my declaration of
'correct behavior' and my changes to the regression tests?  If those
are good, this series can proceed.  If they aren't,
and someone else can't provide an alternate easy-to-explain 'correct
behavior' that we should implement and which is devoid of ugly edge
cases for users, then this patch series may languish for another few
years.

Other notes:
  * Patches 1-6 were included in v1 and have almost no changes (just one
    fix pointed out by Peff).
  * Patch 6's commit message has some additional RFC-related comments
    and questions, one of which ties in with Patch 9.
  * Patch 7 was added as per (old) conversation with Peff.
  * Patches 9 & 10 are most in need of review (see above); each has a
    lengthy commit message.
  * It would be nice if someone knows whether the codepath edited in
    Patch 12 is dead code.  If so, we could change that patch to just
    drop that if-check block.  If it's not dead code, that patch fixes
    what is probably a rare but ugly bug.

Elijah Newren (12):
  t7300: Add some testcases showing failure to clean specified pathspecs
  dir: fix typo in comment
  dir: fix off-by-one error in match_pathspec_item
  dir: Directories should be checked for matching pathspecs too
  dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule
    case
  dir: If our pathspec might match files under a dir, recurse into it
  dir: add commentary explaining match_pathspec_item's return value
  git-clean.txt: do not claim we will delete files with -n/--dry-run
  clean: disambiguate the definition of -d
  clean: avoid removing untracked files in a nested git repository
  clean: rewrap overly long line
  clean: fix theoretical path corruption

 Documentation/git-clean.txt | 16 +++++-----
 builtin/clean.c             | 17 ++++++++--
 dir.c                       | 63 +++++++++++++++++++++++++++----------
 dir.h                       |  8 +++--
 t/t7300-clean.sh            | 44 +++++++++++++++++++++++---
 5 files changed, 114 insertions(+), 34 deletions(-)

-- 
2.22.1.11.g45a39ee867



* [RFC PATCH v2 01/12] t7300: Add some testcases showing failure to clean specified pathspecs
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
@ 2019-09-05 15:47   ` Elijah Newren
  2019-09-05 15:47   ` [RFC PATCH v2 02/12] dir: fix typo in comment Elijah Newren
                     ` (12 subsequent siblings)
  13 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-05 15:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin,
	Elijah Newren

Someone brought me a testcase where multiple git-clean invocations were
required to clean out unwanted files:
  mkdir d{1,2}
  touch d{1,2}/ut
  touch d1/t && git add d1/t
With this setup, the user would need to run
  git clean -ffd */ut
twice to delete both ut files.

A little testing showed some interesting variants:
  * If only one of those two ut files existed (either one), then only one
    clean command would be necessary.
  * If both directories had tracked files, then only one git clean would
    be necessary to clean both files.
  * If both directories had no tracked files then the clean command above
    would never clean either of the untracked files despite the pathspec
    explicitly calling both of them out.

A bisect showed that the failure to clean out the files started with
commit cf424f5fd89b ("clean: respect pathspecs with "-d"", 2014-03-10).
However, that pointed to a separate issue: while the "-d" flag was used
by the original user who showed me this problem, that flag should have
been irrelevant to this problem.  Testing again without the "-d" flag
showed that the same buggy behavior exists without using that flag, and
has in fact existed since before cf424f5fd89b.

Add testcases showing that multiple untracked files within entirely
untracked directories cannot be cleaned when specifying these files to
git clean via pathspecs.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7300-clean.sh | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index d01fd120ab..2c254c773c 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -691,6 +691,38 @@ test_expect_failure 'git clean -d skips nested repo containing ignored files' '
 	test_path_is_file nested-repo-with-ignored-file/file
 '
 
+test_expect_failure 'git clean handles being told what to clean' '
+	mkdir -p d1 d2 &&
+	touch d1/ut d2/ut &&
+	git clean -f */ut &&
+	test_path_is_missing d1/ut &&
+	test_path_is_missing d2/ut
+'
+
+test_expect_failure 'git clean handles being told what to clean, with -d' '
+	mkdir -p d1 d2 &&
+	touch d1/ut d2/ut &&
+	git clean -ffd */ut &&
+	test_path_is_missing d1/ut &&
+	test_path_is_missing d2/ut
+'
+
+test_expect_failure 'git clean works if a glob is passed without -d' '
+	mkdir -p d1 d2 &&
+	touch d1/ut d2/ut &&
+	git clean -f "*ut" &&
+	test_path_is_missing d1/ut &&
+	test_path_is_missing d2/ut
+'
+
+test_expect_failure 'git clean works if a glob is passed with -d' '
+	mkdir -p d1 d2 &&
+	touch d1/ut d2/ut &&
+	git clean -ffd "*ut" &&
+	test_path_is_missing d1/ut &&
+	test_path_is_missing d2/ut
+'
+
 test_expect_success MINGW 'handle clean & core.longpaths = false nicely' '
 	test_config core.longpaths false &&
 	a50=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa &&
-- 
2.22.1.11.g45a39ee867



* [RFC PATCH v2 02/12] dir: fix typo in comment
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
  2019-09-05 15:47   ` [RFC PATCH v2 01/12] t7300: Add some testcases showing failure to clean specified pathspecs Elijah Newren
@ 2019-09-05 15:47   ` Elijah Newren
  2019-09-05 15:47   ` [RFC PATCH v2 03/12] dir: fix off-by-one error in match_pathspec_item Elijah Newren
                     ` (11 subsequent siblings)
  13 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-05 15:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin,
	Elijah Newren

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index d021c908e5..a9168bed96 100644
--- a/dir.c
+++ b/dir.c
@@ -139,7 +139,7 @@ static size_t common_prefix_len(const struct pathspec *pathspec)
 	 * ":(icase)path" is treated as a pathspec full of
 	 * wildcard. In other words, only prefix is considered common
 	 * prefix. If the pathspec is abc/foo abc/bar, running in
-	 * subdir xyz, the common prefix is still xyz, not xuz/abc as
+	 * subdir xyz, the common prefix is still xyz, not xyz/abc as
 	 * in non-:(icase).
 	 */
 	GUARD_PATHSPEC(pathspec,
-- 
2.22.1.11.g45a39ee867



* [RFC PATCH v2 03/12] dir: fix off-by-one error in match_pathspec_item
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
  2019-09-05 15:47   ` [RFC PATCH v2 01/12] t7300: Add some testcases showing failure to clean specified pathspecs Elijah Newren
  2019-09-05 15:47   ` [RFC PATCH v2 02/12] dir: fix typo in comment Elijah Newren
@ 2019-09-05 15:47   ` Elijah Newren
  2019-09-05 15:47   ` [RFC PATCH v2 04/12] dir: Directories should be checked for matching pathspecs too Elijah Newren
                     ` (10 subsequent siblings)
  13 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-05 15:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin,
	Elijah Newren

For a pathspec like 'foo/bar' comparing against a path named "foo/",
namelen will be 4, and match[namelen] will be 'b'.  The correct location
of the directory separator is namelen-1.

The reason the code worked anyway was that the following code immediately
checked whether the first matchlen characters matched (which they do) and
then bailed and returned MATCHED_RECURSIVELY anyway since wildmatch doesn't
have the ability to check if "name" can be matched as a directory (or
prefix) against the pathspec.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index a9168bed96..bf1a74799e 100644
--- a/dir.c
+++ b/dir.c
@@ -356,8 +356,9 @@ static int match_pathspec_item(const struct index_state *istate,
 	/* Perform checks to see if "name" is a super set of the pathspec */
 	if (flags & DO_MATCH_SUBMODULE) {
 		/* name is a literal prefix of the pathspec */
+		int offset = name[namelen-1] == '/' ? 1 : 0;
 		if ((namelen < matchlen) &&
-		    (match[namelen] == '/') &&
+		    (match[namelen-offset] == '/') &&
 		    !ps_strncmp(item, match, name, namelen))
 			return MATCHED_RECURSIVELY;
 
-- 
2.22.1.11.g45a39ee867



* [RFC PATCH v2 04/12] dir: Directories should be checked for matching pathspecs too
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
                     ` (2 preceding siblings ...)
  2019-09-05 15:47   ` [RFC PATCH v2 03/12] dir: fix off-by-one error in match_pathspec_item Elijah Newren
@ 2019-09-05 15:47   ` Elijah Newren
  2019-09-05 15:47   ` [RFC PATCH v2 05/12] dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule case Elijah Newren
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-05 15:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin,
	Elijah Newren

Even if a directory doesn't match a pathspec, it is possible, depending
on the precise pathspecs, that some file underneath it might.  So we
special case and recurse into the directory for such situations.  However,
we previously always added any untracked directory that we recursed into
to the list of untracked paths, regardless of whether the directory
itself matched the pathspec.

For the case of git-clean and a set of pathspecs of "dir/file" and "more",
this caused a problem because we'd end up with dir entries for both of
  "dir"
  "dir/file"
Then correct_untracked_entries() would try to helpfully prune duplicates
for us by removing "dir/file" since it's under "dir", leaving us with
  "dir"
Since the original pathspec only had "dir/file", the only entry left
doesn't match and leaves nothing to be removed.  (Note that if only one
pathspec was specified, e.g. only "dir/file", then the common_prefix_len
optimizations in fill_directory would cause us to bypass this problem,
making it appear in simple tests that we could correctly remove manually
specified pathspecs.)

Fix this by checking whether the directory we are about to add to the
list of dir entries actually matches the pathspec; only do this
matching check after we have already returned from recursing into the
directory.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c            | 5 +++++
 t/t7300-clean.sh | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index bf1a74799e..76a3c3894b 100644
--- a/dir.c
+++ b/dir.c
@@ -1951,6 +1951,11 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 							 check_only, stop_at_first_file, pathspec);
 			if (subdir_state > dir_state)
 				dir_state = subdir_state;
+
+			if (!match_pathspec(istate, pathspec, path.buf, path.len,
+					    0 /* prefix */, NULL,
+					    0 /* do NOT special case dirs */))
+				state = path_none;
 		}
 
 		if (check_only) {
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 2c254c773c..12617158db 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -699,7 +699,7 @@ test_expect_failure 'git clean handles being told what to clean' '
 	test_path_is_missing d2/ut
 '
 
-test_expect_failure 'git clean handles being told what to clean, with -d' '
+test_expect_success 'git clean handles being told what to clean, with -d' '
 	mkdir -p d1 d2 &&
 	touch d1/ut d2/ut &&
 	git clean -ffd */ut &&
@@ -715,7 +715,7 @@ test_expect_failure 'git clean works if a glob is passed without -d' '
 	test_path_is_missing d2/ut
 '
 
-test_expect_failure 'git clean works if a glob is passed with -d' '
+test_expect_success 'git clean works if a glob is passed with -d' '
 	mkdir -p d1 d2 &&
 	touch d1/ut d2/ut &&
 	git clean -ffd "*ut" &&
-- 
2.22.1.11.g45a39ee867



* [RFC PATCH v2 05/12] dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
                     ` (3 preceding siblings ...)
  2019-09-05 15:47   ` [RFC PATCH v2 04/12] dir: Directories should be checked for matching pathspecs too Elijah Newren
@ 2019-09-05 15:47   ` Elijah Newren
  2019-09-05 15:47   ` [RFC PATCH v2 06/12] dir: If our pathspec might match files under a dir, recurse into it Elijah Newren
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-05 15:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin,
	Elijah Newren

The specific checks done in match_pathspec_item for the DO_MATCH_SUBMODULE
case are useful for other cases which have nothing to do with submodules.
Rename this constant; a subsequent commit will make use of this change.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 76a3c3894b..b4d656192e 100644
--- a/dir.c
+++ b/dir.c
@@ -273,7 +273,7 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
 
 #define DO_MATCH_EXCLUDE   (1<<0)
 #define DO_MATCH_DIRECTORY (1<<1)
-#define DO_MATCH_SUBMODULE (1<<2)
+#define DO_MATCH_LEADING_PATHSPEC (1<<2)
 
 /*
  * Does 'match' match the given name?
@@ -354,7 +354,7 @@ static int match_pathspec_item(const struct index_state *istate,
 		return MATCHED_FNMATCH;
 
 	/* Perform checks to see if "name" is a super set of the pathspec */
-	if (flags & DO_MATCH_SUBMODULE) {
+	if (flags & DO_MATCH_LEADING_PATHSPEC) {
 		/* name is a literal prefix of the pathspec */
 		int offset = name[namelen-1] == '/' ? 1 : 0;
 		if ((namelen < matchlen) &&
@@ -498,7 +498,7 @@ int submodule_path_match(const struct index_state *istate,
 					strlen(submodule_name),
 					0, seen,
 					DO_MATCH_DIRECTORY |
-					DO_MATCH_SUBMODULE);
+					DO_MATCH_LEADING_PATHSPEC);
 	return matched;
 }
 
-- 
2.22.1.11.g45a39ee867



* [RFC PATCH v2 06/12] dir: If our pathspec might match files under a dir, recurse into it
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
                     ` (4 preceding siblings ...)
  2019-09-05 15:47   ` [RFC PATCH v2 05/12] dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule case Elijah Newren
@ 2019-09-05 15:47   ` Elijah Newren
  2019-09-05 15:47   ` [RFC PATCH v2 07/12] dir: add commentary explaining match_pathspec_item's return value Elijah Newren
                     ` (7 subsequent siblings)
  13 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-05 15:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin,
	Elijah Newren

For git clean, if a directory is entirely untracked and the user did not
specify -d (corresponding to DIR_SHOW_IGNORED_TOO), then we usually do
not want to remove that directory and thus do not recurse into it.
However, if the user manually specified specific (or even globbed) paths
somewhere under that directory to remove, then we need to recurse into
the directory to make sure we remove the relevant paths under that
directory as the user requested.

Note that this does not mean that the recursed-into directory will be
added to dir->entries for later removal; as of a few commits earlier in
this series, there is another more strict match check that is run after
returning from a recursed-into directory before deciding to add it to the
list of entries.  Therefore, this will only result in files underneath
the given directory which match one of the pathspecs being added to the
entries list.

Two particular considerations for this patch:

  * If we want to only recurse into a directory when it is specifically
    matched rather than matched-via-glob (e.g. '*.c'), then we could do
    so via making the final non-zero return in match_pathspec_item be
    MATCHED_RECURSIVELY instead of MATCHED_RECURSIVELY_LEADING_PATHSPEC.
    (See final patch of this RFC series for details; note that the
    relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC and
    MATCHED_RECURSIVELY is important for such a change.)

  * There is a growing amount of logic in read_directory_recursive() for
    deciding whether to recurse into a subdirectory.  However, there is
    a comment immediately preceding this logic that says to recurse if
    instructed by treat_path().   It may be better for the logic in
    read_directory_recursive() to be moved to treat_path() (or another
    function it calls, such as treat_directory()), but I did not feel
    strongly about this and just left the logic where it was while
    adding to it.  Do others have strong opinions on this?

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c            | 10 ++++++----
 dir.h            |  5 +++--
 t/t7300-clean.sh |  4 ++--
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/dir.c b/dir.c
index b4d656192e..47c0a99cb5 100644
--- a/dir.c
+++ b/dir.c
@@ -360,7 +360,7 @@ static int match_pathspec_item(const struct index_state *istate,
 		if ((namelen < matchlen) &&
 		    (match[namelen-offset] == '/') &&
 		    !ps_strncmp(item, match, name, namelen))
-			return MATCHED_RECURSIVELY;
+			return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
 
 		/* name" doesn't match up to the first wild character */
 		if (item->nowildcard_len < item->len &&
@@ -377,7 +377,7 @@ static int match_pathspec_item(const struct index_state *istate,
 		 * The submodules themselves will be able to perform more
 		 * accurate matching to determine if the pathspec matches.
 		 */
-		return MATCHED_RECURSIVELY;
+		return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
 	}
 
 	return 0;
@@ -1939,8 +1939,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 		/* recurse into subdir if instructed by treat_path */
 		if ((state == path_recurse) ||
 			((state == path_untracked) &&
-			 (dir->flags & DIR_SHOW_IGNORED_TOO) &&
-			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR))) {
+			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
+			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
+			  do_match_pathspec(istate, pathspec, path.buf, path.len,
+					    baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
 			struct untracked_cache_dir *ud;
 			ud = lookup_untracked(dir->untracked, untracked,
 					      path.buf + baselen,
diff --git a/dir.h b/dir.h
index 680079bbe3..46c238ab49 100644
--- a/dir.h
+++ b/dir.h
@@ -211,8 +211,9 @@ int count_slashes(const char *s);
  * when populating the seen[] array.
  */
 #define MATCHED_RECURSIVELY 1
-#define MATCHED_FNMATCH 2
-#define MATCHED_EXACTLY 3
+#define MATCHED_RECURSIVELY_LEADING_PATHSPEC 2
+#define MATCHED_FNMATCH 3
+#define MATCHED_EXACTLY 4
 int simple_length(const char *match);
 int no_wildcard(const char *string);
 char *common_prefix(const struct pathspec *pathspec);
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 12617158db..d83aeb7dc2 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -691,7 +691,7 @@ test_expect_failure 'git clean -d skips nested repo containing ignored files' '
 	test_path_is_file nested-repo-with-ignored-file/file
 '
 
-test_expect_failure 'git clean handles being told what to clean' '
+test_expect_success 'git clean handles being told what to clean' '
 	mkdir -p d1 d2 &&
 	touch d1/ut d2/ut &&
 	git clean -f */ut &&
@@ -707,7 +707,7 @@ test_expect_success 'git clean handles being told what to clean, with -d' '
 	test_path_is_missing d2/ut
 '
 
-test_expect_failure 'git clean works if a glob is passed without -d' '
+test_expect_success 'git clean works if a glob is passed without -d' '
 	mkdir -p d1 d2 &&
 	touch d1/ut d2/ut &&
 	git clean -f "*ut" &&
-- 
2.22.1.11.g45a39ee867


* [RFC PATCH v2 07/12] dir: add commentary explaining match_pathspec_item's return value
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
                     ` (5 preceding siblings ...)
  2019-09-05 15:47   ` [RFC PATCH v2 06/12] dir: If our pathspec might match files under a dir, recurse into it Elijah Newren
@ 2019-09-05 15:47   ` Elijah Newren
  2019-09-05 15:47   ` [RFC PATCH v2 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run Elijah Newren
                     ` (6 subsequent siblings)
  13 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-05 15:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin,
	Elijah Newren

The way match_pathspec_item() handles names and pathspecs with trailing
slash characters, in conjunction with special options like
DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC, was non-obvious and
broken until this patch series.  Add a table in a comment explaining the
intent of how these work.
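
As a rough illustration of the intended table (not the code in dir.c),
here is a toy classifier for literal, non-empty pathspecs only.  All
toy_ and TOY_ names are hypothetical; wildcards, prefixes and case
folding are deliberately left out, and the numeric values merely follow
dir.h as of this series.

```c
#include <string.h>

/* Hypothetical names; values follow dir.h as of this series. */
#define TOY_RECURSIVE 1
#define TOY_LEADING   2
#define TOY_EXACT     4

#define TOY_MATCH_DIRECTORY        (1 << 0)
#define TOY_MATCH_LEADING_PATHSPEC (1 << 1)

/* Classify a literal (wildcard-free), non-empty pathspec against a name. */
static int toy_classify(const char *name, const char *spec, unsigned flags)
{
	size_t namelen = strlen(name), speclen = strlen(spec);

	if (!namelen || !speclen)
		return 0;	/* empty strings are outside this sketch */

	if (!strcmp(name, spec))
		return TOY_EXACT;

	/* [1]: pathspec "a/b/" against name "a/b" */
	if ((flags & TOY_MATCH_DIRECTORY) &&
	    speclen == namelen + 1 && spec[namelen] == '/' &&
	    !strncmp(name, spec, namelen))
		return TOY_EXACT;

	/* pathspec is a leading directory of name */
	if (speclen < namelen && !strncmp(name, spec, speclen) &&
	    (spec[speclen - 1] == '/' || name[speclen] == '/'))
		return TOY_RECURSIVE;

	/* [2]: name is a leading directory of the pathspec */
	if ((flags & TOY_MATCH_LEADING_PATHSPEC) && namelen < speclen) {
		size_t offset = name[namelen - 1] == '/' ? 1 : 0;

		if (spec[namelen - offset] == '/' &&
		    !strncmp(name, spec, namelen))
			return TOY_LEADING;
	}

	return 0;
}
```

For example, toy_classify("a/b", "a/b/c", 0) is no match, while passing
TOY_MATCH_LEADING_PATHSPEC turns it into TOY_LEADING.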

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/dir.c b/dir.c
index 47c0a99cb5..3b2fe1701c 100644
--- a/dir.c
+++ b/dir.c
@@ -276,16 +276,27 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
 #define DO_MATCH_LEADING_PATHSPEC (1<<2)
 
 /*
- * Does 'match' match the given name?
- * A match is found if
+ * Does the given pathspec match the given name?  A match is found if
  *
- * (1) the 'match' string is leading directory of 'name', or
- * (2) the 'match' string is a wildcard and matches 'name', or
- * (3) the 'match' string is exactly the same as 'name'.
+ * (1) the pathspec string is leading directory of 'name' ("RECURSIVELY"), or
+ * (2) the pathspec string has a leading part matching 'name' ("LEADING"), or
+ * (3) the pathspec string is a wildcard and matches 'name' ("WILDCARD"), or
+ * (4) the pathspec string is exactly the same as 'name' ("EXACT").
  *
- * and the return value tells which case it was.
+ * Return value tells which case it was (1-4), or 0 when there is no match.
  *
- * It returns 0 when there is no match.
+ * It may be instructive to look at a small table of concrete examples
+ * to understand the differences between 1, 2, and 4:
+ *
+ *                              Pathspecs
+ *                |    a/b    |   a/b/    |   a/b/c
+ *          ------+-----------+-----------+------------
+ *          a/b   |  EXACT    |  EXACT[1] | LEADING[2]
+ *  Names   a/b/  | RECURSIVE |   EXACT   | LEADING[2]
+ *          a/b/c | RECURSIVE | RECURSIVE |   EXACT
+ *
+ * [1] Only if DO_MATCH_DIRECTORY is passed; otherwise, this is NOT a match.
+ * [2] Only if DO_MATCH_LEADING_PATHSPEC is passed; otherwise, not a match.
  */
 static int match_pathspec_item(const struct index_state *istate,
 			       const struct pathspec_item *item, int prefix,
@@ -353,7 +364,7 @@ static int match_pathspec_item(const struct index_state *istate,
 			 item->nowildcard_len - prefix))
 		return MATCHED_FNMATCH;
 
-	/* Perform checks to see if "name" is a super set of the pathspec */
+	/* Perform checks to see if "name" is a leading string of the pathspec */
 	if (flags & DO_MATCH_LEADING_PATHSPEC) {
 		/* name is a literal prefix of the pathspec */
 		int offset = name[namelen-1] == '/' ? 1 : 0;
-- 
2.22.1.11.g45a39ee867


* [RFC PATCH v2 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
                     ` (6 preceding siblings ...)
  2019-09-05 15:47   ` [RFC PATCH v2 07/12] dir: add commentary explaining match_pathspec_item's return value Elijah Newren
@ 2019-09-05 15:47   ` Elijah Newren
  2019-09-05 15:47   ` [RFC PATCH v2 09/12] clean: disambiguate the definition of -d Elijah Newren
                     ` (5 subsequent siblings)
  13 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-05 15:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin,
	Elijah Newren

It appears that the wrong option got included in the list of what will
cause git-clean to actually take action.  Correct the list.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-clean.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt
index db876f7dde..e84ffc9396 100644
--- a/Documentation/git-clean.txt
+++ b/Documentation/git-clean.txt
@@ -35,7 +35,7 @@ OPTIONS
 --force::
 	If the Git configuration variable clean.requireForce is not set
 	to false, 'git clean' will refuse to delete files or directories
-	unless given -f, -n or -i. Git will refuse to delete directories
+	unless given -f or -i. Git will refuse to delete directories
 	with .git sub directory or file unless a second -f
 	is given.
 
-- 
2.22.1.11.g45a39ee867


* [RFC PATCH v2 09/12] clean: disambiguate the definition of -d
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
                     ` (7 preceding siblings ...)
  2019-09-05 15:47   ` [RFC PATCH v2 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run Elijah Newren
@ 2019-09-05 15:47   ` Elijah Newren
  2019-09-05 15:47   ` [RFC PATCH v2 10/12] clean: avoid removing untracked files in a nested git repository Elijah Newren
                     ` (4 subsequent siblings)
  13 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-05 15:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin,
	Elijah Newren

The -d flag pre-dated git-clean's ability to have paths specified.  As
such, the default for git-clean was to only remove untracked files in
the current directory, and -d existed to allow it to recurse into
subdirectories.

The interaction of paths and the -d option appears to not have been
carefully considered, as evidenced by numerous bugs and a dearth of
tests covering such pairings in the testsuite.  The definition turns out
to be important, so let's look at some of the various ways one could
interpret the -d option:

  A) Without -d, only look in subdirectories which contain tracked
     files under them; with -d, also look in subdirectories which
     are untracked for files to clean.

  B) Without specified paths from the user for us to delete, we need to
     have some kind of default, so...without -d, only look in
     subdirectories which contain tracked files under them; with -d,
     also look in subdirectories which are untracked for files to clean.

The important distinction here is that choice B says that the presence
or absence of '-d' is irrelevant if paths are specified.  The logic
behind option B is that if a user explicitly asked us to clean a
specified pathspec, then we should clean anything that matches that
pathspec.  Some examples may clarify.  Should

   git clean -f untracked_dir/file

remove untracked_dir/file or not?  It seems crazy not to, but a strict
reading of option A says it shouldn't be removed.  How about

   git clean -f untracked_dir/file1 tracked_dir/file2

or

   git clean -f untracked_dir_1/file1 untracked_dir_2/file2

?  Should it remove either or both of these files?  Should it require
multiple runs to remove both the files listed?  (If this sounds like a
crazy question to even ask, see the commit message of "t7300: Add some
testcases showing failure to clean specified pathspecs" added earlier in
this patch series.)  What if -ffd were used instead of -f -- should that
allow these to be removed?  Should it take multiple invocations with
-ffd?  What if a glob (such as '*tracked*') were used instead of
spelling out the directory names?  What if the filenames involved globs,
such as

   git clean -f '*.o'

or

   git clean -f '*/*.o'

?

The current documentation actually suggests a definition that is
slightly different from choice A, and the implementation prior to this
series provided something radically different from either choice A or
B.  (The implementation, though, was clearly just buggy.)  There may be
other choices as well.  However, for almost any given choice of
definition for -d that I can think of, some of the examples above will
appear buggy to the user.  The only case that doesn't have negative
surprises is choice B: treat a user-specified path as a request to clean
all untracked files which match that path specification, including
recursing into any untracked directories.

Change the documentation and basic implementation to use this
definition.
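
Stated as code, choice B boils down to the following rule.  This is
only a sketch; the helper name and its parameters are hypothetical and
do not appear in builtin/clean.c.

```c
/*
 * Hypothetical helper: should the walk recurse into untracked
 * directories?  Under choice B, any user-specified path implies yes,
 * so -d only matters when no paths are given.
 */
static int toy_recurse_into_untracked(int d_given, int nr_pathspecs)
{
	return d_given || nr_pathspecs > 0;
}
```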

There were two regression tests that indirectly depended on the current
implementation, but neither was about subdirectory handling.  These two
tests were introduced in commit 5b7570cfb41c ("git-clean: add tests for
relative path", 2008-03-07) which was solely created to add coverage for
the changes in commit fb328947c8e ("git-clean: correct printing relative
path", 2008-03-07).  Both tests specified a directory that happened to
have an untracked subdirectory, but both were only checking that the
resulting printout of a file that was removed was shown with a relative
path.  Update these tests appropriately.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-clean.txt | 10 ++++++----
 builtin/clean.c             |  8 ++++++++
 t/t7300-clean.sh            |  2 ++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt
index e84ffc9396..3ab749b921 100644
--- a/Documentation/git-clean.txt
+++ b/Documentation/git-clean.txt
@@ -26,10 +26,12 @@ are affected.
 OPTIONS
 -------
 -d::
-	Remove untracked directories in addition to untracked files.
-	If an untracked directory is managed by a different Git
-	repository, it is not removed by default.  Use -f option twice
-	if you really want to remove such a directory.
+	Normally, when no <path> is specified, git clean will not
+	recurse into untracked directories to avoid removing too much.
+	Specify -d to have it recurse into such directories as well.
+	If any paths are specified, -d is irrelevant; all untracked
+	files matching the specified paths (with exceptions for nested
+	git directories mentioned under `--force`) will be removed.
 
 -f::
 --force::
diff --git a/builtin/clean.c b/builtin/clean.c
index d5579da716..68d70e41c0 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -949,6 +949,14 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 
 	dir.flags |= DIR_SHOW_OTHER_DIRECTORIES;
 
+	if (argc) {
+		/*
+		 * Remaining args implies pathspecs specified, and we should
+		 * recurse within those.
+		 */
+		remove_directories = 1;
+	}
+
 	if (remove_directories)
 		dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS;
 
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index d83aeb7dc2..530dfdab34 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -117,6 +117,7 @@ test_expect_success C_LOCALE_OUTPUT 'git clean with relative prefix' '
 	would_clean=$(
 		cd docs &&
 		git clean -n ../src |
+		grep part3 |
 		sed -n -e "s|^Would remove ||p"
 	) &&
 	verbose test "$would_clean" = ../src/part3.c
@@ -129,6 +130,7 @@ test_expect_success C_LOCALE_OUTPUT 'git clean with absolute path' '
 	would_clean=$(
 		cd docs &&
 		git clean -n "$(pwd)/../src" |
+		grep part3 |
 		sed -n -e "s|^Would remove ||p"
 	) &&
 	verbose test "$would_clean" = ../src/part3.c
-- 
2.22.1.11.g45a39ee867


* [RFC PATCH v2 10/12] clean: avoid removing untracked files in a nested git repository
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
                     ` (8 preceding siblings ...)
  2019-09-05 15:47   ` [RFC PATCH v2 09/12] clean: disambiguate the definition of -d Elijah Newren
@ 2019-09-05 15:47   ` Elijah Newren
  2019-09-05 21:20     ` SZEDER Gábor
  2019-09-05 15:47   ` [RFC PATCH v2 11/12] clean: rewrap overly long line Elijah Newren
                     ` (3 subsequent siblings)
  13 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-09-05 15:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin,
	Elijah Newren

Users expect files in a nested git repository to be left alone unless
sufficiently forced (with two -f's).  Unfortunately, in certain
circumstances, git would delete both tracked (and possibly dirty) files
and untracked files within a nested repository.  To explain how this
happens, let's contrast a couple cases.  First, take the following
example setup (which assumes we are already within a git repo):

   git init nested
   cd nested
   >tracked
   git add tracked
   git commit -m init
   >untracked
   cd ..

In this setup, everything works as expected; running 'git clean -fd'
will result in fill_directory() returning the following paths:
   nested/
   nested/tracked
   nested/untracked
and then correct_untracked_entries() would notice this can be compressed
to
   nested/
and then since "nested/" is a directory, we would call
remove_dirs("nested/", ...), which would
check is_nonbare_repository_dir() and then decide to skip it.

However, if someone also creates an ignored file:
   >nested/ignored
then running 'git clean -fd' would result in fill_directory() returning
the same paths:
   nested/
   nested/tracked
   nested/untracked
but correct_untracked_entries() will notice that we had ignored entries
under nested/ and thus simplify this list to
   nested/tracked
   nested/untracked
Since these are not directories, we do not call remove_dirs() which was
the only place that had the is_nonbare_repository_dir() safety check --
resulting in us deleting both the untracked file and the tracked (and
possibly dirty) file.
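
The difference between the two cases can be modeled with a toy removal
loop.  This is a sketch under stated assumptions, not git's code: the
toy_ functions are hypothetical, a trailing slash marks a directory
entry, and the nested repository is identified by a plain string
comparison instead of is_nonbare_repository_dir().

```c
#include <stddef.h>
#include <string.h>

/* An entry with a trailing slash stands for a directory. */
static int toy_is_dir_entry(const char *path)
{
	size_t len = strlen(path);

	return len > 0 && path[len - 1] == '/';
}

/*
 * Count the entries that would be removed.  nested_repo names the one
 * directory (with trailing slash) assumed to be a nonbare repository.
 */
static int toy_count_removed(const char **entries, size_t n,
			     const char *nested_repo)
{
	int removed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Only directory entries reach the repository check. */
		if (toy_is_dir_entry(entries[i]) &&
		    !strcmp(entries[i], nested_repo))
			continue;	/* skipped, as remove_dirs() would */
		removed++;		/* files are removed unchecked */
	}
	return removed;
}
```

With the compressed list { "nested/" } nothing is removed, whereas the
list { "nested/tracked", "nested/untracked" } loses both files because
neither entry is a directory.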

One possible fix for this issue would be walking the parent directories
of each path and checking if they represent nonbare repositories, but
that would be wasteful.  Even if we added caching of some sort, it's
still a waste because we should have been able to check that "nested/"
represented a nonbare repository before even descending into it in the
first place.  Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use
it to prevent fill_directory() and friends from descending into nested
git repos.

With this change, we also modify two regression tests added in commit
91479b9c72f1 ("t7300: add tests to document behavior of clean and nested
git", 2015-06-15).  Neither that commit, nor its series, nor the six
previous iterations of that series on the mailing list discussed why
those tests coded the expectation they did.  In fact, it appears their
purpose was simply to test _existing_ behavior to make sure that the
performance changes didn't change the behavior.  However, these two
tests directly contradicted the manpage's claims that two -f's were
required to delete files/directories under a nested git repository.
While one could argue that the user gave an explicit path which matched
files/directories that were within a nested repository, there's a
slippery slope that becomes very difficult for users to understand once
you go down that route (e.g. what if they specified
"git clean -f -d '*.c'"?)  It would also be hard to explain what the
exact behavior was; avoid such problems by making it really simple.

Also, clean up some grammar errors describing this functionality in the
git-clean manpage.

Finally, there is one somewhat related bug which this patch does not
fix, coming from the opposite angle.  If the user runs
   git clean -ffd
to force deletion of untracked nested repositories, and within an
untracked nested repo the user has ignored files (according to the inner
OR outer repositories' .gitignore), then not only will those ignored
files be left alone but the .git/ subdirectory of the nested repo will
be left alone too.  I am not completely sure if this should be
considered a bug (though it seems like it since the lack of the
untracked file would result in the .git/ subdirectory being deleted),
but in any event it is very minor compared to accidentally deleting user
data and I did not dive into it.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-clean.txt |  6 +++---
 builtin/clean.c             |  2 ++
 dir.c                       | 10 ++++++++++
 dir.h                       |  3 ++-
 t/t7300-clean.sh            | 10 +++++-----
 5 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt
index 3ab749b921..ba31d8d166 100644
--- a/Documentation/git-clean.txt
+++ b/Documentation/git-clean.txt
@@ -37,9 +37,9 @@ OPTIONS
 --force::
 	If the Git configuration variable clean.requireForce is not set
 	to false, 'git clean' will refuse to delete files or directories
-	unless given -f or -i. Git will refuse to delete directories
-	with .git sub directory or file unless a second -f
-	is given.
+	unless given -f or -i.  Git will refuse to modify untracked
+	nested git repositories (directories with a .git subdirectory)
+	unless a second -f is given.
 
 -i::
 --interactive::
diff --git a/builtin/clean.c b/builtin/clean.c
index 68d70e41c0..3a7a63ae71 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -946,6 +946,8 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 
 	if (force > 1)
 		rm_flags = 0;
+	else
+		dir.flags |= DIR_SKIP_NESTED_GIT;
 
 	dir.flags |= DIR_SHOW_OTHER_DIRECTORIES;
 
diff --git a/dir.c b/dir.c
index 3b2fe1701c..7ff79170fc 100644
--- a/dir.c
+++ b/dir.c
@@ -1451,6 +1451,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_none;
 
 	case index_nonexistent:
+		if (dir->flags & DIR_SKIP_NESTED_GIT) {
+			int nested_repo;
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addstr(&sb, dirname);
+			nested_repo = is_nonbare_repository_dir(&sb);
+			strbuf_release(&sb);
+			if (nested_repo)
+				return path_none;
+		}
+
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
 			break;
 		if (exclude &&
diff --git a/dir.h b/dir.h
index 46c238ab49..739aea7c96 100644
--- a/dir.h
+++ b/dir.h
@@ -156,7 +156,8 @@ struct dir_struct {
 		DIR_SHOW_IGNORED_TOO = 1<<5,
 		DIR_COLLECT_KILLED_ONLY = 1<<6,
 		DIR_KEEP_UNTRACKED_CONTENTS = 1<<7,
-		DIR_SHOW_IGNORED_TOO_MODE_MATCHING = 1<<8
+		DIR_SHOW_IGNORED_TOO_MODE_MATCHING = 1<<8,
+		DIR_SKIP_NESTED_GIT = 1<<9
 	} flags;
 	struct dir_entry **entries;
 	struct dir_entry **ignored;
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 530dfdab34..6e6d24c1c3 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -549,7 +549,7 @@ test_expect_failure 'nested (non-empty) bare repositories should be cleaned even
 	test_path_is_missing strange_bare
 '
 
-test_expect_success 'giving path in nested git work tree will remove it' '
+test_expect_success 'giving path in nested git work tree will NOT remove it' '
 	rm -fr repo &&
 	mkdir repo &&
 	(
@@ -561,7 +561,7 @@ test_expect_success 'giving path in nested git work tree will remove it' '
 	git clean -f -d repo/bar/baz &&
 	test_path_is_file repo/.git/HEAD &&
 	test_path_is_dir repo/bar/ &&
-	test_path_is_missing repo/bar/baz
+	test_path_is_file repo/bar/baz/hello.world
 '
 
 test_expect_success 'giving path to nested .git will not remove it' '
@@ -579,7 +579,7 @@ test_expect_success 'giving path to nested .git will not remove it' '
 	test_path_is_dir untracked/
 '
 
-test_expect_success 'giving path to nested .git/ will remove contents' '
+test_expect_success 'giving path to nested .git/ will NOT remove contents' '
 	rm -fr repo untracked &&
 	mkdir repo untracked &&
 	(
@@ -589,7 +589,7 @@ test_expect_success 'giving path to nested .git/ will remove contents' '
 	) &&
 	git clean -f -d repo/.git/ &&
 	test_path_is_dir repo/.git &&
-	test_dir_is_empty repo/.git &&
+	test_path_is_file repo/.git/HEAD &&
 	test_path_is_dir untracked/
 '
 
@@ -671,7 +671,7 @@ test_expect_success 'git clean -d skips untracked dirs containing ignored files'
 	test_path_is_missing foo/b/bb
 '
 
-test_expect_failure 'git clean -d skips nested repo containing ignored files' '
+test_expect_success 'git clean -d skips nested repo containing ignored files' '
 	test_when_finished "rm -rf nested-repo-with-ignored-file" &&
 
 	git init nested-repo-with-ignored-file &&
-- 
2.22.1.11.g45a39ee867


* [RFC PATCH v2 11/12] clean: rewrap overly long line
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
                     ` (9 preceding siblings ...)
  2019-09-05 15:47   ` [RFC PATCH v2 10/12] clean: avoid removing untracked files in a nested git repository Elijah Newren
@ 2019-09-05 15:47   ` Elijah Newren
  2019-09-05 15:47   ` [RFC PATCH v2 12/12] clean: fix theoretical path corruption Elijah Newren
                     ` (2 subsequent siblings)
  13 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-05 15:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin,
	Elijah Newren

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 3a7a63ae71..6030842f3a 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -158,7 +158,8 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 
 	*dir_gone = 1;
 
-	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && is_nonbare_repository_dir(path)) {
+	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) &&
+	    is_nonbare_repository_dir(path)) {
 		if (!quiet) {
 			quote_path_relative(path->buf, prefix, &quoted);
 			printf(dry_run ?  _(msg_would_skip_git_dir) : _(msg_skip_git_dir),
-- 
2.22.1.11.g45a39ee867


* [RFC PATCH v2 12/12] clean: fix theoretical path corruption
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
                     ` (10 preceding siblings ...)
  2019-09-05 15:47   ` [RFC PATCH v2 11/12] clean: rewrap overly long line Elijah Newren
@ 2019-09-05 15:47   ` Elijah Newren
  2019-09-05 19:27     ` SZEDER Gábor
  2019-09-05 19:01   ` [RFC PATCH v2 00/12] Fix some git clean issues SZEDER Gábor
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
  13 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-09-05 15:47 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin,
	Elijah Newren

cmd_clean() had the following code structure:

    struct strbuf abs_path = STRBUF_INIT;
    for_each_string_list_item(item, &del_list) {
        strbuf_addstr(&abs_path, prefix);
        strbuf_addstr(&abs_path, item->string);
        PROCESS(&abs_path);
        strbuf_reset(&abs_path);
    }

where I've elided a bunch of unnecessary details and PROCESS(&abs_path)
represents a big chunk of code rather than an actual function call.  One
piece of PROCESS was:

    if (lstat(abs_path.buf, &st))
        continue;

which would cause the strbuf_reset() to be missed -- meaning that the
next path to be handled would have two paths concatenated.  This path
used to use die_errno() instead of continue prior to commit 396049e5fb62
("git-clean: refactor git-clean into two phases", 2013-06-25), but my
understanding of how correct_untracked_entries() works is that it will
prevent both dir/ and dir/file from being in the list to clean so this
should be dead code and the die_errno() should be safe.  But I hesitate
to remove it since I am not certain.  Instead, just fix it to avoid path
corruption in case it is possible to reach this continue statement.
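
The concatenation effect is easy to reproduce in isolation.  The sketch
below is self-contained and uses a small fixed-size buffer in place of
struct strbuf; every toy_ name is hypothetical, and the "missing" flags
stand in for lstat() failing.  The caller must pass outlen > 0.

```c
#include <stdio.h>
#include <string.h>

/* Tiny stand-in for a growable string buffer (hypothetical). */
struct toy_buf {
	char buf[256];
	size_t len;
};

static void toy_add(struct toy_buf *b, const char *s)
{
	size_t n = strlen(s), room = sizeof(b->buf) - 1 - b->len;

	if (n > room)
		n = room;	/* truncate rather than overflow */
	memcpy(b->buf + b->len, s, n);
	b->len += n;
	b->buf[b->len] = '\0';
}

static void toy_reset(struct toy_buf *b)
{
	b->len = 0;
	b->buf[0] = '\0';
}

/*
 * Copy to out the path built for the last non-missing item.  Missing
 * items take the early 'continue'; reset_before_continue selects the
 * fixed behavior.
 */
static void toy_last_path(const char **items, const int *missing, size_t n,
			  int reset_before_continue,
			  char *out, size_t outlen)
{
	struct toy_buf path = { "", 0 };
	size_t i;

	out[0] = '\0';
	for (i = 0; i < n; i++) {
		toy_add(&path, items[i]);
		if (missing[i]) {
			if (reset_before_continue)
				toy_reset(&path);
			continue;
		}
		snprintf(out, outlen, "%s", path.buf);
		toy_reset(&path);
	}
}
```

With items { "gone", "file" } where only "gone" is missing, the unfixed
variant ends up processing "gonefile", and the fixed one processes
"file".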

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 6030842f3a..ccb6e23f0b 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -1028,8 +1028,10 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 		 * recursive directory removal, so lstat() here could
 		 * fail with ENOENT.
 		 */
-		if (lstat(abs_path.buf, &st))
+		if (lstat(abs_path.buf, &st)) {
+			strbuf_reset(&abs_path);
 			continue;
+		}
 
 		if (S_ISDIR(st.st_mode)) {
 			if (remove_dirs(&abs_path, prefix, rm_flags, dry_run, quiet, &gone))
-- 
2.22.1.11.g45a39ee867


* Re: [RFC PATCH v2 00/12] Fix some git clean issues
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
                     ` (11 preceding siblings ...)
  2019-09-05 15:47   ` [RFC PATCH v2 12/12] clean: fix theoretical path corruption Elijah Newren
@ 2019-09-05 19:01   ` SZEDER Gábor
  2019-09-07  0:33     ` Elijah Newren
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
  13 siblings, 1 reply; 73+ messages in thread
From: SZEDER Gábor @ 2019-09-05 19:01 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Jeff King, Rafael Ascensão, Samuel Lijin

On Thu, Sep 05, 2019 at 08:47:23AM -0700, Elijah Newren wrote:
> This patch series fixes a few issues with git-clean:

>   * Failure to preserve both tracked and untracked files within a nested
>     Git repository reported a few weeks ago by SZEDER[3].

Wow, I didn't expect a 12 patch series to fix that issue...
Thanks.

> Elijah Newren (12):
>   t7300: Add some testcases showing failure to clean specified pathspecs
>   dir: fix typo in comment
>   dir: fix off-by-one error in match_pathspec_item
>   dir: Directories should be checked for matching pathspecs too
>   dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule
>     case
>   dir: If our pathspec might match files under a dir, recurse into it

Nit: no capital letters after the '<area>:' prefix.

>   dir: add commentary explaining match_pathspec_item's return value
>   git-clean.txt: do not claim we will delete files with -n/--dry-run
>   clean: disambiguate the definition of -d
>   clean: avoid removing untracked files in a nested git repository
>   clean: rewrap overly long line
>   clean: fix theoretical path corruption
> 
>  Documentation/git-clean.txt | 16 +++++-----
>  builtin/clean.c             | 17 ++++++++--
>  dir.c                       | 63 +++++++++++++++++++++++++++----------
>  dir.h                       |  8 +++--
>  t/t7300-clean.sh            | 44 +++++++++++++++++++++++---
>  5 files changed, 114 insertions(+), 34 deletions(-)
> 
> -- 
> 2.22.1.11.g45a39ee867
> 

* Re: [RFC PATCH v2 12/12] clean: fix theoretical path corruption
  2019-09-05 15:47   ` [RFC PATCH v2 12/12] clean: fix theoretical path corruption Elijah Newren
@ 2019-09-05 19:27     ` SZEDER Gábor
  2019-09-07  0:34       ` Elijah Newren
  0 siblings, 1 reply; 73+ messages in thread
From: SZEDER Gábor @ 2019-09-05 19:27 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Jeff King, Rafael Ascensão, Samuel Lijin

On Thu, Sep 05, 2019 at 08:47:35AM -0700, Elijah Newren wrote:
> cmd_clean() had the following code structure:
> 
>     struct strbuf abs_path = STRBUF_INIT;
>     for_each_string_list_item(item, &del_list) {
>         strbuf_addstr(&abs_path, prefix);
>         strbuf_addstr(&abs_path, item->string);
>         PROCESS(&abs_path);
>         strbuf_reset(&abs_path);
>     }
> 
> where I've elided a bunch of unnecessary details and PROCESS(&abs_path)
> represents a big chunk of code rather than an actual function call.  One
> piece of PROCESS was:
> 
>     if (lstat(abs_path.buf, &st))
>         continue;
> 
> which would cause the strbuf_reset() to be missed -- meaning that the
> next path to be handled would have two paths concatenated.  This path
> used to use die_errno() instead of continue prior to commit 396049e5fb62
> ("git-clean: refactor git-clean into two phases", 2013-06-25), but my
> understanding of how correct_untracked_entries() works is that it will
> prevent both dir/ and dir/file from being in the list to clean so this
> should be dead code and the die_errno() should be safe.  But I hesitate
> to remove it since I am not certain.  Instead, just fix it to avoid path
> corruption in case it is possible to reach this continue statement.
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  builtin/clean.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/clean.c b/builtin/clean.c
> index 6030842f3a..ccb6e23f0b 100644
> --- a/builtin/clean.c
> +++ b/builtin/clean.c
> @@ -1028,8 +1028,10 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
>  		 * recursive directory removal, so lstat() here could
>  		 * fail with ENOENT.
>  		 */
> -		if (lstat(abs_path.buf, &st))
> +		if (lstat(abs_path.buf, &st)) {
> +			strbuf_reset(&abs_path);
>  			continue;
> +		}

I wonder whether it would be safer to call strbuf_reset() at the start
of each loop iteration instead of before 'continue'.  That way we
wouldn't have to worry about other 'continue' statements forgetting
about it.

It probably doesn't really matter in this particular case (considering
that it's potentially dead code to begin with), but have a look at
e.g. diff.c:show_stats() and its several strbuf_reset(&out) calls
preceeding continue statements.

>  		if (S_ISDIR(st.st_mode)) {
>  			if (remove_dirs(&abs_path, prefix, rm_flags, dry_run, quiet, &gone))
> -- 
> 2.22.1.11.g45a39ee867
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [RFC PATCH v2 10/12] clean: avoid removing untracked files in a nested git repository
  2019-09-05 15:47   ` [RFC PATCH v2 10/12] clean: avoid removing untracked files in a nested git repository Elijah Newren
@ 2019-09-05 21:20     ` SZEDER Gábor
  0 siblings, 0 replies; 73+ messages in thread
From: SZEDER Gábor @ 2019-09-05 21:20 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Jeff King, Rafael Ascensão, Samuel Lijin

On Thu, Sep 05, 2019 at 08:47:33AM -0700, Elijah Newren wrote:
> Users expect files in a nested git repository to be left alone unless
> sufficiently forced (with two -f's).  Unfortunately, in certain
> circumstances, git would delete both tracked (and possibly dirty) files
> and untracked files within a nested repository.  To explain how this
> happens, let's contrast a couple cases.  First, take the following
> example setup (which assumes we are already within a git repo):
> 
>    git init nested
>    cd nested
>    >tracked
>    git add tracked
>    git commit -m init
>    >untracked
>    cd ..
> 
> In this setup, everything works as expected; running 'git clean -fd'
> will result in fill_directory() returning the following paths:
>    nested/
>    nested/tracked
>    nested/untracked
> and then correct_untracked_entries() would notice this can be compressed
> to
>    nested/
> and then since "nested/" is a directory, we would call
> remove_dirs("nested/", ...), which would
> check is_nonbare_repository_dir() and then decide to skip it.
> 
> However, if someone also creates an ignored file:
>    >nested/ignored
> then running 'git clean -fd' would result in fill_directory() returning
> the same paths:
>    nested/
>    nested/tracked
>    nested/untracked
> but correct_untracked_entries() will notice that we had ignored entries
> under nested/ and thus simplify this list to
>    nested/tracked
>    nested/untracked
> Since these are not directories, we do not call remove_dirs() which was
> the only place that had the is_nonbare_repository_dir() safety check --
> resulting in us deleting both the untracked file and the tracked (and
> possibly dirty) file.
> 
> One possible fix for this issue would be walking the parent directories
> of each path and checking if they represent nonbare repositories, but
> that would be wasteful.  Even if we added caching of some sort, it's
> still a waste because we should have been able to check that "nested/"
> represented a nonbare repository before even descending into it in the
> first place.  Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use
> it to prevent fill_directory() and friends from descending into nested
> git repos.

> Finally, there is one somewhat related bug which this patch does not
> fix, coming from the opposite angle.  If the user runs
>    git clean -ffd
> to force deletion of untracked nested repositories, and within an
> untracked nested repo the user has ignored files (according to the inner
> OR outer repositories' .gitignore), then not only will those ignored
> files be left alone but the .git/ subdirectory of the nested repo will
> be left alone too.  I am not completely sure if this should be
> considered a bug (though it seems like it since the lack of the
> untracked file would result in the .git/ subdirectory being deleted),
> but in any event it is very minor compared to accidentally deleting user
> data and I did not dive into it.

We briefly mentioned this "ignored file in a nested repo fools 'git
clean -d'" issue in an unrelated thread as well, where Philip
suggested that the gitignore of the outer repository should not have
any effect on the nested repository.  I'm inclined to agree.

  https://public-inbox.org/git/e221aaf8-7d0b-6feb-3f58-1e9e4382939b@iee.email/

Now, 'git clean -X' is supposed to "Remove only files ignored by
Git.".  I'm not entirely sure what 'git clean -ffdX' is supposed to do
(or whether it makes any sense in the first place), but it does delete
files in the nested repository that are ignored only in the outer
repository, both tracked (and possibly dirty) and untracked, even with
this patch series.  Without this series '-fdX' is just as bad, but
with this patch (i.e. by not descending into nested repositories)
'-fdX' becomes sensible and leaves the nested repository alone.

> diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt
> index 3ab749b921..ba31d8d166 100644
> --- a/Documentation/git-clean.txt
> +++ b/Documentation/git-clean.txt
> @@ -37,9 +37,9 @@ OPTIONS
>  --force::
>  	If the Git configuration variable clean.requireForce is not set
>  	to false, 'git clean' will refuse to delete files or directories
> -	unless given -f or -i. Git will refuse to delete directories
> -	with .git sub directory or file unless a second -f
> -	is given.
> +	unless given -f or -i.  Git will refuse to modify untracked
> +	nested git repositories (directories with a .git subdirectory)
> +	unless a second -f is given.

I like this wording.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [RFC PATCH v2 00/12] Fix some git clean issues
  2019-09-05 19:01   ` [RFC PATCH v2 00/12] Fix some git clean issues SZEDER Gábor
@ 2019-09-07  0:33     ` Elijah Newren
  0 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-07  0:33 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Git Mailing List, Jeff King, Rafael Ascensão, Samuel Lijin

On Thu, Sep 5, 2019 at 12:01 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Thu, Sep 05, 2019 at 08:47:23AM -0700, Elijah Newren wrote:
> > This patch series fixes a few issues with git-clean:
>
> >   * Failure to preserve both tracked and untracked files within a nested
> >     Git repository reported a few weeks ago by SZEDER[3].
>
> Wow, I didn't expect a 12 patch series to fix that issue...
> Thanks.

Well, to be fair, only the last three patches were about that issue.
The first 9 were about the other issues.  It's just that your testcase
reminded me of my old series and gave me another nudge to dig it out
and see if it helped with your problem.  I had to rebase it and look
back over it, and then found it didn't help with your problem, but by
then I had refamiliarized myself with the code so...

> > Elijah Newren (12):
> >   t7300: Add some testcases showing failure to clean specified pathspecs
> >   dir: fix typo in comment
> >   dir: fix off-by-one error in match_pathspec_item
> >   dir: Directories should be checked for matching pathspecs too
> >   dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule
> >     case
> >   dir: If our pathspec might match files under a dir, recurse into it
>
> Nit: no capital letters after the '<area>:' prefix.

Gah, I should know that any patch series I submitted from a year and a
half ago probably made that mistake.  I've mostly trained myself out
of it now, but I certainly hadn't back then.

Thanks for pointing it out; will fix.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [RFC PATCH v2 12/12] clean: fix theoretical path corruption
  2019-09-05 19:27     ` SZEDER Gábor
@ 2019-09-07  0:34       ` Elijah Newren
  0 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-07  0:34 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Git Mailing List, Jeff King, Rafael Ascensão, Samuel Lijin

On Thu, Sep 5, 2019 at 12:27 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Thu, Sep 05, 2019 at 08:47:35AM -0700, Elijah Newren wrote:
> > cmd_clean() had the following code structure:
> >
> >     struct strbuf abs_path = STRBUF_INIT;
> >     for_each_string_list_item(item, &del_list) {
> >         strbuf_addstr(&abs_path, prefix);
> >         strbuf_addstr(&abs_path, item->string);
> >         PROCESS(&abs_path);
> >         strbuf_reset(&abs_path);
> >     }
> >
> > where I've elided a bunch of unnecessary details and PROCESS(&abs_path)
> > represents a big chunk of code rather than an actual function call.  One
> > piece of PROCESS was:
> >
> >     if (lstat(abs_path.buf, &st))
> >         continue;
> >
> > which would cause the strbuf_reset() to be missed -- meaning that the
> > next path to be handled would have two paths concatenated.  This path
> > used to use die_errno() instead of continue prior to commit 396049e5fb62
> > ("git-clean: refactor git-clean into two phases", 2013-06-25), but my
> > understanding of how correct_untracked_entries() works is that it will
> > prevent both dir/ and dir/file from being in the list to clean so this
> > should be dead code and the die_errno() should be safe.  But I hesitate
> > to remove it since I am not certain.  Instead, just fix it to avoid path
> > corruption in case it is possible to reach this continue statement.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  builtin/clean.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/builtin/clean.c b/builtin/clean.c
> > index 6030842f3a..ccb6e23f0b 100644
> > --- a/builtin/clean.c
> > +++ b/builtin/clean.c
> > @@ -1028,8 +1028,10 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
> >                * recursive directory removal, so lstat() here could
> >                * fail with ENOENT.
> >                */
> > -             if (lstat(abs_path.buf, &st))
> > +             if (lstat(abs_path.buf, &st)) {
> > +                     strbuf_reset(&abs_path);
> >                       continue;
> > +             }
>
> I wonder whether it would be safer to call strbuf_reset() at the start
> of each loop iteration instead of before 'continue'.  That way we
> wouldn't have to worry about other 'continue' statements forgetting
> about it.
>
> It probably doesn't really matter in this particular case (considering
> that it's potentially dead code to begin with), but have a look at
> e.g. diff.c:show_stats() and its several strbuf_reset(&out) calls
> preceding continue statements.

Ooh, I like that idea.  I think I'll apply that here.  I'll probably
leave diff.c:show_stats() as #leftoverbits for someone else, though I
really like the idea of fixing up other issues like this as you
suggest.
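
For readers following along, here is a rough sketch of the "reset at the
top of the loop" shape discussed above.  It is not the code from
builtin/clean.c: struct buf, buf_reset(), buf_addstr(), is_gone() and
process_items() are hypothetical stand-ins (for strbuf, its helpers, a
failing lstat() and the body of the loop), invented only for
illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for git's struct strbuf. */
struct buf {
	char data[256];
	size_t len;
};

static void buf_reset(struct buf *b)
{
	b->len = 0;
	b->data[0] = '\0';
}

static void buf_addstr(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	assert(b->len + n < sizeof(b->data));
	memcpy(b->data + b->len, s, n + 1); /* copies the NUL too */
	b->len += n;
}

/* Stand-in for a failing lstat(): pretend paths containing "gone" vanished. */
static int is_gone(const char *path)
{
	return strstr(path, "gone") != NULL;
}

/*
 * Build prefix+item for each item in one reused buffer.  Resetting at the
 * top of the loop keeps every 'continue' safe.  Returns the number of
 * paths that were processed and copies the last one into 'last'.
 */
static size_t process_items(const char *prefix, const char **items, size_t nr,
			    char *last, size_t lastsz)
{
	struct buf abs_path;
	size_t i, processed = 0;

	for (i = 0; i < nr; i++) {
		buf_reset(&abs_path);
		buf_addstr(&abs_path, prefix);
		buf_addstr(&abs_path, items[i]);
		if (is_gone(abs_path.data))
			continue; /* no reset needed before this */
		snprintf(last, lastsz, "%s", abs_path.data);
		processed++;
	}
	return processed;
}
```

With the reset at the end of the loop body instead, the skipped "gone"
entry would leave its path in the buffer and the next item would be
appended to it; with the reset first, any future 'continue' is safe by
construction.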

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 00/12] Fix some git clean issues
  2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
                     ` (12 preceding siblings ...)
  2019-09-05 19:01   ` [RFC PATCH v2 00/12] Fix some git clean issues SZEDER Gábor
@ 2019-09-12 22:12   ` " Elijah Newren
  2019-09-12 22:12     ` [PATCH v3 01/12] t7300: add testcases showing failure to clean specified pathspecs Elijah Newren
                       ` (12 more replies)
  13 siblings, 13 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-12 22:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

NOTE: This series builds on sg/clean-nested-repo-with-ignored, as it
      (among other things) modifies his testcase from expect_failure
      to expect_success.  Also, Peff is probably the only one who
      remembers v1 (and even he may have forgotten it): v1 was posted
      a year and a half ago.

This patch series fixes a few issues with git-clean:
  * Failure to clean when multiple pathspecs are specified, reported both
    in April 2018[1] and again in May 2019[2].
  * Failure to preserve both tracked and untracked files within a nested
    Git repository reported a few weeks ago by SZEDER[3].

[1] https://public-inbox.org/git/20180405173446.32372-4-newren@gmail.com/
[2] https://public-inbox.org/git/20190531183651.10067-1-rafa.almas@gmail.com/
[3] https://public-inbox.org/git/20190825185918.3909-1-szeder.dev@gmail.com/

Changes since v2:
  * Removed the RFC label
  * Fixed up a few things SZEDER pointed out in v2 -- some commit
    message improvements, and an extra safety precaution for the
    final patch.

Stuff I'd like reviewers to focus on:
  * Read the (long) commit messages for patches 9 & 10; do folks agree
    with my declaration of "correct" behavior and my changes to the
    few regression tests in those patches?

Elijah Newren (12):
  t7300: add testcases showing failure to clean specified pathspecs
  dir: fix typo in comment
  dir: fix off-by-one error in match_pathspec_item
  dir: also check directories for matching pathspecs
  dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule
    case
  dir: if our pathspec might match files under a dir, recurse into it
  dir: add commentary explaining match_pathspec_item's return value
  git-clean.txt: do not claim we will delete files with -n/--dry-run
  clean: disambiguate the definition of -d
  clean: avoid removing untracked files in a nested git repository
  clean: rewrap overly long line
  clean: fix theoretical path corruption

 Documentation/git-clean.txt | 16 +++++-----
 builtin/clean.c             | 15 +++++++--
 dir.c                       | 63 +++++++++++++++++++++++++++----------
 dir.h                       |  8 +++--
 t/t7300-clean.sh            | 44 +++++++++++++++++++++++---
 5 files changed, 112 insertions(+), 34 deletions(-)

Range-diff:
 1:  82328e2033 !  1:  fe35ab8cc3 t7300: Add some testcases showing failure to clean specified pathspecs
    @@ Metadata
     Author: Elijah Newren <newren@gmail.com>
     
      ## Commit message ##
    -    t7300: Add some testcases showing failure to clean specified pathspecs
    +    t7300: add testcases showing failure to clean specified pathspecs
     
         Someone brought me a testcase where multiple git-clean invocations were
         required to clean out unwanted files:
 2:  5c1f58fd9d =  2:  707d287d79 dir: fix typo in comment
 3:  0e8b415af3 =  3:  bb316e82b2 dir: fix off-by-one error in match_pathspec_item
 4:  30b3ede443 !  4:  56319f934a dir: Directories should be checked for matching pathspecs too
    @@ Metadata
     Author: Elijah Newren <newren@gmail.com>
     
      ## Commit message ##
    -    dir: Directories should be checked for matching pathspecs too
    +    dir: also check directories for matching pathspecs
     
         Even if a directory doesn't match a pathspec, it is possible, depending
         on the precise pathspecs, that some file underneath it might.  So we
 5:  bab01f4cda !  5:  81593a565c dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
    @@ Metadata
     Author: Elijah Newren <newren@gmail.com>
     
      ## Commit message ##
    -    dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
    +    dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
     
         The specific checks done in match_pathspec_item for the DO_MATCH_SUBMODULE
         case are useful for other cases which have nothing to do with submodules.
 6:  c619ab4b3e !  6:  9566823a0f dir: If our pathspec might match files under a dir, recurse into it
    @@ Metadata
     Author: Elijah Newren <newren@gmail.com>
     
      ## Commit message ##
    -    dir: If our pathspec might match files under a dir, recurse into it
    +    dir: if our pathspec might match files under a dir, recurse into it
     
         For git clean, if a directory is entirely untracked and the user did not
         specify -d (corresponding to DIR_SHOW_IGNORED_TOO), then we usually do
    @@ Commit message
         the given directory which match one of the pathspecs being added to the
         entries list.
     
    -    Two particular considerations for this patch:
    +    Two notes of potential interest to future readers:
     
    -      * If we want to only recurse into a directory when it is specifically
    +      * If we wanted to only recurse into a directory when it is specifically
             matched rather than matched-via-glob (e.g. '*.c'), then we could do
             so via making the final non-zero return in match_pathspec_item be
             MATCHED_RECURSIVELY instead of MATCHED_RECURSIVELY_LEADING_PATHSPEC.
    -        (See final patch of this RFC series for details; note that the
    -        relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC and
    -        MATCHED_RECURSIVELY are important for such a change.))
    +        (Note that the relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC
    +        and MATCHED_RECURSIVELY are important for such a change.)  I was
    +        leaving open that possibility while writing an RFC asking for the
    +        behavior we want, but even though we don't want it, that knowledge
    +        might help you understand the code flow better.
     
           * There is a growing amount of logic in read_directory_recursive() for
    -        deciding whether to recurse into a subdirectory.  However, there is
    -        a comment immediately preceding this logic that says to recurse if
    +        deciding whether to recurse into a subdirectory.  However, there is a
    +        comment immediately preceding this logic that says to recurse if
             instructed by treat_path().   It may be better for the logic in
    -        read_directory_recursive() to be moved to treat_path() (or another
    -        function it calls, such as treat_directory()), but I did not feel
    -        strongly about this and just left the logic where it was while
    -        adding to it.  Do others have strong opinions on this?
    +        read_directory_recursive() to ultimately be moved to treat_path() (or
    +        another function it calls, such as treat_directory()), but I have
    +        left that for someone else to tackle in the future.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
     
 7:  5e1686a30e =  7:  7821898ba7 dir: add commentary explaining match_pathspec_item's return value
 8:  d92637f961 =  8:  13def5df57 git-clean.txt: do not claim we will delete files with -n/--dry-run
 9:  4550a30df8 =  9:  e6b274abf7 clean: disambiguate the definition of -d
10:  0985be2793 ! 10:  5f4ef14765 clean: avoid removing untracked files in a nested git repository
    @@ Commit message
         Also, clean up some grammar errors describing this functionality in the
         git-clean manpage.
     
    -    Finally, there is one somewhat related bug which this patch does not
    -    fix, coming from the opposite angle.  If the user runs
    -       git clean -ffd
    -    to force deletion of untracked nested repositories, and within an
    -    untracked nested repo the user has ignored files (according to the inner
    -    OR outer repositories' .gitignore), then not only will those ignored
    -    files be left alone but the .git/ subdirectory of the nested repo will
    -    be left alone too.  I am not completely sure if this should be
    -    considered a bug (though it seems like it since the lack of the
    -    untracked file would result in the .git/ subdirectory being deleted),
    -    but in any event it is very minor compared to accidentally deleting user
    -    data and I did not dive into it.
    +    Finally, there are still a couple bugs with -ffd not cleaning out enough
    +    (e.g.  missing the nested .git) and with -ffdX possibly cleaning out the
    +    wrong files (paying attention to outer .gitignore instead of inner).
    +    This patch does not address these cases at all (and does not change the
    +    behavior relative to those flags), it only fixes the handling when given
    +    a single -f.  See
    +    https://public-inbox.org/git/20190905212043.GC32087@szeder.dev/ for more
    +    discussion of the -ffd[X?] bugs.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
     
11:  2d36e3f7cb = 11:  4e30e62eb1 clean: rewrap overly long line
12:  3c5d1ff16a <  -:  ---------- clean: fix theoretical path corruption
 -:  ---------- > 12:  de2444f7cb clean: fix theoretical path corruption
-- 
2.23.0.173.gad11b3a635.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 01/12] t7300: add testcases showing failure to clean specified pathspecs
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
@ 2019-09-12 22:12     ` Elijah Newren
  2019-09-13 18:54       ` Junio C Hamano
  2019-09-12 22:12     ` [PATCH v3 02/12] dir: fix typo in comment Elijah Newren
                       ` (11 subsequent siblings)
  12 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-09-12 22:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

Someone brought me a testcase where multiple git-clean invocations were
required to clean out unwanted files:
  mkdir d{1,2}
  touch d{1,2}/ut
  touch d1/t && git add d1/t
With this setup, the user would need to run
  git clean -ffd */ut
twice to delete both ut files.

A little testing showed some interesting variants:
  * If only one of those two ut files existed (either one), then only one
    clean command would be necessary.
  * If both directories had tracked files, then only one git clean would
    be necessary to clean both files.
  * If both directories had no tracked files then the clean command above
    would never clean either of the untracked files despite the pathspec
    explicitly calling both of them out.

A bisect showed that the failure to clean out the files started with
commit cf424f5fd89b ("clean: respect pathspecs with "-d"", 2014-03-10).
However, that pointed to a separate issue: while the "-d" flag was used
by the original user who showed me this problem, that flag should have
been irrelevant to this problem.  Testing again without the "-d" flag
showed that the same buggy behavior exists without using that flag, and
has in fact existed since before cf424f5fd89b.

Add testcases showing that multiple untracked files within entirely
untracked directories cannot be cleaned when specifying these files to
git clean via pathspecs.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7300-clean.sh | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index d01fd120ab..2c254c773c 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -691,6 +691,38 @@ test_expect_failure 'git clean -d skips nested repo containing ignored files' '
 	test_path_is_file nested-repo-with-ignored-file/file
 '
 
+test_expect_failure 'git clean handles being told what to clean' '
+	mkdir -p d1 d2 &&
+	touch d1/ut d2/ut &&
+	git clean -f */ut &&
+	test_path_is_missing d1/ut &&
+	test_path_is_missing d2/ut
+'
+
+test_expect_failure 'git clean handles being told what to clean, with -d' '
+	mkdir -p d1 d2 &&
+	touch d1/ut d2/ut &&
+	git clean -ffd */ut &&
+	test_path_is_missing d1/ut &&
+	test_path_is_missing d2/ut
+'
+
+test_expect_failure 'git clean works if a glob is passed without -d' '
+	mkdir -p d1 d2 &&
+	touch d1/ut d2/ut &&
+	git clean -f "*ut" &&
+	test_path_is_missing d1/ut &&
+	test_path_is_missing d2/ut
+'
+
+test_expect_failure 'git clean works if a glob is passed with -d' '
+	mkdir -p d1 d2 &&
+	touch d1/ut d2/ut &&
+	git clean -ffd "*ut" &&
+	test_path_is_missing d1/ut &&
+	test_path_is_missing d2/ut
+'
+
 test_expect_success MINGW 'handle clean & core.longpaths = false nicely' '
 	test_config core.longpaths false &&
 	a50=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa &&
-- 
2.23.0.173.gad11b3a635.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 02/12] dir: fix typo in comment
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
  2019-09-12 22:12     ` [PATCH v3 01/12] t7300: add testcases showing failure to clean specified pathspecs Elijah Newren
@ 2019-09-12 22:12     ` Elijah Newren
  2019-09-12 22:12     ` [PATCH v3 03/12] dir: fix off-by-one error in match_pathspec_item Elijah Newren
                       ` (10 subsequent siblings)
  12 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-12 22:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index d021c908e5..a9168bed96 100644
--- a/dir.c
+++ b/dir.c
@@ -139,7 +139,7 @@ static size_t common_prefix_len(const struct pathspec *pathspec)
 	 * ":(icase)path" is treated as a pathspec full of
 	 * wildcard. In other words, only prefix is considered common
 	 * prefix. If the pathspec is abc/foo abc/bar, running in
-	 * subdir xyz, the common prefix is still xyz, not xuz/abc as
+	 * subdir xyz, the common prefix is still xyz, not xyz/abc as
 	 * in non-:(icase).
 	 */
 	GUARD_PATHSPEC(pathspec,
-- 
2.23.0.173.gad11b3a635.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 03/12] dir: fix off-by-one error in match_pathspec_item
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
  2019-09-12 22:12     ` [PATCH v3 01/12] t7300: add testcases showing failure to clean specified pathspecs Elijah Newren
  2019-09-12 22:12     ` [PATCH v3 02/12] dir: fix typo in comment Elijah Newren
@ 2019-09-12 22:12     ` Elijah Newren
  2019-09-13 19:05       ` Junio C Hamano
  2019-09-12 22:12     ` [PATCH v3 04/12] dir: also check directories for matching pathspecs Elijah Newren
                       ` (9 subsequent siblings)
  12 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-09-12 22:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

For a pathspec like 'foo/bar' comparing against a path named "foo/",
namelen will be 4, and match[namelen] will be 'b'.  The correct location
of the directory separator is namelen-1.

The reason the code worked anyway was that the following code immediately
checked whether the first matchlen characters matched (which they do) and
then bailed and returned MATCHED_RECURSIVELY anyway since wildmatch doesn't
have the ability to check if "name" can be matched as a directory (or
prefix) against the pathspec.
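
As a standalone illustration of the index arithmetic described above,
here is a minimal sketch.  is_leading_dir() is a hypothetical helper,
not a function in dir.c, and it uses plain strncmp() where the real code
uses ps_strncmp() (which also handles case-insensitive pathspecs).

```c
#include <assert.h>
#include <string.h>

/*
 * Return nonzero if 'name' (a directory, with or without a trailing
 * slash) is a literal leading directory of the pathspec 'match'.
 */
static int is_leading_dir(const char *name, const char *match)
{
	size_t namelen = strlen(name);
	size_t matchlen = strlen(match);
	size_t offset;

	assert(namelen > 0);
	/* For "foo/" the separator in match sits at namelen-1, not namelen. */
	offset = name[namelen - 1] == '/' ? 1 : 0;
	return namelen < matchlen &&
	       match[namelen - offset] == '/' &&
	       !strncmp(match, name, namelen);
}
```

For name "foo/" and match "foo/bar", namelen is 4, so match[4] is 'b'
while match[4 - 1] is the '/' we actually want to inspect; for name
"foo" (no trailing slash) the offset is 0 and match[3] is checked
directly.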

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index a9168bed96..bf1a74799e 100644
--- a/dir.c
+++ b/dir.c
@@ -356,8 +356,9 @@ static int match_pathspec_item(const struct index_state *istate,
 	/* Perform checks to see if "name" is a super set of the pathspec */
 	if (flags & DO_MATCH_SUBMODULE) {
 		/* name is a literal prefix of the pathspec */
+		int offset = name[namelen-1] == '/' ? 1 : 0;
 		if ((namelen < matchlen) &&
-		    (match[namelen] == '/') &&
+		    (match[namelen-offset] == '/') &&
 		    !ps_strncmp(item, match, name, namelen))
 			return MATCHED_RECURSIVELY;
 
-- 
2.23.0.173.gad11b3a635.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 04/12] dir: also check directories for matching pathspecs
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
                       ` (2 preceding siblings ...)
  2019-09-12 22:12     ` [PATCH v3 03/12] dir: fix off-by-one error in match_pathspec_item Elijah Newren
@ 2019-09-12 22:12     ` Elijah Newren
  2019-09-12 22:12     ` [PATCH v3 05/12] dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case Elijah Newren
                       ` (8 subsequent siblings)
  12 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-12 22:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

Even if a directory doesn't match a pathspec, it is possible, depending
on the precise pathspecs, that some file underneath it might.  So we
special case and recurse into the directory for such situations.  However,
we previously always added any untracked directory that we recursed into
to the list of untracked paths, regardless of whether the directory
itself matched the pathspec.

For the case of git-clean and a set of pathspecs of "dir/file" and "more",
this caused a problem because we'd end up with dir entries for both of
  "dir"
  "dir/file"
Then correct_untracked_entries() would try to helpfully prune duplicates
for us by removing "dir/file" since it's under "dir", leaving us with
  "dir"
Since the original pathspec only had "dir/file", the only entry left
doesn't match and leaves nothing to be removed.  (Note that if only one
pathspec was specified, e.g. only "dir/file", then the common_prefix_len
optimizations in fill_directory would cause us to bypass this problem,
making it appear in simple tests that we could correctly remove manually
specified pathspecs.)
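
The pruning step described above can be sketched roughly as follows.
This is a deliberately simplified model, not correct_untracked_entries()
itself (which also takes ignored entries into account); is_under(),
prune_nested() and list_contains() are hypothetical names, and the
sketch assumes every nested path directly follows its directory in the
list.

```c
#include <assert.h>
#include <string.h>

/* Is 'path' located beneath directory 'dir' (given as "dir" or "dir/")? */
static int is_under(const char *path, const char *dir)
{
	size_t len = strlen(dir);

	assert(len > 0);
	if (strncmp(path, dir, len))
		return 0;
	return dir[len - 1] == '/' || path[len] == '/';
}

/*
 * Drop entries that live beneath the most recently kept entry.
 * Compacts 'paths' in place and returns the new number of entries.
 */
static size_t prune_nested(const char **paths, size_t nr)
{
	size_t src, dst = 0;

	for (src = 0; src < nr; src++) {
		if (dst > 0 && is_under(paths[src], paths[dst - 1]))
			continue; /* covered by its parent directory */
		paths[dst++] = paths[src];
	}
	return dst;
}

/* Literal lookup, standing in for "does any entry match this pathspec?" */
static int list_contains(const char **paths, size_t nr, const char *wanted)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(paths[i], wanted))
			return 1;
	return 0;
}
```

Feeding it "dir", "dir/file" and "more" leaves only "dir" and "more", so
a later literal check for "dir/file" finds nothing even though that was
exactly what the user asked to clean.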

Fix this by checking whether the directory we are about to add to the
list of dir entries actually matches the pathspec; only do this
matching check after we have already returned from recursing into the
directory.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c            | 5 +++++
 t/t7300-clean.sh | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index bf1a74799e..76a3c3894b 100644
--- a/dir.c
+++ b/dir.c
@@ -1951,6 +1951,11 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 							 check_only, stop_at_first_file, pathspec);
 			if (subdir_state > dir_state)
 				dir_state = subdir_state;
+
+			if (!match_pathspec(istate, pathspec, path.buf, path.len,
+					    0 /* prefix */, NULL,
+					    0 /* do NOT special case dirs */))
+				state = path_none;
 		}
 
 		if (check_only) {
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 2c254c773c..12617158db 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -699,7 +699,7 @@ test_expect_failure 'git clean handles being told what to clean' '
 	test_path_is_missing d2/ut
 '
 
-test_expect_failure 'git clean handles being told what to clean, with -d' '
+test_expect_success 'git clean handles being told what to clean, with -d' '
 	mkdir -p d1 d2 &&
 	touch d1/ut d2/ut &&
 	git clean -ffd */ut &&
@@ -715,7 +715,7 @@ test_expect_failure 'git clean works if a glob is passed without -d' '
 	test_path_is_missing d2/ut
 '
 
-test_expect_failure 'git clean works if a glob is passed with -d' '
+test_expect_success 'git clean works if a glob is passed with -d' '
 	mkdir -p d1 d2 &&
 	touch d1/ut d2/ut &&
 	git clean -ffd "*ut" &&
-- 
2.23.0.173.gad11b3a635.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 05/12] dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
                       ` (3 preceding siblings ...)
  2019-09-12 22:12     ` [PATCH v3 04/12] dir: also check directories for matching pathspecs Elijah Newren
@ 2019-09-12 22:12     ` Elijah Newren
  2019-09-12 22:12     ` [PATCH v3 06/12] dir: if our pathspec might match files under a dir, recurse into it Elijah Newren
                       ` (7 subsequent siblings)
  12 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-12 22:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

The specific checks done in match_pathspec_item for the DO_MATCH_SUBMODULE
case are useful for other cases which have nothing to do with submodules.
Rename this constant; a subsequent commit will make use of this change.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 76a3c3894b..b4d656192e 100644
--- a/dir.c
+++ b/dir.c
@@ -273,7 +273,7 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
 
 #define DO_MATCH_EXCLUDE   (1<<0)
 #define DO_MATCH_DIRECTORY (1<<1)
-#define DO_MATCH_SUBMODULE (1<<2)
+#define DO_MATCH_LEADING_PATHSPEC (1<<2)
 
 /*
  * Does 'match' match the given name?
@@ -354,7 +354,7 @@ static int match_pathspec_item(const struct index_state *istate,
 		return MATCHED_FNMATCH;
 
 	/* Perform checks to see if "name" is a super set of the pathspec */
-	if (flags & DO_MATCH_SUBMODULE) {
+	if (flags & DO_MATCH_LEADING_PATHSPEC) {
 		/* name is a literal prefix of the pathspec */
 		int offset = name[namelen-1] == '/' ? 1 : 0;
 		if ((namelen < matchlen) &&
@@ -498,7 +498,7 @@ int submodule_path_match(const struct index_state *istate,
 					strlen(submodule_name),
 					0, seen,
 					DO_MATCH_DIRECTORY |
-					DO_MATCH_SUBMODULE);
+					DO_MATCH_LEADING_PATHSPEC);
 	return matched;
 }
 
-- 
2.23.0.173.gad11b3a635.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 06/12] dir: if our pathspec might match files under a dir, recurse into it
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
                       ` (4 preceding siblings ...)
  2019-09-12 22:12     ` [PATCH v3 05/12] dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case Elijah Newren
@ 2019-09-12 22:12     ` Elijah Newren
  2019-09-13 19:45       ` Junio C Hamano
  2019-09-12 22:12     ` [PATCH v3 07/12] dir: add commentary explaining match_pathspec_item's return value Elijah Newren
                       ` (6 subsequent siblings)
  12 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-09-12 22:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

For git clean, if a directory is entirely untracked and the user did not
specify -d (corresponding to DIR_SHOW_IGNORED_TOO), then we usually do
not want to remove that directory and thus do not recurse into it.
However, if the user manually specified specific (or even globbed) paths
somewhere under that directory to remove, then we need to recurse into
the directory to make sure we remove the relevant paths under that
directory as the user requested.

Note that this does not mean that the recursed-into directory will be
added to dir->entries for later removal; as of a few commits earlier in
this series, there is another more strict match check that is run after
returning from a recursed-into directory before deciding to add it to the
list of entries.  Therefore, this will only result in files underneath
the given directory which match one of the pathspecs being added to the
entries list.
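
The recursion decision described above can be sketched in isolation.
This is a simplified model, not the code in read_directory_recursive():
the enum and the boolean parameters are hypothetical stand-ins for the
treat_path() result, the get_dtype() check, the DIR_SHOW_IGNORED_TOO
flag, and the do_match_pathspec() comparison.

```c
#include <stdbool.h>

/* Stand-in for the subset of path_treatment values that matter here. */
enum sketch_state { SKETCH_NONE, SKETCH_RECURSE, SKETCH_UNTRACKED };

/*
 * Recurse when told to, or when an untracked directory either is being
 * walked for ignored entries too, or might contain a pathspec match.
 */
bool should_recurse(enum sketch_state state, bool is_dir,
		    bool show_ignored_too, bool pathspec_leads_into_dir)
{
	if (state == SKETCH_RECURSE)
		return true;
	return state == SKETCH_UNTRACKED && is_dir &&
	       (show_ignored_too || pathspec_leads_into_dir);
}
```

The only new input compared to the old condition is the last
parameter; without it, an untracked directory is skipped unless
DIR_SHOW_IGNORED_TOO is set.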

Two notes of potential interest to future readers:

  * If we wanted to only recurse into a directory when it is specifically
    matched rather than matched-via-glob (e.g. '*.c'), then we could do
    so via making the final non-zero return in match_pathspec_item be
    MATCHED_RECURSIVELY instead of MATCHED_RECURSIVELY_LEADING_PATHSPEC.
    (Note that the relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC
    and MATCHED_RECURSIVELY is important for such a change.)  I was
    leaving open that possibility while writing an RFC asking for the
    behavior we want, but even though we don't want it, that knowledge
    might help you understand the code flow better.

  * There is a growing amount of logic in read_directory_recursive() for
    deciding whether to recurse into a subdirectory.  However, there is a
    comment immediately preceding this logic that says to recurse if
    instructed by treat_path().  It may be better for the logic in
    read_directory_recursive() to ultimately be moved to treat_path() (or
    another function it calls, such as treat_directory()), but I have
    left that for someone else to tackle in the future.
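
The remark about relative order presumably matters because the
MATCHED_* values are compared numerically.  A minimal sketch, assuming
the caller keeps the numerically largest result across all pathspec
items (strongest_match() is a hypothetical helper; the constant values
are the ones from the dir.h hunk of this patch):

```c
#include <stddef.h>

#define MATCHED_RECURSIVELY 1
#define MATCHED_RECURSIVELY_LEADING_PATHSPEC 2
#define MATCHED_FNMATCH 3
#define MATCHED_EXACTLY 4

/* Return the strongest of n per-item results, or 0 if none matched. */
int strongest_match(const int *results, size_t n)
{
	int best = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (results[i] > best)
			best = results[i];
	return best;
}
```

Under that assumption, a leading-pathspec match outranks a plain
recursive match but never hides a wildcard or exact match.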

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c            | 10 ++++++----
 dir.h            |  5 +++--
 t/t7300-clean.sh |  4 ++--
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/dir.c b/dir.c
index b4d656192e..47c0a99cb5 100644
--- a/dir.c
+++ b/dir.c
@@ -360,7 +360,7 @@ static int match_pathspec_item(const struct index_state *istate,
 		if ((namelen < matchlen) &&
 		    (match[namelen-offset] == '/') &&
 		    !ps_strncmp(item, match, name, namelen))
-			return MATCHED_RECURSIVELY;
+			return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
 
 		/* name" doesn't match up to the first wild character */
 		if (item->nowildcard_len < item->len &&
@@ -377,7 +377,7 @@ static int match_pathspec_item(const struct index_state *istate,
 		 * The submodules themselves will be able to perform more
 		 * accurate matching to determine if the pathspec matches.
 		 */
-		return MATCHED_RECURSIVELY;
+		return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
 	}
 
 	return 0;
@@ -1939,8 +1939,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 		/* recurse into subdir if instructed by treat_path */
 		if ((state == path_recurse) ||
 			((state == path_untracked) &&
-			 (dir->flags & DIR_SHOW_IGNORED_TOO) &&
-			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR))) {
+			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
+			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
+			  do_match_pathspec(istate, pathspec, path.buf, path.len,
+					    baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
 			struct untracked_cache_dir *ud;
 			ud = lookup_untracked(dir->untracked, untracked,
 					      path.buf + baselen,
diff --git a/dir.h b/dir.h
index 680079bbe3..46c238ab49 100644
--- a/dir.h
+++ b/dir.h
@@ -211,8 +211,9 @@ int count_slashes(const char *s);
  * when populating the seen[] array.
  */
 #define MATCHED_RECURSIVELY 1
-#define MATCHED_FNMATCH 2
-#define MATCHED_EXACTLY 3
+#define MATCHED_RECURSIVELY_LEADING_PATHSPEC 2
+#define MATCHED_FNMATCH 3
+#define MATCHED_EXACTLY 4
 int simple_length(const char *match);
 int no_wildcard(const char *string);
 char *common_prefix(const struct pathspec *pathspec);
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 12617158db..d83aeb7dc2 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -691,7 +691,7 @@ test_expect_failure 'git clean -d skips nested repo containing ignored files' '
 	test_path_is_file nested-repo-with-ignored-file/file
 '
 
-test_expect_failure 'git clean handles being told what to clean' '
+test_expect_success 'git clean handles being told what to clean' '
 	mkdir -p d1 d2 &&
 	touch d1/ut d2/ut &&
 	git clean -f */ut &&
@@ -707,7 +707,7 @@ test_expect_success 'git clean handles being told what to clean, with -d' '
 	test_path_is_missing d2/ut
 '
 
-test_expect_failure 'git clean works if a glob is passed without -d' '
+test_expect_success 'git clean works if a glob is passed without -d' '
 	mkdir -p d1 d2 &&
 	touch d1/ut d2/ut &&
 	git clean -f "*ut" &&
-- 
2.23.0.173.gad11b3a635.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 07/12] dir: add commentary explaining match_pathspec_item's return value
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
                       ` (5 preceding siblings ...)
  2019-09-12 22:12     ` [PATCH v3 06/12] dir: if our pathspec might match files under a dir, recurse into it Elijah Newren
@ 2019-09-12 22:12     ` Elijah Newren
  2019-09-13 20:04       ` Junio C Hamano
  2019-09-12 22:12     ` [PATCH v3 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run Elijah Newren
                       ` (5 subsequent siblings)
  12 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-09-12 22:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

The way match_pathspec_item() handles names and pathspecs with trailing
slash characters, in conjunction with special options like
DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC, was non-obvious, and
broken until this patch series.  Add a table in a comment explaining the
intent of how these work.
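
The table added by this patch can also be read as executable rules.
The following is a toy classifier for literal (wildcard-free)
pathspecs only; it is a sketch of the intended semantics, not
match_pathspec_item() itself.  The flag and enum names are local to the
sketch, and prefix handling, pathspec magic and fnmatch are all left
out.

```c
#include <string.h>

/* Names local to this sketch, not git's own constants. */
#define SKETCH_MATCH_DIRECTORY        (1u << 0)
#define SKETCH_MATCH_LEADING_PATHSPEC (1u << 1)

enum sketch_match { NO_MATCH = 0, RECURSIVE, LEADING, EXACT };

/* Classify how the literal pathspec 'spec' relates to 'name'. */
enum sketch_match classify(const char *name, const char *spec,
			   unsigned flags)
{
	size_t namelen = strlen(name), speclen = strlen(spec);

	if (!namelen || !speclen)
		return NO_MATCH;
	if (!strcmp(name, spec))
		return EXACT;

	/* Pathspec is a leading directory of name: "a/b" vs "a/b/c". */
	if (speclen < namelen && !strncmp(name, spec, speclen) &&
	    (spec[speclen - 1] == '/' || name[speclen] == '/'))
		return RECURSIVE;

	if (namelen < speclen && !strncmp(spec, name, namelen)) {
		/* Footnote [1]: pathspec "a/b/" names the directory "a/b". */
		if ((flags & SKETCH_MATCH_DIRECTORY) &&
		    speclen == namelen + 1 && spec[namelen] == '/')
			return EXACT;
		/* Footnote [2]: name is a leading part of the pathspec. */
		if ((flags & SKETCH_MATCH_LEADING_PATHSPEC) &&
		    (name[namelen - 1] == '/' || spec[namelen] == '/'))
			return LEADING;
	}
	return NO_MATCH;
}
```

Each cell of the table corresponds to one call, for example
classify("a/b", "a/b/c", SKETCH_MATCH_LEADING_PATHSPEC) for the top
right cell.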

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/dir.c b/dir.c
index 47c0a99cb5..3b2fe1701c 100644
--- a/dir.c
+++ b/dir.c
@@ -276,16 +276,27 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
 #define DO_MATCH_LEADING_PATHSPEC (1<<2)
 
 /*
- * Does 'match' match the given name?
- * A match is found if
+ * Does the given pathspec match the given name?  A match is found if
  *
- * (1) the 'match' string is leading directory of 'name', or
- * (2) the 'match' string is a wildcard and matches 'name', or
- * (3) the 'match' string is exactly the same as 'name'.
+ * (1) the pathspec string is leading directory of 'name' ("RECURSIVELY"), or
+ * (2) the pathspec string has a leading part matching 'name' ("LEADING"), or
+ * (3) the pathspec string is a wildcard and matches 'name' ("WILDCARD"), or
+ * (4) the pathspec string is exactly the same as 'name' ("EXACT").
  *
- * and the return value tells which case it was.
+ * Return value tells which case it was (1-4), or 0 when there is no match.
  *
- * It returns 0 when there is no match.
+ * It may be instructive to look at a small table of concrete examples
+ * to understand the differences between 1, 2, and 4:
+ *
+ *                              Pathspecs
+ *                |    a/b    |   a/b/    |   a/b/c
+ *          ------+-----------+-----------+------------
+ *          a/b   |  EXACT    |  EXACT[1] | LEADING[2]
+ *  Names   a/b/  | RECURSIVE |   EXACT   | LEADING[2]
+ *          a/b/c | RECURSIVE | RECURSIVE |   EXACT
+ *
+ * [1] Only if DO_MATCH_DIRECTORY is passed; otherwise, this is NOT a match.
+ * [2] Only if DO_MATCH_LEADING_PATHSPEC is passed; otherwise, not a match.
  */
 static int match_pathspec_item(const struct index_state *istate,
 			       const struct pathspec_item *item, int prefix,
@@ -353,7 +364,7 @@ static int match_pathspec_item(const struct index_state *istate,
 			 item->nowildcard_len - prefix))
 		return MATCHED_FNMATCH;
 
-	/* Perform checks to see if "name" is a super set of the pathspec */
+	/* Perform checks to see if "name" is a leading string of the pathspec */
 	if (flags & DO_MATCH_LEADING_PATHSPEC) {
 		/* name is a literal prefix of the pathspec */
 		int offset = name[namelen-1] == '/' ? 1 : 0;
-- 
2.23.0.173.gad11b3a635.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
                       ` (6 preceding siblings ...)
  2019-09-12 22:12     ` [PATCH v3 07/12] dir: add commentary explaining match_pathspec_item's return value Elijah Newren
@ 2019-09-12 22:12     ` Elijah Newren
  2019-09-12 22:12     ` [PATCH v3 09/12] clean: disambiguate the definition of -d Elijah Newren
                       ` (4 subsequent siblings)
  12 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-12 22:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

It appears that the wrong option got included in the list of what will
cause git-clean to actually take action.  Correct the list.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-clean.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt
index db876f7dde..e84ffc9396 100644
--- a/Documentation/git-clean.txt
+++ b/Documentation/git-clean.txt
@@ -35,7 +35,7 @@ OPTIONS
 --force::
 	If the Git configuration variable clean.requireForce is not set
 	to false, 'git clean' will refuse to delete files or directories
-	unless given -f, -n or -i. Git will refuse to delete directories
+	unless given -f or -i. Git will refuse to delete directories
 	with .git sub directory or file unless a second -f
 	is given.
 
-- 
2.23.0.173.gad11b3a635.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 09/12] clean: disambiguate the definition of -d
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
                       ` (7 preceding siblings ...)
  2019-09-12 22:12     ` [PATCH v3 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run Elijah Newren
@ 2019-09-12 22:12     ` Elijah Newren
  2019-09-12 22:12     ` [PATCH v3 10/12] clean: avoid removing untracked files in a nested git repository Elijah Newren
                       ` (3 subsequent siblings)
  12 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-12 22:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

The -d flag pre-dated git-clean's ability to have paths specified.  As
such, the default for git-clean was to only remove untracked files in
the current directory, and -d existed to allow it to recurse into
subdirectories.

The interaction of paths and the -d option appears to not have been
carefully considered, as evidenced by numerous bugs and a dearth of
tests covering such pairings in the testsuite.  The definition turns out
to be important, so let's look at some of the various ways one could
interpret the -d option:

  A) Without -d, only look in subdirectories which contain tracked
     files under them; with -d, also look in subdirectories which
     are untracked for files to clean.

  B) Without specified paths from the user for us to delete, we need to
     have some kind of default, so...without -d, only look in
     subdirectories which contain tracked files under them; with -d,
     also look in subdirectories which are untracked for files to clean.

The important distinction here is that choice B says that the presence
or absence of '-d' is irrelevant if paths are specified.  The logic
behind option B is that if a user explicitly asked us to clean a
specified pathspec, then we should clean anything that matches that
pathspec.  Some examples may clarify.  Should

   git clean -f untracked_dir/file

remove untracked_dir/file or not?  It seems crazy not to, but a strict
reading of option A says it shouldn't be removed.  How about

   git clean -f untracked_dir/file1 tracked_dir/file2

or

   git clean -f untracked_dir_1/file1 untracked_dir_2/file2

?  Should it remove either or both of these files?  Should it require
multiple runs to remove both the files listed?  (If this sounds like a
crazy question to even ask, see the commit message of "t7300: Add some
testcases showing failure to clean specified pathspecs" added earlier in
this patch series.)  What if -ffd were used instead of -f -- should that
allow these to be removed?  Should it take multiple invocations with
-ffd?  What if a glob (such as '*tracked*') were used instead of
spelling out the directory names?  What if the filenames involved globs,
such as

   git clean -f '*.o'

or

   git clean -f '*/*.o'

?

The current documentation actually suggests a definition that is
slightly different than choice A, and the implementation prior to this
series provided something radically different than either choice A or
B.  (The implementation, though, was clearly just buggy.)  There may be
other choices as well.  However, for almost any given choice of
definition for -d that I can think of, some of the examples above will
appear buggy to the user.  The only case that doesn't have negative
surprises is choice B: treat a user-specified path as a request to clean
all untracked files which match that path specification, including
recursing into any untracked directories.

Change the documentation and basic implementation to use this
definition.
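
The difference between the two readings can be modelled with a small
predicate.  This is only an illustration of the definitions discussed
above: the enum, the function and its parameters are hypothetical, and
the file in question is assumed to be untracked and, when a path is
given, to match it.

```c
#include <stdbool.h>

enum d_policy { CHOICE_A, CHOICE_B };

/*
 * Would an untracked file be cleaned under the given reading of -d?
 * 'in_untracked_dir' says whether the file lives in a wholly untracked
 * directory; 'path_given' says whether the user named a matching path.
 */
bool would_clean(enum d_policy policy, bool d_given, bool path_given,
		 bool in_untracked_dir)
{
	if (!in_untracked_dir)
		return true;	/* both readings agree here */
	if (policy == CHOICE_B && path_given)
		return true;	/* -d is irrelevant once a path is named */
	return d_given;
}
```

With this model, 'git clean -f untracked_dir/file' corresponds to
would_clean(policy, false, true, true), which is false under choice A
and true under choice B.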

There were two regression tests that indirectly depended on the current
implementation, but neither was about subdirectory handling.  These two
tests were introduced in commit 5b7570cfb41c ("git-clean: add tests for
relative path", 2008-03-07) which was solely created to add coverage for
the changes in commit fb328947c8e ("git-clean: correct printing relative
path", 2008-03-07).  Both tests specified a directory that happened to
have an untracked subdirectory, but both were only checking that the
resulting printout of a file that was removed was shown with a relative
path.  Update these tests appropriately.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-clean.txt | 10 ++++++----
 builtin/clean.c             |  8 ++++++++
 t/t7300-clean.sh            |  2 ++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt
index e84ffc9396..3ab749b921 100644
--- a/Documentation/git-clean.txt
+++ b/Documentation/git-clean.txt
@@ -26,10 +26,12 @@ are affected.
 OPTIONS
 -------
 -d::
-	Remove untracked directories in addition to untracked files.
-	If an untracked directory is managed by a different Git
-	repository, it is not removed by default.  Use -f option twice
-	if you really want to remove such a directory.
+	Normally, when no <path> is specified, git clean will not
+	recurse into untracked directories to avoid removing too much.
+	Specify -d to have it recurse into such directories as well.
+	If any paths are specified, -d is irrelevant; all untracked
+	files matching the specified paths (with exceptions for nested
+	git directories mentioned under `--force`) will be removed.
 
 -f::
 --force::
diff --git a/builtin/clean.c b/builtin/clean.c
index d5579da716..68d70e41c0 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -949,6 +949,14 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 
 	dir.flags |= DIR_SHOW_OTHER_DIRECTORIES;
 
+	if (argc) {
+		/*
+		 * Remaining args implies pathspecs specified, and we should
+		 * recurse within those.
+		 */
+		remove_directories = 1;
+	}
+
 	if (remove_directories)
 		dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS;
 
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index d83aeb7dc2..530dfdab34 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -117,6 +117,7 @@ test_expect_success C_LOCALE_OUTPUT 'git clean with relative prefix' '
 	would_clean=$(
 		cd docs &&
 		git clean -n ../src |
+		grep part3 |
 		sed -n -e "s|^Would remove ||p"
 	) &&
 	verbose test "$would_clean" = ../src/part3.c
@@ -129,6 +130,7 @@ test_expect_success C_LOCALE_OUTPUT 'git clean with absolute path' '
 	would_clean=$(
 		cd docs &&
 		git clean -n "$(pwd)/../src" |
+		grep part3 |
 		sed -n -e "s|^Would remove ||p"
 	) &&
 	verbose test "$would_clean" = ../src/part3.c
-- 
2.23.0.173.gad11b3a635.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 10/12] clean: avoid removing untracked files in a nested git repository
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
                       ` (8 preceding siblings ...)
  2019-09-12 22:12     ` [PATCH v3 09/12] clean: disambiguate the definition of -d Elijah Newren
@ 2019-09-12 22:12     ` Elijah Newren
  2019-09-12 22:12     ` [PATCH v3 11/12] clean: rewrap overly long line Elijah Newren
                       ` (2 subsequent siblings)
  12 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-12 22:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

Users expect files in a nested git repository to be left alone unless
sufficiently forced (with two -f's).  Unfortunately, in certain
circumstances, git would delete both tracked (and possibly dirty) files
and untracked files within a nested repository.  To explain how this
happens, let's contrast a couple cases.  First, take the following
example setup (which assumes we are already within a git repo):

   git init nested
   cd nested
   >tracked
   git add tracked
   git commit -m init
   >untracked
   cd ..

In this setup, everything works as expected; running 'git clean -fd'
will result in fill_directory() returning the following paths:
   nested/
   nested/tracked
   nested/untracked
and then correct_untracked_entries() would notice this can be compressed
to
   nested/
and then since "nested/" is a directory, we would call
remove_dirs("nested/", ...), which would
check is_nonbare_repository_dir() and then decide to skip it.

However, if someone also creates an ignored file:
   >nested/ignored
then running 'git clean -fd' would result in fill_directory() returning
the same paths:
   nested/
   nested/tracked
   nested/untracked
but correct_untracked_entries() will notice that we had ignored entries
under nested/ and thus simplify this list to
   nested/tracked
   nested/untracked
Since these are not directories, we do not call remove_dirs() which was
the only place that had the is_nonbare_repository_dir() safety check --
resulting in us deleting both the untracked file and the tracked (and
possibly dirty) file.

One possible fix for this issue would be walking the parent directories
of each path and checking if they represent nonbare repositories, but
that would be wasteful.  Even if we added caching of some sort, it's
still a waste because we should have been able to check that "nested/"
represented a nonbare repository before even descending into it in the
first place.  Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use
it to prevent fill_directory() and friends from descending into nested
git repos.
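
As a rough picture of the kind of test that now happens before
descending, here is a deliberately simplified standalone check.  It is
not is_nonbare_repository_dir(): it only asks whether "<dir>/.git"
exists, whereas the real helper is assumed to validate what it finds
there.  The function name is hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Simplified stand-in: treat 'dir' as a nested repository if it
 * contains an entry named ".git" (directory or file).
 */
int looks_like_nested_repo(const char *dir)
{
	char path[4096];
	struct stat st;
	int n = snprintf(path, sizeof(path), "%s/.git", dir);

	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;	/* path too long for this sketch */
	return stat(path, &st) == 0;
}
```

A directory walker that consults such a check before recursing never
produces entries like nested/tracked in the first place, so no later
safety check is needed for them.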

With this change, we also modify two regression tests added in commit
91479b9c72f1 ("t7300: add tests to document behavior of clean and nested
git", 2015-06-15).  Neither that commit, nor its series, nor the six
previous iterations of that series on the mailing list discussed why those
tests coded the expectation they did.  In fact, it appears their purpose was
simply to test _existing_ behavior to make sure that the performance
changes didn't change the behavior.  However, these two tests directly
contradicted the manpage's claims that two -f's were required to delete
files/directories under a nested git repository.  While one could argue
that the user gave an explicit path which matched files/directories that
were within a nested repository, there's a slippery slope that becomes
very difficult for users to understand once you go down that route (e.g.
what if they specified "git clean -f -d '*.c'"?)  It would also be hard
to explain what the exact behavior was; avoid such problems by making it
really simple.

Also, clean up some grammar errors describing this functionality in the
git-clean manpage.

Finally, there are still a couple bugs with -ffd not cleaning out enough
(e.g.  missing the nested .git) and with -ffdX possibly cleaning out the
wrong files (paying attention to outer .gitignore instead of inner).
This patch does not address these cases at all (and does not change the
behavior relative to those flags); it only fixes the handling when given
a single -f.  See
https://public-inbox.org/git/20190905212043.GC32087@szeder.dev/ for more
discussion of the -ffd[X?] bugs.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-clean.txt |  6 +++---
 builtin/clean.c             |  2 ++
 dir.c                       | 10 ++++++++++
 dir.h                       |  3 ++-
 t/t7300-clean.sh            | 10 +++++-----
 5 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt
index 3ab749b921..ba31d8d166 100644
--- a/Documentation/git-clean.txt
+++ b/Documentation/git-clean.txt
@@ -37,9 +37,9 @@ OPTIONS
 --force::
 	If the Git configuration variable clean.requireForce is not set
 	to false, 'git clean' will refuse to delete files or directories
-	unless given -f or -i. Git will refuse to delete directories
-	with .git sub directory or file unless a second -f
-	is given.
+	unless given -f or -i.  Git will refuse to modify untracked
+	nested git repositories (directories with a .git subdirectory)
+	unless a second -f is given.
 
 -i::
 --interactive::
diff --git a/builtin/clean.c b/builtin/clean.c
index 68d70e41c0..3a7a63ae71 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -946,6 +946,8 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 
 	if (force > 1)
 		rm_flags = 0;
+	else
+		dir.flags |= DIR_SKIP_NESTED_GIT;
 
 	dir.flags |= DIR_SHOW_OTHER_DIRECTORIES;
 
diff --git a/dir.c b/dir.c
index 3b2fe1701c..7ff79170fc 100644
--- a/dir.c
+++ b/dir.c
@@ -1451,6 +1451,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_none;
 
 	case index_nonexistent:
+		if (dir->flags & DIR_SKIP_NESTED_GIT) {
+			int nested_repo;
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addstr(&sb, dirname);
+			nested_repo = is_nonbare_repository_dir(&sb);
+			strbuf_release(&sb);
+			if (nested_repo)
+				return path_none;
+		}
+
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
 			break;
 		if (exclude &&
diff --git a/dir.h b/dir.h
index 46c238ab49..739aea7c96 100644
--- a/dir.h
+++ b/dir.h
@@ -156,7 +156,8 @@ struct dir_struct {
 		DIR_SHOW_IGNORED_TOO = 1<<5,
 		DIR_COLLECT_KILLED_ONLY = 1<<6,
 		DIR_KEEP_UNTRACKED_CONTENTS = 1<<7,
-		DIR_SHOW_IGNORED_TOO_MODE_MATCHING = 1<<8
+		DIR_SHOW_IGNORED_TOO_MODE_MATCHING = 1<<8,
+		DIR_SKIP_NESTED_GIT = 1<<9
 	} flags;
 	struct dir_entry **entries;
 	struct dir_entry **ignored;
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 530dfdab34..6e6d24c1c3 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -549,7 +549,7 @@ test_expect_failure 'nested (non-empty) bare repositories should be cleaned even
 	test_path_is_missing strange_bare
 '
 
-test_expect_success 'giving path in nested git work tree will remove it' '
+test_expect_success 'giving path in nested git work tree will NOT remove it' '
 	rm -fr repo &&
 	mkdir repo &&
 	(
@@ -561,7 +561,7 @@ test_expect_success 'giving path in nested git work tree will remove it' '
 	git clean -f -d repo/bar/baz &&
 	test_path_is_file repo/.git/HEAD &&
 	test_path_is_dir repo/bar/ &&
-	test_path_is_missing repo/bar/baz
+	test_path_is_file repo/bar/baz/hello.world
 '
 
 test_expect_success 'giving path to nested .git will not remove it' '
@@ -579,7 +579,7 @@ test_expect_success 'giving path to nested .git will not remove it' '
 	test_path_is_dir untracked/
 '
 
-test_expect_success 'giving path to nested .git/ will remove contents' '
+test_expect_success 'giving path to nested .git/ will NOT remove contents' '
 	rm -fr repo untracked &&
 	mkdir repo untracked &&
 	(
@@ -589,7 +589,7 @@ test_expect_success 'giving path to nested .git/ will remove contents' '
 	) &&
 	git clean -f -d repo/.git/ &&
 	test_path_is_dir repo/.git &&
-	test_dir_is_empty repo/.git &&
+	test_path_is_file repo/.git/HEAD &&
 	test_path_is_dir untracked/
 '
 
@@ -671,7 +671,7 @@ test_expect_success 'git clean -d skips untracked dirs containing ignored files'
 	test_path_is_missing foo/b/bb
 '
 
-test_expect_failure 'git clean -d skips nested repo containing ignored files' '
+test_expect_success 'git clean -d skips nested repo containing ignored files' '
 	test_when_finished "rm -rf nested-repo-with-ignored-file" &&
 
 	git init nested-repo-with-ignored-file &&
-- 
2.23.0.173.gad11b3a635.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 11/12] clean: rewrap overly long line
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
                       ` (9 preceding siblings ...)
  2019-09-12 22:12     ` [PATCH v3 10/12] clean: avoid removing untracked files in a nested git repository Elijah Newren
@ 2019-09-12 22:12     ` Elijah Newren
  2019-09-12 22:12     ` [PATCH v3 12/12] clean: fix theoretical path corruption Elijah Newren
  2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
  12 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-12 22:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 3a7a63ae71..6030842f3a 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -158,7 +158,8 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 
 	*dir_gone = 1;
 
-	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && is_nonbare_repository_dir(path)) {
+	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) &&
+	    is_nonbare_repository_dir(path)) {
 		if (!quiet) {
 			quote_path_relative(path->buf, prefix, &quoted);
 			printf(dry_run ?  _(msg_would_skip_git_dir) : _(msg_skip_git_dir),
-- 
2.23.0.173.gad11b3a635.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 12/12] clean: fix theoretical path corruption
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
                       ` (10 preceding siblings ...)
  2019-09-12 22:12     ` [PATCH v3 11/12] clean: rewrap overly long line Elijah Newren
@ 2019-09-12 22:12     ` Elijah Newren
  2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
  12 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-12 22:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

cmd_clean() had the following code structure:

    struct strbuf abs_path = STRBUF_INIT;
    for_each_string_list_item(item, &del_list) {
        strbuf_addstr(&abs_path, prefix);
        strbuf_addstr(&abs_path, item->string);
        PROCESS(&abs_path);
        strbuf_reset(&abs_path);
    }

where I've elided a bunch of unnecessary details and PROCESS(&abs_path)
represents a big chunk of code rather than an actual function call.  One
piece of PROCESS was:

    if (lstat(abs_path.buf, &st))
        continue;

which would cause the strbuf_reset() to be missed -- meaning that the
next path to be handled would have two paths concatenated.  This code
used to call die_errno() instead of continue prior to commit 396049e5fb62
("git-clean: refactor git-clean into two phases", 2013-06-25), but my
understanding of how correct_untracked_entries() works is that it will
prevent both dir/ and dir/file from being in the list to clean so this
should be dead code and the die_errno() should be safe.  But I hesitate
to remove it since I am not certain.

However, we can fix both this bug and possible similar future bugs by
simply moving the strbuf_reset(&abs_path) to the beginning of the loop.
It'll result in N calls to strbuf_reset() instead of N-1, but that's a
small price to pay to avoid sneaky bugs like this.
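
The pattern can be shown outside of git with a tiny buffer type.  This
is a self-contained sketch, not cmd_clean(): struct pathbuf and its
helpers are hypothetical stand-ins for strbuf, and should_skip() stands
in for the lstat() failure that triggers the continue.

```c
#include <string.h>

#define SKETCH_PATH_MAX 256

struct pathbuf {
	char buf[SKETCH_PATH_MAX];
	size_t len;
};

static void pathbuf_reset(struct pathbuf *pb)
{
	pb->len = 0;
	pb->buf[0] = '\0';
}

/* Append s, truncating silently if the fixed buffer is full. */
static void pathbuf_addstr(struct pathbuf *pb, const char *s)
{
	size_t room = sizeof(pb->buf) - 1 - pb->len;
	size_t n = strlen(s);

	if (n > room)
		n = room;
	memcpy(pb->buf + pb->len, s, n);
	pb->len += n;
	pb->buf[pb->len] = '\0';
}

/* Stand-in for a failing lstat(): items starting with '!' are skipped. */
static int should_skip(const char *item)
{
	return item[0] == '!';
}

/* Build prefix+item for each item; copy the processed paths into out[]. */
size_t collect_paths(const char *prefix, const char **items, size_t n,
		     char out[][SKETCH_PATH_MAX])
{
	struct pathbuf abs_path;
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		/* Reset first, so an early continue cannot leak state. */
		pathbuf_reset(&abs_path);
		pathbuf_addstr(&abs_path, prefix);
		pathbuf_addstr(&abs_path, items[i]);

		if (should_skip(items[i]))
			continue;
		strcpy(out[count++], abs_path.buf);
	}
	return count;
}
```

Had the reset been at the bottom of the loop, the path following a
skipped item would have started with the skipped item's path.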

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 6030842f3a..4cf2399f59 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -1018,6 +1018,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 	for_each_string_list_item(item, &del_list) {
 		struct stat st;
 
+		strbuf_reset(&abs_path);
 		if (prefix)
 			strbuf_addstr(&abs_path, prefix);
 
@@ -1051,7 +1052,6 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 				printf(dry_run ? _(msg_would_remove) : _(msg_remove), qname);
 			}
 		}
-		strbuf_reset(&abs_path);
 	}
 
 	strbuf_release(&abs_path);
-- 
2.23.0.173.gad11b3a635.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 01/12] t7300: add testcases showing failure to clean specified pathspecs
  2019-09-12 22:12     ` [PATCH v3 01/12] t7300: add testcases showing failure to clean specified pathspecs Elijah Newren
@ 2019-09-13 18:54       ` Junio C Hamano
  2019-09-13 19:10         ` Elijah Newren
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2019-09-13 18:54 UTC (permalink / raw)
  To: Elijah Newren
  Cc: git, Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin

Elijah Newren <newren@gmail.com> writes:

> +test_expect_failure 'git clean handles being told what to clean' '
> +	mkdir -p d1 d2 &&
> +	touch d1/ut d2/ut &&
> +	git clean -f */ut &&
> +	test_path_is_missing d1/ut &&
> +	test_path_is_missing d2/ut
> +'

Looks like d1 and d2 are new directories and the paths we see in the
test are the only ones that are involved (i.e. we do not rely on any
leftover cruft in d[12]/ from previous tests).  If so, perhaps it is
easier to follow by starting the tests with "rm -fr d1 d2 &&" or
something to assure the readers of the script (not this patch, but
the resulting file down the road) about the isolation?  The same
comment applies to the remainder.

Also, you talked about tracked paths in the proposed log message; do
they not participate in reproducing the issue(s)?

Thanks.


> +test_expect_failure 'git clean handles being told what to clean, with -d' '
> +	mkdir -p d1 d2 &&
> +	touch d1/ut d2/ut &&
> +	git clean -ffd */ut &&
> +	test_path_is_missing d1/ut &&
> +	test_path_is_missing d2/ut
> +'
> +
> +test_expect_failure 'git clean works if a glob is passed without -d' '
> +	mkdir -p d1 d2 &&
> +	touch d1/ut d2/ut &&
> +	git clean -f "*ut" &&
> +	test_path_is_missing d1/ut &&
> +	test_path_is_missing d2/ut
> +'
> +
> +test_expect_failure 'git clean works if a glob is passed with -d' '
> +	mkdir -p d1 d2 &&
> +	touch d1/ut d2/ut &&
> +	git clean -ffd "*ut" &&
> +	test_path_is_missing d1/ut &&
> +	test_path_is_missing d2/ut
> +'
> +
>  test_expect_success MINGW 'handle clean & core.longpaths = false nicely' '
>  	test_config core.longpaths false &&
>  	a50=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa &&


* Re: [PATCH v3 03/12] dir: fix off-by-one error in match_pathspec_item
  2019-09-12 22:12     ` [PATCH v3 03/12] dir: fix off-by-one error in match_pathspec_item Elijah Newren
@ 2019-09-13 19:05       ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2019-09-13 19:05 UTC (permalink / raw)
  To: Elijah Newren
  Cc: git, Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin

Elijah Newren <newren@gmail.com> writes:

> For a pathspec like 'foo/bar' comparing against a path named "foo/",
> namelen will be 4, and match[namelen] will be 'b'.  The correct location
> of the directory separator is namelen-1.

And the reason why name[namelen-1] may not be slash, in which case
your new code makes offset 0, is because we need to handle what
case?  When path is "foo" (not "foo/")?  Just makes me wonder why
this callee allows the caller(s) to be inconsistent, sometimes
including the trailing slash in <name, namelen> tuple, sometimes
not.

> The reason the code worked anyway was that the following code immediately
> checked whether the first matchlen characters matched (which they do) and
> then bailed and return MATCHED_RECURSIVELY anyway since wildmatch doesn't
> have the ability to check if "name" can be matched as a directory (or
> prefix) against the pathspec.

Nicely spotted and explained.

>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  dir.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index a9168bed96..bf1a74799e 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -356,8 +356,9 @@ static int match_pathspec_item(const struct index_state *istate,
>  	/* Perform checks to see if "name" is a super set of the pathspec */
>  	if (flags & DO_MATCH_SUBMODULE) {
>  		/* name is a literal prefix of the pathspec */
> +		int offset = name[namelen-1] == '/' ? 1 : 0;
>  		if ((namelen < matchlen) &&
> -		    (match[namelen] == '/') &&
> +		    (match[namelen-offset] == '/') &&
>  		    !ps_strncmp(item, match, name, namelen))
>  			return MATCHED_RECURSIVELY;


* Re: [PATCH v3 01/12] t7300: add testcases showing failure to clean specified pathspecs
  2019-09-13 18:54       ` Junio C Hamano
@ 2019-09-13 19:10         ` Elijah Newren
  2019-09-13 20:29           ` Junio C Hamano
  0 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-09-13 19:10 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin

On Fri, Sep 13, 2019 at 11:54 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > +test_expect_failure 'git clean handles being told what to clean' '
> > +     mkdir -p d1 d2 &&
> > +     touch d1/ut d2/ut &&
> > +     git clean -f */ut &&
> > +     test_path_is_missing d1/ut &&
> > +     test_path_is_missing d2/ut
> > +'
>
> Looks like d1 and d2 are new directories and the paths we see in the
> test are the only ones that are involved (i.e. we do not rely on any
> leftover cruft in d[12]/ from previous tests).  If so, perhaps it is
> easier to follow by starting the tests with "rm -fr d1 d2 &&" or
> something to assure the readers of the script (not this patch, but
> the resulting file down the road) about the isolation?  The same
> comment applies to the remainder.

Makes sense.

> Also, you talked about tracked paths in the proposed log message; do
> they not participate in reproducing the issue(s)?

If there is only one directory which has no tracked files, then the
user can clean up the files -- but confusingly, they have to issue the
same git-clean command multiple times.  If multiple directories have
no tracked files, git-clean will never clean them out.  I probably
didn't do a very good job explaining that, although I started with the
case with one tracked file, I view the case without any as the more
general case -- and that solving it solves both problems.  I could
probably make that clearer in the commit message.  (Or maybe just add
more testcases even if slightly duplicative, I guess.)


* Re: [PATCH v3 06/12] dir: if our pathspec might match files under a dir, recurse into it
  2019-09-12 22:12     ` [PATCH v3 06/12] dir: if our pathspec might match files under a dir, recurse into it Elijah Newren
@ 2019-09-13 19:45       ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2019-09-13 19:45 UTC (permalink / raw)
  To: Elijah Newren
  Cc: git, Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin

Elijah Newren <newren@gmail.com> writes:

> For git clean, if a directory is entirely untracked and the user did not
> specify -d (corresponding to DIR_SHOW_IGNORED_TOO), then we usually do
> not want to remove that directory and thus do not recurse into it.

Makes sense.  To clean named paths in such a directory, we'd need an
option to recurse into it to find them, yet make sure the directory
itself does not get removed.

> However, if the user manually specified specific (or even globbed) paths
> somewhere under that directory to remove, then we need to recurse into
> the directory to make sure we remove the relevant paths under that
> directory as the user requested.

Surely.

> @@ -1939,8 +1939,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>  		/* recurse into subdir if instructed by treat_path */
>  		if ((state == path_recurse) ||
>  			((state == path_untracked) &&
> -			 (dir->flags & DIR_SHOW_IGNORED_TOO) &&
> -			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR))) {
> +			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
> +			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
> +			  do_match_pathspec(istate, pathspec, path.buf, path.len,
> +					    baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
>  			struct untracked_cache_dir *ud;
>  			ud = lookup_untracked(dir->untracked, untracked,
>  					      path.buf + baselen,

OK.
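
For readers following along, the new condition can be read as a small
predicate.  This is only a sketch: the enum and the boolean parameters
are hypothetical stand-ins for `state`, the `get_dtype()` result, the
DIR_SHOW_IGNORED_TOO flag, and the result of the
DO_MATCH_LEADING_PATHSPEC check, not identifiers from dir.c.

```c
/* Hypothetical, reduced version of the path treatment states. */
enum sketch_state { SKETCH_NONE, SKETCH_UNTRACKED, SKETCH_RECURSE };

/*
 * Recurse when told to outright, or when an untracked directory is
 * either covered by the show-ignored-too mode or might contain
 * something that a pathspec names.
 */
int should_recurse(enum sketch_state state, int is_dir,
		   int show_ignored_too, int pathspec_may_match_below)
{
	return state == SKETCH_RECURSE ||
	       (state == SKETCH_UNTRACKED && is_dir &&
		(show_ignored_too || pathspec_may_match_below));
}
```

The behavioral change is the last operand: an untracked directory is
now entered without the flag as long as a pathspec could match
something beneath it.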

> diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> index 12617158db..d83aeb7dc2 100755
> --- a/t/t7300-clean.sh
> +++ b/t/t7300-clean.sh
> @@ -691,7 +691,7 @@ test_expect_failure 'git clean -d skips nested repo containing ignored files' '
>  	test_path_is_file nested-repo-with-ignored-file/file
>  '
>  
> -test_expect_failure 'git clean handles being told what to clean' '
> +test_expect_success 'git clean handles being told what to clean' '
>  	mkdir -p d1 d2 &&
>  	touch d1/ut d2/ut &&
>  	git clean -f */ut &&
> @@ -707,7 +707,7 @@ test_expect_success 'git clean handles being told what to clean, with -d' '
>  	test_path_is_missing d2/ut
>  '
>  
> -test_expect_failure 'git clean works if a glob is passed without -d' '
> +test_expect_success 'git clean works if a glob is passed without -d' '
>  	mkdir -p d1 d2 &&
>  	touch d1/ut d2/ut &&
>  	git clean -f "*ut" &&

Nice.

Thanks.


* Re: [PATCH v3 07/12] dir: add commentary explaining match_pathspec_item's return value
  2019-09-12 22:12     ` [PATCH v3 07/12] dir: add commentary explaining match_pathspec_item's return value Elijah Newren
@ 2019-09-13 20:04       ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2019-09-13 20:04 UTC (permalink / raw)
  To: Elijah Newren
  Cc: git, Jeff King, Rafael Ascensão, SZEDER Gábor, Samuel Lijin

Elijah Newren <newren@gmail.com> writes:

> The way match_pathspec_item() handles names and pathspecs with trailing
> slash characters, in conjunction with special options like
> DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC were non-obvious, and
> broken until this patch series.  Add a table in a comment explaining the
> intent of how these work.

Thanks.

>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  dir.c | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 47c0a99cb5..3b2fe1701c 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -276,16 +276,27 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
>  #define DO_MATCH_LEADING_PATHSPEC (1<<2)
>  
>  /*
> - * Does 'match' match the given name?
> - * A match is found if
> + * Does the given pathspec match the given name?  A match is found if
>   *
> - * (1) the 'match' string is leading directory of 'name', or
> - * (2) the 'match' string is a wildcard and matches 'name', or
> - * (3) the 'match' string is exactly the same as 'name'.
> + * (1) the pathspec string is leading directory of 'name' ("RECURSIVELY"), or
> + * (2) the pathspec string has a leading part matching 'name' ("LEADING"), or
> + * (3) the pathspec string is a wildcard and matches 'name' ("WILDCARD"), or
> + * (4) the pathspec string is exactly the same as 'name' ("EXACT").
>   *
> - * and the return value tells which case it was.
> + * Return value tells which case it was (1-4), or 0 when there is no match.
>   *
> - * It returns 0 when there is no match.
> + * It may be instructive to look at a small table of concrete examples
> + * to understand the differences between 1, 2, and 4:
> + *
> + *                              Pathspecs
> + *                |    a/b    |   a/b/    |   a/b/c
> + *          ------+-----------+-----------+------------
> + *          a/b   |  EXACT    |  EXACT[1] | LEADING[2]
> + *  Names   a/b/  | RECURSIVE |   EXACT   | LEADING[2]
> + *          a/b/c | RECURSIVE | RECURSIVE |   EXACT
> + *
> + * [1] Only if DO_MATCH_DIRECTORY is passed; otherwise, this is NOT a match.
> + * [2] Only if DO_MATCH_LEADING_PATHSPEC is passed; otherwise, not a match.
>   */
>  static int match_pathspec_item(const struct index_state *istate,
>  			       const struct pathspec_item *item, int prefix,
> @@ -353,7 +364,7 @@ static int match_pathspec_item(const struct index_state *istate,
>  			 item->nowildcard_len - prefix))
>  		return MATCHED_FNMATCH;
>  
> -	/* Perform checks to see if "name" is a super set of the pathspec */
> +	/* Perform checks to see if "name" is a leading string of the pathspec */
>  	if (flags & DO_MATCH_LEADING_PATHSPEC) {
>  		/* name is a literal prefix of the pathspec */
>  		int offset = name[namelen-1] == '/' ? 1 : 0;
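
The table in the new comment can be checked against a small model.
The sketch below covers literal pathspecs only (no wildcards, no
icase), and its enum and flag names are hypothetical rather than the
constants used in dir.c or dir.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes and flags for this sketch only. */
enum sketch_match { M_NONE = 0, M_EXACT, M_RECURSIVE, M_LEADING };
#define SKETCH_MATCH_DIRECTORY (1u << 0)
#define SKETCH_MATCH_LEADING   (1u << 1)

enum sketch_match classify(const char *name, const char *spec,
			   unsigned flags)
{
	size_t namelen = strlen(name), speclen = strlen(spec);

	assert(namelen > 0 && speclen > 0);

	if (!strcmp(name, spec))
		return M_EXACT;

	/* pathspec is a leading directory of name: a/b vs name a/b/c */
	if (speclen < namelen && !strncmp(name, spec, speclen) &&
	    (spec[speclen - 1] == '/' || name[speclen] == '/'))
		return M_RECURSIVE;

	/* footnote [1]: name a/b vs pathspec a/b/ */
	if ((flags & SKETCH_MATCH_DIRECTORY) &&
	    speclen == namelen + 1 && spec[namelen] == '/' &&
	    !strncmp(spec, name, namelen))
		return M_EXACT;

	/* footnote [2]: name is a leading part of the pathspec */
	if ((flags & SKETCH_MATCH_LEADING) && namelen < speclen &&
	    !strncmp(spec, name, namelen)) {
		size_t offset = name[namelen - 1] == '/' ? 1 : 0;

		if (spec[namelen - offset] == '/')
			return M_LEADING;
	}
	return M_NONE;
}
```

Each of the nine cells can be exercised by calling classify() with the
row as `name` and the column as `spec`, passing the flag named in the
footnote where one applies.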


* Re: [PATCH v3 01/12] t7300: add testcases showing failure to clean specified pathspecs
  2019-09-13 19:10         ` Elijah Newren
@ 2019-09-13 20:29           ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2019-09-13 20:29 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git Mailing List, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin

Elijah Newren <newren@gmail.com> writes:

>> Also, you talked about tracked paths in the proposed log message; do
>> they not participate in reproducing the issue(s)?
>
> If there is only one directory which has no tracked files, then the
> user can clean up the files -- but confusingly, they have to issue the
> same git-clean command multiple times.  If multiple directories have
> no tracked files, git-clean will never clean them out.  I probably
> didn't do a very good job explaining that although I started with the
> case with one tracked, that I view the case without any as the more
> general case -- and that solving it solves both problems.  I could
> probably make that clearer in the commit message.  (Or maybe just add
> more testcases even if slightly duplicative, I guess.)

My comment/puzzlement indeed was the lack of any tracked file in
your tests, even though the log message did talk about one's
presence making a difference in the outcome.

Thanks.


* [PATCH v4 00/12] Fix some git clean issues
  2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
                       ` (11 preceding siblings ...)
  2019-09-12 22:12     ` [PATCH v3 12/12] clean: fix theoretical path corruption Elijah Newren
@ 2019-09-17 16:34     ` Elijah Newren
  2019-09-17 16:34       ` [PATCH v4 01/12] t7300: add testcases showing failure to clean specified pathspecs Elijah Newren
                         ` (11 more replies)
  12 siblings, 12 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-17 16:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

This patch series fixes a few issues with git-clean:
  * Failure to clean when multiple pathspecs are specified, reported both
    in April 2018[1] and again in May 2019[2].
  * Failure to preserve both tracked and untracked files within a nested
    Git repository reported a few weeks ago by SZEDER[3].
It builds on sg/clean-nested-repo-with-ignored.

[1] https://public-inbox.org/git/20180405173446.32372-4-newren@gmail.com/
[2] https://public-inbox.org/git/20190531183651.10067-1-rafa.almas@gmail.com/
[3] https://public-inbox.org/git/20190825185918.3909-1-szeder.dev@gmail.com/

Changes since v3:
  * Clarified a couple commit messages highlighted by Junio.

Elijah Newren (12):
  t7300: add testcases showing failure to clean specified pathspecs
  dir: fix typo in comment
  dir: fix off-by-one error in match_pathspec_item
  dir: also check directories for matching pathspecs
  dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule
    case
  dir: if our pathspec might match files under a dir, recurse into it
  dir: add commentary explaining match_pathspec_item's return value
  git-clean.txt: do not claim we will delete files with -n/--dry-run
  clean: disambiguate the definition of -d
  clean: avoid removing untracked files in a nested git repository
  clean: rewrap overly long line
  clean: fix theoretical path corruption

 Documentation/git-clean.txt | 16 +++++-----
 builtin/clean.c             | 15 +++++++--
 dir.c                       | 63 +++++++++++++++++++++++++++----------
 dir.h                       |  8 +++--
 t/t7300-clean.sh            | 44 +++++++++++++++++++++++---
 5 files changed, 112 insertions(+), 34 deletions(-)

Range-diff:
 1:  fe35ab8cc3 !  1:  a48d4e7faf t7300: add testcases showing failure to clean specified pathspecs
    @@ -28,9 +28,15 @@
         showed that the same buggy behavior exists without using that flag, and
         has in fact existed since before cf424f5fd89b.
     
    -    Add testcases showing that multiple untracked files within entirely
    -    untracked directories cannot be cleaned when specifying these files to
    -    git clean via pathspecs.
    +    Although these problems at first are perceived to be different (e.g.
    +    never clearing out the requested files vs. taking multiple invocations
    +    to get everything cleared out), they are actually just different
    +    manifestations of the same problem.  The case with multiple directories
    +    that have no tracked files is the more general case; solving it will
    +    solve all the others.  So, I concentrate on it.  Add testcases showing
    +    that multiple untracked files within entirely untracked directories
    +    cannot be cleaned when specifying these files to git clean via
    +    pathspecs.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
     
 2:  707d287d79 =  2:  eb00b46822 dir: fix typo in comment
 3:  bb316e82b2 !  3:  c0e5b820a9 dir: fix off-by-one error in match_pathspec_item
    @@ -6,11 +6,22 @@
         namelen will be 4, and match[namelen] will be 'b'.  The correct location
         of the directory separator is namelen-1.
     
    -    The reason the code worked anyway was that the following code immediately
    -    checked whether the first matchlen characters matched (which they do) and
    -    then bailed and return MATCHED_RECURSIVELY anyway since wildmatch doesn't
    -    have the ability to check if "name" can be matched as a directory (or
    -    prefix) against the pathspec.
    +    However, other callers of match_pathspec_item() such as builtin/grep.c's
    +    submodule_path_match() will compare against a path named "foo" instead of
    +    "foo/".  It might be better to change all the callers to be consistent,
    +    as discussed at
    +       https://public-inbox.org/git/xmqq7e6cdnkr.fsf@gitster-ct.c.googlers.com/
    +    and
    +       https://public-inbox.org/git/CABPp-BERWUPCPq-9fVW1LNocqkrfsoF4BPj3gJd9+En43vEkTQ@mail.gmail.com/
    +    but there are many cases to audit, so for now just make sure we handle
    +    both cases with and without a trailing slash.
    +
    +    The reason the code worked despite this sometimes-off-by-one error was
    +    that the subsequent code immediately checked whether the first matchlen
    +    characters matched (which they do) and then bailed and returned
    +    MATCHED_RECURSIVELY anyway since wildmatch doesn't have the ability to
    +    check if "name" can be matched as a directory (or prefix) against the
    +    pathspec.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
     
 4:  56319f934a =  4:  397775ec35 dir: also check directories for matching pathspecs
 5:  81593a565c =  5:  b836de82c0 dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
 6:  9566823a0f =  6:  feb317d090 dir: if our pathspec might match files under a dir, recurse into it
 7:  7821898ba7 =  7:  0a574d6779 dir: add commentary explaining match_pathspec_item's return value
 8:  13def5df57 =  8:  0eaa08537c git-clean.txt: do not claim we will delete files with -n/--dry-run
 9:  e6b274abf7 =  9:  a1438301bb clean: disambiguate the definition of -d
10:  5f4ef14765 = 10:  8dc21923ee clean: avoid removing untracked files in a nested git repository
11:  4e30e62eb1 = 11:  707b6a5509 clean: rewrap overly long line
12:  de2444f7cb = 12:  84a90010ed clean: fix theoretical path corruption
-- 
2.22.1.17.g6e632477f7.dirty



* [PATCH v4 01/12] t7300: add testcases showing failure to clean specified pathspecs
  2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
@ 2019-09-17 16:34       ` Elijah Newren
  2019-09-17 16:34       ` [PATCH v4 02/12] dir: fix typo in comment Elijah Newren
                         ` (10 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-17 16:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

Someone brought me a testcase where multiple git-clean invocations were
required to clean out unwanted files:
  mkdir d{1,2}
  touch d{1,2}/ut
  touch d1/t && git add d1/t
With this setup, the user would need to run
  git clean -ffd */ut
twice to delete both ut files.

A little testing showed some interesting variants:
  * If only one of those two ut files existed (either one), then only one
    clean command would be necessary.
  * If both directories had tracked files, then only one git clean would
    be necessary to clean both files.
  * If both directories had no tracked files then the clean command above
    would never clean either of the untracked files despite the pathspec
    explicitly calling both of them out.

A bisect showed that the failure to clean out the files started with
commit cf424f5fd89b ("clean: respect pathspecs with "-d"", 2014-03-10).
However, that pointed to a separate issue: while the "-d" flag was used
by the original user who showed me this problem, that flag should have
been irrelevant to this problem.  Testing again without the "-d" flag
showed that the same buggy behavior exists without using that flag, and
has in fact existed since before cf424f5fd89b.

Although these problems at first are perceived to be different (e.g.
never clearing out the requested files vs. taking multiple invocations
to get everything cleared out), they are actually just different
manifestations of the same problem.  The case with multiple directories
that have no tracked files is the more general case; solving it will
solve all the others.  So, I concentrate on it.  Add testcases showing
that multiple untracked files within entirely untracked directories
cannot be cleaned when specifying these files to git clean via
pathspecs.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7300-clean.sh | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index d01fd120ab..2c254c773c 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -691,6 +691,38 @@ test_expect_failure 'git clean -d skips nested repo containing ignored files' '
 	test_path_is_file nested-repo-with-ignored-file/file
 '
 
+test_expect_failure 'git clean handles being told what to clean' '
+	mkdir -p d1 d2 &&
+	touch d1/ut d2/ut &&
+	git clean -f */ut &&
+	test_path_is_missing d1/ut &&
+	test_path_is_missing d2/ut
+'
+
+test_expect_failure 'git clean handles being told what to clean, with -d' '
+	mkdir -p d1 d2 &&
+	touch d1/ut d2/ut &&
+	git clean -ffd */ut &&
+	test_path_is_missing d1/ut &&
+	test_path_is_missing d2/ut
+'
+
+test_expect_failure 'git clean works if a glob is passed without -d' '
+	mkdir -p d1 d2 &&
+	touch d1/ut d2/ut &&
+	git clean -f "*ut" &&
+	test_path_is_missing d1/ut &&
+	test_path_is_missing d2/ut
+'
+
+test_expect_failure 'git clean works if a glob is passed with -d' '
+	mkdir -p d1 d2 &&
+	touch d1/ut d2/ut &&
+	git clean -ffd "*ut" &&
+	test_path_is_missing d1/ut &&
+	test_path_is_missing d2/ut
+'
+
 test_expect_success MINGW 'handle clean & core.longpaths = false nicely' '
 	test_config core.longpaths false &&
 	a50=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa &&
-- 
2.22.1.17.g6e632477f7.dirty



* [PATCH v4 02/12] dir: fix typo in comment
  2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
  2019-09-17 16:34       ` [PATCH v4 01/12] t7300: add testcases showing failure to clean specified pathspecs Elijah Newren
@ 2019-09-17 16:34       ` Elijah Newren
  2019-09-17 16:34       ` [PATCH v4 03/12] dir: fix off-by-one error in match_pathspec_item Elijah Newren
                         ` (9 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-17 16:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index d021c908e5..a9168bed96 100644
--- a/dir.c
+++ b/dir.c
@@ -139,7 +139,7 @@ static size_t common_prefix_len(const struct pathspec *pathspec)
 	 * ":(icase)path" is treated as a pathspec full of
 	 * wildcard. In other words, only prefix is considered common
 	 * prefix. If the pathspec is abc/foo abc/bar, running in
-	 * subdir xyz, the common prefix is still xyz, not xuz/abc as
+	 * subdir xyz, the common prefix is still xyz, not xyz/abc as
 	 * in non-:(icase).
 	 */
 	GUARD_PATHSPEC(pathspec,
-- 
2.22.1.17.g6e632477f7.dirty



* [PATCH v4 03/12] dir: fix off-by-one error in match_pathspec_item
  2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
  2019-09-17 16:34       ` [PATCH v4 01/12] t7300: add testcases showing failure to clean specified pathspecs Elijah Newren
  2019-09-17 16:34       ` [PATCH v4 02/12] dir: fix typo in comment Elijah Newren
@ 2019-09-17 16:34       ` Elijah Newren
  2019-09-17 16:34       ` [PATCH v4 04/12] dir: also check directories for matching pathspecs Elijah Newren
                         ` (8 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-17 16:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

For a pathspec like 'foo/bar' comparing against a path named "foo/",
namelen will be 4, and match[namelen] will be 'b'.  The correct location
of the directory separator is namelen-1.

However, other callers of match_pathspec_item() such as builtin/grep.c's
submodule_path_match() will compare against a path named "foo" instead of
"foo/".  It might be better to change all the callers to be consistent,
as discussed at
   https://public-inbox.org/git/xmqq7e6cdnkr.fsf@gitster-ct.c.googlers.com/
and
   https://public-inbox.org/git/CABPp-BERWUPCPq-9fVW1LNocqkrfsoF4BPj3gJd9+En43vEkTQ@mail.gmail.com/
but there are many cases to audit, so for now just make sure we handle
both cases with and without a trailing slash.

The reason the code worked despite this sometimes-off-by-one error was
that the subsequent code immediately checked whether the first matchlen
characters matched (which they do) and then bailed and returned
MATCHED_RECURSIVELY anyway since wildmatch doesn't have the ability to
check if "name" can be matched as a directory (or prefix) against the
pathspec.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
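
A note for readers, not part of the commit: the index arithmetic can
be checked in isolation.  The sketch below lifts only the
separator test out of match_pathspec_item() and, as an assumption,
uses plain strncmp() in place of ps_strncmp(); the function names are
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Check as it was: assumes name never ends with a slash. */
int old_is_leading_dir(const char *name, size_t namelen,
		       const char *match, size_t matchlen)
{
	return namelen < matchlen &&
	       match[namelen] == '/' &&
	       !strncmp(match, name, namelen);
}

/* Check with the offset: accepts both "foo" and "foo/" as name. */
int new_is_leading_dir(const char *name, size_t namelen,
		       const char *match, size_t matchlen)
{
	size_t offset;

	assert(namelen > 0);
	offset = name[namelen - 1] == '/' ? 1 : 0;
	return namelen < matchlen &&
	       match[namelen - offset] == '/' &&
	       !strncmp(match, name, namelen);
}
```

For the pathspec "foo/bar", the old test accepts the name "foo" but
rejects "foo/" because it looks at 'b'; the new test accepts both and
still rejects an unrelated pathspec such as "foobar/x".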

diff --git a/dir.c b/dir.c
index a9168bed96..bf1a74799e 100644
--- a/dir.c
+++ b/dir.c
@@ -356,8 +356,9 @@ static int match_pathspec_item(const struct index_state *istate,
 	/* Perform checks to see if "name" is a super set of the pathspec */
 	if (flags & DO_MATCH_SUBMODULE) {
 		/* name is a literal prefix of the pathspec */
+		int offset = name[namelen-1] == '/' ? 1 : 0;
 		if ((namelen < matchlen) &&
-		    (match[namelen] == '/') &&
+		    (match[namelen-offset] == '/') &&
 		    !ps_strncmp(item, match, name, namelen))
 			return MATCHED_RECURSIVELY;
 
-- 
2.22.1.17.g6e632477f7.dirty



* [PATCH v4 04/12] dir: also check directories for matching pathspecs
  2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
                         ` (2 preceding siblings ...)
  2019-09-17 16:34       ` [PATCH v4 03/12] dir: fix off-by-one error in match_pathspec_item Elijah Newren
@ 2019-09-17 16:34       ` Elijah Newren
  2019-09-25 20:39         ` [BUG] git is segfaulting, was " Denton Liu
  2019-09-17 16:34       ` [PATCH v4 05/12] dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case Elijah Newren
                         ` (7 subsequent siblings)
  11 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-09-17 16:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

Even if a directory doesn't match a pathspec, it is possible, depending
on the precise pathspecs, that some file underneath it might.  So we
special case and recurse into the directory for such situations.  However,
we previously always added any untracked directory that we recursed into
to the list of untracked paths, regardless of whether the directory
itself matched the pathspec.

For the case of git-clean and a set of pathspecs of "dir/file" and "more",
this caused a problem because we'd end up with dir entries for both of
  "dir"
  "dir/file"
Then correct_untracked_entries() would try to helpfully prune duplicates
for us by removing "dir/file" since it's under "dir", leaving us with
  "dir"
Since the original pathspec only had "dir/file", the only entry left
doesn't match and leaves nothing to be removed.  (Note that if only one
pathspec was specified, e.g. only "dir/file", then the common_prefix_len
optimizations in fill_directory would cause us to bypass this problem,
making it appear in simple tests that we could correctly remove manually
specified pathspecs.)

Fix this by checking whether the directory we are about to add to the
list of dir entries actually matches the pathspec; only do this
matching check after we have already returned from recursing into the
directory.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c            | 5 +++++
 t/t7300-clean.sh | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)
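
A note for readers, not part of the commit: the interaction described
above can be modelled in a few lines.  This is a deliberately
simplified sketch, not the logic of correct_untracked_entries(): it
assumes a sorted list of entries without trailing slashes, prunes
anything lying under the previously kept entry, and then counts exact
pathspec matches.  All function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if path lies strictly underneath directory dir. */
static int is_under(const char *path, const char *dir)
{
	size_t dlen = strlen(dir);

	return !strncmp(path, dir, dlen) && path[dlen] == '/';
}

/*
 * Drop entries that sit under the previously kept entry.  The list is
 * compacted in place and the new length is returned.
 */
size_t prune_nested(const char **entries, size_t n)
{
	size_t src, dst = 0;

	for (src = 0; src < n; src++) {
		if (dst && is_under(entries[src], entries[dst - 1]))
			continue;
		entries[dst++] = entries[src];
	}
	return dst;
}

/* Count entries that are literally named by one of the pathspecs. */
size_t count_exact(const char **entries, size_t n,
		   const char **specs, size_t nspecs)
{
	size_t i, j, hits = 0;

	for (i = 0; i < n; i++) {
		for (j = 0; j < nspecs; j++) {
			if (!strcmp(entries[i], specs[j])) {
				hits++;
				break;
			}
		}
	}
	return hits;
}
```

Starting from { "dir", "dir/file", "more" } with pathspecs "dir/file"
and "more", pruning leaves { "dir", "more" } and only "more" matches.
If "dir" is never added, both "dir/file" and "more" survive and match.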

diff --git a/dir.c b/dir.c
index bf1a74799e..76a3c3894b 100644
--- a/dir.c
+++ b/dir.c
@@ -1951,6 +1951,11 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 							 check_only, stop_at_first_file, pathspec);
 			if (subdir_state > dir_state)
 				dir_state = subdir_state;
+
+			if (!match_pathspec(istate, pathspec, path.buf, path.len,
+					    0 /* prefix */, NULL,
+					    0 /* do NOT special case dirs */))
+				state = path_none;
 		}
 
 		if (check_only) {
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 2c254c773c..12617158db 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -699,7 +699,7 @@ test_expect_failure 'git clean handles being told what to clean' '
 	test_path_is_missing d2/ut
 '
 
-test_expect_failure 'git clean handles being told what to clean, with -d' '
+test_expect_success 'git clean handles being told what to clean, with -d' '
 	mkdir -p d1 d2 &&
 	touch d1/ut d2/ut &&
 	git clean -ffd */ut &&
@@ -715,7 +715,7 @@ test_expect_failure 'git clean works if a glob is passed without -d' '
 	test_path_is_missing d2/ut
 '
 
-test_expect_failure 'git clean works if a glob is passed with -d' '
+test_expect_success 'git clean works if a glob is passed with -d' '
 	mkdir -p d1 d2 &&
 	touch d1/ut d2/ut &&
 	git clean -ffd "*ut" &&
-- 
2.22.1.17.g6e632477f7.dirty



* [PATCH v4 05/12] dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
  2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
                         ` (3 preceding siblings ...)
  2019-09-17 16:34       ` [PATCH v4 04/12] dir: also check directories for matching pathspecs Elijah Newren
@ 2019-09-17 16:34       ` Elijah Newren
  2019-09-17 16:34       ` [PATCH v4 06/12] dir: if our pathspec might match files under a dir, recurse into it Elijah Newren
                         ` (6 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-17 16:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

The specific checks done in match_pathspec_item for the DO_MATCH_SUBMODULE
case are useful for other cases which have nothing to do with submodules.
Rename this constant; a subsequent commit will make use of this change.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 76a3c3894b..b4d656192e 100644
--- a/dir.c
+++ b/dir.c
@@ -273,7 +273,7 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
 
 #define DO_MATCH_EXCLUDE   (1<<0)
 #define DO_MATCH_DIRECTORY (1<<1)
-#define DO_MATCH_SUBMODULE (1<<2)
+#define DO_MATCH_LEADING_PATHSPEC (1<<2)
 
 /*
  * Does 'match' match the given name?
@@ -354,7 +354,7 @@ static int match_pathspec_item(const struct index_state *istate,
 		return MATCHED_FNMATCH;
 
 	/* Perform checks to see if "name" is a super set of the pathspec */
-	if (flags & DO_MATCH_SUBMODULE) {
+	if (flags & DO_MATCH_LEADING_PATHSPEC) {
 		/* name is a literal prefix of the pathspec */
 		int offset = name[namelen-1] == '/' ? 1 : 0;
 		if ((namelen < matchlen) &&
@@ -498,7 +498,7 @@ int submodule_path_match(const struct index_state *istate,
 					strlen(submodule_name),
 					0, seen,
 					DO_MATCH_DIRECTORY |
-					DO_MATCH_SUBMODULE);
+					DO_MATCH_LEADING_PATHSPEC);
 	return matched;
 }
 
-- 
2.22.1.17.g6e632477f7.dirty



* [PATCH v4 06/12] dir: if our pathspec might match files under a dir, recurse into it
  2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
                         ` (4 preceding siblings ...)
  2019-09-17 16:34       ` [PATCH v4 05/12] dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case Elijah Newren
@ 2019-09-17 16:34       ` Elijah Newren
  2019-09-17 16:34       ` [PATCH v4 07/12] dir: add commentary explaining match_pathspec_item's return value Elijah Newren
                         ` (5 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-17 16:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

For git clean, if a directory is entirely untracked and the user did not
specify -d (corresponding to DIR_SHOW_IGNORED_TOO), then we usually do
not want to remove that directory and thus do not recurse into it.
However, if the user manually specified specific (or even globbed) paths
somewhere under that directory to remove, then we need to recurse into
the directory to make sure we remove the relevant paths under that
directory as the user requested.

Note that this does not mean that the recursed-into directory will be
added to dir->entries for later removal; as of a few commits earlier in
this series, there is another more strict match check that is run after
returning from a recursed-into directory before deciding to add it to the
list of entries.  Therefore, this will only result in files underneath
the given directory which match one of the pathspecs being added to the
entries list.

Two notes of potential interest to future readers:

  * If we wanted to only recurse into a directory when it is specifically
    matched rather than matched-via-glob (e.g. '*.c'), then we could do
    so via making the final non-zero return in match_pathspec_item be
    MATCHED_RECURSIVELY instead of MATCHED_RECURSIVELY_LEADING_PATHSPEC.
    (Note that the relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC
    and MATCHED_RECURSIVELY is important for such a change.)  I was
    leaving open that possibility while writing an RFC asking for the
    behavior we want, but even though we don't want it, that knowledge
    might help you understand the code flow better.

  * There is a growing amount of logic in read_directory_recursive() for
    deciding whether to recurse into a subdirectory.  However, there is a
    comment immediately preceding this logic that says to recurse if
    instructed by treat_path().  It may be better for the logic in
    read_directory_recursive() to ultimately be moved to treat_path() (or
    another function it calls, such as treat_directory()), but I have
    left that for someone else to tackle in the future.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c            | 10 ++++++----
 dir.h            |  5 +++--
 t/t7300-clean.sh |  4 ++--
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/dir.c b/dir.c
index b4d656192e..47c0a99cb5 100644
--- a/dir.c
+++ b/dir.c
@@ -360,7 +360,7 @@ static int match_pathspec_item(const struct index_state *istate,
 		if ((namelen < matchlen) &&
 		    (match[namelen-offset] == '/') &&
 		    !ps_strncmp(item, match, name, namelen))
-			return MATCHED_RECURSIVELY;
+			return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
 
 		/* name" doesn't match up to the first wild character */
 		if (item->nowildcard_len < item->len &&
@@ -377,7 +377,7 @@ static int match_pathspec_item(const struct index_state *istate,
 		 * The submodules themselves will be able to perform more
 		 * accurate matching to determine if the pathspec matches.
 		 */
-		return MATCHED_RECURSIVELY;
+		return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
 	}
 
 	return 0;
@@ -1939,8 +1939,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 		/* recurse into subdir if instructed by treat_path */
 		if ((state == path_recurse) ||
 			((state == path_untracked) &&
-			 (dir->flags & DIR_SHOW_IGNORED_TOO) &&
-			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR))) {
+			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
+			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
+			  do_match_pathspec(istate, pathspec, path.buf, path.len,
+					    baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
 			struct untracked_cache_dir *ud;
 			ud = lookup_untracked(dir->untracked, untracked,
 					      path.buf + baselen,
diff --git a/dir.h b/dir.h
index 680079bbe3..46c238ab49 100644
--- a/dir.h
+++ b/dir.h
@@ -211,8 +211,9 @@ int count_slashes(const char *s);
  * when populating the seen[] array.
  */
 #define MATCHED_RECURSIVELY 1
-#define MATCHED_FNMATCH 2
-#define MATCHED_EXACTLY 3
+#define MATCHED_RECURSIVELY_LEADING_PATHSPEC 2
+#define MATCHED_FNMATCH 3
+#define MATCHED_EXACTLY 4
 int simple_length(const char *match);
 int no_wildcard(const char *string);
 char *common_prefix(const struct pathspec *pathspec);
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 12617158db..d83aeb7dc2 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -691,7 +691,7 @@ test_expect_failure 'git clean -d skips nested repo containing ignored files' '
 	test_path_is_file nested-repo-with-ignored-file/file
 '
 
-test_expect_failure 'git clean handles being told what to clean' '
+test_expect_success 'git clean handles being told what to clean' '
 	mkdir -p d1 d2 &&
 	touch d1/ut d2/ut &&
 	git clean -f */ut &&
@@ -707,7 +707,7 @@ test_expect_success 'git clean handles being told what to clean, with -d' '
 	test_path_is_missing d2/ut
 '
 
-test_expect_failure 'git clean works if a glob is passed without -d' '
+test_expect_success 'git clean works if a glob is passed without -d' '
 	mkdir -p d1 d2 &&
 	touch d1/ut d2/ut &&
 	git clean -f "*ut" &&
-- 
2.22.1.17.g6e632477f7.dirty



* [PATCH v4 07/12] dir: add commentary explaining match_pathspec_item's return value
  2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
                         ` (5 preceding siblings ...)
  2019-09-17 16:34       ` [PATCH v4 06/12] dir: if our pathspec might match files under a dir, recurse into it Elijah Newren
@ 2019-09-17 16:34       ` Elijah Newren
  2019-09-17 16:35       ` [PATCH v4 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run Elijah Newren
                         ` (4 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-17 16:34 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

The way match_pathspec_item() handles names and pathspecs with trailing
slash characters, in conjunction with special options like
DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC, was non-obvious, and
was broken until this patch series.  Add a table in a comment explaining the
intent of how these work.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/dir.c b/dir.c
index 47c0a99cb5..3b2fe1701c 100644
--- a/dir.c
+++ b/dir.c
@@ -276,16 +276,27 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
 #define DO_MATCH_LEADING_PATHSPEC (1<<2)
 
 /*
- * Does 'match' match the given name?
- * A match is found if
+ * Does the given pathspec match the given name?  A match is found if
  *
- * (1) the 'match' string is leading directory of 'name', or
- * (2) the 'match' string is a wildcard and matches 'name', or
- * (3) the 'match' string is exactly the same as 'name'.
+ * (1) the pathspec string is leading directory of 'name' ("RECURSIVELY"), or
+ * (2) the pathspec string has a leading part matching 'name' ("LEADING"), or
+ * (3) the pathspec string is a wildcard and matches 'name' ("WILDCARD"), or
+ * (4) the pathspec string is exactly the same as 'name' ("EXACT").
  *
- * and the return value tells which case it was.
+ * Return value tells which case it was (1-4), or 0 when there is no match.
  *
- * It returns 0 when there is no match.
+ * It may be instructive to look at a small table of concrete examples
+ * to understand the differences between 1, 2, and 4:
+ *
+ *                              Pathspecs
+ *                |    a/b    |   a/b/    |   a/b/c
+ *          ------+-----------+-----------+------------
+ *          a/b   |  EXACT    |  EXACT[1] | LEADING[2]
+ *  Names   a/b/  | RECURSIVE |   EXACT   | LEADING[2]
+ *          a/b/c | RECURSIVE | RECURSIVE |   EXACT
+ *
+ * [1] Only if DO_MATCH_DIRECTORY is passed; otherwise, this is NOT a match.
+ * [2] Only if DO_MATCH_LEADING_PATHSPEC is passed; otherwise, not a match.
  */
 static int match_pathspec_item(const struct index_state *istate,
 			       const struct pathspec_item *item, int prefix,
@@ -353,7 +364,7 @@ static int match_pathspec_item(const struct index_state *istate,
 			 item->nowildcard_len - prefix))
 		return MATCHED_FNMATCH;
 
-	/* Perform checks to see if "name" is a super set of the pathspec */
+	/* Perform checks to see if "name" is a leading string of the pathspec */
 	if (flags & DO_MATCH_LEADING_PATHSPEC) {
 		/* name is a literal prefix of the pathspec */
 		int offset = name[namelen-1] == '/' ? 1 : 0;
-- 
2.22.1.17.g6e632477f7.dirty



* [PATCH v4 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run
  2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
                         ` (6 preceding siblings ...)
  2019-09-17 16:34       ` [PATCH v4 07/12] dir: add commentary explaining match_pathspec_item's return value Elijah Newren
@ 2019-09-17 16:35       ` Elijah Newren
  2019-09-17 16:35       ` [PATCH v4 09/12] clean: disambiguate the definition of -d Elijah Newren
                         ` (3 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-17 16:35 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

It appears that the wrong option got included in the list of what will
cause git-clean to actually take action.  Correct the list.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-clean.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt
index db876f7dde..e84ffc9396 100644
--- a/Documentation/git-clean.txt
+++ b/Documentation/git-clean.txt
@@ -35,7 +35,7 @@ OPTIONS
 --force::
 	If the Git configuration variable clean.requireForce is not set
 	to false, 'git clean' will refuse to delete files or directories
-	unless given -f, -n or -i. Git will refuse to delete directories
+	unless given -f or -i. Git will refuse to delete directories
 	with .git sub directory or file unless a second -f
 	is given.
 
-- 
2.22.1.17.g6e632477f7.dirty



* [PATCH v4 09/12] clean: disambiguate the definition of -d
  2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
                         ` (7 preceding siblings ...)
  2019-09-17 16:35       ` [PATCH v4 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run Elijah Newren
@ 2019-09-17 16:35       ` Elijah Newren
  2019-09-17 16:35       ` [PATCH v4 10/12] clean: avoid removing untracked files in a nested git repository Elijah Newren
                         ` (2 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-17 16:35 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

The -d flag pre-dated git-clean's ability to have paths specified.  As
such, the default for git-clean was to only remove untracked files in
the current directory, and -d existed to allow it to recurse into
subdirectories.

The interaction of paths and the -d option appears to not have been
carefully considered, as evidenced by numerous bugs and a dearth of
tests covering such pairings in the testsuite.  The definition turns out
to be important, so let's look at some of the various ways one could
interpret the -d option:

  A) Without -d, only look in subdirectories which contain tracked
     files under them; with -d, also look in subdirectories which
     are untracked for files to clean.

  B) Without specified paths from the user for us to delete, we need to
     have some kind of default, so...without -d, only look in
     subdirectories which contain tracked files under them; with -d,
     also look in subdirectories which are untracked for files to clean.

The important distinction here is that choice B says that the presence
or absence of '-d' is irrelevant if paths are specified.  The logic
behind option B is that if a user explicitly asked us to clean a
specified pathspec, then we should clean anything that matches that
pathspec.  Some examples may clarify.  Should

   git clean -f untracked_dir/file

remove untracked_dir/file or not?  It seems crazy not to, but a strict
reading of option A says it shouldn't be removed.  How about

   git clean -f untracked_dir/file1 tracked_dir/file2

or

   git clean -f untracked_dir_1/file1 untracked_dir_2/file2

?  Should it remove either or both of these files?  Should it require
multiple runs to remove both the files listed?  (If this sounds like a
crazy question to even ask, see the commit message of "t7300: Add some
testcases showing failure to clean specified pathspecs" added earlier in
this patch series.)  What if -ffd were used instead of -f -- should that
allow these to be removed?  Should it take multiple invocations with
-ffd?  What if a glob (such as '*tracked*') were used instead of
spelling out the directory names?  What if the filenames involved globs,
such as

   git clean -f '*.o'

or

   git clean -f '*/*.o'

?

The current documentation actually suggests a definition that is
slightly different from choice A, and the implementation prior to this
series provided something radically different from either choice A or
B.  (The implementation, though, was clearly just buggy.)  There may be
other choices as well.  However, for almost any given choice of
definition for -d that I can think of, some of the examples above will
appear buggy to the user.  The only case that doesn't have negative
surprises is choice B: treat a user-specified path as a request to clean
all untracked files which match that path specification, including
recursing into any untracked directories.

Change the documentation and basic implementation to use this
definition.

There were two regression tests that indirectly depended on the current
implementation, but neither was about subdirectory handling.  These two
tests were introduced in commit 5b7570cfb41c ("git-clean: add tests for
relative path", 2008-03-07) which was solely created to add coverage for
the changes in commit fb328947c8e ("git-clean: correct printing relative
path", 2008-03-07).  Both tests specified a directory that happened to
have an untracked subdirectory, but both were only checking that the
resulting printout of a file that was removed was shown with a relative
path.  Update these tests appropriately.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-clean.txt | 10 ++++++----
 builtin/clean.c             |  8 ++++++++
 t/t7300-clean.sh            |  2 ++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt
index e84ffc9396..3ab749b921 100644
--- a/Documentation/git-clean.txt
+++ b/Documentation/git-clean.txt
@@ -26,10 +26,12 @@ are affected.
 OPTIONS
 -------
 -d::
-	Remove untracked directories in addition to untracked files.
-	If an untracked directory is managed by a different Git
-	repository, it is not removed by default.  Use -f option twice
-	if you really want to remove such a directory.
+	Normally, when no <path> is specified, git clean will not
+	recurse into untracked directories to avoid removing too much.
+	Specify -d to have it recurse into such directories as well.
+	If any paths are specified, -d is irrelevant; all untracked
+	files matching the specified paths (with exceptions for nested
+	git directories mentioned under `--force`) will be removed.
 
 -f::
 --force::
diff --git a/builtin/clean.c b/builtin/clean.c
index d5579da716..68d70e41c0 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -949,6 +949,14 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 
 	dir.flags |= DIR_SHOW_OTHER_DIRECTORIES;
 
+	if (argc) {
+		/*
+		 * Remaining args implies pathspecs specified, and we should
+		 * recurse within those.
+		 */
+		remove_directories = 1;
+	}
+
 	if (remove_directories)
 		dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS;
 
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index d83aeb7dc2..530dfdab34 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -117,6 +117,7 @@ test_expect_success C_LOCALE_OUTPUT 'git clean with relative prefix' '
 	would_clean=$(
 		cd docs &&
 		git clean -n ../src |
+		grep part3 |
 		sed -n -e "s|^Would remove ||p"
 	) &&
 	verbose test "$would_clean" = ../src/part3.c
@@ -129,6 +130,7 @@ test_expect_success C_LOCALE_OUTPUT 'git clean with absolute path' '
 	would_clean=$(
 		cd docs &&
 		git clean -n "$(pwd)/../src" |
+		grep part3 |
 		sed -n -e "s|^Would remove ||p"
 	) &&
 	verbose test "$would_clean" = ../src/part3.c
-- 
2.22.1.17.g6e632477f7.dirty



* [PATCH v4 10/12] clean: avoid removing untracked files in a nested git repository
  2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
                         ` (8 preceding siblings ...)
  2019-09-17 16:35       ` [PATCH v4 09/12] clean: disambiguate the definition of -d Elijah Newren
@ 2019-09-17 16:35       ` Elijah Newren
  2019-09-17 16:35       ` [PATCH v4 11/12] clean: rewrap overly long line Elijah Newren
  2019-09-17 16:35       ` [PATCH v4 12/12] clean: fix theoretical path corruption Elijah Newren
  11 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-17 16:35 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

Users expect files in a nested git repository to be left alone unless
sufficiently forced (with two -f's).  Unfortunately, in certain
circumstances, git would delete both tracked (and possibly dirty) files
and untracked files within a nested repository.  To explain how this
happens, let's contrast a couple cases.  First, take the following
example setup (which assumes we are already within a git repo):

   git init nested
   cd nested
   >tracked
   git add tracked
   git commit -m init
   >untracked
   cd ..

In this setup, everything works as expected; running 'git clean -fd'
will result in fill_directory() returning the following paths:
   nested/
   nested/tracked
   nested/untracked
and then correct_untracked_entries() would notice this can be compressed
to
   nested/
and then since "nested/" is a directory, we would call
remove_dirs("nested/", ...), which would check
is_nonbare_repository_dir() and then decide to skip it.

However, if someone also creates an ignored file:
   >nested/ignored
then running 'git clean -fd' would result in fill_directory() returning
the same paths:
   nested/
   nested/tracked
   nested/untracked
but correct_untracked_entries() will notice that we had ignored entries
under nested/ and thus simplify this list to
   nested/tracked
   nested/untracked
Since these are not directories, we do not call remove_dirs() which was
the only place that had the is_nonbare_repository_dir() safety check --
resulting in us deleting both the untracked file and the tracked (and
possibly dirty) file.

One possible fix for this issue would be walking the parent directories
of each path and checking if they represent nonbare repositories, but
that would be wasteful.  Even if we added caching of some sort, it's
still a waste because we should have been able to check that "nested/"
represented a nonbare repository before even descending into it in the
first place.  Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use
it to prevent fill_directory() and friends from descending into nested
git repos.

With this change, we also modify two regression tests added in commit
91479b9c72f1 ("t7300: add tests to document behavior of clean and nested
git", 2015-06-15).  Neither that commit, nor its series, nor the six previous
iterations of that series on the mailing list discussed why those tests
coded the expectation they did.  In fact, it appears their purpose was
simply to test _existing_ behavior to make sure that the performance
changes didn't change the behavior.  However, these two tests directly
contradicted the manpage's claims that two -f's were required to delete
files/directories under a nested git repository.  While one could argue
that the user gave an explicit path which matched files/directories that
were within a nested repository, there's a slippery slope that becomes
very difficult for users to understand once you go down that route (e.g.
what if they specified "git clean -f -d '*.c'"?)  It would also be hard
to explain what the exact behavior was; avoid such problems by making it
really simple.

Also, clean up some grammar errors describing this functionality in the
git-clean manpage.

Finally, there are still a couple of bugs with -ffd not cleaning out enough
(e.g. missing the nested .git) and with -ffdX possibly cleaning out the
wrong files (paying attention to outer .gitignore instead of inner).
This patch does not address these cases at all (and does not change the
behavior relative to those flags); it only fixes the handling when given
a single -f.  See
https://public-inbox.org/git/20190905212043.GC32087@szeder.dev/ for more
discussion of the -ffd[X?] bugs.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-clean.txt |  6 +++---
 builtin/clean.c             |  2 ++
 dir.c                       | 10 ++++++++++
 dir.h                       |  3 ++-
 t/t7300-clean.sh            | 10 +++++-----
 5 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt
index 3ab749b921..ba31d8d166 100644
--- a/Documentation/git-clean.txt
+++ b/Documentation/git-clean.txt
@@ -37,9 +37,9 @@ OPTIONS
 --force::
 	If the Git configuration variable clean.requireForce is not set
 	to false, 'git clean' will refuse to delete files or directories
-	unless given -f or -i. Git will refuse to delete directories
-	with .git sub directory or file unless a second -f
-	is given.
+	unless given -f or -i.  Git will refuse to modify untracked
+	nested git repositories (directories with a .git subdirectory)
+	unless a second -f is given.
 
 -i::
 --interactive::
diff --git a/builtin/clean.c b/builtin/clean.c
index 68d70e41c0..3a7a63ae71 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -946,6 +946,8 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 
 	if (force > 1)
 		rm_flags = 0;
+	else
+		dir.flags |= DIR_SKIP_NESTED_GIT;
 
 	dir.flags |= DIR_SHOW_OTHER_DIRECTORIES;
 
diff --git a/dir.c b/dir.c
index 3b2fe1701c..7ff79170fc 100644
--- a/dir.c
+++ b/dir.c
@@ -1451,6 +1451,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_none;
 
 	case index_nonexistent:
+		if (dir->flags & DIR_SKIP_NESTED_GIT) {
+			int nested_repo;
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addstr(&sb, dirname);
+			nested_repo = is_nonbare_repository_dir(&sb);
+			strbuf_release(&sb);
+			if (nested_repo)
+				return path_none;
+		}
+
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
 			break;
 		if (exclude &&
diff --git a/dir.h b/dir.h
index 46c238ab49..739aea7c96 100644
--- a/dir.h
+++ b/dir.h
@@ -156,7 +156,8 @@ struct dir_struct {
 		DIR_SHOW_IGNORED_TOO = 1<<5,
 		DIR_COLLECT_KILLED_ONLY = 1<<6,
 		DIR_KEEP_UNTRACKED_CONTENTS = 1<<7,
-		DIR_SHOW_IGNORED_TOO_MODE_MATCHING = 1<<8
+		DIR_SHOW_IGNORED_TOO_MODE_MATCHING = 1<<8,
+		DIR_SKIP_NESTED_GIT = 1<<9
 	} flags;
 	struct dir_entry **entries;
 	struct dir_entry **ignored;
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 530dfdab34..6e6d24c1c3 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -549,7 +549,7 @@ test_expect_failure 'nested (non-empty) bare repositories should be cleaned even
 	test_path_is_missing strange_bare
 '
 
-test_expect_success 'giving path in nested git work tree will remove it' '
+test_expect_success 'giving path in nested git work tree will NOT remove it' '
 	rm -fr repo &&
 	mkdir repo &&
 	(
@@ -561,7 +561,7 @@ test_expect_success 'giving path in nested git work tree will remove it' '
 	git clean -f -d repo/bar/baz &&
 	test_path_is_file repo/.git/HEAD &&
 	test_path_is_dir repo/bar/ &&
-	test_path_is_missing repo/bar/baz
+	test_path_is_file repo/bar/baz/hello.world
 '
 
 test_expect_success 'giving path to nested .git will not remove it' '
@@ -579,7 +579,7 @@ test_expect_success 'giving path to nested .git will not remove it' '
 	test_path_is_dir untracked/
 '
 
-test_expect_success 'giving path to nested .git/ will remove contents' '
+test_expect_success 'giving path to nested .git/ will NOT remove contents' '
 	rm -fr repo untracked &&
 	mkdir repo untracked &&
 	(
@@ -589,7 +589,7 @@ test_expect_success 'giving path to nested .git/ will remove contents' '
 	) &&
 	git clean -f -d repo/.git/ &&
 	test_path_is_dir repo/.git &&
-	test_dir_is_empty repo/.git &&
+	test_path_is_file repo/.git/HEAD &&
 	test_path_is_dir untracked/
 '
 
@@ -671,7 +671,7 @@ test_expect_success 'git clean -d skips untracked dirs containing ignored files'
 	test_path_is_missing foo/b/bb
 '
 
-test_expect_failure 'git clean -d skips nested repo containing ignored files' '
+test_expect_success 'git clean -d skips nested repo containing ignored files' '
 	test_when_finished "rm -rf nested-repo-with-ignored-file" &&
 
 	git init nested-repo-with-ignored-file &&
-- 
2.22.1.17.g6e632477f7.dirty



* [PATCH v4 11/12] clean: rewrap overly long line
  2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
                         ` (9 preceding siblings ...)
  2019-09-17 16:35       ` [PATCH v4 10/12] clean: avoid removing untracked files in a nested git repository Elijah Newren
@ 2019-09-17 16:35       ` Elijah Newren
  2019-09-17 16:35       ` [PATCH v4 12/12] clean: fix theoretical path corruption Elijah Newren
  11 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-17 16:35 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 3a7a63ae71..6030842f3a 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -158,7 +158,8 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 
 	*dir_gone = 1;
 
-	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && is_nonbare_repository_dir(path)) {
+	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) &&
+	    is_nonbare_repository_dir(path)) {
 		if (!quiet) {
 			quote_path_relative(path->buf, prefix, &quoted);
 			printf(dry_run ?  _(msg_would_skip_git_dir) : _(msg_skip_git_dir),
-- 
2.22.1.17.g6e632477f7.dirty



* [PATCH v4 12/12] clean: fix theoretical path corruption
  2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
                         ` (10 preceding siblings ...)
  2019-09-17 16:35       ` [PATCH v4 11/12] clean: rewrap overly long line Elijah Newren
@ 2019-09-17 16:35       ` Elijah Newren
  11 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-17 16:35 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin, Elijah Newren

cmd_clean() had the following code structure:

    struct strbuf abs_path = STRBUF_INIT;
    for_each_string_list_item(item, &del_list) {
        strbuf_addstr(&abs_path, prefix);
        strbuf_addstr(&abs_path, item->string);
        PROCESS(&abs_path);
        strbuf_reset(&abs_path);
    }

where I've elided a bunch of unnecessary details and PROCESS(&abs_path)
represents a big chunk of code rather than an actual function call.  One
piece of PROCESS was:

    if (lstat(abs_path.buf, &st))
        continue;

which would cause the strbuf_reset() to be missed -- meaning that the
next path to be handled would have two paths concatenated.  This path
used to use die_errno() instead of continue prior to commit 396049e5fb62
("git-clean: refactor git-clean into two phases", 2013-06-25), but my
understanding of how correct_untracked_entries() works is that it will
prevent both dir/ and dir/file from being in the list to clean so this
should be dead code and the die_errno() should be safe.  But I hesitate
to remove it since I am not certain.

However, we can fix both this bug and possible similar future bugs by
simply moving the strbuf_reset(&abs_path) to the beginning of the loop.
It'll result in N calls to strbuf_reset() instead of N-1, but that's a
small price to pay to avoid sneaky bugs like this.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 6030842f3a..4cf2399f59 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -1018,6 +1018,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 	for_each_string_list_item(item, &del_list) {
 		struct stat st;
 
+		strbuf_reset(&abs_path);
 		if (prefix)
 			strbuf_addstr(&abs_path, prefix);
 
@@ -1051,7 +1052,6 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 				printf(dry_run ? _(msg_would_remove) : _(msg_remove), qname);
 			}
 		}
-		strbuf_reset(&abs_path);
 	}
 
 	strbuf_release(&abs_path);
-- 
2.22.1.17.g6e632477f7.dirty



* [BUG] git is segfaulting, was [PATCH v4 04/12] dir: also check directories for matching pathspecs
  2019-09-17 16:34       ` [PATCH v4 04/12] dir: also check directories for matching pathspecs Elijah Newren
@ 2019-09-25 20:39         ` " Denton Liu
  2019-09-25 21:28           ` Elijah Newren
  2019-09-27  1:09           ` SZEDER Gábor
  0 siblings, 2 replies; 73+ messages in thread
From: Denton Liu @ 2019-09-25 20:39 UTC (permalink / raw)
  To: Elijah Newren
  Cc: git, Junio C Hamano, Jeff King, Rafael Ascensão,
	SZEDER Gábor, Samuel Lijin

Hi Elijah,

I ran into a segfault on MacOS. I managed to bisect it down to
404ebceda0 (dir: also check directories for matching pathspecs,
2019-09-17), which should be the patch in the parent thread. The test
case below works fine without this patch applied but segfaults once it
is applied.

	#!/bin/sh

	git worktree add testdir
	git -C testdir checkout master
	git -C testdir fetch https://github.com/git/git.git todo
	bin-wrappers/git -C testdir checkout FETCH_HEAD # segfault here

Note that the worktree part isn't necessary to reproduce the problem but
I didn't want my files to be constantly refreshed, triggering a rebuild
each time.

I also managed to get this backtrace from running lldb at the segfault
but it is based on the latest "jch" commit, 1cc52d20df (Merge branch
'jt/merge-recursive-symlink-is-not-a-dir-in-way' into jch, 2019-09-20).

	* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
	  * frame #0: 0x00000001000f63a0 git`do_match_pathspec(istate=0x0000000100299940, ps=0x000000010200aa80, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, flags=0) at dir.c:420:2 [opt]
		frame #1: 0x00000001000f632c git`match_pathspec(istate=0x0000000100299940, ps=0x0000000000000000, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, is_dir=0) at dir.c:490:13 [opt]
		frame #2: 0x00000001000f8315 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=17, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1990:9 [opt]
		frame #3: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=14, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
		frame #4: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=7, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
		frame #5: 0x00000001000f60d1 git`read_directory(dir=0x00007ffeefbfe278, istate=0x0000000100299940, path="Gitweb/", len=7, pathspec=0x0000000000000000) at dir.c:2298:3 [opt]
		frame #6: 0x00000001001bded1 git`verify_clean_subdirectory(ce=<unavailable>, o=0x00007ffeefbfe8c0) at unpack-trees.c:1846:6 [opt]
		frame #7: 0x00000001001bdc1d git`check_ok_to_remove(name="Gitweb", len=6, dtype=4, ce=0x0000000103e70de0, st=0x00007ffeefbfe438, error_type=ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o=0x00007ffeefbfe8c0) at unpack-trees.c:1901:7 [opt]
		frame #8: 0x00000001001bdb01 git`verify_absent_1(ce=<unavailable>, error_type=<unavailable>, o=<unavailable>) at unpack-trees.c:1964:10 [opt]
		frame #9: 0x00000001001bafc0 git`verify_absent(ce=<unavailable>, error_type=<unavailable>, o=<unavailable>) at unpack-trees.c:1052:11 [opt] [artificial]
		frame #10: 0x00000001001bbc3c git`merged_entry(ce=0x0000000100605fb0, old=0x0000000000000000, o=0x00007ffeefbfe8c0) at unpack-trees.c:2013:7 [opt]
		frame #11: 0x00000001001bd2b7 git`call_unpack_fn(src=<unavailable>, o=<unavailable>) at unpack-trees.c:522:12 [opt]
		frame #12: 0x00000001001bca16 git`unpack_nondirectories(n=2, mask=2, dirmask=<unavailable>, src=0x00007ffeefbfe5d0, names=<unavailable>, info=0x00007ffeefbfe718) at unpack-trees.c:1029:12 [opt]
		frame #13: 0x00000001001bad1a git`unpack_callback(n=2, mask=2, dirmask=0, names=0x0000000102007390, info=0x00007ffeefbfe718) at unpack-trees.c:1229:6 [opt]
		frame #14: 0x00000001001b8be2 git`traverse_trees(istate=0x0000000100299940, n=2, t=<unavailable>, info=<unavailable>) at tree-walk.c:497:17 [opt]
		frame #15: 0x00000001001ba80f git`unpack_trees(len=2, t=0x00007ffeefbfebe0, o=0x00007ffeefbfe8c0) at unpack-trees.c:1546:9 [opt]
		frame #16: 0x000000010001a443 git`merge_working_tree(opts=0x00007ffeefbfee38, old_branch_info=0x00007ffeefbfeca0, new_branch_info=0x00007ffeefbfeda0, writeout_error=0x00007ffeefbfeccc) at checkout.c:704:9 [opt]
		frame #17: 0x000000010001a08c git`switch_branches(opts=0x00007ffeefbfee38, new_branch_info=0x00007ffeefbfeda0) at checkout.c:1057:9 [opt]
		frame #18: 0x0000000100018df0 git`checkout_branch(opts=<unavailable>, new_branch_info=<unavailable>) at checkout.c:1426:9 [opt]
		frame #19: 0x0000000100017b90 git`checkout_main(argc=0, argv=0x00007ffeefbff570, prefix=0x0000000000000000, opts=0x00007ffeefbfee38, options=<unavailable>, usagestr=<unavailable>) at checkout.c:1682:10 [opt]
		frame #20: 0x0000000100016f2d git`cmd_checkout(argc=2, argv=0x00007ffeefbff568, prefix=0x0000000000000000) at checkout.c:1731:8 [opt]
		frame #21: 0x00000001000026f6 git`run_builtin(p=0x000000010024c710, argc=2, argv=0x00007ffeefbff568) at git.c:444:11 [opt]
		frame #22: 0x0000000100001a36 git`handle_builtin(argc=2, argv=0x00007ffeefbff568) at git.c:673:3 [opt]
		frame #23: 0x000000010000235c git`run_argv(argcp=0x00007ffeefbff4ec, argv=0x00007ffeefbff4d8) at git.c:740:4 [opt]
		frame #24: 0x0000000100001794 git`cmd_main(argc=2, argv=0x00007ffeefbff568) at git.c:871:19 [opt]
		frame #25: 0x00000001000a4405 git`main(argc=<unavailable>, argv=0x00007ffeefbff560) at common-main.c:52:11 [opt]
		frame #26: 0x00007fff783053d5 libdyld.dylib`start + 1

Sorry for the information dump, I haven't had the time to properly look
into the issue but I just wanted to make sure that you're aware.

Thanks and hope this helps,

Denton

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [BUG] git is segfaulting, was [PATCH v4 04/12] dir: also check directories for matching pathspecs
  2019-09-25 20:39         ` [BUG] git is segfaulting, was " Denton Liu
@ 2019-09-25 21:28           ` Elijah Newren
  2019-09-25 21:55             ` Denton Liu
  2019-09-27  1:09           ` SZEDER Gábor
  1 sibling, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-09-25 21:28 UTC (permalink / raw)
  To: Denton Liu
  Cc: Git Mailing List, Junio C Hamano, Jeff King,
	Rafael Ascensão, SZEDER Gábor, Samuel Lijin

Hi Denton,

On Wed, Sep 25, 2019 at 1:39 PM Denton Liu <liu.denton@gmail.com> wrote:
>
> Hi Elijah,
>
> I ran into a segfault on MacOS. I managed to bisect it down to
> 404ebceda0 (dir: also check directories for matching pathspecs,
> 2019-09-17), which should be the patch in the parent thread. The test
> case below works fine without this patch applied but segfaults once it
> is applied.
>
>         #!/bin/sh
>
>         git worktree add testdir
>         git -C testdir checkout master
>         git -C testdir fetch https://github.com/git/git.git todo
>         bin-wrappers/git -C testdir checkout FETCH_HEAD # segfault here
>
> Note that the worktree part isn't necessary to reproduce the problem but
> I didn't want my files to be constantly refreshed, triggering a rebuild
> each time.
>
> I also managed to get this backtrace from running lldb at the segfault
> but it is based on the latest "jch" commit, 1cc52d20df (Merge branch
> 'jt/merge-recursive-symlink-is-not-a-dir-in-way' into jch, 2019-09-20).
>
>         * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
>           * frame #0: 0x00000001000f63a0 git`do_match_pathspec(istate=0x0000000100299940, ps=0x000000010200aa80, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, flags=0) at dir.c:420:2 [opt]
>                 frame #1: 0x00000001000f632c git`match_pathspec(istate=0x0000000100299940, ps=0x0000000000000000, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, is_dir=0) at dir.c:490:13 [opt]
>                 frame #2: 0x00000001000f8315 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=17, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1990:9 [opt]
>                 frame #3: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=14, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
>                 frame #4: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=7, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
>                 frame #5: 0x00000001000f60d1 git`read_directory(dir=0x00007ffeefbfe278, istate=0x0000000100299940, path="Gitweb/", len=7, pathspec=0x0000000000000000) at dir.c:2298:3 [opt]
>                 frame #6: 0x00000001001bded1 git`verify_clean_subdirectory(ce=<unavailable>, o=0x00007ffeefbfe8c0) at unpack-trees.c:1846:6 [opt]
>                 frame #7: 0x00000001001bdc1d git`check_ok_to_remove(name="Gitweb", len=6, dtype=4, ce=0x0000000103e70de0, st=0x00007ffeefbfe438, error_type=ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o=0x00007ffeefbfe8c0) at unpack-trees.c:1901:7 [opt]
>                 frame #8: 0x00000001001bdb01 git`verify_absent_1(ce=<unavailable>, error_type=<unavailable>, o=<unavailable>) at unpack-trees.c:1964:10 [opt]
>                 frame #9: 0x00000001001bafc0 git`verify_absent(ce=<unavailable>, error_type=<unavailable>, o=<unavailable>) at unpack-trees.c:1052:11 [opt] [artificial]
>                 frame #10: 0x00000001001bbc3c git`merged_entry(ce=0x0000000100605fb0, old=0x0000000000000000, o=0x00007ffeefbfe8c0) at unpack-trees.c:2013:7 [opt]
>                 frame #11: 0x00000001001bd2b7 git`call_unpack_fn(src=<unavailable>, o=<unavailable>) at unpack-trees.c:522:12 [opt]
>                 frame #12: 0x00000001001bca16 git`unpack_nondirectories(n=2, mask=2, dirmask=<unavailable>, src=0x00007ffeefbfe5d0, names=<unavailable>, info=0x00007ffeefbfe718) at unpack-trees.c:1029:12 [opt]
>                 frame #13: 0x00000001001bad1a git`unpack_callback(n=2, mask=2, dirmask=0, names=0x0000000102007390, info=0x00007ffeefbfe718) at unpack-trees.c:1229:6 [opt]
>                 frame #14: 0x00000001001b8be2 git`traverse_trees(istate=0x0000000100299940, n=2, t=<unavailable>, info=<unavailable>) at tree-walk.c:497:17 [opt]
>                 frame #15: 0x00000001001ba80f git`unpack_trees(len=2, t=0x00007ffeefbfebe0, o=0x00007ffeefbfe8c0) at unpack-trees.c:1546:9 [opt]
>                 frame #16: 0x000000010001a443 git`merge_working_tree(opts=0x00007ffeefbfee38, old_branch_info=0x00007ffeefbfeca0, new_branch_info=0x00007ffeefbfeda0, writeout_error=0x00007ffeefbfeccc) at checkout.c:704:9 [opt]
>                 frame #17: 0x000000010001a08c git`switch_branches(opts=0x00007ffeefbfee38, new_branch_info=0x00007ffeefbfeda0) at checkout.c:1057:9 [opt]
>                 frame #18: 0x0000000100018df0 git`checkout_branch(opts=<unavailable>, new_branch_info=<unavailable>) at checkout.c:1426:9 [opt]
>                 frame #19: 0x0000000100017b90 git`checkout_main(argc=0, argv=0x00007ffeefbff570, prefix=0x0000000000000000, opts=0x00007ffeefbfee38, options=<unavailable>, usagestr=<unavailable>) at checkout.c:1682:10 [opt]
>                 frame #20: 0x0000000100016f2d git`cmd_checkout(argc=2, argv=0x00007ffeefbff568, prefix=0x0000000000000000) at checkout.c:1731:8 [opt]
>                 frame #21: 0x00000001000026f6 git`run_builtin(p=0x000000010024c710, argc=2, argv=0x00007ffeefbff568) at git.c:444:11 [opt]
>                 frame #22: 0x0000000100001a36 git`handle_builtin(argc=2, argv=0x00007ffeefbff568) at git.c:673:3 [opt]
>                 frame #23: 0x000000010000235c git`run_argv(argcp=0x00007ffeefbff4ec, argv=0x00007ffeefbff4d8) at git.c:740:4 [opt]
>                 frame #24: 0x0000000100001794 git`cmd_main(argc=2, argv=0x00007ffeefbff568) at git.c:871:19 [opt]
>                 frame #25: 0x00000001000a4405 git`main(argc=<unavailable>, argv=0x00007ffeefbff560) at common-main.c:52:11 [opt]
>                 frame #26: 0x00007fff783053d5 libdyld.dylib`start + 1
>
> Sorry for the information dump, I haven't had the time to properly look
> into the issue but I just wanted to make sure that you're aware.

Thanks for testing and sending the heads up.  Unfortunately, I cannot
reproduce on either Linux or Mac.  Do you have some special ignore
files or sparse-checkout paths that are important to triggering?
What's in your config.mak?  What compiler and version?

Here's what I did, just to verify:

cd ~/floss/git
git checkout 404ebceda0
NO_GETTEXT=1 make DEVELOPER=1 -j8   # I leave off the NO_GETTEXT=1 on linux
git worktree add testdir
git -C testdir checkout master
git -C testdir fetch https://github.com/git/git.git todo
bin-wrappers/git -C testdir checkout FETCH_HEAD

Did I get any of those steps wrong?


Thanks,
Elijah

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [BUG] git is segfaulting, was [PATCH v4 04/12] dir: also check directories for matching pathspecs
  2019-09-25 21:28           ` Elijah Newren
@ 2019-09-25 21:55             ` Denton Liu
  2019-09-26 20:35               ` Denton Liu
  0 siblings, 1 reply; 73+ messages in thread
From: Denton Liu @ 2019-09-25 21:55 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git Mailing List, Junio C Hamano, Jeff King,
	Rafael Ascensão, SZEDER Gábor, Samuel Lijin

On Wed, Sep 25, 2019 at 02:28:15PM -0700, Elijah Newren wrote:

[...]

> > Sorry for the information dump, I haven't had the time to properly look
> > into the issue but I just wanted to make sure that you're aware.
> 
> Thanks for testing and sending the heads up.  Unfortunately, I cannot
> reproduce on either Linux or Mac.  Do you have some special ignore
> files or sparse-checkout paths that are important to triggering?
> What's in your config.mak?  

Before, I had an empty config.mak and I also had the following
.git/info/exclude (these are two worktrees I have checked out):

	/jch
	/patches

aside from that, I don't think I've changed anything else. Anyway, to
double-check that it wasn't my setup that was broken, I ran

	cd ..
	git clone git git2
	cd git2
	make configure
	./configure

and then followed the rest of the steps and I could still reproduce it.

> What compiler and version?

	$ cc --version
	Apple LLVM version 10.0.1 (clang-1001.0.46.4)
	Target: x86_64-apple-darwin18.7.0
	Thread model: posix
	InstalledDir: /Library/Developer/CommandLineTools/usr/bin

> 
> Here's what I did, just to verify:
> 
> cd ~/floss/git
> git checkout 404ebceda0
> NO_GETTEXT=1 make DEVELOPER=1 -j8   # I leave off the NO_GETTEXT=1 on linux

I don't have NO_GETTEXT on Mac but I don't think it affects anything.

> git worktree add testdir
> git -C testdir checkout master
> git -C testdir fetch https://github.com/git/git.git todo
> bin-wrappers/git -C testdir checkout FETCH_HEAD
> 
> Did I get any of those steps wrong?

Looks correct to me. I don't see why this wouldn't reproduce. I'll send
you more information if I figure anything else out.

Thanks,

Denton

> 
> 
> Thanks,
> Elijah

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [BUG] git is segfaulting, was [PATCH v4 04/12] dir: also check directories for matching pathspecs
  2019-09-25 21:55             ` Denton Liu
@ 2019-09-26 20:35               ` Denton Liu
  2019-09-27  0:12                 ` Elijah Newren
  0 siblings, 1 reply; 73+ messages in thread
From: Denton Liu @ 2019-09-26 20:35 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git Mailing List, Junio C Hamano, Jeff King,
	Rafael Ascensão, SZEDER Gábor, Samuel Lijin

On Wed, Sep 25, 2019 at 02:55:30PM -0700, Denton Liu wrote:
> Looks correct to me. I don't see why this wouldn't reproduce. I'll send
> you more information if I figure anything else out.

I looked into it a little more and I think I know why it's being
triggered.

When we checkout 'todo' from 'master', since they're completely
different trees, all of git's source files need to be removed. As a
result, the checkout process at some point invokes check_ok_to_remove().

This kicks off the following call chain:

	check_ok_to_remove()
	verify_clean_subdirectory()
	read_directory()
	read_directory_recursive() (this is called recursively, of course)
	match_pathspec()
	do_match_pathspec()

Where we segfault in do_match_pathspec() because ps is NULL:

	GUARD_PATHSPEC(ps,
			   PATHSPEC_FROMTOP |
			   PATHSPEC_MAXDEPTH |
			   PATHSPEC_LITERAL |
			   PATHSPEC_GLOB |
			   PATHSPEC_ICASE |
			   PATHSPEC_EXCLUDE |
			   PATHSPEC_ATTR);

So why is ps == NULL? In verify_clean_subdirectory(), we call
read_directory() like this:

	i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);

where we explicitly pass in a NULL and it is handed down the callstack. I
guess this means that we should be expecting that pathspecs can be NULL
in this path. So I've applied the patch at the bottom and it fixes the
problem.

I was wondering if we should stick a

	if (!ps)
		BUG("ps is NULL");

into do_match_pathspec(), though, so we can avoid these situations in
the future.

Also, I'm still not sure why the issue wasn't reproducible on your
side... I'm not too familiar with this area of the code, though.
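
To make the shape of the guard concrete, here is a small standalone
sketch.  It does not use git's real types: toy_pathspec, toy_match()
and toy_keep_path() are hypothetical names made up for illustration,
and the prefix matching is a deliberate simplification.  The only point
is that the matcher dereferences its pathspec unconditionally, so the
caller has to treat NULL as "no restriction" before calling it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a pathspec. */
struct toy_pathspec {
	int nr;                /* number of entries in 'prefixes' */
	const char **prefixes; /* a path matches if it starts with one of these */
};

/*
 * Toy matcher.  Like the crashing function in the backtrace, it
 * dereferences 'ps' without checking it, so passing NULL here would
 * be a NULL pointer dereference.
 */
static int toy_match(const struct toy_pathspec *ps, const char *path)
{
	int i;

	for (i = 0; i < ps->nr; i++)
		if (!strncmp(path, ps->prefixes[i], strlen(ps->prefixes[i])))
			return 1;
	return 0;
}

/*
 * Caller-side guard, mirroring "if (pathspec && !match_pathspec(...))":
 * returns 1 if the path should be kept, 0 if it should be dropped.
 * A NULL pathspec is taken to mean "nothing to filter on".
 */
static int toy_keep_path(const struct toy_pathspec *ps, const char *path)
{
	if (ps && !toy_match(ps, path))
		return 0;
	return 1;
}
```

With this shape, toy_keep_path(NULL, "Gitweb/static/js/lib/") keeps the
path without ever reaching the matcher.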

-- >8 --
diff --git a/dir.c b/dir.c
index 76a3c3894b..b7a6de58c6 100644
--- a/dir.c
+++ b/dir.c
@@ -1952,7 +1952,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
                        if (subdir_state > dir_state)
                                dir_state = subdir_state;
 
-                       if (!match_pathspec(istate, pathspec, path.buf, path.len,
+                       if (pathspec && !match_pathspec(istate, pathspec, path.buf, path.len,
                                            0 /* prefix */, NULL,
                                            0 /* do NOT special case dirs */))
                                state = path_none;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [BUG] git is segfaulting, was [PATCH v4 04/12] dir: also check directories for matching pathspecs
  2019-09-26 20:35               ` Denton Liu
@ 2019-09-27  0:12                 ` Elijah Newren
  0 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-27  0:12 UTC (permalink / raw)
  To: Denton Liu
  Cc: Git Mailing List, Junio C Hamano, Jeff King,
	Rafael Ascensão, SZEDER Gábor, Samuel Lijin

Hi Denton,

On Thu, Sep 26, 2019 at 1:35 PM Denton Liu <liu.denton@gmail.com> wrote:
>
> On Wed, Sep 25, 2019 at 02:55:30PM -0700, Denton Liu wrote:
> > Looks correct to me. I don't see why this wouldn't reproduce. I'll send
> > you more information if I figure anything else out.
>
> I looked into it a little more and I think I know why it's being
> triggered.
>
> When we checkout 'todo' from 'master', since they're completely
> different trees, all of git's source files need to be removed. As a
> result, the checkout process at some point invokes check_ok_to_remove().
>
> This kicks off the following call chain:
>
>         check_ok_to_remove()
>         verify_clean_subdirectory()
>         read_directory()
>         read_directory_recursive() (this is called recursively, of course)
>         match_pathspec()
>         do_match_pathspec()
>
> Where we segfault in do_match_pathspec() because ps is NULL:
>
>         GUARD_PATHSPEC(ps,
>                            PATHSPEC_FROMTOP |
>                            PATHSPEC_MAXDEPTH |
>                            PATHSPEC_LITERAL |
>                            PATHSPEC_GLOB |
>                            PATHSPEC_ICASE |
>                            PATHSPEC_EXCLUDE |
>                            PATHSPEC_ATTR);
>
> So why is ps == NULL? In verify_clean_subdirectory(), we call
> read_directory() like this:
>
>         i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);
>
> where we explicitly pass in a NULL and it is handed down the callstack. I
> guess this means that we should be expecting that pathspecs can be NULL
> in this path. So I've applied the patch at the bottom and it fixes the
> problem.
>
> I was wondering if we should stick a
>
>         if (!ps)
>                 BUG("ps is NULL");
>
> into do_match_pathspec(), though, so we can avoid these situations in
> the future.
>
> Also, I'm still not sure why the issue wasn't reproducible on your
> side... I'm not too familiar with this area of the code, though.
>
> -- >8 --
> diff --git a/dir.c b/dir.c
> index 76a3c3894b..b7a6de58c6 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1952,7 +1952,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>                         if (subdir_state > dir_state)
>                                 dir_state = subdir_state;
>
> -                       if (!match_pathspec(istate, pathspec, path.buf, path.len,
> +                       if (pathspec && !match_pathspec(istate, pathspec, path.buf, path.len,
>                                             0 /* prefix */, NULL,
>                                             0 /* do NOT special case dirs */))
>                                 state = path_none;

The patch makes sense...but I'd really like to add a test, and
understand it better so I can check to see if there are any other bad
codepaths.  Sadly, I still have no idea how to reproduce the bug.  I
can put

    char *oopsies = NULL;
    printf("oopsies = %s\n", oopsies);

at the beginning of check_ok_to_remove() to verify that function is
never called and run the steps you gave with no problem.  However, I
do notice that your reproduction steps involve 'master' which may have
local changes for you that I don't have.  Is there any chance you can
reproduce this using a commit id that is already upstream instead of
'master'?  I've been poking around unpack-trees.c for a bit but I'm
having a hard time reversing out of it what's different about our
setups and how to trigger.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [BUG] git is segfaulting, was [PATCH v4 04/12] dir: also check directories for matching pathspecs
  2019-09-25 20:39         ` [BUG] git is segfaulting, was " Denton Liu
  2019-09-25 21:28           ` Elijah Newren
@ 2019-09-27  1:09           ` SZEDER Gábor
  2019-09-27  2:17             ` SZEDER Gábor
  1 sibling, 1 reply; 73+ messages in thread
From: SZEDER Gábor @ 2019-09-27  1:09 UTC (permalink / raw)
  To: Denton Liu
  Cc: Elijah Newren, git, Junio C Hamano, Jeff King,
	Rafael Ascensão, Samuel Lijin

On Wed, Sep 25, 2019 at 01:39:19PM -0700, Denton Liu wrote:
> Hi Elijah,
> 
> I ran into a segfault on MacOS. I managed to bisect it down to
> 404ebceda0 (dir: also check directories for matching pathspecs,
> 2019-09-17), which should be the patch in the parent thread. The test
> case below works fine without this patch applied but segfaults once it
> is applied.
> 
> 	#!/bin/sh
> 
> 	git worktree add testdir
> 	git -C testdir checkout master
> 	git -C testdir fetch https://github.com/git/git.git todo
> 	bin-wrappers/git -C testdir checkout FETCH_HEAD # segfault here
> 
> Note that the worktree part isn't necessary to reproduce the problem but
> I didn't want my files to be constantly refreshed, triggering a rebuild
> each time.
> 
> I also managed to get this backtrace from running lldb at the segfault
> but it is based on the latest "jch" commit, 1cc52d20df (Merge branch
> 'jt/merge-recursive-symlink-is-not-a-dir-in-way' into jch, 2019-09-20).
> 
> 	* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
> 	  * frame #0: 0x00000001000f63a0 git`do_match_pathspec(istate=0x0000000100299940, ps=0x000000010200aa80, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, flags=0) at dir.c:420:2 [opt]
> 		frame #1: 0x00000001000f632c git`match_pathspec(istate=0x0000000100299940, ps=0x0000000000000000, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, is_dir=0) at dir.c:490:13 [opt]
> 		frame #2: 0x00000001000f8315 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=17, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1990:9 [opt]
> 		frame #3: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=14, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
> 		frame #4: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=7, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
> 		frame #5: 0x00000001000f60d1 git`read_directory(dir=0x00007ffeefbfe278, istate=0x0000000100299940, path="Gitweb/", len=7, pathspec=0x0000000000000000) at dir.c:2298:3 [opt]
> 		frame #6: 0x00000001001bded1 git`verify_clean_subdirectory(ce=<unavailable>, o=0x00007ffeefbfe8c0) at unpack-trees.c:1846:6 [opt]
> 		frame #7: 0x00000001001bdc1d git`check_ok_to_remove(name="Gitweb", len=6, dtype=4, ce=0x0000000103e70de0, st=0x00007ffeefbfe438, error_type=ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o=0x00007ffeefbfe8c0) at unpack-trees.c:1901:7 [opt]

That 'name="Gitweb"' parameter caught my eye.  origin/todo contains a
'Gitweb' file, with upper case 'G', while master contains a 'gitweb'
directory, with lower case 'g'.  

Could it be that case (in)sensitivity plays a crucial role in
triggering the segfault?  FWIW I could reproduce it following Denton's
description on Travis CI's macOS VM with the debug shell access, and
it uses a case insensitive file system.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [BUG] git is segfaulting, was [PATCH v4 04/12] dir: also check directories for matching pathspecs
  2019-09-27  1:09           ` SZEDER Gábor
@ 2019-09-27  2:17             ` SZEDER Gábor
  2019-09-27 17:10               ` Denton Liu
  0 siblings, 1 reply; 73+ messages in thread
From: SZEDER Gábor @ 2019-09-27  2:17 UTC (permalink / raw)
  To: Denton Liu
  Cc: Elijah Newren, git, Junio C Hamano, Jeff King,
	Rafael Ascensão, Samuel Lijin

On Fri, Sep 27, 2019 at 03:09:30AM +0200, SZEDER Gábor wrote:
> On Wed, Sep 25, 2019 at 01:39:19PM -0700, Denton Liu wrote:
> > Hi Elijah,
> > 
> > I ran into a segfault on MacOS. I managed to bisect it down to
> > 404ebceda0 (dir: also check directories for matching pathspecs,
> > 2019-09-17), which should be the patch in the parent thread. The test
> > case below works fine without this patch applied but segfaults once it
> > is applied.
> > 
> > 	#!/bin/sh
> > 
> > 	git worktree add testdir
> > 	git -C testdir checkout master
> > 	git -C testdir fetch https://github.com/git/git.git todo
> > 	bin-wrappers/git -C testdir checkout FETCH_HEAD # segfault here
> > 
> > Note that the worktree part isn't necessary to reproduce the problem but
> > I didn't want my files to be constantly refreshed, triggering a rebuild
> > each time.
> > 
> > I also managed to get this backtrace from running lldb at the segfault
> > but it is based on the latest "jch" commit, 1cc52d20df (Merge branch
> > 'jt/merge-recursive-symlink-is-not-a-dir-in-way' into jch, 2019-09-20).
> > 
> > 	* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
> > 	  * frame #0: 0x00000001000f63a0 git`do_match_pathspec(istate=0x0000000100299940, ps=0x000000010200aa80, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, flags=0) at dir.c:420:2 [opt]
> > 		frame #1: 0x00000001000f632c git`match_pathspec(istate=0x0000000100299940, ps=0x0000000000000000, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, is_dir=0) at dir.c:490:13 [opt]
> > 		frame #2: 0x00000001000f8315 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=17, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1990:9 [opt]
> > 		frame #3: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=14, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
> > 		frame #4: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=7, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
> > 		frame #5: 0x00000001000f60d1 git`read_directory(dir=0x00007ffeefbfe278, istate=0x0000000100299940, path="Gitweb/", len=7, pathspec=0x0000000000000000) at dir.c:2298:3 [opt]
> > 		frame #6: 0x00000001001bded1 git`verify_clean_subdirectory(ce=<unavailable>, o=0x00007ffeefbfe8c0) at unpack-trees.c:1846:6 [opt]
> > 		frame #7: 0x00000001001bdc1d git`check_ok_to_remove(name="Gitweb", len=6, dtype=4, ce=0x0000000103e70de0, st=0x00007ffeefbfe438, error_type=ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o=0x00007ffeefbfe8c0) at unpack-trees.c:1901:7 [opt]
> 
> That 'name="Gitweb"' parameter caught my eye.  origin/todo contains a
> 'Gitweb' file, with upper case 'G', while master contains a 'gitweb'
> directory, with lower case 'g'.  
> 
> Could it be that case (in)sensitivity plays a crucial role in
> triggering the segfault?  FWIW I could reproduce it following Denton's
> description on Travis CI's macOS VM with the debug shell access, and
> it uses a case insensitive file system.

Indeed, with 404ebceda0 the test below segfaults on a case insensitive
fs, but not on a case sensitive one.


diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
index 192c94eccd..5b405c97d7 100755
--- a/t/t0050-filesystem.sh
+++ b/t/t0050-filesystem.sh
@@ -131,4 +131,27 @@ $test_unicode 'merge (silent unicode normalization)' '
 	git merge topic
 '
 
+test_expect_success CASE_INSENSITIVE_FS "Denton's segfault" '
+	git init repo &&
+	(
+		cd repo &&
+
+		echo foo >Gitweb &&
+		git add Gitweb &&
+		git commit -m "add Gitweb" &&
+
+		git checkout --orphan todo &&
+		git reset --hard &&
+		# the subdir is crucial, without it there is no segfault
+		mkdir -p gitweb/subdir &&
+		echo bar >gitweb/subdir/file &&
+		# it is not strictly necessary to add and commit the
+		# gitweb directory, its presence is sufficient
+		git add gitweb &&
+		git commit -m "add gitweb/subdir/file" &&
+
+		git checkout master
+	)
+'
+
 test_done



The end of its trace:

++git checkout master
./test-lib.sh: line 910: 11220 Segmentation fault: 11  git checkout master
error: last command exited with $?=139

Case insensitivity is important because check_ok_to_remove() is
invoked from verify_absent_1(), which looks like this:

  if (...)
     ....
  else if (...)
     ....
  else if (lstat(ce->name, &st))
      // That lstat() checked whether 'Gitweb' is absent.  On a case
      // sensitive fs it's absent, so it returns.  On a case
      // insensitive fs it finds 'master's 'gitweb' directory, so it
      // goes on to the else below, and eventually segfaults.
      return;
  else
      check_ok_to_remove()
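
The lstat() behaviour that this hinges on can be checked in isolation
with a small probe along these lines.  This is only a sketch, not git
code: other_case_exists() is a hypothetical helper that creates a
directory under one spelling in the current directory and then lstat()s
it under another spelling.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create the directory 'lower' in the current directory, then lstat()
 * it under the name 'upper', which is expected to differ only in case.
 *
 * Returns  1 if lstat() finds it (the fs treats the names as equal),
 *          0 if lstat() reports it absent,
 *         -1 if the probe directory could not be created.
 */
static int other_case_exists(const char *lower, const char *upper)
{
	struct stat st;
	int found;

	if (mkdir(lower, 0777))
		return -1;
	found = !lstat(upper, &st);
	rmdir(lower); /* clean up the probe directory */
	return found;
}
```

Calling other_case_exists("gitweb", "Gitweb") in an empty scratch
directory should report the directory as present on a case insensitive
fs and as absent on a case sensitive one, which corresponds to the
lstat() branch above either falling through to check_ok_to_remove() or
returning early.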


Good night :)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [BUG] git is segfaulting, was [PATCH v4 04/12] dir: also check directories for matching pathspecs
  2019-09-27  2:17             ` SZEDER Gábor
@ 2019-09-27 17:10               ` Denton Liu
  2019-09-30 19:11                 ` [PATCH] dir: special case check for the possibility that pathspec is NULL Elijah Newren
  0 siblings, 1 reply; 73+ messages in thread
From: Denton Liu @ 2019-09-27 17:10 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Elijah Newren, git, Junio C Hamano, Jeff King,
	Rafael Ascensão, Samuel Lijin

On Fri, Sep 27, 2019 at 04:17:46AM +0200, SZEDER Gábor wrote:
> On Fri, Sep 27, 2019 at 03:09:30AM +0200, SZEDER Gábor wrote:
> > On Wed, Sep 25, 2019 at 01:39:19PM -0700, Denton Liu wrote:
> > > Hi Elijah,
> > > 
> > > I ran into a segfault on MacOS. I managed to bisect it down to
> > > 404ebceda0 (dir: also check directories for matching pathspecs,
> > > 2019-09-17), which should be the patch in the parent thread. The test
> > > case below works fine without this patch applied but segfaults once it
> > > is applied.
> > > 
> > > 	#!/bin/sh
> > > 
> > > 	git worktree add testdir
> > > 	git -C testdir checkout master
> > > 	git -C testdir fetch https://github.com/git/git.git todo
> > > 	bin-wrappers/git -C testdir checkout FETCH_HEAD # segfault here
> > > 
> > > Note that the worktree part isn't necessary to reproduce the problem but
> > > I didn't want my files to be constantly refreshed, triggering a rebuild
> > > each time.
> > > 
> > > I also managed to get this backtrace from running lldb at the segfault
> > > but it is based on the latest "jch" commit, 1cc52d20df (Merge branch
> > > 'jt/merge-recursive-symlink-is-not-a-dir-in-way' into jch, 2019-09-20).
> > > 
> > > 	* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
> > > 	  * frame #0: 0x00000001000f63a0 git`do_match_pathspec(istate=0x0000000100299940, ps=0x000000010200aa80, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, flags=0) at dir.c:420:2 [opt]
> > > 		frame #1: 0x00000001000f632c git`match_pathspec(istate=0x0000000100299940, ps=0x0000000000000000, name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x0000000000000000, is_dir=0) at dir.c:490:13 [opt]
> > > 		frame #2: 0x00000001000f8315 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=17, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1990:9 [opt]
> > > 		frame #3: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=14, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
> > > 		frame #4: 0x00000001000f82e9 git`read_directory_recursive(dir=0x00007ffeefbfe278, istate=0x0000000100299940, base=<unavailable>, baselen=7, untracked=<unavailable>, check_only=0, stop_at_first_file=0, pathspec=0x0000000000000000) at dir.c:1984:5 [opt]
> > > 		frame #5: 0x00000001000f60d1 git`read_directory(dir=0x00007ffeefbfe278, istate=0x0000000100299940, path="Gitweb/", len=7, pathspec=0x0000000000000000) at dir.c:2298:3 [opt]
> > > 		frame #6: 0x00000001001bded1 git`verify_clean_subdirectory(ce=<unavailable>, o=0x00007ffeefbfe8c0) at unpack-trees.c:1846:6 [opt]
> > > 		frame #7: 0x00000001001bdc1d git`check_ok_to_remove(name="Gitweb", len=6, dtype=4, ce=0x0000000103e70de0, st=0x00007ffeefbfe438, error_type=ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, o=0x00007ffeefbfe8c0) at unpack-trees.c:1901:7 [opt]
> > 
> > That 'name="Gitweb"' parameter caught my eye.  origin/todo contains a
> > 'Gitweb' file, with upper case 'G', while master contains a 'gitweb'
> > directory, with lower case 'g'.
> > 
> > Could it be that case (in)sensitivity plays a crucial role in
> > triggering the segfault?  FWIW I could reproduce it following Denton's
> > description on Travis CI's macOS VM with the debug shell access, and
> > it uses a case insensitive file system.
> 
> Indeed, with 404ebceda0 the test below segfaults on case insensitive
> fs, but not on a case sensitive one.

Wow, good catch. I didn't even notice that in the backtrace.

> 
> 
> diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> index 192c94eccd..5b405c97d7 100755
> --- a/t/t0050-filesystem.sh
> +++ b/t/t0050-filesystem.sh
> @@ -131,4 +131,27 @@ $test_unicode 'merge (silent unicode normalization)' '
>  	git merge topic
>  '
>  
> +test_expect_success CASE_INSENSITIVE_FS "Denton's segfault" '
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		echo foo >Gitweb &&
> +		git add Gitweb &&
> +		git commit -m "add Gitweb" &&
> +
> +		git checkout --orphan todo &&
> +		git reset --hard &&
> +		# the subdir is crucial, without it there is no segfault
> +		mkdir -p gitweb/subdir &&
> +		echo bar >gitweb/subdir/file &&
> +		# it is not strictly necessary to add and commit the
> +		# gitweb directory, its presence is sufficient
> +		git add gitweb &&
> +		git commit -m "add gitweb/subdir/file" &&
> +
> +		git checkout master
> +	)
> +'
> +
>  test_done

I can confirm that this test case reproduces for me. Thanks for writing
this.

> 
> 
> 
> The end of its trace:
> 
> ++git checkout master
> ./test-lib.sh: line 910: 11220 Segmentation fault: 11  git checkout master
> error: last command exited with $?=139
> 
> Case insensitivity is important because check_ok_to_remove() is
> invoked from verify_absent_1(), which looks like this:
> 
>   if (...)
>      ....
>   else if (...)
>      ....
>   else if (lstat(ce->name, &st))
>       // That lstat() checked whether 'Gitweb' is absent.  On a case
>       // sensitive fs it's absent, so it returns.  On a case
>       // insensitive fs it finds 'master's 'gitweb' directory, so it
>       // goes on to the else below, and eventually segfaults.
>       return;
>   else
>       check_ok_to_remove()
> 
> 
> Good night :)

Thanks for your help!


* [PATCH] dir: special case check for the possibility that pathspec is NULL
  2019-09-27 17:10               ` Denton Liu
@ 2019-09-30 19:11                 ` Elijah Newren
  2019-09-30 22:31                   ` Denton Liu
  2019-10-01 18:30                   ` [PATCH v2] " Elijah Newren
  0 siblings, 2 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-30 19:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Denton Liu, SZEDER Gábor, Elijah Newren

Commits 404ebceda01c ("dir: also check directories for matching
pathspecs", 2019-09-17) and 89a1f4aaf765 ("dir: if our pathspec might
match files under a dir, recurse into it", 2019-09-17) added calls to
match_pathspec() and do_match_pathspec() passing along their pathspec
parameter.  Both match_pathspec() and do_match_pathspec() assume the
pathspec argument they are given is non-NULL.  It turns out that
unpack-trees.c's verify_clean_subdirectory() calls read_directory() with
pathspec == NULL, and it is possible on case insensitive filesystems for
that NULL to make it to these new calls to match_pathspec() and
do_match_pathspec().  Add appropriate checks on the NULLness of pathspec
to avoid a segfault.

In case the negation throws anyone off (one of the calls was to
do_match_pathspec() while the other was to !match_pathspec(), yet no
negation of the NULLness of pathspec is used), there are two ways to
understand the differences:
  * The code already handled the pathspec == NULL cases before this
    series, and this series only tried to change behavior when there was
    a pathspec, thus we only want to go into the if-block if pathspec is
    non-NULL.
  * One of the calls is for whether to recurse into a subdirectory, the
    other is for after we've recursed into it for whether we want to
    remove the subdirectory itself (i.e. the subdirectory didn't match
    but something under it could have).  That difference in situation
    leads to the slight differences in logic used (well, that and the
    slightly unusual fact that we don't want empty pathspecs to remove
    untracked directories by default).

Helped-by: Denton Liu <liu.denton@gmail.com>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
This patch applies on top of en/clean-nested-with-ignored, which is now
in next.

Denton found and analyzed one issue and provided the patch for the
match_pathspec() call, SZEDER figured out why the issue only reproduced
for some folks and not others and provided the testcase, and I looked
through the remainder of the series and noted the do_match_pathspec()
call that should have the same check.

So, I'm not sure who should be author and who should be helped-by; I
feel like their contributions are possibly bigger than mine.  While I
tried to reproduce and debug, they ended up doing the work, and I just
looked through the rest of the series for similar issues and wrote up
a commit message.  *shrug*
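
For readers skimming the diff, the two guards described in the commit
message can be sketched in isolation.  This is a minimal standalone
sketch, not git code: struct pathspec_sketch, match_sketch(),
should_recurse() and should_drop_dir() are hypothetical names, and the
prefix comparison only stands in for the real matching rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for git's struct pathspec; not the real layout. */
struct pathspec_sketch {
	int nr;
	const char **items;
};

/*
 * Stand-in matcher.  Like the real match functions, it assumes that
 * 'ps' is non-NULL; the assert makes that assumption explicit.
 */
static int match_sketch(const struct pathspec_sketch *ps, const char *name)
{
	int i;

	assert(ps != NULL);
	for (i = 0; i < ps->nr; i++)
		if (!strncmp(ps->items[i], name, strlen(ps->items[i])))
			return 1;
	return 0;
}

/* First call site: should we recurse into an untracked directory? */
static int should_recurse(int show_ignored_too,
			  const struct pathspec_sketch *ps, const char *name)
{
	/* '&&' short-circuits, so the matcher never sees a NULL ps. */
	return show_ignored_too || (ps && match_sketch(ps, name));
}

/* Second call site: after recursing, should the directory be dropped? */
static int should_drop_dir(const struct pathspec_sketch *ps, const char *name)
{
	/* No negation of 'ps' itself: a NULL ps simply skips the check. */
	return ps && !match_sketch(ps, name);
}
```

With ps == NULL, should_recurse() depends on the show_ignored_too flag
alone and should_drop_dir() returns 0, which matches the commit
message's point that the series was only meant to change behavior when
a pathspec is present.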

 dir.c                 |  8 +++++---
 t/t0050-filesystem.sh | 23 +++++++++++++++++++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 7ff79170fc..bd39b86be4 100644
--- a/dir.c
+++ b/dir.c
@@ -1962,8 +1962,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			((state == path_untracked) &&
 			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
 			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
-			  do_match_pathspec(istate, pathspec, path.buf, path.len,
-					    baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
+			  (pathspec &&
+			   do_match_pathspec(istate, pathspec, path.buf, path.len,
+					     baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
 			struct untracked_cache_dir *ud;
 			ud = lookup_untracked(dir->untracked, untracked,
 					      path.buf + baselen,
@@ -1975,7 +1976,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			if (subdir_state > dir_state)
 				dir_state = subdir_state;
 
-			if (!match_pathspec(istate, pathspec, path.buf, path.len,
+			if (pathspec &&
+			    !match_pathspec(istate, pathspec, path.buf, path.len,
 					    0 /* prefix */, NULL,
 					    0 /* do NOT special case dirs */))
 				state = path_none;
diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
index 192c94eccd..edb30f9eb2 100755
--- a/t/t0050-filesystem.sh
+++ b/t/t0050-filesystem.sh
@@ -131,4 +131,27 @@ $test_unicode 'merge (silent unicode normalization)' '
 	git merge topic
 '
 
+test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
+	git init repo &&
+	(
+		cd repo &&
+
+		>Gitweb &&
+		git add Gitweb &&
+		git commit -m "add Gitweb" &&
+
+		git checkout --orphan todo &&
+		git reset --hard &&
+		# the subdir is crucial, without it there is no segfault
+		mkdir -p gitweb/subdir &&
+		>gitweb/subdir/file &&
+		# it is not strictly necessary to add and commit the
+		# gitweb directory, its presence is sufficient
+		git add gitweb &&
+		git commit -m "add gitweb/subdir/file" &&
+
+		git checkout master
+	)
+'
+
 test_done
-- 
2.22.1.14.g885c22d24b



* Re: [PATCH] dir: special case check for the possibility that pathspec is NULL
  2019-09-30 19:11                 ` [PATCH] dir: special case check for the possibility that pathspec is NULL Elijah Newren
@ 2019-09-30 22:31                   ` Denton Liu
  2019-10-01  7:01                     ` Elijah Newren
  2019-10-01 18:30                   ` [PATCH v2] " Elijah Newren
  1 sibling, 1 reply; 73+ messages in thread
From: Denton Liu @ 2019-09-30 22:31 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Junio C Hamano, git, SZEDER Gábor

Hi Elijah,

On Mon, Sep 30, 2019 at 12:11:06PM -0700, Elijah Newren wrote:
> Commits 404ebceda01c ("dir: also check directories for matching
> pathspecs", 2019-09-17) and 89a1f4aaf765 ("dir: if our pathspec might
> match files under a dir, recurse into it", 2019-09-17) added calls to
> match_pathspec() and do_match_pathspec() passing along their pathspec
> parameter.  Both match_pathspec() and do_match_pathspec() assume the
> pathspec argument they are given is non-NULL.  It turns out that
> unpack-trees.c's verify_clean_subdirectory() calls read_directory() with
> pathspec == NULL, and it is possible on case insensitive filesystems for
> that NULL to make it to these new calls to match_pathspec() and
> do_match_pathspec().  Add appropriate checks on the NULLness of pathspec
> to avoid a segfault.
> 
> In case the negation throws anyone off (one of the calls was to
> do_match_pathspec() while the other was to !match_pathspec(), yet no
> negation of the NULLness of pathspec is used), there are two ways to
> understand the differences:
>   * The code already handled the pathspec == NULL cases before this
>     series, and this series only tried to change behavior when there was
>     a pathspec, thus we only want to go into the if-block if pathspec is
>     non-NULL.
>   * One of the calls is for whether to recurse into a subdirectory, the
>     other is for after we've recursed into it for whether we want to
>     remove the subdirectory itself (i.e. the subdirectory didn't match
>     but something under it could have).  That difference in situation
>     leads to the slight differences in logic used (well, that and the
>     slightly unusual fact that we don't want empty pathspecs to remove
>     untracked directories by default).
> 
> Helped-by: Denton Liu <liu.denton@gmail.com>
> Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> This patch applies on top of en/clean-nested-with-ignored, which is now
> in next.
> 
> Denton found and analyzed one issue and provided the patch for the
> match_pathspec() call, SZEDER figured out why the issue only reproduced
> for some folks and not others and provided the testcase, and I looked
> through the remainder of the series and noted the do_match_pathspec()
> call that should have the same check.

Thanks for catching what I missed.

> 
> So, I'm not sure who should be author and who should be helped-by; I
> feel like their contributions are possibly bigger than mine.  While I
> tried to reproduce and debug, they ended up doing the work, and I just
> looked through the rest of the series for similar issues and wrote up
> a commit message.  *shrug*

Eh, it doesn't really matter to me. GitHub appears to have de facto
standardised the Co-authored-by: trailer to allow credit to be split
amongst multiple authors so _maybe_ we could use that, but I'm pretty
impartial.

> 
>  dir.c                 |  8 +++++---
>  t/t0050-filesystem.sh | 23 +++++++++++++++++++++++
>  2 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/dir.c b/dir.c
> index 7ff79170fc..bd39b86be4 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1962,8 +1962,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>  			((state == path_untracked) &&
>  			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
>  			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
> -			  do_match_pathspec(istate, pathspec, path.buf, path.len,
> -					    baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
> +			  (pathspec &&
> +			   do_match_pathspec(istate, pathspec, path.buf, path.len,
> +					     baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
>  			struct untracked_cache_dir *ud;
>  			ud = lookup_untracked(dir->untracked, untracked,
>  					      path.buf + baselen,
> @@ -1975,7 +1976,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>  			if (subdir_state > dir_state)
>  				dir_state = subdir_state;
>  
> -			if (!match_pathspec(istate, pathspec, path.buf, path.len,
> +			if (pathspec &&
> +			    !match_pathspec(istate, pathspec, path.buf, path.len,
>  					    0 /* prefix */, NULL,
>  					    0 /* do NOT special case dirs */))
>  				state = path_none;
> diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> index 192c94eccd..edb30f9eb2 100755
> --- a/t/t0050-filesystem.sh
> +++ b/t/t0050-filesystem.sh
> @@ -131,4 +131,27 @@ $test_unicode 'merge (silent unicode normalization)' '
>  	git merge topic
>  '
>  
> +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		>Gitweb &&
> +		git add Gitweb &&
> +		git commit -m "add Gitweb" &&
> +
> +		git checkout --orphan todo &&
> +		git reset --hard &&
> +		# the subdir is crucial, without it there is no segfault

We should either remove this comment or change the justification. A
future reader may be confused at what particular segfault this refers
to.

> +		mkdir -p gitweb/subdir &&
> +		>gitweb/subdir/file &&
> +		# it is not strictly necessary to add and commit the
> +		# gitweb directory, its presence is sufficient

Same here, its presence is sufficient to... what?

Thanks,

Denton

> +		git add gitweb &&
> +		git commit -m "add gitweb/subdir/file" &&
> +
> +		git checkout master
> +	)
> +'
> +
>  test_done
> -- 
> 2.22.1.14.g885c22d24b
> 


* Re: [PATCH] dir: special case check for the possibility that pathspec is NULL
  2019-09-30 22:31                   ` Denton Liu
@ 2019-10-01  7:01                     ` Elijah Newren
  0 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-10-01  7:01 UTC (permalink / raw)
  To: Denton Liu; +Cc: Junio C Hamano, Git Mailing List, SZEDER Gábor

On Mon, Sep 30, 2019 at 3:31 PM Denton Liu <liu.denton@gmail.com> wrote:
>
> Hi Elijah,
>
> On Mon, Sep 30, 2019 at 12:11:06PM -0700, Elijah Newren wrote:
> > Commits 404ebceda01c ("dir: also check directories for matching
> > pathspecs", 2019-09-17) and 89a1f4aaf765 ("dir: if our pathspec might
> > match files under a dir, recurse into it", 2019-09-17) added calls to
> > match_pathspec() and do_match_pathspec() passing along their pathspec
> > parameter.  Both match_pathspec() and do_match_pathspec() assume the
> > pathspec argument they are given is non-NULL.  It turns out that
> > unpack-trees.c's verify_clean_subdirectory() calls read_directory() with
> > pathspec == NULL, and it is possible on case insensitive filesystems for
> > that NULL to make it to these new calls to match_pathspec() and
> > do_match_pathspec().  Add appropriate checks on the NULLness of pathspec
> > to avoid a segfault.
> >
> > In case the negation throws anyone off (one of the calls was to
> > do_match_pathspec() while the other was to !match_pathspec(), yet no
> > negation of the NULLness of pathspec is used), there are two ways to
> > understand the differences:
> >   * The code already handled the pathspec == NULL cases before this
> >     series, and this series only tried to change behavior when there was
> >     a pathspec, thus we only want to go into the if-block if pathspec is
> >     non-NULL.
> >   * One of the calls is for whether to recurse into a subdirectory, the
> >     other is for after we've recursed into it for whether we want to
> >     remove the subdirectory itself (i.e. the subdirectory didn't match
> >     but something under it could have).  That difference in situation
> >     leads to the slight differences in logic used (well, that and the
> >     slightly unusual fact that we don't want empty pathspecs to remove
> >     untracked directories by default).
> >
> > Helped-by: Denton Liu <liu.denton@gmail.com>
> > Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> > This patch applies on top of en/clean-nested-with-ignored, which is now
> > in next.
> >
> > Denton found and analyzed one issue and provided the patch for the
> > match_pathspec() call, SZEDER figured out why the issue only reproduced
> > for some folks and not others and provided the testcase, and I looked
> > through the remainder of the series and noted the do_match_pathspec()
> > call that should have the same check.
>
> Thanks for catching what I missed.
>
> >
> > So, I'm not sure who should be author and who should be helped-by; I
> > feel like their contributions are possibly bigger than mine.  While I
> > tried to reproduce and debug, they ended up doing the work, and I just
> > looked through the rest of the series for similar issues and wrote up
> > a commit message.  *shrug*
>
> Eh, it doesn't really matter to me. GitHub appears to have de facto
> standardised the Co-authored-by: trailer to allow credit to be split
> amongst multiple authors so _maybe_ we could use that, but I'm pretty
> impartial.
>
> >
> >  dir.c                 |  8 +++++---
> >  t/t0050-filesystem.sh | 23 +++++++++++++++++++++++
> >  2 files changed, 28 insertions(+), 3 deletions(-)
> >
> > diff --git a/dir.c b/dir.c
> > index 7ff79170fc..bd39b86be4 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -1962,8 +1962,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
> >                       ((state == path_untracked) &&
> >                        (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
> >                        ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
> > -                       do_match_pathspec(istate, pathspec, path.buf, path.len,
> > -                                         baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
> > +                       (pathspec &&
> > +                        do_match_pathspec(istate, pathspec, path.buf, path.len,
> > +                                          baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
> >                       struct untracked_cache_dir *ud;
> >                       ud = lookup_untracked(dir->untracked, untracked,
> >                                             path.buf + baselen,
> > @@ -1975,7 +1976,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
> >                       if (subdir_state > dir_state)
> >                               dir_state = subdir_state;
> >
> > -                     if (!match_pathspec(istate, pathspec, path.buf, path.len,
> > +                     if (pathspec &&
> > +                         !match_pathspec(istate, pathspec, path.buf, path.len,
> >                                           0 /* prefix */, NULL,
> >                                           0 /* do NOT special case dirs */))
> >                               state = path_none;
> > diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> > index 192c94eccd..edb30f9eb2 100755
> > --- a/t/t0050-filesystem.sh
> > +++ b/t/t0050-filesystem.sh
> > @@ -131,4 +131,27 @@ $test_unicode 'merge (silent unicode normalization)' '
> >       git merge topic
> >  '
> >
> > +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
> > +     git init repo &&
> > +     (
> > +             cd repo &&
> > +
> > +             >Gitweb &&
> > +             git add Gitweb &&
> > +             git commit -m "add Gitweb" &&
> > +
> > +             git checkout --orphan todo &&
> > +             git reset --hard &&
> > +             # the subdir is crucial, without it there is no segfault
>
> We should either remove this comment or change the justification. A
> future reader may be confused at what particular segfault this refers
> to.

Yep, good point, I'll just go ahead and remove it.

> > +             mkdir -p gitweb/subdir &&
> > +             >gitweb/subdir/file &&
> > +             # it is not strictly necessary to add and commit the
> > +             # gitweb directory, its presence is sufficient
>
> Same here, its presence is sufficient to... what?

I will clean this one too and send a v2 tomorrow; it's getting late.

Thanks for all the digging you did on this bug to get it sorted out,
Denton; I really appreciate it.

Elijah


* [PATCH v2] dir: special case check for the possibility that pathspec is NULL
  2019-09-30 19:11                 ` [PATCH] dir: special case check for the possibility that pathspec is NULL Elijah Newren
  2019-09-30 22:31                   ` Denton Liu
@ 2019-10-01 18:30                   ` " Elijah Newren
  2019-10-01 18:40                     ` Denton Liu
  1 sibling, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-10-01 18:30 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Denton Liu, SZEDER Gábor, Elijah Newren

Commits 404ebceda01c ("dir: also check directories for matching
pathspecs", 2019-09-17) and 89a1f4aaf765 ("dir: if our pathspec might
match files under a dir, recurse into it", 2019-09-17) added calls to
match_pathspec() and do_match_pathspec() passing along their pathspec
parameter.  Both match_pathspec() and do_match_pathspec() assume the
pathspec argument they are given is non-NULL.  It turns out that
unpack-trees.c's verify_clean_subdirectory() calls read_directory() with
pathspec == NULL, and it is possible on case insensitive filesystems for
that NULL to make it to these new calls to match_pathspec() and
do_match_pathspec().  Add appropriate checks on the NULLness of pathspec
to avoid a segfault.

In case the negation throws anyone off (one of the calls was to
do_match_pathspec() while the other was to !match_pathspec(), yet no
negation of the NULLness of pathspec is used), there are two ways to
understand the differences:
  * The code already handled the pathspec == NULL cases before this
    series, and this series only tried to change behavior when there was
    a pathspec, thus we only want to go into the if-block if pathspec is
    non-NULL.
  * One of the calls is for whether to recurse into a subdirectory, the
    other is for after we've recursed into it for whether we want to
    remove the subdirectory itself (i.e. the subdirectory didn't match
    but something under it could have).  That difference in situation
    leads to the slight differences in logic used (well, that and the
    slightly unusual fact that we don't want empty pathspecs to remove
    untracked directories by default).

Denton found and analyzed one issue and provided the patch for the
match_pathspec() call, SZEDER figured out why the issue only reproduced
for some folks and not others and provided the testcase, and I looked
through the remainder of the series and noted the do_match_pathspec()
call that should have the same check.

Co-authored-by: Denton Liu <liu.denton@gmail.com>
Co-authored-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
Note: Applies on top of en/clean-nested-with-ignored, in next.

As with v1, the authorship is really mixed, so I don't know if I
should use Co-authored-by (highlighted as a possibility by Denton), or
the far more common Helped-by (as suggested by Junio but based on a
more limited summary of the different contributions), or if perhaps
Denton or SZEDER should be marked as the author and I be marked as
Helped-by or Co-authored-by.  Since Denton commented on round 1, I
used his suggestion for attribution in this round, but I'm open to
changing it to whatever works best.

Changes since v1:
  - Removed comments that made sense in context of the original thread
    but wouldn't be helpful to future readers.
  - s/Helped-by/Co-authored-by/

Range-diff:
1:  885c22d24b ! 1:  c495b9303c dir: special case check for the possibility that pathspec is NULL
    @@ t/t0050-filesystem.sh: $test_unicode 'merge (silent unicode normalization)' '
     +
     +		git checkout --orphan todo &&
     +		git reset --hard &&
    -+		# the subdir is crucial, without it there is no segfault
     +		mkdir -p gitweb/subdir &&
     +		>gitweb/subdir/file &&
     +		# it is not strictly necessary to add and commit the
    -+		# gitweb directory, its presence is sufficient
     +		git add gitweb &&
     +		git commit -m "add gitweb/subdir/file" &&
     +

 dir.c                 |  8 +++++---
 t/t0050-filesystem.sh | 21 +++++++++++++++++++++
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 7ff79170fc..bd39b86be4 100644
--- a/dir.c
+++ b/dir.c
@@ -1962,8 +1962,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			((state == path_untracked) &&
 			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
 			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
-			  do_match_pathspec(istate, pathspec, path.buf, path.len,
-					    baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
+			  (pathspec &&
+			   do_match_pathspec(istate, pathspec, path.buf, path.len,
+					     baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
 			struct untracked_cache_dir *ud;
 			ud = lookup_untracked(dir->untracked, untracked,
 					      path.buf + baselen,
@@ -1975,7 +1976,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			if (subdir_state > dir_state)
 				dir_state = subdir_state;
 
-			if (!match_pathspec(istate, pathspec, path.buf, path.len,
+			if (pathspec &&
+			    !match_pathspec(istate, pathspec, path.buf, path.len,
 					    0 /* prefix */, NULL,
 					    0 /* do NOT special case dirs */))
 				state = path_none;
diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
index 192c94eccd..a840919967 100755
--- a/t/t0050-filesystem.sh
+++ b/t/t0050-filesystem.sh
@@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '
 	git merge topic
 '
 
+test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
+	git init repo &&
+	(
+		cd repo &&
+
+		>Gitweb &&
+		git add Gitweb &&
+		git commit -m "add Gitweb" &&
+
+		git checkout --orphan todo &&
+		git reset --hard &&
+		mkdir -p gitweb/subdir &&
+		>gitweb/subdir/file &&
+		# it is not strictly necessary to add and commit the
+		git add gitweb &&
+		git commit -m "add gitweb/subdir/file" &&
+
+		git checkout master
+	)
+'
+
 test_done
-- 
2.23.0.25.g3f4444bfd7.dirty



* Re: [PATCH v2] dir: special case check for the possibility that pathspec is NULL
  2019-10-01 18:30                   ` [PATCH v2] " Elijah Newren
@ 2019-10-01 18:40                     ` Denton Liu
  2019-10-01 18:54                       ` Elijah Newren
  2019-10-01 18:55                       ` [PATCH v3] " Elijah Newren
  0 siblings, 2 replies; 73+ messages in thread
From: Denton Liu @ 2019-10-01 18:40 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Junio C Hamano, SZEDER Gábor

Hi Elijah,

On Tue, Oct 01, 2019 at 11:30:05AM -0700, Elijah Newren wrote:

[...]

> diff --git a/dir.c b/dir.c
> index 7ff79170fc..bd39b86be4 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1962,8 +1962,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>  			((state == path_untracked) &&
>  			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
>  			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
> -			  do_match_pathspec(istate, pathspec, path.buf, path.len,
> -					    baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
> +			  (pathspec &&
> +			   do_match_pathspec(istate, pathspec, path.buf, path.len,
> +					     baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
>  			struct untracked_cache_dir *ud;
>  			ud = lookup_untracked(dir->untracked, untracked,
>  					      path.buf + baselen,
> @@ -1975,7 +1976,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>  			if (subdir_state > dir_state)
>  				dir_state = subdir_state;
>  
> -			if (!match_pathspec(istate, pathspec, path.buf, path.len,
> +			if (pathspec &&
> +			    !match_pathspec(istate, pathspec, path.buf, path.len,
>  					    0 /* prefix */, NULL,
>  					    0 /* do NOT special case dirs */))
>  				state = path_none;
> diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> index 192c94eccd..a840919967 100755
> --- a/t/t0050-filesystem.sh
> +++ b/t/t0050-filesystem.sh
> @@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '
>  	git merge topic
>  '
>  
> +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		>Gitweb &&
> +		git add Gitweb &&
> +		git commit -m "add Gitweb" &&
> +
> +		git checkout --orphan todo &&
> +		git reset --hard &&
> +		mkdir -p gitweb/subdir &&
> +		>gitweb/subdir/file &&
> +		# it is not strictly necessary to add and commit the

Probably not worth a reroll but we're missing "gitweb directory" at the
end of the comment. Other than that, it looks good to me.

Thanks again for the prompt fix,

Denton

> +		git add gitweb &&
> +		git commit -m "add gitweb/subdir/file" &&
> +
> +		git checkout master
> +	)
> +'
> +
>  test_done
> -- 
> 2.23.0.25.g3f4444bfd7.dirty
> 

* Re: [PATCH v2] dir: special case check for the possibility that pathspec is NULL
  2019-10-01 18:40                     ` Denton Liu
@ 2019-10-01 18:54                       ` Elijah Newren
  2019-10-01 18:55                       ` [PATCH v3] " Elijah Newren
  1 sibling, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-10-01 18:54 UTC (permalink / raw)
  To: Denton Liu; +Cc: Git Mailing List, Junio C Hamano, SZEDER Gábor

On Tue, Oct 1, 2019 at 11:41 AM Denton Liu <liu.denton@gmail.com> wrote:
>
> Hi Elijah,
>
> On Tue, Oct 01, 2019 at 11:30:05AM -0700, Elijah Newren wrote:
>
> [...]
>
> > diff --git a/dir.c b/dir.c
> > index 7ff79170fc..bd39b86be4 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -1962,8 +1962,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
> >                       ((state == path_untracked) &&
> >                        (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
> >                        ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
> > -                       do_match_pathspec(istate, pathspec, path.buf, path.len,
> > -                                         baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
> > +                       (pathspec &&
> > +                        do_match_pathspec(istate, pathspec, path.buf, path.len,
> > +                                          baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
> >                       struct untracked_cache_dir *ud;
> >                       ud = lookup_untracked(dir->untracked, untracked,
> >                                             path.buf + baselen,
> > @@ -1975,7 +1976,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
> >                       if (subdir_state > dir_state)
> >                               dir_state = subdir_state;
> >
> > -                     if (!match_pathspec(istate, pathspec, path.buf, path.len,
> > +                     if (pathspec &&
> > +                         !match_pathspec(istate, pathspec, path.buf, path.len,
> >                                           0 /* prefix */, NULL,
> >                                           0 /* do NOT special case dirs */))
> >                               state = path_none;
> > diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> > index 192c94eccd..a840919967 100755
> > --- a/t/t0050-filesystem.sh
> > +++ b/t/t0050-filesystem.sh
> > @@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '
> >       git merge topic
> >  '
> >
> > +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
> > +     git init repo &&
> > +     (
> > +             cd repo &&
> > +
> > +             >Gitweb &&
> > +             git add Gitweb &&
> > +             git commit -m "add Gitweb" &&
> > +
> > +             git checkout --orphan todo &&
> > +             git reset --hard &&
> > +             mkdir -p gitweb/subdir &&
> > +             >gitweb/subdir/file &&
> > +             # it is not strictly necessary to add and commit the
>
> Probably not worth a reroll but we're missing "gitweb directory" at the
> end of the comment. Other than that, it looks good to me.

Yuck, I accidentally only removed half the comment when I intended to
remove it all?  Whoops.  I think it's worth a reroll; it's just a
single patch.  I'll send it out.

* [PATCH v3] dir: special case check for the possibility that pathspec is NULL
  2019-10-01 18:40                     ` Denton Liu
  2019-10-01 18:54                       ` Elijah Newren
@ 2019-10-01 18:55                       ` " Elijah Newren
  2019-10-01 19:35                         ` Denton Liu
  2019-10-07 18:04                         ` SZEDER Gábor
  1 sibling, 2 replies; 73+ messages in thread
From: Elijah Newren @ 2019-10-01 18:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Denton Liu, SZEDER Gábor, Elijah Newren

Commits 404ebceda01c ("dir: also check directories for matching
pathspecs", 2019-09-17) and 89a1f4aaf765 ("dir: if our pathspec might
match files under a dir, recurse into it", 2019-09-17) added calls to
match_pathspec() and do_match_pathspec() passing along their pathspec
parameter.  Both match_pathspec() and do_match_pathspec() assume the
pathspec argument they are given is non-NULL.  It turns out that
unpack-trees.c's verify_clean_subdirectory() calls read_directory() with
pathspec == NULL, and it is possible on case insensitive filesystems for
that NULL to make it to these new calls to match_pathspec() and
do_match_pathspec().  Add appropriate checks on the NULLness of pathspec
to avoid a segfault.

In case the negation throws anyone off (one of the calls was to
do_match_pathspec() while the other was to !match_pathspec(), yet no
negation of the NULLness of pathspec is used), there are two ways to
understand the differences:
  * The code already handled the pathspec == NULL cases before this
    series, and this series only tried to change behavior when there was
    a pathspec, thus we only want to go into the if-block if pathspec is
    non-NULL.
  * One of the calls is for whether to recurse into a subdirectory, the
    other is for after we've recursed into it for whether we want to
    remove the subdirectory itself (i.e. the subdirectory didn't match
    but something under it could have).  That difference in situation
    leads to the slight differences in logic used (well, that and the
    slightly unusual fact that we don't want empty pathspecs to remove
    untracked directories by default).

Denton found and analyzed one issue and provided the patch for the
match_pathspec() call, SZEDER figured out why the issue only reproduced
for some folks and not others and provided the testcase, and I looked
through the remainder of the series and noted the do_match_pathspec()
call that should have the same check.

Co-authored-by: Denton Liu <liu.denton@gmail.com>
Co-authored-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
Note: Applies on top of en/clean-nested-with-ignored, in next.

As with v1, the authorship is really mixed, so I don't know if I
should use Co-authored-by (highlighted as a possibility by Denton), or
the far more common Helped-by (as suggested by Junio but based on a
more limited summary of the different contributions), or if perhaps
Denton or SZEDER should be marked as the author and I be marked as
Helped-by or Co-authored-by.  Since Denton commented on round 1, I
used his suggestion for attribution in this round, but I'm open to
changing it to whatever works best.

Changes since v2:
  - This time actually removed the entire unnecessary comment

Range-diff:
1:  c495b9303c ! 1:  40392c6bba dir: special case check for the possibility that pathspec is NULL
    @@ t/t0050-filesystem.sh: $test_unicode 'merge (silent unicode normalization)' '
     +		git reset --hard &&
     +		mkdir -p gitweb/subdir &&
     +		>gitweb/subdir/file &&
    -+		# it is not strictly necessary to add and commit the
     +		git add gitweb &&
     +		git commit -m "add gitweb/subdir/file" &&
     +

 dir.c                 |  8 +++++---
 t/t0050-filesystem.sh | 21 +++++++++++++++++++++
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 7ff79170fc..bd39b86be4 100644
--- a/dir.c
+++ b/dir.c
@@ -1962,8 +1962,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			((state == path_untracked) &&
 			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
 			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
-			  do_match_pathspec(istate, pathspec, path.buf, path.len,
-					    baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
+			  (pathspec &&
+			   do_match_pathspec(istate, pathspec, path.buf, path.len,
+					     baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
 			struct untracked_cache_dir *ud;
 			ud = lookup_untracked(dir->untracked, untracked,
 					      path.buf + baselen,
@@ -1975,7 +1976,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			if (subdir_state > dir_state)
 				dir_state = subdir_state;
 
-			if (!match_pathspec(istate, pathspec, path.buf, path.len,
+			if (pathspec &&
+			    !match_pathspec(istate, pathspec, path.buf, path.len,
 					    0 /* prefix */, NULL,
 					    0 /* do NOT special case dirs */))
 				state = path_none;
diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
index 192c94eccd..a840919967 100755
--- a/t/t0050-filesystem.sh
+++ b/t/t0050-filesystem.sh
@@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '
 	git merge topic
 '
 
+test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
+	git init repo &&
+	(
+		cd repo &&
+
+		>Gitweb &&
+		git add Gitweb &&
+		git commit -m "add Gitweb" &&
+
+		git checkout --orphan todo &&
+		git reset --hard &&
+		mkdir -p gitweb/subdir &&
+		>gitweb/subdir/file &&
+		git add gitweb &&
+		git commit -m "add gitweb/subdir/file" &&
+
+		git checkout master
+	)
+'
+
 test_done
-- 
2.23.0.25.g3f4444bfd7.dirty


* Re: [PATCH v3] dir: special case check for the possibility that pathspec is NULL
  2019-10-01 18:55                       ` [PATCH v3] " Elijah Newren
@ 2019-10-01 19:35                         ` Denton Liu
  2019-10-01 19:39                           ` Elijah Newren
  2019-10-07 18:04                         ` SZEDER Gábor
  1 sibling, 1 reply; 73+ messages in thread
From: Denton Liu @ 2019-10-01 19:35 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Junio C Hamano, SZEDER Gábor

Hi Elijah,

Sorry for dragging out this thread for so long...

On Tue, Oct 01, 2019 at 11:55:24AM -0700, Elijah Newren wrote:

[...]

> diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> index 192c94eccd..a840919967 100755
> --- a/t/t0050-filesystem.sh
> +++ b/t/t0050-filesystem.sh
> @@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '

I had to change the 25 to a 24 for this to apply cleanly.

>  	git merge topic
>  '
>  
> +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		>Gitweb &&
> +		git add Gitweb &&
> +		git commit -m "add Gitweb" &&
> +
> +		git checkout --orphan todo &&
> +		git reset --hard &&
> +		mkdir -p gitweb/subdir &&
> +		>gitweb/subdir/file &&
> +		git add gitweb &&
> +		git commit -m "add gitweb/subdir/file" &&
> +
> +		git checkout master
> +	)
> +'
> +
>  test_done

Just wondering, how did you generate this patch? Did you manually edit
the last patch and resend it or is this a bug in our diff machinery?

(Side note, I _hate_ how bad the feedback for git apply/am is. We should
probably give more information than "error: corrupt patch at line 62"
such as why patches are corrupt (unexpected characters, too many/few
lines, something else?).)

> -- 
> 2.23.0.25.g3f4444bfd7.dirty
> 
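
Editorial sketch, not from the thread: the mismatch Denton hit (a header that
says 25 new-side lines over a body that only has 24 after one '+' line was
deleted by hand) can be detected by recounting the hunk body. The function
names are hypothetical and this is not how git apply is implemented. The
sketch assumes the explicit "@@ -a,b +c,d @@" header form and treats a
completely empty body line as context.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct hunk_counts {
	int old_lines;	/* context + removed lines */
	int new_lines;	/* context + added lines */
};

/* Count the lines of a hunk body (the text after the "@@ ... @@" line). */
struct hunk_counts count_hunk_body(const char *body)
{
	struct hunk_counts c = { 0, 0 };
	const char *line = body;

	while (*line) {
		const char *nl;

		switch (*line) {
		case ' ':
		case '\n':	/* assumption: empty line counts as context */
			c.old_lines++;
			c.new_lines++;
			break;
		case '-':
			c.old_lines++;
			break;
		case '+':
			c.new_lines++;
			break;
		default:	/* e.g. "\ No newline at end of file" */
			break;
		}
		nl = strchr(line, '\n');
		if (!nl)
			break;
		line = nl + 1;
	}
	return c;
}

/* Returns 1 if header and body agree, 0 if not, -1 if the header is unparsable. */
int hunk_header_matches(const char *header, const char *body)
{
	int old_start, old_len, new_start, new_len;
	struct hunk_counts c;

	if (sscanf(header, "@@ -%d,%d +%d,%d @@",
		   &old_start, &old_len, &new_start, &new_len) != 4)
		return -1;
	c = count_hunk_body(body);
	return c.old_lines == old_len && c.new_lines == new_len;
}
```

For the t0050 hunk in this message the body has 4 context lines and 20 added
lines, so the new-side count has to be 24, which is the value Denton had to
substitute for 25.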

* Re: [PATCH v3] dir: special case check for the possibility that pathspec is NULL
  2019-10-01 19:35                         ` Denton Liu
@ 2019-10-01 19:39                           ` Elijah Newren
  2019-10-02 15:51                             ` Elijah Newren
  0 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-10-01 19:39 UTC (permalink / raw)
  To: Denton Liu; +Cc: Git Mailing List, Junio C Hamano, SZEDER Gábor

On Tue, Oct 1, 2019 at 12:35 PM Denton Liu <liu.denton@gmail.com> wrote:
>
> Hi Elijah,
>
> Sorry for dragging out this thread for so long...
>
> On Tue, Oct 01, 2019 at 11:55:24AM -0700, Elijah Newren wrote:
>
> [...]
>
> > diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> > index 192c94eccd..a840919967 100755
> > --- a/t/t0050-filesystem.sh
> > +++ b/t/t0050-filesystem.sh
> > @@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '
>
> I had to change the 25 to a 24 for this to apply cleanly.
>
> >       git merge topic
> >  '
> >
> > +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
> > +     git init repo &&
> > +     (
> > +             cd repo &&
> > +
> > +             >Gitweb &&
> > +             git add Gitweb &&
> > +             git commit -m "add Gitweb" &&
> > +
> > +             git checkout --orphan todo &&
> > +             git reset --hard &&
> > +             mkdir -p gitweb/subdir &&
> > +             >gitweb/subdir/file &&
> > +             git add gitweb &&
> > +             git commit -m "add gitweb/subdir/file" &&
> > +
> > +             git checkout master
> > +     )
> > +'
> > +
> >  test_done
>
> Just wondering, how did you generate this patch? Did you manually edit
> the last patch and resend it or is this a bug in our diff machinery?

I manually edited because it "was so simple" and of course just
compounded the problem because I didn't fix the count, as you pointed
out.  Gah.  Thanks for checking.  Clearly, I'm bouncing between too
many things this morning, and need to wait until I'm not so distracted
and rushing so I don't mess things up.  I'll send out a v4 in a few
hours when I've cleaned a few other things off my plate.

* Re: [PATCH v3] dir: special case check for the possibility that pathspec is NULL
  2019-10-01 19:39                           ` Elijah Newren
@ 2019-10-02 15:51                             ` Elijah Newren
  0 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-10-02 15:51 UTC (permalink / raw)
  To: Denton Liu; +Cc: Git Mailing List, Junio C Hamano, SZEDER Gábor

On Tue, Oct 1, 2019 at 12:39 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Oct 1, 2019 at 12:35 PM Denton Liu <liu.denton@gmail.com> wrote:
> >
> > Hi Elijah,
> >
> > Sorry for dragging out this thread for so long...
> >
> > On Tue, Oct 01, 2019 at 11:55:24AM -0700, Elijah Newren wrote:
> >
> > [...]
> >
> > > diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> > > index 192c94eccd..a840919967 100755
> > > --- a/t/t0050-filesystem.sh
> > > +++ b/t/t0050-filesystem.sh
> > > @@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '
> >
> > I had to change the 25 to a 24 for this to apply cleanly.
> >
> > >       git merge topic
> > >  '
> > >
> > > +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
> > > +     git init repo &&
> > > +     (
> > > +             cd repo &&
> > > +
> > > +             >Gitweb &&
> > > +             git add Gitweb &&
> > > +             git commit -m "add Gitweb" &&
> > > +
> > > +             git checkout --orphan todo &&
> > > +             git reset --hard &&
> > > +             mkdir -p gitweb/subdir &&
> > > +             >gitweb/subdir/file &&
> > > +             git add gitweb &&
> > > +             git commit -m "add gitweb/subdir/file" &&
> > > +
> > > +             git checkout master
> > > +     )
> > > +'
> > > +
> > >  test_done
> >
> > Just wondering, how did you generate this patch? Did you manually edit
> > the last patch and resend it or is this a bug in our diff machinery?
>
> I manually edited because it "was so simple" and of course just
> compounded the problem because I didn't fix the count, as you pointed
> out.  Gah.  Thanks for checking.  Clearly, I'm bouncing between too
> many things this morning, and need to wait until I'm not so distracted
> and rushing so I don't mess things up.  I'll send out a v4 in a few
> hours when I've cleaned a few other things off my plate.

I was going to send out a new version this morning, but it looks like
Junio already picked up the patch and fixed it up (the tip of
en/clean-nested-with-ignored already has what we want), so I won't
resend after all.  Thanks Denton, SZEDER, and Junio.

* Re: [PATCH v3] dir: special case check for the possibility that pathspec is NULL
  2019-10-01 18:55                       ` [PATCH v3] " Elijah Newren
  2019-10-01 19:35                         ` Denton Liu
@ 2019-10-07 18:04                         ` SZEDER Gábor
  1 sibling, 0 replies; 73+ messages in thread
From: SZEDER Gábor @ 2019-10-07 18:04 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Junio C Hamano, Denton Liu

On Tue, Oct 01, 2019 at 11:55:24AM -0700, Elijah Newren wrote:
> Commits 404ebceda01c ("dir: also check directories for matching
> pathspecs", 2019-09-17) and 89a1f4aaf765 ("dir: if our pathspec might
> match files under a dir, recurse into it", 2019-09-17) added calls to
> match_pathspec() and do_match_pathspec() passing along their pathspec
> parameter.  Both match_pathspec() and do_match_pathspec() assume the
> pathspec argument they are given is non-NULL.  It turns out that
> unpack-trees.c's verify_clean_subdirectory() calls read_directory() with
> pathspec == NULL, and it is possible on case insensitive filesystems for
> that NULL to make it to these new calls to match_pathspec() and
> do_match_pathspec().  Add appropriate checks on the NULLness of pathspec
> to avoid a segfault.
> 
> In case the negation throws anyone off (one of the calls was to
> do_match_pathspec() while the other was to !match_pathspec(), yet no
> negation of the NULLness of pathspec is used), there are two ways to
> understand the differences:
>   * The code already handled the pathspec == NULL cases before this
>     series, and this series only tried to change behavior when there was
>     a pathspec, thus we only want to go into the if-block if pathspec is
>     non-NULL.
>   * One of the calls is for whether to recurse into a subdirectory, the
>     other is for after we've recursed into it for whether we want to
>     remove the subdirectory itself (i.e. the subdirectory didn't match
>     but something under it could have).  That difference in situation
>     leads to the slight differences in logic used (well, that and the
>     slightly unusual fact that we don't want empty pathspecs to remove
>     untracked directories by default).
> 
> Denton found and analyzed one issue and provided the patch for the
> match_pathspec() call, SZEDER figured out why the issue only reproduced
> for some folks and not others and provided the testcase, and I looked
> through the remainder of the series and noted the do_match_pathspec()
> call that should have the same check.
> 
> Co-authored-by: Denton Liu <liu.denton@gmail.com>
> Co-authored-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> Note: Applies on top of en/clean-nested-with-ignored, in next.
> 
> As with v1, the authorship is really mixed, so I don't know if I
> should use Co-authored-by (highlighted as a possibility by Denton), or
> the far more common Helped-by (as suggested by Junio but based on a
> more limited summary of the different contributions), or if perhaps
> Denton or SZEDER should be marked as the author and I be marked as
> Helped-by or Co-authored-by.  Since Denton commented on round 1, I
> used his suggestion for attribution in this round, but I'm open to
> changing it to whatever works best.
> 
> Changes since v2:
>   - This time actually removed the entire unnecessary comment
> 
> Range-diff:
> 1:  c495b9303c ! 1:  40392c6bba dir: special case check for the possibility that pathspec is NULL
>     @@ t/t0050-filesystem.sh: $test_unicode 'merge (silent unicode normalization)' '
>      +		git reset --hard &&
>      +		mkdir -p gitweb/subdir &&
>      +		>gitweb/subdir/file &&
>     -+		# it is not strictly necessary to add and commit the
>      +		git add gitweb &&
>      +		git commit -m "add gitweb/subdir/file" &&
>      +
> 
>  dir.c                 |  8 +++++---
>  t/t0050-filesystem.sh | 21 +++++++++++++++++++++
>  2 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/dir.c b/dir.c
> index 7ff79170fc..bd39b86be4 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1962,8 +1962,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>  			((state == path_untracked) &&
>  			 (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
>  			 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
> -			  do_match_pathspec(istate, pathspec, path.buf, path.len,
> -					    baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
> +			  (pathspec &&
> +			   do_match_pathspec(istate, pathspec, path.buf, path.len,
> +					     baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
>  			struct untracked_cache_dir *ud;
>  			ud = lookup_untracked(dir->untracked, untracked,
>  					      path.buf + baselen,
> @@ -1975,7 +1976,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>  			if (subdir_state > dir_state)
>  				dir_state = subdir_state;
>  
> -			if (!match_pathspec(istate, pathspec, path.buf, path.len,
> +			if (pathspec &&
> +			    !match_pathspec(istate, pathspec, path.buf, path.len,
>  					    0 /* prefix */, NULL,
>  					    0 /* do NOT special case dirs */))
>  				state = path_none;
> diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> index 192c94eccd..a840919967 100755
> --- a/t/t0050-filesystem.sh
> +++ b/t/t0050-filesystem.sh
> @@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '
>  	git merge topic
>  '
>  
> +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case insensitive fs' '
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		>Gitweb &&
> +		git add Gitweb &&
> +		git commit -m "add Gitweb" &&
> +
> +		git checkout --orphan todo &&
> +		git reset --hard &&
> +		mkdir -p gitweb/subdir &&
> +		>gitweb/subdir/file &&
> +		git add gitweb &&
> +		git commit -m "add gitweb/subdir/file" &&
> +
> +		git checkout master
> +	)
> +'

I don't like this test ;)

I only intended it as a "here is how to reliably reproduce the
segfault without all the clutter of the full git.git repository" that
I wrote way past my bedtime.  But I think that:

  - it shouldn't have the CASE_INSENSITIVE_FS prereq.  Yes, that
    segfault could only be triggered on a case insensitive filesystem,
    but the given sequence of commands should succeed in a case
    sensitive file system just as well.

    (Have no idea why I added that prereq in the first place; as I
    said above, it was way past my bedtime...)

  - it's in the wrong test script; it would be better among other
    tests checking what 'git checkout' should or must not overwrite
    when switching branches, but not sure which test script that is.

    (I think I added it to this test script, because it stood out a
    bit when grepping for case insensitive fs in the test suite; I
    play the "past my bedtime" card again :)

  - it's already satisfied by 'git checkout master' not failing, but
    it doesn't check whether the resulting contents of the worktree
    are as expected.

  - it still bothers me why that additional subdir was necessary to
    trigger the segfault.  Did you look into it?


Thread overview: 73+ messages
2019-08-25 18:59 [PATCH] t7300-clean: demonstrate deleting nested repo with an ignored file breakage SZEDER Gábor
2019-08-25 20:34 ` SZEDER Gábor
2019-08-25 22:32 ` Philip Oakley
2019-08-26  7:48   ` SZEDER Gábor
2019-09-05 15:47 ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 01/12] t7300: Add some testcases showing failure to clean specified pathspecs Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 02/12] dir: fix typo in comment Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 03/12] dir: fix off-by-one error in match_pathspec_item Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 04/12] dir: Directories should be checked for matching pathspecs too Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 05/12] dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule case Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 06/12] dir: If our pathspec might match files under a dir, recurse into it Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 07/12] dir: add commentary explaining match_pathspec_item's return value Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 09/12] clean: disambiguate the definition of -d Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 10/12] clean: avoid removing untracked files in a nested git repository Elijah Newren
2019-09-05 21:20     ` SZEDER Gábor
2019-09-05 15:47   ` [RFC PATCH v2 11/12] clean: rewrap overly long line Elijah Newren
2019-09-05 15:47   ` [RFC PATCH v2 12/12] clean: fix theoretical path corruption Elijah Newren
2019-09-05 19:27     ` SZEDER Gábor
2019-09-07  0:34       ` Elijah Newren
2019-09-05 19:01   ` [RFC PATCH v2 00/12] Fix some git clean issues SZEDER Gábor
2019-09-07  0:33     ` Elijah Newren
2019-09-12 22:12   ` [PATCH v3 " Elijah Newren
2019-09-12 22:12     ` [PATCH v3 01/12] t7300: add testcases showing failure to clean specified pathspecs Elijah Newren
2019-09-13 18:54       ` Junio C Hamano
2019-09-13 19:10         ` Elijah Newren
2019-09-13 20:29           ` Junio C Hamano
2019-09-12 22:12     ` [PATCH v3 02/12] dir: fix typo in comment Elijah Newren
2019-09-12 22:12     ` [PATCH v3 03/12] dir: fix off-by-one error in match_pathspec_item Elijah Newren
2019-09-13 19:05       ` Junio C Hamano
2019-09-12 22:12     ` [PATCH v3 04/12] dir: also check directories for matching pathspecs Elijah Newren
2019-09-12 22:12     ` [PATCH v3 05/12] dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case Elijah Newren
2019-09-12 22:12     ` [PATCH v3 06/12] dir: if our pathspec might match files under a dir, recurse into it Elijah Newren
2019-09-13 19:45       ` Junio C Hamano
2019-09-12 22:12     ` [PATCH v3 07/12] dir: add commentary explaining match_pathspec_item's return value Elijah Newren
2019-09-13 20:04       ` Junio C Hamano
2019-09-12 22:12     ` [PATCH v3 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run Elijah Newren
2019-09-12 22:12     ` [PATCH v3 09/12] clean: disambiguate the definition of -d Elijah Newren
2019-09-12 22:12     ` [PATCH v3 10/12] clean: avoid removing untracked files in a nested git repository Elijah Newren
2019-09-12 22:12     ` [PATCH v3 11/12] clean: rewrap overly long line Elijah Newren
2019-09-12 22:12     ` [PATCH v3 12/12] clean: fix theoretical path corruption Elijah Newren
2019-09-17 16:34     ` [PATCH v4 00/12] Fix some git clean issues Elijah Newren
2019-09-17 16:34       ` [PATCH v4 01/12] t7300: add testcases showing failure to clean specified pathspecs Elijah Newren
2019-09-17 16:34       ` [PATCH v4 02/12] dir: fix typo in comment Elijah Newren
2019-09-17 16:34       ` [PATCH v4 03/12] dir: fix off-by-one error in match_pathspec_item Elijah Newren
2019-09-17 16:34       ` [PATCH v4 04/12] dir: also check directories for matching pathspecs Elijah Newren
2019-09-25 20:39         ` [BUG] git is segfaulting, was " Denton Liu
2019-09-25 21:28           ` Elijah Newren
2019-09-25 21:55             ` Denton Liu
2019-09-26 20:35               ` Denton Liu
2019-09-27  0:12                 ` Elijah Newren
2019-09-27  1:09           ` SZEDER Gábor
2019-09-27  2:17             ` SZEDER Gábor
2019-09-27 17:10               ` Denton Liu
2019-09-30 19:11                 ` [PATCH] dir: special case check for the possibility that pathspec is NULL Elijah Newren
2019-09-30 22:31                   ` Denton Liu
2019-10-01  7:01                     ` Elijah Newren
2019-10-01 18:30                   ` [PATCH v2] " Elijah Newren
2019-10-01 18:40                     ` Denton Liu
2019-10-01 18:54                       ` Elijah Newren
2019-10-01 18:55                       ` [PATCH v3] " Elijah Newren
2019-10-01 19:35                         ` Denton Liu
2019-10-01 19:39                           ` Elijah Newren
2019-10-02 15:51                             ` Elijah Newren
2019-10-07 18:04                         ` SZEDER Gábor
2019-09-17 16:34       ` [PATCH v4 05/12] dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case Elijah Newren
2019-09-17 16:34       ` [PATCH v4 06/12] dir: if our pathspec might match files under a dir, recurse into it Elijah Newren
2019-09-17 16:34       ` [PATCH v4 07/12] dir: add commentary explaining match_pathspec_item's return value Elijah Newren
2019-09-17 16:35       ` [PATCH v4 08/12] git-clean.txt: do not claim we will delete files with -n/--dry-run Elijah Newren
2019-09-17 16:35       ` [PATCH v4 09/12] clean: disambiguate the definition of -d Elijah Newren
2019-09-17 16:35       ` [PATCH v4 10/12] clean: avoid removing untracked files in a nested git repository Elijah Newren
2019-09-17 16:35       ` [PATCH v4 11/12] clean: rewrap overly long line Elijah Newren
2019-09-17 16:35       ` [PATCH v4 12/12] clean: fix theoretical path corruption Elijah Newren
