git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH 0/5] Directory traversal fixes
@ 2021-05-07  4:04 Elijah Newren via GitGitGadget
  2021-05-07  4:04 ` [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
                   ` (7 more replies)
  0 siblings, 8 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-07  4:04 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

This patchset fixes a few directory traversal issues, where fill_directory()
would traverse into directories that it shouldn't and not traverse into
directories that it should. One of these issues was reported recently on
this list[1], another was found at $DAYJOB.

The fifth patch might have backward compatibility implications, but is easy
to review. Even if the logic in dir.c makes your eyes glaze over, at least
take a look at the fifth patch.

Also, if anyone has any ideas about a better place to put the "Some
sidenotes" from the third commit message rather than keeping them in a
random commit message, that might be helpful too.

[1] See
https://lore.kernel.org/git/DM6PR00MB06829EC5B85E0C5AC595004E894E9@DM6PR00MB0682.namprd00.prod.outlook.com/
or alternatively https://github.com/git-for-windows/git/issues/2732.

Elijah Newren (5):
  t7300: add testcase showing unnecessary traversal into ignored
    directory
  t3001, t7300: add testcase showcasing missed directory traversal
  dir: avoid unnecessary traversal into ignored directory
  dir: traverse into untracked directories if they may have ignored
    subfiles
  [RFC] ls-files: error out on -i unless -o or -c are specified

 builtin/ls-files.c                 |  3 ++
 dir.c                              | 50 ++++++++++++++++---------
 t/t1306-xdg-files.sh               |  2 +-
 t/t3001-ls-files-others-exclude.sh |  5 +++
 t/t3003-ls-files-exclude.sh        |  4 +-
 t/t7300-clean.sh                   | 59 ++++++++++++++++++++++++++++++
 6 files changed, 103 insertions(+), 20 deletions(-)


base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1020%2Fnewren%2Fdirectory-traversal-fixes-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1020/newren/directory-traversal-fixes-v1
Pull-Request: https://github.com/git/git/pull/1020
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-07  4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget
@ 2021-05-07  4:04 ` Elijah Newren via GitGitGadget
  2021-05-07  4:27   ` Eric Sunshine
  2021-05-08 11:13   ` Philip Oakley
  2021-05-07  4:04 ` [PATCH 2/5] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-07  4:04 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

PNPM is apparently creating deeply nested (but ignored) directory
structures; traversing them is costly performance-wise, unnecessary, and
in some cases is even throwing warnings/errors because the paths are too
long to handle on various platforms.  Add a testcase that demonstrates
this problem.

Initial-test-by: Jason Gore <Jason.Gore@microsoft.com>
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index a74816ca8b46..5f1dc397c11e 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
 	test_must_be_empty actual
 '
 
+test_expect_failure 'avoid traversing into ignored directories' '
+	test_when_finished rm -f output error &&
+	test_create_repo avoid-traversing-deep-hierarchy &&
+	(
+		cd avoid-traversing-deep-hierarchy &&
+
+		>directory-random-file.txt &&
+		# Put this file under directory400/directory399/.../directory1/
+		depth=400 &&
+		for x in $(test_seq 1 $depth); do
+			mkdir "tmpdirectory$x" &&
+			mv directory* "tmpdirectory$x" &&
+			mv "tmpdirectory$x" "directory$x"
+		done &&
+
+		git clean -ffdxn -e directory$depth >../output 2>../error &&
+
+		test_must_be_empty ../output &&
+		# We especially do not want things like
+		#   "warning: could not open directory "
+		# appearing in the error output.  It is true that directories
+		# that are too long cannot be opened, but we should not be
+		# recursing into those directories anyway since the very first
+		# level is ignored.
+		test_must_be_empty ../error &&
+
+		# alpine-linux-musl fails to "rm -rf" a directory with such
+		# a deeply nested hierarchy.  Help it out by deleting the
+		# leading directories ourselves.  Super slow, but, what else
+		# can we do?  Without this, we will hit a
+		#     error: Tests passed but test cleanup failed; aborting
+		# so do this ugly manual cleanup...
+		while test ! -f directory-random-file.txt; do
+			name=$(ls -d directory*) &&
+			mv $name/* . &&
+			rmdir $name
+		done
+	)
+'
+
 test_done
-- 
gitgitgadget



* [PATCH 2/5] t3001, t7300: add testcase showcasing missed directory traversal
  2021-05-07  4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget
  2021-05-07  4:04 ` [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
@ 2021-05-07  4:04 ` Elijah Newren via GitGitGadget
  2021-05-07  4:04 ` [PATCH 3/5] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-07  4:04 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

In the last commit, we added a testcase showing that the directory
traversal machinery sometimes traverses into directories unnecessarily.
Here we show that there are cases where it does the opposite: it does
not traverse into directories, despite those directories having
important files that need to be flagged.

Add a testcase showing that `git ls-files -o -i --directory` can omit
some of the files it should be listing, and another showing that `git
clean -fX` can fail to clean out some of the expected files.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t3001-ls-files-others-exclude.sh |  5 +++++
 t/t7300-clean.sh                   | 19 +++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index 1ec7cb57c7a8..ac05d1a17931 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -292,6 +292,11 @@ EOF
 	test_cmp expect actual
 '
 
+test_expect_failure 'ls-files with "**" patterns and --directory' '
+	# Expectation same as previous test
+	git ls-files --directory -o -i --exclude "**/a.1" >actual &&
+	test_cmp expect actual
+'
 
 test_expect_success 'ls-files with "**" patterns and no slashes' '
 	git ls-files -o -i --exclude "one**a.1" >actual &&
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 5f1dc397c11e..337f9af1d74b 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -786,4 +786,23 @@ test_expect_failure 'avoid traversing into ignored directories' '
 	)
 '
 
+test_expect_failure 'traverse into directories that may have ignored entries' '
+	test_when_finished rm -f output &&
+	test_create_repo need-to-traverse-into-hierarchy &&
+	(
+		cd need-to-traverse-into-hierarchy &&
+		mkdir -p modules/foobar/src/generated &&
+		> modules/foobar/src/generated/code.c &&
+		> modules/foobar/Makefile &&
+		echo "/modules/**/src/generated/" >.gitignore &&
+
+		git clean -fX modules/foobar >../output &&
+
+		grep Removing ../output &&
+
+		test_path_is_missing modules/foobar/src/generated/code.c &&
+		test_path_is_file modules/foobar/Makefile
+	)
+'
+
 test_done
-- 
gitgitgadget



* [PATCH 3/5] dir: avoid unnecessary traversal into ignored directory
  2021-05-07  4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget
  2021-05-07  4:04 ` [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
  2021-05-07  4:04 ` [PATCH 2/5] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
@ 2021-05-07  4:04 ` Elijah Newren via GitGitGadget
  2021-05-07  4:04 ` [PATCH 4/5] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-07  4:04 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The show_other_directories case in treat_directory() tried to handle
both excludes and untracked files with the same logic, and mishandled
both the excludes and the untracked files in the process, in different
ways.  Split that logic apart, and then focus on the logic for the
excludes; a subsequent commit will address the logic for untracked
files.

For show_other_directories, an excluded directory means that
every path underneath that directory will also be excluded.  Given that
the calling code requested to just show directories when everything
under a directory had the same state (that's what the
"DIR_SHOW_OTHER_DIRECTORIES" flag means), we generally do not need to
traverse into such directories and can just immediately mark them as
ignored (i.e. as path_excluded).  The only reason we cannot just
immediately return path_excluded is the DIR_HIDE_EMPTY_DIRECTORIES flag
and the possibility that the ignored directory is an empty directory.
The code previously treated DIR_SHOW_IGNORED_TOO in most cases as an
exception as well, which was wrong.  It can sometimes reduce the number
of cases where we need to recurse (namely if
DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set), but should not be able
to increase the number of cases where we need to recurse.  Fix the logic
accordingly.

Some sidenotes about possible confusion with dir.c:

* "ignored" often refers to an "untracked ignore", i.e. a file which is
  not tracked which matches one of the ignore/exclusion rules.  But you
  can also have a "tracked ignore", a tracked file that happens to match
  one of the ignore/exclusion rules and which dir.c has to worry about
  since "git ls-files -c -i" is supposed to list them.

* The dir code often uses "ignored" and "excluded" interchangeably,
  which you need to keep in mind while reading the code.  Sadly, though,
  it can get very confusing since ignore rules can have exclusions, as
  in the last of the following .gitignore rules:
      .gitignore
      *~
      *.log
      !settings.log
  In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
  will be true due to the '!' negating the rule.  Someone might refer
  to this as "excluded".  That means the file 'settings.log' will not
  match, and thus not be ignored.  So we won't return path_excluded for
  it.  So it's an exclude rule that prevents the file from being an
  exclude.  The non-excluded rules are the ones that result in files
  being excludes.  Great fun, eh?

Sometimes it feels like dir.c needs its own glossary with its many
definitions, including the multiply-defined terms.
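
As an illustration of the negation point in the second sidenote, the
following is a minimal sketch run in a scratch repository.  It assumes git
is on PATH; the file name app.log is made up for the example, and the four
lines of the listing are taken verbatim as patterns (an assumption, since
the first line may have been meant as a file name).

```shell
#!/bin/sh
# Sketch only: the scratch repository and app.log are illustrative assumptions.
command -v git >/dev/null 2>&1 || { echo "git not found; skipping"; exit 0; }

tmp=$(mktemp -d) && cd "$tmp" && git init -q . || exit 1

# The four lines of the listing above, used as ignore patterns.
printf '%s\n' '.gitignore' '*~' '*.log' '!settings.log' >.gitignore
: >app.log
: >settings.log

# Porcelain status marks ignored paths with "!!" and untracked ones with "??".
status=$(git status --porcelain --ignored)
echo "$status"

# app.log matches '*.log' and nothing negates it, so it is ignored.
case "$status" in
*'!! app.log'*) app_state=ignored ;;
*) app_state=not-ignored ;;
esac

# settings.log matches '*.log' too, but the later '!settings.log' wins.
case "$status" in
*'!! settings.log'*) settings_state=ignored ;;
*) settings_state=not-ignored ;;
esac

echo "app.log: $app_state, settings.log: $settings_state"
```

Here the '!' rule is the one that keeps settings.log from being reported
as ignored, which is the "exclude rule that prevents an exclude" described
above.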

Reported-by: Jason Gore <Jason.Gore@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c            | 44 +++++++++++++++++++++++++++++---------------
 t/t7300-clean.sh |  2 +-
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/dir.c b/dir.c
index 3474e67e8f3c..4b183749843e 100644
--- a/dir.c
+++ b/dir.c
@@ -1844,6 +1844,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	}
 
 	/* This is the "show_other_directories" case */
+	assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES);
 
 	/*
 	 * If we have a pathspec which could match something _below_ this
@@ -1854,27 +1855,40 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	if (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC)
 		return path_recurse;
 
+	/* Special cases for where this directory is excluded/ignored */
+	if (excluded) {
+		/*
+		 * In the show_other_directories case, if we're not
+		 * hiding empty directories, there is no need to
+		 * recurse into an ignored directory.
+		 */
+		if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+			return path_excluded;
+
+		/*
+		 * Even if we are hiding empty directories, we can still avoid
+		 * recursing into ignored directories for DIR_SHOW_IGNORED_TOO
+		 * if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
+		 */
+		if ((dir->flags & DIR_SHOW_IGNORED_TOO) &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
+			return path_excluded;
+	}
+
 	/*
-	 * Other than the path_recurse case immediately above, we only need
-	 * to recurse into untracked/ignored directories if either of the
-	 * following bits is set:
+	 * Other than the path_recurse case above, we only need to
+	 * recurse into untracked directories if either of the following
+	 * bits is set:
 	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
 	 *                           there are ignored entries below)
 	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
 	 *                                 the directory is empty)
 	 */
-	if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
-		return excluded ? path_excluded : path_untracked;
-
-	/*
-	 * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid
-	 * recursing into ignored directories if the path is excluded and
-	 * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
-	 */
-	if (excluded &&
-	    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
-	    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
-		return path_excluded;
+	if (!excluded &&
+	    !(dir->flags & (DIR_SHOW_IGNORED_TOO |
+			    DIR_HIDE_EMPTY_DIRECTORIES))) {
+		return path_untracked;
+	}
 
 	/*
 	 * Even if we don't want to know all the paths under an untracked or
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 337f9af1d74b..00e5fa35dae3 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -746,7 +746,7 @@ test_expect_success 'clean untracked paths by pathspec' '
 	test_must_be_empty actual
 '
 
-test_expect_failure 'avoid traversing into ignored directories' '
+test_expect_success 'avoid traversing into ignored directories' '
 	test_when_finished rm -f output error &&
 	test_create_repo avoid-traversing-deep-hierarchy &&
 	(
-- 
gitgitgadget



* [PATCH 4/5] dir: traverse into untracked directories if they may have ignored subfiles
  2021-05-07  4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-05-07  4:04 ` [PATCH 3/5] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
@ 2021-05-07  4:04 ` Elijah Newren via GitGitGadget
  2021-05-07  4:05 ` [PATCH 5/5] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-07  4:04 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

A directory that is untracked does not imply that all files under it
should be categorized as untracked; in particular, if the caller is
interested in ignored files, many files or directories underneath the
untracked directory may be ignored.  We previously partially handled
this right with DIR_SHOW_IGNORED_TOO, but missed DIR_SHOW_IGNORED.  It
was not obvious, though, because the logic for untracked and excluded
files had been fused together, making it harder to reason about.  The
previous commit split that logic out, making it easier to notice that
DIR_SHOW_IGNORED was missing.  Add it.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                              | 10 ++++++----
 t/t3001-ls-files-others-exclude.sh |  2 +-
 t/t7300-clean.sh                   |  2 +-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/dir.c b/dir.c
index 4b183749843e..3beb8e17a839 100644
--- a/dir.c
+++ b/dir.c
@@ -1877,15 +1877,17 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 	/*
 	 * Other than the path_recurse case above, we only need to
-	 * recurse into untracked directories if either of the following
+	 * recurse into untracked directories if any of the following
 	 * bits is set:
-	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
-	 *                           there are ignored entries below)
+	 *   - DIR_SHOW_IGNORED (because then we need to determine if
+	 *                       there are ignored entries below)
+	 *   - DIR_SHOW_IGNORED_TOO (same as above)
 	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
 	 *                                 the directory is empty)
 	 */
 	if (!excluded &&
-	    !(dir->flags & (DIR_SHOW_IGNORED_TOO |
+	    !(dir->flags & (DIR_SHOW_IGNORED |
+			    DIR_SHOW_IGNORED_TOO |
 			    DIR_HIDE_EMPTY_DIRECTORIES))) {
 		return path_untracked;
 	}
diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index ac05d1a17931..516c95ea0e82 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -292,7 +292,7 @@ EOF
 	test_cmp expect actual
 '
 
-test_expect_failure 'ls-files with "**" patterns and --directory' '
+test_expect_success 'ls-files with "**" patterns and --directory' '
 	# Expectation same as previous test
 	git ls-files --directory -o -i --exclude "**/a.1" >actual &&
 	test_cmp expect actual
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 00e5fa35dae3..c2a3b7b6a52b 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -786,7 +786,7 @@ test_expect_success 'avoid traversing into ignored directories' '
 	)
 '
 
-test_expect_failure 'traverse into directories that may have ignored entries' '
+test_expect_success 'traverse into directories that may have ignored entries' '
 	test_when_finished rm -f output &&
 	test_create_repo need-to-traverse-into-hierarchy &&
 	(
-- 
gitgitgadget



* [PATCH 5/5] [RFC] ls-files: error out on -i unless -o or -c are specified
  2021-05-07  4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-05-07  4:04 ` [PATCH 4/5] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
@ 2021-05-07  4:05 ` Elijah Newren via GitGitGadget
  2021-05-07 16:22 ` [PATCH 6/5] dir: update stale description of treat_directory() Derrick Stolee
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-07  4:05 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

ls-files --ignored can be used together with either --others or
--cached.  After being perplexed for a bit and digging into the code, I
assumed that ls-files -i was just broken and not printing anything and
had a nice patch ready to submit when I finally realized that -i can be
used with --cached to find tracked ignores.

While that was a mistake on my part, and a careful reading of the
documentation could have made this more clear, I suspect this is an
error others are likely to make as well.  In fact, of two uses in our
testsuite, I believe one of the two did make this error.  In t1306.13,
there are NO tracked files, and all the excludes built up and used in
that test and in previous tests thus have to be about untracked files.
However, since they were looking for an empty result, the mistake went
unnoticed as their erroneous command also just happened to give an empty
answer.

-i will most of the time be used with -o, which would suggest we could just
make -i imply -o in the absence of either a -o or -c, but that would be
a backward incompatible break.  Instead, let's just flag -i without
either a -o or -c as an error, and update the two relevant testcases to
specify their intent.
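
To illustrate the two meanings, here is a minimal sketch in a scratch
repository.  It assumes git is on PATH; the file names tracked.log and
untracked.log are hypothetical.  Both invocations spell out -c or -o, so
the sketch should behave the same with or without this patch.

```shell
#!/bin/sh
# Sketch only: the scratch repository and file names are illustrative assumptions.
command -v git >/dev/null 2>&1 || { echo "git not found; skipping"; exit 0; }

tmp=$(mktemp -d) && cd "$tmp" && git init -q . || exit 1

echo '*.log' >.gitignore
: >tracked.log
: >untracked.log

# -f is needed because tracked.log matches an ignore rule.
git add -f tracked.log .gitignore || exit 1

# Tracked files that happen to match an ignore rule ("tracked ignores").
tracked_ignored=$(git ls-files -i -c --exclude-standard)

# Untracked files that match an ignore rule (the more common question).
untracked_ignored=$(git ls-files -i -o --exclude-standard)

echo "with -c: $tracked_ignored"
echo "with -o: $untracked_ignored"
```

The two commands answer different questions about the same ignore rules,
which is why guessing the intent of a bare -i is risky.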

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/ls-files.c          | 3 +++
 t/t1306-xdg-files.sh        | 2 +-
 t/t3003-ls-files-exclude.sh | 4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 60a2913a01e9..9f74b1ab2e69 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -748,6 +748,9 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	if (pathspec.nr && error_unmatch)
 		ps_matched = xcalloc(pathspec.nr, 1);
 
+	if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached)
+		die("ls-files --ignored is usually used with --others, but --cached is the default.  Please specify which you want.");
+
 	if ((dir.flags & DIR_SHOW_IGNORED) && !exc_given)
 		die("ls-files --ignored needs some exclude pattern");
 
diff --git a/t/t1306-xdg-files.sh b/t/t1306-xdg-files.sh
index dd87b43be1a6..40d3c42618c0 100755
--- a/t/t1306-xdg-files.sh
+++ b/t/t1306-xdg-files.sh
@@ -116,7 +116,7 @@ test_expect_success 'Exclusion in a non-XDG global ignore file' '
 test_expect_success 'Checking XDG ignore file when HOME is unset' '
 	(sane_unset HOME &&
 	 git config --unset core.excludesfile &&
-	 git ls-files --exclude-standard --ignored >actual) &&
+	 git ls-files --exclude-standard --ignored --others >actual) &&
 	test_must_be_empty actual
 '
 
diff --git a/t/t3003-ls-files-exclude.sh b/t/t3003-ls-files-exclude.sh
index d5ec333131f9..c41c4f046abf 100755
--- a/t/t3003-ls-files-exclude.sh
+++ b/t/t3003-ls-files-exclude.sh
@@ -29,11 +29,11 @@ test_expect_success 'add file to gitignore' '
 '
 check_all_output
 
-test_expect_success 'ls-files -i lists only tracked-but-ignored files' '
+test_expect_success 'ls-files -i -c lists only tracked-but-ignored files' '
 	echo content >other-file &&
 	git add other-file &&
 	echo file >expect &&
-	git ls-files -i --exclude-standard >output &&
+	git ls-files -i -c --exclude-standard >output &&
 	test_cmp expect output
 '
 
-- 
gitgitgadget


* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-07  4:04 ` [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
@ 2021-05-07  4:27   ` Eric Sunshine
  2021-05-07  5:00     ` Elijah Newren
  2021-05-08 11:13   ` Philip Oakley
  1 sibling, 1 reply; 90+ messages in thread
From: Eric Sunshine @ 2021-05-07  4:27 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: Git List, Elijah Newren

On Fri, May 7, 2021 at 12:05 AM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> PNPM is apparently creating deeply nested (but ignored) directory
> structures; traversing them is costly performance-wise, unnecessary, and
> in some cases is even throwing warnings/errors because the paths are too
> long to handle on various platforms.  Add a testcase that demonstrates
> this problem.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
> +test_expect_failure 'avoid traversing into ignored directories' '
> +       test_when_finished rm -f output error &&
> +       test_create_repo avoid-traversing-deep-hierarchy &&
> +       (
> +               cd avoid-traversing-deep-hierarchy &&
> +
> +               >directory-random-file.txt &&
> +               # Put this file under directory400/directory399/.../directory1/
> +               depth=400 &&
> +               for x in $(test_seq 1 $depth); do
> +                       mkdir "tmpdirectory$x" &&
> +                       mv directory* "tmpdirectory$x" &&
> +                       mv "tmpdirectory$x" "directory$x"
> +               done &&

Is this expensive/slow loop needed because you'd otherwise run afoul
of command-line length limits on some platforms if you tried creating
the entire mess of directories with a single `mkdir -p`?

> +               git clean -ffdxn -e directory$depth >../output 2>../error &&
> +
> +               test_must_be_empty ../output &&
> +               # We especially do not want things like
> +               #   "warning: could not open directory "
> +               # appearing in the error output.  It is true that directories
> +               # that are too long cannot be opened, but we should not be
> +               # recursing into those directories anyway since the very first
> +               # level is ignored.
> +               test_must_be_empty ../error &&
> +
> +               # alpine-linux-musl fails to "rm -rf" a directory with such
> +               # a deeply nested hierarchy.  Help it out by deleting the
> +               # leading directories ourselves.  Super slow, but, what else
> +               # can we do?  Without this, we will hit a
> +               #     error: Tests passed but test cleanup failed; aborting
> +               # so do this ugly manual cleanup...
> +               while test ! -f directory-random-file.txt; do
> +                       name=$(ls -d directory*) &&
> +                       mv $name/* . &&
> +                       rmdir $name
> +               done

Shouldn't this cleanup loop be under the control of
test_when_finished() to ensure it is invoked regardless of how the
test exits?

> +       )
> +'


* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-07  4:27   ` Eric Sunshine
@ 2021-05-07  5:00     ` Elijah Newren
  2021-05-07  5:31       ` Eric Sunshine
  2021-05-07 23:05       ` Jeff King
  0 siblings, 2 replies; 90+ messages in thread
From: Elijah Newren @ 2021-05-07  5:00 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Elijah Newren via GitGitGadget, Git List

On Thu, May 6, 2021 at 9:27 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Fri, May 7, 2021 at 12:05 AM Elijah Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > PNPM is apparently creating deeply nested (but ignored) directory
> > structures; traversing them is costly performance-wise, unnecessary, and
> > in some cases is even throwing warnings/errors because the paths are too
> > long to handle on various platforms.  Add a testcase that demonstrates
> > this problem.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> > diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> > @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
> > +test_expect_failure 'avoid traversing into ignored directories' '
> > +       test_when_finished rm -f output error &&
> > +       test_create_repo avoid-traversing-deep-hierarchy &&
> > +       (
> > +               cd avoid-traversing-deep-hierarchy &&
> > +
> > +               >directory-random-file.txt &&
> > +               # Put this file under directory400/directory399/.../directory1/
> > +               depth=400 &&
> > +               for x in $(test_seq 1 $depth); do
> > +                       mkdir "tmpdirectory$x" &&
> > +                       mv directory* "tmpdirectory$x" &&
> > +                       mv "tmpdirectory$x" "directory$x"
> > +               done &&
>
> Is this expensive/slow loop needed because you'd otherwise run afoul
> of command-line length limits on some platforms if you tried creating
> the entire mess of directories with a single `mkdir -p`?

The whole point is creating a path long enough that it runs afoul of
limits, yes.

If we had an alternative way to check whether dir.c actually recursed
into a directory, then I could dispense with this and just have a
single directory (and it could be named a single character long for
that matter too), but I don't know of a good way to do that.  (Some
possibilities I considered along that route are mentioned at
https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)
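
For readers who want to see what the loop builds, here is a small
standalone sketch of the same wrap-and-rename technique outside the test
suite.  The depth of 5 and the use of a plain while loop instead of
test_seq are assumptions made to keep it quick and self-contained.  Each
step only ever names short relative paths, so presumably no single command
has to be handed the full nested path.

```shell
#!/bin/sh
# Sketch only: depth=5 is an arbitrary small value; the test uses 400.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

: >directory-random-file.txt
depth=5
x=1
while [ "$x" -le "$depth" ]; do
	# Wrap whatever currently starts with "directory" in a new parent,
	# then rename that parent so the next iteration picks it up.
	mkdir "tmpdirectory$x" &&
	mv directory* "tmpdirectory$x" &&
	mv "tmpdirectory$x" "directory$x" || exit 1
	x=$((x + 1))
done

# The file ends up under directory5/directory4/.../directory1/.
found=$(find . -type f)
echo "$found"
```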

> > +               git clean -ffdxn -e directory$depth >../output 2>../error &&
> > +
> > +               test_must_be_empty ../output &&
> > +               # We especially do not want things like
> > +               #   "warning: could not open directory "
> > +               # appearing in the error output.  It is true that directories
> > +               # that are too long cannot be opened, but we should not be
> > +               # recursing into those directories anyway since the very first
> > +               # level is ignored.
> > +               test_must_be_empty ../error &&
> > +
> > +               # alpine-linux-musl fails to "rm -rf" a directory with such
> > +               # a deeply nested hierarchy.  Help it out by deleting the
> > +               # leading directories ourselves.  Super slow, but, what else
> > +               # can we do?  Without this, we will hit a
> > +               #     error: Tests passed but test cleanup failed; aborting
> > +               # so do this ugly manual cleanup...
> > +               while test ! -f directory-random-file.txt; do
> > +                       name=$(ls -d directory*) &&
> > +                       mv $name/* . &&
> > +                       rmdir $name
> > +               done
>
> Shouldn't this cleanup loop be under the control of
> test_when_finished() to ensure it is invoked regardless of how the
> test exits?

I thought about that, but if the test fails, it seems nicer to leave
everything behind so it can be inspected.  It's similar to test_done,
which will only delete the $TRASH_DIRECTORY if all the tests passed.
So no, I don't think this should be under the control of
test_when_finished.


* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-07  5:00     ` Elijah Newren
@ 2021-05-07  5:31       ` Eric Sunshine
  2021-05-07  5:42         ` Elijah Newren
  2021-05-07 23:05       ` Jeff King
  1 sibling, 1 reply; 90+ messages in thread
From: Eric Sunshine @ 2021-05-07  5:31 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git List

On Fri, May 7, 2021 at 1:01 AM Elijah Newren <newren@gmail.com> wrote:
> On Thu, May 6, 2021 at 9:27 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > Is this expensive/slow loop needed because you'd otherwise run afoul
> > of command-line length limits on some platforms if you tried creating
> > the entire mess of directories with a single `mkdir -p`?
>
> The whole point is creating a path long enough that it runs afoul of
> limits, yes.
>
> If we had an alternative way to check whether dir.c actually recursed
> into a directory, then I could dispense with this and just have a
> single directory (and it could be named a single character long for
> that matter too), but I don't know of a good way to do that.  (Some
> possibilities I considered along that route are mentioned at
> https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)

Thanks, I read that exchange (of course) immediately after sending the
above question.

> > > +               while test ! -f directory-random-file.txt; do
> > > +                       name=$(ls -d directory*) &&
> > > +                       mv $name/* . &&
> > > +                       rmdir $name
> > > +               done
> >
> > Shouldn't this cleanup loop be under the control of
> > test_when_finished() to ensure it is invoked regardless of how the
> > test exits?
>
> I thought about that, but if the test fails, it seems nicer to leave
> everything behind so it can be inspected.  It's similar to test_done,
> which will only delete the $TRASH_DIRECTORY if all the tests passed.
> So no, I don't think this should be under the control of
> test_when_finished.

I may be confused, but I'm not following this reasoning. If you're
using `-i` to debug a failure within the test, then the
test_when_finished() cleanup actions won't be triggered anyhow
(they're suppressed by `-i`), so everything will be left behind as
desired.

The problem with not placing this under control of
test_when_finished() is that, if something in the test proper does
break, after the "test failed" message, you'll get the undesirable
alpine-linux-musl behavior you explained in your earlier email where
test_done() bombs out.

* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-07  5:31       ` Eric Sunshine
@ 2021-05-07  5:42         ` Elijah Newren
  2021-05-07  5:56           ` Eric Sunshine
  0 siblings, 1 reply; 90+ messages in thread
From: Elijah Newren @ 2021-05-07  5:42 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Elijah Newren via GitGitGadget, Git List

On Thu, May 6, 2021 at 10:32 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Fri, May 7, 2021 at 1:01 AM Elijah Newren <newren@gmail.com> wrote:
> > On Thu, May 6, 2021 at 9:27 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > > Is this expensive/slow loop needed because you'd otherwise run afoul
> > > of command-line length limits on some platforms if you tried creating
> > > the entire mess of directories with a single `mkdir -p`?
> >
> > The whole point is creating a path long enough that it runs afoul of
> > limits, yes.
> >
> > If we had an alternative way to check whether dir.c actually recursed
> > into a directory, then I could dispense with this and just have a
> > single directory (and it could be named a single character long for
> > that matter too), but I don't know of a good way to do that.  (Some
> > possibilities I considered along that route are mentioned at
> > https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)
>
> Thanks, I read that exchange (of course) immediately after sending the
> above question.
>
> > > > +               while test ! -f directory-random-file.txt; do
> > > > +                       name=$(ls -d directory*) &&
> > > > +                       mv $name/* . &&
> > > > +                       rmdir $name
> > > > +               done
> > >
> > > Shouldn't this cleanup loop be under the control of
> > > test_when_finished() to ensure it is invoked regardless of how the
> > > test exits?
> >
> > I thought about that, but if the test fails, it seems nicer to leave
> > everything behind so it can be inspected.  It's similar to test_done,
> > which will only delete the $TRASH_DIRECTORY if all the tests passed.
> > So no, I don't think this should be under the control of
> > test_when_finished.
>
> I may be confused, but I'm not following this reasoning. If you're
> using `-i` to debug a failure within the test, then the
> test_when_finished() cleanup actions won't be triggered anyhow
> (they're suppressed by `-i`), so everything will be left behind as
> desired.

I didn't know that about --immediate.  It's good to know.  However,
not all debugging is done with -i; someone can also just run the
testsuite expecting everything to pass, see a failure, and then decide
to go look around (and then maybe re-run with -i if the initial
looking around isn't clear).  I do that every once in a while.

> The problem with not placing this under control of
> test_when_finished() is that, if something in the test proper does
> break, after the "test failed" message, you'll get the undesirable
> alpine-linux-musl behavior you explained in your earlier email where
> test_done() bombs out.

Unless I'm misunderstanding the test_done() code (I'm looking at
test-lib.sh, lines 1149-1183), test_done() only bombs out when it
tries to "rm -rf $TRASH_DIRECTORY", and it only runs that command if
there are 0 test failures.  So, if
something in the test proper does break, that by itself will prevent
test_done() from bombing out.
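
To make that concrete, here is a minimal sketch of the behavior as I
read it.  Note that finish_sketch and its two arguments are made-up
names for illustration; this is not the actual test-lib.sh code, which
does a lot more:

```shell
# Hypothetical simplification of the cleanup decision described above;
# the real test_done() in test-lib.sh does considerably more.
finish_sketch () {
	failures=$1
	trash=$2
	if test "$failures" -eq 0
	then
		# All tests passed: remove the trash directory.  This is the
		# only step that can trip over a too-deep hierarchy.
		rm -rf "$trash"
	else
		# Any failure: leave everything behind for inspection.
		echo "keeping $trash for inspection"
	fi
}
```

With a non-zero failure count the "rm -rf" is never reached, so the
deep hierarchy cannot make the cleanup step fail.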

* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-07  5:42         ` Elijah Newren
@ 2021-05-07  5:56           ` Eric Sunshine
  0 siblings, 0 replies; 90+ messages in thread
From: Eric Sunshine @ 2021-05-07  5:56 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git List

On Fri, May 7, 2021 at 1:42 AM Elijah Newren <newren@gmail.com> wrote:
> On Thu, May 6, 2021 at 10:32 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > I may be confused, but I'm not following this reasoning. If you're
> > using `-i` to debug a failure within the test, then the
> > test_when_finished() cleanup actions won't be triggered anyhow
> > (they're suppressed by `-i`), so everything will be left behind as
> > desired.
>
> I didn't know that about --immediate.  It's good to know.  However,
> not all debugging is done with -i; someone can also just run the
> testsuite expecting everything to pass, see a failure, and then decide
> to go look around (and then maybe re-run with -i if the initial
> looking around isn't clear).  I do that every once in a while.

That's certainly an approach, and it's made easier when each test
creates its own repo (as the tests you write typically do).

In general, though, the majority of Git test scripts run all their
tests in a single repo (per test script), with the result that state
from a failed test is very frequently clobbered by subsequent tests,
which is why --immediate is so useful (it stops the script as soon as
one test fails, so the test state is preserved as well as it can be).
Due to the "clobbering" problem, I don't think I've ever tried
debugging a failed test without using --immediate.

> > The problem with not placing this under control of
> > test_when_finished() is that, if something in the test proper does
> > break, after the "test failed" message, you'll get the undesirable
> > alpine-linux-musl behavior you explained in your earlier email where
> > test_done() bombs out.
>
> Unless I'm misunderstanding the test_done() code (I'm looking at
> test-lib.sh, lines 1149-1183), test_done() only bombs out when it
> tries to "rm -rf $TRASH_DIRECTORY", and it only runs that command if
> there are 0 test failures.  So, if
> something in the test proper does break, that by itself will prevent
> test_done() from bombing out.

I see what you're saying. Okay.

* [PATCH 6/5] dir: update stale description of treat_directory()
  2021-05-07  4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-05-07  4:05 ` [PATCH 5/5] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
@ 2021-05-07 16:22 ` Derrick Stolee
  2021-05-07 17:57   ` Elijah Newren
  2021-05-07 16:27 ` [PATCH 0/5] Directory traversal fixes Derrick Stolee
  2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
  7 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee @ 2021-05-07 16:22 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren

On 5/7/2021 12:04 AM, Elijah Newren via GitGitGadget wrote:
> This patchset fixes a few directory traversal issues, where fill_directory()
> would traverse into directories that it shouldn't and not traverse into
> directories that it should. One of these issues was reported recently on
> this list[1], another was found at $DAYJOB.
> 
> The fifth patch might have backward compatibility implications, but is easy
> to review. Even if the logic in dir.c makes your eyes glaze over, at least
> take a look at the fifth patch.

My eyes were glazing over, so I went to read the whole treat_directory()
method and its related documentation comment. I found it to be a bit
confusing that it was referencing names that were deprecated 12 years ago.

Here is a patch that you could add to this series to improve these
comments.

Thanks,
-Stolee

-- >8 --

From 587a94ac396c969b6e7734ee46afeac20e87ccb9 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Fri, 7 May 2021 12:14:13 -0400
Subject: [PATCH] dir: update stale description of treat_directory()

The documentation comment for treat_directory() was originally written
in 095952 (Teach directory traversal about subprojects, 2007-04-11)
which was before the 'struct dir_struct' split its bitfield of named
options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct
dir_struct into a single variable, 2009-02-16). When those flags
changed, the comment became stale, since members like
'show_other_directories' transitioned into flags like
DIR_SHOW_OTHER_DIRECTORIES.

Update the comments for treat_directory() to use these flag names rather
than the old member names.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/dir.c b/dir.c
index 3beb8e17a83..0a0138bc1aa 100644
--- a/dir.c
+++ b/dir.c
@@ -1749,13 +1749,13 @@ static enum exist_status directory_exists_in_index(struct index_state *istate,
  * Case 3: if we didn't have it in the index previously, we
  * have a few sub-cases:
  *
- *  (a) if "show_other_directories" is true, we show it as
- *      just a directory, unless "hide_empty_directories" is
+ *  (a) if DIR_SHOW_OTHER_DIRECTORIES flag is set, we show it as
+ *      just a directory, unless DIR_HIDE_EMPTY_DIRECTORIES is
  *      also true, in which case we need to check if it contains any
  *      untracked and / or ignored files.
- *  (b) if it looks like a git directory, and we don't have
- *      'no_gitlinks' set we treat it as a gitlink, and show it
- *      as a directory.
+ *  (b) if it looks like a git directory and we don't have the
+ *      DIR_NO_GITLINKS flag, then we treat it as a gitlink, and
+ *      show it as a directory.
  *  (c) otherwise, we recurse into it.
  */
 static enum path_treatment treat_directory(struct dir_struct *dir,
@@ -1843,7 +1843,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_recurse;
 	}
 
-	/* This is the "show_other_directories" case */
 	assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES);
 
 	/*
@@ -1858,7 +1857,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	/* Special cases for where this directory is excluded/ignored */
 	if (excluded) {
 		/*
-		 * In the show_other_directories case, if we're not
+		 * If DIR_SHOW_OTHER_DIRECTORIES is set and we're not
 		 * hiding empty directories, there is no need to
 		 * recurse into an ignored directory.
 		 */
-- 
2.31.1.vfs.0.0.80.gb082c853c0e



* Re: [PATCH 0/5] Directory traversal fixes
  2021-05-07  4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget
                   ` (5 preceding siblings ...)
  2021-05-07 16:22 ` [PATCH 6/5] dir: update stale description of treat_directory() Derrick Stolee
@ 2021-05-07 16:27 ` Derrick Stolee
  2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
  7 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee @ 2021-05-07 16:27 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren

On 5/7/2021 12:04 AM, Elijah Newren via GitGitGadget wrote:
> This patchset fixes a few directory traversal issues, where fill_directory()
> would traverse into directories that it shouldn't and not traverse into
> directories that it should. One of these issues was reported recently on
> this list[1], another was found at $DAYJOB.
> 
> The fifth patch might have backward compatibility implications, but is easy
> to review. Even if the logic in dir.c makes your eyes glaze over, at least
> take a look at the fifth patch.
> 
> Also, if anyone has any ideas about a better place to put the "Some
> sidenotes" from the third commit message rather than keeping them in a
> random commit message, that might be helpful too.

As for your patches themselves, I can't claim to understand all the
complicated details about how treat_directory() is working, but your
patches are well organized and the new tests are the real proof that
this is working as intended.

Thanks for the attention to detail here.

-Stolee

* Re: [PATCH 6/5] dir: update stale description of treat_directory()
  2021-05-07 16:22 ` [PATCH 6/5] dir: update stale description of treat_directory() Derrick Stolee
@ 2021-05-07 17:57   ` Elijah Newren
  0 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren @ 2021-05-07 17:57 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Fri, May 7, 2021 at 9:22 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/7/2021 12:04 AM, Elijah Newren via GitGitGadget wrote:
> > This patchset fixes a few directory traversal issues, where fill_directory()
> > would traverse into directories that it shouldn't and not traverse into
> > directories that it should. One of these issues was reported recently on
> > this list[1], another was found at $DAYJOB.
> >
> > The fifth patch might have backward compatibility implications, but is easy
> > to review. Even if the logic in dir.c makes your eyes glaze over, at least
> > take a look at the fifth patch.
>
> My eyes were glazing over, so I went to read the whole treat_directory()
> method and its related documentation comment. I found it to be a bit
> confusing that it was referencing names that were deprecated 12 years ago.
>
> Here is a patch that you could add to this series to improve these
> comments.
>
> Thanks,
> -Stolee
>
> -- >8 --
>
> From 587a94ac396c969b6e7734ee46afeac20e87ccb9 Mon Sep 17 00:00:00 2001
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Fri, 7 May 2021 12:14:13 -0400
> Subject: [PATCH] dir: update stale description of treat_directory()
>
> The documentation comment for treat_directory() was originally written
> in 095952 (Teach directory traversal about subprojects, 2007-04-11)
> which was before the 'struct dir_struct' split its bitfield of named
> options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct
> dir_struct into a single variable, 2009-02-16). When those flags
> changed, the comment became stale, since members like
> 'show_other_directories' transitioned into flags like
> DIR_SHOW_OTHER_DIRECTORIES.
>
> Update the comments for treat_directory() to use these flag names rather
> than the old member names.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 3beb8e17a83..0a0138bc1aa 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1749,13 +1749,13 @@ static enum exist_status directory_exists_in_index(struct index_state *istate,
>   * Case 3: if we didn't have it in the index previously, we
>   * have a few sub-cases:
>   *
> - *  (a) if "show_other_directories" is true, we show it as
> - *      just a directory, unless "hide_empty_directories" is
> + *  (a) if DIR_SHOW_OTHER_DIRECTORIES flag is set, we show it as
> + *      just a directory, unless DIR_HIDE_EMPTY_DIRECTORIES is
>   *      also true, in which case we need to check if it contains any
>   *      untracked and / or ignored files.
> - *  (b) if it looks like a git directory, and we don't have
> - *      'no_gitlinks' set we treat it as a gitlink, and show it
> - *      as a directory.
> + *  (b) if it looks like a git directory and we don't have the
> + *      DIR_NO_GITLINKS flag, then we treat it as a gitlink, and
> + *      show it as a directory.
>   *  (c) otherwise, we recurse into it.
>   */
>  static enum path_treatment treat_directory(struct dir_struct *dir,
> @@ -1843,7 +1843,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>                 return path_recurse;
>         }
>
> -       /* This is the "show_other_directories" case */
>         assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES);
>
>         /*
> @@ -1858,7 +1857,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>         /* Special cases for where this directory is excluded/ignored */
>         if (excluded) {
>                 /*
> -                * In the show_other_directories case, if we're not
> +                * If DIR_SHOW_OTHER_DIRECTORIES is set and we're not
>                  * hiding empty directories, there is no need to
>                  * recurse into an ignored directory.
>                  */
> --
> 2.31.1.vfs.0.0.80.gb082c853c0e

Looks good to me; I'll give it some more time for other comments to
come in, but when I re-roll, I'll include this patch of yours.

* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-07  5:00     ` Elijah Newren
  2021-05-07  5:31       ` Eric Sunshine
@ 2021-05-07 23:05       ` Jeff King
  2021-05-07 23:15         ` Eric Sunshine
  2021-05-08  0:04         ` Elijah Newren
  1 sibling, 2 replies; 90+ messages in thread
From: Jeff King @ 2021-05-07 23:05 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Eric Sunshine, Elijah Newren via GitGitGadget, Git List

On Thu, May 06, 2021 at 10:00:49PM -0700, Elijah Newren wrote:

> > > +               >directory-random-file.txt &&
> > > +               # Put this file under directory400/directory399/.../directory1/
> > > +               depth=400 &&
> > > +               for x in $(test_seq 1 $depth); do
> > > +                       mkdir "tmpdirectory$x" &&
> > > +                       mv directory* "tmpdirectory$x" &&
> > > +                       mv "tmpdirectory$x" "directory$x"
> > > +               done &&
> >
> > Is this expensive/slow loop needed because you'd otherwise run afoul
> > of command-line length limits on some platforms if you tried creating
> > the entire mess of directories with a single `mkdir -p`?
> 
> The whole point is creating a path long enough that it runs afoul of
> limits, yes.
> 
> If we had an alternative way to check whether dir.c actually recursed
> into a directory, then I could dispense with this and just have a
> single directory (and it could be named a single character long for
> that matter too), but I don't know of a good way to do that.  (Some
> possibilities I considered along that route are mentioned at
> https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)

I don't have a better way of checking the dir.c behavior. But I think
the other half of Eric's question was: why can't we do this setup way
more efficiently with "mkdir -p"?

I'd be suspicious that it would work portably because of the long path.
But I think the perl I showed earlier would create it in much less time:

  $ touch directory-file
  $ time sh -c '
      for x in $(seq 1 400)
      do
        mkdir tmpdirectory$x &&
	mv directory* tmpdirectory$x &&
	mv tmpdirectory$x directory$x
      done
    '
    real	0m2.222s
    user	0m1.481s
    sys		0m0.816s

  $ time perl -e '
      for (reverse 1..400) {
        my $d = "directory$_";
	mkdir($d) and chdir($d) or die "mkdir($d): $!";
      }
      open(my $fh, ">", "some-file");
    '
    real	0m0.010s
    user	0m0.001s
    sys		0m0.009s

-Peff

* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-07 23:05       ` Jeff King
@ 2021-05-07 23:15         ` Eric Sunshine
  2021-05-08  0:04         ` Elijah Newren
  1 sibling, 0 replies; 90+ messages in thread
From: Eric Sunshine @ 2021-05-07 23:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren, Elijah Newren via GitGitGadget, Git List

On Fri, May 7, 2021 at 7:05 PM Jeff King <peff@peff.net> wrote:
> I don't have a better way of checking the dir.c behavior. But I think
> the other half of Eric's question was: why can't we do this setup way
> more efficiently with "mkdir -p"?

I didn't really have that other half-question, as I understood the
portability ramifications. Rather, I just wanted to make sure the
reason I thought the code was doing the for-loop-plus-mv dance was
indeed correct, and that I wasn't overlooking something non-obvious. I
was also indirectly hinting that that bit of code might deserve an
in-code comment explaining why the for-loop is there so that someone
doesn't come along in the future and try replacing it with `mkdir -p`.
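
For illustration, a scaled-down sketch of what such a commented loop
might look like.  The helper name build_nested is hypothetical, plain
`seq` stands in for test_seq, and the depth is meant to be tiny
compared with the 400 used in the patch:

```shell
# Sketch only: build_nested is a hypothetical helper, not part of the
# patch under discussion.
build_nested () {
	depth=$1 &&
	>directory-random-file.txt &&
	# Wrap the existing tree in one more level per iteration instead of
	# using a single "mkdir -p": each command then only ever names a
	# short relative path, however deep the hierarchy gets.
	for x in $(seq 1 "$depth")
	do
		mkdir "tmpdirectory$x" &&
		mv directory* "tmpdirectory$x" &&
		mv "tmpdirectory$x" "directory$x" || return 1
	done
}
```

Running `build_nested 3` in an empty directory should leave the file at
directory3/directory2/directory1/directory-random-file.txt.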

> I'd be suspicious that it would work portably because of the long path.
> But I think the perl I showed earlier would create it in much less time:
>
>   $ time perl -e '
>       for (reverse 1..400) {
>         my $d = "directory$_";
>         mkdir($d) and chdir($d) or die "mkdir($d): $!";
>       }
>       open(my $fh, ">", "some-file");
>     '

Yep, this and your other Perl code snippet for removing the directory
seemed much nicer than the far more expensive shell for-loop-plus-mv
(especially for Windows folk).

* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-07 23:05       ` Jeff King
  2021-05-07 23:15         ` Eric Sunshine
@ 2021-05-08  0:04         ` Elijah Newren
  2021-05-08  0:10           ` Eric Sunshine
  1 sibling, 1 reply; 90+ messages in thread
From: Elijah Newren @ 2021-05-08  0:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Sunshine, Elijah Newren via GitGitGadget, Git List

On Fri, May 7, 2021 at 4:05 PM Jeff King <peff@peff.net> wrote:
>
> On Thu, May 06, 2021 at 10:00:49PM -0700, Elijah Newren wrote:
>
> > > > +               >directory-random-file.txt &&
> > > > +               # Put this file under directory400/directory399/.../directory1/
> > > > +               depth=400 &&
> > > > +               for x in $(test_seq 1 $depth); do
> > > > +                       mkdir "tmpdirectory$x" &&
> > > > +                       mv directory* "tmpdirectory$x" &&
> > > > +                       mv "tmpdirectory$x" "directory$x"
> > > > +               done &&
> > >
> > > Is this expensive/slow loop needed because you'd otherwise run afoul
> > > of command-line length limits on some platforms if you tried creating
> > > the entire mess of directories with a single `mkdir -p`?
> >
> > The whole point is creating a path long enough that it runs afoul of
> > limits, yes.
> >
> > If we had an alternative way to check whether dir.c actually recursed
> > into a directory, then I could dispense with this and just have a
> > single directory (and it could be named a single character long for
> > that matter too), but I don't know of a good way to do that.  (Some
> > possibilities I considered along that route are mentioned at
> > https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)
>
> I don't have a better way of checking the dir.c behavior. But I think
> the other half of Eric's question was: why can't we do this setup way
> more efficiently with "mkdir -p"?

I think I figured it out.  I now have the test simplified down to just:

test_expect_success 'avoid traversing into ignored directories' '
    test_when_finished rm -f output error trace.* &&
    test_create_repo avoid-traversing-deep-hierarchy &&
    (
        cd avoid-traversing-deep-hierarchy &&

        mkdir -p untracked/subdir/with/a &&
        >untracked/subdir/with/a/random-file.txt &&

        GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
        git clean -ffdxn -e untracked &&

        grep data.*read_directo.*visited ../trace.output \
            | cut -d "|" -f 9 >../trace.relevant &&
        cat >../trace.expect <<-EOF &&
        directories-visited:1
        paths-visited:4
        EOF
        test_cmp ../trace.expect ../trace.relevant
    )
'

This relies on a few extra changes to the code: (1) switching the
existing trace calls in dir.c over to using trace2 variants, and (2)
adding two new counters (visited_directories and visited_paths) that
are output using the trace2 framework.  I'm a little unsure if I
should check the paths-visited counter (will some platform have
additional files in every directory besides '.' and '..'?  Or not have
one of those?), but it is good to have it check that the code in this
case visits no directories other than the toplevel one (i.e. that
directories-visited is 1).
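
For anyone unfamiliar with the grep-plus-cut step in that test, here is
a rough standalone sketch of the mechanics.  The helper name
extract_visited is hypothetical, and the sample lines are made-up
placeholders with nine "|"-separated fields, not real trace2 output:

```shell
# Hypothetical helper: keep only the ninth "|"-separated field of the
# lines that mention the visited counters.
extract_visited () {
	grep "data.*read_directo.*visited" "$1" | cut -d "|" -f 9
}

# Made-up stand-in for a trace file; the field contents are
# placeholders, not real trace2 output.
sample=$(mktemp) &&
cat >"$sample" <<\EOF
f1|f2|f3|data|f5|f6|f7|read_directory|directories-visited:1
f1|f2|f3|data|f5|f6|f7|read_directory|paths-visited:4
f1|f2|f3|region_enter|f5|f6|f7|other|unrelated
EOF
extract_visited "$sample"
```

Only the two counter lines survive the grep, and the cut leaves just
the "key:value" text that test_cmp then compares against the expected
file.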

New patches incoming shortly...

* [PATCH v2 0/8] Directory traversal fixes
  2021-05-07  4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget
                   ` (6 preceding siblings ...)
  2021-05-07 16:27 ` [PATCH 0/5] Directory traversal fixes Derrick Stolee
@ 2021-05-08  0:08 ` Elijah Newren via GitGitGadget
  2021-05-08  0:08   ` [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
                     ` (8 more replies)
  7 siblings, 9 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08  0:08 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Elijah Newren

This patchset fixes a few directory traversal issues, where fill_directory()
would traverse into directories that it shouldn't and not traverse into
directories that it should.

Changes since v1:

 * Added a patch from Stolee to clean up some nearby comments that were made
   out-of-date 12 years ago
 * Added a new RFC patch that switches dir.c from using trace1 to trace2
 * Added a new RFC patch that adds directories-visited and paths-visited
   statistics using the trace2 output, and use that to vastly simplify (and
   accelerate) the t7300 testcase

I'm curious what others think of the backward compatibility ramifications of
the RFC patches, patch 5 & patch 6. And whether my use of trace2 is clean,
idiomatic, correct, etc. I've not used it before for things other than
region_enter & region_leave.

Also, if anyone has any ideas about a better place to put the "Some
sidenotes" from the third commit message rather than keeping them in a
random commit message, that might be helpful too.

[1] See
https://lore.kernel.org/git/DM6PR00MB06829EC5B85E0C5AC595004E894E9@DM6PR00MB0682.namprd00.prod.outlook.com/
or alternatively https://github.com/git-for-windows/git/issues/2732.

Derrick Stolee (1):
  dir: update stale description of treat_directory()

Elijah Newren (7):
  t7300: add testcase showing unnecessary traversal into ignored
    directory
  t3001, t7300: add testcase showcasing missed directory traversal
  dir: avoid unnecessary traversal into ignored directory
  dir: traverse into untracked directories if they may have ignored
    subfiles
  [RFC] ls-files: error out on -i unless -o or -c are specified
  [RFC] dir: convert trace calls to trace2 equivalents
  [RFC] dir: reported number of visited directories and paths with
    trace2

 builtin/ls-files.c                 |   3 +
 dir.c                              | 103 +++++++++------
 dir.h                              |   4 +
 t/t1306-xdg-files.sh               |   2 +-
 t/t3001-ls-files-others-exclude.sh |   5 +
 t/t3003-ls-files-exclude.sh        |   4 +-
 t/t7063-status-untracked-cache.sh  | 194 ++++++++++++++++-------------
 t/t7300-clean.sh                   |  41 ++++++
 t/t7519-status-fsmonitor.sh        |   8 +-
 9 files changed, 238 insertions(+), 126 deletions(-)


base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1020%2Fnewren%2Fdirectory-traversal-fixes-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1020/newren/directory-traversal-fixes-v2
Pull-Request: https://github.com/git/git/pull/1020

Range-diff vs v1:

 1:  a3bd253fa8e8 = 1:  a3bd253fa8e8 t7300: add testcase showing unnecessary traversal into ignored directory
 2:  aa3a41e26eca = 2:  aa3a41e26eca t3001, t7300: add testcase showcasing missed directory traversal
 3:  3c3f6111da13 = 3:  3c3f6111da13 dir: avoid unnecessary traversal into ignored directory
 4:  fad048339b81 = 4:  fad048339b81 dir: traverse into untracked directories if they may have ignored subfiles
 5:  3d8dd00ccd10 = 5:  3d8dd00ccd10 [RFC] ls-files: error out on -i unless -o or -c are specified
 -:  ------------ > 6:  1d825dfdc70b dir: update stale description of treat_directory()
 -:  ------------ > 7:  3a2394506a53 [RFC] dir: convert trace calls to trace2 equivalents
 -:  ------------ > 8:  fba4d65b78c7 [RFC] dir: reported number of visited directories and paths with trace2

-- 
gitgitgadget

* [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
@ 2021-05-08  0:08   ` Elijah Newren via GitGitGadget
  2021-05-08 10:13     ` Junio C Hamano
  2021-05-08 10:19     ` Junio C Hamano
  2021-05-08  0:08   ` [PATCH v2 2/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
                     ` (7 subsequent siblings)
  8 siblings, 2 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08  0:08 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

PNPM is apparently creating deeply nested (but ignored) directory
structures; traversing them is costly performance-wise, unnecessary, and
in some cases is even throwing warnings/errors because the paths are too
long to handle on various platforms.  Add a testcase that demonstrates
this problem.

Initial-test-by: Jason Gore <Jason.Gore@microsoft.com>
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index a74816ca8b46..5f1dc397c11e 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
 	test_must_be_empty actual
 '
 
+test_expect_failure 'avoid traversing into ignored directories' '
+	test_when_finished rm -f output error &&
+	test_create_repo avoid-traversing-deep-hierarchy &&
+	(
+		cd avoid-traversing-deep-hierarchy &&
+
+		>directory-random-file.txt &&
+		# Put this file under directory400/directory399/.../directory1/
+		depth=400 &&
+		for x in $(test_seq 1 $depth); do
+			mkdir "tmpdirectory$x" &&
+			mv directory* "tmpdirectory$x" &&
+			mv "tmpdirectory$x" "directory$x"
+		done &&
+
+		git clean -ffdxn -e directory$depth >../output 2>../error &&
+
+		test_must_be_empty ../output &&
+		# We especially do not want things like
+		#   "warning: could not open directory "
+		# appearing in the error output.  It is true that directories
+		# that are too long cannot be opened, but we should not be
+		# recursing into those directories anyway since the very first
+		# level is ignored.
+		test_must_be_empty ../error &&
+
+		# alpine-linux-musl fails to "rm -rf" a directory with such
+		# a deeply nested hierarchy.  Help it out by deleting the
+		# leading directories ourselves.  Super slow, but, what else
+		# can we do?  Without this, we will hit a
+		#     error: Tests passed but test cleanup failed; aborting
+		# so do this ugly manual cleanup...
+		while test ! -f directory-random-file.txt; do
+			name=$(ls -d directory*) &&
+			mv $name/* . &&
+			rmdir $name
+		done
+	)
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v2 2/8] t3001, t7300: add testcase showcasing missed directory traversal
  2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
  2021-05-08  0:08   ` [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
@ 2021-05-08  0:08   ` Elijah Newren via GitGitGadget
  2021-05-08  0:08   ` [PATCH v2 3/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08  0:08 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

In the last commit, we added a testcase showing that the directory
traversal machinery sometimes traverses into directories unnecessarily.
Here we show that there are cases where it does the opposite: it does
not traverse into directories, despite those directories having
important files that need to be flagged.

Add a testcase showing that `git ls-files -o -i --directory` can omit
some of the files it should be listing, and another showing that `git
clean -fX` can fail to clean out some of the expected files.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t3001-ls-files-others-exclude.sh |  5 +++++
 t/t7300-clean.sh                   | 19 +++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index 1ec7cb57c7a8..ac05d1a17931 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -292,6 +292,11 @@ EOF
 	test_cmp expect actual
 '
 
+test_expect_failure 'ls-files with "**" patterns and --directory' '
+	# Expectation same as previous test
+	git ls-files --directory -o -i --exclude "**/a.1" >actual &&
+	test_cmp expect actual
+'
 
 test_expect_success 'ls-files with "**" patterns and no slashes' '
 	git ls-files -o -i --exclude "one**a.1" >actual &&
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 5f1dc397c11e..337f9af1d74b 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -786,4 +786,23 @@ test_expect_failure 'avoid traversing into ignored directories' '
 	)
 '
 
+test_expect_failure 'traverse into directories that may have ignored entries' '
+	test_when_finished rm -f output &&
+	test_create_repo need-to-traverse-into-hierarchy &&
+	(
+		cd need-to-traverse-into-hierarchy &&
+		mkdir -p modules/foobar/src/generated &&
+		> modules/foobar/src/generated/code.c &&
+		> modules/foobar/Makefile &&
+		echo "/modules/**/src/generated/" >.gitignore &&
+
+		git clean -fX modules/foobar >../output &&
+
+		grep Removing ../output &&
+
+		test_path_is_missing modules/foobar/src/generated/code.c &&
+		test_path_is_file modules/foobar/Makefile
+	)
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v2 3/8] dir: avoid unnecessary traversal into ignored directory
  2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
  2021-05-08  0:08   ` [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
  2021-05-08  0:08   ` [PATCH v2 2/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
@ 2021-05-08  0:08   ` Elijah Newren via GitGitGadget
  2021-05-08  0:08   ` [PATCH v2 4/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08  0:08 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The show_other_directories case in treat_directory() tried to handle
both excludes and untracked files with the same logic, and mishandled
both the excludes and the untracked files in the process, in different
ways.  Split that logic apart, and then focus on the logic for the
excludes; a subsequent commit will address the logic for untracked
files.

For show_other_directories, an excluded directory means that
every path underneath that directory will also be excluded.  Given that
the calling code requested to just show directories when everything
under a directory had the same state (that's what the
"DIR_SHOW_OTHER_DIRECTORIES" flag means), we generally do not need to
traverse into such directories and can just immediately mark them as
ignored (i.e. as path_excluded).  The only reason we cannot just
immediately return path_excluded is the DIR_HIDE_EMPTY_DIRECTORIES flag
and the possibility that the ignored directory is an empty directory.
The code previously treated DIR_SHOW_IGNORED_TOO in most cases as an
exception as well, which was wrong.  It can sometimes reduce the number
of cases where we need to recurse (namely if
DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set), but should not be able
to increase the number of cases where we need to recurse.  Fix the logic
accordingly.
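
As a rough illustration of the rule described above, here is a
standalone sketch.  It is not the actual dir.c code: the flag values
and the helper name excluded_dir_treatment() are made up for the
example, and path_recurse merely stands in for "fall through to the
rest of treat_directory()".

```c
/* Illustrative flag values only; the real definitions live in dir.h. */
enum {
	DIR_HIDE_EMPTY_DIRECTORIES = 1 << 0,
	DIR_SHOW_IGNORED_TOO = 1 << 1,
	DIR_SHOW_IGNORED_TOO_MODE_MATCHING = 1 << 2
};

enum path_treatment { path_excluded, path_recurse };

/*
 * Hypothetical helper: decide what to do with a directory that is
 * already known to be excluded, in the show_other_directories case.
 */
static enum path_treatment excluded_dir_treatment(unsigned flags)
{
	/* Not hiding empty directories: no need to look inside. */
	if (!(flags & DIR_HIDE_EMPTY_DIRECTORIES))
		return path_excluded;

	/* With MODE_MATCHING also set, the recursion can be skipped too. */
	if ((flags & DIR_SHOW_IGNORED_TOO) &&
	    (flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
		return path_excluded;

	/* Otherwise we still have to find out whether it is empty. */
	return path_recurse;
}
```

Note that in this sketch DIR_SHOW_IGNORED_TOO on its own never adds a
reason to recurse; it only removes one when combined with
DIR_SHOW_IGNORED_TOO_MODE_MATCHING.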

Some sidenotes about possible confusion with dir.c:

* "ignored" often refers to an "untracked ignore", i.e. a file which is
  not tracked which matches one of the ignore/exclusion rules.  But you
  can also have a "tracked ignore", a tracked file that happens to match
  one of the ignore/exclusion rules and which dir.c has to worry about
  since "git ls-files -c -i" is supposed to list them.

* The dir code often uses "ignored" and "excluded" interchangeably,
  which you need to keep in mind while reading the code.  Sadly, though,
  it can get very confusing since ignore rules can have exclusions, as
  in the last of the following .gitignore rules:
      .gitignore
      *~
      *.log
      !settings.log
  In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
  will be true due to the '!' negating the rule.  Someone might refer
  to this as "excluded".  That means the file 'settings.log' will not
  match, and thus not be ignored.  So we won't return path_excluded for
  it.  So it's an exclude rule that prevents the file from being an
  exclude.  The non-excluded rules are the ones that result in files
  being excludes.  Great fun, eh?
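
To make the "last matching rule wins" behaviour of the four rules above
concrete, here is a small standalone sketch.  It uses POSIX fnmatch()
as a stand-in for git's own pattern matching, only looks at basenames,
and the helper name is_ignored() is invented for the example; the real
matching in dir.c handles far more cases.

```c
#include <fnmatch.h>
#include <stddef.h>

/* The four rules from the example above, in file order. */
static const char *rules[] = { ".gitignore", "*~", "*.log", "!settings.log" };

/*
 * Hypothetical helper: return 1 if the basename `name` ends up ignored.
 * Later rules override earlier ones; a leading '!' negates a rule.
 */
static int is_ignored(const char *name)
{
	int ignored = 0;
	size_t i;

	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		const char *pattern = rules[i];
		int negated = (pattern[0] == '!');

		if (negated)
			pattern++;	/* match against the text after '!' */
		if (!fnmatch(pattern, name, 0))
			ignored = !negated;
	}
	return ignored;
}
```

Under these assumptions, "debug.log" is ignored because of "*.log",
while "settings.log" matches both "*.log" and the later negated rule
and so ends up not ignored.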

Sometimes it feels like dir.c needs its own glossary with its many
definitions, including the multiply-defined terms.

Reported-by: Jason Gore <Jason.Gore@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c            | 44 +++++++++++++++++++++++++++++---------------
 t/t7300-clean.sh |  2 +-
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/dir.c b/dir.c
index 3474e67e8f3c..4b183749843e 100644
--- a/dir.c
+++ b/dir.c
@@ -1844,6 +1844,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	}
 
 	/* This is the "show_other_directories" case */
+	assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES);
 
 	/*
 	 * If we have a pathspec which could match something _below_ this
@@ -1854,27 +1855,40 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	if (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC)
 		return path_recurse;
 
+	/* Special cases for where this directory is excluded/ignored */
+	if (excluded) {
+		/*
+		 * In the show_other_directories case, if we're not
+		 * hiding empty directories, there is no need to
+		 * recurse into an ignored directory.
+		 */
+		if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+			return path_excluded;
+
+		/*
+		 * Even if we are hiding empty directories, we can still avoid
+		 * recursing into ignored directories for DIR_SHOW_IGNORED_TOO
+		 * if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
+		 */
+		if ((dir->flags & DIR_SHOW_IGNORED_TOO) &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
+			return path_excluded;
+	}
+
 	/*
-	 * Other than the path_recurse case immediately above, we only need
-	 * to recurse into untracked/ignored directories if either of the
-	 * following bits is set:
+	 * Other than the path_recurse case above, we only need to
+	 * recurse into untracked directories if either of the following
+	 * bits is set:
 	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
 	 *                           there are ignored entries below)
 	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
 	 *                                 the directory is empty)
 	 */
-	if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
-		return excluded ? path_excluded : path_untracked;
-
-	/*
-	 * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid
-	 * recursing into ignored directories if the path is excluded and
-	 * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
-	 */
-	if (excluded &&
-	    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
-	    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
-		return path_excluded;
+	if (!excluded &&
+	    !(dir->flags & (DIR_SHOW_IGNORED_TOO |
+			    DIR_HIDE_EMPTY_DIRECTORIES))) {
+		return path_untracked;
+	}
 
 	/*
 	 * Even if we don't want to know all the paths under an untracked or
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 337f9af1d74b..00e5fa35dae3 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -746,7 +746,7 @@ test_expect_success 'clean untracked paths by pathspec' '
 	test_must_be_empty actual
 '
 
-test_expect_failure 'avoid traversing into ignored directories' '
+test_expect_success 'avoid traversing into ignored directories' '
 	test_when_finished rm -f output error &&
 	test_create_repo avoid-traversing-deep-hierarchy &&
 	(
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v2 4/8] dir: traverse into untracked directories if they may have ignored subfiles
  2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-05-08  0:08   ` [PATCH v2 3/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
@ 2021-05-08  0:08   ` Elijah Newren via GitGitGadget
  2021-05-08  0:08   ` [PATCH v2 5/8] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08  0:08 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

A directory that is untracked does not imply that all files under it
should be categorized as untracked; in particular, if the caller is
interested in ignored files, many files or directories underneath the
untracked directory may be ignored.  We previously partially handled
this right with DIR_SHOW_IGNORED_TOO, but missed DIR_SHOW_IGNORED.  It
was not obvious, though, because the logic for untracked and excluded
files had been fused together making it harder to reason about.  The
previous commit split that logic out, making it easier to notice that
DIR_SHOW_IGNORED was missing.  Add it.
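
The resulting condition for untracked (non-excluded) directories can be
sketched as follows.  This is a standalone illustration, not the dir.c
code itself: the flag values are arbitrary and the helper name
must_recurse_into_untracked() is hypothetical.

```c
/* Illustrative flag values only; the real definitions live in dir.h. */
enum {
	DIR_SHOW_IGNORED = 1 << 0,
	DIR_HIDE_EMPTY_DIRECTORIES = 1 << 1,
	DIR_SHOW_IGNORED_TOO = 1 << 2
};

/*
 * Hypothetical helper: return 1 if an untracked directory has to be
 * traversed rather than reported as a single untracked entry.
 */
static int must_recurse_into_untracked(unsigned flags)
{
	return !!(flags & (DIR_SHOW_IGNORED |		/* ignored entries may hide below */
			   DIR_SHOW_IGNORED_TOO |	/* same reason */
			   DIR_HIDE_EMPTY_DIRECTORIES));	/* must check for emptiness */
}
```

Before this change, the equivalent of the DIR_SHOW_IGNORED line was
absent, so a caller setting only that flag would get the directory
reported as untracked without anything underneath being examined.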

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                              | 10 ++++++----
 t/t3001-ls-files-others-exclude.sh |  2 +-
 t/t7300-clean.sh                   |  2 +-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/dir.c b/dir.c
index 4b183749843e..3beb8e17a839 100644
--- a/dir.c
+++ b/dir.c
@@ -1877,15 +1877,17 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 	/*
 	 * Other than the path_recurse case above, we only need to
-	 * recurse into untracked directories if either of the following
+	 * recurse into untracked directories if any of the following
 	 * bits is set:
-	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
-	 *                           there are ignored entries below)
+	 *   - DIR_SHOW_IGNORED (because then we need to determine if
+	 *                       there are ignored entries below)
+	 *   - DIR_SHOW_IGNORED_TOO (same as above)
 	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
 	 *                                 the directory is empty)
 	 */
 	if (!excluded &&
-	    !(dir->flags & (DIR_SHOW_IGNORED_TOO |
+	    !(dir->flags & (DIR_SHOW_IGNORED |
+			    DIR_SHOW_IGNORED_TOO |
 			    DIR_HIDE_EMPTY_DIRECTORIES))) {
 		return path_untracked;
 	}
diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index ac05d1a17931..516c95ea0e82 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -292,7 +292,7 @@ EOF
 	test_cmp expect actual
 '
 
-test_expect_failure 'ls-files with "**" patterns and --directory' '
+test_expect_success 'ls-files with "**" patterns and --directory' '
 	# Expectation same as previous test
 	git ls-files --directory -o -i --exclude "**/a.1" >actual &&
 	test_cmp expect actual
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 00e5fa35dae3..c2a3b7b6a52b 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -786,7 +786,7 @@ test_expect_success 'avoid traversing into ignored directories' '
 	)
 '
 
-test_expect_failure 'traverse into directories that may have ignored entries' '
+test_expect_success 'traverse into directories that may have ignored entries' '
 	test_when_finished rm -f output &&
 	test_create_repo need-to-traverse-into-hierarchy &&
 	(
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v2 5/8] [RFC] ls-files: error out on -i unless -o or -c are specified
  2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-05-08  0:08   ` [PATCH v2 4/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
@ 2021-05-08  0:08   ` Elijah Newren via GitGitGadget
  2021-05-08  0:08   ` [PATCH v2 6/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08  0:08 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

ls-files --ignored can be used together with either --others or
--cached.  After being perplexed for a bit and digging into the code, I
assumed that ls-files -i was just broken and not printing anything and
had a nice patch ready to submit when I finally realized that -i can be
used with --cached to find tracked ignores.

While that was a mistake on my part, and a careful reading of the
documentation could have made this more clear, I suspect this is an
error others are likely to make as well.  In fact, of two uses in our
testsuite, I believe one of the two did make this error.  In t1306.13,
there are NO tracked files, and all the excludes built up and used in
that test and in previous tests thus have to be about untracked files.
However, since they were looking for an empty result, the mistake went
unnoticed as their erroneous command also just happened to give an empty
answer.

-i will most of the time be used with -o, which would suggest we could just
make -i imply -o in the absence of either a -o or -c, but that would be
a backward incompatible break.  Instead, let's just flag -i without
either a -o or -c as an error, and update the two relevant testcases to
specify their intent.
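
The proposed check amounts to something like the following standalone
sketch.  The helper name check_ignored_options() and its plain int
parameters are invented for the example; the actual change is made
inline in cmd_ls_files() and calls die() rather than returning a
message.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return an error message if --ignored was given
 * without either --others or --cached, or NULL if the options are fine.
 */
static const char *check_ignored_options(int show_ignored,
					 int show_others,
					 int show_cached)
{
	if (show_ignored && !show_others && !show_cached)
		return "ls-files --ignored is usually used with --others, "
		       "but --cached is the default.  "
		       "Please specify which you want.";
	return NULL;
}
```

So "ls-files -i -o" and "ls-files -i -c" would both pass this check,
while a bare "ls-files -i" would be rejected.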

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/ls-files.c          | 3 +++
 t/t1306-xdg-files.sh        | 2 +-
 t/t3003-ls-files-exclude.sh | 4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 60a2913a01e9..9f74b1ab2e69 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -748,6 +748,9 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	if (pathspec.nr && error_unmatch)
 		ps_matched = xcalloc(pathspec.nr, 1);
 
+	if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached)
+		die("ls-files --ignored is usually used with --others, but --cached is the default.  Please specify which you want.");
+
 	if ((dir.flags & DIR_SHOW_IGNORED) && !exc_given)
 		die("ls-files --ignored needs some exclude pattern");
 
diff --git a/t/t1306-xdg-files.sh b/t/t1306-xdg-files.sh
index dd87b43be1a6..40d3c42618c0 100755
--- a/t/t1306-xdg-files.sh
+++ b/t/t1306-xdg-files.sh
@@ -116,7 +116,7 @@ test_expect_success 'Exclusion in a non-XDG global ignore file' '
 test_expect_success 'Checking XDG ignore file when HOME is unset' '
 	(sane_unset HOME &&
 	 git config --unset core.excludesfile &&
-	 git ls-files --exclude-standard --ignored >actual) &&
+	 git ls-files --exclude-standard --ignored --others >actual) &&
 	test_must_be_empty actual
 '
 
diff --git a/t/t3003-ls-files-exclude.sh b/t/t3003-ls-files-exclude.sh
index d5ec333131f9..c41c4f046abf 100755
--- a/t/t3003-ls-files-exclude.sh
+++ b/t/t3003-ls-files-exclude.sh
@@ -29,11 +29,11 @@ test_expect_success 'add file to gitignore' '
 '
 check_all_output
 
-test_expect_success 'ls-files -i lists only tracked-but-ignored files' '
+test_expect_success 'ls-files -i -c lists only tracked-but-ignored files' '
 	echo content >other-file &&
 	git add other-file &&
 	echo file >expect &&
-	git ls-files -i --exclude-standard >output &&
+	git ls-files -i -c --exclude-standard >output &&
 	test_cmp expect output
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v2 6/8] dir: update stale description of treat_directory()
  2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
                     ` (4 preceding siblings ...)
  2021-05-08  0:08   ` [PATCH v2 5/8] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
@ 2021-05-08  0:08   ` Derrick Stolee via GitGitGadget
  2021-05-08  0:08   ` [PATCH v2 7/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-08  0:08 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Elijah Newren, Derrick Stolee

From: Derrick Stolee <stolee@gmail.com>

The documentation comment for treat_directory() was originally written
in 095952 (Teach directory traversal about subprojects, 2007-04-11)
which was before the 'struct dir_struct' split its bitfield of named
options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct
dir_struct into a single variable, 2009-02-16). When those flags
changed, the comment became stale, since members like
'show_other_directories' transitioned into flags like
DIR_SHOW_OTHER_DIRECTORIES.

Update the comments for treat_directory() to use these flag names rather
than the old member names.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/dir.c b/dir.c
index 3beb8e17a839..0a0138bc1aa6 100644
--- a/dir.c
+++ b/dir.c
@@ -1749,13 +1749,13 @@ static enum exist_status directory_exists_in_index(struct index_state *istate,
  * Case 3: if we didn't have it in the index previously, we
  * have a few sub-cases:
  *
- *  (a) if "show_other_directories" is true, we show it as
- *      just a directory, unless "hide_empty_directories" is
+ *  (a) if DIR_SHOW_OTHER_DIRECTORIES flag is set, we show it as
+ *      just a directory, unless DIR_HIDE_EMPTY_DIRECTORIES is
  *      also true, in which case we need to check if it contains any
  *      untracked and / or ignored files.
- *  (b) if it looks like a git directory, and we don't have
- *      'no_gitlinks' set we treat it as a gitlink, and show it
- *      as a directory.
+ *  (b) if it looks like a git directory and we don't have the
+ *      DIR_NO_GITLINKS flag, then we treat it as a gitlink, and
+ *      show it as a directory.
  *  (c) otherwise, we recurse into it.
  */
 static enum path_treatment treat_directory(struct dir_struct *dir,
@@ -1843,7 +1843,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_recurse;
 	}
 
-	/* This is the "show_other_directories" case */
 	assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES);
 
 	/*
@@ -1858,7 +1857,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	/* Special cases for where this directory is excluded/ignored */
 	if (excluded) {
 		/*
-		 * In the show_other_directories case, if we're not
+		 * If DIR_SHOW_OTHER_DIRECTORIES is set and we're not
 		 * hiding empty directories, there is no need to
 		 * recurse into an ignored directory.
 		 */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v2 7/8] [RFC] dir: convert trace calls to trace2 equivalents
  2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
                     ` (5 preceding siblings ...)
  2021-05-08  0:08   ` [PATCH v2 6/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
@ 2021-05-08  0:08   ` Elijah Newren via GitGitGadget
  2021-05-08  0:08   ` [PATCH v2 8/8] [RFC] dir: reported number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
  2021-05-08 19:58   ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08  0:08 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                             |  34 ++++--
 t/t7063-status-untracked-cache.sh | 193 +++++++++++++++++-------------
 t/t7519-status-fsmonitor.sh       |   8 +-
 3 files changed, 135 insertions(+), 100 deletions(-)

diff --git a/dir.c b/dir.c
index 0a0138bc1aa6..23c71ab7e9a1 100644
--- a/dir.c
+++ b/dir.c
@@ -2775,12 +2775,29 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	return root;
 }
 
+static void trace2_read_directory_statistics(struct dir_struct *dir,
+					     struct repository *repo)
+{
+	if (!dir->untracked)
+		return;
+	trace2_data_intmax("read_directory", repo,
+			   "node-creation", dir->untracked->dir_created);
+	trace2_data_intmax("read_directory", repo,
+			   "gitignore-invalidation",
+			   dir->untracked->gitignore_invalidated);
+	trace2_data_intmax("read_directory", repo,
+			   "directory-invalidation",
+			   dir->untracked->dir_invalidated);
+	trace2_data_intmax("read_directory", repo,
+			   "opendir", dir->untracked->dir_opened);
+}
+
 int read_directory(struct dir_struct *dir, struct index_state *istate,
 		   const char *path, int len, const struct pathspec *pathspec)
 {
 	struct untracked_cache_dir *untracked;
 
-	trace_performance_enter();
+	trace2_region_enter("dir", "read_directory", istate->repo);
 
 	if (has_symlink_leading_path(path, len)) {
 		trace_performance_leave("read directory %.*s", len, path);
@@ -2799,23 +2816,13 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 	QSORT(dir->entries, dir->nr, cmp_dir_entry);
 	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
 
-	trace_performance_leave("read directory %.*s", len, path);
+	trace2_region_leave("dir", "read_directory", istate->repo);
 	if (dir->untracked) {
 		static int force_untracked_cache = -1;
-		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
 
 		if (force_untracked_cache < 0)
 			force_untracked_cache =
 				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
-		trace_printf_key(&trace_untracked_stats,
-				 "node creation: %u\n"
-				 "gitignore invalidation: %u\n"
-				 "directory invalidation: %u\n"
-				 "opendir: %u\n",
-				 dir->untracked->dir_created,
-				 dir->untracked->gitignore_invalidated,
-				 dir->untracked->dir_invalidated,
-				 dir->untracked->dir_opened);
 		if (force_untracked_cache &&
 			dir->untracked == istate->untracked &&
 		    (dir->untracked->dir_opened ||
@@ -2826,6 +2833,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 			FREE_AND_NULL(dir->untracked);
 		}
 	}
+
+	if (trace2_is_enabled())
+		trace2_read_directory_statistics(dir, istate->repo);
 	return dir->nr;
 }
 
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index accefde72fb1..6bce65b439e3 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -57,6 +57,19 @@ iuc () {
 	return $ret
 }
 
+get_relevant_traces() {
+	# From the GIT_TRACE2_PERF data of the form
+	#    $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT
+	# extract the $RELEVANT_STAT fields.  We don't care about region_enter
+	# or region_leave, or stats for things outside read_directory.
+	INPUT_FILE=$1
+	OUTPUT_FILE=$2
+	grep data.*read_directo $INPUT_FILE \
+	    | cut -d "|" -f 9 \
+	    >$OUTPUT_FILE
+}
+
+
 test_lazy_prereq UNTRACKED_CACHE '
 	{ git update-index --test-untracked-cache; ret=$?; } &&
 	test $ret -ne 1
@@ -129,19 +142,20 @@ EOF
 
 test_expect_success 'status first time (empty cache)' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 3
-gitignore invalidation: 1
-directory invalidation: 0
-opendir: 4
+ ..node-creation:3
+ ..gitignore-invalidation:1
+ ..directory-invalidation:0
+ ..opendir:4
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache after first status' '
@@ -151,19 +165,20 @@ test_expect_success 'untracked cache after first status' '
 
 test_expect_success 'status second time (fully populated cache)' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 0
+ ..node-creation:0
+ ..gitignore-invalidation:0
+ ..directory-invalidation:0
+ ..opendir:0
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache after second status' '
@@ -174,8 +189,8 @@ test_expect_success 'untracked cache after second status' '
 test_expect_success 'modify in root directory, one dir invalidation' '
 	avoid_racy &&
 	: >four &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -189,13 +204,14 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 1
-opendir: 1
+ ..node-creation:0
+ ..gitignore-invalidation:0
+ ..directory-invalidation:1
+ ..opendir:1
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 
 '
 
@@ -223,8 +239,8 @@ EOF
 test_expect_success 'new .gitignore invalidates recursively' '
 	avoid_racy &&
 	echo four >.gitignore &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -238,13 +254,14 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 1
-directory invalidation: 1
-opendir: 4
+ ..node-creation:0
+ ..gitignore-invalidation:1
+ ..directory-invalidation:1
+ ..opendir:4
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 
 '
 
@@ -272,8 +289,8 @@ EOF
 test_expect_success 'new info/exclude invalidates everything' '
 	avoid_racy &&
 	echo three >>.git/info/exclude &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -285,13 +302,14 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 1
-directory invalidation: 0
-opendir: 4
+ ..node-creation:0
+ ..gitignore-invalidation:1
+ ..directory-invalidation:0
+ ..opendir:4
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump' '
@@ -330,8 +348,8 @@ EOF
 '
 
 test_expect_success 'status after the move' '
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -343,13 +361,14 @@ A  one
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 1
+ ..node-creation:0
+ ..gitignore-invalidation:0
+ ..directory-invalidation:0
+ ..opendir:1
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump' '
@@ -389,8 +408,8 @@ EOF
 '
 
 test_expect_success 'status after the move' '
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -402,13 +421,14 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 1
+ ..node-creation:0
+ ..gitignore-invalidation:0
+ ..directory-invalidation:0
+ ..opendir:1
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump' '
@@ -438,8 +458,8 @@ test_expect_success 'set up for sparse checkout testing' '
 '
 
 test_expect_success 'status after commit' '
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -448,13 +468,14 @@ test_expect_success 'status after commit' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 2
+ ..node-creation:0
+ ..gitignore-invalidation:0
+ ..directory-invalidation:0
+ ..opendir:2
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache correct after commit' '
@@ -496,9 +517,9 @@ test_expect_success 'create/modify files, some of which are gitignored' '
 '
 
 test_expect_success 'test sparse status with untracked cache' '
-	: >../trace &&
+	: >../trace.output &&
 	avoid_racy &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -509,13 +530,14 @@ test_expect_success 'test sparse status with untracked cache' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 1
-directory invalidation: 2
-opendir: 2
+ ..node-creation:0
+ ..gitignore-invalidation:1
+ ..directory-invalidation:2
+ ..opendir:2
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache correct after status' '
@@ -539,8 +561,8 @@ EOF
 
 test_expect_success 'test sparse status again with untracked cache' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -551,13 +573,14 @@ test_expect_success 'test sparse status again with untracked cache' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 0
+ ..node-creation:0
+ ..gitignore-invalidation:0
+ ..directory-invalidation:0
+ ..opendir:0
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'set up for test of subdir and sparse checkouts' '
@@ -568,8 +591,8 @@ test_expect_success 'set up for test of subdir and sparse checkouts' '
 
 test_expect_success 'test sparse status with untracked cache and subdir' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -581,13 +604,14 @@ test_expect_success 'test sparse status with untracked cache and subdir' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 2
-gitignore invalidation: 0
-directory invalidation: 1
-opendir: 3
+ ..node-creation:2
+ ..gitignore-invalidation:0
+ ..directory-invalidation:1
+ ..opendir:3
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump (sparse/subdirs)' '
@@ -616,19 +640,20 @@ EOF
 
 test_expect_success 'test sparse status again with untracked cache and subdir' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 0
+ ..node-creation:0
+ ..gitignore-invalidation:0
+ ..directory-invalidation:0
+ ..opendir:0
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'move entry in subdir from untracked to cached' '
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..637391c6ce46 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -334,7 +334,7 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
 		git config core.fsmonitor .git/hooks/fsmonitor-test &&
 		git update-index --untracked-cache &&
 		git update-index --fsmonitor &&
-		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \
+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-before" \
 		git status &&
 		test-tool dump-untracked-cache >../before
 	) &&
@@ -346,12 +346,12 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
 	EOF
 	(
 		cd dot-git &&
-		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \
+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-after" \
 		git status &&
 		test-tool dump-untracked-cache >../after
 	) &&
-	grep "directory invalidation" trace-before >>before &&
-	grep "directory invalidation" trace-after >>after &&
+	grep "directory-invalidation" trace-before | cut -d"|" -f 9 >>before &&
+	grep "directory-invalidation" trace-after  | cut -d"|" -f 9 >>after &&
 	# UNTR extension unchanged, dir invalidation count unchanged
 	test_cmp before after
 '
-- 
gitgitgadget



* [PATCH v2 8/8] [RFC] dir: reported number of visited directories and paths with trace2
  2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
                     ` (6 preceding siblings ...)
  2021-05-08  0:08   ` [PATCH v2 7/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
@ 2021-05-08  0:08   ` Elijah Newren via GitGitGadget
  2021-05-08 19:58   ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08  0:08 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Previously, tests that wanted to verify that we don't traverse into a
deep directory hierarchy that is ignored had no easy way to verify and
enforce that behavior.  Record information about the number of
directories and paths we inspect while traversing the directory
hierarchy in read_directory(), and when trace2 is enabled, print these
statistics.

Make use of these statistics in t7300 to simplify (and vastly improve
the performance of) the "avoid traversing into ignored directories"
test.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                             |  8 ++++++
 dir.h                             |  4 +++
 t/t7063-status-untracked-cache.sh |  1 +
 t/t7300-clean.sh                  | 46 ++++++++++---------------------
 4 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/dir.c b/dir.c
index 23c71ab7e9a1..896a9a62b2c7 100644
--- a/dir.c
+++ b/dir.c
@@ -2455,6 +2455,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 
 	if (open_cached_dir(&cdir, dir, untracked, istate, &path, check_only))
 		goto out;
+	dir->visited_directories++;
 
 	if (untracked)
 		untracked->check_only = !!check_only;
@@ -2463,6 +2464,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 		/* check how the file or directory should be treated */
 		state = treat_path(dir, untracked, &cdir, istate, &path,
 				   baselen, pathspec);
+		dir->visited_paths++;
 
 		if (state > dir_state)
 			dir_state = state;
@@ -2778,6 +2780,10 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 static void trace2_read_directory_statistics(struct dir_struct *dir,
 					     struct repository *repo)
 {
+	trace2_data_intmax("read_directory", repo,
+			   "directories-visited", dir->visited_directories);
+	trace2_data_intmax("read_directory", repo,
+			   "paths-visited", dir->visited_paths);
 	if (!dir->untracked)
 		return;
 	trace2_data_intmax("read_directory", repo,
@@ -2798,6 +2804,8 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 	struct untracked_cache_dir *untracked;
 
 	trace2_region_enter("dir", "read_directory", istate->repo);
+	dir->visited_paths = 0;
+	dir->visited_directories = 0;
 
 	if (has_symlink_leading_path(path, len)) {
 		trace_performance_leave("read directory %.*s", len, path);
diff --git a/dir.h b/dir.h
index 04d886cfce75..22c67907f689 100644
--- a/dir.h
+++ b/dir.h
@@ -336,6 +336,10 @@ struct dir_struct {
 	struct oid_stat ss_info_exclude;
 	struct oid_stat ss_excludes_file;
 	unsigned unmanaged_exclude_files;
+
+	/* Stats about the traversal */
+	unsigned visited_paths;
+	unsigned visited_directories;
 };
 
 /*Count the number of slashes for string s*/
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index 6bce65b439e3..1517c316892f 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -65,6 +65,7 @@ get_relevant_traces() {
 	INPUT_FILE=$1
 	OUTPUT_FILE=$2
 	grep data.*read_directo $INPUT_FILE \
+	    | grep -v visited \
 	    | cut -d "|" -f 9 \
 	    >$OUTPUT_FILE
 }
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index c2a3b7b6a52b..2c10a7b64f11 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -747,42 +747,24 @@ test_expect_success 'clean untracked paths by pathspec' '
 '
 
 test_expect_success 'avoid traversing into ignored directories' '
-	test_when_finished rm -f output error &&
+	test_when_finished rm -f output error trace.* &&
 	test_create_repo avoid-traversing-deep-hierarchy &&
 	(
 		cd avoid-traversing-deep-hierarchy &&
 
-		>directory-random-file.txt &&
-		# Put this file under directory400/directory399/.../directory1/
-		depth=400 &&
-		for x in $(test_seq 1 $depth); do
-			mkdir "tmpdirectory$x" &&
-			mv directory* "tmpdirectory$x" &&
-			mv "tmpdirectory$x" "directory$x"
-		done &&
-
-		git clean -ffdxn -e directory$depth >../output 2>../error &&
-
-		test_must_be_empty ../output &&
-		# We especially do not want things like
-		#   "warning: could not open directory "
-		# appearing in the error output.  It is true that directories
-		# that are too long cannot be opened, but we should not be
-		# recursing into those directories anyway since the very first
-		# level is ignored.
-		test_must_be_empty ../error &&
-
-		# alpine-linux-musl fails to "rm -rf" a directory with such
-		# a deeply nested hierarchy.  Help it out by deleting the
-		# leading directories ourselves.  Super slow, but, what else
-		# can we do?  Without this, we will hit a
-		#     error: Tests passed but test cleanup failed; aborting
-		# so do this ugly manual cleanup...
-		while test ! -f directory-random-file.txt; do
-			name=$(ls -d directory*) &&
-			mv $name/* . &&
-			rmdir $name
-		done
+		mkdir -p untracked/subdir/with/a &&
+		>untracked/subdir/with/a/random-file.txt &&
+
+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+		git clean -ffdxn -e untracked &&
+
+		grep data.*read_directo.*visited ../trace.output \
+			| cut -d "|" -f 9 >../trace.relevant &&
+		cat >../trace.expect <<-EOF &&
+		 directories-visited:1
+		 paths-visited:4
+		EOF
+		test_cmp ../trace.expect ../trace.relevant
 	)
 '
 
-- 
gitgitgadget


* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-08  0:04         ` Elijah Newren
@ 2021-05-08  0:10           ` Eric Sunshine
  2021-05-08 17:20             ` Elijah Newren
  0 siblings, 1 reply; 90+ messages in thread
From: Eric Sunshine @ 2021-05-08  0:10 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jeff King, Elijah Newren via GitGitGadget, Git List

On Fri, May 7, 2021 at 8:04 PM Elijah Newren <newren@gmail.com> wrote:
> I think I figured it out.  I now have the test simplified down to just:
>
> test_expect_success 'avoid traversing into ignored directories' '
>     test_when_finished rm -f output error trace.* &&
>     test_create_repo avoid-traversing-deep-hierarchy &&
>     (
>         mkdir -p untracked/subdir/with/a &&
>         >untracked/subdir/with/a/random-file.txt &&
>
>         GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
>         git clean -ffdxn -e untracked &&
>
>         grep data.*read_directo.*visited ../trace.output \
>             | cut -d "|" -f 9 >../trace.relevant &&
>         cat >../trace.expect <<-EOF &&
>         directories-visited:1
>         paths-visited:4
>         EOF
>         test_cmp ../trace.expect ../trace.relevant
>     )
> '

I believe that you can close the subshell immediately after `git
clean`, which would allow you to drop all the "../" prefixes on
pathnames.
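
As a rough sketch of why that works (plain shell outside the test harness; the temporary directory and file names here are made up for illustration): a `cd` inside a subshell never changes the working directory of the parent shell, so anything done after the closing parenthesis happens in the original directory again.

```shell
# Illustration only; demo_root and the file names are hypothetical.
demo_root=$(mktemp -d) &&
cd "$demo_root" &&
mkdir repo &&
(
	# This cd only affects the subshell.
	cd repo &&
	: >inside.txt
) &&
# Back in the parent shell we are still in $demo_root, so files
# can be created and compared here without any "../" prefix.
: >trace.expect &&
pwd_after_subshell=$(pwd)
```

Under that assumption, only the commands that must run inside the repository need to stay within the parentheses.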

> This relies on a few extra changes to the code: (1) switching the
> existing trace calls in dir.c over to using trace2 variants, and (2)
> adding two new counters (visited_directories and visited_paths) that
> are output using the trace2 framework.  I'm a little unsure if I
> should check the paths-visited counter (will some platform have
> additional files in every directory besides '.' and '..'?  Or not have
> one of those?), but it is good to have it check that the code in this
> case visits no directories other than the toplevel one (i.e. that
> directories-visited is 1).

I can't find the reference, but I recall a reply by jrnieder (to some
proposed patch) that not all platforms are guaranteed to have "." and
".." entries (but I'm not sure we need to worry about that presently).


* Re: [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-08  0:08   ` [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
@ 2021-05-08 10:13     ` Junio C Hamano
  2021-05-08 17:34       ` Elijah Newren
  2021-05-08 10:19     ` Junio C Hamano
  1 sibling, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2021-05-08 10:13 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Elijah Newren <newren@gmail.com>
>
> PNPM is apparently creating deeply nested (but ignored) directory

Sorry, but what's PNPM?

> structures; traversing them is costly performance-wise, unnecessary, and
> in some cases is even throwing warnings/errors because the paths are too
> long to handle on various platforms.  Add a testcase that demonstrates
> this problem.
>
> Initial-test-by: Jason Gore <Jason.Gore@microsoft.com>
> Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> index a74816ca8b46..5f1dc397c11e 100755
> --- a/t/t7300-clean.sh
> +++ b/t/t7300-clean.sh
> @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
>  	test_must_be_empty actual
>  '
>  
> +test_expect_failure 'avoid traversing into ignored directories' '
> +	test_when_finished rm -f output error &&
> +	test_create_repo avoid-traversing-deep-hierarchy &&
> +	(
> +		cd avoid-traversing-deep-hierarchy &&
> +
> +		>directory-random-file.txt &&
> +		# Put this file under directory400/directory399/.../directory1/
> +		depth=400 &&
> +		for x in $(test_seq 1 $depth); do

Style.  Lose semicolon, have "do" on the next line on its own,
aligned with "for".  Tip: you shouldn't need any semicolon other
than the doubled ones in case/esac in your shell script.
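
A minimal sketch of the layout being asked for, using a made-up loop body and a fixed word list rather than the code from the patch:

```shell
# Illustration only: no semicolons, "do" on its own line aligned with "for".
style_demo_output=$(
	for x in 1 2 3
	do
		echo "directory$x"
	done
)
echo "$style_demo_output"
```

The same layout applies to `while` loops, with `do` on its own line under `while`.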

> +			mkdir "tmpdirectory$x" &&
> +			mv directory* "tmpdirectory$x" &&
> +			mv "tmpdirectory$x" "directory$x"
> +		done &&
> +
> +		git clean -ffdxn -e directory$depth >../output 2>../error &&
> +
> +		test_must_be_empty ../output &&
> +		# We especially do not want things like
> +		#   "warning: could not open directory "
> +		# appearing in the error output.  It is true that directories
> +		# that are too long cannot be opened, but we should not be
> +		# recursing into those directories anyway since the very first
> +		# level is ignored.
> +		test_must_be_empty ../error &&
> +
> +		# alpine-linux-musl fails to "rm -rf" a directory with such
> +		# a deeply nested hierarchy.  Help it out by deleting the
> +		# leading directories ourselves.  Super slow, but, what else
> +		# can we do?  Without this, we will hit a
> +		#     error: Tests passed but test cleanup failed; aborting
> +		# so do this ugly manual cleanup...
> +		while test ! -f directory-random-file.txt; do

Ditto.

> +			name=$(ls -d directory*) &&
> +			mv $name/* . &&
> +			rmdir $name
> +		done

Hmph, after seeing the discussion thread of v1, I was expecting to
see a helper in Perl that cd's down and then comes back up while
removing what is in its directory (and I expected something similar
for creation side we saw above).

> +	)
> +'
> +
>  test_done


* Re: [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-08  0:08   ` [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
  2021-05-08 10:13     ` Junio C Hamano
@ 2021-05-08 10:19     ` Junio C Hamano
  2021-05-08 17:41       ` Elijah Newren
  1 sibling, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2021-05-08 10:19 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +		# alpine-linux-musl fails to "rm -rf" a directory with such
> +		# a deeply nested hierarchy.  Help it out by deleting the
> +		# leading directories ourselves.  Super slow, but, what else
> +		# can we do?  Without this, we will hit a
> +		#     error: Tests passed but test cleanup failed; aborting
> +		# so do this ugly manual cleanup...
> +		while test ! -f directory-random-file.txt; do
> +			name=$(ls -d directory*) &&
> +			mv $name/* . &&
> +			rmdir $name
> +		done

Another thing: this not being a test_when_finished handler means it
would not help after a test failure.

Perhaps wrap it in a helper

    clean_deep_hierarchy () {
	rm -fr directory* ||
	while test ! -f directory-random-file.txt
	do
		...
	done
    }

and call it from test_when_finished?


* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-07  4:04 ` [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
  2021-05-07  4:27   ` Eric Sunshine
@ 2021-05-08 11:13   ` Philip Oakley
  2021-05-08 17:20     ` Elijah Newren
  1 sibling, 1 reply; 90+ messages in thread
From: Philip Oakley @ 2021-05-08 11:13 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren

On 07/05/2021 05:04, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> PNPM 

for me, this was a UNA (un-named abbreviation), can we clarify it, e.g
s/PNPM/& package manager/

> is apparently creating deeply nested (but ignored) directory
> structures; traversing them is costly performance-wise, unnecessary, and
> in some cases is even throwing warnings/errors because the paths are too
> long to handle on various platforms.  Add a testcase that demonstrates
> this problem.
>
> Initial-test-by: Jason Gore <Jason.Gore@microsoft.com>
> Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> index a74816ca8b46..5f1dc397c11e 100755
> --- a/t/t7300-clean.sh
> +++ b/t/t7300-clean.sh
> @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
>  	test_must_be_empty actual
>  '
>  
> +test_expect_failure 'avoid traversing into ignored directories' '
> +	test_when_finished rm -f output error &&
> +	test_create_repo avoid-traversing-deep-hierarchy &&
> +	(
> +		cd avoid-traversing-deep-hierarchy &&
> +
> +		>directory-random-file.txt &&
> +		# Put this file under directory400/directory399/.../directory1/
> +		depth=400 &&
> +		for x in $(test_seq 1 $depth); do
> +			mkdir "tmpdirectory$x" &&
> +			mv directory* "tmpdirectory$x" &&
> +			mv "tmpdirectory$x" "directory$x"
> +		done &&
> +
> +		git clean -ffdxn -e directory$depth >../output 2>../error &&
> +
> +		test_must_be_empty ../output &&
> +		# We especially do not want things like
> +		#   "warning: could not open directory "
> +		# appearing in the error output.  It is true that directories
> +		# that are too long cannot be opened, but we should not be
> +		# recursing into those directories anyway since the very first
> +		# level is ignored.
> +		test_must_be_empty ../error &&
> +
> +		# alpine-linux-musl fails to "rm -rf" a directory with such
> +		# a deeply nested hierarchy.  Help it out by deleting the
> +		# leading directories ourselves.  Super slow, but, what else
> +		# can we do?  Without this, we will hit a
> +		#     error: Tests passed but test cleanup failed; aborting
> +		# so do this ugly manual cleanup...
> +		while test ! -f directory-random-file.txt; do
> +			name=$(ls -d directory*) &&
> +			mv $name/* . &&
> +			rmdir $name
> +		done
> +	)
> +'
> +
>  test_done



* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-08  0:10           ` Eric Sunshine
@ 2021-05-08 17:20             ` Elijah Newren
  0 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren @ 2021-05-08 17:20 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Jeff King, Elijah Newren via GitGitGadget, Git List

On Fri, May 7, 2021 at 5:11 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Fri, May 7, 2021 at 8:04 PM Elijah Newren <newren@gmail.com> wrote:
> > I think I figured it out.  I now have the test simplified down to just:
> >
> > test_expect_success 'avoid traversing into ignored directories' '
> >     test_when_finished rm -f output error trace.* &&
> >     test_create_repo avoid-traversing-deep-hierarchy &&
> >     (
> >         mkdir -p untracked/subdir/with/a &&
> >         >untracked/subdir/with/a/random-file.txt &&
> >
> >         GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
> >         git clean -ffdxn -e untracked &&
> >
> >         grep data.*read_directo.*visited ../trace.output \
> >             | cut -d "|" -f 9 >../trace.relevant &&
> >         cat >../trace.expect <<-EOF &&
> >         directories-visited:1
> >         paths-visited:4
> >         EOF
> >         test_cmp ../trace.expect ../trace.relevant
> >     )
> > '
>
> I believe that you can close the subshell immediately after `git
> clean`, which would allow you to drop all the "../" prefixes on
> pathnames.

Ah, good point.  I'll make that fix.

> > This relies on a few extra changes to the code: (1) switching the
> > existing trace calls in dir.c over to using trace2 variants, and (2)
> > adding two new counters (visited_directories and visited_paths) that
> > are output using the trace2 framework.  I'm a little unsure if I
> > should check the paths-visited counter (will some platform have
> > additional files in every directory besides '.' and '..'?  Or not have
> > one of those?), but it is good to have it check that the code in this
> > case visits no directories other than the toplevel one (i.e. that
> > directories-visited is 1).
>
> I can't find the reference, but I recall a reply by jrnieder (to some
> proposed patch) that not all platforms are guaranteed to have "." and
> ".." entries (but I'm not sure we need to worry about that presently).


* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-08 11:13   ` Philip Oakley
@ 2021-05-08 17:20     ` Elijah Newren
  0 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren @ 2021-05-08 17:20 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Sat, May 8, 2021 at 4:13 AM Philip Oakley <philipoakley@iee.email> wrote:
>
> On 07/05/2021 05:04, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > PNPM
>
> for me, this was a UNA (un-named abbreviation), can we clarify it, e.g
> s/PNPM/& package manager/

Will do, thanks.


* Re: [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-08 10:13     ` Junio C Hamano
@ 2021-05-08 17:34       ` Elijah Newren
  0 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren @ 2021-05-08 17:34 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King

On Sat, May 8, 2021 at 3:13 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > PNPM is apparently creating deeply nested (but ignored) directory
>
> Sorry, but what's PNPM?

a package manager; I'll use Philip Oakley's suggestion to make it more clear.

> > structures; traversing them is costly performance-wise, unnecessary, and
> > in some cases is even throwing warnings/errors because the paths are too
> > long to handle on various platforms.  Add a testcase that demonstrates
> > this problem.
> >
> > Initial-test-by: Jason Gore <Jason.Gore@microsoft.com>
> > Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 40 insertions(+)
> >
> > diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> > index a74816ca8b46..5f1dc397c11e 100755
> > --- a/t/t7300-clean.sh
> > +++ b/t/t7300-clean.sh
> > @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
> >       test_must_be_empty actual
> >  '
> >
> > +test_expect_failure 'avoid traversing into ignored directories' '
> > +     test_when_finished rm -f output error &&
> > +     test_create_repo avoid-traversing-deep-hierarchy &&
> > +     (
> > +             cd avoid-traversing-deep-hierarchy &&
> > +
> > +             >directory-random-file.txt &&
> > +             # Put this file under directory400/directory399/.../directory1/
> > +             depth=400 &&
> > +             for x in $(test_seq 1 $depth); do
>
> Style.  Lose semicolon, have "do" on the next line on its own,
> aligned with "for".  Tip: you shouldn't need any semicolon other
> than the doubled ones in case/esac in your shell script.

Thanks.

>
> > +                     mkdir "tmpdirectory$x" &&
> > +                     mv directory* "tmpdirectory$x" &&
> > +                     mv "tmpdirectory$x" "directory$x"
> > +             done &&
> > +
> > +             git clean -ffdxn -e directory$depth >../output 2>../error &&
> > +
> > +             test_must_be_empty ../output &&
> > +             # We especially do not want things like
> > +             #   "warning: could not open directory "
> > +             # appearing in the error output.  It is true that directories
> > +             # that are too long cannot be opened, but we should not be
> > +             # recursing into those directories anyway since the very first
> > +             # level is ignored.
> > +             test_must_be_empty ../error &&
> > +
> > +             # alpine-linux-musl fails to "rm -rf" a directory with such
> > +             # a deeply nested hierarchy.  Help it out by deleting the
> > +             # leading directories ourselves.  Super slow, but, what else
> > +             # can we do?  Without this, we will hit a
> > +             #     error: Tests passed but test cleanup failed; aborting
> > +             # so do this ugly manual cleanup...
> > +             while test ! -f directory-random-file.txt; do
>
> Ditto.

Yep, sorry.

> > +                     name=$(ls -d directory*) &&
> > +                     mv $name/* . &&
> > +                     rmdir $name
> > +             done
>
> Hmph, after seeing the discussion thread of v1, I was expecting to
> see a helper in Perl that cd's down and then comes back up while
> removing what is in its directory (and I expected something similar
> for creation side we saw above).

Hmm, I was a bit unsure of the alternative route I took in patches 7
and 8 (switching trace1 to trace2 in dir.c, then using it to get more
statistics which would allow a much more shallow directory structure
for this test).  I wasn't sure if the strategy seemed acceptable, and
I wanted people to be able to see the two schemes side-by-side, but if
that alternative is acceptable, I want to move patch 7 to the front of
the series, the code change parts of patch 8 as the second patch, and
then squash the rest of patch 8 into this patch vastly simplifying
this testcase and obsoleting everyone's comments on it.

Maybe I should have just refactored the series that way anyway.  I'll
send a reroll that does that, and put all the [RFC] patches first.
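
For anyone following along, a rough sketch of how the simplified test consumes those statistics. The sample lines are invented to resemble GIT_TRACE2_PERF output (the timestamps, source locations and timings are not from a real run); only the grep and cut commands are taken from the test itself.

```shell
# Work in a scratch directory so nothing else is touched.
trace_demo_dir=$(mktemp -d) &&
cd "$trace_demo_dir" &&
# Hypothetical perf-format lines: fields are separated by "|", the
# category is the eighth field and the message is the ninth.
cat >trace.sample <<\EOF
12:00:00.000001 dir.c:1 | d0 | main | region_enter | r1 | 0.001000 | | dir | label:read_directory
12:00:00.000002 dir.c:2 | d0 | main | data | r1 | 0.002000 | 0.001000 | read_directo | directories-visited:1
12:00:00.000003 dir.c:3 | d0 | main | data | r1 | 0.002100 | 0.001100 | read_directo | paths-visited:4
EOF
# Keep only the "data" events for the visited counters, then cut out
# the ninth field, which holds the key:value message.
grep "data.*read_directo.*visited" trace.sample |
cut -d "|" -f 9 >trace.relevant &&
cat trace.relevant
```

Each extracted line keeps the single leading space that followed the last "|" separator, which is why the expected output in the test is indented by one space.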


* Re: [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-08 10:19     ` Junio C Hamano
@ 2021-05-08 17:41       ` Elijah Newren
  0 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren @ 2021-05-08 17:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King

On Sat, May 8, 2021 at 3:19 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +             # alpine-linux-musl fails to "rm -rf" a directory with such
> > +             # a deeply nested hierarchy.  Help it out by deleting the
> > +             # leading directories ourselves.  Super slow, but, what else
> > +             # can we do?  Without this, we will hit a
> > +             #     error: Tests passed but test cleanup failed; aborting
> > +             # so do this ugly manual cleanup...
> > +             while test ! -f directory-random-file.txt; do
> > +                     name=$(ls -d directory*) &&
> > +                     mv $name/* . &&
> > +                     rmdir $name
> > +             done
>
> Another thing: this not being a test_when_finished handler means it
> would not help after a test failure.

Test failures are irrelevant here; this code is here to help
test_done's directory cleanup, which only fires when all tests pass.

But if I restructure the series, this whole section of code
disappears.  I'll do that...

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v3 0/8] Directory traversal fixes
  2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
                     ` (7 preceding siblings ...)
  2021-05-08  0:08   ` [PATCH v2 8/8] [RFC] dir: reported number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
@ 2021-05-08 19:58   ` Elijah Newren via GitGitGadget
  2021-05-08 19:58     ` [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
                       ` (8 more replies)
  8 siblings, 9 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08 19:58 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren

This patchset fixes a few directory traversal issues, where fill_directory()
would traverse into directories that it shouldn't and not traverse into
directories that it should (one of which was originally reported on this
list at [1]). It also includes a few cleanups.

Changes since v2:

 * Move the RFC patches to the front
 * Delete all the ugly test code that stole reviewer attention away from
   the rest of the series. :-) The RFC patches being first allow the test to
   be dramatically simplified and rewritten.
 * Include cleanups suggested by Philip Oakley and Eric Sunshine (the
   cleanups suggested by others are obsolete with the test rewrite).

Patches 1-3 are RFC because

 * (1) I'm not that familiar with trace1 & trace2; I've only used trace2 for
   region_enter() and region_leave() calls before. And I'm unsure if
   removing trace1 counts as a backward compatibility issue or not, though
   the trace2 documentation claims it's meant to replace trace1.
 * (2) The ls-files -i handling to print an error instead of operating as
   before might be considered a backward incompatible change. I want to hear
   others' opinions on that.

Also, if anyone has any ideas about a better place to put the "Some
sidenotes" from the sixth commit message rather than keeping them in a
random commit message, that might be helpful too.

[1] See
https://lore.kernel.org/git/DM6PR00MB06829EC5B85E0C5AC595004E894E9@DM6PR00MB0682.namprd00.prod.outlook.com/
or alternatively https://github.com/git-for-windows/git/issues/2732.

Derrick Stolee (1):
  dir: update stale description of treat_directory()

Elijah Newren (7):
  [RFC] dir: convert trace calls to trace2 equivalents
  [RFC] dir: report number of visited directories and paths with trace2
  [RFC] ls-files: error out on -i unless -o or -c are specified
  t7300: add testcase showing unnecessary traversal into ignored
    directory
  t3001, t7300: add testcase showcasing missed directory traversal
  dir: avoid unnecessary traversal into ignored directory
  dir: traverse into untracked directories if they may have ignored
    subfiles

 builtin/ls-files.c                 |   3 +
 dir.c                              | 103 +++++++++------
 dir.h                              |   4 +
 t/t1306-xdg-files.sh               |   2 +-
 t/t3001-ls-files-others-exclude.sh |   5 +
 t/t3003-ls-files-exclude.sh        |   4 +-
 t/t7063-status-untracked-cache.sh  | 194 ++++++++++++++++-------------
 t/t7300-clean.sh                   |  41 ++++++
 t/t7519-status-fsmonitor.sh        |   8 +-
 9 files changed, 238 insertions(+), 126 deletions(-)


base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1020%2Fnewren%2Fdirectory-traversal-fixes-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1020/newren/directory-traversal-fixes-v3
Pull-Request: https://github.com/git/git/pull/1020

Range-diff vs v2:

 7:  3a2394506a53 = 1:  9f1c0d78d739 [RFC] dir: convert trace calls to trace2 equivalents
 8:  fba4d65b78c7 ! 2:  8b511f228af8 [RFC] dir: reported number of visited directories and paths with trace2
     @@ Metadata
      Author: Elijah Newren <newren@gmail.com>
      
       ## Commit message ##
     -    [RFC] dir: reported number of visited directories and paths with trace2
     +    [RFC] dir: report number of visited directories and paths with trace2
      
     -    Previously, tests that wanted to verify that we don't traverse into a
     -    deep directory hierarchy that is ignored had no easy way to verify and
     -    enforce that behavior.  Record information about the number of
     -    directories and paths we inspect while traversing the directory
     -    hierarchy in read_directory(), and when trace2 is enabled, print these
     -    statistics.
     -
     -    Make use of these statistics in t7300 to simplify (and vastly improve
     -    the performance of) the "avoid traversing into ignored directories"
     -    test.
     +    Provide more statistics in trace2 output that include the number of
     +    directories and total paths visited by the directory traversal logic.
     +    Subsequent patches will take advantage of this to ensure we do not
     +    unnecessarily traverse into ignored directories.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ t/t7063-status-untracked-cache.sh: get_relevant_traces() {
       	    | cut -d "|" -f 9 \
       	    >$OUTPUT_FILE
       }
     -
     - ## t/t7300-clean.sh ##
     -@@ t/t7300-clean.sh: test_expect_success 'clean untracked paths by pathspec' '
     - '
     - 
     - test_expect_success 'avoid traversing into ignored directories' '
     --	test_when_finished rm -f output error &&
     -+	test_when_finished rm -f output error trace.* &&
     - 	test_create_repo avoid-traversing-deep-hierarchy &&
     - 	(
     - 		cd avoid-traversing-deep-hierarchy &&
     - 
     --		>directory-random-file.txt &&
     --		# Put this file under directory400/directory399/.../directory1/
     --		depth=400 &&
     --		for x in $(test_seq 1 $depth); do
     --			mkdir "tmpdirectory$x" &&
     --			mv directory* "tmpdirectory$x" &&
     --			mv "tmpdirectory$x" "directory$x"
     --		done &&
     --
     --		git clean -ffdxn -e directory$depth >../output 2>../error &&
     --
     --		test_must_be_empty ../output &&
     --		# We especially do not want things like
     --		#   "warning: could not open directory "
     --		# appearing in the error output.  It is true that directories
     --		# that are too long cannot be opened, but we should not be
     --		# recursing into those directories anyway since the very first
     --		# level is ignored.
     --		test_must_be_empty ../error &&
     --
     --		# alpine-linux-musl fails to "rm -rf" a directory with such
     --		# a deeply nested hierarchy.  Help it out by deleting the
     --		# leading directories ourselves.  Super slow, but, what else
     --		# can we do?  Without this, we will hit a
     --		#     error: Tests passed but test cleanup failed; aborting
     --		# so do this ugly manual cleanup...
     --		while test ! -f directory-random-file.txt; do
     --			name=$(ls -d directory*) &&
     --			mv $name/* . &&
     --			rmdir $name
     --		done
     -+		mkdir -p untracked/subdir/with/a &&
     -+		>untracked/subdir/with/a/random-file.txt &&
     -+
     -+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
     -+		git clean -ffdxn -e untracked &&
     -+
     -+		grep data.*read_directo.*visited ../trace.output \
     -+			| cut -d "|" -f 9 >../trace.relevant &&
     -+		cat >../trace.expect <<-EOF &&
     -+		 directories-visited:1
     -+		 paths-visited:4
     -+		EOF
     -+		test_cmp ../trace.expect ../trace.relevant
     - 	)
     - '
     - 
 5:  3d8dd00ccd10 = 3:  44a1322c4402 [RFC] ls-files: error out on -i unless -o or -c are specified
 1:  a3bd253fa8e8 ! 4:  dc3d3f247141 t7300: add testcase showing unnecessary traversal into ignored directory
     @@ Metadata
       ## Commit message ##
          t7300: add testcase showing unnecessary traversal into ignored directory
      
     -    PNPM is apparently creating deeply nested (but ignored) directory
     -    structures; traversing them is costly performance-wise, unnecessary, and
     -    in some cases is even throwing warnings/errors because the paths are too
     -    long to handle on various platforms.  Add a testcase that demonstrates
     -    this problem.
     +    The PNPM package manager is apparently creating deeply nested (but
     +    ignored) directory structures; traversing them is costly
     +    performance-wise, unnecessary, and in some cases is even throwing
     +    warnings/errors because the paths are too long to handle on various
     +    platforms.  Add a testcase that checks for such unnecessary directory
     +    traversal.
      
     -    Initial-test-by: Jason Gore <Jason.Gore@microsoft.com>
     -    Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## t/t7300-clean.sh ##
     @@ t/t7300-clean.sh: test_expect_success 'clean untracked paths by pathspec' '
       '
       
      +test_expect_failure 'avoid traversing into ignored directories' '
     -+	test_when_finished rm -f output error &&
     ++	test_when_finished rm -f output error trace.* &&
      +	test_create_repo avoid-traversing-deep-hierarchy &&
      +	(
      +		cd avoid-traversing-deep-hierarchy &&
      +
     -+		>directory-random-file.txt &&
     -+		# Put this file under directory400/directory399/.../directory1/
     -+		depth=400 &&
     -+		for x in $(test_seq 1 $depth); do
     -+			mkdir "tmpdirectory$x" &&
     -+			mv directory* "tmpdirectory$x" &&
     -+			mv "tmpdirectory$x" "directory$x"
     -+		done &&
     ++		mkdir -p untracked/subdir/with/a &&
     ++		>untracked/subdir/with/a/random-file.txt &&
      +
     -+		git clean -ffdxn -e directory$depth >../output 2>../error &&
     ++		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
     ++		git clean -ffdxn -e untracked
     ++	) &&
      +
     -+		test_must_be_empty ../output &&
     -+		# We especially do not want things like
     -+		#   "warning: could not open directory "
     -+		# appearing in the error output.  It is true that directories
     -+		# that are too long cannot be opened, but we should not be
     -+		# recursing into those directories anyway since the very first
     -+		# level is ignored.
     -+		test_must_be_empty ../error &&
     -+
     -+		# alpine-linux-musl fails to "rm -rf" a directory with such
     -+		# a deeply nested hierarchy.  Help it out by deleting the
     -+		# leading directories ourselves.  Super slow, but, what else
     -+		# can we do?  Without this, we will hit a
     -+		#     error: Tests passed but test cleanup failed; aborting
     -+		# so do this ugly manual cleanup...
     -+		while test ! -f directory-random-file.txt; do
     -+			name=$(ls -d directory*) &&
     -+			mv $name/* . &&
     -+			rmdir $name
     -+		done
     -+	)
     ++	grep data.*read_directo.*visited trace.output \
     ++		| cut -d "|" -f 9 >trace.relevant &&
     ++	cat >trace.expect <<-EOF &&
     ++	 directories-visited:1
     ++	 paths-visited:4
     ++	EOF
     ++	test_cmp trace.expect trace.relevant
      +'
      +
       test_done
 2:  aa3a41e26eca ! 5:  73b03a1e8e05 t3001, t7300: add testcase showcasing missed directory traversal
     @@ t/t3001-ls-files-others-exclude.sh: EOF
      
       ## t/t7300-clean.sh ##
      @@ t/t7300-clean.sh: test_expect_failure 'avoid traversing into ignored directories' '
     - 	)
     + 	test_cmp trace.expect trace.relevant
       '
       
      +test_expect_failure 'traverse into directories that may have ignored entries' '
 3:  3c3f6111da13 ! 6:  66ffc7f02d08 dir: avoid unnecessary traversal into ignored directory
     @@ t/t7300-clean.sh: test_expect_success 'clean untracked paths by pathspec' '
       
      -test_expect_failure 'avoid traversing into ignored directories' '
      +test_expect_success 'avoid traversing into ignored directories' '
     - 	test_when_finished rm -f output error &&
     + 	test_when_finished rm -f output error trace.* &&
       	test_create_repo avoid-traversing-deep-hierarchy &&
       	(
 4:  fad048339b81 ! 7:  acde436b220e dir: traverse into untracked directories if they may have ignored subfiles
     @@ t/t3001-ls-files-others-exclude.sh: EOF
      
       ## t/t7300-clean.sh ##
      @@ t/t7300-clean.sh: test_expect_success 'avoid traversing into ignored directories' '
     - 	)
     + 	test_cmp trace.expect trace.relevant
       '
       
      -test_expect_failure 'traverse into directories that may have ignored entries' '
 6:  1d825dfdc70b = 8:  57135c357774 dir: update stale description of treat_directory()

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents
  2021-05-08 19:58   ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
@ 2021-05-08 19:58     ` Elijah Newren via GitGitGadget
  2021-05-10  4:49       ` Junio C Hamano
  2021-05-11 16:17       ` Jeff Hostetler
  2021-05-08 19:58     ` [PATCH v3 2/8] [RFC] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
                       ` (7 subsequent siblings)
  8 siblings, 2 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08 19:58 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
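Note, not part of the commit message: the test changes in this patch stop
reading GIT_TRACE_UNTRACKED_STATS output and instead pull the statistics out
of GIT_TRACE2_PERF output. A minimal sketch of that extraction follows. The
sample lines are invented to match the shape described in the comment in
get_relevant_traces(); real perf output may differ in padding and values.
The extract_stats name and the temporary file are hypothetical and exist
only for this illustration.

```shell
# Illustration only: the sample trace lines below are made up.
tmp=$(mktemp -d)
cat >"$tmp/trace.sample" <<\EOF
12:00:00.000000 dir.c:2779 | d0 | main | data | r1 | 0.001 | 0.001 | read_directo | ..node-creation:3
12:00:00.000001 dir.c:2781 | d0 | main | region_leave | r1 | 0.002 | 0.001 | dir | label:read_directory
12:00:00.000002 dir.c:2783 | d0 | main | data | r1 | 0.002 | 0.002 | read_directo | ..opendir:4
EOF

# Keep only the "data" events emitted for read_directory, then print the
# ninth "|"-separated field, which holds the "key:value" statistic.
extract_stats () {
	grep "data.*read_directo" "$1" | cut -d "|" -f 9
}

extract_stats "$tmp/trace.sample"
```

Each extracted field keeps the leading space that follows the last "|",
which is why the expected output in the tests starts with a space.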
 dir.c                             |  34 ++++--
 t/t7063-status-untracked-cache.sh | 193 +++++++++++++++++-------------
 t/t7519-status-fsmonitor.sh       |   8 +-
 3 files changed, 135 insertions(+), 100 deletions(-)

diff --git a/dir.c b/dir.c
index 3474e67e8f3c..9f7c8debeab3 100644
--- a/dir.c
+++ b/dir.c
@@ -2760,12 +2760,29 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	return root;
 }
 
+static void trace2_read_directory_statistics(struct dir_struct *dir,
+					     struct repository *repo)
+{
+	if (!dir->untracked)
+		return;
+	trace2_data_intmax("read_directory", repo,
+			   "node-creation", dir->untracked->dir_created);
+	trace2_data_intmax("read_directory", repo,
+			   "gitignore-invalidation",
+			   dir->untracked->gitignore_invalidated);
+	trace2_data_intmax("read_directory", repo,
+			   "directory-invalidation",
+			   dir->untracked->dir_invalidated);
+	trace2_data_intmax("read_directory", repo,
+			   "opendir", dir->untracked->dir_opened);
+}
+
 int read_directory(struct dir_struct *dir, struct index_state *istate,
 		   const char *path, int len, const struct pathspec *pathspec)
 {
 	struct untracked_cache_dir *untracked;
 
-	trace_performance_enter();
+	trace2_region_enter("dir", "read_directory", istate->repo);
 
 	if (has_symlink_leading_path(path, len)) {
 		trace_performance_leave("read directory %.*s", len, path);
@@ -2784,23 +2801,13 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 	QSORT(dir->entries, dir->nr, cmp_dir_entry);
 	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
 
-	trace_performance_leave("read directory %.*s", len, path);
+	trace2_region_leave("dir", "read_directory", istate->repo);
 	if (dir->untracked) {
 		static int force_untracked_cache = -1;
-		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
 
 		if (force_untracked_cache < 0)
 			force_untracked_cache =
 				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
-		trace_printf_key(&trace_untracked_stats,
-				 "node creation: %u\n"
-				 "gitignore invalidation: %u\n"
-				 "directory invalidation: %u\n"
-				 "opendir: %u\n",
-				 dir->untracked->dir_created,
-				 dir->untracked->gitignore_invalidated,
-				 dir->untracked->dir_invalidated,
-				 dir->untracked->dir_opened);
 		if (force_untracked_cache &&
 			dir->untracked == istate->untracked &&
 		    (dir->untracked->dir_opened ||
@@ -2811,6 +2818,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 			FREE_AND_NULL(dir->untracked);
 		}
 	}
+
+	if (trace2_is_enabled())
+		trace2_read_directory_statistics(dir, istate->repo);
 	return dir->nr;
 }
 
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index accefde72fb1..6bce65b439e3 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -57,6 +57,19 @@ iuc () {
 	return $ret
 }
 
+get_relevant_traces() {
+	# From the GIT_TRACE2_PERF data of the form
+	#    $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT
+	# extract the $RELEVANT_STAT fields.  We don't care about region_enter
+	# or region_leave, or stats for things outside read_directory.
+	INPUT_FILE=$1
+	OUTPUT_FILE=$2
+	grep data.*read_directo $INPUT_FILE \
+	    | cut -d "|" -f 9 \
+	    >$OUTPUT_FILE
+}
+
+
 test_lazy_prereq UNTRACKED_CACHE '
 	{ git update-index --test-untracked-cache; ret=$?; } &&
 	test $ret -ne 1
@@ -129,19 +142,20 @@ EOF
 
 test_expect_success 'status first time (empty cache)' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 3
-gitignore invalidation: 1
-directory invalidation: 0
-opendir: 4
+ ..node-creation:3
+ ..gitignore-invalidation:1
+ ..directory-invalidation:0
+ ..opendir:4
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache after first status' '
@@ -151,19 +165,20 @@ test_expect_success 'untracked cache after first status' '
 
 test_expect_success 'status second time (fully populated cache)' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 0
+ ..node-creation:0
+ ..gitignore-invalidation:0
+ ..directory-invalidation:0
+ ..opendir:0
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache after second status' '
@@ -174,8 +189,8 @@ test_expect_success 'untracked cache after second status' '
 test_expect_success 'modify in root directory, one dir invalidation' '
 	avoid_racy &&
 	: >four &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -189,13 +204,14 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 1
-opendir: 1
+ ..node-creation:0
+ ..gitignore-invalidation:0
+ ..directory-invalidation:1
+ ..opendir:1
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 
 '
 
@@ -223,8 +239,8 @@ EOF
 test_expect_success 'new .gitignore invalidates recursively' '
 	avoid_racy &&
 	echo four >.gitignore &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -238,13 +254,14 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 1
-directory invalidation: 1
-opendir: 4
+ ..node-creation:0
+ ..gitignore-invalidation:1
+ ..directory-invalidation:1
+ ..opendir:4
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 
 '
 
@@ -272,8 +289,8 @@ EOF
 test_expect_success 'new info/exclude invalidates everything' '
 	avoid_racy &&
 	echo three >>.git/info/exclude &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -285,13 +302,14 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 1
-directory invalidation: 0
-opendir: 4
+ ..node-creation:0
+ ..gitignore-invalidation:1
+ ..directory-invalidation:0
+ ..opendir:4
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump' '
@@ -330,8 +348,8 @@ EOF
 '
 
 test_expect_success 'status after the move' '
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -343,13 +361,14 @@ A  one
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 1
+ ..node-creation:0
+ ..gitignore-invalidation:0
+ ..directory-invalidation:0
+ ..opendir:1
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump' '
@@ -389,8 +408,8 @@ EOF
 '
 
 test_expect_success 'status after the move' '
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -402,13 +421,14 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 1
+ ..node-creation:0
+ ..gitignore-invalidation:0
+ ..directory-invalidation:0
+ ..opendir:1
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump' '
@@ -438,8 +458,8 @@ test_expect_success 'set up for sparse checkout testing' '
 '
 
 test_expect_success 'status after commit' '
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -448,13 +468,14 @@ test_expect_success 'status after commit' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 2
+ ..node-creation:0
+ ..gitignore-invalidation:0
+ ..directory-invalidation:0
+ ..opendir:2
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache correct after commit' '
@@ -496,9 +517,9 @@ test_expect_success 'create/modify files, some of which are gitignored' '
 '
 
 test_expect_success 'test sparse status with untracked cache' '
-	: >../trace &&
+	: >../trace.output &&
 	avoid_racy &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -509,13 +530,14 @@ test_expect_success 'test sparse status with untracked cache' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 1
-directory invalidation: 2
-opendir: 2
+ ..node-creation:0
+ ..gitignore-invalidation:1
+ ..directory-invalidation:2
+ ..opendir:2
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache correct after status' '
@@ -539,8 +561,8 @@ EOF
 
 test_expect_success 'test sparse status again with untracked cache' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -551,13 +573,14 @@ test_expect_success 'test sparse status again with untracked cache' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 0
+ ..node-creation:0
+ ..gitignore-invalidation:0
+ ..directory-invalidation:0
+ ..opendir:0
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'set up for test of subdir and sparse checkouts' '
@@ -568,8 +591,8 @@ test_expect_success 'set up for test of subdir and sparse checkouts' '
 
 test_expect_success 'test sparse status with untracked cache and subdir' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -581,13 +604,14 @@ test_expect_success 'test sparse status with untracked cache and subdir' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 2
-gitignore invalidation: 0
-directory invalidation: 1
-opendir: 3
+ ..node-creation:2
+ ..gitignore-invalidation:0
+ ..directory-invalidation:1
+ ..opendir:3
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump (sparse/subdirs)' '
@@ -616,19 +640,20 @@ EOF
 
 test_expect_success 'test sparse status again with untracked cache and subdir' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 0
+ ..node-creation:0
+ ..gitignore-invalidation:0
+ ..directory-invalidation:0
+ ..opendir:0
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'move entry in subdir from untracked to cached' '
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..637391c6ce46 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -334,7 +334,7 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
 		git config core.fsmonitor .git/hooks/fsmonitor-test &&
 		git update-index --untracked-cache &&
 		git update-index --fsmonitor &&
-		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \
+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-before" \
 		git status &&
 		test-tool dump-untracked-cache >../before
 	) &&
@@ -346,12 +346,12 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
 	EOF
 	(
 		cd dot-git &&
-		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \
+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-after" \
 		git status &&
 		test-tool dump-untracked-cache >../after
 	) &&
-	grep "directory invalidation" trace-before >>before &&
-	grep "directory invalidation" trace-after >>after &&
+	grep "directory-invalidation" trace-before | cut -d"|" -f 9 >>before &&
+	grep "directory-invalidation" trace-after  | cut -d"|" -f 9 >>after &&
 	# UNTR extension unchanged, dir invalidation count unchanged
 	test_cmp before after
 '
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v3 2/8] [RFC] dir: report number of visited directories and paths with trace2
  2021-05-08 19:58   ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
  2021-05-08 19:58     ` [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
@ 2021-05-08 19:58     ` Elijah Newren via GitGitGadget
  2021-05-10  5:00       ` Junio C Hamano
  2021-05-08 19:58     ` [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
                       ` (6 subsequent siblings)
  8 siblings, 1 reply; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08 19:58 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Provide more statistics in trace2 output that include the number of
directories and total paths visited by the directory traversal logic.
Subsequent patches will take advantage of this to ensure we do not
unnecessarily traverse into ignored directories.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
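Note, not part of the commit message: a toy model of the two counters may
help when reading the diff. It is plain shell, not git's traversal code,
and the walk and count helpers are hypothetical names. It bumps one counter
per directory opened and one per entry examined, and it never descends into
an entry whose name matches the "ignored" argument. The counts it prints
are not meant to match the values git reports.

```shell
# Toy model only: this is not how dir.c walks the working tree.
tmp=$(mktemp -d)
mkdir -p "$tmp/untracked/subdir/with/a"
: >"$tmp/untracked/subdir/with/a/random-file.txt"

# walk DIR IGNORED_NAME: count DIR as opened, count every entry in it as
# examined, and recurse into subdirectories unless they are named IGNORED_NAME.
walk () {
	dirs_visited=$((dirs_visited + 1))
	for entry in "$1"/*
	do
		test -e "$entry" || continue
		paths_visited=$((paths_visited + 1))
		if test -d "$entry" && test "${entry##*/}" != "$2"
		then
			walk "$entry" "$2"
		fi
	done
}

# count DIR IGNORED_NAME: reset the counters, walk, and print the totals.
count () {
	dirs_visited=0 paths_visited=0
	walk "$1" "$2"
	echo "directories:$dirs_visited paths:$paths_visited"
}

count "$tmp" untracked   # the ignored directory is examined but not entered
count "$tmp" ""          # nothing is ignored, so every level is opened
```

In this model the first call opens only the top-level directory, while the
second opens every nested level. That gap is the kind of difference the new
statistics make visible to a test.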
 dir.c                             | 8 ++++++++
 dir.h                             | 4 ++++
 t/t7063-status-untracked-cache.sh | 1 +
 3 files changed, 13 insertions(+)

diff --git a/dir.c b/dir.c
index 9f7c8debeab3..dfb174227b36 100644
--- a/dir.c
+++ b/dir.c
@@ -2440,6 +2440,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 
 	if (open_cached_dir(&cdir, dir, untracked, istate, &path, check_only))
 		goto out;
+	dir->visited_directories++;
 
 	if (untracked)
 		untracked->check_only = !!check_only;
@@ -2448,6 +2449,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 		/* check how the file or directory should be treated */
 		state = treat_path(dir, untracked, &cdir, istate, &path,
 				   baselen, pathspec);
+		dir->visited_paths++;
 
 		if (state > dir_state)
 			dir_state = state;
@@ -2763,6 +2765,10 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 static void trace2_read_directory_statistics(struct dir_struct *dir,
 					     struct repository *repo)
 {
+	trace2_data_intmax("read_directory", repo,
+			   "directories-visited", dir->visited_directories);
+	trace2_data_intmax("read_directory", repo,
+			   "paths-visited", dir->visited_paths);
 	if (!dir->untracked)
 		return;
 	trace2_data_intmax("read_directory", repo,
@@ -2783,6 +2789,8 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 	struct untracked_cache_dir *untracked;
 
 	trace2_region_enter("dir", "read_directory", istate->repo);
+	dir->visited_paths = 0;
+	dir->visited_directories = 0;
 
 	if (has_symlink_leading_path(path, len)) {
 		trace_performance_leave("read directory %.*s", len, path);
diff --git a/dir.h b/dir.h
index 04d886cfce75..22c67907f689 100644
--- a/dir.h
+++ b/dir.h
@@ -336,6 +336,10 @@ struct dir_struct {
 	struct oid_stat ss_info_exclude;
 	struct oid_stat ss_excludes_file;
 	unsigned unmanaged_exclude_files;
+
+	/* Stats about the traversal */
+	unsigned visited_paths;
+	unsigned visited_directories;
 };
 
 /*Count the number of slashes for string s*/
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index 6bce65b439e3..1517c316892f 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -65,6 +65,7 @@ get_relevant_traces() {
 	INPUT_FILE=$1
 	OUTPUT_FILE=$2
 	grep data.*read_directo $INPUT_FILE \
+	    | grep -v visited \
 	    | cut -d "|" -f 9 \
 	    >$OUTPUT_FILE
 }
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified
  2021-05-08 19:58   ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
  2021-05-08 19:58     ` [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
  2021-05-08 19:58     ` [PATCH v3 2/8] [RFC] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
@ 2021-05-08 19:58     ` Elijah Newren via GitGitGadget
  2021-05-10  5:09       ` Junio C Hamano
  2021-05-08 19:59     ` [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
                       ` (5 subsequent siblings)
  8 siblings, 1 reply; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08 19:58 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

ls-files --ignored can be used together with either --others or
--cached.  After being perplexed for a bit and digging into the code, I
assumed that ls-files -i was just broken and not printing anything and
had a nice patch ready to submit when I finally realized that -i can be
used with --cached to find tracked ignores.
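
As a rough illustration of the two valid combinations (a sketch, not
part of the patch; the helper name demo_ls_files_ignored and the file
names are made up here, and it assumes git and mktemp are available):

```shell
# Hypothetical helper: build a throwaway repo holding one tracked and one
# untracked file that both match an ignore rule, then list ignored paths
# with either -c (tracked ignores) or -o (untracked ignores).
demo_ls_files_ignored () {
	repo=$(mktemp -d) &&
	(
		cd "$repo" &&
		git init -q . &&
		echo "*.log" >.gitignore &&
		>tracked.log &&
		>untracked.log &&
		# -f is needed because tracked.log matches an ignore rule
		git add -f tracked.log &&
		git ls-files -i "$1" --exclude-standard
	)
	rm -rf "$repo"
}
```

Called with -c this should list only tracked.log, and with -o only
untracked.log, which is the distinction the rest of this message relies
on.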

While that was a mistake on my part, and a careful reading of the
documentation could have made this more clear, I suspect this is an
error others are likely to make as well.  In fact, of two uses in our
testsuite, I believe one of the two did make this error.  In t1306.13,
there are NO tracked files, and all the excludes built up and used in
that test and in previous tests thus have to be about untracked files.
However, since they were looking for an empty result, the mistake went
unnoticed as their erroneous command also just happened to give an empty
answer.

-i will most of the time be used with -o, which would suggest we could just
make -i imply -o in the absence of either a -o or -c, but that would be
a backward incompatible break.  Instead, let's just flag -i without
either a -o or -c as an error, and update the two relevant testcases to
specify their intent.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/ls-files.c          | 3 +++
 t/t1306-xdg-files.sh        | 2 +-
 t/t3003-ls-files-exclude.sh | 4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 60a2913a01e9..9f74b1ab2e69 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -748,6 +748,9 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	if (pathspec.nr && error_unmatch)
 		ps_matched = xcalloc(pathspec.nr, 1);
 
+	if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached)
+		die("ls-files --ignored is usually used with --others, but --cached is the default.  Please specify which you want.");
+
 	if ((dir.flags & DIR_SHOW_IGNORED) && !exc_given)
 		die("ls-files --ignored needs some exclude pattern");
 
diff --git a/t/t1306-xdg-files.sh b/t/t1306-xdg-files.sh
index dd87b43be1a6..40d3c42618c0 100755
--- a/t/t1306-xdg-files.sh
+++ b/t/t1306-xdg-files.sh
@@ -116,7 +116,7 @@ test_expect_success 'Exclusion in a non-XDG global ignore file' '
 test_expect_success 'Checking XDG ignore file when HOME is unset' '
 	(sane_unset HOME &&
 	 git config --unset core.excludesfile &&
-	 git ls-files --exclude-standard --ignored >actual) &&
+	 git ls-files --exclude-standard --ignored --others >actual) &&
 	test_must_be_empty actual
 '
 
diff --git a/t/t3003-ls-files-exclude.sh b/t/t3003-ls-files-exclude.sh
index d5ec333131f9..c41c4f046abf 100755
--- a/t/t3003-ls-files-exclude.sh
+++ b/t/t3003-ls-files-exclude.sh
@@ -29,11 +29,11 @@ test_expect_success 'add file to gitignore' '
 '
 check_all_output
 
-test_expect_success 'ls-files -i lists only tracked-but-ignored files' '
+test_expect_success 'ls-files -i -c lists only tracked-but-ignored files' '
 	echo content >other-file &&
 	git add other-file &&
 	echo file >expect &&
-	git ls-files -i --exclude-standard >output &&
+	git ls-files -i -c --exclude-standard >output &&
 	test_cmp expect output
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-08 19:58   ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
                       ` (2 preceding siblings ...)
  2021-05-08 19:58     ` [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
@ 2021-05-08 19:59     ` Elijah Newren via GitGitGadget
  2021-05-10  5:28       ` Junio C Hamano
  2021-05-08 19:59     ` [PATCH v3 5/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
                       ` (4 subsequent siblings)
  8 siblings, 1 reply; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08 19:59 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

The PNPM package manager is apparently creating deeply nested (but
ignored) directory structures; traversing them is costly
performance-wise, unnecessary, and in some cases is even throwing
warnings/errors because the paths are too long to handle on various
platforms.  Add a testcase that checks for such unnecessary directory
traversal.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7300-clean.sh | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index a74816ca8b46..b7c9898fac5b 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -746,4 +746,26 @@ test_expect_success 'clean untracked paths by pathspec' '
 	test_must_be_empty actual
 '
 
+test_expect_failure 'avoid traversing into ignored directories' '
+	test_when_finished rm -f output error trace.* &&
+	test_create_repo avoid-traversing-deep-hierarchy &&
+	(
+		cd avoid-traversing-deep-hierarchy &&
+
+		mkdir -p untracked/subdir/with/a &&
+		>untracked/subdir/with/a/random-file.txt &&
+
+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+		git clean -ffdxn -e untracked
+	) &&
+
+	grep data.*read_directo.*visited trace.output \
+		| cut -d "|" -f 9 >trace.relevant &&
+	cat >trace.expect <<-EOF &&
+	 directories-visited:1
+	 paths-visited:4
+	EOF
+	test_cmp trace.expect trace.relevant
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v3 5/8] t3001, t7300: add testcase showcasing missed directory traversal
  2021-05-08 19:58   ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
                       ` (3 preceding siblings ...)
  2021-05-08 19:59     ` [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
@ 2021-05-08 19:59     ` Elijah Newren via GitGitGadget
  2021-05-08 19:59     ` [PATCH v3 6/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
                       ` (3 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08 19:59 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

In the last commit, we added a testcase showing that the directory
traversal machinery sometimes traverses into directories unnecessarily.
Here we show that there are cases where it does the opposite: it does
not traverse into directories, despite those directories having
important files that need to be flagged.

Add a testcase showing that `git ls-files -o -i --directory` can omit
some of the files it should be listing, and another showing that `git
clean -fX` can fail to clean out some of the expected files.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t3001-ls-files-others-exclude.sh |  5 +++++
 t/t7300-clean.sh                   | 19 +++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index 1ec7cb57c7a8..ac05d1a17931 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -292,6 +292,11 @@ EOF
 	test_cmp expect actual
 '
 
+test_expect_failure 'ls-files with "**" patterns and --directory' '
+	# Expectation same as previous test
+	git ls-files --directory -o -i --exclude "**/a.1" >actual &&
+	test_cmp expect actual
+'
 
 test_expect_success 'ls-files with "**" patterns and no slashes' '
 	git ls-files -o -i --exclude "one**a.1" >actual &&
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index b7c9898fac5b..74d395838708 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -768,4 +768,23 @@ test_expect_failure 'avoid traversing into ignored directories' '
 	test_cmp trace.expect trace.relevant
 '
 
+test_expect_failure 'traverse into directories that may have ignored entries' '
+	test_when_finished rm -f output &&
+	test_create_repo need-to-traverse-into-hierarchy &&
+	(
+		cd need-to-traverse-into-hierarchy &&
+		mkdir -p modules/foobar/src/generated &&
+		> modules/foobar/src/generated/code.c &&
+		> modules/foobar/Makefile &&
+		echo "/modules/**/src/generated/" >.gitignore &&
+
+		git clean -fX modules/foobar >../output &&
+
+		grep Removing ../output &&
+
+		test_path_is_missing modules/foobar/src/generated/code.c &&
+		test_path_is_file modules/foobar/Makefile
+	)
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v3 6/8] dir: avoid unnecessary traversal into ignored directory
  2021-05-08 19:58   ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
                       ` (4 preceding siblings ...)
  2021-05-08 19:59     ` [PATCH v3 5/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
@ 2021-05-08 19:59     ` Elijah Newren via GitGitGadget
  2021-05-10  5:48       ` Junio C Hamano
  2021-05-08 19:59     ` [PATCH v3 7/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
                       ` (2 subsequent siblings)
  8 siblings, 1 reply; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08 19:59 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

The show_other_directories case in treat_directory() tried to handle
both excludes and untracked files with the same logic, and mishandled
both the excludes and the untracked files in the process, in different
ways.  Split that logic apart, and then focus on the logic for the
excludes; a subsequent commit will address the logic for untracked
files.

For show_other_directories, an excluded directory means that
every path underneath that directory will also be excluded.  Given that
the calling code requested to just show directories when everything
under a directory had the same state (that's what the
"DIR_SHOW_OTHER_DIRECTORIES" flag means), we generally do not need to
traverse into such directories and can just immediately mark them as
ignored (i.e. as path_excluded).  The only reason we cannot just
immediately return path_excluded is the DIR_HIDE_EMPTY_DIRECTORIES flag
and the possibility that the ignored directory is an empty directory.
The code previously treated DIR_SHOW_IGNORED_TOO in most cases as an
exception as well, which was wrong.  It can sometimes reduce the number
of cases where we need to recurse (namely if
DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set), but should not be able
to increase the number of cases where we need to recurse.  Fix the logic
accordingly.
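
The decision for an excluded directory described above can be modeled
as a small table.  The sketch below is only a shell model of the C
logic, under the assumption that DIR_SHOW_OTHER_DIRECTORIES is set and
no leading pathspec forces recursion; the helper names has_flag and
treat_excluded_dir are invented for illustration, and "undecided"
stands for falling through to the later checks in treat_directory():

```shell
# Hypothetical helper: succeed if the space-separated flag list $1
# contains the flag named $2 as a whole word.
has_flag () {
	case " $1 " in
	*" $2 "*) return 0 ;;
	*) return 1 ;;
	esac
}

# Model of the "excluded" branch: print path_excluded when the directory
# can be reported without recursing, "undecided" otherwise.
treat_excluded_dir () {
	flags=$1
	if ! has_flag "$flags" DIR_HIDE_EMPTY_DIRECTORIES
	then
		# Not hiding empty directories: no need to look inside.
		echo path_excluded
	elif has_flag "$flags" DIR_SHOW_IGNORED_TOO &&
	     has_flag "$flags" DIR_SHOW_IGNORED_TOO_MODE_MATCHING
	then
		# Matching mode lets us stop at the ignored directory itself.
		echo path_excluded
	else
		echo undecided
	fi
}
```

Note that DIR_SHOW_IGNORED_TOO on its own never turns a "path_excluded"
answer into "undecided" in this model; it can only reduce the cases
where recursion is needed.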

Some sidenotes about possible confusion with dir.c:

* "ignored" often refers to an "untracked ignore", i.e. a file which is
  not tracked which matches one of the ignore/exclusion rules.  But you
  can also have a "tracked ignore", a tracked file that happens to match
  one of the ignore/exclusion rules and which dir.c has to worry about
  since "git ls-files -c -i" is supposed to list them.

* The dir code often uses "ignored" and "excluded" interchangeably,
  which you need to keep in mind while reading the code.  Sadly, though,
  it can get very confusing since ignore rules can have exclusions, as
  in the last of the following .gitignore rules:
      .gitignore
      *~
      *.log
      !settings.log
  In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
  will be true due to the '!' negating the rule.  Someone might refer
  to this as "excluded".  That means the file 'settings.log' will not
  match, and thus not be ignored.  So we won't return path_excluded for
  it.  So it's an exclude rule that prevents the file from being an
  exclude.  The non-excluded rules are the ones that result in files
  being excludes.  Great fun, eh?
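
The net effect of the four example rules can be checked from the
command line.  The following is a sketch only: the helper name
demo_negated_ignore and the path debug.log are made up, and it assumes
git and mktemp are available.  It relies on git check-ignore -q exiting
with 0 for an ignored path and 1 for a path that is not ignored:

```shell
# Hypothetical helper: report how git classifies path $1 under the
# four example rules, using a throwaway repository.
demo_negated_ignore () {
	repo=$(mktemp -d) &&
	(
		cd "$repo" &&
		git init -q . &&
		printf "%s\n" ".gitignore" "*~" "*.log" '!settings.log' >.gitignore &&
		{
			git check-ignore -q "$1"
			case $? in
			0) echo ignored ;;
			1) echo "not ignored" ;;
			*) echo error ;;
			esac
		}
	)
	rm -rf "$repo"
}
```

With these rules, settings.log should come out as "not ignored" (the
negated rule wins) while another name ending in .log, such as
debug.log, should come out as "ignored".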

Sometimes it feels like dir.c needs its own glossary with its many
definitions, including the multiply-defined terms.

Reported-by: Jason Gore <Jason.Gore@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c            | 44 +++++++++++++++++++++++++++++---------------
 t/t7300-clean.sh |  2 +-
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/dir.c b/dir.c
index dfb174227b36..3f2cfef2c2bb 100644
--- a/dir.c
+++ b/dir.c
@@ -1844,6 +1844,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	}
 
 	/* This is the "show_other_directories" case */
+	assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES);
 
 	/*
 	 * If we have a pathspec which could match something _below_ this
@@ -1854,27 +1855,40 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	if (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC)
 		return path_recurse;
 
+	/* Special cases for where this directory is excluded/ignored */
+	if (excluded) {
+		/*
+		 * In the show_other_directories case, if we're not
+		 * hiding empty directories, there is no need to
+		 * recurse into an ignored directory.
+		 */
+		if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+			return path_excluded;
+
+		/*
+		 * Even if we are hiding empty directories, we can still avoid
+		 * recursing into ignored directories for DIR_SHOW_IGNORED_TOO
+		 * if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
+		 */
+		if ((dir->flags & DIR_SHOW_IGNORED_TOO) &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
+			return path_excluded;
+	}
+
 	/*
-	 * Other than the path_recurse case immediately above, we only need
-	 * to recurse into untracked/ignored directories if either of the
-	 * following bits is set:
+	 * Other than the path_recurse case above, we only need to
+	 * recurse into untracked directories if either of the following
+	 * bits is set:
 	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
 	 *                           there are ignored entries below)
 	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
 	 *                                 the directory is empty)
 	 */
-	if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
-		return excluded ? path_excluded : path_untracked;
-
-	/*
-	 * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid
-	 * recursing into ignored directories if the path is excluded and
-	 * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
-	 */
-	if (excluded &&
-	    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
-	    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
-		return path_excluded;
+	if (!excluded &&
+	    !(dir->flags & (DIR_SHOW_IGNORED_TOO |
+			    DIR_HIDE_EMPTY_DIRECTORIES))) {
+		return path_untracked;
+	}
 
 	/*
 	 * Even if we don't want to know all the paths under an untracked or
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 74d395838708..a1d695ee9fe9 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -746,7 +746,7 @@ test_expect_success 'clean untracked paths by pathspec' '
 	test_must_be_empty actual
 '
 
-test_expect_failure 'avoid traversing into ignored directories' '
+test_expect_success 'avoid traversing into ignored directories' '
 	test_when_finished rm -f output error trace.* &&
 	test_create_repo avoid-traversing-deep-hierarchy &&
 	(
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v3 7/8] dir: traverse into untracked directories if they may have ignored subfiles
  2021-05-08 19:58   ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
                       ` (5 preceding siblings ...)
  2021-05-08 19:59     ` [PATCH v3 6/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
@ 2021-05-08 19:59     ` Elijah Newren via GitGitGadget
  2021-05-08 19:59     ` [PATCH v3 8/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
  2021-05-11 18:34     ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08 19:59 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

A directory that is untracked does not imply that all files under it
should be categorized as untracked; in particular, if the caller is
interested in ignored files, many files or directories underneath the
untracked directory may be ignored.  We previously partially handled
this right with DIR_SHOW_IGNORED_TOO, but missed DIR_SHOW_IGNORED.  It
was not obvious, though, because the logic for untracked and excluded
files had been fused together making it harder to reason about.  The
previous commit split that logic out, making it easier to notice that
DIR_SHOW_IGNORED was missing.  Add it.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                              | 10 ++++++----
 t/t3001-ls-files-others-exclude.sh |  2 +-
 t/t7300-clean.sh                   |  2 +-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/dir.c b/dir.c
index 3f2cfef2c2bb..f5d9732d9e68 100644
--- a/dir.c
+++ b/dir.c
@@ -1877,15 +1877,17 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 	/*
 	 * Other than the path_recurse case above, we only need to
-	 * recurse into untracked directories if either of the following
+	 * recurse into untracked directories if any of the following
 	 * bits is set:
-	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
-	 *                           there are ignored entries below)
+	 *   - DIR_SHOW_IGNORED (because then we need to determine if
+	 *                       there are ignored entries below)
+	 *   - DIR_SHOW_IGNORED_TOO (same as above)
 	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
 	 *                                 the directory is empty)
 	 */
 	if (!excluded &&
-	    !(dir->flags & (DIR_SHOW_IGNORED_TOO |
+	    !(dir->flags & (DIR_SHOW_IGNORED |
+			    DIR_SHOW_IGNORED_TOO |
 			    DIR_HIDE_EMPTY_DIRECTORIES))) {
 		return path_untracked;
 	}
diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index ac05d1a17931..516c95ea0e82 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -292,7 +292,7 @@ EOF
 	test_cmp expect actual
 '
 
-test_expect_failure 'ls-files with "**" patterns and --directory' '
+test_expect_success 'ls-files with "**" patterns and --directory' '
 	# Expectation same as previous test
 	git ls-files --directory -o -i --exclude "**/a.1" >actual &&
 	test_cmp expect actual
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index a1d695ee9fe9..751764c0f1ae 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -768,7 +768,7 @@ test_expect_success 'avoid traversing into ignored directories' '
 	test_cmp trace.expect trace.relevant
 '
 
-test_expect_failure 'traverse into directories that may have ignored entries' '
+test_expect_success 'traverse into directories that may have ignored entries' '
 	test_when_finished rm -f output &&
 	test_create_repo need-to-traverse-into-hierarchy &&
 	(
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v3 8/8] dir: update stale description of treat_directory()
  2021-05-08 19:58   ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
                       ` (6 preceding siblings ...)
  2021-05-08 19:59     ` [PATCH v3 7/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
@ 2021-05-08 19:59     ` Derrick Stolee via GitGitGadget
  2021-05-11 18:34     ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-08 19:59 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren,
	Derrick Stolee

From: Derrick Stolee <stolee@gmail.com>

The documentation comment for treat_directory() was originally written
in 095952 (Teach directory traversal about subprojects, 2007-04-11)
which was before the 'struct dir_struct' split its bitfield of named
options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct
dir_struct into a single variable, 2009-02-16). When those flags
changed, the comment became stale, since members like
'show_other_directories' transitioned into flags like
DIR_SHOW_OTHER_DIRECTORIES.

Update the comments for treat_directory() to use these flag names rather
than the old member names.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/dir.c b/dir.c
index f5d9732d9e68..896a9a62b2c7 100644
--- a/dir.c
+++ b/dir.c
@@ -1749,13 +1749,13 @@ static enum exist_status directory_exists_in_index(struct index_state *istate,
  * Case 3: if we didn't have it in the index previously, we
  * have a few sub-cases:
  *
- *  (a) if "show_other_directories" is true, we show it as
- *      just a directory, unless "hide_empty_directories" is
+ *  (a) if DIR_SHOW_OTHER_DIRECTORIES flag is set, we show it as
+ *      just a directory, unless DIR_HIDE_EMPTY_DIRECTORIES is
  *      also true, in which case we need to check if it contains any
  *      untracked and / or ignored files.
- *  (b) if it looks like a git directory, and we don't have
- *      'no_gitlinks' set we treat it as a gitlink, and show it
- *      as a directory.
+ *  (b) if it looks like a git directory and we don't have the
+ *      DIR_NO_GITLINKS flag, then we treat it as a gitlink, and
+ *      show it as a directory.
  *  (c) otherwise, we recurse into it.
  */
 static enum path_treatment treat_directory(struct dir_struct *dir,
@@ -1843,7 +1843,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_recurse;
 	}
 
-	/* This is the "show_other_directories" case */
 	assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES);
 
 	/*
@@ -1858,7 +1857,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	/* Special cases for where this directory is excluded/ignored */
 	if (excluded) {
 		/*
-		 * In the show_other_directories case, if we're not
+		 * If DIR_SHOW_OTHER_DIRECTORIES is set and we're not
 		 * hiding empty directories, there is no need to
 		 * recurse into an ignored directory.
 		 */
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents
  2021-05-08 19:58     ` [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
@ 2021-05-10  4:49       ` Junio C Hamano
  2021-05-11 17:23         ` Elijah Newren
  2021-05-11 16:17       ` Jeff Hostetler
  1 sibling, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2021-05-10  4:49 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static void trace2_read_directory_statistics(struct dir_struct *dir,
> +					     struct repository *repo)
> +{
> +	if (!dir->untracked)
> +		return;
> +	trace2_data_intmax("read_directory", repo,
> +			   "node-creation", dir->untracked->dir_created);
> +	trace2_data_intmax("read_directory", repo,
> +			   "gitignore-invalidation",
> +			   dir->untracked->gitignore_invalidated);
> +	trace2_data_intmax("read_directory", repo,
> +			   "directory-invalidation",
> +			   dir->untracked->dir_invalidated);
> +	trace2_data_intmax("read_directory", repo,
> +			   "opendir", dir->untracked->dir_opened);
> +}
> +

This obviously looks like an equivalent to what happens in the
original inside the "if (dir->untracked)" block.

And we have a performance_{enter,leave} pair replaced with
a region_{enter,leave} pair.

> -	trace_performance_enter();
> +	trace2_region_enter("dir", "read_directory", istate->repo);
>   ...
> -	trace_performance_leave("read directory %.*s", len, path);
> +	trace2_region_leave("dir", "read_directory", istate->repo);

>  	if (dir->untracked) {
>  		static int force_untracked_cache = -1;
> -		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
>  
>  		if (force_untracked_cache < 0)
>  			force_untracked_cache =
>  				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
> -		trace_printf_key(&trace_untracked_stats,
> -				 "node creation: %u\n"
> -				 "gitignore invalidation: %u\n"
> -				 "directory invalidation: %u\n"
> -				 "opendir: %u\n",
> -				 dir->untracked->dir_created,
> -				 dir->untracked->gitignore_invalidated,
> -				 dir->untracked->dir_invalidated,
> -				 dir->untracked->dir_opened);
>  		if (force_untracked_cache &&
>  			dir->untracked == istate->untracked &&
>  		    (dir->untracked->dir_opened ||

Removal of the trace_printf() in the middle made the body of this
if() statement much less distracting, which is good.

> @@ -2811,6 +2818,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>  			FREE_AND_NULL(dir->untracked);
>  		}
>  	}
> +
> +	if (trace2_is_enabled())
> +		trace2_read_directory_statistics(dir, istate->repo);

This slightly changes the semantics in that the original did an
equivalent emitting from inside the "if (dir->untracked)" block, but
this call is hoisted outside, and the new helper knows how to be
silent when untracked thing is not in effect, so the net effect at
this step is the same.  And if we ever add tracing statistics that is
relevant when !dir->untracked is true, the new code organization is
easier to work with.

The only curious thing is the guard "if (trace2_is_enabled())";
correctness-wise, are there bad things going to happen if it is not
here, or is this a performance hack, or is it more for its
documentation value (meaning, it would be a bug if we later added
things that are irrelevant when trace is not enabled to the helper)?

> @@ -57,6 +57,19 @@ iuc () {
>  	return $ret
>  }
>  
> +get_relevant_traces() {

Style.  SP on both sides of "()".

> +	# From the GIT_TRACE2_PERF data of the form
> +	#    $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT
> +	# extract the $RELEVANT_STAT fields.  We don't care about region_enter
> +	# or region_leave, or stats for things outside read_directory.
> +	INPUT_FILE=$1
> +	OUTPUT_FILE=$2
> +	grep data.*read_directo $INPUT_FILE \
> +	    | cut -d "|" -f 9 \
> +	    >$OUTPUT_FILE

Style.  Wrapping the line after pipe '|' will allow you to omit the
backslash.  Also quote the redirection target, i.e. >"$OUTPUT_FILE",
to help certain vintages of bash.
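
For illustration, with both suggestions applied the helper might look
roughly like the sketch below.  This is not the version from a later
iteration of the series; quoting the grep pattern and "$INPUT_FILE" is
an extra assumption on top of what was asked for above:

```shell
get_relevant_traces () {
	# From the GIT_TRACE2_PERF data of the form
	#    $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT
	# extract the $RELEVANT_STAT fields.
	INPUT_FILE=$1
	OUTPUT_FILE=$2
	# Ending each line with "|" continues the pipeline without a backslash.
	grep "data.*read_directo" "$INPUT_FILE" |
	cut -d "|" -f 9 >"$OUTPUT_FILE"
}
```

Because field 9 starts right after the last "|", each extracted stat
keeps its leading space.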

Those who are more familiar with the trace2 infrastructure may want
to further comment, but it looked obvious and straightforward to me.

Thanks.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 2/8] [RFC] dir: report number of visited directories and paths with trace2
  2021-05-08 19:58     ` [PATCH v3 2/8] [RFC] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
@ 2021-05-10  5:00       ` Junio C Hamano
  0 siblings, 0 replies; 90+ messages in thread
From: Junio C Hamano @ 2021-05-10  5:00 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Elijah Newren <newren@gmail.com>
>
> Provide more statistics in trace2 output that include the number of
> directories and total paths visited by the directory traversal logic.
> Subsequent patches will take advantage of this to ensure we do not
> unnecessarily traverse into ignored directories.

And this change is the reason behind how the call to the trace
statistics helper is now outside the "if (untracked)" block after
patch 1/8; makes sense to me.


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified
  2021-05-08 19:58     ` [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
@ 2021-05-10  5:09       ` Junio C Hamano
  2021-05-11 17:40         ` Elijah Newren
  0 siblings, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2021-05-10  5:09 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -748,6 +748,9 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>  	if (pathspec.nr && error_unmatch)
>  		ps_matched = xcalloc(pathspec.nr, 1);
>  
> +	if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached)
> +		die("ls-files --ignored is usually used with --others, but --cached is the default.  Please specify which you want.");
> +

So "git ls-files -i" would suddenly start erroring out and users are
to scramble and patch their scripts?

More importantly, the message does not make much sense.  "I is
usually used with O" is very true, but the mention of "usually" here
means it is not an error for "I" to be used without "O".  That part
is very understandable and correct.

But I do not know what "but --cached is the default" part wants to
say.  If it is the _default_, and (assuming that what I read in the
proposed log message is correct) the combination of "-i -c" is valid,
then I would understand the message if the code were more like this:

	if ((dir.flags & DIR_SHOW_IGNORED) &&
	    !show_others && !show_cached) {
		show_cached = 1; /* default */
		warning("ls-files -i given without -o/-c; defaulting to -i -c");
	}

If we are not defaulting to cached, then

	die("ls-files -i must be used with either -o or -c");

would also make sense.

The variant presented in the patch does not make sense to me.

> diff --git a/t/t1306-xdg-files.sh b/t/t1306-xdg-files.sh
> index dd87b43be1a6..40d3c42618c0 100755
> --- a/t/t1306-xdg-files.sh
> +++ b/t/t1306-xdg-files.sh
> @@ -116,7 +116,7 @@ test_expect_success 'Exclusion in a non-XDG global ignore file' '
>  test_expect_success 'Checking XDG ignore file when HOME is unset' '
>  	(sane_unset HOME &&
>  	 git config --unset core.excludesfile &&
> -	 git ls-files --exclude-standard --ignored >actual) &&
> +	 git ls-files --exclude-standard --ignored --others >actual) &&
>  	test_must_be_empty actual
>  '
>  
> diff --git a/t/t3003-ls-files-exclude.sh b/t/t3003-ls-files-exclude.sh
> index d5ec333131f9..c41c4f046abf 100755
> --- a/t/t3003-ls-files-exclude.sh
> +++ b/t/t3003-ls-files-exclude.sh
> @@ -29,11 +29,11 @@ test_expect_success 'add file to gitignore' '
>  '
>  check_all_output
>  
> -test_expect_success 'ls-files -i lists only tracked-but-ignored files' '
> +test_expect_success 'ls-files -i -c lists only tracked-but-ignored files' '
>  	echo content >other-file &&
>  	git add other-file &&
>  	echo file >expect &&
> -	git ls-files -i --exclude-standard >output &&
> +	git ls-files -i -c --exclude-standard >output &&
>  	test_cmp expect output
>  '

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-08 19:59     ` [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
@ 2021-05-10  5:28       ` Junio C Hamano
  2021-05-11 17:45         ` Elijah Newren
  0 siblings, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2021-05-10  5:28 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +test_expect_failure 'avoid traversing into ignored directories' '
> +	test_when_finished rm -f output error trace.* &&
> +	test_create_repo avoid-traversing-deep-hierarchy &&
> +	(
> +		cd avoid-traversing-deep-hierarchy &&
> +
> +		mkdir -p untracked/subdir/with/a &&
> +		>untracked/subdir/with/a/random-file.txt &&
> +
> +		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
> +		git clean -ffdxn -e untracked
> +	) &&
> +
> +	grep data.*read_directo.*visited trace.output \
> +		| cut -d "|" -f 9 >trace.relevant &&
> +	cat >trace.expect <<-EOF &&
> +	 directories-visited:1
> +	 paths-visited:4

Are the origins of '1' and '4' trivially obvious to those who are
reading the test, or do these deserve comments?

We create an empty test repository, go there and create an untracked/
hierarchy with a junk file, and tell "clean" that 'untracked' is
"also" in the exclude pattern (but since there is no other exclude
pattern, that is the only one), so everything underneath untracked/
we have no reason to inspect.

So, we do not visit 'untracked' directory.  Which ones do we visit?
Is '1' coming from the top-level of the working tree '.'?  What
about the number of visited paths '4' (the trace is stored outside
this new test repository, so that's not it).

Thanks.

> +	EOF
> +	test_cmp trace.expect trace.relevant
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 6/8] dir: avoid unnecessary traversal into ignored directory
  2021-05-08 19:59     ` [PATCH v3 6/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
@ 2021-05-10  5:48       ` Junio C Hamano
  2021-05-11 17:57         ` Elijah Newren
  0 siblings, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2021-05-10  5:48 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Some sidenotes about possible confusion with dir.c:

Thanks for working on untangling this mess ;-)

> * "ignored" often refers to an "untracked ignore", i.e. a file which is
>   not tracked which matches one of the ignore/exclusion rules.  But you
>   can also have a "tracked ignore", a tracked file that happens to match
>   one of the ignore/exclusion rules and which dir.c has to worry about
>   since "git ls-files -c -i" is supposed to list them.

OK.  This is to find a pattern in .gitignore that is too broad
(i.e. if the path were to be added as a new thing today, it would
require "add -f"), right?  The combination of "-i -c" does make
sense for that purpose.

> * The dir code often uses "ignored" and "excluded" interchangeably,
>   which you need to keep in mind while reading the code.  

True.  In-tree .gitignore files hold exclude patterns, and the
per-repository personal exclude file is called $GIT_DIR/info/exclude,
which is confusing.

> Sadly, though,
>   it can get very confusing since ignore rules can have exclusions, as
>   in the last of the following .gitignore rules:
>       .gitignore
>       *~
>       *.log
>       !settings.log
>   In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
>   will be true due to the '!' negating the rule.  Someone might refer
>   to this as "excluded".

That one I've never heard of.  As far as I am concerned, that is a
negative exclude pattern.

I do wish we had started the project with .gitignore files and
$GIT_DIR/info/ignore, both of which hold ignore patterns and
negative ignore patterns, from day one, but that boat sailed a
long time ago.


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents
  2021-05-08 19:58     ` [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
  2021-05-10  4:49       ` Junio C Hamano
@ 2021-05-11 16:17       ` Jeff Hostetler
  2021-05-11 17:29         ` Elijah Newren
  1 sibling, 1 reply; 90+ messages in thread
From: Jeff Hostetler @ 2021-05-11 16:17 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon



On 5/8/21 3:58 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>   dir.c                             |  34 ++++--
>   t/t7063-status-untracked-cache.sh | 193 +++++++++++++++++-------------
>   t/t7519-status-fsmonitor.sh       |   8 +-
>   3 files changed, 135 insertions(+), 100 deletions(-)
> 
> diff --git a/dir.c b/dir.c
> index 3474e67e8f3c..9f7c8debeab3 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2760,12 +2760,29 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>   	return root;
>   }
>   
> +static void trace2_read_directory_statistics(struct dir_struct *dir,
> +					     struct repository *repo)
> +{
> +	if (!dir->untracked)
> +		return;

Is there value in also printing the path?
The existing `trace_performance_leave()` calls did, but
I'm not familiar enough with this code to say whether the output
was always something like ".".

> +	trace2_data_intmax("read_directory", repo,
> +			   "node-creation", dir->untracked->dir_created);
> +	trace2_data_intmax("read_directory", repo,
> +			   "gitignore-invalidation",
> +			   dir->untracked->gitignore_invalidated);
> +	trace2_data_intmax("read_directory", repo,
> +			   "directory-invalidation",
> +			   dir->untracked->dir_invalidated);
> +	trace2_data_intmax("read_directory", repo,
> +			   "opendir", dir->untracked->dir_opened);
> +}
> +

The existing code was quite tangled and I think this helps
make things more clear.


>   int read_directory(struct dir_struct *dir, struct index_state *istate,
>   		   const char *path, int len, const struct pathspec *pathspec)
>   {
>   	struct untracked_cache_dir *untracked;
>   
> -	trace_performance_enter();
> +	trace2_region_enter("dir", "read_directory", istate->repo);
>   
>   	if (has_symlink_leading_path(path, len)) {
>   		trace_performance_leave("read directory %.*s", len, path);

This `trace_performance_leave()` inside the `if` needs to be
converted too.


> @@ -2784,23 +2801,13 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>   	QSORT(dir->entries, dir->nr, cmp_dir_entry);
>   	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
>   
> -	trace_performance_leave("read directory %.*s", len, path);
> +	trace2_region_leave("dir", "read_directory", istate->repo);

Can we put the call to `trace2_read_directory_statistics()` before
the above `trace2_region_leave()` call?   Then those stats will
appear indented between the begin- and end-region events in the output.

That way, the following `if (dir->untracked) {...}` is only
concerned with the untracked cache and/or freeing that data.

>   	if (dir->untracked) {
>   		static int force_untracked_cache = -1;
> -		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
>   
>   		if (force_untracked_cache < 0)
>   			force_untracked_cache =
>   				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
> -		trace_printf_key(&trace_untracked_stats,
> -				 "node creation: %u\n"
> -				 "gitignore invalidation: %u\n"
> -				 "directory invalidation: %u\n"
> -				 "opendir: %u\n",
> -				 dir->untracked->dir_created,
> -				 dir->untracked->gitignore_invalidated,
> -				 dir->untracked->dir_invalidated,
> -				 dir->untracked->dir_opened);
>   		if (force_untracked_cache &&
>   			dir->untracked == istate->untracked &&
>   		    (dir->untracked->dir_opened ||
> @@ -2811,6 +2818,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>   			FREE_AND_NULL(dir->untracked);
>   		}
>   	}
> +
> +	if (trace2_is_enabled())
> +		trace2_read_directory_statistics(dir, istate->repo);

Also, I think it'd be ok to move the `trace2_is_enabled()` call
inside the function.  Since we're also testing `!dir->untracked`
inside the function.

The more that I look at the before and after versions, the
more I think the `trace2_read_directory_statistics()` call
should be up before the `trace2_region_leave()`.  Here at the
bottom of the function, we may have already freed `dir->untracked`.
I'm not familiar enough with this code to know if that is a
good or bad thing.


>   	return dir->nr;
>   }
>   
...


Thanks,
Jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents
  2021-05-10  4:49       ` Junio C Hamano
@ 2021-05-11 17:23         ` Elijah Newren
  0 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren @ 2021-05-11 17:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon

On Sun, May 9, 2021 at 9:49 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +static void trace2_read_directory_statistics(struct dir_struct *dir,
> > +                                          struct repository *repo)
> > +{
> > +     if (!dir->untracked)
> > +             return;
> > +     trace2_data_intmax("read_directory", repo,
> > +                        "node-creation", dir->untracked->dir_created);
> > +     trace2_data_intmax("read_directory", repo,
> > +                        "gitignore-invalidation",
> > +                        dir->untracked->gitignore_invalidated);
> > +     trace2_data_intmax("read_directory", repo,
> > +                        "directory-invalidation",
> > +                        dir->untracked->dir_invalidated);
> > +     trace2_data_intmax("read_directory", repo,
> > +                        "opendir", dir->untracked->dir_opened);
> > +}
> > +
>
> This obviously looks like an equivalent to what happens in the
> original inside the "if (dir->untracked)" block.
>
> And we have a performance_{enter,leave} pair replaced with
> a region_{enter,leave} pair.
>
> > -     trace_performance_enter();
> > +     trace2_region_enter("dir", "read_directory", istate->repo);
> >   ...
> > -     trace_performance_leave("read directory %.*s", len, path);
> > +     trace2_region_leave("dir", "read_directory", istate->repo);
>
> >       if (dir->untracked) {
> >               static int force_untracked_cache = -1;
> > -             static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
> >
> >               if (force_untracked_cache < 0)
> >                       force_untracked_cache =
> >                               git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
> > -             trace_printf_key(&trace_untracked_stats,
> > -                              "node creation: %u\n"
> > -                              "gitignore invalidation: %u\n"
> > -                              "directory invalidation: %u\n"
> > -                              "opendir: %u\n",
> > -                              dir->untracked->dir_created,
> > -                              dir->untracked->gitignore_invalidated,
> > -                              dir->untracked->dir_invalidated,
> > -                              dir->untracked->dir_opened);
> >               if (force_untracked_cache &&
> >                       dir->untracked == istate->untracked &&
> >                   (dir->untracked->dir_opened ||
>
> Removal of the trace_printf() in the middle made the body of this
> if() statement much less distracting, which is good.
>
> > @@ -2811,6 +2818,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
> >                       FREE_AND_NULL(dir->untracked);
> >               }
> >       }
> > +
> > +     if (trace2_is_enabled())
> > +             trace2_read_directory_statistics(dir, istate->repo);
>
> This slightly changes the semantics in that the original did an
> equivalent emitting from inside the "if (dir->untracked)" block, but
> this call is hoisted outside, and the new helper knows how to be
> silent when untracked thing is not in effect, so the net effect at
> this step is the same.  And if we ever add tracing statistics that are
> relevant when !dir->untracked is true, the new code organization is
> easier to work with.
>
> The only curious thing is the guard "if (trace2_is_enabled())";
> correctness-wise, are there bad things going to happen if it is not
> here, or is this a performance hack, or is it more for its
> documentation value (meaning, it would be a bug if we later added
> things that are irrelevant when trace is not enabled to the helper)?

No, there's nothing bad that would happen here.  It was a combination
of a performance hack and documentation in case
trace2_read_directory_statistics() started gaining other code besides
trace2_*() calls, but which code was only relevant when trace2 was
enabled.

Turns out, though, that Jeff's suggestion to also print the path in
the statistics is going to require me creating a temporary strbuf so
that I can get a NUL-terminated string.  We only want to do that when
trace2_is_enabled(), so that will make the introduction of that check
a bit more natural.

> > @@ -57,6 +57,19 @@ iuc () {
> >       return $ret
> >  }
> >
> > +get_relevant_traces() {
>
> Style.  SP on both sides of "()".

Will fix.

>
> > +     # From the GIT_TRACE2_PERF data of the form
> > +     #    $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT
> > +     # extract the $RELEVANT_STAT fields.  We don't care about region_enter
> > +     # or region_leave, or stats for things outside read_directory.
> > +     INPUT_FILE=$1
> > +     OUTPUT_FILE=$2
> > +     grep data.*read_directo $INPUT_FILE \
> > +         | cut -d "|" -f 9 \
> > +         >$OUTPUT_FILE
>
> Style.  Wrapping the line after pipe '|' will allow you to omit the
> backslash.  Also quote the redirection target, i.e. >"$OUTPUT_FILE",
> to help certain vintages of bash.

Will fix.
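
For reference, a sketch of the helper with both style fixes applied
(SP on both sides of "()", the line wrapped after the pipe, and the
redirection target quoted).  Quoting the grep pattern and $INPUT_FILE
as well is an extra precaution that was not requested in the review,
so treat that part as an assumption rather than what will be sent:

```shell
get_relevant_traces () {
	# From the GIT_TRACE2_PERF data of the form
	#    $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT
	# keep only the $RELEVANT_STAT field, i.e. the ninth "|"-separated one.
	INPUT_FILE=$1
	OUTPUT_FILE=$2
	# Ending the line with the pipe lets the shell continue onto the
	# next line without a trailing backslash.
	grep "data.*read_directo" "$INPUT_FILE" |
	cut -d "|" -f 9 >"$OUTPUT_FILE"
}
```

Because cut splits exactly on "|", the extracted field keeps the
leading space that follows the separator.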

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents
  2021-05-11 16:17       ` Jeff Hostetler
@ 2021-05-11 17:29         ` Elijah Newren
  0 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren @ 2021-05-11 17:29 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon

On Tue, May 11, 2021 at 9:17 AM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
> On 5/8/21 3:58 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >   dir.c                             |  34 ++++--
> >   t/t7063-status-untracked-cache.sh | 193 +++++++++++++++++-------------
> >   t/t7519-status-fsmonitor.sh       |   8 +-
> >   3 files changed, 135 insertions(+), 100 deletions(-)
> >
> > diff --git a/dir.c b/dir.c
> > index 3474e67e8f3c..9f7c8debeab3 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -2760,12 +2760,29 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
> >       return root;
> >   }
> >
> > +static void trace2_read_directory_statistics(struct dir_struct *dir,
> > +                                          struct repository *repo)
> > +{
> > +     if (!dir->untracked)
> > +             return;
>
> Is there value in also printing the path?
> The existing `trace_performance_leave()` calls did, but
> I'm not familiar enough with this code to say whether the output
> was always something like ".".

The path will most likely just be "" (i.e. the empty string) for the
toplevel directory, but not always so it may be useful to print it.
I'll add it.

> > +     trace2_data_intmax("read_directory", repo,
> > +                        "node-creation", dir->untracked->dir_created);
> > +     trace2_data_intmax("read_directory", repo,
> > +                        "gitignore-invalidation",
> > +                        dir->untracked->gitignore_invalidated);
> > +     trace2_data_intmax("read_directory", repo,
> > +                        "directory-invalidation",
> > +                        dir->untracked->dir_invalidated);
> > +     trace2_data_intmax("read_directory", repo,
> > +                        "opendir", dir->untracked->dir_opened);
> > +}
> > +
>
> The existing code was quite tangled and I think this helps
> make things more clear.
>
>
> >   int read_directory(struct dir_struct *dir, struct index_state *istate,
> >                  const char *path, int len, const struct pathspec *pathspec)
> >   {
> >       struct untracked_cache_dir *untracked;
> >
> > -     trace_performance_enter();
> > +     trace2_region_enter("dir", "read_directory", istate->repo);
> >
> >       if (has_symlink_leading_path(path, len)) {
> >               trace_performance_leave("read directory %.*s", len, path);
>
> This `trace_performance_leave()` inside the `if` needs to be
> converted too.

Ooh, good catch.  Will fix.

> > @@ -2784,23 +2801,13 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
> >       QSORT(dir->entries, dir->nr, cmp_dir_entry);
> >       QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
> >
> > -     trace_performance_leave("read directory %.*s", len, path);
> > +     trace2_region_leave("dir", "read_directory", istate->repo);
>
> Can we put the call to `trace2_read_directory_statistics()` before
> the above `trace2_region_leave()` call?   Then those stats will
> appear indented between the begin- and end-region events in the output.
>
> That way, the following `if (dir->untracked) {...}` is only
> concerned with the untracked cache and/or freeing that data.

Makes sense, I'll move it.

> >       if (dir->untracked) {
> >               static int force_untracked_cache = -1;
> > -             static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
> >
> >               if (force_untracked_cache < 0)
> >                       force_untracked_cache =
> >                               git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
> > -             trace_printf_key(&trace_untracked_stats,
> > -                              "node creation: %u\n"
> > -                              "gitignore invalidation: %u\n"
> > -                              "directory invalidation: %u\n"
> > -                              "opendir: %u\n",
> > -                              dir->untracked->dir_created,
> > -                              dir->untracked->gitignore_invalidated,
> > -                              dir->untracked->dir_invalidated,
> > -                              dir->untracked->dir_opened);
> >               if (force_untracked_cache &&
> >                       dir->untracked == istate->untracked &&
> >                   (dir->untracked->dir_opened ||
> > @@ -2811,6 +2818,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
> >                       FREE_AND_NULL(dir->untracked);
> >               }
> >       }
> > +
> > +     if (trace2_is_enabled())
> > +             trace2_read_directory_statistics(dir, istate->repo);
>
> Also, I think it'd be ok to move the `trace2_is_enabled()` call
> inside the function.  Since we're also testing `!dir->untracked`
> inside the function.

Actually, I can't do that.  The path passed to this function is not
going to always be (and will often not be) NUL-terminated, but
trace2_data_string() expects a NUL-terminated string.  So, I'm going
to make a temporary strbuf and copy the path into it, but of course I
only want to spend time doing that if trace2_is_enabled().

> The more that I look at the before and after versions, the
> more I think the `trace2_read_directory_statistics()` call
> should be up before the `trace2_region_leave()`.  Here at the
> bottom of the function, we may have already freed `dir->untracked`.
> I'm not familiar enough with this code to know if that is a
> good or bad thing.

Yeah, the statistics really need to be moved earlier, both for the
nesting reasons you point out and because otherwise the statistics
won't print whenever dir->untracked != istate->untracked.  I'll move
them.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified
  2021-05-10  5:09       ` Junio C Hamano
@ 2021-05-11 17:40         ` Elijah Newren
  2021-05-11 22:32           ` Junio C Hamano
  0 siblings, 1 reply; 90+ messages in thread
From: Elijah Newren @ 2021-05-11 17:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon

On Sun, May 9, 2021 at 10:09 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > @@ -748,6 +748,9 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
> >       if (pathspec.nr && error_unmatch)
> >               ps_matched = xcalloc(pathspec.nr, 1);
> >
> > +     if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached)
> > +             die("ls-files --ignored is usually used with --others, but --cached is the default.  Please specify which you want.");
> > +
>
> So "git ls-files -i" would suddenly start erroring out and users are
> to scramble and patch their scripts?

Thus the reason I marked this as "RFC" and called it out in the cover
letter for folks to comment on.

I figured that if I was having difficulty using it correctly and even
our own testsuite showed that 50% of such invocations were wrong
(despite being reviewed[1]), then it seems likely to me that erroring
out to inform folks of this problem might be warranted.  But, if folks
disagree, I can switch it to a warning instead.

[1] https://lore.kernel.org/git/20120724133227.GA14422@sigill.intra.peff.net/#t

> More importantly, the message does not make much sense.  "I is
> usually used with O" is very true, but the mention of "usually" here
> means it is not an error for "I" to be used without "O".  That part
> is very understandable and correct.
>
> But I do not know what "but --cached is the default" part wants to
> say.  If it is the _default_, and (assuming that what I read in the
> proposed log message is correct) the combination of "-i -c" is valid,
> then I would understand the message if the code were more like this:
>
>         if ((dir.flags & DIR_SHOW_IGNORED) &&
>             !show_others && !show_cached) {
>                 show_cached = 1; /* default */
>                 warning("ls-files -i given without -o/-c; defaulting to -i -c");
>         }
>
> If we are not defaulting to cached, then
>
>         die("ls-files -i must be used with either -o or -c");
>
> would also make sense.

Ooh, that wording is much nicer.  I'll adopt the latter suggestion,
but let me know if you'd rather I went the warning route.
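
To make the two combinations concrete, here is a small sketch.  The
scratch repository and the file names (tracked.log, junk.log) are
invented for the illustration and do not come from the test suite:

```shell
# Build a throwaway repository with a single ignore rule.
demo=$(mktemp -d) &&
cd "$demo" &&
git init -q . &&
echo "*.log" >.gitignore &&

# tracked.log matches the rule but is tracked (hence "add -f");
# junk.log matches the rule and stays untracked.
echo tracked >tracked.log &&
git add -f tracked.log &&
echo junk >junk.log &&

# -i -c: tracked files that match an ignore rule.
tracked_ignored=$(git ls-files -i -c --exclude-standard) &&
# -i -o: untracked files that match an ignore rule.
untracked_ignored=$(git ls-files -i -o --exclude-standard) &&

echo "tracked but ignored:   $tracked_ignored" &&
echo "untracked and ignored: $untracked_ignored"
```

The first query should report only tracked.log and the second only
junk.log, which is why leaving out both -o and -c is easy to get
wrong.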

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-10  5:28       ` Junio C Hamano
@ 2021-05-11 17:45         ` Elijah Newren
  2021-05-11 22:43           ` Junio C Hamano
  0 siblings, 1 reply; 90+ messages in thread
From: Elijah Newren @ 2021-05-11 17:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon

On Sun, May 9, 2021 at 10:28 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +test_expect_failure 'avoid traversing into ignored directories' '
> > +     test_when_finished rm -f output error trace.* &&
> > +     test_create_repo avoid-traversing-deep-hierarchy &&
> > +     (
> > +             cd avoid-traversing-deep-hierarchy &&
> > +
> > +             mkdir -p untracked/subdir/with/a &&
> > +             >untracked/subdir/with/a/random-file.txt &&
> > +
> > +             GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
> > +             git clean -ffdxn -e untracked
> > +     ) &&
> > +
> > +     grep data.*read_directo.*visited trace.output \
> > +             | cut -d "|" -f 9 >trace.relevant &&
> > +     cat >trace.expect <<-EOF &&
> > +      directories-visited:1
> > +      paths-visited:4
>
> Are the origins of '1' and '4' trivially obvious to those who are
> reading the test, or do these deserve comments?
>
> We create an empty test repository, go there and create an untracked/
> hierarchy with a junk file, and tell "clean" that 'untracked' is
> "also" in the exclude pattern (but since there is no other exclude
> pattern, that is the only one), so everything underneath untracked/
> we have no reason to inspect.
>
> So, we do not visit 'untracked' directory.  Which ones do we visit?
> Is '1' coming from the top-level of the working tree '.'?  What
> about the number of visited paths '4' (the trace is stored outside
> this new test repository, so that's not it).

Good points.  I'll make a comment that directories-visited:1 is about
ensuring we only went into the toplevel directory, and I'll remove
the paths-visited check.

But to answer your question, the paths we visit are '.', '..', '.git',
and 'untracked', the first three of which we mark as path_none and
don't recurse into because of special rules for those paths, and the
last of which we shouldn't recurse into since it is ignored.  There
weren't any non-directory files in the toplevel directory, or those
would also be included in the paths-visited count.  A later patch in
the series will fix the code to not recurse into the 'untracked'
directory, fixing this test.
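
A rough way to see where those four entries come from, using "ls -a"
as a stand-in for what readdir() hands back for the toplevel
directory.  The layout only mimics the test (a plain .git directory is
created instead of running git init), and the number is a count of
directory entries, not the value of git's own paths-visited counter:

```shell
top=$(mktemp -d) &&
mkdir "$top/.git" &&
mkdir -p "$top/untracked/subdir/with/a" &&
>"$top/untracked/subdir/with/a/random-file.txt" &&

# When writing to a pipe, ls -a prints one entry per line:
# ".", "..", ".git" and "untracked".
entries=$(ls -a "$top" | wc -l | tr -d " ") &&
echo "toplevel entries: $entries"
```

Nothing below untracked/ is counted here, matching the expectation
that the traversal stops at the toplevel.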

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 6/8] dir: avoid unnecessary traversal into ignored directory
  2021-05-10  5:48       ` Junio C Hamano
@ 2021-05-11 17:57         ` Elijah Newren
  0 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren @ 2021-05-11 17:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon

On Sun, May 9, 2021 at 10:48 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Some sidenotes about possible confusion with dir.c:
>
> Thanks for working on untangling this mess ;-)
>
> > * "ignored" often refers to an "untracked ignore", i.e. a file which is
> >   not tracked which matches one of the ignore/exclusion rules.  But you
> >   can also have a "tracked ignore", a tracked file that happens to match
> >   one of the ignore/exclusion rules and which dir.c has to worry about
> >   since "git ls-files -c -i" is supposed to list them.
>
> OK.  This is to find a pattern in .gitignore that is too broad
> (i.e. if the path were to be added as a new thing today, it would
> require "add -f"), right?  The combination of "-i -c" does make
> sense for that purpose.
>
> > * The dir code often uses "ignored" and "excluded" interchangeably,
> >   which you need to keep in mind while reading the code.
>
> True.  In-tree .gitignore files hold exclude patterns, and the
> per-repository personal exclude file is called $GIT_DIR/info/exclude,
> which is confusing.
>
> > Sadly, though,
> >   it can get very confusing since ignore rules can have exclusions, as
> >   in the last of the following .gitignore rules:
> >       .gitignore
> >       *~
> >       *.log
> >       !settings.log
> >   In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
> >   will be true due to the '!' negating the rule.  Someone might refer
> >   to this as "excluded".
>
> That one I've never heard of.  As far as I am concerned, that is a
> negative exclude pattern.

Oops, I was mixing up negative exclude patterns and negative (or
excluded) pathspecs.  So "exclude" can refer to "ignored" files, or be
used in "PATHSPEC_EXCLUDE" for excluded pathspecs.

...and there's another way it's used.  "exclude" can also be used to
refer to "exclude" patterns, meaning the patterns that .gitignore (and
related files) use.  However, .git/info/sparse-checkout re-used these
same rulesets, but then used them to determine path *inclusion*.  At
my request, Stolee mostly fixed that up in 65edd96aec ("treewide:
rename 'exclude' methods to 'pattern'", 2019-09-03) but you can still
occasionally find a code comment referring to an "exclude" pattern
that might actually be used by the sparse-checkout stuff as an
inclusion rule.

And then we have a myriad of other variables and comments with "excl"
in their name that might be derived from any of the above three...and
it's sometimes difficult for me to remember which one of the concepts
such a derived variable or comment might be referring to.
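
To make the mix-up above concrete, here is a minimal shell sketch that
contrasts a negative exclude pattern (in .gitignore) with a negative
pathspec (on the command line).  The throwaway repository and all file
names are assumptions made up for illustration; nothing here is taken
from the series itself.

```shell
# Throwaway repository; every file name below is hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .

printf '%s\n' '*.log' '!settings.log' >.gitignore
: >debug.log
: >settings.log
: >settings.txt

# Negative exclude pattern: "!settings.log" rescues settings.log from
# "*.log", so it shows up as an ordinary untracked file, while debug.log
# stays ignored.
untracked=$(git ls-files -o --exclude-standard -- 'settings.*' '*.log')

# Negative pathspec: ":^settings.log" drops settings.log from what the
# command line asked for, independent of any .gitignore rule.
requested=$(git ls-files -o --exclude-standard -- 'settings.*' ':^settings.log')

printf 'untracked:\n%s\n' "$untracked"
printf 'requested:\n%s\n' "$requested"
```

The first listing should contain both settings.log and settings.txt,
the second only settings.txt; the same word "exclude" is involved in
both, but for unrelated reasons.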

*sigh*

> I do wish we had started the project with .gitignore files and
> $GIT_DIR/info/ignore, both of which hold ignore patterns and
> negative ignore patterns, from day one, but that boat sailed a
> long time ago.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v4 0/8] Directory traversal fixes
  2021-05-08 19:58   ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
                       ` (7 preceding siblings ...)
  2021-05-08 19:59     ` [PATCH v3 8/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
@ 2021-05-11 18:34     ` Elijah Newren via GitGitGadget
  2021-05-11 18:34       ` [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
                         ` (8 more replies)
  8 siblings, 9 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren

This patchset fixes a few directory traversal issues, where fill_directory()
would traverse into directories that it shouldn't and not traverse into
directories that it should (one of which was originally reported on this
list at [1]). It also includes a few cleanups.

Changes since v3, including numerous cleanups suggested by Junio and Jeff
(thanks for the reviews!):

 * Removed the RFC labels, but if folks want a warning instead of a die on
   ls-files -i (see patch 3), let me know
 * Included the path passed to read_directory() in the printed trace2
   statistics
 * Printed trace2 statistics before calling trace2_region_leave()
 * Made sure to convert both trace_performance_leave() calls
 * Applied testcase style fixes
 * Left a comment that directories-visited:1 referred to the toplevel
   directory
 * Fixed up some commit message comments about "exclude" and mentioned yet
   another way it can be confusing
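
For readers unfamiliar with the trace2 statistics mentioned in the list
above, the following shell sketch shows how the tests in this series
pick the statistic out of GIT_TRACE2_PERF output: keep only "data"
lines for read_directory and take the ninth "|"-separated field.  The
sample lines are hypothetical and simplified (real output pads its
columns, and the timestamps, line numbers and timings are made up).

```shell
# Hypothetical, simplified GIT_TRACE2_PERF lines written to a temp file.
trace=$(mktemp)
cat >"$trace" <<'EOF'
18:34:00.000001 dir.c:2790 | d0 | main | region_enter | r1 | 0.001 | | dir | label:read_directory
18:34:00.000002 dir.c:2766 | d0 | main | data | r1 | 0.002 | 0.001 | read_directo | ....path:
18:34:00.000003 dir.c:2775 | d0 | main | data | r1 | 0.002 | 0.001 | read_directo | ....opendir:4
18:34:00.000004 dir.c:2812 | d0 | main | region_leave | r1 | 0.003 | 0.002 | dir | label:read_directory
EOF

# Keep only the data events for read_directory, then extract field 9,
# which holds the "key:value" statistic (with its leading space).
relevant=$(grep 'data.*read_directo' "$trace" | cut -d '|' -f 9)
printf '%s\n' "$relevant"
rm -f "$trace"
```

The region_enter and region_leave lines are filtered out because they
are not "data" events; an empty value after "path:" corresponds to the
toplevel directory, as in the test expectations.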

As noted in previous versions of this series, if folks would prefer ls-files
-i to continue running but print a warning rather than making it an error as
I did in this series, let me know. Also, if anyone has any ideas about a
better place to put the "Some sidenotes" from the sixth commit message
rather than keeping them in a random commit message, that might be helpful
too.

[1] See
https://lore.kernel.org/git/DM6PR00MB06829EC5B85E0C5AC595004E894E9@DM6PR00MB0682.namprd00.prod.outlook.com/
or alternatively https://github.com/git-for-windows/git/issues/2732.
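
As a rough illustration of why ls-files -i needs -o or -c to be
meaningful, here is a minimal shell sketch using a throwaway repository
with made-up file names.  It only uses the two combinations that work
with or without this series; with the series applied, plain
"git ls-files -i" is expected to die instead of silently printing
nothing.

```shell
# Throwaway repository; file names are hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .

echo '*.log' >.gitignore
: >tracked.log
: >debug.log
git add -f tracked.log   # force-add a file despite the ignore rule

# -i -o: untracked files that match an ignore rule.
ignored_untracked=$(git ls-files -i -o --exclude-standard)

# -i -c: tracked files that happen to match an ignore rule
# (the "tracked ignores" discussed in the commit message sidenotes).
ignored_tracked=$(git ls-files -i -c --exclude-standard)

printf 'ignored, untracked: %s\n' "$ignored_untracked"
printf 'ignored, tracked:   %s\n' "$ignored_tracked"
```

The two listings answer different questions, which is why guessing a
default for -i alone is error-prone.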

Derrick Stolee (1):
  dir: update stale description of treat_directory()

Elijah Newren (7):
  dir: convert trace calls to trace2 equivalents
  dir: report number of visited directories and paths with trace2
  ls-files: error out on -i unless -o or -c are specified
  t7300: add testcase showing unnecessary traversal into ignored
    directory
  t3001, t7300: add testcase showcasing missed directory traversal
  dir: avoid unnecessary traversal into ignored directory
  dir: traverse into untracked directories if they may have ignored
    subfiles

 builtin/ls-files.c                 |   3 +
 dir.c                              | 112 +++++++++++-----
 dir.h                              |   4 +
 t/t1306-xdg-files.sh               |   2 +-
 t/t3001-ls-files-others-exclude.sh |   5 +
 t/t3003-ls-files-exclude.sh        |   4 +-
 t/t7063-status-untracked-cache.sh  | 206 +++++++++++++++++------------
 t/t7300-clean.sh                   |  42 ++++++
 t/t7519-status-fsmonitor.sh        |   8 +-
 9 files changed, 259 insertions(+), 127 deletions(-)


base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1020%2Fnewren%2Fdirectory-traversal-fixes-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1020/newren/directory-traversal-fixes-v4
Pull-Request: https://github.com/git/git/pull/1020

Range-diff vs v3:

 1:  9f1c0d78d739 ! 1:  9204e36b7e90 [RFC] dir: convert trace calls to trace2 equivalents
     @@ Metadata
      Author: Elijah Newren <newren@gmail.com>
      
       ## Commit message ##
     -    [RFC] dir: convert trace calls to trace2 equivalents
     +    dir: convert trace calls to trace2 equivalents
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
       }
       
      +static void trace2_read_directory_statistics(struct dir_struct *dir,
     -+					     struct repository *repo)
     ++					     struct repository *repo,
     ++					     const char *path)
      +{
      +	if (!dir->untracked)
      +		return;
     ++	trace2_data_string("read_directory", repo, "path", path);
      +	trace2_data_intmax("read_directory", repo,
      +			   "node-creation", dir->untracked->dir_created);
      +	trace2_data_intmax("read_directory", repo,
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
      +	trace2_region_enter("dir", "read_directory", istate->repo);
       
       	if (has_symlink_leading_path(path, len)) {
     - 		trace_performance_leave("read directory %.*s", len, path);
     +-		trace_performance_leave("read directory %.*s", len, path);
     ++		trace2_region_leave("dir", "read_directory", istate->repo);
     + 		return dir->nr;
     + 	}
     + 
      @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate,
       	QSORT(dir->entries, dir->nr, cmp_dir_entry);
       	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
       
      -	trace_performance_leave("read directory %.*s", len, path);
     ++	if (trace2_is_enabled()) {
     ++		struct strbuf tmp = STRBUF_INIT;
     ++		strbuf_add(&tmp, path, len);
     ++		trace2_read_directory_statistics(dir, istate->repo, tmp.buf);
     ++		strbuf_release(&tmp);
     ++	}
     ++
      +	trace2_region_leave("dir", "read_directory", istate->repo);
       	if (dir->untracked) {
       		static int force_untracked_cache = -1;
     @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate,
       		}
       	}
      +
     -+	if (trace2_is_enabled())
     -+		trace2_read_directory_statistics(dir, istate->repo);
       	return dir->nr;
       }
       
     @@ t/t7063-status-untracked-cache.sh: iuc () {
       	return $ret
       }
       
     -+get_relevant_traces() {
     ++get_relevant_traces () {
      +	# From the GIT_TRACE2_PERF data of the form
      +	#    $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT
      +	# extract the $RELEVANT_STAT fields.  We don't care about region_enter
      +	# or region_leave, or stats for things outside read_directory.
      +	INPUT_FILE=$1
      +	OUTPUT_FILE=$2
     -+	grep data.*read_directo $INPUT_FILE \
     -+	    | cut -d "|" -f 9 \
     -+	    >$OUTPUT_FILE
     ++	grep data.*read_directo $INPUT_FILE |
     ++	    cut -d "|" -f 9 \
     ++	    >"$OUTPUT_FILE"
      +}
      +
      +
     @@ t/t7063-status-untracked-cache.sh: EOF
      -gitignore invalidation: 1
      -directory invalidation: 0
      -opendir: 4
     -+ ..node-creation:3
     -+ ..gitignore-invalidation:1
     -+ ..directory-invalidation:0
     -+ ..opendir:4
     ++ ....path:
     ++ ....node-creation:3
     ++ ....gitignore-invalidation:1
     ++ ....directory-invalidation:0
     ++ ....opendir:4
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: test_expect_success 'untracked cache after fi
      -gitignore invalidation: 0
      -directory invalidation: 0
      -opendir: 0
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:0
     -+ ..opendir:0
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:0
     ++ ....opendir:0
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: A  two
      -gitignore invalidation: 0
      -directory invalidation: 1
      -opendir: 1
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:1
     -+ ..opendir:1
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:1
     ++ ....opendir:1
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: A  two
      -gitignore invalidation: 1
      -directory invalidation: 1
      -opendir: 4
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:1
     -+ ..directory-invalidation:1
     -+ ..opendir:4
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:1
     ++ ....directory-invalidation:1
     ++ ....opendir:4
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: A  two
      -gitignore invalidation: 1
      -directory invalidation: 0
      -opendir: 4
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:1
     -+ ..directory-invalidation:0
     -+ ..opendir:4
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:1
     ++ ....directory-invalidation:0
     ++ ....opendir:4
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: A  one
      -gitignore invalidation: 0
      -directory invalidation: 0
      -opendir: 1
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:0
     -+ ..opendir:1
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:0
     ++ ....opendir:1
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: A  two
      -gitignore invalidation: 0
      -directory invalidation: 0
      -opendir: 1
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:0
     -+ ..opendir:1
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:0
     ++ ....opendir:1
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: test_expect_success 'status after commit' '
      -gitignore invalidation: 0
      -directory invalidation: 0
      -opendir: 2
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:0
     -+ ..opendir:2
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:0
     ++ ....opendir:2
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: test_expect_success 'test sparse status with
      -gitignore invalidation: 1
      -directory invalidation: 2
      -opendir: 2
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:1
     -+ ..directory-invalidation:2
     -+ ..opendir:2
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:1
     ++ ....directory-invalidation:2
     ++ ....opendir:2
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: test_expect_success 'test sparse status again
      -gitignore invalidation: 0
      -directory invalidation: 0
      -opendir: 0
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:0
     -+ ..opendir:0
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:0
     ++ ....opendir:0
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: test_expect_success 'test sparse status with
      -gitignore invalidation: 0
      -directory invalidation: 1
      -opendir: 3
     -+ ..node-creation:2
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:1
     -+ ..opendir:3
     ++ ....path:
     ++ ....node-creation:2
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:1
     ++ ....opendir:3
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
     @@ t/t7063-status-untracked-cache.sh: EOF
      -gitignore invalidation: 0
      -directory invalidation: 0
      -opendir: 0
     -+ ..node-creation:0
     -+ ..gitignore-invalidation:0
     -+ ..directory-invalidation:0
     -+ ..opendir:0
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:0
     ++ ....opendir:0
       EOF
      -	test_cmp ../trace.expect ../trace
      +	test_cmp ../trace.expect ../trace.relevant
 2:  8b511f228af8 ! 2:  6939253be825 [RFC] dir: report number of visited directories and paths with trace2
     @@ Metadata
      Author: Elijah Newren <newren@gmail.com>
      
       ## Commit message ##
     -    [RFC] dir: report number of visited directories and paths with trace2
     +    dir: report number of visited directories and paths with trace2
      
          Provide more statistics in trace2 output that include the number of
          directories and total paths visited by the directory traversal logic.
     @@ dir.c: static enum path_treatment read_directory_recursive(struct dir_struct *di
       
       		if (state > dir_state)
       			dir_state = state;
     -@@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
     - static void trace2_read_directory_statistics(struct dir_struct *dir,
     - 					     struct repository *repo)
     +@@ dir.c: static void trace2_read_directory_statistics(struct dir_struct *dir,
     + 					     struct repository *repo,
     + 					     const char *path)
       {
      +	trace2_data_intmax("read_directory", repo,
      +			   "directories-visited", dir->visited_directories);
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
      +			   "paths-visited", dir->visited_paths);
       	if (!dir->untracked)
       		return;
     - 	trace2_data_intmax("read_directory", repo,
     + 	trace2_data_string("read_directory", repo, "path", path);
      @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate,
       	struct untracked_cache_dir *untracked;
       
     @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate,
      +	dir->visited_directories = 0;
       
       	if (has_symlink_leading_path(path, len)) {
     - 		trace_performance_leave("read directory %.*s", len, path);
     + 		trace2_region_leave("dir", "read_directory", istate->repo);
      
       ## dir.h ##
      @@ dir.h: struct dir_struct {
     @@ dir.h: struct dir_struct {
       /*Count the number of slashes for string s*/
      
       ## t/t7063-status-untracked-cache.sh ##
     -@@ t/t7063-status-untracked-cache.sh: get_relevant_traces() {
     +@@ t/t7063-status-untracked-cache.sh: get_relevant_traces () {
       	INPUT_FILE=$1
       	OUTPUT_FILE=$2
     - 	grep data.*read_directo $INPUT_FILE \
     -+	    | grep -v visited \
     - 	    | cut -d "|" -f 9 \
     - 	    >$OUTPUT_FILE
     + 	grep data.*read_directo $INPUT_FILE |
     +-	    cut -d "|" -f 9 \
     ++	    cut -d "|" -f 9 |
     ++	    grep -v visited \
     + 	    >"$OUTPUT_FILE"
       }
     + 
 3:  44a1322c4402 ! 3:  8d0ca8104be6 [RFC] ls-files: error out on -i unless -o or -c are specified
     @@ Metadata
      Author: Elijah Newren <newren@gmail.com>
      
       ## Commit message ##
     -    [RFC] ls-files: error out on -i unless -o or -c are specified
     +    ls-files: error out on -i unless -o or -c are specified
      
          ls-files --ignored can be used together with either --others or
          --cached.  After being perplexed for a bit and digging in to the code, I
          assumed that ls-files -i was just broken and not printing anything and
     -    had a nice patch ready to submit when I finally realized that -i can be
     +    I had a nice patch ready to submit when I finally realized that -i can be
          used with --cached to find tracked ignores.
      
          While that was a mistake on my part, and a careful reading of the
     @@ builtin/ls-files.c: int cmd_ls_files(int argc, const char **argv, const char *cm
       		ps_matched = xcalloc(pathspec.nr, 1);
       
      +	if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached)
     -+		die("ls-files --ignored is usually used with --others, but --cached is the default.  Please specify which you want.");
     ++		die("ls-files -i must be used with either -o or -c");
      +
       	if ((dir.flags & DIR_SHOW_IGNORED) && !exc_given)
       		die("ls-files --ignored needs some exclude pattern");
 4:  dc3d3f247141 ! 4:  317abab3571e t7300: add testcase showing unnecessary traversal into ignored directory
     @@ t/t7300-clean.sh: test_expect_success 'clean untracked paths by pathspec' '
      +		git clean -ffdxn -e untracked
      +	) &&
      +
     -+	grep data.*read_directo.*visited trace.output \
     -+		| cut -d "|" -f 9 >trace.relevant &&
     ++	# Make sure we only visited into the top-level directory, and did
     ++	# not traverse into the "untracked" subdirectory since it was excluded
     ++	grep data.*read_directo.*directories-visited trace.output |
     ++		cut -d "|" -f 9 >trace.relevant &&
      +	cat >trace.expect <<-EOF &&
     -+	 directories-visited:1
     -+	 paths-visited:4
     ++	 ..directories-visited:1
      +	EOF
      +	test_cmp trace.expect trace.relevant
      +'
 5:  73b03a1e8e05 = 5:  5eb019327b57 t3001, t7300: add testcase showcasing missed directory traversal
 6:  66ffc7f02d08 ! 6:  89cc01ef8598 dir: avoid unnecessary traversal into ignored directory
     @@ Commit message
            since "git ls-files -c -i" is supposed to list them.
      
          * The dir code often uses "ignored" and "excluded" interchangeably,
     -      which you need to keep in mind while reading the code.  Sadly, though,
     -      it can get very confusing since ignore rules can have exclusions, as
     -      in the last of the following .gitignore rules:
     -          .gitignore
     +      which you need to keep in mind while reading the code.
     +
     +    * "exclude" is used multiple ways in the code:
     +
     +      * As noted above, "exclude" is often a synonym for "ignored".
     +
     +      * The logic for parsing .gitignore files was re-used in
     +        .git/info/sparse-checkout, except there it is used to mark paths that
     +        the user wants to *keep*.  This was mostly addressed by commit
     +        65edd96aec ("treewide: rename 'exclude' methods to 'pattern'",
     +        2019-09-03), but every once in a while you'll find a comment about
     +        "exclude" referring to these patterns that might in fact be in use
     +        by the sparse-checkout machinery for inclusion rules.
     +
     +      * The word "EXCLUDE" is also used for pathspec negation, as in
     +          (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
     +        Thus if a user had a .gitignore file containing
                *~
                *.log
                !settings.log
     -      In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
     -      will be true due the the '!' negating the rule.  Someone might refer
     -      to this as "excluded".  That means the file 'settings.log' will not
     -      match, and thus not be ignored.  So we won't return path_excluded for
     -      it.  So it's an exclude rule that prevents the file from being an
     -      exclude.  The non-excluded rules are the ones that result in files
     -      being excludes.  Great fun, eh?
     +        And then ran
     +          git add -- 'settings.*' ':^settings.log'
     +        Then :^settings.log is a pathspec negation making settings.log not
     +        be requested to be added even though all other settings.* files are
     +        being added.  Also, !settings.log in the gitignore file is a negative
     +        exclude pattern meaning that settings.log is normally a file we
     +        want to track even though all other *.log files are ignored.
      
          Sometimes it feels like dir.c needs its own glossary with its many
          definitions, including the multiply-defined terms.
 7:  acde436b220e = 7:  4a561e1229e4 dir: traverse into untracked directories if they may have ignored subfiles
 8:  57135c357774 = 8:  2945e749f5e3 dir: update stale description of treat_directory()

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents
  2021-05-11 18:34     ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
@ 2021-05-11 18:34       ` Elijah Newren via GitGitGadget
  2021-05-11 19:06         ` Jeff Hostetler
  2021-05-11 18:34       ` [PATCH v4 2/8] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
                         ` (7 subsequent siblings)
  8 siblings, 1 reply; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                             |  43 +++++--
 t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------
 t/t7519-status-fsmonitor.sh       |   8 +-
 3 files changed, 155 insertions(+), 101 deletions(-)

diff --git a/dir.c b/dir.c
index 3474e67e8f3c..122fcbffdf89 100644
--- a/dir.c
+++ b/dir.c
@@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	return root;
 }
 
+static void trace2_read_directory_statistics(struct dir_struct *dir,
+					     struct repository *repo,
+					     const char *path)
+{
+	if (!dir->untracked)
+		return;
+	trace2_data_string("read_directory", repo, "path", path);
+	trace2_data_intmax("read_directory", repo,
+			   "node-creation", dir->untracked->dir_created);
+	trace2_data_intmax("read_directory", repo,
+			   "gitignore-invalidation",
+			   dir->untracked->gitignore_invalidated);
+	trace2_data_intmax("read_directory", repo,
+			   "directory-invalidation",
+			   dir->untracked->dir_invalidated);
+	trace2_data_intmax("read_directory", repo,
+			   "opendir", dir->untracked->dir_opened);
+}
+
 int read_directory(struct dir_struct *dir, struct index_state *istate,
 		   const char *path, int len, const struct pathspec *pathspec)
 {
 	struct untracked_cache_dir *untracked;
 
-	trace_performance_enter();
+	trace2_region_enter("dir", "read_directory", istate->repo);
 
 	if (has_symlink_leading_path(path, len)) {
-		trace_performance_leave("read directory %.*s", len, path);
+		trace2_region_leave("dir", "read_directory", istate->repo);
 		return dir->nr;
 	}
 
@@ -2784,23 +2803,20 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 	QSORT(dir->entries, dir->nr, cmp_dir_entry);
 	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
 
-	trace_performance_leave("read directory %.*s", len, path);
+	if (trace2_is_enabled()) {
+		struct strbuf tmp = STRBUF_INIT;
+		strbuf_add(&tmp, path, len);
+		trace2_read_directory_statistics(dir, istate->repo, tmp.buf);
+		strbuf_release(&tmp);
+	}
+
+	trace2_region_leave("dir", "read_directory", istate->repo);
 	if (dir->untracked) {
 		static int force_untracked_cache = -1;
-		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
 
 		if (force_untracked_cache < 0)
 			force_untracked_cache =
 				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
-		trace_printf_key(&trace_untracked_stats,
-				 "node creation: %u\n"
-				 "gitignore invalidation: %u\n"
-				 "directory invalidation: %u\n"
-				 "opendir: %u\n",
-				 dir->untracked->dir_created,
-				 dir->untracked->gitignore_invalidated,
-				 dir->untracked->dir_invalidated,
-				 dir->untracked->dir_opened);
 		if (force_untracked_cache &&
 			dir->untracked == istate->untracked &&
 		    (dir->untracked->dir_opened ||
@@ -2811,6 +2827,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 			FREE_AND_NULL(dir->untracked);
 		}
 	}
+
 	return dir->nr;
 }
 
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index accefde72fb1..9710d33b3cd6 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -57,6 +57,19 @@ iuc () {
 	return $ret
 }
 
+get_relevant_traces () {
+	# From the GIT_TRACE2_PERF data of the form
+	#    $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT
+	# extract the $RELEVANT_STAT fields.  We don't care about region_enter
+	# or region_leave, or stats for things outside read_directory.
+	INPUT_FILE=$1
+	OUTPUT_FILE=$2
+	grep data.*read_directo $INPUT_FILE |
+	    cut -d "|" -f 9 \
+	    >"$OUTPUT_FILE"
+}
+
+
 test_lazy_prereq UNTRACKED_CACHE '
 	{ git update-index --test-untracked-cache; ret=$?; } &&
 	test $ret -ne 1
@@ -129,19 +142,21 @@ EOF
 
 test_expect_success 'status first time (empty cache)' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 3
-gitignore invalidation: 1
-directory invalidation: 0
-opendir: 4
+ ....path:
+ ....node-creation:3
+ ....gitignore-invalidation:1
+ ....directory-invalidation:0
+ ....opendir:4
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache after first status' '
@@ -151,19 +166,21 @@ test_expect_success 'untracked cache after first status' '
 
 test_expect_success 'status second time (fully populated cache)' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 0
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:0
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache after second status' '
@@ -174,8 +191,8 @@ test_expect_success 'untracked cache after second status' '
 test_expect_success 'modify in root directory, one dir invalidation' '
 	avoid_racy &&
 	: >four &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -189,13 +206,15 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 1
-opendir: 1
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:1
+ ....opendir:1
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 
 '
 
@@ -223,8 +242,8 @@ EOF
 test_expect_success 'new .gitignore invalidates recursively' '
 	avoid_racy &&
 	echo four >.gitignore &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -238,13 +257,15 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 1
-directory invalidation: 1
-opendir: 4
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:1
+ ....directory-invalidation:1
+ ....opendir:4
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 
 '
 
@@ -272,8 +293,8 @@ EOF
 test_expect_success 'new info/exclude invalidates everything' '
 	avoid_racy &&
 	echo three >>.git/info/exclude &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -285,13 +306,15 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 1
-directory invalidation: 0
-opendir: 4
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:1
+ ....directory-invalidation:0
+ ....opendir:4
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump' '
@@ -330,8 +353,8 @@ EOF
 '
 
 test_expect_success 'status after the move' '
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -343,13 +366,15 @@ A  one
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 1
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:1
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump' '
@@ -389,8 +414,8 @@ EOF
 '
 
 test_expect_success 'status after the move' '
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -402,13 +427,15 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 1
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:1
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump' '
@@ -438,8 +465,8 @@ test_expect_success 'set up for sparse checkout testing' '
 '
 
 test_expect_success 'status after commit' '
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -448,13 +475,15 @@ test_expect_success 'status after commit' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 2
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:2
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache correct after commit' '
@@ -496,9 +525,9 @@ test_expect_success 'create/modify files, some of which are gitignored' '
 '
 
 test_expect_success 'test sparse status with untracked cache' '
-	: >../trace &&
+	: >../trace.output &&
 	avoid_racy &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -509,13 +538,15 @@ test_expect_success 'test sparse status with untracked cache' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 1
-directory invalidation: 2
-opendir: 2
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:1
+ ....directory-invalidation:2
+ ....opendir:2
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache correct after status' '
@@ -539,8 +570,8 @@ EOF
 
 test_expect_success 'test sparse status again with untracked cache' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -551,13 +582,15 @@ test_expect_success 'test sparse status again with untracked cache' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 0
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:0
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'set up for test of subdir and sparse checkouts' '
@@ -568,8 +601,8 @@ test_expect_success 'set up for test of subdir and sparse checkouts' '
 
 test_expect_success 'test sparse status with untracked cache and subdir' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -581,13 +614,15 @@ test_expect_success 'test sparse status with untracked cache and subdir' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 2
-gitignore invalidation: 0
-directory invalidation: 1
-opendir: 3
+ ....path:
+ ....node-creation:2
+ ....gitignore-invalidation:0
+ ....directory-invalidation:1
+ ....opendir:3
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump (sparse/subdirs)' '
@@ -616,19 +651,21 @@ EOF
 
 test_expect_success 'test sparse status again with untracked cache and subdir' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 0
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:0
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'move entry in subdir from untracked to cached' '
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..637391c6ce46 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -334,7 +334,7 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
 		git config core.fsmonitor .git/hooks/fsmonitor-test &&
 		git update-index --untracked-cache &&
 		git update-index --fsmonitor &&
-		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \
+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-before" \
 		git status &&
 		test-tool dump-untracked-cache >../before
 	) &&
@@ -346,12 +346,12 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
 	EOF
 	(
 		cd dot-git &&
-		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \
+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-after" \
 		git status &&
 		test-tool dump-untracked-cache >../after
 	) &&
-	grep "directory invalidation" trace-before >>before &&
-	grep "directory invalidation" trace-after >>after &&
+	grep "directory-invalidation" trace-before | cut -d"|" -f 9 >>before &&
+	grep "directory-invalidation" trace-after  | cut -d"|" -f 9 >>after &&
 	# UNTR extension unchanged, dir invalidation count unchanged
 	test_cmp before after
 '
-- 
gitgitgadget


* [PATCH v4 2/8] dir: report number of visited directories and paths with trace2
  2021-05-11 18:34     ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
  2021-05-11 18:34       ` [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
@ 2021-05-11 18:34       ` Elijah Newren via GitGitGadget
  2021-05-11 18:34       ` [PATCH v4 3/8] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
                         ` (6 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Provide more statistics in trace2 output that include the number of
directories and total paths visited by the directory traversal logic.
Subsequent patches will take advantage of this to ensure we do not
unnecessarily traverse into ignored directories.
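
As a rough illustration of how these counters can be read back out of a
trace file, here is a small sketch. The sample line is made up, and its
column layout is an assumption modeled on the nine "|"-separated fields
that the tests in this series select with `cut -d "|" -f 9`; a real
GIT_TRACE2_PERF file will differ in its timestamps and source locations.

```shell
# Hypothetical sample line shaped like one trace2 "perf" data event.
sample='12:00:00.000000 dir.c:2770 | d0 | main | data | r1 | 0.001 | 0.001 | read_directo | ..directories-visited:1'

# Keep only the matching data event, take the ninth "|"-separated field
# (the key:value payload), then strip everything up to the last colon.
visited=$(printf '%s\n' "$sample" |
	grep "data.*read_directo.*directories-visited" |
	cut -d "|" -f 9 |
	sed -e 's/.*://')
echo "$visited"
```

For this made-up line the sketch prints 1, i.e. only the top-level
directory was opened.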

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                             | 8 ++++++++
 dir.h                             | 4 ++++
 t/t7063-status-untracked-cache.sh | 3 ++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 122fcbffdf89..69b8c9d7f9fb 100644
--- a/dir.c
+++ b/dir.c
@@ -2440,6 +2440,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 
 	if (open_cached_dir(&cdir, dir, untracked, istate, &path, check_only))
 		goto out;
+	dir->visited_directories++;
 
 	if (untracked)
 		untracked->check_only = !!check_only;
@@ -2448,6 +2449,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 		/* check how the file or directory should be treated */
 		state = treat_path(dir, untracked, &cdir, istate, &path,
 				   baselen, pathspec);
+		dir->visited_paths++;
 
 		if (state > dir_state)
 			dir_state = state;
@@ -2764,6 +2766,10 @@ static void trace2_read_directory_statistics(struct dir_struct *dir,
 					     struct repository *repo,
 					     const char *path)
 {
+	trace2_data_intmax("read_directory", repo,
+			   "directories-visited", dir->visited_directories);
+	trace2_data_intmax("read_directory", repo,
+			   "paths-visited", dir->visited_paths);
 	if (!dir->untracked)
 		return;
 	trace2_data_string("read_directory", repo, "path", path);
@@ -2785,6 +2791,8 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 	struct untracked_cache_dir *untracked;
 
 	trace2_region_enter("dir", "read_directory", istate->repo);
+	dir->visited_paths = 0;
+	dir->visited_directories = 0;
 
 	if (has_symlink_leading_path(path, len)) {
 		trace2_region_leave("dir", "read_directory", istate->repo);
diff --git a/dir.h b/dir.h
index 04d886cfce75..22c67907f689 100644
--- a/dir.h
+++ b/dir.h
@@ -336,6 +336,10 @@ struct dir_struct {
 	struct oid_stat ss_info_exclude;
 	struct oid_stat ss_excludes_file;
 	unsigned unmanaged_exclude_files;
+
+	/* Stats about the traversal */
+	unsigned visited_paths;
+	unsigned visited_directories;
 };
 
 /*Count the number of slashes for string s*/
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index 9710d33b3cd6..a0c123b0a77a 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -65,7 +65,8 @@ get_relevant_traces () {
 	INPUT_FILE=$1
 	OUTPUT_FILE=$2
 	grep data.*read_directo $INPUT_FILE |
-	    cut -d "|" -f 9 \
+	    cut -d "|" -f 9 |
+	    grep -v visited \
 	    >"$OUTPUT_FILE"
 }
 
-- 
gitgitgadget


* [PATCH v4 3/8] ls-files: error out on -i unless -o or -c are specified
  2021-05-11 18:34     ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
  2021-05-11 18:34       ` [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
  2021-05-11 18:34       ` [PATCH v4 2/8] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
@ 2021-05-11 18:34       ` Elijah Newren via GitGitGadget
  2021-05-11 18:34       ` [PATCH v4 4/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
                         ` (5 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

ls-files --ignored can be used together with either --others or
--cached.  After being perplexed for a bit and digging into the code, I
assumed that ls-files -i was just broken and not printing anything, and
I had a nice patch ready to submit when I finally realized that -i can be
used with --cached to find tracked ignores.

While that was a mistake on my part, and a careful reading of the
documentation could have made this more clear, I suspect this is an
error others are likely to make as well.  In fact, of two uses in our
testsuite, I believe one of the two did make this error.  In t1306.13,
there are NO tracked files, and all the excludes built up and used in
that test and in previous tests thus have to be about untracked files.
However, since they were looking for an empty result, the mistake went
unnoticed as their erroneous command also just happened to give an empty
answer.

-i will most of the time be used with -o, which would suggest we could just
make -i imply -o in the absence of either a -o or -c, but that would be
a backward incompatible break.  Instead, let's just flag -i without
either a -o or -c as an error, and update the two relevant testcases to
specify their intent.
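
To make the two meanings of -i concrete, here is a minimal sketch in a
throwaway repository.  The file names are invented for illustration and
are not taken from the testsuite; the point is only that -i -c and -i -o
answer different questions.

```shell
# Throwaway repository; all file names here are hypothetical.
repo=$(mktemp -d) &&
cd "$repo" &&
git init -q &&
echo "*.log" >.gitignore &&
: >tracked.log &&
: >untracked.log &&
git add -f tracked.log &&

# Tracked files that match an ignore rule ("tracked ignores").
tracked_ignored=$(git ls-files -i -c --exclude-standard) &&
# Untracked files that match an ignore rule.
untracked_ignored=$(git ls-files -i -o --exclude-standard) &&
echo "tracked: $tracked_ignored" &&
echo "untracked: $untracked_ignored"
```

The first query reports tracked.log and the second reports
untracked.log, which is why a bare -i gives no hint about which of the
two the caller meant.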

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/ls-files.c          | 3 +++
 t/t1306-xdg-files.sh        | 2 +-
 t/t3003-ls-files-exclude.sh | 4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 60a2913a01e9..e8e25006c647 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -748,6 +748,9 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	if (pathspec.nr && error_unmatch)
 		ps_matched = xcalloc(pathspec.nr, 1);
 
+	if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached)
+		die("ls-files -i must be used with either -o or -c");
+
 	if ((dir.flags & DIR_SHOW_IGNORED) && !exc_given)
 		die("ls-files --ignored needs some exclude pattern");
 
diff --git a/t/t1306-xdg-files.sh b/t/t1306-xdg-files.sh
index dd87b43be1a6..40d3c42618c0 100755
--- a/t/t1306-xdg-files.sh
+++ b/t/t1306-xdg-files.sh
@@ -116,7 +116,7 @@ test_expect_success 'Exclusion in a non-XDG global ignore file' '
 test_expect_success 'Checking XDG ignore file when HOME is unset' '
 	(sane_unset HOME &&
 	 git config --unset core.excludesfile &&
-	 git ls-files --exclude-standard --ignored >actual) &&
+	 git ls-files --exclude-standard --ignored --others >actual) &&
 	test_must_be_empty actual
 '
 
diff --git a/t/t3003-ls-files-exclude.sh b/t/t3003-ls-files-exclude.sh
index d5ec333131f9..c41c4f046abf 100755
--- a/t/t3003-ls-files-exclude.sh
+++ b/t/t3003-ls-files-exclude.sh
@@ -29,11 +29,11 @@ test_expect_success 'add file to gitignore' '
 '
 check_all_output
 
-test_expect_success 'ls-files -i lists only tracked-but-ignored files' '
+test_expect_success 'ls-files -i -c lists only tracked-but-ignored files' '
 	echo content >other-file &&
 	git add other-file &&
 	echo file >expect &&
-	git ls-files -i --exclude-standard >output &&
+	git ls-files -i -c --exclude-standard >output &&
 	test_cmp expect output
 '
 
-- 
gitgitgadget


* [PATCH v4 4/8] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-11 18:34     ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
                         ` (2 preceding siblings ...)
  2021-05-11 18:34       ` [PATCH v4 3/8] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
@ 2021-05-11 18:34       ` Elijah Newren via GitGitGadget
  2021-05-11 18:34       ` [PATCH v4 5/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
                         ` (4 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The PNPM package manager is apparently creating deeply nested (but
ignored) directory structures; traversing them is costly
performance-wise, unnecessary, and in some cases is even throwing
warnings/errors because the paths are too long to handle on various
platforms.  Add a testcase that checks for such unnecessary directory
traversal.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7300-clean.sh | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index a74816ca8b46..07e8ba2d4b85 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -746,4 +746,27 @@ test_expect_success 'clean untracked paths by pathspec' '
 	test_must_be_empty actual
 '
 
+test_expect_failure 'avoid traversing into ignored directories' '
+	test_when_finished rm -f output error trace.* &&
+	test_create_repo avoid-traversing-deep-hierarchy &&
+	(
+		cd avoid-traversing-deep-hierarchy &&
+
+		mkdir -p untracked/subdir/with/a &&
+		>untracked/subdir/with/a/random-file.txt &&
+
+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+		git clean -ffdxn -e untracked
+	) &&
+
+	# Make sure we only visited into the top-level directory, and did
+	# not traverse into the "untracked" subdirectory since it was excluded
+	grep data.*read_directo.*directories-visited trace.output |
+		cut -d "|" -f 9 >trace.relevant &&
+	cat >trace.expect <<-EOF &&
+	 ..directories-visited:1
+	EOF
+	test_cmp trace.expect trace.relevant
+'
+
 test_done
-- 
gitgitgadget


* [PATCH v4 5/8] t3001, t7300: add testcase showcasing missed directory traversal
  2021-05-11 18:34     ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
                         ` (3 preceding siblings ...)
  2021-05-11 18:34       ` [PATCH v4 4/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
@ 2021-05-11 18:34       ` Elijah Newren via GitGitGadget
  2021-05-11 18:34       ` [PATCH v4 6/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
                         ` (3 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

In the last commit, we added a testcase showing that the directory
traversal machinery sometimes traverses into directories unnecessarily.
Here we show that there are cases where it does the opposite: it does
not traverse into directories, despite those directories having
important files that need to be flagged.

Add a testcase showing that `git ls-files -o -i --directory` can omit
some of the files it should be listing, and another showing that `git
clean -fX` can fail to clean out some of the expected files.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t3001-ls-files-others-exclude.sh |  5 +++++
 t/t7300-clean.sh                   | 19 +++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index 1ec7cb57c7a8..ac05d1a17931 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -292,6 +292,11 @@ EOF
 	test_cmp expect actual
 '
 
+test_expect_failure 'ls-files with "**" patterns and --directory' '
+	# Expectation same as previous test
+	git ls-files --directory -o -i --exclude "**/a.1" >actual &&
+	test_cmp expect actual
+'
 
 test_expect_success 'ls-files with "**" patterns and no slashes' '
 	git ls-files -o -i --exclude "one**a.1" >actual &&
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 07e8ba2d4b85..34c08c325407 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -769,4 +769,23 @@ test_expect_failure 'avoid traversing into ignored directories' '
 	test_cmp trace.expect trace.relevant
 '
 
+test_expect_failure 'traverse into directories that may have ignored entries' '
+	test_when_finished rm -f output &&
+	test_create_repo need-to-traverse-into-hierarchy &&
+	(
+		cd need-to-traverse-into-hierarchy &&
+		mkdir -p modules/foobar/src/generated &&
+		> modules/foobar/src/generated/code.c &&
+		> modules/foobar/Makefile &&
+		echo "/modules/**/src/generated/" >.gitignore &&
+
+		git clean -fX modules/foobar >../output &&
+
+		grep Removing ../output &&
+
+		test_path_is_missing modules/foobar/src/generated/code.c &&
+		test_path_is_file modules/foobar/Makefile
+	)
+'
+
 test_done
-- 
gitgitgadget


* [PATCH v4 6/8] dir: avoid unnecessary traversal into ignored directory
  2021-05-11 18:34     ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
                         ` (4 preceding siblings ...)
  2021-05-11 18:34       ` [PATCH v4 5/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
@ 2021-05-11 18:34       ` Elijah Newren via GitGitGadget
  2021-05-11 18:34       ` [PATCH v4 7/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
                         ` (2 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The show_other_directories case in treat_directory() tried to handle
both excludes and untracked files with the same logic, and mishandled
both the excludes and the untracked files in the process, in different
ways.  Split that logic apart, and then focus on the logic for the
excludes; a subsequent commit will address the logic for untracked
files.

For show_other_directories, an excluded directory means that
every path underneath that directory will also be excluded.  Given that
the calling code requested to just show directories when everything
under a directory had the same state (that's what the
"DIR_SHOW_OTHER_DIRECTORIES" flag means), we generally do not need to
traverse into such directories and can just immediately mark them as
ignored (i.e. as path_excluded).  The only reason we cannot just
immediately return path_excluded is the DIR_HIDE_EMPTY_DIRECTORIES flag
and the possibility that the ignored directory is an empty directory.
The code previously treated DIR_SHOW_IGNORED_TOO in most cases as an
exception as well, which was wrong.  It can sometimes reduce the number
of cases where we need to recurse (namely if
DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set), but should not be able
to increase the number of cases where we need to recurse.  Fix the logic
accordingly.

Some sidenotes about possible confusion with dir.c:

* "ignored" often refers to an "untracked ignore", i.e. a file which is
  not tracked which matches one of the ignore/exclusion rules.  But you
  can also have a "tracked ignore", a tracked file that happens to match
  one of the ignore/exclusion rules and which dir.c has to worry about
  since "git ls-files -c -i" is supposed to list them.

* The dir code often uses "ignored" and "excluded" interchangeably,
  which you need to keep in mind while reading the code.

* "exclude" is used multiple ways in the code:

  * As noted above, "exclude" is often a synonym for "ignored".

  * The logic for parsing .gitignore files was re-used in
    .git/info/sparse-checkout, except there it is used to mark paths that
    the user wants to *keep*.  This was mostly addressed by commit
    65edd96aec ("treewide: rename 'exclude' methods to 'pattern'",
    2019-09-03), but every once in a while you'll find a comment about
    "exclude" referring to these patterns that might in fact be in use
    by the sparse-checkout machinery for inclusion rules.

  * The word "EXCLUDE" is also used for pathspec negation, as in
      (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
    Thus if a user had a .gitignore file containing
      *~
      *.log
      !settings.log
    And then ran
      git add -- 'settings.*' ':^settings.log'
    Then :^settings.log is a pathspec negation making settings.log not
    be requested to be added even though all other settings.* files are
    being added.  Also, !settings.log in the gitignore file is a negative
    exclude pattern meaning that settings.log is normally a file we
    want to track even though all other *.log files are ignored.
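
The last two senses of "exclude" above can be tried out side by side.
The following is only a sketch in a scratch repository; settings.txt and
debug.log are extra, made-up file names added so that each rule has
something to act on.

```shell
# Scratch repository using the .gitignore contents from the example above.
repo=$(mktemp -d) &&
cd "$repo" &&
git init -q &&
printf '%s\n' '*~' '*.log' '!settings.log' >.gitignore &&
: >settings.log &&
: >settings.txt &&
: >debug.log &&

# Negative exclude pattern: debug.log is ignored, settings.log is not.
ignored=$(git ls-files -o -i --exclude-standard) &&

# Pathspec negation: settings.log is left out of this particular "git add".
git add -- 'settings.*' ':^settings.log' &&
added=$(git ls-files) &&
echo "ignored: $ignored" &&
echo "added: $added"
```

Here only debug.log is listed as ignored, and only settings.txt ends up
in the index, even though settings.log is not ignored.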

Sometimes it feels like dir.c needs its own glossary with its many
definitions, including the multiply-defined terms.

Reported-by: Jason Gore <Jason.Gore@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c            | 44 +++++++++++++++++++++++++++++---------------
 t/t7300-clean.sh |  2 +-
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/dir.c b/dir.c
index 69b8c9d7f9fb..0126e2f08af7 100644
--- a/dir.c
+++ b/dir.c
@@ -1844,6 +1844,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	}
 
 	/* This is the "show_other_directories" case */
+	assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES);
 
 	/*
 	 * If we have a pathspec which could match something _below_ this
@@ -1854,27 +1855,40 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	if (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC)
 		return path_recurse;
 
+	/* Special cases for where this directory is excluded/ignored */
+	if (excluded) {
+		/*
+		 * In the show_other_directories case, if we're not
+		 * hiding empty directories, there is no need to
+		 * recurse into an ignored directory.
+		 */
+		if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+			return path_excluded;
+
+		/*
+		 * Even if we are hiding empty directories, we can still avoid
+		 * recursing into ignored directories for DIR_SHOW_IGNORED_TOO
+		 * if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
+		 */
+		if ((dir->flags & DIR_SHOW_IGNORED_TOO) &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
+			return path_excluded;
+	}
+
 	/*
-	 * Other than the path_recurse case immediately above, we only need
-	 * to recurse into untracked/ignored directories if either of the
-	 * following bits is set:
+	 * Other than the path_recurse case above, we only need to
+	 * recurse into untracked directories if either of the following
+	 * bits is set:
 	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
 	 *                           there are ignored entries below)
 	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
 	 *                                 the directory is empty)
 	 */
-	if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
-		return excluded ? path_excluded : path_untracked;
-
-	/*
-	 * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid
-	 * recursing into ignored directories if the path is excluded and
-	 * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
-	 */
-	if (excluded &&
-	    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
-	    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
-		return path_excluded;
+	if (!excluded &&
+	    !(dir->flags & (DIR_SHOW_IGNORED_TOO |
+			    DIR_HIDE_EMPTY_DIRECTORIES))) {
+		return path_untracked;
+	}
 
 	/*
 	 * Even if we don't want to know all the paths under an untracked or
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 34c08c325407..21e48b3ba591 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -746,7 +746,7 @@ test_expect_success 'clean untracked paths by pathspec' '
 	test_must_be_empty actual
 '
 
-test_expect_failure 'avoid traversing into ignored directories' '
+test_expect_success 'avoid traversing into ignored directories' '
 	test_when_finished rm -f output error trace.* &&
 	test_create_repo avoid-traversing-deep-hierarchy &&
 	(
-- 
gitgitgadget


* [PATCH v4 7/8] dir: traverse into untracked directories if they may have ignored subfiles
  2021-05-11 18:34     ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
                         ` (5 preceding siblings ...)
  2021-05-11 18:34       ` [PATCH v4 6/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
@ 2021-05-11 18:34       ` Elijah Newren via GitGitGadget
  2021-05-11 18:34       ` [PATCH v4 8/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
  2021-05-12 17:28       ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

A directory that is untracked does not imply that all files under it
should be categorized as untracked; in particular, if the caller is
interested in ignored files, many files or directories underneath the
untracked directory may be ignored.  We previously partially handled
this right with DIR_SHOW_IGNORED_TOO, but missed DIR_SHOW_IGNORED.  It
was not obvious, though, because the logic for untracked and excluded
files had been fused together making it harder to reason about.  The
previous commit split that logic out, making it easier to notice that
DIR_SHOW_IGNORED was missing.  Add it.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                              | 10 ++++++----
 t/t3001-ls-files-others-exclude.sh |  2 +-
 t/t7300-clean.sh                   |  2 +-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/dir.c b/dir.c
index 0126e2f08af7..deeff1a58319 100644
--- a/dir.c
+++ b/dir.c
@@ -1877,15 +1877,17 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 	/*
 	 * Other than the path_recurse case above, we only need to
-	 * recurse into untracked directories if either of the following
+	 * recurse into untracked directories if any of the following
 	 * bits is set:
-	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
-	 *                           there are ignored entries below)
+	 *   - DIR_SHOW_IGNORED (because then we need to determine if
+	 *                       there are ignored entries below)
+	 *   - DIR_SHOW_IGNORED_TOO (same as above)
 	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
 	 *                                 the directory is empty)
 	 */
 	if (!excluded &&
-	    !(dir->flags & (DIR_SHOW_IGNORED_TOO |
+	    !(dir->flags & (DIR_SHOW_IGNORED |
+			    DIR_SHOW_IGNORED_TOO |
 			    DIR_HIDE_EMPTY_DIRECTORIES))) {
 		return path_untracked;
 	}
diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index ac05d1a17931..516c95ea0e82 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -292,7 +292,7 @@ EOF
 	test_cmp expect actual
 '
 
-test_expect_failure 'ls-files with "**" patterns and --directory' '
+test_expect_success 'ls-files with "**" patterns and --directory' '
 	# Expectation same as previous test
 	git ls-files --directory -o -i --exclude "**/a.1" >actual &&
 	test_cmp expect actual
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 21e48b3ba591..0399701e6276 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -769,7 +769,7 @@ test_expect_success 'avoid traversing into ignored directories' '
 	test_cmp trace.expect trace.relevant
 '
 
-test_expect_failure 'traverse into directories that may have ignored entries' '
+test_expect_success 'traverse into directories that may have ignored entries' '
 	test_when_finished rm -f output &&
 	test_create_repo need-to-traverse-into-hierarchy &&
 	(
-- 
gitgitgadget


* [PATCH v4 8/8] dir: update stale description of treat_directory()
  2021-05-11 18:34     ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
                         ` (6 preceding siblings ...)
  2021-05-11 18:34       ` [PATCH v4 7/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
@ 2021-05-11 18:34       ` Derrick Stolee via GitGitGadget
  2021-05-12 17:28       ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Derrick Stolee

From: Derrick Stolee <stolee@gmail.com>

The documentation comment for treat_directory() was originally written
in 095952 (Teach directory traversal about subprojects, 2007-04-11)
which was before the 'struct dir_struct' split its bitfield of named
options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct
dir_struct into a single variable, 2009-02-16). When those flags
changed, the comment became stale, since members like
'show_other_directories' transitioned into flags like
DIR_SHOW_OTHER_DIRECTORIES.

Update the comments for treat_directory() to use these flag names rather
than the old member names.
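
As an illustration of the transition described above, here is a minimal
sketch, not git's actual code. The struct name, the helper names, and the
numeric bit values are assumptions made for the example; only the member
names and the DIR_* flag names come from the commit message and comments.

```c
#include <assert.h>
#include <stddef.h>

/* Old style (illustrative): one named bitfield member per option. */
struct old_style_dir {
	unsigned int show_other_directories : 1;
	unsigned int hide_empty_directories : 1;
	unsigned int no_gitlinks : 1;
};

/* New style: bits in a single 'flags' word. Values are made up here. */
enum {
	DIR_SHOW_OTHER_DIRECTORIES = 1 << 0,
	DIR_HIDE_EMPTY_DIRECTORIES = 1 << 1,
	DIR_NO_GITLINKS = 1 << 2
};

/* Hypothetical helper: map the old members onto the new flag bits. */
static unsigned int flags_from_members(const struct old_style_dir *dir)
{
	unsigned int flags = 0;

	assert(dir != NULL);
	if (dir->show_other_directories)
		flags |= DIR_SHOW_OTHER_DIRECTORIES;
	if (dir->hide_empty_directories)
		flags |= DIR_HIDE_EMPTY_DIRECTORIES;
	if (dir->no_gitlinks)
		flags |= DIR_NO_GITLINKS;
	return flags;
}

/* Case 3(a): show as a plain directory without inspecting its contents. */
static int shows_plain_directory(unsigned int flags)
{
	return (flags & DIR_SHOW_OTHER_DIRECTORIES) &&
	       !(flags & DIR_HIDE_EMPTY_DIRECTORIES);
}
```

A comment that still says "show_other_directories is true" corresponds to
the test (flags & DIR_SHOW_OTHER_DIRECTORIES) in the second spelling.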

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/dir.c b/dir.c
index deeff1a58319..993a12145f9d 100644
--- a/dir.c
+++ b/dir.c
@@ -1749,13 +1749,13 @@ static enum exist_status directory_exists_in_index(struct index_state *istate,
  * Case 3: if we didn't have it in the index previously, we
  * have a few sub-cases:
  *
- *  (a) if "show_other_directories" is true, we show it as
- *      just a directory, unless "hide_empty_directories" is
+ *  (a) if DIR_SHOW_OTHER_DIRECTORIES flag is set, we show it as
+ *      just a directory, unless DIR_HIDE_EMPTY_DIRECTORIES is
  *      also true, in which case we need to check if it contains any
  *      untracked and / or ignored files.
- *  (b) if it looks like a git directory, and we don't have
- *      'no_gitlinks' set we treat it as a gitlink, and show it
- *      as a directory.
+ *  (b) if it looks like a git directory and we don't have the
+ *      DIR_NO_GITLINKS flag, then we treat it as a gitlink, and
+ *      show it as a directory.
  *  (c) otherwise, we recurse into it.
  */
 static enum path_treatment treat_directory(struct dir_struct *dir,
@@ -1843,7 +1843,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_recurse;
 	}
 
-	/* This is the "show_other_directories" case */
 	assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES);
 
 	/*
@@ -1858,7 +1857,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	/* Special cases for where this directory is excluded/ignored */
 	if (excluded) {
 		/*
-		 * In the show_other_directories case, if we're not
+		 * If DIR_SHOW_OTHER_DIRECTORIES is set and we're not
 		 * hiding empty directories, there is no need to
 		 * recurse into an ignored directory.
 		 */
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents
  2021-05-11 18:34       ` [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
@ 2021-05-11 19:06         ` Jeff Hostetler
  2021-05-11 20:12           ` Elijah Newren
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff Hostetler @ 2021-05-11 19:06 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon



On 5/11/21 2:34 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>   dir.c                             |  43 +++++--
>   t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------
>   t/t7519-status-fsmonitor.sh       |   8 +-
>   3 files changed, 155 insertions(+), 101 deletions(-)
> 
> diff --git a/dir.c b/dir.c
> index 3474e67e8f3c..122fcbffdf89 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>   	return root;
>   }
>   
> +static void trace2_read_directory_statistics(struct dir_struct *dir,
> +					     struct repository *repo,
> +					     const char *path)
> +{
> +	if (!dir->untracked)
> +		return;
> +	trace2_data_string("read_directory", repo, "path", path);

I'm probably just nit-picking here, but should this look more like:

	if (path && *path)
		trace2_data_string(...)
	if (!dir->untracked)
		return;

Then when you add the visited fields in the next commit,
you'll have the path with them (when present).

(and it would let you optionally avoid the tmp strbuf in
the caller.)

Jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents
  2021-05-11 19:06         ` Jeff Hostetler
@ 2021-05-11 20:12           ` Elijah Newren
  2021-05-11 23:12             ` Jeff Hostetler
  0 siblings, 1 reply; 90+ messages in thread
From: Elijah Newren @ 2021-05-11 20:12 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon

On Tue, May 11, 2021 at 12:06 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
> On 5/11/21 2:34 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >   dir.c                             |  43 +++++--
> >   t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------
> >   t/t7519-status-fsmonitor.sh       |   8 +-
> >   3 files changed, 155 insertions(+), 101 deletions(-)
> >
> > diff --git a/dir.c b/dir.c
> > index 3474e67e8f3c..122fcbffdf89 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
> >       return root;
> >   }
> >
> > +static void trace2_read_directory_statistics(struct dir_struct *dir,
> > +                                          struct repository *repo,
> > +                                          const char *path)
> > +{
> > +     if (!dir->untracked)
> > +             return;
> > +     trace2_data_string("read_directory", repo, "path", path);
>
> I'm probably just nit-picking here, but should this look more like:

nit-picking and questions are totally fine.  :-)  Thanks for reviewing.

>
>         if (path && *path)
>                 trace2_data_string(...)

path is always non-NULL (it'd be an error to call read_directory()
with a NULL path).  So the first part of the check isn't meaningful
for this particular code.  The second half is interesting.  Do we want
to omit the path when it happens to be the toplevel directory (the
case where !*path)?  The original trace_performance_leave() calls
certainly didn't, and I was just trying to provide the same info they
do, as you suggested.  I guess people could determine the path by
knowing that the code doesn't print it when it's empty, but do we want
trace2 users to need to read the code to figure out statistics and
info?

>         if (!dir->untracked)
>                 return;
>
> Then when you add the visited fields in the next commit,
> you'll have the path with them (when present).

There is always a path with them, it's just that the empty string
denotes the toplevel directory.

> (and it would let you optionally avoid the tmp strbuf in
> the caller.)

The path in read_directory() is not necessarily NUL-delimited, so
attempting to use it as-is, or even with your checks, would cause us
to possibly print garbage and do out-of-bounds reads.  We need the tmp
strbuf.
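
To illustrate the point about length-delimited paths, here is a minimal
sketch in plain C. The helper name copy_path_prefix() is hypothetical and
stands in for the temporary strbuf; it is not part of git. It copies exactly
len bytes and appends the terminator, so anything after the first len bytes
of the source buffer is never read.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a freshly allocated, NUL-terminated copy of the first 'len'
 * bytes of 'path'. Returns NULL on allocation failure; the caller
 * frees the result. (Hypothetical helper, for illustration only.)
 */
static char *copy_path_prefix(const char *path, size_t len)
{
	char *copy;

	assert(path != NULL);
	copy = malloc(len + 1);
	if (!copy)
		return NULL;
	memcpy(copy, path, len);	/* reads exactly 'len' bytes */
	copy[len] = '\0';		/* safe to pass to string APIs now */
	return copy;
}
```

With len == 0 this yields an empty string, which matches the convention
above that the empty path denotes the toplevel directory.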

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified
  2021-05-11 17:40         ` Elijah Newren
@ 2021-05-11 22:32           ` Junio C Hamano
  0 siblings, 0 replies; 90+ messages in thread
From: Junio C Hamano @ 2021-05-11 22:32 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon

Elijah Newren <newren@gmail.com> writes:

>> If we are not defaulting to cached, then
>>
>>         die("ls-files -i must be used with either -o or -c");
>>
>> would also make sense.
>
> Ooh, that wording is much nicer.  I'll adopt the latter suggestion,
> but let me know if you'd rather I went the warning route.

Even though warning would be safer, I have no strong preference.
Either way will resolve my puzzlement.
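
For readers following along, the error-out route can be sketched as below.
This is only an illustration: the struct, its field names, and the helper
are assumptions, not the code in builtin/ls-files.c. Only the message text
is taken from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option state; field names are illustrative only. */
struct ls_opts {
	int show_ignored;	/* -i */
	int show_others;	/* -o */
	int show_cached;	/* -c */
};

/*
 * Return NULL if the combination is acceptable, or the error text
 * if -i was given without -o or -c.
 */
static const char *check_ignored_opts(const struct ls_opts *opts)
{
	assert(opts != NULL);
	if (opts->show_ignored && !opts->show_others && !opts->show_cached)
		return "ls-files -i must be used with either -o or -c";
	return NULL;
}
```

The warning route would report the same condition but continue with a
default instead of stopping.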

Thanks.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-11 17:45         ` Elijah Newren
@ 2021-05-11 22:43           ` Junio C Hamano
  2021-05-12  2:07             ` Elijah Newren
  0 siblings, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2021-05-11 22:43 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon

Elijah Newren <newren@gmail.com> writes:

> But to answer your question, the paths we visit are '.', '..', '.git',
> and 'untracked', the first three of which we mark as path_none and
> don't recurse into because of special rules for those paths, and the
> last of which we shouldn't recurse into since it is ignored.

Not a hard requirement, but I wish we entirely ignored "." and
".." in our code (not just not counting, but making whoever calls
readdir() skip and call it again when it gets "." or "..").

  https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html

seems to imply that readdir() may not give "." or ".." (if dot or
dot-dot exists, you are to return them only once, which implies that
it is perfectly OK for dot or dot-dot to be missing).

So dropping the test for number of visited paths would be nicer from
portability's point of view ;-)

Thanks.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents
  2021-05-11 20:12           ` Elijah Newren
@ 2021-05-11 23:12             ` Jeff Hostetler
  2021-05-12  0:44               ` Elijah Newren
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff Hostetler @ 2021-05-11 23:12 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon



On 5/11/21 4:12 PM, Elijah Newren wrote:
> On Tue, May 11, 2021 at 12:06 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>>
>> On 5/11/21 2:34 PM, Elijah Newren via GitGitGadget wrote:
>>> From: Elijah Newren <newren@gmail.com>
>>>
>>> Signed-off-by: Elijah Newren <newren@gmail.com>
>>> ---
>>>    dir.c                             |  43 +++++--
>>>    t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------
>>>    t/t7519-status-fsmonitor.sh       |   8 +-
>>>    3 files changed, 155 insertions(+), 101 deletions(-)
>>>
>>> diff --git a/dir.c b/dir.c
>>> index 3474e67e8f3c..122fcbffdf89 100644
>>> --- a/dir.c
>>> +++ b/dir.c
>>> @@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>>>        return root;
>>>    }
>>>
>>> +static void trace2_read_directory_statistics(struct dir_struct *dir,
>>> +                                          struct repository *repo,
>>> +                                          const char *path)
>>> +{
>>> +     if (!dir->untracked)
>>> +             return;
>>> +     trace2_data_string("read_directory", repo, "path", path);
>>
>> I'm probably just nit-picking here, but should this look more like:
> 
> nit-picking and questions are totally fine.  :-)  Thanks for reviewing.
> 
>>
>>          if (path && *path)
>>                  trace2_data_string(...)
> 
> path is always non-NULL (it'd be an error to call read_directory()
> with a NULL path).  So the first part of the check isn't meaningful
> for this particular code.  The second half is interesting.  Do we want
> to omit the path when it happens to be the toplevel directory (the
> case where !*path)?  The original trace_performance_leave() calls
> certainly didn't, and I was just trying to provide the same info they
> do, as you suggested.  I guess people could determine the path by
> knowing that the code doesn't print it when it's empty, but do we want
> trace2 users to need to read the code to figure out statistics and
> info?

that's fine.  it might be easier to just always print it (even if
blank) so that post-processors know that rather than have to assume
it.

> 
>>          if (!dir->untracked)
>>                  return;
>>
>> Then when you add the visited fields in the next commit,
>> you'll have the path with them (when present).
> 
> There is always a path with them, it's just that the empty string
> denotes the toplevel directory.
> 
>> (and it would let you optionally avoid the tmp strbuf in
>> the caller.)
> 
> The path in read_directory() is not necessarily NUL-delimited, so
> attempting to use it as-is, or even with your checks, would cause us
> to possibly print garbage and do out-of-bounds reads.  We need the tmp
> strbuf.
> 

I just meant, "if (!len) pass NULL, else build and pass tmp.buf".

but i'm nit-picking again.

Jeff


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents
  2021-05-11 23:12             ` Jeff Hostetler
@ 2021-05-12  0:44               ` Elijah Newren
  2021-05-12 12:26                 ` Jeff Hostetler
  0 siblings, 1 reply; 90+ messages in thread
From: Elijah Newren @ 2021-05-12  0:44 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon

On Tue, May 11, 2021 at 4:12 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
> On 5/11/21 4:12 PM, Elijah Newren wrote:
> > On Tue, May 11, 2021 at 12:06 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
> >>
> >> On 5/11/21 2:34 PM, Elijah Newren via GitGitGadget wrote:
> >>> From: Elijah Newren <newren@gmail.com>
> >>>
> >>> Signed-off-by: Elijah Newren <newren@gmail.com>
> >>> ---
> >>>    dir.c                             |  43 +++++--
> >>>    t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------
> >>>    t/t7519-status-fsmonitor.sh       |   8 +-
> >>>    3 files changed, 155 insertions(+), 101 deletions(-)
> >>>
> >>> diff --git a/dir.c b/dir.c
> >>> index 3474e67e8f3c..122fcbffdf89 100644
> >>> --- a/dir.c
> >>> +++ b/dir.c
> >>> @@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
> >>>        return root;
> >>>    }
> >>>
> >>> +static void trace2_read_directory_statistics(struct dir_struct *dir,
> >>> +                                          struct repository *repo,
> >>> +                                          const char *path)
> >>> +{
> >>> +     if (!dir->untracked)
> >>> +             return;
> >>> +     trace2_data_string("read_directory", repo, "path", path);
> >>
> >> I'm probably just nit-picking here, but should this look more like:
> >
> > nit-picking and questions are totally fine.  :-)  Thanks for reviewing.
> >
> >>
> >>          if (path && *path)
> >>                  trace2_data_string(...)
> >
> > path is always non-NULL (it'd be an error to call read_directory()
> > with a NULL path).  So the first part of the check isn't meaningful
> > for this particular code.  The second half is interesting.  Do we want
> > to omit the path when it happens to be the toplevel directory (the
> > case where !*path)?  The original trace_performance_leave() calls
> > certainly didn't, and I was just trying to provide the same info they
> > do, as you suggested.  I guess people could determine the path by
> > knowing that the code doesn't print it when it's empty, but do we want
> > trace2 users to need to read the code to figure out statistics and
> > info?
>
> that's fine.  it might be easier to just always print it (even if
> blank) so that post-processors know that rather than have to assume
> it.
>
> >
> >>          if (!dir->untracked)
> >>                  return;
> >>
> >> Then when you add the visited fields in the next commit,
> >> you'll have the path with them (when present).
> >
> > There is always a path with them, it's just that the empty string
> > denotes the toplevel directory.
> >
> >> (and it would let you optionally avoid the tmp strbuf in
> >> the caller.)
> >
> > The path in read_directory() is not necessarily NUL-delimited, so
> > attempting to use it as-is, or even with your checks, would cause us
> > to possibly print garbage and do out-of-bounds reads.  We need the tmp
> > strbuf.
> >
>
> I just meant, "if (!len) pass NULL, else build and pass tmp.buf".

Ah, gotcha, that's why you were checking non-NULL.

However, what about the other case when len is nonzero.  Let's say
that len = 8 and path points at
"filename*%&#)aWholeBunchOfTotalGarbageAfterTheRealFilenameThatShouldNotBeReadOrIncluded\0\0\0\0\0\0\0\0\0\0"
?

How do you make it print "filename" and only "filename" without the
other stuff without using the tmp strbuf?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-11 22:43           ` Junio C Hamano
@ 2021-05-12  2:07             ` Elijah Newren
  2021-05-12  3:17               ` Junio C Hamano
  0 siblings, 1 reply; 90+ messages in thread
From: Elijah Newren @ 2021-05-12  2:07 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon

On Tue, May 11, 2021 at 3:43 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > But to answer your question, the paths we visit are '.', '..', '.git',
> > and 'untracked', the first three of which we mark as path_none and
> > don't recurse into because of special rules for those paths, and the
> > last of which we shouldn't recurse into since it is ignored.
>
> Not a hard requirement, but I wish we entirely ignored "." and
> ".." in our code (not just not counting, but making whoever calls
> readdir() skip and call it again when it gets "." or "..").
>
>   https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html
>
> seems to imply that readdir() may not give "." or ".." (if dot or
> dot-dot exists, you are to return them only once, which implies that
> it is perfectly OK for dot or dot-dot to be missing).


Something like this?

diff --git a/dir.c b/dir.c
index 993a12145f..7f470bc701 100644
--- a/dir.c
+++ b/dir.c
@@ -2341,7 +2341,11 @@ static int read_cached_dir(struct cached_dir *cdir)
        struct dirent *de;

        if (cdir->fdir) {
-               de = readdir(cdir->fdir);
+               while ((de = readdir(cdir->fdir))) {
+                       /* Ignore '.' and '..' by re-looping; handle the rest */
+                       if (!de || !is_dot_or_dotdot(de->d_name))
+                               break;
+               }
                if (!de) {
                        cdir->d_name = NULL;
                        cdir->d_type = DT_UNKNOWN;

It appears that the other two callers of readdir() in dir.c, namely in
is_empty_dir() and remove_dir_recurse() already have such special
repeat-if-is_dot_or_dotdot() logic built into them, so this was
partially lifted from those.

If you'd like, I can add another patch in the series with this change
so that all readdir() calls in dir.c have such ignore '.' and '..'
logic.  Or, we could perhaps introduce a new readdir() wrapper that
does nothing other than ignore '.' and '..' and have all three of
these callsites use that new wrapper.
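
A standalone sketch of such a wrapper, using only POSIX calls, might look
like the following. The function names here are assumptions chosen for the
example and are not necessarily the ones that would end up in dir.c.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Nonzero if 'name' is exactly "." or "..". */
static int name_is_dot_or_dotdot(const char *name)
{
	return !strcmp(name, ".") || !strcmp(name, "..");
}

/*
 * Like readdir(), but never returns the "." or ".." entries.
 * Returns NULL at end of directory or on error, as readdir() does.
 */
static struct dirent *readdir_skipping_dots(DIR *dirp)
{
	struct dirent *de;

	assert(dirp != NULL);
	while ((de = readdir(dirp)) != NULL) {
		if (!name_is_dot_or_dotdot(de->d_name))
			break;	/* a real entry: hand it to the caller */
	}
	return de;
}
```

Because callers would never see "." or "..", any per-entry statistics they
keep should come out the same whether or not the platform's readdir()
reports those two entries.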

> So dropping the test for number of visited paths would be nicer from
> portability's point of view ;-)

Yep, makes sense.  I already did that in v4, which means it'll
continue to pass with or without the above proposed change to
read_cached_dir().

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-12  2:07             ` Elijah Newren
@ 2021-05-12  3:17               ` Junio C Hamano
  0 siblings, 0 replies; 90+ messages in thread
From: Junio C Hamano @ 2021-05-12  3:17 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon

Elijah Newren <newren@gmail.com> writes:

> If you'd like, I can add another patch in the series with this change
> so that all readdir() calls in dir.c have such ignore '.' and '..'
> logic.  Or, we could perhaps introduce a new readdir() wrapper that
> does nothing other than ignore '.' and '..' and have all three of
> these callsites use that new wrapper.

Yeah, it is good to be consistent (either implementation).

>> So dropping the test for number of visited paths would be nicer from
>> portability's point of view ;-)
>
> Yep, makes sense.  I already did that in v4, which means it'll
> continue to pass with or without the above proposed change to
> read_cached_dir().

Yup.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents
  2021-05-12  0:44               ` Elijah Newren
@ 2021-05-12 12:26                 ` Jeff Hostetler
  2021-05-12 15:24                   ` Elijah Newren
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff Hostetler @ 2021-05-12 12:26 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon



On 5/11/21 8:44 PM, Elijah Newren wrote:
> On Tue, May 11, 2021 at 4:12 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>>
>> On 5/11/21 4:12 PM, Elijah Newren wrote:
>>> On Tue, May 11, 2021 at 12:06 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>>>>
>>>> On 5/11/21 2:34 PM, Elijah Newren via GitGitGadget wrote:
>>>>> From: Elijah Newren <newren@gmail.com>
>>>>>
>>>>> Signed-off-by: Elijah Newren <newren@gmail.com>
>>>>> ---
>>>>>     dir.c                             |  43 +++++--
>>>>>     t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------
>>>>>     t/t7519-status-fsmonitor.sh       |   8 +-
>>>>>     3 files changed, 155 insertions(+), 101 deletions(-)
>>>>>
>>>>> diff --git a/dir.c b/dir.c
>>>>> index 3474e67e8f3c..122fcbffdf89 100644
>>>>> --- a/dir.c
>>>>> +++ b/dir.c
>>>>> @@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>>>>>         return root;
>>>>>     }
>>>>>
>>>>> +static void trace2_read_directory_statistics(struct dir_struct *dir,
>>>>> +                                          struct repository *repo,
>>>>> +                                          const char *path)
>>>>> +{
>>>>> +     if (!dir->untracked)
>>>>> +             return;
>>>>> +     trace2_data_string("read_directory", repo, "path", path);
>>>>
>>>> I'm probably just nit-picking here, but should this look more like:
>>>
>>> nit-picking and questions are totally fine.  :-)  Thanks for reviewing.
>>>
>>>>
>>>>           if (path && *path)
>>>>                   trace2_data_string(...)
>>>
>>> path is always non-NULL (it'd be an error to call read_directory()
>>> with a NULL path).  So the first part of the check isn't meaningful
>>> for this particular code.  The second half is interesting.  Do we want
>>> to omit the path when it happens to be the toplevel directory (the
>>> case where !*path)?  The original trace_performance_leave() calls
>>> certainly didn't, and I was just trying to provide the same info they
>>> do, as you suggested.  I guess people could determine the path by
>>> knowing that the code doesn't print it when it's empty, but do we want
>>> trace2 users to need to read the code to figure out statistics and
>>> info?
>>
>> that's fine.  it might be easier to just always print it (even if
>> blank) so that post-processors know that rather than have to assume
>> it.
>>
>>>
>>>>           if (!dir->untracked)
>>>>                   return;
>>>>
>>>> Then when you add the visited fields in the next commit,
>>>> you'll have the path with them (when present).
>>>
>>> There is always a path with them, it's just that the empty string
>>> denotes the toplevel directory.
>>>
>>>> (and it would let you optionally avoid the tmp strbuf in
>>>> the caller.)
>>>
>>> The path in read_directory() is not necessarily NUL-delimited, so
>>> attempting to use it as-is, or even with your checks, would cause us
>>> to possibly print garbage and do out-of-bounds reads.  We need the tmp
>>> strbuf.
>>>
>>
>> I just meant, "if (!len) pass NULL, else build and pass tmp.buf".
> 
> Ah, gotcha, that's why you were checking non-NULL.
> 
> However, what about the other case when len is nonzero.  Let's say
> that len = 8 and path points at
> "filename*%&#)aWholeBunchOfTotalGarbageAfterTheRealFilenameThatShouldNotBeReadOrIncluded\0\0\0\0\0\0\0\0\0\0"
> ?
> 
> How do you make it print "filename" and only "filename" without the
> other stuff without using the tmp strbuf?
> 

I was still saying to use the "strbuf tmp" in the non-zero len case,
but just pass NULL (or "") for the len==0 case.

Alternatively, since trace2_read_directory_statistics() is a static
local function, we could move all of the path manipulation into it.

static void emit_stats(
	struct dir_struct *dir,
	struct repository *repo,
	const char* path_buf,
	size_t path_len)
{
	if (!path_len)
		trace2_data_string("read_directory", repo,
			"path", "");
	else {
		struct strbuf tmp = STRBUF_INIT;
		strbuf_add(&tmp, path_buf, path_len);
		trace2_data_string("read_directory", repo,
			"path", tmp.buf);
		strbuf_release(&tmp);
	}
	... the rest of intmax stats ...
}


BTW, could we also rename your stats function?  I've been trying
to keep the "trace2_" prefix reserved for the Trace2 API.


Thanks,
Jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents
  2021-05-12 12:26                 ` Jeff Hostetler
@ 2021-05-12 15:24                   ` Elijah Newren
  0 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren @ 2021-05-12 15:24 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon

On Wed, May 12, 2021 at 5:26 AM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
> On 5/11/21 8:44 PM, Elijah Newren wrote:
> > On Tue, May 11, 2021 at 4:12 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
> >>
> >> On 5/11/21 4:12 PM, Elijah Newren wrote:
> >>> On Tue, May 11, 2021 at 12:06 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
> >>>>
> >>>> On 5/11/21 2:34 PM, Elijah Newren via GitGitGadget wrote:
> >>>>> From: Elijah Newren <newren@gmail.com>
> >>>>>
> >>>>> Signed-off-by: Elijah Newren <newren@gmail.com>
> >>>>> ---
> >>>>>     dir.c                             |  43 +++++--
> >>>>>     t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------
> >>>>>     t/t7519-status-fsmonitor.sh       |   8 +-
> >>>>>     3 files changed, 155 insertions(+), 101 deletions(-)
> >>>>>
> >>>>> diff --git a/dir.c b/dir.c
> >>>>> index 3474e67e8f3c..122fcbffdf89 100644
> >>>>> --- a/dir.c
> >>>>> +++ b/dir.c
> >>>>> @@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
> >>>>>         return root;
> >>>>>     }
> >>>>>
> >>>>> +static void trace2_read_directory_statistics(struct dir_struct *dir,
> >>>>> +                                          struct repository *repo,
> >>>>> +                                          const char *path)
> >>>>> +{
> >>>>> +     if (!dir->untracked)
> >>>>> +             return;
> >>>>> +     trace2_data_string("read_directory", repo, "path", path);
> >>>>
> >>>> I'm probably just nit-picking here, but should this look more like:
> >>>
> >>> nit-picking and questions are totally fine.  :-)  Thanks for reviewing.
> >>>
> >>>>
> >>>>           if (path && *path)
> >>>>                   trace2_data_string(...)
> >>>
> >>> path is always non-NULL (it'd be an error to call read_directory()
> >>> with a NULL path).  So the first part of the check isn't meaningful
> >>> for this particular code.  The second half is interesting.  Do we want
> >>> to omit the path when it happens to be the toplevel directory (the
> >>> case where !*path)?  The original trace_performance_leave() calls
> >>> certainly didn't, and I was just trying to provide the same info they
> >>> do, as you suggested.  I guess people could determine the path by
> >>> knowing that the code doesn't print it when it's empty, but do we want
> >>> trace2 users to need to read the code to figure out statistics and
> >>> info?
> >>
> >> that's fine.  it might be easier to just always print it (even if
> >> blank) so that post-processors know that rather than have to assume
> >> it.
> >>
> >>>
> >>>>           if (!dir->untracked)
> >>>>                   return;
> >>>>
> >>>> Then when you add the visited fields in the next commit,
> >>>> you'll have the path with them (when present).
> >>>
> >>> There is always a path with them, it's just that the empty string
> >>> denotes the toplevel directory.
> >>>
> >>>> (and it would let you optionally avoid the tmp strbuf in
> >>>> the caller.)
> >>>
> >>> The path in read_directory() is not necessarily NUL-delimited, so
> >>> attempting to use it as-is, or even with your checks, would cause us
> >>> to possibly print garbage and do out-of-bounds reads.  We need the tmp
> >>> strbuf.
> >>>
> >>
> >> I just meant, "if (!len) pass NULL, else build and pass tmp.buf".
> >
> > Ah, gotcha, that's why you were checking non-NULL.
> >
> > However, what about the other case when len is nonzero.  Let's say
> > that len = 8 and path points at
> > "filename*%&#)aWholeBunchOfTotalGarbageAfterTheRealFilenameThatShouldNotBeReadOrIncluded\0\0\0\0\0\0\0\0\0\0"
> > ?
> >
> > How do you make it print "filename" and only "filename" without the
> > other stuff without using the tmp strbuf?
> >
>
> I was still saying to use the "strbuf tmp" in the non-zero len case,
> but just pass NULL (or "") for the len==0 case.

Ah, now I see what you were saying.  Sorry for not getting it earlier.

> Alternatively, since trace2_read_directory_statistics() is a static
> local function, we could move all of the path manipulation into it.
>
> static void emit_stats(
>         struct dir_struct *dir,
>         struct repository *repo,
>         const char* path_buf,
>         size_t path_len)
> {
>         if (!path_len)
>                 trace2_data_string("read_directory", repo,
>                         "path", "");
>         else {
>                 struct strbuf tmp = STRBUF_INIT;
>                 strbuf_add(&tmp, path_buf, path_len);
>                 trace2_data_string("read_directory", repo,
>                         "path", tmp.buf);
>                 strbuf_release(&tmp);
>         }
>         ... the rest of intmax stats ...
> }

Makes sense.

> BTW, could we also rename your stats function?  I've been trying
> to keep the "trace2_" prefix reserved for the Trace2 API.

Sure, will do.


* [PATCH v5 0/9] Directory traversal fixes
  2021-05-11 18:34     ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
                         ` (7 preceding siblings ...)
  2021-05-11 18:34       ` [PATCH v4 8/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
@ 2021-05-12 17:28       ` Elijah Newren via GitGitGadget
  2021-05-12 17:28         ` [PATCH v5 1/9] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
                           ` (9 more replies)
  8 siblings, 10 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren

This patchset fixes a few directory traversal issues, where fill_directory()
would traverse into directories that it shouldn't and not traverse into
directories that it should (one of which was originally reported on this
list at [1]). It also includes a few cleanups.

Changes since v4:

 * Tweak the trace2 statistics emitting a bit, as per suggestions from Jeff.
 * Introduce a new readdir_skip_dot_and_dotdot() helper at the end of the
   series, and use it everywhere we repeat the same code to skip '.' and
   '..' entries from readdir. Also use it in dir.c's read_cached_dir() so we
   can be consistent about skipping it, even for statistics, across
   platforms.
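
As a rough sketch of the second item, a helper of this kind could look like
the following in plain POSIX C.  This is an assumption about its general
shape rather than the code from patch 9/9, and the names next_real_entry()
and count_real_entries() are made up for the example.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Hypothetical sketch: like readdir(), but never returns the "." or
 * ".." entries.  Returns NULL at end of directory or on error.
 */
static struct dirent *next_real_entry(DIR *dirp)
{
	struct dirent *e;

	assert(dirp != NULL);
	while ((e = readdir(dirp)) != NULL) {
		if (strcmp(e->d_name, ".") && strcmp(e->d_name, ".."))
			break;
	}
	return e;
}

/*
 * Count the entries of path, excluding "." and "..".
 * Returns -1 if the directory cannot be opened.
 */
static int count_real_entries(const char *path)
{
	DIR *dirp = opendir(path);
	int n = 0;

	if (!dirp)
		return -1;
	while (next_real_entry(dirp))
		n++;
	closedir(dirp);
	return n;
}
```

Putting the skip logic in one place means every caller counts the same set
of entries, which matters when the counts are reported as statistics.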

If anyone has any ideas about a better place to put the "Some sidenotes"
from the sixth commit message rather than keeping them in a random commit
message, that might be helpful.

[1] See
https://lore.kernel.org/git/DM6PR00MB06829EC5B85E0C5AC595004E894E9@DM6PR00MB0682.namprd00.prod.outlook.com/
or alternatively https://github.com/git-for-windows/git/issues/2732.

Derrick Stolee (1):
  dir: update stale description of treat_directory()

Elijah Newren (8):
  dir: convert trace calls to trace2 equivalents
  dir: report number of visited directories and paths with trace2
  ls-files: error out on -i unless -o or -c are specified
  t7300: add testcase showing unnecessary traversal into ignored
    directory
  t3001, t7300: add testcase showcasing missed directory traversal
  dir: avoid unnecessary traversal into ignored directory
  dir: traverse into untracked directories if they may have ignored
    subfiles
  dir: introduce readdir_skip_dot_and_dotdot() helper

 builtin/clean.c                    |   4 +-
 builtin/ls-files.c                 |   3 +
 builtin/worktree.c                 |   4 +-
 diff-no-index.c                    |   5 +-
 dir.c                              | 146 +++++++++++++-------
 dir.h                              |   6 +
 entry.c                            |   5 +-
 notes-merge.c                      |   5 +-
 object-file.c                      |   4 +-
 packfile.c                         |   5 +-
 rerere.c                           |   4 +-
 t/t1306-xdg-files.sh               |   2 +-
 t/t3001-ls-files-others-exclude.sh |   5 +
 t/t3003-ls-files-exclude.sh        |   4 +-
 t/t7063-status-untracked-cache.sh  | 206 +++++++++++++++++------------
 t/t7300-clean.sh                   |  42 ++++++
 t/t7519-status-fsmonitor.sh        |   8 +-
 worktree.c                         |  12 +-
 18 files changed, 298 insertions(+), 172 deletions(-)


base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1020%2Fnewren%2Fdirectory-traversal-fixes-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1020/newren/directory-traversal-fixes-v5
Pull-Request: https://github.com/git/git/pull/1020

Range-diff vs v4:

  1:  9204e36b7e90 !  1:  6b1b4820dd20 dir: convert trace calls to trace2 equivalents
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
       	return root;
       }
       
     -+static void trace2_read_directory_statistics(struct dir_struct *dir,
     -+					     struct repository *repo,
     -+					     const char *path)
     ++static void emit_traversal_statistics(struct dir_struct *dir,
     ++				      struct repository *repo,
     ++				      const char *path,
     ++				      int path_len)
      +{
     ++	if (!trace2_is_enabled())
     ++		return;
     ++
     ++	if (!path_len) {
     ++		trace2_data_string("read_directory", repo, "path", "");
     ++	} else {
     ++		struct strbuf tmp = STRBUF_INIT;
     ++		strbuf_add(&tmp, path, path_len);
     ++		trace2_data_string("read_directory", repo, "path", tmp.buf);
     ++		strbuf_release(&tmp);
     ++	}
     ++
      +	if (!dir->untracked)
      +		return;
     -+	trace2_data_string("read_directory", repo, "path", path);
      +	trace2_data_intmax("read_directory", repo,
      +			   "node-creation", dir->untracked->dir_created);
      +	trace2_data_intmax("read_directory", repo,
     @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate,
       	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
       
      -	trace_performance_leave("read directory %.*s", len, path);
     -+	if (trace2_is_enabled()) {
     -+		struct strbuf tmp = STRBUF_INIT;
     -+		strbuf_add(&tmp, path, len);
     -+		trace2_read_directory_statistics(dir, istate->repo, tmp.buf);
     -+		strbuf_release(&tmp);
     -+	}
     ++	emit_traversal_statistics(dir, istate->repo, path, len);
      +
      +	trace2_region_leave("dir", "read_directory", istate->repo);
       	if (dir->untracked) {
  2:  6939253be825 !  2:  cfe2898b7a7e dir: report number of visited directories and paths with trace2
     @@ dir.c: static enum path_treatment read_directory_recursive(struct dir_struct *di
       
       		if (state > dir_state)
       			dir_state = state;
     -@@ dir.c: static void trace2_read_directory_statistics(struct dir_struct *dir,
     - 					     struct repository *repo,
     - 					     const char *path)
     - {
     +@@ dir.c: static void emit_traversal_statistics(struct dir_struct *dir,
     + 		strbuf_release(&tmp);
     + 	}
     + 
      +	trace2_data_intmax("read_directory", repo,
      +			   "directories-visited", dir->visited_directories);
      +	trace2_data_intmax("read_directory", repo,
      +			   "paths-visited", dir->visited_paths);
     ++
       	if (!dir->untracked)
       		return;
     - 	trace2_data_string("read_directory", repo, "path", path);
     + 	trace2_data_intmax("read_directory", repo,
      @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate,
       	struct untracked_cache_dir *untracked;
       
  3:  8d0ca8104be6 =  3:  279ef30ffbc2 ls-files: error out on -i unless -o or -c are specified
  4:  317abab3571e =  4:  5a8807a1992c t7300: add testcase showing unnecessary traversal into ignored directory
  5:  5eb019327b57 =  5:  b014ccbbaf3e t3001, t7300: add testcase showcasing missed directory traversal
  6:  89cc01ef8598 =  6:  ae1c9e37b21b dir: avoid unnecessary traversal into ignored directory
  7:  4a561e1229e4 =  7:  6fa1e85edf2f dir: traverse into untracked directories if they may have ignored subfiles
  8:  2945e749f5e3 =  8:  179f992edc92 dir: update stale description of treat_directory()
  -:  ------------ >  9:  b7c6176560bd dir: introduce readdir_skip_dot_and_dotdot() helper

-- 
gitgitgadget


* [PATCH v5 1/9] dir: convert trace calls to trace2 equivalents
  2021-05-12 17:28       ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
@ 2021-05-12 17:28         ` Elijah Newren via GitGitGadget
  2021-05-12 17:28         ` [PATCH v5 2/9] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
                           ` (8 subsequent siblings)
  9 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                             |  50 ++++++--
 t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------
 t/t7519-status-fsmonitor.sh       |   8 +-
 3 files changed, 162 insertions(+), 101 deletions(-)

diff --git a/dir.c b/dir.c
index 3474e67e8f3c..cf19a83d3e2c 100644
--- a/dir.c
+++ b/dir.c
@@ -2760,15 +2760,46 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	return root;
 }
 
+static void emit_traversal_statistics(struct dir_struct *dir,
+				      struct repository *repo,
+				      const char *path,
+				      int path_len)
+{
+	if (!trace2_is_enabled())
+		return;
+
+	if (!path_len) {
+		trace2_data_string("read_directory", repo, "path", "");
+	} else {
+		struct strbuf tmp = STRBUF_INIT;
+		strbuf_add(&tmp, path, path_len);
+		trace2_data_string("read_directory", repo, "path", tmp.buf);
+		strbuf_release(&tmp);
+	}
+
+	if (!dir->untracked)
+		return;
+	trace2_data_intmax("read_directory", repo,
+			   "node-creation", dir->untracked->dir_created);
+	trace2_data_intmax("read_directory", repo,
+			   "gitignore-invalidation",
+			   dir->untracked->gitignore_invalidated);
+	trace2_data_intmax("read_directory", repo,
+			   "directory-invalidation",
+			   dir->untracked->dir_invalidated);
+	trace2_data_intmax("read_directory", repo,
+			   "opendir", dir->untracked->dir_opened);
+}
+
 int read_directory(struct dir_struct *dir, struct index_state *istate,
 		   const char *path, int len, const struct pathspec *pathspec)
 {
 	struct untracked_cache_dir *untracked;
 
-	trace_performance_enter();
+	trace2_region_enter("dir", "read_directory", istate->repo);
 
 	if (has_symlink_leading_path(path, len)) {
-		trace_performance_leave("read directory %.*s", len, path);
+		trace2_region_leave("dir", "read_directory", istate->repo);
 		return dir->nr;
 	}
 
@@ -2784,23 +2815,15 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 	QSORT(dir->entries, dir->nr, cmp_dir_entry);
 	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
 
-	trace_performance_leave("read directory %.*s", len, path);
+	emit_traversal_statistics(dir, istate->repo, path, len);
+
+	trace2_region_leave("dir", "read_directory", istate->repo);
 	if (dir->untracked) {
 		static int force_untracked_cache = -1;
-		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
 
 		if (force_untracked_cache < 0)
 			force_untracked_cache =
 				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
-		trace_printf_key(&trace_untracked_stats,
-				 "node creation: %u\n"
-				 "gitignore invalidation: %u\n"
-				 "directory invalidation: %u\n"
-				 "opendir: %u\n",
-				 dir->untracked->dir_created,
-				 dir->untracked->gitignore_invalidated,
-				 dir->untracked->dir_invalidated,
-				 dir->untracked->dir_opened);
 		if (force_untracked_cache &&
 			dir->untracked == istate->untracked &&
 		    (dir->untracked->dir_opened ||
@@ -2811,6 +2834,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 			FREE_AND_NULL(dir->untracked);
 		}
 	}
+
 	return dir->nr;
 }
 
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index accefde72fb1..9710d33b3cd6 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -57,6 +57,19 @@ iuc () {
 	return $ret
 }
 
+get_relevant_traces () {
+	# From the GIT_TRACE2_PERF data of the form
+	#    $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT
+	# extract the $RELEVANT_STAT fields.  We don't care about region_enter
+	# or region_leave, or stats for things outside read_directory.
+	INPUT_FILE=$1
+	OUTPUT_FILE=$2
+	grep data.*read_directo $INPUT_FILE |
+	    cut -d "|" -f 9 \
+	    >"$OUTPUT_FILE"
+}
+
+
 test_lazy_prereq UNTRACKED_CACHE '
 	{ git update-index --test-untracked-cache; ret=$?; } &&
 	test $ret -ne 1
@@ -129,19 +142,21 @@ EOF
 
 test_expect_success 'status first time (empty cache)' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 3
-gitignore invalidation: 1
-directory invalidation: 0
-opendir: 4
+ ....path:
+ ....node-creation:3
+ ....gitignore-invalidation:1
+ ....directory-invalidation:0
+ ....opendir:4
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache after first status' '
@@ -151,19 +166,21 @@ test_expect_success 'untracked cache after first status' '
 
 test_expect_success 'status second time (fully populated cache)' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 0
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:0
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache after second status' '
@@ -174,8 +191,8 @@ test_expect_success 'untracked cache after second status' '
 test_expect_success 'modify in root directory, one dir invalidation' '
 	avoid_racy &&
 	: >four &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -189,13 +206,15 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 1
-opendir: 1
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:1
+ ....opendir:1
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 
 '
 
@@ -223,8 +242,8 @@ EOF
 test_expect_success 'new .gitignore invalidates recursively' '
 	avoid_racy &&
 	echo four >.gitignore &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -238,13 +257,15 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 1
-directory invalidation: 1
-opendir: 4
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:1
+ ....directory-invalidation:1
+ ....opendir:4
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 
 '
 
@@ -272,8 +293,8 @@ EOF
 test_expect_success 'new info/exclude invalidates everything' '
 	avoid_racy &&
 	echo three >>.git/info/exclude &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -285,13 +306,15 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 1
-directory invalidation: 0
-opendir: 4
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:1
+ ....directory-invalidation:0
+ ....opendir:4
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump' '
@@ -330,8 +353,8 @@ EOF
 '
 
 test_expect_success 'status after the move' '
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -343,13 +366,15 @@ A  one
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 1
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:1
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump' '
@@ -389,8 +414,8 @@ EOF
 '
 
 test_expect_success 'status after the move' '
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -402,13 +427,15 @@ A  two
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 1
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:1
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump' '
@@ -438,8 +465,8 @@ test_expect_success 'set up for sparse checkout testing' '
 '
 
 test_expect_success 'status after commit' '
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -448,13 +475,15 @@ test_expect_success 'status after commit' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 2
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:2
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache correct after commit' '
@@ -496,9 +525,9 @@ test_expect_success 'create/modify files, some of which are gitignored' '
 '
 
 test_expect_success 'test sparse status with untracked cache' '
-	: >../trace &&
+	: >../trace.output &&
 	avoid_racy &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -509,13 +538,15 @@ test_expect_success 'test sparse status with untracked cache' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 1
-directory invalidation: 2
-opendir: 2
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:1
+ ....directory-invalidation:2
+ ....opendir:2
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'untracked cache correct after status' '
@@ -539,8 +570,8 @@ EOF
 
 test_expect_success 'test sparse status again with untracked cache' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -551,13 +582,15 @@ test_expect_success 'test sparse status again with untracked cache' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 0
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:0
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'set up for test of subdir and sparse checkouts' '
@@ -568,8 +601,8 @@ test_expect_success 'set up for test of subdir and sparse checkouts' '
 
 test_expect_success 'test sparse status with untracked cache and subdir' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	cat >../status.expect <<EOF &&
@@ -581,13 +614,15 @@ test_expect_success 'test sparse status with untracked cache and subdir' '
 EOF
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 2
-gitignore invalidation: 0
-directory invalidation: 1
-opendir: 3
+ ....path:
+ ....node-creation:2
+ ....gitignore-invalidation:0
+ ....directory-invalidation:1
+ ....opendir:3
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'verify untracked cache dump (sparse/subdirs)' '
@@ -616,19 +651,21 @@ EOF
 
 test_expect_success 'test sparse status again with untracked cache and subdir' '
 	avoid_racy &&
-	: >../trace &&
-	GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 	git status --porcelain >../status.actual &&
 	iuc status --porcelain >../status.iuc &&
 	test_cmp ../status.expect ../status.iuc &&
 	test_cmp ../status.expect ../status.actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
 	cat >../trace.expect <<EOF &&
-node creation: 0
-gitignore invalidation: 0
-directory invalidation: 0
-opendir: 0
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:0
 EOF
-	test_cmp ../trace.expect ../trace
+	test_cmp ../trace.expect ../trace.relevant
 '
 
 test_expect_success 'move entry in subdir from untracked to cached' '
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 45d025f96010..637391c6ce46 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -334,7 +334,7 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
 		git config core.fsmonitor .git/hooks/fsmonitor-test &&
 		git update-index --untracked-cache &&
 		git update-index --fsmonitor &&
-		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \
+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-before" \
 		git status &&
 		test-tool dump-untracked-cache >../before
 	) &&
@@ -346,12 +346,12 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
 	EOF
 	(
 		cd dot-git &&
-		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \
+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-after" \
 		git status &&
 		test-tool dump-untracked-cache >../after
 	) &&
-	grep "directory invalidation" trace-before >>before &&
-	grep "directory invalidation" trace-after >>after &&
+	grep "directory-invalidation" trace-before | cut -d"|" -f 9 >>before &&
+	grep "directory-invalidation" trace-after  | cut -d"|" -f 9 >>after &&
 	# UNTR extension unchanged, dir invalidation count unchanged
 	test_cmp before after
 '
-- 
gitgitgadget



* [PATCH v5 2/9] dir: report number of visited directories and paths with trace2
  2021-05-12 17:28       ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
  2021-05-12 17:28         ` [PATCH v5 1/9] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
@ 2021-05-12 17:28         ` Elijah Newren via GitGitGadget
  2021-05-12 17:28         ` [PATCH v5 3/9] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
                           ` (7 subsequent siblings)
  9 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Provide more statistics in trace2 output that include the number of
directories and total paths visited by the directory traversal logic.
Subsequent patches will take advantage of this to ensure we do not
unnecessarily traverse into ignored directories.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                             | 9 +++++++++
 dir.h                             | 4 ++++
 t/t7063-status-untracked-cache.sh | 3 ++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index cf19a83d3e2c..f6dec5fd4a78 100644
--- a/dir.c
+++ b/dir.c
@@ -2440,6 +2440,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 
 	if (open_cached_dir(&cdir, dir, untracked, istate, &path, check_only))
 		goto out;
+	dir->visited_directories++;
 
 	if (untracked)
 		untracked->check_only = !!check_only;
@@ -2448,6 +2449,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 		/* check how the file or directory should be treated */
 		state = treat_path(dir, untracked, &cdir, istate, &path,
 				   baselen, pathspec);
+		dir->visited_paths++;
 
 		if (state > dir_state)
 			dir_state = state;
@@ -2777,6 +2779,11 @@ static void emit_traversal_statistics(struct dir_struct *dir,
 		strbuf_release(&tmp);
 	}
 
+	trace2_data_intmax("read_directory", repo,
+			   "directories-visited", dir->visited_directories);
+	trace2_data_intmax("read_directory", repo,
+			   "paths-visited", dir->visited_paths);
+
 	if (!dir->untracked)
 		return;
 	trace2_data_intmax("read_directory", repo,
@@ -2797,6 +2804,8 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 	struct untracked_cache_dir *untracked;
 
 	trace2_region_enter("dir", "read_directory", istate->repo);
+	dir->visited_paths = 0;
+	dir->visited_directories = 0;
 
 	if (has_symlink_leading_path(path, len)) {
 		trace2_region_leave("dir", "read_directory", istate->repo);
diff --git a/dir.h b/dir.h
index 04d886cfce75..22c67907f689 100644
--- a/dir.h
+++ b/dir.h
@@ -336,6 +336,10 @@ struct dir_struct {
 	struct oid_stat ss_info_exclude;
 	struct oid_stat ss_excludes_file;
 	unsigned unmanaged_exclude_files;
+
+	/* Stats about the traversal */
+	unsigned visited_paths;
+	unsigned visited_directories;
 };
 
 /*Count the number of slashes for string s*/
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index 9710d33b3cd6..a0c123b0a77a 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -65,7 +65,8 @@ get_relevant_traces () {
 	INPUT_FILE=$1
 	OUTPUT_FILE=$2
 	grep data.*read_directo $INPUT_FILE |
-	    cut -d "|" -f 9 \
+	    cut -d "|" -f 9 |
+	    grep -v visited \
 	    >"$OUTPUT_FILE"
 }
 
-- 
gitgitgadget



* [PATCH v5 3/9] ls-files: error out on -i unless -o or -c are specified
  2021-05-12 17:28       ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
  2021-05-12 17:28         ` [PATCH v5 1/9] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
  2021-05-12 17:28         ` [PATCH v5 2/9] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
@ 2021-05-12 17:28         ` Elijah Newren via GitGitGadget
  2021-05-12 17:28         ` [PATCH v5 4/9] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
                           ` (6 subsequent siblings)
  9 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

ls-files --ignored can be used together with either --others or
--cached.  After being perplexed for a bit and digging into the code, I
assumed that ls-files -i was just broken and not printing anything and
I had a nice patch ready to submit when I finally realized that -i can be
used with --cached to find tracked ignores.

While that was a mistake on my part, and a careful reading of the
documentation could have made this more clear, I suspect this is an
error others are likely to make as well.  In fact, of two uses in our
testsuite, I believe one of the two did make this error.  In t1306.13,
there are NO tracked files, and all the excludes built up and used in
that test and in previous tests thus have to be about untracked files.
However, since they were looking for an empty result, the mistake went
unnoticed as their erroneous command also just happened to give an empty
answer.

-i will most of the time be used with -o, which would suggest we could just
make -i imply -o in the absence of either a -o or -c, but that would be
a backward incompatible break.  Instead, let's just flag -i without
either a -o or -c as an error, and update the two relevant testcases to
specify their intent.
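
The rule can be restated as a small standalone check.  This is only a sketch
of the condition added to builtin/ls-files.c; the function name
ignored_flag_is_valid() and its plain int parameters are hypothetical
stand-ins for the real option state.

```c
#include <assert.h>

/*
 * Hypothetical restatement of the new check: -i (--ignored) is only
 * accepted together with -o (--others) and/or -c (--cached).
 * Returns 1 if the combination is allowed, 0 if it should error out.
 */
static int ignored_flag_is_valid(int show_ignored, int show_others,
				 int show_cached)
{
	if (show_ignored && !show_others && !show_cached)
		return 0;
	return 1;
}
```

Under this rule, "ls-files -i" alone is rejected, while "-i -o", "-i -c",
and any invocation without -i are accepted.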

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/ls-files.c          | 3 +++
 t/t1306-xdg-files.sh        | 2 +-
 t/t3003-ls-files-exclude.sh | 4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 60a2913a01e9..e8e25006c647 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -748,6 +748,9 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	if (pathspec.nr && error_unmatch)
 		ps_matched = xcalloc(pathspec.nr, 1);
 
+	if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached)
+		die("ls-files -i must be used with either -o or -c");
+
 	if ((dir.flags & DIR_SHOW_IGNORED) && !exc_given)
 		die("ls-files --ignored needs some exclude pattern");
 
diff --git a/t/t1306-xdg-files.sh b/t/t1306-xdg-files.sh
index dd87b43be1a6..40d3c42618c0 100755
--- a/t/t1306-xdg-files.sh
+++ b/t/t1306-xdg-files.sh
@@ -116,7 +116,7 @@ test_expect_success 'Exclusion in a non-XDG global ignore file' '
 test_expect_success 'Checking XDG ignore file when HOME is unset' '
 	(sane_unset HOME &&
 	 git config --unset core.excludesfile &&
-	 git ls-files --exclude-standard --ignored >actual) &&
+	 git ls-files --exclude-standard --ignored --others >actual) &&
 	test_must_be_empty actual
 '
 
diff --git a/t/t3003-ls-files-exclude.sh b/t/t3003-ls-files-exclude.sh
index d5ec333131f9..c41c4f046abf 100755
--- a/t/t3003-ls-files-exclude.sh
+++ b/t/t3003-ls-files-exclude.sh
@@ -29,11 +29,11 @@ test_expect_success 'add file to gitignore' '
 '
 check_all_output
 
-test_expect_success 'ls-files -i lists only tracked-but-ignored files' '
+test_expect_success 'ls-files -i -c lists only tracked-but-ignored files' '
 	echo content >other-file &&
 	git add other-file &&
 	echo file >expect &&
-	git ls-files -i --exclude-standard >output &&
+	git ls-files -i -c --exclude-standard >output &&
 	test_cmp expect output
 '
 
-- 
gitgitgadget



* [PATCH v5 4/9] t7300: add testcase showing unnecessary traversal into ignored directory
  2021-05-12 17:28       ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
                           ` (2 preceding siblings ...)
  2021-05-12 17:28         ` [PATCH v5 3/9] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
@ 2021-05-12 17:28         ` Elijah Newren via GitGitGadget
  2021-05-12 17:28         ` [PATCH v5 5/9] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
                           ` (5 subsequent siblings)
  9 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The PNPM package manager is apparently creating deeply nested (but
ignored) directory structures; traversing them is costly
performance-wise, unnecessary, and in some cases is even throwing
warnings/errors because the paths are too long to handle on various
platforms.  Add a testcase that checks for such unnecessary directory
traversal.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7300-clean.sh | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index a74816ca8b46..07e8ba2d4b85 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -746,4 +746,27 @@ test_expect_success 'clean untracked paths by pathspec' '
 	test_must_be_empty actual
 '
 
+test_expect_failure 'avoid traversing into ignored directories' '
+	test_when_finished rm -f output error trace.* &&
+	test_create_repo avoid-traversing-deep-hierarchy &&
+	(
+		cd avoid-traversing-deep-hierarchy &&
+
+		mkdir -p untracked/subdir/with/a &&
+		>untracked/subdir/with/a/random-file.txt &&
+
+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+		git clean -ffdxn -e untracked
+	) &&
+
+	# Make sure we only visited into the top-level directory, and did
+	# not traverse into the "untracked" subdirectory since it was excluded
+	grep data.*read_directo.*directories-visited trace.output |
+		cut -d "|" -f 9 >trace.relevant &&
+	cat >trace.expect <<-EOF &&
+	 ..directories-visited:1
+	EOF
+	test_cmp trace.expect trace.relevant
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v5 5/9] t3001, t7300: add testcase showcasing missed directory traversal
  2021-05-12 17:28       ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
                           ` (3 preceding siblings ...)
  2021-05-12 17:28         ` [PATCH v5 4/9] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
@ 2021-05-12 17:28         ` Elijah Newren via GitGitGadget
  2021-05-12 17:28         ` [PATCH v5 6/9] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
                           ` (4 subsequent siblings)
  9 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

In the last commit, we added a testcase showing that the directory
traversal machinery sometimes traverses into directories unnecessarily.
Here we show that there are cases where it does the opposite: it does
not traverse into directories, despite those directories having
important files that need to be flagged.

Add a testcase showing that `git ls-files -o -i --directory` can omit
some of the files it should be listing, and another showing that `git
clean -fX` can fail to clean out some of the expected files.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t3001-ls-files-others-exclude.sh |  5 +++++
 t/t7300-clean.sh                   | 19 +++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index 1ec7cb57c7a8..ac05d1a17931 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -292,6 +292,11 @@ EOF
 	test_cmp expect actual
 '
 
+test_expect_failure 'ls-files with "**" patterns and --directory' '
+	# Expectation same as previous test
+	git ls-files --directory -o -i --exclude "**/a.1" >actual &&
+	test_cmp expect actual
+'
 
 test_expect_success 'ls-files with "**" patterns and no slashes' '
 	git ls-files -o -i --exclude "one**a.1" >actual &&
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 07e8ba2d4b85..34c08c325407 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -769,4 +769,23 @@ test_expect_failure 'avoid traversing into ignored directories' '
 	test_cmp trace.expect trace.relevant
 '
 
+test_expect_failure 'traverse into directories that may have ignored entries' '
+	test_when_finished rm -f output &&
+	test_create_repo need-to-traverse-into-hierarchy &&
+	(
+		cd need-to-traverse-into-hierarchy &&
+		mkdir -p modules/foobar/src/generated &&
+		> modules/foobar/src/generated/code.c &&
+		> modules/foobar/Makefile &&
+		echo "/modules/**/src/generated/" >.gitignore &&
+
+		git clean -fX modules/foobar >../output &&
+
+		grep Removing ../output &&
+
+		test_path_is_missing modules/foobar/src/generated/code.c &&
+		test_path_is_file modules/foobar/Makefile
+	)
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v5 6/9] dir: avoid unnecessary traversal into ignored directory
  2021-05-12 17:28       ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
                           ` (4 preceding siblings ...)
  2021-05-12 17:28         ` [PATCH v5 5/9] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
@ 2021-05-12 17:28         ` Elijah Newren via GitGitGadget
  2021-05-12 17:28         ` [PATCH v5 7/9] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
                           ` (3 subsequent siblings)
  9 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The show_other_directories case in treat_directory() tried to handle
both excludes and untracked files with the same logic, and mishandled
both the excludes and the untracked files in the process, in different
ways.  Split that logic apart, and then focus on the logic for the
excludes; a subsequent commit will address the logic for untracked
files.

For show_other_directories, an excluded directory means that
every path underneath that directory will also be excluded.  Given that
the calling code requested to just show directories when everything
under a directory had the same state (that's what the
"DIR_SHOW_OTHER_DIRECTORIES" flag means), we generally do not need to
traverse into such directories and can just immediately mark them as
ignored (i.e. as path_excluded).  The only reason we cannot just
immediately return path_excluded is the DIR_HIDE_EMPTY_DIRECTORIES flag
and the possibility that the ignored directory is an empty directory.
The code previously treated DIR_SHOW_IGNORED_TOO in most cases as an
exception as well, which was wrong.  It can sometimes reduce the number
of cases where we need to recurse (namely if
DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set), but should not be able
to increase the number of cases where we need to recurse.  Fix the logic
accordingly.
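
The decision described above can be sketched as a standalone function.
This is a simplified model under stated assumptions, not the code in
dir.c: the SK_* names and their values are hypothetical, and
SK_FALL_THROUGH merely stands for "continue with the rest of
treat_directory()":

```c
/* Hypothetical flag values for illustration; git's real values differ. */
enum sketch_flags {
	SK_SHOW_IGNORED_TOO = 1 << 0,
	SK_HIDE_EMPTY_DIRECTORIES = 1 << 1,
	SK_SHOW_IGNORED_TOO_MODE_MATCHING = 1 << 2
};

enum sketch_treatment {
	SK_PATH_EXCLUDED,	/* mark directory as ignored, no recursion */
	SK_PATH_UNTRACKED,	/* mark directory as untracked, no recursion */
	SK_FALL_THROUGH		/* later logic must look inside the directory */
};

enum sketch_treatment sketch_classify(unsigned flags, int excluded)
{
	if (excluded) {
		/* Not hiding empty directories: no need to look inside. */
		if (!(flags & SK_HIDE_EMPTY_DIRECTORIES))
			return SK_PATH_EXCLUDED;

		/* MODE_MATCHING lets us skip recursion even when hiding. */
		if ((flags & SK_SHOW_IGNORED_TOO) &&
		    (flags & SK_SHOW_IGNORED_TOO_MODE_MATCHING))
			return SK_PATH_EXCLUDED;
	}

	/* Untracked directory with neither bit set: report it as-is. */
	if (!excluded &&
	    !(flags & (SK_SHOW_IGNORED_TOO | SK_HIDE_EMPTY_DIRECTORIES)))
		return SK_PATH_UNTRACKED;

	return SK_FALL_THROUGH;
}
```

The case that changes in this model is an excluded directory with only
the SHOW_IGNORED_TOO bit set: it is now reported as excluded right away
rather than being traversed.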

Some sidenotes about possible confusion with dir.c:

* "ignored" often refers to an "untracked ignore", i.e. a file which is
  not tracked which matches one of the ignore/exclusion rules.  But you
  can also have a "tracked ignore", a tracked file that happens to match
  one of the ignore/exclusion rules and which dir.c has to worry about
  since "git ls-files -c -i" is supposed to list them.

* The dir code often uses "ignored" and "excluded" interchangeably,
  which you need to keep in mind while reading the code.

* "exclude" is used multiple ways in the code:

  * As noted above, "exclude" is often a synonym for "ignored".

  * The logic for parsing .gitignore files was re-used in
    .git/info/sparse-checkout, except there it is used to mark paths that
    the user wants to *keep*.  This was mostly addressed by commit
    65edd96aec ("treewide: rename 'exclude' methods to 'pattern'",
    2019-09-03), but every once in a while you'll find a comment about
    "exclude" referring to these patterns that might in fact be in use
    by the sparse-checkout machinery for inclusion rules.

  * The word "EXCLUDE" is also used for pathspec negation, as in
      (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
    Thus if a user had a .gitignore file containing
      *~
      *.log
      !settings.log
    And then ran
      git add -- 'settings.*' ':^settings.log'
    Then :^settings.log is a pathspec negation making settings.log not
    be requested to be added even though all other settings.* files are
    being added.  Also, !settings.log in the gitignore file is a negative
    exclude pattern meaning that settings.log is normally a file we
    want to track even though all other *.log files are ignored.

Sometimes it feels like dir.c needs its own glossary with its many
definitions, including the multiply-defined terms.

Reported-by: Jason Gore <Jason.Gore@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c            | 44 +++++++++++++++++++++++++++++---------------
 t/t7300-clean.sh |  2 +-
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/dir.c b/dir.c
index f6dec5fd4a78..db2ae516a3aa 100644
--- a/dir.c
+++ b/dir.c
@@ -1844,6 +1844,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	}
 
 	/* This is the "show_other_directories" case */
+	assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES);
 
 	/*
 	 * If we have a pathspec which could match something _below_ this
@@ -1854,27 +1855,40 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	if (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC)
 		return path_recurse;
 
+	/* Special cases for where this directory is excluded/ignored */
+	if (excluded) {
+		/*
+		 * In the show_other_directories case, if we're not
+		 * hiding empty directories, there is no need to
+		 * recurse into an ignored directory.
+		 */
+		if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+			return path_excluded;
+
+		/*
+		 * Even if we are hiding empty directories, we can still avoid
+		 * recursing into ignored directories for DIR_SHOW_IGNORED_TOO
+		 * if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
+		 */
+		if ((dir->flags & DIR_SHOW_IGNORED_TOO) &&
+		    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
+			return path_excluded;
+	}
+
 	/*
-	 * Other than the path_recurse case immediately above, we only need
-	 * to recurse into untracked/ignored directories if either of the
-	 * following bits is set:
+	 * Other than the path_recurse case above, we only need to
+	 * recurse into untracked directories if either of the following
+	 * bits is set:
 	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
 	 *                           there are ignored entries below)
 	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
 	 *                                 the directory is empty)
 	 */
-	if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
-		return excluded ? path_excluded : path_untracked;
-
-	/*
-	 * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid
-	 * recursing into ignored directories if the path is excluded and
-	 * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set.
-	 */
-	if (excluded &&
-	    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
-	    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
-		return path_excluded;
+	if (!excluded &&
+	    !(dir->flags & (DIR_SHOW_IGNORED_TOO |
+			    DIR_HIDE_EMPTY_DIRECTORIES))) {
+		return path_untracked;
+	}
 
 	/*
 	 * Even if we don't want to know all the paths under an untracked or
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 34c08c325407..21e48b3ba591 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -746,7 +746,7 @@ test_expect_success 'clean untracked paths by pathspec' '
 	test_must_be_empty actual
 '
 
-test_expect_failure 'avoid traversing into ignored directories' '
+test_expect_success 'avoid traversing into ignored directories' '
 	test_when_finished rm -f output error trace.* &&
 	test_create_repo avoid-traversing-deep-hierarchy &&
 	(
-- 
gitgitgadget



* [PATCH v5 7/9] dir: traverse into untracked directories if they may have ignored subfiles
  2021-05-12 17:28       ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
                           ` (5 preceding siblings ...)
  2021-05-12 17:28         ` [PATCH v5 6/9] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
@ 2021-05-12 17:28         ` Elijah Newren via GitGitGadget
  2021-05-12 17:28         ` [PATCH v5 8/9] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
                           ` (2 subsequent siblings)
  9 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

A directory that is untracked does not imply that all files under it
should be categorized as untracked; in particular, if the caller is
interested in ignored files, many files or directories underneath the
untracked directory may be ignored.  We previously partially handled
this right with DIR_SHOW_IGNORED_TOO, but missed DIR_SHOW_IGNORED.  It
was not obvious, though, because the logic for untracked and excluded
files had been fused together making it harder to reason about.  The
previous commit split that logic out, making it easier to notice that
DIR_SHOW_IGNORED was missing.  Add it.
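
A minimal sketch of the resulting rule for untracked directories,
assuming hypothetical flag names and values (UD_*) rather than the ones
defined in dir.h:

```c
/* Hypothetical flag values for illustration; git's real values differ. */
enum untracked_dir_flags {
	UD_SHOW_IGNORED = 1 << 0,
	UD_SHOW_IGNORED_TOO = 1 << 1,
	UD_HIDE_EMPTY_DIRECTORIES = 1 << 2
};

/*
 * Returns nonzero when an untracked (non-excluded) directory cannot be
 * reported as a whole and its contents have to be examined.
 */
int untracked_dir_needs_recursion(unsigned flags)
{
	return (flags & (UD_SHOW_IGNORED |
			 UD_SHOW_IGNORED_TOO |
			 UD_HIDE_EMPTY_DIRECTORIES)) != 0;
}
```

In this model, leaving UD_SHOW_IGNORED out of the mask reproduces the
old behavior: a request for ignored files alone would stop at the
untracked directory and miss any ignored entries beneath it.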

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                              | 10 ++++++----
 t/t3001-ls-files-others-exclude.sh |  2 +-
 t/t7300-clean.sh                   |  2 +-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/dir.c b/dir.c
index db2ae516a3aa..c0233bbba36c 100644
--- a/dir.c
+++ b/dir.c
@@ -1877,15 +1877,17 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 	/*
 	 * Other than the path_recurse case above, we only need to
-	 * recurse into untracked directories if either of the following
+	 * recurse into untracked directories if any of the following
 	 * bits is set:
-	 *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
-	 *                           there are ignored entries below)
+	 *   - DIR_SHOW_IGNORED (because then we need to determine if
+	 *                       there are ignored entries below)
+	 *   - DIR_SHOW_IGNORED_TOO (same as above)
 	 *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
 	 *                                 the directory is empty)
 	 */
 	if (!excluded &&
-	    !(dir->flags & (DIR_SHOW_IGNORED_TOO |
+	    !(dir->flags & (DIR_SHOW_IGNORED |
+			    DIR_SHOW_IGNORED_TOO |
 			    DIR_HIDE_EMPTY_DIRECTORIES))) {
 		return path_untracked;
 	}
diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index ac05d1a17931..516c95ea0e82 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -292,7 +292,7 @@ EOF
 	test_cmp expect actual
 '
 
-test_expect_failure 'ls-files with "**" patterns and --directory' '
+test_expect_success 'ls-files with "**" patterns and --directory' '
 	# Expectation same as previous test
 	git ls-files --directory -o -i --exclude "**/a.1" >actual &&
 	test_cmp expect actual
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 21e48b3ba591..0399701e6276 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -769,7 +769,7 @@ test_expect_success 'avoid traversing into ignored directories' '
 	test_cmp trace.expect trace.relevant
 '
 
-test_expect_failure 'traverse into directories that may have ignored entries' '
+test_expect_success 'traverse into directories that may have ignored entries' '
 	test_when_finished rm -f output &&
 	test_create_repo need-to-traverse-into-hierarchy &&
 	(
-- 
gitgitgadget



* [PATCH v5 8/9] dir: update stale description of treat_directory()
  2021-05-12 17:28       ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
                           ` (6 preceding siblings ...)
  2021-05-12 17:28         ` [PATCH v5 7/9] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
@ 2021-05-12 17:28         ` Derrick Stolee via GitGitGadget
  2021-05-17 17:20           ` Derrick Stolee
  2021-05-12 17:28         ` [PATCH v5 9/9] dir: introduce readdir_skip_dot_and_dotdot() helper Elijah Newren via GitGitGadget
  2021-05-17 17:23         ` [PATCH v5 0/9] Directory traversal fixes Derrick Stolee
  9 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Derrick Stolee

From: Derrick Stolee <stolee@gmail.com>

The documentation comment for treat_directory() was originally written
in 095952 (Teach directory traversal about subprojects, 2007-04-11)
which was before the 'struct dir_struct' split its bitfield of named
options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct
dir_struct into a single variable, 2009-02-16). When those flags
changed, the comment became stale, since members like
'show_other_directories' transitioned into flags like
DIR_SHOW_OTHER_DIRECTORIES.

Update the comments for treat_directory() to use these flag names rather
than the old member names.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/dir.c b/dir.c
index c0233bbba36c..4794c822b47f 100644
--- a/dir.c
+++ b/dir.c
@@ -1749,13 +1749,13 @@ static enum exist_status directory_exists_in_index(struct index_state *istate,
  * Case 3: if we didn't have it in the index previously, we
  * have a few sub-cases:
  *
- *  (a) if "show_other_directories" is true, we show it as
- *      just a directory, unless "hide_empty_directories" is
+ *  (a) if DIR_SHOW_OTHER_DIRECTORIES flag is set, we show it as
+ *      just a directory, unless DIR_HIDE_EMPTY_DIRECTORIES is
  *      also true, in which case we need to check if it contains any
  *      untracked and / or ignored files.
- *  (b) if it looks like a git directory, and we don't have
- *      'no_gitlinks' set we treat it as a gitlink, and show it
- *      as a directory.
+ *  (b) if it looks like a git directory and we don't have the
+ *      DIR_NO_GITLINKS flag, then we treat it as a gitlink, and
+ *      show it as a directory.
  *  (c) otherwise, we recurse into it.
  */
 static enum path_treatment treat_directory(struct dir_struct *dir,
@@ -1843,7 +1843,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_recurse;
 	}
 
-	/* This is the "show_other_directories" case */
 	assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES);
 
 	/*
@@ -1858,7 +1857,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	/* Special cases for where this directory is excluded/ignored */
 	if (excluded) {
 		/*
-		 * In the show_other_directories case, if we're not
+		 * If DIR_SHOW_OTHER_DIRECTORIES is set and we're not
 		 * hiding empty directories, there is no need to
 		 * recurse into an ignored directory.
 		 */
-- 
gitgitgadget



* [PATCH v5 9/9] dir: introduce readdir_skip_dot_and_dotdot() helper
  2021-05-12 17:28       ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
                           ` (7 preceding siblings ...)
  2021-05-12 17:28         ` [PATCH v5 8/9] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
@ 2021-05-12 17:28         ` Elijah Newren via GitGitGadget
  2021-05-17 17:22           ` Derrick Stolee
  2021-05-17 17:23         ` [PATCH v5 0/9] Directory traversal fixes Derrick Stolee
  9 siblings, 1 reply; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Many places in the code were doing
    while ((d = readdir(dir)) != NULL) {
        if (is_dot_or_dotdot(d->d_name))
            continue;
        ...process d...
    }
Introduce a readdir_skip_dot_and_dotdot() helper to make that a one-liner:
    while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
        ...process d...
    }

This helper particularly simplifies checks for empty directories.

Also use this helper in read_cached_dir() so that our statistics are
consistent across platforms.  (In other words, read_cached_dir() should
have been using is_dot_or_dotdot() and skipping such entries, but did
not and left it to treat_path() to detect and mark such entries as
path_none.)
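
To illustrate the empty-directory simplification outside of git, here
is a self-contained sketch using only POSIX <dirent.h>.  The sketch_*
names are hypothetical and exist only for this example; the helper
mirrors the shape of the one introduced in this patch:

```c
#include <dirent.h>
#include <string.h>

/* Stand-in for git's is_dot_or_dotdot(). */
static int sketch_is_dot_or_dotdot(const char *name)
{
	return !strcmp(name, ".") || !strcmp(name, "..");
}

/* Return the next entry that is neither "." nor "..", or NULL at end. */
struct dirent *sketch_readdir_skip_dots(DIR *dirp)
{
	struct dirent *e;

	while ((e = readdir(dirp)) != NULL) {
		if (!sketch_is_dot_or_dotdot(e->d_name))
			break;
	}
	return e;
}

/*
 * Returns 1 if the directory has no entries besides "." and "..",
 * and 0 if it has at least one entry or cannot be opened.
 */
int sketch_dir_is_empty(const char *path)
{
	DIR *dir = opendir(path);
	int empty;

	if (!dir)
		return 0;

	/* One call suffices: any returned entry means "not empty". */
	empty = (sketch_readdir_skip_dots(dir) == NULL);

	closedir(dir);
	return empty;
}
```

With the helper, the emptiness check needs no loop at the call site,
which is the same simplification applied to is_empty_dir() and
submodule_uses_worktrees() in the diff below.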

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/clean.c    |  4 +---
 builtin/worktree.c |  4 +---
 diff-no-index.c    |  5 ++---
 dir.c              | 26 +++++++++++++++++---------
 dir.h              |  2 ++
 entry.c            |  5 +----
 notes-merge.c      |  5 +----
 object-file.c      |  4 +---
 packfile.c         |  5 +----
 rerere.c           |  4 +---
 worktree.c         | 12 +++---------
 11 files changed, 31 insertions(+), 45 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 995053b79173..a1a57476153b 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -189,10 +189,8 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 	strbuf_complete(path, '/');
 
 	len = path->len;
-	while ((e = readdir(dir)) != NULL) {
+	while ((e = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 		struct stat st;
-		if (is_dot_or_dotdot(e->d_name))
-			continue;
 
 		strbuf_setlen(path, len);
 		strbuf_addstr(path, e->d_name);
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 877145349381..ae28249e0f0b 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -118,10 +118,8 @@ static void prune_worktrees(void)
 	struct dirent *d;
 	if (!dir)
 		return;
-	while ((d = readdir(dir)) != NULL) {
+	while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 		char *path;
-		if (is_dot_or_dotdot(d->d_name))
-			continue;
 		strbuf_reset(&reason);
 		if (should_prune_worktree(d->d_name, &reason, &path, expire))
 			prune_worktree(d->d_name, reason.buf);
diff --git a/diff-no-index.c b/diff-no-index.c
index 7814eabfe028..e5cc87837143 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -26,9 +26,8 @@ static int read_directory_contents(const char *path, struct string_list *list)
 	if (!(dir = opendir(path)))
 		return error("Could not open directory %s", path);
 
-	while ((e = readdir(dir)))
-		if (!is_dot_or_dotdot(e->d_name))
-			string_list_insert(list, e->d_name);
+	while ((e = readdir_skip_dot_and_dotdot(dir)))
+		string_list_insert(list, e->d_name);
 
 	closedir(dir);
 	return 0;
diff --git a/dir.c b/dir.c
index 4794c822b47f..66c8518947dd 100644
--- a/dir.c
+++ b/dir.c
@@ -59,6 +59,18 @@ void dir_init(struct dir_struct *dir)
 	memset(dir, 0, sizeof(*dir));
 }
 
+struct dirent *
+readdir_skip_dot_and_dotdot(DIR *dirp)
+{
+	struct dirent *e;
+
+	while ((e = readdir(dirp)) != NULL) {
+		if (!is_dot_or_dotdot(e->d_name))
+			break;
+	}
+	return e;
+}
+
 int count_slashes(const char *s)
 {
 	int cnt = 0;
@@ -2341,7 +2353,7 @@ static int read_cached_dir(struct cached_dir *cdir)
 	struct dirent *de;
 
 	if (cdir->fdir) {
-		de = readdir(cdir->fdir);
+		de = readdir_skip_dot_and_dotdot(cdir->fdir);
 		if (!de) {
 			cdir->d_name = NULL;
 			cdir->d_type = DT_UNKNOWN;
@@ -2940,11 +2952,9 @@ int is_empty_dir(const char *path)
 	if (!dir)
 		return 0;
 
-	while ((e = readdir(dir)) != NULL)
-		if (!is_dot_or_dotdot(e->d_name)) {
-			ret = 0;
-			break;
-		}
+	e = readdir_skip_dot_and_dotdot(dir);
+	if (e)
+		ret = 0;
 
 	closedir(dir);
 	return ret;
@@ -2984,10 +2994,8 @@ static int remove_dir_recurse(struct strbuf *path, int flag, int *kept_up)
 	strbuf_complete(path, '/');
 
 	len = path->len;
-	while ((e = readdir(dir)) != NULL) {
+	while ((e = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 		struct stat st;
-		if (is_dot_or_dotdot(e->d_name))
-			continue;
 
 		strbuf_setlen(path, len);
 		strbuf_addstr(path, e->d_name);
diff --git a/dir.h b/dir.h
index 22c67907f689..a704e466afd5 100644
--- a/dir.h
+++ b/dir.h
@@ -342,6 +342,8 @@ struct dir_struct {
 	unsigned visited_directories;
 };
 
+struct dirent *readdir_skip_dot_and_dotdot(DIR *dirp);
+
 /*Count the number of slashes for string s*/
 int count_slashes(const char *s);
 
diff --git a/entry.c b/entry.c
index 2dc94ba5cc2a..6da589696770 100644
--- a/entry.c
+++ b/entry.c
@@ -57,12 +57,9 @@ static void remove_subtree(struct strbuf *path)
 
 	if (!dir)
 		die_errno("cannot opendir '%s'", path->buf);
-	while ((de = readdir(dir)) != NULL) {
+	while ((de = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 		struct stat st;
 
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
-
 		strbuf_addch(path, '/');
 		strbuf_addstr(path, de->d_name);
 		if (lstat(path->buf, &st))
diff --git a/notes-merge.c b/notes-merge.c
index d2771fa3d43c..e9d6f86d3428 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -695,13 +695,10 @@ int notes_merge_commit(struct notes_merge_options *o,
 
 	strbuf_addch(&path, '/');
 	baselen = path.len;
-	while ((e = readdir(dir)) != NULL) {
+	while ((e = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 		struct stat st;
 		struct object_id obj_oid, blob_oid;
 
-		if (is_dot_or_dotdot(e->d_name))
-			continue;
-
 		if (get_oid_hex(e->d_name, &obj_oid)) {
 			if (o->verbosity >= 3)
 				printf("Skipping non-SHA1 entry '%s%s'\n",
diff --git a/object-file.c b/object-file.c
index 624af408cdcd..77bdcfd21bc8 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2304,10 +2304,8 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
 	strbuf_addch(path, '/');
 	baselen = path->len;
 
-	while ((de = readdir(dir))) {
+	while ((de = readdir_skip_dot_and_dotdot(dir))) {
 		size_t namelen;
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
 
 		namelen = strlen(de->d_name);
 		strbuf_setlen(path, baselen);
diff --git a/packfile.c b/packfile.c
index 8668345d9309..7c8f1b7202ca 100644
--- a/packfile.c
+++ b/packfile.c
@@ -813,10 +813,7 @@ void for_each_file_in_pack_dir(const char *objdir,
 	}
 	strbuf_addch(&path, '/');
 	dirnamelen = path.len;
-	while ((de = readdir(dir)) != NULL) {
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
-
+	while ((de = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 		strbuf_setlen(&path, dirnamelen);
 		strbuf_addstr(&path, de->d_name);
 
diff --git a/rerere.c b/rerere.c
index dee60dc6df63..d83d58df4fbc 100644
--- a/rerere.c
+++ b/rerere.c
@@ -1190,13 +1190,11 @@ void rerere_gc(struct repository *r, struct string_list *rr)
 	if (!dir)
 		die_errno(_("unable to open rr-cache directory"));
 	/* Collect stale conflict IDs ... */
-	while ((e = readdir(dir))) {
+	while ((e = readdir_skip_dot_and_dotdot(dir))) {
 		struct rerere_dir *rr_dir;
 		struct rerere_id id;
 		int now_empty;
 
-		if (is_dot_or_dotdot(e->d_name))
-			continue;
 		if (!is_rr_cache_dirname(e->d_name))
 			continue; /* or should we remove e->d_name? */
 
diff --git a/worktree.c b/worktree.c
index f35ac40a84a5..237517baee67 100644
--- a/worktree.c
+++ b/worktree.c
@@ -128,10 +128,8 @@ struct worktree **get_worktrees(void)
 	dir = opendir(path.buf);
 	strbuf_release(&path);
 	if (dir) {
-		while ((d = readdir(dir)) != NULL) {
+		while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 			struct worktree *linked = NULL;
-			if (is_dot_or_dotdot(d->d_name))
-				continue;
 
 			if ((linked = get_linked_worktree(d->d_name))) {
 				ALLOC_GROW(list, counter + 1, alloc);
@@ -486,13 +484,9 @@ int submodule_uses_worktrees(const char *path)
 	if (!dir)
 		return 0;
 
-	while ((d = readdir(dir)) != NULL) {
-		if (is_dot_or_dotdot(d->d_name))
-			continue;
-
+	d = readdir_skip_dot_and_dotdot(dir);
+	if (d != NULL)
 		ret = 1;
-		break;
-	}
 	closedir(dir);
 	return ret;
 }
-- 
gitgitgadget


* Re: [PATCH v5 8/9] dir: update stale description of treat_directory()
  2021-05-12 17:28         ` [PATCH v5 8/9] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
@ 2021-05-17 17:20           ` Derrick Stolee
  2021-05-17 19:44             ` Junio C Hamano
  0 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee @ 2021-05-17 17:20 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: Eric Sunshine, Elijah Newren, Jeff King, Philip Oakley,
	Jeff Hostetler, Josh Steadmon, Jeff Hostetler

On 5/12/2021 1:28 PM, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
> 
> The documentation comment for treat_directory() was originally written
> in 095952 (Teach directory traversal about subprojects, 2007-04-11)
> which was before the 'struct dir_struct' split its bitfield of named
> options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct
> dir_struct into a single variable, 2009-02-16). When those flags
> changed, the comment became stale, since members like
> 'show_other_directories' transitioned into flags like
> DIR_SHOW_OTHER_DIRECTORIES.
> 
> Update the comments for treat_directory() to use these flag names rather
> than the old member names.
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> Reviewed-by: Elijah Newren <newren@gmail.com>

I think you want the "Reviewed-by" before the "Signed-off-by",
followed by your own sign-off.

Thanks,
-Stolee


* Re: [PATCH v5 9/9] dir: introduce readdir_skip_dot_and_dotdot() helper
  2021-05-12 17:28         ` [PATCH v5 9/9] dir: introduce readdir_skip_dot_and_dotdot() helper Elijah Newren via GitGitGadget
@ 2021-05-17 17:22           ` Derrick Stolee
  2021-05-18  3:34             ` Elijah Newren
  0 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee @ 2021-05-17 17:22 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Eric Sunshine, Elijah Newren, Jeff King, Philip Oakley,
	Jeff Hostetler, Josh Steadmon, Jeff Hostetler

On 5/12/2021 1:28 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Many places in the code were doing
>     while ((d = readdir(dir)) != NULL) {
>         if (is_dot_or_dotdot(d->d_name))
>             continue;
>         ...process d...
>     }
> Introduce a readdir_skip_dot_and_dotdot() helper to make that a one-liner:
>     while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
>         ...process d...
>     }
> 
> This helper particularly simplifies checks for empty directories.
> 
> Also use this helper in read_cached_dir() so that our statistics are
> consistent across platforms.  (In other words, read_cached_dir() should
> have been using is_dot_or_dotdot() and skipping such entries, but did
> not and left it to treat_path() to detect and mark such entries as
> path_none.)

I like the idea of this helper!
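
For anyone skimming the thread, here is a minimal standalone sketch of the
pattern described above. It is not the code from the patch: is_dot_or_dotdot()
is reimplemented locally as a stand-in for git's helper, and dir_is_empty() is
a hypothetical name used only to illustrate the empty-directory check.

```c
#include <dirent.h>
#include <string.h>

/* Local stand-in (assumption) for git's is_dot_or_dotdot(). */
static int is_dot_or_dotdot(const char *name)
{
	return !strcmp(name, ".") || !strcmp(name, "..");
}

/*
 * Return the next entry that is neither "." nor "..", or NULL once
 * readdir() reports end-of-directory (or an error).
 */
static struct dirent *readdir_skip_dot_and_dotdot(DIR *dirp)
{
	struct dirent *e;

	while ((e = readdir(dirp)) != NULL) {
		if (!is_dot_or_dotdot(e->d_name))
			break;
	}
	return e;
}

/*
 * Hypothetical example caller: returns 1 if 'path' can be opened as a
 * directory and holds nothing besides "." and "..", 0 otherwise.
 */
static int dir_is_empty(const char *path)
{
	DIR *dir = opendir(path);
	int empty;

	if (!dir)
		return 0;
	/* One call replaces the old loop-with-continue-and-break. */
	empty = readdir_skip_dot_and_dotdot(dir) == NULL;
	closedir(dir);
	return empty;
}
```

With a helper of this shape, an emptiness check needs no loop at all, which
is presumably what "simplifies checks for empty directories" refers to.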
  
> +struct dirent *
> +readdir_skip_dot_and_dotdot(DIR *dirp)

nit: This seems like an accidental newline between the
return type and the method name.

Otherwise, patch LGTM.

-Stolee

* Re: [PATCH v5 0/9] Directory traversal fixes
  2021-05-12 17:28       ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
                           ` (8 preceding siblings ...)
  2021-05-12 17:28         ` [PATCH v5 9/9] dir: introduce readdir_skip_dot_and_dotdot() helper Elijah Newren via GitGitGadget
@ 2021-05-17 17:23         ` Derrick Stolee
  9 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee @ 2021-05-17 17:23 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Eric Sunshine, Elijah Newren, Jeff King, Philip Oakley,
	Jeff Hostetler, Josh Steadmon, Jeff Hostetler

On 5/12/2021 1:28 PM, Elijah Newren via GitGitGadget wrote:
> This patchset fixes a few directory traversal issues, where fill_directory()
> would traverse into directories that it shouldn't and not traverse into
> directories that it should (one of which was originally reported on this
> list at [1]). And it includes a few cleanups

Sorry that I've been sleeping on this series since v1. I re-read this
version from scratch and only found a couple nitpicks.

> If anyone has any ideas about a better place to put the "Some sidenotes"
> from the sixth commit message rather than keeping them in a random commit
> message, that might be helpful.

I don't have any better ideas, sorry.

Thanks,
-Stolee

* Re: [PATCH v5 8/9] dir: update stale description of treat_directory()
  2021-05-17 17:20           ` Derrick Stolee
@ 2021-05-17 19:44             ` Junio C Hamano
  2021-05-18  3:32               ` Elijah Newren
  0 siblings, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2021-05-17 19:44 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, Eric Sunshine,
	Elijah Newren, Jeff King, Philip Oakley, Jeff Hostetler,
	Josh Steadmon, Jeff Hostetler

Derrick Stolee <stolee@gmail.com> writes:

> On 5/12/2021 1:28 PM, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>
>> 
>> The documentation comment for treat_directory() was originally written
>> in 095952 (Teach directory traversal about subprojects, 2007-04-11)
>> which was before the 'struct dir_struct' split its bitfield of named
>> options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct
>> dir_struct into a single variable, 2009-02-16). When those flags
>> changed, the comment became stale, since members like
>> 'show_other_directories' transitioned into flags like
>> DIR_SHOW_OTHER_DIRECTORIES.
>> 
>> Update the comments for treat_directory() to use these flag names rather
>> than the old member names.
>> 
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> Reviewed-by: Elijah Newren <newren@gmail.com>
>
> I think you want the "Reviewed-by" before the "Signed-off-by",
> followed by your own sign-off.

Grabbing somebody else's signed-off patch, and forwarding it (with
or without tweaks and enhancements) with your own sign-off would be
a sufficient sign that you've inspected the patch deeply enough to
be confident that it is worth forwarding.  So I think you can even
lose the reviewed-by.

But as long as you are relaying somebody else's patch, DCO asks you
to sign it off yourself.

Thanks.

* Re: [PATCH v5 8/9] dir: update stale description of treat_directory()
  2021-05-17 19:44             ` Junio C Hamano
@ 2021-05-18  3:32               ` Elijah Newren
  2021-05-19  1:44                 ` Junio C Hamano
  0 siblings, 1 reply; 90+ messages in thread
From: Elijah Newren @ 2021-05-18  3:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget,
	Git Mailing List, Eric Sunshine, Jeff King, Philip Oakley,
	Jeff Hostetler, Josh Steadmon, Jeff Hostetler

On Mon, May 17, 2021 at 12:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Derrick Stolee <stolee@gmail.com> writes:
>
> > On 5/12/2021 1:28 PM, Derrick Stolee via GitGitGadget wrote:
> >> From: Derrick Stolee <stolee@gmail.com>
> >>
> >> The documentation comment for treat_directory() was originally written
> >> in 095952 (Teach directory traversal about subprojects, 2007-04-11)
> >> which was before the 'struct dir_struct' split its bitfield of named
> >> options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct
> >> dir_struct into a single variable, 2009-02-16). When those flags
> >> changed, the comment became stale, since members like
> >> 'show_other_directories' transitioned into flags like
> >> DIR_SHOW_OTHER_DIRECTORIES.
> >>
> >> Update the comments for treat_directory() to use these flag names rather
> >> than the old member names.
> >>
> >> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> >> Reviewed-by: Elijah Newren <newren@gmail.com>
> >
> > I think you want the "Reviewed-by" before the "Signed-off-by",
> > followed by your own sign-off.
>
> Grabbing somebody else's signed-off patch, and forwarding it (with
> or without tweaks and enhancements) with your own sign-off would be
> a sufficient sign that you've inspected the patch deeply enough to
> be confident that it is worth forwarding.  So I think you can even
> lose the reviewed-by.
>
> But as long as you are relaying somebody else's patch, DCO asks you
> to sign it off yourself.
>
> Thanks.

I was going to go fix this up, but it looks like en/dir-traversal has
already merged down to next.

We could revert the last two patches of the series out of next
(allowing the first seven with the important fixes to merge down) and
then I could resubmit just the last two patches.  Or we could just let
them all merge down as-is.  Preferences?

* Re: [PATCH v5 9/9] dir: introduce readdir_skip_dot_and_dotdot() helper
  2021-05-17 17:22           ` Derrick Stolee
@ 2021-05-18  3:34             ` Elijah Newren
  0 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren @ 2021-05-18  3:34 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine,
	Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon,
	Jeff Hostetler

On Mon, May 17, 2021 at 10:22 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/12/2021 1:28 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Many places in the code were doing
> >     while ((d = readdir(dir)) != NULL) {
> >         if (is_dot_or_dotdot(d->d_name))
> >             continue;
> >         ...process d...
> >     }
> > Introduce a readdir_skip_dot_and_dotdot() helper to make that a one-liner:
> >     while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
> >         ...process d...
> >     }
> >
> > This helper particularly simplifies checks for empty directories.
> >
> > Also use this helper in read_cached_dir() so that our statistics are
> > consistent across platforms.  (In other words, read_cached_dir() should
> > have been using is_dot_or_dotdot() and skipping such entries, but did
> > not and left it to treat_path() to detect and mark such entries as
> > path_none.)
>
> I like the idea of this helper!
>
> > +struct dirent *
> > +readdir_skip_dot_and_dotdot(DIR *dirp)
>
> nit: This seems like an accidental newline between the
> return type and the method name.

I would fix this, but the patch is already in next.  If Junio decides
to revert the last two patches out of next because of the
Signed-off-by issue on the previous patch, then I'll resubmit this
patch as well with this issue fixed.

* Re: [PATCH v5 8/9] dir: update stale description of treat_directory()
  2021-05-18  3:32               ` Elijah Newren
@ 2021-05-19  1:44                 ` Junio C Hamano
  0 siblings, 0 replies; 90+ messages in thread
From: Junio C Hamano @ 2021-05-19  1:44 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget,
	Git Mailing List, Eric Sunshine, Jeff King, Philip Oakley,
	Jeff Hostetler, Josh Steadmon, Jeff Hostetler

Elijah Newren <newren@gmail.com> writes:

> We could revert the last two patches of the series out of next
> (allowing the first seven with the important fixes to merge down) and
> then I could resubmit just the last two patches.  Or we could just let
> them all merge down as-is.  Preferences?

The former, thanks.

Thread overview: 90+ messages
2021-05-07  4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget
2021-05-07  4:04 ` [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-07  4:27   ` Eric Sunshine
2021-05-07  5:00     ` Elijah Newren
2021-05-07  5:31       ` Eric Sunshine
2021-05-07  5:42         ` Elijah Newren
2021-05-07  5:56           ` Eric Sunshine
2021-05-07 23:05       ` Jeff King
2021-05-07 23:15         ` Eric Sunshine
2021-05-08  0:04         ` Elijah Newren
2021-05-08  0:10           ` Eric Sunshine
2021-05-08 17:20             ` Elijah Newren
2021-05-08 11:13   ` Philip Oakley
2021-05-08 17:20     ` Elijah Newren
2021-05-07  4:04 ` [PATCH 2/5] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
2021-05-07  4:04 ` [PATCH 3/5] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-07  4:04 ` [PATCH 4/5] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
2021-05-07  4:05 ` [PATCH 5/5] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
2021-05-07 16:22 ` [PATCH 6/5] dir: update stale description of treat_directory() Derrick Stolee
2021-05-07 17:57   ` Elijah Newren
2021-05-07 16:27 ` [PATCH 0/5] Directory traversal fixes Derrick Stolee
2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
2021-05-08  0:08   ` [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-08 10:13     ` Junio C Hamano
2021-05-08 17:34       ` Elijah Newren
2021-05-08 10:19     ` Junio C Hamano
2021-05-08 17:41       ` Elijah Newren
2021-05-08  0:08   ` [PATCH v2 2/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
2021-05-08  0:08   ` [PATCH v2 3/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-08  0:08   ` [PATCH v2 4/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
2021-05-08  0:08   ` [PATCH v2 5/8] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
2021-05-08  0:08   ` [PATCH v2 6/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
2021-05-08  0:08   ` [PATCH v2 7/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
2021-05-08  0:08   ` [PATCH v2 8/8] [RFC] dir: reported number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
2021-05-08 19:58   ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
2021-05-08 19:58     ` [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
2021-05-10  4:49       ` Junio C Hamano
2021-05-11 17:23         ` Elijah Newren
2021-05-11 16:17       ` Jeff Hostetler
2021-05-11 17:29         ` Elijah Newren
2021-05-08 19:58     ` [PATCH v3 2/8] [RFC] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
2021-05-10  5:00       ` Junio C Hamano
2021-05-08 19:58     ` [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
2021-05-10  5:09       ` Junio C Hamano
2021-05-11 17:40         ` Elijah Newren
2021-05-11 22:32           ` Junio C Hamano
2021-05-08 19:59     ` [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-10  5:28       ` Junio C Hamano
2021-05-11 17:45         ` Elijah Newren
2021-05-11 22:43           ` Junio C Hamano
2021-05-12  2:07             ` Elijah Newren
2021-05-12  3:17               ` Junio C Hamano
2021-05-08 19:59     ` [PATCH v3 5/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
2021-05-08 19:59     ` [PATCH v3 6/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-10  5:48       ` Junio C Hamano
2021-05-11 17:57         ` Elijah Newren
2021-05-08 19:59     ` [PATCH v3 7/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
2021-05-08 19:59     ` [PATCH v3 8/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
2021-05-11 18:34     ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
2021-05-11 18:34       ` [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
2021-05-11 19:06         ` Jeff Hostetler
2021-05-11 20:12           ` Elijah Newren
2021-05-11 23:12             ` Jeff Hostetler
2021-05-12  0:44               ` Elijah Newren
2021-05-12 12:26                 ` Jeff Hostetler
2021-05-12 15:24                   ` Elijah Newren
2021-05-11 18:34       ` [PATCH v4 2/8] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
2021-05-11 18:34       ` [PATCH v4 3/8] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
2021-05-11 18:34       ` [PATCH v4 4/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-11 18:34       ` [PATCH v4 5/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
2021-05-11 18:34       ` [PATCH v4 6/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-11 18:34       ` [PATCH v4 7/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
2021-05-11 18:34       ` [PATCH v4 8/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
2021-05-12 17:28       ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 1/9] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 2/9] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 3/9] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 4/9] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 5/9] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 6/9] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 7/9] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget
2021-05-12 17:28         ` [PATCH v5 8/9] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget
2021-05-17 17:20           ` Derrick Stolee
2021-05-17 19:44             ` Junio C Hamano
2021-05-18  3:32               ` Elijah Newren
2021-05-19  1:44                 ` Junio C Hamano
2021-05-12 17:28         ` [PATCH v5 9/9] dir: introduce readdir_skip_dot_and_dotdot() helper Elijah Newren via GitGitGadget
2021-05-17 17:22           ` Derrick Stolee
2021-05-18  3:34             ` Elijah Newren
2021-05-17 17:23         ` [PATCH v5 0/9] Directory traversal fixes Derrick Stolee
