git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v1 0/1] Teach status to show ignored directories
@ 2017-08-10 18:49 Jameson Miller
  2017-08-10 18:49 ` [PATCH v1 1/1] dir: teach " Jameson Miller
  2017-09-18 17:24 ` [PATCH v2] Improve performance of git status --ignored Jameson Miller
  0 siblings, 2 replies; 11+ messages in thread
From: Jameson Miller @ 2017-08-10 18:49 UTC (permalink / raw)
  To: git; +Cc: sxlijin, bmwill, gitster, peff, Jameson Miller

Our application (Visual Studio) needs to run git status with options
to report untracked and ignored files. It needs all untracked files
reported individually, but would rather not have all individual
ignored files under explicitly ignored directories
reported. Directories that match an ignore pattern should be reported,
instead of all contained files.

It is not possible to get this output with the current arguments
available to git status. You can get ignored files (--ignored), all
untracked files (--untracked-files=all), but if you specify both
options then you will also get all individual ignored files.

This change is to add the option to have git status report all
untracked files while also reporting directories that match an ignore
pattern. The logic in dir.c was also modified to not iterate over all
files in an ignored directory. This change resulted in a performance
improvement in work directories with a large number of files contained
in ignored directories.  For example, this setup is present when bin
and obj directories are contained in the repository work dir.

Jameson Miller (1):
  dir: teach status to show ignored directories

 Documentation/git-status.txt                      |   5 +
 Documentation/technical/api-directory-listing.txt |   6 +
 builtin/commit.c                                  |   4 +
 dir.c                                             |  48 +++++-
 dir.h                                             |   3 +-
 t/t7519-status-show-ignored-directory.sh          | 189 ++++++++++++++++++++++
 wt-status.c                                       |   4 +
 wt-status.h                                       |   1 +
 8 files changed, 253 insertions(+), 7 deletions(-)
 create mode 100755 t/t7519-status-show-ignored-directory.sh

-- 
2.11.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v1 1/1] dir: teach status to show ignored directories
  2017-08-10 18:49 [PATCH v1 0/1] Teach status to show ignored directories Jameson Miller
@ 2017-08-10 18:49 ` Jameson Miller
  2017-08-10 20:03   ` Stefan Beller
  2017-08-11 17:39   ` Brandon Williams
  2017-09-18 17:24 ` [PATCH v2] Improve performance of git status --ignored Jameson Miller
  1 sibling, 2 replies; 11+ messages in thread
From: Jameson Miller @ 2017-08-10 18:49 UTC (permalink / raw)
  To: git; +Cc: sxlijin, bmwill, gitster, peff, Jameson Miller

Teach Git to optionally show ignored directories when showing all
untracked files. The git status command exposes the options to report
ignored and/or untracked files. However, when reporting all untracked
files (--untracked-files=all), all individual ignored files are reported
as well. It is not currently possible to get the reporting behavior of
the --ignored flag, while also reporting all untracked files. This
change exposes a flag to report all untracked files while not showing
individual files in ignored directories.

Motivation:
Our application (Visual Studio) needs all untracked files listed
individually, but does not need all ignored files listed individually.
Reporting all ignored files can affect the time it takes for status
to run. For a representative repository, here are some measurements
showing a large perf improvement for this scenario:

| Command | Reported ignored entries | Time (s) |
| ------- | ------------------------ | -------- |
| 1       | 0                        | 1.3      |
| 2       | 1024                     | 4.2      |
| 3       | 174904                   | 7.5      |
| 4       | 1046                     | 1.6      |

Commands:
 1) status
 2) status --ignored
 3) status --ignored --untracked-files=all
 4) status --ignored --untracked-files=all --show-ignored-directory

This changes exposes a --show-ignored-directory flag to the git status
command. This flag is utilized when running git status with the
--ignored and --untracked-files options to not list ignored individual
ignored files contained in directories that match an ignore pattern.

Part of the perf improvement comes from the tweak to
read_directory_recursive to stop scanning the file system after it
encounters the first file. When a directory is ignored, all it needs to
determine is if the directory is empty or not. The logic currently keeps
scanning the file system until it finds an untracked file. However, as
the directory is ignored, all the contained contents are also marked
excluded. For ignored directories that contain a large number of files,
this can take some time.

Signed-off-by: Jameson Miller <jamill@microsoft.com>
---
 Documentation/git-status.txt                      |   5 +
 Documentation/technical/api-directory-listing.txt |   6 +
 builtin/commit.c                                  |   4 +
 dir.c                                             |  48 +++++-
 dir.h                                             |   3 +-
 t/t7519-status-show-ignored-directory.sh          | 189 ++++++++++++++++++++++
 wt-status.c                                       |   4 +
 wt-status.h                                       |   1 +
 8 files changed, 253 insertions(+), 7 deletions(-)
 create mode 100755 t/t7519-status-show-ignored-directory.sh

diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt
index d47f198f15..48ebe18571 100644
--- a/Documentation/git-status.txt
+++ b/Documentation/git-status.txt
@@ -100,6 +100,11 @@ configuration variable documented in linkgit:git-config[1].
 --ignored::
 	Show ignored files as well.
 
+--show-ignored-directory::
+    Show directories that are ignored, instead of individual files.
+    Does not recurse into excluded directories when listing all
+    untracked files.
+
 -z::
 	Terminate entries with NUL, instead of LF.  This implies
 	the `--porcelain=v1` output format if no other format is given.
diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt
index 6c77b4920c..a9816df44c 100644
--- a/Documentation/technical/api-directory-listing.txt
+++ b/Documentation/technical/api-directory-listing.txt
@@ -33,6 +33,12 @@ The notable options are:
 	Similar to `DIR_SHOW_IGNORED`, but return ignored files in `ignored[]`
 	in addition to untracked files in `entries[]`.
 
+`DIR_SHOW_IGNORED_DIRECTORY`:::
+
+	If this is set, non-empty directories that match an ignore pattern are
+	returned. The individual files contained in ignored directories are not
+	included.
+
 `DIR_KEEP_UNTRACKED_CONTENTS`:::
 
 	Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if this is set, the
diff --git a/builtin/commit.c b/builtin/commit.c
index 8e93802511..9fcf77ae3a 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1333,6 +1333,7 @@ static int git_status_config(const char *k, const char *v, void *cb)
 
 int cmd_status(int argc, const char **argv, const char *prefix)
 {
+	static int show_ignored_directory = 0;
 	static struct wt_status s;
 	int fd;
 	struct object_id oid;
@@ -1362,6 +1363,8 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 		  N_("ignore changes to submodules, optional when: all, dirty, untracked. (Default: all)"),
 		  PARSE_OPT_OPTARG, NULL, (intptr_t)"all" },
 		OPT_COLUMN(0, "column", &s.colopts, N_("list untracked files in columns")),
+		OPT_BOOL(0, "show-ignored-directory", &show_ignored_directory,
+		N_("Only show directories that match an ignore pattern name.")),
 		OPT_END(),
 	};
 
@@ -1394,6 +1397,7 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	s.ignore_submodule_arg = ignore_submodule_arg;
 	s.status_format = status_format;
 	s.verbose = verbose;
+	s.show_ignored_directory = show_ignored_directory;
 
 	wt_status_collect(&s);
 
diff --git a/dir.c b/dir.c
index ae6f5c9636..b17dfd53bf 100644
--- a/dir.c
+++ b/dir.c
@@ -49,7 +49,7 @@ struct cached_dir {
 static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	struct index_state *istate, const char *path, int len,
 	struct untracked_cache_dir *untracked,
-	int check_only, const struct pathspec *pathspec);
+	int check_only, int stop_at_first_file, const struct pathspec *pathspec);
 static int get_dtype(struct dirent *de, struct index_state *istate,
 		     const char *path, int len);
 
@@ -1389,6 +1389,21 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	case index_nonexistent:
 		if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
 			break;
+		if (exclude &&
+			(dir->flags & DIR_SHOW_IGNORED_TOO) &&
+			(dir->flags & DIR_SHOW_IGNORED_DIRECTORY)) {
+
+			/*
+			 * This is an excluded directory, and we are only
+			 * showing the name of a excluded directory.
+			 * Check to see if there are any contained files
+			 * to determine if the directory is empty or not.
+			 */
+			if (read_directory_recursive(dir, istate, dirname, len,
+				untracked, 1, 1, pathspec) == path_excluded)
+				return path_excluded;
+			return path_none;
+		}
 		if (!(dir->flags & DIR_NO_GITLINKS)) {
 			unsigned char sha1[20];
 			if (resolve_gitlink_ref(dirname, "HEAD", sha1) == 0)
@@ -1405,9 +1420,10 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	untracked = lookup_untracked(dir->untracked, untracked,
 				     dirname + baselen, len - baselen);
 	return read_directory_recursive(dir, istate, dirname, len,
-					untracked, 1, pathspec);
+					untracked, 1, 0, pathspec);
 }
 
+
 /*
  * This is an inexact early pruning of any recursive directory
  * reading - if the path cannot possibly be in the pathspec,
@@ -1633,7 +1649,7 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir,
 		 * with check_only set.
 		 */
 		return read_directory_recursive(dir, istate, path->buf, path->len,
-						cdir->ucd, 1, pathspec);
+						cdir->ucd, 1, 0, pathspec);
 	/*
 	 * We get path_recurse in the first run when
 	 * directory_exists_in_index() returns index_nonexistent. We
@@ -1798,7 +1814,7 @@ static void close_cached_dir(struct cached_dir *cdir)
 static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	struct index_state *istate, const char *base, int baselen,
 	struct untracked_cache_dir *untracked, int check_only,
-	const struct pathspec *pathspec)
+	int stop_at_first_file, const struct pathspec *pathspec)
 {
 	struct cached_dir cdir;
 	enum path_treatment state, subdir_state, dir_state = path_none;
@@ -1832,12 +1848,32 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			subdir_state =
 				read_directory_recursive(dir, istate, path.buf,
 							 path.len, ud,
-							 check_only, pathspec);
+							 check_only, stop_at_first_file, pathspec);
 			if (subdir_state > dir_state)
 				dir_state = subdir_state;
 		}
 
 		if (check_only) {
+			if (stop_at_first_file) {
+				/*
+				 * In general, if we are stopping at the first found file,
+				 * We can only signal that a path of at least "excluded" was
+				 * found. If the first file we find is "excluded" - there might
+				 * be other untracked files later on that will not be searched.
+				 *
+				 * In current usage of this function, stop_at_first_file will
+				 * only be set when called from a directory that matches the
+				 * exclude pattern - there should be no untracked files -
+				 * all contents should be marked as excluded.
+				 */
+				if (dir_state == path_excluded)
+					break;
+				else if (dir_state > path_excluded) {
+					dir_state = path_excluded;
+					break;
+				}
+			}
+
 			/* abort early if maximum state has been reached */
 			if (dir_state == path_untracked) {
 				if (cdir.fdir)
@@ -2108,7 +2144,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		 */
 		dir->untracked = NULL;
 	if (!len || treat_leading_path(dir, istate, path, len, pathspec))
-		read_directory_recursive(dir, istate, path, len, untracked, 0, pathspec);
+		read_directory_recursive(dir, istate, path, len, untracked, 0, 0, pathspec);
 	QSORT(dir->entries, dir->nr, cmp_dir_entry);
 	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
 
diff --git a/dir.h b/dir.h
index e3717055d1..b59eedf633 100644
--- a/dir.h
+++ b/dir.h
@@ -152,7 +152,8 @@ struct dir_struct {
 		DIR_COLLECT_IGNORED = 1<<4,
 		DIR_SHOW_IGNORED_TOO = 1<<5,
 		DIR_COLLECT_KILLED_ONLY = 1<<6,
-		DIR_KEEP_UNTRACKED_CONTENTS = 1<<7
+		DIR_KEEP_UNTRACKED_CONTENTS = 1<<7,
+		DIR_SHOW_IGNORED_DIRECTORY = 1<<8
 	} flags;
 	struct dir_entry **entries;
 	struct dir_entry **ignored;
diff --git a/t/t7519-status-show-ignored-directory.sh b/t/t7519-status-show-ignored-directory.sh
new file mode 100755
index 0000000000..3ff0a96cdf
--- /dev/null
+++ b/t/t7519-status-show-ignored-directory.sh
@@ -0,0 +1,189 @@
+#!/bin/sh
+#
+#
+
+test_description='git status collapse ignored'
+
+. ./test-lib.sh
+
+
+cat >.gitignore <<\EOF
+*.ign
+ignored_dir/
+!*.unignore
+EOF
+
+# commit initial ignore file
+test_expect_success 'setup initial commit and ignore file' '
+	git add . &&
+	test_tick &&
+	git commit -m "Initial commit"
+'
+
+cat >expect <<\EOF
+? expect
+? output
+! dir/ignored/ignored_1.ign
+! dir/ignored/ignored_2.ign
+! ignored/ignored_1.ign
+! ignored/ignored_2.ign
+EOF
+
+# Test status behavior on folder with ignored files
+test_expect_success 'setup folder with ignored files' '
+	mkdir -p ignored dir/ignored &&
+	touch ignored/ignored_1.ign ignored/ignored_2.ign \
+		dir/ignored/ignored_1.ign dir/ignored/ignored_2.ign
+'
+
+test_expect_success 'Verify behavior of status on folders with ignored files' '
+	test_when_finished "git clean -fdx" &&
+	git status --porcelain=v2 --ignored --untracked-files=all --show-ignored-directory >output &&
+	test_i18ncmp expect output
+'
+
+# Test status bahavior on folder with tracked and ignored files
+cat >expect <<\EOF
+? expect
+? output
+! dir/tracked_ignored/ignored_1.ign
+! dir/tracked_ignored/ignored_2.ign
+! tracked_ignored/ignored_1.ign
+! tracked_ignored/ignored_2.ign
+EOF
+
+test_expect_success 'setup folder with tracked & ignored files' '
+	mkdir -p tracked_ignored dir/tracked_ignored &&
+	touch tracked_ignored/tracked_1 tracked_ignored/tracked_2 \
+		tracked_ignored/ignored_1.ign tracked_ignored/ignored_2.ign \
+		dir/tracked_ignored/tracked_1 dir/tracked_ignored/tracked_2 \
+		dir/tracked_ignored/ignored_1.ign dir/tracked_ignored/ignored_2.ign &&
+
+	git add tracked_ignored/tracked_1 tracked_ignored/tracked_2 \
+		dir/tracked_ignored/tracked_1 dir/tracked_ignored/tracked_2 &&
+	test_tick &&
+	git commit -m "commit tracked files"
+'
+
+test_expect_success 'Verify status on folder with tracked & ignored files' '
+	git status --porcelain=v2 --ignored --untracked-files=all --show-ignored-directory >output &&
+	test_i18ncmp expect output
+'
+
+# Test status with modified paths
+cat >expect <<\EOF
+1 .M N... 100644 100644 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 tracked_ignored/tracked_1
+? expect
+? output
+! dir/tracked_ignored/ignored_1.ign
+! dir/tracked_ignored/ignored_2.ign
+! tracked_ignored/ignored_1.ign
+! tracked_ignored/ignored_2.ign
+EOF
+
+test_expect_success 'setup workdir with modified tracked files' '
+	echo 123 > tracked_ignored/tracked_1
+'
+
+test_expect_success 'Verify status on folder with tracked & ignored files' '
+	test_when_finished "git clean -fdx && git reset HEAD~1 --hard" &&
+	git status --porcelain=v2 --ignored --untracked-files=all --show-ignored-directory >output &&
+	test_i18ncmp expect output
+'
+
+# Test status behavior on folder with untracked and ignored files
+cat >expect <<\EOF
+? dir/untracked_ignored/untracked_1
+? dir/untracked_ignored/untracked_2
+? expect
+? output
+? untracked_ignored/untracked_1
+? untracked_ignored/untracked_2
+! dir/untracked_ignored/ignored_1.ign
+! dir/untracked_ignored/ignored_2.ign
+! untracked_ignored/ignored_1.ign
+! untracked_ignored/ignored_2.ign
+EOF
+
+test_expect_success 'setup folder with tracked & ignored files' '
+	mkdir -p untracked_ignored dir/untracked_ignored &&
+	touch untracked_ignored/untracked_1 untracked_ignored/untracked_2 \
+		untracked_ignored/ignored_1.ign untracked_ignored/ignored_2.ign \
+		dir/untracked_ignored/untracked_1 dir/untracked_ignored/untracked_2 \
+		dir/untracked_ignored/ignored_1.ign dir/untracked_ignored/ignored_2.ign
+'
+
+test_expect_success 'Verify status on folder with tracked & ignored files' '
+	test_when_finished "git clean -fdx" &&
+	git status --porcelain=v2 --ignored --untracked-files=all --show-ignored-directory >output &&
+	test_i18ncmp expect output
+'
+
+# Test status behavior on ignored folder
+cat >expect <<\EOF
+? expect
+? output
+! ignored_dir/
+EOF
+
+test_expect_success 'setup folder with tracked & ignored files' '
+	mkdir ignored_dir &&
+	touch ignored_dir/ignored_1 ignored_dir/ignored_2 \
+		ignored_dir/ignored_1.ign ignored_dir/ignored_2.ign
+'
+
+test_expect_success 'Verify status on folder with tracked & ignored files' '
+	test_when_finished "git clean -fdx" &&
+	git status --porcelain=v2 --ignored --untracked-files=all --show-ignored-directory >output &&
+	test_i18ncmp expect output
+'
+
+# Test status behavior on ignored folder with tracked file
+cat >expect <<\EOF
+? expect
+? output
+! ignored_dir/ignored_1
+! ignored_dir/ignored_1.ign
+! ignored_dir/ignored_2
+! ignored_dir/ignored_2.ign
+EOF
+
+test_expect_success 'setup folder with tracked & ignored files' '
+	mkdir ignored_dir &&
+	touch ignored_dir/ignored_1 ignored_dir/ignored_2 \
+		ignored_dir/ignored_1.ign ignored_dir/ignored_2.ign \
+		ignored_dir/tracked &&
+	git add -f ignored_dir/tracked &&
+	test_tick &&
+	git commit -m "Force add file in ignored directory"
+'
+
+test_expect_success 'Verify status on folder with tracked & ignored files' '
+	test_when_finished "git clean -fdx && git reset HEAD~1 --hard" &&
+	git status --porcelain=v2 --ignored --untracked-files=all --show-ignored-directory >output &&
+	test_i18ncmp expect output
+'
+
+# Test status with more complex paths -
+# ignored underneath an untracked folder
+cat >expect <<\EOF
+? expect
+? output
+? dir/dir/untracked/dir/untracked/a
+! dir/dir/untracked/dir/ignored/a.ign
+! dir/dir/untracked/dir/ignored_dir/
+EOF
+
+test_expect_success 'setup deeper folder structure for ignored and untracked' '
+	mkdir dir/dir/untracked/dir/ignored dir/dir/untracked/dir/ignored_dir \
+		dir/dir/untracked/dir/untracked &&
+	touch dir/dir/untracked/dir/ignored/a.ign dir/dir/untracked/dir/ignored_dir/a \
+	      dir/dir/untracked/dir/untracked/a
+'
+test_expect_success 'Verify status on slightly deeper folder structure' '
+	test_when_finished "git clean -fdx" &&
+	git status --porcelain=v2 --ignored --untracked-files=all --show-ignored-directory >output &&
+	test_i18ncmp expect output
+'
+
+test_done
diff --git a/wt-status.c b/wt-status.c
index 77c27c5113..c3d17490a4 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -664,6 +664,10 @@ static void wt_status_collect_untracked(struct wt_status *s)
 		dir.flags |= DIR_SHOW_IGNORED_TOO;
 	else
 		dir.untracked = the_index.untracked;
+
+	if (s->show_ignored_directory)
+		dir.flags |= DIR_SHOW_IGNORED_DIRECTORY;
+
 	setup_standard_excludes(&dir);
 
 	fill_directory(&dir, &the_index, &s->pathspec);
diff --git a/wt-status.h b/wt-status.h
index 64f4d33ea1..b9c1dffb92 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -72,6 +72,7 @@ struct wt_status {
 	int submodule_summary;
 	int show_ignored_files;
 	enum untracked_status_type show_untracked_files;
+	int show_ignored_directory;
 	const char *ignore_submodule_arg;
 	char color_palette[WT_STATUS_MAXSLOT][COLOR_MAXLEN];
 	unsigned colopts;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 1/1] dir: teach status to show ignored directories
  2017-08-10 18:49 ` [PATCH v1 1/1] dir: teach " Jameson Miller
@ 2017-08-10 20:03   ` Stefan Beller
  2017-08-11 17:48     ` Jameson Miller
  2017-08-11 17:39   ` Brandon Williams
  1 sibling, 1 reply; 11+ messages in thread
From: Stefan Beller @ 2017-08-10 20:03 UTC (permalink / raw)
  To: Jameson Miller
  Cc: git@vger.kernel.org, Samuel Lijin, Brandon Williams,
	Junio C Hamano, Jeff King, Jameson Miller

On Thu, Aug 10, 2017 at 11:49 AM, Jameson Miller
<jameson.miller81@gmail.com> wrote:

Welcome to the Git mailing list. :)

> Teach Git to optionally show ignored directories when showing all
> untracked files. The git status command exposes the options to report
> ignored and/or untracked files. However, when reporting all untracked
> files (--untracked-files=all), all individual ignored files are reported
> as well. It is not currently possible to get the reporting behavior of
> the --ignored flag, while also reporting all untracked files.

Trying to understand this based off the documentation for
--untracked=all and --ignored, I realize that the documentation
for --ignored seems to be lacking as I do not understand what the
--ignored behavior is in combination with --untracked=[all, normal, no]

> This
> change exposes a flag to report all untracked files while not showing
> individual files in ignored directories.

By the description up to here, it sounds as if you want to introduce
mode for --untracked, e.g. "normal-adjusted-for-ignored" (it's a bad
suggestion)? However the patch seems to add an orthogonal flag,
such that

  status --no-ignored --untracked=no --show-ignored-directory

would also be possible. What is a reasonable expectation for
the output of such?

> Motivation:
> Our application (Visual Studio) needs all untracked files listed
> individually, but does not need all ignored files listed individually.

For parsing output, I would strongly recommend --porcelain[=2],
but that is a tangent.

> Reporting all ignored files can affect the time it takes for status
> to run. For a representative repository, here are some measurements
> showing a large perf improvement for this scenario:
>
> | Command | Reported ignored entries | Time (s) |
> | ------- | ------------------------ | -------- |
> | 1       | 0                        | 1.3      |
> | 2       | 1024                     | 4.2      |
> | 3       | 174904                   | 7.5      |
> | 4       | 1046                     | 1.6      |
>
> Commands:
>  1) status
>  2) status --ignored
>  3) status --ignored --untracked-files=all
>  4) status --ignored --untracked-files=all --show-ignored-directory

(2) is --untracked-files=normal I'd presume, such that this flag
can be understood as a tweak to "normal" based on the similar size
between 2 and 4? (The timing improvement from 2 to 4 is huge though).

> This changes exposes a --show-ignored-directory flag to the git status

s/changes/change/

> command. This flag is utilized when running git status with the
> --ignored and --untracked-files options to not list ignored individual
> ignored files contained in directories that match an ignore pattern.
>
> Part of the perf improvement comes from the tweak to
> read_directory_recursive to stop scanning the file system after it
> encounters the first file. When a directory is ignored, all it needs to
> determine is if the directory is empty or not. The logic currently keeps
> scanning the file system until it finds an untracked file. However, as
> the directory is ignored, all the contained contents are also marked
> excluded. For ignored directories that contain a large number of files,
> this can take some time.

I think it is possible to ignore a directory and still track files in it, what
are the implications of this flag on a tracked (and changed) file in an
ignored dir?

What happens to empty directories that match an ignore pattern?


> @@ -1362,6 +1363,8 @@ int cmd_status(int argc, const char **argv, const char *prefix)
>                   N_("ignore changes to submodules, optional when: all, dirty, untracked. (Default: all)"),
>                   PARSE_OPT_OPTARG, NULL, (intptr_t)"all" },
>                 OPT_COLUMN(0, "column", &s.colopts, N_("list untracked files in columns")),
> +               OPT_BOOL(0, "show-ignored-directory", &show_ignored_directory,

Is it possible to directly read into  s.show_ignored_directory here?

> +test_expect_success 'Verify behavior of status on folders with ignored files' '
> +       test_when_finished "git clean -fdx" &&
> +       git status --porcelain=v2 --ignored --untracked-files=all --show-ignored-directory >output &&
> +       test_i18ncmp expect output
> +'
> +
> +# Test status bahavior on folder with tracked and ignored files

behavior

> +cat >expect <<\EOF
> +? expect
> +? output
> +! dir/tracked_ignored/ignored_1.ign
> +! dir/tracked_ignored/ignored_2.ign
> +! tracked_ignored/ignored_1.ign
> +! tracked_ignored/ignored_2.ign
> +EOF

I think our latest 'best style' is to include these heredocs into the test.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 1/1] dir: teach status to show ignored directories
  2017-08-10 18:49 ` [PATCH v1 1/1] dir: teach " Jameson Miller
  2017-08-10 20:03   ` Stefan Beller
@ 2017-08-11 17:39   ` Brandon Williams
  2017-08-11 18:29     ` Jameson Miller
  1 sibling, 1 reply; 11+ messages in thread
From: Brandon Williams @ 2017-08-11 17:39 UTC (permalink / raw)
  To: Jameson Miller; +Cc: git, sxlijin, gitster, peff, Jameson Miller

On 08/10, Jameson Miller wrote:
> Teach Git to optionally show ignored directories when showing all
> untracked files. The git status command exposes the options to report
> ignored and/or untracked files. However, when reporting all untracked
> files (--untracked-files=all), all individual ignored files are reported
> as well. It is not currently possible to get the reporting behavior of
> the --ignored flag, while also reporting all untracked files. This
> change exposes a flag to report all untracked files while not showing
> individual files in ignored directories.
> 
> Motivation:
> Our application (Visual Studio) needs all untracked files listed
> individually, but does not need all ignored files listed individually.
> Reporting all ignored files can affect the time it takes for status
> to run. For a representative repository, here are some measurements
> showing a large perf improvement for this scenario:
> 
> | Command | Reported ignored entries | Time (s) |
> | ------- | ------------------------ | -------- |
> | 1       | 0                        | 1.3      |
> | 2       | 1024                     | 4.2      |
> | 3       | 174904                   | 7.5      |
> | 4       | 1046                     | 1.6      |
> 
> Commands:
>  1) status
>  2) status --ignored
>  3) status --ignored --untracked-files=all
>  4) status --ignored --untracked-files=all --show-ignored-directory
> 
> This changes exposes a --show-ignored-directory flag to the git status
> command. This flag is utilized when running git status with the
> --ignored and --untracked-files options to not list ignored individual
> ignored files contained in directories that match an ignore pattern.

I can't help feeling that there is a better way express this with a
better UI.  I'm not saying this is wrong, I'm just not sure how
--show-ignored-directory would work when not paired with --ignored and
--untracked-files.  Does it require --ignored to also be given?

> 
> Part of the perf improvement comes from the tweak to
> read_directory_recursive to stop scanning the file system after it
> encounters the first file. When a directory is ignored, all it needs to
> determine is if the directory is empty or not. The logic currently keeps
> scanning the file system until it finds an untracked file. However, as
> the directory is ignored, all the contained contents are also marked
> excluded. For ignored directories that contain a large number of files,
> this can take some time.
> 
> Signed-off-by: Jameson Miller <jamill@microsoft.com>

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 1/1] dir: teach status to show ignored directories
  2017-08-10 20:03   ` Stefan Beller
@ 2017-08-11 17:48     ` Jameson Miller
  2017-08-14 21:05       ` Stefan Beller
  0 siblings, 1 reply; 11+ messages in thread
From: Jameson Miller @ 2017-08-11 17:48 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Samuel Lijin, Brandon Williams,
	Junio C Hamano, Jeff King, Jameson Miller



On 08/10/2017 04:03 PM, Stefan Beller wrote:
> On Thu, Aug 10, 2017 at 11:49 AM, Jameson Miller
> <jameson.miller81@gmail.com> wrote:
>
> Welcome to the Git mailing list. :)

Thank you for the welcome and the review! I will include the
suggested code changes in the next patch version.

>> Teach Git to optionally show ignored directories when showing all
>> untracked files. The git status command exposes the options to report
>> ignored and/or untracked files. However, when reporting all untracked
>> files (--untracked-files=all), all individual ignored files are reported
>> as well. It is not currently possible to get the reporting behavior of
>> the --ignored flag, while also reporting all untracked files.
> Trying to understand this based off the documentation for
> --untracked=all and --ignored, I realize that the documentation
> for --ignored seems to be lacking as I do not understand what the
> --ignored behavior is in combination with --untracked=[all, normal, no]
>

The set of files listed by "--ignored" changes when different
values are given to "--untracked-files".  If would be nice to
be able to make the ignored output independent of the untracked
settings.  This patch attempts to do that while maintaining
backward compatibility with the existing behavior.

When "--ignored" is used by itself ("--untracked-files=normal"),
ignored directories (both directories that explicitly match a
directory ignore pattern -and- directories containing only ignored
files) are listed. Individual files within are *not* listed.

When "--ignored" is used with "--untracked-files=all", git always
lists the individual files within the ignored directories and DOES
NOT collapse the output to just the containing (explicitly or
implicitly ignored) directory.

This can cause a massive performance problem when there are
a lot of ignored files in a well-defined set of ignored directories.
For example, on Windows, Visual Studio creates a bin/ and
obj/ directory inside your project where it writes all .obj
files.  Normal usage is to explicitly ignore those 2 directory
names in your .gitignores (rather than or in addition to *.obj).
We just want to see the "bin/" and "obj/" paths
regardless of which "--untracked" flag is passed in.

We want to see the paths that explicitly match an ignore pattern.
This means that if a directory only contains ignored files, but the
directory itself is not explicitly ignored, then we want to see the
individual files. This is slightly different than the current behavior
of "--ignored".

I am open to suggestions on how to present the options to control
this behavior.

>> This
>> change exposes a flag to report all untracked files while not showing
>> individual files in ignored directories.
> By the description up to here, it sounds as if you want to introduce
> mode for --untracked, e.g. "normal-adjusted-for-ignored" (it's a bad
> suggestion)? However the patch seems to add an orthogonal flag,
> such that
>
>    status --no-ignored --untracked=no --show-ignored-directory
>
> would also be possible. What is a reasonable expectation for
> the output of such?

The current patch does add another flag. This flag only has meaning if
the "--ignored" and "--untracked=all" flags are also specified. Another
option I had considered is to let the "--ignored" flag take an argument.
Then, we could express this new behavior through (for example) a
"--ignored=exact" flag to reflect the fact that this new option
returns paths that match the ignore pattern, and does not
collapse directories that contain only ignored files.

>> Motivation:
>> Our application (Visual Studio) needs all untracked files listed
>> individually, but does not need all ignored files listed individually.
> For parsing output, I would strongly recommend --porcelain[=2],
> but that is a tangent.
>
>> Reporting all ignored files can affect the time it takes for status
>> to run. For a representative repository, here are some measurements
>> showing a large perf improvement for this scenario:
>>
>> | Command | Reported ignored entries | Time (s) |
>> | ------- | ------------------------ | -------- |
>> | 1       | 0                        | 1.3      |
>> | 2       | 1024                     | 4.2      |
>> | 3       | 174904                   | 7.5      |
>> | 4       | 1046                     | 1.6      |
>>
>> Commands:
>>   1) status
>>   2) status --ignored
>>   3) status --ignored --untracked-files=all
>>   4) status --ignored --untracked-files=all --show-ignored-directory
>> (2) is --untracked-files=normal I'd presume, such that this flag
>> can be understood as a tweak to "normal" based on the similar size
>> between 2 and 4? (The timing improvement from 2 to 4 is huge though).
(2) is --untracked-files=normal. Although the count of ignored
files similar between 2 and 4, I consider this flag more of a
tweak on 3, as we want the untracked files reported with
the "--untracked=all" flag. The counts between 2 and 4 are
similar in this case because most of the ignored files are
contained in ignored directories.

Our application calls status including the following flags:

--porcelain=v2 --ignored --untracked-files=all --ignore-submodules=none

This means we have bad performance compared to just "git status"
when there is a large number of files in ignored directories
With this new behavior, our application would move from case 3 to
case 4 for this repository.

You also point out the timing difference between case 2 and 4. I
think there is an optimization we can make when running "git
status --ignored" logic that will improve the the timing
here. Currently, the logic in dir.c is iterating over all files
contained in an ignored directory to see if it is empty or
not. This is not necessary - we should be able to stop after
finding any file. I plan to follow up on this in a different
patch.

>> This changes exposes a --show-ignored-directory flag to the git status
> s/changes/change/
>
>> command. This flag is utilized when running git status with the
>> --ignored and --untracked-files options to not list ignored individual
>> ignored files contained in directories that match an ignore pattern.
>>
>> Part of the perf improvement comes from the tweak to
>> read_directory_recursive to stop scanning the file system after it
>> encounters the first file. When a directory is ignored, all it needs to
>> determine is if the directory is empty or not. The logic currently keeps
>> scanning the file system until it finds an untracked file. However, as
>> the directory is ignored, all the contained contents are also marked
>> excluded. For ignored directories that contain a large number of files,
>> this can take some time.
> I think it is possible to ignore a directory and still track files in it, what
> are the implications of this flag on a tracked (and changed) file in an
> ignored dir?
It is possible to have tracked files in an ignored directory. The
behavior with this patch is to show individual files in the
ignored directory. This patch includes a test covering this
scenario. I think this behavior is reasonable for this case - I
am interested if there are other opinions on this.
>
> What happens to empty directories that match an ignore pattern?
>
The "git status" command would not show empty directories. From
the lower level implementation in dir.c, I think it *should*
depend on the DIR_HIDE_EMPTY_DIRECTORIES flag, but does not
currently. I will fix this with the next patch version.
>> @@ -1362,6 +1363,8 @@ int cmd_status(int argc, const char **argv, const char *prefix)
>>                    N_("ignore changes to submodules, optional when: all, dirty, untracked. (Default: all)"),
>>                    PARSE_OPT_OPTARG, NULL, (intptr_t)"all" },
>>                  OPT_COLUMN(0, "column", &s.colopts, N_("list untracked files in columns")),
>> +               OPT_BOOL(0, "show-ignored-directory", &show_ignored_directory,
> Is it possible to directly read into  s.show_ignored_directory here?
>
>> +test_expect_success 'Verify behavior of status on folders with ignored files' '
>> +       test_when_finished "git clean -fdx" &&
>> +       git status --porcelain=v2 --ignored --untracked-files=all --show-ignored-directory >output &&
>> +       test_i18ncmp expect output
>> +'
>> +
>> +# Test status bahavior on folder with tracked and ignored files
> behavior
>
>> +cat >expect <<\EOF
>> +? expect
>> +? output
>> +! dir/tracked_ignored/ignored_1.ign
>> +! dir/tracked_ignored/ignored_2.ign
>> +! tracked_ignored/ignored_1.ign
>> +! tracked_ignored/ignored_2.ign
>> +EOF
> I think our latest 'best style' is to include these heredocs into the test.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 1/1] dir: teach status to show ignored directories
  2017-08-11 17:39   ` Brandon Williams
@ 2017-08-11 18:29     ` Jameson Miller
  0 siblings, 0 replies; 11+ messages in thread
From: Jameson Miller @ 2017-08-11 18:29 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sxlijin, gitster, peff, Jameson Miller



On 08/11/2017 01:39 PM, Brandon Williams wrote:
> On 08/10, Jameson Miller wrote:
>> Teach Git to optionally show ignored directories when showing all
>> untracked files. The git status command exposes the options to report
>> ignored and/or untracked files. However, when reporting all untracked
>> files (--untracked-files=all), all individual ignored files are reported
>> as well. It is not currently possible to get the reporting behavior of
>> the --ignored flag, while also reporting all untracked files. This
>> change exposes a flag to report all untracked files while not showing
>> individual files in ignored directories.
>>
>> Motivation:
>> Our application (Visual Studio) needs all untracked files listed
>> individually, but does not need all ignored files listed individually.
>> Reporting all ignored files can affect the time it takes for status
>> to run. For a representative repository, here are some measurements
>> showing a large perf improvement for this scenario:
>>
>> | Command | Reported ignored entries | Time (s) |
>> | ------- | ------------------------ | -------- |
>> | 1       | 0                        | 1.3      |
>> | 2       | 1024                     | 4.2      |
>> | 3       | 174904                   | 7.5      |
>> | 4       | 1046                     | 1.6      |
>>
>> Commands:
>>   1) status
>>   2) status --ignored
>>   3) status --ignored --untracked-files=all
>>   4) status --ignored --untracked-files=all --show-ignored-directory
>>
>> This changes exposes a --show-ignored-directory flag to the git status
>> command. This flag is utilized when running git status with the
>> --ignored and --untracked-files options to not list ignored individual
>> ignored files contained in directories that match an ignore pattern.
> I can't help feeling that there is a better way express this with a
> better UI.  I'm not saying this is wrong, I'm just not sure how
> --show-ignored-directory would work when not paired with --ignored and
> --untracked-files.  Does it require --ignored to also be given?
Yes. This flag only has meaning when --ignored and --untracked=all
are specified. I am open to other suggestions on how to express this.

Another option might be to modify the "--ignored" flag to take an
argument, which would allow more explicit control over how ignored
paths are reported. This way, we could specify (for example)
"--ignored=explicit" to control the output of ignored paths.

>> Part of the perf improvement comes from the tweak to
>> read_directory_recursive to stop scanning the file system after it
>> encounters the first file. When a directory is ignored, all it needs to
>> determine is if the directory is empty or not. The logic currently keeps
>> scanning the file system until it finds an untracked file. However, as
>> the directory is ignored, all the contained contents are also marked
>> excluded. For ignored directories that contain a large number of files,
>> this can take some time.
>>
>> Signed-off-by: Jameson Miller <jamill@microsoft.com>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 1/1] dir: teach status to show ignored directories
  2017-08-11 17:48     ` Jameson Miller
@ 2017-08-14 21:05       ` Stefan Beller
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Beller @ 2017-08-14 21:05 UTC (permalink / raw)
  To: Jameson Miller
  Cc: git@vger.kernel.org, Samuel Lijin, Brandon Williams,
	Junio C Hamano, Jeff King, Jameson Miller

On Fri, Aug 11, 2017 at 10:48 AM, Jameson Miller
<jameson.miller81@gmail.com> wrote:
> The set of files listed by "--ignored" changes when different
> values are given to "--untracked-files".  If would be nice to
> be able to make the ignored output independent of the untracked
> settings.  This patch attempts to do that while maintaining
> backward compatibility with the existing behavior.


So after rereading this, maybe we could have
'--ignored[=as-untracked, new-mode]' with as-untracked as default?

This would satisfy your goal, while still maintaining backwards
compatibility.

> We want to see the paths that explicitly match an ignore pattern.
> This means that if a directory only contains ignored files, but the
> directory itself is not explicitly ignored, then we want to see the
> individual files. This is slightly different than the current behavior
> of "--ignored".
>
> I am open to suggestions on how to present the options to control
> this behavior.

I would strongly favor not introducing yet another flag to tweak the
behavior of another command line option. (The UX of Git is already
bloated, IMHO. But that is just my general stance on things)

So we'd have two proposals:

(1) The current patch proposes --show-ignored-directory that modifies
    the behavior of other flags
(2) extend an already existing option to take another mode (which I
    propose)

Strong arguments for (1) would be if the flag is orthogonal and apply
to the other flags in a uniform way, i.e.:
    status --no-ignored --untracked=no --show-ignored-directory
would make sense and produce an output that a user doesn't
find confusing.

Arguments for (2), are
* less documentation for users (see below)
* if the two flags are truly orthogonal, test effort is decreased
* no exposure of an API that may yield strange results when the user
  gives strange input.

>>> This
>>> change exposes a flag to report all untracked files while not showing
>>> individual files in ignored directories.
>>
>> By the description up to here, it sounds as if you want to introduce
>> mode for --untracked, e.g. "normal-adjusted-for-ignored" (it's a bad
>> suggestion)? However the patch seems to add an orthogonal flag,
>> such that
>>
>>    status --no-ignored --untracked=no --show-ignored-directory
>>
>> would also be possible. What is a reasonable expectation for
>> the output of such?
>
>
> The current patch does add another flag. This flag only has meaning if
> the "--ignored" and "--untracked=all" flags are also specified. Another
> option I had considered is to let the "--ignored" flag take an argument.

I would think that is a better solution, as then untracked and ignored
files (and their parameters) will be orthogonal in the long run.
Things like

  --untracked=all --ignored=none
  --untracked=none --ignored=only-dirs
  --untracked=reasonable-default --ignored=other-reasonable-default

will be possible and eventually the default for ignored files may be
independent of untracked settings.

> Then, we could express this new behavior through (for example) a
> "--ignored=exact" flag to reflect the fact that this new option
> returns paths that match the ignore pattern, and does not
> collapse directories that contain only ignored files.

Yes, and we would not need to worry about how the new
flag interacts with other flags. I would prefer that solution as it
seems the API will be cleaner that way.

Less things depend on each other, e.g. as a user I do not need to
know about the 'other' flag, be it ignored or untracked to solve my
problem. If we'd go with this third flag, we'd need to (a) document it,
but (b) also have to add a sentence to untracked and ignored flag
    "In the corner case of also having the third flag, we change
    behavior once again"
which I'd dislike from a users perspective.

>>> Commands:
>>>   1) status
>>>   2) status --ignored
>>>   3) status --ignored --untracked-files=all
>>>   4) status --ignored --untracked-files=all --show-ignored-directory
>>> (2) is --untracked-files=normal I'd presume, such that this flag
>>> can be understood as a tweak to "normal" based on the similar size
>>> between 2 and 4? (The timing improvement from 2 to 4 is huge though).
>
> (2) is --untracked-files=normal. Although the count of ignored
> files similar between 2 and 4, I consider this flag more of a
> tweak on 3, as we want the untracked files reported with
> the "--untracked=all" flag. The counts between 2 and 4 are
> similar in this case because most of the ignored files are
> contained in ignored directories.
>
> Our application calls status including the following flags:
>
> --porcelain=v2 --ignored --untracked-files=all --ignore-submodules=none
>
> This means we have bad performance compared to just "git status"
> when there is a large number of files in ignored directories
> With this new behavior, our application would move from case 3 to
> case 4 for this repository.
>
> You also point out the timing difference between case 2 and 4. I
> think there is an optimization we can make when running "git
> status --ignored" logic that will improve the the timing
> here. Currently, the logic in dir.c is iterating over all files
> contained in an ignored directory to see if it is empty or
> not. This is not necessary - we should be able to stop after
> finding any file. I plan to follow up on this in a different
> patch.

Thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2] Improve performance of git status --ignored
  2017-08-10 18:49 [PATCH v1 0/1] Teach status to show ignored directories Jameson Miller
  2017-08-10 18:49 ` [PATCH v1 1/1] dir: teach " Jameson Miller
@ 2017-09-18 17:24 ` Jameson Miller
  2017-09-18 17:24   ` Jameson Miller
  1 sibling, 1 reply; 11+ messages in thread
From: Jameson Miller @ 2017-09-18 17:24 UTC (permalink / raw)
  To: jameson.miller81; +Cc: bmwill, git, gitster, jamill, peff, sxlijin

This is the second version of my patches to improve handling of
ignored files

I have decided to break the original patch series into two parts:

 1) Perf improvements to handling ignored directories

 2) Expose extra options to control which ignored files are displayed
 by git status.

This patch will address #1, and I will follow up with another patch
for #2.

This patch improves the performance of 'git status --ignored' by
cutting out unnecessary work when determining if an ignored directory
is empty or not. The current logic will recursively enumerate through
all contents of an ignored directory to determine whether it is empty
or not. The new logic will return after the 1st file is
encountered. This can result in a significant speedup in work dirs
with a lot of files in ignored directories.

As an example of the performance improvement, here is a representative
example of a repository with ~190,000 files in ~400 ignored
directories:

| Command                    | Time (s) |
| -------------------------- | -------- |
| git status                 |    1.2   |
| git status --ignored (old) |    3.9   |
| git status --ignored (new) |    1.4   |

Jameson Miller (1):
  Improve performance of git status --ignored

 dir.c | 47 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 41 insertions(+), 6 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2] Improve performance of git status --ignored
  2017-09-18 17:24 ` [PATCH v2] Improve performance of git status --ignored Jameson Miller
@ 2017-09-18 17:24   ` Jameson Miller
  2017-09-19  3:27     ` Junio C Hamano
  2017-09-19 17:52     ` Brandon Williams
  0 siblings, 2 replies; 11+ messages in thread
From: Jameson Miller @ 2017-09-18 17:24 UTC (permalink / raw)
  To: jameson.miller81; +Cc: bmwill, git, gitster, jamill, peff, sxlijin

Improve the performance of the directory listing logic when it wants to list
non-empty ignored directories. In order to show non-empty ignored directories,
the existing logic will recursively iterate through all contents of an ignored
directory. This change introduces the optimization to stop iterating through
the contents once it finds the first file. This can have a significant
improvement in 'git status --ignored' performance in repositories with a large
number of files in ignored directories.

For an example of the performance difference on an example repository with
196,000 files in 400 ignored directories:

| Command                    |  Time (s) |
| -------------------------- | --------- |
| git status                 |   1.2     |
| git status --ignored (old) |   3.9     |
| git status --ignored (new) |   1.4     |

Signed-off-by: Jameson Miller <jamill@microsoft.com>
---
 dir.c | 47 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/dir.c b/dir.c
index 1c55dc3..1d17b80 100644
--- a/dir.c
+++ b/dir.c
@@ -49,7 +49,7 @@ struct cached_dir {
 static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	struct index_state *istate, const char *path, int len,
 	struct untracked_cache_dir *untracked,
-	int check_only, const struct pathspec *pathspec);
+	int check_only, int stop_at_first_file, const struct pathspec *pathspec);
 static int get_dtype(struct dirent *de, struct index_state *istate,
 		     const char *path, int len);
 
@@ -1404,8 +1404,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 	untracked = lookup_untracked(dir->untracked, untracked,
 				     dirname + baselen, len - baselen);
+
+	/*
+	 * If this is an excluded directory, then we only need to check if
+	 * the directory contains any files.
+	 */
 	return read_directory_recursive(dir, istate, dirname, len,
-					untracked, 1, pathspec);
+					untracked, 1, exclude, pathspec);
 }
 
 /*
@@ -1633,7 +1638,7 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir,
 		 * with check_only set.
 		 */
 		return read_directory_recursive(dir, istate, path->buf, path->len,
-						cdir->ucd, 1, pathspec);
+						cdir->ucd, 1, 0, pathspec);
 	/*
 	 * We get path_recurse in the first run when
 	 * directory_exists_in_index() returns index_nonexistent. We
@@ -1793,12 +1798,20 @@ static void close_cached_dir(struct cached_dir *cdir)
  * Also, we ignore the name ".git" (even if it is not a directory).
  * That likely will not change.
  *
+ * If 'stop_at_first_file' is specified, 'path_excluded' is returned
+ * to signal that a file was found. This is the least significant value that
+ * indicates that a file was encountered that does not depend on the order of
+ * whether an untracked or exluded path was encountered first.
+ *
  * Returns the most significant path_treatment value encountered in the scan.
+ * If 'stop_at_first_file' is specified, `path_excluded` is the most
+ * significant path_treatment value that will be returned.
  */
+
 static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	struct index_state *istate, const char *base, int baselen,
 	struct untracked_cache_dir *untracked, int check_only,
-	const struct pathspec *pathspec)
+	int stop_at_first_file, const struct pathspec *pathspec)
 {
 	struct cached_dir cdir;
 	enum path_treatment state, subdir_state, dir_state = path_none;
@@ -1832,12 +1845,34 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			subdir_state =
 				read_directory_recursive(dir, istate, path.buf,
 							 path.len, ud,
-							 check_only, pathspec);
+							 check_only, stop_at_first_file, pathspec);
 			if (subdir_state > dir_state)
 				dir_state = subdir_state;
 		}
 
 		if (check_only) {
+			if (stop_at_first_file) {
+				/*
+				 * If stopping at first file, then
+				 * signal that a file was found by
+				 * returning `path_excluded`. This is
+				 * to return a consistent value
+				 * regardless of whether an ignored or
+				 * excluded file happened to be
+				 * encountered 1st.
+				 *
+				 * In current usage, the
+				 * `stop_at_first_file` is passed when
+				 * an ancestor directory has matched
+				 * an exclude pattern, so any found
+				 * files will be excluded.
+				 */
+				if (dir_state >= path_excluded) {
+					dir_state = path_excluded;
+					break;
+				}
+			}
+
 			/* abort early if maximum state has been reached */
 			if (dir_state == path_untracked) {
 				if (cdir.fdir)
@@ -2108,7 +2143,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		 */
 		dir->untracked = NULL;
 	if (!len || treat_leading_path(dir, istate, path, len, pathspec))
-		read_directory_recursive(dir, istate, path, len, untracked, 0, pathspec);
+		read_directory_recursive(dir, istate, path, len, untracked, 0, 0, pathspec);
 	QSORT(dir->entries, dir->nr, cmp_dir_entry);
 	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] Improve performance of git status --ignored
  2017-09-18 17:24   ` Jameson Miller
@ 2017-09-19  3:27     ` Junio C Hamano
  2017-09-19 17:52     ` Brandon Williams
  1 sibling, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2017-09-19  3:27 UTC (permalink / raw)
  To: Jameson Miller; +Cc: bmwill, git, jamill, peff, sxlijin

Jameson Miller <jameson.miller81@gmail.com> writes:

> Improve the performance of the directory listing logic when it wants to list
> non-empty ignored directories. In order to show non-empty ignored directories,
> the existing logic will recursively iterate through all contents of an ignored
> directory. This change introduces the optimization to stop iterating through
> the contents once it finds the first file.

Wow, such an obviously correct optimization.  Very nicely explained, too.

> This can have a significant
> improvement in 'git status --ignored' performance in repositories with a large
> number of files in ignored directories.
>
> For an example of the performance difference on an example repository with
> 196,000 files in 400 ignored directories:
>
> | Command                    |  Time (s) |
> | -------------------------- | --------- |
> | git status                 |   1.2     |
> | git status --ignored (old) |   3.9     |
> | git status --ignored (new) |   1.4     |
>
> Signed-off-by: Jameson Miller <jamill@microsoft.com>
> ---

I wish all the contributions I have to accept are as nicely done as
this one ;-)

Thanks.

>  dir.c | 47 +++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 41 insertions(+), 6 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 1c55dc3..1d17b80 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -49,7 +49,7 @@ struct cached_dir {
>  static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>  	struct index_state *istate, const char *path, int len,
>  	struct untracked_cache_dir *untracked,
> -	int check_only, const struct pathspec *pathspec);
> +	int check_only, int stop_at_first_file, const struct pathspec *pathspec);

We might want to make check_only and stop_at_first_file into a
single "unsigned flags" used as a collection of bits, but we can
wait until we start feeling the urge to add the third boolean
parameter to this function (at which point I'd probably demand a
preliminary clean-up to merge these two into a single flags word
before adding the third one as a new bit in that word).

> @@ -1404,8 +1404,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>  
>  	untracked = lookup_untracked(dir->untracked, untracked,
>  				     dirname + baselen, len - baselen);
> +
> +	/*
> +	 * If this is an excluded directory, then we only need to check if
> +	 * the directory contains any files.
> +	 */
>  	return read_directory_recursive(dir, istate, dirname, len,
> -					untracked, 1, pathspec);
> +					untracked, 1, exclude, pathspec);

Nicely explained in the in-code comment.

I'd assume that you want your microsoft e-mail address used on the
signed-off-by line appear as the author, so I'll tweak this a bit to
make it so (otherwise, your 81@gmail.com would become the author).

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] Improve performance of git status --ignored
  2017-09-18 17:24   ` Jameson Miller
  2017-09-19  3:27     ` Junio C Hamano
@ 2017-09-19 17:52     ` Brandon Williams
  1 sibling, 0 replies; 11+ messages in thread
From: Brandon Williams @ 2017-09-19 17:52 UTC (permalink / raw)
  To: Jameson Miller; +Cc: git, gitster, jamill, peff, sxlijin

On 09/18, Jameson Miller wrote:
> Improve the performance of the directory listing logic when it wants to list
> non-empty ignored directories. In order to show non-empty ignored directories,
> the existing logic will recursively iterate through all contents of an ignored
> directory. This change introduces the optimization to stop iterating through
> the contents once it finds the first file. This can have a significant
> improvement in 'git status --ignored' performance in repositories with a large
> number of files in ignored directories.
> 
> For an example of the performance difference on an example repository with
> 196,000 files in 400 ignored directories:
> 
> | Command                    |  Time (s) |
> | -------------------------- | --------- |
> | git status                 |   1.2     |
> | git status --ignored (old) |   3.9     |
> | git status --ignored (new) |   1.4     |
> 
> Signed-off-by: Jameson Miller <jamill@microsoft.com>

Everything looks good to me.  My only nit (and no need to change it for
this patch) is that the first line of the commit msg usually has the
form:

  <area>: <short summary>

So that its easy to tell which part of the code a commit is changing.

Seriously, great patch.  Thanks!

> ---
>  dir.c | 47 +++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 41 insertions(+), 6 deletions(-)
> 
> diff --git a/dir.c b/dir.c
> index 1c55dc3..1d17b80 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -49,7 +49,7 @@ struct cached_dir {
>  static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>  	struct index_state *istate, const char *path, int len,
>  	struct untracked_cache_dir *untracked,
> -	int check_only, const struct pathspec *pathspec);
> +	int check_only, int stop_at_first_file, const struct pathspec *pathspec);
>  static int get_dtype(struct dirent *de, struct index_state *istate,
>  		     const char *path, int len);
>  
> @@ -1404,8 +1404,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>  
>  	untracked = lookup_untracked(dir->untracked, untracked,
>  				     dirname + baselen, len - baselen);
> +
> +	/*
> +	 * If this is an excluded directory, then we only need to check if
> +	 * the directory contains any files.
> +	 */
>  	return read_directory_recursive(dir, istate, dirname, len,
> -					untracked, 1, pathspec);
> +					untracked, 1, exclude, pathspec);
>  }
>  
>  /*
> @@ -1633,7 +1638,7 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir,
>  		 * with check_only set.
>  		 */
>  		return read_directory_recursive(dir, istate, path->buf, path->len,
> -						cdir->ucd, 1, pathspec);
> +						cdir->ucd, 1, 0, pathspec);
>  	/*
>  	 * We get path_recurse in the first run when
>  	 * directory_exists_in_index() returns index_nonexistent. We
> @@ -1793,12 +1798,20 @@ static void close_cached_dir(struct cached_dir *cdir)
>   * Also, we ignore the name ".git" (even if it is not a directory).
>   * That likely will not change.
>   *
> + * If 'stop_at_first_file' is specified, 'path_excluded' is returned
> + * to signal that a file was found. This is the least significant value that
> + * indicates that a file was encountered that does not depend on the order of
> + * whether an untracked or exluded path was encountered first.
> + *
>   * Returns the most significant path_treatment value encountered in the scan.
> + * If 'stop_at_first_file' is specified, `path_excluded` is the most
> + * significant path_treatment value that will be returned.
>   */
> +
>  static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>  	struct index_state *istate, const char *base, int baselen,
>  	struct untracked_cache_dir *untracked, int check_only,
> -	const struct pathspec *pathspec)
> +	int stop_at_first_file, const struct pathspec *pathspec)
>  {
>  	struct cached_dir cdir;
>  	enum path_treatment state, subdir_state, dir_state = path_none;
> @@ -1832,12 +1845,34 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>  			subdir_state =
>  				read_directory_recursive(dir, istate, path.buf,
>  							 path.len, ud,
> -							 check_only, pathspec);
> +							 check_only, stop_at_first_file, pathspec);
>  			if (subdir_state > dir_state)
>  				dir_state = subdir_state;
>  		}
>  
>  		if (check_only) {
> +			if (stop_at_first_file) {
> +				/*
> +				 * If stopping at first file, then
> +				 * signal that a file was found by
> +				 * returning `path_excluded`. This is
> +				 * to return a consistent value
> +				 * regardless of whether an ignored or
> +				 * excluded file happened to be
> +				 * encountered 1st.
> +				 *
> +				 * In current usage, the
> +				 * `stop_at_first_file` is passed when
> +				 * an ancestor directory has matched
> +				 * an exclude pattern, so any found
> +				 * files will be excluded.
> +				 */
> +				if (dir_state >= path_excluded) {
> +					dir_state = path_excluded;
> +					break;
> +				}
> +			}
> +
>  			/* abort early if maximum state has been reached */
>  			if (dir_state == path_untracked) {
>  				if (cdir.fdir)
> @@ -2108,7 +2143,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>  		 */
>  		dir->untracked = NULL;
>  	if (!len || treat_leading_path(dir, istate, path, len, pathspec))
> -		read_directory_recursive(dir, istate, path, len, untracked, 0, pathspec);
> +		read_directory_recursive(dir, istate, path, len, untracked, 0, 0, pathspec);
>  	QSORT(dir->entries, dir->nr, cmp_dir_entry);
>  	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
>  
> -- 
> 2.7.4
> 

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-09-19 17:52 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-10 18:49 [PATCH v1 0/1] Teach status to show ignored directories Jameson Miller
2017-08-10 18:49 ` [PATCH v1 1/1] dir: teach " Jameson Miller
2017-08-10 20:03   ` Stefan Beller
2017-08-11 17:48     ` Jameson Miller
2017-08-14 21:05       ` Stefan Beller
2017-08-11 17:39   ` Brandon Williams
2017-08-11 18:29     ` Jameson Miller
2017-09-18 17:24 ` [PATCH v2] Improve performance of git status --ignored Jameson Miller
2017-09-18 17:24   ` Jameson Miller
2017-09-19  3:27     ` Junio C Hamano
2017-09-19 17:52     ` Brandon Williams

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).