git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Tao Klerks" <tao@klerks.biz>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Tao Klerks" <tao@klerks.biz>, "Tao Klerks" <tao@klerks.biz>
Subject: [PATCH v6 2/2] untracked-cache: support '--untracked-files=all' if configured
Date: Thu, 31 Mar 2022 16:02:15 +0000	[thread overview]
Message-ID: <0095ec49c5f3a3fc60c8fa0487dbeb04b0e1385d.1648742535.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.985.v6.git.1648742535.gitgitgadget@gmail.com>

From: Tao Klerks <tao@klerks.biz>

Untracked cache was originally designed to only work with
"--untracked-files=normal", and is bypassed when
"--untracked-files=all" is requested, but this causes performance
issues for UI tooling that wants to see "all" on a frequent basis.

On the other hand, the conditions that altogether prevented
applicability to the "all" mode no longer seem to apply, after
several major refactors in recent years; this possibility was
discussed in
81153d02-8e7a-be59-e709-e90cd5906f3a@jeffhostetler.com and
CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com,
and somewhat confirmed experimentally by several users using a
version of this patch to use untracked cache with -uall for about a
year.

When 'git status' runs without using the untracked cache, on a large
repo, on windows, with fsmonitor, it can run very slowly. This can
make GUIs that need to use "-uall" (and therefore currently bypass
untracked cache) unusable when fsmonitor is enabled, on such large
repos.

To partially address this, align the supported directory flags for the
stored untracked cache data with the git config. If a user specifies
an '--untracked-files=' commandline parameter that does not align with
their 'status.showuntrackedfiles' config value, then the untracked
cache will be ignored - as it is for other unsupported situations like
when a pathspec is specified.

If the previously stored flags no longer match the current
configuration, but the currently-applicable flags do match the current
configuration, then discard the previously stored untracked cache
data.

For most users there will be no change in behavior. Users who need
'--untracked-files=all' to perform well will now have the option of
setting "status.showuntrackedfiles" to "all" for better / more
consistent performance.

Users who need '--untracked-files=all' to perform well for their
tooling AND prefer to avoid the verbosity of "all" when running
git status explicitly without options... are out of luck for now (no
change).

Users who have the "status.showuntrackedfiles" config set to "all"
and yet frequently explicitly call
'git status --untracked-files=normal' (and use the untracked cache)
are the only ones who will be disadvantaged by this change. Their
"--untracked-files=normal" calls will, after this change, no longer
use the untracked cache.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
 dir.c                             | 88 +++++++++++++++++++++++++------
 t/t7063-status-untracked-cache.sh | 81 ++++++++++++++++++++++++++++
 2 files changed, 153 insertions(+), 16 deletions(-)

diff --git a/dir.c b/dir.c
index f2b0f242101..26c4d141ab9 100644
--- a/dir.c
+++ b/dir.c
@@ -2747,13 +2747,33 @@ static void set_untracked_ident(struct untracked_cache *uc)
 	strbuf_addch(&uc->ident, 0);
 }
 
-static void new_untracked_cache(struct index_state *istate)
+static unsigned new_untracked_cache_flags(struct index_state *istate)
+{
+	struct repository *repo = istate->repo;
+	char *val;
+
+	/*
+	 * This logic is coordinated with the setting of these flags in
+	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
+	 * of the config setting in commit.c#git_status_config()
+	 */
+	if (!repo_config_get_string(repo, "status.showuntrackedfiles", &val) &&
+	    !strcmp(val, "all"))
+		return 0;
+
+	/*
+	 * The default, if "all" is not set, is "normal" - leading us here.
+	 * If the value is "none" then it really doesn't matter.
+	 */
+	return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+}
+
+static void new_untracked_cache(struct index_state *istate, int flags)
 {
 	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
 	strbuf_init(&uc->ident, 100);
 	uc->exclude_per_dir = ".gitignore";
-	/* should be the same flags used by git-status */
-	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	uc->dir_flags = flags >= 0 ? flags : new_untracked_cache_flags(istate);
 	set_untracked_ident(uc);
 	istate->untracked = uc;
 	istate->cache_changed |= UNTRACKED_CHANGED;
@@ -2762,11 +2782,11 @@ static void new_untracked_cache(struct index_state *istate)
 void add_untracked_cache(struct index_state *istate)
 {
 	if (!istate->untracked) {
-		new_untracked_cache(istate);
+		new_untracked_cache(istate, -1);
 	} else {
 		if (!ident_in_untracked(istate->untracked)) {
 			free_untracked_cache(istate->untracked);
-			new_untracked_cache(istate);
+			new_untracked_cache(istate, -1);
 		}
 	}
 }
@@ -2814,17 +2834,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	if (base_len || (pathspec && pathspec->nr))
 		return NULL;
 
-	/* Different set of flags may produce different results */
-	if (dir->flags != dir->untracked->dir_flags ||
-	    /*
-	     * See treat_directory(), case index_nonexistent. Without
-	     * this flag, we may need to also cache .git file content
-	     * for the resolve_gitlink_ref() call, which we don't.
-	     */
-	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
-	    /* We don't support collecting ignore files */
-	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
-			   DIR_COLLECT_IGNORED)))
+	/* We don't support collecting ignore files */
+	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
+			DIR_COLLECT_IGNORED))
 		return NULL;
 
 	/*
@@ -2847,6 +2859,50 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 		return NULL;
 	}
 
+	/*
+	 * If the untracked structure we received does not have the same flags
+	 * as requested in this run, we're going to need to either discard the
+	 * existing structure (and potentially later recreate), or bypass the
+	 * untracked cache mechanism for this run.
+	 */
+	if (dir->flags != dir->untracked->dir_flags) {
+		/*
+		 * If the untracked structure we received does not have the same flags
+		 * as configured, then we need to reset / create a new "untracked"
+		 * structure to match the new config.
+		 *
+		 * Keeping the saved and used untracked cache consistent with the
+		 * configuration provides an opportunity for frequent users of
+		 * "git status -uall" to leverage the untracked cache by aligning their
+		 * configuration - setting "status.showuntrackedfiles" to "all" or
+		 * "normal" as appropriate.
+		 *
+		 * Previously using -uall (or setting "status.showuntrackedfiles" to
+		 * "all") was incompatible with untracked cache and *consistently*
+		 * caused surprisingly bad performance (with fscache and fsmonitor
+		 * enabled) on Windows.
+		 *
+		 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
+		 * to not be as bound up with the desired output in a given run,
+		 * and instead iterated through and stored enough information to
+		 * correctly serve both "modes", then users could get peak performance
+		 * with or without '-uall' regardless of their
+		 * "status.showuntrackedfiles" config.
+		 */
+		if (dir->untracked->dir_flags != new_untracked_cache_flags(istate)) {
+			free_untracked_cache(istate->untracked);
+			new_untracked_cache(istate, dir->flags);
+			dir->untracked = istate->untracked;
+		}
+		else {
+			/*
+			 * Current untracked cache data is consistent with config, but not
+			 * usable in this request/run; just bypass untracked cache.
+			 */
+			return NULL;
+		}
+	}
+
 	if (!dir->untracked->root) {
 		/* Untracked cache existed but is not initialized; fix that */
 		FLEX_ALLOC_STR(dir->untracked->root, name, "");
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index b89be8dc6d6..9936cc329ec 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -222,6 +222,87 @@ test_expect_success 'untracked cache remains after bypass' '
 	test_cmp ../dump.expect ../actual
 '
 
+test_expect_success 'if -uall is configured, untracked cache gets populated by default' '
+	test_config status.showuntrackedfiles all &&
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
+	test_cmp ../status_uall.expect ../status.iuc &&
+	test_cmp ../status_uall.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
+	cat >../trace.expect <<EOF &&
+ ....path:
+ ....node-creation:3
+ ....gitignore-invalidation:1
+ ....directory-invalidation:0
+ ....opendir:4
+EOF
+	test_cmp ../trace.expect ../trace.relevant
+'
+
+cat >../dump_uall.expect <<EOF &&
+info/exclude $EMPTY_BLOB
+core.excludesfile $ZERO_OID
+exclude_per_dir .gitignore
+flags 00000000
+/ $ZERO_OID recurse valid
+three
+/done/ $ZERO_OID recurse valid
+/dthree/ $ZERO_OID recurse valid
+three
+/dtwo/ $ZERO_OID recurse valid
+two
+EOF
+
+test_expect_success 'if -uall was configured, untracked cache is populated' '
+	test-tool dump-untracked-cache >../actual &&
+	test_cmp ../dump_uall.expect ../actual
+'
+
+test_expect_success 'if -uall is configured, untracked cache is used by default' '
+	test_config status.showuntrackedfiles all &&
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
+	test_cmp ../status_uall.expect ../status.iuc &&
+	test_cmp ../status_uall.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
+	cat >../trace.expect <<EOF &&
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:0
+EOF
+	test_cmp ../trace.expect ../trace.relevant
+'
+
+# Bypassing the untracked cache here is not desirable from an
+# end-user perspective, but is expected in the current design.
+# The untracked cache data stored for a -all run cannot be
+# correctly used in a -unormal run - it would yield incorrect
+# output.
+test_expect_success 'if -uall is configured, untracked cache is bypassed with -unormal' '
+	test_config status.showuntrackedfiles all &&
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+	git status -unormal --porcelain >../actual &&
+	iuc status -unormal --porcelain >../status.iuc &&
+	test_cmp ../status.expect ../status.iuc &&
+	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
+	cat >../trace.expect <<EOF &&
+ ....path:
+EOF
+	test_cmp ../trace.expect ../trace.relevant
+'
+
+test_expect_success 'repopulate untracked cache for -unormal' '
+	git status --porcelain
+'
+
 test_expect_success 'modify in root directory, one dir invalidation' '
 	: >four &&
 	test-tool chmtime =-240 four &&
-- 
gitgitgadget

      parent reply	other threads:[~2022-03-31 16:02 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-23  6:44 [PATCH] Support untracked cache with '--untracked-files=all' if configured Tao Klerks via GitGitGadget
2022-02-25 17:52 ` [PATCH v2] untracked-cache: support " Tao Klerks via GitGitGadget
2022-02-25 19:43   ` Junio C Hamano
2022-02-27 11:21     ` Tao Klerks
2022-02-27 19:54       ` Junio C Hamano
2022-02-27 11:22   ` [PATCH v3] " Tao Klerks via GitGitGadget
2022-02-27 14:39     ` Tao Klerks
2022-02-27 15:13     ` [PATCH v4] " Tao Klerks via GitGitGadget
2022-02-28 14:02       ` Ævar Arnfjörð Bjarmason
2022-03-02  8:47         ` Tao Klerks
2022-03-29 11:25       ` [PATCH v5 0/2] Support untracked cache with " Tao Klerks via GitGitGadget
2022-03-29 11:25         ` [PATCH v5 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall Tao Klerks via GitGitGadget
2022-03-29 16:51           ` Junio C Hamano
2022-03-30  4:46             ` Tao Klerks
2022-03-30 16:39               ` Junio C Hamano
2022-03-31  5:15                 ` Tao Klerks
2022-03-29 11:25         ` [PATCH v5 2/2] untracked-cache: support '--untracked-files=all' if configured Tao Klerks via GitGitGadget
2022-03-29 17:43           ` Junio C Hamano
2022-03-30 19:59             ` Tao Klerks
2022-03-31 16:02         ` [PATCH v6 0/2] Support untracked cache with " Tao Klerks via GitGitGadget
2022-03-31 16:02           ` [PATCH v6 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall Tao Klerks via GitGitGadget
2022-03-31 16:02           ` Tao Klerks via GitGitGadget [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0095ec49c5f3a3fc60c8fa0487dbeb04b0e1385d.1648742535.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=tao@klerks.biz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).