git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] Support untracked cache with '--untracked-files=all' if configured
@ 2021-06-23  6:44 Tao Klerks via GitGitGadget
  2022-02-25 17:52 ` [PATCH v2] untracked-cache: support " Tao Klerks via GitGitGadget
  0 siblings, 1 reply; 22+ messages in thread
From: Tao Klerks via GitGitGadget @ 2021-06-23  6:44 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Tao Klerks

From: Tao Klerks <tao@klerks.biz>

Untracked cache was originally designed to only work with
'-untracked-files=normal', but this is a concern for UI tooling
that wants to see "all" on a frequent basis, and the conditions
that prevented applicability to the "all" mode no longer seem to
apply.

The disqualification of untracked cache is a particular problem on
Windows with the defaulted fscache, where the introduction of
fsmonitor can make the first and costly directory-iteration happen
in untracked file detection, single-threaded, rather than in
preload-index on multiple threads. Improving the performance of a
"normal" 'git status' run with fsmonitor can make
'git status --untracked-files=all' perform much worse.

Here we align the supported directory flags for the stored
untracked cache data with the git config. If a user specifies an
'--untracked-files=' commandline parameter that does not align
with their 'status.showuntrackedfiles' config value, then the
untracked cache will be ignored - as it is for other unsupported
situations like when a pathspec is specified.

If the previously stored flags no longer match the current
configuration, but the currently-applicable flags do match the
current configuration, then the previously stored untracked cache
data is discarded.

For most users there will be no change in behavior. Users who
need '--untracked-files=all' to perform well will have the option
of setting "status.showuntrackedfiles" to "all".

Users who need '--untracked-files=all' to perform well for their
tooling AND prefer to avoid the verbosity of "all" when running
git status explicitly... are out of luck for now.

Users who set "status.showuntrackedfiles" to "all" and yet
most frequently explicitly call
'git status --untracked-files=normal' (and use the untracked
cache) are the only users who will be explicitly disadvantaged
by this change. These should be vanishingly rare, if there are
any at all.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
    Support untracked cache with '--untracked-files=all' if configured
    
    This patch aims to make it possible for users of -uall, either by
    preference or by need (eg UI tooling), to benefit from the untracked
    cache when they set their 'status.showuntrackedfiles' config setting to
    'all'. This is very important for large repos in Windows.
    
    More detail on the change and context in the commit message, I assume
    repeating a verbose message here is discouraged.
    
    These changes result from a couple of conversations,
    81153d02-8e7a-be59-e709-e90cd5906f3a@jeffhostetler.com and
    CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com>.
    
    The test suite passes, but as a n00b I do have questions:
    
     * Is there any additional validatiln that could/should be done to
       confirm that "-uall" untracked data can be cached safely? It looks
       that way to me, both from following the code, discussing briefly with
       Elijah Newren in the thread above, and manual testing, but I'm not
       sure what edge-cases to be looking out for
     * Is it OK to check the repo configuration in the body of processing?
       It seems to be a rare pattern
     * Should I be adding test cases? To the existing t7603 untracked cache
       tests, or elsewhere?

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-985%2FTaoK%2Ftaok-untracked-cache-with-uall-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-985/TaoK/taok-untracked-cache-with-uall-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/985

 dir.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 68 insertions(+), 18 deletions(-)

diff --git a/dir.c b/dir.c
index ebe5ec046e..49fc13c111 100644
--- a/dir.c
+++ b/dir.c
@@ -2668,13 +2668,33 @@ static void set_untracked_ident(struct untracked_cache *uc)
 	strbuf_addch(&uc->ident, 0);
 }
 
-static void new_untracked_cache(struct index_state *istate)
+static unsigned configured_default_dir_flags(struct index_state *istate)
+{
+	/* This logic is coordinated with the setting of these flags in
+	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
+	 * of the config setting in commit.c#git_status_config()
+	 */
+	char *status_untracked_files_config_value;
+	int config_outcome = repo_config_get_string(istate->repo,
+								"status.showuntrackedfiles",
+								&status_untracked_files_config_value);
+	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
+		return 0;
+	} else {
+		/*
+		 * The default, if "all" is not set, is "normal" - leading us here.
+		 * If the value is "none" then it really doesn't matter.
+		 */
+		return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	}
+}
+
+static void new_untracked_cache(struct index_state *istate, unsigned flags)
 {
 	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
 	strbuf_init(&uc->ident, 100);
 	uc->exclude_per_dir = ".gitignore";
-	/* should be the same flags used by git-status */
-	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	uc->dir_flags = flags;
 	set_untracked_ident(uc);
 	istate->untracked = uc;
 	istate->cache_changed |= UNTRACKED_CHANGED;
@@ -2683,11 +2703,13 @@ static void new_untracked_cache(struct index_state *istate)
 void add_untracked_cache(struct index_state *istate)
 {
 	if (!istate->untracked) {
-		new_untracked_cache(istate);
+		new_untracked_cache(istate,
+				configured_default_dir_flags(istate));
 	} else {
 		if (!ident_in_untracked(istate->untracked)) {
 			free_untracked_cache(istate->untracked);
-			new_untracked_cache(istate);
+			new_untracked_cache(istate,
+					configured_default_dir_flags(istate));
 		}
 	}
 }
@@ -2703,10 +2725,12 @@ void remove_untracked_cache(struct index_state *istate)
 
 static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
 						      int base_len,
-						      const struct pathspec *pathspec)
+						      const struct pathspec *pathspec,
+							  struct index_state *istate)
 {
 	struct untracked_cache_dir *root;
 	static int untracked_cache_disabled = -1;
+	unsigned configured_dir_flags;
 
 	if (!dir->untracked)
 		return NULL;
@@ -2734,17 +2758,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	if (base_len || (pathspec && pathspec->nr))
 		return NULL;
 
-	/* Different set of flags may produce different results */
-	if (dir->flags != dir->untracked->dir_flags ||
-	    /*
-	     * See treat_directory(), case index_nonexistent. Without
-	     * this flag, we may need to also cache .git file content
-	     * for the resolve_gitlink_ref() call, which we don't.
-	     */
-	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
-	    /* We don't support collecting ignore files */
-	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
-			   DIR_COLLECT_IGNORED)))
+	/* We don't support collecting ignore files */
+	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
+			DIR_COLLECT_IGNORED))
 		return NULL;
 
 	/*
@@ -2767,6 +2783,40 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 		return NULL;
 	}
 
+	/* We don't support using or preparing the untracked cache if
+	 * the current effective flags don't match the configured
+	 * flags.
+	 */
+	configured_dir_flags = configured_default_dir_flags(istate);
+	if (dir->flags != configured_dir_flags)
+		return NULL;
+
+	/* If the untracked structure we received does not have the same flags
+	 * as configured, but the configured flags do match the effective flags,
+	 * then we need to reset / create a new "untracked" structure to match
+	 * the new config.
+	 * Keeping the saved and used untracked cache in-line with the
+	 * configuration provides an opportunity for frequent users of
+	 * "git status -uall" to leverage the untracked cache by aligning their
+	 * configuration (setting "status.showuntrackedfiles" to "all" or
+	 * "normal" as appropriate), where previously this option was
+	 * incompatible with untracked cache and *consistently* caused
+	 * surprisingly bad performance (with fscache and fsmonitor enabled) on
+	 * Windows.
+	 *
+	 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
+	 * to not be as bound up with the desired output in a given run,
+	 * and instead iterated through and stored enough information to
+	 * correctly serve both "modes", then users could get peak performance
+	 * with or without '-uall' regardless of their
+	 * "status.showuntrackedfiles" config.
+	 */
+	if (dir->flags != dir->untracked->dir_flags) {
+		free_untracked_cache(istate->untracked);
+		new_untracked_cache(istate, configured_dir_flags);
+		dir->untracked = istate->untracked;
+	}
+
 	if (!dir->untracked->root)
 		FLEX_ALLOC_STR(dir->untracked->root, name, "");
 
@@ -2838,7 +2888,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		return dir->nr;
 	}
 
-	untracked = validate_untracked_cache(dir, len, pathspec);
+	untracked = validate_untracked_cache(dir, len, pathspec, istate);
 	if (!untracked)
 		/*
 		 * make sure untracked cache code path is disabled,

base-commit: 670b81a890388c60b7032a4f5b879f2ece8c4558
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2] untracked-cache: support '--untracked-files=all' if configured
  2021-06-23  6:44 [PATCH] Support untracked cache with '--untracked-files=all' if configured Tao Klerks via GitGitGadget
@ 2022-02-25 17:52 ` Tao Klerks via GitGitGadget
  2022-02-25 19:43   ` Junio C Hamano
  2022-02-27 11:22   ` [PATCH v3] " Tao Klerks via GitGitGadget
  0 siblings, 2 replies; 22+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-02-25 17:52 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Tao Klerks

From: Tao Klerks <tao@klerks.biz>

Untracked cache was originally designed to only work with
'-untracked-files=normal', but this is a concern for UI tooling that
wants to see "all" on a frequent basis, and the conditions that
prevented applicability to the "all" mode no longer seem to apply.

The disqualification of untracked cache is a particular problem on
Windows with the defaulted fscache, where the introduction of
fsmonitor can make the first and costly directory-iteration happen
in untracked file detection, single-threaded, rather than in
preload-index on multiple threads. Improving the performance of a
"normal" 'git status' run with fsmonitor can make
'git status --untracked-files=all' perform much worse.

In this change, align the supported directory flags for the stored
untracked cache data with the git config. If a user specifies an
'--untracked-files=' commandline parameter that does not align with
their 'status.showuntrackedfiles' config value, then the untracked
cache will be ignored - as it is for other unsupported situations like
when a pathspec is specified.

If the previously stored flags no longer match the current
configuration, but the currently-applicable flags do match the current
configuration, then the previously stored untracked cache data is
discarded.

For most users there will be no change in behavior. Users who need
'--untracked-files=all' to perform well will have the option of
setting "status.showuntrackedfiles" to "all".

Users who need '--untracked-files=all' to perform well for their
tooling AND prefer to avoid the verbosity of "all" when running
git status explicitly without options... are out of luck for now (no
change).

Users who set "status.showuntrackedfiles" to "all" and yet most
frequently explicitly call 'git status --untracked-files=normal' (and
use the untracked cache) are the only ones who would be disadvantaged
by this change. It seems unlikely there are any such users.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
    Support untracked cache with '--untracked-files=all' if configured
    
    Resending this patch from a few months ago (with better standards
    compliance) - it hasn't seen any comments yet, I would dearly love some
    eyes on this as the change can make a big difference to hundreds of
    windows users in my environment (and I'd really rather not start
    distributing customized git builds!)
    
    This patch aims to make it possible for users of the -uall flag to git
    status, either by preference or by need (eg UI tooling), to benefit from
    the untracked cache when they set their 'status.showuntrackedfiles'
    config setting to 'all'. This is very important for large repos in
    Windows.
    
    More detail on the change and context in the commit message, I assume
    repeating a verbose message here is discouraged.
    
    These changes result from a couple of conversations,
    81153d02-8e7a-be59-e709-e90cd5906f3a@jeffhostetler.com and
    CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com>.
    
    The test suite passes, but as a n00b I do have questions:
    
     * Is there any additional validation that could/should be done to
       confirm that "-uall" untracked data can be cached safely?
    
    ** It looks safe from following the code ** It seems from discussing
    briefly with Elijah Newren in the thread above that thare are no obvious
    red flags ** Manual testing, explicitly and implicitly through months of
    use, yields no issues
    
     * Is it OK to check the repo configuration in the body of processing?
       It seems to be a rare pattern
     * Can anyone think of a way to test this change?

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-985%2FTaoK%2Ftaok-untracked-cache-with-uall-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-985/TaoK/taok-untracked-cache-with-uall-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/985

Range-diff vs v1:

 1:  2797efad9a4 ! 1:  e2f1ad26c78 Support untracked cache with '--untracked-files=all' if configured
     @@ Metadata
      Author: Tao Klerks <tao@klerks.biz>
      
       ## Commit message ##
     -    Support untracked cache with '--untracked-files=all' if configured
     +    untracked-cache: support '--untracked-files=all' if configured
      
          Untracked cache was originally designed to only work with
     -    '-untracked-files=normal', but this is a concern for UI tooling
     -    that wants to see "all" on a frequent basis, and the conditions
     -    that prevented applicability to the "all" mode no longer seem to
     -    apply.
     +    '-untracked-files=normal', but this is a concern for UI tooling that
     +    wants to see "all" on a frequent basis, and the conditions that
     +    prevented applicability to the "all" mode no longer seem to apply.
      
          The disqualification of untracked cache is a particular problem on
          Windows with the defaulted fscache, where the introduction of
     @@ Commit message
          "normal" 'git status' run with fsmonitor can make
          'git status --untracked-files=all' perform much worse.
      
     -    Here we align the supported directory flags for the stored
     +    In this change, align the supported directory flags for the stored
          untracked cache data with the git config. If a user specifies an
     -    '--untracked-files=' commandline parameter that does not align
     -    with their 'status.showuntrackedfiles' config value, then the
     -    untracked cache will be ignored - as it is for other unsupported
     -    situations like when a pathspec is specified.
     +    '--untracked-files=' commandline parameter that does not align with
     +    their 'status.showuntrackedfiles' config value, then the untracked
     +    cache will be ignored - as it is for other unsupported situations like
     +    when a pathspec is specified.
      
          If the previously stored flags no longer match the current
     -    configuration, but the currently-applicable flags do match the
     -    current configuration, then the previously stored untracked cache
     -    data is discarded.
     +    configuration, but the currently-applicable flags do match the current
     +    configuration, then the previously stored untracked cache data is
     +    discarded.
      
     -    For most users there will be no change in behavior. Users who
     -    need '--untracked-files=all' to perform well will have the option
     -    of setting "status.showuntrackedfiles" to "all".
     +    For most users there will be no change in behavior. Users who need
     +    '--untracked-files=all' to perform well will have the option of
     +    setting "status.showuntrackedfiles" to "all".
      
          Users who need '--untracked-files=all' to perform well for their
          tooling AND prefer to avoid the verbosity of "all" when running
     -    git status explicitly... are out of luck for now.
     +    git status explicitly without options... are out of luck for now (no
     +    change).
      
     -    Users who set "status.showuntrackedfiles" to "all" and yet
     -    most frequently explicitly call
     -    'git status --untracked-files=normal' (and use the untracked
     -    cache) are the only users who will be explicitly disadvantaged
     -    by this change. These should be vanishingly rare, if there are
     -    any at all.
     +    Users who set "status.showuntrackedfiles" to "all" and yet most
     +    frequently explicitly call 'git status --untracked-files=normal' (and
     +    use the untracked cache) are the only ones who would be disadvantaged
     +    by this change. It seems unlikely there are any such users.
      
          Signed-off-by: Tao Klerks <tao@klerks.biz>
      


 dir.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 68 insertions(+), 18 deletions(-)

diff --git a/dir.c b/dir.c
index d91295f2bcd..e35331d3f71 100644
--- a/dir.c
+++ b/dir.c
@@ -2746,13 +2746,33 @@ static void set_untracked_ident(struct untracked_cache *uc)
 	strbuf_addch(&uc->ident, 0);
 }
 
-static void new_untracked_cache(struct index_state *istate)
+static unsigned configured_default_dir_flags(struct index_state *istate)
+{
+	/* This logic is coordinated with the setting of these flags in
+	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
+	 * of the config setting in commit.c#git_status_config()
+	 */
+	char *status_untracked_files_config_value;
+	int config_outcome = repo_config_get_string(istate->repo,
+								"status.showuntrackedfiles",
+								&status_untracked_files_config_value);
+	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
+		return 0;
+	} else {
+		/*
+		 * The default, if "all" is not set, is "normal" - leading us here.
+		 * If the value is "none" then it really doesn't matter.
+		 */
+		return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	}
+}
+
+static void new_untracked_cache(struct index_state *istate, unsigned flags)
 {
 	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
 	strbuf_init(&uc->ident, 100);
 	uc->exclude_per_dir = ".gitignore";
-	/* should be the same flags used by git-status */
-	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	uc->dir_flags = flags;
 	set_untracked_ident(uc);
 	istate->untracked = uc;
 	istate->cache_changed |= UNTRACKED_CHANGED;
@@ -2761,11 +2781,13 @@ static void new_untracked_cache(struct index_state *istate)
 void add_untracked_cache(struct index_state *istate)
 {
 	if (!istate->untracked) {
-		new_untracked_cache(istate);
+		new_untracked_cache(istate,
+				configured_default_dir_flags(istate));
 	} else {
 		if (!ident_in_untracked(istate->untracked)) {
 			free_untracked_cache(istate->untracked);
-			new_untracked_cache(istate);
+			new_untracked_cache(istate,
+					configured_default_dir_flags(istate));
 		}
 	}
 }
@@ -2781,10 +2803,12 @@ void remove_untracked_cache(struct index_state *istate)
 
 static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
 						      int base_len,
-						      const struct pathspec *pathspec)
+						      const struct pathspec *pathspec,
+							  struct index_state *istate)
 {
 	struct untracked_cache_dir *root;
 	static int untracked_cache_disabled = -1;
+	unsigned configured_dir_flags;
 
 	if (!dir->untracked)
 		return NULL;
@@ -2812,17 +2836,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	if (base_len || (pathspec && pathspec->nr))
 		return NULL;
 
-	/* Different set of flags may produce different results */
-	if (dir->flags != dir->untracked->dir_flags ||
-	    /*
-	     * See treat_directory(), case index_nonexistent. Without
-	     * this flag, we may need to also cache .git file content
-	     * for the resolve_gitlink_ref() call, which we don't.
-	     */
-	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
-	    /* We don't support collecting ignore files */
-	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
-			   DIR_COLLECT_IGNORED)))
+	/* We don't support collecting ignore files */
+	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
+			DIR_COLLECT_IGNORED))
 		return NULL;
 
 	/*
@@ -2845,6 +2861,40 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 		return NULL;
 	}
 
+	/* We don't support using or preparing the untracked cache if
+	 * the current effective flags don't match the configured
+	 * flags.
+	 */
+	configured_dir_flags = configured_default_dir_flags(istate);
+	if (dir->flags != configured_dir_flags)
+		return NULL;
+
+	/* If the untracked structure we received does not have the same flags
+	 * as configured, but the configured flags do match the effective flags,
+	 * then we need to reset / create a new "untracked" structure to match
+	 * the new config.
+	 * Keeping the saved and used untracked cache in-line with the
+	 * configuration provides an opportunity for frequent users of
+	 * "git status -uall" to leverage the untracked cache by aligning their
+	 * configuration (setting "status.showuntrackedfiles" to "all" or
+	 * "normal" as appropriate), where previously this option was
+	 * incompatible with untracked cache and *consistently* caused
+	 * surprisingly bad performance (with fscache and fsmonitor enabled) on
+	 * Windows.
+	 *
+	 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
+	 * to not be as bound up with the desired output in a given run,
+	 * and instead iterated through and stored enough information to
+	 * correctly serve both "modes", then users could get peak performance
+	 * with or without '-uall' regardless of their
+	 * "status.showuntrackedfiles" config.
+	 */
+	if (dir->flags != dir->untracked->dir_flags) {
+		free_untracked_cache(istate->untracked);
+		new_untracked_cache(istate, configured_dir_flags);
+		dir->untracked = istate->untracked;
+	}
+
 	if (!dir->untracked->root)
 		FLEX_ALLOC_STR(dir->untracked->root, name, "");
 
@@ -2916,7 +2966,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		return dir->nr;
 	}
 
-	untracked = validate_untracked_cache(dir, len, pathspec);
+	untracked = validate_untracked_cache(dir, len, pathspec, istate);
 	if (!untracked)
 		/*
 		 * make sure untracked cache code path is disabled,

base-commit: dab1b7905d0b295f1acef9785bb2b9cbb0fdec84
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] untracked-cache: support '--untracked-files=all' if configured
  2022-02-25 17:52 ` [PATCH v2] untracked-cache: support " Tao Klerks via GitGitGadget
@ 2022-02-25 19:43   ` Junio C Hamano
  2022-02-27 11:21     ` Tao Klerks
  2022-02-27 11:22   ` [PATCH v3] " Tao Klerks via GitGitGadget
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2022-02-25 19:43 UTC (permalink / raw)
  To: Tao Klerks via GitGitGadget; +Cc: git, Tao Klerks

"Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:

> If the previously stored flags no longer match the current
> configuration, but the currently-applicable flags do match the current
> configuration, then the previously stored untracked cache data is
> discarded.
>
> For most users there will be no change in behavior. Users who need
> '--untracked-files=all' to perform well will have the option of
> setting "status.showuntrackedfiles" to "all".
>
> Users who need '--untracked-files=all' to perform well for their
> tooling AND prefer to avoid the verbosity of "all" when running
> git status explicitly without options... are out of luck for now (no
> change).

So, in short, the root of the problem is that untracked cache can be
used to serve only one mode (between normal and all) of operation,
and the real solution to that problem must come in a different form,
i.e. allowing a single unified untracked cache to be usable in both
modes, perhaps by maintaining all the untracked ones in the cache,
but filter out upon output when the 'normal' mode is asked for?

> Users who set "status.showuntrackedfiles" to "all" and yet most
> frequently explicitly call 'git status --untracked-files=normal' (and
> use the untracked cache) are the only ones who would be disadvantaged
> by this change. It seems unlikely there are any such users.

Given how widely used we are these days, I am afraid that the days
we can safely say "users with such a strange use pattern would not
exist at all" is long gone.

> +static unsigned configured_default_dir_flags(struct index_state *istate)
> +{
> +	/* This logic is coordinated with the setting of these flags in
> +	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
> +	 * of the config setting in commit.c#git_status_config()
> +	 */

Good thing to note here.

Style.

  /*
   * Our multi-line comments reads more like this, with
   * slash-asterisk that opens, and asterisk-slash that closes,
   * sitting on their own lines.
   */

> +	char *status_untracked_files_config_value;
> +	int config_outcome = repo_config_get_string(istate->repo,
> +								"status.showuntrackedfiles",

The indentation looks a bit off.

In this project, tab width is 8.  The beginning of the second
parameter to the function call on the second line should align with
the beginning of the first parameter that immediately follows the
open parenthesis '('.

> +								&status_untracked_files_config_value);
> +	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
> +		return 0;
> +	} else {
> +		/*
> +		 * The default, if "all" is not set, is "normal" - leading us here.
> +		 * If the value is "none" then it really doesn't matter.
> +		 */
> +		return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> +	}
> +}

I didn't see the need to pass istate to this function, though.
Shouldn't it take a repository instead?

> -static void new_untracked_cache(struct index_state *istate)
> +static void new_untracked_cache(struct index_state *istate, unsigned flags)
>  {
>  	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
>  	strbuf_init(&uc->ident, 100);
>  	uc->exclude_per_dir = ".gitignore";
> -	/* should be the same flags used by git-status */
> -	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> +	uc->dir_flags = flags;

So we used to hardcode these two flags to match what is done in
wt-status.c when show_untracked_files != SHOW_ALL_UNTRACKEDFILES;
we allow a different set of flags to be given by the caller.

> @@ -2761,11 +2781,13 @@ static void new_untracked_cache(struct index_state *istate)
>  void add_untracked_cache(struct index_state *istate)
>  {
>  	if (!istate->untracked) {
> -		new_untracked_cache(istate);
> +		new_untracked_cache(istate,
> +				configured_default_dir_flags(istate));
>  	} else {
>  		if (!ident_in_untracked(istate->untracked)) {
>  			free_untracked_cache(istate->untracked);
> -			new_untracked_cache(istate);
> +			new_untracked_cache(istate,
> +					configured_default_dir_flags(istate));
>  		}
>  	}
>  }

OK.  That's quite straight-forward to see how the tweak is made.

> @@ -2781,10 +2803,12 @@ void remove_untracked_cache(struct index_state *istate)
>  
>  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
>  						      int base_len,
> -						      const struct pathspec *pathspec)
> +						      const struct pathspec *pathspec,
> +							  struct index_state *istate)
>  {
>  	struct untracked_cache_dir *root;
>  	static int untracked_cache_disabled = -1;
> +	unsigned configured_dir_flags;
>  
>  	if (!dir->untracked)
>  		return NULL;
> @@ -2812,17 +2836,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>  	if (base_len || (pathspec && pathspec->nr))
>  		return NULL;
>  
> -	/* Different set of flags may produce different results */
> -	if (dir->flags != dir->untracked->dir_flags ||

This is removed because we are making sure that dir->flags and
dir->untracked->dir_flags match?

> -	    /*
> -	     * See treat_directory(), case index_nonexistent. Without
> -	     * this flag, we may need to also cache .git file content
> -	     * for the resolve_gitlink_ref() call, which we don't.
> -	     */
> -	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||

This is removed because...?

> -	    /* We don't support collecting ignore files */
> -	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> -			   DIR_COLLECT_IGNORED)))

> +	/* We don't support collecting ignore files */
> +	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> +			DIR_COLLECT_IGNORED))
>  		return NULL;
>  
>  	/*
> @@ -2845,6 +2861,40 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>  		return NULL;
>  	}
>  
> +	/* We don't support using or preparing the untracked cache if
> +	 * the current effective flags don't match the configured
> +	 * flags.
> +	 */

Style (another one in large comment below).

> +	configured_dir_flags = configured_default_dir_flags(istate);
> +	if (dir->flags != configured_dir_flags)
> +		return NULL;

Hmph.  If this weren't necessary, this function does not need to
call configured_default_dir_flags(), and it can lose the
configured_dir_flags variable, too.  Which means that
new_untracked_cache() function does not need to take the flags word
as a caller-supplied parameter.  Instead, it can make a call to
configured_dir_flags() and assign the result to uc->dir_flags
itself, which would have been much nicer.

> +	/* If the untracked structure we received does not have the same flags
> +	 * as configured, but the configured flags do match the effective flags,
> +	 * then we need to reset / create a new "untracked" structure to match
> +	 * the new config.
> +	 * Keeping the saved and used untracked cache in-line with the
> +	 * configuration provides an opportunity for frequent users of
> +	 * "git status -uall" to leverage the untracked cache by aligning their
> +	 * configuration (setting "status.showuntrackedfiles" to "all" or
> +	 * "normal" as appropriate), where previously this option was
> +	 * incompatible with untracked cache and *consistently* caused
> +	 * surprisingly bad performance (with fscache and fsmonitor enabled) on
> +	 * Windows.
> +	 *
> +	 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
> +	 * to not be as bound up with the desired output in a given run,
> +	 * and instead iterated through and stored enough information to
> +	 * correctly serve both "modes", then users could get peak performance
> +	 * with or without '-uall' regardless of their
> +	 * "status.showuntrackedfiles" config.
> +	 */
> +	if (dir->flags != dir->untracked->dir_flags) {
> +		free_untracked_cache(istate->untracked);
> +		new_untracked_cache(istate, configured_dir_flags);
> +		dir->untracked = istate->untracked;
> +	}


This compensates what we lost in the validate_untracked_cache()
above by making them match.  Looking reasonable.

>  	if (!dir->untracked->root)
>  		FLEX_ALLOC_STR(dir->untracked->root, name, "");
>  
> @@ -2916,7 +2966,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>  		return dir->nr;
>  	}
>  
> -	untracked = validate_untracked_cache(dir, len, pathspec);
> +	untracked = validate_untracked_cache(dir, len, pathspec, istate);
>  	if (!untracked)
>  		/*
>  		 * make sure untracked cache code path is disabled,
>
> base-commit: dab1b7905d0b295f1acef9785bb2b9cbb0fdec84

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] untracked-cache: support '--untracked-files=all' if configured
  2022-02-25 19:43   ` Junio C Hamano
@ 2022-02-27 11:21     ` Tao Klerks
  2022-02-27 19:54       ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Tao Klerks @ 2022-02-27 11:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Tao Klerks via GitGitGadget, git

On Fri, Feb 25, 2022 at 8:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > If the previously stored flags no longer match the current
> > configuration, but the currently-applicable flags do match the current
> > configuration, then the previously stored untracked cache data is
> > discarded.
> >
> > For most users there will be no change in behavior. Users who need
> > '--untracked-files=all' to perform well will have the option of
> > setting "status.showuntrackedfiles" to "all".
> >
> > Users who need '--untracked-files=all' to perform well for their
> > tooling AND prefer to avoid the verbosity of "all" when running
> > git status explicitly without options... are out of luck for now (no
> > change).
>
> So, in short, the root of the problem is that untracked cache can be
> used to serve only one mode (between normal and all) of operation,
> and the real solution to that problem must come in a different form,
> i.e. allowing a single unified untracked cache to be usable in both
> modes, perhaps by maintaining all the untracked ones in the cache,
> but filter out upon output when the 'normal' mode is asked for?

I wouldn't call this the root of the problem I was trying to solve with this
patch, but rather the root of the remaining problem, yes.

The challenge that I can't get my head around for this longer-term
approach or objective, is *when* the untracked-folder nested files
should be enumerated.

Currently, untracked folders are only recursed into if -uall is
specified or configured - and that's a significant characteristic:
It's perfectly plausible that some users sometimes have huge
deep untracked folder hierarchies that take a long time to explore,
but git never needs to because they never specify -uall.

If we decided to serve both modes with one single untracked
cache structure, then we would either need to always fully
recurse, or implement some sort of "just-in-time" filling in of the
recursive bits when someone specifies -uall for the first time.

Either way, I'm pretty sure it's beyond me to do that right. Hence
this very-pragmatic approach that makes it *possible* to get
good/normal performance with -uall.

>
> > Users who set "status.showuntrackedfiles" to "all" and yet most
> > frequently explicitly call 'git status --untracked-files=normal' (and
> > use the untracked cache) are the only ones who would be disadvantaged
> > by this change. It seems unlikely there are any such users.
>
> Given how widely used we are these days, I am afraid that the days
> we can safely say "users with such a strange use pattern would not
> exist at all" is long gone.
>
> > +static unsigned configured_default_dir_flags(struct index_state *istate)
> > +{
> > +     /* This logic is coordinated with the setting of these flags in
> > +      * wt-status.c#wt_status_collect_untracked(), and the evaluation
> > +      * of the config setting in commit.c#git_status_config()
> > +      */
>
> Good thing to note here.
>
> Style.
>
>   /*
>    * Our multi-line comments reads more like this, with
>    * slash-asterisk that opens, and asterisk-slash that closes,
>    * sitting on their own lines.
>    */

Thanks, sorry, I thought I'd corrected them all but clearly missed some.

>
> > +     char *status_untracked_files_config_value;
> > +     int config_outcome = repo_config_get_string(istate->repo,
> > +                                                             "status.showuntrackedfiles",
>
> The indentation looks a bit off.
>
> In this project, tab width is 8.  The beginning of the second
> parameter to the function call on the second line should align with
> the beginning of the first parameter that immediately follows the
> open parenthesis '('.
>

Fixed, thx.

> > +                                                             &status_untracked_files_config_value);
> > +     if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
> > +             return 0;
> > +     } else {
> > +             /*
> > +              * The default, if "all" is not set, is "normal" - leading us here.
> > +              * If the value is "none" then it really doesn't matter.
> > +              */
> > +             return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> > +     }
> > +}
>
> I didn't see the need to pass istate to this function, though.
> Shouldn't it take a repository instead?

Makes sense, fixed, thx.

>
> > -static void new_untracked_cache(struct index_state *istate)
> > +static void new_untracked_cache(struct index_state *istate, unsigned flags)
> >  {
> >       struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
> >       strbuf_init(&uc->ident, 100);
> >       uc->exclude_per_dir = ".gitignore";
> > -     /* should be the same flags used by git-status */
> > -     uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> > +     uc->dir_flags = flags;
>
> So we used to hardcode these two flags to match what is done in
> wt-status.c when show_untracked_files != SHOW_ALL_UNTRACKEDFILES;
> we allow a different set of flags to be given by the caller.
>
> > @@ -2761,11 +2781,13 @@ static void new_untracked_cache(struct index_state *istate)
> >  void add_untracked_cache(struct index_state *istate)
> >  {
> >       if (!istate->untracked) {
> > -             new_untracked_cache(istate);
> > +             new_untracked_cache(istate,
> > +                             configured_default_dir_flags(istate));
> >       } else {
> >               if (!ident_in_untracked(istate->untracked)) {
> >                       free_untracked_cache(istate->untracked);
> > -                     new_untracked_cache(istate);
> > +                     new_untracked_cache(istate,
> > +                                     configured_default_dir_flags(istate));
> >               }
> >       }
> >  }
>
> OK.  That's quite straight-forward to see how the tweak is made.
>
> > @@ -2781,10 +2803,12 @@ void remove_untracked_cache(struct index_state *istate)
> >
> >  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
> >                                                     int base_len,
> > -                                                   const struct pathspec *pathspec)
> > +                                                   const struct pathspec *pathspec,
> > +                                                       struct index_state *istate)
> >  {
> >       struct untracked_cache_dir *root;
> >       static int untracked_cache_disabled = -1;
> > +     unsigned configured_dir_flags;
> >
> >       if (!dir->untracked)
> >               return NULL;
> > @@ -2812,17 +2836,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
> >       if (base_len || (pathspec && pathspec->nr))
> >               return NULL;
> >
> > -     /* Different set of flags may produce different results */
> > -     if (dir->flags != dir->untracked->dir_flags ||
>
> This is removed because we are making sure that dir->flags and
> dir->untracked->dir_flags match?
>
> > -         /*
> > -          * See treat_directory(), case index_nonexistent. Without
> > -          * this flag, we may need to also cache .git file content
> > -          * for the resolve_gitlink_ref() call, which we don't.
> > -          */
> > -         !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
>
> This is removed because...?

Because we *do* now support using untracked cache with -uall...
As long as the config matches the runtime flags (new check later).

>
> > -         /* We don't support collecting ignore files */
> > -         (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> > -                        DIR_COLLECT_IGNORED)))
>
> > +     /* We don't support collecting ignore files */
> > +     if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> > +                     DIR_COLLECT_IGNORED))
> >               return NULL;
> >
> >       /*
> > @@ -2845,6 +2861,40 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
> >               return NULL;
> >       }
> >
> > +     /* We don't support using or preparing the untracked cache if
> > +      * the current effective flags don't match the configured
> > +      * flags.
> > +      */
>
> Style (another one in large comment below).
>

Thx.

> > +     configured_dir_flags = configured_default_dir_flags(istate);
> > +     if (dir->flags != configured_dir_flags)
> > +             return NULL;
>
> Hmph.  If this weren't necessary, this function does not need to
> call configured_default_dir_flags(), and it can lose the
> configured_dir_flags variable, too.  Which means that
> new_untracked_cache() function does not need to take the flags word
> as a caller-supplied parameter.  Instead, it can make a call to
> configured_dir_flags() and assign the result to uc->dir_flags
> itself, which would have been much nicer.

I've tightened this up a little with an inline call to
configured_default_dir_flags(), getting rid of the variable, let's see
if that makes more sense / is cleaner.

Sending new version with these changes.

>
> > +     /* If the untracked structure we received does not have the same flags
> > +      * as configured, but the configured flags do match the effective flags,
> > +      * then we need to reset / create a new "untracked" structure to match
> > +      * the new config.
> > +      * Keeping the saved and used untracked cache in-line with the
> > +      * configuration provides an opportunity for frequent users of
> > +      * "git status -uall" to leverage the untracked cache by aligning their
> > +      * configuration (setting "status.showuntrackedfiles" to "all" or
> > +      * "normal" as appropriate), where previously this option was
> > +      * incompatible with untracked cache and *consistently* caused
> > +      * surprisingly bad performance (with fscache and fsmonitor enabled) on
> > +      * Windows.
> > +      *
> > +      * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
> > +      * to not be as bound up with the desired output in a given run,
> > +      * and instead iterated through and stored enough information to
> > +      * correctly serve both "modes", then users could get peak performance
> > +      * with or without '-uall' regardless of their
> > +      * "status.showuntrackedfiles" config.
> > +      */
> > +     if (dir->flags != dir->untracked->dir_flags) {
> > +             free_untracked_cache(istate->untracked);
> > +             new_untracked_cache(istate, configured_dir_flags);
> > +             dir->untracked = istate->untracked;
> > +     }
>
>
> This compensates what we lost in the validate_untracked_cache()
> above by making them match.  Looking reasonable.
>
> >       if (!dir->untracked->root)
> >               FLEX_ALLOC_STR(dir->untracked->root, name, "");
> >
> > @@ -2916,7 +2966,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
> >               return dir->nr;
> >       }
> >
> > -     untracked = validate_untracked_cache(dir, len, pathspec);
> > +     untracked = validate_untracked_cache(dir, len, pathspec, istate);
> >       if (!untracked)
> >               /*
> >                * make sure untracked cache code path is disabled,
> >
> > base-commit: dab1b7905d0b295f1acef9785bb2b9cbb0fdec84

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3] untracked-cache: support '--untracked-files=all' if configured
  2022-02-25 17:52 ` [PATCH v2] untracked-cache: support " Tao Klerks via GitGitGadget
  2022-02-25 19:43   ` Junio C Hamano
@ 2022-02-27 11:22   ` Tao Klerks via GitGitGadget
  2022-02-27 14:39     ` Tao Klerks
  2022-02-27 15:13     ` [PATCH v4] " Tao Klerks via GitGitGadget
  1 sibling, 2 replies; 22+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-02-27 11:22 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Tao Klerks

From: Tao Klerks <tao@klerks.biz>

Untracked cache was originally designed to only work with
'-untracked-files=normal', but this causes performance issues for UI
tooling that wants to see "all" on a frequent basis. On the other hand,
the conditions that prevented applicability to the "all" mode no
longer seem to apply.

The disqualification of untracked cache has a particularly significant
impact on Windows with the defaulted fscache, where the introduction
of fsmonitor can make the first and costly directory-iteration happen
in untracked file detection, single-threaded, rather than in
preload-index on multiple threads. Improving the performance of a
"normal" 'git status' run with fsmonitor can make
'git status --untracked-files=all' perform much worse.

To partially address this, align the supported directory flags for the
stored untracked cache data with the git config. If a user specifies
an '--untracked-files=' commandline parameter that does not align with
their 'status.showuntrackedfiles' config value, then the untracked
cache will be ignored - as it is for other unsupported situations like
when a pathspec is specified.

If the previously stored flags no longer match the current
configuration, but the currently-applicable flags do match the current
configuration, then discard the previously stored untracked cache
data.

For most users there will be no change in behavior. Users who need
'--untracked-files=all' to perform well will now have the option of
setting "status.showuntrackedfiles" to "all" for better / more
consistent performance.

Users who need '--untracked-files=all' to perform well for their
tooling AND prefer to avoid the verbosity of "all" when running
git status explicitly without options... are out of luck for now (no
change).

Users who have the "status.showuntrackedfiles" config set to "all"
and yet frequently explicitly call
'git status --untracked-files=normal' (and use the untracked cache)
are the only ones who will be disadvantaged by this change. Their
"--untracked-files=normal" calls will, after this change, no longer
use the untracked cache.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
    Support untracked cache with '--untracked-files=all' if configured
    
    Resending this patch from a few months ago (with better standards
    compliance) - it hasn't seen any comments yet, I would dearly love some
    eyes on this as the change can make a big difference to hundreds of
    windows users in my environment (and I'd really rather not start
    distributing customized git builds!)
    
    This patch aims to make it possible for users of the -uall flag to git
    status, either by preference or by need (eg UI tooling), to benefit from
    the untracked cache when they set their 'status.showuntrackedfiles'
    config setting to 'all'. This is very important for large repos in
    Windows.
    
    More detail on the change and context in the commit message, I assume
    repeating a verbose message here is discouraged.
    
    These changes result from a couple of conversations,
    81153d02-8e7a-be59-e709-e90cd5906f3a@jeffhostetler.com and
    CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com>.
    
    The test suite passes, but as a n00b I do have questions:
    
     * Is there any additional validation that could/should be done to
       confirm that "-uall" untracked data can be cached safely?
       * It looks safe from following the code
       * It seems from discussing briefly with Elijah Newren in the thread
         above that thare are no obvious red flags
       * Manual testing, explicitly and implicitly through months of use,
         yields no issues
     * Is it OK to check the repo configuration in the body of processing?
       It seems to be a rare pattern
     * Can anyone think of a way to test this change?

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-985%2FTaoK%2Ftaok-untracked-cache-with-uall-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-985/TaoK/taok-untracked-cache-with-uall-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/985

Range-diff vs v2:

 1:  e2f1ad26c78 ! 1:  49cf90bfab5 untracked-cache: support '--untracked-files=all' if configured
     @@ Commit message
          untracked-cache: support '--untracked-files=all' if configured
      
          Untracked cache was originally designed to only work with
     -    '-untracked-files=normal', but this is a concern for UI tooling that
     -    wants to see "all" on a frequent basis, and the conditions that
     -    prevented applicability to the "all" mode no longer seem to apply.
     +    '-untracked-files=normal', but this causes performance issues for UI
     +    tooling that wants to see "all" on a frequent basis. On the other hand,
     +    the conditions that prevented applicability to the "all" mode no
     +    longer seem to apply.
      
     -    The disqualification of untracked cache is a particular problem on
     -    Windows with the defaulted fscache, where the introduction of
     -    fsmonitor can make the first and costly directory-iteration happen
     +    The disqualification of untracked cache has a particularly significant
     +    impact on Windows with the defaulted fscache, where the introduction
     +    of fsmonitor can make the first and costly directory-iteration happen
          in untracked file detection, single-threaded, rather than in
          preload-index on multiple threads. Improving the performance of a
          "normal" 'git status' run with fsmonitor can make
          'git status --untracked-files=all' perform much worse.
      
     -    In this change, align the supported directory flags for the stored
     -    untracked cache data with the git config. If a user specifies an
     -    '--untracked-files=' commandline parameter that does not align with
     +    To partially address this, align the supported directory flags for the
     +    stored untracked cache data with the git config. If a user specifies
     +    an '--untracked-files=' commandline parameter that does not align with
          their 'status.showuntrackedfiles' config value, then the untracked
          cache will be ignored - as it is for other unsupported situations like
          when a pathspec is specified.
      
          If the previously stored flags no longer match the current
          configuration, but the currently-applicable flags do match the current
     -    configuration, then the previously stored untracked cache data is
     -    discarded.
     +    configuration, then discard the previously stored untracked cache
     +    data.
      
          For most users there will be no change in behavior. Users who need
     -    '--untracked-files=all' to perform well will have the option of
     -    setting "status.showuntrackedfiles" to "all".
     +    '--untracked-files=all' to perform well will now have the option of
     +    setting "status.showuntrackedfiles" to "all" for better / more
     +    consistent performance.
      
          Users who need '--untracked-files=all' to perform well for their
          tooling AND prefer to avoid the verbosity of "all" when running
          git status explicitly without options... are out of luck for now (no
          change).
      
     -    Users who set "status.showuntrackedfiles" to "all" and yet most
     -    frequently explicitly call 'git status --untracked-files=normal' (and
     -    use the untracked cache) are the only ones who would be disadvantaged
     -    by this change. It seems unlikely there are any such users.
     +    Users who have the "status.showuntrackedfiles" config set to "all"
     +    and yet frequently explicitly call
     +    'git status --untracked-files=normal' (and use the untracked cache)
     +    are the only ones who will be disadvantaged by this change. Their
     +    "--untracked-files=normal" calls will, after this change, no longer
     +    use the untracked cache.
      
          Signed-off-by: Tao Klerks <tao@klerks.biz>
      


 dir.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 68 insertions(+), 18 deletions(-)

diff --git a/dir.c b/dir.c
index d91295f2bcd..e35331d3f71 100644
--- a/dir.c
+++ b/dir.c
@@ -2746,13 +2746,33 @@ static void set_untracked_ident(struct untracked_cache *uc)
 	strbuf_addch(&uc->ident, 0);
 }
 
-static void new_untracked_cache(struct index_state *istate)
+static unsigned configured_default_dir_flags(struct index_state *istate)
+{
+	/* This logic is coordinated with the setting of these flags in
+	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
+	 * of the config setting in commit.c#git_status_config()
+	 */
+	char *status_untracked_files_config_value;
+	int config_outcome = repo_config_get_string(istate->repo,
+								"status.showuntrackedfiles",
+								&status_untracked_files_config_value);
+	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
+		return 0;
+	} else {
+		/*
+		 * The default, if "all" is not set, is "normal" - leading us here.
+		 * If the value is "none" then it really doesn't matter.
+		 */
+		return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	}
+}
+
+static void new_untracked_cache(struct index_state *istate, unsigned flags)
 {
 	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
 	strbuf_init(&uc->ident, 100);
 	uc->exclude_per_dir = ".gitignore";
-	/* should be the same flags used by git-status */
-	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	uc->dir_flags = flags;
 	set_untracked_ident(uc);
 	istate->untracked = uc;
 	istate->cache_changed |= UNTRACKED_CHANGED;
@@ -2761,11 +2781,13 @@ static void new_untracked_cache(struct index_state *istate)
 void add_untracked_cache(struct index_state *istate)
 {
 	if (!istate->untracked) {
-		new_untracked_cache(istate);
+		new_untracked_cache(istate,
+				configured_default_dir_flags(istate));
 	} else {
 		if (!ident_in_untracked(istate->untracked)) {
 			free_untracked_cache(istate->untracked);
-			new_untracked_cache(istate);
+			new_untracked_cache(istate,
+					configured_default_dir_flags(istate));
 		}
 	}
 }
@@ -2781,10 +2803,12 @@ void remove_untracked_cache(struct index_state *istate)
 
 static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
 						      int base_len,
-						      const struct pathspec *pathspec)
+						      const struct pathspec *pathspec,
+							  struct index_state *istate)
 {
 	struct untracked_cache_dir *root;
 	static int untracked_cache_disabled = -1;
+	unsigned configured_dir_flags;
 
 	if (!dir->untracked)
 		return NULL;
@@ -2812,17 +2836,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	if (base_len || (pathspec && pathspec->nr))
 		return NULL;
 
-	/* Different set of flags may produce different results */
-	if (dir->flags != dir->untracked->dir_flags ||
-	    /*
-	     * See treat_directory(), case index_nonexistent. Without
-	     * this flag, we may need to also cache .git file content
-	     * for the resolve_gitlink_ref() call, which we don't.
-	     */
-	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
-	    /* We don't support collecting ignore files */
-	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
-			   DIR_COLLECT_IGNORED)))
+	/* We don't support collecting ignore files */
+	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
+			DIR_COLLECT_IGNORED))
 		return NULL;
 
 	/*
@@ -2845,6 +2861,40 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 		return NULL;
 	}
 
+	/* We don't support using or preparing the untracked cache if
+	 * the current effective flags don't match the configured
+	 * flags.
+	 */
+	configured_dir_flags = configured_default_dir_flags(istate);
+	if (dir->flags != configured_dir_flags)
+		return NULL;
+
+	/* If the untracked structure we received does not have the same flags
+	 * as configured, but the configured flags do match the effective flags,
+	 * then we need to reset / create a new "untracked" structure to match
+	 * the new config.
+	 * Keeping the saved and used untracked cache in-line with the
+	 * configuration provides an opportunity for frequent users of
+	 * "git status -uall" to leverage the untracked cache by aligning their
+	 * configuration (setting "status.showuntrackedfiles" to "all" or
+	 * "normal" as appropriate), where previously this option was
+	 * incompatible with untracked cache and *consistently* caused
+	 * surprisingly bad performance (with fscache and fsmonitor enabled) on
+	 * Windows.
+	 *
+	 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
+	 * to not be as bound up with the desired output in a given run,
+	 * and instead iterated through and stored enough information to
+	 * correctly serve both "modes", then users could get peak performance
+	 * with or without '-uall' regardless of their
+	 * "status.showuntrackedfiles" config.
+	 */
+	if (dir->flags != dir->untracked->dir_flags) {
+		free_untracked_cache(istate->untracked);
+		new_untracked_cache(istate, configured_dir_flags);
+		dir->untracked = istate->untracked;
+	}
+
 	if (!dir->untracked->root)
 		FLEX_ALLOC_STR(dir->untracked->root, name, "");
 
@@ -2916,7 +2966,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		return dir->nr;
 	}
 
-	untracked = validate_untracked_cache(dir, len, pathspec);
+	untracked = validate_untracked_cache(dir, len, pathspec, istate);
 	if (!untracked)
 		/*
 		 * make sure untracked cache code path is disabled,

base-commit: dab1b7905d0b295f1acef9785bb2b9cbb0fdec84
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] untracked-cache: support '--untracked-files=all' if configured
  2022-02-27 11:22   ` [PATCH v3] " Tao Klerks via GitGitGadget
@ 2022-02-27 14:39     ` Tao Klerks
  2022-02-27 15:13     ` [PATCH v4] " Tao Klerks via GitGitGadget
  1 sibling, 0 replies; 22+ messages in thread
From: Tao Klerks @ 2022-02-27 14:39 UTC (permalink / raw)
  To: Tao Klerks via GitGitGadget; +Cc: git, Junio C Hamano

Bad submission, my apologies - commit message is right but code
changes are missing. New one coming soon.

On Sun, Feb 27, 2022 at 12:22 PM Tao Klerks via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Tao Klerks <tao@klerks.biz>
>
> Untracked cache was originally designed to only work with
> '-untracked-files=normal', but this causes performance issues for UI
> tooling that wants to see "all" on a frequent basis. On the other hand,
> the conditions that prevented applicability to the "all" mode no
> longer seem to apply.
>
> The disqualification of untracked cache has a particularly significant
> impact on Windows with the defaulted fscache, where the introduction
> of fsmonitor can make the first and costly directory-iteration happen
> in untracked file detection, single-threaded, rather than in
> preload-index on multiple threads. Improving the performance of a
> "normal" 'git status' run with fsmonitor can make
> 'git status --untracked-files=all' perform much worse.
>
> To partially address this, align the supported directory flags for the
> stored untracked cache data with the git config. If a user specifies
> an '--untracked-files=' commandline parameter that does not align with
> their 'status.showuntrackedfiles' config value, then the untracked
> cache will be ignored - as it is for other unsupported situations like
> when a pathspec is specified.
>
> If the previously stored flags no longer match the current
> configuration, but the currently-applicable flags do match the current
> configuration, then discard the previously stored untracked cache
> data.
>
> For most users there will be no change in behavior. Users who need
> '--untracked-files=all' to perform well will now have the option of
> setting "status.showuntrackedfiles" to "all" for better / more
> consistent performance.
>
> Users who need '--untracked-files=all' to perform well for their
> tooling AND prefer to avoid the verbosity of "all" when running
> git status explicitly without options... are out of luck for now (no
> change).
>
> Users who have the "status.showuntrackedfiles" config set to "all"
> and yet frequently explicitly call
> 'git status --untracked-files=normal' (and use the untracked cache)
> are the only ones who will be disadvantaged by this change. Their
> "--untracked-files=normal" calls will, after this change, no longer
> use the untracked cache.
>
> Signed-off-by: Tao Klerks <tao@klerks.biz>
> ---
>     Support untracked cache with '--untracked-files=all' if configured
>
>     Resending this patch from a few months ago (with better standards
>     compliance) - it hasn't seen any comments yet, I would dearly love some
>     eyes on this as the change can make a big difference to hundreds of
>     windows users in my environment (and I'd really rather not start
>     distributing customized git builds!)
>
>     This patch aims to make it possible for users of the -uall flag to git
>     status, either by preference or by need (eg UI tooling), to benefit from
>     the untracked cache when they set their 'status.showuntrackedfiles'
>     config setting to 'all'. This is very important for large repos in
>     Windows.
>
>     More detail on the change and context in the commit message, I assume
>     repeating a verbose message here is discouraged.
>
>     These changes result from a couple of conversations,
>     81153d02-8e7a-be59-e709-e90cd5906f3a@jeffhostetler.com and
>     CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com>.
>
>     The test suite passes, but as a n00b I do have questions:
>
>      * Is there any additional validation that could/should be done to
>        confirm that "-uall" untracked data can be cached safely?
>        * It looks safe from following the code
>        * It seems from discussing briefly with Elijah Newren in the thread
>          above that thare are no obvious red flags
>        * Manual testing, explicitly and implicitly through months of use,
>          yields no issues
>      * Is it OK to check the repo configuration in the body of processing?
>        It seems to be a rare pattern
>      * Can anyone think of a way to test this change?
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-985%2FTaoK%2Ftaok-untracked-cache-with-uall-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-985/TaoK/taok-untracked-cache-with-uall-v3
> Pull-Request: https://github.com/gitgitgadget/git/pull/985
>
> Range-diff vs v2:
>
>  1:  e2f1ad26c78 ! 1:  49cf90bfab5 untracked-cache: support '--untracked-files=all' if configured
>      @@ Commit message
>           untracked-cache: support '--untracked-files=all' if configured
>
>           Untracked cache was originally designed to only work with
>      -    '-untracked-files=normal', but this is a concern for UI tooling that
>      -    wants to see "all" on a frequent basis, and the conditions that
>      -    prevented applicability to the "all" mode no longer seem to apply.
>      +    '-untracked-files=normal', but this causes performance issues for UI
>      +    tooling that wants to see "all" on a frequent basis. On the other hand,
>      +    the conditions that prevented applicability to the "all" mode no
>      +    longer seem to apply.
>
>      -    The disqualification of untracked cache is a particular problem on
>      -    Windows with the defaulted fscache, where the introduction of
>      -    fsmonitor can make the first and costly directory-iteration happen
>      +    The disqualification of untracked cache has a particularly significant
>      +    impact on Windows with the defaulted fscache, where the introduction
>      +    of fsmonitor can make the first and costly directory-iteration happen
>           in untracked file detection, single-threaded, rather than in
>           preload-index on multiple threads. Improving the performance of a
>           "normal" 'git status' run with fsmonitor can make
>           'git status --untracked-files=all' perform much worse.
>
>      -    In this change, align the supported directory flags for the stored
>      -    untracked cache data with the git config. If a user specifies an
>      -    '--untracked-files=' commandline parameter that does not align with
>      +    To partially address this, align the supported directory flags for the
>      +    stored untracked cache data with the git config. If a user specifies
>      +    an '--untracked-files=' commandline parameter that does not align with
>           their 'status.showuntrackedfiles' config value, then the untracked
>           cache will be ignored - as it is for other unsupported situations like
>           when a pathspec is specified.
>
>           If the previously stored flags no longer match the current
>           configuration, but the currently-applicable flags do match the current
>      -    configuration, then the previously stored untracked cache data is
>      -    discarded.
>      +    configuration, then discard the previously stored untracked cache
>      +    data.
>
>           For most users there will be no change in behavior. Users who need
>      -    '--untracked-files=all' to perform well will have the option of
>      -    setting "status.showuntrackedfiles" to "all".
>      +    '--untracked-files=all' to perform well will now have the option of
>      +    setting "status.showuntrackedfiles" to "all" for better / more
>      +    consistent performance.
>
>           Users who need '--untracked-files=all' to perform well for their
>           tooling AND prefer to avoid the verbosity of "all" when running
>           git status explicitly without options... are out of luck for now (no
>           change).
>
>      -    Users who set "status.showuntrackedfiles" to "all" and yet most
>      -    frequently explicitly call 'git status --untracked-files=normal' (and
>      -    use the untracked cache) are the only ones who would be disadvantaged
>      -    by this change. It seems unlikely there are any such users.
>      +    Users who have the "status.showuntrackedfiles" config set to "all"
>      +    and yet frequently explicitly call
>      +    'git status --untracked-files=normal' (and use the untracked cache)
>      +    are the only ones who will be disadvantaged by this change. Their
>      +    "--untracked-files=normal" calls will, after this change, no longer
>      +    use the untracked cache.
>
>           Signed-off-by: Tao Klerks <tao@klerks.biz>
>
>
>
>  dir.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 68 insertions(+), 18 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index d91295f2bcd..e35331d3f71 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2746,13 +2746,33 @@ static void set_untracked_ident(struct untracked_cache *uc)
>         strbuf_addch(&uc->ident, 0);
>  }
>
> -static void new_untracked_cache(struct index_state *istate)
> +static unsigned configured_default_dir_flags(struct index_state *istate)
> +{
> +       /* This logic is coordinated with the setting of these flags in
> +        * wt-status.c#wt_status_collect_untracked(), and the evaluation
> +        * of the config setting in commit.c#git_status_config()
> +        */
> +       char *status_untracked_files_config_value;
> +       int config_outcome = repo_config_get_string(istate->repo,
> +                                                               "status.showuntrackedfiles",
> +                                                               &status_untracked_files_config_value);
> +       if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
> +               return 0;
> +       } else {
> +               /*
> +                * The default, if "all" is not set, is "normal" - leading us here.
> +                * If the value is "none" then it really doesn't matter.
> +                */
> +               return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> +       }
> +}
> +
> +static void new_untracked_cache(struct index_state *istate, unsigned flags)
>  {
>         struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
>         strbuf_init(&uc->ident, 100);
>         uc->exclude_per_dir = ".gitignore";
> -       /* should be the same flags used by git-status */
> -       uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> +       uc->dir_flags = flags;
>         set_untracked_ident(uc);
>         istate->untracked = uc;
>         istate->cache_changed |= UNTRACKED_CHANGED;
> @@ -2761,11 +2781,13 @@ static void new_untracked_cache(struct index_state *istate)
>  void add_untracked_cache(struct index_state *istate)
>  {
>         if (!istate->untracked) {
> -               new_untracked_cache(istate);
> +               new_untracked_cache(istate,
> +                               configured_default_dir_flags(istate));
>         } else {
>                 if (!ident_in_untracked(istate->untracked)) {
>                         free_untracked_cache(istate->untracked);
> -                       new_untracked_cache(istate);
> +                       new_untracked_cache(istate,
> +                                       configured_default_dir_flags(istate));
>                 }
>         }
>  }
> @@ -2781,10 +2803,12 @@ void remove_untracked_cache(struct index_state *istate)
>
>  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
>                                                       int base_len,
> -                                                     const struct pathspec *pathspec)
> +                                                     const struct pathspec *pathspec,
> +                                                         struct index_state *istate)
>  {
>         struct untracked_cache_dir *root;
>         static int untracked_cache_disabled = -1;
> +       unsigned configured_dir_flags;
>
>         if (!dir->untracked)
>                 return NULL;
> @@ -2812,17 +2836,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>         if (base_len || (pathspec && pathspec->nr))
>                 return NULL;
>
> -       /* Different set of flags may produce different results */
> -       if (dir->flags != dir->untracked->dir_flags ||
> -           /*
> -            * See treat_directory(), case index_nonexistent. Without
> -            * this flag, we may need to also cache .git file content
> -            * for the resolve_gitlink_ref() call, which we don't.
> -            */
> -           !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
> -           /* We don't support collecting ignore files */
> -           (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> -                          DIR_COLLECT_IGNORED)))
> +       /* We don't support collecting ignore files */
> +       if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> +                       DIR_COLLECT_IGNORED))
>                 return NULL;
>
>         /*
> @@ -2845,6 +2861,40 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>                 return NULL;
>         }
>
> +       /* We don't support using or preparing the untracked cache if
> +        * the current effective flags don't match the configured
> +        * flags.
> +        */
> +       configured_dir_flags = configured_default_dir_flags(istate);
> +       if (dir->flags != configured_dir_flags)
> +               return NULL;
> +
> +       /* If the untracked structure we received does not have the same flags
> +        * as configured, but the configured flags do match the effective flags,
> +        * then we need to reset / create a new "untracked" structure to match
> +        * the new config.
> +        * Keeping the saved and used untracked cache in-line with the
> +        * configuration provides an opportunity for frequent users of
> +        * "git status -uall" to leverage the untracked cache by aligning their
> +        * configuration (setting "status.showuntrackedfiles" to "all" or
> +        * "normal" as appropriate), where previously this option was
> +        * incompatible with untracked cache and *consistently* caused
> +        * surprisingly bad performance (with fscache and fsmonitor enabled) on
> +        * Windows.
> +        *
> +        * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
> +        * to not be as bound up with the desired output in a given run,
> +        * and instead iterated through and stored enough information to
> +        * correctly serve both "modes", then users could get peak performance
> +        * with or without '-uall' regardless of their
> +        * "status.showuntrackedfiles" config.
> +        */
> +       if (dir->flags != dir->untracked->dir_flags) {
> +               free_untracked_cache(istate->untracked);
> +               new_untracked_cache(istate, configured_dir_flags);
> +               dir->untracked = istate->untracked;
> +       }
> +
>         if (!dir->untracked->root)
>                 FLEX_ALLOC_STR(dir->untracked->root, name, "");
>
> @@ -2916,7 +2966,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>                 return dir->nr;
>         }
>
> -       untracked = validate_untracked_cache(dir, len, pathspec);
> +       untracked = validate_untracked_cache(dir, len, pathspec, istate);
>         if (!untracked)
>                 /*
>                  * make sure untracked cache code path is disabled,
>
> base-commit: dab1b7905d0b295f1acef9785bb2b9cbb0fdec84
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v4] untracked-cache: support '--untracked-files=all' if configured
  2022-02-27 11:22   ` [PATCH v3] " Tao Klerks via GitGitGadget
  2022-02-27 14:39     ` Tao Klerks
@ 2022-02-27 15:13     ` Tao Klerks via GitGitGadget
  2022-02-28 14:02       ` Ævar Arnfjörð Bjarmason
  2022-03-29 11:25       ` [PATCH v5 0/2] Support untracked cache with " Tao Klerks via GitGitGadget
  1 sibling, 2 replies; 22+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-02-27 15:13 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Tao Klerks, Tao Klerks

From: Tao Klerks <tao@klerks.biz>

Untracked cache was originally designed to only work with
'-untracked-files=normal', but this causes performance issues for UI
tooling that wants to see "all" on a frequent basis. On the other hand,
the conditions that prevented applicability to the "all" mode no
longer seem to apply.

The disqualification of untracked cache has a particularly significant
impact on Windows with the defaulted fscache, where the introduction
of fsmonitor can make the first and costly directory-iteration happen
in untracked file detection, single-threaded, rather than in
preload-index on multiple threads. Improving the performance of a
"normal" 'git status' run with fsmonitor can make
'git status --untracked-files=all' perform much worse.

To partially address this, align the supported directory flags for the
stored untracked cache data with the git config. If a user specifies
an '--untracked-files=' commandline parameter that does not align with
their 'status.showuntrackedfiles' config value, then the untracked
cache will be ignored - as it is for other unsupported situations like
when a pathspec is specified.

If the previously stored flags no longer match the current
configuration, but the currently-applicable flags do match the current
configuration, then discard the previously stored untracked cache
data.

For most users there will be no change in behavior. Users who need
'--untracked-files=all' to perform well will now have the option of
setting "status.showuntrackedfiles" to "all" for better / more
consistent performance.

Users who need '--untracked-files=all' to perform well for their
tooling AND prefer to avoid the verbosity of "all" when running
git status explicitly without options... are out of luck for now (no
change).

Users who have the "status.showuntrackedfiles" config set to "all"
and yet frequently explicitly call
'git status --untracked-files=normal' (and use the untracked cache)
are the only ones who will be disadvantaged by this change. Their
"--untracked-files=normal" calls will, after this change, no longer
use the untracked cache.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
    Support untracked cache with '--untracked-files=all' if configured
    
    Resending this patch from a few months ago (with better standards
    compliance) - it hasn't seen any comments yet, I would dearly love some
    eyes on this as the change can make a big difference to hundreds of
    windows users in my environment (and I'd really rather not start
    distributing customized git builds!)
    
    This patch aims to make it possible for users of the -uall flag to git
    status, either by preference or by need (eg UI tooling), to benefit from
    the untracked cache when they set their 'status.showuntrackedfiles'
    config setting to 'all'. This is very important for large repos in
    Windows.
    
    More detail on the change and context in the commit message, I assume
    repeating a verbose message here is discouraged.
    
    These changes result from a couple of conversations,
    81153d02-8e7a-be59-e709-e90cd5906f3a@jeffhostetler.com and
    CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com>.
    
    The test suite passes, but as a n00b I do have questions:
    
     * Is there any additional validation that could/should be done to
       confirm that "-uall" untracked data can be cached safely?
       * It looks safe from following the code
       * It seems from discussing briefly with Elijah Newren in the thread
         above that thare are no obvious red flags
       * Manual testing, explicitly and implicitly through months of use,
         yields no issues
     * Is it OK to check the repo configuration in the body of processing?
       It seems to be a rare pattern
     * Can anyone think of a way to test this change?

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-985%2FTaoK%2Ftaok-untracked-cache-with-uall-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-985/TaoK/taok-untracked-cache-with-uall-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/985

Range-diff vs v3:

 1:  49cf90bfab5 ! 1:  5da418e5c60 untracked-cache: support '--untracked-files=all' if configured
     @@ dir.c: static void set_untracked_ident(struct untracked_cache *uc)
       }
       
      -static void new_untracked_cache(struct index_state *istate)
     -+static unsigned configured_default_dir_flags(struct index_state *istate)
     ++static unsigned configured_default_dir_flags(struct repository *repo)
      +{
     -+	/* This logic is coordinated with the setting of these flags in
     ++	/*
     ++	 * This logic is coordinated with the setting of these flags in
      +	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
      +	 * of the config setting in commit.c#git_status_config()
      +	 */
      +	char *status_untracked_files_config_value;
     -+	int config_outcome = repo_config_get_string(istate->repo,
     -+								"status.showuntrackedfiles",
     -+								&status_untracked_files_config_value);
     ++	int config_outcome = repo_config_get_string(repo,
     ++						    "status.showuntrackedfiles",
     ++						    &status_untracked_files_config_value);
      +	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
      +		return 0;
      +	} else {
     @@ dir.c: static void new_untracked_cache(struct index_state *istate)
       	if (!istate->untracked) {
      -		new_untracked_cache(istate);
      +		new_untracked_cache(istate,
     -+				configured_default_dir_flags(istate));
     ++				    configured_default_dir_flags(istate->repo));
       	} else {
       		if (!ident_in_untracked(istate->untracked)) {
       			free_untracked_cache(istate->untracked);
      -			new_untracked_cache(istate);
      +			new_untracked_cache(istate,
     -+					configured_default_dir_flags(istate));
     ++					    configured_default_dir_flags(istate->repo));
       		}
       	}
       }
      @@ dir.c: void remove_untracked_cache(struct index_state *istate)
     + }
       
       static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
     - 						      int base_len,
     +-						      int base_len,
      -						      const struct pathspec *pathspec)
     -+						      const struct pathspec *pathspec,
     -+							  struct index_state *istate)
     ++							    int base_len,
     ++							    const struct pathspec *pathspec,
     ++							    struct index_state *istate)
       {
       	struct untracked_cache_dir *root;
       	static int untracked_cache_disabled = -1;
     -+	unsigned configured_dir_flags;
     - 
     - 	if (!dir->untracked)
     - 		return NULL;
      @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
       	if (base_len || (pathspec && pathspec->nr))
       		return NULL;
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
       		return NULL;
       	}
       
     -+	/* We don't support using or preparing the untracked cache if
     ++	/*
     ++	 * We don't support using or preparing the untracked cache if
      +	 * the current effective flags don't match the configured
      +	 * flags.
      +	 */
     -+	configured_dir_flags = configured_default_dir_flags(istate);
     -+	if (dir->flags != configured_dir_flags)
     ++	if (dir->flags != configured_default_dir_flags(istate->repo))
      +		return NULL;
      +
     -+	/* If the untracked structure we received does not have the same flags
     ++	/*
     ++	 * If the untracked structure we received does not have the same flags
      +	 * as configured, but the configured flags do match the effective flags,
      +	 * then we need to reset / create a new "untracked" structure to match
      +	 * the new config.
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
      +	 */
      +	if (dir->flags != dir->untracked->dir_flags) {
      +		free_untracked_cache(istate->untracked);
     -+		new_untracked_cache(istate, configured_dir_flags);
     ++		new_untracked_cache(istate, dir->flags);
      +		dir->untracked = istate->untracked;
      +	}
      +


 dir.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 70 insertions(+), 19 deletions(-)

diff --git a/dir.c b/dir.c
index d91295f2bcd..57a7d42482f 100644
--- a/dir.c
+++ b/dir.c
@@ -2746,13 +2746,34 @@ static void set_untracked_ident(struct untracked_cache *uc)
 	strbuf_addch(&uc->ident, 0);
 }
 
-static void new_untracked_cache(struct index_state *istate)
+static unsigned configured_default_dir_flags(struct repository *repo)
+{
+	/*
+	 * This logic is coordinated with the setting of these flags in
+	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
+	 * of the config setting in commit.c#git_status_config()
+	 */
+	char *status_untracked_files_config_value;
+	int config_outcome = repo_config_get_string(repo,
+						    "status.showuntrackedfiles",
+						    &status_untracked_files_config_value);
+	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
+		return 0;
+	} else {
+		/*
+		 * The default, if "all" is not set, is "normal" - leading us here.
+		 * If the value is "none" then it really doesn't matter.
+		 */
+		return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	}
+}
+
+static void new_untracked_cache(struct index_state *istate, unsigned flags)
 {
 	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
 	strbuf_init(&uc->ident, 100);
 	uc->exclude_per_dir = ".gitignore";
-	/* should be the same flags used by git-status */
-	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	uc->dir_flags = flags;
 	set_untracked_ident(uc);
 	istate->untracked = uc;
 	istate->cache_changed |= UNTRACKED_CHANGED;
@@ -2761,11 +2782,13 @@ static void new_untracked_cache(struct index_state *istate)
 void add_untracked_cache(struct index_state *istate)
 {
 	if (!istate->untracked) {
-		new_untracked_cache(istate);
+		new_untracked_cache(istate,
+				    configured_default_dir_flags(istate->repo));
 	} else {
 		if (!ident_in_untracked(istate->untracked)) {
 			free_untracked_cache(istate->untracked);
-			new_untracked_cache(istate);
+			new_untracked_cache(istate,
+					    configured_default_dir_flags(istate->repo));
 		}
 	}
 }
@@ -2780,8 +2803,9 @@ void remove_untracked_cache(struct index_state *istate)
 }
 
 static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
-						      int base_len,
-						      const struct pathspec *pathspec)
+							    int base_len,
+							    const struct pathspec *pathspec,
+							    struct index_state *istate)
 {
 	struct untracked_cache_dir *root;
 	static int untracked_cache_disabled = -1;
@@ -2812,17 +2836,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	if (base_len || (pathspec && pathspec->nr))
 		return NULL;
 
-	/* Different set of flags may produce different results */
-	if (dir->flags != dir->untracked->dir_flags ||
-	    /*
-	     * See treat_directory(), case index_nonexistent. Without
-	     * this flag, we may need to also cache .git file content
-	     * for the resolve_gitlink_ref() call, which we don't.
-	     */
-	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
-	    /* We don't support collecting ignore files */
-	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
-			   DIR_COLLECT_IGNORED)))
+	/* We don't support collecting ignore files */
+	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
+			DIR_COLLECT_IGNORED))
 		return NULL;
 
 	/*
@@ -2845,6 +2861,41 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 		return NULL;
 	}
 
+	/*
+	 * We don't support using or preparing the untracked cache if
+	 * the current effective flags don't match the configured
+	 * flags.
+	 */
+	if (dir->flags != configured_default_dir_flags(istate->repo))
+		return NULL;
+
+	/*
+	 * If the untracked structure we received does not have the same flags
+	 * as configured, but the configured flags do match the effective flags,
+	 * then we need to reset / create a new "untracked" structure to match
+	 * the new config.
+	 * Keeping the saved and used untracked cache in-line with the
+	 * configuration provides an opportunity for frequent users of
+	 * "git status -uall" to leverage the untracked cache by aligning their
+	 * configuration (setting "status.showuntrackedfiles" to "all" or
+	 * "normal" as appropriate), where previously this option was
+	 * incompatible with untracked cache and *consistently* caused
+	 * surprisingly bad performance (with fscache and fsmonitor enabled) on
+	 * Windows.
+	 *
+	 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
+	 * to not be as bound up with the desired output in a given run,
+	 * and instead iterated through and stored enough information to
+	 * correctly serve both "modes", then users could get peak performance
+	 * with or without '-uall' regardless of their
+	 * "status.showuntrackedfiles" config.
+	 */
+	if (dir->flags != dir->untracked->dir_flags) {
+		free_untracked_cache(istate->untracked);
+		new_untracked_cache(istate, dir->flags);
+		dir->untracked = istate->untracked;
+	}
+
 	if (!dir->untracked->root)
 		FLEX_ALLOC_STR(dir->untracked->root, name, "");
 
@@ -2916,7 +2967,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		return dir->nr;
 	}
 
-	untracked = validate_untracked_cache(dir, len, pathspec);
+	untracked = validate_untracked_cache(dir, len, pathspec, istate);
 	if (!untracked)
 		/*
 		 * make sure untracked cache code path is disabled,

base-commit: dab1b7905d0b295f1acef9785bb2b9cbb0fdec84
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] untracked-cache: support '--untracked-files=all' if configured
  2022-02-27 11:21     ` Tao Klerks
@ 2022-02-27 19:54       ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2022-02-27 19:54 UTC (permalink / raw)
  To: Tao Klerks; +Cc: Tao Klerks via GitGitGadget, git

Tao Klerks <tao@klerks.biz> writes:

>> > +     configured_dir_flags = configured_default_dir_flags(istate);
>> > +     if (dir->flags != configured_dir_flags)
>> > +             return NULL;
>>
>> Hmph.  If this weren't necessary, this function does not need to
>> call configured_default_dir_flags(), and it can lose the
>> configured_dir_flags variable, too.  Which means that
>> new_untracked_cache() function does not need to take the flags word
>> as a caller-supplied parameter.  Instead, it can make a call to
>> configured_dir_flags() and assign the result to uc->dir_flags
>> itself, which would have been much nicer.
>
> I've tightened this up a little with an inline call to
> configured_default_dir_flags(), getting rid of the variable, let's see
> if that makes more sense / is cleaner.

The extra variable does not bother me at all.  Leaving the room for
the caller to screw up and pass an incorrect configred_dir_flags is
what disturbs me.

If this caller needs to call configred_default_dir_flags(istate),
then we cannot avoid it.  And the extra variable needed to call the
function only once, store its result, and use that result twice, is
perfectly a good thing to have.  We don't want to see inline or
anything tricky.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] untracked-cache: support '--untracked-files=all' if configured
  2022-02-27 15:13     ` [PATCH v4] " Tao Klerks via GitGitGadget
@ 2022-02-28 14:02       ` Ævar Arnfjörð Bjarmason
  2022-03-02  8:47         ` Tao Klerks
  2022-03-29 11:25       ` [PATCH v5 0/2] Support untracked cache with " Tao Klerks via GitGitGadget
  1 sibling, 1 reply; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-28 14:02 UTC (permalink / raw)
  To: Tao Klerks via GitGitGadget; +Cc: git, Tao Klerks


On Sun, Feb 27 2022, Tao Klerks via GitGitGadget wrote:

> From: Tao Klerks <tao@klerks.biz>
> [...]
>     The test suite passes, but as a n00b I do have questions:
>     
>      * Is there any additional validation that could/should be done to
>        confirm that "-uall" untracked data can be cached safely?

I haven't given this any substantial review, sorry.

>        * It looks safe from following the code

Yes, at a glance it looks safe to me.

>        * It seems from discussing briefly with Elijah Newren in the thread
>          above that thare are no obvious red flags
>        * Manual testing, explicitly and implicitly through months of use,
>          yields no issues
>      * Is it OK to check the repo configuration in the body of processing?
>        It seems to be a rare pattern

Yes, it's not very common, but it's fine.

>      * Can anyone think of a way to test this change?

Well, if we set "flags" to 0 in your new helper the existing tests will
fail, so that's something at least.

> -static void new_untracked_cache(struct index_state *istate)
> +static unsigned configured_default_dir_flags(struct repository *repo)
> +{
> +	/*
> +	 * This logic is coordinated with the setting of these flags in
> +	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
> +	 * of the config setting in commit.c#git_status_config()
> +	 */
> +	char *status_untracked_files_config_value;
> +	int config_outcome = repo_config_get_string(repo,
> +						    "status.showuntrackedfiles",
> +						    &status_untracked_files_config_value);
> +	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
> +		return 0;
> +	} else {
> +		/*
> +		 * The default, if "all" is not set, is "normal" - leading us here.
> +		 * If the value is "none" then it really doesn't matter.
> +		 */
> +		return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> +	}
> +}
> +
> [...]

In reviewing this I found the addition of very long lines & indentation
made this a bit harder to read. I came up with the below which should be
squashable on top, and perhaps makes this easier to read.

I.e. the one caller that needs custom flags passes them, others pass -1
now.

For throwaway "char *" variables we usually use "char *p", "char *val"
or whatever. A "status_untracked_files_config_value" is a bit much for
something function local whose body is <10 lines of code.

And if you drop the "int config_outcome" + rename the variable the
repo_config_get_string() fits on one line:

diff --git a/dir.c b/dir.c
index 57a7d42482f..22a27d7780f 100644
--- a/dir.c
+++ b/dir.c
@@ -2746,34 +2746,33 @@ static void set_untracked_ident(struct untracked_cache *uc)
 	strbuf_addch(&uc->ident, 0);
 }
 
-static unsigned configured_default_dir_flags(struct repository *repo)
+static unsigned new_untracked_cache_flags(struct index_state *istate)
 {
+	struct repository *repo = istate->repo;
+	char *val;
+
 	/*
 	 * This logic is coordinated with the setting of these flags in
 	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
 	 * of the config setting in commit.c#git_status_config()
 	 */
-	char *status_untracked_files_config_value;
-	int config_outcome = repo_config_get_string(repo,
-						    "status.showuntrackedfiles",
-						    &status_untracked_files_config_value);
-	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
+	if (!repo_config_get_string(repo, "status.showuntrackedfiles", &val) &&
+	    !strcmp(val, "all"))
 		return 0;
-	} else {
-		/*
-		 * The default, if "all" is not set, is "normal" - leading us here.
-		 * If the value is "none" then it really doesn't matter.
-		 */
-		return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
-	}
+
+	/*
+	 * The default, if "all" is not set, is "normal" - leading us here.
+	 * If the value is "none" then it really doesn't matter.
+	 */
+	return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
 }
 
-static void new_untracked_cache(struct index_state *istate, unsigned flags)
+static void new_untracked_cache(struct index_state *istate, int flags)
 {
 	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
 	strbuf_init(&uc->ident, 100);
 	uc->exclude_per_dir = ".gitignore";
-	uc->dir_flags = flags;
+	uc->dir_flags = flags >= 0 ? flags : new_untracked_cache_flags(istate);
 	set_untracked_ident(uc);
 	istate->untracked = uc;
 	istate->cache_changed |= UNTRACKED_CHANGED;
@@ -2782,13 +2781,11 @@ static void new_untracked_cache(struct index_state *istate, unsigned flags)
 void add_untracked_cache(struct index_state *istate)
 {
 	if (!istate->untracked) {
-		new_untracked_cache(istate,
-				    configured_default_dir_flags(istate->repo));
+		new_untracked_cache(istate, -1);
 	} else {
 		if (!ident_in_untracked(istate->untracked)) {
 			free_untracked_cache(istate->untracked);
-			new_untracked_cache(istate,
-					    configured_default_dir_flags(istate->repo));
+			new_untracked_cache(istate, -1);
 		}
 	}
 }
@@ -2866,7 +2863,7 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	 * the current effective flags don't match the configured
 	 * flags.
 	 */
-	if (dir->flags != configured_default_dir_flags(istate->repo))
+	if (dir->flags != new_untracked_cache_flags(istate))
 		return NULL;
 
 	/*

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] untracked-cache: support '--untracked-files=all' if configured
  2022-02-28 14:02       ` Ævar Arnfjörð Bjarmason
@ 2022-03-02  8:47         ` Tao Klerks
  0 siblings, 0 replies; 22+ messages in thread
From: Tao Klerks @ 2022-03-02  8:47 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Tao Klerks via GitGitGadget, git

On Mon, Feb 28, 2022 at 3:08 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> In reviewing this I found the addition of very long lines & indentation
> made this a bit harder to read. I came up with the below which should be
> squashable on top, and perhaps makes this easier to read.
>
> I.e. the one caller that needs custom flags passes them, others pass -1
> now.
>
> For throwaway "char *" variables we usually use "char *p", "char *val"
> or whatever. A "status_untracked_files_config_value" is a bit much for
> something function local whose body is <10 lines of code.
>
> And if you drop the "int config_outcome" + rename the variable the
> repo_config_get_string() fits on one line:
>

Thank you, this looks much cleaner!

One question I have about this patch itself is why you changed the arg to
new_untracked_cache_flags / configured_default_dir_flags from "repo"
back to "istate" / whether that was intentional; I had understood Junio's
recommendation that the function get the minimum it needed (the repo
in this case) as very sensible... especially now that we need to pass in
that state less often.

Another question is whether I should add a comment in
new_untracked_cache_flags explaining the new meaning of "-1" for
the flags argument, or whether this is sufficiently standard in this
project or in C generally that I should leave it implied.

A third question is whether you have an opinion on the better approach
to getting these changes merged: does it make more sense to argue
that this is an improvement overall, and that the case of configured
"all" and argument-provided "normal", which will now have worse
performance, can and should be ignored, *or* will it be easier to get
this reviewed/merged if the new behavior is made conditional upon a
new configuration key?

Last question is about protocol/workflow in this project: When I squash
your proposal in, should I add an extra
"Signed-Off-By: Ævar Arnfjörð Bjarmason <avarab@gmail.com>" line,
or just include the changes with mine?? (I've read SubmittingPatches
but don't really understand the whole signing off business; I do
understand that as your changes improve mine directly, they should
indeed be "squashed" in, rather than added as an extra commit)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v5 0/2] Support untracked cache with '--untracked-files=all' if configured
  2022-02-27 15:13     ` [PATCH v4] " Tao Klerks via GitGitGadget
  2022-02-28 14:02       ` Ævar Arnfjörð Bjarmason
@ 2022-03-29 11:25       ` Tao Klerks via GitGitGadget
  2022-03-29 11:25         ` [PATCH v5 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall Tao Klerks via GitGitGadget
                           ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-03-29 11:25 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Ævar Arnfjörð Bjarmason, Tao Klerks

Make it possible for users of the -uall flag to git status, either by
preference or by need (eg UI tooling), to benefit from the untracked cache
when they set their 'status.showuntrackedfiles' config setting to 'all'.
This is especially useful for large repos in Windows, where without
untracked cache "git status" times can easily be 5x higher, and GUI tooling
typically does use -uall.

In this fifth version, split the change into two patches - one to introduce
tests of the current untracked-cache-skipping behavior when -uall is
specified, and then new tests checking the new behavior with
'status.showuntrackedfiles=all'.

My two remaining questions with respect to this patchset are:

 1. Does it make sense to do this as a simple enhancement as proposed here,
    or would people be more comfortable with a new configuration option,
    given the potential for worse performance under specific (and. I
    believe, vanishingly rare) circumstances?
 2. If it makes sense to envision a future where a single untracked cache
    structure can support both -uall and -unormal, where should this
    possible future be alluded to, if anywhere?

Tao Klerks (2):
  untracked-cache: test untracked-cache-bypassing behavior with -uall
  untracked-cache: support '--untracked-files=all' if configured

 dir.c                             |  85 +++++++++++++++++------
 t/t7063-status-untracked-cache.sh | 108 ++++++++++++++++++++++++++++++
 2 files changed, 174 insertions(+), 19 deletions(-)


base-commit: abf474a5dd901f28013c52155411a48fd4c09922
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-985%2FTaoK%2Ftaok-untracked-cache-with-uall-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-985/TaoK/taok-untracked-cache-with-uall-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/985

Range-diff vs v4:

 -:  ----------- > 1:  98a4d8f35c5 untracked-cache: test untracked-cache-bypassing behavior with -uall
 1:  5da418e5c60 ! 2:  f60d2c6e36c untracked-cache: support '--untracked-files=all' if configured
     @@ Commit message
          untracked-cache: support '--untracked-files=all' if configured
      
          Untracked cache was originally designed to only work with
     -    '-untracked-files=normal', but this causes performance issues for UI
     +    '--untracked-files=normal', but this causes performance issues for UI
          tooling that wants to see "all" on a frequent basis. On the other hand,
          the conditions that prevented applicability to the "all" mode no
          longer seem to apply.
     @@ dir.c: static void set_untracked_ident(struct untracked_cache *uc)
       }
       
      -static void new_untracked_cache(struct index_state *istate)
     -+static unsigned configured_default_dir_flags(struct repository *repo)
     ++static unsigned new_untracked_cache_flags(struct index_state *istate)
      +{
     ++	struct repository *repo = istate->repo;
     ++	char *val;
     ++
      +	/*
      +	 * This logic is coordinated with the setting of these flags in
      +	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
      +	 * of the config setting in commit.c#git_status_config()
      +	 */
     -+	char *status_untracked_files_config_value;
     -+	int config_outcome = repo_config_get_string(repo,
     -+						    "status.showuntrackedfiles",
     -+						    &status_untracked_files_config_value);
     -+	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
     ++	if (!repo_config_get_string(repo, "status.showuntrackedfiles", &val) &&
     ++	    !strcmp(val, "all"))
      +		return 0;
     -+	} else {
     -+		/*
     -+		 * The default, if "all" is not set, is "normal" - leading us here.
     -+		 * If the value is "none" then it really doesn't matter.
     -+		 */
     -+		return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
     -+	}
     ++
     ++	/*
     ++	 * The default, if "all" is not set, is "normal" - leading us here.
     ++	 * If the value is "none" then it really doesn't matter.
     ++	 */
     ++	return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
      +}
      +
     -+static void new_untracked_cache(struct index_state *istate, unsigned flags)
     ++static void new_untracked_cache(struct index_state *istate, int flags)
       {
       	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
       	strbuf_init(&uc->ident, 100);
       	uc->exclude_per_dir = ".gitignore";
      -	/* should be the same flags used by git-status */
      -	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
     -+	uc->dir_flags = flags;
     ++	uc->dir_flags = flags >= 0 ? flags : new_untracked_cache_flags(istate);
       	set_untracked_ident(uc);
       	istate->untracked = uc;
       	istate->cache_changed |= UNTRACKED_CHANGED;
     @@ dir.c: static void new_untracked_cache(struct index_state *istate)
       {
       	if (!istate->untracked) {
      -		new_untracked_cache(istate);
     -+		new_untracked_cache(istate,
     -+				    configured_default_dir_flags(istate->repo));
     ++		new_untracked_cache(istate, -1);
       	} else {
       		if (!ident_in_untracked(istate->untracked)) {
       			free_untracked_cache(istate->untracked);
      -			new_untracked_cache(istate);
     -+			new_untracked_cache(istate,
     -+					    configured_default_dir_flags(istate->repo));
     ++			new_untracked_cache(istate, -1);
       		}
       	}
       }
     @@ dir.c: void remove_untracked_cache(struct index_state *istate)
       
       static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
      -						      int base_len,
     --						      const struct pathspec *pathspec)
     +-						      const struct pathspec *pathspec,
     +-						      struct index_state *istate)
      +							    int base_len,
      +							    const struct pathspec *pathspec,
      +							    struct index_state *istate)
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
      +	 * the current effective flags don't match the configured
      +	 * flags.
      +	 */
     -+	if (dir->flags != configured_default_dir_flags(istate->repo))
     ++	if (dir->flags != new_untracked_cache_flags(istate))
      +		return NULL;
      +
      +	/*
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
      +		dir->untracked = istate->untracked;
      +	}
      +
     - 	if (!dir->untracked->root)
     + 	if (!dir->untracked->root) {
     + 		/* Untracked cache existed but is not initialized; fix that */
       		FLEX_ALLOC_STR(dir->untracked->root, name, "");
     +
     + ## t/t7063-status-untracked-cache.sh ##
     +@@ t/t7063-status-untracked-cache.sh: test_expect_success 'untracked cache remains after bypass' '
     + 	test_cmp ../dump.expect ../actual
     + '
       
     -@@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate,
     - 		return dir->nr;
     - 	}
     - 
     --	untracked = validate_untracked_cache(dir, len, pathspec);
     -+	untracked = validate_untracked_cache(dir, len, pathspec, istate);
     - 	if (!untracked)
     - 		/*
     - 		 * make sure untracked cache code path is disabled,
     ++test_expect_success 'if -uall is configured, untracked cache gets populated by default' '
     ++	test_config status.showuntrackedfiles all &&
     ++	: >../trace.output &&
     ++	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
     ++	git status --porcelain >../actual &&
     ++	iuc status --porcelain >../status.iuc &&
     ++	test_cmp ../status_uall.expect ../status.iuc &&
     ++	test_cmp ../status_uall.expect ../actual &&
     ++	get_relevant_traces ../trace.output ../trace.relevant &&
     ++	cat >../trace.expect <<EOF &&
     ++ ....path:
     ++ ....node-creation:3
     ++ ....gitignore-invalidation:1
     ++ ....directory-invalidation:0
     ++ ....opendir:4
     ++EOF
     ++	test_cmp ../trace.expect ../trace.relevant
     ++'
     ++
     ++cat >../dump_uall.expect <<EOF &&
     ++info/exclude $EMPTY_BLOB
     ++core.excludesfile $ZERO_OID
     ++exclude_per_dir .gitignore
     ++flags 00000000
     ++/ $ZERO_OID recurse valid
     ++three
     ++/done/ $ZERO_OID recurse valid
     ++/dthree/ $ZERO_OID recurse valid
     ++three
     ++/dtwo/ $ZERO_OID recurse valid
     ++two
     ++EOF
     ++
     ++test_expect_success 'if -uall was configured, untracked cache is populated' '
     ++	test-tool dump-untracked-cache >../actual &&
     ++	test_cmp ../dump_uall.expect ../actual
     ++'
     ++
     ++test_expect_success 'if -uall is configured, untracked cache is used by default' '
     ++	test_config status.showuntrackedfiles all &&
     ++	: >../trace.output &&
     ++	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
     ++	git status --porcelain >../actual &&
     ++	iuc status --porcelain >../status.iuc &&
     ++	test_cmp ../status_uall.expect ../status.iuc &&
     ++	test_cmp ../status_uall.expect ../actual &&
     ++	get_relevant_traces ../trace.output ../trace.relevant &&
     ++	cat >../trace.expect <<EOF &&
     ++ ....path:
     ++ ....node-creation:0
     ++ ....gitignore-invalidation:0
     ++ ....directory-invalidation:0
     ++ ....opendir:0
     ++EOF
     ++	test_cmp ../trace.expect ../trace.relevant
     ++'
     ++
     ++# Bypassing the untracked cache here is not desirable, but it expected
     ++# in the current implementation
     ++test_expect_success 'if -uall is configured, untracked cache is bypassed with -unormal' '
     ++	test_config status.showuntrackedfiles all &&
     ++	: >../trace.output &&
     ++	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
     ++	git status -unormal --porcelain >../actual &&
     ++	iuc status -unormal --porcelain >../status.iuc &&
     ++	test_cmp ../status.expect ../status.iuc &&
     ++	test_cmp ../status.expect ../actual &&
     ++	get_relevant_traces ../trace.output ../trace.relevant &&
     ++	cat >../trace.expect <<EOF &&
     ++ ....path:
     ++EOF
     ++	test_cmp ../trace.expect ../trace.relevant
     ++'
     ++
     ++test_expect_success 'repopulate untracked cache for -unormal' '
     ++	git status --porcelain
     ++'
     ++
     + test_expect_success 'modify in root directory, one dir invalidation' '
     + 	: >four &&
     + 	test-tool chmtime =-240 four &&

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v5 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall
  2022-03-29 11:25       ` [PATCH v5 0/2] Support untracked cache with " Tao Klerks via GitGitGadget
@ 2022-03-29 11:25         ` Tao Klerks via GitGitGadget
  2022-03-29 16:51           ` Junio C Hamano
  2022-03-29 11:25         ` [PATCH v5 2/2] untracked-cache: support '--untracked-files=all' if configured Tao Klerks via GitGitGadget
  2022-03-31 16:02         ` [PATCH v6 0/2] Support untracked cache with " Tao Klerks via GitGitGadget
  2 siblings, 1 reply; 22+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-03-29 11:25 UTC (permalink / raw)
  To: git
  Cc: Tao Klerks, Ævar Arnfjörð Bjarmason, Tao Klerks,
	Tao Klerks

From: Tao Klerks <tao@klerks.biz>

Untracked cache was originally designed to only work with
'--untracked-files=normal', and it gets ignored when
'--untracked-files=all' is specified instead.

Add explicit tests for this behavior.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
 t/t7063-status-untracked-cache.sh | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index ca90ee805e7..c44b70b96e4 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -190,6 +190,36 @@ test_expect_success 'untracked cache after second status' '
 	test_cmp ../dump.expect ../actual
 '
 
+cat >../status_uall.expect <<EOF &&
+A  done/one
+A  one
+A  two
+?? dthree/three
+?? dtwo/two
+?? three
+EOF
+
+# Bypassing the untracked cache here is not desirable, but it expected
+# in the current implementation
+test_expect_success 'untracked cache is bypassed with -uall' '
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+	git status -uall --porcelain >../actual &&
+	iuc status -uall --porcelain >../status.iuc &&
+	test_cmp ../status_uall.expect ../status.iuc &&
+	test_cmp ../status_uall.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
+	cat >../trace.expect <<EOF &&
+ ....path:
+EOF
+	test_cmp ../trace.expect ../trace.relevant
+'
+
+test_expect_success 'untracked cache remains after bypass' '
+	test-tool dump-untracked-cache >../actual &&
+	test_cmp ../dump.expect ../actual
+'
+
 test_expect_success 'modify in root directory, one dir invalidation' '
 	: >four &&
 	test-tool chmtime =-240 four &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v5 2/2] untracked-cache: support '--untracked-files=all' if configured
  2022-03-29 11:25       ` [PATCH v5 0/2] Support untracked cache with " Tao Klerks via GitGitGadget
  2022-03-29 11:25         ` [PATCH v5 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall Tao Klerks via GitGitGadget
@ 2022-03-29 11:25         ` Tao Klerks via GitGitGadget
  2022-03-29 17:43           ` Junio C Hamano
  2022-03-31 16:02         ` [PATCH v6 0/2] Support untracked cache with " Tao Klerks via GitGitGadget
  2 siblings, 1 reply; 22+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-03-29 11:25 UTC (permalink / raw)
  To: git
  Cc: Tao Klerks, Ævar Arnfjörð Bjarmason, Tao Klerks,
	Tao Klerks

From: Tao Klerks <tao@klerks.biz>

Untracked cache was originally designed to only work with
'--untracked-files=normal', but this causes performance issues for UI
tooling that wants to see "all" on a frequent basis. On the other hand,
the conditions that prevented applicability to the "all" mode no
longer seem to apply.

The disqualification of untracked cache has a particularly significant
impact on Windows with the defaulted fscache, where the introduction
of fsmonitor can make the first and costly directory-iteration happen
in untracked file detection, single-threaded, rather than in
preload-index on multiple threads. Improving the performance of a
"normal" 'git status' run with fsmonitor can make
'git status --untracked-files=all' perform much worse.

To partially address this, align the supported directory flags for the
stored untracked cache data with the git config. If a user specifies
an '--untracked-files=' commandline parameter that does not align with
their 'status.showuntrackedfiles' config value, then the untracked
cache will be ignored - as it is for other unsupported situations like
when a pathspec is specified.

If the previously stored flags no longer match the current
configuration, but the currently-applicable flags do match the current
configuration, then discard the previously stored untracked cache
data.

For most users there will be no change in behavior. Users who need
'--untracked-files=all' to perform well will now have the option of
setting "status.showuntrackedfiles" to "all" for better / more
consistent performance.

Users who need '--untracked-files=all' to perform well for their
tooling AND prefer to avoid the verbosity of "all" when running
git status explicitly without options... are out of luck for now (no
change).

Users who have the "status.showuntrackedfiles" config set to "all"
and yet frequently explicitly call
'git status --untracked-files=normal' (and use the untracked cache)
are the only ones who will be disadvantaged by this change. Their
"--untracked-files=normal" calls will, after this change, no longer
use the untracked cache.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
 dir.c                             | 85 ++++++++++++++++++++++++-------
 t/t7063-status-untracked-cache.sh | 78 ++++++++++++++++++++++++++++
 2 files changed, 144 insertions(+), 19 deletions(-)

diff --git a/dir.c b/dir.c
index f2b0f242101..c5b513ddd2a 100644
--- a/dir.c
+++ b/dir.c
@@ -2747,13 +2747,33 @@ static void set_untracked_ident(struct untracked_cache *uc)
 	strbuf_addch(&uc->ident, 0);
 }
 
-static void new_untracked_cache(struct index_state *istate)
+static unsigned new_untracked_cache_flags(struct index_state *istate)
+{
+	struct repository *repo = istate->repo;
+	char *val;
+
+	/*
+	 * This logic is coordinated with the setting of these flags in
+	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
+	 * of the config setting in commit.c#git_status_config()
+	 */
+	if (!repo_config_get_string(repo, "status.showuntrackedfiles", &val) &&
+	    !strcmp(val, "all"))
+		return 0;
+
+	/*
+	 * The default, if "all" is not set, is "normal" - leading us here.
+	 * If the value is "none" then it really doesn't matter.
+	 */
+	return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+}
+
+static void new_untracked_cache(struct index_state *istate, int flags)
 {
 	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
 	strbuf_init(&uc->ident, 100);
 	uc->exclude_per_dir = ".gitignore";
-	/* should be the same flags used by git-status */
-	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	uc->dir_flags = flags >= 0 ? flags : new_untracked_cache_flags(istate);
 	set_untracked_ident(uc);
 	istate->untracked = uc;
 	istate->cache_changed |= UNTRACKED_CHANGED;
@@ -2762,11 +2782,11 @@ static void new_untracked_cache(struct index_state *istate)
 void add_untracked_cache(struct index_state *istate)
 {
 	if (!istate->untracked) {
-		new_untracked_cache(istate);
+		new_untracked_cache(istate, -1);
 	} else {
 		if (!ident_in_untracked(istate->untracked)) {
 			free_untracked_cache(istate->untracked);
-			new_untracked_cache(istate);
+			new_untracked_cache(istate, -1);
 		}
 	}
 }
@@ -2781,9 +2801,9 @@ void remove_untracked_cache(struct index_state *istate)
 }
 
 static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
-						      int base_len,
-						      const struct pathspec *pathspec,
-						      struct index_state *istate)
+							    int base_len,
+							    const struct pathspec *pathspec,
+							    struct index_state *istate)
 {
 	struct untracked_cache_dir *root;
 	static int untracked_cache_disabled = -1;
@@ -2814,17 +2834,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	if (base_len || (pathspec && pathspec->nr))
 		return NULL;
 
-	/* Different set of flags may produce different results */
-	if (dir->flags != dir->untracked->dir_flags ||
-	    /*
-	     * See treat_directory(), case index_nonexistent. Without
-	     * this flag, we may need to also cache .git file content
-	     * for the resolve_gitlink_ref() call, which we don't.
-	     */
-	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
-	    /* We don't support collecting ignore files */
-	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
-			   DIR_COLLECT_IGNORED)))
+	/* We don't support collecting ignore files */
+	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
+			DIR_COLLECT_IGNORED))
 		return NULL;
 
 	/*
@@ -2847,6 +2859,41 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 		return NULL;
 	}
 
+	/*
+	 * We don't support using or preparing the untracked cache if
+	 * the current effective flags don't match the configured
+	 * flags.
+	 */
+	if (dir->flags != new_untracked_cache_flags(istate))
+		return NULL;
+
+	/*
+	 * If the untracked structure we received does not have the same flags
+	 * as configured, but the configured flags do match the effective flags,
+	 * then we need to reset / create a new "untracked" structure to match
+	 * the new config.
+	 * Keeping the saved and used untracked cache in-line with the
+	 * configuration provides an opportunity for frequent users of
+	 * "git status -uall" to leverage the untracked cache by aligning their
+	 * configuration (setting "status.showuntrackedfiles" to "all" or
+	 * "normal" as appropriate), where previously this option was
+	 * incompatible with untracked cache and *consistently* caused
+	 * surprisingly bad performance (with fscache and fsmonitor enabled) on
+	 * Windows.
+	 *
+	 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
+	 * to not be as bound up with the desired output in a given run,
+	 * and instead iterated through and stored enough information to
+	 * correctly serve both "modes", then users could get peak performance
+	 * with or without '-uall' regardless of their
+	 * "status.showuntrackedfiles" config.
+	 */
+	if (dir->flags != dir->untracked->dir_flags) {
+		free_untracked_cache(istate->untracked);
+		new_untracked_cache(istate, dir->flags);
+		dir->untracked = istate->untracked;
+	}
+
 	if (!dir->untracked->root) {
 		/* Untracked cache existed but is not initialized; fix that */
 		FLEX_ALLOC_STR(dir->untracked->root, name, "");
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index c44b70b96e4..b52fd937156 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -220,6 +220,84 @@ test_expect_success 'untracked cache remains after bypass' '
 	test_cmp ../dump.expect ../actual
 '
 
+test_expect_success 'if -uall is configured, untracked cache gets populated by default' '
+	test_config status.showuntrackedfiles all &&
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
+	test_cmp ../status_uall.expect ../status.iuc &&
+	test_cmp ../status_uall.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
+	cat >../trace.expect <<EOF &&
+ ....path:
+ ....node-creation:3
+ ....gitignore-invalidation:1
+ ....directory-invalidation:0
+ ....opendir:4
+EOF
+	test_cmp ../trace.expect ../trace.relevant
+'
+
+cat >../dump_uall.expect <<EOF &&
+info/exclude $EMPTY_BLOB
+core.excludesfile $ZERO_OID
+exclude_per_dir .gitignore
+flags 00000000
+/ $ZERO_OID recurse valid
+three
+/done/ $ZERO_OID recurse valid
+/dthree/ $ZERO_OID recurse valid
+three
+/dtwo/ $ZERO_OID recurse valid
+two
+EOF
+
+test_expect_success 'if -uall was configured, untracked cache is populated' '
+	test-tool dump-untracked-cache >../actual &&
+	test_cmp ../dump_uall.expect ../actual
+'
+
+test_expect_success 'if -uall is configured, untracked cache is used by default' '
+	test_config status.showuntrackedfiles all &&
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
+	test_cmp ../status_uall.expect ../status.iuc &&
+	test_cmp ../status_uall.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
+	cat >../trace.expect <<EOF &&
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:0
+EOF
+	test_cmp ../trace.expect ../trace.relevant
+'
+
+# Bypassing the untracked cache here is not desirable, but it expected
+# in the current implementation
+test_expect_success 'if -uall is configured, untracked cache is bypassed with -unormal' '
+	test_config status.showuntrackedfiles all &&
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+	git status -unormal --porcelain >../actual &&
+	iuc status -unormal --porcelain >../status.iuc &&
+	test_cmp ../status.expect ../status.iuc &&
+	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
+	cat >../trace.expect <<EOF &&
+ ....path:
+EOF
+	test_cmp ../trace.expect ../trace.relevant
+'
+
+test_expect_success 'repopulate untracked cache for -unormal' '
+	git status --porcelain
+'
+
 test_expect_success 'modify in root directory, one dir invalidation' '
 	: >four &&
 	test-tool chmtime =-240 four &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall
  2022-03-29 11:25         ` [PATCH v5 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall Tao Klerks via GitGitGadget
@ 2022-03-29 16:51           ` Junio C Hamano
  2022-03-30  4:46             ` Tao Klerks
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2022-03-29 16:51 UTC (permalink / raw)
  To: Tao Klerks via GitGitGadget
  Cc: git, Tao Klerks, Ævar Arnfjörð Bjarmason

"Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +# Bypassing the untracked cache here is not desirable, but it expected
> +# in the current implementation

If that is the case, it is much more desirable to squash it into a
single [2/2] patch so that the desirable working is documented (so
future breakage can be caught), the reviewers can read what the
intended behaviour is more easily (so we do not have to be confused
by this one saying "expect success"), make it easier to cherry-pick
the fix and test in the same patch elsewhere, and the existing
breakage can easily be caught by applying only the test part of the
patch.

Thanks.

> +test_expect_success 'untracked cache is bypassed with -uall' '
> +	: >../trace.output &&
> +	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
> +	git status -uall --porcelain >../actual &&
> +	iuc status -uall --porcelain >../status.iuc &&
> +	test_cmp ../status_uall.expect ../status.iuc &&
> +	test_cmp ../status_uall.expect ../actual &&
> +	get_relevant_traces ../trace.output ../trace.relevant &&
> +	cat >../trace.expect <<EOF &&
> + ....path:
> +EOF
> +	test_cmp ../trace.expect ../trace.relevant
> +'
> +
> +test_expect_success 'untracked cache remains after bypass' '
> +	test-tool dump-untracked-cache >../actual &&
> +	test_cmp ../dump.expect ../actual
> +'
> +
>  test_expect_success 'modify in root directory, one dir invalidation' '
>  	: >four &&
>  	test-tool chmtime =-240 four &&

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 2/2] untracked-cache: support '--untracked-files=all' if configured
  2022-03-29 11:25         ` [PATCH v5 2/2] untracked-cache: support '--untracked-files=all' if configured Tao Klerks via GitGitGadget
@ 2022-03-29 17:43           ` Junio C Hamano
  2022-03-30 19:59             ` Tao Klerks
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2022-03-29 17:43 UTC (permalink / raw)
  To: Tao Klerks via GitGitGadget
  Cc: git, Tao Klerks, Ævar Arnfjörð Bjarmason

"Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Tao Klerks <tao@klerks.biz>
>
> Untracked cache was originally designed to only work with
> '--untracked-files=normal', but this causes performance issues for UI
> tooling that wants to see "all" on a frequent basis. On the other hand,
> the conditions that prevented applicability to the "all" mode no
> longer seem to apply.

There is a slight leap in logic flow before ", but this causes" that
forces readers read the above twice and guess what is missing.  I am
guessing that

    ... was designed to only work with "--untracked-files=normal",
    and is bypassed when "--untracked-files=all" request, but this
    causes ...

is what you meant to say.

The claim in the last sentence "... no longer seem to apply" is
thrown at the readers without much rationale.  Either justify it
more solidly or discard the claim?

> The disqualification of untracked cache has a particularly significant
> impact on Windows with the defaulted fscache, where the introduction
> of fsmonitor can make the first and costly directory-iteration happen
> in untracked file detection, single-threaded, rather than in
> preload-index on multiple threads. Improving the performance of a
> "normal" 'git status' run with fsmonitor can make
> 'git status --untracked-files=all' perform much worse.

The last sentence is a bit hard to parse and very hard to reason
about.  Do you mean to say "--untracked-files=all is slower when the
untracked cache is in use, so the performance of 'git status' may be
improved for '--untracked-files=normal' when the untracked cache is
used, it hurts when 'git status --untracked-files=all' is run"?

> To partially address this, align the supported directory flags for the
> stored untracked cache data with the git config. If a user specifies
> an '--untracked-files=' commandline parameter that does not align with
> their 'status.showuntrackedfiles' config value, then the untracked
> cache will be ignored - as it is for other unsupported situations like
> when a pathspec is specified.
>
> If the previously stored flags no longer match the current
> configuration, but the currently-applicable flags do match the current
> configuration, then discard the previously stored untracked cache
> data.

Let me follow what actually happens in the patch aloud.

> -static void new_untracked_cache(struct index_state *istate)
> +static unsigned new_untracked_cache_flags(struct index_state *istate)
> +{
> +	struct repository *repo = istate->repo;
> +	char *val;
> +
> +	/*
> +	 * This logic is coordinated with the setting of these flags in
> +	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
> +	 * of the config setting in commit.c#git_status_config()
> +	 */
> +	if (!repo_config_get_string(repo, "status.showuntrackedfiles", &val) &&
> +	    !strcmp(val, "all"))
> +		return 0;
> +
> +	/*
> +	 * The default, if "all" is not set, is "normal" - leading us here.
> +	 * If the value is "none" then it really doesn't matter.
> +	 */
> +	return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> +}

This _guesses_ the user preference from the configuration.

> +static void new_untracked_cache(struct index_state *istate, int flags)
>  {
>  	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
>  	strbuf_init(&uc->ident, 100);
>  	uc->exclude_per_dir = ".gitignore";
> -	/* should be the same flags used by git-status */
> -	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> +	uc->dir_flags = flags >= 0 ? flags : new_untracked_cache_flags(istate);

We use unsigned for the flag word unless there is a reason not to,
but this function wants to use top-bit as a signal to "guess from
config".  The caller either dictates what bits to set, or we guess.

And the created uc records the flags.

> @@ -2762,11 +2782,11 @@ static void new_untracked_cache(struct index_state *istate)
>  void add_untracked_cache(struct index_state *istate)
>  {
>  	if (!istate->untracked) {
> -		new_untracked_cache(istate);
> +		new_untracked_cache(istate, -1);

We are newly creating, so "-1" (guess from config) may be the most
appropriate (unless the caller of add_untracked_cache() already
knows what operation it is using for, that is).

>  	} else {
>  		if (!ident_in_untracked(istate->untracked)) {

Found an existing one but need to recreate.

>  			free_untracked_cache(istate->untracked);
> -			new_untracked_cache(istate);
> +			new_untracked_cache(istate, -1);

In this case, is it likely to give us a better guess to read the
configuration, or copy from the existing untracked file?  My gut
feeling says it would be the latter, and if that is correct, then
we might want

+	      		int old_flags = istate->untracked->dir_flags;
 			free_untracked_cache(istate->untracked);
-			new_untracked_cache(istate);
+			new_untracked_cache(istate, old_flags);

instead?  I dunno.

> @@ -2781,9 +2801,9 @@ void remove_untracked_cache(struct index_state *istate)
>  }
>  
>  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
> -						      int base_len,
> -						      const struct pathspec *pathspec,
> -						      struct index_state *istate)
> +							    int base_len,
> +							    const struct pathspec *pathspec,
> +							    struct index_state *istate)
>  {
>  	struct untracked_cache_dir *root;
>  	static int untracked_cache_disabled = -1;

Is this just re-indenting?  Not very welcome, but OK.

> @@ -2814,17 +2834,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>  	if (base_len || (pathspec && pathspec->nr))
>  		return NULL;
>  
> -	/* Different set of flags may produce different results */
> -	if (dir->flags != dir->untracked->dir_flags ||
> -	    /*
> -	     * See treat_directory(), case index_nonexistent. Without
> -	     * this flag, we may need to also cache .git file content
> -	     * for the resolve_gitlink_ref() call, which we don't.
> -	     */
> -	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
> -	    /* We don't support collecting ignore files */
> -	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> -			   DIR_COLLECT_IGNORED)))
> +	/* We don't support collecting ignore files */
> +	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> +			DIR_COLLECT_IGNORED))
>  		return NULL;
>  
>  	/*
> @@ -2847,6 +2859,41 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>  		return NULL;
>  	}
>  
> +	/*
> +	 * We don't support using or preparing the untracked cache if
> +	 * the current effective flags don't match the configured
> +	 * flags.
> +	 */

Is that what we want?  What happens when your user does this:

    - set showuntrackedfiles to "normal"
    - generate the untracked cache
    - set showuntrackedfiles to "all"

And now the user wants to make a request that wants to see flags
that are suitable for "normal".

The best case would be for the untracked cache to know what flags
were in use when it was generated, so that in the above sequence,
even though the current value of configuration variable and the
current request from the command line contradict each other, we
can notice that the untracked cache data is suitable for the current
request and the right thing happens.

> +	if (dir->flags != new_untracked_cache_flags(istate))
> +		return NULL;

But this only pays attention to what is in the configuration?  It
seems to me that it is being too pessimistic, but perhaps it is
necessary for correctness somehow?

Thanks.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall
  2022-03-29 16:51           ` Junio C Hamano
@ 2022-03-30  4:46             ` Tao Klerks
  2022-03-30 16:39               ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Tao Klerks @ 2022-03-30  4:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Tao Klerks via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason

On Tue, Mar 29, 2022 at 6:51 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +# Bypassing the untracked cache here is not desirable, but it expected
> > +# in the current implementation
>
> If that is the case, it is much more desirable to squash it into a
> single [2/2] patch so that the desirable working is documented (so
> future breakage can be caught), the reviewers can read what the
> intended behaviour is more easily (so we do not have to be confused
> by this one saying "expect success"), make it easier to cherry-pick
> the fix and test in the same patch elsewhere, and the existing
> breakage can easily be caught by applying only the test part of the
> patch.

Sorry, I'm not completely sure whether my comment was misleading, or
whether I'm misunderstanding your feedback.

The test added here does not test "desirable" behavior from an
end-user functional perspective, but it does test behavior that is
working "as-designed" as of many years ago.

The intent in adding this test is to ensure that if/when someone
changes this behavior down the line, they be forced to understand the
implications (eg: the current untracked cache structure does not store
the same data for a -uall run as for a -unormal run, and so using the
data from one in another would lead to output correctness errors).

Importantly, the behavior that this test is exercising *is not
changed* by the 2/2 "improvement" patch; this is why I separated it
out - it's a change that I feel we should make either way, and could
even be its own independent patch. It's in this series as it adds
relevant context and I am later adding similar tests for the
new/additional behavior.

Given the (I believe) misunderstanding, a better comment would be
something like:

# Bypassing the untracked cache here is not desirable from an
# end-user perspective, but is expected in the current implementation.
# The untracked cache data stored for a -unormal run cannot be
# correctly used in a -uall run - it would yield incorrect output.

Does that make more sense?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall
  2022-03-30  4:46             ` Tao Klerks
@ 2022-03-30 16:39               ` Junio C Hamano
  2022-03-31  5:15                 ` Tao Klerks
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2022-03-30 16:39 UTC (permalink / raw)
  To: Tao Klerks
  Cc: Tao Klerks via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason

Tao Klerks <tao@klerks.biz> writes:

> Sorry, I'm not completely sure whether my comment was misleading, or
> whether I'm misunderstanding your feedback.
>
> The test added here does not test "desirable" behavior from an
> end-user functional perspective, but it does test behavior that is
> working "as-designed" as of many years ago.

Is it "as-designed", or just "left buggy"?

I somehow had an impression that you meant the latter, and it would
be our aspirational goal to eventually fix it.  And for such case,
it would be better to write the test body to show what the command 
should do, which would make the test fail with today's Git, and mark
it as test_expect_failure (which is not an ideal mechanism to prepare
for a future fix, but that is what we have now and should use, until
a better alternative being worked on is finished).

But if it is "as-designed, some users may find it suboptimal or
confusing or with any other negative adjectives, but it is too late
to change now and more importantly it is unthinkable to change it
because existing tools and users do depend on the current behaviour",
then what you did is perfectly fine.

> The intent in adding this test is to ensure that if/when someone
> changes this behavior down the line, they be forced to understand the
> implications.

That, too.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 2/2] untracked-cache: support '--untracked-files=all' if configured
  2022-03-29 17:43           ` Junio C Hamano
@ 2022-03-30 19:59             ` Tao Klerks
  0 siblings, 0 replies; 22+ messages in thread
From: Tao Klerks @ 2022-03-30 19:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Tao Klerks via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason

 the

On Tue, Mar 29, 2022 at 7:43 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Tao Klerks <tao@klerks.biz>
> >
> > Untracked cache was originally designed to only work with
> > '--untracked-files=normal', but this causes performance issues for UI
> > tooling that wants to see "all" on a frequent basis. On the other hand,
> > the conditions that prevented applicability to the "all" mode no
> > longer seem to apply.
>
> There is a slight leap in logic flow before ", but this causes" that
> forces readers read the above twice and guess what is missing.  I am
> guessing that
>
>     ... was designed to only work with "--untracked-files=normal",
>     and is bypassed when "--untracked-files=all" request, but this
>     causes ...
>
> is what you meant to say.

Yep, will fix.

>
> The claim in the last sentence "... no longer seem to apply" is
> thrown at the readers without much rationale.  Either justify it
> more solidly or discard the claim?
>

I can add references to previous conversation threads around the
topic, and add the word "experimentally".

> > The disqualification of untracked cache has a particularly significant
> > impact on Windows with the defaulted fscache, where the introduction
> > of fsmonitor can make the first and costly directory-iteration happen
> > in untracked file detection, single-threaded, rather than in
> > preload-index on multiple threads. Improving the performance of a
> > "normal" 'git status' run with fsmonitor can make
> > 'git status --untracked-files=all' perform much worse.
>
> The last sentence is a bit hard to parse and very hard to reason
> about.  Do you mean to say "--untracked-files=all is slower when the
> untracked cache is in use, so the performance of 'git status' may be
> improved for '--untracked-files=normal' when the untracked cache is
> used, it hurts when 'git status --untracked-files=all' is run"?

Something like that. The case where untracked cache bypassing (or
disabling) has a huge impact is on windows, and involves fscache and
fsmonitor:
* Untracked cache saves some work under qualifying conditions
* If qualifying conditions are not met (eg when -uall), we do the work
anyway as though untracked cache were disabled.
* fscache optimises directory-iteration on windows, with an N-thread
preload strategy. This is enabled by default, and improves windows
"git status" performance transparently from "abominably slow" to just
"slow".
* fsmonitor, when combined with untracked cache, eliminates recursive
directory-iteration in favor of using a persistent in-memory model of
what's changed on disk (to compare to the index and untracked cache)
* When fsmonitor is enabled and used, the fscache preload strategy is
(obviously) disabled
* When fsmonitor is enabled and untracked cache is disabled or
bypassed, the otherwise-ubiquitous "fscache preload" does not happen,
and the untracked file search (directory iteration) takes... forever.

In other words: In windows, where fscache behavior is critical for
large repos and defaulted-on, when fsmonitor is also enabled which
generally further improves things but disqualifies fscache preload,
the bypassing of untracked cache causes a huge performance impact. A
"-uall" run can take a minute, where a "-unormal" run takes only a
second, because the enumeration of untracked files without using the
untracked cache data means a full directory iteration, not optimized
by fscache. On the other hand, removing fsmonitor and untracked cache
altogether, and leaving just the fscache optimization to do its job
consistently, results in consistent middle-of-the-road performance in
the order of 10 seconds in this example.

Enabling fsmonitor in this situation causes -uall to perform
pathologically badly, because of untracked cache bypassing
(interacting with the fscache-preload-bypassing of fsmonitor).
However, the benefits of dropping status time from 10 seconds to 1
second make fsmonitor very attractive. On the other hand, the
pathological performance with -uall is a problem if you use GUIs that
*require* -uall. Therefore, fixing untracked cache to be compatible
with -uall is critical to large repos on windows where the ideal
experience would be fscache working, fsmonitor working, untracked
cache working, and GUIs able to use -uall without the user waiting a
minute or more and sometimes timing out.

This problem sounds like an edge-case when I describe the chain of
interactions, but is actually very common among large-repo windows
users. Windows users are typically GUI users. GUIs typically use
-uall, as git's default "hide contents of untracked folders" behavior
is obviously limiting.

This patch doesn't actually change anything I've described by default
- but it makes it *possible* to get good/consistent performance with
-uall (with fscache, with fsmonitor, with untracked cache), *if* you
configure status.showuntrackedfiles=all.

Now, coming back to the paragraph in question; maybe, instead of
trying to summarize this windows-fscache-fsmonitor-untrackedcache
interaction, I should be a little more vague. Would this work?

--
When 'git status' runs without using the untracked cache, on a large
repo, on windows, with fsmonitor, it can run very slowly. This can
make GUIs that need to use "-uall" (and therefore currently bypass
untracked cache) unusable when fsmonitor is enabled, on such large
repos.
--


> Let me follow what actually happens in the patch aloud.
>
[...]
> This _guesses_ the user preference from the configuration.
>

Yes - the proposal, in this patch, is that the untracked cache
usability/compatibility be aligned with the configuration. We are
adding an extra meaning to the "status.showuntrackedfiles"
configuration, such that it not only indicates what you want to have
happen when you don't specify "-u" to git status, but it *also*
indicates what kind of status output the untracked cache should be
targeting/supporting.

> > +static void new_untracked_cache(struct index_state *istate, int flags)
> >  {
> >       struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
> >       strbuf_init(&uc->ident, 100);
> >       uc->exclude_per_dir = ".gitignore";
> > -     /* should be the same flags used by git-status */
> > -     uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
> > +     uc->dir_flags = flags >= 0 ? flags : new_untracked_cache_flags(istate);
>
> We use unsigned for the flag word unless there is a reason not to,
> but this function wants to use top-bit as a signal to "guess from
> config".  The caller either dictates what bits to set, or we guess.
>
> And the created uc records the flags.
>

Yep - although the word "guess" may be slightly misleading here. The
proposed semantics are such that if the existing untracked cache flags
are inconsistent with the config, we *discard* the
no-longer-consistent-with-config untracked cache; so it's not so much
a guess as a mandate.

> > @@ -2762,11 +2782,11 @@ static void new_untracked_cache(struct index_state *istate)
> >  void add_untracked_cache(struct index_state *istate)
> >  {
> >       if (!istate->untracked) {
> > -             new_untracked_cache(istate);
> > +             new_untracked_cache(istate, -1);
>
> We are newly creating, so "-1" (guess from config) may be the most
> appropriate (unless the caller of add_untracked_cache() already
> knows what operation it is using for, that is).
>
> >       } else {
> >               if (!ident_in_untracked(istate->untracked)) {
>
> Found an existing one but need to recreate.
>
> >                       free_untracked_cache(istate->untracked);
> > -                     new_untracked_cache(istate);
> > +                     new_untracked_cache(istate, -1);
>
> In this case, is it likely to give us a better guess to read the
> configuration, or copy from the existing untracked file?  My gut
> feeling says it would be the latter, and if that is correct, then
> we might want
>
> +                       int old_flags = istate->untracked->dir_flags;
>                         free_untracked_cache(istate->untracked);
> -                       new_untracked_cache(istate);
> +                       new_untracked_cache(istate, old_flags);
>
> instead?  I dunno.
>

As I noted above, we later consider that an untracked cache with flags
that are inconsistent with the current config is an invalid untracked
cache, and discard it; so setting the flags based on config is the
right thing (the only useful thing) to do.


> > @@ -2781,9 +2801,9 @@ void remove_untracked_cache(struct index_state *istate)
> >  }
> >
> >  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
> > -                                                   int base_len,
> > -                                                   const struct pathspec *pathspec,
> > -                                                   struct index_state *istate)
> > +                                                         int base_len,
> > +                                                         const struct pathspec *pathspec,
> > +                                                         struct index_state *istate)
> >  {
> >       struct untracked_cache_dir *root;
> >       static int untracked_cache_disabled = -1;
>
> Is this just re-indenting?  Not very welcome, but OK.
>

Will fix; I think there was a previous version in which these lines
actually changed.

> > @@ -2814,17 +2834,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
> >       if (base_len || (pathspec && pathspec->nr))
> >               return NULL;
> >
> > -     /* Different set of flags may produce different results */
> > -     if (dir->flags != dir->untracked->dir_flags ||
> > -         /*
> > -          * See treat_directory(), case index_nonexistent. Without
> > -          * this flag, we may need to also cache .git file content
> > -          * for the resolve_gitlink_ref() call, which we don't.
> > -          */
> > -         !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
> > -         /* We don't support collecting ignore files */
> > -         (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> > -                        DIR_COLLECT_IGNORED)))
> > +     /* We don't support collecting ignore files */
> > +     if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
> > +                     DIR_COLLECT_IGNORED))
> >               return NULL;
> >
> >       /*
> > @@ -2847,6 +2859,41 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
> >               return NULL;
> >       }
> >
> > +     /*
> > +      * We don't support using or preparing the untracked cache if
> > +      * the current effective flags don't match the configured
> > +      * flags.
> > +      */
>
> Is that what we want?  What happens when your user does this:
>
>     - set showuntrackedfiles to "normal"
>     - generate the untracked cache
>     - set showuntrackedfiles to "all"
>
> And now the user wants to make a request that wants to see flags
> that are suitable for "normal".
>
> The best case would be for the untracked cache to know what flags
> were in use when it was generated, so that in the above sequence,
> even though the current value of configuration variable and the
> current request from the command line contradict each other, we
> can notice that the untracked cache data is suitable for the current
> request and the right thing happens.
>
> > +     if (dir->flags != new_untracked_cache_flags(istate))
> > +             return NULL;
>
> But this only pays attention to what is in the configuration?  It
> seems to me that it is being too pessimistic, but perhaps it is
> necessary for correctness somehow?

The logic in the current patch is:
* If the configuration doesn't match the request, ignore untracked
cache completely (exit the UC logic)
* If the configuration doesn't match the current untracked cache data,
discard the existing untracked cache data (and later recreate it if
everything else works out)

I think your proposal is something like:
* If the current untracked cache data doesn't match the request
** If the current untracked cache data doesn't match the
configuration, then discard the existing untracked cache (and later
recreate it if everything else works out)
** Otherwise, ignore untracked cache completely (exit the UC logic)
* (otherwise, if current untracked cache data *does* match the
request, assume it's good enough to keep using for now - this is the
new behavior)

I find the latter slightly harder to understand and explain, but I'm
reasonably certain it addresses the case you identified above without
compromising correctness in any other cases - it can only have a
positive impact on performance. I'll have a go at making that change
today.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall
  2022-03-30 16:39               ` Junio C Hamano
@ 2022-03-31  5:15                 ` Tao Klerks
  0 siblings, 0 replies; 22+ messages in thread
From: Tao Klerks @ 2022-03-31  5:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Tao Klerks via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason

On Wed, Mar 30, 2022 at 6:39 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Tao Klerks <tao@klerks.biz> writes:
>
> > Sorry, I'm not completely sure whether my comment was misleading, or
> > whether I'm misunderstanding your feedback.
> >
> > The test added here does not test "desirable" behavior from an
> > end-user functional perspective, but it does test behavior that is
> > working "as-designed" as of many years ago.
>
> Is it "as-designed", or just "left buggy"?
>
> I somehow had an impression that you meant the latter, and it would
> be our aspirational goal to eventually fix it.  And for such case,
> it would be better to write the test body to show what the command
> should do, which would make the test fail with today's Git, and mark
> it as test_expect_failure (which is not an ideal mechanism to prepare
> for a future fix, but that is what we have now and should use, until
> a better alternative being worked on is finished).
>
> But if it is "as-designed, some users may find it suboptimal or
> confusing or with any other negative adjectives, but it is too late
> to change now and more importantly it is unthinkable to change it
> because existing tools and users do depend on the current behaviour",
> then what you did is perfectly fine.
>

Heh, the answer is in-between: there certainly is an aspiration that
someone figure out how to have the single untracked cache structure
safely support both -uall and -unormal modes with the same data... but
it's not clear that this is possible, and it's certainly not planned.

I don't feel comfortable saying "we should fix this", and that's not
the intent of this test - its intent is to say "this is how it is
(currently) designed to work. If you accidentally change it, be sure
that you are changing the design, not just accidentally changing
behavior and potentially/probably introducing a bug".

I'll try with this updated comment, using the term "current design" to
make the distinction:

# Bypassing the untracked cache here is not desirable from an
# end-user perspective, but is expected in the current design.
# The untracked cache data stored for a -unormal run cannot be
# correctly used in a -uall run - it would yield incorrect output.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v6 0/2] Support untracked cache with '--untracked-files=all' if configured
  2022-03-29 11:25       ` [PATCH v5 0/2] Support untracked cache with " Tao Klerks via GitGitGadget
  2022-03-29 11:25         ` [PATCH v5 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall Tao Klerks via GitGitGadget
  2022-03-29 11:25         ` [PATCH v5 2/2] untracked-cache: support '--untracked-files=all' if configured Tao Klerks via GitGitGadget
@ 2022-03-31 16:02         ` Tao Klerks via GitGitGadget
  2022-03-31 16:02           ` [PATCH v6 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall Tao Klerks via GitGitGadget
  2022-03-31 16:02           ` [PATCH v6 2/2] untracked-cache: support '--untracked-files=all' if configured Tao Klerks via GitGitGadget
  2 siblings, 2 replies; 22+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-03-31 16:02 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Ævar Arnfjörð Bjarmason, Tao Klerks

Make it possible for users of the -uall flag to git status, either by
preference or by need (eg UI tooling), to benefit from the untracked cache
when they set their 'status.showuntrackedfiles' config setting to 'all'.
This is especially useful for large repos in Windows, where without
untracked cache "git status" times can easily be 5x higher, and GUI tooling
typically does use -uall.

In this sixth version, clarify the main commit message and some of the code
comments, and adjust the logic slightly such that an existing untracked
cache structure, when consistent with the requested flags in the current
run, can be reused even though the current config would set/store/use the
other set of flags on a new untracked cache structure.

I'm comfortable with this patch as-is, but am still interested in any
thoughts as to whether it makes sense and is likely to be accepted to do
this as a simple enhancement as proposed here, or whether people be more
comfortable with a new configuration option, given the potential for worse
performance under specific (and, I believe, vanishingly rare) circumstances.

Tao Klerks (2):
  untracked-cache: test untracked-cache-bypassing behavior with -uall
  untracked-cache: support '--untracked-files=all' if configured

 dir.c                             |  88 ++++++++++++++++++-----
 t/t7063-status-untracked-cache.sh | 113 ++++++++++++++++++++++++++++++
 2 files changed, 185 insertions(+), 16 deletions(-)


base-commit: 805e0a68082a217f0112db9ee86a022227a9c81b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-985%2FTaoK%2Ftaok-untracked-cache-with-uall-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-985/TaoK/taok-untracked-cache-with-uall-v6
Pull-Request: https://github.com/gitgitgadget/git/pull/985

Range-diff vs v5:

 1:  98a4d8f35c5 ! 1:  f27f018493a untracked-cache: test untracked-cache-bypassing behavior with -uall
     @@ Commit message
          '--untracked-files=normal', and it gets ignored when
          '--untracked-files=all' is specified instead.
      
     -    Add explicit tests for this behavior.
     +    Add explicit tests for this known as-designed behavior.
      
          Signed-off-by: Tao Klerks <tao@klerks.biz>
      
     @@ t/t7063-status-untracked-cache.sh: test_expect_success 'untracked cache after se
      +?? three
      +EOF
      +
     -+# Bypassing the untracked cache here is not desirable, but it expected
     -+# in the current implementation
     ++# Bypassing the untracked cache here is not desirable from an
     ++# end-user perspective, but is expected in the current design.
     ++# The untracked cache data stored for a -unormal run cannot be
     ++# correctly used in a -uall run - it would yield incorrect output.
      +test_expect_success 'untracked cache is bypassed with -uall' '
      +	: >../trace.output &&
      +	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 2:  f60d2c6e36c ! 2:  0095ec49c5f untracked-cache: support '--untracked-files=all' if configured
     @@ Commit message
          untracked-cache: support '--untracked-files=all' if configured
      
          Untracked cache was originally designed to only work with
     -    '--untracked-files=normal', but this causes performance issues for UI
     -    tooling that wants to see "all" on a frequent basis. On the other hand,
     -    the conditions that prevented applicability to the "all" mode no
     -    longer seem to apply.
     +    "--untracked-files=normal", and is bypassed when
     +    "--untracked-files=all" is requested, but this causes performance
     +    issues for UI tooling that wants to see "all" on a frequent basis.
      
     -    The disqualification of untracked cache has a particularly significant
     -    impact on Windows with the defaulted fscache, where the introduction
     -    of fsmonitor can make the first and costly directory-iteration happen
     -    in untracked file detection, single-threaded, rather than in
     -    preload-index on multiple threads. Improving the performance of a
     -    "normal" 'git status' run with fsmonitor can make
     -    'git status --untracked-files=all' perform much worse.
     +    On the other hand, the conditions that altogether prevented
     +    applicability to the "all" mode no longer seem to apply, after
     +    several major refactors in recent years; this possibility was
     +    discussed in
     +    81153d02-8e7a-be59-e709-e90cd5906f3a@jeffhostetler.com and
     +    CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com,
     +    and somewhat confirmed experimentally by several users using a
     +    version of this patch to use untracked cache with -uall for about a
     +    year.
     +
     +    When 'git status' runs without using the untracked cache, on a large
     +    repo, on windows, with fsmonitor, it can run very slowly. This can
     +    make GUIs that need to use "-uall" (and therefore currently bypass
     +    untracked cache) unusable when fsmonitor is enabled, on such large
     +    repos.
      
          To partially address this, align the supported directory flags for the
          stored untracked cache data with the git config. If a user specifies
     @@ dir.c: static void new_untracked_cache(struct index_state *istate)
       		}
       	}
       }
     -@@ dir.c: void remove_untracked_cache(struct index_state *istate)
     - }
     - 
     - static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
     --						      int base_len,
     --						      const struct pathspec *pathspec,
     --						      struct index_state *istate)
     -+							    int base_len,
     -+							    const struct pathspec *pathspec,
     -+							    struct index_state *istate)
     - {
     - 	struct untracked_cache_dir *root;
     - 	static int untracked_cache_disabled = -1;
      @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
       	if (base_len || (pathspec && pathspec->nr))
       		return NULL;
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
       		return NULL;
       	}
       
     -+	/*
     -+	 * We don't support using or preparing the untracked cache if
     -+	 * the current effective flags don't match the configured
     -+	 * flags.
     -+	 */
     -+	if (dir->flags != new_untracked_cache_flags(istate))
     -+		return NULL;
     -+
      +	/*
      +	 * If the untracked structure we received does not have the same flags
     -+	 * as configured, but the configured flags do match the effective flags,
     -+	 * then we need to reset / create a new "untracked" structure to match
     -+	 * the new config.
     -+	 * Keeping the saved and used untracked cache in-line with the
     -+	 * configuration provides an opportunity for frequent users of
     -+	 * "git status -uall" to leverage the untracked cache by aligning their
     -+	 * configuration (setting "status.showuntrackedfiles" to "all" or
     -+	 * "normal" as appropriate), where previously this option was
     -+	 * incompatible with untracked cache and *consistently* caused
     -+	 * surprisingly bad performance (with fscache and fsmonitor enabled) on
     -+	 * Windows.
     -+	 *
     -+	 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
     -+	 * to not be as bound up with the desired output in a given run,
     -+	 * and instead iterated through and stored enough information to
     -+	 * correctly serve both "modes", then users could get peak performance
     -+	 * with or without '-uall' regardless of their
     -+	 * "status.showuntrackedfiles" config.
     ++	 * as requested in this run, we're going to need to either discard the
     ++	 * existing structure (and potentially later recreate), or bypass the
     ++	 * untracked cache mechanism for this run.
      +	 */
      +	if (dir->flags != dir->untracked->dir_flags) {
     -+		free_untracked_cache(istate->untracked);
     -+		new_untracked_cache(istate, dir->flags);
     -+		dir->untracked = istate->untracked;
     ++		/*
     ++		 * If the untracked structure we received does not have the same flags
     ++		 * as configured, then we need to reset / create a new "untracked"
     ++		 * structure to match the new config.
     ++		 *
     ++		 * Keeping the saved and used untracked cache consistent with the
     ++		 * configuration provides an opportunity for frequent users of
     ++		 * "git status -uall" to leverage the untracked cache by aligning their
     ++		 * configuration - setting "status.showuntrackedfiles" to "all" or
     ++		 * "normal" as appropriate.
     ++		 *
     ++		 * Previously using -uall (or setting "status.showuntrackedfiles" to
     ++		 * "all") was incompatible with untracked cache and *consistently*
     ++		 * caused surprisingly bad performance (with fscache and fsmonitor
     ++		 * enabled) on Windows.
     ++		 *
     ++		 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
     ++		 * to not be as bound up with the desired output in a given run,
     ++		 * and instead iterated through and stored enough information to
     ++		 * correctly serve both "modes", then users could get peak performance
     ++		 * with or without '-uall' regardless of their
     ++		 * "status.showuntrackedfiles" config.
     ++		 */
     ++		if (dir->untracked->dir_flags != new_untracked_cache_flags(istate)) {
     ++			free_untracked_cache(istate->untracked);
     ++			new_untracked_cache(istate, dir->flags);
     ++			dir->untracked = istate->untracked;
     ++		}
     ++		else {
     ++			/*
     ++			 * Current untracked cache data is consistent with config, but not
     ++			 * usable in this request/run; just bypass untracked cache.
     ++			 */
     ++			return NULL;
     ++		}
      +	}
      +
       	if (!dir->untracked->root) {
     @@ t/t7063-status-untracked-cache.sh: test_expect_success 'untracked cache remains
      +	test_cmp ../trace.expect ../trace.relevant
      +'
      +
     -+# Bypassing the untracked cache here is not desirable, but it expected
     -+# in the current implementation
     ++# Bypassing the untracked cache here is not desirable from an
     ++# end-user perspective, but is expected in the current design.
     ++# The untracked cache data stored for a -all run cannot be
     ++# correctly used in a -unormal run - it would yield incorrect
     ++# output.
      +test_expect_success 'if -uall is configured, untracked cache is bypassed with -unormal' '
      +	test_config status.showuntrackedfiles all &&
      +	: >../trace.output &&

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v6 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall
  2022-03-31 16:02         ` [PATCH v6 0/2] Support untracked cache with " Tao Klerks via GitGitGadget
@ 2022-03-31 16:02           ` Tao Klerks via GitGitGadget
  2022-03-31 16:02           ` [PATCH v6 2/2] untracked-cache: support '--untracked-files=all' if configured Tao Klerks via GitGitGadget
  1 sibling, 0 replies; 22+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-03-31 16:02 UTC (permalink / raw)
  To: git
  Cc: Tao Klerks, Ævar Arnfjörð Bjarmason, Tao Klerks,
	Tao Klerks

From: Tao Klerks <tao@klerks.biz>

Untracked cache was originally designed to only work with
'--untracked-files=normal', and it gets ignored when
'--untracked-files=all' is specified instead.

Add explicit tests for this known as-designed behavior.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
 t/t7063-status-untracked-cache.sh | 32 +++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index ca90ee805e7..b89be8dc6d6 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -190,6 +190,38 @@ test_expect_success 'untracked cache after second status' '
 	test_cmp ../dump.expect ../actual
 '
 
+cat >../status_uall.expect <<EOF &&
+A  done/one
+A  one
+A  two
+?? dthree/three
+?? dtwo/two
+?? three
+EOF
+
+# Bypassing the untracked cache here is not desirable from an
+# end-user perspective, but is expected in the current design.
+# The untracked cache data stored for a -unormal run cannot be
+# correctly used in a -uall run - it would yield incorrect output.
+test_expect_success 'untracked cache is bypassed with -uall' '
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+	git status -uall --porcelain >../actual &&
+	iuc status -uall --porcelain >../status.iuc &&
+	test_cmp ../status_uall.expect ../status.iuc &&
+	test_cmp ../status_uall.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
+	cat >../trace.expect <<EOF &&
+ ....path:
+EOF
+	test_cmp ../trace.expect ../trace.relevant
+'
+
+test_expect_success 'untracked cache remains after bypass' '
+	test-tool dump-untracked-cache >../actual &&
+	test_cmp ../dump.expect ../actual
+'
+
 test_expect_success 'modify in root directory, one dir invalidation' '
 	: >four &&
 	test-tool chmtime =-240 four &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v6 2/2] untracked-cache: support '--untracked-files=all' if configured
  2022-03-31 16:02         ` [PATCH v6 0/2] Support untracked cache with " Tao Klerks via GitGitGadget
  2022-03-31 16:02           ` [PATCH v6 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall Tao Klerks via GitGitGadget
@ 2022-03-31 16:02           ` Tao Klerks via GitGitGadget
  1 sibling, 0 replies; 22+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-03-31 16:02 UTC (permalink / raw)
  To: git
  Cc: Tao Klerks, Ævar Arnfjörð Bjarmason, Tao Klerks,
	Tao Klerks

From: Tao Klerks <tao@klerks.biz>

Untracked cache was originally designed to only work with
"--untracked-files=normal", and is bypassed when
"--untracked-files=all" is requested, but this causes performance
issues for UI tooling that wants to see "all" on a frequent basis.

On the other hand, the conditions that altogether prevented
applicability to the "all" mode no longer seem to apply, after
several major refactors in recent years; this possibility was
discussed in
81153d02-8e7a-be59-e709-e90cd5906f3a@jeffhostetler.com and
CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com,
and somewhat confirmed experimentally by several users using a
version of this patch to use untracked cache with -uall for about a
year.

When 'git status' runs without using the untracked cache, on a large
repo, on windows, with fsmonitor, it can run very slowly. This can
make GUIs that need to use "-uall" (and therefore currently bypass
untracked cache) unusable when fsmonitor is enabled, on such large
repos.

To partially address this, align the supported directory flags for the
stored untracked cache data with the git config. If a user specifies
an '--untracked-files=' commandline parameter that does not align with
their 'status.showuntrackedfiles' config value, then the untracked
cache will be ignored - as it is for other unsupported situations like
when a pathspec is specified.

If the previously stored flags no longer match the current
configuration, but the currently-applicable flags do match the current
configuration, then discard the previously stored untracked cache
data.

For most users there will be no change in behavior. Users who need
'--untracked-files=all' to perform well will now have the option of
setting "status.showuntrackedfiles" to "all" for better / more
consistent performance.

Users who need '--untracked-files=all' to perform well for their
tooling AND prefer to avoid the verbosity of "all" when running
git status explicitly without options... are out of luck for now (no
change).

Users who have the "status.showuntrackedfiles" config set to "all"
and yet frequently explicitly call
'git status --untracked-files=normal' (and use the untracked cache)
are the only ones who will be disadvantaged by this change. Their
"--untracked-files=normal" calls will, after this change, no longer
use the untracked cache.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
 dir.c                             | 88 +++++++++++++++++++++++++------
 t/t7063-status-untracked-cache.sh | 81 ++++++++++++++++++++++++++++
 2 files changed, 153 insertions(+), 16 deletions(-)

diff --git a/dir.c b/dir.c
index f2b0f242101..26c4d141ab9 100644
--- a/dir.c
+++ b/dir.c
@@ -2747,13 +2747,33 @@ static void set_untracked_ident(struct untracked_cache *uc)
 	strbuf_addch(&uc->ident, 0);
 }
 
-static void new_untracked_cache(struct index_state *istate)
+static unsigned new_untracked_cache_flags(struct index_state *istate)
+{
+	struct repository *repo = istate->repo;
+	char *val;
+
+	/*
+	 * This logic is coordinated with the setting of these flags in
+	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
+	 * of the config setting in commit.c#git_status_config()
+	 */
+	if (!repo_config_get_string(repo, "status.showuntrackedfiles", &val) &&
+	    !strcmp(val, "all"))
+		return 0;
+
+	/*
+	 * The default, if "all" is not set, is "normal" - leading us here.
+	 * If the value is "none" then it really doesn't matter.
+	 */
+	return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+}
+
+static void new_untracked_cache(struct index_state *istate, int flags)
 {
 	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
 	strbuf_init(&uc->ident, 100);
 	uc->exclude_per_dir = ".gitignore";
-	/* should be the same flags used by git-status */
-	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	uc->dir_flags = flags >= 0 ? flags : new_untracked_cache_flags(istate);
 	set_untracked_ident(uc);
 	istate->untracked = uc;
 	istate->cache_changed |= UNTRACKED_CHANGED;
@@ -2762,11 +2782,11 @@ static void new_untracked_cache(struct index_state *istate)
 void add_untracked_cache(struct index_state *istate)
 {
 	if (!istate->untracked) {
-		new_untracked_cache(istate);
+		new_untracked_cache(istate, -1);
 	} else {
 		if (!ident_in_untracked(istate->untracked)) {
 			free_untracked_cache(istate->untracked);
-			new_untracked_cache(istate);
+			new_untracked_cache(istate, -1);
 		}
 	}
 }
@@ -2814,17 +2834,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	if (base_len || (pathspec && pathspec->nr))
 		return NULL;
 
-	/* Different set of flags may produce different results */
-	if (dir->flags != dir->untracked->dir_flags ||
-	    /*
-	     * See treat_directory(), case index_nonexistent. Without
-	     * this flag, we may need to also cache .git file content
-	     * for the resolve_gitlink_ref() call, which we don't.
-	     */
-	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
-	    /* We don't support collecting ignore files */
-	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
-			   DIR_COLLECT_IGNORED)))
+	/* We don't support collecting ignore files */
+	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
+			DIR_COLLECT_IGNORED))
 		return NULL;
 
 	/*
@@ -2847,6 +2859,50 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 		return NULL;
 	}
 
+	/*
+	 * If the untracked structure we received does not have the same flags
+	 * as requested in this run, we're going to need to either discard the
+	 * existing structure (and potentially later recreate), or bypass the
+	 * untracked cache mechanism for this run.
+	 */
+	if (dir->flags != dir->untracked->dir_flags) {
+		/*
+		 * If the untracked structure we received does not have the same flags
+		 * as configured, then we need to reset / create a new "untracked"
+		 * structure to match the new config.
+		 *
+		 * Keeping the saved and used untracked cache consistent with the
+		 * configuration provides an opportunity for frequent users of
+		 * "git status -uall" to leverage the untracked cache by aligning their
+		 * configuration - setting "status.showuntrackedfiles" to "all" or
+		 * "normal" as appropriate.
+		 *
+		 * Previously using -uall (or setting "status.showuntrackedfiles" to
+		 * "all") was incompatible with untracked cache and *consistently*
+		 * caused surprisingly bad performance (with fscache and fsmonitor
+		 * enabled) on Windows.
+		 *
+		 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
+		 * to not be as bound up with the desired output in a given run,
+		 * and instead iterated through and stored enough information to
+		 * correctly serve both "modes", then users could get peak performance
+		 * with or without '-uall' regardless of their
+		 * "status.showuntrackedfiles" config.
+		 */
+		if (dir->untracked->dir_flags != new_untracked_cache_flags(istate)) {
+			free_untracked_cache(istate->untracked);
+			new_untracked_cache(istate, dir->flags);
+			dir->untracked = istate->untracked;
+		}
+		else {
+			/*
+			 * Current untracked cache data is consistent with config, but not
+			 * usable in this request/run; just bypass untracked cache.
+			 */
+			return NULL;
+		}
+	}
+
 	if (!dir->untracked->root) {
 		/* Untracked cache existed but is not initialized; fix that */
 		FLEX_ALLOC_STR(dir->untracked->root, name, "");
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index b89be8dc6d6..9936cc329ec 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -222,6 +222,87 @@ test_expect_success 'untracked cache remains after bypass' '
 	test_cmp ../dump.expect ../actual
 '
 
+test_expect_success 'if -uall is configured, untracked cache gets populated by default' '
+	test_config status.showuntrackedfiles all &&
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
+	test_cmp ../status_uall.expect ../status.iuc &&
+	test_cmp ../status_uall.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
+	cat >../trace.expect <<EOF &&
+ ....path:
+ ....node-creation:3
+ ....gitignore-invalidation:1
+ ....directory-invalidation:0
+ ....opendir:4
+EOF
+	test_cmp ../trace.expect ../trace.relevant
+'
+
+cat >../dump_uall.expect <<EOF &&
+info/exclude $EMPTY_BLOB
+core.excludesfile $ZERO_OID
+exclude_per_dir .gitignore
+flags 00000000
+/ $ZERO_OID recurse valid
+three
+/done/ $ZERO_OID recurse valid
+/dthree/ $ZERO_OID recurse valid
+three
+/dtwo/ $ZERO_OID recurse valid
+two
+EOF
+
+test_expect_success 'if -uall was configured, untracked cache is populated' '
+	test-tool dump-untracked-cache >../actual &&
+	test_cmp ../dump_uall.expect ../actual
+'
+
+test_expect_success 'if -uall is configured, untracked cache is used by default' '
+	test_config status.showuntrackedfiles all &&
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+	git status --porcelain >../actual &&
+	iuc status --porcelain >../status.iuc &&
+	test_cmp ../status_uall.expect ../status.iuc &&
+	test_cmp ../status_uall.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
+	cat >../trace.expect <<EOF &&
+ ....path:
+ ....node-creation:0
+ ....gitignore-invalidation:0
+ ....directory-invalidation:0
+ ....opendir:0
+EOF
+	test_cmp ../trace.expect ../trace.relevant
+'
+
+# Bypassing the untracked cache here is not desirable from an
+# end-user perspective, but is expected in the current design.
+# The untracked cache data stored for a -all run cannot be
+# correctly used in a -unormal run - it would yield incorrect
+# output.
+test_expect_success 'if -uall is configured, untracked cache is bypassed with -unormal' '
+	test_config status.showuntrackedfiles all &&
+	: >../trace.output &&
+	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+	git status -unormal --porcelain >../actual &&
+	iuc status -unormal --porcelain >../status.iuc &&
+	test_cmp ../status.expect ../status.iuc &&
+	test_cmp ../status.expect ../actual &&
+	get_relevant_traces ../trace.output ../trace.relevant &&
+	cat >../trace.expect <<EOF &&
+ ....path:
+EOF
+	test_cmp ../trace.expect ../trace.relevant
+'
+
+test_expect_success 'repopulate untracked cache for -unormal' '
+	git status --porcelain
+'
+
 test_expect_success 'modify in root directory, one dir invalidation' '
 	: >four &&
 	test-tool chmtime =-240 four &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-03-31 16:02 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-23  6:44 [PATCH] Support untracked cache with '--untracked-files=all' if configured Tao Klerks via GitGitGadget
2022-02-25 17:52 ` [PATCH v2] untracked-cache: support " Tao Klerks via GitGitGadget
2022-02-25 19:43   ` Junio C Hamano
2022-02-27 11:21     ` Tao Klerks
2022-02-27 19:54       ` Junio C Hamano
2022-02-27 11:22   ` [PATCH v3] " Tao Klerks via GitGitGadget
2022-02-27 14:39     ` Tao Klerks
2022-02-27 15:13     ` [PATCH v4] " Tao Klerks via GitGitGadget
2022-02-28 14:02       ` Ævar Arnfjörð Bjarmason
2022-03-02  8:47         ` Tao Klerks
2022-03-29 11:25       ` [PATCH v5 0/2] Support untracked cache with " Tao Klerks via GitGitGadget
2022-03-29 11:25         ` [PATCH v5 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall Tao Klerks via GitGitGadget
2022-03-29 16:51           ` Junio C Hamano
2022-03-30  4:46             ` Tao Klerks
2022-03-30 16:39               ` Junio C Hamano
2022-03-31  5:15                 ` Tao Klerks
2022-03-29 11:25         ` [PATCH v5 2/2] untracked-cache: support '--untracked-files=all' if configured Tao Klerks via GitGitGadget
2022-03-29 17:43           ` Junio C Hamano
2022-03-30 19:59             ` Tao Klerks
2022-03-31 16:02         ` [PATCH v6 0/2] Support untracked cache with " Tao Klerks via GitGitGadget
2022-03-31 16:02           ` [PATCH v6 1/2] untracked-cache: test untracked-cache-bypassing behavior with -uall Tao Klerks via GitGitGadget
2022-03-31 16:02           ` [PATCH v6 2/2] untracked-cache: support '--untracked-files=all' if configured Tao Klerks via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).