git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: Rudy Rigot via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: Rudy Rigot <rudy.rigot@gmail.com>
Subject: Re: [PATCH v2] status: long status advice adapted to recent capabilities
Date: Wed, 2 Nov 2022 15:45:18 -0400	[thread overview]
Message-ID: <8abc5272-4e01-e793-5155-ea116e9ad4fd@jeffhostetler.com> (raw)
In-Reply-To: <pull.1384.v2.git.1667002005494.gitgitgadget@gmail.com>



On 10/28/22 8:06 PM, Rudy Rigot via GitGitGadget wrote:
> From: Rudy Rigot <rudy.rigot@gmail.com>
> 
> Currently, if git-status takes more than 2 seconds for enumerating untracked
> files, a piece of advice is given to the user to consider ignoring untracked
> files. But Git now offers more possibilities to resolve that situation
> (untracked cache, fsmonitor) with different downsides.
> 
> This change is about refreshing that advice message. A new section in the
> documentation is introduced to present the possibilities, and the advice
> message links to it. I'm also introducing tests for this advice message,
> which was untested so far.
> 
> One of the downsides of untracked cache / fsmonitor, is that the first call
> may be long in order to generate the cache, but the user may not know what
> their current configuration is. When collecting feedback from users of our
> very large repo, that's the most common point of confusion that keeps coming
> back: people complain about git status being slow, but are satisfied when
> we inform them that it's being cached and they should run it again to check.
> As a result, the advice message tries to keep them informed of their current
> configuration.

Let me suggest an alternative commit message.  We want to lead with a
"command" -- as in: "make Git do this" or "teach Git to do this".  Then
explain why.  Maybe something like:

     Improve the advice displayed when `git status` is slow because
     of excessive numbers of untracked files.  Update the `git status`
     man page to explain the various configuration options.

     `git status` can be slow when there are a large number of untracked
     files and directories, because Git must search the entire worktree
     to enumerate them.  Previously, Git would print an advice message
     with the elapsed search time and a suggestion to disable the search
     using the `-uno` option.  This suggestion also carried a warning
     that might scare off some users.

     Git can reduce the size and time of the untracked file search when
     the `core.untrackedCache` and `core.fsmonitor` features are enabled
     by caching results from previous `git status` invocations.

     Update the advice to explain the various combinations of additional
     configuration options and refer to (new) documentation in the man
     page that explains it in more detail than what can be printed in an
     advice message.

     Finally, add new tests to verify the new functionality.

Or something like that. :-)


[...]
> diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt
> index 54a4b29b473..3d92e5fd018 100644
> --- a/Documentation/git-status.txt
> +++ b/Documentation/git-status.txt
> @@ -457,6 +457,33 @@ during the write may conflict with other simultaneous processes, causing
>   them to fail. Scripts running `status` in the background should consider
>   using `git --no-optional-locks status` (see linkgit:git[1] for details).
>   
> +UNTRACKED FILES AND STATUS SPEED
> +--------------------------------
> +
> +If your untracked files take an unusual amount of time to enumerate, your
> +repository certainly has a lot of them, and an advice message will display
> +about it. Here are some configurations to consider in order to improve the
> +situation:
> +
> +* Setting the `core.untrackedCache` configuration as `true` will allow for
> +`git status` to keep track of the mtime of folders, in order to cache past
> +`status` results and be sure to only browse folders that changed on subsequent
> +runs, for filesystems that can support it (see linkgit:git-update-index[1]
> +for details).
> +* Used in conjonction with `core.untrackedCache`, setting the `core.fsmonitor`
> +configuration as `true` will allow for `git status` to keep track of what
> +files recently changed, in order to cache past `status` results and be sure
> +to only focus on those files on subsequent runs (see linkgit:git-update-index[1]
> +for details).
> +* If none of the above options are satisfactory, setting the
> +`status.showUntrackedFiles` configuration as `no` will cause `git status`
> +to not attempt to list untracked files anymore, in which case you have to be
> +careful not to forget to add new files yourself.
> +
> +If none of the above solutions are satisfactory, and you are bothered with
> +the advice message, you can disable it by setting the `advice.statusUoption`
> +configuration to `false`.
> +

I hate to suggest a complete rewrite as I myself struggle with
how to phrase things all toooooo often, but let me offer another
starting point and see what you think:

     `git status` can be very slow in large worktrees if/when it
     needs to search for untracked files and directories.  There are
     many configuration options available to speed this up by either
     avoiding the work or making use of cached results from previous
     Git commands.  Since we all work in different ways, there is no
     single optimum set of settings right for everyone.  Here is a
     brief summary of the relevant options to help you choose which
     is right for you.  Each of these settings is independently
     documented elsewhere in more detail, so please refer to them
     for complete details.

     * `-uno` or `status.showUntrackedFiles=false` : just don't search
         and don't report on untracked files.  This is the fastest.
         `git status` will not list the untracked files, so you need
         to be careful to remember if you create any new files and
         manually `git add` them.

     * `advice.statusUoption=false` : search, but don't complain if it
         takes too long.

     * `core.untrackedCache=true` : enable the untracked cache feature
         and only search directories that have been modified since the
         previous `git status` command.  Git remembers the set of
         untracked files within each directory and assumes that if a
         directory has not been modified, then the set of untracked
         file within has not changed.  This is much faster than
         enumerating the contents of every directory, but still not
         without cost, because Git still has to search for the set of
         modified directories.

     * `core.untrackedCache=true` and `core.fsmonitor=true` or
         `core.fsmonitor=<hook_command_pathname>` : enable both the
         untracked cache and FSMonitor features and only search
         directories that have been modified since the previous
         `git status` command.  This is faster than using just the
         untracked cache alone because Git can also avoid searching
         for modified directories.  Git only has to enumerate the
         exact set of directories that have changed recently.

     Note that after you turn on the untracked cache and/or FSMonitor
     features it may take a few `git status` commands for the various
     caches to warm up before you see improved command times.  This is
     normal.

Something like that.  Hope this helps.  (And again, sorry for the
rewrite.)


[...]
> diff --git a/t/t7065-wtstatus-slow.sh b/t/t7065-wtstatus-slow.sh
> new file mode 100755
> index 00000000000..92c053eaa64
> --- /dev/null
> +++ b/t/t7065-wtstatus-slow.sh
> @@ -0,0 +1,40 @@
> +#!/bin/sh
[...]
> +
> +test_done
> \ No newline at end of file

small nit: we should have a final LF at the end of the file.

I'm going to skip over the test cases because I'm running
short on time this afternoon.


[...]
> diff --git a/wt-status.c b/wt-status.c
[...]
>   static void show_merge_in_progress(struct wt_status *s,
>   				   const char *color)
>   {
> @@ -1814,6 +1827,7 @@ static void wt_longstatus_print(struct wt_status *s)
>   {
>   	const char *branch_color = color(WT_STATUS_ONBRANCH, s);
>   	const char *branch_status_color = color(WT_STATUS_HEADER, s);
> +	enum fsmonitor_mode fsm_mode = fsm_settings__get_mode(s->repo);
>   
>   	if (s->branch) {
>   		const char *on_what = _("On branch ");
> @@ -1870,13 +1884,25 @@ static void wt_longstatus_print(struct wt_status *s)
>   		wt_longstatus_print_other(s, &s->untracked, _("Untracked files"), "add");
>   		if (s->show_ignored_mode)
>   			wt_longstatus_print_other(s, &s->ignored, _("Ignored files"), "add -f");
> -		if (advice_enabled(ADVICE_STATUS_U_OPTION) && 2000 < s->untracked_in_ms) {
> -			status_printf_ln(s, GIT_COLOR_NORMAL, "%s", "");
> -			status_printf_ln(s, GIT_COLOR_NORMAL,
> -					 _("It took %.2f seconds to enumerate untracked files. 'status -uno'\n"
> -					   "may speed it up, but you have to be careful not to forget to add\n"
> -					   "new files yourself (see 'git help status')."),
> -					 s->untracked_in_ms / 1000.0);
> +		if (uf_was_slow(s->untracked_in_ms)) {
> +			if (advice_enabled(ADVICE_STATUS_U_OPTION)) {
> +				status_printf_ln(s, GIT_COLOR_NORMAL, "%s", "");
> +				if (s->repo->settings.core_untracked_cache == UNTRACKED_CACHE_WRITE) {
> +					status_printf_ln(s, GIT_COLOR_NORMAL,
> +							_("It took %.2f seconds to enumerate untracked files,\n"
> +							"but this is currently being cached, with fsmonitor %s."),
> +							s->untracked_in_ms / 1000.0,
> +							(fsm_mode > FSMONITOR_MODE_DISABLED) ? "ON" : "OFF");
> +				} else {
> +					status_printf_ln(s, GIT_COLOR_NORMAL,
> +							_("It took %.2f seconds to enumerate untracked files."),
> +							s->untracked_in_ms / 1000.0);
> +				}
> +				status_printf_ln(s, GIT_COLOR_NORMAL,
> +						_("See https://git-scm.com/docs/git-status#_untracked_files_and_status_speed\n"
> +						"for configuration options that may improve that time."));
> +				status_printf_ln(s, GIT_COLOR_NORMAL, "%s", "");
> +			}

I'm not sure I like the various mixture of messages here.  Maybe
it would be better with a single simple message:

     _("It took %.2f seconds to enumerate untracked files.\n"
       "See 'git help status' for information on how to improve this.")

This keeps all of the information in the documentation rather
than having part of it here in the code.

Also, we should refer to the documentation via `git help` rather
than as a link to the website.


Thanks,
Jeff




  reply	other threads:[~2022-11-02 19:45 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-15 13:04 [PATCH] fsmonitor: long status advice adapted to the fsmonitor use case Rudy Rigot via GitGitGadget
2022-10-15 13:08 ` Rudy Rigot
2022-10-17 15:39 ` Jeff Hostetler
2022-10-17 16:59   ` Rudy Rigot
2022-10-20 12:56     ` Jeff Hostetler
2022-10-20 20:17       ` Rudy Rigot
2022-10-24 14:55         ` Jeff Hostetler
2022-10-29  0:06 ` [PATCH v2] status: long status advice adapted to recent capabilities Rudy Rigot via GitGitGadget
2022-11-02 19:45   ` Jeff Hostetler [this message]
2022-11-02 20:34     ` Rudy Rigot
2022-11-02 23:59     ` Taylor Blau
2022-11-03 14:28       ` Rudy Rigot
2022-11-04  8:52         ` Ævar Arnfjörð Bjarmason
2022-11-04 15:33           ` Rudy Rigot
2022-11-04 21:38         ` Taylor Blau
2022-11-02 21:27   ` [PATCH v3] " Rudy Rigot via GitGitGadget
2022-11-04 21:40     ` Taylor Blau
2022-11-07 20:02       ` Derrick Stolee
2022-11-07 23:19         ` Taylor Blau
2022-11-15 16:38       ` Jeff Hostetler
2022-11-07 20:01     ` Derrick Stolee
2022-11-07 20:20       ` Eric Sunshine
2022-11-07 20:31         ` Rudy Rigot
2022-11-10  4:46     ` [PATCH v4] " Rudy Rigot via GitGitGadget
2022-11-10  5:42       ` Eric Sunshine
2022-11-10 17:01         ` Rudy Rigot
2022-11-10 17:30           ` Eric Sunshine
2022-11-10 17:47             ` Rudy Rigot
2022-11-10 20:04       ` [PATCH v5] " Rudy Rigot via GitGitGadget
2022-11-15 16:39         ` Jeff Hostetler
2022-11-15 16:42           ` Rudy Rigot
2022-11-15 17:26         ` Eric Sunshine
2022-11-15 17:45           ` Rudy Rigot
2022-11-15 18:06             ` Eric Sunshine
2022-11-15 18:08               ` Rudy Rigot
2022-11-15 21:19         ` [PATCH v6] " Rudy Rigot via GitGitGadget
2022-11-21  5:06           ` Eric Sunshine
2022-11-21 15:54             ` Rudy Rigot
2022-11-21 16:17               ` Eric Sunshine
2022-11-22 16:52                 ` Rudy Rigot
2022-11-22 17:18                   ` Eric Sunshine
2022-11-22 17:24                     ` Eric Sunshine
2022-11-22 17:29                       ` Rudy Rigot
2022-11-22 17:40                     ` Eric Sunshine
2022-11-22 18:07                       ` Eric Sunshine
2022-11-22 19:19                         ` Rudy Rigot
2022-11-22 19:48                           ` Eric Sunshine
2022-11-22 16:59           ` [PATCH v7] status: modernize git-status "slow untracked files" advice Rudy Rigot via GitGitGadget
2022-11-22 22:07             ` [PATCH v8] " Rudy Rigot via GitGitGadget
2022-11-25  4:58               ` Junio C Hamano
2022-11-29 15:21                 ` Rudy Rigot
2022-11-30  0:51                   ` Rudy Rigot
2022-11-30  0:52               ` [PATCH v9] " Rudy Rigot via GitGitGadget
2022-12-01  6:48                 ` Junio C Hamano
2022-12-01 15:16                   ` Rudy Rigot
2022-12-01 22:45                     ` Junio C Hamano
2022-12-01 22:57                       ` Rudy Rigot
2023-05-11  5:17               ` [PATCH v8] " Eric Sunshine

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8abc5272-4e01-e793-5155-ea116e9ad4fd@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=rudy.rigot@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).