git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Alex Vandiver via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Derrick Stolee <stolee@gmail.com>,
	Utsav Shah <utsav@dropbox.com>,
	Nipunn Koorapati <nipunn1313@gmail.com>,
	Alex Vandiver <alexmv@dropbox.com>
Subject: Re: [PATCH 1/4] fsmonitor: use fsmonitor data in `git diff`
Date: Sat, 17 Oct 2020 15:25:01 -0700	[thread overview]
Message-ID: <xmqqeelw8p8i.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <13fd992a375e30e8c7b0953a128e149951dee0ea.1602968677.git.gitgitgadget@gmail.com> (Alex Vandiver via GitGitGadget's message of "Sat, 17 Oct 2020 21:04:33 +0000")

"Alex Vandiver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Alex Vandiver <alexmv@dropbox.com>
>
> With fsmonitor enabled, the first call to match_stat_with_submodule
> calls refresh_fsmonitor, incurring the overhead of reading the list of
> updated files -- but run_diff_files does not respect the
> CE_FSMONITOR_VALID flag.

run_diff_files() is used not just by "git diff" but other things
like "git add", so if we get an overall speed-up without having to
pay undue cost, that would be a very good news.

> diff --git a/diff-lib.c b/diff-lib.c
> index f95c6de75f..b7ee1b89ef 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -97,6 +97,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  
>  	diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/");
>  
> +	refresh_fsmonitor(istate);
> +

"git diff" and friends are often run with pathspec, but the API into
the fsmonitor, refresh_fsmonitor() call, has no way to say "I only
am interested in the status of this directory and everything else
does not matter".  How expensive would this call to accept fsmonitor
data for the entire tree be, and would there eventually be a point
where the number of paths we are interested in checking (i.e. the
paths that would match the pathspec) is so small that we would be
better off not making this call?  E.g. if we are checking more than
20% of the working tree, running refresh_fsmonitor() for the entire
working tree is still a win, but if we are only checking less than
that, we are better off without fsmonitor, or does a tradeoff like
that exist?

> @@ -197,8 +199,19 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  		if (ce_uptodate(ce) || ce_skip_worktree(ce))
>  			continue;
>  
> -		/* If CE_VALID is set, don't look at workdir for file removal */
> -		if (ce->ce_flags & CE_VALID) {
> +		/*
> +		 * If CE_VALID is set, the user has promised us that the workdir
> +		 * hasn't changed compared to index, so don't stat workdir
> +		 * for file removal

The above seems to be an attempt to elaborate on the existing
comment, but ...

> +		 *  eg - via git udpate-index --assume-unchanged
> +		 *  eg - via core.ignorestat=true

... what are these two lines doing here?  It makes no sense to say
"Don't stat workdir for file removal by doing 'git update-index' or
by seetting core.ignorestat", but the placement of these two lines
makes it look as if that is what you are saying.  Perhaps

	When CE_VALID is set (via "update-index --assume-unchanged"
	or via adding paths while core.ignorestat is set to true),
	the user has promised ..., so don't stat workdir for removed
	files.

would probably be what you meant bo say.

> +		 * When using FSMONITOR:
> +		 * If CE_FSMONITOR_VALID is set, then we know the metadata on disk
> +		 * has not changed since the last refresh, and we can skip the
> +		 * file-removal checks without doing the stat in check_removed.

An iffy description.  You skip all the file-removal check by not
calling check_removed() as a whole.

This is not the fault of this patch, but in any case, the
description places too much stress on "removal" when in reality,
removal is not all that special in this codepath.  The check_removed
call also contributes to noticiing modified (not removed) files.  If
we are updating the comment here, we should correct that too,
perhaps

	When CE_VALID is set (via "update-index --assume-unchanged"
	or via adding paths while core.ignorestat is set to true),
	the user has promised that the working tree file for that
	path will not be modified.  When CE_FSMONITOR_VALID is true,
	the fsmonitor knows that the path hasn't been modified since
	we refreshed the cached stat information.  In either case,
	we do not have to stat to see if the path has been removed
	or modified.

or something like that, perhaps.

> +		 */
> +		if (ce->ce_flags & CE_VALID || ce->ce_flags & CE_FSMONITOR_VALID) {

Would it become easier to read, if written like this instead?

		if (ce->ce_flags & (CE_VALID | CE_FSMONITOR_VALID)) {

That reflects what the suggested comment says better.

>  			changed = 0;
>  			newmode = ce->ce_mode;
>  		} else {

Thanks.

  reply	other threads:[~2020-10-17 22:25 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-17 21:04 [PATCH 0/4] use fsmonitor data in git diff eliminating O(num_files) calls to lstat Nipunn Koorapati via GitGitGadget
2020-10-17 21:04 ` [PATCH 1/4] fsmonitor: use fsmonitor data in `git diff` Alex Vandiver via GitGitGadget
2020-10-17 22:25   ` Junio C Hamano [this message]
2020-10-18  0:54     ` Nipunn Koorapati
2020-10-18  4:17       ` Taylor Blau
2020-10-18  5:02         ` Junio C Hamano
2020-10-18 23:43           ` Taylor Blau
2020-10-19 17:23             ` Junio C Hamano
2020-10-19 17:37               ` Taylor Blau
2020-10-19 18:07                 ` Nipunn Koorapati
2020-10-17 21:04 ` [PATCH 2/4] t/perf/README: elaborate on output format Nipunn Koorapati via GitGitGadget
2020-10-17 21:04 ` [PATCH 3/4] t/perf/p7519-fsmonitor.sh: warm cache on first git status Nipunn Koorapati via GitGitGadget
2020-10-18  4:22   ` Taylor Blau
2020-10-17 21:04 ` [PATCH 4/4] t/perf: add fsmonitor perf test for git diff Nipunn Koorapati via GitGitGadget
2020-10-17 22:28   ` Junio C Hamano
2020-10-19 21:35 ` [PATCH v2 0/4] use fsmonitor data in git diff eliminating O(num_files) calls to lstat Nipunn Koorapati via GitGitGadget
2020-10-19 21:35   ` [PATCH v2 1/4] fsmonitor: use fsmonitor data in `git diff` Alex Vandiver via GitGitGadget
2020-10-19 21:35   ` [PATCH v2 2/4] t/perf/README: elaborate on output format Nipunn Koorapati via GitGitGadget
2020-10-19 21:35   ` [PATCH v2 3/4] t/perf/p7519-fsmonitor.sh: warm cache on first git status Nipunn Koorapati via GitGitGadget
2020-10-19 21:35   ` [PATCH v2 4/4] t/perf: add fsmonitor perf test for git diff Nipunn Koorapati via GitGitGadget
2020-10-19 21:43     ` Taylor Blau
2020-10-19 21:54     ` Taylor Blau
2020-10-19 22:00       ` Nipunn Koorapati
2020-10-19 22:02         ` Taylor Blau
2020-10-19 22:25       ` Nipunn Koorapati
2020-10-19 22:47   ` [PATCH v3 0/7] use fsmonitor data in git diff eliminating O(num_files) calls to lstat Nipunn Koorapati via GitGitGadget
2020-10-19 22:47     ` [PATCH v3 1/7] fsmonitor: use fsmonitor data in `git diff` Alex Vandiver via GitGitGadget
2020-10-19 22:47     ` [PATCH v3 2/7] t/perf/README: elaborate on output format Nipunn Koorapati via GitGitGadget
2020-10-19 22:47     ` [PATCH v3 3/7] t/perf/p7519-fsmonitor.sh: warm cache on first git status Nipunn Koorapati via GitGitGadget
2020-10-19 22:47     ` [PATCH v3 4/7] t/perf: add fsmonitor perf test for git diff Nipunn Koorapati via GitGitGadget
2020-10-19 22:47     ` [PATCH v3 5/7] perf lint: check test-lint-shell-syntax in perf tests Nipunn Koorapati via GitGitGadget
2020-10-20  2:38       ` Taylor Blau
2020-10-20  3:10         ` Junio C Hamano
2020-10-20  3:15           ` Taylor Blau
2020-10-20 10:16             ` Nipunn Koorapati
2020-10-20 10:09         ` Nipunn Koorapati
2020-10-19 22:47     ` [PATCH v3 6/7] p7519-fsmonitor: refactor to avoid code duplication Nipunn Koorapati via GitGitGadget
2020-10-20  2:43       ` Taylor Blau
2020-10-19 22:47     ` [PATCH v3 7/7] p7519-fsmonitor: add a git add benchmark Nipunn Koorapati via GitGitGadget
2020-10-19 23:02       ` Nipunn Koorapati
2020-10-20  2:40       ` Taylor Blau
2020-10-20 13:40     ` [PATCH v4 0/7] use fsmonitor data in git diff eliminating O(num_files) calls to lstat Nipunn Koorapati via GitGitGadget
2020-10-20 13:40       ` [PATCH v4 1/7] fsmonitor: use fsmonitor data in `git diff` Alex Vandiver via GitGitGadget
2020-10-20 13:40       ` [PATCH v4 2/7] t/perf/README: elaborate on output format Nipunn Koorapati via GitGitGadget
2020-10-20 13:41       ` [PATCH v4 3/7] t/perf/p7519-fsmonitor.sh: warm cache on first git status Nipunn Koorapati via GitGitGadget
2020-10-20 13:41       ` [PATCH v4 4/7] t/perf: add fsmonitor perf test for git diff Nipunn Koorapati via GitGitGadget
2020-10-20 13:41       ` [PATCH v4 5/7] perf lint: add make test-lint to perf tests Nipunn Koorapati via GitGitGadget
2020-10-20 22:06         ` Taylor Blau
2020-10-20 22:17           ` Nipunn Koorapati
2020-10-20 22:19             ` Taylor Blau
2020-10-20 13:41       ` [PATCH v4 6/7] p7519-fsmonitor: refactor to avoid code duplication Nipunn Koorapati via GitGitGadget
2020-10-20 13:41       ` [PATCH v4 7/7] p7519-fsmonitor: add a git add benchmark Nipunn Koorapati via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqeelw8p8i.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=alexmv@dropbox.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=nipunn1313@gmail.com \
    --cc=stolee@gmail.com \
    --cc=utsav@dropbox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).