git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Ben Peart <peartben@gmail.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Christian Couder <christian.couder@gmail.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Ben Peart <benpeart@microsoft.com>,
	Alex Vandiver <alexmv@dropbox.com>,
	git@vger.kernel.org
Subject: Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache
Date: Wed, 7 Feb 2018 11:59:58 -0500	[thread overview]
Message-ID: <c755e388-89a5-fc0f-f872-16fd5d5686b0@gmail.com> (raw)
In-Reply-To: <20180207092141.4312-2-pclouds@gmail.com>



On 2/7/2018 4:21 AM, Nguyễn Thái Ngọc Duy wrote:
> read_directory() code ignores all paths named ".git" even if it's not
> a valid git repository. See treat_path() for details. Since ".git" is
> basically invisible to read_directory(), when we are asked to
> invalidate a path that contains ".git", we can safely ignore it
> because the slow path would not consider it anyway.
> 
> This helps when fsmonitor is used and we have a real ".git" repo at
> worktree top. Occasionally .git/index will be updated and if the
> fsmonitor hook does not filter it, untracked cache is asked to
> invalidate the path ".git/index".
> 
> Without this patch, we invalidate the root directory unncessarily,
> which:
> 
> - makes read_directory() fall back to slow path for root directory
>    (slower)
> 
> - makes the index dirty (because UNTR extension is updated). Depending
>    on the index size, writing it down could also be slow.
> 

Thank you again, this patch makes much more sense to me.

> A note about the new "safe_path" knob. Since this new check could be
> relatively expensive, avoid it when we know it's not needed. If the
> path comes from the index, it can't contain ".git". If it does
> contain, we may be screwed up at many more levels, not just this one.
> 

I do have a simplifying suggestion to make.  I noticed that other uses 
of verify_path() check when the potentially erroneous path is passed in 
and then all the underlying code can assume it is valid.  I think that 
makes sense here as well and it makes for a smaller patch.


> diff --git a/fsmonitor.h b/fsmonitor.h
> index cd3cc0ccf2..65f3743636 100644
> --- a/fsmonitor.h
> +++ b/fsmonitor.h
> @@ -65,7 +65,7 @@ static inline void mark_fsmonitor_invalid(struct index_state *istate, struct cac
>   {
>   	if (core_fsmonitor) {
>   		ce->ce_flags &= ~CE_FSMONITOR_VALID;
> -		untracked_cache_invalidate_path(istate, ce->name);
> +		untracked_cache_invalidate_path(istate, ce->name, 1);

This test isn't needed because we're pulling the name right out of the 
cache entry so it doesn't need to be verified.

Here is a modified version of your patch for consideration:

================

read_directory() code ignores all paths named ".git" even if it's not
a valid git repository. See treat_path() for details. Since ".git" is
basically invisible to read_directory(), when we are asked to
invalidate a path that contains ".git", we can safely ignore it
because the slow path would not consider it anyway.

This helps when fsmonitor is used and we have a real ".git" repo at
worktree top. Occasionally .git/index will be updated and if the
fsmonitor hook does not filter it, untracked cache is asked to
invalidate the path ".git/index".

Without this patch, we invalidate the root directory unnecessarily,
which:

- makes read_directory() fall back to slow path for root directory
   (slower)

- makes the index dirty (because UNTR extension is updated). Depending
   on the index size, writing it down could also be slow.

Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
---

Notes:
     Base Ref: master
     Web-Diff: https://github.com/benpeart/git/commit/218a577618
     Checkout: git fetch https://github.com/benpeart/git verify_path-v1 
&& git checkout 218a577618

  dir.c                       |  2 +-
  fsmonitor.c                 |  6 +++++-
  t/t7519-status-fsmonitor.sh | 39 +++++++++++++++++++++++++++++++++++++++
  3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 7c4b45e30e..d431da46f5 100644
--- a/dir.c
+++ b/dir.c
@@ -1773,7 +1773,7 @@ static enum path_treatment treat_path(struct 
dir_struct *dir,
  	if (!de)
  		return treat_path_fast(dir, untracked, cdir, istate, path,
  				       baselen, pathspec);
-	if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
+	if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git"))
  		return path_none;
  	strbuf_setlen(path, baselen);
  	strbuf_addstr(path, de->d_name);
diff --git a/fsmonitor.c b/fsmonitor.c
index 0af7c4edba..019576f306 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -118,8 +118,12 @@ static int query_fsmonitor(int version, uint64_t 
last_update, struct strbuf *que

  static void fsmonitor_refresh_callback(struct index_state *istate, 
const char *name)
  {
-	int pos = index_name_pos(istate, name, strlen(name));
+	int pos;

+	if (!verify_path(name))
+		return;
+
+	pos = index_name_pos(istate, name, strlen(name));
  	if (pos >= 0) {
  		struct cache_entry *ce = istate->cache[pos];
  		ce->ce_flags &= ~CE_FSMONITOR_VALID;
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index eb2d13bbcf..756beb0d8e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -314,4 +314,43 @@ test_expect_success 'splitting the index results in 
the same state' '
  	test_cmp expect actual
  '

+test_expect_success UNTRACKED_CACHE 'ignore .git changes when 
invalidating UNTR' '
+	test_create_repo dot-git &&
+	(
+		cd dot-git &&
+		mkdir -p .git/hooks &&
+		: >tracked &&
+		: >modified &&
+		mkdir dir1 &&
+		: >dir1/tracked &&
+		: >dir1/modified &&
+		mkdir dir2 &&
+		: >dir2/tracked &&
+		: >dir2/modified &&
+		write_integration_script &&
+		git config core.fsmonitor .git/hooks/fsmonitor-test &&
+		git update-index --untracked-cache &&
+		git update-index --fsmonitor &&
+		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \
+		git status &&
+		test-dump-untracked-cache >../before
+	) &&
+	cat >>dot-git/.git/hooks/fsmonitor-test <<-\EOF &&
+	printf ".git\0"
+	printf ".git/index\0"
+	printf "dir1/.git\0"
+	printf "dir1/.git/index\0"
+	EOF
+	(
+		cd dot-git &&
+		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \
+		git status &&
+		test-dump-untracked-cache >../after
+	) &&
+	grep "directory invalidation" trace-before >>before &&
+	grep "directory invalidation" trace-after >>after &&
+	# UNTR extension unchanged, dir invalidation count unchanged
+	test_cmp before after
+'
+
  test_done

base-commit: 5be1f00a9a701532232f57958efab4be8c959a29
-- 
2.15.0.windows.1


  reply	other threads:[~2018-02-07 17:00 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-27  0:28 Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
2018-01-27  1:36 ` Duy Nguyen
2018-01-27  1:39   ` [PATCH] trace: measure where the time is spent in the index-heavy operations Nguyễn Thái Ngọc Duy
2018-01-27 11:58     ` Thomas Gummerer
2018-01-27 12:27       ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-01-27 11:43   ` Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
2018-01-27 12:39     ` Duy Nguyen
2018-01-27 13:09       ` Duy Nguyen
2018-01-27 19:01         ` Ævar Arnfjörð Bjarmason
2018-01-30 22:41           ` Ben Peart
2018-01-29  9:40     ` Duy Nguyen
2018-01-29 23:16       ` Ben Peart
2018-02-01 10:40         ` Duy Nguyen
2018-01-28 20:44 ` Johannes Schindelin
2018-01-28 22:28   ` Ævar Arnfjörð Bjarmason
2018-01-30  1:21     ` Ben Peart
2018-01-31 10:15       ` Duy Nguyen
2018-02-04  9:38         ` [PATCH] dir.c: ignore paths containing .git when invalidating untracked cache Nguyễn Thái Ngọc Duy
2018-02-05 17:44           ` Ben Peart
2018-02-06 12:02             ` Duy Nguyen
2018-02-07  9:21           ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-02-07  9:21             ` Nguyễn Thái Ngọc Duy
2018-02-07 16:59               ` Ben Peart [this message]
2018-02-13 10:00                 ` Duy Nguyen
2018-02-13 17:57                   ` Junio C Hamano
2018-02-14  1:24                     ` Duy Nguyen
2018-02-14  8:00                       ` Junio C Hamano
2018-01-30 22:57 ` Some rough edges of core.fsmonitor Ben Peart
2018-01-30 23:16   ` Ævar Arnfjörð Bjarmason
2018-01-31 16:12     ` Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c755e388-89a5-fc0f-f872-16fd5d5686b0@gmail.com \
    --to=peartben@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=alexmv@dropbox.com \
    --cc=avarab@gmail.com \
    --cc=benpeart@microsoft.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).