git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Ben Peart <peartben@gmail.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Christian Couder <christian.couder@gmail.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Ben Peart <benpeart@microsoft.com>,
	Alex Vandiver <alexmv@dropbox.com>,
	git@vger.kernel.org
Subject: Re: [PATCH] dir.c: ignore paths containing .git when invalidating untracked cache
Date: Mon, 5 Feb 2018 12:44:05 -0500	[thread overview]
Message-ID: <f002344e-e8ae-84e8-a1aa-45e44c0ccc88@gmail.com> (raw)
In-Reply-To: <20180204093823.3671-1-pclouds@gmail.com>



On 2/4/2018 4:38 AM, Nguyễn Thái Ngọc Duy wrote:
> read_directory() code ignores all paths named ".git" even if it's not
> a valid git repository. See treat_path() for details. Since ".git" is
> basically invisible to read_directory(), when we are asked to
> invalidate a path that contains ".git", we can safely ignore it
> because the slow path would not consider it anyway.
> 
> This helps when fsmonitor is used and we have a real ".git" repo at
> worktree top. Occasionally .git/index will be updated and if the
> fsmonitor hook does not filter it, untracked cache is asked to
> invalidate the path ".git/index".
> 
> Without this patch, we invalidate the root directory unncessarily,
> which:
> 
> - makes read_directory() fall back to slow path for root directory
>    (slower)
> 
> - makes the index dirty (because UNTR extension is updated). Depending
>    on the index size, writing it down could also be slow.
> 
> Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>   Sorry for the resend, I forgot git@vger.
> 
>   dir.c             | 13 ++++++++++++-
>   git-compat-util.h |  2 ++
>   wrapper.c         | 12 ++++++++++++
>   3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/dir.c b/dir.c
> index 7c4b45e30e..f8b4cabba9 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1773,7 +1773,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
>   	if (!de)
>   		return treat_path_fast(dir, untracked, cdir, istate, path,
>   				       baselen, pathspec);
> -	if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
> +	if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git"))
>   		return path_none;
>   	strbuf_setlen(path, baselen);
>   	strbuf_addstr(path, de->d_name);
> @@ -2970,8 +2970,19 @@ static int invalidate_one_component(struct untracked_cache *uc,
>   void untracked_cache_invalidate_path(struct index_state *istate,
>   				     const char *path)
>   {
> +	const char *end;
> +	int skipped;
> +
>   	if (!istate->untracked || !istate->untracked->root)
>   		return;

Thank you for this patch!  It's great to see people helping improve the 
performance of git.

I ran a quick test and this is not sufficient to prevent the index from 
getting flagged as dirty and written out to disk when fsmonitor detects 
changes for files under the .git folder.  What I'm seeing is that when 
".git/index" is returned, the test below doesn't catch them and end 
early as would be expected.

As a result, invalidate_one_component() is called which calls 
invalidate_one_directory() which increments dir_invalidated counter and 
the regular dirty process continues which triggers the index to be 
written to disk unnecessarily.

> +	if (!fspathcmp(path, ".git"))
> +		return;
> +	if (ignore_case)
> +		skipped = skip_caseprefix(path, "/.git", &end);
> +	else
> +		skipped = skip_prefix(path, "/.git", &end);
> +	if (skipped && (*end == '\0' || *end == '/'))
> +		return;

If I replace the above lines with:

	if (ignore_case)
		skipped = skip_caseprefix(path, ".git", &end);
	else
		skipped = skip_prefix(path, ".git", &end);

Then it correctly skips _all_ files under ".git".  I'm not sure if by 
removing the leading slash, I'm breaking some other case but I was not 
able to find where that was expected or needed. Removing the leading 
slash also allows me to remove the fsmpathcmp() call as it is now redundant.

I wondered what would happen if there was some other directory named 
".git" and how that would behave.  With or without this patch and with 
or without the untracked cache, a file "dir1/.git/foo" is ignored by git 
so no change in behavior there either.

>   	invalidate_one_component(istate->untracked, istate->untracked->root,
>   				 path, strlen(path));
>   }
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 68b2ad531e..27e0b761a3 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -484,6 +484,8 @@ static inline int skip_prefix(const char *str, const char *prefix,
>   	return 0;
>   }
>   
> +int skip_caseprefix(const char *str, const char *prefix, const char **out);
> +
>   /*
>    * If the string "str" is the same as the string in "prefix", then the "arg"
>    * parameter is set to the "def" parameter and 1 is returned.
> diff --git a/wrapper.c b/wrapper.c
> index d20356a776..bb888d9401 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -690,3 +690,15 @@ int xgethostname(char *buf, size_t len)
>   		buf[len - 1] = 0;
>   	return ret;
>   }
> +
> +int skip_caseprefix(const char *str, const char *prefix, const char **out)
> +{
> +	do {
> +		if (!*prefix) {
> +			*out = str;
> +			return 1;
> +		}
> +	} while (tolower(*str++) == tolower(*prefix++));
> +
> +	return 0;
> +}
> 

  reply	other threads:[~2018-02-05 17:44 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-27  0:28 Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
2018-01-27  1:36 ` Duy Nguyen
2018-01-27  1:39   ` [PATCH] trace: measure where the time is spent in the index-heavy operations Nguyễn Thái Ngọc Duy
2018-01-27 11:58     ` Thomas Gummerer
2018-01-27 12:27       ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-01-27 11:43   ` Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
2018-01-27 12:39     ` Duy Nguyen
2018-01-27 13:09       ` Duy Nguyen
2018-01-27 19:01         ` Ævar Arnfjörð Bjarmason
2018-01-30 22:41           ` Ben Peart
2018-01-29  9:40     ` Duy Nguyen
2018-01-29 23:16       ` Ben Peart
2018-02-01 10:40         ` Duy Nguyen
2018-01-28 20:44 ` Johannes Schindelin
2018-01-28 22:28   ` Ævar Arnfjörð Bjarmason
2018-01-30  1:21     ` Ben Peart
2018-01-31 10:15       ` Duy Nguyen
2018-02-04  9:38         ` [PATCH] dir.c: ignore paths containing .git when invalidating untracked cache Nguyễn Thái Ngọc Duy
2018-02-05 17:44           ` Ben Peart [this message]
2018-02-06 12:02             ` Duy Nguyen
2018-02-07  9:21           ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-02-07  9:21             ` Nguyễn Thái Ngọc Duy
2018-02-07 16:59               ` Ben Peart
2018-02-13 10:00                 ` Duy Nguyen
2018-02-13 17:57                   ` Junio C Hamano
2018-02-14  1:24                     ` Duy Nguyen
2018-02-14  8:00                       ` Junio C Hamano
2018-01-30 22:57 ` Some rough edges of core.fsmonitor Ben Peart
2018-01-30 23:16   ` Ævar Arnfjörð Bjarmason
2018-01-31 16:12     ` Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f002344e-e8ae-84e8-a1aa-45e44c0ccc88@gmail.com \
    --to=peartben@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=alexmv@dropbox.com \
    --cc=avarab@gmail.com \
    --cc=benpeart@microsoft.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).