git@vger.kernel.org mailing list mirror (one of many)
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Tao Klerks <tao@klerks.biz>
Cc: git <git@vger.kernel.org>,
	Jeff Hostetler <jeffhost@microsoft.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: Windows: core.useBuiltinFSMonitor without core.untrackedcache - performance hazard?
Date: Fri, 11 Jun 2021 11:49:18 +0200 (CEST)
Message-ID: <nycvar.QRO.7.76.6.2106111122010.57@tvgsbejvaqbjf.bet>
In-Reply-To: <CAPMMpog7bNNPm3suZKu6OppHA+KDYgCfmaxW4HqTAr7_tTVAPQ@mail.gmail.com>

Hi Tao,

thank you for chiming in! It is good to see that more people are dabbling
with the built-in FSMonitor.

On Thu, 10 Jun 2021, Tao Klerks wrote:

> With the new "core.useBuiltinFSMonitor" support in the Windows
> installer, I think this subject is worth calling out explicitly (and
> my apologies if I missed prior discussion):
>
> TL;DR:
>  - I believe "core.untrackedcache" should be enabled by default in
> Windows (and it does not appear to be)

Stolee indicated something similar that matches your observation:
https://lore.kernel.org/git/af7a671c-fa32-6d9a-7d75-65582fdbcf24@gmail.com/

	Interestingly, the untracked cache extension makes a big
	difference here. The performance of the overall behavior is much
	faster if the untracked cache exists (when paired with the builtin
	FS Monitor; it doesn't make a significant difference when FS
	Monitor is disabled).

>  - If a user enables "core.useBuiltinFSMonitor" (eg in the installer)
> in the hopes of getting snappy "git status" on a repo with a large
> deep working tree, they will be *unnecessarily* disappointed if
> "core.untrackedcache" is not enabled

Yes.

And. Unfortunately, there is an "and". I recently got a chance to work
with the Functional Tests of Scalar ("an opinionated repository management
tool" on top of Git, see https://github.com/microsoft/scalar#readme for
more details). Essentially, you can think of that test suite as
integration tests for Git in a large-scale context. And there, I ran into
trouble with the untracked cache on Windows (where it really provides the
most benefit).

The gist of it is that _sometimes_, the mtime of a directory seems not to
be updated immediately after an item in it was modified/added/deleted. And
that mtime is precisely what the untracked cache depends on.

The funny thing is: while the output of `git status` will therefore at
first fail to pick up on, say, a new untracked file, running `git status`
again _immediately_ afterwards _will_ see that untracked file. So
there is something fishy going on with updating things (it might even be a
foul interaction between the FSCache and the untracked cache, but I have
no evidence to back that up or to disprove it).
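
To make the failure mode concrete, here is a minimal Python sketch of a
cache that trusts a directory's mtime, which is roughly the idea described
above. It is a toy model, not Git's actual untracked cache code: the class
`DirListingCache` and the function `demo` are made up for illustration, and
the delayed mtime update is simulated by putting the old timestamp back
with `os.utime`.

```python
import os
import tempfile


class DirListingCache:
    """Toy mtime-keyed directory cache (hypothetical, not Git's code)."""

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, sorted entry names)

    def listing(self, path):
        mtime_ns = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime_ns:
            # mtime unchanged: trust the cached names without re-reading.
            return cached[1]
        names = sorted(os.listdir(path))
        self._entries[path] = (mtime_ns, names)
        return names


def demo():
    cache = DirListingCache()
    with tempfile.TemporaryDirectory() as root:
        before = os.stat(root)
        first = cache.listing(root)  # directory is empty, result is cached

        open(os.path.join(root, "new.txt"), "w").close()
        # Assumption: simulate an mtime that has not been updated yet by
        # restoring the timestamp the directory had before the file existed.
        os.utime(root, ns=(before.st_atime_ns, before.st_mtime_ns))
        stale = cache.listing(root)  # cache still looks valid

        # Simulate the mtime finally catching up (10 seconds later).
        os.utime(root, ns=(before.st_atime_ns, before.st_mtime_ns + 10**10))
        fresh = cache.listing(root)  # mtime differs, directory is re-read
    return first, stale, fresh
```

Under these assumptions, the second lookup still reports an empty
directory even though `new.txt` exists, and only the lookup after the
timestamp moves sees it. Whether the real problem on Windows has the same
cause is exactly what still needs to be investigated.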

It is one of my big TODOs to look into that. If you have any insights, or
time to investigate, I would be really interested.

>  - There is also a lingering "problem" with "git status -uall", with
> both "core.useBuiltinFSMonitor" and "core.fsmonitor", but that seems
> far less trivial to address

Interesting. I guess the untracked cache might become too clunky with many
untracked files? Or is there something else going on?

> Detail:
>
> I just started testing the new "core.useBuiltinFSMonitor" option in
> the new installer, and it's amazing, thanks Ben, Alex, Johannes and
> Kevin!

Not to forget Jeff Hostetler, who essentially spent the past half year on
it on his own.

> However, when I first enabled it, I was getting slightly *worse* git
> status times than without it... and those worse git status times were
> accompanied by a message along the lines of:
> ---
> It took 5.88 seconds to enumerate untracked files. 'status -uno' may
> speed it up, but you have to be careful not to forget to add new files
> yourself (see 'git help status').
> ---
>
> For context, this is in a repo with 200,000 or so files, within 40,000
> folders (avg path depth 4 I think?), with a reasonably-intricate set
> of .gitignore patterns. Obviously that's not "your average user", but
> I would imagine it matches "the target audience for
> 'core.useBuiltinFSMonitor'" pretty well.

Right. I had a somewhat similar setup, with Git for Windows' SDK, which
consists of ~160k files in ~8k directories.

My `.gitignore` consists of only ~40 heavily commented lines (containing
five lines with wildcards), but I do have a `.git/info/exclude` that
contains a set of generated file/directory lists, i.e. without any
wildcards. This `exclude` file is ~26k lines long.

A cold-cache `git status` takes ~24sec, a warm-cache one ~10sec (with the
built-in FSMonitor daemon now active).

My guess is that the amount of work to match the untracked vs ignored
files is dominating the entire operation, by a lot.

> After a little head-scratching, I recalled an exchange with Johannes
> from last year:
> https://lore.kernel.org/git/CAPMMpohJicVeCaKsPvommYbGEH-D1V02TTMaiVTV8ux+9z9vkQ@mail.gmail.com/
>
> I never did understand the relevant code paths in much detail, but the
> practical conclusions were:
>  - Without "core.untrackedcache" enabled, git ends up iterating
> through the entire path structure of the working tree *even if
> "core.fsmonitor" (and now "core.useBuiltinFSMonitor") is enabled*,
> looking for untracked files to report
>  - Even with "core.untrackedcache" enabled, if "core.fsmonitor" (and
> now "core.useBuiltinFSMonitor") is enabled, git iterates through the
> entire path structure of the working tree *single-threaded* when the
> "--untracked-files" mode is set to "all" (by config or command-line)
>
> Now, I imagine that addressing/improving these behaviors is very
> non-trivial, but the impact could be reasonably limited if:
>  - core.untrackedcache were defaulted to "true", at least under
> Windows, at least when the installer is asked to set
> core.useBuiltinFSMonitor

As soon as I can fix the flakiness of the untracked cache on Windows, I
will do that!
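
The first practical conclusion quoted above (a full walk of the working
tree when there is no untracked cache) can be pictured with a small Python
sketch. It is a simplified model under stated assumptions, not how Git
implements it: `scan` and `demo` are hypothetical names, the cache is a
plain dict keyed by directory path, and a change inside a directory is
simulated by moving its mtime forward with `os.utime`.

```python
import os
import tempfile


def scan(root, cache):
    """Walk the tree; return the directories that had to be re-listed."""
    relisted = []
    stack = [root]
    while stack:
        path = stack.pop()
        mtime_ns = os.stat(path).st_mtime_ns
        entry = cache.get(path)
        if entry is None or entry["mtime_ns"] != mtime_ns:
            # No usable cache entry: read the directory contents again.
            with os.scandir(path) as it:
                subdirs = sorted(
                    e.path for e in it if e.is_dir(follow_symlinks=False)
                )
            entry = {"mtime_ns": mtime_ns, "subdirs": subdirs}
            cache[path] = entry
            relisted.append(path)
        stack.extend(entry["subdirs"])
    return relisted


def demo():
    with tempfile.TemporaryDirectory() as root:
        for name in ("a", "b", "c"):
            os.mkdir(os.path.join(root, name))

        cache = {}
        cold = len(scan(root, cache))  # first pass lists every directory
        warm = len(scan(root, cache))  # nothing changed, nothing re-listed

        # Assumption: pretend something changed inside "b" by moving its
        # mtime forward by 10 seconds.
        b = os.path.join(root, "b")
        st = os.stat(b)
        os.utime(b, ns=(st.st_atime_ns, st.st_mtime_ns + 10**10))
        touched = [os.path.basename(p) for p in scan(root, cache)]

        no_cache = len(scan(root, {}))  # empty cache: full walk again
    return cold, warm, touched, no_cache
```

With a warm cache only the directory whose mtime changed is listed again,
while an empty cache forces all four directories to be read. Note that
this sketch still calls `os.stat` on every directory; avoiding that
remaining per-directory check is presumably where a file system monitor
comes in.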

>  - The "It took N.NN seconds to enumerate untracked files" message
> were to include a hint about core.untrackedcache, at least when the
> "--untracked-files" mode is set to "normal".
>
> Final note: I personally would love to see "core.useBuiltinFSMonitor
> actually makes things slower, when --untracked-files=all is specified"
> behavior be addressed,

Yes, we need to spend some quality time with some perf tools there.

> because common windows git integrations or front-ends like Git
> Extensions or IntelliJ IDEA commonly use those options, and therefore
> "suffer" a performance degradation on at least some operations when
> core.useBuiltinFSMonitor is enabled.
>
> I don't know whether this is the right place to report Windows-centric
> concerns, if not, my apologies.

I would not necessarily call them "Windows-centric", even though, at the
moment, the built-in FSMonitor is most easily enabled on Windows (because I
added that experimental option in Git for Windows' installer, after
integrating the experimental feature).

Instead, I consider this more the type of feedback concerning large
worktrees, and what Git can do to support that use case better.

This holds in particular for the built-in FSMonitor, which already
supports Windows and macOS; hopefully we will find volunteers to work on
the Linux side soon, too. In my mind, the built-in FSMonitor, the untracked
cache, and `git maintenance` are _crucial_ tools to allow Git to scale up.

So: thank you for your wonderful feedback!

Ciao,
Dscho

