git@vger.kernel.org mailing list mirror (one of many)
From: Tao Klerks <tao@klerks.biz>
To: git@vger.kernel.org
Subject: Question about fsmonitor and --untracked-files=all
Date: Tue, 22 Sep 2020 13:35:42 +0200
Message-ID: <CAPMMpoj+UhKCW_k34-cGkiWFghOOu13GhPgA0V-y4ZpLVppuiA@mail.gmail.com>

Hi folks,

I've got a couple of questions about the "fsmonitor" functionality,
untracked files, and multithreading.

Background:

In a repo with:
 * A couple hundred thousand tracked files, and a couple hundred
   thousand .gitignored files, across a few thousand directories
 * The --untracked-cache setting, tested and working
 * core.fsmonitor set up with watchman (with the sample integration
   script from January)
 * Git version 2.27.0.windows.1
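
For reference, a setup like the one listed above could be sketched
roughly as follows. This is an assumption about the configuration, not
a record of the exact commands used: "configure_fsmonitor" is a
hypothetical helper name, the hook is assumed to be an unmodified copy
of the "fsmonitor-watchman.sample" file that git places in .git/hooks,
and watchman itself must already be installed.

```shell
# configure_fsmonitor REPO: enable the untracked cache and the watchman hook.
# Hypothetical helper for illustration; not part of git.
configure_fsmonitor() {
    cf_repo=$1
    cf_hooks="$cf_repo/.git/hooks"

    # Activate the sample hook if the git templates installed it
    # (assumption: the sample is used as shipped).
    if [ -f "$cf_hooks/fsmonitor-watchman.sample" ]; then
        cp "$cf_hooks/fsmonitor-watchman.sample" "$cf_hooks/fsmonitor-watchman"
        chmod +x "$cf_hooks/fsmonitor-watchman"
    fi

    # Remember untracked-file results per directory between runs.
    git -C "$cf_repo" config core.untrackedCache true
    # Point git at the hook that reports which paths changed.
    git -C "$cf_repo" config core.fsmonitor .git/hooks/fsmonitor-watchman
}
```

Usage would be "configure_fsmonitor /path/to/repo". Running
"git update-index --test-untracked-cache" inside the repo afterwards
checks whether the filesystem supports the untracked cache.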

"git status" takes about 2s
"git status --untracked-files=all" takes about 20s

When I turn off "core.fsmonitor", the numbers change to something like:
"git status": 8s
"git status --untracked-files=all": 9s
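
To reproduce measurements like these, a small wrapper along the
following lines could be used. It is only a sketch: "time_cmd" is a
hypothetical helper name, not part of git, and it reports whole seconds
via "date +%s", which is coarse but sufficient for differences of this
size.

```shell
# time_cmd LABEL CMD [ARGS...]
# Run CMD with its output discarded and print "LABEL: <seconds>s".
# Returns the exit status of CMD.
time_cmd() {
    tc_label=$1
    shift
    tc_start=$(date +%s)
    "$@" >/dev/null 2>&1
    tc_status=$?
    tc_end=$(date +%s)
    echo "$tc_label: $((tc_end - tc_start))s"
    return "$tc_status"
}
```

Run from inside the repository, once with core.fsmonitor set and once
without:

    time_cmd "status" git status
    time_cmd "status -uall" git status --untracked-files=all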

Using Windows' "procmon" to observe git.exe's behavior from outside, I
think I've understood a couple of things that surprise me:
1. When you specify "--untracked-files=all", git scans the entire
folder tree regardless of the "fsmonitor" hook.
2. When you specify the "fsmonitor" hook, git does any
filesystem scanning in a single-threaded fashion (as opposed to
multi-threaded, as it normally does without "fsmonitor").

These two things combine so that with "fsmonitor" set, normal
command-line git status performance is great, but the performance in
tools that eagerly look for untracked files (like "Git Extensions" on
Windows) actually suffers: it takes twice as long to run the 'git -c
diff.ignoreSubModules=none status --porcelain=2 -z
--untracked-files=all' command that this UI wants (and blocks on, when
you go to a commit dialog).

Questions:

1. Is there a reason "--untracked-files=all" causes a full directory
tree scan even with the "fsmonitor" hook active, or is this
accidental?
2. Assuming that the full directory tree scan is indeed necessary even
with "fsmonitor" (when requesting all untracked files), could it be
made multithreaded?

(My apologies for the simplistic "outside-in" observations; I don't
feel qualified to attempt to understand the git source code.)

Thanks for any help understanding the optimization opportunities here!

Tao Klerks
