Hi Johannes,

Thanks for the detailed reply! My apologies for the subsequent delay - I've been trying to understand the behavior so that I can describe it in more detail, and that required me to learn a little C along the way :)

On Fri, Jun 11, 2021 at 11:49 AM Johannes Schindelin wrote:
> _sometimes_, the mtime of a directory seems not to
> be updated immediately after an item in it was modified/added/deleted. And
> that mtime is precisely what the untracked cache depends on.
>
> The funny thing is: while the output of `git status` will therefore at
> first fail to pick up on, say, a new untracked file, running `git status`
> _immediately_ afterwards _will succeed_ to see that untracked file.

(I have nothing to add here - I don't understand what synchronous-acknowledgement or ordering guarantees we rely on to determine when we *expect* a change to be available to the fsmonitor, nor do I have any familiarity with the underlying APIs we use)

> On Thu, 10 Jun 2021, Tao Klerks wrote:
> > - There is also a lingering "problem" with "git status -uall", with
> > both "core.useBuiltinFSMonitor" and "core.fsmonitor", but that seems
> > far less trivial to address
>
> Interesting. I guess the untracked cache might become too clunky with many
> untracked files? Or is there something else going on?

I think I understand this now, but I don't think I can explain it particularly well/succinctly. I've attached a sort of "truth table" of performance data for a particular repository, running *warm* git status calls (no cold-cache testing at all) with various config and command-line options, and reporting the durations of various phases/processes captured using GIT_TRACE2_PERF=1.
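For anyone wanting to build a similar table, here is a rough sketch of how the matrix of "git status" invocations could be generated. It only builds the command lines and does not run git. The choice of which config keys to toggle (including core.untrackedCache) and the helper names status_command and command_matrix are assumptions for illustration, not the procedure used for the attached data.

```python
import itertools
import shlex

# Boolean config keys toggled per run. Which keys to vary is an assumption;
# core.fscache only has an effect in Git for Windows.
TOGGLES = ["core.fscache", "core.useBuiltinFSMonitor", "core.untrackedCache"]


def status_command(settings, uall):
    """Build one `git status` command line with per-invocation -c overrides."""
    cmd = ["git"]
    for key, enabled in settings.items():
        cmd += ["-c", f"{key}={'true' if enabled else 'false'}"]
    cmd.append("status")
    if uall:
        cmd.append("-uall")
    return cmd


def command_matrix():
    """Yield every combination of the toggles, with and without -uall."""
    for values in itertools.product([False, True], repeat=len(TOGGLES)):
        settings = dict(zip(TOGGLES, values))
        for uall in (False, True):
            yield status_command(settings, uall)


if __name__ == "__main__":
    for cmd in command_matrix():
        # Run each printed line twice and keep the second (warm) timing.
        print("GIT_TRACE2_PERF=1", shlex.join(cmd))
```

Since toggling these settings between runs can invalidate cached state, each printed command would need to be run at least twice, with only the second run treated as a warm measurement.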
The claimed "problem" with "git status -uall", when both "core.useBuiltinFSMonitor" and "core.fsmonitor" are enabled, only exists from the perspective of someone who's got core.fscache enabled all the time:

* core.fscache works (in the Windows port only) by doing slightly more expensive work up-front on the first directory query within a request/process lifetime, and then intercepting subsequent filesystem "queries"/operations
* The context within which most of this more-expensive work typically occurs, in a "git status" request, is an explicitly and intentionally multi-threaded filesystem "lstat"-checking process in preload-index.c (always 20 threads, for a large repo)
* There are two sets of filesystem iteration in a simple/naive git status call with core.preloadindex enabled as by default - the multithreaded lstat-checking loops for files in the index, and the recursive directory scanning for untracked files that happens later
* That second (not-multithreaded) set of work, with fscache enabled, gets to reuse a bunch of cached fs data from the first one. A walk that cost 7s without fscache now costs only 2.5s, for example.
* With fsmonitor enabled (and warm), the first walk simply doesn't happen, so fscache stops making any difference to that untracked-file-searching directory walk; it goes back to taking 7s; every directory is queried once in series, so fscache has no impact at all.
* Because the preload-index lstat-querying loop is parallelized, with fscache the initial cache population happens fast-ish - the total time taken for git status is e.g. 5s (2.5s of parallelized & accelerated lstat-querying and 2.5s of untracked-folder-iterating-and-processing-from-fscache)
* So, with fscache enabled, turning on fsmonitor actually makes "git status" *slower* - it changes the "lstat + untracked" time from "2.5s + 2.5s" to "0s + 7s"
* We can hide/mitigate that by enabling the untracked cache - but that "fails out" in all sorts of conditions specified in dir.c validate_untracked_cache(), including specifying "-uall"/"-u" to "git status"
* -> it is only in the context of fscache being enabled, and already speeding up the filesystem work, that fsmonitor can make things worse by making the first directory-querying loop happen in a non-parallel area of code, and thereby cancel fscache's impact.
* -> fsmonitor never makes things worse on Linux, since there is no fscache and so untracked folder iteration never benefits from any multithreading
* -> when the untracked cache does apply, fsmonitor can *help* it, avoiding the need for any sort of directory walk at all, on the basis that "nothing in the filesystem appears to have changed"

Based on this understanding, it looks like there are at least three possible directions to be explored:

1. Making the untracked cache less sensitive to configuration in dir.c#validate_untracked_cache(), at the cost of doing more work in the cold cases & saving more data in the index file (specific untracked files, and .git folder names or something)
** This would result in peak performance, with no filesystem-iterating at all in the ideal case
** This would apply / add substantial value on Windows and Linux
** This is probably the most complex change - dir.c is not easy to understand/navigate
2. Making the untracked directory-search happen in a multithreaded way
** This would raise the performance with "-uall" to approximately the same as it was before fsmonitor was introduced on Windows, and speed it up slightly on Linux
** This change would probably not be worthwhile - its impact would not be huge except in very specific cases, and it would still introduce non-trivial complexity
3. Forcing preload-index to actually "do its multithreaded work", even when fsmonitor is there, if we know that the untracked cache cannot be used and fsmonitor is enabled
** This would raise the performance with "-uall" to approximately the same as it was before fsmonitor was introduced on Windows
** This change would probably be pretty easy - the main challenge is how to get untracked-cache-applicability information at preload-index time, since these happen in completely different parts of the codebase

I mocked up a local hack for the third option, and confirmed that forcing preload-index to ignore fsmonitor does indeed bring "git status -uall" performance back to the same level as before enabling fsmonitor.

> > I just started testing the new "core.useBuiltinFSMonitor" option in
> > the new installer, and it's amazing, thanks Ben, Alex, Johannes and
> > Kevin!
>
> Not to forget Jeff Hostetler, who essentially spent the past half year on
> it on his own.

Argh, thank you for the correction, and thank you Jeff also for all your work on this.

> > For context, this is in a repo with 200,000 or so files, within 40,000
> > folders (avg path depth 4 I think?), with a reasonably-intricate set
> > of .gitignore patterns.
> My `.gitignore` consists of only ~40 heavily commented lines (containing
> five lines with wildcards), but I do have a `.git/info/exclude` that
> contains a set of generated file/directory lists, i.e. without any
> wildcards. This `exclude` file is ~26k lines long.
>
> My guess is that the amount of work to match the untracked vs ignored
> files is dominating the entire operation, by a lot.

In my case, as per the "truth table" referenced above, there are two kinds of things in play:

1. Filesystem operations are the main thing that matters on Windows
2. Some smaller amount of overhead (0.5-1s in my case) is associated with other work (pattern-matching etc) during untracked file detection, even with untracked cache enabled.

The only way to avoid that work altogether is to have fsmonitor *and* untracked cache working, so that untracked cache can "trust" the fsmonitor results to avoid having to do any work at all. In this ideal situation fscache makes no difference, as there is no fs iteration.

> > I don't know whether this is the right place to report Windows-centric
> > concerns, if not, my apologies.
>
> I would not necessarily call them "Windows-centric", even if yes, at the
> moment the built-in FSMonitor is most easily enabled on Windows (because I
> added that experimental option in Git for Windows' installer, after
> integrating the experimental feature).

Right - now that I understand the situation better, there are three specific ways in which I consider these to be Windows-centric concerns:

* In my experience / in my context at least, the "naive" impact of file operation performance differences makes "git status" more than 10X slower on large repos than on Linux; various optional and/or Windows-specific strategies significantly close that gap
* core.fscache is a Windows-specific strategy for dealing with this, and interacts with other features/strategies in potentially-surprising ways
* The built-in FSMonitor is still only *easily* available on Windows

But, to your point, most of the *solutions* need not be Windows-centric at all. The "best" one, making untracked cache a little more forgiving, would definitely have tangible performance benefits on Linux (and presumably OSX).

Thanks,
Tao
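
P.S. To make the "2.5s + 2.5s" versus "0s + 7s" argument concrete, here is a toy model of the fscache/fsmonitor interaction using the example timings quoted above. The function and constant names are invented for illustration, and the model ignores everything except the two filesystem passes. It assumes fscache is enabled and the untracked cache cannot be used (as with "-uall"). It is a sketch of the reasoning, not a description of git's code.

```python
# Example timings quoted in this thread (seconds); a toy model, not git code.
PARALLEL_LSTAT = 2.5   # preload-index lstat pass, which also fills fscache
WALK_COLD = 7.0        # serial untracked walk with nothing cached
WALK_WARM = 2.5        # the same walk served from a populated fscache


def status_time(fsmonitor, force_preload=False):
    """Estimate 'lstat + untracked' time with fscache on, untracked cache unusable.

    force_preload models option 3: run the preload pass even with fsmonitor.
    """
    preload_runs = force_preload or not fsmonitor
    lstat = PARALLEL_LSTAT if preload_runs else 0.0
    walk = WALK_WARM if preload_runs else WALK_COLD
    return lstat + walk


if __name__ == "__main__":
    print("fscache only:        ", status_time(fsmonitor=False))
    print("fscache + fsmonitor: ", status_time(fsmonitor=True))
    print("option 3 (forced):   ", status_time(fsmonitor=True, force_preload=True))
```

In this model, the untracked walk is only cheap when the parallel preload pass has already populated fscache, which is why skipping that pass (fsmonitor) costs more than it saves, and why forcing it (option 3) restores the earlier total.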