From: Ben Peart <peartben@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: "Brandon Williams" <bmwill@google.com>,
"Ben Peart" <benpeart@microsoft.com>,
"Git Mailing List" <git@vger.kernel.org>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v1] dir.c: don't flag the index as dirty for changes to the untracked cache
Date: Wed, 7 Feb 2018 09:13:28 -0500 [thread overview]
Message-ID: <0039c71c-cefd-4950-aa7c-ffbb7cf66e49@gmail.com> (raw)
In-Reply-To: <CACsJy8DLP=j-h3knwX9zOpejAfUbv1YJwfB-iw4476oy0hTfxg@mail.gmail.com>
On 2/6/2018 7:27 AM, Duy Nguyen wrote:
>
> This is another thing that bugs me. I know you're talking about huge
> index files, but at what size should we start this sort of
> optimization? Writing down a few MBs on linux is cheap enough that I
> won't bother optimizing (and I get my UNTR extension repaired all the
> time, so reduced lstat calls and stuff). This "typically" only comes
> at certain size, what size?
>
It's important to identify what we're trading off here. With the
proposed optimization, we'll end up doing less writes of the index but
potentially more lstat calls. Even with a small index, writing the
index is much more expensive than calling lstat on a file. Exactly how
much more expensive depends on a lot of variables but even with a SSD
its still orders of magnitude difference.
That means we could potentially lstat hundreds or thousands of files and
still come out ahead. Given the untracked cache works at the directory
level, the lstat cost actually scales with the number of directories
that have had changes (ie changing a 2nd file in the same directory
doesn't add any additional cost). In "typical" workflows, developers
don't often change hundreds of files across different directories so
we'd "typically" still come out ahead.
We have internal performance tests based on common developer sequences
of commands (status, edit a file, status, add, status, commit, log,
push, etc) that show that having the untracked cache turned on actually
makes these sequences slower. At the time, we didn't have time to
investigate why so we just turned off the untracked cache.
When writing the fsmonitor patch series which can utilize the untracked
cache, I did some investigation into why the untracked cache was slowing
things down in these tests and discovered the cause was the additional
index writes. That is what led to this patch.
I've been sitting on it for months now because I didn't have the time to
write some performance tests for the git perf framework to demonstrate
the problem and how this helps solve it. With the discussion about the
fsmonitor extension, I thought this might be a good time to send it out
there.
If you have the cycles, time a typical series of commands with and
without the untracked cache (ie don't just call status over and over in
a loop) and you should see the same results. Given my limited time
right now, I'm OK putting this on the back burner again until a git perf
test can be written to ensure it actually speeds things up as advertised.
next prev parent reply other threads:[~2018-02-07 14:13 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-05 19:56 [PATCH v1] dir.c: don't flag the index as dirty for changes to the untracked cache Ben Peart
2018-02-05 20:55 ` Junio C Hamano
2018-02-06 1:39 ` Ben Peart
2018-02-05 21:58 ` Brandon Williams
2018-02-06 1:48 ` Ben Peart
2018-02-06 12:27 ` Duy Nguyen
2018-02-06 12:55 ` Duy Nguyen
2018-02-07 10:59 ` Duy Nguyen
2018-02-07 13:46 ` Ben Peart
2018-02-06 14:50 ` Junio C Hamano
2018-02-07 14:13 ` Ben Peart [this message]
2018-02-12 10:20 ` Duy Nguyen
2018-02-12 17:57 ` Ben Peart
2018-02-13 9:57 ` Duy Nguyen
2018-02-08 10:33 ` Jeff King
2018-02-28 21:27 ` Junio C Hamano
2018-03-01 7:42 ` Jeff King
2018-03-01 12:35 ` Ben Peart
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0039c71c-cefd-4950-aa7c-ffbb7cf66e49@gmail.com \
--to=peartben@gmail.com \
--cc=avarab@gmail.com \
--cc=benpeart@microsoft.com \
--cc=bmwill@google.com \
--cc=git@vger.kernel.org \
--cc=pclouds@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).