git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Ben Peart <peartben@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: "Brandon Williams" <bmwill@google.com>,
	"Ben Peart" <benpeart@microsoft.com>,
	"Git Mailing List" <git@vger.kernel.org>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v1] dir.c: don't flag the index as dirty for changes to the untracked cache
Date: Wed, 7 Feb 2018 09:13:28 -0500	[thread overview]
Message-ID: <0039c71c-cefd-4950-aa7c-ffbb7cf66e49@gmail.com> (raw)
In-Reply-To: <CACsJy8DLP=j-h3knwX9zOpejAfUbv1YJwfB-iw4476oy0hTfxg@mail.gmail.com>



On 2/6/2018 7:27 AM, Duy Nguyen wrote:
> 
> This is another thing that bugs me. I know you're talking about huge
> index files, but at what size should we start this sort of
> optimization? Writing down a few MBs on linux is cheap enough that I
> won't bother optimizing (and I get my UNTR extension repaired all the
> time, so reduced lstat calls and stuff). This "typically" only comes
> at certain size, what size?
> 

It's important to identify what we're trading off here.  With the 
proposed optimization, we'll end up doing less writes of the index but 
potentially more lstat calls.  Even with a small index, writing the 
index is much more expensive than calling lstat on a file.  Exactly how 
much more expensive depends on a lot of variables but even with a SSD 
its still orders of magnitude difference.

That means we could potentially lstat hundreds or thousands of files and 
still come out ahead.  Given the untracked cache works at the directory 
level, the lstat cost actually scales with the number of directories 
that have had changes (ie changing a 2nd file in the same directory 
doesn't add any additional cost).  In "typical" workflows, developers 
don't often change hundreds of files across different directories so 
we'd "typically" still come out ahead.

We have internal performance tests based on common developer sequences 
of commands (status, edit a file, status, add, status, commit, log, 
push, etc) that show that having the untracked cache turned on actually 
makes these sequences slower.  At the time, we didn't have time to 
investigate why so we just turned off the untracked cache.

When writing the fsmonitor patch series which can utilize the untracked 
cache, I did some investigation into why the untracked cache was slowing 
things down in these tests and discovered the cause was the additional 
index writes.  That is what led to this patch.

I've been sitting on it for months now because I didn't have the time to 
write some performance tests for the git perf framework to demonstrate 
the problem and how this helps solve it.  With the discussion about the 
fsmonitor extension, I thought this might be a good time to send it out 
there.

If you have the cycles, time a typical series of commands with and 
without the untracked cache (ie don't just call status over and over in 
a loop) and you should see the same results.  Given my limited time 
right now, I'm OK putting this on the back burner again until a git perf 
test can be written to ensure it actually speeds things up as advertised.

  parent reply	other threads:[~2018-02-07 14:13 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-05 19:56 [PATCH v1] dir.c: don't flag the index as dirty for changes to the untracked cache Ben Peart
2018-02-05 20:55 ` Junio C Hamano
2018-02-06  1:39   ` Ben Peart
2018-02-05 21:58 ` Brandon Williams
2018-02-06  1:48   ` Ben Peart
2018-02-06 12:27     ` Duy Nguyen
2018-02-06 12:55       ` Duy Nguyen
2018-02-07 10:59         ` Duy Nguyen
2018-02-07 13:46           ` Ben Peart
2018-02-06 14:50       ` Junio C Hamano
2018-02-07 14:13       ` Ben Peart [this message]
2018-02-12 10:20         ` Duy Nguyen
2018-02-12 17:57           ` Ben Peart
2018-02-13  9:57             ` Duy Nguyen
2018-02-08 10:33 ` Jeff King
2018-02-28 21:27   ` Junio C Hamano
2018-03-01  7:42     ` Jeff King
2018-03-01 12:35       ` Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0039c71c-cefd-4950-aa7c-ffbb7cf66e49@gmail.com \
    --to=peartben@gmail.com \
    --cc=avarab@gmail.com \
    --cc=benpeart@microsoft.com \
    --cc=bmwill@google.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).