git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Ben Peart <peartben@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"David Turner" <David.Turner@twosigma.com>,
	"Jeff King" <peff@peff.net>,
	"Christian Couder" <christian.couder@gmail.com>,
	"Ben Peart" <benpeart@microsoft.com>
Subject: Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
Date: Wed, 7 Jun 2017 21:57:49 -0400	[thread overview]
Message-ID: <468f6b07-f61f-d4ca-a8df-d0a80302b251@gmail.com> (raw)
In-Reply-To: <CACBZZX6D8oC34qat7kdrDOWC5eYm-DRkMWG9eOPPvKKsQtgPyw@mail.gmail.com>



On 6/7/2017 5:46 PM, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Jun 7, 2017 at 9:51 PM, Ben Peart <peartben@gmail.com> wrote:
>>
>>
>> On 6/2/2017 7:06 PM, Ævar Arnfjörð Bjarmason wrote:
>>>
>>>
>>> I don't have time to update the perf test now or dig into it, but most
>>> of what you're describing in this mail doesn't at all match with the
>>> ad-hoc tests I ran in
>>>
>>> https://public-inbox.org/git/CACBZZX5e58bWuf3NdDYTxu2KyZj29hHONzN=rp-7vXd8nURyWQ@mail.gmail.com/
>>>
>>> There (at the very end of the E-Mail) I'm running watchman in a tight
>>> loop while I flush the entire fs cache, its runtime is never longer
>>> than 600ms, with 3ms being the norm.
>>
>>
>> I added a perf trace around the entire query-fsmonitor hook proc (patch
>> below) to measure the total actual impact of running the hook script +
>> querying watchman + parsing the output with perl + passing the result back
>> to git. On my machine, the total cost of the hook runs between 130 ms and
>> 180 ms when there are zero changes to report (ie best case).
>>
>> With short status times, the overhead of watchman simply outweighs any gains
>> in performance - especially when you have a warm file system cache as that
>> cancels out the biggest win of avoiding the IO associated with scanning the
>> working directory.
>>
>>
>> diff --git a/fsmonitor.c b/fsmonitor.c
>> index 763a8a3a3f..cb47f31863 100644
>> --- a/fsmonitor.c
>> +++ b/fsmonitor.c
>> @@ -210,9 +210,11 @@ void refresh_by_fsmonitor(struct index_state *istate)
>>           * If we have a last update time, call query-monitor for the set of
>>           * changes since that time.
>>           */
>> -       if (istate->fsmonitor_last_update)
>> +       if (istate->fsmonitor_last_update) {
>>                  query_success = !query_fsmonitor(HOOK_INTERFACE_VERSION,
>>                          istate->fsmonitor_last_update, &query_result);
>> +               trace_performance_since(last_update, "query-fsmonitor");
>> +       }
>>
>>          if (query_success) {
>>                  /* Mark all entries returned by the monitor as dirty */
>>
>>
>>
>>>
>>> I.e. flushing the cache doesn't slow things down much at all compared
>>> to how long a "git status" takes from cold cache. Something else must
>>> be going on, and the smoking gun is the gprof output I posted in the
>>> follow-up E-Mail:
>>>
>>> https://public-inbox.org/git/CACBZZX4eZ3G8LQ8O+_BkbkJ-ZXTOkUi9cW=QKYjfHKtmA3pgrA@mail.gmail.com/
>>>
>>> There with the fsmonitor we end up calling blk_SHA1_Block ~100K times
>>> during "status", but IIRC (I don't have the output in front of me,
>>> this is from memory) something like twenty times without the
>>> fsmonitor.
>>>
>>> It can't be a coincidence that with the fscache:
>>>
>>> $ pwd; git ls-files | wc -l
>>> /home/avar/g/linux
>>> 59844
>>>
>>> And you can see that in the fsmonitor "git status" we make exactly
>>> that many calls to cache_entry_from_ondisk(), but those calls don't
>>> show up at all in the non-fscache codepath.
>>>
>>
>> I don't see how the gprof numbers for the non-fsmonitor case can be correct.
>> It appears they don't contain any calls related to loading the index while
>> the fsmonitor gprof numbers do.  Here is a typical call stack:
>>
>> git.exe!cache_entry_from_ondisk()
>> git.exe!create_from_disk()
>> git.exe!do_read_index()
>> git.exe!read_index_from()
>> git.exe!read_index()
>>
>> During read_index(), cache_entry_from_ondisk() gets called for every item in
>> the index (which explains the 59K calls).  How can the non-fsmonitor
>> codepath not be loading the index?
>>
>>> So, again, I haven't dug and really must step away from the computer
>>> now, but this really looks like the fscache saves us the recursive
>>> readdir() / lstat() etc, but in return we somehow fall though to a
>>> codepath where we re-read the entire on-disk state back into the
>>> index, which we don't do in the non-fscache codepath.
>>>
>>
>> I've run multiple profiles and compared them with fsmonitor on and off and
>> have been unable to find any performance regression caused by fsmonitor
>> (other than flagging the index as dirty at times when it isn't required
>> which I have fixed for the next patch series).
>>
>> I have done many performance runs and when I subtract the _actual_ time
>> spent in the hook from the overall command time, it comes in at slightly
>> less time than when status is run with fsmonitor off.  This also leads me to
>> believe there is no regression with fsmonitor on.
>>
>> All this leads me back to my original conclusion: the reason status is
>> slower in these specific cases is because the overhead of calling the hook
>> exceeds the savings gained. If your status calls are taking less than a
>> second, it just doesn't make sense to add the complexity and overhead of
>> calling a file system watcher.
>>
>> I'm working on an updated perf test that will demonstrate the best case
>> scenario (warm watchman, cold file system cache) in addition to the worst
>> case (cold watchman, warm file system cache).  The reality is that in normal
>> use cases, perf will be between the two. I'll add that to the next iteration
>> of the patch series.
> 
> I'll try to dig further once we have the next submission + that perf test.
> 
> On Linux the time spent calling the hook itself is minimal:
> 
>      $ touch foo; time .git/hooks/query-fsmonitor 1 $(($(date +%s) *
> 1000000000 - 10))
>      Watchman says these changed: foo
>      foo
>      real    0m0.009s
>      user    0m0.004s
>      sys     0m0.000s
> 

Wow, what a difference:

$ touch foo; time .git/hooks/query-fsmonitor 1 $(($(date +%s) *
 > 1000000000 - 10))
.git/hooks/query-fsmonitor 1 1496885562999999990
.gitfoo
real    0m0.170s
user    0m0.045s
sys     0m0.091s


> So I'm fairly sure something else entirely is going on, but anyway, we
> can look at that later with the next submission.
> 
> Aside from that, I wonder on Windows how much speedier this could be
> for you if the entire hook was written in perl instead of a
> shellscript that calls perl. I could help with that if you'd like.
> 

Anything we can do to speed this up and have fewer moving pieces is all 
good.  I'm not a perl expert so I'd very much appreciate your expertise. 
  I have a couple of notes from Wez (author of Watchman) on things we 
could do to better optimize certain scenarios I'll pass along as well.

> You can likely replace that call to uname with reading the $^O variable.
> 
> What you're doing shelling out to cygpath can probably be done by
> loading the File::Spec library, but I'm not sure.
> 
> The echo / watchman / perl would need to be an IPC::Open3 invocation I think.
> 
> I think it being in shellscript is fine, just a suggestion in case
> having it all in one process (aside from the watchman shell-out) would
> help or Windows.
> 

  reply	other threads:[~2017-06-08  1:57 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-01 15:50 [PATCH v4 0/6] Fast git status via a file system watcher Ben Peart
2017-06-01 15:51 ` [PATCH v4 1/6] bswap: add 64 bit endianness helper get_be64 Ben Peart
2017-06-01 15:51 ` [PATCH v4 2/6] dir: make lookup_untracked() available outside of dir.c Ben Peart
2017-06-01 15:51 ` [PATCH v4 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
2017-06-01 15:51 ` [PATCH v4 4/6] fsmonitor: add test cases for fsmonitor extension Ben Peart
2017-06-01 15:51 ` [PATCH v4 5/6] fsmonitor: add documentation for the " Ben Peart
2017-06-01 15:51 ` [PATCH v4 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman Ben Peart
2017-06-07 21:38   ` Ævar Arnfjörð Bjarmason
2017-06-01 19:57 ` [PATCH v4 0/6] Fast git status via a file system watcher Ævar Arnfjörð Bjarmason
2017-06-01 21:06   ` Ben Peart
2017-06-01 21:12     ` Ævar Arnfjörð Bjarmason
2017-06-01 21:13     ` Stefan Beller
2017-06-01 21:26       ` Jeff King
2017-06-01 20:51 ` Ævar Arnfjörð Bjarmason
2017-06-01 21:13   ` Ævar Arnfjörð Bjarmason
2017-06-02  0:40     ` Ben Peart
2017-06-02 10:28       ` [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor Ævar Arnfjörð Bjarmason
2017-06-02 21:44         ` David Turner
2017-06-03 18:08           ` Ævar Arnfjörð Bjarmason
2017-06-05 14:27           ` Ben Peart
2017-06-02 22:05         ` Ben Peart
2017-06-02 23:06           ` Ævar Arnfjörð Bjarmason
2017-06-07 19:51             ` Ben Peart
2017-06-07 21:46               ` Ævar Arnfjörð Bjarmason
2017-06-08  1:57                 ` Ben Peart [this message]
2017-06-04  1:59         ` Junio C Hamano
2017-06-04  7:46           ` Ævar Arnfjörð Bjarmason
2017-06-04  8:21             ` Jeff King
2017-06-02  1:56 ` [PATCH v4 0/6] Fast git status via a file system watcher Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=468f6b07-f61f-d4ca-a8df-d0a80302b251@gmail.com \
    --to=peartben@gmail.com \
    --cc=David.Turner@twosigma.com \
    --cc=avarab@gmail.com \
    --cc=benpeart@microsoft.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).