From: Ben Peart <peartben@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
"Junio C Hamano" <gitster@pobox.com>,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
"Johannes Schindelin" <johannes.schindelin@gmx.de>,
"David Turner" <David.Turner@twosigma.com>,
"Jeff King" <peff@peff.net>,
"Christian Couder" <christian.couder@gmail.com>,
"Ben Peart" <benpeart@microsoft.com>
Subject: Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
Date: Wed, 7 Jun 2017 21:57:49 -0400
Message-ID: <468f6b07-f61f-d4ca-a8df-d0a80302b251@gmail.com>
In-Reply-To: <CACBZZX6D8oC34qat7kdrDOWC5eYm-DRkMWG9eOPPvKKsQtgPyw@mail.gmail.com>
On 6/7/2017 5:46 PM, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Jun 7, 2017 at 9:51 PM, Ben Peart <peartben@gmail.com> wrote:
>>
>>
>> On 6/2/2017 7:06 PM, Ævar Arnfjörð Bjarmason wrote:
>>>
>>>
>>> I don't have time to update the perf test now or dig into it, but most
>>> of what you're describing in this mail doesn't at all match with the
>>> ad-hoc tests I ran in
>>>
>>> https://public-inbox.org/git/CACBZZX5e58bWuf3NdDYTxu2KyZj29hHONzN=rp-7vXd8nURyWQ@mail.gmail.com/
>>>
>>> There (at the very end of the E-Mail) I'm running watchman in a tight
>>> loop while I flush the entire fs cache, its runtime is never longer
>>> than 600ms, with 3ms being the norm.
>>
>>
>> I added a perf trace around the entire query-fsmonitor hook proc (patch
>> below) to measure the total actual impact of running the hook script +
>> querying watchman + parsing the output with perl + passing the result back
>> to git. On my machine, the total cost of the hook runs between 130 ms and
>> 180 ms when there are zero changes to report (i.e. the best case).
>>
>> With short status times, the overhead of watchman simply outweighs any gains
>> in performance - especially when you have a warm file system cache as that
>> cancels out the biggest win of avoiding the IO associated with scanning the
>> working directory.
>>
>>
>> diff --git a/fsmonitor.c b/fsmonitor.c
>> index 763a8a3a3f..cb47f31863 100644
>> --- a/fsmonitor.c
>> +++ b/fsmonitor.c
>> @@ -210,9 +210,11 @@ void refresh_by_fsmonitor(struct index_state *istate)
>>          * If we have a last update time, call query-monitor for the set of
>>          * changes since that time.
>>          */
>> -       if (istate->fsmonitor_last_update)
>> +       if (istate->fsmonitor_last_update) {
>>                 query_success = !query_fsmonitor(HOOK_INTERFACE_VERSION,
>>                         istate->fsmonitor_last_update, &query_result);
>> +               trace_performance_since(last_update, "query-fsmonitor");
>> +       }
>>
>>         if (query_success) {
>>                 /* Mark all entries returned by the monitor as dirty */
>>
>>
>>
>>>
>>> I.e. flushing the cache doesn't slow things down much at all compared
>>> to how long a "git status" takes from cold cache. Something else must
>>> be going on, and the smoking gun is the gprof output I posted in the
>>> follow-up E-Mail:
>>>
>>> https://public-inbox.org/git/CACBZZX4eZ3G8LQ8O+_BkbkJ-ZXTOkUi9cW=QKYjfHKtmA3pgrA@mail.gmail.com/
>>>
>>> There with the fsmonitor we end up calling blk_SHA1_Block ~100K times
>>> during "status", but IIRC (I don't have the output in front of me,
>>> this is from memory) something like twenty times without the
>>> fsmonitor.
>>>
>>> It can't be a coincidence that with the fscache:
>>>
>>> $ pwd; git ls-files | wc -l
>>> /home/avar/g/linux
>>> 59844
>>>
>>> And you can see that in the fsmonitor "git status" we make exactly
>>> that many calls to cache_entry_from_ondisk(), but those calls don't
>>> show up at all in the non-fscache codepath.
>>>
>>
>> I don't see how the gprof numbers for the non-fsmonitor case can be correct.
>> It appears they don't contain any calls related to loading the index while
>> the fsmonitor gprof numbers do. Here is a typical call stack:
>>
>> git.exe!cache_entry_from_ondisk()
>> git.exe!create_from_disk()
>> git.exe!do_read_index()
>> git.exe!read_index_from()
>> git.exe!read_index()
>>
>> During read_index(), cache_entry_from_ondisk() gets called for every item in
>> the index (which explains the 59K calls). How can the non-fsmonitor
>> codepath not be loading the index?
>>
>>> So, again, I haven't dug and really must step away from the computer
>>> now, but this really looks like the fscache saves us the recursive
>>> readdir() / lstat() etc, but in return we somehow fall through to a
>>> codepath where we re-read the entire on-disk state back into the
>>> index, which we don't do in the non-fscache codepath.
>>>
>>
>> I've run multiple profiles and compared them with fsmonitor on and off and
>> have been unable to find any performance regression caused by fsmonitor
>> (other than flagging the index as dirty at times when it isn't required
>> which I have fixed for the next patch series).
>>
>> I have done many performance runs and when I subtract the _actual_ time
>> spent in the hook from the overall command time, it comes in at slightly
>> less time than when status is run with fsmonitor off. This also leads me to
>> believe there is no regression with fsmonitor on.
>>
>> All this leads me back to my original conclusion: the reason status is
>> slower in these specific cases is because the overhead of calling the hook
>> exceeds the savings gained. If your status calls are taking less than a
>> second, it just doesn't make sense to add the complexity and overhead of
>> calling a file system watcher.
>>
>> I'm working on an updated perf test that will demonstrate the best case
>> scenario (warm watchman, cold file system cache) in addition to the worst
>> case (cold watchman, warm file system cache). The reality is that in normal
>> use cases, perf will be between the two. I'll add that to the next iteration
>> of the patch series.
>
> I'll try to dig further once we have the next submission + that perf test.
>
> On Linux the time spent calling the hook itself is minimal:
>
> $ touch foo; time .git/hooks/query-fsmonitor 1 $(($(date +%s) *
> 1000000000 - 10))
> Watchman says these changed: foo
> foo
> real 0m0.009s
> user 0m0.004s
> sys 0m0.000s
>

Wow, what a difference:

$ touch foo; time .git/hooks/query-fsmonitor 1 $(($(date +%s) * 1000000000 - 10))
.git/hooks/query-fsmonitor 1 1496885562999999990
.gitfoo
real 0m0.170s
user 0m0.045s
sys 0m0.091s

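
For reference, a minimal sketch of how the second argument in the command
above is put together. The first argument (1) is the hook interface version
and the second is a time in nanoseconds since the epoch.
fsmonitor_since_ns is a hypothetical helper name, not something the hook
provides, and the sketch assumes a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Hypothetical helper (not part of the sample hook): build the "since"
# argument as nanoseconds since the epoch, the same way the command
# above does with $(( $(date +%s) * 1000000000 - 10 )).
# Assumes the shell evaluates $(( )) with 64-bit integers.
fsmonitor_since_ns() {
	secs=$1      # seconds since the epoch, e.g. from `date +%s`
	back_ns=$2   # how many nanoseconds to look back from that second
	echo $(( secs * 1000000000 - back_ns ))
}

# Equivalent to the invocation above, spelled out:
#   .git/hooks/query-fsmonitor 1 "$(fsmonitor_since_ns "$(date +%s)" 10)"
```

With 1496885563 seconds and a 10 ns look-back this gives
1496885562999999990, the value shown in the output above.
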
> So I'm fairly sure something else entirely is going on, but anyway, we
> can look at that later with the next submission.
>
> Aside from that, I wonder on Windows how much speedier this could be
> for you if the entire hook was written in perl instead of a
> shellscript that calls perl. I could help with that if you'd like.
>

Anything we can do to speed this up and have fewer moving pieces is all
good. I'm not a Perl expert, so I'd very much appreciate your expertise.
I also have a couple of notes from Wez (author of Watchman) on ways to
better optimize certain scenarios, which I'll pass along as well.

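
As a rough illustration of why shaving the hook cost matters, here is a
sketch of the break-even arithmetic discussed earlier in the thread. The
170 ms and 9 ms hook costs are the two timings measured in this thread; the
scan-savings figures are made-up placeholders, and fsmonitor_net_gain_ms is
a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: net time saved per "git status" call, in ms.
#   scan_saved_ms  working-directory scan time that fsmonitor avoids
#   hook_ms        total cost of running the query-fsmonitor hook
# A negative result means the hook costs more than it saves.
fsmonitor_net_gain_ms() {
	scan_saved_ms=$1
	hook_ms=$2
	echo $(( scan_saved_ms - hook_ms ))
}

# Placeholder scan savings combined with the hook costs from this thread:
#   fsmonitor_net_gain_ms 100 170    prints -70 (hook outweighs the savings)
#   fsmonitor_net_gain_ms 1000 170   prints 830
#   fsmonitor_net_gain_ms 100 9      prints 91
```

The same placeholder savings of 100 ms turn from a loss into a gain once the
hook cost drops from 170 ms to 9 ms, which is why fewer processes in the
hook would help.
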
> You can likely replace that call to uname with reading the $^O variable.
>
> What you're doing shelling out to cygpath can probably be done by
> loading the File::Spec library, but I'm not sure.
>
> The echo / watchman / perl would need to be an IPC::Open3 invocation I think.
>
> I think it being in shellscript is fine, just a suggestion in case
> having it all in one process (aside from the watchman shell-out) would
> help on Windows.
>