From: Matheus Tavares <matheus.bernardino@usp.br>
To: Derrick Stolee <stolee@gmail.com>
Cc: Jeff Hostetler via GitGitGadget <gitgitgadget@gmail.com>,
git <git@vger.kernel.org>,
Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH 0/9] Trace2 stopwatch timers and global counters
Date: Tue, 21 Dec 2021 20:27:58 -0300
Message-ID: <CAHd-oW6ChTb94hDOUzZZCAo5KBu5_QvD8sbpbSb2BQiWsXkMaw@mail.gmail.com>
In-Reply-To: <92923ca0-fbf9-e763-5735-214f3ad0cc3a@gmail.com>
On Tue, Dec 21, 2021 at 11:51 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> >
> > 3. Rationale
> >
> > Timers and counters are an alternative to the existing "region" and "data"
> > events. The latter are intended to trace the major flow (or phases) of the
> > program and possibly capture the amount of work performed within a loop, for
> > example. The former are offered as a way to measure activity that is not
> > localized, such as the time spent in zlib or lstat, which may be called from
> > many different parts of the program.
>
> I'm excited for these API features.
Me too! This would have been very useful in some experiments I had to
run in the past.
Thanks for working on it, Jeff :)
> I also like your attention to thread contexts. I think these timers
> would be very interesting to use in parallel checkout. CC'ing Matheus
> for his thoughts on where he would want timer summaries for that
> feature.
For parallel checkout, I think it would be interesting to have timer
summaries for open/close, fstat/lstat, write, and
inflation/delta-reconstruction. Perhaps pkt-line routines too, so that
we can see how much time we spend in inter-process communication.
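To illustrate what a per-syscall timer summary would record, here is a minimal, self-contained C sketch. It wraps lstat() in a stopwatch and accumulates the total elapsed time and the call count across every call site. It does NOT use the trace2 API from this series. The names `struct stopwatch`, `sw_start()`, `sw_stop()`, `timed_lstat()` and `lstat_timer` are hypothetical, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical accumulator: total nanoseconds and number of timed intervals. */
struct stopwatch {
	const char *name;
	uint64_t total_ns;
	uint64_t count;
	struct timespec start;
};

static uint64_t ts_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * 1000000000ull + (uint64_t)ts->tv_nsec;
}

static void sw_start(struct stopwatch *sw)
{
	clock_gettime(CLOCK_MONOTONIC, &sw->start);
}

static void sw_stop(struct stopwatch *sw)
{
	struct timespec now;
	uint64_t start_ns, end_ns;

	clock_gettime(CLOCK_MONOTONIC, &now);
	start_ns = ts_to_ns(&sw->start);
	end_ns = ts_to_ns(&now);
	assert(end_ns >= start_ns); /* CLOCK_MONOTONIC never goes backwards */
	sw->total_ns += end_ns - start_ns;
	sw->count++;
}

/* One global timer shared by every caller (hypothetical; not thread-safe). */
static struct stopwatch lstat_timer = { .name = "lstat" };

/* Drop-in replacement for lstat() that also feeds the global timer. */
static int timed_lstat(const char *path, struct stat *st)
{
	int ret;

	sw_start(&lstat_timer);
	ret = lstat(path, st);
	sw_stop(&lstat_timer);
	return ret;
}
```

At exit, a caller would report `lstat_timer.count` and `lstat_timer.total_ns`. The same wrapper shape would apply to open()/close(), fstat(), write() and the inflate path. Because the summary is keyed by the operation rather than by the call site, it captures the "non-localized" cost that region events would miss.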
It would be nice to have timer information for disk reading as well
(more on that below), but I don't think that is possible: we read the
objects through mmap(), so we cannot easily separate the actual
reading time from the decompression time :(
> I would probably want the per-thread summary to know if we
> are blocked on one really long thread while the others finish quickly.
That would be interesting. Parallel checkout actually uses
subprocesses, but I can see the per-thread summary being useful on
grep, for example. (Nevertheless, the use case you mentioned for the
timers -- to evaluate the work balance on parallel checkout -- seems
very interesting.)
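To show how a per-thread summary can expose work imbalance, here is a minimal C sketch. It assumes POSIX threads, and it is not how trace2 implements its per-thread timers. Each worker writes only to its own slot, so timing needs no locks. At the end, the sketch compares the fastest and slowest thread totals. The names `struct thread_timer`, `worker()` and `run_and_summarize()` are hypothetical. A nanosleep() call stands in for real work such as inflation or writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define NR_THREADS 4

/* Hypothetical per-thread slot; each thread touches only its own entry. */
struct thread_timer {
	uint64_t total_ns;
	uint64_t count;
	int units; /* simulated amount of work assigned to this thread */
};

static struct thread_timer timers[NR_THREADS];

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static void *worker(void *arg)
{
	struct thread_timer *t = arg;
	const struct timespec one_ms = { 0, 1000000 };

	for (int i = 0; i < t->units; i++) {
		uint64_t start = now_ns();

		nanosleep(&one_ms, NULL); /* stand-in for inflate()/write() */
		t->total_ns += now_ns() - start;
		t->count++;
	}
	return NULL;
}

/* Run deliberately unbalanced workers, then report min and max totals. */
static void run_and_summarize(uint64_t *min_ns, uint64_t *max_ns)
{
	pthread_t tids[NR_THREADS];

	for (int i = 0; i < NR_THREADS; i++) {
		timers[i].units = i + 1;
		pthread_create(&tids[i], NULL, worker, &timers[i]);
	}
	for (int i = 0; i < NR_THREADS; i++)
		pthread_join(tids[i], NULL);

	*min_ns = *max_ns = timers[0].total_ns;
	for (int i = 1; i < NR_THREADS; i++) {
		if (timers[i].total_ns < *min_ns)
			*min_ns = timers[i].total_ns;
		if (timers[i].total_ns > *max_ns)
			*max_ns = timers[i].total_ns;
	}
}
```

A large gap between the minimum and maximum totals would suggest that one thread is holding up the others. In grep, this would mean the threads finishing quickly are left waiting on a single slow one.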
> Within that: what are the things causing us to be slow? Is it zlib?
> Is it lstat()?
In my tests, the checkout bottleneck depended heavily on the
underlying storage type. On HDDs, the bottleneck was object reading
(i.e. page faults on mmap()-ed files), which took about 70% to 80% of
the checkout runtime.
On SSDs, reading was much faster, so CPU work (i.e. inflation) became
the bottleneck, taking 50% of the runtime. (Inflation only lost to
reading when checking out from *many* loose objects.)
Finally, on NFS, the bottlenecks were file creation with
open(O_CREAT | O_EXCL) and fstat() (which makes the NFS client flush
previously cached writes to the server), each taking about 40% of the
total runtime.
These numbers come from a (sequential) `git checkout .` execution on
an empty working tree of the Linux kernel (v5.12), and they were
gathered using eBPF-based profilers. For other operations, especially
ones that require many file removals or more laborious tree merging in
unpack_trees(), I suspect the bottlenecks may change.
If anyone is interested in seeing the flamegraphs and other plots for
these profiling numbers, they are available at:
https://matheustavares.gitlab.io/annexes/parallel-checkout/profiling
And there is a bit more context at:
https://matheustavares.gitlab.io/posts/parallel-checkout