git@vger.kernel.org mailing list mirror (one of many)
* git-status performance with submodules
@ 2019-12-02  6:19 D. Ben Knoble
  2019-12-02  6:50 ` Junio C Hamano
From: D. Ben Knoble @ 2019-12-02  6:19 UTC
  To: git

[If this has already gone through multiple times, I apologize for the
repetition; I have had a hard time getting Gmail to send this. Past
versions had attachments, which I believe contributed to failures.
This one has none, but has links to all the content.]

Hello all,

I have a concern about the performance of git-status with many (~38)
submodules. As part of a (large-scale) system dynamics class, I was tasked
with identifying a performance problem, tracing it using KUTrace(2)[3], and
subsequently investigating it. I ended up with some unique observations about
git-status and submodules[2].

The interactive HTML traces are available on Google Drive[4][5].

I won't recreate all the details here, but I would encourage you to play with
the traces, or at least go through the slides[1].

### The short version

git-status is slow in a repository with many submodules(1)(3).

### Baseline

- time git-status, with many submodules, and --ignore-submodules=none
    0.497s
- time git-status, in repos without many submodules
    0.014s

### What I consider a temporary fix

- time git-status, with many submodules, and --ignore-submodules=all
    0.026s
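
For anyone who wants to reproduce the comparison, here is a minimal sketch of
how the two submodule timings could be taken from a script. It is not the
method used for the numbers above (those came from `time`); the helper names
`status_command`, `best_wall_time`, and `compare` are hypothetical, and taking
the best of several runs is my own assumption to reduce noise.

```python
import subprocess
import time


def status_command(repo, mode):
    """Build a `git status` invocation for one --ignore-submodules mode."""
    return ["git", "-C", repo, "status", f"--ignore-submodules={mode}"]


def best_wall_time(cmd, repeats=5):
    """Run cmd several times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        best = min(best, time.perf_counter() - start)
    return best


def compare(repo="."):
    """Print the best time per mode (hypothetical helper, not part of git)."""
    for mode in ("none", "all"):
        elapsed = best_wall_time(status_command(repo, mode))
        print(f"--ignore-submodules={mode}: {elapsed:.3f}s")
```

Calling `compare(".")` from the top level of a submodule-heavy checkout should
show the gap between the two modes; the absolute numbers will depend on the
machine and on filesystem caching.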

### What I would like to see

I would like to improve the git-status performance with this many submodules,
so that I can remove diff.ignoreSubmodules=all from my config (it is useful
information, and the flag affects many commands). I would be willing to work
on a discussed and designed fix.

### What I am curious about

From the traces (linked above[4][5]), it appears that git-status suffers from a lack of
(possibly embarrassing) parallelism: I would expect each submodule to be
independently check-able, but the process section of the trace has them
executing serially (for reasons unknown to me). The apparent need to fork/exec
many processes in this way appears to also be a source of latency, along with
the very large number of filesystem-related syscalls (if my understanding is
correct).
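
To make the "independently check-able" idea concrete, here is a rough
userspace sketch that checks every submodule concurrently. It is only an
illustration of the parallelism I have in mind, not how git-status works
internally and not a proposed patch. The helpers `parse_submodule_paths`,
`is_dirty`, and `check_all` are hypothetical, and the parser assumes submodule
names contain no spaces.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def parse_submodule_paths(config_output):
    """Extract paths from `git config --file .gitmodules --get-regexp` output.

    Each relevant line looks like: submodule.<name>.path <path>
    Assumes the submodule name itself contains no spaces.
    """
    paths = []
    for line in config_output.splitlines():
        key, _, value = line.partition(" ")
        if key.endswith(".path") and value:
            paths.append(value)
    return paths


def is_dirty(path):
    """True if `git status --porcelain` reports anything in this submodule."""
    result = subprocess.run(
        ["git", "-C", path, "status", "--porcelain"],
        capture_output=True, text=True, check=False,
    )
    return bool(result.stdout.strip())


def check_all(paths, check=is_dirty, workers=8):
    """Run one independent check per submodule concurrently.

    Returns a dict mapping each path to the result of its check.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(check, paths)))
```

The paths would come from running
`git config --file .gitmodules --get-regexp '\.path$'` at the top level of the
superproject and passing its output to `parse_submodule_paths`. Threads are
enough here because each check spends its time waiting on a child process.
This still pays one fork/exec per submodule, so it only addresses the serial
execution, not the process-creation cost.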

What can we do to fix this? Is there a reason for this (really terribly slow)
serial execution? Is this something developers haven't bothered to optimize
("unexpected use case")? If so, I would like to discuss taking a crack at it,
because I do have at least one repository with this many submodules, and I
care about its performance.

---

Notes

1) All timings were taken with the https://github.com/benknoble/Dotfiles repo
from around commit da194a8f4104a9fc74e8895ebc8512434f07d393

2) KUTrace is a set of kernel patches and userspace programs that provide
low-overhead tracing and post-process the resulting traces

3) Timings were taken on my machine (2012 MacBook Pro; can provide more details if
requested)

---

Links

[1]: https://docs.google.com/presentation/d/1z-6ffE9KY-Jswl2BiWzYV2DG6fOutgWSi_aZ5uql__s/edit?usp=sharing
[2]: https://benknoble.github.io/blog/2019/11/07/git-stat/
[3]: https://github.com/dicksites/KUtrace
[4]: https://drive.google.com/file/d/1JyYO420yWp7XvNJJ8HLOPU0o6mesSKZf/view?usp=sharing
[5]: https://drive.google.com/file/d/1BqqxH0PRCYz_vvYkBBFpbL5dkFTLPyuK/view?usp=sharing

Thread overview: 3+ messages
2019-12-02  6:19 git-status performance with submodules D. Ben Knoble
2019-12-02  6:50 ` Junio C Hamano
2019-12-02 14:05   ` Jeff King
