git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "D. Ben Knoble" <ben.knoble@gmail.com>
To: git@vger.kernel.org
Subject: git-status performance with submodules
Date: Mon, 2 Dec 2019 01:19:49 -0500	[thread overview]
Message-ID: <CALnO6CCoXOZTsfag6yN_Ffn+H7KE-KTzm+P-GqLKnDMg8j_Qmg@mail.gmail.com> (raw)

[If this has already gone through multiple times, I apologize for the
repetition; I have had a hard time getting GMail to send this. Past
versions had attachments, which I believe contributed to failures.
This one has none, but has links to all the content.]

Hello all,

I have a concern about the performance of git-status with many (~38)
submodules. As part of a (large-scale) system dynamics class, I was tasked
with identifying a performance problem, tracing it using KUTrace(2)[3], and
subsequently investigating it. I ended up with some unique observations about
git-status and submodules[2].

The interactive HTML traces are available on Google Drive[4][5].

I won't recreate all the details here, but I would encourage you to play with
the traces, or at least go through the slides.

### The short-version

Git status is slow(3).

### Baseline

- time git-status, with many submodules, and --ignore-submodules=none
    0.497s
- time git-status in non-submodule heavy repos
    0.014s

### What I consider a temporary fix

- time git-status, with many submodules, and --ignore-submodules=all
    0.026s

### What I would like to see

I would like to improve the git-status performance with this many submodules,
so that I can remove diff.ignoreSubmodules=none from my config (it is useful
information, and the flag affects many commands). I would be willing to work
on a discussed and designed fix.

### What I am curious about

From the traces (attached), it appears that git-status suffers from a lack of
(possibly embarrassing) parallelism: I would expect each submodule to be
independently check-able, but the process section of the trace has them
executing serially (for reasons unknown to me). The apparent need to fork/exec
many processes in this way appears to also be a source of latency, along with
the very large number of filesystem-related syscalls (if my understanding is
correct).

What can we do to fix this? Is there a reason for this (really terribly slow)
serial execution? Is this something developers haven't bothered to optimize
("unexpected use case")? If so, I would like to discuss taking a crack at it,
because I do have at least one repository with this many submodules, and I
care about its performance.

---

Notes

1) All timings were taken with the https://github.com/benknoble/Dotfiles repo
from around commit da194a8f4104a9fc74e8895ebc8512434f07d393

2) KUTrace is a set of kernel patches and userspace programs that provide
low-overhead tracing, as well as post-processing those traces

3) Timings taken on my machine (2012 macbook pro; can provide more details if
requested)

---

Links

[1]: https://docs.google.com/presentation/d/1z-6ffE9KY-Jswl2BiWzYV2DG6fOutgWSi_aZ5uql__s/edit?usp=sharing
[2]: https://benknoble.github.io/blog/2019/11/07/git-stat/
[3]: https://github.com/dicksites/KUtrace
[4]: https://drive.google.com/file/d/1JyYO420yWp7XvNJJ8HLOPU0o6mesSKZf/view?usp=sharing
[5]: https://drive.google.com/file/d/1BqqxH0PRCYz_vvYkBBFpbL5dkFTLPyuK/view?usp=sharing

             reply	other threads:[~2019-12-02  6:20 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-02  6:19 D. Ben Knoble [this message]
2019-12-02  6:50 ` git-status performance with submodules Junio C Hamano
2019-12-02 14:05   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALnO6CCoXOZTsfag6yN_Ffn+H7KE-KTzm+P-GqLKnDMg8j_Qmg@mail.gmail.com \
    --to=ben.knoble@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).