git@vger.kernel.org mailing list mirror (one of many)
From: Tao Klerks <tao@klerks.biz>
To: Tassilo Horn <tsdh@gnu.org>
Cc: git@vger.kernel.org
Subject: Re: [BUG?] Major performance issue with some commands on our repo's master branch
Date: Sat, 4 Jun 2022 22:20:24 +0200
Message-ID: <CAPMMpohzqKo-+q-tOcXymmzGxuOY-mf2NPRviHURm8-+3MPjZg@mail.gmail.com>
In-Reply-To: <87h750q1b9.fsf@gnu.org>

(resending as text-only after having stupidly replied from my mobile)

I can add a couple of things that may or may not be related here. Like
you, I work with a large proprietary repo, and it is also not absurdly
large. I maintain some custom tooling for a large-scale Perforce
interop process.

I used to use "git show" (with --no-patch) in this custom tooling to get
commit metadata, because it lets you specify an arbitrary list of
commits in one call, which saves some process overhead, especially on
Windows.
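A minimal sketch of that batched lookup, assuming a POSIX shell. The function name `commit_meta_batched` is hypothetical (it is not the actual tooling); the point is only that a single `git show --no-patch` process handles every commit passed to it:

```shell
# Hypothetical helper: print "<short hash> <subject>" for every commit
# given as an argument, using one git process for the whole batch.
commit_meta_batched() {
    # --no-patch suppresses the diff output; the trailing "--" makes git
    # treat all preceding arguments as revisions rather than paths.
    git show --no-patch --format='%h %s' "$@" --
}
```

For example, `commit_meta_batched HEAD~1 HEAD` prints one `<short hash> <subject>` line per commit.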

I stopped using "git show" when user reports of slowness eventually
revealed two things:

1. Large commits (e.g. merges from the fast-moving main trunk into
feature branches) were taking a surprisingly long time despite
--no-patch, which made me think it was doing the patch work anyway and
just discarding it at the end.

2. Merge commits from long-outdated feature branches also take a long
time, even though the final patch displayed by "git show" is small.
It seems as though whatever patch-related work "git show" does (and
given your observations I guess it might well be rename detection), it
does with respect to *both parents* in the case of a merge commit,
even though the patch it shows only contains changes wrt the first
parent.

All this to say: I haven't understood your branch setup, but I'm
guessing that you're regularly integrating work from "far-behind"
branches, and most or all of your commits on master are therefore
merges with large diffs wrt the second parent, and those large diffs
wrt the second parent are what's "getting worse".

I haven't attempted to debug this, and personally have little
incentive to do so, as switching to "git log" and accepting the process
overhead solved *my* problem.
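A minimal sketch of that workaround, again assuming a POSIX shell and a hypothetical function name (`commit_meta_per_commit`). It spawns one `git log -1` process per commit; plain `git log` prints no diff by default, which in the experience described above avoided the slow path, at the cost of extra process startups (most noticeable on Windows):

```shell
# Hypothetical helper: same "<short hash> <subject>" output as the batched
# "git show --no-patch" call, but with one "git log -1" process per commit.
commit_meta_per_commit() {
    for rev in "$@"; do
        # -1 limits the output to the named commit itself; "--" separates
        # the revision from any paths. Stop at the first failing lookup.
        git log -1 --format='%h %s' "$rev" -- || return
    done
}
```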

If I do get the chance to dig into it, I will of course report back here.

Thanks,
Tao

On Sat, Jun 4, 2022 at 10:29 AM Tassilo Horn <tsdh@gnu.org> wrote:
>
> Hi all,
>
> [spoiler alert: I've figured out the config option causing the problem
> while writing this long mail, so you might jump straight to the SOLUTION
> section at the bottom of this mail.]
>
> at my day job, I work on a git repo (sadly non-public, proprietary) with
> these stats:
>
> - master has about 150000 commits; the last release branch, which I've also benchmarked below, has 144000 commits
> - the history dates back to 2001
> - .git/ is about 1.8 GB
>
> So it's quite big but not unusually big when compared to linux or other
> free software projects.
>
> The typical git commands I use (status, fetch, pull, commit, push,
> rebase, ...) are all quick.  However, I use the git porcelain Magit [1],
> which invokes several plumbing commands in order to present to the user
> an always up-to-date extended status buffer for the currently checked-out
> branch.  Some of those plumbing commands are extremely slow for no
> obvious reason.  The worst offender I could pinpoint is this:
>
> --8<---------------cut here---------------start------------->8---
> ❯ time git show --no-patch --format="%h %s" "master^{commit}" --
> 6192a0cfdc6 Merge remote-tracking branch 'origin/SHD_ECORO_3_9_7'
>
> ________________________________________________________
> Executed in   13.21 secs    fish           external
>    usr time   12.99 secs  462.00 micros   12.99 secs
>    sys time    0.17 secs  119.00 micros    0.17 secs
> --8<---------------cut here---------------end--------------->8---
>
> The interesting thing is that I have this problem only with the master
> branch.  When I run it for the last release branch, I get these times:
>
> --8<---------------cut here---------------start------------->8---
> ❯ time git show --no-patch --format="%h %s" "SHD_ECORO_3_9_7^{commit}" --
> 994334fc9fb ECOJ-33833 HTML-Formbrief: Bestellungs-Anhänge im KV-Kontext
>
> ________________________________________________________
> Executed in   22.68 millis    fish           external
>    usr time    7.71 millis  761.00 micros    6.95 millis
>    sys time   10.47 millis  194.00 micros   10.28 millis
> --8<---------------cut here---------------end--------------->8---
>
> So you see, that's a difference of almost a factor of 600!  How can that be?
>
> The split between master and the SHD_ECORO_3_X_X series of branches
> happened almost two years ago, and master is far ahead of those.
>
> --8<---------------cut here---------------start------------->8---
> ❯ git log --oneline master...origin/SHD_ECORO_3_9_7 | wc -l
> 5013
> --8<---------------cut here---------------end--------------->8---
>
> But there are around 9 merges from the last release branch into master
> daily.
>
> --8<---------------cut here---------------start------------->8---
> ❯ git log --merges --oneline --since 6months | wc -l
> 1611
> --8<---------------cut here---------------end--------------->8---
>
> From memory, the issue didn't pop up all of a sudden but has slowly
> gotten worse over time.  I have the impression that the deterioration
> has picked up pace over the last few months, which might be the result
> of our workflow.  I used to be the merge guy, doing two "merge waves"
> from the last supported release branch upwards into master once or
> twice a day (usually release-branch -> next-release-branch -> master).
> About three months ago, we switched to a workflow where every developer
> merges upwards herself right after committing/pushing to a branch below
> master, simply because the branches have diverged so much that you'd
> need to be an expert in everything to resolve conflicts sensibly.
>
> I should mention that I haven't seen this issue with any other repo I
> have.  But that's also the biggest one I use.  The Emacs repository I
> also work on is comparable in the number of commits but has far fewer
> merges.
>
> Finally, here's the git bugreport sysinfo section for that machine and
> repository.
>
> --8<---------------cut here---------------start------------->8---
> [System Info]
> git version:
> git version 2.36.1
> cpu: x86_64
> no commit associated with this build
> sizeof-long: 8
> sizeof-size_t: 8
> shell-path: /bin/sh
> uname: Linux 5.18.1-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Mon, 30 May 2022 17:53:16 +0000 x86_64
> compiler info: gnuc: 11.2
> libc info: glibc: 2.35
> $SHELL (typically, interactive shell): /usr/bin/fish
>
> [Enabled Hooks]
> --8<---------------cut here---------------end--------------->8---
>
> SOLUTION
> ========
>
> While writing this long mail, I've figured out that the performance
> penalty is caused by my setting of diff.renameLimit = 10000.  If I
> comment out that option in my ~/.gitconfig, the above command finishes
> in 150 millis instead of 13 seconds:
>
> --8<---------------cut here---------------start------------->8---
> ❯ time git show --no-patch --format="%h %s" "master^{commit}" --
> 6192a0cfdc6 Merge remote-tracking branch 'origin/SHD_ECORO_3_9_7'
>
> ________________________________________________________
> Executed in  147.99 millis    fish           external
>    usr time  114.52 millis  713.00 micros  113.81 millis
>    sys time   34.78 millis  193.00 micros   34.59 millis
> --8<---------------cut here---------------end--------------->8---
>
> But there's still the question why diff.renameLimit has an influence
> here when --no-patch is provided so no diff should be generated.
>
> Bye,
> Tassilo
>
> [1] https://magit.vc/


Thread overview: 12+ messages
2022-06-04  7:39 [BUG?] Major performance issue with some commands on our repo's master branch Tassilo Horn
2022-06-04 20:20 ` Tao Klerks [this message]
2022-06-05 10:46   ` Tassilo Horn
2022-06-06  5:18     ` Tao Klerks
2022-06-08 23:36     ` Jeff King
2022-06-09  1:27       ` Kyle Meyer
2022-06-09 15:03         ` Jeff King
2022-06-09 18:23           ` Junio C Hamano
2022-06-09 18:43             ` Jeff King
2022-06-09 20:06               ` Junio C Hamano
2022-06-09  5:51       ` Tassilo Horn
2022-06-09 15:05         ` Jeff King
