* [BUG?] Major performance issue with some commands on our repo's master branch @ 2022-06-04 7:39 Tassilo Horn 2022-06-04 20:20 ` Tao Klerks 0 siblings, 1 reply; 12+ messages in thread From: Tassilo Horn @ 2022-06-04 7:39 UTC (permalink / raw) To: git Hi all, [spoiler alert: I've figured out the config option causing the problem while writing this long mail, so you might jump straight to the SOLUTION section at the bottom of this mail.] at my day job, I work on a git repo (sadly non-public, proprietary) with these stats: - master has about 150000 commits, the last release branch I've also benchmarked above has 144000 commits - the history dates back to 2001 - .git/ is about 1.8 GB So it's quite big but not unusually big when compared to linux or other free software projects. The typical git commands I use (status, fetch, pull, commit, push, rebase, ...) are all quick. However, I use the git porcelain Magit [1] which invokes several plumbing commands in order to present to the user an always up-to-date extended status buffer of the currently checked out branch showing the current branch. Some of those plumbing commands are extremely slow for no obvious reasons. The most outstanding command I could pinpoint is this: --8<---------------cut here---------------start------------->8--- ❯ time git show --no-patch --format="%h %s" "master^{commit}" -- 6192a0cfdc6 Merge remote-tracking branch 'origin/SHD_ECORO_3_9_7' ________________________________________________________ Executed in 13.21 secs fish external usr time 12.99 secs 462.00 micros 12.99 secs sys time 0.17 secs 119.00 micros 0.17 secs --8<---------------cut here---------------end--------------->8--- The interesting thing is that I have this problem only with the master branch. When I run it for the last release branch, I get these times: --8<---------------cut here---------------start------------->8--- ❯ time git show --no-patch --format="%h %s" "SHD_ECORO_3_9_7^{commit}" -- 994334fc9fb ECOJ-33833 HTML-Formbrief: Bestellungs-Anhänge im KV-Kontext ________________________________________________________ Executed in 22.68 millis fish external usr time 7.71 millis 761.00 micros 6.95 millis sys time 10.47 millis 194.00 micros 10.28 millis --8<---------------cut here---------------end--------------->8--- So you see, it's almost a factor of 1000 difference! How can that be? The split between master and the SHD_ECORO_3_X_X series of branches has happened almost 2 years ago and master is way ahead of those. --8<---------------cut here---------------start------------->8--- ❯ git log --oneline master...origin/SHD_ECORO_3_9_7 | wc -l 5013 --8<---------------cut here---------------end--------------->8--- But there are around 9 merges from the last release branch into master daily. --8<---------------cut here---------------start------------->8--- ❯ git log --merges --oneline --since 6months | wc -l 1611 --8<---------------cut here---------------end--------------->8--- From my memory, the issue hasn't popped up out of sudden but has gotten worse slowly over time. I have the impression that the worsening increased pace over the last few month which might be the result of our workflow. Before, I've been the merge guy doing two "merge waves" from the last supported release branch upwards into master once or twice a day (usually release-branch -> next-release-branch -> master). Since about 3 month, we've switched to a workflow where every developer does merge upwards herself just after committing/pushing to some lesser branch than master simply because branches have diverged so much that you'd need to be an expert in everything in order to be able to resolve conflicts sensibly. I should mention that I haven't seen this issue with any other repo I have. But that's also the biggest one I use. The Emacs repository I also work on is comparable in the number of commits but with much less merges. At last, here's the git bugreport sysinfo section on that machine and repository. --8<---------------cut here---------------start------------->8--- [System Info] git version: git version 2.36.1 cpu: x86_64 no commit associated with this build sizeof-long: 8 sizeof-size_t: 8 shell-path: /bin/sh uname: Linux 5.18.1-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Mon, 30 May 2022 17:53:16 +0000 x86_64 compiler info: gnuc: 11.2 libc info: glibc: 2.35 $SHELL (typically, interactive shell): /usr/bin/fish [Enabled Hooks] --8<---------------cut here---------------end--------------->8--- SOLUTION ======== While writing this long mail, I've figured out that the performance penalty is caused by my setting of diff.renameLimit = 10000. If I comment that option in my ~/.gitconfig, the above command finishes in 150 millis instead of 13 seconds: --8<---------------cut here---------------start------------->8--- ❯ time git show --no-patch --format="%h %s" "master^{commit}" -- 6192a0cfdc6 Merge remote-tracking branch 'origin/SHD_ECORO_3_9_7' ________________________________________________________ Executed in 147.99 millis fish external usr time 114.52 millis 713.00 micros 113.81 millis sys time 34.78 millis 193.00 micros 34.59 millis --8<---------------cut here---------------end--------------->8--- But there's still the question why diff.renameLimit has an influence here when --no-patch is provided so no diff should be generated. Bye, Tassilo [1] https://magit.vc/ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG?] Major performance issue with some commands on our repo's master branch 2022-06-04 7:39 [BUG?] Major performance issue with some commands on our repo's master branch Tassilo Horn @ 2022-06-04 20:20 ` Tao Klerks 2022-06-05 10:46 ` Tassilo Horn 0 siblings, 1 reply; 12+ messages in thread From: Tao Klerks @ 2022-06-04 20:20 UTC (permalink / raw) To: Tassilo Horn; +Cc: git (resending as text-only after having stupidly replied from my mobile) I can add a couple things that may or may not be related here. I work with a large proprietary repo, like you, and it is also not absurdly large. I maintain some custom tooling for a large scale perforce interop process. I used to use "git show" (without patch) in this custom tooling to get commit metadata, because it has the advantage that you can specify an arbitrary list of commits in one call, saving some process overheads in Windows especially. I stopped using "git show" when user reports of slowness eventually revealed two things: 1. Large commits (eg merges to feature branches from the fast-moving main trunk) were taking a surprisingly long time, despite the no-patch, which made me think it was doing the patch work anyway, and just discarding it at the end. 2. Merge commits from long-outdated feature branches, even though the final patch displayed by "git show" is small, also take a long time. It seems as though whatever patch-related work "git show" does (and given your observations I guess it might well be rename-detection), it does it with respect to *both parents* in the case of a merge request, even though the patch it shows is only changes wrt the first parent. All this to say: I haven't understood your branch setup, but I'm guessing that you're regularly integrating work from "far-behind" branches, and most or all of your commits on master are therefore merges with large diffs wrt the second parent, and those large diffs wrt the second parent are what's "getting worse". I haven't attempted to debug this, and personally have little incentive to do, as switching to "git log" and accepting the process overheads solved *my* problem. If I get the chance to, I will obviously report back here. Thanks, Tao On Sat, Jun 4, 2022 at 10:29 AM Tassilo Horn <tsdh@gnu.org> wrote: > > Hi all, > > [spoiler alert: I've figured out the config option causing the problem > while writing this long mail, so you might jump straight to the SOLUTION > section at the bottom of this mail.] > > at my day job, I work on a git repo (sadly non-public, proprietary) with > these stats: > > - master has about 150000 commits, the last release branch I've also benchmarked above has 144000 commits > - the history dates back to 2001 > - .git/ is about 1.8 GB > > So it's quite big but not unusually big when compared to linux or other > free software projects. > > The typical git commands I use (status, fetch, pull, commit, push, > rebase, ...) are all quick. However, I use the git porcelain Magit [1] > which invokes several plumbing commands in order to present to the user > an always up-to-date extended status buffer of the currently checked out > branch showing the current branch. Some of those plumbing commands are > extremely slow for no obvious reasons. The most outstanding command I > could pinpoint is this: > > --8<---------------cut here---------------start------------->8--- > ❯ time git show --no-patch --format="%h %s" "master^{commit}" -- > 6192a0cfdc6 Merge remote-tracking branch 'origin/SHD_ECORO_3_9_7' > > ________________________________________________________ > Executed in 13.21 secs fish external > usr time 12.99 secs 462.00 micros 12.99 secs > sys time 0.17 secs 119.00 micros 0.17 secs > --8<---------------cut here---------------end--------------->8--- > > The interesting thing is that I have this problem only with the master > branch. When I run it for the last release branch, I get these times: > > --8<---------------cut here---------------start------------->8--- > ❯ time git show --no-patch --format="%h %s" "SHD_ECORO_3_9_7^{commit}" -- > 994334fc9fb ECOJ-33833 HTML-Formbrief: Bestellungs-Anhänge im KV-Kontext > > ________________________________________________________ > Executed in 22.68 millis fish external > usr time 7.71 millis 761.00 micros 6.95 millis > sys time 10.47 millis 194.00 micros 10.28 millis > --8<---------------cut here---------------end--------------->8--- > > So you see, it's almost a factor of 1000 difference! How can that be? > > The split between master and the SHD_ECORO_3_X_X series of branches has > happened almost 2 years ago and master is way ahead of those. > > --8<---------------cut here---------------start------------->8--- > ❯ git log --oneline master...origin/SHD_ECORO_3_9_7 | wc -l > 5013 > --8<---------------cut here---------------end--------------->8--- > > But there are around 9 merges from the last release branch into master > daily. > > --8<---------------cut here---------------start------------->8--- > ❯ git log --merges --oneline --since 6months | wc -l > 1611 > --8<---------------cut here---------------end--------------->8--- > > From my memory, the issue hasn't popped up out of sudden but has gotten > worse slowly over time. I have the impression that the worsening > increased pace over the last few month which might be the result of our > workflow. Before, I've been the merge guy doing two "merge waves" from > the last supported release branch upwards into master once or twice a > day (usually release-branch -> next-release-branch -> master). Since > about 3 month, we've switched to a workflow where every developer does > merge upwards herself just after committing/pushing to some lesser > branch than master simply because branches have diverged so much that > you'd need to be an expert in everything in order to be able to resolve > conflicts sensibly. > > I should mention that I haven't seen this issue with any other repo I > have. But that's also the biggest one I use. The Emacs repository I > also work on is comparable in the number of commits but with much less > merges. > > At last, here's the git bugreport sysinfo section on that machine and > repository. > > --8<---------------cut here---------------start------------->8--- > [System Info] > git version: > git version 2.36.1 > cpu: x86_64 > no commit associated with this build > sizeof-long: 8 > sizeof-size_t: 8 > shell-path: /bin/sh > uname: Linux 5.18.1-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Mon, 30 May 2022 17:53:16 +0000 x86_64 > compiler info: gnuc: 11.2 > libc info: glibc: 2.35 > $SHELL (typically, interactive shell): /usr/bin/fish > > [Enabled Hooks] > --8<---------------cut here---------------end--------------->8--- > > SOLUTION > ======== > > While writing this long mail, I've figured out that the performance > penalty is caused by my setting of diff.renameLimit = 10000. If I > comment that option in my ~/.gitconfig, the above command finishes in > 150 millis instead of 13 seconds: > > --8<---------------cut here---------------start------------->8--- > ❯ time git show --no-patch --format="%h %s" "master^{commit}" -- > 6192a0cfdc6 Merge remote-tracking branch 'origin/SHD_ECORO_3_9_7' > > ________________________________________________________ > Executed in 147.99 millis fish external > usr time 114.52 millis 713.00 micros 113.81 millis > sys time 34.78 millis 193.00 micros 34.59 millis > --8<---------------cut here---------------end--------------->8--- > > But there's still the question why diff.renameLimit has an influence > here when --no-patch is provided so no diff should be generated. > > Bye, > Tassilo > > [1] https://magit.vc/ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG?] Major performance issue with some commands on our repo's master branch 2022-06-04 20:20 ` Tao Klerks @ 2022-06-05 10:46 ` Tassilo Horn 2022-06-06 5:18 ` Tao Klerks 2022-06-08 23:36 ` Jeff King 0 siblings, 2 replies; 12+ messages in thread From: Tassilo Horn @ 2022-06-05 10:46 UTC (permalink / raw) To: Tao Klerks; +Cc: git Tao Klerks <tao@klerks.biz> writes: Hi Tao, thanks for your response. > All this to say: I haven't understood your branch setup, but I'm > guessing that you're regularly integrating work from "far-behind" > branches, and most or all of your commits on master are therefore > merges with large diffs wrt the second parent, and those large diffs > wrt the second parent are what's "getting worse". That's exactly correct. > I haven't attempted to debug this, and personally have little > incentive to do, as switching to "git log" and accepting the process > overheads solved *my* problem. And I'm happy to report it solves *my* problem as well. There's a PR for the Magit git porcelain replacing "git show" with an equivalent "git log" incarnation which makes the 30seconds "refresh status buffer" operation instant. https://github.com/magit/magit/issues/4702 https://github.com/magit/magit/compare/km/show-to-log Still maybe someone might want to have a look at the "git show" issue to double-check if the performance burden in this specific case (no diff should be generated) is warranted. But at least I can work again with no coffee-break long pauses, so I'm all satisfied. :-) Thanks a lot for your insights. Bye, Tassilo ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG?] Major performance issue with some commands on our repo's master branch 2022-06-05 10:46 ` Tassilo Horn @ 2022-06-06 5:18 ` Tao Klerks 2022-06-08 23:36 ` Jeff King 1 sibling, 0 replies; 12+ messages in thread From: Tao Klerks @ 2022-06-06 5:18 UTC (permalink / raw) To: Tassilo Horn; +Cc: git On Sun, Jun 5, 2022 at 12:55 PM Tassilo Horn <tsdh@gnu.org> wrote: > > Tao Klerks <tao@klerks.biz> writes: > > > I haven't attempted to debug this, and personally have little > > incentive to do, as switching to "git log" and accepting the process > > overheads solved *my* problem. > > And I'm happy to report it solves *my* problem as well. There's a PR > for the Magit git porcelain replacing "git show" with an equivalent "git > log" incarnation which makes the 30seconds "refresh status buffer" > operation instant. > > https://github.com/magit/magit/issues/4702 > https://github.com/magit/magit/compare/km/show-to-log > > Still maybe someone might want to have a look at the "git show" issue to > double-check if the performance burden in this specific case (no diff > should be generated) is warranted. I spent a little time with this yesterday, and can confirm: * My issue seems to be the same as yours, "export GIT_TRACE2_PERF=1" shows all the time being spent in rename detection * "git show" is a slightly different entry point into the "git log" code (log.c, cmd_show()) * options to the "git log" functionality are largely collected in a "rev_info" object (defined in revision.h) * one option is the "-c / --diff-merges=combined" option of "git log" (setting rev_info.diff, rev_info.combine_merges and rev_info.dense_combined_merges) * another option is "-s / --no-patch" option of "git log" (setting rev_info.diffopt.output_format |= DIFF_FORMAT_NO_OUTPUT) * the "--patch" and "--no-patch" options seem to be (and are documented as) opposites, but they are not implemented as such; one calls for the work to be done, and the other only hides the output. * the "diffs" options are set automatically/implicitly for "git show", before argument parsing * we can simulate the "git show" performance issue in "--git log -1" by setting "--diff-merges=combined" *and* "--no-patch" explicitly * this performance issue does *not* present in a "git log" call with only the regular "--patch" argument, however; this basic "show patches" instruction defaults to "--diff-merges=off", which means the rename detection work does not need to happen. There might still be slight diff-related overheads, but they are undetectable in my testing. Therefore, I have confirmed it is possible to get "git show" to behave the way we would expect for "--no-patch", by *also* specifying "--diff-merges=off". There are at least two possible approaches / directions to improving these issues generically in the "git show" implementation, I think: 1. Add some "git show"-specific code, saying something along the lines of "if --no-patch is specified, then also imply "--diff-merges=off". This feels like the safer option / less likely to have side-effects. 2. Add some post-processing to the generic "git log & git show" options parsing, to generically propagate "--no-patch" into other properties like those set by "--diff-merges=combined" I don't feel confident enough with the code here to try for the second approach, but the first looks like something I should be able to propose a patch for - and in the meantime I know how to get the "single-process, many arbitrary commits" performance benefit of "git show" again. Thanks for sparking the exploration down this little rabbit-hole! ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG?] Major performance issue with some commands on our repo's master branch 2022-06-05 10:46 ` Tassilo Horn 2022-06-06 5:18 ` Tao Klerks @ 2022-06-08 23:36 ` Jeff King 2022-06-09 1:27 ` Kyle Meyer 2022-06-09 5:51 ` Tassilo Horn 1 sibling, 2 replies; 12+ messages in thread From: Jeff King @ 2022-06-08 23:36 UTC (permalink / raw) To: Tassilo Horn; +Cc: Tao Klerks, git On Sun, Jun 05, 2022 at 12:46:15PM +0200, Tassilo Horn wrote: > Still maybe someone might want to have a look at the "git show" issue to > double-check if the performance burden in this specific case (no diff > should be generated) is warranted. But at least I can work again with > no coffee-break long pauses, so I'm all satisfied. :-) I suspect the issue may be quite subtle. Even you asked for "--no-patch", the underlying diff may still be used for other things. For example, simplifying away TREESAME commits. I.e., ones which did not change anything from their parents after applying path restrictions, diff-filters, etc. There may be other cases, too (e.g., --follow). I think the code could be written to realize that none of those features are in use, and disable the diff entirely in favor of checking whether the two trees has the same object id. That would yield _mostly_ the same behavior, though there are probably corner cases (e.g., a tree with an odd mode entry, say, may get parsed so as to produce an empty diff, even though it's not byte for byte identical). That may be an acceptable tradeoff. But I think the code would be a bit brittle (it needs to know about all the cases where a diff might matter, and we may add more later). In general, I think Git assumes that tree-level diffs aren't too painful to produce. "git log" will do them, too, but just doesn't tickle your particular case because it doesn't look at merges. So probably setting diff.renamelimit correctly is not that bad a solution. -Peff ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG?] Major performance issue with some commands on our repo's master branch 2022-06-08 23:36 ` Jeff King @ 2022-06-09 1:27 ` Kyle Meyer 2022-06-09 15:03 ` Jeff King 2022-06-09 5:51 ` Tassilo Horn 1 sibling, 1 reply; 12+ messages in thread From: Kyle Meyer @ 2022-06-09 1:27 UTC (permalink / raw) To: Jeff King; +Cc: Tassilo Horn, Tao Klerks, git Jeff King writes: > I suspect the issue may be quite subtle. Even you asked for > "--no-patch", the underlying diff may still be used for other things. > For example, simplifying away TREESAME commits. I.e., ones which did not > change anything from their parents after applying path restrictions, > diff-filters, etc. There may be other cases, too (e.g., --follow). > > I think the code could be written to realize that none of those features > are in use, and disable the diff entirely in favor of checking whether > the two trees has the same object id. That would yield _mostly_ the same > behavior, though there are probably corner cases (e.g., a tree with an > odd mode entry, say, may get parsed so as to produce an empty diff, even > though it's not byte for byte identical). That may be an acceptable > tradeoff. But I think the code would be a bit brittle (it needs to know > about all the cases where a diff might matter, and we may add more > later). Do you think it'd be safe to make --no-patch imply --diff-merges=off, as Tao suggested elsewhere in this thread? https://lore.kernel.org/git/CAPMMpog-7eDOrgSU9GjV4G9rk5RkL-PJhaUAO3_0p2YxfRf=LA@mail.gmail.com If so, it seems like that'd be a good way to get speedups for some merge commits. For example, here are hyperfine timings for the current tip of git.git's master branch: Benchmark #1: git show --no-patch --format=%h 1e59178e3f Time (mean ± σ): 47.8 ms ± 1.5 ms [User: 43.2 ms, System: 4.6 ms] Range (min … max): 46.8 ms … 54.4 ms 59 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark #2: git show --no-patch --diff-merges=off --format=%h 1e59178e3f Time (mean ± σ): 3.2 ms ± 0.2 ms [User: 2.5 ms, System: 0.8 ms] Range (min … max): 2.9 ms … 6.8 ms 688 runs Warning: Command took less than 5 ms to complete. Results might be inaccurate. Warning: Statistical outliers were detected. Consider [...] options. Benchmark #3: git log --no-walk --format=%h 1e59178e3f Time (mean ± σ): 3.2 ms ± 0.1 ms [User: 2.4 ms, System: 0.8 ms] Range (min … max): 2.9 ms … 4.2 ms 697 runs Warning: Command took less than 5 ms to complete. Results might [...] Warning: Statistical outliers were detected. Consider [...] Summary 'git log --no-walk --format=%h 1e59178e3f' ran 1.01 ± 0.08 times faster than 'git show --no-patch --diff-merges=off --format=%h 1e59178e3f' 14.98 ± 0.79 times faster than 'git show --no-patch --format=%h 1e59178e3f' ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG?] Major performance issue with some commands on our repo's master branch 2022-06-09 1:27 ` Kyle Meyer @ 2022-06-09 15:03 ` Jeff King 2022-06-09 18:23 ` Junio C Hamano 0 siblings, 1 reply; 12+ messages in thread From: Jeff King @ 2022-06-09 15:03 UTC (permalink / raw) To: Kyle Meyer; +Cc: Tassilo Horn, Tao Klerks, git On Wed, Jun 08, 2022 at 09:27:43PM -0400, Kyle Meyer wrote: > > I think the code could be written to realize that none of those features > > are in use, and disable the diff entirely in favor of checking whether > > the two trees has the same object id. That would yield _mostly_ the same > > behavior, though there are probably corner cases (e.g., a tree with an > > odd mode entry, say, may get parsed so as to produce an empty diff, even > > though it's not byte for byte identical). That may be an acceptable > > tradeoff. But I think the code would be a bit brittle (it needs to know > > about all the cases where a diff might matter, and we may add more > > later). > > Do you think it'd be safe to make --no-patch imply --diff-merges=off, as > Tao suggested elsewhere in this thread? > > https://lore.kernel.org/git/CAPMMpog-7eDOrgSU9GjV4G9rk5RkL-PJhaUAO3_0p2YxfRf=LA@mail.gmail.com I'm not sure. If I do: diff --git a/diff-merges.c b/diff-merges.c index 7f64156b8b..f05a585dfb 100644 --- a/diff-merges.c +++ b/diff-merges.c @@ -164,6 +164,9 @@ void diff_merges_set_dense_combined_if_unset(struct rev_info *revs) void diff_merges_setup_revs(struct rev_info *revs) { + if (revs->diffopt.output_format == DIFF_FORMAT_NO_OUTPUT && + !revs->explicit_diff_merges) + diff_merges_suppress(revs); if (revs->combine_merges == 0) revs->dense_combined_merges = 0; if (revs->separate_merges == 0) then the test suite passes, but that may just be because we are not invoking the right corner case. It does change the output with something like: git show --diff-filter=D -s a6434bc6f7a1 Without the patch above, it always shows the commit. With it, it shows nothing. That's a bit far-fetched, but it is a regression, and I'm also not sure if it's just the tip of the iceberg. It also doesn't solve problem completely. Regular commits can have expensive diffs, too. I think you'd do better to have a mode specific to git-show that skips the diff if we're not showing it, but makes sure to always show the commit anyway. Perhaps something like the hunk above, but put into cmd_show(), and then setting revs->always_show_header. But it would require somebody verifying that this does the right thing in all cases. -Peff ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [BUG?] Major performance issue with some commands on our repo's master branch 2022-06-09 15:03 ` Jeff King @ 2022-06-09 18:23 ` Junio C Hamano 2022-06-09 18:43 ` Jeff King 0 siblings, 1 reply; 12+ messages in thread From: Junio C Hamano @ 2022-06-09 18:23 UTC (permalink / raw) To: Jeff King; +Cc: Kyle Meyer, Tassilo Horn, Tao Klerks, git Jeff King <peff@peff.net> writes: > git show --diff-filter=D -s a6434bc6f7a1 > > Without the patch above, it always shows the commit. With it, it shows > nothing. That's a bit far-fetched, but it is a regression, and I'm also > not sure if it's just the tip of the iceberg. Here "-s" is merely "do not give patch output like we do by default", so the behaviour is quite understandable and is not a regression we would want to see happen. -S/-G are also likely to be affected, not just the --diff-filter. > It also doesn't solve problem completely. Regular commits can have > expensive diffs, too. That's a good point. > I think you'd do better to have a mode specific to git-show that skips > the diff if we're not showing it, but makes sure to always show the > commit anyway. Meaning an explicit option "git show --log-only"? We'd need to careful to make it either (1) be incompatible with certain features of "git show" (like giving a pathspec) and error out, or (2) ignore these features of "git show" silently and document that. But it would work as a new option. Thanks. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG?] Major performance issue with some commands on our repo's master branch 2022-06-09 18:23 ` Junio C Hamano @ 2022-06-09 18:43 ` Jeff King 2022-06-09 20:06 ` Junio C Hamano 0 siblings, 1 reply; 12+ messages in thread From: Jeff King @ 2022-06-09 18:43 UTC (permalink / raw) To: Junio C Hamano; +Cc: Kyle Meyer, Tassilo Horn, Tao Klerks, git On Thu, Jun 09, 2022 at 11:23:58AM -0700, Junio C Hamano wrote: > > I think you'd do better to have a mode specific to git-show that skips > > the diff if we're not showing it, but makes sure to always show the > > commit anyway. > > Meaning an explicit option "git show --log-only"? We'd need to > careful to make it either (1) be incompatible with certain features > of "git show" (like giving a pathspec) and error out, or (2) ignore > these features of "git show" silently and document that. But it > would work as a new option. Certainly a new option would mean we weren't regressing any existing behavior. I do think it would be a hard option to explain, though. But what I wondered is whether "show" in particular, because it would never want to skip showing a commit, could get away with avoiding the diff automatically. I.e., currently "git show -Sfoo HEAD" will always show HEAD, even if "-S" does not match anything. So if we are not showing any diff output, there is no need to compute the diff in that case. That is unlike "git log", which would omit commits that didn't match. And really it is not "git show" that is special there, but the always_show_header flag it sets. So something like this might work: diff --git a/log-tree.c b/log-tree.c index d0ac0a6327..ed57386938 100644 --- a/log-tree.c +++ b/log-tree.c @@ -1024,6 +1024,10 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log if (!all_need_diff && !opt->merges_need_diff) return 0; + if (opt->diffopt.output_format == DIFF_FORMAT_NO_OUTPUT && + opt->always_show_header) + return 0; + parse_commit_or_die(commit); oid = get_commit_tree_oid(commit); It produces the same output in the cases I tried. And running with GIT_TRACE2_PERF shows that it doesn't diff and rename code. I'm not overly confident that it isn't violating some other subtle assumption / corner case that I haven't thought of, though. :) -Peff ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [BUG?] Major performance issue with some commands on our repo's master branch 2022-06-09 18:43 ` Jeff King @ 2022-06-09 20:06 ` Junio C Hamano 0 siblings, 0 replies; 12+ messages in thread From: Junio C Hamano @ 2022-06-09 20:06 UTC (permalink / raw) To: Jeff King; +Cc: Kyle Meyer, Tassilo Horn, Tao Klerks, git Jeff King <peff@peff.net> writes: > But what I wondered is whether "show" in particular, because it would > never want to skip showing a commit, could get away with avoiding the > diff automatically. Ahh, that is a clever thought. At least unless we automatically turn ourselves into "git log" by giving an range, we are naming individual object we want to see, so why not show them? But I wonder if "git show A -- P" should or should not show the commit if A does not touch the path P. Right now we apply the same history simplification so "git show master -- t/" gives nothing to me, which is probably one sensible thing to do. It is debatable why somebody who wants to see 'master' wants to hide it when it does not touch the paths that match the pathspec given, but it can also be debated why somebody would give a pathspec if commits are to be hidden when they do not touch paths that match it, so... > I.e., currently "git show -Sfoo HEAD" will always > show HEAD, even if "-S" does not match anything. So if we are not > showing any diff output, there is no need to compute the diff in that > case. That is unlike "git log", which would omit commits that didn't > match. OK, you came up with an example that behaves differently. > And really it is not "git show" that is special there, but the > always_show_header flag it sets. So something like this might work: A tempting thought, indeed. > diff --git a/log-tree.c b/log-tree.c > index d0ac0a6327..ed57386938 100644 > --- a/log-tree.c > +++ b/log-tree.c > @@ -1024,6 +1024,10 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log > if (!all_need_diff && !opt->merges_need_diff) > return 0; > > + if (opt->diffopt.output_format == DIFF_FORMAT_NO_OUTPUT && > + opt->always_show_header) > + return 0; > + > parse_commit_or_die(commit); > oid = get_commit_tree_oid(commit); > > > It produces the same output in the cases I tried. And running with > GIT_TRACE2_PERF shows that it doesn't diff and rename code. > > I'm not overly confident that it isn't violating some other subtle > assumption / corner case that I haven't thought of, though. :) > > -Peff ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG?] Major performance issue with some commands on our repo's master branch 2022-06-08 23:36 ` Jeff King 2022-06-09 1:27 ` Kyle Meyer @ 2022-06-09 5:51 ` Tassilo Horn 2022-06-09 15:05 ` Jeff King 1 sibling, 1 reply; 12+ messages in thread From: Tassilo Horn @ 2022-06-09 5:51 UTC (permalink / raw) To: Jeff King; +Cc: Tao Klerks, git Jeff King <peff@peff.net> writes: Hi Jeff, >> Still maybe someone might want to have a look at the "git show" issue >> to double-check if the performance burden in this specific case (no >> diff should be generated) is warranted. But at least I can work >> again with no coffee-break long pauses, so I'm all satisfied. :-) > > I suspect the issue may be quite subtle. Even you asked for > "--no-patch", the underlying diff may still be used for other things. > For example, simplifying away TREESAME commits. I.e., ones which did > not change anything from their parents after applying path > restrictions, diff-filters, etc. There may be other cases, too (e.g., > --follow). I see. In the end, my issue was solved by my git porcelain switching to a "git log" incarnation instead of "git show". When I "git show" manually, it's no big deal if it takes some time for merge commits. > [...] > > So probably setting diff.renamelimit correctly is not that bad a > solution. Does your statement imply diff.renameLimit = 10000 is an incorrect setting? The thing is that I mostly work with java codebases where every file rename implies a change in file contents, too. A large renameLimit seems to help in correctly detecting renames/copies although I don't have empirical data but only gut feeling. Bye, Tassilo ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG?] Major performance issue with some commands on our repo's master branch 2022-06-09 5:51 ` Tassilo Horn @ 2022-06-09 15:05 ` Jeff King 0 siblings, 0 replies; 12+ messages in thread From: Jeff King @ 2022-06-09 15:05 UTC (permalink / raw) To: Tassilo Horn; +Cc: Tao Klerks, git On Thu, Jun 09, 2022 at 07:51:36AM +0200, Tassilo Horn wrote: > > So probably setting diff.renamelimit correctly is not that bad a > > solution. > > Does your statement imply diff.renameLimit = 10000 is an incorrect > setting? The thing is that I mostly work with java codebases where > every file rename implies a change in file contents, too. A large > renameLimit seems to help in correctly detecting renames/copies although > I don't have empirical data but only gut feeling. Well, for some definition of incorrect. :) You are telling Git to spend extra time computing renames, and then you were annoyed when it spent a long time computing renames. So in that sense it was not what you wanted. It may be that you want different limits in different contexts, and the current config is not sufficient to express that. -Peff ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-06-09 20:06 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-06-04 7:39 [BUG?] Major performance issue with some commands on our repo's master branch Tassilo Horn 2022-06-04 20:20 ` Tao Klerks 2022-06-05 10:46 ` Tassilo Horn 2022-06-06 5:18 ` Tao Klerks 2022-06-08 23:36 ` Jeff King 2022-06-09 1:27 ` Kyle Meyer 2022-06-09 15:03 ` Jeff King 2022-06-09 18:23 ` Junio C Hamano 2022-06-09 18:43 ` Jeff King 2022-06-09 20:06 ` Junio C Hamano 2022-06-09 5:51 ` Tassilo Horn 2022-06-09 15:05 ` Jeff King
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).