git@vger.kernel.org mailing list mirror (one of many)
From: "Björn Pettersson A" <bjorn.a.pettersson@ericsson.com>
To: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Bad performance when using git log --parents (used by gitk)
Date: Tue, 2 Apr 2019 11:31:59 +0000
Message-ID: <HE1PR0702MB3788FCDAB764252D9CBB42E5B0560@HE1PR0702MB3788.eurprd07.prod.outlook.com>

Hi!

The LLVM project is moving from SVN to git, creating a single repo on github for several LLVM sub-projects.
In the past we have had one git repo mirror for each sub-project (mirroring the SVN projects).

Unfortunately, I've seen some performance problems with git (or rather gitk) when starting to use the new llvm-project git repo.

It seems that gitk runs "git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- <file>" when loading the history. So, as far as I can tell, it is the performance of "git log --parents -- <file>" that is causing the problem.


Example:

Run "git log --parents" for an old file (bswap.ll), and a brand new file (dummy).

First we try it using the new "llvm-project" repository.

--------------------------------------------------------------------------------
bash-4.1$ git clone https://github.com/llvm/llvm-project.git && cd llvm-project
Cloning into 'llvm-project'...
remote: Enumerating objects: 130, done.
remote: Counting objects: 100% (130/130), done.
remote: Compressing objects: 100% (98/98), done.
remote: Total 3361980 (delta 39), reused 58 (delta 26), pack-reused 3361850
Receiving objects: 100% (3361980/3361980), 605.50 MiB | 15.63 MiB/s, done.
Resolving deltas: 100% (2755544/2755544), done.
Checking out files: 100% (82618/82618), done.

bash-4.1$ /usr/bin/time git log --parents -- llvm/test/CodeGen/Generic/bswap.ll >> /dev/null
190.63user 0.43system 3:11.01elapsed 100%CPU (0avgtext+0avgdata 702756maxresident)k
232inputs+0outputs (2major+177913minor)pagefaults 0swaps

bash-4.1$ touch dummy
bash-4.1$ git add dummy
bash-4.1$ git commit -m "test"
[master ce43ac2e487] test
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 dummy
bash-4.1$ /usr/bin/time git log --parents -- dummy >> /dev/null
205.54user 0.37system 3:25.83elapsed 100%CPU (0avgtext+0avgdata 644576maxresident)k
0inputs+0outputs (0major+163134minor)pagefaults 0swaps
--------------------------------------------------------------------------------


Now do the same for the old "llvm" repository.

--------------------------------------------------------------------------------
bash-4.1$ git clone https://github.com/llvm-mirror/llvm.git llvm && cd llvm
Cloning into 'llvm'...
remote: Enumerating objects: 84, done.
remote: Counting objects: 100% (84/84), done.
remote: Compressing objects: 100% (61/61), done.
remote: Total 1673859 (delta 25), reused 35 (delta 23), pack-reused 1673775
Receiving objects: 100% (1673859/1673859), 373.08 MiB | 12.72 MiB/s, done.
Resolving deltas: 100% (1369306/1369306), done.
Checking out files: 100% (36477/36477), done.
bash-4.1$ /usr/bin/time git log --parents -- test/CodeGen/Generic/bswap.ll >> /dev/null
4.89user 0.27system 0:05.19elapsed 99%CPU (0avgtext+0avgdata 468072maxresident)k
0inputs+0outputs (0major+120244minor)pagefaults 0swaps

bash-4.1$ touch dummy
bash-4.1$ git add dummy
bash-4.1$ git commit -m "test"
[master 1db81b43a30] test
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 dummy
bash-4.1$ /usr/bin/time git log --parents -- dummy >> /dev/null
4.05user 0.24system 0:04.32elapsed 99%CPU (0avgtext+0avgdata 437920maxresident)k
0inputs+0outputs (0major+112503minor)pagefaults 0swaps
--------------------------------------------------------------------------------


So for bswap.ll, "git log --parents" takes about 190/5 = 38 times longer in the new repo,
and for the new dummy file it takes about 205/4 = 51 times longer.

The llvm-project repo is somewhat larger, since we have merged several
projects: the number of commits increases from ~180000 to ~310000. But I doubt
that such an increase should slow down git log --parents by a factor of 50.
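
For reference, the ratios can be recomputed from the unrounded user times in the transcripts above. This is only a sketch: the helper name "ratio" is made up for illustration, and it assumes awk is available.

```shell
# ratio A B: print A/B rounded to one decimal place.
# LC_ALL=C keeps "." as the decimal separator regardless of locale.
ratio() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# User times (seconds) copied from the /usr/bin/time output above.
echo "bswap.ll slowdown: $(ratio 190.63 4.89)x"
echo "dummy slowdown:    $(ratio 205.54 4.05)x"
# Approximate commit counts, new repo vs. old repo.
echo "commit growth:     $(ratio 310000 180000)x"
```

With the unrounded inputs the slowdowns come out at roughly 39x and 51x, against a commit-count growth of about 1.7x.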


From what I understand, --parents can take some time, but I see a huge degradation with our new repo compared to the old one.
I am not sure whether the repo is just too large (or poorly packed?), or whether this is a git problem.

Any help understanding this is welcome.

I used git version 2.20.0 in the tests above.


PS. I think the problem can also be seen for files with a longer history, for example CODE_OWNERS.txt (llvm/CODE_OWNERS.txt in llvm-project). But in that case git log starts printing commits much sooner, so with gitk I get to see some history after just a few seconds even in llvm-project (although loading the full history still takes some time). For files with a very short history (like the dummy file example) nothing is printed until the very end (after about 200 seconds), so git log (and gitk) just appears to be stuck. Is git log buffering the result somehow, not printing anything until it has more than one commit to print?
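
To put a number on "starts printing much sooner", the delay before the first line of output can be measured separately from the total run time. The following is a minimal sketch, not something I have run on these repos: the helper name "first_output_delay" is hypothetical, and it assumes a date(1) that supports +%s (as GNU and BSD date do).

```shell
# first_output_delay CMD [ARGS...]: print the whole seconds that pass
# before CMD writes its first line to stdout (or exits without output).
# Hypothetical helper; resolution is limited to one second by date +%s.
first_output_delay() {
  start=$(date +%s)
  # The reader records a timestamp as soon as the first line arrives and
  # then exits; CMD typically stops on SIGPIPE at its next write.
  first=$("$@" 2>/dev/null | { IFS= read -r line; date +%s; })
  echo $((first - start))
}

# Example invocations (run inside a clone, not executed here):
#   first_output_delay git log --parents -- llvm/CODE_OWNERS.txt
#   first_output_delay git log --parents -- dummy
```

Comparing this figure with the elapsed time from /usr/bin/time should show whether a command is slow overall or only slow to produce its first commit.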

Regards,
Björn Pettersson A    

Ericsson
Datalinjen 4 (Hus K)
58330, Linköping
Sweden


Thread overview: 7+ messages
2019-04-02 11:31 Björn Pettersson A [this message]
2019-04-02 13:27 ` Bad performance when using git log --parents (used by gitk) Jeff King
2019-04-02 15:07   ` Björn Pettersson A
2019-04-02 18:20     ` Johannes Schindelin
2019-04-04  1:36       ` Jeff King
2019-04-04  1:41   ` [PATCH] revision: use a prio_queue to hold rewritten parents Jeff King
2019-04-04  1:54     ` Jeff King
