git@vger.kernel.org mailing list mirror (one of many)
* Git blame performance on files with a lot of history
@ 2018-12-14 18:29 Clement Moyroud
  2018-12-14 19:10 ` Bryan Turner
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Clement Moyroud @ 2018-12-14 18:29 UTC (permalink / raw)
  To: git

Hello,

My group at work is migrating a CVS repo to Git. The biggest issue we
face so far is the performance of git blame, especially compared to
CVS on the same file. One file in particular causes us trouble: it's a
30k-line file with 25 years of history in 3k+ commits. The complete
repo has 200k+ commits over that same period of time.

Currently, 'cvs annotate' takes 2.7 seconds, while 'git blame'
(without -M or -C) takes 145 seconds.

I tried using the commit-graph with the Bloom filter, per
https://public-inbox.org/git/61559c5b-546e-d61b-d2e1-68de692f5972@gmail.com/.
No dice:
    > time GIT_TEST_BLOOM_FILTERS=1 /wv/cmoyroud/calibre-src/git-bloom-filters/git-bloom-bin/bin/git commit-graph write --reachable
    Annotating commits in commit graph: 573705, done.
    Computing commit graph generation numbers: 100% (286441/286441), done.
    Computing commit diff Bloom filters: 100% (286441/286441), done.
    GIT_TEST_BLOOM_FILTERS=1  commit-graph write --reachable  386.80s user 31.78s system 78% cpu 8:53.87 total
    > time GIT_TEST_BLOOM_FILTERS=1 GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y /path/to/git blame master -- important/file.C > /tmp/foo.compiler.bloom
    Blaming lines: 100% (33179/33179), done.
    GIT_TEST_BLOOM_FILTERS=1 GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y   145.11s user 0.97s system 99% cpu 2:26.22 total
    > time /path/to/git blame master -- important/file.C > /tmp/foo.compiler.nobloom
    Blaming lines: 100% (33179/33179), done.
    GIT_TEST_BLOOM_FILTERS=1 GIT_TEST_BLOOM_FILTERS=1 GIT_USE_POC_BLOOM_FILTER=y   141.69s user 0.77s system 99% cpu 2:22.56 total
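
To put the timings above in proportion, here is a minimal sketch of two
helpers that compute a ratio and a percentage difference. The function
names `ratio` and `pct_diff` are made up for this illustration; the only
real dependency is a POSIX awk.

```shell
# ratio A B: print A divided by B, to one decimal place.
# (Hypothetical helper name, for illustration only.)
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# pct_diff BASE OTHER: print by how many percent OTHER differs from BASE.
# (Hypothetical helper name, for illustration only.)
pct_diff() {
    LC_ALL=C awk -v base="$1" -v other="$2" \
        'BEGIN { printf "%.1f\n", (other - base) / base * 100 }'
}
```

With the numbers from this message, `ratio 145 2.7` prints 53.7 (git blame
versus cvs annotate) and `pct_diff 141.69 145.11` prints 2.4 (user time
with the Bloom filter variables set versus without). A gap that small
suggests the filters made no measurable difference to blame here.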

I used Derrick Stolee's tree at
https://github.com/derrickstolee/git/tree/bloom/stolee

Looking at the blame code, it does not seem to be able to use the
commit graph, so I tried the same rev-list command from the e-mail,
using my own file:
    > GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y /path/to/git rev-list --count --full-history HEAD -- important/file.C
    3576

No trace information there either. Running 'strings' on the binary
shows the environment variable names, so I'm not totally crazy. Let me
know whether I tried the right thing :)
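
One basic thing to rule out is whether a commit-graph file is on disk at
all. Below is a minimal sketch of such a check. It assumes the standard
location under objects/info (a single `commit-graph` file, or a
`commit-graphs` directory in newer Git versions that write split graphs).
The helper name `has_commit_graph` is made up for this illustration, and
the check says nothing about the experimental Bloom filter branch itself.

```shell
# has_commit_graph GIT_DIR: succeed if a commit-graph file, or a split
# commit-graph chain directory, exists under GIT_DIR/objects/info.
# (Hypothetical helper name, for illustration only.)
has_commit_graph() {
    [ -f "$1/objects/info/commit-graph" ] ||
        [ -d "$1/objects/info/commit-graphs" ]
}

# Possible usage inside a repository (left commented out so the block
# has no side effects when sourced):
#   has_commit_graph "$(git rev-parse --git-dir)" && git commit-graph verify
#   git config core.commitGraph
```

`git commit-graph verify` checks the file for consistency, and
`git config core.commitGraph` shows whether reading it is enabled. As far
as I know, in Git releases from this period that setting was not on by
default, so it is worth checking alongside the environment variables.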

Looks like blame performance is gonna be the biggest issue for us, so
I'm really interested in seeing improvements there. Let me know if
there's anything else I can try.

Cheers,

Clément



Thread overview: 7+ messages
2018-12-14 18:29 Git blame performance on files with a lot of history Clement Moyroud
2018-12-14 19:10 ` Bryan Turner
2018-12-17 20:43   ` Clement Moyroud
2018-12-14 21:31 ` Derrick Stolee
2018-12-17 20:59   ` Clement Moyroud
2018-12-14 22:48 ` Ævar Arnfjörð Bjarmason
2018-12-17 20:30   ` Clement Moyroud
