From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Eric Wong <e@80x24.org>
Cc: Git Mailing List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
Vicent Marti <tanoku@gmail.com>
Subject: Re: [PATCH/RFC] gitperformance: add new documentation about git performance tuning
Date: Tue, 4 Apr 2017 23:12:58 +0200
Message-ID: <CACBZZX6qDUvbuOQ-tJ+enARJUcoUoipbapVxi4Lf=84xBCmbQw@mail.gmail.com>
In-Reply-To: <20170403223956.GA3537@whir>
On Tue, Apr 4, 2017 at 12:39 AM, Eric Wong <e@80x24.org> wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> On Mon, Apr 3, 2017 at 11:34 PM, Eric Wong <e@80x24.org> wrote:
>> > Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> >> - Should we be covering good practices for your repo going forward to
>> >> maintain good performance? E.g. don't have some huge tree all in
>> >> one directory (use subdirs), don't add binary (rather
>> >> un-delta-able) content if you can help it etc.
>> >
>> > Yes, I think so.
>>
>> I'll try to write something up.
>>
>> > I think avoiding ever growing ChangeLog-type files should also
>> > be added to things to avoid.
>>
>> How were those bad specifically? They should delta quite well, it's
>> expensive to commit large files but no more because they're
>> ever-growing.
>
> It might be blame/annotate specifically, I was remembering this
> thread from a decade ago:
>
> https://public-inbox.org/git/4aca3dc20712110933i636342fbifb15171d3e3cafb3@mail.gmail.com/T/

I did some basic testing on this, and I don't think advice about
ChangeLog-style files is worth including. On gcc.git, blame on
ChangeLog still takes a few hundred MB of RAM, but finishes in about
2s on my machine. The gcc/fold-const.c file takes ~10s for me, though
that thread seems to have resulted in some patches to git-blame.
Running this:

    parallel '/usr/bin/time -f %E git blame {} 2>&1 >/dev/null | tr "\n" "\t" && git log --oneline {} | wc -l | tr "\n" "\t" && wc -l {} | tr "\n" "\t" && echo {}' \
        ::: $(git ls-files) | tee /tmp/git-blame-times.txt

On git.git shows that the slowest blames are just files with either
lots of commits, or lots of lines, or some combination of the two. The
gcc.git repo has some more pathological cases, top 10 on that repo:

    $ parallel '/usr/bin/time -f %E git blame {} 2>&1 >/dev/null | tr "\n" "\t" && git log --oneline {} | wc -l | tr "\n" "\t" && wc -l {} | tr "\n" "\t" && echo {}' \
        ::: $(git ls-files|grep -e ^gcc/ -e ChangeLog|grep -v '/.*/') | tee /tmp/gcc-blame-times.txt
    $ sort -nr /tmp/gcc-blame-times.txt |head -n 10
    0:18.12   1513  14517 gcc/tree.c       gcc/tree.c
    0:17.35  66336   7435 gcc/ChangeLog    gcc/ChangeLog
    0:16.87   1634  30455 gcc/dwarf2out.c  gcc/dwarf2out.c
    0:16.76   1160   7937 gcc/varasm.c     gcc/varasm.c
    0:16.36   1692   5491 gcc/tree.h       gcc/tree.h
    0:15.34     94    493 gcc/xcoffout.c   gcc/xcoffout.c
    0:15.22     54    194 gcc/xcoffout.h   gcc/xcoffout.h
    0:15.12    964   9224 gcc/reload1.c    gcc/reload1.c
    0:14.90   1593   2202 gcc/toplev.c     gcc/toplev.c
    0:14.66     11     43 gcc/typeclass.h  gcc/typeclass.h

Which makes it pretty clear that blame is slow where you'd expect, not
with files that are prepended or appended to.
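
A note for anyone re-sorting these files: "sort -nr" only parses the
leading minutes of a time like 0:18.12, and the order within a minute
comes from sort's last-resort whole-line comparison. That happens to
work here because the seconds are zero-padded, but it would misorder
entries once %E switches to its h:mm:ss form for runs over an hour.
Below is a minimal sketch that converts the first column to seconds
before sorting. The function name blame_times_by_seconds is made up
for illustration, and it assumes the tab-separated layout written by
the commands above with times in m:ss.ss form only.

```shell
# Hypothetical helper: print a blame-times file sorted by elapsed
# seconds, slowest first. Assumes column 1 is "M:SS.ss" as printed by
# /usr/bin/time -f %E (the h:mm:ss form is not handled).
blame_times_by_seconds() {
    LC_ALL=C awk '{
        split($1, t, ":")                  # t[1] = minutes, t[2] = seconds
        printf "%.2f\t%s\n", t[1] * 60 + t[2], $0
    }' "$1" |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1 -nr |  # numeric sort on seconds
        cut -f2-                           # drop the helper column again
}
```

It would be used as:

    blame_times_by_seconds /tmp/gcc-blame-times.txt | head -n 10
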
>> One issue with e.g. storing logs (I keep my IRC logs in git) is that
>> if you're constantly committing large (text) files without repack your
>> .git grows by a *lot* in a very short amount of time until a very
>> expensive repack, so now I split my IRC logs by month.
>
> Yep, that too; as auto GC is triggered by the number of loose
> objects, not the size/packability of them.
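
To see how that plays out in a given repository, the loose object
count and size from "git count-objects -v" can be compared against
gc.auto. This is a rough sketch, not how git itself decides: the
function name loose_over_threshold is invented here, and as far as I
know "git gc --auto" estimates the count by sampling rather than
counting every loose object.

```shell
# Hypothetical helper: read `git count-objects -v` output on stdin and
# report whether the loose object count exceeds the given threshold.
# "count" is the number of loose objects, "size" their disk use in KiB.
loose_over_threshold() {
    awk -v max="$1" '
        $1 == "count:" { count = $2 }
        $1 == "size:"  { kib = $2 }
        END {
            printf "%d loose objects, %d KiB, threshold %d: %s\n",
                count, kib, max, (count > max ? "over" : "under")
        }'
}
```

It would be used as follows (gc.auto defaults to 6700 when unset):

    git count-objects -v |
        loose_over_threshold "$(git config --get gc.auto || echo 6700)"

A repo with a few hundred large loose text files would report "under"
here despite a large KiB figure, which is the case described above.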