git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Eric Wong <e@80x24.org>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	Vicent Marti <tanoku@gmail.com>
Subject: Re: [PATCH/RFC] gitperformance: add new documentation about git performance tuning
Date: Tue, 4 Apr 2017 23:12:58 +0200	[thread overview]
Message-ID: <CACBZZX6qDUvbuOQ-tJ+enARJUcoUoipbapVxi4Lf=84xBCmbQw@mail.gmail.com> (raw)
In-Reply-To: <20170403223956.GA3537@whir>

On Tue, Apr 4, 2017 at 12:39 AM, Eric Wong <e@80x24.org> wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> On Mon, Apr 3, 2017 at 11:34 PM, Eric Wong <e@80x24.org> wrote:
>> > Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> >>  - Should we be covering good practices for your repo going forward to
>> >>    maintain good performance? E.g. don't have some huge tree all in
>> >>    one directory (use subdirs), don't add binary (rather
>> >>    un-delta-able) content if you can help it etc.
>> >
>> > Yes, I think so.
>>
>> I'll try to write something up.
>>
>> > I think avoiding ever growing ChangeLog-type files should also
>> > be added to things to avoid.
>>
>> How were those bad specifically? They should delta quite well, it's
>> expensive to commit large files but no more because they're
>> ever-growing.
>
> It might be blame/annotate specifically, I was remembering this
> thread from a decade ago:
>
>   https://public-inbox.org/git/4aca3dc20712110933i636342fbifb15171d3e3cafb3@mail.gmail.com/T/

I did some basic testing on this, and I think advice about
ChangeLog-style files isn't worth including. On gcc.git blame on
ChangeLog still takes a few hundred MB of RAM, but finishes in about
2s on my machine. That gcc/fold-const.c file takes ~10s for me though,
but that thread seems to have resulted in some patches to git-blame.

Running this:

    parallel '/usr/bin/time -f %E git blame {} 2>&1 >/dev/null | tr
"\n" "\t" && git log --oneline {} | wc -l | tr "\n" "\t" && wc -l {} |
tr "\n" "\t" && echo {}' ::: $(git ls-files) | tee
/tmp/git-blame-times.txt

On git.git shows that the slowest blames are just files with either
lots of commits, or lots of lines, or some combination of the two. The
gcc.git repo has some more pathological cases, top 10 on that repo:

$ parallel '/usr/bin/time -f %E git blame {} 2>&1 >/dev/null | tr "\n"
"\t" && git log --oneline {} | wc -l | tr "\n" "\t" && wc -l {} | tr
"\n" "\t" && echo {}' ::: $(git ls-files|grep -e ^gcc/ -e
ChangeLog|grep -v '/.*/') | tee /tmp/gcc-blame-times.txt
$ sort -nr /tmp/gcc-blame-times.txt |head -n 10
0:18.12 1513    14517 gcc/tree.c        gcc/tree.c
0:17.35 66336   7435 gcc/ChangeLog      gcc/ChangeLog
0:16.87 1634    30455 gcc/dwarf2out.c   gcc/dwarf2out.c
0:16.76 1160    7937 gcc/varasm.c       gcc/varasm.c
0:16.36 1692    5491 gcc/tree.h gcc/tree.h
0:15.34 94      493 gcc/xcoffout.c      gcc/xcoffout.c
0:15.22 54      194 gcc/xcoffout.h      gcc/xcoffout.h
0:15.12 964     9224 gcc/reload1.c      gcc/reload1.c
0:14.90 1593    2202 gcc/toplev.c       gcc/toplev.c
0:14.66 11      43 gcc/typeclass.h      gcc/typeclass.h

Which makes it pretty clear that blame is slow where you'd expect, not
with files that are prepended or appended to.


>> One issue with e.g. storing logs (I keep my IRC logs in git) is that
>> if you're constantly committing large (text) files without repack your
>> .git grows by a *lot* in a very short amount of time until a very
>> expensive repack, so now I split my IRC logs by month.
>
> Yep, that too; as auto GC is triggered by the number of loose
> objects, not the size/packability of them.

  reply	other threads:[~2017-04-04 21:13 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-03 21:16 Ævar Arnfjörð Bjarmason
2017-04-03 21:34 ` Eric Wong
2017-04-03 21:57   ` Ævar Arnfjörð Bjarmason
2017-04-03 22:39     ` Eric Wong
2017-04-04 21:12       ` Ævar Arnfjörð Bjarmason [this message]
2017-04-04  2:19     ` Jeff King
2017-04-04 15:07 ` Jeff Hostetler
2017-04-04 15:18   ` Ævar Arnfjörð Bjarmason
2017-04-04 18:25     ` Jeff Hostetler
2017-04-05 12:56 ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACBZZX6qDUvbuOQ-tJ+enARJUcoUoipbapVxi4Lf=84xBCmbQw@mail.gmail.com' \
    --to=avarab@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=tanoku@gmail.com \
    --subject='Re: [PATCH/RFC] gitperformance: add new documentation about git performance tuning' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).