From: Anthony Muller <anthony@monospace.sh>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: "Bryan Turner" <bturner@atlassian.com>, "git" <git@vger.kernel.org>
Subject: Re: Performance of "git gc..." is extremely bad in some cases
Date: Tue, 09 Mar 2021 00:14:01 +0000
Message-ID: <17814555aec.b8e46da8884253.2263161421793744939@monospace.sh>
In-Reply-To: <YEa5xe0gNDh2wZLB@camp.crustytoothpaste.net>
Thank you Brian and Bryan. You both clarified what was happening and now I know what to look for.
I can use a shallow clone for most repos, but there are some I want to keep history for. I don't need a full copy of this repo, but it was a good repo to show the issue I was facing.
Thanks again!
---- On Mon, 08 Mar 2021 23:56:53 +0000 brian m. carlson <sandals@crustytoothpaste.net> wrote ----
> On 2021-03-08 at 22:29:16, Bryan Turner wrote:
> > On Mon, Mar 8, 2021 at 1:32 PM Anthony Muller <anthony@monospace.sh> wrote:
> > >
> > > What did you do before the bug happened? (Steps to reproduce your issue)
> > >
> > > git clone https://github.com/notracking/hosts-blocklists
> > > cd hosts-blocklists
> > > git reflog expire --all --expire=now && git gc --prune=now --aggressive
> >
> > --aggressive tells git gc to discard all of its existing delta chains
> > and go find new ones, and to be fairly aggressive in how it looks for
> > candidates. This is going to be the primary source of the resource
> > usage you see, as well as the time.
> >
> > Aggressive GCs are something you do once in a (very great) while. If
> > you try this without the --aggressive, how does it look?
>
> I should point out that this repository is also rather pathologically
> structured. Almost every commit is an automatic commit updating the
> same five files which are text files ranging from 5 MB to 11 MB.
>
> When you use --aggressive, as Bryan pointed out, you're asking to throw
> away all the deltas and try really hard to compute all of them fresh.
> That's going to use a lot of memory because you're loading many large
> text files into memory. It's also going to use a lot of CPU because
> these files do indeed delta extremely well, and computing deltas on
> larger files is more expensive, especially when there are many of
> them.
>
> And that's just the blobs. The trees and commits are also going to be
> nearly identically structured and will also delta well with virtually
> every other similar object of their type. Normally Git sorts by size
> which helps pick better candidates, but since these are all going to be
> identically sized, the performance is going to suffer.
>
> Now, I have the advantage in this case of being a person who's sometimes
> on call for the maintenance of Git repositories and in that capacity,
> that this is pathologically structured is obvious to me. But, yeah, I
> would definitely not run --aggressive on this repo unless I needed to
> and I would not expect it to perform well.
> --
> brian m. carlson (he/him or they/them)
> Houston, Texas, US
>
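
For anyone reading this thread later, the advice above can be sketched as a short script. It is only an illustration under stated assumptions: the throwaway repository, the file name hosts.txt, and the values given for --window, --depth, pack.windowMemory and pack.threads are hypothetical choices for the example, not settings recommended by anyone in the thread. It runs a routine gc without --aggressive, then shows one way to bound memory and CPU if a full delta recompute really is wanted.

```shell
#!/bin/sh
set -eu

# Throwaway repository (hypothetical) so the commands can be tried safely.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example"
git config user.email "example@example.invalid"

# A few commits that each rewrite the same text file, mimicking in
# miniature the automatic-commit pattern described above.
for i in 1 2 3 4 5; do
    awk -v r="$i" 'BEGIN {
        for (n = 1; n <= 2000; n++) print "0.0.0.0 host" n ".example"
        print "# revision " r
    }' > hosts.txt
    git add hosts.txt
    git commit -q -m "update $i"
done

# Routine maintenance: same as the original report, minus --aggressive,
# so existing deltas are reused instead of being thrown away.
git reflog expire --all --expire=now
git gc --quiet --prune=now

# If a full recompute is really needed, repack -f discards existing
# deltas like --aggressive does; the limits below (assumed values)
# cap the delta window, chain depth, memory per thread and threads.
git -c pack.windowMemory=256m -c pack.threads=1 \
    repack -a -d -f -q --window=50 --depth=50

# Inspect the result: number of packs and their size on disk.
git count-objects -v

# For repositories where history is not needed, a shallow clone avoids
# most of the work in the first place, for example:
#   git clone --depth 1 https://github.com/notracking/hosts-blocklists
```

On a repository like the one in the report, the second step would still be expensive; the limits only trade pack size for lower peak memory and CPU use.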