git@vger.kernel.org mailing list mirror (one of many)
From: Anthony Muller <anthony@monospace.sh>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: "Bryan Turner" <bturner@atlassian.com>, "git" <git@vger.kernel.org>
Subject: Re: Performance of "git gc..." is extremely bad in some cases
Date: Tue, 09 Mar 2021 00:14:01 +0000
Message-ID: <17814555aec.b8e46da8884253.2263161421793744939@monospace.sh>
In-Reply-To: <YEa5xe0gNDh2wZLB@camp.crustytoothpaste.net>

Thank you Brian and Bryan. You both clarified what was happening and now I know what to look for.

I can use a shallow clone for most repos, but there are some I want to keep history for. I don't need a full copy of this repo, but it was a good repo to show the issue I was facing.
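
For reference, a shallow clone of the repository from the original report would presumably be `git clone --depth 1 https://github.com/notracking/hosts-blocklists`; the depth of 1 is an assumption, not something stated in this thread. The sketch below shows the same idea without network access, using a throwaway local repository. The directory names, file name, and demo identity are all hypothetical.

```shell
#!/bin/sh
# Sketch: shallow-clone a small local repository and confirm that only the
# most recent commit is fetched. All names here are illustrative.
set -e

work=$(mktemp -d)   # scratch directory for the demonstration
cd "$work"
git init -q source

i=1
while [ "$i" -le 3 ]; do
    # Each automatic-style commit appends one line to the same file.
    echo "0.0.0.0 host$i.example" >> source/hosts.txt
    git -C source add hosts.txt
    git -C source -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "automatic update $i"
    i=$((i + 1))
done

# --depth is ignored for plain local paths, so clone through a file:// URL.
git clone -q --depth 1 "file://$work/source" shallow

# Number of commits reachable in the shallow clone.
git -C shallow rev-list --count HEAD
```

The source repository has three commits, while the shallow clone only knows about the newest one, so there is far less history for a later `git gc` to repack.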

Thanks again!


 ---- On Mon, 08 Mar 2021 23:56:53 +0000 brian m. carlson <sandals@crustytoothpaste.net> wrote ----
 > On 2021-03-08 at 22:29:16, Bryan Turner wrote:
 > > On Mon, Mar 8, 2021 at 1:32 PM Anthony Muller <anthony@monospace.sh> wrote:
 > > >
 > > > What did you do before the bug happened? (Steps to reproduce your issue)
 > > >
 > > > git clone https://github.com/notracking/hosts-blocklists
 > > > cd hosts-blocklists
 > > > git reflog expire --all --expire=now && git gc --prune=now --aggressive
 > > 
 > > --aggressive tells git gc to discard all of its existing delta chains
 > > and go find new ones, and to be fairly aggressive in how it looks for
 > > candidates. This is going to be the primary source of the resource
 > > usage you see, as well as the time.
 > > 
 > > Aggressive GCs are something you do once in a (very great) while. If
 > > you try this without the --aggressive, how does it look?
 > 
 > I should point out that this repository is also rather pathologically
 > structured.  Almost every commit is an automatic commit updating the
 > same five files which are text files ranging from 5 MB to 11 MB.
 > 
 > When you use --aggressive, as Bryan pointed out, you're asking to throw
 > away all the deltas and try really hard to compute all of them fresh.
 > That's going to use a lot of memory because you're loading many large
 > text files into memory.  It's also going to use a lot of CPU because
 > these files do indeed delta extremely well, and computing deltas on
 > larger files is more expensive, especially when there are many of
 > them.
 > 
 > And that's just the blobs.  The trees and commits are also going to be
 > nearly identically structured and will also delta well with virtually
 > every other similar object of their type.  Normally Git sorts by size
 > which helps pick better candidates, but since these are all going to be
 > identically sized, the performance is going to suffer.
 > 
 > Now, I have the advantage in this case of being a person who's sometimes
 > on call for the maintenance of Git repositories and in that capacity,
 > that this is pathologically structured is obvious to me.  But, yeah, I
 > would definitely not run --aggressive on this repo unless I needed to
 > and I would not expect it to perform well.
 > -- 
 > brian m. carlson (he/him or they/them)
 > Houston, Texas, US
 > 
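
The advice quoted above (run the same cleanup without --aggressive) can be sketched as follows. This is a minimal illustration on a throwaway repository that loosely mimics the pattern described, repeated automatic commits to one text file; the file name, commit messages, and demo identity are hypothetical, and the files are tiny compared with the 5 MB to 11 MB files in the real repository.

```shell
#!/bin/sh
# Sketch: repack a small repository with a plain `git gc`, i.e. the command
# from the report minus --aggressive, so existing deltas may be reused.
set -e

repo=$(mktemp -d)   # scratch repository for the demonstration
cd "$repo"
git init -q .

i=1
while [ "$i" -le 5 ]; do
    # Successive revisions differ by one line, so they delta well.
    echo "0.0.0.0 host$i.example" >> hosts.txt
    git add hosts.txt
    git -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "automatic update $i"
    i=$((i + 1))
done

# Same steps as in the report, without --aggressive.
git reflog expire --all --expire=now
git gc -q --prune=now

# Show loose and packed object counts after the repack.
git count-objects -v
```

Prefixing the `git gc` line with `time` on a real repository, with and without --aggressive, is one way to see the difference the thread describes.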

Thread overview: 5+ messages
2021-03-08 21:15 Performance of "git gc..." is extremely bad in some cases Anthony Muller
2021-03-08 22:29 ` Bryan Turner
     [not found]   ` <178140c3b3b.c7a29306868075.2037370475662478386@monospace.sh>
2021-03-08 23:55     ` Bryan Turner
2021-03-08 23:56   ` brian m. carlson
2021-03-09  0:14     ` Anthony Muller [this message]
