git@vger.kernel.org mailing list mirror (one of many)
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Bryan Turner <bturner@atlassian.com>
Cc: Anthony Muller <anthony@monospace.sh>, git <git@vger.kernel.org>
Subject: Re: Performance of "git gc..." is extremely bad in some cases
Date: Mon, 8 Mar 2021 23:56:53 +0000
Message-ID: <YEa5xe0gNDh2wZLB@camp.crustytoothpaste.net>
In-Reply-To: <CAGyf7-F6jbs-HQeCSMjf_y8Y=5ZfME=CjBagAfKUbnP_0vDXqA@mail.gmail.com>

On 2021-03-08 at 22:29:16, Bryan Turner wrote:
> On Mon, Mar 8, 2021 at 1:32 PM Anthony Muller <anthony@monospace.sh> wrote:
> >
> > What did you do before the bug happened? (Steps to reproduce your issue)
> >
> > git clone https://github.com/notracking/hosts-blocklists
> > cd hosts-blocklists
> > git reflog expire --all --expire=now && git gc --prune=now --aggressive
> 
> --aggressive tells git gc to discard all of its existing delta chains
> and go find new ones, and to be fairly aggressive in how it looks for
> candidates. This is going to be the primary source of the resource
> usage you see, as well as the time.
> 
> Aggressive GCs are something you do once in a (very great) while. If
> you try this without the --aggressive, how does it look?

I should point out that this repository is also rather pathologically
structured.  Almost every commit is an automatic commit updating the
same five text files, which range from 5 MB to 11 MB.
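
One way to check for this kind of structure is to count how many commits touch each path. This is a minimal sketch, assuming a POSIX shell with the standard grep, sort, uniq, sed, and head tools; count_path_changes and top_changed_paths are hypothetical helper names, not Git commands.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): reads one path per line on stdin,
# ignores blank lines, and prints "COUNT PATH", most frequently changed first.
count_path_changes() {
    grep -v '^$' | sort | uniq -c | sort -rn | sed 's/^ *//'
}

# Hypothetical helper: the N paths (default 10) changed by the most commits
# in the current repository. `git log --format= --name-only` suppresses the
# commit header and prints only the names of the files each commit changed.
top_changed_paths() {
    git log --format= --name-only | count_path_changes | head -n "${1:-10}"
}
```

Run from inside a clone, `top_changed_paths 5` would be expected to put the handful of regenerated files at the top with counts close to the total number of commits.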

When you use --aggressive, as Bryan pointed out, you're asking to throw
away all the deltas and try really hard to compute all of them fresh.
That's going to use a lot of memory because you're loading many large
text files into memory.  It's also going to use a lot of CPU, because
these files do indeed delta extremely well and computing deltas on
larger files is more expensive, especially when there are many of them.
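
For comparison, here is a minimal sketch of two cheaper options, assuming a local clone and a Git new enough to support `git -C`. Plain `git gc` reuses existing deltas, while `git repack -f` recomputes them (as --aggressive does) but with explicit limits. The helper names routine_gc and bounded_repack are hypothetical, and the --window, --depth, --window-memory, and --threads values are illustrative assumptions, not recommendations.

```shell
#!/bin/sh
# Hypothetical helper: routine maintenance for repository $1. Without
# --aggressive, existing deltas are reused, so this is much cheaper.
routine_gc() {
    git -C "$1" gc --prune=now
}

# Hypothetical helper: recompute all deltas in repository $1 with bounded
# cost. -a packs everything, -d removes redundant packs, -f discards existing
# deltas. The numeric limits are illustrative assumptions only.
bounded_repack() {
    git -C "$1" repack -a -d -f \
        --window=50 --depth=50 \
        --window-memory=256m --threads=2
}
```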

And that's just the blobs.  The trees and commits are also going to be
nearly identically structured and will also delta well with virtually
every other similar object of their type.  Normally Git sorts by size,
which helps it pick better candidates, but since these are all going to
be nearly identically sized, performance is going to suffer.
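
To see how uniform the object sizes actually are, one can list every object's type and size and summarize them per type. This is a minimal sketch, assuming awk and sort are available; summarize_sizes and object_size_summary are hypothetical helpers, and the "TYPE COUNT MIN MAX" output format is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: reads "TYPE SIZE" lines on stdin and prints one
# "TYPE COUNT MIN MAX" line per object type, sorted by type name. A narrow
# MIN..MAX range across many objects means sorting by size does little to
# separate delta candidates.
summarize_sizes() {
    awk '{
        s = $2 + 0                      # force a numeric comparison
        n[$1]++
        if (!($1 in lo) || s < lo[$1]) lo[$1] = s
        if (!($1 in hi) || s > hi[$1]) hi[$1] = s
    } END {
        for (t in n) print t, n[t], lo[t], hi[t]
    }' | sort
}

# Hypothetical helper: summarize every object in the current repository.
object_size_summary() {
    git cat-file --batch-all-objects \
        --batch-check='%(objecttype) %(objectsize)' | summarize_sizes
}
```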

Now, I have the advantage in this case of sometimes being on call for
the maintenance of Git repositories, and in that capacity, it's obvious
to me that this repository is pathologically structured.  But, yeah, I
would definitely not run --aggressive on this repo unless I needed to,
and I would not expect it to perform well.
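
If an aggressive GC is needed anyway, its cost can also be capped through configuration. This is a minimal sketch, assuming repository-local config is acceptable; limit_aggressive_gc is a hypothetical helper, and the values are illustrative assumptions rather than tuned settings. gc.aggressiveWindow and gc.aggressiveDepth set the delta window and maximum chain depth that --aggressive uses, while pack.windowMemory and pack.threads bound per-thread memory and parallelism during delta search.

```shell
#!/bin/sh
# Hypothetical helper: cap the cost of `git gc --aggressive` in repository $1.
# All values are illustrative assumptions, not recommendations.
limit_aggressive_gc() {
    git -C "$1" config gc.aggressiveWindow 50   # delta search window
    git -C "$1" config gc.aggressiveDepth 50    # maximum delta chain depth
    git -C "$1" config pack.windowMemory 256m   # per-thread window memory cap
    git -C "$1" config pack.threads 2           # delta search threads
}
```

</imports>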
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

Thread overview: 5+ messages
2021-03-08 21:15 Performance of "git gc..." is extremely bad in some cases Anthony Muller
2021-03-08 22:29 ` Bryan Turner
     [not found]   ` <178140c3b3b.c7a29306868075.2037370475662478386@monospace.sh>
2021-03-08 23:55     ` Bryan Turner
2021-03-08 23:56   ` brian m. carlson [this message]
2021-03-09  0:14     ` Anthony Muller