git@vger.kernel.org mailing list mirror (one of many)
From: Bryan Turner <bturner@atlassian.com>
To: Anthony Muller <anthony@monospace.sh>
Cc: git <git@vger.kernel.org>
Subject: Re: Performance of "git gc..." is extremely bad in some cases
Date: Mon, 8 Mar 2021 14:29:16 -0800
Message-ID: <CAGyf7-F6jbs-HQeCSMjf_y8Y=5ZfME=CjBagAfKUbnP_0vDXqA@mail.gmail.com>
In-Reply-To: <17813b232e9.e48d03c3862272.7793967418558853913@monospace.sh>

On Mon, Mar 8, 2021 at 1:32 PM Anthony Muller <anthony@monospace.sh> wrote:
>
> What did you do before the bug happened? (Steps to reproduce your issue)
>
> git clone https://github.com/notracking/hosts-blocklists
> cd hosts-blocklists
> git reflog expire --all --expire=now && git gc --prune=now --aggressive

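--aggressive tells git gc to discard all of its existing delta chains
and compute new ones from scratch, and to search much harder for
candidates while doing so: each object is compared against a window of
250 others (gc.aggressiveWindow) instead of the usual 10 (pack.window).
This is going to be the primary source of the resource usage you see,
as well as the time.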

Aggressive GCs are something you do once in a (very great) while. If
you try this without the --aggressive, how does it look?
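A quick way to answer that is to time both variants on the same clone.
Below is a minimal POSIX sh sketch; compare_gc is a hypothetical helper
name, and it assumes git is on your PATH and that your date supports
+%s (GNU and macOS date both do).

```shell
# compare_gc (hypothetical helper): time an ordinary gc, then an
# aggressive one, in the repository given as $1 (default: current dir).
compare_gc() {
    repo=${1:-.}

    start=$(date +%s)
    git -C "$repo" gc --quiet --prune=now || return
    end=$(date +%s)
    echo "plain gc: $((end - start))s"

    start=$(date +%s)
    # --aggressive recomputes every delta with a much larger search window
    git -C "$repo" gc --quiet --prune=now --aggressive || return
    end=$(date +%s)
    echo "aggressive gc: $((end - start))s"
}
```

For example, compare_gc ~/hosts-blocklists (path assumed) prints two
lines with the elapsed seconds for each run.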

>
>
> What did you expect to happen? (Expected behavior)
>
> Running gc on a ~300 MB repo should not take 1 hour 55 minutes when
> running gc on a 2.6 GB repo (LLVM) only takes 24 minutes.
>
>
> What happened instead? (Actual behavior)
>
> Command took 1h 55m to complete on a ~300MB repo and used enough
> resources that the machine is almost unusable.
>
>
> What's different between what you expected and what actually happened?
>
> The compression stage uses the majority of the resources and time. Compression
> itself, when compared to something like zlib or lzma, should not take very long.
> While more may be happening as objects are compressed, the amount of time
> gc takes to compress the objects and the resources it consumes are both
> unreasonable.

The compression happening here is delta compression, not just
general-purpose compression like zip (Git zlib-compresses objects too,
but that is the cheap part). Git compares each object against a sliding
window of candidate objects (10 by default, 250 with --aggressive) and
stores the matches as chains with a base object and (essentially)
instructions for converting that base object into another object.
That's significantly more resource-intensive work than zipping some
data.
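If you want to see those chains for yourself, git verify-pack -s
(--stat-only) prints a histogram of delta chain lengths for a pack
without verifying each object. A sketch with a hypothetical helper
name, assuming Git 2.13 or newer for rev-parse --absolute-git-dir:

```shell
# delta_chain_summary (hypothetical helper): print the delta chain length
# histogram for every pack in the repository given as $1 (default: .).
delta_chain_summary() {
    repo=${1:-.}
    gitdir=$(git -C "$repo" rev-parse --absolute-git-dir) || return
    for idx in "$gitdir"/objects/pack/pack-*.idx; do
        # The glob stays unexpanded when there are no packs; skip it then.
        [ -e "$idx" ] || continue
        # "non delta: N objects" counts base objects; each
        # "chain length = K: M objects" line counts objects K deltas deep.
        git -C "$repo" verify-pack -s "$idx" || return
    done
}
```

Running it before and after an aggressive gc shows how the chains were
rebuilt.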

>
> Memory: RSS = 3451152 KB (3.29 GB), VSZ = 29286272 KB (27.92 GB)
> Time: 12902.83s user 8995.41s system 315% cpu 1:55:36.73 total

Git offers several knobs that can be used to influence (though not
necessarily control) its resource usage. By default Git runs one
packing thread per logical CPU (so a hyperthreaded quad-core i7 like
yours likely runs 8) and places _no limit_ on window memory per
thread. You might want to investigate options like pack.threads and
pack.windowMemory to apply some constraints.
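For example, git -c sets configuration for a single invocation, and
that setting is passed down to the repack and pack-objects processes gc
starts. The helper name and the values (2 threads, 256 MiB per thread)
are arbitrary illustrations, not recommendations:

```shell
# constrained_aggressive_gc (hypothetical helper): aggressive gc in the
# repository $1 (default: .) with delta search limited to 2 threads and
# 256 MiB of window memory per thread. Both values are arbitrary examples.
constrained_aggressive_gc() {
    repo=${1:-.}
    git -C "$repo" -c pack.threads=2 -c pack.windowMemory=256m \
        gc --prune=now --aggressive
}

# To make the limits stick for one repository instead:
#   git config pack.threads 2
#   git config pack.windowMemory 256m
```

Fewer threads trade wall-clock time for a usable machine, and a window
memory limit can shrink the effective window, so the resulting deltas
may be somewhat less optimal.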

>
> I've seen this issue with a number of repos and size of the repo does not
> determine if this happens. LLVM @ 2.6 GB worked flawlessly, a 900 MB
> repo never finished, this 300 MB repo takes forever, and if you test something
> like Chromium, git will just crash.
>
>
> [System Info]
> hardware: 2.9 GHz Quad Core i7
> git version:
> git version 2.30.0
> cpu: x86_64
> no commit associated with this build
> sizeof-long: 8
> sizeof-size_t: 8
> shell-path: /bin/sh
> uname: Darwin 19.6.0 Darwin Kernel Version 19.6.0: Tue Jan 12 22:13:05 PST 2021; root:xnu-6153.141.16~1/RELEASE_X86_64 x86_64
> compiler info: clang: 12.0.0 (clang-1200.0.32.28)
> libc info: no libc information available
> $SHELL (typically, interactive shell): /usr/local/bin/zsh
>

Hope this helps!
-b

Thread overview: 5+ messages
2021-03-08 21:15 Performance of "git gc..." is extremely bad in some cases Anthony Muller
2021-03-08 22:29 ` Bryan Turner [this message]
     [not found]   ` <178140c3b3b.c7a29306868075.2037370475662478386@monospace.sh>
2021-03-08 23:55     ` Bryan Turner
2021-03-08 23:56   ` brian m. carlson
2021-03-09  0:14     ` Anthony Muller
