From: Derrick Stolee <derrickstolee@github.com>
To: Jeff King <peff@peff.net>, Taylor Blau <me@ttaylorr.com>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH 03/10] builtin/gc.c: ignore cruft packs with `--keep-largest-pack`
Date: Tue, 18 Apr 2023 10:54:49 -0400
Message-ID: <64e238fe-b69b-d670-6224-930b32bab9a5@github.com>
In-Reply-To: <20230418103909.GD508219@coredump.intra.peff.net>

On 4/18/2023 6:39 AM, Jeff King wrote:
> On Mon, Apr 17, 2023 at 07:03:08PM -0400, Taylor Blau wrote:

I agree with the prior discussion that gc.bigPackThreshold is
currently misbehaving and stopping it from caring about cruft packs
is the best way to fix that behavior in this series.

>> It is possible that in the future we could support writing multiple
>> cruft packs (we already handle the presence of multiple cruft packs
>> fine, just don't expose an easy way for the user to write >1 of them).
>> And at that point we would be able to relax this patch a bit and allow
>> `gc.bigPackThreshold` to cover cruft packs, too. But in the meantime,
>> the benefit of avoiding loose object explosions outweighs the possible
>> drawbacks here, IMHO.
> 
> I wondered if that interface might be an option to say "hey, I have a
> gigantic cruft file I want to carry forward, please leave it alone".
> 
> But if you have a giant cruft pack that is making your "git gc" too
> slow, it will eventually age out on its own. And if you're impatient,
> then "git gc --prune=now" is probably the right tool.
> 
> And if you really did want to keep rolling it forward for some reason,
> then I'd think marking it with ".keep" would be the best thing (and
> maybe even dropping the mtimes file? I'm not sure how a kept-cruft
> pack does or should behave).

Generally, it's probably a good idea to (later) create a separate knob
for "don't rewrite the objects in a 'big' cruft pack unless you need
to". For situations where cruft objects are being collected and not
regularly pruned, this helps avoid repacking all unreachable objects
into a giant single pack, even though only a small number of objects
were discovered unreachable this time.

The important times where we'd want to consider a 'big' cruft pack
for rewrite are:

 1. Some objects in the cruft pack are being pruned.
 2. Some objects in the cruft pack need updated mtimes.

However, in the typical case that we are adding new cruft objects
and not changing the mtimes of existing unreachable objects, we could
create a sensible limit on the size of a cruft pack to be rewritten
during normal maintenance.
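
To make that rule concrete, here is a minimal sketch of the decision
in Python. It is only an illustration of the idea, not anything in
git today: the CruftPack type, its field names, should_rewrite(), and
the 2 GiB default are all hypothetical.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class CruftPack:
    """Hypothetical summary of one cruft pack (not a git data structure)."""
    size_bytes: int
    has_prunable_objects: bool  # case 1: some objects are being pruned
    needs_mtime_update: bool    # case 2: some objects need updated mtimes


def should_rewrite(pack: CruftPack, max_rewrite_bytes: int = 2 * GIB) -> bool:
    """Decide whether a cruft pack should be rewritten during maintenance.

    The size limit is an assumed knob; no such setting is proposed by
    name in this thread.
    """
    # The two important cases always force a rewrite, whatever the size.
    if pack.has_prunable_objects or pack.needs_mtime_update:
        return True
    # Otherwise only roll small cruft packs forward; leave big ones alone.
    return pack.size_bytes <= max_rewrite_bytes
```

With this shape, a 5 GiB cruft pack with nothing to prune and no
mtimes to update is left alone, while new cruft objects would land in
a separate, smaller pack.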

My personal preference would be something between 2GB and 10GB, which
seems like a decent balance between "size of cruft pack" and "number of
cruft packs" for most repositories. Since none of the objects are
reachable, we don't really care about them having good deltas for things
like fetches and clones. The benefit of reducing the time spent in 'git
repack --cruft' outweighs the slight disk space savings from having a
single cruft pack, in my opinion.

Thanks,
-Stolee
