git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH 03/10] builtin/gc.c: ignore cruft packs with `--keep-largest-pack`
Date: Tue, 18 Apr 2023 06:39:09 -0400
Message-ID: <20230418103909.GD508219@coredump.intra.peff.net>
In-Reply-To: <ZD3QLMs8/+DLKZM6@nand.local>

On Mon, Apr 17, 2023 at 07:03:08PM -0400, Taylor Blau wrote:

> On Mon, Apr 17, 2023 at 03:54:35PM -0700, Junio C Hamano wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > >   - The same is true for `gc.bigPackThreshold`, if the size of the cruft
> > >     pack exceeds the limit set by the caller.
> >
> > This is not as cut-and-dried clear as the previous one.  "This pack
> > is so large that it is not worth rewriting it only to expunge a
> > handful of objects that are no longer reachable from it" is the main
> > motivation to use this configuration, but doesn't some part of the
> > same reasoning apply equally to a large cruft pack?  But let's
> > assume that the configuration is totally irrelevant to cruft packs
> > and read on.
> 
> This is an inherent design trade-off. I imagine that callers who want to
> avoid rewriting their (large) cruft packs would prefer to generate a new
> cruft pack on top with just the recently accumulated unreachable
> objects.
> 
> That kind of works, except if you need to prune objects that are packed
> in an earlier cruft pack. If you have `gc.bigPackThreshold`, there is no
> way to do this if you need to expire objects that are in cruft packs
> above that threshold.
> 
> A user may find themselves frustrated when trying to `git gc --prune`
> some sensitive object(s) from their repository doesn't appear to work,
> only to discover that `gc.bigPackThreshold` is set somewhere in their
> configuration.
> 
> Writing (largely) the same cruft pack to expunge a few objects isn't
> ideal, but it is better than the status quo. And if you have so many
> unreachable objects that this is a concern, it is probably time to prune
> anyway.

Yeah, what your patch does makes sense to me as a default behavior. In a
pre-cruft-pack world, those objects would all be left alone by
gc.bigPackThreshold (because they're loose), and the essence of
cruft-packs is creating a parallel universe where those ejected-to-loose
objects just happen to be stored in a more efficient format.

> It is possible that in the future we could support writing multiple
> cruft packs (we already handle the presence of multiple cruft packs
> fine, just don't expose an easy way for the user to write >1 of them).
> And at that point we would be able to relax this patch a bit and allow
> `gc.bigPackThreshold` to cover cruft packs, too. But in the meantime,
> the benefit of avoiding loose object explosions outweighs the possible
> drawbacks here, IMHO.

I wondered if that interface might be an option to say "hey, I have a
gigantic cruft file I want to carry forward, please leave it alone".

But if you have a giant cruft pack that is making your "git gc" too
slow, it will eventually age out on its own. And if you're impatient,
then "git gc --prune=now" is probably the right tool.
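
To make that concrete, here is a rough sketch of checking whether
`gc.bigPackThreshold` is set anywhere and then expiring unreachable
objects right away. The function name `prune_unreachable_now` is made up
for illustration; the only commands it relies on are `git config
--show-origin --get-all` and `git gc --prune=now`. Treat it as a sketch
under those assumptions, not a recommended workflow.

```shell
# Hypothetical helper (the name is not from git): report where
# gc.bigPackThreshold is configured, then prune unreachable objects
# without waiting for them to age out.
prune_unreachable_now() {
	repo=${1:-.}
	# --show-origin prints the config file each value comes from;
	# "git config" exits non-zero when the key is not set at all.
	git -C "$repo" config --show-origin --get-all gc.bigPackThreshold ||
		echo "gc.bigPackThreshold is not set"
	# --prune=now expires unreachable objects regardless of their age.
	git -C "$repo" gc --prune=now
}
```

Running `prune_unreachable_now /path/to/repo` prints the origin of any
threshold setting before the gc runs, which at least tells you where to
look if the prune later appears to have been skipped.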

And if you really did want to keep rolling it forward for some reason,
then I'd think marking it with ".keep" would be the best thing (and
maybe even dropping the mtimes file? I'm not sure how a kept-cruft
pack does or should behave).
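
As a sketch of the ".keep" idea only: the function below (the name
`keep_cruft_packs` is made up) treats any pack with a `pack-<hash>.mtimes`
file next to it as a cruft pack and creates the matching `.keep` file. It
deliberately leaves the mtimes file alone, and it says nothing about how
gc treats a kept cruft pack afterwards, which is the open question above.

```shell
# Hypothetical helper (the name is not from git): mark every cruft pack
# in a pack directory as kept. A cruft pack is recognized here by the
# pack-<hash>.mtimes file stored alongside it.
keep_cruft_packs() {
	packdir=${1:-.git/objects/pack}
	for mtimes in "$packdir"/pack-*.mtimes
	do
		# Skip the unexpanded pattern when there is no cruft pack.
		[ -e "$mtimes" ] || continue
		keep="${mtimes%.mtimes}.keep"
		# Do not overwrite an existing .keep file; the content of a
		# new one is only a note about why it was created.
		[ -e "$keep" ] || echo "cruft pack kept by hand" >"$keep"
		# Print the .keep path for each cruft pack found.
		echo "$keep"
	done
}
```

Called with no argument it looks in `.git/objects/pack` relative to the
current directory, so for a bare repository or a linked worktree you would
need to pass the pack directory explicitly.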

-Peff

Thread overview: 48+ messages
2023-04-17 20:54 [PATCH 00/10] gc: enable cruft packs by default Taylor Blau
2023-04-17 20:54 ` [PATCH 01/10] pack-write.c: plug a leak in stage_tmp_packfiles() Taylor Blau
2023-04-18 10:30   ` Jeff King
2023-04-18 19:40     ` Taylor Blau
2023-04-17 20:54 ` [PATCH 02/10] builtin/repack.c: fix incorrect reference to '-C' Taylor Blau
2023-04-17 20:54 ` [PATCH 03/10] builtin/gc.c: ignore cruft packs with `--keep-largest-pack` Taylor Blau
2023-04-17 22:54   ` Junio C Hamano
2023-04-17 23:03     ` Taylor Blau
2023-04-18 10:39       ` Jeff King [this message]
2023-04-18 14:54         ` Derrick Stolee
2023-04-17 20:54 ` [PATCH 04/10] t/t5304-prune.sh: prepare for `gc --cruft` by default Taylor Blau
2023-04-17 20:54 ` [PATCH 05/10] t/t9300-fast-import.sh: " Taylor Blau
2023-04-18 10:43   ` Jeff King
2023-04-18 19:44     ` Taylor Blau
2023-04-17 20:54 ` [PATCH 06/10] t/t6500-gc.sh: refactor cruft pack tests Taylor Blau
2023-04-17 20:54 ` [PATCH 07/10] t/t6500-gc.sh: add additional test cases Taylor Blau
2023-04-18 10:48   ` Jeff King
2023-04-18 19:48     ` Taylor Blau
2023-04-17 20:54 ` [PATCH 08/10] t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default Taylor Blau
2023-04-18 10:56   ` Jeff King
2023-04-18 19:50     ` Taylor Blau
2023-04-22 11:23       ` Jeff King
2023-04-17 20:54 ` [PATCH 09/10] builtin/gc.c: make `gc.cruftPacks` enabled " Taylor Blau
2023-04-18 11:00   ` Jeff King
2023-04-18 19:52     ` Taylor Blau
2023-04-17 20:54 ` [PATCH 10/10] repository.h: drop unused `gc_cruft_packs` Taylor Blau
2023-04-18 11:02   ` Jeff King
2023-04-18 11:04 ` [PATCH 00/10] gc: enable cruft packs by default Jeff King
2023-04-18 19:53   ` Taylor Blau
2023-04-18 20:40 ` [PATCH v2 " Taylor Blau
2023-04-18 20:40   ` [PATCH v2 01/10] pack-write.c: plug a leak in stage_tmp_packfiles() Taylor Blau
2023-04-19 22:00     ` Junio C Hamano
2023-04-20 16:31       ` Taylor Blau
2023-04-20 16:57         ` Junio C Hamano
2023-04-18 20:40   ` [PATCH v2 02/10] builtin/repack.c: fix incorrect reference to '-C' Taylor Blau
2023-04-18 20:40   ` [PATCH v2 03/10] builtin/gc.c: ignore cruft packs with `--keep-largest-pack` Taylor Blau
2023-04-18 20:40   ` [PATCH v2 04/10] t/t5304-prune.sh: prepare for `gc --cruft` by default Taylor Blau
2023-04-18 20:40   ` [PATCH v2 05/10] t/t6501-freshen-objects.sh: " Taylor Blau
2023-04-18 20:40   ` [PATCH v2 06/10] t/t6500-gc.sh: refactor cruft pack tests Taylor Blau
2023-04-18 20:40   ` [PATCH v2 07/10] t/t6500-gc.sh: add additional test cases Taylor Blau
2023-04-18 20:40   ` [PATCH v2 08/10] t/t9300-fast-import.sh: prepare for `gc --cruft` by default Taylor Blau
2023-04-18 20:40   ` [PATCH v2 09/10] builtin/gc.c: make `gc.cruftPacks` enabled " Taylor Blau
2023-04-19 22:22     ` Junio C Hamano
2023-04-20 17:24       ` Taylor Blau
2023-04-20 17:31         ` Junio C Hamano
2023-04-20 19:19           ` Taylor Blau
2023-04-18 20:41   ` [PATCH v2 10/10] repository.h: drop unused `gc_cruft_packs` Taylor Blau
2023-04-19 22:19     ` Junio C Hamano
