git@vger.kernel.org mailing list mirror (one of many)
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>,
	Derrick Stolee <derrickstolee@github.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 09/10] builtin/gc.c: make `gc.cruftPacks` enabled by default
Date: Tue, 18 Apr 2023 16:40:57 -0400
Message-ID: <b6784ddfe2906f7c04b3050bd9ba63a884ddb047.1681850424.git.me@ttaylorr.com>
In-Reply-To: <cover.1681850424.git.me@ttaylorr.com>

Back in 5b92477f89 (builtin/gc.c: conditionally avoid pruning objects
via loose, 2022-05-20), `git gc` learned the `--cruft` option and
`gc.cruftPacks` configuration to opt in to writing cruft packs when
collecting or pruning unreachable objects.

Cruft packs were introduced with the merge in a50036da1a (Merge branch
'tb/cruft-packs', 2022-06-03). They address the problem of "loose object
explosions", where Git will write out many individual loose objects when
there is a large number of unreachable objects that have not yet aged
past `--prune=<date>`.

Instead of keeping track of those unreachable yet recent objects via
their loose object file's mtime, cruft packs collect all unreachable
objects into a single pack with a corresponding `*.mtimes` file that
acts as a table storing the mtimes of all unreachable objects. This
removes the need to store unreachable objects loose while they wait to
age out of the repository, and so avoids loose object explosions.
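
As a rough illustration of the mtimes-table idea, here is a minimal C
sketch. It is not Git's implementation: `struct cruft_entry`,
`cruft_survives()` and `count_survivors()` are hypothetical names, the
table is an in-memory array rather than the on-disk `*.mtimes` format
described in Documentation/gitformat-pack.txt, and treating an mtime
exactly equal to the cutoff as still recent is an assumption. It only
shows that expiring objects becomes a comparison of one stored
timestamp per object against a cutoff, with no per-object file on disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for one row of a cruft pack's mtimes table. */
struct cruft_entry {
	const char *oid;	/* object name (abbreviated here for illustration) */
	uint32_t mtime;		/* last-modified time, in seconds since the epoch */
};

/*
 * Assumed rule: an unreachable object survives pruning if its recorded
 * mtime is not older than the expiry cutoff (e.g. "2.weeks.ago").
 */
static int cruft_survives(const struct cruft_entry *e, uint32_t cutoff)
{
	return e->mtime >= cutoff;
}

/* Count how many objects would be carried over into the next cruft pack. */
static size_t count_survivors(const struct cruft_entry *entries, size_t nr,
			      uint32_t cutoff)
{
	size_t kept = 0;

	assert(entries || !nr);	/* a NULL table is only valid when empty */
	for (size_t i = 0; i < nr; i++)
		if (cruft_survives(&entries[i], cutoff))
			kept++;
	return kept;
}
```

Objects that do not survive are simply left out of the next cruft pack,
rather than each being deleted as a separate loose file.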

Beyond avoiding loose object explosions, cruft packs are also a more
space-efficient way to store unreachable objects as they age out of a
repository. This is because similar unreachable objects can be stored
as deltas against one another inside the cruft pack, whereas each loose
object is compressed on its own.

In 5b92477f89, the feature was introduced as experimental. Since then,
GitHub has been running these patches in every repository, generating
hundreds of millions of cruft packs along the way. The feature is
battle-tested and avoids pathological cases such as the loose object
explosions described above. Users who run `git gc`, either manually or
via `git maintenance`, can benefit from having cruft packs.

As such, enable cruft pack generation by default (by making
`gc.cruftPacks` default to "true" rather than "false").
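
The precedence this change relies on, and which the updated
t/t6500-gc.sh cases exercise, can be modelled as follows. This is a
hypothetical sketch, not the code in builtin/gc.c: `resolve_cruft_packs()`
is an invented name, and -1 is assumed to mean "not specified at this
layer". The built-in default is now 1, `gc.cruftPacks` overrides it, and
`--cruft`/`--no-cruft` on the command line override both.

```c
#include <assert.h>

/*
 * Hypothetical model of how the effective setting is chosen.
 * Each argument is 1 (true), 0 (false), or -1 (not given).
 */
static int resolve_cruft_packs(int config_value, int cli_value)
{
	int cruft_packs = 1;	/* built-in default after this patch */

	assert(config_value >= -1 && config_value <= 1);
	assert(cli_value >= -1 && cli_value <= 1);

	if (config_value >= 0)	/* gc.cruftPacks=true|false */
		cruft_packs = config_value;
	if (cli_value >= 0)	/* --cruft or --no-cruft */
		cruft_packs = cli_value;
	return cruft_packs;
}
```

For example, `git -c gc.cruftPacks=false gc --cruft` corresponds to
`resolve_cruft_packs(0, 1)`, which yields 1, matching the test case that
expects a cruft pack to be written.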

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/feature.txt |  3 ---
 Documentation/config/gc.txt      |  2 +-
 Documentation/git-gc.txt         |  5 +++--
 Documentation/gitformat-pack.txt |  4 ++--
 builtin/gc.c                     |  6 +-----
 t/t6500-gc.sh                    | 12 ++++--------
 6 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
index e52bc6b858..17b4d39f89 100644
--- a/Documentation/config/feature.txt
+++ b/Documentation/config/feature.txt
@@ -14,9 +14,6 @@ feature.experimental::
 +
 * `fetch.negotiationAlgorithm=skipping` may improve fetch negotiation times by
 skipping more commits at a time, reducing the number of round trips.
-+
-* `gc.cruftPacks=true` reduces disk space used by unreachable objects during
-garbage collection, preventing loose object explosions.
 
 feature.manyFiles::
 	Enable config options that optimize for repos with many files in the
diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index 8d5353e9e0..7f95c866e1 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -84,7 +84,7 @@ gc.packRefs::
 gc.cruftPacks::
 	Store unreachable objects in a cruft pack (see
 	linkgit:git-repack[1]) instead of as loose objects. The default
-	is `false`.
+	is `true`.
 
 gc.pruneExpire::
 	When 'git gc' is run, it will call 'prune --expire 2.weeks.ago'
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index fef382a70f..90806fd26a 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -54,9 +54,10 @@ other housekeeping tasks (e.g. rerere, working trees, reflog...) will
 be performed as well.
 
 
---cruft::
+--[no-]cruft::
 	When expiring unreachable objects, pack them separately into a
-	cruft pack instead of storing them as loose objects.
+	cruft pack instead of storing them as loose objects. `--cruft`
+	is on by default.
 
 --prune=<date>::
 	Prune loose objects older than date (default is 2 weeks ago,
diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt
index e06af02f21..0c1be2dbe8 100644
--- a/Documentation/gitformat-pack.txt
+++ b/Documentation/gitformat-pack.txt
@@ -611,8 +611,8 @@ result of repeatedly resetting the objects' mtimes to the present time.
 
 If you are GC-ing repositories in a mixed version environment, consider omitting
 the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and
-leaving the `gc.cruftPacks` configuration unset until all writers understand
-cruft packs.
+setting the `gc.cruftPacks` configuration to "false" until all writers
+understand cruft packs.
 
 === Alternatives
 
diff --git a/builtin/gc.c b/builtin/gc.c
index 53ef137e1d..ece01e966f 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -48,7 +48,7 @@ static const char * const builtin_gc_usage[] = {
 
 static int pack_refs = 1;
 static int prune_reflogs = 1;
-static int cruft_packs = -1;
+static int cruft_packs = 1;
 static int aggressive_depth = 50;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
@@ -608,10 +608,6 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (prune_expire && parse_expiry_date(prune_expire, &dummy))
 		die(_("failed to parse prune expiry value %s"), prune_expire);
 
-	prepare_repo_settings(the_repository);
-	if (cruft_packs < 0)
-		cruft_packs = the_repository->settings.gc_cruft_packs;
-
 	if (aggressive) {
 		strvec_push(&repack, "-f");
 		if (aggressive_depth > 0)
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 3ba2ae5140..69509d0c11 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -216,11 +216,9 @@ assert_no_cruft_packs () {
 }
 
 for argv in \
-	"gc --cruft" \
+	"gc" \
 	"-c gc.cruftPacks=true gc" \
-	"-c gc.cruftPacks=false gc --cruft" \
-	"-c feature.experimental=true gc" \
-	"-c gc.cruftPacks=true -c feature.experimental=false gc"
+	"-c gc.cruftPacks=false gc --cruft"
 do
 	test_expect_success "git $argv generates a cruft pack" '
 		test_when_finished "rm -fr repo" &&
@@ -244,11 +242,9 @@ do
 done
 
 for argv in \
-	"gc" \
+	"gc --no-cruft" \
 	"-c gc.cruftPacks=false gc" \
-	"-c gc.cruftPacks=true gc --no-cruft" \
-	"-c feature.expiremental=true -c gc.cruftPacks=false gc" \
-	"-c feature.experimental=false gc"
+	"-c gc.cruftPacks=true gc --no-cruft"
 do
 	test_expect_success "git $argv does not generate a cruft pack" '
 		test_when_finished "rm -fr repo" &&
-- 
2.40.0.362.gc67ee7c2ff



Thread overview: 48+ messages
2023-04-17 20:54 [PATCH 00/10] gc: enable cruft packs by default Taylor Blau
2023-04-17 20:54 ` [PATCH 01/10] pack-write.c: plug a leak in stage_tmp_packfiles() Taylor Blau
2023-04-18 10:30   ` Jeff King
2023-04-18 19:40     ` Taylor Blau
2023-04-17 20:54 ` [PATCH 02/10] builtin/repack.c: fix incorrect reference to '-C' Taylor Blau
2023-04-17 20:54 ` [PATCH 03/10] builtin/gc.c: ignore cruft packs with `--keep-largest-pack` Taylor Blau
2023-04-17 22:54   ` Junio C Hamano
2023-04-17 23:03     ` Taylor Blau
2023-04-18 10:39       ` Jeff King
2023-04-18 14:54         ` Derrick Stolee
2023-04-17 20:54 ` [PATCH 04/10] t/t5304-prune.sh: prepare for `gc --cruft` by default Taylor Blau
2023-04-17 20:54 ` [PATCH 05/10] t/t9300-fast-import.sh: " Taylor Blau
2023-04-18 10:43   ` Jeff King
2023-04-18 19:44     ` Taylor Blau
2023-04-17 20:54 ` [PATCH 06/10] t/t6500-gc.sh: refactor cruft pack tests Taylor Blau
2023-04-17 20:54 ` [PATCH 07/10] t/t6500-gc.sh: add additional test cases Taylor Blau
2023-04-18 10:48   ` Jeff King
2023-04-18 19:48     ` Taylor Blau
2023-04-17 20:54 ` [PATCH 08/10] t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default Taylor Blau
2023-04-18 10:56   ` Jeff King
2023-04-18 19:50     ` Taylor Blau
2023-04-22 11:23       ` Jeff King
2023-04-17 20:54 ` [PATCH 09/10] builtin/gc.c: make `gc.cruftPacks` enabled " Taylor Blau
2023-04-18 11:00   ` Jeff King
2023-04-18 19:52     ` Taylor Blau
2023-04-17 20:54 ` [PATCH 10/10] repository.h: drop unused `gc_cruft_packs` Taylor Blau
2023-04-18 11:02   ` Jeff King
2023-04-18 11:04 ` [PATCH 00/10] gc: enable cruft packs by default Jeff King
2023-04-18 19:53   ` Taylor Blau
2023-04-18 20:40 ` [PATCH v2 " Taylor Blau
2023-04-18 20:40   ` [PATCH v2 01/10] pack-write.c: plug a leak in stage_tmp_packfiles() Taylor Blau
2023-04-19 22:00     ` Junio C Hamano
2023-04-20 16:31       ` Taylor Blau
2023-04-20 16:57         ` Junio C Hamano
2023-04-18 20:40   ` [PATCH v2 02/10] builtin/repack.c: fix incorrect reference to '-C' Taylor Blau
2023-04-18 20:40   ` [PATCH v2 03/10] builtin/gc.c: ignore cruft packs with `--keep-largest-pack` Taylor Blau
2023-04-18 20:40   ` [PATCH v2 04/10] t/t5304-prune.sh: prepare for `gc --cruft` by default Taylor Blau
2023-04-18 20:40   ` [PATCH v2 05/10] t/t6501-freshen-objects.sh: " Taylor Blau
2023-04-18 20:40   ` [PATCH v2 06/10] t/t6500-gc.sh: refactor cruft pack tests Taylor Blau
2023-04-18 20:40   ` [PATCH v2 07/10] t/t6500-gc.sh: add additional test cases Taylor Blau
2023-04-18 20:40   ` [PATCH v2 08/10] t/t9300-fast-import.sh: prepare for `gc --cruft` by default Taylor Blau
2023-04-18 20:40   ` Taylor Blau [this message]
2023-04-19 22:22     ` [PATCH v2 09/10] builtin/gc.c: make `gc.cruftPacks` enabled " Junio C Hamano
2023-04-20 17:24       ` Taylor Blau
2023-04-20 17:31         ` Junio C Hamano
2023-04-20 19:19           ` Taylor Blau
2023-04-18 20:41   ` [PATCH v2 10/10] repository.h: drop unused `gc_cruft_packs` Taylor Blau
2023-04-19 22:19     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the mbox file for this message, import it into your mail client,
  and reply-to-all from there.

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b6784ddfe2906f7c04b3050bd9ba63a884ddb047.1681850424.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
