git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Christian Couder <christian.couder@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
	John Cai <johncai86@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com>,
	Patrick Steinhardt <ps@pks.im>,
	Christian Couder <christian.couder@gmail.com>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: [PATCH v7 8/9] repack: implement `--filter-to` for storing filtered out objects
Date: Mon, 25 Sep 2023 17:25:16 +0200	[thread overview]
Message-ID: <20230925152517.803579-9-christian.couder@gmail.com> (raw)
In-Reply-To: <20230925152517.803579-1-christian.couder@gmail.com>

A previous commit has implemented `git repack --filter=<filter-spec>` to
allow users to filter out some objects from the main pack and move them
into a new different pack.

It would be nice if this new different pack could be created in a
different directory than the regular pack. This would make it possible
to move large blobs into a pack on a different kind of storage, for
example cheaper storage.

Even in a different directory, this pack can be accessible if, for
example, the Git alternates mechanism is used to point to it. In fact
not using the Git alternates mechanism can corrupt a repo as the
generated pack containing the filtered objects might not be accessible
from the repo any more. So setting up the Git alternates mechanism
should be done before using this feature if the user wants the repo to
be fully usable while this feature is used.

In some cases, like when a repo has just been cloned or when there is no
other activity in the repo, it's Ok to setup the Git alternates
mechanism afterwards though. It's also Ok to just inspect the generated
packfile containing the filtered objects and then just move it into the
'.git/objects/pack/' directory manually. That's why it's not necessary
for this command to check that the Git alternates mechanism has been
already setup.

While at it, as an example to show that `--filter` and `--filter-to`
work well with other options, let's also add a test to check that these
options work well with `--max-pack-size`.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/git-repack.txt | 11 +++++++
 builtin/repack.c             | 10 +++++-
 t/t7700-repack.sh            | 62 ++++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 6d5bec7716..8545a32667 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -155,6 +155,17 @@ depth is 4095.
 	a single packfile containing all the objects. See
 	linkgit:git-rev-list[1] for valid `<filter-spec>` forms.
 
+--filter-to=<dir>::
+	Write the pack containing filtered out objects to the
+	directory `<dir>`. Only useful with `--filter`. This can be
+	used for putting the pack on a separate object directory that
+	is accessed through the Git alternates mechanism. **WARNING:**
+	If the packfile containing the filtered out objects is not
+	accessible, the repo can become corrupt as it might not be
+	possible to access the objects in that packfile. See the
+	`objects` and `objects/info/alternates` sections of
+	linkgit:gitrepository-layout[5].
+
 -b::
 --write-bitmap-index::
 	Write a reachability bitmap index as part of the repack. This
diff --git a/builtin/repack.c b/builtin/repack.c
index c7b564192f..db9277081d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -977,6 +977,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	int write_midx = 0;
 	const char *cruft_expiration = NULL;
 	const char *expire_to = NULL;
+	const char *filter_to = NULL;
 
 	struct option builtin_repack_options[] = {
 		OPT_BIT('a', NULL, &pack_everything,
@@ -1029,6 +1030,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 			   N_("write a multi-pack index of the resulting packs")),
 		OPT_STRING(0, "expire-to", &expire_to, N_("dir"),
 			   N_("pack prefix to store a pack containing pruned objects")),
+		OPT_STRING(0, "filter-to", &filter_to, N_("dir"),
+			   N_("pack prefix to store a pack containing filtered out objects")),
 		OPT_END()
 	};
 
@@ -1177,6 +1180,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (po_args.filter_options.choice)
 		strvec_pushf(&cmd.args, "--filter=%s",
 			     expand_list_objects_filter_spec(&po_args.filter_options));
+	else if (filter_to)
+		die(_("option '%s' can only be used along with '%s'"), "--filter-to", "--filter");
 
 	if (geometry.split_factor)
 		cmd.in = -1;
@@ -1265,8 +1270,11 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	}
 
 	if (po_args.filter_options.choice) {
+		if (!filter_to)
+			filter_to = packtmp;
+
 		ret = write_filtered_pack(&po_args,
-					  packtmp,
+					  filter_to,
 					  find_pack_prefix(packdir, packtmp),
 					  &existing,
 					  &names);
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 39e89445fd..48e92aa6f7 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -462,6 +462,68 @@ test_expect_success '--filter works with --pack-kept-objects and .keep packs' '
 	)
 '
 
+test_expect_success '--filter-to stores filtered out objects' '
+	git -C bare.git repack -a -d &&
+	test_stdout_line_count = 1 ls bare.git/objects/pack/*.pack &&
+
+	git init --bare filtered.git &&
+	git -C bare.git -c repack.writebitmaps=false repack -a -d \
+		--filter=blob:none \
+		--filter-to=../filtered.git/objects/pack/pack &&
+	test_stdout_line_count = 1 ls bare.git/objects/pack/pack-*.pack &&
+	test_stdout_line_count = 1 ls filtered.git/objects/pack/pack-*.pack &&
+
+	commit_pack=$(test-tool -C bare.git find-pack -c 1 HEAD) &&
+	blob_pack=$(test-tool -C bare.git find-pack -c 0 HEAD:file1) &&
+	blob_hash=$(git -C bare.git rev-parse HEAD:file1) &&
+	test -n "$blob_hash" &&
+	blob_pack=$(test-tool -C filtered.git find-pack -c 1 $blob_hash) &&
+
+	echo $(pwd)/filtered.git/objects >bare.git/objects/info/alternates &&
+	blob_pack=$(test-tool -C bare.git find-pack -c 1 HEAD:file1) &&
+	blob_content=$(git -C bare.git show $blob_hash) &&
+	test "$blob_content" = "content1"
+'
+
+test_expect_success '--filter works with --max-pack-size' '
+	rm -rf filtered.git &&
+	git init --bare filtered.git &&
+	git init max-pack-size &&
+	(
+		cd max-pack-size &&
+		test_commit base &&
+		# two blobs which exceed the maximum pack size
+		test-tool genrandom foo 1048576 >foo &&
+		git hash-object -w foo &&
+		test-tool genrandom bar 1048576 >bar &&
+		git hash-object -w bar &&
+		git add foo bar &&
+		git commit -m "adding foo and bar"
+	) &&
+	git clone --no-local --bare max-pack-size max-pack-size.git &&
+	(
+		cd max-pack-size.git &&
+		git -c repack.writebitmaps=false repack -a -d --filter=blob:none \
+			--max-pack-size=1M \
+			--filter-to=../filtered.git/objects/pack/pack &&
+		echo $(cd .. && pwd)/filtered.git/objects >objects/info/alternates &&
+
+		# Check that the 3 blobs are in different packfiles in filtered.git
+		test_stdout_line_count = 3 ls ../filtered.git/objects/pack/pack-*.pack &&
+		test_stdout_line_count = 1 ls objects/pack/pack-*.pack &&
+		foo_pack=$(test-tool find-pack -c 1 HEAD:foo) &&
+		bar_pack=$(test-tool find-pack -c 1 HEAD:bar) &&
+		base_pack=$(test-tool find-pack -c 1 HEAD:base.t) &&
+		test "$foo_pack" != "$bar_pack" &&
+		test "$foo_pack" != "$base_pack" &&
+		test "$bar_pack" != "$base_pack" &&
+		for pack in "$foo_pack" "$bar_pack" "$base_pack"
+		do
+			case "$foo_pack" in */filtered.git/objects/pack/*) true ;; *) return 1 ;; esac
+		done
+	)
+'
+
 objdir=.git/objects
 midx=$objdir/pack/multi-pack-index
 
-- 
2.42.0.279.g57b2ba444c


  parent reply	other threads:[~2023-09-25 15:26 UTC|newest]

Thread overview: 161+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-14 19:25 [PATCH 0/9] Repack objects into separate packfiles based on a filter Christian Couder
2023-06-14 19:25 ` [PATCH 1/9] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-06-21 10:49   ` Taylor Blau
2023-07-05  6:16     ` Christian Couder
2023-06-14 19:25 ` [PATCH 2/9] pack-objects: add `--print-filtered` to print omitted objects Christian Couder
2023-06-15 22:50   ` Junio C Hamano
2023-06-21 10:52     ` Taylor Blau
2023-06-21 11:11       ` Christian Couder
2023-06-21 11:54         ` Taylor Blau
2023-06-14 19:25 ` [PATCH 3/9] t/helper: add 'find-pack' test-tool Christian Couder
2023-06-15 23:32   ` Junio C Hamano
2023-06-21 10:40     ` Christian Couder
2023-06-21 10:54     ` Taylor Blau
2023-06-14 19:25 ` [PATCH 4/9] repack: refactor piping an oid to a command Christian Couder
2023-06-15 23:46   ` Junio C Hamano
2023-06-21 10:55     ` Taylor Blau
2023-06-21 10:56     ` Christian Couder
2023-06-14 19:25 ` [PATCH 5/9] repack: refactor finishing pack-objects command Christian Couder
2023-06-16  0:13   ` Junio C Hamano
2023-06-21 11:06     ` Taylor Blau
2023-06-21 11:19       ` Christian Couder
2023-06-21 11:05   ` Taylor Blau
2023-06-14 19:25 ` [PATCH 6/9] repack: add `--filter=<filter-spec>` option Christian Couder
2023-06-16  0:43   ` Junio C Hamano
2023-06-21 11:20     ` Taylor Blau
2023-06-21 15:04       ` Christian Couder
2023-06-22 11:05         ` Taylor Blau
2023-06-21 14:40     ` Christian Couder
2023-06-21 16:53       ` Junio C Hamano
2023-06-22  8:39         ` Christian Couder
2023-06-22 18:32           ` Junio C Hamano
2023-06-21 11:17   ` Taylor Blau
2023-07-05  7:18     ` Christian Couder
2023-06-14 19:25 ` [PATCH 7/9] gc: add `gc.repackFilter` config option Christian Couder
2023-06-14 19:25 ` [PATCH 8/9] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-06-16  2:21   ` Junio C Hamano
2023-06-21 11:49   ` Taylor Blau
2023-06-21 12:08     ` Christian Couder
2023-06-21 12:25       ` Taylor Blau
2023-06-21 16:44         ` Junio C Hamano
2023-07-05  6:19     ` Christian Couder
2023-06-14 19:25 ` [PATCH 9/9] gc: add `gc.repackFilterTo` config option Christian Couder
2023-06-16  2:54   ` Junio C Hamano
2023-06-14 21:36 ` [PATCH 0/9] Repack objects into separate packfiles based on a filter Junio C Hamano
2023-06-16  3:08   ` Junio C Hamano
2023-07-05  6:08 ` [PATCH v2 0/8] " Christian Couder
2023-07-05  6:08   ` [PATCH v2 1/8] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-07-05  6:08   ` [PATCH v2 2/8] t/helper: add 'find-pack' test-tool Christian Couder
2023-07-05  6:08   ` [PATCH v2 3/8] repack: refactor finishing pack-objects command Christian Couder
2023-07-05  6:08   ` [PATCH v2 4/8] repack: refactor finding pack prefix Christian Couder
2023-07-05  6:08   ` [PATCH v2 5/8] repack: add `--filter=<filter-spec>` option Christian Couder
2023-07-05 17:53     ` Junio C Hamano
2023-07-24  9:01       ` Christian Couder
2023-07-24 18:28         ` Junio C Hamano
2023-07-25 15:22           ` Christian Couder
2023-07-25 17:25             ` Junio C Hamano
2023-07-25 23:08               ` Junio C Hamano
2023-08-08  8:45                 ` Christian Couder
2023-08-09 20:38                   ` Taylor Blau
2023-08-09 22:50                   ` Junio C Hamano
2023-08-09 23:38                     ` Junio C Hamano
2023-08-10  0:10                       ` Jeff King
2023-07-05 18:12     ` Junio C Hamano
2023-07-24  9:02       ` Christian Couder
2023-07-05  6:08   ` [PATCH v2 6/8] gc: add `gc.repackFilter` config option Christian Couder
2023-07-05  6:08   ` [PATCH v2 7/8] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-07-05 18:26     ` Junio C Hamano
2023-07-24  9:00       ` Christian Couder
2023-07-24 18:18         ` Junio C Hamano
2023-07-25 13:41           ` Robert Coup
2023-07-25 16:50             ` Junio C Hamano
2023-07-25 15:45           ` Christian Couder
2023-07-05  6:08   ` [PATCH v2 8/8] gc: add `gc.repackFilterTo` config option Christian Couder
2023-07-24  8:59   ` [PATCH v3 0/8] Repack objects into separate packfiles based on a filter Christian Couder
2023-07-24  8:59     ` [PATCH v3 1/8] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-07-25 22:38       ` Taylor Blau
2023-07-25 23:51         ` Junio C Hamano
2023-07-24  8:59     ` [PATCH v3 2/8] t/helper: add 'find-pack' test-tool Christian Couder
2023-07-25 22:44       ` Taylor Blau
2023-08-08  8:28         ` Christian Couder
2023-07-24  8:59     ` [PATCH v3 3/8] repack: refactor finishing pack-objects command Christian Couder
2023-07-25 22:45       ` Taylor Blau
2023-07-24  8:59     ` [PATCH v3 4/8] repack: refactor finding pack prefix Christian Couder
2023-07-25 22:47       ` Taylor Blau
2023-08-08  8:29         ` Christian Couder
2023-07-24  8:59     ` [PATCH v3 5/8] repack: add `--filter=<filter-spec>` option Christian Couder
2023-07-25 23:04       ` Taylor Blau
2023-08-08  8:34         ` Christian Couder
2023-08-09 21:12           ` Taylor Blau
2023-07-24  8:59     ` [PATCH v3 6/8] gc: add `gc.repackFilter` config option Christian Couder
2023-07-25 23:07       ` Taylor Blau
2023-08-08  8:38         ` Christian Couder
2023-08-09 21:15           ` Taylor Blau
2023-07-24  8:59     ` [PATCH v3 7/8] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-07-24  8:59     ` [PATCH v3 8/8] gc: add `gc.repackFilterTo` config option Christian Couder
2023-07-25 23:10     ` [PATCH v3 0/8] Repack objects into separate packfiles based on a filter Taylor Blau
2023-08-08  8:26     ` [PATCH v4 " Christian Couder
2023-08-08  8:26       ` [PATCH v4 1/8] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-08-08  8:26       ` [PATCH v4 2/8] t/helper: add 'find-pack' test-tool Christian Couder
2023-08-09 21:18         ` Taylor Blau
2023-08-08  8:26       ` [PATCH v4 3/8] repack: refactor finishing pack-objects command Christian Couder
2023-08-08  8:26       ` [PATCH v4 4/8] repack: refactor finding pack prefix Christian Couder
2023-08-09 21:20         ` Taylor Blau
2023-08-08  8:26       ` [PATCH v4 5/8] repack: add `--filter=<filter-spec>` option Christian Couder
2023-08-09 21:40         ` Taylor Blau
2023-08-08  8:26       ` [PATCH v4 6/8] gc: add `gc.repackFilter` config option Christian Couder
2023-08-08  8:26       ` [PATCH v4 7/8] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-08-08  8:26       ` [PATCH v4 8/8] gc: add `gc.repackFilterTo` config option Christian Couder
2023-08-09 21:45       ` [PATCH v4 0/8] Repack objects into separate packfiles based on a filter Taylor Blau
2023-08-09 21:57         ` Junio C Hamano
2023-08-12  0:12         ` Christian Couder
2023-08-12  0:00       ` [PATCH v5 " Christian Couder
2023-08-12  0:00         ` [PATCH v5 1/8] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-08-12  0:00         ` [PATCH v5 2/8] t/helper: add 'find-pack' test-tool Christian Couder
2023-08-12  0:00         ` [PATCH v5 3/8] repack: refactor finishing pack-objects command Christian Couder
2023-08-12  0:00         ` [PATCH v5 4/8] repack: refactor finding pack prefix Christian Couder
2023-08-12  0:00         ` [PATCH v5 5/8] repack: add `--filter=<filter-spec>` option Christian Couder
2023-08-12  0:00         ` [PATCH v5 6/8] gc: add `gc.repackFilter` config option Christian Couder
2023-08-12  0:00         ` [PATCH v5 7/8] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-08-12  0:00         ` [PATCH v5 8/8] gc: add `gc.repackFilterTo` config option Christian Couder
2023-08-15  0:51         ` [PATCH v5 0/8] Repack objects into separate packfiles based on a filter Junio C Hamano
2023-08-15 21:43           ` Taylor Blau
2023-08-15 22:32             ` Junio C Hamano
2023-08-15 23:09               ` Taylor Blau
2023-08-15 23:18                 ` Junio C Hamano
2023-08-16  0:38                   ` Taylor Blau
2023-08-16 17:16                     ` Junio C Hamano
2023-09-11 15:20                 ` Christian Couder
2023-09-11 15:06         ` [PATCH v6 0/9] " Christian Couder
2023-09-11 15:06           ` [PATCH v6 1/9] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-09-11 15:06           ` [PATCH v6 2/9] t/helper: add 'find-pack' test-tool Christian Couder
2023-09-11 15:06           ` [PATCH v6 3/9] repack: refactor finishing pack-objects command Christian Couder
2023-09-11 15:06           ` [PATCH v6 4/9] repack: refactor finding pack prefix Christian Couder
2023-09-11 15:06           ` [PATCH v6 5/9] pack-bitmap-write: rebuild using new bitmap when remapping Christian Couder
2023-09-11 15:06           ` [PATCH v6 6/9] repack: add `--filter=<filter-spec>` option Christian Couder
2023-09-11 15:06           ` [PATCH v6 7/9] gc: add `gc.repackFilter` config option Christian Couder
2023-09-11 15:06           ` [PATCH v6 8/9] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-09-11 15:06           ` [PATCH v6 9/9] gc: add `gc.repackFilterTo` config option Christian Couder
2023-09-25 15:25           ` [PATCH v7 0/9] Repack objects into separate packfiles based on a filter Christian Couder
2023-09-25 15:25             ` [PATCH v7 1/9] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-09-25 15:25             ` [PATCH v7 2/9] t/helper: add 'find-pack' test-tool Christian Couder
2023-09-25 15:25             ` [PATCH v7 3/9] repack: refactor finishing pack-objects command Christian Couder
2023-09-25 15:25             ` [PATCH v7 4/9] repack: refactor finding pack prefix Christian Couder
2023-09-25 15:25             ` [PATCH v7 5/9] pack-bitmap-write: rebuild using new bitmap when remapping Christian Couder
2023-09-25 15:25             ` [PATCH v7 6/9] repack: add `--filter=<filter-spec>` option Christian Couder
2023-09-25 15:25             ` [PATCH v7 7/9] gc: add `gc.repackFilter` config option Christian Couder
2023-09-25 15:25             ` Christian Couder [this message]
2023-09-25 15:25             ` [PATCH v7 9/9] gc: add `gc.repackFilterTo` " Christian Couder
2023-09-25 19:14             ` [PATCH v7 0/9] Repack objects into separate packfiles based on a filter Junio C Hamano
2023-09-25 22:41               ` Taylor Blau
2023-10-02 16:54             ` [PATCH v8 " Christian Couder
2023-10-02 16:54               ` [PATCH v8 1/9] pack-objects: allow `--filter` without `--stdout` Christian Couder
2023-10-02 16:54               ` [PATCH v8 2/9] t/helper: add 'find-pack' test-tool Christian Couder
2023-10-02 16:54               ` [PATCH v8 3/9] repack: refactor finishing pack-objects command Christian Couder
2023-10-02 16:54               ` [PATCH v8 4/9] repack: refactor finding pack prefix Christian Couder
2023-10-02 16:55               ` [PATCH v8 5/9] pack-bitmap-write: rebuild using new bitmap when remapping Christian Couder
2023-10-02 16:55               ` [PATCH v8 6/9] repack: add `--filter=<filter-spec>` option Christian Couder
2023-10-02 16:55               ` [PATCH v8 7/9] gc: add `gc.repackFilter` config option Christian Couder
2023-10-02 16:55               ` [PATCH v8 8/9] repack: implement `--filter-to` for storing filtered out objects Christian Couder
2023-10-02 16:55               ` [PATCH v8 9/9] gc: add `gc.repackFilterTo` config option Christian Couder
2023-10-02 20:14               ` [PATCH v8 0/9] Repack objects into separate packfiles based on a filter Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230925152517.803579-9-christian.couder@gmail.com \
    --to=christian.couder@gmail.com \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johncai86@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=ps@pks.im \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).