git@vger.kernel.org mailing list mirror (one of many)
From: Christian Couder <christian.couder@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
	John Cai <johncai86@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com>,
	Patrick Steinhardt <ps@pks.im>,
	Christian Couder <christian.couder@gmail.com>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: [PATCH v2 5/8] repack: add `--filter=<filter-spec>` option
Date: Wed,  5 Jul 2023 08:08:09 +0200
Message-ID: <20230705060812.2865188-6-christian.couder@gmail.com>
In-Reply-To: <20230705060812.2865188-1-christian.couder@gmail.com>

This new option puts the objects specified by `<filter-spec>` into a
separate packfile.

This could be useful if, for example, some large blobs take up a lot of
precious space on fast storage while they are rarely accessed. It could
make sense to move them to separate storage that is cheaper, though
slower.

In other use cases it might make sense to put all the blobs into
separate storage.

This is done by running two `git pack-objects` commands. The first one
is run with `--filter=<filter-spec>`, so it packs everything except the
objects that the filter omits. The second one is run with
`--stdin-packs`. On its stdin we list the previously existing non-kept
packs, so that the objects they contain are candidates for packing. We
also list the pack just created by the first command, as well as the
kept packs, each prefixed with '^', so that the objects in these packs
are excluded from the resulting pack. The result is that the pack
written by the second `git pack-objects` command contains only the
objects that the first command filtered out.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
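Not part of the commit message: as an illustration of the mechanism
described in the commit message, here is a minimal shell sketch of the
list that write_filtered_pack() writes to the stdin of the second
`git pack-objects --stdin-packs` command. The function name
`filtered_pack_input` and all pack names are made up for the example;
only the '^' convention and the order of the three loops come from the
patch.

```shell
#!/bin/sh
# Sketch only: prints the lines fed to `git pack-objects --stdin-packs`.
# The helper name and the pack names used below are hypothetical.
#
# $1: pack prefix (e.g. "pack")
# $2: space-separated hashes of the packs just written by the filtering step
# $3: space-separated existing non-kept pack names (without ".pack")
# $4: space-separated existing kept pack names (without ".pack")
filtered_pack_input () {
	prefix=$1
	for hash in $2
	do
		# Excluded: these objects already live in the new, filtered pack.
		printf '^%s-%s.pack\n' "$prefix" "$hash"
	done
	for name in $3
	do
		# Included: the filtered-out objects are collected from these packs.
		printf '%s.pack\n' "$name"
	done
	for name in $4
	do
		# Excluded: objects in kept packs are left where they are.
		printf '^%s.pack\n' "$name"
	done
}

filtered_pack_input pack "1111" "pack-aaaa pack-bbbb" "pack-cccc"
```

Every object that is in an unprefixed pack but in none of the
'^'-prefixed packs ends up in the second pack, which is how the second
command recovers exactly the objects omitted by the filter.
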
 Documentation/git-repack.txt |  9 +++++
 builtin/repack.c             | 67 ++++++++++++++++++++++++++++++++++++
 t/t7700-repack.sh            | 16 +++++++++
 3 files changed, 92 insertions(+)

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 4017157949..d702553033 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -143,6 +143,15 @@ depth is 4095.
 	a larger and slower repository; see the discussion in
 	`pack.packSizeLimit`.
 
+--filter=<filter-spec>::
+	Remove objects matching the filter specification from the
+	resulting packfile and put them into a separate packfile. Note
+	that objects used in the working directory are not filtered
+	out. So for the split to fully work, it's best to perform it
+	in a bare repo and to use the `-a` and `-d` options along with
+	this option.  See linkgit:git-rev-list[1] for valid
+	`<filter-spec>` forms.
+
 -b::
 --write-bitmap-index::
 	Write a reachability bitmap index as part of the repack. This
diff --git a/builtin/repack.c b/builtin/repack.c
index 4e5afee8d8..e2661b956c 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -54,6 +54,7 @@ struct pack_objects_args {
 	const char *depth;
 	const char *threads;
 	const char *max_pack_size;
+	const char *filter;
 	int no_reuse_delta;
 	int no_reuse_object;
 	int quiet;
@@ -174,6 +175,8 @@ static void prepare_pack_objects(struct child_process *cmd,
 		strvec_pushf(&cmd->args, "--threads=%s", args->threads);
 	if (args->max_pack_size)
 		strvec_pushf(&cmd->args, "--max-pack-size=%s", args->max_pack_size);
+	if (args->filter)
+		strvec_pushf(&cmd->args, "--filter=%s", args->filter);
 	if (args->no_reuse_delta)
 		strvec_pushf(&cmd->args, "--no-reuse-delta");
 	if (args->no_reuse_object)
@@ -734,6 +737,57 @@ static int finish_pack_objects_cmd(struct child_process *cmd,
 	return finish_command(cmd);
 }
 
+static int write_filtered_pack(const struct pack_objects_args *args,
+			       const char *destination,
+			       const char *pack_prefix,
+			       struct string_list *names,
+			       struct string_list *existing_packs,
+			       struct string_list *existing_kept_packs)
+{
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	struct string_list_item *item;
+	FILE *in;
+	int ret;
+	const char *scratch;
+	int local = skip_prefix(destination, packdir, &scratch);
+
+	/* We need to copy 'args' to modify it */
+	struct pack_objects_args new_args = *args;
+
+	/* No need to filter again */
+	new_args.filter = NULL;
+
+	prepare_pack_objects(&cmd, &new_args, destination);
+
+	strvec_push(&cmd.args, "--stdin-packs");
+
+	cmd.in = -1;
+
+	ret = start_command(&cmd);
+	if (ret)
+		return ret;
+
+	/*
+	 * names has a confusing double use: it both provides the list
+	 * of just-written new packs, and accepts the name of the
+	 * filtered pack we are writing.
+	 *
+	 * By the time it is read here, it contains only the pack(s)
+	 * that were just written, which is exactly the set of packs we
+	 * want to consider kept.
+	 */
+	in = xfdopen(cmd.in, "w");
+	for_each_string_list_item(item, names)
+		fprintf(in, "^%s-%s.pack\n", pack_prefix, item->string);
+	for_each_string_list_item(item, existing_packs)
+		fprintf(in, "%s.pack\n", item->string);
+	for_each_string_list_item(item, existing_kept_packs)
+		fprintf(in, "^%s.pack\n", item->string);
+	fclose(in);
+
+	return finish_pack_objects_cmd(&cmd, names, local);
+}
+
 static int write_cruft_pack(const struct pack_objects_args *args,
 			    const char *destination,
 			    const char *pack_prefix,
@@ -866,6 +920,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("limits the maximum number of threads")),
 		OPT_STRING(0, "max-pack-size", &po_args.max_pack_size, N_("bytes"),
 				N_("maximum size of each packfile")),
+		OPT_STRING(0, "filter", &po_args.filter, N_("args"),
+				N_("object filtering")),
 		OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
 				N_("repack objects in packs marked with .keep")),
 		OPT_STRING_LIST(0, "keep-pack", &keep_pack_list, N_("name"),
@@ -1105,6 +1161,17 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (po_args.filter) {
+		ret = write_filtered_pack(&po_args,
+					  packtmp,
+					  find_pack_prefix(),
+					  &names,
+					  &existing_nonkept_packs,
+					  &existing_kept_packs);
+		if (ret)
+			goto cleanup;
+	}
+
 	string_list_sort(&names);
 
 	close_object_store(the_repository->objects);
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index af79266c58..66589e4217 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -293,6 +293,22 @@ test_expect_success 'auto-bitmaps do not complain if unavailable' '
 	test_must_be_empty actual
 '
 
+test_expect_success 'repacking with a filter works' '
+	git -C bare.git repack -a -d &&
+	test_stdout_line_count = 1 ls bare.git/objects/pack/*.pack &&
+	git -C bare.git -c repack.writebitmaps=false repack -a -d --filter=blob:none &&
+	test_stdout_line_count = 2 ls bare.git/objects/pack/*.pack &&
+	commit_pack=$(test-tool -C bare.git find-pack HEAD) &&
+	test -n "$commit_pack" &&
+	blob_pack=$(test-tool -C bare.git find-pack HEAD:file1) &&
+	test -n "$blob_pack" &&
+	test "$commit_pack" != "$blob_pack" &&
+	tree_pack=$(test-tool -C bare.git find-pack HEAD^{tree}) &&
+	test "$tree_pack" = "$commit_pack" &&
+	blob_pack2=$(test-tool -C bare.git find-pack HEAD:file2) &&
+	test "$blob_pack2" = "$blob_pack"
+'
+
 objdir=.git/objects
 midx=$objdir/pack/multi-pack-index
 
-- 
2.41.0.244.g8cb3faa74c

