git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: me@ttaylorr.com, jonathantanmy@google.com,
	christian.couder@gmail.com, git@jeffhostetler.com,
	Derrick Stolee <dstolee@microsoft.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH 1/2] partial-clone: set default filter with --partial
Date: Thu, 19 Mar 2020 17:28:06 +0000	[thread overview]
Message-ID: <6f340d9aadf71d394ad320ad162f1d140b632f2c.1584638887.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.586.git.1584638887.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

Partial clone means a lot of different things, including filtering
out all blobs, large blobs, or objects along a certain pathspec.
The pathspec option has limited uses, in particular due to the
performance challenges in serving with such filters. The blob size
option can be helpful for repositories with a small number of large
binaries, but otherwise it is difficult to find a meaningful split
between "small" and "large" blobs.

When I think of or recommend partial clone, I specifically mention
the case of filtering out all blobs, and downloading those blobs
only as needed.

This case is extremely useful, since it takes the best part of
shallow clone (a very small initial download) without any of the
downsides of restricted history.

However, the command-line interface can be confusing:

	git clone --filter=blob:none <url>

Add a simpler "--partial" option that defaults to this case:

	git clone --partial <url>

This should make the feature more discoverable. However, there is
a significant interest in the size-limited filters as that behaves
very similarly to Git LFS. For those cases, the following is
available:

	git clone --partial=<size> <url>

There are quite a few commands using OPT_PARSE_LIST_OBJECTS_FILTER,
includeing clone, fetch, and pack-objects. Augment this macro to
include the "--partial[=<size>]" mode for free.

Modify the first partial clone test that checks --filter=blob:none
to also check --partial with the same expected conditions after
clone. The diff is much simpler to see when ignoring whitespace,
since several lines added a leading tab. This test is essentially
copied to include the two ways to specify a blob size limit of
one byte.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 list-objects-filter-options.c | 18 +++++++++++++++
 list-objects-filter-options.h |  8 ++++++-
 t/t5616-partial-clone.sh      | 42 ++++++++++++++++++++++++++---------
 3 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 256bcfbdfe6..a71716ef75e 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -270,6 +270,24 @@ int opt_parse_list_objects_filter(const struct option *opt,
 	return 0;
 }
 
+int opt_set_blob_none_filter(const struct option *opt,
+			     const char *arg, int unset)
+{
+	struct strbuf filter_arg = STRBUF_INIT;
+	struct list_objects_filter_options *filter_options = opt->value;
+	
+	if (unset || !arg || !strcmp(arg, "0")) {
+		parse_list_objects_filter(filter_options, "blob:none");
+		return 0;
+	}
+	
+	strbuf_addf(&filter_arg, "blob:limit=%s", arg);
+	parse_list_objects_filter(filter_options, filter_arg.buf);
+	strbuf_release(&filter_arg);
+
+	return 0;
+}
+
 const char *list_objects_filter_spec(struct list_objects_filter_options *filter)
 {
 	if (!filter->filter_spec.nr)
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 2ffb39222c4..ac38ffcbe86 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -62,6 +62,7 @@ struct list_objects_filter_options {
 
 /* Normalized command line arguments */
 #define CL_ARG__FILTER "filter"
+#define CL_ARG__PARTIAL "partial"
 
 void list_objects_filter_die_if_populated(
 	struct list_objects_filter_options *filter_options);
@@ -80,11 +81,16 @@ void parse_list_objects_filter(
 
 int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset);
+int opt_set_blob_none_filter(const struct option *opt,
+			     const char *arg, int unset);
 
 #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
 	{ OPTION_CALLBACK, 0, CL_ARG__FILTER, fo, N_("args"), \
 	  N_("object filtering"), 0, \
-	  opt_parse_list_objects_filter }
+	  opt_parse_list_objects_filter }, \
+	{ OPTION_CALLBACK, 0, CL_ARG__PARTIAL, fo, N_("size"), \
+	  N_("partial clone with blob filter"), \
+	  PARSE_OPT_OPTARG | PARSE_OPT_NONEG , opt_set_blob_none_filter }
 
 /*
  * Translates abbreviated numbers in the filter's filter_spec into their
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 77bb91e9769..c42cef61296 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -33,17 +33,39 @@ test_expect_success 'setup bare clone for server' '
 # confirm we are missing all of the known blobs.
 # confirm partial clone was registered in the local config.
 test_expect_success 'do partial clone 1' '
-	git clone --no-checkout --filter=blob:none "file://$(pwd)/srv.bare" pc1 &&
-
-	git -C pc1 rev-list --quiet --objects --missing=print HEAD >revs &&
-	awk -f print_1.awk revs |
-	sed "s/?//" |
-	sort >observed.oids &&
+	for option in "--filter=blob:none" "--partial"
+	do
+		rm -rf pc1 &&
+		git clone --no-checkout "$option" "file://$(pwd)/srv.bare" pc1 &&
+
+		git -C pc1 rev-list --quiet --objects --missing=print HEAD >revs &&
+		awk -f print_1.awk revs |
+		sed "s/?//" |
+		sort >observed.oids &&
+
+		test_cmp expect_1.oids observed.oids &&
+		test "$(git -C pc1 config --local core.repositoryformatversion)" = "1" &&
+		test "$(git -C pc1 config --local remote.origin.promisor)" = "true" &&
+		test "$(git -C pc1 config --local remote.origin.partialclonefilter)" = "blob:none"
+	done
+'
 
-	test_cmp expect_1.oids observed.oids &&
-	test "$(git -C pc1 config --local core.repositoryformatversion)" = "1" &&
-	test "$(git -C pc1 config --local remote.origin.promisor)" = "true" &&
-	test "$(git -C pc1 config --local remote.origin.partialclonefilter)" = "blob:none"
+test_expect_success 'do partial clone with size limit' '
+	for option in "--filter=blob:limit=1" "--partial=1"
+	do
+		rm -rf pc-limit &&
+		git clone --no-checkout "$option" "file://$(pwd)/srv.bare" pc-limit &&
+
+		git -C pc-limit rev-list --quiet --objects --missing=print HEAD >revs &&
+		awk -f print_1.awk revs |
+		sed "s/?//" |
+		sort >observed.oids &&
+
+		test_cmp expect_1.oids observed.oids &&
+		test "$(git -C pc-limit config --local core.repositoryformatversion)" = "1" &&
+		test "$(git -C pc-limit config --local remote.origin.promisor)" = "true" &&
+		test "$(git -C pc-limit config --local remote.origin.partialclonefilter)" = "blob:limit=1"
+	done
 '
 
 test_expect_success 'verify that .promisor file contains refs fetched' '
-- 
gitgitgadget


  reply	other threads:[~2020-03-19 17:28 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-19 17:28 [PATCH 0/2] Slightly simplify partial clone user experience Derrick Stolee via GitGitGadget
2020-03-19 17:28 ` Derrick Stolee via GitGitGadget [this message]
2020-03-20 20:26   ` [PATCH 1/2] partial-clone: set default filter with --partial Junio C Hamano
2020-03-20 20:38     ` Derrick Stolee
2020-03-22  9:46       ` Jeff King
2020-03-19 17:28 ` [PATCH 2/2] clone: document --partial and --filter options Derrick Stolee via GitGitGadget
2020-03-20 12:27 ` [PATCH 0/2] Slightly simplify partial clone user experience Derrick Stolee
2020-03-22  9:51 ` Jeff King
2020-03-22 10:58   ` Christian Couder
2020-03-22 16:45     ` Derrick Stolee
2020-03-22 19:22       ` Jeff King
2020-03-22 16:03 ` Taylor Blau
2020-03-22 19:50 ` [PATCH v2] clone: document --filter options Derrick Stolee via GitGitGadget
2020-03-24  3:40   ` Jeff King
2020-03-24  5:17   ` Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6f340d9aadf71d394ad320ad162f1d140b632f2c.1584638887.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).