git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v4 00/15] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
@ 2017-11-16 18:17 Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 01/15] upload-pack: add object filtering for partial clone Jeff Hostetler
                   ` (16 more replies)
  0 siblings, 17 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

This is part 3 of a 3-part sequence for partial clone.  It assumes
that parts 1 and 2 are in place.

This patch series is labeled as V4 to keep it in sync with
the corresponding V4 versions of parts 1 and 2.  There was
not a V3 version of this patch series.

Jonathan and I independently started on this task.  This is another
pass at merging those efforts.  So there are several places that may
need refactoring and cleanup, but fewer than in the previous submission.
In particular, the test cases should be squashed and new tests added.

And I think we need more end-to-end tests.  I'll work on those next.

Jeff Hostetler (5):
  upload-pack: add object filtering for partial clone
  clone, fetch-pack, index-pack, transport: partial clone
  fetch: add object filtering for partial fetch
  remote-curl: add object filtering for partial clone
  partial-clone: define partial clone settings in config

Jonathan Tan (10):
  fetch: refactor calculation of remote list
  pack-objects: test support for blob filtering
  fetch-pack: test support excluding large blobs
  fetch-pack: test support excluding large blobs
  fetch: add from_promisor and exclude-promisor-objects parameters
  t5500: add fetch-pack tests for partial clone
  t5601: test for partial clone
  t5500: more tests for partial clone and fetch
  unpack-trees: batch fetching of missing blobs
  fetch-pack: restore save_commit_buffer after use

 Documentation/config.txt                          |   4 +
 Documentation/gitremote-helpers.txt               |   4 +
 Documentation/technical/pack-protocol.txt         |   8 ++
 Documentation/technical/protocol-capabilities.txt |   8 ++
 builtin/clone.c                                   |  22 ++++-
 builtin/fetch-pack.c                              |   4 +
 builtin/fetch.c                                   |  93 +++++++++++++++--
 cache.h                                           |   1 +
 config.c                                          |   5 +
 connected.c                                       |   2 +
 environment.c                                     |   1 +
 fetch-object.c                                    |  27 ++++-
 fetch-object.h                                    |   5 +
 fetch-pack.c                                      |  17 ++++
 fetch-pack.h                                      |   2 +
 list-objects-filter-options.c                     | 110 +++++++++++++++++++--
 list-objects-filter-options.h                     |  12 +++
 remote-curl.c                                     |  11 +++
 t/t5300-pack-object.sh                            |  26 +++++
 t/t5500-fetch-pack.sh                             | 115 ++++++++++++++++++++++
 t/t5601-clone.sh                                  | 101 +++++++++++++++++++
 t/test-lib-functions.sh                           |  12 +++
 transport-helper.c                                |   5 +
 transport.c                                       |   4 +
 transport.h                                       |   5 +
 unpack-trees.c                                    |  22 +++++
 upload-pack.c                                     |  22 ++++-
 27 files changed, 628 insertions(+), 20 deletions(-)

-- 
2.9.3



* [PATCH v4 01/15] upload-pack: add object filtering for partial clone
  2017-11-16 18:17 [PATCH v4 00/15] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 02/15] clone, fetch-pack, index-pack, transport: " Jeff Hostetler
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach upload-pack to negotiate object filtering over the protocol and
to send filter parameters to pack-objects.  This is intended for partial
clone and fetch.

The idea to make upload-pack configurable using uploadpack.allowFilter
comes from Jonathan Tan's work in [1].

[1] https://public-inbox.org/git/f211093280b422c32cc1b7034130072f35c5ed51.1506714999.git.jonathantanmy@google.com/

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/config.txt                          |  4 ++++
 Documentation/technical/pack-protocol.txt         |  8 +++++++
 Documentation/technical/protocol-capabilities.txt |  8 +++++++
 list-objects-filter-options.c                     | 26 +++++++++++++++++++++++
 list-objects-filter-options.h                     |  6 ++++++
 upload-pack.c                                     | 22 ++++++++++++++++++-
 6 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1ac0ae6..e528210 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3268,6 +3268,10 @@ uploadpack.packObjectsHook::
 	was run. I.e., `upload-pack` will feed input intended for
 	`pack-objects` to the hook, and expects a completed packfile on
 	stdout.
+
+uploadpack.allowFilter::
+	If this option is set, `upload-pack` will advertise partial
+	clone and partial fetch object filtering.
 +
 Note that this configuration variable is ignored if it is seen in the
 repository-level config (this is a safety measure against fetching from
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index ed1eae8..a43a113 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -212,6 +212,7 @@ out of what the server said it could do with the first 'want' line.
   upload-request    =  want-list
 		       *shallow-line
 		       *1depth-request
+		       [filter-request]
 		       flush-pkt
 
   want-list         =  first-want
@@ -227,6 +228,8 @@ out of what the server said it could do with the first 'want' line.
   additional-want   =  PKT-LINE("want" SP obj-id)
 
   depth             =  1*DIGIT
+
+  filter-request    =  PKT-LINE("filter" SP filter-spec)
 ----
 
 Clients MUST send all the obj-ids it wants from the reference
@@ -249,6 +252,11 @@ complete those commits. Commits whose parents are not received as a
 result are defined as shallow and marked as such in the server. This
 information is sent back to the client in the next step.
 
+The client can optionally request that pack-objects omit various
+objects from the packfile using one of several filtering techniques.
+These are intended for use with partial clone and partial fetch
+operations.  See `rev-list` for possible "filter-spec" values.
+
 Once all the 'want's and 'shallow's (and optional 'deepen') are
 transferred, clients MUST send a flush-pkt, to tell the server side
 that it is done sending the list.
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 26dcc6f..332d209 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -309,3 +309,11 @@ to accept a signed push certificate, and asks the <nonce> to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+filter
+------
+
+If the upload-pack server advertises the 'filter' capability,
+fetch-pack may send "filter" commands to request a partial clone
+or partial fetch and request that the server omit various objects
+from the packfile.
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index a9298fd..f1fb57b 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -147,3 +147,29 @@ int opt_parse_list_objects_filter(const struct option *opt,
 
 	return parse_list_objects_filter(filter_options, arg);
 }
+
+/*
+ * The caller wants to pass the value of filter_options->raw_value
+ * to a subordinate program.  Encode the value if necessary to guard
+ * against injection attacks.
+ */
+void list_objects_filter_push_arg(
+	struct argv_array *args,
+	const struct list_objects_filter_options *filter_options)
+{
+	if (!filter_options->choice)
+		return;
+	if (!filter_options->raw_value || !*filter_options->raw_value)
+		return;
+
+	if (filter_options->requires_armor) {
+		struct strbuf buf = STRBUF_INIT;
+		armor_encode_arg(&buf, filter_options->raw_value);
+		argv_array_pushf(args, "--%s=%s", CL_ARG__FILTER, buf.buf);
+		strbuf_release(&buf);
+	} else {
+		argv_array_pushf(args, "--%s=%s", CL_ARG__FILTER,
+				 filter_options->raw_value);
+	}
+}
+
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 797bd3a..99f454c 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -54,4 +54,10 @@ int opt_parse_list_objects_filter(const struct option *opt,
 	  N_("object filtering"), PARSE_OPT_NONEG, \
 	  opt_parse_list_objects_filter }
 
+struct argv_array;
+
+void list_objects_filter_push_arg(
+	struct argv_array *args,
+	const struct list_objects_filter_options *filter_options);
+
 #endif /* LIST_OBJECTS_FILTER_OPTIONS_H */
diff --git a/upload-pack.c b/upload-pack.c
index e25f725..98a254a 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -10,6 +10,8 @@
 #include "diff.h"
 #include "revision.h"
 #include "list-objects.h"
+#include "list-objects-filter.h"
+#include "list-objects-filter-options.h"
 #include "run-command.h"
 #include "connect.h"
 #include "sigchain.h"
@@ -64,6 +66,10 @@ static int advertise_refs;
 static int stateless_rpc;
 static const char *pack_objects_hook;
 
+static int filter_capability_requested;
+static int filter_advertise;
+static struct list_objects_filter_options filter_options;
+
 static void reset_timeout(void)
 {
 	alarm(timeout);
@@ -131,6 +137,9 @@ static void create_pack_file(void)
 		argv_array_push(&pack_objects.args, "--delta-base-offset");
 	if (use_include_tag)
 		argv_array_push(&pack_objects.args, "--include-tag");
+	if (filter_options.choice)
+		list_objects_filter_push_arg(&pack_objects.args,
+					     &filter_options);
 
 	pack_objects.in = -1;
 	pack_objects.out = -1;
@@ -794,6 +803,12 @@ static void receive_needs(void)
 			deepen_rev_list = 1;
 			continue;
 		}
+		if (skip_prefix(line, "filter ", &arg)) {
+			if (!filter_capability_requested)
+				die("git upload-pack: filtering capability not negotiated");
+			parse_list_objects_filter(&filter_options, arg);
+			continue;
+		}
 		if (!skip_prefix(line, "want ", &arg) ||
 		    get_oid_hex(arg, &oid_buf))
 			die("git upload-pack: protocol error, "
@@ -821,6 +836,8 @@ static void receive_needs(void)
 			no_progress = 1;
 		if (parse_feature_request(features, "include-tag"))
 			use_include_tag = 1;
+		if (parse_feature_request(features, "filter"))
+			filter_capability_requested = 1;
 
 		o = parse_object(&oid_buf);
 		if (!o) {
@@ -940,7 +957,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
 		struct strbuf symref_info = STRBUF_INIT;
 
 		format_symref_info(&symref_info, cb_data);
-		packet_write_fmt(1, "%s %s%c%s%s%s%s%s agent=%s\n",
+		packet_write_fmt(1, "%s %s%c%s%s%s%s%s%s agent=%s\n",
 			     oid_to_hex(oid), refname_nons,
 			     0, capabilities,
 			     (allow_unadvertised_object_request & ALLOW_TIP_SHA1) ?
@@ -949,6 +966,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
 				     " allow-reachable-sha1-in-want" : "",
 			     stateless_rpc ? " no-done" : "",
 			     symref_info.buf,
+			     filter_advertise ? " filter" : "",
 			     git_user_agent_sanitized());
 		strbuf_release(&symref_info);
 	} else {
@@ -1027,6 +1045,8 @@ static int upload_pack_config(const char *var, const char *value, void *unused)
+	} else if (!strcmp("uploadpack.allowfilter", var)) {
+		filter_advertise = git_config_bool(var, value);
 	} else if (current_config_scope() != CONFIG_SCOPE_REPO) {
 		if (!strcmp("uploadpack.packobjectshook", var))
 			return git_config_string(&pack_objects_hook, var, value);
 	}
 	return parse_hide_refs_config(var, value, "uploadpack");
 }
-- 
2.9.3
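
As an illustration of the "filter-request = PKT-LINE("filter" SP filter-spec)"
rule this patch adds to pack-protocol.txt, here is a minimal standalone C
sketch of how such a line could be framed.  It is not code from git:
format_filter_pkt and PKT_MAX_TOTAL are hypothetical names, and the
65520-byte cap is an assumption taken from the pkt-line format description
(four lowercase hex digits giving the total length, prefix included,
followed by the payload).

```c
#include <stdio.h>
#include <string.h>

/* Assumed pkt-line limit, counting the 4-byte length prefix. */
#define PKT_MAX_TOTAL 65520

/*
 * Hypothetical helper (not a git function): write a "filter" request
 * pkt-line for `filter_spec` into `out` as a NUL-terminated string.
 * Returns the pkt-line length in bytes, or -1 if the line would exceed
 * the assumed limit or does not fit in `out`.
 */
int format_filter_pkt(char *out, size_t out_size, const char *filter_spec)
{
	static const char prefix[] = "filter ";
	size_t total = 4 + strlen(prefix) + strlen(filter_spec);

	if (total > PKT_MAX_TOTAL || total + 1 > out_size)
		return -1;
	/* The hex length covers the prefix itself plus the payload. */
	snprintf(out, out_size, "%04zx%s%s", total, prefix, filter_spec);
	return (int)total;
}
```

With the filter-spec "blob:limit=10" used by the tests later in this series,
the payload is 20 bytes, so the sketch produces a 24-byte line starting with
the length prefix "0018".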



* [PATCH v4 02/15] clone, fetch-pack, index-pack, transport: partial clone
  2017-11-16 18:17 [PATCH v4 00/15] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 01/15] upload-pack: add object filtering for partial clone Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 03/15] fetch: refactor calculation of remote list Jeff Hostetler
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/clone.c      |  9 +++++++++
 builtin/fetch-pack.c |  4 ++++
 fetch-pack.c         | 13 +++++++++++++
 fetch-pack.h         |  2 ++
 transport-helper.c   |  5 +++++
 transport.c          |  4 ++++
 transport.h          |  5 +++++
 7 files changed, 42 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index dbddd98..fceb9e7 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -26,6 +26,7 @@
 #include "run-command.h"
 #include "connected.h"
 #include "packfile.h"
+#include "list-objects-filter-options.h"
 
 /*
  * Overall FIXMEs:
@@ -60,6 +61,7 @@ static struct string_list option_optional_reference = STRING_LIST_INIT_NODUP;
 static int option_dissociate;
 static int max_jobs = -1;
 static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
+static struct list_objects_filter_options filter_options;
 
 static int recurse_submodules_cb(const struct option *opt,
 				 const char *arg, int unset)
@@ -135,6 +137,7 @@ static struct option builtin_clone_options[] = {
 			TRANSPORT_FAMILY_IPV4),
 	OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
 			TRANSPORT_FAMILY_IPV6),
+	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
 	OPT_END()
 };
 
@@ -1073,6 +1076,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 			warning(_("--shallow-since is ignored in local clones; use file:// instead."));
 		if (option_not.nr)
 			warning(_("--shallow-exclude is ignored in local clones; use file:// instead."));
+		if (filter_options.choice)
+			warning(_("--filter is ignored in local clones; use file:// instead."));
 		if (!access(mkpath("%s/shallow", path), F_OK)) {
 			if (option_local > 0)
 				warning(_("source repository is shallow, ignoring --local"));
@@ -1104,6 +1109,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		transport_set_option(transport, TRANS_OPT_UPLOADPACK,
 				     option_upload_pack);
 
+	if (filter_options.choice)
+		transport_set_option(transport, TRANS_OPT_LIST_OBJECTS_FILTER,
+				     filter_options.raw_value);
+
 	if (transport->smart_options && !deepen)
 		transport->smart_options->check_self_contained_and_connected = 1;
 
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 9a7ebf6..d0fdaa8 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -153,6 +153,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			args.no_haves = 1;
 			continue;
 		}
+		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
+			parse_list_objects_filter(&args.filter_options, arg);
+			continue;
+		}
 		usage(fetch_pack_usage);
 	}
 	if (deepen_not.nr)
diff --git a/fetch-pack.c b/fetch-pack.c
index 4640b4e..895e8f9 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -29,6 +29,7 @@ static int deepen_not_ok;
 static int fetch_fsck_objects = -1;
 static int transfer_fsck_objects = -1;
 static int agent_supported;
+static int server_supports_filtering;
 static struct lock_file shallow_lock;
 static const char *alternate_shallow_file;
 
@@ -379,6 +380,8 @@ static int find_common(struct fetch_pack_args *args,
 			if (deepen_not_ok)      strbuf_addstr(&c, " deepen-not");
 			if (agent_supported)    strbuf_addf(&c, " agent=%s",
 							    git_user_agent_sanitized());
+			if (args->filter_options.choice)
+				strbuf_addstr(&c, " filter");
 			packet_buf_write(&req_buf, "want %s%s\n", remote_hex, c.buf);
 			strbuf_release(&c);
 		} else
@@ -407,6 +410,9 @@ static int find_common(struct fetch_pack_args *args,
 			packet_buf_write(&req_buf, "deepen-not %s", s->string);
 		}
 	}
+	if (server_supports_filtering && args->filter_options.choice)
+		packet_buf_write(&req_buf, "filter %s",
+				 args->filter_options.raw_value);
 	packet_buf_flush(&req_buf);
 	state_len = req_buf.len;
 
@@ -967,6 +973,13 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	else
 		prefer_ofs_delta = 0;
 
+	if (server_supports("filter")) {
+		server_supports_filtering = 1;
+		print_verbose(args, _("Server supports filter"));
+	} else if (args->filter_options.choice) {
+		warning("filtering not recognized by server, ignoring");
+	}
+
 	if ((agent_feature = server_feature_value("agent", &agent_len))) {
 		agent_supported = 1;
 		if (agent_len)
diff --git a/fetch-pack.h b/fetch-pack.h
index 84904c3..64661b6 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -3,6 +3,7 @@
 
 #include "string-list.h"
 #include "run-command.h"
+#include "list-objects-filter-options.h"
 
 struct oid_array;
 
@@ -12,6 +13,7 @@ struct fetch_pack_args {
 	int depth;
 	const char *deepen_since;
 	const struct string_list *deepen_not;
+	struct list_objects_filter_options filter_options;
 	unsigned deepen_relative:1;
 	unsigned quiet:1;
 	unsigned keep_pack:1;
diff --git a/transport-helper.c b/transport-helper.c
index c948d52..96823c7 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -671,6 +671,11 @@ static int fetch(struct transport *transport,
 	if (data->transport_options.update_shallow)
 		set_helper_option(transport, "update-shallow", "true");
 
+	if (data->transport_options.filter_options.choice)
+		set_helper_option(
+			transport, "filter",
+			data->transport_options.filter_options.raw_value);
+
 	if (data->fetch)
 		return fetch_with_fetch(transport, nr_heads, to_fetch);
 
diff --git a/transport.c b/transport.c
index 8211f82..d50c73b 100644
--- a/transport.c
+++ b/transport.c
@@ -166,6 +166,9 @@ static int set_git_option(struct git_transport_options *opts,
 	} else if (!strcmp(name, TRANS_OPT_NO_HAVES)) {
 		opts->no_haves = !!value;
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_LIST_OBJECTS_FILTER)) {
+		parse_list_objects_filter(&opts->filter_options, value);
+		return 0;
 	}
 	return 1;
 }
@@ -236,6 +239,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.update_shallow = data->options.update_shallow;
 	args.from_promisor = data->options.from_promisor;
 	args.no_haves = data->options.no_haves;
+	args.filter_options = data->options.filter_options;
 
 	if (!data->got_remote_heads) {
 		connect_setup(transport, 0);
diff --git a/transport.h b/transport.h
index 67428f6..f64aa3a 100644
--- a/transport.h
+++ b/transport.h
@@ -4,6 +4,7 @@
 #include "cache.h"
 #include "run-command.h"
 #include "remote.h"
+#include "list-objects-filter-options.h"
 
 struct string_list;
 
@@ -23,6 +24,7 @@ struct git_transport_options {
 	const char *uploadpack;
 	const char *receivepack;
 	struct push_cas_option *cas;
+	struct list_objects_filter_options filter_options;
 };
 
 enum transport_family {
@@ -218,6 +220,9 @@ void transport_check_allowed(const char *type);
 /* Do not send "have" lines */
 #define TRANS_OPT_NO_HAVES "no-haves"
 
+/* Filter objects for partial clone and fetch */
+#define TRANS_OPT_LIST_OBJECTS_FILTER "filter"
+
 /**
  * Returns 0 if the option was used, non-zero otherwise. Prints a
  * message to stderr if the option is not used.
-- 
2.9.3
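
The client-side decision in this patch (send a "filter" line only when the
server advertised the capability, otherwise warn and carry on unfiltered)
can be sketched in isolation.  The code below is a simplified model, not
git's implementation: has_capability, decide_filter and the filter_decision
enum are hypothetical names, and the capability list is assumed to be a
single space-separated string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true if `name` appears in `caps` as a whole
 * space-separated token (optionally followed by "=value").
 */
bool has_capability(const char *caps, const char *name)
{
	size_t len = strlen(name);
	const char *p = caps;

	if (len == 0)
		return false;
	while ((p = strstr(p, name)) != NULL) {
		bool starts = (p == caps) || (p[-1] == ' ');
		bool ends = (p[len] == '\0') || (p[len] == ' ') || (p[len] == '=');

		if (starts && ends)
			return true;
		p += len;
	}
	return false;
}

enum filter_decision {
	FILTER_NOT_REQUESTED,	/* user gave no filter-spec */
	FILTER_SEND,		/* server advertised "filter": send the line */
	FILTER_IGNORED		/* server lacks "filter": warn, fetch unfiltered */
};

/* Model of the choice made between do_fetch_pack() and find_common(). */
enum filter_decision decide_filter(const char *server_caps,
				   const char *filter_spec)
{
	if (!filter_spec || !*filter_spec)
		return FILTER_NOT_REQUESTED;
	return has_capability(server_caps, "filter") ? FILTER_SEND
						     : FILTER_IGNORED;
}
```

The three-way result mirrors the patch: FILTER_IGNORED corresponds to the
"filtering not recognized by server, ignoring" warning rather than a hard
failure.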



* [PATCH v4 03/15] fetch: refactor calculation of remote list
  2017-11-16 18:17 [PATCH v4 00/15] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 01/15] upload-pack: add object filtering for partial clone Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 02/15] clone, fetch-pack, index-pack, transport: " Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 04/15] fetch: add object filtering for partial fetch Jeff Hostetler
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy

From: Jonathan Tan <jonathantanmy@google.com>

Separate out the calculation of remotes to be fetched from and the
actual fetching. This will allow us to include an additional step before
the actual fetching in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/fetch.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 225c734..1b1f039 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1322,7 +1322,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 {
 	int i;
 	struct string_list list = STRING_LIST_INIT_DUP;
-	struct remote *remote;
+	struct remote *remote = NULL;
 	int result = 0;
 	struct argv_array argv_gc_auto = ARGV_ARRAY_INIT;
 
@@ -1367,17 +1367,14 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		else if (argc > 1)
 			die(_("fetch --all does not make sense with refspecs"));
 		(void) for_each_remote(get_one_remote_for_fetch, &list);
-		result = fetch_multiple(&list);
 	} else if (argc == 0) {
 		/* No arguments -- use default remote */
 		remote = remote_get(NULL);
-		result = fetch_one(remote, argc, argv);
 	} else if (multiple) {
 		/* All arguments are assumed to be remotes or groups */
 		for (i = 0; i < argc; i++)
 			if (!add_remote_or_group(argv[i], &list))
 				die(_("No such remote or remote group: %s"), argv[i]);
-		result = fetch_multiple(&list);
 	} else {
 		/* Single remote or group */
 		(void) add_remote_or_group(argv[0], &list);
@@ -1385,14 +1382,19 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 			/* More than one remote */
 			if (argc > 1)
 				die(_("Fetching a group and specifying refspecs does not make sense"));
-			result = fetch_multiple(&list);
 		} else {
 			/* Zero or one remotes */
 			remote = remote_get(argv[0]);
-			result = fetch_one(remote, argc-1, argv+1);
+			argc--;
+			argv++;
 		}
 	}
 
+	if (remote)
+		result = fetch_one(remote, argc, argv);
+	else
+		result = fetch_multiple(&list);
+
 	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
 		struct argv_array options = ARGV_ARRAY_INIT;
 
-- 
2.9.3
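
The shape of this refactoring (first work out what to fetch from, then
dispatch once) can be shown with a small standalone model.  This is only a
sketch under simplifying assumptions: fetch_plan and plan_fetch are
hypothetical names, group_size stands in for the number of remotes that
argv[0] expands to, and the die() error paths of cmd_fetch are left out.

```c
#include <stddef.h>

enum fetch_kind { FETCH_ONE, FETCH_MULTIPLE };

struct fetch_plan {
	enum fetch_kind kind;
	const char *remote;	/* FETCH_ONE only; NULL means the default remote */
	int argc;		/* refspec arguments left over for FETCH_ONE */
	const char **argv;
};

/*
 * Hypothetical model of the selection step: `all` and `multiple` stand
 * for the --all and --multiple flags.  The branches are tested in the
 * same order as in cmd_fetch.
 */
struct fetch_plan plan_fetch(int all, int multiple, int group_size,
			     int argc, const char **argv)
{
	struct fetch_plan plan = { FETCH_MULTIPLE, NULL, 0, NULL };

	if (all)
		return plan;		/* every configured remote */
	if (argc == 0) {
		plan.kind = FETCH_ONE;	/* no arguments: default remote */
		return plan;
	}
	if (multiple || group_size > 1)
		return plan;		/* several remotes or a group */

	/* Single remote: consume argv[0], the rest are refspecs. */
	plan.kind = FETCH_ONE;
	plan.remote = argv[0];
	plan.argc = argc - 1;
	plan.argv = argv + 1;
	return plan;
}
```

Because the decision is returned as data, a caller can run an extra step
between planning and fetching, which is the stated motivation for the
patch.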



* [PATCH v4 04/15] fetch: add object filtering for partial fetch
  2017-11-16 18:17 [PATCH v4 00/15] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (2 preceding siblings ...)
  2017-11-16 18:17 ` [PATCH v4 03/15] fetch: refactor calculation of remote list Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 05/15] remote-curl: add object filtering for partial clone Jeff Hostetler
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach fetch to use the list-objects filtering parameters
to allow a "partial fetch" following a "partial clone".

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fetch.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 1b1f039..fb9af7c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -18,6 +18,7 @@
 #include "argv-array.h"
 #include "utf8.h"
 #include "packfile.h"
+#include "list-objects-filter-options.h"
 
 static const char * const builtin_fetch_usage[] = {
 	N_("git fetch [<options>] [<repository> [<refspec>...]]"),
@@ -55,6 +56,7 @@ static int recurse_submodules_default = RECURSE_SUBMODULES_ON_DEMAND;
 static int shown_url = 0;
 static int refmap_alloc, refmap_nr;
 static const char **refmap_array;
+static struct list_objects_filter_options filter_options;
 
 static int git_fetch_config(const char *k, const char *v, void *cb)
 {
@@ -160,6 +162,7 @@ static struct option builtin_fetch_options[] = {
 			TRANSPORT_FAMILY_IPV4),
 	OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
 			TRANSPORT_FAMILY_IPV6),
+	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
 	OPT_END()
 };
 
@@ -1044,6 +1047,9 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
 		set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes");
 	if (update_shallow)
 		set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
+	if (filter_options.choice)
+		set_option(transport, TRANS_OPT_LIST_OBJECTS_FILTER,
+			   filter_options.raw_value);
 	return transport;
 }
 
@@ -1242,6 +1248,20 @@ static int fetch_multiple(struct string_list *list)
 	int i, result = 0;
 	struct argv_array argv = ARGV_ARRAY_INIT;
 
+	if (filter_options.choice) {
+		/*
+		 * We currently only support partial-fetches to the remote
+		 * used for the partial-clone because we only support 1
+		 * promisor remote, so we DO NOT allow explicit command
+		 * line filter arguments for multi-fetches.
+		 *
+		 * Note that the loop below will spawn background fetches
+		 * for each remote and one of them MAY INHERIT the proper
+		 * partial-fetch settings, so everything is consistent.
+		 */
+		die(_("partial-fetch is not supported on multiple remotes"));
+	}
+
 	if (!append && !dry_run) {
 		int errcode = truncate_fetch_head();
 		if (errcode)
-- 
2.9.3



* [PATCH v4 05/15] remote-curl: add object filtering for partial clone
  2017-11-16 18:17 [PATCH v4 00/15] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (3 preceding siblings ...)
  2017-11-16 18:17 ` [PATCH v4 04/15] fetch: add object filtering for partial fetch Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 06/15] pack-objects: test support for blob filtering Jeff Hostetler
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/gitremote-helpers.txt |  4 ++++
 remote-curl.c                       | 11 +++++++++++
 2 files changed, 15 insertions(+)

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index 1ceab89..4bb11bf 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -472,6 +472,10 @@ set by Git if the remote helper has the 'option' capability.
 'option no-haves' {'true'|'false'}::
 	Do not send "have" lines.
 
+'option filter <filter-spec>'::
+	An object filter specification for partial clone or fetch
+	as described in rev-list.
+
 SEE ALSO
 --------
 linkgit:git-remote[1]
diff --git a/remote-curl.c b/remote-curl.c
index 34a81b8..c2f28ab 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -13,6 +13,7 @@
 #include "credential.h"
 #include "sha1-array.h"
 #include "send-pack.h"
+#include "list-objects-filter-options.h"
 
 static struct remote *remote;
 /* always ends with a trailing slash */
@@ -22,6 +23,7 @@ struct options {
 	int verbosity;
 	unsigned long depth;
 	char *deepen_since;
+	char *partial_clone_filter;
 	struct string_list deepen_not;
 	struct string_list push_options;
 	unsigned progress : 1,
@@ -165,6 +167,9 @@ static int set_option(const char *name, const char *value)
 	} else if (!strcmp(name, "no-haves")) {
 		options.no_haves = 1;
 		return 0;
+	} else if (!strcmp(name, "filter")) {
+		options.partial_clone_filter = xstrdup(value);
+		return 0;
 	} else {
 		return 1 /* unsupported */;
 	}
@@ -834,6 +839,12 @@ static int fetch_git(struct discovery *heads,
 		argv_array_push(&args, "--from-promisor");
 	if (options.no_haves)
 		argv_array_push(&args, "--no-haves");
+	if (options.partial_clone_filter) {
+		struct list_objects_filter_options filter_options = { 0 };
+		parse_list_objects_filter(&filter_options,
+					  options.partial_clone_filter);
+		list_objects_filter_push_arg(&args, &filter_options);
+	}
 	argv_array_push(&args, url.buf);
 
 	for (i = 0; i < nr_heads; i++) {
-- 
2.9.3
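
The 'option filter <filter-spec>' command documented in this patch arrives
at the helper as one text line.  The following standalone sketch shows one
way to split such a line into a name and a value; split_option_line is a
hypothetical helper rather than remote-curl's actual parser, and it assumes
the name never contains a space while the value is the rest of the line.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "option <name> <value>" into its parts.
 * Copies <name> into `name` (NUL-terminated) and points `*value` at the
 * remainder of `line`.  Returns 0 on success, -1 if the line is not an
 * option command, has no value, or the name does not fit in `name`.
 */
int split_option_line(const char *line, char *name, size_t name_size,
		      const char **value)
{
	static const char prefix[] = "option ";
	size_t prefix_len = sizeof(prefix) - 1;
	const char *rest, *sp;
	size_t name_len;

	if (strncmp(line, prefix, prefix_len) != 0)
		return -1;
	rest = line + prefix_len;
	sp = strchr(rest, ' ');
	if (!sp)
		return -1;
	name_len = (size_t)(sp - rest);
	if (name_len == 0 || name_len >= name_size)
		return -1;
	memcpy(name, rest, name_len);
	name[name_len] = '\0';
	*value = sp + 1;	/* the value keeps any spaces it contains */
	return 0;
}
```

For the line "option filter blob:limit=10" this yields the name "filter"
and the value "blob:limit=10", which is the pair that set_option() in the
patch above receives.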



* [PATCH v4 06/15] pack-objects: test support for blob filtering
  2017-11-16 18:17 [PATCH v4 00/15] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (4 preceding siblings ...)
  2017-11-16 18:17 ` [PATCH v4 05/15] remote-curl: add object filtering for partial clone Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 07/15] fetch-pack: test support excluding large blobs Jeff Hostetler
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

As part of an effort to improve Git support for very large repositories
in which clients typically have only a subset of all version-controlled
blobs, test pack-objects support for --filter=blob:limit=<n>, packing only
blobs not exceeding that size unless the blob corresponds to a file
whose name starts with ".git". upload-pack will eventually be taught to
use this new parameter if needed to exclude certain blobs during a fetch
or clone, potentially drastically reducing network consumption when
serving these very large repositories.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t5300-pack-object.sh  | 26 ++++++++++++++++++++++++++
 t/test-lib-functions.sh | 12 ++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 9c68b99..8e3db12 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -457,6 +457,32 @@ test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'pack-objects --threads=N or pack.
 	grep -F "no threads support, ignoring pack.threads" err
 '
 
+lcut () {
+	perl -e '$/ = undef; $_ = <>; s/^.{'$1'}//s; print $_'
+}
+
+test_expect_success 'filtering by size works with multiple excluded' '
+	rm -rf server &&
+	git init server &&
+	printf a > server/a &&
+	printf b > server/b &&
+	printf c-very-long-file > server/c &&
+	printf d-very-long-file > server/d &&
+	git -C server add a b c d &&
+	git -C server commit -m x &&
+
+	git -C server rev-parse HEAD >objects &&
+	git -C server pack-objects --revs --stdout --filter=blob:limit=10 <objects >my.pack &&
+
+	# Ensure that only the small blobs are in the packfile
+	git index-pack my.pack &&
+	git verify-pack -v my.idx >objectlist &&
+	grep $(git hash-object server/a) objectlist &&
+	grep $(git hash-object server/b) objectlist &&
+	! grep $(git hash-object server/c) objectlist &&
+	! grep $(git hash-object server/d) objectlist
+'
+
 #
 # WARNING!
 #
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 1701fe2..07b79c7 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1020,3 +1020,15 @@ nongit () {
 		"$@"
 	)
 }
+
+# Converts big-endian pairs of hexadecimal digits into bytes. For example,
+# "printf 61620d0a | hex_pack" results in "ab\r\n".
+hex_pack () {
+	perl -e '$/ = undef; $input = <>; print pack("H*", $input)'
+}
+
+# Converts bytes into big-endian pairs of hexadecimal digits. For example,
+# "printf 'ab\r\n' | hex_unpack" results in "61620d0a".
+hex_unpack () {
+	perl -e '$/ = undef; $input = <>; print unpack("H2" x length($input), $input)'
+}
-- 
2.9.3


* [PATCH v4 07/15] fetch-pack: test support excluding large blobs
  2017-11-16 18:17 [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (5 preceding siblings ...)
  2017-11-16 18:17 ` [PATCH v4 06/15] pack-objects: test support for blob filtering Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 08/15] partial-clone: define partial clone settings in config Jeff Hostetler
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Add tests to verify that fetch-pack and upload-pack support
excluding large blobs using the --filter=blob:limit=<n>
parameter.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
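Illustration only, not part of the patch: the two t5500 tests exercise both
sides of a capability check.  The sketch below models that decision in plain
shell.  choose_fetch_mode is a hypothetical name, and the idea that the
server signals support through a capability token named "filter" is an
assumption that this diff does not show; the warning text is the one the
second test greps for.

```shell
# Hypothetical model of the client-side decision; not code from this series.
# usage: choose_fetch_mode "<space-separated server capabilities>" <filter-spec>
choose_fetch_mode () {
	caps=$1
	spec=$2
	case " $caps " in
	*" filter "*)
		# Assumption: the server advertised filter support.
		echo "request with filter $spec"
		;;
	*)
		# The filter is dropped and the fetch proceeds unfiltered.
		echo "warning: filtering not recognized by server" >&2
		echo "request without filter"
		;;
	esac
}

choose_fetch_mode "multi_ack side-band-64k filter" blob:limit=0
choose_fetch_mode "multi_ack side-band-64k" blob:limit=0
```

The second call corresponds to a server without uploadpack.allowfilter: the
fetch still succeeds, the blob arrives, and only a warning is printed.
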
 cache.h                       |  1 +
 config.c                      |  5 +++
 environment.c                 |  1 +
 list-objects-filter-options.c | 84 +++++++++++++++++++++++++++++++++++++++----
 list-objects-filter-options.h |  6 ++++
 t/t5500-fetch-pack.sh         | 27 ++++++++++++++
 6 files changed, 117 insertions(+), 7 deletions(-)

diff --git a/cache.h b/cache.h
index 6980072..bccc510 100644
--- a/cache.h
+++ b/cache.h
@@ -861,6 +861,7 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
 extern char *repository_format_partial_clone;
+extern const char *core_partial_clone_filter_default;
 
 struct repository_format {
 	int version;
diff --git a/config.c b/config.c
index adb7d7a..adeee04 100644
--- a/config.c
+++ b/config.c
@@ -1241,6 +1241,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.partialclonefilter")) {
+		return git_config_string(&core_partial_clone_filter_default,
+					 var, value);
+	}
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
 }
diff --git a/environment.c b/environment.c
index e52aab3..7537565 100644
--- a/environment.c
+++ b/environment.c
@@ -28,6 +28,7 @@ int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
 char *repository_format_partial_clone;
+const char *core_partial_clone_filter_default;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
 const char *apply_default_whitespace;
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index f1fb57b..76a6579 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -75,13 +75,22 @@ int armor_decode_arg(struct strbuf *buf, const char *arg)
  * subordinate commands when necessary.  We also "intern" the arg for
  * the convenience of the current command.
  */
-int parse_list_objects_filter(struct list_objects_filter_options *filter_options,
-			      const char *arg)
+static int gently_parse_list_objects_filter(
+	struct list_objects_filter_options *filter_options,
+	const char *arg,
+	struct strbuf *errbuf)
 {
 	const char *v0;
 
-	if (filter_options->choice)
-		die(_("multiple object filter types cannot be combined"));
+	if (filter_options->choice) {
+		if (errbuf) {
+			strbuf_init(errbuf, 0);
+			strbuf_addstr(
+				errbuf,
+				_("multiple filter-specs cannot be combined"));
+		}
+		return 1;
+	}
 
 	filter_options->raw_value = strdup(arg);
 
@@ -92,7 +101,7 @@ int parse_list_objects_filter(struct list_objects_filter_options *filter_options
 
 	if (skip_prefix(arg, "blob:limit=", &v0)) {
 		if (!git_parse_ulong(v0, &filter_options->blob_limit_value))
-			die(_("invalid filter-spec expression '%s'"), arg);
+			goto invalid_expression;
 		filter_options->choice = LOFC_BLOB_LIMIT;
 		return 0;
 	}
@@ -127,13 +136,27 @@ int parse_list_objects_filter(struct list_objects_filter_options *filter_options
 		int r;
 		struct strbuf buf = STRBUF_INIT;
 		if (armor_decode_arg(&buf, v0) < 0)
-			die(_("invalid filter-spec expression '%s'"), arg);
+			goto invalid_expression;
 		r = parse_list_objects_filter(filter_options, buf.buf);
 		strbuf_release(&buf);
 		return r;
 	}
 
-	die(_("invalid filter-spec expression '%s'"), arg);
+invalid_expression:
+	if (errbuf) {
+		strbuf_init(errbuf, 0);
+		strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
+	}
+	memset(filter_options, 0, sizeof(*filter_options));
+	return 1;
+}
+
+int parse_list_objects_filter(struct list_objects_filter_options *filter_options,
+			      const char *arg)
+{
+	struct strbuf buf = STRBUF_INIT;
+	if (gently_parse_list_objects_filter(filter_options, arg, &buf))
+		die("%s", buf.buf);
 	return 0;
 }
 
@@ -173,3 +196,50 @@ void list_objects_filter_push_arg(
 	}
 }
 
+void partial_clone_register(
+	const char *remote,
+	const struct list_objects_filter_options *filter_options)
+{
+	/*
+	 * Record the name of the partial clone remote in the
+	 * config and in the global variable -- the latter is
+	 * used throughout to indicate that partial clone is
+	 * enabled and to expect missing objects.
+	 */
+	if (repository_format_partial_clone &&
+	    *repository_format_partial_clone &&
+	    strcmp(remote, repository_format_partial_clone))
+		die(_("cannot change partial clone promisor remote"));
+
+	git_config_set("core.repositoryformatversion", "1");
+	git_config_set("extensions.partialclone", remote);
+
+	repository_format_partial_clone = xstrdup(remote);
+
+	/*
+	 * Record the initial filter-spec in the config as
+	 * the default for subsequent fetches from this remote.
+	 */
+	if (filter_options->requires_armor) {
+		struct strbuf buf = STRBUF_INIT;
+		armor_encode_arg(&buf, filter_options->raw_value);
+		core_partial_clone_filter_default = xstrdup(buf.buf);
+		strbuf_release(&buf);
+	} else {
+		core_partial_clone_filter_default =
+			xstrdup(filter_options->raw_value);
+	}
+	git_config_set("core.partialclonefilter",
+		       core_partial_clone_filter_default);
+}
+
+void partial_clone_get_default_filter_spec(
+	struct list_objects_filter_options *filter_options)
+{
+	/*
+	 * Parse default value, but silently ignore it if it is invalid.
+	 */
+	gently_parse_list_objects_filter(filter_options,
+					 core_partial_clone_filter_default,
+					 NULL);
+}
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 99f454c..1a345ec 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -60,4 +60,10 @@ void list_objects_filter_push_arg(
 	struct argv_array *args,
 	const struct list_objects_filter_options *filter_options);
 
+void partial_clone_register(
+	const char *remote,
+	const struct list_objects_filter_options *filter_options);
+void partial_clone_get_default_filter_spec(
+	struct list_objects_filter_options *filter_options);
+
 #endif /* LIST_OBJECTS_FILTER_OPTIONS_H */
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 80a1a32..c57916b 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -755,4 +755,31 @@ test_expect_success 'fetching deepen' '
 	)
 '
 
+test_expect_success 'filtering by size' '
+	rm -rf server client &&
+	test_create_repo server &&
+	test_commit -C server one &&
+	test_config -C server uploadpack.allowfilter 1 &&
+
+	test_create_repo client &&
+	git -C client fetch-pack --filter=blob:limit=0 ../server HEAD &&
+
+	# Ensure that object is not inadvertently fetched
+	test_must_fail git -C client cat-file -e $(git hash-object server/one.t)
+'
+
+test_expect_success 'filtering by size has no effect if support for it is not advertised' '
+	rm -rf server client &&
+	test_create_repo server &&
+	test_commit -C server one &&
+
+	test_create_repo client &&
+	git -C client fetch-pack --filter=blob:limit=0 ../server HEAD 2> err &&
+
+	# Ensure that object is fetched
+	git -C client cat-file -e $(git hash-object server/one.t) &&
+
+	test_i18ngrep "filtering not recognized by server" err
+'
+
 test_done
-- 
2.9.3


* [PATCH v4 08/15] partial-clone: define partial clone settings in config
  2017-11-16 18:17 [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (6 preceding siblings ...)
  2017-11-16 18:17 ` [PATCH v4 07/15] fetch-pack: test support excluding large blobs Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 09/15] fetch-pack: test support excluding large blobs Jeff Hostetler
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Create get and set routines for partial clone settings in
the config.  These will be used by partial clone and fetch
to remember the promisor remote and the default filter-spec.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
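Illustration only, not part of the patch: partial_clone_register() records
three settings in the repository config.  The helper below just prints them
in config-file form.  show_partial_clone_config is a hypothetical name, and
the layout only approximates what "git config" would write; a filter-spec
that requires armoring would be stored in its encoded form instead.

```shell
# Hypothetical helper that prints the settings partial_clone_register()
# is expected to record; it does not touch any repository.
# usage: show_partial_clone_config <remote-name> <filter-spec>
show_partial_clone_config () {
	printf '[core]\n'
	printf '\trepositoryformatversion = 1\n'
	# Default filter-spec for later fetches from the promisor remote.
	printf '\tpartialclonefilter = %s\n' "$2"
	printf '[extensions]\n'
	# Name of the single promisor remote.
	printf '\tpartialclone = %s\n' "$1"
}

show_partial_clone_config origin blob:limit=0
```
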
 t/t5500-fetch-pack.sh | 27 ---------------------------
 1 file changed, 27 deletions(-)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index c57916b..80a1a32 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -755,31 +755,4 @@ test_expect_success 'fetching deepen' '
 	)
 '
 
-test_expect_success 'filtering by size' '
-	rm -rf server client &&
-	test_create_repo server &&
-	test_commit -C server one &&
-	test_config -C server uploadpack.allowfilter 1 &&
-
-	test_create_repo client &&
-	git -C client fetch-pack --filter=blob:limit=0 ../server HEAD &&
-
-	# Ensure that object is not inadvertently fetched
-	test_must_fail git -C client cat-file -e $(git hash-object server/one.t)
-'
-
-test_expect_success 'filtering by size has no effect if support for it is not advertised' '
-	rm -rf server client &&
-	test_create_repo server &&
-	test_commit -C server one &&
-
-	test_create_repo client &&
-	git -C client fetch-pack --filter=blob:limit=0 ../server HEAD 2> err &&
-
-	# Ensure that object is fetched
-	git -C client cat-file -e $(git hash-object server/one.t) &&
-
-	test_i18ngrep "filtering not recognized by server" err
-'
-
 test_done
-- 
2.9.3


* [PATCH v4 09/15] fetch-pack: test support excluding large blobs
  2017-11-16 18:17 [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (7 preceding siblings ...)
  2017-11-16 18:17 ` [PATCH v4 08/15] partial-clone: define partial clone settings in config Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 10/15] fetch: add from_promisor and exclude-promisor-objects parameters Jeff Hostetler
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Add tests to verify that fetch-pack and upload-pack support
excluding large blobs using the --filter=blob:limit=<n>
parameter.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t5500-fetch-pack.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 80a1a32..c57916b 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -755,4 +755,31 @@ test_expect_success 'fetching deepen' '
 	)
 '
 
+test_expect_success 'filtering by size' '
+	rm -rf server client &&
+	test_create_repo server &&
+	test_commit -C server one &&
+	test_config -C server uploadpack.allowfilter 1 &&
+
+	test_create_repo client &&
+	git -C client fetch-pack --filter=blob:limit=0 ../server HEAD &&
+
+	# Ensure that object is not inadvertently fetched
+	test_must_fail git -C client cat-file -e $(git hash-object server/one.t)
+'
+
+test_expect_success 'filtering by size has no effect if support for it is not advertised' '
+	rm -rf server client &&
+	test_create_repo server &&
+	test_commit -C server one &&
+
+	test_create_repo client &&
+	git -C client fetch-pack --filter=blob:limit=0 ../server HEAD 2> err &&
+
+	# Ensure that object is fetched
+	git -C client cat-file -e $(git hash-object server/one.t) &&
+
+	test_i18ngrep "filtering not recognized by server" err
+'
+
 test_done
-- 
2.9.3


* [PATCH v4 10/15] fetch: add from_promisor and exclude-promisor-objects parameters
  2017-11-16 18:17 [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (8 preceding siblings ...)
  2017-11-16 18:17 ` [PATCH v4 09/15] fetch-pack: test support excluding large blobs Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 11/15] t5500: add fetch-pack tests for partial clone Jeff Hostetler
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Teach fetch to pass the from-promisor and exclude-promisor-objects
parameters to sub-commands.  Initialize the fetch_if_missing
global variable.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
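Illustration only, not part of the patch: the case analysis in
fetch_one__setup_partial() can be read as a small decision table.  The shell
function below restates it.  partial_fetch_action and its output strings
are hypothetical; an empty first argument stands for "no prior partial
clone/fetch" and an empty third argument for "no --filter given".

```shell
# Hypothetical restatement of the cases in fetch_one__setup_partial().
# usage: partial_fetch_action <promisor-remote-or-empty> <remote> <filter-spec-or-empty>
partial_fetch_action () {
	promisor=$1
	remote=$2
	spec=$3
	if test -z "$promisor"
	then
		# No earlier partial clone or partial fetch in this repository.
		if test -z "$spec"
		then
			echo "normal fetch"
		else
			echo "register $remote as promisor with $spec"
		fi
	elif test "$remote" != "$promisor"
	then
		# Only one promisor remote is supported.
		if test -n "$spec"
		then
			echo "die: unsupported partial-fetch to a different remote"
		else
			echo "normal fetch"
		fi
	elif test -n "$spec"
	then
		echo "fetch with given $spec"
	else
		echo "fetch with default filter-spec from config"
	fi
}

partial_fetch_action "" origin blob:limit=0
partial_fetch_action origin upstream blob:limit=0
partial_fetch_action origin origin ""
```
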
 builtin/fetch.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 connected.c     |  2 ++
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index fb9af7c..d3cf423 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1047,9 +1047,11 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
 		set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes");
 	if (update_shallow)
 		set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
-	if (filter_options.choice)
+	if (filter_options.choice) {
 		set_option(transport, TRANS_OPT_LIST_OBJECTS_FILTER,
 			   filter_options.raw_value);
+		set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
+	}
 	return transport;
 }
 
@@ -1287,6 +1289,59 @@ static int fetch_multiple(struct string_list *list)
 	return result;
 }
 
+static inline void fetch_one__setup_partial(struct remote *remote)
+{
+	int ppc, neq;
+
+	/* Do we have a prior partial clone/fetch? */
+	ppc = (repository_format_partial_clone &&
+	       *repository_format_partial_clone);
+
+	/*
+	 * If no prior partial clone/fetch and partial fetch was NOT
+	 * requested now, do a normal fetch.
+	 */
+	if (!ppc && !filter_options.choice)
+		return;
+
+	/*
+	 * If this is the FIRST partial fetch request, we enable partial
+	 * on this repo and remember the given filter-spec as the default
+	 * for subsequent fetches to this remote.
+	 */
+	if (!ppc && filter_options.choice) {
+		partial_clone_register(remote->name, &filter_options);
+		return;
+	}
+
+	/*
+	 * We are currently limited to only ONE promisor remote.  That is,
+	 * we only allow partial fetches back to the original partial clone
+	 * remote (or the first partial fetch remote).  Disallow explicit
+	 * partial fetches to a different remote.
+	 *
+	 * Normal (non-partial) fetch commands should still be allowed to
+	 * other remotes.
+	 */
+	neq = (strcmp(remote->name, repository_format_partial_clone));
+	if (neq && filter_options.choice)
+		die(_("unsupported partial-fetch to a different remote"));
+
+	if (neq && !filter_options.choice)
+		return;
+
+	/*
+	 * When fetching from the promisor remote, we either use the
+	 * explicitly given filter-spec or inherit the filter-spec from
+	 * the clone.
+	 */
+	if (filter_options.choice)
+		return;
+
+	partial_clone_get_default_filter_spec(&filter_options);
+	return;
+}
+
 static int fetch_one(struct remote *remote, int argc, const char **argv)
 {
 	static const char **refs = NULL;
@@ -1298,6 +1353,8 @@ static int fetch_one(struct remote *remote, int argc, const char **argv)
 		die(_("No remote repository specified.  Please, specify either a URL or a\n"
 		    "remote name from which new revisions should be fetched."));
 
+	fetch_one__setup_partial(remote);
+
 	gtransport = prepare_transport(remote, 1);
 
 	if (prune < 0) {
@@ -1348,6 +1405,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 	packet_trace_identity("fetch");
 
+	fetch_if_missing = 0;
+
 	/* Record the command line for the reflog */
 	strbuf_addstr(&default_rla, "fetch");
 	for (i = 1; i < argc; i++)
diff --git a/connected.c b/connected.c
index f416b05..3a5bd67 100644
--- a/connected.c
+++ b/connected.c
@@ -56,6 +56,8 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
 	argv_array_push(&rev_list.args,"rev-list");
 	argv_array_push(&rev_list.args, "--objects");
 	argv_array_push(&rev_list.args, "--stdin");
+	if (repository_format_partial_clone)
+		argv_array_push(&rev_list.args, "--exclude-promisor-objects");
 	argv_array_push(&rev_list.args, "--not");
 	argv_array_push(&rev_list.args, "--all");
 	argv_array_push(&rev_list.args, "--quiet");
-- 
2.9.3


* [PATCH v4 11/15] t5500: add fetch-pack tests for partial clone
  2017-11-16 18:17 [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (9 preceding siblings ...)
  2017-11-16 18:17 ` [PATCH v4 10/15] fetch: add from_promisor and exclude-promisor-objects parameters Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 12/15] t5601: test " Jeff Hostetler
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t5500-fetch-pack.sh | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index c57916b..23702b5 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -782,4 +782,40 @@ test_expect_success 'filtering by size has no effect if support for it is not ad
 	test_i18ngrep "filtering not recognized by server" err
 '
 
+fetch_blob_max_bytes () {
+		      SERVER="$1"
+		      URL="$2"
+
+	rm -rf "$SERVER" client &&
+	test_create_repo "$SERVER" &&
+	test_commit -C "$SERVER" one &&
+	test_config -C "$SERVER" uploadpack.allowfilter 1 &&
+
+	git clone "$URL" client &&
+	test_config -C client extensions.partialclone origin &&
+
+	test_commit -C "$SERVER" two &&
+
+	git -C client fetch --filter=blob:limit=0 origin HEAD:somewhere &&
+
+	# Ensure that commit is fetched, but blob is not
+	test_config -C client extensions.partialclone "arbitrary string" &&
+	git -C client cat-file -e $(git -C "$SERVER" rev-parse two) &&
+	test_must_fail git -C client cat-file -e $(git hash-object "$SERVER/two.t")
+}
+
+test_expect_success 'fetch with filtering' '
+		     fetch_blob_max_bytes server server
+'
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'fetch with filtering and HTTP' '
+		     fetch_blob_max_bytes "$HTTPD_DOCUMENT_ROOT_PATH/server" "$HTTPD_URL/smart/server"
+'
+
+stop_httpd
+
+
 test_done
-- 
2.9.3


* [PATCH v4 12/15] t5601: test for partial clone
  2017-11-16 18:17 [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (10 preceding siblings ...)
  2017-11-16 18:17 ` [PATCH v4 11/15] t5500: add fetch-pack tests for partial clone Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 13/15] t5500: more tests for partial clone and fetch Jeff Hostetler
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/clone.c  | 15 ++++++++++++---
 t/t5601-clone.sh | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index fceb9e7..5719690 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -889,6 +889,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	struct refspec *refspec;
 	const char *fetch_pattern;
 
+	fetch_if_missing = 0;
+
 	packet_trace_identity("clone");
 	argc = parse_options(argc, argv, prefix, builtin_clone_options,
 			     builtin_clone_usage, 0);
@@ -1109,11 +1111,13 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		transport_set_option(transport, TRANS_OPT_UPLOADPACK,
 				     option_upload_pack);
 
-	if (filter_options.choice)
+	if (filter_options.choice) {
 		transport_set_option(transport, TRANS_OPT_LIST_OBJECTS_FILTER,
 				     filter_options.raw_value);
+		transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
+	}
 
-	if (transport->smart_options && !deepen)
+	if (transport->smart_options && !deepen && !filter_options.choice)
 		transport->smart_options->check_self_contained_and_connected = 1;
 
 	refs = transport_get_remote_refs(transport);
@@ -1173,13 +1177,17 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	write_refspec_config(src_ref_prefix, our_head_points_at,
 			remote_head_points_at, &branch_top);
 
+	if (filter_options.choice)
+		partial_clone_register("origin", &filter_options);
+
 	if (is_local)
 		clone_local(path, git_dir);
 	else if (refs && complete_refs_before_fetch)
 		transport_fetch_refs(transport, mapped_refs);
 
 	update_remote_refs(refs, mapped_refs, remote_head_points_at,
-			   branch_top.buf, reflog_msg.buf, transport, !is_local);
+			   branch_top.buf, reflog_msg.buf, transport,
+			   !is_local && !filter_options.choice);
 
 	update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
@@ -1200,6 +1208,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 
 	junk_mode = JUNK_LEAVE_REPO;
+	fetch_if_missing = 1;
 	err = checkout(submodule_progress);
 
 	strbuf_release(&reflog_msg);
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 9c56f77..6d37c6d 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -571,4 +571,53 @@ test_expect_success 'GIT_TRACE_PACKFILE produces a usable pack' '
 	git -C replay.git index-pack -v --stdin <tmp.pack
 '
 
+partial_clone () {
+	       SERVER="$1" &&
+	       URL="$2" &&
+
+	rm -rf "$SERVER" client &&
+	test_create_repo "$SERVER" &&
+	test_commit -C "$SERVER" one &&
+	HASH1=$(git hash-object "$SERVER/one.t") &&
+	git -C "$SERVER" revert HEAD &&
+	test_commit -C "$SERVER" two &&
+	HASH2=$(git hash-object "$SERVER/two.t") &&
+	test_config -C "$SERVER" uploadpack.allowfilter 1 &&
+	test_config -C "$SERVER" uploadpack.allowanysha1inwant 1 &&
+
+	git clone --filter=blob:limit=0 "$URL" client &&
+
+	git -C client fsck &&
+
+	# Ensure that unneeded blobs are not inadvertently fetched.
+	test_config -C client extensions.partialclone "not a remote" &&
+	test_must_fail git -C client cat-file -e "$HASH1" &&
+
+	# But this blob was fetched, because clone performs an initial checkout
+	git -C client cat-file -e "$HASH2"
+}
+
+test_expect_success 'partial clone' '
+	partial_clone server "file://$(pwd)/server"
+'
+
+test_expect_success 'partial clone: warn if server does not support object filtering' '
+	rm -rf server client &&
+	test_create_repo server &&
+	test_commit -C server one &&
+
+	git clone --filter=blob:limit=0 "file://$(pwd)/server" client 2> err &&
+
+	test_i18ngrep "filtering not recognized by server" err
+'
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'partial clone using HTTP' '
+	partial_clone "$HTTPD_DOCUMENT_ROOT_PATH/server" "$HTTPD_URL/smart/server"
+'
+
+stop_httpd
+
 test_done
-- 
2.9.3


* [PATCH v4 13/15] t5500: more tests for partial clone and fetch
  2017-11-16 18:17 [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (11 preceding siblings ...)
  2017-11-16 18:17 ` [PATCH v4 12/15] t5601: test " Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 14/15] unpack-trees: batch fetching of missing blobs Jeff Hostetler
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t5500-fetch-pack.sh | 60 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 56 insertions(+), 4 deletions(-)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 23702b5..c95bb7b 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -782,7 +782,7 @@ test_expect_success 'filtering by size has no effect if support for it is not ad
 	test_i18ngrep "filtering not recognized by server" err
 '
 
-fetch_blob_max_bytes () {
+setup_blob_max_bytes () {
 		      SERVER="$1"
 		      URL="$2"
 
@@ -794,7 +794,11 @@ fetch_blob_max_bytes () {
 	git clone "$URL" client &&
 	test_config -C client extensions.partialclone origin &&
 
-	test_commit -C "$SERVER" two &&
+	test_commit -C "$SERVER" two
+}
+
+do_blob_max_bytes() {
+	SERVER="$1" &&
 
 	git -C client fetch --filter=blob:limit=0 origin HEAD:somewhere &&
 
@@ -805,14 +809,62 @@ fetch_blob_max_bytes () {
 }
 
 test_expect_success 'fetch with filtering' '
-		     fetch_blob_max_bytes server server
+	setup_blob_max_bytes server server &&
+	do_blob_max_bytes server
+'
+
+test_expect_success 'fetch respects configured filtering' '
+	setup_blob_max_bytes server server &&
+
+	test_config -C client core.partialclonefilter blob:limit=0 &&
+
+	git -C client fetch origin HEAD:somewhere &&
+
+	# Ensure that commit is fetched, but blob is not
+	test_config -C client extensions.partialclone "arbitrary string" &&
+	git -C client cat-file -e $(git -C server rev-parse two) &&
+	test_must_fail git -C client cat-file -e $(git hash-object server/two.t)
+'
+
+test_expect_success 'pull respects configured filtering' '
+	setup_blob_max_bytes server server &&
+
+	# Hide two.t from tip so that client does not load it upon the
+	# automatic checkout that pull performs
+	git -C server rm two.t &&
+	test_commit -C server three &&
+
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	test_config -C client core.partialclonefilter blob:limit=0 &&
+
+	git -C client pull origin &&
+
+	# Ensure that commit is fetched, but blob is not
+	test_config -C client extensions.partialclone "arbitrary string" &&
+	git -C client cat-file -e $(git -C server rev-parse two) &&
+	test_must_fail git -C client cat-file -e $(git hash-object server/two.t)
+'
+
+test_expect_success 'clone configures filtering' '
+	rm -rf server client &&
+	test_create_repo server &&
+	test_commit -C server one &&
+	test_commit -C server two &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+
+	git clone --filter=blob:limit=12345 server client &&
+
+	# Ensure that we can, for example, checkout HEAD^
+	rm -rf client/.git/objects/* &&
+	git -C client checkout HEAD^
 '
 
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
 test_expect_success 'fetch with filtering and HTTP' '
-		     fetch_blob_max_bytes "$HTTPD_DOCUMENT_ROOT_PATH/server" "$HTTPD_URL/smart/server"
+	setup_blob_max_bytes "$HTTPD_DOCUMENT_ROOT_PATH/server" "$HTTPD_URL/smart/server" &&
+	do_blob_max_bytes "$HTTPD_DOCUMENT_ROOT_PATH/server"
 '
 
 stop_httpd
-- 
2.9.3


* [PATCH v4 14/15] unpack-trees: batch fetching of missing blobs
  2017-11-16 18:17 [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (12 preceding siblings ...)
  2017-11-16 18:17 ` [PATCH v4 13/15] t5500: more tests for partial clone and fetch Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-16 18:17 ` [PATCH v4 15/15] fetch-pack: restore save_commit_buffer after use Jeff Hostetler
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

When running checkout, first prefetch all blobs that are to be updated
but are missing. This means that only one pack is downloaded during such
operations, instead of one per missing blob.

This operates only on the blob level - if a repository has missing
trees, they are still fetched one at a time.

This does not use the delayed checkout mechanism introduced in commit
2841e8f ("convert: add "status=delayed" to filter process protocol",
2017-06-30) due to significant conceptual differences - in particular,
for partial clones, we already know what needs to be fetched based on
the contents of the local repo alone, whereas for status=delayed, it is
the filter process that tells us what needs to be checked in the end.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
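Illustration only, not part of the patch: the saving described in the commit
message is one fetch for the whole set of missing blobs instead of one fetch
per blob.  The sketch models this with plain lists of made-up object names;
missing_blobs and count_fetches are hypothetical helpers and do not talk to
any repository.

```shell
# Print every id from the first list that is absent from the second.
# usage: missing_blobs "<needed ids>" "<present ids>"
missing_blobs () {
	for id in $1
	do
		case " $2 " in
		*" $id "*) ;;                  # already in the local object store
		*) printf '%s\n' "$id" ;;      # must come from the promisor remote
		esac
	done
}

# Print how many fetches a checkout would issue for the given ids.
# usage: count_fetches <per-blob|batched> "<missing ids>"
count_fetches () {
	set -- "$1" $2
	mode=$1
	shift
	if test "$mode" = batched && test $# -gt 0
	then
		echo 1
	else
		echo $#
	fi
}

needed="aaa1 bbb2 ccc3"
present="bbb2"
missing=$(missing_blobs "$needed" "$present")
count_fetches per-blob "$missing"    # behaviour before this patch
count_fetches batched "$missing"     # behaviour with prefetching
```
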
 fetch-object.c   | 27 +++++++++++++++++++++++----
 fetch-object.h   |  5 +++++
 t/t5601-clone.sh | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 unpack-trees.c   | 22 ++++++++++++++++++++++
 4 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/fetch-object.c b/fetch-object.c
index 369b61c..21b4dfa 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -3,12 +3,12 @@
 #include "pkt-line.h"
 #include "strbuf.h"
 #include "transport.h"
+#include "fetch-object.h"
 
-void fetch_object(const char *remote_name, const unsigned char *sha1)
+static void fetch_refs(const char *remote_name, struct ref *ref)
 {
 	struct remote *remote;
 	struct transport *transport;
-	struct ref *ref;
 	int original_fetch_if_missing = fetch_if_missing;
 
 	fetch_if_missing = 0;
@@ -17,10 +17,29 @@ void fetch_object(const char *remote_name, const unsigned char *sha1)
 		die(_("Remote with no URL"));
 	transport = transport_get(remote, remote->url[0]);
 
-	ref = alloc_ref(sha1_to_hex(sha1));
-	hashcpy(ref->old_oid.hash, sha1);
 	transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
 	transport_set_option(transport, TRANS_OPT_NO_HAVES, "1");
 	transport_fetch_refs(transport, ref);
 	fetch_if_missing = original_fetch_if_missing;
 }
+
+void fetch_object(const char *remote_name, const unsigned char *sha1)
+{
+	struct ref *ref = alloc_ref(sha1_to_hex(sha1));
+	hashcpy(ref->old_oid.hash, sha1);
+	fetch_refs(remote_name, ref);
+}
+
+void fetch_objects(const char *remote_name, const struct oid_array *to_fetch)
+{
+	struct ref *ref = NULL;
+	int i;
+
+	for (i = 0; i < to_fetch->nr; i++) {
+		struct ref *new_ref = alloc_ref(oid_to_hex(&to_fetch->oid[i]));
+		oidcpy(&new_ref->old_oid, &to_fetch->oid[i]);
+		new_ref->next = ref;
+		ref = new_ref;
+	}
+	fetch_refs(remote_name, ref);
+}
diff --git a/fetch-object.h b/fetch-object.h
index f371300..4b269d0 100644
--- a/fetch-object.h
+++ b/fetch-object.h
@@ -1,6 +1,11 @@
 #ifndef FETCH_OBJECT_H
 #define FETCH_OBJECT_H
 
+#include "sha1-array.h"
+
 extern void fetch_object(const char *remote_name, const unsigned char *sha1);
 
+extern void fetch_objects(const char *remote_name,
+			  const struct oid_array *to_fetch);
+
 #endif
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 6d37c6d..13610b7 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -611,6 +611,58 @@ test_expect_success 'partial clone: warn if server does not support object filte
 	test_i18ngrep "filtering not recognized by server" err
 '
 
+test_expect_success 'batch missing blob request during checkout' '
+	rm -rf server client &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	echo b >server/b &&
+	git -C server add a b &&
+
+	git -C server commit -m x &&
+	echo aa >server/a &&
+	echo bb >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+
+	git clone --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	# Ensure that there is only one negotiation by checking that there is
+	# only one "done" line sent. ("done" marks the end of negotiation.)
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client checkout HEAD^ &&
+	grep "git> done" trace >done_lines &&
+	test_line_count = 1 done_lines
+'
+
+test_expect_success 'batch missing blob request does not inadvertently try to fetch gitlinks' '
+	rm -rf server client &&
+
+	test_create_repo repo_for_submodule &&
+	test_commit -C repo_for_submodule x &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	echo b >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+
+	echo aa >server/a &&
+	echo bb >server/b &&
+	# Also add a gitlink pointing to an arbitrary repository
+	git -C server submodule add "$(pwd)/repo_for_submodule" c &&
+	git -C server add a b c &&
+	git -C server commit -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+
+	# Make sure that it succeeds
+	git clone --filter=blob:limit=0 "file://$(pwd)/server" client
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 71b70cc..73a1cdb 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -14,6 +14,7 @@
 #include "dir.h"
 #include "submodule.h"
 #include "submodule-config.h"
+#include "fetch-object.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -369,6 +370,27 @@ static int check_updates(struct unpack_trees_options *o)
 		load_gitmodules_file(index, &state);
 
 	enable_delayed_checkout(&state);
+	if (repository_format_partial_clone && o->update && !o->dry_run) {
+		/*
+		 * Prefetch the objects that are to be checked out in the loop
+		 * below.
+		 */
+		struct oid_array to_fetch = OID_ARRAY_INIT;
+		int fetch_if_missing_store = fetch_if_missing;
+		fetch_if_missing = 0;
+		for (i = 0; i < index->cache_nr; i++) {
+			struct cache_entry *ce = index->cache[i];
+			if ((ce->ce_flags & CE_UPDATE) &&
+			    !S_ISGITLINK(ce->ce_mode)) {
+				if (!has_object_file(&ce->oid))
+					oid_array_append(&to_fetch, &ce->oid);
+			}
+		}
+		if (to_fetch.nr)
+			fetch_objects(repository_format_partial_clone,
+				      &to_fetch);
+		fetch_if_missing = fetch_if_missing_store;
+	}
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
-- 
2.9.3



* [PATCH v4 15/15] fetch-pack: restore save_commit_buffer after use
  2017-11-16 18:17 [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (13 preceding siblings ...)
  2017-11-16 18:17 ` [PATCH v4 14/15] unpack-trees: batch fetching of missing blobs Jeff Hostetler
@ 2017-11-16 18:17 ` Jeff Hostetler
  2017-11-17  6:19 ` [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Junio C Hamano
  2017-11-21 18:17 ` Jonathan Tan
  16 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 18:17 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

In fetch-pack, the global variable save_commit_buffer is set to 0, but
not restored to its original value after use.

In particular, if show_log() (in log-tree.c) is invoked after
fetch_pack() in the same process, show_log() will return before printing
out the commit message, because get_cached_commit_buffer() returns NULL
when the commit buffer was not saved. I discovered this when attempting
to run "git log -S" in a partial clone, triggering the case where
revision walking lazily loads missing objects.

Therefore, restore save_commit_buffer to its original value after use.
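
The save-and-restore pattern can be shown in isolation with a minimal
sketch. Everything here is a hypothetical stand-in rather than Git code:
save_buffer plays the role of save_commit_buffer, the two scan functions
model a callee that clears the flag, and show_message() models a later
consumer that quietly produces nothing when the flag is off.

```c
#include <stddef.h>

/* Stand-in for a process-wide setting such as save_commit_buffer. */
static int save_buffer = 1;

/* Problematic variant: clears the global and leaves it cleared. */
static void scan_without_restore(void)
{
	save_buffer = 0;
	/* ... work that does not need buffers ... */
}

/* Fixed variant: remembers the caller's value and puts it back. */
static void scan_with_restore(void)
{
	int old_save_buffer = save_buffer;

	save_buffer = 0;
	/* ... work that does not need buffers ... */
	save_buffer = old_save_buffer;
}

/* A later consumer in the same process that depends on the flag. */
static const char *show_message(void)
{
	return save_buffer ? "commit message" : NULL;
}
```

After scan_with_restore(), show_message() still returns a message; after
scan_without_restore(), it returns NULL for every later caller in the
process.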

An alternative solution to the problem I had is to replace
get_cached_commit_buffer() with get_commit_buffer(). That invocation was
introduced in commit a97934d ("use get_cached_commit_buffer where
appropriate", 2014-06-13) to replace "commit->buffer" introduced in
commit 3131b71 ("Add "--show-all" revision walker flag for debugging",
2008-02-13). In the latter commit, the commit author seems to be
deciding between not showing an unparsed commit at all and showing an
unparsed commit without the message (which is what the commit does), and
did not mention parsing the unparsed commit, so I prefer to preserve the
existing behavior.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 fetch-pack.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fetch-pack.c b/fetch-pack.c
index 895e8f9..121f03e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -717,6 +717,7 @@ static int everything_local(struct fetch_pack_args *args,
 {
 	struct ref *ref;
 	int retval;
+	int old_save_commit_buffer = save_commit_buffer;
 	timestamp_t cutoff = 0;
 
 	save_commit_buffer = 0;
@@ -784,6 +785,9 @@ static int everything_local(struct fetch_pack_args *args,
 		print_verbose(args, _("already have %s (%s)"), oid_to_hex(remote),
 			      ref->name);
 	}
+
+	save_commit_buffer = old_save_commit_buffer;
+
 	return retval;
 }
 
-- 
2.9.3



* Re: [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
  2017-11-16 18:17 [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (14 preceding siblings ...)
  2017-11-16 18:17 ` [PATCH v4 15/15] fetch-pack: restore save_commit_buffer after use Jeff Hostetler
@ 2017-11-17  6:19 ` Junio C Hamano
  2017-11-21 18:17 ` Jonathan Tan
  16 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2017-11-17  6:19 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git, peff, jonathantanmy, Jeff Hostetler

Jeff Hostetler <git@jeffhostetler.com> writes:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> This part 3 of a 3 part sequence partial clone.  It assumes
> that part 1 and part 2 are in place.

I couldn't figure out why 'pu' fails with this topic at t5500 (and
others) so I dropped a merge of this before pushing the result out.

Thanks.


* Re: [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
  2017-11-16 18:17 [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (15 preceding siblings ...)
  2017-11-17  6:19 ` [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Junio C Hamano
@ 2017-11-21 18:17 ` Jonathan Tan
  2017-11-21 20:46   ` Jeff Hostetler
  16 siblings, 1 reply; 19+ messages in thread
From: Jonathan Tan @ 2017-11-21 18:17 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git, gitster, peff, Jeff Hostetler

On Thu, 16 Nov 2017 18:17:08 +0000
Jeff Hostetler <git@jeffhostetler.com> wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> This part 3 of a 3 part sequence partial clone.  It assumes
> that part 1 and part 2 are in place.
> 
> This patch series is labeled as V4 to keep it in sync with
> the corresponding V4 versions of parts 1 and 2.  There was
> not a V3 version of this patch series.
> 
> Jonathan and I independently started on this task.  This is another
> pass at merging those efforts.  So there are several places that may
> need refactoring and cleanup, but fewer than in the previous submission.
> In particular, the test cases should be squashed and new tests added.
> 
> And I think we need more end-to-end tests.  I'll work on those next.

I think that it would be easier to review if the test for each command
was contained in the same patch as (or the patch immediately following)
the implementation of the command - for example, as in my modifications
[1].

(If you're about to send out v5, that's fine - maybe take this into
consideration for v6, if there is one.)

[1] https://github.com/jonathantanmy/git/commits/pc20171103


* Re: [PATCH v4 00/15] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
  2017-11-21 18:17 ` Jonathan Tan
@ 2017-11-21 20:46   ` Jeff Hostetler
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-21 20:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster, peff, Jeff Hostetler



On 11/21/2017 1:17 PM, Jonathan Tan wrote:
> On Thu, 16 Nov 2017 18:17:08 +0000
> Jeff Hostetler <git@jeffhostetler.com> wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> This part 3 of a 3 part sequence partial clone.  It assumes
>> that part 1 and part 2 are in place.
>>
>> This patch series is labeled as V4 to keep it in sync with
>> the corresponding V4 versions of parts 1 and 2.  There was
>> not a V3 version of this patch series.
>>
>> Jonathan and I independently started on this task.  This is another
>> pass at merging those efforts.  So there are several places that may
>> need refactoring and cleanup, but fewer than in the previous submission.
>> In particular, the test cases should be squashed and new tests added.
>>
>> And I think we need more end-to-end tests.  I'll work on those next.
> 
> I think that it would be easier to review if the test for each command
> was contained in the same patch as (or the patch immediately following)
> the implementation of the command - for example, as in my modifications
> [1].
> 
> (If you're about to send out v5, that's fine - maybe take this into
> consideration for v6, if there is one.)
> 
> [1] https://github.com/jonathantanmy/git/commits/pc20171103
> 

I've already pulled the tests to be with the code changes in part 1
and part 2.  There are a few commits with tests in part 3 that we may
want to squash together, but we can do that later.  If you want to
start with my V5 set and do that, that would be fine.

What I was talking about above are some additional end-to-end tests
that I think we need.  For example, do a partial clone/fetch and then
run blame or some other commands that exercise the overall system.

As parts 1, 2, and 3 are settling down, these additional tests could
be in a part 4 for now.  Later, if it makes sense, these could be
moved earlier.

Jeff

