git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
@ 2017-12-08 15:58 Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 01/16] sha1_file: support lazily fetching missing objects Jeff Hostetler
                   ` (17 more replies)
  0 siblings, 18 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
(which builds upon V6 of part 1).

This version adds additional tests, fixes test errors on the MAC version,
and squashes some fixup commits.

It also restores functionality accidentally dropped from the V6 series
for "git fetch" to automatically inherit the partial-clone filter-spec
when appropriate.  This version extends the --no-filter argument to
override this inheritance.

Jeff Hostetler (8):
  upload-pack: add object filtering for partial clone
  fetch-pack, index-pack, transport: partial clone
  fetch-pack: add --no-filter
  fetch: support filters
  partial-clone: define partial clone settings in config
  t5616: end-to-end tests for partial clone
  fetch: inherit filter-spec from partial clone
  t5616: test bulk prefetch after partial fetch

Jonathan Tan (8):
  sha1_file: support lazily fetching missing objects
  rev-list: support termination at promisor objects
  gc: do not repack promisor packfiles
  fetch-pack: test support excluding large blobs
  fetch: refactor calculation of remote list
  clone: partial clone
  unpack-trees: batch fetching of missing blobs
  fetch-pack: restore save_commit_buffer after use

 Documentation/config.txt                          |   4 +
 Documentation/git-pack-objects.txt                |  11 ++
 Documentation/rev-list-options.txt                |  11 ++
 Documentation/technical/pack-protocol.txt         |   8 +
 Documentation/technical/protocol-capabilities.txt |   8 +
 builtin/cat-file.c                                |   2 +
 builtin/clone.c                                   |  22 ++-
 builtin/fetch-pack.c                              |  10 ++
 builtin/fetch.c                                   |  83 ++++++++-
 builtin/fsck.c                                    |   3 +
 builtin/gc.c                                      |   3 +
 builtin/index-pack.c                              |   6 +
 builtin/pack-objects.c                            |  37 +++-
 builtin/prune.c                                   |   7 +
 builtin/repack.c                                  |   8 +-
 builtin/rev-list.c                                |  73 +++++++-
 cache.h                                           |   9 +
 config.c                                          |   5 +
 connected.c                                       |   2 +
 environment.c                                     |   1 +
 fetch-object.c                                    |  29 ++-
 fetch-object.h                                    |   5 +
 fetch-pack.c                                      |  17 ++
 fetch-pack.h                                      |   2 +
 list-objects-filter-options.c                     |  92 ++++++++--
 list-objects-filter-options.h                     |  18 ++
 list-objects.c                                    |  29 ++-
 object.c                                          |   2 +-
 remote-curl.c                                     |   6 +
 revision.c                                        |  33 +++-
 revision.h                                        |   5 +-
 sha1_file.c                                       |  32 +++-
 t/t0410-partial-clone.sh                          | 206 +++++++++++++++++++++-
 t/t5500-fetch-pack.sh                             |  63 +++++++
 t/t5601-clone.sh                                  | 101 +++++++++++
 t/t5616-partial-clone.sh                          | 146 +++++++++++++++
 transport-helper.c                                |   5 +
 transport.c                                       |   4 +
 transport.h                                       |   5 +
 unpack-trees.c                                    |  22 +++
 upload-pack.c                                     |  31 +++-
 41 files changed, 1110 insertions(+), 56 deletions(-)
 create mode 100755 t/t5616-partial-clone.sh

-- 
2.9.3


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v7 01/16] sha1_file: support lazily fetching missing objects
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 02/16] rev-list: support termination at promisor objects Jeff Hostetler
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy

From: Jonathan Tan <jonathantanmy@google.com>

Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.

The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, without the user
     regarding how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
     about the loose/packed distinction and operate on both differently,
     and ensure that they can handle the concept of objects that are
     neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
     - this searches a single pack that is provided as an argument; the
       caller already knows (through other means) that the sought object
       is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
     it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
     object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
     not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/cat-file.c       |  2 ++
 builtin/fetch-pack.c     |  2 ++
 builtin/fsck.c           |  3 +++
 builtin/index-pack.c     |  6 ++++++
 cache.h                  |  8 ++++++++
 fetch-object.c           |  3 +++
 sha1_file.c              | 32 ++++++++++++++++++++++--------
 t/t0410-partial-clone.sh | 51 ++++++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 99 insertions(+), 8 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index f5fa4fd..cf9ea5c 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -475,6 +475,8 @@ static int batch_objects(struct batch_options *opt)
 
 		for_each_loose_object(batch_loose_object, &sa, 0);
 		for_each_packed_object(batch_packed_object, &sa, 0);
+		if (repository_format_partial_clone)
+			warning("This repository has extensions.partialClone set. Some objects may not be loaded.");
 
 		cb.opt = opt;
 		cb.expand = &data;
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 02abe72..15eeed7 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -53,6 +53,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	struct oid_array shallow = OID_ARRAY_INIT;
 	struct string_list deepen_not = STRING_LIST_INIT_DUP;
 
+	fetch_if_missing = 0;
+
 	packet_trace_identity("fetch-pack");
 
 	memset(&args, 0, sizeof(args));
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 578a7c8..3b76c0e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -678,6 +678,9 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	int i;
 	struct alternate_object_database *alt;
 
+	/* fsck knows how to handle missing promisor objects */
+	fetch_if_missing = 0;
+
 	errors_found = 0;
 	check_replace_refs = 0;
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 24c2f05..a0a35e6 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1657,6 +1657,12 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	unsigned foreign_nr = 1;	/* zero is a "good" value, assume bad */
 	int report_end_of_input = 0;
 
+	/*
+	 * index-pack never needs to fetch missing objects, since it only
+	 * accesses the repo to do hash collision checks
+	 */
+	fetch_if_missing = 0;
+
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(index_pack_usage);
 
diff --git a/cache.h b/cache.h
index c76f2e9..6980072 100644
--- a/cache.h
+++ b/cache.h
@@ -1727,6 +1727,14 @@ struct object_info {
 #define OBJECT_INFO_QUICK 8
 extern int sha1_object_info_extended(const unsigned char *, struct object_info *, unsigned flags);
 
+/*
+ * Set this to 0 to prevent sha1_object_info_extended() from fetching missing
+ * blobs. This has a difference only if extensions.partialClone is set.
+ *
+ * Its default value is 1.
+ */
+extern int fetch_if_missing;
+
 /* Dumb servers support */
 extern int update_server_info(int);
 
diff --git a/fetch-object.c b/fetch-object.c
index 08e91ce..258fcfa 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -10,7 +10,9 @@ void fetch_object(const char *remote_name, const unsigned char *sha1)
 	struct remote *remote;
 	struct transport *transport;
 	struct ref *ref;
+	int original_fetch_if_missing = fetch_if_missing;
 
+	fetch_if_missing = 0;
 	remote = remote_get(remote_name);
 	if (!remote->url[0])
 		die(_("Remote with no URL"));
@@ -21,4 +23,5 @@ void fetch_object(const char *remote_name, const unsigned char *sha1)
 	transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
 	transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1");
 	transport_fetch_refs(transport, ref);
+	fetch_if_missing = original_fetch_if_missing;
 }
diff --git a/sha1_file.c b/sha1_file.c
index 10c3a00..dd956e2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -29,6 +29,7 @@
 #include "mergesort.h"
 #include "quote.h"
 #include "packfile.h"
+#include "fetch-object.h"
 
 const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
@@ -1144,6 +1145,8 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	return (status < 0) ? status : 0;
 }
 
+int fetch_if_missing = 1;
+
 int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)
 {
 	static struct object_info blank_oi = OBJECT_INFO_INIT;
@@ -1152,6 +1155,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 	const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
 				    lookup_replace_object(sha1) :
 				    sha1;
+	int already_retried = 0;
 
 	if (!oi)
 		oi = &blank_oi;
@@ -1176,19 +1180,32 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 		}
 	}
 
-	if (!find_pack_entry(real, &e)) {
+	while (1) {
+		if (find_pack_entry(real, &e))
+			break;
+
 		/* Most likely it's a loose object. */
 		if (!sha1_loose_object_info(real, oi, flags))
 			return 0;
 
 		/* Not a loose object; someone else may have just packed it. */
-		if (flags & OBJECT_INFO_QUICK) {
-			return -1;
-		} else {
-			reprepare_packed_git();
-			if (!find_pack_entry(real, &e))
-				return -1;
+		reprepare_packed_git();
+		if (find_pack_entry(real, &e))
+			break;
+
+		/* Check if it is a missing object */
+		if (fetch_if_missing && repository_format_partial_clone &&
+		    !already_retried) {
+			/*
+			 * TODO Investigate haveing fetch_object() return
+			 * TODO error/success and stopping the music here.
+			 */
+			fetch_object(repository_format_partial_clone, real);
+			already_retried = 1;
+			continue;
 		}
+
+		return -1;
 	}
 
 	if (oi == &blank_oi)
@@ -1197,7 +1214,6 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 		 * information below, so return early.
 		 */
 		return 0;
-
 	rtype = packed_object_info(e.p, e.offset, oi);
 	if (rtype < 0) {
 		mark_bad_packed_object(e.p, real);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index e96f436..8a90f6a 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -138,4 +138,55 @@ test_expect_success 'missing CLI object, but promised, passes fsck' '
 	git -C repo fsck "$A"
 '
 
+test_expect_success 'fetching of missing objects' '
+	rm -rf repo &&
+	test_create_repo server &&
+	test_commit -C server foo &&
+	git -C server repack -a -d --write-bitmap-index &&
+
+	git clone "file://$(pwd)/server" repo &&
+	HASH=$(git -C repo rev-parse foo) &&
+	rm -rf repo/.git/objects/* &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "origin" &&
+	git -C repo cat-file -p "$HASH" &&
+
+	# Ensure that the .promisor file is written, and check that its
+	# associated packfile contains the object
+	ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
+	test_line_count = 1 promisorlist &&
+	IDX=$(cat promisorlist | sed "s/promisor$/idx/") &&
+	git verify-pack --verbose "$IDX" | grep "$HASH"
+'
+
+LIB_HTTPD_PORT=12345  # default port, 410, cannot be used as non-root
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'fetching of missing objects from an HTTP server' '
+	rm -rf repo &&
+	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	test_create_repo "$SERVER" &&
+	test_commit -C "$SERVER" foo &&
+	git -C "$SERVER" repack -a -d --write-bitmap-index &&
+
+	git clone $HTTPD_URL/smart/server repo &&
+	HASH=$(git -C repo rev-parse foo) &&
+	rm -rf repo/.git/objects/* &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "origin" &&
+	git -C repo cat-file -p "$HASH" &&
+
+	# Ensure that the .promisor file is written, and check that its
+	# associated packfile contains the object
+	ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
+	test_line_count = 1 promisorlist &&
+	IDX=$(cat promisorlist | sed "s/promisor$/idx/") &&
+	git verify-pack --verbose "$IDX" | grep "$HASH"
+'
+
+stop_httpd
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 02/16] rev-list: support termination at promisor objects
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 01/16] sha1_file: support lazily fetching missing objects Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 03/16] gc: do not repack promisor packfiles Jeff Hostetler
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Teach rev-list to support termination of an object traversal at any
object from a promisor remote (whether one that the local repo also has,
or one that the local repo knows about because it has another promisor
object that references it).

This will be used subsequently in gc and in the connectivity check used
by fetch.

For efficiency, if an object is referenced by a promisor object, and is
in the local repo only as a non-promisor object, object traversal will
not stop there. This is to avoid building the list of promisor object
references.

(In list-objects.c, the case where obj is NULL in process_blob() and
process_tree() do not need to be changed because those happen only when
there is a conflict between the expected type and the existing object.
If the object doesn't exist, an object will be synthesized, which is
fine.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/rev-list-options.txt |  11 ++++
 builtin/rev-list.c                 |  71 +++++++++++++++++++++++---
 list-objects.c                     |  29 ++++++++++-
 object.c                           |   2 +-
 revision.c                         |  33 +++++++++++-
 revision.h                         |   5 +-
 t/t0410-partial-clone.sh           | 101 +++++++++++++++++++++++++++++++++++++
 7 files changed, 240 insertions(+), 12 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 8d8b7f4..0ce8ccd 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -745,10 +745,21 @@ The form '--missing=allow-any' will allow object traversal to continue
 if a missing object is encountered.  Missing objects will silently be
 omitted from the results.
 +
+The form '--missing=allow-promisor' is like 'allow-any', but will only
+allow object traversal to continue for EXPECTED promisor missing objects.
+Unexpected missing objects will raise an error.
++
 The form '--missing=print' is like 'allow-any', but will also print a
 list of the missing objects.  Object IDs are prefixed with a ``?'' character.
 endif::git-rev-list[]
 
+--exclude-promisor-objects::
+	(For internal use only.)  Prefilter object traversal at
+	promisor boundary.  This is used with partial clone.  This is
+	stronger than `--missing=allow-promisor` because it limits the
+	traversal, rather than just silencing errors about missing
+	objects.
+
 --no-walk[=(sorted|unsorted)]::
 	Only show the given commits, but do not traverse their ancestors.
 	This has no effect if a range is specified. If the argument
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 547a3e0..48f922d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -15,6 +15,7 @@
 #include "progress.h"
 #include "reflog-walk.h"
 #include "oidset.h"
+#include "packfile.h"
 
 static const char rev_list_usage[] =
 "git rev-list [OPTION] <commit-id>... [ -- paths... ]\n"
@@ -67,6 +68,7 @@ enum missing_action {
 	MA_ERROR = 0,    /* fail if any missing objects are encountered */
 	MA_ALLOW_ANY,    /* silently allow ALL missing objects */
 	MA_PRINT,        /* print ALL missing objects in special section */
+	MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */
 };
 static enum missing_action arg_missing_action;
 
@@ -197,6 +199,12 @@ static void finish_commit(struct commit *commit, void *data)
 
 static inline void finish_object__ma(struct object *obj)
 {
+	/*
+	 * Whether or not we try to dynamically fetch missing objects
+	 * from the server, we currently DO NOT have the object.  We
+	 * can either print, allow (ignore), or conditionally allow
+	 * (ignore) them.
+	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
 		die("missing blob object '%s'", oid_to_hex(&obj->oid));
@@ -209,25 +217,36 @@ static inline void finish_object__ma(struct object *obj)
 		oidset_insert(&missing_objects, &obj->oid);
 		return;
 
+	case MA_ALLOW_PROMISOR:
+		if (is_promisor_object(&obj->oid))
+			return;
+		die("unexpected missing blob object '%s'",
+		    oid_to_hex(&obj->oid));
+		return;
+
 	default:
 		BUG("unhandled missing_action");
 		return;
 	}
 }
 
-static void finish_object(struct object *obj, const char *name, void *cb_data)
+static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid))
+	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
 		finish_object__ma(obj);
+		return 1;
+	}
 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
 		parse_object(&obj->oid);
+	return 0;
 }
 
 static void show_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	finish_object(obj, name, cb_data);
+	if (finish_object(obj, name, cb_data))
+		return;
 	display_progress(progress, ++progress_counter);
 	if (info->flags & REV_LIST_QUIET)
 		return;
@@ -315,11 +334,19 @@ static inline int parse_missing_action_value(const char *value)
 
 	if (!strcmp(value, "allow-any")) {
 		arg_missing_action = MA_ALLOW_ANY;
+		fetch_if_missing = 0;
 		return 1;
 	}
 
 	if (!strcmp(value, "print")) {
 		arg_missing_action = MA_PRINT;
+		fetch_if_missing = 0;
+		return 1;
+	}
+
+	if (!strcmp(value, "allow-promisor")) {
+		arg_missing_action = MA_ALLOW_PROMISOR;
+		fetch_if_missing = 0;
 		return 1;
 	}
 
@@ -344,6 +371,35 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	init_revisions(&revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
+
+	/*
+	 * Scan the argument list before invoking setup_revisions(), so that we
+	 * know if fetch_if_missing needs to be set to 0.
+	 *
+	 * "--exclude-promisor-objects" acts as a pre-filter on missing objects
+	 * by not crossing the boundary from realized objects to promisor
+	 * objects.
+	 *
+	 * Let "--missing" to conditionally set fetch_if_missing.
+	 */
+	for (i = 1; i < argc; i++) {
+		const char *arg = argv[i];
+		if (!strcmp(arg, "--exclude-promisor-objects")) {
+			fetch_if_missing = 0;
+			revs.exclude_promisor_objects = 1;
+			break;
+		}
+	}
+	for (i = 1; i < argc; i++) {
+		const char *arg = argv[i];
+		if (skip_prefix(arg, "--missing=", &arg)) {
+			if (revs.exclude_promisor_objects)
+				die(_("cannot combine --exclude-promisor-objects and --missing"));
+			if (parse_missing_action_value(arg))
+				break;
+		}
+	}
+
 	argc = setup_revisions(argc, argv, &revs, NULL);
 
 	memset(&info, 0, sizeof(info));
@@ -412,10 +468,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
-		if (skip_prefix(arg, "--missing=", &arg) &&
-		    parse_missing_action_value(arg))
-			continue;
-		
+		if (!strcmp(arg, "--exclude-promisor-objects"))
+			continue; /* already handled above */
+		if (skip_prefix(arg, "--missing=", &arg))
+			continue; /* already handled above */
+
 		usage(rev_list_usage);
 
 	}
diff --git a/list-objects.c b/list-objects.c
index d9e83d0..58621fc 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -9,6 +9,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "packfile.h"
 
 static void process_blob(struct rev_info *revs,
 			 struct blob *blob,
@@ -30,6 +31,20 @@ static void process_blob(struct rev_info *revs,
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
 
+	/*
+	 * Pre-filter known-missing objects when explicitly requested.
+	 * Otherwise, a missing object error message may be reported
+	 * later (depending on other filtering criteria).
+	 *
+	 * Note that this "--exclude-promisor-objects" pre-filtering
+	 * may cause the actual filter to report an incomplete list
+	 * of missing objects.
+	 */
+	if (revs->exclude_promisor_objects &&
+	    !has_object_file(&obj->oid) &&
+	    is_promisor_object(&obj->oid))
+		return;
+
 	pathlen = path->len;
 	strbuf_addstr(path, name);
 	if (filter_fn)
@@ -91,6 +106,8 @@ static void process_tree(struct rev_info *revs,
 		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+	int gently = revs->ignore_missing_links ||
+		     revs->exclude_promisor_objects;
 
 	if (!revs->tree_objects)
 		return;
@@ -98,9 +115,19 @@ static void process_tree(struct rev_info *revs,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, revs->ignore_missing_links) < 0) {
+	if (parse_tree_gently(tree, gently) < 0) {
 		if (revs->ignore_missing_links)
 			return;
+
+		/*
+		 * Pre-filter known-missing tree objects when explicitly
+		 * requested.  This may cause the actual filter to report
+		 * an incomplete list of missing objects.
+		 */
+		if (revs->exclude_promisor_objects &&
+		    is_promisor_object(&obj->oid))
+			return;
+
 		die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
diff --git a/object.c b/object.c
index b9a4a0e..4c222d6 100644
--- a/object.c
+++ b/object.c
@@ -252,7 +252,7 @@ struct object *parse_object(const struct object_id *oid)
 	if (obj && obj->parsed)
 		return obj;
 
-	if ((obj && obj->type == OBJ_BLOB) ||
+	if ((obj && obj->type == OBJ_BLOB && has_object_file(oid)) ||
 	    (!obj && has_object_file(oid) &&
 	     sha1_object_info(oid->hash, NULL) == OBJ_BLOB)) {
 		if (check_sha1_signature(repl, NULL, 0, NULL) < 0) {
diff --git a/revision.c b/revision.c
index d167223..05a7aac 100644
--- a/revision.c
+++ b/revision.c
@@ -198,6 +198,8 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
 	if (!object) {
 		if (revs->ignore_missing)
 			return object;
+		if (revs->exclude_promisor_objects && is_promisor_object(oid))
+			return NULL;
 		die("bad object %s", name);
 	}
 	object->flags |= flags;
@@ -790,9 +792,17 @@ static int add_parents_to_list(struct rev_info *revs, struct commit *commit,
 
 	for (parent = commit->parents; parent; parent = parent->next) {
 		struct commit *p = parent->item;
-
-		if (parse_commit_gently(p, revs->ignore_missing_links) < 0)
+		int gently = revs->ignore_missing_links ||
+			     revs->exclude_promisor_objects;
+		if (parse_commit_gently(p, gently) < 0) {
+			if (revs->exclude_promisor_objects &&
+			    is_promisor_object(&p->object.oid)) {
+				if (revs->first_parent_only)
+					break;
+				continue;
+			}
 			return -1;
+		}
 		if (revs->show_source && !p->util)
 			p->util = commit->util;
 		p->object.flags |= left_flag;
@@ -2088,6 +2098,10 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->limited = 1;
 	} else if (!strcmp(arg, "--ignore-missing")) {
 		revs->ignore_missing = 1;
+	} else if (!strcmp(arg, "--exclude-promisor-objects")) {
+		if (fetch_if_missing)
+			die("BUG: exclude_promisor_objects can only be used when fetch_if_missing is 0");
+		revs->exclude_promisor_objects = 1;
 	} else {
 		int opts = diff_opt_parse(&revs->diffopt, argv, argc, revs->prefix);
 		if (!opts)
@@ -2830,6 +2844,16 @@ void reset_revision_walk(void)
 	clear_object_flags(SEEN | ADDED | SHOWN);
 }
 
+static int mark_uninteresting(const struct object_id *oid,
+			      struct packed_git *pack,
+			      uint32_t pos,
+			      void *unused)
+{
+	struct object *o = parse_object(oid);
+	o->flags |= UNINTERESTING | SEEN;
+	return 0;
+}
+
 int prepare_revision_walk(struct rev_info *revs)
 {
 	int i;
@@ -2858,6 +2882,11 @@ int prepare_revision_walk(struct rev_info *revs)
 	    (revs->limited && limiting_can_increase_treesame(revs)))
 		revs->treesame.name = "treesame";
 
+	if (revs->exclude_promisor_objects) {
+		for_each_packed_object(mark_uninteresting, NULL,
+				       FOR_EACH_OBJECT_PROMISOR_ONLY);
+	}
+
 	if (revs->no_walk != REVISION_WALK_NO_WALK_UNSORTED)
 		commit_list_sort_by_date(&revs->commits);
 	if (revs->no_walk)
diff --git a/revision.h b/revision.h
index 5476120..5f9a49c 100644
--- a/revision.h
+++ b/revision.h
@@ -121,7 +121,10 @@ struct rev_info {
 			bisect:1,
 			ancestry_path:1,
 			first_parent_only:1,
-			line_level_traverse:1;
+			line_level_traverse:1,
+
+			/* for internal use only */
+			exclude_promisor_objects:1;
 
 	/* Diff flags */
 	unsigned int	diff:1,
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 8a90f6a..3ca6af5 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -160,6 +160,107 @@ test_expect_success 'fetching of missing objects' '
 	git verify-pack --verbose "$IDX" | grep "$HASH"
 '
 
+test_expect_success 'rev-list stops traversal at missing and promised commit' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+
+	FOO=$(git -C repo rev-parse foo) &&
+	promise_and_delete "$FOO" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects bar >out &&
+	grep $(git -C repo rev-parse bar) out &&
+	! grep $FOO out
+'
+
+test_expect_success 'rev-list stops traversal at missing and promised tree' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	mkdir repo/a_dir &&
+	echo something >repo/a_dir/something &&
+	git -C repo add a_dir/something &&
+	git -C repo commit -m bar &&
+
+	# foo^{tree} (tree referenced from commit)
+	TREE=$(git -C repo rev-parse foo^{tree}) &&
+
+	# a tree referenced by HEAD^{tree} (tree referenced from tree)
+	TREE2=$(git -C repo ls-tree HEAD^{tree} | grep " tree " | head -1 | cut -b13-52) &&
+
+	promise_and_delete "$TREE" &&
+	promise_and_delete "$TREE2" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >out &&
+	grep $(git -C repo rev-parse foo) out &&
+	! grep $TREE out &&
+	grep $(git -C repo rev-parse HEAD) out &&
+	! grep $TREE2 out
+'
+
+test_expect_success 'rev-list stops traversal at missing and promised blob' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	echo something >repo/something &&
+	git -C repo add something &&
+	git -C repo commit -m foo &&
+
+	BLOB=$(git -C repo hash-object -w something) &&
+	promise_and_delete "$BLOB" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >out &&
+	grep $(git -C repo rev-parse HEAD) out &&
+	! grep $BLOB out
+'
+
+test_expect_success 'rev-list stops traversal at promisor commit, tree, and blob' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+	test_commit -C repo baz &&
+
+	COMMIT=$(git -C repo rev-parse foo) &&
+	TREE=$(git -C repo rev-parse bar^{tree}) &&
+	BLOB=$(git hash-object repo/baz.t) &&
+	printf "%s\n%s\n%s\n" $COMMIT $TREE $BLOB | pack_as_from_promisor &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >out &&
+	! grep $COMMIT out &&
+	! grep $TREE out &&
+	! grep $BLOB out &&
+	grep $(git -C repo rev-parse bar) out  # sanity check that some walking was done
+'
+
+test_expect_success 'rev-list accepts missing and promised objects on command line' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+	test_commit -C repo baz &&
+
+	COMMIT=$(git -C repo rev-parse foo) &&
+	TREE=$(git -C repo rev-parse bar^{tree}) &&
+	BLOB=$(git hash-object repo/baz.t) &&
+
+	promise_and_delete $COMMIT &&
+	promise_and_delete $TREE &&
+	promise_and_delete $BLOB &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects "$COMMIT" "$TREE" "$BLOB"
+'
+
 LIB_HTTPD_PORT=12345  # default port, 410, cannot be used as non-root
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 03/16] gc: do not repack promisor packfiles
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 01/16] sha1_file: support lazily fetching missing objects Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 02/16] rev-list: support termination at promisor objects Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 04/16] upload-pack: add object filtering for partial clone Jeff Hostetler
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Teach gc to stop traversal at promisor objects, and to leave promisor
packfiles alone. This has the effect of only repacking non-promisor
packfiles, and preserves the distinction between promisor packfiles and
non-promisor packfiles.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/git-pack-objects.txt | 11 ++++++++
 builtin/gc.c                       |  3 +++
 builtin/pack-objects.c             | 37 ++++++++++++++++++++++++--
 builtin/prune.c                    |  7 +++++
 builtin/repack.c                   |  8 ++++--
 t/t0410-partial-clone.sh           | 54 ++++++++++++++++++++++++++++++++++++--
 6 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index aa403d0..81bc490 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -255,6 +255,17 @@ a missing object is encountered.  This is the default action.
 The form '--missing=allow-any' will allow object traversal to continue
 if a missing object is encountered.  Missing objects will silently be
 omitted from the results.
++
+The form '--missing=allow-promisor' is like 'allow-any', but will only
+allow object traversal to continue for EXPECTED promisor missing objects.
+Unexpected missing object will raise an error.
+
+--exclude-promisor-objects::
+	Omit objects that are known to be in the promisor remote.  (This
+	option has the purpose of operating only on locally created objects,
+	so that when we repack, we still maintain a distinction between
+	locally created objects [without .promisor] and objects from the
+	promisor remote [with .promisor].)  This is used with partial clone.
 
 SEE ALSO
 --------
diff --git a/builtin/gc.c b/builtin/gc.c
index 3c5eae0..77fa720 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -458,6 +458,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			argv_array_push(&prune, prune_expire);
 			if (quiet)
 				argv_array_push(&prune, "--no-progress");
+			if (repository_format_partial_clone)
+				argv_array_push(&prune,
+						"--exclude-promisor-objects");
 			if (run_command_v_opt(prune.argv, RUN_GIT_CMD))
 				return error(FAILED_RUN, prune.argv[0]);
 		}
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 45ad35d..f5fc401 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -75,6 +75,8 @@ static int use_bitmap_index = -1;
 static int write_bitmap_index;
 static uint16_t write_bitmap_options;
 
+static int exclude_promisor_objects;
+
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
 static unsigned long cache_max_small_delta_size = 1000;
@@ -84,8 +86,9 @@ static unsigned long window_memory_limit = 0;
 static struct list_objects_filter_options filter_options;
 
 enum missing_action {
-	MA_ERROR = 0,    /* fail if any missing objects are encountered */
-	MA_ALLOW_ANY,    /* silently allow ALL missing objects */
+	MA_ERROR = 0,      /* fail if any missing objects are encountered */
+	MA_ALLOW_ANY,      /* silently allow ALL missing objects */
+	MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */
 };
 static enum missing_action arg_missing_action;
 static show_object_fn fn_show_object;
@@ -2577,6 +2580,20 @@ static void show_object__ma_allow_any(struct object *obj, const char *name, void
 	show_object(obj, name, data);
 }
 
+static void show_object__ma_allow_promisor(struct object *obj, const char *name, void *data)
+{
+	assert(arg_missing_action == MA_ALLOW_PROMISOR);
+
+	/*
+	 * Quietly ignore EXPECTED missing objects.  This avoids problems with
+	 * staging them now and getting an odd error later.
+	 */
+	if (!has_object_file(&obj->oid) && is_promisor_object(&obj->oid))
+		return;
+
+	show_object(obj, name, data);
+}
+
 static int option_parse_missing_action(const struct option *opt,
 				       const char *arg, int unset)
 {
@@ -2591,10 +2608,18 @@ static int option_parse_missing_action(const struct option *opt,
 
 	if (!strcmp(arg, "allow-any")) {
 		arg_missing_action = MA_ALLOW_ANY;
+		fetch_if_missing = 0;
 		fn_show_object = show_object__ma_allow_any;
 		return 0;
 	}
 
+	if (!strcmp(arg, "allow-promisor")) {
+		arg_missing_action = MA_ALLOW_PROMISOR;
+		fetch_if_missing = 0;
+		fn_show_object = show_object__ma_allow_promisor;
+		return 0;
+	}
+
 	die(_("invalid value for --missing"));
 	return 0;
 }
@@ -3008,6 +3033,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		{ OPTION_CALLBACK, 0, "missing", NULL, N_("action"),
 		  N_("handling for missing objects"), PARSE_OPT_NONEG,
 		  option_parse_missing_action },
+		OPT_BOOL(0, "exclude-promisor-objects", &exclude_promisor_objects,
+			 N_("do not pack objects in promisor packfiles")),
 		OPT_END(),
 	};
 
@@ -3053,6 +3080,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		argv_array_push(&rp, "--unpacked");
 	}
 
+	if (exclude_promisor_objects) {
+		use_internal_rev_list = 1;
+		fetch_if_missing = 0;
+		argv_array_push(&rp, "--exclude-promisor-objects");
+	}
+
 	if (!reuse_object)
 		reuse_delta = 0;
 	if (pack_compression_level == -1)
diff --git a/builtin/prune.c b/builtin/prune.c
index cddabf2..be34645 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -101,12 +101,15 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
 	struct progress *progress = NULL;
+	int exclude_promisor_objects = 0;
 	const struct option options[] = {
 		OPT__DRY_RUN(&show_only, N_("do not remove, show only")),
 		OPT__VERBOSE(&verbose, N_("report pruned objects")),
 		OPT_BOOL(0, "progress", &show_progress, N_("show progress")),
 		OPT_EXPIRY_DATE(0, "expire", &expire,
 				N_("expire objects older than <time>")),
+		OPT_BOOL(0, "exclude-promisor-objects", &exclude_promisor_objects,
+			 N_("limit traversal to objects outside promisor packfiles")),
 		OPT_END()
 	};
 	char *s;
@@ -139,6 +142,10 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 		show_progress = isatty(2);
 	if (show_progress)
 		progress = start_delayed_progress(_("Checking connectivity"), 0);
+	if (exclude_promisor_objects) {
+		fetch_if_missing = 0;
+		revs.exclude_promisor_objects = 1;
+	}
 
 	mark_reachable_objects(&revs, 1, expire, progress);
 	stop_progress(&progress);
diff --git a/builtin/repack.c b/builtin/repack.c
index f17a68a..7bdb401 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -83,7 +83,8 @@ static void remove_pack_on_signal(int signo)
 
 /*
  * Adds all packs hex strings to the fname list, which do not
- * have a corresponding .keep file.
+ * have a corresponding .keep or .promisor file. These packs are not to
+ * be kept if we are going to pack everything into one file.
  */
 static void get_non_kept_pack_filenames(struct string_list *fname_list)
 {
@@ -101,7 +102,8 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list)
 
 		fname = xmemdupz(e->d_name, len);
 
-		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
+		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)) &&
+		    !file_exists(mkpath("%s/%s.promisor", packdir, fname)))
 			string_list_append_nodup(fname_list, fname);
 		else
 			free(fname);
@@ -232,6 +234,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argv_array_push(&cmd.args, "--all");
 	argv_array_push(&cmd.args, "--reflog");
 	argv_array_push(&cmd.args, "--indexed-objects");
+	if (repository_format_partial_clone)
+		argv_array_push(&cmd.args, "--exclude-promisor-objects");
 	if (window)
 		argv_array_pushf(&cmd.args, "--window=%s", window);
 	if (window_memory)
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 3ca6af5..cc18b75 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -10,14 +10,16 @@ delete_object () {
 
 pack_as_from_promisor () {
 	HASH=$(git -C repo pack-objects .git/objects/pack/pack) &&
-	>repo/.git/objects/pack/pack-$HASH.promisor
+	>repo/.git/objects/pack/pack-$HASH.promisor &&
+	echo $HASH
 }
 
 promise_and_delete () {
 	HASH=$(git -C repo rev-parse "$1") &&
 	git -C repo tag -a -m message my_annotated_tag "$HASH" &&
 	git -C repo rev-parse my_annotated_tag | pack_as_from_promisor &&
-	git -C repo tag -d my_annotated_tag &&
+	# tag -d prints a message to stdout, so redirect it
+	git -C repo tag -d my_annotated_tag >/dev/null &&
 	delete_object repo "$HASH"
 }
 
@@ -261,6 +263,54 @@ test_expect_success 'rev-list accepts missing and promised objects on command li
 	git -C repo rev-list --exclude-promisor-objects --objects "$COMMIT" "$TREE" "$BLOB"
 '
 
+test_expect_success 'gc does not repack promisor objects' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	TREE_HASH=$(git -C repo rev-parse HEAD^{tree}) &&
+	HASH=$(printf "$TREE_HASH\n" | pack_as_from_promisor) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo gc &&
+
+	# Ensure that the promisor packfile still exists, and remove it
+	test -e repo/.git/objects/pack/pack-$HASH.pack &&
+	rm repo/.git/objects/pack/pack-$HASH.* &&
+
+	# Ensure that the single other pack contains the commit, but not the tree
+	ls repo/.git/objects/pack/pack-*.pack >packlist &&
+	test_line_count = 1 packlist &&
+	git verify-pack repo/.git/objects/pack/pack-*.pack -v >out &&
+	grep "$(git -C repo rev-parse HEAD)" out &&
+	! grep "$TREE_HASH" out
+'
+
+test_expect_success 'gc stops traversal when a missing but promised object is reached' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	TREE_HASH=$(git -C repo rev-parse HEAD^{tree}) &&
+	HASH=$(promise_and_delete $TREE_HASH) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo gc &&
+
+	# Ensure that the promisor packfile still exists, and remove it
+	test -e repo/.git/objects/pack/pack-$HASH.pack &&
+	rm repo/.git/objects/pack/pack-$HASH.* &&
+
+	# Ensure that the single other pack contains the commit, but not the tree
+	ls repo/.git/objects/pack/pack-*.pack >packlist &&
+	test_line_count = 1 packlist &&
+	git verify-pack repo/.git/objects/pack/pack-*.pack -v >out &&
+	grep "$(git -C repo rev-parse HEAD)" out &&
+	! grep "$TREE_HASH" out
+'
+
 LIB_HTTPD_PORT=12345  # default port, 410, cannot be used as non-root
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 04/16] upload-pack: add object filtering for partial clone
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (2 preceding siblings ...)
  2017-12-08 15:58 ` [PATCH v7 03/16] gc: do not repack promisor packfiles Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 05/16] fetch-pack, index-pack, transport: " Jeff Hostetler
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach upload-pack to negotiate object filtering over the protocol and
to send filter parameters to pack-objects.  This is intended for partial
clone and fetch.

The idea to make upload-pack configurable using uploadpack.allowFilter
comes from Jonathan Tan's work in [1].

[1] https://public-inbox.org/git/f211093280b422c32cc1b7034130072f35c5ed51.1506714999.git.jonathantanmy@google.com/

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/config.txt                          |  4 ++++
 Documentation/technical/pack-protocol.txt         |  8 +++++++
 Documentation/technical/protocol-capabilities.txt |  8 +++++++
 upload-pack.c                                     | 26 ++++++++++++++++++++++-
 4 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1ac0ae6..e528210 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3268,6 +3268,10 @@ uploadpack.packObjectsHook::
 	was run. I.e., `upload-pack` will feed input intended for
 	`pack-objects` to the hook, and expects a completed packfile on
 	stdout.
+
+uploadpack.allowFilter::
+	If this option is set, `upload-pack` will advertise partial
+	clone and partial fetch object filtering.
 +
 Note that this configuration variable is ignored if it is seen in the
 repository-level config (this is a safety measure against fetching from
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index ed1eae8..a43a113 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -212,6 +212,7 @@ out of what the server said it could do with the first 'want' line.
   upload-request    =  want-list
 		       *shallow-line
 		       *1depth-request
+		       [filter-request]
 		       flush-pkt
 
   want-list         =  first-want
@@ -227,6 +228,8 @@ out of what the server said it could do with the first 'want' line.
   additional-want   =  PKT-LINE("want" SP obj-id)
 
   depth             =  1*DIGIT
+
+  filter-request    =  PKT-LINE("filter" SP filter-spec)
 ----
 
 Clients MUST send all the obj-ids it wants from the reference
@@ -249,6 +252,11 @@ complete those commits. Commits whose parents are not received as a
 result are defined as shallow and marked as such in the server. This
 information is sent back to the client in the next step.
 
+The client can optionally request that pack-objects omit various
+objects from the packfile using one of several filtering techniques.
+These are intended for use with partial clone and partial fetch
+operations.  See `rev-list` for possible "filter-spec" values.
+
 Once all the 'want's and 'shallow's (and optional 'deepen') are
 transferred, clients MUST send a flush-pkt, to tell the server side
 that it is done sending the list.
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 26dcc6f..332d209 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -309,3 +309,11 @@ to accept a signed push certificate, and asks the <nonce> to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+filter
+------
+
+If the upload-pack server advertises the 'filter' capability,
+fetch-pack may send "filter" commands to request a partial clone
+or partial fetch and request that the server omit various objects
+from the packfile.
diff --git a/upload-pack.c b/upload-pack.c
index e25f725..e6d38b9 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -10,6 +10,8 @@
 #include "diff.h"
 #include "revision.h"
 #include "list-objects.h"
+#include "list-objects-filter.h"
+#include "list-objects-filter-options.h"
 #include "run-command.h"
 #include "connect.h"
 #include "sigchain.h"
@@ -18,6 +20,7 @@
 #include "parse-options.h"
 #include "argv-array.h"
 #include "prio-queue.h"
+#include "quote.h"
 
 static const char * const upload_pack_usage[] = {
 	N_("git upload-pack [<options>] <dir>"),
@@ -64,6 +67,10 @@ static int advertise_refs;
 static int stateless_rpc;
 static const char *pack_objects_hook;
 
+static int filter_capability_requested;
+static int filter_advertise;
+static struct list_objects_filter_options filter_options;
+
 static void reset_timeout(void)
 {
 	alarm(timeout);
@@ -131,6 +138,12 @@ static void create_pack_file(void)
 		argv_array_push(&pack_objects.args, "--delta-base-offset");
 	if (use_include_tag)
 		argv_array_push(&pack_objects.args, "--include-tag");
+	if (filter_options.filter_spec) {
+		struct strbuf buf = STRBUF_INIT;
+		sq_quote_buf(&buf, filter_options.filter_spec);
+		argv_array_pushf(&pack_objects.args, "--filter=%s", buf.buf);
+		strbuf_release(&buf);
+	}
 
 	pack_objects.in = -1;
 	pack_objects.out = -1;
@@ -794,6 +807,12 @@ static void receive_needs(void)
 			deepen_rev_list = 1;
 			continue;
 		}
+		if (skip_prefix(line, "filter ", &arg)) {
+			if (!filter_capability_requested)
+				die("git upload-pack: filtering capability not negotiated");
+			parse_list_objects_filter(&filter_options, arg);
+			continue;
+		}
 		if (!skip_prefix(line, "want ", &arg) ||
 		    get_oid_hex(arg, &oid_buf))
 			die("git upload-pack: protocol error, "
@@ -821,6 +840,8 @@ static void receive_needs(void)
 			no_progress = 1;
 		if (parse_feature_request(features, "include-tag"))
 			use_include_tag = 1;
+		if (parse_feature_request(features, "filter"))
+			filter_capability_requested = 1;
 
 		o = parse_object(&oid_buf);
 		if (!o) {
@@ -940,7 +961,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
 		struct strbuf symref_info = STRBUF_INIT;
 
 		format_symref_info(&symref_info, cb_data);
-		packet_write_fmt(1, "%s %s%c%s%s%s%s%s agent=%s\n",
+		packet_write_fmt(1, "%s %s%c%s%s%s%s%s%s agent=%s\n",
 			     oid_to_hex(oid), refname_nons,
 			     0, capabilities,
 			     (allow_unadvertised_object_request & ALLOW_TIP_SHA1) ?
@@ -949,6 +970,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
 				     " allow-reachable-sha1-in-want" : "",
 			     stateless_rpc ? " no-done" : "",
 			     symref_info.buf,
+			     filter_advertise ? " filter" : "",
 			     git_user_agent_sanitized());
 		strbuf_release(&symref_info);
 	} else {
@@ -1027,6 +1049,8 @@ static int upload_pack_config(const char *var, const char *value, void *unused)
 	} else if (current_config_scope() != CONFIG_SCOPE_REPO) {
 		if (!strcmp("uploadpack.packobjectshook", var))
 			return git_config_string(&pack_objects_hook, var, value);
+	} else if (!strcmp("uploadpack.allowfilter", var)) {
+		filter_advertise = git_config_bool(var, value);
 	}
 	return parse_hide_refs_config(var, value, "uploadpack");
 }
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 05/16] fetch-pack, index-pack, transport: partial clone
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (3 preceding siblings ...)
  2017-12-08 15:58 ` [PATCH v7 04/16] upload-pack: add object filtering for partial clone Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 06/16] fetch-pack: add --no-filter Jeff Hostetler
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fetch-pack.c |  4 ++++
 fetch-pack.c         | 13 +++++++++++++
 fetch-pack.h         |  2 ++
 transport-helper.c   |  5 +++++
 transport.c          |  4 ++++
 transport.h          |  5 +++++
 6 files changed, 33 insertions(+)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 15eeed7..7957807 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -153,6 +153,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			args.no_dependents = 1;
 			continue;
 		}
+		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
+			parse_list_objects_filter(&args.filter_options, arg);
+			continue;
+		}
 		usage(fetch_pack_usage);
 	}
 	if (deepen_not.nr)
diff --git a/fetch-pack.c b/fetch-pack.c
index 0798e0b..3c5f064 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -29,6 +29,7 @@ static int deepen_not_ok;
 static int fetch_fsck_objects = -1;
 static int transfer_fsck_objects = -1;
 static int agent_supported;
+static int server_supports_filtering;
 static struct lock_file shallow_lock;
 static const char *alternate_shallow_file;
 
@@ -379,6 +380,8 @@ static int find_common(struct fetch_pack_args *args,
 			if (deepen_not_ok)      strbuf_addstr(&c, " deepen-not");
 			if (agent_supported)    strbuf_addf(&c, " agent=%s",
 							    git_user_agent_sanitized());
+			if (args->filter_options.choice)
+				strbuf_addstr(&c, " filter");
 			packet_buf_write(&req_buf, "want %s%s\n", remote_hex, c.buf);
 			strbuf_release(&c);
 		} else
@@ -407,6 +410,9 @@ static int find_common(struct fetch_pack_args *args,
 			packet_buf_write(&req_buf, "deepen-not %s", s->string);
 		}
 	}
+	if (server_supports_filtering && args->filter_options.choice)
+		packet_buf_write(&req_buf, "filter %s",
+				 args->filter_options.filter_spec);
 	packet_buf_flush(&req_buf);
 	state_len = req_buf.len;
 
@@ -969,6 +975,13 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	else
 		prefer_ofs_delta = 0;
 
+	if (server_supports("filter")) {
+		server_supports_filtering = 1;
+		print_verbose(args, _("Server supports filter"));
+	} else if (args->filter_options.choice) {
+		warning("filtering not recognized by server, ignoring");
+	}
+
 	if ((agent_feature = server_feature_value("agent", &agent_len))) {
 		agent_supported = 1;
 		if (agent_len)
diff --git a/fetch-pack.h b/fetch-pack.h
index aeac152..3e224a1 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -3,6 +3,7 @@
 
 #include "string-list.h"
 #include "run-command.h"
+#include "list-objects-filter-options.h"
 
 struct oid_array;
 
@@ -12,6 +13,7 @@ struct fetch_pack_args {
 	int depth;
 	const char *deepen_since;
 	const struct string_list *deepen_not;
+	struct list_objects_filter_options filter_options;
 	unsigned deepen_relative:1;
 	unsigned quiet:1;
 	unsigned keep_pack:1;
diff --git a/transport-helper.c b/transport-helper.c
index c948d52..0650ca0 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -671,6 +671,11 @@ static int fetch(struct transport *transport,
 	if (data->transport_options.update_shallow)
 		set_helper_option(transport, "update-shallow", "true");
 
+	if (data->transport_options.filter_options.choice)
+		set_helper_option(
+			transport, "filter",
+			data->transport_options.filter_options.filter_spec);
+
 	if (data->fetch)
 		return fetch_with_fetch(transport, nr_heads, to_fetch);
 
diff --git a/transport.c b/transport.c
index f2fbc6f..cce50f5 100644
--- a/transport.c
+++ b/transport.c
@@ -166,6 +166,9 @@ static int set_git_option(struct git_transport_options *opts,
 	} else if (!strcmp(name, TRANS_OPT_NO_DEPENDENTS)) {
 		opts->no_dependents = !!value;
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_LIST_OBJECTS_FILTER)) {
+		parse_list_objects_filter(&opts->filter_options, value);
+		return 0;
 	}
 	return 1;
 }
@@ -236,6 +239,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.update_shallow = data->options.update_shallow;
 	args.from_promisor = data->options.from_promisor;
 	args.no_dependents = data->options.no_dependents;
+	args.filter_options = data->options.filter_options;
 
 	if (!data->got_remote_heads) {
 		connect_setup(transport, 0);
diff --git a/transport.h b/transport.h
index c49a8ed..31b1936 100644
--- a/transport.h
+++ b/transport.h
@@ -4,6 +4,7 @@
 #include "cache.h"
 #include "run-command.h"
 #include "remote.h"
+#include "list-objects-filter-options.h"
 
 struct string_list;
 
@@ -23,6 +24,7 @@ struct git_transport_options {
 	const char *uploadpack;
 	const char *receivepack;
 	struct push_cas_option *cas;
+	struct list_objects_filter_options filter_options;
 };
 
 enum transport_family {
@@ -221,6 +223,9 @@ void transport_check_allowed(const char *type);
  */
 #define TRANS_OPT_NO_DEPENDENTS "no-dependents"
 
+/* Filter objects for partial clone and fetch */
+#define TRANS_OPT_LIST_OBJECTS_FILTER "filter"
+
 /**
  * Returns 0 if the option was used, non-zero otherwise. Prints a
  * message to stderr if the option is not used.
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 06/16] fetch-pack: add --no-filter
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (4 preceding siblings ...)
  2017-12-08 15:58 ` [PATCH v7 05/16] fetch-pack, index-pack, transport: " Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 07/16] fetch-pack: test support excluding large blobs Jeff Hostetler
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Fixup fetch-pack to accept --no-filter to be consistent with
rev-list and pack-objects.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fetch-pack.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 7957807..cbf5035 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -157,6 +157,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			parse_list_objects_filter(&args.filter_options, arg);
 			continue;
 		}
+		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
+			list_objects_filter_release(&args.filter_options);
+			continue;
+		}
 		usage(fetch_pack_usage);
 	}
 	if (deepen_not.nr)
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 07/16] fetch-pack: test support excluding large blobs
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (5 preceding siblings ...)
  2017-12-08 15:58 ` [PATCH v7 06/16] fetch-pack: add --no-filter Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 08/16] fetch: refactor calculation of remote list Jeff Hostetler
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Created tests to verify fetch-pack and upload-pack support
for excluding large blobs using --filter=blobs:limit=<n>
parameter.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t5500-fetch-pack.sh | 27 +++++++++++++++++++++++++++
 upload-pack.c         | 13 +++++++++----
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 80a1a32..c57916b 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -755,4 +755,31 @@ test_expect_success 'fetching deepen' '
 	)
 '
 
+test_expect_success 'filtering by size' '
+	rm -rf server client &&
+	test_create_repo server &&
+	test_commit -C server one &&
+	test_config -C server uploadpack.allowfilter 1 &&
+
+	test_create_repo client &&
+	git -C client fetch-pack --filter=blob:limit=0 ../server HEAD &&
+
+	# Ensure that object is not inadvertently fetched
+	test_must_fail git -C client cat-file -e $(git hash-object server/one.t)
+'
+
+test_expect_success 'filtering by size has no effect if support for it is not advertised' '
+	rm -rf server client &&
+	test_create_repo server &&
+	test_commit -C server one &&
+
+	test_create_repo client &&
+	git -C client fetch-pack --filter=blob:limit=0 ../server HEAD 2> err &&
+
+	# Ensure that object is fetched
+	git -C client cat-file -e $(git hash-object server/one.t) &&
+
+	test_i18ngrep "filtering not recognized by server" err
+'
+
 test_done
diff --git a/upload-pack.c b/upload-pack.c
index e6d38b9..15b6605 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -139,10 +139,15 @@ static void create_pack_file(void)
 	if (use_include_tag)
 		argv_array_push(&pack_objects.args, "--include-tag");
 	if (filter_options.filter_spec) {
-		struct strbuf buf = STRBUF_INIT;
-		sq_quote_buf(&buf, filter_options.filter_spec);
-		argv_array_pushf(&pack_objects.args, "--filter=%s", buf.buf);
-		strbuf_release(&buf);
+		if (pack_objects.use_shell) {
+			struct strbuf buf = STRBUF_INIT;
+			sq_quote_buf(&buf, filter_options.filter_spec);
+			argv_array_pushf(&pack_objects.args, "--filter=%s", buf.buf);
+			strbuf_release(&buf);
+		} else {
+			argv_array_pushf(&pack_objects.args, "--filter=%s",
+					 filter_options.filter_spec);
+		}
 	}
 
 	pack_objects.in = -1;
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 08/16] fetch: refactor calculation of remote list
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (6 preceding siblings ...)
  2017-12-08 15:58 ` [PATCH v7 07/16] fetch-pack: test support excluding large blobs Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 09/16] fetch: support filters Jeff Hostetler
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy

From: Jonathan Tan <jonathantanmy@google.com>

Separate out the calculation of remotes to be fetched from and the
actual fetching. This will allow us to include an additional step before
the actual fetching in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/fetch.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 225c734..1b1f039 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1322,7 +1322,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 {
 	int i;
 	struct string_list list = STRING_LIST_INIT_DUP;
-	struct remote *remote;
+	struct remote *remote = NULL;
 	int result = 0;
 	struct argv_array argv_gc_auto = ARGV_ARRAY_INIT;
 
@@ -1367,17 +1367,14 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		else if (argc > 1)
 			die(_("fetch --all does not make sense with refspecs"));
 		(void) for_each_remote(get_one_remote_for_fetch, &list);
-		result = fetch_multiple(&list);
 	} else if (argc == 0) {
 		/* No arguments -- use default remote */
 		remote = remote_get(NULL);
-		result = fetch_one(remote, argc, argv);
 	} else if (multiple) {
 		/* All arguments are assumed to be remotes or groups */
 		for (i = 0; i < argc; i++)
 			if (!add_remote_or_group(argv[i], &list))
 				die(_("No such remote or remote group: %s"), argv[i]);
-		result = fetch_multiple(&list);
 	} else {
 		/* Single remote or group */
 		(void) add_remote_or_group(argv[0], &list);
@@ -1385,14 +1382,19 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 			/* More than one remote */
 			if (argc > 1)
 				die(_("Fetching a group and specifying refspecs does not make sense"));
-			result = fetch_multiple(&list);
 		} else {
 			/* Zero or one remotes */
 			remote = remote_get(argv[0]);
-			result = fetch_one(remote, argc-1, argv+1);
+			argc--;
+			argv++;
 		}
 	}
 
+	if (remote)
+		result = fetch_one(remote, argc, argv);
+	else
+		result = fetch_multiple(&list);
+
 	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
 		struct argv_array options = ARGV_ARRAY_INIT;
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 09/16] fetch: support filters
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (7 preceding siblings ...)
  2017-12-08 15:58 ` [PATCH v7 08/16] fetch: refactor calculation of remote list Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2018-08-19 11:24   ` Duy Nguyen
  2017-12-08 15:58 ` [PATCH v7 10/16] partial-clone: define partial clone settings in config Jeff Hostetler
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach fetch to support filters. This is only allowed for the remote
configured in extensions.partialcloneremote.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/fetch.c       | 23 +++++++++++++++++++++--
 connected.c           |  2 ++
 remote-curl.c         |  6 ++++++
 t/t5500-fetch-pack.sh | 36 ++++++++++++++++++++++++++++++++++++
 4 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 1b1f039..14aab71 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -18,6 +18,7 @@
 #include "argv-array.h"
 #include "utf8.h"
 #include "packfile.h"
+#include "list-objects-filter-options.h"
 
 static const char * const builtin_fetch_usage[] = {
 	N_("git fetch [<options>] [<repository> [<refspec>...]]"),
@@ -55,6 +56,7 @@ static int recurse_submodules_default = RECURSE_SUBMODULES_ON_DEMAND;
 static int shown_url = 0;
 static int refmap_alloc, refmap_nr;
 static const char **refmap_array;
+static struct list_objects_filter_options filter_options;
 
 static int git_fetch_config(const char *k, const char *v, void *cb)
 {
@@ -160,6 +162,7 @@ static struct option builtin_fetch_options[] = {
 			TRANSPORT_FAMILY_IPV4),
 	OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
 			TRANSPORT_FAMILY_IPV6),
+	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
 	OPT_END()
 };
 
@@ -1044,6 +1047,11 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
 		set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes");
 	if (update_shallow)
 		set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
+	if (filter_options.choice) {
+		set_option(transport, TRANS_OPT_LIST_OBJECTS_FILTER,
+			   filter_options.filter_spec);
+		set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
+	}
 	return transport;
 }
 
@@ -1328,6 +1336,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 	packet_trace_identity("fetch");
 
+	fetch_if_missing = 0;
+
 	/* Record the command line for the reflog */
 	strbuf_addstr(&default_rla, "fetch");
 	for (i = 1; i < argc; i++)
@@ -1361,6 +1371,9 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	if (depth || deepen_since || deepen_not.nr)
 		deepen = 1;
 
+	if (filter_options.choice && !repository_format_partial_clone)
+		die("--filter can only be used when extensions.partialClone is set");
+
 	if (all) {
 		if (argc == 1)
 			die(_("fetch --all does not take a repository argument"));
@@ -1390,10 +1403,16 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	if (remote)
+	if (remote) {
+		if (filter_options.choice &&
+		    strcmp(remote->name, repository_format_partial_clone))
+			die(_("--filter can only be used with the remote configured in core.partialClone"));
 		result = fetch_one(remote, argc, argv);
-	else
+	} else {
+		if (filter_options.choice)
+			die(_("--filter can only be used with the remote configured in core.partialClone"));
 		result = fetch_multiple(&list);
+	}
 
 	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
 		struct argv_array options = ARGV_ARRAY_INIT;
diff --git a/connected.c b/connected.c
index f416b05..3a5bd67 100644
--- a/connected.c
+++ b/connected.c
@@ -56,6 +56,8 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
 	argv_array_push(&rev_list.args,"rev-list");
 	argv_array_push(&rev_list.args, "--objects");
 	argv_array_push(&rev_list.args, "--stdin");
+	if (repository_format_partial_clone)
+		argv_array_push(&rev_list.args, "--exclude-promisor-objects");
 	argv_array_push(&rev_list.args, "--not");
 	argv_array_push(&rev_list.args, "--all");
 	argv_array_push(&rev_list.args, "--quiet");
diff --git a/remote-curl.c b/remote-curl.c
index 4318391..6ec5352 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -24,6 +24,7 @@ struct options {
 	char *deepen_since;
 	struct string_list deepen_not;
 	struct string_list push_options;
+	char *filter;
 	unsigned progress : 1,
 		check_self_contained_and_connected : 1,
 		cloning : 1,
@@ -165,6 +166,9 @@ static int set_option(const char *name, const char *value)
 	} else if (!strcmp(name, "no-dependents")) {
 		options.no_dependents = 1;
 		return 0;
+	} else if (!strcmp(name, "filter")) {
+		options.filter = xstrdup(value);;
+		return 0;
 	} else {
 		return 1 /* unsupported */;
 	}
@@ -834,6 +838,8 @@ static int fetch_git(struct discovery *heads,
 		argv_array_push(&args, "--from-promisor");
 	if (options.no_dependents)
 		argv_array_push(&args, "--no-dependents");
+	if (options.filter)
+		argv_array_pushf(&args, "--filter=%s", options.filter);
 	argv_array_push(&args, url.buf);
 
 	for (i = 0; i < nr_heads; i++) {
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index c57916b..ec9ba9b 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -782,4 +782,40 @@ test_expect_success 'filtering by size has no effect if support for it is not ad
 	test_i18ngrep "filtering not recognized by server" err
 '
 
+fetch_filter_blob_limit_zero () {
+	SERVER="$1"
+	URL="$2"
+
+	rm -rf "$SERVER" client &&
+	test_create_repo "$SERVER" &&
+	test_commit -C "$SERVER" one &&
+	test_config -C "$SERVER" uploadpack.allowfilter 1 &&
+
+	git clone "$URL" client &&
+	test_config -C client extensions.partialclone origin &&
+
+	test_commit -C "$SERVER" two &&
+
+	git -C client fetch --filter=blob:limit=0 origin HEAD:somewhere &&
+
+	# Ensure that commit is fetched, but blob is not
+	test_config -C client extensions.partialclone "arbitrary string" &&
+	git -C client cat-file -e $(git -C "$SERVER" rev-parse two) &&
+	test_must_fail git -C client cat-file -e $(git hash-object "$SERVER/two.t")
+}
+
+test_expect_success 'fetch with --filter=blob:limit=0' '
+	fetch_filter_blob_limit_zero server server
+'
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'fetch with --filter=blob:limit=0 and HTTP' '
+	fetch_filter_blob_limit_zero "$HTTPD_DOCUMENT_ROOT_PATH/server" "$HTTPD_URL/smart/server"
+'
+
+stop_httpd
+
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 10/16] partial-clone: define partial clone settings in config
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (8 preceding siblings ...)
  2017-12-08 15:58 ` [PATCH v7 09/16] fetch: support filters Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 11/16] clone: partial clone Jeff Hostetler
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Create get and set routines for "partial clone" config settings.
These will be used in a future commit by clone and fetch to
remember the promisor remote and the default filter-spec.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 cache.h                       |  1 +
 config.c                      |  5 +++
 environment.c                 |  1 +
 list-objects-filter-options.c | 90 +++++++++++++++++++++++++++++++++++--------
 list-objects-filter-options.h |  6 +++
 5 files changed, 88 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index 6980072..bccc510 100644
--- a/cache.h
+++ b/cache.h
@@ -861,6 +861,7 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
 extern char *repository_format_partial_clone;
+extern const char *core_partial_clone_filter_default;
 
 struct repository_format {
 	int version;
diff --git a/config.c b/config.c
index adb7d7a..adeee04 100644
--- a/config.c
+++ b/config.c
@@ -1241,6 +1241,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.partialclonefilter")) {
+		return git_config_string(&core_partial_clone_filter_default,
+					 var, value);
+	}
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
 }
diff --git a/environment.c b/environment.c
index e52aab3..7537565 100644
--- a/environment.c
+++ b/environment.c
@@ -28,6 +28,7 @@ int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
 char *repository_format_partial_clone;
+const char *core_partial_clone_filter_default;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
 const char *apply_default_whitespace;
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 4c5b34e..5c47e2b 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -21,29 +21,36 @@
  * subordinate commands when necessary.  We also "intern" the arg for
  * the convenience of the current command.
  */
-int parse_list_objects_filter(struct list_objects_filter_options *filter_options,
-			      const char *arg)
+static int gently_parse_list_objects_filter(
+	struct list_objects_filter_options *filter_options,
+	const char *arg,
+	struct strbuf *errbuf)
 {
 	const char *v0;
 
-	if (filter_options->choice)
-		die(_("multiple object filter types cannot be combined"));
+	if (filter_options->choice) {
+		if (errbuf) {
+			strbuf_init(errbuf, 0);
+			strbuf_addstr(
+				errbuf,
+				_("multiple filter-specs cannot be combined"));
+		}
+		return 1;
+	}
 
 	filter_options->filter_spec = strdup(arg);
 
 	if (!strcmp(arg, "blob:none")) {
 		filter_options->choice = LOFC_BLOB_NONE;
 		return 0;
-	}
 
-	if (skip_prefix(arg, "blob:limit=", &v0)) {
-		if (!git_parse_ulong(v0, &filter_options->blob_limit_value))
-			die(_("invalid filter-spec expression '%s'"), arg);
-		filter_options->choice = LOFC_BLOB_LIMIT;
-		return 0;
-	}
+	} else if (skip_prefix(arg, "blob:limit=", &v0)) {
+		if (git_parse_ulong(v0, &filter_options->blob_limit_value)) {
+			filter_options->choice = LOFC_BLOB_LIMIT;
+			return 0;
+		}
 
-	if (skip_prefix(arg, "sparse:oid=", &v0)) {
+	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
 		struct object_context oc;
 		struct object_id sparse_oid;
 
@@ -57,15 +64,27 @@ int parse_list_objects_filter(struct list_objects_filter_options *filter_options
 			filter_options->sparse_oid_value = oiddup(&sparse_oid);
 		filter_options->choice = LOFC_SPARSE_OID;
 		return 0;
-	}
 
-	if (skip_prefix(arg, "sparse:path=", &v0)) {
+	} else if (skip_prefix(arg, "sparse:path=", &v0)) {
 		filter_options->choice = LOFC_SPARSE_PATH;
 		filter_options->sparse_path_value = strdup(v0);
 		return 0;
 	}
 
-	die(_("invalid filter-spec expression '%s'"), arg);
+	if (errbuf) {
+		strbuf_init(errbuf, 0);
+		strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
+	}
+	memset(filter_options, 0, sizeof(*filter_options));
+	return 1;
+}
+
+int parse_list_objects_filter(struct list_objects_filter_options *filter_options,
+			      const char *arg)
+{
+	struct strbuf buf = STRBUF_INIT;
+	if (gently_parse_list_objects_filter(filter_options, arg, &buf))
+		die("%s", buf.buf);
 	return 0;
 }
 
@@ -90,3 +109,44 @@ void list_objects_filter_release(
 	free(filter_options->sparse_path_value);
 	memset(filter_options, 0, sizeof(*filter_options));
 }
+
+void partial_clone_register(
+	const char *remote,
+	const struct list_objects_filter_options *filter_options)
+{
+	/*
+	 * Record the name of the partial clone remote in the
+	 * config and in the global variable -- the latter is
+	 * used throughout to indicate that partial clone is
+	 * enabled and to expect missing objects.
+	 */
+	if (repository_format_partial_clone &&
+	    *repository_format_partial_clone &&
+	    strcmp(remote, repository_format_partial_clone))
+		die(_("cannot change partial clone promisor remote"));
+
+	git_config_set("core.repositoryformatversion", "1");
+	git_config_set("extensions.partialclone", remote);
+
+	repository_format_partial_clone = xstrdup(remote);
+
+	/*
+	 * Record the initial filter-spec in the config as
+	 * the default for subsequent fetches from this remote.
+	 */
+	core_partial_clone_filter_default =
+		xstrdup(filter_options->filter_spec);
+	git_config_set("core.partialclonefilter",
+		       core_partial_clone_filter_default);
+}
+
+void partial_clone_get_default_filter_spec(
+	struct list_objects_filter_options *filter_options)
+{
+	/*
+	 * Parse default value, but silently ignore it if it is invalid.
+	 */
+	gently_parse_list_objects_filter(filter_options,
+					 core_partial_clone_filter_default,
+					 NULL);
+}
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index eea44a1..1143539 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -58,4 +58,10 @@ int opt_parse_list_objects_filter(const struct option *opt,
 void list_objects_filter_release(
 	struct list_objects_filter_options *filter_options);
 
+void partial_clone_register(
+	const char *remote,
+	const struct list_objects_filter_options *filter_options);
+void partial_clone_get_default_filter_spec(
+	struct list_objects_filter_options *filter_options);
+
 #endif /* LIST_OBJECTS_FILTER_OPTIONS_H */
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 11/16] clone: partial clone
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (9 preceding siblings ...)
  2017-12-08 15:58 ` [PATCH v7 10/16] partial-clone: define partial clone settings in config Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 12/16] unpack-trees: batch fetching of missing blobs Jeff Hostetler
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/clone.c  | 22 ++++++++++++++++++++--
 t/t5601-clone.sh | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index dbddd98..f519bd4 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -26,6 +26,7 @@
 #include "run-command.h"
 #include "connected.h"
 #include "packfile.h"
+#include "list-objects-filter-options.h"
 
 /*
  * Overall FIXMEs:
@@ -60,6 +61,7 @@ static struct string_list option_optional_reference = STRING_LIST_INIT_NODUP;
 static int option_dissociate;
 static int max_jobs = -1;
 static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
+static struct list_objects_filter_options filter_options;
 
 static int recurse_submodules_cb(const struct option *opt,
 				 const char *arg, int unset)
@@ -135,6 +137,7 @@ static struct option builtin_clone_options[] = {
 			TRANSPORT_FAMILY_IPV4),
 	OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
 			TRANSPORT_FAMILY_IPV6),
+	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
 	OPT_END()
 };
 
@@ -886,6 +889,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	struct refspec *refspec;
 	const char *fetch_pattern;
 
+	fetch_if_missing = 0;
+
 	packet_trace_identity("clone");
 	argc = parse_options(argc, argv, prefix, builtin_clone_options,
 			     builtin_clone_usage, 0);
@@ -1073,6 +1078,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 			warning(_("--shallow-since is ignored in local clones; use file:// instead."));
 		if (option_not.nr)
 			warning(_("--shallow-exclude is ignored in local clones; use file:// instead."));
+		if (filter_options.choice)
+			warning(_("--filter is ignored in local clones; use file:// instead."));
 		if (!access(mkpath("%s/shallow", path), F_OK)) {
 			if (option_local > 0)
 				warning(_("source repository is shallow, ignoring --local"));
@@ -1104,7 +1111,13 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		transport_set_option(transport, TRANS_OPT_UPLOADPACK,
 				     option_upload_pack);
 
-	if (transport->smart_options && !deepen)
+	if (filter_options.choice) {
+		transport_set_option(transport, TRANS_OPT_LIST_OBJECTS_FILTER,
+				     filter_options.filter_spec);
+		transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
+	}
+
+	if (transport->smart_options && !deepen && !filter_options.choice)
 		transport->smart_options->check_self_contained_and_connected = 1;
 
 	refs = transport_get_remote_refs(transport);
@@ -1164,13 +1177,17 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	write_refspec_config(src_ref_prefix, our_head_points_at,
 			remote_head_points_at, &branch_top);
 
+	if (filter_options.choice)
+		partial_clone_register("origin", &filter_options);
+
 	if (is_local)
 		clone_local(path, git_dir);
 	else if (refs && complete_refs_before_fetch)
 		transport_fetch_refs(transport, mapped_refs);
 
 	update_remote_refs(refs, mapped_refs, remote_head_points_at,
-			   branch_top.buf, reflog_msg.buf, transport, !is_local);
+			   branch_top.buf, reflog_msg.buf, transport,
+			   !is_local && !filter_options.choice);
 
 	update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
@@ -1191,6 +1208,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 
 	junk_mode = JUNK_LEAVE_REPO;
+	fetch_if_missing = 1;
 	err = checkout(submodule_progress);
 
 	strbuf_release(&reflog_msg);
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 9c56f77..6d37c6d 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -571,4 +571,53 @@ test_expect_success 'GIT_TRACE_PACKFILE produces a usable pack' '
 	git -C replay.git index-pack -v --stdin <tmp.pack
 '
 
+partial_clone () {
+	       SERVER="$1" &&
+	       URL="$2" &&
+
+	rm -rf "$SERVER" client &&
+	test_create_repo "$SERVER" &&
+	test_commit -C "$SERVER" one &&
+	HASH1=$(git hash-object "$SERVER/one.t") &&
+	git -C "$SERVER" revert HEAD &&
+	test_commit -C "$SERVER" two &&
+	HASH2=$(git hash-object "$SERVER/two.t") &&
+	test_config -C "$SERVER" uploadpack.allowfilter 1 &&
+	test_config -C "$SERVER" uploadpack.allowanysha1inwant 1 &&
+
+	git clone --filter=blob:limit=0 "$URL" client &&
+
+	git -C client fsck &&
+
+	# Ensure that unneeded blobs are not inadvertently fetched.
+	test_config -C client extensions.partialclone "not a remote" &&
+	test_must_fail git -C client cat-file -e "$HASH1" &&
+
+	# But this blob was fetched, because clone performs an initial checkout
+	git -C client cat-file -e "$HASH2"
+}
+
+test_expect_success 'partial clone' '
+	partial_clone server "file://$(pwd)/server"
+'
+
+test_expect_success 'partial clone: warn if server does not support object filtering' '
+	rm -rf server client &&
+	test_create_repo server &&
+	test_commit -C server one &&
+
+	git clone --filter=blob:limit=0 "file://$(pwd)/server" client 2> err &&
+
+	test_i18ngrep "filtering not recognized by server" err
+'
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'partial clone using HTTP' '
+	partial_clone "$HTTPD_DOCUMENT_ROOT_PATH/server" "$HTTPD_URL/smart/server"
+'
+
+stop_httpd
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 12/16] unpack-trees: batch fetching of missing blobs
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (10 preceding siblings ...)
  2017-12-08 15:58 ` [PATCH v7 11/16] clone: partial clone Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 13/16] fetch-pack: restore save_commit_buffer after use Jeff Hostetler
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

When running checkout, first prefetch all blobs that are to be updated
but are missing. This means that only one pack is downloaded during such
operations, instead of one per missing blob.

This operates only on the blob level - if a repository has a missing
tree, they are still fetched one at a time.

This does not use the delayed checkout mechanism introduced in commit
2841e8f ("convert: add "status=delayed" to filter process protocol",
2017-06-30) due to significant conceptual differences - in particular,
for partial clones, we already know what needs to be fetched based on
the contents of the local repo alone, whereas for status=delayed, it is
the filter process that tells us what needs to be checked in the end.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 fetch-object.c   | 26 ++++++++++++++++++++++----
 fetch-object.h   |  5 +++++
 t/t5601-clone.sh | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 unpack-trees.c   | 22 ++++++++++++++++++++++
 4 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/fetch-object.c b/fetch-object.c
index 258fcfa..853624f 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -5,11 +5,10 @@
 #include "transport.h"
 #include "fetch-object.h"
 
-void fetch_object(const char *remote_name, const unsigned char *sha1)
+static void fetch_refs(const char *remote_name, struct ref *ref)
 {
 	struct remote *remote;
 	struct transport *transport;
-	struct ref *ref;
 	int original_fetch_if_missing = fetch_if_missing;
 
 	fetch_if_missing = 0;
@@ -18,10 +17,29 @@ void fetch_object(const char *remote_name, const unsigned char *sha1)
 		die(_("Remote with no URL"));
 	transport = transport_get(remote, remote->url[0]);
 
-	ref = alloc_ref(sha1_to_hex(sha1));
-	hashcpy(ref->old_oid.hash, sha1);
 	transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
 	transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1");
 	transport_fetch_refs(transport, ref);
 	fetch_if_missing = original_fetch_if_missing;
 }
+
+void fetch_object(const char *remote_name, const unsigned char *sha1)
+{
+	struct ref *ref = alloc_ref(sha1_to_hex(sha1));
+	hashcpy(ref->old_oid.hash, sha1);
+	fetch_refs(remote_name, ref);
+}
+
+void fetch_objects(const char *remote_name, const struct oid_array *to_fetch)
+{
+	struct ref *ref = NULL;
+	int i;
+
+	for (i = 0; i < to_fetch->nr; i++) {
+		struct ref *new_ref = alloc_ref(oid_to_hex(&to_fetch->oid[i]));
+		oidcpy(&new_ref->old_oid, &to_fetch->oid[i]);
+		new_ref->next = ref;
+		ref = new_ref;
+	}
+	fetch_refs(remote_name, ref);
+}
diff --git a/fetch-object.h b/fetch-object.h
index f371300..4b269d0 100644
--- a/fetch-object.h
+++ b/fetch-object.h
@@ -1,6 +1,11 @@
 #ifndef FETCH_OBJECT_H
 #define FETCH_OBJECT_H
 
+#include "sha1-array.h"
+
 extern void fetch_object(const char *remote_name, const unsigned char *sha1);
 
+extern void fetch_objects(const char *remote_name,
+			  const struct oid_array *to_fetch);
+
 #endif
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 6d37c6d..13610b7 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -611,6 +611,58 @@ test_expect_success 'partial clone: warn if server does not support object filte
 	test_i18ngrep "filtering not recognized by server" err
 '
 
+test_expect_success 'batch missing blob request during checkout' '
+	rm -rf server client &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	echo b >server/b &&
+	git -C server add a b &&
+
+	git -C server commit -m x &&
+	echo aa >server/a &&
+	echo bb >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+
+	git clone --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	# Ensure that there is only one negotiation by checking that there is
+	# only "done" line sent. ("done" marks the end of negotiation.)
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client checkout HEAD^ &&
+	grep "git> done" trace >done_lines &&
+	test_line_count = 1 done_lines
+'
+
+test_expect_success 'batch missing blob request does not inadvertently try to fetch gitlinks' '
+	rm -rf server client &&
+
+	test_create_repo repo_for_submodule &&
+	test_commit -C repo_for_submodule x &&
+
+	test_create_repo server &&
+	echo a >server/a &&
+	echo b >server/b &&
+	git -C server add a b &&
+	git -C server commit -m x &&
+
+	echo aa >server/a &&
+	echo bb >server/b &&
+	# Also add a gitlink pointing to an arbitrary repository
+	git -C server submodule add "$(pwd)/repo_for_submodule" c &&
+	git -C server add a b c &&
+	git -C server commit -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+
+	# Make sure that it succeeds
+	git clone --filter=blob:limit=0 "file://$(pwd)/server" client
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 71b70cc..73a1cdb 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -14,6 +14,7 @@
 #include "dir.h"
 #include "submodule.h"
 #include "submodule-config.h"
+#include "fetch-object.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -369,6 +370,27 @@ static int check_updates(struct unpack_trees_options *o)
 		load_gitmodules_file(index, &state);
 
 	enable_delayed_checkout(&state);
+	if (repository_format_partial_clone && o->update && !o->dry_run) {
+		/*
+		 * Prefetch the objects that are to be checked out in the loop
+		 * below.
+		 */
+		struct oid_array to_fetch = OID_ARRAY_INIT;
+		int fetch_if_missing_store = fetch_if_missing;
+		fetch_if_missing = 0;
+		for (i = 0; i < index->cache_nr; i++) {
+			struct cache_entry *ce = index->cache[i];
+			if ((ce->ce_flags & CE_UPDATE) &&
+			    !S_ISGITLINK(ce->ce_mode)) {
+				if (!has_object_file(&ce->oid))
+					oid_array_append(&to_fetch, &ce->oid);
+			}
+		}
+		if (to_fetch.nr)
+			fetch_objects(repository_format_partial_clone,
+				      &to_fetch);
+		fetch_if_missing = fetch_if_missing_store;
+	}
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 13/16] fetch-pack: restore save_commit_buffer after use
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (11 preceding siblings ...)
  2017-12-08 15:58 ` [PATCH v7 12/16] unpack-trees: batch fetching of missing blobs Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 14/16] t5616: end-to-end tests for partial clone Jeff Hostetler
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

In fetch-pack, the global variable save_commit_buffer is set to 0, but
not restored to its original value after use.

In particular, if show_log() (in log-tree.c) is invoked after
fetch_pack() in the same process, show_log() will return before printing
out the commit message (because the invocation to
get_cached_commit_buffer() returns NULL, because the commit buffer was
not saved). I discovered this when attempting to run "git log -S" in a
partial clone, triggering the case where revision walking lazily loads
missing objects.

Therefore, restore save_commit_buffer to its original value after use.

An alternative to solve the problem I had is to replace
get_cached_commit_buffer() with get_commit_buffer(). That invocation was
introduced in commit a97934d ("use get_cached_commit_buffer where
appropriate", 2014-06-13) to replace "commit->buffer" introduced in
commit 3131b71 ("Add "--show-all" revision walker flag for debugging",
2008-02-13). In the latter commit, the commit author seems to be
deciding between not showing an unparsed commit at all and showing an
unparsed commit without the message (which is what the commit does), and
did not mention parsing the unparsed commit, so I prefer to preserve the
existing behavior.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 fetch-pack.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fetch-pack.c b/fetch-pack.c
index 3c5f064..773453c 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -717,6 +717,7 @@ static int everything_local(struct fetch_pack_args *args,
 {
 	struct ref *ref;
 	int retval;
+	int old_save_commit_buffer = save_commit_buffer;
 	timestamp_t cutoff = 0;
 
 	save_commit_buffer = 0;
@@ -786,6 +787,9 @@ static int everything_local(struct fetch_pack_args *args,
 		print_verbose(args, _("already have %s (%s)"), oid_to_hex(remote),
 			      ref->name);
 	}
+
+	save_commit_buffer = old_save_commit_buffer;
+
 	return retval;
 }
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 14/16] t5616: end-to-end tests for partial clone
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (12 preceding siblings ...)
  2017-12-08 15:58 ` [PATCH v7 13/16] fetch-pack: restore save_commit_buffer after use Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 15/16] fetch: inherit filter-spec from " Jeff Hostetler
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Additional end-to-end tests for partial clone.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t5616-partial-clone.sh | 96 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)
 create mode 100755 t/t5616-partial-clone.sh

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
new file mode 100755
index 0000000..3b8cc0b
--- /dev/null
+++ b/t/t5616-partial-clone.sh
@@ -0,0 +1,96 @@
+#!/bin/sh
+
+test_description='git partial clone'
+
+. ./test-lib.sh
+
+# create a normal "src" repo where we can later create new commits.
+# expect_1.oids will contain a list of the OIDs of all blobs.
+test_expect_success 'setup normal src repo' '
+	echo "{print \$1}" >print_1.awk &&
+	echo "{print \$2}" >print_2.awk &&
+
+	git init src &&
+	for n in 1 2 3 4
+	do
+		echo "This is file: $n" > src/file.$n.txt
+		git -C src add file.$n.txt
+		git -C src commit -m "file $n"
+		git -C src ls-files -s file.$n.txt >>temp
+	done &&
+	awk -f print_2.awk <temp | sort >expect_1.oids &&
+	test_line_count = 4 expect_1.oids
+'
+
+# bare clone "src" giving "srv.bare" for use as our server.
+test_expect_success 'setup bare clone for server' '
+	git clone --bare "file://$(pwd)/src" srv.bare &&
+	git -C srv.bare config --local uploadpack.allowfilter 1 &&
+	git -C srv.bare config --local uploadpack.allowanysha1inwant 1
+'
+
+# do basic partial clone from "srv.bare"
+# confirm we are missing all of the known blobs.
+# confirm partial clone was registered in the local config.
+test_expect_success 'do partial clone 1' '
+	git clone --no-checkout --filter=blob:none "file://$(pwd)/srv.bare" pc1 &&
+	git -C pc1 rev-list HEAD --quiet --objects --missing=print \
+		| awk -f print_1.awk \
+		| sed "s/?//" \
+		| sort >observed.oids &&
+	test_cmp expect_1.oids observed.oids &&
+	test "$(git -C pc1 config --local core.repositoryformatversion)" = "1" &&
+	test "$(git -C pc1 config --local extensions.partialclone)" = "origin" &&
+	test "$(git -C pc1 config --local core.partialclonefilter)" = "blob:none"
+'
+
+# checkout master to force dynamic object fetch of blobs at HEAD.
+test_expect_success 'verify checkout with dynamic object fetch' '
+	git -C pc1 rev-list HEAD --quiet --objects --missing=print >observed &&
+	test_line_count = 4 observed &&
+	git -C pc1 checkout master &&
+	git -C pc1 rev-list HEAD --quiet --objects --missing=print >observed &&
+	test_line_count = 0 observed
+'
+
+# create new commits in "src" repo to establish a blame history on file.1.txt
+# and push to "srv.bare".
+test_expect_success 'push new commits to server' '
+	git -C src remote add srv "file://$(pwd)/srv.bare" &&
+	for x in a b c d e
+	do
+		echo "Mod $x" >>src/file.1.txt
+		git -C src add file.1.txt
+		git -C src commit -m "mod $x"
+	done &&
+	git -C src blame master -- file.1.txt >expect.blame &&
+	git -C src push -u srv master
+'
+
+# (partial) fetch in the partial clone repo from the promisor remote.
+# verify that fetch inherited the filter-spec from the config and DOES NOT
+# have the new blobs.
+test_expect_success 'partial fetch inherits filter settings' '
+	git -C pc1 fetch origin &&
+	git -C pc1 rev-list master..origin/master --quiet --objects --missing=print >observed &&
+	test_line_count = 5 observed
+'
+
+# force dynamic object fetch using diff.
+# we should only get 1 new blob (for the file in origin/master).
+test_expect_success 'verify diff causes dynamic object fetch' '
+	git -C pc1 diff master..origin/master -- file.1.txt &&
+	git -C pc1 rev-list master..origin/master --quiet --objects --missing=print >observed &&
+	test_line_count = 4 observed
+'
+
+# force full dynamic object fetch of the file's history using blame.
+# we should get the intermediate blobs for the file.
+test_expect_success 'verify blame causes dynamic object fetch' '
+	git -C pc1 blame origin/master -- file.1.txt >observed.blame &&
+	test_cmp expect.blame observed.blame &&
+	git -C pc1 rev-list master..origin/master --quiet --objects --missing=print >observed &&
+	test_line_count = 0 observed
+'
+
+test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 15/16] fetch: inherit filter-spec from partial clone
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (13 preceding siblings ...)
  2017-12-08 15:58 ` [PATCH v7 14/16] t5616: end-to-end tests for partial clone Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 15:58 ` [PATCH v7 16/16] t5616: test bulk prefetch after partial fetch Jeff Hostetler
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach (partial) fetch to inherit the filter-spec used by
the partial clone.  Extend --no-filter to override this
inheritance.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fetch-pack.c          |  2 +-
 builtin/fetch.c               | 56 ++++++++++++++++++++++++++++++++++++++++---
 builtin/rev-list.c            |  2 +-
 list-objects-filter-options.c |  2 +-
 list-objects-filter-options.h | 12 ++++++++++
 t/t5616-partial-clone.sh      | 22 ++++++++++++++++-
 6 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index cbf5035..a7bc136 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -158,7 +158,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
-			list_objects_filter_release(&args.filter_options);
+			list_objects_filter_set_no_filter(&args.filter_options);
 			continue;
 		}
 		usage(fetch_pack_usage);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 14aab71..79c866c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1275,6 +1275,56 @@ static int fetch_multiple(struct string_list *list)
 	return result;
 }
 
+/*
+ * Fetching from the promisor remote should use the given filter-spec
+ * or inherit the default filter-spec from the config.
+ */
+static inline void fetch_one_setup_partial(struct remote *remote)
+{
+	/*
+	 * Explicit --no-filter argument overrides everything, regardless
+	 * of any prior partial clones and fetches.
+	 */
+	if (filter_options.no_filter)
+		return;
+
+	/*
+	 * If no prior partial clone/fetch and the current fetch DID NOT
+	 * request a partial-fetch, do a normal fetch.
+	 */
+	if (!repository_format_partial_clone && !filter_options.choice)
+		return;
+
+	/*
+	 * If this is the FIRST partial-fetch request, we enable partial
+	 * on this repo and remember the given filter-spec as the default
+	 * for subsequent fetches to this remote.
+	 */
+	if (!repository_format_partial_clone && filter_options.choice) {
+		partial_clone_register(remote->name, &filter_options);
+		return;
+	}
+
+	/*
+	 * We are currently limited to only ONE promisor remote and only
+	 * allow partial-fetches from the promisor remote.
+	 */
+	if (strcmp(remote->name, repository_format_partial_clone)) {
+		if (filter_options.choice)
+			die(_("--filter can only be used with the remote configured in core.partialClone"));
+		return;
+	}
+
+	/*
+	 * Do a partial-fetch from the promisor remote using either the
+	 * explicitly given filter-spec or inherit the filter-spec from
+	 * the config.
+	 */
+	if (!filter_options.choice)
+		partial_clone_get_default_filter_spec(&filter_options);
+	return;
+}
+
 static int fetch_one(struct remote *remote, int argc, const char **argv)
 {
 	static const char **refs = NULL;
@@ -1404,13 +1454,13 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	}
 
 	if (remote) {
-		if (filter_options.choice &&
-		    strcmp(remote->name, repository_format_partial_clone))
-			die(_("--filter can only be used with the remote configured in core.partialClone"));
+		if (filter_options.choice || repository_format_partial_clone)
+			fetch_one_setup_partial(remote);
 		result = fetch_one(remote, argc, argv);
 	} else {
 		if (filter_options.choice)
 			die(_("--filter can only be used with the remote configured in core.partialClone"));
+		/* TODO should this also die if we have a previous partial-clone? */
 		result = fetch_multiple(&list);
 	}
 
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 48f922d..8503dea 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -460,7 +460,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
-			list_objects_filter_release(&filter_options);
+			list_objects_filter_set_no_filter(&filter_options);
 			continue;
 		}
 		if (!strcmp(arg, "--filter-print-omitted")) {
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 5c47e2b..6a3cc98 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -94,7 +94,7 @@ int opt_parse_list_objects_filter(const struct option *opt,
 	struct list_objects_filter_options *filter_options = opt->value;
 
 	if (unset || !arg) {
-		list_objects_filter_release(filter_options);
+		list_objects_filter_set_no_filter(filter_options);
 		return 0;
 	}
 
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 1143539..0000a61 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -31,6 +31,11 @@ struct list_objects_filter_options {
 	enum list_objects_filter_choice choice;
 
 	/*
+	 * Choice is LOFC_DISABLED because "--no-filter" was requested.
+	 */
+	unsigned int no_filter : 1;
+
+	/*
 	 * Parsed values (fields) from within the filter-spec.  These are
 	 * choice-specific; not all values will be defined for any given
 	 * choice.
@@ -58,6 +63,13 @@ int opt_parse_list_objects_filter(const struct option *opt,
 void list_objects_filter_release(
 	struct list_objects_filter_options *filter_options);
 
+static inline void list_objects_filter_set_no_filter(
+	struct list_objects_filter_options *filter_options)
+{
+	list_objects_filter_release(filter_options);
+	filter_options->no_filter = 1;
+}
+
 void partial_clone_register(
 	const char *remote,
 	const struct list_objects_filter_options *filter_options);
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 3b8cc0b..433e07e 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -59,7 +59,7 @@ test_expect_success 'push new commits to server' '
 	git -C src remote add srv "file://$(pwd)/srv.bare" &&
 	for x in a b c d e
 	do
-		echo "Mod $x" >>src/file.1.txt
+		echo "Mod file.1.txt $x" >>src/file.1.txt
 		git -C src add file.1.txt
 		git -C src commit -m "mod $x"
 	done &&
@@ -93,4 +93,24 @@ test_expect_success 'verify blame causes dynamic object fetch' '
 	test_line_count = 0 observed
 '
 
+# create new commits in "src" repo to establish a history on file.2.txt
+# and push to "srv.bare".
+test_expect_success 'push new commits to server for file.2.txt' '
+	for x in a b c d e f
+	do
+		echo "Mod file.2.txt $x" >>src/file.2.txt
+		git -C src add file.2.txt
+		git -C src commit -m "mod $x"
+	done &&
+	git -C src push -u srv master
+'
+
+# Do FULL fetch by disabling filter-spec using --no-filter.
+# Verify we have all the new blobs.
+test_expect_success 'override inherited filter-spec using --no-filter' '
+	git -C pc1 fetch --no-filter origin &&
+	git -C pc1 rev-list master..origin/master --quiet --objects --missing=print >observed &&
+	test_line_count = 0 observed
+'
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7 16/16] t5616: test bulk prefetch after partial fetch
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (14 preceding siblings ...)
  2017-12-08 15:58 ` [PATCH v7 15/16] fetch: inherit filter-spec from " Jeff Hostetler
@ 2017-12-08 15:58 ` Jeff Hostetler
  2017-12-08 17:58 ` [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Junio C Hamano
  2017-12-08 22:30 ` Brandon Williams
  17 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 15:58 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add test to t5616 to bulk fetch missing objects following
a partial fetch.  A technique like this could be used in
a pre-command hook for example.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t5616-partial-clone.sh | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 433e07e..29d8631 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -105,7 +105,7 @@ test_expect_success 'push new commits to server for file.2.txt' '
 	git -C src push -u srv master
 '
 
-# Do FULL fetch by disabling filter-spec using --no-filter.
+# Do FULL fetch by disabling inherited filter-spec using --no-filter.
 # Verify we have all the new blobs.
 test_expect_success 'override inherited filter-spec using --no-filter' '
 	git -C pc1 fetch --no-filter origin &&
@@ -113,4 +113,34 @@ test_expect_success 'override inherited filter-spec using --no-filter' '
 	test_line_count = 0 observed
 '
 
+# create new commits in "src" repo to establish a history on file.3.txt
+# and push to "srv.bare".
+test_expect_success 'push new commits to server for file.3.txt' '
+	for x in a b c d e f
+	do
+		echo "Mod file.3.txt $x" >>src/file.3.txt
+		git -C src add file.3.txt
+		git -C src commit -m "mod $x"
+	done &&
+	git -C src push -u srv master
+'
+
+# Do a partial fetch and then try to manually fetch the missing objects.
+# This can be used as the basis of a pre-command hook to bulk fetch objects
+# perhaps combined with a command in dry-run mode.
+test_expect_success 'manual prefetch of missing objects' '
+	git -C pc1 fetch --filter=blob:none origin &&
+	git -C pc1 rev-list master..origin/master --quiet --objects --missing=print \
+		| awk -f print_1.awk \
+		| sed "s/?//" \
+		| sort >observed.oids &&
+	test_line_count = 6 observed.oids &&
+	git -C pc1 fetch-pack --stdin "file://$(pwd)/srv.bare" <observed.oids &&
+	git -C pc1 rev-list master..origin/master --quiet --objects --missing=print \
+		| awk -f print_1.awk \
+		| sed "s/?//" \
+		| sort >observed.oids &&
+	test_line_count = 0 observed.oids
+'
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (15 preceding siblings ...)
  2017-12-08 15:58 ` [PATCH v7 16/16] t5616: test bulk prefetch after partial fetch Jeff Hostetler
@ 2017-12-08 17:58 ` Junio C Hamano
  2017-12-08 18:10   ` Jeff Hostetler
  2017-12-08 22:30 ` Brandon Williams
  17 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2017-12-08 17:58 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git, peff, jonathantanmy, Jeff Hostetler

Jeff Hostetler <git@jeffhostetler.com> writes:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
> (which builds upon V6 of part 1).

Aren't the three patches at the bottom sort-of duplicate from the
part 2 series?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
  2017-12-08 17:58 ` [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Junio C Hamano
@ 2017-12-08 18:10   ` Jeff Hostetler
  2017-12-08 18:23     ` Junio C Hamano
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff Hostetler @ 2017-12-08 18:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, jonathantanmy, Jeff Hostetler



On 12/8/2017 12:58 PM, Junio C Hamano wrote:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
>> (which builds upon V6 of part 1).
> 
> Aren't the three patches at the bottom sort-of duplicate from the
> part 2 series?
> 

oops.  yes, you're right.  it looks like i selected pc*6*_p2..pc7_p3
rather than pc*7*_p2..pc7_p3.  sorry for the typo.

and since the only changes in p2 were to squash those 2 commits near
the tip of p2, only those 3 commits changed SHAs in v7 over v6.

so, please disregard the duplicates.

would you like me to send a corrected V8 for p3 ?

Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
  2017-12-08 18:10   ` Jeff Hostetler
@ 2017-12-08 18:23     ` Junio C Hamano
  0 siblings, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2017-12-08 18:23 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git, peff, jonathantanmy, Jeff Hostetler

Jeff Hostetler <git@jeffhostetler.com> writes:

> On 12/8/2017 12:58 PM, Junio C Hamano wrote:
>> Jeff Hostetler <git@jeffhostetler.com> writes:
>>
>>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>>
>>> This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
>>> (which builds upon V6 of part 1).
>>
>> Aren't the three patches at the bottom sort-of duplicate from the
>> part 2 series?
>>
>
> oops.  yes, you're right.  it looks like i selected pc*6*_p2..pc7_p3
> rather than pc*7*_p2..pc7_p3.  sorry for the typo.
>
> and since the only changes in p2 were to squash those 2 commits near
> the tip of p2, only those 3 commits changed SHAs in v7 over v6.
>
> so, please disregard the duplicates.
>
> would you like me to send a corrected V8 for p3 ?

Nah.  I just wanted to make sure that I am discarding the right ones
(i.e. 1-3/16 of partial-clone, not 8-10/10 of fsck-promisors).

Thanks for an update.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
  2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
                   ` (16 preceding siblings ...)
  2017-12-08 17:58 ` [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Junio C Hamano
@ 2017-12-08 22:30 ` Brandon Williams
  2017-12-11 19:19   ` Jonathan Tan
  17 siblings, 1 reply; 24+ messages in thread
From: Brandon Williams @ 2017-12-08 22:30 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git, gitster, peff, jonathantanmy, Jeff Hostetler

On 12/08, Jeff Hostetler wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
> (which builds upon V6 of part 1).
> 
> This version adds additional tests, fixes test errors on the MAC version,
> and squashes some fixup commits.
> 
> It also restores functionality accidentally dropped from the V6 series
> for "git fetch" to automatically inherit the partial-clone filter-spec
> when appropriate.  This version extends the --no-filter argument to
> override this inheritance.
> 

I just finished reading through parts 1-3.  Overall I like the series.
There are a few point's that I'm not a big fan of but i wasn't able to
come up with a better alternative.  One of these being the need for a
global variable to tell the fetch-object logic to not go to the server
to try and fetch a missing object.

One other thing i noticed was it looks like when you discover that you
are missing a blob you you'll try to fault it in from the server without
first checking its an object the server would even have.  Shouldn't you
first do a check to verify that the object in question is a promised
object before you go out to contact the server to request it?  You may
have already ruled this out for some reason I'm not aware of (maybe its
too costly to compute?).


-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
  2017-12-08 22:30 ` Brandon Williams
@ 2017-12-11 19:19   ` Jonathan Tan
  0 siblings, 0 replies; 24+ messages in thread
From: Jonathan Tan @ 2017-12-11 19:19 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Jeff Hostetler, git, gitster, peff, Jeff Hostetler

On Fri, 8 Dec 2017 14:30:10 -0800
Brandon Williams <bmwill@google.com> wrote:

> I just finished reading through parts 1-3.  Overall I like the series.
> There are a few point's that I'm not a big fan of but i wasn't able to
> come up with a better alternative.  One of these being the need for a
> global variable to tell the fetch-object logic to not go to the server
> to try and fetch a missing object.

I didn't really like that approach too but I went with that because,
like you, I couldn't come up with a better one. The main issue is that
too many functions (e.g. parse_commit() in commit.c) indirectly read
objects, and I couldn't find a better way to control them all. Ideally,
we should have a "struct object_store" (or maybe "struct repository"
could do this too) on which we can set "fetch_if_missing", and have all
object-reading functions take a pointer to this struct. Or completely
separate the object-reading and object-parsing code (e.g. commit.c
should not be able to read objects at all). Or both.

Any of these would be major undertakings, though, and there are good
reasons for why the same function does the reading and parsing (for
example, parse_commit() does not perform any reading if the object has
been already parsed).

> One other thing i noticed was it looks like when you discover that you
> are missing a blob you you'll try to fault it in from the server without
> first checking its an object the server would even have.  Shouldn't you
> first do a check to verify that the object in question is a promised
> object before you go out to contact the server to request it?  You may
> have already ruled this out for some reason I'm not aware of (maybe its
> too costly to compute?).

It is quite costly to compute - in the worst case, we would need to read
every object in every promisor packfile of one or more certain types
(e.g. if we know that we're fetching a blob, we need to read every tree)
to find out if the object we want is a promisor object.

Such a check would be better at surfacing mistakes (e.g. the user giving
the wrong SHA-1) early, but beyond that, I don't think that having the
check is very important. Consider these two very common situations:

 (1) Fetching a single branch by its tip's SHA-1. A naive implementation
     will first check if we have that SHA-1, which triggers the dynamic
     fetch (since it is an object read), and assuming success, notice
     that we indeed have that tip, and not fetch anything else. The
     check you describe will avoid this situation.
 (2) Dynamically fetching a missing blob by its SHA-1. A naive
     implementation will first check if we have that SHA-1, which
     triggers the dynamic fetch, and that fetch will first check if we
     have that SHA-1, and so on (thus, an infinite loop). The check you
     describe will not avoid that situation.

The check solves (1), but we still need a solution to (2) - I used
"fetch_if_missing", as discussed in your previous question and my answer
to that. A solution to (2) is usually also a solution to (1), so the
check wouldn't help much here.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 09/16] fetch: support filters
  2017-12-08 15:58 ` [PATCH v7 09/16] fetch: support filters Jeff Hostetler
@ 2018-08-19 11:24   ` Duy Nguyen
  2018-08-20 19:42     ` Jeff Hostetler
  0 siblings, 1 reply; 24+ messages in thread
From: Duy Nguyen @ 2018-08-19 11:24 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Git Mailing List, Junio C Hamano, Jeff King, Jonathan Tan,
	Jeff Hostetler

On Fri, Dec 8, 2017 at 5:00 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Teach fetch to support filters. This is only allowed for the remote
> configured in extensions.partialcloneremote.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  builtin/fetch.c       | 23 +++++++++++++++++++++--
>  connected.c           |  2 ++
>  remote-curl.c         |  6 ++++++
>  t/t5500-fetch-pack.sh | 36 ++++++++++++++++++++++++++++++++++++
>  4 files changed, 65 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 1b1f039..14aab71 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -18,6 +18,7 @@
>  #include "argv-array.h"
>  #include "utf8.h"
>  #include "packfile.h"
> +#include "list-objects-filter-options.h"
>
>  static const char * const builtin_fetch_usage[] = {
>         N_("git fetch [<options>] [<repository> [<refspec>...]]"),
> @@ -55,6 +56,7 @@ static int recurse_submodules_default = RECURSE_SUBMODULES_ON_DEMAND;
>  static int shown_url = 0;
>  static int refmap_alloc, refmap_nr;
>  static const char **refmap_array;
> +static struct list_objects_filter_options filter_options;
>
>  static int git_fetch_config(const char *k, const char *v, void *cb)
>  {
> @@ -160,6 +162,7 @@ static struct option builtin_fetch_options[] = {
>                         TRANSPORT_FAMILY_IPV4),
>         OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
>                         TRANSPORT_FAMILY_IPV6),
> +       OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),

Documentation is missing. Please add something to git-fetch.txt (or
fetch-options.txt) about this option. I would make a patch but I don't
know enough about this to write and I'm in the middle of something
else.
-- 
Duy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 09/16] fetch: support filters
  2018-08-19 11:24   ` Duy Nguyen
@ 2018-08-20 19:42     ` Jeff Hostetler
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Hostetler @ 2018-08-20 19:42 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Junio C Hamano, Jeff King, Jonathan Tan,
	Jeff Hostetler



On 8/19/2018 7:24 AM, Duy Nguyen wrote:
> On Fri, Dec 8, 2017 at 5:00 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>>
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Teach fetch to support filters. This is only allowed for the remote
>> configured in extensions.partialcloneremote.
>>
>> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>> ---
>>   builtin/fetch.c       | 23 +++++++++++++++++++++--
>>   connected.c           |  2 ++
>>   remote-curl.c         |  6 ++++++
>>   t/t5500-fetch-pack.sh | 36 ++++++++++++++++++++++++++++++++++++
>>   4 files changed, 65 insertions(+), 2 deletions(-)
>>
>> diff --git a/builtin/fetch.c b/builtin/fetch.c
>> index 1b1f039..14aab71 100644
>> --- a/builtin/fetch.c
>> +++ b/builtin/fetch.c
>> @@ -18,6 +18,7 @@
>>   #include "argv-array.h"
>>   #include "utf8.h"
>>   #include "packfile.h"
>> +#include "list-objects-filter-options.h"
>>
>>   static const char * const builtin_fetch_usage[] = {
>>          N_("git fetch [<options>] [<repository> [<refspec>...]]"),
>> @@ -55,6 +56,7 @@ static int recurse_submodules_default = RECURSE_SUBMODULES_ON_DEMAND;
>>   static int shown_url = 0;
>>   static int refmap_alloc, refmap_nr;
>>   static const char **refmap_array;
>> +static struct list_objects_filter_options filter_options;
>>
>>   static int git_fetch_config(const char *k, const char *v, void *cb)
>>   {
>> @@ -160,6 +162,7 @@ static struct option builtin_fetch_options[] = {
>>                          TRANSPORT_FAMILY_IPV4),
>>          OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
>>                          TRANSPORT_FAMILY_IPV6),
>> +       OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
> 
> Documentation is missing. Please add something to git-fetch.txt (or
> fetch-options.txt) about this option. I would make a patch but I don't
> know enough about this to write and I'm in the middle of something
> else.
> 

Documentation for --filter=<fs> (and --no-filter) were added to
rev-list-options.txt.  The opening paragraph talks about --objects
which would need to be omitted for fetch and clone, but the rest
of that text is relevant.

I'll push up a patch shortly.

Thanks
Jeff

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-08-20 19:42 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-08 15:58 [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 01/16] sha1_file: support lazily fetching missing objects Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 02/16] rev-list: support termination at promisor objects Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 03/16] gc: do not repack promisor packfiles Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 04/16] upload-pack: add object filtering for partial clone Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 05/16] fetch-pack, index-pack, transport: " Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 06/16] fetch-pack: add --no-filter Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 07/16] fetch-pack: test support excluding large blobs Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 08/16] fetch: refactor calculation of remote list Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 09/16] fetch: support filters Jeff Hostetler
2018-08-19 11:24   ` Duy Nguyen
2018-08-20 19:42     ` Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 10/16] partial-clone: define partial clone settings in config Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 11/16] clone: partial clone Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 12/16] unpack-trees: batch fetching of missing blobs Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 13/16] fetch-pack: restore save_commit_buffer after use Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 14/16] t5616: end-to-end tests for partial clone Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 15/16] fetch: inherit filter-spec from " Jeff Hostetler
2017-12-08 15:58 ` [PATCH v7 16/16] t5616: test bulk prefetch after partial fetch Jeff Hostetler
2017-12-08 17:58 ` [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests Junio C Hamano
2017-12-08 18:10   ` Jeff Hostetler
2017-12-08 18:23     ` Junio C Hamano
2017-12-08 22:30 ` Brandon Williams
2017-12-11 19:19   ` Jonathan Tan

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).