git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/9] WIP Partial clone part 2: fsck and promisors
@ 2017-11-02 20:20 Jeff Hostetler
  2017-11-02 20:20 ` [PATCH 1/9] extension.partialclone: introduce partial clone extension Jeff Hostetler
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-02 20:20 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

This is part 2 of a proposed 3 part sequence for partial clone.
Part 2 assumes part 1 [1] is in place.

Part 2 is concerned with fsck, gc, initial support for dynamic
object fetching, and tracking promisor objects.  Jonathan Tan
originally developed this code.  I have moved it on top of [1].

[1] https://public-inbox.org/git/20171102124445.fbffd43521cd35f6a71e1851@google.com/T/
[2] https://public-inbox.org/git/cover.1506714999.git.jonathantanmy@google.com/


Jeff Hostetler (1):
  extension.partialclone: introduce partial clone extension

Jonathan Tan (8):
  fsck: introduce partialclone extension
  fsck: support refs pointing to promisor objects
  fsck: support referenced promisor objects
  fsck: support promisor objects as CLI argument
  index-pack: refactor writing of .keep files
  introduce fetch-object: fetch one promisor object
  sha1_file: support lazily fetching missing objects
  gc: do not repack promisor packfiles

 Documentation/git-pack-objects.txt             |   4 +
 Documentation/gitremote-helpers.txt            |   2 +
 Documentation/technical/repository-version.txt |  22 ++
 Makefile                                       |   2 +
 builtin/cat-file.c                             |   3 +
 builtin/fetch-pack.c                           |  10 +
 builtin/fsck.c                                 |  26 +-
 builtin/gc.c                                   |   4 +
 builtin/index-pack.c                           | 113 ++++----
 builtin/pack-objects.c                         |  14 +
 builtin/prune.c                                |   7 +
 builtin/repack.c                               |  12 +-
 builtin/rev-list.c                             |  35 ++-
 cache.h                                        |  15 +-
 config.h                                       |   3 +
 environment.c                                  |   2 +
 fetch-object.c                                 |  26 ++
 fetch-object.h                                 |   6 +
 fetch-pack.c                                   |   8 +-
 fetch-pack.h                                   |   2 +
 list-objects.c                                 |   8 +-
 object.c                                       |   2 +-
 packfile.c                                     |  78 +++++-
 packfile.h                                     |  13 +
 partial-clone-utils.c                          |  78 ++++++
 partial-clone-utils.h                          |  34 +++
 remote-curl.c                                  |  17 +-
 revision.c                                     |  32 ++-
 revision.h                                     |   5 +-
 setup.c                                        |  12 +
 sha1_file.c                                    |  39 ++-
 t/t0410-partial-clone.sh                       | 343 +++++++++++++++++++++++++
 transport.c                                    |   8 +
 transport.h                                    |   8 +
 34 files changed, 917 insertions(+), 76 deletions(-)
 create mode 100644 fetch-object.c
 create mode 100644 fetch-object.h
 create mode 100644 partial-clone-utils.c
 create mode 100644 partial-clone-utils.h
 create mode 100755 t/t0410-partial-clone.sh

-- 
2.9.3


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/9] extension.partialclone: introduce partial clone extension
  2017-11-02 20:20 [PATCH 0/9] WIP Partial clone part 2: fsck and promisors Jeff Hostetler
@ 2017-11-02 20:20 ` Jeff Hostetler
  2017-11-02 22:24   ` Jonathan Tan
  2017-11-02 20:20 ` [PATCH 2/9] fsck: introduce partialclone extension Jeff Hostetler
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-02 20:20 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Introduce the ability to have missing objects in a repo.  This
functionality is guarded by new repository extension options:
    `extensions.partialcloneremote` and
    `extensions.partialclonefilter`.

See the update to Documentation/technical/repository-version.txt
in this patch for more information.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/repository-version.txt | 22 ++++++++
 Makefile                                       |  1 +
 cache.h                                        |  4 ++
 config.h                                       |  3 +
 environment.c                                  |  2 +
 partial-clone-utils.c                          | 78 ++++++++++++++++++++++++++
 partial-clone-utils.h                          | 34 +++++++++++
 setup.c                                        | 15 +++++
 8 files changed, 159 insertions(+)
 create mode 100644 partial-clone-utils.c
 create mode 100644 partial-clone-utils.h

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 00ad379..9d488db 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -86,3 +86,25 @@ for testing format-1 compatibility.
 When the config key `extensions.preciousObjects` is set to `true`,
 objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
 `git repack -d`).
+
+`partialcloneremote`
+~~~~~~~~~~~~~~~~~~~~
+
+When the config key `extensions.partialcloneremote` is set, it indicates
+that the repo was created with a partial clone (or later performed
+a partial fetch) and that the remote may have omitted sending
+certain unwanted objects.  Such a remote is called a "promisor remote"
+and it promises that all such omitted objects can be fetched from it
+in the future.
+
+The value of this key is the name of the promisor remote.
+
+`partialclonefilter`
+~~~~~~~~~~~~~~~~~~~~
+
+When the config key `extensions.partialclonefilter` is set, it gives
+the initial filter expression used to create the partial clone.
+This value becomed the default filter expression for subsequent
+fetches (called "partial fetches") from the promisor remote.  This
+value may also be set by the first explicit partial fetch following a
+normal clone.
diff --git a/Makefile b/Makefile
index ca378a4..12d141a 100644
--- a/Makefile
+++ b/Makefile
@@ -838,6 +838,7 @@ LIB_OBJS += pack-write.o
 LIB_OBJS += pager.o
 LIB_OBJS += parse-options.o
 LIB_OBJS += parse-options-cb.o
+LIB_OBJS += partial-clone-utils.o
 LIB_OBJS += patch-delta.o
 LIB_OBJS += patch-ids.o
 LIB_OBJS += path.o
diff --git a/cache.h b/cache.h
index 6440e2b..4b785c0 100644
--- a/cache.h
+++ b/cache.h
@@ -860,12 +860,16 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION 0
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
+extern char *repository_format_partial_clone_remote;
+extern char *repository_format_partial_clone_filter;
 
 struct repository_format {
 	int version;
 	int precious_objects;
 	int is_bare;
 	char *work_tree;
+	char *partial_clone_remote; /* value of extensions.partialcloneremote */
+	char *partial_clone_filter; /* value of extensions.partialclonefilter */
 	struct string_list unknown_extensions;
 };
 
diff --git a/config.h b/config.h
index a49d264..90544ef 100644
--- a/config.h
+++ b/config.h
@@ -34,6 +34,9 @@ struct config_options {
 	const char *git_dir;
 };
 
+#define KEY_PARTIALCLONEREMOTE "partialcloneremote"
+#define KEY_PARTIALCLONEFILTER "partialclonefilter"
+
 typedef int (*config_fn_t)(const char *, const char *, void *);
 extern int git_default_config(const char *, const char *, void *);
 extern int git_config_from_file(config_fn_t fn, const char *, void *);
diff --git a/environment.c b/environment.c
index 8289c25..2fcf9bb 100644
--- a/environment.c
+++ b/environment.c
@@ -27,6 +27,8 @@ int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
+char *repository_format_partial_clone_remote;
+char *repository_format_partial_clone_filter;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
 const char *apply_default_whitespace;
diff --git a/partial-clone-utils.c b/partial-clone-utils.c
new file mode 100644
index 0000000..32cc20d
--- /dev/null
+++ b/partial-clone-utils.c
@@ -0,0 +1,78 @@
+#include "cache.h"
+#include "config.h"
+#include "partial-clone-utils.h"
+
+int is_partial_clone_registered(void)
+{
+	if (repository_format_partial_clone_remote ||
+	    repository_format_partial_clone_filter)
+		return 1;
+
+	return 0;
+}
+
+void partial_clone_utils_register(
+	const struct list_objects_filter_options *filter_options,
+	const char *remote,
+	const char *cmd_name)
+{
+	if (is_partial_clone_registered()) {
+		/*
+		 * The original partial-clone or a previous partial-fetch
+		 * already registered the partial-clone settings.
+		 * If we get here, we are in a subsequent partial-* command
+		 * (with explicit filter args on the command line).
+		 *
+		 * For now, we restrict subsequent commands to one
+		 * consistent with the original request.  We may relax
+		 * this later after we get more experience with the
+		 * partial-clone feature.
+		 *
+		 * [] Restrict to same remote because our dynamic
+		 *    object loading only knows how to fetch objects
+		 *    from 1 remote.
+		 */
+		assert(filter_options && filter_options->choice);
+		assert(remote && *remote);
+
+		if (strcmp(remote, repository_format_partial_clone_remote))
+			die("%s --%s currently limited to remote '%s'",
+			    cmd_name, CL_ARG__FILTER,
+			    repository_format_partial_clone_remote);
+
+		/*
+		 * Treat the (possibly new) filter-spec as transient;
+		 * use it for the current command, but do not overwrite
+		 * the default.
+		 */
+		return;
+	}
+
+	repository_format_partial_clone_remote = xstrdup(remote);
+	repository_format_partial_clone_filter = xstrdup(filter_options->raw_value);
+
+	/*
+	 * Force repo version > 0 to enable extensions namespace.
+	 *
+	 * TODO if already set > 0, we should not overwrite it. 
+	 */
+	git_config_set("core.repositoryformatversion", "1");
+
+	/*
+	 * Use the "extensions" namespace in the config to record
+	 * the name of the remote used in the partial clone.
+	 * This will help us return to that server when we need
+	 * to backfill missing objects.
+	 *
+	 * It is also used to indicate that there *MAY* be
+	 * missing objects so that subsequent commands don't
+	 * immediately die if they hit one.
+	 *
+	 * Also remember the initial filter settings used by
+	 * clone as a default for future fetches.
+	 */
+	git_config_set("extensions." KEY_PARTIALCLONEREMOTE,
+		       repository_format_partial_clone_remote);
+	git_config_set("extensions." KEY_PARTIALCLONEFILTER,
+		       repository_format_partial_clone_filter);
+}
diff --git a/partial-clone-utils.h b/partial-clone-utils.h
new file mode 100644
index 0000000..b527570
--- /dev/null
+++ b/partial-clone-utils.h
@@ -0,0 +1,34 @@
+#ifndef PARTIAL_CLONE_UTILS_H
+#define PARTIAL_CLONE_UTILS_H
+
+#include "list-objects-filter-options.h"
+
+/*
+ * Register that partial-clone was used to create the repo and
+ * update the config on disk.
+ *
+ * If nothing else, this indicates that the ODB may have missing
+ * objects and that various commands should handle that gracefully.
+ *
+ * Record the remote used for the clone so that we know where
+ * to get missing objects in the future.
+ *
+ * Also record the filter expression so that we know something
+ * about the missing objects (e.g., size-limit vs sparse).
+ *
+ * May also be used by a partial-fetch following a normal clone
+ * to turn on the above tracking.
+ */ 
+extern void partial_clone_utils_register(
+	const struct list_objects_filter_options *filter_options,
+	const char *remote,
+	const char *cmd_name);
+
+/*
+ * Return 1 if partial-clone was used to create the repo
+ * or a subsequent partial-fetch was used.  This is an
+ * indicator that there may be missing objects.
+ */
+extern int is_partial_clone_registered(void);
+
+#endif /* PARTIAL_CLONE_UTILS_H */
diff --git a/setup.c b/setup.c
index 03f51e0..bc4133d 100644
--- a/setup.c
+++ b/setup.c
@@ -420,6 +420,19 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			;
 		else if (!strcmp(ext, "preciousobjects"))
 			data->precious_objects = git_config_bool(var, value);
+
+		else if (!strcmp(ext, KEY_PARTIALCLONEREMOTE))
+			if (!value)
+				return config_error_nonbool(var);
+			else
+				data->partial_clone_remote = xstrdup(value);
+
+		else if (!strcmp(ext, KEY_PARTIALCLONEFILTER))
+			if (!value)
+				return config_error_nonbool(var);
+			else
+				data->partial_clone_filter = xstrdup(value);
+
 		else
 			string_list_append(&data->unknown_extensions, ext);
 	} else if (strcmp(var, "core.bare") == 0) {
@@ -463,6 +476,8 @@ static int check_repository_format_gently(const char *gitdir, int *nongit_ok)
 	}
 
 	repository_format_precious_objects = candidate.precious_objects;
+	repository_format_partial_clone_remote = candidate.partial_clone_remote;
+	repository_format_partial_clone_filter = candidate.partial_clone_filter;
 	string_list_clear(&candidate.unknown_extensions, 0);
 	if (!has_common) {
 		if (candidate.is_bare != -1) {
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/9] fsck: introduce partialclone extension
  2017-11-02 20:20 [PATCH 0/9] WIP Partial clone part 2: fsck and promisors Jeff Hostetler
  2017-11-02 20:20 ` [PATCH 1/9] extension.partialclone: introduce partial clone extension Jeff Hostetler
@ 2017-11-02 20:20 ` Jeff Hostetler
  2017-11-02 20:20 ` [PATCH 3/9] fsck: support refs pointing to promisor objects Jeff Hostetler
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-02 20:20 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Currently, Git does not support repos with very large numbers of objects
or repos that wish to minimize manipulation of certain blobs (for
example, because they are very large) very well, even if the user
operates mostly on part of the repo, because Git is designed on the
assumption that every referenced object is available somewhere in the
repo storage. In such an arrangement, the full set of objects is usually
available in remote storage, ready to be lazily downloaded.

Introduce the ability to have missing objects in a repo.  This
functionality is guarded behind a new repository extension option
`extensions.partialcloneremote`.
See Documentation/technical/repository-version.txt for more information.

Teach fsck about the new state of affairs. In this commit, teach fsck
that missing promisor objects referenced from the reflog are not an
error case; in future commits, fsck will be taught about other cases.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fsck.c           |  2 +-
 cache.h                  |  3 +-
 packfile.c               | 78 ++++++++++++++++++++++++++++++++++++++++++++--
 packfile.h               | 13 ++++++++
 setup.c                  |  3 --
 t/t0410-partial-clone.sh | 81 ++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 172 insertions(+), 8 deletions(-)
 create mode 100755 t/t0410-partial-clone.sh

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 56afe40..2934299 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -398,7 +398,7 @@ static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid,
 					xstrfmt("%s@{%"PRItime"}", refname, timestamp));
 			obj->flags |= USED;
 			mark_object_reachable(obj);
-		} else {
+		} else if (!is_promisor_object(oid)) {
 			error("%s: invalid reflog entry %s", refname, oid_to_hex(oid));
 			errors_found |= ERROR_REACHABLE;
 		}
diff --git a/cache.h b/cache.h
index 4b785c0..5f84103 100644
--- a/cache.h
+++ b/cache.h
@@ -1589,7 +1589,8 @@ extern struct packed_git {
 	unsigned pack_local:1,
 		 pack_keep:1,
 		 freshened:1,
-		 do_not_close:1;
+		 do_not_close:1,
+		 pack_promisor:1;
 	unsigned char sha1[20];
 	struct revindex_entry *revindex;
 	/* something like ".git/objects/pack/xxxxx.pack" */
diff --git a/packfile.c b/packfile.c
index 4a5fe7a..b015a54 100644
--- a/packfile.c
+++ b/packfile.c
@@ -8,6 +8,12 @@
 #include "list.h"
 #include "streaming.h"
 #include "sha1-lookup.h"
+#include "commit.h"
+#include "object.h"
+#include "tag.h"
+#include "tree-walk.h"
+#include "tree.h"
+#include "partial-clone-utils.h"
 
 char *odb_pack_name(struct strbuf *buf,
 		    const unsigned char *sha1,
@@ -643,10 +649,10 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
 		return NULL;
 
 	/*
-	 * ".pack" is long enough to hold any suffix we're adding (and
+	 * ".promisor" is long enough to hold any suffix we're adding (and
 	 * the use xsnprintf double-checks that)
 	 */
-	alloc = st_add3(path_len, strlen(".pack"), 1);
+	alloc = st_add3(path_len, strlen(".promisor"), 1);
 	p = alloc_packed_git(alloc);
 	memcpy(p->pack_name, path, path_len);
 
@@ -654,6 +660,10 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
 	if (!access(p->pack_name, F_OK))
 		p->pack_keep = 1;
 
+	xsnprintf(p->pack_name + path_len, alloc - path_len, ".promisor");
+	if (!access(p->pack_name, F_OK))
+		p->pack_promisor = 1;
+
 	xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
 	if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode)) {
 		free(p);
@@ -781,7 +791,8 @@ static void prepare_packed_git_one(char *objdir, int local)
 		if (ends_with(de->d_name, ".idx") ||
 		    ends_with(de->d_name, ".pack") ||
 		    ends_with(de->d_name, ".bitmap") ||
-		    ends_with(de->d_name, ".keep"))
+		    ends_with(de->d_name, ".keep") ||
+		    ends_with(de->d_name, ".promisor"))
 			string_list_append(&garbage, path.buf);
 		else
 			report_garbage(PACKDIR_FILE_GARBAGE, path.buf);
@@ -1889,6 +1900,9 @@ int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags)
 	for (p = packed_git; p; p = p->next) {
 		if ((flags & FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local)
 			continue;
+		if ((flags & FOR_EACH_OBJECT_PROMISOR_ONLY) &&
+		    !p->pack_promisor)
+			continue;
 		if (open_pack_index(p)) {
 			pack_errors = 1;
 			continue;
@@ -1899,3 +1913,61 @@ int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags)
 	}
 	return r ? r : pack_errors;
 }
+
+static int add_promisor_object(const struct object_id *oid,
+			       struct packed_git *pack,
+			       uint32_t pos,
+			       void *set_)
+{
+	struct oidset *set = set_;
+	struct object *obj = parse_object(oid);
+	if (!obj)
+		return 1;
+
+	oidset_insert(set, oid);
+
+	/*
+	 * If this is a tree, commit, or tag, the objects it refers
+	 * to are also promisor objects. (Blobs refer to no objects.)
+	 */
+	if (obj->type == OBJ_TREE) {
+		struct tree *tree = (struct tree *)obj;
+		struct tree_desc desc;
+		struct name_entry entry;
+		if (init_tree_desc_gently(&desc, tree->buffer, tree->size))
+			/*
+			 * Error messages are given when packs are
+			 * verified, so do not print any here.
+			 */
+			return 0;
+		while (tree_entry_gently(&desc, &entry))
+			oidset_insert(set, entry.oid);
+	} else if (obj->type == OBJ_COMMIT) {
+		struct commit *commit = (struct commit *) obj;
+		struct commit_list *parents = commit->parents;
+
+		oidset_insert(set, &commit->tree->object.oid);
+		for (; parents; parents = parents->next)
+			oidset_insert(set, &parents->item->object.oid);
+	} else if (obj->type == OBJ_TAG) {
+		struct tag *tag = (struct tag *) obj;
+		oidset_insert(set, &tag->tagged->oid);
+	}
+	return 0;
+}
+
+int is_promisor_object(const struct object_id *oid)
+{
+	static struct oidset promisor_objects;
+	static int promisor_objects_prepared;
+
+	if (!promisor_objects_prepared) {
+		if (is_partial_clone_registered()) {
+			for_each_packed_object(add_promisor_object,
+					       &promisor_objects,
+					       FOR_EACH_OBJECT_PROMISOR_ONLY);
+		}
+		promisor_objects_prepared = 1;
+	}
+	return oidset_contains(&promisor_objects, oid);
+}
diff --git a/packfile.h b/packfile.h
index 0cdeb54..a7fca59 100644
--- a/packfile.h
+++ b/packfile.h
@@ -1,6 +1,8 @@
 #ifndef PACKFILE_H
 #define PACKFILE_H
 
+#include "oidset.h"
+
 /*
  * Generate the filename to be used for a pack file with checksum "sha1" and
  * extension "ext". The result is written into the strbuf "buf", overwriting
@@ -125,6 +127,11 @@ extern int has_sha1_pack(const unsigned char *sha1);
 extern int has_pack_index(const unsigned char *sha1);
 
 /*
+ * Only iterate over packs obtained from the promisor remote.
+ */
+#define FOR_EACH_OBJECT_PROMISOR_ONLY 2
+
+/*
  * Iterate over packed objects in both the local
  * repository and any alternates repositories (unless the
  * FOR_EACH_OBJECT_LOCAL_ONLY flag, defined in cache.h, is set).
@@ -135,4 +142,10 @@ typedef int each_packed_object_fn(const struct object_id *oid,
 				  void *data);
 extern int for_each_packed_object(each_packed_object_fn, void *, unsigned flags);
 
+/*
+ * Return 1 if an object in a promisor packfile is or refers to the given
+ * object, 0 otherwise.
+ */
+extern int is_promisor_object(const struct object_id *oid);
+
 #endif
diff --git a/setup.c b/setup.c
index bc4133d..ebfb34c 100644
--- a/setup.c
+++ b/setup.c
@@ -420,19 +420,16 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			;
 		else if (!strcmp(ext, "preciousobjects"))
 			data->precious_objects = git_config_bool(var, value);
-
 		else if (!strcmp(ext, KEY_PARTIALCLONEREMOTE))
 			if (!value)
 				return config_error_nonbool(var);
 			else
 				data->partial_clone_remote = xstrdup(value);
-
 		else if (!strcmp(ext, KEY_PARTIALCLONEFILTER))
 			if (!value)
 				return config_error_nonbool(var);
 			else
 				data->partial_clone_filter = xstrdup(value);
-
 		else
 			string_list_append(&data->unknown_extensions, ext);
 	} else if (strcmp(var, "core.bare") == 0) {
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
new file mode 100755
index 0000000..52347fb
--- /dev/null
+++ b/t/t0410-partial-clone.sh
@@ -0,0 +1,81 @@
+#!/bin/sh
+
+test_description='partial clone'
+
+. ./test-lib.sh
+
+delete_object () {
+	rm $1/.git/objects/$(echo $2 | sed -e 's|^..|&/|')
+}
+
+pack_as_from_promisor () {
+	HASH=$(git -C repo pack-objects .git/objects/pack/pack) &&
+	>repo/.git/objects/pack/pack-$HASH.promisor
+}
+
+test_expect_success 'missing reflog object, but promised by a commit, passes fsck' '
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+	C=$(git -C repo commit-tree -m c -p $A HEAD^{tree}) &&
+
+	# Reference $A only from reflog, and delete it
+	git -C repo branch my_branch "$A" &&
+	git -C repo branch -f my_branch my_commit &&
+	delete_object repo "$A" &&
+
+	# State that we got $C, which refers to $A, from promisor
+	printf "$C\n" | pack_as_from_promisor &&
+
+	# Normally, it fails
+	test_must_fail git -C repo fsck &&
+
+	# But with the extension, it succeeds
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremote "arbitrary string" &&
+	git -C repo fsck
+'
+
+test_expect_success 'missing reflog object, but promised by a tag, passes fsck' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+	git -C repo tag -a -m d my_tag_name $A &&
+	T=$(git -C repo rev-parse my_tag_name) &&
+	git -C repo tag -d my_tag_name &&
+
+	# Reference $A only from reflog, and delete it
+	git -C repo branch my_branch "$A" &&
+	git -C repo branch -f my_branch my_commit &&
+	delete_object repo "$A" &&
+
+	# State that we got $T, which refers to $A, from promisor
+	printf "$T\n" | pack_as_from_promisor &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremote "arbitrary string" &&
+	git -C repo fsck
+'
+
+test_expect_success 'missing reflog object alone fails fsck, even with extension set' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+	B=$(git -C repo commit-tree -m b HEAD^{tree}) &&
+
+	# Reference $A only from reflog, and delete it
+	git -C repo branch my_branch "$A" &&
+	git -C repo branch -f my_branch my_commit &&
+	delete_object repo "$A" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremote "arbitrary string" &&
+	test_must_fail git -C repo fsck
+'
+
+test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/9] fsck: support refs pointing to promisor objects
  2017-11-02 20:20 [PATCH 0/9] WIP Partial clone part 2: fsck and promisors Jeff Hostetler
  2017-11-02 20:20 ` [PATCH 1/9] extension.partialclone: introduce partial clone extension Jeff Hostetler
  2017-11-02 20:20 ` [PATCH 2/9] fsck: introduce partialclone extension Jeff Hostetler
@ 2017-11-02 20:20 ` Jeff Hostetler
  2017-11-02 20:20 ` [PATCH 4/9] fsck: support referenced " Jeff Hostetler
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-02 20:20 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Teach fsck to not treat refs referring to missing promisor objects as an
error when extensions.partialclone is set.

For the purposes of warning about no default refs, such refs are still
treated as legitimate refs.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fsck.c           |  8 ++++++++
 t/t0410-partial-clone.sh | 24 ++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 2934299..ee937bb 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -434,6 +434,14 @@ static int fsck_handle_ref(const char *refname, const struct object_id *oid,
 
 	obj = parse_object(oid);
 	if (!obj) {
+		if (is_promisor_object(oid)) {
+			/*
+			 * Increment default_refs anyway, because this is a
+			 * valid ref.
+			 */
+			 default_refs++;
+			 return 0;
+		}
 		error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid));
 		errors_found |= ERROR_REACHABLE;
 		/* We'll continue with the rest despite the error.. */
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 52347fb..5a03ead 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -13,6 +13,14 @@ pack_as_from_promisor () {
 	>repo/.git/objects/pack/pack-$HASH.promisor
 }
 
+promise_and_delete () {
+	HASH=$(git -C repo rev-parse "$1") &&
+	git -C repo tag -a -m message my_annotated_tag "$HASH" &&
+	git -C repo rev-parse my_annotated_tag | pack_as_from_promisor &&
+	git -C repo tag -d my_annotated_tag &&
+	delete_object repo "$HASH"
+}
+
 test_expect_success 'missing reflog object, but promised by a commit, passes fsck' '
 	test_create_repo repo &&
 	test_commit -C repo my_commit &&
@@ -78,4 +86,20 @@ test_expect_success 'missing reflog object alone fails fsck, even with extension
 	test_must_fail git -C repo fsck
 '
 
+test_expect_success 'missing ref object, but promised, passes fsck' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+
+	# Reference $A only from ref
+	git -C repo branch my_branch "$A" &&
+	promise_and_delete "$A" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremote "arbitrary string" &&
+	git -C repo fsck
+'
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/9] fsck: support referenced promisor objects
  2017-11-02 20:20 [PATCH 0/9] WIP Partial clone part 2: fsck and promisors Jeff Hostetler
                   ` (2 preceding siblings ...)
  2017-11-02 20:20 ` [PATCH 3/9] fsck: support refs pointing to promisor objects Jeff Hostetler
@ 2017-11-02 20:20 ` Jeff Hostetler
  2017-11-02 20:20 ` [PATCH 5/9] fsck: support promisor objects as CLI argument Jeff Hostetler
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-02 20:20 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Teach fsck to not treat missing promisor objects indirectly pointed to
by refs as an error when extensions.partialcloneremote is set.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fsck.c           | 11 +++++++++++
 t/t0410-partial-clone.sh | 23 +++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index ee937bb..4c2a56d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -149,6 +149,15 @@ static int mark_object(struct object *obj, int type, void *data, struct fsck_opt
 	if (obj->flags & REACHABLE)
 		return 0;
 	obj->flags |= REACHABLE;
+
+	if (is_promisor_object(&obj->oid))
+		/*
+		 * Further recursion does not need to be performed on this
+		 * object since it is a promisor object (so it does not need to
+		 * be added to "pending").
+		 */
+		return 0;
+
 	if (!(obj->flags & HAS_OBJ)) {
 		if (parent && !has_object_file(&obj->oid)) {
 			printf("broken link from %7s %s\n",
@@ -208,6 +217,8 @@ static void check_reachable_object(struct object *obj)
 	 * do a full fsck
 	 */
 	if (!(obj->flags & HAS_OBJ)) {
+		if (is_promisor_object(&obj->oid))
+			return;
 		if (has_sha1_pack(obj->oid.hash))
 			return; /* it is in pack - forget about it */
 		printf("missing %s %s\n", printable_type(obj),
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 5a03ead..b1d404e 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -102,4 +102,27 @@ test_expect_success 'missing ref object, but promised, passes fsck' '
 	git -C repo fsck
 '
 
+test_expect_success 'missing object, but promised, passes fsck' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+	test_commit -C repo 3 &&
+	git -C repo tag -a annotated_tag -m "annotated tag" &&
+
+	C=$(git -C repo rev-parse 1) &&
+	T=$(git -C repo rev-parse 2^{tree}) &&
+	B=$(git hash-object repo/3.t) &&
+	AT=$(git -C repo rev-parse annotated_tag) &&
+
+	promise_and_delete "$C" &&
+	promise_and_delete "$T" &&
+	promise_and_delete "$B" &&
+	promise_and_delete "$AT" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremote "arbitrary string" &&
+	git -C repo fsck
+'
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/9] fsck: support promisor objects as CLI argument
  2017-11-02 20:20 [PATCH 0/9] WIP Partial clone part 2: fsck and promisors Jeff Hostetler
                   ` (3 preceding siblings ...)
  2017-11-02 20:20 ` [PATCH 4/9] fsck: support referenced " Jeff Hostetler
@ 2017-11-02 20:20 ` Jeff Hostetler
  2017-11-02 20:20 ` [PATCH 6/9] index-pack: refactor writing of .keep files Jeff Hostetler
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-02 20:20 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Teach fsck to not treat missing promisor objects provided on the CLI as
an error when extensions.partialcloneremote is set.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fsck.c           |  2 ++
 t/t0410-partial-clone.sh | 13 +++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 4c2a56d..578a7c8 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -750,6 +750,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 			struct object *obj = lookup_object(oid.hash);
 
 			if (!obj || !(obj->flags & HAS_OBJ)) {
+				if (is_promisor_object(&oid))
+					continue;
 				error("%s: object missing", oid_to_hex(&oid));
 				errors_found |= ERROR_OBJECT;
 				continue;
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index b1d404e..002e071 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -125,4 +125,17 @@ test_expect_success 'missing object, but promised, passes fsck' '
 	git -C repo fsck
 '
 
+test_expect_success 'missing CLI object, but promised, passes fsck' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+	promise_and_delete "$A" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremote "arbitrary string" &&
+	git -C repo fsck "$A"
+'
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/9] index-pack: refactor writing of .keep files
  2017-11-02 20:20 [PATCH 0/9] WIP Partial clone part 2: fsck and promisors Jeff Hostetler
                   ` (4 preceding siblings ...)
  2017-11-02 20:20 ` [PATCH 5/9] fsck: support promisor objects as CLI argument Jeff Hostetler
@ 2017-11-02 20:20 ` Jeff Hostetler
  2017-11-02 20:20 ` [PATCH 7/9] introduce fetch-object: fetch one promisor object Jeff Hostetler
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-02 20:20 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

In a subsequent commit, index-pack will be taught to write ".promisor"
files which are similar to the ".keep" files it knows how to write.
Refactor the writing of ".keep" files, so that the implementation of
writing ".promisor" files becomes easier.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/index-pack.c | 99 ++++++++++++++++++++++++++++------------------------
 1 file changed, 53 insertions(+), 46 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 8ec459f..4f305a7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1389,15 +1389,58 @@ static void fix_unresolved_deltas(struct sha1file *f)
 	free(sorted_by_pos);
 }
 
+static const char *derive_filename(const char *pack_name, const char *suffix,
+				   struct strbuf *buf)
+{
+	size_t len;
+	if (!strip_suffix(pack_name, ".pack", &len))
+		die(_("packfile name '%s' does not end with '.pack'"),
+		    pack_name);
+	strbuf_add(buf, pack_name, len);
+	strbuf_addch(buf, '.');
+	strbuf_addstr(buf, suffix);
+	return buf->buf;
+}
+
+static void write_special_file(const char *suffix, const char *msg,
+			       const char *pack_name, const unsigned char *sha1,
+			       const char **report)
+{
+	struct strbuf name_buf = STRBUF_INIT;
+	const char *filename;
+	int fd;
+	int msg_len = strlen(msg);
+
+	if (pack_name)
+		filename = derive_filename(pack_name, suffix, &name_buf);
+	else
+		filename = odb_pack_name(&name_buf, sha1, suffix);
+
+	fd = odb_pack_keep(filename);
+	if (fd < 0) {
+		if (errno != EEXIST)
+			die_errno(_("cannot write %s file '%s'"),
+				  suffix, filename);
+	} else {
+		if (msg_len > 0) {
+			write_or_die(fd, msg, msg_len);
+			write_or_die(fd, "\n", 1);
+		}
+		if (close(fd) != 0)
+			die_errno(_("cannot close written %s file '%s'"),
+				  suffix, filename);
+		*report = suffix;
+	}
+	strbuf_release(&name_buf);
+}
+
 static void final(const char *final_pack_name, const char *curr_pack_name,
 		  const char *final_index_name, const char *curr_index_name,
-		  const char *keep_name, const char *keep_msg,
-		  unsigned char *sha1)
+		  const char *keep_msg, unsigned char *sha1)
 {
 	const char *report = "pack";
 	struct strbuf pack_name = STRBUF_INIT;
 	struct strbuf index_name = STRBUF_INIT;
-	struct strbuf keep_name_buf = STRBUF_INIT;
 	int err;
 
 	if (!from_stdin) {
@@ -1409,28 +1452,9 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 			die_errno(_("error while closing pack file"));
 	}
 
-	if (keep_msg) {
-		int keep_fd, keep_msg_len = strlen(keep_msg);
-
-		if (!keep_name)
-			keep_name = odb_pack_name(&keep_name_buf, sha1, "keep");
-
-		keep_fd = odb_pack_keep(keep_name);
-		if (keep_fd < 0) {
-			if (errno != EEXIST)
-				die_errno(_("cannot write keep file '%s'"),
-					  keep_name);
-		} else {
-			if (keep_msg_len > 0) {
-				write_or_die(keep_fd, keep_msg, keep_msg_len);
-				write_or_die(keep_fd, "\n", 1);
-			}
-			if (close(keep_fd) != 0)
-				die_errno(_("cannot close written keep file '%s'"),
-					  keep_name);
-			report = "keep";
-		}
-	}
+	if (keep_msg)
+		write_special_file("keep", keep_msg, final_pack_name, sha1,
+				   &report);
 
 	if (final_pack_name != curr_pack_name) {
 		if (!final_pack_name)
@@ -1472,7 +1496,6 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 
 	strbuf_release(&index_name);
 	strbuf_release(&pack_name);
-	strbuf_release(&keep_name_buf);
 }
 
 static int git_index_pack_config(const char *k, const char *v, void *cb)
@@ -1615,26 +1638,13 @@ static void show_pack_info(int stat_only)
 	}
 }
 
-static const char *derive_filename(const char *pack_name, const char *suffix,
-				   struct strbuf *buf)
-{
-	size_t len;
-	if (!strip_suffix(pack_name, ".pack", &len))
-		die(_("packfile name '%s' does not end with '.pack'"),
-		    pack_name);
-	strbuf_add(buf, pack_name, len);
-	strbuf_addstr(buf, suffix);
-	return buf->buf;
-}
-
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0;
 	const char *curr_index;
 	const char *index_name = NULL, *pack_name = NULL;
-	const char *keep_name = NULL, *keep_msg = NULL;
-	struct strbuf index_name_buf = STRBUF_INIT,
-		      keep_name_buf = STRBUF_INIT;
+	const char *keep_msg = NULL;
+	struct strbuf index_name_buf = STRBUF_INIT;
 	struct pack_idx_entry **idx_objects;
 	struct pack_idx_option opts;
 	unsigned char pack_sha1[20];
@@ -1745,9 +1755,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (from_stdin && !startup_info->have_repository)
 		die(_("--stdin requires a git repository"));
 	if (!index_name && pack_name)
-		index_name = derive_filename(pack_name, ".idx", &index_name_buf);
-	if (keep_msg && !keep_name && pack_name)
-		keep_name = derive_filename(pack_name, ".keep", &keep_name_buf);
+		index_name = derive_filename(pack_name, "idx", &index_name_buf);
 
 	if (verify) {
 		if (!index_name)
@@ -1795,13 +1803,12 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (!verify)
 		final(pack_name, curr_pack,
 		      index_name, curr_index,
-		      keep_name, keep_msg,
+		      keep_msg,
 		      pack_sha1);
 	else
 		close(input_fd);
 	free(objects);
 	strbuf_release(&index_name_buf);
-	strbuf_release(&keep_name_buf);
 	if (pack_name == NULL)
 		free((void *) curr_pack);
 	if (index_name == NULL)
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/9] introduce fetch-object: fetch one promisor object
  2017-11-02 20:20 [PATCH 0/9] WIP Partial clone part 2: fsck and promisors Jeff Hostetler
                   ` (5 preceding siblings ...)
  2017-11-02 20:20 ` [PATCH 6/9] index-pack: refactor writing of .keep files Jeff Hostetler
@ 2017-11-02 20:20 ` Jeff Hostetler
  2017-11-02 20:20 ` [PATCH 8/9] sha1_file: support lazily fetching missing objects Jeff Hostetler
  2017-11-02 20:20 ` [PATCH 9/9] gc: do not repack promisor packfiles Jeff Hostetler
  8 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-02 20:20 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Introduce fetch-object, providing the ability to fetch one object from a
promisor remote.

This uses fetch-pack. To do this, the transport mechanism has been
updated with 2 flags, "from-promisor" to indicate that the resulting
pack comes from a promisor remote (and thus should be annotated as such
by index-pack), and "no-haves" to suppress the sending of "have" lines.

This will be tested in a subsequent commit.

NEEDSWORK: update this when we have more information about protocol v2,
which should allow a way to suppress the ref advertisement and
officially allow any object type to be "want"-ed.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/gitremote-helpers.txt |  2 ++
 Makefile                            |  1 +
 builtin/fetch-pack.c                |  8 ++++++++
 builtin/index-pack.c                | 16 +++++++++++++---
 fetch-object.c                      | 23 +++++++++++++++++++++++
 fetch-object.h                      |  6 ++++++
 fetch-pack.c                        |  8 ++++++--
 fetch-pack.h                        |  2 ++
 remote-curl.c                       | 17 ++++++++++++++++-
 transport.c                         |  8 ++++++++
 transport.h                         |  8 ++++++++
 11 files changed, 93 insertions(+), 6 deletions(-)
 create mode 100644 fetch-object.c
 create mode 100644 fetch-object.h

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index 4a584f3..6da3f41 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -466,6 +466,8 @@ set by Git if the remote helper has the 'option' capability.
 	Transmit <string> as a push option. As the push option
 	must not contain LF or NUL characters, the string is not encoded.
 
+TODO document 'option from-promisor' and 'option no-haves' ?
+
 SEE ALSO
 --------
 linkgit:git-remote[1]
diff --git a/Makefile b/Makefile
index 12d141a..7a0679a 100644
--- a/Makefile
+++ b/Makefile
@@ -792,6 +792,7 @@ LIB_OBJS += ewah/ewah_bitmap.o
 LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
+LIB_OBJS += fetch-object.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
 LIB_OBJS += gettext.o
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 366b9d1..9f303cf 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -143,6 +143,14 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			args.update_shallow = 1;
 			continue;
 		}
+		if (!strcmp("--from-promisor", arg)) {
+			args.from_promisor = 1;
+			continue;
+		}
+		if (!strcmp("--no-haves", arg)) {
+			args.no_haves = 1;
+			continue;
+		}
 		usage(fetch_pack_usage);
 	}
 	if (deepen_not.nr)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 4f305a7..24c2f05 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1429,14 +1429,16 @@ static void write_special_file(const char *suffix, const char *msg,
 		if (close(fd) != 0)
 			die_errno(_("cannot close written %s file '%s'"),
 				  suffix, filename);
-		*report = suffix;
+		if (report)
+			*report = suffix;
 	}
 	strbuf_release(&name_buf);
 }
 
 static void final(const char *final_pack_name, const char *curr_pack_name,
 		  const char *final_index_name, const char *curr_index_name,
-		  const char *keep_msg, unsigned char *sha1)
+		  const char *keep_msg, const char *promisor_msg,
+		  unsigned char *sha1)
 {
 	const char *report = "pack";
 	struct strbuf pack_name = STRBUF_INIT;
@@ -1455,6 +1457,9 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 	if (keep_msg)
 		write_special_file("keep", keep_msg, final_pack_name, sha1,
 				   &report);
+	if (promisor_msg)
+		write_special_file("promisor", promisor_msg, final_pack_name,
+				   sha1, NULL);
 
 	if (final_pack_name != curr_pack_name) {
 		if (!final_pack_name)
@@ -1644,6 +1649,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	const char *curr_index;
 	const char *index_name = NULL, *pack_name = NULL;
 	const char *keep_msg = NULL;
+	const char *promisor_msg = NULL;
 	struct strbuf index_name_buf = STRBUF_INIT;
 	struct pack_idx_entry **idx_objects;
 	struct pack_idx_option opts;
@@ -1693,6 +1699,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				keep_msg = "";
 			} else if (starts_with(arg, "--keep=")) {
 				keep_msg = arg + 7;
+			} else if (!strcmp(arg, "--promisor")) {
+				promisor_msg = "";
+			} else if (starts_with(arg, "--promisor=")) {
+				promisor_msg = arg + strlen("--promisor=");
 			} else if (starts_with(arg, "--threads=")) {
 				char *end;
 				nr_threads = strtoul(arg+10, &end, 0);
@@ -1803,7 +1813,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (!verify)
 		final(pack_name, curr_pack,
 		      index_name, curr_index,
-		      keep_msg,
+		      keep_msg, promisor_msg,
 		      pack_sha1);
 	else
 		close(input_fd);
diff --git a/fetch-object.c b/fetch-object.c
new file mode 100644
index 0000000..f89dbba
--- /dev/null
+++ b/fetch-object.c
@@ -0,0 +1,23 @@
+#include "cache.h"
+#include "packfile.h"
+#include "pkt-line.h"
+#include "strbuf.h"
+#include "transport.h"
+
+void fetch_object(const char *remote_name, const unsigned char *sha1)
+{
+	struct remote *remote;
+	struct transport *transport;
+	struct ref *ref;
+
+	remote = remote_get(remote_name);
+	if (!remote->url[0])
+		die(_("Remote with no URL"));
+	transport = transport_get(remote, remote->url[0]);
+
+	ref = alloc_ref(sha1_to_hex(sha1));
+	hashcpy(ref->old_oid.hash, sha1);
+	transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
+	transport_set_option(transport, TRANS_OPT_NO_HAVES, "1");
+	transport_fetch_refs(transport, ref);
+}
diff --git a/fetch-object.h b/fetch-object.h
new file mode 100644
index 0000000..f371300
--- /dev/null
+++ b/fetch-object.h
@@ -0,0 +1,6 @@
+#ifndef FETCH_OBJECT_H
+#define FETCH_OBJECT_H
+
+extern void fetch_object(const char *remote_name, const unsigned char *sha1);
+
+#endif
diff --git a/fetch-pack.c b/fetch-pack.c
index 008b25d..4640b4e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -450,6 +450,8 @@ static int find_common(struct fetch_pack_args *args,
 
 	flushes = 0;
 	retval = -1;
+	if (args->no_haves)
+		goto done;
 	while ((oid = get_rev())) {
 		packet_buf_write(&req_buf, "have %s\n", oid_to_hex(oid));
 		print_verbose(args, "have %s", oid_to_hex(oid));
@@ -832,7 +834,7 @@ static int get_pack(struct fetch_pack_args *args,
 		argv_array_push(&cmd.args, alternate_shallow_file);
 	}
 
-	if (do_keep) {
+	if (do_keep || args->from_promisor) {
 		if (pack_lockfile)
 			cmd.out = -1;
 		cmd_name = "index-pack";
@@ -842,7 +844,7 @@ static int get_pack(struct fetch_pack_args *args,
 			argv_array_push(&cmd.args, "-v");
 		if (args->use_thin_pack)
 			argv_array_push(&cmd.args, "--fix-thin");
-		if (args->lock_pack || unpack_limit) {
+		if (do_keep && (args->lock_pack || unpack_limit)) {
 			char hostname[HOST_NAME_MAX + 1];
 			if (xgethostname(hostname, sizeof(hostname)))
 				xsnprintf(hostname, sizeof(hostname), "localhost");
@@ -852,6 +854,8 @@ static int get_pack(struct fetch_pack_args *args,
 		}
 		if (args->check_self_contained_and_connected)
 			argv_array_push(&cmd.args, "--check-self-contained-and-connected");
+		if (args->from_promisor)
+			argv_array_push(&cmd.args, "--promisor");
 	}
 	else {
 		cmd_name = "unpack-objects";
diff --git a/fetch-pack.h b/fetch-pack.h
index b6aeb43..84904c3 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -29,6 +29,8 @@ struct fetch_pack_args {
 	unsigned cloning:1;
 	unsigned update_shallow:1;
 	unsigned deepen:1;
+	unsigned from_promisor:1;
+	unsigned no_haves:1;
 };
 
 /*
diff --git a/remote-curl.c b/remote-curl.c
index 0053b09..41e8a42 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -33,7 +33,9 @@ struct options {
 		thin : 1,
 		/* One of the SEND_PACK_PUSH_CERT_* constants. */
 		push_cert : 2,
-		deepen_relative : 1;
+		deepen_relative : 1,
+		from_promisor : 1,
+		no_haves : 1;
 };
 static struct options options;
 static struct string_list cas_options = STRING_LIST_INIT_DUP;
@@ -157,6 +159,15 @@ static int set_option(const char *name, const char *value)
 			return -1;
 		return 0;
 #endif /* LIBCURL_VERSION_NUM >= 0x070a08 */
+
+	} else if (!strcmp(name, "from-promisor")) {
+		options.from_promisor = 1;
+		return 0;
+
+	} else if (!strcmp(name, "no-haves")) {
+		options.no_haves = 1;
+		return 0;
+
 	} else {
 		return 1 /* unsupported */;
 	}
@@ -822,6 +833,10 @@ static int fetch_git(struct discovery *heads,
 				 options.deepen_not.items[i].string);
 	if (options.deepen_relative && options.depth)
 		argv_array_push(&args, "--deepen-relative");
+	if (options.from_promisor)
+		argv_array_push(&args, "--from-promisor");
+	if (options.no_haves)
+		argv_array_push(&args, "--no-haves");
 	argv_array_push(&args, url.buf);
 
 	for (i = 0; i < nr_heads; i++) {
diff --git a/transport.c b/transport.c
index f1e2f61..8211f82 100644
--- a/transport.c
+++ b/transport.c
@@ -160,6 +160,12 @@ static int set_git_option(struct git_transport_options *opts,
 	} else if (!strcmp(name, TRANS_OPT_DEEPEN_RELATIVE)) {
 		opts->deepen_relative = !!value;
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_FROM_PROMISOR)) {
+		opts->from_promisor = !!value;
+		return 0;
+	} else if (!strcmp(name, TRANS_OPT_NO_HAVES)) {
+		opts->no_haves = !!value;
+		return 0;
 	}
 	return 1;
 }
@@ -228,6 +234,8 @@ static int fetch_refs_via_pack(struct transport *transport,
 		data->options.check_self_contained_and_connected;
 	args.cloning = transport->cloning;
 	args.update_shallow = data->options.update_shallow;
+	args.from_promisor = data->options.from_promisor;
+	args.no_haves = data->options.no_haves;
 
 	if (!data->got_remote_heads) {
 		connect_setup(transport, 0);
diff --git a/transport.h b/transport.h
index bc55715..67428f6 100644
--- a/transport.h
+++ b/transport.h
@@ -15,6 +15,8 @@ struct git_transport_options {
 	unsigned self_contained_and_connected : 1;
 	unsigned update_shallow : 1;
 	unsigned deepen_relative : 1;
+	unsigned from_promisor : 1;
+	unsigned no_haves : 1;
 	int depth;
 	const char *deepen_since;
 	const struct string_list *deepen_not;
@@ -210,6 +212,12 @@ void transport_check_allowed(const char *type);
 /* Send push certificates */
 #define TRANS_OPT_PUSH_CERT "pushcert"
 
+/* Indicate that these objects are being fetched by a promisor */
+#define TRANS_OPT_FROM_PROMISOR "from-promisor"
+
+/* Do not send "have" lines */
+#define TRANS_OPT_NO_HAVES "no-haves"
+
 /**
  * Returns 0 if the option was used, non-zero otherwise. Prints a
  * message to stderr if the option is not used.
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 8/9] sha1_file: support lazily fetching missing objects
  2017-11-02 20:20 [PATCH 0/9] WIP Partial clone part 2: fsck and promisors Jeff Hostetler
                   ` (6 preceding siblings ...)
  2017-11-02 20:20 ` [PATCH 7/9] introduce fetch-object: fetch one promisor object Jeff Hostetler
@ 2017-11-02 20:20 ` Jeff Hostetler
  2017-11-02 20:20 ` [PATCH 9/9] gc: do not repack promisor packfiles Jeff Hostetler
  8 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-02 20:20 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Teach sha1_file to fetch objects from the remote configured in
extensions.partialcloneremote whenever an object is requested but missing.

The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, without the user
     regarding how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
     about the loose/packed distinction and operate on both differently,
     and ensure that they can handle the concept of objects that are
     neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
     - this searches a single pack that is provided as an argument; the
       caller already knows (through other means) that the sought object
       is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
     it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
     object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
     not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/cat-file.c       |   3 +
 builtin/fetch-pack.c     |   2 +
 builtin/fsck.c           |   3 +
 builtin/index-pack.c     |   6 ++
 builtin/rev-list.c       |  35 +++++++++--
 cache.h                  |   8 +++
 fetch-object.c           |   3 +
 list-objects.c           |   8 ++-
 object.c                 |   2 +-
 revision.c               |  32 +++++++++-
 revision.h               |   5 +-
 sha1_file.c              |  39 ++++++++----
 t/t0410-partial-clone.sh | 152 +++++++++++++++++++++++++++++++++++++++++++++++
 13 files changed, 277 insertions(+), 21 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index f5fa4fd..ba77b73 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -13,6 +13,7 @@
 #include "tree-walk.h"
 #include "sha1-array.h"
 #include "packfile.h"
+#include "partial-clone-utils.h"
 
 struct batch_options {
 	int enabled;
@@ -475,6 +476,8 @@ static int batch_objects(struct batch_options *opt)
 
 		for_each_loose_object(batch_loose_object, &sa, 0);
 		for_each_packed_object(batch_packed_object, &sa, 0);
+		if (is_partial_clone_registered())
+			warning("This repository has partial clone enabled. Some objects may not be loaded.");
 
 		cb.opt = opt;
 		cb.expand = &data;
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 9f303cf..9a7ebf6 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -53,6 +53,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	struct oid_array shallow = OID_ARRAY_INIT;
 	struct string_list deepen_not = STRING_LIST_INIT_DUP;
 
+	fetch_if_missing = 0;
+
 	packet_trace_identity("fetch-pack");
 
 	memset(&args, 0, sizeof(args));
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 578a7c8..3b76c0e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -678,6 +678,9 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	int i;
 	struct alternate_object_database *alt;
 
+	/* fsck knows how to handle missing promisor objects */
+	fetch_if_missing = 0;
+
 	errors_found = 0;
 	check_replace_refs = 0;
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 24c2f05..a0a35e6 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1657,6 +1657,12 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	unsigned foreign_nr = 1;	/* zero is a "good" value, assume bad */
 	int report_end_of_input = 0;
 
+	/*
+	 * index-pack never needs to fetch missing objects, since it only
+	 * accesses the repo to do hash collision checks
+	 */
+	fetch_if_missing = 0;
+
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(index_pack_usage);
 
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index cc9fa40..21f0b4c 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -15,6 +15,7 @@
 #include "progress.h"
 #include "reflog-walk.h"
 #include "oidset.h"
+#include "packfile.h"
 
 static const char rev_list_usage[] =
 "git rev-list [OPTION] <commit-id>... [ -- paths... ]\n"
@@ -60,6 +61,7 @@ static unsigned progress_counter;
 static struct list_objects_filter_options filter_options;
 static struct oidset missing_objects;
 static struct oidset omitted_objects;
+static int arg_exclude_promisor_objects;
 static int arg_print_missing;
 static int arg_print_omitted;
 static int arg_ignore_missing;
@@ -190,27 +192,39 @@ static void finish_commit(struct commit *commit, void *data)
 	free_commit_buffer(commit);
 }
 
-static void finish_object(struct object *obj, const char *name, void *cb_data)
+static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
 	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+		/*
+		 * If "--exclude-promisor-objects", we try to dynamically
+		 * fetch missing objects.  Whether we tried or not, we do
+		 * not have the object.  We can then either print, ignore,
+		 * or conditionally ignore them.
+		 */
 		if (arg_print_missing) {
 			oidset_insert(&missing_objects, &obj->oid);
-			return;
+			return 1;
 		}
 		if (arg_ignore_missing)
-			return;
+			return 1;
+
+		if (arg_exclude_promisor_objects &&
+		    is_promisor_object(&obj->oid))
+			return 1;
 
 		die("missing blob object '%s'", oid_to_hex(&obj->oid));
 	}
 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
 		parse_object(&obj->oid);
+	return 0;
 }
 
 static void show_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	finish_object(obj, name, cb_data);
+	if (finish_object(obj, name, cb_data))
+		return;
 	display_progress(progress, ++progress_counter);
 	if (info->flags & REV_LIST_QUIET)
 		return;
@@ -307,6 +321,19 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	init_revisions(&revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
+
+	/*
+	 * Scan the argument list before invoking setup_revisions(), so that we
+	 * know if fetch_if_missing needs to be set to 0.
+	 */
+	for (i = 1; i < argc; i++) {
+		if (!strcmp(argv[i], "--exclude-promisor-objects")) {
+			fetch_if_missing = 0;
+			arg_exclude_promisor_objects = 1;
+			break;
+		}
+	}
+
 	argc = setup_revisions(argc, argv, &revs, NULL);
 
 	memset(&info, 0, sizeof(info));
diff --git a/cache.h b/cache.h
index 5f84103..360ff9e 100644
--- a/cache.h
+++ b/cache.h
@@ -1729,6 +1729,14 @@ struct object_info {
 #define OBJECT_INFO_QUICK 8
 extern int sha1_object_info_extended(const unsigned char *, struct object_info *, unsigned flags);
 
+/*
+ * Set this to 0 to prevent sha1_object_info_extended() from fetching missing
+ * blobs. This has a difference only if is_partial_clone_registered() is true.
+ *
+ * Its default value is 1.
+ */
+extern int fetch_if_missing;
+
 /* Dumb servers support */
 extern int update_server_info(int);
 
diff --git a/fetch-object.c b/fetch-object.c
index f89dbba..369b61c 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -9,7 +9,9 @@ void fetch_object(const char *remote_name, const unsigned char *sha1)
 	struct remote *remote;
 	struct transport *transport;
 	struct ref *ref;
+	int original_fetch_if_missing = fetch_if_missing;
 
+	fetch_if_missing = 0;
 	remote = remote_get(remote_name);
 	if (!remote->url[0])
 		die(_("Remote with no URL"));
@@ -20,4 +22,5 @@ void fetch_object(const char *remote_name, const unsigned char *sha1)
 	transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
 	transport_set_option(transport, TRANS_OPT_NO_HAVES, "1");
 	transport_fetch_refs(transport, ref);
+	fetch_if_missing = original_fetch_if_missing;
 }
diff --git a/list-objects.c b/list-objects.c
index 848b040..5390a74 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -9,6 +9,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "packfile.h"
 
 static void process_blob(struct rev_info *revs,
 			 struct blob *blob,
@@ -90,6 +91,8 @@ static void process_tree(struct rev_info *revs,
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
 		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
+	int gently = revs->ignore_missing_links ||
+		revs->exclude_promisor_objects;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;
 
 	if (!revs->tree_objects)
@@ -98,9 +101,12 @@ static void process_tree(struct rev_info *revs,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, revs->ignore_missing_links) < 0) {
+	if (parse_tree_gently(tree, gently) < 0) {
 		if (revs->ignore_missing_links)
 			return;
+		if (revs->exclude_promisor_objects &&
+		    is_promisor_object(&obj->oid))
+			return;
 		die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
diff --git a/object.c b/object.c
index b9a4a0e..4c222d6 100644
--- a/object.c
+++ b/object.c
@@ -252,7 +252,7 @@ struct object *parse_object(const struct object_id *oid)
 	if (obj && obj->parsed)
 		return obj;
 
-	if ((obj && obj->type == OBJ_BLOB) ||
+	if ((obj && obj->type == OBJ_BLOB && has_object_file(oid)) ||
 	    (!obj && has_object_file(oid) &&
 	     sha1_object_info(oid->hash, NULL) == OBJ_BLOB)) {
 		if (check_sha1_signature(repl, NULL, 0, NULL) < 0) {
diff --git a/revision.c b/revision.c
index d167223..85265a4 100644
--- a/revision.c
+++ b/revision.c
@@ -198,6 +198,8 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
 	if (!object) {
 		if (revs->ignore_missing)
 			return object;
+		if (revs->exclude_promisor_objects && is_promisor_object(oid))
+			return NULL;
 		die("bad object %s", name);
 	}
 	object->flags |= flags;
@@ -791,8 +793,17 @@ static int add_parents_to_list(struct rev_info *revs, struct commit *commit,
 	for (parent = commit->parents; parent; parent = parent->next) {
 		struct commit *p = parent->item;
 
-		if (parse_commit_gently(p, revs->ignore_missing_links) < 0)
+		int gently = revs->ignore_missing_links ||
+			revs->exclude_promisor_objects;
+		if (parse_commit_gently(p, gently) < 0) {
+			if (revs->exclude_promisor_objects &&
+			    is_promisor_object(&p->object.oid)) {
+				if (revs->first_parent_only)
+					break;
+				continue;
+			}
 			return -1;
+		}
 		if (revs->show_source && !p->util)
 			p->util = commit->util;
 		p->object.flags |= left_flag;
@@ -2088,6 +2099,10 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->limited = 1;
 	} else if (!strcmp(arg, "--ignore-missing")) {
 		revs->ignore_missing = 1;
+	}else if (!strcmp(arg, "--exclude-promisor-objects")) {
+		if (fetch_if_missing)
+			die("BUG: --exclude-promisor-objects can only be used when fetch_if_missing is 0");
+		revs->exclude_promisor_objects = 1;
 	} else {
 		int opts = diff_opt_parse(&revs->diffopt, argv, argc, revs->prefix);
 		if (!opts)
@@ -2830,6 +2845,16 @@ void reset_revision_walk(void)
 	clear_object_flags(SEEN | ADDED | SHOWN);
 }
 
+static int mark_uninteresting(const struct object_id *oid,
+			      struct packed_git *pack,
+			      uint32_t pos,
+			      void *unused)
+{
+	struct object *o = parse_object(oid);
+	o->flags |= UNINTERESTING | SEEN;
+	return 0;
+}
+
 int prepare_revision_walk(struct rev_info *revs)
 {
 	int i;
@@ -2858,6 +2883,11 @@ int prepare_revision_walk(struct rev_info *revs)
 	    (revs->limited && limiting_can_increase_treesame(revs)))
 		revs->treesame.name = "treesame";
 
+	if (revs->exclude_promisor_objects) {
+		for_each_packed_object(mark_uninteresting, NULL,
+				       FOR_EACH_OBJECT_PROMISOR_ONLY);
+	}
+
 	if (revs->no_walk != REVISION_WALK_NO_WALK_UNSORTED)
 		commit_list_sort_by_date(&revs->commits);
 	if (revs->no_walk)
diff --git a/revision.h b/revision.h
index 5476120..5f9a49c 100644
--- a/revision.h
+++ b/revision.h
@@ -121,7 +121,10 @@ struct rev_info {
 			bisect:1,
 			ancestry_path:1,
 			first_parent_only:1,
-			line_level_traverse:1;
+			line_level_traverse:1,
+
+			/* for internal use only */
+			exclude_promisor_objects:1;
 
 	/* Diff flags */
 	unsigned int	diff:1,
diff --git a/sha1_file.c b/sha1_file.c
index 10c3a00..88e92aa 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -29,6 +29,8 @@
 #include "mergesort.h"
 #include "quote.h"
 #include "packfile.h"
+#include "fetch-object.h"
+#include "partial-clone-utils.h"
 
 const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
@@ -1144,6 +1146,8 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	return (status < 0) ? status : 0;
 }
 
+int fetch_if_missing = 1;
+
 int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)
 {
 	static struct object_info blank_oi = OBJECT_INFO_INIT;
@@ -1152,6 +1156,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 	const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
 				    lookup_replace_object(sha1) :
 				    sha1;
+	int already_retried = 0;
 
 	if (!oi)
 		oi = &blank_oi;
@@ -1176,28 +1181,36 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 		}
 	}
 
-	if (!find_pack_entry(real, &e)) {
-		/* Most likely it's a loose object. */
-		if (!sha1_loose_object_info(real, oi, flags))
-			return 0;
+retry:
+	if (find_pack_entry(real, &e))
+		goto found_packed;
 
-		/* Not a loose object; someone else may have just packed it. */
-		if (flags & OBJECT_INFO_QUICK) {
-			return -1;
-		} else {
-			reprepare_packed_git();
-			if (!find_pack_entry(real, &e))
-				return -1;
-		}
+	/* Most likely it's a loose object. */
+	if (!sha1_loose_object_info(real, oi, flags))
+		return 0;
+
+	/* Not a loose object; someone else may have just packed it. */
+	reprepare_packed_git();
+	if (find_pack_entry(real, &e))
+		goto found_packed;
+
+	/* Check if it is a missing object */
+	if (fetch_if_missing && repository_format_partial_clone_remote &&
+	    !already_retried) {
+		fetch_object(repository_format_partial_clone_remote, real);
+		already_retried = 1;
+		goto retry;
 	}
 
+	return -1;
+
+found_packed:
 	if (oi == &blank_oi)
 		/*
 		 * We know that the caller doesn't actually need the
 		 * information below, so return early.
 		 */
 		return 0;
-
 	rtype = packed_object_info(e.p, e.offset, oi);
 	if (rtype < 0) {
 		mark_bad_packed_object(e.p, real);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 002e071..59de768 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -138,4 +138,156 @@ test_expect_success 'missing CLI object, but promised, passes fsck' '
 	git -C repo fsck "$A"
 '
 
+test_expect_success 'fetching of missing objects' '
+	rm -rf repo &&
+	test_create_repo server &&
+	test_commit -C server foo &&
+	git -C server repack -a -d --write-bitmap-index &&
+
+	git clone "file://$(pwd)/server" repo &&
+	HASH=$(git -C repo rev-parse foo) &&
+	rm -rf repo/.git/objects/* &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremote "origin" &&
+	git -C repo cat-file -p "$HASH" &&
+
+	# Ensure that the .promisor file is written, and check that its
+	# associated packfile contains the object
+	ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
+	test_line_count = 1 promisorlist &&
+	IDX=$(cat promisorlist | sed "s/promisor$/idx/") &&
+	git verify-pack --verbose "$IDX" | grep "$HASH"
+'
+
+test_expect_success 'rev-list stops traversal at missing and promised commit' '
+		     rm -rf repo &&
+		     test_create_repo repo &&
+		     test_commit -C repo foo &&
+		     test_commit -C repo bar &&
+
+	FOO=$(git -C repo rev-parse foo) &&
+	promise_and_delete "$FOO" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremote "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects bar >out &&
+	grep $(git -C repo rev-parse bar) out &&
+	! grep $FOO out
+'
+
+test_expect_success 'rev-list stops traversal at missing and promised tree' '
+		     rm -rf repo &&
+		     test_create_repo repo &&
+		     test_commit -C repo foo &&
+		     mkdir repo/a_dir &&
+		     echo something >repo/a_dir/something &&
+		     git -C repo add a_dir/something &&
+		     git -C repo commit -m bar &&
+
+	# foo^{tree} (tree referenced from commit)
+	TREE=$(git -C repo rev-parse foo^{tree}) &&
+
+	# a tree referenced by HEAD^{tree} (tree referenced from tree)
+	TREE2=$(git -C repo ls-tree HEAD^{tree} | grep " tree " | head -1 | cut -b13-52) &&
+
+	promise_and_delete "$TREE" &&
+	promise_and_delete "$TREE2" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremote "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >out &&
+	grep $(git -C repo rev-parse foo) out &&
+	! grep $TREE out &&
+	grep $(git -C repo rev-parse HEAD) out &&
+	! grep $TREE2 out
+'
+
+test_expect_success 'rev-list stops traversal at missing and promised blob' '
+		     rm -rf repo &&
+		     test_create_repo repo &&
+		     echo something >repo/something &&
+		     git -C repo add something &&
+		     git -C repo commit -m foo &&
+
+	BLOB=$(git -C repo hash-object -w something) &&
+	promise_and_delete "$BLOB" &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremote "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >out &&
+	grep $(git -C repo rev-parse HEAD) out &&
+	! grep $BLOB out
+'
+
+test_expect_success 'rev-list stops traversal at promisor commit, tree, and blob' '
+		     rm -rf repo &&
+		     test_create_repo repo &&
+		     test_commit -C repo foo &&
+		     test_commit -C repo bar &&
+		     test_commit -C repo baz &&
+
+	COMMIT=$(git -C repo rev-parse foo) &&
+	TREE=$(git -C repo rev-parse bar^{tree}) &&
+	BLOB=$(git hash-object repo/baz.t) &&
+	printf "%s\n%s\n%s\n" $COMMIT $TREE $BLOB | pack_as_from_promisor &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremote "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >out &&
+	! grep $COMMIT out &&
+	! grep $TREE out &&
+	! grep $BLOB out &&
+	grep $(git -C repo rev-parse bar) out  # sanity check that some walking was done
+'
+
+test_expect_success 'rev-list accepts missing and promised objects on command line' '
+		     rm -rf repo &&
+		     test_create_repo repo &&
+		     test_commit -C repo foo &&
+		     test_commit -C repo bar &&
+		     test_commit -C repo baz &&
+
+	COMMIT=$(git -C repo rev-parse foo) &&
+	TREE=$(git -C repo rev-parse bar^{tree}) &&
+	BLOB=$(git hash-object repo/baz.t) &&
+
+	promise_and_delete $COMMIT &&
+	promise_and_delete $TREE &&
+	promise_and_delete $BLOB &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremote "arbitrary string" &&
+	git -C repo rev-list --exclude-promisor-objects --objects "$COMMIT" "$TREE" "$BLOB"
+'
+
+LIB_HTTPD_PORT=12345  # default port, 410, cannot be used as non-root
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'fetching of missing objects from an HTTP server' '
+	rm -rf repo &&
+	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	test_create_repo "$SERVER" &&
+	test_commit -C "$SERVER" foo &&
+	git -C "$SERVER" repack -a -d --write-bitmap-index &&
+
+	git clone $HTTPD_URL/smart/server repo &&
+	HASH=$(git -C repo rev-parse foo) &&
+	rm -rf repo/.git/objects/* &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremoteremote "origin" &&
+	git -C repo cat-file -p "$HASH" &&
+
+	# Ensure that the .promisor file is written, and check that its
+	# associated packfile contains the object
+	ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
+	test_line_count = 1 promisorlist &&
+	IDX=$(cat promisorlist | sed "s/promisor$/idx/") &&
+	git verify-pack --verbose "$IDX" | grep "$HASH"
+'
+
+stop_httpd
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 9/9] gc: do not repack promisor packfiles
  2017-11-02 20:20 [PATCH 0/9] WIP Partial clone part 2: fsck and promisors Jeff Hostetler
                   ` (7 preceding siblings ...)
  2017-11-02 20:20 ` [PATCH 8/9] sha1_file: support lazily fetching missing objects Jeff Hostetler
@ 2017-11-02 20:20 ` Jeff Hostetler
  8 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-02 20:20 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jonathan Tan <jonathantanmy@google.com>

Teach gc to stop traversal at promisor objects, and to leave promisor
packfiles alone. This has the effect of only repacking non-promisor
packfiles, and preserves the distinction between promisor packfiles and
non-promisor packfiles.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/git-pack-objects.txt |  4 +++
 builtin/gc.c                       |  4 +++
 builtin/pack-objects.c             | 14 ++++++++++
 builtin/prune.c                    |  7 +++++
 builtin/repack.c                   | 12 +++++++--
 t/t0410-partial-clone.sh           | 54 ++++++++++++++++++++++++++++++++++++--
 6 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 6786351..ee462c6 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -246,6 +246,10 @@ So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle.
 	Ignore missing objects without error.  This may be used with
 	or without and of the above filtering.
 
+--exclude-promisor-objects::
+	Silently omit referenced but missing objects from the packfile.
+	This is used with partial clone.
+
 SEE ALSO
 --------
 linkgit:git-rev-list[1]
diff --git a/builtin/gc.c b/builtin/gc.c
index 3c5eae0..a17806a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -20,6 +20,7 @@
 #include "argv-array.h"
 #include "commit.h"
 #include "packfile.h"
+#include "partial-clone-utils.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -458,6 +459,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			argv_array_push(&prune, prune_expire);
 			if (quiet)
 				argv_array_push(&prune, "--no-progress");
+			if (is_partial_clone_registered())
+				argv_array_push(&prune,
+						"--exclude-promisor-objects");
 			if (run_command_v_opt(prune.argv, RUN_GIT_CMD))
 				return error(FAILED_RUN, prune.argv[0]);
 		}
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e16722f..957e459 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -83,6 +83,7 @@ static unsigned long window_memory_limit = 0;
 
 static struct list_objects_filter_options filter_options;
 static int arg_ignore_missing;
+static int arg_exclude_promisor_objects;
 
 /*
  * stats
@@ -2561,6 +2562,11 @@ static void show_object(struct object *obj, const char *name, void *data)
 	if (arg_ignore_missing && !has_object_file(&obj->oid))
 		return;
 
+	if (arg_exclude_promisor_objects &&
+	    !has_object_file(&obj->oid) &&
+	    is_promisor_object(&obj->oid))
+		return;
+
 	add_preferred_base_object(name);
 	add_object_entry(obj->oid.hash, obj->type, name, 0);
 	obj->flags |= OBJECT_ADDED;
@@ -2972,6 +2978,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
 		OPT_BOOL(0, "filter-ignore-missing", &arg_ignore_missing,
 			 N_("ignore and omit missing objects from packfile")),
+		OPT_BOOL(0, "exclude-promisor-objects", &arg_exclude_promisor_objects,
+			 N_("do not pack objects in promisor packfiles")),
 		OPT_END(),
 	};
 
@@ -3017,6 +3025,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		argv_array_push(&rp, "--unpacked");
 	}
 
+	if (arg_exclude_promisor_objects) {
+		use_internal_rev_list = 1;
+		fetch_if_missing = 0;
+		argv_array_push(&rp, "--exclude-promisor-objects");
+	}
+
 	if (!reuse_object)
 		reuse_delta = 0;
 	if (pack_compression_level == -1)
diff --git a/builtin/prune.c b/builtin/prune.c
index cddabf2..be34645 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -101,12 +101,15 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
 	struct progress *progress = NULL;
+	int exclude_promisor_objects = 0;
 	const struct option options[] = {
 		OPT__DRY_RUN(&show_only, N_("do not remove, show only")),
 		OPT__VERBOSE(&verbose, N_("report pruned objects")),
 		OPT_BOOL(0, "progress", &show_progress, N_("show progress")),
 		OPT_EXPIRY_DATE(0, "expire", &expire,
 				N_("expire objects older than <time>")),
+		OPT_BOOL(0, "exclude-promisor-objects", &exclude_promisor_objects,
+			 N_("limit traversal to objects outside promisor packfiles")),
 		OPT_END()
 	};
 	char *s;
@@ -139,6 +142,10 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 		show_progress = isatty(2);
 	if (show_progress)
 		progress = start_delayed_progress(_("Checking connectivity"), 0);
+	if (exclude_promisor_objects) {
+		fetch_if_missing = 0;
+		revs.exclude_promisor_objects = 1;
+	}
 
 	mark_reachable_objects(&revs, 1, expire, progress);
 	stop_progress(&progress);
diff --git a/builtin/repack.c b/builtin/repack.c
index f17a68a..a5a7977 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -83,7 +83,8 @@ static void remove_pack_on_signal(int signo)
 
 /*
  * Adds all packs hex strings to the fname list, which do not
- * have a corresponding .keep file.
+ * have a corresponding .keep or .promisor file. These packs are not to
+ * be kept if we are going to pack everything into one file.
  */
 static void get_non_kept_pack_filenames(struct string_list *fname_list)
 {
@@ -101,7 +102,8 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list)
 
 		fname = xmemdupz(e->d_name, len);
 
-		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
+		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)) &&
+		    !file_exists(mkpath("%s/%s.promisor", packdir, fname)))
 			string_list_append_nodup(fname_list, fname);
 		else
 			free(fname);
@@ -232,6 +234,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argv_array_push(&cmd.args, "--all");
 	argv_array_push(&cmd.args, "--reflog");
 	argv_array_push(&cmd.args, "--indexed-objects");
+
+	/*
+	 * TODO Should this be if (is_partial_clone_registered()) ...
+	 */
+	argv_array_push(&cmd.args, "--exclude-promisor-objects");
+
 	if (window)
 		argv_array_pushf(&cmd.args, "--window=%s", window);
 	if (window_memory)
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 59de768..7ddcb4c 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -11,13 +11,15 @@ delete_object () {
 pack_as_from_promisor () {
 	HASH=$(git -C repo pack-objects .git/objects/pack/pack) &&
 	>repo/.git/objects/pack/pack-$HASH.promisor
+	echo $HASH
 }
 
 promise_and_delete () {
 	HASH=$(git -C repo rev-parse "$1") &&
 	git -C repo tag -a -m message my_annotated_tag "$HASH" &&
 	git -C repo rev-parse my_annotated_tag | pack_as_from_promisor &&
-	git -C repo tag -d my_annotated_tag &&
+	# tag -d prints a message to stdout, so redirect it
+	git -C repo tag -d my_annotated_tag >/dev/null &&
 	delete_object repo "$HASH"
 }
 
@@ -261,6 +263,54 @@ test_expect_success 'rev-list accepts missing and promised objects on command li
 	git -C repo rev-list --exclude-promisor-objects --objects "$COMMIT" "$TREE" "$BLOB"
 '
 
+test_expect_success 'gc does not repack promisor objects' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	TREE_HASH=$(git -C repo rev-parse HEAD^{tree}) &&
+	HASH=$(printf "$TREE_HASH\n" | pack_as_from_promisor) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremote "arbitrary string" &&
+	git -C repo gc &&
+
+	# Ensure that the promisor packfile still exists, and remove it
+	test -e repo/.git/objects/pack/pack-$HASH.pack &&
+	rm repo/.git/objects/pack/pack-$HASH.* &&
+
+	# Ensure that the single other pack contains the commit, but not the tree
+	ls repo/.git/objects/pack/pack-*.pack >packlist &&
+	test_line_count = 1 packlist &&
+	git verify-pack repo/.git/objects/pack/pack-*.pack -v >out &&
+	grep "$(git -C repo rev-parse HEAD)" out &&
+	! grep "$TREE_HASH" out
+'
+
+test_expect_success 'gc stops traversal when a missing but promised object is reached' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo my_commit &&
+
+	TREE_HASH=$(git -C repo rev-parse HEAD^{tree}) &&
+	HASH=$(promise_and_delete $TREE_HASH) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialcloneremote "arbitrary string" &&
+	git -C repo gc &&
+
+	# Ensure that the promisor packfile still exists, and remove it
+	test -e repo/.git/objects/pack/pack-$HASH.pack &&
+	rm repo/.git/objects/pack/pack-$HASH.* &&
+
+	# Ensure that the single other pack contains the commit, but not the tree
+	ls repo/.git/objects/pack/pack-*.pack >packlist &&
+	test_line_count = 1 packlist &&
+	git verify-pack repo/.git/objects/pack/pack-*.pack -v >out &&
+	grep "$(git -C repo rev-parse HEAD)" out &&
+	! grep "$TREE_HASH" out
+'
+
 LIB_HTTPD_PORT=12345  # default port, 410, cannot be used as non-root
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
@@ -277,7 +327,7 @@ test_expect_success 'fetching of missing objects from an HTTP server' '
 	rm -rf repo/.git/objects/* &&
 
 	git -C repo config core.repositoryformatversion 1 &&
-	git -C repo config extensions.partialcloneremoteremote "origin" &&
+	git -C repo config extensions.partialcloneremote "origin" &&
 	git -C repo cat-file -p "$HASH" &&
 
 	# Ensure that the .promisor file is written, and check that its
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension
  2017-11-02 20:20 ` [PATCH 1/9] extension.partialclone: introduce partial clone extension Jeff Hostetler
@ 2017-11-02 22:24   ` Jonathan Tan
  2017-11-03 13:57     ` Jeff Hostetler
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Tan @ 2017-11-02 22:24 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git, gitster, peff, Jeff Hostetler

On Thu,  2 Nov 2017 20:20:44 +0000
Jeff Hostetler <git@jeffhostetler.com> wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> Introduce the ability to have missing objects in a repo.  This
> functionality is guarded by new repository extension options:
>     `extensions.partialcloneremote` and
>     `extensions.partialclonefilter`.

With this, it is unclear what happens if extensions.partialcloneremote
is not set but extensions.partialclonefilter is set. For something as
significant as a repository extension (which Git uses to determine if it
will even attempt to interact with a repo), I think - I would prefer
just extensions.partialclone (or extensions.partialcloneremote, though I
prefer the former) which determines the remote (the important part,
which controls the dynamic object fetching), and have another option
"core.partialclonefilter" which is only useful if
"extensions.partialclone" is set.

I also don't think extensions.partialclonefilter (or
core.partialclonefilter, if we use my suggestion) needs to be introduced
so early in the patch set when it will only be used once we start
fetching/cloning.

> +void partial_clone_utils_register(
> +	const struct list_objects_filter_options *filter_options,
> +	const char *remote,
> +	const char *cmd_name)
> +{

This function is useful once we have fetch/clone, but probably not
before that. Since the fetch/clone patches are several patches ahead,
could this be moved there?

> @@ -420,6 +420,19 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
>  			;
>  		else if (!strcmp(ext, "preciousobjects"))
>  			data->precious_objects = git_config_bool(var, value);
> +
> +		else if (!strcmp(ext, KEY_PARTIALCLONEREMOTE))
> +			if (!value)
> +				return config_error_nonbool(var);
> +			else
> +				data->partial_clone_remote = xstrdup(value);
> +
> +		else if (!strcmp(ext, KEY_PARTIALCLONEFILTER))
> +			if (!value)
> +				return config_error_nonbool(var);
> +			else
> +				data->partial_clone_filter = xstrdup(value);
> +
>  		else
>  			string_list_append(&data->unknown_extensions, ext);
>  	} else if (strcmp(var, "core.bare") == 0) {

With a complicated block, probably better to use braces in these
clauses.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension
  2017-11-02 22:24   ` Jonathan Tan
@ 2017-11-03 13:57     ` Jeff Hostetler
  2017-11-03 18:39       ` Jonathan Tan
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-03 13:57 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster, peff, Jeff Hostetler



On 11/2/2017 6:24 PM, Jonathan Tan wrote:
> On Thu,  2 Nov 2017 20:20:44 +0000
> Jeff Hostetler <git@jeffhostetler.com> wrote:
> 
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Introduce the ability to have missing objects in a repo.  This
>> functionality is guarded by new repository extension options:
>>      `extensions.partialcloneremote` and
>>      `extensions.partialclonefilter`.
> 
> With this, it is unclear what happens if extensions.partialcloneremote
> is not set but extensions.partialclonefilter is set. For something as
> significant as a repository extension (which Git uses to determine if it
> will even attempt to interact with a repo), I think - I would prefer
> just extensions.partialclone (or extensions.partialcloneremote, though I
> prefer the former) which determines the remote (the important part,
> which controls the dynamic object fetching), and have another option
> "core.partialclonefilter" which is only useful if
> "extensions.partialclone" is set.

Yes, that is a point I wanted to ask about.  I renamed the
extensions.partialclone that you created and then I moved your
remote.<name>.blob-max-bytes setting to be in extensions too.
Moving it to core.partialclonefilter is fine.

> 
> I also don't think extensions.partialclonefilter (or
> core.partialclonefilter, if we use my suggestion) needs to be introduced
> so early in the patch set when it will only be used once we start
> fetching/cloning.
> 
>> +void partial_clone_utils_register(
>> +	const struct list_objects_filter_options *filter_options,
>> +	const char *remote,
>> +	const char *cmd_name)
>> +{
> 
> This function is useful once we have fetch/clone, but probably not
> before that. Since the fetch/clone patches are several patches ahead,
> could this be moved there?

Sure.

> 
>> @@ -420,6 +420,19 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
>>   			;
>>   		else if (!strcmp(ext, "preciousobjects"))
>>   			data->precious_objects = git_config_bool(var, value);
>> +
>> +		else if (!strcmp(ext, KEY_PARTIALCLONEREMOTE))
>> +			if (!value)
>> +				return config_error_nonbool(var);
>> +			else
>> +				data->partial_clone_remote = xstrdup(value);
>> +
>> +		else if (!strcmp(ext, KEY_PARTIALCLONEFILTER))
>> +			if (!value)
>> +				return config_error_nonbool(var);
>> +			else
>> +				data->partial_clone_filter = xstrdup(value);
>> +
>>   		else
>>   			string_list_append(&data->unknown_extensions, ext);
>>   	} else if (strcmp(var, "core.bare") == 0) {
> 
> With a complicated block, probably better to use braces in these
> clauses.
> 

Good point.

Thanks,
Jeff


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension
  2017-11-03 13:57     ` Jeff Hostetler
@ 2017-11-03 18:39       ` Jonathan Tan
  2017-11-06 17:32         ` Jeff Hostetler
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Tan @ 2017-11-03 18:39 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git, gitster, peff, Jeff Hostetler

On Fri, 3 Nov 2017 09:57:18 -0400
Jeff Hostetler <git@jeffhostetler.com> wrote:

> On 11/2/2017 6:24 PM, Jonathan Tan wrote:
> > On Thu,  2 Nov 2017 20:20:44 +0000
> > Jeff Hostetler <git@jeffhostetler.com> wrote:
> > 
> >> From: Jeff Hostetler <jeffhost@microsoft.com>
> >>
> >> Introduce the ability to have missing objects in a repo.  This
> >> functionality is guarded by new repository extension options:
> >>      `extensions.partialcloneremote` and
> >>      `extensions.partialclonefilter`.
> > 
> > With this, it is unclear what happens if extensions.partialcloneremote
> > is not set but extensions.partialclonefilter is set. For something as
> > significant as a repository extension (which Git uses to determine if it
> > will even attempt to interact with a repo), I think - I would prefer
> > just extensions.partialclone (or extensions.partialcloneremote, though I
> > prefer the former) which determines the remote (the important part,
> > which controls the dynamic object fetching), and have another option
> > "core.partialclonefilter" which is only useful if
> > "extensions.partialclone" is set.
> 
> Yes, that is a point I wanted to ask about.  I renamed the
> extensions.partialclone that you created and then I moved your
> remote.<name>.blob-max-bytes setting to be in extensions too.
> Moving it to core.partialclonefilter is fine.

OK - in that case, it might be easier to just reuse my first patch in
its entirety. "core.partialclonefilter" is not used until the
fetching/cloning part anyway.

I agree that "core.partialclonefilter" (or another place not in
"remote") instead of "remote.<name>.blob-max-bytes" is a good idea - in
the future, we might want to reuse the same filter setting for
non-fetching functionality.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension
  2017-11-03 18:39       ` Jonathan Tan
@ 2017-11-06 17:32         ` Jeff Hostetler
  2017-11-06 19:16           ` Jonathan Tan
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-06 17:32 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster, peff, Jeff Hostetler



On 11/3/2017 2:39 PM, Jonathan Tan wrote:
> On Fri, 3 Nov 2017 09:57:18 -0400
> Jeff Hostetler <git@jeffhostetler.com> wrote:
> 
>> On 11/2/2017 6:24 PM, Jonathan Tan wrote:
>>> On Thu,  2 Nov 2017 20:20:44 +0000
>>> Jeff Hostetler <git@jeffhostetler.com> wrote:
>>>
>>>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>>>
>>>> Introduce the ability to have missing objects in a repo.  This
>>>> functionality is guarded by new repository extension options:
>>>>       `extensions.partialcloneremote` and
>>>>       `extensions.partialclonefilter`.
>>>
>>> With this, it is unclear what happens if extensions.partialcloneremote
>>> is not set but extensions.partialclonefilter is set. For something as
>>> significant as a repository extension (which Git uses to determine if it
>>> will even attempt to interact with a repo), I think - I would prefer
>>> just extensions.partialclone (or extensions.partialcloneremote, though I
>>> prefer the former) which determines the remote (the important part,
>>> which controls the dynamic object fetching), and have another option
>>> "core.partialclonefilter" which is only useful if
>>> "extensions.partialclone" is set.
>>
>> Yes, that is a point I wanted to ask about.  I renamed the
>> extensions.partialclone that you created and then I moved your
>> remote.<name>.blob-max-bytes setting to be in extensions too.
>> Moving it to core.partialclonefilter is fine.
> 
> OK - in that case, it might be easier to just reuse my first patch in
> its entirety. "core.partialclonefilter" is not used until the
> fetching/cloning part anyway.
> 

Good point.  I'll take a look at refactoring that.
If it looks like the result will be mostly/effectively
your original patches, I'll let you know and hand part 2
back to you.

> I agree that "core.partialclonefilter" (or another place not in
> "remote") instead of "remote.<name>.blob-max-bytes" is a good idea - in
> the future, we might want to reuse the same filter setting for
> non-fetching functionality.
> 

Jeff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension
  2017-11-06 17:32         ` Jeff Hostetler
@ 2017-11-06 19:16           ` Jonathan Tan
  2017-11-08 20:32             ` Jeff Hostetler
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Tan @ 2017-11-06 19:16 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git, gitster, peff, Jeff Hostetler

On Mon, 6 Nov 2017 12:32:45 -0500
Jeff Hostetler <git@jeffhostetler.com> wrote:

> >> Yes, that is a point I wanted to ask about.  I renamed the
> >> extensions.partialclone that you created and then I moved your
> >> remote.<name>.blob-max-bytes setting to be in extensions too.
> >> Moving it to core.partialclonefilter is fine.
> > 
> > OK - in that case, it might be easier to just reuse my first patch in
> > its entirety. "core.partialclonefilter" is not used until the
> > fetching/cloning part anyway.
> > 
> 
> Good point.  I'll take a look at refactoring that.
> If it looks like the result will be mostly/effectively
> your original patches, I'll let you know and hand part 2
> back to you.

Sounds good. I uploaded the result of rebasing my part 2 patches on top
of your part 1 patches here, if you would like it as a reference:

https://github.com/jonathantanmy/git/commits/pc20171106

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension
  2017-11-06 19:16           ` Jonathan Tan
@ 2017-11-08 20:32             ` Jeff Hostetler
  2017-11-08 21:51               ` Jonathan Tan
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-08 20:32 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster, peff, Jeff Hostetler



On 11/6/2017 2:16 PM, Jonathan Tan wrote:
> On Mon, 6 Nov 2017 12:32:45 -0500
> Jeff Hostetler <git@jeffhostetler.com> wrote:
> 
>>>> Yes, that is a point I wanted to ask about.  I renamed the
>>>> extensions.partialclone that you created and then I moved your
>>>> remote.<name>.blob-max-bytes setting to be in extensions too.
>>>> Moving it to core.partialclonefilter is fine.
>>>
>>> OK - in that case, it might be easier to just reuse my first patch in
>>> its entirety. "core.partialclonefilter" is not used until the
>>> fetching/cloning part anyway.
>>>
>>
>> Good point.  I'll take a look at refactoring that.
>> If it looks like the result will be mostly/effectively
>> your original patches, I'll let you know and hand part 2
>> back to you.
> 
> Sounds good. I uploaded the result of rebasing my part 2 patches on top
> of your part 1 patches here, if you would like it as a reference:
> 
> https://github.com/jonathantanmy/git/commits/pc20171106
> 

Thanks Jonathan.

I moved my version of part 2 on top of yesterday's part 1.
There are a few changes between my version and yours. Could
you take a quick look at them and see if they make sense?
(I'll spare the mailing list another patch series until after
I attend to the feed back on part 1.)

https://github.com/jeffhostetler/git/commits/core/pc3_p2

Thanks
Jeff


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension
  2017-11-08 20:32             ` Jeff Hostetler
@ 2017-11-08 21:51               ` Jonathan Tan
  2017-11-08 22:11                 ` Jeff Hostetler
  2017-11-16 17:33                 ` Jeff Hostetler
  0 siblings, 2 replies; 19+ messages in thread
From: Jonathan Tan @ 2017-11-08 21:51 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git, gitster, peff, Jeff Hostetler

On Wed, 8 Nov 2017 15:32:21 -0500
Jeff Hostetler <git@jeffhostetler.com> wrote:

> Thanks Jonathan.
> 
> I moved my version of part 2 on top of yesterday's part 1.
> There are a few changes between my version and yours. Could
> you take a quick look at them and see if they make sense?
> (I'll spare the mailing list another patch series until after
> I attend to the feed back on part 1.)
> 
> https://github.com/jeffhostetler/git/commits/core/pc3_p2

Thanks - the differences are quite minor, and they generally make sense.
The main one is that finish_object() in builtin/rev-list.c now returns
int instead of void, but that makes sense.

Other than that:

 - I think you accidentally squashed the rev-list commit into
   "sha1_file: support lazily fetching missing objects".
 - The documentation for --exclude-promisor-objects in
   git-pack-objects.txt should be "Omit objects that are known to be in
   the promisor remote". (This option has the purpose of operating only
   on locally created objects, so that when we repack, we still maintain
   a distinction between locally created objects [without .promisor] and
   objects from the promisor remote [with .promisor].)
 - The transport options in gitremote-helpers.txt could have the same
   documentation as in transport.h.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension
  2017-11-08 21:51               ` Jonathan Tan
@ 2017-11-08 22:11                 ` Jeff Hostetler
  2017-11-16 17:33                 ` Jeff Hostetler
  1 sibling, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-08 22:11 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster, peff, Jeff Hostetler



On 11/8/2017 4:51 PM, Jonathan Tan wrote:
> On Wed, 8 Nov 2017 15:32:21 -0500
> Jeff Hostetler <git@jeffhostetler.com> wrote:
> 
>> Thanks Jonathan.
>>
>> I moved my version of part 2 on top of yesterday's part 1.
>> There are a few changes between my version and yours. Could
>> you take a quick look at them and see if they make sense?
>> (I'll spare the mailing list another patch series until after
>> I attend to the feed back on part 1.)
>>
>> https://github.com/jeffhostetler/git/commits/core/pc3_p2
> 
> Thanks - the differences are quite minor, and they generally make sense.
> The main one is that finish_object() in builtin/rev-list.c now returns
> int instead of void, but that makes sense.
> 
> Other than that:
> 
>   - I think you accidentally squashed the rev-list commit into
>     "sha1_file: support lazily fetching missing objects".
>   - The documentation for --exclude-promisor-objects in
>     git-pack-objects.txt should be "Omit objects that are known to be in
>     the promisor remote". (This option has the purpose of operating only
>     on locally created objects, so that when we repack, we still maintain
>     a distinction between locally created objects [without .promisor] and
>     objects from the promisor remote [with .promisor].)
>   - The transport options in gitremote-helpers.txt could have the same
>     documentation as in transport.h.
> 

thanks for the quick turn around.  i'll get these into my next
version next week.

Jeff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension
  2017-11-08 21:51               ` Jonathan Tan
  2017-11-08 22:11                 ` Jeff Hostetler
@ 2017-11-16 17:33                 ` Jeff Hostetler
  1 sibling, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2017-11-16 17:33 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster, peff, Jeff Hostetler



On 11/8/2017 4:51 PM, Jonathan Tan wrote:
> On Wed, 8 Nov 2017 15:32:21 -0500
> Jeff Hostetler <git@jeffhostetler.com> wrote:
> 
>> Thanks Jonathan.
>>
>> I moved my version of part 2 on top of yesterday's part 1.
>> There are a few changes between my version and yours. Could
>> you take a quick look at them and see if they make sense?
>> (I'll spare the mailing list another patch series until after
>> I attend to the feed back on part 1.)
>>
>> https://github.com/jeffhostetler/git/commits/core/pc3_p2
> 
> Thanks - the differences are quite minor, and they generally make sense.
> The main one is that finish_object() in builtin/rev-list.c now returns
> int instead of void, but that makes sense.
> 
> Other than that:
> 
>   - I think you accidentally squashed the rev-list commit into
>     "sha1_file: support lazily fetching missing objects".

fixed. thanks.

>   - The documentation for --exclude-promisor-objects in
>     git-pack-objects.txt should be "Omit objects that are known to be in
>     the promisor remote". (This option has the purpose of operating only
>     on locally created objects, so that when we repack, we still maintain
>     a distinction between locally created objects [without .promisor] and
>     objects from the promisor remote [with .promisor].)

>   - The transport options in gitremote-helpers.txt could have the same
>     documentation as in transport.h.

fixed. thanks.
  

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-11-16 17:33 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-02 20:20 [PATCH 0/9] WIP Partial clone part 2: fsck and promisors Jeff Hostetler
2017-11-02 20:20 ` [PATCH 1/9] extension.partialclone: introduce partial clone extension Jeff Hostetler
2017-11-02 22:24   ` Jonathan Tan
2017-11-03 13:57     ` Jeff Hostetler
2017-11-03 18:39       ` Jonathan Tan
2017-11-06 17:32         ` Jeff Hostetler
2017-11-06 19:16           ` Jonathan Tan
2017-11-08 20:32             ` Jeff Hostetler
2017-11-08 21:51               ` Jonathan Tan
2017-11-08 22:11                 ` Jeff Hostetler
2017-11-16 17:33                 ` Jeff Hostetler
2017-11-02 20:20 ` [PATCH 2/9] fsck: introduce partialclone extension Jeff Hostetler
2017-11-02 20:20 ` [PATCH 3/9] fsck: support refs pointing to promisor objects Jeff Hostetler
2017-11-02 20:20 ` [PATCH 4/9] fsck: support referenced " Jeff Hostetler
2017-11-02 20:20 ` [PATCH 5/9] fsck: support promisor objects as CLI argument Jeff Hostetler
2017-11-02 20:20 ` [PATCH 6/9] index-pack: refactor writing of .keep files Jeff Hostetler
2017-11-02 20:20 ` [PATCH 7/9] introduce fetch-object: fetch one promisor object Jeff Hostetler
2017-11-02 20:20 ` [PATCH 8/9] sha1_file: support lazily fetching missing objects Jeff Hostetler
2017-11-02 20:20 ` [PATCH 9/9] gc: do not repack promisor packfiles Jeff Hostetler

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).