git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH 0/4] First steps towards partial clone submodules
@ 2021-06-01 21:34 Jonathan Tan
  2021-06-01 21:34 ` [PATCH 1/4] promisor-remote: read partialClone config here Jonathan Tan
                   ` (7 more replies)
  0 siblings, 8 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-01 21:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This is a preliminary step towards supporting partial clone submodules
(e.g., by cloning with --recurse-submodules and having the given filter
propagate to submodules). Even with this patch set, we won't be there
yet (notably, some code in Git accesses objects in submodules by adding
them as alternates, so lazy-fetching missing objects in submodules
wouldn't work there), but at least this is a first step.

This patch set would also be useful if Git needed to operate on
repositories other than the_repository outside the submodule case, but
I can't think of such a situation right now.

As mentioned, there is still more work that needs to be done. Any help
is appreciated, and as for me, I hope to get back to this in the 3rd
quarter of the year.

Jonathan Tan (4):
  promisor-remote: read partialClone config here
  promisor-remote: support per-repository config
  run-command: move envvar-resetting function
  promisor-remote: teach lazy-fetch in any repo

 Makefile                      |   1 +
 cache.h                       |   1 -
 object-file.c                 |   7 +-
 promisor-remote.c             | 119 +++++++++++++++++++---------------
 promisor-remote.h             |  26 +++++---
 repository.h                  |   4 ++
 run-command.c                 |  10 +++
 run-command.h                 |   7 ++
 setup.c                       |  10 ++-
 submodule.c                   |  14 +---
 t/helper/test-partial-clone.c |  34 ++++++++++
 t/helper/test-tool.c          |   1 +
 t/helper/test-tool.h          |   1 +
 t/t0410-partial-clone.sh      |  24 +++++++
 14 files changed, 177 insertions(+), 82 deletions(-)
 create mode 100644 t/helper/test-partial-clone.c

-- 
2.32.0.rc0.204.g9fa02ecfa5-goog



* [PATCH 1/4] promisor-remote: read partialClone config here
  2021-06-01 21:34 [PATCH 0/4] First steps towards partial clone submodules Jonathan Tan
@ 2021-06-01 21:34 ` Jonathan Tan
  2021-06-04 19:56   ` Taylor Blau
  2021-06-07 22:41   ` Emily Shaffer
  2021-06-01 21:34 ` [PATCH 2/4] promisor-remote: support per-repository config Jonathan Tan
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-01 21:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Currently, the reading of config related to promisor remotes is done in
two places: once in setup.c (which sets the global variable
repository_format_partial_clone, to be read by the code in
promisor-remote.c), and once in promisor-remote.c. This means that care
must be taken to ensure that repository_format_partial_clone is set
before any code in promisor-remote.c accesses it.

To simplify the code, move all such config reading to promisor-remote.c.
By doing this, it will be easier to see when
repository_format_partial_clone is written and, thus, to reason about
the code. This will be especially helpful in a subsequent commit, which
modifies this code.
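A minimal sketch of what reading extensions.partialclone inside a single config callback can look like. Everything here is hypothetical: struct partial_clone_state and partial_clone_config_cb() are illustrative names, not Git's, and the callback only mimics the (var, value, data) shape of Git's config callbacks of this era. Unlike the bare xstrdup() in the patch, it rejects a NULL value and frees any earlier copy, since a later config file can override an earlier one.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the state kept in promisor-remote.c. */
struct partial_clone_state {
	char *partial_clone; /* owned copy of extensions.partialclone, or NULL */
};

/*
 * Sketch of a config callback. Git hands callbacks the key with the
 * section and variable name lowercased, hence "partialclone".
 */
static int partial_clone_config_cb(const char *var, const char *value,
				   void *data)
{
	struct partial_clone_state *state = data;

	assert(var && state);
	if (strcmp(var, "extensions.partialclone"))
		return 0; /* not our key; keep reading */
	if (!value)
		return -1; /* "[extensions] partialclone" with no "= value" */

	/* The last occurrence wins, so release the previous copy first. */
	free(state->partial_clone);
	state->partial_clone = strdup(value);
	return state->partial_clone ? 0 : -1;
}
```

Because every reader goes through the one callback, the only writes to partial_clone are the two lines above, which is the kind of reasoning the commit message is after.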

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 cache.h           |  1 -
 promisor-remote.c | 10 +++++-----
 promisor-remote.h |  6 ------
 setup.c           | 10 +++++++---
 4 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index ba04ff8bd3..dbdcec8601 100644
--- a/cache.h
+++ b/cache.h
@@ -1061,7 +1061,6 @@ extern int repository_format_worktree_config;
 struct repository_format {
 	int version;
 	int precious_objects;
-	char *partial_clone; /* value of extensions.partialclone */
 	int worktree_config;
 	int is_bare;
 	int hash_algo;
diff --git a/promisor-remote.c b/promisor-remote.c
index da3f2ca261..bfe8eee5f2 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -7,11 +7,6 @@
 
 static char *repository_format_partial_clone;
 
-void set_repository_format_partial_clone(char *partial_clone)
-{
-	repository_format_partial_clone = xstrdup_or_null(partial_clone);
-}
-
 static int fetch_objects(const char *remote_name,
 			 const struct object_id *oids,
 			 int oid_nr)
@@ -99,6 +94,11 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 	size_t namelen;
 	const char *subkey;
 
+	if (!strcmp(var, "extensions.partialclone")) {
+		repository_format_partial_clone = xstrdup(value);
+		return 0;
+	}
+
 	if (parse_config_key(var, "remote", &name, &namelen, &subkey) < 0)
 		return 0;
 
diff --git a/promisor-remote.h b/promisor-remote.h
index c7a14063c5..687210ab87 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -32,10 +32,4 @@ int promisor_remote_get_direct(struct repository *repo,
 			       const struct object_id *oids,
 			       int oid_nr);
 
-/*
- * This should be used only once from setup.c to set the value we got
- * from the extensions.partialclone config option.
- */
-void set_repository_format_partial_clone(char *partial_clone);
-
 #endif /* PROMISOR_REMOTE_H */
diff --git a/setup.c b/setup.c
index 59e2facd9d..d60b6bc554 100644
--- a/setup.c
+++ b/setup.c
@@ -470,7 +470,13 @@ static enum extension_result handle_extension_v0(const char *var,
 		} else if (!strcmp(ext, "partialclone")) {
 			if (!value)
 				return config_error_nonbool(var);
-			data->partial_clone = xstrdup(value);
+			/*
+			 * This config variable will be read together with the
+			 * other relevant config variables in
+			 * promisor_remote_config() in promisor-remote.c, so we
+			 * do not need to read it here. Just report that this
+			 * extension is known.
+			 */
 			return EXTENSION_OK;
 		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
@@ -566,7 +572,6 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 	}
 
 	repository_format_precious_objects = candidate->precious_objects;
-	set_repository_format_partial_clone(candidate->partial_clone);
 	repository_format_worktree_config = candidate->worktree_config;
 	string_list_clear(&candidate->unknown_extensions, 0);
 	string_list_clear(&candidate->v1_only_extensions, 0);
@@ -650,7 +655,6 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	string_list_clear(&format->v1_only_extensions, 0);
 	free(format->work_tree);
-	free(format->partial_clone);
 	init_repository_format(format);
 }
 
-- 
2.32.0.rc0.204.g9fa02ecfa5-goog



* [PATCH 2/4] promisor-remote: support per-repository config
  2021-06-01 21:34 [PATCH 0/4] First steps towards partial clone submodules Jonathan Tan
  2021-06-01 21:34 ` [PATCH 1/4] promisor-remote: read partialClone config here Jonathan Tan
@ 2021-06-01 21:34 ` Jonathan Tan
  2021-06-04 20:09   ` Taylor Blau
                     ` (2 more replies)
  2021-06-01 21:34 ` [PATCH 3/4] run-command: move envvar-resetting function Jonathan Tan
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-01 21:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Instead of using global variables to store promisor remote information,
store this config in struct repository instead, and add
repository-agnostic non-static functions corresponding to the existing
non-static functions that only work on the_repository.

The actual lazy-fetching of missing objects currently does not work on
repositories other than the_repository, and will still not work after
this commit, so add a BUG message explaining this. A subsequent commit
will remove this limitation.
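A minimal sketch of the pattern this commit adopts: state that used to live in file-scope globals hangs off a per-repository pointer, is allocated lazily on first use, and keeps a tail pointer so remotes can be appended in O(1) while preserving config order. The types and functions (struct repo_sketch, repo_promisors(), remote_append()) are hypothetical simplifications, not Git's struct repository or its promisor-remote API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified analogues of Git's structs. */
struct remote_node {
	struct remote_node *next;
	char name[]; /* flexible array member, in the spirit of FLEX_ALLOC_STR() */
};

struct remote_list {
	struct remote_node *head;
	struct remote_node **tail; /* always points at the last "next" slot */
};

struct repo_sketch {
	struct remote_list *promisors; /* NULL until first use */
};

/* Lazily allocate the per-repository list, like promisor_remote_init(). */
static struct remote_list *repo_promisors(struct repo_sketch *r)
{
	if (!r->promisors) {
		r->promisors = calloc(1, sizeof(*r->promisors));
		if (!r->promisors)
			abort();
		r->promisors->tail = &r->promisors->head;
	}
	return r->promisors;
}

/* O(1) append through the tail pointer. */
static struct remote_node *remote_append(struct remote_list *list,
					  const char *name)
{
	size_t len = strlen(name);
	struct remote_node *n = calloc(1, sizeof(*n) + len + 1);

	if (!n)
		abort();
	assert(list->tail != NULL);
	memcpy(n->name, name, len + 1);
	*list->tail = n;
	list->tail = &n->next;
	return n;
}
```

Two repo_sketch instances never share a list, which is the property the later submodule work relies on.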

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 promisor-remote.c | 101 +++++++++++++++++++++++++---------------------
 promisor-remote.h |  20 +++++++--
 repository.h      |   4 ++
 3 files changed, 77 insertions(+), 48 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index bfe8eee5f2..5819d2cf28 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -5,7 +5,11 @@
 #include "transport.h"
 #include "strvec.h"
 
-static char *repository_format_partial_clone;
+struct promisor_remote_config {
+	char *repository_format_partial_clone;
+	struct promisor_remote *promisors;
+	struct promisor_remote **promisors_tail;
+};
 
 static int fetch_objects(const char *remote_name,
 			 const struct object_id *oids,
@@ -37,10 +41,8 @@ static int fetch_objects(const char *remote_name,
 	return finish_command(&child) ? -1 : 0;
 }
 
-static struct promisor_remote *promisors;
-static struct promisor_remote **promisors_tail = &promisors;
-
-static struct promisor_remote *promisor_remote_new(const char *remote_name)
+static struct promisor_remote *promisor_remote_new(struct promisor_remote_config *config,
+						   const char *remote_name)
 {
 	struct promisor_remote *r;
 
@@ -52,18 +54,19 @@ static struct promisor_remote *promisor_remote_new(const char *remote_name)
 
 	FLEX_ALLOC_STR(r, name, remote_name);
 
-	*promisors_tail = r;
-	promisors_tail = &r->next;
+	*config->promisors_tail = r;
+	config->promisors_tail = &r->next;
 
 	return r;
 }
 
-static struct promisor_remote *promisor_remote_lookup(const char *remote_name,
+static struct promisor_remote *promisor_remote_lookup(struct promisor_remote_config *config,
+						      const char *remote_name,
 						      struct promisor_remote **previous)
 {
 	struct promisor_remote *r, *p;
 
-	for (p = NULL, r = promisors; r; p = r, r = r->next)
+	for (p = NULL, r = config->promisors; r; p = r, r = r->next)
 		if (!strcmp(r->name, remote_name)) {
 			if (previous)
 				*previous = p;
@@ -73,7 +76,8 @@ static struct promisor_remote *promisor_remote_lookup(const char *remote_name,
 	return NULL;
 }
 
-static void promisor_remote_move_to_tail(struct promisor_remote *r,
+static void promisor_remote_move_to_tail(struct promisor_remote_config *config,
+					 struct promisor_remote *r,
 					 struct promisor_remote *previous)
 {
 	if (r->next == NULL)
@@ -82,20 +86,21 @@ static void promisor_remote_move_to_tail(struct promisor_remote *r,
 	if (previous)
 		previous->next = r->next;
 	else
-		promisors = r->next ? r->next : r;
+		config->promisors = r->next ? r->next : r;
 	r->next = NULL;
-	*promisors_tail = r;
-	promisors_tail = &r->next;
+	*config->promisors_tail = r;
+	config->promisors_tail = &r->next;
 }
 
 static int promisor_remote_config(const char *var, const char *value, void *data)
 {
+	struct promisor_remote_config *config = data;
 	const char *name;
 	size_t namelen;
 	const char *subkey;
 
 	if (!strcmp(var, "extensions.partialclone")) {
-		repository_format_partial_clone = xstrdup(value);
+		config->repository_format_partial_clone = xstrdup(value);
 		return 0;
 	}
 
@@ -110,8 +115,8 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 
 		remote_name = xmemdupz(name, namelen);
 
-		if (!promisor_remote_lookup(remote_name, NULL))
-			promisor_remote_new(remote_name);
+		if (!promisor_remote_lookup(config, remote_name, NULL))
+			promisor_remote_new(config, remote_name);
 
 		free(remote_name);
 		return 0;
@@ -120,9 +125,9 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 		struct promisor_remote *r;
 		char *remote_name = xmemdupz(name, namelen);
 
-		r = promisor_remote_lookup(remote_name, NULL);
+		r = promisor_remote_lookup(config, remote_name, NULL);
 		if (!r)
-			r = promisor_remote_new(remote_name);
+			r = promisor_remote_new(config, remote_name);
 
 		free(remote_name);
 
@@ -135,59 +140,63 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 	return 0;
 }
 
-static int initialized;
-
-static void promisor_remote_init(void)
+static void promisor_remote_init(struct repository *r)
 {
-	if (initialized)
+	struct promisor_remote_config *config;
+
+	if (r->promisor_remote_config)
 		return;
-	initialized = 1;
+	config = r->promisor_remote_config =
+		xcalloc(sizeof(*r->promisor_remote_config), 1);
+	config->promisors_tail = &config->promisors;
 
-	git_config(promisor_remote_config, NULL);
+	git_config(promisor_remote_config, config);
 
-	if (repository_format_partial_clone) {
+	if (config->repository_format_partial_clone) {
 		struct promisor_remote *o, *previous;
 
-		o = promisor_remote_lookup(repository_format_partial_clone,
+		o = promisor_remote_lookup(config,
+					   config->repository_format_partial_clone,
 					   &previous);
 		if (o)
-			promisor_remote_move_to_tail(o, previous);
+			promisor_remote_move_to_tail(config, o, previous);
 		else
-			promisor_remote_new(repository_format_partial_clone);
+			promisor_remote_new(config, config->repository_format_partial_clone);
 	}
 }
 
-static void promisor_remote_clear(void)
+static void promisor_remote_clear(struct promisor_remote_config *config)
 {
-	while (promisors) {
-		struct promisor_remote *r = promisors;
-		promisors = promisors->next;
+	while (config->promisors) {
+		struct promisor_remote *r = config->promisors;
+		config->promisors = config->promisors->next;
 		free(r);
 	}
 
-	promisors_tail = &promisors;
+	config->promisors_tail = &config->promisors;
 }
 
-void promisor_remote_reinit(void)
+void repo_promisor_remote_reinit(struct repository *r)
 {
-	initialized = 0;
-	promisor_remote_clear();
-	promisor_remote_init();
+	promisor_remote_clear(r->promisor_remote_config);
+	FREE_AND_NULL(r->promisor_remote_config);
+	promisor_remote_init(r);
 }
 
-struct promisor_remote *promisor_remote_find(const char *remote_name)
+struct promisor_remote *repo_promisor_remote_find(struct repository *r,
+						  const char *remote_name)
 {
-	promisor_remote_init();
+	promisor_remote_init(r);
 
 	if (!remote_name)
-		return promisors;
+		return r->promisor_remote_config->promisors;
 
-	return promisor_remote_lookup(remote_name, NULL);
+	return promisor_remote_lookup(r->promisor_remote_config, remote_name, NULL);
 }
 
-int has_promisor_remote(void)
+int repo_has_promisor_remote(struct repository *r)
 {
-	return !!promisor_remote_find(NULL);
+	return !!repo_promisor_remote_find(r, NULL);
 }
 
 static int remove_fetched_oids(struct repository *repo,
@@ -235,9 +244,11 @@ int promisor_remote_get_direct(struct repository *repo,
 	if (oid_nr == 0)
 		return 0;
 
-	promisor_remote_init();
+	promisor_remote_init(repo);
 
-	for (r = promisors; r; r = r->next) {
+	if (repo != the_repository)
+		BUG("only the_repository is supported for now");
+	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
 		if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) {
 			if (remaining_nr == 1)
 				continue;
diff --git a/promisor-remote.h b/promisor-remote.h
index 687210ab87..5390d3e7bf 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -17,9 +17,23 @@ struct promisor_remote {
 	const char name[FLEX_ARRAY];
 };
 
-void promisor_remote_reinit(void);
-struct promisor_remote *promisor_remote_find(const char *remote_name);
-int has_promisor_remote(void);
+void repo_promisor_remote_reinit(struct repository *r);
+static inline void promisor_remote_reinit(void)
+{
+	repo_promisor_remote_reinit(the_repository);
+}
+
+struct promisor_remote *repo_promisor_remote_find(struct repository *r, const char *remote_name);
+static inline struct promisor_remote *promisor_remote_find(const char *remote_name)
+{
+	return repo_promisor_remote_find(the_repository, remote_name);
+}
+
+int repo_has_promisor_remote(struct repository *r);
+static inline int has_promisor_remote(void)
+{
+	return repo_has_promisor_remote(the_repository);
+}
 
 /*
  * Fetches all requested objects from all promisor remotes, trying them one at
diff --git a/repository.h b/repository.h
index a45f7520fd..fc06c154e2 100644
--- a/repository.h
+++ b/repository.h
@@ -10,6 +10,7 @@ struct lock_file;
 struct pathspec;
 struct raw_object_store;
 struct submodule_cache;
+struct promisor_remote_config;
 
 enum untracked_cache_setting {
 	UNTRACKED_CACHE_UNSET = -1,
@@ -139,6 +140,9 @@ struct repository {
 	/* True if commit-graph has been disabled within this process. */
 	int commit_graph_disabled;
 
+	/* Configurations related to promisor remotes. */
+	struct promisor_remote_config *promisor_remote_config;
+
 	/* Configurations */
 
 	/* Indicate if a repository has a different 'commondir' from 'gitdir' */
-- 
2.32.0.rc0.204.g9fa02ecfa5-goog



* [PATCH 3/4] run-command: move envvar-resetting function
  2021-06-01 21:34 [PATCH 0/4] First steps towards partial clone submodules Jonathan Tan
  2021-06-01 21:34 ` [PATCH 1/4] promisor-remote: read partialClone config here Jonathan Tan
  2021-06-01 21:34 ` [PATCH 2/4] promisor-remote: support per-repository config Jonathan Tan
@ 2021-06-01 21:34 ` Jonathan Tan
  2021-06-04 20:19   ` Taylor Blau
  2021-06-08  0:54   ` Emily Shaffer
  2021-06-01 21:34 ` [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-01 21:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

There is a function that resets environment variables, used when
invoking a sub-process in a submodule. The lazy-fetching code (used in
partial clones) will need this function in a subsequent commit, so move
it to a more central location.
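A sketch of what prepare_other_repo_env() does, paired with how run-command interprets the resulting entries: "NAME=value" sets a variable in the child, and a bare "NAME" unsets it. demo_local_repo_env is a hypothetical, abbreviated stand-in for Git's local_repo_env (the real list is longer), and apply_env() changes the calling process only so the effect can be observed; Git applies such entries to the child's environment, not its own.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv()/unsetenv() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, abbreviated stand-in for local_repo_env. */
static const char *const demo_local_repo_env[] = {
	"GIT_DIR",
	"GIT_WORK_TREE",
	"GIT_OBJECT_DIRECTORY",
	"GIT_CONFIG_PARAMETERS", /* the value behind CONFIG_DATA_ENVIRONMENT */
	NULL
};

/* Collect every name except GIT_CONFIG_PARAMETERS, as the moved function does. */
static size_t other_repo_env(const char **out, size_t cap)
{
	const char *const *var;
	size_t n = 0;

	for (var = demo_local_repo_env; *var; var++) {
		if (!strcmp(*var, "GIT_CONFIG_PARAMETERS"))
			continue; /* presumably so "-c" overrides reach the child */
		assert(n < cap);
		out[n++] = *var;
	}
	return n;
}

/* "NAME=value" sets NAME; a bare "NAME" unsets it (the "reset"). */
static void apply_env(const char **env, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const char *eq = strchr(env[i], '=');

		if (!eq) {
			unsetenv(env[i]);
		} else {
			char name[256];
			size_t len = (size_t)(eq - env[i]);

			assert(len < sizeof(name));
			memcpy(name, env[i], len);
			name[len] = '\0';
			setenv(name, eq + 1, 1);
		}
	}
}
```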

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 run-command.c | 10 ++++++++++
 run-command.h |  7 +++++++
 submodule.c   | 14 ++------------
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/run-command.c b/run-command.c
index be6bc128cd..a6c458119c 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1892,3 +1892,13 @@ int run_auto_maintenance(int quiet)
 
 	return run_command(&maint);
 }
+
+void prepare_other_repo_env(struct strvec *env_array)
+{
+	const char * const *var;
+
+	for (var = local_repo_env; *var; var++) {
+		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
+			strvec_push(env_array, *var);
+	}
+}
diff --git a/run-command.h b/run-command.h
index d08414a92e..6f61ec7703 100644
--- a/run-command.h
+++ b/run-command.h
@@ -483,4 +483,11 @@ int run_processes_parallel_tr2(int n, get_next_task_fn, start_failure_fn,
 			       task_finished_fn, void *pp_cb,
 			       const char *tr2_category, const char *tr2_label);
 
+/**
+ * Convenience function that adds entries to env_array that resets all
+ * repo-specific environment variables except for CONFIG_DATA_ENVIRONMENT. See
+ * local_repo_env in cache.h for more information.
+ */
+void prepare_other_repo_env(struct strvec *env_array);
+
 #endif
diff --git a/submodule.c b/submodule.c
index 0b1d9c1dde..a30216db52 100644
--- a/submodule.c
+++ b/submodule.c
@@ -484,26 +484,16 @@ static void print_submodule_diff_summary(struct repository *r, struct rev_info *
 	strbuf_release(&sb);
 }
 
-static void prepare_submodule_repo_env_no_git_dir(struct strvec *out)
-{
-	const char * const *var;
-
-	for (var = local_repo_env; *var; var++) {
-		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
-			strvec_push(out, *var);
-	}
-}
-
 void prepare_submodule_repo_env(struct strvec *out)
 {
-	prepare_submodule_repo_env_no_git_dir(out);
+	prepare_other_repo_env(out);
 	strvec_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
 		     DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
 static void prepare_submodule_repo_env_in_gitdir(struct strvec *out)
 {
-	prepare_submodule_repo_env_no_git_dir(out);
+	prepare_other_repo_env(out);
 	strvec_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
 }
 
-- 
2.32.0.rc0.204.g9fa02ecfa5-goog



* [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-01 21:34 [PATCH 0/4] First steps towards partial clone submodules Jonathan Tan
                   ` (2 preceding siblings ...)
  2021-06-01 21:34 ` [PATCH 3/4] run-command: move envvar-resetting function Jonathan Tan
@ 2021-06-01 21:34 ` Jonathan Tan
  2021-06-04 21:25   ` Taylor Blau
                     ` (3 more replies)
  2021-06-08  0:25 ` [PATCH v2 0/4] First steps towards partial clone submodules Jonathan Tan
                   ` (3 subsequent siblings)
  7 siblings, 4 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-01 21:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This is one step towards supporting partial clone submodules.

Even after this patch, we will still lack partial clone submodules
support, primarily because a lot of Git code that accesses submodule
objects does so by adding their object stores as alternates, meaning
that any lazy fetches that would occur in the submodule would be done
based on the config of the superproject, not of the submodule. This also
means that the functionality in this patch cannot yet be tested through
user-facing commands, so for now, test this mechanism using a test helper.
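A sketch of the extra environment the lazy-fetch child receives when the target is not the_repository: first the repository-specific variables are reset, then GIT_DIR is pointed at the other repository's gitdir. Assuming entries are applied in order (as run-command does), the trailing GIT_DIR=... entry is what the child ends up with. struct repo_sketch, main_repo, reset_vars and fetch_child_env() are hypothetical, and the reset list is abbreviated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for struct repository and the_repository. */
struct repo_sketch {
	const char *gitdir;
};
static struct repo_sketch main_repo = { ".git" };

/* Abbreviated, hypothetical subset of the variables that get reset. */
static const char *const reset_vars[] = {
	"GIT_DIR", "GIT_WORK_TREE", "GIT_OBJECT_DIRECTORY", NULL
};

/*
 * Mirror the repo != the_repository branch in fetch_objects().
 * gitdir_buf receives the "GIT_DIR=..." string. Returns the number of
 * entries written to out; 0 means "inherit the environment unchanged".
 */
static size_t fetch_child_env(const struct repo_sketch *repo,
			      const char **out, size_t cap,
			      char *gitdir_buf, size_t bufsz)
{
	const char *const *var;
	size_t n = 0;
	int len;

	if (repo == &main_repo)
		return 0; /* our own environment already describes this repo */

	for (var = reset_vars; *var; var++) {
		assert(n < cap);
		out[n++] = *var; /* bare name: unset in the child */
	}

	len = snprintf(gitdir_buf, bufsz, "GIT_DIR=%s", repo->gitdir);
	assert(len > 0 && (size_t)len < bufsz);
	assert(n < cap);
	out[n++] = gitdir_buf; /* applied last, so it wins over the reset */
	return n;
}
```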

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Makefile                      |  1 +
 object-file.c                 |  7 ++-----
 promisor-remote.c             | 14 +++++++++-----
 t/helper/test-partial-clone.c | 34 ++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c          |  1 +
 t/helper/test-tool.h          |  1 +
 t/t0410-partial-clone.sh      | 24 ++++++++++++++++++++++++
 7 files changed, 72 insertions(+), 10 deletions(-)
 create mode 100644 t/helper/test-partial-clone.c

diff --git a/Makefile b/Makefile
index c3565fc0f8..f6653bcd5e 100644
--- a/Makefile
+++ b/Makefile
@@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-oidmap.o
 TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-parse-options.o
 TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
+TEST_BUILTINS_OBJS += test-partial-clone.o
 TEST_BUILTINS_OBJS += test-path-utils.o
 TEST_BUILTINS_OBJS += test-pcre2-config.o
 TEST_BUILTINS_OBJS += test-pkt-line.o
diff --git a/object-file.c b/object-file.c
index f233b440b2..ebf273e9e7 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1570,15 +1570,12 @@ static int do_oid_object_info_extended(struct repository *r,
 		}
 
 		/* Check if it is a missing object */
-		if (fetch_if_missing && has_promisor_remote() &&
-		    !already_retried && r == the_repository &&
+		if (fetch_if_missing && repo_has_promisor_remote(r) &&
+		    !already_retried &&
 		    !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
 			/*
 			 * TODO Investigate checking promisor_remote_get_direct()
 			 * TODO return value and stopping on error here.
-			 * TODO Pass a repository struct through
-			 * promisor_remote_get_direct(), such that arbitrary
-			 * repositories work.
 			 */
 			promisor_remote_get_direct(r, real, 1);
 			already_retried = 1;
diff --git a/promisor-remote.c b/promisor-remote.c
index 5819d2cf28..1601f05d79 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -11,7 +11,8 @@ struct promisor_remote_config {
 	struct promisor_remote **promisors_tail;
 };
 
-static int fetch_objects(const char *remote_name,
+static int fetch_objects(struct repository *repo,
+			 const char *remote_name,
 			 const struct object_id *oids,
 			 int oid_nr)
 {
@@ -21,6 +22,11 @@ static int fetch_objects(const char *remote_name,
 
 	child.git_cmd = 1;
 	child.in = -1;
+	if (repo != the_repository) {
+		prepare_other_repo_env(&child.env_array);
+		strvec_pushf(&child.env_array, "%s=%s", GIT_DIR_ENVIRONMENT,
+			     repo->gitdir);
+	}
 	strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
 		     "fetch", remote_name, "--no-tags",
 		     "--no-write-fetch-head", "--recurse-submodules=no",
@@ -150,7 +156,7 @@ static void promisor_remote_init(struct repository *r)
 		xcalloc(sizeof(*r->promisor_remote_config), 1);
 	config->promisors_tail = &config->promisors;
 
-	git_config(promisor_remote_config, config);
+	repo_config(r, promisor_remote_config, config);
 
 	if (config->repository_format_partial_clone) {
 		struct promisor_remote *o, *previous;
@@ -246,10 +252,8 @@ int promisor_remote_get_direct(struct repository *repo,
 
 	promisor_remote_init(repo);
 
-	if (repo != the_repository)
-		BUG("only the_repository is supported for now");
 	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
-		if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) {
+		if (fetch_objects(repo, r->name, remaining_oids, remaining_nr) < 0) {
 			if (remaining_nr == 1)
 				continue;
 			remaining_nr = remove_fetched_oids(repo, &remaining_oids,
diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c
new file mode 100644
index 0000000000..e7bc7eb21f
--- /dev/null
+++ b/t/helper/test-partial-clone.c
@@ -0,0 +1,34 @@
+#include "cache.h"
+#include "test-tool.h"
+#include "repository.h"
+#include "object-store.h"
+
+static void object_info(const char *gitdir, const char *oid_hex)
+{
+	struct repository r;
+	struct object_id oid;
+	unsigned long size;
+	struct object_info oi = {.sizep = &size};
+	const char *p;
+
+	if (repo_init(&r, gitdir, NULL))
+		die("could not init repo");
+	if (parse_oid_hex(oid_hex, &oid, &p))
+		die("could not parse oid");
+	if (oid_object_info_extended(&r, &oid, &oi, 0))
+		die("could not obtain object info");
+	printf("%d\n", (int) size);
+}
+
+int cmd__partial_clone(int argc, const char **argv)
+{
+	if (argc < 4)
+		die("too few arguments");
+
+	if (!strcmp(argv[1], "object-info"))
+		object_info(argv[2], argv[3]);
+	else
+		die("invalid argument '%s'", argv[1]);
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index c5bd0c6d4c..b21e8f1519 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -46,6 +46,7 @@ static struct test_cmd cmds[] = {
 	{ "online-cpus", cmd__online_cpus },
 	{ "parse-options", cmd__parse_options },
 	{ "parse-pathspec-file", cmd__parse_pathspec_file },
+	{ "partial-clone", cmd__partial_clone },
 	{ "path-utils", cmd__path_utils },
 	{ "pcre2-config", cmd__pcre2_config },
 	{ "pkt-line", cmd__pkt_line },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e8069a3b22..f845ced4b3 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -35,6 +35,7 @@ int cmd__oidmap(int argc, const char **argv);
 int cmd__online_cpus(int argc, const char **argv);
 int cmd__parse_options(int argc, const char **argv);
 int cmd__parse_pathspec_file(int argc, const char** argv);
+int cmd__partial_clone(int argc, const char **argv);
 int cmd__path_utils(int argc, const char **argv);
 int cmd__pcre2_config(int argc, const char **argv);
 int cmd__pkt_line(int argc, const char **argv);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 584a039b85..e804d267e6 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -604,6 +604,30 @@ test_expect_success 'do not fetch when checking existence of tree we construct o
 	git -C repo cherry-pick side1
 '
 
+test_expect_success 'lazy-fetch when accessing object not in the_repository' '
+	rm -rf full partial.git &&
+	test_create_repo full &&
+	printf 12345 >full/file.txt &&
+	git -C full add file.txt &&
+	git -C full commit -m "first commit" &&
+
+	test_config -C full uploadpack.allowfilter 1 &&
+	test_config -C full uploadpack.allowanysha1inwant 1 &&
+	git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git &&
+	FILE_HASH=$(git hash-object --stdin <full/file.txt) &&
+
+	# Sanity check that the file is missing
+	git -C partial.git rev-list --objects --missing=print HEAD >out &&
+	grep "[?]$FILE_HASH" out &&
+
+	OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") &&
+	test "$OUT" -eq 5 &&
+
+	# Sanity check that the file is now present
+	git -C partial.git rev-list --objects --missing=print HEAD >out &&
+	! grep "[?]$FILE_HASH" out
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.32.0.rc0.204.g9fa02ecfa5-goog



* Re: [PATCH 1/4] promisor-remote: read partialClone config here
  2021-06-01 21:34 ` [PATCH 1/4] promisor-remote: read partialClone config here Jonathan Tan
@ 2021-06-04 19:56   ` Taylor Blau
  2021-06-05  1:38     ` Jonathan Tan
  2021-06-07 22:41   ` Emily Shaffer
  1 sibling, 1 reply; 77+ messages in thread
From: Taylor Blau @ 2021-06-04 19:56 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Jun 01, 2021 at 02:34:16PM -0700, Jonathan Tan wrote:
> @@ -99,6 +94,11 @@ static int promisor_remote_config(const char *var, const char *value, void *data
>  	size_t namelen;
>  	const char *subkey;
>
> +	if (!strcmp(var, "extensions.partialclone")) {
> +		repository_format_partial_clone = xstrdup(value);

Can value ever be NULL here? I think the answer is "no, because we check
earlier in setup.c:handle_extension_v0()", but there is an implicit
conversion from xstrdup_or_null() to just xstrdup(), which would fault
if value were to be NULL.

Looking deeper, this path is a little confusing to me, since (in the
pre-image), handle_extension_v0() makes a copy of value and binds it to
data->partial_clone. But then check_repository_format_gently() makes
another copy of candidate->partial_clone (which is the same location as
data->partial_clone).

So, the extra copy is a little strange to me, because even though the
copy in handle_extension_v0() is definitely necessary, I'm not certain
that the one in set_repository_format_partial_clone() is. And this patch
removes the latter one, which I think is good. But we never free
repository_format_partial_clone.

Maybe that is added in a later patch, let's see...

Thanks,
Taylor


* Re: [PATCH 2/4] promisor-remote: support per-repository config
  2021-06-01 21:34 ` [PATCH 2/4] promisor-remote: support per-repository config Jonathan Tan
@ 2021-06-04 20:09   ` Taylor Blau
  2021-06-05  1:43     ` Jonathan Tan
  2021-06-04 21:21   ` Elijah Newren
  2021-06-08  0:48   ` Emily Shaffer
  2 siblings, 1 reply; 77+ messages in thread
From: Taylor Blau @ 2021-06-04 20:09 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Jun 01, 2021 at 02:34:17PM -0700, Jonathan Tan wrote:
> Instead of using global variables to store promisor remote information,
> store this config in struct repository instead, and add
> repository-agnostic non-static functions corresponding to the existing
> non-static functions that only work on the_repository.
>
> The actual lazy-fetching of missing objects currently does not work on
> repositories other than the_repository, and will still not work after
> this commit, so add a BUG message explaining this. A subsequent commit
> will remove this limitation.

Makes sense to me. I found my answer to the question that I raised
during my review of the previous patch, and I think it would make sense
to address it in an amended version of this patch.

Other than that, the translation all looked very faithful to me.

> -void promisor_remote_reinit(void)
> +void repo_promisor_remote_reinit(struct repository *r)
>  {
> -	initialized = 0;
> -	promisor_remote_clear();
> -	promisor_remote_init();
> +	promisor_remote_clear(r->promisor_remote_config);

Ah, this is probably where I would have expected to see
r->promisor_remote_config->repository_format_partial_clone freed as
well.

I wondered whether or not that should have been freed, since on first
read it seemed that this function was mostly concerned with the list of
promisor remotes rather than the structure containing them. But on a
closer look, we are re-initializing the whole structure with
promisor_remote_init(), which runs the whole promisor_remote_config
callback again.

So I do think we want to free that part of the structure, too, before
reinitializing it. I would probably do it in promisor_remote_clear().
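A minimal sketch of that suggestion: the clear function releases the partial_clone string together with the list, and resets the tail pointer so the structure can be reused. struct promisor_config_sketch, config_init(), config_append() and config_clear() are hypothetical simplifications of struct promisor_remote_config and promisor_remote_clear(); remote names are omitted from the nodes for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified analogue of struct promisor_remote_config. */
struct node {
	struct node *next;
};

struct promisor_config_sketch {
	char *partial_clone; /* owned copy of extensions.partialclone */
	struct node *head;
	struct node **tail;
};

static void config_init(struct promisor_config_sketch *c)
{
	c->partial_clone = NULL;
	c->head = NULL;
	c->tail = &c->head;
}

static void config_append(struct promisor_config_sketch *c)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		abort();
	assert(*c->tail == NULL); /* tail always points at the empty last slot */
	*c->tail = n;
	c->tail = &n->next;
}

/* Free the list *and* the string, so re-reading config cannot leak. */
static void config_clear(struct promisor_config_sketch *c)
{
	while (c->head) {
		struct node *n = c->head;

		c->head = n->next;
		free(n);
	}
	free(c->partial_clone);
	c->partial_clone = NULL; /* in the spirit of FREE_AND_NULL() */
	c->tail = &c->head;
}
```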

> @@ -235,9 +244,11 @@ int promisor_remote_get_direct(struct repository *repo,
>  	if (oid_nr == 0)
>  		return 0;
>
> -	promisor_remote_init();
> +	promisor_remote_init(repo);
>
> -	for (r = promisors; r; r = r->next) {
> +	if (repo != the_repository)
> +		BUG("only the_repository is supported for now");

I could go either way on whether this deserves a BUG() or not; I don't
have a strong feeling about it.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 3/4] run-command: move envvar-resetting function
  2021-06-01 21:34 ` [PATCH 3/4] run-command: move envvar-resetting function Jonathan Tan
@ 2021-06-04 20:19   ` Taylor Blau
  2021-06-05  1:57     ` Jonathan Tan
  2021-06-08  0:54   ` Emily Shaffer
  1 sibling, 1 reply; 77+ messages in thread
From: Taylor Blau @ 2021-06-04 20:19 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Jun 01, 2021 at 02:34:18PM -0700, Jonathan Tan wrote:
> There is a function that resets environment variables, used when
> invoking a sub-process in a submodule. The lazy-fetching code (used in
> partial clones) will need this function in a subsequent commit, so move
> it to a more central location.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>

All seems pretty normal to me. I did have one question, though:

> +/**
> + * Convenience function that adds entries to env_array that resets all

Hmm. Why "resets"? IIUC local_repo_env is the array of environment
variables that change behavior. With that understanding in mind, I
probably would have written something more like:

    Convenience function which adds all GIT_* environment variables to
    env_array with the exception of GIT_CONFIG_PARAMETERS. See
    local_repo_env in cache.h for more information.

(Confusingly, cache.h calls this variable CONFIG_DATA_ENVIRONMENT, but
binds it to GIT_CONFIG_PARAMETERS.) I think it probably makes more sense
to use the environment variable's name rather than our #define, since
we're saying "all GIT_* variables, except this one", so it would be
weird for "this one" not to start with "GIT_".

Otherwise the movement looks fine to me.
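For reference, here is a standalone sketch of what the moved helper does. It uses an illustrative subset of local_repo_env (git's real list is longer) and a hypothetical fixed-size `struct env_list` in place of git's strvec. It pushes bare variable names. My reading of run-command.h is that an env entry without "=" means "unset this variable in the child", which is arguably what "resets" refers to:

```c
#include <stddef.h>
#include <string.h>

#define CONFIG_DATA_ENVIRONMENT "GIT_CONFIG_PARAMETERS"

/* Illustrative subset only; git's real local_repo_env is longer. */
static const char *const local_repo_env[] = {
	"GIT_DIR",
	"GIT_WORK_TREE",
	CONFIG_DATA_ENVIRONMENT,
	"GIT_INDEX_FILE",
	"GIT_OBJECT_DIRECTORY",
	NULL
};

/* Hypothetical fixed-size stand-in for git's struct strvec. */
struct env_list {
	const char *items[16];
	size_t nr;
};

static void env_list_push(struct env_list *env, const char *s)
{
	if (env->nr < sizeof(env->items) / sizeof(env->items[0]))
		env->items[env->nr++] = s;
}

/*
 * Push every name in local_repo_env except GIT_CONFIG_PARAMETERS.
 * Each entry is a bare name with no "=", i.e. a request to unset
 * that variable in the child process.
 */
static void prepare_other_repo_env(struct env_list *env)
{
	const char *const *var;

	for (var = local_repo_env; *var; var++)
		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
			env_list_push(env, *var);
}
```

GIT_CONFIG_PARAMETERS is kept so that "-c" options given to the parent still reach the child, while repository-locating variables such as GIT_DIR are dropped.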

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2/4] promisor-remote: support per-repository config
  2021-06-01 21:34 ` [PATCH 2/4] promisor-remote: support per-repository config Jonathan Tan
  2021-06-04 20:09   ` Taylor Blau
@ 2021-06-04 21:21   ` Elijah Newren
  2021-06-05  1:54     ` Jonathan Tan
  2021-06-08  0:48   ` Emily Shaffer
  2 siblings, 1 reply; 77+ messages in thread
From: Elijah Newren @ 2021-06-04 21:21 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List

On Tue, Jun 1, 2021 at 2:38 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Instead of using global variables to store promisor remote information,
> store this config in struct repository instead, and add
> repository-agnostic non-static functions corresponding to the existing
> non-static functions that only work on the_repository.
>
> The actual lazy-fetching of missing objects currently does not work on
> repositories other than the_repository, and will still not work after
> this commit, so add a BUG message explaining this. A subsequent commit
> will remove this limitation.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  promisor-remote.c | 101 +++++++++++++++++++++++++---------------------
>  promisor-remote.h |  20 +++++++--
>  repository.h      |   4 ++
>  3 files changed, 77 insertions(+), 48 deletions(-)
>
> diff --git a/promisor-remote.c b/promisor-remote.c
> index bfe8eee5f2..5819d2cf28 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -5,7 +5,11 @@
>  #include "transport.h"
>  #include "strvec.h"
>
> -static char *repository_format_partial_clone;
> +struct promisor_remote_config {
> +       char *repository_format_partial_clone;
> +       struct promisor_remote *promisors;
> +       struct promisor_remote **promisors_tail;
> +};
>
>  static int fetch_objects(const char *remote_name,
>                          const struct object_id *oids,
> @@ -37,10 +41,8 @@ static int fetch_objects(const char *remote_name,
>         return finish_command(&child) ? -1 : 0;
>  }
>
> -static struct promisor_remote *promisors;
> -static struct promisor_remote **promisors_tail = &promisors;
> -
> -static struct promisor_remote *promisor_remote_new(const char *remote_name)
> +static struct promisor_remote *promisor_remote_new(struct promisor_remote_config *config,
> +                                                  const char *remote_name)
>  {
>         struct promisor_remote *r;
>
> @@ -52,18 +54,19 @@ static struct promisor_remote *promisor_remote_new(const char *remote_name)
>
>         FLEX_ALLOC_STR(r, name, remote_name);
>
> -       *promisors_tail = r;
> -       promisors_tail = &r->next;
> +       *config->promisors_tail = r;
> +       config->promisors_tail = &r->next;
>
>         return r;
>  }
>
> -static struct promisor_remote *promisor_remote_lookup(const char *remote_name,
> +static struct promisor_remote *promisor_remote_lookup(struct promisor_remote_config *config,
> +                                                     const char *remote_name,
>                                                       struct promisor_remote **previous)
>  {
>         struct promisor_remote *r, *p;
>
> -       for (p = NULL, r = promisors; r; p = r, r = r->next)
> +       for (p = NULL, r = config->promisors; r; p = r, r = r->next)
>                 if (!strcmp(r->name, remote_name)) {
>                         if (previous)
>                                 *previous = p;
> @@ -73,7 +76,8 @@ static struct promisor_remote *promisor_remote_lookup(const char *remote_name,
>         return NULL;
>  }
>
> -static void promisor_remote_move_to_tail(struct promisor_remote *r,
> +static void promisor_remote_move_to_tail(struct promisor_remote_config *config,
> +                                        struct promisor_remote *r,
>                                          struct promisor_remote *previous)
>  {
>         if (r->next == NULL)
> @@ -82,20 +86,21 @@ static void promisor_remote_move_to_tail(struct promisor_remote *r,
>         if (previous)
>                 previous->next = r->next;
>         else
> -               promisors = r->next ? r->next : r;
> +               config->promisors = r->next ? r->next : r;
>         r->next = NULL;
> -       *promisors_tail = r;
> -       promisors_tail = &r->next;
> +       *config->promisors_tail = r;
> +       config->promisors_tail = &r->next;
>  }
>
>  static int promisor_remote_config(const char *var, const char *value, void *data)
>  {
> +       struct promisor_remote_config *config = data;
>         const char *name;
>         size_t namelen;
>         const char *subkey;
>
>         if (!strcmp(var, "extensions.partialclone")) {
> -               repository_format_partial_clone = xstrdup(value);
> +               config->repository_format_partial_clone = xstrdup(value);
>                 return 0;
>         }
>
> @@ -110,8 +115,8 @@ static int promisor_remote_config(const char *var, const char *value, void *data
>
>                 remote_name = xmemdupz(name, namelen);
>
> -               if (!promisor_remote_lookup(remote_name, NULL))
> -                       promisor_remote_new(remote_name);
> +               if (!promisor_remote_lookup(config, remote_name, NULL))
> +                       promisor_remote_new(config, remote_name);
>
>                 free(remote_name);
>                 return 0;
> @@ -120,9 +125,9 @@ static int promisor_remote_config(const char *var, const char *value, void *data
>                 struct promisor_remote *r;
>                 char *remote_name = xmemdupz(name, namelen);
>
> -               r = promisor_remote_lookup(remote_name, NULL);
> +               r = promisor_remote_lookup(config, remote_name, NULL);
>                 if (!r)
> -                       r = promisor_remote_new(remote_name);
> +                       r = promisor_remote_new(config, remote_name);
>
>                 free(remote_name);
>
> @@ -135,59 +140,63 @@ static int promisor_remote_config(const char *var, const char *value, void *data
>         return 0;
>  }
>
> -static int initialized;
> -
> -static void promisor_remote_init(void)
> +static void promisor_remote_init(struct repository *r)
>  {
> -       if (initialized)
> +       struct promisor_remote_config *config;
> +
> +       if (r->promisor_remote_config)
>                 return;
> -       initialized = 1;
> +       config = r->promisor_remote_config =
> +               xcalloc(sizeof(*r->promisor_remote_config), 1);
> +       config->promisors_tail = &config->promisors;
>
> -       git_config(promisor_remote_config, NULL);
> +       git_config(promisor_remote_config, config);
>
> -       if (repository_format_partial_clone) {
> +       if (config->repository_format_partial_clone) {
>                 struct promisor_remote *o, *previous;
>
> -               o = promisor_remote_lookup(repository_format_partial_clone,
> +               o = promisor_remote_lookup(config,
> +                                          config->repository_format_partial_clone,
>                                            &previous);
>                 if (o)
> -                       promisor_remote_move_to_tail(o, previous);
> +                       promisor_remote_move_to_tail(config, o, previous);
>                 else
> -                       promisor_remote_new(repository_format_partial_clone);
> +                       promisor_remote_new(config, config->repository_format_partial_clone);
>         }
>  }
>
> -static void promisor_remote_clear(void)
> +static void promisor_remote_clear(struct promisor_remote_config *config)
>  {
> -       while (promisors) {
> -               struct promisor_remote *r = promisors;
> -               promisors = promisors->next;
> +       while (config->promisors) {
> +               struct promisor_remote *r = config->promisors;
> +               config->promisors = config->promisors->next;
>                 free(r);
>         }
>
> -       promisors_tail = &promisors;
> +       config->promisors_tail = &config->promisors;
>  }
>
> -void promisor_remote_reinit(void)
> +void repo_promisor_remote_reinit(struct repository *r)
>  {
> -       initialized = 0;
> -       promisor_remote_clear();
> -       promisor_remote_init();
> +       promisor_remote_clear(r->promisor_remote_config);
> +       FREE_AND_NULL(r->promisor_remote_config);
> +       promisor_remote_init(r);
>  }
>
> -struct promisor_remote *promisor_remote_find(const char *remote_name)
> +struct promisor_remote *repo_promisor_remote_find(struct repository *r,
> +                                                 const char *remote_name)
>  {
> -       promisor_remote_init();
> +       promisor_remote_init(r);
>
>         if (!remote_name)
> -               return promisors;
> +               return r->promisor_remote_config->promisors;
>
> -       return promisor_remote_lookup(remote_name, NULL);
> +       return promisor_remote_lookup(r->promisor_remote_config, remote_name, NULL);
>  }
>
> -int has_promisor_remote(void)
> +int repo_has_promisor_remote(struct repository *r)
>  {
> -       return !!promisor_remote_find(NULL);
> +       return !!repo_promisor_remote_find(r, NULL);
>  }
>
>  static int remove_fetched_oids(struct repository *repo,
> @@ -235,9 +244,11 @@ int promisor_remote_get_direct(struct repository *repo,
>         if (oid_nr == 0)
>                 return 0;
>
> -       promisor_remote_init();
> +       promisor_remote_init(repo);
>
> -       for (r = promisors; r; r = r->next) {
> +       if (repo != the_repository)
> +               BUG("only the_repository is supported for now");
> +       for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
>                 if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) {
>                         if (remaining_nr == 1)
>                                 continue;
> diff --git a/promisor-remote.h b/promisor-remote.h
> index 687210ab87..5390d3e7bf 100644
> --- a/promisor-remote.h
> +++ b/promisor-remote.h
> @@ -17,9 +17,23 @@ struct promisor_remote {
>         const char name[FLEX_ARRAY];
>  };
>
> -void promisor_remote_reinit(void);
> -struct promisor_remote *promisor_remote_find(const char *remote_name);
> -int has_promisor_remote(void);
> +void repo_promisor_remote_reinit(struct repository *r);
> +static inline void promisor_remote_reinit(void)
> +{
> +       repo_promisor_remote_reinit(the_repository);
> +}
> +
> +struct promisor_remote *repo_promisor_remote_find(struct repository *r, const char *remote_name);
> +static inline struct promisor_remote *promisor_remote_find(const char *remote_name)
> +{
> +       return repo_promisor_remote_find(the_repository, remote_name);
> +}
> +
> +int repo_has_promisor_remote(struct repository *r);
> +static inline int has_promisor_remote(void)
> +{
> +       return repo_has_promisor_remote(the_repository);
> +}

Is part of the plan for supporting partial clones within submodules to
audit the code for use of these inline wrappers and convert them over
to the repo_* variants?  I'm particularly interested in the
has_promisor_remote() function, since there are calls in
diffcore-rename at least that protect that call with a check against r
== the_repository.
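To illustrate what such an audit might change, here is a minimal sketch. It uses a hypothetical stand-in for struct repository (a single has_promisor flag instead of the parsed promisor config). should_prefetch_guarded() and should_prefetch_converted() are invented names, not functions in git:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal stand-in for struct repository. */
struct repository {
	const char *gitdir;
	bool has_promisor;	/* stands in for the parsed promisor config */
};

static struct repository the_repo = { ".git", false };
static struct repository *the_repository = &the_repo;

static int repo_has_promisor_remote(struct repository *r)
{
	return r->has_promisor;
}

/* Wrapper in the style of the patch: always asks about the_repository. */
static inline int has_promisor_remote(void)
{
	return repo_has_promisor_remote(the_repository);
}

/* Current guarded pattern: never true for a submodule repository. */
static int should_prefetch_guarded(struct repository *r)
{
	return r == the_repository && has_promisor_remote();
}

/* Converted pattern: consult the repository actually being operated on. */
static int should_prefetch_converted(struct repository *r)
{
	return repo_has_promisor_remote(r);
}
```

With the guard, a partial-clone submodule is never prefetched for. After conversion, the decision follows that repository's own config, which only helps once lazy fetches in other repositories actually work.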

>
>  /*
>   * Fetches all requested objects from all promisor remotes, trying them one at
> diff --git a/repository.h b/repository.h
> index a45f7520fd..fc06c154e2 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -10,6 +10,7 @@ struct lock_file;
>  struct pathspec;
>  struct raw_object_store;
>  struct submodule_cache;
> +struct promisor_remote_config;
>
>  enum untracked_cache_setting {
>         UNTRACKED_CACHE_UNSET = -1,
> @@ -139,6 +140,9 @@ struct repository {
>         /* True if commit-graph has been disabled within this process. */
>         int commit_graph_disabled;
>
> +       /* Configurations related to promisor remotes. */
> +       struct promisor_remote_config *promisor_remote_config;
> +
>         /* Configurations */
>
>         /* Indicate if a repository has a different 'commondir' from 'gitdir' */
> --
> 2.32.0.rc0.204.g9fa02ecfa5-goog

Looks like a reasonable step in moving away from globals and toward
repository-specific variants of these functions; I didn't spot any
problems, just one question about additional plans.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-01 21:34 ` [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
@ 2021-06-04 21:25   ` Taylor Blau
  2021-06-05  2:11     ` Jonathan Tan
  2021-06-04 21:35   ` Elijah Newren
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 77+ messages in thread
From: Taylor Blau @ 2021-06-04 21:25 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Jun 01, 2021 at 02:34:19PM -0700, Jonathan Tan wrote:
> This is one step towards supporting partial clone submodules.
>
> Even after this patch, we will still lack partial clone submodules
> support, primarily because a lot of Git code that accesses submodule
> objects does so by adding their object stores as alternates, meaning
> that any lazy fetches that would occur in the submodule would be done
> based on the config of the superproject, not of the submodule. This also
> prevents testing of the functionality in this patch by user-facing
> commands. So for now, test this mechanism using a test helper.

OK. Everything you wrote seemed reasonable to me, but I did have a
couple of questions on the test you added:

> diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c
> new file mode 100644
> index 0000000000..e7bc7eb21f
> --- /dev/null
> +++ b/t/helper/test-partial-clone.c
> @@ -0,0 +1,34 @@
> +#include "cache.h"
> +#include "test-tool.h"
> +#include "repository.h"
> +#include "object-store.h"
> +
> +static void object_info(const char *gitdir, const char *oid_hex)
> +{
> +	struct repository r;
> +	struct object_id oid;
> +	unsigned long size;
> +	struct object_info oi = {.sizep = &size};
> +	const char *p;
> +
> +	if (repo_init(&r, gitdir, NULL))
> +		die("could not init repo");
> +	if (parse_oid_hex(oid_hex, &oid, &p))
> +		die("could not parse oid");
> +	if (oid_object_info_extended(&r, &oid, &oi, 0))
> +		die("could not obtain object info");
> +	printf("%d\n", (int) size);
> +}

Hmm. Is there a reason that the same couldn't be implemented by calling "git
cat-file -s" from the partial clone?

> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index 584a039b85..e804d267e6 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -604,6 +604,30 @@ test_expect_success 'do not fetch when checking existence of tree we construct o
>  	git -C repo cherry-pick side1
>  '
>
> +test_expect_success 'lazy-fetch when accessing object not in the_repository' '
> +	rm -rf full partial.git &&
> +	test_create_repo full &&
> +	printf 12345 >full/file.txt &&
> +	git -C full add file.txt &&
> +	git -C full commit -m "first commit" &&

This is a stylistic nit, but I think using test_commit is better here
for a non-superficial reason. My guess is that you wanted to avoid
specifying a message and file (which are required positional arguments
to test_commit before you can specify the contents). But I think there
are two good reasons to use test_commit here:

  - It saves three lines of test script here.
  - You don't have to make the expected size a magic number (i.e.,
    because you knew ahead of time that the contents were "12345").

I probably would have expected this test to end with:

  git -C full cat-file -s $FILE_HASH >expect &&
  git -C partial.git cat-file -s $FILE_HASH >actual &&
  test_cmp expect actual

which reads more clearly to me (although I think the much more important
test is that $FILE_HASH doesn't show up in the output of the rev-list
--missing=print that is run in the partial clone).

> +
> +	test_config -C full uploadpack.allowfilter 1 &&
> +	test_config -C full uploadpack.allowanysha1inwant 1 &&
> +	git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git &&
> +	FILE_HASH=$(git hash-object --stdin <full/file.txt) &&

This works for me, although I wouldn't have been sad to see the
sub-shell contain "git -C full rev-parse HEAD:file.txt" instead.

> +	# Sanity check that the file is missing
> +	git -C partial.git rev-list --objects --missing=print HEAD >out &&
> +	grep "[?]$FILE_HASH" out &&
> +
> +	OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") &&

Coming back to my point about the utility of the partial-clone helper,
could this be replaced by saying just OUT="$(git -C partial.git cat-file
-s "$FILE_HASH")" instead?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-01 21:34 ` [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
  2021-06-04 21:25   ` Taylor Blau
@ 2021-06-04 21:35   ` Elijah Newren
  2021-06-05  2:16     ` Jonathan Tan
  2021-06-05  3:48     ` Elijah Newren
  2021-06-05  0:22   ` Elijah Newren
  2021-06-08  1:41   ` Emily Shaffer
  3 siblings, 2 replies; 77+ messages in thread
From: Elijah Newren @ 2021-06-04 21:35 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List

On Tue, Jun 1, 2021 at 2:38 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> This is one step towards supporting partial clone submodules.
>
> Even after this patch, we will still lack partial clone submodules
> support, primarily because a lot of Git code that accesses submodule
> objects does so by adding their object stores as alternates, meaning
> that any lazy fetches that would occur in the submodule would be done
> based on the config of the superproject, not of the submodule. This also
> prevents testing of the functionality in this patch by user-facing
> commands. So for now, test this mechanism using a test helper.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  Makefile                      |  1 +
>  object-file.c                 |  7 ++-----
>  promisor-remote.c             | 14 +++++++++-----
>  t/helper/test-partial-clone.c | 34 ++++++++++++++++++++++++++++++++++
>  t/helper/test-tool.c          |  1 +
>  t/helper/test-tool.h          |  1 +
>  t/t0410-partial-clone.sh      | 24 ++++++++++++++++++++++++
>  7 files changed, 72 insertions(+), 10 deletions(-)
>  create mode 100644 t/helper/test-partial-clone.c
>
> diff --git a/Makefile b/Makefile
> index c3565fc0f8..f6653bcd5e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-oidmap.o
>  TEST_BUILTINS_OBJS += test-online-cpus.o
>  TEST_BUILTINS_OBJS += test-parse-options.o
>  TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
> +TEST_BUILTINS_OBJS += test-partial-clone.o
>  TEST_BUILTINS_OBJS += test-path-utils.o
>  TEST_BUILTINS_OBJS += test-pcre2-config.o
>  TEST_BUILTINS_OBJS += test-pkt-line.o
> diff --git a/object-file.c b/object-file.c
> index f233b440b2..ebf273e9e7 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1570,15 +1570,12 @@ static int do_oid_object_info_extended(struct repository *r,
>                 }
>
>                 /* Check if it is a missing object */
> -               if (fetch_if_missing && has_promisor_remote() &&
> -                   !already_retried && r == the_repository &&
> +               if (fetch_if_missing && repo_has_promisor_remote(r) &&
> +                   !already_retried &&

So here you removed the special check against the_repository while
looking for promisor_remotes.  There are other such special checks in
the code; I also see:

diff.c: if (options->repo == the_repository && has_promisor_remote() &&
diffcore-break.c:       if (r == the_repository && has_promisor_remote()) {
diffcore-rename.c:      if (r == the_repository && has_promisor_remote()) {

and a series I'm planning to submit soon will add another to merge-ort.c.

Do these need to all be fixed as part of the partial clone submodule
support as well?  Do I need to change anything about my series?  I
guess since I'm asking that, I should probably submit it first so you
can actually see it and answer my question.  (And the timing may be
good since the area is fresh in your memory...)

>                     !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
>                         /*
>                          * TODO Investigate checking promisor_remote_get_direct()
>                          * TODO return value and stopping on error here.
> -                        * TODO Pass a repository struct through
> -                        * promisor_remote_get_direct(), such that arbitrary
> -                        * repositories work.

Odd, it appears that when this comment was added (in commit b14ed5adaf
("Use promisor_remote_get_direct() and has_promisor_remote()",
2019-06-25)), a repository was passed to promisor_remote_get_direct().
Sure, it was just a transliteration of the comment that was there
before when fetch_objects() was the function being called, but since
the code was being changed and the comment being updated, it seems the
TODO should have been removed back then.

Oh, well, good to update it now at least.

>                          */
>                         promisor_remote_get_direct(r, real, 1);
>                         already_retried = 1;
> diff --git a/promisor-remote.c b/promisor-remote.c
> index 5819d2cf28..1601f05d79 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -11,7 +11,8 @@ struct promisor_remote_config {
>         struct promisor_remote **promisors_tail;
>  };
>
> -static int fetch_objects(const char *remote_name,
> +static int fetch_objects(struct repository *repo,
> +                        const char *remote_name,
>                          const struct object_id *oids,
>                          int oid_nr)
>  {
> @@ -21,6 +22,11 @@ static int fetch_objects(const char *remote_name,
>
>         child.git_cmd = 1;
>         child.in = -1;
> +       if (repo != the_repository) {
> +               prepare_other_repo_env(&child.env_array);
> +               strvec_pushf(&child.env_array, "%s=%s", GIT_DIR_ENVIRONMENT,
> +                            repo->gitdir);
> +       }
>         strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
>                      "fetch", remote_name, "--no-tags",
>                      "--no-write-fetch-head", "--recurse-submodules=no",
> @@ -150,7 +156,7 @@ static void promisor_remote_init(struct repository *r)
>                 xcalloc(sizeof(*r->promisor_remote_config), 1);
>         config->promisors_tail = &config->promisors;
>
> -       git_config(promisor_remote_config, config);
> +       repo_config(r, promisor_remote_config, config);
>
>         if (config->repository_format_partial_clone) {
>                 struct promisor_remote *o, *previous;
> @@ -246,10 +252,8 @@ int promisor_remote_get_direct(struct repository *repo,
>
>         promisor_remote_init(repo);
>
> -       if (repo != the_repository)
> -               BUG("only the_repository is supported for now");
>         for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
> -               if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) {
> +               if (fetch_objects(repo, r->name, remaining_oids, remaining_nr) < 0) {
>                         if (remaining_nr == 1)
>                                 continue;
>                         remaining_nr = remove_fetched_oids(repo, &remaining_oids,
> diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c
> new file mode 100644
> index 0000000000..e7bc7eb21f
> --- /dev/null
> +++ b/t/helper/test-partial-clone.c
> @@ -0,0 +1,34 @@
> +#include "cache.h"
> +#include "test-tool.h"
> +#include "repository.h"
> +#include "object-store.h"
> +
> +static void object_info(const char *gitdir, const char *oid_hex)
> +{
> +       struct repository r;
> +       struct object_id oid;
> +       unsigned long size;
> +       struct object_info oi = {.sizep = &size};
> +       const char *p;
> +
> +       if (repo_init(&r, gitdir, NULL))
> +               die("could not init repo");
> +       if (parse_oid_hex(oid_hex, &oid, &p))
> +               die("could not parse oid");
> +       if (oid_object_info_extended(&r, &oid, &oi, 0))
> +               die("could not obtain object info");
> +       printf("%d\n", (int) size);
> +}
> +
> +int cmd__partial_clone(int argc, const char **argv)
> +{
> +       if (argc < 4)
> +               die("too few arguments");
> +
> +       if (!strcmp(argv[1], "object-info"))
> +               object_info(argv[2], argv[3]);
> +       else
> +               die("invalid argument '%s'", argv[1]);
> +
> +       return 0;
> +}
> diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
> index c5bd0c6d4c..b21e8f1519 100644
> --- a/t/helper/test-tool.c
> +++ b/t/helper/test-tool.c
> @@ -46,6 +46,7 @@ static struct test_cmd cmds[] = {
>         { "online-cpus", cmd__online_cpus },
>         { "parse-options", cmd__parse_options },
>         { "parse-pathspec-file", cmd__parse_pathspec_file },
> +       { "partial-clone", cmd__partial_clone },
>         { "path-utils", cmd__path_utils },
>         { "pcre2-config", cmd__pcre2_config },
>         { "pkt-line", cmd__pkt_line },
> diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
> index e8069a3b22..f845ced4b3 100644
> --- a/t/helper/test-tool.h
> +++ b/t/helper/test-tool.h
> @@ -35,6 +35,7 @@ int cmd__oidmap(int argc, const char **argv);
>  int cmd__online_cpus(int argc, const char **argv);
>  int cmd__parse_options(int argc, const char **argv);
>  int cmd__parse_pathspec_file(int argc, const char** argv);
> +int cmd__partial_clone(int argc, const char **argv);
>  int cmd__path_utils(int argc, const char **argv);
>  int cmd__pcre2_config(int argc, const char **argv);
>  int cmd__pkt_line(int argc, const char **argv);
> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index 584a039b85..e804d267e6 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -604,6 +604,30 @@ test_expect_success 'do not fetch when checking existence of tree we construct o
>         git -C repo cherry-pick side1
>  '
>
> +test_expect_success 'lazy-fetch when accessing object not in the_repository' '
> +       rm -rf full partial.git &&
> +       test_create_repo full &&
> +       printf 12345 >full/file.txt &&
> +       git -C full add file.txt &&
> +       git -C full commit -m "first commit" &&
> +
> +       test_config -C full uploadpack.allowfilter 1 &&
> +       test_config -C full uploadpack.allowanysha1inwant 1 &&
> +       git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git &&
> +       FILE_HASH=$(git hash-object --stdin <full/file.txt) &&
> +
> +       # Sanity check that the file is missing
> +       git -C partial.git rev-list --objects --missing=print HEAD >out &&
> +       grep "[?]$FILE_HASH" out &&
> +
> +       OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") &&
> +       test "$OUT" -eq 5 &&
> +
> +       # Sanity check that the file is now present
> +       git -C partial.git rev-list --objects --missing=print HEAD >out &&
> +       ! grep "[?]$FILE_HASH" out
> +'
> +
>  . "$TEST_DIRECTORY"/lib-httpd.sh
>  start_httpd
>
> --
> 2.32.0.rc0.204.g9fa02ecfa5-goog

Looks good to me.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-01 21:34 ` [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
  2021-06-04 21:25   ` Taylor Blau
  2021-06-04 21:35   ` Elijah Newren
@ 2021-06-05  0:22   ` Elijah Newren
  2021-06-05  2:16     ` Jonathan Tan
  2021-06-08  1:41   ` Emily Shaffer
  3 siblings, 1 reply; 77+ messages in thread
From: Elijah Newren @ 2021-06-05  0:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List

Hi,

On Tue, Jun 1, 2021 at 2:38 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> This is one step towards supporting partial clone submodules.
>
> Even after this patch, we will still lack partial clone submodules
> support, primarily because a lot of Git code that accesses submodule
> objects does so by adding their object stores as alternates, meaning
> that any lazy fetches that would occur in the submodule would be done
> based on the config of the superproject, not of the submodule. This also
> prevents testing of the functionality in this patch by user-facing
> commands. So for now, test this mechanism using a test helper.
>
...
> diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c
> new file mode 100644
> index 0000000000..e7bc7eb21f
> --- /dev/null
> +++ b/t/helper/test-partial-clone.c
> @@ -0,0 +1,34 @@
> +#include "cache.h"
> +#include "test-tool.h"
> +#include "repository.h"
> +#include "object-store.h"
> +
> +static void object_info(const char *gitdir, const char *oid_hex)
> +{
> +       struct repository r;
> +       struct object_id oid;
> +       unsigned long size;
> +       struct object_info oi = {.sizep = &size};
> +       const char *p;
> +
> +       if (repo_init(&r, gitdir, NULL))
> +               die("could not init repo");
> +       if (parse_oid_hex(oid_hex, &oid, &p))
> +               die("could not parse oid");
> +       if (oid_object_info_extended(&r, &oid, &oi, 0))
> +               die("could not obtain object info");
> +       printf("%d\n", (int) size);
> +}
> +
> +int cmd__partial_clone(int argc, const char **argv)
> +{
> +       if (argc < 4)
> +               die("too few arguments");
> +
> +       if (!strcmp(argv[1], "object-info"))
> +               object_info(argv[2], argv[3]);
> +       else
> +               die("invalid argument '%s'", argv[1]);
> +
> +       return 0;
> +}
> diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
> index c5bd0c6d4c..b21e8f1519 100644
> --- a/t/helper/test-tool.c
> +++ b/t/helper/test-tool.c
> @@ -46,6 +46,7 @@ static struct test_cmd cmds[] = {
>         { "online-cpus", cmd__online_cpus },
>         { "parse-options", cmd__parse_options },
>         { "parse-pathspec-file", cmd__parse_pathspec_file },
> +       { "partial-clone", cmd__partial_clone },
>         { "path-utils", cmd__path_utils },
>         { "pcre2-config", cmd__pcre2_config },
>         { "pkt-line", cmd__pkt_line },
> diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
> index e8069a3b22..f845ced4b3 100644
> --- a/t/helper/test-tool.h
> +++ b/t/helper/test-tool.h
> @@ -35,6 +35,7 @@ int cmd__oidmap(int argc, const char **argv);
>  int cmd__online_cpus(int argc, const char **argv);
>  int cmd__parse_options(int argc, const char **argv);
>  int cmd__parse_pathspec_file(int argc, const char** argv);
> +int cmd__partial_clone(int argc, const char **argv);
>  int cmd__path_utils(int argc, const char **argv);
>  int cmd__pcre2_config(int argc, const char **argv);
>  int cmd__pkt_line(int argc, const char **argv);
> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index 584a039b85..e804d267e6 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -604,6 +604,30 @@ test_expect_success 'do not fetch when checking existence of tree we construct o
>         git -C repo cherry-pick side1
>  '
>
> +test_expect_success 'lazy-fetch when accessing object not in the_repository' '
> +       rm -rf full partial.git &&
> +       test_create_repo full &&
> +       printf 12345 >full/file.txt &&
> +       git -C full add file.txt &&
> +       git -C full commit -m "first commit" &&
> +
> +       test_config -C full uploadpack.allowfilter 1 &&
> +       test_config -C full uploadpack.allowanysha1inwant 1 &&
> +       git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git &&
> +       FILE_HASH=$(git hash-object --stdin <full/file.txt) &&
> +
> +       # Sanity check that the file is missing
> +       git -C partial.git rev-list --objects --missing=print HEAD >out &&
> +       grep "[?]$FILE_HASH" out &&
> +
> +       OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") &&
> +       test "$OUT" -eq 5 &&
> +
> +       # Sanity check that the file is now present
> +       git -C partial.git rev-list --objects --missing=print HEAD >out &&
> +       ! grep "[?]$FILE_HASH" out
> +'
> +

Turns out that this test fails under GIT_TEST_DEFAULT_HASH=sha256; output:

error: wrong index v2 file size in /home/newren/floss/git/t/trash directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx
error: wrong index v2 file size in /home/newren/floss/git/t/trash directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx
fatal: couldn't find remote ref 74242c6e4a0d89f454d89d3496a1f7cb3f1f39f0
error: wrong index v2 file size in /home/newren/floss/git/t/trash directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx
error: wrong index v2 file size in /home/newren/floss/git/t/trash directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx
fatal: could not obtain object info

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 1/4] promisor-remote: read partialClone config here
  2021-06-04 19:56   ` Taylor Blau
@ 2021-06-05  1:38     ` Jonathan Tan
  0 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-05  1:38 UTC (permalink / raw)
  To: me; +Cc: jonathantanmy, git

> On Tue, Jun 01, 2021 at 02:34:16PM -0700, Jonathan Tan wrote:
> > @@ -99,6 +94,11 @@ static int promisor_remote_config(const char *var, const char *value, void *data
> >  	size_t namelen;
> >  	const char *subkey;
> >
> > +	if (!strcmp(var, "extensions.partialclone")) {
> > +		repository_format_partial_clone = xstrdup(value);
> 
> Can value ever be NULL here? I think the answer is "no, because we check
> earlier in setup.c:handle_extension_v0()", but there is an implicit
> conversion from xstrdup_or_null() to just xstrdup(), which would fault
> if value were to be NULL.

Ah, good catch. I think you're right, but it's better to be defensive
here. I'll add that check and also say that this will be caught by
setup.c.
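For illustration, a minimal standalone sketch of that defensive check. It does not use Git's headers: demo_config, demo_xstrdup() and demo_config_cb() are hypothetical stand-ins for the promisor-remote config struct, xstrdup() and promisor_remote_config(). A valueless "partialclone" line under [extensions] reaches the callback with value == NULL, so the copy is skipped instead of passing NULL to the string copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the promisor-remote config; not Git's type. */
struct demo_config {
	char *partial_clone;
};

/* Copies s or aborts on allocation failure, like Git's xstrdup(). */
static char *demo_xstrdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *ret = malloc(len);

	if (!ret)
		abort();
	memcpy(ret, s, len);
	return ret;
}

/*
 * Hypothetical config callback. A bare "partialclone" key with no
 * "= value" arrives as value == NULL; the real code relies on setup.c
 * to report that, so this sketch simply skips it.
 */
static int demo_config_cb(const char *var, const char *value, void *data)
{
	struct demo_config *config = data;

	assert(var); /* config keys are never NULL */
	if (!strcmp(var, "extensions.partialclone")) {
		if (value) {
			free(config->partial_clone); /* no-op while still NULL */
			config->partial_clone = demo_xstrdup(value);
		}
		return 0;
	}
	return 0; /* other keys are ignored in this sketch */
}
```

With the guard, a NULL value leaves the field NULL rather than crashing; the v2 range-diff in this thread adds the same "if (value)" check.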

> Looking deeper, this path is a little confusing to me, since (in the
> pre-image), handle_extension_v0() makes a copy of value and binds it to
> data->partial_clone. But then check_repository_format_gently() makes
> another copy of candidate->partial_clone (which is the same location as
> data->partial_clone).
> 
> So, the extra copy is a little strange to me, because even though the
> copy in handle_extension_v0() is definitely necessary, I'm not certain
> that the one in set_repository_format_partial_clone() is. And this patch
> removes the latter one, which I think is good. But we never free
> repository_format_partial_clone.

Yes, it's also a simplification that we go straight from config to
variable here, instead of going through the candidate struct. As for
freeing repository_format_partial_clone, this is run once and never
again (guarded by the "initialized" check), so there is no prior value to
free, and I don't think this variable needs to be freed.

> Maybe that is added in a later patch, let's see...

So I think things are fine here, but in a later patch,
repository_format_partial_clone is moved to the struct repository, which
can be cleared with repo_clear(). I'll need to add a free there :-)

> Thanks,
> Taylor

Thanks for looking at the patch set too.

* Re: [PATCH 2/4] promisor-remote: support per-repository config
  2021-06-04 20:09   ` Taylor Blau
@ 2021-06-05  1:43     ` Jonathan Tan
  0 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-05  1:43 UTC (permalink / raw)
  To: me; +Cc: jonathantanmy, git

> On Tue, Jun 01, 2021 at 02:34:17PM -0700, Jonathan Tan wrote:
> > Instead of using global variables to store promisor remote information,
> > store this config in struct repository instead, and add
> > repository-agnostic non-static functions corresponding to the existing
> > non-static functions that only work on the_repository.
> >
> > The actual lazy-fetching of missing objects currently does not work on
> > repositories other than the_repository, and will still not work after
> > this commit, so add a BUG message explaining this. A subsequent commit
> > will remove this limitation.
> 
> Makes sense to me. I found my answer to the question that I raised
> during my review of the previous patch, and I think it would make sense
> to address in an amended version of this patch.

Just to clarify, what is the question and what is the answer?

> Other than that, the translation all looked very faithful to me.
> 
> > -void promisor_remote_reinit(void)
> > +void repo_promisor_remote_reinit(struct repository *r)
> >  {
> > -	initialized = 0;
> > -	promisor_remote_clear();
> > -	promisor_remote_init();
> > +	promisor_remote_clear(r->promisor_remote_config);
> 
> Ah, this is probably where I would have expected to see
> r->promisor_remote_config->repository_format_partial_clone freed as
> well.
> 
> I wondered whether or not that should have been freed, since on first
> read it seemed that this function was mostly concerned with the list of
> promisor remotes rather than the structure containing them. But on a
> closer look, we are re-initializing the whole structure with
> promisor_remote_init(), which runs the whole promisor_remote_config
> callback again.
> 
> So I do think we want to free that part of the structure, too, before
> reinitializing it. I would probably do it in promisor_remote_clear().

I'll add that in the next version.
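A rough standalone sketch of such a clear function, assuming a simplified model: demo_remote and demo_remote_config are hypothetical, reduced versions of struct promisor_remote and struct promisor_remote_config that keep only the fields needed to show the cleanup. It frees the partial-clone string (as FREE_AND_NULL() would) along with the list, and resets the tail pointer so a later init can repopulate the struct.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced versions of Git's promisor remote structs. */
struct demo_remote {
	struct demo_remote *next;
};

struct demo_remote_config {
	char *partial_clone;                 /* extensions.partialclone value */
	struct demo_remote *promisors;       /* singly linked list head */
	struct demo_remote **promisors_tail; /* where the next node is linked */
};

/* Frees everything owned by config and leaves it ready for re-init. */
static void demo_remote_clear(struct demo_remote_config *config)
{
	assert(config);

	free(config->partial_clone);
	config->partial_clone = NULL; /* what FREE_AND_NULL() does */

	while (config->promisors) {
		struct demo_remote *r = config->promisors;

		config->promisors = r->next;
		free(r);
	}
	config->promisors_tail = &config->promisors;
}
```

Because both pointers end up NULL, calling the function twice (or from both a reinit path and a repo_clear()-style path) is harmless.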

> > @@ -235,9 +244,11 @@ int promisor_remote_get_direct(struct repository *repo,
> >  	if (oid_nr == 0)
> >  		return 0;
> >
> > -	promisor_remote_init();
> > +	promisor_remote_init(repo);
> >
> > -	for (r = promisors; r; r = r->next) {
> > +	if (repo != the_repository)
> > +		BUG("only the_repository is supported for now");
> 
> I could go either way on whether this is worthy of a BUG() or not, but I
> don't really have much of a strong feeling about it.

This BUG() will be removed in patch 4. I'm OK either way (adding BUG()
here and removing it in patch 4, or never adding BUG() in the first
place).

* Re: [PATCH 2/4] promisor-remote: support per-repository config
  2021-06-04 21:21   ` Elijah Newren
@ 2021-06-05  1:54     ` Jonathan Tan
  0 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-05  1:54 UTC (permalink / raw)
  To: newren; +Cc: jonathantanmy, git

> > +int repo_has_promisor_remote(struct repository *r);
> > +static inline int has_promisor_remote(void)
> > +{
> > +       return repo_has_promisor_remote(the_repository);
> > +}
> 
> Is part of the plan for supporting partial clones within submodules to
> audit the code for use of these inline wrappers and convert them over
> to the repo_* variants?  I'm particularly interested in the
> has_promisor_remote() function, since there are calls in
> diffcore-rename at least that protect that call with a check against r
> == the_repository.

Good point. Yes, that should be part of the plan - at least, we should
see what those invocations are for.

> > @@ -139,6 +140,9 @@ struct repository {
> >         /* True if commit-graph has been disabled within this process. */
> >         int commit_graph_disabled;
> >
> > +       /* Configurations related to promisor remotes. */
> > +       struct promisor_remote_config *promisor_remote_config;
> > +
> >         /* Configurations */
> >
> >         /* Indicate if a repository has a different 'commondir' from 'gitdir' */
> > --
> > 2.32.0.rc0.204.g9fa02ecfa5-goog
> 
> Looks like a reasonable step in moving away from globals and having
> repository-specific variants of these functions; I didn't spot any
> problems, just one question about additional plans.

Thanks for taking a look.

* Re: [PATCH 3/4] run-command: move envvar-resetting function
  2021-06-04 20:19   ` Taylor Blau
@ 2021-06-05  1:57     ` Jonathan Tan
  0 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-05  1:57 UTC (permalink / raw)
  To: me; +Cc: jonathantanmy, git

> On Tue, Jun 01, 2021 at 02:34:18PM -0700, Jonathan Tan wrote:
> > There is a function that resets environment variables, used when
> > invoking a sub-process in a submodule. The lazy-fetching code (used in
> > partial clones) will need this function in a subsequent commit, so move
> > it to a more central location.
> >
> > Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> 
> All seems pretty normal to me. I did have one question, though:
> 
> > +/**
> > + * Convenience function that adds entries to env_array that resets all
> 
> Hmm. Why "resets"? IIUC local_repo_env is the array of environment
> variables that change behavior. With that understanding in mind, I
> probably would have written something more like:
> 
>     Convenience function which adds all GIT_* environment variables to
>     env_array with the exception of GIT_CONFIG_PARAMETERS. See
>     local_repo_env in cache.h for more information.

I mentioned "reset" because the effect of adding the name without any
value makes the environment variable of that name unset in the
subprocess. I'll word it as you say, and add "When used as the env_array
of a subprocess, these entries cause the corresponding environment
variables to be unset in the subprocess." after the first sentence.
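To make the "unset" behavior concrete, here is a simplified, hypothetical model of deriving a child environment from such entries: "NAME=value" sets NAME, while a bare "NAME" removes it. demo_child_env() and its helpers are illustrations only, not Git's run-command implementation; the result borrows its strings from the inputs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the NAME part of "NAME=value", or of the whole bare "NAME". */
static size_t demo_name_len(const char *entry)
{
	const char *eq = strchr(entry, '=');

	return eq ? (size_t)(eq - entry) : strlen(entry);
}

/* True if two entries refer to the same variable name. */
static int demo_same_name(const char *a, const char *b)
{
	size_t len = demo_name_len(a);

	return len == demo_name_len(b) && !strncmp(a, b, len);
}

/*
 * Hypothetical model: copy parent[], then apply each entry of deltas[]
 * in order. "NAME=value" replaces NAME; a bare "NAME" only removes it.
 * Both arrays are NULL-terminated; the result is a malloc'd,
 * NULL-terminated array of pointers borrowed from the inputs.
 */
static const char **demo_child_env(const char **parent, const char **deltas)
{
	size_t n = 0;
	const char **env;

	assert(parent && deltas);
	for (const char **p = parent; *p; p++)
		n++;
	for (const char **d = deltas; *d; d++)
		n++;
	env = malloc((n + 1) * sizeof(*env)); /* upper bound on entries */
	if (!env)
		abort();

	n = 0;
	for (const char **p = parent; *p; p++)
		env[n++] = *p;

	for (const char **d = deltas; *d; d++) {
		size_t i, j;

		/* Drop any existing entry for the same name... */
		for (i = j = 0; i < n; i++)
			if (!demo_same_name(env[i], *d))
				env[j++] = env[i];
		n = j;
		/* ...and re-add it only if the delta carries a value. */
		if (strchr(*d, '='))
			env[n++] = *d;
	}
	env[n] = NULL;
	return env;
}
```

In this model, adding just "GIT_DIR" (one of the local_repo_env names) keeps the superproject's GIT_DIR out of a subprocess working on another repository, while GIT_CONFIG_PARAMETERS passes through untouched.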

> (Confusingly, cache.h calls this variable CONFIG_DATA_ENVIRONMENT, but
> binds it to GIT_CONFIG_PARAMETERS. I think it probably makes more sense
> to use the environment variable's name rather than our #define, since
> we're saying "all GIT_* variables, except this one", so it would be
> weird for "this one" not to start with "GIT_".)

OK, that makes sense.

> Otherwise the movement looks fine to me.
> 
> Thanks,
> Taylor

Thanks.

* Re: [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-04 21:25   ` Taylor Blau
@ 2021-06-05  2:11     ` Jonathan Tan
  0 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-05  2:11 UTC (permalink / raw)
  To: me; +Cc: jonathantanmy, git

> > +static void object_info(const char *gitdir, const char *oid_hex)
> > +{
> > +	struct repository r;
> > +	struct object_id oid;
> > +	unsigned long size;
> > +	struct object_info oi = {.sizep = &size};
> > +	const char *p;
> > +
> > +	if (repo_init(&r, gitdir, NULL))
> > +		die("could not init repo");
> > +	if (parse_oid_hex(oid_hex, &oid, &p))
> > +		die("could not parse oid");
> > +	if (oid_object_info_extended(&r, &oid, &oi, 0))
> > +		die("could not obtain object info");
> > +	printf("%d\n", (int) size);
> > +}
> 
> Hmm. Is there a reason that the same couldn't be implemented by calling "git
> cat-file -s" from the partial clone?

I don't think "git cat-file" (when run in the superproject) by itself
can access an object from a submodule. "git -C name_of_submodule
cat-file $HASH" would access that object, but I specifically want to
test oid_object_info_extended() on a repo that is not the_repository
(which would not work with -C, because the_repository would then be the
submodule).

> > +test_expect_success 'lazy-fetch when accessing object not in the_repository' '
> > +	rm -rf full partial.git &&
> > +	test_create_repo full &&
> > +	printf 12345 >full/file.txt &&
> > +	git -C full add file.txt &&
> > +	git -C full commit -m "first commit" &&
> 
> This is a stylistic nit, but I think using test_commit is better here
> for a non-superficial reason. My guess is that you wanted to avoid
> specifying a message and file (which are required positional arguments
> to test_commit before you can specify the contents). But I think there
> are two good reasons to use test_commit here:
> 
>   - It saves three lines of test script here.
>   - You don't have to make the expected size a magic number (i.e.,
>     because you knew ahead of time that the contents were "12345").
> 
> I probably would have expected this test to end with:
> 
>   git -C full cat-file -s $FILE_HASH >expect &&
>   git -C partial.git cat-file -s $FILE_HASH >actual &&
>   test_cmp expect actual
> 
> which reads more clearly to me (although I think the much more important
> test is that $FILE_HASH doesn't show up in the output of the rev-list
> --missing=print that is run in the partial clone).

That makes sense.

> > +	test_config -C full uploadpack.allowfilter 1 &&
> > +	test_config -C full uploadpack.allowanysha1inwant 1 &&
> > +	git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git &&
> > +	FILE_HASH=$(git hash-object --stdin <full/file.txt) &&
> 
> This works for me, although I wouldn't have been sad to see the
> sub-shell contain "git -C full rev-parse HEAD:file.txt" instead.

I'll do this too.

> > +	# Sanity check that the file is missing
> > +	git -C partial.git rev-list --objects --missing=print HEAD >out &&
> > +	grep "[?]$FILE_HASH" out &&
> > +
> > +	OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") &&
> 
> Coming back to my point about the utility of the partial-clone helper,
> could this be replaced by saying just OUT="$(git -C partial.git cat-file
> -s "$FILE_HASH")" instead?
> 
> Thanks,
> Taylor

Same answer as above - I specifically want to test accessing (and
thereby lazy-fetching) an object when the object is not in
the_repository. I'll add some documentation to explain what it does and
that it's equivalent to using "git -C repo cat-file -s", except that
this specifically tests another code path.

* Re: [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-04 21:35   ` Elijah Newren
@ 2021-06-05  2:16     ` Jonathan Tan
  2021-06-05  3:48     ` Elijah Newren
  1 sibling, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-05  2:16 UTC (permalink / raw)
  To: newren; +Cc: jonathantanmy, git

> > diff --git a/object-file.c b/object-file.c
> > index f233b440b2..ebf273e9e7 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -1570,15 +1570,12 @@ static int do_oid_object_info_extended(struct repository *r,
> >                 }
> >
> >                 /* Check if it is a missing object */
> > -               if (fetch_if_missing && has_promisor_remote() &&
> > -                   !already_retried && r == the_repository &&
> > +               if (fetch_if_missing && repo_has_promisor_remote(r) &&
> > +                   !already_retried &&
> 
> So here you removed the special check against the_repository while
> looking for promisor_remotes.  There are other such special checks in
> the code; I also see:
> 
> diff.c: if (options->repo == the_repository && has_promisor_remote() &&
> diffcore-break.c:       if (r == the_repository && has_promisor_remote()) {
> diffcore-rename.c:      if (r == the_repository && has_promisor_remote()) {
> 
> and a series I'm planning to submit soon will add another to merge.ort.c.
> 
> Do these need to all be fixed as part of the partial clone submodule
> support as well?  Do I need to change anything about my series?  I
> guess since I'm asking that, I should probably submit it first so you
> can actually see it and answer my question.  (And the timing may be
> good since the area is fresh in your memory...)

Thanks for raising this. Looking at the 3 you listed, they all have to
do with prefetching, so keeping the r == the_repository checks there is
fine both now and later. It is fine now because partial clones in
submodules still don't work, so no fetching of any sort (prefetching or
otherwise) will work there anyway. It will be fine later because this
prefetching is just an optimization. (Of course, we should come back and
add prefetching for submodule partial clones, but that is an
optimization, not a correctness issue.)
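A toy model of why this is an optimization rather than a correctness issue, under entirely hypothetical names (demo_store, demo_fetch(), demo_read_all()): reading an object lazily fetches it if it is still missing, so skipping the batched prefetch changes only the number of round trips to the promisor remote, not the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: which of up to 8 objects are present locally. */
struct demo_store {
	int present[8];
	int fetch_requests; /* round trips to the promisor remote */
};

/* One round trip that fetches every object listed in ids[]. */
static void demo_fetch(struct demo_store *s, const int *ids, size_t n)
{
	s->fetch_requests++;
	for (size_t i = 0; i < n; i++)
		s->present[ids[i]] = 1;
}

/* Reads one object, lazily fetching it by itself if missing. */
static int demo_read(struct demo_store *s, int id)
{
	if (!s->present[id])
		demo_fetch(s, &id, 1);
	return s->present[id];
}

/* Reads all objects; if prefetch is set, batch missing ones first. */
static int demo_read_all(struct demo_store *s, const int *ids, size_t n,
			 int prefetch)
{
	int ok = 1;

	assert(n <= 8); /* fixed-size model */
	if (prefetch) {
		int missing[8];
		size_t nr = 0;

		for (size_t i = 0; i < n; i++)
			if (!s->present[ids[i]])
				missing[nr++] = ids[i];
		if (nr)
			demo_fetch(s, missing, nr);
	}
	for (size_t i = 0; i < n; i++)
		ok &= demo_read(s, ids[i]);
	return ok;
}
```

With three missing objects, both modes succeed; the prefetching mode makes one request and the lazy-only mode makes three.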

> >                     !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
> >                         /*
> >                          * TODO Investigate checking promisor_remote_get_direct()
> >                          * TODO return value and stopping on error here.
> > -                        * TODO Pass a repository struct through
> > -                        * promisor_remote_get_direct(), such that arbitrary
> > -                        * repositories work.
> 
> Odd, it appears that when this comment was added (in commit b14ed5adaf
> ("Use promisor_remote_get_direct() and has_promisor_remote()",
> 2019-06-25)), a repository was passed to promisor_remote_get_direct().
> Sure, it was just a transliteration of the comment that was there
> before when fetch_objects() was the function being called, but since
> the code was being changed and the comment being updated, it seems the
> TODO should have been removed back then.
> 
> Oh, well, good to update it now at least.

Yes - perhaps the comment was emphasizing the "such that arbitrary
repositories work" part. But anyway, yes, it is now removed.

[snip]

> Looks good to me.

Thanks for taking a look.

* Re: [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-05  0:22   ` Elijah Newren
@ 2021-06-05  2:16     ` Jonathan Tan
  0 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-05  2:16 UTC (permalink / raw)
  To: newren; +Cc: jonathantanmy, git

> Turns out that this test fails under GIT_TEST_DEFAULT_HASH=sha256; output:
> 
> error: wrong index v2 file size in /home/newren/floss/git/t/trash directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx
> error: wrong index v2 file size in /home/newren/floss/git/t/trash directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx
> fatal: couldn't find remote ref 74242c6e4a0d89f454d89d3496a1f7cb3f1f39f0
> error: wrong index v2 file size in /home/newren/floss/git/t/trash directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx
> error: wrong index v2 file size in /home/newren/floss/git/t/trash directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx
> fatal: could not obtain object info

Thanks for noticing this. I'll take a look.

* Re: [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-04 21:35   ` Elijah Newren
  2021-06-05  2:16     ` Jonathan Tan
@ 2021-06-05  3:48     ` Elijah Newren
  1 sibling, 0 replies; 77+ messages in thread
From: Elijah Newren @ 2021-06-05  3:48 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List

A quick update...

On Fri, Jun 4, 2021 at 2:35 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Jun 1, 2021 at 2:38 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> >
...
> > diff --git a/object-file.c b/object-file.c
> > index f233b440b2..ebf273e9e7 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -1570,15 +1570,12 @@ static int do_oid_object_info_extended(struct repository *r,
> >                 }
> >
> >                 /* Check if it is a missing object */
> > -               if (fetch_if_missing && has_promisor_remote() &&
> > -                   !already_retried && r == the_repository &&
> > +               if (fetch_if_missing && repo_has_promisor_remote(r) &&
> > +                   !already_retried &&
>
> So here you removed the special check against the_repository while
> looking for promisor_remotes.  There are other such special checks in
> the code; I also see:
>
> diff.c: if (options->repo == the_repository && has_promisor_remote() &&
> diffcore-break.c:       if (r == the_repository && has_promisor_remote()) {
> diffcore-rename.c:      if (r == the_repository && has_promisor_remote()) {
>
> and a series I'm planning to submit soon will add another to merge.ort.c.
>
> Do these need to all be fixed as part of the partial clone submodule
> support as well?  Do I need to change anything about my series?  I
> guess since I'm asking that, I should probably submit it first so you
> can actually see it and answer my question.  (And the timing may be
> good since the area is fresh in your memory...)

That other topic is now over here:
https://lore.kernel.org/git/pull.969.git.1622856485.gitgitgadget@gmail.com/T/#t

* Re: [PATCH 1/4] promisor-remote: read partialClone config here
  2021-06-01 21:34 ` [PATCH 1/4] promisor-remote: read partialClone config here Jonathan Tan
  2021-06-04 19:56   ` Taylor Blau
@ 2021-06-07 22:41   ` Emily Shaffer
  1 sibling, 0 replies; 77+ messages in thread
From: Emily Shaffer @ 2021-06-07 22:41 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Jun 01, 2021 at 02:34:16PM -0700, Jonathan Tan wrote:
> 
> Currently, the reading of config related to promisor remotes is done in
> two places: once in setup.c (which sets the global variable
> repository_format_partial_clone, to be read by the code in
> promisor-remote.c), and once in promisor-remote.c. This means that care
> must be taken to ensure that repository_format_partial_clone is set
> before any code in promisor-remote.c accesses it.
> 
> To simplify the code, move all such config reading to promisor-remote.c.
> By doing this, it will be easier to see when
> repository_format_partial_clone is written and, thus, to reason about
> the code. This will be especially helpful in a subsequent commit, which
> modifies this code.

Do we reliably call promisor-remote.c:promisor_remote_config()? It's
called only during promisor_remote_init(), which happens if we call
something like promisor_remote_get_direct(). If we call that one
unconditionally (e.g. there's no "if (partial_clone)
promisor_remote_get_direct();" that I saw in a brief glance), then it's
OK, and I guess we do.
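As a rough illustration of the lazy-initialization pattern under discussion, here is a hypothetical sketch in which every public query calls an init function that parses config at most once per repository, with the config pointer in the repository struct doubling as the "initialized" flag. demo_repo, demo_promisor_init() and demo_has_promisor_remote() are invented names, and the "parsing" is faked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-ins for struct repository and its config. */
struct demo_promisor_config {
	int nr_remotes;
};

struct demo_repo {
	struct demo_promisor_config *promisor_config; /* NULL until init */
};

static int demo_config_reads; /* how many times "config" was parsed */

/* Parses config for r at most once; later calls return immediately. */
static void demo_promisor_init(struct demo_repo *r)
{
	assert(r);
	if (r->promisor_config)
		return;
	r->promisor_config = calloc(1, sizeof(*r->promisor_config));
	if (!r->promisor_config)
		abort();
	demo_config_reads++;
	r->promisor_config->nr_remotes = 1; /* pretend one remote was found */
}

/* Every public query goes through init first, so callers need not. */
static int demo_has_promisor_remote(struct demo_repo *r)
{
	demo_promisor_init(r);
	return r->promisor_config->nr_remotes > 0;
}
```

Under this pattern the question becomes whether every reader of the value goes through such an entry point, rather than whether some particular caller runs first.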

> @@ -1061,7 +1061,6 @@ extern int repository_format_worktree_config;
>  struct repository_format {
>  	int version;
>  	int precious_objects;
> -	char *partial_clone; /* value of extensions.partialclone */

I also don't see that this repository_format.partial_clone value gets
checked anywhere anyways - I only see where it's set and freed in a
brief grep - so this seems fine to me.

I saw Taylor's comment about NULL-ness and other than that, this patch
looks good to me.

Reviewed-by: Emily Shaffer <emilyshaffer@google.com>

* [PATCH v2 0/4] First steps towards partial clone submodules
  2021-06-01 21:34 [PATCH 0/4] First steps towards partial clone submodules Jonathan Tan
                   ` (3 preceding siblings ...)
  2021-06-01 21:34 ` [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
@ 2021-06-08  0:25 ` Jonathan Tan
  2021-06-08  0:25   ` [PATCH v2 1/4] promisor-remote: read partialClone config here Jonathan Tan
                     ` (4 more replies)
  2021-06-08  1:44 ` [PATCH " Emily Shaffer
                   ` (2 subsequent siblings)
  7 siblings, 5 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-08  0:25 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, me, newren, emilyshaffer

Thanks everyone for your reviews. I believe I've addressed all review
comments, including the one from Elijah about the test failing with
sha256 (which turns out to be because I didn't add a call to
setup_git_directory(), which the other test helpers do).

Jonathan Tan (4):
  promisor-remote: read partialClone config here
  promisor-remote: support per-repository config
  run-command: move envvar-resetting function
  promisor-remote: teach lazy-fetch in any repo

 Makefile                      |   1 +
 cache.h                       |   1 -
 object-file.c                 |   7 +-
 promisor-remote.c             | 125 ++++++++++++++++++++--------------
 promisor-remote.h             |  28 +++++---
 repository.c                  |   6 ++
 repository.h                  |   4 ++
 run-command.c                 |  10 +++
 run-command.h                 |   9 +++
 setup.c                       |  10 ++-
 submodule.c                   |  14 +---
 t/helper/test-partial-clone.c |  43 ++++++++++++
 t/helper/test-tool.c          |   1 +
 t/helper/test-tool.h          |   1 +
 t/t0410-partial-clone.sh      |  23 +++++++
 15 files changed, 201 insertions(+), 82 deletions(-)
 create mode 100644 t/helper/test-partial-clone.c

Range-diff against v1:
1:  4a7ad9ffeb ! 1:  07290cba86 promisor-remote: read partialClone config here
    @@ promisor-remote.c: static int promisor_remote_config(const char *var, const char
      	const char *subkey;
      
     +	if (!strcmp(var, "extensions.partialclone")) {
    -+		repository_format_partial_clone = xstrdup(value);
    ++		/*
    ++		 * NULL value is handled in handle_extension_v0 in setup.c.
    ++		 */
    ++		if (value)
    ++			repository_format_partial_clone = xstrdup(value);
     +		return 0;
     +	}
     +
2:  d8f5fa9b9f ! 2:  c462927ff2 promisor-remote: support per-repository config
    @@ promisor-remote.c: static void promisor_remote_move_to_tail(struct promisor_remo
      	const char *name;
      	size_t namelen;
      	const char *subkey;
    - 
    - 	if (!strcmp(var, "extensions.partialclone")) {
    --		repository_format_partial_clone = xstrdup(value);
    -+		config->repository_format_partial_clone = xstrdup(value);
    +@@ promisor-remote.c: static int promisor_remote_config(const char *var, const char *value, void *data
    + 		 * NULL value is handled in handle_extension_v0 in setup.c.
    + 		 */
    + 		if (value)
    +-			repository_format_partial_clone = xstrdup(value);
    ++			config->repository_format_partial_clone = xstrdup(value);
      		return 0;
      	}
      
    @@ promisor-remote.c: static int promisor_remote_config(const char *var, const char
      }
      
     -static void promisor_remote_clear(void)
    -+static void promisor_remote_clear(struct promisor_remote_config *config)
    ++void promisor_remote_clear(struct promisor_remote_config *config)
      {
     -	while (promisors) {
     -		struct promisor_remote *r = promisors;
     -		promisors = promisors->next;
    ++	FREE_AND_NULL(config->repository_format_partial_clone);
    ++
     +	while (config->promisors) {
     +		struct promisor_remote *r = config->promisors;
     +		config->promisors = config->promisors->next;
    @@ promisor-remote.h: struct promisor_remote {
     +	repo_promisor_remote_reinit(the_repository);
     +}
     +
    ++void promisor_remote_clear(struct promisor_remote_config *config);
    ++
     +struct promisor_remote *repo_promisor_remote_find(struct repository *r, const char *remote_name);
     +static inline struct promisor_remote *promisor_remote_find(const char *remote_name)
     +{
    @@ promisor-remote.h: struct promisor_remote {
      /*
       * Fetches all requested objects from all promisor remotes, trying them one at
     
    + ## repository.c ##
    +@@
    + #include "lockfile.h"
    + #include "submodule-config.h"
    + #include "sparse-index.h"
    ++#include "promisor-remote.h"
    + 
    + /* The main repository */
    + static struct repository the_repo;
    +@@ repository.c: void repo_clear(struct repository *repo)
    + 		if (repo->index != &the_index)
    + 			FREE_AND_NULL(repo->index);
    + 	}
    ++
    ++	if (repo->promisor_remote_config) {
    ++		promisor_remote_clear(repo->promisor_remote_config);
    ++		FREE_AND_NULL(repo->promisor_remote_config);
    ++	}
    + }
    + 
    + int repo_read_index(struct repository *repo)
    +
      ## repository.h ##
     @@ repository.h: struct lock_file;
      struct pathspec;
3:  c5307a9f02 ! 3:  9cbdf60981 run-command: move envvar-resetting function
    @@ run-command.h: int run_processes_parallel_tr2(int n, get_next_task_fn, start_fai
      			       const char *tr2_category, const char *tr2_label);
      
     +/**
    -+ * Convenience function that adds entries to env_array that resets all
    -+ * repo-specific environment variables except for CONFIG_DATA_ENVIRONMENT. See
    -+ * local_repo_env in cache.h for more information.
    ++ * Convenience function which adds all GIT_* environment variables to env_array
    ++ * with the exception of GIT_CONFIG_PARAMETERS. When used as the env_array of a
    ++ * subprocess, these entries cause the corresponding environment variables to
    ++ * be unset in the subprocess. See local_repo_env in cache.h for more
    ++ * information.
     + */
     +void prepare_other_repo_env(struct strvec *env_array);
     +
4:  b70a00b9b0 ! 4:  5b41569ace promisor-remote: teach lazy-fetch in any repo
    @@ t/helper/test-partial-clone.c (new)
     +#include "repository.h"
     +#include "object-store.h"
     +
    ++/*
    ++ * Prints the size of the object corresponding to the given hash in a specific
    ++ * gitdir. This is similar to "git -C gitdir cat-file -s", except that this
    ++ * exercises the code that accesses the object of an arbitrary repository that
    ++ * is not the_repository. ("git -C gitdir" makes it so that the_repository is
    ++ * the one in gitdir.)
    ++ */
     +static void object_info(const char *gitdir, const char *oid_hex)
     +{
     +	struct repository r;
    @@ t/helper/test-partial-clone.c (new)
     +
     +int cmd__partial_clone(int argc, const char **argv)
     +{
    ++	setup_git_directory();
    ++
     +	if (argc < 4)
     +		die("too few arguments");
     +
    @@ t/t0410-partial-clone.sh: test_expect_success 'do not fetch when checking existe
     +test_expect_success 'lazy-fetch when accessing object not in the_repository' '
     +	rm -rf full partial.git &&
     +	test_create_repo full &&
    -+	printf 12345 >full/file.txt &&
    -+	git -C full add file.txt &&
    -+	git -C full commit -m "first commit" &&
    ++	test_commit -C full create-a-file file.txt &&
     +
     +	test_config -C full uploadpack.allowfilter 1 &&
     +	test_config -C full uploadpack.allowanysha1inwant 1 &&
     +	git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git &&
    -+	FILE_HASH=$(git hash-object --stdin <full/file.txt) &&
    ++	FILE_HASH=$(git -C full rev-parse HEAD:file.txt) &&
     +
     +	# Sanity check that the file is missing
     +	git -C partial.git rev-list --objects --missing=print HEAD >out &&
     +	grep "[?]$FILE_HASH" out &&
     +
    -+	OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") &&
    -+	test "$OUT" -eq 5 &&
    ++	git -C full cat-file -s "$FILE_HASH" >expect &&
    ++	test-tool partial-clone object-info partial.git "$FILE_HASH" >actual &&
    ++	test_cmp expect actual &&
     +
     +	# Sanity check that the file is now present
     +	git -C partial.git rev-list --objects --missing=print HEAD >out &&
-- 
2.32.0.rc1.229.g3e70b5a671-goog


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v2 1/4] promisor-remote: read partialClone config here
  2021-06-08  0:25 ` [PATCH v2 0/4] First steps towards partial clone submodules Jonathan Tan
@ 2021-06-08  0:25   ` Jonathan Tan
  2021-06-08  3:18     ` Junio C Hamano
  2021-06-08 17:28     ` Elijah Newren
  2021-06-08  0:25   ` [PATCH v2 2/4] promisor-remote: support per-repository config Jonathan Tan
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-08  0:25 UTC
  To: git; +Cc: Jonathan Tan, me, newren, emilyshaffer

Currently, the reading of config related to promisor remotes is done in
two places: once in setup.c (which sets the global variable
repository_format_partial_clone, to be read by the code in
promisor-remote.c), and once in promisor-remote.c. This means that care
must be taken to ensure that repository_format_partial_clone is set
before any code in promisor-remote.c accesses it.

To simplify the code, move all such config reading to promisor-remote.c.
By doing this, it will be easier to see when
repository_format_partial_clone is written and, thus, to reason about
the code. This will be especially helpful in a subsequent commit, which
modifies this code.
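To make the "one writer" point concrete, here is a minimal standalone sketch. It is not Git's actual code: a single config callback is the only place where the partialclone value gets stored. The names config_cb and partial_clone are hypothetical. The sketch assumes that keys reach the callback with the section and variable name already lowercased, which is how Git's config parser delivers them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the file-private variable in promisor-remote.c. */
static char *partial_clone;

/*
 * Hypothetical config callback. It is the only code that writes
 * partial_clone. A NULL value is assumed to be rejected elsewhere
 * (in this series, setup.c reports it via config_error_nonbool()).
 */
static int config_cb(const char *var, const char *value, void *data)
{
	(void)data; /* unused in this sketch */
	assert(var != NULL);

	if (!strcmp(var, "extensions.partialclone")) {
		if (value) {
			size_t len = strlen(value) + 1;
			char *copy = malloc(len);

			if (!copy)
				abort();
			memcpy(copy, value, len);
			free(partial_clone); /* tolerate the key appearing twice */
			partial_clone = copy;
		}
		return 0;
	}

	/* remote.<name>.promisor and similar keys would be handled here. */
	return 0;
}
```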

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 cache.h           |  1 -
 promisor-remote.c | 14 +++++++++-----
 promisor-remote.h |  6 ------
 setup.c           | 10 +++++++---
 4 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index ba04ff8bd3..dbdcec8601 100644
--- a/cache.h
+++ b/cache.h
@@ -1061,7 +1061,6 @@ extern int repository_format_worktree_config;
 struct repository_format {
 	int version;
 	int precious_objects;
-	char *partial_clone; /* value of extensions.partialclone */
 	int worktree_config;
 	int is_bare;
 	int hash_algo;
diff --git a/promisor-remote.c b/promisor-remote.c
index da3f2ca261..c0e5061dfe 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -7,11 +7,6 @@
 
 static char *repository_format_partial_clone;
 
-void set_repository_format_partial_clone(char *partial_clone)
-{
-	repository_format_partial_clone = xstrdup_or_null(partial_clone);
-}
-
 static int fetch_objects(const char *remote_name,
 			 const struct object_id *oids,
 			 int oid_nr)
@@ -99,6 +94,15 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 	size_t namelen;
 	const char *subkey;
 
+	if (!strcmp(var, "extensions.partialclone")) {
+		/*
+		 * NULL value is handled in handle_extension_v0 in setup.c.
+		 */
+		if (value)
+			repository_format_partial_clone = xstrdup(value);
+		return 0;
+	}
+
 	if (parse_config_key(var, "remote", &name, &namelen, &subkey) < 0)
 		return 0;
 
diff --git a/promisor-remote.h b/promisor-remote.h
index c7a14063c5..687210ab87 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -32,10 +32,4 @@ int promisor_remote_get_direct(struct repository *repo,
 			       const struct object_id *oids,
 			       int oid_nr);
 
-/*
- * This should be used only once from setup.c to set the value we got
- * from the extensions.partialclone config option.
- */
-void set_repository_format_partial_clone(char *partial_clone);
-
 #endif /* PROMISOR_REMOTE_H */
diff --git a/setup.c b/setup.c
index 59e2facd9d..d60b6bc554 100644
--- a/setup.c
+++ b/setup.c
@@ -470,7 +470,13 @@ static enum extension_result handle_extension_v0(const char *var,
 		} else if (!strcmp(ext, "partialclone")) {
 			if (!value)
 				return config_error_nonbool(var);
-			data->partial_clone = xstrdup(value);
+			/*
+			 * This config variable will be read together with the
+			 * other relevant config variables in
+			 * promisor_remote_config() in promisor-remote.c, so we
+			 * do not need to read it here. Just report that this
+			 * extension is known.
+			 */
 			return EXTENSION_OK;
 		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
@@ -566,7 +572,6 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 	}
 
 	repository_format_precious_objects = candidate->precious_objects;
-	set_repository_format_partial_clone(candidate->partial_clone);
 	repository_format_worktree_config = candidate->worktree_config;
 	string_list_clear(&candidate->unknown_extensions, 0);
 	string_list_clear(&candidate->v1_only_extensions, 0);
@@ -650,7 +655,6 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	string_list_clear(&format->v1_only_extensions, 0);
 	free(format->work_tree);
-	free(format->partial_clone);
 	init_repository_format(format);
 }
 
-- 
2.32.0.rc1.229.g3e70b5a671-goog


* [PATCH v2 2/4] promisor-remote: support per-repository config
  2021-06-08  0:25 ` [PATCH v2 0/4] First steps towards partial clone submodules Jonathan Tan
  2021-06-08  0:25   ` [PATCH v2 1/4] promisor-remote: read partialClone config here Jonathan Tan
@ 2021-06-08  0:25   ` Jonathan Tan
  2021-06-08  3:30     ` Junio C Hamano
  2021-06-08  0:25   ` [PATCH v2 3/4] run-command: move envvar-resetting function Jonathan Tan
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 77+ messages in thread
From: Jonathan Tan @ 2021-06-08  0:25 UTC
  To: git; +Cc: Jonathan Tan, me, newren, emilyshaffer

Instead of using global variables to store promisor remote information,
store this config in struct repository, and add
repository-agnostic non-static functions corresponding to the existing
non-static functions that only work on the_repository.

The actual lazy-fetching of missing objects currently does not work on
repositories other than the_repository, and will still not work after
this commit, so add a BUG message explaining this. A subsequent commit
will remove this limitation.
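The following is a minimal standalone sketch of the pattern this commit moves to. Simplified, hypothetical types (struct repo, struct promisor_config) stand in for struct repository and struct promisor_remote_config. The state hangs off the repository and is allocated on first use, so no static "initialized" flag is needed. It can also be freed and re-read for each repository independently.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified mirrors of the structs in this patch. */
struct promisor {
	struct promisor *next;
	char name[64];
};

struct promisor_config {
	struct promisor *promisors;
	struct promisor **promisors_tail; /* where the next node gets linked */
};

struct repo {
	struct promisor_config *promisor_config; /* NULL until first use */
};

/* Lazily allocate per-repo state; later calls are no-ops. */
static void promisor_init(struct repo *r)
{
	struct promisor_config *config;

	if (r->promisor_config)
		return; /* already set up for this repo */
	config = calloc(1, sizeof(*config));
	if (!config)
		abort();
	config->promisors_tail = &config->promisors;
	r->promisor_config = config;
}

/* Append a remote in O(1) through the tail pointer. */
static struct promisor *promisor_add(struct repo *r, const char *name)
{
	struct promisor *p;

	promisor_init(r);
	assert(r->promisor_config != NULL);
	p = calloc(1, sizeof(*p));
	if (!p)
		abort();
	snprintf(p->name, sizeof(p->name), "%s", name);
	*r->promisor_config->promisors_tail = p;
	r->promisor_config->promisors_tail = &p->next;
	return p;
}

/* Free per-repo state so that it is re-read on next use. */
static void promisor_clear(struct repo *r)
{
	struct promisor_config *config = r->promisor_config;

	if (!config)
		return;
	while (config->promisors) {
		struct promisor *p = config->promisors;
		config->promisors = p->next;
		free(p);
	}
	free(config);
	r->promisor_config = NULL;
}
```

Because each repository owns its own list, initializing one repository leaves another untouched.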

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 promisor-remote.c | 103 ++++++++++++++++++++++++++--------------------
 promisor-remote.h |  22 ++++++++--
 repository.c      |   6 +++
 repository.h      |   4 ++
 4 files changed, 87 insertions(+), 48 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index c0e5061dfe..e1e1f7e93a 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -5,7 +5,11 @@
 #include "transport.h"
 #include "strvec.h"
 
-static char *repository_format_partial_clone;
+struct promisor_remote_config {
+	char *repository_format_partial_clone;
+	struct promisor_remote *promisors;
+	struct promisor_remote **promisors_tail;
+};
 
 static int fetch_objects(const char *remote_name,
 			 const struct object_id *oids,
@@ -37,10 +41,8 @@ static int fetch_objects(const char *remote_name,
 	return finish_command(&child) ? -1 : 0;
 }
 
-static struct promisor_remote *promisors;
-static struct promisor_remote **promisors_tail = &promisors;
-
-static struct promisor_remote *promisor_remote_new(const char *remote_name)
+static struct promisor_remote *promisor_remote_new(struct promisor_remote_config *config,
+						   const char *remote_name)
 {
 	struct promisor_remote *r;
 
@@ -52,18 +54,19 @@ static struct promisor_remote *promisor_remote_new(const char *remote_name)
 
 	FLEX_ALLOC_STR(r, name, remote_name);
 
-	*promisors_tail = r;
-	promisors_tail = &r->next;
+	*config->promisors_tail = r;
+	config->promisors_tail = &r->next;
 
 	return r;
 }
 
-static struct promisor_remote *promisor_remote_lookup(const char *remote_name,
+static struct promisor_remote *promisor_remote_lookup(struct promisor_remote_config *config,
+						      const char *remote_name,
 						      struct promisor_remote **previous)
 {
 	struct promisor_remote *r, *p;
 
-	for (p = NULL, r = promisors; r; p = r, r = r->next)
+	for (p = NULL, r = config->promisors; r; p = r, r = r->next)
 		if (!strcmp(r->name, remote_name)) {
 			if (previous)
 				*previous = p;
@@ -73,7 +76,8 @@ static struct promisor_remote *promisor_remote_lookup(const char *remote_name,
 	return NULL;
 }
 
-static void promisor_remote_move_to_tail(struct promisor_remote *r,
+static void promisor_remote_move_to_tail(struct promisor_remote_config *config,
+					 struct promisor_remote *r,
 					 struct promisor_remote *previous)
 {
 	if (r->next == NULL)
@@ -82,14 +86,15 @@ static void promisor_remote_move_to_tail(struct promisor_remote *r,
 	if (previous)
 		previous->next = r->next;
 	else
-		promisors = r->next ? r->next : r;
+		config->promisors = r->next ? r->next : r;
 	r->next = NULL;
-	*promisors_tail = r;
-	promisors_tail = &r->next;
+	*config->promisors_tail = r;
+	config->promisors_tail = &r->next;
 }
 
 static int promisor_remote_config(const char *var, const char *value, void *data)
 {
+	struct promisor_remote_config *config = data;
 	const char *name;
 	size_t namelen;
 	const char *subkey;
@@ -99,7 +104,7 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 		 * NULL value is handled in handle_extension_v0 in setup.c.
 		 */
 		if (value)
-			repository_format_partial_clone = xstrdup(value);
+			config->repository_format_partial_clone = xstrdup(value);
 		return 0;
 	}
 
@@ -114,8 +119,8 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 
 		remote_name = xmemdupz(name, namelen);
 
-		if (!promisor_remote_lookup(remote_name, NULL))
-			promisor_remote_new(remote_name);
+		if (!promisor_remote_lookup(config, remote_name, NULL))
+			promisor_remote_new(config, remote_name);
 
 		free(remote_name);
 		return 0;
@@ -124,9 +129,9 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 		struct promisor_remote *r;
 		char *remote_name = xmemdupz(name, namelen);
 
-		r = promisor_remote_lookup(remote_name, NULL);
+		r = promisor_remote_lookup(config, remote_name, NULL);
 		if (!r)
-			r = promisor_remote_new(remote_name);
+			r = promisor_remote_new(config, remote_name);
 
 		free(remote_name);
 
@@ -139,59 +144,65 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 	return 0;
 }
 
-static int initialized;
-
-static void promisor_remote_init(void)
+static void promisor_remote_init(struct repository *r)
 {
-	if (initialized)
+	struct promisor_remote_config *config;
+
+	if (r->promisor_remote_config)
 		return;
-	initialized = 1;
+	config = r->promisor_remote_config =
+		xcalloc(sizeof(*r->promisor_remote_config), 1);
+	config->promisors_tail = &config->promisors;
 
-	git_config(promisor_remote_config, NULL);
+	git_config(promisor_remote_config, config);
 
-	if (repository_format_partial_clone) {
+	if (config->repository_format_partial_clone) {
 		struct promisor_remote *o, *previous;
 
-		o = promisor_remote_lookup(repository_format_partial_clone,
+		o = promisor_remote_lookup(config,
+					   config->repository_format_partial_clone,
 					   &previous);
 		if (o)
-			promisor_remote_move_to_tail(o, previous);
+			promisor_remote_move_to_tail(config, o, previous);
 		else
-			promisor_remote_new(repository_format_partial_clone);
+			promisor_remote_new(config, config->repository_format_partial_clone);
 	}
 }
 
-static void promisor_remote_clear(void)
+void promisor_remote_clear(struct promisor_remote_config *config)
 {
-	while (promisors) {
-		struct promisor_remote *r = promisors;
-		promisors = promisors->next;
+	FREE_AND_NULL(config->repository_format_partial_clone);
+
+	while (config->promisors) {
+		struct promisor_remote *r = config->promisors;
+		config->promisors = config->promisors->next;
 		free(r);
 	}
 
-	promisors_tail = &promisors;
+	config->promisors_tail = &config->promisors;
 }
 
-void promisor_remote_reinit(void)
+void repo_promisor_remote_reinit(struct repository *r)
 {
-	initialized = 0;
-	promisor_remote_clear();
-	promisor_remote_init();
+	promisor_remote_clear(r->promisor_remote_config);
+	FREE_AND_NULL(r->promisor_remote_config);
+	promisor_remote_init(r);
 }
 
-struct promisor_remote *promisor_remote_find(const char *remote_name)
+struct promisor_remote *repo_promisor_remote_find(struct repository *r,
+						  const char *remote_name)
 {
-	promisor_remote_init();
+	promisor_remote_init(r);
 
 	if (!remote_name)
-		return promisors;
+		return r->promisor_remote_config->promisors;
 
-	return promisor_remote_lookup(remote_name, NULL);
+	return promisor_remote_lookup(r->promisor_remote_config, remote_name, NULL);
 }
 
-int has_promisor_remote(void)
+int repo_has_promisor_remote(struct repository *r)
 {
-	return !!promisor_remote_find(NULL);
+	return !!repo_promisor_remote_find(r, NULL);
 }
 
 static int remove_fetched_oids(struct repository *repo,
@@ -239,9 +250,11 @@ int promisor_remote_get_direct(struct repository *repo,
 	if (oid_nr == 0)
 		return 0;
 
-	promisor_remote_init();
+	promisor_remote_init(repo);
 
-	for (r = promisors; r; r = r->next) {
+	if (repo != the_repository)
+		BUG("only the_repository is supported for now");
+	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
 		if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) {
 			if (remaining_nr == 1)
 				continue;
diff --git a/promisor-remote.h b/promisor-remote.h
index 687210ab87..edc45ab0f5 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -17,9 +17,25 @@ struct promisor_remote {
 	const char name[FLEX_ARRAY];
 };
 
-void promisor_remote_reinit(void);
-struct promisor_remote *promisor_remote_find(const char *remote_name);
-int has_promisor_remote(void);
+void repo_promisor_remote_reinit(struct repository *r);
+static inline void promisor_remote_reinit(void)
+{
+	repo_promisor_remote_reinit(the_repository);
+}
+
+void promisor_remote_clear(struct promisor_remote_config *config);
+
+struct promisor_remote *repo_promisor_remote_find(struct repository *r, const char *remote_name);
+static inline struct promisor_remote *promisor_remote_find(const char *remote_name)
+{
+	return repo_promisor_remote_find(the_repository, remote_name);
+}
+
+int repo_has_promisor_remote(struct repository *r);
+static inline int has_promisor_remote(void)
+{
+	return repo_has_promisor_remote(the_repository);
+}
 
 /*
  * Fetches all requested objects from all promisor remotes, trying them one at
diff --git a/repository.c b/repository.c
index 448cd557d4..dca0a11ab6 100644
--- a/repository.c
+++ b/repository.c
@@ -11,6 +11,7 @@
 #include "lockfile.h"
 #include "submodule-config.h"
 #include "sparse-index.h"
+#include "promisor-remote.h"
 
 /* The main repository */
 static struct repository the_repo;
@@ -258,6 +259,11 @@ void repo_clear(struct repository *repo)
 		if (repo->index != &the_index)
 			FREE_AND_NULL(repo->index);
 	}
+
+	if (repo->promisor_remote_config) {
+		promisor_remote_clear(repo->promisor_remote_config);
+		FREE_AND_NULL(repo->promisor_remote_config);
+	}
 }
 
 int repo_read_index(struct repository *repo)
diff --git a/repository.h b/repository.h
index a45f7520fd..fc06c154e2 100644
--- a/repository.h
+++ b/repository.h
@@ -10,6 +10,7 @@ struct lock_file;
 struct pathspec;
 struct raw_object_store;
 struct submodule_cache;
+struct promisor_remote_config;
 
 enum untracked_cache_setting {
 	UNTRACKED_CACHE_UNSET = -1,
@@ -139,6 +140,9 @@ struct repository {
 	/* True if commit-graph has been disabled within this process. */
 	int commit_graph_disabled;
 
+	/* Configurations related to promisor remotes. */
+	struct promisor_remote_config *promisor_remote_config;
+
 	/* Configurations */
 
 	/* Indicate if a repository has a different 'commondir' from 'gitdir' */
-- 
2.32.0.rc1.229.g3e70b5a671-goog


* [PATCH v2 3/4] run-command: move envvar-resetting function
  2021-06-08  0:25 ` [PATCH v2 0/4] First steps towards partial clone submodules Jonathan Tan
  2021-06-08  0:25   ` [PATCH v2 1/4] promisor-remote: read partialClone config here Jonathan Tan
  2021-06-08  0:25   ` [PATCH v2 2/4] promisor-remote: support per-repository config Jonathan Tan
@ 2021-06-08  0:25   ` Jonathan Tan
  2021-06-08  4:14     ` Junio C Hamano
  2021-06-08  0:25   ` [PATCH v2 4/4] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
  2021-06-08 17:50   ` [PATCH v2 0/4] First steps towards partial clone submodules Elijah Newren
  4 siblings, 1 reply; 77+ messages in thread
From: Jonathan Tan @ 2021-06-08  0:25 UTC
  To: git; +Cc: Jonathan Tan, me, newren, emilyshaffer

There is a function that resets environment variables, used when
invoking a sub-process in a submodule. The lazy-fetching code (used in
partial clones) will need this function in a subsequent commit, so move
it to a more central location.
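For readers unfamiliar with local_repo_env, here is a minimal sketch of what the moved function does. It uses a plain array instead of struct strvec and an abbreviated, hypothetical list of variable names. It collects every repo-specific variable name except GIT_CONFIG_PARAMETERS, which is the value of CONFIG_DATA_ENVIRONMENT. In run-command, an env entry without "=" asks for that variable to be unset in the child.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, abbreviated stand-in for local_repo_env;
 * the real list in Git is longer.
 */
static const char *const example_repo_env[] = {
	"GIT_DIR",
	"GIT_WORK_TREE",
	"GIT_INDEX_FILE",
	"GIT_CONFIG_PARAMETERS",
	NULL
};

/*
 * Copy every name except GIT_CONFIG_PARAMETERS into out[] (capacity cap).
 * Leaving GIT_CONFIG_PARAMETERS off the unset list lets "git -c" overrides
 * reach the child. Returns the number of entries written.
 */
static size_t other_repo_env(const char **out, size_t cap)
{
	size_t n = 0;
	const char *const *var;

	for (var = example_repo_env; *var; var++) {
		if (!strcmp(*var, "GIT_CONFIG_PARAMETERS"))
			continue;
		assert(n < cap);
		out[n++] = *var;
	}
	return n;
}
```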

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 run-command.c | 10 ++++++++++
 run-command.h |  9 +++++++++
 submodule.c   | 14 ++------------
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/run-command.c b/run-command.c
index be6bc128cd..a6c458119c 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1892,3 +1892,13 @@ int run_auto_maintenance(int quiet)
 
 	return run_command(&maint);
 }
+
+void prepare_other_repo_env(struct strvec *env_array)
+{
+	const char * const *var;
+
+	for (var = local_repo_env; *var; var++) {
+		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
+			strvec_push(env_array, *var);
+	}
+}
diff --git a/run-command.h b/run-command.h
index d08414a92e..a1d9107f5b 100644
--- a/run-command.h
+++ b/run-command.h
@@ -483,4 +483,13 @@ int run_processes_parallel_tr2(int n, get_next_task_fn, start_failure_fn,
 			       task_finished_fn, void *pp_cb,
 			       const char *tr2_category, const char *tr2_label);
 
+/**
+ * Convenience function which adds all GIT_* environment variables to env_array
+ * with the exception of GIT_CONFIG_PARAMETERS. When used as the env_array of a
+ * subprocess, these entries cause the corresponding environment variables to
+ * be unset in the subprocess. See local_repo_env in cache.h for more
+ * information.
+ */
+void prepare_other_repo_env(struct strvec *env_array);
+
 #endif
diff --git a/submodule.c b/submodule.c
index 0b1d9c1dde..a30216db52 100644
--- a/submodule.c
+++ b/submodule.c
@@ -484,26 +484,16 @@ static void print_submodule_diff_summary(struct repository *r, struct rev_info *
 	strbuf_release(&sb);
 }
 
-static void prepare_submodule_repo_env_no_git_dir(struct strvec *out)
-{
-	const char * const *var;
-
-	for (var = local_repo_env; *var; var++) {
-		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
-			strvec_push(out, *var);
-	}
-}
-
 void prepare_submodule_repo_env(struct strvec *out)
 {
-	prepare_submodule_repo_env_no_git_dir(out);
+	prepare_other_repo_env(out);
 	strvec_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
 		     DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
 static void prepare_submodule_repo_env_in_gitdir(struct strvec *out)
 {
-	prepare_submodule_repo_env_no_git_dir(out);
+	prepare_other_repo_env(out);
 	strvec_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
 }
 
-- 
2.32.0.rc1.229.g3e70b5a671-goog


* [PATCH v2 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-08  0:25 ` [PATCH v2 0/4] First steps towards partial clone submodules Jonathan Tan
                     ` (2 preceding siblings ...)
  2021-06-08  0:25   ` [PATCH v2 3/4] run-command: move envvar-resetting function Jonathan Tan
@ 2021-06-08  0:25   ` Jonathan Tan
  2021-06-08  4:33     ` Junio C Hamano
  2021-06-08 17:42     ` Elijah Newren
  2021-06-08 17:50   ` [PATCH v2 0/4] First steps towards partial clone submodules Elijah Newren
  4 siblings, 2 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-08  0:25 UTC
  To: git; +Cc: Jonathan Tan, me, newren, emilyshaffer

This is one step towards supporting partial clone submodules.

Even after this patch, we will still lack support for partial clone
submodules, primarily because a lot of Git code that accesses submodule
objects does so by adding their object stores as alternates, meaning
that any lazy fetches that would occur in the submodule would be done
based on the config of the superproject, not of the submodule. This also
prevents testing of the functionality in this patch by user-facing
commands. So for now, test this mechanism using a test helper.
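Here is a rough standalone sketch of the child environment that fetch_objects() builds in this patch when repo is not the_repository. It uses plain arrays in place of struct strvec and a hypothetical child_env_for() helper. The array first lists the bare names to unset, then adds a GIT_DIR=<gitdir> entry. Entries are applied in order, so the later assignment overrides the earlier unset of GIT_DIR. prepare_submodule_repo_env() already uses this same arrangement.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Abbreviated, hypothetical subset of the variables cleared for another repo. */
static const char *const unset_vars[] = {
	"GIT_DIR", "GIT_WORK_TREE", "GIT_INDEX_FILE", NULL
};

/*
 * Fill env[] (capacity cap) for a child git process that must operate on
 * gitdir. Bare names request an unset; the final "GIT_DIR=..." entry,
 * formatted into gitdir_buf, then sets GIT_DIR for the child.
 * Returns the number of entries written.
 */
static size_t child_env_for(const char *gitdir, const char **env, size_t cap,
			    char *gitdir_buf, size_t bufsz)
{
	size_t n = 0;
	const char *const *var;

	for (var = unset_vars; *var; var++) {
		assert(n < cap);
		env[n++] = *var;
	}
	snprintf(gitdir_buf, bufsz, "GIT_DIR=%s", gitdir);
	assert(n < cap);
	env[n++] = gitdir_buf;
	return n;
}
```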

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Makefile                      |  1 +
 object-file.c                 |  7 ++----
 promisor-remote.c             | 14 ++++++++----
 t/helper/test-partial-clone.c | 43 +++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c          |  1 +
 t/helper/test-tool.h          |  1 +
 t/t0410-partial-clone.sh      | 23 +++++++++++++++++++
 7 files changed, 80 insertions(+), 10 deletions(-)
 create mode 100644 t/helper/test-partial-clone.c

diff --git a/Makefile b/Makefile
index c3565fc0f8..f6653bcd5e 100644
--- a/Makefile
+++ b/Makefile
@@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-oidmap.o
 TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-parse-options.o
 TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
+TEST_BUILTINS_OBJS += test-partial-clone.o
 TEST_BUILTINS_OBJS += test-path-utils.o
 TEST_BUILTINS_OBJS += test-pcre2-config.o
 TEST_BUILTINS_OBJS += test-pkt-line.o
diff --git a/object-file.c b/object-file.c
index f233b440b2..ebf273e9e7 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1570,15 +1570,12 @@ static int do_oid_object_info_extended(struct repository *r,
 		}
 
 		/* Check if it is a missing object */
-		if (fetch_if_missing && has_promisor_remote() &&
-		    !already_retried && r == the_repository &&
+		if (fetch_if_missing && repo_has_promisor_remote(r) &&
+		    !already_retried &&
 		    !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
 			/*
 			 * TODO Investigate checking promisor_remote_get_direct()
 			 * TODO return value and stopping on error here.
-			 * TODO Pass a repository struct through
-			 * promisor_remote_get_direct(), such that arbitrary
-			 * repositories work.
 			 */
 			promisor_remote_get_direct(r, real, 1);
 			already_retried = 1;
diff --git a/promisor-remote.c b/promisor-remote.c
index e1e1f7e93a..1491374d65 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -11,7 +11,8 @@ struct promisor_remote_config {
 	struct promisor_remote **promisors_tail;
 };
 
-static int fetch_objects(const char *remote_name,
+static int fetch_objects(struct repository *repo,
+			 const char *remote_name,
 			 const struct object_id *oids,
 			 int oid_nr)
 {
@@ -21,6 +22,11 @@ static int fetch_objects(const char *remote_name,
 
 	child.git_cmd = 1;
 	child.in = -1;
+	if (repo != the_repository) {
+		prepare_other_repo_env(&child.env_array);
+		strvec_pushf(&child.env_array, "%s=%s", GIT_DIR_ENVIRONMENT,
+			     repo->gitdir);
+	}
 	strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
 		     "fetch", remote_name, "--no-tags",
 		     "--no-write-fetch-head", "--recurse-submodules=no",
@@ -154,7 +160,7 @@ static void promisor_remote_init(struct repository *r)
 		xcalloc(sizeof(*r->promisor_remote_config), 1);
 	config->promisors_tail = &config->promisors;
 
-	git_config(promisor_remote_config, config);
+	repo_config(r, promisor_remote_config, config);
 
 	if (config->repository_format_partial_clone) {
 		struct promisor_remote *o, *previous;
@@ -252,10 +258,8 @@ int promisor_remote_get_direct(struct repository *repo,
 
 	promisor_remote_init(repo);
 
-	if (repo != the_repository)
-		BUG("only the_repository is supported for now");
 	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
-		if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) {
+		if (fetch_objects(repo, r->name, remaining_oids, remaining_nr) < 0) {
 			if (remaining_nr == 1)
 				continue;
 			remaining_nr = remove_fetched_oids(repo, &remaining_oids,
diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c
new file mode 100644
index 0000000000..3f102cfddd
--- /dev/null
+++ b/t/helper/test-partial-clone.c
@@ -0,0 +1,43 @@
+#include "cache.h"
+#include "test-tool.h"
+#include "repository.h"
+#include "object-store.h"
+
+/*
+ * Prints the size of the object corresponding to the given hash in a specific
+ * gitdir. This is similar to "git -C gitdir cat-file -s", except that this
+ * exercises the code that accesses the object of an arbitrary repository that
+ * is not the_repository. ("git -C gitdir" makes it so that the_repository is
+ * the one in gitdir.)
+ */
+static void object_info(const char *gitdir, const char *oid_hex)
+{
+	struct repository r;
+	struct object_id oid;
+	unsigned long size;
+	struct object_info oi = {.sizep = &size};
+	const char *p;
+
+	if (repo_init(&r, gitdir, NULL))
+		die("could not init repo");
+	if (parse_oid_hex(oid_hex, &oid, &p))
+		die("could not parse oid");
+	if (oid_object_info_extended(&r, &oid, &oi, 0))
+		die("could not obtain object info");
+	printf("%d\n", (int) size);
+}
+
+int cmd__partial_clone(int argc, const char **argv)
+{
+	setup_git_directory();
+
+	if (argc < 4)
+		die("too few arguments");
+
+	if (!strcmp(argv[1], "object-info"))
+		object_info(argv[2], argv[3]);
+	else
+		die("invalid argument '%s'", argv[1]);
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index c5bd0c6d4c..b21e8f1519 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -46,6 +46,7 @@ static struct test_cmd cmds[] = {
 	{ "online-cpus", cmd__online_cpus },
 	{ "parse-options", cmd__parse_options },
 	{ "parse-pathspec-file", cmd__parse_pathspec_file },
+	{ "partial-clone", cmd__partial_clone },
 	{ "path-utils", cmd__path_utils },
 	{ "pcre2-config", cmd__pcre2_config },
 	{ "pkt-line", cmd__pkt_line },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e8069a3b22..f845ced4b3 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -35,6 +35,7 @@ int cmd__oidmap(int argc, const char **argv);
 int cmd__online_cpus(int argc, const char **argv);
 int cmd__parse_options(int argc, const char **argv);
 int cmd__parse_pathspec_file(int argc, const char** argv);
+int cmd__partial_clone(int argc, const char **argv);
 int cmd__path_utils(int argc, const char **argv);
 int cmd__pcre2_config(int argc, const char **argv);
 int cmd__pkt_line(int argc, const char **argv);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 584a039b85..a211a66c67 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -604,6 +604,29 @@ test_expect_success 'do not fetch when checking existence of tree we construct o
 	git -C repo cherry-pick side1
 '
 
+test_expect_success 'lazy-fetch when accessing object not in the_repository' '
+	rm -rf full partial.git &&
+	test_create_repo full &&
+	test_commit -C full create-a-file file.txt &&
+
+	test_config -C full uploadpack.allowfilter 1 &&
+	test_config -C full uploadpack.allowanysha1inwant 1 &&
+	git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git &&
+	FILE_HASH=$(git -C full rev-parse HEAD:file.txt) &&
+
+	# Sanity check that the file is missing
+	git -C partial.git rev-list --objects --missing=print HEAD >out &&
+	grep "[?]$FILE_HASH" out &&
+
+	git -C full cat-file -s "$FILE_HASH" >expect &&
+	test-tool partial-clone object-info partial.git "$FILE_HASH" >actual &&
+	test_cmp expect actual &&
+
+	# Sanity check that the file is now present
+	git -C partial.git rev-list --objects --missing=print HEAD >out &&
+	! grep "[?]$FILE_HASH" out
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.32.0.rc1.229.g3e70b5a671-goog


* Re: [PATCH 2/4] promisor-remote: support per-repository config
  2021-06-01 21:34 ` [PATCH 2/4] promisor-remote: support per-repository config Jonathan Tan
  2021-06-04 20:09   ` Taylor Blau
  2021-06-04 21:21   ` Elijah Newren
@ 2021-06-08  0:48   ` Emily Shaffer
  2 siblings, 0 replies; 77+ messages in thread
From: Emily Shaffer @ 2021-06-08  0:48 UTC
  To: Jonathan Tan; +Cc: git

On Tue, Jun 01, 2021 at 02:34:17PM -0700, Jonathan Tan wrote:
> 
> Instead of using global variables to store promisor remote information,
> store this config in struct repository instead, and add
> repository-agnostic non-static functions corresponding to the existing
> non-static functions that only work on the_repository.

Nice!

> diff --git a/promisor-remote.c b/promisor-remote.c
> index bfe8eee5f2..5819d2cf28 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -5,7 +5,11 @@
>  #include "transport.h"
>  #include "strvec.h"
>  
> -static char *repository_format_partial_clone;
> +struct promisor_remote_config {
> +	char *repository_format_partial_clone;
> +	struct promisor_remote *promisors;
> +	struct promisor_remote **promisors_tail;
> +};
[snip]
> -static struct promisor_remote *promisors;
> -static struct promisor_remote **promisors_tail = &promisors;

Nice, so all the old globals are contained in a data struct now...

> -static struct promisor_remote *promisor_remote_new(const char *remote_name)
> +static struct promisor_remote *promisor_remote_new(struct promisor_remote_config *config,
> +						   const char *remote_name)

...which we use during promisor_remote creation now.

> @@ -135,59 +140,63 @@ static int promisor_remote_config(const char *var, const char *value, void *data
>  	return 0;
>  }
>  
> -static int initialized;

Very happy to see this pattern vanquished. ;)

> -static void promisor_remote_init(void)
> +static void promisor_remote_init(struct repository *r)
>  {
> -	if (initialized)
> +	struct promisor_remote_config *config;
> +
> +	if (r->promisor_remote_config)
>  		return;
> -	initialized = 1;

So it's not a "call once, set global state" anymore, but we are also
careful not to trample any existing state on the 'struct repository'.
Nice.

> +	config = r->promisor_remote_config =
> +		xcalloc(sizeof(*r->promisor_remote_config), 1);

Hm. Am I the only one who doesn't like assigning from the result of an
assignment like this? ...Based on 'git grep "=[^=]\+=[^=]"' yes, I am the
only one :)


Overall this patch looks OK to me, but I see that there are some changes
suggested for the next version, so I'll hold off on a reviewed-by so I
can have a look at those too.

 - Emily

* Re: [PATCH 3/4] run-command: move envvar-resetting function
  2021-06-01 21:34 ` [PATCH 3/4] run-command: move envvar-resetting function Jonathan Tan
  2021-06-04 20:19   ` Taylor Blau
@ 2021-06-08  0:54   ` Emily Shaffer
  1 sibling, 0 replies; 77+ messages in thread
From: Emily Shaffer @ 2021-06-08  0:54 UTC
  To: Jonathan Tan; +Cc: git

On Tue, Jun 01, 2021 at 02:34:18PM -0700, Jonathan Tan wrote:
> 
> There is a function that resets environment variables, used when
> invoking a sub-process in a submodule. The lazy-fetching code (used in
> partial clones) will need this function in a subsequent commit, so move
> it to a more central location.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  run-command.c | 10 ++++++++++
>  run-command.h |  7 +++++++
>  submodule.c   | 14 ++------------
>  3 files changed, 19 insertions(+), 12 deletions(-)
> diff --git a/run-command.h b/run-command.h
> index d08414a92e..6f61ec7703 100644
> --- a/run-command.h
> +++ b/run-command.h
> @@ -483,4 +483,11 @@ int run_processes_parallel_tr2(int n, get_next_task_fn, start_failure_fn,
>  			       task_finished_fn, void *pp_cb,
>  			       const char *tr2_category, const char *tr2_label);
>  
> +/**
> + * Convenience function that adds entries to env_array that resets all
> + * repo-specific environment variables except for CONFIG_DATA_ENVIRONMENT. See
> + * local_repo_env in cache.h for more information.
> + */
> +void prepare_other_repo_env(struct strvec *env_array);

I like the name change on the function as well.

>  void prepare_submodule_repo_env(struct strvec *out)
>  {
> -	prepare_submodule_repo_env_no_git_dir(out);
> +	prepare_other_repo_env(out);
>  	strvec_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
>  		     DEFAULT_GIT_DIR_ENVIRONMENT);
>  }
>  
>  static void prepare_submodule_repo_env_in_gitdir(struct strvec *out)
>  {
> -	prepare_submodule_repo_env_no_git_dir(out);
> +	prepare_other_repo_env(out);
>  	strvec_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
>  }

This call was used in fewer places than I thought (I guess that's part of
why you're making it more public/central), so my worry about having some
large-scale change was for nothing.

As for Taylor's comment about the CONFIG_DATA_ENVIRONMENT variable, it
was named like that before you got here, so I am not too worried whether
or not you change it.
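For illustration, a standalone sketch of what prepare_other_repo_env() does. The short local_repo_env list and the fixed-size env_list are hypothetical stand-ins: the real list is longer, and Git uses a strvec. As the v2 doc comment describes, a bare "NAME" entry without '=' in a child's env array causes NAME to be unset in the child.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of Git's local_repo_env; the real list is longer. */
static const char * const local_repo_env[] = {
	"GIT_ALTERNATE_OBJECT_DIRECTORIES",
	"GIT_CONFIG",
	"GIT_CONFIG_PARAMETERS",
	"GIT_DIR",
	"GIT_WORK_TREE",
	NULL
};

/* Hypothetical fixed-size stand-in for struct strvec. */
#define MAX_ENV 16
struct env_list {
	const char *v[MAX_ENV];
	size_t nr;
};

static void env_push(struct env_list *env, const char *entry)
{
	assert(env->nr < MAX_ENV);
	env->v[env->nr++] = entry;
}

/*
 * Push every repo-local variable name except GIT_CONFIG_PARAMETERS, so
 * the child forgets the parent's repository but keeps "-c" overrides.
 */
static void prepare_other_repo_env(struct env_list *env)
{
	const char * const *var;

	for (var = local_repo_env; *var; var++) {
		if (strcmp(*var, "GIT_CONFIG_PARAMETERS"))
			env_push(env, *var);
	}
}
```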

Reviewed-by: Emily Shaffer <emilyshaffer@google.com>


* Re: [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-01 21:34 ` [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
                     ` (2 preceding siblings ...)
  2021-06-05  0:22   ` Elijah Newren
@ 2021-06-08  1:41   ` Emily Shaffer
  2021-06-09  4:52     ` Jonathan Tan
  3 siblings, 1 reply; 77+ messages in thread
From: Emily Shaffer @ 2021-06-08  1:41 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Jun 01, 2021 at 02:34:19PM -0700, Jonathan Tan wrote:
> 
> This is one step towards supporting partial clone submodules.
> 
> Even after this patch, we will still lack partial clone submodules
> support, primarily because a lot of Git code that accesses submodule
> objects does so by adding their object stores as alternates, meaning
> that any lazy fetches that would occur in the submodule would be done
> based on the config of the superproject, not of the submodule. This also
> prevents testing of the functionality in this patch by user-facing
> commands. So for now, test this mechanism using a test helper.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  Makefile                      |  1 +
>  object-file.c                 |  7 ++-----
>  promisor-remote.c             | 14 +++++++++-----
>  t/helper/test-partial-clone.c | 34 ++++++++++++++++++++++++++++++++++
>  t/helper/test-tool.c          |  1 +
>  t/helper/test-tool.h          |  1 +
>  t/t0410-partial-clone.sh      | 24 ++++++++++++++++++++++++
>  7 files changed, 72 insertions(+), 10 deletions(-)
>  create mode 100644 t/helper/test-partial-clone.c
> 
> diff --git a/Makefile b/Makefile
> index c3565fc0f8..f6653bcd5e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-oidmap.o
>  TEST_BUILTINS_OBJS += test-online-cpus.o
>  TEST_BUILTINS_OBJS += test-parse-options.o
>  TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
> +TEST_BUILTINS_OBJS += test-partial-clone.o
>  TEST_BUILTINS_OBJS += test-path-utils.o
>  TEST_BUILTINS_OBJS += test-pcre2-config.o
>  TEST_BUILTINS_OBJS += test-pkt-line.o
> diff --git a/object-file.c b/object-file.c
> index f233b440b2..ebf273e9e7 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1570,15 +1570,12 @@ static int do_oid_object_info_extended(struct repository *r,
>  		}
>  
>  		/* Check if it is a missing object */
> -		if (fetch_if_missing && has_promisor_remote() &&
> -		    !already_retried && r == the_repository &&
> +		if (fetch_if_missing && repo_has_promisor_remote(r) &&
> +		    !already_retried &&

So we remove the explicit "if we've got promisors and are operating on
the repo we launched in" and instead ask "if the repo we're operating on
has promisors" - definitely a step towards in-process submodule
happiness :)

>  		    !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
>  			/*
>  			 * TODO Investigate checking promisor_remote_get_direct()
>  			 * TODO return value and stopping on error here.
> -			 * TODO Pass a repository struct through
> -			 * promisor_remote_get_direct(), such that arbitrary
> -			 * repositories work.
>  			 */
>  			promisor_remote_get_direct(r, real, 1);

And this seems like a stale comment, since I see we were already passing
'r' here. But arbitrary repositories still don't just work, right? Or, I
guess your point was "partial clone + submodules don't just work,
because of the alternates thing" - so maybe this part is OK?
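For illustration, a toy model of the "fetch at most once, then retry the lookup" logic in the hunk above, with the promisor check asked of the repository being read rather than of the_repository. Everything here (toy_repo, toy_fetch, toy_object_info) is hypothetical. Only the fetch_if_missing flag name and the already_retried idea come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy repository with a single object of interest. */
struct toy_repo {
	bool has_promisor;      /* stand-in for repo_has_promisor_remote(r) */
	bool object_present;    /* stand-in for the local object lookup */
	bool remote_has_object; /* whether the promisor can supply it */
	int fetches;            /* number of lazy fetches attempted */
};

static bool fetch_if_missing = true; /* mirrors the flag named in the patch */

static void toy_fetch(struct toy_repo *r)
{
	r->fetches++;
	if (r->remote_has_object)
		r->object_present = true;
}

/* Returns 0 if the object is (now) present, -1 if it is missing. */
static int toy_object_info(struct toy_repo *r, bool skip_fetch)
{
	bool already_retried = false;

	for (;;) {
		if (r->object_present)
			return 0;
		/* Consult this repository's promisors, not the_repository's. */
		if (fetch_if_missing && r->has_promisor &&
		    !already_retried && !skip_fetch) {
			toy_fetch(r);
			already_retried = true;
			continue; /* retry the lookup exactly once */
		}
		return -1;
	}
}
```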

> @@ -150,7 +156,7 @@ static void promisor_remote_init(struct repository *r)
>  		xcalloc(sizeof(*r->promisor_remote_config), 1);
>  	config->promisors_tail = &config->promisors;
>  
> -	git_config(promisor_remote_config, config);
> +	repo_config(r, promisor_remote_config, config);

Should this change have happened when we added 'r' to
promisor_remote_init? If r==the_repository then there's no difference
between these two calls, right?
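For illustration, a toy model of the wrapper relationship the question assumes: git_config() acting as repo_config() bound to the_repository. The structs, the single "toy.gitdir" key, and remember_value() are hypothetical. The point is that switching git_config() to repo_config(r, ...) only changes behavior when r is not the_repository.

```c
#include <assert.h>
#include <string.h>

struct repository {
	const char *gitdir;
};

static struct repository the_repo = { ".git" };
static struct repository *the_repository = &the_repo;

typedef int (*config_fn_t)(const char *var, const char *value, void *data);

/* Toy reader: reports one made-up key whose value names its repository. */
static void repo_config(struct repository *r, config_fn_t fn, void *data)
{
	fn("toy.gitdir", r->gitdir, data);
}

/* Old-style entry point, implicitly bound to the_repository. */
static void git_config(config_fn_t fn, void *data)
{
	repo_config(the_repository, fn, data);
}

static int remember_value(const char *var, const char *value, void *data)
{
	(void)var;
	*(const char **)data = value;
	return 0;
}
```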

> diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c
> new file mode 100644
> index 0000000000..e7bc7eb21f
> --- /dev/null
> +++ b/t/helper/test-partial-clone.c
> @@ -0,0 +1,34 @@
> +#include "cache.h"
> +#include "test-tool.h"
> +#include "repository.h"
> +#include "object-store.h"
> +
> +static void object_info(const char *gitdir, const char *oid_hex)
> +{
> +	struct repository r;
> +	struct object_id oid;
> +	unsigned long size;
> +	struct object_info oi = {.sizep = &size};
> +	const char *p;
> +
> +	if (repo_init(&r, gitdir, NULL))
> +		die("could not init repo");
> +	if (parse_oid_hex(oid_hex, &oid, &p))
> +		die("could not parse oid");
> +	if (oid_object_info_extended(&r, &oid, &oi, 0))
> +		die("could not obtain object info");
> +	printf("%d\n", (int) size);
> +}
> +
> +int cmd__partial_clone(int argc, const char **argv)
> +{
> +	if (argc < 4)
> +		die("too few arguments");
> +
> +	if (!strcmp(argv[1], "object-info"))
> +		object_info(argv[2], argv[3]);
> +	else
> +		die("invalid argument '%s'", argv[1]);
> +
> +	return 0;
> +}
> diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
> index c5bd0c6d4c..b21e8f1519 100644
> --- a/t/helper/test-tool.c
> +++ b/t/helper/test-tool.c
> @@ -46,6 +46,7 @@ static struct test_cmd cmds[] = {
>  	{ "online-cpus", cmd__online_cpus },
>  	{ "parse-options", cmd__parse_options },
>  	{ "parse-pathspec-file", cmd__parse_pathspec_file },
> +	{ "partial-clone", cmd__partial_clone },
>  	{ "path-utils", cmd__path_utils },
>  	{ "pcre2-config", cmd__pcre2_config },
>  	{ "pkt-line", cmd__pkt_line },
> diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
> index e8069a3b22..f845ced4b3 100644
> --- a/t/helper/test-tool.h
> +++ b/t/helper/test-tool.h
> @@ -35,6 +35,7 @@ int cmd__oidmap(int argc, const char **argv);
>  int cmd__online_cpus(int argc, const char **argv);
>  int cmd__parse_options(int argc, const char **argv);
>  int cmd__parse_pathspec_file(int argc, const char** argv);
> +int cmd__partial_clone(int argc, const char **argv);
>  int cmd__path_utils(int argc, const char **argv);
>  int cmd__pcre2_config(int argc, const char **argv);
>  int cmd__pkt_line(int argc, const char **argv);
> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index 584a039b85..e804d267e6 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -604,6 +604,30 @@ test_expect_success 'do not fetch when checking existence of tree we construct o
>  	git -C repo cherry-pick side1
>  '
>  
> +test_expect_success 'lazy-fetch when accessing object not in the_repository' '
> +	rm -rf full partial.git &&
> +	test_create_repo full &&
> +	printf 12345 >full/file.txt &&
> +	git -C full add file.txt &&
> +	git -C full commit -m "first commit" &&

I think there is some test_commit or similar function here that's more
commonly used, right?

> +
> +	test_config -C full uploadpack.allowfilter 1 &&
> +	test_config -C full uploadpack.allowanysha1inwant 1 &&

I wasn't sure what these configs are for, but it looks like .allowfilter
is to allow 'full' to serve as a remote to a partial clone. But what do
you need .allowAnySha1InWant for here? Are we expecting to ask for SHAs
that 'full' doesn't have?

> +	git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git &&
> +	FILE_HASH=$(git hash-object --stdin <full/file.txt) &&
> +
> +	# Sanity check that the file is missing
> +	git -C partial.git rev-list --objects --missing=print HEAD >out &&
> +	grep "[?]$FILE_HASH" out &&
> +
> +	OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") &&
> +	test "$OUT" -eq 5 &&

Hm. I guess I am confused about why this fetches the desired object into
partial.git. Maybe the test-helper needs a comment (and maybe here too)
on the line where fetch will be triggered?

> +
> +	# Sanity check that the file is now present
> +	git -C partial.git rev-list --objects --missing=print HEAD >out &&
> +	! grep "[?]$FILE_HASH" out

 - Emily


* Re: [PATCH 0/4] First steps towards partial clone submodules
  2021-06-01 21:34 [PATCH 0/4] First steps towards partial clone submodules Jonathan Tan
                   ` (4 preceding siblings ...)
  2021-06-08  0:25 ` [PATCH v2 0/4] First steps towards partial clone submodules Jonathan Tan
@ 2021-06-08  1:44 ` Emily Shaffer
  2021-06-10 17:35 ` [PATCH v3 0/5] " Jonathan Tan
  2021-06-17 17:13 ` [PATCH v4 " Jonathan Tan
  7 siblings, 0 replies; 77+ messages in thread
From: Emily Shaffer @ 2021-06-08  1:44 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Jun 01, 2021 at 02:34:15PM -0700, Jonathan Tan wrote:
> 
> This is a preliminary step towards supporting partial clone submodules
> (e.g., by cloning with --recurse-submodules and having the given filter
> propagate to submodules). Even with this patch set, we won't be there
> yet (notably, some code in Git accesses objects in submodules by adding
> them as alternates - so lazy-fetching missing objects in submodules
> wouldn't work here), but at least this is a first step.
> 
> This patch set would also be useful if Git needed to operate on
> other repositories (other than in the submodule case), but I can't think
> of such a situation right now.
> 
> As mentioned, there is still more work that needs to be done. Any help
> is appreciated, and as for me, I hope to get back to this in the 3rd
> quarter of the year.
> 

I see there's a v2 that came while I was still reviewing, oops. But
overall I like this series:

 - It's small
 - It does reasonable code cleanup which benefits the codebase on its
   own
 - It paves the way for a series later on without being part of that
   series, meaning that the later series will be slightly smaller
   because of it (a lesson I should learn for myself)

Thanks. I'll try and review v2 later in the week.

 - Emily


* Re: [PATCH v2 1/4] promisor-remote: read partialClone config here
  2021-06-08  0:25   ` [PATCH v2 1/4] promisor-remote: read partialClone config here Jonathan Tan
@ 2021-06-08  3:18     ` Junio C Hamano
  2021-06-09  4:26       ` Jonathan Tan
  2021-06-08 17:28     ` Elijah Newren
  1 sibling, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2021-06-08  3:18 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, me, newren, emilyshaffer

Jonathan Tan <jonathantanmy@google.com> writes:

> Currently, the reading of config related to promisor remotes is done in
> two places: once in setup.c (which sets the global variable
> repository_format_partial_clone, to be read by the code in
> promisor-remote.c), and once in promisor-remote.c. This means that care
> must be taken to ensure that repository_format_partial_clone is set
> before any code in promisor-remote.c accesses it.

The above is very true, but I am puzzled by the chosen direction of
the code movement.

Given that the value in the field repository_format.partial_clone
comes from an extension, and an extension that is not understood by
the version of Git that is running MUST abort the execution of Git,
wouldn't it be guaranteed that, in a correctly written program, the
.partial_clone field must already be set up correctly before
anything else, including those in promisor-remote.c, accesses it?

> To simplify the code, move all such config reading to promisor-remote.c.
> By doing this, it will be easier to see when
> repository_format_partial_clone is written and, thus, to reason about
> the code. This will be especially helpful in a subsequent commit, which
> modifies this code.

So, I am not sure if this simplifies the code the way we want to
read our code.  Doing a thing in one place is indeed simpler than
doing it in two places, but it looks like promisor-remote code
should be using the repository-format data more, not the other way
around, at least to me.

Perhaps I am missing some other motivation, though.

Thanks.

>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  cache.h           |  1 -
>  promisor-remote.c | 14 +++++++++-----
>  promisor-remote.h |  6 ------
>  setup.c           | 10 +++++++---
>  4 files changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index ba04ff8bd3..dbdcec8601 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1061,7 +1061,6 @@ extern int repository_format_worktree_config;
>  struct repository_format {
>  	int version;
>  	int precious_objects;
> -	char *partial_clone; /* value of extensions.partialclone */
>  	int worktree_config;
>  	int is_bare;
>  	int hash_algo;
> diff --git a/promisor-remote.c b/promisor-remote.c
> index da3f2ca261..c0e5061dfe 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -7,11 +7,6 @@
>  
>  static char *repository_format_partial_clone;
>  
> -void set_repository_format_partial_clone(char *partial_clone)
> -{
> -	repository_format_partial_clone = xstrdup_or_null(partial_clone);
> -}
> -
>  static int fetch_objects(const char *remote_name,
>  			 const struct object_id *oids,
>  			 int oid_nr)
> @@ -99,6 +94,15 @@ static int promisor_remote_config(const char *var, const char *value, void *data
>  	size_t namelen;
>  	const char *subkey;
>  
> +	if (!strcmp(var, "extensions.partialclone")) {
> +		/*
> +		 * NULL value is handled in handle_extension_v0 in setup.c.
> +		 */
> +		if (value)
> +			repository_format_partial_clone = xstrdup(value);
> +		return 0;
> +	}
> +
>  	if (parse_config_key(var, "remote", &name, &namelen, &subkey) < 0)
>  		return 0;
>  
> diff --git a/promisor-remote.h b/promisor-remote.h
> index c7a14063c5..687210ab87 100644
> --- a/promisor-remote.h
> +++ b/promisor-remote.h
> @@ -32,10 +32,4 @@ int promisor_remote_get_direct(struct repository *repo,
>  			       const struct object_id *oids,
>  			       int oid_nr);
>  
> -/*
> - * This should be used only once from setup.c to set the value we got
> - * from the extensions.partialclone config option.
> - */
> -void set_repository_format_partial_clone(char *partial_clone);
> -
>  #endif /* PROMISOR_REMOTE_H */
> diff --git a/setup.c b/setup.c
> index 59e2facd9d..d60b6bc554 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -470,7 +470,13 @@ static enum extension_result handle_extension_v0(const char *var,
>  		} else if (!strcmp(ext, "partialclone")) {
>  			if (!value)
>  				return config_error_nonbool(var);
> -			data->partial_clone = xstrdup(value);
> +			/*
> +			 * This config variable will be read together with the
> +			 * other relevant config variables in
> +			 * promisor_remote_config() in promisor_remote.c, so we
> +			 * do not need to read it here. Just report that this
> +			 * extension is known.
> +			 */
>  			return EXTENSION_OK;
>  		} else if (!strcmp(ext, "worktreeconfig")) {
>  			data->worktree_config = git_config_bool(var, value);
> @@ -566,7 +572,6 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
>  	}
>  
>  	repository_format_precious_objects = candidate->precious_objects;
> -	set_repository_format_partial_clone(candidate->partial_clone);
>  	repository_format_worktree_config = candidate->worktree_config;
>  	string_list_clear(&candidate->unknown_extensions, 0);
>  	string_list_clear(&candidate->v1_only_extensions, 0);
> @@ -650,7 +655,6 @@ void clear_repository_format(struct repository_format *format)
>  	string_list_clear(&format->unknown_extensions, 0);
>  	string_list_clear(&format->v1_only_extensions, 0);
>  	free(format->work_tree);
> -	free(format->partial_clone);
>  	init_repository_format(format);
>  }


* Re: [PATCH v2 2/4] promisor-remote: support per-repository config
  2021-06-08  0:25   ` [PATCH v2 2/4] promisor-remote: support per-repository config Jonathan Tan
@ 2021-06-08  3:30     ` Junio C Hamano
  2021-06-09  4:29       ` Jonathan Tan
  0 siblings, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2021-06-08  3:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, me, newren, emilyshaffer

Jonathan Tan <jonathantanmy@google.com> writes:

> Instead of using global variables to store promisor remote information,
> store this config in struct repository instead, and add
> repository-agnostic non-static functions corresponding to the existing
> non-static functions that only work on the_repository.

This does make sense.  In general, repository extensions are per
repository, so anything read from "extensions.*" should be stored
per in-core repository structure.

But doesn't that mean the thing that should be fixed is on the
setup.c side, where not just extensions.partialClone but other data
is read into "struct repository_format *format"?  Shouldn't we have
a pointer to that struct in the in-core repository object?

Special-casing the "partialClone" field alone feels somewhat strange
to me.

Thanks.


> The actual lazy-fetching of missing objects currently does not work on
> repositories other than the_repository, and will still not work after
> this commit, so add a BUG message explaining this. A subsequent commit
> will remove this limitation.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  promisor-remote.c | 103 ++++++++++++++++++++++++++--------------------
>  promisor-remote.h |  22 ++++++++--
>  repository.c      |   6 +++
>  repository.h      |   4 ++
>  4 files changed, 87 insertions(+), 48 deletions(-)
>
> diff --git a/promisor-remote.c b/promisor-remote.c
> index c0e5061dfe..e1e1f7e93a 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -5,7 +5,11 @@
>  #include "transport.h"
>  #include "strvec.h"
>  
> -static char *repository_format_partial_clone;
> +struct promisor_remote_config {
> +	char *repository_format_partial_clone;
> +	struct promisor_remote *promisors;
> +	struct promisor_remote **promisors_tail;
> +};
>  
>  static int fetch_objects(const char *remote_name,
>  			 const struct object_id *oids,
> @@ -37,10 +41,8 @@ static int fetch_objects(const char *remote_name,
>  	return finish_command(&child) ? -1 : 0;
>  }
>  
> -static struct promisor_remote *promisors;
> -static struct promisor_remote **promisors_tail = &promisors;
> -
> -static struct promisor_remote *promisor_remote_new(const char *remote_name)
> +static struct promisor_remote *promisor_remote_new(struct promisor_remote_config *config,
> +						   const char *remote_name)
>  {
>  	struct promisor_remote *r;
>  
> @@ -52,18 +54,19 @@ static struct promisor_remote *promisor_remote_new(const char *remote_name)
>  
>  	FLEX_ALLOC_STR(r, name, remote_name);
>  
> -	*promisors_tail = r;
> -	promisors_tail = &r->next;
> +	*config->promisors_tail = r;
> +	config->promisors_tail = &r->next;
>  
>  	return r;
>  }
>  
> -static struct promisor_remote *promisor_remote_lookup(const char *remote_name,
> +static struct promisor_remote *promisor_remote_lookup(struct promisor_remote_config *config,
> +						      const char *remote_name,
>  						      struct promisor_remote **previous)
>  {
>  	struct promisor_remote *r, *p;
>  
> -	for (p = NULL, r = promisors; r; p = r, r = r->next)
> +	for (p = NULL, r = config->promisors; r; p = r, r = r->next)
>  		if (!strcmp(r->name, remote_name)) {
>  			if (previous)
>  				*previous = p;
> @@ -73,7 +76,8 @@ static struct promisor_remote *promisor_remote_lookup(const char *remote_name,
>  	return NULL;
>  }
>  
> -static void promisor_remote_move_to_tail(struct promisor_remote *r,
> +static void promisor_remote_move_to_tail(struct promisor_remote_config *config,
> +					 struct promisor_remote *r,
>  					 struct promisor_remote *previous)
>  {
>  	if (r->next == NULL)
> @@ -82,14 +86,15 @@ static void promisor_remote_move_to_tail(struct promisor_remote *r,
>  	if (previous)
>  		previous->next = r->next;
>  	else
> -		promisors = r->next ? r->next : r;
> +		config->promisors = r->next ? r->next : r;
>  	r->next = NULL;
> -	*promisors_tail = r;
> -	promisors_tail = &r->next;
> +	*config->promisors_tail = r;
> +	config->promisors_tail = &r->next;
>  }
>  
>  static int promisor_remote_config(const char *var, const char *value, void *data)
>  {
> +	struct promisor_remote_config *config = data;
>  	const char *name;
>  	size_t namelen;
>  	const char *subkey;
> @@ -99,7 +104,7 @@ static int promisor_remote_config(const char *var, const char *value, void *data
>  		 * NULL value is handled in handle_extension_v0 in setup.c.
>  		 */
>  		if (value)
> -			repository_format_partial_clone = xstrdup(value);
> +			config->repository_format_partial_clone = xstrdup(value);
>  		return 0;
>  	}
>  
> @@ -114,8 +119,8 @@ static int promisor_remote_config(const char *var, const char *value, void *data
>  
>  		remote_name = xmemdupz(name, namelen);
>  
> -		if (!promisor_remote_lookup(remote_name, NULL))
> -			promisor_remote_new(remote_name);
> +		if (!promisor_remote_lookup(config, remote_name, NULL))
> +			promisor_remote_new(config, remote_name);
>  
>  		free(remote_name);
>  		return 0;
> @@ -124,9 +129,9 @@ static int promisor_remote_config(const char *var, const char *value, void *data
>  		struct promisor_remote *r;
>  		char *remote_name = xmemdupz(name, namelen);
>  
> -		r = promisor_remote_lookup(remote_name, NULL);
> +		r = promisor_remote_lookup(config, remote_name, NULL);
>  		if (!r)
> -			r = promisor_remote_new(remote_name);
> +			r = promisor_remote_new(config, remote_name);
>  
>  		free(remote_name);
>  
> @@ -139,59 +144,65 @@ static int promisor_remote_config(const char *var, const char *value, void *data
>  	return 0;
>  }
>  
> -static int initialized;
> -
> -static void promisor_remote_init(void)
> +static void promisor_remote_init(struct repository *r)
>  {
> -	if (initialized)
> +	struct promisor_remote_config *config;
> +
> +	if (r->promisor_remote_config)
>  		return;
> -	initialized = 1;
> +	config = r->promisor_remote_config =
> +		xcalloc(sizeof(*r->promisor_remote_config), 1);
> +	config->promisors_tail = &config->promisors;
>  
> -	git_config(promisor_remote_config, NULL);
> +	git_config(promisor_remote_config, config);
>  
> -	if (repository_format_partial_clone) {
> +	if (config->repository_format_partial_clone) {
>  		struct promisor_remote *o, *previous;
>  
> -		o = promisor_remote_lookup(repository_format_partial_clone,
> +		o = promisor_remote_lookup(config,
> +					   config->repository_format_partial_clone,
>  					   &previous);
>  		if (o)
> -			promisor_remote_move_to_tail(o, previous);
> +			promisor_remote_move_to_tail(config, o, previous);
>  		else
> -			promisor_remote_new(repository_format_partial_clone);
> +			promisor_remote_new(config, config->repository_format_partial_clone);
>  	}
>  }
>  
> -static void promisor_remote_clear(void)
> +void promisor_remote_clear(struct promisor_remote_config *config)
>  {
> -	while (promisors) {
> -		struct promisor_remote *r = promisors;
> -		promisors = promisors->next;
> +	FREE_AND_NULL(config->repository_format_partial_clone);
> +
> +	while (config->promisors) {
> +		struct promisor_remote *r = config->promisors;
> +		config->promisors = config->promisors->next;
>  		free(r);
>  	}
>  
> -	promisors_tail = &promisors;
> +	config->promisors_tail = &config->promisors;
>  }
>  
> -void promisor_remote_reinit(void)
> +void repo_promisor_remote_reinit(struct repository *r)
>  {
> -	initialized = 0;
> -	promisor_remote_clear();
> -	promisor_remote_init();
> +	promisor_remote_clear(r->promisor_remote_config);
> +	FREE_AND_NULL(r->promisor_remote_config);
> +	promisor_remote_init(r);
>  }
>  
> -struct promisor_remote *promisor_remote_find(const char *remote_name)
> +struct promisor_remote *repo_promisor_remote_find(struct repository *r,
> +						  const char *remote_name)
>  {
> -	promisor_remote_init();
> +	promisor_remote_init(r);
>  
>  	if (!remote_name)
> -		return promisors;
> +		return r->promisor_remote_config->promisors;
>  
> -	return promisor_remote_lookup(remote_name, NULL);
> +	return promisor_remote_lookup(r->promisor_remote_config, remote_name, NULL);
>  }
>  
> -int has_promisor_remote(void)
> +int repo_has_promisor_remote(struct repository *r)
>  {
> -	return !!promisor_remote_find(NULL);
> +	return !!repo_promisor_remote_find(r, NULL);
>  }
>  
>  static int remove_fetched_oids(struct repository *repo,
> @@ -239,9 +250,11 @@ int promisor_remote_get_direct(struct repository *repo,
>  	if (oid_nr == 0)
>  		return 0;
>  
> -	promisor_remote_init();
> +	promisor_remote_init(repo);
>  
> -	for (r = promisors; r; r = r->next) {
> +	if (repo != the_repository)
> +		BUG("only the_repository is supported for now");
> +	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
>  		if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) {
>  			if (remaining_nr == 1)
>  				continue;
> diff --git a/promisor-remote.h b/promisor-remote.h
> index 687210ab87..edc45ab0f5 100644
> --- a/promisor-remote.h
> +++ b/promisor-remote.h
> @@ -17,9 +17,25 @@ struct promisor_remote {
>  	const char name[FLEX_ARRAY];
>  };
>  
> -void promisor_remote_reinit(void);
> -struct promisor_remote *promisor_remote_find(const char *remote_name);
> -int has_promisor_remote(void);
> +void repo_promisor_remote_reinit(struct repository *r);
> +static inline void promisor_remote_reinit(void)
> +{
> +	repo_promisor_remote_reinit(the_repository);
> +}
> +
> +void promisor_remote_clear(struct promisor_remote_config *config);
> +
> +struct promisor_remote *repo_promisor_remote_find(struct repository *r, const char *remote_name);
> +static inline struct promisor_remote *promisor_remote_find(const char *remote_name)
> +{
> +	return repo_promisor_remote_find(the_repository, remote_name);
> +}
> +
> +int repo_has_promisor_remote(struct repository *r);
> +static inline int has_promisor_remote(void)
> +{
> +	return repo_has_promisor_remote(the_repository);
> +}
>  
>  /*
>   * Fetches all requested objects from all promisor remotes, trying them one at
> diff --git a/repository.c b/repository.c
> index 448cd557d4..dca0a11ab6 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -11,6 +11,7 @@
>  #include "lockfile.h"
>  #include "submodule-config.h"
>  #include "sparse-index.h"
> +#include "promisor-remote.h"
>  
>  /* The main repository */
>  static struct repository the_repo;
> @@ -258,6 +259,11 @@ void repo_clear(struct repository *repo)
>  		if (repo->index != &the_index)
>  			FREE_AND_NULL(repo->index);
>  	}
> +
> +	if (repo->promisor_remote_config) {
> +		promisor_remote_clear(repo->promisor_remote_config);
> +		FREE_AND_NULL(repo->promisor_remote_config);
> +	}
>  }
>  
>  int repo_read_index(struct repository *repo)
> diff --git a/repository.h b/repository.h
> index a45f7520fd..fc06c154e2 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -10,6 +10,7 @@ struct lock_file;
>  struct pathspec;
>  struct raw_object_store;
>  struct submodule_cache;
> +struct promisor_remote_config;
>  
>  enum untracked_cache_setting {
>  	UNTRACKED_CACHE_UNSET = -1,
> @@ -139,6 +140,9 @@ struct repository {
>  	/* True if commit-graph has been disabled within this process. */
>  	int commit_graph_disabled;
>  
> +	/* Configurations related to promisor remotes. */
> +	struct promisor_remote_config *promisor_remote_config;
> +
>  	/* Configurations */
>  
>  	/* Indicate if a repository has a different 'commondir' from 'gitdir' */


* Re: [PATCH v2 3/4] run-command: move envvar-resetting function
  2021-06-08  0:25   ` [PATCH v2 3/4] run-command: move envvar-resetting function Jonathan Tan
@ 2021-06-08  4:14     ` Junio C Hamano
  2021-06-09  4:32       ` Jonathan Tan
  0 siblings, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2021-06-08  4:14 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, me, newren, emilyshaffer

Jonathan Tan <jonathantanmy@google.com> writes:

> There is a function that resets environment variables, used when
> invoking a sub-process in a submodule. The lazy-fetching code (used in
> partial clones) will need this function in a subsequent commit, so move
> it to a more central location.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  run-command.c | 10 ++++++++++
>  run-command.h |  9 +++++++++
>  submodule.c   | 14 ++------------
>  3 files changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/run-command.c b/run-command.c
> index be6bc128cd..a6c458119c 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1892,3 +1892,13 @@ int run_auto_maintenance(int quiet)
>  
>  	return run_command(&maint);
>  }
> +
> +void prepare_other_repo_env(struct strvec *env_array)
> +{
> +	const char * const *var;
> +
> +	for (var = local_repo_env; *var; var++) {
> +		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
> +			strvec_push(env_array, *var);
> +	}
> +}

It does make sense to move this to run-command.c from submodule.c
and the function name is already suitable for being global.  I
however cannot help wondering if we should also pay attention to the
GIT_CONFIG_KEY_$n and GIT_CONFIG_VALUE_$n pairs (which is not a new
problem in this patch).
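For illustration, a hypothetical sketch of one way the GIT_CONFIG_KEY_$n / GIT_CONFIG_VALUE_$n concern could be handled. It assumes, for the sake of the sketch, that GIT_CONFIG_COUNT is listed among the repo-local variables. Because the numbered pairs are only honored when GIT_CONFIG_COUNT says how many exist, unsetting the count in the child would effectively drop them. The sketch keeps the count alongside GIT_CONFIG_PARAMETERS. The lists and function names are invented, and whether this is the right policy is exactly the open question raised above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset; assumes GIT_CONFIG_COUNT is listed as repo-local. */
static const char * const local_repo_env[] = {
	"GIT_CONFIG",
	"GIT_CONFIG_COUNT",
	"GIT_CONFIG_PARAMETERS",
	"GIT_DIR",
	NULL
};

/* Variables carrying command-line style config that should survive. */
static const char * const preserved_env[] = {
	"GIT_CONFIG_PARAMETERS", /* exported by "git -c key=value" */
	"GIT_CONFIG_COUNT",      /* how many GIT_CONFIG_KEY_<n>/VALUE_<n> pairs */
	NULL
};

static int is_preserved(const char *name)
{
	const char * const *p;

	for (p = preserved_env; *p; p++)
		if (!strcmp(*p, name))
			return 1;
	return 0;
}

/* Fill out[] with the names to unset in the child; returns how many. */
static size_t vars_to_unset(const char **out, size_t max)
{
	size_t nr = 0;
	const char * const *var;

	for (var = local_repo_env; *var; var++) {
		if (is_preserved(*var))
			continue;
		assert(nr < max);
		out[nr++] = *var;
	}
	return nr;
}
```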

This helper may sit better next to prep_childenv(), instead of just
saying "the location does not matter, just append randomly at the
end", though.

Otherwise looking good.

Thanks.



> diff --git a/run-command.h b/run-command.h
> index d08414a92e..a1d9107f5b 100644
> --- a/run-command.h
> +++ b/run-command.h
> @@ -483,4 +483,13 @@ int run_processes_parallel_tr2(int n, get_next_task_fn, start_failure_fn,
>  			       task_finished_fn, void *pp_cb,
>  			       const char *tr2_category, const char *tr2_label);
>  
> +/**
> + * Convenience function which adds all GIT_* environment variables to env_array
> + * with the exception of GIT_CONFIG_PARAMETERS. When used as the env_array of a
> + * subprocess, these entries cause the corresponding environment variables to
> + * be unset in the subprocess. See local_repo_env in cache.h for more
> + * information.
> + */
> +void prepare_other_repo_env(struct strvec *env_array);
> +
>  #endif
> diff --git a/submodule.c b/submodule.c
> index 0b1d9c1dde..a30216db52 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -484,26 +484,16 @@ static void print_submodule_diff_summary(struct repository *r, struct rev_info *
>  	strbuf_release(&sb);
>  }
>  
> -static void prepare_submodule_repo_env_no_git_dir(struct strvec *out)
> -{
> -	const char * const *var;
> -
> -	for (var = local_repo_env; *var; var++) {
> -		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
> -			strvec_push(out, *var);
> -	}
> -}
> -
>  void prepare_submodule_repo_env(struct strvec *out)
>  {
> -	prepare_submodule_repo_env_no_git_dir(out);
> +	prepare_other_repo_env(out);
>  	strvec_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
>  		     DEFAULT_GIT_DIR_ENVIRONMENT);
>  }
>  
>  static void prepare_submodule_repo_env_in_gitdir(struct strvec *out)
>  {
> -	prepare_submodule_repo_env_no_git_dir(out);
> +	prepare_other_repo_env(out);
>  	strvec_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
>  }

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-08  0:25   ` [PATCH v2 4/4] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
@ 2021-06-08  4:33     ` Junio C Hamano
  2021-06-09  4:39       ` Jonathan Tan
  2021-06-08 17:42     ` Elijah Newren
  1 sibling, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2021-06-08  4:33 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, me, newren, emilyshaffer

Jonathan Tan <jonathantanmy@google.com> writes:

>  		/* Check if it is a missing object */
> -		if (fetch_if_missing && has_promisor_remote() &&
> -		    !already_retried && r == the_repository &&
> +		if (fetch_if_missing && repo_has_promisor_remote(r) &&
> +		    !already_retried &&

Turning has_promisor_remote() into repo_has_promisor_remote(r) does
make tons of sense.  Is this part of the code ready to lose the "'r' must
be the_repository because has_promisor_remote() only works on the
primary in-core repository" restriction we had before?

> @@ -21,6 +22,11 @@ static int fetch_objects(const char *remote_name,
>  
>  	child.git_cmd = 1;
>  	child.in = -1;
> +	if (repo != the_repository) {
> +		prepare_other_repo_env(&child.env_array);
> +		strvec_pushf(&child.env_array, "%s=%s", GIT_DIR_ENVIRONMENT,
> +			     repo->gitdir);
> +	}

This is what prepare_submodule_repo_env_in_gitdir() does; it makes
me wonder if it (i.e. set up environment for that other repository,
including the GIT_DIR and possibly other per-repository environment
variable override) should be the primary API callers would want,
instead of a more limited prepare_other_repo_env() that does not
even take 'repo' parameter.  Doesn't it feel somewhat strange that a
function meant to help prepare a child process to run in a repository
different from ours (the "other repo" part of its name), by filling in
the appropriate environ[] array, does not even want to know which
repository that "other" repo is?
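To make the suggested API shape concrete, here is a hypothetical sketch of a helper that is told which repository the child will run in; the name prepare_env_for_gitdir() and the fixed-size env_list are illustrative assumptions, not existing git API. It emits the bare "unset" names first and then GIT_DIR=<gitdir>. Since run-command applies env_array entries in order, the later assignment wins over the earlier bare GIT_DIR, which is the same ordering prepare_submodule_repo_env_in_gitdir() relies on.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ENV_MAX 16

/* Hypothetical fixed-capacity stand-in for struct strvec. */
struct env_list {
	const char *items[ENV_MAX];
	size_t nr;
	char gitdir_buf[256];	/* backing storage for "GIT_DIR=<gitdir>" */
};

/* Hypothetical subset of per-repository variables. */
static const char *const sample_repo_env[] = {
	"GIT_DIR",
	"GIT_WORK_TREE",
	"GIT_INDEX_FILE",
	"GIT_CONFIG_PARAMETERS",
	NULL
};

/*
 * Hypothetical API: unset per-repository variables (except
 * GIT_CONFIG_PARAMETERS), then point GIT_DIR at the other repository.
 * Returns 0 on success, -1 if gitdir does not fit in the buffer.
 */
static int prepare_env_for_gitdir(struct env_list *env, const char *gitdir)
{
	const char *const *var;
	int len;

	env->nr = 0;
	for (var = sample_repo_env; *var; var++) {
		if (!strcmp(*var, "GIT_CONFIG_PARAMETERS"))
			continue;
		assert(env->nr < ENV_MAX - 1);	/* leave room for GIT_DIR=... */
		env->items[env->nr++] = *var;
	}

	len = snprintf(env->gitdir_buf, sizeof(env->gitdir_buf),
		       "GIT_DIR=%s", gitdir);
	if (len < 0 || (size_t)len >= sizeof(env->gitdir_buf))
		return -1;	/* path too long for this sketch */
	env->items[env->nr++] = env->gitdir_buf;
	return 0;
}
```

With a shape like this, a caller such as fetch_objects() would pass repo->gitdir instead of pushing GIT_DIR itself.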

> diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c
> new file mode 100644
> index 0000000000..3f102cfddd
> --- /dev/null
> +++ b/t/helper/test-partial-clone.c
> @@ -0,0 +1,43 @@
> +#include "cache.h"
> +#include "test-tool.h"
> +#include "repository.h"
> +#include "object-store.h"
> +
> +/*
> + * Prints the size of the object corresponding to the given hash in a specific
> + * gitdir. This is similar to "git -C gitdir cat-file -s", except that this
> + * exercises the code that accesses the object of an arbitrary repository that
> + * is not the_repository. ("git -C gitdir" makes it so that the_repository is
> + * the one in gitdir.)
> + */

Is the reason this only gives the size that it will eventually
become unnecessary once the main code starts running things in a
submodule repository properly (i.e. without doing the alternate odb
thing), and a more elaborate check is not worth your engineering
effort?  Object types and sizes are something that you can
safely express in plain text, would be handy for testing, and would
not require too much extra code, I'd imagine.

> +static void object_info(const char *gitdir, const char *oid_hex)
> +{
> +	struct repository r;
> +	struct object_id oid;
> +	unsigned long size;
> +	struct object_info oi = {.sizep = &size};
> +	const char *p;
> +
> +	if (repo_init(&r, gitdir, NULL))
> +		die("could not init repo");
> +	if (parse_oid_hex(oid_hex, &oid, &p))
> +		die("could not parse oid");
> +	if (oid_object_info_extended(&r, &oid, &oi, 0))
> +		die("could not obtain object info");
> +	printf("%d\n", (int) size);

Mimicking what builtin/cat-file.c::cat_one_file() does, for example, and
using

	printf("%"PRIuMAX"\n", (uintmax_t)size);

might be better (I was wondering if we can extract reusable helpers,
but I do not think that is worth doing if this is meant to be a
temporary stop-gap measure).
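A small sketch of the suggested formatting, extended with the object type as proposed above. Passing the type as a string is an assumption of this sketch (in git it could presumably come from type_name()). Widening to uintmax_t and printing with PRIuMAX avoids the (int) cast, which cannot represent sizes above INT_MAX.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<type> <size>" into buf, widening the size to uintmax_t
 * instead of truncating it to int. Returns snprintf()'s result.
 */
static int format_object_info(char *buf, size_t len,
			      const char *type, unsigned long size)
{
	return snprintf(buf, len, "%s %" PRIuMAX, type, (uintmax_t)size);
}
```

A size such as 3000000000 fits in a 32-bit unsigned long but is outside the range of a 32-bit int, so only the widened form prints it reliably.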

Thanks.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 1/4] promisor-remote: read partialClone config here
  2021-06-08  0:25   ` [PATCH v2 1/4] promisor-remote: read partialClone config here Jonathan Tan
  2021-06-08  3:18     ` Junio C Hamano
@ 2021-06-08 17:28     ` Elijah Newren
  2021-06-09  4:44       ` Jonathan Tan
  1 sibling, 1 reply; 77+ messages in thread
From: Elijah Newren @ 2021-06-08 17:28 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List, Taylor Blau, Emily Shaffer

On Mon, Jun 7, 2021 at 5:26 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Currently, the reading of config related to promisor remotes is done in
> two places: once in setup.c (which sets the global variable
> repository_format_partial_clone, to be read by the code in
> promisor-remote.c), and once in promisor-remote.c. This means that care
> must be taken to ensure that repository_format_partial_clone is set
> before any code in promisor-remote.c accesses it.
>
> To simplify the code, move all such config reading to promisor-remote.c.
> By doing this, it will be easier to see when
> repository_format_partial_clone is written and, thus, to reason about
> the code. This will be especially helpful in a subsequent commit, which
> modifies this code.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  cache.h           |  1 -
>  promisor-remote.c | 14 +++++++++-----
>  promisor-remote.h |  6 ------
>  setup.c           | 10 +++++++---
>  4 files changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index ba04ff8bd3..dbdcec8601 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1061,7 +1061,6 @@ extern int repository_format_worktree_config;
>  struct repository_format {
>         int version;
>         int precious_objects;
> -       char *partial_clone; /* value of extensions.partialclone */
>         int worktree_config;
>         int is_bare;
>         int hash_algo;
> diff --git a/promisor-remote.c b/promisor-remote.c
> index da3f2ca261..c0e5061dfe 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -7,11 +7,6 @@
>
>  static char *repository_format_partial_clone;
>
> -void set_repository_format_partial_clone(char *partial_clone)
> -{
> -       repository_format_partial_clone = xstrdup_or_null(partial_clone);
> -}
> -
>  static int fetch_objects(const char *remote_name,
>                          const struct object_id *oids,
>                          int oid_nr)
> @@ -99,6 +94,15 @@ static int promisor_remote_config(const char *var, const char *value, void *data
>         size_t namelen;
>         const char *subkey;
>
> +       if (!strcmp(var, "extensions.partialclone")) {
> +               /*
> +                * NULL value is handled in handle_extension_v0 in setup.c.
> +                */
> +               if (value)
> +                       repository_format_partial_clone = xstrdup(value);
> +               return 0;
> +       }

This is actually slightly hard to parse out.  I was trying to figure
out where repository_format_partial_clone was initialized, and it's
not handled when value is NULL in handle_extension_v0; it's the fact
that repository_format_partial_clone is declared a static global
variable.

But in the next patch you make it a member of struct
promisor_remote_config, and instead rely on the xcalloc call in
promisor_remote_init().

That means everything is properly initialized and you haven't made any
mistakes here, but the logic is a bit hard to follow.  Perhaps it'd be
nicer to just write this as

+       if (!strcmp(var, "extensions.partialclone")) {
+               repository_format_partial_clone = xstrdup_or_null(value);
+               return 0;
+       }

which makes the code shorter and easier to follow, at least for me.
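A standalone sketch of the suggested simplification. dup_or_null() is a stand-in for git's xstrdup_or_null() (which additionally dies on allocation failure), and maybe_set_partialclone() is a hypothetical helper that writes through a pointer instead of the static global.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for xstrdup_or_null(): NULL stays NULL, anything else is copied. */
static char *dup_or_null(const char *s)
{
	char *copy;

	if (!s)
		return NULL;
	copy = malloc(strlen(s) + 1);
	if (!copy)
		abort();
	return strcpy(copy, s);
}

/*
 * Hypothetical helper mirroring the suggested branch: returns 1 if 'var'
 * was extensions.partialclone (and *dest was updated), 0 otherwise.
 */
static int maybe_set_partialclone(const char *var, const char *value,
				  char **dest)
{
	if (!strcmp(var, "extensions.partialclone")) {
		*dest = dup_or_null(value);
		return 1;
	}
	return 0;
}
```

One behavioral difference worth double-checking: with this form, a later valueless extensions.partialclone entry resets the field to NULL, whereas the original code leaves any earlier value in place.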

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-08  0:25   ` [PATCH v2 4/4] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
  2021-06-08  4:33     ` Junio C Hamano
@ 2021-06-08 17:42     ` Elijah Newren
  2021-06-09  4:46       ` Jonathan Tan
  1 sibling, 1 reply; 77+ messages in thread
From: Elijah Newren @ 2021-06-08 17:42 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List, Taylor Blau, Emily Shaffer

On Mon, Jun 7, 2021 at 5:26 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> This is one step towards supporting partial clone submodules.
>
> Even after this patch, we will still lack partial clone submodules
> support, primarily because a lot of Git code that accesses submodule
> objects does so by adding their object stores as alternates, meaning
> that any lazy fetches that would occur in the submodule would be done
> based on the config of the superproject, not of the submodule. This also
> prevents testing of the functionality in this patch by user-facing
> commands. So for now, test this mechanism using a test helper.

I wonder if this commit message is a good place to call out that we
also want to eventually audit codepaths using the old
has_promisor_remote() wrapper function (particularly the ones
protected by a repo == the_repository check) as well.

>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  Makefile                      |  1 +
>  object-file.c                 |  7 ++----
>  promisor-remote.c             | 14 ++++++++----
>  t/helper/test-partial-clone.c | 43 +++++++++++++++++++++++++++++++++++
>  t/helper/test-tool.c          |  1 +
>  t/helper/test-tool.h          |  1 +
>  t/t0410-partial-clone.sh      | 23 +++++++++++++++++++
>  7 files changed, 80 insertions(+), 10 deletions(-)
>  create mode 100644 t/helper/test-partial-clone.c
>
> diff --git a/Makefile b/Makefile
> index c3565fc0f8..f6653bcd5e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-oidmap.o
>  TEST_BUILTINS_OBJS += test-online-cpus.o
>  TEST_BUILTINS_OBJS += test-parse-options.o
>  TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
> +TEST_BUILTINS_OBJS += test-partial-clone.o
>  TEST_BUILTINS_OBJS += test-path-utils.o
>  TEST_BUILTINS_OBJS += test-pcre2-config.o
>  TEST_BUILTINS_OBJS += test-pkt-line.o
> diff --git a/object-file.c b/object-file.c
> index f233b440b2..ebf273e9e7 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1570,15 +1570,12 @@ static int do_oid_object_info_extended(struct repository *r,
>                 }
>
>                 /* Check if it is a missing object */
> -               if (fetch_if_missing && has_promisor_remote() &&
> -                   !already_retried && r == the_repository &&
> +               if (fetch_if_missing && repo_has_promisor_remote(r) &&
> +                   !already_retried &&
>                     !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
>                         /*
>                          * TODO Investigate checking promisor_remote_get_direct()
>                          * TODO return value and stopping on error here.
> -                        * TODO Pass a repository struct through
> -                        * promisor_remote_get_direct(), such that arbitrary
> -                        * repositories work.
>                          */
>                         promisor_remote_get_direct(r, real, 1);
>                         already_retried = 1;
> diff --git a/promisor-remote.c b/promisor-remote.c
> index e1e1f7e93a..1491374d65 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -11,7 +11,8 @@ struct promisor_remote_config {
>         struct promisor_remote **promisors_tail;
>  };
>
> -static int fetch_objects(const char *remote_name,
> +static int fetch_objects(struct repository *repo,
> +                        const char *remote_name,
>                          const struct object_id *oids,
>                          int oid_nr)
>  {
> @@ -21,6 +22,11 @@ static int fetch_objects(const char *remote_name,
>
>         child.git_cmd = 1;
>         child.in = -1;
> +       if (repo != the_repository) {
> +               prepare_other_repo_env(&child.env_array);
> +               strvec_pushf(&child.env_array, "%s=%s", GIT_DIR_ENVIRONMENT,
> +                            repo->gitdir);
> +       }
>         strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
>                      "fetch", remote_name, "--no-tags",
>                      "--no-write-fetch-head", "--recurse-submodules=no",
> @@ -154,7 +160,7 @@ static void promisor_remote_init(struct repository *r)
>                 xcalloc(sizeof(*r->promisor_remote_config), 1);
>         config->promisors_tail = &config->promisors;
>
> -       git_config(promisor_remote_config, config);
> +       repo_config(r, promisor_remote_config, config);
>
>         if (config->repository_format_partial_clone) {
>                 struct promisor_remote *o, *previous;
> @@ -252,10 +258,8 @@ int promisor_remote_get_direct(struct repository *repo,
>
>         promisor_remote_init(repo);
>
> -       if (repo != the_repository)
> -               BUG("only the_repository is supported for now");
>         for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
> -               if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) {
> +               if (fetch_objects(repo, r->name, remaining_oids, remaining_nr) < 0) {
>                         if (remaining_nr == 1)
>                                 continue;
>                         remaining_nr = remove_fetched_oids(repo, &remaining_oids,
> diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c
> new file mode 100644
> index 0000000000..3f102cfddd
> --- /dev/null
> +++ b/t/helper/test-partial-clone.c
> @@ -0,0 +1,43 @@
> +#include "cache.h"
> +#include "test-tool.h"
> +#include "repository.h"
> +#include "object-store.h"
> +
> +/*
> + * Prints the size of the object corresponding to the given hash in a specific
> + * gitdir. This is similar to "git -C gitdir cat-file -s", except that this
> + * exercises the code that accesses the object of an arbitrary repository that
> + * is not the_repository. ("git -C gitdir" makes it so that the_repository is
> + * the one in gitdir.)
> + */
> +static void object_info(const char *gitdir, const char *oid_hex)
> +{
> +       struct repository r;
> +       struct object_id oid;
> +       unsigned long size;
> +       struct object_info oi = {.sizep = &size};
> +       const char *p;
> +
> +       if (repo_init(&r, gitdir, NULL))
> +               die("could not init repo");
> +       if (parse_oid_hex(oid_hex, &oid, &p))
> +               die("could not parse oid");
> +       if (oid_object_info_extended(&r, &oid, &oi, 0))
> +               die("could not obtain object info");
> +       printf("%d\n", (int) size);
> +}
> +
> +int cmd__partial_clone(int argc, const char **argv)
> +{
> +       setup_git_directory();
> +
> +       if (argc < 4)
> +               die("too few arguments");
> +
> +       if (!strcmp(argv[1], "object-info"))
> +               object_info(argv[2], argv[3]);
> +       else
> +               die("invalid argument '%s'", argv[1]);
> +
> +       return 0;
> +}
> diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
> index c5bd0c6d4c..b21e8f1519 100644
> --- a/t/helper/test-tool.c
> +++ b/t/helper/test-tool.c
> @@ -46,6 +46,7 @@ static struct test_cmd cmds[] = {
>         { "online-cpus", cmd__online_cpus },
>         { "parse-options", cmd__parse_options },
>         { "parse-pathspec-file", cmd__parse_pathspec_file },
> +       { "partial-clone", cmd__partial_clone },
>         { "path-utils", cmd__path_utils },
>         { "pcre2-config", cmd__pcre2_config },
>         { "pkt-line", cmd__pkt_line },
> diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
> index e8069a3b22..f845ced4b3 100644
> --- a/t/helper/test-tool.h
> +++ b/t/helper/test-tool.h
> @@ -35,6 +35,7 @@ int cmd__oidmap(int argc, const char **argv);
>  int cmd__online_cpus(int argc, const char **argv);
>  int cmd__parse_options(int argc, const char **argv);
>  int cmd__parse_pathspec_file(int argc, const char** argv);
> +int cmd__partial_clone(int argc, const char **argv);
>  int cmd__path_utils(int argc, const char **argv);
>  int cmd__pcre2_config(int argc, const char **argv);
>  int cmd__pkt_line(int argc, const char **argv);
> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index 584a039b85..a211a66c67 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -604,6 +604,29 @@ test_expect_success 'do not fetch when checking existence of tree we construct o
>         git -C repo cherry-pick side1
>  '
>
> +test_expect_success 'lazy-fetch when accessing object not in the_repository' '
> +       rm -rf full partial.git &&
> +       test_create_repo full &&
> +       test_commit -C full create-a-file file.txt &&
> +
> +       test_config -C full uploadpack.allowfilter 1 &&
> +       test_config -C full uploadpack.allowanysha1inwant 1 &&
> +       git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git &&
> +       FILE_HASH=$(git -C full rev-parse HEAD:file.txt) &&
> +
> +       # Sanity check that the file is missing
> +       git -C partial.git rev-list --objects --missing=print HEAD >out &&
> +       grep "[?]$FILE_HASH" out &&
> +
> +       git -C full cat-file -s "$FILE_HASH" >expect &&
> +       test-tool partial-clone object-info partial.git "$FILE_HASH" >actual &&
> +       test_cmp expect actual &&
> +
> +       # Sanity check that the file is now present
> +       git -C partial.git rev-list --objects --missing=print HEAD >out &&
> +       ! grep "[?]$FILE_HASH" out
> +'
> +
>  . "$TEST_DIRECTORY"/lib-httpd.sh
>  start_httpd
>
> --
> 2.32.0.rc1.229.g3e70b5a671-goog
>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 0/4] First steps towards partial clone submodules
  2021-06-08  0:25 ` [PATCH v2 0/4] First steps towards partial clone submodules Jonathan Tan
                     ` (3 preceding siblings ...)
  2021-06-08  0:25   ` [PATCH v2 4/4] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
@ 2021-06-08 17:50   ` Elijah Newren
  2021-06-08 23:42     ` Junio C Hamano
  2021-06-09  4:58     ` Jonathan Tan
  4 siblings, 2 replies; 77+ messages in thread
From: Elijah Newren @ 2021-06-08 17:50 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List, Taylor Blau, Emily Shaffer, Junio C Hamano

On Mon, Jun 7, 2021 at 5:26 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Thanks everyone for your reviews. I believe I've addressed all review
> comments, including the one from Elijah about the test failing with
> sha256 (which turns out to be because I didn't add a call to
> setup_git_directory(), which the other test helpers do).

Thanks for fixing those up.  I spotted some minor nits/questions, but
nothing big.

Looks like Junio did spot some bigger items...which raises a question
for me.  I have a series
(https://lore.kernel.org/git/pull.969.git.1622856485.gitgitgadget@gmail.com/)
that also touches partial clones.  Our series are semantically
independent, but we both add a repository parameter to
fetch_objects().  So we both make the same change, but you also make
additional nearby changes, resulting in two trivial conflicts.  So,
should I rebase my series on yours, should you rebase on mine, or
should we just let both proceed independently and double-check Junio
resolves the trivial conflicts in favor of your side?

Thoughts?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 0/4] First steps towards partial clone submodules
  2021-06-08 17:50   ` [PATCH v2 0/4] First steps towards partial clone submodules Elijah Newren
@ 2021-06-08 23:42     ` Junio C Hamano
  2021-06-09  0:07       ` Elijah Newren
  2021-06-09  4:58     ` Jonathan Tan
  1 sibling, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2021-06-08 23:42 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jonathan Tan, Git Mailing List, Taylor Blau, Emily Shaffer

Elijah Newren <newren@gmail.com> writes:

> On Mon, Jun 7, 2021 at 5:26 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>>
>> Thanks everyone for your reviews. I believe I've addressed all review
>> comments, including the one from Elijah about the test failing with
>> sha256 (which turns out to be because I didn't add a call to
>> setup_git_directory(), which the other test helpers do).
>
> Thanks for fixing those up.  I spotted some minor nits/questions, but
> nothing big.
>
> Looks like Junio did spot some bigger items...which raises a question
> for me.  I have a series ...

Do you mean, by "bigger items", that we may want to turn it around
to have repo extension data in the in-core repository structure?

> (https://lore.kernel.org/git/pull.969.git.1622856485.gitgitgadget@gmail.com/)
> that also touches partial clones.  Our series are semantically
> independent, but we both add a repository parameter to
> fetch_objects().  So we both make the same change, but you also make
> additional nearby changes, resulting in two trivial conflicts.  ...

I can sort of see how the above plan would work if we are not going
to fix the "special-case only the partialclone extension, instead of
solving the larger structural problem that the current arrangement
ignores the fact that repository extensions are per repository" issue.
But wouldn't that leave us with two series with technical debt?
Also, if Jonathan's series fixes the "bigger item", would the above
"proceed more or less independently or rebase one on top of the
other" plan work well without making the same fix in yours?

I guess a better first step would be to stop, think and decide what
to do with the "bigger" thing---if only to dismiss it with a firm
declaration that we would never do such a fix and move extension
data piecemeal to relevant subsystems, so that we'd reduce conflicts
in the future, as I am reasonably sure that the "bigger item" will
be tempting to fix even after the two series land, and doing so at
that time would be a surgery twice as large.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 0/4] First steps towards partial clone submodules
  2021-06-08 23:42     ` Junio C Hamano
@ 2021-06-09  0:07       ` Elijah Newren
  2021-06-09  0:18         ` Junio C Hamano
  0 siblings, 1 reply; 77+ messages in thread
From: Elijah Newren @ 2021-06-09  0:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, Git Mailing List, Taylor Blau, Emily Shaffer

On Tue, Jun 8, 2021 at 4:42 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > On Mon, Jun 7, 2021 at 5:26 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> >>
> >> Thanks everyone for your reviews. I believe I've addressed all review
> >> comments, including the one from Elijah about the test failing with
> >> sha256 (which turns out to be because I didn't add a call to
> >> setup_git_directory(), which the other test helpers do).
> >
> > Thanks for fixing those up.  I spotted some minor nits/questions, but
> > nothing big.
> >
> > Looks like Junio did spot some bigger items...which raises a question
> > for me.  I have a series ...
>
> Do you mean, by "bigger items", that we may want to turn it around
> to have repo extension data in the in-core repository structure?

Yes.

> > (https://lore.kernel.org/git/pull.969.git.1622856485.gitgitgadget@gmail.com/)
> > that also touches partial clones.  Our series are semantically
> > independent, but we both add a repository parameter to
> > fetch_objects().  So we both make the same change, but you also make
> > additional nearby changes, resulting in two trivial conflicts.  ...
>
> I can sort of see how the above plan would work if we are not going
> to fix the "special-case only the partialclone extension, instead of
> solving the larger structural problem that the current arrangement
> ignores the fact that repository extensions are per repository" issue.
> But wouldn't that leave us with two series with technical debt?
> Also, if Jonathan's series fixes the "bigger item", would the above
> "proceed more or less independently or rebase one on top of the
> other" plan work well without making the same fix in yours?

My series is completely independent of the partialclone extension stuff.

My series merely adds the recording of a single statistic (number of
fetched objects) to the partial clone stuff; everything else is higher
level diffcore-rename and merge-ort stuff.

> I guess a better first step would be to stop, think and decide what
> to do with the "bigger" thing---if only to dismiss it with a firm
> declaration that we would never do such a fix and move extension
> data piecemeal to relevant subsystems, so that we'd reduce conflicts
> in the future, as I am reasonably sure that the "bigger item" will
> be tempting to fix even after the two series land, and doing so at
> that time would be a surgery twice as large.

I don't understand how you think the partialclone extension stuff is
relevant to my series at all.  My changes to promisor-remote.c are
just a couple of lines, and if he expands or rearranges his work, the
number of conflicts can't really grow, because there are only a
few lines on my side for it to conflict with.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 0/4] First steps towards partial clone submodules
  2021-06-09  0:07       ` Elijah Newren
@ 2021-06-09  0:18         ` Junio C Hamano
  0 siblings, 0 replies; 77+ messages in thread
From: Junio C Hamano @ 2021-06-09  0:18 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jonathan Tan, Git Mailing List, Taylor Blau, Emily Shaffer

Elijah Newren <newren@gmail.com> writes:

> My series is completely independent of the partialclone extension stuff.

OK, that makes it simpler.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 1/4] promisor-remote: read partialClone config here
  2021-06-08  3:18     ` Junio C Hamano
@ 2021-06-09  4:26       ` Jonathan Tan
  2021-06-09  9:30         ` Junio C Hamano
  0 siblings, 1 reply; 77+ messages in thread
From: Jonathan Tan @ 2021-06-09  4:26 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, me, newren, emilyshaffer

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > Currently, the reading of config related to promisor remotes is done in
> > two places: once in setup.c (which sets the global variable
> > repository_format_partial_clone, to be read by the code in
> > promisor-remote.c), and once in promisor-remote.c. This means that care
> > must be taken to ensure that repository_format_partial_clone is set
> > before any code in promisor-remote.c accesses it.
> 
> The above is very true, but I am puzzled by the chosen direction of
> the code movement.
> 
> Given that the value in the field repository_format.partial_clone
> comes from an extension, and an extension that is not understood by
> the version of Git that is running MUST abort the execution of Git,
> wouldn't it be guaranteed that, in a correctly written program, the
> .partial_clone field must already be set up correctly before
> anything else, including those in promisor-remote.c, accesses it?
> 
> > To simplify the code, move all such config reading to promisor-remote.c.
> > By doing this, it will be easier to see when
> > repository_format_partial_clone is written and, thus, to reason about
> > the code. This will be especially helpful in a subsequent commit, which
> > modifies this code.
> 
> So, I am not sure if this simplifies the code the way we want to
> read our code.  Doing a thing in one place is indeed simpler than
> doing it in two places, but it looks like promisor-remote code
> should be using the repository-format data more, not the other way
> around, at least to me.
> 
> Perhaps I am missing some other motivation, though.
> 
> Thanks.

I'm reluctant to add more fields to struct repository_format. Right
now, the way it is used is to hold any information we gathered (e.g.
hash type) while determining if a repo is one that we can handle. Any
information we still need is copied somewhere else, and the struct
itself is immediately freed.

If we were to use it for promisor remote config, we would have to
read config into struct repository_format's fields and copy those fields
into struct repository in setup.c, and then access the same fields in
promisor-remote.c. It seems more straightforward to just do everything
in promisor-remote.c - for example, if we needed to change the type of
one of those fields, we would just need to change it in one file instead
of two.
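To illustrate the hand-off described above, a hypothetical sketch: fmt_sketch and repo_sketch are simplified stand-ins, not git's struct repository_format or struct repository. Once the format has been checked, whatever is still needed is moved into the long-lived repository struct and the temporary data is cleared.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins; not git's real structs. */
struct fmt_sketch {
	int hash_algo;
	char *partial_clone;	/* would be read from extensions.partialclone */
};

struct repo_sketch {
	int hash_algo;
	char *partial_clone;
};

/*
 * Move the fields that are still needed into the repository struct,
 * then clear the temporary format so it no longer owns partial_clone.
 */
static void adopt_format(struct repo_sketch *repo, struct fmt_sketch *fmt)
{
	assert(repo && fmt);
	repo->hash_algo = fmt->hash_algo;
	repo->partial_clone = fmt->partial_clone;	/* transfer ownership */
	memset(fmt, 0, sizeof(*fmt));
}
```

Every promisor field kept in the format struct would need a line like the partial_clone one here, in addition to its reader in promisor-remote.c, which is the two-file cost weighed above.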

I acknowledge that there still remains the duplication that setup.c
needs to know that extensions.partialClone is a valid extension, and
that promisor-remote.c needs to interpret extensions.partialClone.

Having said that, I don't feel very strongly about keeping everything in
promisor-remote.c, so I can move it into setup.c if that's the reviewer
consensus.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 2/4] promisor-remote: support per-repository config
  2021-06-08  3:30     ` Junio C Hamano
@ 2021-06-09  4:29       ` Jonathan Tan
  0 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-09  4:29 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, me, newren, emilyshaffer

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > Instead of using global variables to store promisor remote information,
> > store this config in struct repository instead, and add
> > repository-agnostic non-static functions corresponding to the existing
> > non-static functions that only work on the_repository.
> 
> This does make sense.  In general, repository extensions are per
> repository, so anything read from "extensions.*" should be stored
> per in-core repository structure.
> 
> But doesn't that mean the thing that should be fixed is on the
> setup.c side, where not just extensions.partialClone but other data
> is read into "struct repository_format *format"?  Shouldn't we have
> a pointer to that struct in the in-core repository object?
> 
> Special casing the "partialClone" field alone feels somewhat strange
> to me.
> 
> Thanks.

My reply is the same as what I replied to the query in patch 1 [1].

[1] https://lore.kernel.org/git/20210609042649.2322758-1-jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 3/4] run-command: move envvar-resetting function
  2021-06-08  4:14     ` Junio C Hamano
@ 2021-06-09  4:32       ` Jonathan Tan
  2021-06-09  5:28         ` Junio C Hamano
  0 siblings, 1 reply; 77+ messages in thread
From: Jonathan Tan @ 2021-06-09  4:32 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, me, newren, emilyshaffer

> It does make sense to move this to run-command.c from submodule.c
> and the function name is already suitable for being global.  I
> however cannot help wondering if we should also pay attention to the
> GIT_CONFIG_KEY_$n and GIT_CONFIG_VALUE_$n pairs (which is not a new
> problem in this patch).

Note that I changed the function name (the previous one was too
submodule-specific). As for the config pairs, they are currently being
passed through - do you have a situation in mind in which they should
not be passed through?

> This helper may sit better next to prep_childenv(), instead of just
> saying "the location does not matter, just append randomly at the
> end", though.
> 
> Otherwise looking good.
> 
> Thanks.

Thanks - I'll move it.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-08  4:33     ` Junio C Hamano
@ 2021-06-09  4:39       ` Jonathan Tan
  2021-06-09  5:33         ` Junio C Hamano
  0 siblings, 1 reply; 77+ messages in thread
From: Jonathan Tan @ 2021-06-09  4:39 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, me, newren, emilyshaffer

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> >  		/* Check if it is a missing object */
> > -		if (fetch_if_missing && has_promisor_remote() &&
> > -		    !already_retried && r == the_repository &&
> > +		if (fetch_if_missing && repo_has_promisor_remote(r) &&
> > +		    !already_retried &&
> 
> Turning has_promisor_remote() into repo_has_promisor_remote(r) does
> make tons of sense.  Is this part of the code ready to lose "'r' must
> be the_repository because has_promisor_remote() only works on the
> primary in-core repository" we had before?

Yes - that is precisely what I test with the test helper (running this
code path with "r" that is not the_repository).
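The retry logic that the condition above guards can be sketched in isolation. This is a minimal standalone model, not Git's object-file.c: `struct toy_repo`, `toy_lookup()`, `toy_fetch()` and `toy_read_object()` are hypothetical stand-ins. It shows why the check takes the repository `r` as an argument: whether a missing object may be lazily fetched depends on the promisor configuration of that particular repository, and the fetch is attempted at most once per lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an in-core repository. */
struct toy_repo {
	int has_promisor_remote; /* would come from this repo's own config */
	int present[8];          /* present[i] != 0 if object i is local */
	int remote_has[8];       /* objects the promisor remote can supply */
	int fetch_count;         /* number of lazy fetches attempted */
};

static int fetch_if_missing = 1;

static int toy_lookup(const struct toy_repo *r, size_t oid)
{
	return oid < 8 && r->present[oid];
}

/* Simulate a lazy fetch from the promisor remote of *this* repository. */
static void toy_fetch(struct toy_repo *r, size_t oid)
{
	r->fetch_count++;
	if (oid < 8 && r->remote_has[oid])
		r->present[oid] = 1;
}

/* Returns 0 if the object is available (possibly after one fetch), else -1. */
static int toy_read_object(struct toy_repo *r, size_t oid)
{
	int already_retried = 0;

	assert(r != NULL);
	for (;;) {
		if (toy_lookup(r, oid))
			return 0;
		/* Missing object: fetch once if this repo has a promisor. */
		if (fetch_if_missing && r->has_promisor_remote &&
		    !already_retried) {
			toy_fetch(r, oid);
			already_retried = 1;
			continue;
		}
		return -1;
	}
}
```

A repository without a promisor remote never triggers a fetch, and a repository with one retries the lookup exactly once.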

> > @@ -21,6 +22,11 @@ static int fetch_objects(const char *remote_name,
> >  
> >  	child.git_cmd = 1;
> >  	child.in = -1;
> > +	if (repo != the_repository) {
> > +		prepare_other_repo_env(&child.env_array);
> > +		strvec_pushf(&child.env_array, "%s=%s", GIT_DIR_ENVIRONMENT,
> > +			     repo->gitdir);
> > +	}
> 
> This is what prepare_submodule_repo_env_in_gitdir() does; it makes
> me wonder if it (i.e. set up environment for that other repository,
> including the GIT_DIR and possibly other per-repository environment
> variable override) should be the primary API callers would want,
> instead of a more limited prepare_other_repo_env() that does not
> even take 'repo' parameter.  Doesn't it feel somewhat strange for a
> function that is supposed to help preparing a part of child process
> by filling appropriate environ[] array to be run in a repository
> that is different from ours (which is "other repo" part of its name)
> not to want to even know which repository the "other" repo is?

Good point. I'll update prepare_other_repo_env() to have a gitdir
parameter.

> > diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c
> > new file mode 100644
> > index 0000000000..3f102cfddd
> > --- /dev/null
> > +++ b/t/helper/test-partial-clone.c
> > @@ -0,0 +1,43 @@
> > +#include "cache.h"
> > +#include "test-tool.h"
> > +#include "repository.h"
> > +#include "object-store.h"
> > +
> > +/*
> > + * Prints the size of the object corresponding to the given hash in a specific
> > + * gitdir. This is similar to "git -C gitdir cat-file -s", except that this
> > + * exercises the code that accesses the object of an arbitrary repository that
> > + * is not the_repository. ("git -C gitdir" makes it so that the_repository is
> > + * the one in gitdir.)
> > + */
> 
> The reason why this only gives size is because it will eventually
> become unnecessary once the main code starts running things in a
> submodule repository properly (i.e. without doing the alternate odb
> thing),

If you mean that this code path can be tested through user-visible
commands (e.g. git grep with submodule recursion) once the main code
starts avoiding doing the alternate odb thing, so this helper will
eventually be unnecessary, then the answer is yes.

> and a more elaborate check is not worth your engineering
> effort?

Even now, I don't think a more elaborate check is worth the engineering
effort - I just want to check that the file indeed was lazy-fetched, so
any minor datum would suffice.

> Object type and object sizes are something that you can
> safely express in plain text, would be handy for testing, and would
> not require too much extra code, I'd imagine.

It would, but we can already use "git cat-file -s" (or -t) for that. The
helper is meant to test a specific code path wherein we access a
submodule object during a process running in the superproject.

> > +	printf("%d\n", (int) size);
> 
> Mimicking what builtin/cat-file.c::cat_one_file() does, for example, and
> using
> 
> 	printf("%"PRIuMAX"\n", (uintmax_t)size);
> 
> might be better (I was wondering if we can extract reusable helpers,
> but I do not think that is worth doing, if this is meant to be
> temporary stop-gap measure).
> 
> Thanks.

Sounds good - I'll change this.
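For reference, the portable idiom suggested above widens the value to `uintmax_t` and prints it with the `PRIuMAX` macro from `<inttypes.h>`, so a large size is never truncated the way `(int) size` could truncate it. A minimal standalone sketch; `format_object_size()` is a hypothetical helper that writes into a buffer so the result can be checked:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format an object size by widening it to
 * uintmax_t and using PRIuMAX, instead of casting it to int.
 */
static int format_object_size(char *buf, size_t len, unsigned long size)
{
	assert(buf && len);
	return snprintf(buf, len, "%" PRIuMAX "\n", (uintmax_t)size);
}
```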


* Re: [PATCH v2 1/4] promisor-remote: read partialClone config here
  2021-06-08 17:28     ` Elijah Newren
@ 2021-06-09  4:44       ` Jonathan Tan
  2021-06-09  5:34         ` Elijah Newren
  0 siblings, 1 reply; 77+ messages in thread
From: Jonathan Tan @ 2021-06-09  4:44 UTC (permalink / raw)
  To: newren; +Cc: jonathantanmy, git, me, emilyshaffer

> > @@ -99,6 +94,15 @@ static int promisor_remote_config(const char *var, const char *value, void *data
> >         size_t namelen;
> >         const char *subkey;
> >
> > +       if (!strcmp(var, "extensions.partialclone")) {
> > +               /*
> > +                * NULL value is handled in handle_extension_v0 in setup.c.
> > +                */
> > +               if (value)
> > +                       repository_format_partial_clone = xstrdup(value);
> > +               return 0;
> > +       }
> 
> This is actually slightly hard to parse out.  I was trying to figure
> out where repository_format_partial_clone was initialized, and it's
> not handled when value is NULL in handle_extension_v0; it's the fact
> that repository_format_partial_clone is declared a static global
> variable.
> 
> But in the next patch you make it a member of struct
> promisor_remote_config, and instead rely on the xcalloc call in
> promisor_remote_init().
> 
> That means everything is properly initialized and you haven't made any
> mistakes here, but the logic is a bit hard to follow.  Perhaps it'd be
> nicer to just write this as
> 
> +       if (!strcmp(var, "extensions.partialclone")) {
> +               repository_format_partial_clone = xstrdup_or_null(value);
> +               return 0;
> +       }
> 
> which makes the code shorter and easier to follow, at least for me.

Hmm...is your concern about the case in which
repository_format_partial_clone is uninitialized, or about ignoring a
potential NULL value? If the former, I don't see how your suggestion
fixes things, since extensions.partialclone may never have been in the
config in the first place (and would thus leave
repository_format_partial_clone uninitialized, if it weren't for the
fact that it is in static storage and thus initialized to 0). If the
latter, I guess I should be more detailed about how it's being handled
in setup.c (or maybe just leave out the comment altogether - the code
here can handle a NULL repository_format_partial_clone for some reason).
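Both points under discussion fit in a few lines. A pointer with static storage duration, like the file-scope `repository_format_partial_clone`, starts out as a null pointer even if the config callback never runs, and Git's `xstrdup_or_null()` folds the `if (value)` check into a single call. The sketch below is standalone: `dup_or_null()` and `partial_clone_config_cb()` are hypothetical stand-ins (without Git's die-on-allocation-failure behavior), not Git's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Static storage duration: implicitly initialized to a null pointer. */
static char *partial_clone_remote;

/* Hypothetical stand-in for Git's xstrdup_or_null() (no die-on-OOM). */
static char *dup_or_null(const char *str)
{
	char *copy;
	size_t len;

	if (!str)
		return NULL;
	len = strlen(str) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, str, len);
	return copy;
}

/* Hypothetical config callback body for "extensions.partialclone". */
static int partial_clone_config_cb(const char *var, const char *value)
{
	if (!strcmp(var, "extensions.partialclone")) {
		free(partial_clone_remote);
		partial_clone_remote = dup_or_null(value);
	}
	return 0;
}
```

One behavioral difference is worth keeping in mind: with the `xstrdup_or_null()` form, a valueless `extensions.partialclone` entry resets a previously read value to NULL, whereas the `if (value)` form leaves it untouched.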


* Re: [PATCH v2 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-08 17:42     ` Elijah Newren
@ 2021-06-09  4:46       ` Jonathan Tan
  0 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-09  4:46 UTC (permalink / raw)
  To: newren; +Cc: jonathantanmy, git, me, emilyshaffer

> On Mon, Jun 7, 2021 at 5:26 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > This is one step towards supporting partial clone submodules.
> >
> > Even after this patch, we will still lack partial clone submodules
> > support, primarily because a lot of Git code that accesses submodule
> > objects does so by adding their object stores as alternates, meaning
> > that any lazy fetches that would occur in the submodule would be done
> > based on the config of the superproject, not of the submodule. This also
> > prevents testing of the functionality in this patch by user-facing
> > commands. So for now, test this mechanism using a test helper.
> 
> I wonder if this commit message is a good place to call out that we
> also want to eventually audit codepaths using the old
> has_promisor_remote() wrapper function (particularly the ones
> protected by a repo == the_repository check) as well.

Sounds good. I think we will need to check uses of all wrappers.


* Re: [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-08  1:41   ` Emily Shaffer
@ 2021-06-09  4:52     ` Jonathan Tan
  0 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-09  4:52 UTC (permalink / raw)
  To: emilyshaffer; +Cc: jonathantanmy, git

> >  		    !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
> >  			/*
> >  			 * TODO Investigate checking promisor_remote_get_direct()
> >  			 * TODO return value and stopping on error here.
> > -			 * TODO Pass a repository struct through
> > -			 * promisor_remote_get_direct(), such that arbitrary
> > -			 * repositories work.
> >  			 */
> >  			promisor_remote_get_direct(r, real, 1);
> 
> And this seems like a stale comment, since I see we were already passing
> 'r' here. But arbitrary repositories still don't just work, right? Or, I
> guess your point was "partial clone + submodules don't just work,
> because of the alternates thing" - so maybe this part is OK?

This part is OK (arbitrary repositories work here), yes.

> > @@ -150,7 +156,7 @@ static void promisor_remote_init(struct repository *r)
> >  		xcalloc(sizeof(*r->promisor_remote_config), 1);
> >  	config->promisors_tail = &config->promisors;
> >  
> > -	git_config(promisor_remote_config, config);
> > +	repo_config(r, promisor_remote_config, config);
> 
> Should this change have happened when we added 'r' to
> promisor_remote_init? If r==the_repository then there's no difference
> between these two calls, right?

Good point - yes, it should have. I'll change it.

> > +test_expect_success 'lazy-fetch when accessing object not in the_repository' '
> > +	rm -rf full partial.git &&
> > +	test_create_repo full &&
> > +	printf 12345 >full/file.txt &&
> > +	git -C full add file.txt &&
> > +	git -C full commit -m "first commit" &&
> I think there is some test_commit or similar function here that's more
> commonly used, right?

Taylor Blau suggested a similar thing, and I have changed it in v2.

> 
> > +
> > +	test_config -C full uploadpack.allowfilter 1 &&
> > +	test_config -C full uploadpack.allowanysha1inwant 1 &&
> I wasn't sure what these configs are for, but it looks like .allowfilter
> is to allow 'full' to serve as a remote to a partial clone. But what do
> you need .allowAnySha1InWant for here? Are we expecting to ask for SHAs
> that 'full' doesn't have?

We are expecting to ask for SHAs that 'full' doesn't *advertise*, yes (namely,
the hash of a certain blob).

> > +	git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git &&
> > +	FILE_HASH=$(git hash-object --stdin <full/file.txt) &&
> > +
> > +	# Sanity check that the file is missing
> > +	git -C partial.git rev-list --objects --missing=print HEAD >out &&
> > +	grep "[?]$FILE_HASH" out &&
> > +
> > +	OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") &&
> > +	test "$OUT" -eq 5 &&
> 
> Hm. I guess I am confused about why this fetches the desired object into
> partial.git. Maybe the test-helper needs a comment (and maybe here too)
> on the line where fetch will be triggered?

I've added a comment to the test-helper code in v2 - could you take a
look and see if that clarifies things? But in any case, the answer is
that this test-tool invocation attempts to read an object in the
submodule while running as a process in the superproject. The read
attempt is a read of a missing object, so that object is lazily fetched.


* Re: [PATCH v2 0/4] First steps towards partial clone submodules
  2021-06-08 17:50   ` [PATCH v2 0/4] First steps towards partial clone submodules Elijah Newren
  2021-06-08 23:42     ` Junio C Hamano
@ 2021-06-09  4:58     ` Jonathan Tan
  1 sibling, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-09  4:58 UTC (permalink / raw)
  To: newren; +Cc: jonathantanmy, git, me, emilyshaffer, gitster

> Looks like Junio did spot some bigger items...which raises a question
> for me.  I have a series
> (https://lore.kernel.org/git/pull.969.git.1622856485.gitgitgadget@gmail.com/)
> that also touches partial clones.  Our series are semantically
> independent, but we both add a repository parameter to
> fetch_objects().  So we both make the same change, but you also make
> additional nearby changes, resulting in two trivial conflicts.  So,
> should I rebase my series on yours, should you rebase on mine, or
> should we just let both proceed independently and double-check Junio
> resolves the trivial conflicts in favor of your side?
> 
> Thoughts?

From [1], it looks like this is already resolved, but in any case I think
we can just let both proceed independently since the conflicts are
relatively trivial. If it turns out to be not so trivial, I think Junio
can just let one of us know on-list and whoever it is can rebase on the
other's.

[1] https://lore.kernel.org/git/xmqqlf7jnb5u.fsf@gitster.g/


* Re: [PATCH v2 3/4] run-command: move envvar-resetting function
  2021-06-09  4:32       ` Jonathan Tan
@ 2021-06-09  5:28         ` Junio C Hamano
  2021-06-09 18:15           ` Jonathan Tan
  0 siblings, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2021-06-09  5:28 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, me, newren, emilyshaffer

Jonathan Tan <jonathantanmy@google.com> writes:

>> It does make sense to move this to run-command.c from submodule.c
>> and the function name is already suitable for being global.  I
>> however cannot help wondering if we should also pay attention to the
>> GIT_CONFIG_KEY_$n and GIT_CONFIG_VALUE_$n pairs (which is not a new
>> problem in this patch).
>
> Note that I changed the function name (the previous one was too
> submodule-specific).

Ah, sorry for a stale mention of that thing---in any case, the name
is suitable for a global.

> As for the config pairs, they are currently being
> passed through - do you have a situation in mind in which they should
> not be passed through?

Wasn't the GIT_CONFIG_KEY/VALUE meant as a moral equivalent of the
GIT_CONFIG_PARAMETERS for those scripts that do not want to bother
following the quoting rules of the single parameter approach?

I do not see why we should filter configuration variables passed via
one mechanism and let variables passed via the other mechanism
through.  That feels inconsistent (I suspect that there may already
be inconsistencies introduced when GIT_CONFIG_KEY/VALUE mechanism
was added, though).


* Re: [PATCH v2 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-09  4:39       ` Jonathan Tan
@ 2021-06-09  5:33         ` Junio C Hamano
  2021-06-09 18:20           ` Jonathan Tan
  0 siblings, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2021-06-09  5:33 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, me, newren, emilyshaffer

Jonathan Tan <jonathantanmy@google.com> writes:

>> by filling appropriate environ[] array to be run in a repository
>> that is different from ours (which is "other repo" part of its name)
>> not to want to even know which repository the "other" repo is?
>
> Good point. I'll update prepare_other_repo_env() to have a gitdir
> parameter.

I actually meant that the function should take an in-core "repo"
structure.

>> Object type and object sizes are something that you can
>> safely express in plain text, would be handy for testing, and would
>> not require too much extra code, I'd imagine.
>
> It would, but we can already use "git cat-file -s" (or -t) for that. The
> helper is meant to test a specific code path wherein we access a
> submodule object during a process running in the superproject.

I know, but can you use "git cat-file -s" to check the codepath you
care about?  I do not think so.  Hence the suggestion.



* Re: [PATCH v2 1/4] promisor-remote: read partialClone config here
  2021-06-09  4:44       ` Jonathan Tan
@ 2021-06-09  5:34         ` Elijah Newren
  2021-06-10 17:25           ` Jonathan Tan
  0 siblings, 1 reply; 77+ messages in thread
From: Elijah Newren @ 2021-06-09  5:34 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List, Taylor Blau, Emily Shaffer

On Tue, Jun 8, 2021 at 9:44 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > > @@ -99,6 +94,15 @@ static int promisor_remote_config(const char *var, const char *value, void *data
> > >         size_t namelen;
> > >         const char *subkey;
> > >
> > > +       if (!strcmp(var, "extensions.partialclone")) {
> > > +               /*
> > > +                * NULL value is handled in handle_extension_v0 in setup.c.
> > > +                */
> > > +               if (value)
> > > +                       repository_format_partial_clone = xstrdup(value);
> > > +               return 0;
> > > +       }
> >
> > This is actually slightly hard to parse out.  I was trying to figure
> > out where repository_format_partial_clone was initialized, and it's
> > not handled when value is NULL in handle_extension_v0; it's the fact
> > that repository_format_partial_clone is declared a static global
> > variable.
> >
> > But in the next patch you make it a member of struct
> > promisor_remote_config, and instead rely on the xcalloc call in
> > promisor_remote_init().
> >
> > That means everything is properly initialized and you haven't made any
> > mistakes here, but the logic is a bit hard to follow.  Perhaps it'd be
> > nicer to just write this as
> >
> > +       if (!strcmp(var, "extensions.partialclone")) {
> > +               repository_format_partial_clone = xstrdup_or_null(value);
> > +               return 0;
> > +       }
> >
> > which makes the code shorter and easier to follow, at least for me.
>
> Hmm...is your concern about the case in which
> repository_format_partial_clone is uninitialized, or about ignoring a
> potential NULL value? If the former, I don't see how your suggestion
> fixes things, since extensions.partialclone may never have been in the
> config in the first place (and would thus leave
> repository_format_partial_clone uninitialized, if it weren't for the
> fact that it is in static storage and thus initialized to 0). If the
> latter, I guess I should be more detailed about how it's being handled
> in setup.c (or maybe just leave out the comment altogether - the code
> here can handle a NULL repository_format_partial_clone for some reason).

My comment was about the latter; I was trying to understand what the
comment meant relative to that case, and how and where that case would
be handled in the code.  With that frame of reference, the comment
seemed misleading to me...though perhaps the comment was intended to
answer some other question entirely.


* Re: [PATCH v2 1/4] promisor-remote: read partialClone config here
  2021-06-09  4:26       ` Jonathan Tan
@ 2021-06-09  9:30         ` Junio C Hamano
  2021-06-09 17:16           ` Jonathan Tan
  0 siblings, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2021-06-09  9:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, me, newren, emilyshaffer

Jonathan Tan <jonathantanmy@google.com> writes:

> I'm reluctant to add more fields to struct repository_format.

I am only suggesting to add one new member to either struct
repository or struct repo_settings, so that it becomes crystal clear
that struct repository_format is about each single repository, not
the global the_repository.  Other things that partial repository
support needs to keep *and* do not directly come from extensions
would not belong to repository_format and should not be added there,
but what we read from extensions.* for each repository belongs to
each instance of in-core repository structure and should be
discoverable starting from "struct repository", no?




* Re: [PATCH v2 1/4] promisor-remote: read partialClone config here
  2021-06-09  9:30         ` Junio C Hamano
@ 2021-06-09 17:16           ` Jonathan Tan
  0 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-09 17:16 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, me, newren, emilyshaffer

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > I'm reluctant to add more fields to struct repository_format.
> 
> I am only suggesting to add one new member to either struct
> repository or struct repo_settings, so that it becomes crystal clear
> that struct repository_format is about each single repository, not
> the global the_repository.  Other things that partial repository
> support needs to keep *and* do not directly come from extensions
> would not belong to repository_format and should not be added there,
> but what we read from extensions.* for each repository belongs to
> each instance of in-core repository structure and should be
> discoverable starting from "struct repository", no?

Ah, I see. I'll try this.


* Re: [PATCH v2 3/4] run-command: move envvar-resetting function
  2021-06-09  5:28         ` Junio C Hamano
@ 2021-06-09 18:15           ` Jonathan Tan
  0 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-09 18:15 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, me, newren, emilyshaffer

> > As for the config pairs, they are currently being
> > passed through - do you have a situation in mind in which they should
> > not be passed through?
> 
> Wasn't the GIT_CONFIG_KEY/VALUE meant as a moral equivalent of the
> GIT_CONFIG_PARAMETERS for those scripts that do not want to bother
> following the quoting rules of the single parameter approach?
> 
> I do not see why we should filter configuration variables passed via
> one mechanism and let variables passed via the other mechanism
> through.  That feels inconsistent (I suspect that there may already
> be inconsistencies introduced when GIT_CONFIG_KEY/VALUE mechanism
> was added, though).

Yes, I think you're right. The filter was added in 14111fc492 ("git:
submodule honor -c credential.* from command line", 2016-03-01) (letting
a sanitized version of GIT_CONFIG_PARAMETERS through) and took its
present form in 89044baa8b ("submodule: stop sanitizing config options",
2016-05-06) (removing the sanitization). GIT_CONFIG_KEY/VALUE was only
added in d8d77153ea ("config: allow specifying config entries via envvar
pairs", 2021-01-15).

Reading the commit messages of these commits, I agree that
GIT_CONFIG_KEY/VALUE should be treated the same way as
GIT_CONFIG_PARAMETERS. In particular, 14111fc492 mentioned passing
credentials as the reason for the feature, and in d8d77153ea, the
ability to pass secrets without exposing them to a user running "ps" was
one of the motivating reasons. I'll add a commit changing this before
this commit.
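To make the two mechanisms concrete: `GIT_CONFIG_PARAMETERS` packs every `-c` entry into one quoted string, while the pair mechanism from d8d77153ea uses `GIT_CONFIG_COUNT` plus numbered `GIT_CONFIG_KEY_<n>` and `GIT_CONFIG_VALUE_<n>` variables, with no quoting rules to follow. Below is a minimal standalone sketch of a validator for the pair form. `env_lookup()` and `count_env_config_pairs()` are hypothetical (this is not Git's config.c parser), and they take a NULL-terminated `NAME=value` array, so `environ` could be passed in real use. Since both forms carry the same kind of data into a child process, filtering one and not the other would be inconsistent.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up NAME in a NULL-terminated array of "NAME=value" strings. */
static const char *env_lookup(const char *const *env, const char *name)
{
	size_t len = strlen(name);

	assert(env && name);
	for (; *env; env++)
		if (!strncmp(*env, name, len) && (*env)[len] == '=')
			return *env + len + 1;
	return NULL;
}

/*
 * Hypothetical validator for the GIT_CONFIG_COUNT / GIT_CONFIG_KEY_<n> /
 * GIT_CONFIG_VALUE_<n> convention. Returns the number of pairs (0 if
 * GIT_CONFIG_COUNT is unset or empty), or -1 if the input is malformed.
 */
static int count_env_config_pairs(const char *const *env)
{
	const char *count_str = env_lookup(env, "GIT_CONFIG_COUNT");
	char *end;
	long count, i;

	if (!count_str || !*count_str)
		return 0;
	count = strtol(count_str, &end, 10);
	if (*end || count < 0 || count > 1000000)
		return -1;

	for (i = 0; i < count; i++) {
		char name[64];

		/* Every index below the count needs both a key and a value. */
		snprintf(name, sizeof(name), "GIT_CONFIG_KEY_%ld", i);
		if (!env_lookup(env, name))
			return -1;
		snprintf(name, sizeof(name), "GIT_CONFIG_VALUE_%ld", i);
		if (!env_lookup(env, name))
			return -1;
	}
	return (int)count;
}
```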


* Re: [PATCH v2 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-09  5:33         ` Junio C Hamano
@ 2021-06-09 18:20           ` Jonathan Tan
  2021-06-10  1:26             ` Junio C Hamano
  0 siblings, 1 reply; 77+ messages in thread
From: Jonathan Tan @ 2021-06-09 18:20 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, me, newren, emilyshaffer

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> >> by filling appropriate environ[] array to be run in a repository
> >> that is different from ours (which is "other repo" part of its name)
> >> not to want to even know which repository the "other" repo is?
> >
> > Good point. I'll update prepare_other_repo_env() to have a gitdir
> > parameter.
> 
> I actually meant that the function should take an in-core "repo"
> structure.

But that seems like we're passing much more than we need - we only need
the git_dir. Also, there is a function that wants to pass a literal "."
as the gitdir; if we do this, I'll have to check if there is still a
struct repository that we can pass that will result in the same gitdir.

> >> Object type and object sizes are something that you can
> >> safely express in plain text, would be handy for testing, and would
> >> not require too much extra code, I'd imagine.
> >
> > It would, but we can already use "git cat-file -s" (or -t) for that. The
> > helper is meant to test a specific code path wherein we access a
> > submodule object during a process running in the superproject.
> 
> I know, but can you use "git cat-file -s" to check the codepath you
> care about?  I do not think so.  Hence the suggestion.

I'm still not convinced that we'll need it in the future, but you're
right that it is not too much trouble. I'll add it in.


* Re: [PATCH v2 4/4] promisor-remote: teach lazy-fetch in any repo
  2021-06-09 18:20           ` Jonathan Tan
@ 2021-06-10  1:26             ` Junio C Hamano
  0 siblings, 0 replies; 77+ messages in thread
From: Junio C Hamano @ 2021-06-10  1:26 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, me, newren, emilyshaffer

Jonathan Tan <jonathantanmy@google.com> writes:

>> Jonathan Tan <jonathantanmy@google.com> writes:
>> 
>> >> by filling appropriate environ[] array to be run in a repository
>> >> that is different from ours (which is "other repo" part of its name)
>> >> not to want to even know which repository the "other" repo is?
>> >
>> > Good point. I'll update prepare_other_repo_env() to have a gitdir
>> > parameter.
>> 
>> I actually meant that the function should take an in-core "repo"
>> structure.
>
> But that seems like we're passing much more than we need - we only need
> the git_dir. Also, there is a function that wants to pass a literal "."
> as the gitdir; if we do this, I'll have to check if there is still a
> struct repository that we can pass that will result in the same gitdir.

OK.  If the caller at the point does not have anything but git-dir,
there may not be much point in instantiating a full in-core repo
structure to pass to prepare_other_repo_env() to it.  If the helper
needs to learn more about that repository, it can go from the
git-dir and do things itself.
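The shape settled on here, a helper that takes only the other repository's gitdir, can be sketched standalone. This is a hypothetical model, not the code in run-command.c: `struct toy_env`, `toy_prepare_other_repo_env()` and the short `repo_vars` list are assumptions standing in for Git's `struct strvec` and `local_repo_env`. It follows the run-command convention that an entry without `=` means "unset this in the child": it unsets the per-repository variables, keeps the config-passing ones (per the GIT_CONFIG_COUNT discussion above), and then points GIT_DIR at the given directory, which may be a literal ".".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_ENV_MAX 16

/* Hypothetical fixed-capacity stand-in for Git's struct strvec. */
struct toy_env {
	char *items[TOY_ENV_MAX];
	size_t nr;
};

/* Hypothetical subset of per-repository variables (not Git's full list). */
static const char *const repo_vars[] = {
	"GIT_DIR", "GIT_WORK_TREE", "GIT_INDEX_FILE", "GIT_OBJECT_DIRECTORY",
	"GIT_CONFIG_PARAMETERS", "GIT_CONFIG_COUNT", NULL
};

static void toy_push(struct toy_env *env, const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy;

	if (env->nr >= TOY_ENV_MAX)
		abort(); /* sketch: fixed capacity */
	copy = malloc(len);
	if (!copy)
		abort();
	memcpy(copy, s, len);
	env->items[env->nr++] = copy;
}

/*
 * Entries without '=' mean "unset in the child". GIT_CONFIG_PARAMETERS
 * and GIT_CONFIG_COUNT are kept; the numbered KEY/VALUE variables are
 * never unset by name, so keeping the count is what lets them through.
 */
static void toy_prepare_other_repo_env(struct toy_env *env, const char *gitdir)
{
	const char *const *var;
	size_t len;
	char *buf;

	assert(gitdir);
	for (var = repo_vars; *var; var++)
		if (strcmp(*var, "GIT_CONFIG_PARAMETERS") &&
		    strcmp(*var, "GIT_CONFIG_COUNT"))
			toy_push(env, *var);

	len = strlen("GIT_DIR=") + strlen(gitdir) + 1;
	buf = malloc(len);
	if (!buf)
		abort();
	snprintf(buf, len, "GIT_DIR=%s", gitdir);
	toy_push(env, buf);
	free(buf);
}
```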

>> >> Object type and object sizes are something that you can
>> >> safely express in plain text, would be handy for testing, and would
>> >> not require too much extra code, I'd imagine.
>> >
>> > It would, but we can already use "git cat-file -s" (or -t) for that. The
>> > helper is meant to test a specific code path wherein we access a
>> > submodule object during a process running in the superproject.
>> 
>> I know, but can you use "git cat-file -s" to check the codepath you
>> care about?  I do not think so.  Hence the suggestion.
>
> I'm still not convinced that we'll need it in the future, but you're
> right that it is not too much trouble. I'll add it in.

As your answer to my initial question was that this is purely a
stop-gap testing measure until we get the support fully plumbed in
so that the real-world codepath can be tested end-to-end, I do not
think it matters all that much.  If it is easy to add, I suspect
that it would help to catch more bugs, but I wouldn't lose sleep if
it doesn't get added.

Thanks.


* Re: [PATCH v2 1/4] promisor-remote: read partialClone config here
  2021-06-09  5:34         ` Elijah Newren
@ 2021-06-10 17:25           ` Jonathan Tan
  0 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-10 17:25 UTC (permalink / raw)
  To: newren; +Cc: jonathantanmy, git, me, emilyshaffer

> > Hmm...is your concern about the case in which
> > repository_format_partial_clone is uninitialized, or about ignoring a
> > potential NULL value? If the former, I don't see how your suggestion
> > fixes things, since extensions.partialclone may never have been in the
> > config in the first place (and would thus leave
> > repository_format_partial_clone uninitialized, if it weren't for the
> > fact that it is in static storage and thus initialized to 0). If the
> > latter, I guess I should be more detailed about how it's being handled
> > in setup.c (or maybe just leave out the comment altogether - the code
> > here can handle a NULL repository_format_partial_clone for some reason).
> 
> My comment was about the latter; I was trying to understand what the
> comment meant relative to that case, and how and where that case would
> be handled in the code.  With that frame of reference, the comment
> seemed misleading to me...though perhaps the comment was intended to
> answer some other question entirely.

Junio suggested [1] that repository_format_partial_clone be handled when
the repo format is validated, so this part of the code can just make use
of the repository_format_partial_clone value in struct repository and
not read the config itself. So I believe that this part is now
obsolete (but you can take a look at patches 1 and 2 to verify, if you
want).

[1] https://lore.kernel.org/git/xmqqeedbidvy.fsf@gitster.g/


* [PATCH v3 0/5] First steps towards partial clone submodules
  2021-06-01 21:34 [PATCH 0/4] First steps towards partial clone submodules Jonathan Tan
                   ` (5 preceding siblings ...)
  2021-06-08  1:44 ` [PATCH " Emily Shaffer
@ 2021-06-10 17:35 ` Jonathan Tan
  2021-06-10 17:35   ` [PATCH v3 1/5] repository: move global r_f_p_c to repo struct Jonathan Tan
                     ` (5 more replies)
  2021-06-17 17:13 ` [PATCH v4 " Jonathan Tan
  7 siblings, 6 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-10 17:35 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, me, newren, emilyshaffer, gitster

I think I've addressed all review comments. As for Junio's suggestion
about also printing the type in the former patch 4 (now patch 5) [1], I
decided to just leave the code as-is and not also print the type.

The main changes are that patch 1 is somewhat rewritten - we still
remove the global variable, but we no longer read the
extensions.partialClone config directly from promisor-remote.c. Instead,
we store it in struct repository when the format of a repository is
being verified, and promisor-remote.c merely reads it from there. Patch
3 is a new patch that updates the environment variable preparation
before it is moved in patch 4 (formerly patch 3).
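The resulting data flow can be modeled in a few lines. The extension value lives on the repository struct (filled in when the repository format is verified), and the promisor-remote state is allocated per repository on first use, reading from that repository instead of from a process-wide global. Everything below is a hypothetical, simplified model (`struct toy_repository`, `struct toy_promisor_config`, `toy_promisor_init()`), not the structs in repository.h or promisor-remote.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_promisor_config {
	const char *default_remote; /* promisor named by extensions.partialClone */
};

struct toy_repository {
	const char *gitdir;
	/* Filled in when the repository format is verified. */
	const char *repository_format_partial_clone;
	/* Lazily allocated; NULL until first use. */
	struct toy_promisor_config *promisor_remote_config;
};

static struct toy_promisor_config *toy_promisor_init(struct toy_repository *r)
{
	struct toy_promisor_config *config;

	assert(r);
	if (r->promisor_remote_config)
		return r->promisor_remote_config; /* already initialized */
	config = calloc(1, sizeof(*config));
	if (!config)
		abort();
	/* Read from *this* repository, not from a global. */
	config->default_remote = r->repository_format_partial_clone;
	r->promisor_remote_config = config;
	return config;
}

static int toy_has_promisor_remote(struct toy_repository *r)
{
	return toy_promisor_init(r)->default_remote != NULL;
}
```

With this layout a superproject and a partially cloned submodule can coexist in one process, each answering the "has a promisor remote?" question from its own configuration.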

[1] https://lore.kernel.org/git/xmqq7dj2ik7k.fsf@gitster.g/

Jonathan Tan (5):
  repository: move global r_f_p_c to repo struct
  promisor-remote: support per-repository config
  submodule: refrain from filtering GIT_CONFIG_COUNT
  run-command: refactor subprocess env preparation
  promisor-remote: teach lazy-fetch in any repo

 Makefile                      |   1 +
 object-file.c                 |   7 +--
 promisor-remote.c             | 108 ++++++++++++++++++----------------
 promisor-remote.h             |  28 ++++++---
 repository.c                  |   9 +++
 repository.h                  |   5 ++
 run-command.c                 |  12 ++++
 run-command.h                 |  10 ++++
 setup.c                       |  16 +++--
 submodule.c                   |  17 +-----
 t/helper/test-partial-clone.c |  43 ++++++++++++++
 t/helper/test-tool.c          |   1 +
 t/helper/test-tool.h          |   1 +
 t/t0410-partial-clone.sh      |  23 ++++++++
 14 files changed, 196 insertions(+), 85 deletions(-)
 create mode 100644 t/helper/test-partial-clone.c

Range-diff against v2:
1:  d99598ca50 < -:  ---------- promisor-remote: read partialClone config here
-:  ---------- > 1:  255d112256 repository: move global r_f_p_c to repo struct
2:  5a1ccae335 ! 2:  a52448cff2 promisor-remote: support per-repository config
    @@ promisor-remote.c
      #include "transport.h"
      #include "strvec.h"
      
    --static char *repository_format_partial_clone;
     +struct promisor_remote_config {
    -+	char *repository_format_partial_clone;
     +	struct promisor_remote *promisors;
     +	struct promisor_remote **promisors_tail;
     +};
    - 
    ++
      static int fetch_objects(const char *remote_name,
      			 const struct object_id *oids,
    + 			 int oid_nr)
     @@ promisor-remote.c: static int fetch_objects(const char *remote_name,
      	return finish_command(&child) ? -1 : 0;
      }
    @@ promisor-remote.c: static void promisor_remote_move_to_tail(struct promisor_remo
      	const char *name;
      	size_t namelen;
      	const char *subkey;
    -@@ promisor-remote.c: static int promisor_remote_config(const char *var, const char *value, void *data
    - 		 * NULL value is handled in handle_extension_v0 in setup.c.
    - 		 */
    - 		if (value)
    --			repository_format_partial_clone = xstrdup(value);
    -+			config->repository_format_partial_clone = xstrdup(value);
    - 		return 0;
    - 	}
    - 
     @@ promisor-remote.c: static int promisor_remote_config(const char *var, const char *value, void *data
      
      		remote_name = xmemdupz(name, namelen);
    @@ promisor-remote.c: static int promisor_remote_config(const char *var, const char
     +	config->promisors_tail = &config->promisors;
      
     -	git_config(promisor_remote_config, NULL);
    -+	git_config(promisor_remote_config, config);
    ++	repo_config(r, promisor_remote_config, config);
      
    --	if (repository_format_partial_clone) {
    -+	if (config->repository_format_partial_clone) {
    +-	if (the_repository->repository_format_partial_clone) {
    ++	if (r->repository_format_partial_clone) {
      		struct promisor_remote *o, *previous;
      
    --		o = promisor_remote_lookup(repository_format_partial_clone,
    +-		o = promisor_remote_lookup(the_repository->repository_format_partial_clone,
     +		o = promisor_remote_lookup(config,
    -+					   config->repository_format_partial_clone,
    ++					   r->repository_format_partial_clone,
      					   &previous);
      		if (o)
     -			promisor_remote_move_to_tail(o, previous);
     +			promisor_remote_move_to_tail(config, o, previous);
      		else
    --			promisor_remote_new(repository_format_partial_clone);
    -+			promisor_remote_new(config, config->repository_format_partial_clone);
    +-			promisor_remote_new(the_repository->repository_format_partial_clone);
    ++			promisor_remote_new(config, r->repository_format_partial_clone);
      	}
      }
      
    @@ promisor-remote.c: static int promisor_remote_config(const char *var, const char
     -	while (promisors) {
     -		struct promisor_remote *r = promisors;
     -		promisors = promisors->next;
    -+	FREE_AND_NULL(config->repository_format_partial_clone);
    -+
     +	while (config->promisors) {
     +		struct promisor_remote *r = config->promisors;
     +		config->promisors = config->promisors->next;
    @@ repository.h: struct lock_file;
      enum untracked_cache_setting {
      	UNTRACKED_CACHE_UNSET = -1,
     @@ repository.h: struct repository {
    - 	/* True if commit-graph has been disabled within this process. */
    - 	int commit_graph_disabled;
      
    -+	/* Configurations related to promisor remotes. */
    + 	/* Configurations related to promisor remotes. */
    + 	char *repository_format_partial_clone;
     +	struct promisor_remote_config *promisor_remote_config;
    -+
    + 
      	/* Configurations */
      
    - 	/* Indicate if a repository has a different 'commondir' from 'gitdir' */
-:  ---------- > 3:  e1a40108f4 submodule: refrain from filtering GIT_CONFIG_COUNT
3:  3f7c4e6e67 ! 4:  fd6907822c run-command: move envvar-resetting function
    @@ Metadata
     Author: Jonathan Tan <jonathantanmy@google.com>
     
      ## Commit message ##
    -    run-command: move envvar-resetting function
    +    run-command: refactor subprocess env preparation
     
    -    There is a function that resets environment variables, used when
    -    invoking a sub-process in a submodule. The lazy-fetching code (used in
    -    partial clones) will need this function in a subsequent commit, so move
    -    it to a more central location.
    +    submodule.c has functionality that prepares the environment for running
    +    a subprocess in a new repo. The lazy-fetching code (used in partial
    +    clones) will need this in a subsequent commit, so move it to a more
    +    central location.
     
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
    @@ run-command.c: int run_auto_maintenance(int quiet)
      	return run_command(&maint);
      }
     +
    -+void prepare_other_repo_env(struct strvec *env_array)
    ++void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir)
     +{
     +	const char * const *var;
     +
     +	for (var = local_repo_env; *var; var++) {
    -+		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
    ++		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT) &&
    ++		    strcmp(*var, CONFIG_COUNT_ENVIRONMENT))
     +			strvec_push(env_array, *var);
     +	}
    ++	strvec_pushf(env_array, "%s=%s", GIT_DIR_ENVIRONMENT, new_git_dir);
     +}
     
      ## run-command.h ##
    @@ run-command.h: int run_processes_parallel_tr2(int n, get_next_task_fn, start_fai
      			       const char *tr2_category, const char *tr2_label);
      
     +/**
    -+ * Convenience function which adds all GIT_* environment variables to env_array
    -+ * with the exception of GIT_CONFIG_PARAMETERS. When used as the env_array of a
    -+ * subprocess, these entries cause the corresponding environment variables to
    -+ * be unset in the subprocess. See local_repo_env in cache.h for more
    ++ * Convenience function which prepares env_array for a command to be run in a
    ++ * new repo. This adds all GIT_* environment variables to env_array with the
    ++ * exception of GIT_CONFIG_PARAMETERS (which cause the corresponding
    ++ * environment variables to be unset in the subprocess) and adds an environment
    ++ * variable pointing to new_git_dir. See local_repo_env in cache.h for more
     + * information.
     + */
    -+void prepare_other_repo_env(struct strvec *env_array);
    ++void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir);
     +
      #endif
     
    @@ submodule.c: static void print_submodule_diff_summary(struct repository *r, stru
     -	const char * const *var;
     -
     -	for (var = local_repo_env; *var; var++) {
    --		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
    +-		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT) &&
    +-		    strcmp(*var, CONFIG_COUNT_ENVIRONMENT))
     -			strvec_push(out, *var);
     -	}
     -}
    @@ submodule.c: static void print_submodule_diff_summary(struct repository *r, stru
      void prepare_submodule_repo_env(struct strvec *out)
      {
     -	prepare_submodule_repo_env_no_git_dir(out);
    -+	prepare_other_repo_env(out);
    - 	strvec_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
    - 		     DEFAULT_GIT_DIR_ENVIRONMENT);
    +-	strvec_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
    +-		     DEFAULT_GIT_DIR_ENVIRONMENT);
    ++	prepare_other_repo_env(out, DEFAULT_GIT_DIR_ENVIRONMENT);
      }
      
      static void prepare_submodule_repo_env_in_gitdir(struct strvec *out)
      {
     -	prepare_submodule_repo_env_no_git_dir(out);
    -+	prepare_other_repo_env(out);
    - 	strvec_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
    +-	strvec_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
    ++	prepare_other_repo_env(out, ".");
      }
      
    + /*
4:  655607d575 ! 5:  a6d73662b1 promisor-remote: teach lazy-fetch in any repo
    @@ Commit message
         prevents testing of the functionality in this patch by user-facing
         commands. So for now, test this mechanism using a test helper.
     
    +    Besides that, there is some code that uses the wrapper functions
    +    like has_promisor_remote(). Those will need to be checked to see if they
    +    could support the non-wrapper functions instead (and thus support any
    +    repository, not just the_repository).
    +
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
      ## Makefile ##
    @@ promisor-remote.c: static int fetch_objects(const char *remote_name,
      
      	child.git_cmd = 1;
      	child.in = -1;
    -+	if (repo != the_repository) {
    -+		prepare_other_repo_env(&child.env_array);
    -+		strvec_pushf(&child.env_array, "%s=%s", GIT_DIR_ENVIRONMENT,
    -+			     repo->gitdir);
    -+	}
    ++	if (repo != the_repository)
    ++		prepare_other_repo_env(&child.env_array, repo->gitdir);
      	strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
      		     "fetch", remote_name, "--no-tags",
      		     "--no-write-fetch-head", "--recurse-submodules=no",
    -@@ promisor-remote.c: static void promisor_remote_init(struct repository *r)
    - 		xcalloc(sizeof(*r->promisor_remote_config), 1);
    - 	config->promisors_tail = &config->promisors;
    - 
    --	git_config(promisor_remote_config, config);
    -+	repo_config(r, promisor_remote_config, config);
    - 
    - 	if (config->repository_format_partial_clone) {
    - 		struct promisor_remote *o, *previous;
     @@ promisor-remote.c: int promisor_remote_get_direct(struct repository *repo,
      
      	promisor_remote_init(repo);
-- 
2.32.0.rc1.229.g3e70b5a671-goog


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v3 1/5] repository: move global r_f_p_c to repo struct
  2021-06-10 17:35 ` [PATCH v3 0/5] " Jonathan Tan
@ 2021-06-10 17:35   ` Jonathan Tan
  2021-06-10 20:47     ` Elijah Newren
  2021-06-10 17:35   ` [PATCH v3 2/5] promisor-remote: support per-repository config Jonathan Tan
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 77+ messages in thread
From: Jonathan Tan @ 2021-06-10 17:35 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, me, newren, emilyshaffer, gitster

Move repository_format_partial_clone, which is currently a global
variable, into struct repository. (Full support for per-repository
partial clone config will be done in a subsequent commit - this is split
into its own commit because of the extent of the changes needed.)

The new repo-specific variable cannot be set in
check_repository_format_gently() (as it currently is), because that
function does not know which repo it is operating on (or even whether
the value is important); therefore this responsibility is delegated to
the outermost caller that knows. Of all the outermost callers that know
(found by looking at all functions that call clear_repository_format()),
I looked at those that either read from the main Git directory or write
into a struct repository. These callers have been modified accordingly
(write to the_repository in the former case and write to the given
struct repository in the latter case).
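
The hunks below move the parsed extensions.partialClone value from a
struct repository_format into a struct repository in one of two ways.
repo_init(), discover_git_directory() and setup_git_directory_gently()
take over the pointer and reset the source to NULL before
clear_repository_format() runs. check_repository_format() instead keeps
an independent copy made with xstrdup_or_null(). The standalone sketch
that follows illustrates both ownership patterns. The trimmed-down
structs and the helpers dup_or_null(), clear_format(),
adopt_partial_clone() and copy_partial_clone() are hypothetical
stand-ins, not Git code, and freeing any previous value is a choice made
only in this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for Git's structs. */
struct repository_format {
	char *partial_clone;	/* owned; NULL if extensions.partialClone is unset */
};

struct repository {
	char *repository_format_partial_clone;	/* owned by the repository */
};

/* Like Git's xstrdup_or_null(), but asserts instead of dying on OOM. */
static char *dup_or_null(const char *s)
{
	size_t len;
	char *copy;

	if (!s)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	assert(copy);
	memcpy(copy, s, len);
	return copy;
}

/* Frees whatever the format struct still owns (cf. clear_repository_format()). */
static void clear_format(struct repository_format *fmt)
{
	free(fmt->partial_clone);
	fmt->partial_clone = NULL;
}

/* Pattern 1: take over the pointer, then NULL it so clear_format() won't free it. */
static void adopt_partial_clone(struct repository *repo,
				struct repository_format *fmt)
{
	free(repo->repository_format_partial_clone);
	repo->repository_format_partial_clone = fmt->partial_clone;
	fmt->partial_clone = NULL;
}

/* Pattern 2: leave fmt untouched and keep an independent copy. */
static void copy_partial_clone(struct repository *repo,
			       const struct repository_format *fmt)
{
	free(repo->repository_format_partial_clone);
	repo->repository_format_partial_clone = dup_or_null(fmt->partial_clone);
}
```

Resetting the source pointer in the first pattern is what prevents a
double free once the format struct is cleared.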

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 promisor-remote.c | 13 +++----------
 promisor-remote.h |  6 ------
 repository.c      |  3 +++
 repository.h      |  3 +++
 setup.c           | 16 +++++++++++-----
 5 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index da3f2ca261..d24081dc21 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -5,13 +5,6 @@
 #include "transport.h"
 #include "strvec.h"
 
-static char *repository_format_partial_clone;
-
-void set_repository_format_partial_clone(char *partial_clone)
-{
-	repository_format_partial_clone = xstrdup_or_null(partial_clone);
-}
-
 static int fetch_objects(const char *remote_name,
 			 const struct object_id *oids,
 			 int oid_nr)
@@ -145,15 +138,15 @@ static void promisor_remote_init(void)
 
 	git_config(promisor_remote_config, NULL);
 
-	if (repository_format_partial_clone) {
+	if (the_repository->repository_format_partial_clone) {
 		struct promisor_remote *o, *previous;
 
-		o = promisor_remote_lookup(repository_format_partial_clone,
+		o = promisor_remote_lookup(the_repository->repository_format_partial_clone,
 					   &previous);
 		if (o)
 			promisor_remote_move_to_tail(o, previous);
 		else
-			promisor_remote_new(repository_format_partial_clone);
+			promisor_remote_new(the_repository->repository_format_partial_clone);
 	}
 }
 
diff --git a/promisor-remote.h b/promisor-remote.h
index c7a14063c5..687210ab87 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -32,10 +32,4 @@ int promisor_remote_get_direct(struct repository *repo,
 			       const struct object_id *oids,
 			       int oid_nr);
 
-/*
- * This should be used only once from setup.c to set the value we got
- * from the extensions.partialclone config option.
- */
-void set_repository_format_partial_clone(char *partial_clone);
-
 #endif /* PROMISOR_REMOTE_H */
diff --git a/repository.c b/repository.c
index 448cd557d4..4878c297d9 100644
--- a/repository.c
+++ b/repository.c
@@ -172,6 +172,9 @@ int repo_init(struct repository *repo,
 
 	repo_set_hash_algo(repo, format.hash_algo);
 
+	repo->repository_format_partial_clone = format.partial_clone;
+	format.partial_clone = NULL;
+
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
diff --git a/repository.h b/repository.h
index a45f7520fd..6fb16ed336 100644
--- a/repository.h
+++ b/repository.h
@@ -139,6 +139,9 @@ struct repository {
 	/* True if commit-graph has been disabled within this process. */
 	int commit_graph_disabled;
 
+	/* Configurations related to promisor remotes. */
+	char *repository_format_partial_clone;
+
 	/* Configurations */
 
 	/* Indicate if a repository has a different 'commondir' from 'gitdir' */
diff --git a/setup.c b/setup.c
index 59e2facd9d..fbedfe8e03 100644
--- a/setup.c
+++ b/setup.c
@@ -468,8 +468,6 @@ static enum extension_result handle_extension_v0(const char *var,
 			data->precious_objects = git_config_bool(var, value);
 			return EXTENSION_OK;
 		} else if (!strcmp(ext, "partialclone")) {
-			if (!value)
-				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
 			return EXTENSION_OK;
 		} else if (!strcmp(ext, "worktreeconfig")) {
@@ -566,7 +564,6 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 	}
 
 	repository_format_precious_objects = candidate->precious_objects;
-	set_repository_format_partial_clone(candidate->partial_clone);
 	repository_format_worktree_config = candidate->worktree_config;
 	string_list_clear(&candidate->unknown_extensions, 0);
 	string_list_clear(&candidate->v1_only_extensions, 0);
@@ -1193,6 +1190,10 @@ int discover_git_directory(struct strbuf *commondir,
 		return -1;
 	}
 
+	the_repository->repository_format_partial_clone =
+		candidate.partial_clone;
+	candidate.partial_clone = NULL;
+
 	clear_repository_format(&candidate);
 	return 0;
 }
@@ -1300,8 +1301,12 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->repository_format_partial_clone =
+				repo_fmt.partial_clone;
+			repo_fmt.partial_clone = NULL;
+		}
 	}
 	/*
 	 * Since precompose_string_if_needed() needs to look at
@@ -1319,7 +1324,6 @@ const char *setup_git_directory_gently(int *nongit_ok)
 		setenv(GIT_PREFIX_ENVIRONMENT, "", 1);
 	}
 
-
 	strbuf_release(&dir);
 	strbuf_release(&gitdir);
 	clear_repository_format(&repo_fmt);
@@ -1386,6 +1390,8 @@ void check_repository_format(struct repository_format *fmt)
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
 	repo_set_hash_algo(the_repository, fmt->hash_algo);
+	the_repository->repository_format_partial_clone =
+		xstrdup_or_null(fmt->partial_clone);
 	clear_repository_format(&repo_fmt);
 }
 
-- 
2.32.0.rc1.229.g3e70b5a671-goog


* [PATCH v3 2/5] promisor-remote: support per-repository config
  2021-06-10 17:35 ` [PATCH v3 0/5] " Jonathan Tan
  2021-06-10 17:35   ` [PATCH v3 1/5] repository: move global r_f_p_c to repo struct Jonathan Tan
@ 2021-06-10 17:35   ` Jonathan Tan
  2021-06-10 17:35   ` [PATCH v3 3/5] submodule: refrain from filtering GIT_CONFIG_COUNT Jonathan Tan
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-10 17:35 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, me, newren, emilyshaffer, gitster

Instead of using global variables to store promisor remote information,
store this config in struct repository, and add repository-agnostic
non-static functions corresponding to the existing non-static functions
that only work on the_repository.

The actual lazy-fetching of missing objects currently does not work on
repositories other than the_repository, and will still not work after
this commit, so add a BUG message explaining this. A subsequent commit
will remove this limitation.
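
Per repository, the promisor configuration is a singly linked list with
a tail pointer, built lazily on first use. The remote named by
extensions.partialClone is either moved to the end of the list or
appended there, so it is tried last. The sketch that follows loosely
mirrors promisor_remote_new(), promisor_remote_lookup(),
promisor_remote_move_to_tail() and promisor_remote_init() from the diff
below. It rests on several assumptions: struct repo is a hypothetical
stand-in for struct repository, the names array replaces the
repo_config() callback, and memory is never freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct promisor_remote {
	struct promisor_remote *next;
	char *name;
};

struct promisor_remote_config {
	struct promisor_remote *promisors;
	struct promisor_remote **promisors_tail;	/* where the next append goes */
};

/* Hypothetical stand-in for struct repository. */
struct repo {
	const char *partial_clone;	/* extensions.partialClone, may be NULL */
	struct promisor_remote_config *promisor_remote_config;	/* lazily created */
};

/* Append a new remote in O(1) via the tail pointer. */
static struct promisor_remote *remote_new(struct promisor_remote_config *config,
					  const char *name)
{
	struct promisor_remote *r = calloc(1, sizeof(*r));
	size_t len = strlen(name) + 1;

	assert(r);
	r->name = malloc(len);
	assert(r->name);
	memcpy(r->name, name, len);
	*config->promisors_tail = r;
	config->promisors_tail = &r->next;
	return r;
}

static struct promisor_remote *remote_lookup(struct promisor_remote_config *config,
					     const char *name,
					     struct promisor_remote **previous)
{
	struct promisor_remote *r, *p;

	for (p = NULL, r = config->promisors; r; p = r, r = r->next)
		if (!strcmp(r->name, name)) {
			if (previous)
				*previous = p;
			return r;
		}
	return NULL;
}

/* Unlink r (previous is its predecessor, NULL if r is the head) and re-append it. */
static void remote_move_to_tail(struct promisor_remote_config *config,
				struct promisor_remote *r,
				struct promisor_remote *previous)
{
	if (!r->next)
		return;		/* already last */
	if (previous)
		previous->next = r->next;
	else
		config->promisors = r->next;
	r->next = NULL;
	*config->promisors_tail = r;
	config->promisors_tail = &r->next;
}

/* Build the per-repository list on first use; later calls reuse it. */
static struct promisor_remote_config *promisor_config(struct repo *r,
						      const char *const *names,
						      size_t nr)
{
	struct promisor_remote_config *config;
	size_t i;

	if (r->promisor_remote_config)
		return r->promisor_remote_config;
	config = r->promisor_remote_config = calloc(1, sizeof(*config));
	assert(config);
	config->promisors_tail = &config->promisors;

	/* Stand-in for the config callback: each name is a promisor remote. */
	for (i = 0; i < nr; i++)
		if (!remote_lookup(config, names[i], NULL))
			remote_new(config, names[i]);

	/* The extensions.partialClone remote always ends up last. */
	if (r->partial_clone) {
		struct promisor_remote *o, *previous = NULL;

		o = remote_lookup(config, r->partial_clone, &previous);
		if (o)
			remote_move_to_tail(config, o, previous);
		else
			remote_new(config, r->partial_clone);
	}
	return config;
}
```

The tail pointer keeps each append O(1), and a NULL
promisor_remote_config pointer doubles as the "not yet initialized" flag
that replaces the old static initialized variable.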

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 promisor-remote.c | 98 ++++++++++++++++++++++++++---------------------
 promisor-remote.h | 22 +++++++++--
 repository.c      |  6 +++
 repository.h      |  2 +
 4 files changed, 82 insertions(+), 46 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index d24081dc21..1e00e16b0f 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -5,6 +5,11 @@
 #include "transport.h"
 #include "strvec.h"
 
+struct promisor_remote_config {
+	struct promisor_remote *promisors;
+	struct promisor_remote **promisors_tail;
+};
+
 static int fetch_objects(const char *remote_name,
 			 const struct object_id *oids,
 			 int oid_nr)
@@ -35,10 +40,8 @@ static int fetch_objects(const char *remote_name,
 	return finish_command(&child) ? -1 : 0;
 }
 
-static struct promisor_remote *promisors;
-static struct promisor_remote **promisors_tail = &promisors;
-
-static struct promisor_remote *promisor_remote_new(const char *remote_name)
+static struct promisor_remote *promisor_remote_new(struct promisor_remote_config *config,
+						   const char *remote_name)
 {
 	struct promisor_remote *r;
 
@@ -50,18 +53,19 @@ static struct promisor_remote *promisor_remote_new(const char *remote_name)
 
 	FLEX_ALLOC_STR(r, name, remote_name);
 
-	*promisors_tail = r;
-	promisors_tail = &r->next;
+	*config->promisors_tail = r;
+	config->promisors_tail = &r->next;
 
 	return r;
 }
 
-static struct promisor_remote *promisor_remote_lookup(const char *remote_name,
+static struct promisor_remote *promisor_remote_lookup(struct promisor_remote_config *config,
+						      const char *remote_name,
 						      struct promisor_remote **previous)
 {
 	struct promisor_remote *r, *p;
 
-	for (p = NULL, r = promisors; r; p = r, r = r->next)
+	for (p = NULL, r = config->promisors; r; p = r, r = r->next)
 		if (!strcmp(r->name, remote_name)) {
 			if (previous)
 				*previous = p;
@@ -71,7 +75,8 @@ static struct promisor_remote *promisor_remote_lookup(const char *remote_name,
 	return NULL;
 }
 
-static void promisor_remote_move_to_tail(struct promisor_remote *r,
+static void promisor_remote_move_to_tail(struct promisor_remote_config *config,
+					 struct promisor_remote *r,
 					 struct promisor_remote *previous)
 {
 	if (r->next == NULL)
@@ -80,14 +85,15 @@ static void promisor_remote_move_to_tail(struct promisor_remote *r,
 	if (previous)
 		previous->next = r->next;
 	else
-		promisors = r->next ? r->next : r;
+		config->promisors = r->next ? r->next : r;
 	r->next = NULL;
-	*promisors_tail = r;
-	promisors_tail = &r->next;
+	*config->promisors_tail = r;
+	config->promisors_tail = &r->next;
 }
 
 static int promisor_remote_config(const char *var, const char *value, void *data)
 {
+	struct promisor_remote_config *config = data;
 	const char *name;
 	size_t namelen;
 	const char *subkey;
@@ -103,8 +109,8 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 
 		remote_name = xmemdupz(name, namelen);
 
-		if (!promisor_remote_lookup(remote_name, NULL))
-			promisor_remote_new(remote_name);
+		if (!promisor_remote_lookup(config, remote_name, NULL))
+			promisor_remote_new(config, remote_name);
 
 		free(remote_name);
 		return 0;
@@ -113,9 +119,9 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 		struct promisor_remote *r;
 		char *remote_name = xmemdupz(name, namelen);
 
-		r = promisor_remote_lookup(remote_name, NULL);
+		r = promisor_remote_lookup(config, remote_name, NULL);
 		if (!r)
-			r = promisor_remote_new(remote_name);
+			r = promisor_remote_new(config, remote_name);
 
 		free(remote_name);
 
@@ -128,59 +134,63 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 	return 0;
 }
 
-static int initialized;
-
-static void promisor_remote_init(void)
+static void promisor_remote_init(struct repository *r)
 {
-	if (initialized)
+	struct promisor_remote_config *config;
+
+	if (r->promisor_remote_config)
 		return;
-	initialized = 1;
+	config = r->promisor_remote_config =
+		xcalloc(sizeof(*r->promisor_remote_config), 1);
+	config->promisors_tail = &config->promisors;
 
-	git_config(promisor_remote_config, NULL);
+	repo_config(r, promisor_remote_config, config);
 
-	if (the_repository->repository_format_partial_clone) {
+	if (r->repository_format_partial_clone) {
 		struct promisor_remote *o, *previous;
 
-		o = promisor_remote_lookup(the_repository->repository_format_partial_clone,
+		o = promisor_remote_lookup(config,
+					   r->repository_format_partial_clone,
 					   &previous);
 		if (o)
-			promisor_remote_move_to_tail(o, previous);
+			promisor_remote_move_to_tail(config, o, previous);
 		else
-			promisor_remote_new(the_repository->repository_format_partial_clone);
+			promisor_remote_new(config, r->repository_format_partial_clone);
 	}
 }
 
-static void promisor_remote_clear(void)
+void promisor_remote_clear(struct promisor_remote_config *config)
 {
-	while (promisors) {
-		struct promisor_remote *r = promisors;
-		promisors = promisors->next;
+	while (config->promisors) {
+		struct promisor_remote *r = config->promisors;
+		config->promisors = config->promisors->next;
 		free(r);
 	}
 
-	promisors_tail = &promisors;
+	config->promisors_tail = &config->promisors;
 }
 
-void promisor_remote_reinit(void)
+void repo_promisor_remote_reinit(struct repository *r)
 {
-	initialized = 0;
-	promisor_remote_clear();
-	promisor_remote_init();
+	promisor_remote_clear(r->promisor_remote_config);
+	FREE_AND_NULL(r->promisor_remote_config);
+	promisor_remote_init(r);
 }
 
-struct promisor_remote *promisor_remote_find(const char *remote_name)
+struct promisor_remote *repo_promisor_remote_find(struct repository *r,
+						  const char *remote_name)
 {
-	promisor_remote_init();
+	promisor_remote_init(r);
 
 	if (!remote_name)
-		return promisors;
+		return r->promisor_remote_config->promisors;
 
-	return promisor_remote_lookup(remote_name, NULL);
+	return promisor_remote_lookup(r->promisor_remote_config, remote_name, NULL);
 }
 
-int has_promisor_remote(void)
+int repo_has_promisor_remote(struct repository *r)
 {
-	return !!promisor_remote_find(NULL);
+	return !!repo_promisor_remote_find(r, NULL);
 }
 
 static int remove_fetched_oids(struct repository *repo,
@@ -228,9 +238,11 @@ int promisor_remote_get_direct(struct repository *repo,
 	if (oid_nr == 0)
 		return 0;
 
-	promisor_remote_init();
+	promisor_remote_init(repo);
 
-	for (r = promisors; r; r = r->next) {
+	if (repo != the_repository)
+		BUG("only the_repository is supported for now");
+	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
 		if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) {
 			if (remaining_nr == 1)
 				continue;
diff --git a/promisor-remote.h b/promisor-remote.h
index 687210ab87..edc45ab0f5 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -17,9 +17,25 @@ struct promisor_remote {
 	const char name[FLEX_ARRAY];
 };
 
-void promisor_remote_reinit(void);
-struct promisor_remote *promisor_remote_find(const char *remote_name);
-int has_promisor_remote(void);
+void repo_promisor_remote_reinit(struct repository *r);
+static inline void promisor_remote_reinit(void)
+{
+	repo_promisor_remote_reinit(the_repository);
+}
+
+void promisor_remote_clear(struct promisor_remote_config *config);
+
+struct promisor_remote *repo_promisor_remote_find(struct repository *r, const char *remote_name);
+static inline struct promisor_remote *promisor_remote_find(const char *remote_name)
+{
+	return repo_promisor_remote_find(the_repository, remote_name);
+}
+
+int repo_has_promisor_remote(struct repository *r);
+static inline int has_promisor_remote(void)
+{
+	return repo_has_promisor_remote(the_repository);
+}
 
 /*
  * Fetches all requested objects from all promisor remotes, trying them one at
diff --git a/repository.c b/repository.c
index 4878c297d9..a14074d964 100644
--- a/repository.c
+++ b/repository.c
@@ -11,6 +11,7 @@
 #include "lockfile.h"
 #include "submodule-config.h"
 #include "sparse-index.h"
+#include "promisor-remote.h"
 
 /* The main repository */
 static struct repository the_repo;
@@ -261,6 +262,11 @@ void repo_clear(struct repository *repo)
 		if (repo->index != &the_index)
 			FREE_AND_NULL(repo->index);
 	}
+
+	if (repo->promisor_remote_config) {
+		promisor_remote_clear(repo->promisor_remote_config);
+		FREE_AND_NULL(repo->promisor_remote_config);
+	}
 }
 
 int repo_read_index(struct repository *repo)
diff --git a/repository.h b/repository.h
index 6fb16ed336..3740c93bc0 100644
--- a/repository.h
+++ b/repository.h
@@ -10,6 +10,7 @@ struct lock_file;
 struct pathspec;
 struct raw_object_store;
 struct submodule_cache;
+struct promisor_remote_config;
 
 enum untracked_cache_setting {
 	UNTRACKED_CACHE_UNSET = -1,
@@ -141,6 +142,7 @@ struct repository {
 
 	/* Configurations related to promisor remotes. */
 	char *repository_format_partial_clone;
+	struct promisor_remote_config *promisor_remote_config;
 
 	/* Configurations */
 
-- 
2.32.0.rc1.229.g3e70b5a671-goog


* [PATCH v3 3/5] submodule: refrain from filtering GIT_CONFIG_COUNT
  2021-06-10 17:35 ` [PATCH v3 0/5] " Jonathan Tan
  2021-06-10 17:35   ` [PATCH v3 1/5] repository: move global r_f_p_c to repo struct Jonathan Tan
  2021-06-10 17:35   ` [PATCH v3 2/5] promisor-remote: support per-repository config Jonathan Tan
@ 2021-06-10 17:35   ` Jonathan Tan
  2021-06-10 21:13     ` Elijah Newren
  2021-06-10 17:35   ` [PATCH v3 4/5] run-command: refactor subprocess env preparation Jonathan Tan
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 77+ messages in thread
From: Jonathan Tan @ 2021-06-10 17:35 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, me, newren, emilyshaffer, gitster

14111fc492 ("git: submodule honor -c credential.* from command line",
2016-03-01) taught Git to pass through the GIT_CONFIG_PARAMETERS
environment variable when invoking a subprocess on behalf of a
submodule. But when d8d77153ea ("config: allow specifying config entries
via envvar pairs", 2021-01-15) introduced support for GIT_CONFIG_COUNT
(and its associated GIT_CONFIG_KEY_? and GIT_CONFIG_VALUE_?), the
subprocess mechanism wasn't updated to also pass through these
variables.

Since they are conceptually the same (d8d77153ea was written to address
a shortcoming of GIT_CONFIG_PARAMETERS), update the submodule subprocess
mechanism to also pass through GIT_CONFIG_COUNT.
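
For reference, the variables introduced by d8d77153ea describe config
entries as GIT_CONFIG_COUNT=N plus GIT_CONFIG_KEY_<i> and
GIT_CONFIG_VALUE_<i> for each i from 0 to N-1. The sketch below decodes
that scheme from an explicit NULL-terminated array of "NAME=VALUE"
strings instead of the real environment. env_get(), struct config_pair
and read_env_config_pairs() are hypothetical, and the error handling is
simplified compared to Git's own parser.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up NAME in a NULL-terminated array of "NAME=VALUE" strings. */
static const char *env_get(const char *const *env, const char *name)
{
	size_t len = strlen(name);

	for (; *env; env++)
		if (!strncmp(*env, name, len) && (*env)[len] == '=')
			return *env + len + 1;
	return NULL;
}

struct config_pair {
	const char *key;
	const char *value;
};

/*
 * GIT_CONFIG_COUNT=N announces N entries, given by GIT_CONFIG_KEY_<i> and
 * GIT_CONFIG_VALUE_<i> for 0 <= i < N. Fills out[] and returns N, 0 if
 * GIT_CONFIG_COUNT is absent, or -1 if the variables are malformed or
 * there are more than max entries.
 */
static int read_env_config_pairs(const char *const *env,
				 struct config_pair *out, int max)
{
	const char *count_str = env_get(env, "GIT_CONFIG_COUNT");
	char name[64];
	char *end;
	long count, i;

	if (!count_str)
		return 0;
	count = strtol(count_str, &end, 10);
	if (end == count_str || *end || count < 0 || count > max)
		return -1;
	for (i = 0; i < count; i++) {
		snprintf(name, sizeof(name), "GIT_CONFIG_KEY_%ld", i);
		out[i].key = env_get(env, name);
		snprintf(name, sizeof(name), "GIT_CONFIG_VALUE_%ld", i);
		out[i].value = env_get(env, name);
		if (!out[i].key || !out[i].value)
			return -1;
	}
	return (int)count;
}
```

Only GIT_CONFIG_COUNT, not the indexed KEY/VALUE variables, appears in
local_repo_env, so exempting it from the list of variables to unset
appears to be enough for the whole set of entries to reach the
subprocess.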

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 submodule.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index 0b1d9c1dde..f09031e397 100644
--- a/submodule.c
+++ b/submodule.c
@@ -489,7 +489,8 @@ static void prepare_submodule_repo_env_no_git_dir(struct strvec *out)
 	const char * const *var;
 
 	for (var = local_repo_env; *var; var++) {
-		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
+		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT) &&
+		    strcmp(*var, CONFIG_COUNT_ENVIRONMENT))
 			strvec_push(out, *var);
 	}
 }
-- 
2.32.0.rc1.229.g3e70b5a671-goog


* [PATCH v3 4/5] run-command: refactor subprocess env preparation
  2021-06-10 17:35 ` [PATCH v3 0/5] " Jonathan Tan
                     ` (2 preceding siblings ...)
  2021-06-10 17:35   ` [PATCH v3 3/5] submodule: refrain from filtering GIT_CONFIG_COUNT Jonathan Tan
@ 2021-06-10 17:35   ` Jonathan Tan
  2021-06-10 21:21     ` Elijah Newren
  2021-06-10 17:35   ` [PATCH v3 5/5] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
  2021-06-10 21:29   ` [PATCH v3 0/5] First steps towards partial clone submodules Elijah Newren
  5 siblings, 1 reply; 77+ messages in thread
From: Jonathan Tan @ 2021-06-10 17:35 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, me, newren, emilyshaffer, gitster

submodule.c has functionality that prepares the environment for running
a subprocess in a new repo. The lazy-fetching code (used in partial
clones) will need this in a subsequent commit, so move it to a more
central location.
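
prepare_other_repo_env() relies on run-command's convention that an
env_array entry without '=' unsets that variable in the child. The
sketch below reproduces the filtering and the trailing GIT_DIR
assignment using a hypothetical subset of local_repo_env and a
hypothetical str_list type in place of struct strvec.
prepare_other_repo_env_sketch() is not Git's function. Like the code in
this patch, it skips both GIT_CONFIG_PARAMETERS and GIT_CONFIG_COUNT,
whereas the header comment added to run-command.h mentions only the
former.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of Git's local_repo_env list (see cache.h). */
static const char *const local_repo_env_subset[] = {
	"GIT_DIR",
	"GIT_WORK_TREE",
	"GIT_INDEX_FILE",
	"GIT_OBJECT_DIRECTORY",
	"GIT_CONFIG_PARAMETERS",
	"GIT_CONFIG_COUNT",
	NULL
};

/* Minimal growable list of owned strings, standing in for struct strvec. */
struct str_list {
	char **items;
	size_t nr, alloc;
};

static void str_list_push(struct str_list *l, const char *s)
{
	size_t len = strlen(s) + 1;

	if (l->nr == l->alloc) {
		l->alloc = l->alloc ? 2 * l->alloc : 8;
		l->items = realloc(l->items, l->alloc * sizeof(*l->items));
		assert(l->items);
	}
	l->items[l->nr] = malloc(len);
	assert(l->items[l->nr]);
	memcpy(l->items[l->nr], s, len);
	l->nr++;
}

static void str_list_clear(struct str_list *l)
{
	size_t i;

	for (i = 0; i < l->nr; i++)
		free(l->items[i]);
	free(l->items);
	l->items = NULL;
	l->nr = l->alloc = 0;
}

/*
 * A bare name (no '=') asks the child-process code to unset that variable.
 * The two config-passing variables are skipped so the child inherits them;
 * the final "GIT_DIR=..." entry, applied after the bare "GIT_DIR", points
 * the child at the other repository.
 */
static void prepare_other_repo_env_sketch(struct str_list *env,
					  const char *new_git_dir)
{
	const char *const *var;
	char buf[4096];

	for (var = local_repo_env_subset; *var; var++)
		if (strcmp(*var, "GIT_CONFIG_PARAMETERS") &&
		    strcmp(*var, "GIT_CONFIG_COUNT"))
			str_list_push(env, *var);
	snprintf(buf, sizeof(buf), "GIT_DIR=%s", new_git_dir);
	str_list_push(env, buf);
}
```

With this helper, prepare_submodule_repo_env() simply passes
DEFAULT_GIT_DIR_ENVIRONMENT as new_git_dir, and
prepare_submodule_repo_env_in_gitdir() passes ".".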

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 run-command.c | 12 ++++++++++++
 run-command.h | 10 ++++++++++
 submodule.c   | 18 ++----------------
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/run-command.c b/run-command.c
index be6bc128cd..549a94a6a4 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1892,3 +1892,15 @@ int run_auto_maintenance(int quiet)
 
 	return run_command(&maint);
 }
+
+void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir)
+{
+	const char * const *var;
+
+	for (var = local_repo_env; *var; var++) {
+		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT) &&
+		    strcmp(*var, CONFIG_COUNT_ENVIRONMENT))
+			strvec_push(env_array, *var);
+	}
+	strvec_pushf(env_array, "%s=%s", GIT_DIR_ENVIRONMENT, new_git_dir);
+}
diff --git a/run-command.h b/run-command.h
index d08414a92e..92f1c00b11 100644
--- a/run-command.h
+++ b/run-command.h
@@ -483,4 +483,14 @@ int run_processes_parallel_tr2(int n, get_next_task_fn, start_failure_fn,
 			       task_finished_fn, void *pp_cb,
 			       const char *tr2_category, const char *tr2_label);
 
+/**
+ * Convenience function which prepares env_array for a command to be run in a
+ * new repo. This adds all GIT_* environment variables to env_array with the
+ * exception of GIT_CONFIG_PARAMETERS (which cause the corresponding
+ * environment variables to be unset in the subprocess) and adds an environment
+ * variable pointing to new_git_dir. See local_repo_env in cache.h for more
+ * information.
+ */
+void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir);
+
 #endif
diff --git a/submodule.c b/submodule.c
index f09031e397..8e611fe1db 100644
--- a/submodule.c
+++ b/submodule.c
@@ -484,28 +484,14 @@ static void print_submodule_diff_summary(struct repository *r, struct rev_info *
 	strbuf_release(&sb);
 }
 
-static void prepare_submodule_repo_env_no_git_dir(struct strvec *out)
-{
-	const char * const *var;
-
-	for (var = local_repo_env; *var; var++) {
-		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT) &&
-		    strcmp(*var, CONFIG_COUNT_ENVIRONMENT))
-			strvec_push(out, *var);
-	}
-}
-
 void prepare_submodule_repo_env(struct strvec *out)
 {
-	prepare_submodule_repo_env_no_git_dir(out);
-	strvec_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
-		     DEFAULT_GIT_DIR_ENVIRONMENT);
+	prepare_other_repo_env(out, DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
 static void prepare_submodule_repo_env_in_gitdir(struct strvec *out)
 {
-	prepare_submodule_repo_env_no_git_dir(out);
-	strvec_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+	prepare_other_repo_env(out, ".");
 }
 
 /*
-- 
2.32.0.rc1.229.g3e70b5a671-goog


* [PATCH v3 5/5] promisor-remote: teach lazy-fetch in any repo
  2021-06-10 17:35 ` [PATCH v3 0/5] " Jonathan Tan
                     ` (3 preceding siblings ...)
  2021-06-10 17:35   ` [PATCH v3 4/5] run-command: refactor subprocess env preparation Jonathan Tan
@ 2021-06-10 17:35   ` Jonathan Tan
  2021-06-10 21:29   ` [PATCH v3 0/5] First steps towards partial clone submodules Elijah Newren
  5 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-10 17:35 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, me, newren, emilyshaffer, gitster

This is one step towards supporting partial clone submodules.

Even after this patch, we will still lack partial clone submodules
support, primarily because a lot of Git code that accesses submodule
objects does so by adding their object stores as alternates, meaning
that any lazy fetches that would occur in the submodule would be done
based on the config of the superproject, not of the submodule. This also
means that the functionality in this patch cannot yet be exercised through
user-facing commands, so for now, test this mechanism using a test helper.

Besides that, some code still calls wrapper functions like
has_promisor_remote(), which only operate on the_repository. Those callers
will need to be checked to see whether they could use the non-wrapper
functions (such as repo_has_promisor_remote()) instead, and thus support
any repository, not just the_repository.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Makefile                      |  1 +
 object-file.c                 |  7 ++----
 promisor-remote.c             |  9 ++++----
 t/helper/test-partial-clone.c | 43 +++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c          |  1 +
 t/helper/test-tool.h          |  1 +
 t/t0410-partial-clone.sh      | 23 +++++++++++++++++++
 7 files changed, 76 insertions(+), 9 deletions(-)
 create mode 100644 t/helper/test-partial-clone.c

diff --git a/Makefile b/Makefile
index c3565fc0f8..f6653bcd5e 100644
--- a/Makefile
+++ b/Makefile
@@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-oidmap.o
 TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-parse-options.o
 TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
+TEST_BUILTINS_OBJS += test-partial-clone.o
 TEST_BUILTINS_OBJS += test-path-utils.o
 TEST_BUILTINS_OBJS += test-pcre2-config.o
 TEST_BUILTINS_OBJS += test-pkt-line.o
diff --git a/object-file.c b/object-file.c
index f233b440b2..ebf273e9e7 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1570,15 +1570,12 @@ static int do_oid_object_info_extended(struct repository *r,
 		}
 
 		/* Check if it is a missing object */
-		if (fetch_if_missing && has_promisor_remote() &&
-		    !already_retried && r == the_repository &&
+		if (fetch_if_missing && repo_has_promisor_remote(r) &&
+		    !already_retried &&
 		    !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
 			/*
 			 * TODO Investigate checking promisor_remote_get_direct()
 			 * TODO return value and stopping on error here.
-			 * TODO Pass a repository struct through
-			 * promisor_remote_get_direct(), such that arbitrary
-			 * repositories work.
 			 */
 			promisor_remote_get_direct(r, real, 1);
 			already_retried = 1;
diff --git a/promisor-remote.c b/promisor-remote.c
index 1e00e16b0f..c088dcbff3 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -10,7 +10,8 @@ struct promisor_remote_config {
 	struct promisor_remote **promisors_tail;
 };
 
-static int fetch_objects(const char *remote_name,
+static int fetch_objects(struct repository *repo,
+			 const char *remote_name,
 			 const struct object_id *oids,
 			 int oid_nr)
 {
@@ -20,6 +21,8 @@ static int fetch_objects(const char *remote_name,
 
 	child.git_cmd = 1;
 	child.in = -1;
+	if (repo != the_repository)
+		prepare_other_repo_env(&child.env_array, repo->gitdir);
 	strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
 		     "fetch", remote_name, "--no-tags",
 		     "--no-write-fetch-head", "--recurse-submodules=no",
@@ -240,10 +243,8 @@ int promisor_remote_get_direct(struct repository *repo,
 
 	promisor_remote_init(repo);
 
-	if (repo != the_repository)
-		BUG("only the_repository is supported for now");
 	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
-		if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) {
+		if (fetch_objects(repo, r->name, remaining_oids, remaining_nr) < 0) {
 			if (remaining_nr == 1)
 				continue;
 			remaining_nr = remove_fetched_oids(repo, &remaining_oids,
diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c
new file mode 100644
index 0000000000..3f102cfddd
--- /dev/null
+++ b/t/helper/test-partial-clone.c
@@ -0,0 +1,43 @@
+#include "cache.h"
+#include "test-tool.h"
+#include "repository.h"
+#include "object-store.h"
+
+/*
+ * Prints the size of the object corresponding to the given hash in a specific
+ * gitdir. This is similar to "git -C gitdir cat-file -s", except that this
+ * exercises the code that accesses the object of an arbitrary repository that
+ * is not the_repository. ("git -C gitdir" makes it so that the_repository is
+ * the one in gitdir.)
+ */
+static void object_info(const char *gitdir, const char *oid_hex)
+{
+	struct repository r;
+	struct object_id oid;
+	unsigned long size;
+	struct object_info oi = {.sizep = &size};
+	const char *p;
+
+	if (repo_init(&r, gitdir, NULL))
+		die("could not init repo");
+	if (parse_oid_hex(oid_hex, &oid, &p))
+		die("could not parse oid");
+	if (oid_object_info_extended(&r, &oid, &oi, 0))
+		die("could not obtain object info");
+	printf("%d\n", (int) size);
+}
+
+int cmd__partial_clone(int argc, const char **argv)
+{
+	setup_git_directory();
+
+	if (argc < 4)
+		die("too few arguments");
+
+	if (!strcmp(argv[1], "object-info"))
+		object_info(argv[2], argv[3]);
+	else
+		die("invalid argument '%s'", argv[1]);
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index c5bd0c6d4c..b21e8f1519 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -46,6 +46,7 @@ static struct test_cmd cmds[] = {
 	{ "online-cpus", cmd__online_cpus },
 	{ "parse-options", cmd__parse_options },
 	{ "parse-pathspec-file", cmd__parse_pathspec_file },
+	{ "partial-clone", cmd__partial_clone },
 	{ "path-utils", cmd__path_utils },
 	{ "pcre2-config", cmd__pcre2_config },
 	{ "pkt-line", cmd__pkt_line },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e8069a3b22..f845ced4b3 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -35,6 +35,7 @@ int cmd__oidmap(int argc, const char **argv);
 int cmd__online_cpus(int argc, const char **argv);
 int cmd__parse_options(int argc, const char **argv);
 int cmd__parse_pathspec_file(int argc, const char** argv);
+int cmd__partial_clone(int argc, const char **argv);
 int cmd__path_utils(int argc, const char **argv);
 int cmd__pcre2_config(int argc, const char **argv);
 int cmd__pkt_line(int argc, const char **argv);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 584a039b85..a211a66c67 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -604,6 +604,29 @@ test_expect_success 'do not fetch when checking existence of tree we construct o
 	git -C repo cherry-pick side1
 '
 
+test_expect_success 'lazy-fetch when accessing object not in the_repository' '
+	rm -rf full partial.git &&
+	test_create_repo full &&
+	test_commit -C full create-a-file file.txt &&
+
+	test_config -C full uploadpack.allowfilter 1 &&
+	test_config -C full uploadpack.allowanysha1inwant 1 &&
+	git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git &&
+	FILE_HASH=$(git -C full rev-parse HEAD:file.txt) &&
+
+	# Sanity check that the file is missing
+	git -C partial.git rev-list --objects --missing=print HEAD >out &&
+	grep "[?]$FILE_HASH" out &&
+
+	git -C full cat-file -s "$FILE_HASH" >expect &&
+	test-tool partial-clone object-info partial.git "$FILE_HASH" >actual &&
+	test_cmp expect actual &&
+
+	# Sanity check that the file is now present
+	git -C partial.git rev-list --objects --missing=print HEAD >out &&
+	! grep "[?]$FILE_HASH" out
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.32.0.rc1.229.g3e70b5a671-goog


* Re: [PATCH v3 1/5] repository: move global r_f_p_c to repo struct
  2021-06-10 17:35   ` [PATCH v3 1/5] repository: move global r_f_p_c to repo struct Jonathan Tan
@ 2021-06-10 20:47     ` Elijah Newren
  0 siblings, 0 replies; 77+ messages in thread
From: Elijah Newren @ 2021-06-10 20:47 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List, Taylor Blau, Emily Shaffer, Junio C Hamano

On Thu, Jun 10, 2021 at 10:35 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Move repository_format_partial_clone, which is currently a global
> variable, into struct repository. (Full support for per-repository
> partial clone config will be done in a subsequent commit - this is split
> into its own commit because of the extent of the changes needed.)
>
> The new repo-specific variable cannot be set in
> check_repository_format_gently() (as is currently), because that
> function does not know which repo it is operating on (or even whether
> the value is important); therefore this responsibility is delegated to
> the outermost caller that knows. Of all the outermost callers that know
> (found by looking at all functions that call clear_repository_format()),
> I looked at those that either read from the main Git directory or write
> into a struct repository. These callers have been modified accordingly
> (write to the_repository in the former case and write to the given
> struct repository in the latter case).
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  promisor-remote.c | 13 +++----------
>  promisor-remote.h |  6 ------
>  repository.c      |  3 +++
>  repository.h      |  3 +++
>  setup.c           | 16 +++++++++++-----
>  5 files changed, 20 insertions(+), 21 deletions(-)
>
> diff --git a/promisor-remote.c b/promisor-remote.c
> index da3f2ca261..d24081dc21 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -5,13 +5,6 @@
>  #include "transport.h"
>  #include "strvec.h"
>
> -static char *repository_format_partial_clone;
> -
> -void set_repository_format_partial_clone(char *partial_clone)
> -{
> -       repository_format_partial_clone = xstrdup_or_null(partial_clone);
> -}
> -
>  static int fetch_objects(const char *remote_name,
>                          const struct object_id *oids,
>                          int oid_nr)
> @@ -145,15 +138,15 @@ static void promisor_remote_init(void)
>
>         git_config(promisor_remote_config, NULL);
>
> -       if (repository_format_partial_clone) {
> +       if (the_repository->repository_format_partial_clone) {
>                 struct promisor_remote *o, *previous;
>
> -               o = promisor_remote_lookup(repository_format_partial_clone,
> +               o = promisor_remote_lookup(the_repository->repository_format_partial_clone,
>                                            &previous);
>                 if (o)
>                         promisor_remote_move_to_tail(o, previous);
>                 else
> -                       promisor_remote_new(repository_format_partial_clone);
> +                       promisor_remote_new(the_repository->repository_format_partial_clone);
>         }
>  }
>
> diff --git a/promisor-remote.h b/promisor-remote.h
> index c7a14063c5..687210ab87 100644
> --- a/promisor-remote.h
> +++ b/promisor-remote.h
> @@ -32,10 +32,4 @@ int promisor_remote_get_direct(struct repository *repo,
>                                const struct object_id *oids,
>                                int oid_nr);
>
> -/*
> - * This should be used only once from setup.c to set the value we got
> - * from the extensions.partialclone config option.
> - */
> -void set_repository_format_partial_clone(char *partial_clone);
> -
>  #endif /* PROMISOR_REMOTE_H */
> diff --git a/repository.c b/repository.c
> index 448cd557d4..4878c297d9 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -172,6 +172,9 @@ int repo_init(struct repository *repo,
>
>         repo_set_hash_algo(repo, format.hash_algo);
>
> +       repo->repository_format_partial_clone = format.partial_clone;
> +       format.partial_clone = NULL;

This was surprising to me, and I tried to dig around to find out why
you set it to NULL.  AFAICT, you're trying to avoid the need to do an
xstrdup(), so you take over ownership in the first line and set the
source to NULL in the second to avoid a double-free.  So, it makes sense,
but given how surprising it was and how long it took me to figure out,
perhaps it's worth adding a comment that this is what you're doing
here?  The same comment would also apply in a few other places in this
patch...
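A minimal standalone sketch of the ownership hand-off described above. It uses hypothetical simplified structs (struct fmt, struct repo) and helpers in place of Git's real struct repository_format, struct repository, xstrdup() and clear_repository_format(); the comment on take_partial_clone() is only one possible wording for the comment being requested:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct repository_format. */
struct fmt {
	char *partial_clone;
};

/* Hypothetical stand-in for struct repository. */
struct repo {
	char *partial_clone;
};

/* Like handle_extension_v0(): the format gets its own heap copy. */
static int fmt_set_partial_clone(struct fmt *f, const char *value)
{
	size_t len = strlen(value) + 1;

	free(f->partial_clone);
	f->partial_clone = malloc(len);
	if (!f->partial_clone)
		return -1;
	memcpy(f->partial_clone, value, len);
	return 0;
}

/* Like clear_repository_format(): frees whatever the format still owns. */
static void clear_fmt(struct fmt *f)
{
	free(f->partial_clone);
	f->partial_clone = NULL;
}

/*
 * Take over ownership of the string instead of duplicating it. Setting
 * the source to NULL keeps clear_fmt() from freeing memory that the repo
 * now owns (free(NULL) is a no-op), so there is no double-free.
 */
static void take_partial_clone(struct repo *r, struct fmt *f)
{
	r->partial_clone = f->partial_clone;
	f->partial_clone = NULL;
}
```

The alternative is to copy and let the format keep its own string, which is what the check_repository_format() hunk later in this patch does with xstrdup_or_null(); moving avoids one allocation at the cost of the less obvious NULL assignment.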

> +
>         if (worktree)
>                 repo_set_worktree(repo, worktree);
>
> diff --git a/repository.h b/repository.h
> index a45f7520fd..6fb16ed336 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -139,6 +139,9 @@ struct repository {
>         /* True if commit-graph has been disabled within this process. */
>         int commit_graph_disabled;
>
> +       /* Configurations related to promisor remotes. */
> +       char *repository_format_partial_clone;
> +
>         /* Configurations */
>
>         /* Indicate if a repository has a different 'commondir' from 'gitdir' */
> diff --git a/setup.c b/setup.c
> index 59e2facd9d..fbedfe8e03 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -468,8 +468,6 @@ static enum extension_result handle_extension_v0(const char *var,
>                         data->precious_objects = git_config_bool(var, value);
>                         return EXTENSION_OK;
>                 } else if (!strcmp(ext, "partialclone")) {
> -                       if (!value)
> -                               return config_error_nonbool(var);
>                         data->partial_clone = xstrdup(value);
>                         return EXTENSION_OK;
>                 } else if (!strcmp(ext, "worktreeconfig")) {
> @@ -566,7 +564,6 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
>         }
>
>         repository_format_precious_objects = candidate->precious_objects;
> -       set_repository_format_partial_clone(candidate->partial_clone);
>         repository_format_worktree_config = candidate->worktree_config;
>         string_list_clear(&candidate->unknown_extensions, 0);
>         string_list_clear(&candidate->v1_only_extensions, 0);
> @@ -1193,6 +1190,10 @@ int discover_git_directory(struct strbuf *commondir,
>                 return -1;
>         }
>
> +       the_repository->repository_format_partial_clone =
> +               candidate.partial_clone;
> +       candidate.partial_clone = NULL;
> +

comment would also be nice here

>         clear_repository_format(&candidate);
>         return 0;
>  }
> @@ -1300,8 +1301,12 @@ const char *setup_git_directory_gently(int *nongit_ok)
>                                 gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
>                         setup_git_env(gitdir);
>                 }
> -               if (startup_info->have_repository)
> +               if (startup_info->have_repository) {
>                         repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
> +                       the_repository->repository_format_partial_clone =
> +                               repo_fmt.partial_clone;
> +                       repo_fmt.partial_clone = NULL;

and here

> +               }
>         }
>         /*
>          * Since precompose_string_if_needed() needs to look at
> @@ -1319,7 +1324,6 @@ const char *setup_git_directory_gently(int *nongit_ok)
>                 setenv(GIT_PREFIX_ENVIRONMENT, "", 1);
>         }
>
> -

why the stray whitespace change?

>         strbuf_release(&dir);
>         strbuf_release(&gitdir);
>         clear_repository_format(&repo_fmt);
> @@ -1386,6 +1390,8 @@ void check_repository_format(struct repository_format *fmt)
>         check_repository_format_gently(get_git_dir(), fmt, NULL);
>         startup_info->have_repository = 1;
>         repo_set_hash_algo(the_repository, fmt->hash_algo);
> +       the_repository->repository_format_partial_clone =
> +               xstrdup_or_null(fmt->partial_clone);
>         clear_repository_format(&repo_fmt);
>  }
>
> --
> 2.32.0.rc1.229.g3e70b5a671-goog

Other than the minor nits above, this looks good.

* Re: [PATCH v3 3/5] submodule: refrain from filtering GIT_CONFIG_COUNT
  2021-06-10 17:35   ` [PATCH v3 3/5] submodule: refrain from filtering GIT_CONFIG_COUNT Jonathan Tan
@ 2021-06-10 21:13     ` Elijah Newren
  2021-06-10 21:51       ` Jeff King
  0 siblings, 1 reply; 77+ messages in thread
From: Elijah Newren @ 2021-06-10 21:13 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Git Mailing List, Taylor Blau, Emily Shaffer, Junio C Hamano, Jeff King

Adding Peff to cc as per comments about 89044baa8b ("submodule: stop
sanitizing config options", 2016-05-04) below.

On Thu, Jun 10, 2021 at 10:35 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> 14111fc492 ("git: submodule honor -c credential.* from command line",
> 2016-03-01) taught Git to pass through the GIT_CONFIG_PARAMETERS
> environment variable when invoking a subprocess on behalf of a
> submodule. But when d8d77153ea ("config: allow specifying config entries
> via envvar pairs", 2021-01-15) introduced support for GIT_CONFIG_COUNT
> (and its associated GIT_CONFIG_KEY_? and GIT_CONFIG_VALUE_?), the
> subprocess mechanism wasn't updated to also pass through these
> variables.
>
> Since they are conceptually the same (d8d77153ea was written to address
> a shortcoming of GIT_CONFIG_PARAMETERS), update the submodule subprocess
> mechanism to also pass through GIT_CONFIG_COUNT.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  submodule.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/submodule.c b/submodule.c
> index 0b1d9c1dde..f09031e397 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -489,7 +489,8 @@ static void prepare_submodule_repo_env_no_git_dir(struct strvec *out)
>         const char * const *var;
>
>         for (var = local_repo_env; *var; var++) {
> -               if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
> +               if (strcmp(*var, CONFIG_DATA_ENVIRONMENT) &&
> +                   strcmp(*var, CONFIG_COUNT_ENVIRONMENT))
>                         strvec_push(out, *var);
>         }
>  }
> --
> 2.32.0.rc1.229.g3e70b5a671-goog

I'm super confused.  It appears that
prepare_submodule_repo_env_no_git_dir() is filtering out
"GIT_CONFIG_PARAMETERS" (CONFIG_DATA_ENVIRONMENT) and
"GIT_CONFIG_COUNT" (CONFIG_COUNT_ENVIRONMENT), using all environment
variables other than these ones.  But the commit message talks about
adding an extra environment variable, rather than filtering another
out.  I must be misreading something somewhere, but I'm struggling to
figure it out.
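One reading that may resolve this, based on the v2 comment for this function (bare names in env_array "cause the corresponding environment variables to be unset in the subprocess"): the loop builds a list of variables to unset, so excluding GIT_CONFIG_PARAMETERS and GIT_CONFIG_COUNT from that list is what lets them pass through. A minimal POSIX sketch of that convention, assuming entries are applied in order; apply_env_entry() and apply_env_array() are hypothetical helpers, not Git's run-command code:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simulate how a child process treats one env_array entry: a bare "NAME"
 * unsets NAME, while "NAME=value" sets it. Variables not mentioned at all
 * are inherited unchanged. Returns 0 on success, -1 on failure.
 */
static int apply_env_entry(const char *entry)
{
	const char *eq = strchr(entry, '=');
	char *name;
	size_t len;
	int ret;

	if (!eq)
		return unsetenv(entry);

	/* Split "NAME=value" into a NUL-terminated name for setenv(). */
	len = (size_t)(eq - entry);
	name = malloc(len + 1);
	if (!name)
		return -1;
	memcpy(name, entry, len);
	name[len] = '\0';
	ret = setenv(name, eq + 1, 1);
	free(name);
	return ret;
}

/* Apply a NULL-terminated env_array in order; later entries win. */
static int apply_env_array(const char *const *env)
{
	for (; *env; env++)
		if (apply_env_entry(*env))
			return -1;
	return 0;
}
```

With GIT_CONFIG_PARAMETERS set in the parent and left out of the list, it survives into the simulated child, while GIT_INDEX_FILE, listed as a bare name, is removed.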

Digging around for a while led me to commit 89044baa8b ("submodule:
stop sanitizing config options", 2016-05-04), which suggests that the
passing of GIT_CONFIG_PARAMETERS is not done here but in
git-submodule.sh.  It still didn't make it clear to me why it's
stripped out here, but something makes me think that git-submodule.sh
should be affected by your change as well.

Also, from looking at the other commit messages you reference, it
appears GIT_CONFIG_PARAMETERS was just one big environment variable,
whereas GIT_CONFIG_COUNT is closely associated with 2*N other
environment variables...so shouldn't your loop (and perhaps also
git-submodule.sh) also be checking GIT_CONFIG_KEY_\d+ and
GIT_CONFIG_VALUE_\d+ ?

I've been looking at this patch longer than I care to admit and I
still feel like I don't have a clue what's going on.

* Re: [PATCH v3 4/5] run-command: refactor subprocess env preparation
  2021-06-10 17:35   ` [PATCH v3 4/5] run-command: refactor subprocess env preparation Jonathan Tan
@ 2021-06-10 21:21     ` Elijah Newren
  0 siblings, 0 replies; 77+ messages in thread
From: Elijah Newren @ 2021-06-10 21:21 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List, Taylor Blau, Emily Shaffer, Junio C Hamano

On Thu, Jun 10, 2021 at 10:35 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> submodule.c has functionality that prepares the environment for running
> a subprocess in a new repo. The lazy-fetching code (used in partial
> clones) will need this in a subsequent commit, so move it to a more
> central location.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  run-command.c | 12 ++++++++++++
>  run-command.h | 10 ++++++++++
>  submodule.c   | 18 ++----------------
>  3 files changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/run-command.c b/run-command.c
> index be6bc128cd..549a94a6a4 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1892,3 +1892,15 @@ int run_auto_maintenance(int quiet)
>
>         return run_command(&maint);
>  }
> +
> +void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir)
> +{
> +       const char * const *var;
> +
> +       for (var = local_repo_env; *var; var++) {
> +               if (strcmp(*var, CONFIG_DATA_ENVIRONMENT) &&
> +                   strcmp(*var, CONFIG_COUNT_ENVIRONMENT))
> +                       strvec_push(env_array, *var);
> +       }
> +       strvec_pushf(env_array, "%s=%s", GIT_DIR_ENVIRONMENT, new_git_dir);
> +}
> diff --git a/run-command.h b/run-command.h
> index d08414a92e..92f1c00b11 100644
> --- a/run-command.h
> +++ b/run-command.h
> @@ -483,4 +483,14 @@ int run_processes_parallel_tr2(int n, get_next_task_fn, start_failure_fn,
>                                task_finished_fn, void *pp_cb,
>                                const char *tr2_category, const char *tr2_label);
>
> +/**
> + * Convenience function which prepares env_array for a command to be run in a
> + * new repo. This adds all GIT_* environment variables to env_array with the
> + * exception of GIT_CONFIG_PARAMETERS (which cause the corresponding
> + * environment variables to be unset in the subprocess) and adds an environment
> + * variable pointing to new_git_dir. See local_repo_env in cache.h for more
> + * information.

This comment is out-of-date as of your previous patch.  There's (at
least) one more variable that is also excluded.
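For reference, a self-contained sketch of what the function builds, with a comment that names both exclusions. The variable list is a hypothetical, abbreviated stand-in for local_repo_env, and build_other_repo_env() is an illustrative name, not Git's API:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, abbreviated stand-in for Git's local_repo_env list. */
static const char *const demo_local_repo_env[] = {
	"GIT_CONFIG_PARAMETERS", /* CONFIG_DATA_ENVIRONMENT */
	"GIT_CONFIG_COUNT",      /* CONFIG_COUNT_ENVIRONMENT */
	"GIT_DIR",
	"GIT_INDEX_FILE",
	"GIT_WORK_TREE",
	NULL
};

/*
 * Fill out[] with a bare name (meaning "unset in the subprocess") for
 * every local-repo variable except GIT_CONFIG_PARAMETERS and
 * GIT_CONFIG_COUNT, which are left alone so that they pass through. Then
 * append "GIT_DIR=<new_git_dir>", formatted into dir_buf. GIT_DIR thus
 * appears twice; if entries are applied in order, the assignment wins.
 * out must have room for every list entry, the GIT_DIR assignment and a
 * NULL terminator. Returns the number of entries written (excluding the
 * terminator), or -1 if dir_buf is too small.
 */
static int build_other_repo_env(const char **out, char *dir_buf,
				size_t dir_buf_len, const char *new_git_dir)
{
	const char *const *var;
	int n = 0;
	int written;

	for (var = demo_local_repo_env; *var; var++) {
		if (strcmp(*var, "GIT_CONFIG_PARAMETERS") &&
		    strcmp(*var, "GIT_CONFIG_COUNT"))
			out[n++] = *var;
	}

	written = snprintf(dir_buf, dir_buf_len, "GIT_DIR=%s", new_git_dir);
	if (written < 0 || (size_t)written >= dir_buf_len)
		return -1;
	out[n++] = dir_buf;
	out[n] = NULL;
	return n;
}
```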

> + */
> +void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir);
> +
>  #endif
> diff --git a/submodule.c b/submodule.c
> index f09031e397..8e611fe1db 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -484,28 +484,14 @@ static void print_submodule_diff_summary(struct repository *r, struct rev_info *
>         strbuf_release(&sb);
>  }
>
> -static void prepare_submodule_repo_env_no_git_dir(struct strvec *out)
> -{
> -       const char * const *var;
> -
> -       for (var = local_repo_env; *var; var++) {
> -               if (strcmp(*var, CONFIG_DATA_ENVIRONMENT) &&
> -                   strcmp(*var, CONFIG_COUNT_ENVIRONMENT))
> -                       strvec_push(out, *var);
> -       }
> -}
> -
>  void prepare_submodule_repo_env(struct strvec *out)
>  {
> -       prepare_submodule_repo_env_no_git_dir(out);
> -       strvec_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
> -                    DEFAULT_GIT_DIR_ENVIRONMENT);
> +       prepare_other_repo_env(out, DEFAULT_GIT_DIR_ENVIRONMENT);
>  }
>
>  static void prepare_submodule_repo_env_in_gitdir(struct strvec *out)
>  {
> -       prepare_submodule_repo_env_no_git_dir(out);
> -       strvec_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
> +       prepare_other_repo_env(out, ".");
>  }
>
>  /*
> --
> 2.32.0.rc1.229.g3e70b5a671-goog

* Re: [PATCH v3 0/5] First steps towards partial clone submodules
  2021-06-10 17:35 ` [PATCH v3 0/5] " Jonathan Tan
                     ` (4 preceding siblings ...)
  2021-06-10 17:35   ` [PATCH v3 5/5] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
@ 2021-06-10 21:29   ` Elijah Newren
  2021-06-15 21:22     ` Elijah Newren
  5 siblings, 1 reply; 77+ messages in thread
From: Elijah Newren @ 2021-06-10 21:29 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List, Taylor Blau, Emily Shaffer, Junio C Hamano

On Thu, Jun 10, 2021 at 10:35 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> I think I've addressed all review comments. As for Junio's suggestion
> about also printing the type in the former patch 4 (now patch 5) [1], I
> decided to just leave the code as-is and not also print the type.
>
> The main changes are that patch 1 is somewhat rewritten - we still
> remove the global variable, but we no longer read the
> extensions.partialClone config directly from promisor-remote.c. Instead,
> we store it in struct repository when the format of a repository is
> being verified, and promisor-remote.c merely reads it from there. Patch
> 3 is a new patch that updates the environment variable preparation
> before it is moved in patch 4 (formerly patch 3).

I've read through all the patches.  2 & 5 look good to me, I had small
nitpicks on 1 & 4, and I'm totally lost on patch 3.  Patch 3 is just a
one-liner and it might be fine, but for some reason I can't figure out
the code before or after the patch even after digging around into
other commits and other files to try to get my bearings.  Hopefully
someone else can comment on that one.

>
> [1] https://lore.kernel.org/git/xmqq7dj2ik7k.fsf@gitster.g/
>
> Jonathan Tan (5):
>   repository: move global r_f_p_c to repo struct
>   promisor-remote: support per-repository config
>   submodule: refrain from filtering GIT_CONFIG_COUNT
>   run-command: refactor subprocess env preparation
>   promisor-remote: teach lazy-fetch in any repo
>
>  Makefile                      |   1 +
>  object-file.c                 |   7 +--
>  promisor-remote.c             | 108 ++++++++++++++++++----------------
>  promisor-remote.h             |  28 ++++++---
>  repository.c                  |   9 +++
>  repository.h                  |   5 ++
>  run-command.c                 |  12 ++++
>  run-command.h                 |  10 ++++
>  setup.c                       |  16 +++--
>  submodule.c                   |  17 +-----
>  t/helper/test-partial-clone.c |  43 ++++++++++++++
>  t/helper/test-tool.c          |   1 +
>  t/helper/test-tool.h          |   1 +
>  t/t0410-partial-clone.sh      |  23 ++++++++
>  14 files changed, 196 insertions(+), 85 deletions(-)
>  create mode 100644 t/helper/test-partial-clone.c
>
> Range-diff against v2:
> 1:  d99598ca50 < -:  ---------- promisor-remote: read partialClone config here
> -:  ---------- > 1:  255d112256 repository: move global r_f_p_c to repo struct
> 2:  5a1ccae335 ! 2:  a52448cff2 promisor-remote: support per-repository config
>     @@ promisor-remote.c
>       #include "transport.h"
>       #include "strvec.h"
>
>     --static char *repository_format_partial_clone;
>      +struct promisor_remote_config {
>     -+  char *repository_format_partial_clone;
>      +  struct promisor_remote *promisors;
>      +  struct promisor_remote **promisors_tail;
>      +};
>     -
>     ++
>       static int fetch_objects(const char *remote_name,
>                          const struct object_id *oids,
>     +                    int oid_nr)
>      @@ promisor-remote.c: static int fetch_objects(const char *remote_name,
>         return finish_command(&child) ? -1 : 0;
>       }
>     @@ promisor-remote.c: static void promisor_remote_move_to_tail(struct promisor_remo
>         const char *name;
>         size_t namelen;
>         const char *subkey;
>     -@@ promisor-remote.c: static int promisor_remote_config(const char *var, const char *value, void *data
>     -            * NULL value is handled in handle_extension_v0 in setup.c.
>     -            */
>     -           if (value)
>     --                  repository_format_partial_clone = xstrdup(value);
>     -+                  config->repository_format_partial_clone = xstrdup(value);
>     -           return 0;
>     -   }
>     -
>      @@ promisor-remote.c: static int promisor_remote_config(const char *var, const char *value, void *data
>
>                 remote_name = xmemdupz(name, namelen);
>     @@ promisor-remote.c: static int promisor_remote_config(const char *var, const char
>      +  config->promisors_tail = &config->promisors;
>
>      -  git_config(promisor_remote_config, NULL);
>     -+  git_config(promisor_remote_config, config);
>     ++  repo_config(r, promisor_remote_config, config);
>
>     --  if (repository_format_partial_clone) {
>     -+  if (config->repository_format_partial_clone) {
>     +-  if (the_repository->repository_format_partial_clone) {
>     ++  if (r->repository_format_partial_clone) {
>                 struct promisor_remote *o, *previous;
>
>     --          o = promisor_remote_lookup(repository_format_partial_clone,
>     +-          o = promisor_remote_lookup(the_repository->repository_format_partial_clone,
>      +          o = promisor_remote_lookup(config,
>     -+                                     config->repository_format_partial_clone,
>     ++                                     r->repository_format_partial_clone,
>                                            &previous);
>                 if (o)
>      -                  promisor_remote_move_to_tail(o, previous);
>      +                  promisor_remote_move_to_tail(config, o, previous);
>                 else
>     --                  promisor_remote_new(repository_format_partial_clone);
>     -+                  promisor_remote_new(config, config->repository_format_partial_clone);
>     +-                  promisor_remote_new(the_repository->repository_format_partial_clone);
>     ++                  promisor_remote_new(config, r->repository_format_partial_clone);
>         }
>       }
>
>     @@ promisor-remote.c: static int promisor_remote_config(const char *var, const char
>      -  while (promisors) {
>      -          struct promisor_remote *r = promisors;
>      -          promisors = promisors->next;
>     -+  FREE_AND_NULL(config->repository_format_partial_clone);
>     -+
>      +  while (config->promisors) {
>      +          struct promisor_remote *r = config->promisors;
>      +          config->promisors = config->promisors->next;
>     @@ repository.h: struct lock_file;
>       enum untracked_cache_setting {
>         UNTRACKED_CACHE_UNSET = -1,
>      @@ repository.h: struct repository {
>     -   /* True if commit-graph has been disabled within this process. */
>     -   int commit_graph_disabled;
>
>     -+  /* Configurations related to promisor remotes. */
>     +   /* Configurations related to promisor remotes. */
>     +   char *repository_format_partial_clone;
>      +  struct promisor_remote_config *promisor_remote_config;
>     -+
>     +
>         /* Configurations */
>
>     -   /* Indicate if a repository has a different 'commondir' from 'gitdir' */
> -:  ---------- > 3:  e1a40108f4 submodule: refrain from filtering GIT_CONFIG_COUNT
> 3:  3f7c4e6e67 ! 4:  fd6907822c run-command: move envvar-resetting function
>     @@ Metadata
>      Author: Jonathan Tan <jonathantanmy@google.com>
>
>       ## Commit message ##
>     -    run-command: move envvar-resetting function
>     +    run-command: refactor subprocess env preparation
>
>     -    There is a function that resets environment variables, used when
>     -    invoking a sub-process in a submodule. The lazy-fetching code (used in
>     -    partial clones) will need this function in a subsequent commit, so move
>     -    it to a more central location.
>     +    submodule.c has functionality that prepares the environment for running
>     +    a subprocess in a new repo. The lazy-fetching code (used in partial
>     +    clones) will need this in a subsequent commit, so move it to a more
>     +    central location.
>
>          Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>
>     @@ run-command.c: int run_auto_maintenance(int quiet)
>         return run_command(&maint);
>       }
>      +
>     -+void prepare_other_repo_env(struct strvec *env_array)
>     ++void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir)
>      +{
>      +  const char * const *var;
>      +
>      +  for (var = local_repo_env; *var; var++) {
>     -+          if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
>     ++          if (strcmp(*var, CONFIG_DATA_ENVIRONMENT) &&
>     ++              strcmp(*var, CONFIG_COUNT_ENVIRONMENT))
>      +                  strvec_push(env_array, *var);
>      +  }
>     ++  strvec_pushf(env_array, "%s=%s", GIT_DIR_ENVIRONMENT, new_git_dir);
>      +}
>
>       ## run-command.h ##
>     @@ run-command.h: int run_processes_parallel_tr2(int n, get_next_task_fn, start_fai
>                                const char *tr2_category, const char *tr2_label);
>
>      +/**
>     -+ * Convenience function which adds all GIT_* environment variables to env_array
>     -+ * with the exception of GIT_CONFIG_PARAMETERS. When used as the env_array of a
>     -+ * subprocess, these entries cause the corresponding environment variables to
>     -+ * be unset in the subprocess. See local_repo_env in cache.h for more
>     ++ * Convenience function which prepares env_array for a command to be run in a
>     ++ * new repo. This adds all GIT_* environment variables to env_array with the
>     ++ * exception of GIT_CONFIG_PARAMETERS (which cause the corresponding
>     ++ * environment variables to be unset in the subprocess) and adds an environment
>     ++ * variable pointing to new_git_dir. See local_repo_env in cache.h for more
>      + * information.
>      + */
>     -+void prepare_other_repo_env(struct strvec *env_array);
>     ++void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir);
>      +
>       #endif
>
>     @@ submodule.c: static void print_submodule_diff_summary(struct repository *r, stru
>      -  const char * const *var;
>      -
>      -  for (var = local_repo_env; *var; var++) {
>     --          if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
>     +-          if (strcmp(*var, CONFIG_DATA_ENVIRONMENT) &&
>     +-              strcmp(*var, CONFIG_COUNT_ENVIRONMENT))
>      -                  strvec_push(out, *var);
>      -  }
>      -}
>     @@ submodule.c: static void print_submodule_diff_summary(struct repository *r, stru
>       void prepare_submodule_repo_env(struct strvec *out)
>       {
>      -  prepare_submodule_repo_env_no_git_dir(out);
>     -+  prepare_other_repo_env(out);
>     -   strvec_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
>     -                DEFAULT_GIT_DIR_ENVIRONMENT);
>     +-  strvec_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
>     +-               DEFAULT_GIT_DIR_ENVIRONMENT);
>     ++  prepare_other_repo_env(out, DEFAULT_GIT_DIR_ENVIRONMENT);
>       }
>
>       static void prepare_submodule_repo_env_in_gitdir(struct strvec *out)
>       {
>      -  prepare_submodule_repo_env_no_git_dir(out);
>     -+  prepare_other_repo_env(out);
>     -   strvec_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
>     +-  strvec_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
>     ++  prepare_other_repo_env(out, ".");
>       }
>
>     + /*
> 4:  655607d575 ! 5:  a6d73662b1 promisor-remote: teach lazy-fetch in any repo
>     @@ Commit message
>          prevents testing of the functionality in this patch by user-facing
>          commands. So for now, test this mechanism using a test helper.
>
>     +    Besides that, there is some code that uses the wrapper functions
>     +    like has_promisor_remote(). Those will need to be checked to see if they
>     +    could support the non-wrapper functions instead (and thus support any
>     +    repository, not just the_repository).
>     +
>          Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>
>       ## Makefile ##
>     @@ promisor-remote.c: static int fetch_objects(const char *remote_name,
>
>         child.git_cmd = 1;
>         child.in = -1;
>     -+  if (repo != the_repository) {
>     -+          prepare_other_repo_env(&child.env_array);
>     -+          strvec_pushf(&child.env_array, "%s=%s", GIT_DIR_ENVIRONMENT,
>     -+                       repo->gitdir);
>     -+  }
>     ++  if (repo != the_repository)
>     ++          prepare_other_repo_env(&child.env_array, repo->gitdir);
>         strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
>                      "fetch", remote_name, "--no-tags",
>                      "--no-write-fetch-head", "--recurse-submodules=no",
>     -@@ promisor-remote.c: static void promisor_remote_init(struct repository *r)
>     -           xcalloc(sizeof(*r->promisor_remote_config), 1);
>     -   config->promisors_tail = &config->promisors;
>     -
>     --  git_config(promisor_remote_config, config);
>     -+  repo_config(r, promisor_remote_config, config);
>     -
>     -   if (config->repository_format_partial_clone) {
>     -           struct promisor_remote *o, *previous;
>      @@ promisor-remote.c: int promisor_remote_get_direct(struct repository *repo,
>
>         promisor_remote_init(repo);
> --
> 2.32.0.rc1.229.g3e70b5a671-goog
>

* Re: [PATCH v3 3/5] submodule: refrain from filtering GIT_CONFIG_COUNT
  2021-06-10 21:13     ` Elijah Newren
@ 2021-06-10 21:51       ` Jeff King
  2021-06-11 17:02         ` Jonathan Tan
  0 siblings, 1 reply; 77+ messages in thread
From: Jeff King @ 2021-06-10 21:51 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Jonathan Tan, Git Mailing List, Taylor Blau, Emily Shaffer,
	Junio C Hamano

On Thu, Jun 10, 2021 at 02:13:49PM -0700, Elijah Newren wrote:

> > diff --git a/submodule.c b/submodule.c
> > index 0b1d9c1dde..f09031e397 100644
> > --- a/submodule.c
> > +++ b/submodule.c
> > @@ -489,7 +489,8 @@ static void prepare_submodule_repo_env_no_git_dir(struct strvec *out)
> >         const char * const *var;
> >
> >         for (var = local_repo_env; *var; var++) {
> > -               if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
> > +               if (strcmp(*var, CONFIG_DATA_ENVIRONMENT) &&
> > +                   strcmp(*var, CONFIG_COUNT_ENVIRONMENT))
> >                         strvec_push(out, *var);
> >         }
> >  }
> > --
> > 2.32.0.rc1.229.g3e70b5a671-goog
> 
> I'm super confused.  It appears that
> prepare_submodule_repo_env_no_git_dir() is filtering out
> "GIT_CONFIG_PARAMETERS" (CONFIG_DATA_ENVIRONMENT) and
> "GIT_CONFIG_COUNT" (CONFIG_COUNT_ENVIRONMENT), using all environment
> variables other than these ones.  But the commit message talks about
> adding an extra environment variable, rather than filtering another
> out.  I must be mis-reading something somewhere, but I'm struggling to
> figure it out.

I think there might be a double (triple?) negative here:

  - we want to pass through the config parameters variable, but not
    other local repo env variables;

  - so we _don't_ want the config variable to appear in the "out"
    strvec, because its presence would cause it to be cleared
    from the child process environment;

  - so we go through the list adding everything _except_ that variable;

  - and we match using strcmp(), so a true value means "did not match",
    so we should add it to the list
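
To make the double negative concrete, here is a minimal, self-contained C sketch of the filtering loop. The variable list is a hypothetical, abbreviated stand-in for local_repo_env in cache.h. collect_vars_to_unset() and list_contains() are illustrative names, not git functions. The string values used for CONFIG_DATA_ENVIRONMENT and CONFIG_COUNT_ENVIRONMENT are the ones named in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, abbreviated stand-in for git's local_repo_env[]. */
static const char *const demo_local_repo_env[] = {
	"GIT_DIR",
	"GIT_WORK_TREE",
	"GIT_CONFIG_PARAMETERS", /* CONFIG_DATA_ENVIRONMENT */
	"GIT_CONFIG_COUNT",      /* CONFIG_COUNT_ENVIRONMENT */
	"GIT_OBJECT_DIRECTORY",
	NULL
};

/*
 * Collect the names that will be *unset* in the child. strcmp()
 * returns nonzero when the strings differ, so the condition is true
 * only for names that are neither config variable. Those names go
 * into out[] (and are cleared in the child); the config variables
 * are left out of out[], which is what lets them pass through.
 */
static size_t collect_vars_to_unset(const char **out, size_t max)
{
	size_t n = 0;
	const char *const *var;

	for (var = demo_local_repo_env; *var; var++) {
		if (strcmp(*var, "GIT_CONFIG_PARAMETERS") &&
		    strcmp(*var, "GIT_CONFIG_COUNT")) {
			assert(n < max);
			out[n++] = *var;
		}
	}
	return n;
}

/* Returns 1 if name appears in list[0..n), 0 otherwise. */
static int list_contains(const char **list, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(list[i], name))
			return 1;
	return 0;
}
```

The config variables end up absent from the output list, and that absence is what keeps them set in the child.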

> Also, from looking at the other commit messages you reference, it
> appears GIT_CONFIG_PARAMETERS was just one big environment variable,
> whereas GIT_CONFIG_COUNT is closely associated with 2*N other
> environment variables...so shouldn't your loop (and perhaps also
> git-submodule.sh) also be checking GIT_CONFIG_KEY_\d+ and
> GIT_CONFIG_VALUE_\d+ ?

We definitely could clean out those GIT_CONFIG_KEY_* values. But the
COUNT serves as a master parameter. Anybody who sets COUNT would then
also set the individual key/value parameters, too (and even if it only sets
it to "5", and there is a crufty GIT_CONFIG_KEY_6 in the environment,
that is not wrong).
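
A simplified sketch of why COUNT acts as the master parameter: a reader driven by GIT_CONFIG_COUNT only consults indices below COUNT, so stray higher-numbered pairs are never read. This is not git's config.c code. env_lookup() and read_counted_config() are hypothetical helpers, and the sketch reads from an explicit array instead of the real environment.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up NAME in a NULL-terminated "NAME=VALUE" array (a stand-in for
 * the process environment). Returns the value, or NULL if absent. */
static const char *env_lookup(const char *const *envp, const char *name)
{
	size_t len = strlen(name);

	for (; *envp; envp++)
		if (!strncmp(*envp, name, len) && (*envp)[len] == '=')
			return *envp + len + 1;
	return NULL;
}

/*
 * Read GIT_CONFIG_KEY_<i>/GIT_CONFIG_VALUE_<i> for 0 <= i < COUNT and
 * pass each pair to fn. Without COUNT, no KEY_<n> variable is looked
 * at. Returns the number of pairs read, or -1 on a bad COUNT or a
 * missing pair (a simplification of real error handling).
 */
static int read_counted_config(const char *const *envp,
			       void (*fn)(const char *, const char *, void *),
			       void *data)
{
	const char *count_str = env_lookup(envp, "GIT_CONFIG_COUNT");
	char name[64];
	long count, i;

	if (!count_str)
		return 0;
	count = strtol(count_str, NULL, 10);
	if (count < 0)
		return -1;
	for (i = 0; i < count; i++) {
		const char *key, *value;

		snprintf(name, sizeof(name), "GIT_CONFIG_KEY_%ld", i);
		key = env_lookup(envp, name);
		snprintf(name, sizeof(name), "GIT_CONFIG_VALUE_%ld", i);
		value = env_lookup(envp, name);
		if (!key || !value)
			return -1;
		fn(key, value, data);
	}
	return (int)count;
}

/* Example callback: counts how many pairs were delivered. */
static void count_calls(const char *key, const char *value, void *data)
{
	(void)key;
	(void)value;
	++*(int *)data;
}
```

With COUNT=2 and a leftover GIT_CONFIG_KEY_6, only pairs 0 and 1 are delivered; unsetting COUNT alone is enough to make the whole family inert.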

-Peff

* Re: [PATCH v3 3/5] submodule: refrain from filtering GIT_CONFIG_COUNT
  2021-06-10 21:51       ` Jeff King
@ 2021-06-11 17:02         ` Jonathan Tan
  0 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-11 17:02 UTC (permalink / raw)
  To: peff; +Cc: newren, jonathantanmy, git, me, emilyshaffer, gitster

> > I'm super confused.  It appears that
> > prepare_submodule_repo_env_no_git_dir() is filtering out
> > "GIT_CONFIG_PARAMETERS" (CONFIG_DATA_ENVIRONMENT) and
> > "GIT_CONFIG_COUNT" (CONFIG_COUNT_ENVIRONMENT), using all environment
> > variables other than these ones.  But the commit message talks about
> > adding an extra environment variable, rather than filtering another
> > out.  I must be mis-reading something somewhere, but I'm struggling to
> > figure it out.
> 
> I think there might be a double (triple?) negative here:
> 
>   - we want to pass through the config parameters variable, but not
>     other local repo env variables;
> 
>   - so we _don't_ want the config variable to appear in the "out"
>     strvec, because its presence would cause it to be cleared
>     from the child process environment;
> 
>   - so we go through the list adding everything _except_ that variable;
> 
>   - and we match using strcmp(), so a true value means "did not match",
>     so we should add it to the list
> 
> > Also, from looking at the other commit messages you reference, it
> > appears GIT_CONFIG_PARAMETERS was just one big environment variable,
> > whereas GIT_CONFIG_COUNT is closely associated with 2*N other
> > environment variables...so shouldn't your loop (and perhaps also
> > git-submodule.sh) also be checking GIT_CONFIG_KEY_\d+ and
> > GIT_CONFIG_VALUE_\d+ ?
> 
> We definitely could clean out those GIT_CONFIG_KEY_* values. But the
> COUNT serves as a master parameter. Anybody who sets COUNT would then
> also set the individual key/value parameters, too (and even if it only sets
> it to "5", and there is a crufty GIT_CONFIG_KEY_6 in the environment,
> that is not wrong).
> 
> -Peff

As Peff describes, if an envvar is present in the list, it becomes
unset. (Perhaps confusingly, if a string of the form "ENVVAR=VALUE"
(note the "=") is present in the list, it becomes set to the given
value.) So in order to *not* filter out the envvar from the subprocess,
we need to filter out the envvar from env_array.
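
The two rules (a bare name unsets, "NAME=VALUE" sets) can be sketched in C as below. apply_env_entries() is a hypothetical helper that applies the entries to the current process with POSIX setenv()/unsetenv() purely for illustration; as far as I know, run-command.c instead prepares a modified copy of the environment for the child. Order matters: an unset of GIT_DIR followed by GIT_DIR=<dir> leaves GIT_DIR set, which is the shape prepare_other_repo_env() produces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Apply a NULL-terminated list of env entries in order:
 *   "NAME"       -> unset NAME
 *   "NAME=VALUE" -> set NAME to VALUE
 * Returns 0 on success, -1 on failure (including over-long names).
 */
static int apply_env_entries(const char *const *entries)
{
	for (; *entries; entries++) {
		const char *eq = strchr(*entries, '=');

		if (!eq) {
			if (unsetenv(*entries))
				return -1;
		} else {
			char name[256];
			size_t len = (size_t)(eq - *entries);

			if (len == 0 || len >= sizeof(name))
				return -1;
			memcpy(name, *entries, len);
			name[len] = '\0';
			if (setenv(name, eq + 1, 1))
				return -1;
		}
	}
	return 0;
}
```

Anything not mentioned in the list (GIT_CONFIG_PARAMETERS in the usage below) is simply inherited unchanged.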

If you can think of a better way to document this, please let me know.
One way I thought of that might reduce confusion is for this function to
take the struct child_process directly. I don't like taking the whole
struct when we're just modifying env_array, but I think that this
becomes easier to document (just say that we're unsetting all these
envvars from the child process, and in the function body, say that to
unset a variable, we need to make it appear without a "=" in env_array).

* Re: [PATCH v3 0/5] First steps towards partial clone submodules
  2021-06-10 21:29   ` [PATCH v3 0/5] First steps towards partial clone submodules Elijah Newren
@ 2021-06-15 21:22     ` Elijah Newren
  0 siblings, 0 replies; 77+ messages in thread
From: Elijah Newren @ 2021-06-15 21:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List, Taylor Blau, Emily Shaffer, Junio C Hamano

Saw this series mentioned in "What's cooking" and remembered I didn't
give an update.

On Thu, Jun 10, 2021 at 2:29 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Thu, Jun 10, 2021 at 10:35 AM Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > I think I've addressed all review comments. As for Junio's suggestion
> > about also printing the type in the former patch 4 (now patch 5) [1], I
> > decided to just leave the code as-is and not also print the type.
> >
> > The main changes are that patch 1 is somewhat rewritten - we still
> > remove the global variable, but we no longer read the
> > extensions.partialClone config directly from promisor-remote.c. Instead,
> > we store it in struct repository when the format of a repository is
> > being verified, and promisor-remote.c merely reads it from there. Patch
> > 3 is a new patch that updates the environment variable preparation
> > before it is moved in patch 4 (formerly patch 3).
>
> I've read through all the patches.  2 & 5 look good to me, I had small
> nitpicks on 1 & 4, and I'm totally lost on patch 3.  Patch 3 is just a
> one-liner and it might be fine, but for some reason I can't figure out
> the code before or after the patch even after digging around into
> other commits and other files to try to get my bearings.  Hopefully
> someone else can comment on that one.

I'm happy with Jonathan and Peff's responses on patch 3; as I
mentioned above I just didn't understand the original code before
Jonathan's changes.  (Perhaps some comments could be added to clarify
that code area, but again that's clarifying the code that existed
before Jonathan's patch so it doesn't need to be part of his series.)
So that only leaves my nitpicks on patches 1 & 4; otherwise the series
looks good to me.

* [PATCH v4 0/5] First steps towards partial clone submodules
  2021-06-01 21:34 [PATCH 0/4] First steps towards partial clone submodules Jonathan Tan
                   ` (6 preceding siblings ...)
  2021-06-10 17:35 ` [PATCH v3 0/5] " Jonathan Tan
@ 2021-06-17 17:13 ` Jonathan Tan
  2021-06-17 17:13   ` [PATCH v4 1/5] repository: move global r_f_p_c to repo struct Jonathan Tan
                     ` (5 more replies)
  7 siblings, 6 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-17 17:13 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, newren

Quoting from [1]:

> I'm happy with Jonathan and Peff's responses on patch 3; as I
> mentioned above I just didn't understand the original code before
> Jonathan's changes.  (Perhaps some comments could be added to clarify
> that code area, but again that's clarifying the code that existed
> before Jonathan's patch so it doesn't need to be part of his series.)
> So that only leaves my nitpicks on patches 1 & 4; otherwise the series
> looks good to me.

I've addressed Elijah's comments on patches 1 and 4.

[1] https://lore.kernel.org/git/CABPp-BFD5=98C0+WnfK=+s7twZ960ORiZzUSP94GD2A4bXJ69Q@mail.gmail.com/

Jonathan Tan (5):
  repository: move global r_f_p_c to repo struct
  promisor-remote: support per-repository config
  submodule: refrain from filtering GIT_CONFIG_COUNT
  run-command: refactor subprocess env preparation
  promisor-remote: teach lazy-fetch in any repo

 Makefile                      |   1 +
 object-file.c                 |   7 +--
 promisor-remote.c             | 108 ++++++++++++++++++----------------
 promisor-remote.h             |  28 ++++++---
 repository.c                  |  10 ++++
 repository.h                  |   5 ++
 run-command.c                 |  12 ++++
 run-command.h                 |  10 ++++
 setup.c                       |  17 ++++--
 submodule.c                   |  17 +-----
 t/helper/test-partial-clone.c |  43 ++++++++++++++
 t/helper/test-tool.c          |   1 +
 t/helper/test-tool.h          |   1 +
 t/t0410-partial-clone.sh      |  23 ++++++++
 14 files changed, 199 insertions(+), 84 deletions(-)
 create mode 100644 t/helper/test-partial-clone.c

Range-diff against v3:
1:  e8e6a95951 ! 1:  0bd009597d repository: move global r_f_p_c to repo struct
    @@ repository.c: int repo_init(struct repository *repo,
      
      	repo_set_hash_algo(repo, format.hash_algo);
      
    ++	/* take ownership of format.partial_clone */
     +	repo->repository_format_partial_clone = format.partial_clone;
     +	format.partial_clone = NULL;
     +
    @@ setup.c: int discover_git_directory(struct strbuf *commondir,
      		return -1;
      	}
      
    ++	/* take ownership of candidate.partial_clone */
     +	the_repository->repository_format_partial_clone =
     +		candidate.partial_clone;
     +	candidate.partial_clone = NULL;
    @@ setup.c: const char *setup_git_directory_gently(int *nongit_ok)
     -		if (startup_info->have_repository)
     +		if (startup_info->have_repository) {
      			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
    ++			/* take ownership of repo_fmt.partial_clone */
     +			the_repository->repository_format_partial_clone =
     +				repo_fmt.partial_clone;
     +			repo_fmt.partial_clone = NULL;
    @@ setup.c: const char *setup_git_directory_gently(int *nongit_ok)
      	}
      	/*
      	 * Since precompose_string_if_needed() needs to look at
    -@@ setup.c: const char *setup_git_directory_gently(int *nongit_ok)
    - 		setenv(GIT_PREFIX_ENVIRONMENT, "", 1);
    - 	}
    - 
    --
    - 	strbuf_release(&dir);
    - 	strbuf_release(&gitdir);
    - 	clear_repository_format(&repo_fmt);
     @@ setup.c: void check_repository_format(struct repository_format *fmt)
      	check_repository_format_gently(get_git_dir(), fmt, NULL);
      	startup_info->have_repository = 1;
2:  07eb0a0f39 = 2:  8a478b46bf promisor-remote: support per-repository config
3:  004ac92e9b = 3:  78b4108ae1 submodule: refrain from filtering GIT_CONFIG_COUNT
4:  ce0454f442 ! 4:  1778cbf878 run-command: refactor subprocess env preparation
    @@ run-command.h: int run_processes_parallel_tr2(int n, get_next_task_fn, start_fai
     +/**
     + * Convenience function which prepares env_array for a command to be run in a
     + * new repo. This adds all GIT_* environment variables to env_array with the
    -+ * exception of GIT_CONFIG_PARAMETERS (which cause the corresponding
    -+ * environment variables to be unset in the subprocess) and adds an environment
    -+ * variable pointing to new_git_dir. See local_repo_env in cache.h for more
    -+ * information.
    ++ * exception of GIT_CONFIG_PARAMETERS and GIT_CONFIG_COUNT (which cause the
    ++ * corresponding environment variables to be unset in the subprocess) and adds
    ++ * an environment variable pointing to new_git_dir. See local_repo_env in
    ++ * cache.h for more information.
     + */
     +void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir);
     +
5:  a3278d61f0 = 5:  dbba426b6a promisor-remote: teach lazy-fetch in any repo
-- 
2.32.0.272.g935e593368-goog


* [PATCH v4 1/5] repository: move global r_f_p_c to repo struct
  2021-06-17 17:13 ` [PATCH v4 " Jonathan Tan
@ 2021-06-17 17:13   ` Jonathan Tan
  2021-06-17 17:13   ` [PATCH v4 2/5] promisor-remote: support per-repository config Jonathan Tan
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-17 17:13 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, newren, Junio C Hamano

Move repository_format_partial_clone, which is currently a global
variable, into struct repository. (Full support for per-repository
partial clone config will be done in a subsequent commit - this is split
into its own commit because of the extent of the changes needed.)

The new repo-specific variable cannot be set in
check_repository_format_gently() (as it is currently), because that
function does not know which repo it is operating on (or even whether
the value is important); therefore this responsibility is delegated to
the outermost caller that knows. Of all the outermost callers that know
(found by looking at all functions that call clear_repository_format()),
I looked at those that either read from the main Git directory or write
into a struct repository. These callers have been modified accordingly
(write to the_repository in the former case and write to the given
struct repository in the latter case).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
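
A minimal C sketch of the two ownership patterns used in this patch: moving format.partial_clone into the repository and NULLing the source (repo_init(), discover_git_directory(), setup_git_directory_gently()), versus copying it (check_repository_format(), presumably because a caller-supplied fmt still belongs to the caller). The demo_* structs and functions are hypothetical stand-ins that model only the partial_clone field.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for git's structs. */
struct demo_repository_format {
	char *partial_clone; /* heap-allocated string, or NULL */
};

struct demo_repository {
	char *repository_format_partial_clone;
};

/* Like xstrdup_or_null(), minus git's die-on-OOM behavior. */
static char *demo_strdup_or_null(const char *s)
{
	char *copy;

	if (!s)
		return NULL;
	copy = malloc(strlen(s) + 1);
	assert(copy);
	strcpy(copy, s);
	return copy;
}

/* Mirrors clear_repository_format(): frees what the format still owns. */
static void demo_clear_repository_format(struct demo_repository_format *fmt)
{
	free(fmt->partial_clone);
	fmt->partial_clone = NULL;
}

/*
 * "Take ownership": move the pointer and NULL the source, so the later
 * clear does not free the string the repository now holds. Freeing a
 * previous value is this sketch's addition, not the patch's.
 */
static void demo_take_partial_clone(struct demo_repository *repo,
				    struct demo_repository_format *fmt)
{
	free(repo->repository_format_partial_clone);
	repo->repository_format_partial_clone = fmt->partial_clone;
	fmt->partial_clone = NULL;
}

/* Copy instead of move, leaving the caller's format untouched. */
static void demo_copy_partial_clone(struct demo_repository *repo,
				    const struct demo_repository_format *fmt)
{
	free(repo->repository_format_partial_clone);
	repo->repository_format_partial_clone =
		demo_strdup_or_null(fmt->partial_clone);
}
```

Moving avoids an allocation when the format struct is a local that is about to be cleared; copying is the safe choice when someone else will clear it.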
 promisor-remote.c | 13 +++----------
 promisor-remote.h |  6 ------
 repository.c      |  4 ++++
 repository.h      |  3 +++
 setup.c           | 17 +++++++++++++----
 5 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index da3f2ca261..d24081dc21 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -5,13 +5,6 @@
 #include "transport.h"
 #include "strvec.h"
 
-static char *repository_format_partial_clone;
-
-void set_repository_format_partial_clone(char *partial_clone)
-{
-	repository_format_partial_clone = xstrdup_or_null(partial_clone);
-}
-
 static int fetch_objects(const char *remote_name,
 			 const struct object_id *oids,
 			 int oid_nr)
@@ -145,15 +138,15 @@ static void promisor_remote_init(void)
 
 	git_config(promisor_remote_config, NULL);
 
-	if (repository_format_partial_clone) {
+	if (the_repository->repository_format_partial_clone) {
 		struct promisor_remote *o, *previous;
 
-		o = promisor_remote_lookup(repository_format_partial_clone,
+		o = promisor_remote_lookup(the_repository->repository_format_partial_clone,
 					   &previous);
 		if (o)
 			promisor_remote_move_to_tail(o, previous);
 		else
-			promisor_remote_new(repository_format_partial_clone);
+			promisor_remote_new(the_repository->repository_format_partial_clone);
 	}
 }
 
diff --git a/promisor-remote.h b/promisor-remote.h
index c7a14063c5..687210ab87 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -32,10 +32,4 @@ int promisor_remote_get_direct(struct repository *repo,
 			       const struct object_id *oids,
 			       int oid_nr);
 
-/*
- * This should be used only once from setup.c to set the value we got
- * from the extensions.partialclone config option.
- */
-void set_repository_format_partial_clone(char *partial_clone);
-
 #endif /* PROMISOR_REMOTE_H */
diff --git a/repository.c b/repository.c
index 448cd557d4..057a4748a0 100644
--- a/repository.c
+++ b/repository.c
@@ -172,6 +172,10 @@ int repo_init(struct repository *repo,
 
 	repo_set_hash_algo(repo, format.hash_algo);
 
+	/* take ownership of format.partial_clone */
+	repo->repository_format_partial_clone = format.partial_clone;
+	format.partial_clone = NULL;
+
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
diff --git a/repository.h b/repository.h
index a45f7520fd..6fb16ed336 100644
--- a/repository.h
+++ b/repository.h
@@ -139,6 +139,9 @@ struct repository {
 	/* True if commit-graph has been disabled within this process. */
 	int commit_graph_disabled;
 
+	/* Configurations related to promisor remotes. */
+	char *repository_format_partial_clone;
+
 	/* Configurations */
 
 	/* Indicate if a repository has a different 'commondir' from 'gitdir' */
diff --git a/setup.c b/setup.c
index ead2f80cd8..eb9367ca5c 100644
--- a/setup.c
+++ b/setup.c
@@ -468,8 +468,6 @@ static enum extension_result handle_extension_v0(const char *var,
 			data->precious_objects = git_config_bool(var, value);
 			return EXTENSION_OK;
 		} else if (!strcmp(ext, "partialclone")) {
-			if (!value)
-				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
 			return EXTENSION_OK;
 		} else if (!strcmp(ext, "worktreeconfig")) {
@@ -566,7 +564,6 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 	}
 
 	repository_format_precious_objects = candidate->precious_objects;
-	set_repository_format_partial_clone(candidate->partial_clone);
 	repository_format_worktree_config = candidate->worktree_config;
 	string_list_clear(&candidate->unknown_extensions, 0);
 	string_list_clear(&candidate->v1_only_extensions, 0);
@@ -1197,6 +1194,11 @@ int discover_git_directory(struct strbuf *commondir,
 		return -1;
 	}
 
+	/* take ownership of candidate.partial_clone */
+	the_repository->repository_format_partial_clone =
+		candidate.partial_clone;
+	candidate.partial_clone = NULL;
+
 	clear_repository_format(&candidate);
 	return 0;
 }
@@ -1304,8 +1306,13 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			/* take ownership of repo_fmt.partial_clone */
+			the_repository->repository_format_partial_clone =
+				repo_fmt.partial_clone;
+			repo_fmt.partial_clone = NULL;
+		}
 	}
 	/*
 	 * Since precompose_string_if_needed() needs to look at
@@ -1390,6 +1397,8 @@ void check_repository_format(struct repository_format *fmt)
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
 	repo_set_hash_algo(the_repository, fmt->hash_algo);
+	the_repository->repository_format_partial_clone =
+		xstrdup_or_null(fmt->partial_clone);
 	clear_repository_format(&repo_fmt);
 }
 
-- 
2.32.0.272.g935e593368-goog


* [PATCH v4 2/5] promisor-remote: support per-repository config
  2021-06-17 17:13 ` [PATCH v4 " Jonathan Tan
  2021-06-17 17:13   ` [PATCH v4 1/5] repository: move global r_f_p_c to repo struct Jonathan Tan
@ 2021-06-17 17:13   ` Jonathan Tan
  2021-06-17 17:13   ` [PATCH v4 3/5] submodule: refrain from filtering GIT_CONFIG_COUNT Jonathan Tan
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-17 17:13 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, newren, Junio C Hamano

Instead of using global variables to store promisor remote information,
store this config in struct repository instead, and add
repository-agnostic non-static functions corresponding to the existing
non-static functions that only work on the_repository.

The actual lazy-fetching of missing objects currently does not work on
repositories other than the_repository, and will still not work after
this commit, so add a BUG message explaining this. A subsequent commit
will remove this limitation.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
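
A minimal sketch of the pattern this patch introduces: promisor config stored per repository and lazily initialized, with a non-NULL pointer acting as the "initialized" flag (replacing the old file-scope static), plus a static inline wrapper that keeps the old the_repository-only API. All demo_* names are hypothetical; configured_remote stands in for what repo_config() would read, and the sketch models neither config parsing nor extensions.partialClone handling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_promisor_remote {
	struct demo_promisor_remote *next;
	char name[32];
};

struct demo_promisor_remote_config {
	struct demo_promisor_remote *promisors;
	struct demo_promisor_remote **promisors_tail;
};

struct demo_repository {
	const char *configured_remote; /* stand-in for the repo's config */
	struct demo_promisor_remote_config *promisor_remote_config;
};

static struct demo_repository demo_the_repository = { "origin", NULL };

/* Append via the tail pointer, as promisor_remote_new() does. */
static void demo_append_remote(struct demo_promisor_remote_config *config,
			       const char *name)
{
	struct demo_promisor_remote *p = calloc(1, sizeof(*p));

	assert(p);
	strncpy(p->name, name, sizeof(p->name) - 1);
	*config->promisors_tail = p;
	config->promisors_tail = &p->next;
}

/* Build the per-repository config on first use only. */
static void demo_promisor_remote_init(struct demo_repository *r)
{
	struct demo_promisor_remote_config *config;

	if (r->promisor_remote_config)
		return;
	config = r->promisor_remote_config = calloc(1, sizeof(*config));
	assert(config);
	config->promisors_tail = &config->promisors;
	if (r->configured_remote)
		demo_append_remote(config, r->configured_remote);
}

static int demo_repo_has_promisor_remote(struct demo_repository *r)
{
	demo_promisor_remote_init(r);
	return r->promisor_remote_config->promisors != NULL;
}

/* Wrapper preserving the old signature for existing callers. */
static inline int demo_has_promisor_remote(void)
{
	return demo_repo_has_promisor_remote(&demo_the_repository);
}

/* Free the list and the config (clear, then FREE_AND_NULL). */
static void demo_repo_clear_promisors(struct demo_repository *r)
{
	struct demo_promisor_remote_config *config = r->promisor_remote_config;

	if (!config)
		return;
	while (config->promisors) {
		struct demo_promisor_remote *p = config->promisors;
		config->promisors = p->next;
		free(p);
	}
	free(config);
	r->promisor_remote_config = NULL;
}
```

Because each repository carries its own pointer, a submodule repository gets an independent list instead of sharing the superproject's.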
 promisor-remote.c | 98 ++++++++++++++++++++++++++---------------------
 promisor-remote.h | 22 +++++++++--
 repository.c      |  6 +++
 repository.h      |  2 +
 4 files changed, 82 insertions(+), 46 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index d24081dc21..1e00e16b0f 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -5,6 +5,11 @@
 #include "transport.h"
 #include "strvec.h"
 
+struct promisor_remote_config {
+	struct promisor_remote *promisors;
+	struct promisor_remote **promisors_tail;
+};
+
 static int fetch_objects(const char *remote_name,
 			 const struct object_id *oids,
 			 int oid_nr)
@@ -35,10 +40,8 @@ static int fetch_objects(const char *remote_name,
 	return finish_command(&child) ? -1 : 0;
 }
 
-static struct promisor_remote *promisors;
-static struct promisor_remote **promisors_tail = &promisors;
-
-static struct promisor_remote *promisor_remote_new(const char *remote_name)
+static struct promisor_remote *promisor_remote_new(struct promisor_remote_config *config,
+						   const char *remote_name)
 {
 	struct promisor_remote *r;
 
@@ -50,18 +53,19 @@ static struct promisor_remote *promisor_remote_new(const char *remote_name)
 
 	FLEX_ALLOC_STR(r, name, remote_name);
 
-	*promisors_tail = r;
-	promisors_tail = &r->next;
+	*config->promisors_tail = r;
+	config->promisors_tail = &r->next;
 
 	return r;
 }
 
-static struct promisor_remote *promisor_remote_lookup(const char *remote_name,
+static struct promisor_remote *promisor_remote_lookup(struct promisor_remote_config *config,
+						      const char *remote_name,
 						      struct promisor_remote **previous)
 {
 	struct promisor_remote *r, *p;
 
-	for (p = NULL, r = promisors; r; p = r, r = r->next)
+	for (p = NULL, r = config->promisors; r; p = r, r = r->next)
 		if (!strcmp(r->name, remote_name)) {
 			if (previous)
 				*previous = p;
@@ -71,7 +75,8 @@ static struct promisor_remote *promisor_remote_lookup(const char *remote_name,
 	return NULL;
 }
 
-static void promisor_remote_move_to_tail(struct promisor_remote *r,
+static void promisor_remote_move_to_tail(struct promisor_remote_config *config,
+					 struct promisor_remote *r,
 					 struct promisor_remote *previous)
 {
 	if (r->next == NULL)
@@ -80,14 +85,15 @@ static void promisor_remote_move_to_tail(struct promisor_remote *r,
 	if (previous)
 		previous->next = r->next;
 	else
-		promisors = r->next ? r->next : r;
+		config->promisors = r->next ? r->next : r;
 	r->next = NULL;
-	*promisors_tail = r;
-	promisors_tail = &r->next;
+	*config->promisors_tail = r;
+	config->promisors_tail = &r->next;
 }
 
 static int promisor_remote_config(const char *var, const char *value, void *data)
 {
+	struct promisor_remote_config *config = data;
 	const char *name;
 	size_t namelen;
 	const char *subkey;
@@ -103,8 +109,8 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 
 		remote_name = xmemdupz(name, namelen);
 
-		if (!promisor_remote_lookup(remote_name, NULL))
-			promisor_remote_new(remote_name);
+		if (!promisor_remote_lookup(config, remote_name, NULL))
+			promisor_remote_new(config, remote_name);
 
 		free(remote_name);
 		return 0;
@@ -113,9 +119,9 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 		struct promisor_remote *r;
 		char *remote_name = xmemdupz(name, namelen);
 
-		r = promisor_remote_lookup(remote_name, NULL);
+		r = promisor_remote_lookup(config, remote_name, NULL);
 		if (!r)
-			r = promisor_remote_new(remote_name);
+			r = promisor_remote_new(config, remote_name);
 
 		free(remote_name);
 
@@ -128,59 +134,63 @@ static int promisor_remote_config(const char *var, const char *value, void *data
 	return 0;
 }
 
-static int initialized;
-
-static void promisor_remote_init(void)
+static void promisor_remote_init(struct repository *r)
 {
-	if (initialized)
+	struct promisor_remote_config *config;
+
+	if (r->promisor_remote_config)
 		return;
-	initialized = 1;
+	config = r->promisor_remote_config =
+		xcalloc(sizeof(*r->promisor_remote_config), 1);
+	config->promisors_tail = &config->promisors;
 
-	git_config(promisor_remote_config, NULL);
+	repo_config(r, promisor_remote_config, config);
 
-	if (the_repository->repository_format_partial_clone) {
+	if (r->repository_format_partial_clone) {
 		struct promisor_remote *o, *previous;
 
-		o = promisor_remote_lookup(the_repository->repository_format_partial_clone,
+		o = promisor_remote_lookup(config,
+					   r->repository_format_partial_clone,
 					   &previous);
 		if (o)
-			promisor_remote_move_to_tail(o, previous);
+			promisor_remote_move_to_tail(config, o, previous);
 		else
-			promisor_remote_new(the_repository->repository_format_partial_clone);
+			promisor_remote_new(config, r->repository_format_partial_clone);
 	}
 }
 
-static void promisor_remote_clear(void)
+void promisor_remote_clear(struct promisor_remote_config *config)
 {
-	while (promisors) {
-		struct promisor_remote *r = promisors;
-		promisors = promisors->next;
+	while (config->promisors) {
+		struct promisor_remote *r = config->promisors;
+		config->promisors = config->promisors->next;
 		free(r);
 	}
 
-	promisors_tail = &promisors;
+	config->promisors_tail = &config->promisors;
 }
 
-void promisor_remote_reinit(void)
+void repo_promisor_remote_reinit(struct repository *r)
 {
-	initialized = 0;
-	promisor_remote_clear();
-	promisor_remote_init();
+	promisor_remote_clear(r->promisor_remote_config);
+	FREE_AND_NULL(r->promisor_remote_config);
+	promisor_remote_init(r);
 }
 
-struct promisor_remote *promisor_remote_find(const char *remote_name)
+struct promisor_remote *repo_promisor_remote_find(struct repository *r,
+						  const char *remote_name)
 {
-	promisor_remote_init();
+	promisor_remote_init(r);
 
 	if (!remote_name)
-		return promisors;
+		return r->promisor_remote_config->promisors;
 
-	return promisor_remote_lookup(remote_name, NULL);
+	return promisor_remote_lookup(r->promisor_remote_config, remote_name, NULL);
 }
 
-int has_promisor_remote(void)
+int repo_has_promisor_remote(struct repository *r)
 {
-	return !!promisor_remote_find(NULL);
+	return !!repo_promisor_remote_find(r, NULL);
 }
 
 static int remove_fetched_oids(struct repository *repo,
@@ -228,9 +238,11 @@ int promisor_remote_get_direct(struct repository *repo,
 	if (oid_nr == 0)
 		return 0;
 
-	promisor_remote_init();
+	promisor_remote_init(repo);
 
-	for (r = promisors; r; r = r->next) {
+	if (repo != the_repository)
+		BUG("only the_repository is supported for now");
+	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
 		if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) {
 			if (remaining_nr == 1)
 				continue;
diff --git a/promisor-remote.h b/promisor-remote.h
index 687210ab87..edc45ab0f5 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -17,9 +17,25 @@ struct promisor_remote {
 	const char name[FLEX_ARRAY];
 };
 
-void promisor_remote_reinit(void);
-struct promisor_remote *promisor_remote_find(const char *remote_name);
-int has_promisor_remote(void);
+void repo_promisor_remote_reinit(struct repository *r);
+static inline void promisor_remote_reinit(void)
+{
+	repo_promisor_remote_reinit(the_repository);
+}
+
+void promisor_remote_clear(struct promisor_remote_config *config);
+
+struct promisor_remote *repo_promisor_remote_find(struct repository *r, const char *remote_name);
+static inline struct promisor_remote *promisor_remote_find(const char *remote_name)
+{
+	return repo_promisor_remote_find(the_repository, remote_name);
+}
+
+int repo_has_promisor_remote(struct repository *r);
+static inline int has_promisor_remote(void)
+{
+	return repo_has_promisor_remote(the_repository);
+}
 
 /*
  * Fetches all requested objects from all promisor remotes, trying them one at
diff --git a/repository.c b/repository.c
index 057a4748a0..b2bf44c6fa 100644
--- a/repository.c
+++ b/repository.c
@@ -11,6 +11,7 @@
 #include "lockfile.h"
 #include "submodule-config.h"
 #include "sparse-index.h"
+#include "promisor-remote.h"
 
 /* The main repository */
 static struct repository the_repo;
@@ -262,6 +263,11 @@ void repo_clear(struct repository *repo)
 		if (repo->index != &the_index)
 			FREE_AND_NULL(repo->index);
 	}
+
+	if (repo->promisor_remote_config) {
+		promisor_remote_clear(repo->promisor_remote_config);
+		FREE_AND_NULL(repo->promisor_remote_config);
+	}
 }
 
 int repo_read_index(struct repository *repo)
diff --git a/repository.h b/repository.h
index 6fb16ed336..3740c93bc0 100644
--- a/repository.h
+++ b/repository.h
@@ -10,6 +10,7 @@ struct lock_file;
 struct pathspec;
 struct raw_object_store;
 struct submodule_cache;
+struct promisor_remote_config;
 
 enum untracked_cache_setting {
 	UNTRACKED_CACHE_UNSET = -1,
@@ -141,6 +142,7 @@ struct repository {
 
 	/* Configurations related to promisor remotes. */
 	char *repository_format_partial_clone;
+	struct promisor_remote_config *promisor_remote_config;
 
 	/* Configurations */
 
-- 
2.32.0.272.g935e593368-goog



* [PATCH v4 3/5] submodule: refrain from filtering GIT_CONFIG_COUNT
  2021-06-17 17:13 ` [PATCH v4 " Jonathan Tan
  2021-06-17 17:13   ` [PATCH v4 1/5] repository: move global r_f_p_c to repo struct Jonathan Tan
  2021-06-17 17:13   ` [PATCH v4 2/5] promisor-remote: support per-repository config Jonathan Tan
@ 2021-06-17 17:13   ` Jonathan Tan
  2021-06-17 17:13   ` [PATCH v4 4/5] run-command: refactor subprocess env preparation Jonathan Tan
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-17 17:13 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, newren, Junio C Hamano

14111fc492 ("git: submodule honor -c credential.* from command line",
2016-03-01) taught Git to pass through the GIT_CONFIG_PARAMETERS
environment variable when invoking a subprocess on behalf of a
submodule. But when d8d77153ea ("config: allow specifying config entries
via envvar pairs", 2021-01-15) introduced support for GIT_CONFIG_COUNT
(and its associated GIT_CONFIG_KEY_? and GIT_CONFIG_VALUE_?), the
subprocess mechanism wasn't updated to also pass through these
variables.

Since they are conceptually the same (d8d77153ea was written to address
a shortcoming of GIT_CONFIG_PARAMETERS), update the submodule subprocess
mechanism to also pass through GIT_CONFIG_COUNT.
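
A minimal standalone sketch of the filtering loop changed below. It is modeled on prepare_submodule_repo_env_no_git_dir() and is not taken from Git. It assumes the run-command convention, stated in the run-command.h comment later in this series, that a bare variable name pushed into a child's env array unsets that variable in the child, while anything not pushed is inherited. sample_local_repo_env is an illustrative subset of Git's local_repo_env, and names_to_unset() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of Git's local_repo_env (not the full real list). */
static const char *const sample_local_repo_env[] = {
	"GIT_ALTERNATE_OBJECT_DIRECTORIES",
	"GIT_CONFIG",
	"GIT_CONFIG_PARAMETERS",
	"GIT_CONFIG_COUNT",
	"GIT_OBJECT_DIRECTORY",
	"GIT_DIR",
	"GIT_WORK_TREE",
	NULL
};

/*
 * Hypothetical helper: copy into `out` every name the child should have
 * unset. GIT_CONFIG_PARAMETERS and GIT_CONFIG_COUNT are skipped, so "-c"
 * settings carried in either encoding are inherited by the submodule
 * process. `out` must have room for every name in `vars`.
 * Returns the number of names written.
 */
static size_t names_to_unset(const char *const *vars, const char **out)
{
	size_t n = 0;

	assert(out != NULL);
	for (; *vars; vars++) {
		if (strcmp(*vars, "GIT_CONFIG_PARAMETERS") &&
		    strcmp(*vars, "GIT_CONFIG_COUNT"))
			out[n++] = *vars;
	}
	return n;
}
```

With the sample list, five of the seven names are returned; both config carriers are left alone and therefore reach the child.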

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index 0b1d9c1dde..f09031e397 100644
--- a/submodule.c
+++ b/submodule.c
@@ -489,7 +489,8 @@ static void prepare_submodule_repo_env_no_git_dir(struct strvec *out)
 	const char * const *var;
 
 	for (var = local_repo_env; *var; var++) {
-		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
+		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT) &&
+		    strcmp(*var, CONFIG_COUNT_ENVIRONMENT))
 			strvec_push(out, *var);
 	}
 }
-- 
2.32.0.272.g935e593368-goog



* [PATCH v4 4/5] run-command: refactor subprocess env preparation
  2021-06-17 17:13 ` [PATCH v4 " Jonathan Tan
                     ` (2 preceding siblings ...)
  2021-06-17 17:13   ` [PATCH v4 3/5] submodule: refrain from filtering GIT_CONFIG_COUNT Jonathan Tan
@ 2021-06-17 17:13   ` Jonathan Tan
  2021-06-17 17:13   ` [PATCH v4 5/5] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
  2021-06-19 20:01   ` [PATCH v4 0/5] First steps towards partial clone submodules Elijah Newren
  5 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-17 17:13 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, newren, Junio C Hamano

submodule.c has functionality that prepares the environment for running
a subprocess in a new repo. The lazy-fetching code (used in partial
clones) will need this in a subsequent commit, so move it to a more
central location.
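
As a rough illustration of what the moved helper, prepare_other_repo_env(), produces and how a child interprets it, the sketch below rebuilds the same shape of env array and then applies it. Assumptions: struct env_list is a hypothetical fixed-size stand-in for struct strvec, prepare_env_sketch() and apply_env() are hypothetical names, and apply_env() modifies the current process only so the effect can be observed (in Git, the list is applied to the spawned child, not the parent).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ENV_MAX 32

/* Hypothetical fixed-capacity stand-in for Git's struct strvec. */
struct env_list {
	char *items[ENV_MAX];
	size_t nr;
};

static void env_push(struct env_list *env, const char *entry)
{
	char *copy = strdup(entry);

	assert(env->nr < ENV_MAX && copy != NULL);
	env->items[env->nr++] = copy;
}

/*
 * Same shape as prepare_other_repo_env(): every repo-local name except
 * the two config carriers (bare name = "unset in the child"), followed
 * by GIT_DIR=<new_git_dir>.
 */
static void prepare_env_sketch(struct env_list *env,
			       const char *const *repo_vars,
			       const char *new_git_dir)
{
	char buf[4096];

	for (; *repo_vars; repo_vars++)
		if (strcmp(*repo_vars, "GIT_CONFIG_PARAMETERS") &&
		    strcmp(*repo_vars, "GIT_CONFIG_COUNT"))
			env_push(env, *repo_vars);
	snprintf(buf, sizeof(buf), "GIT_DIR=%s", new_git_dir);
	env_push(env, buf);
}

/* Apply entries in order: "NAME=VALUE" sets, bare "NAME" unsets. */
static void apply_env(const struct env_list *env)
{
	for (size_t i = 0; i < env->nr; i++) {
		const char *eq = strchr(env->items[i], '=');
		char name[256];
		size_t len;

		if (!eq) {
			unsetenv(env->items[i]);
			continue;
		}
		len = (size_t)(eq - env->items[i]);
		assert(len < sizeof(name));
		memcpy(name, env->items[i], len);
		name[len] = '\0';
		setenv(name, eq + 1, 1);
	}
}

static void env_clear(struct env_list *env)
{
	for (size_t i = 0; i < env->nr; i++)
		free(env->items[i]);
	env->nr = 0;
}
```

Because GIT_DIR appears both as a bare name and as GIT_DIR=<dir>, order matters in this sketch: the later assignment wins, leaving GIT_DIR pointed at the other repository.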

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 run-command.c | 12 ++++++++++++
 run-command.h | 10 ++++++++++
 submodule.c   | 18 ++----------------
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/run-command.c b/run-command.c
index be6bc128cd..549a94a6a4 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1892,3 +1892,15 @@ int run_auto_maintenance(int quiet)
 
 	return run_command(&maint);
 }
+
+void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir)
+{
+	const char * const *var;
+
+	for (var = local_repo_env; *var; var++) {
+		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT) &&
+		    strcmp(*var, CONFIG_COUNT_ENVIRONMENT))
+			strvec_push(env_array, *var);
+	}
+	strvec_pushf(env_array, "%s=%s", GIT_DIR_ENVIRONMENT, new_git_dir);
+}
diff --git a/run-command.h b/run-command.h
index d08414a92e..22132536a9 100644
--- a/run-command.h
+++ b/run-command.h
@@ -483,4 +483,14 @@ int run_processes_parallel_tr2(int n, get_next_task_fn, start_failure_fn,
 			       task_finished_fn, void *pp_cb,
 			       const char *tr2_category, const char *tr2_label);
 
+/**
+ * Convenience function which prepares env_array for a command to be run in a
+ * new repo. This adds all GIT_* environment variables to env_array with the
+ * exception of GIT_CONFIG_PARAMETERS and GIT_CONFIG_COUNT (which cause the
+ * corresponding environment variables to be unset in the subprocess) and adds
+ * an environment variable pointing to new_git_dir. See local_repo_env in
+ * cache.h for more information.
+ */
+void prepare_other_repo_env(struct strvec *env_array, const char *new_git_dir);
+
 #endif
diff --git a/submodule.c b/submodule.c
index f09031e397..8e611fe1db 100644
--- a/submodule.c
+++ b/submodule.c
@@ -484,28 +484,14 @@ static void print_submodule_diff_summary(struct repository *r, struct rev_info *
 	strbuf_release(&sb);
 }
 
-static void prepare_submodule_repo_env_no_git_dir(struct strvec *out)
-{
-	const char * const *var;
-
-	for (var = local_repo_env; *var; var++) {
-		if (strcmp(*var, CONFIG_DATA_ENVIRONMENT) &&
-		    strcmp(*var, CONFIG_COUNT_ENVIRONMENT))
-			strvec_push(out, *var);
-	}
-}
-
 void prepare_submodule_repo_env(struct strvec *out)
 {
-	prepare_submodule_repo_env_no_git_dir(out);
-	strvec_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
-		     DEFAULT_GIT_DIR_ENVIRONMENT);
+	prepare_other_repo_env(out, DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
 static void prepare_submodule_repo_env_in_gitdir(struct strvec *out)
 {
-	prepare_submodule_repo_env_no_git_dir(out);
-	strvec_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+	prepare_other_repo_env(out, ".");
 }
 
 /*
-- 
2.32.0.272.g935e593368-goog



* [PATCH v4 5/5] promisor-remote: teach lazy-fetch in any repo
  2021-06-17 17:13 ` [PATCH v4 " Jonathan Tan
                     ` (3 preceding siblings ...)
  2021-06-17 17:13   ` [PATCH v4 4/5] run-command: refactor subprocess env preparation Jonathan Tan
@ 2021-06-17 17:13   ` Jonathan Tan
  2021-06-19 20:01   ` [PATCH v4 0/5] First steps towards partial clone submodules Elijah Newren
  5 siblings, 0 replies; 77+ messages in thread
From: Jonathan Tan @ 2021-06-17 17:13 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, newren, Junio C Hamano

This is one step towards supporting partial clone submodules.

Even after this patch, we will still lack support for partial clone
submodules, primarily because a lot of Git code that accesses submodule
objects does so by adding their object stores as alternates, meaning
that any lazy fetches that would occur in the submodule would be done
based on the config of the superproject, not of the submodule. This also
prevents testing of the functionality in this patch by user-facing
commands. So for now, test this mechanism using a test helper.
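
A toy model (not Git code) of why the alternates approach defeats per-repository lazy fetching: the remote and gitdir used for the fetch come from whichever repository the lookup is performed in. struct repo_model, the_repo_model and describe_lazy_fetch() are hypothetical. The command strings only sketch the shape of the child process; they omit the remaining fetch arguments that fetch_objects() passes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified model of a repository. */
struct repo_model {
	const char *gitdir;
	const char *promisor_remote;  /* read from this repo's own config */
};

static struct repo_model superproject = { ".git", "origin" };
static struct repo_model *const the_repo_model = &superproject;

/*
 * Describe the lazy fetch that a missing-object lookup in `repo` would
 * trigger, following the shape of fetch_objects() after this patch:
 * the remote comes from `repo`'s own config, and GIT_DIR is overridden
 * only when `repo` is not the_repository.
 */
static void describe_lazy_fetch(const struct repo_model *repo,
				char *buf, size_t len)
{
	assert(repo->promisor_remote != NULL);
	if (repo != the_repo_model)
		snprintf(buf, len, "GIT_DIR=%s git fetch %s",
			 repo->gitdir, repo->promisor_remote);
	else
		snprintf(buf, len, "git fetch %s", repo->promisor_remote);
}
```

When a submodule is opened as its own repository, its gitdir and remote are used. If its objects are instead reached through the superproject's alternates, the lookup is attributed to the superproject, so the second branch applies and the superproject's remote is used.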

Besides that, some code still calls wrapper functions like
has_promisor_remote(), which only operate on the_repository. Those
callers will need to be checked to see if they could use the non-wrapper
functions instead (and thus support any repository, not just
the_repository).
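
The wrapper/non-wrapper split added to promisor-remote.h earlier in this series follows a common pattern, sketched here with hypothetical *_model names: the repo_-prefixed function takes an explicit repository, while the argument-less wrapper is hard-wired to the_repository. A caller that uses the wrapper therefore cannot be pointed at a submodule, which is why such call sites need auditing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; the real struct repository is much larger. */
struct repo_model {
	int has_promisor;
};

static struct repo_model main_repo = { 1 };
static struct repo_model *the_repo_model = &main_repo;

/* Non-wrapper form: answers for whichever repository the caller passes. */
static int repo_has_promisor_remote_model(const struct repo_model *r)
{
	assert(r != NULL);
	return !!r->has_promisor;
}

/*
 * Wrapper form, mirroring the static inline wrappers in promisor-remote.h:
 * it can only ever answer for the_repository.
 */
static inline int has_promisor_remote_model(void)
{
	return repo_has_promisor_remote_model(the_repo_model);
}
```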

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Makefile                      |  1 +
 object-file.c                 |  7 ++----
 promisor-remote.c             |  9 ++++----
 t/helper/test-partial-clone.c | 43 +++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c          |  1 +
 t/helper/test-tool.h          |  1 +
 t/t0410-partial-clone.sh      | 23 +++++++++++++++++++
 7 files changed, 76 insertions(+), 9 deletions(-)
 create mode 100644 t/helper/test-partial-clone.c

diff --git a/Makefile b/Makefile
index c3565fc0f8..f6653bcd5e 100644
--- a/Makefile
+++ b/Makefile
@@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-oidmap.o
 TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-parse-options.o
 TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
+TEST_BUILTINS_OBJS += test-partial-clone.o
 TEST_BUILTINS_OBJS += test-path-utils.o
 TEST_BUILTINS_OBJS += test-pcre2-config.o
 TEST_BUILTINS_OBJS += test-pkt-line.o
diff --git a/object-file.c b/object-file.c
index f233b440b2..ebf273e9e7 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1570,15 +1570,12 @@ static int do_oid_object_info_extended(struct repository *r,
 		}
 
 		/* Check if it is a missing object */
-		if (fetch_if_missing && has_promisor_remote() &&
-		    !already_retried && r == the_repository &&
+		if (fetch_if_missing && repo_has_promisor_remote(r) &&
+		    !already_retried &&
 		    !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
 			/*
 			 * TODO Investigate checking promisor_remote_get_direct()
 			 * TODO return value and stopping on error here.
-			 * TODO Pass a repository struct through
-			 * promisor_remote_get_direct(), such that arbitrary
-			 * repositories work.
 			 */
 			promisor_remote_get_direct(r, real, 1);
 			already_retried = 1;
diff --git a/promisor-remote.c b/promisor-remote.c
index 1e00e16b0f..c088dcbff3 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -10,7 +10,8 @@ struct promisor_remote_config {
 	struct promisor_remote **promisors_tail;
 };
 
-static int fetch_objects(const char *remote_name,
+static int fetch_objects(struct repository *repo,
+			 const char *remote_name,
 			 const struct object_id *oids,
 			 int oid_nr)
 {
@@ -20,6 +21,8 @@ static int fetch_objects(const char *remote_name,
 
 	child.git_cmd = 1;
 	child.in = -1;
+	if (repo != the_repository)
+		prepare_other_repo_env(&child.env_array, repo->gitdir);
 	strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
 		     "fetch", remote_name, "--no-tags",
 		     "--no-write-fetch-head", "--recurse-submodules=no",
@@ -240,10 +243,8 @@ int promisor_remote_get_direct(struct repository *repo,
 
 	promisor_remote_init(repo);
 
-	if (repo != the_repository)
-		BUG("only the_repository is supported for now");
 	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
-		if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) {
+		if (fetch_objects(repo, r->name, remaining_oids, remaining_nr) < 0) {
 			if (remaining_nr == 1)
 				continue;
 			remaining_nr = remove_fetched_oids(repo, &remaining_oids,
diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c
new file mode 100644
index 0000000000..3f102cfddd
--- /dev/null
+++ b/t/helper/test-partial-clone.c
@@ -0,0 +1,43 @@
+#include "cache.h"
+#include "test-tool.h"
+#include "repository.h"
+#include "object-store.h"
+
+/*
+ * Prints the size of the object corresponding to the given hash in a specific
+ * gitdir. This is similar to "git -C gitdir cat-file -s", except that this
+ * exercises the code that accesses the object of an arbitrary repository that
+ * is not the_repository. ("git -C gitdir" makes it so that the_repository is
+ * the one in gitdir.)
+ */
+static void object_info(const char *gitdir, const char *oid_hex)
+{
+	struct repository r;
+	struct object_id oid;
+	unsigned long size;
+	struct object_info oi = {.sizep = &size};
+	const char *p;
+
+	if (repo_init(&r, gitdir, NULL))
+		die("could not init repo");
+	if (parse_oid_hex(oid_hex, &oid, &p))
+		die("could not parse oid");
+	if (oid_object_info_extended(&r, &oid, &oi, 0))
+		die("could not obtain object info");
+	printf("%d\n", (int) size);
+}
+
+int cmd__partial_clone(int argc, const char **argv)
+{
+	setup_git_directory();
+
+	if (argc < 4)
+		die("too few arguments");
+
+	if (!strcmp(argv[1], "object-info"))
+		object_info(argv[2], argv[3]);
+	else
+		die("invalid argument '%s'", argv[1]);
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index c5bd0c6d4c..b21e8f1519 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -46,6 +46,7 @@ static struct test_cmd cmds[] = {
 	{ "online-cpus", cmd__online_cpus },
 	{ "parse-options", cmd__parse_options },
 	{ "parse-pathspec-file", cmd__parse_pathspec_file },
+	{ "partial-clone", cmd__partial_clone },
 	{ "path-utils", cmd__path_utils },
 	{ "pcre2-config", cmd__pcre2_config },
 	{ "pkt-line", cmd__pkt_line },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e8069a3b22..f845ced4b3 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -35,6 +35,7 @@ int cmd__oidmap(int argc, const char **argv);
 int cmd__online_cpus(int argc, const char **argv);
 int cmd__parse_options(int argc, const char **argv);
 int cmd__parse_pathspec_file(int argc, const char** argv);
+int cmd__partial_clone(int argc, const char **argv);
 int cmd__path_utils(int argc, const char **argv);
 int cmd__pcre2_config(int argc, const char **argv);
 int cmd__pkt_line(int argc, const char **argv);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 584a039b85..a211a66c67 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -604,6 +604,29 @@ test_expect_success 'do not fetch when checking existence of tree we construct o
 	git -C repo cherry-pick side1
 '
 
+test_expect_success 'lazy-fetch when accessing object not in the_repository' '
+	rm -rf full partial.git &&
+	test_create_repo full &&
+	test_commit -C full create-a-file file.txt &&
+
+	test_config -C full uploadpack.allowfilter 1 &&
+	test_config -C full uploadpack.allowanysha1inwant 1 &&
+	git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git &&
+	FILE_HASH=$(git -C full rev-parse HEAD:file.txt) &&
+
+	# Sanity check that the file is missing
+	git -C partial.git rev-list --objects --missing=print HEAD >out &&
+	grep "[?]$FILE_HASH" out &&
+
+	git -C full cat-file -s "$FILE_HASH" >expect &&
+	test-tool partial-clone object-info partial.git "$FILE_HASH" >actual &&
+	test_cmp expect actual &&
+
+	# Sanity check that the file is now present
+	git -C partial.git rev-list --objects --missing=print HEAD >out &&
+	! grep "[?]$FILE_HASH" out
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.32.0.272.g935e593368-goog



* Re: [PATCH v4 0/5] First steps towards partial clone submodules
  2021-06-17 17:13 ` [PATCH v4 " Jonathan Tan
                     ` (4 preceding siblings ...)
  2021-06-17 17:13   ` [PATCH v4 5/5] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
@ 2021-06-19 20:01   ` Elijah Newren
  5 siblings, 0 replies; 77+ messages in thread
From: Elijah Newren @ 2021-06-19 20:01 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List

On Thu, Jun 17, 2021 at 10:13 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Quoting from [1]:
>
> > I'm happy with Jonathan and Peff's responses on patch 3; as I
> > mentioned above I just didn't understand the original code before
> > Jonathan's changes.  (Perhaps some comments could be added to clarify
> > that code area, but again that's clarifying the code that existed
> > before Jonathan's patch so it doesn't need to be part of his series.)
> > So that only leaves my nitpicks on patches 1 & 4; otherwise the series
> > looks good to me.
>
> I've addressed Elijah's comments on patches 1 and 4.

Yep, patches 1, 2, 4, and 5 are Reviewed-by me.  While I looked over
Patch 3, I made Peff explain it to me, so he's the one who reviewed
that one.  ;-)


end of thread, other threads:[~2021-06-19 20:01 UTC | newest]

Thread overview: 77+ messages
2021-06-01 21:34 [PATCH 0/4] First steps towards partial clone submodules Jonathan Tan
2021-06-01 21:34 ` [PATCH 1/4] promisor-remote: read partialClone config here Jonathan Tan
2021-06-04 19:56   ` Taylor Blau
2021-06-05  1:38     ` Jonathan Tan
2021-06-07 22:41   ` Emily Shaffer
2021-06-01 21:34 ` [PATCH 2/4] promisor-remote: support per-repository config Jonathan Tan
2021-06-04 20:09   ` Taylor Blau
2021-06-05  1:43     ` Jonathan Tan
2021-06-04 21:21   ` Elijah Newren
2021-06-05  1:54     ` Jonathan Tan
2021-06-08  0:48   ` Emily Shaffer
2021-06-01 21:34 ` [PATCH 3/4] run-command: move envvar-resetting function Jonathan Tan
2021-06-04 20:19   ` Taylor Blau
2021-06-05  1:57     ` Jonathan Tan
2021-06-08  0:54   ` Emily Shaffer
2021-06-01 21:34 ` [PATCH 4/4] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
2021-06-04 21:25   ` Taylor Blau
2021-06-05  2:11     ` Jonathan Tan
2021-06-04 21:35   ` Elijah Newren
2021-06-05  2:16     ` Jonathan Tan
2021-06-05  3:48     ` Elijah Newren
2021-06-05  0:22   ` Elijah Newren
2021-06-05  2:16     ` Jonathan Tan
2021-06-08  1:41   ` Emily Shaffer
2021-06-09  4:52     ` Jonathan Tan
2021-06-08  0:25 ` [PATCH v2 0/4] First steps towards partial clone submodules Jonathan Tan
2021-06-08  0:25   ` [PATCH v2 1/4] promisor-remote: read partialClone config here Jonathan Tan
2021-06-08  3:18     ` Junio C Hamano
2021-06-09  4:26       ` Jonathan Tan
2021-06-09  9:30         ` Junio C Hamano
2021-06-09 17:16           ` Jonathan Tan
2021-06-08 17:28     ` Elijah Newren
2021-06-09  4:44       ` Jonathan Tan
2021-06-09  5:34         ` Elijah Newren
2021-06-10 17:25           ` Jonathan Tan
2021-06-08  0:25   ` [PATCH v2 2/4] promisor-remote: support per-repository config Jonathan Tan
2021-06-08  3:30     ` Junio C Hamano
2021-06-09  4:29       ` Jonathan Tan
2021-06-08  0:25   ` [PATCH v2 3/4] run-command: move envvar-resetting function Jonathan Tan
2021-06-08  4:14     ` Junio C Hamano
2021-06-09  4:32       ` Jonathan Tan
2021-06-09  5:28         ` Junio C Hamano
2021-06-09 18:15           ` Jonathan Tan
2021-06-08  0:25   ` [PATCH v2 4/4] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
2021-06-08  4:33     ` Junio C Hamano
2021-06-09  4:39       ` Jonathan Tan
2021-06-09  5:33         ` Junio C Hamano
2021-06-09 18:20           ` Jonathan Tan
2021-06-10  1:26             ` Junio C Hamano
2021-06-08 17:42     ` Elijah Newren
2021-06-09  4:46       ` Jonathan Tan
2021-06-08 17:50   ` [PATCH v2 0/4] First steps towards partial clone submodules Elijah Newren
2021-06-08 23:42     ` Junio C Hamano
2021-06-09  0:07       ` Elijah Newren
2021-06-09  0:18         ` Junio C Hamano
2021-06-09  4:58     ` Jonathan Tan
2021-06-08  1:44 ` [PATCH " Emily Shaffer
2021-06-10 17:35 ` [PATCH v3 0/5] " Jonathan Tan
2021-06-10 17:35   ` [PATCH v3 1/5] repository: move global r_f_p_c to repo struct Jonathan Tan
2021-06-10 20:47     ` Elijah Newren
2021-06-10 17:35   ` [PATCH v3 2/5] promisor-remote: support per-repository config Jonathan Tan
2021-06-10 17:35   ` [PATCH v3 3/5] submodule: refrain from filtering GIT_CONFIG_COUNT Jonathan Tan
2021-06-10 21:13     ` Elijah Newren
2021-06-10 21:51       ` Jeff King
2021-06-11 17:02         ` Jonathan Tan
2021-06-10 17:35   ` [PATCH v3 4/5] run-command: refactor subprocess env preparation Jonathan Tan
2021-06-10 21:21     ` Elijah Newren
2021-06-10 17:35   ` [PATCH v3 5/5] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
2021-06-10 21:29   ` [PATCH v3 0/5] First steps towards partial clone submodules Elijah Newren
2021-06-15 21:22     ` Elijah Newren
2021-06-17 17:13 ` [PATCH v4 " Jonathan Tan
2021-06-17 17:13   ` [PATCH v4 1/5] repository: move global r_f_p_c to repo struct Jonathan Tan
2021-06-17 17:13   ` [PATCH v4 2/5] promisor-remote: support per-repository config Jonathan Tan
2021-06-17 17:13   ` [PATCH v4 3/5] submodule: refrain from filtering GIT_CONFIG_COUNT Jonathan Tan
2021-06-17 17:13   ` [PATCH v4 4/5] run-command: refactor subprocess env preparation Jonathan Tan
2021-06-17 17:13   ` [PATCH v4 5/5] promisor-remote: teach lazy-fetch in any repo Jonathan Tan
2021-06-19 20:01   ` [PATCH v4 0/5] First steps towards partial clone submodules Elijah Newren
