git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules
@ 2022-02-10  4:41 Glen Choo
  2022-02-10  4:41 ` [PATCH 1/8] submodule: inline submodule_commits() into caller Glen Choo
                   ` (9 more replies)
  0 siblings, 10 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-10  4:41 UTC
  To: git; +Cc: Glen Choo, Jonathan Tan

= Background

When fetching submodule commits, "git fetch --recurse-submodules" only
considers populated submodules, and not all of the submodules in
$GIT_DIR/modules as one might expect. As a result, "git fetch
--recurse-submodules" behaves differently based on which commit is
checked out.

This can be a problem, for instance, if the user has a branch with
submodules and a branch without:

  # the submodules were initialized at some point in history...
  git checkout -b branch-with-submodules origin/branch-with-submodules
  git submodule update --init

  # later down the road...
  git checkout --recurse-submodules branch-without-submodules
  # no submodules are fetched!
  git fetch --recurse-submodules
  # if origin/branch-with-submodules has new submodule commits, this
  # checkout will fail because we never fetched the submodule
  git checkout --recurse-submodules branch-with-submodules

This series makes "git fetch" fetch the right submodules regardless of
which commit is checked out, as long as the submodule has already been
cloned. In particular, "git fetch" learns to:

1. read submodules from the relevant superproject commit instead of
   the file system
2. fetch all changed submodules, even if they are not populated
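The change in which submodules get fetched can be modelled as a small predicate. This is a minimal sketch under stated assumptions: struct toy_submodule and the should_fetch_* functions are hypothetical names, not git's code, and the model ignores the recursion mode (yes/on-demand/no) and the actual commit checks.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a submodule as seen by "git fetch
 * --recurse-submodules". Illustrative only; not from git's source.
 */
struct toy_submodule {
	const char *name;
	bool populated; /* checked out in the superproject's worktree */
	bool cloned;    /* a repository exists under $GIT_DIR/modules */
	bool changed;   /* fetched superproject commits point to new commits */
};

/* Before the series: only populated submodules are considered. */
static bool should_fetch_before(const struct toy_submodule *s)
{
	return s->populated;
}

/*
 * After the series: populated submodules are still considered, and a
 * changed submodule is fetched even when it is not populated, provided
 * it has already been cloned. A never-cloned submodule is still skipped.
 */
static bool should_fetch_after(const struct toy_submodule *s)
{
	return s->populated || (s->changed && s->cloned);
}
```

In the branch-switching scenario above, the submodule is cloned and changed but not populated, so only the second predicate selects it.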

= Patch organization

- Patches 1-3 teach "git fetch" to read .gitmodules from the
  superproject commit.
- Patches 4-5 are quality-of-life improvements to the test suite that
  make it easier to write the tests in patch 7.
- Patch 6 separates the steps of "finding which submodules to fetch" and
  "fetching the submodules", making it easier to tell "git fetch" to
  fetch unpopulated submodules.
- Patch 7 teaches "git fetch" to fetch changed, unpopulated submodules
  in addition to populated submodules.
- Patch 8 is an optional bugfix + cleanup of the "git fetch" code that
  removes the last caller of the deprecated "add_submodule_odb()".

= Future work

Even with this series, there is no guarantee that "git fetch" will fetch
every necessary submodule commit: a superproject commit can introduce
new submodules, and since those submodules have not been cloned, "git
fetch" cannot fetch their commits yet. This series should still get us
closer to that goal because "git fetch" can now read submodules from the
superproject commit, which is a necessary precursor to figuring out
whether to clone submodules from superproject commits.

Glen Choo (8):
  submodule: inline submodule_commits() into caller
  submodule: store new submodule commits oid_array in a struct
  submodule: make static functions read submodules from commits
  t5526: introduce test helper to assert on fetches
  t5526: use grep to assert on fetches
  submodule: extract get_fetch_task()
  fetch: fetch unpopulated, changed submodules
  submodule: fix bug and remove add_submodule_odb()

 Documentation/fetch-options.txt |  26 ++-
 Documentation/git-fetch.txt     |  10 +-
 submodule.c                     | 316 ++++++++++++++++----------
 submodule.h                     |   9 +-
 t/t5526-fetch-submodules.sh     | 386 ++++++++++++++++++++++++--------
 5 files changed, 524 insertions(+), 223 deletions(-)


base-commit: 679e3693aba0c17af60c031f7eef68f2296b8dad
-- 
2.33.GIT



* [PATCH 1/8] submodule: inline submodule_commits() into caller
  2022-02-10  4:41 [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
@ 2022-02-10  4:41 ` Glen Choo
  2022-02-10  4:41 ` [PATCH 2/8] submodule: store new submodule commits oid_array in a struct Glen Choo
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-10  4:41 UTC
  To: git; +Cc: Glen Choo, Jonathan Tan

When collecting the string_list of changed submodule names, the new
submodule commits are stored in the string_list_item.util as an
oid_array. A subsequent commit will replace the oid_array with a struct
that has more information.

Prepare for this change by inlining submodule_commits() (which inserts
into the string_list and initializes the string_list_item.util) into its
only caller. This simplifies the code and makes it easier for the caller
to add information to the string_list_item.util.
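The inlined logic is an "insert-or-get, then lazily allocate util" pattern. A minimal self-contained sketch follows; toy_list, toy_oid_array and record_changed_commit() are hypothetical stand-ins for git's string_list, oid_array and the loop body in collect_changed_submodules_cb(). For brevity the toy list uses a linear search, whereas git's string_list keeps its items sorted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for git's oid_array and string_list. */
struct toy_oid_array {
	const char **oids; /* borrowed pointers; callers keep the strings alive */
	size_t nr, alloc;
};

struct toy_item {
	char *string;
	void *util; /* NULL until the caller attaches data */
};

struct toy_list {
	struct toy_item *items;
	size_t nr, alloc;
};

static char *toy_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);
	assert(copy);
	return memcpy(copy, s, len);
}

/* Return the existing item for 'name', or append one whose util is NULL. */
static struct toy_item *toy_list_insert(struct toy_list *list, const char *name)
{
	for (size_t i = 0; i < list->nr; i++)
		if (!strcmp(list->items[i].string, name))
			return &list->items[i];
	if (list->nr == list->alloc) {
		list->alloc = list->alloc ? 2 * list->alloc : 4;
		list->items = realloc(list->items, list->alloc * sizeof(*list->items));
		assert(list->items);
	}
	list->items[list->nr].string = toy_strdup(name);
	list->items[list->nr].util = NULL;
	return &list->items[list->nr++];
}

static void toy_oid_array_append(struct toy_oid_array *a, const char *oid)
{
	if (a->nr == a->alloc) {
		a->alloc = a->alloc ? 2 * a->alloc : 4;
		a->oids = realloc(a->oids, a->alloc * sizeof(*a->oids));
		assert(a->oids);
	}
	a->oids[a->nr++] = oid;
}

/* The caller-side pattern: insert-or-get, then lazily allocate util. */
static void record_changed_commit(struct toy_list *changed, const char *name,
				  const char *oid)
{
	struct toy_item *item = toy_list_insert(changed, name);

	if (!item->util)
		item->util = calloc(1, sizeof(struct toy_oid_array));
	assert(item->util);
	toy_oid_array_append(item->util, oid);
}
```

Because the allocation now sits in the caller, swapping the oid_array for a richer struct only touches this one spot.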

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/submodule.c b/submodule.c
index 5ace18a7d9..49f9dc5d23 100644
--- a/submodule.c
+++ b/submodule.c
@@ -782,19 +782,6 @@ const struct submodule *submodule_from_ce(const struct cache_entry *ce)
 	return submodule_from_path(the_repository, null_oid(), ce->name);
 }
 
-static struct oid_array *submodule_commits(struct string_list *submodules,
-					   const char *name)
-{
-	struct string_list_item *item;
-
-	item = string_list_insert(submodules, name);
-	if (item->util)
-		return (struct oid_array *) item->util;
-
-	/* NEEDSWORK: should we have oid_array_init()? */
-	item->util = xcalloc(1, sizeof(struct oid_array));
-	return (struct oid_array *) item->util;
-}
 
 struct collect_changed_submodules_cb_data {
 	struct repository *repo;
@@ -830,9 +817,9 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
-		struct oid_array *commits;
 		const struct submodule *submodule;
 		const char *name;
+		struct string_list_item *item;
 
 		if (!S_ISGITLINK(p->two->mode))
 			continue;
@@ -859,8 +846,11 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 		if (!name)
 			continue;
 
-		commits = submodule_commits(changed, name);
-		oid_array_append(commits, &p->two->oid);
+		item = string_list_insert(changed, name);
+		if (!item->util)
+			/* NEEDSWORK: should we have oid_array_init()? */
+			item->util = xcalloc(1, sizeof(struct oid_array));
+		oid_array_append(item->util, &p->two->oid);
 	}
 }
 
-- 
2.33.GIT



* [PATCH 2/8] submodule: store new submodule commits oid_array in a struct
  2022-02-10  4:41 [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
  2022-02-10  4:41 ` [PATCH 1/8] submodule: inline submodule_commits() into caller Glen Choo
@ 2022-02-10  4:41 ` Glen Choo
  2022-02-10 19:00   ` Jonathan Tan
  2022-02-10  4:41 ` [PATCH 3/8] submodule: make static functions read submodules from commits Glen Choo
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-10  4:41 UTC
  To: git; +Cc: Glen Choo, Jonathan Tan

This commit prepares for a future commit that will teach `git fetch
--recurse-submodules` how to fetch submodules that are present in
<gitdir>/modules, but are not populated. To do this, we need to store
more information about the changed submodule so that we can read the
submodule configuration from the superproject commit instead of the
filesystem.

Refactor the .util member of the changed submodules string_list items to
hold a struct instead of an oid_array. This struct only holds the
new_commits oid_array for now; more information will be added later.
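The ownership split can be sketched with toy types. This is an illustration under assumptions, not git's code: toy_changed_submodule_data mirrors the shape of changed_submodule_data, and toy_cs_data_clear() frees only what the wrapper owns. In the patch, free_submodules_data() calls changed_submodule_data_clear() on each item and then string_list_clear(submodules, 1), whose free_util argument frees the util pointers themselves; the toy leaves freeing the wrapper to its owner in the same way.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for git's oid_array. */
struct toy_oid_array {
	const char **oids;
	size_t nr, alloc;
};

static void toy_oid_array_append(struct toy_oid_array *a, const char *oid)
{
	if (a->nr == a->alloc) {
		a->alloc = a->alloc ? 2 * a->alloc : 4;
		a->oids = realloc(a->oids, a->alloc * sizeof(*a->oids));
		assert(a->oids);
	}
	a->oids[a->nr++] = oid;
}

static void toy_oid_array_clear(struct toy_oid_array *a)
{
	free(a->oids);
	a->oids = NULL;
	a->nr = a->alloc = 0;
}

/*
 * Same shape as changed_submodule_data in the patch: the list item's
 * util points at this wrapper, so later patches can add fields next
 * to new_commits.
 */
struct toy_changed_submodule_data {
	struct toy_oid_array *new_commits;
};

static struct toy_changed_submodule_data *toy_cs_data_new(void)
{
	struct toy_changed_submodule_data *cs = calloc(1, sizeof(*cs));
	assert(cs);
	cs->new_commits = calloc(1, sizeof(*cs->new_commits));
	assert(cs->new_commits);
	return cs;
}

/* Release what the wrapper owns; the wrapper itself is freed by its owner. */
static void toy_cs_data_clear(struct toy_changed_submodule_data *cs)
{
	toy_oid_array_clear(cs->new_commits);
	free(cs->new_commits);
	cs->new_commits = NULL;
}
```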

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 60 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/submodule.c b/submodule.c
index 49f9dc5d23..e2405c9f15 100644
--- a/submodule.c
+++ b/submodule.c
@@ -806,6 +806,21 @@ static const char *default_name_or_path(const char *path_or_name)
 	return path_or_name;
 }
 
+/*
+ * Holds relevant information for a changed submodule. Used as the .util
+ * member of the changed submodule string_list_item.
+ */
+struct changed_submodule_data {
+	/* The submodule commits that have changed in the rev walk. */
+	struct oid_array *new_commits;
+};
+
+static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
+{
+	oid_array_clear(cs_data->new_commits);
+	free(cs_data->new_commits);
+}
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 					  struct diff_options *options,
 					  void *data)
@@ -820,6 +835,7 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 		const struct submodule *submodule;
 		const char *name;
 		struct string_list_item *item;
+		struct changed_submodule_data *cs_data;
 
 		if (!S_ISGITLINK(p->two->mode))
 			continue;
@@ -847,10 +863,15 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 			continue;
 
 		item = string_list_insert(changed, name);
-		if (!item->util)
+		if (item->util)
+			cs_data = item->util;
+		else {
+			cs_data = xcalloc(1, sizeof(struct changed_submodule_data));
 			/* NEEDSWORK: should we have oid_array_init()? */
-			item->util = xcalloc(1, sizeof(struct oid_array));
-		oid_array_append(item->util, &p->two->oid);
+			cs_data->new_commits = xcalloc(1, sizeof(struct oid_array));
+			item->util = cs_data;
+		}
+		oid_array_append(cs_data->new_commits, &p->two->oid);
 	}
 }
 
@@ -897,11 +918,12 @@ static void collect_changed_submodules(struct repository *r,
 	reset_revision_walk();
 }
 
-static void free_submodules_oids(struct string_list *submodules)
+static void free_submodules_data(struct string_list *submodules)
 {
 	struct string_list_item *item;
-	for_each_string_list_item(item, submodules)
-		oid_array_clear((struct oid_array *) item->util);
+	for_each_string_list_item(item, submodules) {
+		changed_submodule_data_clear(item->util);
+	}
 	string_list_clear(submodules, 1);
 }
 
@@ -1067,7 +1089,7 @@ int find_unpushed_submodules(struct repository *r,
 	collect_changed_submodules(r, &submodules, &argv);
 
 	for_each_string_list_item(name, &submodules) {
-		struct oid_array *commits = name->util;
+		struct changed_submodule_data *cs_data = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
 
@@ -1080,11 +1102,11 @@ int find_unpushed_submodules(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_needs_pushing(r, path, commits))
+		if (submodule_needs_pushing(r, path, cs_data->new_commits))
 			string_list_insert(needs_pushing, path);
 	}
 
-	free_submodules_oids(&submodules);
+	free_submodules_data(&submodules);
 	strvec_clear(&argv);
 
 	return needs_pushing->nr;
@@ -1254,7 +1276,7 @@ static void calculate_changed_submodule_paths(struct repository *r,
 	collect_changed_submodules(r, changed_submodule_names, &argv);
 
 	for_each_string_list_item(name, changed_submodule_names) {
-		struct oid_array *commits = name->util;
+		struct changed_submodule_data *cs_data = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
 
@@ -1267,8 +1289,8 @@ static void calculate_changed_submodule_paths(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_has_commits(r, path, commits)) {
-			oid_array_clear(commits);
+		if (submodule_has_commits(r, path, cs_data->new_commits)) {
+			oid_array_clear(cs_data->new_commits);
 			*name->string = '\0';
 		}
 	}
@@ -1305,7 +1327,7 @@ int submodule_touches_in_range(struct repository *r,
 
 	strvec_clear(&args);
 
-	free_submodules_oids(&subs);
+	free_submodules_data(&subs);
 	return ret;
 }
 
@@ -1587,7 +1609,7 @@ static int fetch_finish(int retvalue, struct strbuf *err,
 	struct fetch_task *task = task_cb;
 
 	struct string_list_item *it;
-	struct oid_array *commits;
+	struct changed_submodule_data *cs_data;
 
 	if (!task || !task->sub)
 		BUG("callback cookie bogus");
@@ -1615,14 +1637,14 @@ static int fetch_finish(int retvalue, struct strbuf *err,
 		/* Could be an unchanged submodule, not contained in the list */
 		goto out;
 
-	commits = it->util;
-	oid_array_filter(commits,
+	cs_data = it->util;
+	oid_array_filter(cs_data->new_commits,
 			 commit_missing_in_sub,
 			 task->repo);
 
 	/* Are there commits we want, but do not exist? */
-	if (commits->nr) {
-		task->commits = commits;
+	if (cs_data->new_commits->nr) {
+		task->commits = cs_data->new_commits;
 		ALLOC_GROW(spf->oid_fetch_tasks,
 			   spf->oid_fetch_tasks_nr + 1,
 			   spf->oid_fetch_tasks_alloc);
@@ -1680,7 +1702,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	strvec_clear(&spf.args);
 out:
-	free_submodules_oids(&spf.changed_submodule_names);
+	free_submodules_data(&spf.changed_submodule_names);
 	return spf.result;
 }
 
-- 
2.33.GIT



* [PATCH 3/8] submodule: make static functions read submodules from commits
  2022-02-10  4:41 [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
  2022-02-10  4:41 ` [PATCH 1/8] submodule: inline submodule_commits() into caller Glen Choo
  2022-02-10  4:41 ` [PATCH 2/8] submodule: store new submodule commits oid_array in a struct Glen Choo
@ 2022-02-10  4:41 ` Glen Choo
  2022-02-10 19:15   ` Jonathan Tan
  2022-02-10  4:41 ` [PATCH 4/8] t5526: introduce test helper to assert on fetches Glen Choo
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-10  4:41 UTC
  To: git; +Cc: Glen Choo, Jonathan Tan

A future commit will teach "fetch --recurse-submodules" to fetch
unpopulated submodules. Prepare for this by teaching the necessary
static functions to read submodules from superproject commits instead of
the index and filesystem. Then, store the necessary fields (path and
super_oid), and use them in "fetch --recurse-submodules" where possible.

As a result, "git fetch" now reads changed submodules using the
`.gitmodules` and path from super_oid's tree (which is where "git fetch"
actually noticed the changed submodule) instead of the filesystem.
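The "first sighting wins" rule for super_oid and path can be sketched as follows. All names are hypothetical and strings stand in for object IDs; in the patch, super_oid is a borrowed const struct object_id pointer and path is an xstrdup()'d copy, and the stored super_oid is later passed to submodule_from_name() and repo_submodule_init(). The usage below models a submodule whose path differs between two superproject commits: the recorded path is the one from the commit visited first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_SUBMODULES 16

/* Hypothetical, simplified analogue of changed_submodule_data after this patch. */
struct toy_changed {
	char *name;
	const char *super_oid; /* borrowed: first superproject commit seen */
	char *path;            /* owned: submodule path in that commit */
	size_t nr_new_commits;
};

struct toy_changed_list {
	struct toy_changed entries[TOY_MAX_SUBMODULES];
	size_t nr;
};

static char *toy_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);
	assert(copy);
	return memcpy(copy, s, len);
}

/*
 * Called once per (superproject commit, changed gitlink) pair, in the
 * order the rev walk visits commits. Only the first sighting of a
 * submodule name fills in super_oid and path.
 */
static void toy_record(struct toy_changed_list *list, const char *name,
		       const char *super_oid, const char *path)
{
	struct toy_changed *e;

	for (size_t i = 0; i < list->nr; i++) {
		if (!strcmp(list->entries[i].name, name)) {
			/* Later sightings only add commits. */
			list->entries[i].nr_new_commits++;
			return;
		}
	}
	assert(list->nr < TOY_MAX_SUBMODULES);
	e = &list->entries[list->nr++];
	e->name = toy_strdup(name);
	e->super_oid = super_oid;
	e->path = toy_strdup(path);
	e->nr_new_commits = 1;
}
```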

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/submodule.c b/submodule.c
index e2405c9f15..d4227ac22d 100644
--- a/submodule.c
+++ b/submodule.c
@@ -811,6 +811,16 @@ static const char *default_name_or_path(const char *path_or_name)
  * member of the changed submodule string_list_item.
  */
 struct changed_submodule_data {
+	/*
+	 * The first superproject commit in the rev walk that points to the
+	 * submodule.
+	 */
+	const struct object_id *super_oid;
+	/*
+	 * Path to the submodule in the superproject commit referenced
+	 * by 'super_oid'.
+	 */
+	char *path;
 	/* The submodule commits that have changed in the rev walk. */
 	struct oid_array *new_commits;
 };
@@ -819,6 +829,7 @@ static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
 {
 	oid_array_clear(cs_data->new_commits);
 	free(cs_data->new_commits);
+	free(cs_data->path);
 }
 
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
@@ -869,6 +880,8 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 			cs_data = xcalloc(1, sizeof(struct changed_submodule_data));
 			/* NEEDSWORK: should we have oid_array_init()? */
 			cs_data->new_commits = xcalloc(1, sizeof(struct oid_array));
+			cs_data->super_oid = commit_oid;
+			cs_data->path = xstrdup(p->two->path);
 			item->util = cs_data;
 		}
 		oid_array_append(cs_data->new_commits, &p->two->oid);
@@ -944,6 +957,7 @@ struct has_commit_data {
 	struct repository *repo;
 	int result;
 	const char *path;
+	const struct object_id *super_oid;
 };
 
 static int check_has_commit(const struct object_id *oid, void *data)
@@ -952,7 +966,7 @@ static int check_has_commit(const struct object_id *oid, void *data)
 	struct repository subrepo;
 	enum object_type type;
 
-	if (repo_submodule_init(&subrepo, cb->repo, cb->path, null_oid())) {
+	if (repo_submodule_init(&subrepo, cb->repo, cb->path, cb->super_oid)) {
 		cb->result = 0;
 		goto cleanup;
 	}
@@ -980,9 +994,10 @@ static int check_has_commit(const struct object_id *oid, void *data)
 
 static int submodule_has_commits(struct repository *r,
 				 const char *path,
+				 const struct object_id *super_oid,
 				 struct oid_array *commits)
 {
-	struct has_commit_data has_commit = { r, 1, path };
+	struct has_commit_data has_commit = { r, 1, path, super_oid };
 
 	/*
 	 * Perform a cheap, but incorrect check for the existence of 'commits'.
@@ -1029,7 +1044,7 @@ static int submodule_needs_pushing(struct repository *r,
 				   const char *path,
 				   struct oid_array *commits)
 {
-	if (!submodule_has_commits(r, path, commits))
+	if (!submodule_has_commits(r, path, null_oid(), commits))
 		/*
 		 * NOTE: We do consider it safe to return "no" here. The
 		 * correct answer would be "We do not know" instead of
@@ -1280,7 +1295,7 @@ static void calculate_changed_submodule_paths(struct repository *r,
 		const struct submodule *submodule;
 		const char *path = NULL;
 
-		submodule = submodule_from_name(r, null_oid(), name->string);
+		submodule = submodule_from_name(r, cs_data->super_oid, name->string);
 		if (submodule)
 			path = submodule->path;
 		else
@@ -1289,7 +1304,7 @@ static void calculate_changed_submodule_paths(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_has_commits(r, path, cs_data->new_commits)) {
+		if (submodule_has_commits(r, path, cs_data->super_oid, cs_data->new_commits)) {
 			oid_array_clear(cs_data->new_commits);
 			*name->string = '\0';
 		}
@@ -1414,12 +1429,13 @@ static const struct submodule *get_non_gitmodules_submodule(const char *path)
 }
 
 static struct fetch_task *fetch_task_create(struct repository *r,
-					    const char *path)
+					    const char *path,
+					    const struct object_id *treeish_name)
 {
 	struct fetch_task *task = xmalloc(sizeof(*task));
 	memset(task, 0, sizeof(*task));
 
-	task->sub = submodule_from_path(r, null_oid(), path);
+	task->sub = submodule_from_path(r, treeish_name, path);
 	if (!task->sub) {
 		/*
 		 * No entry in .gitmodules? Technically not a submodule,
@@ -1451,11 +1467,12 @@ static void fetch_task_release(struct fetch_task *p)
 }
 
 static struct repository *get_submodule_repo_for(struct repository *r,
-						 const char *path)
+						 const char *path,
+						 const struct object_id *treeish_name)
 {
 	struct repository *ret = xmalloc(sizeof(*ret));
 
-	if (repo_submodule_init(ret, r, path, null_oid())) {
+	if (repo_submodule_init(ret, r, path, treeish_name)) {
 		free(ret);
 		return NULL;
 	}
@@ -1476,7 +1493,7 @@ static int get_next_submodule(struct child_process *cp,
 		if (!S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		task = fetch_task_create(spf->r, ce->name);
+		task = fetch_task_create(spf->r, ce->name, null_oid());
 		if (!task)
 			continue;
 
@@ -1499,7 +1516,7 @@ static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		task->repo = get_submodule_repo_for(spf->r, task->sub->path);
+		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
 		if (task->repo) {
 			struct strbuf submodule_prefix = STRBUF_INIT;
 			child_process_init(cp);
-- 
2.33.GIT



* [PATCH 4/8] t5526: introduce test helper to assert on fetches
  2022-02-10  4:41 [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                   ` (2 preceding siblings ...)
  2022-02-10  4:41 ` [PATCH 3/8] submodule: make static functions read submodules from commits Glen Choo
@ 2022-02-10  4:41 ` Glen Choo
  2022-02-10  4:41 ` [PATCH 5/8] t5526: use grep " Glen Choo
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-10  4:41 UTC
  To: git; +Cc: Glen Choo, Jonathan Tan

A future commit will change the stderr of "git fetch
--recurse-submodules" and add new tests to t/t5526-fetch-submodules.sh.
This poses two challenges:

* The tests use test_cmp to assert on the stderr, which will fail in the
  future tests because the stderr changes slightly, even though it still
  contains the information we expect.
* The expect.err file is constructed by the add_upstream_commit() helper
  as input into test_cmp, but most tests fetch a different combination
  of repos than the one expect.err describes. This results in noisy
  tests that slice and append to expect.err to generate the expected
  output.

To address both of these issues, introduce a verify_fetch_result()
helper to t/t5526-fetch-submodules.sh that asserts on the output of "git
fetch --recurse-submodules" and handles the ordering of expect.err.

As a result, the tests no longer construct expect.err manually. test_cmp
is still invoked by verify_fetch_result(), but that will be replaced in
a later commit.
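The ordering rule in verify_fetch_result() can be modelled outside the shell. This C sketch is only an illustration of the logic, not a replacement for the shell helper: toy_combine_expected() and toy_verify_fetch_result() are hypothetical names, and a NULL fragment plays the role of an expect.err.* file that a test has removed with rm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Concatenate the optional expected-stderr fragments in the fixed order
 * super, sub, deep. NULL fragments are skipped. Caller frees the result.
 */
static char *toy_combine_expected(const char *super, const char *sub,
				  const char *deep)
{
	const char *parts[] = { super, sub, deep };
	size_t len = 0;
	char *out;

	for (size_t i = 0; i < 3; i++)
		if (parts[i])
			len += strlen(parts[i]);
	out = malloc(len + 1);
	assert(out);
	out[0] = '\0';
	for (size_t i = 0; i < 3; i++)
		if (parts[i])
			strcat(out, parts[i]);
	return out;
}

/* Exact comparison, like test_cmp on the combined file. */
static bool toy_verify_fetch_result(const char *actual, const char *super,
				    const char *sub, const char *deep)
{
	char *expected = toy_combine_expected(super, sub, deep);
	bool ok = !strcmp(expected, actual);

	free(expected);
	return ok;
}
```

Keeping one fragment per repository means each test only states which repositories it expects to be fetched, rather than slicing a shared expect.err.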

Signed-off-by: Glen Choo <chooglen@google.com>
---
 t/t5526-fetch-submodules.sh | 136 +++++++++++++++++++++---------------
 1 file changed, 81 insertions(+), 55 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 2dc75b80db..0e93df1665 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -13,6 +13,10 @@ export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
 
 pwd=$(pwd)
 
+# For each submodule in the test setup, this creates a commit and writes
+# a file that contains the expected err if that new commit were fetched.
+# These output files get concatenated in the right order by
+# verify_fetch_result().
 add_upstream_commit() {
 	(
 		cd submodule &&
@@ -22,9 +26,9 @@ add_upstream_commit() {
 		git add subfile &&
 		git commit -m new subfile &&
 		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err &&
-		echo "From $pwd/submodule" >> ../expect.err &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err
+		echo "Fetching submodule submodule" > ../expect.err.sub &&
+		echo "From $pwd/submodule" >> ../expect.err.sub &&
+		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
 	) &&
 	(
 		cd deepsubmodule &&
@@ -34,12 +38,33 @@ add_upstream_commit() {
 		git add deepsubfile &&
 		git commit -m new deepsubfile &&
 		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule/subdir/deepsubmodule" >> ../expect.err
-		echo "From $pwd/deepsubmodule" >> ../expect.err &&
-		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err
+		echo "Fetching submodule submodule/subdir/deepsubmodule" > ../expect.err.deep &&
+		echo "From $pwd/deepsubmodule" >> ../expect.err.deep &&
+		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err.deep
 	)
 }
 
+# Verifies that the expected repositories were fetched. This is done by
+# concatenating the files expect.err.[super|sub|deep] in the correct
+# order and comparing it to the actual stderr.
+#
+# If a repo should not be fetched in the test, its corresponding
+# expect.err file should be rm-ed.
+verify_fetch_result() {
+	ACTUAL_ERR=$1 &&
+	rm -f expect.err.combined &&
+	if [ -f expect.err.super ]; then
+		cat expect.err.super >>expect.err.combined
+	fi &&
+	if [ -f expect.err.sub ]; then
+		cat expect.err.sub >>expect.err.combined
+	fi &&
+	if [ -f expect.err.deep ]; then
+		cat expect.err.deep >>expect.err.combined
+	fi &&
+	test_cmp expect.err.combined $ACTUAL_ERR
+}
+
 test_expect_success setup '
 	mkdir deepsubmodule &&
 	(
@@ -77,7 +102,7 @@ test_expect_success "fetch --recurse-submodules recurses into submodules" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "submodule.recurse option triggers recursive fetch" '
@@ -87,7 +112,7 @@ test_expect_success "submodule.recurse option triggers recursive fetch" '
 		git -c submodule.recurse fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "fetch --recurse-submodules -j2 has the same output behaviour" '
@@ -97,7 +122,7 @@ test_expect_success "fetch --recurse-submodules -j2 has the same output behaviou
 		GIT_TRACE="$TRASH_DIRECTORY/trace.out" git fetch --recurse-submodules -j2 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err &&
+	verify_fetch_result actual.err &&
 	grep "2 tasks" trace.out
 '
 
@@ -127,7 +152,7 @@ test_expect_success "using fetchRecurseSubmodules=true in .gitmodules recurses i
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--no-recurse-submodules overrides .gitmodules config" '
@@ -158,7 +183,7 @@ test_expect_success "--recurse-submodules overrides fetchRecurseSubmodules setti
 		git config --unset submodule.submodule.fetchRecurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--quiet propagates to submodules" '
@@ -186,7 +211,7 @@ test_expect_success "--dry-run propagates to submodules" '
 		git fetch --recurse-submodules --dry-run >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "Without --dry-run propagates to submodules" '
@@ -195,7 +220,7 @@ test_expect_success "Without --dry-run propagates to submodules" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "recurseSubmodules=true propagates into submodules" '
@@ -206,7 +231,7 @@ test_expect_success "recurseSubmodules=true propagates into submodules" '
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--recurse-submodules overrides config in submodule" '
@@ -220,7 +245,7 @@ test_expect_success "--recurse-submodules overrides config in submodule" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--no-recurse-submodules overrides config setting" '
@@ -253,14 +278,14 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.sub &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.sub &&
-	head -3 expect.err >> expect.err.sub &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
-	test_cmp expect.err.sub actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -271,14 +296,16 @@ test_expect_success "Recursion doesn't happen when new superproject commits don'
 	git add file &&
 	git commit -m "new file" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.file &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.file &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.file actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "Recursion picks up config in submodule" '
@@ -295,9 +322,8 @@ test_expect_success "Recursion picks up config in submodule" '
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.sub &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.sub &&
-	cat expect.err >> expect.err.sub &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err &&
@@ -306,7 +332,7 @@ test_expect_success "Recursion picks up config in submodule" '
 			git config --unset fetch.recurseSubmodules
 		)
 	) &&
-	test_cmp expect.err.sub actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -331,15 +357,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.2 &&
-	cat expect.err.sub >> expect.err.2 &&
-	tail -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
-	test_cmp expect.err.2 actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -375,11 +399,8 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	tail -3 expect.err > expect.err.deepsub &&
-	echo "From $pwd/." > expect.err &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err &&
-	cat expect.err.sub >> expect.err &&
-	cat expect.err.deepsub >> expect.err &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules false &&
@@ -395,7 +416,7 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 		)
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
@@ -405,14 +426,16 @@ test_expect_success "'--recurse-submodules=on-demand' stops when no new submodul
 	git add file &&
 	git commit -m "new file" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.file &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.file &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.file actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config" '
@@ -426,9 +449,9 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules on-demand &&
@@ -440,7 +463,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 		git config --unset fetch.recurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' overrides fetch.recurseSubmodules" '
@@ -454,9 +477,9 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git config submodule.submodule.fetchRecurseSubmodules on-demand &&
@@ -468,7 +491,7 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 		git config --unset submodule.submodule.fetchRecurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "don't fetch submodule when newly recorded commits are already present" '
@@ -480,14 +503,17 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 	git add submodule &&
 	git commit -m "submodule rewound" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	# This file does not exist, but rm -f for readability
+	rm -f expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err &&
+	verify_fetch_result actual.err &&
 	(
 		cd submodule &&
 		git checkout -q sub
@@ -505,9 +531,9 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 	git rm .gitmodules &&
 	git commit -m "new submodule without .gitmodules" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." >expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >>expect.err.2 &&
+	echo "From $pwd/." >expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		rm .gitmodules &&
@@ -523,7 +549,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 		git reset --hard
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err &&
+	verify_fetch_result actual.err &&
 	git checkout HEAD^ -- .gitmodules &&
 	git add .gitmodules &&
 	git commit -m "new submodule restored .gitmodules"
-- 
2.33.GIT


^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH 5/8] t5526: use grep to assert on fetches
  2022-02-10  4:41 [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                   ` (3 preceding siblings ...)
  2022-02-10  4:41 ` [PATCH 4/8] t5526: introduce test helper to assert on fetches Glen Choo
@ 2022-02-10  4:41 ` Glen Choo
  2022-02-10 19:17   ` Jonathan Tan
  2022-02-10  4:41 ` [PATCH 6/8] submodule: extract get_fetch_task() Glen Choo
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-10  4:41 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan

In a previous commit, we replaced test_cmp invocations with
verify_fetch_result(). Finish the process of removing test_cmp by using
grep in verify_fetch_result() instead.

This makes the tests less sensitive to changes because, instead of
checking the whole stderr, we only grep for the lines of the form

* "<old-head>..<new-head>\s+branch\s+-> origin/branch"
* "Fetching submodule <submodule-path>" (if fetching a submodule)

when we expect the repo to have fetched. If we expect the repo not to
have fetched, grep to make sure the lines are absent. Also, simplify the
assertions by using grep patterns that match only the relevant pieces of
information. For example, <old-head> is irrelevant: we only want to know
whether the fetch was performed, not where the branch was before the
fetch.

Signed-off-by: Glen Choo <chooglen@google.com>
---
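As an illustration only, the line-oriented check that verify_fetch_result() performs with "grep -E" can be sketched in C with POSIX <regex.h>. The helper name fetch_line_present() is hypothetical and not part of Git. The sketch substitutes the POSIX class [[:space:]] for \s, which is a GNU grep extension. It also assumes the head and branch contain no regex metacharacters, which holds for abbreviated hex ids and for the branch names super, sub and deep used in t5526.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if any line of `text` matches
 *   \.\.<new_head>[[:space:]]+<branch>[[:space:]]+-> origin/<branch>
 * (the verify_fetch_result() pattern with \s spelled portably),
 * 0 if no line matches, -1 on error. new_head and branch are not
 * regex-escaped, so they must not contain metacharacters.
 */
static int fetch_line_present(const char *text, const char *new_head,
			      const char *branch)
{
	char pattern[256];
	regex_t re;
	int found = 0;
	const char *line = text;

	assert(text && new_head && branch);
	if (snprintf(pattern, sizeof(pattern),
		     "\\.\\.%s[[:space:]]+%s[[:space:]]+-> origin/%s",
		     new_head, branch, branch) >= (int)sizeof(pattern))
		return -1;
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return -1;

	/* Match one line at a time, the way grep does. */
	while (*line && !found) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		char *buf = malloc(len + 1);

		if (!buf) {
			regfree(&re);
			return -1;
		}
		memcpy(buf, line, len);
		buf[len] = '\0';
		found = regexec(&re, buf, 0, NULL, 0) == 0;
		free(buf);
		line = eol ? eol + 1 : line + len;
	}
	regfree(&re);
	return found;
}
```

A return value of 0 corresponds to the "! grep" branch of the helper, i.e. the repo was not fetched. Because only the new head is anchored after "..", the old head never has to be known.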
 t/t5526-fetch-submodules.sh | 131 +++++++++++++-----------------------
 1 file changed, 48 insertions(+), 83 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 0e93df1665..cb18f0ac21 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -20,49 +20,52 @@ pwd=$(pwd)
 add_upstream_commit() {
 	(
 		cd submodule &&
-		head1=$(git rev-parse --short HEAD) &&
 		echo new >> subfile &&
 		test_tick &&
 		git add subfile &&
 		git commit -m new subfile &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		git rev-parse --short HEAD >../subhead
 	) &&
 	(
 		cd deepsubmodule &&
-		head1=$(git rev-parse --short HEAD) &&
 		echo new >> deepsubfile &&
 		test_tick &&
 		git add deepsubfile &&
 		git commit -m new deepsubfile &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule/subdir/deepsubmodule" > ../expect.err.deep
-		echo "From $pwd/deepsubmodule" >> ../expect.err.deep &&
-		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err.deep
+		git rev-parse --short HEAD >../deephead
 	)
 }
 
 # Verifies that the expected repositories were fetched. This is done by
-# concatenating the files expect.err.[super|sub|deep] in the correct
-# order and comparing it to the actual stderr.
+# checking that the branches of [super|sub|deep] were updated to
+# [super|sub|deep]head if the corresponding file exists.
 #
-# If a repo should not be fetched in the test, its corresponding
-# expect.err file should be rm-ed.
+# If the [super|sub|deep] head file does not exist, this verifies that
+# the corresponding repo was not fetched. Thus, if a repo should not be
+# fetched in the test, its corresponding head file should be
+# rm-ed.
 verify_fetch_result() {
 	ACTUAL_ERR=$1 &&
-	rm -f expect.err.combined &&
-	if [ -f expect.err.super ]; then
-		cat expect.err.super >>expect.err.combined
+	# Each grep pattern is guaranteed to match the correct repo
+	# because each repo uses a different name for their branch i.e.
+	# "super", "sub" and "deep".
+	if [ -f superhead ]; then
+		grep -E "\.\.$(cat superhead)\s+super\s+-> origin/super" $ACTUAL_ERR
+	else
+		! grep "super" $ACTUAL_ERR
 	fi &&
-	if [ -f expect.err.sub ]; then
-		cat expect.err.sub >>expect.err.combined
+	if [ -f subhead ]; then
+		grep "Fetching submodule submodule" $ACTUAL_ERR &&
+		grep -E "\.\.$(cat subhead)\s+sub\s+-> origin/sub" $ACTUAL_ERR
+	else
+		! grep "Fetching submodule submodule" $ACTUAL_ERR
 	fi &&
-	if [ -f expect.err.deep ]; then
-		cat expect.err.deep >>expect.err.combined
-	fi &&
-	test_cmp expect.err.combined $ACTUAL_ERR
+	if [ -f deephead ]; then
+		grep "Fetching submodule submodule/subdir/deepsubmodule" $ACTUAL_ERR &&
+		grep -E "\.\.$(cat deephead)\s+deep\s+-> origin/deep" $ACTUAL_ERR
+	else
+		! grep "Fetching submodule submodule/subdir/deepsubmodule" $ACTUAL_ERR
+	fi
 }
 
 test_expect_success setup '
@@ -274,13 +277,10 @@ test_expect_success "Recursion doesn't happen when no new commits are fetched in
 '
 
 test_expect_success "Recursion stops when no new submodule commits are fetched" '
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
-	rm expect.err.deep &&
+	git rev-parse --short HEAD >superhead &&
+	rm deephead &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -291,15 +291,12 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 
 test_expect_success "Recursion doesn't happen when new superproject commits don't change any submodules" '
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	echo a > file &&
 	git add file &&
 	git commit -m "new file" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
-	rm expect.err.sub &&
-	rm expect.err.deep &&
+	git rev-parse --short HEAD >superhead &&
+	rm subhead &&
+	rm deephead &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -318,12 +315,9 @@ test_expect_success "Recursion picks up config in submodule" '
 		)
 	) &&
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	git rev-parse --short HEAD >superhead &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err &&
@@ -345,20 +339,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 			git fetch &&
 			git checkout -q FETCH_HEAD
 		) &&
-		head1=$(git rev-parse --short HEAD^) &&
 		git add subdir/deepsubmodule &&
 		git commit -m "new deepsubmodule" &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		git rev-parse --short HEAD >../subhead
 	) &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	git rev-parse --short HEAD >superhead &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -376,13 +363,9 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 			git fetch &&
 			git checkout -q FETCH_HEAD
 		) &&
-		head1=$(git rev-parse --short HEAD^) &&
 		git add subdir/deepsubmodule &&
 		git commit -m "new deepsubmodule" &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo Fetching submodule submodule > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		git rev-parse --short HEAD >../subhead
 	) &&
 	(
 		cd downstream &&
@@ -395,12 +378,9 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 '
 
 test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" '
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	git rev-parse --short HEAD >superhead &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules false &&
@@ -421,15 +401,12 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	echo a >> file &&
 	git add file &&
 	git commit -m "new file" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
-	rm expect.err.sub &&
-	rm expect.err.deep &&
+	git rev-parse --short HEAD >superhead &&
+	rm subhead &&
+	rm deephead &&
 	(
 		cd downstream &&
 		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
@@ -445,13 +422,10 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 	) &&
 	add_upstream_commit &&
 	git config --global fetch.recurseSubmodules false &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
-	rm expect.err.deep &&
+	git rev-parse --short HEAD >superhead &&
+	rm deephead &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules on-demand &&
@@ -473,13 +447,10 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 	) &&
 	add_upstream_commit &&
 	git config fetch.recurseSubmodules false &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
-	rm expect.err.deep &&
+	git rev-parse --short HEAD >superhead &&
+	rm deephead &&
 	(
 		cd downstream &&
 		git config submodule.submodule.fetchRecurseSubmodules on-demand &&
@@ -499,15 +470,12 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 		cd submodule &&
 		git checkout -q HEAD^^
 	) &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "submodule rewound" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
-	rm expect.err.sub &&
+	git rev-parse --short HEAD >superhead &&
+	rm subhead &&
 	# This file does not exist, but rm -f for readability
-	rm -f expect.err.deep &&
+	rm -f deephead &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -526,14 +494,11 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 		git fetch --recurse-submodules
 	) &&
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git rm .gitmodules &&
 	git commit -m "new submodule without .gitmodules" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." >expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
-	rm expect.err.deep &&
+	git rev-parse --short HEAD >superhead &&
+	rm deephead &&
 	(
 		cd downstream &&
 		rm .gitmodules &&
-- 
2.33.GIT



* [PATCH 6/8] submodule: extract get_fetch_task()
  2022-02-10  4:41 [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                   ` (4 preceding siblings ...)
  2022-02-10  4:41 ` [PATCH 5/8] t5526: use grep " Glen Choo
@ 2022-02-10  4:41 ` Glen Choo
  2022-02-10 19:33   ` Jonathan Tan
  2022-02-10  4:41 ` [PATCH 7/8] fetch: fetch unpopulated, changed submodules Glen Choo
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-10  4:41 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan

get_next_submodule() configures the parallel submodule fetch by
performing two functions:

* iterate the index to find submodules
* configure the child processes to fetch the submodules found in the
  previous step

Extract the index iterating code into an iterator function,
get_fetch_task(), so that get_next_submodule() is agnostic of how
to find submodules. This prepares for a subsequent commit that will
teach the fetch machinery to also iterate through the list of changed
submodules (in addition to the index).

Signed-off-by: Glen Choo <chooglen@google.com>
---
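The following is a minimal sketch of the iterator shape that get_fetch_task() takes on. It uses hypothetical stand-in types (struct entry, struct walker) in place of Git's cache_entry and submodule_parallel_fetch. It shows two things. First, the cursor lives in caller-owned state. Second, the cursor must be advanced explicitly before returning, because "return" skips the for-loop increment. This is why get_fetch_task() does spf->count++ before "return task".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for index entries and the spf cursor. */
struct entry {
	const char *name;
	int is_gitlink;   /* plays the role of S_ISGITLINK(ce->ce_mode) */
};

struct walker {
	const struct entry *entries;
	size_t nr;
	size_t count;     /* persists across calls, like spf->count */
};

/*
 * Iterator: return the next entry worth fetching, or NULL once the
 * list is exhausted. The explicit increment before "return" makes the
 * next call resume after the entry just handed out.
 */
static const struct entry *next_task(struct walker *w)
{
	assert(w->count <= w->nr);
	for (; w->count < w->nr; w->count++) {
		const struct entry *e = &w->entries[w->count];

		if (!e->is_gitlink)
			continue;
		w->count++;
		return e;
	}
	return NULL;
}

/*
 * Consumer: like get_next_submodule(), it only asks for the next task
 * and does not know how tasks are found. Returns 1 if a task was
 * produced, 0 otherwise.
 */
static int get_next(struct walker *w, const char **name_out)
{
	const struct entry *e = next_task(w);

	if (!e)
		return 0;
	*name_out = e->name;
	return 1;
}
```

Because get_next() depends only on next_task(), a second source of tasks can later be tried when the first one returns NULL without changing the consumer.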
 submodule.c | 75 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 43 insertions(+), 32 deletions(-)

diff --git a/submodule.c b/submodule.c
index d4227ac22d..d695dcadf4 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1480,14 +1480,12 @@ static struct repository *get_submodule_repo_for(struct repository *r,
 	return ret;
 }
 
-static int get_next_submodule(struct child_process *cp,
-			      struct strbuf *err, void *data, void **task_cb)
+static struct fetch_task *
+get_fetch_task(struct submodule_parallel_fetch *spf,
+	       const char **default_argv, struct strbuf *err)
 {
-	struct submodule_parallel_fetch *spf = data;
-
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
-		const char *default_argv;
 		struct fetch_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -1507,41 +1505,17 @@ static int get_next_submodule(struct child_process *cp,
 					&spf->changed_submodule_names,
 					task->sub->name))
 				continue;
-			default_argv = "on-demand";
+			*default_argv = "on-demand";
 			break;
 		case RECURSE_SUBMODULES_ON:
-			default_argv = "yes";
+			*default_argv = "yes";
 			break;
 		case RECURSE_SUBMODULES_OFF:
 			continue;
 		}
 
 		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
-		if (task->repo) {
-			struct strbuf submodule_prefix = STRBUF_INIT;
-			child_process_init(cp);
-			cp->dir = task->repo->gitdir;
-			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
-			cp->git_cmd = 1;
-			if (!spf->quiet)
-				strbuf_addf(err, _("Fetching submodule %s%s\n"),
-					    spf->prefix, ce->name);
-			strvec_init(&cp->args);
-			strvec_pushv(&cp->args, spf->args.v);
-			strvec_push(&cp->args, default_argv);
-			strvec_push(&cp->args, "--submodule-prefix");
-
-			strbuf_addf(&submodule_prefix, "%s%s/",
-						       spf->prefix,
-						       task->sub->path);
-			strvec_push(&cp->args, submodule_prefix.buf);
-
-			spf->count++;
-			*task_cb = task;
-
-			strbuf_release(&submodule_prefix);
-			return 1;
-		} else {
+		if (!task->repo) {
 			struct strbuf empty_submodule_path = STRBUF_INIT;
 
 			fetch_task_release(task);
@@ -1562,7 +1536,44 @@ static int get_next_submodule(struct child_process *cp,
 					    ce->name);
 			}
 			strbuf_release(&empty_submodule_path);
+			continue;
 		}
+		if (!spf->quiet)
+			strbuf_addf(err, _("Fetching submodule %s%s\n"),
+				    spf->prefix, ce->name);
+
+		spf->count++;
+		return task;
+	}
+	return NULL;
+}
+
+static int get_next_submodule(struct child_process *cp, struct strbuf *err,
+			      void *data, void **task_cb)
+{
+	struct submodule_parallel_fetch *spf = data;
+	const char *default_argv = NULL;
+	struct fetch_task *task = get_fetch_task(spf, &default_argv, err);
+
+	if (task) {
+		struct strbuf submodule_prefix = STRBUF_INIT;
+
+		child_process_init(cp);
+		cp->dir = task->repo->gitdir;
+		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+		cp->git_cmd = 1;
+		strvec_init(&cp->args);
+		strvec_pushv(&cp->args, spf->args.v);
+		strvec_push(&cp->args, default_argv);
+		strvec_push(&cp->args, "--submodule-prefix");
+
+		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix,
+			    task->sub->path);
+		strvec_push(&cp->args, submodule_prefix.buf);
+		*task_cb = task;
+
+		strbuf_release(&submodule_prefix);
+		return 1;
 	}
 
 	if (spf->oid_fetch_tasks_nr) {
-- 
2.33.GIT



* [PATCH 7/8] fetch: fetch unpopulated, changed submodules
  2022-02-10  4:41 [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                   ` (5 preceding siblings ...)
  2022-02-10  4:41 ` [PATCH 6/8] submodule: extract get_fetch_task() Glen Choo
@ 2022-02-10  4:41 ` Glen Choo
  2022-02-10 22:49   ` Junio C Hamano
                     ` (2 more replies)
  2022-02-10  4:41 ` [PATCH 8/8] submodule: fix bug and remove add_submodule_odb() Glen Choo
                   ` (2 subsequent siblings)
  9 siblings, 3 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-10  4:41 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan

"git fetch --recurse-submodules" only considers populated
submodules (i.e. submodules that can be found by iterating the index),
which makes "git fetch" behave differently based on which commit is
checked out. As a result, even if the user has initialized all submodules
correctly, they may not fetch the necessary submodule commits, and
commands like "git checkout --recurse-submodules" might fail.

Teach "git fetch" to fetch cloned, changed submodules regardless of
whether they are populated (this is in addition to the current behavior
of fetching populated submodules).

Since a submodule may be encountered multiple times (via the list of
populated submodules or via the list of changed submodules), maintain a
list of seen submodules to avoid fetching a submodule more than once.

Signed-off-by: Glen Choo <chooglen@google.com>
---
submodule.c has a seemingly unrelated change that teaches the "find
changed submodules" rev walk to call is_repository_shallow(). This fixes
what I believe is a legitimate bug: the rev walk would fail on a
shallow repo.

Our test suite did not catch this prior to this commit because we skip
the rev walk if .gitmodules is not found, and thus the test suite did
not attempt the rev walk on a shallow clone. After this commit,
we always attempt to find changed submodules (regardless of whether
there is a .gitmodules file), and the test suite noticed the bug.
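The deduplication described in the commit message can be sketched with simplified, hypothetical types. A flat array with linear lookup stands in for the seen_submodule_names string_list, and plain names stand in for fetch tasks. As in get_next_submodule(), the index is drained before the changed list, and a name is recorded as seen only once it is handed out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEEN 16

/* Hypothetical, simplified stand-in for struct submodule_parallel_fetch. */
struct fetcher {
	const char **index_names;   /* populated submodules (the index) */
	size_t index_nr, index_count;
	const char **changed_names; /* submodules changed by fetched commits */
	size_t changed_nr, changed_count;
	const char *seen[MAX_SEEN]; /* stands in for seen_submodule_names */
	size_t seen_nr;
};

static int is_seen(const struct fetcher *f, const char *name)
{
	for (size_t i = 0; i < f->seen_nr; i++)
		if (!strcmp(f->seen[i], name))
			return 1;
	return 0;
}

/* Return the next unseen name from one source, advancing its cursor. */
static const char *next_from(const struct fetcher *f, const char **names,
			     size_t nr, size_t *count)
{
	for (; *count < nr; (*count)++) {
		if (is_seen(f, names[*count]))
			continue;
		return names[(*count)++];
	}
	return NULL;
}

/*
 * Drain the index first, then the changed list, and record every
 * submodule handed out so that it is fetched at most once even if it
 * appears in both sources.
 */
static const char *next_submodule(struct fetcher *f)
{
	const char *name = next_from(f, f->index_names, f->index_nr,
				     &f->index_count);

	if (!name)
		name = next_from(f, f->changed_names, f->changed_nr,
				 &f->changed_count);
	if (name) {
		assert(f->seen_nr < MAX_SEEN);
		f->seen[f->seen_nr++] = name;
	}
	return name;
}
```

With an empty index (as on the no-submodules branch used by the new tests), next_submodule() still yields every changed submodule from the second source. A submodule that is both populated and changed is returned only once, from the index.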

 Documentation/fetch-options.txt |  26 ++--
 Documentation/git-fetch.txt     |  10 +-
 submodule.c                     | 101 +++++++++++++--
 t/t5526-fetch-submodules.sh     | 217 ++++++++++++++++++++++++++++++++
 4 files changed, 328 insertions(+), 26 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index e967ff1874..38dad13683 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -185,15 +185,23 @@ endif::git-pull[]
 ifndef::git-pull[]
 --recurse-submodules[=yes|on-demand|no]::
 	This option controls if and under what conditions new commits of
-	populated submodules should be fetched too. It can be used as a
-	boolean option to completely disable recursion when set to 'no' or to
-	unconditionally recurse into all populated submodules when set to
-	'yes', which is the default when this option is used without any
-	value. Use 'on-demand' to only recurse into a populated submodule
-	when the superproject retrieves a commit that updates the submodule's
-	reference to a commit that isn't already in the local submodule
-	clone. By default, 'on-demand' is used, unless
-	`fetch.recurseSubmodules` is set (see linkgit:git-config[1]).
+	submodules should be fetched too. When recursing through submodules,
+	`git fetch` always attempts to fetch "changed" submodules, that is,
+	submodules that have commits that are referenced by a newly fetched
+	superproject commit but are missing in the local submodule clone. A
+	changed submodule can be fetched as long as it is present locally, e.g.
+	in `$GIT_DIR/modules/` (see linkgit:gitsubmodules[7]); if the upstream
+	adds a new submodule, that submodule cannot be fetched until it is
+	cloned, e.g. by `git submodule update`.
++
+When set to 'on-demand', only changed submodules are fetched. When set
+to 'yes', all populated submodules are fetched and submodules that are
+both unpopulated and changed are fetched. When set to 'no', submodules
+are never fetched.
++
+When unspecified, this uses the value of `fetch.recurseSubmodules` if it
+is set (see linkgit:git-config[1]), defaulting to 'on-demand' if unset.
+When this option is used without any value, it defaults to 'yes'.
 endif::git-pull[]
 
 -j::
diff --git a/Documentation/git-fetch.txt b/Documentation/git-fetch.txt
index 550c16ca61..e9d364669a 100644
--- a/Documentation/git-fetch.txt
+++ b/Documentation/git-fetch.txt
@@ -287,12 +287,10 @@ include::transfer-data-leaks.txt[]
 
 BUGS
 ----
-Using --recurse-submodules can only fetch new commits in already checked
-out submodules right now. When e.g. upstream added a new submodule in the
-just fetched commits of the superproject the submodule itself cannot be
-fetched, making it impossible to check out that submodule later without
-having to do a fetch again. This is expected to be fixed in a future Git
-version.
+Using --recurse-submodules can only fetch new commits in submodules that are
+present locally, e.g. in `$GIT_DIR/modules/`. If the upstream adds a new
+submodule, that submodule cannot be fetched until it is cloned, e.g. by `git
+submodule update`. This is expected to be fixed in a future Git version.
 
 SEE ALSO
 --------
diff --git a/submodule.c b/submodule.c
index d695dcadf4..0c02bbc9c3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -22,6 +22,7 @@
 #include "parse-options.h"
 #include "object-store.h"
 #include "commit-reach.h"
+#include "shallow.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
 static int initialized_fetch_ref_tips;
@@ -907,6 +908,9 @@ static void collect_changed_submodules(struct repository *r,
 
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;
+	/* make sure shallows are read */
+	is_repository_shallow(the_repository);
+
 	repo_init_revisions(r, &rev, NULL);
 	setup_revisions(argv->nr, argv->v, &rev, &s_r_opt);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -1273,10 +1277,6 @@ static void calculate_changed_submodule_paths(struct repository *r,
 	struct strvec argv = STRVEC_INIT;
 	struct string_list_item *name;
 
-	/* No need to check if there are no submodules configured */
-	if (!submodule_from_path(r, NULL, NULL))
-		return;
-
 	strvec_push(&argv, "--"); /* argv[0] program name */
 	oid_array_for_each_unique(&ref_tips_after_fetch,
 				   append_oid_to_argv, &argv);
@@ -1347,7 +1347,8 @@ int submodule_touches_in_range(struct repository *r,
 }
 
 struct submodule_parallel_fetch {
-	int count;
+	int index_count;
+	int changed_count;
 	struct strvec args;
 	struct repository *r;
 	const char *prefix;
@@ -1357,6 +1358,7 @@ struct submodule_parallel_fetch {
 	int result;
 
 	struct string_list changed_submodule_names;
+	struct string_list seen_submodule_names;
 
 	/* Pending fetches by OIDs */
 	struct fetch_task **oid_fetch_tasks;
@@ -1367,6 +1369,7 @@ struct submodule_parallel_fetch {
 #define SPF_INIT { \
 	.args = STRVEC_INIT, \
 	.changed_submodule_names = STRING_LIST_INIT_DUP, \
+	.seen_submodule_names = STRING_LIST_INIT_DUP, \
 	.submodules_with_errors = STRBUF_INIT, \
 }
 
@@ -1481,11 +1484,12 @@ static struct repository *get_submodule_repo_for(struct repository *r,
 }
 
 static struct fetch_task *
-get_fetch_task(struct submodule_parallel_fetch *spf,
-	       const char **default_argv, struct strbuf *err)
+get_fetch_task_from_index(struct submodule_parallel_fetch *spf,
+			  const char **default_argv, struct strbuf *err)
 {
-	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		const struct cache_entry *ce = spf->r->index->cache[spf->count];
+	for (; spf->index_count < spf->r->index->cache_nr; spf->index_count++) {
+		const struct cache_entry *ce =
+			spf->r->index->cache[spf->index_count];
 		struct fetch_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -1495,6 +1499,15 @@ get_fetch_task(struct submodule_parallel_fetch *spf,
 		if (!task)
 			continue;
 
+		/*
+		 * We might have already considered this submodule
+		 * because we saw it when iterating the changed
+		 * submodule names.
+		 */
+		if (string_list_lookup(&spf->seen_submodule_names,
+				       task->sub->name))
+			continue;
+
 		switch (get_fetch_recurse_config(task->sub, spf))
 		{
 		default:
@@ -1542,7 +1555,69 @@ get_fetch_task(struct submodule_parallel_fetch *spf,
 			strbuf_addf(err, _("Fetching submodule %s%s\n"),
 				    spf->prefix, ce->name);
 
-		spf->count++;
+		spf->index_count++;
+		return task;
+	}
+	return NULL;
+}
+
+static struct fetch_task *
+get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
+			    const char **default_argv, struct strbuf *err)
+{
+	for (; spf->changed_count < spf->changed_submodule_names.nr;
+	     spf->changed_count++) {
+		struct string_list_item item =
+			spf->changed_submodule_names.items[spf->changed_count];
+		struct changed_submodule_data *cs_data = item.util;
+		struct fetch_task *task;
+
+		/*
+		 * We might have already considered this submodule
+		 * because we saw it in the index.
+		 */
+		if (string_list_lookup(&spf->seen_submodule_names, item.string))
+			continue;
+
+		task = fetch_task_create(spf->r, cs_data->path,
+					 cs_data->super_oid);
+		if (!task)
+			continue;
+
+		switch (get_fetch_recurse_config(task->sub, spf)) {
+		default:
+		case RECURSE_SUBMODULES_DEFAULT:
+		case RECURSE_SUBMODULES_ON_DEMAND:
+			*default_argv = "on-demand";
+			break;
+		case RECURSE_SUBMODULES_ON:
+			*default_argv = "yes";
+			break;
+		case RECURSE_SUBMODULES_OFF:
+			continue;
+		}
+
+		task->repo = get_submodule_repo_for(spf->r, task->sub->path,
+						    cs_data->super_oid);
+		if (!task->repo) {
+			fetch_task_release(task);
+			free(task);
+
+			strbuf_addf(err, _("Could not access submodule '%s'\n"),
+				    cs_data->path);
+			continue;
+		}
+		if (!is_tree_submodule_active(spf->r, cs_data->super_oid,
+					      task->sub->path))
+			continue;
+
+		if (!spf->quiet)
+			strbuf_addf(err,
+				    _("Fetching submodule %s%s at commit %s\n"),
+				    spf->prefix, task->sub->path,
+				    find_unique_abbrev(cs_data->super_oid,
+						       DEFAULT_ABBREV));
+		spf->changed_count++;
 		return task;
 	}
 	return NULL;
@@ -1553,7 +1628,10 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 {
 	struct submodule_parallel_fetch *spf = data;
 	const char *default_argv = NULL;
-	struct fetch_task *task = get_fetch_task(spf, &default_argv, err);
+	struct fetch_task *task =
+		get_fetch_task_from_index(spf, &default_argv, err);
+	if (!task)
+		task = get_fetch_task_from_changed(spf, &default_argv, err);
 
 	if (task) {
 		struct strbuf submodule_prefix = STRBUF_INIT;
@@ -1573,6 +1651,7 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 		*task_cb = task;
 
 		strbuf_release(&submodule_prefix);
+		string_list_insert(&spf->seen_submodule_names, task->sub->name);
 		return 1;
 	}
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index cb18f0ac21..f37dca4e09 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -399,6 +399,223 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 	verify_fetch_result actual.err
 '
 
+# Cleans up after tests that checkout branches other than the main ones
+# in the tests.
+checkout_main_branches() {
+	git -C downstream checkout --recurse-submodules super &&
+	git -C downstream/submodule checkout --recurse-submodules sub &&
+	git -C downstream/submodule/subdir/deepsubmodule checkout --recurse-submodules deep
+}
+
+# Test that we can fetch submodules in other branches by running fetch
+# in a branch that has no submodules.
+test_expect_success 'setup downstream branch without submodules' '
+	(
+		cd downstream &&
+		git checkout --recurse-submodules -b no-submodules &&
+		rm .gitmodules &&
+		git rm submodule &&
+		git add .gitmodules &&
+		git commit -m "no submodules" &&
+		git checkout --recurse-submodules super
+	)
+'
+
+test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '
+	test_when_finished "checkout_main_branches" &&
+	git -C downstream fetch --recurse-submodules &&
+	# Create new superproject commit with updated submodules
+	add_upstream_commit &&
+	(
+		cd submodule &&
+		(
+			cd subdir/deepsubmodule &&
+			git fetch &&
+			git checkout -q FETCH_HEAD
+		) &&
+		git add subdir/deepsubmodule &&
+		git commit -m "new deep submodule"
+	) &&
+	git add submodule &&
+	git commit -m "new submodule" &&
+
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err &&
+		git checkout --recurse-submodules origin/super 2>../actual-checkout.err
+	) &&
+	test_must_be_empty actual.out &&
+	git rev-parse --short HEAD >superhead &&
+	git -C submodule rev-parse --short HEAD >subhead &&
+	git -C deepsubmodule rev-parse --short HEAD >deephead &&
+	verify_fetch_result actual.err &&
+
+	# Assert that the fetch happened at the non-HEAD commits
+	grep "Fetching submodule submodule at commit $superhead" actual.err &&
+	grep "Fetching submodule submodule/subdir/deepsubmodule at commit $subhead" actual.err &&
+
+	# Assert that we can checkout the superproject commit with --recurse-submodules
+	! grep -E "error: Submodule .+ could not be updated" actual-checkout.err
+'
+
+test_expect_success "'--recurse-submodules' should fetch submodule commits if the submodule is changed but the index has no submodules" '
+	test_when_finished "checkout_main_branches" &&
+	# Fetch any leftover commits from other tests.
+	git -C downstream fetch --recurse-submodules &&
+	# Create new superproject commit with updated submodules
+	add_upstream_commit &&
+	(
+		cd submodule &&
+		(
+			cd subdir/deepsubmodule &&
+			git fetch &&
+			git checkout -q FETCH_HEAD
+		) &&
+		git add subdir/deepsubmodule &&
+		git commit -m "new deep submodule"
+	) &&
+	git add submodule &&
+	git commit -m "new submodule" &&
+
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git fetch --recurse-submodules >../actual.out 2>../actual.err &&
+		git checkout --recurse-submodules origin/super 2>../actual-checkout.err
+	) &&
+	test_must_be_empty actual.out &&
+	git rev-parse --short HEAD >superhead &&
+	git -C submodule rev-parse --short HEAD >subhead &&
+	git -C deepsubmodule rev-parse --short HEAD >deephead &&
+	verify_fetch_result actual.err &&
+
+	# Assert that the fetch happened at the non-HEAD commits
+	grep "Fetching submodule submodule at commit $superhead" actual.err &&
+	grep "Fetching submodule submodule/subdir/deepsubmodule at commit $subhead" actual.err &&
+
+	# Assert that we can checkout the superproject commit with --recurse-submodules
+	! grep -E "error: Submodule .+ could not be updated" actual-checkout.err
+'
+
+test_expect_success "'--recurse-submodules' should ignore changed, inactive submodules" '
+	test_when_finished "checkout_main_branches" &&
+	# Fetch any leftover commits from other tests.
+	git -C downstream fetch --recurse-submodules &&
+	# Create new superproject commit with updated submodules
+	add_upstream_commit &&
+	(
+		cd submodule &&
+		(
+			cd subdir/deepsubmodule &&
+			git fetch &&
+			git checkout -q FETCH_HEAD
+		) &&
+		git add subdir/deepsubmodule &&
+		git commit -m "new deep submodule"
+	) &&
+	git add submodule &&
+	git commit -m "new submodule" &&
+
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git -c submodule.submodule.active=false fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+	git rev-parse --short HEAD >superhead &&
+	# Neither should be fetched because the submodule is inactive
+	rm subhead &&
+	rm deephead &&
+	verify_fetch_result actual.err
+'
+
+# Test that we properly fetch the submodules in the index as well as
+# submodules in other branches.
+test_expect_success 'setup downstream branch with other submodule' '
+	mkdir submodule2 &&
+	(
+		cd submodule2 &&
+		git init &&
+		echo sub2content >sub2file &&
+		git add sub2file &&
+		git commit -a -m new &&
+		git branch -M sub2
+	) &&
+	git checkout -b super-sub2-only &&
+	git submodule add "$pwd/submodule2" submodule2 &&
+	git commit -m "add sub2" &&
+	git checkout super &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules origin &&
+		git checkout super-sub2-only &&
+		# Explicitly run "git submodule update" because sub2 is new
+		# and has not been cloned.
+		git submodule update --init &&
+		git checkout --recurse-submodules super
+	)
+'
+
+test_expect_success "'--recurse-submodules' should fetch submodule commits in changed submodules and the index" '
+	test_when_finished "checkout_main_branches" &&
+	# Fetch any leftover commits from other tests.
+	git -C downstream fetch --recurse-submodules &&
+	# Create new commit in origin/super
+	add_upstream_commit &&
+	(
+		cd submodule &&
+		(
+			cd subdir/deepsubmodule &&
+			git fetch &&
+			git checkout -q FETCH_HEAD
+		) &&
+		git add subdir/deepsubmodule &&
+		git commit -m "new deep submodule"
+	) &&
+	git add submodule &&
+	git commit -m "new submodule" &&
+
+	# Create new commit in origin/super-sub2-only
+	git checkout super-sub2-only &&
+	(
+		cd submodule2 &&
+		test_commit --no-tag foo
+	) &&
+	git add submodule2 &&
+	git commit -m "new submodule2" &&
+
+	git checkout super &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules >../actual.out 2>../actual.err &&
+		git checkout --recurse-submodules origin/super-sub2-only 2>../actual-checkout.err
+	) &&
+	test_must_be_empty actual.out &&
+
+	# Assert that the submodules in the super branch are fetched
+	git rev-parse --short HEAD >superhead &&
+	git -C submodule rev-parse --short HEAD >subhead &&
+	git -C deepsubmodule rev-parse --short HEAD >deephead &&
+	verify_fetch_result actual.err &&
+	# Assert that submodule is read from the index, not from a commit
+	! grep "Fetching submodule submodule at commit" actual.err &&
+
+	# Assert that super-sub2-only and submodule2 were fetched even
+	# though another branch is checked out
+	super_sub2_only_head=$(git rev-parse --short super-sub2-only) &&
+	grep -E "\.\.${super_sub2_only_head}\s+super-sub2-only\s+-> origin/super-sub2-only" actual.err &&
+	grep "Fetching submodule submodule2 at commit $super_sub2_only_head" actual.err &&
+	sub2head=$(git -C submodule2 rev-parse --short HEAD) &&
+	grep -E "\.\.${sub2head}\s+sub2\s+-> origin/sub2" actual.err &&
+
+	# Assert that we can checkout the superproject commit with --recurse-submodules
+	! grep -E "error: Submodule .+ could not be updated" actual-checkout.err
+'
+
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
 	add_upstream_commit &&
 	echo a >> file &&
-- 
2.33.GIT



* [PATCH 8/8] submodule: fix bug and remove add_submodule_odb()
  2022-02-10  4:41 [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                   ` (6 preceding siblings ...)
  2022-02-10  4:41 ` [PATCH 7/8] fetch: fetch unpopulated, changed submodules Glen Choo
@ 2022-02-10  4:41 ` Glen Choo
  2022-02-10 22:54   ` Junio C Hamano
  2022-02-10 23:04   ` Jonathan Tan
  2022-02-10  7:07 ` [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Junio C Hamano
  2022-02-15 17:23 ` [PATCH v2 0/9] " Glen Choo
  9 siblings, 2 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-10  4:41 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan

add_submodule_odb() is a hack - it adds a submodule's odb as an
alternate, allowing the submodule's objects to be read via
the_repository. Its last caller is submodule_has_commits(), which calls
add_submodule_odb() to prepare for check_has_commit(). This used to be
necessary because check_has_commit() used the_repository's odb, but this
is no longer true as of 13a2f620b2 (submodule: pass repo to
check_has_commit(), 2021-10-08).

Removing add_submodule_odb() reveals a bug in check_has_commit(), where
check_has_commit() will segfault if the submodule is missing (e.g. the
user has not init-ed the submodule). This happens because the
submodule's struct repository cannot be initialized, but
check_has_commit() tries to clean up the uninitialized struct anyway.
This was masked by add_submodule_odb(), because add_submodule_odb()
fails when the submodule is missing, causing the caller to return early
and avoid calling check_has_commit().

Fix the bug and remove the call to add_submodule_odb(). Since
add_submodule_odb() has no more callers, remove it too.

Note that submodule odbs can still be added as alternates via
add_submodule_odb_by_path().

Signed-off-by: Glen Choo <chooglen@google.com>
---
This bug only exists because we can't call repo_clear() twice on the
same struct repository. So instead of just fixing this site, an
alternative (and maybe better) fix would be to fix repo_clear(). If
others think that's a good idea, I'll do that instead.
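
The alternative mentioned above, making the clearing function safe to call twice, can be sketched with a toy struct. Everything here (struct toy_repo, toy_repo_init(), toy_repo_clear(), toy_has_commit()) is hypothetical and is not git's struct repository, repo_submodule_init() or repo_clear(); the sketch only assumes, as the note implies, that a failed init may already have cleared the struct before the caller's own cleanup runs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a repository handle (not git's struct repository). */
struct toy_repo {
	char *gitdir;   /* owned; NULL when unset */
	char *worktree; /* owned; NULL when unset */
};

/*
 * Idempotent clear: free owned memory and reset the pointers, so calling
 * it a second time on the same struct is a harmless no-op.
 */
static void toy_repo_clear(struct toy_repo *r)
{
	free(r->gitdir);
	free(r->worktree);
	r->gitdir = NULL;
	r->worktree = NULL;
}

static char *dup_str(const char *s)
{
	char *copy = malloc(strlen(s) + 1);

	if (copy)
		strcpy(copy, s);
	return copy;
}

/*
 * On failure this clears the struct itself, modeling an init helper whose
 * error path already released everything. gitdir must be non-NULL; a NULL
 * worktree simulates a submodule that cannot be initialized.
 */
static int toy_repo_init(struct toy_repo *r, const char *gitdir,
			 const char *worktree)
{
	memset(r, 0, sizeof(*r));
	r->gitdir = dup_str(gitdir);
	if (!r->gitdir || !worktree)
		goto fail;
	r->worktree = dup_str(worktree);
	if (!r->worktree)
		goto fail;
	return 0;
fail:
	toy_repo_clear(r);
	return -1;
}

/* Keeps a single cleanup label; only safe because clearing twice is OK. */
static int toy_has_commit(const char *gitdir, const char *worktree)
{
	struct toy_repo sub;
	int result = 0;

	if (toy_repo_init(&sub, gitdir, worktree))
		goto cleanup;
	result = 1; /* stand-in for the real object lookup */
cleanup:
	toy_repo_clear(&sub);
	return result;
}
```

The patch takes the other route (return early when init fails), which leaves the clearing function untouched; the sketch shows why an idempotent clear would let the original "goto cleanup" stay.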

 submodule.c | 35 ++---------------------------------
 submodule.h |  9 ++++-----
 2 files changed, 6 insertions(+), 38 deletions(-)

diff --git a/submodule.c b/submodule.c
index 0c02bbc9c3..fdfddd3aac 100644
--- a/submodule.c
+++ b/submodule.c
@@ -168,26 +168,6 @@ void stage_updated_gitmodules(struct index_state *istate)
 
 static struct string_list added_submodule_odb_paths = STRING_LIST_INIT_NODUP;
 
-/* TODO: remove this function, use repo_submodule_init instead. */
-int add_submodule_odb(const char *path)
-{
-	struct strbuf objects_directory = STRBUF_INIT;
-	int ret = 0;
-
-	ret = strbuf_git_path_submodule(&objects_directory, path, "objects/");
-	if (ret)
-		goto done;
-	if (!is_directory(objects_directory.buf)) {
-		ret = -1;
-		goto done;
-	}
-	string_list_insert(&added_submodule_odb_paths,
-			   strbuf_detach(&objects_directory, NULL));
-done:
-	strbuf_release(&objects_directory);
-	return ret;
-}
-
 void add_submodule_odb_by_path(const char *path)
 {
 	string_list_insert(&added_submodule_odb_paths, xstrdup(path));
@@ -972,7 +952,8 @@ static int check_has_commit(const struct object_id *oid, void *data)
 
 	if (repo_submodule_init(&subrepo, cb->repo, cb->path, cb->super_oid)) {
 		cb->result = 0;
-		goto cleanup;
+		/* subrepo failed to init, so don't clean it up. */
+		return 0;
 	}
 
 	type = oid_object_info(&subrepo, oid, NULL);
@@ -1003,18 +984,6 @@ static int submodule_has_commits(struct repository *r,
 {
 	struct has_commit_data has_commit = { r, 1, path, super_oid };
 
-	/*
-	 * Perform a cheap, but incorrect check for the existence of 'commits'.
-	 * This is done by adding the submodule's object store to the in-core
-	 * object store, and then querying for each commit's existence.  If we
-	 * do not have the commit object anywhere, there is no chance we have
-	 * it in the object store of the correct submodule and have it
-	 * reachable from a ref, so we can fail early without spawning rev-list
-	 * which is expensive.
-	 */
-	if (add_submodule_odb(path))
-		return 0;
-
 	oid_array_for_each_unique(commits, check_has_commit, &has_commit);
 
 	if (has_commit.result) {
diff --git a/submodule.h b/submodule.h
index 784ceffc0e..ca1f12b78b 100644
--- a/submodule.h
+++ b/submodule.h
@@ -103,12 +103,11 @@ int submodule_uses_gitfile(const char *path);
 int bad_to_remove_submodule(const char *path, unsigned flags);
 
 /*
- * Call add_submodule_odb() to add the submodule at the given path to a list.
- * When register_all_submodule_odb_as_alternates() is called, the object stores
- * of all submodules in that list will be added as alternates in
- * the_repository.
+ * Call add_submodule_odb_by_path() to add the submodule at the given
+ * path to a list. When register_all_submodule_odb_as_alternates() is
+ * called, the object stores of all submodules in that list will be
+ * added as alternates in the_repository.
  */
-int add_submodule_odb(const char *path);
 void add_submodule_odb_by_path(const char *path);
 int register_all_submodule_odb_as_alternates(void);
 
-- 
2.33.GIT



* Re: [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules
  2022-02-10  4:41 [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                   ` (7 preceding siblings ...)
  2022-02-10  4:41 ` [PATCH 8/8] submodule: fix bug and remove add_submodule_odb() Glen Choo
@ 2022-02-10  7:07 ` Junio C Hamano
  2022-02-10  8:51   ` Glen Choo
  2022-02-15 17:23 ` [PATCH v2 0/9] " Glen Choo
  9 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-02-10  7:07 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan

Glen Choo <chooglen@google.com> writes:

> = Background
>
> When fetching submodule commits, "git fetch --recurse-submodules" only
> considers populated submodules, and not all of the submodules in
> $GIT_DIR/modules as one might expect. As a result, "git fetch
> --recurse-submodules" behaves differently based on which commit is
> checked out.

After getting 'init'ed, which is a sign that the user is interested
in that submodule, when we happen to check out a superproject commit
that lacks the submodule in question, do we _lose_ the record that it
was once of interest?  I do not think so.  The cloned copy in
$GIT_DIR/modules/ would not go away, so we _could_ update it, even
if there is no checkout, when the superproject we currently have may
not have the submodule.

But I am not sure if that is a problem.  After all, the recursive
fetch tries to make sure that the superproject commit that is
checked out is reproduced as recorded by fetching the submodule
commit recorded in the superproject commit, not a commit that
happens to be at the tip of random branch in the submodule.

It is OK to allow fetching into a submodule that does not currently
have a checkout, but I think we should view it purely as prefetching.
We do not even know, after doing such a fetch in the submodule,
whether we have the commit necessary for the _next_ commit in the
superproject we will check out.

> This can be a problem, for instance, if the user has a branch with
> submodules and a branch without:
>
>   # the submodules were initialized at some point in history..
>   git checkout -b branch-with-submodules origin/branch-with-submodules
>   git submodule update --init
>
>   # later down the road..
>   git checkout --recurse-submodules branch-without-submodules
>   # no submodules are fetched!
>   git fetch --recurse-submodules
>   # if origin/branch-with-submodules has new submodule commits, this
>   # checkout will fail because we never fetched the submodule
>   git checkout --recurse-submodules branch-with-submodules

That is expected, and UNLESS we fetched _everything_ offered by the
remote repository to the submodule in the previous step, there is no
guarantee that this "recurse-submodules" checkout would succeed.

> This series makes "git fetch" fetch the right submodules regardless of
> which commit is checked out, as long as the submodule has already been
> cloned. In particular, "git fetch" learns to:
>
> 1. read submodules from the relevant superproject commit instead of
>    the file system
> 2. fetch all changed submodules, even if they are not populated

The real question is not "in which submodules we fetch", but "what
commits we fetch in these submodules".  I do not think there is a
good answer to the latter.

Of course, if we take this sequence instead:

	git checkout branch-with-submodules
	git fetch --recurse-submodules
	git checkout --recurse-submodules branch-with-submodules
	
things should work correctly (I think we both are assuming that the
other side allows fetching _any_ object, not just refs), as "fetch"
knows what superproject commit it is asked to complete, unlike the
previous example you gave, where it does not have a clue on what
superproject commit it is preparing submodules for, right?

So, I am not quite sure if we are solving the right problem, let
alone with the right approach.

Also, if the strategy is to prefetch in all submodules that were
'init'ed, we do not have to read .gitmodules from the superproject
commit at all, right?  We can just go check .git/modules/ which is
available locally.  We need to see which submodules are of local
interest by consulting .git/config and/or .git/modules/ anyway even
if we read .gitmodules from the superproject commit to learn what
modules are there.
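
Deciding "local interest" from configuration alone, as suggested above, might look like the following sketch. It assumes a simplified version of the rules documented in gitsubmodules(7): a submodule.<name>.active setting decides if present, otherwise a configured submodule.<name>.url counts as active. It deliberately skips the submodule.active pathspec step and full boolean parsing, and the in-memory struct config_entry array is a hypothetical stand-in for git's config API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flat config: key/value pairs, later entries win. */
struct config_entry {
	const char *key;
	const char *value;
};

static const char *config_get(const struct config_entry *cfg, size_t n,
			      const char *key)
{
	const char *found = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(cfg[i].key, key))
			found = cfg[i].value; /* override earlier entries */
	return found;
}

/*
 * Simplified check for a submodule name:
 *  1. submodule.<name>.active, if set, decides (only "true" counts here);
 *  2. otherwise a configured submodule.<name>.url means active.
 */
static int submodule_of_local_interest(const struct config_entry *cfg,
				       size_t n, const char *name)
{
	char key[256];
	const char *v;

	snprintf(key, sizeof(key), "submodule.%s.active", name);
	v = config_get(cfg, n, key);
	if (v)
		return !strcmp(v, "true");
	snprintf(key, sizeof(key), "submodule.%s.url", name);
	return config_get(cfg, n, key) != NULL;
}
```

An explicit "active = false" wins over a configured URL, which is the situation the "-c submodule.submodule.active=false" test in patch 7 exercises.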

Thanks.


* Re: [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules
  2022-02-10  7:07 ` [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Junio C Hamano
@ 2022-02-10  8:51   ` Glen Choo
  2022-02-10 17:40     ` Junio C Hamano
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-10  8:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> = Background
>>
>> When fetching submodule commits, "git fetch --recurse-submodules" only
>> considers populated submodules, and not all of the submodules in
>> $GIT_DIR/modules as one might expect. As a result, "git fetch
>> --recurse-submodules" behaves differently based on which commit is
>> checked out.
>
> After getting 'init'ed, which is a sign that the user is interested
> in that submodule, when we happen to check out a superproject commit
> that lacks the submodule in question, do we _lose_ the record that it
> was once of interest?  I do not think so.  The cloned copy in
> $GIT_DIR/modules/ would not go away, so we _could_ update it, even
> if there is no checkout, when the superproject we currently have may
> not have the submodule.
>
> But I am not sure if that is a problem.  After all, the recursive
> fetch tries to make sure that the superproject commit that is
> checked out is reproduced as recorded by fetching the submodule
> commit recorded in the superproject commit, not a commit that
> happens to be at the tip of random branch in the submodule.
>
> It is OK to allow fetching into a submodule that does not currently
> have a checkout, but I think we should view it purely as prefetching.
> We do not even know, after doing such a fetch in the submodule,
> whether we have the commit necessary for the _next_ commit in the
> superproject we will check out.

Hm, I may be misreading your message, but by "tip of random branch in
the submodule", did you mean "tip of random branch in the
_superproject_"?

If so, prior to this series, recursive fetch already fetches submodule
commits that are recorded by superproject commits other than the one
checked out. submodule.c:calculate_changed_submodule_paths() performs a
rev walk starting from the newly fetched superproject branch tips to
find missing submodule commits that are referenced by superproject
commits. These missing submodule commits are explicitly fetched by the
recursive fetch.

So we already do prefetching, but this series makes the prefetching
smarter by also prefetching in submodules that aren't checked out.

(I think my cover letter could have been clearer; I should have
explicitly called out the fact that we already do prefetching.)

>> This can be a problem, for instance, if the user has a branch with
>> submodules and a branch without:
>>
>>   # the submodules were initialized at some point in history..
>>   git checkout -b branch-with-submodules origin/branch-with-submodules
>>   git submodule update --init
>>
>>   # later down the road..
>>   git checkout --recurse-submodules branch-without-submodules
>>   # no submodules are fetched!
>>   git fetch --recurse-submodules
>>   # if origin/branch-with-submodules has new submodule commits, this
>>   # checkout will fail because we never fetched the submodule
>>   git checkout --recurse-submodules branch-with-submodules
>
> That is expected, and UNLESS we fetched _everything_ offered by the
> remote repository to the submodule in the previous step, there is no
> guarantee that this "recurse-submodules" checkout would succeed.

Yes. With the current prefetching, I don't think we make any guarantee
to the user that all submodule commits will be fetched (even if all of
the submodules are checked out).

But if I understand the "find changed submodules" rev walk correctly, we
look for changed submodules in the ancestry chains of the branch tips
(but I'm not sure how the rev walk decides to stop). So we might be
_very close_ to fetching all the commits that we think users care about
even though we don't guarantee that all commits will be fetched.

>> This series makes "git fetch" fetch the right submodules regardless of
>> which commit is checked out, as long as the submodule has already been
>> cloned. In particular, "git fetch" learns to:
>>
>> 1. read submodules from the relevant superproject commit instead of
>>    the file system
>> 2. fetch all changed submodules, even if they are not populated
>
> The real question is not "in which submodules we fetch", but "what
> commits we fetch in these submodules".  I do not think there is a
> good answer to the latter.
>
> Of course, if we take this sequence instead:
>
> 	git checkout branch-with-submodules
> 	git fetch --recurse-submodules
> 	git checkout --recurse-submodules branch-with-submodules
> 	
> things should work correctly (I think we both are assuming that the
> other side allows fetching _any_ object, not just refs), as "fetch"
> knows what superproject commit it is asked to complete, unlike the
> previous example you gave, where it does not have a clue on what
> superproject commit it is preparing submodules for, right?

So, given my prior description of recursive fetch, we actually _do_ know
which superproject commits to prepare for and which submodule commits to
fetch.

> Also, if the strategy is to prefetch in all submodules that were
> 'init'ed, we do not have to read .gitmodules from the superproject
> commit at all, right?  We can just go check .git/modules/ which is
> available locally.  We need to see which submodules are of local
> interest by consulting .git/config and/or .git/modules/ anyway even
> if we read .gitmodules from the superproject commit to learn what
> modules are there.

Hm, good point. Finding submodules of interest in .git/modules or
.git/config sounds like common sense (it's more obvious than trying to
identify all submodules by doing a rev walk at least). 

That said, just looking at what submodules we have doesn't tell us which
submodule commits we need, which is why we have the "find changed
submodules" rev walk. And since we already have the rev walk (which
tells us which superproject commits we care about), it's not that much
effort to fetch non-checked-out submodules.

So I think we'd eventually want to consult .git/modules and .git/config
(we'll have to do this when we start teaching "git fetch" to clone new
submodules, for example) but it's unnecessary for this series.


* Re: [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules
  2022-02-10  8:51   ` Glen Choo
@ 2022-02-10 17:40     ` Junio C Hamano
  2022-02-11  2:39       ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-02-10 17:40 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan

Glen Choo <chooglen@google.com> writes:

>> It is OK to allow fetching into a submodule that does not currently
>> have a checkout, but I think we should view it purely as prefetching.
>> We do not even know, after doing such a fetch in the submodule,
>> whether we have the commit necessary for the _next_ commit in the
>> superproject we will check out.
>
> Hm, I may be misreading your message, but by "tip of random branch in
> the submodule", did you mean "tip of random branch in the
> _superproject_"?

No, I meant something like "git submodule foreach 'git fetch --all'"
(or without '--all' to fetch whatever the refspec there tells us),
i.e. tips of branches in the submodule.

>> The real question is not "in which submodules we fetch", but "what
>> commits we fetch in these submodules".  I do not think there is a
>> good answer to the latter.
>>
>> Of course, if we take this sequence instead:
>>
>> 	git checkout branch-with-submodules
>> 	git fetch --recurse-submodules
>> 	git checkout --recurse-submodules branch-with-submodules
>> 	
>> things should work correctly (I think we both are assuming that the
>> other side allows fetching _any_ object, not just refs), as "fetch"
>> knows what superproject commit it is asked to complete, unlike the
>> previous example you gave, where it does not have a clue on what
>> superproject commit it is preparing submodules for, right?
>
> So, given my prior description of recursive fetch, we actually _do_ know
> which superproject commits to prepare for and which submodule commits to
> fetch.

Just to make sure I understand what is going on, let me rephrase.

 * To find out which submodule commits we need to fetch, we find new
   commits in the superproject we just fetched, inspect the trees of
   these commits to see gitlinks that name commits we need to fetch
   into the submodule repositories.

 * For that to work well, we need to map the path at which each of
   these commits appears in the trees of the superproject to the
   submodule repository these commits should be fetched into.  And
   to map paths to submodule names, we need to read .gitmodules from
   the same superproject commit we found the submodule commit in (as
   during the history of the superproject, the submodule may have
   moved around).

If so, I understand why being able to read .gitmodules from
superproject commits is essential.  The flow would become like

 (1) fetch in the superproject

 (2) iterate over each new superproject commit:
     - read its .gitmodules
     - iterate over each gitlink found in the superproject commit:
       - map the path we found gitlink at into module name
       - find the submodule repository initialized for the module
         - if the submodule is not of local interest, skip
         - add the submodule commit pointed by gitlink to the
           set of commits that need to be fetched for the submodule [*]

 (3) iterate over each submodule in which we found one or more commits
     that need to be fetched, and fetch these commits (we do not have
     to go over the network to re-fetch commits that exist in the
     object store and are reachable from the refs, but "fetch"
     already knows how to optimize that).
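
Step (2) of this flow can be modeled with a small, self-contained sketch. Every type and function here (struct super_commit, struct wanted, collect_wanted(), and so on) is hypothetical, and the fixed-size arrays are only for brevity; real code would walk commits with git's revision machinery and read each commit's .gitmodules through the submodule-config code. What it illustrates is that the path-to-name lookup uses the same commit the gitlink came from, so a submodule that moved between commits still accumulates all its commits under one name, and that names not of local interest are skipped.

```c
#include <string.h>

#define MAX_ENTRIES 8 /* per-commit limit for .gitmodules entries and gitlinks */
#define MAX_SUBS 8    /* distinct submodule names tracked */
#define MAX_WANTS 16  /* commits collected per submodule */

/* One .gitmodules entry: where a submodule lives in this commit, and its name. */
struct path_name {
	const char *path;
	const char *name;
};

/* One gitlink found in a superproject tree: path plus submodule commit id. */
struct gitlink {
	const char *path;
	const char *oid;
};

/* Hypothetical view of one newly fetched superproject commit. */
struct super_commit {
	struct path_name modules[MAX_ENTRIES]; /* parsed from its .gitmodules */
	size_t nr_modules;
	struct gitlink links[MAX_ENTRIES];     /* gitlinks in its tree */
	size_t nr_links;
};

/* Commits that need to be fetched into one submodule. */
struct wanted {
	const char *name;
	const char *oids[MAX_WANTS];
	size_t nr;
};

static const char *path_to_name(const struct super_commit *c, const char *path)
{
	size_t i;

	for (i = 0; i < c->nr_modules; i++)
		if (!strcmp(c->modules[i].path, path))
			return c->modules[i].name;
	return NULL; /* gitlink with no .gitmodules entry in this commit */
}

static int in_list(const char *const *list, size_t nr, const char *s)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(list[i], s))
			return 1;
	return 0;
}

static struct wanted *find_or_add(struct wanted *w, size_t *nr, const char *name)
{
	size_t i;

	for (i = 0; i < *nr; i++)
		if (!strcmp(w[i].name, name))
			return &w[i];
	if (*nr == MAX_SUBS)
		return NULL;
	w[*nr].name = name;
	w[*nr].nr = 0;
	return &w[(*nr)++];
}

/* Step (2): returns how many submodules in out[] have commits to fetch. */
static size_t collect_wanted(const struct super_commit *commits, size_t nr_commits,
			     const char *const *active, size_t nr_active,
			     struct wanted *out)
{
	size_t nr_out = 0, i, j;

	for (i = 0; i < nr_commits; i++) {
		const struct super_commit *c = &commits[i];

		for (j = 0; j < c->nr_links; j++) {
			const char *oid = c->links[j].oid;
			/* Map path to name using this commit's own .gitmodules. */
			const char *name = path_to_name(c, c->links[j].path);
			struct wanted *w;

			if (!name || !in_list(active, nr_active, name))
				continue; /* unknown, or not of local interest */
			w = find_or_add(out, &nr_out, name);
			if (!w || in_list(w->oids, w->nr, oid) || w->nr == MAX_WANTS)
				continue;
			w->oids[w->nr++] = oid;
		}
	}
	return nr_out;
}
```

Step (3) would then run one fetch per entry in out[]; the sketch leaves that out, along with skipping commits the submodule already has, since "fetch" handles that itself.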

Am I on the right track?

Thanks.


* Re: [PATCH 2/8] submodule: store new submodule commits oid_array in a struct
  2022-02-10  4:41 ` [PATCH 2/8] submodule: store new submodule commits oid_array in a struct Glen Choo
@ 2022-02-10 19:00   ` Jonathan Tan
  2022-02-10 22:05     ` Junio C Hamano
  0 siblings, 1 reply; 149+ messages in thread
From: Jonathan Tan @ 2022-02-10 19:00 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git

Glen Choo <chooglen@google.com> writes:
> +/*
> + * Holds relevant information for a changed submodule. Used as the .util
> + * member of the changed submodule string_list_item.
> + */
> +struct changed_submodule_data {
> +	/* The submodule commits that have changed in the rev walk. */
> +	struct oid_array *new_commits;
> +};

Overall this change is straightforward and looks good, except that I
think that the struct oid_array can be embedded directly instead of
through a pointer.
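
A minimal sketch of the embedded layout, assuming a hypothetical struct id_array as a stand-in for struct oid_array (the struct name changed_submodule_data follows the patch; the helpers are made up for illustration). Embedding the array means one allocation per entry and a clear-then-free teardown, with no separate array header to allocate or free.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct oid_array: a growable array of ids. */
struct id_array {
	unsigned *ids;
	size_t nr, alloc;
};
#define ID_ARRAY_INIT { NULL, 0, 0 }

static int id_array_append(struct id_array *a, unsigned id)
{
	if (a->nr == a->alloc) {
		size_t alloc = a->alloc ? 2 * a->alloc : 4;
		unsigned *ids = realloc(a->ids, alloc * sizeof(*ids));

		if (!ids)
			return -1;
		a->ids = ids;
		a->alloc = alloc;
	}
	a->ids[a->nr++] = id;
	return 0;
}

static void id_array_clear(struct id_array *a)
{
	free(a->ids);
	a->ids = NULL;
	a->nr = a->alloc = 0;
}

/* Embedded member instead of "struct id_array *new_commits". */
struct changed_submodule_data {
	struct id_array new_commits;
};

static struct changed_submodule_data *changed_submodule_data_new(void)
{
	struct changed_submodule_data *d = malloc(sizeof(*d));

	if (d) {
		struct id_array empty = ID_ARRAY_INIT;
		d->new_commits = empty;
	}
	return d;
}

static void changed_submodule_data_free(struct changed_submodule_data *d)
{
	if (!d)
		return;
	id_array_clear(&d->new_commits); /* release the array's storage */
	free(d);                         /* then the single outer allocation */
}
```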


* Re: [PATCH 3/8] submodule: make static functions read submodules from commits
  2022-02-10  4:41 ` [PATCH 3/8] submodule: make static functions read submodules from commits Glen Choo
@ 2022-02-10 19:15   ` Jonathan Tan
  2022-02-11 10:07     ` Glen Choo
  2022-02-11 10:09     ` Glen Choo
  0 siblings, 2 replies; 149+ messages in thread
From: Jonathan Tan @ 2022-02-10 19:15 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git

Glen Choo <chooglen@google.com> writes:
> As a result, "git fetch" now reads changed submodules using the
> `.gitmodules` and path from super_oid's tree (which is where "git fetch"
> actually noticed the changed submodule) instead of the filesystem.

Could we have a test showing what has changed?

> @@ -1029,7 +1044,7 @@ static int submodule_needs_pushing(struct repository *r,
>  				   const char *path,
>  				   struct oid_array *commits)
>  {
> -	if (!submodule_has_commits(r, path, commits))
> +	if (!submodule_has_commits(r, path, null_oid(), commits))

This confused me at first, but I see that this code is not for fetching,
but for pushing. This patch set concerns itself with fetching, so
passing null_oid() to preserve existing behavior is good.

> @@ -1414,12 +1429,13 @@ static const struct submodule *get_non_gitmodules_submodule(const char *path)
>  }
>  
>  static struct fetch_task *fetch_task_create(struct repository *r,
> -					    const char *path)
> +					    const char *path,
> +					    const struct object_id *treeish_name)
>  {
>  	struct fetch_task *task = xmalloc(sizeof(*task));
>  	memset(task, 0, sizeof(*task));
>  
> -	task->sub = submodule_from_path(r, null_oid(), path);
> +	task->sub = submodule_from_path(r, treeish_name, path);

If there is not a good reason to have "path" before "treeish_name",
probably best to maintain the same parameter order as
submodule_from_path().

> @@ -1476,7 +1493,7 @@ static int get_next_submodule(struct child_process *cp,
>  		if (!S_ISGITLINK(ce->ce_mode))
>  			continue;
>  
> -		task = fetch_task_create(spf->r, ce->name);
> +		task = fetch_task_create(spf->r, ce->name, null_oid());

Hmm...is the plumbing incomplete? This code is about fetching, but we're
not passing any superproject commit OID here. If this will be fixed in a
future commit, maybe the distribution of what goes into each commit
needs to be revised.

> @@ -1499,7 +1516,7 @@ static int get_next_submodule(struct child_process *cp,
>  			continue;
>  		}
>  
> -		task->repo = get_submodule_repo_for(spf->r, task->sub->path);
> +		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());

Same comment here.


* Re: [PATCH 5/8] t5526: use grep to assert on fetches
  2022-02-10  4:41 ` [PATCH 5/8] t5526: use grep " Glen Choo
@ 2022-02-10 19:17   ` Jonathan Tan
  0 siblings, 0 replies; 149+ messages in thread
From: Jonathan Tan @ 2022-02-10 19:17 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git

Glen Choo <chooglen@google.com> writes:
> In a previous commit, we replaced test_cmp invocations with
> verify_fetch_result(). Finish the process of removing test_cmp by using
> grep in verify_fetch_result() instead.

Thanks; this and the previous patch look like a better scheme for
testing.


* Re: [PATCH 6/8] submodule: extract get_fetch_task()
  2022-02-10  4:41 ` [PATCH 6/8] submodule: extract get_fetch_task() Glen Choo
@ 2022-02-10 19:33   ` Jonathan Tan
  0 siblings, 0 replies; 149+ messages in thread
From: Jonathan Tan @ 2022-02-10 19:33 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git

Glen Choo <chooglen@google.com> writes:
> Extract the index iterating code into an iterator function,
> get_fetch_task(), so that get_next_submodule() is agnostic of how
> to find submodules. This prepares for a subsequent commit that will teach the
> fetch machinery to also iterate through the list of changed
> submodules (in addition to the index).

The transformation looks correct, but there are several things that
would have made it much easier to review.

> @@ -1507,41 +1505,17 @@ static int get_next_submodule(struct child_process *cp,

[snip]

> -		if (task->repo) {
> -			struct strbuf submodule_prefix = STRBUF_INIT;
> -			child_process_init(cp);
> -			cp->dir = task->repo->gitdir;
> -			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
> -			cp->git_cmd = 1;
> -			if (!spf->quiet)
> -				strbuf_addf(err, _("Fetching submodule %s%s\n"),
> -					    spf->prefix, ce->name);
> -			strvec_init(&cp->args);
> -			strvec_pushv(&cp->args, spf->args.v);
> -			strvec_push(&cp->args, default_argv);
> -			strvec_push(&cp->args, "--submodule-prefix");
> -
> -			strbuf_addf(&submodule_prefix, "%s%s/",
> -						       spf->prefix,
> -						       task->sub->path);
> -			strvec_push(&cp->args, submodule_prefix.buf);
> -
> -			spf->count++;
> -			*task_cb = task;
> -
> -			strbuf_release(&submodule_prefix);
> -			return 1;
> -		} else {
> +		if (!task->repo) {
>  			struct strbuf empty_submodule_path = STRBUF_INIT;
>  
>  			fetch_task_release(task);
> @@ -1562,7 +1536,44 @@ static int get_next_submodule(struct child_process *cp,
>  					    ce->name);
>  			}
>  			strbuf_release(&empty_submodule_path);
> +			continue;
>  		}
> +		if (!spf->quiet)
> +			strbuf_addf(err, _("Fetching submodule %s%s\n"),
> +				    spf->prefix, ce->name);
> +
> +		spf->count++;
> +		return task;
> +	}
> +	return NULL;
> +}

You could have retained the "if (task->repo) { } else { }" structure
instead of adding a "continue;".

Also, the "if (!spf->quiet)" could be moved into get_next_submodule(),
but I see why it's there (it needs ce->name, which we otherwise don't
need), so leaving it where it is in this patch is fine too.

> +		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix,
> +			    task->sub->path);

It would have been clearer if this line wasn't rewrapped.

As a reviewer, sometimes it's hard to make the tradeoff between asking
the author to make formatting changes versus leaving it alone because
the reviewer has already inspected the changes and decided that any
errors are only in formatting, not in logic. In this case, though,
because there is only one more patch in the series and the formatting
change I'm suggesting here won't really affect it that much, I think
it's better if you make the formatting change for the benefit of other
reviewers who are currently reviewing this patch set, and anyone looking
at this commit in the future.

* Re: [PATCH 2/8] submodule: store new submodule commits oid_array in a struct
  2022-02-10 19:00   ` Jonathan Tan
@ 2022-02-10 22:05     ` Junio C Hamano
  0 siblings, 0 replies; 149+ messages in thread
From: Junio C Hamano @ 2022-02-10 22:05 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Glen Choo, git

Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> +/*
>> + * Holds relevant information for a changed submodule. Used as the .util
>> + * member of the changed submodule string_list_item.
>> + */
>> +struct changed_submodule_data {
>> +	/* The submodule commits that have changed in the rev walk. */
>> +	struct oid_array *new_commits;
>> +};
>
> Overall this change is straightforward and looks good, except that I
> think that the struct oid_array can be embedded directly instead of
> through a pointer.

True.

I am a bit behind and haven't seen the simplicity 1/8 promised to
bring to us, but hopefully we'll see soon enough why 1/8 is a good
idea.
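For illustration, a minimal sketch of Jonathan's suggestion, using simplified stand-ins for git's `struct object_id` and `struct oid_array` (assumption: the real types carry more fields and have their own helpers, so this is not git's code). With the array embedded, a single zeroed allocation yields a usable empty array, and there is no second allocation to free:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Simplified stand-ins, NOT git's real definitions: git's object_id
 * and oid_array carry more fields (hash algorithm, sorted flag, ...).
 */
struct object_id {
	unsigned char hash[32];
};

struct oid_array {
	struct object_id *oid;
	size_t nr;
	size_t alloc;
};

static void oid_array_append(struct oid_array *a, const struct object_id *oid)
{
	if (a->nr == a->alloc) {
		size_t new_alloc = a->alloc ? 2 * a->alloc : 8;
		struct object_id *grown = realloc(a->oid, new_alloc * sizeof(*grown));

		if (!grown)
			abort();
		a->oid = grown;
		a->alloc = new_alloc;
	}
	a->oid[a->nr++] = *oid;
}

static void oid_array_clear(struct oid_array *a)
{
	free(a->oid);
	a->oid = NULL;
	a->nr = a->alloc = 0;
}

/* The array is embedded, not pointed to: one allocation per entry. */
struct changed_submodule_data {
	struct oid_array new_commits;
};

static struct changed_submodule_data *changed_submodule_data_new(void)
{
	/* calloc() zero-fills, which is already a valid empty oid_array */
	struct changed_submodule_data *d = calloc(1, sizeof(*d));

	if (!d)
		abort();
	return d;
}

static void changed_submodule_data_free(struct changed_submodule_data *d)
{
	assert(d != NULL);
	oid_array_clear(&d->new_commits);
	free(d);
}
```

This is the same shape as the collect_changed_submodules_cb() excerpt quoted later in the thread, which allocates the util member with xcalloc() and appends to `&cs_data->new_commits`.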

* Re: [PATCH 7/8] fetch: fetch unpopulated, changed submodules
  2022-02-10  4:41 ` [PATCH 7/8] fetch: fetch unpopulated, changed submodules Glen Choo
@ 2022-02-10 22:49   ` Junio C Hamano
  2022-02-11  7:15     ` Glen Choo
  2022-02-10 22:51   ` Jonathan Tan
  2022-02-14 10:17   ` Glen Choo
  2 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-02-10 22:49 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan

Glen Choo <chooglen@google.com> writes:

> @@ -1273,10 +1277,6 @@ static void calculate_changed_submodule_paths(struct repository *r,
>  	struct strvec argv = STRVEC_INIT;
>  	struct string_list_item *name;
>  
> -	/* No need to check if there are no submodules configured */
> -	if (!submodule_from_path(r, NULL, NULL))
> -		return;
> -

It looks to me that this hunk reverts 18322bad (fetch: skip
on-demand checking when no submodules are configured, 2011-09-09),
which tried to avoid high cost computation when we know there is no
submodule.  Intended?  Perhaps it should be replaced with an
equivalent check that (1) still says "we do care about submodules"
even if the current checkout has no submodules (i.e. ls-files shows
no gitlinks), but (2) says "no, there is nothing interesting" when
$GIT_COMMON_DIR/modules/ is empty or some other cheap check we can
use?
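One possible shape for such a cheap check, as a sketch: a hypothetical helper (not an existing git function) that only lists a directory such as `$GIT_COMMON_DIR/modules` and reports whether anything is in it. Building that path and wiring the helper into calculate_changed_submodule_paths() are left out. Errors other than a missing directory are treated as "maybe there are submodules", so the expensive walk is skipped only when it is clearly unnecessary:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): return 1 if modules_dir may
 * hold submodule repositories (it has an entry besides "." and "..",
 * or could not be read), 0 if it is missing or empty. Reading one
 * directory listing is far cheaper than a rev walk.
 */
static int has_submodule_gitdirs(const char *modules_dir)
{
	DIR *dir;
	struct dirent *de;
	int found = 0;

	assert(modules_dir != NULL);
	dir = opendir(modules_dir);
	if (!dir) {
		/* missing: nothing to do; any other error: be conservative */
		return errno == ENOENT ? 0 : 1;
	}
	while ((de = readdir(dir)) != NULL) {
		if (strcmp(de->d_name, ".") && strcmp(de->d_name, "..")) {
			found = 1;
			break;
		}
	}
	closedir(dir);
	return found;
}
```

An empty modules/ directory is a reasonable but not airtight signal: submodules whose git directories live elsewhere (for example, old clones with a .git directory inside the submodule worktree) may not show up there.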

> +get_fetch_task_from_index(struct submodule_parallel_fetch *spf,
> +			  const char **default_argv, struct strbuf *err)
>  {
> -	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
> -		const struct cache_entry *ce = spf->r->index->cache[spf->count];
> +	for (; spf->index_count < spf->r->index->cache_nr; spf->index_count++) {
> +		const struct cache_entry *ce =
> +			spf->r->index->cache[spf->index_count];
>  		struct fetch_task *task;
>  
>  		if (!S_ISGITLINK(ce->ce_mode))
> @@ -1495,6 +1499,15 @@ get_fetch_task(struct submodule_parallel_fetch *spf,
>  		if (!task)
>  			continue;
>  
> +		/*
> +		 * We might have already considered this submodule
> +		 * because we saw it when iterating the changed
> +		 * submodule names.
> +		 */
> +		if (string_list_lookup(&spf->seen_submodule_names,
> +				       task->sub->name))
> +			continue;
> +
>  		switch (get_fetch_recurse_config(task->sub, spf))
>  		{
>  		default:
> @@ -1542,7 +1555,69 @@ get_fetch_task(struct submodule_parallel_fetch *spf,
>  			strbuf_addf(err, _("Fetching submodule %s%s\n"),
>  				    spf->prefix, ce->name);
>  
> -		spf->count++;
> +		spf->index_count++;
> +		return task;
> +	}
> +	return NULL;
> +}

Sorry, but I am confused.  If we are gathering which submodules to
fetch from the changes to gitlinks in the range of superproject
changes, why do we even need to scan the index (i.e. the current
checkout in the superproject) to begin with?  If it was changed,
we'd know get_fetch_task_from_changed() would take care of it, and
if there was no change to the submodule between the superproject's
commits before and after the fetch, there is nothing gained from
fetching in the submodules, no?

> +static struct fetch_task *
> +get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
> +			    const char **default_argv, struct strbuf *err)
> +{

> @@ -1553,7 +1628,10 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
>  {
>  	struct submodule_parallel_fetch *spf = data;
>  	const char *default_argv = NULL;
> -	struct fetch_task *task = get_fetch_task(spf, &default_argv, err);
> +	struct fetch_task *task =
> +		get_fetch_task_from_index(spf, &default_argv, err);
> +	if (!task)
> +		task = get_fetch_task_from_changed(spf, &default_argv, err);

Hmph, interesting.  So if "from index" grabbed some submodules
already, then the "from the changes in the superproject, we know
these submodules need refreshing" step does not happen at all?  I am
afraid that I am still not following this...

* Re: [PATCH 7/8] fetch: fetch unpopulated, changed submodules
  2022-02-10  4:41 ` [PATCH 7/8] fetch: fetch unpopulated, changed submodules Glen Choo
  2022-02-10 22:49   ` Junio C Hamano
@ 2022-02-10 22:51   ` Jonathan Tan
  2022-02-14  4:24     ` Glen Choo
  2022-02-14 18:04     ` Glen Choo
  2022-02-14 10:17   ` Glen Choo
  2 siblings, 2 replies; 149+ messages in thread
From: Jonathan Tan @ 2022-02-10 22:51 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git

Glen Choo <chooglen@google.com> writes:
> submodule.c has a seemingly-unrelated change that teaches the "find
> changed submodules" rev walk to call is_repository_shallow(). This fixes
> what I believe is a legitimate bug - the rev walk would fail on a
> shallow repo.
> 
> Our test suite did not catch this prior to this commit because we skip
> the rev walk if .gitmodules is not found, and thus the test suite did
> not attempt the rev walk on a shallow clone. After this commit,
> we always attempt to find changed submodules (regardless of whether
> there is a .gitmodules file), and the test suite noticed the bug.

Is this bug present without the other code introduced in this patch? If
yes, it's better to put the bugfix in a separate patch with a test that
would have failed but now passes.

Some more high-level comments:

> @@ -1273,10 +1277,6 @@ static void calculate_changed_submodule_paths(struct repository *r,
>  	struct strvec argv = STRVEC_INIT;
>  	struct string_list_item *name;
>  
> -	/* No need to check if there are no submodules configured */
> -	if (!submodule_from_path(r, NULL, NULL))
> -		return;

I think this is removed because "no submodules configured" here actually
means "no submodules configured in the index", but submodules may be
configured in the superproject commits we're fetching.

I wonder if this should be mentioned in the commit message, but I'm OK
either way.

>  struct submodule_parallel_fetch {
> -	int count;
> +	int index_count;
> +	int changed_count;

Here (and elsewhere) we're checking both the index and the superproject
commits for .gitmodules. Do we still need to check the index?

> @@ -1495,6 +1499,15 @@ get_fetch_task(struct submodule_parallel_fetch *spf,
>  		if (!task)
>  			continue;
>  
> +		/*
> +		 * We might have already considered this submodule
> +		 * because we saw it when iterating the changed
> +		 * submodule names.
> +		 */
> +		if (string_list_lookup(&spf->seen_submodule_names,
> +				       task->sub->name))
> +			continue;

[snip]
> +		/*
> +		 * We might have already considered this submodule
> +		 * because we saw it in the index.
> +		 */
> +		if (string_list_lookup(&spf->seen_submodule_names, item.string))
> +			continue;

Hmm...it's odd that the checks happen in both places, when theoretically
we would do one after the other, so this check would only need to be in
one place. Maybe this is because of how we had to implement it (looping
over everything every time when we get the next fetch task) but if it's
easy to avoid, that would be great.

> +# Cleans up after tests that checkout branches other than the main ones
> +# in the tests.
> +checkout_main_branches() {
> +	git -C downstream checkout --recurse-submodules super &&
> +	git -C downstream/submodule checkout --recurse-submodules sub &&
> +	git -C downstream/submodule/subdir/deepsubmodule checkout --recurse-submodules deep
> +}

If we need to clean up in this way, I think it's better if we store a
pristine copy somewhere (e.g. pristine-downstream), delete downstream,
and copy it over when we need to.

> +# Test that we can fetch submodules in other branches by running fetch
> +# in a branch that has no submodules.
> +test_expect_success 'setup downstream branch without submodules' '
> +	(
> +		cd downstream &&
> +		git checkout --recurse-submodules -b no-submodules &&
> +		rm .gitmodules &&
> +		git rm submodule &&
> +		git add .gitmodules &&
> +		git commit -m "no submodules" &&
> +		git checkout --recurse-submodules super
> +	)
> +'

The tip of the branch indeed doesn't have any submodules, but when
fetching this branch, we might end up fetching some of the tip's
ancestors (depending on the repo we're fetching into), which do have
submodules. If we need a branch without submodules, I think that all
ancestors should not have submodules too.

That might be an argument for creating our own downstream and upstream
repos instead of reusing the existing ones.

> +test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '
> +	test_when_finished "checkout_main_branches" &&
> +	git -C downstream fetch --recurse-submodules &&
> +	# Create new superproject commit with updated submodules
> +	add_upstream_commit &&
> +	(
> +		cd submodule &&
> +		(
> +			cd subdir/deepsubmodule &&
> +			git fetch &&

Hmm...I thought submodule/subdir/deepsubmodule is upstream. Why is it
fetching?

> +	# Fetch the new superproject commit
> +	(
> +		cd downstream &&
> +		git switch --recurse-submodules no-submodules &&
> +		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err &&
> +		git checkout --recurse-submodules origin/super 2>../actual-checkout.err

This patch set is about fetching, so the checkout here seems odd. To
verify that the fetch happened successfully, I think that we should
obtain the hashes of the commits that we expect to be fetched from
upstream, and then verify that they are present downstream.

> +	# Assert that we can checkout the superproject commit with --recurse-submodules
> +	! grep -E "error: Submodule .+ could not be updated" actual-checkout.err

Negative greps are error-prone, since they will also appear to work if
the message was just misspelled. We should probably check that the
expected commit is present instead.

> +# Test that we properly fetch the submodules in the index as well as
> +# submodules in other branches.
> +test_expect_success 'setup downstream branch with other submodule' '
> +	mkdir submodule2 &&
> +	(
> +		cd submodule2 &&
> +		git init &&
> +		echo sub2content >sub2file &&
> +		git add sub2file &&
> +		git commit -a -m new &&
> +		git branch -M sub2
> +	) &&
> +	git checkout -b super-sub2-only &&
> +	git submodule add "$pwd/submodule2" submodule2 &&
> +	git commit -m "add sub2" &&
> +	git checkout super &&
> +	(
> +		cd downstream &&
> +		git fetch --recurse-submodules origin &&
> +		git checkout super-sub2-only &&
> +		# Explicitly run "git submodule update" because sub2 is new
> +		# and has not been cloned.
> +		git submodule update --init &&
> +		git checkout --recurse-submodules super
> +	)
> +'

I couldn't see the submodule in the index to be fetched; maybe it's
there somewhere but it's not obvious to me. Also, why do we
need to run "git submodule update"? This patch set concerns itself with
fetching existing submodules, not cloning new ones.

> +test_expect_success "'--recurse-submodules' should fetch submodule commits in changed submodules and the index" '

Same comment about where in the index is the submodule to be fetched.

* Re: [PATCH 8/8] submodule: fix bug and remove add_submodule_odb()
  2022-02-10  4:41 ` [PATCH 8/8] submodule: fix bug and remove add_submodule_odb() Glen Choo
@ 2022-02-10 22:54   ` Junio C Hamano
  2022-02-11  3:13     ` Glen Choo
  2022-02-10 23:04   ` Jonathan Tan
  1 sibling, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-02-10 22:54 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan

Glen Choo <chooglen@google.com> writes:

> add_submodule_odb() is a hack - it adds a submodule's odb as an
> alternate, allowing the submodule's objects to be read via
> the_repository. Its last caller is submodule_has_commits(), which calls
> add_submodule_odb() to prepare for check_has_commit(). This used to be
> necessary because check_has_commit() used the_repository's odb, but this
> is no longer true as of 13a2f620b2 (submodule: pass repo to
> check_has_commit(), 2021-10-08).

Yes!  I wonder if we can do this much earlier in the series (or even
an independent clean-up that the rest of the series depends on) and
have it graduate earlier?

* Re: [PATCH 8/8] submodule: fix bug and remove add_submodule_odb()
  2022-02-10  4:41 ` [PATCH 8/8] submodule: fix bug and remove add_submodule_odb() Glen Choo
  2022-02-10 22:54   ` Junio C Hamano
@ 2022-02-10 23:04   ` Jonathan Tan
  2022-02-11  3:18     ` Glen Choo
  2022-02-11 17:19     ` Junio C Hamano
  1 sibling, 2 replies; 149+ messages in thread
From: Jonathan Tan @ 2022-02-10 23:04 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git

Glen Choo <chooglen@google.com> writes:
> add_submodule_odb() is a hack - it adds a submodule's odb as an
> alternate, allowing the submodule's objects to be read via
> the_repository. Its last caller is submodule_has_commits(), which calls
> add_submodule_odb() to prepare for check_has_commit(). This used to be
> necessary because check_has_commit() used the_repository's odb, but this
> is no longer true as of 13a2f620b2 (submodule: pass repo to
> check_has_commit(), 2021-10-08).
> 
> Removing add_submodule_odb() reveals a bug in check_has_commit(), where
> check_has_commit() will segfault if the submodule is missing (e.g. the
> user has not init-ed the submodule). This happens because the
> submodule's struct repository cannot be initialized, but
> check_has_commit() tries to clean up the uninitialized struct anyway.
> This was masked by add_submodule_odb(), because add_submodule_odb()
> fails when the submodule is missing, causing the caller to return early
> and avoid calling check_has_commit().
> 
> Fix the bug and remove the call to add_submodule_odb(). Since
> add_submodule_odb() has no more callers, remove it too.
> 
> Note that submodule odbs can still be added as alternates via
> add_submodule_odb_by_path().
> 
> Signed-off-by: Glen Choo <chooglen@google.com>
> ---
> This bug only exists because we can't call repo_clear() twice on the
> same struct repository. So instead of just fixing this site, an
> alternative (and maybe better) fix would be to fix repo_clear(). If
> others think that's a good idea, I'll do that instead.

Reading the first paragraph of the commit message, I'm given the
impression that this is the last site in which we attempt to add a
submodule ODB as an alternate, but that is not true. This is indeed the
last usage of add_submodule_odb(), but add_submodule_odb_by_path() still
exists.

I think the primary point of this commit is to fix a latent bug in
check_has_commit(), and add_submodule_odb()'s role here is just hiding
it. Its hacky behavior does not play a role.

I would write the commit message like this:

  submodule: fix latent check_has_commit() bug

  check_has_commit() will attempt to clear a non-initialized struct
  repository if initialization fails. This bug is masked by its only
  caller, submodule_has_commits(), first calling add_submodule_odb().
  The latter fails if the repo does not exist, making
  submodule_has_commits() exit early and not invoke check_has_commit()
  in a situation in which initialization would fail.

  Fix this bug, and because calling add_submodule_odb() is no longer
  necessary as of 13a2f620b2 (submodule: pass repo to
  check_has_commit(), 2021-10-08), remove that call too.

  This is the last caller of add_submodule_odb(), so remove that
  function. (Adding submodule ODBs as alternates is still present in the
  form of add_submodule_odb_by_path().)
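A sketch of the bug pattern and both possible fixes, with hypothetical `repo_init_sketch()`/`repo_clear_sketch()` standing in for git's repo_init()/repo_clear() (assumption, following Glen's note about repo_clear(): a failed init already cleans up after itself, so the old caller ended up clearing a second time). The fixed caller returns early when init fails, and the clear function resets what it frees so that a second call would be harmless anyway:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct repository: owns one heap buffer. */
struct repo {
	char *gitdir;
};

/* Safe to call twice: the pointer is reset after it is freed. */
static void repo_clear_sketch(struct repo *r)
{
	free(r->gitdir);
	r->gitdir = NULL;
}

/* Returns 0 on success; on failure it cleans up after itself and returns -1. */
static int repo_init_sketch(struct repo *r, const char *gitdir, int exists)
{
	memset(r, 0, sizeof(*r));
	r->gitdir = malloc(strlen(gitdir) + 1);
	if (!r->gitdir)
		return -1;
	strcpy(r->gitdir, gitdir);
	if (!exists) {
		repo_clear_sketch(r); /* init undoes its own partial work */
		return -1;
	}
	return 0;
}

/*
 * Fixed caller: bail out as soon as init fails, so clear is only ever
 * called on a repository that was initialized successfully.
 * Returns 1 if the submodule could be inspected, 0 if it is missing.
 */
static int check_submodule_sketch(const char *gitdir, int exists)
{
	struct repo subrepo;

	if (repo_init_sketch(&subrepo, gitdir, exists))
		return 0;
	/* ... the real code would look up commits in subrepo here ... */
	repo_clear_sketch(&subrepo);
	return 1;
}
```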

* Re: [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules
  2022-02-10 17:40     ` Junio C Hamano
@ 2022-02-11  2:39       ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-11  2:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan

Junio C Hamano <gitster@pobox.com> writes:

>>> The real question is not "in which submodules we fetch", but "what
>>> commits we fetch in these submodules".  I do not think there is a
>>> good answer to the latter.
>>>
>>> Of course, if we take this sequence instead:
>>>
>>> 	git checkout branch-with-submodules
>>> 	git fetch --recurse-submodules
>>> 	git checkout --recurse-submodules branch-with-submodules
>>>
>>> things should work correctly (I think we both are assuming that the
>>> other side allows fetching _any_ object, not just refs), as "fetch"
>>> knows what superproject commit it is asked to complete, unlike the
>>> previous example you gave, where it does not have a clue on what
>>> superproject commit it is preparing submodules for, right?
>>
>> So, given my prior description of recursive fetch, we actually _do_ know
>> which superproject commits to prepare for and which submodule commits to
>> fetch.
>
> Just to make sure I understand what is going on, let me rephrase.
>
>  * To find out which submodule commits we need to fetch, we find new
>    commits in the superproject we just fetched, inspect the trees of
>    these commits to see gitlinks that name commits we need to fetch
>    into the submodule repositories.
>
>  * For that to work well, we need to know, from the path these
>    commits appear in the trees of the superproject, to find out from
>    which submodule to fetch these commits from.  And to make the
>    mapping from paths to submodule names, we need to read
>    .gitmodules from the same superproject commit we found the
>    submodule commit in (as during the history of the superproject,
>    the submodule may have moved around).
>
> If so, I understand why being able to read .gitmodules from
> superproject commits is essential.  The flow would become like
>
>  (1) fetch in the superproject
>
>  (2) iterate over each new superproject commit:
>      - read its .gitmodules
>      - iterate over each gitlink found in the superproject commit:
>        - map the path we found gitlink at into module name
>        - find the submodule repository initialized for the module
>          - if the submodule is not of local interest, skip
>          - add the submodule commit pointed by gitlink to the
>            set of commits that need to be fetched for the submodule [*]
>
>  (3) iterate over each submodule in which we found at least one commit
>      that needs to be fetched, and fetch these commits (we do not have
>      to go over the network to re-fetch commits that exist in the
>      object store and are reachable from the refs, but "fetch"
>      already knows how to optimize that).
>
> Am I on the right track?

Yup, I think that's quite an accurate description. In particular:

>  (2) iterate over each new superproject commit:
>      - read its .gitmodules

Prior to this series, .gitmodules was read from the filesystem, even
though we may have noticed the missing commit in a superproject commit
that is not checked out.
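To make the flow concrete, here is a sketch of step (2) with made-up types (not git's data structures): each superproject commit carries its own .gitmodules path-to-name mapping, and the commits to fetch are grouped by submodule *name*, so a submodule that moved between commits is still collected under one entry. The "not of local interest" filter is omitted:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model (not git's data structures). */
struct module_entry { const char *path; const char *name; }; /* .gitmodules */
struct gitlink { const char *path; const char *commit; };     /* changed gitlink */

struct super_commit {
	const struct module_entry *gitmodules; /* this commit's .gitmodules */
	size_t gitmodules_nr;
	const struct gitlink *gitlinks;        /* gitlinks changed by this commit */
	size_t gitlinks_nr;
};

#define MAX_WANTED 16
#define MAX_COMMITS 16

struct wanted {
	const char *name; /* submodule name, not path */
	const char *commits[MAX_COMMITS];
	size_t commits_nr;
};

/* Map a path to a name using the .gitmodules of *this* commit. */
static const char *name_from_commit(const struct super_commit *c, const char *path)
{
	size_t i;

	for (i = 0; i < c->gitmodules_nr; i++)
		if (!strcmp(c->gitmodules[i].path, path))
			return c->gitmodules[i].name;
	return NULL;
}

/* Find the entry for name, appending a new empty one if needed. */
static struct wanted *wanted_for(struct wanted *w, size_t *nr, const char *name)
{
	size_t i;

	for (i = 0; i < *nr; i++)
		if (!strcmp(w[i].name, name))
			return &w[i];
	assert(*nr < MAX_WANTED);
	w[*nr].name = name;
	w[*nr].commits_nr = 0;
	return &w[(*nr)++];
}

/* Step (2): collect, per submodule name, the commits to fetch. */
static size_t collect_wanted(const struct super_commit *commits, size_t nr,
			     struct wanted *out)
{
	size_t out_nr = 0, i, j;

	for (i = 0; i < nr; i++) {
		const struct super_commit *c = &commits[i];

		for (j = 0; j < c->gitlinks_nr; j++) {
			const char *name = name_from_commit(c, c->gitlinks[j].path);
			struct wanted *w;

			if (!name)
				continue; /* gitlink without a .gitmodules entry */
			w = wanted_for(out, &out_nr, name);
			assert(w->commits_nr < MAX_COMMITS);
			w->commits[w->commits_nr++] = c->gitlinks[j].commit;
		}
	}
	return out_nr;
}
```

For example, if a submodule named "lib" sits at path "a" in one fetched commit and at path "b" in the next, collect_wanted() returns a single entry for "lib" holding both commits; a mapping read only from the checked-out .gitmodules would know at most one of those paths.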

>  (3) iterate over each submodule in which we found at least one commit
>      that needs to be fetched, and fetch these commits

Yes, this accurately describes the new "fetch changed submodules"
behavior. However, we also attempt to fetch checked-out submodules, and
this is where the two fetching strategies, "yes" and "on-demand" [1],
matter:

"yes", aka "--recurse-submodules" tells "git fetch" to attempt to fetch
_every_ checked out submodule regardless of whether Git thinks it has
missing commits (if we do not find any missing commits, I believe it
defaults to the "git fetch <remote-name>" form). [2]

"on-demand", aka "--recurse-submodules=on-demand", is the default
option, in the sense that it is used when no "--recurse-submodules"
argument is passed at all (passing "--recurse-submodules" without "="
means "yes"). With "on-demand", we _only_ attempt to fetch changed
submodules. Compared to "yes", this strategy makes no difference for
submodules that are not checked out, because we only fetch those when
they have changed anyway, but it lets us skip unchanged, checked-out
submodules.
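The two strategies, as described in this message, can be summarized as a small decision function. This is an illustrative model only: `struct sub_state`, its fields, and `should_fetch()` are hypothetical, and the real code derives this information from config, the index and the changed-submodule walk:

```c
#include <assert.h>

enum recurse_mode { RECURSE_NO, RECURSE_YES, RECURSE_ON_DEMAND };

/* Hypothetical summary of one submodule's state (not a git struct). */
struct sub_state {
	int cloned;    /* has a repository under $GIT_DIR/modules */
	int populated; /* checked out in the current superproject commit */
	int changed;   /* fetched superproject commits point at new commits */
};

static int should_fetch(enum recurse_mode mode, const struct sub_state *s)
{
	assert(!s->populated || s->cloned); /* populated implies cloned */

	switch (mode) {
	case RECURSE_NO:
		return 0;
	case RECURSE_YES:
		/* every checked-out submodule, plus changed cloned ones */
		return s->populated || (s->changed && s->cloned);
	case RECURSE_ON_DEMAND:
		/* only submodules whose commits changed */
		return s->changed && s->cloned;
	}
	return 0;
}
```

The two modes differ only for populated, unchanged submodules, which matches the observation that "on-demand" does not change how non-checked-out submodules are handled.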

[1] From Documentation/fetch-options.txt:

--recurse-submodules[=yes|on-demand|no]::
  This option controls if and under what conditions new commits of
  populated submodules should be fetched too. It can be used as a
  boolean option to completely disable recursion when set to 'no' or to
  unconditionally recurse into all populated submodules when set to
  'yes', which is the default when this option is used without any
  value. Use 'on-demand' to only recurse into a populated submodule
  when the superproject retrieves a commit that updates the submodule's
  reference to a commit that isn't already in the local submodule
  clone. By default, 'on-demand' is used, unless
  `fetch.recurseSubmodules` is set (see linkgit:git-config[1]).

[2] Sidenote on the "yes" option: I think the rationale for doing
unconditional fetches is not clear to reviewers. Admittedly, beyond 'the
test suite fails', I don't really remember why either. I'll dig into
this as I respond to the reviewer feedback.

* Re: [PATCH 8/8] submodule: fix bug and remove add_submodule_odb()
  2022-02-10 22:54   ` Junio C Hamano
@ 2022-02-11  3:13     ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-11  3:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> add_submodule_odb() is a hack - it adds a submodule's odb as an
>> alternate, allowing the submodule's objects to be read via
>> the_repository. Its last caller is submodule_has_commits(), which calls
>> add_submodule_odb() to prepare for check_has_commit(). This used to be
>> necessary because check_has_commit() used the_repository's odb, but this
>> is no longer true as of 13a2f620b2 (submodule: pass repo to
>> check_has_commit(), 2021-10-08).
>
> Yes!  I wonder if we can do this much earlier in the series (or even
> an independent clean-up that the rest of the series depends on) and
> have it graduate earlier?

This patch is totally conflict-free and dependency-free with regard to
the rest of the series, so there's almost no overhead to sending this as
an independent clean up.

And since you seem interested in seeing this graduate early, I'll send
it out independently :)

* Re: [PATCH 8/8] submodule: fix bug and remove add_submodule_odb()
  2022-02-10 23:04   ` Jonathan Tan
@ 2022-02-11  3:18     ` Glen Choo
  2022-02-11 17:19     ` Junio C Hamano
  1 sibling, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-11  3:18 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jonathan Tan, git

Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> add_submodule_odb() is a hack - it adds a submodule's odb as an
>> alternate, allowing the submodule's objects to be read via
>> the_repository. Its last caller is submodule_has_commits(), which calls
>> add_submodule_odb() to prepare for check_has_commit(). This used to be
>> necessary because check_has_commit() used the_repository's odb, but this
>> is no longer true as of 13a2f620b2 (submodule: pass repo to
>> check_has_commit(), 2021-10-08).
>> 
>> Removing add_submodule_odb() reveals a bug in check_has_commit(), where
>> check_has_commit() will segfault if the submodule is missing (e.g. the
>> user has not init-ed the submodule). This happens because the
>> submodule's struct repository cannot be initialized, but
>> check_has_commit() tries to clean up the uninitialized struct anyway.
>> This was masked by add_submodule_odb(), because add_submodule_odb()
>> fails when the submodule is missing, causing the caller to return early
>> and avoid calling check_has_commit().
>> 
>> Fix the bug and remove the call to add_submodule_odb(). Since
>> add_submodule_odb() has no more callers, remove it too.
>> 
>> Note that submodule odbs can still be added as alternates via
>> add_submodule_odb_by_path().
>> 
>> Signed-off-by: Glen Choo <chooglen@google.com>
>> ---
>> This bug only exists because we can't call repo_clear() twice on the
>> same struct repository. So instead of just fixing this site, an
>> alternative (and maybe better) fix would be to fix repo_clear(). If
>> others think that's a good idea, I'll do that instead.
>
> Reading the first paragraph of the commit message, I'm given the
> impression that this is the last site in which we attempt to add a
> submodule ODB as an alternate, but that is not true. This is indeed the
> last usage of add_submodule_odb(), but add_submodule_odb_by_path() still
> exists.
>
> I think the primary point of this commit is to fix a latent bug in
> check_has_commit(), and add_submodule_odb()'s role here is just hiding
> it. Its hacky behavior does not play a role.
>
> I would write the commit message like this:
>
>   submodule: fix latent check_has_commit() bug
>
>   check_has_commit() will attempt to clear a non-initialized struct
>   repository if initialization fails. This bug is masked by its only
>   caller, submodule_has_commits(), first calling add_submodule_odb().
>   The latter fails if the repo does not exist, making
>   submodule_has_commits() exit early and not invoke check_has_commit()
>   in a situation in which initialization would fail.
>
>   Fix this bug, and because calling add_submodule_odb() is no longer
>   necessary as of 13a2f620b2 (submodule: pass repo to
>   check_has_commit(), 2021-10-08), remove that call too.
>
>   This is the last caller of add_submodule_odb(), so remove that
>   function. (Adding submodule ODBs as alternates is still present in the
>   form of add_submodule_odb_by_path().)

Hm, that is a good point; the commit message seems to promise more than
what it actually delivers. I'll take your suggestion, thanks!

* Re: [PATCH 7/8] fetch: fetch unpopulated, changed submodules
  2022-02-10 22:49   ` Junio C Hamano
@ 2022-02-11  7:15     ` Glen Choo
  2022-02-11 17:07       ` Junio C Hamano
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-11  7:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan

Junio C Hamano <gitster@pobox.com> writes:

>> +static struct fetch_task *
>> +get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
>> +			    const char **default_argv, struct strbuf *err)
>> +{
>
>> @@ -1553,7 +1628,10 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
>>  {
>>  	struct submodule_parallel_fetch *spf = data;
>>  	const char *default_argv = NULL;
>> -	struct fetch_task *task = get_fetch_task(spf, &default_argv, err);
>> +	struct fetch_task *task =
>> +		get_fetch_task_from_index(spf, &default_argv, err);
>> +	if (!task)
>> +		task = get_fetch_task_from_changed(spf, &default_argv, err);
>
> Hmph, interesting.  So if "from index" grabbed some submodules
> already, then the "from the changes in the superproject, we know
> these submodules need refreshing" step does not happen at all?  I am
> afraid that I am still not following this...

Hm, perhaps the following will help:

- get_next_submodule() is an iterator, specifically, it is a
  get_next_task_fn passed to run_processes_parallel_tr2(). It gets
  called until it is exhausted.
- Since get_next_submodule() is an iterator, I've implemented
  get_fetch_task_from_index() and get_fetch_task_from_changed() as
  iterators (they return NULL when they are exhausted).

So in practice:

- We repeatedly call get_next_submodule(), which tries to get a fetch
  task by calling the get_fetch_task_* functions.
- If get_fetch_task_from_index() returns non-NULL, get_next_submodule()
  uses that fetch task.
- Eventually, we will have considered every submodule in the index. At
  that point, get_fetch_task_from_index() is exhausted and always
  returns NULL.
- Since get_fetch_task_from_index() returns NULL, get_next_submodule()
  now gets its fetch tasks from get_fetch_task_from_changed().
- Eventually, we will also have considered every changed submodule, and
  get_fetch_task_from_changed() is exhausted.
- get_next_submodule() has now been exhausted and we are done.
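The iteration described above can be modeled in a few lines. The names and fixed-size arrays below are hypothetical simplifications, not git's `submodule_parallel_fetch`. Because the index source is drained completely before the changed source is consulted, the "already seen" lookup is only needed in the second source, which is the single-check arrangement Jonathan asked about:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEEN 64

/* Hypothetical, simplified state (not git's submodule_parallel_fetch). */
struct fetch_iter {
	const char **index_names;   /* submodules found in the index */
	size_t index_nr, index_pos;
	const char **changed_names; /* submodules changed by fetched commits */
	size_t changed_nr, changed_pos;
	const char *seen[MAX_SEEN]; /* names already handed out */
	size_t seen_nr;
};

static int already_seen(const struct fetch_iter *it, const char *name)
{
	size_t i;

	for (i = 0; i < it->seen_nr; i++)
		if (!strcmp(it->seen[i], name))
			return 1;
	return 0;
}

/* First source: the index. Returns NULL once exhausted. */
static const char *next_from_index(struct fetch_iter *it)
{
	const char *name;

	if (it->index_pos >= it->index_nr)
		return NULL;
	name = it->index_names[it->index_pos++];
	assert(it->seen_nr < MAX_SEEN);
	it->seen[it->seen_nr++] = name; /* record only; nothing to skip yet */
	return name;
}

/* Second source: changed submodules, skipping ones the index produced. */
static const char *next_from_changed(struct fetch_iter *it)
{
	while (it->changed_pos < it->changed_nr) {
		const char *name = it->changed_names[it->changed_pos++];

		if (!already_seen(it, name))
			return name;
	}
	return NULL;
}

/* Mirrors the shape of get_next_submodule(): drain the index first. */
static const char *next_submodule(struct fetch_iter *it)
{
	const char *name = next_from_index(it);

	if (!name)
		name = next_from_changed(it);
	return name;
}
```

Each call returns one name until both sources are exhausted; from then on it keeps returning NULL, which a get_next_submodule()-style caller can treat as "no more tasks".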

As for the other questions, I'll dig a bit deeper before getting back to
you with answers.

* Re: [PATCH 3/8] submodule: make static functions read submodules from commits
  2022-02-10 19:15   ` Jonathan Tan
@ 2022-02-11 10:07     ` Glen Choo
  2022-02-11 10:09     ` Glen Choo
  1 sibling, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-11 10:07 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jonathan Tan, git


Hm, after reading your feedback, I'm starting to question whether this
patch makes sense in its current form.

Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> As a result, "git fetch" now reads changed submodules using the
>> `.gitmodules` and path from super_oid's tree (which is where "git fetch"
>> actually noticed the changed submodule) instead of the filesystem.
>
> Could we have a test showing what has changed?

Looking at this closer, I don't think a test is feasible, but even more
importantly, I don't think this behavior change is even desirable at
all.

I was confused when I wrote the commit message. Prior to this patch,
"git fetch" already records the names of changed submodules by correctly
reading .gitmodules from the given commit. From
collect_changed_submodules_cb(): 

		submodule = submodule_from_path(me->repo,
						commit_oid, p->two->path);
    [...]
		item = string_list_insert(changed, name);
		if (!item->util)
			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
		cs_data = item->util;
		oid_array_append(&cs_data->new_commits, &p->two->oid);

The only behavior that actually _does_ change is that we plumb super_oid
through submodule_has_commits(). "git fetch" invokes this function to
figure out if it already has all of the needed commits, and if so, skip
the fetch.

Before plumbing super_oid, we could not check for commits in an
unpopulated submodule, but now we do. We will need this when we fetch in
unpopulated submodules, but as of this patch, we never fetch in
unpopulated submodules anyway, so this check is just wasted effort.
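A rough C sketch of the "skip the fetch if all needed commits are already present" check described above, under simplifying assumptions: object IDs are strings, the submodule's object store is a plain array, and `has_all_commits()`/`needs_fetch()` are hypothetical names, not git's submodule_has_commits().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup in a submodule's object store (an array here). */
static int odb_has(const char **odb, size_t odb_nr, const char *oid)
{
	for (size_t i = 0; i < odb_nr; i++)
		if (!strcmp(odb[i], oid))
			return 1;
	return 0;
}

/* Returns 1 only if every needed commit is already present. */
static int has_all_commits(const char **odb, size_t odb_nr,
			   const char **needed, size_t needed_nr)
{
	for (size_t i = 0; i < needed_nr; i++)
		if (!odb_has(odb, odb_nr, needed[i]))
			return 0;
	return 1;
}

/* The fetch can be skipped when no needed commit is missing. */
static int needs_fetch(const char **odb, size_t odb_nr,
		       const char **needed, size_t needed_nr)
{
	return !has_all_commits(odb, odb_nr, needed, needed_nr);
}
```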

And because we never fetch anyway, we can't test any meaningful
behavior. We could test whether or not we look up commits in the
submodule odb, but that's a lot of effort to spend on something we
don't need.

So we probably don't want this behavior change. I can preserve the
existing behavior by passing null_oid() instead, and pass super_oid in
the actual "fetch unpopulated submodules" patch.

>> @@ -1476,7 +1493,7 @@ static int get_next_submodule(struct child_process *cp,
>>  		if (!S_ISGITLINK(ce->ce_mode))
>>  			continue;
>>  
>> -		task = fetch_task_create(spf->r, ce->name);
>> +		task = fetch_task_create(spf->r, ce->name, null_oid());
>
> Hmm...is the plumbing incomplete? This code is about fetching, but we're
> not passing any superproject commit OID here. If this will be fixed in a
> future commit, maybe the distribution of what goes into each commit
> needs to be revised.
>
>> @@ -1499,7 +1516,7 @@ static int get_next_submodule(struct child_process *cp,
>>  			continue;
>>  		}
>>  
>> -		task->repo = get_submodule_repo_for(spf->r, task->sub->path);
>> +		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
>
> Same comment here.

This is intentional (though I admit that I also got confused rereading
this). Passing null_oid() means "read from the filesystem", which is
correct here because we are iterating through the index to get submodule
paths. We should read .gitmodules from the filesystem, not from a
possibly out-of-sync superproject commit.
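The null_oid() convention discussed here can be sketched as follows. This is a hypothetical illustration, not git's API: `struct oid_sketch` stands in for struct object_id, and `pick_gitmodules_source()` only shows how an all-zero OID can act as a sentinel meaning "use the filesystem/index" while any other OID means "use this superproject commit".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct object_id (SHA-1 sized). */
struct oid_sketch {
	unsigned char hash[20];
};

/* An all-zero hash plays the role of null_oid(). */
static int oid_sketch_is_null(const struct oid_sketch *oid)
{
	for (size_t i = 0; i < sizeof(oid->hash); i++)
		if (oid->hash[i])
			return 0;
	return 1;
}

enum gitmodules_source { FROM_WORKTREE, FROM_COMMIT };

/*
 * Hypothetical helper: a null treeish selects .gitmodules from the
 * filesystem/index; anything else selects .gitmodules from that
 * superproject commit's tree.
 */
static enum gitmodules_source
pick_gitmodules_source(const struct oid_sketch *treeish)
{
	return oid_sketch_is_null(treeish) ? FROM_WORKTREE : FROM_COMMIT;
}
```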

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 7/8] fetch: fetch unpopulated, changed submodules
  2022-02-11  7:15     ` Glen Choo
@ 2022-02-11 17:07       ` Junio C Hamano
  0 siblings, 0 replies; 149+ messages in thread
From: Junio C Hamano @ 2022-02-11 17:07 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan

Glen Choo <chooglen@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>>> +static struct fetch_task *
>>> +get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
>>> +			    const char **default_argv, struct strbuf *err)
>>> +{
>>
>>> @@ -1553,7 +1628,10 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
>>>  {
>>>  	struct submodule_parallel_fetch *spf = data;
>>>  	const char *default_argv = NULL;
>>> -	struct fetch_task *task = get_fetch_task(spf, &default_argv, err);
>>> +	struct fetch_task *task =
>>> +		get_fetch_task_from_index(spf, &default_argv, err);
>>> +	if (!task)
>>> +		task = get_fetch_task_from_changed(spf, &default_argv, err);
>>
>> Hmph, interesting.  So if "from index" grabbed some submodules
>> already, then the "from the changes in the superproject, we know
>> these submodules need refreshing" does not happen at all?  I am afraid
>> that I am still not following this...
>
> Hm, perhaps the following will help:
>
> - get_next_submodule() is an iterator, specifically, it is a
>   get_next_task_fn passed to run_processes_parallel_tr2(). It gets
>   called until it is exhausted.

Ahh, yeah, I totally forgot how we designed these things to work.

Even though these functions have a loop, (1) they start iterating at
the point where they left off in the last call, and (2) they return
as soon as they find the first item in the loop, which should have
stood out as a typical generator pattern, but somehow I missed these
signs.
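A minimal C sketch of the generator shape described above: the loop resumes from a saved position and returns on the first match, instead of collecting every match in one call. `struct entry`, `struct cursor` and `next_gitlink()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an index entry. */
struct entry {
	const char *name;
	int is_gitlink; /* nonzero for submodule entries */
};

struct cursor {
	const struct entry *entries;
	size_t nr;
	size_t pos; /* persists across calls */
};

/*
 * Generator: resume at c->pos, skip non-gitlinks, and return the first
 * gitlink found. Returns NULL when exhausted, and on every later call.
 */
static const char *next_gitlink(struct cursor *c)
{
	for (; c->pos < c->nr; c->pos++) {
		const struct entry *e = &c->entries[c->pos];
		if (!e->is_gitlink)
			continue;
		c->pos++; /* the next call resumes after this entry */
		return e->name;
	}
	return NULL;
}
```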

> So in practice:
> ...
> - get_next_submodule() has now been exhausted and we are done.

But my original question (based on my misunderstanding that a single
call to these would grab all submodules that needs fetching by
inspecting either the index or the history) still stands, doesn't it?

Presumably the "history scan" part is because we assume that we
already had all the necessary submodule commits to check out any
superproject commits before this recursive fetch started.  That is
the reason why we do not scan the history behind the "old tips".  We
inspect only the history newer than them, leading to the "new tips",
and try to grab all submodule commits that newly appear, to ensure
that we can check out all the superproject commits we just obtained
and have no missing submodule commits necessary to do so.  Combined
with the assumption on the state before this fetch that we had all
necessary submodule commits to check out the superproject commits up
to "old tips", we maintain the invariant that we can check out any
superproject commits recursively, no?

If we are doing so, especially with this series where we do the
"history scan" to complete submodule commits necessary for all new
commits in the superproject, regardless of the branch being checked
out in the superproject, why do we still need to scan the index to
ensure that the current checkout can recurse down the submodule
without hitting a missing commit?

The only case the "index scan" may make a difference is when the
assumption, the invariant that we can check out any superproject
commits recursively, did not hold before we started the fetch, no?
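The "history scan" reasoning above can be sketched in C under a strong simplifying assumption: the superproject history is linear, commits are indexed oldest first, and each commit records a single gitlink as an int. `collect_new_gitlinks()` is hypothetical; it only inspects commits after the old tip up to the new tip, and records a gitlink when it differs from the parent's. Commits at or before the old tip are skipped, relying on the invariant that their submodule commits are already present.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical superproject commit holding one gitlink value. */
struct super_commit {
	int gitlink;
};

/*
 * Collect gitlinks changed by commits in old_tip..new_tip (exclusive of
 * old_tip, inclusive of new_tip). Requires old_tip < new_tip.
 * Returns the number of values written to out.
 */
static size_t collect_new_gitlinks(const struct super_commit *history,
				   size_t old_tip, size_t new_tip,
				   int *out)
{
	size_t n = 0;
	for (size_t i = old_tip + 1; i <= new_tip; i++)
		if (history[i].gitlink != history[i - 1].gitlink)
			out[n++] = history[i].gitlink;
	return n;
}
```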

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 8/8] submodule: fix bug and remove add_submodule_odb()
  2022-02-10 23:04   ` Jonathan Tan
  2022-02-11  3:18     ` Glen Choo
@ 2022-02-11 17:19     ` Junio C Hamano
  2022-02-14  2:52       ` Glen Choo
  1 sibling, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-02-11 17:19 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Glen Choo, git

Jonathan Tan <jonathantanmy@google.com> writes:

> I would write the commit message like this:
>
>   submodule: fix latent check_has_commit() bug
>
>   check_has_commit() will attempt to clear a non-initialized struct
>   repository if initialization fails. This bug is masked by its only
>   caller, submodule_has_commits(), first calling add_submodule_odb().
>   The latter fails if the repo does not exist, making
>   submodule_has_commits() exit early and not invoke check_has_commit()
>   in a situation in which initialization would fail.
>
>   Fix this bug, and because calling add_submodule_odb() is no longer
>   necessary as of 13a2f620b2 (submodule: pass repo to
>   check_has_commit(), 2021-10-08), remove that call too.
>
>   This is the last caller of add_submodule_odb(), so remove that
>   function. (Adding submodule ODBs as alternates is still present in the
>   form of add_submodule_odb_by_path().)

Looks more clearly explained.  

We still end up calling add_to_alternate_memory(), so I take the
"let's have this early" back.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 8/8] submodule: fix bug and remove add_submodule_odb()
  2022-02-11 17:19     ` Junio C Hamano
@ 2022-02-14  2:52       ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-14  2:52 UTC (permalink / raw)
  To: Junio C Hamano, Jonathan Tan; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
>> I would write the commit message like this:
>>
>>   submodule: fix latent check_has_commit() bug
>>
>>   check_has_commit() will attempt to clear a non-initialized struct
>>   repository if initialization fails. This bug is masked by its only
>>   caller, submodule_has_commits(), first calling add_submodule_odb().
>>   The latter fails if the repo does not exist, making
>>   submodule_has_commits() exit early and not invoke check_has_commit()
>>   in a situation in which initialization would fail.
>>
>>   Fix this bug, and because calling add_submodule_odb() is no longer
>>   necessary as of 13a2f620b2 (submodule: pass repo to
>>   check_has_commit(), 2021-10-08), remove that call too.
>>
>>   This is the last caller of add_submodule_odb(), so remove that
>>   function. (Adding submodule ODBs as alternates is still present in the
>>   form of add_submodule_odb_by_path().)
>
> Looks more clearly explained.  
>
> We still end up calling add_to_alternate_memory(), so I take the
> "let's have this early" back.

Ok, so I won't send this patch separately. Thanks Jonathan for making
things clearer :)
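The latent bug in the proposed commit message, clearing a struct repository whose initialization failed, follows a common C pattern. Below is a minimal sketch of the fixed shape, with hypothetical `repo_sketch`, `repo_init_sketch()`, `repo_clear_sketch()` and `check_has_commit_sketch()` standing in for the real functions: on init failure, return early instead of falling through to the cleanup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct repository. */
struct repo_sketch {
	char *gitdir; /* owned; only valid after a successful init */
};

/* Returns 0 on success, -1 if the repository does not exist. */
static int repo_init_sketch(struct repo_sketch *r, const char *gitdir,
			    int exists)
{
	if (!exists)
		return -1; /* r is left untouched, i.e. uninitialized */
	r->gitdir = malloc(strlen(gitdir) + 1);
	if (!r->gitdir)
		return -1;
	strcpy(r->gitdir, gitdir);
	return 0;
}

static void repo_clear_sketch(struct repo_sketch *r)
{
	free(r->gitdir);
	r->gitdir = NULL;
}

/*
 * Fixed pattern: only clear what was successfully initialized.
 * Clearing after a failed init would free() an indeterminate pointer.
 */
static int check_has_commit_sketch(const char *gitdir, int exists)
{
	struct repo_sketch r; /* deliberately not zero-initialized */

	if (repo_init_sketch(&r, gitdir, exists))
		return 0; /* return early; do NOT clear r */
	/* ... the commit lookup would happen here ... */
	repo_clear_sketch(&r);
	return 1;
}
```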

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 7/8] fetch: fetch unpopulated, changed submodules
  2022-02-10 22:51   ` Jonathan Tan
@ 2022-02-14  4:24     ` Glen Choo
  2022-02-14 18:04     ` Glen Choo
  1 sibling, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-14  4:24 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jonathan Tan, git

Jonathan Tan <jonathantanmy@google.com> writes:

>> @@ -1495,6 +1499,15 @@ get_fetch_task(struct submodule_parallel_fetch *spf,
>>  		if (!task)
>>  			continue;
>>  
>> +		/*
>> +		 * We might have already considered this submodule
>> +		 * because we saw it when iterating the changed
>> +		 * submodule names.
>> +		 */
>> +		if (string_list_lookup(&spf->seen_submodule_names,
>> +				       task->sub->name))
>> +			continue;
>
> [snip]
>> +		/*
>> +		 * We might have already considered this submodule
>> +		 * because we saw it in the index.
>> +		 */
>> +		if (string_list_lookup(&spf->seen_submodule_names, item.string))
>> +			continue;
>
> Hmm...it's odd that the checks happen in both places, when theoretically
> we would do one after the other, so this check would only need to be in
> one place. Maybe this is because of how we had to implement it (looping
> over everything every time when we get the next fetch task) but if it's
> easy to avoid, that would be great.

Yes, in order for the code to be correct, we only need this check once,
but I chose to check twice for defensiveness. That is, we avoid creating
implicit dependencies between the functions like "function A does not
consider whether a submodule has already been fetched, so it must always
be called before function B".

Or is there another concern that overrides this, e.g. performance?
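A minimal sketch of the defensive approach: both generators consult the same seen set before creating a task, so neither depends on running first. `struct seen_set`, `seen_contains()` and `seen_add()` are hypothetical stand-ins for the string_list in spf->seen_submodule_names; a linear scan with a fixed capacity keeps the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEEN 64

/* Hypothetical stand-in for spf->seen_submodule_names. */
struct seen_set {
	const char *names[MAX_SEEN];
	size_t nr;
};

static int seen_contains(const struct seen_set *s, const char *name)
{
	for (size_t i = 0; i < s->nr; i++)
		if (!strcmp(s->names[i], name))
			return 1;
	return 0;
}

/*
 * Returns 1 if the name is newly recorded, 0 if it was already seen.
 * Both the index generator and the changed-submodule generator call
 * this before creating a task, so their call order does not matter.
 */
static int seen_add(struct seen_set *s, const char *name)
{
	if (seen_contains(s, name))
		return 0;
	assert(s->nr < MAX_SEEN); /* fixed capacity keeps the sketch short */
	s->names[s->nr++] = name;
	return 1;
}
```

On the performance question, the duplicated check costs one extra lookup per candidate submodule; with a sorted list, as string_list_lookup() relies on, each lookup is a binary search.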

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 7/8] fetch: fetch unpopulated, changed submodules
  2022-02-10  4:41 ` [PATCH 7/8] fetch: fetch unpopulated, changed submodules Glen Choo
  2022-02-10 22:49   ` Junio C Hamano
  2022-02-10 22:51   ` Jonathan Tan
@ 2022-02-14 10:17   ` Glen Choo
  2 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-14 10:17 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Junio C Hamano

Glen Choo <chooglen@google.com> writes:

> Teach "git fetch" to fetch cloned, changed submodules regardless of
> whether they are populated (this is in addition to the current behavior
> of fetching populated submodules).

Reviewers (and myself) have rightfully asked why "git fetch" should
continue to bother looking for submodules in the index if it already
fetches all of the changed submodules. The reasons for this are twofold:

1. The primary reason is that --recurse-submodules, aka
--recurse-submodules=yes, does an unconditional fetch in each of the
submodules regardless of whether they have been changed by a
superproject commit. This behavior is tested by e.g.
t/t5526-fetch-submodules.sh:101:

  test_expect_success "fetch --recurse-submodules recurses into submodules" '
    # Creates commits in the submodules but NOT the superproject
    add_upstream_commit &&
    (
      cd downstream &&
      git fetch --recurse-submodules >../actual.out 2>../actual.err
    ) &&
    test_must_be_empty actual.out &&
    # Assert that the new submodule commits have been fetched and that
    # no superproject commit was fetched
    verify_fetch_result actual.err
  '

Thus, we continue to check the index to implement this unconditional
fetching behavior.

2. In the --recurse-submodules=on-demand case, it can be correct to
ignore the index because "on-demand" only requires us to fetch changed
submodules. But in the event that a submodule is both changed and
populated, we may prefer to read the index instead of the superproject
commit, because the contents of the index are more obvious and more
actionable to the user.

For example, we print the path of the submodule when attempting to fetch
a submodule for debugging purposes:

- For a populated submodule, we print "Fetching submodule <path>"
- For an unpopulated submodule, we print "Fetching submodule <path> at
  commit foo"

Presumably, the user would prefer to see the "populated submodule"
message because it's easier to work with, e.g. "git -C <path>
<fix-the-problem>" instead of "git checkout --recurse-submodules
<commit-with-submodule> && git <fix-the-problem>".

The latter is not, on its own, a sufficient reason to read the index and
then the changed submodule list (because we could instead read the
changed submodule's configuration from the index), but since we need to
support --recurse-submodules=yes anyway, this implementation is a
convenient way to achieve both goals.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH 7/8] fetch: fetch unpopulated, changed submodules
  2022-02-10 22:51   ` Jonathan Tan
  2022-02-14  4:24     ` Glen Choo
@ 2022-02-14 18:04     ` Glen Choo
  1 sibling, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-14 18:04 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jonathan Tan, git

Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> submodule.c has a seemingly-unrelated change that teaches the "find
>> changed submodules" rev walk to call is_repository_shallow(). This fixes
>> what I believe is a legitimate bug - the rev walk would fail on a
>> shallow repo.
>> 
>> Our test suite did not catch this prior to this commit because we skip
>> the rev walk if .gitmodules is not found, and thus the test suite did
>> not attempt the rev walk on a shallow clone. After this commit,
>> we always attempt to find changed submodules (regardless of whether
>> there is a .gitmodules file), and the test suite noticed the bug.
>
> Is this bug present without the other code introduced in this patch? If
> yes, it's better to put the bugfix in a separate patch with a test that
> would have failed but now passes.

Makes sense, I'll do so.

>> @@ -1273,10 +1277,6 @@ static void calculate_changed_submodule_paths(struct repository *r,
>>  	struct strvec argv = STRVEC_INIT;
>>  	struct string_list_item *name;
>>  
>> -	/* No need to check if there are no submodules configured */
>> -	if (!submodule_from_path(r, NULL, NULL))
>> -		return;
>
> I think this is removed because "no submodules configured" here actually
> means "no submodules configured in the index", but submodules may be
> configured in the superproject commits we're fetching.
>
> I wonder if this should be mentioned in the commit message, but I'm OK
> either way.

Yes, your interpretation is correct. Though, as Junio mentioned in
<xmqqtud6e3r8.fsf@gitster.g>, I think we'd prefer to have _some kind_ of
check, even though this one no longer makes sense.

>
>>  struct submodule_parallel_fetch {
>> -	int count;
>> +	int index_count;
>> +	int changed_count;
>
> Here (and elsewhere) we're checking both the index and the superproject
> commits for .gitmodules. Do we still need to check the index?

Since this is a frequently asked question, I answered this elsewhere,
namely <kl6lczjp7nwj.fsf@chooglen-macbookpro.roam.corp.google.com>.

>> +# Cleans up after tests that checkout branches other than the main ones
>> +# in the tests.
>> +checkout_main_branches() {
>> +	git -C downstream checkout --recurse-submodules super &&
>> +	git -C downstream/submodule checkout --recurse-submodules sub &&
>> +	git -C downstream/submodule/subdir/deepsubmodule checkout --recurse-submodules deep
>> +}
>
> If we need to clean up in this way, I think it's better if we store a
> pristine copy somewhere (e.g. pristine-downstream), delete downstream,
> and copy it over when we need to.

The need for cleanup isn't that big; this just checks out the right
branches after we've checked out _other_ branches. If we remove
the checkout, we won't need this any more, and...

>> +	# Fetch the new superproject commit
>> +	(
>> +		cd downstream &&
>> +		git switch --recurse-submodules no-submodules &&
>> +		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err &&
>> +		git checkout --recurse-submodules origin/super 2>../actual-checkout.err
>
> This patch set is about fetching, so the checkout here seems odd. To
> verify that the fetch happened successfully, I think that we should
> obtain the hashes of the commits that we expect to be fetched from
> upstream, and then verify that they are present downstream.

If I understand this feedback correctly, the checkout is just an
indirect way of checking whether we have the commit, so it makes more
sense to check for the commit directly.

But explicitly checking for the commit (with "git cat-file -e" I
assume?) is probably overkill - verify_fetch_result() already checks for
this by grep-ing the output of "git fetch".

So I think it's ok to drop the checkout and not check for the commit
(beyond verify_fetch_result()).

>> +# Test that we can fetch submodules in other branches by running fetch
>> +# in a branch that has no submodules.
>> +test_expect_success 'setup downstream branch without submodules' '
>> +	(
>> +		cd downstream &&
>> +		git checkout --recurse-submodules -b no-submodules &&
>> +		rm .gitmodules &&
>> +		git rm submodule &&
>> +		git add .gitmodules &&
>> +		git commit -m "no submodules" &&
>> +		git checkout --recurse-submodules super
>> +	)
>> +'
>
> The tip of the branch indeed doesn't have any submodules, but when
> fetching this branch, we might end up fetching some of the tip's
> ancestors (depending on the repo we're fetching into), which do have
> submodules. If we need a branch without submodules, I think that all
> ancestors should not have submodules too.
>
> That might be an argument for creating our own downstream and upstream
> repos instead of reusing the existing ones.

I think I just made a silly wording error: I meant a "commit" or
"working tree state" without submodules, not a branch. The behavior I
wanted to test is whether or not changed submodules are fetched in
the absence of submodules and .gitmodules in the index/working tree.

>> +test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '
>> +	test_when_finished "checkout_main_branches" &&
>> +	git -C downstream fetch --recurse-submodules &&
>> +	# Create new superproject commit with updated submodules
>> +	add_upstream_commit &&
>> +	(
>> +		cd submodule &&
>> +		(
>> +			cd subdir/deepsubmodule &&
>> +			git fetch &&
>
> Hmm...I thought submodule/subdir/deepsubmodule is upstream. Why is it
> fetching?

Ah, deepsubmodule is a submodule in the "submodule/" repo, whose
remote is in "deepsubmodule/":

  test_expect_success setup '
    mkdir deepsubmodule &&
    (
      cd deepsubmodule &&
      git init &&
      echo deepsubcontent > deepsubfile &&
      git add deepsubfile &&
      git commit -m new deepsubfile &&
      git branch -M deep
    ) &&
    mkdir submodule &&
    (
      cd submodule &&
      git init &&
      echo subcontent > subfile &&
      git add subfile &&
      git submodule add "$pwd/deepsubmodule" subdir/deepsubmodule &&
      git commit -a -m new &&
      git branch -M sub
    )

So we fetch in "submodule/subdir/deepsubmodule" to get a new
deepsubmodule and (non-deep) submodule commit. Both of these commits
are then used to construct a new superproject commit.

If this is too confusing, maybe I should try to make the test simpler.

>
>> +	# Assert that we can checkout the superproject commit with --recurse-submodules
>> +	! grep -E "error: Submodule .+ could not be updated" actual-checkout.err
>
> Negative greps are error-prone, since they will also appear to work if
> the message was just misspelled. We should probably check that the
> expected commit is present instead.

That's a good point, I hadn't considered that.

>> +# Test that we properly fetch the submodules in the index as well as
>> +# submodules in other branches.
>> +test_expect_success 'setup downstream branch with other submodule' '
>> +	mkdir submodule2 &&
>> +	(
>> +		cd submodule2 &&
>> +		git init &&
>> +		echo sub2content >sub2file &&
>> +		git add sub2file &&
>> +		git commit -a -m new &&
>> +		git branch -M sub2
>> +	) &&
>> +	git checkout -b super-sub2-only &&
>> +	git submodule add "$pwd/submodule2" submodule2 &&
>> +	git commit -m "add sub2" &&
>> +	git checkout super &&
>> +	(
>> +		cd downstream &&
>> +		git fetch --recurse-submodules origin &&
>> +		git checkout super-sub2-only &&
>> +		# Explicitly run "git submodule update" because sub2 is new
>> +		# and has not been cloned.
>> +		git submodule update --init &&
>> +		git checkout --recurse-submodules super
>> +	)
>> +'
>
> I couldn't see the submodule in the index to be fetched; maybe it's
> there somewhere but it's not obvious to me.

If it helps, I've updated the description to:

  # In downstream, init "submodule2", but do not check it out while
  # fetching. This lets us assert that unpopulated submodules can be
  # fetched.

The 'submodules in the index' are the existing submodules ("submodule"
and "submodule/subdir/deepsubmodule"), and the changed, unpopulated
submodule is "submodule2".

> Also, why do we need to run "git submodule update"? This patch set
> concerns itself with fetching existing submodules, not cloning new
> ones.

In this setup step, 'downstream' clones 'submodule2' using "git
submodule update". So from the perspective of the following tests,
'submodule2' is an existing submodule. We could have cloned 'submodule2'
in an earlier setup step, but it wouldn't have been needed until these
tests.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* [PATCH v2 0/9] fetch --recurse-submodules: fetch unpopulated submodules
  2022-02-10  4:41 [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                   ` (8 preceding siblings ...)
  2022-02-10  7:07 ` [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Junio C Hamano
@ 2022-02-15 17:23 ` Glen Choo
  2022-02-15 17:23   ` [PATCH v2 1/9] t5526: introduce test helper to assert on fetches Glen Choo
                     ` (9 more replies)
  9 siblings, 10 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-15 17:23 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Junio C Hamano

Original cover letter: https://lore.kernel.org/git/20220210044152.78352-1-chooglen@google.com

This series is based on gc/branch-recurse-submodules.

Thanks for the kind feedback :) I believe this version addresses all of
the feedback, though I suspect that I might not have given clear-enough
answers to all of the questions. Let me know if more clarification is
needed.

= Patch organization

The patches are organized somewhat differently from v1. During review of
v1, I realized that it didn't make sense to read submodules from the
superproject commit until we actually fetch unpopulated submodules. So
this version groups all of the C patches together, instead of
putting the test suite patches after "read submodules from the
superproject commit".

- Patches 1-2 are quality-of-life improvements to the test suite that
  make it easier to write the tests in patch 7.
- Patches 3-5 are preparation for "git fetch" to read .gitmodules from
  the superproject commit in patch 7.
- Patch 6 separates the steps of "finding which submodules to fetch" and
  "fetching the submodules", making it easier to tell "git fetch" to
  fetch unpopulated submodules.
- Patch 7 teaches "git fetch" to fetch changed, unpopulated submodules
  in addition to populated submodules.
- Patch 8 is an optional fix for a bug with "git fetch
  --update-shallow" in a repository with submodules. The bug was
  discovered in v1, but the fix is no longer necessary for tests to pass.
- Patch 9 is an optional bugfix + cleanup of the "git fetch" code that
  removes the last caller of the deprecated "add_submodule_odb()".

= Changes

Since v1:
- Numerous style fixes suggested by Jonathan (thanks!)
- In patch 3, don't prematurely read submodules from the superproject
  commit (see:
  <kl6l5yplyat6.fsf@chooglen-macbookpro.roam.corp.google.com>).
- In patch 7, stop using "git checkout" and "! grep" in tests.
- In patch 7, stop doing the "find changed submodules" rev walk
  unconditionally. Instead, continue to check for .gitmodules, but also
  check for submodules in $GIT_DIR/modules.
  - I'm not entirely happy with the helper function name, see "---" for
    details.
- Move "git fetch --update-shallow" bugfix to patch 8.
  - Because the "find changed submodules" rev walk is no longer
    unconditional, this fix is no longer needed for tests to pass.
- Rename fetch_populated_submodules() to fetch_submodules().
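One possible shape for the "check for .gitmodules, but also check for submodules in $GIT_DIR/modules" condition, as a POSIX C sketch. `might_have_submodules()` is a hypothetical name and works on plain paths; a real helper would go through git's repository and path APIs instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if the "find changed submodules" rev
 * walk is worth doing, i.e. the worktree has a .gitmodules file or
 * <gitdir>/modules contains at least one entry.
 */
static int might_have_submodules(const char *worktree, const char *gitdir)
{
	char path[4096];
	struct stat st;
	DIR *dir;
	struct dirent *de;
	int found = 0;

	snprintf(path, sizeof(path), "%s/.gitmodules", worktree);
	if (!stat(path, &st))
		return 1;

	snprintf(path, sizeof(path), "%s/modules", gitdir);
	dir = opendir(path);
	if (!dir)
		return 0;
	while ((de = readdir(dir))) {
		/* Ignore the "." and ".." entries every directory has. */
		if (strcmp(de->d_name, ".") && strcmp(de->d_name, "..")) {
			found = 1;
			break;
		}
	}
	closedir(dir);
	return found;
}
```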

Glen Choo (9):
  t5526: introduce test helper to assert on fetches
  t5526: use grep to assert on fetches
  submodule: make static functions read submodules from commits
  submodule: inline submodule_commits() into caller
  submodule: store new submodule commits oid_array in a struct
  submodule: extract get_fetch_task()
  fetch: fetch unpopulated, changed submodules
  submodule: read shallows when finding changed submodules
  submodule: fix latent check_has_commit() bug

 Documentation/fetch-options.txt |  26 ++-
 Documentation/git-fetch.txt     |  10 +-
 builtin/fetch.c                 |  14 +-
 submodule.c                     | 321 +++++++++++++++++----------
 submodule.h                     |  21 +-
 t/t5526-fetch-submodules.sh     | 374 ++++++++++++++++++++++++--------
 6 files changed, 534 insertions(+), 232 deletions(-)

Range-diff against v1:
 4:  432aa60296 =  1:  a159cdaabb t5526: introduce test helper to assert on fetches
 5:  9515d22804 =  2:  48894c6c43 t5526: use grep to assert on fetches
 3:  3c28ea743c !  3:  6cf5e76d62 submodule: make static functions read submodules from commits
    @@ Commit message
         submodule: make static functions read submodules from commits
     
         A future commit will teach "fetch --recurse-submodules" to fetch
    -    unpopulated submodules. Prepare for this by teaching the necessary
    -    static functions to read submodules from superproject commits instead of
    -    the index and filesystem. Then, store the necessary fields (path and
    -    super_oid), and use them in "fetch --recurse-submodules" where possible.
    +    unpopulated submodules. To prepare for this, teach the necessary static
    +    functions how to read submodules from superproject commits using a
    +    "treeish_name" argument (instead of always reading from the index and
    +    filesystem) but do not actually change where submodules are read from.
    +    Submodules will be read from commits when we fetch unpopulated
    +    submodules.
     
    -    As a result, "git fetch" now reads changed submodules using the
    -    `.gitmodules` and path from super_oid's tree (which is where "git fetch"
    -    actually noticed the changed submodule) instead of the filesystem.
    +    The changed function signatures follow repo_submodule_init()'s argument
    +    order, i.e. "path" then "treeish_name". Where needed, reorder the
    +    arguments of functions that already take "path" and "treeish_name" to be
    +    consistent with this convention.
     
         Signed-off-by: Glen Choo <chooglen@google.com>
     
      ## submodule.c ##
    -@@ submodule.c: static const char *default_name_or_path(const char *path_or_name)
    -  * member of the changed submodule string_list_item.
    -  */
    - struct changed_submodule_data {
    -+	/*
    -+	 * The first superproject commit in the rev walk that points to the
    -+	 * submodule.
    -+	 */
    -+	const struct object_id *super_oid;
    -+	/*
    -+	 * Path to the submodule in the superproject commit referenced
    -+	 * by 'super_oid'.
    -+	 */
    -+	char *path;
    - 	/* The submodule commits that have changed in the rev walk. */
    - 	struct oid_array *new_commits;
    - };
    -@@ submodule.c: static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
    - {
    - 	oid_array_clear(cs_data->new_commits);
    - 	free(cs_data->new_commits);
    -+	free(cs_data->path);
    - }
    - 
    - static void collect_changed_submodules_cb(struct diff_queue_struct *q,
    -@@ submodule.c: static void collect_changed_submodules_cb(struct diff_queue_struct *q,
    - 			cs_data = xcalloc(1, sizeof(struct changed_submodule_data));
    - 			/* NEEDSWORK: should we have oid_array_init()? */
    - 			cs_data->new_commits = xcalloc(1, sizeof(struct oid_array));
    -+			cs_data->super_oid = commit_oid;
    -+			cs_data->path = xstrdup(p->two->path);
    - 			item->util = cs_data;
    - 		}
    - 		oid_array_append(cs_data->new_commits, &p->two->oid);
     @@ submodule.c: struct has_commit_data {
      	struct repository *repo;
      	int result;
    @@ submodule.c: static int submodule_needs_pushing(struct repository *r,
      		/*
      		 * NOTE: We do consider it safe to return "no" here. The
      		 * correct answer would be "We do not know" instead of
    -@@ submodule.c: static void calculate_changed_submodule_paths(struct repository *r,
    - 		const struct submodule *submodule;
    - 		const char *path = NULL;
    - 
    --		submodule = submodule_from_name(r, null_oid(), name->string);
    -+		submodule = submodule_from_name(r, cs_data->super_oid, name->string);
    - 		if (submodule)
    - 			path = submodule->path;
    - 		else
     @@ submodule.c: static void calculate_changed_submodule_paths(struct repository *r,
      		if (!path)
      			continue;
      
    --		if (submodule_has_commits(r, path, cs_data->new_commits)) {
    -+		if (submodule_has_commits(r, path, cs_data->super_oid, cs_data->new_commits)) {
    - 			oid_array_clear(cs_data->new_commits);
    +-		if (submodule_has_commits(r, path, commits)) {
    ++		if (submodule_has_commits(r, path, null_oid(), commits)) {
    + 			oid_array_clear(commits);
      			*name->string = '\0';
      		}
     @@ submodule.c: static const struct submodule *get_non_gitmodules_submodule(const char *path)
 1:  a8ef64d16c !  4:  07fd4ff0a9 submodule: inline submodule_commits() into caller
    @@ Commit message
     
         Prepare for this change by inlining submodule_commits() (which inserts
         into the string_list and initializes the string_list_item.util) into its
    -    only caller. This simplifies the code and makes it easier for the caller
    -    to add information to the string_list_item.util.
    +    only caller so that the code is easier to refactor later.
     
         Signed-off-by: Glen Choo <chooglen@google.com>
     
 2:  11e48fbd41 !  5:  f049cb231b submodule: store new submodule commits oid_array in a struct
    @@ submodule.c: static const char *default_name_or_path(const char *path_or_name)
     + */
     +struct changed_submodule_data {
     +	/* The submodule commits that have changed in the rev walk. */
    -+	struct oid_array *new_commits;
    ++	struct oid_array new_commits;
     +};
     +
     +static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
     +{
    -+	oid_array_clear(cs_data->new_commits);
    -+	free(cs_data->new_commits);
    ++	oid_array_clear(&cs_data->new_commits);
     +}
     +
      static void collect_changed_submodules_cb(struct diff_queue_struct *q,
    @@ submodule.c: static void collect_changed_submodules_cb(struct diff_queue_struct
      		if (!S_ISGITLINK(p->two->mode))
      			continue;
     @@ submodule.c: static void collect_changed_submodules_cb(struct diff_queue_struct *q,
    - 			continue;
      
      		item = string_list_insert(changed, name);
    --		if (!item->util)
    -+		if (item->util)
    -+			cs_data = item->util;
    -+		else {
    -+			cs_data = xcalloc(1, sizeof(struct changed_submodule_data));
    - 			/* NEEDSWORK: should we have oid_array_init()? */
    + 		if (!item->util)
    +-			/* NEEDSWORK: should we have oid_array_init()? */
     -			item->util = xcalloc(1, sizeof(struct oid_array));
     -		oid_array_append(item->util, &p->two->oid);
    -+			cs_data->new_commits = xcalloc(1, sizeof(struct oid_array));
    -+			item->util = cs_data;
    -+		}
    -+		oid_array_append(cs_data->new_commits, &p->two->oid);
    ++			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
    ++		cs_data = item->util;
    ++		oid_array_append(&cs_data->new_commits, &p->two->oid);
      	}
      }
      
    @@ submodule.c: int find_unpushed_submodules(struct repository *r,
      			continue;
      
     -		if (submodule_needs_pushing(r, path, commits))
    -+		if (submodule_needs_pushing(r, path, cs_data->new_commits))
    ++		if (submodule_needs_pushing(r, path, &cs_data->new_commits))
      			string_list_insert(needs_pushing, path);
      	}
      
    @@ submodule.c: static void calculate_changed_submodule_paths(struct repository *r,
      		if (!path)
      			continue;
      
    --		if (submodule_has_commits(r, path, commits)) {
    +-		if (submodule_has_commits(r, path, null_oid(), commits)) {
     -			oid_array_clear(commits);
    -+		if (submodule_has_commits(r, path, cs_data->new_commits)) {
    -+			oid_array_clear(cs_data->new_commits);
    ++		if (submodule_has_commits(r, path, null_oid(), &cs_data->new_commits)) {
    ++			changed_submodule_data_clear(cs_data);
      			*name->string = '\0';
      		}
      	}
    @@ submodule.c: static int fetch_finish(int retvalue, struct strbuf *err,
     -	commits = it->util;
     -	oid_array_filter(commits,
     +	cs_data = it->util;
    -+	oid_array_filter(cs_data->new_commits,
    ++	oid_array_filter(&cs_data->new_commits,
      			 commit_missing_in_sub,
      			 task->repo);
      
      	/* Are there commits we want, but do not exist? */
     -	if (commits->nr) {
     -		task->commits = commits;
    -+	if (cs_data->new_commits->nr) {
    -+		task->commits = cs_data->new_commits;
    ++	if (cs_data->new_commits.nr) {
    ++		task->commits = &cs_data->new_commits;
      		ALLOC_GROW(spf->oid_fetch_tasks,
      			   spf->oid_fetch_tasks_nr + 1,
      			   spf->oid_fetch_tasks_alloc);
 6:  db6de6617b !  6:  814073eecc submodule: extract get_fetch_task()
    @@ submodule.c: static int get_next_submodule(struct child_process *cp,
      			break;
      		case RECURSE_SUBMODULES_OFF:
      			continue;
    - 		}
    +@@ submodule.c: static int get_next_submodule(struct child_process *cp,
      
      		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
    --		if (task->repo) {
    + 		if (task->repo) {
     -			struct strbuf submodule_prefix = STRBUF_INIT;
     -			child_process_init(cp);
     -			cp->dir = task->repo->gitdir;
     -			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
     -			cp->git_cmd = 1;
    --			if (!spf->quiet)
    --				strbuf_addf(err, _("Fetching submodule %s%s\n"),
    --					    spf->prefix, ce->name);
    + 			if (!spf->quiet)
    + 				strbuf_addf(err, _("Fetching submodule %s%s\n"),
    + 					    spf->prefix, ce->name);
     -			strvec_init(&cp->args);
     -			strvec_pushv(&cp->args, spf->args.v);
     -			strvec_push(&cp->args, default_argv);
    @@ submodule.c: static int get_next_submodule(struct child_process *cp,
     -						       spf->prefix,
     -						       task->sub->path);
     -			strvec_push(&cp->args, submodule_prefix.buf);
    --
    --			spf->count++;
    + 
    + 			spf->count++;
     -			*task_cb = task;
     -
     -			strbuf_release(&submodule_prefix);
     -			return 1;
    --		} else {
    -+		if (!task->repo) {
    ++			return task;
    + 		} else {
      			struct strbuf empty_submodule_path = STRBUF_INIT;
      
    - 			fetch_task_release(task);
     @@ submodule.c: static int get_next_submodule(struct child_process *cp,
    - 					    ce->name);
    - 			}
      			strbuf_release(&empty_submodule_path);
    -+			continue;
      		}
    -+		if (!spf->quiet)
    -+			strbuf_addf(err, _("Fetching submodule %s%s\n"),
    -+				    spf->prefix, ce->name);
    -+
    -+		spf->count++;
    -+		return task;
    -+	}
    + 	}
     +	return NULL;
     +}
     +
    @@ submodule.c: static int get_next_submodule(struct child_process *cp,
     +		strvec_push(&cp->args, default_argv);
     +		strvec_push(&cp->args, "--submodule-prefix");
     +
    -+		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix,
    -+			    task->sub->path);
    ++		strbuf_addf(&submodule_prefix, "%s%s/",
    ++						spf->prefix,
    ++						task->sub->path);
     +		strvec_push(&cp->args, submodule_prefix.buf);
     +		*task_cb = task;
     +
     +		strbuf_release(&submodule_prefix);
     +		return 1;
    - 	}
    ++	}
      
      	if (spf->oid_fetch_tasks_nr) {
    + 		struct fetch_task *task =
 7:  1737338380 !  7:  10fd5bf921 fetch: fetch unpopulated, changed submodules
    @@ Documentation/git-fetch.txt: include::transfer-data-leaks.txt[]
      SEE ALSO
      --------
     
    + ## builtin/fetch.c ##
    +@@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
    + 			max_children = fetch_parallel_config;
    + 
    + 		add_options_to_argv(&options);
    +-		result = fetch_populated_submodules(the_repository,
    +-						    &options,
    +-						    submodule_prefix,
    +-						    recurse_submodules,
    +-						    recurse_submodules_default,
    +-						    verbosity < 0,
    +-						    max_children);
    ++		result = fetch_submodules(the_repository,
    ++					  &options,
    ++					  submodule_prefix,
    ++					  recurse_submodules,
    ++					  recurse_submodules_default,
    ++					  verbosity < 0,
    ++					  max_children);
    + 		strvec_clear(&options);
    + 	}
    + 
    +
      ## submodule.c ##
    -@@
    - #include "parse-options.h"
    - #include "object-store.h"
    - #include "commit-reach.h"
    -+#include "shallow.h"
    +@@ submodule.c: static const char *default_name_or_path(const char *path_or_name)
    +  * member of the changed submodule string_list_item.
    +  */
    + struct changed_submodule_data {
    ++	/*
    ++	 * The first superproject commit in the rev walk that points to the
    ++	 * submodule.
    ++	 */
    ++	const struct object_id *super_oid;
    ++	/*
    ++	 * Path to the submodule in the superproject commit referenced
    ++	 * by 'super_oid'.
    ++	 */
    ++	char *path;
    + 	/* The submodule commits that have changed in the rev walk. */
    + 	struct oid_array new_commits;
    + };
    +@@ submodule.c: struct changed_submodule_data {
    + static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
    + {
    + 	oid_array_clear(&cs_data->new_commits);
    ++	free(cs_data->path);
    + }
      
    - static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
    - static int initialized_fetch_ref_tips;
    -@@ submodule.c: static void collect_changed_submodules(struct repository *r,
    + static void collect_changed_submodules_cb(struct diff_queue_struct *q,
    +@@ submodule.c: static void collect_changed_submodules_cb(struct diff_queue_struct *q,
    + 		if (!item->util)
    + 			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
    + 		cs_data = item->util;
    ++		cs_data->super_oid = commit_oid;
    ++		cs_data->path = xstrdup(p->two->path);
    + 		oid_array_append(&cs_data->new_commits, &p->two->oid);
    + 	}
    + }
    +@@ submodule.c: void check_for_new_submodule_commits(struct object_id *oid)
    + 	oid_array_append(&ref_tips_after_fetch, oid);
    + }
      
    - 	save_warning = warn_on_object_refname_ambiguity;
    - 	warn_on_object_refname_ambiguity = 0;
    -+	/* make sure shallows are read */
    -+	is_repository_shallow(the_repository);
    ++/*
    ++ * Returns 1 if the repo has absorbed submodule gitdirs, and 0
    ++ * otherwise. Like submodule_name_to_gitdir(), this checks
    ++ * $GIT_DIR/modules, not $GIT_COMMON_DIR.
    ++ */
    ++static int repo_has_absorbed_submodules(struct repository *r)
    ++{
    ++	struct strbuf buf = STRBUF_INIT;
     +
    - 	repo_init_revisions(r, &rev, NULL);
    - 	setup_revisions(argv->nr, argv->v, &rev, &s_r_opt);
    - 	warn_on_object_refname_ambiguity = save_warning;
    -@@ submodule.c: static void calculate_changed_submodule_paths(struct repository *r,
    ++	strbuf_repo_git_path(&buf, r, "modules/");
    ++	return file_exists(buf.buf) && !is_empty_dir(buf.buf);
    ++}
    ++
    + static void calculate_changed_submodule_paths(struct repository *r,
    + 		struct string_list *changed_submodule_names)
    + {
      	struct strvec argv = STRVEC_INIT;
      	struct string_list_item *name;
      
     -	/* No need to check if there are no submodules configured */
     -	if (!submodule_from_path(r, NULL, NULL))
    --		return;
    --
    ++	/* No need to check if no submodules would be fetched */
    ++	if (!submodule_from_path(r, NULL, NULL) &&
    ++	    !repo_has_absorbed_submodules(r))
    + 		return;
    + 
      	strvec_push(&argv, "--"); /* argv[0] program name */
    - 	oid_array_for_each_unique(&ref_tips_after_fetch,
    - 				   append_oid_to_argv, &argv);
     @@ submodule.c: int submodule_touches_in_range(struct repository *r,
      }
      
    @@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf,
      		{
      		default:
     @@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf,
    - 			strbuf_addf(err, _("Fetching submodule %s%s\n"),
    - 				    spf->prefix, ce->name);
    + 				strbuf_addf(err, _("Fetching submodule %s%s\n"),
    + 					    spf->prefix, ce->name);
    + 
    +-			spf->count++;
    ++			spf->index_count++;
    + 			return task;
    + 		} else {
    + 			struct strbuf empty_submodule_path = STRBUF_INIT;
    +@@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf,
    + 	return NULL;
    + }
      
    --		spf->count++;
    -+		spf->index_count++;
    -+		return task;
    -+	}
    -+	return NULL;
    -+}
    -+
     +static struct fetch_task *
     +get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
     +			    const char **default_argv, struct strbuf *err)
    @@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf,
     +				    find_unique_abbrev(cs_data->super_oid,
     +						       DEFAULT_ABBREV));
     +		spf->changed_count++;
    - 		return task;
    - 	}
    - 	return NULL;
    -@@ submodule.c: static int get_next_submodule(struct child_process *cp, struct strbuf *err,
    ++		return task;
    ++	}
    ++	return NULL;
    ++}
    ++
    + static int get_next_submodule(struct child_process *cp, struct strbuf *err,
    + 			      void *data, void **task_cb)
      {
      	struct submodule_parallel_fetch *spf = data;
      	const char *default_argv = NULL;
    @@ submodule.c: static int get_next_submodule(struct child_process *cp, struct strb
      		return 1;
      	}
      
    +@@ submodule.c: static int fetch_finish(int retvalue, struct strbuf *err,
    + 	return 0;
    + }
    + 
    +-int fetch_populated_submodules(struct repository *r,
    +-			       const struct strvec *options,
    +-			       const char *prefix, int command_line_option,
    +-			       int default_option,
    +-			       int quiet, int max_parallel_jobs)
    ++int fetch_submodules(struct repository *r,
    ++		     const struct strvec *options,
    ++		     const char *prefix, int command_line_option,
    ++		     int default_option,
    ++		     int quiet, int max_parallel_jobs)
    + {
    + 	int i;
    + 	struct submodule_parallel_fetch spf = SPF_INIT;
    +
    + ## submodule.h ##
    +@@ submodule.h: int should_update_submodules(void);
    +  */
    + const struct submodule *submodule_from_ce(const struct cache_entry *ce);
    + void check_for_new_submodule_commits(struct object_id *oid);
    +-int fetch_populated_submodules(struct repository *r,
    +-			       const struct strvec *options,
    +-			       const char *prefix,
    +-			       int command_line_option,
    +-			       int default_option,
    +-			       int quiet, int max_parallel_jobs);
    ++int fetch_submodules(struct repository *r,
    ++		     const struct strvec *options,
    ++		     const char *prefix,
    ++		     int command_line_option,
    ++		     int default_option,
    ++		     int quiet, int max_parallel_jobs);
    + unsigned is_submodule_modified(const char *path, int ignore_untracked);
    + int submodule_uses_gitfile(const char *path);
    + 
     
      ## t/t5526-fetch-submodules.sh ##
     @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
      	verify_fetch_result actual.err
      '
      
    -+# Cleans up after tests that checkout branches other than the main ones
    -+# in the tests.
    -+checkout_main_branches() {
    -+	git -C downstream checkout --recurse-submodules super &&
    -+	git -C downstream/submodule checkout --recurse-submodules sub &&
    -+	git -C downstream/submodule/subdir/deepsubmodule checkout --recurse-submodules deep
    -+}
    -+
     +# Test that we can fetch submodules in other branches by running fetch
    -+# in a branch that has no submodules.
    ++# in a commit that has no submodules.
     +test_expect_success 'setup downstream branch without submodules' '
     +	(
     +		cd downstream &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +'
     +
     +test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '
    -+	test_when_finished "checkout_main_branches" &&
     +	git -C downstream fetch --recurse-submodules &&
     +	# Create new superproject commit with updated submodules
     +	add_upstream_commit &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	(
     +		cd downstream &&
     +		git switch --recurse-submodules no-submodules &&
    -+		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err &&
    -+		git checkout --recurse-submodules origin/super 2>../actual-checkout.err
    ++		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
     +	) &&
     +	test_must_be_empty actual.out &&
     +	git rev-parse --short HEAD >superhead &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +
     +	# Assert that the fetch happened at the non-HEAD commits
     +	grep "Fetching submodule submodule at commit $superhead" actual.err &&
    -+	grep "Fetching submodule submodule/subdir/deepsubmodule at commit $subhead" actual.err &&
    -+
    -+	# Assert that we can checkout the superproject commit with --recurse-submodules
    -+	! grep -E "error: Submodule .+ could not be updated" actual-checkout.err
    ++	grep "Fetching submodule submodule/subdir/deepsubmodule at commit $subhead" actual.err
     +'
     +
     +test_expect_success "'--recurse-submodules' should fetch submodule commits if the submodule is changed but the index has no submodules" '
    -+	test_when_finished "checkout_main_branches" &&
     +	# Fetch any leftover commits from other tests.
     +	git -C downstream fetch --recurse-submodules &&
     +	# Create new superproject commit with updated submodules
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	(
     +		cd downstream &&
     +		git switch --recurse-submodules no-submodules &&
    -+		git fetch --recurse-submodules >../actual.out 2>../actual.err &&
    -+		git checkout --recurse-submodules origin/super 2>../actual-checkout.err
    ++		git fetch --recurse-submodules >../actual.out 2>../actual.err
     +	) &&
     +	test_must_be_empty actual.out &&
     +	git rev-parse --short HEAD >superhead &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +
     +	# Assert that the fetch happened at the non-HEAD commits
     +	grep "Fetching submodule submodule at commit $superhead" actual.err &&
    -+	grep "Fetching submodule submodule/subdir/deepsubmodule at commit $subhead" actual.err &&
    -+
    -+	# Assert that we can checkout the superproject commit with --recurse-submodules
    -+	! grep -E "error: Submodule .+ could not be updated" actual-checkout.err
    ++	grep "Fetching submodule submodule/subdir/deepsubmodule at commit $subhead" actual.err
     +'
     +
     +test_expect_success "'--recurse-submodules' should ignore changed, inactive submodules" '
    -+	test_when_finished "checkout_main_branches" &&
     +	# Fetch any leftover commits from other tests.
     +	git -C downstream fetch --recurse-submodules &&
     +	# Create new superproject commit with updated submodules
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	verify_fetch_result actual.err
     +'
     +
    -+# Test that we properly fetch the submodules in the index as well as
    -+# submodules in other branches.
    ++# In downstream, init "submodule2", but do not check it out while
    ++# fetching. This lets us assert that unpopulated submodules can be
    ++# fetched.
     +test_expect_success 'setup downstream branch with other submodule' '
     +	mkdir submodule2 &&
     +	(
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +'
     +
     +test_expect_success "'--recurse-submodules' should fetch submodule commits in changed submodules and the index" '
    -+	test_when_finished "checkout_main_branches" &&
     +	# Fetch any leftover commits from other tests.
     +	git -C downstream fetch --recurse-submodules &&
     +	# Create new commit in origin/super
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	git checkout super &&
     +	(
     +		cd downstream &&
    -+		git fetch --recurse-submodules >../actual.out 2>../actual.err &&
    -+		git checkout --recurse-submodules origin/super-sub2-only 2>../actual-checkout.err
    ++		git fetch --recurse-submodules >../actual.out 2>../actual.err
     +	) &&
     +	test_must_be_empty actual.out &&
     +
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	git -C submodule rev-parse --short HEAD >subhead &&
     +	git -C deepsubmodule rev-parse --short HEAD >deephead &&
     +	verify_fetch_result actual.err &&
    -+	# Assert that submodule is read from the index, not from a commit
    -+	! grep "Fetching submodule submodule at commit" actual.err &&
    ++	# grep for the exact line to check that the submodule is read
    ++	# from the index, not from a commit
    ++	grep "^Fetching submodule submodule\$" actual.err &&
     +
     +	# Assert that super-sub2-only and submodule2 were fetched even
     +	# though another branch is checked out
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	grep -E "\.\.${super_sub2_only_head}\s+super-sub2-only\s+-> origin/super-sub2-only" actual.err &&
     +	grep "Fetching submodule submodule2 at commit $super_sub2_only_head" actual.err &&
     +	sub2head=$(git -C submodule2 rev-parse --short HEAD) &&
    -+	grep -E "\.\.${sub2head}\s+sub2\s+-> origin/sub2" actual.err &&
    -+
    -+	# Assert that we can checkout the superproject commit with --recurse-submodules
    -+	! grep -E "error: Submodule .+ could not be updated" actual-checkout.err
    ++	grep -E "\.\.${sub2head}\s+sub2\s+-> origin/sub2" actual.err
     +'
     +
      test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
 -:  ---------- >  8:  8aa68111b0 submodule: read shallows when finding changed submodules
 8:  e44bb1560e !  9:  05a8b93154 submodule: fix bug and remove add_submodule_odb()
    @@ Metadata
     Author: Glen Choo <chooglen@google.com>
     
      ## Commit message ##
    -    submodule: fix bug and remove add_submodule_odb()
    +    submodule: fix latent check_has_commit() bug
     
    -    add_submodule_odb() is a hack - it adds a submodule's odb as an
    -    alternate, allowing the submodule's objects to be read via
    -    the_repository. Its last caller is submodule_has_commits(), which calls
    -    add_submodule_odb() to prepare for check_has_commit(). This used to be
    -    necessary because check_has_commit() used the_repository's odb, but this
    -    is longer true as of 13a2f620b2 (submodule: pass repo to
    -    check_has_commit(), 2021-10-08).
    +    When check_has_commit() is called on a missing submodule, initialization
    +    of the struct repository fails, but it attempts to clear the struct
    +    anyway (which is a fatal error). This bug is masked by its only caller,
    +    submodule_has_commits(), first calling add_submodule_odb(). The latter
    +    fails if the submodule does not exist, making submodule_has_commits()
    +    exit early and not invoke check_has_commit().
     
    -    Removing add_submodule_odb() reveals a bug in check_has_commit(), where
    -    check_has_commit() will segfault if the submodule is missing (e.g. the
    -    user has not init-ed the submodule). This happens because the submodule
    -    struct cannot be initialized, but check_has_commit() tries to cleanup
    -    the uninitialized struct anyway. This was masked by add_submodule_odb(),
    -    because add_submodule_odb() fails when the submodule is missing, causing
    -    the caller to return early and avoid calling check_has_commit().
    +    Fix this bug, and because calling add_submodule_odb() is no longer
    +    necessary as of 13a2f620b2 (submodule: pass repo to
    +    check_has_commit(), 2021-10-08), remove that call too.
     
    -    Fix the bug and remove the call to add_submodule_odb(). Since
    -    add_submodule_odb() has no more callers, remove it too.
    -
    -    Note that submodule odbs can still by added as alternates via
    -    add_submodule_odb_by_path().
    +    This is the last caller of add_submodule_odb(), so remove that
    +    function. (Submodule ODBs are still added as alternates via
    +    add_submodule_odb_by_path().)
     
         Signed-off-by: Glen Choo <chooglen@google.com>
     

base-commit: 679e3693aba0c17af60c031f7eef68f2296b8dad
-- 
2.33.GIT



* [PATCH v2 1/9] t5526: introduce test helper to assert on fetches
  2022-02-15 17:23 ` [PATCH v2 0/9] " Glen Choo
@ 2022-02-15 17:23   ` Glen Choo
  2022-02-15 21:37     ` Ævar Arnfjörð Bjarmason
  2022-02-15 17:23   ` [PATCH v2 2/9] t5526: use grep " Glen Choo
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-15 17:23 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Junio C Hamano

A future commit will change the stderr of "git fetch
--recurse-submodules" and add new tests to t/t5526-fetch-submodules.sh.
This poses two challenges:

* The tests use test_cmp to assert on the stderr, which will fail on the
  future test because the stderr changes slightly, even though it still
  contains the information we expect.
* The expect.err file is constructed by the add_upstream_commit() helper
  as input into test_cmp, but most tests fetch a different combination
  of repos than the one expect.err describes. This results in noisy
  tests that modify parts of expect.err to generate the expected output.

To address both of these issues, introduce a verify_fetch_result()
helper to t/t5526-fetch-submodules.sh that asserts on the output of "git
fetch --recurse-submodules" and handles the ordering of expect.err.

As a result, the tests no longer construct expect.err manually. test_cmp
is still invoked by verify_fetch_result(), but that will be replaced in
a later commit.

Signed-off-by: Glen Choo <chooglen@google.com>
---
 t/t5526-fetch-submodules.sh | 136 +++++++++++++++++++++---------------
 1 file changed, 81 insertions(+), 55 deletions(-)
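
verify_fetch_result() below checks each of the three expect.err.* files
with its own if block. One possible alternative is to loop over the
suffixes, which keeps the super, sub, deep ordering in one place. This is
a hypothetical sketch: the name verify_fetch_result_sketch is made up,
and it uses "diff -u" as a stand-in for the test suite's test_cmp so it
can run outside the test library.

```shell
# Hypothetical sketch, not part of this patch. Concatenates whichever
# expect.err.<repo> files exist, in superproject -> submodule ->
# deepsubmodule order, then compares them with the actual stderr.
# "diff -u" stands in for the test suite's test_cmp.
verify_fetch_result_sketch() {
	actual_err=$1 &&
	# Truncate (or create) the combined file so it exists even when no
	# repo is expected to be fetched.
	: >expect.err.combined &&
	for repo in super sub deep
	do
		# A repo that should not be fetched has no expect.err file.
		if test -f "expect.err.$repo"
		then
			cat "expect.err.$repo" >>expect.err.combined || return 1
		fi
	done &&
	diff -u expect.err.combined "$actual_err"
}
```

Unlike the patch's version, this sketch always creates
expect.err.combined. So when every expect.err.* file has been removed,
it compares against an empty file instead of failing on a missing one.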

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 2dc75b80db..0e93df1665 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -13,6 +13,10 @@ export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
 
 pwd=$(pwd)
 
+# For each submodule in the test setup, this creates a commit and writes
+# a file that contains the expected err if that new commit were fetched.
+# These output files get concatenated in the right order by
+# verify_fetch_result().
 add_upstream_commit() {
 	(
 		cd submodule &&
@@ -22,9 +26,9 @@ add_upstream_commit() {
 		git add subfile &&
 		git commit -m new subfile &&
 		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err &&
-		echo "From $pwd/submodule" >> ../expect.err &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err
+		echo "Fetching submodule submodule" > ../expect.err.sub &&
+		echo "From $pwd/submodule" >> ../expect.err.sub &&
+		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
 	) &&
 	(
 		cd deepsubmodule &&
@@ -34,12 +38,33 @@ add_upstream_commit() {
 		git add deepsubfile &&
 		git commit -m new deepsubfile &&
 		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule/subdir/deepsubmodule" >> ../expect.err
-		echo "From $pwd/deepsubmodule" >> ../expect.err &&
-		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err
+		echo "Fetching submodule submodule/subdir/deepsubmodule" > ../expect.err.deep &&
+		echo "From $pwd/deepsubmodule" >> ../expect.err.deep &&
+		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err.deep
 	)
 }
 
+# Verifies that the expected repositories were fetched. This is done by
+# concatenating the files expect.err.[super|sub|deep] in the correct
+# order and comparing it to the actual stderr.
+#
+# If a repo should not be fetched in the test, its corresponding
+# expect.err file should be rm-ed.
+verify_fetch_result() {
+	ACTUAL_ERR=$1 &&
+	rm -f expect.err.combined &&
+	if [ -f expect.err.super ]; then
+		cat expect.err.super >>expect.err.combined
+	fi &&
+	if [ -f expect.err.sub ]; then
+		cat expect.err.sub >>expect.err.combined
+	fi &&
+	if [ -f expect.err.deep ]; then
+		cat expect.err.deep >>expect.err.combined
+	fi &&
+	test_cmp expect.err.combined $ACTUAL_ERR
+}
+
 test_expect_success setup '
 	mkdir deepsubmodule &&
 	(
@@ -77,7 +102,7 @@ test_expect_success "fetch --recurse-submodules recurses into submodules" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "submodule.recurse option triggers recursive fetch" '
@@ -87,7 +112,7 @@ test_expect_success "submodule.recurse option triggers recursive fetch" '
 		git -c submodule.recurse fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "fetch --recurse-submodules -j2 has the same output behaviour" '
@@ -97,7 +122,7 @@ test_expect_success "fetch --recurse-submodules -j2 has the same output behaviou
 		GIT_TRACE="$TRASH_DIRECTORY/trace.out" git fetch --recurse-submodules -j2 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err &&
+	verify_fetch_result actual.err &&
 	grep "2 tasks" trace.out
 '
 
@@ -127,7 +152,7 @@ test_expect_success "using fetchRecurseSubmodules=true in .gitmodules recurses i
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--no-recurse-submodules overrides .gitmodules config" '
@@ -158,7 +183,7 @@ test_expect_success "--recurse-submodules overrides fetchRecurseSubmodules setti
 		git config --unset submodule.submodule.fetchRecurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--quiet propagates to submodules" '
@@ -186,7 +211,7 @@ test_expect_success "--dry-run propagates to submodules" '
 		git fetch --recurse-submodules --dry-run >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "Without --dry-run propagates to submodules" '
@@ -195,7 +220,7 @@ test_expect_success "Without --dry-run propagates to submodules" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "recurseSubmodules=true propagates into submodules" '
@@ -206,7 +231,7 @@ test_expect_success "recurseSubmodules=true propagates into submodules" '
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--recurse-submodules overrides config in submodule" '
@@ -220,7 +245,7 @@ test_expect_success "--recurse-submodules overrides config in submodule" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--no-recurse-submodules overrides config setting" '
@@ -253,14 +278,14 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.sub &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.sub &&
-	head -3 expect.err >> expect.err.sub &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
-	test_cmp expect.err.sub actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -271,14 +296,16 @@ test_expect_success "Recursion doesn't happen when new superproject commits don'
 	git add file &&
 	git commit -m "new file" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.file &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.file &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.file actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "Recursion picks up config in submodule" '
@@ -295,9 +322,8 @@ test_expect_success "Recursion picks up config in submodule" '
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.sub &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.sub &&
-	cat expect.err >> expect.err.sub &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err &&
@@ -306,7 +332,7 @@ test_expect_success "Recursion picks up config in submodule" '
 			git config --unset fetch.recurseSubmodules
 		)
 	) &&
-	test_cmp expect.err.sub actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -331,15 +357,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.2 &&
-	cat expect.err.sub >> expect.err.2 &&
-	tail -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
-	test_cmp expect.err.2 actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -375,11 +399,8 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	tail -3 expect.err > expect.err.deepsub &&
-	echo "From $pwd/." > expect.err &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err &&
-	cat expect.err.sub >> expect.err &&
-	cat expect.err.deepsub >> expect.err &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules false &&
@@ -395,7 +416,7 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 		)
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
@@ -405,14 +426,16 @@ test_expect_success "'--recurse-submodules=on-demand' stops when no new submodul
 	git add file &&
 	git commit -m "new file" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.file &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.file &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.file actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config" '
@@ -426,9 +449,9 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules on-demand &&
@@ -440,7 +463,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 		git config --unset fetch.recurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' overrides fetch.recurseSubmodules" '
@@ -454,9 +477,9 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git config submodule.submodule.fetchRecurseSubmodules on-demand &&
@@ -468,7 +491,7 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 		git config --unset submodule.submodule.fetchRecurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "don't fetch submodule when newly recorded commits are already present" '
@@ -480,14 +503,17 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 	git add submodule &&
 	git commit -m "submodule rewound" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	# This file does not exist, but rm -f for readability
+	rm -f expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err &&
+	verify_fetch_result actual.err &&
 	(
 		cd submodule &&
 		git checkout -q sub
@@ -505,9 +531,9 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 	git rm .gitmodules &&
 	git commit -m "new submodule without .gitmodules" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." >expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >>expect.err.2 &&
+	echo "From $pwd/." >expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		rm .gitmodules &&
@@ -523,7 +549,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 		git reset --hard
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err &&
+	verify_fetch_result actual.err &&
 	git checkout HEAD^ -- .gitmodules &&
 	git add .gitmodules &&
 	git commit -m "new submodule restored .gitmodules"
-- 
2.33.GIT



* [PATCH v2 2/9] t5526: use grep to assert on fetches
  2022-02-15 17:23 ` [PATCH v2 0/9] " Glen Choo
  2022-02-15 17:23   ` [PATCH v2 1/9] t5526: introduce test helper to assert on fetches Glen Choo
@ 2022-02-15 17:23   ` Glen Choo
  2022-02-15 21:53     ` Ævar Arnfjörð Bjarmason
  2022-02-15 17:23   ` [PATCH v2 3/9] submodule: make static functions read submodules from commits Glen Choo
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-15 17:23 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Junio C Hamano

In a previous commit, we replaced test_cmp invocations with
verify_fetch_result(). Finish the process of removing test_cmp by using
grep in verify_fetch_result() instead.

This makes the tests less sensitive to unrelated changes in the output:
instead of comparing the whole stderr, we only grep for lines of the
form

* "<old-head>..<new-head>\s+branch\s+-> origin/branch"
* "Fetching submodule <submodule-path>" (if fetching a submodule)

when we expect the repo to have been fetched. If we expect the repo not
to have been fetched, grep to make sure that these lines are absent.
Also, simplify the assertions by using grep patterns that match only
the relevant pieces of information. For example, <old-head> is
irrelevant: we only want to know whether the fetch was performed, not
where the branch pointed before the fetch.
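
To illustrate the kind of check verify_fetch_result() performs, here is
a rough C sketch using POSIX <regex.h>. fetch_updated() is a
hypothetical helper, not part of git or t5526. It assumes that
<new-head> and the branch name contain no regex metacharacters, and it
uses [[:blank:]] in place of "\s", which GNU grep accepts but POSIX ERE
does not define.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper: return 1 if 'stderr_text' contains a line like
 * "   1234567..89abcde  sub        -> origin/sub" showing that 'branch'
 * was updated to 'new_head', 0 if not, and -1 if the regex fails to
 * compile. Only "..<new-head>" is checked; <old-head> is ignored.
 */
static int fetch_updated(const char *stderr_text, const char *new_head,
			 const char *branch)
{
	char pattern[256];
	regex_t re;
	int matched;

	assert(stderr_text && new_head && branch);
	/* [[:blank:]] matches spaces and tabs but never a newline. */
	snprintf(pattern, sizeof(pattern),
		 "\\.\\.%s[[:blank:]]+%s[[:blank:]]+-> origin/%s",
		 new_head, branch, branch);
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return -1;
	matched = regexec(&re, stderr_text, 0, NULL, 0) == 0;
	regfree(&re);
	return matched;
}
```

Because each repo in t5526 uses a distinct branch name ("super", "sub",
"deep"), the branch name alone is enough to tell which repo a matching
line belongs to.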

Signed-off-by: Glen Choo <chooglen@google.com>
---
 t/t5526-fetch-submodules.sh | 131 +++++++++++++-----------------------
 1 file changed, 48 insertions(+), 83 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 0e93df1665..cb18f0ac21 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -20,49 +20,52 @@ pwd=$(pwd)
 add_upstream_commit() {
 	(
 		cd submodule &&
-		head1=$(git rev-parse --short HEAD) &&
 		echo new >> subfile &&
 		test_tick &&
 		git add subfile &&
 		git commit -m new subfile &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		git rev-parse --short HEAD >../subhead
 	) &&
 	(
 		cd deepsubmodule &&
-		head1=$(git rev-parse --short HEAD) &&
 		echo new >> deepsubfile &&
 		test_tick &&
 		git add deepsubfile &&
 		git commit -m new deepsubfile &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule/subdir/deepsubmodule" > ../expect.err.deep
-		echo "From $pwd/deepsubmodule" >> ../expect.err.deep &&
-		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err.deep
+		git rev-parse --short HEAD >../deephead
 	)
 }
 
 # Verifies that the expected repositories were fetched. This is done by
-# concatenating the files expect.err.[super|sub|deep] in the correct
-# order and comparing it to the actual stderr.
+# checking that the branches of [super|sub|deep] were updated to
+# [super|sub|deep]head if the corresponding file exists.
 #
-# If a repo should not be fetched in the test, its corresponding
-# expect.err file should be rm-ed.
+# If the [super|sub|deep] head file does not exist, this verifies that
+# the corresponding repo was not fetched. Thus, if a repo should not be
+# fetched in the test, its corresponding head file should be
+# rm-ed.
 verify_fetch_result() {
 	ACTUAL_ERR=$1 &&
-	rm -f expect.err.combined &&
-	if [ -f expect.err.super ]; then
-		cat expect.err.super >>expect.err.combined
+	# Each grep pattern is guaranteed to match the correct repo
+	# because each repo uses a different name for their branch i.e.
+	# "super", "sub" and "deep".
+	if [ -f superhead ]; then
+		grep -E "\.\.$(cat superhead)\s+super\s+-> origin/super" $ACTUAL_ERR
+	else
+		! grep "super" $ACTUAL_ERR
 	fi &&
-	if [ -f expect.err.sub ]; then
-		cat expect.err.sub >>expect.err.combined
+	if [ -f subhead ]; then
+		grep "Fetching submodule submodule" $ACTUAL_ERR &&
+		grep -E "\.\.$(cat subhead)\s+sub\s+-> origin/sub" $ACTUAL_ERR
+	else
+		! grep "Fetching submodule submodule" $ACTUAL_ERR
 	fi &&
-	if [ -f expect.err.deep ]; then
-		cat expect.err.deep >>expect.err.combined
-	fi &&
-	test_cmp expect.err.combined $ACTUAL_ERR
+	if [ -f deephead ]; then
+		grep "Fetching submodule submodule/subdir/deepsubmodule" $ACTUAL_ERR &&
+		grep -E "\.\.$(cat deephead)\s+deep\s+-> origin/deep" $ACTUAL_ERR
+	else
+		! grep "Fetching submodule submodule/subdir/deepsubmodule" $ACTUAL_ERR
+	fi
 }
 
 test_expect_success setup '
@@ -274,13 +277,10 @@ test_expect_success "Recursion doesn't happen when no new commits are fetched in
 '
 
 test_expect_success "Recursion stops when no new submodule commits are fetched" '
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
-	rm expect.err.deep &&
+	git rev-parse --short HEAD >superhead &&
+	rm deephead &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -291,15 +291,12 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 
 test_expect_success "Recursion doesn't happen when new superproject commits don't change any submodules" '
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	echo a > file &&
 	git add file &&
 	git commit -m "new file" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
-	rm expect.err.sub &&
-	rm expect.err.deep &&
+	git rev-parse --short HEAD >superhead &&
+	rm subhead &&
+	rm deephead &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -318,12 +315,9 @@ test_expect_success "Recursion picks up config in submodule" '
 		)
 	) &&
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	git rev-parse --short HEAD >superhead &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err &&
@@ -345,20 +339,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 			git fetch &&
 			git checkout -q FETCH_HEAD
 		) &&
-		head1=$(git rev-parse --short HEAD^) &&
 		git add subdir/deepsubmodule &&
 		git commit -m "new deepsubmodule" &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		git rev-parse --short HEAD >../subhead
 	) &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	git rev-parse --short HEAD >superhead &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -376,13 +363,9 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 			git fetch &&
 			git checkout -q FETCH_HEAD
 		) &&
-		head1=$(git rev-parse --short HEAD^) &&
 		git add subdir/deepsubmodule &&
 		git commit -m "new deepsubmodule" &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo Fetching submodule submodule > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		git rev-parse --short HEAD >../subhead
 	) &&
 	(
 		cd downstream &&
@@ -395,12 +378,9 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 '
 
 test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" '
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	git rev-parse --short HEAD >superhead &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules false &&
@@ -421,15 +401,12 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	echo a >> file &&
 	git add file &&
 	git commit -m "new file" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
-	rm expect.err.sub &&
-	rm expect.err.deep &&
+	git rev-parse --short HEAD >superhead &&
+	rm subhead &&
+	rm deephead &&
 	(
 		cd downstream &&
 		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
@@ -445,13 +422,10 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 	) &&
 	add_upstream_commit &&
 	git config --global fetch.recurseSubmodules false &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
-	rm expect.err.deep &&
+	git rev-parse --short HEAD >superhead &&
+	rm deephead &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules on-demand &&
@@ -473,13 +447,10 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 	) &&
 	add_upstream_commit &&
 	git config fetch.recurseSubmodules false &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
-	rm expect.err.deep &&
+	git rev-parse --short HEAD >superhead &&
+	rm deephead &&
 	(
 		cd downstream &&
 		git config submodule.submodule.fetchRecurseSubmodules on-demand &&
@@ -499,15 +470,12 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 		cd submodule &&
 		git checkout -q HEAD^^
 	) &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "submodule rewound" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
-	rm expect.err.sub &&
+	git rev-parse --short HEAD >superhead &&
+	rm subhead &&
 	# This file does not exist, but rm -f for readability
-	rm -f expect.err.deep &&
+	rm -f deephead &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -526,14 +494,11 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 		git fetch --recurse-submodules
 	) &&
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git rm .gitmodules &&
 	git commit -m "new submodule without .gitmodules" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." >expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
-	rm expect.err.deep &&
+	git rev-parse --short HEAD >superhead &&
+	rm deephead &&
 	(
 		cd downstream &&
 		rm .gitmodules &&
-- 
2.33.GIT



* [PATCH v2 3/9] submodule: make static functions read submodules from commits
  2022-02-15 17:23 ` [PATCH v2 0/9] " Glen Choo
  2022-02-15 17:23   ` [PATCH v2 1/9] t5526: introduce test helper to assert on fetches Glen Choo
  2022-02-15 17:23   ` [PATCH v2 2/9] t5526: use grep " Glen Choo
@ 2022-02-15 17:23   ` Glen Choo
  2022-02-15 21:18     ` Jonathan Tan
  2022-02-15 22:00     ` Ævar Arnfjörð Bjarmason
  2022-02-15 17:23   ` [PATCH v2 4/9] submodule: inline submodule_commits() into caller Glen Choo
                     ` (6 subsequent siblings)
  9 siblings, 2 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-15 17:23 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Junio C Hamano

A future commit will teach "fetch --recurse-submodules" to fetch
unpopulated submodules. To prepare for this, teach the necessary static
functions how to read submodules from superproject commits using a
"treeish_name" argument (instead of always reading from the index and
filesystem), but do not change where submodules are read from yet:
every caller still passes null_oid(), which preserves the current
behavior. Submodules will be read from commits once we fetch
unpopulated submodules.

The changed function signatures follow repo_submodule_init()'s argument
order, i.e. "path" then "treeish_name". Where needed, reorder the
arguments of functions that already take "path" and "treeish_name" to be
consistent with this convention.
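
A minimal sketch of the calling convention, assuming (as every caller
in this patch does) that a null treeish_name means "read from the index
and filesystem". struct object_id, null_oid() and is_null_oid() below
are simplified stand-ins for git's definitions, and
submodule_config_source() is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for git's struct object_id (SHA-1 sized only). */
struct object_id {
	unsigned char hash[20];
};

static const struct object_id null_oid_value = { { 0 } };

/* Stand-in for null_oid(): an all-zero object name. */
static const struct object_id *null_oid(void)
{
	return &null_oid_value;
}

static int is_null_oid(const struct object_id *oid)
{
	return !memcmp(oid->hash, null_oid_value.hash, sizeof(oid->hash));
}

/*
 * Hypothetical lookup taking "path" then "treeish_name", like
 * repo_submodule_init(). A null treeish_name keeps the old behavior
 * (index and filesystem); any other oid names a superproject commit.
 */
static const char *submodule_config_source(const char *path,
					   const struct object_id *treeish_name)
{
	assert(path && treeish_name);
	return is_null_oid(treeish_name) ? "index/filesystem" : "commit";
}
```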

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/submodule.c b/submodule.c
index 5ace18a7d9..7032dcabb8 100644
--- a/submodule.c
+++ b/submodule.c
@@ -932,6 +932,7 @@ struct has_commit_data {
 	struct repository *repo;
 	int result;
 	const char *path;
+	const struct object_id *super_oid;
 };
 
 static int check_has_commit(const struct object_id *oid, void *data)
@@ -940,7 +941,7 @@ static int check_has_commit(const struct object_id *oid, void *data)
 	struct repository subrepo;
 	enum object_type type;
 
-	if (repo_submodule_init(&subrepo, cb->repo, cb->path, null_oid())) {
+	if (repo_submodule_init(&subrepo, cb->repo, cb->path, cb->super_oid)) {
 		cb->result = 0;
 		goto cleanup;
 	}
@@ -968,9 +969,10 @@ static int check_has_commit(const struct object_id *oid, void *data)
 
 static int submodule_has_commits(struct repository *r,
 				 const char *path,
+				 const struct object_id *super_oid,
 				 struct oid_array *commits)
 {
-	struct has_commit_data has_commit = { r, 1, path };
+	struct has_commit_data has_commit = { r, 1, path, super_oid };
 
 	/*
 	 * Perform a cheap, but incorrect check for the existence of 'commits'.
@@ -1017,7 +1019,7 @@ static int submodule_needs_pushing(struct repository *r,
 				   const char *path,
 				   struct oid_array *commits)
 {
-	if (!submodule_has_commits(r, path, commits))
+	if (!submodule_has_commits(r, path, null_oid(), commits))
 		/*
 		 * NOTE: We do consider it safe to return "no" here. The
 		 * correct answer would be "We do not know" instead of
@@ -1277,7 +1279,7 @@ static void calculate_changed_submodule_paths(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_has_commits(r, path, commits)) {
+		if (submodule_has_commits(r, path, null_oid(), commits)) {
 			oid_array_clear(commits);
 			*name->string = '\0';
 		}
@@ -1402,12 +1404,13 @@ static const struct submodule *get_non_gitmodules_submodule(const char *path)
 }
 
 static struct fetch_task *fetch_task_create(struct repository *r,
-					    const char *path)
+					    const char *path,
+					    const struct object_id *treeish_name)
 {
 	struct fetch_task *task = xmalloc(sizeof(*task));
 	memset(task, 0, sizeof(*task));
 
-	task->sub = submodule_from_path(r, null_oid(), path);
+	task->sub = submodule_from_path(r, treeish_name, path);
 	if (!task->sub) {
 		/*
 		 * No entry in .gitmodules? Technically not a submodule,
@@ -1439,11 +1442,12 @@ static void fetch_task_release(struct fetch_task *p)
 }
 
 static struct repository *get_submodule_repo_for(struct repository *r,
-						 const char *path)
+						 const char *path,
+						 const struct object_id *treeish_name)
 {
 	struct repository *ret = xmalloc(sizeof(*ret));
 
-	if (repo_submodule_init(ret, r, path, null_oid())) {
+	if (repo_submodule_init(ret, r, path, treeish_name)) {
 		free(ret);
 		return NULL;
 	}
@@ -1464,7 +1468,7 @@ static int get_next_submodule(struct child_process *cp,
 		if (!S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		task = fetch_task_create(spf->r, ce->name);
+		task = fetch_task_create(spf->r, ce->name, null_oid());
 		if (!task)
 			continue;
 
@@ -1487,7 +1491,7 @@ static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		task->repo = get_submodule_repo_for(spf->r, task->sub->path);
+		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
 		if (task->repo) {
 			struct strbuf submodule_prefix = STRBUF_INIT;
 			child_process_init(cp);
-- 
2.33.GIT



* [PATCH v2 4/9] submodule: inline submodule_commits() into caller
  2022-02-15 17:23 ` [PATCH v2 0/9] " Glen Choo
                     ` (2 preceding siblings ...)
  2022-02-15 17:23   ` [PATCH v2 3/9] submodule: make static functions read submodules from commits Glen Choo
@ 2022-02-15 17:23   ` Glen Choo
  2022-02-15 22:02     ` Ævar Arnfjörð Bjarmason
  2022-02-15 17:23   ` [PATCH v2 5/9] submodule: store new submodule commits oid_array in a struct Glen Choo
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-15 17:23 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Junio C Hamano

When collecting the string_list of changed submodule names, the new
submodule commits are stored in the string_list_item.util as an
oid_array. A subsequent commit will replace the oid_array with a struct
that holds more information.

Prepare for this change by inlining submodule_commits() (which inserts
into the string_list and initializes the string_list_item.util) into its
only caller so that the code is easier to refactor later.
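
The find-or-insert, lazily initialized .util pattern that this patch
inlines can be sketched as follows. struct list, struct oid_list,
list_insert() and record_commit() are hypothetical, much-simplified
stand-ins for git's string_list and oid_array (fixed capacity, append
instead of sorted insert, no ownership of the strings); only the shape
of the logic is meant to match.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ITEMS 16

/* Hypothetical stand-in for string_list_item: a name plus opaque util. */
struct item {
	const char *string;
	void *util;
};

/* Hypothetical stand-in for string_list (fixed capacity, unsorted). */
struct list {
	struct item items[MAX_ITEMS];
	size_t nr;
};

/* Hypothetical stand-in for oid_array: commit names as strings. */
struct oid_list {
	const char *oids[MAX_ITEMS];
	size_t nr;
};

/* Find-or-append, like string_list_insert(): a new item has util == NULL. */
static struct item *list_insert(struct list *l, const char *s)
{
	size_t i;

	for (i = 0; i < l->nr; i++)
		if (!strcmp(l->items[i].string, s))
			return &l->items[i];
	assert(l->nr < MAX_ITEMS);
	l->items[l->nr].string = s;
	l->items[l->nr].util = NULL;
	return &l->items[l->nr++];
}

/* The inlined logic: allocate util the first time 'name' is seen. */
static void record_commit(struct list *changed, const char *name,
			  const char *oid)
{
	struct item *item = list_insert(changed, name);
	struct oid_list *commits;

	if (!item->util)
		item->util = calloc(1, sizeof(struct oid_list));
	commits = item->util;
	/* git uses xcalloc(), which dies on failure instead of asserting. */
	assert(commits && commits->nr < MAX_ITEMS);
	commits->oids[commits->nr++] = oid;
}

/* Free every util array and empty the list (strings are not owned). */
static void list_clear(struct list *l)
{
	size_t i;

	for (i = 0; i < l->nr; i++)
		free(l->items[i].util);
	l->nr = 0;
}
```

With the allocation written out at the call site, changing the type
stored in .util later only touches the xcalloc() size and the line that
appends to it.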

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/submodule.c b/submodule.c
index 7032dcabb8..7fdf7793fb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -782,19 +782,6 @@ const struct submodule *submodule_from_ce(const struct cache_entry *ce)
 	return submodule_from_path(the_repository, null_oid(), ce->name);
 }
 
-static struct oid_array *submodule_commits(struct string_list *submodules,
-					   const char *name)
-{
-	struct string_list_item *item;
-
-	item = string_list_insert(submodules, name);
-	if (item->util)
-		return (struct oid_array *) item->util;
-
-	/* NEEDSWORK: should we have oid_array_init()? */
-	item->util = xcalloc(1, sizeof(struct oid_array));
-	return (struct oid_array *) item->util;
-}
 
 struct collect_changed_submodules_cb_data {
 	struct repository *repo;
@@ -830,9 +817,9 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
-		struct oid_array *commits;
 		const struct submodule *submodule;
 		const char *name;
+		struct string_list_item *item;
 
 		if (!S_ISGITLINK(p->two->mode))
 			continue;
@@ -859,8 +846,11 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 		if (!name)
 			continue;
 
-		commits = submodule_commits(changed, name);
-		oid_array_append(commits, &p->two->oid);
+		item = string_list_insert(changed, name);
+		if (!item->util)
+			/* NEEDSWORK: should we have oid_array_init()? */
+			item->util = xcalloc(1, sizeof(struct oid_array));
+		oid_array_append(item->util, &p->two->oid);
 	}
 }
 
-- 
2.33.GIT



* [PATCH v2 5/9] submodule: store new submodule commits oid_array in a struct
  2022-02-15 17:23 ` [PATCH v2 0/9] " Glen Choo
                     ` (3 preceding siblings ...)
  2022-02-15 17:23   ` [PATCH v2 4/9] submodule: inline submodule_commits() into caller Glen Choo
@ 2022-02-15 17:23   ` Glen Choo
  2022-02-15 21:33     ` Ævar Arnfjörð Bjarmason
  2022-02-15 17:23   ` [PATCH v2 6/9] submodule: extract get_fetch_task() Glen Choo
                     ` (4 subsequent siblings)
  9 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-15 17:23 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Junio C Hamano

This commit prepares for a future commit that will teach `git fetch
--recurse-submodules` how to fetch submodules that are present in
<gitdir>/modules but are not populated. To do this, we need to store
more information about each changed submodule so that we can read the
submodule configuration from the superproject commit instead of the
filesystem.

Refactor the .util member of the changed submodules string_list to hold
a struct instead of an oid_array. For now, this struct only holds the
new_commits oid_array; more information will be added later.
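
Wrapping the oid_array in a struct means later fields can be added
without touching every user of .util. The sketch below mirrors struct
changed_submodule_data and changed_submodule_data_clear() from this
patch, but the oid_array here is a simplified stand-in (hex strings
instead of struct object_id), not git's oid-array.h API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for git's oid_array: a growable array of strings. */
struct oid_array {
	char **oid;
	size_t nr, alloc;
};

static void oid_array_append(struct oid_array *a, const char *hex)
{
	if (a->nr == a->alloc) {
		size_t alloc = a->alloc ? 2 * a->alloc : 4;
		char **grown = realloc(a->oid, alloc * sizeof(*grown));

		assert(grown); /* git would die() here via xrealloc() */
		a->oid = grown;
		a->alloc = alloc;
	}
	a->oid[a->nr] = malloc(strlen(hex) + 1);
	assert(a->oid[a->nr]);
	strcpy(a->oid[a->nr], hex);
	a->nr++;
}

static void oid_array_clear(struct oid_array *a)
{
	size_t i;

	for (i = 0; i < a->nr; i++)
		free(a->oid[i]);
	free(a->oid);
	a->oid = NULL;
	a->nr = a->alloc = 0;
}

/* Only new_commits for now; more members can be added later. */
struct changed_submodule_data {
	struct oid_array new_commits;
};

/* Releases the members only; the struct itself is freed by the owner. */
static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
{
	oid_array_clear(&cs_data->new_commits);
}
```

In the patch itself, free_submodules_data() calls
changed_submodule_data_clear() on each item and then
string_list_clear(submodules, 1), which frees the .util structs
themselves, so the clear function only needs to release the members.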

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 54 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 19 deletions(-)

diff --git a/submodule.c b/submodule.c
index 7fdf7793fb..5b1aa3fbe8 100644
--- a/submodule.c
+++ b/submodule.c
@@ -806,6 +806,20 @@ static const char *default_name_or_path(const char *path_or_name)
 	return path_or_name;
 }
 
+/*
+ * Holds relevant information for a changed submodule. Used as the .util
+ * member of the changed submodule string_list_item.
+ */
+struct changed_submodule_data {
+	/* The submodule commits that have changed in the rev walk. */
+	struct oid_array new_commits;
+};
+
+static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
+{
+	oid_array_clear(&cs_data->new_commits);
+}
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 					  struct diff_options *options,
 					  void *data)
@@ -820,6 +834,7 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 		const struct submodule *submodule;
 		const char *name;
 		struct string_list_item *item;
+		struct changed_submodule_data *cs_data;
 
 		if (!S_ISGITLINK(p->two->mode))
 			continue;
@@ -848,9 +863,9 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 
 		item = string_list_insert(changed, name);
 		if (!item->util)
-			/* NEEDSWORK: should we have oid_array_init()? */
-			item->util = xcalloc(1, sizeof(struct oid_array));
-		oid_array_append(item->util, &p->two->oid);
+			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
+		cs_data = item->util;
+		oid_array_append(&cs_data->new_commits, &p->two->oid);
 	}
 }
 
@@ -897,11 +912,12 @@ static void collect_changed_submodules(struct repository *r,
 	reset_revision_walk();
 }
 
-static void free_submodules_oids(struct string_list *submodules)
+static void free_submodules_data(struct string_list *submodules)
 {
 	struct string_list_item *item;
-	for_each_string_list_item(item, submodules)
-		oid_array_clear((struct oid_array *) item->util);
+	for_each_string_list_item(item, submodules) {
+		changed_submodule_data_clear(item->util);
+	}
 	string_list_clear(submodules, 1);
 }
 
@@ -1069,7 +1085,7 @@ int find_unpushed_submodules(struct repository *r,
 	collect_changed_submodules(r, &submodules, &argv);
 
 	for_each_string_list_item(name, &submodules) {
-		struct oid_array *commits = name->util;
+		struct changed_submodule_data *cs_data = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
 
@@ -1082,11 +1098,11 @@ int find_unpushed_submodules(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_needs_pushing(r, path, commits))
+		if (submodule_needs_pushing(r, path, &cs_data->new_commits))
 			string_list_insert(needs_pushing, path);
 	}
 
-	free_submodules_oids(&submodules);
+	free_submodules_data(&submodules);
 	strvec_clear(&argv);
 
 	return needs_pushing->nr;
@@ -1256,7 +1272,7 @@ static void calculate_changed_submodule_paths(struct repository *r,
 	collect_changed_submodules(r, changed_submodule_names, &argv);
 
 	for_each_string_list_item(name, changed_submodule_names) {
-		struct oid_array *commits = name->util;
+		struct changed_submodule_data *cs_data = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
 
@@ -1269,8 +1285,8 @@ static void calculate_changed_submodule_paths(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_has_commits(r, path, null_oid(), commits)) {
-			oid_array_clear(commits);
+		if (submodule_has_commits(r, path, null_oid(), &cs_data->new_commits)) {
+			changed_submodule_data_clear(cs_data);
 			*name->string = '\0';
 		}
 	}
@@ -1307,7 +1323,7 @@ int submodule_touches_in_range(struct repository *r,
 
 	strvec_clear(&args);
 
-	free_submodules_oids(&subs);
+	free_submodules_data(&subs);
 	return ret;
 }
 
@@ -1591,7 +1607,7 @@ static int fetch_finish(int retvalue, struct strbuf *err,
 	struct fetch_task *task = task_cb;
 
 	struct string_list_item *it;
-	struct oid_array *commits;
+	struct changed_submodule_data *cs_data;
 
 	if (!task || !task->sub)
 		BUG("callback cookie bogus");
@@ -1619,14 +1635,14 @@ static int fetch_finish(int retvalue, struct strbuf *err,
 		/* Could be an unchanged submodule, not contained in the list */
 		goto out;
 
-	commits = it->util;
-	oid_array_filter(commits,
+	cs_data = it->util;
+	oid_array_filter(&cs_data->new_commits,
 			 commit_missing_in_sub,
 			 task->repo);
 
 	/* Are there commits we want, but do not exist? */
-	if (commits->nr) {
-		task->commits = commits;
+	if (cs_data->new_commits.nr) {
+		task->commits = &cs_data->new_commits;
 		ALLOC_GROW(spf->oid_fetch_tasks,
 			   spf->oid_fetch_tasks_nr + 1,
 			   spf->oid_fetch_tasks_alloc);
@@ -1684,7 +1700,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	strvec_clear(&spf.args);
 out:
-	free_submodules_oids(&spf.changed_submodule_names);
+	free_submodules_data(&spf.changed_submodule_names);
 	return spf.result;
 }
 
-- 
2.33.GIT



* [PATCH v2 6/9] submodule: extract get_fetch_task()
  2022-02-15 17:23 ` [PATCH v2 0/9] " Glen Choo
                     ` (4 preceding siblings ...)
  2022-02-15 17:23   ` [PATCH v2 5/9] submodule: store new submodule commits oid_array in a struct Glen Choo
@ 2022-02-15 17:23   ` Glen Choo
  2022-02-15 17:23   ` [PATCH v2 7/9] fetch: fetch unpopulated, changed submodules Glen Choo
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-15 17:23 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Junio C Hamano

get_next_submodule() configures the parallel submodule fetch by
performing two tasks:

* iterate the index to find submodules
* configure the child processes to fetch the submodules found in the
  previous step

Extract the index-iterating code into an iterator function,
get_fetch_task(), so that get_next_submodule() is agnostic of how
submodules are found. This prepares for a subsequent commit that will
teach the fetch machinery to also iterate through the list of changed
submodules (in addition to the index).
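The split can be sketched in standalone C with hypothetical types. entry_iter keeps a cursor between calls, which is the role spf->count plays. next_index_submodule() only finds the next gitlink entry, and start_next_job() only consumes whatever the iterator returns. This illustrates the iterator pattern and is not the code in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified index entry. */
struct entry {
	const char *name;
	int is_gitlink; /* nonzero for a submodule entry */
};

/* Iterator state that survives between calls. */
struct entry_iter {
	const struct entry *entries;
	size_t nr;
	size_t pos;
};

/* "Find the next submodule": returns NULL when exhausted and knows
 * nothing about how the child process is configured. */
static const struct entry *next_index_submodule(struct entry_iter *it)
{
	for (; it->pos < it->nr; it->pos++) {
		const struct entry *e = &it->entries[it->pos];

		if (!e->is_gitlink)
			continue;
		it->pos++; /* resume after this entry on the next call */
		return e;
	}
	return NULL;
}

/* "Configure the child": returns 1 if a job was started, 0 if there is
 * nothing left. A second source of submodules could be consulted here
 * without touching the iterator above. */
static int start_next_job(struct entry_iter *it, const char **job_name)
{
	const struct entry *e = next_index_submodule(it);

	if (!e)
		return 0;
	assert(e->is_gitlink);
	*job_name = e->name; /* stand-in for building the child's argv */
	return 1;
}
```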

Signed-off-by: Glen Choo <chooglen@google.com>
---
Jonathan: I'm really happy with the formatting changes that you
suggested because this diff is a lot easier to read, so thanks again!
Going forward, I'd appreciate any formatting suggestions; if they seem
excessive, feel free to mark them as nits.

 submodule.c | 62 ++++++++++++++++++++++++++++++++---------------------
 1 file changed, 37 insertions(+), 25 deletions(-)

diff --git a/submodule.c b/submodule.c
index 5b1aa3fbe8..22d8a1ca12 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1461,14 +1461,12 @@ static struct repository *get_submodule_repo_for(struct repository *r,
 	return ret;
 }
 
-static int get_next_submodule(struct child_process *cp,
-			      struct strbuf *err, void *data, void **task_cb)
+static struct fetch_task *
+get_fetch_task(struct submodule_parallel_fetch *spf,
+	       const char **default_argv, struct strbuf *err)
 {
-	struct submodule_parallel_fetch *spf = data;
-
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
-		const char *default_argv;
 		struct fetch_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -1488,10 +1486,10 @@ static int get_next_submodule(struct child_process *cp,
 					&spf->changed_submodule_names,
 					task->sub->name))
 				continue;
-			default_argv = "on-demand";
+			*default_argv = "on-demand";
 			break;
 		case RECURSE_SUBMODULES_ON:
-			default_argv = "yes";
+			*default_argv = "yes";
 			break;
 		case RECURSE_SUBMODULES_OFF:
 			continue;
@@ -1499,29 +1497,12 @@ static int get_next_submodule(struct child_process *cp,
 
 		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
 		if (task->repo) {
-			struct strbuf submodule_prefix = STRBUF_INIT;
-			child_process_init(cp);
-			cp->dir = task->repo->gitdir;
-			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
-			cp->git_cmd = 1;
 			if (!spf->quiet)
 				strbuf_addf(err, _("Fetching submodule %s%s\n"),
 					    spf->prefix, ce->name);
-			strvec_init(&cp->args);
-			strvec_pushv(&cp->args, spf->args.v);
-			strvec_push(&cp->args, default_argv);
-			strvec_push(&cp->args, "--submodule-prefix");
-
-			strbuf_addf(&submodule_prefix, "%s%s/",
-						       spf->prefix,
-						       task->sub->path);
-			strvec_push(&cp->args, submodule_prefix.buf);
 
 			spf->count++;
-			*task_cb = task;
-
-			strbuf_release(&submodule_prefix);
-			return 1;
+			return task;
 		} else {
 			struct strbuf empty_submodule_path = STRBUF_INIT;
 
@@ -1545,6 +1526,37 @@ static int get_next_submodule(struct child_process *cp,
 			strbuf_release(&empty_submodule_path);
 		}
 	}
+	return NULL;
+}
+
+static int get_next_submodule(struct child_process *cp, struct strbuf *err,
+			      void *data, void **task_cb)
+{
+	struct submodule_parallel_fetch *spf = data;
+	const char *default_argv = NULL;
+	struct fetch_task *task = get_fetch_task(spf, &default_argv, err);
+
+	if (task) {
+		struct strbuf submodule_prefix = STRBUF_INIT;
+
+		child_process_init(cp);
+		cp->dir = task->repo->gitdir;
+		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+		cp->git_cmd = 1;
+		strvec_init(&cp->args);
+		strvec_pushv(&cp->args, spf->args.v);
+		strvec_push(&cp->args, default_argv);
+		strvec_push(&cp->args, "--submodule-prefix");
+
+		strbuf_addf(&submodule_prefix, "%s%s/",
+						spf->prefix,
+						task->sub->path);
+		strvec_push(&cp->args, submodule_prefix.buf);
+		*task_cb = task;
+
+		strbuf_release(&submodule_prefix);
+		return 1;
+	}
 
 	if (spf->oid_fetch_tasks_nr) {
 		struct fetch_task *task =
-- 
2.33.GIT



* [PATCH v2 7/9] fetch: fetch unpopulated, changed submodules
  2022-02-15 17:23 ` [PATCH v2 0/9] " Glen Choo
                     ` (5 preceding siblings ...)
  2022-02-15 17:23   ` [PATCH v2 6/9] submodule: extract get_fetch_task() Glen Choo
@ 2022-02-15 17:23   ` Glen Choo
  2022-02-15 22:02     ` Jonathan Tan
  2022-02-15 22:06     ` Ævar Arnfjörð Bjarmason
  2022-02-15 17:23   ` [PATCH v2 8/9] submodule: read shallows when finding " Glen Choo
                     ` (2 subsequent siblings)
  9 siblings, 2 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-15 17:23 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Junio C Hamano

"git fetch --recurse-submodules" only considers populated
submodules (i.e. submodules that can be found by iterating the index),
which makes "git fetch" behave differently depending on which commit is
checked out. As a result, even if the user has initialized all submodules
correctly, the necessary submodule commits may not be fetched, and
commands like "git checkout --recurse-submodules" might fail.

Teach "git fetch" to fetch cloned, changed submodules regardless of
whether they are populated (this is in addition to the current behavior
of fetching populated submodules).
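The intended behavior can be made concrete with a hypothetical decision function in C, based on the documentation changes in this patch. 'on-demand' fetches only changed submodules. 'yes' also fetches every populated submodule. 'no' fetches nothing. In all cases the submodule must already be cloned. The names are illustrative, and the sketch ignores per-submodule configuration and whether a submodule is active.

```c
#include <assert.h>
#include <stdbool.h>

enum recurse_mode { RECURSE_NO, RECURSE_ON_DEMAND, RECURSE_YES };

/* Hypothetical summary of what fetch knows about one submodule. */
struct sub_state {
	bool cloned;    /* a repository exists, e.g. under $GIT_DIR/modules/ */
	bool populated; /* checked out in the current working tree */
	bool changed;   /* newly fetched superproject commits need commits it lacks */
};

static bool should_fetch(enum recurse_mode mode, struct sub_state s)
{
	if (mode == RECURSE_NO || !s.cloned)
		return false; /* never recurse, or nothing local to fetch into */
	if (s.changed)
		return true;  /* both "yes" and "on-demand" fetch changed submodules */
	return mode == RECURSE_YES && s.populated;
}
```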

Since a submodule may be encountered multiple times (via the list of
populated submodules or via the list of changed submodules), maintain a
list of seen submodules to avoid fetching a submodule more than once.
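The de-duplication can be sketched in standalone C. All names are hypothetical, and a small linear-scan set replaces the string_list that Git uses. Like the patch, the sketch drains the index first and then the changed list. It records a name in the seen set only when that name is handed out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEEN 64

/* Hypothetical fixed-size "seen" set. */
struct seen_set {
	const char *names[MAX_SEEN];
	size_t nr;
};

static int seen_contains(const struct seen_set *s, const char *name)
{
	for (size_t i = 0; i < s->nr; i++)
		if (!strcmp(s->names[i], name))
			return 1;
	return 0;
}

static void seen_add(struct seen_set *s, const char *name)
{
	assert(s->nr < MAX_SEEN);
	s->names[s->nr++] = name;
}

/* One source of submodule names with its own cursor. */
struct source {
	const char **names;
	size_t nr, pos;
};

/* Yield the next name from a source that has not been handed out yet. */
static const char *next_from(struct source *src, const struct seen_set *seen)
{
	for (; src->pos < src->nr; src->pos++) {
		const char *name = src->names[src->pos];

		if (seen_contains(seen, name))
			continue;
		src->pos++;
		return name;
	}
	return NULL;
}

/* Drain the index first, then the changed list; record each name as it
 * is dispatched so the other source skips it. */
static const char *next_to_fetch(struct source *from_index,
				 struct source *from_changed,
				 struct seen_set *seen)
{
	const char *name = next_from(from_index, seen);

	if (!name)
		name = next_from(from_changed, seen);
	if (name)
		seen_add(seen, name);
	return name;
}
```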

Signed-off-by: Glen Choo <chooglen@google.com>
---
As I mentioned in the cover letter, I'm not entirely happy with the
name repo_has_absorbed_submodules(); "absorbed submodule" is not a
standardized term AFAIK, and the name is a little clunky.

"absorbed submodule" is just a stand-in for "submodule in .git/modules",
so if we have a better term for "submodule in .git/modules", let's use
that instead.
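For reference, here is how the check behind repo_has_absorbed_submodules() (whether `$GIT_DIR/modules/` exists and contains anything) could look in standalone POSIX C. The function name is hypothetical. This is not Git's implementation, which uses its own file_exists() and is_empty_dir() helpers. The sketch frees its path buffer before returning.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if <gitdir>/modules/ exists and has at least one entry
 * other than "." and "..", 0 otherwise. */
static int has_nonempty_modules_dir(const char *gitdir)
{
	size_t len;
	char *path;
	DIR *dir;
	struct dirent *de;
	int found = 0;

	assert(gitdir != NULL);
	len = strlen(gitdir) + sizeof("/modules/");
	path = malloc(len);
	if (!path)
		return 0;
	snprintf(path, len, "%s/modules/", gitdir);
	dir = opendir(path);
	free(path); /* not needed once opendir() has returned */
	if (!dir)
		return 0;
	while ((de = readdir(dir)) != NULL) {
		if (strcmp(de->d_name, ".") && strcmp(de->d_name, "..")) {
			found = 1;
			break;
		}
	}
	closedir(dir);
	return found;
}
```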

 Documentation/fetch-options.txt |  26 +++--
 Documentation/git-fetch.txt     |  10 +-
 builtin/fetch.c                 |  14 +--
 submodule.c                     | 134 +++++++++++++++++++---
 submodule.h                     |  12 +-
 t/t5526-fetch-submodules.sh     | 195 ++++++++++++++++++++++++++++++++
 6 files changed, 349 insertions(+), 42 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index e967ff1874..38dad13683 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -185,15 +185,23 @@ endif::git-pull[]
 ifndef::git-pull[]
 --recurse-submodules[=yes|on-demand|no]::
 	This option controls if and under what conditions new commits of
-	populated submodules should be fetched too. It can be used as a
-	boolean option to completely disable recursion when set to 'no' or to
-	unconditionally recurse into all populated submodules when set to
-	'yes', which is the default when this option is used without any
-	value. Use 'on-demand' to only recurse into a populated submodule
-	when the superproject retrieves a commit that updates the submodule's
-	reference to a commit that isn't already in the local submodule
-	clone. By default, 'on-demand' is used, unless
-	`fetch.recurseSubmodules` is set (see linkgit:git-config[1]).
+	submodules should be fetched too. When recursing through submodules,
+	`git fetch` always attempts to fetch "changed" submodules, that is, a
+	submodule that has commits that are referenced by a newly fetched
+	superproject commit but are missing in the local submodule clone. A
+	changed submodule can be fetched as long as it is present locally e.g.
+	in `$GIT_DIR/modules/` (see linkgit:gitsubmodules[7]); if the upstream
+	adds a new submodule, that submodule cannot be fetched until it is
+	cloned e.g. by `git submodule update`.
++
+When set to 'on-demand', only changed submodules are fetched. When set
+to 'yes', all populated submodules are fetched and submodules that are
+both unpopulated and changed are fetched. When set to 'no', submodules
+are never fetched.
++
+When unspecified, this uses the value of `fetch.recurseSubmodules` if it
+is set (see linkgit:git-config[1]), defaulting to 'on-demand' if unset.
+When this option is used without any value, it defaults to 'yes'.
 endif::git-pull[]
 
 -j::
diff --git a/Documentation/git-fetch.txt b/Documentation/git-fetch.txt
index 550c16ca61..e9d364669a 100644
--- a/Documentation/git-fetch.txt
+++ b/Documentation/git-fetch.txt
@@ -287,12 +287,10 @@ include::transfer-data-leaks.txt[]
 
 BUGS
 ----
-Using --recurse-submodules can only fetch new commits in already checked
-out submodules right now. When e.g. upstream added a new submodule in the
-just fetched commits of the superproject the submodule itself cannot be
-fetched, making it impossible to check out that submodule later without
-having to do a fetch again. This is expected to be fixed in a future Git
-version.
+Using --recurse-submodules can only fetch new commits in submodules that are
+present locally e.g. in `$GIT_DIR/modules/`. If the upstream adds a new
+submodule, that submodule cannot be fetched until it is cloned e.g. by `git
+submodule update`. This is expected to be fixed in a future Git version.
 
 SEE ALSO
 --------
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f7abbc31ff..faaf89f637 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2122,13 +2122,13 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 			max_children = fetch_parallel_config;
 
 		add_options_to_argv(&options);
-		result = fetch_populated_submodules(the_repository,
-						    &options,
-						    submodule_prefix,
-						    recurse_submodules,
-						    recurse_submodules_default,
-						    verbosity < 0,
-						    max_children);
+		result = fetch_submodules(the_repository,
+					  &options,
+					  submodule_prefix,
+					  recurse_submodules,
+					  recurse_submodules_default,
+					  verbosity < 0,
+					  max_children);
 		strvec_clear(&options);
 	}
 
diff --git a/submodule.c b/submodule.c
index 22d8a1ca12..3558fddeb7 100644
--- a/submodule.c
+++ b/submodule.c
@@ -811,6 +811,16 @@ static const char *default_name_or_path(const char *path_or_name)
  * member of the changed submodule string_list_item.
  */
 struct changed_submodule_data {
+	/*
+	 * The first superproject commit in the rev walk that points to the
+	 * submodule.
+	 */
+	const struct object_id *super_oid;
+	/*
+	 * Path to the submodule in the superproject commit referenced
+	 * by 'super_oid'.
+	 */
+	char *path;
 	/* The submodule commits that have changed in the rev walk. */
 	struct oid_array new_commits;
 };
@@ -818,6 +828,7 @@ struct changed_submodule_data {
 static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
 {
 	oid_array_clear(&cs_data->new_commits);
+	free(cs_data->path);
 }
 
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
@@ -865,6 +876,8 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 		if (!item->util)
 			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
 		cs_data = item->util;
+		cs_data->super_oid = commit_oid;
+		cs_data->path = xstrdup(p->two->path);
 		oid_array_append(&cs_data->new_commits, &p->two->oid);
 	}
 }
@@ -1248,14 +1261,28 @@ void check_for_new_submodule_commits(struct object_id *oid)
 	oid_array_append(&ref_tips_after_fetch, oid);
 }
 
+/*
+ * Returns 1 if the repo has absorbed submodule gitdirs, and 0
+ * otherwise. Like submodule_name_to_gitdir(), this checks
+ * $GIT_DIR/modules, not $GIT_COMMON_DIR.
+ */
+static int repo_has_absorbed_submodules(struct repository *r)
+{
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_repo_git_path(&buf, r, "modules/");
+	return file_exists(buf.buf) && !is_empty_dir(buf.buf);
+}
+
 static void calculate_changed_submodule_paths(struct repository *r,
 		struct string_list *changed_submodule_names)
 {
 	struct strvec argv = STRVEC_INIT;
 	struct string_list_item *name;
 
-	/* No need to check if there are no submodules configured */
-	if (!submodule_from_path(r, NULL, NULL))
+	/* No need to check if no submodules would be fetched */
+	if (!submodule_from_path(r, NULL, NULL) &&
+	    !repo_has_absorbed_submodules(r))
 		return;
 
 	strvec_push(&argv, "--"); /* argv[0] program name */
@@ -1328,7 +1355,8 @@ int submodule_touches_in_range(struct repository *r,
 }
 
 struct submodule_parallel_fetch {
-	int count;
+	int index_count;
+	int changed_count;
 	struct strvec args;
 	struct repository *r;
 	const char *prefix;
@@ -1338,6 +1366,7 @@ struct submodule_parallel_fetch {
 	int result;
 
 	struct string_list changed_submodule_names;
+	struct string_list seen_submodule_names;
 
 	/* Pending fetches by OIDs */
 	struct fetch_task **oid_fetch_tasks;
@@ -1348,6 +1377,7 @@ struct submodule_parallel_fetch {
 #define SPF_INIT { \
 	.args = STRVEC_INIT, \
 	.changed_submodule_names = STRING_LIST_INIT_DUP, \
+	.seen_submodule_names = STRING_LIST_INIT_DUP, \
 	.submodules_with_errors = STRBUF_INIT, \
 }
 
@@ -1462,11 +1492,12 @@ static struct repository *get_submodule_repo_for(struct repository *r,
 }
 
 static struct fetch_task *
-get_fetch_task(struct submodule_parallel_fetch *spf,
-	       const char **default_argv, struct strbuf *err)
+get_fetch_task_from_index(struct submodule_parallel_fetch *spf,
+			  const char **default_argv, struct strbuf *err)
 {
-	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		const struct cache_entry *ce = spf->r->index->cache[spf->count];
+	for (; spf->index_count < spf->r->index->cache_nr; spf->index_count++) {
+		const struct cache_entry *ce =
+			spf->r->index->cache[spf->index_count];
 		struct fetch_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -1476,6 +1507,15 @@ get_fetch_task(struct submodule_parallel_fetch *spf,
 		if (!task)
 			continue;
 
+		/*
+		 * We might have already considered this submodule
+		 * because we saw it when iterating the changed
+		 * submodule names.
+		 */
+		if (string_list_lookup(&spf->seen_submodule_names,
+				       task->sub->name))
+			continue;
+
 		switch (get_fetch_recurse_config(task->sub, spf))
 		{
 		default:
@@ -1501,7 +1541,7 @@ get_fetch_task(struct submodule_parallel_fetch *spf,
 				strbuf_addf(err, _("Fetching submodule %s%s\n"),
 					    spf->prefix, ce->name);
 
-			spf->count++;
+			spf->index_count++;
 			return task;
 		} else {
 			struct strbuf empty_submodule_path = STRBUF_INIT;
@@ -1529,12 +1569,77 @@ get_fetch_task(struct submodule_parallel_fetch *spf,
 	return NULL;
 }
 
+static struct fetch_task *
+get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
+			    const char **default_argv, struct strbuf *err)
+{
+	for (; spf->changed_count < spf->changed_submodule_names.nr;
+	     spf->changed_count++) {
+		struct string_list_item item =
+			spf->changed_submodule_names.items[spf->changed_count];
+		struct changed_submodule_data *cs_data = item.util;
+		struct fetch_task *task;
+
+		/*
+		 * We might have already considered this submodule
+		 * because we saw it in the index.
+		 */
+		if (string_list_lookup(&spf->seen_submodule_names, item.string))
+			continue;
+
+		task = fetch_task_create(spf->r, cs_data->path,
+					 cs_data->super_oid);
+		if (!task)
+			continue;
+
+		switch (get_fetch_recurse_config(task->sub, spf)) {
+		default:
+		case RECURSE_SUBMODULES_DEFAULT:
+		case RECURSE_SUBMODULES_ON_DEMAND:
+			*default_argv = "on-demand";
+			break;
+		case RECURSE_SUBMODULES_ON:
+			*default_argv = "yes";
+			break;
+		case RECURSE_SUBMODULES_OFF:
+			continue;
+		}
+
+		task->repo = get_submodule_repo_for(spf->r, task->sub->path,
+						    cs_data->super_oid);
+		if (!task->repo) {
+			fetch_task_release(task);
+			free(task);
+
+			strbuf_addf(err, _("Could not access submodule '%s'\n"),
+				    cs_data->path);
+			continue;
+		}
+		if (!is_tree_submodule_active(spf->r, cs_data->super_oid,
+					      task->sub->path))
+			continue;
+
+		if (!spf->quiet)
+			strbuf_addf(err,
+				    _("Fetching submodule %s%s at commit %s\n"),
+				    spf->prefix, task->sub->path,
+				    find_unique_abbrev(cs_data->super_oid,
+						       DEFAULT_ABBREV));
+		spf->changed_count++;
+		return task;
+	}
+	return NULL;
+}
+
 static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 			      void *data, void **task_cb)
 {
 	struct submodule_parallel_fetch *spf = data;
 	const char *default_argv = NULL;
-	struct fetch_task *task = get_fetch_task(spf, &default_argv, err);
+	struct fetch_task *task =
+		get_fetch_task_from_index(spf, &default_argv, err);
+	if (!task)
+		task = get_fetch_task_from_changed(spf, &default_argv, err);
 
 	if (task) {
 		struct strbuf submodule_prefix = STRBUF_INIT;
@@ -1555,6 +1660,7 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 		*task_cb = task;
 
 		strbuf_release(&submodule_prefix);
+		string_list_insert(&spf->seen_submodule_names, task->sub->name);
 		return 1;
 	}
 
@@ -1669,11 +1775,11 @@ static int fetch_finish(int retvalue, struct strbuf *err,
 	return 0;
 }
 
-int fetch_populated_submodules(struct repository *r,
-			       const struct strvec *options,
-			       const char *prefix, int command_line_option,
-			       int default_option,
-			       int quiet, int max_parallel_jobs)
+int fetch_submodules(struct repository *r,
+		     const struct strvec *options,
+		     const char *prefix, int command_line_option,
+		     int default_option,
+		     int quiet, int max_parallel_jobs)
 {
 	int i;
 	struct submodule_parallel_fetch spf = SPF_INIT;
diff --git a/submodule.h b/submodule.h
index 784ceffc0e..61bebde319 100644
--- a/submodule.h
+++ b/submodule.h
@@ -88,12 +88,12 @@ int should_update_submodules(void);
  */
 const struct submodule *submodule_from_ce(const struct cache_entry *ce);
 void check_for_new_submodule_commits(struct object_id *oid);
-int fetch_populated_submodules(struct repository *r,
-			       const struct strvec *options,
-			       const char *prefix,
-			       int command_line_option,
-			       int default_option,
-			       int quiet, int max_parallel_jobs);
+int fetch_submodules(struct repository *r,
+		     const struct strvec *options,
+		     const char *prefix,
+		     int command_line_option,
+		     int default_option,
+		     int quiet, int max_parallel_jobs);
 unsigned is_submodule_modified(const char *path, int ignore_untracked);
 int submodule_uses_gitfile(const char *path);
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index cb18f0ac21..df44757468 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -399,6 +399,201 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 	verify_fetch_result actual.err
 '
 
+# Test that we can fetch submodules in other branches by running fetch
+# in a commit that has no submodules.
+test_expect_success 'setup downstream branch without submodules' '
+	(
+		cd downstream &&
+		git checkout --recurse-submodules -b no-submodules &&
+		rm .gitmodules &&
+		git rm submodule &&
+		git add .gitmodules &&
+		git commit -m "no submodules" &&
+		git checkout --recurse-submodules super
+	)
+'
+
+test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '
+	git -C downstream fetch --recurse-submodules &&
+	# Create new superproject commit with updated submodules
+	add_upstream_commit &&
+	(
+		cd submodule &&
+		(
+			cd subdir/deepsubmodule &&
+			git fetch &&
+			git checkout -q FETCH_HEAD
+		) &&
+		git add subdir/deepsubmodule &&
+		git commit -m "new deep submodule"
+	) &&
+	git add submodule &&
+	git commit -m "new submodule" &&
+
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+	git rev-parse --short HEAD >superhead &&
+	git -C submodule rev-parse --short HEAD >subhead &&
+	git -C deepsubmodule rev-parse --short HEAD >deephead &&
+	verify_fetch_result actual.err &&
+
+	# Assert that the fetch happened at the non-HEAD commits
+	grep "Fetching submodule submodule at commit $superhead" actual.err &&
+	grep "Fetching submodule submodule/subdir/deepsubmodule at commit $subhead" actual.err
+'
+
+test_expect_success "'--recurse-submodules' should fetch submodule commits if the submodule is changed but the index has no submodules" '
+	# Fetch any leftover commits from other tests.
+	git -C downstream fetch --recurse-submodules &&
+	# Create new superproject commit with updated submodules
+	add_upstream_commit &&
+	(
+		cd submodule &&
+		(
+			cd subdir/deepsubmodule &&
+			git fetch &&
+			git checkout -q FETCH_HEAD
+		) &&
+		git add subdir/deepsubmodule &&
+		git commit -m "new deep submodule"
+	) &&
+	git add submodule &&
+	git commit -m "new submodule" &&
+
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+	git rev-parse --short HEAD >superhead &&
+	git -C submodule rev-parse --short HEAD >subhead &&
+	git -C deepsubmodule rev-parse --short HEAD >deephead &&
+	verify_fetch_result actual.err &&
+
+	# Assert that the fetch happened at the non-HEAD commits
+	grep "Fetching submodule submodule at commit $superhead" actual.err &&
+	grep "Fetching submodule submodule/subdir/deepsubmodule at commit $subhead" actual.err
+'
+
+test_expect_success "'--recurse-submodules' should ignore changed, inactive submodules" '
+	# Fetch any leftover commits from other tests.
+	git -C downstream fetch --recurse-submodules &&
+	# Create new superproject commit with updated submodules
+	add_upstream_commit &&
+	(
+		cd submodule &&
+		(
+			cd subdir/deepsubmodule &&
+			git fetch &&
+			git checkout -q FETCH_HEAD
+		) &&
+		git add subdir/deepsubmodule &&
+		git commit -m "new deep submodule"
+	) &&
+	git add submodule &&
+	git commit -m "new submodule" &&
+
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git -c submodule.submodule.active=false fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+	git rev-parse --short HEAD >superhead &&
+	# Neither should be fetched because the submodule is inactive
+	rm subhead &&
+	rm deephead &&
+	verify_fetch_result actual.err
+'
+
+# In downstream, init "submodule2", but do not check it out while
+# fetching. This lets us assert that unpopulated submodules can be
+# fetched.
+test_expect_success 'setup downstream branch with other submodule' '
+	mkdir submodule2 &&
+	(
+		cd submodule2 &&
+		git init &&
+		echo sub2content >sub2file &&
+		git add sub2file &&
+		git commit -a -m new &&
+		git branch -M sub2
+	) &&
+	git checkout -b super-sub2-only &&
+	git submodule add "$pwd/submodule2" submodule2 &&
+	git commit -m "add sub2" &&
+	git checkout super &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules origin &&
+		git checkout super-sub2-only &&
+		# Explicitly run "git submodule update" because sub2 is new
+		# and has not been cloned.
+		git submodule update --init &&
+		git checkout --recurse-submodules super
+	)
+'
+
+test_expect_success "'--recurse-submodules' should fetch submodule commits in changed submodules and the index" '
+	# Fetch any leftover commits from other tests.
+	git -C downstream fetch --recurse-submodules &&
+	# Create new commit in origin/super
+	add_upstream_commit &&
+	(
+		cd submodule &&
+		(
+			cd subdir/deepsubmodule &&
+			git fetch &&
+			git checkout -q FETCH_HEAD
+		) &&
+		git add subdir/deepsubmodule &&
+		git commit -m "new deep submodule"
+	) &&
+	git add submodule &&
+	git commit -m "new submodule" &&
+
+	# Create new commit in origin/super-sub2-only
+	git checkout super-sub2-only &&
+	(
+		cd submodule2 &&
+		test_commit --no-tag foo
+	) &&
+	git add submodule2 &&
+	git commit -m "new submodule2" &&
+
+	git checkout super &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+
+	# Assert that the submodules in the super branch are fetched
+	git rev-parse --short HEAD >superhead &&
+	git -C submodule rev-parse --short HEAD >subhead &&
+	git -C deepsubmodule rev-parse --short HEAD >deephead &&
+	verify_fetch_result actual.err &&
+	# grep for the exact line to check that the submodule is read
+	# from the index, not from a commit
+	grep "^Fetching submodule submodule\$" actual.err &&
+
+	# Assert that super-sub2-only and submodule2 were fetched even
+	# though another branch is checked out
+	super_sub2_only_head=$(git rev-parse --short super-sub2-only) &&
+	grep -E "\.\.${super_sub2_only_head}\s+super-sub2-only\s+-> origin/super-sub2-only" actual.err &&
+	grep "Fetching submodule submodule2 at commit $super_sub2_only_head" actual.err &&
+	sub2head=$(git -C submodule2 rev-parse --short HEAD) &&
+	grep -E "\.\.${sub2head}\s+sub2\s+-> origin/sub2" actual.err
+'
+
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
 	add_upstream_commit &&
 	echo a >> file &&
-- 
2.33.GIT



* [PATCH v2 8/9] submodule: read shallows when finding changed submodules
  2022-02-15 17:23 ` [PATCH v2 0/9] " Glen Choo
                     ` (6 preceding siblings ...)
  2022-02-15 17:23   ` [PATCH v2 7/9] fetch: fetch unpopulated, changed submodules Glen Choo
@ 2022-02-15 17:23   ` Glen Choo
  2022-02-15 22:03     ` Jonathan Tan
  2022-02-15 17:23   ` [PATCH v2 9/9] submodule: fix latent check_has_commit() bug Glen Choo
  2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
  9 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-15 17:23 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Junio C Hamano

In a repository with submodules, "git fetch --update-shallow" can fail.
This happens because "git fetch" does not read shallows when rev walking
the newly fetched commits to find changed submodules, so the rev walk
may try to read the parent of a shallow commit and fail. This can occur
even when --recurse-submodules is not passed, because the default
behavior is to fetch changed submodules (i.e.
--recurse-submodules=on-demand).

Fix this by reading shallows before the rev walk, and test for it.
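Here is a toy model of the failure, in standalone C with hypothetical types. It is not Git's revision walker. A first-parent walk that has not loaded the shallow list tries to step past a shallow boundary into a parent object that was never fetched. The same walk with shallows loaded stops cleanly at the boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_PARENT (-1)

/* Toy commit: parent == NO_PARENT for a root; parent_present is false
 * when the parent object was never fetched (as in a shallow clone). */
struct toy_commit {
	int parent;
	bool parent_present;
	bool shallow; /* recorded as a shallow boundary */
};

/* Walk first-parent history from `tip`. Returns the number of commits
 * visited, or -1 if the walk needs a missing parent of a commit that it
 * does not know to be shallow. */
static int toy_walk(const struct toy_commit *commits, int tip,
		    bool shallows_loaded)
{
	int visited = 0;

	for (int c = tip; c != NO_PARENT; c = commits[c].parent) {
		visited++;
		if (shallows_loaded && commits[c].shallow)
			break; /* history is cut here: stop cleanly */
		if (commits[c].parent != NO_PARENT && !commits[c].parent_present)
			return -1; /* would try to read a missing parent */
	}
	return visited;
}
```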

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c                 |  4 ++++
 t/t5526-fetch-submodules.sh | 10 ++++++++++
 2 files changed, 14 insertions(+)

diff --git a/submodule.c b/submodule.c
index 3558fddeb7..e62619bee0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -22,6 +22,7 @@
 #include "parse-options.h"
 #include "object-store.h"
 #include "commit-reach.h"
+#include "shallow.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
 static int initialized_fetch_ref_tips;
@@ -901,6 +902,9 @@ static void collect_changed_submodules(struct repository *r,
 
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;
+	/* make sure shallows are read */
+	is_repository_shallow(the_repository);
+
 	repo_init_revisions(r, &rev, NULL);
 	setup_revisions(argv->nr, argv->v, &rev, &s_r_opt);
 	warn_on_object_refname_ambiguity = save_warning;
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index df44757468..ea70c3646f 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -1031,4 +1031,14 @@ test_expect_success 'recursive fetch after deinit a submodule' '
 	test_cmp expect actual
 '
 
+test_expect_success 'recursive fetch does not fail with --update-shallow' '
+	git clone --no-local --depth=2 --recurse-submodules . shallow &&
+	git init notshallow &&
+	(
+		cd notshallow &&
+		git submodule add ../submodule sub &&
+		git fetch --update-shallow ../shallow/.git refs/heads/*:refs/remotes/shallow/* --recurse-submodules
+	)
+'
+
 test_done
-- 
2.33.GIT


* [PATCH v2 9/9] submodule: fix latent check_has_commit() bug
  2022-02-15 17:23 ` [PATCH v2 0/9] " Glen Choo
                     ` (7 preceding siblings ...)
  2022-02-15 17:23   ` [PATCH v2 8/9] submodule: read shallows when finding " Glen Choo
@ 2022-02-15 17:23   ` Glen Choo
  2022-02-15 22:04     ` Jonathan Tan
  2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
  9 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-15 17:23 UTC (permalink / raw)
  To: git; +Cc: Glen Choo, Jonathan Tan, Junio C Hamano

When check_has_commit() is called on a missing submodule, initialization
of the struct repository fails, but it attempts to clear the struct
anyway (which is a fatal error). This bug is masked by its only caller,
submodule_has_commits(), first calling add_submodule_odb() - the latter
fails if the submodule does not exist, making submodule_has_commits()
exit early and not invoke check_has_commit().

Fix this bug, and because calling add_submodule_odb() is no longer
necessary as of 13a2f620b2 (submodule: pass repo to
check_has_commit(), 2021-10-08), remove that call too.

This is the last caller of add_submodule_odb(), so remove that
function. (Submodule ODBs are still added as alternates via
add_submodule_odb_by_path().)

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 35 ++---------------------------------
 submodule.h |  9 ++++-----
 2 files changed, 6 insertions(+), 38 deletions(-)

diff --git a/submodule.c b/submodule.c
index e62619bee0..2b17440777 100644
--- a/submodule.c
+++ b/submodule.c
@@ -168,26 +168,6 @@ void stage_updated_gitmodules(struct index_state *istate)
 
 static struct string_list added_submodule_odb_paths = STRING_LIST_INIT_NODUP;
 
-/* TODO: remove this function, use repo_submodule_init instead. */
-int add_submodule_odb(const char *path)
-{
-	struct strbuf objects_directory = STRBUF_INIT;
-	int ret = 0;
-
-	ret = strbuf_git_path_submodule(&objects_directory, path, "objects/");
-	if (ret)
-		goto done;
-	if (!is_directory(objects_directory.buf)) {
-		ret = -1;
-		goto done;
-	}
-	string_list_insert(&added_submodule_odb_paths,
-			   strbuf_detach(&objects_directory, NULL));
-done:
-	strbuf_release(&objects_directory);
-	return ret;
-}
-
 void add_submodule_odb_by_path(const char *path)
 {
 	string_list_insert(&added_submodule_odb_paths, xstrdup(path));
@@ -966,7 +946,8 @@ static int check_has_commit(const struct object_id *oid, void *data)
 
 	if (repo_submodule_init(&subrepo, cb->repo, cb->path, cb->super_oid)) {
 		cb->result = 0;
-		goto cleanup;
+		/* subrepo failed to init, so don't clean it up. */
+		return 0;
 	}
 
 	type = oid_object_info(&subrepo, oid, NULL);
@@ -997,18 +978,6 @@ static int submodule_has_commits(struct repository *r,
 {
 	struct has_commit_data has_commit = { r, 1, path, super_oid };
 
-	/*
-	 * Perform a cheap, but incorrect check for the existence of 'commits'.
-	 * This is done by adding the submodule's object store to the in-core
-	 * object store, and then querying for each commit's existence.  If we
-	 * do not have the commit object anywhere, there is no chance we have
-	 * it in the object store of the correct submodule and have it
-	 * reachable from a ref, so we can fail early without spawning rev-list
-	 * which is expensive.
-	 */
-	if (add_submodule_odb(path))
-		return 0;
-
 	oid_array_for_each_unique(commits, check_has_commit, &has_commit);
 
 	if (has_commit.result) {
diff --git a/submodule.h b/submodule.h
index 61bebde319..40c1445237 100644
--- a/submodule.h
+++ b/submodule.h
@@ -103,12 +103,11 @@ int submodule_uses_gitfile(const char *path);
 int bad_to_remove_submodule(const char *path, unsigned flags);
 
 /*
- * Call add_submodule_odb() to add the submodule at the given path to a list.
- * When register_all_submodule_odb_as_alternates() is called, the object stores
- * of all submodules in that list will be added as alternates in
- * the_repository.
+ * Call add_submodule_odb_by_path() to add the submodule at the given
+ * path to a list. When register_all_submodule_odb_as_alternates() is
+ * called, the object stores of all submodules in that list will be
+ * added as alternates in the_repository.
  */
-int add_submodule_odb(const char *path);
 void add_submodule_odb_by_path(const char *path);
 int register_all_submodule_odb_as_alternates(void);
 
-- 
2.33.GIT


* Re: [PATCH v2 3/9] submodule: make static functions read submodules from commits
  2022-02-15 17:23   ` [PATCH v2 3/9] submodule: make static functions read submodules from commits Glen Choo
@ 2022-02-15 21:18     ` Jonathan Tan
  2022-02-16  6:59       ` Glen Choo
  2022-02-15 22:00     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 149+ messages in thread
From: Jonathan Tan @ 2022-02-15 21:18 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git, Junio C Hamano

First of all, patches 1 and 2 look good since they are the same as in
v1 and I have reviewed them. Moving on...

Glen Choo <chooglen@google.com> writes:
> The changed function signatures follow repo_submodule_init()'s argument
> order, i.e. "path" then "treeish_name". Where needed, reorder the
> arguments of functions that already take "path" and "treeish_name" to be
> consistent with this convention.

This paragraph made me nervous, but looking at the diff, you didn't
actually reorder any arguments. Probably best to delete this paragraph.

The fact that the additional functionality is not used also means that
we can't tell for sure if all relevant functions are indeed changed, but
perhaps we can determine this in a later patch.

* Re: [PATCH v2 5/9] submodule: store new submodule commits oid_array in a struct
  2022-02-15 17:23   ` [PATCH v2 5/9] submodule: store new submodule commits oid_array in a struct Glen Choo
@ 2022-02-15 21:33     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 149+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-15 21:33 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Junio C Hamano


On Wed, Feb 16 2022, Glen Choo wrote:

>  	struct string_list_item *item;
> -	for_each_string_list_item(item, submodules)
> -		oid_array_clear((struct oid_array *) item->util);
> +	for_each_string_list_item(item, submodules) {
> +		changed_submodule_data_clear(item->util);
> +	}

Nit: These {} additions aren't needed here or in the end state of this
series.

* Re: [PATCH v2 1/9] t5526: introduce test helper to assert on fetches
  2022-02-15 17:23   ` [PATCH v2 1/9] t5526: introduce test helper to assert on fetches Glen Choo
@ 2022-02-15 21:37     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 149+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-15 21:37 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Junio C Hamano


On Wed, Feb 16 2022, Glen Choo wrote:

> +verify_fetch_result() {
> +	ACTUAL_ERR=$1 &&
> +	rm -f expect.err.combined &&
> +	if [ -f expect.err.super ]; then
> +		cat expect.err.super >>expect.err.combined
> +	fi &&
> +	if [ -f expect.err.sub ]; then
> +		cat expect.err.sub >>expect.err.combined
> +	fi &&
> +	if [ -f expect.err.deep ]; then
> +		cat expect.err.deep >>expect.err.combined
> +	fi &&
> +	test_cmp expect.err.combined $ACTUAL_ERR
> +}

I see this will get further modified in this series, but I wondered why
not just something like:

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 5d55f14ed42..4d8e06dea52 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -49,16 +49,12 @@ add_upstream_commit() {
 # expect.err file should be rm-ed.
 verify_fetch_result() {
 	ACTUAL_ERR=$1 &&
-	rm -f expect.err.combined &&
-	if [ -f expect.err.super ]; then
-		cat expect.err.super >>expect.err.combined
-	fi &&
-	if [ -f expect.err.sub ]; then
-		cat expect.err.sub >>expect.err.combined
-	fi &&
-	if [ -f expect.err.deep ]; then
-		cat expect.err.deep >>expect.err.combined
-	fi &&
+
+	{
+		cat expect.err.super
+		cat expect.err.sub
+		cat expect.err.deep
+	} >expect.err.combined
 	test_cmp expect.err.combined $ACTUAL_ERR
 }
 
I.e. there's no rule that every command we run has to exit with zero
status. In this case we can skip the existence checks and just "cat"
the files together, and if something is wrong we'll presumably fail on
the test_cmp anyway...

* Re: [PATCH v2 2/9] t5526: use grep to assert on fetches
  2022-02-15 17:23   ` [PATCH v2 2/9] t5526: use grep " Glen Choo
@ 2022-02-15 21:53     ` Ævar Arnfjörð Bjarmason
  2022-02-16  3:09       ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-15 21:53 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Junio C Hamano


On Wed, Feb 16 2022, Glen Choo wrote:

> In a previous commit, we replaced test_cmp invocations with
> verify_fetch_result(). Finish the process of removing test_cmp by using
> grep in verify_fetch_result() instead.
>
> This makes the tests less sensitive to changes because, instead of
> checking the whole stderr, we only grep for the lines of the form
>
> * "<old-head>..<new-head>\s+branch\s+-> origin/branch"
> * "Fetching submodule <submodule-path>" (if fetching a submodule)
>
> when we expect the repo to have fetched. If we expect the repo to not
> have fetched, grep to make sure the lines are absent. Also, simplify the
> assertions by using grep patterns that match only the relevant pieces of
> information, e.g. <old-head> is irrelevant because we only want to know
> if the fetch was performed, so we don't need to know where the branch
> was before the fetch.

I tried ejecting 1/9 and 2/9 out of this series locally, and it passes
all tests until the new tests you add in 7/9.

As ugly as some of the pre-image is, I wonder if dropping these first
two and biting the bullet and just continuing with the test_cmp is
better.

The test_cmp is going to catch issues that even the cleverest grep
combinations won't, e.g. in the submodule-in-C series I discovered a bug
where all of our testing & the existing series hadn't spotted that we
were dropping a \n at the end in one of the messages.

Particularly as:

>  # Verifies that the expected repositories were fetched. This is done by
> -# concatenating the files expect.err.[super|sub|deep] in the correct
> -# order and comparing it to the actual stderr.
> +# checking that the branches of [super|sub|deep] were updated to
> +# [super|sub|deep]head if the corresponding file exists.
>  #
> -# If a repo should not be fetched in the test, its corresponding
> -# expect.err file should be rm-ed.
> +# If the [super|sub|deep] head file does not exist, this verifies that
> +# the corresponding repo was not fetched. Thus, if a repo should not be
> +# fetched in the test, its corresponding head file should be
> +# rm-ed.
>  verify_fetch_result() {
>  	ACTUAL_ERR=$1 &&
> -	rm -f expect.err.combined &&
> -	if [ -f expect.err.super ]; then
> -		cat expect.err.super >>expect.err.combined
> +	# Each grep pattern is guaranteed to match the correct repo
> +	# because each repo uses a different name for their branch i.e.
> +	# "super", "sub" and "deep".
> +	if [ -f superhead ]; then
> +		grep -E "\.\.$(cat superhead)\s+super\s+-> origin/super" $ACTUAL_ERR
> +	else
> +		! grep "super" $ACTUAL_ERR
>  	fi &&
> -	if [ -f expect.err.sub ]; then
> -		cat expect.err.sub >>expect.err.combined
> +	if [ -f subhead ]; then
> +		grep "Fetching submodule submodule" $ACTUAL_ERR &&
> +		grep -E "\.\.$(cat subhead)\s+sub\s+-> origin/sub" $ACTUAL_ERR
> +	else
> +		! grep "Fetching submodule submodule" $ACTUAL_ERR
>  	fi &&
> -	if [ -f expect.err.deep ]; then
> -		cat expect.err.deep >>expect.err.combined
> -	fi &&
> -	test_cmp expect.err.combined $ACTUAL_ERR
> +	if [ -f deephead ]; then
> +		grep "Fetching submodule submodule/subdir/deepsubmodule" $ACTUAL_ERR &&
> +		grep -E "\.\.$(cat deephead)\s+deep\s+-> origin/deep" $ACTUAL_ERR
> +	else
> +		! grep "Fetching submodule submodule/subdir/deepsubmodule" $ACTUAL_ERR
> +	fi
>  }

This sort of thing is really hard to understand and review...

>  test_expect_success setup '
> @@ -274,13 +277,10 @@ test_expect_success "Recursion doesn't happen when no new commits are fetched in
>  '
>  
>  test_expect_success "Recursion stops when no new submodule commits are fetched" '
> -	head1=$(git rev-parse --short HEAD) &&
>  	git add submodule &&
>  	git commit -m "new submodule" &&
> -	head2=$(git rev-parse --short HEAD) &&
> -	echo "From $pwd/." > expect.err.super &&
> -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&

...as opposed to if we just rolled the generation of this into some
utility printf function.

* Re: [PATCH v2 3/9] submodule: make static functions read submodules from commits
  2022-02-15 17:23   ` [PATCH v2 3/9] submodule: make static functions read submodules from commits Glen Choo
  2022-02-15 21:18     ` Jonathan Tan
@ 2022-02-15 22:00     ` Ævar Arnfjörð Bjarmason
  2022-02-16  7:08       ` Glen Choo
  1 sibling, 1 reply; 149+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-15 22:00 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Junio C Hamano


On Wed, Feb 16 2022, Glen Choo wrote:

> diff --git a/submodule.c b/submodule.c
> index 5ace18a7d9..7032dcabb8 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -932,6 +932,7 @@ struct has_commit_data {
>  	struct repository *repo;
>  	int result;
>  	const char *path;
> +	const struct object_id *super_oid;
>  };

...

> -	struct has_commit_data has_commit = { r, 1, path };
> +	struct has_commit_data has_commit = { r, 1, path, super_oid };

FWIW I wouldn't at all mind the tiny detour of just turning this into
designated initializers while we're at it, instead of having to keep
track of the positionals. I.e.:

	[...] = {
		.repo = r,
		.result = 1,
		.path = path,
		.super_oid = super_oid
	};


* Re: [PATCH v2 7/9] fetch: fetch unpopulated, changed submodules
  2022-02-15 17:23   ` [PATCH v2 7/9] fetch: fetch unpopulated, changed submodules Glen Choo
@ 2022-02-15 22:02     ` Jonathan Tan
  2022-02-16  5:46       ` Glen Choo
  2022-02-15 22:06     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 149+ messages in thread
From: Jonathan Tan @ 2022-02-15 22:02 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git, Junio C Hamano

Patches 4-6 look good.

Glen Choo <chooglen@google.com> writes:
> Teach "git fetch" to fetch cloned, changed submodules regardless of
> whether they are populated (this is in addition to the current behavior
> of fetching populated submodules).

I think it's worth adding a note that the current behavior applies
regardless of what is being fetched. So, maybe something like:

  Teach "git fetch" to fetch cloned, changed submodules regardless of
  whether they are populated. This is in addition to the current behavior
  of fetching populated submodules (which happens regardless of what was
  fetched in the superproject, or even if nothing was fetched in the
  superproject).

> As I mentioned in the cover letter, I'm not entirely happy with the
> name repo_has_absorbed_submodules() - it's not a standardized term AFAIK
> and it's a little clunky.
> 
> "absorbed submodule" is just a stand-in for "submodule in .git/modules",
> so if we have a better term for "submodule in .git/modules", let's use
> that instead.

I think that this is OK if the doc comment is updated. I'll make the
suggestion in the appropriate place below.

> @@ -1248,14 +1261,28 @@ void check_for_new_submodule_commits(struct object_id *oid)
>  	oid_array_append(&ref_tips_after_fetch, oid);
>  }
>  
> +/*
> + * Returns 1 if the repo has absorbed submodule gitdirs, and 0
> + * otherwise. Like submodule_name_to_gitdir(), this checks
> + * $GIT_DIR/modules, not $GIT_COMMON_DIR.
> + */
> +static int repo_has_absorbed_submodules(struct repository *r)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	strbuf_repo_git_path(&buf, r, "modules/");
> +	return file_exists(buf.buf) && !is_empty_dir(buf.buf);
> +}

I think that if you replace the doc comment with something like:

  Returns 1 if there is at least one submodule gitdir in
  $GIT_DIR/modules, and 0 otherwise. (End users can move submodule
  gitdirs into $GIT_DIR/modules by running "git submodule
  absorbgitdirs".) Like submodule_name_to_gitdir()...

then it would be fine.

> @@ -1338,6 +1366,7 @@ struct submodule_parallel_fetch {
>  	int result;
>  
>  	struct string_list changed_submodule_names;
> +	struct string_list seen_submodule_names;

Could we unify the 2 lists instead of having 2 separate ones?

> @@ -1529,12 +1569,77 @@ get_fetch_task(struct submodule_parallel_fetch *spf,
>  	return NULL;
>  }
>  
> +static struct fetch_task *
> +get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
> +			    const char **default_argv, struct strbuf *err)
> +{
> +	for (; spf->changed_count < spf->changed_submodule_names.nr;
> +	     spf->changed_count++) {
> +		struct string_list_item item =
> +			spf->changed_submodule_names.items[spf->changed_count];
> +		struct changed_submodule_data *cs_data = item.util;
> +		struct fetch_task *task;
> +
> +		/*
> +		 * We might have already considered this submodule
> +		 * because we saw it in the index.
> +		 */
> +		if (string_list_lookup(&spf->seen_submodule_names, item.string))
> +			continue;
> +
> +		task = fetch_task_create(spf->r, cs_data->path,
> +					 cs_data->super_oid);
> +		if (!task)
> +			continue;
> +
> +		switch (get_fetch_recurse_config(task->sub, spf)) {
> +		default:
> +		case RECURSE_SUBMODULES_DEFAULT:
> +		case RECURSE_SUBMODULES_ON_DEMAND:
> +			*default_argv = "on-demand";
> +			break;
> +		case RECURSE_SUBMODULES_ON:
> +			*default_argv = "yes";
> +			break;
> +		case RECURSE_SUBMODULES_OFF:
> +			continue;
> +		}
> +
> +		task->repo = get_submodule_repo_for(spf->r, task->sub->path,
> +						    cs_data->super_oid);
> +		if (!task->repo) {
> +			fetch_task_release(task);
> +			free(task);
> +
> +			strbuf_addf(err, _("Could not access submodule '%s'\n"),
> +				    cs_data->path);
> +			continue;
> +		}
> +		if (!is_tree_submodule_active(spf->r, cs_data->super_oid,
> +					      task->sub->path))
> +			continue;
> +
> +		if (!spf->quiet)
> +			strbuf_addf(err,
> +				    _("Fetching submodule %s%s at commit %s\n"),
> +				    spf->prefix, task->sub->path,
> +				    find_unique_abbrev(cs_data->super_oid,
> +						       DEFAULT_ABBREV));
> +		spf->changed_count++;
> +		return task;
> +	}
> +	return NULL;
> +}

This is very similar to get_fetch_task_from_index(). Both:
 1. loop over something
 2. exit early if the submodule name is seen
 3. create the fetch task
 4. set the "recurse config"
 5. get the submodule repo
 6. if success, increment a counter
 7. if failure, check for some conditions and maybe append to err

Could steps 2-5 be refactored into a shared function?

> +test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '
> +	git -C downstream fetch --recurse-submodules &&

First of all, thanks for updating the test - it is much easier to
understand now.

About this line: we shouldn't use the code being tested for setup
(we're testing "fetch --recurse-submodules", so we shouldn't use that
same command to set up the test). Also, if we don't have confidence in
the starting state, it may be a sign to write it out more explicitly
instead of relying on a complicated command to do the right thing.

However, in this case, I don't think we need this. All we need is for
the test to create a new superproject commit that points to a new
submodule commit (and so on, recursively). So we don't need this line.

> +	# Create new superproject commit with updated submodules
> +	add_upstream_commit &&
> +	(
> +		cd submodule &&
> +		(
> +			cd subdir/deepsubmodule &&
> +			git fetch &&
> +			git checkout -q FETCH_HEAD
> +		) &&
> +		git add subdir/deepsubmodule &&
> +		git commit -m "new deep submodule"
> +	) &&
> +	git add submodule &&
> +	git commit -m "new submodule" &&

I thought add_upstream_commit() would do this, but apparently it just
adds commits to the submodules (which works for the earlier tests that
just tested that the submodules were fetched, but not for this one). I
think that all this should go into its own function.

Also, I understand that "git fetch" is there to pick up the commit we
made in add_upstream_commit, but I think this indirection is unnecessary
in a test. Can we skip add_upstream_commit and just create our own
commit in subdir/deepsubmodule? (This might conflict with subsequent
tests that use the old scheme, but I think that it should be fine.)

> +	# Fetch the new superproject commit
> +	(
> +		cd downstream &&
> +		git switch --recurse-submodules no-submodules &&
> +		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
> +	) &&
> +	test_must_be_empty actual.out &&
> +	git rev-parse --short HEAD >superhead &&
> +	git -C submodule rev-parse --short HEAD >subhead &&
> +	git -C deepsubmodule rev-parse --short HEAD >deephead &&

These >superhead lines would not be necessary if we had our own
function.

> +# In downstream, init "submodule2", but do not check it out while
> +# fetching. This lets us assert that unpopulated submodules can be
> +# fetched.

Firstly, "In upstream", I think? You want to fetch from it, so it has to
be upstream.

Secondly, is this test needed? I thought the case in which the worktree
has no submodules would be sufficient to test that unpopulated
submodules can be fetched.

> +test_expect_success "'--recurse-submodules' should fetch submodule commits in changed submodules and the index" '

[snip]

> +	git checkout super &&
> +	(
> +		cd downstream &&
> +		git fetch --recurse-submodules >../actual.out 2>../actual.err
> +	) &&
> +	test_must_be_empty actual.out &&
> +
> +	# Assert that the submodules in the super branch are fetched
> +	git rev-parse --short HEAD >superhead &&
> +	git -C submodule rev-parse --short HEAD >subhead &&
> +	git -C deepsubmodule rev-parse --short HEAD >deephead &&
> +	verify_fetch_result actual.err &&
> +	# grep for the exact line to check that the submodule is read
> +	# from the index, not from a commit
> +	grep "^Fetching submodule submodule\$" actual.err &&

Instead of a grep, I think this should be done by precisely specifying
what to fetch in the "git fetch" invocation, and then checking that the
submodule has commits that it didn't have before.

In addition, I think the following cases also need to be tested:
 - two fetched commits have submodules of the same name but different
   URL
 - a fetched commit and a commit in the index have submodules of the
   same name but different URL

* Re: [PATCH v2 4/9] submodule: inline submodule_commits() into caller
  2022-02-15 17:23   ` [PATCH v2 4/9] submodule: inline submodule_commits() into caller Glen Choo
@ 2022-02-15 22:02     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 149+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-15 22:02 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Junio C Hamano


On Wed, Feb 16 2022, Glen Choo wrote:

> -		commits = submodule_commits(changed, name);
> -		oid_array_append(commits, &p->two->oid);
> +		item = string_list_insert(changed, name);
> +		if (!item->util)
> +			/* NEEDSWORK: should we have oid_array_init()? */
> +			item->util = xcalloc(1, sizeof(struct oid_array));
> +		oid_array_append(item->util, &p->two->oid);
>  	}
>  }

Yes, just adding it while we're at it seems worthwhile, and defining it
in terms of the OID_ARRAY_INIT macro would be better, as then the two
are guaranteed not to drift apart, i.e. the pattern seen in:

    git grep -W 'memcpy.*&blank'

* Re: [PATCH v2 8/9] submodule: read shallows when finding changed submodules
  2022-02-15 17:23   ` [PATCH v2 8/9] submodule: read shallows when finding " Glen Choo
@ 2022-02-15 22:03     ` Jonathan Tan
  0 siblings, 0 replies; 149+ messages in thread
From: Jonathan Tan @ 2022-02-15 22:03 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git, Junio C Hamano

Glen Choo <chooglen@google.com> writes:
> @@ -901,6 +902,9 @@ static void collect_changed_submodules(struct repository *r,
>  
>  	save_warning = warn_on_object_refname_ambiguity;
>  	warn_on_object_refname_ambiguity = 0;
> +	/* make sure shallows are read */
> +	is_repository_shallow(the_repository);

This is presumably to initialize the data structures that some later
code reads without first calling is_repository_shallow(). Do we know
which part of the later code this is? If yes, it would be better to fix
it there.

* Re: [PATCH v2 9/9] submodule: fix latent check_has_commit() bug
  2022-02-15 17:23   ` [PATCH v2 9/9] submodule: fix latent check_has_commit() bug Glen Choo
@ 2022-02-15 22:04     ` Jonathan Tan
  0 siblings, 0 replies; 149+ messages in thread
From: Jonathan Tan @ 2022-02-15 22:04 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git, Junio C Hamano

Glen Choo <chooglen@google.com> writes:
> When check_has_commit() is called on a missing submodule, initialization
> of the struct repository fails, but it attempts to clear the struct
> anyway (which is a fatal error). This bug is masked by its only caller,
> submodule_has_commits(), first calling add_submodule_odb() - the latter
> fails if the submodule does not exist, making submodule_has_commits()
> exit early and not invoke check_has_commit().

This patch looks good, thanks.

* Re: [PATCH v2 7/9] fetch: fetch unpopulated, changed submodules
  2022-02-15 17:23   ` [PATCH v2 7/9] fetch: fetch unpopulated, changed submodules Glen Choo
  2022-02-15 22:02     ` Jonathan Tan
@ 2022-02-15 22:06     ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 149+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-15 22:06 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Junio C Hamano


On Wed, Feb 16 2022, Glen Choo wrote:

> +		switch (get_fetch_recurse_config(task->sub, spf)) {
> +		default:

Unfortunately get_fetch_recurse_config() returns "int", and the enum
fields here defined in submodule.h aren't of a named type, so we can't
get the advantage of a compiler check that the switch handles every
enum member here...

> +		case RECURSE_SUBMODULES_DEFAULT:
> +		case RECURSE_SUBMODULES_ON_DEMAND:
> +			*default_argv = "on-demand";
> +			break;
> +		case RECURSE_SUBMODULES_ON:
> +			*default_argv = "yes";
> +			break;
> +		case RECURSE_SUBMODULES_OFF:
> +			continue;

...in any case there are many more of them, so just having this "default"
case seems to make sense...

* Re: [PATCH v2 2/9] t5526: use grep to assert on fetches
  2022-02-15 21:53     ` Ævar Arnfjörð Bjarmason
@ 2022-02-16  3:09       ` Glen Choo
  2022-02-16 10:02         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-16  3:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jonathan Tan, Junio C Hamano

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Feb 16 2022, Glen Choo wrote:
>
>> In a previous commit, we replaced test_cmp invocations with
>> verify_fetch_result(). Finish the process of removing test_cmp by using
>> grep in verify_fetch_result() instead.
>>
>> This makes the tests less sensitive to changes because, instead of
>> checking the whole stderr, we only grep for the lines of the form
>>
>> * "<old-head>..<new-head>\s+branch\s+-> origin/branch"
>> * "Fetching submodule <submodule-path>" (if fetching a submodule)
>>
>> when we expect the repo to have fetched. If we expect the repo to not
>> have fetched, grep to make sure the lines are absent. Also, simplify the
>> assertions by using grep patterns that match only the relevant pieces of
>> information, e.g. <old-head> is irrelevant because we only want to know
>> if the fetch was performed, so we don't need to know where the branch
>> was before the fetch.
>
> I tried ejecting 1/9 and 2/9 out of this series locally, and it passes
> all tests until the new tests you add in 7/9.
>
> As ugly as some of the pre-image is, I wonder if dropping these first
> two and biting the bullet and just continuing with the test_cmp is
> better.
>
> The test_cmp is going to catch issues that even the cleverest grep
> combinations won't, e.g. in the submodule-in-C series I discovered a bug
> where all of our testing & the existing series hadn't spotted that we
> were dropping a \n at the end in one of the messages.

I think there are two schools of thought on how to test informational
messages:

- assert an exact match on the exact output that we generate
- assert that the output contains the pieces of information we care
  about

These two approaches pull in opposite directions: the former will catch
unintentional changes (as you've noted), while the latter saves on
maintenance effort and tends to be less noisy in tests.

Personally, I'm a bit torn between both approaches in general because I
want tests to be maintainable (testing the exact output is a bit of an
antipattern at Google), but I'm not very comfortable with the fact that
unintended changes can sneak through.

So I don't think there's a correct answer in general. Maybe an
acceptable rule of thumb is that test_cmp is good until it starts
getting in the way of reading and writing understandable tests.

If we agree on that rule, then for this patch, I think replacing
test_cmp is the way to go, primarily because it lets us ignore the 'old
head' of the branch before the fetch, e.g. in the quoted example...

>>  test_expect_success setup '
>> @@ -274,13 +277,10 @@ test_expect_success "Recursion doesn't happen when no new commits are fetched in
>>  '
>>  
>>  test_expect_success "Recursion stops when no new submodule commits are fetched" '
>> -	head1=$(git rev-parse --short HEAD) &&
>>  	git add submodule &&
>>  	git commit -m "new submodule" &&
>> -	head2=$(git rev-parse --short HEAD) &&
>> -	echo "From $pwd/." > expect.err.super &&
>> -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
>
> ...as opposed to if we just rolled the generation of this into some
> utility printf function.

we'd still have to deal with $head1 if we use test_cmp. That's fine for
this test, because it's pretty simple, but it gets pretty janky later
on:

  @@ -345,20 +339,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
        git fetch &&
        git checkout -q FETCH_HEAD
      ) &&
  -		head1=$(git rev-parse --short HEAD^) &&
      git add subdir/deepsubmodule &&
      git commit -m "new deepsubmodule" &&
  -		head2=$(git rev-parse --short HEAD) &&
  -		echo "Fetching submodule submodule" > ../expect.err.sub &&
  -		echo "From $pwd/submodule" >> ../expect.err.sub &&
  -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
  +		git rev-parse --short HEAD >../subhead
    ) &&
  -	head1=$(git rev-parse --short HEAD) &&
    git add submodule &&
    git commit -m "new submodule" &&
  -	head2=$(git rev-parse --short HEAD) &&
  -	echo "From $pwd/." > expect.err.super &&
  -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
  +	git rev-parse --short HEAD >superhead &&
    (
      cd downstream &&
      git fetch >../actual.out 2>../actual.err

In this example, we have two $head1 variables in different subshells,
one of which is HEAD and the other HEAD^. In my opinion, the reason
why we want HEAD^ isn't obvious (IIRC it's because the submodule
upstream is 2 commits ahead because we add the deepsubmodule in a
separate commit), and I got tripped up quite a few times trying to read
and understand the test. That's a lot of effort to spend on irrelevant
information - the test actually cares about what it fetched, not where
the ref used to be.

So for that reason, I'd prefer to remove test_cmp for this test.
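Below is a C sketch of an assertion that does not depend on the old head. It uses POSIX <regex.h>, and the helper is hypothetical, not git code. It mirrors the "\.\.<new-head>\s+<branch>\s+-> origin/<branch>" grep used by verify_fetch_result(), with [[:blank:]] in place of the GNU "\s" extension. It assumes the head and branch names contain no regex metacharacters. Any old head matches, so the test never has to compute one.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Return true if some line of `output` reports that `branch` was updated
 * to `new_head`, whatever the previous head was.
 */
static bool fetched_to(const char *output, const char *new_head,
		       const char *branch)
{
	char pattern[256];
	regex_t re;
	bool found;
	int len;

	len = snprintf(pattern, sizeof(pattern),
		       "\\.\\.%s[[:blank:]]+%s[[:blank:]]+-> origin/%s",
		       new_head, branch, branch);
	if (len < 0 || (size_t)len >= sizeof(pattern))
		return false; /* pattern would have been truncated */
	/* REG_NEWLINE: a match never spans two lines, as with grep. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | REG_NEWLINE))
		return false;
	found = regexec(&re, output, 0, NULL, 0) == 0;
	regfree(&re);
	return found;
}
```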

* Re: [PATCH v2 7/9] fetch: fetch unpopulated, changed submodules
  2022-02-15 22:02     ` Jonathan Tan
@ 2022-02-16  5:46       ` Glen Choo
  2022-02-16  9:11         ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-16  5:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jonathan Tan, git, Junio C Hamano

Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> Teach "git fetch" to fetch cloned, changed submodules regardless of
>> whether they are populated (this is in addition to the current behavior
>> of fetching populated submodules).
>
> I think you should add a note that the current behavior applies
> regardless of what is being fetched. So, maybe something like:
>
>   Teach "git fetch" to fetch cloned, changed submodules regardless of
>   whether they are populated. This is in addition to the current behavior
>   of fetching populated submodules (which happens regardless of what was
>   fetched in the superproject, or even if nothing was fetched in the
>   superproject).

Makes sense, thanks. This description is true, though a bit misleading
in the [--recurse-submodules=on-demand] case - if nothing was fetched,
on-demand would still try to fetch submodules, though no submodules
would be fetched. I'll tweak it a little.

>> As I mentioned in the cover letter, I'm not entirely happy with the
>> name repo_has_absorbed_submodules() - it's not a standardized term AFAIK
>> and it's a little clunky.
>> 
>> "absorbed submodule" is just a stand-in for "submodule in .git/modules",
>> so if we have a better term for "submodule in .git/modules", let's use
>> that instead.
>
> I think that this is OK if the doc comment is updated. I'll make the
> suggestion in the appropriate place below.
>> +/*
>> + * Returns 1 if the repo has absorbed submodule gitdirs, and 0
>> + * otherwise. Like submodule_name_to_gitdir(), this checks
>> + * $GIT_DIR/modules, not $GIT_COMMON_DIR.
>> + */
>
> I think that if you replace the doc comment with something like:
>
>   Returns 1 if there is at least one submodule gitdir in
>   $GIT_DIR/modules, and 0 otherwise. (End users can move submodule
>   gitdirs into $GIT_DIR/modules by running "git submodule
>   absorbgitdirs".) Like submodule_name_to_gitdir()...
>
> then it would be fine.

Thanks! Sounds good.
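The check described by that doc comment comes down to "does $GIT_DIR/modules exist and contain at least one entry". Here is a minimal POSIX sketch under that assumption. The helper is hypothetical; git itself would build the path and test the directory with its own utilities.

```c
#include <dirent.h>
#include <string.h>

/*
 * Return 1 if `modules_dir` (e.g. "$GIT_DIR/modules") exists and contains
 * at least one entry besides "." and "..", and 0 otherwise.
 */
static int dir_has_entries(const char *modules_dir)
{
	DIR *dir = opendir(modules_dir);
	struct dirent *d;
	int found = 0;

	if (!dir)
		return 0; /* missing or unreadable: no absorbed gitdirs */
	while ((d = readdir(dir)) != NULL) {
		if (strcmp(d->d_name, ".") && strcmp(d->d_name, "..")) {
			found = 1;
			break;
		}
	}
	closedir(dir);
	return found;
}
```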

>> @@ -1338,6 +1366,7 @@ struct submodule_parallel_fetch {
>>  	int result;
>>  
>>  	struct string_list changed_submodule_names;
>> +	struct string_list seen_submodule_names;
>
> Could we unify the 2 lists instead of having 2 separate ones?

If I understand the suggestion correctly (Can we combine the two lists
into a single changed_submodule_names list?) I don't think we can - at
least not without other changes. The intent behind seen_submodule_names
is to tell get_fetch_task_from_changed() which changed_submodule_names
items can be ignored. If we only used one list, we wouldn't be able to
tell whether we had already considered the submodule or not. If we
stored this info elsewhere, e.g. an extra field in
changed_submodule_data, then we could use a single list.
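Here is a minimal sketch of the single-list alternative, assuming a hypothetical `seen_in_index` flag on each entry. A plain array stands in for git's string_list, and none of these names come from the patch. The cursor plays the role of spf->changed_count.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for changed_submodule_data plus an extra flag. */
struct changed_submodule {
	const char *name;
	bool seen_in_index; /* set once the index pass has considered it */
};

/* Index pass: record that a changed submodule was already handled. */
static void mark_seen(struct changed_submodule *list, size_t nr,
		      const char *name)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(list[i].name, name))
			list[i].seen_in_index = true;
}

/*
 * Changed pass: return the position of the next entry at or after
 * *cursor that the index pass did not handle, or -1 when none remain.
 */
static long next_unseen(const struct changed_submodule *list, size_t nr,
			size_t *cursor)
{
	while (*cursor < nr) {
		size_t i = (*cursor)++;

		if (!list[i].seen_in_index)
			return (long)i;
	}
	return -1;
}
```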

>
>> @@ -1529,12 +1569,77 @@ get_fetch_task(struct submodule_parallel_fetch *spf,
>>  	return NULL;
>>  }
>>  
>> +static struct fetch_task *
>> +get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
>> +			    const char **default_argv, struct strbuf *err)
>> +{
>> +	for (; spf->changed_count < spf->changed_submodule_names.nr;
>> +	     spf->changed_count++) {
>> +		struct string_list_item item =
>> +			spf->changed_submodule_names.items[spf->changed_count];
>> +		struct changed_submodule_data *cs_data = item.util;
>> +		struct fetch_task *task;
>> +
>> +		/*
>> +		 * We might have already considered this submodule
>> +		 * because we saw it in the index.
>> +		 */
>> +		if (string_list_lookup(&spf->seen_submodule_names, item.string))
>> +			continue;
>> +
>> +		task = fetch_task_create(spf->r, cs_data->path,
>> +					 cs_data->super_oid);
>> +		if (!task)
>> +			continue;
>> +
>> +		switch (get_fetch_recurse_config(task->sub, spf)) {
>> +		default:
>> +		case RECURSE_SUBMODULES_DEFAULT:
>> +		case RECURSE_SUBMODULES_ON_DEMAND:
>> +			*default_argv = "on-demand";
>> +			break;
>> +		case RECURSE_SUBMODULES_ON:
>> +			*default_argv = "yes";
>> +			break;
>> +		case RECURSE_SUBMODULES_OFF:
>> +			continue;
>> +		}
>> +
>> +		task->repo = get_submodule_repo_for(spf->r, task->sub->path,
>> +						    cs_data->super_oid);
>> +		if (!task->repo) {
>> +			fetch_task_release(task);
>> +			free(task);
>> +
>> +			strbuf_addf(err, _("Could not access submodule '%s'\n"),
>> +				    cs_data->path);
>> +			continue;
>> +		}
>> +		if (!is_tree_submodule_active(spf->r, cs_data->super_oid,
>> +					      task->sub->path))
>> +			continue;
>> +
>> +		if (!spf->quiet)
>> +			strbuf_addf(err,
>> +				    _("Fetching submodule %s%s at commit %s\n"),
>> +				    spf->prefix, task->sub->path,
>> +				    find_unique_abbrev(cs_data->super_oid,
>> +						       DEFAULT_ABBREV));
>> +		spf->changed_count++;
>> +		return task;
>> +	}
>> +	return NULL;
>> +}
>
> This is very similar to get_fetch_task_from_index(). Both:
>  1. loop over something
>  2. exit early if the submodule name is seen
>  3. create the fetch task
>  4. set the "recurse config"
>  5. get the submodule repo
>  6. if success, increment a counter
>  7. if failure, check for some conditions and maybe append to err
>
> Could a function be refactored that does 2-5?

Hm, it makes sense. I don't see a reason for 2-5 to be different for
the different functions.
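A small first step toward that refactor would be to turn the recurse-config switch (step 4), which is the same in both functions, into a shared helper. This is a hedged C sketch. The enum is a local stand-in for git's RECURSE_SUBMODULES_* constants, with illustrative values only, and `fetch_default_argv()` is a hypothetical name.

```c
#include <stddef.h>

/* Local stand-ins for git's RECURSE_SUBMODULES_* values (illustrative). */
enum recurse_submodules {
	RECURSE_SUBMODULES_DEFAULT,
	RECURSE_SUBMODULES_ON_DEMAND,
	RECURSE_SUBMODULES_ON,
	RECURSE_SUBMODULES_OFF,
};

/*
 * Map a submodule's fetch-recurse config to the default argv for the
 * child fetch. NULL means "skip this submodule".
 */
static const char *fetch_default_argv(enum recurse_submodules config)
{
	switch (config) {
	case RECURSE_SUBMODULES_ON:
		return "yes";
	case RECURSE_SUBMODULES_OFF:
		return NULL;
	case RECURSE_SUBMODULES_DEFAULT:
	case RECURSE_SUBMODULES_ON_DEMAND:
	default:
		return "on-demand";
	}
}
```

A caller would `continue` when the helper returns NULL, which matches the RECURSE_SUBMODULES_OFF branch in the patch.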

>
>> +test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '
>> +	git -C downstream fetch --recurse-submodules &&
>
> First of all, thanks for updating the test - it is much easier to
> understand now.
>
> About this line, we shouldn't use the code being tested to set up (we're
> testing "fetch --recurse-submodules", so we shouldn't use the same
> command to set up). Also, if we don't have confidence in the starting
> state, it may be a sign to write it out more explicitly instead of
> relying on a complicated command to do the right thing.

True. I think the easiest way to do this would be the
"porcelain-downstream" you suggested in v1. But as you mentioned...

> However, in this case, I don't think we need this. All we need is to see
> that the test contains a new superproject commit that points to a new
> submodule commit (and recursively). So we don't need this line.

this isn't necessary, so I don't know if this is worth the effort at the
moment. I'll tinker with it.

As for the line itself, you're right, we don't need this. The state of
the downstream was more important when we cared about the old branch
head (it's needed for test_cmp), but we no longer do.

>> +	# Create new superproject commit with updated submodules
>> +	add_upstream_commit &&
>> +	(
>> +		cd submodule &&
>> +		(
>> +			cd subdir/deepsubmodule &&
>> +			git fetch &&
>> +			git checkout -q FETCH_HEAD
>> +		) &&
>> +		git add subdir/deepsubmodule &&
>> +		git commit -m "new deep submodule"
>> +	) &&
>> +	git add submodule &&
>> +	git commit -m "new submodule" &&
>
> I thought add_upstream_commit() would do this, but apparently it just
> adds commits to the submodules (which works for the earlier tests that
> just tested that the submodules were fetched, but not for this one). I
> think that all this should go into its own function.
>
> Also, I understand that "git fetch" is there to pick up the commit we
> made in add_upstream_commit, but this indirection is unnecessary in a
> test, I think. Can we not use add_upstream_commit and just create our
> own in subdir/deepsubmodule? (This might conflict with subsequent tests
> that use the old scheme, but I think that it should be fine.)

I copy-pasted this from existing tests, but I'm not happy with how noisy
it is either. I'll tinker with this too.

>> +# In downstream, init "submodule2", but do not check it out while
>> +# fetching. This lets us assert that unpopulated submodules can be
>> +# fetched.
>
> Firstly, "In upstream", I think? You want to fetch from it, so it has to
> be upstream.

It is "in downstream" - we "git init" the upstream, but we need to "git
submodule update --init" (which wraps "git submodule init") in the
downstream. If we didn't init it in the downstream, downstream wouldn't
have the clone and wouldn't fetch.


> Secondly, is this test needed? I thought the case in which the worktree
> has no submodules would be sufficient to test that unpopulated
> submodules can be fetched.

I'd prefer to have this test because it tests the interaction between
populated and unpopulated submodules. e.g. in a previous iteration, I
only had the "no submodules" test, but accidentally reused the
submodule_parallel_fetch.count variable for both populated and
unpopulated submodules. The test suite didn't catch the bug - I only
noticed the bug by a stroke of luck.

>> +test_expect_success "'--recurse-submodules' should fetch submodule commits in changed submodules and the index" '
>
> [snip]
>
>> +	git checkout super &&
>> +	(
>> +		cd downstream &&
>> +		git fetch --recurse-submodules >../actual.out 2>../actual.err
>> +	) &&
>> +	test_must_be_empty actual.out &&
>> +
>> +	# Assert that the submodules in the super branch are fetched
>> +	git rev-parse --short HEAD >superhead &&
>> +	git -C submodule rev-parse --short HEAD >subhead &&
>> +	git -C deepsubmodule rev-parse --short HEAD >deephead &&
>> +	verify_fetch_result actual.err &&
>> +	# grep for the exact line to check that the submodule is read
>> +	# from the index, not from a commit
>> +	grep "^Fetching submodule submodule\$" actual.err &&
>
> Instead of a grep, I think this should be done by precisely specifying
> what to fetch in the "git fetch" invocation, and then checking that the
> submodule has commits that it didn't have before.

verify_fetch_result() already tells us that the submodule
has the new commit:

	if [ -f subhead ]; then
		grep "Fetching submodule submodule" $ACTUAL_ERR &&
		grep -E "\.\.$(cat subhead)\s+sub\s+-> origin/sub" $ACTUAL_ERR

but (by design) it does not tell us whether "git fetch" read the
.gitmodules config from the index or from a commit. The additional grep,
anchored with "^" and "$", tells us that we read from the index, because
it checks that the info message is not "Fetching submodule submodule at
commit <id>". We want this check because "git fetch" should prefer the
index when the submodule is both changed and populated.
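A small C sketch using POSIX <regex.h> shows why the anchors matter (the helper name is hypothetical). With REG_NEWLINE, "^" and "$" match at line boundaries. The anchored pattern therefore accepts the index message and rejects the "at commit <id>" variant, which an unanchored pattern would also accept.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if some line of `output` matches the POSIX ERE `pattern`. */
static bool has_line_matching(const char *output, const char *pattern)
{
	regex_t re;
	bool found;

	/* REG_NEWLINE makes ^ and $ match at each line, like grep. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | REG_NEWLINE))
		return false;
	found = regexec(&re, output, 0, NULL, 0) == 0;
	regfree(&re);
	return found;
}
```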

> In addition, I think the following cases also need to be tested:
>  - two fetched commits have submodules of the same name but different
>    URL
>  - a fetched commit and a commit in the index have submodules of the
>    same name but different URL

It makes sense to test both cases. The former is a new edge case
introduced by this commit, while the latter was already a concern before
this commit. I _believe_ that we already handle the latter gracefully, and
that the same logic can be used to handle the former, but I don't think
we have any tests proving either of these hypotheses.

* Re: [PATCH v2 3/9] submodule: make static functions read submodules from commits
  2022-02-15 21:18     ` Jonathan Tan
@ 2022-02-16  6:59       ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-16  6:59 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jonathan Tan, git, Junio C Hamano

Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> The changed function signatures follow repo_submodule_init()'s argument
>> order, i.e. "path" then "treeish_name". Where needed, reorder the
>> arguments of functions that already take "path" and "treeish_name" to be
>> consistent with this convention.
>
> This paragraph made me nervous, but looking at the diff, you didn't
> actually reorder any arguments. Probably best to delete this paragraph.

Oh you're right. I was sure that this paragraph used to be relevant, but
I guess not. Thanks for the catch.

* Re: [PATCH v2 3/9] submodule: make static functions read submodules from commits
  2022-02-15 22:00     ` Ævar Arnfjörð Bjarmason
@ 2022-02-16  7:08       ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-16  7:08 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jonathan Tan, Junio C Hamano

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Feb 16 2022, Glen Choo wrote:
>
>> diff --git a/submodule.c b/submodule.c
>> index 5ace18a7d9..7032dcabb8 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -932,6 +932,7 @@ struct has_commit_data {
>>  	struct repository *repo;
>>  	int result;
>>  	const char *path;
>> +	const struct object_id *super_oid;
>>  };
>
> ...
>
>> -	struct has_commit_data has_commit = { r, 1, path };
>> +	struct has_commit_data has_commit = { r, 1, path, super_oid };
>
> FWIW I wouldn't at all mind the tiny detour of just turning this into
> designated initializers while we're at it, instead of having to keep
> track of the positionals. I.e.:
>
> 	[...] = {
> 		.repo = r,
> 		.result = 1,
> 		.path = path,
> 		.super_oid = super_oid,
> 	};

Since I'm touching the line anyway, this seems like a reasonable change.
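For reference, here is a self-contained C sketch of the designated-initializer form. `struct repository` and `struct object_id` are minimal stand-ins so that it compiles on its own (git's real definitions differ), and `make_has_commit()` is a hypothetical wrapper.

```c
#include <stddef.h>

/* Stand-ins so the example compiles on its own; git's types differ. */
struct repository;
struct object_id { unsigned char hash[32]; };

struct has_commit_data {
	struct repository *repo;
	int result;
	const char *path;
	const struct object_id *super_oid;
};

static struct has_commit_data make_has_commit(struct repository *r,
					      const char *path,
					      const struct object_id *super_oid)
{
	/* Each field is named, so reordering the struct cannot misassign. */
	struct has_commit_data has_commit = {
		.repo = r,
		.result = 1,
		.path = path,
		.super_oid = super_oid,
	};
	return has_commit;
}
```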

* Re: [PATCH v2 7/9] fetch: fetch unpopulated, changed submodules
  2022-02-16  5:46       ` Glen Choo
@ 2022-02-16  9:11         ` Glen Choo
  2022-02-16  9:39           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-16  9:11 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jonathan Tan, git, Junio C Hamano

Glen Choo <chooglen@google.com> writes:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
>> Glen Choo <chooglen@google.com> writes:
>>> +	# Create new superproject commit with updated submodules
>>> +	add_upstream_commit &&
>>> +	(
>>> +		cd submodule &&
>>> +		(
>>> +			cd subdir/deepsubmodule &&
>>> +			git fetch &&
>>> +			git checkout -q FETCH_HEAD
>>> +		) &&
>>> +		git add subdir/deepsubmodule &&
>>> +		git commit -m "new deep submodule"
>>> +	) &&
>>> +	git add submodule &&
>>> +	git commit -m "new submodule" &&
>>
>> I thought add_upstream_commit() would do this, but apparently it just
>> adds commits to the submodules (which works for the earlier tests that
>> just tested that the submodules were fetched, but not for this one). I
>> think that all this should go into its own function.

I'm testing out a function that does exactly what these lines do, i.e.
create a superproject commit containing a submodule change containing a
deepsubmodule change. That works pretty well and it makes sense in the
context of the tests.

>> Also, I understand that "git fetch" is there to pick up the commit we
>> made in add_upstream_commit, but this indirection is unnecessary in a
>> test, I think. Can we not use add_upstream_commit and just create our
>> own in subdir/deepsubmodule? (This might conflict with subsequent tests
>> that use the old scheme, but I think that it should be fine.)

We can avoid the "git fetch", but first we need to fix an inconsistency
in how the submodules are set up. Right now, we have:

  test_expect_success setup '
    mkdir deepsubmodule &&
    [...]
    mkdir submodule &&
    (
    [...]
      git submodule add "$pwd/deepsubmodule" subdir/deepsubmodule &&
      git commit -a -m new &&
      git branch -M sub
    ) &&
    git submodule add "$pwd/submodule" submodule &&
    [...]
    (
      cd downstream &&
      git submodule update --init --recursive
    )
  '

resulting in a directory structure like:

$pwd
|_submodule
  |_subdir
    |_deepsubmodule
|_deepsubmodule

and upstream/downstream dependencies like:

upstream                             downstream            
--------                             ----------
$pwd/deepsubmodule                   $pwd/downstream/submodule/subdir/deepsubmodule (SUT)
                                     $pwd/submodule/subdir/deepsubmodule


So we can't create the commit in submodule/subdir/deepsubmodule, because
that's not where our SUT would fetch from.

Instead, we could fix this by having a more consistent
upstream/downstream structure:

$pwd
|_submodule
  |_subdir
    |_deepsubmodule

upstream                             downstream            
--------                             ----------
$pwd/submodule/subdir/deepsubmodule  $pwd/downstream/submodule/subdir/deepsubmodule (SUT)

This seems more convenient to test, but before I commit to this, is
there a downside to this that I'm not seeing?

* Re: [PATCH v2 7/9] fetch: fetch unpopulated, changed submodules
  2022-02-16  9:11         ` Glen Choo
@ 2022-02-16  9:39           ` Ævar Arnfjörð Bjarmason
  2022-02-16 17:33             ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-16  9:39 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git, Junio C Hamano


On Wed, Feb 16 2022, Glen Choo wrote:

> Glen Choo <chooglen@google.com> writes:
>
>> Jonathan Tan <jonathantanmy@google.com> writes:
>>
>>> Glen Choo <chooglen@google.com> writes:
>>>> +	# Create new superproject commit with updated submodules
>>>> +	add_upstream_commit &&
>>>> +	(
>>>> +		cd submodule &&
>>>> +		(
>>>> +			cd subdir/deepsubmodule &&
>>>> +			git fetch &&
>>>> +			git checkout -q FETCH_HEAD
>>>> +		) &&
>>>> +		git add subdir/deepsubmodule &&
>>>> +		git commit -m "new deep submodule"
>>>> +	) &&
>>>> +	git add submodule &&
>>>> +	git commit -m "new submodule" &&
>>>
>>> I thought add_upstream_commit() would do this, but apparently it just
>>> adds commits to the submodules (which works for the earlier tests that
>>> just tested that the submodules were fetched, but not for this one). I
>>> think that all this should go into its own function.
>
> I'm testing out a function that does exactly what these lines do, i.e.
> create a superproject commit containing a submodule change containing a
> deepsubmodule change. That works pretty well and it makes sense in the
> context of the tests.
>
>>> Also, I understand that "git fetch" is there to pick up the commit we
>>> made in add_upstream_commit, but this indirection is unnecessary in a
>>> test, I think. Can we not use add_upstream_commit and just create our
>>> own in subdir/deepsubmodule? (This might conflict with subsequent tests
>>> that use the old scheme, but I think that it should be fine.)
>
> We can avoid the "git fetch", but first we need to fix an inconsistency
> in how the submodules are set up. Right now, we have:
>
>   test_expect_success setup '
>     mkdir deepsubmodule &&
>     [...]
>     mkdir submodule &&
>     (
>     [...]
>       git submodule add "$pwd/deepsubmodule" subdir/deepsubmodule &&
>       git commit -a -m new &&
>       git branch -M sub
>     ) &&
>     git submodule add "$pwd/submodule" submodule &&
>     [...]
>     (
>       cd downstream &&
>       git submodule update --init --recursive
>     )
>   '
>
> resulting in a directory structure like:
>
> $pwd
> |_submodule
>   |_subdir
>     |_deepsubmodule
> |_deepsubmodule
>
> and upstream/downstream dependencies like:
>
> upstream                             downstream            
> --------                             ----------
> $pwd/deepsubmodule                   $pwd/downstream/submodule/subdir/deepsubmodule (SUT)
>                                      $pwd/submodule/subdir/deepsubmodule
>
>
> So we can't create the commit in submodule/subdir/deepsubmodule, because
> that's not where our SUT would fetch from.
>
> Instead, we could fix this by having a more consistent
> upstream/downstream structure:
>
> $pwd
> |_submodule
>   |_subdir
>     |_deepsubmodule
>
> upstream                             downstream            
> --------                             ----------
> $pwd/submodule/subdir/deepsubmodule  $pwd/downstream/submodule/subdir/deepsubmodule (SUT)
>
> This seems more convenient to test, but before I commit to this, is
> there a downside to this that I'm not seeing?

Won't this sort of arrangement create N copies of e.g. a zlib.git or
some other common library that might be used by N submodules?

But I haven't read all the context, I'm assuming you're talking about
how we store in-tree a/b and x/y/b submodules now, we store those in
.git/ both as .git/modules/b.git or whatever? I can't remember ... :)

Whatever we do now I do think the caveat I've noted above is interesting
when it comes to submodule design, e.g. if both git.git and
some-random-thing.git both bring in the same sha1collisiondetection.git
from the same github URL should those be the same in our underlying
storage?

I think the answer to that would ideally be both "yes" and
"no".

I.e. "yes" because it's surely handy for "git fetch": then you don't need
to fetch the same stuff twice in the common case of just updating all our
recursive submodules.

And also "no" because maybe some users would really consider them
different. E.g. you may want to "cd git/" and adjust the git.git one
and create a branch there for some hotfix it needs, which would not be
needed/wanted by some-random-thing.git.

Hrm...

* Re: [PATCH v2 2/9] t5526: use grep to assert on fetches
  2022-02-16  3:09       ` Glen Choo
@ 2022-02-16 10:02         ` Ævar Arnfjörð Bjarmason
  2022-02-17  4:04           ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-16 10:02 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Junio C Hamano


On Wed, Feb 16 2022, Glen Choo wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Wed, Feb 16 2022, Glen Choo wrote:
>>
>>> In a previous commit, we replaced test_cmp invocations with
>>> verify_fetch_result(). Finish the process of removing test_cmp by using
>>> grep in verify_fetch_result() instead.
>>>
>>> This makes the tests less sensitive to changes because, instead of
>>> checking the whole stderr, we only grep for the lines of the form
>>>
>>> * "<old-head>..<new-head>\s+branch\s+-> origin/branch"
>>> * "Fetching submodule <submodule-path>" (if fetching a submodule)
>>>
>>> when we expect the repo to have fetched. If we expect the repo to not
>>> have fetched, grep to make sure the lines are absent. Also, simplify the
>>> assertions by using grep patterns that match only the relevant pieces of
>>> information, e.g. <old-head> is irrelevant because we only want to know
>>> if the fetch was performed, so we don't need to know where the branch
>>> was before the fetch.
>>
>> I tried ejecting 1/9 and 2/9 out of this series locally, and it passes
>> all tests until the new tests you add in 7/9.
>>
>> As ugly as some of the pre-image is, I wonder if dropping these first
>> two and biting the bullet and just continuing with the test_cmp is
>> better.
>>
>> The test_cmp is going to catch issues that even the cleverest grep
>> combinations won't, e.g. in the submodule-in-C series I discovered a bug
>> where all of our testing & the existing series hadn't spotted that we
>> were dropping a \n at the end in one of the messages.
>
> I think there are two schools of thought on how to test informational
> messages:
>
> - assert an exact match on the exact output that we generate
> - assert that the output contains the pieces of information we care
>   about
>
> These two approaches are virtually opposites on two axes - the former
> will catch unintentional changes (like you've noted)[...]

Yes, and to be fair I'm thoroughly in the "assert an exact match" camp,
i.e. "let's just use test_cmp", and not everyone would agree with that.

I mean, I don't think we should test_cmp every single instance of a
command, but where *the tests* are specifically about what the output
should be, yes we should do that.

> [...] and the latter saves on maintenance effort and tends to be less noisy in tests.

I also don't think you're right about the other approach "sav[ing] on
[future] maintenance effort" in this case.

If I was needing to adjust some of this output I'd spend way longer on
trying to carefully reason that some series of "grep" invocations were
really doing the right thing, and probably end up doing the equivalent
of a "test_cmp" for myself out of general paranoia.

Whereas adjusting the code, running the tests, and looking at the "diff
-u" failures from "test_cmp" and adjusting the output is an easy matter
of copy/pasting.

Then reviewers can just see what's a clear human-readable change,
e.g. imagine reviewing a patch where we start trying to align
something in the output, where the patch has a pretty much 1:1 (test_cmp)
mapping to the before/after, vs. doing the same with whatever "grep"
regex we wind up with.

> Personally, I'm a bit torn between both approaches in general because I
> want tests to be maintainable (testing the exact output is a bit of an
> antipattern at Google), but I'm not very comfortable with the fact that
> unintended changes can sneak through.

Yes. Anyway, what I meant to point out here with "biting the bullet" is
that, whatever one thinks in general about the right approach for new
tests, this series in particular seems to be creating more work for
itself than it needs by refactoring the test_cmp in existing tests just
to add a few new ones.

I.e. even if you'd like to not use test_cmp-alike for the new tests,
wouldn't it be simpler to just leave the old ones in place and use your
new helper for your new tests?

> So I don't think there's a correct answer in general. Maybe an
> acceptable rule of thumb is that test_cmp is good until it starts
> getting in the way of reading and writing understandable tests.
>
> If we agree on that rule, then for this patch, I think replacing
> test_cmp is the way to go, primarily because it lets us ignore the 'old
> head' of the branch before the fetch, e.g. in the quoted example..

[...]

>>>  test_expect_success setup '
>>> @@ -274,13 +277,10 @@ test_expect_success "Recursion doesn't happen when no new commits are fetched in
>>>  '
>>>  
>>>  test_expect_success "Recursion stops when no new submodule commits are fetched" '
>>> -	head1=$(git rev-parse --short HEAD) &&
>>>  	git add submodule &&
>>>  	git commit -m "new submodule" &&
>>> -	head2=$(git rev-parse --short HEAD) &&
>>> -	echo "From $pwd/." > expect.err.super &&
>>> -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
>>
>> ...as opposed to if we just rolled the generation of this into some
>> utility printf function.
>
> we'd still have to deal with $head1 if we use test_cmp. That's fine for
> this test, because it's pretty simple, but it gets pretty janky later
> on:
>
>   @@ -345,20 +339,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
>         git fetch &&
>         git checkout -q FETCH_HEAD
>       ) &&
>   -		head1=$(git rev-parse --short HEAD^) &&
>       git add subdir/deepsubmodule &&
>       git commit -m "new deepsubmodule" &&
>   -		head2=$(git rev-parse --short HEAD) &&
>   -		echo "Fetching submodule submodule" > ../expect.err.sub &&
>   -		echo "From $pwd/submodule" >> ../expect.err.sub &&
>   -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
>   +		git rev-parse --short HEAD >../subhead
>     ) &&
>   -	head1=$(git rev-parse --short HEAD) &&
>     git add submodule &&
>     git commit -m "new submodule" &&
>   -	head2=$(git rev-parse --short HEAD) &&
>   -	echo "From $pwd/." > expect.err.super &&
>   -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
>   +	git rev-parse --short HEAD >superhead &&
>     (
>       cd downstream &&
>       git fetch >../actual.out 2>../actual.err
>
> In this example, we have two $head1 variables in different subshells,
> one of which is HEAD and the other HEAD^. In my opinion, the reason
> why we want HEAD^ isn't obvious (IIRC it's because the submodule
> upstream is 2 commits ahead because we add the deepsubmodule in a
> separate commit), and I got tripped up quite a few times trying to read
> and understand the test. That's a lot of effort to spend on irrelevant
> information - the test actually cares about what it fetched, not where
> the ref used to be.
>
> So for that reason, I'd prefer to remove test_cmp for this test.

I agree that it's pretty irrelevant, but I also think we'd be throwing
the baby out with the bath water by entirely doing away with test_cmp
here, there's an easier way to do this.

I.e. none of these tests surely need to test that we updated from
$head1..$head2 again and again with the corresponding verbosity in test
setup and shelling out to "git rev-parse --short HEAD" or whatever.

Instead, let's just test once somewhere that when we run submodule
fetching, submodules are indeed updated appropriately. Surely other
submodule tests will break if the "update" code is made to NOOP, or to
update to the wrong HEAD?

Then for all these test_cmp tests we can just sed-away the
$head1..$head2 with something like (untested):

    sed -e 's/[0-9a-f]*\.\.[0-9a-f]*/OLD..NEW/g'

I.e. let's just skip this entire ceremony with asserting the old/new
HEAD unless it's really needed (and then we can probably do it once
outside a test_cmp).

If you grep through the test suite for "sed" adjacent to "test_cmp"
you'll find a lot of such examples of munging the output before
test_cmp-ing it.

That's perfectly fine here, since the actual point of the test_cmp is to
check the formatting/order etc. of the output itself, not to continually
re-assert that submodule updating still works, and that we get the right
OIDs.
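The same normalization can be sketched in C. The helper is hypothetical and is not in git's test suite. Like the sed above, it rewrites a "<hex>..<hex>" range to "OLD..NEW", but here at least one lowercase hex digit is required before the dots. Expected output then no longer depends on abbreviated object IDs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Lowercase hex only, as in git's abbreviated object IDs. */
static bool is_lower_hex(char c)
{
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
}

static size_t hex_run(const char *s)
{
	size_t n = 0;

	while (is_lower_hex(s[n]))
		n++;
	return n;
}

/*
 * Copy `line` into `out` (capacity `out_size`, NUL-terminated when
 * out_size > 0), replacing each "<hex>..<hex>" range with "OLD..NEW".
 * Output that does not fit is truncated.
 */
static void normalize_ranges(const char *line, char *out, size_t out_size)
{
	static const char repl[] = "OLD..NEW";
	size_t o = 0;

	if (!out_size)
		return;
	while (*line) {
		size_t a = hex_run(line);
		const char *src;
		size_t n;

		if (a && line[a] == '.' && line[a + 1] == '.') {
			src = repl;
			n = sizeof(repl) - 1;
			line += a + 2 + hex_run(line + a + 2);
		} else {
			/* Not a range: copy the hex run, or one other char. */
			src = line;
			n = a ? a : 1;
			line += n;
		}
		if (o + n >= out_size)
			break;
		memcpy(out + o, src, n);
		o += n;
	}
	out[o] = '\0';
}
```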




* Re: [PATCH v2 7/9] fetch: fetch unpopulated, changed submodules
  2022-02-16  9:39           ` Ævar Arnfjörð Bjarmason
@ 2022-02-16 17:33             ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-16 17:33 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Jonathan Tan, git, Junio C Hamano

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Feb 16 2022, Glen Choo wrote:
>
>> Glen Choo <chooglen@google.com> writes:
>>
>>> Jonathan Tan <jonathantanmy@google.com> writes:
>>>
>>>> Glen Choo <chooglen@google.com> writes:
>>>>> +	# Create new superproject commit with updated submodules
>>>>> +	add_upstream_commit &&
>>>>> +	(
>>>>> +		cd submodule &&
>>>>> +		(
>>>>> +			cd subdir/deepsubmodule &&
>>>>> +			git fetch &&
>>>>> +			git checkout -q FETCH_HEAD
>>>>> +		) &&
>>>>> +		git add subdir/deepsubmodule &&
>>>>> +		git commit -m "new deep submodule"
>>>>> +	) &&
>>>>> +	git add submodule &&
>>>>> +	git commit -m "new submodule" &&
>>>>
>>>> I thought add_upstream_commit() would do this, but apparently it just
>>>> adds commits to the submodules (which works for the earlier tests that
>>>> just tested that the submodules were fetched, but not for this one). I
>>>> think that all this should go into its own function.
>>
>> I'm testing out a function that does exactly what these lines do, i.e.
>> create a superproject commit containing a submodule change containing a
>> deepsubmodule change. That works pretty well and it makes sense in the
>> context of the tests.
>>
>>>> Also, I understand that "git fetch" is there to pick up the commit we
>>>> made in add_upstream_commit, but this indirection is unnecessary in a
>>>> test, I think. Can we not use add_upstream_commit and just create our
>>>> own in subdir/deepsubmodule? (This might conflict with subsequent tests
>>>> that use the old scheme, but I think that it should be fine.)
>>
>> We can avoid the "git fetch", but first we need to fix an inconsistency in
>> how the submodules are set up. Right now, we have:
>>
>>   test_expect_success setup '
>>     mkdir deepsubmodule &&
>>     [...]
>>     mkdir submodule &&
>>     (
>>     [...]
>>       git submodule add "$pwd/deepsubmodule" subdir/deepsubmodule &&
>>       git commit -a -m new &&
>>       git branch -M sub
>>     ) &&
>>     git submodule add "$pwd/submodule" submodule &&
>>     [...]
>>     (
>>       cd downstream &&
>>       git submodule update --init --recursive
>>     )
>>   '
>>
>> resulting in a directory structure like:
>>
>> $pwd
>> |_submodule
>>   |_subdir
>>     |_deepsubmodule
>> |_deepsubmodule
>>
>> and upstream/downstream dependencies like:
>>
>> upstream                             downstream            
>> --------                             ----------
>> $pwd/deepsubmodule                   $pwd/downstream/submodule/subdir/deepsubmodule (SUT)
>>                                      $pwd/submodule/subdir/deepsubmodule
>>
>>
>> So we can't create the commit in submodule/subdir/deepsubmodule, because
>> that's not where our SUT would fetch from.
>>
>> Instead, we could fix this by having a more consistent
>> upstream/downstream structure:
>>
>> $pwd
>> |_submodule
>>   |_subdir
>>     |_deepsubmodule
>>
>> upstream                             downstream            
>> --------                             ----------
>> $pwd/submodule/subdir/deepsubmodule  $pwd/downstream/submodule/subdir/deepsubmodule (SUT)
>>
>> This seems more convenient to test, but before I commit to this, is
>> there a downside to this that I'm not seeing?
>
> Won't this sort of arrangement create N copies of e.g. a zlib.git or
> some other common library that might be used by N submodules?
>
> But I haven't read all the context. I'm assuming you're talking about
> how we store in-tree a/b and x/y/b submodules now; do we store both of
> those in .git/ as .git/modules/b.git or whatever? I can't remember ... :)

Ah, the problem I'm describing is much simpler; it's just "how do we want
our test setup (which has submodules) to look".

But we can also consider the question you are asking :)

> Whatever we do now I do think the caveat I've noted above is interesting
> when it comes to submodule design, e.g. if both git.git and
> some-random-thing.git both bring in the same sha1collisiondetection.git
> from the same github URL should those be the same in our underlying
> storage?
>
> I think the answer to that would ideally be both "yes" and
> "no".
>
> I.e. "yes" because it's surely handy for "git fetch": now you don't need to
> fetch the same stuff twice in the common case of just updating all our
> recursive submodules.

Hm, and it would save space on disk.

> And also "no" because maybe some users would really consider them
> different. E.g. you may want to "cd git/" and adjust the git.git one
> and create a branch there for some hotfix it needs, which would not be
> needed/wanted by some-random-thing.git.

I don't think we could say "yes" for all users, because the subset of
users you describe here will probably appreciate them being separate.

But I can imagine doing this manually, like a "git submodule dedupe"
that lets users who really need it opt into this risky setup where
submodules are shared. Does anyone really need it, though? I'm not sure
yet.


* Re: [PATCH v2 2/9] t5526: use grep to assert on fetches
  2022-02-16 10:02         ` Ævar Arnfjörð Bjarmason
@ 2022-02-17  4:04           ` Glen Choo
  2022-02-17  9:25             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-17  4:04 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jonathan Tan, Junio C Hamano

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Feb 16 2022, Glen Choo wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>>> On Wed, Feb 16 2022, Glen Choo wrote:
>>>
>>>> In a previous commit, we replaced test_cmp invocations with
>>>> verify_fetch_result(). Finish the process of removing test_cmp by using
>>>> grep in verify_fetch_result() instead.
>>>>
>>>> This makes the tests less sensitive to changes because, instead of
>>>> checking the whole stderr, we only grep for the lines of the form
>>>>
>>>> * "<old-head>..<new-head>\s+branch\s+-> origin/branch"
>>>> * "Fetching submodule <submodule-path>" (if fetching a submodule)
>>>>
>>>> when we expect the repo to have fetched. If we expect the repo to not
>>>> have fetched, grep to make sure the lines are absent. Also, simplify the
>>>> assertions by using grep patterns that match only the relevant pieces of
>>>> information, e.g. <old-head> is irrelevant because we only want to know
>>>> if the fetch was performed, so we don't need to know where the branch
>>>> was before the fetch.
>>>
>>> I tried ejecting 1/9 and 2/9 out of this series locally, and it passes
>>> all tests until the new tests you add in 7/9.
>>>
>>> As ugly as some of the pre-image is, I wonder if dropping these first
>>> two and biting the bullet and just continuing with the test_cmp is
>>> better.
>>>
>>> The test_cmp is going to catch issues that even the cleverest grep
>>> combinations won't, e.g. in the submodule-in-C series I discovered a bug
>>> where all of our testing & the existing series hadn't spotted that we
>>> were dropping a \n at the end in one of the messages.
>>
>> I think there are two schools of thought on how to test informational
>> messages:
>>
>> - assert an exact match on the exact output that we generate
>> - assert that the output contains the pieces of information we care
>>   about
>>
>> These two approaches are virtually opposites on two axes - the former
>> will catch unintentional changes (like you've noted)[...]
>
> Yes, and to be fair I'm thoroughly in the "assert an exact match" camp,
> i.e. "let's just use test_cmp", and not everyone would agree with that.
>
> I mean, I don't think we should test_cmp every single instance of a
> command, but for things that are *the tests* concerning themselves with
> what the output should be, yes we should do that.

That's a good point I hadn't considered, which is that if we want any
hope of catching unintentional changes in our test suite, we'd want
_some_ test to check the output. For "git fetch --recurse-submodules",
it makes the most sense for that test to live in this file.

By eliminating all instances of test_cmp in this file in particular, we
lose assurances that we don't introduce accidental changes. It makes
sense to at least have some tests explicitly for output.
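To make the trade-off concrete, here is a small illustrative sketch (POSIX sh; the file names and message are only examples) of the kind of regression Ævar mentions: a message that loses its trailing newline still satisfies a line-oriented grep, while an exact comparison (here `cmp`, standing in for `test_cmp`) flags it:

```sh
#!/bin/sh
# Expected message, ending in a newline.
printf 'Fetching submodule submodule\n' >expect.err
# Simulated regression: the same message without its trailing newline.
printf 'Fetching submodule submodule' >actual.err

# A line-oriented grep still finds the message,
if grep -q '^Fetching submodule submodule$' actual.err
then
	echo "grep: message found"
fi

# while an exact comparison notices the missing newline.
if ! cmp -s expect.err actual.err
then
	echo "exact comparison: output differs"
fi
```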

>
>> [...] and the latter saves on maintenance effort and tends to be less noisy in tests.
>
> I also don't think you're right about the other approach "sav[ing] on
> [future] maintenance effort" in this case.
>
> If I was needing to adjust some of this output I'd spend way longer on
> trying to carefully reason that some series of "grep" invocations were
> really doing the right thing, and probably end up doing the equivalent
> of a "test_cmp" for myself out of general paranoia, whereas adjusting
> the output.

That's fair. I've optimized the tests for readability by putting
complicated logic in the test helper. But any diligent test reader would
need to read the test helper to convince themselves of its correctness.
In this case, I agree that the helper is too complex.

>> Personally, I'm a bit torn between both approaches in general because I
>> want tests to be maintainable (testing the exact output is a bit of an
>> antipattern at Google), but I'm not very comfortable with the fact that
>> unintended changes can sneak through.
>
> Yes, anyway whatever one thinks in general what I meant to point out
> here with "biting the bullet" is that whatever one thinks in general
> about the right approach for new tests, this series in particular seems
> to be creating more work for itself than it needs by refactoring the
> test_cmp in existing tests just to add a few new ones.
>
> I.e. even if you'd like to not use test_cmp-alike for the new tests,
> wouldn't it be simpler to just leave the old ones in place and use your
> new helper for your new tests?

I'm not sure about this - avoiding changing old tests leads to
fragmentation in the test suite and even the same file. I find it very
challenging to read/modify files like this, because there is no longer a
consistent style for the file, and I have to figure out which is the
"good" way to write tests.

This suggestion makes sense if there's some qualitative difference
between the new tests and old ones besides just 'being new'. This isn't
true for this series, so I'd prefer to keep things consistent.

>> So I don't think there's a correct answer in general. Maybe an
>> acceptable rule of thumb is that test_cmp is good until it starts
>> getting in the way of reading and writing understandable tests.
>>
>> If we agree on that rule, then for this patch, I think replacing
>> test_cmp is the way to go, primarily because it lets us ignore the 'old
>> head' of the branch before the fetch, e.g. in the quoted example..
>
> [...]
>
>>>>  test_expect_success setup '
>>>> @@ -274,13 +277,10 @@ test_expect_success "Recursion doesn't happen when no new commits are fetched in
>>>>  '
>>>>  
>>>>  test_expect_success "Recursion stops when no new submodule commits are fetched" '
>>>> -	head1=$(git rev-parse --short HEAD) &&
>>>>  	git add submodule &&
>>>>  	git commit -m "new submodule" &&
>>>> -	head2=$(git rev-parse --short HEAD) &&
>>>> -	echo "From $pwd/." > expect.err.super &&
>>>> -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
>>>
>>> ...as opposed to if we just rolled the generation of this into some
>>> utility printf function.
>>
>> we'd still have to deal with $head1 if we use test_cmp. That's fine for
>> this test, because it's pretty simple, but it gets pretty janky later
>> on:
>>
>>   @@ -345,20 +339,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
>>         git fetch &&
>>         git checkout -q FETCH_HEAD
>>       ) &&
>>   -		head1=$(git rev-parse --short HEAD^) &&
>>       git add subdir/deepsubmodule &&
>>       git commit -m "new deepsubmodule" &&
>>   -		head2=$(git rev-parse --short HEAD) &&
>>   -		echo "Fetching submodule submodule" > ../expect.err.sub &&
>>   -		echo "From $pwd/submodule" >> ../expect.err.sub &&
>>   -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
>>   +		git rev-parse --short HEAD >../subhead
>>     ) &&
>>   -	head1=$(git rev-parse --short HEAD) &&
>>     git add submodule &&
>>     git commit -m "new submodule" &&
>>   -	head2=$(git rev-parse --short HEAD) &&
>>   -	echo "From $pwd/." > expect.err.super &&
>>   -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
>>   +	git rev-parse --short HEAD >superhead &&
>>     (
>>       cd downstream &&
>>       git fetch >../actual.out 2>../actual.err
>>
>> In this example, we have two $head1 variables in different subshells,
>> one of which is HEAD, but the other is HEAD^. The reason why we want
>> HEAD^ isn't obvious (IIRC it's because the submodule upstream is 2
>> commits ahead because we add the deepsubmodule in a separate commit), in
>> my opinion, and I got tripped up quite a few times trying to read and
>> understand the test. That's a lot of effort to spend on irrelevant
>> information - the test actually cares about what it fetched, not where
>> the ref used to be.
>>
>> So for that reason, I'd prefer to remove test_cmp for this test.
>
> I agree that it's pretty irrelevant, but I also think we'd be throwing
> the baby out with the bath water by entirely doing away with test_cmp
> here; there's an easier way to do this.
[...]
> Instead, let's just test once somewhere that submodules are indeed
> updated appropriately when we run submodule fetching. Surely other
> submodule tests will break if the "update" code is made to NOOP, or
> update to the wrong HEAD.
>
> Then for all these test_cmp tests we can just sed-away the
> $head1..$head2 with something like (untested):
>
>     sed -e 's/[0-9a-f]*\.\.[0-9a-f]*/OLD..NEW/'
>
> I.e. let's just skip this entire ceremony with asserting the old/new
> HEAD unless it's really needed (and then we can probably do it once
> outside a test_cmp).
>
> If you grep through the test suite for "sed" adjacent to "test_cmp"
> you'll find a lot of such examples of munging the output before
> test_cmp-ing it.

Makes sense, I hadn't considered this (though I have seen the pattern in
the test suite, oops). The most compelling argument in favor of this is
that this could remove a lot of the complexity of verify_fetch_result(),
which is impeding test readability.

> I.e. none of these tests surely need to test that we updated from
> $head1..$head2 again and again with the corresponding verbosity in test
> setup and shelling out to "git rev-parse --short HEAD" or whatever.

I find the converse (we are testing the formatting over and over again)
less convincing. In these tests, we really are checking for $head2 in
the stderr to verify that the correct thing was fetched. I'm not
convinced that we should be relying on _other_ submodule tests to tell
us that submodule fetch is broken. Which brings me back to the original
motivation of this patch..

>
> That's perfectly fine here, since the actual point of the test_cmp is to
> check the formatting/order etc. of the output itself, not to continually
> re-assert that submodule updating still works, and that we get the right
> OIDs.

which is that these tests actually are continually re-asserting that
submodule updating works correctly in the different circumstances, and
since we use the stderr to check this, test_cmp adds unwarranted noise.

But you are correct in that the point of test_cmp is to check
formatting/order etc. There is value in using test_cmp for this purpose,
and getting rid of it entirely creates a hole in our test coverage.
(This wouldn't mean that we'd need to use test_cmp _everywhere_ though,
only that we need to use test_cmp _somewhere_.)

As it stands:

+   test_cmp can improve the readability of the test helpers and
    debuggability of tests vs grep
+/- test_cmp can catch formatting changes that are hard to catch
    otherwise, though at the cost of being sensitive to _any_ formatting
    changes
-   test_cmp needs some munging to eliminate unnecessary information

so on the whole, I think it's worth trying to use test_cmp in the test
helper. We may not _need_ it everywhere, but it would be nice to use
it in as many places as possible.
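One way to reconcile the two positions, sketched under assumptions: mask only the old head, keep the new head, and compare the whole output exactly. That keeps the formatting check while still asserting which commit was fetched. `normalize_old_head`, `new_head` and the sample output are hypothetical; in t5526 the new head would presumably come from "git rev-parse --short HEAD", and `test_cmp` would replace `diff -u`:

```sh
#!/bin/sh
# Hypothetical helper: replace the old head in "<old>..<new>" with OLD,
# keeping the new head so the comparison still checks what was fetched.
normalize_old_head () {
	sed -e 's/[0-9a-f]*\.\.\([0-9a-f]*\)/OLD..\1/' "$1"
}

# Made-up abbreviated object name for the commit we expect to fetch.
new_head=5d6e7f8

# Sample "git fetch" stderr (illustrative only).
cat >actual.err <<\EOF
From /tmp/trash/.
   1a2b3c4..5d6e7f8  super      -> origin/super
EOF

# The expectation is built from the new head alone.
printf '%s\n' "From /tmp/trash/." \
	"   OLD..$new_head  super      -> origin/super" >expect.err

normalize_old_head actual.err >actual.normalized &&
diff -u expect.err actual.normalized &&
echo "fetched $new_head with the expected formatting"
```

If the wrong commit were fetched, the new head would differ and the exact comparison would fail, so the old head is the only information thrown away.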


* Re: [PATCH v2 2/9] t5526: use grep to assert on fetches
  2022-02-17  4:04           ` Glen Choo
@ 2022-02-17  9:25             ` Ævar Arnfjörð Bjarmason
  2022-02-17 16:16               ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-17  9:25 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Junio C Hamano


On Thu, Feb 17 2022, Glen Choo wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Wed, Feb 16 2022, Glen Choo wrote:
>>
>>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>>
>>>> On Wed, Feb 16 2022, Glen Choo wrote:
>>>>
>>>>> In a previous commit, we replaced test_cmp invocations with
>>>>> verify_fetch_result(). Finish the process of removing test_cmp by using
>>>>> grep in verify_fetch_result() instead.
>>>>>
>>>>> This makes the tests less sensitive to changes because, instead of
>>>>> checking the whole stderr, we only grep for the lines of the form
>>>>>
>>>>> * "<old-head>..<new-head>\s+branch\s+-> origin/branch"
>>>>> * "Fetching submodule <submodule-path>" (if fetching a submodule)
>>>>>
>>>>> when we expect the repo to have fetched. If we expect the repo to not
>>>>> have fetched, grep to make sure the lines are absent. Also, simplify the
>>>>> assertions by using grep patterns that match only the relevant pieces of
>>>>> information, e.g. <old-head> is irrelevant because we only want to know
>>>>> if the fetch was performed, so we don't need to know where the branch
>>>>> was before the fetch.
>>>>
>>>> I tried ejecting 1/9 and 2/9 out of this series locally, and it passes
>>>> all tests until the new tests you add in 7/9.
>>>>
>>>> As ugly as some of the pre-image is, I wonder if dropping these first
>>>> two and biting the bullet and just continuing with the test_cmp is
>>>> better.
>>>>
>>>> The test_cmp is going to catch issues that even the cleverest grep
>>>> combinations won't, e.g. in the submodule-in-C series I discovered a bug
>>>> where all of our testing & the existing series hadn't spotted that we
>>>> were dropping a \n at the end in one of the messages.
>>>
>>> I think there are two schools of thought on how to test informational
>>> messages:
>>>
>>> - assert an exact match on the exact output that we generate
>>> - assert that the output contains the pieces of information we care
>>>   about
>>>
>>> These two approaches are virtually opposites on two axes - the former
>>> will catch unintentional changes (like you've noted)[...]
>>
>> Yes, and to be fair I'm thoroughly in the "assert an exact match" camp,
>> i.e. "let's just use test_cmp", and not everyone would agree with that.
>>
>> I mean, I don't think we should test_cmp every single instance of a
>> command, but for things that are *the tests* concerning themselves with
>> what the output should be, yes we should do that.
>
> That's a good point I hadn't considered, which is that if we want any
> hope of catching unintentional changes in our test suite, we'd want
> _some_ test to check the output. For "git fetch --recurse-submodules",
> it makes the most sense for that test to live in this file.
>
> By eliminating all instances of test_cmp in this file in particular, we
> lose assurances that we don't introduce accidental changes. It makes
> sense to at least have some tests explicitly for output.
>
>>
>>> [...] and the latter saves on maintenance effort and tends to be less noisy in tests.
>>
>> I also don't think you're right about the other approach "sav[ing] on
>> [future] maintenance effort" in this case.
>>
>> If I was needing to adjust some of this output I'd spend way longer on
>> trying to carefully reason that some series of "grep" invocations were
>> really doing the right thing, and probably end up doing the equivalent
>> of a "test_cmp" for myself out of general paranoia, whereas adjusting
>> the output.
>
> That's fair. I've optimized the tests for readability by putting
> complicated logic in the test helper. But any diligent test reader would
> need to read the test helper to convince themselves of its correctness.
> In this case, I agree that the helper is too complex.
>
>>> Personally, I'm a bit torn between both approaches in general because I
>>> want tests to be maintainable (testing the exact output is a bit of an
>>> antipattern at Google), but I'm not very comfortable with the fact that
>>> unintended changes can sneak through.
>>
>> Yes, anyway whatever one thinks in general what I meant to point out
>> here with "biting the bullet" is that whatever one thinks in general
>> about the right approach for new tests, this series in particular seems
>> to be creating more work for itself than it needs by refactoring the
>> test_cmp in existing tests just to add a few new ones.
>>
>> I.e. even if you'd like to not use test_cmp-alike for the new tests,
>> wouldn't it be simpler to just leave the old ones in place and use your
>> new helper for your new tests?
>
> I'm not sure about this - avoiding changing old tests leads to
> fragmentation in the test suite and even the same file. I find it very
> challenging to read/modify files like this, because there is no longer a
> consistent style for the file, and I have to figure out which is the
> "good" way to write tests.
>
> This suggestion makes sense if there's some qualitative difference
> between the new tests and old ones besides just 'being new'. This isn't
> true for this series, so I'd prefer to keep things consistent.
>
>>> So I don't think there's a correct answer in general. Maybe an
>>> acceptable rule of thumb is that test_cmp is good until it starts
>>> getting in the way of reading and writing understandable tests.
>>>
>>> If we agree on that rule, then for this patch, I think replacing
>>> test_cmp is the way to go, primarily because it lets us ignore the 'old
>>> head' of the branch before the fetch, e.g. in the quoted example..
>>
>> [...]
>>
>>>>>  test_expect_success setup '
>>>>> @@ -274,13 +277,10 @@ test_expect_success "Recursion doesn't happen when no new commits are fetched in
>>>>>  '
>>>>>  
>>>>>  test_expect_success "Recursion stops when no new submodule commits are fetched" '
>>>>> -	head1=$(git rev-parse --short HEAD) &&
>>>>>  	git add submodule &&
>>>>>  	git commit -m "new submodule" &&
>>>>> -	head2=$(git rev-parse --short HEAD) &&
>>>>> -	echo "From $pwd/." > expect.err.super &&
>>>>> -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
>>>>
>>>> ...as opposed to if we just rolled the generation of this into some
>>>> utility printf function.
>>>
>>> we'd still have to deal with $head1 if we use test_cmp. That's fine for
>>> this test, because it's pretty simple, but it gets pretty janky later
>>> on:
>>>
>>>   @@ -345,20 +339,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
>>>         git fetch &&
>>>         git checkout -q FETCH_HEAD
>>>       ) &&
>>>   -		head1=$(git rev-parse --short HEAD^) &&
>>>       git add subdir/deepsubmodule &&
>>>       git commit -m "new deepsubmodule" &&
>>>   -		head2=$(git rev-parse --short HEAD) &&
>>>   -		echo "Fetching submodule submodule" > ../expect.err.sub &&
>>>   -		echo "From $pwd/submodule" >> ../expect.err.sub &&
>>>   -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
>>>   +		git rev-parse --short HEAD >../subhead
>>>     ) &&
>>>   -	head1=$(git rev-parse --short HEAD) &&
>>>     git add submodule &&
>>>     git commit -m "new submodule" &&
>>>   -	head2=$(git rev-parse --short HEAD) &&
>>>   -	echo "From $pwd/." > expect.err.super &&
>>>   -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
>>>   +	git rev-parse --short HEAD >superhead &&
>>>     (
>>>       cd downstream &&
>>>       git fetch >../actual.out 2>../actual.err
>>>
>>> In this example, we have two $head1 variables in different subshells,
>>> one of which is HEAD, but the other is HEAD^. The reason why we want
>>> HEAD^ isn't obvious (IIRC it's because the submodule upstream is 2
>>> commits ahead because we add the deepsubmodule in a separate commit), in
>>> my opinion, and I got tripped up quite a few times trying to read and
>>> understand the test. That's a lot of effort to spend on irrelevant
>>> information - the test actually cares about what it fetched, not where
>>> the ref used to be.
>>>
>>> So for that reason, I'd prefer to remove test_cmp for this test.
>>
>> I agree that it's pretty irrelevant, but I also think we'd be throwing
>> the baby out with the bath water by entirely doing away with test_cmp
>> here; there's an easier way to do this.
> [...]
>> Instead, let's just test once somewhere that submodules are indeed
>> updated appropriately when we run submodule fetching. Surely other
>> submodule tests will break if the "update" code is made to NOOP, or
>> update to the wrong HEAD.
>>
>> Then for all these test_cmp tests we can just sed-away the
>> $head1..$head2 with something like (untested):
>>
>>     sed -e 's/[0-9a-f]*\.\.[0-9a-f]*/OLD..NEW/'
>>
>> I.e. let's just skip this entire ceremony with asserting the old/new
>> HEAD unless it's really needed (and then we can probably do it once
>> outside a test_cmp).
>>
>> If you grep through the test suite for "sed" adjacent to "test_cmp"
>> you'll find a lot of such examples of munging the output before
>> test_cmp-ing it.
>
> Makes sense, I hadn't considered this (though I have seen the pattern in
> the test suite, oops). The most compelling argument in favor of this is
> that this could remove a lot of the complexity of verify_fetch_result(),
> which is impeding test readability.
>
>> I.e. none of these tests surely need to test that we updated from
>> $head1..$head2 again and again with the corresponding verbosity in test
>> setup and shelling out to "git rev-parse --short HEAD" or whatever.
>
> I find the converse (we are testing the formatting over and over again)
> less convincing. In these tests, we really are checking for $head2 in
> the stderr to verify that the correct thing was fetched. I'm not
> convinced that we should be relying on _other_ submodule tests to tell
> us that submodule fetch is broken. Which brings me back to the original
> motivation of this patch..
>
>>
>> That's perfectly fine here, since the actual point of the test_cmp is to
>> check the formatting/order etc. of the output itself, not to continually
>> re-assert that submodule updating still works, and that we get the right
>> OIDs.
>
> which is that these tests actually are continually re-asserting that
> submodule updating works correctly in the different circumstances, and
> since we use the stderr to check this, test_cmp adds unwarranted noise.
>
> But you are correct in that the point of test_cmp is to check
> formatting/order etc. There is value in using test_cmp for this purpose,
> and getting rid of it entirely creates a hole in our test coverage.
> (This wouldn't mean that we'd need to use test_cmp _everywhere_ though,
> only that we need to use test_cmp _somewhere_.)
>
> As it stands:
>
> +   test_cmp can improve the readability of the test helpers and
>     debuggability of tests vs grep
> +/- test_cmp can catch formatting changes that are hard to catch
>     otherwise, though at the cost of being sensitive to _any_ formatting
>     changes
> -   test_cmp needs some munging to eliminate unnecessary information
>
> so on the whole, I think it's worth trying to use test_cmp in the test
> helper. We may not _need_ it everywhere, but it would be nice to use
> it in as many places as possible.

I think whatever you opt to go for here makes sense. I just wanted to
provide the feedback in case it was helpful, i.e. when reading it I
found these conversions a bit odd, wondered if they were strictly needed
etc.

Thanks!


* Re: [PATCH v2 2/9] t5526: use grep to assert on fetches
  2022-02-17  9:25             ` Ævar Arnfjörð Bjarmason
@ 2022-02-17 16:16               ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-17 16:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jonathan Tan, Junio C Hamano

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Feb 17 2022, Glen Choo wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>>> On Wed, Feb 16 2022, Glen Choo wrote:
>>>
>>>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>>>
>>>>> On Wed, Feb 16 2022, Glen Choo wrote:
>>>>>
>>>>>> In a previous commit, we replaced test_cmp invocations with
>>>>>> verify_fetch_result(). Finish the process of removing test_cmp by using
>>>>>> grep in verify_fetch_result() instead.
>>>>>>
>>>>>> This makes the tests less sensitive to changes because, instead of
>>>>>> checking the whole stderr, we only grep for the lines of the form
>>>>>>
>>>>>> * "<old-head>..<new-head>\s+branch\s+-> origin/branch"
>>>>>> * "Fetching submodule <submodule-path>" (if fetching a submodule)
>>>>>>
>>>>>> when we expect the repo to have fetched. If we expect the repo to not
>>>>>> have fetched, grep to make sure the lines are absent. Also, simplify the
>>>>>> assertions by using grep patterns that match only the relevant pieces of
>>>>>> information, e.g. <old-head> is irrelevant because we only want to know
>>>>>> if the fetch was performed, so we don't need to know where the branch
>>>>>> was before the fetch.
>>>>>
>>>>> I tried ejecting 1/9 and 2/9 out of this series locally, and it passes
>>>>> all tests until the new tests you add in 7/9.
>>>>>
>>>>> As ugly as some of the pre-image is, I wonder if dropping these first
>>>>> two and biting the bullet and just continuing with the test_cmp is
>>>>> better.
>>>>>
>>>>> The test_cmp is going to catch issues that even the cleverest grep
>>>>> combinations won't, e.g. in the submodule-in-C series I discovered a bug
>>>>> where all of our testing & the existing series hadn't spotted that we
>>>>> were dropping a \n at the end in one of the messages.
>>>>
>>>> I think there are two schools of thought on how to test informational
>>>> messages:
>>>>
>>>> - assert an exact match on the exact output that we generate
>>>> - assert that the output contains the pieces of information we care
>>>>   about
>>>>
>>>> These two approaches are virtually opposites on two axes - the former
>>>> will catch unintentional changes (like you've noted)[...]
>>>
>>> Yes, and to be fair I'm thoroughly in the "assert an exact match" camp,
>>> i.e. "let's just use test_cmp", and not everyone would agree with that.
>>>
>>> I mean, I don't think we should test_cmp every single instance of a
>>> command, but for things that are *the tests* concerning themselves with
>>> what the output should be, yes we should do that.
>>
>> That's a good point I hadn't considered, which is that if we want any
>> hope of catching unintentional changes in our test suite, we'd want
>> _some_ test to check the output. For "git fetch --recurse-submodules",
>> it makes the most sense for that test to live in this file.
>>
>> By eliminating all instances of test_cmp in this file in particular, we
>> lose assurances that we don't introduce accidental changes. It makes
>> sense to at least have some tests explicitly for output.
>>
>>>
>>>> [...] and the latter saves on maintenance effort and tends to be less noisy in tests.
>>>
>>> I also don't think you're right about the other approach "sav[ing] on
>>> [future] maintenance effort" in this case.
>>>
>>> If I needed to adjust some of this output I'd spend way longer
>>> carefully reasoning that some series of "grep" invocations was
>>> really doing the right thing, and would probably end up doing the
>>> equivalent of a "test_cmp" for myself out of general paranoia, whereas
>>> with test_cmp I'd just adjust the expected output.
>>
>> That's fair. I've optimized the tests for readability by putting
>> complicated logic in the test helper. But any diligent test reader would
>> need to read the test helper to convince themselves of its correctness.
>> In this case, I agree that the helper is too complex.
>>
>>>> Personally, I'm a bit torn between both approaches in general because I
>>>> want tests to be maintainable (testing the exact output is a bit of an
>>>> antipattern at Google), but I'm not very comfortable with the fact that
>>>> unintended changes can sneak through.
>>>
>>> Yes, anyway, what I meant to point out here with "biting the bullet"
>>> is that, whatever one thinks in general about the right approach for
>>> new tests, this series in particular seems to be creating more work
>>> for itself than it needs by refactoring the test_cmp in existing tests
>>> just to add a few new ones.
>>>
>>> I.e. even if you'd like to not use test_cmp-alike for the new tests,
>>> wouldn't it be simpler to just leave the old ones in place and use your
>>> new helper for your new tests?
>>
>> I'm not sure about this - avoiding changing old tests leads to
>> fragmentation in the test suite and even the same file. I find it very
>> challenging to read/modify files like this, because there is no longer a
>> consistent style for the file, and I have to figure out which is the
>> "good" way to write tests.
>>
>> This suggestion makes sense if there's some qualitative difference
>> between the new tests and old ones besides just 'being new'. This isn't
>> true for this series, so I'd prefer to keep things consistent.
>>
>>>> So I don't think there's a correct answer in general. Maybe an
>>>> acceptable rule of thumb is that test_cmp is good until it starts
>>>> getting in the way of reading and writing understandable tests.
>>>>
>>>> If we agree on that rule, then for this patch, I think replacing
>>>> test_cmp is the way to go, primarily because it lets us ignore the 'old
>>>> head' of the branch before the fetch, e.g. in the quoted example..
>>>
>>> [...]
>>>
>>>>>>  test_expect_success setup '
>>>>>> @@ -274,13 +277,10 @@ test_expect_success "Recursion doesn't happen when no new commits are fetched in
>>>>>>  '
>>>>>>  
>>>>>>  test_expect_success "Recursion stops when no new submodule commits are fetched" '
>>>>>> -	head1=$(git rev-parse --short HEAD) &&
>>>>>>  	git add submodule &&
>>>>>>  	git commit -m "new submodule" &&
>>>>>> -	head2=$(git rev-parse --short HEAD) &&
>>>>>> -	echo "From $pwd/." > expect.err.super &&
>>>>>> -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
>>>>>
>>>>> ...as opposed to if we just rolled the generation of this into some
>>>>> utility printf function.
>>>>
>>>> we'd still have to deal with $head1 if we use test_cmp. That's fine for
>>>> this test, because it's pretty simple, but it gets pretty janky later
>>>> on:
>>>>
>>>>   @@ -345,20 +339,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
>>>>         git fetch &&
>>>>         git checkout -q FETCH_HEAD
>>>>       ) &&
>>>>   -		head1=$(git rev-parse --short HEAD^) &&
>>>>       git add subdir/deepsubmodule &&
>>>>       git commit -m "new deepsubmodule" &&
>>>>   -		head2=$(git rev-parse --short HEAD) &&
>>>>   -		echo "Fetching submodule submodule" > ../expect.err.sub &&
>>>>   -		echo "From $pwd/submodule" >> ../expect.err.sub &&
>>>>   -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
>>>>   +		git rev-parse --short HEAD >../subhead
>>>>     ) &&
>>>>   -	head1=$(git rev-parse --short HEAD) &&
>>>>     git add submodule &&
>>>>     git commit -m "new submodule" &&
>>>>   -	head2=$(git rev-parse --short HEAD) &&
>>>>   -	echo "From $pwd/." > expect.err.super &&
>>>>   -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
>>>>   +	git rev-parse --short HEAD >superhead &&
>>>>     (
>>>>       cd downstream &&
>>>>       git fetch >../actual.out 2>../actual.err
>>>>
>>>> In this example, we have two $head1 variables in different subshells,
>>>> one of which is HEAD, but the other is HEAD^. The reason why we want
>>>> HEAD^ isn't obvious (IIRC it's because the submodule upstream is 2
>>>> commits ahead because we add the deepsubmodule in a separate commit), in
>>>> my opinion, and I got tripped up quite a few times trying to read and
>>>> understand the test. That's a lot of effort to spend on irrelevant
>>>> information - the test actually cares about what it fetched, not where
>>>> the ref used to be.
>>>>
>>>> So for that reason, I'd prefer to remove test_cmp for this test.
>>>
>>> I agree that it's pretty irrelevant, but I also think we'd be throwing
>>> the baby out with the bath water by entirely doing away with test_cmp
>>> here, there's an easier way to do this.
>> [...]
>>> Instead, let's just test once somewhere that submodule fetching
>>> indeed updates submodules appropriately. Surely other submodule tests
>>> will break if the "update" code is made a no-op, or updates to the
>>> wrong HEAD?
>>>
>>> Then for all these test_cmp tests we can just sed-away the
>>> $head1..$head2 with something like (untested):
>>>
>>>     sed -n -e 's/[^.]*\.\.[^.]*/OLD..NEW/g'
>>>
>>> I.e. let's just skip this entire ceremony with asserting the old/new
>>> HEAD unless it's really needed (and then we can probably do it once
>>> outside a test_cmp).
>>>
>>> If you grep through the test suite for "sed" adjacent to "test_cmp"
>>> you'll find a lot of such examples of munging the output before
>>> test_cmp-ing it.
>>
>> Makes sense, I hadn't considered this (though I have seen the pattern in
>> the test suite, oops). The most compelling argument in favor of this is
>> that this could remove a lot of the complexity of verify_fetch_result(),
>> which is impeding test readability.
>>
>>> I.e. none of these tests surely need to test that we updated from
>>> $head1..$head2 again and again with the corresponding verbosity in test
>>> setup and shelling out to "git rev-parse --short HEAD" or whatever.
>>
>> I find the converse (we are testing the formatting over and over again)
>> less convincing. In these tests, we really are checking for $head2 in
>> the stderr to verify that the correct thing was fetched. I'm not
>> convinced that we should be relying on _other_ submodule tests to tell
>> us that submodule fetch is broken. Which brings me back to the original
>> motivation of this patch..
>>
>>>
>>> That's perfectly fine here, since the actual point of the test_cmp is to
>>> check the formatting/order etc. of the output itself, not to continually
>>> re-assert that submodule updating still works, and that we get the right
>>> OIDs.
>>
>> which is that these tests actually are continually re-asserting that
>> submodule updating works correctly in different circumstances, and
>> since we use the stderr to check this, test_cmp adds unwarranted noise.
>>
>> But you are correct in that the point of test_cmp is to check
>> formatting/order etc. There is value in using test_cmp for this purpose,
>> and getting rid of it entirely creates a hole in our test coverage.
>> (This wouldn't mean that we'd need to use test_cmp _everywhere_ though,
>> only that we need to use test_cmp _somewhere_.)
>>
>> As it stands:
>>
>> +   test_cmp can improve the readability of the test helpers and
>>     debuggability of tests vs grep
>> +/- test_cmp can catch formatting changes that are hard to catch
>>     otherwise, though at the cost of being sensitive to _any_ formatting
>>     changes
>> -   test_cmp needs some munging to eliminate unnecessary information
>>
>> so on the whole, I think it's worth trying to use test_cmp in the test
>> helper. We may not _need_ it everywhere, but it would be nice to use
>> it in as many places as possible.
>
> I think whatever you opt to go for here makes sense. I just wanted to
> provide the feedback in case it was helpful, i.e. when reading it I
> found these conversions a bit odd, wondered if they were strictly needed
> etc.
>
> Thanks!

The additional perspective was indeed helpful, thanks!


* [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-02-15 17:23 ` [PATCH v2 0/9] " Glen Choo
                     ` (8 preceding siblings ...)
  2022-02-15 17:23   ` [PATCH v2 9/9] submodule: fix latent check_has_commit() bug Glen Choo
@ 2022-02-24 10:08   ` Glen Choo
  2022-02-24 10:08     ` [PATCH v3 01/10] t5526: introduce test helper to assert on fetches Glen Choo
                       ` (10 more replies)
  9 siblings, 11 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-24 10:08 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Original cover letter: https://lore.kernel.org/git/20220210044152.78352-1-chooglen@google.com

This series is based on gc/branch-recurse-submodules.

Thanks for the feedback, and sorry for the delay - I've made some
substantial changes in response to the feedback, as well as unearthing
some surprising bugs.

Patches 2 and 9 have some extra discussion in their --- descriptions.
I'd appreciate feedback on those (especially the 'future work' described
in patch 9).

= Patch organization

- Patches 1-3 are quality-of-life improvements to the test suite that
  make it easier to write the tests in patch 9.
- Patches 4-6 are preparation for "git fetch" to read .gitmodules from
  the superproject commit in patch 7.
- Patches 7-8 refactor out the logic of "finding which submodules to
  fetch" and "fetching the submodules", making it easier to tell "git
  fetch" to fetch unpopulated submodules.
- Patch 9 teaches "git fetch" to fetch changed, unpopulated submodules
  in addition to populated submodules.
- Patch 10 is an optional bugfix + cleanup of the "git fetch" code that
  removes the last caller of the deprecated "add_submodule_odb()".

= Changes 

== Since v2
- Numerous small fixes to the code and commit message (thanks to all who
  helped spot these :))
- In patch 2, use test_cmp + sed to assert on test output, effectively
  reverting the "use grep" approach of v1-2 (see patch 2's description).
- New patch 3: introduce a test helper that creates the expected
  superproject commit (instead of copy-pasting the code over and over).
  - I did not get rid of "git fetch" inside the test helper (as Jonathan
    suggested) though, because that requires a bigger change in the test
    setup, and I think the test helper makes the test straightforward
    enough.
- New patch 8: refactor some shared logic out into fetch_task_create().
  This reduces code duplication between the get_fetch_task_from_*
  functions.
- In patch 9, add additional tests for 'submodules with the same name'.
- In patch 9, handle a bug where a submodule that is unpopulated by "git
  rm" still has "core.worktree" set and cannot be fetched (see patch 9's
  description).
- Remove the "git fetch --update-shallow" patch (I'll try to send it
  separately).

== Since v1
- Numerous style fixes suggested by Jonathan (thanks!)
- In patch 3, don't prematurely read submodules from the superproject
  commit (see:
  <kl6l5yplyat6.fsf@chooglen-macbookpro.roam.corp.google.com>).
- In patch 7, stop using "git checkout" and "! grep" in tests.
- In patch 7, stop doing the "find changed submodules" rev walk
  unconditionally. Instead, continue to check for .gitmodules, but also
  check for submodules in $GIT_DIR/modules.
  - I'm not entirely happy with the helper function name, see "---" for
    details.
- Move "git fetch --update-shallow" bugfix to patch 8.
  - Because the "find changed submodules" rev walk is no longer
    unconditional, this fix is no longer needed for tests to pass.
- Rename fetch_populated_submodules() to fetch_submodules().

Glen Choo (10):
  t5526: introduce test helper to assert on fetches
  t5526: stop asserting on stderr literally
  t5526: create superproject commits with test helper
  submodule: make static functions read submodules from commits
  submodule: inline submodule_commits() into caller
  submodule: store new submodule commits oid_array in a struct
  submodule: extract get_fetch_task()
  submodule: move logic into fetch_task_create()
  fetch: fetch unpopulated, changed submodules
  submodule: fix latent check_has_commit() bug

 Documentation/fetch-options.txt |  26 +-
 Documentation/git-fetch.txt     |  10 +-
 builtin/fetch.c                 |  14 +-
 submodule.c                     | 395 ++++++++++++++---------
 submodule.h                     |  21 +-
 t/t5526-fetch-submodules.sh     | 533 ++++++++++++++++++++++++--------
 6 files changed, 687 insertions(+), 312 deletions(-)

Range-diff against v2:
 1:  a159cdaabb !  1:  b6d34b0f5c t5526: introduce test helper to assert on fetches
    @@ Metadata
      ## Commit message ##
         t5526: introduce test helper to assert on fetches
     
    -    A future commit will change the stderr of "git fetch
    -    --recurse-submodules" and add new tests to t/t5526-fetch-submodules.sh.
    -    This poses two challenges:
    +    Tests in t/t5526-fetch-submodules.sh are unnecessarily noisy:
    +
    +    * The tests have extra logic in order to reproduce the expected stderr
    +      literally, but not all of these details (e.g. the head of the
    +      remote-tracking branch before the fetch) are relevant to the test.
     
    -    * The tests use test_cmp to assert on the stderr, which will fail on the
    -      future test because the stderr changes slightly, even though it still
    -      contains the information we expect.
         * The expect.err file is constructed by the add_upstream_commit() helper
           as input into test_cmp, but most tests fetch a different combination
           of repos from expect.err. This results in noisy tests that modify
    @@ Commit message
         helper to t/t5526-fetch-submodules.sh that asserts on the output of "git
         fetch --recurse-submodules" and handles the ordering of expect.err.
     
    -    As a result, the tests no longer construct expect.err manually. test_cmp
    -    is still invoked by verify_fetch_result(), but that will be replaced in
    -    a later commit.
    +    As a result, the tests no longer construct expect.err manually. Tests
    +    still consider the old head of the remote-tracking branch ("$head1"),
    +    but that will be fixed in a later commit.
     
         Signed-off-by: Glen Choo <chooglen@google.com>
     
 2:  48894c6c43 !  2:  0b85fa35c2 t5526: use grep to assert on fetches
    @@ Metadata
     Author: Glen Choo <chooglen@google.com>
     
      ## Commit message ##
    -    t5526: use grep to assert on fetches
    +    t5526: stop asserting on stderr literally
     
    -    In a previous commit, we replaced test_cmp invocations with
    -    verify_fetch_result(). Finish the process of removing test_cmp by using
    -    grep in verify_fetch_result() instead.
    +    In the previous commit message, we noted that not all of the "git fetch"
    +    stderr is relevant to the tests. Most of the test setup lines are
    +    dedicated to these details of the stderr:
     
    -    This makes the tests less sensitive to changes because, instead of
    -    checking the whole stderr, we only grep for the lines of the form
    +    1. which repos (super/sub/deep) are involved in the fetch
    +    2. the head of the remote-tracking branch before the fetch (i.e. $head1)
    +    3. the head of the remote-tracking branch after the fetch (i.e. $head2)
     
    -    * "<old-head>..<new-head>\s+branch\s+-> origin/branch"
    -    * "Fetching submodule <submodule-path>" (if fetching a submodule)
    +    1. and 3. are relevant because they tell us that the expected commit is
    +    fetched by the expected repo, but 2. is completely irrelevant.
     
    -    when we expect the repo to have fetched. If we expect the repo to not
    -    have fetched, grep to make sure the lines are absent. Also, simplify the
    -    assertions by using grep patterns that match only the relevant pieces of
    -    information, e.g. <old-head> is irrelevant because we only want to know
    -    if the fetch was performed, so we don't need to know where the branch
    -    was before the fetch.
    +    Stop asserting on $head1 by replacing it with a dummy value in the
    +    actual and expected output. Do this by introducing test
    +    helpers (check_*()) that make it easier to construct the expected
    +    output, and use sed to munge the actual output.
     
         Signed-off-by: Glen Choo <chooglen@google.com>
     
      ## t/t5526-fetch-submodules.sh ##
    +@@ t/t5526-fetch-submodules.sh: export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
    + 
    + pwd=$(pwd)
    + 
    ++check_sub() {
    ++	NEW_HEAD=$1 &&
    ++	cat <<-EOF >$pwd/expect.err.sub
    ++	Fetching submodule submodule
    ++	From $pwd/submodule
    ++	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
    ++	EOF
    ++}
    ++
    ++check_deep() {
    ++	NEW_HEAD=$1 &&
    ++	cat <<-EOF >$pwd/expect.err.deep
    ++	Fetching submodule submodule/subdir/deepsubmodule
    ++	From $pwd/deepsubmodule
    ++	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
    ++	EOF
    ++}
    ++
    ++check_super() {
    ++	NEW_HEAD=$1 &&
    ++	cat <<-EOF >$pwd/expect.err.super
    ++	From $pwd/.
    ++	   OLD_HEAD..$NEW_HEAD  super      -> origin/super
    ++	EOF
    ++}
    ++
    + # For each submodule in the test setup, this creates a commit and writes
    + # a file that contains the expected err if that new commit were fetched.
    + # These output files get concatenated in the right order by
     @@ t/t5526-fetch-submodules.sh: pwd=$(pwd)
      add_upstream_commit() {
      	(
    @@ t/t5526-fetch-submodules.sh: pwd=$(pwd)
     -		echo "Fetching submodule submodule" > ../expect.err.sub &&
     -		echo "From $pwd/submodule" >> ../expect.err.sub &&
     -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
    -+		git rev-parse --short HEAD >../subhead
    ++		new_head=$(git rev-parse --short HEAD) &&
    ++		check_sub $new_head
      	) &&
      	(
      		cd deepsubmodule &&
    @@ t/t5526-fetch-submodules.sh: pwd=$(pwd)
     -		echo "Fetching submodule submodule/subdir/deepsubmodule" > ../expect.err.deep
     -		echo "From $pwd/deepsubmodule" >> ../expect.err.deep &&
     -		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err.deep
    -+		git rev-parse --short HEAD >../deephead
    ++		new_head=$(git rev-parse --short HEAD) &&
    ++		check_deep $new_head
      	)
      }
      
    - # Verifies that the expected repositories were fetched. This is done by
    --# concatenating the files expect.err.[super|sub|deep] in the correct
    --# order and comparing it to the actual stderr.
    -+# checking that the branches of [super|sub|deep] were updated to
    -+# [super|sub|deep]head if the corresponding file exists.
    - #
    --# If a repo should not be fetched in the test, its corresponding
    --# expect.err file should be rm-ed.
    -+# If the [super|sub|deep] head file does not exist, this verifies that
    -+# the corresponding repo was not fetched. Thus, if a repo should not be
    -+# fetched in the test, its corresponding head file should be
    -+# rm-ed.
    - verify_fetch_result() {
    - 	ACTUAL_ERR=$1 &&
    --	rm -f expect.err.combined &&
    --	if [ -f expect.err.super ]; then
    --		cat expect.err.super >>expect.err.combined
    -+	# Each grep pattern is guaranteed to match the correct repo
    -+	# because each repo uses a different name for their branch i.e.
    -+	# "super", "sub" and "deep".
    -+	if [ -f superhead ]; then
    -+		grep -E "\.\.$(cat superhead)\s+super\s+-> origin/super" $ACTUAL_ERR
    -+	else
    -+		! grep "super" $ACTUAL_ERR
    - 	fi &&
    --	if [ -f expect.err.sub ]; then
    --		cat expect.err.sub >>expect.err.combined
    -+	if [ -f subhead ]; then
    -+		grep "Fetching submodule submodule" $ACTUAL_ERR &&
    -+		grep -E "\.\.$(cat subhead)\s+sub\s+-> origin/sub" $ACTUAL_ERR
    -+	else
    -+		! grep "Fetching submodule submodule" $ACTUAL_ERR
    +@@ t/t5526-fetch-submodules.sh: verify_fetch_result() {
    + 	if [ -f expect.err.deep ]; then
    + 		cat expect.err.deep >>expect.err.combined
      	fi &&
    --	if [ -f expect.err.deep ]; then
    --		cat expect.err.deep >>expect.err.combined
    --	fi &&
     -	test_cmp expect.err.combined $ACTUAL_ERR
    -+	if [ -f deephead ]; then
    -+		grep "Fetching submodule submodule/subdir/deepsubmodule" $ACTUAL_ERR &&
    -+		grep -E "\.\.$(cat deephead)\s+deep\s+-> origin/deep" $ACTUAL_ERR
    -+	else
    -+		! grep "Fetching submodule submodule/subdir/deepsubmodule" $ACTUAL_ERR
    -+	fi
    ++	sed -E 's/[0-9a-f]+\.\./OLD_HEAD\.\./' $ACTUAL_ERR >actual.err.cmp &&
    ++	test_cmp expect.err.combined actual.err.cmp
      }
      
      test_expect_success setup '
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion doesn't happen when
     -	head2=$(git rev-parse --short HEAD) &&
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
    --	rm expect.err.deep &&
    -+	git rev-parse --short HEAD >superhead &&
    -+	rm deephead &&
    ++	new_head=$(git rev-parse --short HEAD) &&
    ++	check_super $new_head &&
    + 	rm expect.err.deep &&
      	(
      		cd downstream &&
    - 		git fetch >../actual.out 2>../actual.err
     @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion stops when no new submodule commits are fetched"
      
      test_expect_success "Recursion doesn't happen when new superproject commits don't change any submodules" '
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion stops when no new su
     -	head2=$(git rev-parse --short HEAD) &&
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
    --	rm expect.err.sub &&
    --	rm expect.err.deep &&
    -+	git rev-parse --short HEAD >superhead &&
    -+	rm subhead &&
    -+	rm deephead &&
    ++	new_head=$(git rev-parse --short HEAD) &&
    ++	check_super $new_head &&
    + 	rm expect.err.sub &&
    + 	rm expect.err.deep &&
      	(
    - 		cd downstream &&
    - 		git fetch >../actual.out 2>../actual.err
     @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion picks up config in submodule" '
      		)
      	) &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion picks up config in s
     -	head2=$(git rev-parse --short HEAD) &&
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
    -+	git rev-parse --short HEAD >superhead &&
    ++	new_head=$(git rev-parse --short HEAD) &&
    ++	check_super $new_head &&
      	(
      		cd downstream &&
      		git fetch >../actual.out 2>../actual.err &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion picks up all submodu
     -		echo "Fetching submodule submodule" > ../expect.err.sub &&
     -		echo "From $pwd/submodule" >> ../expect.err.sub &&
     -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
    -+		git rev-parse --short HEAD >../subhead
    ++		new_head=$(git rev-parse --short HEAD) &&
    ++		check_sub $new_head
      	) &&
     -	head1=$(git rev-parse --short HEAD) &&
      	git add submodule &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion picks up all submodu
     -	head2=$(git rev-parse --short HEAD) &&
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
    -+	git rev-parse --short HEAD >superhead &&
    ++	new_head=$(git rev-parse --short HEAD) &&
    ++	check_super $new_head &&
      	(
      		cd downstream &&
      		git fetch >../actual.out 2>../actual.err
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     -		echo Fetching submodule submodule > ../expect.err.sub &&
     -		echo "From $pwd/submodule" >> ../expect.err.sub &&
     -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
    -+		git rev-parse --short HEAD >../subhead
    ++		new_head=$(git rev-parse --short HEAD) &&
    ++		check_sub $new_head
      	) &&
      	(
      		cd downstream &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     -	head2=$(git rev-parse --short HEAD) &&
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
    -+	git rev-parse --short HEAD >superhead &&
    ++	new_head=$(git rev-parse --short HEAD) &&
    ++	check_super $new_head &&
      	(
      		cd downstream &&
      		git config fetch.recurseSubmodules false &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     -	head2=$(git rev-parse --short HEAD) &&
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
    --	rm expect.err.sub &&
    --	rm expect.err.deep &&
    -+	git rev-parse --short HEAD >superhead &&
    -+	rm subhead &&
    -+	rm deephead &&
    ++	new_head=$(git rev-parse --short HEAD) &&
    ++	check_super $new_head &&
    + 	rm expect.err.sub &&
    + 	rm expect.err.deep &&
      	(
    - 		cd downstream &&
    - 		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
     @@ t/t5526-fetch-submodules.sh: test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
      	) &&
      	add_upstream_commit &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'fetch.recurseSubmodules=on-de
     -	head2=$(git rev-parse --short HEAD) &&
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
    --	rm expect.err.deep &&
    -+	git rev-parse --short HEAD >superhead &&
    -+	rm deephead &&
    ++	new_head=$(git rev-parse --short HEAD) &&
    ++	check_super $new_head &&
    + 	rm expect.err.deep &&
      	(
      		cd downstream &&
    - 		git config fetch.recurseSubmodules on-demand &&
     @@ t/t5526-fetch-submodules.sh: test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
      	) &&
      	add_upstream_commit &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'submodule.<sub>.fetchRecurseS
     -	head2=$(git rev-parse --short HEAD) &&
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
    --	rm expect.err.deep &&
    -+	git rev-parse --short HEAD >superhead &&
    -+	rm deephead &&
    ++	new_head=$(git rev-parse --short HEAD) &&
    ++	check_super $new_head &&
    + 	rm expect.err.deep &&
      	(
      		cd downstream &&
    - 		git config submodule.submodule.fetchRecurseSubmodules on-demand &&
     @@ t/t5526-fetch-submodules.sh: test_expect_success "don't fetch submodule when newly recorded commits are alrea
      		cd submodule &&
      		git checkout -q HEAD^^
    @@ t/t5526-fetch-submodules.sh: test_expect_success "don't fetch submodule when new
     -	head2=$(git rev-parse --short HEAD) &&
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
    --	rm expect.err.sub &&
    -+	git rev-parse --short HEAD >superhead &&
    -+	rm subhead &&
    ++	new_head=$(git rev-parse --short HEAD) &&
    ++	check_super $new_head &&
    + 	rm expect.err.sub &&
      	# This file does not exist, but rm -f for readability
    --	rm -f expect.err.deep &&
    -+	rm -f deephead &&
    - 	(
    - 		cd downstream &&
    - 		git fetch >../actual.out 2>../actual.err
    + 	rm -f expect.err.deep &&
     @@ t/t5526-fetch-submodules.sh: test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
      		git fetch --recurse-submodules
      	) &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'fetch.recurseSubmodules=on-de
     -	head2=$(git rev-parse --short HEAD) &&
     -	echo "From $pwd/." >expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
    --	rm expect.err.deep &&
    -+	git rev-parse --short HEAD >superhead &&
    -+	rm deephead &&
    ++	new_head=$(git rev-parse --short HEAD) &&
    ++	check_super $new_head &&
    + 	rm expect.err.deep &&
      	(
      		cd downstream &&
    - 		rm .gitmodules &&
 -:  ---------- >  3:  bb8ef6094a t5526: create superproject commits with test helper
 3:  6cf5e76d62 !  4:  e83a1713c4 submodule: make static functions read submodules from commits
    @@ Commit message
         Submodules will be read from commits when we fetch unpopulated
         submodules.
     
    -    The changed function signatures follow repo_submodule_init()'s argument
    -    order, i.e. "path" then "treeish_name". Where needed, reorder the
    -    arguments of functions that already take "path" and "treeish_name" to be
    -    consistent with this convention.
    -
         Signed-off-by: Glen Choo <chooglen@google.com>
     
      ## submodule.c ##
    @@ submodule.c: static int check_has_commit(const struct object_id *oid, void *data
      				 struct oid_array *commits)
      {
     -	struct has_commit_data has_commit = { r, 1, path };
    -+	struct has_commit_data has_commit = { r, 1, path, super_oid };
    ++	struct has_commit_data has_commit = {
    ++		.repo = r,
    ++		.result = 1,
    ++		.path = path,
    ++		.super_oid = super_oid
    ++	};
      
      	/*
      	 * Perform a cheap, but incorrect check for the existence of 'commits'.
 4:  07fd4ff0a9 =  5:  e27d402b9a submodule: inline submodule_commits() into caller
 5:  f049cb231b !  6:  1c7c8218b8 submodule: store new submodule commits oid_array in a struct
    @@ submodule.c: static void collect_changed_submodules(struct repository *r,
     +static void free_submodules_data(struct string_list *submodules)
      {
      	struct string_list_item *item;
    --	for_each_string_list_item(item, submodules)
    + 	for_each_string_list_item(item, submodules)
     -		oid_array_clear((struct oid_array *) item->util);
    -+	for_each_string_list_item(item, submodules) {
     +		changed_submodule_data_clear(item->util);
    -+	}
    ++
      	string_list_clear(submodules, 1);
      }
      
 6:  814073eecc !  7:  80cf317722 submodule: extract get_fetch_task()
    @@ Commit message
         Signed-off-by: Glen Choo <chooglen@google.com>
     
      ## submodule.c ##
    +@@ submodule.c: struct fetch_task {
    + 	struct repository *repo;
    + 	const struct submodule *sub;
    + 	unsigned free_sub : 1; /* Do we need to free the submodule? */
    ++	const char *default_argv;
    + 
    + 	struct oid_array *commits; /* Ensure these commits are fetched */
    + };
     @@ submodule.c: static struct repository *get_submodule_repo_for(struct repository *r,
      	return ret;
      }
    @@ submodule.c: static struct repository *get_submodule_repo_for(struct repository
     -static int get_next_submodule(struct child_process *cp,
     -			      struct strbuf *err, void *data, void **task_cb)
     +static struct fetch_task *
    -+get_fetch_task(struct submodule_parallel_fetch *spf,
    -+	       const char **default_argv, struct strbuf *err)
    ++get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
      {
     -	struct submodule_parallel_fetch *spf = data;
     -
    @@ submodule.c: static int get_next_submodule(struct child_process *cp,
      					task->sub->name))
      				continue;
     -			default_argv = "on-demand";
    -+			*default_argv = "on-demand";
    ++			task->default_argv = "on-demand";
      			break;
      		case RECURSE_SUBMODULES_ON:
     -			default_argv = "yes";
    -+			*default_argv = "yes";
    ++			task->default_argv = "yes";
      			break;
      		case RECURSE_SUBMODULES_OFF:
      			continue;
    @@ submodule.c: static int get_next_submodule(struct child_process *cp,
     +			      void *data, void **task_cb)
     +{
     +	struct submodule_parallel_fetch *spf = data;
    -+	const char *default_argv = NULL;
    -+	struct fetch_task *task = get_fetch_task(spf, &default_argv, err);
    ++	struct fetch_task *task = get_fetch_task(spf, err);
     +
     +	if (task) {
     +		struct strbuf submodule_prefix = STRBUF_INIT;
    @@ submodule.c: static int get_next_submodule(struct child_process *cp,
     +		cp->git_cmd = 1;
     +		strvec_init(&cp->args);
     +		strvec_pushv(&cp->args, spf->args.v);
    -+		strvec_push(&cp->args, default_argv);
    ++		strvec_push(&cp->args, task->default_argv);
     +		strvec_push(&cp->args, "--submodule-prefix");
     +
     +		strbuf_addf(&submodule_prefix, "%s%s/",
 -:  ---------- >  8:  bf9cfa7054 submodule: move logic into fetch_task_create()
 7:  10fd5bf921 !  9:  c7c2ff71b6 fetch: fetch unpopulated, changed submodules
    @@ Commit message
         commands like "git checkout --recurse-submodules" might fail.
     
         Teach "git fetch" to fetch cloned, changed submodules regardless of
    -    whether they are populated (this is in addition to the current behavior
    -    of fetching populated submodules).
    +    whether they are populated. This is in addition to the current behavior
    +    of fetching populated submodules (which is always attempted regardless
    +    of what was fetched in the superproject, or even if nothing was fetched
    +    in the superproject).
     
    -    Since a submodule may be encountered multiple times (via the list of
    -    populated submodules or via the list of changed submodules), maintain a
    -    list of seen submodules to avoid fetching a submodule more than once.
    +    A submodule may be encountered multiple times (via the list of
    +    populated submodules or via the list of changed submodules). When this
    +    happens, "git fetch" only reads the 'populated copy' and ignores the
    +    'changed copy'. Amend the verify_fetch_result() test helper so that we
    +    can assert on which 'copy' is being read.
     
         Signed-off-by: Glen Choo <chooglen@google.com>
     
    @@ submodule.c: void check_for_new_submodule_commits(struct object_id *oid)
      }
      
     +/*
    -+ * Returns 1 if the repo has absorbed submodule gitdirs, and 0
    -+ * otherwise. Like submodule_name_to_gitdir(), this checks
    ++ * Returns 1 if there is at least one submodule gitdir in
    ++ * $GIT_DIR/modules and 0 otherwise. This follows
    ++ * submodule_name_to_gitdir(), which looks for submodules in
     + * $GIT_DIR/modules, not $GIT_COMMON_DIR.
    ++ *
    ++ * A submodule can be moved to $GIT_DIR/modules manually by running "git
    ++ * submodule absorbgitdirs", or it may be initialized there by "git
    ++ * submodule update".
     + */
     +static int repo_has_absorbed_submodules(struct repository *r)
     +{
    @@ submodule.c: struct submodule_parallel_fetch {
      	.submodules_with_errors = STRBUF_INIT, \
      }
      
    -@@ submodule.c: static struct repository *get_submodule_repo_for(struct repository *r,
    +@@ submodule.c: struct fetch_task {
    + 	const struct submodule *sub;
    + 	unsigned free_sub : 1; /* Do we need to free the submodule? */
    + 	const char *default_argv;
    ++	struct strvec git_args;
    + 
    + 	struct oid_array *commits; /* Ensure these commits are fetched */
    + };
    +@@ submodule.c: static void fetch_task_release(struct fetch_task *p)
    + 	if (p->repo)
    + 		repo_clear(p->repo);
    + 	FREE_AND_NULL(p->repo);
    ++
    ++	strvec_clear(&p->git_args);
    + }
    + 
    + static struct repository *get_submodule_repo_for(struct repository *r,
    +@@ submodule.c: static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf
    + 		task->free_sub = 1;
    + 	}
    + 
    ++	if (string_list_lookup(&spf->seen_submodule_names, task->sub->name))
    ++		goto cleanup;
    ++
    + 	switch (get_fetch_recurse_config(task->sub, spf))
    + 	{
    + 	default:
    +@@ submodule.c: static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf
      }
      
      static struct fetch_task *
    --get_fetch_task(struct submodule_parallel_fetch *spf,
    --	       const char **default_argv, struct strbuf *err)
    +-get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
     +get_fetch_task_from_index(struct submodule_parallel_fetch *spf,
    -+			  const char **default_argv, struct strbuf *err)
    ++			  struct strbuf *err)
      {
     -	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
     -		const struct cache_entry *ce = spf->r->index->cache[spf->count];
    @@ submodule.c: static struct repository *get_submodule_repo_for(struct repository
      		struct fetch_task *task;
      
      		if (!S_ISGITLINK(ce->ce_mode))
    -@@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf,
    - 		if (!task)
    - 			continue;
    - 
    -+		/*
    -+		 * We might have already considered this submodule
    -+		 * because we saw it when iterating the changed
    -+		 * submodule names.
    -+		 */
    -+		if (string_list_lookup(&spf->seen_submodule_names,
    -+				       task->sub->name))
    -+			continue;
    -+
    - 		switch (get_fetch_recurse_config(task->sub, spf))
    - 		{
    - 		default:
    -@@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf,
    +@@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
      				strbuf_addf(err, _("Fetching submodule %s%s\n"),
      					    spf->prefix, ce->name);
      
    @@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf,
      			return task;
      		} else {
      			struct strbuf empty_submodule_path = STRBUF_INIT;
    -@@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf,
    +@@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
      	return NULL;
      }
      
     +static struct fetch_task *
     +get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
    -+			    const char **default_argv, struct strbuf *err)
    ++			    struct strbuf *err)
     +{
     +	for (; spf->changed_count < spf->changed_submodule_names.nr;
     +	     spf->changed_count++) {
    @@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf,
     +		struct changed_submodule_data *cs_data = item.util;
     +		struct fetch_task *task;
     +
    -+		/*
    -+		 * We might have already considered this submodule
    -+		 * because we saw it in the index.
    -+		 */
    -+		if (string_list_lookup(&spf->seen_submodule_names, item.string))
    ++		if (!is_tree_submodule_active(spf->r, cs_data->super_oid,cs_data->path))
     +			continue;
     +
    -+		task = fetch_task_create(spf->r, cs_data->path,
    ++		task = fetch_task_create(spf, cs_data->path,
     +					 cs_data->super_oid);
     +		if (!task)
     +			continue;
     +
    -+		switch (get_fetch_recurse_config(task->sub, spf)) {
    -+		default:
    -+		case RECURSE_SUBMODULES_DEFAULT:
    -+		case RECURSE_SUBMODULES_ON_DEMAND:
    -+			*default_argv = "on-demand";
    -+			break;
    -+		case RECURSE_SUBMODULES_ON:
    -+			*default_argv = "yes";
    -+			break;
    -+		case RECURSE_SUBMODULES_OFF:
    -+			continue;
    -+		}
    -+
    -+		task->repo = get_submodule_repo_for(spf->r, task->sub->path,
    -+						    cs_data->super_oid);
     +		if (!task->repo) {
    ++			strbuf_addf(err, _("Could not access submodule '%s' at commit %s\n"),
    ++				    cs_data->path,
    ++				    find_unique_abbrev(cs_data->super_oid, DEFAULT_ABBREV));
    ++
     +			fetch_task_release(task);
     +			free(task);
    -+
    -+			strbuf_addf(err, _("Could not access submodule '%s'\n"),
    -+				    cs_data->path);
     +			continue;
     +		}
    -+		if (!is_tree_submodule_active(spf->r, cs_data->super_oid,
    -+					      task->sub->path))
    -+			continue;
     +
     +		if (!spf->quiet)
     +			strbuf_addf(err,
    @@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf,
     +				    spf->prefix, task->sub->path,
     +				    find_unique_abbrev(cs_data->super_oid,
     +						       DEFAULT_ABBREV));
    ++
     +		spf->changed_count++;
    ++		/*
    ++		 * NEEDSWORK: A submodule unpopulated by "git rm" will
    ++		 * have core.worktree set, but the actual core.worktree
    ++		 * directory won't exist, causing the child process to
    ++		 * fail. Forcibly set --work-tree until we get smarter
    ++		 * handling for core.worktree in unpopulated submodules.
    ++		 */
    ++		strvec_push(&task->git_args, "--work-tree=.");
     +		return task;
     +	}
     +	return NULL;
    @@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf,
      			      void *data, void **task_cb)
      {
      	struct submodule_parallel_fetch *spf = data;
    - 	const char *default_argv = NULL;
    --	struct fetch_task *task = get_fetch_task(spf, &default_argv, err);
    +-	struct fetch_task *task = get_fetch_task(spf, err);
     +	struct fetch_task *task =
    -+		get_fetch_task_from_index(spf, &default_argv, err);
    ++		get_fetch_task_from_index(spf, err);
     +	if (!task)
    -+		task = get_fetch_task_from_changed(spf, &default_argv, err);
    ++		task = get_fetch_task_from_changed(spf, err);
      
      	if (task) {
      		struct strbuf submodule_prefix = STRBUF_INIT;
    +@@ submodule.c: static int get_next_submodule(struct child_process *cp, struct strbuf *err,
    + 		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
    + 		cp->git_cmd = 1;
    + 		strvec_init(&cp->args);
    ++		if (task->git_args.nr)
    ++			strvec_pushv(&cp->args, task->git_args.v);
    + 		strvec_pushv(&cp->args, spf->args.v);
    + 		strvec_push(&cp->args, task->default_argv);
    + 		strvec_push(&cp->args, "--submodule-prefix");
     @@ submodule.c: static int get_next_submodule(struct child_process *cp, struct strbuf *err,
      		*task_cb = task;
      
    @@ submodule.h: int should_update_submodules(void);
      
     
      ## t/t5526-fetch-submodules.sh ##
    +@@ t/t5526-fetch-submodules.sh: pwd=$(pwd)
    + 
    + check_sub() {
    + 	NEW_HEAD=$1 &&
    ++	SUPER_HEAD=$2 &&
    + 	cat <<-EOF >$pwd/expect.err.sub
    +-	Fetching submodule submodule
    ++	Fetching submodule submodule${SUPER_HEAD:+ at commit $SUPER_HEAD}
    + 	From $pwd/submodule
    + 	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
    + 	EOF
    +@@ t/t5526-fetch-submodules.sh: check_sub() {
    + 
    + check_deep() {
    + 	NEW_HEAD=$1 &&
    ++	SUB_HEAD=$2 &&
    + 	cat <<-EOF >$pwd/expect.err.deep
    +-	Fetching submodule submodule/subdir/deepsubmodule
    ++	Fetching submodule submodule/subdir/deepsubmodule${SUB_HEAD:+ at commit $SUB_HEAD}
    + 	From $pwd/deepsubmodule
    + 	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
    + 	EOF
     @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
      	verify_fetch_result actual.err
      '
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	(
     +		cd downstream &&
     +		git checkout --recurse-submodules -b no-submodules &&
    -+		rm .gitmodules &&
    ++		git rm .gitmodules &&
     +		git rm submodule &&
    -+		git add .gitmodules &&
     +		git commit -m "no submodules" &&
     +		git checkout --recurse-submodules super
     +	)
     +'
     +
     +test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '
    -+	git -C downstream fetch --recurse-submodules &&
    -+	# Create new superproject commit with updated submodules
    -+	add_upstream_commit &&
    -+	(
    -+		cd submodule &&
    -+		(
    -+			cd subdir/deepsubmodule &&
    -+			git fetch &&
    -+			git checkout -q FETCH_HEAD
    -+		) &&
    -+		git add subdir/deepsubmodule &&
    -+		git commit -m "new deep submodule"
    -+	) &&
    -+	git add submodule &&
    -+	git commit -m "new submodule" &&
    -+
    ++	add_submodule_commits &&
    ++	add_superproject_commits &&
     +	# Fetch the new superproject commit
     +	(
     +		cd downstream &&
     +		git switch --recurse-submodules no-submodules &&
     +		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
     +	) &&
    -+	test_must_be_empty actual.out &&
    -+	git rev-parse --short HEAD >superhead &&
    -+	git -C submodule rev-parse --short HEAD >subhead &&
    -+	git -C deepsubmodule rev-parse --short HEAD >deephead &&
    -+	verify_fetch_result actual.err &&
    ++	super_head=$(git rev-parse --short HEAD) &&
    ++	sub_head=$(git -C submodule rev-parse --short HEAD) &&
    ++	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
    ++
    ++	# assert that these are fetched from commits, not the index
    ++	check_sub $sub_head $super_head &&
    ++	check_deep $deep_head $sub_head &&
     +
    -+	# Assert that the fetch happened at the non-HEAD commits
    -+	grep "Fetching submodule submodule at commit $superhead" actual.err &&
    -+	grep "Fetching submodule submodule/subdir/deepsubmodule at commit $subhead" actual.err
    ++	test_must_be_empty actual.out &&
    ++	verify_fetch_result actual.err
     +'
     +
     +test_expect_success "'--recurse-submodules' should fetch submodule commits if the submodule is changed but the index has no submodules" '
    -+	# Fetch any leftover commits from other tests.
    -+	git -C downstream fetch --recurse-submodules &&
    -+	# Create new superproject commit with updated submodules
    -+	add_upstream_commit &&
    -+	(
    -+		cd submodule &&
    -+		(
    -+			cd subdir/deepsubmodule &&
    -+			git fetch &&
    -+			git checkout -q FETCH_HEAD
    -+		) &&
    -+		git add subdir/deepsubmodule &&
    -+		git commit -m "new deep submodule"
    -+	) &&
    -+	git add submodule &&
    -+	git commit -m "new submodule" &&
    -+
    ++	add_submodule_commits &&
    ++	add_superproject_commits &&
     +	# Fetch the new superproject commit
     +	(
     +		cd downstream &&
     +		git switch --recurse-submodules no-submodules &&
     +		git fetch --recurse-submodules >../actual.out 2>../actual.err
     +	) &&
    -+	test_must_be_empty actual.out &&
    -+	git rev-parse --short HEAD >superhead &&
    -+	git -C submodule rev-parse --short HEAD >subhead &&
    -+	git -C deepsubmodule rev-parse --short HEAD >deephead &&
    -+	verify_fetch_result actual.err &&
    ++	super_head=$(git rev-parse --short HEAD) &&
    ++	sub_head=$(git -C submodule rev-parse --short HEAD) &&
    ++	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
    ++
    ++	# assert that these are fetched from commits, not the index
    ++	check_sub $sub_head $super_head &&
    ++	check_deep $deep_head $sub_head &&
     +
    -+	# Assert that the fetch happened at the non-HEAD commits
    -+	grep "Fetching submodule submodule at commit $superhead" actual.err &&
    -+	grep "Fetching submodule submodule/subdir/deepsubmodule at commit $subhead" actual.err
    ++	test_must_be_empty actual.out &&
    ++	verify_fetch_result actual.err
     +'
     +
     +test_expect_success "'--recurse-submodules' should ignore changed, inactive submodules" '
    -+	# Fetch any leftover commits from other tests.
    -+	git -C downstream fetch --recurse-submodules &&
    -+	# Create new superproject commit with updated submodules
    -+	add_upstream_commit &&
    -+	(
    -+		cd submodule &&
    -+		(
    -+			cd subdir/deepsubmodule &&
    -+			git fetch &&
    -+			git checkout -q FETCH_HEAD
    -+		) &&
    -+		git add subdir/deepsubmodule &&
    -+		git commit -m "new deep submodule"
    -+	) &&
    -+	git add submodule &&
    -+	git commit -m "new submodule" &&
    ++	add_submodule_commits &&
    ++	add_superproject_commits &&
     +
     +	# Fetch the new superproject commit
     +	(
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +		git -c submodule.submodule.active=false fetch --recurse-submodules >../actual.out 2>../actual.err
     +	) &&
     +	test_must_be_empty actual.out &&
    -+	git rev-parse --short HEAD >superhead &&
    ++	super_head=$(git rev-parse --short HEAD) &&
    ++	check_super $super_head &&
     +	# Neither should be fetched because the submodule is inactive
    -+	rm subhead &&
    -+	rm deephead &&
    ++	rm expect.err.sub &&
    ++	rm expect.err.deep &&
     +	verify_fetch_result actual.err
     +'
     +
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +'
     +
     +test_expect_success "'--recurse-submodules' should fetch submodule commits in changed submodules and the index" '
    -+	# Fetch any leftover commits from other tests.
    -+	git -C downstream fetch --recurse-submodules &&
     +	# Create new commit in origin/super
    -+	add_upstream_commit &&
    -+	(
    -+		cd submodule &&
    -+		(
    -+			cd subdir/deepsubmodule &&
    -+			git fetch &&
    -+			git checkout -q FETCH_HEAD
    -+		) &&
    -+		git add subdir/deepsubmodule &&
    -+		git commit -m "new deep submodule"
    -+	) &&
    -+	git add submodule &&
    -+	git commit -m "new submodule" &&
    ++	add_submodule_commits &&
    ++	add_superproject_commits &&
     +
     +	# Create new commit in origin/super-sub2-only
     +	git checkout super-sub2-only &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +		git fetch --recurse-submodules >../actual.out 2>../actual.err
     +	) &&
     +	test_must_be_empty actual.out &&
    -+
    -+	# Assert that the submodules in the super branch are fetched
    -+	git rev-parse --short HEAD >superhead &&
    -+	git -C submodule rev-parse --short HEAD >subhead &&
    -+	git -C deepsubmodule rev-parse --short HEAD >deephead &&
    -+	verify_fetch_result actual.err &&
    -+	# grep for the exact line to check that the submodule is read
    -+	# from the index, not from a commit
    -+	grep "^Fetching submodule submodule\$" actual.err &&
    -+
    -+	# Assert that super-sub2-only and submodule2 were fetched even
    -+	# though another branch is checked out
    ++	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
    ++	sub_head=$(git -C submodule rev-parse --short HEAD) &&
    ++	sub2_head=$(git -C submodule2 rev-parse --short HEAD) &&
    ++	super_head=$(git rev-parse --short HEAD) &&
     +	super_sub2_only_head=$(git rev-parse --short super-sub2-only) &&
    -+	grep -E "\.\.${super_sub2_only_head}\s+super-sub2-only\s+-> origin/super-sub2-only" actual.err &&
    -+	grep "Fetching submodule submodule2 at commit $super_sub2_only_head" actual.err &&
    -+	sub2head=$(git -C submodule2 rev-parse --short HEAD) &&
    -+	grep -E "\.\.${sub2head}\s+sub2\s+-> origin/sub2" actual.err
    ++
    ++	# Use test_cmp manually because verify_fetch_result does not
    ++	# consider submodule2. All the repos should be fetched, but only
    ++	# submodule2 should be read from a commit
    ++	cat <<-EOF > expect.err.combined &&
    ++	From $pwd/.
    ++	   OLD_HEAD..$super_head  super           -> origin/super
    ++	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
    ++	Fetching submodule submodule
    ++	From $pwd/submodule
    ++	   OLD_HEAD..$sub_head  sub        -> origin/sub
    ++	Fetching submodule submodule/subdir/deepsubmodule
    ++	From $pwd/deepsubmodule
    ++	   OLD_HEAD..$deep_head  deep       -> origin/deep
    ++	Fetching submodule submodule2 at commit $super_sub2_only_head
    ++	From $pwd/submodule2
    ++	   OLD_HEAD..$sub2_head  sub2       -> origin/sub2
    ++	EOF
    ++	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
    ++	test_cmp expect.err.combined actual.err.cmp
     +'
     +
      test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
    - 	add_upstream_commit &&
    + 	add_submodule_commits &&
      	echo a >> file &&
    +@@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a submodule' '
    + 	test_cmp expect actual
    + '
    + 
    ++test_expect_success 'setup repo with upstreams that share a submodule name' '
    ++	mkdir same-name-1 &&
    ++	(
    ++		cd same-name-1 &&
    ++		git init &&
    ++		test_commit --no-tag a
    ++	) &&
    ++	git clone same-name-1 same-name-2 &&
    ++	# same-name-1 and same-name-2 both add a submodule with the
    ++	# name "submodule"
    ++	(
    ++		cd same-name-1 &&
    ++		mkdir submodule &&
    ++		git -C submodule init &&
    ++		test_commit -C submodule --no-tag a1 &&
    ++		git submodule add "$pwd/same-name-1/submodule" &&
    ++		git add submodule &&
    ++		git commit -m "super-a1"
    ++	) &&
    ++	(
    ++		cd same-name-2 &&
    ++		mkdir submodule &&
    ++		git -C submodule init &&
    ++		test_commit -C submodule --no-tag a2 &&
    ++		git submodule add "$pwd/same-name-2/submodule" &&
    ++		git add submodule &&
    ++		git commit -m "super-a2"
    ++	) &&
    ++	git clone same-name-1 -o same-name-1 same-name-downstream &&
    ++	(
    ++		cd same-name-downstream &&
    ++		git remote add same-name-2 ../same-name-2 &&
    ++		git fetch --all &&
    ++		# init downstream with same-name-1
    ++		git submodule update --init
    ++	)
    ++'
    ++
    ++test_expect_success 'fetch --recurse-submodules updates name-conflicted, populated submodule' '
    ++	test_when_finished "git -C same-name-downstream checkout master" &&
    ++	(
    ++		cd same-name-1 &&
    ++		test_commit -C submodule --no-tag b1 &&
    ++		git add submodule &&
    ++		git commit -m "super-b1"
    ++	) &&
    ++	(
    ++		cd same-name-2 &&
    ++		test_commit -C submodule --no-tag b2 &&
    ++		git add submodule &&
    ++		git commit -m "super-b2"
    ++	) &&
    ++	(
    ++		cd same-name-downstream &&
    ++		# even though the .gitmodules is correct, we cannot
    ++		# fetch from same-name-2
    ++		git checkout same-name-2/master &&
    ++		git fetch --recurse-submodules same-name-1 &&
    ++		test_must_fail git fetch --recurse-submodules same-name-2
    ++	) &&
    ++	super_head1=$(git -C same-name-1 rev-parse HEAD) &&
    ++	git -C same-name-downstream cat-file -e $super_head1 &&
    ++
    ++	super_head2=$(git -C same-name-2 rev-parse HEAD) &&
    ++	git -C same-name-downstream cat-file -e $super_head2 &&
    ++
    ++	sub_head1=$(git -C same-name-1/submodule rev-parse HEAD) &&
    ++	git -C same-name-downstream/submodule cat-file -e $sub_head1 &&
    ++
    ++	sub_head2=$(git -C same-name-2/submodule rev-parse HEAD) &&
    ++	test_must_fail git -C same-name-downstream/submodule cat-file -e $sub_head2
    ++'
    ++
    ++test_expect_success 'fetch --recurse-submodules updates name-conflicted, unpopulated submodule' '
    ++	(
    ++		cd same-name-1 &&
    ++		test_commit -C submodule --no-tag c1 &&
    ++		git add submodule &&
    ++		git commit -m "super-c1"
    ++	) &&
    ++	(
    ++		cd same-name-2 &&
    ++		test_commit -C submodule --no-tag c2 &&
    ++		git add submodule &&
    ++		git commit -m "super-c2"
    ++	) &&
    ++	(
    ++		cd same-name-downstream &&
    ++		git checkout master &&
    ++		git rm .gitmodules &&
    ++		git rm submodule &&
    ++		git commit -m "no submodules" &&
    ++		git fetch --recurse-submodules same-name-1
    ++	) &&
    ++	head1=$(git -C same-name-1/submodule rev-parse HEAD) &&
    ++	head2=$(git -C same-name-2/submodule rev-parse HEAD) &&
    ++	(
    ++		cd same-name-downstream/.git/modules/submodule &&
    ++		# The submodule has core.worktree pointing to the "git
    ++		# rm"-ed directory, overwrite the invalid value.
    ++		git --work-tree=. cat-file -e $head1 &&
    ++		test_must_fail git --work-tree=. cat-file -e $head2
    ++	)
    ++'
    ++
    + test_done
 8:  8aa68111b0 <  -:  ---------- submodule: read shallows when finding changed submodules
 9:  05a8b93154 ! 10:  e1ac74eee4 submodule: fix latent check_has_commit() bug
    @@ submodule.c: static int check_has_commit(const struct object_id *oid, void *data
      
      	type = oid_object_info(&subrepo, oid, NULL);
     @@ submodule.c: static int submodule_has_commits(struct repository *r,
    - {
    - 	struct has_commit_data has_commit = { r, 1, path, super_oid };
    + 		.super_oid = super_oid
    + 	};
      
     -	/*
     -	 * Perform a cheap, but incorrect check for the existence of 'commits'.

base-commit: 679e3693aba0c17af60c031f7eef68f2296b8dad
-- 
2.33.GIT



* [PATCH v3 01/10] t5526: introduce test helper to assert on fetches
  2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
@ 2022-02-24 10:08     ` Glen Choo
  2022-02-25  0:34       ` Junio C Hamano
  2022-02-24 10:08     ` [PATCH v3 02/10] t5526: stop asserting on stderr literally Glen Choo
                       ` (9 subsequent siblings)
  10 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-24 10:08 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Tests in t/t5526-fetch-submodules.sh are unnecessarily noisy:

* The tests have extra logic in order to reproduce the expected stderr
  literally, but not all of these details (e.g. the head of the
  remote-tracking branch before the fetch) are relevant to the test.

* The expect.err file is constructed by the add_upstream_commit() helper
  as input into test_cmp, but most tests fetch a different combination
  of repos than the one described by expect.err. This clutters many
  tests with logic that edits parts of expect.err to generate the
  expected output.

To address both of these issues, introduce a verify_fetch_result()
helper to t/t5526-fetch-submodules.sh that asserts on the output of "git
fetch --recurse-submodules" and handles the ordering of expect.err.
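For illustration, the ordering logic amounts to something like the
following minimal sketch. It assumes a POSIX sh and uses cmp(1) in
place of test_cmp; combine_expected and verify_sketch are hypothetical
names, not functions added by this patch.

```shell
#!/bin/sh
# Hypothetical sketch, not the helper from the patch: build the expected
# stderr by concatenating whichever per-repo files still exist, always in
# superproject -> submodule -> deep submodule order.
combine_expected () {
	: >expect.err.combined &&
	for part in super sub deep
	do
		# A test removes expect.err.<part> when that repo must not be fetched.
		if test -f "expect.err.$part"
		then
			cat "expect.err.$part" >>expect.err.combined || return 1
		fi
	done
}

# Compare the combined expectation with the actual stderr file given as $1.
# Plain cmp(1) stands in for the test suite's test_cmp.
verify_sketch () {
	combine_expected &&
	cmp -s expect.err.combined "$1"
}
```

A test that expects only the superproject and the submodule to be
fetched would remove expect.err.deep before calling the helper, so the
ordering stays correct without any test editing the files by hand.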

As a result, the tests no longer construct expect.err manually. Tests
still consider the old head of the remote-tracking branch ("$head1"),
but that will be fixed in a later commit.

Signed-off-by: Glen Choo <chooglen@google.com>
---
 t/t5526-fetch-submodules.sh | 136 +++++++++++++++++++++---------------
 1 file changed, 81 insertions(+), 55 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 2dc75b80db..0e93df1665 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -13,6 +13,10 @@ export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
 
 pwd=$(pwd)
 
+# For each submodule in the test setup, this creates a commit and writes
+# a file that contains the expected stderr if that new commit were
+# fetched. These output files get concatenated in the right order by
+# verify_fetch_result().
 add_upstream_commit() {
 	(
 		cd submodule &&
@@ -22,9 +26,9 @@ add_upstream_commit() {
 		git add subfile &&
 		git commit -m new subfile &&
 		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err &&
-		echo "From $pwd/submodule" >> ../expect.err &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err
+		echo "Fetching submodule submodule" > ../expect.err.sub &&
+		echo "From $pwd/submodule" >> ../expect.err.sub &&
+		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
 	) &&
 	(
 		cd deepsubmodule &&
@@ -34,12 +38,33 @@ add_upstream_commit() {
 		git add deepsubfile &&
 		git commit -m new deepsubfile &&
 		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule/subdir/deepsubmodule" >> ../expect.err
-		echo "From $pwd/deepsubmodule" >> ../expect.err &&
-		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err
+		echo "Fetching submodule submodule/subdir/deepsubmodule" > ../expect.err.deep &&
+		echo "From $pwd/deepsubmodule" >> ../expect.err.deep &&
+		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err.deep
 	)
 }
 
+# Verifies that the expected repositories were fetched. This is done by
+# concatenating the files expect.err.[super|sub|deep] in the correct
+# order and comparing the result to the actual stderr.
+#
+# If a repo should not be fetched in the test, its corresponding
+# expect.err file should be removed.
+verify_fetch_result() {
+	ACTUAL_ERR=$1 &&
+	rm -f expect.err.combined &&
+	if [ -f expect.err.super ]; then
+		cat expect.err.super >>expect.err.combined
+	fi &&
+	if [ -f expect.err.sub ]; then
+		cat expect.err.sub >>expect.err.combined
+	fi &&
+	if [ -f expect.err.deep ]; then
+		cat expect.err.deep >>expect.err.combined
+	fi &&
+	test_cmp expect.err.combined $ACTUAL_ERR
+}
+
 test_expect_success setup '
 	mkdir deepsubmodule &&
 	(
@@ -77,7 +102,7 @@ test_expect_success "fetch --recurse-submodules recurses into submodules" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "submodule.recurse option triggers recursive fetch" '
@@ -87,7 +112,7 @@ test_expect_success "submodule.recurse option triggers recursive fetch" '
 		git -c submodule.recurse fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "fetch --recurse-submodules -j2 has the same output behaviour" '
@@ -97,7 +122,7 @@ test_expect_success "fetch --recurse-submodules -j2 has the same output behaviou
 		GIT_TRACE="$TRASH_DIRECTORY/trace.out" git fetch --recurse-submodules -j2 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err &&
+	verify_fetch_result actual.err &&
 	grep "2 tasks" trace.out
 '
 
@@ -127,7 +152,7 @@ test_expect_success "using fetchRecurseSubmodules=true in .gitmodules recurses i
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--no-recurse-submodules overrides .gitmodules config" '
@@ -158,7 +183,7 @@ test_expect_success "--recurse-submodules overrides fetchRecurseSubmodules setti
 		git config --unset submodule.submodule.fetchRecurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--quiet propagates to submodules" '
@@ -186,7 +211,7 @@ test_expect_success "--dry-run propagates to submodules" '
 		git fetch --recurse-submodules --dry-run >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "Without --dry-run propagates to submodules" '
@@ -195,7 +220,7 @@ test_expect_success "Without --dry-run propagates to submodules" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "recurseSubmodules=true propagates into submodules" '
@@ -206,7 +231,7 @@ test_expect_success "recurseSubmodules=true propagates into submodules" '
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--recurse-submodules overrides config in submodule" '
@@ -220,7 +245,7 @@ test_expect_success "--recurse-submodules overrides config in submodule" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--no-recurse-submodules overrides config setting" '
@@ -253,14 +278,14 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.sub &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.sub &&
-	head -3 expect.err >> expect.err.sub &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
-	test_cmp expect.err.sub actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -271,14 +296,16 @@ test_expect_success "Recursion doesn't happen when new superproject commits don'
 	git add file &&
 	git commit -m "new file" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.file &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.file &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.file actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "Recursion picks up config in submodule" '
@@ -295,9 +322,8 @@ test_expect_success "Recursion picks up config in submodule" '
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.sub &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.sub &&
-	cat expect.err >> expect.err.sub &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err &&
@@ -306,7 +332,7 @@ test_expect_success "Recursion picks up config in submodule" '
 			git config --unset fetch.recurseSubmodules
 		)
 	) &&
-	test_cmp expect.err.sub actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -331,15 +357,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.2 &&
-	cat expect.err.sub >> expect.err.2 &&
-	tail -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
-	test_cmp expect.err.2 actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -375,11 +399,8 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	tail -3 expect.err > expect.err.deepsub &&
-	echo "From $pwd/." > expect.err &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err &&
-	cat expect.err.sub >> expect.err &&
-	cat expect.err.deepsub >> expect.err &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules false &&
@@ -395,7 +416,7 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 		)
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
@@ -405,14 +426,16 @@ test_expect_success "'--recurse-submodules=on-demand' stops when no new submodul
 	git add file &&
 	git commit -m "new file" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.file &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.file &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.file actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config" '
@@ -426,9 +449,9 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules on-demand &&
@@ -440,7 +463,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 		git config --unset fetch.recurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' overrides fetch.recurseSubmodules" '
@@ -454,9 +477,9 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git config submodule.submodule.fetchRecurseSubmodules on-demand &&
@@ -468,7 +491,7 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 		git config --unset submodule.submodule.fetchRecurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "don't fetch submodule when newly recorded commits are already present" '
@@ -480,14 +503,17 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 	git add submodule &&
 	git commit -m "submodule rewound" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	# This file does not exist, but rm -f for readability
+	rm -f expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err &&
+	verify_fetch_result actual.err &&
 	(
 		cd submodule &&
 		git checkout -q sub
@@ -505,9 +531,9 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 	git rm .gitmodules &&
 	git commit -m "new submodule without .gitmodules" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." >expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >>expect.err.2 &&
+	echo "From $pwd/." >expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		rm .gitmodules &&
@@ -523,7 +549,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 		git reset --hard
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err &&
+	verify_fetch_result actual.err &&
 	git checkout HEAD^ -- .gitmodules &&
 	git add .gitmodules &&
 	git commit -m "new submodule restored .gitmodules"
-- 
2.33.GIT


* [PATCH v3 02/10] t5526: stop asserting on stderr literally
  2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
  2022-02-24 10:08     ` [PATCH v3 01/10] t5526: introduce test helper to assert on fetches Glen Choo
@ 2022-02-24 10:08     ` Glen Choo
  2022-02-24 11:52       ` Ævar Arnfjörð Bjarmason
  2022-02-24 23:05       ` Jonathan Tan
  2022-02-24 10:08     ` [PATCH v3 03/10] t5526: create superproject commits with test helper Glen Choo
                       ` (8 subsequent siblings)
  10 siblings, 2 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-24 10:08 UTC
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

In the previous commit message, we noted that not all of the "git fetch"
stderr is relevant to the tests. Most of the test setup lines are
dedicated to these details of the stderr:

1. which repos (super/sub/deep) are involved in the fetch
2. the head of the remote-tracking branch before the fetch (i.e. $head1)
3. the head of the remote-tracking branch after the fetch (i.e. $head2)

1. and 3. are relevant because they tell us that the expected commit is
fetched by the expected repo, but 2. is completely irrelevant.

Stop asserting on $head1 by replacing it with a dummy value in the
actual and expected output. Do this by introducing test helpers
(check_*()) that make it easier to construct the expected output, and
by using sed to munge the actual output.
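A minimal sketch of the placeholder idea, runnable outside the test
suite: the expected line is built with a literal OLD_HEAD (as
check_super() does), and the actual line is normalized with the same sed
expression that verify_fetch_result() uses. The helper names
expect_super_line and munge_old_head, and the abbreviated hashes, are
made up for illustration.

```sh
# Hypothetical helper: print the expected superproject ref-update line,
# with the pre-fetch head replaced by the literal placeholder OLD_HEAD.
expect_super_line() {
	printf '   OLD_HEAD..%s  super      -> origin/super\n' "$1"
}

# Hypothetical helper: on each line of stdin, replace the first
# "<hex>.." (the pre-fetch head) with "OLD_HEAD..".
munge_old_head() {
	sed -E 's/[0-9a-f]+\.\./OLD_HEAD../'
}

# Two fetches that differ only in the pre-fetch head munge to the same line.
run_a=$(printf '   1234abc..89abcde  super      -> origin/super\n' | munge_old_head)
run_b=$(printf '   fedcba9..89abcde  super      -> origin/super\n' | munge_old_head)
expected=$(expect_super_line 89abcde)

if [ "$run_a" = "$expected" ] && [ "$run_b" = "$expected" ]; then
	echo "both fetches match the expected line"
fi
```

Only the new head, which identifies the fetched commit, is still
compared: a fetch that lands on a different commit still produces a
different line after munging.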

Signed-off-by: Glen Choo <chooglen@google.com>
---
Per Ævar's suggestion [1], I reverted the test suite changes that
replaced "test_cmp" with "grep", and opted to munge out irrelevant
details with "sed". Opinion on "grep" vs "test_cmp" seems a bit split,
and there are good arguments for either, but for these tests, I think
test_cmp works better for a few reasons:

- The motivation for removing test_cmp in the old patch 1 [2], i.e.
  we _want_ to ignore changes in the test output, turned out to be
  absolutely incorrect. We actually care a lot about the changes in test
  output because it tells us where we are reading the submodule info
  from.

  In the v1-2 test scheme, verify_fetch_result() would ignore those
  changes, so I added extra grep-s specifically to check
  where the submodule is read from. ([3] comments on one of these). I
  could have generalized verify_fetch_result() to handle those extra
  grep-s, but since the original motivation for using grep is gone,
  test_cmp seemed like a viable alternative for an intuitive test
  scheme.

- verify_fetch_result() is easier to reason about because we now assert
  on the output almost verbatim (besides munging), instead of mixing
  greps and negative greps to get a similar result. This should be
  helpful for someone updating the tests later.

- When a test can't use verify_fetch_result() (e.g. it involves other
  submodules, patch 9 adds some of these), I found it a lot easier to
  write the test using test_cmp instead of grep.

- test_cmp tests are sensitive to meaningless changes, but this
  sensitivity helps us catch unwanted regressions, and (in these tests
  at least) it is relatively easy to update test_cmp tests when the
  output changes legitimately.
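The verify_fetch_result() approach described above can be sketched as
follows, assuming the expect.err.super, expect.err.sub and
expect.err.deep files used in these tests: concatenate whichever files
exist in superproject, submodule, deep-submodule order, munge the actual
stderr, and compare. combine_expected and verify_sketch are hypothetical
names, and cmp -s stands in for the test suite's test_cmp.

```sh
# Hypothetical helper: print the expected stderr by concatenating the
# per-repository files that exist, in superproject -> submodule -> deep order.
# A test that expects no fetch in a repository deletes that repository's file.
combine_expected() {
	for part in super sub deep
	do
		if [ -f "$1/expect.err.$part" ]; then
			cat "$1/expect.err.$part"
		fi
	done
}

# Hypothetical helper: succeed only if the actual stderr ($2), with old
# heads munged to OLD_HEAD, matches the combined expectation in $1.
verify_sketch() {
	combine_expected "$1" >"$1/expect.err.combined" &&
	sed -E 's/[0-9a-f]+\.\./OLD_HEAD../' "$2" >"$1/actual.err.cmp" &&
	cmp -s "$1/expect.err.combined" "$1/actual.err.cmp"
}

work=$(mktemp -d)
printf 'From /up/.\n   OLD_HEAD..222bbbb  super      -> origin/super\n' >"$work/expect.err.super"
printf 'Fetching submodule submodule\n   OLD_HEAD..333cccc  sub        -> origin/sub\n' >"$work/expect.err.sub"
# No expect.err.deep: the deep submodule is not expected to be fetched.
printf 'From /up/.\n   111aaaa..222bbbb  super      -> origin/super\nFetching submodule submodule\n   0000ddd..333cccc  sub        -> origin/sub\n' >"$work/actual.err"
if verify_sketch "$work" "$work/actual.err"; then
	echo "fetch output matches"
fi
rm -rf "$work"
```

Deleting a per-repository file is how a test says "this repository must
not be fetched"; any extra or missing "Fetching submodule" block then
makes the comparison fail.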

[1] https://lore.kernel.org/git/220216.86y22bt8gp.gmgdl@evledraar.gmail.com
[2] https://lore.kernel.org/git/20220215172318.73533-2-chooglen@google.com
[3] https://lore.kernel.org/git/20220215220229.1633486-1-jonathantanmy@google.com

 t/t5526-fetch-submodules.sh | 117 +++++++++++++++++-------------------
 1 file changed, 56 insertions(+), 61 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 0e93df1665..a3890e2f6c 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -13,6 +13,32 @@ export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
 
 pwd=$(pwd)
 
+check_sub() {
+	NEW_HEAD=$1 &&
+	cat <<-EOF >$pwd/expect.err.sub
+	Fetching submodule submodule
+	From $pwd/submodule
+	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
+	EOF
+}
+
+check_deep() {
+	NEW_HEAD=$1 &&
+	cat <<-EOF >$pwd/expect.err.deep
+	Fetching submodule submodule/subdir/deepsubmodule
+	From $pwd/deepsubmodule
+	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
+	EOF
+}
+
+check_super() {
+	NEW_HEAD=$1 &&
+	cat <<-EOF >$pwd/expect.err.super
+	From $pwd/.
+	   OLD_HEAD..$NEW_HEAD  super      -> origin/super
+	EOF
+}
+
 # For each submodule in the test setup, this creates a commit and writes
 # a file that contains the expected err if that new commit were fetched.
 # These output files get concatenated in the right order by
@@ -20,27 +46,21 @@ pwd=$(pwd)
 add_upstream_commit() {
 	(
 		cd submodule &&
-		head1=$(git rev-parse --short HEAD) &&
 		echo new >> subfile &&
 		test_tick &&
 		git add subfile &&
 		git commit -m new subfile &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		new_head=$(git rev-parse --short HEAD) &&
+		check_sub $new_head
 	) &&
 	(
 		cd deepsubmodule &&
-		head1=$(git rev-parse --short HEAD) &&
 		echo new >> deepsubfile &&
 		test_tick &&
 		git add deepsubfile &&
 		git commit -m new deepsubfile &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule/subdir/deepsubmodule" > ../expect.err.deep
-		echo "From $pwd/deepsubmodule" >> ../expect.err.deep &&
-		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err.deep
+		new_head=$(git rev-parse --short HEAD) &&
+		check_deep $new_head
 	)
 }
 
@@ -62,7 +82,8 @@ verify_fetch_result() {
 	if [ -f expect.err.deep ]; then
 		cat expect.err.deep >>expect.err.combined
 	fi &&
-	test_cmp expect.err.combined $ACTUAL_ERR
+	sed -E 's/[0-9a-f]+\.\./OLD_HEAD\.\./' $ACTUAL_ERR >actual.err.cmp &&
+	test_cmp expect.err.combined actual.err.cmp
 }
 
 test_expect_success setup '
@@ -274,12 +295,10 @@ test_expect_success "Recursion doesn't happen when no new commits are fetched in
 '
 
 test_expect_success "Recursion stops when no new submodule commits are fetched" '
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	rm expect.err.deep &&
 	(
 		cd downstream &&
@@ -291,13 +310,11 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 
 test_expect_success "Recursion doesn't happen when new superproject commits don't change any submodules" '
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	echo a > file &&
 	git add file &&
 	git commit -m "new file" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	rm expect.err.sub &&
 	rm expect.err.deep &&
 	(
@@ -318,12 +335,10 @@ test_expect_success "Recursion picks up config in submodule" '
 		)
 	) &&
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err &&
@@ -345,20 +360,15 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 			git fetch &&
 			git checkout -q FETCH_HEAD
 		) &&
-		head1=$(git rev-parse --short HEAD^) &&
 		git add subdir/deepsubmodule &&
 		git commit -m "new deepsubmodule" &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		new_head=$(git rev-parse --short HEAD) &&
+		check_sub $new_head
 	) &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -376,13 +386,10 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 			git fetch &&
 			git checkout -q FETCH_HEAD
 		) &&
-		head1=$(git rev-parse --short HEAD^) &&
 		git add subdir/deepsubmodule &&
 		git commit -m "new deepsubmodule" &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo Fetching submodule submodule > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		new_head=$(git rev-parse --short HEAD) &&
+		check_sub $new_head
 	) &&
 	(
 		cd downstream &&
@@ -395,12 +402,10 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 '
 
 test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" '
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules false &&
@@ -421,13 +426,11 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	echo a >> file &&
 	git add file &&
 	git commit -m "new file" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	rm expect.err.sub &&
 	rm expect.err.deep &&
 	(
@@ -445,12 +448,10 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 	) &&
 	add_upstream_commit &&
 	git config --global fetch.recurseSubmodules false &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	rm expect.err.deep &&
 	(
 		cd downstream &&
@@ -473,12 +474,10 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 	) &&
 	add_upstream_commit &&
 	git config fetch.recurseSubmodules false &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	rm expect.err.deep &&
 	(
 		cd downstream &&
@@ -499,12 +498,10 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 		cd submodule &&
 		git checkout -q HEAD^^
 	) &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "submodule rewound" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	rm expect.err.sub &&
 	# This file does not exist, but rm -f for readability
 	rm -f expect.err.deep &&
@@ -526,13 +523,11 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 		git fetch --recurse-submodules
 	) &&
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git rm .gitmodules &&
 	git commit -m "new submodule without .gitmodules" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." >expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	rm expect.err.deep &&
 	(
 		cd downstream &&
-- 
2.33.GIT


* [PATCH v3 03/10] t5526: create superproject commits with test helper
  2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
  2022-02-24 10:08     ` [PATCH v3 01/10] t5526: introduce test helper to assert on fetches Glen Choo
  2022-02-24 10:08     ` [PATCH v3 02/10] t5526: stop asserting on stderr literally Glen Choo
@ 2022-02-24 10:08     ` Glen Choo
  2022-02-24 23:14       ` Jonathan Tan
  2022-02-24 10:08     ` [PATCH v3 04/10] submodule: make static functions read submodules from commits Glen Choo
                       ` (7 subsequent siblings)
  10 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-24 10:08 UTC
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

A few tests in t5526 use this pattern as part of their setup:

1. Create new commits in the upstream submodules (using
   add_upstream_commit()).
2. In the upstream superprojects, add the new submodule commits from the
   previous step.

A future commit will add more tests with this pattern, so reduce the
verbosity of present and future tests by introducing a test helper that
creates superproject commits. Since we now have two helpers that add
upstream commits, rename add_upstream_commit() to
add_submodule_commits().

Signed-off-by: Glen Choo <chooglen@google.com>
---
 t/t5526-fetch-submodules.sh | 94 +++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 50 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index a3890e2f6c..ee4dd5a4a9 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -43,7 +43,7 @@ check_super() {
 # a file that contains the expected err if that new commit were fetched.
 # These output files get concatenated in the right order by
 # verify_fetch_result().
-add_upstream_commit() {
+add_submodule_commits() {
 	(
 		cd submodule &&
 		echo new >> subfile &&
@@ -64,6 +64,30 @@ add_upstream_commit() {
 	)
 }
 
+# For each superproject in the test setup, update its submodule, add the
+# submodule and create a new commit with the submodule change.
+#
+# This requires add_submodule_commits() to be called first, otherwise
+# the submodules will not have changed and cannot be "git add"-ed.
+add_superproject_commits() {
+	(
+		cd submodule &&
+		(
+			cd subdir/deepsubmodule &&
+			git fetch &&
+			git checkout -q FETCH_HEAD
+		) &&
+		git add subdir/deepsubmodule &&
+		git commit -m "new deep submodule"
+	) &&
+	git add submodule &&
+	git commit -m "new submodule" &&
+	super_head=$(git rev-parse --short HEAD) &&
+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
+	check_super $super_head &&
+	check_sub $sub_head
+}
+
 # Verifies that the expected repositories were fetched. This is done by
 # concatenating the files expect.err.[super|sub|deep] in the correct
 # order and comparing it to the actual stderr.
@@ -117,7 +141,7 @@ test_expect_success setup '
 '
 
 test_expect_success "fetch --recurse-submodules recurses into submodules" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
@@ -127,7 +151,7 @@ test_expect_success "fetch --recurse-submodules recurses into submodules" '
 '
 
 test_expect_success "submodule.recurse option triggers recursive fetch" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git -c submodule.recurse fetch >../actual.out 2>../actual.err
@@ -137,7 +161,7 @@ test_expect_success "submodule.recurse option triggers recursive fetch" '
 '
 
 test_expect_success "fetch --recurse-submodules -j2 has the same output behaviour" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		GIT_TRACE="$TRASH_DIRECTORY/trace.out" git fetch --recurse-submodules -j2 2>../actual.err
@@ -148,7 +172,7 @@ test_expect_success "fetch --recurse-submodules -j2 has the same output behaviou
 '
 
 test_expect_success "fetch alone only fetches superproject" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -177,7 +201,7 @@ test_expect_success "using fetchRecurseSubmodules=true in .gitmodules recurses i
 '
 
 test_expect_success "--no-recurse-submodules overrides .gitmodules config" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git fetch --no-recurse-submodules >../actual.out 2>../actual.err
@@ -226,7 +250,7 @@ test_expect_success "--quiet propagates to parallel submodules" '
 '
 
 test_expect_success "--dry-run propagates to submodules" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git fetch --recurse-submodules --dry-run >../actual.out 2>../actual.err
@@ -245,7 +269,7 @@ test_expect_success "Without --dry-run propagates to submodules" '
 '
 
 test_expect_success "recurseSubmodules=true propagates into submodules" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules true &&
@@ -256,7 +280,7 @@ test_expect_success "recurseSubmodules=true propagates into submodules" '
 '
 
 test_expect_success "--recurse-submodules overrides config in submodule" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		(
@@ -270,7 +294,7 @@ test_expect_success "--recurse-submodules overrides config in submodule" '
 '
 
 test_expect_success "--no-recurse-submodules overrides config setting" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules true &&
@@ -309,7 +333,7 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 '
 
 test_expect_success "Recursion doesn't happen when new superproject commits don't change any submodules" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	echo a > file &&
 	git add file &&
 	git commit -m "new file" &&
@@ -334,7 +358,7 @@ test_expect_success "Recursion picks up config in submodule" '
 			git config fetch.recurseSubmodules true
 		)
 	) &&
-	add_upstream_commit &&
+	add_submodule_commits &&
 	git add submodule &&
 	git commit -m "new submodule" &&
 	new_head=$(git rev-parse --short HEAD) &&
@@ -352,23 +376,8 @@ test_expect_success "Recursion picks up config in submodule" '
 '
 
 test_expect_success "Recursion picks up all submodules when necessary" '
-	add_upstream_commit &&
-	(
-		cd submodule &&
-		(
-			cd subdir/deepsubmodule &&
-			git fetch &&
-			git checkout -q FETCH_HEAD
-		) &&
-		git add subdir/deepsubmodule &&
-		git commit -m "new deepsubmodule" &&
-		new_head=$(git rev-parse --short HEAD) &&
-		check_sub $new_head
-	) &&
-	git add submodule &&
-	git commit -m "new submodule" &&
-	new_head=$(git rev-parse --short HEAD) &&
-	check_super $new_head &&
+	add_submodule_commits &&
+	add_superproject_commits &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -378,19 +387,7 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 '
 
 test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no new commits are fetched in the superproject (and ignores config)" '
-	add_upstream_commit &&
-	(
-		cd submodule &&
-		(
-			cd subdir/deepsubmodule &&
-			git fetch &&
-			git checkout -q FETCH_HEAD
-		) &&
-		git add subdir/deepsubmodule &&
-		git commit -m "new deepsubmodule" &&
-		new_head=$(git rev-parse --short HEAD) &&
-		check_sub $new_head
-	) &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules true &&
@@ -402,10 +399,7 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 '
 
 test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" '
-	git add submodule &&
-	git commit -m "new submodule" &&
-	new_head=$(git rev-parse --short HEAD) &&
-	check_super $new_head &&
+	add_superproject_commits &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules false &&
@@ -425,7 +419,7 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 '
 
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	echo a >> file &&
 	git add file &&
 	git commit -m "new file" &&
@@ -446,7 +440,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 		cd downstream &&
 		git fetch --recurse-submodules
 	) &&
-	add_upstream_commit &&
+	add_submodule_commits &&
 	git config --global fetch.recurseSubmodules false &&
 	git add submodule &&
 	git commit -m "new submodule" &&
@@ -472,7 +466,7 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 		cd downstream &&
 		git fetch --recurse-submodules
 	) &&
-	add_upstream_commit &&
+	add_submodule_commits &&
 	git config fetch.recurseSubmodules false &&
 	git add submodule &&
 	git commit -m "new submodule" &&
@@ -522,7 +516,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 		cd downstream &&
 		git fetch --recurse-submodules
 	) &&
-	add_upstream_commit &&
+	add_submodule_commits &&
 	git add submodule &&
 	git rm .gitmodules &&
 	git commit -m "new submodule without .gitmodules" &&
-- 
2.33.GIT


* [PATCH v3 04/10] submodule: make static functions read submodules from commits
  2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                       ` (2 preceding siblings ...)
  2022-02-24 10:08     ` [PATCH v3 03/10] t5526: create superproject commits with test helper Glen Choo
@ 2022-02-24 10:08     ` Glen Choo
  2022-02-24 10:08     ` [PATCH v3 05/10] submodule: inline submodule_commits() into caller Glen Choo
                       ` (6 subsequent siblings)
  10 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-24 10:08 UTC
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

A future commit will teach "fetch --recurse-submodules" to fetch
unpopulated submodules. To prepare for this, teach the necessary static
functions how to read submodules from superproject commits using a
"treeish_name" argument (instead of always reading from the index and
filesystem) but do not actually change where submodules are read from.
Submodules will be read from commits when we fetch unpopulated
submodules.
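
To illustrate the idea outside of Git, here is a minimal, self-contained sketch of threading an optional superproject commit through callback data, the way has_commit_data now carries super_oid. The types and functions (struct oid_sketch, struct has_commit_data_sketch, config_source()) are hypothetical simplifications and are not Git's API. As in this patch, an all-zero OID stands for "read from the index and filesystem".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Git's struct object_id (SHA-1 sized). */
struct oid_sketch {
	unsigned char hash[20];
};

/* All-zero OID, playing the role of null_oid(). */
static const struct oid_sketch null_oid_sketch;

/* Like has_commit_data after this patch: the callback data carries the
 * superproject commit that submodule config should be read from. */
struct has_commit_data_sketch {
	int result;
	const char *path;
	const struct oid_sketch *super_oid;
};

static int is_null_oid_sketch(const struct oid_sketch *oid)
{
	return !memcmp(oid->hash, null_oid_sketch.hash, sizeof(oid->hash));
}

/* Reports where submodule configuration would be looked up. */
static const char *config_source(const struct has_commit_data_sketch *cb)
{
	if (!cb->super_oid || is_null_oid_sketch(cb->super_oid))
		return "index and filesystem";
	return "superproject commit";
}
```

Callers that pass the null OID, as every caller in this patch does, keep the old behavior; only a later caller that passes a real commit changes where configuration is read from.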

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/submodule.c b/submodule.c
index 5ace18a7d9..4f3300f2cb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -932,6 +932,7 @@ struct has_commit_data {
 	struct repository *repo;
 	int result;
 	const char *path;
+	const struct object_id *super_oid;
 };
 
 static int check_has_commit(const struct object_id *oid, void *data)
@@ -940,7 +941,7 @@ static int check_has_commit(const struct object_id *oid, void *data)
 	struct repository subrepo;
 	enum object_type type;
 
-	if (repo_submodule_init(&subrepo, cb->repo, cb->path, null_oid())) {
+	if (repo_submodule_init(&subrepo, cb->repo, cb->path, cb->super_oid)) {
 		cb->result = 0;
 		goto cleanup;
 	}
@@ -968,9 +969,15 @@ static int check_has_commit(const struct object_id *oid, void *data)
 
 static int submodule_has_commits(struct repository *r,
 				 const char *path,
+				 const struct object_id *super_oid,
 				 struct oid_array *commits)
 {
-	struct has_commit_data has_commit = { r, 1, path };
+	struct has_commit_data has_commit = {
+		.repo = r,
+		.result = 1,
+		.path = path,
+		.super_oid = super_oid
+	};
 
 	/*
 	 * Perform a cheap, but incorrect check for the existence of 'commits'.
@@ -1017,7 +1024,7 @@ static int submodule_needs_pushing(struct repository *r,
 				   const char *path,
 				   struct oid_array *commits)
 {
-	if (!submodule_has_commits(r, path, commits))
+	if (!submodule_has_commits(r, path, null_oid(), commits))
 		/*
 		 * NOTE: We do consider it safe to return "no" here. The
 		 * correct answer would be "We do not know" instead of
@@ -1277,7 +1284,7 @@ static void calculate_changed_submodule_paths(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_has_commits(r, path, commits)) {
+		if (submodule_has_commits(r, path, null_oid(), commits)) {
 			oid_array_clear(commits);
 			*name->string = '\0';
 		}
@@ -1402,12 +1409,13 @@ static const struct submodule *get_non_gitmodules_submodule(const char *path)
 }
 
 static struct fetch_task *fetch_task_create(struct repository *r,
-					    const char *path)
+					    const char *path,
+					    const struct object_id *treeish_name)
 {
 	struct fetch_task *task = xmalloc(sizeof(*task));
 	memset(task, 0, sizeof(*task));
 
-	task->sub = submodule_from_path(r, null_oid(), path);
+	task->sub = submodule_from_path(r, treeish_name, path);
 	if (!task->sub) {
 		/*
 		 * No entry in .gitmodules? Technically not a submodule,
@@ -1439,11 +1447,12 @@ static void fetch_task_release(struct fetch_task *p)
 }
 
 static struct repository *get_submodule_repo_for(struct repository *r,
-						 const char *path)
+						 const char *path,
+						 const struct object_id *treeish_name)
 {
 	struct repository *ret = xmalloc(sizeof(*ret));
 
-	if (repo_submodule_init(ret, r, path, null_oid())) {
+	if (repo_submodule_init(ret, r, path, treeish_name)) {
 		free(ret);
 		return NULL;
 	}
@@ -1464,7 +1473,7 @@ static int get_next_submodule(struct child_process *cp,
 		if (!S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		task = fetch_task_create(spf->r, ce->name);
+		task = fetch_task_create(spf->r, ce->name, null_oid());
 		if (!task)
 			continue;
 
@@ -1487,7 +1496,7 @@ static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		task->repo = get_submodule_repo_for(spf->r, task->sub->path);
+		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
 		if (task->repo) {
 			struct strbuf submodule_prefix = STRBUF_INIT;
 			child_process_init(cp);
-- 
2.33.GIT


* [PATCH v3 05/10] submodule: inline submodule_commits() into caller
  2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                       ` (3 preceding siblings ...)
  2022-02-24 10:08     ` [PATCH v3 04/10] submodule: make static functions read submodules from commits Glen Choo
@ 2022-02-24 10:08     ` Glen Choo
  2022-02-24 10:08     ` [PATCH v3 06/10] submodule: store new submodule commits oid_array in a struct Glen Choo
                       ` (5 subsequent siblings)
  10 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-24 10:08 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

When collecting the string_list of changed submodule names, the new
submodule commits are stored in the string_list_item.util as an
oid_array. A subsequent commit will replace the oid_array with a struct
that has more information.

Prepare for this change by inlining submodule_commits() (which inserts
into the string_list and initializes the string_list_item.util) into its
only caller so that the code is easier to refactor later.
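
The logic being inlined is "insert the name if it is new, then lazily allocate its .util on first use". A minimal, self-contained sketch of that pattern follows; struct name_list, struct int_array and record_change() are hypothetical simplifications of string_list, oid_array and the callback loop, not Git's API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array, standing in for oid_array. */
struct int_array {
	int *v;
	size_t nr, alloc;
};

/* Hypothetical fixed-capacity, unsorted stand-in for string_list.
 * Names are not copied; the caller keeps them alive. */
struct item {
	const char *name;
	void *util;
};
struct name_list {
	struct item items[16];
	size_t nr;
};

/* Insert-or-get, like string_list_insert(): an existing item is returned
 * unchanged, so its util survives repeated inserts. */
static struct item *list_insert(struct name_list *list, const char *name)
{
	for (size_t i = 0; i < list->nr; i++)
		if (!strcmp(list->items[i].name, name))
			return &list->items[i];
	assert(list->nr < sizeof(list->items) / sizeof(list->items[0]));
	list->items[list->nr].name = name;
	list->items[list->nr].util = NULL;
	return &list->items[list->nr++];
}

static void int_array_append(struct int_array *a, int value)
{
	if (a->nr == a->alloc) {
		a->alloc = a->alloc ? 2 * a->alloc : 4;
		a->v = realloc(a->v, a->alloc * sizeof(*a->v));
		assert(a->v);
	}
	a->v[a->nr++] = value;
}

/* The inlined logic: allocate util the first time a name is seen. */
static void record_change(struct name_list *changed, const char *name,
			  int commit)
{
	struct item *item = list_insert(changed, name);

	if (!item->util)
		item->util = calloc(1, sizeof(struct int_array));
	assert(item->util);
	int_array_append(item->util, commit);
}
```

With the allocation visible at the call site, changing what util points to only requires touching this one loop.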

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/submodule.c b/submodule.c
index 4f3300f2cb..3bc189cf05 100644
--- a/submodule.c
+++ b/submodule.c
@@ -782,19 +782,6 @@ const struct submodule *submodule_from_ce(const struct cache_entry *ce)
 	return submodule_from_path(the_repository, null_oid(), ce->name);
 }
 
-static struct oid_array *submodule_commits(struct string_list *submodules,
-					   const char *name)
-{
-	struct string_list_item *item;
-
-	item = string_list_insert(submodules, name);
-	if (item->util)
-		return (struct oid_array *) item->util;
-
-	/* NEEDSWORK: should we have oid_array_init()? */
-	item->util = xcalloc(1, sizeof(struct oid_array));
-	return (struct oid_array *) item->util;
-}
 
 struct collect_changed_submodules_cb_data {
 	struct repository *repo;
@@ -830,9 +817,9 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
-		struct oid_array *commits;
 		const struct submodule *submodule;
 		const char *name;
+		struct string_list_item *item;
 
 		if (!S_ISGITLINK(p->two->mode))
 			continue;
@@ -859,8 +846,11 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 		if (!name)
 			continue;
 
-		commits = submodule_commits(changed, name);
-		oid_array_append(commits, &p->two->oid);
+		item = string_list_insert(changed, name);
+		if (!item->util)
+			/* NEEDSWORK: should we have oid_array_init()? */
+			item->util = xcalloc(1, sizeof(struct oid_array));
+		oid_array_append(item->util, &p->two->oid);
 	}
 }
 
-- 
2.33.GIT


* [PATCH v3 06/10] submodule: store new submodule commits oid_array in a struct
  2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                       ` (4 preceding siblings ...)
  2022-02-24 10:08     ` [PATCH v3 05/10] submodule: inline submodule_commits() into caller Glen Choo
@ 2022-02-24 10:08     ` Glen Choo
  2022-02-24 10:08     ` [PATCH v3 07/10] submodule: extract get_fetch_task() Glen Choo
                       ` (4 subsequent siblings)
  10 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-24 10:08 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

This commit prepares for a future commit that will teach `git fetch
--recurse-submodules` how to fetch submodules that are present in
<gitdir>/modules, but are not populated. To do this, we need to store
more information about the changed submodule so that we can read the
submodule configuration from the superproject commit instead of the
filesystem.

Refactor the changed submodules string_list.util to hold a struct
instead of an oid_array. This struct only holds the new_commits
oid_array for now; more information will be added later.
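
Wrapping the array in a struct means later patches can add fields without changing every reader of .util; only the allocation size and the clear function need to know the layout. A hypothetical, self-contained sketch follows (struct int_array stands in for oid_array; none of these names are Git's).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array, standing in for oid_array. */
struct int_array {
	int *v;
	size_t nr, alloc;
};

static void int_array_append(struct int_array *a, int value)
{
	if (a->nr == a->alloc) {
		a->alloc = a->alloc ? 2 * a->alloc : 4;
		a->v = realloc(a->v, a->alloc * sizeof(*a->v));
		assert(a->v);
	}
	a->v[a->nr++] = value;
}

static void int_array_clear(struct int_array *a)
{
	free(a->v);
	a->v = NULL;
	a->nr = a->alloc = 0;
}

/* Analogue of struct changed_submodule_data: today it only wraps the
 * array, but more fields can be added here later. */
struct changed_data_sketch {
	struct int_array new_commits;
};

/* Analogue of changed_submodule_data_clear(): the one place that must
 * know how to release every member. */
static void changed_data_clear(struct changed_data_sketch *cs)
{
	int_array_clear(&cs->new_commits);
}
```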

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 52 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 34 insertions(+), 18 deletions(-)

diff --git a/submodule.c b/submodule.c
index 3bc189cf05..0b9c25f9d3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -806,6 +806,20 @@ static const char *default_name_or_path(const char *path_or_name)
 	return path_or_name;
 }
 
+/*
+ * Holds relevant information for a changed submodule. Used as the .util
+ * member of the changed submodule string_list_item.
+ */
+struct changed_submodule_data {
+	/* The submodule commits that have changed in the rev walk. */
+	struct oid_array new_commits;
+};
+
+static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
+{
+	oid_array_clear(&cs_data->new_commits);
+}
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 					  struct diff_options *options,
 					  void *data)
@@ -820,6 +834,7 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 		const struct submodule *submodule;
 		const char *name;
 		struct string_list_item *item;
+		struct changed_submodule_data *cs_data;
 
 		if (!S_ISGITLINK(p->two->mode))
 			continue;
@@ -848,9 +863,9 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 
 		item = string_list_insert(changed, name);
 		if (!item->util)
-			/* NEEDSWORK: should we have oid_array_init()? */
-			item->util = xcalloc(1, sizeof(struct oid_array));
-		oid_array_append(item->util, &p->two->oid);
+			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
+		cs_data = item->util;
+		oid_array_append(&cs_data->new_commits, &p->two->oid);
 	}
 }
 
@@ -897,11 +912,12 @@ static void collect_changed_submodules(struct repository *r,
 	reset_revision_walk();
 }
 
-static void free_submodules_oids(struct string_list *submodules)
+static void free_submodules_data(struct string_list *submodules)
 {
 	struct string_list_item *item;
 	for_each_string_list_item(item, submodules)
-		oid_array_clear((struct oid_array *) item->util);
+		changed_submodule_data_clear(item->util);
+
 	string_list_clear(submodules, 1);
 }
 
@@ -1074,7 +1090,7 @@ int find_unpushed_submodules(struct repository *r,
 	collect_changed_submodules(r, &submodules, &argv);
 
 	for_each_string_list_item(name, &submodules) {
-		struct oid_array *commits = name->util;
+		struct changed_submodule_data *cs_data = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
 
@@ -1087,11 +1103,11 @@ int find_unpushed_submodules(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_needs_pushing(r, path, commits))
+		if (submodule_needs_pushing(r, path, &cs_data->new_commits))
 			string_list_insert(needs_pushing, path);
 	}
 
-	free_submodules_oids(&submodules);
+	free_submodules_data(&submodules);
 	strvec_clear(&argv);
 
 	return needs_pushing->nr;
@@ -1261,7 +1277,7 @@ static void calculate_changed_submodule_paths(struct repository *r,
 	collect_changed_submodules(r, changed_submodule_names, &argv);
 
 	for_each_string_list_item(name, changed_submodule_names) {
-		struct oid_array *commits = name->util;
+		struct changed_submodule_data *cs_data = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
 
@@ -1274,8 +1290,8 @@ static void calculate_changed_submodule_paths(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_has_commits(r, path, null_oid(), commits)) {
-			oid_array_clear(commits);
+		if (submodule_has_commits(r, path, null_oid(), &cs_data->new_commits)) {
+			changed_submodule_data_clear(cs_data);
 			*name->string = '\0';
 		}
 	}
@@ -1312,7 +1328,7 @@ int submodule_touches_in_range(struct repository *r,
 
 	strvec_clear(&args);
 
-	free_submodules_oids(&subs);
+	free_submodules_data(&subs);
 	return ret;
 }
 
@@ -1596,7 +1612,7 @@ static int fetch_finish(int retvalue, struct strbuf *err,
 	struct fetch_task *task = task_cb;
 
 	struct string_list_item *it;
-	struct oid_array *commits;
+	struct changed_submodule_data *cs_data;
 
 	if (!task || !task->sub)
 		BUG("callback cookie bogus");
@@ -1624,14 +1640,14 @@ static int fetch_finish(int retvalue, struct strbuf *err,
 		/* Could be an unchanged submodule, not contained in the list */
 		goto out;
 
-	commits = it->util;
-	oid_array_filter(commits,
+	cs_data = it->util;
+	oid_array_filter(&cs_data->new_commits,
 			 commit_missing_in_sub,
 			 task->repo);
 
 	/* Are there commits we want, but do not exist? */
-	if (commits->nr) {
-		task->commits = commits;
+	if (cs_data->new_commits.nr) {
+		task->commits = &cs_data->new_commits;
 		ALLOC_GROW(spf->oid_fetch_tasks,
 			   spf->oid_fetch_tasks_nr + 1,
 			   spf->oid_fetch_tasks_alloc);
@@ -1689,7 +1705,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	strvec_clear(&spf.args);
 out:
-	free_submodules_oids(&spf.changed_submodule_names);
+	free_submodules_data(&spf.changed_submodule_names);
 	return spf.result;
 }
 
-- 
2.33.GIT


* [PATCH v3 07/10] submodule: extract get_fetch_task()
  2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                       ` (5 preceding siblings ...)
  2022-02-24 10:08     ` [PATCH v3 06/10] submodule: store new submodule commits oid_array in a struct Glen Choo
@ 2022-02-24 10:08     ` Glen Choo
  2022-02-24 23:26       ` Jonathan Tan
  2022-02-24 10:08     ` [PATCH v3 08/10] submodule: move logic into fetch_task_create() Glen Choo
                       ` (3 subsequent siblings)
  10 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-24 10:08 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

get_next_submodule() configures the parallel submodule fetch by
performing two functions:

* iterate the index to find submodules
* configure the child processes to fetch the submodules found in the
  previous step

Extract the index iterating code into an iterator function,
get_fetch_task(), so that get_next_submodule() is agnostic of how
to find submodules. This prepares for a subsequent commit that will
teach the fetch machinery to also iterate through the list of changed
submodules (in addition to the index).
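
The split can be sketched as an iterator that only knows how to find the next submodule and a consumer that only knows how to set up the child process. This is a hypothetical, self-contained model (struct entry, struct fetch_state, get_task() and get_next() are illustrative, and the command string is simplified), not the actual submodule.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified index entry. */
struct entry {
	const char *name;
	int is_gitlink;
};

struct fetch_state {
	const struct entry *entries;
	size_t nr;
	size_t count; /* iteration cursor, like spf->count */
};

/* Iterator: returns the next submodule entry, or NULL when exhausted.
 * It knows how to find submodules, nothing about child processes. */
static const struct entry *get_task(struct fetch_state *st)
{
	for (; st->count < st->nr; st->count++) {
		const struct entry *e = &st->entries[st->count];

		if (!e->is_gitlink)
			continue;
		st->count++; /* resume after this entry on the next call */
		return e;
	}
	return NULL;
}

/* Consumer: only builds the (simplified) child command for whatever
 * task the iterator hands back; returns 0 when there is no more work. */
static int get_next(struct fetch_state *st, char *cmd, size_t len)
{
	const struct entry *e = get_task(st);

	if (!e)
		return 0;
	snprintf(cmd, len, "git fetch --submodule-prefix %s/", e->name);
	return 1;
}
```

Adding a second source of submodules then means changing only the iterator, not the code that configures the child process.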

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 61 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 36 insertions(+), 25 deletions(-)

diff --git a/submodule.c b/submodule.c
index 0b9c25f9d3..988757002a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1389,6 +1389,7 @@ struct fetch_task {
 	struct repository *repo;
 	const struct submodule *sub;
 	unsigned free_sub : 1; /* Do we need to free the submodule? */
+	const char *default_argv;
 
 	struct oid_array *commits; /* Ensure these commits are fetched */
 };
@@ -1466,14 +1467,11 @@ static struct repository *get_submodule_repo_for(struct repository *r,
 	return ret;
 }
 
-static int get_next_submodule(struct child_process *cp,
-			      struct strbuf *err, void *data, void **task_cb)
+static struct fetch_task *
+get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 {
-	struct submodule_parallel_fetch *spf = data;
-
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
-		const char *default_argv;
 		struct fetch_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -1493,10 +1491,10 @@ static int get_next_submodule(struct child_process *cp,
 					&spf->changed_submodule_names,
 					task->sub->name))
 				continue;
-			default_argv = "on-demand";
+			task->default_argv = "on-demand";
 			break;
 		case RECURSE_SUBMODULES_ON:
-			default_argv = "yes";
+			task->default_argv = "yes";
 			break;
 		case RECURSE_SUBMODULES_OFF:
 			continue;
@@ -1504,29 +1502,12 @@ static int get_next_submodule(struct child_process *cp,
 
 		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
 		if (task->repo) {
-			struct strbuf submodule_prefix = STRBUF_INIT;
-			child_process_init(cp);
-			cp->dir = task->repo->gitdir;
-			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
-			cp->git_cmd = 1;
 			if (!spf->quiet)
 				strbuf_addf(err, _("Fetching submodule %s%s\n"),
 					    spf->prefix, ce->name);
-			strvec_init(&cp->args);
-			strvec_pushv(&cp->args, spf->args.v);
-			strvec_push(&cp->args, default_argv);
-			strvec_push(&cp->args, "--submodule-prefix");
-
-			strbuf_addf(&submodule_prefix, "%s%s/",
-						       spf->prefix,
-						       task->sub->path);
-			strvec_push(&cp->args, submodule_prefix.buf);
 
 			spf->count++;
-			*task_cb = task;
-
-			strbuf_release(&submodule_prefix);
-			return 1;
+			return task;
 		} else {
 			struct strbuf empty_submodule_path = STRBUF_INIT;
 
@@ -1550,6 +1531,36 @@ static int get_next_submodule(struct child_process *cp,
 			strbuf_release(&empty_submodule_path);
 		}
 	}
+	return NULL;
+}
+
+static int get_next_submodule(struct child_process *cp, struct strbuf *err,
+			      void *data, void **task_cb)
+{
+	struct submodule_parallel_fetch *spf = data;
+	struct fetch_task *task = get_fetch_task(spf, err);
+
+	if (task) {
+		struct strbuf submodule_prefix = STRBUF_INIT;
+
+		child_process_init(cp);
+		cp->dir = task->repo->gitdir;
+		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+		cp->git_cmd = 1;
+		strvec_init(&cp->args);
+		strvec_pushv(&cp->args, spf->args.v);
+		strvec_push(&cp->args, task->default_argv);
+		strvec_push(&cp->args, "--submodule-prefix");
+
+		strbuf_addf(&submodule_prefix, "%s%s/",
+						spf->prefix,
+						task->sub->path);
+		strvec_push(&cp->args, submodule_prefix.buf);
+		*task_cb = task;
+
+		strbuf_release(&submodule_prefix);
+		return 1;
+	}
 
 	if (spf->oid_fetch_tasks_nr) {
 		struct fetch_task *task =
-- 
2.33.GIT


* [PATCH v3 08/10] submodule: move logic into fetch_task_create()
  2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                       ` (6 preceding siblings ...)
  2022-02-24 10:08     ` [PATCH v3 07/10] submodule: extract get_fetch_task() Glen Choo
@ 2022-02-24 10:08     ` Glen Choo
  2022-02-24 23:36       ` Jonathan Tan
  2022-02-24 10:08     ` [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
                       ` (2 subsequent siblings)
  10 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-24 10:08 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

get_fetch_task() gets a fetch task by iterating the index; a future
commit will introduce a similar function, get_fetch_task_from_changed(),
that gets a fetch task from the list of changed submodules. Both
functions are similar in that they need to:

* create a fetch task
* initialize the submodule repo for the fetch task
* determine the default recursion mode

Move all of this logic into fetch_task_create() so that it is no longer
split between fetch_task_create() and get_fetch_task(). This will make
it easier to share code with get_fetch_task_from_changed().
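
The recursion-mode step that moves into fetch_task_create() amounts to mapping the configured mode (and, for on-demand, whether the submodule is in the changed list) either to the argument passed down to the child fetch or to "skip". A hypothetical, self-contained sketch of that mapping; the enum and function names are illustrative, not Git's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative mirror of the RECURSE_SUBMODULES_* cases in the switch. */
enum recurse_mode_sketch {
	MODE_DEFAULT,
	MODE_ON_DEMAND,
	MODE_ON,
	MODE_OFF,
};

/*
 * Returns the value fetch_task_create() would store in default_argv,
 * or NULL where it would "goto cleanup" and return no task.
 */
static const char *default_argv_for(enum recurse_mode_sketch mode,
				    int in_changed_list)
{
	switch (mode) {
	case MODE_ON:
		return "yes";
	case MODE_OFF:
		return NULL;
	case MODE_DEFAULT:
	case MODE_ON_DEMAND:
	default:
		return in_changed_list ? "on-demand" : NULL;
	}
}
```

Keeping this decision in one function means both get_fetch_task() and the planned get_fetch_task_from_changed() make identical skip/recurse decisions.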

Signed-off-by: Glen Choo <chooglen@google.com>
---
I think this patch could be squashed into the previous one; let me know
if this is a good idea.

 submodule.c | 99 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 52 insertions(+), 47 deletions(-)

diff --git a/submodule.c b/submodule.c
index 988757002a..03af223aba 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1415,32 +1415,6 @@ static const struct submodule *get_non_gitmodules_submodule(const char *path)
 	return (const struct submodule *) ret;
 }
 
-static struct fetch_task *fetch_task_create(struct repository *r,
-					    const char *path,
-					    const struct object_id *treeish_name)
-{
-	struct fetch_task *task = xmalloc(sizeof(*task));
-	memset(task, 0, sizeof(*task));
-
-	task->sub = submodule_from_path(r, treeish_name, path);
-	if (!task->sub) {
-		/*
-		 * No entry in .gitmodules? Technically not a submodule,
-		 * but historically we supported repositories that happen to be
-		 * in-place where a gitlink is. Keep supporting them.
-		 */
-		task->sub = get_non_gitmodules_submodule(path);
-		if (!task->sub) {
-			free(task);
-			return NULL;
-		}
-
-		task->free_sub = 1;
-	}
-
-	return task;
-}
-
 static void fetch_task_release(struct fetch_task *p)
 {
 	if (p->free_sub)
@@ -1467,6 +1441,57 @@ static struct repository *get_submodule_repo_for(struct repository *r,
 	return ret;
 }
 
+static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf,
+					    const char *path,
+					    const struct object_id *treeish_name)
+{
+	struct fetch_task *task = xmalloc(sizeof(*task));
+	memset(task, 0, sizeof(*task));
+
+	task->sub = submodule_from_path(spf->r, treeish_name, path);
+
+	if (!task->sub) {
+		/*
+		 * No entry in .gitmodules? Technically not a submodule,
+		 * but historically we supported repositories that happen to be
+		 * in-place where a gitlink is. Keep supporting them.
+		 */
+		task->sub = get_non_gitmodules_submodule(path);
+		if (!task->sub)
+			goto cleanup;
+
+		task->free_sub = 1;
+	}
+
+	switch (get_fetch_recurse_config(task->sub, spf))
+	{
+	default:
+	case RECURSE_SUBMODULES_DEFAULT:
+	case RECURSE_SUBMODULES_ON_DEMAND:
+		if (!task->sub ||
+			!string_list_lookup(
+				&spf->changed_submodule_names,
+				task->sub->name))
+			goto cleanup;
+		task->default_argv = "on-demand";
+		break;
+	case RECURSE_SUBMODULES_ON:
+		task->default_argv = "yes";
+		break;
+	case RECURSE_SUBMODULES_OFF:
+		goto cleanup;
+	}
+
+	task->repo = get_submodule_repo_for(spf->r, path, treeish_name);
+
+	return task;
+
+ cleanup:
+	fetch_task_release(task);
+	free(task);
+	return NULL;
+}
+
 static struct fetch_task *
 get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 {
@@ -1477,30 +1502,10 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 		if (!S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		task = fetch_task_create(spf->r, ce->name, null_oid());
+		task = fetch_task_create(spf, ce->name, null_oid());
 		if (!task)
 			continue;
 
-		switch (get_fetch_recurse_config(task->sub, spf))
-		{
-		default:
-		case RECURSE_SUBMODULES_DEFAULT:
-		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!task->sub ||
-			    !string_list_lookup(
-					&spf->changed_submodule_names,
-					task->sub->name))
-				continue;
-			task->default_argv = "on-demand";
-			break;
-		case RECURSE_SUBMODULES_ON:
-			task->default_argv = "yes";
-			break;
-		case RECURSE_SUBMODULES_OFF:
-			continue;
-		}
-
-		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
 		if (task->repo) {
 			if (!spf->quiet)
 				strbuf_addf(err, _("Fetching submodule %s%s\n"),
-- 
2.33.GIT


* [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                       ` (7 preceding siblings ...)
  2022-02-24 10:08     ` [PATCH v3 08/10] submodule: move logic into fetch_task_create() Glen Choo
@ 2022-02-24 10:08     ` Glen Choo
  2022-02-24 21:30       ` Junio C Hamano
                         ` (3 more replies)
  2022-02-24 10:08     ` [PATCH v3 10/10] submodule: fix latent check_has_commit() bug Glen Choo
  2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
  10 siblings, 4 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-24 10:08 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

"git fetch --recurse-submodules" only considers populated
submodules (i.e. submodules that can be found by iterating the index),
which makes "git fetch" behave differently based on which commit is
checked out. As a result, even if the user has initialized all submodules
correctly, they may not fetch the necessary submodule commits, and
commands like "git checkout --recurse-submodules" might fail.

Teach "git fetch" to fetch cloned, changed submodules regardless of
whether they are populated. This is in addition to the current behavior
of fetching populated submodules (which is always attempted regardless
of what was fetched in the superproject, or even if nothing was fetched
in the superproject).
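
The resulting rules, as documented in fetch-options.txt, can be summarized as a small predicate. This is a hypothetical model for illustration (struct sub_state and should_fetch() are not Git's code); "cloned" means the submodule's git directory exists locally, e.g. in $GIT_DIR/modules/.

```c
#include <assert.h>
#include <stdbool.h>

enum fetch_mode { MODE_NO, MODE_ON_DEMAND, MODE_YES };

struct sub_state {
	bool cloned;    /* git dir present locally */
	bool populated; /* has a working tree, i.e. found via the index */
	bool changed;   /* newly fetched superproject commits reference
	                   submodule commits missing from the local clone */
};

static bool should_fetch(enum fetch_mode mode, const struct sub_state *s)
{
	/* 'no' never recurses; an uncloned submodule cannot be fetched. */
	if (mode == MODE_NO || !s->cloned)
		return false;
	/* Changed submodules are fetched by both 'on-demand' and 'yes'. */
	if (s->changed)
		return true;
	/* Unchanged ones are fetched only under 'yes', and only if populated. */
	return mode == MODE_YES && s->populated;
}
```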

A submodule may be encountered multiple times (via the list of
populated submodules or via the list of changed submodules). When this
happens, "git fetch" only reads the 'populated copy' and ignores the
'changed copy'. Amend the verify_fetch_result() test helper so that we
can assert on which 'copy' is being read.
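
The de-duplication can be pictured as two passes that share a "seen" set, which the new index_count, changed_count and seen_submodule_names fields in struct submodule_parallel_fetch appear to support. A hypothetical sketch with simplified types that ignores the recursion mode (plan_fetches() and struct seen_set are illustrative only):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEEN 32

/* Hypothetical stand-in for the seen_submodule_names string_list. */
struct seen_set {
	const char *names[MAX_SEEN];
	size_t nr;
};

static int seen_has(const struct seen_set *s, const char *name)
{
	for (size_t i = 0; i < s->nr; i++)
		if (!strcmp(s->names[i], name))
			return 1;
	return 0;
}

static void seen_add(struct seen_set *s, const char *name)
{
	assert(s->nr < MAX_SEEN);
	s->names[s->nr++] = name;
}

/*
 * Fills out[] with the submodule names to fetch, in order, and returns
 * their count: every populated submodule first, then each changed
 * submodule that was not already handled as a populated one.
 */
static size_t plan_fetches(const char **populated, size_t n_pop,
			   const char **changed, size_t n_changed,
			   const char **out, size_t out_cap)
{
	struct seen_set seen = { .nr = 0 };
	size_t n = 0;

	for (size_t i = 0; i < n_pop && n < out_cap; i++) {
		seen_add(&seen, populated[i]);
		out[n++] = populated[i];
	}
	for (size_t i = 0; i < n_changed && n < out_cap; i++) {
		if (seen_has(&seen, changed[i]))
			continue; /* the 'populated copy' wins */
		out[n++] = changed[i];
	}
	return n;
}
```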

Signed-off-by: Glen Choo <chooglen@google.com>
---
In the process of writing the new tests [1], I noticed some failures of
the form:

  # rm the submodule's working tree directory.
  $ git rm submodule
  [...]

  # Do a fetch that requires running a child process from the submodule.
  $ git fetch --recurse-submodules same-name-1
  [...]

  # Fatal error tells us that we cannot chdir to the deleted working
  # tree.
  fatal: cannot chdir to '../../../submodule': No such file or directory

This happens because submodules set/unset a value for core.worktree when
they are checked out/"un-checked out" (see submodule_move_head() and
connect_work_tree_and_git_dir()), but "git rm" doesn't know that
core.worktree should be updated.

I've worked around this by passing "--work-tree=." to the child process
[2], but this feels like a hack, especially because this bug should
affect all child processes in a "git rm"-ed submodule (this probably
includes the "git branch" processes in gc/branch-recurse-submodules, but
I haven't confirmed it yet). Some more comprehensive solutions that
could be future work are:

- Teach "git [add|rm]" to unset core.worktree (the reverse operation,
  "git restore", should already do the correct thing). This won't detect
  submodules removed with "rm -r" though.
- Teach submodule child processes to ignore stale core.worktree values.
- Do more things in-core instead of using child processes (avoiding the
  failing chdir() call).

I'm not sure what future work we should pursue, or whether the
"--work-tree=." workaround is good at all, so I'd appreciate feedback
here.

[1] There is a similar, preexisting test that also removes the
submodules. However, that test isn't affected because it invokes "git
checkout" after doing "git rm".
[2] Since the submodule has a git dir but no working tree, I also tried
working around the bug by passing "--bare". However, this doesn't work
because work tree settings override "bare-ness" settings, as described
by t/t1510-repo-setup.sh.

 Documentation/fetch-options.txt |  26 ++--
 Documentation/git-fetch.txt     |  10 +-
 builtin/fetch.c                 |  14 +-
 submodule.c                     | 125 +++++++++++++--
 submodule.h                     |  12 +-
 t/t5526-fetch-submodules.sh     | 260 +++++++++++++++++++++++++++++++-
 6 files changed, 404 insertions(+), 43 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index e967ff1874..38dad13683 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -185,15 +185,23 @@ endif::git-pull[]
 ifndef::git-pull[]
 --recurse-submodules[=yes|on-demand|no]::
 	This option controls if and under what conditions new commits of
-	populated submodules should be fetched too. It can be used as a
-	boolean option to completely disable recursion when set to 'no' or to
-	unconditionally recurse into all populated submodules when set to
-	'yes', which is the default when this option is used without any
-	value. Use 'on-demand' to only recurse into a populated submodule
-	when the superproject retrieves a commit that updates the submodule's
-	reference to a commit that isn't already in the local submodule
-	clone. By default, 'on-demand' is used, unless
-	`fetch.recurseSubmodules` is set (see linkgit:git-config[1]).
+	submodules should be fetched too. When recursing through submodules,
+	`git fetch` always attempts to fetch "changed" submodules, that is, a
+	submodule that has commits that are referenced by a newly fetched
+	superproject commit but are missing in the local submodule clone. A
+	changed submodule can be fetched as long as it is present locally e.g.
+	in `$GIT_DIR/modules/` (see linkgit:gitsubmodules[7]); if the upstream
+	adds a new submodule, that submodule cannot be fetched until it is
+	cloned e.g. by `git submodule update`.
++
+When set to 'on-demand', only changed submodules are fetched. When set
+to 'yes', all populated submodules are fetched and submodules that are
+both unpopulated and changed are fetched. When set to 'no', submodules
+are never fetched.
++
+When unspecified, this uses the value of `fetch.recurseSubmodules` if it
+is set (see linkgit:git-config[1]), defaulting to 'on-demand' if unset.
+When this option is used without any value, it defaults to 'yes'.
 endif::git-pull[]
 
 -j::
diff --git a/Documentation/git-fetch.txt b/Documentation/git-fetch.txt
index 550c16ca61..e9d364669a 100644
--- a/Documentation/git-fetch.txt
+++ b/Documentation/git-fetch.txt
@@ -287,12 +287,10 @@ include::transfer-data-leaks.txt[]
 
 BUGS
 ----
-Using --recurse-submodules can only fetch new commits in already checked
-out submodules right now. When e.g. upstream added a new submodule in the
-just fetched commits of the superproject the submodule itself cannot be
-fetched, making it impossible to check out that submodule later without
-having to do a fetch again. This is expected to be fixed in a future Git
-version.
+Using --recurse-submodules can only fetch new commits in submodules that are
+present locally e.g. in `$GIT_DIR/modules/`. If the upstream adds a new
+submodule, that submodule cannot be fetched until it is cloned e.g. by `git
+submodule update`. This is expected to be fixed in a future Git version.
 
 SEE ALSO
 --------
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f7abbc31ff..faaf89f637 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2122,13 +2122,13 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 			max_children = fetch_parallel_config;
 
 		add_options_to_argv(&options);
-		result = fetch_populated_submodules(the_repository,
-						    &options,
-						    submodule_prefix,
-						    recurse_submodules,
-						    recurse_submodules_default,
-						    verbosity < 0,
-						    max_children);
+		result = fetch_submodules(the_repository,
+					  &options,
+					  submodule_prefix,
+					  recurse_submodules,
+					  recurse_submodules_default,
+					  verbosity < 0,
+					  max_children);
 		strvec_clear(&options);
 	}
 
diff --git a/submodule.c b/submodule.c
index 03af223aba..d60f877b1f 100644
--- a/submodule.c
+++ b/submodule.c
@@ -811,6 +811,16 @@ static const char *default_name_or_path(const char *path_or_name)
  * member of the changed submodule string_list_item.
  */
 struct changed_submodule_data {
+	/*
+	 * The first superproject commit in the rev walk that points to the
+	 * submodule.
+	 */
+	const struct object_id *super_oid;
+	/*
+	 * Path to the submodule in the superproject commit referenced
+	 * by 'super_oid'.
+	 */
+	char *path;
 	/* The submodule commits that have changed in the rev walk. */
 	struct oid_array new_commits;
 };
@@ -818,6 +828,7 @@ struct changed_submodule_data {
 static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
 {
 	oid_array_clear(&cs_data->new_commits);
+	free(cs_data->path);
 }
 
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
@@ -865,6 +876,8 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 		if (!item->util)
 			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
 		cs_data = item->util;
+		cs_data->super_oid = commit_oid;
+		cs_data->path = xstrdup(p->two->path);
 		oid_array_append(&cs_data->new_commits, &p->two->oid);
 	}
 }
@@ -1253,14 +1266,33 @@ void check_for_new_submodule_commits(struct object_id *oid)
 	oid_array_append(&ref_tips_after_fetch, oid);
 }
 
+/*
+ * Returns 1 if there is at least one submodule gitdir in
+ * $GIT_DIR/modules and 0 otherwise. This follows
+ * submodule_name_to_gitdir(), which looks for submodules in
+ * $GIT_DIR/modules, not $GIT_COMMON_DIR.
+ *
+ * A submodule can be moved to $GIT_DIR/modules manually by running "git
+ * submodule absorbgitdirs", or it may be initialized there by "git
+ * submodule update".
+ */
+static int repo_has_absorbed_submodules(struct repository *r)
+{
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_repo_git_path(&buf, r, "modules/");
+	return file_exists(buf.buf) && !is_empty_dir(buf.buf);
+}
+
 static void calculate_changed_submodule_paths(struct repository *r,
 		struct string_list *changed_submodule_names)
 {
 	struct strvec argv = STRVEC_INIT;
 	struct string_list_item *name;
 
-	/* No need to check if there are no submodules configured */
-	if (!submodule_from_path(r, NULL, NULL))
+	/* No need to check if no submodules would be fetched */
+	if (!submodule_from_path(r, NULL, NULL) &&
+	    !repo_has_absorbed_submodules(r))
 		return;
 
 	strvec_push(&argv, "--"); /* argv[0] program name */
@@ -1333,7 +1365,8 @@ int submodule_touches_in_range(struct repository *r,
 }
 
 struct submodule_parallel_fetch {
-	int count;
+	int index_count;
+	int changed_count;
 	struct strvec args;
 	struct repository *r;
 	const char *prefix;
@@ -1343,6 +1376,7 @@ struct submodule_parallel_fetch {
 	int result;
 
 	struct string_list changed_submodule_names;
+	struct string_list seen_submodule_names;
 
 	/* Pending fetches by OIDs */
 	struct fetch_task **oid_fetch_tasks;
@@ -1353,6 +1387,7 @@ struct submodule_parallel_fetch {
 #define SPF_INIT { \
 	.args = STRVEC_INIT, \
 	.changed_submodule_names = STRING_LIST_INIT_DUP, \
+	.seen_submodule_names = STRING_LIST_INIT_DUP, \
 	.submodules_with_errors = STRBUF_INIT, \
 }
 
@@ -1390,6 +1425,7 @@ struct fetch_task {
 	const struct submodule *sub;
 	unsigned free_sub : 1; /* Do we need to free the submodule? */
 	const char *default_argv;
+	struct strvec git_args;
 
 	struct oid_array *commits; /* Ensure these commits are fetched */
 };
@@ -1425,6 +1461,8 @@ static void fetch_task_release(struct fetch_task *p)
 	if (p->repo)
 		repo_clear(p->repo);
 	FREE_AND_NULL(p->repo);
+
+	strvec_clear(&p->git_args);
 }
 
 static struct repository *get_submodule_repo_for(struct repository *r,
@@ -1463,6 +1501,9 @@ static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf
 		task->free_sub = 1;
 	}
 
+	if (string_list_lookup(&spf->seen_submodule_names, task->sub->name))
+		goto cleanup;
+
 	switch (get_fetch_recurse_config(task->sub, spf))
 	{
 	default:
@@ -1493,10 +1534,12 @@ static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf
 }
 
 static struct fetch_task *
-get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
+get_fetch_task_from_index(struct submodule_parallel_fetch *spf,
+			  struct strbuf *err)
 {
-	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		const struct cache_entry *ce = spf->r->index->cache[spf->count];
+	for (; spf->index_count < spf->r->index->cache_nr; spf->index_count++) {
+		const struct cache_entry *ce =
+			spf->r->index->cache[spf->index_count];
 		struct fetch_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -1511,7 +1554,7 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 				strbuf_addf(err, _("Fetching submodule %s%s\n"),
 					    spf->prefix, ce->name);
 
-			spf->count++;
+			spf->index_count++;
 			return task;
 		} else {
 			struct strbuf empty_submodule_path = STRBUF_INIT;
@@ -1539,11 +1582,64 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 	return NULL;
 }
 
+static struct fetch_task *
+get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
+			    struct strbuf *err)
+{
+	for (; spf->changed_count < spf->changed_submodule_names.nr;
+	     spf->changed_count++) {
+		struct string_list_item item =
+			spf->changed_submodule_names.items[spf->changed_count];
+		struct changed_submodule_data *cs_data = item.util;
+		struct fetch_task *task;
+
+		if (!is_tree_submodule_active(spf->r, cs_data->super_oid,cs_data->path))
+			continue;
+
+		task = fetch_task_create(spf, cs_data->path,
+					 cs_data->super_oid);
+		if (!task)
+			continue;
+
+		if (!task->repo) {
+			strbuf_addf(err, _("Could not access submodule '%s' at commit %s\n"),
+				    cs_data->path,
+				    find_unique_abbrev(cs_data->super_oid, DEFAULT_ABBREV));
+
+			fetch_task_release(task);
+			free(task);
+			continue;
+		}
+
+		if (!spf->quiet)
+			strbuf_addf(err,
+				    _("Fetching submodule %s%s at commit %s\n"),
+				    spf->prefix, task->sub->path,
+				    find_unique_abbrev(cs_data->super_oid,
+						       DEFAULT_ABBREV));
+
+		spf->changed_count++;
+		/*
+		 * NEEDSWORK: A submodule unpopulated by "git rm" will
+		 * have core.worktree set, but the actual core.worktree
+		 * directory won't exist, causing the child process to
+		 * fail. Forcibly set --work-tree until we get smarter
+		 * handling for core.worktree in unpopulated submodules.
+		 */
+		strvec_push(&task->git_args, "--work-tree=.");
+		return task;
+	}
+	return NULL;
+}
+
 static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 			      void *data, void **task_cb)
 {
 	struct submodule_parallel_fetch *spf = data;
-	struct fetch_task *task = get_fetch_task(spf, err);
+	struct fetch_task *task =
+		get_fetch_task_from_index(spf, err);
+	if (!task)
+		task = get_fetch_task_from_changed(spf, err);
 
 	if (task) {
 		struct strbuf submodule_prefix = STRBUF_INIT;
@@ -1553,6 +1649,8 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
 		cp->git_cmd = 1;
 		strvec_init(&cp->args);
+		if (task->git_args.nr)
+			strvec_pushv(&cp->args, task->git_args.v);
 		strvec_pushv(&cp->args, spf->args.v);
 		strvec_push(&cp->args, task->default_argv);
 		strvec_push(&cp->args, "--submodule-prefix");
@@ -1564,6 +1662,7 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 		*task_cb = task;
 
 		strbuf_release(&submodule_prefix);
+		string_list_insert(&spf->seen_submodule_names, task->sub->name);
 		return 1;
 	}
 
@@ -1678,11 +1777,11 @@ static int fetch_finish(int retvalue, struct strbuf *err,
 	return 0;
 }
 
-int fetch_populated_submodules(struct repository *r,
-			       const struct strvec *options,
-			       const char *prefix, int command_line_option,
-			       int default_option,
-			       int quiet, int max_parallel_jobs)
+int fetch_submodules(struct repository *r,
+		     const struct strvec *options,
+		     const char *prefix, int command_line_option,
+		     int default_option,
+		     int quiet, int max_parallel_jobs)
 {
 	int i;
 	struct submodule_parallel_fetch spf = SPF_INIT;
diff --git a/submodule.h b/submodule.h
index 784ceffc0e..61bebde319 100644
--- a/submodule.h
+++ b/submodule.h
@@ -88,12 +88,12 @@ int should_update_submodules(void);
  */
 const struct submodule *submodule_from_ce(const struct cache_entry *ce);
 void check_for_new_submodule_commits(struct object_id *oid);
-int fetch_populated_submodules(struct repository *r,
-			       const struct strvec *options,
-			       const char *prefix,
-			       int command_line_option,
-			       int default_option,
-			       int quiet, int max_parallel_jobs);
+int fetch_submodules(struct repository *r,
+		     const struct strvec *options,
+		     const char *prefix,
+		     int command_line_option,
+		     int default_option,
+		     int quiet, int max_parallel_jobs);
 unsigned is_submodule_modified(const char *path, int ignore_untracked);
 int submodule_uses_gitfile(const char *path);
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index ee4dd5a4a9..639290d30d 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -15,8 +15,9 @@ pwd=$(pwd)
 
 check_sub() {
 	NEW_HEAD=$1 &&
+	SUPER_HEAD=$2 &&
 	cat <<-EOF >$pwd/expect.err.sub
-	Fetching submodule submodule
+	Fetching submodule submodule${SUPER_HEAD:+ at commit $SUPER_HEAD}
 	From $pwd/submodule
 	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
 	EOF
@@ -24,8 +25,9 @@ check_sub() {
 
 check_deep() {
 	NEW_HEAD=$1 &&
+	SUB_HEAD=$2 &&
 	cat <<-EOF >$pwd/expect.err.deep
-	Fetching submodule submodule/subdir/deepsubmodule
+	Fetching submodule submodule/subdir/deepsubmodule${SUB_HEAD:+ at commit $SUB_HEAD}
 	From $pwd/deepsubmodule
 	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
 	EOF
@@ -418,6 +420,155 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 	verify_fetch_result actual.err
 '
 
+# Test that we can fetch submodules in other branches by running fetch
+# in a commit that has no submodules.
+test_expect_success 'setup downstream branch without submodules' '
+	(
+		cd downstream &&
+		git checkout --recurse-submodules -b no-submodules &&
+		git rm .gitmodules &&
+		git rm submodule &&
+		git commit -m "no submodules" &&
+		git checkout --recurse-submodules super
+	)
+'
+
+test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '
+	add_submodule_commits &&
+	add_superproject_commits &&
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
+	) &&
+	super_head=$(git rev-parse --short HEAD) &&
+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
+	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
+
+	# assert that these are fetched from commits, not the index
+	check_sub $sub_head $super_head &&
+	check_deep $deep_head $sub_head &&
+
+	test_must_be_empty actual.out &&
+	verify_fetch_result actual.err
+'
+
+test_expect_success "'--recurse-submodules' should fetch submodule commits if the submodule is changed but the index has no submodules" '
+	add_submodule_commits &&
+	add_superproject_commits &&
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	super_head=$(git rev-parse --short HEAD) &&
+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
+	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
+
+	# assert that these are fetched from commits, not the index
+	check_sub $sub_head $super_head &&
+	check_deep $deep_head $sub_head &&
+
+	test_must_be_empty actual.out &&
+	verify_fetch_result actual.err
+'
+
+test_expect_success "'--recurse-submodules' should ignore changed, inactive submodules" '
+	add_submodule_commits &&
+	add_superproject_commits &&
+
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git -c submodule.submodule.active=false fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+	super_head=$(git rev-parse --short HEAD) &&
+	check_super $super_head &&
+	# Neither should be fetched because the submodule is inactive
+	rm expect.err.sub &&
+	rm expect.err.deep &&
+	verify_fetch_result actual.err
+'
+
+# In downstream, init "submodule2", but do not check it out while
+# fetching. This lets us assert that unpopulated submodules can be
+# fetched.
+test_expect_success 'setup downstream branch with other submodule' '
+	mkdir submodule2 &&
+	(
+		cd submodule2 &&
+		git init &&
+		echo sub2content >sub2file &&
+		git add sub2file &&
+		git commit -a -m new &&
+		git branch -M sub2
+	) &&
+	git checkout -b super-sub2-only &&
+	git submodule add "$pwd/submodule2" submodule2 &&
+	git commit -m "add sub2" &&
+	git checkout super &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules origin &&
+		git checkout super-sub2-only &&
+		# Explicitly run "git submodule update" because sub2 is new
+		# and has not been cloned.
+		git submodule update --init &&
+		git checkout --recurse-submodules super
+	)
+'
+
+test_expect_success "'--recurse-submodules' should fetch submodule commits in changed submodules and the index" '
+	# Create new commit in origin/super
+	add_submodule_commits &&
+	add_superproject_commits &&
+
+	# Create new commit in origin/super-sub2-only
+	git checkout super-sub2-only &&
+	(
+		cd submodule2 &&
+		test_commit --no-tag foo
+	) &&
+	git add submodule2 &&
+	git commit -m "new submodule2" &&
+
+	git checkout super &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
+	sub2_head=$(git -C submodule2 rev-parse --short HEAD) &&
+	super_head=$(git rev-parse --short HEAD) &&
+	super_sub2_only_head=$(git rev-parse --short super-sub2-only) &&
+
+	# Use test_cmp manually because verify_fetch_result does not
+	# consider submodule2. All the repos should be fetched, but only
+	# submodule2 should be read from a commit
+	cat <<-EOF > expect.err.combined &&
+	From $pwd/.
+	   OLD_HEAD..$super_head  super           -> origin/super
+	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
+	Fetching submodule submodule
+	From $pwd/submodule
+	   OLD_HEAD..$sub_head  sub        -> origin/sub
+	Fetching submodule submodule/subdir/deepsubmodule
+	From $pwd/deepsubmodule
+	   OLD_HEAD..$deep_head  deep       -> origin/deep
+	Fetching submodule submodule2 at commit $super_sub2_only_head
+	From $pwd/submodule2
+	   OLD_HEAD..$sub2_head  sub2       -> origin/sub2
+	EOF
+	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
+	test_cmp expect.err.combined actual.err.cmp
+'
+
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
 	add_submodule_commits &&
 	echo a >> file &&
@@ -860,4 +1011,109 @@ test_expect_success 'recursive fetch after deinit a submodule' '
 	test_cmp expect actual
 '
 
+test_expect_success 'setup repo with upstreams that share a submodule name' '
+	mkdir same-name-1 &&
+	(
+		cd same-name-1 &&
+		git init &&
+		test_commit --no-tag a
+	) &&
+	git clone same-name-1 same-name-2 &&
+	# same-name-1 and same-name-2 both add a submodule with the
+	# name "submodule"
+	(
+		cd same-name-1 &&
+		mkdir submodule &&
+		git -C submodule init &&
+		test_commit -C submodule --no-tag a1 &&
+		git submodule add "$pwd/same-name-1/submodule" &&
+		git add submodule &&
+		git commit -m "super-a1"
+	) &&
+	(
+		cd same-name-2 &&
+		mkdir submodule &&
+		git -C submodule init &&
+		test_commit -C submodule --no-tag a2 &&
+		git submodule add "$pwd/same-name-2/submodule" &&
+		git add submodule &&
+		git commit -m "super-a2"
+	) &&
+	git clone same-name-1 -o same-name-1 same-name-downstream &&
+	(
+		cd same-name-downstream &&
+		git remote add same-name-2 ../same-name-2 &&
+		git fetch --all &&
+		# init downstream with same-name-1
+		git submodule update --init
+	)
+'
+
+test_expect_success 'fetch --recurse-submodules updates name-conflicted, populated submodule' '
+	test_when_finished "git -C same-name-downstream checkout master" &&
+	(
+		cd same-name-1 &&
+		test_commit -C submodule --no-tag b1 &&
+		git add submodule &&
+		git commit -m "super-b1"
+	) &&
+	(
+		cd same-name-2 &&
+		test_commit -C submodule --no-tag b2 &&
+		git add submodule &&
+		git commit -m "super-b2"
+	) &&
+	(
+		cd same-name-downstream &&
+		# even though the .gitmodules is correct, we cannot
+		# fetch from same-name-2
+		git checkout same-name-2/master &&
+		git fetch --recurse-submodules same-name-1 &&
+		test_must_fail git fetch --recurse-submodules same-name-2
+	) &&
+	super_head1=$(git -C same-name-1 rev-parse HEAD) &&
+	git -C same-name-downstream cat-file -e $super_head1 &&
+
+	super_head2=$(git -C same-name-2 rev-parse HEAD) &&
+	git -C same-name-downstream cat-file -e $super_head2 &&
+
+	sub_head1=$(git -C same-name-1/submodule rev-parse HEAD) &&
+	git -C same-name-downstream/submodule cat-file -e $sub_head1 &&
+
+	sub_head2=$(git -C same-name-2/submodule rev-parse HEAD) &&
+	test_must_fail git -C same-name-downstream/submodule cat-file -e $sub_head2
+'
+
+test_expect_success 'fetch --recurse-submodules updates name-conflicted, unpopulated submodule' '
+	(
+		cd same-name-1 &&
+		test_commit -C submodule --no-tag c1 &&
+		git add submodule &&
+		git commit -m "super-c1"
+	) &&
+	(
+		cd same-name-2 &&
+		test_commit -C submodule --no-tag c2 &&
+		git add submodule &&
+		git commit -m "super-c2"
+	) &&
+	(
+		cd same-name-downstream &&
+		git checkout master &&
+		git rm .gitmodules &&
+		git rm submodule &&
+		git commit -m "no submodules" &&
+		git fetch --recurse-submodules same-name-1
+	) &&
+	head1=$(git -C same-name-1/submodule rev-parse HEAD) &&
+	head2=$(git -C same-name-2/submodule rev-parse HEAD) &&
+	(
+		cd same-name-downstream/.git/modules/submodule &&
+		# The submodule has core.worktree pointing to the "git
+		# rm"-ed directory, overwrite the invalid value.
+		git --work-tree=. cat-file -e $head1 &&
+		test_must_fail git --work-tree=. cat-file -e $head2
+	)
+'
+
 test_done
-- 
2.33.GIT

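The `repo_has_absorbed_submodules()` helper added above keeps calculate_changed_submodule_paths() from returning early when `submodule_from_path(r, NULL, NULL)` finds nothing, provided `$GIT_DIR/modules` exists and is non-empty. The following standalone sketch illustrates that "exists and is non-empty" test using POSIX `opendir()`/`readdir()`. `dir_has_entries()` is a hypothetical name, and this is not Git's `is_empty_dir()`.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if 'path' is a directory holding at
 * least one entry other than "." and "..", and 0 otherwise (including
 * when 'path' does not exist or cannot be opened).
 */
static int dir_has_entries(const char *path)
{
	DIR *dir;
	struct dirent *e;
	int found = 0;

	assert(path != NULL);
	dir = opendir(path);
	if (!dir)
		return 0; /* missing or unreadable: treat as "nothing absorbed" */

	while ((e = readdir(dir)) != NULL) {
		if (strcmp(e->d_name, ".") && strcmp(e->d_name, "..")) {
			found = 1; /* one real entry is enough */
			break;
		}
	}
	closedir(dir);
	return found;
}
```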

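With this patch, `get_next_submodule()` first takes tasks from the index (`get_fetch_task_from_index()`). It falls back to the changed-submodule list (`get_fetch_task_from_changed()`) only after the index is exhausted. `seen_submodule_names` stops a submodule name that was already scheduled from being fetched a second time. The sketch below models only that ordering and de-duplication, using plain arrays of names. The `picker` struct and `next_name()` are hypothetical. Recurse configuration, activeness checks and error reporting are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEEN 64

/* Hypothetical stand-in for struct submodule_parallel_fetch. */
struct picker {
	const char **index_names;   /* submodule names found in the index */
	size_t index_nr, index_count;
	const char **changed_names; /* names found in changed superproject commits */
	size_t changed_nr, changed_count;
	const char *seen[MAX_SEEN]; /* names already handed out */
	size_t seen_nr;
};

static int was_seen(const struct picker *p, const char *name)
{
	for (size_t i = 0; i < p->seen_nr; i++)
		if (!strcmp(p->seen[i], name))
			return 1;
	return 0;
}

/* Return the next name to fetch, or NULL once both sources are drained. */
static const char *next_name(struct picker *p)
{
	const char *name = NULL;

	/* Phase 1: populated submodules, in index order. */
	while (!name && p->index_count < p->index_nr) {
		const char *n = p->index_names[p->index_count++];
		if (!was_seen(p, n))
			name = n;
	}
	/* Phase 2: submodules changed in the fetched superproject commits. */
	while (!name && p->changed_count < p->changed_nr) {
		const char *n = p->changed_names[p->changed_count++];
		if (!was_seen(p, n))
			name = n;
	}
	if (name) {
		assert(p->seen_nr < MAX_SEEN);
		p->seen[p->seen_nr++] = name; /* cf. string_list_insert(&spf->seen_submodule_names, ...) */
	}
	return name;
}
```

Suppose "submodule" is in the index and the changed list holds both "submodule" and "submodule2". Successive calls then return "submodule", then "submodule2", then NULL.
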

* [PATCH v3 10/10] submodule: fix latent check_has_commit() bug
  2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                       ` (8 preceding siblings ...)
  2022-02-24 10:08     ` [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
@ 2022-02-24 10:08     ` Glen Choo
  2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
  10 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-24 10:08 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

When check_has_commit() is called on a missing submodule, initialization
of the struct repository fails, but it attempts to clear the struct
anyway (which is a fatal error). This bug is masked by its only caller,
submodule_has_commits(), first calling add_submodule_odb(). The latter
fails if the submodule does not exist, making submodule_has_commits()
exit early and not invoke check_has_commit().

Fix this bug, and because calling add_submodule_odb() is no longer
necessary as of 13a2f620b2 (submodule: pass repo to
check_has_commit(), 2021-10-08), remove that call too.

This is the last caller of add_submodule_odb(), so remove that
function. (Submodule ODBs are still added as alternates via
add_submodule_odb_by_path().)

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 35 ++---------------------------------
 submodule.h |  9 ++++-----
 2 files changed, 6 insertions(+), 38 deletions(-)

diff --git a/submodule.c b/submodule.c
index d60f877b1f..71495e67f5 100644
--- a/submodule.c
+++ b/submodule.c
@@ -167,26 +167,6 @@ void stage_updated_gitmodules(struct index_state *istate)
 
 static struct string_list added_submodule_odb_paths = STRING_LIST_INIT_NODUP;
 
-/* TODO: remove this function, use repo_submodule_init instead. */
-int add_submodule_odb(const char *path)
-{
-	struct strbuf objects_directory = STRBUF_INIT;
-	int ret = 0;
-
-	ret = strbuf_git_path_submodule(&objects_directory, path, "objects/");
-	if (ret)
-		goto done;
-	if (!is_directory(objects_directory.buf)) {
-		ret = -1;
-		goto done;
-	}
-	string_list_insert(&added_submodule_odb_paths,
-			   strbuf_detach(&objects_directory, NULL));
-done:
-	strbuf_release(&objects_directory);
-	return ret;
-}
-
 void add_submodule_odb_by_path(const char *path)
 {
 	string_list_insert(&added_submodule_odb_paths, xstrdup(path));
@@ -962,7 +942,8 @@ static int check_has_commit(const struct object_id *oid, void *data)
 
 	if (repo_submodule_init(&subrepo, cb->repo, cb->path, cb->super_oid)) {
 		cb->result = 0;
-		goto cleanup;
+		/* subrepo failed to init, so don't clean it up. */
+		return 0;
 	}
 
 	type = oid_object_info(&subrepo, oid, NULL);
@@ -998,18 +979,6 @@ static int submodule_has_commits(struct repository *r,
 		.super_oid = super_oid
 	};
 
-	/*
-	 * Perform a cheap, but incorrect check for the existence of 'commits'.
-	 * This is done by adding the submodule's object store to the in-core
-	 * object store, and then querying for each commit's existence.  If we
-	 * do not have the commit object anywhere, there is no chance we have
-	 * it in the object store of the correct submodule and have it
-	 * reachable from a ref, so we can fail early without spawning rev-list
-	 * which is expensive.
-	 */
-	if (add_submodule_odb(path))
-		return 0;
-
 	oid_array_for_each_unique(commits, check_has_commit, &has_commit);
 
 	if (has_commit.result) {
diff --git a/submodule.h b/submodule.h
index 61bebde319..40c1445237 100644
--- a/submodule.h
+++ b/submodule.h
@@ -103,12 +103,11 @@ int submodule_uses_gitfile(const char *path);
 int bad_to_remove_submodule(const char *path, unsigned flags);
 
 /*
- * Call add_submodule_odb() to add the submodule at the given path to a list.
- * When register_all_submodule_odb_as_alternates() is called, the object stores
- * of all submodules in that list will be added as alternates in
- * the_repository.
+ * Call add_submodule_odb_by_path() to add the submodule at the given
+ * path to a list. When register_all_submodule_odb_as_alternates() is
+ * called, the object stores of all submodules in that list will be
+ * added as alternates in the_repository.
  */
-int add_submodule_odb(const char *path);
 void add_submodule_odb_by_path(const char *path);
 int register_all_submodule_odb_as_alternates(void);
 
-- 
2.33.GIT

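This fix follows a general C rule. When an init function fails, the struct may be only partly initialized, so the error path must return without calling the matching clear function. Below is a minimal sketch of the corrected control flow. The hypothetical `fake_repo` type stands in for `struct repository`, and `fake_repo_init()`/`fake_repo_clear()` stand in for `repo_submodule_init()`/`repo_clear()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct repository. */
struct fake_repo {
	char *objects; /* owned buffer, valid only after a successful init */
};

/* Returns 0 on success, -1 if the "submodule" is missing. */
static int fake_repo_init(struct fake_repo *r, int exists)
{
	assert(r != NULL);
	if (!exists)
		return -1; /* r is left untouched, i.e. possibly garbage */
	r->objects = malloc(16);
	return r->objects ? 0 : -1;
}

static void fake_repo_clear(struct fake_repo *r)
{
	free(r->objects);
	r->objects = NULL;
}

/* Mirrors the fixed check_has_commit(): no cleanup after a failed init. */
static int has_commit(int submodule_exists, int *result)
{
	struct fake_repo sub; /* uninitialized, like the stack struct in Git */

	if (fake_repo_init(&sub, submodule_exists)) {
		*result = 0;
		return 0; /* init failed, so there is nothing to clear */
	}
	*result = 1; /* pretend the object lookup succeeded */
	fake_repo_clear(&sub);
	return 0;
}
```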


* Re: [PATCH v3 02/10] t5526: stop asserting on stderr literally
  2022-02-24 10:08     ` [PATCH v3 02/10] t5526: stop asserting on stderr literally Glen Choo
@ 2022-02-24 11:52       ` Ævar Arnfjörð Bjarmason
  2022-02-24 16:15         ` Glen Choo
  2022-02-24 23:05       ` Jonathan Tan
  1 sibling, 1 reply; 149+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-24 11:52 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Junio C Hamano


On Thu, Feb 24 2022, Glen Choo wrote:

> +check_sub() {
> +	NEW_HEAD=$1 &&
> +	cat <<-EOF >$pwd/expect.err.sub

Hrm, I didn't know that would work; the usual style is:

    cat >file <<...

Instead of:

    cat <<.. >file

Maybe better to use that?

> +	Fetching submodule submodule
> +	From $pwd/submodule
> +	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
> +	EOF
> +}
> +
> +check_deep() {
> +	NEW_HEAD=$1 &&
> +	cat <<-EOF >$pwd/expect.err.deep
> +	Fetching submodule submodule/subdir/deepsubmodule
> +	From $pwd/deepsubmodule
> +	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
> +	EOF
> +}
> +
> +check_super() {
> +	NEW_HEAD=$1 &&
> +	cat <<-EOF >$pwd/expect.err.super
> +	From $pwd/.
> +	   OLD_HEAD..$NEW_HEAD  super      -> origin/super
> +	EOF
> +}

These look a lot better, but instead of always passing the result of
"git rev-parse --short HEAD" can't we just always invoke that in these
helpers?

Maybe there are cases where $NEW_HEAD is different, I've just skimmed
this series.

> @@ -62,7 +82,8 @@ verify_fetch_result() {
>  	if [ -f expect.err.deep ]; then
>  		cat expect.err.deep >>expect.err.combined
>  	fi &&
> -	test_cmp expect.err.combined $ACTUAL_ERR
> +	sed -E 's/[0-9a-f]+\.\./OLD_HEAD\.\./' $ACTUAL_ERR >actual.err.cmp &&
> +	test_cmp expect.err.combined actual.err.cmp
>  }

I think this is unportable per check-non-portable-shell.pl:

        /\bsed\s+-[^efn]\s+/ and err 'sed option not portable (use only -n, -e, -f)';


* Re: [PATCH v3 02/10] t5526: stop asserting on stderr literally
  2022-02-24 11:52       ` Ævar Arnfjörð Bjarmason
@ 2022-02-24 16:15         ` Glen Choo
  2022-02-24 18:13           ` Eric Sunshine
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-24 16:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jonathan Tan, Junio C Hamano

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Feb 24 2022, Glen Choo wrote:
>
>> +check_sub() {
>> +	NEW_HEAD=$1 &&
>> +	cat <<-EOF >$pwd/expect.err.sub
>
> Hrm, I didn't know that would work; the usual style is:
>
>     cat >file <<...
>
> Instead of:
>
>     cat <<.. >file
>
> Maybe better to use that?

Thanks, I somehow mixed things up when I wrote that.

>> +	Fetching submodule submodule
>> +	From $pwd/submodule
>> +	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
>> +	EOF
>> +}
>> +
>> +check_deep() {
>> +	NEW_HEAD=$1 &&
>> +	cat <<-EOF >$pwd/expect.err.deep
>> +	Fetching submodule submodule/subdir/deepsubmodule
>> +	From $pwd/deepsubmodule
>> +	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
>> +	EOF
>> +}
>> +
>> +check_super() {
>> +	NEW_HEAD=$1 &&
>> +	cat <<-EOF >$pwd/expect.err.super
>> +	From $pwd/.
>> +	   OLD_HEAD..$NEW_HEAD  super      -> origin/super
>> +	EOF
>> +}
>
> These look a lot better, but instead of always passing the result of
> "git rev-parse --short HEAD" can't we just always invoke that in these
> helpers?
>
> Maybe there are cases where $NEW_HEAD is different, I've just skimmed
> this series.

I haven't found any other instances where $NEW_HEAD is different, so I
suppose we could move it into the helpers. I don't think it benefits
readability that much to do so, but if you think it's much better, I'll
incorporate it when I reroll this.

>> @@ -62,7 +82,8 @@ verify_fetch_result() {
>>  	if [ -f expect.err.deep ]; then
>>  		cat expect.err.deep >>expect.err.combined
>>  	fi &&
>> -	test_cmp expect.err.combined $ACTUAL_ERR
>> +	sed -E 's/[0-9a-f]+\.\./OLD_HEAD\.\./' $ACTUAL_ERR >actual.err.cmp &&
>> +	test_cmp expect.err.combined actual.err.cmp
>>  }
>
> I think this is unportable per check-non-portable-shell.pl:
>
>         /\bsed\s+-[^efn]\s+/ and err 'sed option not portable (use only -n, -e, -f)';

Ah thanks, my sed-fu is pretty poor, so I appreciate the tip :)

I used that because I wanted +, but I found what I needed from the sed
manpage (i.e. that + is equivalent to \{1,\}).


* Re: [PATCH v3 02/10] t5526: stop asserting on stderr literally
  2022-02-24 16:15         ` Glen Choo
@ 2022-02-24 18:13           ` Eric Sunshine
  0 siblings, 0 replies; 149+ messages in thread
From: Eric Sunshine @ 2022-02-24 18:13 UTC (permalink / raw)
  To: Glen Choo
  Cc: Ævar Arnfjörð Bjarmason, Git List, Jonathan Tan,
	Junio C Hamano

On Thu, Feb 24, 2022 at 11:46 AM Glen Choo <chooglen@google.com> wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> > On Thu, Feb 24 2022, Glen Choo wrote:
> >> +    sed -E 's/[0-9a-f]+\.\./OLD_HEAD\.\./' $ACTUAL_ERR >actual.err.cmp &&
> >
> > I think this is unportable per check-non-portable-shell.pl:
> >
> >         /\bsed\s+-[^efn]\s+/ and err 'sed option not portable (use only -n, -e, -f)';
>
> I used that because I wanted +, but I found what I needed from the sed
> manpage (i.e. that + is equivalent to \{1,\}).

This isn't necessarily going to be portable either for older sed
implementations. Most portable would be:

    [0-9a-f][0-9a-f]*

(Whether or not we need to worry about those older sed implementations
is a different question...)
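
The portable pattern can also be exercised outside sed through the POSIX `<regex.h>` API. Calling `regcomp()` without `REG_EXTENDED` compiles a basic regular expression (BRE). In a BRE, `+` has no portable meaning, but `[0-9a-f][0-9a-f]*` matches one or more hex digits. The sketch below rewrites the first `<hex>..` in a line to `OLD_HEAD..`, the same edit the test's sed substitution makes. The function name `normalize_old_head` is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy 'line' into 'out', replacing the first
 * "<hex digits>.." with "OLD_HEAD..", the same edit as
 *   sed -e 's/[0-9a-f][0-9a-f]*\.\./OLD_HEAD../'
 * Returns 0 on success, -1 on a regex error or if 'out' is too small.
 */
static int normalize_old_head(const char *line, char *out, size_t outsz)
{
	regex_t re;
	regmatch_t m;
	int rc, n;

	assert(line && out);
	/* No REG_EXTENDED: this is a POSIX basic RE, so "+" is avoided. */
	if (regcomp(&re, "[0-9a-f][0-9a-f]*\\.\\.", 0))
		return -1;
	rc = regexec(&re, line, 1, &m, 0);
	regfree(&re);

	if (rc == REG_NOMATCH)
		n = snprintf(out, outsz, "%s", line); /* nothing to rewrite */
	else if (rc == 0)
		n = snprintf(out, outsz, "%.*sOLD_HEAD..%s",
			     (int)m.rm_so, line, line + m.rm_eo);
	else
		return -1;
	return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```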


* Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-02-24 10:08     ` [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
@ 2022-02-24 21:30       ` Junio C Hamano
  2022-02-25  3:04         ` Glen Choo
  2022-02-25  0:33       ` Junio C Hamano
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-02-24 21:30 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

> +	char *path;
>  	/* The submodule commits that have changed in the rev walk. */
>  	struct oid_array new_commits;
>  };
> @@ -818,6 +828,7 @@ struct changed_submodule_data {
>  static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
>  {
>  	oid_array_clear(&cs_data->new_commits);
> +	free(cs_data->path);

OK.

>  }
>  
>  static void collect_changed_submodules_cb(struct diff_queue_struct *q,
> @@ -865,6 +876,8 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
>  		if (!item->util)
>  			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
>  		cs_data = item->util;
> +		cs_data->super_oid = commit_oid;
> +		cs_data->path = xstrdup(p->two->path);

Iffy.  If item->util were populated already, wouldn't cs_data
already have its .path member pointing at an allocated piece of
memory?  Can we safely free it before assigning a new value, or does
somebody else still have a copy of .path and we cannot free it?
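
One way to resolve this is to record `path` (and `super_oid`) only the first time a submodule is seen. That avoids the leak, and it also matches the comment that describes `super_oid` as the first superproject commit in the rev walk. The other option is to free the old string before reassigning it. Below is a minimal sketch of the first approach. It uses a hypothetical `cs_data_sketch` struct, with a plain string in place of `struct object_id`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the two fields added to struct changed_submodule_data. */
struct cs_data_sketch {
	const char *super_oid; /* stands in for const struct object_id * */
	char *path;            /* owned copy of the submodule's path */
};

static char *copy_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Called once per superproject commit that changes the submodule.
 * Only the first hit is recorded, so a later hit can neither leak nor
 * overwrite the path copied earlier.
 */
static void record_first_hit(struct cs_data_sketch *d,
			     const char *super_oid, const char *path)
{
	assert(d && super_oid && path);
	if (d->path)
		return;
	d->super_oid = super_oid;
	d->path = copy_string(path);
}

static void cs_data_sketch_clear(struct cs_data_sketch *d)
{
	free(d->path);
	d->path = NULL;
}
```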


* Re: [PATCH v3 02/10] t5526: stop asserting on stderr literally
  2022-02-24 10:08     ` [PATCH v3 02/10] t5526: stop asserting on stderr literally Glen Choo
  2022-02-24 11:52       ` Ævar Arnfjörð Bjarmason
@ 2022-02-24 23:05       ` Jonathan Tan
  2022-02-25  2:26         ` Glen Choo
  1 sibling, 1 reply; 149+ messages in thread
From: Jonathan Tan @ 2022-02-24 23:05 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git, Junio C Hamano, avarab

Glen Choo <chooglen@google.com> writes:
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index 0e93df1665..a3890e2f6c 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -13,6 +13,32 @@ export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
>  
>  pwd=$(pwd)
>  
> +check_sub() {
> +	NEW_HEAD=$1 &&
> +	cat <<-EOF >$pwd/expect.err.sub
> +	Fetching submodule submodule
> +	From $pwd/submodule
> +	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
> +	EOF
> +}
> +
> +check_deep() {
> +	NEW_HEAD=$1 &&
> +	cat <<-EOF >$pwd/expect.err.deep
> +	Fetching submodule submodule/subdir/deepsubmodule
> +	From $pwd/deepsubmodule
> +	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
> +	EOF
> +}
> +
> +check_super() {
> +	NEW_HEAD=$1 &&
> +	cat <<-EOF >$pwd/expect.err.super
> +	From $pwd/.
> +	   OLD_HEAD..$NEW_HEAD  super      -> origin/super
> +	EOF
> +}

These don't do any checking, but just write what's expected to a file.
Could these be called something like write_sub_expected etc.?

Other than that, the patches up to this look fine (besides the comments
left by others).
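
Whatever the helpers end up being called, the text they write must match the messages emitted by `get_fetch_task_from_index()` and `get_fetch_task_from_changed()`. That message is `Fetching submodule <prefix><path>`, with ` at commit <abbrev>` appended only when the submodule was read from a superproject commit. In the shell helpers from patch 9, `${SUPER_HEAD:+ at commit $SUPER_HEAD}` expands to nothing when `SUPER_HEAD` is empty or unset. The sketch below shows the same conditional formatting in C. `format_fetching_line` is a hypothetical name, and the `_()` translation wrapper is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the "Fetching submodule" progress line.
 * 'super_abbrev' is NULL (or "") for submodules found in the index, and
 * the abbreviated superproject commit for ones found via changed commits.
 * Returns 0 on success, -1 if 'buf' is too small.
 */
static int format_fetching_line(char *buf, size_t len, const char *prefix,
				const char *path, const char *super_abbrev)
{
	int n;

	assert(buf && prefix && path);
	if (super_abbrev && *super_abbrev)
		n = snprintf(buf, len, "Fetching submodule %s%s at commit %s",
			     prefix, path, super_abbrev);
	else
		n = snprintf(buf, len, "Fetching submodule %s%s", prefix, path);
	return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```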


* Re: [PATCH v3 03/10] t5526: create superproject commits with test helper
  2022-02-24 10:08     ` [PATCH v3 03/10] t5526: create superproject commits with test helper Glen Choo
@ 2022-02-24 23:14       ` Jonathan Tan
  2022-02-25  2:52         ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Jonathan Tan @ 2022-02-24 23:14 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git, Junio C Hamano, avarab

Glen Choo <chooglen@google.com> writes:
> +# For each superproject in the test setup, update its submodule, add the
> +# submodule and create a new commit with the submodule change.
> +#
> +# This requires add_submodule_commits() to be called first, otherwise
> +# the submodules will not have changed and cannot be "git add"-ed.
> +add_superproject_commits() {
> +(
> +	cd submodule &&
> +	(
> +		cd subdir/deepsubmodule &&
> +		git fetch &&
> +		git checkout -q FETCH_HEAD
> +	) &&
> +		git add subdir/deepsubmodule &&
> +		git commit -m "new deep submodule"
> +	) &&

The indentation looks off. Also, no need for "-q".

> @@ -378,19 +387,7 @@ test_expect_success "Recursion picks up all submodules when necessary" '
>  '
>  
>  test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no new commits are fetched in the superproject (and ignores config)" '
> -	add_upstream_commit &&
> -	(
> -		cd submodule &&
> -		(
> -			cd subdir/deepsubmodule &&
> -			git fetch &&
> -			git checkout -q FETCH_HEAD
> -		) &&
> -		git add subdir/deepsubmodule &&
> -		git commit -m "new deepsubmodule" &&
> -		new_head=$(git rev-parse --short HEAD) &&
> -		check_sub $new_head
> -	) &&
> +	add_submodule_commits &&
>  	(
>  		cd downstream &&
>  		git config fetch.recurseSubmodules true &&

Hmm...I'm surprised that this still passes even when code is deleted but
the replacement is not added. What's happening here, I guess, is that
we're checking that nothing has happened. The test probably should be
rewritten but that's outside the scope of this patch set. So for now,
just add the add_superproject_commits call.

> @@ -402,10 +399,7 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
>  '
>  
>  test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" '
> -	git add submodule &&
> -	git commit -m "new submodule" &&
> -	new_head=$(git rev-parse --short HEAD) &&
> -	check_super $new_head &&
> +	add_superproject_commits &&
>  	(
>  		cd downstream &&
>  		git config fetch.recurseSubmodules false &&

add_superproject_commits without add_submodule_commits?

The rest looks good and overall this looks like a good idea to simplify
the test.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v3 07/10] submodule: extract get_fetch_task()
  2022-02-24 10:08     ` [PATCH v3 07/10] submodule: extract get_fetch_task() Glen Choo
@ 2022-02-24 23:26       ` Jonathan Tan
  0 siblings, 0 replies; 149+ messages in thread
From: Jonathan Tan @ 2022-02-24 23:26 UTC (permalink / raw)
  To: Glen Choo
  Cc: Jonathan Tan, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:
> @@ -1389,6 +1389,7 @@ struct fetch_task {
>  	struct repository *repo;
>  	const struct submodule *sub;
>  	unsigned free_sub : 1; /* Do we need to free the submodule? */
> +	const char *default_argv;
>  
>  	struct oid_array *commits; /* Ensure these commits are fetched */
>  };

I preferred the other way of passing default_argv in parallel, because
it is only used for the interaction in between get_fetch_task() and
get_next_submodule(), but I don't feel too strongly about this. In any
case, up to and including this patch looks good.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v3 08/10] submodule: move logic into fetch_task_create()
  2022-02-24 10:08     ` [PATCH v3 08/10] submodule: move logic into fetch_task_create() Glen Choo
@ 2022-02-24 23:36       ` Jonathan Tan
  0 siblings, 0 replies; 149+ messages in thread
From: Jonathan Tan @ 2022-02-24 23:36 UTC (permalink / raw)
  To: Glen Choo
  Cc: Jonathan Tan, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:
> I think this patch could be squashed into the previous one, let me know
> if this is a good idea.

For what it's worth, as a reviewer, I appreciated this patch being its
own. It made it easier to review.

It's unfortunate that the created diff has fetch_task_create() deleted
and re-added, but showing it with

  --anchored=" memset(task, 0, sizeof(*task));"

does make it easier to see. (The space before memset is a tab.)

> + cleanup:
> +	fetch_task_release(task);
> +	free(task);
> +	return NULL;
> +}

No space between the left margin and "cleanup".

Otherwise, up to and including this patch looks good.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-02-24 10:08     ` [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
  2022-02-24 21:30       ` Junio C Hamano
@ 2022-02-25  0:33       ` Junio C Hamano
  2022-02-25  3:07         ` Glen Choo
  2022-02-25  0:39       ` Jonathan Tan
  2022-02-26 18:53       ` Junio C Hamano
  3 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-02-25  0:33 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:


> +static struct fetch_task *
> +get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
> +			    struct strbuf *err)
> +{
> +	for (; spf->changed_count < spf->changed_submodule_names.nr;
> +	     spf->changed_count++) {
> +		struct string_list_item item =
> +			spf->changed_submodule_names.items[spf->changed_count];
> +		struct changed_submodule_data *cs_data = item.util;
> +		struct fetch_task *task;
> +
> +		if (!is_tree_submodule_active(spf->r, cs_data->super_oid,cs_data->path))
> +			continue;

Where does this function come from?  I seem to be getting compilation errors.

> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index ee4dd5a4a9..639290d30d 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -15,8 +15,9 @@ pwd=$(pwd)
>  
>  check_sub() {

Style.  

	check_sub () {

>  	NEW_HEAD=$1 &&
> +	SUPER_HEAD=$2 &&
>  	cat <<-EOF >$pwd/expect.err.sub

Style.

	cat <<-EOF >"$pwd/expect.err.sub"

You may swap the order of redirection (having <<here-doc at the end
of the line might look more familiar to some people).  Try to do as
the majority of the surrounding code does.

Make sure you quote the redirection target filename if it involves
variable interpolation (see Documentation/CodingGuidelines, look for
"Redirection").

> +	cat <<-EOF > expect.err.combined &&

Style.

	cat <<-EOF >expect.err.combined &&

No SP between redirection operator and its target.

> +	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&

No ERE in sed.  "[0-9a-f][0-9a-f]*" instead of "[0-9a-f]+" should be
sufficient, I think.
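Below is a sketch of the normalization step written as a POSIX basic regular expression, as suggested above. `[0-9a-f][0-9a-f]*` matches one or more hex digits without needing `sed -E`. The helper name `normalize_fetch_output` is hypothetical; in the patch the `sed` call is inline.

```shell
# Hypothetical helper: replace the abbreviated old object name in each
# "<old>..<new>" ref-update line with the literal OLD_HEAD, so the
# expected output does not depend on what the ref pointed at before.
# "[0-9a-f][0-9a-f]*" is the portable BRE spelling of the ERE
# "[0-9a-f]+", so no "sed -E" is needed.
normalize_fetch_output () {
	sed 's/[0-9a-f][0-9a-f]*\.\./OLD_HEAD../' "$1"
}
```

In a test this would read `normalize_fetch_output actual.err >actual.err.cmp`. Only the first match on each line is replaced (there is no `g` flag). That is enough here, because each ref-update line contains a single `..` range.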

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v3 01/10] t5526: introduce test helper to assert on fetches
  2022-02-24 10:08     ` [PATCH v3 01/10] t5526: introduce test helper to assert on fetches Glen Choo
@ 2022-02-25  0:34       ` Junio C Hamano
  0 siblings, 0 replies; 149+ messages in thread
From: Junio C Hamano @ 2022-02-25  0:34 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

> +verify_fetch_result() {

Style.

	verify_fetch_result () {

> +	ACTUAL_ERR=$1 &&
> +	rm -f expect.err.combined &&
> +	if [ -f expect.err.super ]; then

Style.

	if test -f expect.err.super
	then

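Here is a sketch of how the start of `verify_fetch_result` might look with these style fixes applied. The patch excerpt only shows the `expect.err.super` check, so looping over `sub` and `deep` as well is an assumption, and the name `build_expected` is hypothetical.

```shell
# Hypothetical: concatenate whichever expected-output fragments exist
# into expect.err.combined. Uses "test" rather than "[ ]" and puts
# "then" and "do" on their own lines, per Documentation/CodingGuidelines.
build_expected () {
	rm -f expect.err.combined &&
	for part in super sub deep
	do
		if test -f "expect.err.$part"
		then
			cat "expect.err.$part" >>expect.err.combined || return 1
		fi
	done
}
```

The `|| return 1` inside the loop is the usual test-suite idiom. Without it, a failing `cat` in an early iteration would be masked by the exit status of the last iteration.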

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-02-24 10:08     ` [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
  2022-02-24 21:30       ` Junio C Hamano
  2022-02-25  0:33       ` Junio C Hamano
@ 2022-02-25  0:39       ` Jonathan Tan
  2022-02-25  3:46         ` Glen Choo
  2022-02-26 18:53       ` Junio C Hamano
  3 siblings, 1 reply; 149+ messages in thread
From: Jonathan Tan @ 2022-02-25  0:39 UTC (permalink / raw)
  To: Glen Choo
  Cc: Jonathan Tan, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:
> In the process of writing the new tests [1], I noticed some failures of
> the form:
> 
>   # rm the submodule's working tree directory.
>   $ git rm submodule
>   [...]
> 
>   # Do a fetch that requires running a child process from the submodule.
>   $ git fetch --recurse-submodules same-name-1 
>   [...]
> 
>   # Fatal error tells us that we cannot chdir to the deleted working
>   # tree.
>   fatal: cannot chdir to '../../../submodule': No such file or directory
> 
> This happens because submodules set/unset a value for core.worktree when
> they are checked out/"un-checked out" (see submodule_move_head() and
> connect_work_tree_and_git_dir()), but "git rm" doesn't know that
> core.worktree should be updated.
> 
> I've worked around this by passing "--work-tree=." to the child process
> [2], but this feels like a hack, especially because this bug should
> affect all child processes in a "git rm"-ed submodule (this probably
> includes the "git branch" processes in gc/branch-recurse-submodules, but
> I haven't confirmed it yet). 

Ah...that's a tricky bug. Thanks for finding it.

> Some more comprehensive solutions that
> could be future work are:
> 
> - Teach "git [add|rm]" to unset core.worktree (the reverse operation,
>   "git restore", should already do the correct thing). This won't detect
>   submodules removed with "rm -r" though.

This might work with the caveat you mentioned.

> - Teach submodule child processes to ignore stale core.worktree values.

This might work, coupled with Emily Shaffer's work on teaching
submodules to know that they're submodules [1] (so we know when a stale
core.worktree can be safely ignored).

[1] https://lore.kernel.org/git/20220203215914.683922-1-emilyshaffer@google.com/

> - Do more things in-core instead of using child processes (avoiding the
>   failing chdir() call).

This might not work if the invocation needs to check the worktree (for
example, as far as I know, we won't delete a branch if it's currently
checked out in a worktree).

> I'm not sure what future work we should pursue, or even if the
> "--work-tree=." workaround is even good, so I'd appreciate feedback
> here.

I can't think of better solutions than what you listed, unfortunately. I
also can't think of a better workaround, but at least it's narrowly
scoped: we know that we're running on a submodule and that the operation
is not affected by a worktree (for example, we're fetching, but we know
we're not fetching with a refspec that updates a currently checked out
branch). Let's see what others have to say.

> @@ -865,6 +876,8 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
>  		if (!item->util)
>  			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
>  		cs_data = item->util;
> +		cs_data->super_oid = commit_oid;
> +		cs_data->path = xstrdup(p->two->path);

Junio mentioned the possibility of cs_data->path already being non-NULL
[2].

[2] https://lore.kernel.org/git/xmqqy220j6kf.fsf@gitster.g/

> @@ -1253,14 +1266,33 @@ void check_for_new_submodule_commits(struct object_id *oid)
>  	oid_array_append(&ref_tips_after_fetch, oid);
>  }
>  
> +/*
> + * Returns 1 if there is at least one submodule gitdir in
> + * $GIT_DIR/modules and 0 otherwise. This follows
> + * submodule_name_to_gitdir(), which looks for submodules in
> + * $GIT_DIR/modules, not $GIT_COMMON_DIR.
> + *
> + * A submodule can be moved to $GIT_DIR/modules manually by running "git
> + * submodule absorbgitdirs", or it may be initialized there by "git
> + * submodule update".
> + */
> +static int repo_has_absorbed_submodules(struct repository *r)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	strbuf_repo_git_path(&buf, r, "modules/");
> +	return file_exists(buf.buf) && !is_empty_dir(buf.buf);
> +}

buf needs to be released?

> @@ -1333,7 +1365,8 @@ int submodule_touches_in_range(struct repository *r,
>  }
>  
>  struct submodule_parallel_fetch {
> -	int count;
> +	int index_count;
> +	int changed_count;
>  	struct strvec args;
>  	struct repository *r;
>  	const char *prefix;

If we're sticking with these names, probably worth a comment. E.g.
"index_count" is the number of submodules in <name of field that this is
an index of> that we have processed, and likewise for "changed_count".

> @@ -1343,6 +1376,7 @@ struct submodule_parallel_fetch {
>  	int result;
>  
>  	struct string_list changed_submodule_names;
> +	struct string_list seen_submodule_names;
>  
>  	/* Pending fetches by OIDs */
>  	struct fetch_task **oid_fetch_tasks;

Also here - changed is the list that we generated from walking the
fetched superproject commits, and seen is the list of submodules we've
processed in <name of function>.

> @@ -1539,11 +1582,64 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)

[snip]

> +		/*
> +		 * NEEDSWORK: A submodule unpopulated by "git rm" will
> +		 * have core.worktree set, but the actual core.worktree
> +		 * directory won't exist, causing the child process to
> +		 * fail. Forcibly set --work-tree until we get smarter
> +		 * handling for core.worktree in unpopulated submodules.
> +		 */
> +		strvec_push(&task->git_args, "--work-tree=.");
> +		return task;
> +	}
> +	return NULL;
> +}

If we end up sticking to this workaround (which sounds reasonable to
me), the comment here probably should contain a lot of what was written
under the "---" in the commit message.

> +test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '

[snip]

> +# In downstream, init "submodule2", but do not check it out while
> +# fetching. This lets us assert that unpopulated submodules can be
> +# fetched.
> +test_expect_success 'setup downstream branch with other submodule' '
> +	mkdir submodule2 &&
> +	(
> +		cd submodule2 &&
> +		git init &&
> +		echo sub2content >sub2file &&
> +		git add sub2file &&
> +		git commit -a -m new &&
> +		git branch -M sub2
> +	) &&
> +	git checkout -b super-sub2-only &&
> +	git submodule add "$pwd/submodule2" submodule2 &&
> +	git commit -m "add sub2" &&
> +	git checkout super &&
> +	(
> +		cd downstream &&
> +		git fetch --recurse-submodules origin &&
> +		git checkout super-sub2-only &&
> +		# Explicitly run "git submodule update" because sub2 is new
> +		# and has not been cloned.
> +		git submodule update --init &&
> +		git checkout --recurse-submodules super
> +	)
> +'

Hmm...what is the difference between this and the original case in which
the index has no submodules? Both assert that unpopulated submodules
(submodules that cannot be found by iterating the index, as described in
your commit message) can be fetched.

> +	# Use test_cmp manually because verify_fetch_result does not
> +	# consider submodule2. All the repos should be fetched, but only
> +	# submodule2 should be read from a commit
> +	cat <<-EOF > expect.err.combined &&
> +	From $pwd/.
> +	   OLD_HEAD..$super_head  super           -> origin/super
> +	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
> +	Fetching submodule submodule
> +	From $pwd/submodule
> +	   OLD_HEAD..$sub_head  sub        -> origin/sub
> +	Fetching submodule submodule/subdir/deepsubmodule
> +	From $pwd/deepsubmodule
> +	   OLD_HEAD..$deep_head  deep       -> origin/deep
> +	Fetching submodule submodule2 at commit $super_sub2_only_head
> +	From $pwd/submodule2
> +	   OLD_HEAD..$sub2_head  sub2       -> origin/sub2
> +	EOF
> +	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
> +	test_cmp expect.err.combined actual.err.cmp
> +'

Could verify_fetch_result be modified to consider the new submodule
instead?

> +test_expect_success 'fetch --recurse-submodules updates name-conflicted, populated submodule' '
> +	test_when_finished "git -C same-name-downstream checkout master" &&
> +	(
> +		cd same-name-1 &&
> +		test_commit -C submodule --no-tag b1 &&
> +		git add submodule &&
> +		git commit -m "super-b1"
> +	) &&
> +	(
> +		cd same-name-2 &&
> +		test_commit -C submodule --no-tag b2 &&
> +		git add submodule &&
> +		git commit -m "super-b2"
> +	) &&
> +	(
> +		cd same-name-downstream &&
> +		# even though the .gitmodules is correct, we cannot
> +		# fetch from same-name-2
> +		git checkout same-name-2/master &&
> +		git fetch --recurse-submodules same-name-1 &&
> +		test_must_fail git fetch --recurse-submodules same-name-2

What's the error message printed to the user here? (Just from reading
the code, I would have expected this to succeed, with the submodule
fetch being from same-name-1's submodule since we're fetching submodules
by name, but apparently that is not the case.)

> +	(
> +		cd same-name-downstream/.git/modules/submodule &&
> +		# The submodule has core.worktree pointing to the "git
> +		# rm"-ed directory, overwrite the invalid value.
> +		git --work-tree=. cat-file -e $head1 &&
> +		test_must_fail git --work-tree=. cat-file -e $head2
> +	)

Regarding the worktree workaround, also say "see comment in
get_fetch_task() for more information" or something like that.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v3 02/10] t5526: stop asserting on stderr literally
  2022-02-24 23:05       ` Jonathan Tan
@ 2022-02-25  2:26         ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-25  2:26 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jonathan Tan, git, Junio C Hamano, avarab

Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
>> index 0e93df1665..a3890e2f6c 100755
>> --- a/t/t5526-fetch-submodules.sh
>> +++ b/t/t5526-fetch-submodules.sh
>> @@ -13,6 +13,32 @@ export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
>>  
>>  pwd=$(pwd)
>>  
>> +check_sub() {
>> +	NEW_HEAD=$1 &&
>> +	cat <<-EOF >$pwd/expect.err.sub
>> +	Fetching submodule submodule
>> +	From $pwd/submodule
>> +	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
>> +	EOF
>> +}
>> +
>> +check_deep() {
>> +	NEW_HEAD=$1 &&
>> +	cat <<-EOF >$pwd/expect.err.deep
>> +	Fetching submodule submodule/subdir/deepsubmodule
>> +	From $pwd/deepsubmodule
>> +	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
>> +	EOF
>> +}
>> +
>> +check_super() {
>> +	NEW_HEAD=$1 &&
>> +	cat <<-EOF >$pwd/expect.err.super
>> +	From $pwd/.
>> +	   OLD_HEAD..$NEW_HEAD  super      -> origin/super
>> +	EOF
>> +}
>
> These don't do any checking, but just write what's expected to a file.
> Could these be called something like write_sub_expected etc.?
>

Thanks for the suggestion! I was struggling with names.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v3 03/10] t5526: create superproject commits with test helper
  2022-02-24 23:14       ` Jonathan Tan
@ 2022-02-25  2:52         ` Glen Choo
  2022-02-25 11:42           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-02-25  2:52 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jonathan Tan, git, Junio C Hamano, avarab

Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> +# For each superproject in the test setup, update its submodule, add the
>> +# submodule and create a new commit with the submodule change.
>> +#
>> +# This requires add_submodule_commits() to be called first, otherwise
>> +# the submodules will not have changed and cannot be "git add"-ed.
>> +add_superproject_commits() {
>> +(
>> +	cd submodule &&
>> +	(
>> +		cd subdir/deepsubmodule &&
>> +		git fetch &&
>> +		git checkout -q FETCH_HEAD
>> +	) &&
>> +		git add subdir/deepsubmodule &&
>> +		git commit -m "new deep submodule"
>> +	) &&
>
> The indentation looks off. Also, no need for "-q".

Ah thanks. I think the "-q" is there to suppress the detached HEAD
warning, which is very large.

I'd prefer to keep it unless there are stronger reasons than "it's not
needed for correctness". 

>> @@ -378,19 +387,7 @@ test_expect_success "Recursion picks up all submodules when necessary" '
>>  '
>>  
>>  test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no new commits are fetched in the superproject (and ignores config)" '
>> -	add_upstream_commit &&
>> -	(
>> -		cd submodule &&
>> -		(
>> -			cd subdir/deepsubmodule &&
>> -			git fetch &&
>> -			git checkout -q FETCH_HEAD
>> -		) &&
>> -		git add subdir/deepsubmodule &&
>> -		git commit -m "new deepsubmodule" &&
>> -		new_head=$(git rev-parse --short HEAD) &&
>> -		check_sub $new_head
>> -	) &&
>> +	add_submodule_commits &&
>>  	(
>>  		cd downstream &&
>>  		git config fetch.recurseSubmodules true &&
>
> Hmm...I'm surprised that this still passes even when code is deleted but
> the replacement is not added. What's happening here, I guess, is that
> we're checking that nothing has happened. The test probably should be
> rewritten but that's outside the scope of this patch set. So for now,
> just add the add_superproject_commits call.

Yeah this test could use some fixing up; I spent a lot of time trying to
understand this one. It could use comments at least.

The suggestion to add the add_superproject_commits call defeats the
purpose of the test though - which is to assert that "on-demand"
recursion only fetches submodule commits if a superproject commit says
the submodule has changed, unlike "yes", which unconditionally fetches
submodule commits.

So we need to consider these cases:

1. no new upstream commits
2. new upstream submodule commits, but not superproject (call
   add_submodule_commits() only)
3. new upstream submodule and superproject commits (call
   add_submodule_commits() and add_superproject_commits())

(1): "on-demand" and "yes" both fetch nothing
(2): "yes" fetches submodule commits but "on-demand" doesn't
(3): "on-demand" and "yes" both fetch submodule and superproject commits

So this test can't call add_superproject_commits(), because we would no
longer be testing scenario (2) - we'd be 'testing' (3) instead (which
doesn't tell us how "on-demand" is different from "yes").

>> @@ -402,10 +399,7 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
>>  '
>>  
>>  test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" '
>> -	git add submodule &&
>> -	git commit -m "new submodule" &&
>> -	new_head=$(git rev-parse --short HEAD) &&
>> -	check_super $new_head &&
>> +	add_superproject_commits &&
>>  	(
>>  		cd downstream &&
>>  		git config fetch.recurseSubmodules false &&
>
> add_superproject_commits without add_submodule_commits?

This is a silly holdover from before my rewrite. These lines:

   -	git add submodule &&
   -	git commit -m "new submodule" &&
   -	new_head=$(git rev-parse --short HEAD) &&

don't make any sense either until you realize that these commits were
set up in the _previous_ test. I should clean this up though, there's no
reason for others to have to struggle with this the way I did.

The easiest approach would be to add the add_submodule_commits() call,
with a comment explaining that it's technically unnecessary work
(because the previous test already calls add_submodule_commits()) but it
makes the test easier to read.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-02-24 21:30       ` Junio C Hamano
@ 2022-02-25  3:04         ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-25  3:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> +	char *path;
>>  	/* The submodule commits that have changed in the rev walk. */
>>  	struct oid_array new_commits;
>>  };
>> @@ -818,6 +828,7 @@ struct changed_submodule_data {
>>  static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
>>  {
>>  	oid_array_clear(&cs_data->new_commits);
>> +	free(cs_data->path);
>
> OK.
>
>>  }
>>  
>>  static void collect_changed_submodules_cb(struct diff_queue_struct *q,
>> @@ -865,6 +876,8 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
>>  		if (!item->util)
>>  			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
>>  		cs_data = item->util;
>> +		cs_data->super_oid = commit_oid;
>> +		cs_data->path = xstrdup(p->two->path);
>
> Iffy.  If item->util were populated already, wouldn't cs_data
> already have its .path member pointing at an allocated piece of
> memory?  Can we safely free it before assigning a new value, or does
> somebody else still have a copy of .path and we cannot free it?

Great catch! This is a silly mistake, it looks like this because I
copied the pattern that we used to _append_ new commit oids, but
.super_oid and .path aren't appended, they're replaced.

But we don't even need to replace .super_oid and .path, we can use the
first values we encounter and ignore subsequent ones.

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-02-25  0:33       ` Junio C Hamano
@ 2022-02-25  3:07         ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-25  3:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>
>> +static struct fetch_task *
>> +get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
>> +			    struct strbuf *err)
>> +{
>> +	for (; spf->changed_count < spf->changed_submodule_names.nr;
>> +	     spf->changed_count++) {
>> +		struct string_list_item item =
>> +			spf->changed_submodule_names.items[spf->changed_count];
>> +		struct changed_submodule_data *cs_data = item.util;
>> +		struct fetch_task *task;
>> +
>> +		if (!is_tree_submodule_active(spf->r, cs_data->super_oid,cs_data->path))
>> +			continue;
>
> Where does this function come from?  I seem to be getting compilation errors.

Sorry, this was introduced in gc/branch-recurse-submodules, but I
neglected to mention that I used that as the base in v1.

>> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
>> index ee4dd5a4a9..639290d30d 100755
>> --- a/t/t5526-fetch-submodules.sh
>> +++ b/t/t5526-fetch-submodules.sh
>> @@ -15,8 +15,9 @@ pwd=$(pwd)
>>  
>>  check_sub() {
>
> Style.  
>
> 	check_sub () {
>
>>  	NEW_HEAD=$1 &&
>> +	SUPER_HEAD=$2 &&
>>  	cat <<-EOF >$pwd/expect.err.sub
>
> Style.
>
> 	cat <<-EOF >"$pwd/expect.err.sub"
>
> You may swap the order of redirection (having <<here-doc at the end
> of the line might look more familiar to some people).  Try to do as
> majority of surrounding code does.
>
> Make sure you quote the redirection target filename if it involves
> variable interpolation (see Documentation/CodingGuidelines, look for
> "Redirection").
>
>> +	cat <<-EOF > expect.err.combined &&
>
> Style.
>
> 	cat <<-EOF >expect.err.combined &&
>
> No SP between redirection operator and its target.
>
>> +	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
>
> No ERE in sed.  "[0-9a-f][0-9a-f]*" instead of "[0-9a-f]+" should be
> sufficient, I think.

Thanks :)

^ permalink raw reply	[flat|nested] 149+ messages in thread

* Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-02-25  0:39       ` Jonathan Tan
@ 2022-02-25  3:46         ` Glen Choo
  2022-03-04 23:46           ` Jonathan Tan
  2022-03-04 23:53           ` Jonathan Tan
  0 siblings, 2 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-25  3:46 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Jonathan Tan, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> I'm not sure what future work we should pursue, or even if the
>> "--work-tree=." workaround is even good, so I'd appreciate feedback
>> here.
>
> I can't think of better solutions than what you listed, unfortunately. I
> also can't think of a better workaround, but at least it's narrowly
> scoped: we know that we're running on a submodule and that the operation
> is not affected by a worktree (for example, we're fetching, but we know
> we're not fetching with a refspec that updates a currently checked out
> branch). Let's see what others have to say.

Thanks for sharing your thoughts :)

>> @@ -1253,14 +1266,33 @@ void check_for_new_submodule_commits(struct object_id *oid)
>>  	oid_array_append(&ref_tips_after_fetch, oid);
>>  }
>>  
>> +/*
>> + * Returns 1 if there is at least one submodule gitdir in
>> + * $GIT_DIR/modules and 0 otherwise. This follows
>> + * submodule_name_to_gitdir(), which looks for submodules in
>> + * $GIT_DIR/modules, not $GIT_COMMON_DIR.
>> + *
>> + * A submodule can be moved to $GIT_DIR/modules manually by running "git
>> + * submodule absorbgitdirs", or it may be initialized there by "git
>> + * submodule update".
>> + */
>> +static int repo_has_absorbed_submodules(struct repository *r)
>> +{
>> +	struct strbuf buf = STRBUF_INIT;
>> +
>> +	strbuf_repo_git_path(&buf, r, "modules/");
>> +	return file_exists(buf.buf) && !is_empty_dir(buf.buf);
>> +}
>
> buf needs to be released?

Ah, thanks.

>> @@ -1333,7 +1365,8 @@ int submodule_touches_in_range(struct repository *r,
>>  }
>>  
>>  struct submodule_parallel_fetch {
>> -	int count;
>> +	int index_count;
>> +	int changed_count;
>>  	struct strvec args;
>>  	struct repository *r;
>>  	const char *prefix;
>
> If we're sticking with these names, probably worth a comment. E.g.
> "index_count" is the number of submodules in <name of field that this is
> an index of> that we have processed, and likewise for "changed_count".
>
>> @@ -1343,6 +1376,7 @@ struct submodule_parallel_fetch {
>>  	int result;
>>  
>>  	struct string_list changed_submodule_names;
>> +	struct string_list seen_submodule_names;
>>  
>>  	/* Pending fetches by OIDs */
>>  	struct fetch_task **oid_fetch_tasks;
>
> Also here - changed is the list that we generated from walking the
> fetched superproject commits, and seen is the list of submodules we've
> processed in <name of function>.

Makes sense.

>> @@ -1539,11 +1582,64 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
>
> [snip]
>
>> +		/*
>> +		 * NEEDSWORK: A submodule unpopulated by "git rm" will
>> +		 * have core.worktree set, but the actual core.worktree
>> +		 * directory won't exist, causing the child process to
>> +		 * fail. Forcibly set --work-tree until we get smarter
>> +		 * handling for core.worktree in unpopulated submodules.
>> +		 */
>> +		strvec_push(&task->git_args, "--work-tree=.");
>> +		return task;
>> +	}
>> +	return NULL;
>> +}
>
> If we end up sticking to this workaround (which sounds reasonable to
> me), the comment here probably should contain a lot of what was written
> under the "---" in the commit message.

I assume this includes documenting solutions (like your NEEDSWORK
comment on submodule_name_to_gitdir()) and why core.worktree isn't
usually a problem (because checkout et al. do the right thing).

>> +test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '
>
> [snip]
>
>> +# In downstream, init "submodule2", but do not check it out while
>> +# fetching. This lets us assert that unpopulated submodules can be
>> +# fetched.
>> +test_expect_success 'setup downstream branch with other submodule' '
>> +	mkdir submodule2 &&
>> +	(
>> +		cd submodule2 &&
>> +		git init &&
>> +		echo sub2content >sub2file &&
>> +		git add sub2file &&
>> +		git commit -a -m new &&
>> +		git branch -M sub2
>> +	) &&
>> +	git checkout -b super-sub2-only &&
>> +	git submodule add "$pwd/submodule2" submodule2 &&
>> +	git commit -m "add sub2" &&
>> +	git checkout super &&
>> +	(
>> +		cd downstream &&
>> +		git fetch --recurse-submodules origin &&
>> +		git checkout super-sub2-only &&
>> +		# Explicitly run "git submodule update" because sub2 is new
>> +		# and has not been cloned.
>> +		git submodule update --init &&
>> +		git checkout --recurse-submodules super
>> +	)
>> +'
>
> Hmm...what is the difference between this and the original case in which
> the index has no submodules? Both assert that unpopulated submodules
> (submodules that cannot be found by iterating the index, as described in
> your commit message) can be fetched.

In the previous test, the index has no submodules (it's completely empty
in fact, so we don't iterate the index at all), but in this test, it
does. This lets us check that there aren't any buggy interactions when
both changed and index submodules are present.

I think such mistakes are pretty easy to introduce by accident - I made
one pre-v1 where I reused .count between both iterators (instead
of having .index_count and .changed_count). It passed the previous test
because we didn't care about the index, but it obviously wouldn't pass
this one.
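A minimal, self-contained model of the two-cursor iteration may make this failure mode easier to see. It is not the submodule.c code. The struct fetch_iter type, next_submodule(), and the plain string arrays are hypothetical stand-ins for the index entries, changed_submodule_names, and seen_submodule_names. Each source advances only its own cursor, mirroring the .index_count/.changed_count split, and names that were already handed out are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the two sources of submodules to fetch. */
struct fetch_iter {
	const char **index_names;   /* stand-in for submodules in the index */
	size_t index_nr;
	const char **changed_names; /* stand-in for changed_submodule_names */
	size_t changed_nr;
	size_t index_count;         /* cursor into index_names only */
	size_t changed_count;       /* cursor into changed_names only */
	const char *seen[16];       /* stand-in for seen_submodule_names */
	size_t seen_nr;
};

static int already_seen(const struct fetch_iter *it, const char *name)
{
	for (size_t i = 0; i < it->seen_nr; i++)
		if (!strcmp(it->seen[i], name))
			return 1;
	return 0;
}

static const char *mark_seen(struct fetch_iter *it, const char *name)
{
	assert(it->seen_nr < sizeof(it->seen) / sizeof(it->seen[0]));
	it->seen[it->seen_nr++] = name;
	return name;
}

/*
 * Return the next submodule name to fetch, or NULL when both sources
 * are exhausted. Index entries are handed out first, then changed
 * submodules that were not already handed out.
 */
static const char *next_submodule(struct fetch_iter *it)
{
	while (it->index_count < it->index_nr) {
		const char *name = it->index_names[it->index_count++];
		if (!already_seen(it, name))
			return mark_seen(it, name);
	}
	while (it->changed_count < it->changed_nr) {
		const char *name = it->changed_names[it->changed_count++];
		if (!already_seen(it, name))
			return mark_seen(it, name);
	}
	return NULL;
}
```

For example, with index entries {"sub-a", "sub-b"} and changed entries {"sub-b", "sub-c"}, this model yields sub-a, sub-b, sub-c. Had both loops shared one counter, the changed loop would start at position 2 and never return "sub-c". An empty index leaves a shared counter at 0, which is why the previous test could not catch the mistake.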

>> +	# Use test_cmp manually because verify_fetch_result does not
>> +	# consider submodule2. All the repos should be fetched, but only
>> +	# submodule2 should be read from a commit
>> +	cat <<-EOF > expect.err.combined &&
>> +	From $pwd/.
>> +	   OLD_HEAD..$super_head  super           -> origin/super
>> +	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
>> +	Fetching submodule submodule
>> +	From $pwd/submodule
>> +	   OLD_HEAD..$sub_head  sub        -> origin/sub
>> +	Fetching submodule submodule/subdir/deepsubmodule
>> +	From $pwd/deepsubmodule
>> +	   OLD_HEAD..$deep_head  deep       -> origin/deep
>> +	Fetching submodule submodule2 at commit $super_sub2_only_head
>> +	From $pwd/submodule2
>> +	   OLD_HEAD..$sub2_head  sub2       -> origin/sub2
>> +	EOF
>> +	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
>> +	test_cmp expect.err.combined actual.err.cmp
>> +'
>
> Could verify_fetch_result be modified to consider the new submodule
> instead?

Since submodule2 is at the end of the output, I could modify
verify_fetch_result() to concatenate extra text on the end. But if it
were in the middle, we'd need to insert arbitrary text in the middle
of the file.

I can't think of a good way to do this without compromising test
readability, so I'll just do concatenation for now.

>> +test_expect_success 'fetch --recurse-submodules updates name-conflicted, populated submodule' '
>> +	test_when_finished "git -C same-name-downstream checkout master" &&
>> +	(
>> +		cd same-name-1 &&
>> +		test_commit -C submodule --no-tag b1 &&
>> +		git add submodule &&
>> +		git commit -m "super-b1"
>> +	) &&
>> +	(
>> +		cd same-name-2 &&
>> +		test_commit -C submodule --no-tag b2 &&
>> +		git add submodule &&
>> +		git commit -m "super-b2"
>> +	) &&
>> +	(
>> +		cd same-name-downstream &&
>> +		# even though the .gitmodules is correct, we cannot
>> +		# fetch from same-name-2
>> +		git checkout same-name-2/master &&
>> +		git fetch --recurse-submodules same-name-1 &&
>> +		test_must_fail git fetch --recurse-submodules same-name-2
>
> What's the error message printed to the user here? (Just from reading
> the code, I would have expected this to succeed, with the submodule
> fetch being from same-name-1's submodule since we're fetching submodules
> by name, but apparently that is not the case.)

Yeah, I think this might trip up some readers. The message is:

  From ../same-name-2
    b7ebb59..944b5ac  master     -> same-name-2/master
  Fetching submodule submodule
  fatal: git upload-pack: not our ref 7ff6874077503acb9d0a52e280aaed9748276319
  fatal: remote error: upload-pack: not our ref 7ff6874077503acb9d0a52e280aaed9748276319
  Errors during submodule fetch:
          submodule

Which, I believe, comes from how we fetch commits by oid:

  static int get_next_submodule(struct child_process *cp, struct strbuf *err,
              void *data, void **task_cb)
  [...]
    oid_array_for_each_unique(task->commits,
          append_oid_to_argv, &cp->args);

When the following is true:

- the submodule is found in the index
- we are fetching submodules unconditionally ("--recurse-submodules=yes")
- no superproject commit "changes" the submodule

task->commits is empty, and we just fetch from the submodule's
remote by name. But as long as any superproject commit "changes" the
submodule, we try to fetch by oid, which, as this test demonstrates, may
fail.
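The two cases can be sketched as a simplified model. This is not the actual get_next_submodule(). struct argv_model, build_fetch_args(), and the "origin" remote name are assumptions for illustration, and the argument layout does not match the real child command line, which is built as a struct strvec using oid_array_for_each_unique() and append_oid_to_argv(). With no recorded commits, only the remote is named, so refs are fetched by name. Otherwise every unique commit id is appended, and the remote must be able to serve each one. If it cannot, the fetch fails with "not our ref" as shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 16

/* Hypothetical stand-in for the child's struct strvec of arguments. */
struct argv_model {
	const char *v[MAX_ARGS];
	size_t nr;
};

static void push_arg(struct argv_model *a, const char *arg)
{
	assert(a->nr < MAX_ARGS);
	a->v[a->nr++] = arg;
}

/*
 * Build a simplified submodule fetch command line. With no commits,
 * only the remote is named, so its refs are fetched by name. Otherwise
 * each unique commit id is appended and must be served by the remote.
 */
static void build_fetch_args(struct argv_model *a, const char *remote,
			     const char **oids, size_t oid_nr)
{
	push_arg(a, "fetch");
	push_arg(a, remote);
	for (size_t i = 0; i < oid_nr; i++) {
		int dup = 0;
		for (size_t j = 0; j < i && !dup; j++)
			dup = !strcmp(oids[i], oids[j]);
		if (!dup)
			push_arg(a, oids[i]);
	}
}
```

This model keeps first-seen order when dropping duplicates, whereas oid_array_for_each_unique() may visit ids in a different order. That does not change the point: any id the remote lacks makes the whole child fetch fail.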

>
>> +	(
>> +		cd same-name-downstream/.git/modules/submodule &&
>> +		# The submodule has core.worktree pointing to the "git
>> +		# rm"-ed directory, overwrite the invalid value.
>> +		git --work-tree=. cat-file -e $head1 &&
>> +		test_must_fail git --work-tree=. cat-file -e $head2
>> +	)
>
> Regarding the worktree workaround, also say "see comment in
> get_fetch_task() for more information" or something like that.

Makes sense.

* Re: [PATCH v3 03/10] t5526: create superproject commits with test helper
  2022-02-25  2:52         ` Glen Choo
@ 2022-02-25 11:42           ` Ævar Arnfjörð Bjarmason
  2022-02-28 18:11             ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-25 11:42 UTC (permalink / raw)
  To: Glen Choo; +Cc: Jonathan Tan, git, Junio C Hamano


On Fri, Feb 25 2022, Glen Choo wrote:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
>> Glen Choo <chooglen@google.com> writes:
>>> +# For each superproject in the test setup, update its submodule, add the
>>> +# submodule and create a new commit with the submodule change.
>>> +#
>>> +# This requires add_submodule_commits() to be called first, otherwise
>>> +# the submodules will not have changed and cannot be "git add"-ed.
>>> +add_superproject_commits() {
>>> +(
>>> +	cd submodule &&
>>> +	(
>>> +		cd subdir/deepsubmodule &&
>>> +		git fetch &&
>>> +		git checkout -q FETCH_HEAD
>>> +	) &&
>>> +		git add subdir/deepsubmodule &&
>>> +		git commit -m "new deep submodule"
>>> +	) &&
>>
>> The indentation looks off. Also, no need for "-q".
>
> Ah thanks. I think the "-q" is there to suppress the detached HEAD
> warning, which is very large.
>
> I'd prefer to keep it unless there are stronger reasons than "it's not
> needed for correctness". 

FWIW I was going to comment on the -q, but didn't because you're just
moving this around.

I think even for large warnings it's fine to omit -q etc, since that's
what --verbose (as in the test-lib.sh argument) is for.

But in this case it's probably better to leave it as-is.

* Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-02-24 10:08     ` [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
                         ` (2 preceding siblings ...)
  2022-02-25  0:39       ` Jonathan Tan
@ 2022-02-26 18:53       ` Junio C Hamano
  2022-03-01 20:24         ` Johannes Schindelin
  2022-03-01 20:32         ` Junio C Hamano
  3 siblings, 2 replies; 149+ messages in thread
From: Junio C Hamano @ 2022-02-26 18:53 UTC (permalink / raw)
  To: Glen Choo
  Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin

A few tests added by this patch have been failing on one specific
job (linux-gcc ubuntu-latest) at GitHub CI.

https://github.com/git/git/runs/5341052811?check_suite_focus=true#step:5:3968
https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520

    Side note: you may need to be logged in to GitHub to view them.
    These two use different versions of CI to show the test traces;
    in the latter you may have to click on right-facing rectangle on
    the line with label "5520" to see the breakage.

I think there is some baked-in assumption in the failing test what
the name of the initial branch by default is, which may be the reason
why this particular job fails while others don't.

Can you take a look at it?

Thanks.

* Re: [PATCH v3 03/10] t5526: create superproject commits with test helper
  2022-02-25 11:42           ` Ævar Arnfjörð Bjarmason
@ 2022-02-28 18:11             ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-02-28 18:11 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Jonathan Tan, git, Junio C Hamano

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Fri, Feb 25 2022, Glen Choo wrote:
>
>> Jonathan Tan <jonathantanmy@google.com> writes:
>>
>>> Glen Choo <chooglen@google.com> writes:
>>>> +# For each superproject in the test setup, update its submodule, add the
>>>> +# submodule and create a new commit with the submodule change.
>>>> +#
>>>> +# This requires add_submodule_commits() to be called first, otherwise
>>>> +# the submodules will not have changed and cannot be "git add"-ed.
>>>> +add_superproject_commits() {
>>>> +(
>>>> +	cd submodule &&
>>>> +	(
>>>> +		cd subdir/deepsubmodule &&
>>>> +		git fetch &&
>>>> +		git checkout -q FETCH_HEAD
>>>> +	) &&
>>>> +		git add subdir/deepsubmodule &&
>>>> +		git commit -m "new deep submodule"
>>>> +	) &&
>>>
>>> The indentation looks off. Also, no need for "-q".
>>
>> Ah thanks. I think the "-q" is there to suppress the detached HEAD
>> warning, which is very large.
>>
>> I'd prefer to keep it unless there are stronger reasons than "it's not
>> needed for correctness". 
>
> FWIW I was going to comment on the -q, but didn't because you're just
> moving this around.
>
> I think even for large warnings it's fine to omit -q etc, since that's
> what --verbose (as in the test-lib.sh argument) is for.

Ah interesting, I didn't consider that. Thanks!

> But in this case it's probably better to leave it as-is.

I'm also leaning towards this because I'm just moving things around, but
I could be convinced otherwise.

* Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-02-26 18:53       ` Junio C Hamano
@ 2022-03-01 20:24         ` Johannes Schindelin
  2022-03-01 20:33           ` Junio C Hamano
  2022-03-01 20:32         ` Junio C Hamano
  1 sibling, 1 reply; 149+ messages in thread
From: Johannes Schindelin @ 2022-03-01 20:24 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Glen Choo, git, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Hi,

On Sat, 26 Feb 2022, Junio C Hamano wrote:

> A few tests added by this patch have been failing on one specific
> job (linux-gcc ubuntu-latest) at GitHub CI.
>
> https://github.com/git/git/runs/5341052811?check_suite_focus=true#step:5:3968
> https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520
>
>     Side note: you may need to be logged in to GitHub to view them.
>     These two use different versions of CI to show the test traces;
>     in the latter you may have to click on right-facing rectangle on
>     the line with label "5520" to see the breakage.
>
> I think there is some baked-in assumption in the failing test what
> the name of the initial branch by default is, which may be the reason
> why this particular job fails while others don't.
>
> Can you take a look at it?

The log says this:

-- snip --
[...]
  + git commit -m super-b2
  [main 00b85ba] super-b2
   Author: A U Thor <author@example.com>
   1 file changed, 1 insertion(+), 1 deletion(-)
  + cd same-name-downstream
  + git checkout same-name-2/master
  error: pathspec 'same-name-2/master' did not match any file(s) known to git
  error: last command exited with $?=1
  + git -C same-name-downstream checkout master
  error: pathspec 'master' did not match any file(s) known to git
  + eval_ret=1
  + :
  not ok 49 - fetch --recurse-submodules updates name-conflicted, populated submodule

  #
  #		test_when_finished "git -C same-name-downstream checkout master" &&
  #		(
  #			cd same-name-1 &&
  #			test_commit -C submodule --no-tag b1 &&
  #			git add submodule &&
  #			git commit -m "super-b1"
  #		) &&
  #		(
  #			cd same-name-2 &&
  #			test_commit -C submodule --no-tag b2 &&
  #			git add submodule &&
  #			git commit -m "super-b2"
  #		) &&
  #		(
  #			cd same-name-downstream &&
  #			# even though the .gitmodules is correct, we cannot
  #			# fetch from same-name-2
  #			git checkout same-name-2/master &&
  #			git fetch --recurse-submodules same-name-1 &&
  #			test_must_fail git fetch --recurse-submodules same-name-2
  #		) &&
  #		super_head1=$(git -C same-name-1 rev-parse HEAD) &&
  #		git -C same-name-downstream cat-file -e $super_head1 &&
  #
  #		super_head2=$(git -C same-name-2 rev-parse HEAD) &&
  #		git -C same-name-downstream cat-file -e $super_head2 &&
  #
  #		sub_head1=$(git -C same-name-1/submodule rev-parse HEAD) &&
  #		git -C same-name-downstream/submodule cat-file -e $sub_head1 &&
  #
  #		sub_head2=$(git -C same-name-2/submodule rev-parse HEAD) &&
  #		test_must_fail git -C same-name-downstream/submodule cat-file -e $sub_head2
  #
-- snap --

So yes, there is a lot of `master`ing going on.

I _think_ the remedy will be to use the `-b <branch-name>` option of `git
init` at each of these three places:

https://github.com/git/git/blob/82dd0cbc7fcf2985a3dcfbd99caa9f80626b00df/t/t5526-fetch-submodules.sh#L1015
https://github.com/git/git/blob/82dd0cbc7fcf2985a3dcfbd99caa9f80626b00df/t/t5526-fetch-submodules.sh#L1024
https://github.com/git/git/blob/82dd0cbc7fcf2985a3dcfbd99caa9f80626b00df/t/t5526-fetch-submodules.sh#L1033

e.g.

	git -C submodule init -b main

At least that's how _I_ tried to address similar issues in the test suite
in the past.

Ciao,
Dscho

* Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-02-26 18:53       ` Junio C Hamano
  2022-03-01 20:24         ` Johannes Schindelin
@ 2022-03-01 20:32         ` Junio C Hamano
  1 sibling, 0 replies; 149+ messages in thread
From: Junio C Hamano @ 2022-03-01 20:32 UTC (permalink / raw)
  To: Glen Choo
  Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin

Junio C Hamano <gitster@pobox.com> writes:

> A few tests added by this patch have been failing on one specific
> job (linux-gcc ubuntu-latest) at GitHub CI.
>
> https://github.com/git/git/runs/5341052811?check_suite_focus=true#step:5:3968
> https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520
>
>     Side note: you may need to be logged in to GitHub to view them.
>     These two use different versions of CI to show the test traces;
>     in the latter you may have to click on right-facing rectangle on
>     the line with label "5520" to see the breakage.
>
> I think there is some baked-in assumption in the failing test what
> the name of the initial branch by default is, which may be the reason
> why this particular job fails while others don't.
>
> Can you take a look at it?
>
> Thanks.

In case you haven't noticed, this is what I have near the tip of the
topic to fix it.

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index a395d2b979..9415a1e7c0 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -3,6 +3,8 @@
 
 test_description='Recursive "git fetch" for submodules'
 
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=1
 export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
 

* Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-03-01 20:24         ` Johannes Schindelin
@ 2022-03-01 20:33           ` Junio C Hamano
  2022-03-02 23:25             ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-03-01 20:33 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Glen Choo, git, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi,
>
> On Sat, 26 Feb 2022, Junio C Hamano wrote:
>
>> A few tests added by this patch have been failing on one specific
>> job (linux-gcc ubuntu-latest) at GitHub CI.
>>
>> https://github.com/git/git/runs/5341052811?check_suite_focus=true#step:5:3968
>> https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520
>>
>>     Side note: you may need to be logged in to GitHub to view them.
>>     These two use different versions of CI to show the test traces;
>>     in the latter you may have to click on right-facing rectangle on
>>     the line with label "5520" to see the breakage.
>>
>> I think there is some baked-in assumption in the failing test what
>> the name of the initial branch by default is, which may be the reason
>> why this particular job fails while others don't.
>>
>> Can you take a look at it?
>
> The log says this:
> ...
> At least that's how _I_ tried to address similar issues in the test suite
> in the past.

Yes, I had a squashable fix/workaround queued since last night.

* Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-03-01 20:33           ` Junio C Hamano
@ 2022-03-02 23:25             ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-02 23:25 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin
  Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Junio C Hamano <gitster@pobox.com> writes:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> Hi,
>>
>> On Sat, 26 Feb 2022, Junio C Hamano wrote:
>>
>>> A few tests added by this patch have been failing on one specific
>>> job (linux-gcc ubuntu-latest) at GitHub CI.
>>>
>>> https://github.com/git/git/runs/5341052811?check_suite_focus=true#step:5:3968
>>> https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520
>>>
>>>     Side note: you may need to be logged in to GitHub to view them.
>>>     These two use different versions of CI to show the test traces;
>>>     in the latter you may have to click on right-facing rectangle on
>>>     the line with label "5520" to see the breakage.
>>>
>>> I think there is some baked-in assumption in the failing test what
>>> the name of the initial branch by default is, which may be the reason
>>> why this particular job fails while others don't.
>>>
>>> Can you take a look at it?
>>
>> The log says this:
>> ...
>> At least that's how _I_ tried to address similar issues in the test suite
>> in the past.
>
> Yes, I had a squashable fix/workaround queued since last night.

Thanks, both! I especially appreciate the pointers because I couldn't
remember how to set the default branch off the top of my head.

* [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                       ` (9 preceding siblings ...)
  2022-02-24 10:08     ` [PATCH v3 10/10] submodule: fix latent check_has_commit() bug Glen Choo
@ 2022-03-04  0:57     ` Glen Choo
  2022-03-04  0:57       ` [PATCH v4 01/10] t5526: introduce test helper to assert on fetches Glen Choo
                         ` (11 more replies)
  10 siblings, 12 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-04  0:57 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Original cover letter: https://lore.kernel.org/git/20220210044152.78352-1-chooglen@google.com

This round of patches is now based on master - I prepared the previous
rounds on top of gc/branch-recurse-submodules, but now that's merged
into master (and the actual branch for this series,
gc/recursive-fetch-with-unused-submodules, is based off master anyway).

This round fixes up the comments from the previous round (thanks
everyone!), all of which are fairly small.

= Patch organization

- Patches 1-3 are quality-of-life improvements to the test suite that
  make it easier to write the tests in patch 9.
- Patches 4-6 are preparation for "git fetch" to read .gitmodules from
  the superproject commit in patch 7.
- Patches 7-8 refactor out the logic of "finding which submodules to
  fetch" and "fetching the submodules", making it easier to tell "git
  fetch" to fetch unpopulated submodules.
- Patch 9 teaches "git fetch" to fetch changed, unpopulated submodules
  in addition to populated submodules.
- Patch 10 is an optional bugfix + cleanup of the "git fetch" code that
  removes the last caller of the deprecated "add_submodule_odb()".

= Changes 

== Since v3
- Numerous style fixes + improved comments.
- Fix sed portability issues.
- Fix failing test due to default branch name assumptions.
- Patch 3: change a test so that it no longer depends on state from the
  previous test.
- Patch 9: fix memory leak when recording super_oid and path + add
  explanatory comment.
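The super_oid/path leak fix in patch 9 (see the collect_changed_submodules_cb() hunk in the range-diff) boils down to filling in the per-submodule data only when a submodule name is first seen. Here is a simplified sketch. The names struct changed_data, record_change(), and dup_string() are hypothetical stand-ins for struct changed_submodule_data, the string_list item's util pointer, and xstrdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct changed_submodule_data. */
struct changed_data {
	const char *super_oid; /* first superproject commit that was seen */
	char *path;            /* owned copy of the submodule's path */
	size_t new_commits_nr; /* stand-in for the new_commits oid_array */
};

/* Like xstrdup(): copy a string or abort on allocation failure. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	return memcpy(copy, s, len);
}

/*
 * Record a new submodule commit found in superproject commit super_oid
 * at path. *slot plays the role of the string_list item's util pointer.
 * (super_oid, path) is only filled in on the first call, so later calls
 * never overwrite (and leak) an earlier copy of path.
 */
static void record_change(struct changed_data **slot, const char *super_oid,
			  const char *path)
{
	struct changed_data *cs_data;

	assert(slot);
	cs_data = *slot;
	if (!cs_data) {
		cs_data = calloc(1, sizeof(*cs_data));
		if (!cs_data)
			abort();
		cs_data->super_oid = super_oid;
		cs_data->path = dup_string(path);
		*slot = cs_data;
	}
	cs_data->new_commits_nr++;
}

static void free_change(struct changed_data *cs_data)
{
	if (cs_data)
		free(cs_data->path);
	free(cs_data);
}
```

Because later calls never reassign path, the copy made on the first call is the only one, and a single free per entry is enough.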

== Since v2
- Numerous small fixes to the code and commit message (thanks to all who
  helped spot these :))
- In patch 2, use test_cmp + sed to assert on test output, effectively
  reverting the "use grep" approach of v1-2 (see patch 2's description).
- New patch 3: introduce a test helper that creates the expected
  superproject commit (instead of copy-pasting the code over and over).
  - I did not get rid of "git fetch" inside the test helper (as Jonathan
    suggested) though, because that requires a bigger change in the test
    setup, and I think the test helper makes the test straightforward
    enough.
- New patch 8: refactor some shared logic out into fetch_task_create().
  This reduces code duplication between the get_fetch_task_from_*
  functions.
- In patch 9, add additional tests for 'submodules with the same name'.
- In patch 9, handle a bug where a submodule that is unpopulated by "git
  rm" still has "core.worktree" set and cannot be fetched (see patch 9's
  description).
- Remove the "git fetch --update-shallow" patch (I'll try to send it
  separately).

== Since v1
- Numerous style fixes suggested by Jonathan (thanks!)
- In patch 3, don't prematurely read submodules from the superproject
  commit (see:
  <kl6l5yplyat6.fsf@chooglen-macbookpro.roam.corp.google.com>).
- In patch 7, stop using "git checkout" and "! grep" in tests.
- In patch 7, stop doing the "find changed submodules" rev walk
  unconditionally. Instead, continue to check for .gitmodules, but also
  check for submodules in $GIT_DIR/modules.
  - I'm not entirely happy with the helper function name, see "---" for
    details.
- Move "git fetch --update-shallow" bugfix to patch 8.
  - Because the "find changed submodules" rev walk is no longer
    unconditional, this fix is no longer needed for tests to pass.
- Rename fetch_populated_submodules() to fetch_submodules().


Glen Choo (10):
  t5526: introduce test helper to assert on fetches
  t5526: stop asserting on stderr literally
  t5526: create superproject commits with test helper
  submodule: make static functions read submodules from commits
  submodule: inline submodule_commits() into caller
  submodule: store new submodule commits oid_array in a struct
  submodule: extract get_fetch_task()
  submodule: move logic into fetch_task_create()
  fetch: fetch unpopulated, changed submodules
  submodule: fix latent check_has_commit() bug

 Documentation/fetch-options.txt |  26 +-
 Documentation/git-fetch.txt     |  10 +-
 builtin/fetch.c                 |  14 +-
 submodule.c                     | 442 +++++++++++++++++---------
 submodule.h                     |  21 +-
 t/t5526-fetch-submodules.sh     | 539 ++++++++++++++++++++++++--------
 6 files changed, 740 insertions(+), 312 deletions(-)

Range-diff against v3:
 1:  b6d34b0f5c !  1:  57cd31afc2 t5526: introduce test helper to assert on fetches
    @@ t/t5526-fetch-submodules.sh: add_upstream_commit() {
     +verify_fetch_result() {
     +	ACTUAL_ERR=$1 &&
     +	rm -f expect.err.combined &&
    -+	if [ -f expect.err.super ]; then
    ++	if test -f expect.err.super
    ++	then
     +		cat expect.err.super >>expect.err.combined
     +	fi &&
    -+	if [ -f expect.err.sub ]; then
    ++	if test -f expect.err.sub
    ++	then
     +		cat expect.err.sub >>expect.err.combined
     +	fi &&
    -+	if [ -f expect.err.deep ]; then
    ++	if test -f expect.err.deep
    ++	then
     +		cat expect.err.deep >>expect.err.combined
     +	fi &&
     +	test_cmp expect.err.combined $ACTUAL_ERR
 2:  0b85fa35c2 !  2:  b70c894cff t5526: stop asserting on stderr literally
    @@ t/t5526-fetch-submodules.sh: export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
      
      pwd=$(pwd)
      
    -+check_sub() {
    ++check_sub () {
     +	NEW_HEAD=$1 &&
    -+	cat <<-EOF >$pwd/expect.err.sub
    ++	cat >$pwd/expect.err.sub <<-EOF
     +	Fetching submodule submodule
     +	From $pwd/submodule
     +	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
     +	EOF
     +}
     +
    -+check_deep() {
    ++check_deep () {
     +	NEW_HEAD=$1 &&
    -+	cat <<-EOF >$pwd/expect.err.deep
    ++	cat >$pwd/expect.err.deep <<-EOF
     +	Fetching submodule submodule/subdir/deepsubmodule
     +	From $pwd/deepsubmodule
     +	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
     +	EOF
     +}
     +
    -+check_super() {
    ++check_super () {
     +	NEW_HEAD=$1 &&
    -+	cat <<-EOF >$pwd/expect.err.super
    ++	cat >$pwd/expect.err.super <<-EOF
     +	From $pwd/.
     +	   OLD_HEAD..$NEW_HEAD  super      -> origin/super
     +	EOF
    @@ t/t5526-fetch-submodules.sh: pwd=$(pwd)
      	)
      }
      
    +@@ t/t5526-fetch-submodules.sh: add_upstream_commit() {
    + #
    + # If a repo should not be fetched in the test, its corresponding
    + # expect.err file should be rm-ed.
    +-verify_fetch_result() {
    ++verify_fetch_result () {
    + 	ACTUAL_ERR=$1 &&
    + 	rm -f expect.err.combined &&
    + 	if test -f expect.err.super
     @@ t/t5526-fetch-submodules.sh: verify_fetch_result() {
    - 	if [ -f expect.err.deep ]; then
    + 	then
      		cat expect.err.deep >>expect.err.combined
      	fi &&
     -	test_cmp expect.err.combined $ACTUAL_ERR
    -+	sed -E 's/[0-9a-f]+\.\./OLD_HEAD\.\./' $ACTUAL_ERR >actual.err.cmp &&
    ++	sed -e 's/[0-9a-f][0-9a-f]*\.\./OLD_HEAD\.\./' "$ACTUAL_ERR" >actual.err.cmp &&
     +	test_cmp expect.err.combined actual.err.cmp
      }
      
 3:  bb8ef6094a !  3:  7e2a01164e t5526: create superproject commits with test helper
    @@ Commit message
         Signed-off-by: Glen Choo <chooglen@google.com>
     
      ## t/t5526-fetch-submodules.sh ##
    -@@ t/t5526-fetch-submodules.sh: check_super() {
    +@@ t/t5526-fetch-submodules.sh: check_super () {
      # a file that contains the expected err if that new commit were fetched.
      # These output files get concatenated in the right order by
      # verify_fetch_result().
     -add_upstream_commit() {
    -+add_submodule_commits() {
    ++add_submodule_commits () {
      	(
      		cd submodule &&
      		echo new >> subfile &&
    @@ t/t5526-fetch-submodules.sh: add_upstream_commit() {
     +#
     +# This requires add_submodule_commits() to be called first, otherwise
     +# the submodules will not have changed and cannot be "git add"-ed.
    -+add_superproject_commits() {
    -+(
    -+	cd submodule &&
    ++add_superproject_commits () {
     +	(
    -+		cd subdir/deepsubmodule &&
    -+		git fetch &&
    -+		git checkout -q FETCH_HEAD
    -+	) &&
    ++		cd submodule &&
    ++		(
    ++			cd subdir/deepsubmodule &&
    ++			git fetch &&
    ++			git checkout -q FETCH_HEAD
    ++		) &&
     +		git add subdir/deepsubmodule &&
     +		git commit -m "new deep submodule"
     +	) &&
 4:  e83a1713c4 =  4:  88112ee225 submodule: make static functions read submodules from commits
 5:  e27d402b9a =  5:  007cd97aba submodule: inline submodule_commits() into caller
 6:  1c7c8218b8 =  6:  f34ea88fe9 submodule: store new submodule commits oid_array in a struct
 7:  80cf317722 !  7:  f66ab663c5 submodule: extract get_fetch_task()
    @@ submodule.c: struct fetch_task {
      	struct repository *repo;
      	const struct submodule *sub;
      	unsigned free_sub : 1; /* Do we need to free the submodule? */
    -+	const char *default_argv;
    ++	const char *default_argv; /* The default fetch mode. */
      
      	struct oid_array *commits; /* Ensure these commits are fetched */
      };
 8:  bf9cfa7054 =  8:  4e3db1bc9d submodule: move logic into fetch_task_create()
 9:  c7c2ff71b6 !  9:  9e7b1c1bbe fetch: fetch unpopulated, changed submodules
    @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
     
      ## submodule.c ##
     @@ submodule.c: static const char *default_name_or_path(const char *path_or_name)
    -  * member of the changed submodule string_list_item.
    + 
    + /*
    +  * Holds relevant information for a changed submodule. Used as the .util
    +- * member of the changed submodule string_list_item.
    ++ * member of the changed submodule name string_list_item.
    ++ *
    ++ * (super_oid, path) allows the submodule config to be read from _some_
    ++ * .gitmodules file. We store this information the first time we find a
    ++ * superproject commit that points to the submodule, but this is
    ++ * arbitrary - we can choose any (super_oid, path) that matches the
    ++ * submodule's name.
       */
      struct changed_submodule_data {
     +	/*
    -+	 * The first superproject commit in the rev walk that points to the
    -+	 * submodule.
    ++	 * The first superproject commit in the rev walk that points to
    ++	 * the submodule.
     +	 */
     +	const struct object_id *super_oid;
     +	/*
    @@ submodule.c: struct changed_submodule_data {
      
      static void collect_changed_submodules_cb(struct diff_queue_struct *q,
     @@ submodule.c: static void collect_changed_submodules_cb(struct diff_queue_struct *q,
    - 		if (!item->util)
    + 			continue;
    + 
    + 		item = string_list_insert(changed, name);
    +-		if (!item->util)
    ++		if (item->util)
    ++			cs_data = item->util;
    ++		else {
      			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
    - 		cs_data = item->util;
    -+		cs_data->super_oid = commit_oid;
    -+		cs_data->path = xstrdup(p->two->path);
    +-		cs_data = item->util;
    ++			cs_data = item->util;
    ++			cs_data->super_oid = commit_oid;
    ++			cs_data->path = xstrdup(p->two->path);
    ++		}
      		oid_array_append(&cs_data->new_commits, &p->two->oid);
      	}
      }
    @@ submodule.c: void check_for_new_submodule_commits(struct object_id *oid)
     + */
     +static int repo_has_absorbed_submodules(struct repository *r)
     +{
    ++	int ret;
     +	struct strbuf buf = STRBUF_INIT;
     +
     +	strbuf_repo_git_path(&buf, r, "modules/");
    -+	return file_exists(buf.buf) && !is_empty_dir(buf.buf);
    ++	ret = file_exists(buf.buf) && !is_empty_dir(buf.buf);
    ++	strbuf_release(&buf);
    ++	return ret;
     +}
     +
      static void calculate_changed_submodule_paths(struct repository *r,
    @@ submodule.c: int submodule_touches_in_range(struct repository *r,
      
      struct submodule_parallel_fetch {
     -	int count;
    ++	/*
    ++	 * The index of the last index entry processed by
    ++	 * get_fetch_task_from_index().
    ++	 */
     +	int index_count;
    ++	/*
    ++	 * The index of the last string_list entry processed by
    ++	 * get_fetch_task_from_changed().
    ++	 */
     +	int changed_count;
      	struct strvec args;
      	struct repository *r;
      	const char *prefix;
     @@ submodule.c: struct submodule_parallel_fetch {
    + 	int quiet;
      	int result;
      
    ++	/*
    ++	 * Names of submodules that have new commits. Generated by
    ++	 * walking the newly fetched superproject commits.
    ++	 */
      	struct string_list changed_submodule_names;
    ++	/*
    ++	 * Names of submodules that have already been processed. Lets us
    ++	 * avoid fetching the same submodule more than once.
    ++	 */
     +	struct string_list seen_submodule_names;
      
      	/* Pending fetches by OIDs */
    @@ submodule.c: struct submodule_parallel_fetch {
     @@ submodule.c: struct fetch_task {
      	const struct submodule *sub;
      	unsigned free_sub : 1; /* Do we need to free the submodule? */
    - 	const char *default_argv;
    -+	struct strvec git_args;
    + 	const char *default_argv; /* The default fetch mode. */
    ++	struct strvec git_args; /* Args for the child git process. */
      
      	struct oid_array *commits; /* Ensure these commits are fetched */
      };
    @@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf
     +
     +		spf->changed_count++;
     +		/*
    -+		 * NEEDSWORK: A submodule unpopulated by "git rm" will
    -+		 * have core.worktree set, but the actual core.worktree
    -+		 * directory won't exist, causing the child process to
    -+		 * fail. Forcibly set --work-tree until we get smarter
    -+		 * handling for core.worktree in unpopulated submodules.
    ++		 * NEEDSWORK: Submodules set/unset a value for
    ++		 * core.worktree when they are populated/unpopulated by
    ++		 * "git checkout" (and similar commands, see
    ++		 * submodule_move_head() and
    ++		 * connect_work_tree_and_git_dir()), but if the
    ++		 * submodule is unpopulated in another way (e.g. "git
    ++		 * rm", "rm -r"), core.worktree will still be set even
    ++		 * though the directory doesn't exist, and the child
    ++		 * process will crash while trying to chdir into the
    ++		 * nonexistent directory.
    ++		 *
    ++		 * In this case, we know that the submodule has no
    ++		 * working tree, so we can work around this by
    ++		 * setting "--work-tree=." (--bare does not work because
    ++		 * worktree settings take precedence over bare-ness).
    ++		 * However, this is not necessarily true in other cases,
    ++		 * so a generalized solution is still necessary.
    ++		 *
    ++		 * Possible solutions:
    ++		 * - teach "git [add|rm]" to unset core.worktree and
    ++		 *   discourage users from removing submodules without
    ++		 *   using a Git command.
    ++		 * - teach submodule child processes to ignore stale
    ++		 *   core.worktree values.
     +		 */
     +		strvec_push(&task->git_args, "--work-tree=.");
     +		return task;
    @@ submodule.h: int should_update_submodules(void);
      ## t/t5526-fetch-submodules.sh ##
     @@ t/t5526-fetch-submodules.sh: pwd=$(pwd)
      
    - check_sub() {
    + check_sub () {
      	NEW_HEAD=$1 &&
     +	SUPER_HEAD=$2 &&
    - 	cat <<-EOF >$pwd/expect.err.sub
    + 	cat >$pwd/expect.err.sub <<-EOF
     -	Fetching submodule submodule
     +	Fetching submodule submodule${SUPER_HEAD:+ at commit $SUPER_HEAD}
      	From $pwd/submodule
      	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
      	EOF
    -@@ t/t5526-fetch-submodules.sh: check_sub() {
    +@@ t/t5526-fetch-submodules.sh: check_sub () {
      
    - check_deep() {
    + check_deep () {
      	NEW_HEAD=$1 &&
     +	SUB_HEAD=$2 &&
    - 	cat <<-EOF >$pwd/expect.err.deep
    + 	cat >$pwd/expect.err.deep <<-EOF
     -	Fetching submodule submodule/subdir/deepsubmodule
     +	Fetching submodule submodule/subdir/deepsubmodule${SUB_HEAD:+ at commit $SUB_HEAD}
      	From $pwd/deepsubmodule
      	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
      	EOF
    +@@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
    + '
    + 
    + test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" '
    ++	add_submodule_commits &&
    + 	add_superproject_commits &&
    + 	(
    + 		cd downstream &&
     @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
      	verify_fetch_result actual.err
      '
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	# Use test_cmp manually because verify_fetch_result does not
     +	# consider submodule2. All the repos should be fetched, but only
     +	# submodule2 should be read from a commit
    -+	cat <<-EOF > expect.err.combined &&
    ++	cat > expect.err.combined <<-EOF &&
     +	From $pwd/.
     +	   OLD_HEAD..$super_head  super           -> origin/super
     +	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	From $pwd/submodule2
     +	   OLD_HEAD..$sub2_head  sub2       -> origin/sub2
     +	EOF
    -+	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
    ++	sed -e "s/[0-9a-f][0-9a-f]*\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
     +	test_cmp expect.err.combined actual.err.cmp
     +'
     +
    @@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a
     +	mkdir same-name-1 &&
     +	(
     +		cd same-name-1 &&
    -+		git init &&
    ++		git init -b main &&
     +		test_commit --no-tag a
     +	) &&
     +	git clone same-name-1 same-name-2 &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a
     +	(
     +		cd same-name-1 &&
     +		mkdir submodule &&
    -+		git -C submodule init &&
    ++		git -C submodule init -b main &&
     +		test_commit -C submodule --no-tag a1 &&
     +		git submodule add "$pwd/same-name-1/submodule" &&
     +		git add submodule &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a
     +	(
     +		cd same-name-2 &&
     +		mkdir submodule &&
    -+		git -C submodule init &&
    ++		git -C submodule init -b main &&
     +		test_commit -C submodule --no-tag a2 &&
     +		git submodule add "$pwd/same-name-2/submodule" &&
     +		git add submodule &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a
     +'
     +
     +test_expect_success 'fetch --recurse-submodules updates name-conflicted, populated submodule' '
    -+	test_when_finished "git -C same-name-downstream checkout master" &&
    ++	test_when_finished "git -C same-name-downstream checkout main" &&
     +	(
     +		cd same-name-1 &&
     +		test_commit -C submodule --no-tag b1 &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a
     +		cd same-name-downstream &&
     +		# even though the .gitmodules is correct, we cannot
     +		# fetch from same-name-2
    -+		git checkout same-name-2/master &&
    ++		git checkout same-name-2/main &&
     +		git fetch --recurse-submodules same-name-1 &&
     +		test_must_fail git fetch --recurse-submodules same-name-2
     +	) &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a
     +	) &&
     +	(
     +		cd same-name-downstream &&
    -+		git checkout master &&
    ++		git checkout main &&
     +		git rm .gitmodules &&
     +		git rm submodule &&
     +		git commit -m "no submodules" &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a
     +	(
     +		cd same-name-downstream/.git/modules/submodule &&
     +		# The submodule has core.worktree pointing to the "git
    -+		# rm"-ed directory, overwrite the invalid value.
    ++		# rm"-ed directory, overwrite the invalid value. See
    ++		# comment in get_fetch_task_from_changed() for more
    ++		# information.
     +		git --work-tree=. cat-file -e $head1 &&
     +		test_must_fail git --work-tree=. cat-file -e $head2
     +	)
10:  e1ac74eee4 = 10:  362ce3c7f8 submodule: fix latent check_has_commit() bug

base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7
-- 
2.33.GIT

* [PATCH v4 01/10] t5526: introduce test helper to assert on fetches
  2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
@ 2022-03-04  0:57       ` Glen Choo
  2022-03-04  2:06         ` Junio C Hamano
  2022-03-04  0:57       ` [PATCH v4 02/10] t5526: stop asserting on stderr literally Glen Choo
                         ` (10 subsequent siblings)
  11 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-03-04  0:57 UTC
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Tests in t/t5526-fetch-submodules.sh are unnecessarily noisy:

* The tests have extra logic in order to reproduce the expected stderr
  literally, but not all of these details (e.g. the head of the
  remote-tracking branch before the fetch) are relevant to the test.

* The expect.err file is constructed by the add_upstream_commit() helper
  as input to test_cmp, but most tests fetch a different combination of
  repos than the one expect.err describes. As a result, many tests
  rewrite parts of expect.err to generate the expected output, which
  adds noise.

To address both of these issues, introduce a verify_fetch_result()
helper to t/t5526-fetch-submodules.sh that asserts on the output of "git
fetch --recurse-submodules" and assembles the expected stderr from the
per-repo files (expect.err.super, expect.err.sub and expect.err.deep) in
the correct order.

As a result, the tests no longer construct expect.err manually. Tests
still assert on the old head of the remote-tracking branch ("$head1"),
but that will be fixed in a later commit.
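The ordering step described above can be sketched as a loop. This is a hypothetical variant for illustration only, not the helper this patch adds: the name combine_expected is made up, and unlike verify_fetch_result() it leaves an empty expect.err.combined behind when no repo is expected to be fetched.

```shell
#!/bin/sh
# Hypothetical sketch, not the patch's helper: concatenate the per-repo
# expectation files in a fixed order (super, then sub, then deep),
# skipping any file that was rm-ed because that repo should not be fetched.
combine_expected () {
	rm -f expect.err.combined &&
	for repo in super sub deep
	do
		if test -f "expect.err.$repo"
		then
			cat "expect.err.$repo" >>expect.err.combined || return 1
		fi
	done &&
	# Assumption: create an empty file when nothing is expected, so a
	# later comparison still has a file to read.
	touch expect.err.combined
}
```

In a test, the result would then be compared against the captured stderr, for example with test_cmp.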

Signed-off-by: Glen Choo <chooglen@google.com>
---
 t/t5526-fetch-submodules.sh | 139 ++++++++++++++++++++++--------------
 1 file changed, 84 insertions(+), 55 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 840c89cc8b..dff7a4b90b 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -10,6 +10,10 @@ export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
 
 pwd=$(pwd)
 
+# For each submodule in the test setup, this creates a commit and writes
+# a file that contains the expected err if that new commit were fetched.
+# These output files get concatenated in the right order by
+# verify_fetch_result().
 add_upstream_commit() {
 	(
 		cd submodule &&
@@ -19,9 +23,9 @@ add_upstream_commit() {
 		git add subfile &&
 		git commit -m new subfile &&
 		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err &&
-		echo "From $pwd/submodule" >> ../expect.err &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err
+		echo "Fetching submodule submodule" > ../expect.err.sub &&
+		echo "From $pwd/submodule" >> ../expect.err.sub &&
+		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
 	) &&
 	(
 		cd deepsubmodule &&
@@ -31,12 +35,36 @@ add_upstream_commit() {
 		git add deepsubfile &&
 		git commit -m new deepsubfile &&
 		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule/subdir/deepsubmodule" >> ../expect.err
-		echo "From $pwd/deepsubmodule" >> ../expect.err &&
-		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err
+		echo "Fetching submodule submodule/subdir/deepsubmodule" > ../expect.err.deep
+		echo "From $pwd/deepsubmodule" >> ../expect.err.deep &&
+		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err.deep
 	)
 }
 
+# Verifies that the expected repositories were fetched. This is done by
+# concatenating the files expect.err.[super|sub|deep] in the correct
+# order and comparing it to the actual stderr.
+#
+# If a repo should not be fetched in the test, its corresponding
+# expect.err file should be rm-ed.
+verify_fetch_result() {
+	ACTUAL_ERR=$1 &&
+	rm -f expect.err.combined &&
+	if test -f expect.err.super
+	then
+		cat expect.err.super >>expect.err.combined
+	fi &&
+	if test -f expect.err.sub
+	then
+		cat expect.err.sub >>expect.err.combined
+	fi &&
+	if test -f expect.err.deep
+	then
+		cat expect.err.deep >>expect.err.combined
+	fi &&
+	test_cmp expect.err.combined $ACTUAL_ERR
+}
+
 test_expect_success setup '
 	mkdir deepsubmodule &&
 	(
@@ -74,7 +102,7 @@ test_expect_success "fetch --recurse-submodules recurses into submodules" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "submodule.recurse option triggers recursive fetch" '
@@ -84,7 +112,7 @@ test_expect_success "submodule.recurse option triggers recursive fetch" '
 		git -c submodule.recurse fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "fetch --recurse-submodules -j2 has the same output behaviour" '
@@ -94,7 +122,7 @@ test_expect_success "fetch --recurse-submodules -j2 has the same output behaviou
 		GIT_TRACE="$TRASH_DIRECTORY/trace.out" git fetch --recurse-submodules -j2 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err &&
+	verify_fetch_result actual.err &&
 	grep "2 tasks" trace.out
 '
 
@@ -124,7 +152,7 @@ test_expect_success "using fetchRecurseSubmodules=true in .gitmodules recurses i
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--no-recurse-submodules overrides .gitmodules config" '
@@ -155,7 +183,7 @@ test_expect_success "--recurse-submodules overrides fetchRecurseSubmodules setti
 		git config --unset submodule.submodule.fetchRecurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--quiet propagates to submodules" '
@@ -183,7 +211,7 @@ test_expect_success "--dry-run propagates to submodules" '
 		git fetch --recurse-submodules --dry-run >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "Without --dry-run propagates to submodules" '
@@ -192,7 +220,7 @@ test_expect_success "Without --dry-run propagates to submodules" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "recurseSubmodules=true propagates into submodules" '
@@ -203,7 +231,7 @@ test_expect_success "recurseSubmodules=true propagates into submodules" '
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--recurse-submodules overrides config in submodule" '
@@ -217,7 +245,7 @@ test_expect_success "--recurse-submodules overrides config in submodule" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--no-recurse-submodules overrides config setting" '
@@ -250,14 +278,14 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.sub &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.sub &&
-	head -3 expect.err >> expect.err.sub &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
-	test_cmp expect.err.sub actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -268,14 +296,16 @@ test_expect_success "Recursion doesn't happen when new superproject commits don'
 	git add file &&
 	git commit -m "new file" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.file &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.file &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.file actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "Recursion picks up config in submodule" '
@@ -292,9 +322,8 @@ test_expect_success "Recursion picks up config in submodule" '
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.sub &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.sub &&
-	cat expect.err >> expect.err.sub &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err &&
@@ -303,7 +332,7 @@ test_expect_success "Recursion picks up config in submodule" '
 			git config --unset fetch.recurseSubmodules
 		)
 	) &&
-	test_cmp expect.err.sub actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -328,15 +357,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.2 &&
-	cat expect.err.sub >> expect.err.2 &&
-	tail -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
-	test_cmp expect.err.2 actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -372,11 +399,8 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	tail -3 expect.err > expect.err.deepsub &&
-	echo "From $pwd/." > expect.err &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err &&
-	cat expect.err.sub >> expect.err &&
-	cat expect.err.deepsub >> expect.err &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules false &&
@@ -392,7 +416,7 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 		)
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
@@ -402,14 +426,16 @@ test_expect_success "'--recurse-submodules=on-demand' stops when no new submodul
 	git add file &&
 	git commit -m "new file" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.file &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.file &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.file actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config" '
@@ -423,9 +449,9 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules on-demand &&
@@ -437,7 +463,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 		git config --unset fetch.recurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' overrides fetch.recurseSubmodules" '
@@ -451,9 +477,9 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git config submodule.submodule.fetchRecurseSubmodules on-demand &&
@@ -465,7 +491,7 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 		git config --unset submodule.submodule.fetchRecurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "don't fetch submodule when newly recorded commits are already present" '
@@ -477,14 +503,17 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 	git add submodule &&
 	git commit -m "submodule rewound" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	# This file does not exist, but rm -f for readability
+	rm -f expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err &&
+	verify_fetch_result actual.err &&
 	(
 		cd submodule &&
 		git checkout -q sub
@@ -502,9 +531,9 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 	git rm .gitmodules &&
 	git commit -m "new submodule without .gitmodules" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." >expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >>expect.err.2 &&
+	echo "From $pwd/." >expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		rm .gitmodules &&
@@ -520,7 +549,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 		git reset --hard
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err &&
+	verify_fetch_result actual.err &&
 	git checkout HEAD^ -- .gitmodules &&
 	git add .gitmodules &&
 	git commit -m "new submodule restored .gitmodules"
-- 
2.33.GIT

* [PATCH v4 02/10] t5526: stop asserting on stderr literally
  2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
  2022-03-04  0:57       ` [PATCH v4 01/10] t5526: introduce test helper to assert on fetches Glen Choo
@ 2022-03-04  0:57       ` Glen Choo
  2022-03-04  2:12         ` Junio C Hamano
  2022-03-04 22:41         ` Jonathan Tan
  2022-03-04  0:57       ` [PATCH v4 03/10] t5526: create superproject commits with test helper Glen Choo
                         ` (9 subsequent siblings)
  11 siblings, 2 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-04  0:57 UTC
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

In the previous commit message, we noted that not all of the "git fetch"
stderr is relevant to the tests. Most of the test setup lines are
dedicated to these details of the stderr:

1. which repos (super/sub/deep) are involved in the fetch
2. the head of the remote-tracking branch before the fetch (i.e. $head1)
3. the head of the remote-tracking branch after the fetch (i.e. $head2)

Items 1 and 3 are relevant because together they tell us that the
expected repo fetched the expected commit, but item 2 is irrelevant.

Stop asserting on $head1 by replacing it with the placeholder OLD_HEAD
in both the actual and the expected output. Introduce check_*() helpers
that construct the expected output from the new head alone, and use sed
to rewrite the old head in the actual output.
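To make the munging concrete, here is a standalone illustration. The sample stderr line and its abbreviated hashes are made up; the sed pattern is the one verify_fetch_result() uses, while the replacement text is written without backslashes for portability. The expected line is built the way check_sub() builds it, from the new head only.

```shell
#!/bin/sh
# Made-up sample of a "git fetch" stderr line for an updated
# remote-tracking branch (old head..new head).
actual='   1a2b3c4..5d6e7f8  sub        -> origin/sub'

# Replace the first run of hex digits followed by ".." (the old head)
# with a fixed placeholder, so the old head no longer matters.
munged=$(printf '%s\n' "$actual" | sed -e 's/[0-9a-f][0-9a-f]*\.\./OLD_HEAD../')

# Expected line built from the new head alone, as check_sub() does.
new_head=5d6e7f8
expected="   OLD_HEAD..$new_head  sub        -> origin/sub"

if test "$munged" = "$expected"
then
	echo "match"
else
	echo "mismatch" >&2
	exit 1
fi
```

Because the placeholder is applied to both sides, the comparison only depends on which repo was fetched and which commit it ended up at.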

Signed-off-by: Glen Choo <chooglen@google.com>
---
 t/t5526-fetch-submodules.sh | 119 +++++++++++++++++-------------------
 1 file changed, 57 insertions(+), 62 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index dff7a4b90b..6b24d37b2b 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -10,6 +10,32 @@ export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
 
 pwd=$(pwd)
 
+check_sub () {
+	NEW_HEAD=$1 &&
+	cat >$pwd/expect.err.sub <<-EOF
+	Fetching submodule submodule
+	From $pwd/submodule
+	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
+	EOF
+}
+
+check_deep () {
+	NEW_HEAD=$1 &&
+	cat >$pwd/expect.err.deep <<-EOF
+	Fetching submodule submodule/subdir/deepsubmodule
+	From $pwd/deepsubmodule
+	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
+	EOF
+}
+
+check_super () {
+	NEW_HEAD=$1 &&
+	cat >$pwd/expect.err.super <<-EOF
+	From $pwd/.
+	   OLD_HEAD..$NEW_HEAD  super      -> origin/super
+	EOF
+}
+
 # For each submodule in the test setup, this creates a commit and writes
 # a file that contains the expected err if that new commit were fetched.
 # These output files get concatenated in the right order by
@@ -17,27 +43,21 @@ pwd=$(pwd)
 add_upstream_commit() {
 	(
 		cd submodule &&
-		head1=$(git rev-parse --short HEAD) &&
 		echo new >> subfile &&
 		test_tick &&
 		git add subfile &&
 		git commit -m new subfile &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		new_head=$(git rev-parse --short HEAD) &&
+		check_sub $new_head
 	) &&
 	(
 		cd deepsubmodule &&
-		head1=$(git rev-parse --short HEAD) &&
 		echo new >> deepsubfile &&
 		test_tick &&
 		git add deepsubfile &&
 		git commit -m new deepsubfile &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule/subdir/deepsubmodule" > ../expect.err.deep
-		echo "From $pwd/deepsubmodule" >> ../expect.err.deep &&
-		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err.deep
+		new_head=$(git rev-parse --short HEAD) &&
+		check_deep $new_head
 	)
 }
 
@@ -47,7 +67,7 @@ add_upstream_commit() {
 #
 # If a repo should not be fetched in the test, its corresponding
 # expect.err file should be rm-ed.
-verify_fetch_result() {
+verify_fetch_result () {
 	ACTUAL_ERR=$1 &&
 	rm -f expect.err.combined &&
 	if test -f expect.err.super
@@ -62,7 +82,8 @@ verify_fetch_result() {
 	then
 		cat expect.err.deep >>expect.err.combined
 	fi &&
-	test_cmp expect.err.combined $ACTUAL_ERR
+	sed -e 's/[0-9a-f][0-9a-f]*\.\./OLD_HEAD\.\./' "$ACTUAL_ERR" >actual.err.cmp &&
+	test_cmp expect.err.combined actual.err.cmp
 }
 
 test_expect_success setup '
@@ -274,12 +295,10 @@ test_expect_success "Recursion doesn't happen when no new commits are fetched in
 '
 
 test_expect_success "Recursion stops when no new submodule commits are fetched" '
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	rm expect.err.deep &&
 	(
 		cd downstream &&
@@ -291,13 +310,11 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 
 test_expect_success "Recursion doesn't happen when new superproject commits don't change any submodules" '
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	echo a > file &&
 	git add file &&
 	git commit -m "new file" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	rm expect.err.sub &&
 	rm expect.err.deep &&
 	(
@@ -318,12 +335,10 @@ test_expect_success "Recursion picks up config in submodule" '
 		)
 	) &&
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err &&
@@ -345,20 +360,15 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 			git fetch &&
 			git checkout -q FETCH_HEAD
 		) &&
-		head1=$(git rev-parse --short HEAD^) &&
 		git add subdir/deepsubmodule &&
 		git commit -m "new deepsubmodule" &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		new_head=$(git rev-parse --short HEAD) &&
+		check_sub $new_head
 	) &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -376,13 +386,10 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 			git fetch &&
 			git checkout -q FETCH_HEAD
 		) &&
-		head1=$(git rev-parse --short HEAD^) &&
 		git add subdir/deepsubmodule &&
 		git commit -m "new deepsubmodule" &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo Fetching submodule submodule > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		new_head=$(git rev-parse --short HEAD) &&
+		check_sub $new_head
 	) &&
 	(
 		cd downstream &&
@@ -395,12 +402,10 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 '
 
 test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" '
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules false &&
@@ -421,13 +426,11 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	echo a >> file &&
 	git add file &&
 	git commit -m "new file" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	rm expect.err.sub &&
 	rm expect.err.deep &&
 	(
@@ -445,12 +448,10 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 	) &&
 	add_upstream_commit &&
 	git config --global fetch.recurseSubmodules false &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	rm expect.err.deep &&
 	(
 		cd downstream &&
@@ -473,12 +474,10 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 	) &&
 	add_upstream_commit &&
 	git config fetch.recurseSubmodules false &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	rm expect.err.deep &&
 	(
 		cd downstream &&
@@ -499,12 +498,10 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 		cd submodule &&
 		git checkout -q HEAD^^
 	) &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "submodule rewound" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	rm expect.err.sub &&
 	# This file does not exist, but rm -f for readability
 	rm -f expect.err.deep &&
@@ -526,13 +523,11 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 		git fetch --recurse-submodules
 	) &&
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git rm .gitmodules &&
 	git commit -m "new submodule without .gitmodules" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." >expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	check_super $new_head &&
 	rm expect.err.deep &&
 	(
 		cd downstream &&
-- 
2.33.GIT


^ permalink raw reply related	[flat|nested] 149+ messages in thread

* [PATCH v4 03/10] t5526: create superproject commits with test helper
  2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
  2022-03-04  0:57       ` [PATCH v4 01/10] t5526: introduce test helper to assert on fetches Glen Choo
  2022-03-04  0:57       ` [PATCH v4 02/10] t5526: stop asserting on stderr literally Glen Choo
@ 2022-03-04  0:57       ` Glen Choo
  2022-03-04 22:59         ` Jonathan Tan
  2022-03-04  0:57       ` [PATCH v4 04/10] submodule: make static functions read submodules from commits Glen Choo
                         ` (8 subsequent siblings)
  11 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-03-04  0:57 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

A few tests in t5526 use this pattern as part of their setup:

1. Create new commits in the upstream submodules (using
   add_upstream_commit()).
2. In the upstream superprojects, add the new submodule commits from the
   previous step.

A future commit will add more tests with this pattern, so reduce the
verbosity of present and future tests by introducing a test helper that
creates superproject commits. Since we now have two helpers that add
upstream commits, rename add_upstream_commit() to
add_submodule_commits().

Signed-off-by: Glen Choo <chooglen@google.com>
---
 t/t5526-fetch-submodules.sh | 94 +++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 50 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 6b24d37b2b..4cae2e4f7c 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -40,7 +40,7 @@ check_super () {
 # a file that contains the expected err if that new commit were fetched.
 # These output files get concatenated in the right order by
 # verify_fetch_result().
-add_upstream_commit() {
+add_submodule_commits () {
 	(
 		cd submodule &&
 		echo new >> subfile &&
@@ -61,6 +61,30 @@ add_upstream_commit() {
 	)
 }
 
+# For each superproject in the test setup, update its submodule, add the
+# submodule and create a new commit with the submodule change.
+#
+# This requires add_submodule_commits() to be called first, otherwise
+# the submodules will not have changed and cannot be "git add"-ed.
+add_superproject_commits () {
+	(
+		cd submodule &&
+		(
+			cd subdir/deepsubmodule &&
+			git fetch &&
+			git checkout -q FETCH_HEAD
+		) &&
+		git add subdir/deepsubmodule &&
+		git commit -m "new deep submodule"
+	) &&
+	git add submodule &&
+	git commit -m "new submodule" &&
+	super_head=$(git rev-parse --short HEAD) &&
+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
+	check_super $super_head &&
+	check_sub $sub_head
+}
+
 # Verifies that the expected repositories were fetched. This is done by
 # concatenating the files expect.err.[super|sub|deep] in the correct
 # order and comparing it to the actual stderr.
@@ -117,7 +141,7 @@ test_expect_success setup '
 '
 
 test_expect_success "fetch --recurse-submodules recurses into submodules" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
@@ -127,7 +151,7 @@ test_expect_success "fetch --recurse-submodules recurses into submodules" '
 '
 
 test_expect_success "submodule.recurse option triggers recursive fetch" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git -c submodule.recurse fetch >../actual.out 2>../actual.err
@@ -137,7 +161,7 @@ test_expect_success "submodule.recurse option triggers recursive fetch" '
 '
 
 test_expect_success "fetch --recurse-submodules -j2 has the same output behaviour" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		GIT_TRACE="$TRASH_DIRECTORY/trace.out" git fetch --recurse-submodules -j2 2>../actual.err
@@ -148,7 +172,7 @@ test_expect_success "fetch --recurse-submodules -j2 has the same output behaviou
 '
 
 test_expect_success "fetch alone only fetches superproject" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -177,7 +201,7 @@ test_expect_success "using fetchRecurseSubmodules=true in .gitmodules recurses i
 '
 
 test_expect_success "--no-recurse-submodules overrides .gitmodules config" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git fetch --no-recurse-submodules >../actual.out 2>../actual.err
@@ -226,7 +250,7 @@ test_expect_success "--quiet propagates to parallel submodules" '
 '
 
 test_expect_success "--dry-run propagates to submodules" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git fetch --recurse-submodules --dry-run >../actual.out 2>../actual.err
@@ -245,7 +269,7 @@ test_expect_success "Without --dry-run propagates to submodules" '
 '
 
 test_expect_success "recurseSubmodules=true propagates into submodules" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules true &&
@@ -256,7 +280,7 @@ test_expect_success "recurseSubmodules=true propagates into submodules" '
 '
 
 test_expect_success "--recurse-submodules overrides config in submodule" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		(
@@ -270,7 +294,7 @@ test_expect_success "--recurse-submodules overrides config in submodule" '
 '
 
 test_expect_success "--no-recurse-submodules overrides config setting" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules true &&
@@ -309,7 +333,7 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 '
 
 test_expect_success "Recursion doesn't happen when new superproject commits don't change any submodules" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	echo a > file &&
 	git add file &&
 	git commit -m "new file" &&
@@ -334,7 +358,7 @@ test_expect_success "Recursion picks up config in submodule" '
 			git config fetch.recurseSubmodules true
 		)
 	) &&
-	add_upstream_commit &&
+	add_submodule_commits &&
 	git add submodule &&
 	git commit -m "new submodule" &&
 	new_head=$(git rev-parse --short HEAD) &&
@@ -352,23 +376,8 @@ test_expect_success "Recursion picks up config in submodule" '
 '
 
 test_expect_success "Recursion picks up all submodules when necessary" '
-	add_upstream_commit &&
-	(
-		cd submodule &&
-		(
-			cd subdir/deepsubmodule &&
-			git fetch &&
-			git checkout -q FETCH_HEAD
-		) &&
-		git add subdir/deepsubmodule &&
-		git commit -m "new deepsubmodule" &&
-		new_head=$(git rev-parse --short HEAD) &&
-		check_sub $new_head
-	) &&
-	git add submodule &&
-	git commit -m "new submodule" &&
-	new_head=$(git rev-parse --short HEAD) &&
-	check_super $new_head &&
+	add_submodule_commits &&
+	add_superproject_commits &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -378,19 +387,7 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 '
 
 test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no new commits are fetched in the superproject (and ignores config)" '
-	add_upstream_commit &&
-	(
-		cd submodule &&
-		(
-			cd subdir/deepsubmodule &&
-			git fetch &&
-			git checkout -q FETCH_HEAD
-		) &&
-		git add subdir/deepsubmodule &&
-		git commit -m "new deepsubmodule" &&
-		new_head=$(git rev-parse --short HEAD) &&
-		check_sub $new_head
-	) &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules true &&
@@ -402,10 +399,7 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 '
 
 test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" '
-	git add submodule &&
-	git commit -m "new submodule" &&
-	new_head=$(git rev-parse --short HEAD) &&
-	check_super $new_head &&
+	add_superproject_commits &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules false &&
@@ -425,7 +419,7 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 '
 
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	echo a >> file &&
 	git add file &&
 	git commit -m "new file" &&
@@ -446,7 +440,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 		cd downstream &&
 		git fetch --recurse-submodules
 	) &&
-	add_upstream_commit &&
+	add_submodule_commits &&
 	git config --global fetch.recurseSubmodules false &&
 	git add submodule &&
 	git commit -m "new submodule" &&
@@ -472,7 +466,7 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 		cd downstream &&
 		git fetch --recurse-submodules
 	) &&
-	add_upstream_commit &&
+	add_submodule_commits &&
 	git config fetch.recurseSubmodules false &&
 	git add submodule &&
 	git commit -m "new submodule" &&
@@ -522,7 +516,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 		cd downstream &&
 		git fetch --recurse-submodules
 	) &&
-	add_upstream_commit &&
+	add_submodule_commits &&
 	git add submodule &&
 	git rm .gitmodules &&
 	git commit -m "new submodule without .gitmodules" &&
-- 
2.33.GIT


* [PATCH v4 04/10] submodule: make static functions read submodules from commits
  2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                         ` (2 preceding siblings ...)
  2022-03-04  0:57       ` [PATCH v4 03/10] t5526: create superproject commits with test helper Glen Choo
@ 2022-03-04  0:57       ` Glen Choo
  2022-03-04  0:57       ` [PATCH v4 05/10] submodule: inline submodule_commits() into caller Glen Choo
                         ` (7 subsequent siblings)
  11 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-04  0:57 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

A future commit will teach "fetch --recurse-submodules" to fetch
unpopulated submodules. To prepare for this, teach the necessary static
functions how to read submodules from superproject commits using a
"treeish_name" argument (instead of always reading from the index and
filesystem), but do not yet change where submodules are read from.
Submodules will be read from commits when we fetch unpopulated
submodules.
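
The convention this patch relies on can be modelled in a few lines. This
is a toy sketch, not git's code: struct toy_oid, toy_null_oid and
gitmodules_source() are hypothetical names. The only idea carried over
from the patch is that passing a null object ID keeps the old
index/filesystem lookup, so every existing caller that now passes
null_oid() behaves exactly as before.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for git's struct object_id (a 20-byte SHA-1). */
struct toy_oid {
	unsigned char hash[20];
};

/* An all-zero ID plays the role of git's null_oid(). */
static const struct toy_oid toy_null_oid;

static int toy_oid_is_null(const struct toy_oid *oid)
{
	assert(oid != NULL);
	return !memcmp(oid->hash, toy_null_oid.hash, sizeof(oid->hash));
}

/*
 * Hypothetical helper: a null ID keeps the old behaviour (use the
 * working tree and index); any other ID means "read .gitmodules from
 * that superproject commit".
 */
static const char *gitmodules_source(const struct toy_oid *treeish_name)
{
	return toy_oid_is_null(treeish_name) ? "worktree" : "commit";
}
```

Under these assumptions, gitmodules_source(&toy_null_oid) yields
"worktree", while an ID naming a superproject commit yields "commit".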

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/submodule.c b/submodule.c
index 5ace18a7d9..4f3300f2cb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -932,6 +932,7 @@ struct has_commit_data {
 	struct repository *repo;
 	int result;
 	const char *path;
+	const struct object_id *super_oid;
 };
 
 static int check_has_commit(const struct object_id *oid, void *data)
@@ -940,7 +941,7 @@ static int check_has_commit(const struct object_id *oid, void *data)
 	struct repository subrepo;
 	enum object_type type;
 
-	if (repo_submodule_init(&subrepo, cb->repo, cb->path, null_oid())) {
+	if (repo_submodule_init(&subrepo, cb->repo, cb->path, cb->super_oid)) {
 		cb->result = 0;
 		goto cleanup;
 	}
@@ -968,9 +969,15 @@ static int check_has_commit(const struct object_id *oid, void *data)
 
 static int submodule_has_commits(struct repository *r,
 				 const char *path,
+				 const struct object_id *super_oid,
 				 struct oid_array *commits)
 {
-	struct has_commit_data has_commit = { r, 1, path };
+	struct has_commit_data has_commit = {
+		.repo = r,
+		.result = 1,
+		.path = path,
+		.super_oid = super_oid
+	};
 
 	/*
 	 * Perform a cheap, but incorrect check for the existence of 'commits'.
@@ -1017,7 +1024,7 @@ static int submodule_needs_pushing(struct repository *r,
 				   const char *path,
 				   struct oid_array *commits)
 {
-	if (!submodule_has_commits(r, path, commits))
+	if (!submodule_has_commits(r, path, null_oid(), commits))
 		/*
 		 * NOTE: We do consider it safe to return "no" here. The
 		 * correct answer would be "We do not know" instead of
@@ -1277,7 +1284,7 @@ static void calculate_changed_submodule_paths(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_has_commits(r, path, commits)) {
+		if (submodule_has_commits(r, path, null_oid(), commits)) {
 			oid_array_clear(commits);
 			*name->string = '\0';
 		}
@@ -1402,12 +1409,13 @@ static const struct submodule *get_non_gitmodules_submodule(const char *path)
 }
 
 static struct fetch_task *fetch_task_create(struct repository *r,
-					    const char *path)
+					    const char *path,
+					    const struct object_id *treeish_name)
 {
 	struct fetch_task *task = xmalloc(sizeof(*task));
 	memset(task, 0, sizeof(*task));
 
-	task->sub = submodule_from_path(r, null_oid(), path);
+	task->sub = submodule_from_path(r, treeish_name, path);
 	if (!task->sub) {
 		/*
 		 * No entry in .gitmodules? Technically not a submodule,
@@ -1439,11 +1447,12 @@ static void fetch_task_release(struct fetch_task *p)
 }
 
 static struct repository *get_submodule_repo_for(struct repository *r,
-						 const char *path)
+						 const char *path,
+						 const struct object_id *treeish_name)
 {
 	struct repository *ret = xmalloc(sizeof(*ret));
 
-	if (repo_submodule_init(ret, r, path, null_oid())) {
+	if (repo_submodule_init(ret, r, path, treeish_name)) {
 		free(ret);
 		return NULL;
 	}
@@ -1464,7 +1473,7 @@ static int get_next_submodule(struct child_process *cp,
 		if (!S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		task = fetch_task_create(spf->r, ce->name);
+		task = fetch_task_create(spf->r, ce->name, null_oid());
 		if (!task)
 			continue;
 
@@ -1487,7 +1496,7 @@ static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		task->repo = get_submodule_repo_for(spf->r, task->sub->path);
+		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
 		if (task->repo) {
 			struct strbuf submodule_prefix = STRBUF_INIT;
 			child_process_init(cp);
-- 
2.33.GIT


* [PATCH v4 05/10] submodule: inline submodule_commits() into caller
  2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                         ` (3 preceding siblings ...)
  2022-03-04  0:57       ` [PATCH v4 04/10] submodule: make static functions read submodules from commits Glen Choo
@ 2022-03-04  0:57       ` Glen Choo
  2022-03-04  0:57       ` [PATCH v4 06/10] submodule: store new submodule commits oid_array in a struct Glen Choo
                         ` (6 subsequent siblings)
  11 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-04  0:57 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

When collecting the string_list of changed submodule names, the new
submodule commits are stored in the string_list_item.util as an
oid_array. A subsequent commit will replace the oid_array with a struct
that has more information.

Prepare for this change by inlining submodule_commits() (which inserts
into the string_list and initializes the string_list_item.util) into its
only caller so that the code is easier to refactor later.
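
The "insert, then lazily allocate util" pattern that ends up inlined in
the caller can be sketched as follows. Every name here (toy_item,
toy_list, toy_int_array, toy_list_insert(), record_change()) is
hypothetical, and unlike git's string_list this toy list is unsorted and
fixed-size; it only illustrates that util stays NULL until the first
commit for a name is recorded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for string_list and oid_array. */
struct toy_item {
	const char *string;	/* not copied: the caller owns it */
	void *util;		/* lazily allocated payload, NULL until first use */
};

struct toy_list {
	struct toy_item items[16];
	size_t nr;
};

struct toy_int_array {
	int vals[16];
	size_t nr;
};

/* Return the existing item for 'name', or append one with util == NULL. */
static struct toy_item *toy_list_insert(struct toy_list *list, const char *name)
{
	size_t i;

	for (i = 0; i < list->nr; i++)
		if (!strcmp(list->items[i].string, name))
			return &list->items[i];
	assert(list->nr < 16);
	list->items[list->nr].string = name;
	list->items[list->nr].util = NULL;
	return &list->items[list->nr++];
}

/* The inlined caller: insert, allocate util on first sight, then append. */
static void record_change(struct toy_list *changed, const char *name, int commit)
{
	struct toy_item *item = toy_list_insert(changed, name);
	struct toy_int_array *commits;

	if (!item->util) {
		item->util = calloc(1, sizeof(struct toy_int_array));
		if (!item->util)
			abort();	/* git's xcalloc() would die() here */
	}
	commits = item->util;
	assert(commits->nr < 16);
	commits->vals[commits->nr++] = commit;
}
```

Once the allocation sits in the caller, changing what util points to
only touches this one spot, which is what the next patch takes advantage
of.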

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/submodule.c b/submodule.c
index 4f3300f2cb..3bc189cf05 100644
--- a/submodule.c
+++ b/submodule.c
@@ -782,19 +782,6 @@ const struct submodule *submodule_from_ce(const struct cache_entry *ce)
 	return submodule_from_path(the_repository, null_oid(), ce->name);
 }
 
-static struct oid_array *submodule_commits(struct string_list *submodules,
-					   const char *name)
-{
-	struct string_list_item *item;
-
-	item = string_list_insert(submodules, name);
-	if (item->util)
-		return (struct oid_array *) item->util;
-
-	/* NEEDSWORK: should we have oid_array_init()? */
-	item->util = xcalloc(1, sizeof(struct oid_array));
-	return (struct oid_array *) item->util;
-}
 
 struct collect_changed_submodules_cb_data {
 	struct repository *repo;
@@ -830,9 +817,9 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
-		struct oid_array *commits;
 		const struct submodule *submodule;
 		const char *name;
+		struct string_list_item *item;
 
 		if (!S_ISGITLINK(p->two->mode))
 			continue;
@@ -859,8 +846,11 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 		if (!name)
 			continue;
 
-		commits = submodule_commits(changed, name);
-		oid_array_append(commits, &p->two->oid);
+		item = string_list_insert(changed, name);
+		if (!item->util)
+			/* NEEDSWORK: should we have oid_array_init()? */
+			item->util = xcalloc(1, sizeof(struct oid_array));
+		oid_array_append(item->util, &p->two->oid);
 	}
 }
 
-- 
2.33.GIT


* [PATCH v4 06/10] submodule: store new submodule commits oid_array in a struct
  2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                         ` (4 preceding siblings ...)
  2022-03-04  0:57       ` [PATCH v4 05/10] submodule: inline submodule_commits() into caller Glen Choo
@ 2022-03-04  0:57       ` Glen Choo
  2022-03-04  0:57       ` [PATCH v4 07/10] submodule: extract get_fetch_task() Glen Choo
                         ` (5 subsequent siblings)
  11 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-04  0:57 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

This commit prepares for a future commit that will teach `git fetch
--recurse-submodules` how to fetch submodules that are present in
<gitdir>/modules, but are not populated. To do this, we need to store
more information about the changed submodule so that we can read the
submodule configuration from the superproject commit instead of the
filesystem.

Refactor the changed submodules string_list.util to hold a struct
instead of an oid_array. This struct only holds the new_commits
oid_array for now; more information will be added later.
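
A minimal sketch of why wrapping the array in a struct with its own
_clear() helper eases later growth. The toy_* names are hypothetical;
toy_changed_submodule_data only mirrors the shape of the
changed_submodule_data struct added below, and toy_oid_array stores ints
instead of object IDs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for git's oid_array. */
struct toy_oid_array {
	int *oid;
	size_t nr, alloc;
};

static void toy_oid_array_append(struct toy_oid_array *array, int oid)
{
	if (array->nr == array->alloc) {
		size_t alloc = array->alloc ? 2 * array->alloc : 4;
		int *grown = realloc(array->oid, alloc * sizeof(*grown));

		if (!grown)
			abort();
		array->oid = grown;
		array->alloc = alloc;
	}
	array->oid[array->nr++] = oid;
}

static void toy_oid_array_clear(struct toy_oid_array *array)
{
	free(array->oid);
	array->oid = NULL;
	array->nr = array->alloc = 0;
}

/* Mirrors the shape of changed_submodule_data: a struct that can grow. */
struct toy_changed_submodule_data {
	struct toy_oid_array new_commits;
	/* later fields can be added here without touching every caller */
};

/* Callers free through this helper, so new fields only need one update. */
static void toy_changed_submodule_data_clear(struct toy_changed_submodule_data *cs_data)
{
	toy_oid_array_clear(&cs_data->new_commits);
}
```

Callers that previously cast util to an array now go through
&cs_data->new_commits, as the hunks below do with the real types.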

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 52 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 34 insertions(+), 18 deletions(-)

diff --git a/submodule.c b/submodule.c
index 3bc189cf05..0b9c25f9d3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -806,6 +806,20 @@ static const char *default_name_or_path(const char *path_or_name)
 	return path_or_name;
 }
 
+/*
+ * Holds relevant information for a changed submodule. Used as the .util
+ * member of the changed submodule string_list_item.
+ */
+struct changed_submodule_data {
+	/* The submodule commits that have changed in the rev walk. */
+	struct oid_array new_commits;
+};
+
+static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
+{
+	oid_array_clear(&cs_data->new_commits);
+}
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 					  struct diff_options *options,
 					  void *data)
@@ -820,6 +834,7 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 		const struct submodule *submodule;
 		const char *name;
 		struct string_list_item *item;
+		struct changed_submodule_data *cs_data;
 
 		if (!S_ISGITLINK(p->two->mode))
 			continue;
@@ -848,9 +863,9 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 
 		item = string_list_insert(changed, name);
 		if (!item->util)
-			/* NEEDSWORK: should we have oid_array_init()? */
-			item->util = xcalloc(1, sizeof(struct oid_array));
-		oid_array_append(item->util, &p->two->oid);
+			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
+		cs_data = item->util;
+		oid_array_append(&cs_data->new_commits, &p->two->oid);
 	}
 }
 
@@ -897,11 +912,12 @@ static void collect_changed_submodules(struct repository *r,
 	reset_revision_walk();
 }
 
-static void free_submodules_oids(struct string_list *submodules)
+static void free_submodules_data(struct string_list *submodules)
 {
 	struct string_list_item *item;
 	for_each_string_list_item(item, submodules)
-		oid_array_clear((struct oid_array *) item->util);
+		changed_submodule_data_clear(item->util);
+
 	string_list_clear(submodules, 1);
 }
 
@@ -1074,7 +1090,7 @@ int find_unpushed_submodules(struct repository *r,
 	collect_changed_submodules(r, &submodules, &argv);
 
 	for_each_string_list_item(name, &submodules) {
-		struct oid_array *commits = name->util;
+		struct changed_submodule_data *cs_data = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
 
@@ -1087,11 +1103,11 @@ int find_unpushed_submodules(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_needs_pushing(r, path, commits))
+		if (submodule_needs_pushing(r, path, &cs_data->new_commits))
 			string_list_insert(needs_pushing, path);
 	}
 
-	free_submodules_oids(&submodules);
+	free_submodules_data(&submodules);
 	strvec_clear(&argv);
 
 	return needs_pushing->nr;
@@ -1261,7 +1277,7 @@ static void calculate_changed_submodule_paths(struct repository *r,
 	collect_changed_submodules(r, changed_submodule_names, &argv);
 
 	for_each_string_list_item(name, changed_submodule_names) {
-		struct oid_array *commits = name->util;
+		struct changed_submodule_data *cs_data = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
 
@@ -1274,8 +1290,8 @@ static void calculate_changed_submodule_paths(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_has_commits(r, path, null_oid(), commits)) {
-			oid_array_clear(commits);
+		if (submodule_has_commits(r, path, null_oid(), &cs_data->new_commits)) {
+			changed_submodule_data_clear(cs_data);
 			*name->string = '\0';
 		}
 	}
@@ -1312,7 +1328,7 @@ int submodule_touches_in_range(struct repository *r,
 
 	strvec_clear(&args);
 
-	free_submodules_oids(&subs);
+	free_submodules_data(&subs);
 	return ret;
 }
 
@@ -1596,7 +1612,7 @@ static int fetch_finish(int retvalue, struct strbuf *err,
 	struct fetch_task *task = task_cb;
 
 	struct string_list_item *it;
-	struct oid_array *commits;
+	struct changed_submodule_data *cs_data;
 
 	if (!task || !task->sub)
 		BUG("callback cookie bogus");
@@ -1624,14 +1640,14 @@ static int fetch_finish(int retvalue, struct strbuf *err,
 		/* Could be an unchanged submodule, not contained in the list */
 		goto out;
 
-	commits = it->util;
-	oid_array_filter(commits,
+	cs_data = it->util;
+	oid_array_filter(&cs_data->new_commits,
 			 commit_missing_in_sub,
 			 task->repo);
 
 	/* Are there commits we want, but do not exist? */
-	if (commits->nr) {
-		task->commits = commits;
+	if (cs_data->new_commits.nr) {
+		task->commits = &cs_data->new_commits;
 		ALLOC_GROW(spf->oid_fetch_tasks,
 			   spf->oid_fetch_tasks_nr + 1,
 			   spf->oid_fetch_tasks_alloc);
@@ -1689,7 +1705,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	strvec_clear(&spf.args);
 out:
-	free_submodules_oids(&spf.changed_submodule_names);
+	free_submodules_data(&spf.changed_submodule_names);
 	return spf.result;
 }
 
-- 
2.33.GIT


* [PATCH v4 07/10] submodule: extract get_fetch_task()
  2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                         ` (5 preceding siblings ...)
  2022-03-04  0:57       ` [PATCH v4 06/10] submodule: store new submodule commits oid_array in a struct Glen Choo
@ 2022-03-04  0:57       ` Glen Choo
  2022-03-04  0:57       ` [PATCH v4 08/10] submodule: move logic into fetch_task_create() Glen Choo
                         ` (4 subsequent siblings)
  11 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-04  0:57 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

get_next_submodule() configures the parallel submodule fetch by
performing two functions:

* iterate the index to find submodules
* configure the child processes to fetch the submodules found in the
  previous step

Extract the index iterating code into an iterator function,
get_fetch_task(), so that get_next_submodule() is agnostic of how
to find submodules. This prepares for a subsequent commit that will teach
the fetch machinery to also iterate through the list of changed
submodules (in addition to the index).
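
The split can be illustrated with a toy iterator that owns the cursor
and a consumer that only configures the job. The names toy_entry,
toy_fetch_state, toy_get_fetch_task() and toy_get_next_submodule() are
hypothetical; the "job" is just the submodule name, standing in for the
child_process setup. As in the patch, the iterator advances the cursor
past the entry it hands out, so the next call resumes after it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for index entries and fetch state. */
struct toy_entry {
	const char *name;
	int is_gitlink;
};

struct toy_fetch_state {
	const struct toy_entry *entries;
	size_t nr;
	size_t count;	/* cursor, like spf->count */
};

/* Iterator: return the next submodule entry, or NULL when exhausted. */
static const struct toy_entry *toy_get_fetch_task(struct toy_fetch_state *st)
{
	for (; st->count < st->nr; st->count++) {
		const struct toy_entry *e = &st->entries[st->count];

		if (!e->is_gitlink)
			continue;
		st->count++;	/* advance past the entry we hand out */
		return e;
	}
	return NULL;
}

/* Consumer: knows how to configure a job, not how tasks are found. */
static int toy_get_next_submodule(struct toy_fetch_state *st, const char **job)
{
	const struct toy_entry *task = toy_get_fetch_task(st);

	if (!task)
		return 0;
	*job = task->name;	/* stands in for building the child_process */
	return 1;
}
```

Because the consumer never looks at the cursor, a second source of
tasks can later be chained behind the first without changing it.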

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 61 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 36 insertions(+), 25 deletions(-)

diff --git a/submodule.c b/submodule.c
index 0b9c25f9d3..7a5316b6f7 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1389,6 +1389,7 @@ struct fetch_task {
 	struct repository *repo;
 	const struct submodule *sub;
 	unsigned free_sub : 1; /* Do we need to free the submodule? */
+	const char *default_argv; /* The default fetch mode. */
 
 	struct oid_array *commits; /* Ensure these commits are fetched */
 };
@@ -1466,14 +1467,11 @@ static struct repository *get_submodule_repo_for(struct repository *r,
 	return ret;
 }
 
-static int get_next_submodule(struct child_process *cp,
-			      struct strbuf *err, void *data, void **task_cb)
+static struct fetch_task *
+get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 {
-	struct submodule_parallel_fetch *spf = data;
-
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
-		const char *default_argv;
 		struct fetch_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -1493,10 +1491,10 @@ static int get_next_submodule(struct child_process *cp,
 					&spf->changed_submodule_names,
 					task->sub->name))
 				continue;
-			default_argv = "on-demand";
+			task->default_argv = "on-demand";
 			break;
 		case RECURSE_SUBMODULES_ON:
-			default_argv = "yes";
+			task->default_argv = "yes";
 			break;
 		case RECURSE_SUBMODULES_OFF:
 			continue;
@@ -1504,29 +1502,12 @@ static int get_next_submodule(struct child_process *cp,
 
 		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
 		if (task->repo) {
-			struct strbuf submodule_prefix = STRBUF_INIT;
-			child_process_init(cp);
-			cp->dir = task->repo->gitdir;
-			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
-			cp->git_cmd = 1;
 			if (!spf->quiet)
 				strbuf_addf(err, _("Fetching submodule %s%s\n"),
 					    spf->prefix, ce->name);
-			strvec_init(&cp->args);
-			strvec_pushv(&cp->args, spf->args.v);
-			strvec_push(&cp->args, default_argv);
-			strvec_push(&cp->args, "--submodule-prefix");
-
-			strbuf_addf(&submodule_prefix, "%s%s/",
-						       spf->prefix,
-						       task->sub->path);
-			strvec_push(&cp->args, submodule_prefix.buf);
 
 			spf->count++;
-			*task_cb = task;
-
-			strbuf_release(&submodule_prefix);
-			return 1;
+			return task;
 		} else {
 			struct strbuf empty_submodule_path = STRBUF_INIT;
 
@@ -1550,6 +1531,36 @@ static int get_next_submodule(struct child_process *cp,
 			strbuf_release(&empty_submodule_path);
 		}
 	}
+	return NULL;
+}
+
+static int get_next_submodule(struct child_process *cp, struct strbuf *err,
+			      void *data, void **task_cb)
+{
+	struct submodule_parallel_fetch *spf = data;
+	struct fetch_task *task = get_fetch_task(spf, err);
+
+	if (task) {
+		struct strbuf submodule_prefix = STRBUF_INIT;
+
+		child_process_init(cp);
+		cp->dir = task->repo->gitdir;
+		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+		cp->git_cmd = 1;
+		strvec_init(&cp->args);
+		strvec_pushv(&cp->args, spf->args.v);
+		strvec_push(&cp->args, task->default_argv);
+		strvec_push(&cp->args, "--submodule-prefix");
+
+		strbuf_addf(&submodule_prefix, "%s%s/",
+						spf->prefix,
+						task->sub->path);
+		strvec_push(&cp->args, submodule_prefix.buf);
+		*task_cb = task;
+
+		strbuf_release(&submodule_prefix);
+		return 1;
+	}
 
 	if (spf->oid_fetch_tasks_nr) {
 		struct fetch_task *task =
-- 
2.33.GIT


* [PATCH v4 08/10] submodule: move logic into fetch_task_create()
  2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                         ` (6 preceding siblings ...)
  2022-03-04  0:57       ` [PATCH v4 07/10] submodule: extract get_fetch_task() Glen Choo
@ 2022-03-04  0:57       ` Glen Choo
  2022-03-04  0:57       ` [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
                         ` (3 subsequent siblings)
  11 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-04  0:57 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

get_fetch_task() gets a fetch task by iterating the index; a future
commit will introduce a similar function, get_fetch_task_from_changed(),
that gets a fetch task from the list of changed submodules. Both
functions are similar in that they need to:

* create a fetch task
* initialize the submodule repo for the fetch task
* determine the default recursion mode

Move all of this logic into fetch_task_create() so that it is no longer
split between fetch_task_create() and get_fetch_task(). This will make
it easier to share code with get_fetch_task_from_changed().
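
The "determine the default recursion mode" step reduces to a small
decision function. This is a hypothetical sketch: the toy_* enum and
helpers are not git's, and a NULL return stands in for the "goto
cleanup" paths, where no fetch task is created.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical copy of the recursion modes the patch switches on. */
enum toy_recurse_mode {
	TOY_RECURSE_DEFAULT,
	TOY_RECURSE_ON_DEMAND,
	TOY_RECURSE_ON,
	TOY_RECURSE_OFF,
};

/* Hypothetical stand-in for string_list_lookup() on changed names. */
static int toy_is_changed(const char *const *changed, size_t nr, const char *name)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(changed[i], name))
			return 1;
	return 0;
}

/*
 * Return the default fetch mode for a submodule, or NULL if no fetch
 * task should be created.
 */
static const char *toy_default_argv(enum toy_recurse_mode mode, const char *name,
				    const char *const *changed, size_t nr)
{
	switch (mode) {
	default:
	case TOY_RECURSE_DEFAULT:
	case TOY_RECURSE_ON_DEMAND:
		/* only fetch submodules that the superproject changed */
		return toy_is_changed(changed, nr, name) ? "on-demand" : NULL;
	case TOY_RECURSE_ON:
		return "yes";
	case TOY_RECURSE_OFF:
		return NULL;
	}
}
```

With the decision in one place, both the index-based iterator and a
changed-submodules iterator can call the same creation function.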

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 99 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 52 insertions(+), 47 deletions(-)

diff --git a/submodule.c b/submodule.c
index 7a5316b6f7..b36ef26752 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1415,32 +1415,6 @@ static const struct submodule *get_non_gitmodules_submodule(const char *path)
 	return (const struct submodule *) ret;
 }
 
-static struct fetch_task *fetch_task_create(struct repository *r,
-					    const char *path,
-					    const struct object_id *treeish_name)
-{
-	struct fetch_task *task = xmalloc(sizeof(*task));
-	memset(task, 0, sizeof(*task));
-
-	task->sub = submodule_from_path(r, treeish_name, path);
-	if (!task->sub) {
-		/*
-		 * No entry in .gitmodules? Technically not a submodule,
-		 * but historically we supported repositories that happen to be
-		 * in-place where a gitlink is. Keep supporting them.
-		 */
-		task->sub = get_non_gitmodules_submodule(path);
-		if (!task->sub) {
-			free(task);
-			return NULL;
-		}
-
-		task->free_sub = 1;
-	}
-
-	return task;
-}
-
 static void fetch_task_release(struct fetch_task *p)
 {
 	if (p->free_sub)
@@ -1467,6 +1441,57 @@ static struct repository *get_submodule_repo_for(struct repository *r,
 	return ret;
 }
 
+static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf,
+					    const char *path,
+					    const struct object_id *treeish_name)
+{
+	struct fetch_task *task = xmalloc(sizeof(*task));
+	memset(task, 0, sizeof(*task));
+
+	task->sub = submodule_from_path(spf->r, treeish_name, path);
+
+	if (!task->sub) {
+		/*
+		 * No entry in .gitmodules? Technically not a submodule,
+		 * but historically we supported repositories that happen to be
+		 * in-place where a gitlink is. Keep supporting them.
+		 */
+		task->sub = get_non_gitmodules_submodule(path);
+		if (!task->sub)
+			goto cleanup;
+
+		task->free_sub = 1;
+	}
+
+	switch (get_fetch_recurse_config(task->sub, spf))
+	{
+	default:
+	case RECURSE_SUBMODULES_DEFAULT:
+	case RECURSE_SUBMODULES_ON_DEMAND:
+		if (!task->sub ||
+			!string_list_lookup(
+				&spf->changed_submodule_names,
+				task->sub->name))
+			goto cleanup;
+		task->default_argv = "on-demand";
+		break;
+	case RECURSE_SUBMODULES_ON:
+		task->default_argv = "yes";
+		break;
+	case RECURSE_SUBMODULES_OFF:
+		goto cleanup;
+	}
+
+	task->repo = get_submodule_repo_for(spf->r, path, treeish_name);
+
+	return task;
+
+ cleanup:
+	fetch_task_release(task);
+	free(task);
+	return NULL;
+}
+
 static struct fetch_task *
 get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 {
@@ -1477,30 +1502,10 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 		if (!S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		task = fetch_task_create(spf->r, ce->name, null_oid());
+		task = fetch_task_create(spf, ce->name, null_oid());
 		if (!task)
 			continue;
 
-		switch (get_fetch_recurse_config(task->sub, spf))
-		{
-		default:
-		case RECURSE_SUBMODULES_DEFAULT:
-		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!task->sub ||
-			    !string_list_lookup(
-					&spf->changed_submodule_names,
-					task->sub->name))
-				continue;
-			task->default_argv = "on-demand";
-			break;
-		case RECURSE_SUBMODULES_ON:
-			task->default_argv = "yes";
-			break;
-		case RECURSE_SUBMODULES_OFF:
-			continue;
-		}
-
-		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
 		if (task->repo) {
 			if (!spf->quiet)
 				strbuf_addf(err, _("Fetching submodule %s%s\n"),
-- 
2.33.GIT



* [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules
  2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                         ` (7 preceding siblings ...)
  2022-03-04  0:57       ` [PATCH v4 08/10] submodule: move logic into fetch_task_create() Glen Choo
@ 2022-03-04  0:57       ` Glen Choo
  2022-03-04  2:37         ` Junio C Hamano
  2022-03-04 23:56         ` Jonathan Tan
  2022-03-04  0:57       ` [PATCH v4 10/10] submodule: fix latent check_has_commit() bug Glen Choo
                         ` (2 subsequent siblings)
  11 siblings, 2 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-04  0:57 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

"git fetch --recurse-submodules" only considers populated
submodules (i.e. submodules that can be found by iterating the index),
which makes "git fetch" behave differently based on which commit is
checked out. As a result, even if the user has initialized all submodules
correctly, "git fetch" may not fetch the necessary submodule commits, and
commands like "git checkout --recurse-submodules" might fail.

Teach "git fetch" to fetch cloned, changed submodules regardless of
whether they are populated. This is in addition to the current behavior
of fetching populated submodules (which is always attempted regardless
of what was fetched in the superproject, or even if nothing was fetched
in the superproject).
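
A rough model of which submodules end up being fetched, assuming the
semantics described in the fetch-options.txt hunk below: 'yes' fetches
every populated submodule plus changed, unpopulated ones; 'on-demand'
fetches only changed ones; 'no' fetches nothing. In every case, an
unpopulated submodule can only be fetched if it has already been cloned
(e.g. into $GIT_DIR/modules/). The enum, struct sub_state and
should_fetch() are hypothetical names used only for illustration, and a
populated submodule is assumed to always be cloned.

```c
#include <assert.h>

/* Hypothetical model of --recurse-submodules=[no|on-demand|yes]. */
enum fetch_mode { FETCH_NO, FETCH_ON_DEMAND, FETCH_YES };

struct sub_state {
	int populated; /* found by iterating the index */
	int changed;   /* new superproject commits point to missing commits */
	int cloned;    /* a git dir exists locally, e.g. in $GIT_DIR/modules/ */
};

static int should_fetch(enum fetch_mode mode, const struct sub_state *s)
{
	assert(s != 0);
	/* A submodule that upstream just added must be cloned first. */
	if (!s->cloned)
		return 0;
	switch (mode) {
	case FETCH_YES:
		return s->populated || s->changed;
	case FETCH_ON_DEMAND:
		return s->changed;
	case FETCH_NO:
	default:
		return 0;
	}
}
```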

A submodule may be encountered multiple times (via the list of
populated submodules or via the list of changed submodules). When this
happens, "git fetch" only reads the 'populated copy' and ignores the
'changed copy'. Amend the verify_fetch_result() test helper so that we
can assert on which 'copy' is being read.
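
The ordering and de-duplication can be sketched as follows, assuming a
simplified model in which every index entry is a fetch candidate (the
real code also consults the recursion mode and skips entries it cannot
fetch). struct picker, is_seen() and next_submodule() are hypothetical;
in the patch, the equivalent state lives in the index_count,
changed_count and seen_submodule_names members of struct
submodule_parallel_fetch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEEN 64

/* Hypothetical model of the two cursors and the "seen" list. */
struct picker {
	const char **index_names;   /* populated submodules, in index order */
	size_t index_nr, index_count;
	const char **changed_names; /* submodules with new commits */
	size_t changed_nr, changed_count;
	const char *seen[MAX_SEEN]; /* names already handed out */
	size_t seen_nr;
};

static int is_seen(const struct picker *p, const char *name)
{
	for (size_t i = 0; i < p->seen_nr; i++)
		if (!strcmp(p->seen[i], name))
			return 1;
	return 0;
}

/*
 * Return the next submodule to fetch, or NULL when both lists are
 * exhausted. The index is drained first; *from_commit is set to 1 only
 * for a "changed copy" read from a superproject commit.
 */
static const char *next_submodule(struct picker *p, int *from_commit)
{
	const char *name = NULL;

	while (!name && p->index_count < p->index_nr) {
		const char *n = p->index_names[p->index_count++];
		if (!is_seen(p, n)) {
			name = n;
			*from_commit = 0;
		}
	}
	while (!name && p->changed_count < p->changed_nr) {
		const char *n = p->changed_names[p->changed_count++];
		if (!is_seen(p, n)) {
			name = n;
			*from_commit = 1;
		}
	}
	if (name) {
		assert(p->seen_nr < MAX_SEEN);
		p->seen[p->seen_nr++] = name;
	}
	return name;
}
```

Because the index is drained first, a submodule that is both populated
and changed is handed out as its 'populated copy', and the later
'changed copy' is skipped.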

Signed-off-by: Glen Choo <chooglen@google.com>
---
 Documentation/fetch-options.txt |  26 ++--
 Documentation/git-fetch.txt     |  10 +-
 builtin/fetch.c                 |  14 +-
 submodule.c                     | 178 +++++++++++++++++++--
 submodule.h                     |  12 +-
 t/t5526-fetch-submodules.sh     | 263 +++++++++++++++++++++++++++++++-
 6 files changed, 457 insertions(+), 46 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index f903683189..6cdd9d43c5 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -186,15 +186,23 @@ endif::git-pull[]
 ifndef::git-pull[]
 --recurse-submodules[=yes|on-demand|no]::
 	This option controls if and under what conditions new commits of
-	populated submodules should be fetched too. It can be used as a
-	boolean option to completely disable recursion when set to 'no' or to
-	unconditionally recurse into all populated submodules when set to
-	'yes', which is the default when this option is used without any
-	value. Use 'on-demand' to only recurse into a populated submodule
-	when the superproject retrieves a commit that updates the submodule's
-	reference to a commit that isn't already in the local submodule
-	clone. By default, 'on-demand' is used, unless
-	`fetch.recurseSubmodules` is set (see linkgit:git-config[1]).
+	submodules should be fetched too. When recursing through submodules,
+	`git fetch` always attempts to fetch "changed" submodules, that is, a
+	submodule that has commits that are referenced by a newly fetched
+	superproject commit but are missing in the local submodule clone. A
+	changed submodule can be fetched as long as it is present locally e.g.
+	in `$GIT_DIR/modules/` (see linkgit:gitsubmodules[7]); if the upstream
+	adds a new submodule, that submodule cannot be fetched until it is
+	cloned e.g. by `git submodule update`.
++
+When set to 'on-demand', only changed submodules are fetched. When set
+to 'yes', all populated submodules are fetched and submodules that are
+both unpopulated and changed are fetched. When set to 'no', submodules
+are never fetched.
++
+When unspecified, this uses the value of `fetch.recurseSubmodules` if it
+is set (see linkgit:git-config[1]), defaulting to 'on-demand' if unset.
+When this option is used without any value, it defaults to 'yes'.
 endif::git-pull[]
 
 -j::
diff --git a/Documentation/git-fetch.txt b/Documentation/git-fetch.txt
index 550c16ca61..e9d364669a 100644
--- a/Documentation/git-fetch.txt
+++ b/Documentation/git-fetch.txt
@@ -287,12 +287,10 @@ include::transfer-data-leaks.txt[]
 
 BUGS
 ----
-Using --recurse-submodules can only fetch new commits in already checked
-out submodules right now. When e.g. upstream added a new submodule in the
-just fetched commits of the superproject the submodule itself cannot be
-fetched, making it impossible to check out that submodule later without
-having to do a fetch again. This is expected to be fixed in a future Git
-version.
+Using --recurse-submodules can only fetch new commits in submodules that are
+present locally e.g. in `$GIT_DIR/modules/`. If the upstream adds a new
+submodule, that submodule cannot be fetched until it is cloned e.g. by `git
+submodule update`. This is expected to be fixed in a future Git version.
 
 SEE ALSO
 --------
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 95832ba1df..97a89763c8 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2178,13 +2178,13 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 			max_children = fetch_parallel_config;
 
 		add_options_to_argv(&options);
-		result = fetch_populated_submodules(the_repository,
-						    &options,
-						    submodule_prefix,
-						    recurse_submodules,
-						    recurse_submodules_default,
-						    verbosity < 0,
-						    max_children);
+		result = fetch_submodules(the_repository,
+					  &options,
+					  submodule_prefix,
+					  recurse_submodules,
+					  recurse_submodules_default,
+					  verbosity < 0,
+					  max_children);
 		strvec_clear(&options);
 	}
 
diff --git a/submodule.c b/submodule.c
index b36ef26752..1f5f39ce18 100644
--- a/submodule.c
+++ b/submodule.c
@@ -808,9 +808,25 @@ static const char *default_name_or_path(const char *path_or_name)
 
 /*
  * Holds relevant information for a changed submodule. Used as the .util
- * member of the changed submodule string_list_item.
+ * member of the changed submodule name string_list_item.
+ *
+ * (super_oid, path) allows the submodule config to be read from _some_
+ * .gitmodules file. We store this information the first time we find a
+ * superproject commit that points to the submodule, but this is
+ * arbitrary - we can choose any (super_oid, path) that matches the
+ * submodule's name.
  */
 struct changed_submodule_data {
+	/*
+	 * The first superproject commit in the rev walk that points to
+	 * the submodule.
+	 */
+	const struct object_id *super_oid;
+	/*
+	 * Path to the submodule in the superproject commit referenced
+	 * by 'super_oid'.
+	 */
+	char *path;
 	/* The submodule commits that have changed in the rev walk. */
 	struct oid_array new_commits;
 };
@@ -818,6 +834,7 @@ struct changed_submodule_data {
 static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
 {
 	oid_array_clear(&cs_data->new_commits);
+	free(cs_data->path);
 }
 
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
@@ -862,9 +879,14 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 			continue;
 
 		item = string_list_insert(changed, name);
-		if (!item->util)
+		if (item->util)
+			cs_data = item->util;
+		else {
 			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
-		cs_data = item->util;
+			cs_data = item->util;
+			cs_data->super_oid = commit_oid;
+			cs_data->path = xstrdup(p->two->path);
+		}
 		oid_array_append(&cs_data->new_commits, &p->two->oid);
 	}
 }
@@ -1253,14 +1275,36 @@ void check_for_new_submodule_commits(struct object_id *oid)
 	oid_array_append(&ref_tips_after_fetch, oid);
 }
 
+/*
+ * Returns 1 if there is at least one submodule gitdir in
+ * $GIT_DIR/modules and 0 otherwise. This follows
+ * submodule_name_to_gitdir(), which looks for submodules in
+ * $GIT_DIR/modules, not $GIT_COMMON_DIR.
+ *
+ * A submodule can be moved to $GIT_DIR/modules manually by running "git
+ * submodule absorbgitdirs", or it may be initialized there by "git
+ * submodule update".
+ */
+static int repo_has_absorbed_submodules(struct repository *r)
+{
+	int ret;
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_repo_git_path(&buf, r, "modules/");
+	ret = file_exists(buf.buf) && !is_empty_dir(buf.buf);
+	strbuf_release(&buf);
+	return ret;
+}
+
 static void calculate_changed_submodule_paths(struct repository *r,
 		struct string_list *changed_submodule_names)
 {
 	struct strvec argv = STRVEC_INIT;
 	struct string_list_item *name;
 
-	/* No need to check if there are no submodules configured */
-	if (!submodule_from_path(r, NULL, NULL))
+	/* No need to check if no submodules would be fetched */
+	if (!submodule_from_path(r, NULL, NULL) &&
+	    !repo_has_absorbed_submodules(r))
 		return;
 
 	strvec_push(&argv, "--"); /* argv[0] program name */
@@ -1333,7 +1377,16 @@ int submodule_touches_in_range(struct repository *r,
 }
 
 struct submodule_parallel_fetch {
-	int count;
+	/*
+	 * The index of the last index entry processed by
+	 * get_fetch_task_from_index().
+	 */
+	int index_count;
+	/*
+	 * The index of the last string_list entry processed by
+	 * get_fetch_task_from_changed().
+	 */
+	int changed_count;
 	struct strvec args;
 	struct repository *r;
 	const char *prefix;
@@ -1342,7 +1395,16 @@ struct submodule_parallel_fetch {
 	int quiet;
 	int result;
 
+	/*
+	 * Names of submodules that have new commits. Generated by
+	 * walking the newly fetched superproject commits.
+	 */
 	struct string_list changed_submodule_names;
+	/*
+	 * Names of submodules that have already been processed. Lets us
+	 * avoid fetching the same submodule more than once.
+	 */
+	struct string_list seen_submodule_names;
 
 	/* Pending fetches by OIDs */
 	struct fetch_task **oid_fetch_tasks;
@@ -1353,6 +1415,7 @@ struct submodule_parallel_fetch {
 #define SPF_INIT { \
 	.args = STRVEC_INIT, \
 	.changed_submodule_names = STRING_LIST_INIT_DUP, \
+	.seen_submodule_names = STRING_LIST_INIT_DUP, \
 	.submodules_with_errors = STRBUF_INIT, \
 }
 
@@ -1390,6 +1453,7 @@ struct fetch_task {
 	const struct submodule *sub;
 	unsigned free_sub : 1; /* Do we need to free the submodule? */
 	const char *default_argv; /* The default fetch mode. */
+	struct strvec git_args; /* Args for the child git process. */
 
 	struct oid_array *commits; /* Ensure these commits are fetched */
 };
@@ -1425,6 +1489,8 @@ static void fetch_task_release(struct fetch_task *p)
 	if (p->repo)
 		repo_clear(p->repo);
 	FREE_AND_NULL(p->repo);
+
+	strvec_clear(&p->git_args);
 }
 
 static struct repository *get_submodule_repo_for(struct repository *r,
@@ -1463,6 +1529,9 @@ static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf
 		task->free_sub = 1;
 	}
 
+	if (string_list_lookup(&spf->seen_submodule_names, task->sub->name))
+		goto cleanup;
+
 	switch (get_fetch_recurse_config(task->sub, spf))
 	{
 	default:
@@ -1493,10 +1562,12 @@ static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf
 }
 
 static struct fetch_task *
-get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
+get_fetch_task_from_index(struct submodule_parallel_fetch *spf,
+			  struct strbuf *err)
 {
-	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		const struct cache_entry *ce = spf->r->index->cache[spf->count];
+	for (; spf->index_count < spf->r->index->cache_nr; spf->index_count++) {
+		const struct cache_entry *ce =
+			spf->r->index->cache[spf->index_count];
 		struct fetch_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -1511,7 +1582,7 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 				strbuf_addf(err, _("Fetching submodule %s%s\n"),
 					    spf->prefix, ce->name);
 
-			spf->count++;
+			spf->index_count++;
 			return task;
 		} else {
 			struct strbuf empty_submodule_path = STRBUF_INIT;
@@ -1539,11 +1610,83 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 	return NULL;
 }
 
+static struct fetch_task *
+get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
+			    struct strbuf *err)
+{
+	for (; spf->changed_count < spf->changed_submodule_names.nr;
+	     spf->changed_count++) {
+		struct string_list_item item =
+			spf->changed_submodule_names.items[spf->changed_count];
+		struct changed_submodule_data *cs_data = item.util;
+		struct fetch_task *task;
+
+		if (!is_tree_submodule_active(spf->r, cs_data->super_oid, cs_data->path))
+			continue;
+
+		task = fetch_task_create(spf, cs_data->path,
+					 cs_data->super_oid);
+		if (!task)
+			continue;
+
+		if (!task->repo) {
+			strbuf_addf(err, _("Could not access submodule '%s' at commit %s\n"),
+				    cs_data->path,
+				    find_unique_abbrev(cs_data->super_oid, DEFAULT_ABBREV));
+
+			fetch_task_release(task);
+			free(task);
+			continue;
+		}
+
+		if (!spf->quiet)
+			strbuf_addf(err,
+				    _("Fetching submodule %s%s at commit %s\n"),
+				    spf->prefix, task->sub->path,
+				    find_unique_abbrev(cs_data->super_oid,
+						       DEFAULT_ABBREV));
+
+		spf->changed_count++;
+		/*
+		 * NEEDSWORK: Submodules set/unset a value for
+		 * core.worktree when they are populated/unpopulated by
+		 * "git checkout" (and similar commands, see
+		 * submodule_move_head() and
+		 * connect_work_tree_and_git_dir()), but if the
+		 * submodule is unpopulated in another way (e.g. "git
+		 * rm", "rm -r"), core.worktree will still be set even
+		 * though the directory doesn't exist, and the child
+		 * process will crash while trying to chdir into the
+		 * nonexistent directory.
+		 *
+		 * In this case, we know that the submodule has no
+		 * working tree, so we can work around this by
+		 * setting "--work-tree=." (--bare does not work because
+		 * worktree settings take precedence over bare-ness).
+		 * However, this is not necessarily true in other cases,
+		 * so a generalized solution is still necessary.
+		 *
+		 * Possible solutions:
+		 * - teach "git [add|rm]" to unset core.worktree and
+		 *   discourage users from removing submodules without
+		 *   using a Git command.
+		 * - teach submodule child processes to ignore stale
+		 *   core.worktree values.
+		 */
+		strvec_push(&task->git_args, "--work-tree=.");
+		return task;
+	}
+	return NULL;
+}
+
 static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 			      void *data, void **task_cb)
 {
 	struct submodule_parallel_fetch *spf = data;
-	struct fetch_task *task = get_fetch_task(spf, err);
+	struct fetch_task *task =
+		get_fetch_task_from_index(spf, err);
+	if (!task)
+		task = get_fetch_task_from_changed(spf, err);
 
 	if (task) {
 		struct strbuf submodule_prefix = STRBUF_INIT;
@@ -1553,6 +1696,8 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
 		cp->git_cmd = 1;
 		strvec_init(&cp->args);
+		if (task->git_args.nr)
+			strvec_pushv(&cp->args, task->git_args.v);
 		strvec_pushv(&cp->args, spf->args.v);
 		strvec_push(&cp->args, task->default_argv);
 		strvec_push(&cp->args, "--submodule-prefix");
@@ -1564,6 +1709,7 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 		*task_cb = task;
 
 		strbuf_release(&submodule_prefix);
+		string_list_insert(&spf->seen_submodule_names, task->sub->name);
 		return 1;
 	}
 
@@ -1678,11 +1824,11 @@ static int fetch_finish(int retvalue, struct strbuf *err,
 	return 0;
 }
 
-int fetch_populated_submodules(struct repository *r,
-			       const struct strvec *options,
-			       const char *prefix, int command_line_option,
-			       int default_option,
-			       int quiet, int max_parallel_jobs)
+int fetch_submodules(struct repository *r,
+		     const struct strvec *options,
+		     const char *prefix, int command_line_option,
+		     int default_option,
+		     int quiet, int max_parallel_jobs)
 {
 	int i;
 	struct submodule_parallel_fetch spf = SPF_INIT;
diff --git a/submodule.h b/submodule.h
index 784ceffc0e..61bebde319 100644
--- a/submodule.h
+++ b/submodule.h
@@ -88,12 +88,12 @@ int should_update_submodules(void);
  */
 const struct submodule *submodule_from_ce(const struct cache_entry *ce);
 void check_for_new_submodule_commits(struct object_id *oid);
-int fetch_populated_submodules(struct repository *r,
-			       const struct strvec *options,
-			       const char *prefix,
-			       int command_line_option,
-			       int default_option,
-			       int quiet, int max_parallel_jobs);
+int fetch_submodules(struct repository *r,
+		     const struct strvec *options,
+		     const char *prefix,
+		     int command_line_option,
+		     int default_option,
+		     int quiet, int max_parallel_jobs);
 unsigned is_submodule_modified(const char *path, int ignore_untracked);
 int submodule_uses_gitfile(const char *path);
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 4cae2e4f7c..e844ae9e42 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -12,8 +12,9 @@ pwd=$(pwd)
 
 check_sub () {
 	NEW_HEAD=$1 &&
+	SUPER_HEAD=$2 &&
 	cat >$pwd/expect.err.sub <<-EOF
-	Fetching submodule submodule
+	Fetching submodule submodule${SUPER_HEAD:+ at commit $SUPER_HEAD}
 	From $pwd/submodule
 	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
 	EOF
@@ -21,8 +22,9 @@ check_sub () {
 
 check_deep () {
 	NEW_HEAD=$1 &&
+	SUB_HEAD=$2 &&
 	cat >$pwd/expect.err.deep <<-EOF
-	Fetching submodule submodule/subdir/deepsubmodule
+	Fetching submodule submodule/subdir/deepsubmodule${SUB_HEAD:+ at commit $SUB_HEAD}
 	From $pwd/deepsubmodule
 	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
 	EOF
@@ -399,6 +401,7 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 '
 
 test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" '
+	add_submodule_commits &&
 	add_superproject_commits &&
 	(
 		cd downstream &&
@@ -418,6 +421,155 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 	verify_fetch_result actual.err
 '
 
+# Test that we can fetch submodules in other branches by running fetch
+# in a commit that has no submodules.
+test_expect_success 'setup downstream branch without submodules' '
+	(
+		cd downstream &&
+		git checkout --recurse-submodules -b no-submodules &&
+		git rm .gitmodules &&
+		git rm submodule &&
+		git commit -m "no submodules" &&
+		git checkout --recurse-submodules super
+	)
+'
+
+test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '
+	add_submodule_commits &&
+	add_superproject_commits &&
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
+	) &&
+	super_head=$(git rev-parse --short HEAD) &&
+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
+	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
+
+	# assert that these are fetched from commits, not the index
+	check_sub $sub_head $super_head &&
+	check_deep $deep_head $sub_head &&
+
+	test_must_be_empty actual.out &&
+	verify_fetch_result actual.err
+'
+
+test_expect_success "'--recurse-submodules' should fetch submodule commits if the submodule is changed but the index has no submodules" '
+	add_submodule_commits &&
+	add_superproject_commits &&
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	super_head=$(git rev-parse --short HEAD) &&
+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
+	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
+
+	# assert that these are fetched from commits, not the index
+	check_sub $sub_head $super_head &&
+	check_deep $deep_head $sub_head &&
+
+	test_must_be_empty actual.out &&
+	verify_fetch_result actual.err
+'
+
+test_expect_success "'--recurse-submodules' should ignore changed, inactive submodules" '
+	add_submodule_commits &&
+	add_superproject_commits &&
+
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git -c submodule.submodule.active=false fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+	super_head=$(git rev-parse --short HEAD) &&
+	check_super $super_head &&
+	# Neither should be fetched because the submodule is inactive
+	rm expect.err.sub &&
+	rm expect.err.deep &&
+	verify_fetch_result actual.err
+'
+
+# In downstream, init "submodule2", but do not check it out while
+# fetching. This lets us assert that unpopulated submodules can be
+# fetched.
+test_expect_success 'setup downstream branch with other submodule' '
+	mkdir submodule2 &&
+	(
+		cd submodule2 &&
+		git init &&
+		echo sub2content >sub2file &&
+		git add sub2file &&
+		git commit -a -m new &&
+		git branch -M sub2
+	) &&
+	git checkout -b super-sub2-only &&
+	git submodule add "$pwd/submodule2" submodule2 &&
+	git commit -m "add sub2" &&
+	git checkout super &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules origin &&
+		git checkout super-sub2-only &&
+		# Explicitly run "git submodule update" because sub2 is new
+		# and has not been cloned.
+		git submodule update --init &&
+		git checkout --recurse-submodules super
+	)
+'
+
+test_expect_success "'--recurse-submodules' should fetch submodule commits in changed submodules and the index" '
+	# Create new commit in origin/super
+	add_submodule_commits &&
+	add_superproject_commits &&
+
+	# Create new commit in origin/super-sub2-only
+	git checkout super-sub2-only &&
+	(
+		cd submodule2 &&
+		test_commit --no-tag foo
+	) &&
+	git add submodule2 &&
+	git commit -m "new submodule2" &&
+
+	git checkout super &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
+	sub2_head=$(git -C submodule2 rev-parse --short HEAD) &&
+	super_head=$(git rev-parse --short HEAD) &&
+	super_sub2_only_head=$(git rev-parse --short super-sub2-only) &&
+
+	# Use test_cmp manually because verify_fetch_result does not
+	# consider submodule2. All the repos should be fetched, but only
+	# submodule2 should be read from a commit
+	cat > expect.err.combined <<-EOF &&
+	From $pwd/.
+	   OLD_HEAD..$super_head  super           -> origin/super
+	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
+	Fetching submodule submodule
+	From $pwd/submodule
+	   OLD_HEAD..$sub_head  sub        -> origin/sub
+	Fetching submodule submodule/subdir/deepsubmodule
+	From $pwd/deepsubmodule
+	   OLD_HEAD..$deep_head  deep       -> origin/deep
+	Fetching submodule submodule2 at commit $super_sub2_only_head
+	From $pwd/submodule2
+	   OLD_HEAD..$sub2_head  sub2       -> origin/sub2
+	EOF
+	sed -e "s/[0-9a-f][0-9a-f]*\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
+	test_cmp expect.err.combined actual.err.cmp
+'
+
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
 	add_submodule_commits &&
 	echo a >> file &&
@@ -860,4 +1012,111 @@ test_expect_success 'recursive fetch after deinit a submodule' '
 	test_cmp expect actual
 '
 
+test_expect_success 'setup repo with upstreams that share a submodule name' '
+	mkdir same-name-1 &&
+	(
+		cd same-name-1 &&
+		git init -b main &&
+		test_commit --no-tag a
+	) &&
+	git clone same-name-1 same-name-2 &&
+	# same-name-1 and same-name-2 both add a submodule with the
+	# name "submodule"
+	(
+		cd same-name-1 &&
+		mkdir submodule &&
+		git -C submodule init -b main &&
+		test_commit -C submodule --no-tag a1 &&
+		git submodule add "$pwd/same-name-1/submodule" &&
+		git add submodule &&
+		git commit -m "super-a1"
+	) &&
+	(
+		cd same-name-2 &&
+		mkdir submodule &&
+		git -C submodule init -b main &&
+		test_commit -C submodule --no-tag a2 &&
+		git submodule add "$pwd/same-name-2/submodule" &&
+		git add submodule &&
+		git commit -m "super-a2"
+	) &&
+	git clone same-name-1 -o same-name-1 same-name-downstream &&
+	(
+		cd same-name-downstream &&
+		git remote add same-name-2 ../same-name-2 &&
+		git fetch --all &&
+		# init downstream with same-name-1
+		git submodule update --init
+	)
+'
+
+test_expect_success 'fetch --recurse-submodules updates name-conflicted, populated submodule' '
+	test_when_finished "git -C same-name-downstream checkout main" &&
+	(
+		cd same-name-1 &&
+		test_commit -C submodule --no-tag b1 &&
+		git add submodule &&
+		git commit -m "super-b1"
+	) &&
+	(
+		cd same-name-2 &&
+		test_commit -C submodule --no-tag b2 &&
+		git add submodule &&
+		git commit -m "super-b2"
+	) &&
+	(
+		cd same-name-downstream &&
+		# even though the .gitmodules is correct, we cannot
+		# fetch from same-name-2
+		git checkout same-name-2/main &&
+		git fetch --recurse-submodules same-name-1 &&
+		test_must_fail git fetch --recurse-submodules same-name-2
+	) &&
+	super_head1=$(git -C same-name-1 rev-parse HEAD) &&
+	git -C same-name-downstream cat-file -e $super_head1 &&
+
+	super_head2=$(git -C same-name-2 rev-parse HEAD) &&
+	git -C same-name-downstream cat-file -e $super_head2 &&
+
+	sub_head1=$(git -C same-name-1/submodule rev-parse HEAD) &&
+	git -C same-name-downstream/submodule cat-file -e $sub_head1 &&
+
+	sub_head2=$(git -C same-name-2/submodule rev-parse HEAD) &&
+	test_must_fail git -C same-name-downstream/submodule cat-file -e $sub_head2
+'
+
+test_expect_success 'fetch --recurse-submodules updates name-conflicted, unpopulated submodule' '
+	(
+		cd same-name-1 &&
+		test_commit -C submodule --no-tag c1 &&
+		git add submodule &&
+		git commit -m "super-c1"
+	) &&
+	(
+		cd same-name-2 &&
+		test_commit -C submodule --no-tag c2 &&
+		git add submodule &&
+		git commit -m "super-c2"
+	) &&
+	(
+		cd same-name-downstream &&
+		git checkout main &&
+		git rm .gitmodules &&
+		git rm submodule &&
+		git commit -m "no submodules" &&
+		git fetch --recurse-submodules same-name-1
+	) &&
+	head1=$(git -C same-name-1/submodule rev-parse HEAD) &&
+	head2=$(git -C same-name-2/submodule rev-parse HEAD) &&
+	(
+		cd same-name-downstream/.git/modules/submodule &&
+		# The submodule has core.worktree pointing to the "git
+		# rm"-ed directory, overwrite the invalid value. See
+		# comment in get_fetch_task_from_changed() for more
+		# information.
+		git --work-tree=. cat-file -e $head1 &&
+		test_must_fail git --work-tree=. cat-file -e $head2
+	)
+'
+
 test_done
-- 
2.33.GIT



* [PATCH v4 10/10] submodule: fix latent check_has_commit() bug
  2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                         ` (8 preceding siblings ...)
  2022-03-04  0:57       ` [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
@ 2022-03-04  0:57       ` Glen Choo
  2022-03-04  2:17         ` Junio C Hamano
  2022-03-04  2:22       ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Junio C Hamano
  2022-03-08  0:14       ` [PATCH v5 " Glen Choo
  11 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-03-04  0:57 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

When check_has_commit() is called on a missing submodule, initialization
of the struct repository fails, but it attempts to clear the struct
anyway (which is a fatal error). This bug is masked by its only caller,
submodule_has_commits(), first calling add_submodule_odb(). The latter
fails if the submodule does not exist, making submodule_has_commits()
exit early and not invoke check_has_commit().

Fix this bug, and because calling add_submodule_odb() is no longer
necessary as of 13a2f620b2 (submodule: pass repo to
check_has_commit(), 2021-10-08), remove that call too.

This is the last caller of add_submodule_odb(), so remove that
function. (Submodule ODBs are still added as alternates via
add_submodule_odb_by_path().)
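
A minimal model of the corrected control flow, assuming a toy
repository struct whose clear function treats an uninitialized struct
as a bug (mirroring the fatal error described above). struct fake_repo,
fake_repo_init(), fake_repo_clear() and check_has_commit_sketch() are
hypothetical, and the object lookup is replaced by a placeholder.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct repository. */
struct fake_repo {
	int initialized;
	char *gitdir;
};

/* Returns -1 for a missing submodule, leaving *r untouched. */
static int fake_repo_init(struct fake_repo *r, const char *gitdir, int exists)
{
	if (!exists)
		return -1;
	r->gitdir = malloc(strlen(gitdir) + 1);
	if (!r->gitdir)
		return -1;
	strcpy(r->gitdir, gitdir);
	r->initialized = 1;
	return 0;
}

/* Clearing a struct that was never initialized is treated as a bug. */
static void fake_repo_clear(struct fake_repo *r)
{
	assert(r->initialized);
	free(r->gitdir);
	r->gitdir = NULL;
	r->initialized = 0;
}

/* Fixed pattern: only clean up what was successfully initialized. */
static int check_has_commit_sketch(const char *gitdir, int exists, int *result)
{
	struct fake_repo subrepo = { 0 };

	if (fake_repo_init(&subrepo, gitdir, exists)) {
		*result = 0;
		/* subrepo failed to init, so don't clean it up. */
		return 0;
	}
	*result = 1; /* placeholder for the real object lookup */
	fake_repo_clear(&subrepo);
	return 0;
}
```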

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 35 ++---------------------------------
 submodule.h |  9 ++++-----
 2 files changed, 6 insertions(+), 38 deletions(-)

diff --git a/submodule.c b/submodule.c
index 1f5f39ce18..6e6b2d04e4 100644
--- a/submodule.c
+++ b/submodule.c
@@ -167,26 +167,6 @@ void stage_updated_gitmodules(struct index_state *istate)
 
 static struct string_list added_submodule_odb_paths = STRING_LIST_INIT_NODUP;
 
-/* TODO: remove this function, use repo_submodule_init instead. */
-int add_submodule_odb(const char *path)
-{
-	struct strbuf objects_directory = STRBUF_INIT;
-	int ret = 0;
-
-	ret = strbuf_git_path_submodule(&objects_directory, path, "objects/");
-	if (ret)
-		goto done;
-	if (!is_directory(objects_directory.buf)) {
-		ret = -1;
-		goto done;
-	}
-	string_list_insert(&added_submodule_odb_paths,
-			   strbuf_detach(&objects_directory, NULL));
-done:
-	strbuf_release(&objects_directory);
-	return ret;
-}
-
 void add_submodule_odb_by_path(const char *path)
 {
 	string_list_insert(&added_submodule_odb_paths, xstrdup(path));
@@ -971,7 +951,8 @@ static int check_has_commit(const struct object_id *oid, void *data)
 
 	if (repo_submodule_init(&subrepo, cb->repo, cb->path, cb->super_oid)) {
 		cb->result = 0;
-		goto cleanup;
+		/* subrepo failed to init, so don't clean it up. */
+		return 0;
 	}
 
 	type = oid_object_info(&subrepo, oid, NULL);
@@ -1007,18 +988,6 @@ static int submodule_has_commits(struct repository *r,
 		.super_oid = super_oid
 	};
 
-	/*
-	 * Perform a cheap, but incorrect check for the existence of 'commits'.
-	 * This is done by adding the submodule's object store to the in-core
-	 * object store, and then querying for each commit's existence.  If we
-	 * do not have the commit object anywhere, there is no chance we have
-	 * it in the object store of the correct submodule and have it
-	 * reachable from a ref, so we can fail early without spawning rev-list
-	 * which is expensive.
-	 */
-	if (add_submodule_odb(path))
-		return 0;
-
 	oid_array_for_each_unique(commits, check_has_commit, &has_commit);
 
 	if (has_commit.result) {
diff --git a/submodule.h b/submodule.h
index 61bebde319..40c1445237 100644
--- a/submodule.h
+++ b/submodule.h
@@ -103,12 +103,11 @@ int submodule_uses_gitfile(const char *path);
 int bad_to_remove_submodule(const char *path, unsigned flags);
 
 /*
- * Call add_submodule_odb() to add the submodule at the given path to a list.
- * When register_all_submodule_odb_as_alternates() is called, the object stores
- * of all submodules in that list will be added as alternates in
- * the_repository.
+ * Call add_submodule_odb_by_path() to add the submodule at the given
+ * path to a list. When register_all_submodule_odb_as_alternates() is
+ * called, the object stores of all submodules in that list will be
+ * added as alternates in the_repository.
  */
-int add_submodule_odb(const char *path);
 void add_submodule_odb_by_path(const char *path);
 int register_all_submodule_odb_as_alternates(void);
 
-- 
2.33.GIT


* Re: [PATCH v4 01/10] t5526: introduce test helper to assert on fetches
  2022-03-04  0:57       ` [PATCH v4 01/10] t5526: introduce test helper to assert on fetches Glen Choo
@ 2022-03-04  2:06         ` Junio C Hamano
  2022-03-04 22:11           ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-03-04  2:06 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

> +# Verifies that the expected repositories were fetched. This is done by
> +# concatenating the files expect.err.[super|sub|deep] in the correct
> +# order and comparing it to the actual stderr.
> +#
> +# If a repo should not be fetched in the test, its corresponding
> +# expect.err file should be rm-ed.
> +verify_fetch_result() {

I think you updated 02/10 "check_sub () {" but this is leftover.  If
there aren't too many, I can fix them up locally.  We'll see.

* Re: [PATCH v4 02/10] t5526: stop asserting on stderr literally
  2022-03-04  0:57       ` [PATCH v4 02/10] t5526: stop asserting on stderr literally Glen Choo
@ 2022-03-04  2:12         ` Junio C Hamano
  2022-03-04 22:41         ` Jonathan Tan
  1 sibling, 0 replies; 149+ messages in thread
From: Junio C Hamano @ 2022-03-04  2:12 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

> In the previous commit message, we noted that not all of the "git fetch"
> stderr is relevant to the tests. Most of the test setup lines are
> dedicated to these details of the stderr:
>
> 1. which repos (super/sub/deep) are involved in the fetch
> 2. the head of the remote-tracking branch before the fetch (i.e. $head1)
> 3. the head of the remote-tracking branch after the fetch (i.e. $head2)
>
> 1. and 3. are relevant because they tell us that the expected commit is
> fetched by the expected repo, but 2. is completely irrelevant.
>
> Stop asserting on $head1 by replacing it with a dummy value in the
> actual and expected output. Do this by introducing test
> helpers (check_*()) that make it easier to construct the expected
> output, and use sed to munge the actual output.
>
> Signed-off-by: Glen Choo <chooglen@google.com>
> ---
>  t/t5526-fetch-submodules.sh | 119 +++++++++++++++++-------------------
>  1 file changed, 57 insertions(+), 62 deletions(-)
>
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index dff7a4b90b..6b24d37b2b 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -10,6 +10,32 @@ export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
>  
>  pwd=$(pwd)
>  
> +check_sub () {
> +	NEW_HEAD=$1 &&
> +	cat >$pwd/expect.err.sub <<-EOF

Style.

	cat >"$pwd/expect.err.sub" <<-EOF

Here, $pwd most likely has an $IFS character in it (because we
deliberately use "trash directory.xxxx" as the place to run our
tests in), but a redirection target does not go through $IFS word
splitting, so such quoting is not technically necessary.  However,
some versions of bash are known to throw a warning if we don't,
and extra quoting does not hurt.

> +check_deep () {
> +	NEW_HEAD=$1 &&
> +	cat >$pwd/expect.err.deep <<-EOF

Likewise.

> +	Fetching submodule submodule/subdir/deepsubmodule
> +	From $pwd/deepsubmodule
> +	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
> +	EOF
> +}
> +
> +check_super () {
> +	NEW_HEAD=$1 &&
> +	cat >$pwd/expect.err.super <<-EOF

Likewise.

> +	From $pwd/.
> +	   OLD_HEAD..$NEW_HEAD  super      -> origin/super
> +	EOF
> +}

* Re: [PATCH v4 10/10] submodule: fix latent check_has_commit() bug
  2022-03-04  0:57       ` [PATCH v4 10/10] submodule: fix latent check_has_commit() bug Glen Choo
@ 2022-03-04  2:17         ` Junio C Hamano
  0 siblings, 0 replies; 149+ messages in thread
From: Junio C Hamano @ 2022-03-04  2:17 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

> When check_has_commit() is called on a missing submodule, initialization
> of the struct repository fails, but it attempts to clear the struct
> anyway (which is a fatal error). This bug is masked by its only caller,
> submodule_has_commits(), first calling add_submodule_odb(). The latter
> fails if the submodule does not exist, making submodule_has_commits()
> exit early and not invoke check_has_commit().
>
> Fix this bug, and because calling add_submodule_odb() is no longer
> necessary as of 13a2f620b2 (submodule: pass repo to
> check_has_commit(), 2021-10-08), remove that call too.
>
> This is the last caller of add_submodule_odb(), so remove that
> function. (Submodule ODBs are still added as alternates via
> add_submodule_odb_by_path().)
>
> Signed-off-by: Glen Choo <chooglen@google.com>
> ---
>  submodule.c | 35 ++---------------------------------
>  submodule.h |  9 ++++-----
>  2 files changed, 6 insertions(+), 38 deletions(-)

Looks reasonable.  Will queue.

Thanks.

* Re: [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                         ` (9 preceding siblings ...)
  2022-03-04  0:57       ` [PATCH v4 10/10] submodule: fix latent check_has_commit() bug Glen Choo
@ 2022-03-04  2:22       ` Junio C Hamano
  2022-03-08  0:14       ` [PATCH v5 " Glen Choo
  11 siblings, 0 replies; 149+ messages in thread
From: Junio C Hamano @ 2022-03-04  2:22 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

> == Since v3
> - Numerous style fixes + improved comments.
> - Fix sed portability issues.

Good.

> - Fix failing test due to default branch name assumptions.

OK.  Sprinkling "-b main" over all the "git init" calls makes for a
somewhat noisy patch, but the conversion looks good.

* Re: [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules
  2022-03-04  0:57       ` [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
@ 2022-03-04  2:37         ` Junio C Hamano
  2022-03-04 22:59           ` Glen Choo
  2022-03-04 23:56         ` Jonathan Tan
  1 sibling, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-03-04  2:37 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

>  		item = string_list_insert(changed, name);
> -		if (!item->util)
> +		if (item->util)
> +			cs_data = item->util;
> +		else {
>  			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
> -		cs_data = item->util;
> +			cs_data = item->util;
> +			cs_data->super_oid = commit_oid;
> +			cs_data->path = xstrdup(p->two->path);
> +		}

I do not quite get this change.

collect_changed_submodules() walks a range of revisions in the
superproject, doing an equivalent of "git log --raw" and feeding the
differences to this callback function.  The above code looks at the
path and uses the "changed" string list to record which submodule
was modified, what commit in the submodule is needed, etc.

What happens when the range has more than one change to the same
submodule?  cs_data has only one room for recording .super_oid
(which commit in the superproject touches the submodule) and .path
(where in the superproject's tree the submodule exists).  "git mv"
of a submodule might be rare and it may not hurt too much that only
a single .path can be kept, but it looks somewhat iffy.

>  		oid_array_append(&cs_data->new_commits, &p->two->oid);

At least, we are not losing any submodule commit even when the same
submodule is touched more than once by the superproject, but it is
dubious why we have cs_data.super_oid and cs_data.path in the first
place.

How are they used, or are they something that seemed useful when the
code was first written but it turned out that they weren't and left
unused?

Or do we need to make cs_data an array of 3-tuple { .super_oid,
.submodule_oid, .path } for each submodule name?

* Re: [PATCH v4 01/10] t5526: introduce test helper to assert on fetches
  2022-03-04  2:06         ` Junio C Hamano
@ 2022-03-04 22:11           ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-04 22:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> +# Verifies that the expected repositories were fetched. This is done by
>> +# concatenating the files expect.err.[super|sub|deep] in the correct
>> +# order and comparing it to the actual stderr.
>> +#
>> +# If a repo should not be fetched in the test, its corresponding
>> +# expect.err file should be rm-ed.
>> +verify_fetch_result() {
>
> I think you updated 02/10 "check_sub () {" but this is leftover.  If
> there aren't too many, I can fix them up locally.  We'll see.

Ugh, sorry about that. I fixed this somewhere, but at some point I got
my branches confused and must've missed this one.

Since you left comments on other patches (like [02/10]
xmqq8rtq5uue.fsf@gitster.g), I'll just fix it on my end.

* Re: [PATCH v4 02/10] t5526: stop asserting on stderr literally
  2022-03-04  0:57       ` [PATCH v4 02/10] t5526: stop asserting on stderr literally Glen Choo
  2022-03-04  2:12         ` Junio C Hamano
@ 2022-03-04 22:41         ` Jonathan Tan
  2022-03-04 23:48           ` Junio C Hamano
  1 sibling, 1 reply; 149+ messages in thread
From: Jonathan Tan @ 2022-03-04 22:41 UTC (permalink / raw)
  To: Glen Choo
  Cc: Jonathan Tan, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:
> +check_sub () {
> +	NEW_HEAD=$1 &&
> +	cat >$pwd/expect.err.sub <<-EOF
> +	Fetching submodule submodule
> +	From $pwd/submodule
> +	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
> +	EOF
> +}
> +
> +check_deep () {
> +	NEW_HEAD=$1 &&
> +	cat >$pwd/expect.err.deep <<-EOF
> +	Fetching submodule submodule/subdir/deepsubmodule
> +	From $pwd/deepsubmodule
> +	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
> +	EOF
> +}
> +
> +check_super () {
> +	NEW_HEAD=$1 &&
> +	cat >$pwd/expect.err.super <<-EOF
> +	From $pwd/.
> +	   OLD_HEAD..$NEW_HEAD  super      -> origin/super
> +	EOF
> +}

The check_ names still aren't changed (as I suggested in [1]) but
perhaps it's fine to leave it. It doesn't seem to bother the other
reviewers, and changing it would slightly disrupt the review in that
there will be extra changes in the range-diff.

[1] https://lore.kernel.org/git/20220224230523.2877129-1-jonathantanmy@google.com/

* Re: [PATCH v4 03/10] t5526: create superproject commits with test helper
  2022-03-04  0:57       ` [PATCH v4 03/10] t5526: create superproject commits with test helper Glen Choo
@ 2022-03-04 22:59         ` Jonathan Tan
  0 siblings, 0 replies; 149+ messages in thread
From: Jonathan Tan @ 2022-03-04 22:59 UTC (permalink / raw)
  To: Glen Choo
  Cc: Jonathan Tan, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:
>  test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no new commits are fetched in the superproject (and ignores config)" '
> -	add_upstream_commit &&
> -	(
> -		cd submodule &&
> -		(
> -			cd subdir/deepsubmodule &&
> -			git fetch &&
> -			git checkout -q FETCH_HEAD
> -		) &&
> -		git add subdir/deepsubmodule &&
> -		git commit -m "new deepsubmodule" &&
> -		new_head=$(git rev-parse --short HEAD) &&
> -		check_sub $new_head
> -	) &&
> +	add_submodule_commits &&
>  	(
>  		cd downstream &&
>  		git config fetch.recurseSubmodules true &&

The deletion of the block in which we updated
submodule/subdir/deepsubmodule and submodule was a cause for concern,
but now I think it's fine - add_upstream_commit already adds a commit in
the submodule that would be fetched if we were to recurse into it. But
since we are not recursing when no new commits are fetched in the
superproject, we just have to detect that the commit added in
add_upstream_commit wasn't fetched.

OK - patches up to and including this one look good.

* Re: [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules
  2022-03-04  2:37         ` Junio C Hamano
@ 2022-03-04 22:59           ` Glen Choo
  2022-03-05  0:13             ` Junio C Hamano
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-03-04 22:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>>  		item = string_list_insert(changed, name);
>> -		if (!item->util)
>> +		if (item->util)
>> +			cs_data = item->util;
>> +		else {
>>  			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
>> -		cs_data = item->util;
>> +			cs_data = item->util;
>> +			cs_data->super_oid = commit_oid;
>> +			cs_data->path = xstrdup(p->two->path);
>> +		}
>
> I do not quite get this change.
>
> collect_changed_submodules() walks a range of revisions in the
> superproject, doing an equivalent of "git log --raw" and feeding the
> differences to this callback function.  The above code looks at the
> path and uses the "changed" string list to record which submodule
> was modified, what commit in the submodule is needed, etc.
>
> What happens when the range has more than one change to the same
> submodule?  cs_data has only one room for recording .super_oid
> (which commit in the superproject touches the submodule) and .path
> (where in the superproject's tree the submodule exists).  "git mv"
> of a submodule might be rare and it may not hurt too much that only
> a single .path can be kept, but it looks somewhat iffy.

Yes, I agree that it looks odd, which is why I added this comment to
hopefully make it less opaque:

  + * (super_oid, path) allows the submodule config to be read from _some_
  + * .gitmodules file. We store this information the first time we find a
  + * superproject commit that points to the submodule, but this is
  + * arbitrary - we can choose any (super_oid, path) that matches the
  + * submodule's name.

I guess this only says that it is ok to store .super_oid and .path from
any commit, but doesn't go in depth into _why_. It's ok because we only
need (.super_oid, .path) so that repo_submodule_init(..., path,
treeish_name) can map them to the submodule's name and gitdir (i.e.
.git/modules/<name>).

This means we don't need to worry about 'git mv' (super_oid's .gitmodules
will tell us the correct name even if the path changed relative to some
other commit), nor about seeing the submodule more than once (it doesn't
matter whose .gitmodules we look at so long as repo_submodule_init()
derives the correct gitdir).

And now that you've pointed this out, I realize that we could do away
with (.super_oid, .path) altogether if we had a variant of
repo_submodule_init() that takes the submodule name instead of (path,
treeish_name). (We have a similar submodule_from_name(), but that only
reads the submodule config, not a struct repository.) I would prefer not
to introduce such a function so late into the review cycle, but I could
clean this up later.

>>  		oid_array_append(&cs_data->new_commits, &p->two->oid);
>
> At least, we are not losing any submodule commit even when the same
> submodule is touched more than once by the superproject, but it is
> dubious why we have cs_data.super_oid and cs_data.path in the first
> place.

On the other hand, we actually need to record every submodule commit, so yes.

> How are they used, or are they something that seemed useful when the
> code was first written but it turned out that they weren't and left
> unused?
>
> Or do we need to make cs_data an array of 3-tuple { .super_oid,
> .submodule_oid, .path } for each submodule name?

To conclude:

- The changed_submodules string_list is basically a map that tells us,
  for a given submodule _name_, which commits we need to fetch and where
  repo_submodule_init() can read the submodule name from.
- We only use cs_data as a string_list_item.util, and the
  string_list_item.string is the submodule name itself.
- .new_commits tells us which commits to fetch.
- .super_oid and .path tell repo_submodule_init() how to get the name
  of the submodule.

So we don't need to make this a 3-tuple.
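The conclusion above can be made concrete with a hypothetical sketch of the name-keyed map. The first (super_oid, path) seen for a name is kept and later ones are ignored, while every submodule commit is appended. The sketch uses fixed-size arrays, string OIDs and a linear search instead of Git's string_list and oid_array. The names struct changed_submodule_sketch, struct changed_map and record_change() are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SUBMODULES 16
#define MAX_COMMITS 32

/* Hypothetical simplification of changed_submodule_data. */
struct changed_submodule_sketch {
	const char *name;       /* key: the submodule name */
	const char *super_oid;  /* first superproject commit seen */
	const char *path;       /* submodule path in that commit */
	const char *new_commits[MAX_COMMITS];
	size_t nr_commits;
};

struct changed_map {
	struct changed_submodule_sketch items[MAX_SUBMODULES];
	size_t nr;
};

/* Record "superproject commit SUPER_OID changed submodule NAME to SUB_OID". */
static struct changed_submodule_sketch *record_change(struct changed_map *map,
		const char *name, const char *super_oid, const char *path,
		const char *sub_oid)
{
	struct changed_submodule_sketch *cs = NULL;
	size_t i;

	for (i = 0; i < map->nr; i++) {
		if (!strcmp(map->items[i].name, name)) {
			cs = &map->items[i];
			break;
		}
	}
	if (!cs) {
		/* First sighting: remember (super_oid, path); later ones are ignored. */
		assert(map->nr < MAX_SUBMODULES);
		cs = &map->items[map->nr++];
		memset(cs, 0, sizeof(*cs));
		cs->name = name;
		cs->super_oid = super_oid;
		cs->path = path;
	}
	/* Every submodule commit is kept, whichever superproject commit named it. */
	assert(cs->nr_commits < MAX_COMMITS);
	cs->new_commits[cs->nr_commits++] = sub_oid;
	return cs;
}
```

If the same submodule is moved with "git mv" between two superproject commits, this model keeps the older path, which is harmless as long as both paths resolve to the same submodule name.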

* Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-02-25  3:46         ` Glen Choo
@ 2022-03-04 23:46           ` Jonathan Tan
  2022-03-05  0:22             ` Glen Choo
  2022-03-04 23:53           ` Jonathan Tan
  1 sibling, 1 reply; 149+ messages in thread
From: Jonathan Tan @ 2022-03-04 23:46 UTC (permalink / raw)
  To: Glen Choo
  Cc: Jonathan Tan, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:
> >> +	# Use test_cmp manually because verify_fetch_result does not
> >> +	# consider submodule2. All the repos should be fetched, but only
> >> +	# submodule2 should be read from a commit
> >> +	cat <<-EOF > expect.err.combined &&
> >> +	From $pwd/.
> >> +	   OLD_HEAD..$super_head  super           -> origin/super
> >> +	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
> >> +	Fetching submodule submodule
> >> +	From $pwd/submodule
> >> +	   OLD_HEAD..$sub_head  sub        -> origin/sub
> >> +	Fetching submodule submodule/subdir/deepsubmodule
> >> +	From $pwd/deepsubmodule
> >> +	   OLD_HEAD..$deep_head  deep       -> origin/deep
> >> +	Fetching submodule submodule2 at commit $super_sub2_only_head
> >> +	From $pwd/submodule2
> >> +	   OLD_HEAD..$sub2_head  sub2       -> origin/sub2
> >> +	EOF
> >> +	sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
> >> +	test_cmp expect.err.combined actual.err.cmp
> >> +'
> >
> > Could verify_fetch_result be modified to consider the new submodule
> > instead?
> 
> Since submodule2 is at the end of the file, I could modify
> verify_fetch_result() to concatenate extra text to the end. But if it
> were in the middle, we'd need to insert arbitrary text in the middle
> of the file.
> 
> I can't think of a good way to do this without compromising test
> readability, so I'll just do concatenation for now.

Looking at it, I think you can do it by adding a section that verifies
the "Fetching submodule submodule2" part if the file is present (so, no
change in behavior in the rest of the tests since they don't write this
file) and also modifying check_super to allow specification of the sub2
part (or making a new function for this).

> > What's the error message printed to the user here? (Just from reading
> > the code, I would have expected this to succeed, with the submodule
> > fetch being from same-name-1's submodule since we're fetching submodules
> > by name, but apparently that is not the case.)
> 
> Yeah, I think this might trip up some readers. The message is:
> 
>   From ../same-name-2
>     b7ebb59..944b5ac  master     -> same-name-2/master
>   Fetching submodule submodule
>   fatal: git upload-pack: not our ref 7ff6874077503acb9d0a52e280aaed9748276319
>   fatal: remote error: upload-pack: not our ref 7ff6874077503acb9d0a52e280aaed9748276319
>   Errors during submodule fetch:
>           submodule
> 
> Which, I believe, comes from how we fetch commits by oid:
> 
>   static int get_next_submodule(struct child_process *cp, struct strbuf *err,
>               void *data, void **task_cb)
>   [...]
>     oid_array_for_each_unique(task->commits,
>           append_oid_to_argv, &cp->args);
> 
> When the following is true:
> 
> - the submodule is found in the index
> - we are fetching submodules unconditionally ("--recurse-submodules=yes")
> - no superproject commit "changes" the submodule
> 
> task->commits is empty, and we just fetch from the submodule's
> remote by name. But as long as any superproject commit "changes" the
> submodule, we try to fetch by oid, which, as this test demonstrates, may
> fail.

Ah, so we try to fetch an OID from a submodule given by a fetched
commit, which is different from the submodule the client already has
locally. This might be a sign that we need to store more information
about the submodule so that we can print a clearer message. I haven't
looked into this deeply, but this might be possible by putting more
information in the util of changed_submodule_names, and when we have
already seen that submodule, to add more information to the util instead
of skipping it.
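A simplified, hypothetical picture of the two cases Glen describes. When no commits are recorded, the child fetch names only the remote. Otherwise, each recorded OID is appended to the command line, and that explicit request is what the server may reject with "not our ref". The argv_sketch type, push_arg() and build_submodule_fetch_args() are invented here. Git itself fills cp->args via append_oid_to_argv(), as in the quoted snippet, and its real logic has more cases than this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 16

/* Hypothetical NULL-terminated argument list. */
struct argv_sketch {
	const char *v[MAX_ARGS];
	size_t nr;
};

static void push_arg(struct argv_sketch *a, const char *arg)
{
	assert(a->nr < MAX_ARGS - 1); /* leave room for the NULL terminator */
	a->v[a->nr++] = arg;
	a->v[a->nr] = NULL;
}

static void build_submodule_fetch_args(struct argv_sketch *a,
		const char *remote, const char **commits, size_t nr_commits)
{
	size_t i;

	push_arg(a, "fetch");
	push_arg(a, remote);
	/*
	 * No recorded commits: only the remote is named, so its configured
	 * refspecs are fetched. Otherwise each OID is requested explicitly.
	 */
	for (i = 0; i < nr_commits; i++)
		push_arg(a, commits[i]);
}
```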

* Re: [PATCH v4 02/10] t5526: stop asserting on stderr literally
  2022-03-04 22:41         ` Jonathan Tan
@ 2022-03-04 23:48           ` Junio C Hamano
  2022-03-05  0:25             ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-03-04 23:48 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Glen Choo, git, Ævar Arnfjörð Bjarmason

Jonathan Tan <jonathantanmy@google.com> writes:

> The check_ names still aren't changed (as I suggested in [1]) but
> perhaps it's fine to leave it. It doesn't seem to bother the other
> reviewers, and changing it would slightly disrupt the review in that
> there will be extra changes in the range-diff.

At least, please do not count my not mentioning it as such a vote.
I didn't mention it because I saw you did.

Thanks.

* Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-02-25  3:46         ` Glen Choo
  2022-03-04 23:46           ` Jonathan Tan
@ 2022-03-04 23:53           ` Jonathan Tan
  1 sibling, 0 replies; 149+ messages in thread
From: Jonathan Tan @ 2022-03-04 23:53 UTC (permalink / raw)
  To: Glen Choo
  Cc: Jonathan Tan, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:
> >> +# In downstream, init "submodule2", but do not check it out while
> >> +# fetching. This lets us assert that unpopulated submodules can be
> >> +# fetched.
> >> +test_expect_success 'setup downstream branch with other submodule' '
> >> +	mkdir submodule2 &&
> >> +	(
> >> +		cd submodule2 &&
> >> +		git init &&
> >> +		echo sub2content >sub2file &&
> >> +		git add sub2file &&
> >> +		git commit -a -m new &&
> >> +		git branch -M sub2
> >> +	) &&
> >> +	git checkout -b super-sub2-only &&
> >> +	git submodule add "$pwd/submodule2" submodule2 &&
> >> +	git commit -m "add sub2" &&
> >> +	git checkout super &&
> >> +	(
> >> +		cd downstream &&
> >> +		git fetch --recurse-submodules origin &&
> >> +		git checkout super-sub2-only &&
> >> +		# Explicitly run "git submodule update" because sub2 is new
> >> +		# and has not been cloned.
> >> +		git submodule update --init &&
> >> +		git checkout --recurse-submodules super
> >> +	)
> >> +'
> >
> > Hmm...what is the difference between this and the original case in which
> > the index has no submodules? Both assert that unpopulated submodules
> > (submodules that cannot be found by iterating the index, as described in
> > your commit message) can be fetched.
> 
> In the previous test, the index has no submodules (it's completely empty
> in fact, so we don't iterate the index at all), but in this test, it
> does. This lets us check that there aren't any buggy interactions when
> both changed and index submodules are present.
> 
> I think such mistakes are pretty easy to introduce by accident - I made
> one pre-v1 where I reused .count between both iterators (instead
> of having .index_count and .changed_count). It passed the previous test
> because we didn't care about the index, but it obviously wouldn't pass
> this one.

In that case, describe this difference (one has no submodules in index,
one has other submodules in index) and maybe position this so that both
test cases (the no-submodule-in-index one and the
other-submodule-in-index one) are next to each other.

* Re: [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules
  2022-03-04  0:57       ` [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
  2022-03-04  2:37         ` Junio C Hamano
@ 2022-03-04 23:56         ` Jonathan Tan
  1 sibling, 0 replies; 149+ messages in thread
From: Jonathan Tan @ 2022-03-04 23:56 UTC (permalink / raw)
  To: Glen Choo
  Cc: Jonathan Tan, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:
> +		/*
> +		 * NEEDSWORK: Submodules set/unset a value for
> +		 * core.worktree when they are populated/unpopulated by
> +		 * "git checkout" (and similar commands, see
> +		 * submodule_move_head() and
> +		 * connect_work_tree_and_git_dir()), but if the
> +		 * submodule is unpopulated in another way (e.g. "git
> +		 * rm", "rm -r"), core.worktree will still be set even
> +		 * though the directory doesn't exist, and the child
> +		 * process will crash while trying to chdir into the
> +		 * nonexistent directory.
> +		 *
> +		 * In this case, we know that the submodule has no
> +		 * working tree, so we can work around this by
> +		 * setting "--work-tree=." (--bare does not work because
> +		 * worktree settings take precedence over bare-ness).
> +		 * However, this is not necessarily true in other cases,
> +		 * so a generalized solution is still necessary.
> +		 *
> +		 * Possible solutions:
> +		 * - teach "git [add|rm]" to unset core.worktree and
> +		 *   discourage users from removing submodules without
> +		 *   using a Git command.
> +		 * - teach submodule child processes to ignore stale
> +		 *   core.worktree values.
> +		 */
> +		strvec_push(&task->git_args, "--work-tree=.");
> +		return task;

Thanks - this is a good comment.

I've also written other comments on this patch that are easier to
describe in the context of an earlier conversation, so I've written
these as replies to an earlier email:

https://lore.kernel.org/git/20220304234622.647776-1-jonathantanmy@google.com/
https://lore.kernel.org/git/20220304235328.649768-1-jonathantanmy@google.com/

* Re: [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules
  2022-03-04 22:59           ` Glen Choo
@ 2022-03-05  0:13             ` Junio C Hamano
  2022-03-05  0:37               ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-03-05  0:13 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

> And now that you've pointed this out, I realize that we could do away
> with (.super_oid, .path) altogether if we had a variant of
> repo_submodule_init() that takes the submodule name instead of (path,
> treeish_name). (We have a similar submodule_from_name(), but that only
> reads the submodule config, not a struct repository.) I would prefer not
> to introduce such a function so late into the review cycle, but I could
> clean this up later.

I am puzzled.  What do you exactly mean by "late into the review
cycle"?

> - The changed_submodules string_list is basically a map that tells us,
>   for a given submodule _name_, which commits we need to fetch and where
>   repo_submodule_init() can read the submodule name from.
> - We only use cs_data as a string_list_item.util, and the
>   string_list_item.string is the submodule name itself.
> - .new_commits tells us which commits to fetch.
> - .super_oid and .path tells repo_submodule_init() how to get the name
>   of the submodule.
>
> So we don't need to make this a 3-tuple.

OK.  We need to learn which local repository houses the submodule
we discover in cs_data.  It may or may not be checked out
in the current checkout of the superproject commit.  And just one
<.super_oid, .path> tuple should be sufficient to tell us that,
because the mapping from submodule name to path may change as "git
mv" moves it around, but the mapping from submodule name to where
the submodule repository is stored in the .git/ directory of the
superproject should not change.  Am I following you so far
correctly?

I am wondering if we need even one <.super_oid, .path> tuple.
Looking at the implementation of repo_submodule_init(), I have a
feeling that a version of "initialize named submodule in a given
tree-ish in the superproject" would be rather trivial.  We already
have submodule name, so submodule_name_to_gitdir() would be all we
need, no?  After all, we are only interested in fetching objects to
fill missing commits (and possibly update the remote tracking
branches) and do not care about touching its working tree.  And once
we learn that .git/modules/<name>/ directory, we can fetch the
necessary commits into it, right?

Or am I oversimplifying the problem?
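Assuming the default layout, where a submodule's repository lives at $GIT_DIR/modules/<name>, the lookup described above needs only the name. A minimal sketch with a hypothetical helper, submodule_gitdir (not an existing Git function):

```shell
# Hypothetical helper: map a submodule *name* to its repository under the
# superproject's git directory. The checkout path is not used at all,
# so a "git mv" of the submodule does not change the answer.
submodule_gitdir () {
	super_gitdir=$1
	name=$2
	printf '%s/modules/%s\n' "$super_gitdir" "$name"
}
```

In a real superproject with the default layout, "git rev-parse --git-path modules/<name>" should name the same directory, and objects could be fetched into it without touching any working tree.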

* Re: [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules
  2022-03-04 23:46           ` Jonathan Tan
@ 2022-03-05  0:22             ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-05  0:22 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Jonathan Tan, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> > What's the error message printed to the user here? (Just from reading
>> > the code, I would have expected this to succeed, with the submodule
>> > fetch being from same-name-1's submodule since we're fetching submodules
>> > by name, but apparently that is not the case.)
>> 
>> Yeah, I think this might trip up some readers. The message is:
>> 
>>   From ../same-name-2
>>     b7ebb59..944b5ac  master     -> same-name-2/master
>>   Fetching submodule submodule
>>   fatal: git upload-pack: not our ref 7ff6874077503acb9d0a52e280aaed9748276319
>>   fatal: remote error: upload-pack: not our ref 7ff6874077503acb9d0a52e280aaed9748276319
>>   Errors during submodule fetch:
>>           submodule
>> 
>> Which, I believe, comes from how we fetch commits by oid:
>> 
>>   static int get_next_submodule(struct child_process *cp, struct strbuf *err,
>>               void *data, void **task_cb)
>>   [...]
>>     oid_array_for_each_unique(task->commits,
>>           append_oid_to_argv, &cp->args);
>> 
>> When the following is true:
>> 
>> - the submodule is found in the index
>> - we are fetching submodules unconditionally (--recurse-submodules=yes)
>> - no superproject commit "changes" the submodule
>> 
>> task->commits is empty, and we just fetch from the submodule's
>> remote by name. But as long as any superproject commit "changes" the
>> submodule, we try to fetch by oid, which, as this test demonstrates, may
>> fail.
>
> Ah, so we try to fetch an OID from a submodule given by a fetched
> commit, which is different from the submodule the client already has
> locally. This might be a sign that we need to store more information
> about the submodule so that we can print a clearer message. I haven't
> looked into this deeply, but this might be possible by putting more
> information in the util of changed_submodule_names, and when we have
> already seen that submodule, to add more information to the util instead
> of skipping it.
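A simplified shell model of the flow described above may help when reading the error output. It is not the C code in submodule.c, and it assumes that the submodule is first fetched normally and that only commits still missing afterwards are requested by OID, from a remote assumed to be named "origin". That second step is the one that fails with "not our ref" when the remote never had the commit. The helper name plan_submodule_fetch is hypothetical.

```shell
# Hypothetical model. $1: space-separated OIDs the submodule has after the
# plain fetch; remaining args: OIDs recorded by superproject commits.
plan_submodule_fetch () {
	have=" $1 "
	shift
	echo "git fetch"	# step 1: plain fetch from the default remote
	missing=
	for oid in "$@"
	do
		case "$have" in
		*" $oid "*) ;;	# already present
		*) missing="$missing $oid" ;;	# still missing
		esac
	done
	if test -n "$missing"
	then
		# step 2: fetch the missing commits by OID (remote name assumed)
		echo "git fetch origin$missing"
	fi
}
```

With same-name-2's commit 7ff6874 absent from same-name-1's remote, the model emits "git fetch origin 7ff6874", matching the failing request in the error message.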

Storing the submodule URL might achieve this purpose, but if the URL
doesn't match, I think we'd want to skip the commit instead of trying to
fetch a commit from an unrelated URL. I don't know yet whether this is a
good idea, though; I haven't looked deeply into what Git uses the URL for
and whether users might want to change the URL even though the submodule
is the 'same' (e.g. pointing the URL to another remote instead of having
two completely different repos with the same submodule name).
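A hypothetical sketch of the URL check floated above, in shell for illustration only: each record pairs a commit OID with the submodule URL found in the superproject commit that introduced it, and OIDs whose URL differs from the configured one are skipped instead of fetched. The input format and the helper name filter_commits_by_url are assumptions, and, as noted, a user who deliberately changed the URL would lose those commits.

```shell
# Hypothetical: keep only OIDs whose recorded URL matches the configured URL.
# Input lines: "<oid> <url recorded in the superproject commit>".
filter_commits_by_url () {
	configured_url=$1
	while read -r oid url
	do
		if test "$url" = "$configured_url"
		then
			echo "$oid"
		fi
	done
}
```

In the same-name test, a submodule configured with ../same-name-1 would keep its own commits and drop 7ff6874, which was recorded against ../same-name-2.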

* Re: [PATCH v4 02/10] t5526: stop asserting on stderr literally
  2022-03-04 23:48           ` Junio C Hamano
@ 2022-03-05  0:25             ` Glen Choo
  0 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-05  0:25 UTC (permalink / raw)
  To: Junio C Hamano, Jonathan Tan; +Cc: git, Ævar Arnfjörð Bjarmason

Junio C Hamano <gitster@pobox.com> writes:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
>> The check_ names still aren't changed (as I suggested in [1]) but
>> perhaps it's fine to leave it. It doesn't seem to bother the other
>> reviewers, and changing it would slightly disrupt the review in that
>> there will be extra changes in the range-diff.
>
> At least, please do not count my not mentioning it as such a vote.
> I didn't mention it because I saw you did.

Oh! Sorry, I intended to do this, but I missed it along with another
suggestion that you made about test assertions [1]. I will incorporate
this into the next round.

[1] 20220304234622.647776-1-jonathantanmy@google.com

* Re: [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules
  2022-03-05  0:13             ` Junio C Hamano
@ 2022-03-05  0:37               ` Glen Choo
  2022-03-08  0:11                 ` Junio C Hamano
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-03-05  0:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> And now that you've pointed this out, I realize that we could do away
>> with (.super_oid, .path) altogether if we had a variant of
>> repo_submodule_init() that takes the submodule name instead of (path,
>> treeish_name). (We have a similar submodule_from_name(), but that only
>> reads the submodule config, not a struct repository.) I would prefer not
>> to introduce such a function so late into the review cycle, but I could
>> clean this up later.
>
> I am puzzled.  What do you exactly mean by "late into the review
> cycle"?

I mean that reviewers have already seen several iterations of this, and
I'm afraid that a refactor might introduce unnecessary cognitive
overhead.

But of course, we might decide that the refactor is a good enough idea
that we want to do it anyway :)

>> - The changed_submodules string_list is basically a map that tells us,
>>   for a given submodule _name_, which commits we need to fetch and where
>>   repo_submodule_init() can read the submodule name from.
>> - We only use cs_data as a string_list_item.util, and the
>>   string_list_item.string is the submodule name itself.
>> - .new_commits tells us which commits to fetch.
>> - .super_oid and .path tell repo_submodule_init() how to get the name
>>   of the submodule.
>>
>> So we don't need to make this a 3-tuple.
>
> OK.  We need to learn which local repository houses the submodule
> we discover in cs_data.  It may or may not have a checkout in the
> current checkout of the superproject commit.  And just one
> <.super_oid, .path> tuple should be sufficient to tell us that,
> because the mapping from submodule name to path may change as "git
> mv" moves it around, but the mapping from submodule name to where
> the submodule repository is stored in the .git/ directory of the
> superproject should not change.  Am I following you so far
> correctly?

Yes, that's correct.

> I am wondering if we need even one <.super_oid, .path> tuple.
> Looking at the implementation of repo_submodule_init(), I have a
> feeling that a version of "initialize named submodule in a given
> tree-ish in the superproject" would be rather trivial.  We already
> have submodule name, so submodule_name_to_gitdir() would be all we
> need, no?  After all, we are only interested in fetching objects to
> fill missing commits (and possibly update the remote tracking
> branches) and do not care about touching its working tree.  And once
> we learn that .git/modules/<name>/ directory, we can fetch the
> necessary commits into it, right?
>
> Or am I oversimplifying the problem?

I don't think you are oversimplifying. Now that I look at it again, it
really _does_ seem trivial. Doing this refactor saves me the headache of
explaining why we need a single <.super_oid, .path> tuple, and saves
readers the headache of figuring out if I'm right.

I'll try it and see if it really makes things simpler or not.

* Re: [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules
  2022-03-05  0:37               ` Glen Choo
@ 2022-03-08  0:11                 ` Junio C Hamano
  0 siblings, 0 replies; 149+ messages in thread
From: Junio C Hamano @ 2022-03-08  0:11 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

>> Or am I oversimplifying the problem?
>
> I don't think you are oversimplifying. Now that I look at it again, it
> really _does_ seem trivial. Doing this refactor saves me the headache of
> explaining why we need a single <.super_oid, .path> tuple, and saves
> readers the headache of figuring out if I'm right.
>
> I'll try it and see if it really makes things simpler or not.

Thanks.

* [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
                         ` (10 preceding siblings ...)
  2022-03-04  2:22       ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Junio C Hamano
@ 2022-03-08  0:14       ` Glen Choo
  2022-03-08  0:14         ` [PATCH v5 01/10] t5526: introduce test helper to assert on fetches Glen Choo
                           ` (11 more replies)
  11 siblings, 12 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-08  0:14 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Original cover letter: https://lore.kernel.org/git/20220210044152.78352-1-chooglen@google.com

Based off 'master'.

Thanks as always for the kind feedback :) I believe I incorporated all
of the feedback from the previous round except for the following:

- <xmqqk0d9w91r.fsf@gitster.g> In a recursive fetch, we thought that we
  could use the submodule name instead of the superproject commit oid
  and path.

  It's true that we don't need <.super_oid, .path> in order to init the
  subrepo, but it turns out that recursive fetch reads some
  configuration values from .gitmodules (via submodule_from_path()), so
  we still need to store super_oid in order to read the correct
  .gitmodules file.

- <20220304235328.649768-1-jonathantanmy@google.com> I've described the
  differences between the no-submodule-in-index test and the
  other-submodule-in-index test (their comments now refer to one
  another, so the contrast is more obvious), but didn't reorder them
  because I thought that made the test setup less intuitive to read.

- <20220304234622.647776-1-jonathantanmy@google.com> I added
  expect.err.sub2 to verify_test_result() but didn't change
  write_expected_super() to account for sub2. It turned out to be tricky
  to predict the output when 'super' fetches >1 branch because each
  fetched branch can affect the formatting. e.g.

    	   OLD_HEAD..super  super           -> origin/super

  can become

    	   OLD_HEAD..super  super                   -> origin/super
    	   OLD_HEAD..super  some-other-branch       -> origin/some-other-branch

  (I could work around this by replacing the whitespace with sed, but it
  seemed like too much overhead for a one-off test).
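  For reference, the sed workaround could look something like the
  sketch below: a hypothetical normalize_fetch_err helper that applies
  the same OLD_HEAD substitution verify_fetch_result() uses and then
  collapses runs of spaces, so the padding no longer depends on how many
  branches were fetched. The trade-off is that the test would stop
  checking the column alignment itself.

```shell
# Hypothetical helper: make fetch stderr independent of old hashes and of
# the column padding that depends on the longest fetched ref name.
normalize_fetch_err () {
	sed -e 's/[0-9a-f][0-9a-f]*\.\./OLD_HEAD../' \
	    -e 's/  */ /g'
}
```

  Both the expected and the actual stderr would be piped through it
  before test_cmp.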

= Patch organization

- Patches 1-3 are quality-of-life improvements to the test suite that
  make it easier to write the tests in patch 9.
- Patches 4-6 are preparation for "git fetch" to read .gitmodules from
  the superproject commit in patch 7.
- Patches 7-8 refactor out the logic of "finding which submodules to
  fetch" and "fetching the submodules", making it easier to tell "git
  fetch" to fetch unpopulated submodules.
- Patch 9 teaches "git fetch" to fetch changed, unpopulated submodules
  in addition to populated submodules.
- Patch 10 is an optional bugfix + cleanup of the "git fetch" code that
  removes the last caller of the deprecated "add_submodule_odb()".

= Changes 

== Since v4
- Rename test helpers (s/check_/write_expected_)
- Test style fixes
- Update test comments
- Remove the manual test_cmp in the test that checks sub2 (but we still
  construct expect.err.super manually).

== Since v3
- Numerous style fixes + improved comments.
- Fix sed portability issues.
- Fix failing test due to default branch name assumptions.
- Patch 3: change a test so that it no longer depends on state from the
  previous test.
- Patch 9: fix memory leak when recording super_oid and path + add
  explanatory comment.

== Since v2
- Numerous small fixes to the code and commit message (thanks to all who
  helped spot these :))
- In patch 2, use test_cmp + sed to assert on test output, effectively
  reverting the "use grep" approach of v1-2 (see patch 2's description).
- New patch 3: introduce a test helper that creates the expected
  superproject commit (instead of copy-pasting the code over and over).
  - I did not get rid of "git fetch" inside the test helper (as Jonathan
    suggested) though, because that requires a bigger change in the test
    setup, and I think the test helper makes the test straightforward
    enough.
- New patch 8: refactor some shared logic out into fetch_task_create().
  This reduces code duplication between the get_fetch_task_from_*
  functions.
- In patch 9, add additional tests for 'submodules with the same name'.
- In patch 9, handle a bug where a submodule that is unpopulated by "git
  rm" still has "core.worktree" set and cannot be fetched (see patch 9's
  description).
- Remove the "git fetch --update-shallow" patch (I'll try to send it
  separately).

== Since v1
- Numerous style fixes suggested by Jonathan (thanks!)
- In patch 3, don't prematurely read submodules from the superproject
  commit (see:
  <kl6l5yplyat6.fsf@chooglen-macbookpro.roam.corp.google.com>).
- In patch 7, stop using "git checkout" and "! grep" in tests.
- In patch 7, stop doing the "find changed submodules" rev walk
  unconditionally. Instead, continue to check for .gitmodules, but also
  check for submodules in $GIT_DIR/modules.
  - I'm not entirely happy with the helper function name, see "---" for
    details.
- Move "git fetch --update-shallow" bugfix to patch 8.
  - Because the "find changed submodules" rev walk is no longer
    unconditional, this fix is no longer needed for tests to pass.
- Rename fetch_populated_submodules() to fetch_submodules().


Glen Choo (10):
  t5526: introduce test helper to assert on fetches
  t5526: stop asserting on stderr literally
  t5526: create superproject commits with test helper
  submodule: make static functions read submodules from commits
  submodule: inline submodule_commits() into caller
  submodule: store new submodule commits oid_array in a struct
  submodule: extract get_fetch_task()
  submodule: move logic into fetch_task_create()
  fetch: fetch unpopulated, changed submodules
  submodule: fix latent check_has_commit() bug

 Documentation/fetch-options.txt |  26 +-
 Documentation/git-fetch.txt     |  10 +-
 builtin/fetch.c                 |  14 +-
 submodule.c                     | 442 +++++++++++++++++---------
 submodule.h                     |  21 +-
 t/t5526-fetch-submodules.sh     | 545 ++++++++++++++++++++++++--------
 6 files changed, 746 insertions(+), 312 deletions(-)

Range-diff against v4:
 1:  57cd31afc2 !  1:  f22f992e2b t5526: introduce test helper to assert on fetches
    @@ t/t5526-fetch-submodules.sh: add_upstream_commit() {
     +#
     +# If a repo should not be fetched in the test, its corresponding
     +# expect.err file should be rm-ed.
    -+verify_fetch_result() {
    ++verify_fetch_result () {
     +	ACTUAL_ERR=$1 &&
     +	rm -f expect.err.combined &&
     +	if test -f expect.err.super
 2:  b70c894cff !  2:  f6ee125e16 t5526: stop asserting on stderr literally
    @@ Commit message
     
         Stop asserting on $head1 by replacing it with a dummy value in the
         actual and expected output. Do this by introducing test
    -    helpers (check_*()) that make it easier to construct the expected
    -    output, and use sed to munge the actual output.
    +    helpers (write_expected_*()) that make it easier to construct the
    +    expected output, and use sed to munge the actual output.
     
         Signed-off-by: Glen Choo <chooglen@google.com>
     
    @@ t/t5526-fetch-submodules.sh: export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
      
      pwd=$(pwd)
      
    -+check_sub () {
    ++write_expected_sub () {
     +	NEW_HEAD=$1 &&
    -+	cat >$pwd/expect.err.sub <<-EOF
    ++	cat >"$pwd/expect.err.sub" <<-EOF
     +	Fetching submodule submodule
     +	From $pwd/submodule
     +	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
     +	EOF
     +}
     +
    -+check_deep () {
    ++write_expected_deep () {
     +	NEW_HEAD=$1 &&
    -+	cat >$pwd/expect.err.deep <<-EOF
    ++	cat >"$pwd/expect.err.deep" <<-EOF
     +	Fetching submodule submodule/subdir/deepsubmodule
     +	From $pwd/deepsubmodule
     +	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
     +	EOF
     +}
     +
    -+check_super () {
    ++write_expected_super () {
     +	NEW_HEAD=$1 &&
    -+	cat >$pwd/expect.err.super <<-EOF
    ++	cat >"$pwd/expect.err.super" <<-EOF
     +	From $pwd/.
     +	   OLD_HEAD..$NEW_HEAD  super      -> origin/super
     +	EOF
    @@ t/t5526-fetch-submodules.sh: pwd=$(pwd)
     -		echo "From $pwd/submodule" >> ../expect.err.sub &&
     -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
     +		new_head=$(git rev-parse --short HEAD) &&
    -+		check_sub $new_head
    ++		write_expected_sub $new_head
      	) &&
      	(
      		cd deepsubmodule &&
    @@ t/t5526-fetch-submodules.sh: pwd=$(pwd)
     -		echo "From $pwd/deepsubmodule" >> ../expect.err.deep &&
     -		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err.deep
     +		new_head=$(git rev-parse --short HEAD) &&
    -+		check_deep $new_head
    ++		write_expected_deep $new_head
      	)
      }
      
    -@@ t/t5526-fetch-submodules.sh: add_upstream_commit() {
    - #
    - # If a repo should not be fetched in the test, its corresponding
    - # expect.err file should be rm-ed.
    --verify_fetch_result() {
    -+verify_fetch_result () {
    - 	ACTUAL_ERR=$1 &&
    - 	rm -f expect.err.combined &&
    - 	if test -f expect.err.super
    -@@ t/t5526-fetch-submodules.sh: verify_fetch_result() {
    +@@ t/t5526-fetch-submodules.sh: verify_fetch_result () {
      	then
      		cat expect.err.deep >>expect.err.combined
      	fi &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion doesn't happen when
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	rm expect.err.deep &&
      	(
      		cd downstream &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion stops when no new su
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	rm expect.err.sub &&
      	rm expect.err.deep &&
      	(
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion picks up config in s
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	(
      		cd downstream &&
      		git fetch >../actual.out 2>../actual.err &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion picks up all submodu
     -		echo "From $pwd/submodule" >> ../expect.err.sub &&
     -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
     +		new_head=$(git rev-parse --short HEAD) &&
    -+		check_sub $new_head
    ++		write_expected_sub $new_head
      	) &&
     -	head1=$(git rev-parse --short HEAD) &&
      	git add submodule &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion picks up all submodu
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	(
      		cd downstream &&
      		git fetch >../actual.out 2>../actual.err
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     -		echo "From $pwd/submodule" >> ../expect.err.sub &&
     -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
     +		new_head=$(git rev-parse --short HEAD) &&
    -+		check_sub $new_head
    ++		write_expected_sub $new_head
      	) &&
      	(
      		cd downstream &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	(
      		cd downstream &&
      		git config fetch.recurseSubmodules false &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	rm expect.err.sub &&
      	rm expect.err.deep &&
      	(
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'fetch.recurseSubmodules=on-de
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	rm expect.err.deep &&
      	(
      		cd downstream &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'submodule.<sub>.fetchRecurseS
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	rm expect.err.deep &&
      	(
      		cd downstream &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "don't fetch submodule when new
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	rm expect.err.sub &&
      	# This file does not exist, but rm -f for readability
      	rm -f expect.err.deep &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'fetch.recurseSubmodules=on-de
     -	echo "From $pwd/." >expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	rm expect.err.deep &&
      	(
      		cd downstream &&
 3:  7e2a01164e !  3:  17ccae2933 t5526: create superproject commits with test helper
    @@ Commit message
         Signed-off-by: Glen Choo <chooglen@google.com>
     
      ## t/t5526-fetch-submodules.sh ##
    -@@ t/t5526-fetch-submodules.sh: check_super () {
    +@@ t/t5526-fetch-submodules.sh: write_expected_super () {
      # a file that contains the expected err if that new commit were fetched.
      # These output files get concatenated in the right order by
      # verify_fetch_result().
    @@ t/t5526-fetch-submodules.sh: add_upstream_commit() {
     +	git commit -m "new submodule" &&
     +	super_head=$(git rev-parse --short HEAD) &&
     +	sub_head=$(git -C submodule rev-parse --short HEAD) &&
    -+	check_super $super_head &&
    -+	check_sub $sub_head
    ++	write_expected_super $super_head &&
    ++	write_expected_sub $sub_head
     +}
     +
      # Verifies that the expected repositories were fetched. This is done by
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion picks up config in s
     -		git add subdir/deepsubmodule &&
     -		git commit -m "new deepsubmodule" &&
     -		new_head=$(git rev-parse --short HEAD) &&
    --		check_sub $new_head
    +-		write_expected_sub $new_head
     -	) &&
     -	git add submodule &&
     -	git commit -m "new submodule" &&
     -	new_head=$(git rev-parse --short HEAD) &&
    --	check_super $new_head &&
    +-	write_expected_super $new_head &&
     +	add_submodule_commits &&
     +	add_superproject_commits &&
      	(
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion picks up all submodu
     -		git add subdir/deepsubmodule &&
     -		git commit -m "new deepsubmodule" &&
     -		new_head=$(git rev-parse --short HEAD) &&
    --		check_sub $new_head
    +-		write_expected_sub $new_head
     -	) &&
     +	add_submodule_commits &&
      	(
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     -	git add submodule &&
     -	git commit -m "new submodule" &&
     -	new_head=$(git rev-parse --short HEAD) &&
    --	check_super $new_head &&
    +-	write_expected_super $new_head &&
    ++	add_submodule_commits &&
     +	add_superproject_commits &&
      	(
      		cd downstream &&
 4:  88112ee225 =  4:  b220dd32c1 submodule: make static functions read submodules from commits
 5:  007cd97aba =  5:  da346aa12a submodule: inline submodule_commits() into caller
 6:  f34ea88fe9 =  6:  c1fd9c3abf submodule: store new submodule commits oid_array in a struct
 7:  f66ab663c5 =  7:  dbb931fe30 submodule: extract get_fetch_task()
 8:  4e3db1bc9d =  8:  7242236df9 submodule: move logic into fetch_task_create()
 9:  9e7b1c1bbe !  9:  0dada865d4 fetch: fetch unpopulated, changed submodules
    @@ submodule.h: int should_update_submodules(void);
      ## t/t5526-fetch-submodules.sh ##
     @@ t/t5526-fetch-submodules.sh: pwd=$(pwd)
      
    - check_sub () {
    + write_expected_sub () {
      	NEW_HEAD=$1 &&
     +	SUPER_HEAD=$2 &&
    - 	cat >$pwd/expect.err.sub <<-EOF
    + 	cat >"$pwd/expect.err.sub" <<-EOF
     -	Fetching submodule submodule
     +	Fetching submodule submodule${SUPER_HEAD:+ at commit $SUPER_HEAD}
      	From $pwd/submodule
      	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
      	EOF
    -@@ t/t5526-fetch-submodules.sh: check_sub () {
    + }
      
    - check_deep () {
    ++write_expected_sub2 () {
    ++	NEW_HEAD=$1 &&
    ++	SUPER_HEAD=$2 &&
    ++	cat >"$pwd/expect.err.sub2" <<-EOF
    ++	Fetching submodule submodule2${SUPER_HEAD:+ at commit $SUPER_HEAD}
    ++	From $pwd/submodule2
    ++	   OLD_HEAD..$NEW_HEAD  sub2       -> origin/sub2
    ++	EOF
    ++}
    ++
    + write_expected_deep () {
      	NEW_HEAD=$1 &&
     +	SUB_HEAD=$2 &&
    - 	cat >$pwd/expect.err.deep <<-EOF
    + 	cat >"$pwd/expect.err.deep" <<-EOF
     -	Fetching submodule submodule/subdir/deepsubmodule
     +	Fetching submodule submodule/subdir/deepsubmodule${SUB_HEAD:+ at commit $SUB_HEAD}
      	From $pwd/deepsubmodule
      	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
      	EOF
    -@@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
    - '
    - 
    - test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" '
    -+	add_submodule_commits &&
    - 	add_superproject_commits &&
    - 	(
    - 		cd downstream &&
    +@@ t/t5526-fetch-submodules.sh: verify_fetch_result () {
    + 	then
    + 		cat expect.err.deep >>expect.err.combined
    + 	fi &&
    ++	if test -f expect.err.sub2
    ++	then
    ++		cat expect.err.sub2 >>expect.err.combined
    ++	fi &&
    + 	sed -e 's/[0-9a-f][0-9a-f]*\.\./OLD_HEAD\.\./' "$ACTUAL_ERR" >actual.err.cmp &&
    + 	test_cmp expect.err.combined actual.err.cmp
    + }
     @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
      	verify_fetch_result actual.err
      '
      
    -+# Test that we can fetch submodules in other branches by running fetch
    -+# in a commit that has no submodules.
    ++# These tests verify that we can fetch submodules that aren't in the
    ++# index.
    ++#
    ++# First, test the simple case where the index is empty and we only fetch
    ++# submodules that are not in the index.
     +test_expect_success 'setup downstream branch without submodules' '
     +	(
     +		cd downstream &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
     +
     +	# assert that these are fetched from commits, not the index
    -+	check_sub $sub_head $super_head &&
    -+	check_deep $deep_head $sub_head &&
    ++	write_expected_sub $sub_head $super_head &&
    ++	write_expected_deep $deep_head $sub_head &&
     +
     +	test_must_be_empty actual.out &&
     +	verify_fetch_result actual.err
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
     +
     +	# assert that these are fetched from commits, not the index
    -+	check_sub $sub_head $super_head &&
    -+	check_deep $deep_head $sub_head &&
    ++	write_expected_sub $sub_head $super_head &&
    ++	write_expected_deep $deep_head $sub_head &&
     +
     +	test_must_be_empty actual.out &&
     +	verify_fetch_result actual.err
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	) &&
     +	test_must_be_empty actual.out &&
     +	super_head=$(git rev-parse --short HEAD) &&
    -+	check_super $super_head &&
    ++	write_expected_super $super_head &&
     +	# Neither should be fetched because the submodule is inactive
     +	rm expect.err.sub &&
     +	rm expect.err.deep &&
     +	verify_fetch_result actual.err
     +'
     +
    -+# In downstream, init "submodule2", but do not check it out while
    -+# fetching. This lets us assert that unpopulated submodules can be
    -+# fetched.
    ++# Now that we know we can fetch submodules that are not in the index,
    ++# test that we can fetch index and non-index submodules in the same
    ++# operation.
     +test_expect_success 'setup downstream branch with other submodule' '
     +	mkdir submodule2 &&
     +	(
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +'
     +
     +test_expect_success "'--recurse-submodules' should fetch submodule commits in changed submodules and the index" '
    ++	test_when_finished "rm expect.err.sub2" &&
     +	# Create new commit in origin/super
     +	add_submodule_commits &&
     +	add_superproject_commits &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +		git fetch --recurse-submodules >../actual.out 2>../actual.err
     +	) &&
     +	test_must_be_empty actual.out &&
    -+	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
    -+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
     +	sub2_head=$(git -C submodule2 rev-parse --short HEAD) &&
    -+	super_head=$(git rev-parse --short HEAD) &&
    ++	super_head=$(git rev-parse --short super) &&
     +	super_sub2_only_head=$(git rev-parse --short super-sub2-only) &&
    ++	write_expected_sub2 $sub2_head $super_sub2_only_head &&
     +
    -+	# Use test_cmp manually because verify_fetch_result does not
    -+	# consider submodule2. All the repos should be fetched, but only
    -+	# submodule2 should be read from a commit
    -+	cat > expect.err.combined <<-EOF &&
    ++	# write_expected_super cannot handle >1 branch. Since this is a
    ++	# one-off, construct expect.err.super manually.
    ++	cat >"$pwd/expect.err.super" <<-EOF &&
     +	From $pwd/.
     +	   OLD_HEAD..$super_head  super           -> origin/super
     +	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
    -+	Fetching submodule submodule
    -+	From $pwd/submodule
    -+	   OLD_HEAD..$sub_head  sub        -> origin/sub
    -+	Fetching submodule submodule/subdir/deepsubmodule
    -+	From $pwd/deepsubmodule
    -+	   OLD_HEAD..$deep_head  deep       -> origin/deep
    -+	Fetching submodule submodule2 at commit $super_sub2_only_head
    -+	From $pwd/submodule2
    -+	   OLD_HEAD..$sub2_head  sub2       -> origin/sub2
     +	EOF
    -+	sed -e "s/[0-9a-f][0-9a-f]*\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
    -+	test_cmp expect.err.combined actual.err.cmp
    ++	verify_fetch_result actual.err
     +'
     +
      test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
10:  362ce3c7f8 = 10:  71bb456041 submodule: fix latent check_has_commit() bug

base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7
-- 
2.33.GIT



* [PATCH v5 01/10] t5526: introduce test helper to assert on fetches
From: Glen Choo @ 2022-03-08  0:14 UTC
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Tests in t/t5526-fetch-submodules.sh are unnecessarily noisy:

* The tests have extra logic in order to reproduce the expected stderr
  literally, but not all of these details (e.g. the head of the
  remote-tracking branch before the fetch) are relevant to the test.

* The expect.err file is constructed by the add_upstream_commit() helper
  as input into test_cmp, but most tests fetch a different combination
  of repos than the one described in expect.err. This results in noisy
  tests that modify parts of expect.err to generate the expected output.

To address the second issue, introduce a verify_fetch_result() helper
to t/t5526-fetch-submodules.sh that asserts on the output of "git
fetch --recurse-submodules". Each repo now gets its own expect.err.*
file (super, sub, deep), and verify_fetch_result() concatenates
whichever of these files exist in the order in which "git fetch"
reports the repos.

As a result, the tests no longer construct expect.err manually. Tests
still assert on the old head of the remote-tracking branch ("$head1"),
but that will be fixed in a later commit.
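A minimal standalone sketch of the concatenation that verify_fetch_result() performs, runnable outside the git test harness (assuming a POSIX shell with "mktemp -d" available). The file contents are made up, the for loop is a compact stand-in for the helper's three if-blocks rather than the helper itself, and the scenario (superproject and "submodule" fetched, "deepsubmodule" not) mirrors the tests that "rm expect.err.deep":

```shell
# Hypothetical demo in a scratch directory; contents are illustrative.
tmp=$(mktemp -d) &&
cd "$tmp" &&

# Only the superproject and "submodule" are expected to be fetched, so
# there is no expect.err.deep (a test would rm it in this case).
printf '%s\n' 'From /example/.' >expect.err.super &&
printf '%s\n' 'Fetching submodule submodule' >expect.err.sub &&

# Concatenate in the order "git fetch" reports repos: super, sub, deep.
# Files that do not exist are skipped.
rm -f expect.err.combined &&
for part in super sub deep
do
	if test -f "expect.err.$part"
	then
		cat "expect.err.$part" >>expect.err.combined
	fi
done &&
cat expect.err.combined
```

Listing the repos explicitly, rather than globbing expect.err.*, is what fixes the order: a glob expands alphabetically ("deep" before "sub" before "super"), which is not the order in which "git fetch" reports the repos.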

Signed-off-by: Glen Choo <chooglen@google.com>
---
 t/t5526-fetch-submodules.sh | 139 ++++++++++++++++++++++--------------
 1 file changed, 84 insertions(+), 55 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 840c89cc8b..c3a67270b1 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -10,6 +10,10 @@ export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
 
 pwd=$(pwd)
 
+# For each submodule in the test setup, this creates a commit and writes
+# a file that contains the expected err if that new commit were fetched.
+# These output files get concatenated in the right order by
+# verify_fetch_result().
 add_upstream_commit() {
 	(
 		cd submodule &&
@@ -19,9 +23,9 @@ add_upstream_commit() {
 		git add subfile &&
 		git commit -m new subfile &&
 		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err &&
-		echo "From $pwd/submodule" >> ../expect.err &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err
+		echo "Fetching submodule submodule" > ../expect.err.sub &&
+		echo "From $pwd/submodule" >> ../expect.err.sub &&
+		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
 	) &&
 	(
 		cd deepsubmodule &&
@@ -31,12 +35,36 @@ add_upstream_commit() {
 		git add deepsubfile &&
 		git commit -m new deepsubfile &&
 		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule/subdir/deepsubmodule" >> ../expect.err
-		echo "From $pwd/deepsubmodule" >> ../expect.err &&
-		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err
+		echo "Fetching submodule submodule/subdir/deepsubmodule" > ../expect.err.deep
+		echo "From $pwd/deepsubmodule" >> ../expect.err.deep &&
+		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err.deep
 	)
 }
 
+# Verifies that the expected repositories were fetched. This is done by
+# concatenating the files expect.err.[super|sub|deep] in the correct
+# order and comparing it to the actual stderr.
+#
+# If a repo should not be fetched in the test, its corresponding
+# expect.err file should be rm-ed.
+verify_fetch_result () {
+	ACTUAL_ERR=$1 &&
+	rm -f expect.err.combined &&
+	if test -f expect.err.super
+	then
+		cat expect.err.super >>expect.err.combined
+	fi &&
+	if test -f expect.err.sub
+	then
+		cat expect.err.sub >>expect.err.combined
+	fi &&
+	if test -f expect.err.deep
+	then
+		cat expect.err.deep >>expect.err.combined
+	fi &&
+	test_cmp expect.err.combined $ACTUAL_ERR
+}
+
 test_expect_success setup '
 	mkdir deepsubmodule &&
 	(
@@ -74,7 +102,7 @@ test_expect_success "fetch --recurse-submodules recurses into submodules" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "submodule.recurse option triggers recursive fetch" '
@@ -84,7 +112,7 @@ test_expect_success "submodule.recurse option triggers recursive fetch" '
 		git -c submodule.recurse fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "fetch --recurse-submodules -j2 has the same output behaviour" '
@@ -94,7 +122,7 @@ test_expect_success "fetch --recurse-submodules -j2 has the same output behaviou
 		GIT_TRACE="$TRASH_DIRECTORY/trace.out" git fetch --recurse-submodules -j2 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err &&
+	verify_fetch_result actual.err &&
 	grep "2 tasks" trace.out
 '
 
@@ -124,7 +152,7 @@ test_expect_success "using fetchRecurseSubmodules=true in .gitmodules recurses i
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--no-recurse-submodules overrides .gitmodules config" '
@@ -155,7 +183,7 @@ test_expect_success "--recurse-submodules overrides fetchRecurseSubmodules setti
 		git config --unset submodule.submodule.fetchRecurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--quiet propagates to submodules" '
@@ -183,7 +211,7 @@ test_expect_success "--dry-run propagates to submodules" '
 		git fetch --recurse-submodules --dry-run >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "Without --dry-run propagates to submodules" '
@@ -192,7 +220,7 @@ test_expect_success "Without --dry-run propagates to submodules" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "recurseSubmodules=true propagates into submodules" '
@@ -203,7 +231,7 @@ test_expect_success "recurseSubmodules=true propagates into submodules" '
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--recurse-submodules overrides config in submodule" '
@@ -217,7 +245,7 @@ test_expect_success "--recurse-submodules overrides config in submodule" '
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "--no-recurse-submodules overrides config setting" '
@@ -250,14 +278,14 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.sub &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.sub &&
-	head -3 expect.err >> expect.err.sub &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
-	test_cmp expect.err.sub actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -268,14 +296,16 @@ test_expect_success "Recursion doesn't happen when new superproject commits don'
 	git add file &&
 	git commit -m "new file" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.file &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.file &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.file actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "Recursion picks up config in submodule" '
@@ -292,9 +322,8 @@ test_expect_success "Recursion picks up config in submodule" '
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.sub &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.sub &&
-	cat expect.err >> expect.err.sub &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err &&
@@ -303,7 +332,7 @@ test_expect_success "Recursion picks up config in submodule" '
 			git config --unset fetch.recurseSubmodules
 		)
 	) &&
-	test_cmp expect.err.sub actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -328,15 +357,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.2 &&
-	cat expect.err.sub >> expect.err.2 &&
-	tail -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
-	test_cmp expect.err.2 actual.err &&
+	verify_fetch_result actual.err &&
 	test_must_be_empty actual.out
 '
 
@@ -372,11 +399,8 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	tail -3 expect.err > expect.err.deepsub &&
-	echo "From $pwd/." > expect.err &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err &&
-	cat expect.err.sub >> expect.err &&
-	cat expect.err.deepsub >> expect.err &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules false &&
@@ -392,7 +416,7 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 		)
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
@@ -402,14 +426,16 @@ test_expect_success "'--recurse-submodules=on-demand' stops when no new submodul
 	git add file &&
 	git commit -m "new file" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.file &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.file &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.file actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config" '
@@ -423,9 +449,9 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules on-demand &&
@@ -437,7 +463,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 		git config --unset fetch.recurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' overrides fetch.recurseSubmodules" '
@@ -451,9 +477,9 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >> expect.err.2 &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		git config submodule.submodule.fetchRecurseSubmodules on-demand &&
@@ -465,7 +491,7 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 		git config --unset submodule.submodule.fetchRecurseSubmodules
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err
+	verify_fetch_result actual.err
 '
 
 test_expect_success "don't fetch submodule when newly recorded commits are already present" '
@@ -477,14 +503,17 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 	git add submodule &&
 	git commit -m "submodule rewound" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err &&
+	echo "From $pwd/." > expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	rm expect.err.sub &&
+	# This file does not exist, but rm -f for readability
+	rm -f expect.err.deep &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err actual.err &&
+	verify_fetch_result actual.err &&
 	(
 		cd submodule &&
 		git checkout -q sub
@@ -502,9 +531,9 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 	git rm .gitmodules &&
 	git commit -m "new submodule without .gitmodules" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." >expect.err.2 &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.2 &&
-	head -3 expect.err >>expect.err.2 &&
+	echo "From $pwd/." >expect.err.super &&
+	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	rm expect.err.deep &&
 	(
 		cd downstream &&
 		rm .gitmodules &&
@@ -520,7 +549,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 		git reset --hard
 	) &&
 	test_must_be_empty actual.out &&
-	test_cmp expect.err.2 actual.err &&
+	verify_fetch_result actual.err &&
 	git checkout HEAD^ -- .gitmodules &&
 	git add .gitmodules &&
 	git commit -m "new submodule restored .gitmodules"
-- 
2.33.GIT



* [PATCH v5 02/10] t5526: stop asserting on stderr literally
From: Glen Choo @ 2022-03-08  0:14 UTC
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

In the previous commit message, we noted that not all of the "git fetch"
stderr is relevant to the tests. Most of the test setup lines are
dedicated to these details of the stderr:

1. which repos (super/sub/deep) are involved in the fetch
2. the head of the remote-tracking branch before the fetch (i.e. $head1)
3. the head of the remote-tracking branch after the fetch (i.e. $head2)

Items 1 and 3 are relevant because they tell us that the expected commit
is fetched by the expected repo, but item 2 is completely irrelevant.

Stop asserting on $head1 by replacing it with a dummy value (OLD_HEAD)
in both the actual and the expected output. Do this by introducing test
helpers (write_expected_*()) that make it easier to construct the
expected output, and by using sed to munge the actual output.
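A minimal sketch of why the placeholder works, using made-up short hashes. The expected line is written with OLD_HEAD, and the same sed expression that verify_fetch_result() applies turns whatever old head "git fetch" printed into OLD_HEAD, so only the new head is compared. The variable names expected, actual, and munged are illustrative and do not appear in the test script:

```shell
# Illustrative values only; real tests take the new head from
# "git rev-parse --short HEAD".
new_head=5d6e7f8
expected="   OLD_HEAD..$new_head  sub        -> origin/sub"
actual="   1a2b3c4..$new_head  sub        -> origin/sub"

# Replace the first run of hex digits followed by ".." with "OLD_HEAD..",
# as verify_fetch_result() does for each line of the real stderr.
munged=$(printf '%s\n' "$actual" | sed -e 's/[0-9a-f][0-9a-f]*\.\./OLD_HEAD\.\./')

if test "$munged" = "$expected"
then
	echo "old head ignored, new head matches"
else
	echo "mismatch: $munged"
fi
```

Because the s command without the g flag rewrites only the first match on each line, the new head after ".." is left intact and is still asserted on.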

Signed-off-by: Glen Choo <chooglen@google.com>
---
 t/t5526-fetch-submodules.sh | 117 +++++++++++++++++-------------------
 1 file changed, 56 insertions(+), 61 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index c3a67270b1..e7136b68ba 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -10,6 +10,32 @@ export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
 
 pwd=$(pwd)
 
+write_expected_sub () {
+	NEW_HEAD=$1 &&
+	cat >"$pwd/expect.err.sub" <<-EOF
+	Fetching submodule submodule
+	From $pwd/submodule
+	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
+	EOF
+}
+
+write_expected_deep () {
+	NEW_HEAD=$1 &&
+	cat >"$pwd/expect.err.deep" <<-EOF
+	Fetching submodule submodule/subdir/deepsubmodule
+	From $pwd/deepsubmodule
+	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
+	EOF
+}
+
+write_expected_super () {
+	NEW_HEAD=$1 &&
+	cat >"$pwd/expect.err.super" <<-EOF
+	From $pwd/.
+	   OLD_HEAD..$NEW_HEAD  super      -> origin/super
+	EOF
+}
+
 # For each submodule in the test setup, this creates a commit and writes
 # a file that contains the expected err if that new commit were fetched.
 # These output files get concatenated in the right order by
@@ -17,27 +43,21 @@ pwd=$(pwd)
 add_upstream_commit() {
 	(
 		cd submodule &&
-		head1=$(git rev-parse --short HEAD) &&
 		echo new >> subfile &&
 		test_tick &&
 		git add subfile &&
 		git commit -m new subfile &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		new_head=$(git rev-parse --short HEAD) &&
+		write_expected_sub $new_head
 	) &&
 	(
 		cd deepsubmodule &&
-		head1=$(git rev-parse --short HEAD) &&
 		echo new >> deepsubfile &&
 		test_tick &&
 		git add deepsubfile &&
 		git commit -m new deepsubfile &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule/subdir/deepsubmodule" > ../expect.err.deep
-		echo "From $pwd/deepsubmodule" >> ../expect.err.deep &&
-		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err.deep
+		new_head=$(git rev-parse --short HEAD) &&
+		write_expected_deep $new_head
 	)
 }
 
@@ -62,7 +82,8 @@ verify_fetch_result () {
 	then
 		cat expect.err.deep >>expect.err.combined
 	fi &&
-	test_cmp expect.err.combined $ACTUAL_ERR
+	sed -e 's/[0-9a-f][0-9a-f]*\.\./OLD_HEAD\.\./' "$ACTUAL_ERR" >actual.err.cmp &&
+	test_cmp expect.err.combined actual.err.cmp
 }
 
 test_expect_success setup '
@@ -274,12 +295,10 @@ test_expect_success "Recursion doesn't happen when no new commits are fetched in
 '
 
 test_expect_success "Recursion stops when no new submodule commits are fetched" '
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	write_expected_super $new_head &&
 	rm expect.err.deep &&
 	(
 		cd downstream &&
@@ -291,13 +310,11 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 
 test_expect_success "Recursion doesn't happen when new superproject commits don't change any submodules" '
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	echo a > file &&
 	git add file &&
 	git commit -m "new file" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	write_expected_super $new_head &&
 	rm expect.err.sub &&
 	rm expect.err.deep &&
 	(
@@ -318,12 +335,10 @@ test_expect_success "Recursion picks up config in submodule" '
 		)
 	) &&
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	write_expected_super $new_head &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err &&
@@ -345,20 +360,15 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 			git fetch &&
 			git checkout -q FETCH_HEAD
 		) &&
-		head1=$(git rev-parse --short HEAD^) &&
 		git add subdir/deepsubmodule &&
 		git commit -m "new deepsubmodule" &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo "Fetching submodule submodule" > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		new_head=$(git rev-parse --short HEAD) &&
+		write_expected_sub $new_head
 	) &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	write_expected_super $new_head &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -376,13 +386,10 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 			git fetch &&
 			git checkout -q FETCH_HEAD
 		) &&
-		head1=$(git rev-parse --short HEAD^) &&
 		git add subdir/deepsubmodule &&
 		git commit -m "new deepsubmodule" &&
-		head2=$(git rev-parse --short HEAD) &&
-		echo Fetching submodule submodule > ../expect.err.sub &&
-		echo "From $pwd/submodule" >> ../expect.err.sub &&
-		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
+		new_head=$(git rev-parse --short HEAD) &&
+		write_expected_sub $new_head
 	) &&
 	(
 		cd downstream &&
@@ -395,12 +402,10 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 '
 
 test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" '
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	write_expected_super $new_head &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules false &&
@@ -421,13 +426,11 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	echo a >> file &&
 	git add file &&
 	git commit -m "new file" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	write_expected_super $new_head &&
 	rm expect.err.sub &&
 	rm expect.err.deep &&
 	(
@@ -445,12 +448,10 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 	) &&
 	add_upstream_commit &&
 	git config --global fetch.recurseSubmodules false &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	write_expected_super $new_head &&
 	rm expect.err.deep &&
 	(
 		cd downstream &&
@@ -473,12 +474,10 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 	) &&
 	add_upstream_commit &&
 	git config fetch.recurseSubmodules false &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "new submodule" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	write_expected_super $new_head &&
 	rm expect.err.deep &&
 	(
 		cd downstream &&
@@ -499,12 +498,10 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 		cd submodule &&
 		git checkout -q HEAD^^
 	) &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git commit -m "submodule rewound" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." > expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	write_expected_super $new_head &&
 	rm expect.err.sub &&
 	# This file does not exist, but rm -f for readability
 	rm -f expect.err.deep &&
@@ -526,13 +523,11 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 		git fetch --recurse-submodules
 	) &&
 	add_upstream_commit &&
-	head1=$(git rev-parse --short HEAD) &&
 	git add submodule &&
 	git rm .gitmodules &&
 	git commit -m "new submodule without .gitmodules" &&
-	head2=$(git rev-parse --short HEAD) &&
-	echo "From $pwd/." >expect.err.super &&
-	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
+	new_head=$(git rev-parse --short HEAD) &&
+	write_expected_super $new_head &&
 	rm expect.err.deep &&
 	(
 		cd downstream &&
-- 
2.33.GIT



* [PATCH v5 03/10] t5526: create superproject commits with test helper
From: Glen Choo @ 2022-03-08  0:14 UTC
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

A few tests in t5526 use this pattern as part of their setup:

1. Create new commits in the upstream submodules (using
   add_upstream_commit()).
2. In the upstream superprojects (the superproject itself and
   "submodule", which is the superproject of "deepsubmodule"), record
   the new submodule commits from the previous step.

A future commit will add more tests with this pattern, so reduce the
verbosity of present and future tests by introducing a test helper that
creates superproject commits. Since we now have two helpers that add
upstream commits, rename add_upstream_commit() to
add_submodule_commits().

Signed-off-by: Glen Choo <chooglen@google.com>
---
 t/t5526-fetch-submodules.sh | 95 ++++++++++++++++++-------------------
 1 file changed, 45 insertions(+), 50 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index e7136b68ba..aa6bb9867c 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -40,7 +40,7 @@ write_expected_super () {
 # a file that contains the expected err if that new commit were fetched.
 # These output files get concatenated in the right order by
 # verify_fetch_result().
-add_upstream_commit() {
+add_submodule_commits () {
 	(
 		cd submodule &&
 		echo new >> subfile &&
@@ -61,6 +61,30 @@ add_upstream_commit() {
 	)
 }
 
+# For each superproject in the test setup, update its submodule, add the
+# submodule and create a new commit with the submodule change.
+#
+# This requires add_submodule_commits() to be called first, otherwise
+# the submodules will not have changed and cannot be "git add"-ed.
+add_superproject_commits () {
+	(
+		cd submodule &&
+		(
+			cd subdir/deepsubmodule &&
+			git fetch &&
+			git checkout -q FETCH_HEAD
+		) &&
+		git add subdir/deepsubmodule &&
+		git commit -m "new deep submodule"
+	) &&
+	git add submodule &&
+	git commit -m "new submodule" &&
+	super_head=$(git rev-parse --short HEAD) &&
+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
+	write_expected_super $super_head &&
+	write_expected_sub $sub_head
+}
+
 # Verifies that the expected repositories were fetched. This is done by
 # concatenating the files expect.err.[super|sub|deep] in the correct
 # order and comparing it to the actual stderr.
@@ -117,7 +141,7 @@ test_expect_success setup '
 '
 
 test_expect_success "fetch --recurse-submodules recurses into submodules" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
@@ -127,7 +151,7 @@ test_expect_success "fetch --recurse-submodules recurses into submodules" '
 '
 
 test_expect_success "submodule.recurse option triggers recursive fetch" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git -c submodule.recurse fetch >../actual.out 2>../actual.err
@@ -137,7 +161,7 @@ test_expect_success "submodule.recurse option triggers recursive fetch" '
 '
 
 test_expect_success "fetch --recurse-submodules -j2 has the same output behaviour" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		GIT_TRACE="$TRASH_DIRECTORY/trace.out" git fetch --recurse-submodules -j2 2>../actual.err
@@ -148,7 +172,7 @@ test_expect_success "fetch --recurse-submodules -j2 has the same output behaviou
 '
 
 test_expect_success "fetch alone only fetches superproject" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -177,7 +201,7 @@ test_expect_success "using fetchRecurseSubmodules=true in .gitmodules recurses i
 '
 
 test_expect_success "--no-recurse-submodules overrides .gitmodules config" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git fetch --no-recurse-submodules >../actual.out 2>../actual.err
@@ -226,7 +250,7 @@ test_expect_success "--quiet propagates to parallel submodules" '
 '
 
 test_expect_success "--dry-run propagates to submodules" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git fetch --recurse-submodules --dry-run >../actual.out 2>../actual.err
@@ -245,7 +269,7 @@ test_expect_success "Without --dry-run propagates to submodules" '
 '
 
 test_expect_success "recurseSubmodules=true propagates into submodules" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules true &&
@@ -256,7 +280,7 @@ test_expect_success "recurseSubmodules=true propagates into submodules" '
 '
 
 test_expect_success "--recurse-submodules overrides config in submodule" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		(
@@ -270,7 +294,7 @@ test_expect_success "--recurse-submodules overrides config in submodule" '
 '
 
 test_expect_success "--no-recurse-submodules overrides config setting" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules true &&
@@ -309,7 +333,7 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 '
 
 test_expect_success "Recursion doesn't happen when new superproject commits don't change any submodules" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	echo a > file &&
 	git add file &&
 	git commit -m "new file" &&
@@ -334,7 +358,7 @@ test_expect_success "Recursion picks up config in submodule" '
 			git config fetch.recurseSubmodules true
 		)
 	) &&
-	add_upstream_commit &&
+	add_submodule_commits &&
 	git add submodule &&
 	git commit -m "new submodule" &&
 	new_head=$(git rev-parse --short HEAD) &&
@@ -352,23 +376,8 @@ test_expect_success "Recursion picks up config in submodule" '
 '
 
 test_expect_success "Recursion picks up all submodules when necessary" '
-	add_upstream_commit &&
-	(
-		cd submodule &&
-		(
-			cd subdir/deepsubmodule &&
-			git fetch &&
-			git checkout -q FETCH_HEAD
-		) &&
-		git add subdir/deepsubmodule &&
-		git commit -m "new deepsubmodule" &&
-		new_head=$(git rev-parse --short HEAD) &&
-		write_expected_sub $new_head
-	) &&
-	git add submodule &&
-	git commit -m "new submodule" &&
-	new_head=$(git rev-parse --short HEAD) &&
-	write_expected_super $new_head &&
+	add_submodule_commits &&
+	add_superproject_commits &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
@@ -378,19 +387,7 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 '
 
 test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no new commits are fetched in the superproject (and ignores config)" '
-	add_upstream_commit &&
-	(
-		cd submodule &&
-		(
-			cd subdir/deepsubmodule &&
-			git fetch &&
-			git checkout -q FETCH_HEAD
-		) &&
-		git add subdir/deepsubmodule &&
-		git commit -m "new deepsubmodule" &&
-		new_head=$(git rev-parse --short HEAD) &&
-		write_expected_sub $new_head
-	) &&
+	add_submodule_commits &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules true &&
@@ -402,10 +399,8 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 '
 
 test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" '
-	git add submodule &&
-	git commit -m "new submodule" &&
-	new_head=$(git rev-parse --short HEAD) &&
-	write_expected_super $new_head &&
+	add_submodule_commits &&
+	add_superproject_commits &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules false &&
@@ -425,7 +420,7 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 '
 
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
-	add_upstream_commit &&
+	add_submodule_commits &&
 	echo a >> file &&
 	git add file &&
 	git commit -m "new file" &&
@@ -446,7 +441,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 		cd downstream &&
 		git fetch --recurse-submodules
 	) &&
-	add_upstream_commit &&
+	add_submodule_commits &&
 	git config --global fetch.recurseSubmodules false &&
 	git add submodule &&
 	git commit -m "new submodule" &&
@@ -472,7 +467,7 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 		cd downstream &&
 		git fetch --recurse-submodules
 	) &&
-	add_upstream_commit &&
+	add_submodule_commits &&
 	git config fetch.recurseSubmodules false &&
 	git add submodule &&
 	git commit -m "new submodule" &&
@@ -522,7 +517,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' works also without .git
 		cd downstream &&
 		git fetch --recurse-submodules
 	) &&
-	add_upstream_commit &&
+	add_submodule_commits &&
 	git add submodule &&
 	git rm .gitmodules &&
 	git commit -m "new submodule without .gitmodules" &&
-- 
2.33.GIT


* [PATCH v5 04/10] submodule: make static functions read submodules from commits
  2022-03-08  0:14       ` [PATCH v5 " Glen Choo
                           ` (2 preceding siblings ...)
  2022-03-08  0:14         ` [PATCH v5 03/10] t5526: create superproject commits with test helper Glen Choo
@ 2022-03-08  0:14         ` Glen Choo
  2022-03-08  0:14         ` [PATCH v5 05/10] submodule: inline submodule_commits() into caller Glen Choo
                           ` (7 subsequent siblings)
  11 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-08  0:14 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

A future commit will teach "fetch --recurse-submodules" to fetch
unpopulated submodules. To prepare for this, teach the necessary static
functions how to read submodules from superproject commits using a
"treeish_name" argument (instead of always reading from the index and
filesystem) but do not actually change where submodules are read from.
Submodules will be read from commits when we fetch unpopulated
submodules.
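
For illustration only, here is a minimal C sketch of the convention this
relies on, assuming a toy stand-in for struct object_id. A null
"treeish_name" keeps the old behavior (read from the index and
filesystem), and any other value names the superproject commit to read
.gitmodules from. The toy_* names are hypothetical and are not part of
git.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for git's struct object_id (SHA-1 sized). */
struct toy_oid {
	unsigned char hash[20];
};

/* All-zero OID, playing the role of git's null_oid(). */
static const struct toy_oid toy_null_oid_value = { { 0 } };

const struct toy_oid *toy_null_oid(void)
{
	return &toy_null_oid_value;
}

int toy_oid_is_null(const struct toy_oid *oid)
{
	static const unsigned char zero[20];
	return !memcmp(oid->hash, zero, sizeof(zero));
}

/*
 * Where would the submodule configuration be read from? A null
 * treeish_name preserves the old behavior; anything else names a
 * superproject commit.
 */
const char *toy_gitmodules_source(const struct toy_oid *treeish_name)
{
	if (toy_oid_is_null(treeish_name))
		return "index/filesystem";
	return "superproject commit";
}
```

Because every caller in this patch still passes null_oid(), behavior is
unchanged until a later patch passes real commit ids.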

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/submodule.c b/submodule.c
index 5ace18a7d9..4f3300f2cb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -932,6 +932,7 @@ struct has_commit_data {
 	struct repository *repo;
 	int result;
 	const char *path;
+	const struct object_id *super_oid;
 };
 
 static int check_has_commit(const struct object_id *oid, void *data)
@@ -940,7 +941,7 @@ static int check_has_commit(const struct object_id *oid, void *data)
 	struct repository subrepo;
 	enum object_type type;
 
-	if (repo_submodule_init(&subrepo, cb->repo, cb->path, null_oid())) {
+	if (repo_submodule_init(&subrepo, cb->repo, cb->path, cb->super_oid)) {
 		cb->result = 0;
 		goto cleanup;
 	}
@@ -968,9 +969,15 @@ static int check_has_commit(const struct object_id *oid, void *data)
 
 static int submodule_has_commits(struct repository *r,
 				 const char *path,
+				 const struct object_id *super_oid,
 				 struct oid_array *commits)
 {
-	struct has_commit_data has_commit = { r, 1, path };
+	struct has_commit_data has_commit = {
+		.repo = r,
+		.result = 1,
+		.path = path,
+		.super_oid = super_oid
+	};
 
 	/*
 	 * Perform a cheap, but incorrect check for the existence of 'commits'.
@@ -1017,7 +1024,7 @@ static int submodule_needs_pushing(struct repository *r,
 				   const char *path,
 				   struct oid_array *commits)
 {
-	if (!submodule_has_commits(r, path, commits))
+	if (!submodule_has_commits(r, path, null_oid(), commits))
 		/*
 		 * NOTE: We do consider it safe to return "no" here. The
 		 * correct answer would be "We do not know" instead of
@@ -1277,7 +1284,7 @@ static void calculate_changed_submodule_paths(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_has_commits(r, path, commits)) {
+		if (submodule_has_commits(r, path, null_oid(), commits)) {
 			oid_array_clear(commits);
 			*name->string = '\0';
 		}
@@ -1402,12 +1409,13 @@ static const struct submodule *get_non_gitmodules_submodule(const char *path)
 }
 
 static struct fetch_task *fetch_task_create(struct repository *r,
-					    const char *path)
+					    const char *path,
+					    const struct object_id *treeish_name)
 {
 	struct fetch_task *task = xmalloc(sizeof(*task));
 	memset(task, 0, sizeof(*task));
 
-	task->sub = submodule_from_path(r, null_oid(), path);
+	task->sub = submodule_from_path(r, treeish_name, path);
 	if (!task->sub) {
 		/*
 		 * No entry in .gitmodules? Technically not a submodule,
@@ -1439,11 +1447,12 @@ static void fetch_task_release(struct fetch_task *p)
 }
 
 static struct repository *get_submodule_repo_for(struct repository *r,
-						 const char *path)
+						 const char *path,
+						 const struct object_id *treeish_name)
 {
 	struct repository *ret = xmalloc(sizeof(*ret));
 
-	if (repo_submodule_init(ret, r, path, null_oid())) {
+	if (repo_submodule_init(ret, r, path, treeish_name)) {
 		free(ret);
 		return NULL;
 	}
@@ -1464,7 +1473,7 @@ static int get_next_submodule(struct child_process *cp,
 		if (!S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		task = fetch_task_create(spf->r, ce->name);
+		task = fetch_task_create(spf->r, ce->name, null_oid());
 		if (!task)
 			continue;
 
@@ -1487,7 +1496,7 @@ static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		task->repo = get_submodule_repo_for(spf->r, task->sub->path);
+		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
 		if (task->repo) {
 			struct strbuf submodule_prefix = STRBUF_INIT;
 			child_process_init(cp);
-- 
2.33.GIT


* [PATCH v5 05/10] submodule: inline submodule_commits() into caller
  2022-03-08  0:14       ` [PATCH v5 " Glen Choo
                           ` (3 preceding siblings ...)
  2022-03-08  0:14         ` [PATCH v5 04/10] submodule: make static functions read submodules from commits Glen Choo
@ 2022-03-08  0:14         ` Glen Choo
  2022-03-08  0:14         ` [PATCH v5 06/10] submodule: store new submodule commits oid_array in a struct Glen Choo
                           ` (6 subsequent siblings)
  11 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-08  0:14 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

When collecting the string_list of changed submodule names, the new
submodule commits are stored in the string_list_item.util as an
oid_array. A subsequent commit will replace the oid_array with a struct
that has more information.

Prepare for this change by inlining submodule_commits() (which inserts
into the string_list and initializes the string_list_item.util) into its
only caller so that the code is easier to refactor later.
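
The inlined pattern is "insert the name, lazily allocate ->util on first
sight, then append". Below is a minimal, self-contained C sketch of that
pattern, assuming toy fixed-size stand-ins for string_list and oid_array
(hypothetical toy_* names; unlike git's string_list, this toy list is
unsorted).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_LIST_MAX 16

/* Hypothetical stand-ins for string_list_item, string_list, oid_array. */
struct toy_item {
	char name[64];
	void *util;
};

struct toy_list {
	struct toy_item items[TOY_LIST_MAX];
	int nr;
};

struct toy_oid_array {
	int nr; /* only counts appended commits in this toy */
};

/* Return the item named 'name', appending an empty one if absent. */
struct toy_item *toy_list_insert(struct toy_list *list, const char *name)
{
	struct toy_item *item;

	for (int i = 0; i < list->nr; i++)
		if (!strcmp(list->items[i].name, name))
			return &list->items[i];

	assert(list->nr < TOY_LIST_MAX); /* toy: fixed capacity */
	item = &list->items[list->nr++];
	strncpy(item->name, name, sizeof(item->name) - 1);
	item->name[sizeof(item->name) - 1] = '\0';
	item->util = NULL;
	return item;
}

/* The inlined caller logic: insert, lazily allocate ->util, append. */
void toy_record_commit(struct toy_list *changed, const char *name)
{
	struct toy_item *item = toy_list_insert(changed, name);

	if (!item->util) {
		item->util = calloc(1, sizeof(struct toy_oid_array));
		if (!item->util)
			abort(); /* git's xcalloc() dies instead */
	}
	((struct toy_oid_array *)item->util)->nr++;
}
```

Once the allocation sits in the caller, changing what ->util points to
only touches this one spot.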

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/submodule.c b/submodule.c
index 4f3300f2cb..3bc189cf05 100644
--- a/submodule.c
+++ b/submodule.c
@@ -782,19 +782,6 @@ const struct submodule *submodule_from_ce(const struct cache_entry *ce)
 	return submodule_from_path(the_repository, null_oid(), ce->name);
 }
 
-static struct oid_array *submodule_commits(struct string_list *submodules,
-					   const char *name)
-{
-	struct string_list_item *item;
-
-	item = string_list_insert(submodules, name);
-	if (item->util)
-		return (struct oid_array *) item->util;
-
-	/* NEEDSWORK: should we have oid_array_init()? */
-	item->util = xcalloc(1, sizeof(struct oid_array));
-	return (struct oid_array *) item->util;
-}
 
 struct collect_changed_submodules_cb_data {
 	struct repository *repo;
@@ -830,9 +817,9 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
-		struct oid_array *commits;
 		const struct submodule *submodule;
 		const char *name;
+		struct string_list_item *item;
 
 		if (!S_ISGITLINK(p->two->mode))
 			continue;
@@ -859,8 +846,11 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 		if (!name)
 			continue;
 
-		commits = submodule_commits(changed, name);
-		oid_array_append(commits, &p->two->oid);
+		item = string_list_insert(changed, name);
+		if (!item->util)
+			/* NEEDSWORK: should we have oid_array_init()? */
+			item->util = xcalloc(1, sizeof(struct oid_array));
+		oid_array_append(item->util, &p->two->oid);
 	}
 }
 
-- 
2.33.GIT


* [PATCH v5 06/10] submodule: store new submodule commits oid_array in a struct
  2022-03-08  0:14       ` [PATCH v5 " Glen Choo
                           ` (4 preceding siblings ...)
  2022-03-08  0:14         ` [PATCH v5 05/10] submodule: inline submodule_commits() into caller Glen Choo
@ 2022-03-08  0:14         ` Glen Choo
  2022-03-08  0:14         ` [PATCH v5 07/10] submodule: extract get_fetch_task() Glen Choo
                           ` (5 subsequent siblings)
  11 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-08  0:14 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

This commit prepares for a future commit that will teach `git fetch
--recurse-submodules` how to fetch submodules that are present in
<gitdir>/modules but are not populated. To do this, we need to store
more information about the changed submodule so that we can read the
submodule configuration from the superproject commit instead of the
filesystem.

Refactor the changed submodules string_list.util to hold a struct
instead of an oid_array. This struct only holds the new_commits
oid_array for now; more information will be added later.
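
To make the shape of the refactor concrete, here is a rough C sketch
with a toy, int-based oid_array (hypothetical toy_* names, not git's
API). The wrapper struct and its clear function mirror
changed_submodule_data and changed_submodule_data_clear(), so later
fields only need to be released in one place.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for git's oid_array. */
struct toy_oid_array {
	int *oids; /* toy "object ids" */
	int nr, alloc;
};

void toy_oid_array_append(struct toy_oid_array *a, int oid)
{
	if (a->nr == a->alloc) {
		int alloc = a->alloc ? 2 * a->alloc : 4;
		int *p = realloc(a->oids, alloc * sizeof(*p));

		if (!p)
			abort();
		a->oids = p;
		a->alloc = alloc;
	}
	a->oids[a->nr++] = oid;
}

void toy_oid_array_clear(struct toy_oid_array *a)
{
	free(a->oids);
	a->oids = NULL;
	a->nr = a->alloc = 0;
}

/* Mirrors struct changed_submodule_data: only new_commits for now. */
struct toy_changed_submodule_data {
	struct toy_oid_array new_commits;
};

/* New fields added later would also be released here. */
void toy_changed_submodule_data_clear(struct toy_changed_submodule_data *cs)
{
	toy_oid_array_clear(&cs->new_commits);
}
```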

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 52 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 34 insertions(+), 18 deletions(-)

diff --git a/submodule.c b/submodule.c
index 3bc189cf05..0b9c25f9d3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -806,6 +806,20 @@ static const char *default_name_or_path(const char *path_or_name)
 	return path_or_name;
 }
 
+/*
+ * Holds relevant information for a changed submodule. Used as the .util
+ * member of the changed submodule string_list_item.
+ */
+struct changed_submodule_data {
+	/* The submodule commits that have changed in the rev walk. */
+	struct oid_array new_commits;
+};
+
+static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
+{
+	oid_array_clear(&cs_data->new_commits);
+}
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 					  struct diff_options *options,
 					  void *data)
@@ -820,6 +834,7 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 		const struct submodule *submodule;
 		const char *name;
 		struct string_list_item *item;
+		struct changed_submodule_data *cs_data;
 
 		if (!S_ISGITLINK(p->two->mode))
 			continue;
@@ -848,9 +863,9 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 
 		item = string_list_insert(changed, name);
 		if (!item->util)
-			/* NEEDSWORK: should we have oid_array_init()? */
-			item->util = xcalloc(1, sizeof(struct oid_array));
-		oid_array_append(item->util, &p->two->oid);
+			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
+		cs_data = item->util;
+		oid_array_append(&cs_data->new_commits, &p->two->oid);
 	}
 }
 
@@ -897,11 +912,12 @@ static void collect_changed_submodules(struct repository *r,
 	reset_revision_walk();
 }
 
-static void free_submodules_oids(struct string_list *submodules)
+static void free_submodules_data(struct string_list *submodules)
 {
 	struct string_list_item *item;
 	for_each_string_list_item(item, submodules)
-		oid_array_clear((struct oid_array *) item->util);
+		changed_submodule_data_clear(item->util);
+
 	string_list_clear(submodules, 1);
 }
 
@@ -1074,7 +1090,7 @@ int find_unpushed_submodules(struct repository *r,
 	collect_changed_submodules(r, &submodules, &argv);
 
 	for_each_string_list_item(name, &submodules) {
-		struct oid_array *commits = name->util;
+		struct changed_submodule_data *cs_data = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
 
@@ -1087,11 +1103,11 @@ int find_unpushed_submodules(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_needs_pushing(r, path, commits))
+		if (submodule_needs_pushing(r, path, &cs_data->new_commits))
 			string_list_insert(needs_pushing, path);
 	}
 
-	free_submodules_oids(&submodules);
+	free_submodules_data(&submodules);
 	strvec_clear(&argv);
 
 	return needs_pushing->nr;
@@ -1261,7 +1277,7 @@ static void calculate_changed_submodule_paths(struct repository *r,
 	collect_changed_submodules(r, changed_submodule_names, &argv);
 
 	for_each_string_list_item(name, changed_submodule_names) {
-		struct oid_array *commits = name->util;
+		struct changed_submodule_data *cs_data = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
 
@@ -1274,8 +1290,8 @@ static void calculate_changed_submodule_paths(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_has_commits(r, path, null_oid(), commits)) {
-			oid_array_clear(commits);
+		if (submodule_has_commits(r, path, null_oid(), &cs_data->new_commits)) {
+			changed_submodule_data_clear(cs_data);
 			*name->string = '\0';
 		}
 	}
@@ -1312,7 +1328,7 @@ int submodule_touches_in_range(struct repository *r,
 
 	strvec_clear(&args);
 
-	free_submodules_oids(&subs);
+	free_submodules_data(&subs);
 	return ret;
 }
 
@@ -1596,7 +1612,7 @@ static int fetch_finish(int retvalue, struct strbuf *err,
 	struct fetch_task *task = task_cb;
 
 	struct string_list_item *it;
-	struct oid_array *commits;
+	struct changed_submodule_data *cs_data;
 
 	if (!task || !task->sub)
 		BUG("callback cookie bogus");
@@ -1624,14 +1640,14 @@ static int fetch_finish(int retvalue, struct strbuf *err,
 		/* Could be an unchanged submodule, not contained in the list */
 		goto out;
 
-	commits = it->util;
-	oid_array_filter(commits,
+	cs_data = it->util;
+	oid_array_filter(&cs_data->new_commits,
 			 commit_missing_in_sub,
 			 task->repo);
 
 	/* Are there commits we want, but do not exist? */
-	if (commits->nr) {
-		task->commits = commits;
+	if (cs_data->new_commits.nr) {
+		task->commits = &cs_data->new_commits;
 		ALLOC_GROW(spf->oid_fetch_tasks,
 			   spf->oid_fetch_tasks_nr + 1,
 			   spf->oid_fetch_tasks_alloc);
@@ -1689,7 +1705,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	strvec_clear(&spf.args);
 out:
-	free_submodules_oids(&spf.changed_submodule_names);
+	free_submodules_data(&spf.changed_submodule_names);
 	return spf.result;
 }
 
-- 
2.33.GIT


* [PATCH v5 07/10] submodule: extract get_fetch_task()
  2022-03-08  0:14       ` [PATCH v5 " Glen Choo
                           ` (5 preceding siblings ...)
  2022-03-08  0:14         ` [PATCH v5 06/10] submodule: store new submodule commits oid_array in a struct Glen Choo
@ 2022-03-08  0:14         ` Glen Choo
  2022-03-08  0:14         ` [PATCH v5 08/10] submodule: move logic into fetch_task_create() Glen Choo
                           ` (4 subsequent siblings)
  11 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-08  0:14 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

get_next_submodule() configures the parallel submodule fetch by
performing two functions:

* iterate the index to find submodules
* configure the child processes to fetch the submodules found in the
  previous step

Extract the index iterating code into an iterator function,
get_fetch_task(), so that get_next_submodule() is agnostic of how
to find submodules. This prepares for a subsequent commit that will teach the
fetch machinery to also iterate through the list of changed
submodules (in addition to the index).
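
As a rough illustration of the split, here is a toy C sketch
(hypothetical toy_* names and types, not git's code). The iterator owns
the cursor and returns the next task or NULL, and the consumer only
configures a "child" for whatever it is handed, so a second source of
tasks could be added without touching the consumer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; nothing here is git's actual API. */
struct toy_task {
	const char *path;
};

struct toy_fetch_state {
	const char **index;   /* gitlink paths; NULL marks a non-gitlink */
	int index_nr;
	int count;            /* iteration cursor, like spf->count */
	struct toy_task task; /* storage for the returned task */
};

/* Iterator: return the next task, or NULL when the index is exhausted. */
struct toy_task *toy_get_fetch_task(struct toy_fetch_state *st)
{
	for (; st->count < st->index_nr; st->count++) {
		const char *path = st->index[st->count];

		if (!path)
			continue; /* e.g. not a gitlink */
		st->task.path = path;
		st->count++;
		return &st->task;
	}
	return NULL;
}

/* Consumer: agnostic of how tasks are found. */
int toy_get_next_submodule(struct toy_fetch_state *st, const char **child_dir)
{
	struct toy_task *task = toy_get_fetch_task(st);

	if (!task)
		return 0;
	*child_dir = task->path;
	return 1;
}
```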

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 61 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 36 insertions(+), 25 deletions(-)

diff --git a/submodule.c b/submodule.c
index 0b9c25f9d3..7a5316b6f7 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1389,6 +1389,7 @@ struct fetch_task {
 	struct repository *repo;
 	const struct submodule *sub;
 	unsigned free_sub : 1; /* Do we need to free the submodule? */
+	const char *default_argv; /* The default fetch mode. */
 
 	struct oid_array *commits; /* Ensure these commits are fetched */
 };
@@ -1466,14 +1467,11 @@ static struct repository *get_submodule_repo_for(struct repository *r,
 	return ret;
 }
 
-static int get_next_submodule(struct child_process *cp,
-			      struct strbuf *err, void *data, void **task_cb)
+static struct fetch_task *
+get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 {
-	struct submodule_parallel_fetch *spf = data;
-
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
-		const char *default_argv;
 		struct fetch_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -1493,10 +1491,10 @@ static int get_next_submodule(struct child_process *cp,
 					&spf->changed_submodule_names,
 					task->sub->name))
 				continue;
-			default_argv = "on-demand";
+			task->default_argv = "on-demand";
 			break;
 		case RECURSE_SUBMODULES_ON:
-			default_argv = "yes";
+			task->default_argv = "yes";
 			break;
 		case RECURSE_SUBMODULES_OFF:
 			continue;
@@ -1504,29 +1502,12 @@ static int get_next_submodule(struct child_process *cp,
 
 		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
 		if (task->repo) {
-			struct strbuf submodule_prefix = STRBUF_INIT;
-			child_process_init(cp);
-			cp->dir = task->repo->gitdir;
-			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
-			cp->git_cmd = 1;
 			if (!spf->quiet)
 				strbuf_addf(err, _("Fetching submodule %s%s\n"),
 					    spf->prefix, ce->name);
-			strvec_init(&cp->args);
-			strvec_pushv(&cp->args, spf->args.v);
-			strvec_push(&cp->args, default_argv);
-			strvec_push(&cp->args, "--submodule-prefix");
-
-			strbuf_addf(&submodule_prefix, "%s%s/",
-						       spf->prefix,
-						       task->sub->path);
-			strvec_push(&cp->args, submodule_prefix.buf);
 
 			spf->count++;
-			*task_cb = task;
-
-			strbuf_release(&submodule_prefix);
-			return 1;
+			return task;
 		} else {
 			struct strbuf empty_submodule_path = STRBUF_INIT;
 
@@ -1550,6 +1531,36 @@ static int get_next_submodule(struct child_process *cp,
 			strbuf_release(&empty_submodule_path);
 		}
 	}
+	return NULL;
+}
+
+static int get_next_submodule(struct child_process *cp, struct strbuf *err,
+			      void *data, void **task_cb)
+{
+	struct submodule_parallel_fetch *spf = data;
+	struct fetch_task *task = get_fetch_task(spf, err);
+
+	if (task) {
+		struct strbuf submodule_prefix = STRBUF_INIT;
+
+		child_process_init(cp);
+		cp->dir = task->repo->gitdir;
+		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+		cp->git_cmd = 1;
+		strvec_init(&cp->args);
+		strvec_pushv(&cp->args, spf->args.v);
+		strvec_push(&cp->args, task->default_argv);
+		strvec_push(&cp->args, "--submodule-prefix");
+
+		strbuf_addf(&submodule_prefix, "%s%s/",
+						spf->prefix,
+						task->sub->path);
+		strvec_push(&cp->args, submodule_prefix.buf);
+		*task_cb = task;
+
+		strbuf_release(&submodule_prefix);
+		return 1;
+	}
 
 	if (spf->oid_fetch_tasks_nr) {
 		struct fetch_task *task =
-- 
2.33.GIT


* [PATCH v5 08/10] submodule: move logic into fetch_task_create()
  2022-03-08  0:14       ` [PATCH v5 " Glen Choo
                           ` (6 preceding siblings ...)
  2022-03-08  0:14         ` [PATCH v5 07/10] submodule: extract get_fetch_task() Glen Choo
@ 2022-03-08  0:14         ` Glen Choo
  2022-03-08  0:14         ` [PATCH v5 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
                           ` (3 subsequent siblings)
  11 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-08  0:14 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

get_fetch_task() gets a fetch task by iterating the index; a future
commit will introduce a similar function, get_fetch_task_from_changed(),
that gets a fetch task from the list of changed submodules. Both
functions are similar in that they need to:

* create a fetch task
* initialize the submodule repo for the fetch task
* determine the default recursion mode

Move all of this logic into fetch_task_create() so that it is no longer
split between fetch_task_create() and get_fetch_task(). This will make
it easier to share code with get_fetch_task_from_changed().
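
The recursion-mode part of the moved logic boils down to a small
decision. Here is a minimal C sketch of it, assuming a toy enum that
mirrors the RECURSE_SUBMODULES_* cases in the switch (the toy_* names
are hypothetical). NULL stands for the "goto cleanup" path, where no
task is created.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the RECURSE_SUBMODULES_* cases. */
enum toy_recurse_mode {
	TOY_RECURSE_DEFAULT,
	TOY_RECURSE_ON_DEMAND,
	TOY_RECURSE_ON,
	TOY_RECURSE_OFF,
};

/*
 * Return the default fetch mode for the child fetch, or NULL if the
 * submodule should be skipped. 'is_changed' stands in for the
 * string_list_lookup() on changed_submodule_names.
 */
const char *toy_default_argv(enum toy_recurse_mode mode, int is_changed)
{
	switch (mode) {
	default:
	case TOY_RECURSE_DEFAULT:
	case TOY_RECURSE_ON_DEMAND:
		return is_changed ? "on-demand" : NULL;
	case TOY_RECURSE_ON:
		return "yes";
	case TOY_RECURSE_OFF:
		return NULL;
	}
}
```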

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 99 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 52 insertions(+), 47 deletions(-)

diff --git a/submodule.c b/submodule.c
index 7a5316b6f7..b36ef26752 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1415,32 +1415,6 @@ static const struct submodule *get_non_gitmodules_submodule(const char *path)
 	return (const struct submodule *) ret;
 }
 
-static struct fetch_task *fetch_task_create(struct repository *r,
-					    const char *path,
-					    const struct object_id *treeish_name)
-{
-	struct fetch_task *task = xmalloc(sizeof(*task));
-	memset(task, 0, sizeof(*task));
-
-	task->sub = submodule_from_path(r, treeish_name, path);
-	if (!task->sub) {
-		/*
-		 * No entry in .gitmodules? Technically not a submodule,
-		 * but historically we supported repositories that happen to be
-		 * in-place where a gitlink is. Keep supporting them.
-		 */
-		task->sub = get_non_gitmodules_submodule(path);
-		if (!task->sub) {
-			free(task);
-			return NULL;
-		}
-
-		task->free_sub = 1;
-	}
-
-	return task;
-}
-
 static void fetch_task_release(struct fetch_task *p)
 {
 	if (p->free_sub)
@@ -1467,6 +1441,57 @@ static struct repository *get_submodule_repo_for(struct repository *r,
 	return ret;
 }
 
+static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf,
+					    const char *path,
+					    const struct object_id *treeish_name)
+{
+	struct fetch_task *task = xmalloc(sizeof(*task));
+	memset(task, 0, sizeof(*task));
+
+	task->sub = submodule_from_path(spf->r, treeish_name, path);
+
+	if (!task->sub) {
+		/*
+		 * No entry in .gitmodules? Technically not a submodule,
+		 * but historically we supported repositories that happen to be
+		 * in-place where a gitlink is. Keep supporting them.
+		 */
+		task->sub = get_non_gitmodules_submodule(path);
+		if (!task->sub)
+			goto cleanup;
+
+		task->free_sub = 1;
+	}
+
+	switch (get_fetch_recurse_config(task->sub, spf))
+	{
+	default:
+	case RECURSE_SUBMODULES_DEFAULT:
+	case RECURSE_SUBMODULES_ON_DEMAND:
+		if (!task->sub ||
+			!string_list_lookup(
+				&spf->changed_submodule_names,
+				task->sub->name))
+			goto cleanup;
+		task->default_argv = "on-demand";
+		break;
+	case RECURSE_SUBMODULES_ON:
+		task->default_argv = "yes";
+		break;
+	case RECURSE_SUBMODULES_OFF:
+		goto cleanup;
+	}
+
+	task->repo = get_submodule_repo_for(spf->r, path, treeish_name);
+
+	return task;
+
+ cleanup:
+	fetch_task_release(task);
+	free(task);
+	return NULL;
+}
+
 static struct fetch_task *
 get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 {
@@ -1477,30 +1502,10 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 		if (!S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		task = fetch_task_create(spf->r, ce->name, null_oid());
+		task = fetch_task_create(spf, ce->name, null_oid());
 		if (!task)
 			continue;
 
-		switch (get_fetch_recurse_config(task->sub, spf))
-		{
-		default:
-		case RECURSE_SUBMODULES_DEFAULT:
-		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!task->sub ||
-			    !string_list_lookup(
-					&spf->changed_submodule_names,
-					task->sub->name))
-				continue;
-			task->default_argv = "on-demand";
-			break;
-		case RECURSE_SUBMODULES_ON:
-			task->default_argv = "yes";
-			break;
-		case RECURSE_SUBMODULES_OFF:
-			continue;
-		}
-
-		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
 		if (task->repo) {
 			if (!spf->quiet)
 				strbuf_addf(err, _("Fetching submodule %s%s\n"),
-- 
2.33.GIT


* [PATCH v5 09/10] fetch: fetch unpopulated, changed submodules
  2022-03-08  0:14       ` [PATCH v5 " Glen Choo
                           ` (7 preceding siblings ...)
  2022-03-08  0:14         ` [PATCH v5 08/10] submodule: move logic into fetch_task_create() Glen Choo
@ 2022-03-08  0:14         ` Glen Choo
  2022-03-08  0:14         ` [PATCH v5 10/10] submodule: fix latent check_has_commit() bug Glen Choo
                           ` (2 subsequent siblings)
  11 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-08  0:14 UTC (permalink / raw)
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

"git fetch --recurse-submodules" only considers populated
submodules (i.e. submodules that can be found by iterating the index),
which makes "git fetch" behave differently based on which commit is
checked out. As a result, even if the user has initialized all submodules
correctly, they may not fetch the necessary submodule commits, and
commands like "git checkout --recurse-submodules" might fail.

Teach "git fetch" to fetch cloned, changed submodules regardless of
whether they are populated. This is in addition to the current behavior
of fetching populated submodules (which is always attempted regardless
of what was fetched in the superproject, or even if nothing was fetched
in the superproject).
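
Read together with the documentation change in this patch, the
resulting rules can be summarized as a small predicate. This is a toy C
sketch under that reading (hypothetical toy_* names, not the patch's
code). A submodule must be present locally to be fetched at all; 'yes'
fetches populated submodules plus unpopulated, changed ones; 'on-demand'
fetches only changed ones; 'no' fetches nothing.

```c
#include <assert.h>

/* Hypothetical summary of the documented modes. */
enum toy_mode { TOY_NO, TOY_ON_DEMAND, TOY_YES };

int toy_should_fetch(enum toy_mode mode, int populated, int changed,
		     int cloned)
{
	if (!cloned)
		return 0; /* e.g. a submodule newly added upstream */
	switch (mode) {
	case TOY_YES:
		return populated || changed;
	case TOY_ON_DEMAND:
		return changed;
	case TOY_NO:
	default:
		return 0;
	}
}
```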

A submodule may be encountered multiple times (via the list of
populated submodules or via the list of changed submodules). When this
happens, "git fetch" only reads the 'populated copy' and ignores the
'changed copy'. Amend the verify_fetch_result() test helper so that we
can assert on which 'copy' is being read.
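
The "first copy wins" rule can be pictured with the following toy C
sketch. It is not how the patch tracks seen submodules; it only
illustrates the ordering, assuming populated submodules are visited
before changed ones. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX 16

/* Hypothetical record of which "copy" of a submodule was used. */
struct toy_pick {
	const char *name;
	const char *copy; /* "populated" or "changed" */
};

static int toy_seen(const struct toy_pick *picks, int nr, const char *name)
{
	for (int i = 0; i < nr; i++)
		if (!strcmp(picks[i].name, name))
			return 1;
	return 0;
}

/* Visit populated first, then changed; keep only the first encounter. */
int toy_collect(const char **populated, int pop_nr,
		const char **changed, int chg_nr, struct toy_pick *out)
{
	int nr = 0;

	for (int i = 0; i < pop_nr && nr < TOY_MAX; i++)
		if (!toy_seen(out, nr, populated[i]))
			out[nr++] = (struct toy_pick){ populated[i], "populated" };
	for (int i = 0; i < chg_nr && nr < TOY_MAX; i++)
		if (!toy_seen(out, nr, changed[i]))
			out[nr++] = (struct toy_pick){ changed[i], "changed" };
	return nr;
}
```

With populated = {sub1} and changed = {sub1, sub2}, sub1 is taken from
the populated list and only sub2 comes from the changed list.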

Signed-off-by: Glen Choo <chooglen@google.com>
---
 Documentation/fetch-options.txt |  26 ++--
 Documentation/git-fetch.txt     |  10 +-
 builtin/fetch.c                 |  14 +-
 submodule.c                     | 178 +++++++++++++++++++--
 submodule.h                     |  12 +-
 t/t5526-fetch-submodules.sh     | 268 +++++++++++++++++++++++++++++++-
 6 files changed, 462 insertions(+), 46 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index f903683189..6cdd9d43c5 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -186,15 +186,23 @@ endif::git-pull[]
 ifndef::git-pull[]
 --recurse-submodules[=yes|on-demand|no]::
 	This option controls if and under what conditions new commits of
-	populated submodules should be fetched too. It can be used as a
-	boolean option to completely disable recursion when set to 'no' or to
-	unconditionally recurse into all populated submodules when set to
-	'yes', which is the default when this option is used without any
-	value. Use 'on-demand' to only recurse into a populated submodule
-	when the superproject retrieves a commit that updates the submodule's
-	reference to a commit that isn't already in the local submodule
-	clone. By default, 'on-demand' is used, unless
-	`fetch.recurseSubmodules` is set (see linkgit:git-config[1]).
+	submodules should be fetched too. When recursing through submodules,
+	`git fetch` always attempts to fetch "changed" submodules, that is, a
+	submodule that has commits that are referenced by a newly fetched
+	superproject commit but are missing in the local submodule clone. A
+	changed submodule can be fetched as long as it is present locally e.g.
+	in `$GIT_DIR/modules/` (see linkgit:gitsubmodules[7]); if the upstream
+	adds a new submodule, that submodule cannot be fetched until it is
+	cloned e.g. by `git submodule update`.
++
+When set to 'on-demand', only changed submodules are fetched. When set
+to 'yes', all populated submodules are fetched and submodules that are
+both unpopulated and changed are fetched. When set to 'no', submodules
+are never fetched.
++
+When unspecified, this uses the value of `fetch.recurseSubmodules` if it
+is set (see linkgit:git-config[1]), defaulting to 'on-demand' if unset.
+When this option is used without any value, it defaults to 'yes'.
 endif::git-pull[]
 
 -j::
diff --git a/Documentation/git-fetch.txt b/Documentation/git-fetch.txt
index 550c16ca61..e9d364669a 100644
--- a/Documentation/git-fetch.txt
+++ b/Documentation/git-fetch.txt
@@ -287,12 +287,10 @@ include::transfer-data-leaks.txt[]
 
 BUGS
 ----
-Using --recurse-submodules can only fetch new commits in already checked
-out submodules right now. When e.g. upstream added a new submodule in the
-just fetched commits of the superproject the submodule itself cannot be
-fetched, making it impossible to check out that submodule later without
-having to do a fetch again. This is expected to be fixed in a future Git
-version.
+Using --recurse-submodules can only fetch new commits in submodules that are
+present locally e.g. in `$GIT_DIR/modules/`. If the upstream adds a new
+submodule, that submodule cannot be fetched until it is cloned e.g. by `git
+submodule update`. This is expected to be fixed in a future Git version.
 
 SEE ALSO
 --------
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 95832ba1df..97a89763c8 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2178,13 +2178,13 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 			max_children = fetch_parallel_config;
 
 		add_options_to_argv(&options);
-		result = fetch_populated_submodules(the_repository,
-						    &options,
-						    submodule_prefix,
-						    recurse_submodules,
-						    recurse_submodules_default,
-						    verbosity < 0,
-						    max_children);
+		result = fetch_submodules(the_repository,
+					  &options,
+					  submodule_prefix,
+					  recurse_submodules,
+					  recurse_submodules_default,
+					  verbosity < 0,
+					  max_children);
 		strvec_clear(&options);
 	}
 
diff --git a/submodule.c b/submodule.c
index b36ef26752..1f5f39ce18 100644
--- a/submodule.c
+++ b/submodule.c
@@ -808,9 +808,25 @@ static const char *default_name_or_path(const char *path_or_name)
 
 /*
  * Holds relevant information for a changed submodule. Used as the .util
- * member of the changed submodule string_list_item.
+ * member of the changed submodule name string_list_item.
+ *
+ * (super_oid, path) allows the submodule config to be read from _some_
+ * .gitmodules file. We store this information the first time we find a
+ * superproject commit that points to the submodule, but this is
+ * arbitrary - we can choose any (super_oid, path) that matches the
+ * submodule's name.
  */
 struct changed_submodule_data {
+	/*
+	 * The first superproject commit in the rev walk that points to
+	 * the submodule.
+	 */
+	const struct object_id *super_oid;
+	/*
+	 * Path to the submodule in the superproject commit referenced
+	 * by 'super_oid'.
+	 */
+	char *path;
 	/* The submodule commits that have changed in the rev walk. */
 	struct oid_array new_commits;
 };
@@ -818,6 +834,7 @@ struct changed_submodule_data {
 static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
 {
 	oid_array_clear(&cs_data->new_commits);
+	free(cs_data->path);
 }
 
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
@@ -862,9 +879,14 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 			continue;
 
 		item = string_list_insert(changed, name);
-		if (!item->util)
+		if (item->util)
+			cs_data = item->util;
+		else {
 			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
-		cs_data = item->util;
+			cs_data = item->util;
+			cs_data->super_oid = commit_oid;
+			cs_data->path = xstrdup(p->two->path);
+		}
 		oid_array_append(&cs_data->new_commits, &p->two->oid);
 	}
 }
@@ -1253,14 +1275,36 @@ void check_for_new_submodule_commits(struct object_id *oid)
 	oid_array_append(&ref_tips_after_fetch, oid);
 }
 
+/*
+ * Returns 1 if there is at least one submodule gitdir in
+ * $GIT_DIR/modules and 0 otherwise. This follows
+ * submodule_name_to_gitdir(), which looks for submodules in
+ * $GIT_DIR/modules, not $GIT_COMMON_DIR.
+ *
+ * A submodule can be moved to $GIT_DIR/modules manually by running "git
+ * submodule absorbgitdirs", or it may be initialized there by "git
+ * submodule update".
+ */
+static int repo_has_absorbed_submodules(struct repository *r)
+{
+	int ret;
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_repo_git_path(&buf, r, "modules/");
+	ret = file_exists(buf.buf) && !is_empty_dir(buf.buf);
+	strbuf_release(&buf);
+	return ret;
+}
+
 static void calculate_changed_submodule_paths(struct repository *r,
 		struct string_list *changed_submodule_names)
 {
 	struct strvec argv = STRVEC_INIT;
 	struct string_list_item *name;
 
-	/* No need to check if there are no submodules configured */
-	if (!submodule_from_path(r, NULL, NULL))
+	/* No need to check if no submodules would be fetched */
+	if (!submodule_from_path(r, NULL, NULL) &&
+	    !repo_has_absorbed_submodules(r))
 		return;
 
 	strvec_push(&argv, "--"); /* argv[0] program name */
@@ -1333,7 +1377,16 @@ int submodule_touches_in_range(struct repository *r,
 }
 
 struct submodule_parallel_fetch {
-	int count;
+	/*
+	 * The index of the last index entry processed by
+	 * get_fetch_task_from_index().
+	 */
+	int index_count;
+	/*
+	 * The index of the last string_list entry processed by
+	 * get_fetch_task_from_changed().
+	 */
+	int changed_count;
 	struct strvec args;
 	struct repository *r;
 	const char *prefix;
@@ -1342,7 +1395,16 @@ struct submodule_parallel_fetch {
 	int quiet;
 	int result;
 
+	/*
+	 * Names of submodules that have new commits. Generated by
+	 * walking the newly fetched superproject commits.
+	 */
 	struct string_list changed_submodule_names;
+	/*
+	 * Names of submodules that have already been processed. Lets us
+	 * avoid fetching the same submodule more than once.
+	 */
+	struct string_list seen_submodule_names;
 
 	/* Pending fetches by OIDs */
 	struct fetch_task **oid_fetch_tasks;
@@ -1353,6 +1415,7 @@ struct submodule_parallel_fetch {
 #define SPF_INIT { \
 	.args = STRVEC_INIT, \
 	.changed_submodule_names = STRING_LIST_INIT_DUP, \
+	.seen_submodule_names = STRING_LIST_INIT_DUP, \
 	.submodules_with_errors = STRBUF_INIT, \
 }
 
@@ -1390,6 +1453,7 @@ struct fetch_task {
 	const struct submodule *sub;
 	unsigned free_sub : 1; /* Do we need to free the submodule? */
 	const char *default_argv; /* The default fetch mode. */
+	struct strvec git_args; /* Args for the child git process. */
 
 	struct oid_array *commits; /* Ensure these commits are fetched */
 };
@@ -1425,6 +1489,8 @@ static void fetch_task_release(struct fetch_task *p)
 	if (p->repo)
 		repo_clear(p->repo);
 	FREE_AND_NULL(p->repo);
+
+	strvec_clear(&p->git_args);
 }
 
 static struct repository *get_submodule_repo_for(struct repository *r,
@@ -1463,6 +1529,9 @@ static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf
 		task->free_sub = 1;
 	}
 
+	if (string_list_lookup(&spf->seen_submodule_names, task->sub->name))
+		goto cleanup;
+
 	switch (get_fetch_recurse_config(task->sub, spf))
 	{
 	default:
@@ -1493,10 +1562,12 @@ static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf
 }
 
 static struct fetch_task *
-get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
+get_fetch_task_from_index(struct submodule_parallel_fetch *spf,
+			  struct strbuf *err)
 {
-	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		const struct cache_entry *ce = spf->r->index->cache[spf->count];
+	for (; spf->index_count < spf->r->index->cache_nr; spf->index_count++) {
+		const struct cache_entry *ce =
+			spf->r->index->cache[spf->index_count];
 		struct fetch_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -1511,7 +1582,7 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 				strbuf_addf(err, _("Fetching submodule %s%s\n"),
 					    spf->prefix, ce->name);
 
-			spf->count++;
+			spf->index_count++;
 			return task;
 		} else {
 			struct strbuf empty_submodule_path = STRBUF_INIT;
@@ -1539,11 +1610,83 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
 	return NULL;
 }
 
+static struct fetch_task *
+get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
+			    struct strbuf *err)
+{
+	for (; spf->changed_count < spf->changed_submodule_names.nr;
+	     spf->changed_count++) {
+		struct string_list_item item =
+			spf->changed_submodule_names.items[spf->changed_count];
+		struct changed_submodule_data *cs_data = item.util;
+		struct fetch_task *task;
+
+		if (!is_tree_submodule_active(spf->r, cs_data->super_oid,cs_data->path))
+			continue;
+
+		task = fetch_task_create(spf, cs_data->path,
+					 cs_data->super_oid);
+		if (!task)
+			continue;
+
+		if (!task->repo) {
+			strbuf_addf(err, _("Could not access submodule '%s' at commit %s\n"),
+				    cs_data->path,
+				    find_unique_abbrev(cs_data->super_oid, DEFAULT_ABBREV));
+
+			fetch_task_release(task);
+			free(task);
+			continue;
+		}
+
+		if (!spf->quiet)
+			strbuf_addf(err,
+				    _("Fetching submodule %s%s at commit %s\n"),
+				    spf->prefix, task->sub->path,
+				    find_unique_abbrev(cs_data->super_oid,
+						       DEFAULT_ABBREV));
+
+		spf->changed_count++;
+		/*
+		 * NEEDSWORK: Submodules set/unset a value for
+		 * core.worktree when they are populated/unpopulated by
+		 * "git checkout" (and similar commands, see
+		 * submodule_move_head() and
+		 * connect_work_tree_and_git_dir()), but if the
+		 * submodule is unpopulated in another way (e.g. "git
+		 * rm", "rm -r"), core.worktree will still be set even
+		 * though the directory doesn't exist, and the child
+		 * process will crash while trying to chdir into the
+		 * nonexistent directory.
+		 *
+		 * In this case, we know that the submodule has no
+		 * working tree, so we can work around this by
+		 * setting "--work-tree=." (--bare does not work because
+		 * worktree settings take precedence over bare-ness).
+		 * However, this is not necessarily true in other cases,
+		 * so a generalized solution is still necessary.
+		 *
+		 * Possible solutions:
+		 * - teach "git [add|rm]" to unset core.worktree and
+		 *   discourage users from removing submodules without
+		 *   using a Git command.
+		 * - teach submodule child processes to ignore stale
+		 *   core.worktree values.
+		 */
+		strvec_push(&task->git_args, "--work-tree=.");
+		return task;
+	}
+	return NULL;
+}
+
 static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 			      void *data, void **task_cb)
 {
 	struct submodule_parallel_fetch *spf = data;
-	struct fetch_task *task = get_fetch_task(spf, err);
+	struct fetch_task *task =
+		get_fetch_task_from_index(spf, err);
+	if (!task)
+		task = get_fetch_task_from_changed(spf, err);
 
 	if (task) {
 		struct strbuf submodule_prefix = STRBUF_INIT;
@@ -1553,6 +1696,8 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
 		cp->git_cmd = 1;
 		strvec_init(&cp->args);
+		if (task->git_args.nr)
+			strvec_pushv(&cp->args, task->git_args.v);
 		strvec_pushv(&cp->args, spf->args.v);
 		strvec_push(&cp->args, task->default_argv);
 		strvec_push(&cp->args, "--submodule-prefix");
@@ -1564,6 +1709,7 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
 		*task_cb = task;
 
 		strbuf_release(&submodule_prefix);
+		string_list_insert(&spf->seen_submodule_names, task->sub->name);
 		return 1;
 	}
 
@@ -1678,11 +1824,11 @@ static int fetch_finish(int retvalue, struct strbuf *err,
 	return 0;
 }
 
-int fetch_populated_submodules(struct repository *r,
-			       const struct strvec *options,
-			       const char *prefix, int command_line_option,
-			       int default_option,
-			       int quiet, int max_parallel_jobs)
+int fetch_submodules(struct repository *r,
+		     const struct strvec *options,
+		     const char *prefix, int command_line_option,
+		     int default_option,
+		     int quiet, int max_parallel_jobs)
 {
 	int i;
 	struct submodule_parallel_fetch spf = SPF_INIT;
diff --git a/submodule.h b/submodule.h
index 784ceffc0e..61bebde319 100644
--- a/submodule.h
+++ b/submodule.h
@@ -88,12 +88,12 @@ int should_update_submodules(void);
  */
 const struct submodule *submodule_from_ce(const struct cache_entry *ce);
 void check_for_new_submodule_commits(struct object_id *oid);
-int fetch_populated_submodules(struct repository *r,
-			       const struct strvec *options,
-			       const char *prefix,
-			       int command_line_option,
-			       int default_option,
-			       int quiet, int max_parallel_jobs);
+int fetch_submodules(struct repository *r,
+		     const struct strvec *options,
+		     const char *prefix,
+		     int command_line_option,
+		     int default_option,
+		     int quiet, int max_parallel_jobs);
 unsigned is_submodule_modified(const char *path, int ignore_untracked);
 int submodule_uses_gitfile(const char *path);
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index aa6bb9867c..43dada8544 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -12,17 +12,29 @@ pwd=$(pwd)
 
 write_expected_sub () {
 	NEW_HEAD=$1 &&
+	SUPER_HEAD=$2 &&
 	cat >"$pwd/expect.err.sub" <<-EOF
-	Fetching submodule submodule
+	Fetching submodule submodule${SUPER_HEAD:+ at commit $SUPER_HEAD}
 	From $pwd/submodule
 	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
 	EOF
 }
 
+write_expected_sub2 () {
+	NEW_HEAD=$1 &&
+	SUPER_HEAD=$2 &&
+	cat >"$pwd/expect.err.sub2" <<-EOF
+	Fetching submodule submodule2${SUPER_HEAD:+ at commit $SUPER_HEAD}
+	From $pwd/submodule2
+	   OLD_HEAD..$NEW_HEAD  sub2       -> origin/sub2
+	EOF
+}
+
 write_expected_deep () {
 	NEW_HEAD=$1 &&
+	SUB_HEAD=$2 &&
 	cat >"$pwd/expect.err.deep" <<-EOF
-	Fetching submodule submodule/subdir/deepsubmodule
+	Fetching submodule submodule/subdir/deepsubmodule${SUB_HEAD:+ at commit $SUB_HEAD}
 	From $pwd/deepsubmodule
 	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
 	EOF
@@ -106,6 +118,10 @@ verify_fetch_result () {
 	then
 		cat expect.err.deep >>expect.err.combined
 	fi &&
+	if test -f expect.err.sub2
+	then
+		cat expect.err.sub2 >>expect.err.combined
+	fi &&
 	sed -e 's/[0-9a-f][0-9a-f]*\.\./OLD_HEAD\.\./' "$ACTUAL_ERR" >actual.err.cmp &&
 	test_cmp expect.err.combined actual.err.cmp
 }
@@ -419,6 +435,147 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 	verify_fetch_result actual.err
 '
 
+# These tests verify that we can fetch submodules that aren't in the
+# index.
+#
+# First, test the simple case where the index is empty and we only fetch
+# submodules that are not in the index.
+test_expect_success 'setup downstream branch without submodules' '
+	(
+		cd downstream &&
+		git checkout --recurse-submodules -b no-submodules &&
+		git rm .gitmodules &&
+		git rm submodule &&
+		git commit -m "no submodules" &&
+		git checkout --recurse-submodules super
+	)
+'
+
+test_expect_success "'--recurse-submodules=on-demand' should fetch submodule commits if the submodule is changed but the index has no submodules" '
+	add_submodule_commits &&
+	add_superproject_commits &&
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git fetch --recurse-submodules=on-demand >../actual.out 2>../actual.err
+	) &&
+	super_head=$(git rev-parse --short HEAD) &&
+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
+	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
+
+	# assert that these are fetched from commits, not the index
+	write_expected_sub $sub_head $super_head &&
+	write_expected_deep $deep_head $sub_head &&
+
+	test_must_be_empty actual.out &&
+	verify_fetch_result actual.err
+'
+
+test_expect_success "'--recurse-submodules' should fetch submodule commits if the submodule is changed but the index has no submodules" '
+	add_submodule_commits &&
+	add_superproject_commits &&
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	super_head=$(git rev-parse --short HEAD) &&
+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
+	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
+
+	# assert that these are fetched from commits, not the index
+	write_expected_sub $sub_head $super_head &&
+	write_expected_deep $deep_head $sub_head &&
+
+	test_must_be_empty actual.out &&
+	verify_fetch_result actual.err
+'
+
+test_expect_success "'--recurse-submodules' should ignore changed, inactive submodules" '
+	add_submodule_commits &&
+	add_superproject_commits &&
+
+	# Fetch the new superproject commit
+	(
+		cd downstream &&
+		git switch --recurse-submodules no-submodules &&
+		git -c submodule.submodule.active=false fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+	super_head=$(git rev-parse --short HEAD) &&
+	write_expected_super $super_head &&
+	# Neither should be fetched because the submodule is inactive
+	rm expect.err.sub &&
+	rm expect.err.deep &&
+	verify_fetch_result actual.err
+'
+
+# Now that we know we can fetch submodules that are not in the index,
+# test that we can fetch index and non-index submodules in the same
+# operation.
+test_expect_success 'setup downstream branch with other submodule' '
+	mkdir submodule2 &&
+	(
+		cd submodule2 &&
+		git init &&
+		echo sub2content >sub2file &&
+		git add sub2file &&
+		git commit -a -m new &&
+		git branch -M sub2
+	) &&
+	git checkout -b super-sub2-only &&
+	git submodule add "$pwd/submodule2" submodule2 &&
+	git commit -m "add sub2" &&
+	git checkout super &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules origin &&
+		git checkout super-sub2-only &&
+		# Explicitly run "git submodule update" because sub2 is new
+		# and has not been cloned.
+		git submodule update --init &&
+		git checkout --recurse-submodules super
+	)
+'
+
+test_expect_success "'--recurse-submodules' should fetch submodule commits in changed submodules and the index" '
+	test_when_finished "rm expect.err.sub2" &&
+	# Create new commit in origin/super
+	add_submodule_commits &&
+	add_superproject_commits &&
+
+	# Create new commit in origin/super-sub2-only
+	git checkout super-sub2-only &&
+	(
+		cd submodule2 &&
+		test_commit --no-tag foo
+	) &&
+	git add submodule2 &&
+	git commit -m "new submodule2" &&
+
+	git checkout super &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules >../actual.out 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+	sub2_head=$(git -C submodule2 rev-parse --short HEAD) &&
+	super_head=$(git rev-parse --short super) &&
+	super_sub2_only_head=$(git rev-parse --short super-sub2-only) &&
+	write_expected_sub2 $sub2_head $super_sub2_only_head &&
+
+	# write_expected_super cannot handle >1 branch. Since this is a
+	# one-off, construct expect.err.super manually.
+	cat >"$pwd/expect.err.super" <<-EOF &&
+	From $pwd/.
+	   OLD_HEAD..$super_head  super           -> origin/super
+	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
+	EOF
+	verify_fetch_result actual.err
+'
+
 test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
 	add_submodule_commits &&
 	echo a >> file &&
@@ -861,4 +1018,111 @@ test_expect_success 'recursive fetch after deinit a submodule' '
 	test_cmp expect actual
 '
 
+test_expect_success 'setup repo with upstreams that share a submodule name' '
+	mkdir same-name-1 &&
+	(
+		cd same-name-1 &&
+		git init -b main &&
+		test_commit --no-tag a
+	) &&
+	git clone same-name-1 same-name-2 &&
+	# same-name-1 and same-name-2 both add a submodule with the
+	# name "submodule"
+	(
+		cd same-name-1 &&
+		mkdir submodule &&
+		git -C submodule init -b main &&
+		test_commit -C submodule --no-tag a1 &&
+		git submodule add "$pwd/same-name-1/submodule" &&
+		git add submodule &&
+		git commit -m "super-a1"
+	) &&
+	(
+		cd same-name-2 &&
+		mkdir submodule &&
+		git -C submodule init -b main &&
+		test_commit -C submodule --no-tag a2 &&
+		git submodule add "$pwd/same-name-2/submodule" &&
+		git add submodule &&
+		git commit -m "super-a2"
+	) &&
+	git clone same-name-1 -o same-name-1 same-name-downstream &&
+	(
+		cd same-name-downstream &&
+		git remote add same-name-2 ../same-name-2 &&
+		git fetch --all &&
+		# init downstream with same-name-1
+		git submodule update --init
+	)
+'
+
+test_expect_success 'fetch --recurse-submodules updates name-conflicted, populated submodule' '
+	test_when_finished "git -C same-name-downstream checkout main" &&
+	(
+		cd same-name-1 &&
+		test_commit -C submodule --no-tag b1 &&
+		git add submodule &&
+		git commit -m "super-b1"
+	) &&
+	(
+		cd same-name-2 &&
+		test_commit -C submodule --no-tag b2 &&
+		git add submodule &&
+		git commit -m "super-b2"
+	) &&
+	(
+		cd same-name-downstream &&
+		# even though the .gitmodules is correct, we cannot
+		# fetch from same-name-2
+		git checkout same-name-2/main &&
+		git fetch --recurse-submodules same-name-1 &&
+		test_must_fail git fetch --recurse-submodules same-name-2
+	) &&
+	super_head1=$(git -C same-name-1 rev-parse HEAD) &&
+	git -C same-name-downstream cat-file -e $super_head1 &&
+
+	super_head2=$(git -C same-name-2 rev-parse HEAD) &&
+	git -C same-name-downstream cat-file -e $super_head2 &&
+
+	sub_head1=$(git -C same-name-1/submodule rev-parse HEAD) &&
+	git -C same-name-downstream/submodule cat-file -e $sub_head1 &&
+
+	sub_head2=$(git -C same-name-2/submodule rev-parse HEAD) &&
+	test_must_fail git -C same-name-downstream/submodule cat-file -e $sub_head2
+'
+
+test_expect_success 'fetch --recurse-submodules updates name-conflicted, unpopulated submodule' '
+	(
+		cd same-name-1 &&
+		test_commit -C submodule --no-tag c1 &&
+		git add submodule &&
+		git commit -m "super-c1"
+	) &&
+	(
+		cd same-name-2 &&
+		test_commit -C submodule --no-tag c2 &&
+		git add submodule &&
+		git commit -m "super-c2"
+	) &&
+	(
+		cd same-name-downstream &&
+		git checkout main &&
+		git rm .gitmodules &&
+		git rm submodule &&
+		git commit -m "no submodules" &&
+		git fetch --recurse-submodules same-name-1
+	) &&
+	head1=$(git -C same-name-1/submodule rev-parse HEAD) &&
+	head2=$(git -C same-name-2/submodule rev-parse HEAD) &&
+	(
+		cd same-name-downstream/.git/modules/submodule &&
+		# The submodule has core.worktree pointing to the "git
+		# rm"-ed directory, overwrite the invalid value. See
+		# comment in get_fetch_task_from_changed() for more
+		# information.
+		git --work-tree=. cat-file -e $head1 &&
+		test_must_fail git --work-tree=. cat-file -e $head2
+	)
+'
+
 test_done
-- 
2.33.GIT



* [PATCH v5 10/10] submodule: fix latent check_has_commit() bug
  2022-03-08  0:14       ` [PATCH v5 " Glen Choo
                           ` (8 preceding siblings ...)
  2022-03-08  0:14         ` [PATCH v5 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
@ 2022-03-08  0:14         ` Glen Choo
  2022-03-08  0:50         ` [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules Junio C Hamano
  2022-03-08 21:42         ` Jonathan Tan
  11 siblings, 0 replies; 149+ messages in thread
From: Glen Choo @ 2022-03-08  0:14 UTC
  To: git
  Cc: Glen Choo, Jonathan Tan, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

When check_has_commit() is called on a missing submodule, initialization
of the struct repository fails, but it attempts to clear the struct
anyway (which is a fatal error). This bug is masked by its only caller,
submodule_has_commits(), first calling add_submodule_odb(). The latter
fails if the submodule does not exist, making submodule_has_commits()
exit early and not invoke check_has_commit().

Fix this bug, and because calling add_submodule_odb() is no longer
necessary as of 13a2f620b2 (submodule: pass repo to
check_has_commit(), 2021-10-08), remove that call too.

This is the last caller of add_submodule_odb(), so remove that
function. (Submodule ODBs are still added as alternates via
add_submodule_odb_by_path().)

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 35 ++---------------------------------
 submodule.h |  9 ++++-----
 2 files changed, 6 insertions(+), 38 deletions(-)

diff --git a/submodule.c b/submodule.c
index 1f5f39ce18..6e6b2d04e4 100644
--- a/submodule.c
+++ b/submodule.c
@@ -167,26 +167,6 @@ void stage_updated_gitmodules(struct index_state *istate)
 
 static struct string_list added_submodule_odb_paths = STRING_LIST_INIT_NODUP;
 
-/* TODO: remove this function, use repo_submodule_init instead. */
-int add_submodule_odb(const char *path)
-{
-	struct strbuf objects_directory = STRBUF_INIT;
-	int ret = 0;
-
-	ret = strbuf_git_path_submodule(&objects_directory, path, "objects/");
-	if (ret)
-		goto done;
-	if (!is_directory(objects_directory.buf)) {
-		ret = -1;
-		goto done;
-	}
-	string_list_insert(&added_submodule_odb_paths,
-			   strbuf_detach(&objects_directory, NULL));
-done:
-	strbuf_release(&objects_directory);
-	return ret;
-}
-
 void add_submodule_odb_by_path(const char *path)
 {
 	string_list_insert(&added_submodule_odb_paths, xstrdup(path));
@@ -971,7 +951,8 @@ static int check_has_commit(const struct object_id *oid, void *data)
 
 	if (repo_submodule_init(&subrepo, cb->repo, cb->path, cb->super_oid)) {
 		cb->result = 0;
-		goto cleanup;
+		/* subrepo failed to init, so don't clean it up. */
+		return 0;
 	}
 
 	type = oid_object_info(&subrepo, oid, NULL);
@@ -1007,18 +988,6 @@ static int submodule_has_commits(struct repository *r,
 		.super_oid = super_oid
 	};
 
-	/*
-	 * Perform a cheap, but incorrect check for the existence of 'commits'.
-	 * This is done by adding the submodule's object store to the in-core
-	 * object store, and then querying for each commit's existence.  If we
-	 * do not have the commit object anywhere, there is no chance we have
-	 * it in the object store of the correct submodule and have it
-	 * reachable from a ref, so we can fail early without spawning rev-list
-	 * which is expensive.
-	 */
-	if (add_submodule_odb(path))
-		return 0;
-
 	oid_array_for_each_unique(commits, check_has_commit, &has_commit);
 
 	if (has_commit.result) {
diff --git a/submodule.h b/submodule.h
index 61bebde319..40c1445237 100644
--- a/submodule.h
+++ b/submodule.h
@@ -103,12 +103,11 @@ int submodule_uses_gitfile(const char *path);
 int bad_to_remove_submodule(const char *path, unsigned flags);
 
 /*
- * Call add_submodule_odb() to add the submodule at the given path to a list.
- * When register_all_submodule_odb_as_alternates() is called, the object stores
- * of all submodules in that list will be added as alternates in
- * the_repository.
+ * Call add_submodule_odb_by_path() to add the submodule at the given
+ * path to a list. When register_all_submodule_odb_as_alternates() is
+ * called, the object stores of all submodules in that list will be
+ * added as alternates in the_repository.
  */
-int add_submodule_odb(const char *path);
 void add_submodule_odb_by_path(const char *path);
 int register_all_submodule_odb_as_alternates(void);
 
-- 
2.33.GIT



* Re: [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-03-08  0:14       ` [PATCH v5 " Glen Choo
                           ` (9 preceding siblings ...)
  2022-03-08  0:14         ` [PATCH v5 10/10] submodule: fix latent check_has_commit() bug Glen Choo
@ 2022-03-08  0:50         ` Junio C Hamano
  2022-03-08 18:24           ` Glen Choo
  2022-03-08 21:42         ` Jonathan Tan
  11 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-03-08  0:50 UTC
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

>   It's true that we don't need <.super_oid, .path> in order to init the
>   subrepo, but it turns out that recursive fetch reads some
>   configuration values from .gitmodules (via submodule_from_path()), so
>   we still need to store super_oid in order to read the correct
>   .gitmodules file.

OK, but then do we know which .gitmodules file is the "correct" one,
when there are more than one .super_oid?  Or do we assume that
.gitmodules does not change in the range of superproject commits we
have fetched before deciding what commits need to be fetched in the
submodules?

> == Since v4
> - Rename test helpers (s/check_/write_expected_)
> - Test style fixes
> - Update test comments
> - Remove the manual test_cmp in the test that checks sub2 (but we still
>   construct expect.err.super manually).

All of these changes looked sensible.

Will queue.  Thanks.


* Re: [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-03-08  0:50         ` [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules Junio C Hamano
@ 2022-03-08 18:24           ` Glen Choo
  2022-03-09 19:13             ` Junio C Hamano
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-03-08 18:24 UTC
  To: Junio C Hamano; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>>   It's true that we don't need <.super_oid, .path> in order to init the
>>   subrepo, but it turns out that recursive fetch reads some
>>   configuration values from .gitmodules (via submodule_from_path()), so
>>   we still need to store super_oid in order to read the correct
>>   .gitmodules file.
>
> OK, but then do we know which .gitmodules file is the "correct" one,
> when there are more than one .super_oid?  Or do we assume that
> .gitmodules does not change in the range of superproject commits we
> have fetched before deciding what commits need to be fetched in the
> submodules?

This uses a "first one wins" approach, which obviously has no
correctness guarantees. In practice, though, I don't think this is
likely to cause problems:

- As far as I can tell, the only value we read from .gitmodules is
  'submodule.<name>.fetchRecurseSubmodules', and this value gets
  overridden by two other values: the CLI option, and the config
  variable with the same name in .git/config.

  During "git submodule init", we copy the config values from
  .gitmodules to .git/config. Since we can only fetch init-ed submodules
  anyway, it's quite unlikely that we will ever actually make use of the
  .gitmodules config.

- Even if we do use the .gitmodules config values, it's unlikely that
  the values in .gitmodules will change often, so it _probably_ won't
  matter which one we choose.

- This only matters when the submodule is not in the index. If the
  submodule _is_ in the index, we read .gitmodules from the filesystem,
  i.e. these patches shouldn't change the behavior for submodules in the
  index.


* Re: [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-03-08  0:14       ` [PATCH v5 " Glen Choo
                           ` (10 preceding siblings ...)
  2022-03-08  0:50         ` [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules Junio C Hamano
@ 2022-03-08 21:42         ` Jonathan Tan
  11 siblings, 0 replies; 149+ messages in thread
From: Jonathan Tan @ 2022-03-08 21:42 UTC (permalink / raw)
  To: Glen Choo
  Cc: Jonathan Tan, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:
> - <20220304235328.649768-1-jonathantanmy@google.com> I've described the
>   differences between the no-submodule-in-index test and the
>   other-submodule-in-index test (their comments now refer to one
>   another, so the contrast is more obvious), but didn't reorder them
>   because I thought that made the test setup less intuitive to read.

Thanks - the comments make sense.

> - <20220304234622.647776-1-jonathantanmy@google.com> I added
>   expect.err.sub2 to verify_test_result() but didn't change
>   write_expected_super() to account for sub2. It turned out to be tricky
>   to predict the output when 'super' fetches >1 branch because each
>   fetched branch can affect the formatting. e.g.
> 
>     	   OLD_HEAD..super  super           -> origin/super
> 
>   can become
> 
>     	   OLD_HEAD..super  super                   -> origin/super
>     	   OLD_HEAD..super  some-other-branch       -> origin/some-other-branch
> 
>   (I could work around this by replacing the whitespace with sed, but it
>   seemed like too much overhead for a one-off test).

Overwriting just the super part works for me, thanks.
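
The formatting problem quoted above comes from column alignment: the
ref-name column is padded to a width shared by every line of the fetch
output, so fetching one more branch can change the padding on the
"super" line. The sketch below illustrates that effect only; the width
rule (longest ref name) is a simplifying assumption, not git's exact
logic, and `ref_column_width()`/`format_ref_line()` are hypothetical
names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Assumed, simplified rule: the ref column is as wide as the longest
 * ref name updated by this fetch.
 */
static int ref_column_width(const char **refs, size_t nr)
{
	int width = 0;

	for (size_t i = 0; i < nr; i++) {
		int len = (int)strlen(refs[i]);
		if (len > width)
			width = len;
	}
	return width;
}

/* Format a line like "   OLD_HEAD..super  super -> origin/super". */
static void format_ref_line(char *buf, size_t size, const char *range,
			    const char *ref, int width,
			    const char *remote_ref)
{
	assert(buf && range && ref && remote_ref);
	snprintf(buf, size, "   %s  %-*s -> %s", range, width, ref,
		 remote_ref);
}
```

With only "super" fetched the line is 41 characters long; adding
"some-other-branch" widens the column to 17 and the same "super" line
grows to 53 characters, which is why a literal expected-output file
becomes brittle.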

The only thing remaining from me is my comment about fetching OIDs from
one submodule into another (of the same name but different URL) [1], but
I looked into it myself and we can probably postpone handling this to
another patch set.

In such a patch set, we would probably need to store the URLs that are
reported by upstream .gitmodules somewhere. (I forgot that we don't use
them in this patch.) And then, either implement an autosync function
(like "git submodule sync", perhaps gated by a "--sync-submodules"
argument so that users can include it when fetching new commits and
exclude it when fetching historical commits) and/or use those URLs in a
diagnostic message to be printed when the fetch fails.

As it is, the existing fetch-into-submodules-at-HEAD also suffers from
the same flaw, so I'm OK postponing this to another patch set.
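
One way to picture the suggested diagnostic is a comparison between the
URL that "git submodule init" recorded in .git/config
(submodule.<name>.url) and the URL an upstream .gitmodules reports for
the same name. This is only a hypothetical sketch: the function name and
message text are invented, and where the upstream URL would be stored
is left open, as in the discussion above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical diagnostic: returns 1 and writes a message into `buf`
 * when the locally configured URL differs from the upstream one,
 * 0 when they agree.
 */
static int diagnose_url_mismatch(char *buf, size_t size,
				 const char *name, const char *local_url,
				 const char *upstream_url)
{
	assert(name && local_url && upstream_url);
	if (!strcmp(local_url, upstream_url))
		return 0;
	snprintf(buf, size,
		 "submodule '%s': .git/config has URL '%s', but upstream "
		 ".gitmodules has '%s'; the fetch may have used the wrong "
		 "repository",
		 name, local_url, upstream_url);
	return 1;
}
```

Such a message would only be printed when a fetch fails, so that users
fetching historical commits are not nagged about URLs that have since
changed.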

So,
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>

[1] https://lore.kernel.org/git/20220304234622.647776-1-jonathantanmy@google.com/


* Re: [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-03-08 18:24           ` Glen Choo
@ 2022-03-09 19:13             ` Junio C Hamano
  2022-03-09 19:49               ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-03-09 19:13 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

> This uses a "first one wins" approach, which obviously doesn't have
> correctness guarantees. But in practice, I don't think this is likely to
> cause problems:
>
> - As far as I can tell, the only value we read from .gitmodules is
>   'submodule.<name>.fetchRecurseSubmodules', and this value gets
>   overridden by two other values: the CLI option, and the config
>   variable with the same name in .git/config.
>
>   During "git submodule init", we copy the config values from
>   .gitmodules to .git/config. Since we can only fetch init-ed submodules
>   anyway, it's quite unlikely that we will ever actually make use of the
>   .gitmodules config.

These are reasonable.

> - Even if we do use the .gitmodules config values, it's unlikely that
>   the values in .gitmodules will change often, so it _probably_ won't
>   matter which one we choose.

What bad things would we see if the value changes during the span of
history of the superproject we fetched?  How often we would see
broken behaviour is immaterial, and breakage being rare is no excuse
to import new code with a designed-in flaw.  Unless the "rare" is
"never", that is.

I would think using ANY values from .gitmodules without having the
end-user agree with the settings and copying the settings to the
.git/config is a BUG.  So if it mattered from which superproject
commit we took .gitmodules from, that would mean we already have
such a bug and it is not a new problem.

That would be a reasonable argument for this topic. Together with
the previous point, i.e. we do not copy values we see in the in-tree
.gitmodules file to .git/config anyway, it would make a good enough
assurance, I would think.

> - This only matters when the submodule is not in the index. If the
>   submodule _is_ in the index, we read .gitmodules from the filesystem,
>   i.e. these patches shouldn't change the behavior for submodules in the
>   index.

How often we would see broken behaviour does not matter.  If it is
broken when the submodule is not in the index, we need to know.

But as you said, it does not sound likely that in-tree .gitmodules
matters.

It leads to a possible #leftoverbit clean-up.  Because we only fetch
submodules that are initialized, the API functions we are using in
this series have no reason to require us to feed _a_ commit in the
superproject to them so that they can find .gitmodules in them.

Fixing the API can probably be left outside the scope of the topic,
to be done soon after the dust from the topic settles, I think, to
avoid distracting us from the topic.
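
The point that initialized submodules need no superproject commit can be
illustrated with a check keyed only by name. The sketch below roughly
follows the documented activity rules (an explicit
submodule.<name>.active wins, otherwise the presence of
submodule.<name>.url written by "git submodule init" counts); the toy
`struct config_entry` store and both function names are hypothetical,
and the submodule.active pathspec and full boolean parsing are omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy view of .git/config as key/value pairs (assumption). */
struct config_entry {
	const char *key;
	const char *value;
};

static const char *config_get(const struct config_entry *cfg, size_t nr,
			      const char *key)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(cfg[i].key, key))
			return cfg[i].value;
	return NULL;
}

/*
 * Simplified "is this submodule initialized?" test that needs only the
 * submodule name and local config, never a superproject commit.
 */
static int submodule_initialized(const struct config_entry *cfg,
				 size_t nr, const char *name)
{
	char key[256];
	const char *v;

	assert(name);
	snprintf(key, sizeof(key), "submodule.%s.active", name);
	v = config_get(cfg, nr, key);
	if (v)
		return !strcmp(v, "true"); /* simplified boolean parsing */
	snprintf(key, sizeof(key), "submodule.%s.url", name);
	return config_get(cfg, nr, key) != NULL;
}
```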

Thanks.




* Re: [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-03-09 19:13             ` Junio C Hamano
@ 2022-03-09 19:49               ` Glen Choo
  2022-03-09 20:22                 ` Junio C Hamano
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-03-09 19:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> This uses a "first one wins" approach, which obviously doesn't have
>> correctness guarantees. But in practice, I don't think this is likely to
>> cause problems:
>>
>> - As far as I can tell, the only value we read from .gitmodules is
>>   'submodule.<name>.fetchRecurseSubmodules', and this value gets
>>   overridden by two other values: the CLI option, and the config
>>   variable with the same name in .git/config.
>>
>>   During "git submodule init", we copy the config values from
>>   .gitmodules to .git/config. Since we can only fetch init-ed submodules
>>   anyway, it's quite unlikely that we will ever actually make use of the
>>   .gitmodules config.
>
> These are reasonable.
>
>> - Even if we do use the .gitmodules config values, it's unlikely that
>>   the values in .gitmodules will change often, so it _probably_ won't
>>   matter which one we choose.
>
> What bad things would we see if the value changes during the span of
> history of the superproject we fetched?  How often we would see
> broken behaviour is immaterial, and breakage being rare is no excuse
> to import new code with a designed-in flaw.  Unless the "rare" is
> "never", that is.

Makes sense, I'll keep this in mind.

> I would think using ANY values from .gitmodules without having the
> end-user agree with the settings and copying the settings to the
> .git/config is a BUG.  So if it mattered from which superproject
> commit we took .gitmodules from, that would mean we already have
> such a bug and it is not a new problem.
>
> That would be a reasonable argument for this topic. Together with
> the previous point, i.e. we do not copy values we see in the in-tree
> .gitmodules file to .git/config anyway, it would make a good enough
> assurance, I would think.

To clarify, does this opinion of "don't use config values that aren't
copied into .git/config" extend to in-tree .gitmodules? Prior to this
series, we always read the in-tree .gitmodules to get the config - the
user does not need to copy the settings to .git/config, but we don't
pick a commit to read .gitmodules from.

If we still want to consider in-tree .gitmodules e.g. by merging
.git/config and .gitmodules, then we still have the new problem of
choosing the right .gitmodules.

If the answer is "no, we don't even consider in-tree .gitmodules"
(unless we really have to, like cloning a new submodule), that seems
pretty safe and predictable because we wouldn't have to look in two
different places to figure out what the user wants. And more crucially,
we'd never have to guess which .gitmodules to read - which will become
more of an issue as we add more support for init-ed but unpopulated
submodules.
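
The two options being asked about differ only in whether .gitmodules
takes part in the lookup for a setting such as
submodule.<name>.fetchRecurseSubmodules. Under the precedence described
earlier in the thread (command-line option, then .git/config, then
.gitmodules), a hypothetical resolver might look like this; the enum and
function are invented for illustration. With `consult_gitmodules` set to
0, no superproject commit ever needs to be chosen.

```c
#include <assert.h>

/* Hypothetical tri-state (plus "unset") for fetchRecurseSubmodules. */
enum recurse_mode {
	RECURSE_UNSET = -1,
	RECURSE_OFF = 0,
	RECURSE_ON = 1,
	RECURSE_ON_DEMAND = 2
};

/*
 * Pick the effective value: CLI option, then .git/config, then
 * (optionally) .gitmodules, then the caller-supplied fallback.
 */
static enum recurse_mode resolve_fetch_recurse(enum recurse_mode cli,
					       enum recurse_mode git_config,
					       enum recurse_mode gitmodules,
					       int consult_gitmodules,
					       enum recurse_mode fallback)
{
	assert(fallback != RECURSE_UNSET);
	if (cli != RECURSE_UNSET)
		return cli;
	if (git_config != RECURSE_UNSET)
		return git_config;
	if (consult_gitmodules && gitmodules != RECURSE_UNSET)
		return gitmodules;
	return fallback;
}
```

Only the third step depends on which .gitmodules was read, which is why
dropping it removes the "which commit?" question entirely.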

> It leads to a possible #leftoverbit clean-up.  Because we only fetch
> submodules that are initialized, the API functions we are using in
> this series have no reason to require us to feed _a_ commit in the
> superproject to them so that they can find .gitmodules in them.

Hm, this is true; an initialized submodule should already have the
'expected' information in .git/config. And if we no longer have to fret
about whether we're reading the correct .gitmodules, we can revisit the
idea of "init a subrepo using only its name".

> Fixing the API can probably be left outside the scope of the topic,
> to be done soon after the dust from the topic settles, I think, to
> avoid distracting us from the topic.
>
> Thanks.


* Re: [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-03-09 19:49               ` Glen Choo
@ 2022-03-09 20:22                 ` Junio C Hamano
  2022-03-09 22:11                   ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-03-09 20:22 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

> To clarify, does this opinion of "don't use config values that aren't
> copied into .git/config" extend to in-tree .gitmodules? Prior to this
> series, we always read the in-tree .gitmodules to get the config - the
> user does not need to copy the settings to .git/config, but we don't
> pick a commit to read .gitmodules from.

I think we do, but I also think it was a huge mistake to allow
repository data to directly affect the behaviour of local checkout.

Fixing that is most likely outside the scope of this series, though.


* Re: [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-03-09 20:22                 ` Junio C Hamano
@ 2022-03-09 22:11                   ` Glen Choo
  2022-03-16 21:58                     ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-03-09 22:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> To clarify, does this opinion of "don't use config values that aren't
>> copied into .git/config" extend to in-tree .gitmodules? Prior to this
>> series, we always read the in-tree .gitmodules to get the config - the
>> user does not need to copy the settings to .git/config, but we don't
>> pick a commit to read .gitmodules from.
>
> I think we do, but I also think it was a huge mistake to allow
> repository data to directly affect the behaviour of local checkout.

I'm inclined to agree.

> Fixing that is most likely outside the scope of this series, though.

Agree. Thanks!


* Re: [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-03-09 22:11                   ` Glen Choo
@ 2022-03-16 21:58                     ` Glen Choo
  2022-03-16 22:06                       ` Junio C Hamano
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-03-16 21:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Glen Choo <chooglen@google.com> writes:
>>
>>> To clarify, does this opinion of "don't use config values that aren't
>>> copied into .git/config" extend to in-tree .gitmodules? Prior to this
>>> series, we always read the in-tree .gitmodules to get the config - the
>>> user does not need to copy the settings to .git/config, but we don't
>>> pick a commit to read .gitmodules from.
>>
>> I think we do, but I also think it was a huge mistake to allow
>> repository data to directly affect the behaviour of local checkout.
>
> I'm inclined to agree.
>
>> Fixing that is most likely outside the scope of this series, though.
>
> Agree. Thanks!

I thought that this would have been the end of the discussion, but after
reading <xmqqa6dpllmc.fsf@gitster.g>, I guess I had the wrong impression
(oops).

If I am reading everything correctly, we both agree that it's not
good to read _any_ config values from .gitmodules (even if it's
in-tree), and that we should clean it up outside of this topic. So for
this topic to be merged into 'next', is it enough to say that I will fix
this behavior in a follow up topic?


* Re: [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-03-16 21:58                     ` Glen Choo
@ 2022-03-16 22:06                       ` Junio C Hamano
  2022-03-16 22:37                         ` Glen Choo
  0 siblings, 1 reply; 149+ messages in thread
From: Junio C Hamano @ 2022-03-16 22:06 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> Glen Choo <chooglen@google.com> writes:
>>>
>>>> To clarify, does this opinion of "don't use config values that aren't
>>>> copied into .git/config" extend to in-tree .gitmodules? Prior to this
>>>> series, we always read the in-tree .gitmodules to get the config - the
>>>> user does not need to copy the settings to .git/config, but we don't
>>>> pick a commit to read .gitmodules from.
>>>
>>> I think we do, but I also think it was a huge mistake to allow
>>> repository data to directly affect the behaviour of local checkout.
>>
>> I'm inclined to agree.
>>
>>> Fixing that is most likely outside the scope of this series, though.
>>
>> Agree. Thanks!
>
> I thought that this would have been the end of the discussion, but after
> reading <xmqqa6dpllmc.fsf@gitster.g>, I guess I had the wrong impression
> (oops).
>
> If I am reading everything correctly, we both agree that it's not
> good to read _any_ config values from .gitmodules (even if it's
> in-tree), and that we should clean it up outside of this topic. So for
> this topic to be merged into 'next', is it enough to say that I will fix
> this behavior in a follow up topic?

At least we should remember that is something to be fixed.  It may
not be you personally who addresses that issue, though ;-)


* Re: [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-03-16 22:06                       ` Junio C Hamano
@ 2022-03-16 22:37                         ` Glen Choo
  2022-03-16 23:08                           ` Junio C Hamano
  0 siblings, 1 reply; 149+ messages in thread
From: Glen Choo @ 2022-03-16 22:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Glen Choo <chooglen@google.com> writes:
>>
>>> Junio C Hamano <gitster@pobox.com> writes:
>>>
>>>> Glen Choo <chooglen@google.com> writes:
>>>>
>>>>> To clarify, does this opinion of "don't use config values that aren't
>>>>> copied into .git/config" extend to in-tree .gitmodules? Prior to this
>>>>> series, we always read the in-tree .gitmodules to get the config - the
>>>>> user does not need to copy the settings to .git/config, but we don't
>>>>> pick a commit to read .gitmodules from.
>>>>
>>>> I think we do, but I also think it was a huge mistake to allow
>>>> repository data to directly affect the behaviour of local checkout.
>>>
>>> I'm inclined to agree.
>>>
>>>> Fixing that is most likely outside the scope of this series, though.
>>>
>>> Agree. Thanks!
>>
>> I thought that this would have been the end of the discussion, but after
>> reading <xmqqa6dpllmc.fsf@gitster.g>, I guess I had the wrong impression
>> (oops).
>>
>> If I am reading everything correctly, we both agree that it's not
>> good to read _any_ config values from .gitmodules (even if it's
>> in-tree), and that we should clean it up outside of this topic. So for
>> this topic to be merged into 'next', is it enough to say that I will fix
>> this behavior in a follow up topic?
>
> At least we should remember that is something to be fixed.  It may
> not be you personally who addresses that issue, though ;-)

Perhaps squashing in a NEEDSWORK comment into [PATCH v5 09/10] will
suffice? I can also resend this series if preferred.

----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----

diff --git a/submodule.c b/submodule.c
index 6e6b2d04e4..93c78a4dc3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -795,6 +795,21 @@ static const char *default_name_or_path(const char *path_or_name)
  * superproject commit that points to the submodule, but this is
  * arbitrary - we can choose any (super_oid, path) that matches the
  * submodule's name.
+ *
+ * NEEDSWORK: Storing an arbitrary commit is undesirable because we can't
+ * guarantee that we're reading the commit that the user would expect. A better
+ * scheme would be to just fetch a submodule by its name. This requires two
+ * steps:
+ * - Create a function that behaves like repo_submodule_init(), but accepts a
+ *   submodule name instead of treeish_name and path. This should be easy
+ *   because repo_submodule_init() internally uses the submodule's name.
+ *
+ * - Replace most instances of 'struct submodule' (which is the .gitmodules
+ *   config) with just the submodule name. This is OK because we expect
+ *   submodule settings to be stored in .git/config (via "git submodule init"),
+ *   not .gitmodules. This also lets us delete get_non_gitmodules_submodule(),
+ *   which constructs a bogus 'struct submodule' for the sake of giving a
+ *   placeholder name to a gitlink.
  */
 struct changed_submodule_data {
 	/*
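
The NEEDSWORK comment argues that a name is enough to locate an
initialized submodule. A minimal sketch of why: in the usual layout the
submodule's repository lives at $GIT_DIR/modules/<name> and its URL is
stored under submodule.<name>.url in .git/config, so neither path needs
a superproject commit. The function below is hypothetical and is not the
repo_submodule_init() replacement the comment proposes; it only derives
those two locations from the name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical name-only lookup: derive the submodule gitdir and the
 * config key holding its URL. Returns 0 on success, -1 if a buffer is
 * too small.
 */
static int submodule_locations_by_name(const char *git_dir,
				       const char *name,
				       char *gitdir, size_t gitdir_size,
				       char *url_key, size_t url_key_size)
{
	int n;

	assert(git_dir && name);
	n = snprintf(gitdir, gitdir_size, "%s/modules/%s", git_dir, name);
	if (n < 0 || (size_t)n >= gitdir_size)
		return -1;
	n = snprintf(url_key, url_key_size, "submodule.%s.url", name);
	if (n < 0 || (size_t)n >= url_key_size)
		return -1;
	return 0;
}
```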


* Re: [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules
  2022-03-16 22:37                         ` Glen Choo
@ 2022-03-16 23:08                           ` Junio C Hamano
  0 siblings, 0 replies; 149+ messages in thread
From: Junio C Hamano @ 2022-03-16 23:08 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Jonathan Tan, Ævar Arnfjörð Bjarmason

Glen Choo <chooglen@google.com> writes:

> Perhaps squashing in a NEEDSWORK comment into [PATCH v5 09/10] will
> suffice? I can also resend this series if preferred.

It should work.  Let me try it in the last integration cycle of
today.

> ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----
>
> diff --git a/submodule.c b/submodule.c
> index 6e6b2d04e4..93c78a4dc3 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -795,6 +795,21 @@ static const char *default_name_or_path(const char *path_or_name)
>   * superproject commit that points to the submodule, but this is
>   * arbitrary - we can choose any (super_oid, path) that matches the
>   * submodule's name.
> + *
> + * NEEDSWORK: Storing an arbitrary commit is undesirable because we can't
> + * guarantee that we're reading the commit that the user would expect. A better
> + * scheme would be to just fetch a submodule by its name. This requires two
> + * steps:
> + * - Create a function that behaves like repo_submodule_init(), but accepts a
> + *   submodule name instead of treeish_name and path. This should be easy
> + *   because repo_submodule_init() internally uses the submodule's name.
> + *
> + * - Replace most instances of 'struct submodule' (which is the .gitmodules
> + *   config) with just the submodule name. This is OK because we expect
> + *   submodule settings to be stored in .git/config (via "git submodule init"),
> + *   not .gitmodules. This also lets us delete get_non_gitmodules_submodule(),
> + *   which constructs a bogus 'struct submodule' for the sake of giving a
> + *   placeholder name to a gitlink.
>   */
>  struct changed_submodule_data {
>  	/*


end of thread

Thread overview: 149+ messages
2022-02-10  4:41 [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
2022-02-10  4:41 ` [PATCH 1/8] submodule: inline submodule_commits() into caller Glen Choo
2022-02-10  4:41 ` [PATCH 2/8] submodule: store new submodule commits oid_array in a struct Glen Choo
2022-02-10 19:00   ` Jonathan Tan
2022-02-10 22:05     ` Junio C Hamano
2022-02-10  4:41 ` [PATCH 3/8] submodule: make static functions read submodules from commits Glen Choo
2022-02-10 19:15   ` Jonathan Tan
2022-02-11 10:07     ` Glen Choo
2022-02-11 10:09     ` Glen Choo
2022-02-10  4:41 ` [PATCH 4/8] t5526: introduce test helper to assert on fetches Glen Choo
2022-02-10  4:41 ` [PATCH 5/8] t5526: use grep " Glen Choo
2022-02-10 19:17   ` Jonathan Tan
2022-02-10  4:41 ` [PATCH 6/8] submodule: extract get_fetch_task() Glen Choo
2022-02-10 19:33   ` Jonathan Tan
2022-02-10  4:41 ` [PATCH 7/8] fetch: fetch unpopulated, changed submodules Glen Choo
2022-02-10 22:49   ` Junio C Hamano
2022-02-11  7:15     ` Glen Choo
2022-02-11 17:07       ` Junio C Hamano
2022-02-10 22:51   ` Jonathan Tan
2022-02-14  4:24     ` Glen Choo
2022-02-14 18:04     ` Glen Choo
2022-02-14 10:17   ` Glen Choo
2022-02-10  4:41 ` [PATCH 8/8] submodule: fix bug and remove add_submodule_odb() Glen Choo
2022-02-10 22:54   ` Junio C Hamano
2022-02-11  3:13     ` Glen Choo
2022-02-10 23:04   ` Jonathan Tan
2022-02-11  3:18     ` Glen Choo
2022-02-11 17:19     ` Junio C Hamano
2022-02-14  2:52       ` Glen Choo
2022-02-10  7:07 ` [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules Junio C Hamano
2022-02-10  8:51   ` Glen Choo
2022-02-10 17:40     ` Junio C Hamano
2022-02-11  2:39       ` Glen Choo
2022-02-15 17:23 ` [PATCH v2 0/9] " Glen Choo
2022-02-15 17:23   ` [PATCH v2 1/9] t5526: introduce test helper to assert on fetches Glen Choo
2022-02-15 21:37     ` Ævar Arnfjörð Bjarmason
2022-02-15 17:23   ` [PATCH v2 2/9] t5526: use grep " Glen Choo
2022-02-15 21:53     ` Ævar Arnfjörð Bjarmason
2022-02-16  3:09       ` Glen Choo
2022-02-16 10:02         ` Ævar Arnfjörð Bjarmason
2022-02-17  4:04           ` Glen Choo
2022-02-17  9:25             ` Ævar Arnfjörð Bjarmason
2022-02-17 16:16               ` Glen Choo
2022-02-15 17:23   ` [PATCH v2 3/9] submodule: make static functions read submodules from commits Glen Choo
2022-02-15 21:18     ` Jonathan Tan
2022-02-16  6:59       ` Glen Choo
2022-02-15 22:00     ` Ævar Arnfjörð Bjarmason
2022-02-16  7:08       ` Glen Choo
2022-02-15 17:23   ` [PATCH v2 4/9] submodule: inline submodule_commits() into caller Glen Choo
2022-02-15 22:02     ` Ævar Arnfjörð Bjarmason
2022-02-15 17:23   ` [PATCH v2 5/9] submodule: store new submodule commits oid_array in a struct Glen Choo
2022-02-15 21:33     ` Ævar Arnfjörð Bjarmason
2022-02-15 17:23   ` [PATCH v2 6/9] submodule: extract get_fetch_task() Glen Choo
2022-02-15 17:23   ` [PATCH v2 7/9] fetch: fetch unpopulated, changed submodules Glen Choo
2022-02-15 22:02     ` Jonathan Tan
2022-02-16  5:46       ` Glen Choo
2022-02-16  9:11         ` Glen Choo
2022-02-16  9:39           ` Ævar Arnfjörð Bjarmason
2022-02-16 17:33             ` Glen Choo
2022-02-15 22:06     ` Ævar Arnfjörð Bjarmason
2022-02-15 17:23   ` [PATCH v2 8/9] submodule: read shallows when finding " Glen Choo
2022-02-15 22:03     ` Jonathan Tan
2022-02-15 17:23   ` [PATCH v2 9/9] submodule: fix latent check_has_commit() bug Glen Choo
2022-02-15 22:04     ` Jonathan Tan
2022-02-24 10:08   ` [PATCH v3 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
2022-02-24 10:08     ` [PATCH v3 01/10] t5526: introduce test helper to assert on fetches Glen Choo
2022-02-25  0:34       ` Junio C Hamano
2022-02-24 10:08     ` [PATCH v3 02/10] t5526: stop asserting on stderr literally Glen Choo
2022-02-24 11:52       ` Ævar Arnfjörð Bjarmason
2022-02-24 16:15         ` Glen Choo
2022-02-24 18:13           ` Eric Sunshine
2022-02-24 23:05       ` Jonathan Tan
2022-02-25  2:26         ` Glen Choo
2022-02-24 10:08     ` [PATCH v3 03/10] t5526: create superproject commits with test helper Glen Choo
2022-02-24 23:14       ` Jonathan Tan
2022-02-25  2:52         ` Glen Choo
2022-02-25 11:42           ` Ævar Arnfjörð Bjarmason
2022-02-28 18:11             ` Glen Choo
2022-02-24 10:08     ` [PATCH v3 04/10] submodule: make static functions read submodules from commits Glen Choo
2022-02-24 10:08     ` [PATCH v3 05/10] submodule: inline submodule_commits() into caller Glen Choo
2022-02-24 10:08     ` [PATCH v3 06/10] submodule: store new submodule commits oid_array in a struct Glen Choo
2022-02-24 10:08     ` [PATCH v3 07/10] submodule: extract get_fetch_task() Glen Choo
2022-02-24 23:26       ` Jonathan Tan
2022-02-24 10:08     ` [PATCH v3 08/10] submodule: move logic into fetch_task_create() Glen Choo
2022-02-24 23:36       ` Jonathan Tan
2022-02-24 10:08     ` [PATCH v3 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
2022-02-24 21:30       ` Junio C Hamano
2022-02-25  3:04         ` Glen Choo
2022-02-25  0:33       ` Junio C Hamano
2022-02-25  3:07         ` Glen Choo
2022-02-25  0:39       ` Jonathan Tan
2022-02-25  3:46         ` Glen Choo
2022-03-04 23:46           ` Jonathan Tan
2022-03-05  0:22             ` Glen Choo
2022-03-04 23:53           ` Jonathan Tan
2022-02-26 18:53       ` Junio C Hamano
2022-03-01 20:24         ` Johannes Schindelin
2022-03-01 20:33           ` Junio C Hamano
2022-03-02 23:25             ` Glen Choo
2022-03-01 20:32         ` Junio C Hamano
2022-02-24 10:08     ` [PATCH v3 10/10] submodule: fix latent check_has_commit() bug Glen Choo
2022-03-04  0:57     ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Glen Choo
2022-03-04  0:57       ` [PATCH v4 01/10] t5526: introduce test helper to assert on fetches Glen Choo
2022-03-04  2:06         ` Junio C Hamano
2022-03-04 22:11           ` Glen Choo
2022-03-04  0:57       ` [PATCH v4 02/10] t5526: stop asserting on stderr literally Glen Choo
2022-03-04  2:12         ` Junio C Hamano
2022-03-04 22:41         ` Jonathan Tan
2022-03-04 23:48           ` Junio C Hamano
2022-03-05  0:25             ` Glen Choo
2022-03-04  0:57       ` [PATCH v4 03/10] t5526: create superproject commits with test helper Glen Choo
2022-03-04 22:59         ` Jonathan Tan
2022-03-04  0:57       ` [PATCH v4 04/10] submodule: make static functions read submodules from commits Glen Choo
2022-03-04  0:57       ` [PATCH v4 05/10] submodule: inline submodule_commits() into caller Glen Choo
2022-03-04  0:57       ` [PATCH v4 06/10] submodule: store new submodule commits oid_array in a struct Glen Choo
2022-03-04  0:57       ` [PATCH v4 07/10] submodule: extract get_fetch_task() Glen Choo
2022-03-04  0:57       ` [PATCH v4 08/10] submodule: move logic into fetch_task_create() Glen Choo
2022-03-04  0:57       ` [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
2022-03-04  2:37         ` Junio C Hamano
2022-03-04 22:59           ` Glen Choo
2022-03-05  0:13             ` Junio C Hamano
2022-03-05  0:37               ` Glen Choo
2022-03-08  0:11                 ` Junio C Hamano
2022-03-04 23:56         ` Jonathan Tan
2022-03-04  0:57       ` [PATCH v4 10/10] submodule: fix latent check_has_commit() bug Glen Choo
2022-03-04  2:17         ` Junio C Hamano
2022-03-04  2:22       ` [PATCH v4 00/10] fetch --recurse-submodules: fetch unpopulated submodules Junio C Hamano
2022-03-08  0:14       ` [PATCH v5 " Glen Choo
2022-03-08  0:14         ` [PATCH v5 01/10] t5526: introduce test helper to assert on fetches Glen Choo
2022-03-08  0:14         ` [PATCH v5 02/10] t5526: stop asserting on stderr literally Glen Choo
2022-03-08  0:14         ` [PATCH v5 03/10] t5526: create superproject commits with test helper Glen Choo
2022-03-08  0:14         ` [PATCH v5 04/10] submodule: make static functions read submodules from commits Glen Choo
2022-03-08  0:14         ` [PATCH v5 05/10] submodule: inline submodule_commits() into caller Glen Choo
2022-03-08  0:14         ` [PATCH v5 06/10] submodule: store new submodule commits oid_array in a struct Glen Choo
2022-03-08  0:14         ` [PATCH v5 07/10] submodule: extract get_fetch_task() Glen Choo
2022-03-08  0:14         ` [PATCH v5 08/10] submodule: move logic into fetch_task_create() Glen Choo
2022-03-08  0:14         ` [PATCH v5 09/10] fetch: fetch unpopulated, changed submodules Glen Choo
2022-03-08  0:14         ` [PATCH v5 10/10] submodule: fix latent check_has_commit() bug Glen Choo
2022-03-08  0:50         ` [PATCH v5 00/10] fetch --recurse-submodules: fetch unpopulated submodules Junio C Hamano
2022-03-08 18:24           ` Glen Choo
2022-03-09 19:13             ` Junio C Hamano
2022-03-09 19:49               ` Glen Choo
2022-03-09 20:22                 ` Junio C Hamano
2022-03-09 22:11                   ` Glen Choo
2022-03-16 21:58                     ` Glen Choo
2022-03-16 22:06                       ` Junio C Hamano
2022-03-16 22:37                         ` Glen Choo
2022-03-16 23:08                           ` Junio C Hamano
2022-03-08 21:42         ` Jonathan Tan

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git