git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* [PATCH 0/9] fetch: make sure submodule oids are fetched
@ 2018-09-11 23:49 Stefan Beller
  2018-09-11 23:49 ` [PATCH 1/9] string-list: add string_list_{pop, last} functions Stefan Beller
                   ` (9 more replies)
  0 siblings, 10 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-11 23:49 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C <submodule-dir>" (and some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

This works surprisingly well in some workflows, not so well in others,
which this series aims to fix.

The first patches provide new basic functionality and do some refactoring;
the interesting part is in the two last patches.

This was discussed in
https://public-inbox.org/git/20180808221752.195419-1-sbeller@google.com/
and I think I addressed all feedback so far.

Thanks,
Stefan

Stefan Beller (9):
  string-list: add string_list_{pop, last} functions
  sha1-array: provide oid_array_filter
  submodule.c: fix indentation
  submodule.c: sort changed_submodule_names before searching it
  submodule: move global changed_submodule_names into fetch submodule
    struct
  submodule.c: do not copy around submodule list
  submodule: fetch in submodules git directory instead of in worktree
  fetch: retry fetching submodules if sha1 were not fetched
  builtin/fetch: check for submodule updates for non branch fetches

 builtin/fetch.c             |  14 +--
 sha1-array.c                |  18 ++++
 sha1-array.h                |   5 +
 string-list.c               |  14 +++
 string-list.h               |  11 +++
 submodule.c                 | 189 ++++++++++++++++++++++++++++--------
 t/t5526-fetch-submodules.sh |  23 ++++-
 7 files changed, 227 insertions(+), 47 deletions(-)

-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/9] string-list: add string_list_{pop, last} functions
  2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
@ 2018-09-11 23:49 ` Stefan Beller
  2018-09-12  2:24   ` Ramsay Jones
  2018-09-12 17:52   ` Junio C Hamano
  2018-09-11 23:49 ` [PATCH 2/9] sha1-array: provide oid_array_filter Stefan Beller
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-11 23:49 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Add a few functions to allow a string-list to be used as a stack:

 - string_list_last() lets a caller peek the string_list_item at the
   end of the string list.  The caller needs to be aware that it is
   borrowing a pointer, which can become invalid if/when the
   string_list is resized.

 - string_list_pop() removes the string_list_item at the end of
   the string list.

 - _pop usually has a friend _push. This role is taken by
    string_list_append already, as they are not symmetrical
    in our code base: _append returns the pointer, such that
    adding a util is easy, but _pop doesn't return such a pointer.

You can use them in this pattern:

    while (list.nr) {
        struct string_list_item *item = string_list_last(&list);

        work_on(item);
        string_list_pop(&list);
    }

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 string-list.c | 14 ++++++++++++++
 string-list.h | 11 +++++++++++
 2 files changed, 25 insertions(+)

diff --git a/string-list.c b/string-list.c
index 771c4550980..04db2b537c0 100644
--- a/string-list.c
+++ b/string-list.c
@@ -80,6 +80,20 @@ void string_list_remove(struct string_list *list, const char *string,
 	}
 }
 
+void string_list_pop(struct string_list *list, int free_util)
+{
+	if (list->nr == 0)
+		BUG("tried to remove an item from empty string list");
+
+	if (list->strdup_strings)
+		free(list->items[list->nr - 1].string);
+
+	if (free_util)
+		free(list->items[list->nr - 1].util);
+
+	list->nr--;
+}
+
 int string_list_has_string(const struct string_list *list, const char *string)
 {
 	int exact_match;
diff --git a/string-list.h b/string-list.h
index ff8f6094a33..ce2528bbe15 100644
--- a/string-list.h
+++ b/string-list.h
@@ -191,6 +191,17 @@ extern void string_list_remove(struct string_list *list, const char *string,
  */
 struct string_list_item *string_list_lookup(struct string_list *list, const char *string);
 
+/**
+ * Removes the last item from the list.
+ * The caller must ensure that the list is not empty.
+ */
+void string_list_pop(struct string_list *list, int free_util);
+
+static inline struct string_list_item *string_list_last(struct string_list *list)
+{
+	return &list->items[list->nr - 1];
+}
+
 /*
  * Remove all but the first of consecutive entries with the same
  * string value.  If free_util is true, call free() on the util
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 2/9] sha1-array: provide oid_array_filter
  2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
  2018-09-11 23:49 ` [PATCH 1/9] string-list: add string_list_{pop, last} functions Stefan Beller
@ 2018-09-11 23:49 ` Stefan Beller
  2018-09-12 18:02   ` Junio C Hamano
  2018-09-11 23:49 ` [PATCH 3/9] submodule.c: fix indentation Stefan Beller
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2018-09-11 23:49 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 sha1-array.c | 18 ++++++++++++++++++
 sha1-array.h |  5 +++++
 2 files changed, 23 insertions(+)

diff --git a/sha1-array.c b/sha1-array.c
index 265941fbf40..76323935dd7 100644
--- a/sha1-array.c
+++ b/sha1-array.c
@@ -77,3 +77,21 @@ int oid_array_for_each_unique(struct oid_array *array,
 	}
 	return 0;
 }
+
+int oid_array_filter(struct oid_array *array,
+		     for_each_oid_fn fn,
+		     void *cbdata)
+{
+	int src, dst;
+
+	for (src = dst = 0; src < array->nr; src++) {
+		if (fn(&array->oid[src], cbdata)) {
+			if (dst < src)
+				oidcpy(&array->oid[dst], &array->oid[src]);
+			dst++;
+		}
+	}
+	array->nr = dst;
+
+	return 0;
+}
diff --git a/sha1-array.h b/sha1-array.h
index 232bf950172..a2d7c210835 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -23,4 +23,9 @@ int oid_array_for_each_unique(struct oid_array *array,
 			      for_each_oid_fn fn,
 			      void *data);
 
+/* Call fn for each oid, and remove it if fn returns 0, retain it otherwise */
+int oid_array_filter(struct oid_array *array,
+		     for_each_oid_fn fn,
+		     void *cbdata);
+
 #endif /* SHA1_ARRAY_H */
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 3/9] submodule.c: fix indentation
  2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
  2018-09-11 23:49 ` [PATCH 1/9] string-list: add string_list_{pop, last} functions Stefan Beller
  2018-09-11 23:49 ` [PATCH 2/9] sha1-array: provide oid_array_filter Stefan Beller
@ 2018-09-11 23:49 ` Stefan Beller
  2018-09-12 18:02   ` Junio C Hamano
  2018-09-11 23:49 ` [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2018-09-11 23:49 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

The submodule subsystem is really bad at staying within 80 characters.
Fix it while we are here.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index a2b266fbfae..d29dfa3d1f5 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1244,7 +1244,8 @@ static int get_next_submodule(struct child_process *cp,
 		if (!submodule) {
 			const char *name = default_name_or_path(ce->name);
 			if (name) {
-				default_submodule.path = default_submodule.name = name;
+				default_submodule.path = name;
+				default_submodule.name = name;
 				submodule = &default_submodule;
 			}
 		}
@@ -1254,8 +1255,10 @@ static int get_next_submodule(struct child_process *cp,
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!submodule || !unsorted_string_list_lookup(&changed_submodule_names,
-							 submodule->name))
+			if (!submodule ||
+			    !unsorted_string_list_lookup(
+					&changed_submodule_names,
+					submodule->name))
 				continue;
 			default_argv = "on-demand";
 			break;
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it
  2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (2 preceding siblings ...)
  2018-09-11 23:49 ` [PATCH 3/9] submodule.c: fix indentation Stefan Beller
@ 2018-09-11 23:49 ` Stefan Beller
  2018-09-12 18:18   ` Junio C Hamano
  2018-09-11 23:49 ` [PATCH 5/9] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2018-09-11 23:49 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

We can string_list_insert() to maintain sorted-ness of the
list as we find new items, or we can string_list_append() to
build an unsorted list and sort it at the end just once.

To pick which one is more appropriate, we notice the fact
that we discover new items more or less in the already
sorted order.  That makes "append then sort" more
appropriate.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index d29dfa3d1f5..c6eff7699f3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1256,7 +1256,7 @@ static int get_next_submodule(struct child_process *cp,
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
 			if (!submodule ||
-			    !unsorted_string_list_lookup(
+			    !string_list_lookup(
 					&changed_submodule_names,
 					submodule->name))
 				continue;
@@ -1350,6 +1350,7 @@ int fetch_populated_submodules(struct repository *r,
 	/* default value, "--submodule-prefix" and its value are added later */
 
 	calculate_changed_submodule_paths();
+	string_list_sort(&changed_submodule_names);
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
 			       fetch_start_failure,
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 5/9] submodule: move global changed_submodule_names into fetch submodule struct
  2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (3 preceding siblings ...)
  2018-09-11 23:49 ` [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
@ 2018-09-11 23:49 ` Stefan Beller
  2018-09-11 23:49 ` [PATCH 6/9] submodule.c: do not copy around submodule list Stefan Beller
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-11 23:49 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

The `changed_submodule_names` are only used for fetching, so let's make it
part of the struct that is passed around for fetching submodules.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/submodule.c b/submodule.c
index c6eff7699f3..3520dd76bdf 100644
--- a/submodule.c
+++ b/submodule.c
@@ -24,7 +24,7 @@
 #include "object-store.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
+
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -1110,7 +1110,22 @@ void check_for_new_submodule_commits(struct object_id *oid)
 	oid_array_append(&ref_tips_after_fetch, oid);
 }
 
-static void calculate_changed_submodule_paths(void)
+struct submodule_parallel_fetch {
+	int count;
+	struct argv_array args;
+	struct repository *r;
+	const char *prefix;
+	int command_line_option;
+	int default_option;
+	int quiet;
+	int result;
+
+	struct string_list changed_submodule_names;
+};
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+
+static void calculate_changed_submodule_paths(
+	struct submodule_parallel_fetch *spf)
 {
 	struct argv_array argv = ARGV_ARRAY_INIT;
 	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
@@ -1148,7 +1163,8 @@ static void calculate_changed_submodule_paths(void)
 			continue;
 
 		if (!submodule_has_commits(path, commits))
-			string_list_append(&changed_submodule_names, name->string);
+			string_list_append(&spf->changed_submodule_names,
+					   name->string);
 	}
 
 	free_submodules_oids(&changed_submodules);
@@ -1185,18 +1201,6 @@ int submodule_touches_in_range(struct object_id *excl_oid,
 	return ret;
 }
 
-struct submodule_parallel_fetch {
-	int count;
-	struct argv_array args;
-	struct repository *r;
-	const char *prefix;
-	int command_line_option;
-	int default_option;
-	int quiet;
-	int result;
-};
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
-
 static int get_fetch_recurse_config(const struct submodule *submodule,
 				    struct submodule_parallel_fetch *spf)
 {
@@ -1257,7 +1261,7 @@ static int get_next_submodule(struct child_process *cp,
 		case RECURSE_SUBMODULES_ON_DEMAND:
 			if (!submodule ||
 			    !string_list_lookup(
-					&changed_submodule_names,
+					&spf->changed_submodule_names,
 					submodule->name))
 				continue;
 			default_argv = "on-demand";
@@ -1349,8 +1353,8 @@ int fetch_populated_submodules(struct repository *r,
 	argv_array_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
-	calculate_changed_submodule_paths();
-	string_list_sort(&changed_submodule_names);
+	calculate_changed_submodule_paths(&spf);
+	string_list_sort(&spf.changed_submodule_names);
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
 			       fetch_start_failure,
@@ -1359,7 +1363,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	argv_array_clear(&spf.args);
 out:
-	string_list_clear(&changed_submodule_names, 1);
+	string_list_clear(&spf.changed_submodule_names, 1);
 	return spf.result;
 }
 
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 6/9] submodule.c: do not copy around submodule list
  2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (4 preceding siblings ...)
  2018-09-11 23:49 ` [PATCH 5/9] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
@ 2018-09-11 23:49 ` Stefan Beller
  2018-09-12  2:25   ` Ramsay Jones
  2018-09-11 23:49 ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2018-09-11 23:49 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

'calculate_changed_submodule_paths' uses a local list to compute the
changed submodules, and then produces the result by copying appropriate
items into the result list.

Instead use the result list directly and prune items afterwards
using string_list_remove_empty_items.

By doin so we'll have access to the util pointer for longer that
contains the commits that we need to fetch, which will be
useful in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/submodule.c b/submodule.c
index 3520dd76bdf..00a9a3c6b12 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1128,8 +1128,7 @@ static void calculate_changed_submodule_paths(
 	struct submodule_parallel_fetch *spf)
 {
 	struct argv_array argv = ARGV_ARRAY_INIT;
-	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
-	const struct string_list_item *name;
+	struct string_list_item *name;
 
 	/* No need to check if there are no submodules configured */
 	if (!submodule_from_path(the_repository, NULL, NULL))
@@ -1146,9 +1145,9 @@ static void calculate_changed_submodule_paths(
 	 * Collect all submodules (whether checked out or not) for which new
 	 * commits have been recorded upstream in "changed_submodule_names".
 	 */
-	collect_changed_submodules(&changed_submodules, &argv);
+	collect_changed_submodules(&spf->changed_submodule_names, &argv);
 
-	for_each_string_list_item(name, &changed_submodules) {
+	for_each_string_list_item(name, &spf->changed_submodule_names) {
 		struct oid_array *commits = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
@@ -1162,12 +1161,14 @@ static void calculate_changed_submodule_paths(
 		if (!path)
 			continue;
 
-		if (!submodule_has_commits(path, commits))
-			string_list_append(&spf->changed_submodule_names,
-					   name->string);
+		if (submodule_has_commits(path, commits)) {
+			oid_array_clear(commits);
+			*name->string = '\0';
+		}
 	}
 
-	free_submodules_oids(&changed_submodules);
+	string_list_remove_empty_items(&spf->changed_submodule_names, 1);
+
 	argv_array_clear(&argv);
 	oid_array_clear(&ref_tips_before_fetch);
 	oid_array_clear(&ref_tips_after_fetch);
@@ -1363,7 +1364,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	argv_array_clear(&spf.args);
 out:
-	string_list_clear(&spf.changed_submodule_names, 1);
+	free_submodules_oids(&spf.changed_submodule_names);
 	return spf.result;
 }
 
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree
  2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (5 preceding siblings ...)
  2018-09-11 23:49 ` [PATCH 6/9] submodule.c: do not copy around submodule list Stefan Beller
@ 2018-09-11 23:49 ` Stefan Beller
  2018-09-12 18:36   ` Junio C Hamano
  2018-09-11 23:49 ` [PATCH 8/9] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2018-09-11 23:49 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

This patch started as a refactoring to make 'get_next_submodule' more
readable, but upon doing so, I realized that git-fetch actually doesn't
need to be run in the worktree. So let's run it in the git dir instead.

That should pave the way towards fetching submodules that are currently
not checked out.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c                 | 43 ++++++++++++++++++++++++++-----------
 t/t5526-fetch-submodules.sh |  7 +++++-
 2 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/submodule.c b/submodule.c
index 00a9a3c6b12..1e6781504f0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -481,6 +481,12 @@ void prepare_submodule_repo_env(struct argv_array *out)
 			 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
+static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
+{
+	prepare_submodule_repo_env_no_git_dir(out);
+	argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1227,6 +1233,27 @@ static int get_fetch_recurse_config(const struct submodule *submodule,
 	return spf->default_option;
 }
 
+static const char *get_submodule_git_dir(struct repository *r, const char *path)
+{
+	struct repository subrepo;
+	const char *ret;
+
+	if (repo_submodule_init(&subrepo, r, path)) {
+		/* no entry in .gitmodules? */
+		struct strbuf gitdir = STRBUF_INIT;
+		strbuf_repo_worktree_path(&gitdir, r, "%s/.git", path);
+		if (repo_init(&subrepo, gitdir.buf, NULL)) {
+			strbuf_release(&gitdir);
+			return NULL;
+		}
+	}
+
+	ret = xstrdup(subrepo.gitdir);
+	repo_clear(&subrepo);
+
+	return ret;
+}
+
 static int get_next_submodule(struct child_process *cp,
 			      struct strbuf *err, void *data, void **task_cb)
 {
@@ -1234,8 +1261,6 @@ static int get_next_submodule(struct child_process *cp,
 	struct submodule_parallel_fetch *spf = data;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		struct strbuf submodule_path = STRBUF_INIT;
-		struct strbuf submodule_git_dir = STRBUF_INIT;
 		struct strbuf submodule_prefix = STRBUF_INIT;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
 		const char *git_dir, *default_argv;
@@ -1274,16 +1299,12 @@ static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		strbuf_repo_worktree_path(&submodule_path, spf->r, "%s", ce->name);
-		strbuf_addf(&submodule_git_dir, "%s/.git", submodule_path.buf);
 		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
-		git_dir = read_gitfile(submodule_git_dir.buf);
-		if (!git_dir)
-			git_dir = submodule_git_dir.buf;
-		if (is_directory(git_dir)) {
+		git_dir = get_submodule_git_dir(spf->r, ce->name);
+		if (git_dir) {
 			child_process_init(cp);
-			cp->dir = strbuf_detach(&submodule_path, NULL);
-			prepare_submodule_repo_env(&cp->env_array);
+			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+			cp->dir = git_dir;
 			cp->git_cmd = 1;
 			if (!spf->quiet)
 				strbuf_addf(err, "Fetching submodule %s%s\n",
@@ -1295,8 +1316,6 @@ static int get_next_submodule(struct child_process *cp,
 			argv_array_push(&cp->args, submodule_prefix.buf);
 			ret = 1;
 		}
-		strbuf_release(&submodule_path);
-		strbuf_release(&submodule_git_dir);
 		strbuf_release(&submodule_prefix);
 		if (ret) {
 			spf->count++;
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 6c2f9b2ba26..42692219a1a 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -566,7 +566,12 @@ test_expect_success 'fetching submodule into a broken repository' '
 
 	test_must_fail git -C dst status &&
 	test_must_fail git -C dst diff &&
-	test_must_fail git -C dst fetch --recurse-submodules
+
+	# git-fetch cannot find the git directory of the submodule,
+	# so it will do nothing, successfully, as it cannot distinguish between
+	# this broken submodule and a submodule that was just set active but
+	# not cloned yet
+	git -C dst fetch --recurse-submodules
 '
 
 test_expect_success "fetch new commits when submodule got renamed" '
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 8/9] fetch: retry fetching submodules if sha1 were not fetched
  2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (6 preceding siblings ...)
  2018-09-11 23:49 ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
@ 2018-09-11 23:49 ` Stefan Beller
  2018-09-12 19:03   ` Junio C Hamano
  2018-09-11 23:49 ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
  2018-09-17 21:35 ` [PATCHv2 0/9] fetch: make sure submodule oids are fetched Stefan Beller
  9 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2018-09-11 23:49 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C <submodule-dir>" (and some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

This works surprisingly well in some workflows (such as using submodules
as a third party library), while not so well in other scenarios, such
as in a Gerrit topic-based workflow, that can tie together changes
(potentially across repositories) on the server side. One of the parts
of such a Gerrit workflow is to download a change when wanting to examine
it, and you'd want to have its submodule changes that are in the same
topic downloaded as well. However these submodule changes reside in their
own repository in their own ref (refs/changes/<int>).

Retry fetching a submodule if the object id that the superproject points
to, cannot be found.

This doesn't support fetching to FETCH_HEAD yet, but only into a local
branch. To make fetching into FETCH_HEAD work, we need some refactoring
in builtin/fetch.c to adjust the calls to 'check_for_new_submodule_commits'
that is coming in the next patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/fetch.c             |  9 ++--
 submodule.c                 | 87 ++++++++++++++++++++++++++++++++++++-
 t/t5526-fetch-submodules.sh | 16 +++++++
 3 files changed, 104 insertions(+), 8 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d213d..95c44bf6ffa 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -700,8 +700,7 @@ static int update_local_ref(struct ref *ref,
 			what = _("[new ref]");
 		}
 
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref(msg, ref, 0);
 		format_display(display, r ? '!' : '*', what,
@@ -716,8 +715,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("fast-forward", ref, 1);
 		format_display(display, r ? '!' : ' ', quickref.buf,
@@ -731,8 +729,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("forced-update", ref, 1);
 		format_display(display, r ? '!' : '+', quickref.buf,
diff --git a/submodule.c b/submodule.c
index 1e6781504f0..a75146e89cf 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1127,8 +1127,11 @@ struct submodule_parallel_fetch {
 	int result;
 
 	struct string_list changed_submodule_names;
+	struct string_list retry;
 };
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
+		  STRING_LIST_INIT_DUP, \
+		  STRING_LIST_INIT_NODUP}
 
 static void calculate_changed_submodule_paths(
 	struct submodule_parallel_fetch *spf)
@@ -1259,8 +1262,10 @@ static int get_next_submodule(struct child_process *cp,
 {
 	int ret = 0;
 	struct submodule_parallel_fetch *spf = data;
+	struct string_list_item *it;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
+		int recurse_config;
 		struct strbuf submodule_prefix = STRBUF_INIT;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
 		const char *git_dir, *default_argv;
@@ -1280,7 +1285,9 @@ static int get_next_submodule(struct child_process *cp,
 			}
 		}
 
-		switch (get_fetch_recurse_config(submodule, spf))
+		recurse_config = get_fetch_recurse_config(submodule, spf);
+
+		switch (recurse_config)
 		{
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
@@ -1319,9 +1326,50 @@ static int get_next_submodule(struct child_process *cp,
 		strbuf_release(&submodule_prefix);
 		if (ret) {
 			spf->count++;
+			if (submodule != &default_submodule)
+				/* discard const-ness: */
+				*task_cb = (void*)submodule;
 			return 1;
 		}
 	}
+
+retry_next:
+
+	if (spf->retry.nr) {
+		struct strbuf submodule_prefix = STRBUF_INIT;
+		const struct submodule *sub;
+
+		it = string_list_last(&spf->retry);
+		sub = submodule_from_name(spf->r, &null_oid,
+					  it->string);
+
+		child_process_init(cp);
+		cp->dir = get_submodule_git_dir(spf->r, sub->path);
+		if (!cp->dir) {
+			string_list_pop(&spf->retry, 0);
+			goto retry_next;
+		}
+		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+		cp->git_cmd = 1;
+
+		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, sub->path);
+		argv_array_init(&cp->args);
+		argv_array_pushv(&cp->args, spf->args.argv);
+		argv_array_push(&cp->args, "on-demand");
+		argv_array_push(&cp->args, "--submodule-prefix");
+		argv_array_push(&cp->args, submodule_prefix.buf);
+
+		/* NEEDSWORK: have get_default_remote from s--h */
+		argv_array_push(&cp->args, "origin");
+		oid_array_for_each_unique(it->util,
+					  append_oid_to_argv, &cp->args);
+
+		*task_cb = NULL; /* make sure we do not recurse forever */
+		strbuf_release(&submodule_prefix);
+		string_list_pop(&spf->retry, 0);
+		return 1;
+	}
+
 	return 0;
 }
 
@@ -1335,14 +1383,49 @@ static int fetch_start_failure(struct strbuf *err,
 	return 0;
 }
 
+static int commit_exists_in_sub(const struct object_id *oid, void *data)
+{
+	struct repository *subrepo = data;
+
+	enum object_type type = oid_object_info(subrepo, oid, NULL);
+
+	return type != OBJ_COMMIT;
+}
+
 static int fetch_finish(int retvalue, struct strbuf *err,
 			void *cb, void *task_cb)
 {
 	struct submodule_parallel_fetch *spf = cb;
+	struct submodule *sub = task_cb;
+	struct repository subrepo;
 
 	if (retvalue)
 		spf->result = 1;
 
+	if (!sub)
+		return 0;
+
+	if (repo_submodule_init(&subrepo, spf->r, sub->path) < 0)
+		warning(_("Could not get submodule repository for submodule '%s' in repository '%s'"),
+			  sub->path, spf->r->worktree);
+	else {
+		struct string_list_item *it;
+		struct oid_array *commits;
+
+		it = string_list_lookup(&spf->changed_submodule_names, sub->name);
+		if (!it)
+			return 0;
+
+		commits = it->util;
+		oid_array_filter(commits,
+				 commit_exists_in_sub,
+				 &subrepo);
+
+		if (commits->nr)
+			string_list_append(&spf->retry, sub->name)
+				->util = commits;
+	}
+
 	return 0;
 }
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 42692219a1a..af12c50e7dd 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -605,4 +605,20 @@ test_expect_success "fetch new commits when submodule got renamed" '
 	test_cmp expect actual
 '
 
+test_expect_success "fetch new commits on-demand when they are not reachable" '
+	git checkout --detach &&
+	C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
+	git -C submodule update-ref refs/changes/1 $C &&
+	git update-index --cacheinfo 160000 $C submodule &&
+	git commit -m "updated submodule outside of refs/heads" &&
+	D=$(git rev-parse HEAD) &&
+	git update-ref refs/changes/2 $D &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
+		git -C submodule cat-file -t $C &&
+		git checkout --recurse-submodules FETCH_HEAD
+	)
+'
+
 test_done
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches
  2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (7 preceding siblings ...)
  2018-09-11 23:49 ` [PATCH 8/9] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
@ 2018-09-11 23:49 ` Stefan Beller
  2018-09-12 19:20   ` Junio C Hamano
  2018-09-17 21:35 ` [PATCHv2 0/9] fetch: make sure submodule oids are fetched Stefan Beller
  9 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2018-09-11 23:49 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

For Gerrit users that use submodules the invocation of fetch without a
branch is their main use case.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/fetch.c             | 5 ++++-
 t/t5526-fetch-submodules.sh | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 95c44bf6ffa..ea6ecd123e7 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -887,11 +887,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 				rc |= update_local_ref(ref, what, rm, &note,
 						       summary_width);
 				free(ref);
-			} else
+			} else {
+				check_for_new_submodule_commits(&rm->old_oid);
 				format_display(&note, '*',
 					       *kind ? kind : "branch", NULL,
 					       *what ? what : "HEAD",
 					       "FETCH_HEAD", summary_width);
+			}
+
 			if (note.len) {
 				if (verbosity >= 0 && !shown_url) {
 					fprintf(stderr, _("From %.*s\n"),
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index af12c50e7dd..a509eabb044 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -615,7 +615,7 @@ test_expect_success "fetch new commits on-demand when they are not reachable" '
 	git update-ref refs/changes/2 $D &&
 	(
 		cd downstream &&
-		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
+		git fetch --recurse-submodules origin refs/changes/2 &&
 		git -C submodule cat-file -t $C &&
 		git checkout --recurse-submodules FETCH_HEAD
 	)
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/9] string-list: add string_list_{pop, last} functions
  2018-09-11 23:49 ` [PATCH 1/9] string-list: add string_list_{pop, last} functions Stefan Beller
@ 2018-09-12  2:24   ` Ramsay Jones
  2018-09-12 17:52   ` Junio C Hamano
  1 sibling, 0 replies; 35+ messages in thread
From: Ramsay Jones @ 2018-09-12  2:24 UTC (permalink / raw)
  To: Stefan Beller, git



On 12/09/18 00:49, Stefan Beller wrote:
> Add a few functions to allow a string-list to be used as a stack:
> 
>  - string_list_last() lets a caller peek the string_list_item at the
>    end of the string list.  The caller needs to be aware that it is
>    borrowing a pointer, which can become invalid if/when the
>    string_list is resized.
> 
>  - string_list_pop() removes the string_list_item at the end of
>    the string list.
> 
>  - _pop usually has a friend _push. This role is taken by
>     string_list_append already, as they are not symmetrical
>     in our code base: _append returns the pointer, such that
>     adding a util is easy, but _pop doesn't return such a pointer.
> 
> You can use them in this pattern:
> 
>     while (list.nr) {
>         struct string_list_item *item = string_list_last(&list);
> 
>         work_on(item);
>         string_list_pop(&list);

string_list_pop() takes a second int parameter (free_util).

ATB,
Ramsay Jones


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 6/9] submodule.c: do not copy around submodule list
  2018-09-11 23:49 ` [PATCH 6/9] submodule.c: do not copy around submodule list Stefan Beller
@ 2018-09-12  2:25   ` Ramsay Jones
  0 siblings, 0 replies; 35+ messages in thread
From: Ramsay Jones @ 2018-09-12  2:25 UTC (permalink / raw)
  To: Stefan Beller, git



On 12/09/18 00:49, Stefan Beller wrote:
> 'calculate_changed_submodule_paths' uses a local list to compute the
> changed submodules, and then produces the result by copying appropriate
> items into the result list.
> 
> Instead use the result list directly and prune items afterwards
> using string_list_remove_empty_items.
> 
> By doin so we'll have access to the util pointer for longer that

s/doin/doing/

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/9] string-list: add string_list_{pop, last} functions
  2018-09-11 23:49 ` [PATCH 1/9] string-list: add string_list_{pop, last} functions Stefan Beller
  2018-09-12  2:24   ` Ramsay Jones
@ 2018-09-12 17:52   ` Junio C Hamano
  1 sibling, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2018-09-12 17:52 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> Add a few functions to allow a string-list to be used as a stack:
>
>  - string_list_last() lets a caller peek the string_list_item at the
>    end of the string list.  The caller needs to be aware that it is
>    borrowing a pointer, which can become invalid if/when the
>    string_list is resized.
>
>  - string_list_pop() removes the string_list_item at the end of
>    the string list.
>
>  - _pop usually has a friend _push. This role is taken by
>     string_list_append already, as they are not symmetrical
>     in our code base: _append returns the pointer, such that
>     adding a util is easy, but _pop doesn't return such a pointer.

This third-item somehow is indented one place too deeply.

"our pop() does not return a pointer, so we won't introduce push()
that does return a pointer"?  Am I reading the third item right?

pop() and push() usually go together, but they are opposite
operations, they are expected to behave oppositely, and they are
expected to be interfaced differently.  pop() that does not return a
pointer is no better or no worse match to push() that returns a
pointer, no?

Lack of something_push(), when something_pop() exists, would always
be surprising to anybody new to the "something" API, and no amount
of mumbling would justify it, I would think, but this third item
sounds like it is trying to make a lame excuse when there is no need
to.

Wouldn't it be sufficient to say "there is no string_list_push();
string_list_append() can be used in its place" and stop there?

> You can use them in this pattern:
>
>     while (list.nr) {
>         struct string_list_item *item = string_list_last(&list);
>
>         work_on(item);
>         string_list_pop(&list);
>     }

"free_util" as the second argument to this sample call is missing.

>  string-list.c | 14 ++++++++++++++
>  string-list.h | 11 +++++++++++
>  2 files changed, 25 insertions(+)

> diff --git a/string-list.c b/string-list.c
> index 771c4550980..04db2b537c0 100644
> --- a/string-list.c
> +++ b/string-list.c
> @@ -80,6 +80,20 @@ void string_list_remove(struct string_list *list, const char *string,
>  	}
>  }
>  
> +void string_list_pop(struct string_list *list, int free_util)
> +{
> +	if (list->nr == 0)
> +		BUG("tried to remove an item from empty string list");
> +
> +	if (list->strdup_strings)
> +		free(list->items[list->nr - 1].string);
> +
> +	if (free_util)
> +		free(list->items[list->nr - 1].util);
> +
> +	list->nr--;
> +}
> +
>  int string_list_has_string(const struct string_list *list, const char *string)
>  {
>  	int exact_match;
> diff --git a/string-list.h b/string-list.h
> index ff8f6094a33..ce2528bbe15 100644
> --- a/string-list.h
> +++ b/string-list.h
> @@ -191,6 +191,17 @@ extern void string_list_remove(struct string_list *list, const char *string,
>   */
>  struct string_list_item *string_list_lookup(struct string_list *list, const char *string);
>  
> +/**
> + * Removes the last item from the list.
> + * The caller must ensure that the list is not empty.
> + */
> +void string_list_pop(struct string_list *list, int free_util);
> +
> +static inline struct string_list_item *string_list_last(struct string_list *list)

We may want to warn users that pop(), append(), etc. shouldn't be
done while using the returned value from this function in an in-code
comment or docstring.

> +{
> +	return &list->items[list->nr - 1];
> +}
> +

Other than that, nicely done.

>  /*
>   * Remove all but the first of consecutive entries with the same
>   * string value.  If free_util is true, call free() on the util

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/9] sha1-array: provide oid_array_filter
  2018-09-11 23:49 ` [PATCH 2/9] sha1-array: provide oid_array_filter Stefan Beller
@ 2018-09-12 18:02   ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2018-09-12 18:02 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  sha1-array.c | 18 ++++++++++++++++++
>  sha1-array.h |  5 +++++
>  2 files changed, 23 insertions(+)
>
> diff --git a/sha1-array.c b/sha1-array.c
> index 265941fbf40..76323935dd7 100644
> --- a/sha1-array.c
> +++ b/sha1-array.c
> @@ -77,3 +77,21 @@ int oid_array_for_each_unique(struct oid_array *array,
>  	}
>  	return 0;
>  }
> +
> +int oid_array_filter(struct oid_array *array,
> +		     for_each_oid_fn fn,

It probably makes sense to call it "want" instead of "fn" to match
object_array_filter().

> +		     void *cbdata)
> +{
> +	int src, dst;
> +
> +	for (src = dst = 0; src < array->nr; src++) {
> +		if (fn(&array->oid[src], cbdata)) {
> +			if (dst < src)
> +				oidcpy(&array->oid[dst], &array->oid[src]);
> +			dst++;
> +		}

In fact, matching the implementation of object_array_fiter() may
also make sense, as I do not see a strong reason why the resulting
code would become better by rewriting "dst != src" over there to
"dst < src" here.

> +	}
> +	array->nr = dst;
> +
> +	return 0;
> +}
> diff --git a/sha1-array.h b/sha1-array.h
> index 232bf950172..a2d7c210835 100644
> --- a/sha1-array.h
> +++ b/sha1-array.h
> @@ -23,4 +23,9 @@ int oid_array_for_each_unique(struct oid_array *array,
>  			      for_each_oid_fn fn,
>  			      void *data);
>  
> +/* Call fn for each oid, and remove it if fn returns 0, retain it otherwise */

Also perhaps mimic the wording of object_array_filter()'s comment?
I find it easier that the latter says "retaining only if X" instead
of saying "remove if !X, retain otherwise"; it's both shorter and
more to the point.  It also is nicer that it notes that the order is
preserved.

> +int oid_array_filter(struct oid_array *array,
> +		     for_each_oid_fn fn,
> +		     void *cbdata);
> +

Other than that, the function makes sense very much, and the
callsite we see in patch 8/9 does, too.

Thanks.

>  #endif /* SHA1_ARRAY_H */

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] submodule.c: fix indentation
  2018-09-11 23:49 ` [PATCH 3/9] submodule.c: fix indentation Stefan Beller
@ 2018-09-12 18:02   ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2018-09-12 18:02 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> The submodule subsystem is really bad at staying within 80 characters.
> Fix it while we are here.

Makes sense.  Thanks.

>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  submodule.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index a2b266fbfae..d29dfa3d1f5 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1244,7 +1244,8 @@ static int get_next_submodule(struct child_process *cp,
>  		if (!submodule) {
>  			const char *name = default_name_or_path(ce->name);
>  			if (name) {
> -				default_submodule.path = default_submodule.name = name;
> +				default_submodule.path = name;
> +				default_submodule.name = name;
>  				submodule = &default_submodule;
>  			}
>  		}
> @@ -1254,8 +1255,10 @@ static int get_next_submodule(struct child_process *cp,
>  		default:
>  		case RECURSE_SUBMODULES_DEFAULT:
>  		case RECURSE_SUBMODULES_ON_DEMAND:
> -			if (!submodule || !unsorted_string_list_lookup(&changed_submodule_names,
> -							 submodule->name))
> +			if (!submodule ||
> +			    !unsorted_string_list_lookup(
> +					&changed_submodule_names,
> +					submodule->name))
>  				continue;
>  			default_argv = "on-demand";
>  			break;

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it
  2018-09-11 23:49 ` [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
@ 2018-09-12 18:18   ` Junio C Hamano
  2018-09-12 19:06     ` Stefan Beller
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2018-09-12 18:18 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> We can string_list_insert() to maintain sorted-ness of the
> list as we find new items, or we can string_list_append() to
> build an unsorted list and sort it at the end just once.
>
> To pick which one is more appropriate, we notice the fact
> that we discover new items more or less in the already
> sorted order.  That makes "append then sort" more
> appropriate.

Sorry, but I still do not get the math you are implying in the
second paragraph.  Are you saying that append-then-sort is efficient
when items being appended is already sorted?  That depends on the
sorting algorithm used, so the logic is incomplete unless you say
"given that we use X for sorting,...", I think.

Do we really discover new items in sorted order, by the way?  In a
single diff invocation made inside collect_changed_submodules() for
one commit in the superproject's history, we will grab changed paths
in the pathname order (i.e. sorted); if the superproject's tip commit
touches the submodules at paths A and Z, we will discover these two
paths in sorted order.

But because we are walking the superproject's history to collect all
paths that have been affected in that function, and repeatedly
calling diff as we discover commit in the superproject's history, I
am not sure how well the resulting set of paths would be sorted.

The tip commit in superproject's history may have modified the
submodule at path X, the parent of that commit may have touched the
submodule at path M, and its parent may have touched the submodule
at path A.  Don't we end up grabbing these paths in that discoverd
order, i.e. X, M and A?

I still think changing it from "insert as we find an item, keeping
the list sorted" to "append all and then sort before we start
looking things up from the result" makes sense, but I do not think
the "we find things in sorted order" is either true, or it would
affect the choice between the two.  A justification to choose the
latter I can think of that makes sense is that we don't have to pay
cost to keep the list sorted while building it because we do not do
any look-up while building the list.

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  submodule.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/submodule.c b/submodule.c
> index d29dfa3d1f5..c6eff7699f3 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1256,7 +1256,7 @@ static int get_next_submodule(struct child_process *cp,
>  		case RECURSE_SUBMODULES_DEFAULT:
>  		case RECURSE_SUBMODULES_ON_DEMAND:
>  			if (!submodule ||
> -			    !unsorted_string_list_lookup(
> +			    !string_list_lookup(
>  					&changed_submodule_names,
>  					submodule->name))
>  				continue;
> @@ -1350,6 +1350,7 @@ int fetch_populated_submodules(struct repository *r,
>  	/* default value, "--submodule-prefix" and its value are added later */
>  
>  	calculate_changed_submodule_paths();
> +	string_list_sort(&changed_submodule_names);
>  	run_processes_parallel(max_parallel_jobs,
>  			       get_next_submodule,
>  			       fetch_start_failure,

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree
  2018-09-11 23:49 ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
@ 2018-09-12 18:36   ` Junio C Hamano
  2018-09-13 19:29     ` Stefan Beller
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2018-09-12 18:36 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> This patch started as a refactoring to make 'get_next_submodule' more
> readable, but upon doing so, I realized that git-fetch actually doesn't
> need to be run in the worktree. So let's run it in the git dir instead.

It may be clear to the author but not clear to the reader of the
above paragraph that "worktree", "fetch" and "git dir" all refer to
the recursively invoked operation that updates the submodules
repository.  s/git-fetch/"git fetch" for the submodule/ should be
sufficient to help the readers.

> That should pave the way towards fetching submodules that are currently
> not checked out.

Very good.

> +static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
> +{
> +	prepare_submodule_repo_env_no_git_dir(out);
> +	argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
> +}
> +
>  /* Helper function to display the submodule header line prior to the full
>   * summary output. If it can locate the submodule objects directory it will
>   * attempt to lookup both the left and right commits and put them into the
> @@ -1227,6 +1233,27 @@ static int get_fetch_recurse_config(const struct submodule *submodule,
>  	return spf->default_option;
>  }
>  
> +static const char *get_submodule_git_dir(struct repository *r, const char *path)
> +{
> +	struct repository subrepo;
> +	const char *ret;
> +
> +	if (repo_submodule_init(&subrepo, r, path)) {
> +		/* no entry in .gitmodules? */
> +		struct strbuf gitdir = STRBUF_INIT;
> +		strbuf_repo_worktree_path(&gitdir, r, "%s/.git", path);
> +		if (repo_init(&subrepo, gitdir.buf, NULL)) {
> +			strbuf_release(&gitdir);
> +			return NULL;
> +		}

This is for the modern "absorbed" layout?  Do we get a notice and
encouragement to migrate from the historical layout, or there is no
need to (e.g. the migration happens automatically in some other
codepaths)?

> +	}
> +
> +	ret = xstrdup(subrepo.gitdir);
> +	repo_clear(&subrepo);
> +
> +	return ret;
> +}

Returned value from this function is xstrdup()'ed so the caller
owns, not borrows.  There is no need to return "const char *" from
this function.  Also the caller needs to free it once done.

>  static int get_next_submodule(struct child_process *cp,
>  			      struct strbuf *err, void *data, void **task_cb)
>  {
> @@ -1234,8 +1261,6 @@ static int get_next_submodule(struct child_process *cp,
>  	struct submodule_parallel_fetch *spf = data;
>  
>  	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
> -		struct strbuf submodule_path = STRBUF_INIT;
> -		struct strbuf submodule_git_dir = STRBUF_INIT;
>  		struct strbuf submodule_prefix = STRBUF_INIT;
>  		const struct cache_entry *ce = spf->r->index->cache[spf->count];
>  		const char *git_dir, *default_argv;
> @@ -1274,16 +1299,12 @@ static int get_next_submodule(struct child_process *cp,
>  			continue;
>  		}
>  
> -		strbuf_repo_worktree_path(&submodule_path, spf->r, "%s", ce->name);
> -		strbuf_addf(&submodule_git_dir, "%s/.git", submodule_path.buf);
>  		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
> -		git_dir = read_gitfile(submodule_git_dir.buf);
> -		if (!git_dir)
> -			git_dir = submodule_git_dir.buf;
> -		if (is_directory(git_dir)) {

In the old code, git_dir came from read_gitfile() which borrowed.

> +		git_dir = get_submodule_git_dir(spf->r, ce->name);

In the new code, we own it, so we'd eventually need to get rid of
it.  How does it happen?

> +		if (git_dir) {
>  			child_process_init(cp);
> -			cp->dir = strbuf_detach(&submodule_path, NULL);
> -			prepare_submodule_repo_env(&cp->env_array);
> +			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
> +			cp->dir = git_dir;

Does cp now own it and cp->dir gets freed once run_processes_parallel()
is done with this task?  Or is cp->dir simply leaking?  The old code
gave the result of strbuf_detach(), so even if cp->dir is leaking,
the leak is not new in this patch.

>  			cp->git_cmd = 1;
>  			if (!spf->quiet)
>  				strbuf_addf(err, "Fetching submodule %s%s\n",
> @@ -1295,8 +1316,6 @@ static int get_next_submodule(struct child_process *cp,
>  			argv_array_push(&cp->args, submodule_prefix.buf);
>  			ret = 1;
>  		}
> -		strbuf_release(&submodule_path);
> -		strbuf_release(&submodule_git_dir);

But if it is a leak, it is easily plugged by freeing git_dir here, I
think.

Thanks.


>  		strbuf_release(&submodule_prefix);
>  		if (ret) {
>  			spf->count++;
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index 6c2f9b2ba26..42692219a1a 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -566,7 +566,12 @@ test_expect_success 'fetching submodule into a broken repository' '
>  
>  	test_must_fail git -C dst status &&
>  	test_must_fail git -C dst diff &&
> -	test_must_fail git -C dst fetch --recurse-submodules
> +
> +	# git-fetch cannot find the git directory of the submodule,
> +	# so it will do nothing, successfully, as it cannot distinguish between
> +	# this broken submodule and a submodule that was just set active but
> +	# not cloned yet
> +	git -C dst fetch --recurse-submodules
>  '
>  
>  test_expect_success "fetch new commits when submodule got renamed" '

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 8/9] fetch: retry fetching submodules if sha1 were not fetched
  2018-09-11 23:49 ` [PATCH 8/9] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
@ 2018-09-12 19:03   ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2018-09-12 19:03 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> Retry fetching a submodule if the object id that the superproject points
> to, cannot be found.

By object name?  By attempting to fetch all refs?  Or by doing
something else?  Fetching by the exact object name is the most
efficient approach, but the server side may not be prepared to
serve such a request, and that is why spelling it out here would
help the readers.

> This doesn't support fetching to FETCH_HEAD yet, but only into a local
> branch.

It is not clear if this sentence is talking about the fetch done at
the superproject level, or what happens in a submodule repository
when this "retrying" happens.  Assuming that it is the former,
perhaps

    This retrying does not happen when the "git fetch" done at the
    superproject is not storing the fetched results in remote
    tracking branches (i.e. instead just recording them to
    FETCH_HEAD) in this step.

would help the readers understand what you are trying to say.

> To make fetching into FETCH_HEAD work, we need some refactoring
> in builtin/fetch.c to adjust the calls to 'check_for_new_submodule_commits'
> that is coming in the next patch.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  builtin/fetch.c             |  9 ++--
>  submodule.c                 | 87 ++++++++++++++++++++++++++++++++++++-
>  t/t5526-fetch-submodules.sh | 16 +++++++
>  3 files changed, 104 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 61bec5d213d..95c44bf6ffa 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -700,8 +700,7 @@ static int update_local_ref(struct ref *ref,
>  			what = _("[new ref]");
>  		}
>  
> -		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> -		    (recurse_submodules != RECURSE_SUBMODULES_ON))
> +		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
>  			check_for_new_submodule_commits(&ref->new_oid);
>  		r = s_update_ref(msg, ref, 0);
>  		format_display(display, r ? '!' : '*', what,
> @@ -716,8 +715,7 @@ static int update_local_ref(struct ref *ref,
>  		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
>  		strbuf_addstr(&quickref, "..");
>  		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
> -		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> -		    (recurse_submodules != RECURSE_SUBMODULES_ON))
> +		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
>  			check_for_new_submodule_commits(&ref->new_oid);
>  		r = s_update_ref("fast-forward", ref, 1);
>  		format_display(display, r ? '!' : ' ', quickref.buf,
> @@ -731,8 +729,7 @@ static int update_local_ref(struct ref *ref,
>  		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
>  		strbuf_addstr(&quickref, "...");
>  		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
> -		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> -		    (recurse_submodules != RECURSE_SUBMODULES_ON))
> +		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
>  			check_for_new_submodule_commits(&ref->new_oid);
>  		r = s_update_ref("forced-update", ref, 1);
>  		format_display(display, r ? '!' : '+', quickref.buf,

All of these used to react to any value set to recurse-submodules
that is not off or on (i.e. on-demand, default, none), but now
unless the value is explicitly set to off, we call into the check.
It is not immediately clear how that change is linked to the
retrying.  "When set to 'on', we did not check for new commits, but
now we do" can be read from the patch text but not the reasoning
behind it.

What was the reason why we didn't call "check-for-new" when recurse
is set to "on"?  Is it because "we are going to recurse anyway, so
there is no need to check to decide if we need to fetch in
submodule"?  And the reason why we now need to call when we are set
to recurse anyway is because check-for-new now learns much more than
just "do we need to cd there and run git-fetch? yes/no?"

The answers to these two questions would help readers in the log
message.

> diff --git a/submodule.c b/submodule.c
> index 1e6781504f0..a75146e89cf 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1127,8 +1127,11 @@ struct submodule_parallel_fetch {
>  	int result;
>  
>  	struct string_list changed_submodule_names;
> +	struct string_list retry;
>  };
> -#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
> +#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
> +		  STRING_LIST_INIT_DUP, \
> +		  STRING_LIST_INIT_NODUP}
>  
>  static void calculate_changed_submodule_paths(
>  	struct submodule_parallel_fetch *spf)
> @@ -1259,8 +1262,10 @@ static int get_next_submodule(struct child_process *cp,
>  {
>  	int ret = 0;
>  	struct submodule_parallel_fetch *spf = data;
> +	struct string_list_item *it;
>  
>  	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
> +		int recurse_config;
>  		struct strbuf submodule_prefix = STRBUF_INIT;
>  		const struct cache_entry *ce = spf->r->index->cache[spf->count];
>  		const char *git_dir, *default_argv;
> @@ -1280,7 +1285,9 @@ static int get_next_submodule(struct child_process *cp,
>  			}
>  		}
>  
> -		switch (get_fetch_recurse_config(submodule, spf))
> +		recurse_config = get_fetch_recurse_config(submodule, spf);
> +
> +		switch (recurse_config)
>  		{
>  		default:
>  		case RECURSE_SUBMODULES_DEFAULT:
> @@ -1319,9 +1326,50 @@ static int get_next_submodule(struct child_process *cp,
>  		strbuf_release(&submodule_prefix);
>  		if (ret) {
>  			spf->count++;
> +			if (submodule != &default_submodule)
> +				/* discard const-ness: */
> +				*task_cb = (void*)submodule;
>  			return 1;
>  		}
>  	}
> +
> +retry_next:
> +
> +	if (spf->retry.nr) {
> +		struct strbuf submodule_prefix = STRBUF_INIT;
> +		const struct submodule *sub;
> +
> +		it = string_list_last(&spf->retry);
> +		sub = submodule_from_name(spf->r, &null_oid,
> +					  it->string);
> +
> +		child_process_init(cp);
> +		cp->dir = get_submodule_git_dir(spf->r, sub->path);
> +		if (!cp->dir) {
> +			string_list_pop(&spf->retry, 0);
> +			goto retry_next;
> +		}
> +		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
> +		cp->git_cmd = 1;
> +
> +		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, sub->path);
> +		argv_array_init(&cp->args);
> +		argv_array_pushv(&cp->args, spf->args.argv);
> +		argv_array_push(&cp->args, "on-demand");
> +		argv_array_push(&cp->args, "--submodule-prefix");
> +		argv_array_push(&cp->args, submodule_prefix.buf);
> +
> +		/* NEEDSWORK: have get_default_remote from s--h */
> +		argv_array_push(&cp->args, "origin");
> +		oid_array_for_each_unique(it->util,
> +					  append_oid_to_argv, &cp->args);
> +
> +		*task_cb = NULL; /* make sure we do not recurse forever */
> +		strbuf_release(&submodule_prefix);
> +		string_list_pop(&spf->retry, 0);
> +		return 1;
> +	}
> +
>  	return 0;
>  }
>  
> @@ -1335,14 +1383,49 @@ static int fetch_start_failure(struct strbuf *err,
>  	return 0;
>  }
>  
> +static int commit_exists_in_sub(const struct object_id *oid, void *data)
> +{
> +	struct repository *subrepo = data;
> +
> +	enum object_type type = oid_object_info(subrepo, oid, NULL);
> +
> +	return type != OBJ_COMMIT;
> +}

Is this checking if the 'oid' exists as a comit in the submodule
repository?  It smells to me that it is checking the opposite.
Shouldn't the function be named "commit_missing_from_submodule()" or
something like that?

>  static int fetch_finish(int retvalue, struct strbuf *err,
>  			void *cb, void *task_cb)
>  {
>  	struct submodule_parallel_fetch *spf = cb;
> +	struct submodule *sub = task_cb;
> +	struct repository subrepo;
>  
>  	if (retvalue)
>  		spf->result = 1;
>  
> +	if (!sub)
> +		return 0;
> +
> +	if (repo_submodule_init(&subrepo, spf->r, sub->path) < 0)
> +		warning(_("Could not get submodule repository for submodule '%s' in repository '%s'"),
> +			  sub->path, spf->r->worktree);
> +	else {
> +		struct string_list_item *it;
> +		struct oid_array *commits;
> +
> +		it = string_list_lookup(&spf->changed_submodule_names, sub->name);
> +		if (!it)
> +			return 0;

OK, so after a fetch, if we have missing commits, we append to the
spf->retry list, which will be looked at in the function we looked
at earlier.

> +		commits = it->util;
> +		oid_array_filter(commits,
> +				 commit_exists_in_sub,
> +				 &subrepo);
> +
> +		if (commits->nr)
> +			string_list_append(&spf->retry, sub->name)
> +				->util = commits;
> +	}
> +
>  	return 0;
>  }
>  
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index 42692219a1a..af12c50e7dd 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -605,4 +605,20 @@ test_expect_success "fetch new commits when submodule got renamed" '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success "fetch new commits on-demand when they are not reachable" '
> +	git checkout --detach &&
> +	C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
> +	git -C submodule update-ref refs/changes/1 $C &&
> +	git update-index --cacheinfo 160000 $C submodule &&
> +	git commit -m "updated submodule outside of refs/heads" &&
> +	D=$(git rev-parse HEAD) &&
> +	git update-ref refs/changes/2 $D &&
> +	(
> +		cd downstream &&
> +		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
> +		git -C submodule cat-file -t $C &&
> +		git checkout --recurse-submodules FETCH_HEAD
> +	)
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it
  2018-09-12 18:18   ` Junio C Hamano
@ 2018-09-12 19:06     ` Stefan Beller
  0 siblings, 0 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-12 19:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Sep 12, 2018 at 11:18 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Stefan Beller <sbeller@google.com> writes:
>
> > We can string_list_insert() to maintain sorted-ness of the
> > list as we find new items, or we can string_list_append() to
> > build an unsorted list and sort it at the end just once.
> >
> > To pick which one is more appropriate, we notice the fact
> > that we discover new items more or less in the already
> > sorted order.  That makes "append then sort" more
> > appropriate.
>
> Sorry, but I still do not get the math you are implying in the
> second paragraph.  Are you saying that append-then-sort is efficient
> when items being appended is already sorted?  That depends on the
> sorting algorithm used, so the logic is incomplete unless you say
> "given that we use X for sorting,...", I think.
>
> Do we really discover new items in sorted order, by the way?  In a
> single diff invocation made inside collect_changed_submodules() for
> one commit in the superproject's history, we will grab changed paths
> in the pathname order (i.e. sorted); if the superproject's tip commit
> touches the submodules at paths A and Z, we will discover these two
> paths in sorted order.
>
> But because we are walking the superproject's history to collect all
> paths that have been affected in that function, and repeatedly
> calling diff as we discover commit in the superproject's history, I
> am not sure how well the resulting set of paths would be sorted.
>
> The tip commit in superproject's history may have modified the
> submodule at path X, the parent of that commit may have touched the
> submodule at path M, and its parent may have touched the submodule
> at path A.  Don't we end up grabbing these paths in that discoverd
> order, i.e. X, M and A?

That is true.

>
> I still think changing it from "insert as we find an item, keeping
> the list sorted" to "append all and then sort before we start
> looking things up from the result" makes sense, but I do not think
> the "we find things in sorted order" is either true, or it would
> affect the choice between the two.  A justification to choose the
> latter I can think of that makes sense is that we don't have to pay
> cost to keep the list sorted while building it because we do not do
> any look-up while building the list.

ok.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches
  2018-09-11 23:49 ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
@ 2018-09-12 19:20   ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2018-09-12 19:20 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> For Gerrit users that use submodules the invocation of fetch without a
> branch is their main use case.

That's way under explains this commit.  It is totally unclear how
that statement of fact relates to the problem this patch is trying
to address; it does not even make it clear what problem is being
addressed by the patch.

>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  builtin/fetch.c             | 5 ++++-
>  t/t5526-fetch-submodules.sh | 2 +-
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 95c44bf6ffa..ea6ecd123e7 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -887,11 +887,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
>  				rc |= update_local_ref(ref, what, rm, &note,
>  						       summary_width);
>  				free(ref);
> -			} else
> +			} else {
> +				check_for_new_submodule_commits(&rm->old_oid);

This happens when there is no "ref", which is set only when
rm->peer_ref exists, which is set only when we are using remote
tracking branch (or more generally storing the fetched rev somewhere
in our refs/ hierarchy), e.g. the rev is recorded only in FETCH_HEAD.

What does rm->old_oid have in such a case?  Is this the tip of the
superproject history we just fetched?

When we keep record of what we saw in the previous attempt to fetch,
we can tell "we have seen their history up to this old commit
before, and now we fetched their history up to this new commit" and
the question "during that time, which submodules have been modified
in the history of the superproject" becomes answerable.  When we are
not keeping the record of previous fetch, how would we answer that
question without going through the whole history?

	The answer is that check-for-new does not even do the "old
	branch tip was X and new branch tip is Y, so we can look
	only at X..Y"; it only cares about the new branch tip of the
	superproject, and excludes the existing tips of all branches
	in the superproject (i.e. computing something akin to "Y
	--not --all" instead of "X..Y").

So, I guess this is probably reasonable.  But does the call to
"check-for-new submodule" need to be unconditional?  In this
codepath, do we know when we are not doing a recursive fetch in a
superproject?  If so, perhaps we can omit the cost of going through
all the refs to populate ref_tips_before_fetch array in such a case.

>  				format_display(&note, '*',
>  					       *kind ? kind : "branch", NULL,
>  					       *what ? what : "HEAD",
>  					       "FETCH_HEAD", summary_width);
> +			}
> +
>  			if (note.len) {
>  				if (verbosity >= 0 && !shown_url) {
>  					fprintf(stderr, _("From %.*s\n"),
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index af12c50e7dd..a509eabb044 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -615,7 +615,7 @@ test_expect_success "fetch new commits on-demand when they are not reachable" '
>  	git update-ref refs/changes/2 $D &&
>  	(
>  		cd downstream &&
> -		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
> +		git fetch --recurse-submodules origin refs/changes/2 &&
>  		git -C submodule cat-file -t $C &&
>  		git checkout --recurse-submodules FETCH_HEAD
>  	)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree
  2018-09-12 18:36   ` Junio C Hamano
@ 2018-09-13 19:29     ` Stefan Beller
  0 siblings, 0 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-13 19:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Sep 12, 2018 at 11:36 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Stefan Beller <sbeller@google.com> writes:
>
> > This patch started as a refactoring to make 'get_next_submodule' more
> > readable, but upon doing so, I realized that git-fetch actually doesn't
> > need to be run in the worktree. So let's run it in the git dir instead.
>
> It may be clear to the author but not clear to the reader of the
> above paragraph that "worktree", "fetch" and "git dir" all refer to
> the recursively invoked operation that updates the submodules
> repository.  s/git-fetch/"git fetch" for the submodule/ should be
> sufficient to help the readers.
>
> > That should pave the way towards fetching submodules that are currently
> > not checked out.
>
> Very good.
>
> > +static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
> > +{
> > +     prepare_submodule_repo_env_no_git_dir(out);
> > +     argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
> > +}
> > +
> >  /* Helper function to display the submodule header line prior to the full
> >   * summary output. If it can locate the submodule objects directory it will
> >   * attempt to lookup both the left and right commits and put them into the
> > @@ -1227,6 +1233,27 @@ static int get_fetch_recurse_config(const struct submodule *submodule,
> >       return spf->default_option;
> >  }
> >
> > +static const char *get_submodule_git_dir(struct repository *r, const char *path)
> > +{
> > +     struct repository subrepo;
> > +     const char *ret;
> > +
> > +     if (repo_submodule_init(&subrepo, r, path)) {
> > +             /* no entry in .gitmodules? */
> > +             struct strbuf gitdir = STRBUF_INIT;
> > +             strbuf_repo_worktree_path(&gitdir, r, "%s/.git", path);
> > +             if (repo_init(&subrepo, gitdir.buf, NULL)) {
> > +                     strbuf_release(&gitdir);
> > +                     return NULL;
> > +             }
>
> This is for the modern "absorbed" layout?  Do we get a notice and
> encouragement to migrate from the historical layout, or there is no
> need to (e.g. the migration happens automatically in some other
> codepaths)?

No, the absorbed would also be handled by repo_submodule_init.
I wrote a patch once to migrate repo_submodule_init to take a
"struct *submodule" instead of a path as the third argument, which
would fall in line with this patch as well, I'll dig it up.

Historically git-fetch supported repositories that are not submodules
(but have a gitlink and a working tree in place) as well. That is covered
here. (see comment /* no entry in .gitmodules? */)

> > -             strbuf_release(&submodule_path);
> > -             strbuf_release(&submodule_git_dir);
>
> But if it is a leak, it is easily plugged by freeing git_dir here, I
> think.

Thanks.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCHv2 0/9] fetch: make sure submodule oids are fetched
  2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (8 preceding siblings ...)
  2018-09-11 23:49 ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
@ 2018-09-17 21:35 ` Stefan Beller
  2018-09-17 21:35   ` [PATCH 1/9] string-list: add string_list_{pop, last} functions Stefan Beller
                     ` (8 more replies)
  9 siblings, 9 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-17 21:35 UTC (permalink / raw)
  To: sbeller; +Cc: git

v2:
* extended commit messages,
* plugged a memory leak
* rewrote the patch "sha1-array: provide oid_array_filter" to be much more like 
  object_array_fiter
* fixed a typo pointed out by Ramsay.

The range diff is below.
  
Thanks,
Stefan

v1:
Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C <submodule-dir>" (and some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

This works surprisingly well in some workflows, not so well in others,
which this series aims to fix.

The first patches provide new basic functionality and do some refactoring;
the interesting part is in the two last patches.

This was discussed in
https://public-inbox.org/git/20180808221752.195419-1-sbeller@google.com/
and I think I addressed all feedback so far.

Stefan Beller (9):
  string-list: add string_list_{pop, last} functions
  sha1-array: provide oid_array_filter
  submodule.c: fix indentation
  submodule.c: sort changed_submodule_names before searching it
  submodule: move global changed_submodule_names into fetch submodule
    struct
  submodule.c: do not copy around submodule list
  submodule: fetch in submodules git directory instead of in worktree
  fetch: retry fetching submodules if needed objects were not fetched
  builtin/fetch: check for submodule updates for non branch fetches

 builtin/fetch.c             |  14 +--
 sha1-array.c                |  17 ++++
 sha1-array.h                |   9 ++
 string-list.c               |  14 +++
 string-list.h               |  15 +++
 submodule.c                 | 191 ++++++++++++++++++++++++++++--------
 t/t5526-fetch-submodules.sh |  23 ++++-
 7 files changed, 236 insertions(+), 47 deletions(-)

git range-diff origin/sb/submodule-recursive-fetch-gets-the-tip...HEAD 

 1:  6fecf7cd01a !  1:  54d31d90734 string-list: add string_list_{pop, last} functions
    @@ -12,8 +12,8 @@
          - string_list_pop() removes the string_list_item at the end of
            the string list.
     
    -     - string_list_push() is not added, but string_list_append() can be
    -       used in its place.
    +     - there is no string_list_push(); string_list_append() can be used
    +       in its place.
     
         You can use them in this pattern:
     
    @@ -26,7 +26,6 @@
     
         Helped-by: Junio C Hamano <gitster@pobox.com>
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/string-list.c b/string-list.c
      --- a/string-list.c
    @@ -66,6 +65,10 @@
     + */
     +void string_list_pop(struct string_list *list, int free_util);
     +
    ++/*
    ++ * Returns the last item of the list. As it returns the raw access, do not
    ++ * modify the list while holding onto the returned pointer.
    ++ */
     +static inline struct string_list_item *string_list_last(struct string_list *list)
     +{
     +	return &list->items[list->nr - 1];
 2:  7007a318a68 <  -:  ----------- sha1-array: provide oid_array_filter
 -:  ----------- >  2:  a2bd6ef2bf0 sha1-array: provide oid_array_filter
 3:  807429234ac !  3:  0300c27cbd7 submodule.c: fix indentation
    @@ -6,7 +6,6 @@
         Fix it while we are here.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/submodule.c b/submodule.c
      --- a/submodule.c
 4:  f6fa5273af9 !  4:  80cf0221bbe submodule.c: sort changed_submodule_names before searching it
    @@ -12,7 +12,6 @@
         appropriate.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/submodule.c b/submodule.c
      --- a/submodule.c
 5:  adf7a2fd203 !  5:  7ddb448b748 submodule: move global changed_submodule_names into fetch submodule struct
    @@ -6,7 +6,6 @@
         part of the struct that is passed around for fetching submodules.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/submodule.c b/submodule.c
      --- a/submodule.c
 6:  56c9398589a !  6:  7975a7f1e3b submodule.c: do not copy around submodule list
    @@ -9,12 +9,11 @@
         Instead use the result list directly and prune items afterwards
         using string_list_remove_empty_items.
     
    -    By doin so we'll have access to the util pointer for longer that
    +    By doing so we'll have access to the util pointer for longer that
         contains the commits that we need to fetch, which will be
         useful in a later patch.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/submodule.c b/submodule.c
      --- a/submodule.c
 7:  9f70a5f32c9 !  7:  29bc2868f26 submodule: fetch in submodules git directory instead of in worktree
    @@ -3,14 +3,14 @@
         submodule: fetch in submodules git directory instead of in worktree
     
         This patch started as a refactoring to make 'get_next_submodule' more
    -    readable, but upon doing so, I realized that git-fetch actually doesn't
    -    need to be run in the worktree. So let's run it in the git dir instead.
    +    readable, but upon doing so, I realized that "git fetch" of the submodule
    +    actually doesn't need to be run in the submodules worktree. So let's run
    +    it in its git dir instead.
     
         That should pave the way towards fetching submodules that are currently
         not checked out.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/submodule.c b/submodule.c
      --- a/submodule.c
    @@ -32,7 +32,7 @@
      	return spf->default_option;
      }
      
    -+static const char *get_submodule_git_dir(struct repository *r, const char *path)
    ++static char *get_submodule_git_dir(struct repository *r, const char *path)
     +{
     +	struct repository subrepo;
     +	const char *ret;
    @@ -87,7 +87,11 @@
      			if (!spf->quiet)
      				strbuf_addf(err, "Fetching submodule %s%s\n",
     @@
    + 			argv_array_push(&cp->args, default_argv);
    + 			argv_array_push(&cp->args, "--submodule-prefix");
      			argv_array_push(&cp->args, submodule_prefix.buf);
    ++
    ++			free(git_dir);
      			ret = 1;
      		}
     -		strbuf_release(&submodule_path);
 8:  bab609b4dc1 !  8:  f837c4a0789 fetch: retry fetching submodules if sha1 were not fetched
    @@ -1,9 +1,9 @@
     Author: Stefan Beller <sbeller@google.com>
     
    -    fetch: retry fetching submodules if sha1 were not fetched
    +    fetch: retry fetching submodules if needed objects were not fetched
     
         Currently when git-fetch is asked to recurse into submodules, it dispatches
    -    a plain "git-fetch -C <submodule-dir>" (and some submodule related options
    +    a plain "git-fetch -C <submodule-dir>" (with some submodule related options
         such as prefix and recusing strategy, but) without any information of the
         remote or the tip that should be fetched.
     
    @@ -16,16 +16,22 @@
         topic downloaded as well. However these submodule changes reside in their
         own repository in their own ref (refs/changes/<int>).
     
    -    Retry fetching a submodule if the object id that the superproject points
    -    to, cannot be found.
    +    Retry fetching a submodule by object name if the object id that the
    +    superproject points to, cannot be found.
     
    -    This doesn't support fetching to FETCH_HEAD yet, but only into a local
    -    branch. To make fetching into FETCH_HEAD work, we need some refactoring
    -    in builtin/fetch.c to adjust the calls to 'check_for_new_submodule_commits'
    -    that is coming in the next patch.
    +    This retrying does not happen when the "git fetch" done at the
    +    superproject is not storing the fetched results in remote
    +    tracking branches (i.e. instead just recording them to
    +    FETCH_HEAD) in this step. A later patch will fix this.
    +
    +    builtin/fetch used to only inspect submodules when they were fetched
    +    "on-demand", as in either on/off case it was clear whether the submodule
    +    needs to be fetched. However to know whether we need to try fetching the
    +    object ids, we need to identify the object names, which is done in this
    +    function check_for_new_submodule_commits(), so we'll also run that code
    +    in case the submodule recursion is set to "on".
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/builtin/fetch.c b/builtin/fetch.c
      --- a/builtin/fetch.c
 9:  c16d21313f6 !  9:  71f2bb035b1 builtin/fetch: check for submodule updates for non branch fetches
    @@ -2,11 +2,29 @@
     
         builtin/fetch: check for submodule updates for non branch fetches
     
    -    For Gerrit users that use submodules the invocation of fetch without a
    -    branch is their main use case.
    +    Gerrit, the code review tool, has a different workflow than our mailing
    +    list based approach. Usually users upload changes to a Gerrit server and
    +    continuous integration and testing happens by bots. Sometimes however a
    +    user wants to checkout a change locally and look at it locally. For this
    +    use case, Gerrit offers a command line snippet to copy and paste to your
    +    terminal, which looks like
    +
    +      git fetch https://<host>/gerrit refs/changes/<id> &&
    +      git checkout FETCH_HEAD
    +
    +    For Gerrit changes that contain changing submodule gitlinks, it would be
    +    easy to extend both the fetch and checkout with the '--recurse-submodules'
    +    flag, such that this command line snippet would produce the state of a
    +    change locally.
    +
    +    However the functionality added in the previous patch, which would
    +    ensure that we fetch the objects in the submodule that the gitlink pointed
    +    at, only works for remote tracking branches so far, not for FETCH_HEAD.
    +
    +    Make sure that fetching a superproject to its FETCH_HEAD, also respects
    +    the existence checks for objects in the submodule recursion.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/builtin/fetch.c b/builtin/fetch.c
      --- a/builtin/fetch.c

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/9] string-list: add string_list_{pop, last} functions
  2018-09-17 21:35 ` [PATCHv2 0/9] fetch: make sure submodule oids are fetched Stefan Beller
@ 2018-09-17 21:35   ` Stefan Beller
  2018-09-21 22:08     ` [PATCH] " Stefan Beller
  2018-09-17 21:35   ` [PATCH 2/9] sha1-array: provide oid_array_filter Stefan Beller
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2018-09-17 21:35 UTC (permalink / raw)
  To: sbeller; +Cc: git

Add a few functions to allow a string-list to be used as a stack:

 - string_list_last() lets a caller peek the string_list_item at the
   end of the string list.  The caller needs to be aware that it is
   borrowing a pointer, which can become invalid if/when the
   string_list is resized.

 - string_list_pop() removes the string_list_item at the end of
   the string list.

 - there is no string_list_push(); string_list_append() can be used
   in its place.

You can use them in this pattern:

    while (list.nr) {
        struct string_list_item *item = string_list_last(&list);

        work_on(item);
        string_list_pop(&list, free_util);
    }

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 string-list.c | 14 ++++++++++++++
 string-list.h | 15 +++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/string-list.c b/string-list.c
index 771c4550980..04db2b537c0 100644
--- a/string-list.c
+++ b/string-list.c
@@ -80,6 +80,20 @@ void string_list_remove(struct string_list *list, const char *string,
 	}
 }
 
+void string_list_pop(struct string_list *list, int free_util)
+{
+	if (list->nr == 0)
+		BUG("tried to remove an item from empty string list");
+
+	if (list->strdup_strings)
+		free(list->items[list->nr - 1].string);
+
+	if (free_util)
+		free(list->items[list->nr - 1].util);
+
+	list->nr--;
+}
+
 int string_list_has_string(const struct string_list *list, const char *string)
 {
 	int exact_match;
diff --git a/string-list.h b/string-list.h
index ff8f6094a33..15f2936e337 100644
--- a/string-list.h
+++ b/string-list.h
@@ -191,6 +191,21 @@ extern void string_list_remove(struct string_list *list, const char *string,
  */
 struct string_list_item *string_list_lookup(struct string_list *list, const char *string);
 
+/**
+ * Removes the last item from the list.
+ * The caller must ensure that the list is not empty.
+ */
+void string_list_pop(struct string_list *list, int free_util);
+
+/*
+ * Returns the last item of the list. As it returns the raw access, do not
+ * modify the list while holding onto the returned pointer.
+ */
+static inline struct string_list_item *string_list_last(struct string_list *list)
+{
+	return &list->items[list->nr - 1];
+}
+
 /*
  * Remove all but the first of consecutive entries with the same
  * string value.  If free_util is true, call free() on the util
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 2/9] sha1-array: provide oid_array_filter
  2018-09-17 21:35 ` [PATCHv2 0/9] fetch: make sure submodule oids are fetched Stefan Beller
  2018-09-17 21:35   ` [PATCH 1/9] string-list: add string_list_{pop, last} functions Stefan Beller
@ 2018-09-17 21:35   ` Stefan Beller
  2018-09-17 21:42     ` Stefan Beller
  2018-09-17 21:35   ` [PATCH 3/9] submodule.c: fix indentation Stefan Beller
                     ` (6 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2018-09-17 21:35 UTC (permalink / raw)
  To: sbeller; +Cc: git

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 sha1-array.c | 17 +++++++++++++++++
 sha1-array.h |  9 +++++++++
 2 files changed, 26 insertions(+)

diff --git a/sha1-array.c b/sha1-array.c
index 265941fbf40..67db5eeec9a 100644
--- a/sha1-array.c
+++ b/sha1-array.c
@@ -77,3 +77,20 @@ int oid_array_for_each_unique(struct oid_array *array,
 	}
 	return 0;
 }
+
+void oid_array_filter(struct oid_array *array,
+		      for_each_oid_fn want,
+		      void *cb_data)
+{
+	unsigned nr = array->nr, src, dst;
+	struct object_id *oids = array->oids;
+
+	for (src = dst = 0; src < nr; src++) {
+		if (want(&oids[src], cb_data)) {
+			if (src != dst)
+				oidcpy(oids[dst], &oids[src]);
+			dst++;
+		}
+	}
+	array->nr = dst;
+}
diff --git a/sha1-array.h b/sha1-array.h
index 232bf950172..ae059ca0431 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -23,4 +23,13 @@ int oid_array_for_each_unique(struct oid_array *array,
 			      for_each_oid_fn fn,
 			      void *data);
 
+/*
+ * Apply want to each entry in array, retaining only the entries for
+ * which the function returns true.  Preserve the order of the entries
+ * that are retained.
+ */
+int oid_array_filter(struct oid_array *array,
+		     for_each_oid_fn want,
+		     void *cbdata);
+
 #endif /* SHA1_ARRAY_H */
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 3/9] submodule.c: fix indentation
  2018-09-17 21:35 ` [PATCHv2 0/9] fetch: make sure submodule oids are fetched Stefan Beller
  2018-09-17 21:35   ` [PATCH 1/9] string-list: add string_list_{pop, last} functions Stefan Beller
  2018-09-17 21:35   ` [PATCH 2/9] sha1-array: provide oid_array_filter Stefan Beller
@ 2018-09-17 21:35   ` Stefan Beller
  2018-09-17 21:35   ` [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-17 21:35 UTC (permalink / raw)
  To: sbeller; +Cc: git

The submodule subsystem is really bad at staying within 80 characters.
Fix it while we are here.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index a2b266fbfae..d29dfa3d1f5 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1244,7 +1244,8 @@ static int get_next_submodule(struct child_process *cp,
 		if (!submodule) {
 			const char *name = default_name_or_path(ce->name);
 			if (name) {
-				default_submodule.path = default_submodule.name = name;
+				default_submodule.path = name;
+				default_submodule.name = name;
 				submodule = &default_submodule;
 			}
 		}
@@ -1254,8 +1255,10 @@ static int get_next_submodule(struct child_process *cp,
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!submodule || !unsorted_string_list_lookup(&changed_submodule_names,
-							 submodule->name))
+			if (!submodule ||
+			    !unsorted_string_list_lookup(
+					&changed_submodule_names,
+					submodule->name))
 				continue;
 			default_argv = "on-demand";
 			break;
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it
  2018-09-17 21:35 ` [PATCHv2 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                     ` (2 preceding siblings ...)
  2018-09-17 21:35   ` [PATCH 3/9] submodule.c: fix indentation Stefan Beller
@ 2018-09-17 21:35   ` Stefan Beller
  2018-09-17 21:35   ` [PATCH 5/9] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-17 21:35 UTC (permalink / raw)
  To: sbeller; +Cc: git

We can string_list_insert() to maintain sorted-ness of the
list as we find new items, or we can string_list_append() to
build an unsorted list and sort it at the end just once.

To pick which one is more appropriate, we notice the fact
that we discover new items more or less in the already
sorted order.  That makes "append then sort" more
appropriate.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index d29dfa3d1f5..c6eff7699f3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1256,7 +1256,7 @@ static int get_next_submodule(struct child_process *cp,
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
 			if (!submodule ||
-			    !unsorted_string_list_lookup(
+			    !string_list_lookup(
 					&changed_submodule_names,
 					submodule->name))
 				continue;
@@ -1350,6 +1350,7 @@ int fetch_populated_submodules(struct repository *r,
 	/* default value, "--submodule-prefix" and its value are added later */
 
 	calculate_changed_submodule_paths();
+	string_list_sort(&changed_submodule_names);
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
 			       fetch_start_failure,
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 5/9] submodule: move global changed_submodule_names into fetch submodule struct
  2018-09-17 21:35 ` [PATCHv2 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                     ` (3 preceding siblings ...)
  2018-09-17 21:35   ` [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
@ 2018-09-17 21:35   ` Stefan Beller
  2018-09-17 21:35   ` [PATCH 6/9] submodule.c: do not copy around submodule list Stefan Beller
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-17 21:35 UTC (permalink / raw)
  To: sbeller; +Cc: git

The `changed_submodule_names` are only used for fetching, so let's make it
part of the struct that is passed around for fetching submodules.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/submodule.c b/submodule.c
index c6eff7699f3..3520dd76bdf 100644
--- a/submodule.c
+++ b/submodule.c
@@ -24,7 +24,7 @@
 #include "object-store.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
+
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -1110,7 +1110,22 @@ void check_for_new_submodule_commits(struct object_id *oid)
 	oid_array_append(&ref_tips_after_fetch, oid);
 }
 
-static void calculate_changed_submodule_paths(void)
+struct submodule_parallel_fetch {
+	int count;
+	struct argv_array args;
+	struct repository *r;
+	const char *prefix;
+	int command_line_option;
+	int default_option;
+	int quiet;
+	int result;
+
+	struct string_list changed_submodule_names;
+};
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+
+static void calculate_changed_submodule_paths(
+	struct submodule_parallel_fetch *spf)
 {
 	struct argv_array argv = ARGV_ARRAY_INIT;
 	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
@@ -1148,7 +1163,8 @@ static void calculate_changed_submodule_paths(void)
 			continue;
 
 		if (!submodule_has_commits(path, commits))
-			string_list_append(&changed_submodule_names, name->string);
+			string_list_append(&spf->changed_submodule_names,
+					   name->string);
 	}
 
 	free_submodules_oids(&changed_submodules);
@@ -1185,18 +1201,6 @@ int submodule_touches_in_range(struct object_id *excl_oid,
 	return ret;
 }
 
-struct submodule_parallel_fetch {
-	int count;
-	struct argv_array args;
-	struct repository *r;
-	const char *prefix;
-	int command_line_option;
-	int default_option;
-	int quiet;
-	int result;
-};
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
-
 static int get_fetch_recurse_config(const struct submodule *submodule,
 				    struct submodule_parallel_fetch *spf)
 {
@@ -1257,7 +1261,7 @@ static int get_next_submodule(struct child_process *cp,
 		case RECURSE_SUBMODULES_ON_DEMAND:
 			if (!submodule ||
 			    !string_list_lookup(
-					&changed_submodule_names,
+					&spf->changed_submodule_names,
 					submodule->name))
 				continue;
 			default_argv = "on-demand";
@@ -1349,8 +1353,8 @@ int fetch_populated_submodules(struct repository *r,
 	argv_array_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
-	calculate_changed_submodule_paths();
-	string_list_sort(&changed_submodule_names);
+	calculate_changed_submodule_paths(&spf);
+	string_list_sort(&spf.changed_submodule_names);
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
 			       fetch_start_failure,
@@ -1359,7 +1363,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	argv_array_clear(&spf.args);
 out:
-	string_list_clear(&changed_submodule_names, 1);
+	string_list_clear(&spf.changed_submodule_names, 1);
 	return spf.result;
 }
 
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 6/9] submodule.c: do not copy around submodule list
  2018-09-17 21:35 ` [PATCHv2 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                     ` (4 preceding siblings ...)
  2018-09-17 21:35   ` [PATCH 5/9] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
@ 2018-09-17 21:35   ` Stefan Beller
  2018-09-17 21:35   ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-17 21:35 UTC (permalink / raw)
  To: sbeller; +Cc: git

'calculate_changed_submodule_paths' uses a local list to compute the
changed submodules, and then produces the result by copying appropriate
items into the result list.

Instead use the result list directly and prune items afterwards
using string_list_remove_empty_items.

By doing so we'll have access to the util pointer for longer that
contains the commits that we need to fetch, which will be
useful in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/submodule.c b/submodule.c
index 3520dd76bdf..00a9a3c6b12 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1128,8 +1128,7 @@ static void calculate_changed_submodule_paths(
 	struct submodule_parallel_fetch *spf)
 {
 	struct argv_array argv = ARGV_ARRAY_INIT;
-	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
-	const struct string_list_item *name;
+	struct string_list_item *name;
 
 	/* No need to check if there are no submodules configured */
 	if (!submodule_from_path(the_repository, NULL, NULL))
@@ -1146,9 +1145,9 @@ static void calculate_changed_submodule_paths(
 	 * Collect all submodules (whether checked out or not) for which new
 	 * commits have been recorded upstream in "changed_submodule_names".
 	 */
-	collect_changed_submodules(&changed_submodules, &argv);
+	collect_changed_submodules(&spf->changed_submodule_names, &argv);
 
-	for_each_string_list_item(name, &changed_submodules) {
+	for_each_string_list_item(name, &spf->changed_submodule_names) {
 		struct oid_array *commits = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
@@ -1162,12 +1161,14 @@ static void calculate_changed_submodule_paths(
 		if (!path)
 			continue;
 
-		if (!submodule_has_commits(path, commits))
-			string_list_append(&spf->changed_submodule_names,
-					   name->string);
+		if (submodule_has_commits(path, commits)) {
+			oid_array_clear(commits);
+			*name->string = '\0';
+		}
 	}
 
-	free_submodules_oids(&changed_submodules);
+	string_list_remove_empty_items(&spf->changed_submodule_names, 1);
+
 	argv_array_clear(&argv);
 	oid_array_clear(&ref_tips_before_fetch);
 	oid_array_clear(&ref_tips_after_fetch);
@@ -1363,7 +1364,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	argv_array_clear(&spf.args);
 out:
-	string_list_clear(&spf.changed_submodule_names, 1);
+	free_submodules_oids(&spf.changed_submodule_names);
 	return spf.result;
 }
 
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree
  2018-09-17 21:35 ` [PATCHv2 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                     ` (5 preceding siblings ...)
  2018-09-17 21:35   ` [PATCH 6/9] submodule.c: do not copy around submodule list Stefan Beller
@ 2018-09-17 21:35   ` Stefan Beller
  2018-09-17 21:35   ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
  2018-09-17 21:35   ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
  8 siblings, 0 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-17 21:35 UTC (permalink / raw)
  To: sbeller; +Cc: git

This patch started as a refactoring to make 'get_next_submodule' more
readable, but upon doing so, I realized that "git fetch" of the submodule
actually doesn't need to be run in the submodules worktree. So let's run
it in its git dir instead.

That should pave the way towards fetching submodules that are currently
not checked out.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c                 | 45 +++++++++++++++++++++++++++----------
 t/t5526-fetch-submodules.sh |  7 +++++-
 2 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/submodule.c b/submodule.c
index 00a9a3c6b12..88bce534268 100644
--- a/submodule.c
+++ b/submodule.c
@@ -481,6 +481,12 @@ void prepare_submodule_repo_env(struct argv_array *out)
 			 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
+static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
+{
+	prepare_submodule_repo_env_no_git_dir(out);
+	argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1227,6 +1233,27 @@ static int get_fetch_recurse_config(const struct submodule *submodule,
 	return spf->default_option;
 }
 
+static char *get_submodule_git_dir(struct repository *r, const char *path)
+{
+	struct repository subrepo;
+	const char *ret;
+
+	if (repo_submodule_init(&subrepo, r, path)) {
+		/* no entry in .gitmodules? */
+		struct strbuf gitdir = STRBUF_INIT;
+		strbuf_repo_worktree_path(&gitdir, r, "%s/.git", path);
+		if (repo_init(&subrepo, gitdir.buf, NULL)) {
+			strbuf_release(&gitdir);
+			return NULL;
+		}
+	}
+
+	ret = xstrdup(subrepo.gitdir);
+	repo_clear(&subrepo);
+
+	return ret;
+}
+
 static int get_next_submodule(struct child_process *cp,
 			      struct strbuf *err, void *data, void **task_cb)
 {
@@ -1234,8 +1261,6 @@ static int get_next_submodule(struct child_process *cp,
 	struct submodule_parallel_fetch *spf = data;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		struct strbuf submodule_path = STRBUF_INIT;
-		struct strbuf submodule_git_dir = STRBUF_INIT;
 		struct strbuf submodule_prefix = STRBUF_INIT;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
 		const char *git_dir, *default_argv;
@@ -1274,16 +1299,12 @@ static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		strbuf_repo_worktree_path(&submodule_path, spf->r, "%s", ce->name);
-		strbuf_addf(&submodule_git_dir, "%s/.git", submodule_path.buf);
 		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
-		git_dir = read_gitfile(submodule_git_dir.buf);
-		if (!git_dir)
-			git_dir = submodule_git_dir.buf;
-		if (is_directory(git_dir)) {
+		git_dir = get_submodule_git_dir(spf->r, ce->name);
+		if (git_dir) {
 			child_process_init(cp);
-			cp->dir = strbuf_detach(&submodule_path, NULL);
-			prepare_submodule_repo_env(&cp->env_array);
+			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+			cp->dir = git_dir;
 			cp->git_cmd = 1;
 			if (!spf->quiet)
 				strbuf_addf(err, "Fetching submodule %s%s\n",
@@ -1293,10 +1314,10 @@ static int get_next_submodule(struct child_process *cp,
 			argv_array_push(&cp->args, default_argv);
 			argv_array_push(&cp->args, "--submodule-prefix");
 			argv_array_push(&cp->args, submodule_prefix.buf);
+
+			free(git_dir);
 			ret = 1;
 		}
-		strbuf_release(&submodule_path);
-		strbuf_release(&submodule_git_dir);
 		strbuf_release(&submodule_prefix);
 		if (ret) {
 			spf->count++;
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 6c2f9b2ba26..42692219a1a 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -566,7 +566,12 @@ test_expect_success 'fetching submodule into a broken repository' '
 
 	test_must_fail git -C dst status &&
 	test_must_fail git -C dst diff &&
-	test_must_fail git -C dst fetch --recurse-submodules
+
+	# git-fetch cannot find the git directory of the submodule,
+	# so it will do nothing, successfully, as it cannot distinguish between
+	# this broken submodule and a submodule that was just set active but
+	# not cloned yet
+	git -C dst fetch --recurse-submodules
 '
 
 test_expect_success "fetch new commits when submodule got renamed" '
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched
  2018-09-17 21:35 ` [PATCHv2 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                     ` (6 preceding siblings ...)
  2018-09-17 21:35   ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
@ 2018-09-17 21:35   ` Stefan Beller
  2018-09-17 21:35   ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
  8 siblings, 0 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-17 21:35 UTC (permalink / raw)
  To: sbeller; +Cc: git

Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C <submodule-dir>" (with some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

This works surprisingly well in some workflows (such as using submodules
as a third party library), while not so well in other scenarios, such
as in a Gerrit topic-based workflow, that can tie together changes
(potentially across repositories) on the server side. One of the parts
of such a Gerrit workflow is to download a change when wanting to examine
it, and you'd want to have its submodule changes that are in the same
topic downloaded as well. However these submodule changes reside in their
own repository in their own ref (refs/changes/<int>).

Retry fetching a submodule by object name if the object id that the
superproject points to, cannot be found.

This retrying does not happen when the "git fetch" done at the
superproject is not storing the fetched results in remote
tracking branches (i.e. instead just recording them to
FETCH_HEAD) in this step. A later patch will fix this.

builtin/fetch used to only inspect submodules when they were fetched
"on-demand", as in either on/off case it was clear whether the submodule
needs to be fetched. However to know whether we need to try fetching the
object ids, we need to identify the object names, which is done in this
function check_for_new_submodule_commits(), so we'll also run that code
in case the submodule recursion is set to "on".

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/fetch.c             |  9 ++--
 submodule.c                 | 87 ++++++++++++++++++++++++++++++++++++-
 t/t5526-fetch-submodules.sh | 16 +++++++
 3 files changed, 104 insertions(+), 8 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d213d..95c44bf6ffa 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -700,8 +700,7 @@ static int update_local_ref(struct ref *ref,
 			what = _("[new ref]");
 		}
 
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref(msg, ref, 0);
 		format_display(display, r ? '!' : '*', what,
@@ -716,8 +715,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("fast-forward", ref, 1);
 		format_display(display, r ? '!' : ' ', quickref.buf,
@@ -731,8 +729,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("forced-update", ref, 1);
 		format_display(display, r ? '!' : '+', quickref.buf,
diff --git a/submodule.c b/submodule.c
index 88bce534268..7d59e56171f 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1127,8 +1127,11 @@ struct submodule_parallel_fetch {
 	int result;
 
 	struct string_list changed_submodule_names;
+	struct string_list retry;
 };
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
+		  STRING_LIST_INIT_DUP, \
+		  STRING_LIST_INIT_NODUP}
 
 static void calculate_changed_submodule_paths(
 	struct submodule_parallel_fetch *spf)
@@ -1259,8 +1262,10 @@ static int get_next_submodule(struct child_process *cp,
 {
 	int ret = 0;
 	struct submodule_parallel_fetch *spf = data;
+	struct string_list_item *it;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
+		int recurse_config;
 		struct strbuf submodule_prefix = STRBUF_INIT;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
 		const char *git_dir, *default_argv;
@@ -1280,7 +1285,9 @@ static int get_next_submodule(struct child_process *cp,
 			}
 		}
 
-		switch (get_fetch_recurse_config(submodule, spf))
+		recurse_config = get_fetch_recurse_config(submodule, spf);
+
+		switch (recurse_config)
 		{
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
@@ -1321,9 +1328,50 @@ static int get_next_submodule(struct child_process *cp,
 		strbuf_release(&submodule_prefix);
 		if (ret) {
 			spf->count++;
+			if (submodule != &default_submodule)
+				/* discard const-ness: */
+				*task_cb = (void*)submodule;
 			return 1;
 		}
 	}
+
+retry_next:
+
+	if (spf->retry.nr) {
+		struct strbuf submodule_prefix = STRBUF_INIT;
+		const struct submodule *sub;
+
+		it = string_list_last(&spf->retry);
+		sub = submodule_from_name(spf->r, &null_oid,
+					  it->string);
+
+		child_process_init(cp);
+		cp->dir = get_submodule_git_dir(spf->r, sub->path);
+		if (!cp->dir) {
+			string_list_pop(&spf->retry, 0);
+			goto retry_next;
+		}
+		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+		cp->git_cmd = 1;
+
+		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, sub->path);
+		argv_array_init(&cp->args);
+		argv_array_pushv(&cp->args, spf->args.argv);
+		argv_array_push(&cp->args, "on-demand");
+		argv_array_push(&cp->args, "--submodule-prefix");
+		argv_array_push(&cp->args, submodule_prefix.buf);
+
+		/* NEEDSWORK: have get_default_remote from s--h */
+		argv_array_push(&cp->args, "origin");
+		oid_array_for_each_unique(it->util,
+					  append_oid_to_argv, &cp->args);
+
+		*task_cb = NULL; /* make sure we do not recurse forever */
+		strbuf_release(&submodule_prefix);
+		string_list_pop(&spf->retry, 0);
+		return 1;
+	}
+
 	return 0;
 }
 
@@ -1337,14 +1385,49 @@ static int fetch_start_failure(struct strbuf *err,
 	return 0;
 }
 
+static int commit_exists_in_sub(const struct object_id *oid, void *data)
+{
+	struct repository *subrepo = data;
+
+	enum object_type type = oid_object_info(subrepo, oid, NULL);
+
+	return type != OBJ_COMMIT;
+}
+
 static int fetch_finish(int retvalue, struct strbuf *err,
 			void *cb, void *task_cb)
 {
 	struct submodule_parallel_fetch *spf = cb;
+	struct submodule *sub = task_cb;
+	struct repository subrepo;
 
 	if (retvalue)
 		spf->result = 1;
 
+	if (!sub)
+		return 0;
+
+	if (repo_submodule_init(&subrepo, spf->r, sub->path) < 0)
+		warning(_("Could not get submodule repository for submodule '%s' in repository '%s'"),
+			  sub->path, spf->r->worktree);
+	else {
+		struct string_list_item *it;
+		struct oid_array *commits;
+
+		it = string_list_lookup(&spf->changed_submodule_names, sub->name);
+		if (!it)
+			return 0;
+
+		commits = it->util;
+		oid_array_filter(commits,
+				 commit_exists_in_sub,
+				 &subrepo);
+
+		if (commits->nr)
+			string_list_append(&spf->retry, sub->name)
+				->util = commits;
+	}
+
 	return 0;
 }
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 42692219a1a..af12c50e7dd 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -605,4 +605,20 @@ test_expect_success "fetch new commits when submodule got renamed" '
 	test_cmp expect actual
 '
 
+test_expect_success "fetch new commits on-demand when they are not reachable" '
+	git checkout --detach &&
+	C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
+	git -C submodule update-ref refs/changes/1 $C &&
+	git update-index --cacheinfo 160000 $C submodule &&
+	git commit -m "updated submodule outside of refs/heads" &&
+	D=$(git rev-parse HEAD) &&
+	git update-ref refs/changes/2 $D &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
+		git -C submodule cat-file -t $C &&
+		git checkout --recurse-submodules FETCH_HEAD
+	)
+'
+
 test_done
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches
  2018-09-17 21:35 ` [PATCHv2 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                     ` (7 preceding siblings ...)
  2018-09-17 21:35   ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
@ 2018-09-17 21:35   ` Stefan Beller
  8 siblings, 0 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-17 21:35 UTC (permalink / raw)
  To: sbeller; +Cc: git

Gerrit, the code review tool, has a different workflow than our mailing
list based approach. Usually users upload changes to a Gerrit server and
continuous integration and testing happens by bots. Sometimes however a
user wants to checkout a change locally and look at it locally. For this
use case, Gerrit offers a command line snippet to copy and paste to your
terminal, which looks like

  git fetch https://<host>/gerrit refs/changes/<id> &&
  git checkout FETCH_HEAD

For Gerrit changes that contain changing submodule gitlinks, it would be
easy to extend both the fetch and checkout with the '--recurse-submodules'
flag, such that this command line snippet would produce the state of a
change locally.

However the functionality added in the previous patch, which would
ensure that we fetch the objects in the submodule that the gitlink pointed
at, only works for remote tracking branches so far, not for FETCH_HEAD.

Make sure that fetching a superproject to its FETCH_HEAD, also respects
the existence checks for objects in the submodule recursion.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/fetch.c             | 5 ++++-
 t/t5526-fetch-submodules.sh | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 95c44bf6ffa..ea6ecd123e7 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -887,11 +887,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 				rc |= update_local_ref(ref, what, rm, &note,
 						       summary_width);
 				free(ref);
-			} else
+			} else {
+				check_for_new_submodule_commits(&rm->old_oid);
 				format_display(&note, '*',
 					       *kind ? kind : "branch", NULL,
 					       *what ? what : "HEAD",
 					       "FETCH_HEAD", summary_width);
+			}
+
 			if (note.len) {
 				if (verbosity >= 0 && !shown_url) {
 					fprintf(stderr, _("From %.*s\n"),
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index af12c50e7dd..a509eabb044 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -615,7 +615,7 @@ test_expect_success "fetch new commits on-demand when they are not reachable" '
 	git update-ref refs/changes/2 $D &&
 	(
 		cd downstream &&
-		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
+		git fetch --recurse-submodules origin refs/changes/2 &&
 		git -C submodule cat-file -t $C &&
 		git checkout --recurse-submodules FETCH_HEAD
 	)
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/9] sha1-array: provide oid_array_filter
  2018-09-17 21:35   ` [PATCH 2/9] sha1-array: provide oid_array_filter Stefan Beller
@ 2018-09-17 21:42     ` Stefan Beller
  0 siblings, 0 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-17 21:42 UTC (permalink / raw)
  To: Stefan Beller, Junio C Hamano; +Cc: git

On Mon, Sep 17, 2018 at 2:36 PM Stefan Beller <sbeller@google.com> wrote:
>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  sha1-array.c | 17 +++++++++++++++++
>  sha1-array.h |  9 +++++++++
>  2 files changed, 26 insertions(+)
>
> diff --git a/sha1-array.c b/sha1-array.c
> index 265941fbf40..67db5eeec9a 100644
> --- a/sha1-array.c
> +++ b/sha1-array.c
> @@ -77,3 +77,20 @@ int oid_array_for_each_unique(struct oid_array *array,
>         }
>         return 0;
>  }
> +
> +void oid_array_filter(struct oid_array *array,
> +                     for_each_oid_fn want,
> +                     void *cb_data)
> +{
> +       unsigned nr = array->nr, src, dst;
> +       struct object_id *oids = array->oids;

Blargh :-(

I made this last minute "pull oids out to be more like
the object_array_filter" and typo'd without compile checking.

Please discard this series.

Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH] string-list: add string_list_{pop, last} functions
  2018-09-17 21:35   ` [PATCH 1/9] string-list: add string_list_{pop, last} functions Stefan Beller
@ 2018-09-21 22:08     ` " Stefan Beller
  0 siblings, 0 replies; 35+ messages in thread
From: Stefan Beller @ 2018-09-21 22:08 UTC (permalink / raw)
  To: sbeller; +Cc: git

Add a few functions to allow a string-list to be used as a stack:

 - string_list_last() lets a caller peek the string_list_item at the
   end of the string list.  The caller needs to be aware that it is
   borrowing a pointer, which can become invalid if/when the
   string_list is resized.

 - string_list_pop() removes the string_list_item at the end of
   the string list.

 - there is no string_list_push(); string_list_append() can be used
   in its place.

You can use them in this pattern:

    while (list.nr) {
        struct string_list_item *item = string_list_last(&list);

        work_on(item);
        string_list_pop(&list, free_util);
    }

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---

I will be resending this series shortly, but it will not contain this patch,
as I will have no need for these any longer.

For the record, this is the final state of this patch.
It may be useful on its own.

Thanks,
Stefan

 string-list.c | 14 ++++++++++++++
 string-list.h | 15 +++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/string-list.c b/string-list.c
index 771c4550980..04db2b537c0 100644
--- a/string-list.c
+++ b/string-list.c
@@ -80,6 +80,20 @@ void string_list_remove(struct string_list *list, const char *string,
 	}
 }
 
+void string_list_pop(struct string_list *list, int free_util)
+{
+	if (list->nr == 0)
+		BUG("tried to remove an item from empty string list");
+
+	if (list->strdup_strings)
+		free(list->items[list->nr - 1].string);
+
+	if (free_util)
+		free(list->items[list->nr - 1].util);
+
+	list->nr--;
+}
+
 int string_list_has_string(const struct string_list *list, const char *string)
 {
 	int exact_match;
diff --git a/string-list.h b/string-list.h
index ff8f6094a33..15f2936e337 100644
--- a/string-list.h
+++ b/string-list.h
@@ -191,6 +191,21 @@ extern void string_list_remove(struct string_list *list, const char *string,
  */
 struct string_list_item *string_list_lookup(struct string_list *list, const char *string);
 
+/**
+ * Removes the last item from the list.
+ * The caller must ensure that the list is not empty.
+ */
+void string_list_pop(struct string_list *list, int free_util);
+
+/*
+ * Returns the last item of the list. As it returns the raw access, do not
+ * modify the list while holding onto the returned pointer.
+ */
+static inline struct string_list_item *string_list_last(struct string_list *list)
+{
+	return &list->items[list->nr - 1];
+}
+
 /*
  * Remove all but the first of consecutive entries with the same
  * string value.  If free_util is true, call free() on the util
-- 
2.19.0.444.g18242da7ef-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches
  2018-10-16 18:13 ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
@ 2018-10-18  0:47   ` Jonathan Tan
  0 siblings, 0 replies; 35+ messages in thread
From: Jonathan Tan @ 2018-10-18  0:47 UTC (permalink / raw)
  To: sbeller; +Cc: gitster, git, Jonathan Tan

> @@ -887,11 +887,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
>  				rc |= update_local_ref(ref, what, rm, &note,
>  						       summary_width);
>  				free(ref);
> -			} else
> +			} else {
> +				check_for_new_submodule_commits(&rm->old_oid);

Does this need to be guarded with a recurse_submodules check, just like
in update_local_ref()?

Also, this warrants a comment - this is here because there is some code
later that requires the new submodule commits to be registered, and the
other branch does not require it only because update_local_ref() calls
it.

> @@ -615,7 +615,7 @@ test_expect_success "fetch new commits on-demand when they are not reachable" '
>  	git update-ref refs/changes/2 $D &&
>  	(
>  		cd downstream &&
> -		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
> +		git fetch --recurse-submodules origin refs/changes/2 &&
>  		git -C submodule cat-file -t $C &&
>  		git checkout --recurse-submodules FETCH_HEAD
>  	)

I think there should be a new test - we can tell from the code that just
because fetching to FETCH_HEAD works doesn't mean that fetching to a ref
works, and vice versa.

Also, can you make the test fetch 2 refs? So that we know that it works
with more than one.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches
  2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
@ 2018-10-16 18:13 ` Stefan Beller
  2018-10-18  0:47   ` Jonathan Tan
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Beller @ 2018-10-16 18:13 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

Gerrit, the code review tool, has a different workflow than our mailing
list based approach. Usually users upload changes to a Gerrit server and
continuous integration and testing happens by bots. Sometimes however a
user wants to checkout a change locally and look at it locally. For this
use case, Gerrit offers a command line snippet to copy and paste to your
terminal, which looks like

  git fetch https://<host>/gerrit refs/changes/<id> &&
  git checkout FETCH_HEAD

For Gerrit changes that contain changing submodule gitlinks, it would be
easy to extend both the fetch and checkout with the '--recurse-submodules'
flag, such that this command line snippet would produce the state of a
change locally.

However the functionality added in the previous patch, which would
ensure that we fetch the objects in the submodule that the gitlink pointed
at, only works for remote tracking branches so far, not for FETCH_HEAD.

Make sure that fetching a superproject to its FETCH_HEAD, also respects
the existence checks for objects in the submodule recursion.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/fetch.c             | 5 ++++-
 t/t5526-fetch-submodules.sh | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 95c44bf6ff..ea6ecd123e 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -887,11 +887,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 				rc |= update_local_ref(ref, what, rm, &note,
 						       summary_width);
 				free(ref);
-			} else
+			} else {
+				check_for_new_submodule_commits(&rm->old_oid);
 				format_display(&note, '*',
 					       *kind ? kind : "branch", NULL,
 					       *what ? what : "HEAD",
 					       "FETCH_HEAD", summary_width);
+			}
+
 			if (note.len) {
 				if (verbosity >= 0 && !shown_url) {
 					fprintf(stderr, _("From %.*s\n"),
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index af12c50e7d..a509eabb04 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -615,7 +615,7 @@ test_expect_success "fetch new commits on-demand when they are not reachable" '
 	git update-ref refs/changes/2 $D &&
 	(
 		cd downstream &&
-		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
+		git fetch --recurse-submodules origin refs/changes/2 &&
 		git -C submodule cat-file -t $C &&
 		git checkout --recurse-submodules FETCH_HEAD
 	)
-- 
2.19.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, back to index

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
2018-09-11 23:49 ` [PATCH 1/9] string-list: add string_list_{pop, last} functions Stefan Beller
2018-09-12  2:24   ` Ramsay Jones
2018-09-12 17:52   ` Junio C Hamano
2018-09-11 23:49 ` [PATCH 2/9] sha1-array: provide oid_array_filter Stefan Beller
2018-09-12 18:02   ` Junio C Hamano
2018-09-11 23:49 ` [PATCH 3/9] submodule.c: fix indentation Stefan Beller
2018-09-12 18:02   ` Junio C Hamano
2018-09-11 23:49 ` [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
2018-09-12 18:18   ` Junio C Hamano
2018-09-12 19:06     ` Stefan Beller
2018-09-11 23:49 ` [PATCH 5/9] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
2018-09-11 23:49 ` [PATCH 6/9] submodule.c: do not copy around submodule list Stefan Beller
2018-09-12  2:25   ` Ramsay Jones
2018-09-11 23:49 ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
2018-09-12 18:36   ` Junio C Hamano
2018-09-13 19:29     ` Stefan Beller
2018-09-11 23:49 ` [PATCH 8/9] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
2018-09-12 19:03   ` Junio C Hamano
2018-09-11 23:49 ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
2018-09-12 19:20   ` Junio C Hamano
2018-09-17 21:35 ` [PATCHv2 0/9] fetch: make sure submodule oids are fetched Stefan Beller
2018-09-17 21:35   ` [PATCH 1/9] string-list: add string_list_{pop, last} functions Stefan Beller
2018-09-21 22:08     ` [PATCH] " Stefan Beller
2018-09-17 21:35   ` [PATCH 2/9] sha1-array: provide oid_array_filter Stefan Beller
2018-09-17 21:42     ` Stefan Beller
2018-09-17 21:35   ` [PATCH 3/9] submodule.c: fix indentation Stefan Beller
2018-09-17 21:35   ` [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
2018-09-17 21:35   ` [PATCH 5/9] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
2018-09-17 21:35   ` [PATCH 6/9] submodule.c: do not copy around submodule list Stefan Beller
2018-09-17 21:35   ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
2018-09-17 21:35   ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
2018-09-17 21:35   ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
2018-10-16 18:13 ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
2018-10-18  0:47   ` Jonathan Tan

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox