git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched
  2018-09-17 21:35 ` [PATCHv2 " Stefan Beller
@ 2018-09-17 21:35   ` Stefan Beller
  0 siblings, 0 replies; 29+ messages in thread
From: Stefan Beller @ 2018-09-17 21:35 UTC (permalink / raw)
  To: sbeller; +Cc: git

Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C <submodule-dir>" (with some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

This works surprisingly well in some workflows (such as using submodules
as a third party library), while not so well in other scenarios, such
as in a Gerrit topic-based workflow, that can tie together changes
(potentially across repositories) on the server side. One of the parts
of such a Gerrit workflow is to download a change when wanting to examine
it, and you'd want to have its submodule changes that are in the same
topic downloaded as well. However these submodule changes reside in their
own repository in their own ref (refs/changes/<int>).

Retry fetching a submodule by object name if the object id that the
superproject points to, cannot be found.

This retrying does not happen when the "git fetch" done at the
superproject is not storing the fetched results in remote
tracking branches (i.e. instead just recording them to
FETCH_HEAD) in this step. A later patch will fix this.

builtin/fetch used to only inspect submodules when they were fetched
"on-demand", as in either on/off case it was clear whether the submodule
needs to be fetched. However to know whether we need to try fetching the
object ids, we need to identify the object names, which is done in this
function check_for_new_submodule_commits(), so we'll also run that code
in case the submodule recursion is set to "on".

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/fetch.c             |  9 ++--
 submodule.c                 | 87 ++++++++++++++++++++++++++++++++++++-
 t/t5526-fetch-submodules.sh | 16 +++++++
 3 files changed, 104 insertions(+), 8 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d213d..95c44bf6ffa 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -700,8 +700,7 @@ static int update_local_ref(struct ref *ref,
 			what = _("[new ref]");
 		}
 
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref(msg, ref, 0);
 		format_display(display, r ? '!' : '*', what,
@@ -716,8 +715,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("fast-forward", ref, 1);
 		format_display(display, r ? '!' : ' ', quickref.buf,
@@ -731,8 +729,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("forced-update", ref, 1);
 		format_display(display, r ? '!' : '+', quickref.buf,
diff --git a/submodule.c b/submodule.c
index 88bce534268..7d59e56171f 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1127,8 +1127,11 @@ struct submodule_parallel_fetch {
 	int result;
 
 	struct string_list changed_submodule_names;
+	struct string_list retry;
 };
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
+		  STRING_LIST_INIT_DUP, \
+		  STRING_LIST_INIT_NODUP}
 
 static void calculate_changed_submodule_paths(
 	struct submodule_parallel_fetch *spf)
@@ -1259,8 +1262,10 @@ static int get_next_submodule(struct child_process *cp,
 {
 	int ret = 0;
 	struct submodule_parallel_fetch *spf = data;
+	struct string_list_item *it;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
+		int recurse_config;
 		struct strbuf submodule_prefix = STRBUF_INIT;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
 		const char *git_dir, *default_argv;
@@ -1280,7 +1285,9 @@ static int get_next_submodule(struct child_process *cp,
 			}
 		}
 
-		switch (get_fetch_recurse_config(submodule, spf))
+		recurse_config = get_fetch_recurse_config(submodule, spf);
+
+		switch (recurse_config)
 		{
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
@@ -1321,9 +1328,50 @@ static int get_next_submodule(struct child_process *cp,
 		strbuf_release(&submodule_prefix);
 		if (ret) {
 			spf->count++;
+			if (submodule != &default_submodule)
+				/* discard const-ness: */
+				*task_cb = (void*)submodule;
 			return 1;
 		}
 	}
+
+retry_next:
+
+	if (spf->retry.nr) {
+		struct strbuf submodule_prefix = STRBUF_INIT;
+		const struct submodule *sub;
+
+		it = string_list_last(&spf->retry);
+		sub = submodule_from_name(spf->r, &null_oid,
+					  it->string);
+
+		child_process_init(cp);
+		cp->dir = get_submodule_git_dir(spf->r, sub->path);
+		if (!cp->dir) {
+			string_list_pop(&spf->retry, 0);
+			goto retry_next;
+		}
+		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+		cp->git_cmd = 1;
+
+		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, sub->path);
+		argv_array_init(&cp->args);
+		argv_array_pushv(&cp->args, spf->args.argv);
+		argv_array_push(&cp->args, "on-demand");
+		argv_array_push(&cp->args, "--submodule-prefix");
+		argv_array_push(&cp->args, submodule_prefix.buf);
+
+		/* NEEDSWORK: have get_default_remote from s--h */
+		argv_array_push(&cp->args, "origin");
+		oid_array_for_each_unique(it->util,
+					  append_oid_to_argv, &cp->args);
+
+		*task_cb = NULL; /* make sure we do not recurse forever */
+		strbuf_release(&submodule_prefix);
+		string_list_pop(&spf->retry, 0);
+		return 1;
+	}
+
 	return 0;
 }
 
@@ -1337,14 +1385,49 @@ static int fetch_start_failure(struct strbuf *err,
 	return 0;
 }
 
+static int commit_exists_in_sub(const struct object_id *oid, void *data)
+{
+	struct repository *subrepo = data;
+
+	enum object_type type = oid_object_info(subrepo, oid, NULL);
+
+	return type != OBJ_COMMIT;
+}
+
 static int fetch_finish(int retvalue, struct strbuf *err,
 			void *cb, void *task_cb)
 {
 	struct submodule_parallel_fetch *spf = cb;
+	struct submodule *sub = task_cb;
+	struct repository subrepo;
 
 	if (retvalue)
 		spf->result = 1;
 
+	if (!sub)
+		return 0;
+
+	if (repo_submodule_init(&subrepo, spf->r, sub->path) < 0)
+		warning(_("Could not get submodule repository for submodule '%s' in repository '%s'"),
+			  sub->path, spf->r->worktree);
+	else {
+		struct string_list_item *it;
+		struct oid_array *commits;
+
+		it = string_list_lookup(&spf->changed_submodule_names, sub->name);
+		if (!it)
+			return 0;
+
+		commits = it->util;
+		oid_array_filter(commits,
+				 commit_exists_in_sub,
+				 &subrepo);
+
+		if (commits->nr)
+			string_list_append(&spf->retry, sub->name)
+				->util = commits;
+	}
+
 	return 0;
 }
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 42692219a1a..af12c50e7dd 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -605,4 +605,20 @@ test_expect_success "fetch new commits when submodule got renamed" '
 	test_cmp expect actual
 '
 
+test_expect_success "fetch new commits on-demand when they are not reachable" '
+	git checkout --detach &&
+	C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
+	git -C submodule update-ref refs/changes/1 $C &&
+	git update-index --cacheinfo 160000 $C submodule &&
+	git commit -m "updated submodule outside of refs/heads" &&
+	D=$(git rev-parse HEAD) &&
+	git update-ref refs/changes/2 $D &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
+		git -C submodule cat-file -t $C &&
+		git checkout --recurse-submodules FETCH_HEAD
+	)
+'
+
 test_done
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip
@ 2018-10-16 18:13 Stefan Beller
  2018-10-16 18:13 ` [PATCH 1/9] sha1-array: provide oid_array_filter Stefan Beller
                   ` (10 more replies)
  0 siblings, 11 replies; 29+ messages in thread
From: Stefan Beller @ 2018-10-16 18:13 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

This is based on ao/submodule-wo-gitmodules-checked-out.

This resends origin/sb/submodule-recursive-fetch-gets-the-tip, resolving
the issues pointed out via origin/xxx/sb-submodule-recursive-fetch-gets-the-tip-in-pu
by basing this series on origin/ao/submodule-wo-gitmodules-checked-out

A range-diff below shows how picking a different base changed the patches,
apart from that no further adjustments have been made.

Thanks,
Stefan

Stefan Beller (9):
  sha1-array: provide oid_array_filter
  submodule.c: fix indentation
  submodule.c: sort changed_submodule_names before searching it
  submodule.c: move global changed_submodule_names into fetch submodule
    struct
  submodule.c: do not copy around submodule list
  repository: repo_submodule_init to take a submodule struct
  submodule: fetch in submodules git directory instead of in worktree
  fetch: retry fetching submodules if needed objects were not fetched
  builtin/fetch: check for submodule updates for non branch fetches

 Documentation/technical/api-oid-array.txt    |   5 +
 builtin/fetch.c                              |  14 +-
 builtin/grep.c                               |  17 +-
 builtin/ls-files.c                           |  12 +-
 repository.c                                 |  27 +-
 repository.h                                 |  11 +-
 sha1-array.c                                 |  17 ++
 sha1-array.h                                 |   3 +
 submodule.c                                  | 275 +++++++++++++++----
 t/helper/test-submodule-nested-repo-config.c |   8 +-
 t/t5526-fetch-submodules.sh                  |  23 +-
 11 files changed, 315 insertions(+), 97 deletions(-)

git range-diff origin/xxx/sb-submodule-recursive-fetch-gets-the-tip-in-pu...
[...]
585:  ac1f98a0df <   -:  ---------- doc: move git-rev-parse from porcelain to plumbing
586:  7cf1a0fbef =   1:  a035323c49 sha1-array: provide oid_array_filter
587:  01077381d0 =   2:  30ed20b4f0 submodule.c: fix indentation
588:  4b0cdf5899 =   3:  cd590ea88d submodule.c: sort changed_submodule_names before searching it
589:  78e5099ecc !   4:  ce959811ba submodule.c: move global changed_submodule_names into fetch submodule struct
    @@ -12,7 +12,7 @@
     --- a/submodule.c
     +++ b/submodule.c
     @@
    - #include "commit-reach.h"
    + #include "object-store.h"
      
      static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
     -static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
590:  d813f18bb3 =   5:  151f9a8ad4 submodule.c: do not copy around submodule list
591:  a077d63af7 !   6:  3a97743fa2 repository: repo_submodule_init to take a submodule struct
    @@ -15,7 +15,6 @@
         Also move its documentation into the header file.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/builtin/grep.c b/builtin/grep.c
     --- a/builtin/grep.c
    @@ -31,12 +30,16 @@
     +
      	int hit;
      
    - 	if (!is_submodule_active(superproject, path))
    + 	/*
    +@@
      		return 0;
    + 	}
      
    --	if (repo_submodule_init(&submodule, superproject, path))
    -+	if (repo_submodule_init(&subrepo, superproject, sub))
    +-	if (repo_submodule_init(&submodule, superproject, path)) {
    ++	if (repo_submodule_init(&subrepo, superproject, sub)) {
    + 		grep_read_unlock();
      		return 0;
    + 	}
      
     -	repo_read_gitmodules(&submodule);
     +	repo_read_gitmodules(&subrepo);
    @@ -44,9 +47,9 @@
      	/*
      	 * NEEDSWORK: This adds the submodule's object directory to the list of
     @@
    + 	 * store is no longer global and instead is a member of the repository
      	 * object.
      	 */
    - 	grep_read_lock();
     -	add_to_alternates_memory(submodule.objects->objectdir);
     +	add_to_alternates_memory(subrepo.objects->objectdir);
      	grep_read_unlock();
    @@ -100,19 +103,6 @@
      
      static void show_ce(struct repository *repo, struct dir_struct *dir,
     
    -diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
    ---- a/builtin/submodule--helper.c
    -+++ b/builtin/submodule--helper.c
    -@@
    - 	if (!sub)
    - 		BUG("We could get the submodule handle before?");
    - 
    --	if (repo_submodule_init(&subrepo, the_repository, path))
    -+	if (repo_submodule_init(&subrepo, the_repository, sub))
    - 		die(_("could not get a repository handle for submodule '%s'"), path);
    - 
    - 	if (!repo_config_get_string(&subrepo, "core.worktree", &cw)) {
    -
     diff --git a/repository.c b/repository.c
     --- a/repository.c
     +++ b/repository.c
    @@ -197,3 +187,32 @@
      void repo_clear(struct repository *repo);
      
      /*
    +
    +diff --git a/t/helper/test-submodule-nested-repo-config.c b/t/helper/test-submodule-nested-repo-config.c
    +--- a/t/helper/test-submodule-nested-repo-config.c
    ++++ b/t/helper/test-submodule-nested-repo-config.c
    +@@
    + 
    + int cmd__submodule_nested_repo_config(int argc, const char **argv)
    + {
    +-	struct repository submodule;
    ++	struct repository subrepo;
    ++	const struct submodule *sub;
    + 
    + 	if (argc < 3)
    + 		die_usage(argc, argv, "Wrong number of arguments.");
    + 
    + 	setup_git_directory();
    + 
    +-	if (repo_submodule_init(&submodule, the_repository, argv[1])) {
    ++	sub = submodule_from_path(the_repository, &null_oid, argv[1]);
    ++	if (repo_submodule_init(&subrepo, the_repository, sub)) {
    + 		die_usage(argc, argv, "Submodule not found.");
    + 	}
    + 
    + 	/* Read the config of _child_ submodules. */
    +-	print_config_from_gitmodules(&submodule, argv[2]);
    ++	print_config_from_gitmodules(&subrepo, argv[2]);
    + 
    + 	submodule_free(the_repository);
    + 
592:  780f6c1a92 =   7:  4e8ad61f8d submodule: fetch in submodules git directory instead of in worktree
593:  a530535912 =   8:  24bac00db7 fetch: retry fetching submodules if needed objects were not fetched
594:  a72bde3a8a =   9:  e031182e44 builtin/fetch: check for submodule updates for non branch fetches
[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/9] sha1-array: provide oid_array_filter
  2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
@ 2018-10-16 18:13 ` Stefan Beller
  2018-10-16 18:13 ` [PATCH 2/9] submodule.c: fix indentation Stefan Beller
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Stefan Beller @ 2018-10-16 18:13 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/technical/api-oid-array.txt |  5 +++++
 sha1-array.c                              | 17 +++++++++++++++++
 sha1-array.h                              |  3 +++
 3 files changed, 25 insertions(+)

diff --git a/Documentation/technical/api-oid-array.txt b/Documentation/technical/api-oid-array.txt
index 9febfb1d52..c97428c2c3 100644
--- a/Documentation/technical/api-oid-array.txt
+++ b/Documentation/technical/api-oid-array.txt
@@ -48,6 +48,11 @@ Functions
 	is not sorted, this function has the side effect of sorting
 	it.
 
+`oid_array_filter`::
+	Apply the callback function `want` to each entry in the array,
+	retaining only the entries for which the function returns true.
+	Preserve the order of the entries that are retained.
+
 Examples
 --------
 
diff --git a/sha1-array.c b/sha1-array.c
index 265941fbf4..d505a004bb 100644
--- a/sha1-array.c
+++ b/sha1-array.c
@@ -77,3 +77,20 @@ int oid_array_for_each_unique(struct oid_array *array,
 	}
 	return 0;
 }
+
+void oid_array_filter(struct oid_array *array,
+		      for_each_oid_fn want,
+		      void *cb_data)
+{
+	unsigned nr = array->nr, src, dst;
+	struct object_id *oids = array->oid;
+
+	for (src = dst = 0; src < nr; src++) {
+		if (want(&oids[src], cb_data)) {
+			if (src != dst)
+				oidcpy(&oids[dst], &oids[src]);
+			dst++;
+		}
+	}
+	array->nr = dst;
+}
diff --git a/sha1-array.h b/sha1-array.h
index 232bf95017..55d016c4bf 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -22,5 +22,8 @@ int oid_array_for_each(struct oid_array *array,
 int oid_array_for_each_unique(struct oid_array *array,
 			      for_each_oid_fn fn,
 			      void *data);
+void oid_array_filter(struct oid_array *array,
+		      for_each_oid_fn want,
+		      void *cbdata);
 
 #endif /* SHA1_ARRAY_H */
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/9] submodule.c: fix indentation
  2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
  2018-10-16 18:13 ` [PATCH 1/9] sha1-array: provide oid_array_filter Stefan Beller
@ 2018-10-16 18:13 ` Stefan Beller
  2018-10-16 18:13 ` [PATCH 3/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: Stefan Beller @ 2018-10-16 18:13 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

The submodule subsystem is really bad at staying within 80 characters.
Fix it while we are here.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index 2b7082b2db..e145ebbb16 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1258,7 +1258,8 @@ static int get_next_submodule(struct child_process *cp,
 		if (!submodule) {
 			const char *name = default_name_or_path(ce->name);
 			if (name) {
-				default_submodule.path = default_submodule.name = name;
+				default_submodule.path = name;
+				default_submodule.name = name;
 				submodule = &default_submodule;
 			}
 		}
@@ -1268,8 +1269,10 @@ static int get_next_submodule(struct child_process *cp,
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!submodule || !unsorted_string_list_lookup(&changed_submodule_names,
-							 submodule->name))
+			if (!submodule ||
+			    !unsorted_string_list_lookup(
+					&changed_submodule_names,
+					submodule->name))
 				continue;
 			default_argv = "on-demand";
 			break;
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/9] submodule.c: sort changed_submodule_names before searching it
  2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
  2018-10-16 18:13 ` [PATCH 1/9] sha1-array: provide oid_array_filter Stefan Beller
  2018-10-16 18:13 ` [PATCH 2/9] submodule.c: fix indentation Stefan Beller
@ 2018-10-16 18:13 ` Stefan Beller
  2018-10-17 21:21   ` Jonathan Tan
  2018-10-16 18:13 ` [PATCH 4/9] submodule.c: move global changed_submodule_names into fetch submodule struct Stefan Beller
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2018-10-16 18:13 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

We can string_list_insert() to maintain sorted-ness of the
list as we find new items, or we can string_list_append() to
build an unsorted list and sort it at the end just once.

As we do not rely on the sortedness while building the
list, we pick the "append and sort at the end" as it
has better worst case execution times.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index e145ebbb16..9fbfcfcfe1 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1270,7 +1270,7 @@ static int get_next_submodule(struct child_process *cp,
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
 			if (!submodule ||
-			    !unsorted_string_list_lookup(
+			    !string_list_lookup(
 					&changed_submodule_names,
 					submodule->name))
 				continue;
@@ -1364,6 +1364,7 @@ int fetch_populated_submodules(struct repository *r,
 	/* default value, "--submodule-prefix" and its value are added later */
 
 	calculate_changed_submodule_paths();
+	string_list_sort(&changed_submodule_names);
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
 			       fetch_start_failure,
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/9] submodule.c: move global changed_submodule_names into fetch submodule struct
  2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
                   ` (2 preceding siblings ...)
  2018-10-16 18:13 ` [PATCH 3/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
@ 2018-10-16 18:13 ` Stefan Beller
  2018-10-17 21:26   ` Jonathan Tan
  2018-10-16 18:13 ` [PATCH 5/9] submodule.c: do not copy around submodule list Stefan Beller
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2018-10-16 18:13 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

The `changed_submodule_names` are only used for fetching, so let's make it
part of the struct that is passed around for fetching submodules.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule.c | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/submodule.c b/submodule.c
index 9fbfcfcfe1..6b4cee82bf 100644
--- a/submodule.c
+++ b/submodule.c
@@ -24,7 +24,7 @@
 #include "object-store.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
+
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -1124,7 +1124,22 @@ void check_for_new_submodule_commits(struct object_id *oid)
 	oid_array_append(&ref_tips_after_fetch, oid);
 }
 
-static void calculate_changed_submodule_paths(void)
+struct submodule_parallel_fetch {
+	int count;
+	struct argv_array args;
+	struct repository *r;
+	const char *prefix;
+	int command_line_option;
+	int default_option;
+	int quiet;
+	int result;
+
+	struct string_list changed_submodule_names;
+};
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+
+static void calculate_changed_submodule_paths(
+	struct submodule_parallel_fetch *spf)
 {
 	struct argv_array argv = ARGV_ARRAY_INIT;
 	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
@@ -1162,7 +1177,8 @@ static void calculate_changed_submodule_paths(void)
 			continue;
 
 		if (!submodule_has_commits(path, commits))
-			string_list_append(&changed_submodule_names, name->string);
+			string_list_append(&spf->changed_submodule_names,
+					   name->string);
 	}
 
 	free_submodules_oids(&changed_submodules);
@@ -1199,18 +1215,6 @@ int submodule_touches_in_range(struct object_id *excl_oid,
 	return ret;
 }
 
-struct submodule_parallel_fetch {
-	int count;
-	struct argv_array args;
-	struct repository *r;
-	const char *prefix;
-	int command_line_option;
-	int default_option;
-	int quiet;
-	int result;
-};
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
-
 static int get_fetch_recurse_config(const struct submodule *submodule,
 				    struct submodule_parallel_fetch *spf)
 {
@@ -1271,7 +1275,7 @@ static int get_next_submodule(struct child_process *cp,
 		case RECURSE_SUBMODULES_ON_DEMAND:
 			if (!submodule ||
 			    !string_list_lookup(
-					&changed_submodule_names,
+					&spf->changed_submodule_names,
 					submodule->name))
 				continue;
 			default_argv = "on-demand";
@@ -1363,8 +1367,8 @@ int fetch_populated_submodules(struct repository *r,
 	argv_array_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
-	calculate_changed_submodule_paths();
-	string_list_sort(&changed_submodule_names);
+	calculate_changed_submodule_paths(&spf);
+	string_list_sort(&spf.changed_submodule_names);
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
 			       fetch_start_failure,
@@ -1373,7 +1377,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	argv_array_clear(&spf.args);
 out:
-	string_list_clear(&changed_submodule_names, 1);
+	string_list_clear(&spf.changed_submodule_names, 1);
 	return spf.result;
 }
 
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 5/9] submodule.c: do not copy around submodule list
  2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
                   ` (3 preceding siblings ...)
  2018-10-16 18:13 ` [PATCH 4/9] submodule.c: move global changed_submodule_names into fetch submodule struct Stefan Beller
@ 2018-10-16 18:13 ` Stefan Beller
  2018-10-17 21:45   ` Jonathan Tan
  2018-10-16 18:13 ` [PATCH 6/9] repository: repo_submodule_init to take a submodule struct Stefan Beller
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2018-10-16 18:13 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

'calculate_changed_submodule_paths' uses a local list to compute the
changed submodules, and then produces the result by copying appropriate
items into the result list.

Instead use the result list directly and prune items afterwards
using string_list_remove_empty_items.

By doing so we'll have access to the util pointer for longer that
contains the commits that we need to fetch, which will be
useful in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/submodule.c b/submodule.c
index 6b4cee82bf..cbefe5f54d 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1142,8 +1142,7 @@ static void calculate_changed_submodule_paths(
 	struct submodule_parallel_fetch *spf)
 {
 	struct argv_array argv = ARGV_ARRAY_INIT;
-	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
-	const struct string_list_item *name;
+	struct string_list_item *name;
 
 	/* No need to check if there are no submodules configured */
 	if (!submodule_from_path(the_repository, NULL, NULL))
@@ -1160,9 +1159,9 @@ static void calculate_changed_submodule_paths(
 	 * Collect all submodules (whether checked out or not) for which new
 	 * commits have been recorded upstream in "changed_submodule_names".
 	 */
-	collect_changed_submodules(&changed_submodules, &argv);
+	collect_changed_submodules(&spf->changed_submodule_names, &argv);
 
-	for_each_string_list_item(name, &changed_submodules) {
+	for_each_string_list_item(name, &spf->changed_submodule_names) {
 		struct oid_array *commits = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
@@ -1176,12 +1175,14 @@ static void calculate_changed_submodule_paths(
 		if (!path)
 			continue;
 
-		if (!submodule_has_commits(path, commits))
-			string_list_append(&spf->changed_submodule_names,
-					   name->string);
+		if (submodule_has_commits(path, commits)) {
+			oid_array_clear(commits);
+			*name->string = '\0';
+		}
 	}
 
-	free_submodules_oids(&changed_submodules);
+	string_list_remove_empty_items(&spf->changed_submodule_names, 1);
+
 	argv_array_clear(&argv);
 	oid_array_clear(&ref_tips_before_fetch);
 	oid_array_clear(&ref_tips_after_fetch);
@@ -1377,7 +1378,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	argv_array_clear(&spf.args);
 out:
-	string_list_clear(&spf.changed_submodule_names, 1);
+	free_submodules_oids(&spf.changed_submodule_names);
 	return spf.result;
 }
 
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 6/9] repository: repo_submodule_init to take a submodule struct
  2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
                   ` (4 preceding siblings ...)
  2018-10-16 18:13 ` [PATCH 5/9] submodule.c: do not copy around submodule list Stefan Beller
@ 2018-10-16 18:13 ` Stefan Beller
  2018-10-17 21:55   ` Jonathan Tan
  2018-10-16 18:13 ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2018-10-16 18:13 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

When constructing a struct repository for a submodule for some revision
of the superproject where the submodule is not contained in the index,
it may not be present in the working tree currently either. In that
siutation giving a 'path' argument is not useful. Upgrade the
repo_submodule_init function to take a struct submodule instead.

While we are at it, overhaul the repo_submodule_init function by renaming
the submodule repository struct, which is to be initialized, to a name
that is not confused with the struct submodule as easily.

Also move its documentation into the header file.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/grep.c                               | 17 +++++++-----
 builtin/ls-files.c                           | 12 +++++----
 repository.c                                 | 27 ++++++++------------
 repository.h                                 | 11 ++++++--
 t/helper/test-submodule-nested-repo-config.c |  8 +++---
 5 files changed, 41 insertions(+), 34 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 7da8fef31a..ba7634258a 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -418,7 +418,10 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
 			  const struct object_id *oid,
 			  const char *filename, const char *path)
 {
-	struct repository submodule;
+	struct repository subrepo;
+	const struct submodule *sub = submodule_from_path(superproject,
+							  &null_oid, path);
+
 	int hit;
 
 	/*
@@ -434,12 +437,12 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
 		return 0;
 	}
 
-	if (repo_submodule_init(&submodule, superproject, path)) {
+	if (repo_submodule_init(&subrepo, superproject, sub)) {
 		grep_read_unlock();
 		return 0;
 	}
 
-	repo_read_gitmodules(&submodule);
+	repo_read_gitmodules(&subrepo);
 
 	/*
 	 * NEEDSWORK: This adds the submodule's object directory to the list of
@@ -451,7 +454,7 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
 	 * store is no longer global and instead is a member of the repository
 	 * object.
 	 */
-	add_to_alternates_memory(submodule.objects->objectdir);
+	add_to_alternates_memory(subrepo.objects->objectdir);
 	grep_read_unlock();
 
 	if (oid) {
@@ -476,14 +479,14 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
 
 		init_tree_desc(&tree, data, size);
 		hit = grep_tree(opt, pathspec, &tree, &base, base.len,
-				object->type == OBJ_COMMIT, &submodule);
+				object->type == OBJ_COMMIT, &subrepo);
 		strbuf_release(&base);
 		free(data);
 	} else {
-		hit = grep_cache(opt, &submodule, pathspec, 1);
+		hit = grep_cache(opt, &subrepo, pathspec, 1);
 	}
 
-	repo_clear(&submodule);
+	repo_clear(&subrepo);
 	return hit;
 }
 
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 7f9919a362..4d1649c1b3 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -206,17 +206,19 @@ static void show_files(struct repository *repo, struct dir_struct *dir);
 static void show_submodule(struct repository *superproject,
 			   struct dir_struct *dir, const char *path)
 {
-	struct repository submodule;
+	struct repository subrepo;
+	const struct submodule *sub = submodule_from_path(superproject,
+							  &null_oid, path);
 
-	if (repo_submodule_init(&submodule, superproject, path))
+	if (repo_submodule_init(&subrepo, superproject, sub))
 		return;
 
-	if (repo_read_index(&submodule) < 0)
+	if (repo_read_index(&subrepo) < 0)
 		die("index file corrupt");
 
-	show_files(&submodule, dir);
+	show_files(&subrepo, dir);
 
-	repo_clear(&submodule);
+	repo_clear(&subrepo);
 }
 
 static void show_ce(struct repository *repo, struct dir_struct *dir,
diff --git a/repository.c b/repository.c
index 5dd1486718..aabe64ee5d 100644
--- a/repository.c
+++ b/repository.c
@@ -166,30 +166,23 @@ int repo_init(struct repository *repo,
 	return -1;
 }
 
-/*
- * Initialize 'submodule' as the submodule given by 'path' in parent repository
- * 'superproject'.
- * Return 0 upon success and a non-zero value upon failure.
- */
-int repo_submodule_init(struct repository *submodule,
+int repo_submodule_init(struct repository *subrepo,
 			struct repository *superproject,
-			const char *path)
+			const struct submodule *sub)
 {
-	const struct submodule *sub;
 	struct strbuf gitdir = STRBUF_INIT;
 	struct strbuf worktree = STRBUF_INIT;
 	int ret = 0;
 
-	sub = submodule_from_path(superproject, &null_oid, path);
 	if (!sub) {
 		ret = -1;
 		goto out;
 	}
 
-	strbuf_repo_worktree_path(&gitdir, superproject, "%s/.git", path);
-	strbuf_repo_worktree_path(&worktree, superproject, "%s", path);
+	strbuf_repo_worktree_path(&gitdir, superproject, "%s/.git", sub->path);
+	strbuf_repo_worktree_path(&worktree, superproject, "%s", sub->path);
 
-	if (repo_init(submodule, gitdir.buf, worktree.buf)) {
+	if (repo_init(subrepo, gitdir.buf, worktree.buf)) {
 		/*
 		 * If initilization fails then it may be due to the submodule
 		 * not being populated in the superproject's worktree.  Instead
@@ -201,16 +194,16 @@ int repo_submodule_init(struct repository *submodule,
 		strbuf_repo_git_path(&gitdir, superproject,
 				     "modules/%s", sub->name);
 
-		if (repo_init(submodule, gitdir.buf, NULL)) {
+		if (repo_init(subrepo, gitdir.buf, NULL)) {
 			ret = -1;
 			goto out;
 		}
 	}
 
-	submodule->submodule_prefix = xstrfmt("%s%s/",
-					      superproject->submodule_prefix ?
-					      superproject->submodule_prefix :
-					      "", path);
+	subrepo->submodule_prefix = xstrfmt("%s%s/",
+					    superproject->submodule_prefix ?
+					    superproject->submodule_prefix :
+					    "", sub->path);
 
 out:
 	strbuf_release(&gitdir);
diff --git a/repository.h b/repository.h
index 9f16c42c1e..a9c7a5baa5 100644
--- a/repository.h
+++ b/repository.h
@@ -116,9 +116,16 @@ void repo_set_worktree(struct repository *repo, const char *path);
 void repo_set_hash_algo(struct repository *repo, int algo);
 void initialize_the_repository(void);
 int repo_init(struct repository *r, const char *gitdir, const char *worktree);
-int repo_submodule_init(struct repository *submodule,
+
+/*
+ * Initialize the repository 'subrepo' as the submodule given by the
+ * struct submodule 'sub' in parent repository 'superproject'.
+ * Return 0 upon success and a non-zero value upon failure.
+ */
+struct submodule;
+int repo_submodule_init(struct repository *subrepo,
 			struct repository *superproject,
-			const char *path);
+			const struct submodule *sub);
 void repo_clear(struct repository *repo);
 
 /*
diff --git a/t/helper/test-submodule-nested-repo-config.c b/t/helper/test-submodule-nested-repo-config.c
index a31e2a9bea..bc97929bbc 100644
--- a/t/helper/test-submodule-nested-repo-config.c
+++ b/t/helper/test-submodule-nested-repo-config.c
@@ -10,19 +10,21 @@ static void die_usage(int argc, const char **argv, const char *msg)
 
 int cmd__submodule_nested_repo_config(int argc, const char **argv)
 {
-	struct repository submodule;
+	struct repository subrepo;
+	const struct submodule *sub;
 
 	if (argc < 3)
 		die_usage(argc, argv, "Wrong number of arguments.");
 
 	setup_git_directory();
 
-	if (repo_submodule_init(&submodule, the_repository, argv[1])) {
+	sub = submodule_from_path(the_repository, &null_oid, argv[1]);
+	if (repo_submodule_init(&subrepo, the_repository, sub)) {
 		die_usage(argc, argv, "Submodule not found.");
 	}
 
 	/* Read the config of _child_ submodules. */
-	print_config_from_gitmodules(&submodule, argv[2]);
+	print_config_from_gitmodules(&subrepo, argv[2]);
 
 	submodule_free(the_repository);
 
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree
  2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
                   ` (5 preceding siblings ...)
  2018-10-16 18:13 ` [PATCH 6/9] repository: repo_submodule_init to take a submodule struct Stefan Beller
@ 2018-10-16 18:13 ` Stefan Beller
  2018-10-17 22:58   ` Jonathan Tan
  2018-10-16 18:13 ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2018-10-16 18:13 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

This patch started as a refactoring to make 'get_next_submodule' more
readable, but upon doing so, I realized that "git fetch" of the submodule
actually doesn't need to be run in the submodules worktree. So let's run
it in its git dir instead.

That should pave the way towards fetching submodules that are currently
not checked out.

This patch leaks the cp->dir in get_next_submodule, as any further
callback in run_processes_parallel doesn't have access to the child
process any more. In an early iteration of this patch, the function
get_submodule_repo_for directly returned the string containing the
git directory, which would be a better design choice for this patch.

However the next patch both fixes the memory leak of cp->dir and also has
a use case for using the full repository handle of the submodule, so
it makes sense to introduce the get_submodule_repo_for here already.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule.c                 | 51 +++++++++++++++++++++++++++----------
 t/t5526-fetch-submodules.sh |  7 ++++-
 2 files changed, 44 insertions(+), 14 deletions(-)

diff --git a/submodule.c b/submodule.c
index cbefe5f54d..30c06507e3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -495,6 +495,12 @@ void prepare_submodule_repo_env(struct argv_array *out)
 			 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
+static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
+{
+	prepare_submodule_repo_env_no_git_dir(out);
+	argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1241,6 +1247,29 @@ static int get_fetch_recurse_config(const struct submodule *submodule,
 	return spf->default_option;
 }
 
+static struct repository *get_submodule_repo_for(struct repository *r,
+						 const struct submodule *sub)
+{
+	struct repository *ret = xmalloc(sizeof(*ret));
+
+	if (repo_submodule_init(ret, r, sub)) {
+		/*
+		 * No entry in .gitmodules? Technically not a submodule,
+		 * but historically we supported repositories that happen to be
+		 * in-place where a gitlink is. Keep supporting them.
+		 */
+		struct strbuf gitdir = STRBUF_INIT;
+		strbuf_repo_worktree_path(&gitdir, r, "%s/.git", sub->path);
+		if (repo_init(ret, gitdir.buf, NULL)) {
+			strbuf_release(&gitdir);
+			return NULL;
+		}
+		strbuf_release(&gitdir);
+	}
+
+	return ret;
+}
+
 static int get_next_submodule(struct child_process *cp,
 			      struct strbuf *err, void *data, void **task_cb)
 {
@@ -1248,12 +1277,11 @@ static int get_next_submodule(struct child_process *cp,
 	struct submodule_parallel_fetch *spf = data;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		struct strbuf submodule_path = STRBUF_INIT;
-		struct strbuf submodule_git_dir = STRBUF_INIT;
 		struct strbuf submodule_prefix = STRBUF_INIT;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
-		const char *git_dir, *default_argv;
+		const char *default_argv;
 		const struct submodule *submodule;
+		struct repository *repo;
 		struct submodule default_submodule = SUBMODULE_INIT;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -1288,16 +1316,12 @@ static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		strbuf_repo_worktree_path(&submodule_path, spf->r, "%s", ce->name);
-		strbuf_addf(&submodule_git_dir, "%s/.git", submodule_path.buf);
 		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
-		git_dir = read_gitfile(submodule_git_dir.buf);
-		if (!git_dir)
-			git_dir = submodule_git_dir.buf;
-		if (is_directory(git_dir)) {
+		repo = get_submodule_repo_for(spf->r, submodule);
+		if (repo) {
 			child_process_init(cp);
-			cp->dir = strbuf_detach(&submodule_path, NULL);
-			prepare_submodule_repo_env(&cp->env_array);
+			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+			cp->dir = xstrdup(repo->gitdir);
 			cp->git_cmd = 1;
 			if (!spf->quiet)
 				strbuf_addf(err, "Fetching submodule %s%s\n",
@@ -1307,10 +1331,11 @@ static int get_next_submodule(struct child_process *cp,
 			argv_array_push(&cp->args, default_argv);
 			argv_array_push(&cp->args, "--submodule-prefix");
 			argv_array_push(&cp->args, submodule_prefix.buf);
+
+			repo_clear(repo);
+			free(repo);
 			ret = 1;
 		}
-		strbuf_release(&submodule_path);
-		strbuf_release(&submodule_git_dir);
 		strbuf_release(&submodule_prefix);
 		if (ret) {
 			spf->count++;
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 6c2f9b2ba2..42692219a1 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -566,7 +566,12 @@ test_expect_success 'fetching submodule into a broken repository' '
 
 	test_must_fail git -C dst status &&
 	test_must_fail git -C dst diff &&
-	test_must_fail git -C dst fetch --recurse-submodules
+
+	# git-fetch cannot find the git directory of the submodule,
+	# so it will do nothing, successfully, as it cannot distinguish between
+	# this broken submodule and a submodule that was just set active but
+	# not cloned yet
+	git -C dst fetch --recurse-submodules
 '
 
 test_expect_success "fetch new commits when submodule got renamed" '
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched
  2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
                   ` (6 preceding siblings ...)
  2018-10-16 18:13 ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
@ 2018-10-16 18:13 ` Stefan Beller
  2018-10-18  0:39   ` Jonathan Tan
  2018-10-16 18:13 ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2018-10-16 18:13 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C <submodule-dir>" (with some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

This works surprisingly well in some workflows (such as using submodules
as a third party library), while not so well in other scenarios, such
as in a Gerrit topic-based workflow, that can tie together changes
(potentially across repositories) on the server side. One of the parts
of such a Gerrit workflow is to download a change when wanting to examine
it, and you'd want to have its submodule changes that are in the same
topic downloaded as well. However these submodule changes reside in their
own repository in their own ref (refs/changes/<int>).

Retry fetching a submodule by object name if the object id that the
superproject points to, cannot be found.

This retrying does not happen when the "git fetch" done at the
superproject is not storing the fetched results in remote
tracking branches (i.e. instead just recording them to
FETCH_HEAD) in this step. A later patch will fix this.

builtin/fetch used to only inspect submodules when they were fetched
"on-demand", as in either on/off case it was clear whether the submodule
needs to be fetched. However to know whether we need to try fetching the
object ids, we need to identify the object names, which is done in this
function check_for_new_submodule_commits(), so we'll also run that code
in case the submodule recursion is set to "on".

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/fetch.c             |   9 +-
 submodule.c                 | 185 ++++++++++++++++++++++++++++++------
 t/t5526-fetch-submodules.sh |  16 ++++
 3 files changed, 177 insertions(+), 33 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d213..95c44bf6ff 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -700,8 +700,7 @@ static int update_local_ref(struct ref *ref,
 			what = _("[new ref]");
 		}
 
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref(msg, ref, 0);
 		format_display(display, r ? '!' : '*', what,
@@ -716,8 +715,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("fast-forward", ref, 1);
 		format_display(display, r ? '!' : ' ', quickref.buf,
@@ -731,8 +729,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("forced-update", ref, 1);
 		format_display(display, r ? '!' : '+', quickref.buf,
diff --git a/submodule.c b/submodule.c
index 30c06507e3..7246b776f3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1141,8 +1141,12 @@ struct submodule_parallel_fetch {
 	int result;
 
 	struct string_list changed_submodule_names;
+	struct get_next_submodule_task **retry;
+	int retry_nr, retry_alloc;
 };
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
+		  STRING_LIST_INIT_DUP, \
+		  NULL, 0, 0}
 
 static void calculate_changed_submodule_paths(
 	struct submodule_parallel_fetch *spf)
@@ -1247,6 +1251,56 @@ static int get_fetch_recurse_config(const struct submodule *submodule,
 	return spf->default_option;
 }
 
+struct get_next_submodule_task {
+	struct repository *repo;
+	const struct submodule *sub;
+	unsigned free_sub : 1; /* Do we need to free the submodule? */
+	struct oid_array *commits;
+};
+
+static const struct submodule *get_default_submodule(const char *path)
+{
+	struct submodule *ret = NULL;
+	const char *name = default_name_or_path(path);
+
+	if (!name)
+		return NULL;
+
+	ret = xmalloc(sizeof(*ret));
+	memset(ret, 0, sizeof(*ret));
+	ret->path = name;
+	ret->name = name;
+
+	return (const struct submodule *) ret;
+}
+
+static struct get_next_submodule_task *get_next_submodule_task_create(
+	struct repository *r, const char *path)
+{
+	struct get_next_submodule_task *task = xmalloc(sizeof(*task));
+	memset(task, 0, sizeof(*task));
+
+	task->sub = submodule_from_path(r, &null_oid, path);
+	if (!task->sub) {
+		task->sub = get_default_submodule(path);
+		task->free_sub = 1;
+	}
+
+	return task;
+}
+
+static void get_next_submodule_task_release(struct get_next_submodule_task *p)
+{
+	if (p->free_sub)
+		free((void*)p->sub);
+	p->free_sub = 0;
+	p->sub = NULL;
+
+	if (p->repo)
+		repo_clear(p->repo);
+	FREE_AND_NULL(p->repo);
+}
+
 static struct repository *get_submodule_repo_for(struct repository *r,
 						 const struct submodule *sub)
 {
@@ -1273,39 +1327,35 @@ static struct repository *get_submodule_repo_for(struct repository *r,
 static int get_next_submodule(struct child_process *cp,
 			      struct strbuf *err, void *data, void **task_cb)
 {
-	int ret = 0;
 	struct submodule_parallel_fetch *spf = data;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		struct strbuf submodule_prefix = STRBUF_INIT;
+		int recurse_config;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
 		const char *default_argv;
-		const struct submodule *submodule;
-		struct repository *repo;
-		struct submodule default_submodule = SUBMODULE_INIT;
+		struct get_next_submodule_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		submodule = submodule_from_path(spf->r, &null_oid, ce->name);
-		if (!submodule) {
-			const char *name = default_name_or_path(ce->name);
-			if (name) {
-				default_submodule.path = name;
-				default_submodule.name = name;
-				submodule = &default_submodule;
-			}
+		task = get_next_submodule_task_create(spf->r, ce->name);
+
+		if (!task->sub) {
+			free(task);
+			continue;
 		}
 
-		switch (get_fetch_recurse_config(submodule, spf))
+		recurse_config = get_fetch_recurse_config(task->sub, spf);
+
+		switch (recurse_config)
 		{
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!submodule ||
+			if (!task->sub ||
 			    !string_list_lookup(
 					&spf->changed_submodule_names,
-					submodule->name))
+					task->sub->name))
 				continue;
 			default_argv = "on-demand";
 			break;
@@ -1316,12 +1366,12 @@ static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
-		repo = get_submodule_repo_for(spf->r, submodule);
-		if (repo) {
+		task->repo = get_submodule_repo_for(spf->r, task->sub);
+		if (task->repo) {
+			struct strbuf submodule_prefix = STRBUF_INIT;
 			child_process_init(cp);
 			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
-			cp->dir = xstrdup(repo->gitdir);
+			cp->dir = task->repo->gitdir;
 			cp->git_cmd = 1;
 			if (!spf->quiet)
 				strbuf_addf(err, "Fetching submodule %s%s\n",
@@ -1330,18 +1380,51 @@ static int get_next_submodule(struct child_process *cp,
 			argv_array_pushv(&cp->args, spf->args.argv);
 			argv_array_push(&cp->args, default_argv);
 			argv_array_push(&cp->args, "--submodule-prefix");
+
+			strbuf_addf(&submodule_prefix, "%s%s/",
+						       spf->prefix,
+						       task->sub->path);
 			argv_array_push(&cp->args, submodule_prefix.buf);
 
-			repo_clear(repo);
-			free(repo);
-			ret = 1;
-		}
-		strbuf_release(&submodule_prefix);
-		if (ret) {
 			spf->count++;
+			*task_cb = task;
+
+			strbuf_release(&submodule_prefix);
 			return 1;
+		} else {
+			get_next_submodule_task_release(task);
+			free(task);
 		}
 	}
+
+	if (spf->retry_nr) {
+		struct get_next_submodule_task *task = spf->retry[spf->retry_nr - 1];
+		struct strbuf submodule_prefix = STRBUF_INIT;
+		spf->retry_nr--;
+
+		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, task->sub->path);
+
+		child_process_init(cp);
+		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+		cp->git_cmd = 1;
+		cp->dir = task->repo->gitdir;
+
+		argv_array_init(&cp->args);
+		argv_array_pushv(&cp->args, spf->args.argv);
+		argv_array_push(&cp->args, "on-demand");
+		argv_array_push(&cp->args, "--submodule-prefix");
+		argv_array_push(&cp->args, submodule_prefix.buf);
+
+		/* NEEDSWORK: have get_default_remote from s--h */
+		argv_array_push(&cp->args, "origin");
+		oid_array_for_each_unique(task->commits,
+					  append_oid_to_argv, &cp->args);
+
+		*task_cb = task;
+		strbuf_release(&submodule_prefix);
+		return 1;
+	}
+
 	return 0;
 }
 
@@ -1349,20 +1432,68 @@ static int fetch_start_failure(struct strbuf *err,
 			       void *cb, void *task_cb)
 {
 	struct submodule_parallel_fetch *spf = cb;
+	struct get_next_submodule_task *task = task_cb;
 
 	spf->result = 1;
 
+	get_next_submodule_task_release(task);
 	return 0;
 }
 
+static int commit_exists_in_sub(const struct object_id *oid, void *data)
+{
+	struct repository *subrepo = data;
+
+	enum object_type type = oid_object_info(subrepo, oid, NULL);
+
+	return type != OBJ_COMMIT;
+}
+
 static int fetch_finish(int retvalue, struct strbuf *err,
 			void *cb, void *task_cb)
 {
 	struct submodule_parallel_fetch *spf = cb;
+	struct get_next_submodule_task *task = task_cb;
+	const struct submodule *sub;
+
+	struct string_list_item *it;
+	struct oid_array *commits;
 
 	if (retvalue)
 		spf->result = 1;
 
+	if (!task)
+		return 0;
+
+	sub = task->sub;
+	if (!sub)
+		goto out;
+
+	it = string_list_lookup(&spf->changed_submodule_names, sub->name);
+	if (!it)
+		goto out;
+
+	commits = it->util;
+	oid_array_filter(commits,
+			 commit_exists_in_sub,
+			 task->repo);
+
+	/* Are there commits that do not exist? */
+	if (commits->nr) {
+		/* We already tried fetching them, do not try again. */
+		if (task->commits)
+			return 0;
+
+		task->commits = commits;
+		ALLOC_GROW(spf->retry, spf->retry_nr + 1, spf->retry_alloc);
+		spf->retry[spf->retry_nr] = task;
+		spf->retry_nr++;
+		return 0;
+	}
+
+out:
+	get_next_submodule_task_release(task);
+
 	return 0;
 }
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 42692219a1..af12c50e7d 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -605,4 +605,20 @@ test_expect_success "fetch new commits when submodule got renamed" '
 	test_cmp expect actual
 '
 
+test_expect_success "fetch new commits on-demand when they are not reachable" '
+	git checkout --detach &&
+	C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
+	git -C submodule update-ref refs/changes/1 $C &&
+	git update-index --cacheinfo 160000 $C submodule &&
+	git commit -m "updated submodule outside of refs/heads" &&
+	D=$(git rev-parse HEAD) &&
+	git update-ref refs/changes/2 $D &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
+		git -C submodule cat-file -t $C &&
+		git checkout --recurse-submodules FETCH_HEAD
+	)
+'
+
 test_done
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches
  2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
                   ` (7 preceding siblings ...)
  2018-10-16 18:13 ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
@ 2018-10-16 18:13 ` Stefan Beller
  2018-10-18  0:47   ` Jonathan Tan
  2018-10-18  2:30 ` [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Junio C Hamano
  2018-10-18  7:30 ` Junio C Hamano
  10 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2018-10-16 18:13 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

Gerrit, the code review tool, has a different workflow than our mailing
list based approach. Usually users upload changes to a Gerrit server and
continuous integration and testing happens by bots. Sometimes however a
user wants to checkout a change locally and look at it locally. For this
use case, Gerrit offers a command line snippet to copy and paste to your
terminal, which looks like

  git fetch https://<host>/gerrit refs/changes/<id> &&
  git checkout FETCH_HEAD

For Gerrit changes that contain changing submodule gitlinks, it would be
easy to extend both the fetch and checkout with the '--recurse-submodules'
flag, such that this command line snippet would produce the state of a
change locally.

However the functionality added in the previous patch, which would
ensure that we fetch the objects in the submodule that the gitlink pointed
at, only works for remote tracking branches so far, not for FETCH_HEAD.

Make sure that fetching a superproject to its FETCH_HEAD, also respects
the existence checks for objects in the submodule recursion.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/fetch.c             | 5 ++++-
 t/t5526-fetch-submodules.sh | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 95c44bf6ff..ea6ecd123e 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -887,11 +887,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 				rc |= update_local_ref(ref, what, rm, &note,
 						       summary_width);
 				free(ref);
-			} else
+			} else {
+				check_for_new_submodule_commits(&rm->old_oid);
 				format_display(&note, '*',
 					       *kind ? kind : "branch", NULL,
 					       *what ? what : "HEAD",
 					       "FETCH_HEAD", summary_width);
+			}
+
 			if (note.len) {
 				if (verbosity >= 0 && !shown_url) {
 					fprintf(stderr, _("From %.*s\n"),
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index af12c50e7d..a509eabb04 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -615,7 +615,7 @@ test_expect_success "fetch new commits on-demand when they are not reachable" '
 	git update-ref refs/changes/2 $D &&
 	(
 		cd downstream &&
-		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
+		git fetch --recurse-submodules origin refs/changes/2 &&
 		git -C submodule cat-file -t $C &&
 		git checkout --recurse-submodules FETCH_HEAD
 	)
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/9] submodule.c: sort changed_submodule_names before searching it
  2018-10-16 18:13 ` [PATCH 3/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
@ 2018-10-17 21:21   ` Jonathan Tan
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Tan @ 2018-10-17 21:21 UTC (permalink / raw)
  To: sbeller; +Cc: gitster, git, Jonathan Tan

> We can string_list_insert() to maintain sorted-ness of the
> list as we find new items, or we can string_list_append() to
> build an unsorted list and sort it at the end just once.

This confused me at first, because neither function is mentioned in the
patch.

> As we do not rely on the sortedness while building the
> list, we pick the "append and sort at the end" as it
> has better worst case execution times.

It took me some time to see that you were rejecting the two solutions
you listed in the first paragraph, and are instead using a third (that
you describe in this paragraph).

The code itself looks fine.

In the future, I think that it's better if this type of patch went into
its own patch set - this seems independent of the concerns of this patch
set, so splitting up keeps patch sets small.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/9] submodule.c: move global changed_submodule_names into fetch submodule struct
  2018-10-16 18:13 ` [PATCH 4/9] submodule.c: move global changed_submodule_names into fetch submodule struct Stefan Beller
@ 2018-10-17 21:26   ` Jonathan Tan
  2018-10-18 19:09     ` Stefan Beller
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Tan @ 2018-10-17 21:26 UTC (permalink / raw)
  To: sbeller; +Cc: gitster, git, Jonathan Tan

> The `changed_submodule_names` are only used for fetching, so let's make it
> part of the struct that is passed around for fetching submodules.

Keep the titles of commit messages to 50 characters or under.

> +static void calculate_changed_submodule_paths(
> +	struct submodule_parallel_fetch *spf)

Instead of taking the entire struct, could this just take the list of
changed submodule names instead?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/9] submodule.c: do not copy around submodule list
  2018-10-16 18:13 ` [PATCH 5/9] submodule.c: do not copy around submodule list Stefan Beller
@ 2018-10-17 21:45   ` Jonathan Tan
  2018-10-18  2:35     ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Tan @ 2018-10-17 21:45 UTC (permalink / raw)
  To: sbeller; +Cc: gitster, git, Jonathan Tan

> By doing so we'll have access to the util pointer for longer that
> contains the commits that we need to fetch, which will be
> useful in a later patch.

It seems that the main point of this patch is so that the OIDs be stored
in changed_submodule_names, because you need them in a later patch. If
so, I would have expected a commit title like "submodule: store OIDs in
changed_submodule_names".

> @@ -1142,8 +1142,7 @@ static void calculate_changed_submodule_paths(
>  	struct submodule_parallel_fetch *spf)
>  {
>  	struct argv_array argv = ARGV_ARRAY_INIT;
> -	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
> -	const struct string_list_item *name;
> +	struct string_list_item *name;

Prior to this patch, changed_submodules is here just so that we know
what to add to changed_submodule_names (it will be cleared at the end of
the function). So removing it is fine.

> -	collect_changed_submodules(&changed_submodules, &argv);
> +	collect_changed_submodules(&spf->changed_submodule_names, &argv);
>  
> -	for_each_string_list_item(name, &changed_submodules) {
> +	for_each_string_list_item(name, &spf->changed_submodule_names) {

We add all the changed submodules directly to changed_submodule_names,
and iterate over it. So we use changed_submodule_names as a
scratchpad...

> -		if (!submodule_has_commits(path, commits))
> -			string_list_append(&spf->changed_submodule_names,
> -					   name->string);
> +		if (submodule_has_commits(path, commits)) {
> +			oid_array_clear(commits);
> +			*name->string = '\0';
> +		}

...but this is fine because previously, we appended the name->string to
changed_submodule_names (with no util) whereas now, we make the
name->string empty in the opposite condition.

Before this patch, at the end of the loop, we had all the wanted
submodule names with no util in changed_submodule_names. With this
patch, at the end of the loop, we have all the wanted submodule names
with util pointing to an OID array, and also some blanks with util
pointing to a cleared OID array.

> -	free_submodules_oids(&changed_submodules);
> +	string_list_remove_empty_items(&spf->changed_submodule_names, 1);

The local variable changed_submodules no longer exists, so removing that
line is fine.

And we remove all the blanks from changed_submodule_names.

> @@ -1377,7 +1378,7 @@ int fetch_populated_submodules(struct repository *r,
>  
>  	argv_array_clear(&spf.args);
>  out:
> -	string_list_clear(&spf.changed_submodule_names, 1);
> +	free_submodules_oids(&spf.changed_submodule_names);

And because changed_submodule_names now has a non-trivial util pointer,
we need to free it properly.

This patch looks good, except that the commit title and message could
perhaps be clearer.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/9] repository: repo_submodule_init to take a submodule struct
  2018-10-16 18:13 ` [PATCH 6/9] repository: repo_submodule_init to take a submodule struct Stefan Beller
@ 2018-10-17 21:55   ` Jonathan Tan
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Tan @ 2018-10-17 21:55 UTC (permalink / raw)
  To: sbeller; +Cc: gitster, git, Jonathan Tan

> When constructing a struct repository for a submodule for some revision
> of the superproject where the submodule is not contained in the index,
> it may not be present in the working tree currently either. In that
> siutation giving a 'path' argument is not useful. Upgrade the
> repo_submodule_init function to take a struct submodule instead.

Are there ways for other code to create a struct submodule without using
submodule_from_path()? If yes, maybe outline them here and say that this
makes repo_submodule_init() more useful, since it can now be used with
those methods.

> While we are at it, overhaul the repo_submodule_init function by renaming
> the submodule repository struct, which is to be initialized, to a name
> that is not confused with the struct submodule as easily.

"Overhaul" is probably not the right word for just a rename of a local
variable. Also mention the other functions in which you did this rename
(or just say "repo_submodule_init() and other functions").

> +/*
> + * Initialize the repository 'subrepo' as the submodule given by the
> + * struct submodule 'sub' in parent repository 'superproject'.
> + * Return 0 upon success and a non-zero value upon failure.
> + */
> +struct submodule;
> +int repo_submodule_init(struct repository *subrepo,
>  			struct repository *superproject,
> -			const char *path);
> +			const struct submodule *sub);

From this description, I would expect "sub" to not be allowed to be
NULL, but from the code I don't think that's the case. Should we
prohibit NULL (and add checks to all repo_submodule_init()'s callers) or
document that a NULL sub is allowed (and what happens in that case)?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree
  2018-10-16 18:13 ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
@ 2018-10-17 22:58   ` Jonathan Tan
  2018-10-23 18:26     ` Stefan Beller
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Tan @ 2018-10-17 22:58 UTC (permalink / raw)
  To: sbeller; +Cc: gitster, git, Jonathan Tan

> This patch started as a refactoring to make 'get_next_submodule' more
> readable, but upon doing so, I realized that "git fetch" of the submodule
> actually doesn't need to be run in the submodules worktree. So let's run
> it in its git dir instead.

The commit message needs to be updated, I think - this patch does
significantly more than fetching in the gitdir.

> This patch leaks the cp->dir in get_next_submodule, as any further
> callback in run_processes_parallel doesn't have access to the child
> process any more.

The cp->dir is already leaked - probably better to write "cp->dir in
get_next_submodule() is still leaked, but this will be fixed in a
subsequent patch".

> +static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
> +{
> +	prepare_submodule_repo_env_no_git_dir(out);
> +	argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);

Why does GIT_DIR need to be set? Is it to avoid subcommands recursively
checking the parent directories in case the CWD is a malformed Git
repository? If yes, maybe it's worth adding a comment.

> +static struct repository *get_submodule_repo_for(struct repository *r,
> +						 const struct submodule *sub)
> +{
> +	struct repository *ret = xmalloc(sizeof(*ret));
> +
> +	if (repo_submodule_init(ret, r, sub)) {
> +		/*
> +		 * No entry in .gitmodules? Technically not a submodule,
> +		 * but historically we supported repositories that happen to be
> +		 * in-place where a gitlink is. Keep supporting them.
> +		 */
> +		struct strbuf gitdir = STRBUF_INIT;
> +		strbuf_repo_worktree_path(&gitdir, r, "%s/.git", sub->path);
> +		if (repo_init(ret, gitdir.buf, NULL)) {
> +			strbuf_release(&gitdir);
> +			return NULL;
> +		}
> +		strbuf_release(&gitdir);
> +	}
> +
> +	return ret;
> +}

This is the significant thing that this patch does more - an unskipped
submodule is now something that either passes the checks in
repo_submodule_init() or the checks in repo_init(), which seems to be
stricter than the current check that ".git" points to a directory or is
one. This means that we skip certain broken repositories, and this
necessitates a change in the test.

I think we should be more particular about what we're allowed to skip -
in particular, maybe if we're planning to skip this submodule, its
corresponding directory in the worktree (if one exists) needs to be
empty.

> -			cp->dir = strbuf_detach(&submodule_path, NULL);
> -			prepare_submodule_repo_env(&cp->env_array);
> +			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
> +			cp->dir = xstrdup(repo->gitdir);

Here is where the functionality change (fetch in ".git") described in
the commit message occurs.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched
  2018-10-16 18:13 ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
@ 2018-10-18  0:39   ` Jonathan Tan
  2018-10-23 22:37     ` Stefan Beller
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Tan @ 2018-10-18  0:39 UTC (permalink / raw)
  To: sbeller; +Cc: gitster, git, Jonathan Tan

> Currently when git-fetch is asked to recurse into submodules, it dispatches
> a plain "git-fetch -C <submodule-dir>" (with some submodule related options
> such as prefix and recusing strategy, but) without any information of the
> remote or the tip that should be fetched.
> 
> This works surprisingly well in some workflows (such as using submodules
> as a third party library), while not so well in other scenarios, such
> as in a Gerrit topic-based workflow, that can tie together changes
> (potentially across repositories) on the server side. One of the parts
> of such a Gerrit workflow is to download a change when wanting to examine
> it, and you'd want to have its submodule changes that are in the same
> topic downloaded as well. However these submodule changes reside in their
> own repository in their own ref (refs/changes/<int>).

It seems that the pertinent situation is when, in the superproject, you
fetch a commit (which may be the target of a ref, or an ancestor of the
target of a ref) that points to a submodule commit that was not fetched
by the default refspec-less fetch that you describe in the first
paragraph. I would just describe it as follows:

  But this default fetch is not sufficient, as a newly fetched commit in
  the superproject could point to a commit in the submodule that is not
  in the default refspec. This is common in workflows like Gerrit's.
  When fetching a Gerrit change under review (from refs/changes/??), the
  commits in that change likely point to submodule commits that have not
  been merged to a branch yet.

Another thing you need to clarify is what happens if the fetch-by-commit
fails. Right now, it seems that it will make the whole thing fail, which
might be a surprising change in behavior.

> Retry fetching a submodule by object name if the object id that the
> superproject points to, cannot be found.

I don't really think of this as a retry - the first time, you're
fetching the default refspec, and the second time, with a specific list
of object IDs. Also, be consistent in the usage of "object name" and
"object id" - as far as I know, they are the same thing.

> This retrying does not happen when the "git fetch" done at the
> superproject is not storing the fetched results in remote
> tracking branches (i.e. instead just recording them to
> FETCH_HEAD) in this step. A later patch will fix this.

The test stores the result in a normal branch, not a remote tracking
branch. Is storing in a normal branch required?

Also, do you know why this is required? A naive reading of the patch
leads me to believe that this should work even if merely fetching to
FETCH_HEAD.

> +struct get_next_submodule_task {
> +	struct repository *repo;
> +	const struct submodule *sub;
> +	unsigned free_sub : 1; /* Do we need to free the submodule? */
> +	struct oid_array *commits;
> +};

Firstly, I don't think "commits" needs to be a pointer.

Document at least "commits". As far as I can tell, if NULL (or empty if
you take my suggestion), this means that this task is the first pass for
this particular submodule and the fetch needs to be done using the
default refspec. Otherwise, this task is the second pass for this
particular submodule and the fetch needs to be done passing the
contained OIDs.

> +static const struct submodule *get_default_submodule(const char *path)
> +{
> +	struct submodule *ret = NULL;
> +	const char *name = default_name_or_path(path);
> +
> +	if (!name)
> +		return NULL;
> +
> +	ret = xmalloc(sizeof(*ret));
> +	memset(ret, 0, sizeof(*ret));
> +	ret->path = name;
> +	ret->name = name;
> +
> +	return (const struct submodule *) ret;
> +}

What is a "default" submodule and why would you need one?

> +		task = get_next_submodule_task_create(spf->r, ce->name);
> +
> +		if (!task->sub) {
> +			free(task);
> +			continue;
>  		}

Will task->sub ever be NULL?

> +	if (spf->retry_nr) {
> +		struct get_next_submodule_task *task = spf->retry[spf->retry_nr - 1];
> +		struct strbuf submodule_prefix = STRBUF_INIT;
> +		spf->retry_nr--;
> +
> +		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, task->sub->path);
> +
> +		child_process_init(cp);
> +		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
> +		cp->git_cmd = 1;
> +		cp->dir = task->repo->gitdir;
> +
> +		argv_array_init(&cp->args);
> +		argv_array_pushv(&cp->args, spf->args.argv);
> +		argv_array_push(&cp->args, "on-demand");

This means that we need to trust that the last entry in spf->args can
take an "on-demand" parameter. Could we supply that argument here
explicitly instead?

> +		argv_array_push(&cp->args, "--submodule-prefix");
> +		argv_array_push(&cp->args, submodule_prefix.buf);
> +
> +		/* NEEDSWORK: have get_default_remote from s--h */

What is s--h?

> +static int commit_exists_in_sub(const struct object_id *oid, void *data)
> +{
> +	struct repository *subrepo = data;
> +
> +	enum object_type type = oid_object_info(subrepo, oid, NULL);
> +
> +	return type != OBJ_COMMIT;
> +}

Shouldn't the function name be commit_missing_in_sub?

>  static int fetch_finish(int retvalue, struct strbuf *err,
>  			void *cb, void *task_cb)
>  {
>  	struct submodule_parallel_fetch *spf = cb;
> +	struct get_next_submodule_task *task = task_cb;
> +	const struct submodule *sub;
> +
> +	struct string_list_item *it;
> +	struct oid_array *commits;
>  
>  	if (retvalue)
>  		spf->result = 1;
>  
> +	if (!task)
> +		return 0;

When will task be NULL?

> +
> +	sub = task->sub;
> +	if (!sub)
> +		goto out;

Same as above - when will sub be NULL?

> +	it = string_list_lookup(&spf->changed_submodule_names, sub->name);
> +	if (!it)
> +		goto out;

And "it" as well.

> +	commits = it->util;
> +	oid_array_filter(commits,
> +			 commit_exists_in_sub,
> +			 task->repo);
> +
> +	/* Are there commits that do not exist? */
> +	if (commits->nr) {
> +		/* We already tried fetching them, do not try again. */
> +		if (task->commits)
> +			return 0;

Clearer and more efficient if the check for task->commits was first
before all the filtering.

> +test_expect_success "fetch new commits on-demand when they are not reachable" '

"not reachable" confused me - they are indeed reachable, just not from
the default refspec.

> +	git checkout --detach &&
> +	C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
> +	git -C submodule update-ref refs/changes/1 $C &&
> +	git update-index --cacheinfo 160000 $C submodule &&
> +	git commit -m "updated submodule outside of refs/heads" &&
> +	D=$(git rev-parse HEAD) &&
> +	git update-ref refs/changes/2 $D &&
> +	(
> +		cd downstream &&
> +		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
> +		git -C submodule cat-file -t $C &&
> +		git checkout --recurse-submodules FETCH_HEAD
> +	)
> +'

For maximum test coverage, I think this test should involve 2
submodules:

 B   C  E   F
  \ /    \ /
   A      D

and the upstream superproject having:

  G -> H -> I

The downstream superproject already has G, and is fetching I. In H, the
submodule gitlinks point to B and E respectively, and in I, the
submodule gitlinks point to C and F respectively. This ensures that both
multiple submodules work, and that submodule commits are also fetched
for interior superproject commits.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches
  2018-10-16 18:13 ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
@ 2018-10-18  0:47   ` Jonathan Tan
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Tan @ 2018-10-18  0:47 UTC (permalink / raw)
  To: sbeller; +Cc: gitster, git, Jonathan Tan

> @@ -887,11 +887,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
>  				rc |= update_local_ref(ref, what, rm, &note,
>  						       summary_width);
>  				free(ref);
> -			} else
> +			} else {
> +				check_for_new_submodule_commits(&rm->old_oid);

Does this need to be guarded with a recurse_submodules check, just like
in update_local_ref()?

Also, this warrants a comment - this is here because there is some code
later that requires the new submodule commits to be registered, and the
other branch does not require it only because update_local_ref() calls
it.

> @@ -615,7 +615,7 @@ test_expect_success "fetch new commits on-demand when they are not reachable" '
>  	git update-ref refs/changes/2 $D &&
>  	(
>  		cd downstream &&
> -		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
> +		git fetch --recurse-submodules origin refs/changes/2 &&
>  		git -C submodule cat-file -t $C &&
>  		git checkout --recurse-submodules FETCH_HEAD
>  	)

I think there should be a new test - we can tell from the code that just
because fetching to FETCH_HEAD works doesn't mean that fetching to a ref
works, and vice versa.

Also, can you make the test fetch 2 refs? So that we know that it works
with more than one.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip
  2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
                   ` (8 preceding siblings ...)
  2018-10-16 18:13 ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
@ 2018-10-18  2:30 ` Junio C Hamano
  2018-10-18  7:30 ` Junio C Hamano
  10 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2018-10-18  2:30 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> This is based on ao/submodule-wo-gitmodules-checked-out.

Thanks.  I had an impression that we were not entirely happy with
the topic and are expecting a reroll, but let's hope that the part
we expect to be updated won't have much overlaps.

> A range-diff below shows how picking a different base changed the patches,
> apart from that no further adjustments have been made.

Thanks.  Let's see how well this plays together with other topics in 'pu'.q

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/9] submodule.c: do not copy around submodule list
  2018-10-17 21:45   ` Jonathan Tan
@ 2018-10-18  2:35     ` Junio C Hamano
  0 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2018-10-18  2:35 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: sbeller, git

Jonathan Tan <jonathantanmy@google.com> writes:

>> By doing so we'll have access to the util pointer for longer that
>> contains the commits that we need to fetch, which will be
>> useful in a later patch.
>
> It seems that the main point of this patch is so that the OIDs be stored
> in changed_submodule_names, because you need them in a later patch. If
> so, I would have expected a commit title like "submodule: store OIDs in
> changed_submodule_names".
>
> ...
> This patch looks good, except that the commit title and message could
> perhaps be clearer.

Thanks for a thoughtful review; not just this step, but for your review
comments on all the other steps in the series are very helpful.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip
  2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
                   ` (9 preceding siblings ...)
  2018-10-18  2:30 ` [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Junio C Hamano
@ 2018-10-18  7:30 ` Junio C Hamano
  2018-10-18 18:00   ` Stefan Beller
  10 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2018-10-18  7:30 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> This is based on ao/submodule-wo-gitmodules-checked-out.
>
> This resends origin/sb/submodule-recursive-fetch-gets-the-tip, resolving
> the issues pointed out via origin/xxx/sb-submodule-recursive-fetch-gets-the-tip-in-pu
> by basing this series on origin/ao/submodule-wo-gitmodules-checked-out

Applying this round to the result of merging ao/submodule-* to
'master' requires this to work, it seems, as you've introduced a
call to repo-init thing in the meantime with another topic.

Subject: [PATCH] fixup! repository: repo_submodule_init to take a submodule struct

---
 builtin/submodule--helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 5f8a804a6e..015aa1471f 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2037,7 +2037,7 @@ static int ensure_core_worktree(int argc, const char **argv, const char *prefix)
 	if (!sub)
 		BUG("We could get the submodule handle before?");
 
-	if (repo_submodule_init(&subrepo, the_repository, path))
+	if (repo_submodule_init(&subrepo, the_repository, sub))
 		die(_("could not get a repository handle for submodule '%s'"), path);
 
 	if (!repo_config_get_string(&subrepo, "core.worktree", &cw)) {
-- 
2.19.1-450-ga4b8ab5363


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip
  2018-10-18  7:30 ` Junio C Hamano
@ 2018-10-18 18:00   ` Stefan Beller
  0 siblings, 0 replies; 29+ messages in thread
From: Stefan Beller @ 2018-10-18 18:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Oct 18, 2018 at 12:30 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Stefan Beller <sbeller@google.com> writes:
>
> > This is based on ao/submodule-wo-gitmodules-checked-out.
> >
> > This resends origin/sb/submodule-recursive-fetch-gets-the-tip, resolving
> > the issues pointed out via origin/xxx/sb-submodule-recursive-fetch-gets-the-tip-in-pu
> > by basing this series on origin/ao/submodule-wo-gitmodules-checked-out
>
> Applying this round to the result of merging ao/submodule-* to
> 'master' requires this to work, it seems, as you've introduced a
> call to repo-init thing in the meantime with another topic.

Unfortunately, yes, sb/submodule-update-in-c had one such call.

> > This is based on ao/submodule-wo-gitmodules-checked-out.
> Thanks.  I had an impression that we were not entirely happy with
> the topic and are expecting a reroll, but let's hope that the part
> we expect to be updated won't have much overlaps.

I slowly came to the realization that this topic might be ripped up
into 2 or more topics, as some of the cleanups seem to be orthogonal.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/9] submodule.c: move global changed_submodule_names into fetch submodule struct
  2018-10-17 21:26   ` Jonathan Tan
@ 2018-10-18 19:09     ` Stefan Beller
  0 siblings, 0 replies; 29+ messages in thread
From: Stefan Beller @ 2018-10-18 19:09 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Junio C Hamano, git

On Wed, Oct 17, 2018 at 2:26 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > The `changed_submodule_names` are only used for fetching, so let's make it
> > part of the struct that is passed around for fetching submodules.
>
> Keep the titles of commit messages to 50 characters or under.

renamed

>
> > +static void calculate_changed_submodule_paths(
> > +     struct submodule_parallel_fetch *spf)
>
> Instead of taking the entire struct, could this just take the list of
> changed submodule names instead?

I think so, done.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree
  2018-10-17 22:58   ` Jonathan Tan
@ 2018-10-23 18:26     ` Stefan Beller
  2018-10-23 22:55       ` Jonathan Tan
  0 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2018-10-23 18:26 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Junio C Hamano, git

On Wed, Oct 17, 2018 at 3:58 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > This patch started as a refactoring to make 'get_next_submodule' more
> > readable, but upon doing so, I realized that "git fetch" of the submodule
> > actually doesn't need to be run in the submodules worktree. So let's run
> > it in its git dir instead.
>
> The commit message needs to be updated, I think - this patch does
> significantly more than fetching in the gitdir.

From my point of view, it is not significant, but refactoring.
I'll think how to write a better commit message.

> > This patch leaks the cp->dir in get_next_submodule, as any further
> > callback in run_processes_parallel doesn't have access to the child
> > process any more.
>
> The cp->dir is already leaked - probably better to write "cp->dir in
> get_next_submodule() is still leaked, but this will be fixed in a
> subsequent patch".

... which fails to mention the reason why (as it is hard to do given
the current design) but is more concise.

> > +static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
> > +{
> > +     prepare_submodule_repo_env_no_git_dir(out);
> > +     argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
>
> Why does GIT_DIR need to be set? Is it to avoid subcommands recursively
> checking the parent directories in case the CWD is a malformed Git
> repository? If yes, maybe it's worth adding a comment.

It is copying the structure from prepare_submodule_repo_env,
specifically 10f5c52656 (submodule: avoid auto-discovery in
prepare_submodule_repo_env(), 2016-09-01), which sounds
appealing (and brings real benefits for the working directory),
but I have not thought about this protection for the git dir.

Maybe another approach is to not set the cwd for the child process
and instead point GIT_DIR_ENVIRONMENT only to the right
directory.

Then the use of GIT_DIR_ENVIRONMENT is obvious and
is not just for protection of corner cases.

However I think this protection is really valuable for the
.git dir as well as the submodule may be broken and we do not
want to end up in an infinite loop (as the discovery would find
the superproject which then tries to recurse, again, into the
submodule with the broken git dir)

When adding the comment here, we'd also want to have
the comment in prepare_submodule_repo_env, which
could be its own preparation commit.

> > +static struct repository *get_submodule_repo_for(struct repository *r,
> > +                                              const struct submodule *sub)
> > +{
> > +     struct repository *ret = xmalloc(sizeof(*ret));
> > +
> > +     if (repo_submodule_init(ret, r, sub)) {
> > +             /*
> > +              * No entry in .gitmodules? Technically not a submodule,
> > +              * but historically we supported repositories that happen to be
> > +              * in-place where a gitlink is. Keep supporting them.
> > +              */
> > +             struct strbuf gitdir = STRBUF_INIT;
> > +             strbuf_repo_worktree_path(&gitdir, r, "%s/.git", sub->path);
> > +             if (repo_init(ret, gitdir.buf, NULL)) {
> > +                     strbuf_release(&gitdir);
> > +                     return NULL;
> > +             }
> > +             strbuf_release(&gitdir);
> > +     }
> > +
> > +     return ret;
> > +}
>
> This is the significant thing that this patch does more - an unskipped
> submodule is now something that either passes the checks in
> repo_submodule_init() or the checks in repo_init(), which seems to be
> stricter than the current check that ".git" points to a directory or is
> one. This means that we skip certain broken repositories, and this
> necessitates a change in the test.

I see. However there is no change in function, the check in repo_init
(or repo_submodule_init) is less strict than the check in the child process.
So if there are broken submodule repositories, the difference of this
patch is the layer at which it is caught, i.e. we would not spawn a child
that fails, but skip the submodule.

Thinking of that, maybe we need to announce that in get_next_submodule

>
> I think we should be more particular about what we're allowed to skip -
> in particular, maybe if we're planning to skip this submodule, its
> corresponding directory in the worktree (if one exists) needs to be
> empty.

If the working tree directory is empty for that submodule, it means
it is likely not initialized. But why would we use that as a signal to
skip the submodule?



> > -                     cp->dir = strbuf_detach(&submodule_path, NULL);
> > -                     prepare_submodule_repo_env(&cp->env_array);
> > +                     prepare_submodule_repo_env_in_gitdir(&cp->env_array);
> > +                     cp->dir = xstrdup(repo->gitdir);
>
> Here is where the functionality change (fetch in ".git") described in
> the commit message occurs.

True.

Thanks for the review, I'll try to split up this commit a bit more and
explain each part on its own.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched
  2018-10-18  0:39   ` Jonathan Tan
@ 2018-10-23 22:37     ` Stefan Beller
  2018-10-23 23:37       ` Jonathan Tan
  0 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2018-10-23 22:37 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Junio C Hamano, git

On Wed, Oct 17, 2018 at 5:40 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > Currently when git-fetch is asked to recurse into submodules, it dispatches
> > a plain "git-fetch -C <submodule-dir>" (with some submodule related options
> > such as prefix and recusing strategy, but) without any information of the
> > remote or the tip that should be fetched.
> >
> > This works surprisingly well in some workflows (such as using submodules
> > as a third party library), while not so well in other scenarios, such
> > as in a Gerrit topic-based workflow, that can tie together changes
> > (potentially across repositories) on the server side. One of the parts
> > of such a Gerrit workflow is to download a change when wanting to examine
> > it, and you'd want to have its submodule changes that are in the same
> > topic downloaded as well. However these submodule changes reside in their
> > own repository in their own ref (refs/changes/<int>).
>
> It seems that the pertinent situation is when, in the superproject, you
> fetch a commit (which may be the target of a ref, or an ancestor of the
> target of a ref) that points to a submodule commit that was not fetched
> by the default refspec-less fetch that you describe in the first
> paragraph. I would just describe it as follows:
>
>   But this default fetch is not sufficient, as a newly fetched commit in
>   the superproject could point to a commit in the submodule that is not
>   in the default refspec. This is common in workflows like Gerrit's.
>   When fetching a Gerrit change under review (from refs/changes/??), the
>   commits in that change likely point to submodule commits that have not
>   been merged to a branch yet.

Makes sense.

> Another thing you need to clarify is what happens if the fetch-by-commit
> fails. Right now, it seems that it will make the whole thing fail, which
> might be a surprising change in behavior.

But a positive surprise, I would assume?

> The test stores the result in a normal branch, not a remote tracking
> branch. Is storing in a normal branch required?

In the test we fetch from another repository, such that in the
repository-under-test this will show up in a remote tracking branch?

> Also, do you know why this is required? A naive reading of the patch
> leads me to believe that this should work even if merely fetching to
> FETCH_HEAD.

See the next patch, check_for_new_submodule_commits() is missing
for FETCH_HEAD.

>
> > +struct get_next_submodule_task {
> > +     struct repository *repo;
> > +     const struct submodule *sub;
> > +     unsigned free_sub : 1; /* Do we need to free the submodule? */
> > +     struct oid_array *commits;
> > +};
>
> Firstly, I don't think "commits" needs to be a pointer.
>
> Document at least "commits". As far as I can tell, if NULL (or empty if
> you take my suggestion), this means that this task is the first pass for
> this particular submodule and the fetch needs to be done using the
> default refspec. Otherwise, this task is the second pass for this
> particular submodule and the fetch needs to be done passing the
> contained OIDs.

Makes sense. I think I'll make it a non-pointer, but will introduce another
flag or counter for the phase.

>
> > +static const struct submodule *get_default_submodule(const char *path)
> > +{
> > +     struct submodule *ret = NULL;
> > +     const char *name = default_name_or_path(path);
> > +
> > +     if (!name)
> > +             return NULL;
> > +
> > +     ret = xmalloc(sizeof(*ret));
> > +     memset(ret, 0, sizeof(*ret));
> > +     ret->path = name;
> > +     ret->name = name;
> > +
> > +     return (const struct submodule *) ret;
> > +}
>
> What is a "default" submodule and why would you need one?

s/default/artificial/. Such a submodule is a submodule that has no
config in the .gitmodules file and its name == path.
We need to keep those around for historic reasons AFAICT, c.f.
c68f837576 (implement fetching of moved submodules, 2017-10-16)

> > +             task = get_next_submodule_task_create(spf->r, ce->name);
> > +
> > +             if (!task->sub) {
> > +                     free(task);
> > +                     continue;
> >               }
>
> Will task->sub ever be NULL?

Yes, if we fall back to these "default" submodule and just try if it
can be handled
as a submodule, but it cannot be handled as such,
get_next_submodule_task_create has

    task->sub = submodule_from_path(r, &null_oid, path);
    if (!task->sub) {
        task->sub = get_default_submodule(path);

and get_default_submodule can return NULL.


>
> > +     if (spf->retry_nr) {
> > +             struct get_next_submodule_task *task = spf->retry[spf->retry_nr - 1];
> > +             struct strbuf submodule_prefix = STRBUF_INIT;
> > +             spf->retry_nr--;
> > +
> > +             strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, task->sub->path);
> > +
> > +             child_process_init(cp);
> > +             prepare_submodule_repo_env_in_gitdir(&cp->env_array);
> > +             cp->git_cmd = 1;
> > +             cp->dir = task->repo->gitdir;
> > +
> > +             argv_array_init(&cp->args);
> > +             argv_array_pushv(&cp->args, spf->args.argv);
> > +             argv_array_push(&cp->args, "on-demand");
>
> This means that we need to trust that the last entry in spf->args can
> take an "on-demand" parameter. Could we supply that argument here
> explicitly instead?
>
> > +             argv_array_push(&cp->args, "--submodule-prefix");
> > +             argv_array_push(&cp->args, submodule_prefix.buf);
> > +
> > +             /* NEEDSWORK: have get_default_remote from s--h */
>
> What is s--h?

builtin/submodule--helper, will clarify

>
> > +static int commit_exists_in_sub(const struct object_id *oid, void *data)
> > +{
> > +     struct repository *subrepo = data;
> > +
> > +     enum object_type type = oid_object_info(subrepo, oid, NULL);
> > +
> > +     return type != OBJ_COMMIT;
> > +}
>
> Shouldn't the function name be commit_missing_in_sub?

yes.

>
> >  static int fetch_finish(int retvalue, struct strbuf *err,
> >                       void *cb, void *task_cb)
> >  {
> >       struct submodule_parallel_fetch *spf = cb;
> > +     struct get_next_submodule_task *task = task_cb;
> > +     const struct submodule *sub;
> > +
> > +     struct string_list_item *it;
> > +     struct oid_array *commits;
> >
> >       if (retvalue)
> >               spf->result = 1;
> >
> > +     if (!task)
> > +             return 0;
>
> When will task be NULL?
>
> > +
> > +     sub = task->sub;
> > +     if (!sub)
> > +             goto out;
>
> Same as above - when will sub be NULL?
>
> > +     it = string_list_lookup(&spf->changed_submodule_names, sub->name);
> > +     if (!it)
> > +             goto out;
>
> And "it" as well.

I'll add comments.

>
> > +     commits = it->util;
> > +     oid_array_filter(commits,
> > +                      commit_exists_in_sub,
> > +                      task->repo);
> > +
> > +     /* Are there commits that do not exist? */
> > +     if (commits->nr) {
> > +             /* We already tried fetching them, do not try again. */
> > +             if (task->commits)
> > +                     return 0;
>
> Clearer and more efficient if the check for task->commits was first
> before all the filtering.
>
> > +test_expect_success "fetch new commits on-demand when they are not reachable" '
>
> "not reachable" confused me - they are indeed reachable, just not from
> the default refspec.

Makes sense

>
> > +     git checkout --detach &&
> > +     C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
> > +     git -C submodule update-ref refs/changes/1 $C &&
> > +     git update-index --cacheinfo 160000 $C submodule &&
> > +     git commit -m "updated submodule outside of refs/heads" &&
> > +     D=$(git rev-parse HEAD) &&
> > +     git update-ref refs/changes/2 $D &&
> > +     (
> > +             cd downstream &&
> > +             git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
> > +             git -C submodule cat-file -t $C &&
> > +             git checkout --recurse-submodules FETCH_HEAD
> > +     )
> > +'
>
> For maximum test coverage, I think this test should involve 2
> submodules:
>
>  B   C  E   F
>   \ /    \ /
>    A      D
>
> and the upstream superproject having:
>
>   G -> H -> I
>
> The downstream superproject already has G, and is fetching I. In H, the
> submodule gitlinks point to B and E respectively, and in I, the
> submodule gitlinks point to C and F respectively. This ensures that both
> multiple submodules work, and that submodule commits are also fetched
> for interior superproject commits.

ok.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree
  2018-10-23 18:26     ` Stefan Beller
@ 2018-10-23 22:55       ` Jonathan Tan
  2018-10-23 23:01         ` Stefan Beller
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Tan @ 2018-10-23 22:55 UTC (permalink / raw)
  To: sbeller; +Cc: jonathantanmy, gitster, git

> > Why does GIT_DIR need to be set? Is it to avoid subcommands recursively
> > checking the parent directories in case the CWD is a malformed Git
> > repository? If yes, maybe it's worth adding a comment.
> 
> It is copying the structure from prepare_submodule_repo_env,
> specifically 10f5c52656 (submodule: avoid auto-discovery in
> prepare_submodule_repo_env(), 2016-09-01), which sounds
> appealing (and brings real benefits for the working directory),
> but I have not thought about this protection for the git dir.
> 
> Maybe another approach is to not set the cwd for the child process
> and instead point GIT_DIR_ENVIRONMENT only to the right
> directory.
> 
> Then the use of GIT_DIR_ENVIRONMENT is obvious and
> is not just for protection of corner cases.
> 
> However I think this protection is really valuable for the
> .git dir as well as the submodule may be broken and we do not
> want to end up in an infinite loop (as the discovery would find
> the superproject which then tries to recurse, again, into the
> submodule with the broken git dir)
> 
> When adding the comment here, we'd also want to have
> the comment in prepare_submodule_repo_env, which
> could be its own preparation commit.

I agree with the protection. As for the preparation commit, I don't
think it's always the code author's responsibility to tidy up the
surrounding code, but since you're adding an identical comment here,
it's probably worth it to add the comment there too.

> > This is the significant thing that this patch does more - an unskipped
> > submodule is now something that either passes the checks in
> > repo_submodule_init() or the checks in repo_init(), which seems to be
> > stricter than the current check that ".git" points to a directory or is
> > one. This means that we skip certain broken repositories, and this
> > necessitates a change in the test.
> 
> I see. However there is no change in function, the check in repo_init
> (or repo_submodule_init) is less strict than the check in the child process.
> So if there are broken submodule repositories, the difference of this
> patch is the layer at which it is caught, i.e. we would not spawn a child
> that fails, but skip the submodule.
> 
> Thinking of that, maybe we need to announce that in get_next_submodule

The consequence of getting caught changes, though. Currently,
spf->result is set to 1 whenever a child process fails. But in this
patch, some of these repositories would be entirely skipped, meaning
that no child process is run, and spf->result is never modified.

> > I think we should be more particular about what we're allowed to skip -
> > in particular, maybe if we're planning to skip this submodule, its
> > corresponding directory in the worktree (if one exists) needs to be
> > empty.
> 
> If the working tree directory is empty for that submodule, it means
> it is likely not initialized. But why would we use that as a signal to
> skip the submodule?

What I meant was: if empty, skip it completely. Otherwise, do the
repo_submodule_init() and repo_init() thing, and if they both fail, set
spf->result to 1, preserving existing behavior.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree
  2018-10-23 22:55       ` Jonathan Tan
@ 2018-10-23 23:01         ` Stefan Beller
  0 siblings, 0 replies; 29+ messages in thread
From: Stefan Beller @ 2018-10-23 23:01 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Junio C Hamano, git

On Tue, Oct 23, 2018 at 3:55 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> > When adding the comment here, we'd also want to have
> > the comment in prepare_submodule_repo_env, which
> > could be its own preparation commit.
>
> I agree with the protection. As for the preparation commit, I don't
> think it's always the code author's responsibility to tidy up the
> surrounding code, but since you're adding an identical comment here,
> it's probably worth it to add the comment there too.

Am I the only one who dislikes inconsistent files? ;-)
(ie. clean in one place, not cleaned up in another)
I can see your point. Will add a comment

> > Thinking of that, maybe we need to announce that in get_next_submodule
>
> The consequence of getting caught changes, though. Currently,
> spf->result is set to 1 whenever a child process fails. But in this
> patch, some of these repositories would be entirely skipped, meaning
> that no child process is run, and spf->result is never modified.

Right.

> > If the working tree directory is empty for that submodule, it means
> > it is likely not initialized. But why would we use that as a signal to
> > skip the submodule?
>
> What I meant was: if empty, skip it completely. Otherwise, do the
> repo_submodule_init() and repo_init() thing, and if they both fail, set
> spf->result to 1, preserving existing behavior.

I did it the other way round:

If repo_[submodule_]init fails, see if we have a gitlink in tree and
an empty dir in the FS, to decide if we need to signal failure.

I can switch it around again, but it seemed easier to write as
that puts corner cases away into one else {} case.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched
  2018-10-23 22:37     ` Stefan Beller
@ 2018-10-23 23:37       ` Jonathan Tan
  2018-10-25 21:42         ` Stefan Beller
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Tan @ 2018-10-23 23:37 UTC (permalink / raw)
  To: sbeller; +Cc: jonathantanmy, gitster, git

> > Another thing you need to clarify is what happens if the fetch-by-commit
> > fails. Right now, it seems that it will make the whole thing fail, which
> > might be a surprising change in behavior.
> 
> But a positive surprise, I would assume?

Whether positive or negative, I think that this needs to be mentioned in
the commit message.

As for positive or negative, I tend to agree that it's positive - sure,
some previously successful fetches would now fail, but the results of
those fetches could not be recursively checked out anyway.

> > The test stores the result in a normal branch, not a remote tracking
> > branch. Is storing in a normal branch required?
> 
> In the test we fetch from another repository, such that in the
> repository-under-test this will show up in a remote tracking branch?

If that were true, I would expect that when this line:

> git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&

is replaced by this line:

> git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2 &&

then things would still work. The tests pass with the first line (after
I fixed a type mismatch) but not with the second. (Also I don't think a
remote-tracking branch is generated here - the output printed doesn't
indicate so, and refs/changes/2 is not a branch anyway.)

> > Also, do you know why this is required? A naive reading of the patch
> > leads me to believe that this should work even if merely fetching to
> > FETCH_HEAD.
> 
> See the next patch, check_for_new_submodule_commits() is missing
> for FETCH_HEAD.

I see in the next patch that there is an "if" branch in
store_updated_refs() without update_local_ref() in which
"check_for_new_submodule_commits(&rm->old_oid)" needs to be inserted. I
think this is a symptom that maybe check_for_new_submodule_commits()
needs to be extracted from update_local_ref() and put into
store_updated_refs() instead? In update_local_ref(), it is called on
ref->new_oid, which is actually the same as rm->old_oid anyway (there is
an oidcpy earlier).

> > > +static const struct submodule *get_default_submodule(const char *path)
> > > +{
> > > +     struct submodule *ret = NULL;
> > > +     const char *name = default_name_or_path(path);
> > > +
> > > +     if (!name)
> > > +             return NULL;
> > > +
> > > +     ret = xmalloc(sizeof(*ret));
> > > +     memset(ret, 0, sizeof(*ret));
> > > +     ret->path = name;
> > > +     ret->name = name;
> > > +
> > > +     return (const struct submodule *) ret;
> > > +}
> >
> > What is a "default" submodule and why would you need one?
> 
> s/default/artificial/. Such a submodule is a submodule that has no
> config in the .gitmodules file and its name == path.
> We need to keep those around for historic reasons AFAICT, c.f.
> c68f837576 (implement fetching of moved submodules, 2017-10-16)

Ah, OK. I would call it a fake submodule then, and copy over the "No
entry in .gitmodules?" comment.

> > Will task->sub ever be NULL?
> 
> Yes, if we fall back to these "default" submodule and just try if it
> can be handled
> as a submodule, but it cannot be handled as such,
> get_next_submodule_task_create has
> 
>     task->sub = submodule_from_path(r, &null_oid, path);
>     if (!task->sub) {
>         task->sub = get_default_submodule(path);
> 
> and get_default_submodule can return NULL.

Ah, yes you're right.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched
  2018-10-23 23:37       ` Jonathan Tan
@ 2018-10-25 21:42         ` Stefan Beller
  0 siblings, 0 replies; 29+ messages in thread
From: Stefan Beller @ 2018-10-25 21:42 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Junio C Hamano, git

On Tue, Oct 23, 2018 at 4:38 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > > Another thing you need to clarify is what happens if the fetch-by-commit
> > > fails. Right now, it seems that it will make the whole thing fail, which
> > > might be a surprising change in behavior.
> >
> > But a positive surprise, I would assume?
>
> Whether positive or negative, I think that this needs to be mentioned in
> the commit message.
>
> As for positive or negative, I tend to agree that it's positive - sure,
> some previously successful fetches would now fail, but the results of
> those fetches could not be recursively checked out anyway.
>
> > > The test stores the result in a normal branch, not a remote tracking
> > > branch. Is storing in a normal branch required?
> >
> > In the test we fetch from another repository, such that in the
> > repository-under-test this will show up in a remote tracking branch?

I messed up there. Yes, we need to fetch into a normal branch
such that the logic of check_for_new_submodule_commits triggers
no matter where it is on the remote.

Your experiment below shows that we cannot fetch into FETCH_HEAD:

> If that were true, I would expect that when this line:
>
> > git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
>
> is replaced by this line:
>
> > git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2 &&
>
> then things would still work. The tests pass with the first line (after
> I fixed a type mismatch) but not with the second. (Also I don't think a
> remote-tracking branch is generated here - the output printed doesn't
> indicate so, and refs/changes/2 is not a branch anyway.)

> > > Also, do you know why this is required? A naive reading of the patch
> > > leads me to believe that this should work even if merely fetching to
> > > FETCH_HEAD.
> >
> > See the next patch, check_for_new_submodule_commits() is missing
> > for FETCH_HEAD.
>
> I see in the next patch that there is an "if" branch in
> store_updated_refs() without update_local_ref() in which
> "check_for_new_submodule_commits(&rm->old_oid)" needs to be inserted. I
> think this is a symptom that maybe check_for_new_submodule_commits()
> needs to be extracted from update_local_ref() and put into
> store_updated_refs() instead? In update_local_ref(), it is called on
> ref->new_oid, which is actually the same as rm->old_oid anyway (there is
> an oidcpy earlier).

I'll look into that.

> > > What is a "default" submodule and why would you need one?
> >
> > s/default/artificial/. Such a submodule is a submodule that has no
> > config in the .gitmodules file and its name == path.
> > We need to keep those around for historic reasons AFAICT, c.f.
> > c68f837576 (implement fetching of moved submodules, 2017-10-16)
>
> Ah, OK. I would call it a fake submodule then, and copy over the "No
> entry in .gitmodules?" comment.

"fake submodule" sounds like
http://debuggable.com/posts/git-fake-submodules:4b563ee4-f3cc-4061-967e-0e48cbdd56cb
which is what I think of when hearing fake submodules.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-10-25 21:42 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
2018-10-16 18:13 ` [PATCH 1/9] sha1-array: provide oid_array_filter Stefan Beller
2018-10-16 18:13 ` [PATCH 2/9] submodule.c: fix indentation Stefan Beller
2018-10-16 18:13 ` [PATCH 3/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
2018-10-17 21:21   ` Jonathan Tan
2018-10-16 18:13 ` [PATCH 4/9] submodule.c: move global changed_submodule_names into fetch submodule struct Stefan Beller
2018-10-17 21:26   ` Jonathan Tan
2018-10-18 19:09     ` Stefan Beller
2018-10-16 18:13 ` [PATCH 5/9] submodule.c: do not copy around submodule list Stefan Beller
2018-10-17 21:45   ` Jonathan Tan
2018-10-18  2:35     ` Junio C Hamano
2018-10-16 18:13 ` [PATCH 6/9] repository: repo_submodule_init to take a submodule struct Stefan Beller
2018-10-17 21:55   ` Jonathan Tan
2018-10-16 18:13 ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
2018-10-17 22:58   ` Jonathan Tan
2018-10-23 18:26     ` Stefan Beller
2018-10-23 22:55       ` Jonathan Tan
2018-10-23 23:01         ` Stefan Beller
2018-10-16 18:13 ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
2018-10-18  0:39   ` Jonathan Tan
2018-10-23 22:37     ` Stefan Beller
2018-10-23 23:37       ` Jonathan Tan
2018-10-25 21:42         ` Stefan Beller
2018-10-16 18:13 ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
2018-10-18  0:47   ` Jonathan Tan
2018-10-18  2:30 ` [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Junio C Hamano
2018-10-18  7:30 ` Junio C Hamano
2018-10-18 18:00   ` Stefan Beller
  -- strict thread matches above, loose matches on Subject: below --
2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
2018-09-17 21:35 ` [PATCHv2 " Stefan Beller
2018-09-17 21:35   ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).