git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v4 0/9] fetch: make sure submodule oids are fetched
@ 2018-09-25 19:47 Stefan Beller
  2018-09-25 19:47 ` [PATCH v4 1/9] sha1-array: provide oid_array_filter Stefan Beller
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: Stefan Beller @ 2018-09-25 19:47 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

v4:
* Per Ævars comment, moved the docs for oid_array_filter into
  Documentation/technical/...
* addressed all outstanding comment as noted in "What's cooking" email,
* below is a range-diff against the currently queued version of the
  series.

v3:
* I discovered some issues with v2 after sending,
  which is why I rewrote the later patches completely
  and now we pass around a "task" struct that contains everything to know
  about the things to work on and what needs free()ing afterwards.
* as it is no longer string list based, this drops adding string_list_{pop, last}

v2:
* extended commit messages,
* plugged a memory leak
* rewrote the patch "sha1-array: provide oid_array_filter" to be much more like 
  object_array_fiter
* fixed a typo pointed out by Ramsay.

The range diff is below.
  
Thanks,
Stefan

v1:
Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C <submodule-dir>" (and some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

This works surprisingly well in some workflows, not so well in others,
which this series aims to fix.

The first patches provide new basic functionality and do some refactoring;
the interesting part is in the two last patches.

This was discussed in
https://public-inbox.org/git/20180808221752.195419-1-sbeller@google.com/
and I think I addressed all feedback so far.

Stefan Beller (9):
  sha1-array: provide oid_array_filter
  submodule.c: fix indentation
  submodule.c: sort changed_submodule_names before searching it
  submodule: move global changed_submodule_names into fetch submodule
    struct
  submodule.c: do not copy around submodule list
  repository: repo_submodule_init to take a submodule struct
  submodule: fetch in submodules git directory instead of in worktree
  fetch: retry fetching submodules if needed objects were not fetched
  builtin/fetch: check for submodule updates for non branch fetches

 Documentation/technical/api-oid-array.txt |   5 +
 builtin/fetch.c                           |  14 +-
 builtin/grep.c                            |  17 +-
 builtin/ls-files.c                        |  12 +-
 builtin/submodule--helper.c               |   2 +-
 repository.c                              |  27 +--
 repository.h                              |  11 +-
 sha1-array.c                              |  17 ++
 sha1-array.h                              |   3 +
 submodule.c                               | 275 +++++++++++++++++-----
 t/t5526-fetch-submodules.sh               |  23 +-
 11 files changed, 311 insertions(+), 95 deletions(-)

git range-diff  origin/sb/submodule-recursive-fetch-gets-the-tip...

  1:  6fecf7cd01a <   -:  ----------- string-list: add string_list_{pop, last} functions
  2:  7007a318a68 <   -:  ----------- sha1-array: provide oid_array_filter

    [ ... snip ... I rebased onto: ]

  -:  ----------- > 215:  fe8321ec057 Second batch post 2.19
  -:  ----------- > 216:  a9b49d4cfe9 sha1-array: provide oid_array_filter
  3:  807429234ac ! 217:  813205700d1 submodule.c: fix indentation
    @@ -6,7 +6,6 @@
         Fix it while we are here.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/submodule.c b/submodule.c
      --- a/submodule.c
  4:  f6fa5273af9 ! 218:  b4aa77f72ba submodule.c: sort changed_submodule_names before searching it
    @@ -12,7 +12,6 @@
         appropriate.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/submodule.c b/submodule.c
      --- a/submodule.c
  5:  adf7a2fd203 ! 219:  df4da0e18e4 submodule: move global changed_submodule_names into fetch submodule struct
    @@ -6,13 +6,12 @@
         part of the struct that is passed around for fetching submodules.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/submodule.c b/submodule.c
      --- a/submodule.c
      +++ b/submodule.c
     @@
    - #include "object-store.h"
    + #include "commit-reach.h"
      
      static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
     -static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
  6:  56c9398589a ! 220:  b9f5d06c134 submodule.c: do not copy around submodule list
    @@ -9,12 +9,11 @@
         Instead use the result list directly and prune items afterwards
         using string_list_remove_empty_items.
     
    -    By doin so we'll have access to the util pointer for longer that
    +    By doing so we'll have access to the util pointer for longer that
         contains the commits that we need to fetch, which will be
         useful in a later patch.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/submodule.c b/submodule.c
      --- a/submodule.c
  -:  ----------- > 221:  e5ff287a0a2 repository: repo_submodule_init to take a submodule struct
  7:  9f70a5f32c9 ! 222:  8acd734b5f5 submodule: fetch in submodules git directory instead of in worktree
    @@ -3,14 +3,24 @@
         submodule: fetch in submodules git directory instead of in worktree
     
         This patch started as a refactoring to make 'get_next_submodule' more
    -    readable, but upon doing so, I realized that git-fetch actually doesn't
    -    need to be run in the worktree. So let's run it in the git dir instead.
    +    readable, but upon doing so, I realized that "git fetch" of the submodule
    +    actually doesn't need to be run in the submodules worktree. So let's run
    +    it in its git dir instead.
     
         That should pave the way towards fetching submodules that are currently
         not checked out.
     
    +    This patch leaks the cp->dir in get_next_submodule, as any further
    +    callback in run_processes_parallel doesn't have access to the child
    +    process any more. In an early iteration of this patch, the function
    +    get_submodule_repo_for directly returned the string containing the
    +    git directory, which would be a better design choice for this patch.
    +
    +    However the next patch both fixes the memory leak of cp->dir and also has
    +    a use case for using the full repository handle of the submodule, so
    +    it makes sense to introduce the get_submodule_repo_for here already.
    +
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/submodule.c b/submodule.c
      --- a/submodule.c
    @@ -32,24 +42,26 @@
      	return spf->default_option;
      }
      
    -+static const char *get_submodule_git_dir(struct repository *r, const char *path)
    ++static struct repository *get_submodule_repo_for(struct repository *r,
    ++						 const struct submodule *sub)
     +{
    -+	struct repository subrepo;
    -+	const char *ret;
    ++	struct repository *ret = xmalloc(sizeof(*ret));
     +
    -+	if (repo_submodule_init(&subrepo, r, path)) {
    -+		/* no entry in .gitmodules? */
    ++	if (repo_submodule_init(ret, r, sub)) {
    ++		/*
    ++		 * No entry in .gitmodules? Technically not a submodule,
    ++		 * but historically we supported repositories that happen to be
    ++		 * in-place where a gitlink is. Keep supporting them.
    ++		 */
     +		struct strbuf gitdir = STRBUF_INIT;
    -+		strbuf_repo_worktree_path(&gitdir, r, "%s/.git", path);
    -+		if (repo_init(&subrepo, gitdir.buf, NULL)) {
    ++		strbuf_repo_worktree_path(&gitdir, r, "%s/.git", sub->path);
    ++		if (repo_init(ret, gitdir.buf, NULL)) {
     +			strbuf_release(&gitdir);
     +			return NULL;
     +		}
    ++		strbuf_release(&gitdir);
     +	}
     +
    -+	ret = xstrdup(subrepo.gitdir);
    -+	repo_clear(&subrepo);
    -+
     +	return ret;
     +}
     +
    @@ -64,7 +76,13 @@
     -		struct strbuf submodule_git_dir = STRBUF_INIT;
      		struct strbuf submodule_prefix = STRBUF_INIT;
      		const struct cache_entry *ce = spf->r->index->cache[spf->count];
    - 		const char *git_dir, *default_argv;
    +-		const char *git_dir, *default_argv;
    ++		const char *default_argv;
    + 		const struct submodule *submodule;
    ++		struct repository *repo;
    + 		struct submodule default_submodule = SUBMODULE_INIT;
    + 
    + 		if (!S_ISGITLINK(ce->ce_mode))
     @@
      			continue;
      		}
    @@ -76,18 +94,23 @@
     -		if (!git_dir)
     -			git_dir = submodule_git_dir.buf;
     -		if (is_directory(git_dir)) {
    -+		git_dir = get_submodule_git_dir(spf->r, ce->name);
    -+		if (git_dir) {
    ++		repo = get_submodule_repo_for(spf->r, submodule);
    ++		if (repo) {
      			child_process_init(cp);
     -			cp->dir = strbuf_detach(&submodule_path, NULL);
     -			prepare_submodule_repo_env(&cp->env_array);
     +			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
    -+			cp->dir = git_dir;
    ++			cp->dir = xstrdup(repo->gitdir);
      			cp->git_cmd = 1;
      			if (!spf->quiet)
      				strbuf_addf(err, "Fetching submodule %s%s\n",
     @@
    + 			argv_array_push(&cp->args, default_argv);
    + 			argv_array_push(&cp->args, "--submodule-prefix");
      			argv_array_push(&cp->args, submodule_prefix.buf);
    ++
    ++			repo_clear(repo);
    ++			free(repo);
      			ret = 1;
      		}
     -		strbuf_release(&submodule_path);
  8:  bab609b4dc1 ! 223:  5752ba212a7 fetch: retry fetching submodules if sha1 were not fetched
    @@ -1,9 +1,9 @@
     Author: Stefan Beller <sbeller@google.com>
     
    -    fetch: retry fetching submodules if sha1 were not fetched
    +    fetch: retry fetching submodules if needed objects were not fetched
     
         Currently when git-fetch is asked to recurse into submodules, it dispatches
    -    a plain "git-fetch -C <submodule-dir>" (and some submodule related options
    +    a plain "git-fetch -C <submodule-dir>" (with some submodule related options
         such as prefix and recusing strategy, but) without any information of the
         remote or the tip that should be fetched.
     
    @@ -16,16 +16,22 @@
         topic downloaded as well. However these submodule changes reside in their
         own repository in their own ref (refs/changes/<int>).
     
    -    Retry fetching a submodule if the object id that the superproject points
    -    to, cannot be found.
    +    Retry fetching a submodule by object name if the object id that the
    +    superproject points to, cannot be found.
     
    -    This doesn't support fetching to FETCH_HEAD yet, but only into a local
    -    branch. To make fetching into FETCH_HEAD work, we need some refactoring
    -    in builtin/fetch.c to adjust the calls to 'check_for_new_submodule_commits'
    -    that is coming in the next patch.
    +    This retrying does not happen when the "git fetch" done at the
    +    superproject is not storing the fetched results in remote
    +    tracking branches (i.e. instead just recording them to
    +    FETCH_HEAD) in this step. A later patch will fix this.
    +
    +    builtin/fetch used to only inspect submodules when they were fetched
    +    "on-demand", as in either on/off case it was clear whether the submodule
    +    needs to be fetched. However to know whether we need to try fetching the
    +    object ids, we need to identify the object names, which is done in this
    +    function check_for_new_submodule_commits(), so we'll also run that code
    +    in case the submodule recursion is set to "on".
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/builtin/fetch.c b/builtin/fetch.c
      --- a/builtin/fetch.c
    @@ -68,68 +74,181 @@
      	int result;
      
      	struct string_list changed_submodule_names;
    -+	struct string_list retry;
    ++	struct get_next_submodule_task **retry;
    ++	int retry_nr, retry_alloc;
      };
     -#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
     +#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
     +		  STRING_LIST_INIT_DUP, \
    -+		  STRING_LIST_INIT_NODUP}
    ++		  NULL, 0, 0}
      
      static void calculate_changed_submodule_paths(
      	struct submodule_parallel_fetch *spf)
     @@
    + 	return spf->default_option;
    + }
    + 
    ++struct get_next_submodule_task {
    ++	struct repository *repo;
    ++	const struct submodule *sub;
    ++	unsigned free_sub : 1; /* Do we need to free the submodule? */
    ++	struct oid_array *commits;
    ++};
    ++
    ++static const struct submodule *get_default_submodule(const char *path)
    ++{
    ++	struct submodule *ret = NULL;
    ++	const char *name = default_name_or_path(path);
    ++
    ++	if (!name)
    ++		return NULL;
    ++
    ++	ret = xmalloc(sizeof(*ret));
    ++	memset(ret, 0, sizeof(*ret));
    ++	ret->path = name;
    ++	ret->name = name;
    ++
    ++	return (const struct submodule *) ret;
    ++}
    ++
    ++static struct get_next_submodule_task *get_next_submodule_task_create(
    ++	struct repository *r, const char *path)
    ++{
    ++	struct get_next_submodule_task *task = xmalloc(sizeof(*task));
    ++	memset(task, 0, sizeof(*task));
    ++
    ++	task->sub = submodule_from_path(r, &null_oid, path);
    ++	if (!task->sub) {
    ++		task->sub = get_default_submodule(path);
    ++		task->free_sub = 1;
    ++	}
    ++
    ++	return task;
    ++}
    ++
    ++static void get_next_submodule_task_release(struct get_next_submodule_task *p)
    ++{
    ++	if (p->free_sub)
    ++		free((void*)p->sub);
    ++	p->free_sub = 0;
    ++	p->sub = NULL;
    ++
    ++	if (p->repo)
    ++		repo_clear(p->repo);
    ++	FREE_AND_NULL(p->repo);
    ++}
    ++
    + static struct repository *get_submodule_repo_for(struct repository *r,
    + 						 const struct submodule *sub)
    + {
    +@@
    + static int get_next_submodule(struct child_process *cp,
    + 			      struct strbuf *err, void *data, void **task_cb)
      {
    - 	int ret = 0;
    +-	int ret = 0;
      	struct submodule_parallel_fetch *spf = data;
    -+	struct string_list_item *it;
      
      	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
    +-		struct strbuf submodule_prefix = STRBUF_INIT;
     +		int recurse_config;
    - 		struct strbuf submodule_prefix = STRBUF_INIT;
      		const struct cache_entry *ce = spf->r->index->cache[spf->count];
    - 		const char *git_dir, *default_argv;
    -@@
    - 			}
    + 		const char *default_argv;
    +-		const struct submodule *submodule;
    +-		struct repository *repo;
    +-		struct submodule default_submodule = SUBMODULE_INIT;
    ++		struct get_next_submodule_task *task;
    + 
    + 		if (!S_ISGITLINK(ce->ce_mode))
    + 			continue;
    + 
    +-		submodule = submodule_from_path(spf->r, &null_oid, ce->name);
    +-		if (!submodule) {
    +-			const char *name = default_name_or_path(ce->name);
    +-			if (name) {
    +-				default_submodule.path = name;
    +-				default_submodule.name = name;
    +-				submodule = &default_submodule;
    +-			}
    ++		task = get_next_submodule_task_create(spf->r, ce->name);
    ++
    ++		if (!task->sub) {
    ++			free(task);
    ++			continue;
      		}
      
     -		switch (get_fetch_recurse_config(submodule, spf))
    -+		recurse_config = get_fetch_recurse_config(submodule, spf);
    ++		recurse_config = get_fetch_recurse_config(task->sub, spf);
     +
     +		switch (recurse_config)
      		{
      		default:
      		case RECURSE_SUBMODULES_DEFAULT:
    + 		case RECURSE_SUBMODULES_ON_DEMAND:
    +-			if (!submodule ||
    ++			if (!task->sub ||
    + 			    !string_list_lookup(
    + 					&spf->changed_submodule_names,
    +-					submodule->name))
    ++					task->sub->name))
    + 				continue;
    + 			default_argv = "on-demand";
    + 			break;
    +@@
    + 			continue;
    + 		}
    + 
    +-		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
    +-		repo = get_submodule_repo_for(spf->r, submodule);
    +-		if (repo) {
    ++		task->repo = get_submodule_repo_for(spf->r, task->sub);
    ++		if (task->repo) {
    ++			struct strbuf submodule_prefix = STRBUF_INIT;
    + 			child_process_init(cp);
    + 			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
    +-			cp->dir = xstrdup(repo->gitdir);
    ++			cp->dir = task->repo->gitdir;
    + 			cp->git_cmd = 1;
    + 			if (!spf->quiet)
    + 				strbuf_addf(err, "Fetching submodule %s%s\n",
     @@
    - 		strbuf_release(&submodule_prefix);
    - 		if (ret) {
    + 			argv_array_pushv(&cp->args, spf->args.argv);
    + 			argv_array_push(&cp->args, default_argv);
    + 			argv_array_push(&cp->args, "--submodule-prefix");
    ++
    ++			strbuf_addf(&submodule_prefix, "%s%s/",
    ++						       spf->prefix,
    ++						       task->sub->path);
    + 			argv_array_push(&cp->args, submodule_prefix.buf);
    + 
    +-			repo_clear(repo);
    +-			free(repo);
    +-			ret = 1;
    +-		}
    +-		strbuf_release(&submodule_prefix);
    +-		if (ret) {
      			spf->count++;
    -+			if (submodule != &default_submodule)
    -+				/* discard const-ness: */
    -+				*task_cb = (void*)submodule;
    ++			*task_cb = task;
    ++
    ++			strbuf_release(&submodule_prefix);
      			return 1;
    ++		} else {
    ++			get_next_submodule_task_release(task);
    ++			free(task);
      		}
      	}
     +
    -+retry_next:
    -+
    -+	if (spf->retry.nr) {
    ++	if (spf->retry_nr) {
    ++		struct get_next_submodule_task *task = spf->retry[spf->retry_nr - 1];
     +		struct strbuf submodule_prefix = STRBUF_INIT;
    -+		const struct submodule *sub;
    ++		spf->retry_nr--;
     +
    -+		it = string_list_last(&spf->retry);
    -+		sub = submodule_from_name(spf->r, &null_oid,
    -+					  it->string);
    ++		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, task->sub->path);
     +
     +		child_process_init(cp);
    -+		cp->dir = get_submodule_git_dir(spf->r, sub->path);
    -+		if (!cp->dir) {
    -+			string_list_pop(&spf->retry, 0);
    -+			goto retry_next;
    -+		}
     +		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
     +		cp->git_cmd = 1;
    ++		cp->dir = task->repo->gitdir;
     +
    -+		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, sub->path);
     +		argv_array_init(&cp->args);
     +		argv_array_pushv(&cp->args, spf->args.argv);
     +		argv_array_push(&cp->args, "on-demand");
    @@ -138,12 +257,11 @@
     +
     +		/* NEEDSWORK: have get_default_remote from s--h */
     +		argv_array_push(&cp->args, "origin");
    -+		oid_array_for_each_unique(it->util,
    ++		oid_array_for_each_unique(task->commits,
     +					  append_oid_to_argv, &cp->args);
     +
    -+		*task_cb = NULL; /* make sure we do not recurse forever */
    ++		*task_cb = task;
     +		strbuf_release(&submodule_prefix);
    -+		string_list_pop(&spf->retry, 0);
     +		return 1;
     +	}
     +
    @@ -151,6 +269,14 @@
      }
      
     @@
    + 			       void *cb, void *task_cb)
    + {
    + 	struct submodule_parallel_fetch *spf = cb;
    ++	struct get_next_submodule_task *task = task_cb;
    + 
    + 	spf->result = 1;
    + 
    ++	get_next_submodule_task_release(task);
      	return 0;
      }
      
    @@ -167,35 +293,46 @@
      			void *cb, void *task_cb)
      {
      	struct submodule_parallel_fetch *spf = cb;
    -+	struct submodule *sub = task_cb;
    -+	struct repository subrepo;
    ++	struct get_next_submodule_task *task = task_cb;
    ++	const struct submodule *sub;
    ++
    ++	struct string_list_item *it;
    ++	struct oid_array *commits;
      
      	if (retvalue)
      		spf->result = 1;
      
    -+	if (!sub)
    ++	if (!task)
     +		return 0;
     +
    -+	if (repo_submodule_init(&subrepo, spf->r, sub->path) < 0)
    -+		warning(_("Could not get submodule repository for submodule '%s' in repository '%s'"),
    -+			  sub->path, spf->r->worktree);
    -+	else {
    -+		struct string_list_item *it;
    -+		struct oid_array *commits;
    ++	sub = task->sub;
    ++	if (!sub)
    ++		goto out;
     +
    -+		it = string_list_lookup(&spf->changed_submodule_names, sub->name);
    -+		if (!it)
    -+			return 0;
    ++	it = string_list_lookup(&spf->changed_submodule_names, sub->name);
    ++	if (!it)
    ++		goto out;
    ++
    ++	commits = it->util;
    ++	oid_array_filter(commits,
    ++			 commit_exists_in_sub,
    ++			 task->repo);
     +
    -+		commits = it->util;
    -+		oid_array_filter(commits,
    -+				 commit_exists_in_sub,
    -+				 &subrepo);
    ++	/* Are there commits that do not exist? */
    ++	if (commits->nr) {
    ++		/* We already tried fetching them, do not try again. */
    ++		if (task->commits)
    ++			return 0;
     +
    -+		if (commits->nr)
    -+			string_list_append(&spf->retry, sub->name)
    -+				->util = commits;
    ++		task->commits = commits;
    ++		ALLOC_GROW(spf->retry, spf->retry_nr + 1, spf->retry_alloc);
    ++		spf->retry[spf->retry_nr] = task;
    ++		spf->retry_nr++;
    ++		return 0;
     +	}
    ++
    ++out:
    ++	get_next_submodule_task_release(task);
     +
      	return 0;
      }
  9:  c16d21313f6 ! 224:  d02ee9ef485 builtin/fetch: check for submodule updates for non branch fetches
    @@ -2,11 +2,29 @@
     
         builtin/fetch: check for submodule updates for non branch fetches
     
    -    For Gerrit users that use submodules the invocation of fetch without a
    -    branch is their main use case.
    +    Gerrit, the code review tool, has a different workflow than our mailing
    +    list based approach. Usually users upload changes to a Gerrit server and
    +    continuous integration and testing happens by bots. Sometimes however a
    +    user wants to checkout a change locally and look at it locally. For this
    +    use case, Gerrit offers a command line snippet to copy and paste to your
    +    terminal, which looks like
    +
    +      git fetch https://<host>/gerrit refs/changes/<id> &&
    +      git checkout FETCH_HEAD
    +
    +    For Gerrit changes that contain changing submodule gitlinks, it would be
    +    easy to extend both the fetch and checkout with the '--recurse-submodules'
    +    flag, such that this command line snippet would produce the state of a
    +    change locally.
    +
    +    However the functionality added in the previous patch, which would
    +    ensure that we fetch the objects in the submodule that the gitlink pointed
    +    at, only works for remote tracking branches so far, not for FETCH_HEAD.
    +
    +    Make sure that fetching a superproject to its FETCH_HEAD, also respects
    +    the existence checks for objects in the submodule recursion.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      diff --git a/builtin/fetch.c b/builtin/fetch.c
      --- a/builtin/fetch.c

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 1/9] sha1-array: provide oid_array_filter
  2018-09-25 19:47 [PATCH v4 0/9] fetch: make sure submodule oids are fetched Stefan Beller
@ 2018-09-25 19:47 ` Stefan Beller
  2018-09-26 22:17   ` Junio C Hamano
  2018-09-25 19:47 ` [PATCH v4 2/9] submodule.c: fix indentation Stefan Beller
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Stefan Beller @ 2018-09-25 19:47 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/technical/api-oid-array.txt |  5 +++++
 sha1-array.c                              | 17 +++++++++++++++++
 sha1-array.h                              |  3 +++
 3 files changed, 25 insertions(+)

diff --git a/Documentation/technical/api-oid-array.txt b/Documentation/technical/api-oid-array.txt
index 9febfb1d528..c97428c2c34 100644
--- a/Documentation/technical/api-oid-array.txt
+++ b/Documentation/technical/api-oid-array.txt
@@ -48,6 +48,11 @@ Functions
 	is not sorted, this function has the side effect of sorting
 	it.
 
+`oid_array_filter`::
+	Apply the callback function `want` to each entry in the array,
+	retaining only the entries for which the function returns true.
+	Preserve the order of the entries that are retained.
+
 Examples
 --------
 
diff --git a/sha1-array.c b/sha1-array.c
index b94e0ec0f5e..d922e94e3fc 100644
--- a/sha1-array.c
+++ b/sha1-array.c
@@ -77,3 +77,20 @@ int oid_array_for_each_unique(struct oid_array *array,
 	}
 	return 0;
 }
+
+void oid_array_filter(struct oid_array *array,
+		      for_each_oid_fn want,
+		      void *cb_data)
+{
+	unsigned nr = array->nr, src, dst;
+	struct object_id *oids = array->oid;
+
+	for (src = dst = 0; src < nr; src++) {
+		if (want(&oids[src], cb_data)) {
+			if (src != dst)
+				oidcpy(&oids[dst], &oids[src]);
+			dst++;
+		}
+	}
+	array->nr = dst;
+}
diff --git a/sha1-array.h b/sha1-array.h
index 232bf950172..55d016c4bf7 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -22,5 +22,8 @@ int oid_array_for_each(struct oid_array *array,
 int oid_array_for_each_unique(struct oid_array *array,
 			      for_each_oid_fn fn,
 			      void *data);
+void oid_array_filter(struct oid_array *array,
+		      for_each_oid_fn want,
+		      void *cbdata);
 
 #endif /* SHA1_ARRAY_H */
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 2/9] submodule.c: fix indentation
  2018-09-25 19:47 [PATCH v4 0/9] fetch: make sure submodule oids are fetched Stefan Beller
  2018-09-25 19:47 ` [PATCH v4 1/9] sha1-array: provide oid_array_filter Stefan Beller
@ 2018-09-25 19:47 ` Stefan Beller
  2018-09-26 22:18   ` Junio C Hamano
  2018-09-25 19:47 ` [PATCH v4 3/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Stefan Beller @ 2018-09-25 19:47 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

The submodule subsystem is really bad at staying within 80 characters.
Fix it while we are here.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index b53cb6e9c47..0de9e2800ad 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1244,7 +1244,8 @@ static int get_next_submodule(struct child_process *cp,
 		if (!submodule) {
 			const char *name = default_name_or_path(ce->name);
 			if (name) {
-				default_submodule.path = default_submodule.name = name;
+				default_submodule.path = name;
+				default_submodule.name = name;
 				submodule = &default_submodule;
 			}
 		}
@@ -1254,8 +1255,10 @@ static int get_next_submodule(struct child_process *cp,
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!submodule || !unsorted_string_list_lookup(&changed_submodule_names,
-							 submodule->name))
+			if (!submodule ||
+			    !unsorted_string_list_lookup(
+					&changed_submodule_names,
+					submodule->name))
 				continue;
 			default_argv = "on-demand";
 			break;
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 3/9] submodule.c: sort changed_submodule_names before searching it
  2018-09-25 19:47 [PATCH v4 0/9] fetch: make sure submodule oids are fetched Stefan Beller
  2018-09-25 19:47 ` [PATCH v4 1/9] sha1-array: provide oid_array_filter Stefan Beller
  2018-09-25 19:47 ` [PATCH v4 2/9] submodule.c: fix indentation Stefan Beller
@ 2018-09-25 19:47 ` Stefan Beller
  2018-09-26 22:14   ` Junio C Hamano
  2018-09-25 19:47 ` [PATCH v4 4/9] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Stefan Beller @ 2018-09-25 19:47 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

We can string_list_insert() to maintain sorted-ness of the
list as we find new items, or we can string_list_append() to
build an unsorted list and sort it at the end just once.

To pick which one is more appropriate, we notice the fact
that we discover new items more or less in the already
sorted order.  That makes "append then sort" more
appropriate.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index 0de9e2800ad..22c64bd8559 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1256,7 +1256,7 @@ static int get_next_submodule(struct child_process *cp,
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
 			if (!submodule ||
-			    !unsorted_string_list_lookup(
+			    !string_list_lookup(
 					&changed_submodule_names,
 					submodule->name))
 				continue;
@@ -1350,6 +1350,7 @@ int fetch_populated_submodules(struct repository *r,
 	/* default value, "--submodule-prefix" and its value are added later */
 
 	calculate_changed_submodule_paths();
+	string_list_sort(&changed_submodule_names);
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
 			       fetch_start_failure,
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 4/9] submodule: move global changed_submodule_names into fetch submodule struct
  2018-09-25 19:47 [PATCH v4 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (2 preceding siblings ...)
  2018-09-25 19:47 ` [PATCH v4 3/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
@ 2018-09-25 19:47 ` Stefan Beller
  2018-09-26 22:19   ` Junio C Hamano
  2018-09-25 19:47 ` [PATCH v4 5/9] submodule.c: do not copy around submodule list Stefan Beller
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Stefan Beller @ 2018-09-25 19:47 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

The `changed_submodule_names` are only used for fetching, so let's make it
part of the struct that is passed around for fetching submodules.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/submodule.c b/submodule.c
index 22c64bd8559..17103379ba4 100644
--- a/submodule.c
+++ b/submodule.c
@@ -25,7 +25,7 @@
 #include "commit-reach.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
+
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -1110,7 +1110,22 @@ void check_for_new_submodule_commits(struct object_id *oid)
 	oid_array_append(&ref_tips_after_fetch, oid);
 }
 
-static void calculate_changed_submodule_paths(void)
+struct submodule_parallel_fetch {
+	int count;
+	struct argv_array args;
+	struct repository *r;
+	const char *prefix;
+	int command_line_option;
+	int default_option;
+	int quiet;
+	int result;
+
+	struct string_list changed_submodule_names;
+};
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+
+static void calculate_changed_submodule_paths(
+	struct submodule_parallel_fetch *spf)
 {
 	struct argv_array argv = ARGV_ARRAY_INIT;
 	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
@@ -1148,7 +1163,8 @@ static void calculate_changed_submodule_paths(void)
 			continue;
 
 		if (!submodule_has_commits(path, commits))
-			string_list_append(&changed_submodule_names, name->string);
+			string_list_append(&spf->changed_submodule_names,
+					   name->string);
 	}
 
 	free_submodules_oids(&changed_submodules);
@@ -1185,18 +1201,6 @@ int submodule_touches_in_range(struct object_id *excl_oid,
 	return ret;
 }
 
-struct submodule_parallel_fetch {
-	int count;
-	struct argv_array args;
-	struct repository *r;
-	const char *prefix;
-	int command_line_option;
-	int default_option;
-	int quiet;
-	int result;
-};
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
-
 static int get_fetch_recurse_config(const struct submodule *submodule,
 				    struct submodule_parallel_fetch *spf)
 {
@@ -1257,7 +1261,7 @@ static int get_next_submodule(struct child_process *cp,
 		case RECURSE_SUBMODULES_ON_DEMAND:
 			if (!submodule ||
 			    !string_list_lookup(
-					&changed_submodule_names,
+					&spf->changed_submodule_names,
 					submodule->name))
 				continue;
 			default_argv = "on-demand";
@@ -1349,8 +1353,8 @@ int fetch_populated_submodules(struct repository *r,
 	argv_array_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
-	calculate_changed_submodule_paths();
-	string_list_sort(&changed_submodule_names);
+	calculate_changed_submodule_paths(&spf);
+	string_list_sort(&spf.changed_submodule_names);
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
 			       fetch_start_failure,
@@ -1359,7 +1363,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	argv_array_clear(&spf.args);
 out:
-	string_list_clear(&changed_submodule_names, 1);
+	string_list_clear(&spf.changed_submodule_names, 1);
 	return spf.result;
 }
 
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 5/9] submodule.c: do not copy around submodule list
  2018-09-25 19:47 [PATCH v4 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (3 preceding siblings ...)
  2018-09-25 19:47 ` [PATCH v4 4/9] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
@ 2018-09-25 19:47 ` Stefan Beller
  2018-09-25 19:47 ` [PATCH v4 6/9] repository: repo_submodule_init to take a submodule struct Stefan Beller
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Stefan Beller @ 2018-09-25 19:47 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

'calculate_changed_submodule_paths' uses a local list to compute the
changed submodules, and then produces the result by copying appropriate
items into the result list.

Instead use the result list directly and prune items afterwards
using string_list_remove_empty_items.

By doing so we'll have access to the util pointer for longer that
contains the commits that we need to fetch, which will be
useful in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/submodule.c b/submodule.c
index 17103379ba4..dd478ed70bf 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1128,8 +1128,7 @@ static void calculate_changed_submodule_paths(
 	struct submodule_parallel_fetch *spf)
 {
 	struct argv_array argv = ARGV_ARRAY_INIT;
-	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
-	const struct string_list_item *name;
+	struct string_list_item *name;
 
 	/* No need to check if there are no submodules configured */
 	if (!submodule_from_path(the_repository, NULL, NULL))
@@ -1146,9 +1145,9 @@ static void calculate_changed_submodule_paths(
 	 * Collect all submodules (whether checked out or not) for which new
 	 * commits have been recorded upstream in "changed_submodule_names".
 	 */
-	collect_changed_submodules(&changed_submodules, &argv);
+	collect_changed_submodules(&spf->changed_submodule_names, &argv);
 
-	for_each_string_list_item(name, &changed_submodules) {
+	for_each_string_list_item(name, &spf->changed_submodule_names) {
 		struct oid_array *commits = name->util;
 		const struct submodule *submodule;
 		const char *path = NULL;
@@ -1162,12 +1161,14 @@ static void calculate_changed_submodule_paths(
 		if (!path)
 			continue;
 
-		if (!submodule_has_commits(path, commits))
-			string_list_append(&spf->changed_submodule_names,
-					   name->string);
+		if (submodule_has_commits(path, commits)) {
+			oid_array_clear(commits);
+			*name->string = '\0';
+		}
 	}
 
-	free_submodules_oids(&changed_submodules);
+	string_list_remove_empty_items(&spf->changed_submodule_names, 1);
+
 	argv_array_clear(&argv);
 	oid_array_clear(&ref_tips_before_fetch);
 	oid_array_clear(&ref_tips_after_fetch);
@@ -1363,7 +1364,7 @@ int fetch_populated_submodules(struct repository *r,
 
 	argv_array_clear(&spf.args);
 out:
-	string_list_clear(&spf.changed_submodule_names, 1);
+	free_submodules_oids(&spf.changed_submodule_names);
 	return spf.result;
 }
 
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 6/9] repository: repo_submodule_init to take a submodule struct
  2018-09-25 19:47 [PATCH v4 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (4 preceding siblings ...)
  2018-09-25 19:47 ` [PATCH v4 5/9] submodule.c: do not copy around submodule list Stefan Beller
@ 2018-09-25 19:47 ` Stefan Beller
  2018-09-25 19:47 ` [PATCH v4 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Stefan Beller @ 2018-09-25 19:47 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

When constructing a struct repository for a submodule for some revision
of the superproject where the submodule is not contained in the index,
it may not be present in the working tree currently either. In that
siutation giving a 'path' argument is not useful. Upgrade the
repo_submodule_init function to take a struct submodule instead.

While we are at it, overhaul the repo_submodule_init function by renaming
the submodule repository struct, which is to be initialized, to a name
that is not confused with the struct submodule as easily.

Also move its documentation into the header file.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/grep.c              | 17 ++++++++++-------
 builtin/ls-files.c          | 12 +++++++-----
 builtin/submodule--helper.c |  2 +-
 repository.c                | 27 ++++++++++-----------------
 repository.h                | 11 +++++++++--
 5 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 601f801158f..81c53c862b1 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -418,16 +418,19 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
 			  const struct object_id *oid,
 			  const char *filename, const char *path)
 {
-	struct repository submodule;
+	struct repository subrepo;
+	const struct submodule *sub = submodule_from_path(superproject,
+							  &null_oid, path);
+
 	int hit;
 
 	if (!is_submodule_active(superproject, path))
 		return 0;
 
-	if (repo_submodule_init(&submodule, superproject, path))
+	if (repo_submodule_init(&subrepo, superproject, sub))
 		return 0;
 
-	repo_read_gitmodules(&submodule);
+	repo_read_gitmodules(&subrepo);
 
 	/*
 	 * NEEDSWORK: This adds the submodule's object directory to the list of
@@ -440,7 +443,7 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
 	 * object.
 	 */
 	grep_read_lock();
-	add_to_alternates_memory(submodule.objects->objectdir);
+	add_to_alternates_memory(subrepo.objects->objectdir);
 	grep_read_unlock();
 
 	if (oid) {
@@ -465,14 +468,14 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
 
 		init_tree_desc(&tree, data, size);
 		hit = grep_tree(opt, pathspec, &tree, &base, base.len,
-				object->type == OBJ_COMMIT, &submodule);
+				object->type == OBJ_COMMIT, &subrepo);
 		strbuf_release(&base);
 		free(data);
 	} else {
-		hit = grep_cache(opt, &submodule, pathspec, 1);
+		hit = grep_cache(opt, &subrepo, pathspec, 1);
 	}
 
-	repo_clear(&submodule);
+	repo_clear(&subrepo);
 	return hit;
 }
 
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 7f9919a3623..4d1649c1b3a 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -206,17 +206,19 @@ static void show_files(struct repository *repo, struct dir_struct *dir);
 static void show_submodule(struct repository *superproject,
 			   struct dir_struct *dir, const char *path)
 {
-	struct repository submodule;
+	struct repository subrepo;
+	const struct submodule *sub = submodule_from_path(superproject,
+							  &null_oid, path);
 
-	if (repo_submodule_init(&submodule, superproject, path))
+	if (repo_submodule_init(&subrepo, superproject, sub))
 		return;
 
-	if (repo_read_index(&submodule) < 0)
+	if (repo_read_index(&subrepo) < 0)
 		die("index file corrupt");
 
-	show_files(&submodule, dir);
+	show_files(&subrepo, dir);
 
-	repo_clear(&submodule);
+	repo_clear(&subrepo);
 }
 
 static void show_ce(struct repository *repo, struct dir_struct *dir,
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 40844870cfb..1164e0979bd 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2037,7 +2037,7 @@ static int ensure_core_worktree(int argc, const char **argv, const char *prefix)
 	if (!sub)
 		BUG("We could get the submodule handle before?");
 
-	if (repo_submodule_init(&subrepo, the_repository, path))
+	if (repo_submodule_init(&subrepo, the_repository, sub))
 		die(_("could not get a repository handle for submodule '%s'"), path);
 
 	if (!repo_config_get_string(&subrepo, "core.worktree", &cw)) {
diff --git a/repository.c b/repository.c
index 5dd14867181..aabe64ee5d9 100644
--- a/repository.c
+++ b/repository.c
@@ -166,30 +166,23 @@ int repo_init(struct repository *repo,
 	return -1;
 }
 
-/*
- * Initialize 'submodule' as the submodule given by 'path' in parent repository
- * 'superproject'.
- * Return 0 upon success and a non-zero value upon failure.
- */
-int repo_submodule_init(struct repository *submodule,
+int repo_submodule_init(struct repository *subrepo,
 			struct repository *superproject,
-			const char *path)
+			const struct submodule *sub)
 {
-	const struct submodule *sub;
 	struct strbuf gitdir = STRBUF_INIT;
 	struct strbuf worktree = STRBUF_INIT;
 	int ret = 0;
 
-	sub = submodule_from_path(superproject, &null_oid, path);
 	if (!sub) {
 		ret = -1;
 		goto out;
 	}
 
-	strbuf_repo_worktree_path(&gitdir, superproject, "%s/.git", path);
-	strbuf_repo_worktree_path(&worktree, superproject, "%s", path);
+	strbuf_repo_worktree_path(&gitdir, superproject, "%s/.git", sub->path);
+	strbuf_repo_worktree_path(&worktree, superproject, "%s", sub->path);
 
-	if (repo_init(submodule, gitdir.buf, worktree.buf)) {
+	if (repo_init(subrepo, gitdir.buf, worktree.buf)) {
 		/*
 		 * If initilization fails then it may be due to the submodule
 		 * not being populated in the superproject's worktree.  Instead
@@ -201,16 +194,16 @@ int repo_submodule_init(struct repository *submodule,
 		strbuf_repo_git_path(&gitdir, superproject,
 				     "modules/%s", sub->name);
 
-		if (repo_init(submodule, gitdir.buf, NULL)) {
+		if (repo_init(subrepo, gitdir.buf, NULL)) {
 			ret = -1;
 			goto out;
 		}
 	}
 
-	submodule->submodule_prefix = xstrfmt("%s%s/",
-					      superproject->submodule_prefix ?
-					      superproject->submodule_prefix :
-					      "", path);
+	subrepo->submodule_prefix = xstrfmt("%s%s/",
+					    superproject->submodule_prefix ?
+					    superproject->submodule_prefix :
+					    "", sub->path);
 
 out:
 	strbuf_release(&gitdir);
diff --git a/repository.h b/repository.h
index 9f16c42c1ed..a9c7a5baa54 100644
--- a/repository.h
+++ b/repository.h
@@ -116,9 +116,16 @@ void repo_set_worktree(struct repository *repo, const char *path);
 void repo_set_hash_algo(struct repository *repo, int algo);
 void initialize_the_repository(void);
 int repo_init(struct repository *r, const char *gitdir, const char *worktree);
-int repo_submodule_init(struct repository *submodule,
+
+/*
+ * Initialize the repository 'subrepo' as the submodule given by the
+ * struct submodule 'sub' in parent repository 'superproject'.
+ * Return 0 upon success and a non-zero value upon failure.
+ */
+struct submodule;
+int repo_submodule_init(struct repository *subrepo,
 			struct repository *superproject,
-			const char *path);
+			const struct submodule *sub);
 void repo_clear(struct repository *repo);
 
 /*
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 7/9] submodule: fetch in submodules git directory instead of in worktree
  2018-09-25 19:47 [PATCH v4 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (5 preceding siblings ...)
  2018-09-25 19:47 ` [PATCH v4 6/9] repository: repo_submodule_init to take a submodule struct Stefan Beller
@ 2018-09-25 19:47 ` Stefan Beller
  2018-09-25 19:47 ` [PATCH v4 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
  2018-09-25 19:47 ` [PATCH v4 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
  8 siblings, 0 replies; 14+ messages in thread
From: Stefan Beller @ 2018-09-25 19:47 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

This patch started as a refactoring to make 'get_next_submodule' more
readable, but upon doing so, I realized that "git fetch" of the submodule
actually doesn't need to be run in the submodules worktree. So let's run
it in its git dir instead.

That should pave the way towards fetching submodules that are currently
not checked out.

This patch leaks the cp->dir in get_next_submodule, as any further
callback in run_processes_parallel doesn't have access to the child
process any more. In an early iteration of this patch, the function
get_submodule_repo_for directly returned the string containing the
git directory, which would be a better design choice for this patch.

However the next patch both fixes the memory leak of cp->dir and also has
a use case for using the full repository handle of the submodule, so
it makes sense to introduce the get_submodule_repo_for here already.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c                 | 51 +++++++++++++++++++++++++++----------
 t/t5526-fetch-submodules.sh |  7 ++++-
 2 files changed, 44 insertions(+), 14 deletions(-)

diff --git a/submodule.c b/submodule.c
index dd478ed70bf..3f791f22771 100644
--- a/submodule.c
+++ b/submodule.c
@@ -481,6 +481,12 @@ void prepare_submodule_repo_env(struct argv_array *out)
 			 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
+static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
+{
+	prepare_submodule_repo_env_no_git_dir(out);
+	argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1227,6 +1233,29 @@ static int get_fetch_recurse_config(const struct submodule *submodule,
 	return spf->default_option;
 }
 
+static struct repository *get_submodule_repo_for(struct repository *r,
+						 const struct submodule *sub)
+{
+	struct repository *ret = xmalloc(sizeof(*ret));
+
+	if (repo_submodule_init(ret, r, sub)) {
+		/*
+		 * No entry in .gitmodules? Technically not a submodule,
+		 * but historically we supported repositories that happen to be
+		 * in-place where a gitlink is. Keep supporting them.
+		 */
+		struct strbuf gitdir = STRBUF_INIT;
+		strbuf_repo_worktree_path(&gitdir, r, "%s/.git", sub->path);
+		if (repo_init(ret, gitdir.buf, NULL)) {
+			strbuf_release(&gitdir);
+			return NULL;
+		}
+		strbuf_release(&gitdir);
+	}
+
+	return ret;
+}
+
 static int get_next_submodule(struct child_process *cp,
 			      struct strbuf *err, void *data, void **task_cb)
 {
@@ -1234,12 +1263,11 @@ static int get_next_submodule(struct child_process *cp,
 	struct submodule_parallel_fetch *spf = data;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		struct strbuf submodule_path = STRBUF_INIT;
-		struct strbuf submodule_git_dir = STRBUF_INIT;
 		struct strbuf submodule_prefix = STRBUF_INIT;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
-		const char *git_dir, *default_argv;
+		const char *default_argv;
 		const struct submodule *submodule;
+		struct repository *repo;
 		struct submodule default_submodule = SUBMODULE_INIT;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -1274,16 +1302,12 @@ static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		strbuf_repo_worktree_path(&submodule_path, spf->r, "%s", ce->name);
-		strbuf_addf(&submodule_git_dir, "%s/.git", submodule_path.buf);
 		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
-		git_dir = read_gitfile(submodule_git_dir.buf);
-		if (!git_dir)
-			git_dir = submodule_git_dir.buf;
-		if (is_directory(git_dir)) {
+		repo = get_submodule_repo_for(spf->r, submodule);
+		if (repo) {
 			child_process_init(cp);
-			cp->dir = strbuf_detach(&submodule_path, NULL);
-			prepare_submodule_repo_env(&cp->env_array);
+			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+			cp->dir = xstrdup(repo->gitdir);
 			cp->git_cmd = 1;
 			if (!spf->quiet)
 				strbuf_addf(err, "Fetching submodule %s%s\n",
@@ -1293,10 +1317,11 @@ static int get_next_submodule(struct child_process *cp,
 			argv_array_push(&cp->args, default_argv);
 			argv_array_push(&cp->args, "--submodule-prefix");
 			argv_array_push(&cp->args, submodule_prefix.buf);
+
+			repo_clear(repo);
+			free(repo);
 			ret = 1;
 		}
-		strbuf_release(&submodule_path);
-		strbuf_release(&submodule_git_dir);
 		strbuf_release(&submodule_prefix);
 		if (ret) {
 			spf->count++;
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 6c2f9b2ba26..42692219a1a 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -566,7 +566,12 @@ test_expect_success 'fetching submodule into a broken repository' '
 
 	test_must_fail git -C dst status &&
 	test_must_fail git -C dst diff &&
-	test_must_fail git -C dst fetch --recurse-submodules
+
+	# git-fetch cannot find the git directory of the submodule,
+	# so it will do nothing, successfully, as it cannot distinguish between
+	# this broken submodule and a submodule that was just set active but
+	# not cloned yet
+	git -C dst fetch --recurse-submodules
 '
 
 test_expect_success "fetch new commits when submodule got renamed" '
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 8/9] fetch: retry fetching submodules if needed objects were not fetched
  2018-09-25 19:47 [PATCH v4 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (6 preceding siblings ...)
  2018-09-25 19:47 ` [PATCH v4 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
@ 2018-09-25 19:47 ` Stefan Beller
  2018-09-25 19:47 ` [PATCH v4 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
  8 siblings, 0 replies; 14+ messages in thread
From: Stefan Beller @ 2018-09-25 19:47 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C <submodule-dir>" (with some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

This works surprisingly well in some workflows (such as using submodules
as a third party library), while not so well in other scenarios, such
as in a Gerrit topic-based workflow, that can tie together changes
(potentially across repositories) on the server side. One of the parts
of such a Gerrit workflow is to download a change when wanting to examine
it, and you'd want to have its submodule changes that are in the same
topic downloaded as well. However these submodule changes reside in their
own repository in their own ref (refs/changes/<int>).

Retry fetching a submodule by object name if the object id that the
superproject points to, cannot be found.

This retrying does not happen when the "git fetch" done at the
superproject is not storing the fetched results in remote
tracking branches (i.e. instead just recording them to
FETCH_HEAD) in this step. A later patch will fix this.

builtin/fetch used to only inspect submodules when they were fetched
"on-demand", as in either on/off case it was clear whether the submodule
needs to be fetched. However to know whether we need to try fetching the
object ids, we need to identify the object names, which is done in this
function check_for_new_submodule_commits(), so we'll also run that code
in case the submodule recursion is set to "on".

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/fetch.c             |   9 +-
 submodule.c                 | 185 ++++++++++++++++++++++++++++++------
 t/t5526-fetch-submodules.sh |  16 ++++
 3 files changed, 177 insertions(+), 33 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 0696abfc2a1..e3b03ad3bd3 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -707,8 +707,7 @@ static int update_local_ref(struct ref *ref,
 			what = _("[new ref]");
 		}
 
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref(msg, ref, 0);
 		format_display(display, r ? '!' : '*', what,
@@ -723,8 +722,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("fast-forward", ref, 1);
 		format_display(display, r ? '!' : ' ', quickref.buf,
@@ -738,8 +736,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("forced-update", ref, 1);
 		format_display(display, r ? '!' : '+', quickref.buf,
diff --git a/submodule.c b/submodule.c
index 3f791f22771..05799362e05 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1127,8 +1127,12 @@ struct submodule_parallel_fetch {
 	int result;
 
 	struct string_list changed_submodule_names;
+	struct get_next_submodule_task **retry;
+	int retry_nr, retry_alloc;
 };
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
+		  STRING_LIST_INIT_DUP, \
+		  NULL, 0, 0}
 
 static void calculate_changed_submodule_paths(
 	struct submodule_parallel_fetch *spf)
@@ -1233,6 +1237,56 @@ static int get_fetch_recurse_config(const struct submodule *submodule,
 	return spf->default_option;
 }
 
+struct get_next_submodule_task {
+	struct repository *repo;
+	const struct submodule *sub;
+	unsigned free_sub : 1; /* Do we need to free the submodule? */
+	struct oid_array *commits;
+};
+
+static const struct submodule *get_default_submodule(const char *path)
+{
+	struct submodule *ret = NULL;
+	const char *name = default_name_or_path(path);
+
+	if (!name)
+		return NULL;
+
+	ret = xmalloc(sizeof(*ret));
+	memset(ret, 0, sizeof(*ret));
+	ret->path = name;
+	ret->name = name;
+
+	return (const struct submodule *) ret;
+}
+
+static struct get_next_submodule_task *get_next_submodule_task_create(
+	struct repository *r, const char *path)
+{
+	struct get_next_submodule_task *task = xmalloc(sizeof(*task));
+	memset(task, 0, sizeof(*task));
+
+	task->sub = submodule_from_path(r, &null_oid, path);
+	if (!task->sub) {
+		task->sub = get_default_submodule(path);
+		task->free_sub = 1;
+	}
+
+	return task;
+}
+
+static void get_next_submodule_task_release(struct get_next_submodule_task *p)
+{
+	if (p->free_sub)
+		free((void*)p->sub);
+	p->free_sub = 0;
+	p->sub = NULL;
+
+	if (p->repo)
+		repo_clear(p->repo);
+	FREE_AND_NULL(p->repo);
+}
+
 static struct repository *get_submodule_repo_for(struct repository *r,
 						 const struct submodule *sub)
 {
@@ -1259,39 +1313,35 @@ static struct repository *get_submodule_repo_for(struct repository *r,
 static int get_next_submodule(struct child_process *cp,
 			      struct strbuf *err, void *data, void **task_cb)
 {
-	int ret = 0;
 	struct submodule_parallel_fetch *spf = data;
 
 	for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-		struct strbuf submodule_prefix = STRBUF_INIT;
+		int recurse_config;
 		const struct cache_entry *ce = spf->r->index->cache[spf->count];
 		const char *default_argv;
-		const struct submodule *submodule;
-		struct repository *repo;
-		struct submodule default_submodule = SUBMODULE_INIT;
+		struct get_next_submodule_task *task;
 
 		if (!S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		submodule = submodule_from_path(spf->r, &null_oid, ce->name);
-		if (!submodule) {
-			const char *name = default_name_or_path(ce->name);
-			if (name) {
-				default_submodule.path = name;
-				default_submodule.name = name;
-				submodule = &default_submodule;
-			}
+		task = get_next_submodule_task_create(spf->r, ce->name);
+
+		if (!task->sub) {
+			free(task);
+			continue;
 		}
 
-		switch (get_fetch_recurse_config(submodule, spf))
+		recurse_config = get_fetch_recurse_config(task->sub, spf);
+
+		switch (recurse_config)
 		{
 		default:
 		case RECURSE_SUBMODULES_DEFAULT:
 		case RECURSE_SUBMODULES_ON_DEMAND:
-			if (!submodule ||
+			if (!task->sub ||
 			    !string_list_lookup(
 					&spf->changed_submodule_names,
-					submodule->name))
+					task->sub->name))
 				continue;
 			default_argv = "on-demand";
 			break;
@@ -1302,12 +1352,12 @@ static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
-		repo = get_submodule_repo_for(spf->r, submodule);
-		if (repo) {
+		task->repo = get_submodule_repo_for(spf->r, task->sub);
+		if (task->repo) {
+			struct strbuf submodule_prefix = STRBUF_INIT;
 			child_process_init(cp);
 			prepare_submodule_repo_env_in_gitdir(&cp->env_array);
-			cp->dir = xstrdup(repo->gitdir);
+			cp->dir = task->repo->gitdir;
 			cp->git_cmd = 1;
 			if (!spf->quiet)
 				strbuf_addf(err, "Fetching submodule %s%s\n",
@@ -1316,18 +1366,51 @@ static int get_next_submodule(struct child_process *cp,
 			argv_array_pushv(&cp->args, spf->args.argv);
 			argv_array_push(&cp->args, default_argv);
 			argv_array_push(&cp->args, "--submodule-prefix");
+
+			strbuf_addf(&submodule_prefix, "%s%s/",
+						       spf->prefix,
+						       task->sub->path);
 			argv_array_push(&cp->args, submodule_prefix.buf);
 
-			repo_clear(repo);
-			free(repo);
-			ret = 1;
-		}
-		strbuf_release(&submodule_prefix);
-		if (ret) {
 			spf->count++;
+			*task_cb = task;
+
+			strbuf_release(&submodule_prefix);
 			return 1;
+		} else {
+			get_next_submodule_task_release(task);
+			free(task);
 		}
 	}
+
+	if (spf->retry_nr) {
+		struct get_next_submodule_task *task = spf->retry[spf->retry_nr - 1];
+		struct strbuf submodule_prefix = STRBUF_INIT;
+		spf->retry_nr--;
+
+		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, task->sub->path);
+
+		child_process_init(cp);
+		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
+		cp->git_cmd = 1;
+		cp->dir = task->repo->gitdir;
+
+		argv_array_init(&cp->args);
+		argv_array_pushv(&cp->args, spf->args.argv);
+		argv_array_push(&cp->args, "on-demand");
+		argv_array_push(&cp->args, "--submodule-prefix");
+		argv_array_push(&cp->args, submodule_prefix.buf);
+
+		/* NEEDSWORK: have get_default_remote from s--h */
+		argv_array_push(&cp->args, "origin");
+		oid_array_for_each_unique(task->commits,
+					  append_oid_to_argv, &cp->args);
+
+		*task_cb = task;
+		strbuf_release(&submodule_prefix);
+		return 1;
+	}
+
 	return 0;
 }
 
@@ -1335,20 +1418,68 @@ static int fetch_start_failure(struct strbuf *err,
 			       void *cb, void *task_cb)
 {
 	struct submodule_parallel_fetch *spf = cb;
+	struct get_next_submodule_task *task = task_cb;
 
 	spf->result = 1;
 
+	get_next_submodule_task_release(task);
 	return 0;
 }
 
+static int commit_exists_in_sub(const struct object_id *oid, void *data)
+{
+	struct repository *subrepo = data;
+
+	enum object_type type = oid_object_info(subrepo, oid, NULL);
+
+	return type != OBJ_COMMIT;
+}
+
 static int fetch_finish(int retvalue, struct strbuf *err,
 			void *cb, void *task_cb)
 {
 	struct submodule_parallel_fetch *spf = cb;
+	struct get_next_submodule_task *task = task_cb;
+	const struct submodule *sub;
+
+	struct string_list_item *it;
+	struct oid_array *commits;
 
 	if (retvalue)
 		spf->result = 1;
 
+	if (!task)
+		return 0;
+
+	sub = task->sub;
+	if (!sub)
+		goto out;
+
+	it = string_list_lookup(&spf->changed_submodule_names, sub->name);
+	if (!it)
+		goto out;
+
+	commits = it->util;
+	oid_array_filter(commits,
+			 commit_exists_in_sub,
+			 task->repo);
+
+	/* Are there commits that do not exist? */
+	if (commits->nr) {
+		/* We already tried fetching them, do not try again. */
+		if (task->commits)
+			return 0;
+
+		task->commits = commits;
+		ALLOC_GROW(spf->retry, spf->retry_nr + 1, spf->retry_alloc);
+		spf->retry[spf->retry_nr] = task;
+		spf->retry_nr++;
+		return 0;
+	}
+
+out:
+	get_next_submodule_task_release(task);
+
 	return 0;
 }
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 42692219a1a..af12c50e7dd 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -605,4 +605,20 @@ test_expect_success "fetch new commits when submodule got renamed" '
 	test_cmp expect actual
 '
 
+test_expect_success "fetch new commits on-demand when they are not reachable" '
+	git checkout --detach &&
+	C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
+	git -C submodule update-ref refs/changes/1 $C &&
+	git update-index --cacheinfo 160000 $C submodule &&
+	git commit -m "updated submodule outside of refs/heads" &&
+	D=$(git rev-parse HEAD) &&
+	git update-ref refs/changes/2 $D &&
+	(
+		cd downstream &&
+		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
+		git -C submodule cat-file -t $C &&
+		git checkout --recurse-submodules FETCH_HEAD
+	)
+'
+
 test_done
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 9/9] builtin/fetch: check for submodule updates for non branch fetches
  2018-09-25 19:47 [PATCH v4 0/9] fetch: make sure submodule oids are fetched Stefan Beller
                   ` (7 preceding siblings ...)
  2018-09-25 19:47 ` [PATCH v4 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
@ 2018-09-25 19:47 ` Stefan Beller
  8 siblings, 0 replies; 14+ messages in thread
From: Stefan Beller @ 2018-09-25 19:47 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Gerrit, the code review tool, has a different workflow than our mailing
list based approach. Usually users upload changes to a Gerrit server and
continuous integration and testing happens by bots. Sometimes however a
user wants to checkout a change locally and look at it locally. For this
use case, Gerrit offers a command line snippet to copy and paste to your
terminal, which looks like

  git fetch https://<host>/gerrit refs/changes/<id> &&
  git checkout FETCH_HEAD

For Gerrit changes that contain changing submodule gitlinks, it would be
easy to extend both the fetch and checkout with the '--recurse-submodules'
flag, such that this command line snippet would produce the state of a
change locally.

However the functionality added in the previous patch, which would
ensure that we fetch the objects in the submodule that the gitlink pointed
at, only works for remote tracking branches so far, not for FETCH_HEAD.

Make sure that fetching a superproject to its FETCH_HEAD, also respects
the existence checks for objects in the submodule recursion.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/fetch.c             | 5 ++++-
 t/t5526-fetch-submodules.sh | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e3b03ad3bd3..f2d9e548bf0 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -894,11 +894,14 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 				rc |= update_local_ref(ref, what, rm, &note,
 						       summary_width);
 				free(ref);
-			} else
+			} else {
+				check_for_new_submodule_commits(&rm->old_oid);
 				format_display(&note, '*',
 					       *kind ? kind : "branch", NULL,
 					       *what ? what : "HEAD",
 					       "FETCH_HEAD", summary_width);
+			}
+
 			if (note.len) {
 				if (verbosity >= 0 && !shown_url) {
 					fprintf(stderr, _("From %.*s\n"),
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index af12c50e7dd..a509eabb044 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -615,7 +615,7 @@ test_expect_success "fetch new commits on-demand when they are not reachable" '
 	git update-ref refs/changes/2 $D &&
 	(
 		cd downstream &&
-		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
+		git fetch --recurse-submodules origin refs/changes/2 &&
 		git -C submodule cat-file -t $C &&
 		git checkout --recurse-submodules FETCH_HEAD
 	)
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 3/9] submodule.c: sort changed_submodule_names before searching it
  2018-09-25 19:47 ` [PATCH v4 3/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
@ 2018-09-26 22:14   ` Junio C Hamano
  0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2018-09-26 22:14 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> We can string_list_insert() to maintain sorted-ness of the
> list as we find new items, or we can string_list_append() to
> build an unsorted list and sort it at the end just once.
>
> To pick which one is more appropriate, we notice the fact
> that we discover new items more or less in the already
> sorted order.  That makes "append then sort" more
> appropriate.

I somehow thought that we agreed that the second paragraph above did
not make much sense in the previous review round.

    ... goes and looks ...

https://public-inbox.org/git/CAGZ79kbavjVbTqXsmtjW6=jhkq47_p3mc6=92xOp4_mfhqDtvw@mail.gmail.com/

That was two review cycles ago, I guess.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 1/9] sha1-array: provide oid_array_filter
  2018-09-25 19:47 ` [PATCH v4 1/9] sha1-array: provide oid_array_filter Stefan Beller
@ 2018-09-26 22:17   ` Junio C Hamano
  0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2018-09-26 22:17 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  Documentation/technical/api-oid-array.txt |  5 +++++
>  sha1-array.c                              | 17 +++++++++++++++++
>  sha1-array.h                              |  3 +++
>  3 files changed, 25 insertions(+)

Perfect ;-)

>
> diff --git a/Documentation/technical/api-oid-array.txt b/Documentation/technical/api-oid-array.txt
> index 9febfb1d528..c97428c2c34 100644
> --- a/Documentation/technical/api-oid-array.txt
> +++ b/Documentation/technical/api-oid-array.txt
> @@ -48,6 +48,11 @@ Functions
>  	is not sorted, this function has the side effect of sorting
>  	it.
>  
> +`oid_array_filter`::
> +	Apply the callback function `want` to each entry in the array,
> +	retaining only the entries for which the function returns true.
> +	Preserve the order of the entries that are retained.
> +
>  Examples
>  --------
>  
> diff --git a/sha1-array.c b/sha1-array.c
> index b94e0ec0f5e..d922e94e3fc 100644
> --- a/sha1-array.c
> +++ b/sha1-array.c
> @@ -77,3 +77,20 @@ int oid_array_for_each_unique(struct oid_array *array,
>  	}
>  	return 0;
>  }
> +
> +void oid_array_filter(struct oid_array *array,
> +		      for_each_oid_fn want,
> +		      void *cb_data)
> +{
> +	unsigned nr = array->nr, src, dst;
> +	struct object_id *oids = array->oid;
> +
> +	for (src = dst = 0; src < nr; src++) {
> +		if (want(&oids[src], cb_data)) {
> +			if (src != dst)
> +				oidcpy(&oids[dst], &oids[src]);
> +			dst++;
> +		}
> +	}
> +	array->nr = dst;
> +}
> diff --git a/sha1-array.h b/sha1-array.h
> index 232bf950172..55d016c4bf7 100644
> --- a/sha1-array.h
> +++ b/sha1-array.h
> @@ -22,5 +22,8 @@ int oid_array_for_each(struct oid_array *array,
>  int oid_array_for_each_unique(struct oid_array *array,
>  			      for_each_oid_fn fn,
>  			      void *data);
> +void oid_array_filter(struct oid_array *array,
> +		      for_each_oid_fn want,
> +		      void *cbdata);
>  
>  #endif /* SHA1_ARRAY_H */

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 2/9] submodule.c: fix indentation
  2018-09-25 19:47 ` [PATCH v4 2/9] submodule.c: fix indentation Stefan Beller
@ 2018-09-26 22:18   ` Junio C Hamano
  0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2018-09-26 22:18 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> The submodule subsystem is really bad at staying within 80 characters.
> Fix it while we are here.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---

Makes sense.

>  submodule.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index b53cb6e9c47..0de9e2800ad 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1244,7 +1244,8 @@ static int get_next_submodule(struct child_process *cp,
>  		if (!submodule) {
>  			const char *name = default_name_or_path(ce->name);
>  			if (name) {
> -				default_submodule.path = default_submodule.name = name;
> +				default_submodule.path = name;
> +				default_submodule.name = name;
>  				submodule = &default_submodule;
>  			}
>  		}
> @@ -1254,8 +1255,10 @@ static int get_next_submodule(struct child_process *cp,
>  		default:
>  		case RECURSE_SUBMODULES_DEFAULT:
>  		case RECURSE_SUBMODULES_ON_DEMAND:
> -			if (!submodule || !unsorted_string_list_lookup(&changed_submodule_names,
> -							 submodule->name))
> +			if (!submodule ||
> +			    !unsorted_string_list_lookup(
> +					&changed_submodule_names,
> +					submodule->name))
>  				continue;
>  			default_argv = "on-demand";
>  			break;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 4/9] submodule: move global changed_submodule_names into fetch submodule struct
  2018-09-25 19:47 ` [PATCH v4 4/9] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
@ 2018-09-26 22:19   ` Junio C Hamano
  0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2018-09-26 22:19 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> The `changed_submodule_names` are only used for fetching, so let's make it
> part of the struct that is passed around for fetching submodules.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  submodule.c | 42 +++++++++++++++++++++++-------------------
>  1 file changed, 23 insertions(+), 19 deletions(-)

Yup, the less file-scope static we have and the more of them moved
to a struct, the closer we get to be able to use multiple of them at
the same time, which is a very nice step in the right direction.

>
> diff --git a/submodule.c b/submodule.c
> index 22c64bd8559..17103379ba4 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -25,7 +25,7 @@
>  #include "commit-reach.h"
>  
>  static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
> -static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
> +
>  static int initialized_fetch_ref_tips;
>  static struct oid_array ref_tips_before_fetch;
>  static struct oid_array ref_tips_after_fetch;
> @@ -1110,7 +1110,22 @@ void check_for_new_submodule_commits(struct object_id *oid)
>  	oid_array_append(&ref_tips_after_fetch, oid);
>  }
>  
> -static void calculate_changed_submodule_paths(void)
> +struct submodule_parallel_fetch {
> +	int count;
> +	struct argv_array args;
> +	struct repository *r;
> +	const char *prefix;
> +	int command_line_option;
> +	int default_option;
> +	int quiet;
> +	int result;
> +
> +	struct string_list changed_submodule_names;
> +};
> +#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
> +
> +static void calculate_changed_submodule_paths(
> +	struct submodule_parallel_fetch *spf)
>  {
>  	struct argv_array argv = ARGV_ARRAY_INIT;
>  	struct string_list changed_submodules = STRING_LIST_INIT_DUP;
> @@ -1148,7 +1163,8 @@ static void calculate_changed_submodule_paths(void)
>  			continue;
>  
>  		if (!submodule_has_commits(path, commits))
> -			string_list_append(&changed_submodule_names, name->string);
> +			string_list_append(&spf->changed_submodule_names,
> +					   name->string);
>  	}
>  
>  	free_submodules_oids(&changed_submodules);
> @@ -1185,18 +1201,6 @@ int submodule_touches_in_range(struct object_id *excl_oid,
>  	return ret;
>  }
>  
> -struct submodule_parallel_fetch {
> -	int count;
> -	struct argv_array args;
> -	struct repository *r;
> -	const char *prefix;
> -	int command_line_option;
> -	int default_option;
> -	int quiet;
> -	int result;
> -};
> -#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
> -
>  static int get_fetch_recurse_config(const struct submodule *submodule,
>  				    struct submodule_parallel_fetch *spf)
>  {
> @@ -1257,7 +1261,7 @@ static int get_next_submodule(struct child_process *cp,
>  		case RECURSE_SUBMODULES_ON_DEMAND:
>  			if (!submodule ||
>  			    !string_list_lookup(
> -					&changed_submodule_names,
> +					&spf->changed_submodule_names,
>  					submodule->name))
>  				continue;
>  			default_argv = "on-demand";
> @@ -1349,8 +1353,8 @@ int fetch_populated_submodules(struct repository *r,
>  	argv_array_push(&spf.args, "--recurse-submodules-default");
>  	/* default value, "--submodule-prefix" and its value are added later */
>  
> -	calculate_changed_submodule_paths();
> -	string_list_sort(&changed_submodule_names);
> +	calculate_changed_submodule_paths(&spf);
> +	string_list_sort(&spf.changed_submodule_names);
>  	run_processes_parallel(max_parallel_jobs,
>  			       get_next_submodule,
>  			       fetch_start_failure,
> @@ -1359,7 +1363,7 @@ int fetch_populated_submodules(struct repository *r,
>  
>  	argv_array_clear(&spf.args);
>  out:
> -	string_list_clear(&changed_submodule_names, 1);
> +	string_list_clear(&spf.changed_submodule_names, 1);
>  	return spf.result;
>  }

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-09-26 22:19 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-25 19:47 [PATCH v4 0/9] fetch: make sure submodule oids are fetched Stefan Beller
2018-09-25 19:47 ` [PATCH v4 1/9] sha1-array: provide oid_array_filter Stefan Beller
2018-09-26 22:17   ` Junio C Hamano
2018-09-25 19:47 ` [PATCH v4 2/9] submodule.c: fix indentation Stefan Beller
2018-09-26 22:18   ` Junio C Hamano
2018-09-25 19:47 ` [PATCH v4 3/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
2018-09-26 22:14   ` Junio C Hamano
2018-09-25 19:47 ` [PATCH v4 4/9] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
2018-09-26 22:19   ` Junio C Hamano
2018-09-25 19:47 ` [PATCH v4 5/9] submodule.c: do not copy around submodule list Stefan Beller
2018-09-25 19:47 ` [PATCH v4 6/9] repository: repo_submodule_init to take a submodule struct Stefan Beller
2018-09-25 19:47 ` [PATCH v4 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
2018-09-25 19:47 ` [PATCH v4 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
2018-09-25 19:47 ` [PATCH v4 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).