git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: sbeller@google.com
Cc: gitster@pobox.com, git@vger.kernel.org,
	Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched
Date: Wed, 17 Oct 2018 17:39:54 -0700	[thread overview]
Message-ID: <20181018003954.139498-1-jonathantanmy@google.com> (raw)
In-Reply-To: <20181016181327.107186-9-sbeller@google.com>

> Currently when git-fetch is asked to recurse into submodules, it dispatches
> a plain "git-fetch -C <submodule-dir>" (with some submodule related options
> such as prefix and recusing strategy, but) without any information of the
> remote or the tip that should be fetched.
> 
> This works surprisingly well in some workflows (such as using submodules
> as a third party library), while not so well in other scenarios, such
> as in a Gerrit topic-based workflow, that can tie together changes
> (potentially across repositories) on the server side. One of the parts
> of such a Gerrit workflow is to download a change when wanting to examine
> it, and you'd want to have its submodule changes that are in the same
> topic downloaded as well. However these submodule changes reside in their
> own repository in their own ref (refs/changes/<int>).

It seems that the pertinent situation is when, in the superproject, you
fetch a commit (which may be the target of a ref, or an ancestor of the
target of a ref) that points to a submodule commit that was not fetched
by the default refspec-less fetch that you describe in the first
paragraph. I would just describe it as follows:

  But this default fetch is not sufficient, as a newly fetched commit in
  the superproject could point to a commit in the submodule that is not
  in the default refspec. This is common in workflows like Gerrit's.
  When fetching a Gerrit change under review (from refs/changes/??), the
  commits in that change likely point to submodule commits that have not
  been merged to a branch yet.

Another thing you need to clarify is what happens if the fetch-by-commit
fails. Right now, it seems that it will make the whole thing fail, which
might be a surprising change in behavior.

> Retry fetching a submodule by object name if the object id that the
> superproject points to, cannot be found.

I don't really think of this as a retry - the first time, you're
fetching the default refspec, and the second time, with a specific list
of object IDs. Also, be consistent in the usage of "object name" and
"object id" - as far as I know, they are the same thing.

> This retrying does not happen when the "git fetch" done at the
> superproject is not storing the fetched results in remote
> tracking branches (i.e. instead just recording them to
> FETCH_HEAD) in this step. A later patch will fix this.

The test stores the result in a normal branch, not a remote tracking
branch. Is storing in a normal branch required?

Also, do you know why this is required? A naive reading of the patch
leads me to believe that this should work even if merely fetching to
FETCH_HEAD.

> +struct get_next_submodule_task {
> +	struct repository *repo;
> +	const struct submodule *sub;
> +	unsigned free_sub : 1; /* Do we need to free the submodule? */
> +	struct oid_array *commits;
> +};

Firstly, I don't think "commits" needs to be a pointer.

Document at least "commits". As far as I can tell, if NULL (or empty if
you take my suggestion), this means that this task is the first pass for
this particular submodule and the fetch needs to be done using the
default refspec. Otherwise, this task is the second pass for this
particular submodule and the fetch needs to be done passing the
contained OIDs.

> +static const struct submodule *get_default_submodule(const char *path)
> +{
> +	struct submodule *ret = NULL;
> +	const char *name = default_name_or_path(path);
> +
> +	if (!name)
> +		return NULL;
> +
> +	ret = xmalloc(sizeof(*ret));
> +	memset(ret, 0, sizeof(*ret));
> +	ret->path = name;
> +	ret->name = name;
> +
> +	return (const struct submodule *) ret;
> +}

What is a "default" submodule and why would you need one?

> +		task = get_next_submodule_task_create(spf->r, ce->name);
> +
> +		if (!task->sub) {
> +			free(task);
> +			continue;
>  		}

Will task->sub ever be NULL?

> +	if (spf->retry_nr) {
> +		struct get_next_submodule_task *task = spf->retry[spf->retry_nr - 1];
> +		struct strbuf submodule_prefix = STRBUF_INIT;
> +		spf->retry_nr--;
> +
> +		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, task->sub->path);
> +
> +		child_process_init(cp);
> +		prepare_submodule_repo_env_in_gitdir(&cp->env_array);
> +		cp->git_cmd = 1;
> +		cp->dir = task->repo->gitdir;
> +
> +		argv_array_init(&cp->args);
> +		argv_array_pushv(&cp->args, spf->args.argv);
> +		argv_array_push(&cp->args, "on-demand");

This means that we need to trust that the last entry in spf->args can
take an "on-demand" parameter. Could we supply that argument here
explicitly instead?

> +		argv_array_push(&cp->args, "--submodule-prefix");
> +		argv_array_push(&cp->args, submodule_prefix.buf);
> +
> +		/* NEEDSWORK: have get_default_remote from s--h */

What is s--h?

> +static int commit_exists_in_sub(const struct object_id *oid, void *data)
> +{
> +	struct repository *subrepo = data;
> +
> +	enum object_type type = oid_object_info(subrepo, oid, NULL);
> +
> +	return type != OBJ_COMMIT;
> +}

Shouldn't the function name be commit_missing_in_sub?

>  static int fetch_finish(int retvalue, struct strbuf *err,
>  			void *cb, void *task_cb)
>  {
>  	struct submodule_parallel_fetch *spf = cb;
> +	struct get_next_submodule_task *task = task_cb;
> +	const struct submodule *sub;
> +
> +	struct string_list_item *it;
> +	struct oid_array *commits;
>  
>  	if (retvalue)
>  		spf->result = 1;
>  
> +	if (!task)
> +		return 0;

When will task be NULL?

> +
> +	sub = task->sub;
> +	if (!sub)
> +		goto out;

Same as above - when will sub be NULL?

> +	it = string_list_lookup(&spf->changed_submodule_names, sub->name);
> +	if (!it)
> +		goto out;

And "it" as well.

> +	commits = it->util;
> +	oid_array_filter(commits,
> +			 commit_exists_in_sub,
> +			 task->repo);
> +
> +	/* Are there commits that do not exist? */
> +	if (commits->nr) {
> +		/* We already tried fetching them, do not try again. */
> +		if (task->commits)
> +			return 0;

Clearer and more efficient if the check for task->commits was first
before all the filtering.

> +test_expect_success "fetch new commits on-demand when they are not reachable" '

"not reachable" confused me - they are indeed reachable, just not from
the default refspec.

> +	git checkout --detach &&
> +	C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
> +	git -C submodule update-ref refs/changes/1 $C &&
> +	git update-index --cacheinfo 160000 $C submodule &&
> +	git commit -m "updated submodule outside of refs/heads" &&
> +	D=$(git rev-parse HEAD) &&
> +	git update-ref refs/changes/2 $D &&
> +	(
> +		cd downstream &&
> +		git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
> +		git -C submodule cat-file -t $C &&
> +		git checkout --recurse-submodules FETCH_HEAD
> +	)
> +'

For maximum test coverage, I think this test should involve 2
submodules:

 B   C  E   F
  \ /    \ /
   A      D

and the upstream superproject having:

  G -> H -> I

The downstream superproject already has G, and is fetching I. In H, the
submodule gitlinks point to B and E respectively, and in I, the
submodule gitlinks point to C and F respectively. This ensures that both
multiple submodules work, and that submodule commits are also fetched
for interior superproject commits.

  reply	other threads:[~2018-10-18  0:40 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
2018-10-16 18:13 ` [PATCH 1/9] sha1-array: provide oid_array_filter Stefan Beller
2018-10-16 18:13 ` [PATCH 2/9] submodule.c: fix indentation Stefan Beller
2018-10-16 18:13 ` [PATCH 3/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
2018-10-17 21:21   ` Jonathan Tan
2018-10-16 18:13 ` [PATCH 4/9] submodule.c: move global changed_submodule_names into fetch submodule struct Stefan Beller
2018-10-17 21:26   ` Jonathan Tan
2018-10-18 19:09     ` Stefan Beller
2018-10-16 18:13 ` [PATCH 5/9] submodule.c: do not copy around submodule list Stefan Beller
2018-10-17 21:45   ` Jonathan Tan
2018-10-18  2:35     ` Junio C Hamano
2018-10-16 18:13 ` [PATCH 6/9] repository: repo_submodule_init to take a submodule struct Stefan Beller
2018-10-17 21:55   ` Jonathan Tan
2018-10-16 18:13 ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
2018-10-17 22:58   ` Jonathan Tan
2018-10-23 18:26     ` Stefan Beller
2018-10-23 22:55       ` Jonathan Tan
2018-10-23 23:01         ` Stefan Beller
2018-10-16 18:13 ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
2018-10-18  0:39   ` Jonathan Tan [this message]
2018-10-23 22:37     ` Stefan Beller
2018-10-23 23:37       ` Jonathan Tan
2018-10-25 21:42         ` Stefan Beller
2018-10-16 18:13 ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
2018-10-18  0:47   ` Jonathan Tan
2018-10-18  2:30 ` [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Junio C Hamano
2018-10-18  7:30 ` Junio C Hamano
2018-10-18 18:00   ` Stefan Beller
  -- strict thread matches above, loose matches on Subject: below --
2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
2018-09-17 21:35 ` [PATCHv2 " Stefan Beller
2018-09-17 21:35   ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181018003954.139498-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).