From: Stefan Beller <sbeller@google.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Junio C Hamano <gitster@pobox.com>, git <git@vger.kernel.org>
Subject: Re: [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched
Date: Tue, 23 Oct 2018 15:37:48 -0700 [thread overview]
Message-ID: <CAGZ79kZrK5G-EeGRzxyw0xW3ozo9_aPab4r8fn_Jc4hzKDOEwg@mail.gmail.com> (raw)
In-Reply-To: <20181018003954.139498-1-jonathantanmy@google.com>
On Wed, Oct 17, 2018 at 5:40 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > Currently when git-fetch is asked to recurse into submodules, it dispatches
> > a plain "git-fetch -C <submodule-dir>" (with some submodule related options
> > such as prefix and recusing strategy, but) without any information of the
> > remote or the tip that should be fetched.
> >
> > This works surprisingly well in some workflows (such as using submodules
> > as a third party library), while not so well in other scenarios, such
> > as in a Gerrit topic-based workflow, that can tie together changes
> > (potentially across repositories) on the server side. One of the parts
> > of such a Gerrit workflow is to download a change when wanting to examine
> > it, and you'd want to have its submodule changes that are in the same
> > topic downloaded as well. However these submodule changes reside in their
> > own repository in their own ref (refs/changes/<int>).
>
> It seems that the pertinent situation is when, in the superproject, you
> fetch a commit (which may be the target of a ref, or an ancestor of the
> target of a ref) that points to a submodule commit that was not fetched
> by the default refspec-less fetch that you describe in the first
> paragraph. I would just describe it as follows:
>
> But this default fetch is not sufficient, as a newly fetched commit in
> the superproject could point to a commit in the submodule that is not
> in the default refspec. This is common in workflows like Gerrit's.
> When fetching a Gerrit change under review (from refs/changes/??), the
> commits in that change likely point to submodule commits that have not
> been merged to a branch yet.
Makes sense.
> Another thing you need to clarify is what happens if the fetch-by-commit
> fails. Right now, it seems that it will make the whole thing fail, which
> might be a surprising change in behavior.
But a positive surprise, I would assume?
> The test stores the result in a normal branch, not a remote tracking
> branch. Is storing in a normal branch required?
In the test we fetch from another repository, such that in the
repository-under-test this will show up in a remote tracking branch?
> Also, do you know why this is required? A naive reading of the patch
> leads me to believe that this should work even if merely fetching to
> FETCH_HEAD.
See the next patch, check_for_new_submodule_commits() is missing
for FETCH_HEAD.
>
> > +struct get_next_submodule_task {
> > + struct repository *repo;
> > + const struct submodule *sub;
> > + unsigned free_sub : 1; /* Do we need to free the submodule? */
> > + struct oid_array *commits;
> > +};
>
> Firstly, I don't think "commits" needs to be a pointer.
>
> Document at least "commits". As far as I can tell, if NULL (or empty if
> you take my suggestion), this means that this task is the first pass for
> this particular submodule and the fetch needs to be done using the
> default refspec. Otherwise, this task is the second pass for this
> particular submodule and the fetch needs to be done passing the
> contained OIDs.
Makes sense. I think I'll make it a non-pointer, but will introduce another
flag or counter for the phase.
>
> > +static const struct submodule *get_default_submodule(const char *path)
> > +{
> > + struct submodule *ret = NULL;
> > + const char *name = default_name_or_path(path);
> > +
> > + if (!name)
> > + return NULL;
> > +
> > + ret = xmalloc(sizeof(*ret));
> > + memset(ret, 0, sizeof(*ret));
> > + ret->path = name;
> > + ret->name = name;
> > +
> > + return (const struct submodule *) ret;
> > +}
>
> What is a "default" submodule and why would you need one?
s/default/artificial/. Such a submodule is a submodule that has no
config in the .gitmodules file and its name == path.
We need to keep those around for historic reasons AFAICT, c.f.
c68f837576 (implement fetching of moved submodules, 2017-10-16)
> > + task = get_next_submodule_task_create(spf->r, ce->name);
> > +
> > + if (!task->sub) {
> > + free(task);
> > + continue;
> > }
>
> Will task->sub ever be NULL?
Yes, if we fall back to these "default" submodule and just try if it
can be handled
as a submodule, but it cannot be handled as such,
get_next_submodule_task_create has
task->sub = submodule_from_path(r, &null_oid, path);
if (!task->sub) {
task->sub = get_default_submodule(path);
and get_default_submodule can return NULL.
>
> > + if (spf->retry_nr) {
> > + struct get_next_submodule_task *task = spf->retry[spf->retry_nr - 1];
> > + struct strbuf submodule_prefix = STRBUF_INIT;
> > + spf->retry_nr--;
> > +
> > + strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, task->sub->path);
> > +
> > + child_process_init(cp);
> > + prepare_submodule_repo_env_in_gitdir(&cp->env_array);
> > + cp->git_cmd = 1;
> > + cp->dir = task->repo->gitdir;
> > +
> > + argv_array_init(&cp->args);
> > + argv_array_pushv(&cp->args, spf->args.argv);
> > + argv_array_push(&cp->args, "on-demand");
>
> This means that we need to trust that the last entry in spf->args can
> take an "on-demand" parameter. Could we supply that argument here
> explicitly instead?
>
> > + argv_array_push(&cp->args, "--submodule-prefix");
> > + argv_array_push(&cp->args, submodule_prefix.buf);
> > +
> > + /* NEEDSWORK: have get_default_remote from s--h */
>
> What is s--h?
builtin/submodule--helper, will clarify
>
> > +static int commit_exists_in_sub(const struct object_id *oid, void *data)
> > +{
> > + struct repository *subrepo = data;
> > +
> > + enum object_type type = oid_object_info(subrepo, oid, NULL);
> > +
> > + return type != OBJ_COMMIT;
> > +}
>
> Shouldn't the function name be commit_missing_in_sub?
yes.
>
> > static int fetch_finish(int retvalue, struct strbuf *err,
> > void *cb, void *task_cb)
> > {
> > struct submodule_parallel_fetch *spf = cb;
> > + struct get_next_submodule_task *task = task_cb;
> > + const struct submodule *sub;
> > +
> > + struct string_list_item *it;
> > + struct oid_array *commits;
> >
> > if (retvalue)
> > spf->result = 1;
> >
> > + if (!task)
> > + return 0;
>
> When will task be NULL?
>
> > +
> > + sub = task->sub;
> > + if (!sub)
> > + goto out;
>
> Same as above - when will sub be NULL?
>
> > + it = string_list_lookup(&spf->changed_submodule_names, sub->name);
> > + if (!it)
> > + goto out;
>
> And "it" as well.
I'll add comments.
>
> > + commits = it->util;
> > + oid_array_filter(commits,
> > + commit_exists_in_sub,
> > + task->repo);
> > +
> > + /* Are there commits that do not exist? */
> > + if (commits->nr) {
> > + /* We already tried fetching them, do not try again. */
> > + if (task->commits)
> > + return 0;
>
> Clearer and more efficient if the check for task->commits was first
> before all the filtering.
>
> > +test_expect_success "fetch new commits on-demand when they are not reachable" '
>
> "not reachable" confused me - they are indeed reachable, just not from
> the default refspec.
Makes sense
>
> > + git checkout --detach &&
> > + C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
> > + git -C submodule update-ref refs/changes/1 $C &&
> > + git update-index --cacheinfo 160000 $C submodule &&
> > + git commit -m "updated submodule outside of refs/heads" &&
> > + D=$(git rev-parse HEAD) &&
> > + git update-ref refs/changes/2 $D &&
> > + (
> > + cd downstream &&
> > + git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
> > + git -C submodule cat-file -t $C &&
> > + git checkout --recurse-submodules FETCH_HEAD
> > + )
> > +'
>
> For maximum test coverage, I think this test should involve 2
> submodules:
>
> B C E F
> \ / \ /
> A D
>
> and the upstream superproject having:
>
> G -> H -> I
>
> The downstream superproject already has G, and is fetching I. In H, the
> submodule gitlinks point to B and E respectively, and in I, the
> submodule gitlinks point to C and F respectively. This ensures that both
> multiple submodules work, and that submodule commits are also fetched
> for interior superproject commits.
ok.
next prev parent reply other threads:[~2018-10-23 22:38 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
2018-10-16 18:13 ` [PATCH 1/9] sha1-array: provide oid_array_filter Stefan Beller
2018-10-16 18:13 ` [PATCH 2/9] submodule.c: fix indentation Stefan Beller
2018-10-16 18:13 ` [PATCH 3/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
2018-10-17 21:21 ` Jonathan Tan
2018-10-16 18:13 ` [PATCH 4/9] submodule.c: move global changed_submodule_names into fetch submodule struct Stefan Beller
2018-10-17 21:26 ` Jonathan Tan
2018-10-18 19:09 ` Stefan Beller
2018-10-16 18:13 ` [PATCH 5/9] submodule.c: do not copy around submodule list Stefan Beller
2018-10-17 21:45 ` Jonathan Tan
2018-10-18 2:35 ` Junio C Hamano
2018-10-16 18:13 ` [PATCH 6/9] repository: repo_submodule_init to take a submodule struct Stefan Beller
2018-10-17 21:55 ` Jonathan Tan
2018-10-16 18:13 ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
2018-10-17 22:58 ` Jonathan Tan
2018-10-23 18:26 ` Stefan Beller
2018-10-23 22:55 ` Jonathan Tan
2018-10-23 23:01 ` Stefan Beller
2018-10-16 18:13 ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
2018-10-18 0:39 ` Jonathan Tan
2018-10-23 22:37 ` Stefan Beller [this message]
2018-10-23 23:37 ` Jonathan Tan
2018-10-25 21:42 ` Stefan Beller
2018-10-16 18:13 ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
2018-10-18 0:47 ` Jonathan Tan
2018-10-18 2:30 ` [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Junio C Hamano
2018-10-18 7:30 ` Junio C Hamano
2018-10-18 18:00 ` Stefan Beller
-- strict thread matches above, loose matches on Subject: below --
2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
2018-09-17 21:35 ` [PATCHv2 " Stefan Beller
2018-09-17 21:35 ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAGZ79kZrK5G-EeGRzxyw0xW3ozo9_aPab4r8fn_Jc4hzKDOEwg@mail.gmail.com \
--to=sbeller@google.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).