git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Junio C Hamano <gitster@pobox.com>, git <git@vger.kernel.org>
Subject: Re: [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched
Date: Tue, 23 Oct 2018 15:37:48 -0700
Message-ID: <CAGZ79kZrK5G-EeGRzxyw0xW3ozo9_aPab4r8fn_Jc4hzKDOEwg@mail.gmail.com> (raw)
In-Reply-To: <20181018003954.139498-1-jonathantanmy@google.com>

On Wed, Oct 17, 2018 at 5:40 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > Currently when git-fetch is asked to recurse into submodules, it dispatches
> > a plain "git-fetch -C <submodule-dir>" (with some submodule related options
> > such as prefix and recusing strategy, but) without any information of the
> > remote or the tip that should be fetched.
> >
> > This works surprisingly well in some workflows (such as using submodules
> > as a third party library), while not so well in other scenarios, such
> > as in a Gerrit topic-based workflow, that can tie together changes
> > (potentially across repositories) on the server side. One of the parts
> > of such a Gerrit workflow is to download a change when wanting to examine
> > it, and you'd want to have its submodule changes that are in the same
> > topic downloaded as well. However these submodule changes reside in their
> > own repository in their own ref (refs/changes/<int>).
>
> It seems that the pertinent situation is when, in the superproject, you
> fetch a commit (which may be the target of a ref, or an ancestor of the
> target of a ref) that points to a submodule commit that was not fetched
> by the default refspec-less fetch that you describe in the first
> paragraph. I would just describe it as follows:
>
>   But this default fetch is not sufficient, as a newly fetched commit in
>   the superproject could point to a commit in the submodule that is not
>   in the default refspec. This is common in workflows like Gerrit's.
>   When fetching a Gerrit change under review (from refs/changes/??), the
>   commits in that change likely point to submodule commits that have not
>   been merged to a branch yet.

Makes sense.

> Another thing you need to clarify is what happens if the fetch-by-commit
> fails. Right now, it seems that it will make the whole thing fail, which
> might be a surprising change in behavior.

But a positive surprise, I would assume?

> The test stores the result in a normal branch, not a remote tracking
> branch. Is storing in a normal branch required?

In the test we fetch from another repository, such that in the
repository-under-test this will show up in a remote tracking branch?

> Also, do you know why this is required? A naive reading of the patch
> leads me to believe that this should work even if merely fetching to
> FETCH_HEAD.

See the next patch, check_for_new_submodule_commits() is missing
for FETCH_HEAD.

>
> > +struct get_next_submodule_task {
> > +     struct repository *repo;
> > +     const struct submodule *sub;
> > +     unsigned free_sub : 1; /* Do we need to free the submodule? */
> > +     struct oid_array *commits;
> > +};
>
> Firstly, I don't think "commits" needs to be a pointer.
>
> Document at least "commits". As far as I can tell, if NULL (or empty if
> you take my suggestion), this means that this task is the first pass for
> this particular submodule and the fetch needs to be done using the
> default refspec. Otherwise, this task is the second pass for this
> particular submodule and the fetch needs to be done passing the
> contained OIDs.

Makes sense. I think I'll make it a non-pointer, but will introduce another
flag or counter for the phase.

>
> > +static const struct submodule *get_default_submodule(const char *path)
> > +{
> > +     struct submodule *ret = NULL;
> > +     const char *name = default_name_or_path(path);
> > +
> > +     if (!name)
> > +             return NULL;
> > +
> > +     ret = xmalloc(sizeof(*ret));
> > +     memset(ret, 0, sizeof(*ret));
> > +     ret->path = name;
> > +     ret->name = name;
> > +
> > +     return (const struct submodule *) ret;
> > +}
>
> What is a "default" submodule and why would you need one?

s/default/artificial/. Such a submodule is a submodule that has no
config in the .gitmodules file and its name == path.
We need to keep those around for historic reasons AFAICT, c.f.
c68f837576 (implement fetching of moved submodules, 2017-10-16)

> > +             task = get_next_submodule_task_create(spf->r, ce->name);
> > +
> > +             if (!task->sub) {
> > +                     free(task);
> > +                     continue;
> >               }
>
> Will task->sub ever be NULL?

Yes, if we fall back to these "default" submodule and just try if it
can be handled
as a submodule, but it cannot be handled as such,
get_next_submodule_task_create has

    task->sub = submodule_from_path(r, &null_oid, path);
    if (!task->sub) {
        task->sub = get_default_submodule(path);

and get_default_submodule can return NULL.


>
> > +     if (spf->retry_nr) {
> > +             struct get_next_submodule_task *task = spf->retry[spf->retry_nr - 1];
> > +             struct strbuf submodule_prefix = STRBUF_INIT;
> > +             spf->retry_nr--;
> > +
> > +             strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, task->sub->path);
> > +
> > +             child_process_init(cp);
> > +             prepare_submodule_repo_env_in_gitdir(&cp->env_array);
> > +             cp->git_cmd = 1;
> > +             cp->dir = task->repo->gitdir;
> > +
> > +             argv_array_init(&cp->args);
> > +             argv_array_pushv(&cp->args, spf->args.argv);
> > +             argv_array_push(&cp->args, "on-demand");
>
> This means that we need to trust that the last entry in spf->args can
> take an "on-demand" parameter. Could we supply that argument here
> explicitly instead?
>
> > +             argv_array_push(&cp->args, "--submodule-prefix");
> > +             argv_array_push(&cp->args, submodule_prefix.buf);
> > +
> > +             /* NEEDSWORK: have get_default_remote from s--h */
>
> What is s--h?

builtin/submodule--helper, will clarify

>
> > +static int commit_exists_in_sub(const struct object_id *oid, void *data)
> > +{
> > +     struct repository *subrepo = data;
> > +
> > +     enum object_type type = oid_object_info(subrepo, oid, NULL);
> > +
> > +     return type != OBJ_COMMIT;
> > +}
>
> Shouldn't the function name be commit_missing_in_sub?

yes.

>
> >  static int fetch_finish(int retvalue, struct strbuf *err,
> >                       void *cb, void *task_cb)
> >  {
> >       struct submodule_parallel_fetch *spf = cb;
> > +     struct get_next_submodule_task *task = task_cb;
> > +     const struct submodule *sub;
> > +
> > +     struct string_list_item *it;
> > +     struct oid_array *commits;
> >
> >       if (retvalue)
> >               spf->result = 1;
> >
> > +     if (!task)
> > +             return 0;
>
> When will task be NULL?
>
> > +
> > +     sub = task->sub;
> > +     if (!sub)
> > +             goto out;
>
> Same as above - when will sub be NULL?
>
> > +     it = string_list_lookup(&spf->changed_submodule_names, sub->name);
> > +     if (!it)
> > +             goto out;
>
> And "it" as well.

I'll add comments.

>
> > +     commits = it->util;
> > +     oid_array_filter(commits,
> > +                      commit_exists_in_sub,
> > +                      task->repo);
> > +
> > +     /* Are there commits that do not exist? */
> > +     if (commits->nr) {
> > +             /* We already tried fetching them, do not try again. */
> > +             if (task->commits)
> > +                     return 0;
>
> Clearer and more efficient if the check for task->commits was first
> before all the filtering.
>
> > +test_expect_success "fetch new commits on-demand when they are not reachable" '
>
> "not reachable" confused me - they are indeed reachable, just not from
> the default refspec.

Makes sense

>
> > +     git checkout --detach &&
> > +     C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) &&
> > +     git -C submodule update-ref refs/changes/1 $C &&
> > +     git update-index --cacheinfo 160000 $C submodule &&
> > +     git commit -m "updated submodule outside of refs/heads" &&
> > +     D=$(git rev-parse HEAD) &&
> > +     git update-ref refs/changes/2 $D &&
> > +     (
> > +             cd downstream &&
> > +             git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch &&
> > +             git -C submodule cat-file -t $C &&
> > +             git checkout --recurse-submodules FETCH_HEAD
> > +     )
> > +'
>
> For maximum test coverage, I think this test should involve 2
> submodules:
>
>  B   C  E   F
>   \ /    \ /
>    A      D
>
> and the upstream superproject having:
>
>   G -> H -> I
>
> The downstream superproject already has G, and is fetching I. In H, the
> submodule gitlinks point to B and E respectively, and in I, the
> submodule gitlinks point to C and F respectively. This ensures that both
> multiple submodules work, and that submodule commits are also fetched
> for interior superproject commits.

ok.

  reply index

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-16 18:13 [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Stefan Beller
2018-10-16 18:13 ` [PATCH 1/9] sha1-array: provide oid_array_filter Stefan Beller
2018-10-16 18:13 ` [PATCH 2/9] submodule.c: fix indentation Stefan Beller
2018-10-16 18:13 ` [PATCH 3/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
2018-10-17 21:21   ` Jonathan Tan
2018-10-16 18:13 ` [PATCH 4/9] submodule.c: move global changed_submodule_names into fetch submodule struct Stefan Beller
2018-10-17 21:26   ` Jonathan Tan
2018-10-18 19:09     ` Stefan Beller
2018-10-16 18:13 ` [PATCH 5/9] submodule.c: do not copy around submodule list Stefan Beller
2018-10-17 21:45   ` Jonathan Tan
2018-10-18  2:35     ` Junio C Hamano
2018-10-16 18:13 ` [PATCH 6/9] repository: repo_submodule_init to take a submodule struct Stefan Beller
2018-10-17 21:55   ` Jonathan Tan
2018-10-16 18:13 ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
2018-10-17 22:58   ` Jonathan Tan
2018-10-23 18:26     ` Stefan Beller
2018-10-23 22:55       ` Jonathan Tan
2018-10-23 23:01         ` Stefan Beller
2018-10-16 18:13 ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
2018-10-18  0:39   ` Jonathan Tan
2018-10-23 22:37     ` Stefan Beller [this message]
2018-10-23 23:37       ` Jonathan Tan
2018-10-25 21:42         ` Stefan Beller
2018-10-16 18:13 ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
2018-10-18  0:47   ` Jonathan Tan
2018-10-18  2:30 ` [PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip Junio C Hamano
2018-10-18  7:30 ` Junio C Hamano
2018-10-18 18:00   ` Stefan Beller
  -- strict thread matches above, loose matches on Subject: below --
2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
2018-09-17 21:35 ` [PATCHv2 " Stefan Beller
2018-09-17 21:35   ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kZrK5G-EeGRzxyw0xW3ozo9_aPab4r8fn_Jc4hzKDOEwg@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox