git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Glen Choo <chooglen@google.com>
Cc: git@vger.kernel.org, Benedek Kozma <cyberbeni@gmail.com>
Subject: Re: Bugreport - submodules are fetched twice in some cases
Date: Fri, 13 May 2022 22:24:35 -0700
Message-ID: <xmqqpmkg8z58.fsf@gitster.g>
In-Reply-To: <kl6lczghj7tn.fsf@chooglen-macbookpro.roam.corp.google.com> (Glen Choo's message of "Fri, 13 May 2022 17:07:00 -0700")

Glen Choo <chooglen@google.com> writes:

> And obviously, we aren't passing "--recurse-submodules=false", so there's
> good reason to believe that "--all" will fetch submodules R + 1 times.

Good find.

Given your recent work on enumerating the commits in the submodule
repository that are needed to complement a "git fetch" made in the
superproject, the above finding raises an interesting question.

Imagine that we have two remotes for the current repository, and
this superproject uses one submodule.

When we run "git fetch --all --recurse-submodules", from one remote,
we may grab a range of history in the superproject that mentions
submodule commits C1 and C2 that are not in our clone of the
submodule, while the other remote gives a different range of history
in the superproject that mentions submodule commit C3 that we do not
have.

What should happen in our submodule?  In other words, how do we make
sure that we grab C1, C2 and C3?

Ideally, we would probably want to run a non-recursive fetch of the
superproject twice (i.e. once for each of the two remotes we have),
then traverse the superproject history to find that these three
commits are needed in the submodule, and run a single (possibly
recursive) fetch in the submodule that asks for C1, C2 and C3.  But I
am not sure we are set up to do so.  Does the "parent" process take
a snapshot of our refs before spawning the two "child" fetches (one
per remote) when handling "fetch --all", so that we can later figure
out which superproject commits were obtained from these two remotes?
Without that information, we cannot find out that C1, C2 and C3 are
newly needed in the submodule, so we cannot implement the "fetch from
each remote without recursion, then do a single fetch in the
submodule to grab everything we need at once" approach.
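For illustration only, here is a toy Python model of that "ideal" flow.  It is not Git's code: SuperCommit, reachable() and submodule_commits_to_fetch() are hypothetical names, each superproject commit is simplified to record exactly one gitlink, and it assumes the parent did snapshot the superproject refs (refs_before) before the non-recursive per-remote fetches and reads them again afterwards (refs_after).

```python
# Toy model, not Git's implementation. All names are hypothetical.
# Assumes the parent snapshotted the superproject refs before the
# per-remote (non-recursive) fetches and reads them again afterwards.
from dataclasses import dataclass


@dataclass(frozen=True)
class SuperCommit:
    parents: tuple[str, ...]  # parent commit ids in the superproject
    gitlink: str              # submodule commit this commit records


def reachable(history: dict[str, SuperCommit], tips: set[str]) -> set[str]:
    """Return every superproject commit reachable from the given tips."""
    seen: set[str] = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(history[oid].parents)
    return seen


def submodule_commits_to_fetch(history, refs_before, refs_after, submodule_has):
    """Collect gitlinks from superproject commits that are new since the
    snapshot and that the submodule clone does not have yet."""
    old = reachable(history, set(refs_before.values()))
    new = reachable(history, set(refs_after.values())) - old
    return {history[oid].gitlink for oid in new} - set(submodule_has)


# The scenario above: remote "one" brings B and C (mentioning C1, C2),
# remote "two" brings D (mentioning C3); the submodule only has S0.
history = {
    "A": SuperCommit((), "S0"),
    "B": SuperCommit(("A",), "C1"),
    "C": SuperCommit(("B",), "C2"),
    "D": SuperCommit(("A",), "C3"),
}
refs_before = {"refs/remotes/one/main": "A", "refs/remotes/two/main": "A"}
refs_after = {"refs/remotes/one/main": "C", "refs/remotes/two/main": "D"}

wanted = submodule_commits_to_fetch(history, refs_before, refs_after, {"S0"})
print(sorted(wanted))  # ['C1', 'C2', 'C3']: one submodule fetch can ask for all three
```

Without refs_before there is nothing to subtract, which is exactly why the snapshot question above matters.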

Provided that the "make sure everything needed in the submodule is
fetched by inspecting the range of superproject commits we fetch"
logic works correctly for a single remote, an alternative approach
is to run "git fetch --recurse-submodules" for each remote
separately, without the "parent" process doing anything in the
submodule.  You earlier counted R+1 fetches in the submodule; with
this approach we would make R fetches instead.  It is less than
ideal, but it may be easier to implement.
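A similarly hypothetical sketch of that alternative: each per-remote recursive fetch looks only at the superproject commits its own remote brought in, and the parent adds no extra submodule fetch.  new_gitlinks_by_remote and per_remote_plan() are made-up names; the input stands for what the single-remote logic would compute for each remote, the children are assumed to run one after another, and a child is assumed to skip the submodule fetch when nothing is missing.

```python
# Hypothetical model of "R recursive fetches, nothing extra in the parent".
# Each entry lists the submodule commits mentioned by the superproject
# commits that one remote brought in (as the single-remote logic finds them).


def per_remote_plan(new_gitlinks_by_remote, submodule_has):
    """Return one (superproject remote, missing submodule commits) entry per
    child fetch that actually needs something in the submodule."""
    have = set(submodule_has)
    plan = []
    for remote, gitlinks in new_gitlinks_by_remote.items():
        missing = sorted(set(gitlinks) - have)
        if missing:  # assumption: skip the submodule fetch if nothing is missing
            plan.append((remote, missing))
            # assumption: children run sequentially, so later ones see
            # what earlier ones already fetched into the submodule
            have.update(missing)
    return plan


plan = per_remote_plan({"one": ["C1", "C2"], "two": ["C3"]}, {"S0"})
print(plan)       # [('one', ['C1', 'C2']), ('two', ['C3'])]
print(len(plan))  # 2 submodule fetches for R = 2, instead of R + 1 = 3
```

Compared with the single-fetch model, this gives up one combined request in the submodule, but each child only needs the range it fetched from its own remote, so the parent does not need a snapshot of refs.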

Thoughts?


Thread overview: 17+ messages
2022-04-29 14:46 Bugreport - submodules are fetched twice in some cases Benedek Kozma
2022-04-29 17:39 ` Junio C Hamano
2022-04-29 19:05   ` Glen Choo
2022-04-29 20:02     ` Junio C Hamano
2022-04-29 20:37       ` Glen Choo
2022-05-14  0:07       ` Glen Choo
2022-05-14  5:24         ` Junio C Hamano [this message]
2022-05-16 17:45           ` Glen Choo
2022-05-16 18:25             ` Junio C Hamano
2022-05-16 19:04               ` Junio C Hamano
2022-05-16 21:53                 ` [PATCH] fetch: do not run a redundant fetch from submodule Junio C Hamano
2022-05-16 22:56                   ` Glen Choo
2022-05-16 23:33                     ` Junio C Hamano
2022-05-16 23:53                   ` [PATCH v2] " Junio C Hamano
2022-05-17 16:47                     ` Glen Choo
2022-05-18 15:53                       ` Junio C Hamano
2022-05-14  0:15       ` Bugreport - submodules are fetched twice in some cases Glen Choo
