git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* Bugreport - submodules are fetched twice in some cases
@ 2022-04-29 14:46 Benedek Kozma
  2022-04-29 17:39 ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Benedek Kozma @ 2022-04-29 14:46 UTC (permalink / raw)
  To: git

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)
Run the following command with 1 remote:
`git fetch --all --prune --prune-tags`
I have the following in my config that could affect this:
```
[submodule]
recurse = true
[fetch]
parallel = 0
```

What did you expect to happen? (Expected behavior)
Submodules are only fetched once.

What happened instead? (Actual behavior)
Submodules are fetched twice.

What's different between what you expected and what actually happened?
The process takes longer

Anything else you want to add:

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.


[System Info]
git version:
git version 2.36.0
cpu: arm64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
uname: Darwin 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:46:32
PDT 2022; root:xnu-8020.101.4~15/RELEASE_ARM64_T6000 arm64
compiler info: clang: 13.1.6 (clang-1316.0.21.2)
libc info: no libc information available
$SHELL (typically, interactive shell): /bin/zsh


[Enabled Hooks]
pre-commit

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bugreport - submodules are fetched twice in some cases
  2022-04-29 14:46 Bugreport - submodules are fetched twice in some cases Benedek Kozma
@ 2022-04-29 17:39 ` Junio C Hamano
  2022-04-29 19:05   ` Glen Choo
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2022-04-29 17:39 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Benedek Kozma

A simple and concrete reproduction 

    git init top
    cd top
    date >file1
    git init sub
    cd sub
    date >subfile1
    git add .
    git commit -m subinitial
    cd .. ;# back to top
    git submodule add ./sub sub
    git add file1
    git commit -m initial
    cd .. ;# out of top
    git clone --recurse-submodules top copy
    cd copy
    git config submodule.recurse true
    git config fetch.parallel 0
    GIT_TRACE2=$(pwd)/trace git fetch --all --prune --prune-tags

This throws the three lines to the output.

Fetching origin
Fetching submodule sub
Fetching submodule sub

The two "Fetching submodule" messages are coming from two separate
calls to get_fetch_task_from_index(), and the trace does show that
the code is doing "git-upload-pack" three times (one for the top
level, twice for the same top/sub).  We can see it by grepping
for "git-upload-pack" in the resulting 'trace' file above.

Glen, as submodule.c::fetch_submodules() was created in your heavy
refactoring quite recently, I thought I'd redirect this report in
your direction, as I expect you'd be the most clueful in this area
;-)

Thanks.




Benedek Kozma <cyberbeni@gmail.com> writes:

> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
>
> What did you do before the bug happened? (Steps to reproduce your issue)
> Run the following command with 1 remote:
> `git fetch --all --prune --prune-tags`
> I have the following in my config that could affect this:
> ```
> [submodule]
> recurse = true
> [fetch]
> parallel = 0
> ```
>
> What did you expect to happen? (Expected behavior)
> Submodules are only fetched once.
>
> What happened instead? (Actual behavior)
> Submodules are fetched twice.
>
> What's different between what you expected and what actually happened?
> The process takes longer
>
> Anything else you want to add:
>
> Please review the rest of the bug report below.
> You can delete any lines you don't wish to share.
>
>
> [System Info]
> git version:
> git version 2.36.0
> cpu: arm64
> no commit associated with this build
> sizeof-long: 8
> sizeof-size_t: 8
> shell-path: /bin/sh
> feature: fsmonitor--daemon
> uname: Darwin 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:46:32
> PDT 2022; root:xnu-8020.101.4~15/RELEASE_ARM64_T6000 arm64
> compiler info: clang: 13.1.6 (clang-1316.0.21.2)
> libc info: no libc information available
> $SHELL (typically, interactive shell): /bin/zsh
>
>
> [Enabled Hooks]
> pre-commit

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bugreport - submodules are fetched twice in some cases
  2022-04-29 17:39 ` Junio C Hamano
@ 2022-04-29 19:05   ` Glen Choo
  2022-04-29 20:02     ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Glen Choo @ 2022-04-29 19:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Benedek Kozma

Junio C Hamano <gitster@pobox.com> writes:

> A simple and concrete reproduction 
>
>     git init top
>     cd top
>     date >file1
>     git init sub
>     cd sub
>     date >subfile1
>     git add .
>     git commit -m subinitial
>     cd .. ;# back to top
>     git submodule add ./sub sub
>     git add file1
>     git commit -m initial
>     cd .. ;# out of top
>     git clone --recurse-submodules top copy
>     cd copy
>     git config submodule.recurse true
>     git config fetch.parallel 0
>     GIT_TRACE2=$(pwd)/trace git fetch --all --prune --prune-tags
>
> This throws the three lines to the output.
>
> Fetching origin
> Fetching submodule sub
> Fetching submodule sub
>
> The two "Fetching submodule" messages are coming from two separate
> calls to get_fetch_task_from_index(), and the trace does show that
> the code is doing "git-upload-pack" three times (one for the top
> level, twice for the same top/sub).  We can see it by grepping
> for "git-upload-pack" in the resulting 'trace' file above.

 
Thanks for the reproduction recipe and findings, that'll be very helpful
:)

> Glen, as submodule.c::fetch_submodules() was created in your heavy
> refactoring quite recently, I thought I'd redirect this report in
> your direction, as I expect you'd be the most clueful in this area
> ;-)

Hm, this does look like something that I probably introduced. But even
if it turns out to be older than that, I think I am the right person to
fix it, yes.

I'm a little caught up with the embedded bare repo work, but I think I
can find time to debug this within the next 2 working days or so.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bugreport - submodules are fetched twice in some cases
  2022-04-29 19:05   ` Glen Choo
@ 2022-04-29 20:02     ` Junio C Hamano
  2022-04-29 20:37       ` Glen Choo
                         ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Junio C Hamano @ 2022-04-29 20:02 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Benedek Kozma

Glen Choo <chooglen@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> A simple and concrete reproduction 
>>
>>     git init top
>>     cd top
>>     date >file1
>>     git init sub
>>     cd sub
>>     date >subfile1
>>     git add .
>>     git commit -m subinitial
>>     cd .. ;# back to top
>>     git submodule add ./sub sub
>>     git add file1
>>     git commit -m initial
>>     cd .. ;# out of top
>>     git clone --recurse-submodules top copy
>>     cd copy
>>     git config submodule.recurse true
>>     git config fetch.parallel 0
>>     GIT_TRACE2=$(pwd)/trace git fetch --all --prune --prune-tags
>>
>> This throws the three lines to the output.
>>
>> Fetching origin
>> Fetching submodule sub
>> Fetching submodule sub
>>
>> The two "Fetching submodule" messages are coming from two separate
>> calls to get_fetch_task_from_index(), and the trace does show that
>> the code is doing "git-upload-pack" three times (one for the top
>> level, twice for the same top/sub).  We can see it by grepping
>> for "git-upload-pack" in the resulting 'trace' file above.
>
>  
> Thanks for the reproduction recipe and findings, that'll be very helpful
> :)
>
>> Glen, as submodule.c::fetch_submodules() was created in your heavy
>> refactoring quite recently, I thought I'd redirect this report in
>> your direction, as I expect you'd be the most clueful in this area
>> ;-)
>
> Hm, this does look like something that I probably introduced. But even
> if it turns out to be older than that, I think I am the right person to
> fix it, yes.

It seems that ever since the introduction of the --prune-tags option
at v2.16.1-16-g97716d217c (fetch: add a --prune-tags option and
fetch.pruneTags config, 2018-02-09), we always behaved this way.

Without "--prune-tags" (but still with "--prune"), we can go even
older than that version, and v2.10.0 seems to fetch only once.

And the command keeps working that way all the way back to the
commit that starts honoring submodule.recurse configuration, at
v2.13.0-137-g58f4203e7d (builtin/fetch.c: respect
'submodule.recurse' option, 2017-05-31)

If we instead use "git fetch --recurse-submodules" with versions
of GIt older than that, we can go even older.  I saw v2.5.0 behaves
that way before I got tired and gave up.

So, we still would want to eventually get to it, but no rush.  This
is an old thing and not as urgent as fixing a recent regression.

FWIW, without "--all", we do not make an extra fetch at all, with
the current code or with code as ancient as v2.5.0

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bugreport - submodules are fetched twice in some cases
  2022-04-29 20:02     ` Junio C Hamano
@ 2022-04-29 20:37       ` Glen Choo
  2022-05-14  0:07       ` Glen Choo
  2022-05-14  0:15       ` Bugreport - submodules are fetched twice in some cases Glen Choo
  2 siblings, 0 replies; 17+ messages in thread
From: Glen Choo @ 2022-04-29 20:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Benedek Kozma

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> A simple and concrete reproduction 
>>>
>>>     git init top
>>>     cd top
>>>     date >file1
>>>     git init sub
>>>     cd sub
>>>     date >subfile1
>>>     git add .
>>>     git commit -m subinitial
>>>     cd .. ;# back to top
>>>     git submodule add ./sub sub
>>>     git add file1
>>>     git commit -m initial
>>>     cd .. ;# out of top
>>>     git clone --recurse-submodules top copy
>>>     cd copy
>>>     git config submodule.recurse true
>>>     git config fetch.parallel 0
>>>     GIT_TRACE2=$(pwd)/trace git fetch --all --prune --prune-tags
>>>
>>> This throws the three lines to the output.
>>>
>>> Fetching origin
>>> Fetching submodule sub
>>> Fetching submodule sub
>>>
>>> The two "Fetching submodule" messages are coming from two separate
>>> calls to get_fetch_task_from_index(), and the trace does show that
>>> the code is doing "git-upload-pack" three times (one for the top
>>> level, twice for the same top/sub).  We can see it by grepping
>>> for "git-upload-pack" in the resulting 'trace' file above.
>>
>>  
>> Thanks for the reproduction recipe and findings, that'll be very helpful
>> :)
>>
>>> Glen, as submodule.c::fetch_submodules() was created in your heavy
>>> refactoring quite recently, I thought I'd redirect this report in
>>> your direction, as I expect you'd be the most clueful in this area
>>> ;-)
>>
>> Hm, this does look like something that I probably introduced. But even
>> if it turns out to be older than that, I think I am the right person to
>> fix it, yes.
>
> It seems that ever since the introduction of the --prune-tags option
> at v2.16.1-16-g97716d217c (fetch: add a --prune-tags option and
> fetch.pruneTags config, 2018-02-09), we always behaved this way.
>
> Without "--prune-tags" (but still with "--prune"), we can go even
> older than that version, and v2.10.0 seems to fetch only once.
>
> And the command keeps working that way all the way back to the
> commit that starts honoring submodule.recurse configuration, at
> v2.13.0-137-g58f4203e7d (builtin/fetch.c: respect
> 'submodule.recurse' option, 2017-05-31)
>
> If we instead use "git fetch --recurse-submodules" with versions
> of GIt older than that, we can go even older.  I saw v2.5.0 behaves
> that way before I got tired and gave up.
>
> So, we still would want to eventually get to it, but no rush.  This
> is an old thing and not as urgent as fixing a recent regression.

Ah, thanks a lot for investigating even further. I'll still take a look
when I can, though this alleviates the time pressure a lot :)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bugreport - submodules are fetched twice in some cases
  2022-04-29 20:02     ` Junio C Hamano
  2022-04-29 20:37       ` Glen Choo
@ 2022-05-14  0:07       ` Glen Choo
  2022-05-14  5:24         ` Junio C Hamano
  2022-05-14  0:15       ` Bugreport - submodules are fetched twice in some cases Glen Choo
  2 siblings, 1 reply; 17+ messages in thread
From: Glen Choo @ 2022-05-14  0:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Benedek Kozma

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> A simple and concrete reproduction 
>>>
>>>     git init top
>>>     cd top
>>>     date >file1
>>>     git init sub
>>>     cd sub
>>>     date >subfile1
>>>     git add .
>>>     git commit -m subinitial
>>>     cd .. ;# back to top
>>>     git submodule add ./sub sub
>>>     git add file1
>>>     git commit -m initial
>>>     cd .. ;# out of top
>>>     git clone --recurse-submodules top copy
>>>     cd copy
>>>     git config submodule.recurse true
>>>     git config fetch.parallel 0
>>>     GIT_TRACE2=$(pwd)/trace git fetch --all --prune --prune-tags
>>>
>>> This throws the three lines to the output.
>>>
>>> Fetching origin
>>> Fetching submodule sub
>>> Fetching submodule sub
>>>
>>> The two "Fetching submodule" messages are coming from two separate
>>> calls to get_fetch_task_from_index(), and the trace does show that
>>> the code is doing "git-upload-pack" three times (one for the top
>>> level, twice for the same top/sub).  We can see it by grepping
>>> for "git-upload-pack" in the resulting 'trace' file above.
>>
>>  
>> Thanks for the reproduction recipe and findings, that'll be very helpful
>> :)
>>
>>> Glen, as submodule.c::fetch_submodules() was created in your heavy
>>> refactoring quite recently, I thought I'd redirect this report in
>>> your direction, as I expect you'd be the most clueful in this area
>>> ;-)
>>
>> Hm, this does look like something that I probably introduced. But even
>> if it turns out to be older than that, I think I am the right person to
>> fix it, yes.
>
> It seems that ever since the introduction of the --prune-tags option
> at v2.16.1-16-g97716d217c (fetch: add a --prune-tags option and
> fetch.pruneTags config, 2018-02-09), we always behaved this way.
>
> Without "--prune-tags" (but still with "--prune"), we can go even
> older than that version, and v2.10.0 seems to fetch only once.
>
> And the command keeps working that way all the way back to the
> commit that starts honoring submodule.recurse configuration, at
> v2.13.0-137-g58f4203e7d (builtin/fetch.c: respect
> 'submodule.recurse' option, 2017-05-31)
>
> If we instead use "git fetch --recurse-submodules" with versions
> of GIt older than that, we can go even older.  I saw v2.5.0 behaves
> that way before I got tired and gave up.
>
> So, we still would want to eventually get to it, but no rush.  This
> is an old thing and not as urgent as fixing a recent regression.
>
> FWIW, without "--all", we do not make an extra fetch at all, with
> the current code or with code as ancient as v2.5.0

I had a little free time, so I dug into it a bit more. "--all" does seem
to be the crux of it and I _very strongly_ suspect it is because "--all"
creates a child process which also fetches submodules.

Here's a hint inside of builtin/fetch.c:fetch_multiple..

  static int fetch_multiple(struct string_list *list, int max_children)
  {
    // ....
    strvec_pushl(&argv, "fetch", "--append", "--no-auto-gc", "--no-write-commit-graph", NULL);

Notice how we pass "--no-auto-gc" and "--no-write-commit-graph"? This
makes sense because those tasks run _after_ the main fetch and only need
to run once. If we didn't pass these flags, then we'd run those tasks
once per remote, and once more after the main fetch (R + 1 times, where
R is the number of remotes).

And obviously, we aren't passing "--recurse-submodules=false", so there's
good reason to believe that "--all" will fetch submodules R + 1 times.

But confusingly, I can't validate this hypothesis because I can't get
the "inner fetch" to respect "--recurse-submodules" (I hope this isn't
due to that recent change I made).

I did most of this testing without "--prune" or "--prune-tags". Your
findings suggest that they're somewhat relevant though, so maybe I'll
look at those versions to get more clues.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bugreport - submodules are fetched twice in some cases
  2022-04-29 20:02     ` Junio C Hamano
  2022-04-29 20:37       ` Glen Choo
  2022-05-14  0:07       ` Glen Choo
@ 2022-05-14  0:15       ` Glen Choo
  2 siblings, 0 replies; 17+ messages in thread
From: Glen Choo @ 2022-05-14  0:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Benedek Kozma

> FWIW, without "--all", we do not make an extra fetch at all, with
> the current code or with code as ancient as v2.5.0

Aha, the reason why I wasn't able to override "--recurse-submodules" is
that I didn't notice add_options_to_argv() was setting
"--recurse-submodules" for each of the 'inner' fetch-es in
fetch_multiple().

So a PoC fix might look like this

----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e3791f09ed..d42646e68c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1856,10 +1856,6 @@ static void add_options_to_argv(struct strvec *argv)
 		strvec_push(argv, "--force");
 	if (keep)
 		strvec_push(argv, "--keep");
-	if (recurse_submodules == RECURSE_SUBMODULES_ON)
-		strvec_push(argv, "--recurse-submodules");
-	else if (recurse_submodules == RECURSE_SUBMODULES_ON_DEMAND)
-		strvec_push(argv, "--recurse-submodules=on-demand");
 	if (tags == TAGS_SET)
 		strvec_push(argv, "--tags");
 	else if (tags == TAGS_UNSET)
@@ -1942,8 +1938,7 @@ static int fetch_multiple(struct string_list *list, int max_children)
 			return errcode;
 	}
 
-	strvec_pushl(&argv, "fetch", "--append", "--no-auto-gc",
-		     "--no-write-commit-graph", NULL);
+	strvec_pushl(&argv, "fetch", "--append", "--no-auto-gc", "--no-write-commit-graph", "--recurse-submodules=false", NULL);
 	add_options_to_argv(&argv);
 
 	if (max_children != 1 && list->nr != 1) {
@@ -2271,6 +2266,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 			max_children = fetch_parallel_config;
 
 		add_options_to_argv(&options);
+		if (recurse_submodules == RECURSE_SUBMODULES_ON)
+			strvec_push(&options, "--recurse-submodules");
+		else if (recurse_submodules == RECURSE_SUBMODULES_ON_DEMAND)
+			strvec_push(&options, "--recurse-submodules=on-demand");
 		result = fetch_submodules(the_repository,
 					  &options,
 					  submodule_prefix,

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bugreport - submodules are fetched twice in some cases
  2022-05-14  0:07       ` Glen Choo
@ 2022-05-14  5:24         ` Junio C Hamano
  2022-05-16 17:45           ` Glen Choo
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2022-05-14  5:24 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Benedek Kozma

Glen Choo <chooglen@google.com> writes:

> And obviously, we aren't passing "--recurse-submodules=false", so there's
> good reason to believe that "--all" will fetch submodules R + 1 times.

Good find.

Given your recent work on enumerating the commits in the submodule
repository that are needed to complement "git fetch" made in the
superproject, the above finding raises an interesting question.

Imagine that we have two remotes for the current repository, and
this superproject uses one submodule.

When we run "git fetch --all --resurse-submodules", from one remote,
we may grab a range of history in the superproject that mentions
submodule commits C1 and C2 that are not in our clone of the
submodule, while the other remote gives a different range of history
in the superproject that mentions submodule commit C3 that we do not
have.

What should happen in our submodule?  In other words, how do we make
sure that we grab C1, C2 and C3?

Ideally, we probably would want to run a non-recursive fetch of the
superproject twice (i.e. once for each of the two remotes we have),
then traverse the superproject history to find that these three
commits are needed in the submodule, and run a single (possibly
recursive) fetch in the submodule and ask for C1, C2 and C3.  But I
am not sure if we are set up to do so.  Does the "parent" process
take a snapshot of our refs before spawning the two "child" fetches
for each remote when handling "fetch --all", so that we can later
figure out what superproject commits were obtained during the
fetches from these two remotes?  Without that information, we cannot
find out that C1, C2 and C3 are new in the submodule, so we cannot
implement the "fetch without recursion from each remote and then do
a single fetch in submodule to grab everything we need at once"
approach.

Provided if we have the "make sure everything needed in the
submodule is fetched by inspecting the range of commits we fetch for
a superproject" working correctly for a single remote, an
alternative approach is to run "git fetch --recurse-submodules" for
each remote separately, without the "parent" process doing anything
in the submodule (i.e. you earlier counted R+1 fetches, but instead,
we make R fetches in the submodule.  It is less than ideal but it
may be easier to implement).

Thoughts?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bugreport - submodules are fetched twice in some cases
  2022-05-14  5:24         ` Junio C Hamano
@ 2022-05-16 17:45           ` Glen Choo
  2022-05-16 18:25             ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Glen Choo @ 2022-05-16 17:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Benedek Kozma

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> And obviously, we aren't passing "--recurse-submodules=false", so there's
>> good reason to believe that "--all" will fetch submodules R + 1 times.
>
> Good find.
>
> Given your recent work on enumerating the commits in the submodule
> repository that are needed to complement "git fetch" made in the
> superproject, the above finding raises an interesting question.
>
> Imagine that we have two remotes for the current repository, and
> this superproject uses one submodule.
>
> When we run "git fetch --all --resurse-submodules", from one remote,
> we may grab a range of history in the superproject that mentions
> submodule commits C1 and C2 that are not in our clone of the
> submodule, while the other remote gives a different range of history
> in the superproject that mentions submodule commit C3 that we do not
> have.
>
> What should happen in our submodule?  In other words, how do we make
> sure that we grab C1, C2 and C3?
>
> Ideally, we probably would want to run a non-recursive fetch of the
> superproject twice (i.e. once for each of the two remotes we have),
> then traverse the superproject history to find that these three
> commits are needed in the submodule, and run a single (possibly
> recursive) fetch in the submodule and ask for C1, C2 and C3.  But I
> am not sure if we are set up to do so.  Does the "parent" process
> take a snapshot of our refs before spawning the two "child" fetches
> for each remote when handling "fetch --all", so that we can later
> figure out what superproject commits were obtained during the
> fetches from these two remotes?

I know that we do this "snapshot"-ing for a single process. Looking in
the code, I see that check_for_new_submodule_commits() creates the
snapshot (in addition to registering a newly fetched commit).

I don't think we are in the ideal scenario because we only snapshot when
we fetch without "--all":

  cmd_fetch() > *fetch_one()* > do_fetch() > backfill_tags() >
    fetch_and_consume_refs() > store_updated_refs() >
    check_for_new_submodule_commits()

In the ideal scenario, it would be something like:

  cmd_fetch() >
    TODO_snapshot_old_refs(), fetch_[one|multiple](),
    TODO_register_new_refs()

It looks non-trivial enough that I don't think I'll try to fix this
soon, but it does not look prohibitively hard.

> Provided if we have the "make sure everything needed in the
> submodule is fetched by inspecting the range of commits we fetch for
> a superproject" working correctly for a single remote, an
> alternative approach is to run "git fetch --recurse-submodules" for
> each remote separately, without the "parent" process doing anything
> in the submodule (i.e. you earlier counted R+1 fetches, but instead,
> we make R fetches in the submodule.  It is less than ideal but it
> may be easier to implement).
>
> Thoughts?

The +1 fetch is redundant, so it's probably good to get rid of it
anyway. I also wonder what it does if we don't snapshot the old refs in
the "parent" process. Perhaps it does nothing in the
"--recurse-submodules=on-demand" case because it wouldn't know what
submodules have changed?

Since it is only one fetch, it might not be worth the effort though.
It would only look correct in some odd situation where the user has only
one configured remote, but fetches with "--all".

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bugreport - submodules are fetched twice in some cases
  2022-05-16 17:45           ` Glen Choo
@ 2022-05-16 18:25             ` Junio C Hamano
  2022-05-16 19:04               ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2022-05-16 18:25 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Benedek Kozma

Glen Choo <chooglen@google.com> writes:

> I don't think we are in the ideal scenario because we only snapshot when
> we fetch without "--all":
>
>   cmd_fetch() > *fetch_one()* > do_fetch() > backfill_tags() >
>     fetch_and_consume_refs() > store_updated_refs() >
>     check_for_new_submodule_commits()
>
> In the ideal scenario, it would be something like:
>
>   cmd_fetch() >
>     TODO_snapshot_old_refs(), fetch_[one|multiple](),
>     TODO_register_new_refs()
>
> It looks non-trivial enough that I don't think I'll try to fix this
> soon, but it does not look prohibitively hard.

It matches my gut feeling.

>> Provided if we have the "make sure everything needed in the
>> submodule is fetched by inspecting the range of commits we fetch for
>> a superproject" working correctly for a single remote, an
>> alternative approach is to run "git fetch --recurse-submodules" for
>> each remote separately, without the "parent" process doing anything
>> in the submodule (i.e. you earlier counted R+1 fetches, but instead,
>> we make R fetches in the submodule.  It is less than ideal but it
>> may be easier to implement).
>>
>> Thoughts?
>
> The +1 fetch is redundant, so it's probably good to get rid of it
> anyway.

Sounds sensible.  It should be a single-liner, i.e.

 builtin/fetch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git i/builtin/fetch.c w/builtin/fetch.c
index eeee5ac8f1..be61c390c1 100644
--- i/builtin/fetch.c
+++ w/builtin/fetch.c
@@ -2262,7 +2262,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		result = fetch_multiple(&list, max_children);
 	}
 
-	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
+	if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
 		struct strvec options = STRVEC_INIT;
 		int max_children = max_jobs;
 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bugreport - submodules are fetched twice in some cases
  2022-05-16 18:25             ` Junio C Hamano
@ 2022-05-16 19:04               ` Junio C Hamano
  2022-05-16 21:53                 ` [PATCH] fetch: do not run a redundant fetch from submodule Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2022-05-16 19:04 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Benedek Kozma

Junio C Hamano <gitster@pobox.com> writes:

>> The +1 fetch is redundant, so it's probably good to get rid of it
>> anyway.

Another thing I noticed is that this +1 is *not* limited to "--all".
If you give a group that expands to multiple remotes, you'd take the
same fetch_multiple() code path (that is why the "single liner"
patch in the message I am responding to looks at the "is remote
set?", which is the same condition that decides if we use
fetch_one() or fetch_multiple()).


I also notice that there is a strange "optimzation" that is used
after expanding a group into one or more remotes and when it turns
out it was a group of one.  In such a case (and only in such a
case), we set "remote" and bypass the code path that uses &list.

I wonder if this "optimization" should be also used for "--all".

 builtin/fetch.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git c/builtin/fetch.c w/builtin/fetch.c
index e3791f09ed..9093455e81 100644
--- c/builtin/fetch.c
+++ w/builtin/fetch.c
@@ -2187,6 +2187,9 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		else if (argc > 1)
 			die(_("fetch --all does not make sense with refspecs"));
 		(void) for_each_remote(get_one_remote_for_fetch, &list);
+		/* no point doing fetch_multiple() of one */
+		if (list.nr == 1)
+			remote = remote_get(list.items[0].string);
 	} else if (argc == 0) {
 		/* No arguments -- use default remote */
 		remote = remote_get(NULL);



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] fetch: do not run a redundant fetch from submodule
  2022-05-16 19:04               ` Junio C Hamano
@ 2022-05-16 21:53                 ` Junio C Hamano
  2022-05-16 22:56                   ` Glen Choo
  2022-05-16 23:53                   ` [PATCH v2] " Junio C Hamano
  0 siblings, 2 replies; 17+ messages in thread
From: Junio C Hamano @ 2022-05-16 21:53 UTC (permalink / raw)
  To: git; +Cc: Benedek Kozma, Glen Choo

When 7dce19d3 (fetch/pull: Add the --recurse-submodules option,
2010-11-12) introduced the "--recurse-submodule" option, the
approach taken was to perform fetches in submodules only once, after
all the main fetching (it may usually be a fetch from a single
remote, but it could be fetching from a group of remotes using
fetch_multiple()) succeeded.  Later we added "--all" to fetch from
all defined remotes, which complicated things even more.

If your project has a submodule, and you try to run "git fetch
--recurse-submodule --all", you'd see a fetch for the top-level,
which invokes another fetch for the submodule, followed by another
fetch for the same submodule.  All but the last fetch for the
submodule come from a "git fetch --recurse-submodules" subprocess
that is spawned via the fetch_multiple() interface for the remotes,
and the last fetch comes from the code at the end.

Because recursive fetching from submodules is done in each fetch for
the top-level in fetch_multiple(), the last fetch in the submodule
is redundant.  It only matters when fetch_one() interacts with a
single remote at the top-level.

While we are at it, there is one optimization that exists in dealing
with a group of remote, but is missing when "--all" is used.  In the
former, when the group turns out to be a group of one, instead of
spawning "git fetch" as a subprocess via the fetch_multiple()
interface, we use the normal fetch_one() code path.  Do the same
when handing "--all", if it turns out that we have only one remote
defined.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * So, here is a mixed thing that fixes the issue in two different
   ways, which makes it unsuitable to be the final patch.  Either
   "turning --all into a single fetch with optimizaiton" that is in
   the first hunk, or "don't do the final sweep unless doing a
   single fetch" that is in the second hunk, is sufficient to make
   the symptom disappear.  Of course, using them both does not break
   anything, but it somehow feels unsatisfactory, and makes readers
   feel that we should be able to do better.

   But the better thing being what Glen alluded to as "non-trivial
   not prohibitivey hard", I'll stop here.

 builtin/fetch.c                    |  6 +++++-
 t/t5617-clone-submodules-remote.sh | 13 +++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e3791f09ed..359321e731 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2187,6 +2187,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		else if (argc > 1)
 			die(_("fetch --all does not make sense with refspecs"));
 		(void) for_each_remote(get_one_remote_for_fetch, &list);
+
+		/* no point doing fetch_multiple() of one */
+		if (list.nr == 1)
+			remote = remote_get(list.items[0].string);
 	} else if (argc == 0) {
 		/* No arguments -- use default remote */
 		remote = remote_get(NULL);
@@ -2261,7 +2265,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		result = fetch_multiple(&list, max_children);
 	}
 
-	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
+	if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
 		struct strvec options = STRVEC_INIT;
 		int max_children = max_jobs;
 
diff --git a/t/t5617-clone-submodules-remote.sh b/t/t5617-clone-submodules-remote.sh
index ca8f80083a..8b6b482a39 100755
--- a/t/t5617-clone-submodules-remote.sh
+++ b/t/t5617-clone-submodules-remote.sh
@@ -72,6 +72,19 @@ test_expect_success 'clone with --single-branch' '
 	)
 '
 
+test_expect_success 'fetch --all with --recurse-submodules' '
+	test_when_finished "rm -fr super_clone" &&
+	git clone --recurse-submodules srv.bare super_clone &&
+	(
+		cd super_clone &&
+		git config submodule.recurse true &&
+		git config fetch.parallel 0 &&
+		git fetch --all 2>../fetch-log
+	) &&
+	grep "Fetching sub" fetch-log >fetch-subs &&
+	test_line_count = 1 fetch-subs
+'
+
 # do basic partial clone from "srv.bare"
 # confirm partial clone was registered in the local config for super and sub.
 test_expect_success 'clone with --filter' '
-- 
2.36.1-379-g841af8e974


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] fetch: do not run a redundant fetch from submodule
  2022-05-16 21:53                 ` [PATCH] fetch: do not run a redundant fetch from submodule Junio C Hamano
@ 2022-05-16 22:56                   ` Glen Choo
  2022-05-16 23:33                     ` Junio C Hamano
  2022-05-16 23:53                   ` [PATCH v2] " Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: Glen Choo @ 2022-05-16 22:56 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Benedek Kozma

Junio C Hamano <gitster@pobox.com> writes:

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index e3791f09ed..359321e731 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -2187,6 +2187,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  		else if (argc > 1)
>  			die(_("fetch --all does not make sense with refspecs"));
>  		(void) for_each_remote(get_one_remote_for_fetch, &list);
> +
> +		/* no point doing fetch_multiple() of one */
> +		if (list.nr == 1)
> +			remote = remote_get(list.items[0].string);
>  	} else if (argc == 0) {
>  		/* No arguments -- use default remote */
>  		remote = remote_get(NULL);
> @@ -2261,7 +2265,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  		result = fetch_multiple(&list, max_children);
>  	}
>  
> -	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
> +	if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
>  		struct strvec options = STRVEC_INIT;
>  		int max_children = max_jobs;

IMO "&& remote" is non-inuitive enough to warrant a comment, e.g.

  /* do not update submodules if fetch_multiple() was called */
  if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {

but I suppose that the commit message explains this well enough.

> diff --git a/t/t5617-clone-submodules-remote.sh b/t/t5617-clone-submodules-remote.sh
> index ca8f80083a..8b6b482a39 100755
> --- a/t/t5617-clone-submodules-remote.sh
> +++ b/t/t5617-clone-submodules-remote.sh
> @@ -72,6 +72,19 @@ test_expect_success 'clone with --single-branch' '
>  	)
>  '
>  
> +test_expect_success 'fetch --all with --recurse-submodules' '
> +	test_when_finished "rm -fr super_clone" &&
> +	git clone --recurse-submodules srv.bare super_clone &&
> +	(
> +		cd super_clone &&
> +		git config submodule.recurse true &&
> +		git config fetch.parallel 0 &&
> +		git fetch --all 2>../fetch-log
> +	) &&
> +	grep "Fetching sub" fetch-log >fetch-subs &&
> +	test_line_count = 1 fetch-subs
> +'
> +

The test looks good, but I think it belongs in
t/t5526-fetch-submodules.sh. I don't see anything else in
t5617-clone-submodules-remote.sh that calls "git fetch".

In addition, I think we need one more test that adds another remote. In
the above test, we only have one remote, so we hit your new optimization
and already pass the test without the need for "&& remote".

Whereas this test fails if we remove "&& remote".

  test_expect_success 'fetch --all with --recurse-submodules and more remotes' '
    test_when_finished "rm -fr super_clone" &&
    git clone --recurse-submodules srv.bare super_clone &&
    cp -r srv.bare srv.bare2 &&
    (
      cd super_clone &&
      git config submodule.recurse true &&
      git config fetch.parallel 0 &&
      git remote add other ../srv.bare2 &&
      git fetch --all 2>../fetch-log
    ) &&
    grep "Fetching sub" fetch-log >fetch-subs &&
    test_line_count = 2 fetch-subs
  '

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] fetch: do not run a redundant fetch from submodule
  2022-05-16 22:56                   ` Glen Choo
@ 2022-05-16 23:33                     ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2022-05-16 23:33 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Benedek Kozma

Glen Choo <chooglen@google.com> writes:

> IMO "&& remote" is non-inuitive enough to warrant a comment, e.g.
>
>   /* do not update submodules if fetch_multiple() was called */
>   if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {

As I am not so familiar with this codepath when submodules are
involved, I need to be explained why having called fetch_multiple()
means there is no need to update submodules".

	/* 
	 * This is only needed after fetch_one(), which does not
         * fetch submodules by itself.
	 *
	 * fetch_multiple() has already updated submodules to grab
	 * commits necessary for the fetched history from each remote,
	 * so there is no need to fetch submodules from here.
	 */

perhaps?

> The test looks good, but I think it belongs in
> t/t5526-fetch-submodules.sh. I don't see anything else in
> t5617-clone-submodules-remote.sh that calls "git fetch".

Good eyes.

> In addition, I think we need one more test that adds another remote. In
> the above test, we only have one remote, so we hit your new optimization
> and already pass the test without the need for "&& remote".

True, too.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2] fetch: do not run a redundant fetch from submodule
  2022-05-16 21:53                 ` [PATCH] fetch: do not run a redundant fetch from submodule Junio C Hamano
  2022-05-16 22:56                   ` Glen Choo
@ 2022-05-16 23:53                   ` Junio C Hamano
  2022-05-17 16:47                     ` Glen Choo
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2022-05-16 23:53 UTC (permalink / raw)
  To: git; +Cc: Benedek Kozma, Glen Choo

When 7dce19d3 (fetch/pull: Add the --recurse-submodules option,
2010-11-12) introduced the "--recurse-submodule" option, the
approach taken was to perform fetches in submodules only once, after
all the main fetching (it may usually be a fetch from a single
remote, but it could be fetching from a group of remotes using
fetch_multiple()) succeeded.  Later we added "--all" to fetch from
all defined remotes, which complicated things even more.

If your project has a submodule, and you try to run "git fetch
--recurse-submodule --all", you'd see a fetch for the top-level,
which invokes another fetch for the submodule, followed by another
fetch for the same submodule.  All but the last fetch for the
submodule come from a "git fetch --recurse-submodules" subprocess
that is spawned via the fetch_multiple() interface for the remotes,
and the last fetch comes from the code at the end.

Because recursive fetching from submodules is done in each fetch for
the top-level in fetch_multiple(), the last fetch in the submodule
is redundant.  It only matters when fetch_one() interacts with a
single remote at the top-level.

While we are at it, there is one optimization that exists in dealing
with a group of remote, but is missing when "--all" is used.  In the
former, when the group turns out to be a group of one, instead of
spawning "git fetch" as a subprocess via the fetch_multiple()
interface, we use the normal fetch_one() code path.  Do the same
when handing "--all", if it turns out that we have only one remote
defined.

Helped-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

So here is a second attempt.  It demonstrates a bit interesting
funny in range-diff where similar changes from the previous round
gets applied to a different target.

t5617 is much cleanly organized than t5526, and we may want to clean
up the latter after dust settles.

1:  006fe43da1 ! 1:  da0a4e341b fetch: do not run a redundant fetch from submodule
    @@ Commit message
         when handing "--all", if it turns out that we have only one remote
         defined.
     
    +    Helped-by: Glen Choo <chooglen@google.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      ## builtin/fetch.c ##
    @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
      			die(_("fetch --all does not make sense with refspecs"));
      		(void) for_each_remote(get_one_remote_for_fetch, &list);
     +
    -+		/* no point doing fetch_multiple() of one */
    ++		/* do not do fetch_multiple() of one */
     +		if (list.nr == 1)
     +			remote = remote_get(list.items[0].string);
      	} else if (argc == 0) {
    @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
      	}
      
     -	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
    ++
    ++	/*
    ++	 * This is only needed after fetch_one(), which does not fetch
    ++	 * submodules by itself.
    ++	 *
    ++	 * When we fetch from multiple remotes, fetch_multiple() has
    ++	 * already updated submodules to grab commits necessary for
    ++	 * the fetched history from each remote, so there is no need
    ++	 * to fetch submodules from here.
    ++	 */
     +	if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
      		struct strvec options = STRVEC_INIT;
      		int max_children = max_jobs;
      
     
    - ## t/t5617-clone-submodules-remote.sh ##
    -@@ t/t5617-clone-submodules-remote.sh: test_expect_success 'clone with --single-branch' '
    + ## t/t5526-fetch-submodules.sh ##
    +@@ t/t5526-fetch-submodules.sh: test_expect_success 'fetch --recurse-submodules updates name-conflicted, unpopul
      	)
      '
      
     +test_expect_success 'fetch --all with --recurse-submodules' '
    -+	test_when_finished "rm -fr super_clone" &&
    -+	git clone --recurse-submodules srv.bare super_clone &&
    ++	test_when_finished "rm -fr src_clone" &&
    ++	git clone --recurse-submodules src src_clone &&
     +	(
    -+		cd super_clone &&
    ++		cd src_clone &&
     +		git config submodule.recurse true &&
     +		git config fetch.parallel 0 &&
     +		git fetch --all 2>../fetch-log
     +	) &&
    -+	grep "Fetching sub" fetch-log >fetch-subs &&
    ++	grep "^Fetching submodule sub$" fetch-log >fetch-subs &&
     +	test_line_count = 1 fetch-subs
     +'
     +
    - # do basic partial clone from "srv.bare"
    - # confirm partial clone was registered in the local config for super and sub.
    - test_expect_success 'clone with --filter' '
    ++test_expect_success 'fetch --all with --recurse-submodules with multiple' '
    ++	test_when_finished "rm -fr src_clone" &&
    ++	git clone --recurse-submodules src src_clone &&
    ++	(
    ++		cd src_clone &&
    ++		git remote add secondary ../src &&
    ++		git config submodule.recurse true &&
    ++		git config fetch.parallel 0 &&
    ++		git fetch --all 2>../fetch-log
    ++	) &&
    ++	grep "Fetching submodule sub" fetch-log >fetch-subs &&
    ++	test_line_count = 2 fetch-subs
    ++'
    ++
    + test_done


 builtin/fetch.c             | 16 +++++++++++++++-
 t/t5526-fetch-submodules.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e3791f09ed..8b15c40bb2 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2187,6 +2187,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		else if (argc > 1)
 			die(_("fetch --all does not make sense with refspecs"));
 		(void) for_each_remote(get_one_remote_for_fetch, &list);
+
+		/* do not do fetch_multiple() of one */
+		if (list.nr == 1)
+			remote = remote_get(list.items[0].string);
 	} else if (argc == 0) {
 		/* No arguments -- use default remote */
 		remote = remote_get(NULL);
@@ -2261,7 +2265,17 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		result = fetch_multiple(&list, max_children);
 	}
 
-	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
+
+	/*
+	 * This is only needed after fetch_one(), which does not fetch
+	 * submodules by itself.
+	 *
+	 * When we fetch from multiple remotes, fetch_multiple() has
+	 * already updated submodules to grab commits necessary for
+	 * the fetched history from each remote, so there is no need
+	 * to fetch submodules from here.
+	 */
+	if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
 		struct strvec options = STRVEC_INIT;
 		int max_children = max_jobs;
 
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 43dada8544..a301b56db8 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -1125,4 +1125,31 @@ test_expect_success 'fetch --recurse-submodules updates name-conflicted, unpopul
 	)
 '
 
+test_expect_success 'fetch --all with --recurse-submodules' '
+	test_when_finished "rm -fr src_clone" &&
+	git clone --recurse-submodules src src_clone &&
+	(
+		cd src_clone &&
+		git config submodule.recurse true &&
+		git config fetch.parallel 0 &&
+		git fetch --all 2>../fetch-log
+	) &&
+	grep "^Fetching submodule sub$" fetch-log >fetch-subs &&
+	test_line_count = 1 fetch-subs
+'
+
+test_expect_success 'fetch --all with --recurse-submodules with multiple' '
+	test_when_finished "rm -fr src_clone" &&
+	git clone --recurse-submodules src src_clone &&
+	(
+		cd src_clone &&
+		git remote add secondary ../src &&
+		git config submodule.recurse true &&
+		git config fetch.parallel 0 &&
+		git fetch --all 2>../fetch-log
+	) &&
+	grep "Fetching submodule sub" fetch-log >fetch-subs &&
+	test_line_count = 2 fetch-subs
+'
+
 test_done
-- 
2.36.1-384-ga138d62aa5


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fetch: do not run a redundant fetch from submodule
  2022-05-16 23:53                   ` [PATCH v2] " Junio C Hamano
@ 2022-05-17 16:47                     ` Glen Choo
  2022-05-18 15:53                       ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Glen Choo @ 2022-05-17 16:47 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Benedek Kozma


This version looks good to me, thanks :)

  Reviewed-by: Glen Choo <chooglen@google.com>

Junio C Hamano <gitster@pobox.com> writes:

> t5617 is much cleanly organized than t5526, and we may want to clean
> up the latter after dust settles.

Yeah, t5526 has so many tests for the 'core' functionality that it's
hard to fit something 'tangential' like "--all". I might touch it again
soon, so I'll keep this in mind.

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index e3791f09ed..8b15c40bb2 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -2261,7 +2265,17 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  		result = fetch_multiple(&list, max_children);
>  	}
>  
> -	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
> +
> +	/*
> +	 * This is only needed after fetch_one(), which does not fetch
> +	 * submodules by itself.
> +	 *
> +	 * When we fetch from multiple remotes, fetch_multiple() has
> +	 * already updated submodules to grab commits necessary for
> +	 * the fetched history from each remote, so there is no need
> +	 * to fetch submodules from here.
> +	 */
> +	if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
>  		struct strvec options = STRVEC_INIT;
>  		int max_children = max_jobs;

Looks good; the comment is easier to understand than my suggestion for
sure.

> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index 43dada8544..a301b56db8 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -1125,4 +1125,31 @@ test_expect_success 'fetch --recurse-submodules updates name-conflicted, unpopul
>  	)
>  '
>  
> +test_expect_success 'fetch --all with --recurse-submodules' '
> +	test_when_finished "rm -fr src_clone" &&
> +	git clone --recurse-submodules src src_clone &&
> +	(
> +		cd src_clone &&
> +		git config submodule.recurse true &&
> +		git config fetch.parallel 0 &&
> +		git fetch --all 2>../fetch-log
> +	) &&
> +	grep "^Fetching submodule sub$" fetch-log >fetch-subs &&
> +	test_line_count = 1 fetch-subs
> +'
> +
> +test_expect_success 'fetch --all with --recurse-submodules with multiple' '
> +	test_when_finished "rm -fr src_clone" &&
> +	git clone --recurse-submodules src src_clone &&
> +	(
> +		cd src_clone &&
> +		git remote add secondary ../src &&
> +		git config submodule.recurse true &&
> +		git config fetch.parallel 0 &&
> +		git fetch --all 2>../fetch-log
> +	) &&
> +	grep "Fetching submodule sub" fetch-log >fetch-subs &&
> +	test_line_count = 2 fetch-subs
> +'
> +

Also looks good.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fetch: do not run a redundant fetch from submodule
  2022-05-17 16:47                     ` Glen Choo
@ 2022-05-18 15:53                       ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2022-05-18 15:53 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Benedek Kozma

Glen Choo <chooglen@google.com> writes:

>> +
>> +	/*
>> +	 * This is only needed after fetch_one(), which does not fetch
>> +	 * submodules by itself.
>> +	 *
>> +	 * When we fetch from multiple remotes, fetch_multiple() has
>> +	 * already updated submodules to grab commits necessary for
>> +	 * the fetched history from each remote, so there is no need
>> +	 * to fetch submodules from here.
>> +	 */
>> +	if (!result && remote && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
>>  		struct strvec options = STRVEC_INIT;
>>  		int max_children = max_jobs;
>
> Looks good; the comment is easier to understand than my suggestion for
> sure.

Thanks.  Today's code has diverged too much from the original code I
wrote long time ago (before submodules), and I needed an extra set
of eyeballs to double check and tell me that what I (wishfully)
wrote how the code works with submodules is in line with today's
code ;-)


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-05-18 15:53 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-29 14:46 Bugreport - submodules are fetched twice in some cases Benedek Kozma
2022-04-29 17:39 ` Junio C Hamano
2022-04-29 19:05   ` Glen Choo
2022-04-29 20:02     ` Junio C Hamano
2022-04-29 20:37       ` Glen Choo
2022-05-14  0:07       ` Glen Choo
2022-05-14  5:24         ` Junio C Hamano
2022-05-16 17:45           ` Glen Choo
2022-05-16 18:25             ` Junio C Hamano
2022-05-16 19:04               ` Junio C Hamano
2022-05-16 21:53                 ` [PATCH] fetch: do not run a redundant fetch from submodule Junio C Hamano
2022-05-16 22:56                   ` Glen Choo
2022-05-16 23:33                     ` Junio C Hamano
2022-05-16 23:53                   ` [PATCH v2] " Junio C Hamano
2022-05-17 16:47                     ` Glen Choo
2022-05-18 15:53                       ` Junio C Hamano
2022-05-14  0:15       ` Bugreport - submodules are fetched twice in some cases Glen Choo

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).