git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Philippe Blain <levraiphilippeblain@gmail.com>
To: Glen Choo <chooglen@google.com>,
	Huang Zou <huang.zou@schrodinger.com>,
	git@vger.kernel.org
Subject: Re: Bug Report: fetch.recurseSubmodules doesn't affect git pull as otherwise stated in the git config docs
Date: Fri, 6 May 2022 18:24:32 -0400	[thread overview]
Message-ID: <b4d7600e-ed4e-b4b3-262a-a69818c25365@gmail.com> (raw)
In-Reply-To: <kl6lbkwa8h5n.fsf@chooglen-macbookpro.roam.corp.google.com>

Hi Glen!

Le 2022-05-06 à 17:50, Glen Choo a écrit :
> Philippe Blain <levraiphilippeblain@gmail.com> writes:
> 
>> Hi Huang,
>>
>> Le 2022-05-02 à 10:42, Huang Zou a écrit :
>>> Thank you for filling out a Git bug report!
>>> Please answer the following questions to help us understand your issue.
>>>
>>> What did you do before the bug happened? (Steps to reproduce your issue)
>>> 1) Set the following configs:
>>>  git config submodule.recurse true
>>>  git config fetch.recurseSubmodules false
>>> 2) On a repo with submodules, run:
>>> git pull
>>>
>>> What did you expect to happen? (Expected behavior)
>>> git pull doesn't recursively fetch submodules
>>>
>>> What happened instead? (Actual behavior)
>>> Submodules are fetched recursively
>>>
>>> What's different between what you expected and what actually happened?
>>> Submodules are fetched recursively
>>>
>>> Anything else you want to add:
>>> git fetch works as intended. The documentation for fetch.recurseSubmodules
>>> states that "This option controls whether git fetch (and the underlying
>>> fetch in git pull)" so I would naturally expect git pull to behave the same
>>> as git fetch
>>
>> I did not try to reproduce, but I took a look at the code and I think I understand
>> what happens. 
>>
>> When 'git pull' invokes 'git fetch', it does so by specifically using the '--recurse-submodules'
>> flag, see [1]. It sends either 'yes', 'no' or 'on-demand' as value, depending on the value
>> of the local variable 'recurse_submodules'. This variable is initialized to the config value
>> of 'submodule.recurse' in 'git_pull_config' [2], called at [3], and then overwritten by the value given
>> explicitely on the command line [4], parsed at [5].
>>
>> So when 'git fetch' runs when called by 'git pull', it always receive the 
>> '--recurse-submodules' flag, and thus any config for fetch.recurseSubmodules is ignored
>> (since explicit command line flags always have precedence over config values).
> 
> Thanks for looking into this! This seems to agree with my reading of the
> code. I haven't tried to reproduce it either, but the code looks
> obviously incorrect.
> 
>> So one way to fix this would be to also parse 'fetch.recurseSubmodules' in 'git_pull_config',
>> and send the correct value to the 'git fetch' invocation... Or simpler, call 'git fetch' with
>> '--recurse-submodules-default' [9] instead...
> 
> Despite having touched this code fairly recently, I had to do quite a
> rereading to refresh myself on how this works. If I _am_ reading this
> correctly, then I think we actually want to set `--recurse-submodules`
> and not `--recurse-submodules-default`.
> 
> The short story is that the two are not equivalent - when figuring out
> _which_ submodules to fetch (we determine on a submodule-by-submodule
> basis; we don't just say "should we fetch all submodules?"),
> `--recurse-submodules-default` gets overrided by config values, but
> `--recurse-submodules` does not.
> 
> The longer story (which I think is quite difficult to explain, I am also
> a little confused) is that in a recursive fetch,
> `--recurse-submodules-default` is the value of the parent's (we'll call
> it P) `--recurse-submodules`. This only matters when a process, C1, has
> to pass a value for `--recurse-submodules` to its child, C2. The
> precedence order is:
> 
> - C1's --recurse-submodules | fetch.recurseSubmodules |
>   submodule.recurse
> - C2's submodule entry in C1's .git/config
> - C2's entry in C1's .gitmodules
> - C1's --recurse-submodules-default (aka P's --recurse-submodules)
> 
> Specifically, in code:
> 
>   static int get_fetch_recurse_config(const struct submodule *submodule,
>               struct submodule_parallel_fetch *spf)
>   {
>     // Glen: passed in from builtin/fetch, which parses
>     //  --recurse-submodules, fetch.recurseSubmodules, submodule.recurse
>     if (spf->command_line_option != RECURSE_SUBMODULES_DEFAULT)
>       return spf->command_line_option;
> 
>     if (submodule) {
>       // ...
>       // Glen: fetch configuration from .gitmodules
>       int fetch_recurse = submodule->fetch_recurse;
> 
>       key = xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
>       if (!repo_config_get_string_tmp(spf->r, key, &value)) {
>         // Glen: fetch configuration from .git/config
>         fetch_recurse = parse_fetch_recurse_submodules_arg(key, value);
>       }
>       // ...
>     }
> 
>     // Glen: --recurse-submodules-default
>     return spf->default_option;
>   }
> 
> So `--recurse-submodules-default` really wasn't meant for anything other
> than "fetch" invoking itself in a superproject-submodule setting.
> 
> Of course, I could be entirely wrong and I should just write up a test
> case :). I hope to send one soon.

OK so it's more complicated than I thought (I'm not surprised :P). Thanks for 
the details.

The other thing I thought about, is that even if '--recurse-submodules-default'
worked as I thought it did when I wrote this, it would still not be the right
thing to change builtin/pull to invoke 'git fetch' with it instead, since then
a user invoking e.g. 'git pull --recurse-submodules=false', who has 'fetch.recurseSubmodules=true',
would not get a recursive fetch, since the internal fetch would prioritise 'fetch.recurseSubmodules=true',
but that's contrary to user expectation that command-line flags override config, always...
So whatever is done to fix this, we might need to keep track of where 'recurse_submodules'
was set from, either the command-line or the config.

> 
>> [sidenote]
>> I'm thought for a while that it was maybe not a good idea to change the behaviour
>> in your specific situation. If you have 'submodule.recurse'
>> set to true and 'fetch.recurseSubmodules' set to false, and if the code is changed so that indeed
>> 'git pull' does not fetch recursively, then the code will still try to update the submodule working
>> trees after the end of the operation (merge or rebase), see the end of 'cmd_pull' [6], [7]. This  is
>> OK, because if there are new submodule commits referenced by the superproject and they were not fetched because the 
>> fetch was not recursive, then the call to 'git submodule update' in update_submodules/rebase_submodule
>> should fetch them, so no problem there.
>> [/sidenote]
> 
> I think the bigger question to ask is "what is the intended effect of
> 'submodule.recurse = true' and 'fetch.recurseSubmodules = false'?".

Yes, I agree that it would be nice if Huang clarified that as I'm not sure
either of the use case.

> I
> think it is not surprising to think that recursive worktree operations
> might fail if the submodule commits weren't fetched.

Yes, if ever 'merge' and 'rebase' are taught to obey 'submodule.recurse'
(if only I had time to work on that!), then all hell would break loose
if submodule commits were not fetched when these operations start.

> 
> Perhaps this is just a performance optimization? i.e. after fetching in
> the superproject, the user wants to skip the rev walk that discovers new
> submodule commits.
> 


Cheers,
Philippe.

  reply	other threads:[~2022-05-06 22:24 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-02 14:42 Bug Report: fetch.recurseSubmodules doesn't affect git pull as otherwise stated in the git config docs Huang Zou
2022-05-05 23:57 ` Philippe Blain
2022-05-06 21:50   ` Glen Choo
2022-05-06 22:24     ` Philippe Blain [this message]
2022-05-09 16:24       ` Huang Zou
2022-05-09 23:32 ` Glen Choo
2022-05-10  0:57   ` Philippe Blain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b4d7600e-ed4e-b4b3-262a-a69818c25365@gmail.com \
    --to=levraiphilippeblain@gmail.com \
    --cc=chooglen@google.com \
    --cc=git@vger.kernel.org \
    --cc=huang.zou@schrodinger.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).