git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* Bug in git submodule update --remote
@ 2021-05-14 18:11 Ben Avison
  2021-05-19 10:49 ` Atharva Raykar
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Ben Avison @ 2021-05-14 18:11 UTC (permalink / raw)
  To: git

A recent-ish change in git 2.27.1, introduced in commit f0a96e8d, has 
also got me wondering about whether some related functionality is 
correct. I'm not sure what the best way to fix things is, so can I 
invite opinions?

The scenario: I have a repository with a submodule. The submodule's 
upstream repository uses a fork workflow, so the submodule has two 
remotes, one for pulling in other people's changes, and one for 
uploading my own pull requests.

$ mkdir myproject
$ cd myproject
$ git init
$ git submodule add https://github.com/acme/library
$ cd library
$ git remote add -f myfork https://github.com/user/library
$ cd ..
$ git add .
$ git commit -m "Initial commit"
$ git remote add origin https://github.com/user/myproject
$ git push -u origin master

Furthermore, assume I forked https://github.com/acme/example some time 
ago, so that the master branches between it and my fork have diverged.

Time passes. People hack away on both projects. I want to fix a bug or 
implement a new feature in the library, so I start by ensuring both are 
up-to-date:

$ git pull
$ git submodule update --remote
$ cd library
$ git checkout -b new-feature
$ # hack away
$ git add .
$ git commit -m "New feature"
$ git push -u myfork new-feature
$ cd ..

Some more time passes, and I want to work on it again. Again, I start by 
ensuring I'm up-to-date. Before git 2.27.1, I could do:

$ git submodule update --remote

Now, I get:

fatal: Needed a single revision
Unable to find current myfork/HEAD revision in submodule path 'library'

What's going on is that within "git submodule update --remote", the sha1 
used is formed by looking up the submodule's ref "<remote>/<branch>". 
The change in git 2.27.1 is that if no remote tracking branch is stated 
in the superproject's .gitmodules file, <branch> defaults to "HEAD" 
rather than "master" as previously. That's fine if <remote> is "origin", 
but in practice, it depends on how the submodule is currently checked out:

* if it's in detached HEAD state, <remote> is set to "origin"
* if its branch is not a tracking branch, "origin" is also used
* but if it's a tracking branch (as I used in my workflow above - not 
uncommon in pull request workflows because you might grant other people 
access to the branch during the review process) then it looks up the 
name of the remote which is being tracked

First observation: a ref called "myfork/HEAD" presumably *could* have 
been created by the "git remote add" command, reflecting that remote's 
default branch. Should it?

In practice, that's not actually what I'd want, though. If I do "git 
submodule update --remote", I personally normally do so as a shortcut 
compared to cloning everything again. I don't expect the behaviour to 
change depending on whether I happen to have left one of the submodules 
checked out on a tracking branch or not: myfork/master is potentially 
quite out of date compared to origin/master.

It also makes very little sense to me that issuing "git submodule update 
--remote" twice in quick succession would leave us in a different state, 
because the first call puts all the submodules into detached HEAD state, 
meaning that the second call always looks up the remote tracking 
branches from the submodule's "origin" remote.

One way this could be fixed is that if <branch> turns out to be "HEAD", 
we could force <remote> to be "origin". However, this doesn't address 
the equivalent problem that arises if the remote tracking branch *is* 
named in .gitmodules.

I'd therefore like to propose that "git submodule update --remote" 
*always* uses the remote named "origin" to form the ref used to check 
out the submodule. However, I'm not sure whether everyone would agree 
that this constitutes a bugfix, or whether I'd need to introduce a new 
switch to enforce this behaviour.

So, what do you think?

Ben

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in git submodule update --remote
  2021-05-14 18:11 Bug in git submodule update --remote Ben Avison
@ 2021-05-19 10:49 ` Atharva Raykar
  2021-05-19 14:41   ` Ben Avison
  2021-05-19 11:19 ` Atharva Raykar
  2021-05-22 18:02 ` Philippe Blain
  2 siblings, 1 reply; 8+ messages in thread
From: Atharva Raykar @ 2021-05-19 10:49 UTC (permalink / raw)
  To: Ben Avison; +Cc: git

On 14-May-2021, at 23:41, Ben Avison <bavison@riscosopen.org> wrote:
> 
> What's going on is that within "git submodule update --remote", the sha1 used is formed by looking up the submodule's ref "<remote>/<branch>". The change in git 2.27.1 is that if no remote tracking branch is stated in the superproject's .gitmodules file, <branch> defaults to "HEAD" rather than "master" as previously. That's fine if <remote> is "origin", but in practice, it depends on how the submodule is currently checked out:
> 
> * if it's in detached HEAD state, <remote> is set to "origin"
> * if its branch is not a tracking branch, "origin" is also used
> * but if it's a tracking branch (as I used in my workflow above - not uncommon in pull request workflows because you might grant other people access to the branch during the review process) then it looks up the name of the remote which is being tracked
> 
> First observation: a ref called "myfork/HEAD" presumably *could* have been created by the "git remote add" command, reflecting that remote's default branch. Should it?
> 
> In practice, that's not actually what I'd want, though. If I do "git submodule update --remote", I personally normally do so as a shortcut compared to cloning everything again. I don't expect the behaviour to change depending on whether I happen to have left one of the submodules checked out on a tracking branch or not: myfork/master is potentially quite out of date compared to origin/master.
> 
> It also makes very little sense to me that issuing "git submodule update --remote" twice in quick succession would leave us in a different state, because the first call puts all the submodules into detached HEAD state, meaning that the second call always looks up the remote tracking branches from the submodule's "origin" remote.

I agree with you, this behaviour is rather inconsistent, and
can be quite surprising to an unsuspecting user.

> One way this could be fixed is that if <branch> turns out to be "HEAD", we could force <remote> to be "origin". However, this doesn't address the equivalent problem that arises if the remote tracking branch *is* named in .gitmodules.
> 
> I'd therefore like to propose that "git submodule update --remote" *always* uses the remote named "origin" to form the ref used to check out the submodule. However, I'm not sure whether everyone would agree that this constitutes a bugfix, or whether I'd need to introduce a new switch to enforce this behaviour.
> 
> So, what do you think?

If I understood you correctly, you'd prefer that the updating
of the submodule should be independent of the ref that is checked
out in the submodule's directory.

While I am not sure of the reason why the design of 'update --remote'
uses the remote-tracking branch of the submodule, I can imagine
adding a switch like 'submodule.<name>.remote' that defaults to
'origin'. Then the behaviour could be changed such that it always
pulls from the remote specified in that option.

This would help make the behaviour consistent in all the cases you
mentioned, while also giving the option for a user to update the
submodule from the remote of their choice (which may not be origin).

--
Atharva Raykar


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in git submodule update --remote
  2021-05-14 18:11 Bug in git submodule update --remote Ben Avison
  2021-05-19 10:49 ` Atharva Raykar
@ 2021-05-19 11:19 ` Atharva Raykar
  2021-05-22 18:02 ` Philippe Blain
  2 siblings, 0 replies; 8+ messages in thread
From: Atharva Raykar @ 2021-05-19 11:19 UTC (permalink / raw)
  To: Ben Avison; +Cc: git

I am also reposting your message, adding hard line breaks so that
it displays more readably in the list archive, and also so that
anyone else can choose to quote you from this.

On 14-May-2021, at 23:41, Ben Avison <bavison@riscosopen.org> wrote:
> 
> A recent-ish change in git 2.27.1, introduced in commit f0a96e8d, has
> also got me wondering about whether some related functionality is
> correct. I'm not sure what the best way to fix things is, so can I
> invite opinions?
> 
> The scenario: I have a repository with a submodule. The submodule's
> upstream repository uses a fork workflow, so the submodule has two
> remotes, one for pulling in other people's changes, and one for
> uploading my own pull requests.
> 
> $ mkdir myproject
> $ cd myproject
> $ git init
> $ git submodule add https://github.com/acme/library
> $ cd library
> $ git remote add -f myfork https://github.com/user/library
> $ cd ..
> $ git add .
> $ git commit -m "Initial commit"
> $ git remote add origin https://github.com/user/myproject
> $ git push -u origin master
> 
> Furthermore, assume I forked https://github.com/acme/example some time
> ago, so that the master branches between it and my fork have diverged.
> 
> Time passes. People hack away on both projects. I want to fix a bug or
> implement a new feature in the library, so I start by ensuring both are
> up-to-date:
> 
> $ git pull $ git submodule update --remote
> $ cd library
> $ git checkout -b new-feature
> $ # hack away
> $ git add .
> $ git commit -m "New feature"
> $ git push -u myfork new-feature
> $ cd ..
> 
> Some more time passes, and I want to work on it again. Again, I start by
> ensuring I'm up-to-date. Before git 2.27.1, I could do:
> 
> $ git submodule update --remote
> 
> Now, I get:
> 
> fatal: Needed a single revision Unable to find current myfork/HEAD
> revision in submodule path 'library'
> 
> What's going on is that within "git submodule update --remote", the sha1
> used is formed by looking up the submodule's ref "<remote>/<branch>".
> The change in git 2.27.1 is that if no remote tracking branch is stated
> in the superproject's .gitmodules file, <branch> defaults to "HEAD"
> rather than "master" as previously. That's fine if <remote> is "origin",
> but in practice, it depends on how the submodule is currently checked
> out:
> 
> * if it's in detached HEAD state, <remote> is set to "origin"
> * if its branch is not a tracking branch, "origin" is also used
> * but if it's a tracking branch (as I used in my workflow above - not
>   uncommon in pull request workflows because you might grant other
>   people access to the branch during the review process) then it looks
>   up the name of the remote which is being tracked
> 
> First observation: a ref called "myfork/HEAD" presumably *could* have
> been created by the "git remote add" command, reflecting that remote's
> default branch. Should it?
> 
> In practice, that's not actually what I'd want, though. If I do "git
> submodule update --remote", I personally normally do so as a shortcut
> compared to cloning everything again. I don't expect the behaviour to
> change depending on whether I happen to have left one of the submodules
> checked out on a tracking branch or not: myfork/master is potentially
> quite out of date compared to origin/master.
> 
> It also makes very little sense to me that issuing "git submodule update
> --remote" twice in quick succession would leave us in a different state,
> because the first call puts all the submodules into detached HEAD state,
> meaning that the second call always looks up the remote tracking
> branches from the submodule's "origin" remote.
> 
> One way this could be fixed is that if <branch> turns out to be "HEAD",
> we could force <remote> to be "origin". However, this doesn't address
> the equivalent problem that arises if the remote tracking branch *is*
> named in .gitmodules.
> 
> I'd therefore like to propose that "git submodule update --remote"
> *always* uses the remote named "origin" to form the ref used to check
> out the submodule. However, I'm not sure whether everyone would agree
> that this constitutes a bugfix, or whether I'd need to introduce a new
> switch to enforce this behaviour.
> 
> So, what do you think?
> 
> Ben


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in git submodule update --remote
  2021-05-19 10:49 ` Atharva Raykar
@ 2021-05-19 14:41   ` Ben Avison
  2021-05-21 11:47     ` Atharva Raykar
  2021-05-22 19:15     ` Philippe Blain
  0 siblings, 2 replies; 8+ messages in thread
From: Ben Avison @ 2021-05-19 14:41 UTC (permalink / raw)
  To: Atharva Raykar; +Cc: git

On 19/05/2021 11:49, Atharva Raykar wrote:
> If I understood you correctly, you'd prefer that the updating of the
> submodule should be independent of the ref that is checked out in the
> submodule's directory.
> 
> While I am not sure of the reason why the design of 'update
> --remote' uses the remote-tracking branch of the submodule, I can
> imagine adding a switch like 'submodule.<name>.remote' that defaults
> to 'origin'. Then the behaviour could be changed such that it always 
> pulls from the remote specified in that option.
> 
> This would help make the behaviour consistent in all the cases you 
> mentioned, while also giving the option for a user to update the 
> submodule from the remote of their choice (which may not be origin).

I like that solution. Although, I should note that if the user has set
submodule.<name>.remote to something other than 'origin', they will need
to ensure that submodule.<name>.branch is also set, or they will still
hit the "Unable to find current <remote>/HEAD revision in submodule"
error that I initially stumbled on.

How about an implementation like the following? I introduced a new "git
submodule--helper" command rather than modify "print-default-remote" for
a couple of reasons:

1) "print-default-remote" is also used for "git submodule sync" (I'm not
sure if we should change its behaviour too)

2) "print-default-remote" needs to be executed from within the
submodule, and takes no arguments, whereas I need to parse the
superproject's .git/config so need to be executed from the superproject
and take the submodule path as an argument

The two functions I added are heavily based on "git submodule--helper
remote-branch". However:

* Unlike with the branch name, I don't fall back to using the name for
the remote cached from when we read the .gitmodules file, if it isn't
found in .git/config. It doesn't make sense to me for the .gitmodules
file to include this information, as any new clones will only contain
"origin" remotes anyway.

* I removed "struct strbuf sb" since I don't think it's used.

Ben


diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 9d505a6329..25ce3c8a1d 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2444,6 +2444,41 @@ static int resolve_remote_submodule_branch(int argc, const char **argv,
 	return 0;
 }
 
+static const char *remote_submodule_remote(const char *path)
+{
+	const struct submodule *sub;
+	const char *remote = NULL;
+	char *key;
+
+	sub = submodule_from_path(the_repository, &null_oid, path);
+	if (!sub)
+		return NULL;
+
+	key = xstrfmt("submodule.%s.remote", sub->name);
+	repo_config_get_string_tmp(the_repository, key, &remote);
+	free(key);
+
+	if (!remote)
+		return "origin";
+
+	return remote;
+}
+
+static int resolve_remote_submodule_remote(int argc, const char **argv,
+		const char *prefix)
+{
+	const char *ret;
+	if (argc != 2)
+		die("submodule--helper remote-remote takes exactly one arguments, got %d", argc);
+
+	ret = remote_submodule_remote(argv[1]);
+	if (!ret)
+		die("submodule %s doesn't exist", argv[1]);
+
+	printf("%s", ret);
+	return 0;
+}
+
 static int push_check(int argc, const char **argv, const char *prefix)
 {
 	struct remote *remote;
@@ -2770,6 +2805,7 @@ static struct cmd_struct commands[] = {
 	{"deinit", module_deinit, 0},
 	{"summary", module_summary, SUPPORT_SUPER_PREFIX},
 	{"remote-branch", resolve_remote_submodule_branch, 0},
+	{"remote-remote", resolve_remote_submodule_remote, 0},
 	{"push-check", push_check, 0},
 	{"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX},
 	{"is-active", is_active, 0},
diff --git a/git-submodule.sh b/git-submodule.sh
index eb90f18229..4d0df1cf5a 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -577,7 +577,7 @@ cmd_update()
 				fetch_in_submodule "$sm_path" $depth ||
 				die "$(eval_gettext "Unable to fetch in submodule path '\$sm_path'")"
 			fi
-			remote_name=$(sanitize_submodule_env; cd "$sm_path" && git submodule--helper print-default-remote)
+			remote_name=$(git submodule--helper remote-remote "$sm_path")
 			sha1=$(sanitize_submodule_env; cd "$sm_path" &&
 				git rev-parse --verify "${remote_name}/${branch}") ||
 			die "$(eval_gettext "Unable to find current \${remote_name}/\${branch} revision in submodule path '\$sm_path'")"

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in git submodule update --remote
  2021-05-19 14:41   ` Ben Avison
@ 2021-05-21 11:47     ` Atharva Raykar
  2021-05-21 18:36       ` Christian Couder
  2021-05-22 19:15     ` Philippe Blain
  1 sibling, 1 reply; 8+ messages in thread
From: Atharva Raykar @ 2021-05-21 11:47 UTC (permalink / raw)
  To: Ben Avison; +Cc: git, Christian Couder, Shourya Shukla

On 19-May-2021, at 20:11, Ben Avison <bavison@riscosopen.org> wrote:
> 
> On 19/05/2021 11:49, Atharva Raykar wrote:
>> If I understood you correctly, you'd prefer that the updating of the
>> submodule should be independent of the ref that is checked out in the
>> submodule's directory.
>> 
>> While I am not sure of the reason why the design of 'update
>> --remote' uses the remote-tracking branch of the submodule, I can
>> imagine adding a switch like 'submodule.<name>.remote' that defaults
>> to 'origin'. Then the behaviour could be changed such that it always 
>> pulls from the remote specified in that option.
>> 
>> This would help make the behaviour consistent in all the cases you 
>> mentioned, while also giving the option for a user to update the 
>> submodule from the remote of their choice (which may not be origin).
> 
> I like that solution. Although, I should note that if the user has set
> submodule.<name>.remote to something other than 'origin', they will need
> to ensure that submodule.<name>.branch is also set, or they will still
> hit the "Unable to find current <remote>/HEAD revision in submodule"
> error that I initially stumbled on.
> 
> How about an implementation like the following? I introduced a new "git
> submodule--helper" command rather than modify "print-default-remote" for
> a couple of reasons:
> 
> 1) "print-default-remote" is also used for "git submodule sync" (I'm not
> sure if we should change its behaviour too)
> 
> 2) "print-default-remote" needs to be executed from within the
> submodule, and takes no arguments, whereas I need to parse the
> superproject's .git/config so need to be executed from the superproject
> and take the submodule path as an argument
> 
> The two functions I added are heavily based on "git submodule--helper
> remote-branch". However:
> 
> * Unlike with the branch name, I don't fall back to using the name for
> the remote cached from when we read the .gitmodules file, if it isn't
> found in .git/config. It doesn't make sense to me for the .gitmodules
> file to include this information, as any new clones will only contain
> "origin" remotes anyway.
> 
> * I removed "struct strbuf sb" since I don't think it's used.
> 
> Ben

Thanks for this. I am quite new around here, and I will be working
on porting the whole of 'submodule update' to C in the coming months.

Since this would modify the behaviour of the update subcommand, I
have decided to CC my mentors (Christian and Shourya) who are more
qualified than me to comment on this proposal.

I personally feel that the current behaviour where the remote used
depends on how the submodule is checked out is odd, and I don't
mind addressing it while doing the conversion of this functionality.

> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 9d505a6329..25ce3c8a1d 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -2444,6 +2444,41 @@ static int resolve_remote_submodule_branch(int argc, const char **argv,
> 	return 0;
> }
> 
> +static const char *remote_submodule_remote(const char *path)
> +{
> +	const struct submodule *sub;
> +	const char *remote = NULL;
> +	char *key;
> +
> +	sub = submodule_from_path(the_repository, &null_oid, path);
> +	if (!sub)
> +		return NULL;
> +
> +	key = xstrfmt("submodule.%s.remote", sub->name);
> +	repo_config_get_string_tmp(the_repository, key, &remote);
> +	free(key);
> +
> +	if (!remote)
> +		return "origin";
> +
> +	return remote;
> +}
> +
> +static int resolve_remote_submodule_remote(int argc, const char **argv,
> +		const char *prefix)
> +{
> +	const char *ret;
> +	if (argc != 2)
> +		die("submodule--helper remote-remote takes exactly one arguments, got %d", argc);
> +
> +	ret = remote_submodule_remote(argv[1]);
> +	if (!ret)
> +		die("submodule %s doesn't exist", argv[1]);
> +
> +	printf("%s", ret);
> +	return 0;
> +}
> +
> static int push_check(int argc, const char **argv, const char *prefix)
> {
> 	struct remote *remote;
> @@ -2770,6 +2805,7 @@ static struct cmd_struct commands[] = {
> 	{"deinit", module_deinit, 0},
> 	{"summary", module_summary, SUPPORT_SUPER_PREFIX},
> 	{"remote-branch", resolve_remote_submodule_branch, 0},
> +	{"remote-remote", resolve_remote_submodule_remote, 0},
> 	{"push-check", push_check, 0},
> 	{"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX},
> 	{"is-active", is_active, 0},
> diff --git a/git-submodule.sh b/git-submodule.sh
> index eb90f18229..4d0df1cf5a 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -577,7 +577,7 @@ cmd_update()
> 				fetch_in_submodule "$sm_path" $depth ||
> 				die "$(eval_gettext "Unable to fetch in submodule path '\$sm_path'")"
> 			fi
> -			remote_name=$(sanitize_submodule_env; cd "$sm_path" && git submodule--helper print-default-remote)
> +			remote_name=$(git submodule--helper remote-remote "$sm_path")
> 			sha1=$(sanitize_submodule_env; cd "$sm_path" &&
> 				git rev-parse --verify "${remote_name}/${branch}") ||
> 			die "$(eval_gettext "Unable to find current \${remote_name}/\${branch} revision in submodule path '\$sm_path'")"


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in git submodule update --remote
  2021-05-21 11:47     ` Atharva Raykar
@ 2021-05-21 18:36       ` Christian Couder
  0 siblings, 0 replies; 8+ messages in thread
From: Christian Couder @ 2021-05-21 18:36 UTC (permalink / raw)
  To: Atharva Raykar
  Cc: Ben Avison, git, Shourya Shukla, Johannes Schindelin, Philippe Blain

On Fri, May 21, 2021 at 1:47 PM Atharva Raykar <raykar.ath@gmail.com> wrote:
>
> On 19-May-2021, at 20:11, Ben Avison <bavison@riscosopen.org> wrote:

> > I like that solution. Although, I should note that if the user has set
> > submodule.<name>.remote to something other than 'origin', they will need
> > to ensure that submodule.<name>.branch is also set, or they will still
> > hit the "Unable to find current <remote>/HEAD revision in submodule"
> > error that I initially stumbled on.
> >
> > How about an implementation like the following? I introduced a new "git
> > submodule--helper" command rather than modify "print-default-remote" for
> > a couple of reasons:

First thanks for taking a look at improving git submodule and for
doing so by introducing most of the new code in submodule--helper!

[...]

> Since this would modify the behaviour of the update subcommand, I
> have decided to CC my mentors (Christian and Shourya) who are more
> qualified than me to comment on this proposal.

As commit f0a96e8d's author is Dscho (Johannes Schindelin) and as he
was helped by Philippe Blain, let me CC them too. They might know the
reasons for this behavior better than us.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in git submodule update --remote
  2021-05-14 18:11 Bug in git submodule update --remote Ben Avison
  2021-05-19 10:49 ` Atharva Raykar
  2021-05-19 11:19 ` Atharva Raykar
@ 2021-05-22 18:02 ` Philippe Blain
  2 siblings, 0 replies; 8+ messages in thread
From: Philippe Blain @ 2021-05-22 18:02 UTC (permalink / raw)
  To: Ben Avison, git, Johannes Schindelin, Christian Couder, raykar.ath

Hi Ben,

Le 2021-05-14 à 14:11, Ben Avison a écrit :
> A recent-ish change in git 2.27.1, introduced in commit f0a96e8d, has also got me wondering about whether some related functionality is correct. I'm not sure what the best way to fix things is, so can I invite opinions?
> 
> The scenario: I have a repository with a submodule. The submodule's upstream repository uses a fork workflow, so the submodule has two remotes, one for pulling in other people's changes, and one for uploading my own pull requests.
> 
> $ mkdir myproject
> $ cd myproject
> $ git init
> $ git submodule add https://github.com/acme/library
> $ cd library
> $ git remote add -f myfork https://github.com/user/library
> $ cd ..
> $ git add .
> $ git commit -m "Initial commit"
> $ git remote add origin https://github.com/user/myproject
> $ git push -u origin master
> 
> Furthermore, assume I forked https://github.com/acme/example some time ago, so that the master branches between it and my fork have diverged.
> 
> Time passes. People hack away on both projects. I want to fix a bug or implement a new feature in the library, so I start by ensuring both are up-to-date:
> 
> $ git pull
> $ git submodule update --remote
> $ cd library
> $ git checkout -b new-feature
> $ # hack away
> $ git add .
> $ git commit -m "New feature"
> $ git push -u myfork new-feature

Starting on a slight tangent, I know the 'git push -u $remote $branch' command
is often mentioned in tutorials, but since you are using the forking workflow,
I would suggest looking into the 'remote.pushDefault' / branch.$.pushRemote configs [1]. This allows you
to track upstream/master on your feature branches (i.e. the branch into which your feature
will eventually be integrated), and to push directly to your fork with a simple 'git push' :)

> $ cd ..
> 
> Some more time passes, and I want to work on it again. Again, I start by ensuring I'm up-to-date. Before git 2.27.1, I could do:
> 
> $ git submodule update --remote
> 
> Now, I get:
> 
> fatal: Needed a single revision
> Unable to find current myfork/HEAD revision in submodule path 'library'

That's a shame. Dscho's f0a96e8d4c (submodule: fall back
to remote's HEAD for missing remote.<name>.branch, 2020-06-24) did not touch
git-sumodule.sh, so when I took a look at it in [3] I did not think of going through
the code path...

> 
> What's going on is that within "git submodule update --remote", the sha1 used is formed by looking up the submodule's ref "<remote>/<branch>". The change in git 2.27.1 is that if no remote tracking branch is stated in the superproject's .gitmodules file, <branch> defaults to "HEAD" rather than "master" as previously. That's fine if <remote> is "origin", but in practice, it depends on how the submodule is currently checked out:
> 
> * if it's in detached HEAD state, <remote> is set to "origin"
> * if its branch is not a tracking branch, "origin" is also used
> * but if it's a tracking branch (as I used in my workflow above - not uncommon in pull request workflows because you might grant other people access to the branch during the review process) then it looks up the name of the remote which is being tracked
> 

Yes, that's indeed the behaviour of submodule--helper.c::get_default_remote,
which is called by submodule--helper.c::print_default_remote, i.e.
'git submodule--helper print-default-remote', which is called by
git-submodule.sh::cmd_update (line 580 as of v2.32.0-rc1).

> First observation: a ref called "myfork/HEAD" presumably *could* have been created by the "git remote add" command, reflecting that remote's default branch. Should it?

I think that's what happens with the '-m' switch, i.e. 'git remote add -m master myfork $url'.
The symref $remote/HEAD can also be created after adding the remote
with 'git remote set-head myfork --auto' [2], which will
query the remote's real default branch (i.e. the remote HEAD) and set the local symref accordingly.
The fact that this is not done automatically upon
'git remote add' (or git fetch) is a historical mistake in my opinion but some others might disagree.

> 
> In practice, that's not actually what I'd want, though. If I do "git submodule update --remote", I personally normally do so as a shortcut compared to cloning everything again. I don't expect the behaviour to change depending on whether I happen to have left one of the submodules checked out on a tracking branch or not: myfork/master is potentially quite out of date compared to origin/master.

I completely agree.

> 
> It also makes very little sense to me that issuing "git submodule update --remote" twice in quick succession would leave us in a different state, because the first call puts all the submodules into detached HEAD state, meaning that the second call always looks up the remote tracking branches from the submodule's "origin" remote.

I also agree.

> 
> One way this could be fixed is that if <branch> turns out to be "HEAD", we could force <remote> to be "origin". However, this doesn't address the equivalent problem that arises if the remote tracking branch *is* named in .gitmodules.
> 
> I'd therefore like to propose that "git submodule update --remote" *always* uses the remote named "origin" to form the ref used to check out the submodule. However, I'm not sure whether everyone would agree that this constitutes a bugfix, or whether I'd need to introduce a new switch to enforce this behaviour.
>

I prefer introducing a new config variable, as you suggest in your patch
downthread. I'll comment there with more details, but I do not like the
approach of hard-coding 'origin' with no escape hatch. Personnally I use
'origin' for my fork and 'upstream' for the autoritative repository in all
my clones, and I can't do this in submodules because of the fact that 'origin'
is hard-coded at few places in the submodule code, so let't not hard code it yet
one more time.

Cheers,

Philippe.


[1] https://github.blog/2015-07-29-git-2-5-including-multiple-worktrees-and-triangular-workflows/#improved-support-for-triangular-workflows
[2] http://git-scm.com/docs/git-remote#Documentation/git-remote.txt-emset-headem
[3] https://lore.kernel.org/git/D2ED942B-9397-472A-B017-190016531547@gmail.com/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in git submodule update --remote
  2021-05-19 14:41   ` Ben Avison
  2021-05-21 11:47     ` Atharva Raykar
@ 2021-05-22 19:15     ` Philippe Blain
  1 sibling, 0 replies; 8+ messages in thread
From: Philippe Blain @ 2021-05-22 19:15 UTC (permalink / raw)
  To: Ben Avison, Atharva Raykar, Christian Couder, Shourya Shukla,
	Emily Shaffer
  Cc: git

Hi Ben,

Le 2021-05-19 à 10:41, Ben Avison a écrit :
> On 19/05/2021 11:49, Atharva Raykar wrote:
>> If I understood you correctly, you'd prefer that the updating of the
>> submodule should be independent of the ref that is checked out in the
>> submodule's directory.
>>
>> While I am not sure of the reason why the design of 'update
>> --remote' uses the remote-tracking branch of the submodule, I can
>> imagine adding a switch like 'submodule.<name>.remote' that defaults
>> to 'origin'. Then the behaviour could be changed such that it always
>> pulls from the remote specified in that option.
>>
>> This would help make the behaviour consistent in all the cases you
>> mentioned, while also giving the option for a user to update the
>> submodule from the remote of their choice (which may not be origin).
> 
> I like that solution. 

I also think a new config is a good solution.

> Although, I should note that if the user has set
> submodule.<name>.remote to something other than 'origin', they will need
> to ensure that submodule.<name>.branch is also set, or they will still
> hit the "Unable to find current <remote>/HEAD revision in submodule"
> error that I initially stumbled on.
> 
> How about an implementation like the following? I introduced a new "git
> submodule--helper" command rather than modify "print-default-remote" for
> a couple of reasons:
> 
> 1) "print-default-remote" is also used for "git submodule sync" (I'm not
> sure if we should change its behaviour too)
> 
> 2) "print-default-remote" needs to be executed from within the
> submodule, and takes no arguments, whereas I need to parse the
> superproject's .git/config so need to be executed from the superproject
> and take the submodule path as an argument
> 
> The two functions I added are heavily based on "git submodule--helper
> remote-branch". However:
> 
> * Unlike with the branch name, I don't fall back to using the name for
> the remote cached from when we read the .gitmodules file, if it isn't
> found in .git/config. It doesn't make sense to me for the .gitmodules
> file to include this information, as any new clones will only contain
> "origin" remotes anyway.
> 
> * I removed "struct strbuf sb" since I don't think it's used.
> 
> Ben

I think that we would want to careful examine every use of 'get_default_remote'
(and the NEEDSWORK in submodule.c:get_next_submodule)
and decide for each one if it makes sense to respect that new config before failing
back to 'origin'. So I'm not sure about the approach here since it just fixes the behaviour
for 'git submodule update --remote' but not other scenarios where the user would like
another remote than 'origin' to be used for some operations in the submodule.

There was also another bug report recently [1] around the new 'clone.defaultRemoteName'
config and 'git submodule update --remote', so it might be good to also keep that in
mind when working on a solution.

Cheers,

Philippe.
P.S. I've CC'ed Emily who has plans for a big overhaul of submodule functionality
over the next months [2] so might be interested.

[1] https://lore.kernel.org/git/2d58fe40-9e8c-4653-8170-5411fd3cf6f4@www.fastmail.com/t/#u
[2] https://lore.kernel.org/git/0fc5c0f7-52f7-fb36-f654-ff5223a8809b@gmail.com/t/#u

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-05-22 19:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-14 18:11 Bug in git submodule update --remote Ben Avison
2021-05-19 10:49 ` Atharva Raykar
2021-05-19 14:41   ` Ben Avison
2021-05-21 11:47     ` Atharva Raykar
2021-05-21 18:36       ` Christian Couder
2021-05-22 19:15     ` Philippe Blain
2021-05-19 11:19 ` Atharva Raykar
2021-05-22 18:02 ` Philippe Blain

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).