git@vger.kernel.org mailing list mirror (one of many)
* Fetch on submodule update
@ 2018-08-01 17:18 Robert Dailey
  2018-08-01 22:34 ` Jonathan Nieder
  2018-08-02  6:08 ` Jonathan Nieder
  0 siblings, 2 replies; 6+ messages in thread
From: Robert Dailey @ 2018-08-01 17:18 UTC (permalink / raw)
  To: Git

Problem: I want to avoid recursively fetching submodules when I run a
`fetch` command, and instead defer that operation to the next
`submodule update`. Essentially I want `fetch.recurseSubmodules` to be
`false`, and `git submodule update` to do exactly what it does with
the `--remote` option, but still use the SHA1 of the submodule instead
of updating to the tip of the specified branch in the git modules
config.

I hope that makes sense. The reason for this ask is to
improve/streamline workflow in parent repositories. There are cases
where I want to quickly fetch only the parent repository, even if a
submodule changes, to perform some changes that do not require the
submodule itself (yet). Then at a later time, do `submodule update`
and have it automatically fetch when the SHA1 it's updating to does
not exist (because the former fetch operation for the submodule was
skipped). For my case, it's very slow to wait on submodules to
recursively fetch when I only wanted to fetch the parent repo for the
specific task I plan to do.

Is this possible right now through some variation of configuration?

* Re: Fetch on submodule update
  2018-08-01 17:18 Fetch on submodule update Robert Dailey
@ 2018-08-01 22:34 ` Jonathan Nieder
  2018-08-02  6:08 ` Jonathan Nieder
  1 sibling, 0 replies; 6+ messages in thread
From: Jonathan Nieder @ 2018-08-01 22:34 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Git, Stefan Beller

Hi,

Robert Dailey wrote:

> Problem: I want to avoid recursively fetching submodules when I run a
> `fetch` command, and instead defer that operation to the next
> `submodule update`. Essentially I want `fetch.recurseSubmodules` to be
> `false`, and `git submodule update` to do exactly what it does with
> the `--remote` option, but still use the SHA1 of the submodule instead
> of updating to the tip of the specified branch in the git modules
> config.
>
> I hope that makes sense. The reason for this ask is to
> improve/streamline workflow in parent repositories. There are cases
> where I want to quickly fetch only the parent repository, even if a
> submodule changes, to perform some changes that do not require the
> submodule itself (yet). Then at a later time, do `submodule update`
> and have it automatically fetch when the SHA1 it's updating to does
> not exist (because the former fetch operation for the submodule was
> skipped). For my case, it's very slow to wait on submodules to
> recursively fetch when I only wanted to fetch the parent repo for the
> specific task I plan to do.
>
> Is this possible right now through some variation of configuration?

Can you say more about the overall workflow?  This seems quite different
from what we've been designing --recurse-submodules around:

- avoiding the end user ever having to use the "git submodule" command,
  except to add, remove, or reconfigure submodules

- treating the whole codebase as something like one project, so that
  "git checkout --recurse-submodules <commit>" always checks out the
  same state

More details about the application would help with better
understanding whether it can fit into this framework, or whether it's
a case where you'd want to set "submodule.recurse" to false to have
more manual control.

Thanks and hope that helps,
Jonathan

* Re: Fetch on submodule update
  2018-08-01 17:18 Fetch on submodule update Robert Dailey
  2018-08-01 22:34 ` Jonathan Nieder
@ 2018-08-02  6:08 ` Jonathan Nieder
  2018-08-06 14:45   ` Robert Dailey
  1 sibling, 1 reply; 6+ messages in thread
From: Jonathan Nieder @ 2018-08-02  6:08 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Git, Stefan Beller

Hi again,

Robert Dailey wrote:

> Problem: I want to avoid recursively fetching submodules when I run a
> `fetch` command, and instead defer that operation to the next
> `submodule update`. Essentially I want `fetch.recurseSubmodules` to be
> `false`, and `git submodule update` to do exactly what it does with
> the `--remote` option, but still use the SHA1 of the submodule instead
> of updating to the tip of the specified branch in the git modules
> config.

I think I misread this the first time.  I got distracted by your
mention of the --remote option, but you mentioned you want to use the
SHA-1 of the submodule listed, so that was silly of me.

I think you'll find that "git fetch --no-recurse-submodules" and "git
submodule update" do exactly what you want.  "git submodule update"
does perform a fetch (unless you pass --no-fetch).
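
A minimal sketch of that split, wrapped in two helper functions whose
names are made up for this example (only the git commands and options
come from the discussion above):

```shell
# Hypothetical helpers; the function names are illustrative, not part of git.

# Fetch the superproject only. Extra arguments (e.g. a remote name) are
# passed through to "git fetch".
fetch_parent_only() {
    git fetch --no-recurse-submodules "$@"
}

# Later, check out the commits recorded in the superproject. This fetches
# in a submodule as needed unless --no-fetch is passed.
update_submodules_later() {
    git submodule update "$@"
}
```

Setting `fetch.recurseSubmodules` to `false` with `git config` should
have the same effect as passing --no-recurse-submodules on every fetch.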

Let me know how it goes. :)

I'd still be interested in hearing more about the nature of the
submodules involved --- maybe `submodule.fetchJobs` would help, or
maybe this is a workflow where a tool that transparently fetches
submodules on demand like
https://gerrit.googlesource.com/gitfs/+/master/docs/design.md would be
useful (I'm not recommending using slothfs for this today, since it's
read-only, but it illustrates the idea).

Thanks,
Jonathan

* Re: Fetch on submodule update
  2018-08-02  6:08 ` Jonathan Nieder
@ 2018-08-06 14:45   ` Robert Dailey
  2018-08-06 15:41     ` Jonathan Nieder
  0 siblings, 1 reply; 6+ messages in thread
From: Robert Dailey @ 2018-08-06 14:45 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Git, Stefan Beller

On Thu, Aug 2, 2018 at 1:08 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> I think I misread this the first time.  I got distracted by your
> mention of the --remote option, but you mentioned you want to use the
> SHA-1 of the submodule listed, so that was silly of me.
>
> I think you'll find that "git fetch --no-recurse-submodules" and "git
> submodule update" do exactly what you want.  "git submodule update"
> does perform a fetch (unless you pass --no-fetch).
>
> Let me know how it goes. :)
>
> I'd still be interested in hearing more about the nature of the
> submodules involved --- maybe `submodule.fetchJobs` would help, or
> maybe this is a workflow where a tool that transparently fetches
> submodules on demand like
> https://gerrit.googlesource.com/gitfs/+/master/docs/design.md would be
> useful (I'm not recommending using slothfs for this today, since it's
> read-only, but it illustrates the idea).

Hi, thanks for your response. Sorry I am a bit late getting back to you.

Maybe my workflow is dated, because I'm still used to treating
submodules as distinctly separated and independent things. I realize
submodule recursion is becoming more inherent in many high level git
commands, but outside of git there are separation issues that make
this workflow doomed to be non-seamless. For example, pull requests
will never offer the same uniformity: You will still have 1 pull
request per submodule. There's also the issue of log audits: You
cannot use blame, log, bisect, or other "diagnostic" commands to
introspect into submodules "as if" they were subtree or something of
the like (i.e. truly part of the DAG). A more realistic example of one
of the common questions I still can't answer easily is: "How do you
determine which commit in a submodule made it into which release of
the software?" This comes up when the parent repository has the
annotated tags (representing software release milestones) and the
submodule is just a common library (which does not have those tags and
has no release cycle). Anyway, none of these issues are particularly
related but they do contribute to the answer to your question
regarding my workflow and use cases. The list goes on but I hope you
get the idea.
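
As a rough sketch of how that release question could be approached
today: look up the commit the superproject records for the submodule
at a given tag, then ask the submodule whether the commit of interest
is an ancestor of it. The function name and its arguments are made up
for this example, and it assumes the submodule is checked out at the
given path with the relevant history available.

```shell
# Hypothetical helper, not part of git.
# $1: superproject revision (e.g. a release tag)
# $2: submodule path inside the superproject
# $3: submodule commit to look for
# Succeeds if $3 is contained in the submodule commit recorded at $1.
submodule_commit_in_release() {
    release=$1
    sm_path=$2
    commit=$3
    # "<rev>:<path>" resolves to the commit id stored in the gitlink entry.
    recorded=$(git rev-parse "$release:$sm_path") || return
    git -C "$sm_path" merge-base --is-ancestor "$commit" "$recorded"
}
```

Looping over the release tags with this would give the first release
that contains a commit, but it is still one manual step per submodule.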

Some of the more functional issues are performance related: I am aware
enough, at times, that I can save time (in both local operations and
network overhead) by skipping submodules. For example, if I know that
I'm merging mainline branches, I do not need to mess with the
submodules (I can fetch, merge, commit, push from the parent repo
without messing with the submodules. This saves me time). If
`fetchJobs` was also `updateJobs`, i.e. you could update submodules in
parallel too, that might make this less of an issue. Think of
repositories [like boost][1] that have (I think) over a hundred
sibling submodules: Fetching 8 in parallel *and* doing `submodule
update` in parallel 8 times might also speed things up. There's also
`git status`, which, if it recurses into submodules, is also
significantly slow in the boost case (I'm not sure if it is
parallelized).
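
For reference, the parallelism and status knobs that exist today can
be sketched as below. The helper names and the default of 8 are made
up for this example, and whether any of this helps in the boost case
is an assumption, not something measured here.

```shell
# Hypothetical helpers; only the git options themselves are real.

# Remember a default number of parallel submodule fetch/clone jobs for
# this repository.
set_submodule_jobs() {
    git config submodule.fetchJobs "${1:-8}"
}

# One-off form for a single update run; per the git-submodule
# documentation, --jobs parallelizes cloning of new submodules.
update_submodules_parallel() {
    git submodule update --jobs "${1:-8}"
}

# Skip submodule inspection when checking the superproject's status.
status_without_submodules() {
    git status --ignore-submodules=all "$@"
}
```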

Again, none of this is particularly related, but just to give you more
context on the "why" for my ask. Sorry if I'm dragging this out too
far.

The TLDR is that I do prefer the manual control. Automatic would be
great if submodules were treated as integrated in a similar manner to
subtree, but it's not there. I wasn't aware that `submodule update`
did a fetch, because sometimes if I do that, I get errors saying SHA1
is not present (because the submodule did not get fetched). Granted I
haven't seen this in a while, so maybe the fetch on submodule update
is a newer feature. Do you know what triggers the fetch on update
without --remote? Is it the missing SHA1 that triggers it, or is it
fetching unconditionally?

Thanks for confirming it behaves as I already wanted. And as you can
tell, I'm also happy to further discuss motivation / use cases /
details related to overall usage of submodules if you'd like. I'm
happy to help however I can!

[1]: https://github.com/boostorg/boost

* Re: Fetch on submodule update
  2018-08-06 14:45   ` Robert Dailey
@ 2018-08-06 15:41     ` Jonathan Nieder
  2018-08-06 15:44       ` Robert Dailey
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Nieder @ 2018-08-06 15:41 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Git, Stefan Beller

Robert Dailey wrote:

>                                                  Automatic would be
> great if submodules were treated as integrated in a similar manner to
> subtree, but it's not there. I wasn't aware that `submodule update`
> did a fetch, because sometimes if I do that, I get errors saying SHA1
> is not present (because the submodule did not get fetched). Granted I
> haven't seen this in a while, so maybe the fetch on submodule update
> is a newer feature. Do you know what triggers the fetch on update
> without --remote? Is it the missing SHA1 that triggers it, or is it
> fetching unconditionally?

Thanks for this and the rest of the context you sent.  It's very
helpful.

The relevant code in git-submodule.sh is

	# Run fetch only if $sha1 isn't present or it
	# is not reachable from a ref.
	is_tip_reachable "$sm_path" "$sha1" ||
	fetch_in_submodule "$sm_path" $depth ||
	say "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'")"

	# Now we tried the usual fetch, but $sha1 may
	# not be reachable from any of the refs
	is_tip_reachable "$sm_path" "$sha1" ||
	fetch_in_submodule "$sm_path" $depth "$sha1" ||
	die "$(eval_gettext "Fetched in submodule path '\$displaypath', but it did not contain \$sha1. Direct fetching of that commit failed.")"

The fallback to fetching by SHA-1 was introduced in v2.8.0-rc0~9^2
(submodule: try harder to fetch needed sha1 by direct fetching sha1,
2016-02-23).
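
To make the condition concrete, here is a rough sketch of the kind of
check the snippet relies on. It is an approximation written for this
message, not the actual helper from git-submodule.sh, and the function
name is made up.

```shell
# Hypothetical stand-in for the reachability test used above.
# $1: path to a repository (e.g. a submodule), $2: commit id
# Succeeds only if the commit exists and is reachable from some ref.
commit_reachable_from_refs() {
    # rev-list fails if the object is missing; it prints the commit if
    # the commit exists but no ref (or HEAD) reaches it.
    rev=$(git -C "$1" rev-list -n 1 "$2" --not --all 2>/dev/null) &&
    test -z "$rev"
}
```

So the fetch is conditional: when this kind of check succeeds, no
fetch is attempted, and it is only the missing (or unreachable) commit
that triggers one.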

Jonathan

* Re: Fetch on submodule update
  2018-08-06 15:41     ` Jonathan Nieder
@ 2018-08-06 15:44       ` Robert Dailey
  0 siblings, 0 replies; 6+ messages in thread
From: Robert Dailey @ 2018-08-06 15:44 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Git, Stefan Beller

On Mon, Aug 6, 2018 at 10:41 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Robert Dailey wrote:
>
>>                                                  Automatic would be
>> great if submodules were treated as integrated in a similar manner to
>> subtree, but it's not there. I wasn't aware that `submodule update`
>> did a fetch, because sometimes if I do that, I get errors saying SHA1
>> is not present (because the submodule did not get fetched). Granted I
>> haven't seen this in a while, so maybe the fetch on submodule update
>> is a newer feature. Do you know what triggers the fetch on update
>> without --remote? Is it the missing SHA1 that triggers it, or is it
>> fetching unconditionally?
>
> Thanks for this and the rest of the context you sent.  It's very
> helpful.
>
> The relevant code in git-submodule.sh is
>
>         # Run fetch only if $sha1 isn't present or it
>         # is not reachable from a ref.
>         is_tip_reachable "$sm_path" "$sha1" ||
>         fetch_in_submodule "$sm_path" $depth ||
>         say "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'")"
>
>         # Now we tried the usual fetch, but $sha1 may
>         # not be reachable from any of the refs
>         is_tip_reachable "$sm_path" "$sha1" ||
>         fetch_in_submodule "$sm_path" $depth "$sha1" ||
>         die "$(eval_gettext "Fetched in submodule path '\$displaypath', but it did not contain \$sha1. Direct fetching of that commit failed.")"
>
> The fallback to fetching by SHA-1 was introduced in v2.8.0-rc0~9^2
> (submodule: try harder to fetch needed sha1 by direct fetching sha1,
> 2016-02-23).

Yep, that's the root cause; I was basing my concerns on a legacy
issue. I just had avoided using `update` when I expected a fetch, so I
never saw the issue again, and thus didn't realize it was corrected.
Very helpful. Thanks again!
