git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Derrick Stolee <stolee@gmail.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH] clone: --filter=tree:0 implies fetch.recurseSubmodules=no
Date: Tue, 24 Nov 2020 03:04:01 -0500	[thread overview]
Message-ID: <X7y+cZ3MPqqmGXx/@coredump.intra.peff.net> (raw)
In-Reply-To: <8a31af0e-4440-d957-11fb-48c4d2abd5c7@gmail.com>

On Mon, Nov 23, 2020 at 10:18:42AM -0500, Derrick Stolee wrote:

> > If I clone with tree:0, I'm still going to get the tree for the thing
> > I'm actually checking out (assuming a non-bare repo). It would be
> > reasonable to recursively fetch the submodules in that commit to
> > check them out (assuming you specified --recurse-submodules).
> > 
> > If I then fetch again, I'll end up with another tree that I'm about to
> > checkout. So likewise, would it make sense to fetch any updates from
> > there?
> 
> One thing that is different is that we will fetch the trees we need
> _during that checkout_, so why do it preemptively in the fetch?
> 
> Further, the number of trees being fetched is _not_ the number of ref
> tips, but seems to be related to the total number of commits different.
> There must be a rev walk looking for which commits changed the modules
> file or the commit link in the root tree.

Yeah, that's exactly what the "on-demand" thing is. I said before it was
looking for .gitmodules, but I think it has to be looking for updated
gitlinks. I'm pretty sure the logic only kicks in if we have a
.gitmodules, though, to avoid the extra traversal when you're not using
submodules (but I guess we're all now paying that traversal cost in
git.git, even if we don't populate the modules).

I think the goal is to do all of the fetches at once, so after "git
fetch" you can then run checkout, merge, etc, without worrying that
you'll need network access later.

But it's not actually "checkout" that does the fetching. It will update
the gitlink, but that will just show a diff against the content of the
submodule. You have to actually "git submodule update" to update the
repository, which will then auto-fetch (though I suspect with
appropriate config or command-line options, checkout can be convinced to
trigger "submodule update").

So in that sense, maybe your patch is the most sensible thing. If you're
not checking out immediately, we probably have no business in a partial
clone guessing at trees you _might_ check out later, especially if it
involves demand-fetching them from the server. And if you do check out
and "submodule update" immediately, then we'd do the fetch then anyway.

We can punt on the logic to walk the trees, only looking at ones we
actually _do_ have locally, until somebody else wants to work on it.

> >   - during its poking, should it set the necessary variables so that it
> >     never demand-fetches from a promisor remote? I suspect this part
> >     may be hard, because "fetch" and "checkout" are distinct operations
> >     (so during the "fetch" we don't yet have the new tree demand-fetched
> >     by checkout; in fact the user might not even be interested in
> >     checking it out yet).
> 
> I also think this is a good idea. In particular, should we consider
> making the "submodule fetch" be part of the "promisor"? That is, we
> only fetch our submodule on checkout? Can we assume that the commit
> will still exist on the remote, to some extent?

I think issues of whether the commit will exist are outside the scope
here. "submodule update" has to decide if we have it, or how to get it
(if it can). But it won't fail a checkout in that case; you'll just be
left with a diff between the gitlink and what's in the submodule repo.

I'm not quite sure what you're asking with the rest of it. From this:

> Naturally, this only really applies for --filter=tree:0, since in cases
> like blobless clones, we would still want the commits and trees from the
> submodule.

it sounds like you're asking whether the submodule should also be using
the same filter. Maybe, but I think it would depend on the workflow and
the module (and probably is something that you'd want to be able to
configure independently). At any rate, I think that's orthogonal to this
issue.

> > Given the difficulties in the latter case, this may be the best we can
> > do. But in that case, what happens when we _do_ care about submodules,
> > and do:
> > 
> >   git clone --recurse-submodules --filter=tree:0 ...
> >   git fetch
> >   git merge origin
> > 
> > Will we correctly fetch-on-demand the submodules we need during the
> > merge operation? If so, then that user experience is probably pretty
> > reasonable.

Answering my own question: no, you'd run "git submodule update"
afterwards.

> Hopefully. Notably, the filter option does _not_ apply recursively
> to the submodules, so even if we try to make the superproject a partial
> clone, the submodule is not partial.

Yep, but I do think that's orthogonal.

> More research is required. Let's drop this patch, as I don't currently
> have time to do the necessary deep dive. If someone else has time to
> look into this, I'd be happy to review a better patch.

I don't mind dropping it, but I've actually come around to the idea that
your patch or something like it is probably a strict improvement.

-Peff

  reply	other threads:[~2020-11-24  8:05 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-20 20:36 [PATCH] clone: --filter=tree:0 implies fetch.recurseSubmodules=no Derrick Stolee via GitGitGadget
2020-11-21  0:04 ` Jeff King
2020-11-23 15:18   ` Derrick Stolee
2020-11-24  8:04     ` Jeff King [this message]
2020-11-21 16:19 ` Philippe Blain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=X7y+cZ3MPqqmGXx/@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).