From: Philippe Blain <levraiphilippeblain@gmail.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>,
	peff@peff.net
Subject: Re: [PATCH] clone: --filter=tree:0 implies fetch.recurseSubmodules=no
Date: Sat, 21 Nov 2020 11:19:00 -0500
Message-ID: <e539892e-743a-96d7-a540-b7f0af22cbe1@gmail.com>
In-Reply-To: <pull.797.git.1605904586929.gitgitgadget@gmail.com>

Hi Stolee,

On 20-11-20 15 h 36, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The partial clone feature has several modes, but only a few are quick
> for a server to process using reachability bitmaps:
>
> * Blobless: --filter=blob:none downloads all commits and trees and
>   fetches necessary blobs on-demand.
>
> * Treeless: --filter=tree:0 downloads all commits and fetches necessary
>   trees and blobs on demand.
>
> This treeless mode is most similar to a shallow clone in the total size
> (it only adds the commit objects for the full history). This makes
> treeless clones an interesting replacement for shallow clones. A user
> can run more commands in a treeless clone than in a shallow clone,
> especially 'git log' (no pathspec).
>
> In particular, servers can still serve 'git fetch' requests quickly by
> calculating the difference between commit wants and haves using bitmaps.
>
> I was testing this feature with this in mind, and I knew that some trees
> would be downloaded multiple times when checking out a new branch, but I
> did not expect to discover a significant issue with 'git fetch', at
> least in repositories with submodules.
>
> I was testing these commands:
>
> 	$ git clone --filter=tree:0 --single-branch --branch=master \
> 	  https://github.com/git/git
> 	$ git -C git fetch origin "+refs/heads/*:refs/remotes/origin/*"
>
> This fetch command started downloading several pack-files of trees
> before completing the command. I never let it finish since I got so
> impatient with the repeated downloads. During debugging, I found that
> the stack triggering promisor_remote_get_direct() was going through
> fetch_populated_submodules(). Notice that I did not recurse my
> submodules in the original clone, so the sha1collisiondetection
> submodule is not initialized. Even so, my 'git fetch' was scanning
> commits for updates to submodules.

I'm not super familiar with the inner workings
of fetch_populated_submodules(), but it seems weird that this function
does something in that case. It should do nothing, as the submodule is
not populated. Maybe it would be worth it to investigate what exactly is
happening?
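
As a starting point, here is a minimal sketch of how one could confirm
that a submodule really is unpopulated in a clone made without
--recurse-submodules. The throwaway repositories ("sub", "super",
"clone") are hypothetical stand-ins for git.git and its
sha1collisiondetection submodule, and the sketch does not use
--filter=tree:0, so it only checks the submodule state, not the fetch
behaviour itself:

```shell
set -eu

# Hypothetical scratch area; none of these names come from the patch.
tmp=$(mktemp -d)
cd "$tmp"

# A tiny repository that will play the role of the submodule.
git init -q sub
git -C sub -c user.name=t -c user.email=t@example.com \
	commit -q --allow-empty -m "sub: initial commit"

# A superproject that records "sub" as a submodule.
# protocol.file.allow=always is needed on recent Git versions to
# clone a submodule from a local path.
git init -q super
git -C super -c protocol.file.allow=always \
	submodule --quiet add "$tmp/sub" sub
git -C super -c user.name=t -c user.email=t@example.com \
	commit -q -m "super: add submodule"

# Clone without recursing into submodules, as in the original report.
git clone -q super clone

# A leading "-" in the status output marks an uninitialized submodule.
status=$(git -C clone submodule status)
echo "$status"
```

If the status line starts with "-", the submodule is not initialized,
which is the situation where I would expect the fetch to skip it.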

> I decided that even if I did populate the submodules, the nature of
> treeless clones makes me not want to care about the contents of commits
> other than those that I am explicitly navigating to.
>
> This loop of tree fetches can be avoided by adding
> --no-recurse-submodules to the 'git fetch' command or setting
> fetch.recurseSubmodules=no.
>
> To make this as painless as possible for future users of treeless
> clones, automatically set fetch.recurseSubmodules=no at clone time.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>     clone: --filter=tree:0 implies fetch.recurseSubmodules=no
>     
>     While testing different partial clone options, I stumbled across this
>     one. My initial thought was that we were parsing commits and loading
>     their root trees unnecessarily, but I see that doesn't happen after this
>     change.
>     
>     Here are some recent discussions about using --filter=tree:0:
>     
>     [1] https://lore.kernel.org/git/aa7b89ee-08aa-7943-6a00-28dcf344426e@syntevo.com/
>     [2] https://lore.kernel.org/git/cover.1588633810.git.me@ttaylorr.com/
>     [3] https://lore.kernel.org/git/58274817-7ac6-b6ae-0d10-22485dfe5e0e@syntevo.com/
>     
>     Thanks, -Stolee
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-797%2Fderrickstolee%2Ftree-0-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-797/derrickstolee/tree-0-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/797
>
>  list-objects-filter-options.c | 4 ++++
>  t/t5616-partial-clone.sh      | 6 ++++++
>  2 files changed, 10 insertions(+)

In any case I think such a change would also need a doc update, probably
in Documentation/fetch-options.txt and Documentation/config/fetch.txt.
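
For reference, a minimal sketch of the manual equivalent that such
documentation could describe, i.e. what a user can already do by hand
today. The scratch repository is hypothetical and only serves to show
the setting being written and read back; this is not what the patch
itself implements:

```shell
set -eu

# Hypothetical scratch repository, only used to demonstrate the setting.
repo=$(mktemp -d)
git init -q "$repo"

# Persistently disable submodule recursion for fetches in this repository.
git -C "$repo" config fetch.recurseSubmodules no

# Read the value back, both as stored and normalized as a boolean.
value=$(git -C "$repo" config --get fetch.recurseSubmodules)
as_bool=$(git -C "$repo" config --type=bool --get fetch.recurseSubmodules)
echo "fetch.recurseSubmodules=$value (bool: $as_bool)"

# One-off alternative for a single command, in a repository with a remote:
#   git fetch --no-recurse-submodules origin
```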

Cheers,

Philippe.


