git@vger.kernel.org list mirror (unofficial, one of many)
From: Philippe Blain <levraiphilippeblain@gmail.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>,
	peff@peff.net
Subject: Re: [PATCH] clone: --filter=tree:0 implies fetch.recurseSubmodules=no
Date: Sat, 21 Nov 2020 11:19:00 -0500
Message-ID: <e539892e-743a-96d7-a540-b7f0af22cbe1@gmail.com> (raw)
In-Reply-To: <pull.797.git.1605904586929.gitgitgadget@gmail.com>

Hi Stolee,

On 2020-11-20 at 15:36, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The partial clone feature has several modes, but only a few are quick
> for a server to process using reachability bitmaps:
>
> * Blobless: --filter=blob:none downloads all commits and trees and
>   fetches necessary blobs on-demand.
>
> * Treeless: --filter=tree:0 downloads all commits and fetches necessary
>   trees and blobs on demand.
>
> This treeless mode is most similar to a shallow clone in the total size
> (it only adds the commit objects for the full history). This makes
> treeless clones an interesting replacement for shallow clones. A user
> can run more commands in a treeless clone than in a shallow clone,
> especially 'git log' (no pathspec).
>
> In particular, servers can still serve 'git fetch' requests quickly by
> calculating the difference between commit wants and haves using bitmaps.
>
> I was testing this feature with this in mind, and I knew that some trees
> would be downloaded multiple times when checking out a new branch, but I
> did not expect to discover a significant issue with 'git fetch', at
> least in repositories with submodules.
>
> I was testing these commands:
>
> 	$ git clone --filter=tree:0 --single-branch --branch=master \
> 	  https://github.com/git/git
> 	$ git -C git fetch origin "+refs/heads/*:refs/remotes/origin/*"
>
> This fetch command started downloading several pack-files of trees
> before completing the command. I never let it finish since I got so
> impatient with the repeated downloads. During debugging, I found that
> the stack triggering promisor_remote_get_direct() was going through
> fetch_populated_submodules(). Notice that I did not recurse my
> submodules in the original clone, so the sha1collisiondetection
> submodule is not initialized. Even so, my 'git fetch' was scanning
> commits for updates to submodules.

I'm not super familiar with the inner workings of
fetch_populated_submodules(), but it seems weird that this function
does anything in that case. It should do nothing, as the submodule is
not populated. Maybe it would be worth investigating what exactly is
happening?
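
As a starting point, one could first confirm that the submodule really
is uninitialized. The sketch below does not use the git.git clone from
the quoted message. It builds a hypothetical scratch repository with a
placeholder object ID and URL (both are assumptions, not taken from the
patch), so it can run without network access. It only shows the state
that 'git submodule status' reports and does not exercise
fetch_populated_submodules() itself.

```shell
# Hypothetical scratch repository; nothing here touches the network.
repo=$(mktemp -d)
git init -q "$repo"

# Describe a submodule named "sub" in .gitmodules (placeholder URL).
printf '[submodule "sub"]\n\tpath = sub\n\turl = https://example.com/sub.git\n' \
	>"$repo/.gitmodules"
git -C "$repo" add .gitmodules

# Record a gitlink entry (mode 160000) for "sub" without cloning it.
# The object ID is a placeholder and assumes a SHA-1 repository.
git -C "$repo" update-index --add \
	--cacheinfo 160000,1111111111111111111111111111111111111111,sub

# A leading '-' in the output marks a submodule that is not initialized.
status=$(git -C "$repo" submodule status)
printf '%s\n' "$status"
```

If the real clone shows the same leading '-', the submodule is indeed
not populated, and the remaining question is why the fetch still walks
commits looking for submodule changes.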

> I decided that even if I did populate the submodules, the nature of
> treeless clones makes me not want to care about the contents of commits
> other than those that I am explicitly navigating to.
>
> This loop of tree fetches can be avoided by adding
> --no-recurse-submodules to the 'git fetch' command or setting
> fetch.recurseSubmodules=no.
>
> To make this as painless as possible for future users of treeless
> clones, automatically set fetch.recurseSubmodules=no at clone time.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>     clone: --filter=tree:0 implies fetch.recurseSubmodules=no
>     
>     While testing different partial clone options, I stumbled across this
>     one. My initial thought was that we were parsing commits and loading
>     their root trees unnecessarily, but I see that doesn't happen after this
>     change.
>     
>     Here are some recent discussions about using --filter=tree:0:
>     
>     [1] https://lore.kernel.org/git/aa7b89ee-08aa-7943-6a00-28dcf344426e@syntevo.com/
>     [2] https://lore.kernel.org/git/cover.1588633810.git.me@ttaylorr.com/
>     [3] https://lore.kernel.org/git/58274817-7ac6-b6ae-0d10-22485dfe5e0e@syntevo.com/
>     
>     Thanks, -Stolee
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-797%2Fderrickstolee%2Ftree-0-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-797/derrickstolee/tree-0-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/797
>
>  list-objects-filter-options.c | 4 ++++
>  t/t5616-partial-clone.sh      | 6 ++++++
>  2 files changed, 10 insertions(+)

In any case I think such a change would also need a doc update, probably
in Documentation/fetch-options.txt and Documentation/config/fetch.txt.
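
For reference, the manual workaround mentioned in the quoted message
could look roughly like the sketch below. It uses a hypothetical scratch
repository in place of an existing treeless clone (the variable names
are illustrative only) and shows the persistent configuration form. The
one-off form is left as a comment because it needs network access.

```shell
# Hypothetical scratch repository standing in for a treeless clone.
repo=$(mktemp -d)
git init -q "$repo"

# Persistent form: turn off submodule recursion for future fetches.
git -C "$repo" config fetch.recurseSubmodules no

# Read the setting back to confirm it was stored.
value=$(git -C "$repo" config --get fetch.recurseSubmodules)
printf 'fetch.recurseSubmodules=%s\n' "$value"

# One-off form, per fetch invocation (not run here):
#   git -C git fetch --no-recurse-submodules origin \
#     "+refs/heads/*:refs/remotes/origin/*"
```

The configuration form is what the patch proposes to set automatically
at clone time, so the documentation would need to mention that the
value may already be present in a treeless clone.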

Cheers,

Philippe.



