git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] clone: --filter=tree:0 implies fetch.recurseSubmodules=no
@ 2020-11-20 20:36 Derrick Stolee via GitGitGadget
  2020-11-21  0:04 ` Jeff King
  2020-11-21 16:19 ` Philippe Blain
  0 siblings, 2 replies; 5+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-11-20 20:36 UTC
  To: git; +Cc: Jonathan Tan, Taylor Blau, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The partial clone feature has several modes, but only a few are quick
for a server to process using reachability bitmaps:

* Blobless: --filter=blob:none downloads all commits and trees and
  fetches necessary blobs on-demand.

* Treeless: --filter=tree:0 downloads all commits and fetches necessary
  trees and blobs on demand.

This treeless mode is closest to a shallow clone in total size (it adds
only the commit objects for the full history). This makes treeless
clones an interesting replacement for shallow clones. A user can run
more commands in a treeless clone than in a shallow clone, especially
'git log' (no pathspec).

In particular, servers can still serve 'git fetch' requests quickly by
calculating the difference between commit wants and haves using bitmaps.

I was testing this feature with this in mind, and I knew that some trees
would be downloaded multiple times when checking out a new branch, but I
did not expect to discover a significant issue with 'git fetch', at
least in repositories with submodules.

I was testing these commands:

	$ git clone --filter=tree:0 --single-branch --branch=master \
	  https://github.com/git/git
	$ git -C git fetch origin "+refs/heads/*:refs/remotes/origin/*"

This fetch command started downloading pack-file after pack-file of
trees. I never let it finish, since I got impatient with the repeated
downloads. During debugging, I found that
the stack triggering promisor_remote_get_direct() was going through
fetch_populated_submodules(). Notice that I did not recurse my
submodules in the original clone, so the sha1collisiondetection
submodule is not initialized. Even so, my 'git fetch' was scanning
commits for updates to submodules.

I decided that even if I did populate the submodules, the nature of
treeless clones makes me not want to care about the contents of commits
other than those that I am explicitly navigating to.

This loop of tree fetches can be avoided by adding
--no-recurse-submodules to the 'git fetch' command or setting
fetch.recurseSubmodules=no.
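
A rough sketch of the two workarounds, not part of the patch itself. The
`repo` variable is hypothetical: a scratch repository stands in for an
existing treeless clone so that the snippet runs offline.

```shell
# Scratch repository standing in for an existing treeless clone
# (hypothetical; substitute the path of a real clone).
repo=$(mktemp -d)
git init -q "$repo"

# Persistent workaround: stop 'git fetch' from scanning commits for
# submodule updates in this repository.
git -C "$repo" config fetch.recurseSubmodules no

# Confirm what was recorded.
recurse=$(git -C "$repo" config --get fetch.recurseSubmodules)
echo "fetch.recurseSubmodules=$recurse"

# One-off alternative for a single fetch (needs a configured remote,
# so it is left commented out in this offline sketch):
#   git -C "$repo" fetch --no-recurse-submodules origin
```

The config setting applies to every later fetch in that repository,
while the command-line flag only affects the fetch it is passed to.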

To make this as painless as possible for future users of treeless
clones, automatically set fetch.recurseSubmodules=no at clone time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    clone: --filter=tree:0 implies fetch.recurseSubmodules=no
    
    While testing different partial clone options, I stumbled across this
    one. My initial thought was that we were parsing commits and loading
    their root trees unnecessarily, but I see that doesn't happen after this
    change.
    
    Here are some recent discussions about using --filter=tree:0:
    
    [1] https://lore.kernel.org/git/aa7b89ee-08aa-7943-6a00-28dcf344426e@syntevo.com/
    [2] https://lore.kernel.org/git/cover.1588633810.git.me@ttaylorr.com/
    [3] https://lore.kernel.org/git/58274817-7ac6-b6ae-0d10-22485dfe5e0e@syntevo.com/
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-797%2Fderrickstolee%2Ftree-0-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-797/derrickstolee/tree-0-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/797

 list-objects-filter-options.c | 4 ++++
 t/t5616-partial-clone.sh      | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index defd3dfd10..249939dfa5 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -376,6 +376,10 @@ void partial_clone_register(
 		       expand_list_objects_filter_spec(filter_options));
 	free(filter_name);
 
+	if (filter_options->choice == LOFC_TREE_DEPTH &&
+	    !filter_options->tree_exclude_depth)
+		git_config_set("fetch.recursesubmodules", "no");
+
 	/* Make sure the config info are reset */
 	promisor_remote_reinit();
 }
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index f4d49d8335..b2eaf78069 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -341,6 +341,12 @@ test_expect_success 'partial clone with sparse filter succeeds' '
 	)
 '
 
+test_expect_success '--filter=tree:0 sets fetch.recurseSubmodules=no' '
+	rm -rf dst &&
+	git clone --filter=tree:0 "file://$(pwd)/src" dst &&
+	test_cmp_config -C dst no fetch.recursesubmodules
+'
+
 test_expect_success 'partial clone with unresolvable sparse filter fails cleanly' '
 	rm -rf dst.git &&
 	test_must_fail git clone --no-local --bare \

base-commit: faefdd61ec7c7f6f3c8c9907891465ac9a2a1475
-- 
gitgitgadget
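
For readers who want to try the behavior locally, here is a minimal
sketch in the spirit of the new test. The temporary paths and the
`work`, `filter`, and `promisor` variable names are assumptions made for
illustration. The last command is only expected to print "no" on a Git
build that includes this patch.

```shell
# Throwaway source repository (hypothetical paths under a temp dir).
work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" -c user.name=Example -c user.email=example@example.com \
	commit -q --allow-empty -m "initial"
# The serving side must allow filters for a partial clone to take effect.
git -C "$work/src" config uploadpack.allowFilter true

# Treeless clone over file:// so the filter is negotiated with upload-pack.
git clone -q --no-checkout --filter=tree:0 "file://$work/src" "$work/dst"

# Settings that clone records for the promisor remote.
filter=$(git -C "$work/dst" config --get remote.origin.partialclonefilter)
promisor=$(git -C "$work/dst" config --get remote.origin.promisor)
echo "filter=$filter promisor=$promisor"

# With the patch applied this is expected to print "no"; without it the
# key is unset and 'git config' prints nothing.
git -C "$work/dst" config --get fetch.recurseSubmodules || true
```

The clone uses --no-checkout so that the sketch does not depend on
lazily fetching trees from the scratch source repository.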

Thread overview: 5+ messages
2020-11-20 20:36 [PATCH] clone: --filter=tree:0 implies fetch.recurseSubmodules=no Derrick Stolee via GitGitGadget
2020-11-21  0:04 ` Jeff King
2020-11-23 15:18   ` Derrick Stolee
2020-11-24  8:04     ` Jeff King
2020-11-21 16:19 ` Philippe Blain
