From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>,
Taylor Blau <me@ttaylorr.com>,
Derrick Stolee <derrickstolee@github.com>,
Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH] clone: --filter=tree:0 implies fetch.recurseSubmodules=no
Date: Fri, 20 Nov 2020 20:36:26 +0000
Message-ID: <pull.797.git.1605904586929.gitgitgadget@gmail.com>
From: Derrick Stolee <dstolee@microsoft.com>
The partial clone feature has several modes, but only a few are quick
for a server to process using reachability bitmaps:
* Blobless: --filter=blob:none downloads all commits and trees and
fetches necessary blobs on-demand.
* Treeless: --filter=tree:0 downloads all commits and fetches necessary
trees and blobs on demand.
This treeless mode is most similar to a shallow clone in total size
(it only adds the commit objects for the full history). This makes
treeless clones an interesting replacement for shallow clones. A user
can run more commands in a treeless clone than in a shallow clone,
notably 'git log' without a pathspec.
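As a rough illustration (not part of the patch), the sketch below builds
a throwaway local repository and lists which object types each filter
leaves in the clone. The temporary paths, the single demo commit, and
the object_types helper are assumptions made up for this example. It
also assumes uploadpack.allowFilter is enabled on the serving
repository, since the filter is otherwise ignored.

```shell
set -e
tmp=$(mktemp -d)

# Hypothetical source repository with a single commit.
git init -q "$tmp/src"
echo hello >"$tmp/src/file.txt"
git -C "$tmp/src" add file.txt
git -C "$tmp/src" -c user.name=Example -c user.email=example@example.com \
	commit -q -m "initial"

# The serving side must allow filters, or --filter is ignored.
git -C "$tmp/src" config uploadpack.allowFilter true

# --no-checkout avoids the on-demand fetches that a checkout would trigger.
git clone -q --no-checkout --filter=blob:none "file://$tmp/src" "$tmp/blobless"
git clone -q --no-checkout --filter=tree:0 "file://$tmp/src" "$tmp/treeless"

# Print the distinct object types that exist locally in a clone.
object_types () {
	git -C "$1" cat-file --batch-check='%(objecttype)' --batch-all-objects |
	sort -u | paste -s -d ' ' -
}

blobless_types=$(object_types "$tmp/blobless")
treeless_types=$(object_types "$tmp/treeless")
echo "blobless: $blobless_types"
echo "treeless: $treeless_types"
```

With a reasonably recent Git, the blobless clone should hold commits
and trees, while the treeless clone should hold commits only.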
In particular, servers can still serve 'git fetch' requests quickly by
calculating the difference between commit wants and haves using bitmaps.
I was testing this feature with this in mind, and I knew that some trees
would be downloaded multiple times when checking out a new branch, but I
did not expect to discover a significant issue with 'git fetch', at
least in repositories with submodules.
I was testing these commands:
$ git clone --filter=tree:0 --single-branch --branch=master \
https://github.com/git/git
$ git -C git fetch origin "+refs/heads/*:refs/remotes/origin/*"
This fetch command started downloading several pack-files of trees
before completing the command. I never let it finish since I got so
impatient with the repeated downloads. During debugging, I found that
the stack triggering promisor_remote_get_direct() was going through
fetch_populated_submodules(). Notice that I did not recurse my
submodules in the original clone, so the sha1collisiondetection
submodule is not initialized. Even so, my 'git fetch' was scanning
commits for updates to submodules.
I decided that even if I had populated the submodules, the nature of
treeless clones means I do not care about the contents of commits
other than those that I am explicitly navigating to.
This loop of tree fetches can be avoided by adding
--no-recurse-submodules to the 'git fetch' command or setting
fetch.recurseSubmodules=no.
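For reference, a minimal sketch of both workarounds follows. The
temporary repository is a hypothetical stand-in for an existing
treeless clone, and the one-off fetch is left commented out because it
needs a configured remote.

```shell
set -e
repo=$(mktemp -d)   # hypothetical stand-in for an existing treeless clone
git init -q "$repo"

# Persistent: record the setting in the repository's config so every
# later 'git fetch' skips the submodule scan.
git -C "$repo" config fetch.recurseSubmodules no
git -C "$repo" config --get fetch.recurseSubmodules

# One-off alternative for a single fetch (needs a remote named origin):
# git -C "$repo" fetch --no-recurse-submodules origin
```

The config setting is the variant that the patch applies automatically
at clone time.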
To make this as painless as possible for future users of treeless
clones, automatically set fetch.recurseSubmodules=no at clone time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
clone: --filter=tree:0 implies fetch.recurseSubmodules=no
While testing different partial clone options, I stumbled across this
one. My initial thought was that we were parsing commits and loading
their root trees unnecessarily, but I see that doesn't happen after this
change.
Here are some recent discussions about using --filter=tree:0:
[1] https://lore.kernel.org/git/aa7b89ee-08aa-7943-6a00-28dcf344426e@syntevo.com/
[2] https://lore.kernel.org/git/cover.1588633810.git.me@ttaylorr.com/
[3] https://lore.kernel.org/git/58274817-7ac6-b6ae-0d10-22485dfe5e0e@syntevo.com/
Thanks, -Stolee
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-797%2Fderrickstolee%2Ftree-0-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-797/derrickstolee/tree-0-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/797
list-objects-filter-options.c | 4 ++++
t/t5616-partial-clone.sh | 6 ++++++
2 files changed, 10 insertions(+)
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index defd3dfd10..249939dfa5 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -376,6 +376,10 @@ void partial_clone_register(
expand_list_objects_filter_spec(filter_options));
free(filter_name);
+ if (filter_options->choice == LOFC_TREE_DEPTH &&
+ !filter_options->tree_exclude_depth)
+ git_config_set("fetch.recursesubmodules", "no");
+
/* Make sure the config info are reset */
promisor_remote_reinit();
}
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index f4d49d8335..b2eaf78069 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -341,6 +341,12 @@ test_expect_success 'partial clone with sparse filter succeeds' '
)
'
+test_expect_success '--filter=tree:0 sets fetch.recurseSubmodules=no' '
+ rm -rf dst &&
+ git clone --filter=tree:0 "file://$(pwd)/src" dst &&
+ test_cmp_config -C dst no fetch.recursesubmodules
+'
+
test_expect_success 'partial clone with unresolvable sparse filter fails cleanly' '
rm -rf dst.git &&
test_must_fail git clone --no-local --bare \
base-commit: faefdd61ec7c7f6f3c8c9907891465ac9a2a1475
--
gitgitgadget