git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, aevar@gmail.com,
	newren@gmail.com, Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH 13/25] clone: add --bundle-uri option
Date: Wed, 23 Feb 2022 18:30:51 +0000	[thread overview]
Message-ID: <050725d90ef019ca2684ec0afbfd701efea7f88a.1645641063.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1160.git.1645641063.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

Cloning a remote repository is one of the most expensive operations in
Git. The server can spend a lot of CPU time generating a pack-file for
the client's request. The amount of data can clog the network for a long
time, and the Git protocol is not resumable. For users with poor network
connections or are located far away from the origin server, this can be
especially painful.

The 'git bundle fetch' command allows users to bootstrap a repository
using a set of bundles. However, this would require them to use 'git
init' first, followed by the 'git bundle fetch', and finally add a
remote, fetch, and checkout the branch they want.

Instead, integrate this workflow directly into 'git clone' with the
--bundle-uri' option. If the user is aware of a bundle server, then they
can tell Git to bootstrap the new repository with these bundles before
fetching the remaining objects from the origin server.

RFC-TODO: Document this option in git-clone.txt.

RFC-TODO: I added a comment about the location of this code being
necessary for the later step of auto-discovering the bundle URI from the
origin server. This is probably not actually a requirement, but rather a
pain point around how I implemented the feature. If a --bundle-uri
option is specified, but SSH is used for the clone, then the SSH
connection is left open while Git downloads bundles from another server.
This is sub-optimal and should be reconsidered when fully reviewed.

RFC-TODO: create tests for this option with a variety of URI types.

RFC-TODO: a simple end-to-end test is available at the end of the
series.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/clone.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 9c29093b352..6df3d513dc4 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -33,6 +33,7 @@
 #include "packfile.h"
 #include "list-objects-filter-options.h"
 #include "hook.h"
+#include "bundle.h"
 
 /*
  * Overall FIXMEs:
@@ -74,6 +75,7 @@ static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
 static struct list_objects_filter_options filter_options;
 static struct string_list server_options = STRING_LIST_INIT_NODUP;
 static int option_remote_submodules;
+static const char *bundle_uri;
 
 static int recurse_submodules_cb(const struct option *opt,
 				 const char *arg, int unset)
@@ -155,6 +157,8 @@ static struct option builtin_clone_options[] = {
 		    N_("any cloned submodules will use their remote-tracking branch")),
 	OPT_BOOL(0, "sparse", &option_sparse_checkout,
 		    N_("initialize sparse-checkout file to include only files at root")),
+	OPT_STRING(0, "bundle-uri", &bundle_uri,
+		   N_("uri"), N_("A URI for downloading bundles before fetching from origin remote")),
 	OPT_END()
 };
 
@@ -1185,6 +1189,35 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 	refs = transport_get_remote_refs(transport, &transport_ls_refs_options);
 
+	/*
+	 * NOTE: The bundle URI download takes place after transport_get_remote_refs()
+	 * because a later change will introduce a check for recommended features,
+	 * which might include a recommended bundle URI.
+	 */
+
+	/*
+	 * Before fetching from the remote, download and install bundle
+	 * data from the --bundle-uri option.
+	 */
+	if (bundle_uri) {
+		const char *filter = NULL;
+
+		if (filter_options.filter_spec.nr)
+			filter = expand_list_objects_filter_spec(&filter_options);
+		/*
+		 * Set the config for fetching from this bundle URI in the
+		 * future, but do it before fetch_bundle_uri() which might
+		 * un-set it (for instance, if there is no table of contents).
+		 */
+		git_config_set("fetch.bundleuri", bundle_uri);
+		if (filter)
+			git_config_set("fetch.bundlefilter", filter);
+
+		if (!fetch_bundle_uri(bundle_uri, filter))
+			warning(_("failed to fetch objects from bundle URI '%s'"),
+				bundle_uri);
+	}
+
 	if (refs)
 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
 
-- 
gitgitgadget


  parent reply	other threads:[~2022-02-23 18:31 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-23 18:30 [PATCH 00/25] [RFC] Bundle URIs Derrick Stolee via GitGitGadget
2022-02-23 18:30 ` [PATCH 01/25] docs: document bundle URI standard Derrick Stolee via GitGitGadget
2022-03-02  2:28   ` Elijah Newren
2022-03-02 14:06     ` Derrick Stolee
2022-03-03  2:19       ` Elijah Newren
2022-03-03  2:44         ` Derrick Stolee
2022-02-23 18:30 ` [PATCH 02/25] bundle: alphabetize subcommands better Derrick Stolee via GitGitGadget
2022-03-11 13:47   ` Ævar Arnfjörð Bjarmason
2022-02-23 18:30 ` [PATCH 03/25] dir: extract starts_with_dot[_dot]_slash() Derrick Stolee via GitGitGadget
2022-02-23 18:30 ` [PATCH 04/25] remote: move relative_url() Derrick Stolee via GitGitGadget
2022-02-23 18:30 ` [PATCH 05/25] remote: allow relative_url() to return an absolute url Derrick Stolee via GitGitGadget
2022-02-23 18:30 ` [PATCH 06/25] http: make http_get_file() external Derrick Stolee via GitGitGadget
2022-02-23 18:30 ` [PATCH 07/25] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
2022-02-23 18:30 ` [PATCH 08/25] bundle: implement 'fetch' command for direct bundles Derrick Stolee via GitGitGadget
2022-02-23 18:30 ` [PATCH 09/25] bundle: parse table of contents during 'fetch' Derrick Stolee via GitGitGadget
2022-02-23 18:30 ` [PATCH 10/25] bundle: add --filter option to 'fetch' Derrick Stolee via GitGitGadget
2022-03-11 13:44   ` Ævar Arnfjörð Bjarmason
2022-02-23 18:30 ` [PATCH 11/25] bundle: allow relative URLs in table of contents Derrick Stolee via GitGitGadget
2022-03-11 13:42   ` Ævar Arnfjörð Bjarmason
2022-02-23 18:30 ` [PATCH 12/25] bundle: make it easy to call 'git bundle fetch' Derrick Stolee via GitGitGadget
2022-02-23 18:30 ` Derrick Stolee via GitGitGadget [this message]
2022-02-23 18:30 ` [PATCH 14/25] clone: --bundle-uri cannot be combined with --depth Derrick Stolee via GitGitGadget
2022-02-23 18:30 ` [PATCH 15/25] config: add git_config_get_timestamp() Derrick Stolee via GitGitGadget
2022-03-11 13:32   ` Ævar Arnfjörð Bjarmason
2022-02-23 18:30 ` [PATCH 16/25] bundle: only fetch bundles if timestamp is new Derrick Stolee via GitGitGadget
2022-02-23 18:30 ` [PATCH 17/25] fetch: fetch bundles before fetching original data Derrick Stolee via GitGitGadget
2022-02-23 18:30 ` [PATCH 18/25] connect.c: refactor sending of agent & object-format Ævar Arnfjörð Bjarmason via GitGitGadget
2022-02-23 18:30 ` [PATCH 19/25] protocol-caps: implement cap_features() Derrick Stolee via GitGitGadget
2022-02-23 18:30 ` [PATCH 20/25] serve: understand but do not advertise 'features' capability Derrick Stolee via GitGitGadget
2022-02-23 18:30 ` [PATCH 21/25] serve: advertise 'features' when config exists Derrick Stolee via GitGitGadget
2022-02-23 18:31 ` [PATCH 22/25] connect: implement get_recommended_features() Derrick Stolee via GitGitGadget
2022-02-23 18:31 ` [PATCH 23/25] transport: add connections for 'features' capability Derrick Stolee via GitGitGadget
2022-02-23 18:31 ` [PATCH 24/25] clone: use server-recommended bundle URI Derrick Stolee via GitGitGadget
2022-02-23 18:31 ` [PATCH 25/25] t5601: basic bundle URI test Derrick Stolee via GitGitGadget
2022-02-23 22:17 ` [PATCH 00/25] [RFC] Bundle URIs Ævar Arnfjörð Bjarmason
2022-02-24 14:11   ` Derrick Stolee
2022-03-04 13:30     ` Derrick Stolee
2022-03-04 14:49       ` Ævar Arnfjörð Bjarmason
2022-03-04 15:12         ` Derrick Stolee
2022-03-08 17:15           ` Derrick Stolee
2022-03-10 14:45             ` Johannes Schindelin
2022-04-07 19:08             ` Derrick Stolee
2022-04-08  9:15               ` Ævar Arnfjörð Bjarmason
2022-04-08 13:13                 ` Derrick Stolee
2022-04-08 18:26                   ` Junio C Hamano
2022-03-08  8:18   ` Teng Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=050725d90ef019ca2684ec0afbfd701efea7f88a.1645641063.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=aevar@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).