git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com,
	avarab@gmail.com, steadmon@google.com, chooglen@google.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH 4/8] bundle-uri: download in creationToken order
Date: Fri, 06 Jan 2023 20:36:41 +0000	[thread overview]
Message-ID: <57c0174d3752fb61a05e0653de9d3057616ed16a.1673037405.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1454.git.1673037405.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised.

The bundles in the list are sorted by their advertised creationToken
values, then downloaded in decreasing order. This avoids the previous
strategy of downloading bundles in an arbitrary order and attempting
to apply them (likely failing in the case of required commits) until
discovering the order through attempted unbundling.

During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. The cost of testing an unbundle is quite low, and instead
the chosen order is optimizing for a future bundle download during a
'git fetch' operation with a non-empty object store.

Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store. The next time it attempts
to download from the bundle list, it makes most sense to download only
the most-recent bundles until all tips successfully unbundle. The
strategy implemented here provides that short-circuit where the client
downloads a minimal set of bundles.

A later implementation detail will store the maximum creationToken seen
during such a bundle download, and the client will avoid downloading a
bundle unless its creationToken is strictly greater than that stored
value. For now, if the client seeks to download from an identical
bundle list since its previous download, it will download the
most-recent bundle then stop since its required commits are already in
the object store.

Add tests that exercise this behavior, but we will expand upon these
tests when incremental downloads during 'git fetch' make use of
creationToken values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 140 +++++++++++++++++++++++++++++++++++-
 t/t5558-clone-bundle-uri.sh |  41 ++++++++++-
 t/t5601-clone.sh            |  50 +++++++++++++
 3 files changed, 227 insertions(+), 4 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 63e2cc21057..b30c85ba6f2 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -434,6 +434,124 @@ static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data
 	return 0;
 }
 
+struct sorted_bundle_list {
+	struct remote_bundle_info **items;
+	size_t alloc;
+	size_t nr;
+};
+
+static int insert_bundle(struct remote_bundle_info *bundle, void *data)
+{
+	struct sorted_bundle_list *list = data;
+	list->items[list->nr++] = bundle;
+	return 0;
+}
+
+static int compare_creation_token(const void *va, const void *vb)
+{
+	const struct remote_bundle_info * const *a = va;
+	const struct remote_bundle_info * const *b = vb;
+
+	if ((*a)->creationToken > (*b)->creationToken)
+		return -1;
+	if ((*a)->creationToken < (*b)->creationToken)
+		return 1;
+	return 0;
+}
+
+static int fetch_bundles_by_token(struct repository *r,
+				  struct bundle_list *list)
+{
+	int cur;
+	int pop_or_push = 0;
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = list,
+		.mode = list->mode,
+	};
+	struct sorted_bundle_list sorted = {
+		.alloc = hashmap_get_size(&list->bundles),
+	};
+
+	ALLOC_ARRAY(sorted.items, sorted.alloc);
+
+	for_all_bundles_in_list(list, insert_bundle, &sorted);
+
+	QSORT(sorted.items, sorted.nr, compare_creation_token);
+
+	/*
+	 * Use a stack-based approach to download the bundles and attempt
+	 * to unbundle them in decreasing order by creation token. If we
+	 * fail to unbundle (after a successful download) then move to the
+	 * next non-downloaded bundle (push to the stack) and attempt
+	 * downloading. Once we succeed in applying a bundle, move to the
+	 * previous unapplied bundle (pop the stack) and attempt to unbundle
+	 * it again.
+	 *
+	 * In the case of a fresh clone, we will likely download all of the
+	 * bundles before successfully unbundling the oldest one, then the
+	 * rest of the bundles unbundle successfully in increasing order
+	 * of creationToken.
+	 *
+	 * If there are existing objects, then this process may terminate
+	 * early when all required commits from "new" bundles exist in the
+	 * repo's object store.
+	 */
+	cur = 0;
+	while (cur >= 0 && cur < sorted.nr) {
+		struct remote_bundle_info *bundle = sorted.items[cur];
+		if (!bundle->file) {
+			/* Not downloaded yet. Try downloading. */
+			if (download_bundle_to_file(bundle, &ctx)) {
+				/* Failure. Push to the stack. */
+				pop_or_push = 1;
+				goto stack_operation;
+			}
+
+			/* We expect bundles when using creationTokens. */
+			if (!is_bundle(bundle->file, 1)) {
+				warning(_("file downloaded from '%s' is not a bundle"),
+					bundle->uri);
+				break;
+			}
+		}
+
+		if (bundle->file && !bundle->unbundled) {
+			/*
+			 * This was downloaded, but not successfully
+			 * unbundled. Try unbundling again.
+			 */
+			if (unbundle_from_file(ctx.r, bundle->file)) {
+				/* Failed to unbundle. Push to stack. */
+				pop_or_push = 1;
+			} else {
+				/* Succeeded in unbundle. Pop stack. */
+				pop_or_push = -1;
+			}
+		}
+
+		/*
+		 * Else case: downloaded and unbundled successfully.
+		 * Skip this by moving in the same direction as the
+		 * previous step.
+		 */
+
+stack_operation:
+		/* Move in the specified direction and repeat. */
+		cur += pop_or_push;
+	}
+
+	free(sorted.items);
+
+	/*
+	 * We succeed if the loop terminates because 'cur' drops below
+	 * zero. The other case is that we terminate because 'cur'
+	 * reaches the end of the list, so we have a failure no matter
+	 * which bundles we apply from the list.
+	 */
+	return cur >= 0;
+}
+
 static int download_bundle_list(struct repository *r,
 				struct bundle_list *local_list,
 				struct bundle_list *global_list,
@@ -471,7 +589,14 @@ static int fetch_bundle_list_in_config_format(struct repository *r,
 		goto cleanup;
 	}
 
-	if ((result = download_bundle_list(r, &list_from_bundle,
+	/*
+	 * If this list uses the creationToken heuristic, then the URIs
+	 * it advertises are expected to be bundles, not nested lists.
+	 * We can drop 'global_list' and 'depth'.
+	 */
+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		result = fetch_bundles_by_token(r, &list_from_bundle);
+	else if ((result = download_bundle_list(r, &list_from_bundle,
 					   global_list, depth)))
 		goto cleanup;
 
@@ -613,6 +738,14 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	int result;
 	struct bundle_list global_list;
 
+	/*
+	 * If the creationToken heuristic is used, then the URIs
+	 * advertised by 'list' are not nested lists and instead
+	 * direct bundles. We do not need to use global_list.
+	 */
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		return fetch_bundles_by_token(r, list);
+
 	init_bundle_list(&global_list);
 
 	/* If a bundle is added to this global list, then it is required. */
@@ -621,7 +754,10 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	if ((result = download_bundle_list(r, list, &global_list, 0)))
 		goto cleanup;
 
-	result = unbundle_all_bundles(r, &global_list);
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		result = fetch_bundles_by_token(r, list);
+	else
+		result = unbundle_all_bundles(r, &global_list);
 
 cleanup:
 	for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 328caeeae9a..d7461ec907e 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -368,6 +368,8 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
 '
 
 test_expect_success 'clone bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
 	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
 	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle]
@@ -392,10 +394,45 @@ test_expect_success 'clone bundle list (http, creationToken)' '
 		creationToken = 4
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http-2 &&
+	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		clone-from clone-list-http-2 &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
-	git -C clone-list-http-2 cat-file --batch-check <oids
+	git -C clone-list-http-2 cat-file --batch-check <oids &&
+
+	for b in 1 2 3 4
+	do
+		test_bundle_downloaded bundle-$b.bundle trace-clone.txt ||
+			return 1
+	done
+'
+
+test_expect_success 'clone bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		clone-from clone-token-http &&
+
+	test_bundle_downloaded bundle-1.bundle trace-clone.txt &&
+	test_bundle_downloaded bundle-2.bundle trace-clone.txt
 '
 
 # Do not add tests here unless they use the HTTP server, as they will
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 1928ea1dd7c..57476b6e6d7 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -831,6 +831,56 @@ test_expect_success 'auto-discover multiple bundles from HTTP clone' '
 	grep -f pattern trace.txt
 '
 
+# Usage: test_bundle_downloaded <bundle-id> <trace-filename>
+test_bundle_downloaded () {
+	cat >pattern <<-EOF &&
+	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/$1.bundle"\]
+	EOF
+	grep -f pattern "$2"
+}
+
+test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '
+	test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+	test_when_finished rm -rf clone-heuristic trace*.txt &&
+
+	test_commit -C src newest &&
+	git -C src bundle create "$HTTPD_DOCUMENT_ROOT_PATH/newest.bundle" HEAD~1..HEAD &&
+	git clone --bare --no-local src "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/repo4.git/config" <<-EOF &&
+	[uploadPack]
+		advertiseBundleURIs = true
+
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "everything"]
+		uri = $HTTPD_URL/everything.bundle
+		creationtoken = 1
+
+	[bundle "new"]
+		uri = $HTTPD_URL/new.bundle
+		creationtoken = 2
+
+	[bundle "newest"]
+		uri = $HTTPD_URL/newest.bundle
+		creationtoken = 3
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+		git -c protocol.version=2 \
+		    -c transfer.bundleURI=true clone \
+		"$HTTPD_URL/smart/repo4.git" clone-heuristic &&
+
+	# We should fetch all bundles
+	for b in everything new newest
+	do
+		test_bundle_downloaded $b trace-clone.txt || return 1
+	done
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
 
-- 
gitgitgadget


  parent reply	other threads:[~2023-01-06 20:37 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
2023-01-06 20:36 ` [PATCH 1/8] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-17 18:17   ` Victoria Dye
2023-01-17 21:00     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-09  2:38   ` Junio C Hamano
2023-01-09 14:20     ` Derrick Stolee
2023-01-17 19:13   ` Victoria Dye
2023-01-06 20:36 ` [PATCH 3/8] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-09  3:08   ` Junio C Hamano
2023-01-09 14:41     ` Derrick Stolee
2023-01-17 19:24   ` Victoria Dye
2023-01-06 20:36 ` Derrick Stolee via GitGitGadget [this message]
2023-01-09  3:22   ` [PATCH 4/8] bundle-uri: download in creationToken order Junio C Hamano
2023-01-09 14:58     ` Derrick Stolee
2023-01-19 18:32   ` Victoria Dye
2023-01-20 14:56     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 5/8] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-19 19:42   ` Victoria Dye
2023-01-20 15:42     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 6/8] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-19 19:44   ` Victoria Dye
2023-01-06 20:36 ` [PATCH 7/8] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-19 20:34   ` Victoria Dye
2023-01-20 15:47     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 8/8] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-19 22:24   ` Victoria Dye
2023-01-20 15:53     ` Derrick Stolee
2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 01/10] bundle: optionally skip reachability walk Derrick Stolee via GitGitGadget
2023-01-23 18:03     ` Junio C Hamano
2023-01-23 18:24       ` Derrick Stolee
2023-01-23 20:13         ` Junio C Hamano
2023-01-23 22:30           ` Junio C Hamano
2023-01-24 12:27             ` Derrick Stolee
2023-01-24 14:14               ` [PATCH v2.5 01/11] bundle: test unbundling with incomplete history Derrick Stolee
2023-01-24 17:16                 ` Junio C Hamano
2023-01-24 14:16               ` [PATCH v2.5 02/11] bundle: verify using connected() Derrick Stolee
2023-01-24 17:33                 ` Junio C Hamano
2023-01-24 18:46                   ` Derrick Stolee
2023-01-24 20:41                     ` Junio C Hamano
2023-01-24 15:22               ` [PATCH v2 01/10] bundle: optionally skip reachability walk Junio C Hamano
2023-01-23 21:08         ` Junio C Hamano
2023-01-23 15:21   ` [PATCH v2 02/10] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-27 19:15     ` Victoria Dye
2023-01-23 15:21   ` [PATCH v2 03/10] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 04/10] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 05/10] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-27 19:17     ` Victoria Dye
2023-01-27 19:32       ` Junio C Hamano
2023-01-30 18:43         ` Derrick Stolee
2023-01-30 19:02           ` Junio C Hamano
2023-01-30 19:12             ` Derrick Stolee
2023-01-23 15:21   ` [PATCH v2 06/10] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 07/10] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 08/10] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-27 19:18     ` Victoria Dye
2023-01-23 15:21   ` [PATCH v2 09/10] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 10/10] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
2023-01-27 19:21     ` Victoria Dye
2023-01-30 18:47       ` Derrick Stolee
2023-01-27 19:28   ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Victoria Dye
2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 01/11] bundle: test unbundling with incomplete history Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 02/11] bundle: verify using check_connected() Derrick Stolee via GitGitGadget
2023-01-31 17:35       ` Junio C Hamano
2023-01-31 19:31         ` Derrick Stolee
2023-01-31 19:36           ` Junio C Hamano
2023-01-31 13:29     ` [PATCH v3 03/11] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 04/11] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 05/11] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-31 21:22       ` Junio C Hamano
2023-01-31 13:29     ` [PATCH v3 06/11] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 07/11] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 08/11] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 09/11] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 10/11] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 11/11] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
2023-01-31 22:01     ` [PATCH v3 00/11] Bundle URIs V: creationToken heuristic for incremental fetches Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57c0174d3752fb61a05e0653de9d3057616ed16a.1673037405.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=avarab@gmail.com \
    --cc=chooglen@google.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=steadmon@google.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).