git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches
@ 2023-01-06 20:36 Derrick Stolee via GitGitGadget
  2023-01-06 20:36 ` [PATCH 1/8] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
                   ` (8 more replies)
  0 siblings, 9 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-06 20:36 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee

This fifth part to the bundle URIs feature follows part IV (advertising via
protocol v2) which recently merged to 'master', so this series is based on
'master'.

This part introduces the concept of a heuristic that a bundle list can
advertise. The purpose of the heuristic is to hint to the Git client that
the bundles can be downloaded and unbundled in a certain order. In
particular, that order can assist with using the same bundle URI to download
new bundles from an updated bundle list. This allows bundle URIs to assist
with incremental fetches, not just initial clones.

The only planned heuristic is the "creationToken" heuristic where the bundle
list adds a 64-bit unsigned integer "creationToken" value to each bundle in
the list. Those values provide an ordering on the bundles implying that the
bundles can be unbundled in increasing creationToken order and at each point
the required commits for the ith bundle were provided by bundles with lower
creationTokens.

At clone time, the only difference implied by the creationToken order is
that the Git client does not need to guess at the order to apply the
bundles, but instead can use the creationToken order to apply them without
failure and retry. However, this presents an interesting benefit during
fetches: the Git client can check the bundle list and download bundles in
decreasing creationToken order until the required commits for these bundles
are present within the repository's object store. This prevents downloading
more bundle information than required.

The creationToken value is also a promise that the Git client will not need
to download a bundle if its creationToken is less than or equal to the
creationToken of a previously-downloaded bundle. This further improves the
performance during a fetch in that the client does not need to download any
bundles at all if it recognizes that the maximum creationToken is the same
(or smaller than) a previously-downloaded creationToken.

The creationToken concept is documented in the existing design document at
Documentation/technical/bundle-uri.txt, including suggested ways for bundle
providers to organize their bundle lists to take advantage of the heuristic.

This series formalizes the creationToken heuristic and the Git client logic
for understanding it. Further, for bundle lists provided by the git clone
--bundle-uri option, the Git client will recognize the heuristic as being
helpful for incremental fetches and store config values so that future git
fetch commands check the bundle list before communicating with any Git
remotes.

Note that this option does not integrate fetches with bundle lists
advertised via protocol v2. I spent some time working on this, but found the
implementation to be distinct enough that it merited its own attention in a
separate series. In particular, the configuration for indicating that a
fetch should check the bundle-uri protocol v2 command seemed best to be
located within a Git remote instead of a repository-global key such as is
being used for a static URI. Further, the timing of querying the bundle-uri
command during a git fetch command is significantly different and more
complicated than how it is used in git clone.


What Remains?
=============

Originally, I had planned on making this bundle URI work a 5-part series,
and this is part 5. Shouldn't we be done now?

There are two main things that should be done after this series, in any
order:

 * Teach git fetch to check a bundle list advertised by a remote over the
   bundle-uri protocol v2 command.
 * Add the bundle.<id>.filter option to allow advertising bundles and
   partial bundles side-by-side.

There is also room for expanding tests for more error conditions, or for
other tweaks that are not currently part of the design document. I do think
that after this series, the feature will be easier to work on different
parts in parallel.


Patch Outline
=============

 * Patch 1 creates a test setup demonstrating a creationToken heuristic. At
   this point, the Git client ignores the heuristic and uses its ad-hoc
   strategy for ordering the bundles.
 * Patches 2 and 3 teach Git to parse the bundle.heuristic and
   bundle.<id>.creationToken keys in a bundle list.
 * Patch 4 teaches Git to download bundles using the creationToken order.
   This order uses a stack approach to start from the maximum creationToken
   and continue downloading the next bundle in the list until all bundles
   can successfully be unbundled. This is the algorithm required for
   incremental fetches, while initial clones could download in the opposite
   order. Since clones will download all bundles anyway, having a second
   code path just for clones seemed unnecessary.
 * Patch 5 teaches git clone --bundle-uri to set fetch.bundleURI when the
   advertised bundle list includs a heuristic that Git understands.
 * Patch 6 updates the design document to remove reference to a bundle.flag
   option that was previously going to indicate the list was designed for
   fetches, but the bundle.heuristic option already does that.
 * Patch 7 teaches git fetch to check fetch.bundleURI and download bundles
   from that static URI before connecting to remotes via the Git protocol.
 * Patch 8 introduces a new fetch.bundleCreationToken config value to store
   the maximum creationToken of downloaded bundles. This prevents
   downloading the latest bundle on every git fetch command, reducing waste.

Thanks,

 * Stolee

Derrick Stolee (8):
  t5558: add tests for creationToken heuristic
  bundle-uri: parse bundle.heuristic=creationToken
  bundle-uri: parse bundle.<id>.creationToken values
  bundle-uri: download in creationToken order
  clone: set fetch.bundleURI if appropriate
  bundle-uri: drop bundle.flag from design doc
  fetch: fetch from an external bundle URI
  bundle-uri: store fetch.bundleCreationToken

 Documentation/config/bundle.txt        |   7 +
 Documentation/config/fetch.txt         |  16 ++
 Documentation/technical/bundle-uri.txt |   8 +-
 builtin/clone.c                        |   6 +-
 builtin/fetch.c                        |   8 +
 bundle-uri.c                           | 208 ++++++++++++++++++++++++-
 bundle-uri.h                           |  28 +++-
 t/t5558-clone-bundle-uri.sh            | 204 +++++++++++++++++++++++-
 t/t5601-clone.sh                       |  50 ++++++
 t/t5750-bundle-uri-parse.sh            |  37 +++++
 10 files changed, 561 insertions(+), 11 deletions(-)


base-commit: 4dbebc36b0893f5094668ddea077d0e235560b16
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1454%2Fderrickstolee%2Fbundle-redo%2FcreationToken-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1454/derrickstolee/bundle-redo/creationToken-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1454
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 1/8] t5558: add tests for creationToken heuristic
  2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
@ 2023-01-06 20:36 ` Derrick Stolee via GitGitGadget
  2023-01-17 18:17   ` Victoria Dye
  2023-01-06 20:36 ` [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-06 20:36 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

As documented in the bundle URI design doc in 2da14fad8fe (docs:
document bundle URI standard, 2022-08-09), the 'creationToken' member of
a bundle URI allows a bundle provider to specify a total order on the
bundles.

Future changes will allow the Git client to understand these members and
modify its behavior around downloading the bundles in that order. In the
meantime, create tests that add creation tokens to the bundle list. For
now, the Git client correctly ignores these unknown keys.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t5558-clone-bundle-uri.sh | 52 +++++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9155f31fa2c..328caeeae9a 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -284,7 +284,17 @@ test_expect_success 'clone HTTP bundle' '
 	test_config -C clone-http log.excludedecoration refs/bundle/
 '
 
+# usage: test_bundle_downloaded <bundle-name> <trace-file>
+test_bundle_downloaded () {
+	cat >pattern <<-EOF &&
+	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/$1"\]
+	EOF
+	grep -f pattern "$2"
+}
+
 test_expect_success 'clone bundle list (HTTP, no heuristic)' '
+	test_when_finished rm -f trace*.txt &&
+
 	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
 	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle]
@@ -304,12 +314,19 @@ test_expect_success 'clone bundle list (HTTP, no heuristic)' '
 		uri = $HTTPD_URL/bundle-4.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+		git clone --bundle-uri="$HTTPD_URL/bundle-list" \
 		clone-from clone-list-http  2>err &&
 	! grep "Repository lacks these prerequisite commits" err &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
-	git -C clone-list-http cat-file --batch-check <oids
+	git -C clone-list-http cat-file --batch-check <oids &&
+
+	for b in 1 2 3 4
+	do
+		test_bundle_downloaded bundle-$b.bundle trace-clone.txt ||
+			return 1
+	done
 '
 
 test_expect_success 'clone bundle list (HTTP, any mode)' '
@@ -350,6 +367,37 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'clone bundle list (http, creationToken)' '
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http-2 &&
+
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-list-http-2 cat-file --batch-check <oids
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken
  2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
  2023-01-06 20:36 ` [PATCH 1/8] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
@ 2023-01-06 20:36 ` Derrick Stolee via GitGitGadget
  2023-01-09  2:38   ` Junio C Hamano
  2023-01-17 19:13   ` Victoria Dye
  2023-01-06 20:36 ` [PATCH 3/8] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-06 20:36 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The bundle.heuristic value communicates that the bundle list is
organized to make use of the bundle.<id>.creationToken values that may
be provided in the bundle list. Those values will create a total order
on the bundles, allowing the Git client to download them in a specific
order and even remember previously-downloaded bundles by storing the
maximum creation token value.

Before implementing any logic that parses or uses the
bundle.<id>.creationToken values, teach Git to parse the
bundle.heuristic value from a bundle list. We can use 'test-tool
bundle-uri' to print the heuristic value and verify that the parsing
works correctly.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/bundle.txt |  7 +++++++
 bundle-uri.c                    | 21 +++++++++++++++++++++
 bundle-uri.h                    | 14 ++++++++++++++
 t/t5750-bundle-uri-parse.sh     | 19 +++++++++++++++++++
 4 files changed, 61 insertions(+)

diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
index daa21eb674a..3faae386853 100644
--- a/Documentation/config/bundle.txt
+++ b/Documentation/config/bundle.txt
@@ -15,6 +15,13 @@ bundle.mode::
 	complete understanding of the bundled information (`all`) or if any one
 	of the listed bundle URIs is sufficient (`any`).
 
+bundle.heuristic::
+	If this string-valued key exists, then the bundle list is designed to
+	work well with incremental `git fetch` commands. The heuristic signals
+	that there are additional keys available for each bundle that help
+	determine which subset of bundles the client should download. The
+	only value currently understood is `creationToken`.
+
 bundle.<id>.*::
 	The `bundle.<id>.*` keys are used to describe a single item in the
 	bundle list, grouped under `<id>` for identification purposes.
diff --git a/bundle-uri.c b/bundle-uri.c
index 36268dda172..56c94595c2a 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -9,6 +9,11 @@
 #include "config.h"
 #include "remote.h"
 
+static const char *heuristics[] = {
+	[BUNDLE_HEURISTIC_NONE] = "",
+	[BUNDLE_HEURISTIC_CREATIONTOKEN] = "creationToken",
+};
+
 static int compare_bundles(const void *hashmap_cmp_fn_data,
 			   const struct hashmap_entry *he1,
 			   const struct hashmap_entry *he2,
@@ -100,6 +105,9 @@ void print_bundle_list(FILE *fp, struct bundle_list *list)
 	fprintf(fp, "\tversion = %d\n", list->version);
 	fprintf(fp, "\tmode = %s\n", mode);
 
+	if (list->heuristic)
+		printf("\theuristic = %s\n", heuristics[list->heuristic]);
+
 	for_all_bundles_in_list(list, summarize_bundle, fp);
 }
 
@@ -142,6 +150,19 @@ static int bundle_list_update(const char *key, const char *value,
 			return 0;
 		}
 
+		if (!strcmp(subkey, "heuristic")) {
+			int i;
+			for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
+				if (!strcmp(value, heuristics[i])) {
+					list->heuristic = i;
+					return 0;
+				}
+			}
+
+			/* Ignore unknown heuristics. */
+			return 0;
+		}
+
 		/* Ignore other unknown global keys. */
 		return 0;
 	}
diff --git a/bundle-uri.h b/bundle-uri.h
index d5e89f1671c..ad82174112d 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -52,6 +52,14 @@ enum bundle_list_mode {
 	BUNDLE_MODE_ANY
 };
 
+enum bundle_list_heuristic {
+	BUNDLE_HEURISTIC_NONE = 0,
+	BUNDLE_HEURISTIC_CREATIONTOKEN,
+
+	/* Must be last. */
+	BUNDLE_HEURISTIC__COUNT,
+};
+
 /**
  * A bundle_list contains an unordered set of remote_bundle_info structs,
  * as well as information about the bundle listing, such as version and
@@ -75,6 +83,12 @@ struct bundle_list {
 	 * advertised by the bundle list at that location.
 	 */
 	char *baseURI;
+
+	/**
+	 * A list can have a heuristic, which helps reduce the number of
+	 * downloaded bundles.
+	 */
+	enum bundle_list_heuristic heuristic;
 };
 
 void init_bundle_list(struct bundle_list *list);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 7b4f930e532..6fc92a9c0d4 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -250,4 +250,23 @@ test_expect_success 'parse config format edge cases: empty key or value' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 3/8] bundle-uri: parse bundle.<id>.creationToken values
  2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
  2023-01-06 20:36 ` [PATCH 1/8] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
  2023-01-06 20:36 ` [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
@ 2023-01-06 20:36 ` Derrick Stolee via GitGitGadget
  2023-01-09  3:08   ` Junio C Hamano
  2023-01-17 19:24   ` Victoria Dye
  2023-01-06 20:36 ` [PATCH 4/8] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-06 20:36 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change taught Git to parse the bundle.heuristic value,
especially when its value is "creationToken". Now, teach Git to parse
the bundle.<id>.creationToken values on each bundle in a bundle list.

Before implementing any logic based on creationToken values for the
creationToken heuristic, parse and print these values for testing
purposes.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 10 ++++++++++
 bundle-uri.h                |  6 ++++++
 t/t5750-bundle-uri-parse.sh | 18 ++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 56c94595c2a..63e2cc21057 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -80,6 +80,9 @@ static int summarize_bundle(struct remote_bundle_info *info, void *data)
 	FILE *fp = data;
 	fprintf(fp, "[bundle \"%s\"]\n", info->id);
 	fprintf(fp, "\turi = %s\n", info->uri);
+
+	if (info->creationToken)
+		fprintf(fp, "\tcreationToken = %"PRIu64"\n", info->creationToken);
 	return 0;
 }
 
@@ -190,6 +193,13 @@ static int bundle_list_update(const char *key, const char *value,
 		return 0;
 	}
 
+	if (!strcmp(subkey, "creationtoken")) {
+		if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
+			warning(_("could not parse bundle list key %s with value '%s'"),
+				"creationToken", value);
+		return 0;
+	}
+
 	/*
 	 * At this point, we ignore any information that we don't
 	 * understand, assuming it to be hints for a heuristic the client
diff --git a/bundle-uri.h b/bundle-uri.h
index ad82174112d..1cae418211b 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -42,6 +42,12 @@ struct remote_bundle_info {
 	 * this boolean is true.
 	 */
 	unsigned unbundled:1;
+
+	/**
+	 * If the bundle is part of a list with the creationToken
+	 * heuristic, then we use this member for sorting the bundles.
+	 */
+	uint64_t creationToken;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { 0 }
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 6fc92a9c0d4..81bdf58b944 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -258,10 +258,13 @@ test_expect_success 'parse config format: creationToken heuristic' '
 		heuristic = creationToken
 	[bundle "one"]
 		uri = http://example.com/bundle.bdl
+		creationToken = 123456
 	[bundle "two"]
 		uri = https://example.com/bundle.bdl
+		creationToken = 12345678901234567890
 	[bundle "three"]
 		uri = file:///usr/share/git/bundle.bdl
+		creationToken = 1
 	EOF
 
 	test-tool bundle-uri parse-config expect >actual 2>err &&
@@ -269,4 +272,19 @@ test_expect_success 'parse config format: creationToken heuristic' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format edge cases: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+		creationToken = bogus
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	grep "could not parse bundle list key creationToken with value '\''bogus'\''" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 4/8] bundle-uri: download in creationToken order
  2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2023-01-06 20:36 ` [PATCH 3/8] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
@ 2023-01-06 20:36 ` Derrick Stolee via GitGitGadget
  2023-01-09  3:22   ` Junio C Hamano
  2023-01-19 18:32   ` Victoria Dye
  2023-01-06 20:36 ` [PATCH 5/8] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-06 20:36 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised.

The bundles in the list are sorted by their advertised creationToken
values, then downloaded in decreasing order. This avoids the previous
strategy of downloading bundles in an arbitrary order and attempting
to apply them (likely failing in the case of required commits) until
discovering the order through attempted unbundling.

During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. The cost of testing an unbundle is quite low, and instead
the chosen order is optimizing for a future bundle download during a
'git fetch' operation with a non-empty object store.

Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store. The next time it attempts
to download from the bundle list, it makes most sense to download only
the most-recent bundles until all tips successfully unbundle. The
strategy implemented here provides that short-circuit where the client
downloads a minimal set of bundles.

A later implementation detail will store the maximum creationToken seen
during such a bundle download, and the client will avoid downloading a
bundle unless its creationToken is strictly greater than that stored
value. For now, if the client seeks to download from an identical
bundle list since its previous download, it will download the
most-recent bundle then stop since its required commits are already in
the object store.

Add tests that exercise this behavior, but we will expand upon these
tests when incremental downloads during 'git fetch' make use of
creationToken values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 140 +++++++++++++++++++++++++++++++++++-
 t/t5558-clone-bundle-uri.sh |  41 ++++++++++-
 t/t5601-clone.sh            |  50 +++++++++++++
 3 files changed, 227 insertions(+), 4 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 63e2cc21057..b30c85ba6f2 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -434,6 +434,124 @@ static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data
 	return 0;
 }
 
+struct sorted_bundle_list {
+	struct remote_bundle_info **items;
+	size_t alloc;
+	size_t nr;
+};
+
+static int insert_bundle(struct remote_bundle_info *bundle, void *data)
+{
+	struct sorted_bundle_list *list = data;
+	list->items[list->nr++] = bundle;
+	return 0;
+}
+
+static int compare_creation_token(const void *va, const void *vb)
+{
+	const struct remote_bundle_info * const *a = va;
+	const struct remote_bundle_info * const *b = vb;
+
+	if ((*a)->creationToken > (*b)->creationToken)
+		return -1;
+	if ((*a)->creationToken < (*b)->creationToken)
+		return 1;
+	return 0;
+}
+
+static int fetch_bundles_by_token(struct repository *r,
+				  struct bundle_list *list)
+{
+	int cur;
+	int pop_or_push = 0;
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = list,
+		.mode = list->mode,
+	};
+	struct sorted_bundle_list sorted = {
+		.alloc = hashmap_get_size(&list->bundles),
+	};
+
+	ALLOC_ARRAY(sorted.items, sorted.alloc);
+
+	for_all_bundles_in_list(list, insert_bundle, &sorted);
+
+	QSORT(sorted.items, sorted.nr, compare_creation_token);
+
+	/*
+	 * Use a stack-based approach to download the bundles and attempt
+	 * to unbundle them in decreasing order by creation token. If we
+	 * fail to unbundle (after a successful download) then move to the
+	 * next non-downloaded bundle (push to the stack) and attempt
+	 * downloading. Once we succeed in applying a bundle, move to the
+	 * previous unapplied bundle (pop the stack) and attempt to unbundle
+	 * it again.
+	 *
+	 * In the case of a fresh clone, we will likely download all of the
+	 * bundles before successfully unbundling the oldest one, then the
+	 * rest of the bundles unbundle successfully in increasing order
+	 * of creationToken.
+	 *
+	 * If there are existing objects, then this process may terminate
+	 * early when all required commits from "new" bundles exist in the
+	 * repo's object store.
+	 */
+	cur = 0;
+	while (cur >= 0 && cur < sorted.nr) {
+		struct remote_bundle_info *bundle = sorted.items[cur];
+		if (!bundle->file) {
+			/* Not downloaded yet. Try downloading. */
+			if (download_bundle_to_file(bundle, &ctx)) {
+				/* Failure. Push to the stack. */
+				pop_or_push = 1;
+				goto stack_operation;
+			}
+
+			/* We expect bundles when using creationTokens. */
+			if (!is_bundle(bundle->file, 1)) {
+				warning(_("file downloaded from '%s' is not a bundle"),
+					bundle->uri);
+				break;
+			}
+		}
+
+		if (bundle->file && !bundle->unbundled) {
+			/*
+			 * This was downloaded, but not successfully
+			 * unbundled. Try unbundling again.
+			 */
+			if (unbundle_from_file(ctx.r, bundle->file)) {
+				/* Failed to unbundle. Push to stack. */
+				pop_or_push = 1;
+			} else {
+				/* Succeeded in unbundle. Pop stack. */
+				pop_or_push = -1;
+			}
+		}
+
+		/*
+		 * Else case: downloaded and unbundled successfully.
+		 * Skip this by moving in the same direction as the
+		 * previous step.
+		 */
+
+stack_operation:
+		/* Move in the specified direction and repeat. */
+		cur += pop_or_push;
+	}
+
+	free(sorted.items);
+
+	/*
+	 * We succeed if the loop terminates because 'cur' drops below
+	 * zero. The other case is that we terminate because 'cur'
+	 * reaches the end of the list, so we have a failure no matter
+	 * which bundles we apply from the list.
+	 */
+	return cur >= 0;
+}
+
 static int download_bundle_list(struct repository *r,
 				struct bundle_list *local_list,
 				struct bundle_list *global_list,
@@ -471,7 +589,14 @@ static int fetch_bundle_list_in_config_format(struct repository *r,
 		goto cleanup;
 	}
 
-	if ((result = download_bundle_list(r, &list_from_bundle,
+	/*
+	 * If this list uses the creationToken heuristic, then the URIs
+	 * it advertises are expected to be bundles, not nested lists.
+	 * We can drop 'global_list' and 'depth'.
+	 */
+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		result = fetch_bundles_by_token(r, &list_from_bundle);
+	else if ((result = download_bundle_list(r, &list_from_bundle,
 					   global_list, depth)))
 		goto cleanup;
 
@@ -613,6 +738,14 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	int result;
 	struct bundle_list global_list;
 
+	/*
+	 * If the creationToken heuristic is used, then the URIs
+	 * advertised by 'list' are not nested lists and instead
+	 * direct bundles. We do not need to use global_list.
+	 */
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		return fetch_bundles_by_token(r, list);
+
 	init_bundle_list(&global_list);
 
 	/* If a bundle is added to this global list, then it is required. */
@@ -621,7 +754,10 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	if ((result = download_bundle_list(r, list, &global_list, 0)))
 		goto cleanup;
 
-	result = unbundle_all_bundles(r, &global_list);
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		result = fetch_bundles_by_token(r, list);
+	else
+		result = unbundle_all_bundles(r, &global_list);
 
 cleanup:
 	for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 328caeeae9a..d7461ec907e 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -368,6 +368,8 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
 '
 
 test_expect_success 'clone bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
 	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
 	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle]
@@ -392,10 +394,45 @@ test_expect_success 'clone bundle list (http, creationToken)' '
 		creationToken = 4
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http-2 &&
+	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		clone-from clone-list-http-2 &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
-	git -C clone-list-http-2 cat-file --batch-check <oids
+	git -C clone-list-http-2 cat-file --batch-check <oids &&
+
+	for b in 1 2 3 4
+	do
+		test_bundle_downloaded bundle-$b.bundle trace-clone.txt ||
+			return 1
+	done
+'
+
+test_expect_success 'clone bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		clone-from clone-token-http &&
+
+	test_bundle_downloaded bundle-1.bundle trace-clone.txt &&
+	test_bundle_downloaded bundle-2.bundle trace-clone.txt
 '
 
 # Do not add tests here unless they use the HTTP server, as they will
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 1928ea1dd7c..57476b6e6d7 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -831,6 +831,56 @@ test_expect_success 'auto-discover multiple bundles from HTTP clone' '
 	grep -f pattern trace.txt
 '
 
+# Usage: test_bundle_downloaded <bundle-id> <trace-filename>
+test_bundle_downloaded () {
+	cat >pattern <<-EOF &&
+	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/$1.bundle"\]
+	EOF
+	grep -f pattern "$2"
+}
+
+test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '
+	test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+	test_when_finished rm -rf clone-heuristic trace*.txt &&
+
+	test_commit -C src newest &&
+	git -C src bundle create "$HTTPD_DOCUMENT_ROOT_PATH/newest.bundle" HEAD~1..HEAD &&
+	git clone --bare --no-local src "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/repo4.git/config" <<-EOF &&
+	[uploadPack]
+		advertiseBundleURIs = true
+
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "everything"]
+		uri = $HTTPD_URL/everything.bundle
+		creationtoken = 1
+
+	[bundle "new"]
+		uri = $HTTPD_URL/new.bundle
+		creationtoken = 2
+
+	[bundle "newest"]
+		uri = $HTTPD_URL/newest.bundle
+		creationtoken = 3
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+		git -c protocol.version=2 \
+		    -c transfer.bundleURI=true clone \
+		"$HTTPD_URL/smart/repo4.git" clone-heuristic &&
+
+	# We should fetch all bundles
+	for b in everything new newest
+	do
+		test_bundle_downloaded $b trace-clone.txt || return 1
+	done
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 5/8] clone: set fetch.bundleURI if appropriate
  2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2023-01-06 20:36 ` [PATCH 4/8] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
@ 2023-01-06 20:36 ` Derrick Stolee via GitGitGadget
  2023-01-19 19:42   ` Victoria Dye
  2023-01-06 20:36 ` [PATCH 6/8] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-06 20:36 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Bundle providers may organize their bundle lists in a way that is
intended to improve incremental fetches, not just initial clones.
However, they do need to state that they have organized with that in
mind, or else the client will not expect to save time by downloading
bundles after the initial clone. This is done by specifying a
bundle.heuristic value.

There are two types of bundle lists: those at a static URI and those
that are advertised from a Git remote over protocol v2.

The new fetch.bundleURI config value applies for static bundle URIs that
are not advertised over protocol v2. If the user specifies a static URI
via 'git clone --bundle-uri', then Git can set this config as a reminder
for future 'git fetch' operations to check the bundle list before
connecting to the remote(s).

For lists provided over protocol v2, we will want to take a different
approach and create a property of the remote itself by creating a
remote.<id>.* type config key. That is not implemented in this change.

Later changes will update 'git fetch' to consume this option.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/fetch.txt |  8 ++++++++
 builtin/clone.c                |  6 +++++-
 bundle-uri.c                   | 10 +++++++---
 bundle-uri.h                   |  8 +++++++-
 t/t5558-clone-bundle-uri.sh    | 33 +++++++++++++++++++++++++++++++++
 5 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index cd65d236b43..4f796218aab 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -96,3 +96,11 @@ fetch.writeCommitGraph::
 	merge and the write may take longer. Having an updated commit-graph
 	file helps performance of many Git commands, including `git merge-base`,
 	`git push -f`, and `git log --graph`. Defaults to false.
+
+fetch.bundleURI::
+	This value stores a URI for fetching Git object data from a bundle URI
+	before performing an incremental fetch from the origin Git server. If
+	the value is `<uri>` then running `git fetch <args>` is equivalent to
+	first running `git fetch --bundle-uri=<uri>` immediately before
+	`git fetch <args>`. See details of the `--bundle-uri` option in
+	linkgit:git-fetch[1].
diff --git a/builtin/clone.c b/builtin/clone.c
index 5453ba5277f..5370617664d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1248,12 +1248,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	 * data from the --bundle-uri option.
 	 */
 	if (bundle_uri) {
+		int has_heuristic = 0;
+
 		/* At this point, we need the_repository to match the cloned repo. */
 		if (repo_init(the_repository, git_dir, work_tree))
 			warning(_("failed to initialize the repo, skipping bundle URI"));
-		else if (fetch_bundle_uri(the_repository, bundle_uri))
+		else if (fetch_bundle_uri(the_repository, bundle_uri, &has_heuristic))
 			warning(_("failed to fetch objects from bundle URI '%s'"),
 				bundle_uri);
+		else if (has_heuristic)
+			git_config_set_gently("fetch.bundleuri", bundle_uri);
 	}
 
 	strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
diff --git a/bundle-uri.c b/bundle-uri.c
index b30c85ba6f2..1dbbbb980eb 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -594,9 +594,10 @@ static int fetch_bundle_list_in_config_format(struct repository *r,
 	 * it advertises are expected to be bundles, not nested lists.
 	 * We can drop 'global_list' and 'depth'.
 	 */
-	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
 		result = fetch_bundles_by_token(r, &list_from_bundle);
-	else if ((result = download_bundle_list(r, &list_from_bundle,
+		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
+	} else if ((result = download_bundle_list(r, &list_from_bundle,
 					   global_list, depth)))
 		goto cleanup;
 
@@ -707,7 +708,8 @@ static int unlink_bundle(struct remote_bundle_info *info, void *data)
 	return 0;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+int fetch_bundle_uri(struct repository *r, const char *uri,
+		     int *has_heuristic)
 {
 	int result;
 	struct bundle_list list;
@@ -727,6 +729,8 @@ int fetch_bundle_uri(struct repository *r, const char *uri)
 	result = unbundle_all_bundles(r, &list);
 
 cleanup:
+	if (has_heuristic)
+		*has_heuristic = (list.heuristic != BUNDLE_HEURISTIC_NONE);
 	for_all_bundles_in_list(&list, unlink_bundle, NULL);
 	clear_bundle_list(&list);
 	clear_remote_bundle_info(&bundle, NULL);
diff --git a/bundle-uri.h b/bundle-uri.h
index 1cae418211b..52b27cd10e3 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -124,8 +124,14 @@ int bundle_uri_parse_config_format(const char *uri,
  * based on that information.
  *
  * Returns non-zero if no bundle information is found at the given 'uri'.
+ *
+ * If the pointer 'has_heuristic' is non-NULL, then the value it points to
+ * will be set to be non-zero if and only if the fetched list has a
+ * heuristic value. Such a value indicates that the list was designed for
+ * incremental fetches.
  */
-int fetch_bundle_uri(struct repository *r, const char *uri);
+int fetch_bundle_uri(struct repository *r, const char *uri,
+		     int *has_heuristic);
 
 /**
  * Given a bundle list that was already advertised (likely by the
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index d7461ec907e..8ff560425ee 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -435,6 +435,39 @@ test_expect_success 'clone bundle list (http, creationToken)' '
 	test_bundle_downloaded bundle-2.bundle trace-clone.txt
 '
 
+test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
+	test_when_finished rm -rf fetch-http-4 trace*.txt &&
+
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
+
+	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+
+	# The clone should copy two files: the list and bundle-1.
+	test_bundle_downloaded bundle-list trace-clone.txt &&
+	test_bundle_downloaded bundle-1.bundle trace-clone.txt &&
+
+	# only received base ref from bundle-1
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	EOF
+	test_cmp expect refs
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 6/8] bundle-uri: drop bundle.flag from design doc
  2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2023-01-06 20:36 ` [PATCH 5/8] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
@ 2023-01-06 20:36 ` Derrick Stolee via GitGitGadget
  2023-01-19 19:44   ` Victoria Dye
  2023-01-06 20:36 ` [PATCH 7/8] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-06 20:36 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The Implementation Plan section lists a 'bundle.flag' option that is not
documented anywhere else. What is documented elsewhere in the document
and implemented by previous changes is the 'bundle.heuristic' config
key. For now, a heuristic is required to indicate that a bundle list is
organized for use during 'git fetch', and it is also sufficient for all
existing designs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/technical/bundle-uri.txt | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
index b78d01d9adf..91d3a13e327 100644
--- a/Documentation/technical/bundle-uri.txt
+++ b/Documentation/technical/bundle-uri.txt
@@ -479,14 +479,14 @@ outline for submitting these features:
    (This choice is an opt-in via a config option and a command-line
    option.)
 
-4. Allow the client to understand the `bundle.flag=forFetch` configuration
+4. Allow the client to understand the `bundle.heuristic` configuration key
    and the `bundle.<id>.creationToken` heuristic. When `git clone`
-   discovers a bundle URI with `bundle.flag=forFetch`, it configures the
-   client repository to check that bundle URI during later `git fetch <remote>`
+   discovers a bundle URI with `bundle.heuristic`, it configures the client
+   repository to check that bundle URI during later `git fetch <remote>`
    commands.
 
 5. Allow clients to discover bundle URIs during `git fetch` and configure
-   a bundle URI for later fetches if `bundle.flag=forFetch`.
+   a bundle URI for later fetches if `bundle.heuristic` is set.
 
 6. Implement the "inspect headers" heuristic to reduce data downloads when
    the `bundle.<id>.creationToken` heuristic is not available.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 7/8] fetch: fetch from an external bundle URI
  2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2023-01-06 20:36 ` [PATCH 6/8] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
@ 2023-01-06 20:36 ` Derrick Stolee via GitGitGadget
  2023-01-19 20:34   ` Victoria Dye
  2023-01-06 20:36 ` [PATCH 8/8] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
  2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
  8 siblings, 1 reply; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-06 20:36 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a user specifies a URI via 'git clone --bundle-uri', that URI may
be a bundle list that advertises a 'bundle.heuristic' value. In that
case, the Git client stores a 'fetch.bundleURI' config value storing
that URI.

Teach 'git fetch' to check for this config value and download bundles
from that URI before fetching from the Git remote(s). Likely, the bundle
provider has configured a heuristic (such as "creationToken") that will
allow the Git client to download only a portion of the bundles before
continuing the fetch.

Since this URI is completely independent of the remote server, we want
to be sure that we connect to the bundle URI before creating a
connection to the Git remote. We do not want to hold a stateful
connection for too long if we can avoid it.

To test that this works correctly, extend the previous tests that set
'fetch.bundleURI' to do follow-up fetches. The bundle list is updated
incrementally at each phase to demonstrate that the heuristic avoids
downloading older bundles. This includes the middle fetch downloading
the objects in bundle-3.bundle from the Git remote, and therefore not
needing that bundle in the third fetch.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/fetch.c             |  8 +++++
 t/t5558-clone-bundle-uri.sh | 59 +++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 7378cafeec9..fbb1d470c38 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -29,6 +29,7 @@
 #include "commit-graph.h"
 #include "shallow.h"
 #include "worktree.h"
+#include "bundle-uri.h"
 
 #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000)
 
@@ -2109,6 +2110,7 @@ static int fetch_one(struct remote *remote, int argc, const char **argv,
 int cmd_fetch(int argc, const char **argv, const char *prefix)
 {
 	int i;
+	const char *bundle_uri;
 	struct string_list list = STRING_LIST_INIT_DUP;
 	struct remote *remote = NULL;
 	int result = 0;
@@ -2194,6 +2196,12 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	if (dry_run)
 		write_fetch_head = 0;
 
+	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri) &&
+	    !starts_with(bundle_uri, "remote:")) {
+		if (fetch_bundle_uri(the_repository, bundle_uri, NULL))
+			warning(_("failed to fetch bundles from '%s'"), bundle_uri);
+	}
+
 	if (all) {
 		if (argc == 1)
 			die(_("fetch --all does not take a repository argument"));
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 8ff560425ee..3f4d61a915c 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -465,6 +465,65 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	cat >expect <<-\EOF &&
 	refs/bundles/base
 	EOF
+	test_cmp expect refs &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	# Fetch the objects for bundle-2 _and_ bundle-3.
+	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/left:refs/heads/left \
+		refs/heads/right:refs/heads/right &&
+
+	# This fetch should copy two files: the list and bundle-2.
+	test_bundle_downloaded bundle-list trace1.txt &&
+	test_bundle_downloaded bundle-2.bundle trace1.txt &&
+	! test_bundle_downloaded bundle-1.bundle trace1.txt &&
+
+	# received left from bundle-2
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect refs &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	# This fetch should skip bundle-3.bundle, since its objets are
+	# already local (we have the requisite commits for bundle-4.bundle).
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/merge:refs/heads/merge &&
+
+	# This fetch should copy three files: the list, bundle-3, and bundle-4.
+	test_bundle_downloaded bundle-list trace2.txt &&
+	test_bundle_downloaded bundle-4.bundle trace2.txt &&
+	! test_bundle_downloaded bundle-1.bundle trace2.txt &&
+	! test_bundle_downloaded bundle-2.bundle trace2.txt &&
+	! test_bundle_downloaded bundle-3.bundle trace2.txt &&
+
+	# received merge ref from bundle-4, but right is missing
+	# because we did not download bundle-3.
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	EOF
 	test_cmp expect refs
 '
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 8/8] bundle-uri: store fetch.bundleCreationToken
  2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2023-01-06 20:36 ` [PATCH 7/8] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
@ 2023-01-06 20:36 ` Derrick Stolee via GitGitGadget
  2023-01-19 22:24   ` Victoria Dye
  2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
  8 siblings, 1 reply; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-06 20:36 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a bundle list specifies the "creationToken" heuristic, the Git
client downloads the list and then starts downloading bundles in
descending creationToken order. This process stops as soon as all
downloaded bundles can be applied to the repository (because all
required commits are present in the repository or in the downloaded
bundles).

When checking the same bundle list twice, this strategy requires
downloading the bundle with the maximum creationToken again, which is
wasteful. The creationToken heuristic promises that the client will not
have a use for that bundle if its creationToken value is the at most the
previous creationToken value.

To prevent these wasteful downloads, create a fetch.bundleCreationToken
config setting that the Git client sets after downloading bundles. This
value allows skipping that maximum bundle download when this config
value is the same value (or larger).

To test that this works correctly, we can insert some "duplicate"
fetches into existing tests and demonstrate that only the bundle list is
downloaded.

The previous logic for downloading bundles by creationToken worked even
if the bundle list was empty, but now we have logic that depends on the
first entry of the list. Terminate early in the (non-sensical) case of
an empty bundle list.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/fetch.txt |  8 ++++++++
 bundle-uri.c                   | 35 ++++++++++++++++++++++++++++++++--
 t/t5558-clone-bundle-uri.sh    | 25 +++++++++++++++++++++++-
 3 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index 4f796218aab..96755ba148b 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -104,3 +104,11 @@ fetch.bundleURI::
 	first running `git fetch --bundle-uri=<uri>` immediately before
 	`git fetch <args>`. See details of the `--bundle-uri` option in
 	linkgit:git-fetch[1].
+
+fetch.bundleCreationToken::
+	When using `fetch.bundleURI` to fetch incrementally from a bundle
+	list that uses the "creationToken" heuristic, this config value
+	stores the maximum `creationToken` value of the downloaded bundles.
+	This value is used to prevent downloading bundles in the future
+	if the advertised `creationToken` is not strictly larger than this
+	value.
diff --git a/bundle-uri.c b/bundle-uri.c
index 1dbbbb980eb..98655bd6721 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -464,6 +464,8 @@ static int fetch_bundles_by_token(struct repository *r,
 {
 	int cur;
 	int pop_or_push = 0;
+	const char *creationTokenStr;
+	uint64_t maxCreationToken;
 	struct bundle_list_context ctx = {
 		.r = r,
 		.list = list,
@@ -477,8 +479,27 @@ static int fetch_bundles_by_token(struct repository *r,
 
 	for_all_bundles_in_list(list, insert_bundle, &sorted);
 
+	if (!sorted.nr) {
+		free(sorted.items);
+		return 0;
+	}
+
 	QSORT(sorted.items, sorted.nr, compare_creation_token);
 
+	/*
+	 * If fetch.bundleCreationToken exists, parses to a uint64t, and
+	 * is not strictly smaller than the maximum creation token in the
+	 * bundle list, then do not download any bundles.
+	 */
+	if (!repo_config_get_value(r,
+				   "fetch.bundlecreationtoken",
+				   &creationTokenStr) &&
+	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
+	    sorted.items[0]->creationToken <= maxCreationToken) {
+		free(sorted.items);
+		return 0;
+	}
+
 	/*
 	 * Use a stack-based approach to download the bundles and attempt
 	 * to unbundle them in decreasing order by creation token. If we
@@ -541,14 +562,24 @@ stack_operation:
 		cur += pop_or_push;
 	}
 
-	free(sorted.items);
-
 	/*
 	 * We succeed if the loop terminates because 'cur' drops below
 	 * zero. The other case is that we terminate because 'cur'
 	 * reaches the end of the list, so we have a failure no matter
 	 * which bundles we apply from the list.
 	 */
+	if (cur < 0) {
+		struct strbuf value = STRBUF_INIT;
+		strbuf_addf(&value, "%"PRIu64"", sorted.items[0]->creationToken);
+		if (repo_config_set_multivar_gently(ctx.r,
+						    "fetch.bundleCreationToken",
+						    value.buf, NULL, 0))
+			warning(_("failed to store maximum creation token"));
+
+		strbuf_release(&value);
+	}
+
+	free(sorted.items);
 	return cur >= 0;
 }
 
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 3f4d61a915c..0604d721f1b 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -455,6 +455,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
 
 	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+	test_cmp_config -C fetch-http-4 1 fetch.bundlecreationtoken &&
 
 	# The clone should copy two files: the list and bundle-1.
 	test_bundle_downloaded bundle-list trace-clone.txt &&
@@ -479,6 +480,8 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 		refs/heads/left:refs/heads/left \
 		refs/heads/right:refs/heads/right &&
 
+	test_cmp_config -C fetch-http-4 2 fetch.bundlecreationtoken &&
+
 	# This fetch should copy two files: the list and bundle-2.
 	test_bundle_downloaded bundle-list trace1.txt &&
 	test_bundle_downloaded bundle-2.bundle trace1.txt &&
@@ -492,6 +495,15 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	EOF
 	test_cmp expect refs &&
 
+	# No-op fetch
+	GIT_TRACE2_EVENT="$(pwd)/trace1b.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/left:refs/heads/left \
+		refs/heads/right:refs/heads/right &&
+	test_bundle_downloaded bundle-list trace1b.txt &&
+	! test_bundle_downloaded bundle-1.bundle trace1b.txt &&
+	! test_bundle_downloaded bundle-2.bundle trace1b.txt &&
+
 	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle "bundle-3"]
 		uri = bundle-3.bundle
@@ -508,6 +520,8 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 		git -C fetch-http-4 fetch origin --no-tags \
 		refs/heads/merge:refs/heads/merge &&
 
+	test_cmp_config -C fetch-http-4 4 fetch.bundlecreationtoken &&
+
 	# This fetch should copy three files: the list, bundle-3, and bundle-4.
 	test_bundle_downloaded bundle-list trace2.txt &&
 	test_bundle_downloaded bundle-4.bundle trace2.txt &&
@@ -524,7 +538,16 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	refs/bundles/left
 	refs/bundles/merge
 	EOF
-	test_cmp expect refs
+	test_cmp expect refs &&
+
+	# No-op fetch
+	GIT_TRACE2_EVENT="$(pwd)/trace2b.txt" \
+		git -C fetch-http-4 fetch origin &&
+	test_bundle_downloaded bundle-list trace2b.txt &&
+	! test_bundle_downloaded bundle-1.bundle trace2b.txt &&
+	! test_bundle_downloaded bundle-2.bundle trace2b.txt &&
+	! test_bundle_downloaded bundle-3.bundle trace2b.txt &&
+	! test_bundle_downloaded bundle-4.bundle trace2b.txt
 '
 
 # Do not add tests here unless they use the HTTP server, as they will
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken
  2023-01-06 20:36 ` [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
@ 2023-01-09  2:38   ` Junio C Hamano
  2023-01-09 14:20     ` Derrick Stolee
  2023-01-17 19:13   ` Victoria Dye
  1 sibling, 1 reply; 74+ messages in thread
From: Junio C Hamano @ 2023-01-09  2:38 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, vdye, avarab, steadmon, chooglen, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static const char *heuristics[] = {
> +	[BUNDLE_HEURISTIC_NONE] = "",
> +	[BUNDLE_HEURISTIC_CREATIONTOKEN] = "creationToken",
> +};

Ideally it would require the least amount of maintenance if we could
define BUNDLE_HEURISTIC__COUNT as ARRAY_SIZE() of this thing, but it
being a file scope static, it might not be easy to arrange that.  As
a lessor altenative, would it make it safer to size this array more
explicitly using BUNDLE_HEURISTIC__COUNT macro?

	static const char *heuristics[BUNDLE_HEURISTIC__COUNT] = {
		...
	};

or is it more-or-less moot point to aim for safety because nobody
enforces that these [indices] used to define the contents of this
array are dense?

That is ...

> @@ -142,6 +150,19 @@ static int bundle_list_update(const char *key, const char *value,
>  			return 0;
>  		}
>  
> +		if (!strcmp(subkey, "heuristic")) {
> +			int i;
> +			for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
> +				if (!strcmp(value, heuristics[i])) {
> +					list->heuristic = i;
> +					return 0;
> +				}
> +			}

... this strcmp() will segfault if heuristics[] array is sparse, or
BUNDLE_HEURISTIC__COUNT is larger than the array (i.e. you add a new
heuristic in "enum bundle_heuristic" before the __COUNT sentinel,
but forget to add it to the heuristics[] array).

"You are worrying too much.  Our developers would notice a segfault
and the current code, which may look risky to you, is something they
can live with", is a perfectly acceptable response, but somehow I
have this nagging feeling that we should be able to make it easier
to maintain without incurring extra runtime cost.

> diff --git a/bundle-uri.h b/bundle-uri.h
> index d5e89f1671c..ad82174112d 100644
> --- a/bundle-uri.h
> +++ b/bundle-uri.h
> @@ -52,6 +52,14 @@ enum bundle_list_mode {
>  	BUNDLE_MODE_ANY
>  };
>  
> +enum bundle_list_heuristic {
> +	BUNDLE_HEURISTIC_NONE = 0,
> +	BUNDLE_HEURISTIC_CREATIONTOKEN,
> +
> +	/* Must be last. */
> +	BUNDLE_HEURISTIC__COUNT,
> +};

The only reason to leave a trailing comma is to make it easy to
append new values at the end.  By omitting the trailing comma, you
can doubly signal "Must be last" here (not suggesting to remove the
comment; suggesting to remove the trailing comma).

Thanks.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 3/8] bundle-uri: parse bundle.<id>.creationToken values
  2023-01-06 20:36 ` [PATCH 3/8] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
@ 2023-01-09  3:08   ` Junio C Hamano
  2023-01-09 14:41     ` Derrick Stolee
  2023-01-17 19:24   ` Victoria Dye
  1 sibling, 1 reply; 74+ messages in thread
From: Junio C Hamano @ 2023-01-09  3:08 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, vdye, avarab, steadmon, chooglen, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +	if (!strcmp(subkey, "creationtoken")) {
> +		if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
> +			warning(_("could not parse bundle list key %s with value '%s'"),
> +				"creationToken", value);
> +		return 0;
> +	}

We tend to avoid sscanf() to parse out integral values, as it is a
bit too permissive to our liking (especially while parsing the
object header), but here it probably is OK, I guess.

> +	/**
> +	 * If the bundle is part of a list with the creationToken
> +	 * heuristic, then we use this member for sorting the bundles.
> +	 */
> +	uint64_t creationToken;
>  };

Is the idea behind the type is that creationTokens, while we leave
up to the bundle providers what the actual values (other than zero)
mean, must be comparable to give them a total order, and uint64
would be a usable type for bundle providers to come up with such
values easily?

Thanks.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/8] bundle-uri: download in creationToken order
  2023-01-06 20:36 ` [PATCH 4/8] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
@ 2023-01-09  3:22   ` Junio C Hamano
  2023-01-09 14:58     ` Derrick Stolee
  2023-01-19 18:32   ` Victoria Dye
  1 sibling, 1 reply; 74+ messages in thread
From: Junio C Hamano @ 2023-01-09  3:22 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, vdye, avarab, steadmon, chooglen, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +struct sorted_bundle_list {
> +	struct remote_bundle_info **items;
> +	size_t alloc;
> +	size_t nr;
> +};
> +
> +static int insert_bundle(struct remote_bundle_info *bundle, void *data)
> +{
> +	struct sorted_bundle_list *list = data;
> +	list->items[list->nr++] = bundle;
> +	return 0;
> +}

Especially given that the type of the list claims it to be "sorted",
insert_bundle() is a misleading name for a helper that merely
appends to it to make the list (tentatively) unsorted.

I am not opposed to "append all to make an unsorted list, then sort
the list at the end" strategy.

> +static int compare_creation_token(const void *va, const void *vb)
> +{
> +	const struct remote_bundle_info * const *a = va;
> +	const struct remote_bundle_info * const *b = vb;
> +
> +	if ((*a)->creationToken > (*b)->creationToken)
> +		return -1;
> +	if ((*a)->creationToken < (*b)->creationToken)
> +		return 1;
> +	return 0;
> +}

Usually compare(a,b) returns the sign of (a-b), but the returned
value from the above is the opposite.  This is because we want the
list sorted from newer to older?  It may help developers to name
such a (reverse) "compare" function differently.

> +static int fetch_bundles_by_token(struct repository *r,
> +				  struct bundle_list *list)
> +{
> +	int cur;
> +	int pop_or_push = 0;
> +	struct bundle_list_context ctx = {
> +		.r = r,
> +		.list = list,
> +		.mode = list->mode,
> +	};
> +	struct sorted_bundle_list sorted = {
> +		.alloc = hashmap_get_size(&list->bundles),
> +	};
> +
> +	ALLOC_ARRAY(sorted.items, sorted.alloc);
> +
> +	for_all_bundles_in_list(list, insert_bundle, &sorted);
> +
> +	QSORT(sorted.items, sorted.nr, compare_creation_token);

If I were doing this patch, I would call the type of the list of
bundles "struct bundle_list" (without "sorted_" in its name), and
name the variable of that type used here "sorted".  That would make
it more clear that this particular bundle list starts its life as
unsorted (with "append_bundle" function adding more elements) and
then gets sorted in the end, from the above several lines.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken
  2023-01-09  2:38   ` Junio C Hamano
@ 2023-01-09 14:20     ` Derrick Stolee
  0 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee @ 2023-01-09 14:20 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, vdye, avarab, steadmon, chooglen

On 1/8/2023 9:38 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +static const char *heuristics[] = {
>> +	[BUNDLE_HEURISTIC_NONE] = "",
>> +	[BUNDLE_HEURISTIC_CREATIONTOKEN] = "creationToken",
>> +};
> 
> Ideally it would require the least amount of maintenance if we could
> define BUNDLE_HEURISTIC__COUNT as ARRAY_SIZE() of this thing, but it
> being a file scope static, it might not be easy to arrange that.  As
> a lessor altenative, would it make it safer to size this array more
> explicitly using BUNDLE_HEURISTIC__COUNT macro?
> 
> 	static const char *heuristics[BUNDLE_HEURISTIC__COUNT] = {
> 		...
> 	};

Yes, I should have used this size indicator.
 
> or is it more-or-less moot point to aim for safety because nobody
> enforces that these [indices] used to define the contents of this
> array are dense?
> 
> That is ...
> 
>> @@ -142,6 +150,19 @@ static int bundle_list_update(const char *key, const char *value,
>>  			return 0;
>>  		}
>>  
>> +		if (!strcmp(subkey, "heuristic")) {
>> +			int i;
>> +			for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
>> +				if (!strcmp(value, heuristics[i])) {
>> +					list->heuristic = i;
>> +					return 0;
>> +				}
>> +			}
> 
> ... this strcmp() will segfault if heuristics[] array is sparse, or
> BUNDLE_HEURISTIC__COUNT is larger than the array (i.e. you add a new
> heuristic in "enum bundle_heuristic" before the __COUNT sentinel,
> but forget to add it to the heuristics[] array).
> 
> "You are worrying too much.  Our developers would notice a segfault
> and the current code, which may look risky to you, is something they
> can live with", is a perfectly acceptable response, but somehow I
> have this nagging feeling that we should be able to make it easier
> to maintain without incurring extra runtime cost.

You're right. I was following an established pattern of linking
enums to values, but I'm not sure that those other examples will
loop over the array like this looking for a value.

A safer approach would be to have an array of (enum, string) pairs
that could either be iterated in a loop (fast enough for a small
number of enum values, such as this case) or used to populate a
hashmap at runtime if needed for a large number of queries.

>> diff --git a/bundle-uri.h b/bundle-uri.h
>> index d5e89f1671c..ad82174112d 100644
>> --- a/bundle-uri.h
>> +++ b/bundle-uri.h
>> @@ -52,6 +52,14 @@ enum bundle_list_mode {
>>  	BUNDLE_MODE_ANY
>>  };
>>  
>> +enum bundle_list_heuristic {
>> +	BUNDLE_HEURISTIC_NONE = 0,
>> +	BUNDLE_HEURISTIC_CREATIONTOKEN,
>> +
>> +	/* Must be last. */
>> +	BUNDLE_HEURISTIC__COUNT,
>> +};
> 
> The only reason to leave a trailing comma is to make it easy to
> append new values at the end.  By omitting the trailing comma, you
> can doubly signal "Must be last" here (not suggesting to remove the
> comment; suggesting to remove the trailing comma).

This is a great example of "doing the typically right thing" but
without thinking of _why_ we do that thing. Thanks for pointing this
out.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 3/8] bundle-uri: parse bundle.<id>.creationToken values
  2023-01-09  3:08   ` Junio C Hamano
@ 2023-01-09 14:41     ` Derrick Stolee
  0 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee @ 2023-01-09 14:41 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, vdye, avarab, steadmon, chooglen

On 1/8/2023 10:08 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +	if (!strcmp(subkey, "creationtoken")) {
>> +		if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
>> +			warning(_("could not parse bundle list key %s with value '%s'"),
>> +				"creationToken", value);
>> +		return 0;
>> +	}
> 
> We tend to avoid sscanf() to parse out integral values, as it is a
> bit too permissive to our liking (especially while parsing the
> object header), but here it probably is OK, I guess.

I tried to find another way to parse uint64ts, but could not find
another example in the codebase. I'd be happy to change it if we
have a preferred method.
 
>> +	/**
>> +	 * If the bundle is part of a list with the creationToken
>> +	 * heuristic, then we use this member for sorting the bundles.
>> +	 */
>> +	uint64_t creationToken;
>>  };
> 
> Is the idea behind the type is that creationTokens, while we leave
> up to the bundle providers what the actual values (other than zero)
> mean, must be comparable to give them a total order, and uint64
> would be a usable type for bundle providers to come up with such
> values easily?

One easy way to create the total order is to use Unix epoch
timestamps (on the same machine, to avoid clock skew). We cannot
use the machine-local width of the timestamp types, though. And
there is no need to use timestamp-like values. The bundle provider
could keep an integer count.

I did add a test that ensures we test a creationToken that would
not fit within 32 bits.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/8] bundle-uri: download in creationToken order
  2023-01-09  3:22   ` Junio C Hamano
@ 2023-01-09 14:58     ` Derrick Stolee
  0 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee @ 2023-01-09 14:58 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, vdye, avarab, steadmon, chooglen

On 1/8/2023 10:22 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +struct sorted_bundle_list {
>> +	struct remote_bundle_info **items;
>> +	size_t alloc;
>> +	size_t nr;
>> +};
>> +
>> +static int insert_bundle(struct remote_bundle_info *bundle, void *data)
>> +{
>> +	struct sorted_bundle_list *list = data;
>> +	list->items[list->nr++] = bundle;
>> +	return 0;
>> +}
> 
> Especially given that the type of the list claims it to be "sorted",
> insert_bundle() is a misleading name for a helper that merely
> appends to it to make the list (tentatively) unsorted.
> 
> I am not opposed to "append all to make an unsorted list, then sort
> the list at the end" strategy.

...

> If I were doing this patch, I would call the type of the list of
> bundles "struct bundle_list" (without "sorted_" in its name), and
> name the variable of that type used here "sorted".  That would make
> it more clear that this particular bundle list starts its life as
> unsorted (with "append_bundle" function adding more elements) and
> then gets sorted in the end, from the above several lines.

Since "struct bundle_list" is taken, how about "bundles_for_sorting"
since that's the purpose of this struct (to be passed as data to
the for_all_bundles_in_list() and then to QSORT()).

Renaming insert_bundle() to append_bundle() is clearly better.

>> +static int compare_creation_token(const void *va, const void *vb)
>> +{
>> +	const struct remote_bundle_info * const *a = va;
>> +	const struct remote_bundle_info * const *b = vb;
>> +
>> +	if ((*a)->creationToken > (*b)->creationToken)
>> +		return -1;
>> +	if ((*a)->creationToken < (*b)->creationToken)
>> +		return 1;
>> +	return 0;
>> +}
> 
> Usually compare(a,b) returns the sign of (a-b), but the returned
> value from the above is the opposite.  This is because we want the
> list sorted from newer to older?  It may help developers to name
> such a (reverse) "compare" function differently.

Would renaming this to "compare_creation_token_decreasing" be clear
enough? (Plus a doc comment.)

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 1/8] t5558: add tests for creationToken heuristic
  2023-01-06 20:36 ` [PATCH 1/8] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
@ 2023-01-17 18:17   ` Victoria Dye
  2023-01-17 21:00     ` Derrick Stolee
  0 siblings, 1 reply; 74+ messages in thread
From: Victoria Dye @ 2023-01-17 18:17 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> As documented in the bundle URI design doc in 2da14fad8fe (docs:
> document bundle URI standard, 2022-08-09), the 'creationToken' member of
> a bundle URI allows a bundle provider to specify a total order on the
> bundles.
> 
> Future changes will allow the Git client to understand these members and
> modify its behavior around downloading the bundles in that order. In the
> meantime, create tests that add creation tokens to the bundle list. For
> now, the Git client correctly ignores these unknown keys.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  t/t5558-clone-bundle-uri.sh | 52 +++++++++++++++++++++++++++++++++++--
>  1 file changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
> index 9155f31fa2c..328caeeae9a 100755
> --- a/t/t5558-clone-bundle-uri.sh
> +++ b/t/t5558-clone-bundle-uri.sh
> @@ -284,7 +284,17 @@ test_expect_success 'clone HTTP bundle' '
>  	test_config -C clone-http log.excludedecoration refs/bundle/
>  '
>  
> +# usage: test_bundle_downloaded <bundle-name> <trace-file>
> +test_bundle_downloaded () {
> +	cat >pattern <<-EOF &&
> +	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/$1"\]
> +	EOF
> +	grep -f pattern "$2"
> +}
> +
>  test_expect_success 'clone bundle list (HTTP, no heuristic)' '
> +	test_when_finished rm -f trace*.txt &&
> +
>  	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
>  	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
>  	[bundle]
> @@ -304,12 +314,19 @@ test_expect_success 'clone bundle list (HTTP, no heuristic)' '
>  		uri = $HTTPD_URL/bundle-4.bundle
>  	EOF
>  
> -	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
> +	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
> +		git clone --bundle-uri="$HTTPD_URL/bundle-list" \
>  		clone-from clone-list-http  2>err &&
>  	! grep "Repository lacks these prerequisite commits" err &&
>  
>  	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
> -	git -C clone-list-http cat-file --batch-check <oids
> +	git -C clone-list-http cat-file --batch-check <oids &&
> +
> +	for b in 1 2 3 4
> +	do
> +		test_bundle_downloaded bundle-$b.bundle trace-clone.txt ||
> +			return 1
> +	done

Because the current state of bundle list handling is equivalent to "no
heuristic", this pre-existing test is just updated to verify all bundles are
downloaded. This isn't new behavior, but it'll be relevant to compare with
the behavior of the 'creationToken' heuristic. 

I was going to ask how the tests verify that *only* the expected bundles are
downloaded, and it looks like later patches [1] handle that with
'! test_bundle_downloaded' checks. That approach seems a bit fragile (if a
bundle's name doesn't match the '! test_bundle_downloaded' check for some
reason, the bundle can be either downloaded or not with no effect on the
test result). Would something like a 'test_downloaded_bundle_count' work
instead?

-------- 8< --------

diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 0604d721f1..b2f55dd983 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -292,6 +292,16 @@ test_bundle_downloaded () {
 	grep -f pattern "$2"
 }
 
+test_download_bundle_count () {
+	cat >exclude <<-EOF &&
+	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/bundle-list"\]
+	EOF
+	cat >pattern <<-EOF &&
+	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/.*"\]
+	EOF
+	test $(grep -f pattern "$2" | grep -v -f exclude | wc -l) -eq "$1"
+}
+
 test_expect_success 'clone bundle list (HTTP, no heuristic)' '
 	test_when_finished rm -f trace*.txt &&
 
@@ -322,6 +332,7 @@ test_expect_success 'clone bundle list (HTTP, no heuristic)' '
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
 	git -C clone-list-http cat-file --batch-check <oids &&
 
+	test_download_bundle_count 4 trace-clone.txt &&
 	for b in 1 2 3 4
 	do
 		test_bundle_downloaded bundle-$b.bundle trace-clone.txt ||

-------- >8 --------

[1] https://lore.kernel.org/git/51f210ddeb46fb06e885dc384a486c4bb16ad8cd.1673037405.git.gitgitgadget@gmail.com/

>  '
>  
>  test_expect_success 'clone bundle list (HTTP, any mode)' '
> @@ -350,6 +367,37 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'clone bundle list (http, creationToken)' '
> +	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
> +	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
> +	[bundle]
> +		version = 1
> +		mode = all
> +		heuristic = creationToken
> +
> +	[bundle "bundle-1"]
> +		uri = bundle-1.bundle
> +		creationToken = 1
> +
> +	[bundle "bundle-2"]
> +		uri = bundle-2.bundle
> +		creationToken = 2
> +
> +	[bundle "bundle-3"]
> +		uri = bundle-3.bundle
> +		creationToken = 3
> +
> +	[bundle "bundle-4"]
> +		uri = bundle-4.bundle
> +		creationToken = 4
> +	EOF
> +
> +	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http-2 &&
> +
> +	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
> +	git -C clone-list-http-2 cat-file --batch-check <oids

This test looks like the one that was updated above, but adds the
'creationToken' heuristic key. However, the 'test_bundle_downloaded' check
isn't included - if it were, it would need to verify that all bundles were
downloaded, with the heuristic being ignored, all bundles will be downloaded
(which isn't consistent with what the 'creationToken' heuristic will
*eventually* do).

As a matter of personal preference (so no pressure to change if you
disagree), I find this test in its current state a bit misleading; because
it's a 'test_expect_success' and there's no "NEEDSWORK" or "TODO", I could
easily assume that cloning from a bundle list with the 'creationToken'
heuristic is working as-intended at this point (that is, there's no
indication that it's not implemented). 

If you did want to change it, adding a 'NEEDSWORK' comment, changing to
'test_expect_failure' & including the appropriate 'test_bundle_downloaded'
check, or moving this test to the patch where the heuristic is implemented
would mitigate any confusion. That said, this "issue" is resolved by the
end of the series anyway, so it's really a low priority fix.

> +'
> +
>  # Do not add tests here unless they use the HTTP server, as they will
>  # not run unless the HTTP dependencies exist.
>  


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken
  2023-01-06 20:36 ` [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
  2023-01-09  2:38   ` Junio C Hamano
@ 2023-01-17 19:13   ` Victoria Dye
  1 sibling, 0 replies; 74+ messages in thread
From: Victoria Dye @ 2023-01-17 19:13 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> The bundle.heuristic value communicates that the bundle list is
> organized to make use of the bundle.<id>.creationToken values that may
> be provided in the bundle list. Those values will create a total order
> on the bundles, allowing the Git client to download them in a specific
> order and even remember previously-downloaded bundles by storing the
> maximum creation token value.
> 
> Before implementing any logic that parses or uses the
> bundle.<id>.creationToken values, teach Git to parse the
> bundle.heuristic value from a bundle list. We can use 'test-tool
> bundle-uri' to print the heuristic value and verify that the parsing
> works correctly.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  Documentation/config/bundle.txt |  7 +++++++
>  bundle-uri.c                    | 21 +++++++++++++++++++++
>  bundle-uri.h                    | 14 ++++++++++++++
>  t/t5750-bundle-uri-parse.sh     | 19 +++++++++++++++++++
>  4 files changed, 61 insertions(+)
> 
> diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
> index daa21eb674a..3faae386853 100644
> --- a/Documentation/config/bundle.txt
> +++ b/Documentation/config/bundle.txt
> @@ -15,6 +15,13 @@ bundle.mode::
>  	complete understanding of the bundled information (`all`) or if any one
>  	of the listed bundle URIs is sufficient (`any`).
>  
> +bundle.heuristic::
> +	If this string-valued key exists, then the bundle list is designed to
> +	work well with incremental `git fetch` commands. The heuristic signals
> +	that there are additional keys available for each bundle that help
> +	determine which subset of bundles the client should download. The
> +	only value currently understood is `creationToken`.

This description clearly describes the 'heuristic' key and what it does.

> +
>  bundle.<id>.*::
>  	The `bundle.<id>.*` keys are used to describe a single item in the
>  	bundle list, grouped under `<id>` for identification purposes.
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 36268dda172..56c94595c2a 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -9,6 +9,11 @@
>  #include "config.h"
>  #include "remote.h"
>  
> +static const char *heuristics[] = {
> +	[BUNDLE_HEURISTIC_NONE] = "",
> +	[BUNDLE_HEURISTIC_CREATIONTOKEN] = "creationToken",
> +};
> +
>  static int compare_bundles(const void *hashmap_cmp_fn_data,
>  			   const struct hashmap_entry *he1,
>  			   const struct hashmap_entry *he2,
> @@ -100,6 +105,9 @@ void print_bundle_list(FILE *fp, struct bundle_list *list)
>  	fprintf(fp, "\tversion = %d\n", list->version);
>  	fprintf(fp, "\tmode = %s\n", mode);
>  
> +	if (list->heuristic)
> +		printf("\theuristic = %s\n", heuristics[list->heuristic]);

Given this condition, the 'heuristic' key should not be sent if it's
'BUNDLE_HEURISTIC_NONE'. But, as a fallback...

> +
>  	for_all_bundles_in_list(list, summarize_bundle, fp);
>  }
>  
> @@ -142,6 +150,19 @@ static int bundle_list_update(const char *key, const char *value,
>  			return 0;
>  		}
>  
> +		if (!strcmp(subkey, "heuristic")) {
> +			int i;
> +			for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
> +				if (!strcmp(value, heuristics[i])) {
> +					list->heuristic = i;
> +					return 0;
> +				}
> +			}

...this condition seems to handle 'BUNDLE_HEURISTIC_NONE' anyway. There's no
harm in this, since 'BUNDLE_HEURISTIC_NONE' is the default value of
'list->heuristic' anyway.

>  void init_bundle_list(struct bundle_list *list);
> diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
> index 7b4f930e532..6fc92a9c0d4 100755
> --- a/t/t5750-bundle-uri-parse.sh
> +++ b/t/t5750-bundle-uri-parse.sh
> @@ -250,4 +250,23 @@ test_expect_success 'parse config format edge cases: empty key or value' '
>  	test_cmp_config_output expect actual
>  '
>  
> +test_expect_success 'parse config format: creationToken heuristic' '
> +	cat >expect <<-\EOF &&
> +	[bundle]
> +		version = 1
> +		mode = all
> +		heuristic = creationToken
> +	[bundle "one"]
> +		uri = http://example.com/bundle.bdl
> +	[bundle "two"]
> +		uri = https://example.com/bundle.bdl
> +	[bundle "three"]
> +		uri = file:///usr/share/git/bundle.bdl
> +	EOF
> +
> +	test-tool bundle-uri parse-config expect >actual 2>err &&
> +	test_must_be_empty err &&
> +	test_cmp_config_output expect actual
> +'

And this test verifies that 'heuristic' is no longer being ignored. Looks
good!

> +
>  test_done


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 3/8] bundle-uri: parse bundle.<id>.creationToken values
  2023-01-06 20:36 ` [PATCH 3/8] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
  2023-01-09  3:08   ` Junio C Hamano
@ 2023-01-17 19:24   ` Victoria Dye
  1 sibling, 0 replies; 74+ messages in thread
From: Victoria Dye @ 2023-01-17 19:24 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> The previous change taught Git to parse the bundle.heuristic value,
> especially when its value is "creationToken". Now, teach Git to parse
> the bundle.<id>.creationToken values on each bundle in a bundle list.
> 
> Before implementing any logic based on creationToken values for the
> creationToken heuristic, parse and print these values for testing
> purposes.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  bundle-uri.c                | 10 ++++++++++
>  bundle-uri.h                |  6 ++++++
>  t/t5750-bundle-uri-parse.sh | 18 ++++++++++++++++++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 56c94595c2a..63e2cc21057 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -80,6 +80,9 @@ static int summarize_bundle(struct remote_bundle_info *info, void *data)
>  	FILE *fp = data;
>  	fprintf(fp, "[bundle \"%s\"]\n", info->id);
>  	fprintf(fp, "\turi = %s\n", info->uri);
> +
> +	if (info->creationToken)
> +		fprintf(fp, "\tcreationToken = %"PRIu64"\n", info->creationToken);
>  	return 0;
>  }
>  
> @@ -190,6 +193,13 @@ static int bundle_list_update(const char *key, const char *value,
>  		return 0;
>  	}
>  
> +	if (!strcmp(subkey, "creationtoken")) {
> +		if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
> +			warning(_("could not parse bundle list key %s with value '%s'"),
> +				"creationToken", value);
> +		return 0;
> +	}

Like the 'heuristic' key in the last patch, the parsing of 'creationToken'
is pretty straightforward.

> diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
> index 6fc92a9c0d4..81bdf58b944 100755
> --- a/t/t5750-bundle-uri-parse.sh
> +++ b/t/t5750-bundle-uri-parse.sh
> @@ -258,10 +258,13 @@ test_expect_success 'parse config format: creationToken heuristic' '
>  		heuristic = creationToken
>  	[bundle "one"]
>  		uri = http://example.com/bundle.bdl
> +		creationToken = 123456
>  	[bundle "two"]
>  		uri = https://example.com/bundle.bdl
> +		creationToken = 12345678901234567890
>  	[bundle "three"]
>  		uri = file:///usr/share/git/bundle.bdl
> +		creationToken = 1
>  	EOF
>  
>  	test-tool bundle-uri parse-config expect >actual 2>err &&
> @@ -269,4 +272,19 @@ test_expect_success 'parse config format: creationToken heuristic' '
>  	test_cmp_config_output expect actual
>  '
>  
> +test_expect_success 'parse config format edge cases: creationToken heuristic' '
> +	cat >expect <<-\EOF &&
> +	[bundle]
> +		version = 1
> +		mode = all
> +		heuristic = creationToken
> +	[bundle "one"]
> +		uri = http://example.com/bundle.bdl
> +		creationToken = bogus
> +	EOF
> +
> +	test-tool bundle-uri parse-config expect >actual 2>err &&
> +	grep "could not parse bundle list key creationToken with value '\''bogus'\''" err
> +'

And the tests cover both valid and invalid cases nicely.

> +
>  test_done


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 1/8] t5558: add tests for creationToken heuristic
  2023-01-17 18:17   ` Victoria Dye
@ 2023-01-17 21:00     ` Derrick Stolee
  0 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee @ 2023-01-17 21:00 UTC (permalink / raw)
  To: Victoria Dye, Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen

On 1/17/2023 1:17 PM, Victoria Dye wrote:
> Derrick Stolee via GitGitGadget wrote:

>> +	for b in 1 2 3 4
>> +	do
>> +		test_bundle_downloaded bundle-$b.bundle trace-clone.txt ||
>> +			return 1
>> +	done
> 
> Because the current state of bundle list handling is equivalent to "no
> heuristic", this pre-existing test is just updated to verify all bundles are
> downloaded. This isn't new behavior, but it'll be relevant to compare with
> the behavior of the 'creationToken' heuristic. 
> 
> I was going to ask how the tests verify that *only* the expected bundles are
> downloaded, and it looks like later patches [1] handle that with
> '! test_bundle_downloaded' checks. That approach seems a bit fragile (if a
> bundle's name doesn't match the '! test_bundle_downloaded' check for some
> reason, the bundle can be either downloaded or not with no effect on the
> test result). Would something like a 'test_downloaded_bundle_count' work
> instead?

Or, perhaps we could check the exact list _and order_ using this slightly
more generic helper?

# Given a GIT_TRACE2_EVENT log over stdin, writes to stdout a list of URLs
# sent to git-remote-https child processes.
test_remote_https_urls() {
	grep -e '"event":"child_start".*"argv":\["git-remote-https",".*"\]' |
		sed -e 's/{"event":"child_start".*"argv":\["git-remote-https","//g' \
		    -e 's/"\]}//g'
}

With a test example looking a lot like this:

	cat >expect <<-EOF &&
	$HTTPD_URL/newest.bundle
	$HTTPD_URL/new.bundle
	$HTTPD_URL/everything.bundle
	EOF

	test_remote_https_urls <trace-clone.txt >actual &&
	test_cmp expect actual

Thanks for the inspiration.

>> +test_expect_success 'clone bundle list (http, creationToken)' '
>> +	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
>> +	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
>> +	[bundle]
>> +		version = 1
>> +		mode = all
>> +		heuristic = creationToken
>> +
>> +	[bundle "bundle-1"]
>> +		uri = bundle-1.bundle
>> +		creationToken = 1
>> +
>> +	[bundle "bundle-2"]
>> +		uri = bundle-2.bundle
>> +		creationToken = 2
>> +
>> +	[bundle "bundle-3"]
>> +		uri = bundle-3.bundle
>> +		creationToken = 3
>> +
>> +	[bundle "bundle-4"]
>> +		uri = bundle-4.bundle
>> +		creationToken = 4
>> +	EOF
>> +
>> +	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http-2 &&
>> +
>> +	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
>> +	git -C clone-list-http-2 cat-file --batch-check <oids
> 
> This test looks like the one that was updated above, but adds the
> 'creationToken' heuristic key. However, the 'test_bundle_downloaded' check
> isn't included - if it were, it would need to verify that all bundles were
> downloaded, with the heuristic being ignored, all bundles will be downloaded
> (which isn't consistent with what the 'creationToken' heuristic will
> *eventually* do).
> 
> As a matter of personal preference (so no pressure to change if you
> disagree), I find this test in its current state a bit misleading; because
> it's a 'test_expect_success' and there's no "NEEDSWORK" or "TODO", I could
> easily assume that cloning from a bundle list with the 'creationToken'
> heuristic is working as-intended at this point (that is, there's no
> indication that it's not implemented). 

It's true that it's not implemented right now, and it is a bit misleading
because of that. At clone time, the only thing that will change with the
implementation is possibly the order of the files being downloaded (and
the order is not predictable before that implementation).

The restriction of _not_ downloading some files comes only for the 'git
fetch' implementation, so these test changes are only foundations for those
future tests.

The only benefit of having these tests right now is that we get some
demonstration that the existing implementation ignores unknown properties
in the bundle list.

> If you did want to change it, adding a 'NEEDSWORK' comment, changing to
> 'test_expect_failure' & including the appropriate 'test_bundle_downloaded'
> check, or moving this test to the patch where the heuristic is implemented
> would mitigate any confusion. That said, this "issue" is resolved by the
> end of the series anyway, so it's really a low priority fix.
I think if we wanted to go this route, we could do it with the "download
the bundles in this order" check, or possibly by adding the 'git fetch'
behavior into the test at this point. I'll consider these options for v2.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/8] bundle-uri: download in creationToken order
  2023-01-06 20:36 ` [PATCH 4/8] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
  2023-01-09  3:22   ` Junio C Hamano
@ 2023-01-19 18:32   ` Victoria Dye
  2023-01-20 14:56     ` Derrick Stolee
  1 sibling, 1 reply; 74+ messages in thread
From: Victoria Dye @ 2023-01-19 18:32 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> +static int fetch_bundles_by_token(struct repository *r,
> +				  struct bundle_list *list)
> +{
> +	int cur;
> +	int pop_or_push = 0;
> +	struct bundle_list_context ctx = {
> +		.r = r,
> +		.list = list,
> +		.mode = list->mode,
> +	};
> +	struct sorted_bundle_list sorted = {
> +		.alloc = hashmap_get_size(&list->bundles),
> +	};
> +
> +	ALLOC_ARRAY(sorted.items, sorted.alloc);
> +
> +	for_all_bundles_in_list(list, insert_bundle, &sorted);
> +
> +	QSORT(sorted.items, sorted.nr, compare_creation_token);

So, at this point, 'sorted' is ordered by *decreasing* creation token? With
the loop below being somewhat complex, it would be nice to have a comment
mention that explicitly so readers have a clear understanding of the
"initial state" before entering the loop.

> +
> +	/*
> +	 * Use a stack-based approach to download the bundles and attempt
> +	 * to unbundle them in decreasing order by creation token. If we
> +	 * fail to unbundle (after a successful download) then move to the
> +	 * next non-downloaded bundle (push to the stack) and attempt
> +	 * downloading. Once we succeed in applying a bundle, move to the
> +	 * previous unapplied bundle (pop the stack) and attempt to unbundle
> +	 * it again.
> +	 *
> +	 * In the case of a fresh clone, we will likely download all of the
> +	 * bundles before successfully unbundling the oldest one, then the
> +	 * rest of the bundles unbundle successfully in increasing order
> +	 * of creationToken.
> +	 *
> +	 * If there are existing objects, then this process may terminate
> +	 * early when all required commits from "new" bundles exist in the
> +	 * repo's object store.
> +	 */
> +	cur = 0;
> +	while (cur >= 0 && cur < sorted.nr) {
> +		struct remote_bundle_info *bundle = sorted.items[cur];
> +		if (!bundle->file) {
> +			/* Not downloaded yet. Try downloading. */
> +			if (download_bundle_to_file(bundle, &ctx)) {
> +				/* Failure. Push to the stack. */
> +				pop_or_push = 1;
> +				goto stack_operation;

Personally, I find the use of "stack" terminology more confusing than not.
'sorted' isn't really a stack, it's a list with fixed contents being
traversed stepwise with 'cur'. For example, 'pop_or_push' being renamed to
'move_direction' or 'step' something along those lines might more clearly
indicate what's actually happening with 'cur' & 'sorted'.

> +			}
> +
> +			/* We expect bundles when using creationTokens. */
> +			if (!is_bundle(bundle->file, 1)) {
> +				warning(_("file downloaded from '%s' is not a bundle"),
> +					bundle->uri);
> +				break;
> +			}
> +		}
> +
> +		if (bundle->file && !bundle->unbundled) {
> +			/*
> +			 * This was downloaded, but not successfully
> +			 * unbundled. Try unbundling again.
> +			 */
> +			if (unbundle_from_file(ctx.r, bundle->file)) {
> +				/* Failed to unbundle. Push to stack. */
> +				pop_or_push = 1;
> +			} else {
> +				/* Succeeded in unbundle. Pop stack. */
> +				pop_or_push = -1;
> +			}
> +		}
> +
> +		/*
> +		 * Else case: downloaded and unbundled successfully.
> +		 * Skip this by moving in the same direction as the
> +		 * previous step.
> +		 */
> +
> +stack_operation:
> +		/* Move in the specified direction and repeat. */
> +		cur += pop_or_push;
> +	}

After reading through this loop, I generally understood *what* its doing,
but didn't really follow *why* the download & unbundling is done like this.
I needed to refer back to the design doc
('Documentation/technical/bundle-uri.txt') to understand some basic
assumptions about bundles:

- A new bundle's creation token should always be strictly greater than the
  previous newest bundle's creation token. I don't see any special handling
  for equal creation tokens, so my assumption is that the sorting of the
  list arbitrarily assigns one to be "greater" and it's dealt with that way.
- The bundle with the lowest creation token should always be unbundleable,
  since it contains all objects in an initial clone.

I do still have some questions, though:

- Why would 'unbundle_from_file()' fail? From context clues, I'm guessing it
  fails if it has some unreachable objects (as in an incremental bundle), or
  if it's corrupted somehow.
- Why would 'download_bundle_to_file()' to fail? Unlike
  'unbundle_from_file()', it looks like that represents an unexpected error.

Also - it seems like one of the assumptions here is that, if a bundle can't
be downloaded & unbundled, no bundle with a higher creation token can be
successfully unbundled ('download_bundle_to_file()' sets 'pop_or_push' to
'1', which will cause the loop to ignore all higher-token bundles and return
a nonzero value from the function). 

I don't think that assumption is necessarily true, though. Suppose you have
a "base" bundle 100 and incremental bundles 101 and 102. 101 has all objects
from a new branch A, and 102 has all objects from a newer branch B (not
based on any objects in A). In this case, 102 could be unbundled even if 101
is corrupted/can't be downloaded, but we'd run into issues if we store 102
as the "latest unbundled creation token" (because it implies that 101 was
unbundled).

Is there any benefit to trying to unbundle those higher bundles *without*
advancing the "latest creation token"? E.g. in my example, unbundle 102 but
store '100' as the latest creation token?

> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
> index 328caeeae9a..d7461ec907e 100755
> --- a/t/t5558-clone-bundle-uri.sh
> +++ b/t/t5558-clone-bundle-uri.sh
> @@ -368,6 +368,8 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
>  '
>  
>  test_expect_success 'clone bundle list (http, creationToken)' '
> +	test_when_finished rm -f trace*.txt &&
> +
>  	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
>  	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
>  	[bundle]
> @@ -392,10 +394,45 @@ test_expect_success 'clone bundle list (http, creationToken)' '
>  		creationToken = 4
>  	EOF
>  
> -	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http-2 &&
> +	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
> +	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
> +		clone-from clone-list-http-2 &&
>  
>  	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
> -	git -C clone-list-http-2 cat-file --batch-check <oids
> +	git -C clone-list-http-2 cat-file --batch-check <oids &&
> +
> +	for b in 1 2 3 4
> +	do
> +		test_bundle_downloaded bundle-$b.bundle trace-clone.txt ||
> +			return 1
> +	done

If I understand correctly, these added conditions would have passed even if
they were added when the test was initially created in patch 1, but they're
added here to tie them to the implementation of the creationToken heuristic?
Seems reasonable.

> +'
> +
> +test_expect_success 'clone bundle list (http, creationToken)' '

This new test has the same name as the one above it - how does it differ
from that one? Whatever the difference is, can that be noted somehow in the
title or a comment?

> +	test_when_finished rm -f trace*.txt &&
> +
> +	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
> +	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
> +	[bundle]
> +		version = 1
> +		mode = all
> +		heuristic = creationToken
> +
> +	[bundle "bundle-1"]
> +		uri = bundle-1.bundle
> +		creationToken = 1
> +
> +	[bundle "bundle-2"]
> +		uri = bundle-2.bundle
> +		creationToken = 2
> +	EOF
> +
> +	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
> +	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
> +		clone-from clone-token-http &&
> +
> +	test_bundle_downloaded bundle-1.bundle trace-clone.txt &&
> +	test_bundle_downloaded bundle-2.bundle trace-clone.txt
>  '
>  
>  # Do not add tests here unless they use the HTTP server, as they will
> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index 1928ea1dd7c..57476b6e6d7 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -831,6 +831,56 @@ test_expect_success 'auto-discover multiple bundles from HTTP clone' '
>  	grep -f pattern trace.txt
>  '
>  
> +# Usage: test_bundle_downloaded <bundle-id> <trace-filename>
> +test_bundle_downloaded () {
> +	cat >pattern <<-EOF &&
> +	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/$1.bundle"\]
> +	EOF
> +	grep -f pattern "$2"
> +}

This function is the same as the one created in 't5558'. Should it be moved
to 'lib-bundle.sh' or 'test-lib.sh' to avoid duplicate code?

> +
> +test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '
> +	test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
> +	test_when_finished rm -rf clone-heuristic trace*.txt &&
> +
> +	test_commit -C src newest &&
> +	git -C src bundle create "$HTTPD_DOCUMENT_ROOT_PATH/newest.bundle" HEAD~1..HEAD &&
> +	git clone --bare --no-local src "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
> +
> +	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/repo4.git/config" <<-EOF &&
> +	[uploadPack]
> +		advertiseBundleURIs = true
> +
> +	[bundle]
> +		version = 1
> +		mode = all
> +		heuristic = creationToken
> +
> +	[bundle "everything"]
> +		uri = $HTTPD_URL/everything.bundle
> +		creationtoken = 1
> +
> +	[bundle "new"]
> +		uri = $HTTPD_URL/new.bundle
> +		creationtoken = 2
> +
> +	[bundle "newest"]
> +		uri = $HTTPD_URL/newest.bundle
> +		creationtoken = 3
> +	EOF
> +
> +	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
> +		git -c protocol.version=2 \
> +		    -c transfer.bundleURI=true clone \
> +		"$HTTPD_URL/smart/repo4.git" clone-heuristic &&
> +
> +	# We should fetch all bundles
> +	for b in everything new newest
> +	do
> +		test_bundle_downloaded $b trace-clone.txt || return 1
> +	done
> +'
> +
>  # DO NOT add non-httpd-specific tests here, because the last part of this
>  # test script is only executed when httpd is available and enabled.
>  


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/8] clone: set fetch.bundleURI if appropriate
  2023-01-06 20:36 ` [PATCH 5/8] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
@ 2023-01-19 19:42   ` Victoria Dye
  2023-01-20 15:42     ` Derrick Stolee
  0 siblings, 1 reply; 74+ messages in thread
From: Victoria Dye @ 2023-01-19 19:42 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
> index cd65d236b43..4f796218aab 100644
> --- a/Documentation/config/fetch.txt
> +++ b/Documentation/config/fetch.txt
> @@ -96,3 +96,11 @@ fetch.writeCommitGraph::
>  	merge and the write may take longer. Having an updated commit-graph
>  	file helps performance of many Git commands, including `git merge-base`,
>  	`git push -f`, and `git log --graph`. Defaults to false.
> +
> +fetch.bundleURI::
> +	This value stores a URI for fetching Git object data from a bundle URI
> +	before performing an incremental fetch from the origin Git server. If
> +	the value is `<uri>` then running `git fetch <args>` is equivalent to
> +	first running `git fetch --bundle-uri=<uri>` immediately before
> +	`git fetch <args>`. See details of the `--bundle-uri` option in
> +	linkgit:git-fetch[1].

Since it's not mentioned from this or any other user-facing documentation
(AFAICT), could you note that this value is set automatically by 'git clone'
iff '--bundle-uri' is specified *and* 'bundle.heuristic' is set for the
initially downloaded bundle list?

It would also be nice to make note of that behavior in the documentation of
the '--bundle-uri' option in 'Documentation/git-clone.txt', since command
documentation in general seems to be more popular/visible to users than
config docs.

> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5453ba5277f..5370617664d 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1248,12 +1248,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	 * data from the --bundle-uri option.
>  	 */
>  	if (bundle_uri) {
> +		int has_heuristic = 0;
> +
>  		/* At this point, we need the_repository to match the cloned repo. */
>  		if (repo_init(the_repository, git_dir, work_tree))
>  			warning(_("failed to initialize the repo, skipping bundle URI"));
> -		else if (fetch_bundle_uri(the_repository, bundle_uri))
> +		else if (fetch_bundle_uri(the_repository, bundle_uri, &has_heuristic))
>  			warning(_("failed to fetch objects from bundle URI '%s'"),
>  				bundle_uri);
> +		else if (has_heuristic)
> +			git_config_set_gently("fetch.bundleuri", bundle_uri);

If the heuristic is anything other than "none", this config value is set in
the repository-scoped config file. Makes sense!

>  	}
>  
>  	strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
> diff --git a/bundle-uri.c b/bundle-uri.c
> index b30c85ba6f2..1dbbbb980eb 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -594,9 +594,10 @@ static int fetch_bundle_list_in_config_format(struct repository *r,
>  	 * it advertises are expected to be bundles, not nested lists.
>  	 * We can drop 'global_list' and 'depth'.
>  	 */
> -	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
> +	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
>  		result = fetch_bundles_by_token(r, &list_from_bundle);
> -	else if ((result = download_bundle_list(r, &list_from_bundle,
> +		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;

If the 'heuristic' field already existed and was being used to apply
bundles, why wasn't 'global_list->heuristic' already being set? Before this
patch, was the 'global_list->heuristic' field not accurately reflecting the
heuristic type of a given bundle list? 

If so, I think it'd make sense to move this section to patch 4 [1], since
that's when the heuristic is first applied to the bundle list.

[1] https://lore.kernel.org/git/57c0174d3752fb61a05e0653de9d3057616ed16a.1673037405.git.gitgitgadget@gmail.com/

> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
> index d7461ec907e..8ff560425ee 100755
> --- a/t/t5558-clone-bundle-uri.sh
> +++ b/t/t5558-clone-bundle-uri.sh
> @@ -435,6 +435,39 @@ test_expect_success 'clone bundle list (http, creationToken)' '
>  	test_bundle_downloaded bundle-2.bundle trace-clone.txt
>  '
>  
> +test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
> +	test_when_finished rm -rf fetch-http-4 trace*.txt &&
> +
> +	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
> +	[bundle]
> +		version = 1
> +		mode = all
> +		heuristic = creationToken
> +
> +	[bundle "bundle-1"]
> +		uri = bundle-1.bundle
> +		creationToken = 1
> +	EOF
> +
> +	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
> +	git clone --single-branch --branch=base \
> +		--bundle-uri="$HTTPD_URL/bundle-list" \
> +		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
> +
> +	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
> +
> +	# The clone should copy two files: the list and bundle-1.
> +	test_bundle_downloaded bundle-list trace-clone.txt &&
> +	test_bundle_downloaded bundle-1.bundle trace-clone.txt &&
> +
> +	# only received base ref from bundle-1
> +	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/base
> +	EOF
> +	test_cmp expect refs
> +'

This test looks good - it verifies the config update, bundle download, and
unbundle all work as intended.

> +
>  # Do not add tests here unless they use the HTTP server, as they will
>  # not run unless the HTTP dependencies exist.
>  


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 6/8] bundle-uri: drop bundle.flag from design doc
  2023-01-06 20:36 ` [PATCH 6/8] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
@ 2023-01-19 19:44   ` Victoria Dye
  0 siblings, 0 replies; 74+ messages in thread
From: Victoria Dye @ 2023-01-19 19:44 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> The Implementation Plan section lists a 'bundle.flag' option that is not
> documented anywhere else. What is documented elsewhere in the document
> and implemented by previous changes is the 'bundle.heuristic' config
> key. For now, a heuristic is required to indicate that a bundle list is
> organized for use during 'git fetch', and it is also sufficient for all
> existing designs.

Good catch, thanks for keeping the documentation consistent!

> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  Documentation/technical/bundle-uri.txt | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
> index b78d01d9adf..91d3a13e327 100644
> --- a/Documentation/technical/bundle-uri.txt
> +++ b/Documentation/technical/bundle-uri.txt
> @@ -479,14 +479,14 @@ outline for submitting these features:
>     (This choice is an opt-in via a config option and a command-line
>     option.)
>  
> -4. Allow the client to understand the `bundle.flag=forFetch` configuration
> +4. Allow the client to understand the `bundle.heuristic` configuration key
>     and the `bundle.<id>.creationToken` heuristic. When `git clone`
> -   discovers a bundle URI with `bundle.flag=forFetch`, it configures the
> -   client repository to check that bundle URI during later `git fetch <remote>`
> +   discovers a bundle URI with `bundle.heuristic`, it configures the client
> +   repository to check that bundle URI during later `git fetch <remote>`
>     commands.
>  
>  5. Allow clients to discover bundle URIs during `git fetch` and configure
> -   a bundle URI for later fetches if `bundle.flag=forFetch`.
> +   a bundle URI for later fetches if `bundle.heuristic` is set.
>  
>  6. Implement the "inspect headers" heuristic to reduce data downloads when
>     the `bundle.<id>.creationToken` heuristic is not available.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 7/8] fetch: fetch from an external bundle URI
  2023-01-06 20:36 ` [PATCH 7/8] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
@ 2023-01-19 20:34   ` Victoria Dye
  2023-01-20 15:47     ` Derrick Stolee
  0 siblings, 1 reply; 74+ messages in thread
From: Victoria Dye @ 2023-01-19 20:34 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> When a user specifies a URI via 'git clone --bundle-uri', that URI may
> be a bundle list that advertises a 'bundle.heuristic' value. In that
> case, the Git client stores a 'fetch.bundleURI' config value storing
> that URI.
> 
> Teach 'git fetch' to check for this config value and download bundles
> from that URI before fetching from the Git remote(s). Likely, the bundle
> provider has configured a heuristic (such as "creationToken") that will
> allow the Git client to download only a portion of the bundles before
> continuing the fetch.
> 
> Since this URI is completely independent of the remote server, we want
> to be sure that we connect to the bundle URI before creating a
> connection to the Git remote. We do not want to hold a stateful
> connection for too long if we can avoid it.
> 
> To test that this works correctly, extend the previous tests that set
> 'fetch.bundleURI' to do follow-up fetches. The bundle list is updated
> incrementally at each phase to demonstrate that the heuristic avoids
> downloading older bundles. This includes the middle fetch downloading
> the objects in bundle-3.bundle from the Git remote, and therefore not
> needing that bundle in the third fetch.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  builtin/fetch.c             |  8 +++++
>  t/t5558-clone-bundle-uri.sh | 59 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 67 insertions(+)
> 
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 7378cafeec9..fbb1d470c38 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -29,6 +29,7 @@
>  #include "commit-graph.h"
>  #include "shallow.h"
>  #include "worktree.h"
> +#include "bundle-uri.h"
>  
>  #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000)
>  
> @@ -2109,6 +2110,7 @@ static int fetch_one(struct remote *remote, int argc, const char **argv,
>  int cmd_fetch(int argc, const char **argv, const char *prefix)
>  {
>  	int i;
> +	const char *bundle_uri;
>  	struct string_list list = STRING_LIST_INIT_DUP;
>  	struct remote *remote = NULL;
>  	int result = 0;
> @@ -2194,6 +2196,12 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  	if (dry_run)
>  		write_fetch_head = 0;
>  
> +	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri) &&
> +	    !starts_with(bundle_uri, "remote:")) {

Maybe a silly question, by why would the bundle URI start with 'remote:'
(and why do we silently skip fetching from the URI in that case)?

> +		if (fetch_bundle_uri(the_repository, bundle_uri, NULL))
> +			warning(_("failed to fetch bundles from '%s'"), bundle_uri);
> +	}
> +
>  	if (all) {
>  		if (argc == 1)
>  			die(_("fetch --all does not take a repository argument"));
> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
> index 8ff560425ee..3f4d61a915c 100755
> --- a/t/t5558-clone-bundle-uri.sh
> +++ b/t/t5558-clone-bundle-uri.sh
> @@ -465,6 +465,65 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
>  	cat >expect <<-\EOF &&
>  	refs/bundles/base
>  	EOF
> +	test_cmp expect refs &&

At this point in the test, only 'base' is in the cloned repo (equivalent to
the contents of 'bundle-1').

> +
> +	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
> +	[bundle "bundle-2"]
> +		uri = bundle-2.bundle
> +		creationToken = 2
> +	EOF
> +
> +	# Fetch the objects for bundle-2 _and_ bundle-3.
> +	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
> +		git -C fetch-http-4 fetch origin --no-tags \
> +		refs/heads/left:refs/heads/left \
> +		refs/heads/right:refs/heads/right &&
> +
> +	# This fetch should copy two files: the list and bundle-2.
> +	test_bundle_downloaded bundle-list trace1.txt &&
> +	test_bundle_downloaded bundle-2.bundle trace1.txt &&
> +	! test_bundle_downloaded bundle-1.bundle trace1.txt &&

Now, with 'bundle-2' in the list, fetch 'left' and 'right'. 'bundle-1' is
not fetched because we already have its contents (in 'base'), 'bundle-2' is
downloaded to get 'left', and 'right' is fetched directly from the repo.

> +
> +	# received left from bundle-2
> +	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/base
> +	refs/bundles/left
> +	EOF
> +	test_cmp expect refs &&
> +
> +	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
> +	[bundle "bundle-3"]
> +		uri = bundle-3.bundle
> +		creationToken = 3
> +
> +	[bundle "bundle-4"]
> +		uri = bundle-4.bundle
> +		creationToken = 4
> +	EOF
> +
> +	# This fetch should skip bundle-3.bundle, since its objets are

s/objets/objects

> +	# already local (we have the requisite commits for bundle-4.bundle).
> +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
> +		git -C fetch-http-4 fetch origin --no-tags \
> +		refs/heads/merge:refs/heads/merge &&
> +
> +	# This fetch should copy three files: the list, bundle-3, and bundle-4.
> +	test_bundle_downloaded bundle-list trace2.txt &&
> +	test_bundle_downloaded bundle-4.bundle trace2.txt &&
> +	! test_bundle_downloaded bundle-1.bundle trace2.txt &&
> +	! test_bundle_downloaded bundle-2.bundle trace2.txt &&
> +	! test_bundle_downloaded bundle-3.bundle trace2.txt &&

'bundle-3' and 'bundle-4' are then added to the list and we fetch 'merge'.
The repository already has 'base', 'left', and 'right' so we don't need to
download 'bundle-1', 'bundle-2', and 'bundle-3' respectively; 'bundle-4' is
downloaded to get 'merge'.

> +
> +	# received merge ref from bundle-4, but right is missing
> +	# because we did not download bundle-3.
> +	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
> +
> +	cat >expect <<-\EOF &&
> +	refs/bundles/base
> +	refs/bundles/left
> +	refs/bundles/merge

And this confirms that 'base', 'left', and 'merge' all came from bundles
(and 'right' did not), and everything is working as expected. This is test
is both easy to understand (the comments clearly explain each step without
being overly verbose) and thorough. Looks good!

> +	EOF
>  	test_cmp expect refs
>  '
>  


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 8/8] bundle-uri: store fetch.bundleCreationToken
  2023-01-06 20:36 ` [PATCH 8/8] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
@ 2023-01-19 22:24   ` Victoria Dye
  2023-01-20 15:53     ` Derrick Stolee
  0 siblings, 1 reply; 74+ messages in thread
From: Victoria Dye @ 2023-01-19 22:24 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> When a bundle list specifies the "creationToken" heuristic, the Git
> client downloads the list and then starts downloading bundles in
> descending creationToken order. This process stops as soon as all
> downloaded bundles can be applied to the repository (because all
> required commits are present in the repository or in the downloaded
> bundles).
> 
> When checking the same bundle list twice, this strategy requires
> downloading the bundle with the maximum creationToken again, which is
> wasteful. The creationToken heuristic promises that the client will not
> have a use for that bundle if its creationToken value is the at most the

s/is the at most/is at most(?)

> previous creationToken value.
> 
> To prevent these wasteful downloads, create a fetch.bundleCreationToken
> config setting that the Git client sets after downloading bundles. This
> value allows skipping that maximum bundle download when this config
> value is the same value (or larger).
> 
> To test that this works correctly, we can insert some "duplicate"
> fetches into existing tests and demonstrate that only the bundle list is
> downloaded.
> 
> The previous logic for downloading bundles by creationToken worked even
> if the bundle list was empty, but now we have logic that depends on the
> first entry of the list. Terminate early in the (non-sensical) case of
> an empty bundle list.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  Documentation/config/fetch.txt |  8 ++++++++
>  bundle-uri.c                   | 35 ++++++++++++++++++++++++++++++++--
>  t/t5558-clone-bundle-uri.sh    | 25 +++++++++++++++++++++++-
>  3 files changed, 65 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
> index 4f796218aab..96755ba148b 100644
> --- a/Documentation/config/fetch.txt
> +++ b/Documentation/config/fetch.txt
> @@ -104,3 +104,11 @@ fetch.bundleURI::
>  	first running `git fetch --bundle-uri=<uri>` immediately before
>  	`git fetch <args>`. See details of the `--bundle-uri` option in
>  	linkgit:git-fetch[1].
> +
> +fetch.bundleCreationToken::
> +	When using `fetch.bundleURI` to fetch incrementally from a bundle
> +	list that uses the "creationToken" heuristic, this config value
> +	stores the maximum `creationToken` value of the downloaded bundles.
> +	This value is used to prevent downloading bundles in the future
> +	if the advertised `creationToken` is not strictly larger than this
> +	value.
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 1dbbbb980eb..98655bd6721 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -464,6 +464,8 @@ static int fetch_bundles_by_token(struct repository *r,
>  {
>  	int cur;
>  	int pop_or_push = 0;
> +	const char *creationTokenStr;
> +	uint64_t maxCreationToken;
>  	struct bundle_list_context ctx = {
>  		.r = r,
>  		.list = list,
> @@ -477,8 +479,27 @@ static int fetch_bundles_by_token(struct repository *r,
>  
>  	for_all_bundles_in_list(list, insert_bundle, &sorted);
>  
> +	if (!sorted.nr) {
> +		free(sorted.items);
> +		return 0;
> +	}

This check is added here because we're only now at risk for an invalid
access to 'sorted' (checking 'sorted.items[0]' below).

> +
>  	QSORT(sorted.items, sorted.nr, compare_creation_token);
>  
> +	/*
> +	 * If fetch.bundleCreationToken exists, parses to a uint64t, and
> +	 * is not strictly smaller than the maximum creation token in the
> +	 * bundle list, then do not download any bundles.
> +	 */
> +	if (!repo_config_get_value(r,
> +				   "fetch.bundlecreationtoken",
> +				   &creationTokenStr) &&
> +	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
> +	    sorted.items[0]->creationToken <= maxCreationToken) {
> +		free(sorted.items);
> +		return 0;
> +	}

And here, we exit if the cached creation token is greater than or equal to
the highest advertised token. Overall, this seems pretty safe:

- If the value is (somehow) manually updated to something larger than it
  should be, objects will be fetched from the server that could have
  otherwise come from a bundle. Not ideal, but no worse than a regular
  fetch.
- If the value is too small or invalid (i.e., not an unsigned integer),
  we'll download the first bundle, unbundle it, then overwrite the invalid
  'fetch.bundlecreationtoken' with a new valid one.

The latter is self-correcting, but should the former be documented
somewhere? For example, if someone changes which bundle server they're using
with a repo, the creation token numbering scheme might be completely
different. In that case, a user would probably want to "reset" the
'fetch.bundlecreationtoken' and start fresh with a new bundle list (even if
the recommended method is "manually delete the config key from your repo").

> +
>  	/*
>  	 * Use a stack-based approach to download the bundles and attempt
>  	 * to unbundle them in decreasing order by creation token. If we
> @@ -541,14 +562,24 @@ stack_operation:
>  		cur += pop_or_push;
>  	}
>  
> -	free(sorted.items);
> -
>  	/*
>  	 * We succeed if the loop terminates because 'cur' drops below
>  	 * zero. The other case is that we terminate because 'cur'
>  	 * reaches the end of the list, so we have a failure no matter
>  	 * which bundles we apply from the list.
>  	 */
> +	if (cur < 0) {
> +		struct strbuf value = STRBUF_INIT;
> +		strbuf_addf(&value, "%"PRIu64"", sorted.items[0]->creationToken);
> +		if (repo_config_set_multivar_gently(ctx.r,
> +						    "fetch.bundleCreationToken",
> +						    value.buf, NULL, 0))

Set the new max bundle creation token if the value has been updated (if the
'fetch.bundleCreationToken' was >= the first bundle's token, 'cur' is 0) in
the repository scope (like the cached bundle URI in patch 5 [1]). Looks
good.

[1] https://lore.kernel.org/git/d9c6f50e4f218267c1e8da060ce5b190dc8a709c.1673037405.git.gitgitgadget@gmail.com/

> +			warning(_("failed to store maximum creation token"));
> +
> +		strbuf_release(&value);
> +	}
> +
> +	free(sorted.items);
>  	return cur >= 0;
>  }
>  
> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
> index 3f4d61a915c..0604d721f1b 100755
> --- a/t/t5558-clone-bundle-uri.sh
> +++ b/t/t5558-clone-bundle-uri.sh

It isn't exactly related to this patch, but it looks like the second "clone
bundle list (http, creationToken)" never received any updates to
differentiate it from the first test with that name (I noted the duplicate
name in patch 4 [2]). Is it just a leftover duplicate?

[1] https://lore.kernel.org/git/ede340d1-bce4-0c1d-7afb-4874a67d1803@github.com/

> @@ -455,6 +455,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
>  		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
>  
>  	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
> +	test_cmp_config -C fetch-http-4 1 fetch.bundlecreationtoken &&
>  
>  	# The clone should copy two files: the list and bundle-1.
>  	test_bundle_downloaded bundle-list trace-clone.txt &&
> @@ -479,6 +480,8 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
>  		refs/heads/left:refs/heads/left \
>  		refs/heads/right:refs/heads/right &&
>  
> +	test_cmp_config -C fetch-http-4 2 fetch.bundlecreationtoken &&
> +
>  	# This fetch should copy two files: the list and bundle-2.
>  	test_bundle_downloaded bundle-list trace1.txt &&
>  	test_bundle_downloaded bundle-2.bundle trace1.txt &&

Verifying the max creation token that's saved - nice!

> @@ -492,6 +495,15 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
>  	EOF
>  	test_cmp expect refs &&
>  
> +	# No-op fetch
> +	GIT_TRACE2_EVENT="$(pwd)/trace1b.txt" \
> +		git -C fetch-http-4 fetch origin --no-tags \
> +		refs/heads/left:refs/heads/left \
> +		refs/heads/right:refs/heads/right &&
> +	test_bundle_downloaded bundle-list trace1b.txt &&
> +	! test_bundle_downloaded bundle-1.bundle trace1b.txt &&
> +	! test_bundle_downloaded bundle-2.bundle trace1b.txt &&

Now we make sure we're not downloading that first bundle if it's already
been unbundled.

> +
>  	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
>  	[bundle "bundle-3"]
>  		uri = bundle-3.bundle
> @@ -508,6 +520,8 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
>  		git -C fetch-http-4 fetch origin --no-tags \
>  		refs/heads/merge:refs/heads/merge &&
>  
> +	test_cmp_config -C fetch-http-4 4 fetch.bundlecreationtoken &&
> +
>  	# This fetch should copy three files: the list, bundle-3, and bundle-4.
>  	test_bundle_downloaded bundle-list trace2.txt &&
>  	test_bundle_downloaded bundle-4.bundle trace2.txt &&
> @@ -524,7 +538,16 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
>  	refs/bundles/left
>  	refs/bundles/merge
>  	EOF
> -	test_cmp expect refs
> +	test_cmp expect refs &&
> +
> +	# No-op fetch
> +	GIT_TRACE2_EVENT="$(pwd)/trace2b.txt" \
> +		git -C fetch-http-4 fetch origin &&
> +	test_bundle_downloaded bundle-list trace2b.txt &&
> +	! test_bundle_downloaded bundle-1.bundle trace2b.txt &&
> +	! test_bundle_downloaded bundle-2.bundle trace2b.txt &&
> +	! test_bundle_downloaded bundle-3.bundle trace2b.txt &&
> +	! test_bundle_downloaded bundle-4.bundle trace2b.txt

And add another no-op fetch - this time, specifically making sure the
downloaded bundles have covered all objects fetched from 'origin'. As with
previous patches, these test updates are quite thorough; nice work!

>  '
>  
>  # Do not add tests here unless they use the HTTP server, as they will


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/8] bundle-uri: download in creationToken order
  2023-01-19 18:32   ` Victoria Dye
@ 2023-01-20 14:56     ` Derrick Stolee
  0 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee @ 2023-01-20 14:56 UTC (permalink / raw)
  To: Victoria Dye, Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen

On 1/19/2023 1:32 PM, Victoria Dye wrote:> Derrick Stolee via GitGitGadget wrote:
>> +static int fetch_bundles_by_token(struct repository *r,
>> +				  struct bundle_list *list)
>> +{
>> +	int cur;
>> +	int pop_or_push = 0;
>> +	struct bundle_list_context ctx = {
>> +		.r = r,
>> +		.list = list,
>> +		.mode = list->mode,
>> +	};
>> +	struct sorted_bundle_list sorted = {
>> +		.alloc = hashmap_get_size(&list->bundles),
>> +	};
>> +
>> +	ALLOC_ARRAY(sorted.items, sorted.alloc);
>> +
>> +	for_all_bundles_in_list(list, insert_bundle, &sorted);
>> +
>> +	QSORT(sorted.items, sorted.nr, compare_creation_token);
>
> So, at this point, 'sorted' is ordered by *decreasing* creation token? With
> the loop below being somewhat complex, it would be nice to have a comment
> mention that explicitly so readers have a clear understanding of the
> "initial state" before entering the loop.

That's a good point, but also in my local version I have the following line:

	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);

The comparison function was renamed based on Junio's feedback. After making
that change, this line is more self-documenting. Do you still think that it
needs a clarification comment if this rename occurs?

>> +	/*
>> +	 * Use a stack-based approach to download the bundles and attempt
>> +	 * to unbundle them in decreasing order by creation token. If we
>> +	 * fail to unbundle (after a successful download) then move to the
>> +	 * next non-downloaded bundle (push to the stack) and attempt
>> +	 * downloading. Once we succeed in applying a bundle, move to the
>> +	 * previous unapplied bundle (pop the stack) and attempt to unbundle
>> +	 * it again.
>> +	 *
>> +	 * In the case of a fresh clone, we will likely download all of the
>> +	 * bundles before successfully unbundling the oldest one, then the
>> +	 * rest of the bundles unbundle successfully in increasing order
>> +	 * of creationToken.
>> +	 *
>> +	 * If there are existing objects, then this process may terminate
>> +	 * early when all required commits from "new" bundles exist in the
>> +	 * repo's object store.
>> +	 */
>> +	cur = 0;
>> +	while (cur >= 0 && cur < sorted.nr) {
>> +		struct remote_bundle_info *bundle = sorted.items[cur];
>> +		if (!bundle->file) {
>> +			/* Not downloaded yet. Try downloading. */
>> +			if (download_bundle_to_file(bundle, &ctx)) {
>> +				/* Failure. Push to the stack. */
>> +				pop_or_push = 1;
>> +				goto stack_operation;
>
> Personally, I find the use of "stack" terminology more confusing than not.
> 'sorted' isn't really a stack, it's a list with fixed contents being
> traversed stepwise with 'cur'. For example, 'pop_or_push' being renamed to
> 'move_direction' or 'step' something along those lines might more clearly
> indicate what's actually happening with 'cur' & 'sorted'.

s/pop_or_push/move_direction/ makes a lot of sense.

I'll think about describing the strategy differently to avoid the "stack"
language. Mentally, I'm constructing a stack of "downloaded but unable to
unbundle bundles", but they aren't actually arranged that way in any
explicit structure. Instead, they are just the bundles in the list that
have a file but haven't been unbundled.

>> +			}
>> +
>> +			/* We expect bundles when using creationTokens. */
>> +			if (!is_bundle(bundle->file, 1)) {
>> +				warning(_("file downloaded from '%s' is not a bundle"),
>> +					bundle->uri);
>> +				break;
>> +			}
>> +		}
>> +
>> +		if (bundle->file && !bundle->unbundled) {
>> +			/*
>> +			 * This was downloaded, but not successfully
>> +			 * unbundled. Try unbundling again.
>> +			 */
>> +			if (unbundle_from_file(ctx.r, bundle->file)) {
>> +				/* Failed to unbundle. Push to stack. */
>> +				pop_or_push = 1;
>> +			} else {
>> +				/* Succeeded in unbundle. Pop stack. */
>> +				pop_or_push = -1;
>> +			}
>> +		}
>> +
>> +		/*
>> +		 * Else case: downloaded and unbundled successfully.
>> +		 * Skip this by moving in the same direction as the
>> +		 * previous step.
>> +		 */
>> +
>> +stack_operation:
>> +		/* Move in the specified direction and repeat. */
>> +		cur += pop_or_push;
>> +	}
>
> After reading through this loop, I generally understood *what* its doing,
> but didn't really follow *why* the download & unbundling is done like this.

The commit message should be updated to point to refer to the previously-
added test setup in t5558:

# To get interesting tests for bundle lists, we need to construct a
# somewhat-interesting commit history.
#
# ---------------- bundle-4
#
#       4
#      / \
# ----|---|------- bundle-3
#     |   |
#     |   3
#     |   |
# ----|---|------- bundle-2
#     |   |
#     2   |
#     |   |
# ----|---|------- bundle-1
#      \ /
#       1
#       |
# (previous commits)

And then this can be used to motivate the algorithm. Suppose we have
already downloaded commit 1 through a previous fetch. We try to download
bundle-4 first, but it can't apply because it requires commits that are
in bundle-3 _and_ bundle-2, but the client doesn't know which bundles
contain those commits. Downloading bundle-3 successfully unbundles, so a
naive algorithm would think we are "done" and expect to unbundle bundle-4.
However, that unbundling fails, so we go deeper into the list to download
bundle-2. That succeeds, and then retrying bundle-4 succeeds.

> I needed to refer back to the design doc
> ('Documentation/technical/bundle-uri.txt') to understand some basic
> assumptions about bundles:
>
> - A new bundle's creation token should always be strictly greater than the
>   previous newest bundle's creation token. I don't see any special handling
>   for equal creation tokens, so my assumption is that the sorting of the
>   list arbitrarily assigns one to be "greater" and it's dealt with that way.

Yes, the bundle provider should not have equal values unless the bundles are
truly independent. That could be clarified in that doc.

> - The bundle with the lowest creation token should always be unbundleable,
>   since it contains all objects in an initial clone.

Yes, at least it should not have any required commits.

> I do still have some questions, though:
>
> - Why would 'unbundle_from_file()' fail? From context clues, I'm guessing it
>   fails if it has some unreachable objects (as in an incremental bundle), or
>   if it's corrupted somehow.

You are correct. We assume that the data is well-formed and so the problem
must be due to required commits not already present in the local object store.

> - Why would 'download_bundle_to_file()' to fail? Unlike
>   'unbundle_from_file()', it looks like that represents an unexpected error.

Yes, that could fail for network issues such as a server error or other
network failure. In such cases, the client should expect that we will not
be able to download that bundle for the process's lifetime. We may be able
to opportunistically download other bundles, but we will rely on the Git
protocol to get the objects if the bundles fail.

These failure conditions are not tested deeply (there are some tests from
earlier series that test the behavior, but there is room for improvement).

> Also - it seems like one of the assumptions here is that, if a bundle can't
> be downloaded & unbundled, no bundle with a higher creation token can be
> successfully unbundled ('download_bundle_to_file()' sets 'pop_or_push' to
> '1', which will cause the loop to ignore all higher-token bundles and return
> a nonzero value from the function).
>
> I don't think that assumption is necessarily true, though. Suppose you have
> a "base" bundle 100 and incremental bundles 101 and 102. 101 has all objects
> from a new branch A, and 102 has all objects from a newer branch B (not
> based on any objects in A). In this case, 102 could be unbundled even if 101
> is corrupted/can't be downloaded, but we'd run into issues if we store 102
> as the "latest unbundled creation token" (because it implies that 101 was
> unbundled).

You are correct. bundle-3 can be unbundled even if bundle-2 fails in the
test example above.

> Is there any benefit to trying to unbundle those higher bundles *without*
> advancing the "latest creation token"? E.g. in my example, unbundle 102 but
> store '100' as the latest creation token?

I will need to think more about this.

Generally, most repositories that care about this will not have independent
bundles because between every bundle creation step the default branch will
advance. (Of course, exceptions can still occur, such as over weekends.)
Thus, the latest bundle will have a required commit that only exists in the
previous bundle. This algorithm and its error conditions are then looking
for ways to recover when that is not the case.

When a bundle fails to download, my gut feeling is that it is unlikely that
it was completely independent of a bundle with higher creationToken. However,
we have already downloaded that bundle and it is a very low cost to attempt
an unbundling of it.

The tricky part is that we want to avoid downloading _all_ the bundles just
because one is failing to unbundle. If a failed download would cause the top
bundle from unbundling, we don't want to go through the whole list of bundles
even though they unbundle without issue. I'm thinking specifically about the
incremental fetch case, where we don't want to blow up to a full clone worth
of downloads.

This deserves a little more attention, so I'll think more on it and get
back to you.

>>  	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
>> -	git -C clone-list-http-2 cat-file --batch-check <oids
>> +	git -C clone-list-http-2 cat-file --batch-check <oids &&
>> +
>> +	for b in 1 2 3 4
>> +	do
>> +		test_bundle_downloaded bundle-$b.bundle trace-clone.txt ||
>> +			return 1
>> +	done
>
> If I understand correctly, these added conditions would have passed even if
> they were added when the test was initially created in patch 1, but they're
> added here to tie them to the implementation of the creationToken heuristic?
> Seems reasonable.

They probably should have been added in patch 1 to be clear that behavior
is not changing here.

>> +'
>> +
>> +test_expect_success 'clone bundle list (http, creationToken)' '
>
> This new test has the same name as the one above it - how does it differ
> from that one? Whatever the difference is, can that be noted somehow in the
> title or a comment?

The title should change, pointing out that the bundle list is truncated
and the rest of the clone is being fetched over the Git protocol. It will
be expanded with fetches later, I think, but it should be better motivated
in this patch, even if that is so.

>> +# Usage: test_bundle_downloaded <bundle-id> <trace-filename>
>> +test_bundle_downloaded () {
>> +	cat >pattern <<-EOF &&
>> +	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/$1.bundle"\]
>> +	EOF
>> +	grep -f pattern "$2"
>> +}
>
> This function is the same as the one created in 't5558'. Should it be moved
> to 'lib-bundle.sh' or 'test-lib.sh' to avoid duplicate code?

It's slightly different, but that is just because we are using the advertisement
and thus we never download a bundle-list and always download .bundle files. That
is not an important distinction and I expect to replace it with the
test_remote_https_urls() helper discussed in an earlier response.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/8] clone: set fetch.bundleURI if appropriate
  2023-01-19 19:42   ` Victoria Dye
@ 2023-01-20 15:42     ` Derrick Stolee
  0 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee @ 2023-01-20 15:42 UTC (permalink / raw)
  To: Victoria Dye, Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen

On 1/19/2023 2:42 PM, Victoria Dye wrote:
> Derrick Stolee via GitGitGadget wrote:
>> diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
>> index cd65d236b43..4f796218aab 100644
>> --- a/Documentation/config/fetch.txt
>> +++ b/Documentation/config/fetch.txt
>> @@ -96,3 +96,11 @@ fetch.writeCommitGraph::
>>  	merge and the write may take longer. Having an updated commit-graph
>>  	file helps performance of many Git commands, including `git merge-base`,
>>  	`git push -f`, and `git log --graph`. Defaults to false.
>> +
>> +fetch.bundleURI::
>> +	This value stores a URI for fetching Git object data from a bundle URI
>> +	before performing an incremental fetch from the origin Git server. If
>> +	the value is `<uri>` then running `git fetch <args>` is equivalent to
>> +	first running `git fetch --bundle-uri=<uri>` immediately before
>> +	`git fetch <args>`. See details of the `--bundle-uri` option in
>> +	linkgit:git-fetch[1].
>
> Since it's not mentioned from this or any other user-facing documentation
> (AFAICT), could you note that this value is set automatically by 'git clone'
> iff '--bundle-uri' is specified *and* 'bundle.heuristic' is set for the
> initially downloaded bundle list?

Can do.

> It would also be nice to make note of that behavior in the documentation of
> the '--bundle-uri' option in 'Documentation/git-clone.txt', since command
> documentation in general seems to be more popular/visible to users than
> config docs.

Yes. I also thought that I had updated this documentation to not refer
to 'git fetch --bundle-uri', which doesn't exist anymore since an earlier
RFC version. I'll be sure to update that, too.

>>  	strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
>> diff --git a/bundle-uri.c b/bundle-uri.c
>> index b30c85ba6f2..1dbbbb980eb 100644
>> --- a/bundle-uri.c
>> +++ b/bundle-uri.c
>> @@ -594,9 +594,10 @@ static int fetch_bundle_list_in_config_format(struct repository *r,
>>  	 * it advertises are expected to be bundles, not nested lists.
>>  	 * We can drop 'global_list' and 'depth'.
>>  	 */
>> -	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
>> +	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
>>  		result = fetch_bundles_by_token(r, &list_from_bundle);
>> -	else if ((result = download_bundle_list(r, &list_from_bundle,
>> +		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
>
> If the 'heuristic' field already existed and was being used to apply
> bundles, why wasn't 'global_list->heuristic' already being set? Before this
> patch, was the 'global_list->heuristic' field not accurately reflecting the
> heuristic type of a given bundle list?
>
> If so, I think it'd make sense to move this section to patch 4 [1], since
> that's when the heuristic is first applied to the bundle list.
>
> [1] https://lore.kernel.org/git/57c0174d3752fb61a05e0653de9d3057616ed16a.1673037405.git.gitgitgadget@gmail.com/

Can do.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 7/8] fetch: fetch from an external bundle URI
  2023-01-19 20:34   ` Victoria Dye
@ 2023-01-20 15:47     ` Derrick Stolee
  0 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee @ 2023-01-20 15:47 UTC (permalink / raw)
  To: Victoria Dye, Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen

On 1/19/2023 3:34 PM, Victoria Dye wrote:
> Derrick Stolee via GitGitGadget wrote:

>> +	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri) &&
>> +	    !starts_with(bundle_uri, "remote:")) {
> 
> Maybe a silly question, by why would the bundle URI start with 'remote:'
> (and why do we silently skip fetching from the URI in that case)?

Thanks for catching this. I originally was going to include fetching from
lists advertised by a Git remote, and use the same `fetch.bundleURI` config.
However, it makes more sense to make a `remote.<name>.bundles` config
instead, so I dropped that functionality from this series. I forgot to remove
this `remote:` case, but will do so in v2.

I've locally fixed the "objects" typo you pointed out, too.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 8/8] bundle-uri: store fetch.bundleCreationToken
  2023-01-19 22:24   ` Victoria Dye
@ 2023-01-20 15:53     ` Derrick Stolee
  0 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee @ 2023-01-20 15:53 UTC (permalink / raw)
  To: Victoria Dye, Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen

On 1/19/2023 5:24 PM, Victoria Dye wrote:
> Derrick Stolee via GitGitGadget wrote:

>> +	/*
>> +	 * If fetch.bundleCreationToken exists, parses to a uint64t, and
>> +	 * is not strictly smaller than the maximum creation token in the
>> +	 * bundle list, then do not download any bundles.
>> +	 */
>> +	if (!repo_config_get_value(r,
>> +				   "fetch.bundlecreationtoken",
>> +				   &creationTokenStr) &&
>> +	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
>> +	    sorted.items[0]->creationToken <= maxCreationToken) {
>> +		free(sorted.items);
>> +		return 0;
>> +	}
>
> And here, we exit if the cached creation token is greater than or equal to
> the highest advertised token. Overall, this seems pretty safe:
>
> - If the value is (somehow) manually updated to something larger than it
>   should be, objects will be fetched from the server that could have
>   otherwise come from a bundle. Not ideal, but no worse than a regular
>   fetch.
> - If the value is too small or invalid (i.e., not an unsigned integer),
>   we'll download the first bundle, unbundle it, then overwrite the invalid
>   'fetch.bundlecreationtoken' with a new valid one.
>
> The latter is self-correcting, but should the former be documented
> somewhere? For example, if someone changes which bundle server they're using
> with a repo, the creation token numbering scheme might be completely
> different. In that case, a user would probably want to "reset" the
> 'fetch.bundlecreationtoken' and start fresh with a new bundle list (even if
> the recommended method is "manually delete the config key from your repo").

I can update the config documentation to include this.


>> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
>> index 3f4d61a915c..0604d721f1b 100755
>> --- a/t/t5558-clone-bundle-uri.sh
>> +++ b/t/t5558-clone-bundle-uri.sh
>
> It isn't exactly related to this patch, but it looks like the second "clone
> bundle list (http, creationToken)" never received any updates to
> differentiate it from the first test with that name (I noted the duplicate
> name in patch 4 [2]). Is it just a leftover duplicate?
>
> [1] https://lore.kernel.org/git/ede340d1-bce4-0c1d-7afb-4874a67d1803@github.com/

I'll be sure to follow up and make these test changes be of higher value,
avoiding these confusions.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches
  2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2023-01-06 20:36 ` [PATCH 8/8] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
@ 2023-01-23 15:21 ` Derrick Stolee via GitGitGadget
  2023-01-23 15:21   ` [PATCH v2 01/10] bundle: optionally skip reachability walk Derrick Stolee via GitGitGadget
                     ` (11 more replies)
  8 siblings, 12 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee

This fifth part to the bundle URIs feature follows part IV (advertising via
protocol v2) which recently merged to 'master', so this series is based on
'master'.

This part introduces the concept of a heuristic that a bundle list can
advertise. The purpose of the heuristic is to hint to the Git client that
the bundles can be downloaded and unbundled in a certain order. In
particular, that order can assist with using the same bundle URI to download
new bundles from an updated bundle list. This allows bundle URIs to assist
with incremental fetches, not just initial clones.

The only planned heuristic is the "creationToken" heuristic where the bundle
list adds a 64-bit unsigned integer "creationToken" value to each bundle in
the list. Those values provide an ordering on the bundles implying that the
bundles can be unbundled in increasing creationToken order and at each point
the required commits for the ith bundle were provided by bundles with lower
creationTokens.

At clone time, the only difference implied by the creationToken order is
that the Git client does not need to guess at the order to apply the
bundles, but instead can use the creationToken order to apply them without
failure and retry. However, this presents an interesting benefit during
fetches: the Git client can check the bundle list and download bundles in
decreasing creationToken order until the required commits for these bundles
are present within the repository's object store. This prevents downloading
more bundle information than required.

The creationToken value is also a promise that the Git client will not need
to download a bundle if its creationToken is less than or equal to the
creationToken of a previously-downloaded bundle. This further improves the
performance during a fetch in that the client does not need to download any
bundles at all if it recognizes that the maximum creationToken is the same
(or smaller than) a previously-downloaded creationToken.

The creationToken concept is documented in the existing design document at
Documentation/technical/bundle-uri.txt, including suggested ways for bundle
providers to organize their bundle lists to take advantage of the heuristic.

This series formalizes the creationToken heuristic and the Git client logic
for understanding it. Further, for bundle lists provided by the git clone
--bundle-uri option, the Git client will recognize the heuristic as being
helpful for incremental fetches and store config values so that future git
fetch commands check the bundle list before communicating with any Git
remotes.

Note that this option does not integrate fetches with bundle lists
advertised via protocol v2. I spent some time working on this, but found the
implementation to be distinct enough that it merited its own attention in a
separate series. In particular, the configuration for indicating that a
fetch should check the bundle-uri protocol v2 command seemed best to be
located within a Git remote instead of a repository-global key such as is
being used for a static URI. Further, the timing of querying the bundle-uri
command during a git fetch command is significantly different and more
complicated than how it is used in git clone.


What Remains?
=============

Originally, I had planned on making this bundle URI work a 5-part series,
and this is part 5. Shouldn't we be done now?

There are two main things that should be done after this series, in any
order:

 * Teach git fetch to check a bundle list advertised by a remote over the
   bundle-uri protocol v2 command.
 * Add the bundle.<id>.filter option to allow advertising bundles and
   partial bundles side-by-side.

There is also room for expanding tests for more error conditions, or for
other tweaks that are not currently part of the design document. I do think
that after this series, the feature will be easier to work on different
parts in parallel.


Patch Outline
=============

 * (New in v2) Patch 1 adds a new VERIFY_BUNDLE_SKIP_REACHABLE flag for
   verify_bundle() which is called by unbundle(); this fixes a probable
   exposed by patch 10 where a bundle would fail to unbundle due to the "are
   the required commits reachable from refs?" check.
 * Patch 2 creates a test setup demonstrating a creationToken heuristic. At
   this point, the Git client ignores the heuristic and uses its ad-hoc
   strategy for ordering the bundles.
 * Patches 3 and 4 teach Git to parse the bundle.heuristic and
   bundle.<id>.creationToken keys in a bundle list.
 * Patch 5 teaches Git to download bundles using the creationToken order.
   This order uses a stack approach to start from the maximum creationToken
   and continue downloading the next bundle in the list until all bundles
   can successfully be unbundled. This is the algorithm required for
   incremental fetches, while initial clones could download in the opposite
   order. Since clones will download all bundles anyway, having a second
   code path just for clones seemed unnecessary.
 * Patch 6 teaches git clone --bundle-uri to set fetch.bundleURI when the
   advertised bundle list includs a heuristic that Git understands.
 * Patch 7 updates the design document to remove reference to a bundle.flag
   option that was previously going to indicate the list was designed for
   fetches, but the bundle.heuristic option already does that.
 * Patch 8 teaches git fetch to check fetch.bundleURI and download bundles
   from that static URI before connecting to remotes via the Git protocol.
 * Patch 9 introduces a new fetch.bundleCreationToken config value to store
   the maximum creationToken of downloaded bundles. This prevents
   downloading the latest bundle on every git fetch command, reducing waste.
 * (New in v2) Patch 10 adds new tests for interesting incremental fetch
   shapes. Along with other test edits in other patches, these revealed
   several issues that required improvement within this series. These tests
   also check extra cases around failed bundle downloads.


Updates in v2
=============

 * Patches 1 and 10 are new.
 * I started making the extra tests in patch 10 due to Victoria's concern
   around failed downloads. I extended the bundle list in a way that exposed
   other issues that are fixed in this version. Unfortunately, the test
   requires the full functionality of the entire series, so the tests are
   not isolated to where the code fixes are made. One thing that I noticed
   in the process is that some of the tests were using the local-clone trick
   to copy full object directories instead of copying only the requested
   object set. This was causing confusion in how the bundles were applying
   or failing to apply, so the tests are updated to use http whenever
   possible.
 * In Patch 2, I created a new test_remote_https_urls helper to get the full
   download list (in order). In this patch, the bundle download order is not
   well-defined, but is modified in later tests when it becomes
   well-defined.
 * In Patch 3, I updated the connection between config value and enum value
   to be an array of pairs instead of faking a hashmap-like interface that
   could be dangerous if the enum values were assigned incorrectly.
 * In Patch 5, the 'sorted' list and its type was renamed to be more
   descriptive. This also included updates to "append_bundle()" and
   "compare_creation_token_decreasing()" to be more descriptive. This had
   some side effects in Patch 8 due to the renames.
 * In Patch 5, I added the interesting bundle shape to the commit message to
   remind us of why the creationToken algorithm needs to be the way it is. I
   also removed the "stack" language in favor of discussing ranges of the
   sorted list. Some renames, such as "pop_or_push" is changed to
   "move_direction", resulted from this change of language.
 * The assignment of heuristic from the local list to global_list was moved
   into Patch 5.
 * In Patch 5, one of the tests removed bundle-2 because it allows a later
   test for git fetch to demonstrate the interesting behavior where bundle-4
   requires both bundle-2 and bundle-3.
 * In Patch 6, the fetch.bundleURI config is described differently,
   including dropping the defunct git fetch --bundle-uri reference and
   discussing that git clone --bundle-uri will set it automatically.
 * Patch 8 no longer refers to a config value starting with "remote:". It
   also expands a test that was previously not expanded in v1.
 * Patch 9 updates the documentation for fetch.bundleURI and
   fetch.bundleCreationToken to describe how the user should unset the
   latter if they edit the former.
 * Much of Patch 9's changes are due to context changes from the renames in
   Patch 5. However, it also adds the restriction that it will not attempt
   to download bundles unless their creationToken is strictly greater than
   the stored token. This ends up being critical to the failed download
   case, preventing an incremental fetch from downloading all bundles just
   because one bundle failed to download (and that case is tested in patch
   10).
 * Patch 10 adds significant testing, including several tests of failed
   bundle downloads in various cases.

Thanks,

 * Stolee

Derrick Stolee (10):
  bundle: optionally skip reachability walk
  t5558: add tests for creationToken heuristic
  bundle-uri: parse bundle.heuristic=creationToken
  bundle-uri: parse bundle.<id>.creationToken values
  bundle-uri: download in creationToken order
  clone: set fetch.bundleURI if appropriate
  bundle-uri: drop bundle.flag from design doc
  fetch: fetch from an external bundle URI
  bundle-uri: store fetch.bundleCreationToken
  bundle-uri: test missing bundles with heuristic

 Documentation/config/bundle.txt        |   7 +
 Documentation/config/fetch.txt         |  24 +
 Documentation/technical/bundle-uri.txt |   8 +-
 builtin/clone.c                        |   6 +-
 builtin/fetch.c                        |   7 +
 bundle-uri.c                           | 257 +++++++++-
 bundle-uri.h                           |  28 +-
 bundle.c                               |   3 +-
 bundle.h                               |   1 +
 t/t5558-clone-bundle-uri.sh            | 672 ++++++++++++++++++++++++-
 t/t5601-clone.sh                       |  46 ++
 t/t5750-bundle-uri-parse.sh            |  37 ++
 t/test-lib-functions.sh                |   8 +
 13 files changed, 1091 insertions(+), 13 deletions(-)


base-commit: 4dbebc36b0893f5094668ddea077d0e235560b16
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1454%2Fderrickstolee%2Fbundle-redo%2FcreationToken-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1454/derrickstolee/bundle-redo/creationToken-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1454

Range-diff vs v1:

  -:  ----------- >  1:  b3828725bc8 bundle: optionally skip reachability walk
  1:  39eed914878 !  2:  427aff4d5e5 t5558: add tests for creationToken heuristic
     @@ Commit message
          meantime, create tests that add creation tokens to the bundle list. For
          now, the Git client correctly ignores these unknown keys.
      
     +    Create a new test helper function, test_remote_https_urls, which filters
     +    GIT_TRACE2_EVENT output to extract a list of URLs passed to
     +    git-remote-https child processes. This can be used to verify the order
     +    of these requests as we implement the creationToken heuristic. For now,
     +    we need to sort the actual output since the current client does not have
     +    a well-defined order that it applies to the bundles.
     +
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## t/t5558-clone-bundle-uri.sh ##
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone HTTP bundle' '
     - 	test_config -C clone-http log.excludedecoration refs/bundle/
       '
       
     -+# usage: test_bundle_downloaded <bundle-name> <trace-file>
     -+test_bundle_downloaded () {
     -+	cat >pattern <<-EOF &&
     -+	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/$1"\]
     -+	EOF
     -+	grep -f pattern "$2"
     -+}
     -+
       test_expect_success 'clone bundle list (HTTP, no heuristic)' '
      +	test_when_finished rm -f trace*.txt &&
      +
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, no he
      -	git -C clone-list-http cat-file --batch-check <oids
      +	git -C clone-list-http cat-file --batch-check <oids &&
      +
     -+	for b in 1 2 3 4
     -+	do
     -+		test_bundle_downloaded bundle-$b.bundle trace-clone.txt ||
     -+			return 1
     -+	done
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-1.bundle
     ++	$HTTPD_URL/bundle-2.bundle
     ++	$HTTPD_URL/bundle-3.bundle
     ++	$HTTPD_URL/bundle-4.bundle
     ++	$HTTPD_URL/bundle-list
     ++	EOF
     ++
     ++	# Sort the list, since the order is not well-defined
     ++	# without a heuristic.
     ++	test_remote_https_urls <trace-clone.txt | sort >actual &&
     ++	test_cmp expect actual
       '
       
       test_expect_success 'clone bundle list (HTTP, any mode)' '
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, any m
       '
       
      +test_expect_success 'clone bundle list (http, creationToken)' '
     ++	test_when_finished rm -f trace*.txt &&
     ++
      +	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
      +	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
      +	[bundle]
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, any m
      +		creationToken = 4
      +	EOF
      +
     -+	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http-2 &&
     ++	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" git \
     ++		clone --bundle-uri="$HTTPD_URL/bundle-list" \
     ++		"$HTTPD_URL/smart/fetch.git" clone-list-http-2 &&
      +
      +	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     -+	git -C clone-list-http-2 cat-file --batch-check <oids
     ++	git -C clone-list-http-2 cat-file --batch-check <oids &&
     ++
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-1.bundle
     ++	$HTTPD_URL/bundle-2.bundle
     ++	$HTTPD_URL/bundle-3.bundle
     ++	$HTTPD_URL/bundle-4.bundle
     ++	$HTTPD_URL/bundle-list
     ++	EOF
     ++
     ++	# Since the creationToken heuristic is not yet understood by the
     ++	# client, the order cannot be verified at this moment. Sort the
     ++	# list for consistent results.
     ++	test_remote_https_urls <trace-clone.txt | sort >actual &&
     ++	test_cmp expect actual
      +'
      +
       # Do not add tests here unless they use the HTTP server, as they will
       # not run unless the HTTP dependencies exist.
       
     +
     + ## t/test-lib-functions.sh ##
     +@@ t/test-lib-functions.sh: test_region () {
     + 	return 0
     + }
     + 
     ++# Given a GIT_TRACE2_EVENT log over stdin, writes to stdout a list of URLs
     ++# sent to git-remote-https child processes.
     ++test_remote_https_urls() {
     ++	grep -e '"event":"child_start".*"argv":\["git-remote-https",".*"\]' |
     ++		sed -e 's/{"event":"child_start".*"argv":\["git-remote-https","//g' \
     ++		    -e 's/"\]}//g'
     ++}
     ++
     + # Print the destination of symlink(s) provided as arguments. Basically
     + # the same as the readlink command, but it's not available everywhere.
     + test_readlink () {
  2:  9007249b948 !  3:  f6f8197c9cc bundle-uri: parse bundle.heuristic=creationToken
     @@ Commit message
          bundle-uri' to print the heuristic value and verify that the parsing
          works correctly.
      
     +    As an extra precaution, create the internal 'heuristics' array to be a
     +    list of (enum, string) pairs so we can iterate through the array entries
     +    carefully, regardless of the enum values.
     +
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## Documentation/config/bundle.txt ##
     @@ bundle-uri.c
       #include "config.h"
       #include "remote.h"
       
     -+static const char *heuristics[] = {
     -+	[BUNDLE_HEURISTIC_NONE] = "",
     -+	[BUNDLE_HEURISTIC_CREATIONTOKEN] = "creationToken",
     ++static struct {
     ++	enum bundle_list_heuristic heuristic;
     ++	const char *name;
     ++} heuristics[BUNDLE_HEURISTIC__COUNT] = {
     ++	{ BUNDLE_HEURISTIC_NONE, ""},
     ++	{ BUNDLE_HEURISTIC_CREATIONTOKEN, "creationToken" },
      +};
      +
       static int compare_bundles(const void *hashmap_cmp_fn_data,
     @@ bundle-uri.c: void print_bundle_list(FILE *fp, struct bundle_list *list)
       	fprintf(fp, "\tversion = %d\n", list->version);
       	fprintf(fp, "\tmode = %s\n", mode);
       
     -+	if (list->heuristic)
     -+		printf("\theuristic = %s\n", heuristics[list->heuristic]);
     ++	if (list->heuristic) {
     ++		int i;
     ++		for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
     ++			if (heuristics[i].heuristic == list->heuristic) {
     ++				printf("\theuristic = %s\n",
     ++				       heuristics[list->heuristic].name);
     ++				break;
     ++			}
     ++		}
     ++	}
      +
       	for_all_bundles_in_list(list, summarize_bundle, fp);
       }
     @@ bundle-uri.c: static int bundle_list_update(const char *key, const char *value,
      +		if (!strcmp(subkey, "heuristic")) {
      +			int i;
      +			for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
     -+				if (!strcmp(value, heuristics[i])) {
     -+					list->heuristic = i;
     ++				if (heuristics[i].heuristic &&
     ++				    heuristics[i].name &&
     ++				    !strcmp(value, heuristics[i].name)) {
     ++					list->heuristic = heuristics[i].heuristic;
      +					return 0;
      +				}
      +			}
     @@ bundle-uri.h: enum bundle_list_mode {
      +	BUNDLE_HEURISTIC_CREATIONTOKEN,
      +
      +	/* Must be last. */
     -+	BUNDLE_HEURISTIC__COUNT,
     ++	BUNDLE_HEURISTIC__COUNT
      +};
      +
       /**
  3:  a1808f0b10c =  4:  12efa228d04 bundle-uri: parse bundle.<id>.creationToken values
  4:  57c0174d375 !  5:  7cfaa3c518c bundle-uri: download in creationToken order
     @@ Commit message
          strategy implemented here provides that short-circuit where the client
          downloads a minimal set of bundles.
      
     +    However, we are not satisfied by the naive approach of downloading
     +    bundles until one successfully unbundles, expecting the earlier bundles
     +    to successfully unbundle now. The example repository in t5558
     +    demonstrates this well:
     +
     +     ---------------- bundle-4
     +
     +           4
     +          / \
     +     ----|---|------- bundle-3
     +         |   |
     +         |   3
     +         |   |
     +     ----|---|------- bundle-2
     +         |   |
     +         2   |
     +         |   |
     +     ----|---|------- bundle-1
     +          \ /
     +           1
     +           |
     +     (previous commits)
     +
     +    In this repository, if we already have the objects for bundle-1 and then
     +    try to fetch from this list, the naive approach will fail. bundle-4
     +    requires both bundle-3 and bundle-2, though bundle-3 will successfully
     +    unbundle without bundle-2. Thus, the algorithm needs to keep this in
     +    mind.
     +
          A later implementation detail will store the maximum creationToken seen
          during such a bundle download, and the client will avoid downloading a
          bundle unless its creationToken is strictly greater than that stored
     @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
       	return 0;
       }
       
     -+struct sorted_bundle_list {
     ++struct bundles_for_sorting {
      +	struct remote_bundle_info **items;
      +	size_t alloc;
      +	size_t nr;
      +};
      +
     -+static int insert_bundle(struct remote_bundle_info *bundle, void *data)
     ++static int append_bundle(struct remote_bundle_info *bundle, void *data)
      +{
     -+	struct sorted_bundle_list *list = data;
     ++	struct bundles_for_sorting *list = data;
      +	list->items[list->nr++] = bundle;
      +	return 0;
      +}
      +
     -+static int compare_creation_token(const void *va, const void *vb)
     ++/**
     ++ * For use in QSORT() to get a list sorted by creationToken
     ++ * in decreasing order.
     ++ */
     ++static int compare_creation_token_decreasing(const void *va, const void *vb)
      +{
      +	const struct remote_bundle_info * const *a = va;
      +	const struct remote_bundle_info * const *b = vb;
     @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
      +				  struct bundle_list *list)
      +{
      +	int cur;
     -+	int pop_or_push = 0;
     ++	int move_direction = 0;
      +	struct bundle_list_context ctx = {
      +		.r = r,
      +		.list = list,
      +		.mode = list->mode,
      +	};
     -+	struct sorted_bundle_list sorted = {
     ++	struct bundles_for_sorting bundles = {
      +		.alloc = hashmap_get_size(&list->bundles),
      +	};
      +
     -+	ALLOC_ARRAY(sorted.items, sorted.alloc);
     ++	ALLOC_ARRAY(bundles.items, bundles.alloc);
      +
     -+	for_all_bundles_in_list(list, insert_bundle, &sorted);
     ++	for_all_bundles_in_list(list, append_bundle, &bundles);
      +
     -+	QSORT(sorted.items, sorted.nr, compare_creation_token);
     ++	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
      +
      +	/*
     -+	 * Use a stack-based approach to download the bundles and attempt
     -+	 * to unbundle them in decreasing order by creation token. If we
     -+	 * fail to unbundle (after a successful download) then move to the
     -+	 * next non-downloaded bundle (push to the stack) and attempt
     -+	 * downloading. Once we succeed in applying a bundle, move to the
     -+	 * previous unapplied bundle (pop the stack) and attempt to unbundle
     -+	 * it again.
     ++	 * Attempt to download and unbundle the minimum number of bundles by
     ++	 * creationToken in decreasing order. If we fail to unbundle (after
     ++	 * a successful download) then move to the next non-downloaded bundle
     ++	 * and attempt downloading. Once we succeed in applying a bundle,
     ++	 * move to the previous unapplied bundle and attempt to unbundle it
     ++	 * again.
      +	 *
      +	 * In the case of a fresh clone, we will likely download all of the
      +	 * bundles before successfully unbundling the oldest one, then the
     @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
      +	 * repo's object store.
      +	 */
      +	cur = 0;
     -+	while (cur >= 0 && cur < sorted.nr) {
     -+		struct remote_bundle_info *bundle = sorted.items[cur];
     ++	while (cur >= 0 && cur < bundles.nr) {
     ++		struct remote_bundle_info *bundle = bundles.items[cur];
      +		if (!bundle->file) {
     -+			/* Not downloaded yet. Try downloading. */
     -+			if (download_bundle_to_file(bundle, &ctx)) {
     -+				/* Failure. Push to the stack. */
     -+				pop_or_push = 1;
     ++			/*
     ++			 * Not downloaded yet. Try downloading.
     ++			 *
     ++			 * Note that bundle->file is non-NULL if a download
     ++			 * was attempted, even if it failed to download.
     ++			 */
     ++			if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
     ++				/* Mark as unbundled so we do not retry. */
     ++				bundle->unbundled = 1;
     ++
     ++				/* Try looking deeper in the list. */
     ++				move_direction = 1;
      +				goto stack_operation;
      +			}
      +
     @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
      +			 * unbundled. Try unbundling again.
      +			 */
      +			if (unbundle_from_file(ctx.r, bundle->file)) {
     -+				/* Failed to unbundle. Push to stack. */
     -+				pop_or_push = 1;
     ++				/* Try looking deeper in the list. */
     ++				move_direction = 1;
      +			} else {
     -+				/* Succeeded in unbundle. Pop stack. */
     -+				pop_or_push = -1;
     ++				/*
     ++				 * Succeeded in unbundle. Retry bundles
     ++				 * that previously failed to unbundle.
     ++				 */
     ++				move_direction = -1;
     ++				bundle->unbundled = 1;
      +			}
      +		}
      +
     @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
      +
      +stack_operation:
      +		/* Move in the specified direction and repeat. */
     -+		cur += pop_or_push;
     ++		cur += move_direction;
      +	}
      +
     -+	free(sorted.items);
     ++	free(bundles.items);
      +
      +	/*
      +	 * We succeed if the loop terminates because 'cur' drops below
     @@ bundle-uri.c: static int fetch_bundle_list_in_config_format(struct repository *r
      +	 * it advertises are expected to be bundles, not nested lists.
      +	 * We can drop 'global_list' and 'depth'.
      +	 */
     -+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
     ++	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
      +		result = fetch_bundles_by_token(r, &list_from_bundle);
     -+	else if ((result = download_bundle_list(r, &list_from_bundle,
     ++		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
     ++	} else if ((result = download_bundle_list(r, &list_from_bundle,
       					   global_list, depth)))
       		goto cleanup;
       
     @@ bundle-uri.c: int fetch_bundle_list(struct repository *r, struct bundle_list *li
       	for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
      
       ## t/t5558-clone-bundle-uri.sh ##
     -@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, any mode)' '
     - '
     - 
     - test_expect_success 'clone bundle list (http, creationToken)' '
     -+	test_when_finished rm -f trace*.txt &&
     -+
     - 	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
     - 	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
     - 	[bundle]
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (http, creationToken)' '
     - 		creationToken = 4
     - 	EOF
     + 	git -C clone-list-http-2 cat-file --batch-check <oids &&
       
     --	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http-2 &&
     -+	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
     -+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
     -+		clone-from clone-list-http-2 &&
     - 
     - 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     --	git -C clone-list-http-2 cat-file --batch-check <oids
     -+	git -C clone-list-http-2 cat-file --batch-check <oids &&
     -+
     -+	for b in 1 2 3 4
     -+	do
     -+		test_bundle_downloaded bundle-$b.bundle trace-clone.txt ||
     -+			return 1
     -+	done
     + 	cat >expect <<-EOF &&
     +-	$HTTPD_URL/bundle-1.bundle
     +-	$HTTPD_URL/bundle-2.bundle
     +-	$HTTPD_URL/bundle-3.bundle
     ++	$HTTPD_URL/bundle-list
     + 	$HTTPD_URL/bundle-4.bundle
     ++	$HTTPD_URL/bundle-3.bundle
     ++	$HTTPD_URL/bundle-2.bundle
     ++	$HTTPD_URL/bundle-1.bundle
     ++	EOF
     ++
     ++	test_remote_https_urls <trace-clone.txt >actual &&
     ++	test_cmp expect actual
      +'
      +
     -+test_expect_success 'clone bundle list (http, creationToken)' '
     ++test_expect_success 'clone incomplete bundle list (http, creationToken)' '
      +	test_when_finished rm -f trace*.txt &&
      +
      +	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (http, creat
      +	[bundle "bundle-1"]
      +		uri = bundle-1.bundle
      +		creationToken = 1
     -+
     -+	[bundle "bundle-2"]
     -+		uri = bundle-2.bundle
     -+		creationToken = 2
      +	EOF
      +
      +	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
      +	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
     -+		clone-from clone-token-http &&
     ++		--single-branch --branch=base --no-tags \
     ++		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
      +
     -+	test_bundle_downloaded bundle-1.bundle trace-clone.txt &&
     -+	test_bundle_downloaded bundle-2.bundle trace-clone.txt
     ++	cat >expect <<-EOF &&
     + 	$HTTPD_URL/bundle-list
     ++	$HTTPD_URL/bundle-1.bundle
     + 	EOF
     + 
     +-	# Since the creationToken heuristic is not yet understood by the
     +-	# client, the order cannot be verified at this moment. Sort the
     +-	# list for consistent results.
     +-	test_remote_https_urls <trace-clone.txt | sort >actual &&
     ++	test_remote_https_urls <trace-clone.txt >actual &&
     + 	test_cmp expect actual
       '
       
     - # Do not add tests here unless they use the HTTP server, as they will
      
       ## t/t5601-clone.sh ##
      @@ t/t5601-clone.sh: test_expect_success 'auto-discover multiple bundles from HTTP clone' '
       	grep -f pattern trace.txt
       '
       
     -+# Usage: test_bundle_downloaded <bundle-id> <trace-filename>
     -+test_bundle_downloaded () {
     -+	cat >pattern <<-EOF &&
     -+	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/$1.bundle"\]
     -+	EOF
     -+	grep -f pattern "$2"
     -+}
     -+
      +test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '
      +	test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
      +	test_when_finished rm -rf clone-heuristic trace*.txt &&
     @@ t/t5601-clone.sh: test_expect_success 'auto-discover multiple bundles from HTTP
      +		    -c transfer.bundleURI=true clone \
      +		"$HTTPD_URL/smart/repo4.git" clone-heuristic &&
      +
     -+	# We should fetch all bundles
     -+	for b in everything new newest
     -+	do
     -+		test_bundle_downloaded $b trace-clone.txt || return 1
     -+	done
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/newest.bundle
     ++	$HTTPD_URL/new.bundle
     ++	$HTTPD_URL/everything.bundle
     ++	EOF
     ++
     ++	# We should fetch all bundles in the expected order.
     ++	test_remote_https_urls <trace-clone.txt >actual &&
     ++	test_cmp expect actual
      +'
      +
       # DO NOT add non-httpd-specific tests here, because the last part of this
  5:  d9c6f50e4f2 !  6:  17c404c1b83 clone: set fetch.bundleURI if appropriate
     @@ Documentation/config/fetch.txt: fetch.writeCommitGraph::
       	`git push -f`, and `git log --graph`. Defaults to false.
      +
      +fetch.bundleURI::
     -+	This value stores a URI for fetching Git object data from a bundle URI
     -+	before performing an incremental fetch from the origin Git server. If
     -+	the value is `<uri>` then running `git fetch <args>` is equivalent to
     -+	first running `git fetch --bundle-uri=<uri>` immediately before
     -+	`git fetch <args>`. See details of the `--bundle-uri` option in
     -+	linkgit:git-fetch[1].
     ++	This value stores a URI for downloading Git object data from a bundle
     ++	URI before performing an incremental fetch from the origin Git server.
     ++	This is similar to how the `--bundle-uri` option behaves in
     ++	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
     ++	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
     ++	list that is organized for incremental fetches.
      
       ## builtin/clone.c ##
      @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
     @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
       	strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
      
       ## bundle-uri.c ##
     -@@ bundle-uri.c: static int fetch_bundle_list_in_config_format(struct repository *r,
     - 	 * it advertises are expected to be bundles, not nested lists.
     - 	 * We can drop 'global_list' and 'depth'.
     - 	 */
     --	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
     -+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
     - 		result = fetch_bundles_by_token(r, &list_from_bundle);
     --	else if ((result = download_bundle_list(r, &list_from_bundle,
     -+		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
     -+	} else if ((result = download_bundle_list(r, &list_from_bundle,
     - 					   global_list, depth)))
     - 		goto cleanup;
     - 
      @@ bundle-uri.c: static int unlink_bundle(struct remote_bundle_info *info, void *data)
       	return 0;
       }
     @@ bundle-uri.h: int bundle_uri_parse_config_format(const char *uri,
        * Given a bundle list that was already advertised (likely by the
      
       ## t/t5558-clone-bundle-uri.sh ##
     -@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (http, creationToken)' '
     - 	test_bundle_downloaded bundle-2.bundle trace-clone.txt
     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
     + 		--single-branch --branch=base --no-tags \
     + 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
     + 
     ++	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
     ++
     + 	cat >expect <<-EOF &&
     + 	$HTTPD_URL/bundle-list
     + 	$HTTPD_URL/bundle-1.bundle
     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
     + 	test_cmp expect actual
       '
       
      +test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (http, creat
      +
      +	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
      +
     -+	# The clone should copy two files: the list and bundle-1.
     -+	test_bundle_downloaded bundle-list trace-clone.txt &&
     -+	test_bundle_downloaded bundle-1.bundle trace-clone.txt &&
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-list
     ++	$HTTPD_URL/bundle-1.bundle
     ++	EOF
     ++
     ++	test_remote_https_urls <trace-clone.txt >actual &&
     ++	test_cmp expect actual &&
      +
      +	# only received base ref from bundle-1
      +	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
  6:  afcfd27a883 =  7:  d491070efed bundle-uri: drop bundle.flag from design doc
  7:  1627fc158b1 !  8:  59e57e04968 fetch: fetch from an external bundle URI
     @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
       	if (dry_run)
       		write_fetch_head = 0;
       
     -+	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri) &&
     -+	    !starts_with(bundle_uri, "remote:")) {
     ++	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri)) {
      +		if (fetch_bundle_uri(the_repository, bundle_uri, NULL))
      +			warning(_("failed to fetch bundles from '%s'"), bundle_uri);
      +	}
     @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
       			die(_("fetch --all does not take a repository argument"));
      
       ## t/t5558-clone-bundle-uri.sh ##
     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
     + 	EOF
     + 
     + 	test_remote_https_urls <trace-clone.txt >actual &&
     +-	test_cmp expect actual
     ++	test_cmp expect actual &&
     ++
     ++	# We now have only one bundle ref.
     ++	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
     ++	cat >expect <<-\EOF &&
     ++	refs/bundles/base
     ++	EOF
     ++	test_cmp expect refs &&
     ++
     ++	# Add remaining bundles, exercising the "deepening" strategy
     ++	# for downloading via the creationToken heurisitc.
     ++	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
     ++	[bundle "bundle-2"]
     ++		uri = bundle-2.bundle
     ++		creationToken = 2
     ++
     ++	[bundle "bundle-3"]
     ++		uri = bundle-3.bundle
     ++		creationToken = 3
     ++
     ++	[bundle "bundle-4"]
     ++		uri = bundle-4.bundle
     ++		creationToken = 4
     ++	EOF
     ++
     ++	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
     ++		git -C clone-token-http fetch origin --no-tags \
     ++		refs/heads/merge:refs/heads/merge &&
     ++
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-list
     ++	$HTTPD_URL/bundle-4.bundle
     ++	$HTTPD_URL/bundle-3.bundle
     ++	$HTTPD_URL/bundle-2.bundle
     ++	EOF
     ++
     ++	test_remote_https_urls <trace1.txt >actual &&
     ++	test_cmp expect actual &&
     ++
     ++	# We now have all bundle refs.
     ++	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
     ++
     ++	cat >expect <<-\EOF &&
     ++	refs/bundles/base
     ++	refs/bundles/left
     ++	refs/bundles/merge
     ++	refs/bundles/right
     ++	EOF
     ++	test_cmp expect refs
     + '
     + 
     + test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
       	cat >expect <<-\EOF &&
       	refs/bundles/base
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
      +		refs/heads/left:refs/heads/left \
      +		refs/heads/right:refs/heads/right &&
      +
     -+	# This fetch should copy two files: the list and bundle-2.
     -+	test_bundle_downloaded bundle-list trace1.txt &&
     -+	test_bundle_downloaded bundle-2.bundle trace1.txt &&
     -+	! test_bundle_downloaded bundle-1.bundle trace1.txt &&
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-list
     ++	$HTTPD_URL/bundle-2.bundle
     ++	EOF
     ++
     ++	test_remote_https_urls <trace1.txt >actual &&
     ++	test_cmp expect actual &&
      +
      +	# received left from bundle-2
      +	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
      +		creationToken = 4
      +	EOF
      +
     -+	# This fetch should skip bundle-3.bundle, since its objets are
     ++	# This fetch should skip bundle-3.bundle, since its objects are
      +	# already local (we have the requisite commits for bundle-4.bundle).
      +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
      +		git -C fetch-http-4 fetch origin --no-tags \
      +		refs/heads/merge:refs/heads/merge &&
      +
     -+	# This fetch should copy three files: the list, bundle-3, and bundle-4.
     -+	test_bundle_downloaded bundle-list trace2.txt &&
     -+	test_bundle_downloaded bundle-4.bundle trace2.txt &&
     -+	! test_bundle_downloaded bundle-1.bundle trace2.txt &&
     -+	! test_bundle_downloaded bundle-2.bundle trace2.txt &&
     -+	! test_bundle_downloaded bundle-3.bundle trace2.txt &&
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-list
     ++	$HTTPD_URL/bundle-4.bundle
     ++	EOF
     ++
     ++	test_remote_https_urls <trace2.txt >actual &&
     ++	test_cmp expect actual &&
      +
      +	# received merge ref from bundle-4, but right is missing
      +	# because we did not download bundle-3.
  8:  51f210ddeb4 !  9:  6a1504b1c3a bundle-uri: store fetch.bundleCreationToken
     @@ Commit message
          When checking the same bundle list twice, this strategy requires
          downloading the bundle with the maximum creationToken again, which is
          wasteful. The creationToken heuristic promises that the client will not
     -    have a use for that bundle if its creationToken value is the at most the
     +    have a use for that bundle if its creationToken value is at most the
          previous creationToken value.
      
          To prevent these wasteful downloads, create a fetch.bundleCreationToken
     @@ Commit message
      
       ## Documentation/config/fetch.txt ##
      @@ Documentation/config/fetch.txt: fetch.bundleURI::
     - 	first running `git fetch --bundle-uri=<uri>` immediately before
     - 	`git fetch <args>`. See details of the `--bundle-uri` option in
     - 	linkgit:git-fetch[1].
     + 	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
     + 	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
     + 	list that is organized for incremental fetches.
     +++
     ++If you modify this value and your repository has a `fetch.bundleCreationToken`
     ++value, then remove that `fetch.bundleCreationToken` value before fetching from
     ++the new bundle URI.
      +
      +fetch.bundleCreationToken::
      +	When using `fetch.bundleURI` to fetch incrementally from a bundle
     @@ Documentation/config/fetch.txt: fetch.bundleURI::
      +	This value is used to prevent downloading bundles in the future
      +	if the advertised `creationToken` is not strictly larger than this
      +	value.
     +++
     ++The creation token values are chosen by the provider serving the specific
     ++bundle URI. If you modify the URI at `fetch.bundleURI`, then be sure to
     ++remove the value for the `fetch.bundleCreationToken` value before fetching.
      
       ## bundle-uri.c ##
      @@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
       {
       	int cur;
     - 	int pop_or_push = 0;
     + 	int move_direction = 0;
      +	const char *creationTokenStr;
     -+	uint64_t maxCreationToken;
     ++	uint64_t maxCreationToken = 0, newMaxCreationToken = 0;
       	struct bundle_list_context ctx = {
       		.r = r,
       		.list = list,
      @@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
       
     - 	for_all_bundles_in_list(list, insert_bundle, &sorted);
     + 	for_all_bundles_in_list(list, append_bundle, &bundles);
       
     -+	if (!sorted.nr) {
     -+		free(sorted.items);
     ++	if (!bundles.nr) {
     ++		free(bundles.items);
      +		return 0;
      +	}
      +
     - 	QSORT(sorted.items, sorted.nr, compare_creation_token);
     + 	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
       
      +	/*
      +	 * If fetch.bundleCreationToken exists, parses to a uint64t, and
     @@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
      +				   "fetch.bundlecreationtoken",
      +				   &creationTokenStr) &&
      +	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
     -+	    sorted.items[0]->creationToken <= maxCreationToken) {
     -+		free(sorted.items);
     ++	    bundles.items[0]->creationToken <= maxCreationToken) {
     ++		free(bundles.items);
      +		return 0;
      +	}
      +
       	/*
     - 	 * Use a stack-based approach to download the bundles and attempt
     - 	 * to unbundle them in decreasing order by creation token. If we
     + 	 * Attempt to download and unbundle the minimum number of bundles by
     + 	 * creationToken in decreasing order. If we fail to unbundle (after
     +@@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
     + 	cur = 0;
     + 	while (cur >= 0 && cur < bundles.nr) {
     + 		struct remote_bundle_info *bundle = bundles.items[cur];
     ++
     ++		/*
     ++		 * If we need to dig into bundles below the previous
     ++		 * creation token value, then likely we are in an erroneous
     ++		 * state due to missing or invalid bundles. Halt the process
     ++		 * instead of continuing to download extra data.
     ++		 */
     ++		if (bundle->creationToken <= maxCreationToken)
     ++			break;
     ++
     + 		if (!bundle->file) {
     + 			/*
     + 			 * Not downloaded yet. Try downloading.
     +@@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
     + 				 */
     + 				move_direction = -1;
     + 				bundle->unbundled = 1;
     ++
     ++				if (bundle->creationToken > newMaxCreationToken)
     ++					newMaxCreationToken = bundle->creationToken;
     + 			}
     + 		}
     + 
      @@ bundle-uri.c: stack_operation:
     - 		cur += pop_or_push;
     + 		cur += move_direction;
       	}
       
     --	free(sorted.items);
     +-	free(bundles.items);
      -
       	/*
       	 * We succeed if the loop terminates because 'cur' drops below
     @@ bundle-uri.c: stack_operation:
       	 */
      +	if (cur < 0) {
      +		struct strbuf value = STRBUF_INIT;
     -+		strbuf_addf(&value, "%"PRIu64"", sorted.items[0]->creationToken);
     ++		strbuf_addf(&value, "%"PRIu64"", newMaxCreationToken);
      +		if (repo_config_set_multivar_gently(ctx.r,
      +						    "fetch.bundleCreationToken",
      +						    value.buf, NULL, 0))
     @@ bundle-uri.c: stack_operation:
      +		strbuf_release(&value);
      +	}
      +
     -+	free(sorted.items);
     ++	free(bundles.items);
       	return cur >= 0;
       }
       
      
       ## t/t5558-clone-bundle-uri.sh ##
     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
     + 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
     + 
     + 	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
     ++	test_cmp_config -C clone-token-http 1 fetch.bundlecreationtoken &&
     + 
     + 	cat >expect <<-EOF &&
     + 	$HTTPD_URL/bundle-list
     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
     + 	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
     + 		git -C clone-token-http fetch origin --no-tags \
     + 		refs/heads/merge:refs/heads/merge &&
     ++	test_cmp_config -C clone-token-http 4 fetch.bundlecreationtoken &&
     + 
     + 	cat >expect <<-EOF &&
     + 	$HTTPD_URL/bundle-list
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
       		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
       
       	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
      +	test_cmp_config -C fetch-http-4 1 fetch.bundlecreationtoken &&
       
     - 	# The clone should copy two files: the list and bundle-1.
     - 	test_bundle_downloaded bundle-list trace-clone.txt &&
     + 	cat >expect <<-EOF &&
     + 	$HTTPD_URL/bundle-list
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
     + 		git -C fetch-http-4 fetch origin --no-tags \
       		refs/heads/left:refs/heads/left \
       		refs/heads/right:refs/heads/right &&
     - 
      +	test_cmp_config -C fetch-http-4 2 fetch.bundlecreationtoken &&
     -+
     - 	# This fetch should copy two files: the list and bundle-2.
     - 	test_bundle_downloaded bundle-list trace1.txt &&
     - 	test_bundle_downloaded bundle-2.bundle trace1.txt &&
     + 
     + 	cat >expect <<-EOF &&
     + 	$HTTPD_URL/bundle-list
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
       	EOF
       	test_cmp expect refs &&
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
      +		git -C fetch-http-4 fetch origin --no-tags \
      +		refs/heads/left:refs/heads/left \
      +		refs/heads/right:refs/heads/right &&
     -+	test_bundle_downloaded bundle-list trace1b.txt &&
     -+	! test_bundle_downloaded bundle-1.bundle trace1b.txt &&
     -+	! test_bundle_downloaded bundle-2.bundle trace1b.txt &&
     ++
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-list
     ++	EOF
     ++	test_remote_https_urls <trace1b.txt >actual &&
     ++	test_cmp expect actual &&
      +
       	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
       	[bundle "bundle-3"]
       		uri = bundle-3.bundle
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
     + 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
       		git -C fetch-http-4 fetch origin --no-tags \
       		refs/heads/merge:refs/heads/merge &&
     - 
      +	test_cmp_config -C fetch-http-4 4 fetch.bundlecreationtoken &&
     -+
     - 	# This fetch should copy three files: the list, bundle-3, and bundle-4.
     - 	test_bundle_downloaded bundle-list trace2.txt &&
     - 	test_bundle_downloaded bundle-4.bundle trace2.txt &&
     + 
     + 	cat >expect <<-EOF &&
     + 	$HTTPD_URL/bundle-list
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
       	refs/bundles/left
       	refs/bundles/merge
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
      +	# No-op fetch
      +	GIT_TRACE2_EVENT="$(pwd)/trace2b.txt" \
      +		git -C fetch-http-4 fetch origin &&
     -+	test_bundle_downloaded bundle-list trace2b.txt &&
     -+	! test_bundle_downloaded bundle-1.bundle trace2b.txt &&
     -+	! test_bundle_downloaded bundle-2.bundle trace2b.txt &&
     -+	! test_bundle_downloaded bundle-3.bundle trace2b.txt &&
     -+	! test_bundle_downloaded bundle-4.bundle trace2b.txt
     ++
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-list
     ++	EOF
     ++	test_remote_https_urls <trace2b.txt >actual &&
     ++	test_cmp expect actual
       '
       
       # Do not add tests here unless they use the HTTP server, as they will
  -:  ----------- > 10:  676522615ad bundle-uri: test missing bundles with heuristic

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH v2 01/10] bundle: optionally skip reachability walk
  2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
@ 2023-01-23 15:21   ` Derrick Stolee via GitGitGadget
  2023-01-23 18:03     ` Junio C Hamano
  2023-01-23 15:21   ` [PATCH v2 02/10] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
                     ` (10 subsequent siblings)
  11 siblings, 1 reply; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When unbundling a bundle, the verify_bundle() method checks two things
with regards to the prerequisite commits:

 1. Those commits are in the object store, and
 2. Those commits are reachable from refs.

During testing of the bundle URI feature, where multiple bundles are
unbundled in the same process, the ref store did not appear to be
refreshing with the new refs/bundles/* references added within that
process. This caused the second half -- the reachability walk -- report
that some commits were not present, despite actually being present.

One way to attempt to fix this would be to create a way to force-refresh
the ref state. That would correct this for these cases where the
refs/bundles/* references have been updated. However, this still is an
expensive operation in a repository with many references.

Instead, optionally allow callers to skip this portion by instead just
checking for presence within the object store. Use this when unbundling
in bundle-uri.c.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 8 +++++++-
 bundle.c     | 3 ++-
 bundle.h     | 1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 36268dda172..2f079f713cf 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -322,9 +322,15 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	 * Skip the reachability walk here, since we will be adding
 	 * a reachable ref pointing to the new tips, which will reach
 	 * the prerequisite commits.
+	 *
+	 * Since multiple iterations of unbundle_from_file() can create
+	 * new commits in the object store that are not reachable from
+	 * the current cached state of the ref store, skip the reachability
+	 * walk and move forward as long as the objects are present in the
+	 * object store.
 	 */
 	if ((result = unbundle(r, &header, bundle_fd, NULL,
-			       VERIFY_BUNDLE_QUIET)))
+			       VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_SKIP_REACHABLE)))
 		return 1;
 
 	/*
diff --git a/bundle.c b/bundle.c
index 4ef7256aa11..b51974f0806 100644
--- a/bundle.c
+++ b/bundle.c
@@ -223,7 +223,8 @@ int verify_bundle(struct repository *r,
 			error("%s", message);
 		error("%s %s", oid_to_hex(oid), name);
 	}
-	if (revs.pending.nr != p->nr)
+	if (revs.pending.nr != p->nr ||
+	    (flags & VERIFY_BUNDLE_SKIP_REACHABLE))
 		goto cleanup;
 	req_nr = revs.pending.nr;
 	setup_revisions(2, argv, &revs, NULL);
diff --git a/bundle.h b/bundle.h
index 9f2bd733a6a..24c30e5f74a 100644
--- a/bundle.h
+++ b/bundle.h
@@ -34,6 +34,7 @@ int create_bundle(struct repository *r, const char *path,
 enum verify_bundle_flags {
 	VERIFY_BUNDLE_VERBOSE = (1 << 0),
 	VERIFY_BUNDLE_QUIET = (1 << 1),
+	VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 2),
 };
 
 int verify_bundle(struct repository *r, struct bundle_header *header,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 02/10] t5558: add tests for creationToken heuristic
  2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
  2023-01-23 15:21   ` [PATCH v2 01/10] bundle: optionally skip reachability walk Derrick Stolee via GitGitGadget
@ 2023-01-23 15:21   ` Derrick Stolee via GitGitGadget
  2023-01-27 19:15     ` Victoria Dye
  2023-01-23 15:21   ` [PATCH v2 03/10] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
                     ` (9 subsequent siblings)
  11 siblings, 1 reply; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

As documented in the bundle URI design doc in 2da14fad8fe (docs:
document bundle URI standard, 2022-08-09), the 'creationToken' member of
a bundle URI allows a bundle provider to specify a total order on the
bundles.

Future changes will allow the Git client to understand these members and
modify its behavior around downloading the bundles in that order. In the
meantime, create tests that add creation tokens to the bundle list. For
now, the Git client correctly ignores these unknown keys.

Create a new test helper function, test_remote_https_urls, which filters
GIT_TRACE2_EVENT output to extract a list of URLs passed to
git-remote-https child processes. This can be used to verify the order
of these requests as we implement the creationToken heuristic. For now,
we need to sort the actual output since the current client does not have
a well-defined order that it applies to the bundles.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t5558-clone-bundle-uri.sh | 69 +++++++++++++++++++++++++++++++++++--
 t/test-lib-functions.sh     |  8 +++++
 2 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9155f31fa2c..474432c8ace 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -285,6 +285,8 @@ test_expect_success 'clone HTTP bundle' '
 '
 
 test_expect_success 'clone bundle list (HTTP, no heuristic)' '
+	test_when_finished rm -f trace*.txt &&
+
 	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
 	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle]
@@ -304,12 +306,26 @@ test_expect_success 'clone bundle list (HTTP, no heuristic)' '
 		uri = $HTTPD_URL/bundle-4.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+		git clone --bundle-uri="$HTTPD_URL/bundle-list" \
 		clone-from clone-list-http  2>err &&
 	! grep "Repository lacks these prerequisite commits" err &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
-	git -C clone-list-http cat-file --batch-check <oids
+	git -C clone-list-http cat-file --batch-check <oids &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-1.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-list
+	EOF
+
+	# Sort the list, since the order is not well-defined
+	# without a heuristic.
+	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'clone bundle list (HTTP, any mode)' '
@@ -350,6 +366,55 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'clone bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" git \
+		clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" clone-list-http-2 &&
+
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-list-http-2 cat-file --batch-check <oids &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-1.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-list
+	EOF
+
+	# Since the creationToken heuristic is not yet understood by the
+	# client, the order cannot be verified at this moment. Sort the
+	# list for consistent results.
+	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_cmp expect actual
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index f036c4d3003..ace542f4226 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1833,6 +1833,14 @@ test_region () {
 	return 0
 }
 
+# Given a GIT_TRACE2_EVENT log over stdin, writes to stdout a list of URLs
+# sent to git-remote-https child processes.
+test_remote_https_urls() {
+	grep -e '"event":"child_start".*"argv":\["git-remote-https",".*"\]' |
+		sed -e 's/{"event":"child_start".*"argv":\["git-remote-https","//g' \
+		    -e 's/"\]}//g'
+}
+
 # Print the destination of symlink(s) provided as arguments. Basically
 # the same as the readlink command, but it's not available everywhere.
 test_readlink () {
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 03/10] bundle-uri: parse bundle.heuristic=creationToken
  2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
  2023-01-23 15:21   ` [PATCH v2 01/10] bundle: optionally skip reachability walk Derrick Stolee via GitGitGadget
  2023-01-23 15:21   ` [PATCH v2 02/10] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
@ 2023-01-23 15:21   ` Derrick Stolee via GitGitGadget
  2023-01-23 15:21   ` [PATCH v2 04/10] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The bundle.heuristic value communicates that the bundle list is
organized to make use of the bundle.<id>.creationToken values that may
be provided in the bundle list. Those values will create a total order
on the bundles, allowing the Git client to download them in a specific
order and even remember previously-downloaded bundles by storing the
maximum creation token value.

Before implementing any logic that parses or uses the
bundle.<id>.creationToken values, teach Git to parse the
bundle.heuristic value from a bundle list. We can use 'test-tool
bundle-uri' to print the heuristic value and verify that the parsing
works correctly.

As an extra precaution, create the internal 'heuristics' array to be a
list of (enum, string) pairs so we can iterate through the array entries
carefully, regardless of the enum values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/bundle.txt |  7 +++++++
 bundle-uri.c                    | 34 +++++++++++++++++++++++++++++++++
 bundle-uri.h                    | 14 ++++++++++++++
 t/t5750-bundle-uri-parse.sh     | 19 ++++++++++++++++++
 4 files changed, 74 insertions(+)

diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
index daa21eb674a..3faae386853 100644
--- a/Documentation/config/bundle.txt
+++ b/Documentation/config/bundle.txt
@@ -15,6 +15,13 @@ bundle.mode::
 	complete understanding of the bundled information (`all`) or if any one
 	of the listed bundle URIs is sufficient (`any`).
 
+bundle.heuristic::
+	If this string-valued key exists, then the bundle list is designed to
+	work well with incremental `git fetch` commands. The heuristic signals
+	that there are additional keys available for each bundle that help
+	determine which subset of bundles the client should download. The
+	only value currently understood is `creationToken`.
+
 bundle.<id>.*::
 	The `bundle.<id>.*` keys are used to describe a single item in the
 	bundle list, grouped under `<id>` for identification purposes.
diff --git a/bundle-uri.c b/bundle-uri.c
index 2f079f713cf..0d64b1d84ba 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -9,6 +9,14 @@
 #include "config.h"
 #include "remote.h"
 
+static struct {
+	enum bundle_list_heuristic heuristic;
+	const char *name;
+} heuristics[BUNDLE_HEURISTIC__COUNT] = {
+	{ BUNDLE_HEURISTIC_NONE, ""},
+	{ BUNDLE_HEURISTIC_CREATIONTOKEN, "creationToken" },
+};
+
 static int compare_bundles(const void *hashmap_cmp_fn_data,
 			   const struct hashmap_entry *he1,
 			   const struct hashmap_entry *he2,
@@ -100,6 +108,17 @@ void print_bundle_list(FILE *fp, struct bundle_list *list)
 	fprintf(fp, "\tversion = %d\n", list->version);
 	fprintf(fp, "\tmode = %s\n", mode);
 
+	if (list->heuristic) {
+		int i;
+		for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
+			if (heuristics[i].heuristic == list->heuristic) {
+				printf("\theuristic = %s\n",
+				       heuristics[list->heuristic].name);
+				break;
+			}
+		}
+	}
+
 	for_all_bundles_in_list(list, summarize_bundle, fp);
 }
 
@@ -142,6 +161,21 @@ static int bundle_list_update(const char *key, const char *value,
 			return 0;
 		}
 
+		if (!strcmp(subkey, "heuristic")) {
+			int i;
+			for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
+				if (heuristics[i].heuristic &&
+				    heuristics[i].name &&
+				    !strcmp(value, heuristics[i].name)) {
+					list->heuristic = heuristics[i].heuristic;
+					return 0;
+				}
+			}
+
+			/* Ignore unknown heuristics. */
+			return 0;
+		}
+
 		/* Ignore other unknown global keys. */
 		return 0;
 	}
diff --git a/bundle-uri.h b/bundle-uri.h
index d5e89f1671c..2e44a50a90b 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -52,6 +52,14 @@ enum bundle_list_mode {
 	BUNDLE_MODE_ANY
 };
 
+enum bundle_list_heuristic {
+	BUNDLE_HEURISTIC_NONE = 0,
+	BUNDLE_HEURISTIC_CREATIONTOKEN,
+
+	/* Must be last. */
+	BUNDLE_HEURISTIC__COUNT
+};
+
 /**
  * A bundle_list contains an unordered set of remote_bundle_info structs,
  * as well as information about the bundle listing, such as version and
@@ -75,6 +83,12 @@ struct bundle_list {
 	 * advertised by the bundle list at that location.
 	 */
 	char *baseURI;
+
+	/**
+	 * A list can have a heuristic, which helps reduce the number of
+	 * downloaded bundles.
+	 */
+	enum bundle_list_heuristic heuristic;
 };
 
 void init_bundle_list(struct bundle_list *list);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 7b4f930e532..6fc92a9c0d4 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -250,4 +250,23 @@ test_expect_success 'parse config format edge cases: empty key or value' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 04/10] bundle-uri: parse bundle.<id>.creationToken values
  2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2023-01-23 15:21   ` [PATCH v2 03/10] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
@ 2023-01-23 15:21   ` Derrick Stolee via GitGitGadget
  2023-01-23 15:21   ` [PATCH v2 05/10] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change taught Git to parse the bundle.heuristic value,
especially when its value is "creationToken". Now, teach Git to parse
the bundle.<id>.creationToken values on each bundle in a bundle list.

Before implementing any logic based on creationToken values for the
creationToken heuristic, parse and print these values for testing
purposes.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 10 ++++++++++
 bundle-uri.h                |  6 ++++++
 t/t5750-bundle-uri-parse.sh | 18 ++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 0d64b1d84ba..f46ab5c1743 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -83,6 +83,9 @@ static int summarize_bundle(struct remote_bundle_info *info, void *data)
 	FILE *fp = data;
 	fprintf(fp, "[bundle \"%s\"]\n", info->id);
 	fprintf(fp, "\turi = %s\n", info->uri);
+
+	if (info->creationToken)
+		fprintf(fp, "\tcreationToken = %"PRIu64"\n", info->creationToken);
 	return 0;
 }
 
@@ -203,6 +206,13 @@ static int bundle_list_update(const char *key, const char *value,
 		return 0;
 	}
 
+	if (!strcmp(subkey, "creationtoken")) {
+		if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
+			warning(_("could not parse bundle list key %s with value '%s'"),
+				"creationToken", value);
+		return 0;
+	}
+
 	/*
 	 * At this point, we ignore any information that we don't
 	 * understand, assuming it to be hints for a heuristic the client
diff --git a/bundle-uri.h b/bundle-uri.h
index 2e44a50a90b..ef32840bfa6 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -42,6 +42,12 @@ struct remote_bundle_info {
 	 * this boolean is true.
 	 */
 	unsigned unbundled:1;
+
+	/**
+	 * If the bundle is part of a list with the creationToken
+	 * heuristic, then we use this member for sorting the bundles.
+	 */
+	uint64_t creationToken;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { 0 }
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 6fc92a9c0d4..81bdf58b944 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -258,10 +258,13 @@ test_expect_success 'parse config format: creationToken heuristic' '
 		heuristic = creationToken
 	[bundle "one"]
 		uri = http://example.com/bundle.bdl
+		creationToken = 123456
 	[bundle "two"]
 		uri = https://example.com/bundle.bdl
+		creationToken = 12345678901234567890
 	[bundle "three"]
 		uri = file:///usr/share/git/bundle.bdl
+		creationToken = 1
 	EOF
 
 	test-tool bundle-uri parse-config expect >actual 2>err &&
@@ -269,4 +272,19 @@ test_expect_success 'parse config format: creationToken heuristic' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format edge cases: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+		creationToken = bogus
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	grep "could not parse bundle list key creationToken with value '\''bogus'\''" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 05/10] bundle-uri: download in creationToken order
  2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2023-01-23 15:21   ` [PATCH v2 04/10] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
@ 2023-01-23 15:21   ` Derrick Stolee via GitGitGadget
  2023-01-27 19:17     ` Victoria Dye
  2023-01-23 15:21   ` [PATCH v2 06/10] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
                     ` (6 subsequent siblings)
  11 siblings, 1 reply; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised.

The bundles in the list are sorted by their advertised creationToken
values, then downloaded in decreasing order. This avoids the previous
strategy of downloading bundles in an arbitrary order and attempting
to apply them (likely failing in the case of required commits) until
discovering the order through attempted unbundling.

During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. The cost of testing an unbundle is quite low, and instead
the chosen order is optimizing for a future bundle download during a
'git fetch' operation with a non-empty object store.

Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store. The next time it attempts
to download from the bundle list, it makes most sense to download only
the most-recent bundles until all tips successfully unbundle. The
strategy implemented here provides that short-circuit where the client
downloads a minimal set of bundles.

However, we are not satisfied by the naive approach of downloading
bundles until one successfully unbundles, expecting the earlier bundles
to successfully unbundle now. The example repository in t5558
demonstrates this well:

 ---------------- bundle-4

       4
      / \
 ----|---|------- bundle-3
     |   |
     |   3
     |   |
 ----|---|------- bundle-2
     |   |
     2   |
     |   |
 ----|---|------- bundle-1
      \ /
       1
       |
 (previous commits)

In this repository, if we already have the objects for bundle-1 and then
try to fetch from this list, the naive approach will fail. bundle-4
requires both bundle-3 and bundle-2, though bundle-3 will successfully
unbundle without bundle-2. Thus, the algorithm needs to keep this in
mind.

A later implementation detail will store the maximum creationToken seen
during such a bundle download, and the client will avoid downloading a
bundle unless its creationToken is strictly greater than that stored
value. For now, if the client seeks to download from an identical
bundle list since its previous download, it will download the
most-recent bundle then stop since its required commits are already in
the object store.

Add tests that exercise this behavior, but we will expand upon these
tests when incremental downloads during 'git fetch' make use of
creationToken values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 156 +++++++++++++++++++++++++++++++++++-
 t/t5558-clone-bundle-uri.sh |  40 +++++++--
 t/t5601-clone.sh            |  46 +++++++++++
 3 files changed, 233 insertions(+), 9 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index f46ab5c1743..39acd856fb9 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -453,6 +453,139 @@ static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data
 	return 0;
 }
 
+struct bundles_for_sorting {
+	struct remote_bundle_info **items;
+	size_t alloc;
+	size_t nr;
+};
+
+static int append_bundle(struct remote_bundle_info *bundle, void *data)
+{
+	struct bundles_for_sorting *list = data;
+	list->items[list->nr++] = bundle;
+	return 0;
+}
+
+/**
+ * For use in QSORT() to get a list sorted by creationToken
+ * in decreasing order.
+ */
+static int compare_creation_token_decreasing(const void *va, const void *vb)
+{
+	const struct remote_bundle_info * const *a = va;
+	const struct remote_bundle_info * const *b = vb;
+
+	if ((*a)->creationToken > (*b)->creationToken)
+		return -1;
+	if ((*a)->creationToken < (*b)->creationToken)
+		return 1;
+	return 0;
+}
+
+static int fetch_bundles_by_token(struct repository *r,
+				  struct bundle_list *list)
+{
+	int cur;
+	int move_direction = 0;
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = list,
+		.mode = list->mode,
+	};
+	struct bundles_for_sorting bundles = {
+		.alloc = hashmap_get_size(&list->bundles),
+	};
+
+	ALLOC_ARRAY(bundles.items, bundles.alloc);
+
+	for_all_bundles_in_list(list, append_bundle, &bundles);
+
+	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
+
+	/*
+	 * Attempt to download and unbundle the minimum number of bundles by
+	 * creationToken in decreasing order. If we fail to unbundle (after
+	 * a successful download) then move to the next non-downloaded bundle
+	 * and attempt downloading. Once we succeed in applying a bundle,
+	 * move to the previous unapplied bundle and attempt to unbundle it
+	 * again.
+	 *
+	 * In the case of a fresh clone, we will likely download all of the
+	 * bundles before successfully unbundling the oldest one, then the
+	 * rest of the bundles unbundle successfully in increasing order
+	 * of creationToken.
+	 *
+	 * If there are existing objects, then this process may terminate
+	 * early when all required commits from "new" bundles exist in the
+	 * repo's object store.
+	 */
+	cur = 0;
+	while (cur >= 0 && cur < bundles.nr) {
+		struct remote_bundle_info *bundle = bundles.items[cur];
+		if (!bundle->file) {
+			/*
+			 * Not downloaded yet. Try downloading.
+			 *
+			 * Note that bundle->file is non-NULL if a download
+			 * was attempted, even if it failed to download.
+			 */
+			if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
+				/* Mark as unbundled so we do not retry. */
+				bundle->unbundled = 1;
+
+				/* Try looking deeper in the list. */
+				move_direction = 1;
+				goto stack_operation;
+			}
+
+			/* We expect bundles when using creationTokens. */
+			if (!is_bundle(bundle->file, 1)) {
+				warning(_("file downloaded from '%s' is not a bundle"),
+					bundle->uri);
+				break;
+			}
+		}
+
+		if (bundle->file && !bundle->unbundled) {
+			/*
+			 * This was downloaded, but not successfully
+			 * unbundled. Try unbundling again.
+			 */
+			if (unbundle_from_file(ctx.r, bundle->file)) {
+				/* Try looking deeper in the list. */
+				move_direction = 1;
+			} else {
+				/*
+				 * Succeeded in unbundle. Retry bundles
+				 * that previously failed to unbundle.
+				 */
+				move_direction = -1;
+				bundle->unbundled = 1;
+			}
+		}
+
+		/*
+		 * Else case: downloaded and unbundled successfully.
+		 * Skip this by moving in the same direction as the
+		 * previous step.
+		 */
+
+stack_operation:
+		/* Move in the specified direction and repeat. */
+		cur += move_direction;
+	}
+
+	free(bundles.items);
+
+	/*
+	 * We succeed if the loop terminates because 'cur' drops below
+	 * zero. The other case is that we terminate because 'cur'
+	 * reaches the end of the list, so we have a failure no matter
+	 * which bundles we apply from the list.
+	 */
+	return cur >= 0;
+}
+
 static int download_bundle_list(struct repository *r,
 				struct bundle_list *local_list,
 				struct bundle_list *global_list,
@@ -490,7 +623,15 @@ static int fetch_bundle_list_in_config_format(struct repository *r,
 		goto cleanup;
 	}
 
-	if ((result = download_bundle_list(r, &list_from_bundle,
+	/*
+	 * If this list uses the creationToken heuristic, then the URIs
+	 * it advertises are expected to be bundles, not nested lists.
+	 * We can drop 'global_list' and 'depth'.
+	 */
+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
+		result = fetch_bundles_by_token(r, &list_from_bundle);
+		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
+	} else if ((result = download_bundle_list(r, &list_from_bundle,
 					   global_list, depth)))
 		goto cleanup;
 
@@ -632,6 +773,14 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	int result;
 	struct bundle_list global_list;
 
+	/*
+	 * If the creationToken heuristic is used, then the URIs
+	 * advertised by 'list' are not nested lists and instead
+	 * direct bundles. We do not need to use global_list.
+	 */
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		return fetch_bundles_by_token(r, list);
+
 	init_bundle_list(&global_list);
 
 	/* If a bundle is added to this global list, then it is required. */
@@ -640,7 +789,10 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	if ((result = download_bundle_list(r, list, &global_list, 0)))
 		goto cleanup;
 
-	result = unbundle_all_bundles(r, &global_list);
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		result = fetch_bundles_by_token(r, list);
+	else
+		result = unbundle_all_bundles(r, &global_list);
 
 cleanup:
 	for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 474432c8ace..6f9417a0afb 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -401,17 +401,43 @@ test_expect_success 'clone bundle list (http, creationToken)' '
 	git -C clone-list-http-2 cat-file --batch-check <oids &&
 
 	cat >expect <<-EOF &&
-	$HTTPD_URL/bundle-1.bundle
-	$HTTPD_URL/bundle-2.bundle
-	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-list
 	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone incomplete bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+	EOF
+
+	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		--single-branch --branch=base --no-tags \
+		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
+
+	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-1.bundle
 	EOF
 
-	# Since the creationToken heuristic is not yet understood by the
-	# client, the order cannot be verified at this moment. Sort the
-	# list for consistent results.
-	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_remote_https_urls <trace-clone.txt >actual &&
 	test_cmp expect actual
 '
 
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 1928ea1dd7c..b7d5551262c 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -831,6 +831,52 @@ test_expect_success 'auto-discover multiple bundles from HTTP clone' '
 	grep -f pattern trace.txt
 '
 
+test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '
+	test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+	test_when_finished rm -rf clone-heuristic trace*.txt &&
+
+	test_commit -C src newest &&
+	git -C src bundle create "$HTTPD_DOCUMENT_ROOT_PATH/newest.bundle" HEAD~1..HEAD &&
+	git clone --bare --no-local src "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/repo4.git/config" <<-EOF &&
+	[uploadPack]
+		advertiseBundleURIs = true
+
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "everything"]
+		uri = $HTTPD_URL/everything.bundle
+		creationtoken = 1
+
+	[bundle "new"]
+		uri = $HTTPD_URL/new.bundle
+		creationtoken = 2
+
+	[bundle "newest"]
+		uri = $HTTPD_URL/newest.bundle
+		creationtoken = 3
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+		git -c protocol.version=2 \
+		    -c transfer.bundleURI=true clone \
+		"$HTTPD_URL/smart/repo4.git" clone-heuristic &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/newest.bundle
+	$HTTPD_URL/new.bundle
+	$HTTPD_URL/everything.bundle
+	EOF
+
+	# We should fetch all bundles in the expected order.
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 06/10] clone: set fetch.bundleURI if appropriate
  2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2023-01-23 15:21   ` [PATCH v2 05/10] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
@ 2023-01-23 15:21   ` Derrick Stolee via GitGitGadget
  2023-01-23 15:21   ` [PATCH v2 07/10] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Bundle providers may organize their bundle lists in a way that is
intended to improve incremental fetches, not just initial clones.
However, they do need to state that they have organized with that in
mind, or else the client will not expect to save time by downloading
bundles after the initial clone. This is done by specifying a
bundle.heuristic value.

There are two types of bundle lists: those at a static URI and those
that are advertised from a Git remote over protocol v2.

The new fetch.bundleURI config value applies for static bundle URIs that
are not advertised over protocol v2. If the user specifies a static URI
via 'git clone --bundle-uri', then Git can set this config as a reminder
for future 'git fetch' operations to check the bundle list before
connecting to the remote(s).

For lists provided over protocol v2, we will want to take a different
approach and create a property of the remote itself by creating a
remote.<id>.* type config key. That is not implemented in this change.

Later changes will update 'git fetch' to consume this option.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/fetch.txt |  8 +++++++
 builtin/clone.c                |  6 +++++-
 bundle-uri.c                   |  5 ++++-
 bundle-uri.h                   |  8 ++++++-
 t/t5558-clone-bundle-uri.sh    | 39 ++++++++++++++++++++++++++++++++++
 5 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index cd65d236b43..244f44d460f 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -96,3 +96,11 @@ fetch.writeCommitGraph::
 	merge and the write may take longer. Having an updated commit-graph
 	file helps performance of many Git commands, including `git merge-base`,
 	`git push -f`, and `git log --graph`. Defaults to false.
+
+fetch.bundleURI::
+	This value stores a URI for downloading Git object data from a bundle
+	URI before performing an incremental fetch from the origin Git server.
+	This is similar to how the `--bundle-uri` option behaves in
+	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
+	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
+	list that is organized for incremental fetches.
diff --git a/builtin/clone.c b/builtin/clone.c
index 5453ba5277f..5370617664d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1248,12 +1248,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	 * data from the --bundle-uri option.
 	 */
 	if (bundle_uri) {
+		int has_heuristic = 0;
+
 		/* At this point, we need the_repository to match the cloned repo. */
 		if (repo_init(the_repository, git_dir, work_tree))
 			warning(_("failed to initialize the repo, skipping bundle URI"));
-		else if (fetch_bundle_uri(the_repository, bundle_uri))
+		else if (fetch_bundle_uri(the_repository, bundle_uri, &has_heuristic))
 			warning(_("failed to fetch objects from bundle URI '%s'"),
 				bundle_uri);
+		else if (has_heuristic)
+			git_config_set_gently("fetch.bundleuri", bundle_uri);
 	}
 
 	strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
diff --git a/bundle-uri.c b/bundle-uri.c
index 39acd856fb9..162a9276f31 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -742,7 +742,8 @@ static int unlink_bundle(struct remote_bundle_info *info, void *data)
 	return 0;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+int fetch_bundle_uri(struct repository *r, const char *uri,
+		     int *has_heuristic)
 {
 	int result;
 	struct bundle_list list;
@@ -762,6 +763,8 @@ int fetch_bundle_uri(struct repository *r, const char *uri)
 	result = unbundle_all_bundles(r, &list);
 
 cleanup:
+	if (has_heuristic)
+		*has_heuristic = (list.heuristic != BUNDLE_HEURISTIC_NONE);
 	for_all_bundles_in_list(&list, unlink_bundle, NULL);
 	clear_bundle_list(&list);
 	clear_remote_bundle_info(&bundle, NULL);
diff --git a/bundle-uri.h b/bundle-uri.h
index ef32840bfa6..6dbc780f661 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -124,8 +124,14 @@ int bundle_uri_parse_config_format(const char *uri,
  * based on that information.
  *
  * Returns non-zero if no bundle information is found at the given 'uri'.
+ *
+ * If the pointer 'has_heuristic' is non-NULL, then the value it points to
+ * will be set to be non-zero if and only if the fetched list has a
+ * heuristic value. Such a value indicates that the list was designed for
+ * incremental fetches.
  */
-int fetch_bundle_uri(struct repository *r, const char *uri);
+int fetch_bundle_uri(struct repository *r, const char *uri,
+		     int *has_heuristic);
 
 /**
  * Given a bundle list that was already advertised (likely by the
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 6f9417a0afb..b2d15e141ca 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -432,6 +432,8 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 		--single-branch --branch=base --no-tags \
 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
 
+	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
 	$HTTPD_URL/bundle-1.bundle
@@ -441,6 +443,43 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
+	test_when_finished rm -rf fetch-http-4 trace*.txt &&
+
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
+
+	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual &&
+
+	# only received base ref from bundle-1
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	EOF
+	test_cmp expect refs
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 07/10] bundle-uri: drop bundle.flag from design doc
  2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2023-01-23 15:21   ` [PATCH v2 06/10] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
@ 2023-01-23 15:21   ` Derrick Stolee via GitGitGadget
  2023-01-23 15:21   ` [PATCH v2 08/10] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The Implementation Plan section lists a 'bundle.flag' option that is not
documented anywhere else. What is documented elsewhere in the document
and implemented by previous changes is the 'bundle.heuristic' config
key. For now, a heuristic is required to indicate that a bundle list is
organized for use during 'git fetch', and it is also sufficient for all
existing designs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/technical/bundle-uri.txt | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
index b78d01d9adf..91d3a13e327 100644
--- a/Documentation/technical/bundle-uri.txt
+++ b/Documentation/technical/bundle-uri.txt
@@ -479,14 +479,14 @@ outline for submitting these features:
    (This choice is an opt-in via a config option and a command-line
    option.)
 
-4. Allow the client to understand the `bundle.flag=forFetch` configuration
+4. Allow the client to understand the `bundle.heuristic` configuration key
    and the `bundle.<id>.creationToken` heuristic. When `git clone`
-   discovers a bundle URI with `bundle.flag=forFetch`, it configures the
-   client repository to check that bundle URI during later `git fetch <remote>`
+   discovers a bundle URI with `bundle.heuristic`, it configures the client
+   repository to check that bundle URI during later `git fetch <remote>`
    commands.
 
 5. Allow clients to discover bundle URIs during `git fetch` and configure
-   a bundle URI for later fetches if `bundle.flag=forFetch`.
+   a bundle URI for later fetches if `bundle.heuristic` is set.
 
 6. Implement the "inspect headers" heuristic to reduce data downloads when
    the `bundle.<id>.creationToken` heuristic is not available.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 08/10] fetch: fetch from an external bundle URI
  2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2023-01-23 15:21   ` [PATCH v2 07/10] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
@ 2023-01-23 15:21   ` Derrick Stolee via GitGitGadget
  2023-01-27 19:18     ` Victoria Dye
  2023-01-23 15:21   ` [PATCH v2 09/10] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  11 siblings, 1 reply; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a user specifies a URI via 'git clone --bundle-uri', that URI may
be a bundle list that advertises a 'bundle.heuristic' value. In that
case, the Git client stores a 'fetch.bundleURI' config value storing
that URI.

Teach 'git fetch' to check for this config value and download bundles
from that URI before fetching from the Git remote(s). Likely, the bundle
provider has configured a heuristic (such as "creationToken") that will
allow the Git client to download only a portion of the bundles before
continuing the fetch.

Since this URI is completely independent of the remote server, we want
to be sure that we connect to the bundle URI before creating a
connection to the Git remote. We do not want to hold a stateful
connection for too long if we can avoid it.

To test that this works correctly, extend the previous tests that set
'fetch.bundleURI' to do follow-up fetches. The bundle list is updated
incrementally at each phase to demonstrate that the heuristic avoids
downloading older bundles. This includes the middle fetch downloading
the objects in bundle-3.bundle from the Git remote, and therefore not
needing that bundle in the third fetch.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/fetch.c             |   7 +++
 t/t5558-clone-bundle-uri.sh | 113 +++++++++++++++++++++++++++++++++++-
 2 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 7378cafeec9..f101e454dc9 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -29,6 +29,7 @@
 #include "commit-graph.h"
 #include "shallow.h"
 #include "worktree.h"
+#include "bundle-uri.h"
 
 #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000)
 
@@ -2109,6 +2110,7 @@ static int fetch_one(struct remote *remote, int argc, const char **argv,
 int cmd_fetch(int argc, const char **argv, const char *prefix)
 {
 	int i;
+	const char *bundle_uri;
 	struct string_list list = STRING_LIST_INIT_DUP;
 	struct remote *remote = NULL;
 	int result = 0;
@@ -2194,6 +2196,11 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	if (dry_run)
 		write_fetch_head = 0;
 
+	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri)) {
+		if (fetch_bundle_uri(the_repository, bundle_uri, NULL))
+			warning(_("failed to fetch bundles from '%s'"), bundle_uri);
+	}
+
 	if (all) {
 		if (argc == 1)
 			die(_("fetch --all does not take a repository argument"));
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index b2d15e141ca..7deeb4b8ad1 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -440,7 +440,55 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 	EOF
 
 	test_remote_https_urls <trace-clone.txt >actual &&
-	test_cmp expect actual
+	test_cmp expect actual &&
+
+	# We now have only one bundle ref.
+	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	EOF
+	test_cmp expect refs &&
+
+	# Add remaining bundles, exercising the "deepening" strategy
+	# for downloading via the creationToken heurisitc.
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
+		git -C clone-token-http fetch origin --no-tags \
+		refs/heads/merge:refs/heads/merge &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	EOF
+
+	test_remote_https_urls <trace1.txt >actual &&
+	test_cmp expect actual &&
+
+	# We now have all bundle refs.
+	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	refs/bundles/right
+	EOF
+	test_cmp expect refs
 '
 
 test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
@@ -477,6 +525,69 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	cat >expect <<-\EOF &&
 	refs/bundles/base
 	EOF
+	test_cmp expect refs &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	# Fetch the objects for bundle-2 _and_ bundle-3.
+	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/left:refs/heads/left \
+		refs/heads/right:refs/heads/right &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-2.bundle
+	EOF
+
+	test_remote_https_urls <trace1.txt >actual &&
+	test_cmp expect actual &&
+
+	# received left from bundle-2
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect refs &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	# This fetch should skip bundle-3.bundle, since its objects are
+	# already local (we have the requisite commits for bundle-4.bundle).
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/merge:refs/heads/merge &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	EOF
+
+	test_remote_https_urls <trace2.txt >actual &&
+	test_cmp expect actual &&
+
+	# received merge ref from bundle-4, but right is missing
+	# because we did not download bundle-3.
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	EOF
 	test_cmp expect refs
 '
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 09/10] bundle-uri: store fetch.bundleCreationToken
  2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                     ` (7 preceding siblings ...)
  2023-01-23 15:21   ` [PATCH v2 08/10] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
@ 2023-01-23 15:21   ` Derrick Stolee via GitGitGadget
  2023-01-23 15:21   ` [PATCH v2 10/10] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a bundle list specifies the "creationToken" heuristic, the Git
client downloads the list and then starts downloading bundles in
descending creationToken order. This process stops as soon as all
downloaded bundles can be applied to the repository (because all
required commits are present in the repository or in the downloaded
bundles).

When checking the same bundle list twice, this strategy requires
downloading the bundle with the maximum creationToken again, which is
wasteful. The creationToken heuristic promises that the client will not
have a use for that bundle if its creationToken value is at most the
previous creationToken value.

To prevent these wasteful downloads, create a fetch.bundleCreationToken
config setting that the Git client sets after downloading bundles. This
value allows skipping that maximum bundle download when this config
value is the same value (or larger).

To test that this works correctly, we can insert some "duplicate"
fetches into existing tests and demonstrate that only the bundle list is
downloaded.

The previous logic for downloading bundles by creationToken worked even
if the bundle list was empty, but now we have logic that depends on the
first entry of the list. Terminate early in the (non-sensical) case of
an empty bundle list.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/fetch.txt | 16 ++++++++++++
 bundle-uri.c                   | 48 ++++++++++++++++++++++++++++++++--
 t/t5558-clone-bundle-uri.sh    | 29 +++++++++++++++++++-
 3 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index 244f44d460f..568f0f75b30 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -104,3 +104,19 @@ fetch.bundleURI::
 	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
 	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
 	list that is organized for incremental fetches.
++
+If you modify this value and your repository has a `fetch.bundleCreationToken`
+value, then remove that `fetch.bundleCreationToken` value before fetching from
+the new bundle URI.
+
+fetch.bundleCreationToken::
+	When using `fetch.bundleURI` to fetch incrementally from a bundle
+	list that uses the "creationToken" heuristic, this config value
+	stores the maximum `creationToken` value of the downloaded bundles.
+	This value is used to prevent downloading bundles in the future
+	if the advertised `creationToken` is not strictly larger than this
+	value.
++
+The creation token values are chosen by the provider serving the specific
+bundle URI. If you modify the URI at `fetch.bundleURI`, then be sure to
+remove the value for the `fetch.bundleCreationToken` value before fetching.
diff --git a/bundle-uri.c b/bundle-uri.c
index 162a9276f31..691853b2c56 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -487,6 +487,8 @@ static int fetch_bundles_by_token(struct repository *r,
 {
 	int cur;
 	int move_direction = 0;
+	const char *creationTokenStr;
+	uint64_t maxCreationToken = 0, newMaxCreationToken = 0;
 	struct bundle_list_context ctx = {
 		.r = r,
 		.list = list,
@@ -500,8 +502,27 @@ static int fetch_bundles_by_token(struct repository *r,
 
 	for_all_bundles_in_list(list, append_bundle, &bundles);
 
+	if (!bundles.nr) {
+		free(bundles.items);
+		return 0;
+	}
+
 	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
 
+	/*
+	 * If fetch.bundleCreationToken exists, parses to a uint64t, and
+	 * is not strictly smaller than the maximum creation token in the
+	 * bundle list, then do not download any bundles.
+	 */
+	if (!repo_config_get_value(r,
+				   "fetch.bundlecreationtoken",
+				   &creationTokenStr) &&
+	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
+	    bundles.items[0]->creationToken <= maxCreationToken) {
+		free(bundles.items);
+		return 0;
+	}
+
 	/*
 	 * Attempt to download and unbundle the minimum number of bundles by
 	 * creationToken in decreasing order. If we fail to unbundle (after
@@ -522,6 +543,16 @@ static int fetch_bundles_by_token(struct repository *r,
 	cur = 0;
 	while (cur >= 0 && cur < bundles.nr) {
 		struct remote_bundle_info *bundle = bundles.items[cur];
+
+		/*
+		 * If we need to dig into bundles below the previous
+		 * creation token value, then likely we are in an erroneous
+		 * state due to missing or invalid bundles. Halt the process
+		 * instead of continuing to download extra data.
+		 */
+		if (bundle->creationToken <= maxCreationToken)
+			break;
+
 		if (!bundle->file) {
 			/*
 			 * Not downloaded yet. Try downloading.
@@ -561,6 +592,9 @@ static int fetch_bundles_by_token(struct repository *r,
 				 */
 				move_direction = -1;
 				bundle->unbundled = 1;
+
+				if (bundle->creationToken > newMaxCreationToken)
+					newMaxCreationToken = bundle->creationToken;
 			}
 		}
 
@@ -575,14 +609,24 @@ stack_operation:
 		cur += move_direction;
 	}
 
-	free(bundles.items);
-
 	/*
 	 * We succeed if the loop terminates because 'cur' drops below
 	 * zero. The other case is that we terminate because 'cur'
 	 * reaches the end of the list, so we have a failure no matter
 	 * which bundles we apply from the list.
 	 */
+	if (cur < 0) {
+		struct strbuf value = STRBUF_INIT;
+		strbuf_addf(&value, "%"PRIu64"", newMaxCreationToken);
+		if (repo_config_set_multivar_gently(ctx.r,
+						    "fetch.bundleCreationToken",
+						    value.buf, NULL, 0))
+			warning(_("failed to store maximum creation token"));
+
+		strbuf_release(&value);
+	}
+
+	free(bundles.items);
 	return cur >= 0;
 }
 
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 7deeb4b8ad1..9c2b7934b9b 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -433,6 +433,7 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
 
 	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+	test_cmp_config -C clone-token-http 1 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -468,6 +469,7 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
 		git -C clone-token-http fetch origin --no-tags \
 		refs/heads/merge:refs/heads/merge &&
+	test_cmp_config -C clone-token-http 4 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -511,6 +513,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
 
 	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+	test_cmp_config -C fetch-http-4 1 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -538,6 +541,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 		git -C fetch-http-4 fetch origin --no-tags \
 		refs/heads/left:refs/heads/left \
 		refs/heads/right:refs/heads/right &&
+	test_cmp_config -C fetch-http-4 2 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -555,6 +559,18 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	EOF
 	test_cmp expect refs &&
 
+	# No-op fetch
+	GIT_TRACE2_EVENT="$(pwd)/trace1b.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/left:refs/heads/left \
+		refs/heads/right:refs/heads/right &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	EOF
+	test_remote_https_urls <trace1b.txt >actual &&
+	test_cmp expect actual &&
+
 	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle "bundle-3"]
 		uri = bundle-3.bundle
@@ -570,6 +586,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
 		git -C fetch-http-4 fetch origin --no-tags \
 		refs/heads/merge:refs/heads/merge &&
+	test_cmp_config -C fetch-http-4 4 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -588,7 +605,17 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	refs/bundles/left
 	refs/bundles/merge
 	EOF
-	test_cmp expect refs
+	test_cmp expect refs &&
+
+	# No-op fetch
+	GIT_TRACE2_EVENT="$(pwd)/trace2b.txt" \
+		git -C fetch-http-4 fetch origin &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	EOF
+	test_remote_https_urls <trace2b.txt >actual &&
+	test_cmp expect actual
 '
 
 # Do not add tests here unless they use the HTTP server, as they will
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 10/10] bundle-uri: test missing bundles with heuristic
  2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                     ` (8 preceding siblings ...)
  2023-01-23 15:21   ` [PATCH v2 09/10] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
@ 2023-01-23 15:21   ` Derrick Stolee via GitGitGadget
  2023-01-27 19:21     ` Victoria Dye
  2023-01-27 19:28   ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Victoria Dye
  2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
  11 siblings, 1 reply; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The creationToken heuristic uses a different mechanism for downloading
bundles from the "standard" approach. Specifically: it uses a concrete
order based on the creationToken values and attempts to download as few
bundles as possible. It also modifies local config to store a value for
future fetches to avoid downloading bundles, if possible.

However, if any of the individual bundles has a failed download, then
the logic for the ordering comes into question. It is important to avoid
infinite loops, assigning invalid creation token values in config, but
also to be opportunistic as possible when downloading as many bundles as
seem appropriate.

These tests were used to inform the implementation of
fetch_bundles_by_token() in bundle-uri.c, but are being added
independently here to allow focusing on faulty downloads. There may be
more cases that could be added that result in modifications to
fetch_bundles_by_token() as interesting data shapes reveal themselves in
real scenarios.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t5558-clone-bundle-uri.sh | 400 ++++++++++++++++++++++++++++++++++++
 1 file changed, 400 insertions(+)

diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9c2b7934b9b..e3ccfe872c4 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -618,6 +618,406 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	test_cmp expect actual
 '
 
+test_expect_success 'creationToken heuristic with failed downloads (clone)' '
+	test_when_finished rm -rf download-* trace*.txt &&
+
+	# Case 1: base bundle does not exist, nothing can unbundle
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = fake.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone-1.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" download-1 &&
+
+	# Bundle failure does not set these configs.
+	test_must_fail git -C download-1 config fetch.bundleuri &&
+	test_must_fail git -C download-1 config fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/fake.bundle
+	EOF
+	test_remote_https_urls <trace-clone-1.txt >actual &&
+	test_cmp expect actual &&
+
+	# All bundles failed to unbundle
+	git -C download-1 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	test_must_be_empty refs &&
+
+	# Case 2: middle bundle does not exist, only two bundles can unbundle
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = fake.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone-2.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" download-2 &&
+
+	# Bundle failure does not set these configs.
+	test_must_fail git -C download-2 config fetch.bundleuri &&
+	test_must_fail git -C download-2 config fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+	test_remote_https_urls <trace-clone-2.txt >actual &&
+	test_cmp expect actual &&
+
+	# Only base bundle unbundled.
+	git -C download-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/right
+	EOF
+	test_cmp expect refs &&
+
+	# Case 3: top bundle does not exist, rest unbundle fine.
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = fake.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone-3.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" download-3 &&
+
+	# As long as we have continguous successful downloads,
+	# we _do_ set these configs.
+	test_cmp_config -C download-3 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+	test_cmp_config -C download-3 3 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+	test_remote_https_urls <trace-clone-3.txt >actual &&
+	test_cmp expect actual &&
+
+	# All bundles failed to unbundle
+	git -C download-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/right
+	EOF
+	test_cmp expect refs
+'
+
+# Expand the bundle list to include other interesting shapes, specifically
+# interesting for use when fetching from a previous state.
+#
+# ---------------- bundle-7
+#       7
+#     _/|\_
+# ---/--|--\------ bundle-6
+#   5   |   6
+# --|---|---|----- bundle-4
+#   |   4   |
+#   |  / \  /
+# --|-|---|/------ bundle-3 (the client will be caught up to this point.)
+#   \ |   3
+# ---\|---|------- bundle-2
+#     2   |
+# ----|---|------- bundle-1
+#      \ /
+#       1
+#       |
+# (previous commits)
+test_expect_success 'expand incremental bundle list' '
+	(
+		cd clone-from &&
+		git checkout -b lefter left &&
+		test_commit 5 &&
+		git checkout -b righter right &&
+		test_commit 6 &&
+		git checkout -b top lefter &&
+		git merge -m "7" merge righter &&
+
+		git bundle create bundle-6.bundle lefter righter --not left right &&
+		git bundle create bundle-7.bundle top --not lefter merge righter &&
+
+		cp bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/"
+	) &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" fetch origin +refs/heads/*:refs/heads/*
+'
+
+test_expect_success 'creationToken heuristic with failed downloads (fetch)' '
+	test_when_finished rm -rf download-* trace*.txt &&
+
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+	EOF
+
+	git clone --single-branch --branch=left \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" fetch-base &&
+	test_cmp_config -C fetch-base "$HTTPD_URL/bundle-list" fetch.bundleURI &&
+	test_cmp_config -C fetch-base 3 fetch.bundleCreationToken &&
+
+	# Case 1: all bundles exist: successful unbundling of all bundles
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = bundle-6.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = bundle-7.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-1 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-1.txt" \
+		git -C fetch-1 fetch origin &&
+	test_cmp_config -C fetch-1 7 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-7.bundle
+	$HTTPD_URL/bundle-6.bundle
+	$HTTPD_URL/bundle-4.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-1.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-1 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/lefter
+	refs/bundles/merge
+	refs/bundles/right
+	refs/bundles/righter
+	refs/bundles/top
+	EOF
+	test_cmp expect refs &&
+
+	# Case 2: middle bundle does not exist, only bundle-4 can unbundle
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = fake.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = bundle-7.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-2.txt" \
+		git -C fetch-2 fetch origin &&
+
+	# Since bundle-7 fails to unbundle, do not update creation token.
+	test_cmp_config -C fetch-2 3 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-7.bundle
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-4.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-2.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	refs/bundles/right
+	EOF
+	test_cmp expect refs &&
+
+	# Case 3: top bundle does not exist, rest unbundle fine.
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = bundle-6.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = fake.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-3 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-3.txt" \
+		git -C fetch-3 fetch origin &&
+
+	# As long as we have continguous successful downloads,
+	# we _do_ set the maximum creation token.
+	test_cmp_config -C fetch-3 6 fetch.bundlecreationtoken &&
+
+	# NOTE: the fetch skips bundle-4 since bundle-6 successfully
+	# unbundles itself and bundle-7 failed to download.
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-6.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-3.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/lefter
+	refs/bundles/right
+	refs/bundles/righter
+	EOF
+	test_cmp expect refs
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 01/10] bundle: optionally skip reachability walk
  2023-01-23 15:21   ` [PATCH v2 01/10] bundle: optionally skip reachability walk Derrick Stolee via GitGitGadget
@ 2023-01-23 18:03     ` Junio C Hamano
  2023-01-23 18:24       ` Derrick Stolee
  0 siblings, 1 reply; 74+ messages in thread
From: Junio C Hamano @ 2023-01-23 18:03 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, vdye, avarab, steadmon, chooglen, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When unbundling a bundle, the verify_bundle() method checks two things
> with regards to the prerequisite commits:
>
>  1. Those commits are in the object store, and
>  2. Those commits are reachable from refs.
>
> During testing of the bundle URI feature, where multiple bundles are
> unbundled in the same process, the ref store did not appear to be
> refreshing with the new refs/bundles/* references added within that
> process. This caused the second half -- the reachability walk -- report
> that some commits were not present, despite actually being present.
>
> One way to attempt to fix this would be to create a way to force-refresh
> the ref state. That would correct this for these cases where the
> refs/bundles/* references have been updated. However, this still is an
> expensive operation in a repository with many references.
>
> Instead, optionally allow callers to skip this portion by instead just
> checking for presence within the object store. Use this when unbundling
> in bundle-uri.c.

This step is new in this round.

I am assuming that this approach is to avoid repeated "now we
unbundled one, let's spend enormous cycles to update the in-core
view of the ref store before processing the next bundle"---instead
we unbundle all, assuming the prerequisites for each and every
bundle are satisfied.

I am OK as long as we check the assumption holds true at the end;
this looks like a good optimization.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 01/10] bundle: optionally skip reachability walk
  2023-01-23 18:03     ` Junio C Hamano
@ 2023-01-23 18:24       ` Derrick Stolee
  2023-01-23 20:13         ` Junio C Hamano
  2023-01-23 21:08         ` Junio C Hamano
  0 siblings, 2 replies; 74+ messages in thread
From: Derrick Stolee @ 2023-01-23 18:24 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, vdye, avarab, steadmon, chooglen

On 1/23/2023 1:03 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> When unbundling a bundle, the verify_bundle() method checks two things
>> with regards to the prerequisite commits:
>>
>>  1. Those commits are in the object store, and
>>  2. Those commits are reachable from refs.
>>
>> During testing of the bundle URI feature, where multiple bundles are
>> unbundled in the same process, the ref store did not appear to be
>> refreshing with the new refs/bundles/* references added within that
>> process. This caused the second half -- the reachability walk -- report
>> that some commits were not present, despite actually being present.
>>
>> One way to attempt to fix this would be to create a way to force-refresh
>> the ref state. That would correct this for these cases where the
>> refs/bundles/* references have been updated. However, this still is an
>> expensive operation in a repository with many references.
>>
>> Instead, optionally allow callers to skip this portion by instead just
>> checking for presence within the object store. Use this when unbundling
>> in bundle-uri.c.
> 
> This step is new in this round.
> 
> I am assuming that this approach is to avoid repeated "now we
> unbundled one, let's spend enormous cycles to update the in-core
> view of the ref store before processing the next bundle"---instead
> we unbundle all, assuming the prerequisites for each and every
> bundle are satisfied.

We are specifically removing the requirement that the objects are
reachable from refs, we still check that the objects are in the
object store. Thus, we can only be in a bad state afterwards if
the required objects for a bundle were in the object store,
previously unreachable, and one of these two things happened:

1. Some objects reachable from those required commits were already
   missing in the repository (so the repo's object store was broken
   but only for some unreachable objects).

2. A GC pruned those objects between verifying the bundle and
   writing the refs/bundles/* refs after unbundling.

I think we should trust the repository to not be in the first state,
but the race condition in the second option will create a state
where we have missing objects that are now reachable from refs.
 
> I am OK as long as we check the assumption holds true at the end;
> this looks like a good optimization.
 
So are you recommending that we verify all objects reachable from
the new refs/bundles/* are present after unbundling? That would
prevent the possibility of a GC race, but at some significant run-
time performance costs. Do we do the same as we unpack from a
fetch? Do we apply the same scrutiny to the objects during
unbundling as we do from a fetched pack? They both use 'git
index-pack --stdin --fix-thin', so my guess is that we do the same
amount of checks for both cases.

Since this is only one of multiple ways to add objects that depend
on possibly-unreachable objects, my preferred way to avoid the
GC race is by using care in the GC process itself (such as the new
--expire-to option to recover from these races).

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 01/10] bundle: optionally skip reachability walk
  2023-01-23 18:24       ` Derrick Stolee
@ 2023-01-23 20:13         ` Junio C Hamano
  2023-01-23 22:30           ` Junio C Hamano
  2023-01-23 21:08         ` Junio C Hamano
  1 sibling, 1 reply; 74+ messages in thread
From: Junio C Hamano @ 2023-01-23 20:13 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, avarab, steadmon,
	chooglen

Derrick Stolee <derrickstolee@github.com> writes:

> We are specifically removing the requirement that the objects are
> reachable from refs, we still check that the objects are in the
> object store. Thus, we can only be in a bad state afterwards if
> the required objects for a bundle were in the object store,
> previously unreachable, and one of these two things happened:
>
> 1. Some objects reachable from those required commits were already
>    missing in the repository (so the repo's object store was broken
>    but only for some unreachable objects).

A repository having some unreachable objects floating in the object
store is not corrupt.  As long as all the objects reachable from refs
are connected, that is a perfectly sane state.

But allowing unbundling with the sanity check loosened WILL corrupt
it, at the moment you point some objects from the bundle with refs.

> I think we should trust the repository to not be in the first state,

So, I think this line of thought is simply mistaken.

>> I am OK as long as we check the assumption holds true at the end;
>> this looks like a good optimization.
>  
> So are you recommending that we verify all objects reachable from
> the new refs/bundles/* are present after unbundling?

Making sure that prerequisites are connected will reduce the span of
the DAG we would need to verify.  After unbundling all bundles, but
before updating the refs to point at the tips in the bundles, if we
can make sure that these prerequisite objects named in the bundles
are reachable from the tips recorded in the bundles, while stopping
the traversal at the tips of original refs (remember: we have only
updated objects in the object store, but haven't updated the refs from
the bundles), that would allow us to make sure that the updates to
refs proposed by the bundles will not corrupt the repository.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 01/10] bundle: optionally skip reachability walk
  2023-01-23 18:24       ` Derrick Stolee
  2023-01-23 20:13         ` Junio C Hamano
@ 2023-01-23 21:08         ` Junio C Hamano
  1 sibling, 0 replies; 74+ messages in thread
From: Junio C Hamano @ 2023-01-23 21:08 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, avarab, steadmon,
	chooglen

Derrick Stolee <derrickstolee@github.com> writes:

> Do we do the same as we unpack from a fetch?

We should.

We only consider tips of refs and objects that are reachable from
them to be "present", and there may be random objects that float in
the object store without any guarantee that no the objects that
ought to be reachable from them are missing from the object store,
but they do not play part in the common ancestor discovery.

And then after we unpack, we ensure that the proposed updates to
refs made by the fetch operation will not corrupt the repository.
This can be guaranteed by making sure that objects to be placed at
the updated tip can all reach some existing tips of refs.  We trust
refs before the operation (in this case, 'git fetch') begins.  We
ensure that refs after the operation can be trusted before updating
them.  Where "trust" here means "objects pointed at by them are
connected.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 01/10] bundle: optionally skip reachability walk
  2023-01-23 20:13         ` Junio C Hamano
@ 2023-01-23 22:30           ` Junio C Hamano
  2023-01-24 12:27             ` Derrick Stolee
  0 siblings, 1 reply; 74+ messages in thread
From: Junio C Hamano @ 2023-01-23 22:30 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, avarab, steadmon,
	chooglen

Junio C Hamano <gitster@pobox.com> writes:

> A repository having some unreachable objects floating in the object
> store is not corrupt.  As long as all the objects reachable from refs
> are connected, that is a perfectly sane state.
>
> But allowing unbundling with the sanity check loosened WILL corrupt
> it, at the moment you point some objects from the bundle with refs.

While all of the above is true, I think existing check done by
bundle.c::verify_bundle() is stricter than necessary.  We make sure
that the prerequiste objects exist and are reachable from the refs.
But for the purpose of ensuring the health of the repo after the
operation, it is also OK if the prerequisite objects exist and they
pass connected.c::check_connected() test to reach existing refs.
verify_bundle() that is used in unbundle() does not allow it.

In a simplest case, with a single ref and a single strand of pearls
history, with a few cruft history that are connected to the main
history but are not anchored by any ref (e.g. the tip of 'main' was
rewound recently and we haven't done any GC).

                             7---8 (cruft)
                            /
   0---1---2---3---4---5---6 refs/heads/main

And they have another repository created back when '5' was the
latest and greatest, which built three commits on top of it.

   0---1---2---3---4---5---A---B---C

They create a bundle of 5..C to update us to C.  In the meantime, we
have created three commits ourselves (6, 7, and 8) but threw two
away.

If a bundle requires commit '5', because it is reachable from an
existing ref (which points at the 'main' branch), we can unbundle it
and point a ref at the tip of the history contained within the
bundle safely.  Commit '5' being pointed by a ref means the commit,
its ancestors, and all trees and blobs referenced are available to
the repository (some may be fetched lazily from promisor), and
unless the producer lied and placed unconnected data in the bundle,
unpacking a bundle on top of '5' should give us all the objects that
are needed to complete the new tip proposed by the bundle (e.g. 'C').

                             7---8 (cruft)
                            /
   0---1---2---3---4---5---6 refs/heads/main
                        \
                         A---B---C refs/bundle-1/main

And existing check that I called "sticter than necessary" easily
makes sure that it is safe to point 'C' with our ref.

Imagine another party cloned us back when 'main' was pointing at '8'
(which we since then have rewound), and built a few commits on it.

                                   X---Y refs/bundle-2/main
                                  /
                             7---8 (cruft)
                            /
   0---1---2---3---4---5---6 refs/heads/main

As they did not know we'd rewind and discard 7 and 8, they would
have created their bundle to cover 8..Y.  The current test will fail
because '8' that is prerequisite is not reachable from any ref on
our side.  But that is too pessimistic.

As long as we haven't garbage collected so that all objects
reachable from '7' and '8' are available to this repository,
however, we should be able to unbundle the bundle that has '8' as
its prerequisite.  For that, we only need that '8' passes the
check_connected() check, which essentially means "we shouldn't find
any missing link while traversing history from '8' that stops at any
existing refs".

Again this relies on the fact that unbundling code makes sure that
incoming data is fully connected (i.e. bundle producer did not lie
about the prerequisite).


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 01/10] bundle: optionally skip reachability walk
  2023-01-23 22:30           ` Junio C Hamano
@ 2023-01-24 12:27             ` Derrick Stolee
  2023-01-24 14:14               ` [PATCH v2.5 01/11] bundle: test unbundling with incomplete history Derrick Stolee
                                 ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: Derrick Stolee @ 2023-01-24 12:27 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, avarab, steadmon,
	chooglen

On 1/23/2023 5:30 PM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> A repository having some unreachable objects floating in the object
>> store is not corrupt.  As long as all the objects reachable from refs
>> are connected, that is a perfectly sane state.
>>
>> But allowing unbundling with the sanity check loosened WILL corrupt
>> it, at the moment you point some objects from the bundle with refs.
> 
> While all of the above is true, I think existing check done by
> bundle.c::verify_bundle() is stricter than necessary.  We make sure
> that the prerequiste objects exist and are reachable from the refs.
> But for the purpose of ensuring the health of the repo after the
> operation, it is also OK if the prerequisite objects exist and they
> pass connected.c::check_connected() test to reach existing refs.
> verify_bundle() that is used in unbundle() does not allow it.

Thank you for all of the detailed explanation, here and in other
messages.

I'll focus on this area today and see what I can learn and how I
can approach this problem in a different way. The current options
that I see are:

 1. Leave verify_bundle() as-is and figure out how to refresh the
    refs. (This would remain a stricter check than necessary.)

 2. Find out how to modify verify_bundle() so it can do the more
    relaxed connectivity check.

 3. Take the connectivity check that fetch uses before updating
    refs and add that check before updating refs in the bundle URI
    code.

There could also be a combination of (2) and (3), or others I have
not considered until I go poking around in the code.

I'll let you know what I find.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH v2.5 01/11] bundle: test unbundling with incomplete history
  2023-01-24 12:27             ` Derrick Stolee
@ 2023-01-24 14:14               ` Derrick Stolee
  2023-01-24 17:16                 ` Junio C Hamano
  2023-01-24 14:16               ` [PATCH v2.5 02/11] bundle: verify using connected() Derrick Stolee
  2023-01-24 15:22               ` [PATCH v2 01/10] bundle: optionally skip reachability walk Junio C Hamano
  2 siblings, 1 reply; 74+ messages in thread
From: Derrick Stolee @ 2023-01-24 14:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, avarab, steadmon,
	chooglen

On 1/24/2023 7:27 AM, Derrick Stolee wrote:

> I'll focus on this area today and see what I can learn and how I
> can approach this problem in a different way.

The first thing I did was try to figure out how things work today,
so I created this test case. It appears we were not testing this
at all previously.

This is just a candidate replacement for v3, so don't worry about
applying it until I re-roll.

Thanks,
-Stolee

--- >8 ---

From f9b0cc872ac44892fe6b1c973f16b35edfdc5b20 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <derrickstolee@github.com>
Date: Tue, 24 Jan 2023 08:47:19 -0500
Subject: [PATCH v2.5 01/11] bundle: test unbundling with incomplete history

When verifying a bundle, Git checks first that all prerequisite commits
exist in the object store, then adds an additional check: those
prerequisite commits must be reachable from references in the
repository.

This check is stronger than what is checked for refs being added during
'git fetch', which simply guarantees that the new refs have a complete
history up to the point where it intersects with the current reachable
history.

However, we also do not have any tests that check the behavior under
this condition. Create a test that demonstrates its behavior.

In order to construct a broken history, perform a shallow clone of a
repository with a linear history, but whose default branch ('base') has
a single commit, so dropping the shallow markers leaves a complete
history from that reference. However, the 'tip' reference adds a
shallow commit whose parent is missing in the cloned repository. Trying
to unbundle a bundle with the 'tip' as a prerequisite will succeed past
the object store check and move into the reachability check.

The two errors that are reported are of this form:

  error: Could not read <missing-commit>
  fatal: Failed to traverse parents of commit <present-commit>

These messages are not particularly helpful for the person running the
unbundle command, but they do prevent the command from succeeding.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t6020-bundle-misc.sh | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 3a1cf30b1d7..38dbbf89155 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -566,4 +566,44 @@ test_expect_success 'cloning from filtered bundle has useful error' '
 	grep "cannot clone from filtered bundle" err
 '

+test_expect_success 'verify catches unreachable, broken prerequisites' '
+	test_when_finished rm -rf clone-from clone-to &&
+	git init clone-from &&
+	(
+		cd clone-from &&
+		git checkout -b base &&
+		test_commit A &&
+		git checkout -b tip &&
+		git commit --allow-empty -m "will drop by shallow" &&
+		git commit --allow-empty -m "will keep by shallow" &&
+		git commit --allow-empty -m "for bundle, not clone" &&
+		git bundle create tip.bundle tip~1..tip &&
+		git reset --hard HEAD~1 &&
+		git checkout base
+	) &&
+	BAD_OID=$(git -C clone-from rev-parse tip~1) &&
+	TIP_OID=$(git -C clone-from rev-parse tip) &&
+	git clone --depth=1 --no-single-branch \
+		"file://$(pwd)/clone-from" clone-to &&
+	(
+		cd clone-to &&
+
+		# Set up broken history by removing shallow markers
+		git update-ref -d refs/remotes/origin/tip &&
+		rm .git/shallow &&
+
+		# Verify should fail
+		test_must_fail git bundle verify \
+			../clone-from/tip.bundle 2>err &&
+		grep "Could not read $BAD_OID" err &&
+		grep "Failed to traverse parents of commit $TIP_OID" err &&
+
+		# Unbundling should fail
+		test_must_fail git bundle unbundle \
+			../clone-from/tip.bundle 2>err &&
+		grep "Could not read $BAD_OID" err &&
+		grep "Failed to traverse parents of commit $TIP_OID" err
+	)
+'
+
 test_done
--
2.39.1.vfs.0.0



^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2.5 02/11] bundle: verify using connected()
  2023-01-24 12:27             ` Derrick Stolee
  2023-01-24 14:14               ` [PATCH v2.5 01/11] bundle: test unbundling with incomplete history Derrick Stolee
@ 2023-01-24 14:16               ` Derrick Stolee
  2023-01-24 17:33                 ` Junio C Hamano
  2023-01-24 15:22               ` [PATCH v2 01/10] bundle: optionally skip reachability walk Junio C Hamano
  2 siblings, 1 reply; 74+ messages in thread
From: Derrick Stolee @ 2023-01-24 14:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, avarab, steadmon,
	chooglen

On 1/24/2023 7:27 AM, Derrick Stolee wrote:

> I'll focus on this area today and see what I can learn and how I
> can approach this problem in a different way.

>  2. Find out how to modify verify_bundle() so it can do the more
>     relaxed connectivity check.

And here is the modification to verify_bundle() to depend on
check_connected() instead. This also improves (in my opinion) the
error reporting from this situation, as seen in the test edits.

Again, this is a placeholder before I re-roll this series into
an inevitable v3, so don't bother applying this patch until then.

Thanks,
-Stolee

--- >8 ---

From 6a3d64761e9691994f9310b9ce2338f49aa72d48 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <derrickstolee@github.com>
Date: Tue, 24 Jan 2023 08:47:00 -0500
Subject: [PATCH v2.5 02/11] bundle: verify using connected()

When Git verifies a bundle to see if it is safe for unbundling, it first
looks to see if the prerequisite commits are in the object store. This
is usually a sufficient filter, and those missing commits are indicated
clearly in the error messages. However, if the commits are present in
the object store, then there could still be issues if those commits are
not reachable from the repository's references. The repository only has
guarantees that its object store is closed under reachability for the
objects that are reachable from references.

Thus, the code in verify_bundle() has previously had the additional
check that all prerequisite commits are reachable from repository
references. This is done via a revision walk from all references,
stopping only if all prerequisite commits are discovered or all commits
are walked. This uses a custom walk to verify_bundle().

This check is more strict than what Git applies even to fetched
pack-files. In the fetch case, Git guarantees that the new references
are closed under reachability by walking from the new references until
walking commits that are reachable from repository refs. This is done
through the well-used check_connected() method.

To better align with the restrictions required by 'git fetch',
reimplement this check in verify_bundle() to use check_connected(). This
also simplifies the code significantly.

The previous change added a test that verified the behavior of 'git
bundle verify' and 'git bundle unbundle' in this case, and the error
messages looked like this:

  error: Could not read <missing-commit>
  fatal: Failed to traverse parents of commit <extant-commit>

However, by changing the revision walk slightly within check_connected()
and using its quiet mode, we can omit those messages. Instead, we get
only this message, tailored to describing the current state of the
repository:

  error: some prerequisite commits exist in the object store,
         but are not connected to the repository's history

(Line break added here for the commit message formatting, only.)

While this message does not include any object IDs, there is no
guarantee that those object IDs would help the user diagnose what is
going on, as they could be separated from the prerequisite commits by
some distance. At minimum, this situation describes the situation in a
more informative way than the previous error messages.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c               | 75 ++++++++++++++++--------------------------
 t/t6020-bundle-misc.sh |  8 ++---
 2 files changed, 33 insertions(+), 50 deletions(-)

diff --git a/bundle.c b/bundle.c
index 4ef7256aa11..76c3a904898 100644
--- a/bundle.c
+++ b/bundle.c
@@ -12,6 +12,7 @@
 #include "refs.h"
 #include "strvec.h"
 #include "list-objects-filter-options.h"
+#include "connected.h"

 static const char v2_bundle_signature[] = "# v2 git bundle\n";
 static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -187,6 +188,21 @@ static int list_refs(struct string_list *r, int argc, const char **argv)
 /* Remember to update object flag allocation in object.h */
 #define PREREQ_MARK (1u<<16)

+struct string_list_iterator {
+	struct string_list *list;
+	size_t cur;
+};
+
+static const struct object_id *iterate_ref_map(void *cb_data)
+{
+	struct string_list_iterator *iter = cb_data;
+
+	if (iter->cur >= iter->list->nr)
+		return NULL;
+
+	return iter->list->items[iter->cur++].util;
+}
+
 int verify_bundle(struct repository *r,
 		  struct bundle_header *header,
 		  enum verify_bundle_flags flags)
@@ -196,26 +212,25 @@ int verify_bundle(struct repository *r,
 	 * to be verbose about the errors
 	 */
 	struct string_list *p = &header->prerequisites;
-	struct rev_info revs = REV_INFO_INIT;
-	const char *argv[] = {NULL, "--all", NULL};
-	struct commit *commit;
-	int i, ret = 0, req_nr;
+	int i, ret = 0;
 	const char *message = _("Repository lacks these prerequisite commits:");
+	struct string_list_iterator iter = {
+		.list = p,
+	};
+	struct check_connected_options opts = {
+		.quiet = 1,
+	};

 	if (!r || !r->objects || !r->objects->odb)
 		return error(_("need a repository to verify a bundle"));

-	repo_init_revisions(r, &revs, NULL);
 	for (i = 0; i < p->nr; i++) {
 		struct string_list_item *e = p->items + i;
 		const char *name = e->string;
 		struct object_id *oid = e->util;
 		struct object *o = parse_object(r, oid);
-		if (o) {
-			o->flags |= PREREQ_MARK;
-			add_pending_object(&revs, o, name);
+		if (o)
 			continue;
-		}
 		ret++;
 		if (flags & VERIFY_BUNDLE_QUIET)
 			continue;
@@ -223,37 +238,14 @@ int verify_bundle(struct repository *r,
 			error("%s", message);
 		error("%s %s", oid_to_hex(oid), name);
 	}
-	if (revs.pending.nr != p->nr)
+	if (ret)
 		goto cleanup;
-	req_nr = revs.pending.nr;
-	setup_revisions(2, argv, &revs, NULL);
-
-	list_objects_filter_copy(&revs.filter, &header->filter);
-
-	if (prepare_revision_walk(&revs))
-		die(_("revision walk setup failed"));

-	i = req_nr;
-	while (i && (commit = get_revision(&revs)))
-		if (commit->object.flags & PREREQ_MARK)
-			i--;
-
-	for (i = 0; i < p->nr; i++) {
-		struct string_list_item *e = p->items + i;
-		const char *name = e->string;
-		const struct object_id *oid = e->util;
-		struct object *o = parse_object(r, oid);
-		assert(o); /* otherwise we'd have returned early */
-		if (o->flags & SHOWN)
-			continue;
-		ret++;
-		if (flags & VERIFY_BUNDLE_QUIET)
-			continue;
-		if (ret == 1)
-			error("%s", message);
-		error("%s %s", oid_to_hex(oid), name);
-	}
+	if ((ret = check_connected(iterate_ref_map, &iter, &opts)))
+		error(_("some prerequisite commits exist in the object store, "
+			"but are not connected to the repository's history"));

+	/* TODO: preserve this verbose language. */
 	if (flags & VERIFY_BUNDLE_VERBOSE) {
 		struct string_list *r;

@@ -282,15 +274,6 @@ int verify_bundle(struct repository *r,
 				  list_objects_filter_spec(&header->filter));
 	}
 cleanup:
-	/* Clean up objects used, as they will be reused. */
-	for (i = 0; i < p->nr; i++) {
-		struct string_list_item *e = p->items + i;
-		struct object_id *oid = e->util;
-		commit = lookup_commit_reference_gently(r, oid, 1);
-		if (commit)
-			clear_commit_marks(commit, ALL_REV_FLAGS | PREREQ_MARK);
-	}
-	release_revisions(&revs);
 	return ret;
 }

diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 38dbbf89155..7d40994991e 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -595,14 +595,14 @@ test_expect_success 'verify catches unreachable, broken prerequisites' '
 		# Verify should fail
 		test_must_fail git bundle verify \
 			../clone-from/tip.bundle 2>err &&
-		grep "Could not read $BAD_OID" err &&
-		grep "Failed to traverse parents of commit $TIP_OID" err &&
+		grep "some prerequisite commits .* are not connected" err &&
+		test_line_count = 1 err &&

 		# Unbundling should fail
 		test_must_fail git bundle unbundle \
 			../clone-from/tip.bundle 2>err &&
-		grep "Could not read $BAD_OID" err &&
-		grep "Failed to traverse parents of commit $TIP_OID" err
+		grep "some prerequisite commits .* are not connected" err &&
+		test_line_count = 1 err
 	)
 '

--
2.39.1.vfs.0.0



^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 01/10] bundle: optionally skip reachability walk
  2023-01-24 12:27             ` Derrick Stolee
  2023-01-24 14:14               ` [PATCH v2.5 01/11] bundle: test unbundling with incomplete history Derrick Stolee
  2023-01-24 14:16               ` [PATCH v2.5 02/11] bundle: verify using connected() Derrick Stolee
@ 2023-01-24 15:22               ` Junio C Hamano
  2 siblings, 0 replies; 74+ messages in thread
From: Junio C Hamano @ 2023-01-24 15:22 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, avarab, steadmon,
	chooglen

Derrick Stolee <derrickstolee@github.com> writes:

> I'll focus on this area today and see what I can learn and how I
> can approach this problem in a different way. The current options
> that I see are:
>
>  1. Leave verify_bundle() as-is and figure out how to refresh the
>     refs. (This would remain a stricter check than necessary.)

Even if we switch to "assume everything is OK, remember a few key
facts (like prerequisites and tips) about each bundle as we unpack
them, and sanity check the results at the end" approach, doesn't
that last step require us to be able to see the final state of the
refs?  If so, wouldn't we need to figure out how to refresh the refs
no matter what?

>  2. Find out how to modify verify_bundle() so it can do the more
>     relaxed connectivity check.

I am not sure what kind of relaxing you have in mind, but as long as
we can guarantee the connectedness of the end result

>  3. Take the connectivity check that fetch uses before updating
>     refs and add that check before updating refs in the bundle URI
>     code.

This is optional at much lower priority, isn't it?  In the second
example in the message you are responding to, I do not think it is
too bad to reject the bundle based on '8' that has been rewound away
(in other words, a bundle publisher ought to be basing their bundles
on well publicized and commonly available commits).  Only when we
try to be overly helpful to such a use case, it becomes necessary to
loosen the rule from "all prerequisites must be reachable from
existing refs" to "or prerequisites that are not reachable from any
refs are also OK if they pass check_connected()".

The current check to require that prerequisites are reachable from
refs does not have to check trees and blobs, because any commit that
is reachable from an existing ref is complete[*] by definition.

    Let's define a term: a commit is "complete" iff it is not
    missing any objects that it (recursively) references to.

The check done by check_connected() is more expensive because it has
to prove that a commit, which is found in the object store and may
or may not be reachable from any refs, is complete.  The tranversal
still can take advantage of the fact that commits _reachable_ from
refs are guaranteed to be complete, but until the traversal reaches
a commit that is reachable from refs (e.g. when inspecting commits
'8' and then '7' until it reaches '6', in the second example in the
message you are responding to) we need to look at trees and blobs.

> There could also be a combination of (2) and (3), or others I have
> not considered until I go poking around in the code.
>
> I'll let you know what I find.

Thanks.  Unlike areas that allow glitches as long as workarounds are
available (e.g. UI), the object store + refs layer is where it is
absolutely required to be correct.  I am happy to see capable minds
are on it.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2.5 01/11] bundle: test unbundling with incomplete history
  2023-01-24 14:14               ` [PATCH v2.5 01/11] bundle: test unbundling with incomplete history Derrick Stolee
@ 2023-01-24 17:16                 ` Junio C Hamano
  0 siblings, 0 replies; 74+ messages in thread
From: Junio C Hamano @ 2023-01-24 17:16 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, avarab, steadmon,
	chooglen

Derrick Stolee <derrickstolee@github.com> writes:

> In order to construct a broken history, perform a shallow clone of a
> repository with a linear history, but whose default branch ('base') has
> a single commit, so dropping the shallow markers leaves a complete
> history from that reference. However, the 'tip' reference adds a
> shallow commit whose parent is missing in the cloned repository. Trying
> to unbundle a bundle with the 'tip' as a prerequisite will succeed past
> the object store check and move into the reachability check.

It makes it sound convoluted set-up for tests, but I guess it is the
most direct way to get to the state you want to test, which is good.

In practice, the problem would appear when you create a multi-commit
branch, which then is discarded.  GC then decides to expire the
older part of the commit chain while leaving the commits near the
tip still in the object store.  So the problem can happen without
users doing anything esoteric, and is very much worth testing.

> +test_expect_success 'verify catches unreachable, broken prerequisites' '
> +	test_when_finished rm -rf clone-from clone-to &&

OK, so my understanding of what happens is ...

> +	git init clone-from &&
> +	(
> +		cd clone-from &&
> +		git checkout -b base &&
> +		test_commit A &&
> +		git checkout -b tip &&
> +		git commit --allow-empty -m "will drop by shallow" &&
> +		git commit --allow-empty -m "will keep by shallow" &&
> +		git commit --allow-empty -m "for bundle, not clone" &&
> +		git bundle create tip.bundle tip~1..tip &&

... there is a single strand of pearls

	A---D---K---B tip

where D is with "will drop by shallow" message.  The bundle
is prepared to give a history leading to B while requiring K.

> +		git reset --hard HEAD~1 &&
> +		git checkout base

Then B is thrown away before the history is cloned.

> +	) &&
> +	BAD_OID=$(git -C clone-from rev-parse tip~1) &&
> +	TIP_OID=$(git -C clone-from rev-parse tip) &&
> +	git clone --depth=1 --no-single-branch \
> +		"file://$(pwd)/clone-from" clone-to &&
> +	(
> +		cd clone-to &&

The cloned repository should have

	A---d---K

where D is missing behind the shallow boundary, origin/tip pointing
at K.

> +		# Set up broken history by removing shallow markers
> +		git update-ref -d refs/remotes/origin/tip &&

But we remove origin/tip, so K (and its trees and blobs) is totally
disconnected.

> +		rm .git/shallow &&

And then this removes the shallow info that makes us to pretend that
K does not have D (missing) as its parent.  Now we lack the required
parent D if we start traversing from K.

> +		# Verify should fail
> +		test_must_fail git bundle verify \
> +			../clone-from/tip.bundle 2>err &&

verify_bundle() wants to see traversal from "--all" to hit the
prerequisite objects and K certainly cannot be reached by any ref.

OK.  So we ended up with a repository where we are on 'base' branch,
and origin/HEAD and origin/base remote-tracking refs exist, all of
these refs pointing at A.  Plus K exists but not D, but it is fine
because K is not referenced by any ref.

This is perfectly constructed test case that checks a very
interesting scenario.  It is as if the commit chain D---K was
discarded (via "git branch -D") and then D got expired for being too
old but K is not old enough.

We want to ensure "git bundle verify" and "git fetch ./bundle.file"
in this healthy repository, where its refs do honor the promise, but
its object store has unconnected commits (like "K") that are not
complete, behaves sensibly.  If we loosen "prerequisites must be
reachable from refs" to "prerequisites must exist", it will lead to
repository corruption if we allow the bundle to be unbundled and its
tips made into our refs, because these new refs point at incomplete
objects.

Excellent.

> +		# Unbundling should fail
> +		test_must_fail git bundle unbundle \
> +			../clone-from/tip.bundle 2>err &&
> +		grep "Could not read $BAD_OID" err &&
> +		grep "Failed to traverse parents of commit $TIP_OID" err
> +	)
> +'

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2.5 02/11] bundle: verify using connected()
  2023-01-24 14:16               ` [PATCH v2.5 02/11] bundle: verify using connected() Derrick Stolee
@ 2023-01-24 17:33                 ` Junio C Hamano
  2023-01-24 18:46                   ` Derrick Stolee
  0 siblings, 1 reply; 74+ messages in thread
From: Junio C Hamano @ 2023-01-24 17:33 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, avarab, steadmon,
	chooglen

Derrick Stolee <derrickstolee@github.com> writes:

> When Git verifies a bundle to see if it is safe for unbundling, it first
> looks to see if the prerequisite commits are in the object store. This
> is usually a sufficient filter, and those missing commits are indicated
> clearly in the error messages.

I am not sure if our early check is because "does the prerequisite
even exist?" is sufficient.  It is a short-cut that is cheap and can
be done without preparing the commit traversal.

> However, if the commits are present in
> the object store, then there could still be issues if those commits are
> not reachable from the repository's references. The repository only has
> guarantees that its object store is closed under reachability for the
> objects that are reachable from references.

Correct.

> Thus, the code in verify_bundle() has previously had the additional
> check that all prerequisite commits are reachable from repository
> references. This is done via a revision walk from all references,
> stopping only if all prerequisite commits are discovered or all commits
> are walked. This uses a custom walk to verify_bundle().

Correct.

> This check is more strict than what Git applies even to fetched
> pack-files.

I do not see the need to say "even" here.  In what other situation
do we make connectivity checks, and is there a need to be more
strict than others when checking fetched packfiles?

> In the fetch case, Git guarantees that the new references
> are closed under reachability by walking from the new references until
> walking commits that are reachable from repository refs. This is done
> through the well-used check_connected() method.

Correct and is a good point to make.

> To better align with the restrictions required by 'git fetch',
> reimplement this check in verify_bundle() to use check_connected(). This
> also simplifies the code significantly.

Wonderful.  I never liked the custom check done in unbundle code,
which I am reasonably sure came from scripted hack to unbundle I
wrote eons ago.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2.5 02/11] bundle: verify using connected()
  2023-01-24 17:33                 ` Junio C Hamano
@ 2023-01-24 18:46                   ` Derrick Stolee
  2023-01-24 20:41                     ` Junio C Hamano
  0 siblings, 1 reply; 74+ messages in thread
From: Derrick Stolee @ 2023-01-24 18:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, avarab, steadmon,
	chooglen

On 1/24/2023 12:33 PM, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>> When Git verifies a bundle to see if it is safe for unbundling, it first
>> looks to see if the prerequisite commits are in the object store. This
>> is usually a sufficient filter, and those missing commits are indicated
>> clearly in the error messages.
> 
> I am not sure if our early check is because "does the prerequisite
> even exist?" is sufficient.  It is a short-cut that is cheap and can
> be done without preparing the commit traversal.

I suppose I should say "Usually, existence in the object store is
correlated with having all reachable objects, but this is not
guaranteed." I'll also mention that it is a short-cut that can fail
faster than the reachability check.

>> This check is more strict than what Git applies even to fetched
>> pack-files.
> 
> I do not see the need to say "even" here.  In what other situation
> do we make connectivity checks, and is there a need to be more
> strict than others when checking fetched packfiles?

I suppose that I was implying that fetches are the more common
operation, and the scrutiny applied to an arbitrary pack-file from
a remote is probably higher there. However, who knows where a
bundle came from, so the scrutiny should be the same.

>> To better align with the restrictions required by 'git fetch',
>> reimplement this check in verify_bundle() to use check_connected(). This
>> also simplifies the code significantly.
> 
> Wonderful.  I never liked the custom check done in unbundle code,
> which I am reasonably sure came from scripted hack to unbundle I
> wrote eons ago.
 
Excellent. Thanks for your feedback.

-Stolee

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2.5 02/11] bundle: verify using connected()
  2023-01-24 18:46                   ` Derrick Stolee
@ 2023-01-24 20:41                     ` Junio C Hamano
  0 siblings, 0 replies; 74+ messages in thread
From: Junio C Hamano @ 2023-01-24 20:41 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, avarab, steadmon,
	chooglen

Derrick Stolee <derrickstolee@github.com> writes:

>> I do not see the need to say "even" here.  In what other situation
>> do we make connectivity checks, and is there a need to be more
>> strict than others when checking fetched packfiles?
>
> I suppose that I was implying that fetches are the more common
> operation, and the scrutiny applied to an arbitrary pack-file from
> a remote is probably higher there. However, who knows where a
> bundle came from, so the scrutiny should be the same.

Ah, I see that is where that "even" came from.  And yes, I agree
that unbundling and fetch should be suspicious of their input to the
same degree.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 02/10] t5558: add tests for creationToken heuristic
  2023-01-23 15:21   ` [PATCH v2 02/10] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
@ 2023-01-27 19:15     ` Victoria Dye
  0 siblings, 0 replies; 74+ messages in thread
From: Victoria Dye @ 2023-01-27 19:15 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> As documented in the bundle URI design doc in 2da14fad8fe (docs:
> document bundle URI standard, 2022-08-09), the 'creationToken' member of
> a bundle URI allows a bundle provider to specify a total order on the
> bundles.
> 
> Future changes will allow the Git client to understand these members and
> modify its behavior around downloading the bundles in that order. In the
> meantime, create tests that add creation tokens to the bundle list. For
> now, the Git client correctly ignores these unknown keys.
> 
> Create a new test helper function, test_remote_https_urls, which filters
> GIT_TRACE2_EVENT output to extract a list of URLs passed to
> git-remote-https child processes. This can be used to verify the order
> of these requests as we implement the creationToken heuristic. For now,
> we need to sort the actual output since the current client does not have
> a well-defined order that it applies to the bundles.

...

> +# Given a GIT_TRACE2_EVENT log over stdin, writes to stdout a list of URLs
> +# sent to git-remote-https child processes.
> +test_remote_https_urls() {
> +	grep -e '"event":"child_start".*"argv":\["git-remote-https",".*"\]' |
> +		sed -e 's/{"event":"child_start".*"argv":\["git-remote-https","//g' \
> +		    -e 's/"\]}//g'
> +}
> +

...

> +	cat >expect <<-EOF &&
> +	$HTTPD_URL/bundle-1.bundle
> +	$HTTPD_URL/bundle-2.bundle
> +	$HTTPD_URL/bundle-3.bundle
> +	$HTTPD_URL/bundle-4.bundle
> +	$HTTPD_URL/bundle-list
> +	EOF
> +
> +	# Sort the list, since the order is not well-defined
> +	# without a heuristic.
> +	test_remote_https_urls <trace-clone.txt | sort >actual &&
> +	test_cmp expect actual

...

> +	cat >expect <<-EOF &&
> +	$HTTPD_URL/bundle-1.bundle
> +	$HTTPD_URL/bundle-2.bundle
> +	$HTTPD_URL/bundle-3.bundle
> +	$HTTPD_URL/bundle-4.bundle
> +	$HTTPD_URL/bundle-list
> +	EOF
> +
> +	# Since the creationToken heuristic is not yet understood by the
> +	# client, the order cannot be verified at this moment. Sort the
> +	# list for consistent results.
> +	test_remote_https_urls <trace-clone.txt | sort >actual &&
> +	test_cmp expect actual

These updates make the tests stronger (that is, less likely to let a
regression slip through), and the additional comments are helpful for
explaining what is and is not implemented at this point in the series. 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 05/10] bundle-uri: download in creationToken order
  2023-01-23 15:21   ` [PATCH v2 05/10] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
@ 2023-01-27 19:17     ` Victoria Dye
  2023-01-27 19:32       ` Junio C Hamano
  0 siblings, 1 reply; 74+ messages in thread
From: Victoria Dye @ 2023-01-27 19:17 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> +struct bundles_for_sorting {

...

> +static int append_bundle(struct remote_bundle_info *bundle, void *data)

...

> +/**
> + * For use in QSORT() to get a list sorted by creationToken
> + * in decreasing order.
> + */
> +static int compare_creation_token_decreasing(const void *va, const void *vb)

These new function/struct names are all unambiguous. Looks good!

> +	cur = 0;
> +	while (cur >= 0 && cur < bundles.nr) {
> +		struct remote_bundle_info *bundle = bundles.items[cur];
> +		if (!bundle->file) {
> +			/*
> +			 * Not downloaded yet. Try downloading.
> +			 *
> +			 * Note that bundle->file is non-NULL if a download
> +			 * was attempted, even if it failed to download.
> +			 */
> +			if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
> +				/* Mark as unbundled so we do not retry. */
> +				bundle->unbundled = 1;

This implicitly shows that, unlike a failed unbundling, a failed download is
always erroneous behavior, with the added benefit of avoiding (potentially
expensive) download re-attempts.

> +
> +				/* Try looking deeper in the list. */
> +				move_direction = 1;
> +				goto stack_operation;
> +			}
> +
> +			/* We expect bundles when using creationTokens. */
> +			if (!is_bundle(bundle->file, 1)) {
> +				warning(_("file downloaded from '%s' is not a bundle"),
> +					bundle->uri);
> +				break;
> +			}
> +		}
> +
> +		if (bundle->file && !bundle->unbundled) {
> +			/*
> +			 * This was downloaded, but not successfully
> +			 * unbundled. Try unbundling again.
> +			 */
> +			if (unbundle_from_file(ctx.r, bundle->file)) {
> +				/* Try looking deeper in the list. */
> +				move_direction = 1;
> +			} else {
> +				/*
> +				 * Succeeded in unbundle. Retry bundles
> +				 * that previously failed to unbundle.
> +				 */
> +				move_direction = -1;
> +				bundle->unbundled = 1;
> +			}
> +		}
> +
> +		/*
> +		 * Else case: downloaded and unbundled successfully.
> +		 * Skip this by moving in the same direction as the
> +		 * previous step.
> +		 */
> +
> +stack_operation:

Other than this label, it looks like you've replaced all of the
"stack-based" language. Should this be replaced as well? No problem if not,
I just wasn't sure whether it was left that way intentionally.

> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
> index 474432c8ace..6f9417a0afb 100755
> --- a/t/t5558-clone-bundle-uri.sh
> +++ b/t/t5558-clone-bundle-uri.sh
> @@ -401,17 +401,43 @@ test_expect_success 'clone bundle list (http, creationToken)' '
>  	git -C clone-list-http-2 cat-file --batch-check <oids &&
>  
>  	cat >expect <<-EOF &&
> -	$HTTPD_URL/bundle-1.bundle
> -	$HTTPD_URL/bundle-2.bundle
> -	$HTTPD_URL/bundle-3.bundle
> +	$HTTPD_URL/bundle-list
>  	$HTTPD_URL/bundle-4.bundle
> +	$HTTPD_URL/bundle-3.bundle
> +	$HTTPD_URL/bundle-2.bundle
> +	$HTTPD_URL/bundle-1.bundle
> +	EOF

Ooh, interesting - using the new "test_remote_https_urls", these tests now
also verify that the bundles were downloaded in decreasing order when using
the 'creationToken' heuristic. That's a nice extra confirmation that the
heuristic is working as intended.

> +test_expect_success 'clone incomplete bundle list (http, creationToken)' '

...

> +test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '

These tests look good as well, especially 'clone incomplete bundle list's
now-more descriptive name.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 08/10] fetch: fetch from an external bundle URI
  2023-01-23 15:21   ` [PATCH v2 08/10] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
@ 2023-01-27 19:18     ` Victoria Dye
  0 siblings, 0 replies; 74+ messages in thread
From: Victoria Dye @ 2023-01-27 19:18 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> @@ -2109,6 +2110,7 @@ static int fetch_one(struct remote *remote, int argc, const char **argv,
>  int cmd_fetch(int argc, const char **argv, const char *prefix)
>  {
>  	int i;
> +	const char *bundle_uri;
>  	struct string_list list = STRING_LIST_INIT_DUP;
>  	struct remote *remote = NULL;
>  	int result = 0;
> @@ -2194,6 +2196,11 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  	if (dry_run)
>  		write_fetch_head = 0;
>  
> +	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri)) {
> +		if (fetch_bundle_uri(the_repository, bundle_uri, NULL))
> +			warning(_("failed to fetch bundles from '%s'"), bundle_uri);

nit: these conditions don't need to be nested and could instead be joined
with '&&' in the outer 'if ()' (unless you plan to add more to this block in
a future series - I didn't see anything in later patches here).

Everything else (tests, doc updates since the last version) looks good.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 10/10] bundle-uri: test missing bundles with heuristic
  2023-01-23 15:21   ` [PATCH v2 10/10] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
@ 2023-01-27 19:21     ` Victoria Dye
  2023-01-30 18:47       ` Derrick Stolee
  0 siblings, 1 reply; 74+ messages in thread
From: Victoria Dye @ 2023-01-27 19:21 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> The creationToken heuristic uses a different mechanism for downloading
> bundles from the "standard" approach. Specifically: it uses a concrete
> order based on the creationToken values and attempts to download as few
> bundles as possible. It also modifies local config to store a value for
> future fetches to avoid downloading bundles, if possible.
> 
> However, if any of the individual bundles has a failed download, then
> the logic for the ordering comes into question. It is important to avoid
> infinite loops, assigning invalid creation token values in config, but
> also to be opportunistic as possible when downloading as many bundles as
> seem appropriate.
> 
> These tests were used to inform the implementation of
> fetch_bundles_by_token() in bundle-uri.c, but are being added
> independently here to allow focusing on faulty downloads. There may be
> more cases that could be added that result in modifications to
> fetch_bundles_by_token() as interesting data shapes reveal themselves in
> real scenarios.
> 

The expanded testing is great, thanks for adding it!

> +	# Case 2: middle bundle does not exist, only two bundles can unbundle
> +	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
> +	[bundle]
> +		version = 1
> +		mode = all
> +		heuristic = creationToken
> +
> +	[bundle "bundle-1"]
> +		uri = bundle-1.bundle
> +		creationToken = 1
> +
> +	[bundle "bundle-2"]
> +		uri = fake.bundle
> +		creationToken = 2
> +
> +	[bundle "bundle-3"]
> +		uri = bundle-3.bundle
> +		creationToken = 3
> +
> +	[bundle "bundle-4"]
> +		uri = bundle-4.bundle
> +		creationToken = 4
> +	EOF
> +
> +	GIT_TRACE2_EVENT="$(pwd)/trace-clone-2.txt" \
> +	git clone --single-branch --branch=base \
> +		--bundle-uri="$HTTPD_URL/bundle-list" \
> +		"$HTTPD_URL/smart/fetch.git" download-2 &&
> +
> +	# Bundle failure does not set these configs.
> +	test_must_fail git -C download-2 config fetch.bundleuri &&
> +	test_must_fail git -C download-2 config fetch.bundlecreationtoken &&
> +
> +	cat >expect <<-EOF &&
> +	$HTTPD_URL/bundle-list
> +	$HTTPD_URL/bundle-4.bundle
> +	$HTTPD_URL/bundle-3.bundle
> +	$HTTPD_URL/fake.bundle
> +	$HTTPD_URL/bundle-1.bundle
> +	EOF
> +	test_remote_https_urls <trace-clone-2.txt >actual &&
> +	test_cmp expect actual &&
> +
> +	# Only base bundle unbundled.
> +	git -C download-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
> +	cat >expect <<-EOF &&
> +	refs/bundles/base
> +	refs/bundles/right
> +	EOF
> +	test_cmp expect refs &&

Maybe I'm misreading, but I don't think the comment ("Only base bundle
unbundled") lines up with the expected bundle refs (both bundle-1
('refs/bundles/base') and bundle-3 ('refs/bundles/right') seem to be
unbundled). 

> +
> +	# Case 3: top bundle does not exist, rest unbundle fine.
> +	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
> +	[bundle]
> +		version = 1
> +		mode = all
> +		heuristic = creationToken
> +
> +	[bundle "bundle-1"]
> +		uri = bundle-1.bundle
> +		creationToken = 1
> +
> +	[bundle "bundle-2"]
> +		uri = bundle-2.bundle
> +		creationToken = 2
> +
> +	[bundle "bundle-3"]
> +		uri = bundle-3.bundle
> +		creationToken = 3
> +
> +	[bundle "bundle-4"]
> +		uri = fake.bundle
> +		creationToken = 4
> +	EOF
> +
> +	GIT_TRACE2_EVENT="$(pwd)/trace-clone-3.txt" \
> +	git clone --single-branch --branch=base \
> +		--bundle-uri="$HTTPD_URL/bundle-list" \
> +		"$HTTPD_URL/smart/fetch.git" download-3 &&
> +
> +	# As long as we have continguous successful downloads,
> +	# we _do_ set these configs.
> +	test_cmp_config -C download-3 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
> +	test_cmp_config -C download-3 3 fetch.bundlecreationtoken &&
> +
> +	cat >expect <<-EOF &&
> +	$HTTPD_URL/bundle-list
> +	$HTTPD_URL/fake.bundle
> +	$HTTPD_URL/bundle-3.bundle
> +	$HTTPD_URL/bundle-2.bundle
> +	$HTTPD_URL/bundle-1.bundle
> +	EOF
> +	test_remote_https_urls <trace-clone-3.txt >actual &&
> +	test_cmp expect actual &&
> +
> +	# All bundles failed to unbundle
> +	git -C download-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
> +	cat >expect <<-EOF &&
> +	refs/bundles/base
> +	refs/bundles/left
> +	refs/bundles/right
> +	EOF
> +	test_cmp expect refs

Similar issue with the comment here - it says that all bundles *failed* to
unbundle, but the test case description ("Case 3: top bundle does not exist,
rest unbundle fine.") and the result show bundle-1, bundle-2, and bundle-3
all unbundling successfully.

> +'
> +
> +# Expand the bundle list to include other interesting shapes, specifically
> +# interesting for use when fetching from a previous state.
> +#
> +# ---------------- bundle-7
> +#       7
> +#     _/|\_
> +# ---/--|--\------ bundle-6
> +#   5   |   6
> +# --|---|---|----- bundle-4
> +#   |   4   |
> +#   |  / \  /
> +# --|-|---|/------ bundle-3 (the client will be caught up to this point.)
> +#   \ |   3
> +# ---\|---|------- bundle-2
> +#     2   |
> +# ----|---|------- bundle-1
> +#      \ /
> +#       1
> +#       |
> +# (previous commits)

...

> +	# Case 1: all bundles exist: successful unbundling of all bundles

...

> +	# Case 2: middle bundle does not exist, only bundle-4 can unbundle

...

> +	# Case 3: top bundle does not exist, rest unbundle fine.

The rest of these cases look okay and, at a high-level, it's helpful to have
these additional tests covering a different topology.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches
  2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                     ` (9 preceding siblings ...)
  2023-01-23 15:21   ` [PATCH v2 10/10] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
@ 2023-01-27 19:28   ` Victoria Dye
  2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 74+ messages in thread
From: Victoria Dye @ 2023-01-27 19:28 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> Updates in v2
> =============

In patches 2-9, I only had a few minor nits I commented on in the respective
patches; everything else either addressed something directly noted in the
last set of reviews or otherwise was sufficiently explained by
comments/commit messages. 

As for patch 1, I think the approach to relaxing the unbundling checks you
settled on in [1] after discussing with Junio makes sense.

[1] https://lore.kernel.org/git/ecc6b167-f5c4-48ce-3973-461d1659ed40@github.com/

> 
> Thanks,
> 
>  * Stolee
> 
> Derrick Stolee (10):
>   bundle: optionally skip reachability walk
>   t5558: add tests for creationToken heuristic
>   bundle-uri: parse bundle.heuristic=creationToken
>   bundle-uri: parse bundle.<id>.creationToken values
>   bundle-uri: download in creationToken order
>   clone: set fetch.bundleURI if appropriate
>   bundle-uri: drop bundle.flag from design doc
>   fetch: fetch from an external bundle URI
>   bundle-uri: store fetch.bundleCreationToken
>   bundle-uri: test missing bundles with heuristic
> 
>  Documentation/config/bundle.txt        |   7 +
>  Documentation/config/fetch.txt         |  24 +
>  Documentation/technical/bundle-uri.txt |   8 +-
>  builtin/clone.c                        |   6 +-
>  builtin/fetch.c                        |   7 +
>  bundle-uri.c                           | 257 +++++++++-
>  bundle-uri.h                           |  28 +-
>  bundle.c                               |   3 +-
>  bundle.h                               |   1 +
>  t/t5558-clone-bundle-uri.sh            | 672 ++++++++++++++++++++++++-
>  t/t5601-clone.sh                       |  46 ++
>  t/t5750-bundle-uri-parse.sh            |  37 ++
>  t/test-lib-functions.sh                |   8 +
>  13 files changed, 1091 insertions(+), 13 deletions(-)
> 
> 
> base-commit: 4dbebc36b0893f5094668ddea077d0e235560b16
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1454%2Fderrickstolee%2Fbundle-redo%2FcreationToken-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1454/derrickstolee/bundle-redo/creationToken-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/1454
> 
> Range-diff vs v1:
> 
>   -:  ----------- >  1:  b3828725bc8 bundle: optionally skip reachability walk
>   1:  39eed914878 !  2:  427aff4d5e5 t5558: add tests for creationToken heuristic
>      @@ Commit message
>           meantime, create tests that add creation tokens to the bundle list. For
>           now, the Git client correctly ignores these unknown keys.
>       
>      +    Create a new test helper function, test_remote_https_urls, which filters
>      +    GIT_TRACE2_EVENT output to extract a list of URLs passed to
>      +    git-remote-https child processes. This can be used to verify the order
>      +    of these requests as we implement the creationToken heuristic. For now,
>      +    we need to sort the actual output since the current client does not have
>      +    a well-defined order that it applies to the bundles.
>      +
>           Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>       
>        ## t/t5558-clone-bundle-uri.sh ##
>       @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone HTTP bundle' '
>      - 	test_config -C clone-http log.excludedecoration refs/bundle/
>        '
>        
>      -+# usage: test_bundle_downloaded <bundle-name> <trace-file>
>      -+test_bundle_downloaded () {
>      -+	cat >pattern <<-EOF &&
>      -+	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/$1"\]
>      -+	EOF
>      -+	grep -f pattern "$2"
>      -+}
>      -+
>        test_expect_success 'clone bundle list (HTTP, no heuristic)' '
>       +	test_when_finished rm -f trace*.txt &&
>       +
>      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, no he
>       -	git -C clone-list-http cat-file --batch-check <oids
>       +	git -C clone-list-http cat-file --batch-check <oids &&
>       +
>      -+	for b in 1 2 3 4
>      -+	do
>      -+		test_bundle_downloaded bundle-$b.bundle trace-clone.txt ||
>      -+			return 1
>      -+	done
>      ++	cat >expect <<-EOF &&
>      ++	$HTTPD_URL/bundle-1.bundle
>      ++	$HTTPD_URL/bundle-2.bundle
>      ++	$HTTPD_URL/bundle-3.bundle
>      ++	$HTTPD_URL/bundle-4.bundle
>      ++	$HTTPD_URL/bundle-list
>      ++	EOF
>      ++
>      ++	# Sort the list, since the order is not well-defined
>      ++	# without a heuristic.
>      ++	test_remote_https_urls <trace-clone.txt | sort >actual &&
>      ++	test_cmp expect actual
>        '
>        
>        test_expect_success 'clone bundle list (HTTP, any mode)' '
>      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, any m
>        '
>        
>       +test_expect_success 'clone bundle list (http, creationToken)' '
>      ++	test_when_finished rm -f trace*.txt &&
>      ++
>       +	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
>       +	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
>       +	[bundle]
>      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, any m
>       +		creationToken = 4
>       +	EOF
>       +
>      -+	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http-2 &&
>      ++	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" git \
>      ++		clone --bundle-uri="$HTTPD_URL/bundle-list" \
>      ++		"$HTTPD_URL/smart/fetch.git" clone-list-http-2 &&
>       +
>       +	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
>      -+	git -C clone-list-http-2 cat-file --batch-check <oids
>      ++	git -C clone-list-http-2 cat-file --batch-check <oids &&
>      ++
>      ++	cat >expect <<-EOF &&
>      ++	$HTTPD_URL/bundle-1.bundle
>      ++	$HTTPD_URL/bundle-2.bundle
>      ++	$HTTPD_URL/bundle-3.bundle
>      ++	$HTTPD_URL/bundle-4.bundle
>      ++	$HTTPD_URL/bundle-list
>      ++	EOF
>      ++
>      ++	# Since the creationToken heuristic is not yet understood by the
>      ++	# client, the order cannot be verified at this moment. Sort the
>      ++	# list for consistent results.
>      ++	test_remote_https_urls <trace-clone.txt | sort >actual &&
>      ++	test_cmp expect actual
>       +'
>       +
>        # Do not add tests here unless they use the HTTP server, as they will
>        # not run unless the HTTP dependencies exist.
>        
>      +
>      + ## t/test-lib-functions.sh ##
>      +@@ t/test-lib-functions.sh: test_region () {
>      + 	return 0
>      + }
>      + 
>      ++# Given a GIT_TRACE2_EVENT log over stdin, writes to stdout a list of URLs
>      ++# sent to git-remote-https child processes.
>      ++test_remote_https_urls() {
>      ++	grep -e '"event":"child_start".*"argv":\["git-remote-https",".*"\]' |
>      ++		sed -e 's/{"event":"child_start".*"argv":\["git-remote-https","//g' \
>      ++		    -e 's/"\]}//g'
>      ++}
>      ++
>      + # Print the destination of symlink(s) provided as arguments. Basically
>      + # the same as the readlink command, but it's not available everywhere.
>      + test_readlink () {
>   2:  9007249b948 !  3:  f6f8197c9cc bundle-uri: parse bundle.heuristic=creationToken
>      @@ Commit message
>           bundle-uri' to print the heuristic value and verify that the parsing
>           works correctly.
>       
>      +    As an extra precaution, create the internal 'heuristics' array to be a
>      +    list of (enum, string) pairs so we can iterate through the array entries
>      +    carefully, regardless of the enum values.
>      +
>           Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>       
>        ## Documentation/config/bundle.txt ##
>      @@ bundle-uri.c
>        #include "config.h"
>        #include "remote.h"
>        
>      -+static const char *heuristics[] = {
>      -+	[BUNDLE_HEURISTIC_NONE] = "",
>      -+	[BUNDLE_HEURISTIC_CREATIONTOKEN] = "creationToken",
>      ++static struct {
>      ++	enum bundle_list_heuristic heuristic;
>      ++	const char *name;
>      ++} heuristics[BUNDLE_HEURISTIC__COUNT] = {
>      ++	{ BUNDLE_HEURISTIC_NONE, ""},
>      ++	{ BUNDLE_HEURISTIC_CREATIONTOKEN, "creationToken" },
>       +};
>       +
>        static int compare_bundles(const void *hashmap_cmp_fn_data,
>      @@ bundle-uri.c: void print_bundle_list(FILE *fp, struct bundle_list *list)
>        	fprintf(fp, "\tversion = %d\n", list->version);
>        	fprintf(fp, "\tmode = %s\n", mode);
>        
>      -+	if (list->heuristic)
>      -+		printf("\theuristic = %s\n", heuristics[list->heuristic]);
>      ++	if (list->heuristic) {
>      ++		int i;
>      ++		for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
>      ++			if (heuristics[i].heuristic == list->heuristic) {
>      ++				printf("\theuristic = %s\n",
>      ++				       heuristics[list->heuristic].name);
>      ++				break;
>      ++			}
>      ++		}
>      ++	}
>       +
>        	for_all_bundles_in_list(list, summarize_bundle, fp);
>        }
>      @@ bundle-uri.c: static int bundle_list_update(const char *key, const char *value,
>       +		if (!strcmp(subkey, "heuristic")) {
>       +			int i;
>       +			for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
>      -+				if (!strcmp(value, heuristics[i])) {
>      -+					list->heuristic = i;
>      ++				if (heuristics[i].heuristic &&
>      ++				    heuristics[i].name &&
>      ++				    !strcmp(value, heuristics[i].name)) {
>      ++					list->heuristic = heuristics[i].heuristic;
>       +					return 0;
>       +				}
>       +			}
>      @@ bundle-uri.h: enum bundle_list_mode {
>       +	BUNDLE_HEURISTIC_CREATIONTOKEN,
>       +
>       +	/* Must be last. */
>      -+	BUNDLE_HEURISTIC__COUNT,
>      ++	BUNDLE_HEURISTIC__COUNT
>       +};
>       +
>        /**
>   3:  a1808f0b10c =  4:  12efa228d04 bundle-uri: parse bundle.<id>.creationToken values
>   4:  57c0174d375 !  5:  7cfaa3c518c bundle-uri: download in creationToken order
>      @@ Commit message
>           strategy implemented here provides that short-circuit where the client
>           downloads a minimal set of bundles.
>       
>      +    However, we are not satisfied by the naive approach of downloading
>      +    bundles until one successfully unbundles, expecting the earlier bundles
>      +    to successfully unbundle now. The example repository in t5558
>      +    demonstrates this well:
>      +
>      +     ---------------- bundle-4
>      +
>      +           4
>      +          / \
>      +     ----|---|------- bundle-3
>      +         |   |
>      +         |   3
>      +         |   |
>      +     ----|---|------- bundle-2
>      +         |   |
>      +         2   |
>      +         |   |
>      +     ----|---|------- bundle-1
>      +          \ /
>      +           1
>      +           |
>      +     (previous commits)
>      +
>      +    In this repository, if we already have the objects for bundle-1 and then
>      +    try to fetch from this list, the naive approach will fail. bundle-4
>      +    requires both bundle-3 and bundle-2, though bundle-3 will successfully
>      +    unbundle without bundle-2. Thus, the algorithm needs to keep this in
>      +    mind.
>      +
>           A later implementation detail will store the maximum creationToken seen
>           during such a bundle download, and the client will avoid downloading a
>           bundle unless its creationToken is strictly greater than that stored
>      @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
>        	return 0;
>        }
>        
>      -+struct sorted_bundle_list {
>      ++struct bundles_for_sorting {
>       +	struct remote_bundle_info **items;
>       +	size_t alloc;
>       +	size_t nr;
>       +};
>       +
>      -+static int insert_bundle(struct remote_bundle_info *bundle, void *data)
>      ++static int append_bundle(struct remote_bundle_info *bundle, void *data)
>       +{
>      -+	struct sorted_bundle_list *list = data;
>      ++	struct bundles_for_sorting *list = data;
>       +	list->items[list->nr++] = bundle;
>       +	return 0;
>       +}
>       +
>      -+static int compare_creation_token(const void *va, const void *vb)
>      ++/**
>      ++ * For use in QSORT() to get a list sorted by creationToken
>      ++ * in decreasing order.
>      ++ */
>      ++static int compare_creation_token_decreasing(const void *va, const void *vb)
>       +{
>       +	const struct remote_bundle_info * const *a = va;
>       +	const struct remote_bundle_info * const *b = vb;
>      @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
>       +				  struct bundle_list *list)
>       +{
>       +	int cur;
>      -+	int pop_or_push = 0;
>      ++	int move_direction = 0;
>       +	struct bundle_list_context ctx = {
>       +		.r = r,
>       +		.list = list,
>       +		.mode = list->mode,
>       +	};
>      -+	struct sorted_bundle_list sorted = {
>      ++	struct bundles_for_sorting bundles = {
>       +		.alloc = hashmap_get_size(&list->bundles),
>       +	};
>       +
>      -+	ALLOC_ARRAY(sorted.items, sorted.alloc);
>      ++	ALLOC_ARRAY(bundles.items, bundles.alloc);
>       +
>      -+	for_all_bundles_in_list(list, insert_bundle, &sorted);
>      ++	for_all_bundles_in_list(list, append_bundle, &bundles);
>       +
>      -+	QSORT(sorted.items, sorted.nr, compare_creation_token);
>      ++	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
>       +
>       +	/*
>      -+	 * Use a stack-based approach to download the bundles and attempt
>      -+	 * to unbundle them in decreasing order by creation token. If we
>      -+	 * fail to unbundle (after a successful download) then move to the
>      -+	 * next non-downloaded bundle (push to the stack) and attempt
>      -+	 * downloading. Once we succeed in applying a bundle, move to the
>      -+	 * previous unapplied bundle (pop the stack) and attempt to unbundle
>      -+	 * it again.
>      ++	 * Attempt to download and unbundle the minimum number of bundles by
>      ++	 * creationToken in decreasing order. If we fail to unbundle (after
>      ++	 * a successful download) then move to the next non-downloaded bundle
>      ++	 * and attempt downloading. Once we succeed in applying a bundle,
>      ++	 * move to the previous unapplied bundle and attempt to unbundle it
>      ++	 * again.
>       +	 *
>       +	 * In the case of a fresh clone, we will likely download all of the
>       +	 * bundles before successfully unbundling the oldest one, then the
>      @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
>       +	 * repo's object store.
>       +	 */
>       +	cur = 0;
>      -+	while (cur >= 0 && cur < sorted.nr) {
>      -+		struct remote_bundle_info *bundle = sorted.items[cur];
>      ++	while (cur >= 0 && cur < bundles.nr) {
>      ++		struct remote_bundle_info *bundle = bundles.items[cur];
>       +		if (!bundle->file) {
>      -+			/* Not downloaded yet. Try downloading. */
>      -+			if (download_bundle_to_file(bundle, &ctx)) {
>      -+				/* Failure. Push to the stack. */
>      -+				pop_or_push = 1;
>      ++			/*
>      ++			 * Not downloaded yet. Try downloading.
>      ++			 *
>      ++			 * Note that bundle->file is non-NULL if a download
>      ++			 * was attempted, even if it failed to download.
>      ++			 */
>      ++			if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
>      ++				/* Mark as unbundled so we do not retry. */
>      ++				bundle->unbundled = 1;
>      ++
>      ++				/* Try looking deeper in the list. */
>      ++				move_direction = 1;
>       +				goto stack_operation;
>       +			}
>       +
>      @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
>       +			 * unbundled. Try unbundling again.
>       +			 */
>       +			if (unbundle_from_file(ctx.r, bundle->file)) {
>      -+				/* Failed to unbundle. Push to stack. */
>      -+				pop_or_push = 1;
>      ++				/* Try looking deeper in the list. */
>      ++				move_direction = 1;
>       +			} else {
>      -+				/* Succeeded in unbundle. Pop stack. */
>      -+				pop_or_push = -1;
>      ++				/*
>      ++				 * Succeeded in unbundle. Retry bundles
>      ++				 * that previously failed to unbundle.
>      ++				 */
>      ++				move_direction = -1;
>      ++				bundle->unbundled = 1;
>       +			}
>       +		}
>       +
>      @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
>       +
>       +stack_operation:
>       +		/* Move in the specified direction and repeat. */
>      -+		cur += pop_or_push;
>      ++		cur += move_direction;
>       +	}
>       +
>      -+	free(sorted.items);
>      ++	free(bundles.items);
>       +
>       +	/*
>       +	 * We succeed if the loop terminates because 'cur' drops below
>      @@ bundle-uri.c: static int fetch_bundle_list_in_config_format(struct repository *r
>       +	 * it advertises are expected to be bundles, not nested lists.
>       +	 * We can drop 'global_list' and 'depth'.
>       +	 */
>      -+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
>      ++	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
>       +		result = fetch_bundles_by_token(r, &list_from_bundle);
>      -+	else if ((result = download_bundle_list(r, &list_from_bundle,
>      ++		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
>      ++	} else if ((result = download_bundle_list(r, &list_from_bundle,
>        					   global_list, depth)))
>        		goto cleanup;
>        
>      @@ bundle-uri.c: int fetch_bundle_list(struct repository *r, struct bundle_list *li
>        	for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
>       
>        ## t/t5558-clone-bundle-uri.sh ##
>      -@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, any mode)' '
>      - '
>      - 
>      - test_expect_success 'clone bundle list (http, creationToken)' '
>      -+	test_when_finished rm -f trace*.txt &&
>      -+
>      - 	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
>      - 	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
>      - 	[bundle]
>       @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (http, creationToken)' '
>      - 		creationToken = 4
>      - 	EOF
>      + 	git -C clone-list-http-2 cat-file --batch-check <oids &&
>        
>      --	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http-2 &&
>      -+	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
>      -+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
>      -+		clone-from clone-list-http-2 &&
>      - 
>      - 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
>      --	git -C clone-list-http-2 cat-file --batch-check <oids
>      -+	git -C clone-list-http-2 cat-file --batch-check <oids &&
>      -+
>      -+	for b in 1 2 3 4
>      -+	do
>      -+		test_bundle_downloaded bundle-$b.bundle trace-clone.txt ||
>      -+			return 1
>      -+	done
>      + 	cat >expect <<-EOF &&
>      +-	$HTTPD_URL/bundle-1.bundle
>      +-	$HTTPD_URL/bundle-2.bundle
>      +-	$HTTPD_URL/bundle-3.bundle
>      ++	$HTTPD_URL/bundle-list
>      + 	$HTTPD_URL/bundle-4.bundle
>      ++	$HTTPD_URL/bundle-3.bundle
>      ++	$HTTPD_URL/bundle-2.bundle
>      ++	$HTTPD_URL/bundle-1.bundle
>      ++	EOF
>      ++
>      ++	test_remote_https_urls <trace-clone.txt >actual &&
>      ++	test_cmp expect actual
>       +'
>       +
>      -+test_expect_success 'clone bundle list (http, creationToken)' '
>      ++test_expect_success 'clone incomplete bundle list (http, creationToken)' '
>       +	test_when_finished rm -f trace*.txt &&
>       +
>       +	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
>      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (http, creat
>       +	[bundle "bundle-1"]
>       +		uri = bundle-1.bundle
>       +		creationToken = 1
>      -+
>      -+	[bundle "bundle-2"]
>      -+		uri = bundle-2.bundle
>      -+		creationToken = 2
>       +	EOF
>       +
>       +	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
>       +	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
>      -+		clone-from clone-token-http &&
>      ++		--single-branch --branch=base --no-tags \
>      ++		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
>       +
>      -+	test_bundle_downloaded bundle-1.bundle trace-clone.txt &&
>      -+	test_bundle_downloaded bundle-2.bundle trace-clone.txt
>      ++	cat >expect <<-EOF &&
>      + 	$HTTPD_URL/bundle-list
>      ++	$HTTPD_URL/bundle-1.bundle
>      + 	EOF
>      + 
>      +-	# Since the creationToken heuristic is not yet understood by the
>      +-	# client, the order cannot be verified at this moment. Sort the
>      +-	# list for consistent results.
>      +-	test_remote_https_urls <trace-clone.txt | sort >actual &&
>      ++	test_remote_https_urls <trace-clone.txt >actual &&
>      + 	test_cmp expect actual
>        '
>        
>      - # Do not add tests here unless they use the HTTP server, as they will
>       
>        ## t/t5601-clone.sh ##
>       @@ t/t5601-clone.sh: test_expect_success 'auto-discover multiple bundles from HTTP clone' '
>        	grep -f pattern trace.txt
>        '
>        
>      -+# Usage: test_bundle_downloaded <bundle-id> <trace-filename>
>      -+test_bundle_downloaded () {
>      -+	cat >pattern <<-EOF &&
>      -+	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/$1.bundle"\]
>      -+	EOF
>      -+	grep -f pattern "$2"
>      -+}
>      -+
>       +test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '
>       +	test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
>       +	test_when_finished rm -rf clone-heuristic trace*.txt &&
>      @@ t/t5601-clone.sh: test_expect_success 'auto-discover multiple bundles from HTTP
>       +		    -c transfer.bundleURI=true clone \
>       +		"$HTTPD_URL/smart/repo4.git" clone-heuristic &&
>       +
>      -+	# We should fetch all bundles
>      -+	for b in everything new newest
>      -+	do
>      -+		test_bundle_downloaded $b trace-clone.txt || return 1
>      -+	done
>      ++	cat >expect <<-EOF &&
>      ++	$HTTPD_URL/newest.bundle
>      ++	$HTTPD_URL/new.bundle
>      ++	$HTTPD_URL/everything.bundle
>      ++	EOF
>      ++
>      ++	# We should fetch all bundles in the expected order.
>      ++	test_remote_https_urls <trace-clone.txt >actual &&
>      ++	test_cmp expect actual
>       +'
>       +
>        # DO NOT add non-httpd-specific tests here, because the last part of this
>   5:  d9c6f50e4f2 !  6:  17c404c1b83 clone: set fetch.bundleURI if appropriate
>      @@ Documentation/config/fetch.txt: fetch.writeCommitGraph::
>        	`git push -f`, and `git log --graph`. Defaults to false.
>       +
>       +fetch.bundleURI::
>      -+	This value stores a URI for fetching Git object data from a bundle URI
>      -+	before performing an incremental fetch from the origin Git server. If
>      -+	the value is `<uri>` then running `git fetch <args>` is equivalent to
>      -+	first running `git fetch --bundle-uri=<uri>` immediately before
>      -+	`git fetch <args>`. See details of the `--bundle-uri` option in
>      -+	linkgit:git-fetch[1].
>      ++	This value stores a URI for downloading Git object data from a bundle
>      ++	URI before performing an incremental fetch from the origin Git server.
>      ++	This is similar to how the `--bundle-uri` option behaves in
>      ++	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
>      ++	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
>      ++	list that is organized for incremental fetches.
>       
>        ## builtin/clone.c ##
>       @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
>      @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
>        	strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
>       
>        ## bundle-uri.c ##
>      -@@ bundle-uri.c: static int fetch_bundle_list_in_config_format(struct repository *r,
>      - 	 * it advertises are expected to be bundles, not nested lists.
>      - 	 * We can drop 'global_list' and 'depth'.
>      - 	 */
>      --	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
>      -+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
>      - 		result = fetch_bundles_by_token(r, &list_from_bundle);
>      --	else if ((result = download_bundle_list(r, &list_from_bundle,
>      -+		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
>      -+	} else if ((result = download_bundle_list(r, &list_from_bundle,
>      - 					   global_list, depth)))
>      - 		goto cleanup;
>      - 
>       @@ bundle-uri.c: static int unlink_bundle(struct remote_bundle_info *info, void *data)
>        	return 0;
>        }
>      @@ bundle-uri.h: int bundle_uri_parse_config_format(const char *uri,
>         * Given a bundle list that was already advertised (likely by the
>       
>        ## t/t5558-clone-bundle-uri.sh ##
>      -@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (http, creationToken)' '
>      - 	test_bundle_downloaded bundle-2.bundle trace-clone.txt
>      +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
>      + 		--single-branch --branch=base --no-tags \
>      + 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
>      + 
>      ++	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
>      ++
>      + 	cat >expect <<-EOF &&
>      + 	$HTTPD_URL/bundle-list
>      + 	$HTTPD_URL/bundle-1.bundle
>      +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
>      + 	test_cmp expect actual
>        '
>        
>       +test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
>      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (http, creat
>       +
>       +	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
>       +
>      -+	# The clone should copy two files: the list and bundle-1.
>      -+	test_bundle_downloaded bundle-list trace-clone.txt &&
>      -+	test_bundle_downloaded bundle-1.bundle trace-clone.txt &&
>      ++	cat >expect <<-EOF &&
>      ++	$HTTPD_URL/bundle-list
>      ++	$HTTPD_URL/bundle-1.bundle
>      ++	EOF
>      ++
>      ++	test_remote_https_urls <trace-clone.txt >actual &&
>      ++	test_cmp expect actual &&
>       +
>       +	# only received base ref from bundle-1
>       +	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
>   6:  afcfd27a883 =  7:  d491070efed bundle-uri: drop bundle.flag from design doc
>   7:  1627fc158b1 !  8:  59e57e04968 fetch: fetch from an external bundle URI
>      @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
>        	if (dry_run)
>        		write_fetch_head = 0;
>        
>      -+	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri) &&
>      -+	    !starts_with(bundle_uri, "remote:")) {
>      ++	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri)) {
>       +		if (fetch_bundle_uri(the_repository, bundle_uri, NULL))
>       +			warning(_("failed to fetch bundles from '%s'"), bundle_uri);
>       +	}
>      @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
>        			die(_("fetch --all does not take a repository argument"));
>       
>        ## t/t5558-clone-bundle-uri.sh ##
>      +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
>      + 	EOF
>      + 
>      + 	test_remote_https_urls <trace-clone.txt >actual &&
>      +-	test_cmp expect actual
>      ++	test_cmp expect actual &&
>      ++
>      ++	# We now have only one bundle ref.
>      ++	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
>      ++	cat >expect <<-\EOF &&
>      ++	refs/bundles/base
>      ++	EOF
>      ++	test_cmp expect refs &&
>      ++
>      ++	# Add remaining bundles, exercising the "deepening" strategy
>      ++	# for downloading via the creationToken heurisitc.
>      ++	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
>      ++	[bundle "bundle-2"]
>      ++		uri = bundle-2.bundle
>      ++		creationToken = 2
>      ++
>      ++	[bundle "bundle-3"]
>      ++		uri = bundle-3.bundle
>      ++		creationToken = 3
>      ++
>      ++	[bundle "bundle-4"]
>      ++		uri = bundle-4.bundle
>      ++		creationToken = 4
>      ++	EOF
>      ++
>      ++	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
>      ++		git -C clone-token-http fetch origin --no-tags \
>      ++		refs/heads/merge:refs/heads/merge &&
>      ++
>      ++	cat >expect <<-EOF &&
>      ++	$HTTPD_URL/bundle-list
>      ++	$HTTPD_URL/bundle-4.bundle
>      ++	$HTTPD_URL/bundle-3.bundle
>      ++	$HTTPD_URL/bundle-2.bundle
>      ++	EOF
>      ++
>      ++	test_remote_https_urls <trace1.txt >actual &&
>      ++	test_cmp expect actual &&
>      ++
>      ++	# We now have all bundle refs.
>      ++	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
>      ++
>      ++	cat >expect <<-\EOF &&
>      ++	refs/bundles/base
>      ++	refs/bundles/left
>      ++	refs/bundles/merge
>      ++	refs/bundles/right
>      ++	EOF
>      ++	test_cmp expect refs
>      + '
>      + 
>      + test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
>       @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
>        	cat >expect <<-\EOF &&
>        	refs/bundles/base
>      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
>       +		refs/heads/left:refs/heads/left \
>       +		refs/heads/right:refs/heads/right &&
>       +
>      -+	# This fetch should copy two files: the list and bundle-2.
>      -+	test_bundle_downloaded bundle-list trace1.txt &&
>      -+	test_bundle_downloaded bundle-2.bundle trace1.txt &&
>      -+	! test_bundle_downloaded bundle-1.bundle trace1.txt &&
>      ++	cat >expect <<-EOF &&
>      ++	$HTTPD_URL/bundle-list
>      ++	$HTTPD_URL/bundle-2.bundle
>      ++	EOF
>      ++
>      ++	test_remote_https_urls <trace1.txt >actual &&
>      ++	test_cmp expect actual &&
>       +
>       +	# received left from bundle-2
>       +	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
>      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
>       +		creationToken = 4
>       +	EOF
>       +
>      -+	# This fetch should skip bundle-3.bundle, since its objets are
>      ++	# This fetch should skip bundle-3.bundle, since its objects are
>       +	# already local (we have the requisite commits for bundle-4.bundle).
>       +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
>       +		git -C fetch-http-4 fetch origin --no-tags \
>       +		refs/heads/merge:refs/heads/merge &&
>       +
>      -+	# This fetch should copy three files: the list, bundle-3, and bundle-4.
>      -+	test_bundle_downloaded bundle-list trace2.txt &&
>      -+	test_bundle_downloaded bundle-4.bundle trace2.txt &&
>      -+	! test_bundle_downloaded bundle-1.bundle trace2.txt &&
>      -+	! test_bundle_downloaded bundle-2.bundle trace2.txt &&
>      -+	! test_bundle_downloaded bundle-3.bundle trace2.txt &&
>      ++	cat >expect <<-EOF &&
>      ++	$HTTPD_URL/bundle-list
>      ++	$HTTPD_URL/bundle-4.bundle
>      ++	EOF
>      ++
>      ++	test_remote_https_urls <trace2.txt >actual &&
>      ++	test_cmp expect actual &&
>       +
>       +	# received merge ref from bundle-4, but right is missing
>       +	# because we did not download bundle-3.
>   8:  51f210ddeb4 !  9:  6a1504b1c3a bundle-uri: store fetch.bundleCreationToken
>      @@ Commit message
>           When checking the same bundle list twice, this strategy requires
>           downloading the bundle with the maximum creationToken again, which is
>           wasteful. The creationToken heuristic promises that the client will not
>      -    have a use for that bundle if its creationToken value is the at most the
>      +    have a use for that bundle if its creationToken value is at most the
>           previous creationToken value.
>       
>           To prevent these wasteful downloads, create a fetch.bundleCreationToken
>      @@ Commit message
>       
>        ## Documentation/config/fetch.txt ##
>       @@ Documentation/config/fetch.txt: fetch.bundleURI::
>      - 	first running `git fetch --bundle-uri=<uri>` immediately before
>      - 	`git fetch <args>`. See details of the `--bundle-uri` option in
>      - 	linkgit:git-fetch[1].
>      + 	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
>      + 	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
>      + 	list that is organized for incremental fetches.
>      +++
>      ++If you modify this value and your repository has a `fetch.bundleCreationToken`
>      ++value, then remove that `fetch.bundleCreationToken` value before fetching from
>      ++the new bundle URI.
>       +
>       +fetch.bundleCreationToken::
>       +	When using `fetch.bundleURI` to fetch incrementally from a bundle
>      @@ Documentation/config/fetch.txt: fetch.bundleURI::
>       +	This value is used to prevent downloading bundles in the future
>       +	if the advertised `creationToken` is not strictly larger than this
>       +	value.
>      +++
>      ++The creation token values are chosen by the provider serving the specific
>      ++bundle URI. If you modify the URI at `fetch.bundleURI`, then be sure to
>      ++remove the value for the `fetch.bundleCreationToken` value before fetching.
>       
>        ## bundle-uri.c ##
>       @@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
>        {
>        	int cur;
>      - 	int pop_or_push = 0;
>      + 	int move_direction = 0;
>       +	const char *creationTokenStr;
>      -+	uint64_t maxCreationToken;
>      ++	uint64_t maxCreationToken = 0, newMaxCreationToken = 0;
>        	struct bundle_list_context ctx = {
>        		.r = r,
>        		.list = list,
>       @@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
>        
>      - 	for_all_bundles_in_list(list, insert_bundle, &sorted);
>      + 	for_all_bundles_in_list(list, append_bundle, &bundles);
>        
>      -+	if (!sorted.nr) {
>      -+		free(sorted.items);
>      ++	if (!bundles.nr) {
>      ++		free(bundles.items);
>       +		return 0;
>       +	}
>       +
>      - 	QSORT(sorted.items, sorted.nr, compare_creation_token);
>      + 	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
>        
>       +	/*
>       +	 * If fetch.bundleCreationToken exists, parses to a uint64t, and
>      @@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
>       +				   "fetch.bundlecreationtoken",
>       +				   &creationTokenStr) &&
>       +	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
>      -+	    sorted.items[0]->creationToken <= maxCreationToken) {
>      -+		free(sorted.items);
>      ++	    bundles.items[0]->creationToken <= maxCreationToken) {
>      ++		free(bundles.items);
>       +		return 0;
>       +	}
>       +
>        	/*
>      - 	 * Use a stack-based approach to download the bundles and attempt
>      - 	 * to unbundle them in decreasing order by creation token. If we
>      + 	 * Attempt to download and unbundle the minimum number of bundles by
>      + 	 * creationToken in decreasing order. If we fail to unbundle (after
>      +@@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
>      + 	cur = 0;
>      + 	while (cur >= 0 && cur < bundles.nr) {
>      + 		struct remote_bundle_info *bundle = bundles.items[cur];
>      ++
>      ++		/*
>      ++		 * If we need to dig into bundles below the previous
>      ++		 * creation token value, then likely we are in an erroneous
>      ++		 * state due to missing or invalid bundles. Halt the process
>      ++		 * instead of continuing to download extra data.
>      ++		 */
>      ++		if (bundle->creationToken <= maxCreationToken)
>      ++			break;
>      ++
>      + 		if (!bundle->file) {
>      + 			/*
>      + 			 * Not downloaded yet. Try downloading.
>      +@@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
>      + 				 */
>      + 				move_direction = -1;
>      + 				bundle->unbundled = 1;
>      ++
>      ++				if (bundle->creationToken > newMaxCreationToken)
>      ++					newMaxCreationToken = bundle->creationToken;
>      + 			}
>      + 		}
>      + 
>       @@ bundle-uri.c: stack_operation:
>      - 		cur += pop_or_push;
>      + 		cur += move_direction;
>        	}
>        
>      --	free(sorted.items);
>      +-	free(bundles.items);
>       -
>        	/*
>        	 * We succeed if the loop terminates because 'cur' drops below
>      @@ bundle-uri.c: stack_operation:
>        	 */
>       +	if (cur < 0) {
>       +		struct strbuf value = STRBUF_INIT;
>      -+		strbuf_addf(&value, "%"PRIu64"", sorted.items[0]->creationToken);
>      ++		strbuf_addf(&value, "%"PRIu64"", newMaxCreationToken);
>       +		if (repo_config_set_multivar_gently(ctx.r,
>       +						    "fetch.bundleCreationToken",
>       +						    value.buf, NULL, 0))
>      @@ bundle-uri.c: stack_operation:
>       +		strbuf_release(&value);
>       +	}
>       +
>      -+	free(sorted.items);
>      ++	free(bundles.items);
>        	return cur >= 0;
>        }
>        
>       
>        ## t/t5558-clone-bundle-uri.sh ##
>      +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
>      + 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
>      + 
>      + 	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
>      ++	test_cmp_config -C clone-token-http 1 fetch.bundlecreationtoken &&
>      + 
>      + 	cat >expect <<-EOF &&
>      + 	$HTTPD_URL/bundle-list
>      +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
>      + 	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
>      + 		git -C clone-token-http fetch origin --no-tags \
>      + 		refs/heads/merge:refs/heads/merge &&
>      ++	test_cmp_config -C clone-token-http 4 fetch.bundlecreationtoken &&
>      + 
>      + 	cat >expect <<-EOF &&
>      + 	$HTTPD_URL/bundle-list
>       @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
>        		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
>        
>        	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
>       +	test_cmp_config -C fetch-http-4 1 fetch.bundlecreationtoken &&
>        
>      - 	# The clone should copy two files: the list and bundle-1.
>      - 	test_bundle_downloaded bundle-list trace-clone.txt &&
>      + 	cat >expect <<-EOF &&
>      + 	$HTTPD_URL/bundle-list
>       @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
>      + 		git -C fetch-http-4 fetch origin --no-tags \
>        		refs/heads/left:refs/heads/left \
>        		refs/heads/right:refs/heads/right &&
>      - 
>       +	test_cmp_config -C fetch-http-4 2 fetch.bundlecreationtoken &&
>      -+
>      - 	# This fetch should copy two files: the list and bundle-2.
>      - 	test_bundle_downloaded bundle-list trace1.txt &&
>      - 	test_bundle_downloaded bundle-2.bundle trace1.txt &&
>      + 
>      + 	cat >expect <<-EOF &&
>      + 	$HTTPD_URL/bundle-list
>       @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
>        	EOF
>        	test_cmp expect refs &&
>      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
>       +		git -C fetch-http-4 fetch origin --no-tags \
>       +		refs/heads/left:refs/heads/left \
>       +		refs/heads/right:refs/heads/right &&
>      -+	test_bundle_downloaded bundle-list trace1b.txt &&
>      -+	! test_bundle_downloaded bundle-1.bundle trace1b.txt &&
>      -+	! test_bundle_downloaded bundle-2.bundle trace1b.txt &&
>      ++
>      ++	cat >expect <<-EOF &&
>      ++	$HTTPD_URL/bundle-list
>      ++	EOF
>      ++	test_remote_https_urls <trace1b.txt >actual &&
>      ++	test_cmp expect actual &&
>       +
>        	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
>        	[bundle "bundle-3"]
>        		uri = bundle-3.bundle
>       @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
>      + 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
>        		git -C fetch-http-4 fetch origin --no-tags \
>        		refs/heads/merge:refs/heads/merge &&
>      - 
>       +	test_cmp_config -C fetch-http-4 4 fetch.bundlecreationtoken &&
>      -+
>      - 	# This fetch should copy three files: the list, bundle-3, and bundle-4.
>      - 	test_bundle_downloaded bundle-list trace2.txt &&
>      - 	test_bundle_downloaded bundle-4.bundle trace2.txt &&
>      + 
>      + 	cat >expect <<-EOF &&
>      + 	$HTTPD_URL/bundle-list
>       @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
>        	refs/bundles/left
>        	refs/bundles/merge
>      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
>       +	# No-op fetch
>       +	GIT_TRACE2_EVENT="$(pwd)/trace2b.txt" \
>       +		git -C fetch-http-4 fetch origin &&
>      -+	test_bundle_downloaded bundle-list trace2b.txt &&
>      -+	! test_bundle_downloaded bundle-1.bundle trace2b.txt &&
>      -+	! test_bundle_downloaded bundle-2.bundle trace2b.txt &&
>      -+	! test_bundle_downloaded bundle-3.bundle trace2b.txt &&
>      -+	! test_bundle_downloaded bundle-4.bundle trace2b.txt
>      ++
>      ++	cat >expect <<-EOF &&
>      ++	$HTTPD_URL/bundle-list
>      ++	EOF
>      ++	test_remote_https_urls <trace2b.txt >actual &&
>      ++	test_cmp expect actual
>        '
>        
>        # Do not add tests here unless they use the HTTP server, as they will
>   -:  ----------- > 10:  676522615ad bundle-uri: test missing bundles with heuristic
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 05/10] bundle-uri: download in creationToken order
  2023-01-27 19:17     ` Victoria Dye
@ 2023-01-27 19:32       ` Junio C Hamano
  2023-01-30 18:43         ` Derrick Stolee
  0 siblings, 1 reply; 74+ messages in thread
From: Junio C Hamano @ 2023-01-27 19:32 UTC (permalink / raw)
  To: Victoria Dye
  Cc: Derrick Stolee via GitGitGadget, git, me, avarab, steadmon,
	chooglen, Derrick Stolee

Victoria Dye <vdye@github.com> writes:

>> +			/*
>> +			 * Not downloaded yet. Try downloading.
>> +			 *
>> +			 * Note that bundle->file is non-NULL if a download
>> +			 * was attempted, even if it failed to download.
>> +			 */
>> +			if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
>> +				/* Mark as unbundled so we do not retry. */
>> +				bundle->unbundled = 1;
>
> This implicitly shows that, unlike a failed unbundling, a failed download is
> always erroneous behavior, with the added benefit of avoiding (potentially
> expensive) download re-attempts.

Hmph, I somehow was hoping that we'd allow an option to use range
requests to resume an interrupted download in the future, so
outright "always avoid attempts to download again" may not be what
we want in the longer run.  But being able to tell if download
failed (and there will probably be more than "success/failure" bit,
but something like "we got an explicit 401 not found" vs "we were
disconnected after downloading a few megabytes"), and unbundling
failed (where there is no point attempting) is a good idea.

>>  	cat >expect <<-EOF &&
>> -	$HTTPD_URL/bundle-1.bundle
>> -	$HTTPD_URL/bundle-2.bundle
>> -	$HTTPD_URL/bundle-3.bundle
>> +	$HTTPD_URL/bundle-list
>>  	$HTTPD_URL/bundle-4.bundle
>> +	$HTTPD_URL/bundle-3.bundle
>> +	$HTTPD_URL/bundle-2.bundle
>> +	$HTTPD_URL/bundle-1.bundle
>> +	EOF
>
> Ooh, interesting - using the new "test_remote_https_urls", these tests now
> also verify that the bundles were downloaded in decreasing order when using
> the 'creationToken' heuristic. That's a nice extra confirmation that the
> heuristic is working as intended.

Yes, that indeed is very nice.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 05/10] bundle-uri: download in creationToken order
  2023-01-27 19:32       ` Junio C Hamano
@ 2023-01-30 18:43         ` Derrick Stolee
  2023-01-30 19:02           ` Junio C Hamano
  0 siblings, 1 reply; 74+ messages in thread
From: Derrick Stolee @ 2023-01-30 18:43 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye
  Cc: Derrick Stolee via GitGitGadget, git, me, avarab, steadmon, chooglen

On 1/27/2023 2:32 PM, Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
> 
>>> +			/*
>>> +			 * Not downloaded yet. Try downloading.
>>> +			 *
>>> +			 * Note that bundle->file is non-NULL if a download
>>> +			 * was attempted, even if it failed to download.
>>> +			 */
>>> +			if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
>>> +				/* Mark as unbundled so we do not retry. */
>>> +				bundle->unbundled = 1;
>>
>> This implicitly shows that, unlike a failed unbundling, a failed download is
>> always erroneous behavior, with the added benefit of avoiding (potentially
>> expensive) download re-attempts.
> 
> Hmph, I somehow was hoping that we'd allow an option to use range
> requests to resume an interrupted download in the future, so
> outright "always avoid attempts to download again" may not be what
> we want in the longer run.  But being able to tell if download
> failed (and there will probably be more than "success/failure" bit,
> but something like "we got an explicit 401 not found" vs "we were
> disconnected after downloading a few megabytes"), and unbundling
> failed (where there is no point attempting) is a good idea.

I think there are two possible directions we can have when talking
about interrupted downloads:

1. The network connection was disconnected, and the client may want
   to respond to that with a retry and a ranged request.

2. The client process itself terminates for some reason, and a
   second process recognizes that some of the data already exists
   and could be used for a range request of the remainder.

I think both of these would not be handled at this layer, but
instead further down, inside fetch_bundle_uri_internal()
(specifically further down in download_https_uri_to_file()).

Any retry logic should happen there, closer to the connection,
and at the layer of the current patch, we should assume that any
retry logic that was attempted ended up failing in the end.

Does that satisfy your concerns here?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 10/10] bundle-uri: test missing bundles with heuristic
  2023-01-27 19:21     ` Victoria Dye
@ 2023-01-30 18:47       ` Derrick Stolee
  0 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee @ 2023-01-30 18:47 UTC (permalink / raw)
  To: Victoria Dye, Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, avarab, steadmon, chooglen

On 1/27/2023 2:21 PM, Victoria Dye wrote:
> Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>

>> +	# Only base bundle unbundled.
>> +	git -C download-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
>> +	cat >expect <<-EOF &&
>> +	refs/bundles/base
>> +	refs/bundles/right
>> +	EOF
>> +	test_cmp expect refs &&
> 
> Maybe I'm misreading, but I don't think the comment ("Only base bundle
> unbundled") lines up with the expected bundle refs (both bundle-1
> ('refs/bundles/base') and bundle-3 ('refs/bundles/right') seem to be
> unbundled). 

>> +	# All bundles failed to unbundle
>> +	git -C download-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
>> +	cat >expect <<-EOF &&
>> +	refs/bundles/base
>> +	refs/bundles/left
>> +	refs/bundles/right
>> +	EOF
>> +	test_cmp expect refs
> 
> Similar issue with the comment here - it says that all bundles *failed* to
> unbundle, but the test case description ("Case 3: top bundle does not exist,
> rest unbundle fine.") and the result show bundle-1, bundle-2, and bundle-3
> all unbundling successfully.

Thank you for reading carefully. I'm sorry about not updating
the comments after copying these checks around the test file.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 05/10] bundle-uri: download in creationToken order
  2023-01-30 18:43         ` Derrick Stolee
@ 2023-01-30 19:02           ` Junio C Hamano
  2023-01-30 19:12             ` Derrick Stolee
  0 siblings, 1 reply; 74+ messages in thread
From: Junio C Hamano @ 2023-01-30 19:02 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Victoria Dye, Derrick Stolee via GitGitGadget, git, me, avarab,
	steadmon, chooglen

Derrick Stolee <derrickstolee@github.com> writes:

> I think there are two possible directions we can have when talking
> about interrupted downloads:
>
> 1. The network connection was disconnected, and the client may want
>    to respond to that with a retry and a ranged request.
>
> 2. The client process itself terminates for some reason, and a
>    second process recognizes that some of the data already exists
>    and could be used for a range request of the remainder.
>
> I think both of these would not be handled at this layer, but
> instead further down, inside fetch_bundle_uri_internal()
> (specifically further down in download_https_uri_to_file()).
>
> Any retry logic should happen there, closer to the connection,
> and at the layer of the current patch, we should assume that any
> retry logic that was attempted ended up failing in the end.
>
> Does that satisfy your concerns here?

Mostly.  We probably do not want / need to cater to "I killed it
with ^C and would want to continue".

Thanks.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 05/10] bundle-uri: download in creationToken order
  2023-01-30 19:02           ` Junio C Hamano
@ 2023-01-30 19:12             ` Derrick Stolee
  0 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee @ 2023-01-30 19:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Victoria Dye, Derrick Stolee via GitGitGadget, git, me, avarab,
	steadmon, chooglen

On 1/30/2023 2:02 PM, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>> I think there are two possible directions we can have when talking
>> about interrupted downloads:
>>
>> 1. The network connection was disconnected, and the client may want
>>    to respond to that with a retry and a ranged request.
>>
>> 2. The client process itself terminates for some reason, and a
>>    second process recognizes that some of the data already exists
>>    and could be used for a range request of the remainder.
... 
> Mostly.  We probably do not want / need to cater to "I killed it
> with ^C and would want to continue".

I mention it because it has been mentioned to me as an example use
case. This includes other failures, such as power outages.

I don't think we have range requests enabled for the longer-lived
packfile-uri, but we shall see if adoption of bundle URIs presents
more motivation to build that feature. Case (1) will be the easiest
to consider, and I agree that it would be more generally useful.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH v3 00/11] Bundle URIs V: creationToken heuristic for incremental fetches
  2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
                     ` (10 preceding siblings ...)
  2023-01-27 19:28   ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Victoria Dye
@ 2023-01-31 13:29   ` Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 01/11] bundle: test unbundling with incomplete history Derrick Stolee via GitGitGadget
                       ` (10 more replies)
  11 siblings, 11 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee

This fifth part to the bundle URIs feature follows part IV (advertising via
protocol v2) which recently merged to 'master', so this series is based on
'master'.

This part introduces the concept of a heuristic that a bundle list can
advertise. The purpose of the heuristic is to hint to the Git client that
the bundles can be downloaded and unbundled in a certain order. In
particular, that order can assist with using the same bundle URI to download
new bundles from an updated bundle list. This allows bundle URIs to assist
with incremental fetches, not just initial clones.

The only planned heuristic is the "creationToken" heuristic where the bundle
list adds a 64-bit unsigned integer "creationToken" value to each bundle in
the list. Those values provide an ordering on the bundles implying that the
bundles can be unbundled in increasing creationToken order and at each point
the required commits for the ith bundle were provided by bundles with lower
creationTokens.

At clone time, the only difference implied by the creationToken order is
that the Git client does not need to guess at the order to apply the
bundles, but instead can use the creationToken order to apply them without
failure and retry. However, this presents an interesting benefit during
fetches: the Git client can check the bundle list and download bundles in
decreasing creationToken order until the required commits for these bundles
are present within the repository's object store. This prevents downloading
more bundle information than required.

The creationToken value is also a promise that the Git client will not need
to download a bundle if its creationToken is less than or equal to the
creationToken of a previously-downloaded bundle. This further improves the
performance during a fetch in that the client does not need to download any
bundles at all if it recognizes that the maximum creationToken is the same
(or smaller than) a previously-downloaded creationToken.

The creationToken concept is documented in the existing design document at
Documentation/technical/bundle-uri.txt, including suggested ways for bundle
providers to organize their bundle lists to take advantage of the heuristic.

This series formalizes the creationToken heuristic and the Git client logic
for understanding it. Further, for bundle lists provided by the git clone
--bundle-uri option, the Git client will recognize the heuristic as being
helpful for incremental fetches and store config values so that future git
fetch commands check the bundle list before communicating with any Git
remotes.

Note that this option does not integrate fetches with bundle lists
advertised via protocol v2. I spent some time working on this, but found the
implementation to be distinct enough that it merited its own attention in a
separate series. In particular, the configuration for indicating that a
fetch should check the bundle-uri protocol v2 command seemed best to be
located within a Git remote instead of a repository-global key such as is
being used for a static URI. Further, the timing of querying the bundle-uri
command during a git fetch command is significantly different and more
complicated than how it is used in git clone.


What Remains?
=============

Originally, I had planned on making this bundle URI work a 5-part series,
and this is part 5. Shouldn't we be done now?

There are two main things that should be done after this series, in any
order:

 * Teach git fetch to check a bundle list advertised by a remote over the
   bundle-uri protocol v2 command.
 * Add the bundle.<id>.filter option to allow advertising bundles and
   partial bundles side-by-side.

There is also room for expanding tests for more error conditions, or for
other tweaks that are not currently part of the design document. I do think
that after this series, the feature will be easier to work on different
parts in parallel.


Patch Outline
=============

 * (New in v3) Patch 1 tests the behavior of 'git bundle verify' and 'git
   bundle unbundle' when in the strange situation where a prerequisite
   commit exists in the object store but is not closed under reachability
   (necessarily not reachable from refs, too). This helps motivate the new
   Patch 2.
 * (New in v3) Patch 2 updates the behavior in verify_bundle() to use the
   check_connected()
 * Patch 3 creates a test setup demonstrating a creationToken heuristic. At
   this point, the Git client ignores the heuristic and uses its ad-hoc
   strategy for ordering the bundles.
 * Patches 4 and 5 teach Git to parse the bundle.heuristic and
   bundle.<id>.creationToken keys in a bundle list.
 * Patch 6 teaches Git to download bundles using the creationToken order.
   This order uses a stack approach to start from the maximum creationToken
   and continue downloading the next bundle in the list until all bundles
   can successfully be unbundled. This is the algorithm required for
   incremental fetches, while initial clones could download in the opposite
   order. Since clones will download all bundles anyway, having a second
   code path just for clones seemed unnecessary.
 * Patch 7 teaches git clone --bundle-uri to set fetch.bundleURI when the
   advertised bundle list includs a heuristic that Git understands.
 * Patch 8 updates the design document to remove reference to a bundle.flag
   option that was previously going to indicate the list was designed for
   fetches, but the bundle.heuristic option already does that.
 * Patch 9 teaches git fetch to check fetch.bundleURI and download bundles
   from that static URI before connecting to remotes via the Git protocol.
 * Patch 10 introduces a new fetch.bundleCreationToken config value to store
   the maximum creationToken of downloaded bundles. This prevents
   downloading the latest bundle on every git fetch command, reducing waste.
 * Patch 11 adds new tests for interesting incremental fetch shapes. Along
   with other test edits in other patches, these revealed several issues
   that required improvement within this series. These tests also check
   extra cases around failed bundle downloads.


Updates in v3
=============

 * Patches 1 and 2 are replacements for v3's patch 1. Instead of skipping
   the reachability walk, make it slightly more flexible by using
   check_connected(). The first patch adds tests that cover this behavior,
   which was previously untested.
 * Patch 6 replaces the "stack_operation" label with a "move" label.
 * Patch 9 simplifies nested ifs to use &&.
 * Patch 11 updates some incorrect test comments.


Updates in v2
=============

 * Patches 1 and 10 are new.
 * I started making the extra tests in patch 10 due to Victoria's concern
   around failed downloads. I extended the bundle list in a way that exposed
   other issues that are fixed in this version. Unfortunately, the test
   requires the full functionality of the entire series, so the tests are
   not isolated to where the code fixes are made. One thing that I noticed
   in the process is that some of the tests were using the local-clone trick
   to copy full object directories instead of copying only the requested
   object set. This was causing confusion in how the bundles were applying
   or failing to apply, so the tests are updated to use http whenever
   possible.
 * In Patch 2, I created a new test_remote_https_urls helper to get the full
   download list (in order). In this patch, the bundle download order is not
   well-defined, but is modified in later tests when it becomes
   well-defined.
 * In Patch 3, I updated the connection between config value and enum value
   to be an array of pairs instead of faking a hashmap-like interface that
   could be dangerous if the enum values were assigned incorrectly.
 * In Patch 5, the 'sorted' list and its type was renamed to be more
   descriptive. This also included updates to "append_bundle()" and
   "compare_creation_token_decreasing()" to be more descriptive. This had
   some side effects in Patch 8 due to the renames.
 * In Patch 5, I added the interesting bundle shape to the commit message to
   remind us of why the creationToken algorithm needs to be the way it is. I
   also removed the "stack" language in favor of discussing ranges of the
   sorted list. Some renames, such as "pop_or_push" is changed to
   "move_direction", resulted from this change of language.
 * The assignment of heuristic from the local list to global_list was moved
   into Patch 5.
 * In Patch 5, one of the tests removed bundle-2 because it allows a later
   test for git fetch to demonstrate the interesting behavior where bundle-4
   requires both bundle-2 and bundle-3.
 * In Patch 6, the fetch.bundleURI config is described differently,
   including dropping the defunct git fetch --bundle-uri reference and
   discussing that git clone --bundle-uri will set it automatically.
 * Patch 8 no longer refers to a config value starting with "remote:". It
   also expands a test that was previously not expanded in v1.
 * Patch 9 updates the documentation for fetch.bundleURI and
   fetch.bundleCreationToken to describe how the user should unset the
   latter if they edit the former.
 * Much of Patch 9's changes are due to context changes from the renames in
   Patch 5. However, it also adds the restriction that it will not attempt
   to download bundles unless their creationToken is strictly greater than
   the stored token. This ends up being critical to the failed download
   case, preventing an incremental fetch from downloading all bundles just
   because one bundle failed to download (and that case is tested in patch
   10).
 * Patch 10 adds significant testing, including several tests of failed
   bundle downloads in various cases.

Thanks,

 * Stolee

Derrick Stolee (11):
  bundle: test unbundling with incomplete history
  bundle: verify using check_connected()
  t5558: add tests for creationToken heuristic
  bundle-uri: parse bundle.heuristic=creationToken
  bundle-uri: parse bundle.<id>.creationToken values
  bundle-uri: download in creationToken order
  clone: set fetch.bundleURI if appropriate
  bundle-uri: drop bundle.flag from design doc
  fetch: fetch from an external bundle URI
  bundle-uri: store fetch.bundleCreationToken
  bundle-uri: test missing bundles with heuristic

 Documentation/config/bundle.txt        |   7 +
 Documentation/config/fetch.txt         |  24 +
 Documentation/technical/bundle-uri.txt |   8 +-
 builtin/clone.c                        |   6 +-
 builtin/fetch.c                        |   6 +
 bundle-uri.c                           | 249 ++++++++-
 bundle-uri.h                           |  28 +-
 bundle.c                               |  75 ++-
 t/t5558-clone-bundle-uri.sh            | 672 ++++++++++++++++++++++++-
 t/t5601-clone.sh                       |  46 ++
 t/t5750-bundle-uri-parse.sh            |  37 ++
 t/t6020-bundle-misc.sh                 |  40 ++
 t/test-lib-functions.sh                |   8 +
 13 files changed, 1149 insertions(+), 57 deletions(-)


base-commit: 4dbebc36b0893f5094668ddea077d0e235560b16
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1454%2Fderrickstolee%2Fbundle-redo%2FcreationToken-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1454/derrickstolee/bundle-redo/creationToken-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1454

Range-diff vs v2:

  1:  b3828725bc8 <  -:  ----------- bundle: optionally skip reachability walk
  -:  ----------- >  1:  f9b0cc872ac bundle: test unbundling with incomplete history
  -:  ----------- >  2:  20c29d37f9c bundle: verify using check_connected()
  2:  427aff4d5e5 =  3:  45cdf9d13a7 t5558: add tests for creationToken heuristic
  3:  f6f8197c9cc =  4:  49bf10e0fd4 bundle-uri: parse bundle.heuristic=creationToken
  4:  12efa228d04 =  5:  ff629bc119b bundle-uri: parse bundle.<id>.creationToken values
  5:  7cfaa3c518c !  6:  366db5f6931 bundle-uri: download in creationToken order
     @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
      +
      +				/* Try looking deeper in the list. */
      +				move_direction = 1;
     -+				goto stack_operation;
     ++				goto move;
      +			}
      +
      +			/* We expect bundles when using creationTokens. */
     @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
      +		 * previous step.
      +		 */
      +
     -+stack_operation:
     ++move:
      +		/* Move in the specified direction and repeat. */
      +		cur += move_direction;
      +	}
  6:  17c404c1b83 =  7:  b59c4e2d390 clone: set fetch.bundleURI if appropriate
  7:  d491070efed =  8:  83f49b37c69 bundle-uri: drop bundle.flag from design doc
  8:  59e57e04968 !  9:  314c60f2ae4 fetch: fetch from an external bundle URI
     @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
       	if (dry_run)
       		write_fetch_head = 0;
       
     -+	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri)) {
     -+		if (fetch_bundle_uri(the_repository, bundle_uri, NULL))
     -+			warning(_("failed to fetch bundles from '%s'"), bundle_uri);
     -+	}
     ++	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri) &&
     ++	    fetch_bundle_uri(the_repository, bundle_uri, NULL))
     ++		warning(_("failed to fetch bundles from '%s'"), bundle_uri);
      +
       	if (all) {
       		if (argc == 1)
  9:  6a1504b1c3a ! 10:  4e0465efd19 bundle-uri: store fetch.bundleCreationToken
     @@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
       			}
       		}
       
     -@@ bundle-uri.c: stack_operation:
     +@@ bundle-uri.c: move:
       		cur += move_direction;
       	}
       
 10:  676522615ad ! 11:  c968b63feba bundle-uri: test missing bundles with heuristic
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
      +	test_remote_https_urls <trace-clone-2.txt >actual &&
      +	test_cmp expect actual &&
      +
     -+	# Only base bundle unbundled.
     ++	# bundle-1 and bundle-3 could unbundle, but bundle-4 could not
      +	git -C download-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
      +	cat >expect <<-EOF &&
      +	refs/bundles/base
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
      +	test_remote_https_urls <trace-clone-3.txt >actual &&
      +	test_cmp expect actual &&
      +
     -+	# All bundles failed to unbundle
     ++	# fake.bundle did not unbundle, but the others did.
      +	git -C download-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
      +	cat >expect <<-EOF &&
      +	refs/bundles/base

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH v3 01/11] bundle: test unbundling with incomplete history
  2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
@ 2023-01-31 13:29     ` Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 02/11] bundle: verify using check_connected() Derrick Stolee via GitGitGadget
                       ` (9 subsequent siblings)
  10 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When verifying a bundle, Git checks first that all prerequisite commits
exist in the object store, then adds an additional check: those
prerequisite commits must be reachable from references in the
repository.

This check is stronger than what is checked for refs being added during
'git fetch', which simply guarantees that the new refs have a complete
history up to the point where it intersects with the current reachable
history.

However, we also do not have any tests that check the behavior under
this condition. Create a test that demonstrates its behavior.

In order to construct a broken history, perform a shallow clone of a
repository with a linear history, but whose default branch ('base') has
a single commit, so dropping the shallow markers leaves a complete
history from that reference. However, the 'tip' reference adds a
shallow commit whose parent is missing in the cloned repository. Trying
to unbundle a bundle with the 'tip' as a prerequisite will succeed past
the object store check and move into the reachability check.

The two errors that are reported are of this form:

  error: Could not read <missing-commit>
  fatal: Failed to traverse parents of commit <present-commit>

These messages are not particularly helpful for the person running the
unbundle command, but they do prevent the command from succeeding.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t6020-bundle-misc.sh | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 3a1cf30b1d7..38dbbf89155 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -566,4 +566,44 @@ test_expect_success 'cloning from filtered bundle has useful error' '
 	grep "cannot clone from filtered bundle" err
 '
 
+test_expect_success 'verify catches unreachable, broken prerequisites' '
+	test_when_finished rm -rf clone-from clone-to &&
+	git init clone-from &&
+	(
+		cd clone-from &&
+		git checkout -b base &&
+		test_commit A &&
+		git checkout -b tip &&
+		git commit --allow-empty -m "will drop by shallow" &&
+		git commit --allow-empty -m "will keep by shallow" &&
+		git commit --allow-empty -m "for bundle, not clone" &&
+		git bundle create tip.bundle tip~1..tip &&
+		git reset --hard HEAD~1 &&
+		git checkout base
+	) &&
+	BAD_OID=$(git -C clone-from rev-parse tip~1) &&
+	TIP_OID=$(git -C clone-from rev-parse tip) &&
+	git clone --depth=1 --no-single-branch \
+		"file://$(pwd)/clone-from" clone-to &&
+	(
+		cd clone-to &&
+
+		# Set up broken history by removing shallow markers
+		git update-ref -d refs/remotes/origin/tip &&
+		rm .git/shallow &&
+
+		# Verify should fail
+		test_must_fail git bundle verify \
+			../clone-from/tip.bundle 2>err &&
+		grep "Could not read $BAD_OID" err &&
+		grep "Failed to traverse parents of commit $TIP_OID" err &&
+
+		# Unbundling should fail
+		test_must_fail git bundle unbundle \
+			../clone-from/tip.bundle 2>err &&
+		grep "Could not read $BAD_OID" err &&
+		grep "Failed to traverse parents of commit $TIP_OID" err
+	)
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v3 02/11] bundle: verify using check_connected()
  2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 01/11] bundle: test unbundling with incomplete history Derrick Stolee via GitGitGadget
@ 2023-01-31 13:29     ` Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 03/11] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
                       ` (8 subsequent siblings)
  10 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When Git verifies a bundle to see if it is safe for unbundling, it first
looks to see if the prerequisite commits are in the object store. This
is an easy way to "fail fast" but it is not a sufficient check for
updating refs that guarantee closure under reachability. There could
still be issues if those commits are not reachable from the repository's
references. The repository only has guarantees that its object store is
closed under reachability for the objects that are reachable from
references.

Thus, the code in verify_bundle() has previously had the additional
check that all prerequisite commits are reachable from repository
references. This is done via a revision walk from all references,
stopping only if all prerequisite commits are discovered or all commits
are walked. This uses a custom walk to verify_bundle().

This check is more strict than what Git applies to fetched pack-files.
In the fetch case, Git guarantees that the new references are closed
under reachability by walking from the new references until walking
commits that are reachable from repository refs. This is done through
the well-used check_connected() method.

To better align with the restrictions required by 'git fetch',
reimplement this check in verify_bundle() to use check_connected(). This
also simplifies the code significantly.

The previous change added a test that verified the behavior of 'git
bundle verify' and 'git bundle unbundle' in this case, and the error
messages looked like this:

  error: Could not read <missing-commit>
  fatal: Failed to traverse parents of commit <extant-commit>

However, by changing the revision walk slightly within check_connected()
and using its quiet mode, we can omit those messages. Instead, we get
only this message, tailored to describing the current state of the
repository:

  error: some prerequisite commits exist in the object store,
         but are not connected to the repository's history

(Line break added here for the commit message formatting, only.)

While this message does not include any object IDs, there is no
guarantee that those object IDs would help the user diagnose what is
going on, as they could be separated from the prerequisite commits by
some distance. At minimum, this situation describes the situation in a
more informative way than the previous error messages.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c               | 75 ++++++++++++++++--------------------------
 t/t6020-bundle-misc.sh |  8 ++---
 2 files changed, 33 insertions(+), 50 deletions(-)

diff --git a/bundle.c b/bundle.c
index 4ef7256aa11..76c3a904898 100644
--- a/bundle.c
+++ b/bundle.c
@@ -12,6 +12,7 @@
 #include "refs.h"
 #include "strvec.h"
 #include "list-objects-filter-options.h"
+#include "connected.h"
 
 static const char v2_bundle_signature[] = "# v2 git bundle\n";
 static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -187,6 +188,21 @@ static int list_refs(struct string_list *r, int argc, const char **argv)
 /* Remember to update object flag allocation in object.h */
 #define PREREQ_MARK (1u<<16)
 
+struct string_list_iterator {
+	struct string_list *list;
+	size_t cur;
+};
+
+static const struct object_id *iterate_ref_map(void *cb_data)
+{
+	struct string_list_iterator *iter = cb_data;
+
+	if (iter->cur >= iter->list->nr)
+		return NULL;
+
+	return iter->list->items[iter->cur++].util;
+}
+
 int verify_bundle(struct repository *r,
 		  struct bundle_header *header,
 		  enum verify_bundle_flags flags)
@@ -196,26 +212,25 @@ int verify_bundle(struct repository *r,
 	 * to be verbose about the errors
 	 */
 	struct string_list *p = &header->prerequisites;
-	struct rev_info revs = REV_INFO_INIT;
-	const char *argv[] = {NULL, "--all", NULL};
-	struct commit *commit;
-	int i, ret = 0, req_nr;
+	int i, ret = 0;
 	const char *message = _("Repository lacks these prerequisite commits:");
+	struct string_list_iterator iter = {
+		.list = p,
+	};
+	struct check_connected_options opts = {
+		.quiet = 1,
+	};
 
 	if (!r || !r->objects || !r->objects->odb)
 		return error(_("need a repository to verify a bundle"));
 
-	repo_init_revisions(r, &revs, NULL);
 	for (i = 0; i < p->nr; i++) {
 		struct string_list_item *e = p->items + i;
 		const char *name = e->string;
 		struct object_id *oid = e->util;
 		struct object *o = parse_object(r, oid);
-		if (o) {
-			o->flags |= PREREQ_MARK;
-			add_pending_object(&revs, o, name);
+		if (o)
 			continue;
-		}
 		ret++;
 		if (flags & VERIFY_BUNDLE_QUIET)
 			continue;
@@ -223,37 +238,14 @@ int verify_bundle(struct repository *r,
 			error("%s", message);
 		error("%s %s", oid_to_hex(oid), name);
 	}
-	if (revs.pending.nr != p->nr)
+	if (ret)
 		goto cleanup;
-	req_nr = revs.pending.nr;
-	setup_revisions(2, argv, &revs, NULL);
-
-	list_objects_filter_copy(&revs.filter, &header->filter);
-
-	if (prepare_revision_walk(&revs))
-		die(_("revision walk setup failed"));
 
-	i = req_nr;
-	while (i && (commit = get_revision(&revs)))
-		if (commit->object.flags & PREREQ_MARK)
-			i--;
-
-	for (i = 0; i < p->nr; i++) {
-		struct string_list_item *e = p->items + i;
-		const char *name = e->string;
-		const struct object_id *oid = e->util;
-		struct object *o = parse_object(r, oid);
-		assert(o); /* otherwise we'd have returned early */
-		if (o->flags & SHOWN)
-			continue;
-		ret++;
-		if (flags & VERIFY_BUNDLE_QUIET)
-			continue;
-		if (ret == 1)
-			error("%s", message);
-		error("%s %s", oid_to_hex(oid), name);
-	}
+	if ((ret = check_connected(iterate_ref_map, &iter, &opts)))
+		error(_("some prerequisite commits exist in the object store, "
+			"but are not connected to the repository's history"));
 
+	/* TODO: preserve this verbose language. */
 	if (flags & VERIFY_BUNDLE_VERBOSE) {
 		struct string_list *r;
 
@@ -282,15 +274,6 @@ int verify_bundle(struct repository *r,
 				  list_objects_filter_spec(&header->filter));
 	}
 cleanup:
-	/* Clean up objects used, as they will be reused. */
-	for (i = 0; i < p->nr; i++) {
-		struct string_list_item *e = p->items + i;
-		struct object_id *oid = e->util;
-		commit = lookup_commit_reference_gently(r, oid, 1);
-		if (commit)
-			clear_commit_marks(commit, ALL_REV_FLAGS | PREREQ_MARK);
-	}
-	release_revisions(&revs);
 	return ret;
 }
 
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 38dbbf89155..7d40994991e 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -595,14 +595,14 @@ test_expect_success 'verify catches unreachable, broken prerequisites' '
 		# Verify should fail
 		test_must_fail git bundle verify \
 			../clone-from/tip.bundle 2>err &&
-		grep "Could not read $BAD_OID" err &&
-		grep "Failed to traverse parents of commit $TIP_OID" err &&
+		grep "some prerequisite commits .* are not connected" err &&
+		test_line_count = 1 err &&
 
 		# Unbundling should fail
 		test_must_fail git bundle unbundle \
 			../clone-from/tip.bundle 2>err &&
-		grep "Could not read $BAD_OID" err &&
-		grep "Failed to traverse parents of commit $TIP_OID" err
+		grep "some prerequisite commits .* are not connected" err &&
+		test_line_count = 1 err
 	)
 '
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v3 03/11] t5558: add tests for creationToken heuristic
  2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 01/11] bundle: test unbundling with incomplete history Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 02/11] bundle: verify using check_connected() Derrick Stolee via GitGitGadget
@ 2023-01-31 13:29     ` Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 04/11] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
                       ` (7 subsequent siblings)
  10 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

As documented in the bundle URI design doc in 2da14fad8fe (docs:
document bundle URI standard, 2022-08-09), the 'creationToken' member of
a bundle URI allows a bundle provider to specify a total order on the
bundles.

Future changes will allow the Git client to understand these members and
modify its behavior around downloading the bundles in that order. In the
meantime, create tests that add creation tokens to the bundle list. For
now, the Git client correctly ignores these unknown keys.

Create a new test helper function, test_remote_https_urls, which filters
GIT_TRACE2_EVENT output to extract a list of URLs passed to
git-remote-https child processes. This can be used to verify the order
of these requests as we implement the creationToken heuristic. For now,
we need to sort the actual output since the current client does not have
a well-defined order that it applies to the bundles.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t5558-clone-bundle-uri.sh | 69 +++++++++++++++++++++++++++++++++++--
 t/test-lib-functions.sh     |  8 +++++
 2 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9155f31fa2c..474432c8ace 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -285,6 +285,8 @@ test_expect_success 'clone HTTP bundle' '
 '
 
 test_expect_success 'clone bundle list (HTTP, no heuristic)' '
+	test_when_finished rm -f trace*.txt &&
+
 	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
 	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle]
@@ -304,12 +306,26 @@ test_expect_success 'clone bundle list (HTTP, no heuristic)' '
 		uri = $HTTPD_URL/bundle-4.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+		git clone --bundle-uri="$HTTPD_URL/bundle-list" \
 		clone-from clone-list-http  2>err &&
 	! grep "Repository lacks these prerequisite commits" err &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
-	git -C clone-list-http cat-file --batch-check <oids
+	git -C clone-list-http cat-file --batch-check <oids &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-1.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-list
+	EOF
+
+	# Sort the list, since the order is not well-defined
+	# without a heuristic.
+	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'clone bundle list (HTTP, any mode)' '
@@ -350,6 +366,55 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'clone bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" git \
+		clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" clone-list-http-2 &&
+
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-list-http-2 cat-file --batch-check <oids &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-1.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-list
+	EOF
+
+	# Since the creationToken heuristic is not yet understood by the
+	# client, the order cannot be verified at this moment. Sort the
+	# list for consistent results.
+	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_cmp expect actual
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index f036c4d3003..ace542f4226 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1833,6 +1833,14 @@ test_region () {
 	return 0
 }
 
+# Given a GIT_TRACE2_EVENT log over stdin, writes to stdout a list of URLs
+# sent to git-remote-https child processes.
+test_remote_https_urls() {
+	grep -e '"event":"child_start".*"argv":\["git-remote-https",".*"\]' |
+		sed -e 's/{"event":"child_start".*"argv":\["git-remote-https","//g' \
+		    -e 's/"\]}//g'
+}
+
 # Print the destination of symlink(s) provided as arguments. Basically
 # the same as the readlink command, but it's not available everywhere.
 test_readlink () {
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v3 04/11] bundle-uri: parse bundle.heuristic=creationToken
  2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2023-01-31 13:29     ` [PATCH v3 03/11] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
@ 2023-01-31 13:29     ` Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 05/11] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
                       ` (6 subsequent siblings)
  10 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The bundle.heuristic value communicates that the bundle list is
organized to make use of the bundle.<id>.creationToken values that may
be provided in the bundle list. Those values will create a total order
on the bundles, allowing the Git client to download them in a specific
order and even remember previously-downloaded bundles by storing the
maximum creation token value.

Before implementing any logic that parses or uses the
bundle.<id>.creationToken values, teach Git to parse the
bundle.heuristic value from a bundle list. We can use 'test-tool
bundle-uri' to print the heuristic value and verify that the parsing
works correctly.

As an extra precaution, create the internal 'heuristics' array to be a
list of (enum, string) pairs so we can iterate through the array entries
carefully, regardless of the enum values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/bundle.txt |  7 +++++++
 bundle-uri.c                    | 34 +++++++++++++++++++++++++++++++++
 bundle-uri.h                    | 14 ++++++++++++++
 t/t5750-bundle-uri-parse.sh     | 19 ++++++++++++++++++
 4 files changed, 74 insertions(+)

diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
index daa21eb674a..3faae386853 100644
--- a/Documentation/config/bundle.txt
+++ b/Documentation/config/bundle.txt
@@ -15,6 +15,13 @@ bundle.mode::
 	complete understanding of the bundled information (`all`) or if any one
 	of the listed bundle URIs is sufficient (`any`).
 
+bundle.heuristic::
+	If this string-valued key exists, then the bundle list is designed to
+	work well with incremental `git fetch` commands. The heuristic signals
+	that there are additional keys available for each bundle that help
+	determine which subset of bundles the client should download. The
+	only value currently understood is `creationToken`.
+
 bundle.<id>.*::
 	The `bundle.<id>.*` keys are used to describe a single item in the
 	bundle list, grouped under `<id>` for identification purposes.
diff --git a/bundle-uri.c b/bundle-uri.c
index 36268dda172..36ec542718d 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -9,6 +9,14 @@
 #include "config.h"
 #include "remote.h"
 
+static struct {
+	enum bundle_list_heuristic heuristic;
+	const char *name;
+} heuristics[BUNDLE_HEURISTIC__COUNT] = {
+	{ BUNDLE_HEURISTIC_NONE, ""},
+	{ BUNDLE_HEURISTIC_CREATIONTOKEN, "creationToken" },
+};
+
 static int compare_bundles(const void *hashmap_cmp_fn_data,
 			   const struct hashmap_entry *he1,
 			   const struct hashmap_entry *he2,
@@ -100,6 +108,17 @@ void print_bundle_list(FILE *fp, struct bundle_list *list)
 	fprintf(fp, "\tversion = %d\n", list->version);
 	fprintf(fp, "\tmode = %s\n", mode);
 
+	if (list->heuristic) {
+		int i;
+		for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
+			if (heuristics[i].heuristic == list->heuristic) {
+				printf("\theuristic = %s\n",
+				       heuristics[list->heuristic].name);
+				break;
+			}
+		}
+	}
+
 	for_all_bundles_in_list(list, summarize_bundle, fp);
 }
 
@@ -142,6 +161,21 @@ static int bundle_list_update(const char *key, const char *value,
 			return 0;
 		}
 
+		if (!strcmp(subkey, "heuristic")) {
+			int i;
+			for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
+				if (heuristics[i].heuristic &&
+				    heuristics[i].name &&
+				    !strcmp(value, heuristics[i].name)) {
+					list->heuristic = heuristics[i].heuristic;
+					return 0;
+				}
+			}
+
+			/* Ignore unknown heuristics. */
+			return 0;
+		}
+
 		/* Ignore other unknown global keys. */
 		return 0;
 	}
diff --git a/bundle-uri.h b/bundle-uri.h
index d5e89f1671c..2e44a50a90b 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -52,6 +52,14 @@ enum bundle_list_mode {
 	BUNDLE_MODE_ANY
 };
 
+enum bundle_list_heuristic {
+	BUNDLE_HEURISTIC_NONE = 0,
+	BUNDLE_HEURISTIC_CREATIONTOKEN,
+
+	/* Must be last. */
+	BUNDLE_HEURISTIC__COUNT
+};
+
 /**
  * A bundle_list contains an unordered set of remote_bundle_info structs,
  * as well as information about the bundle listing, such as version and
@@ -75,6 +83,12 @@ struct bundle_list {
 	 * advertised by the bundle list at that location.
 	 */
 	char *baseURI;
+
+	/**
+	 * A list can have a heuristic, which helps reduce the number of
+	 * downloaded bundles.
+	 */
+	enum bundle_list_heuristic heuristic;
 };
 
 void init_bundle_list(struct bundle_list *list);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 7b4f930e532..6fc92a9c0d4 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -250,4 +250,23 @@ test_expect_success 'parse config format edge cases: empty key or value' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v3 05/11] bundle-uri: parse bundle.<id>.creationToken values
  2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2023-01-31 13:29     ` [PATCH v3 04/11] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
@ 2023-01-31 13:29     ` Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 06/11] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
                       ` (5 subsequent siblings)
  10 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change taught Git to parse the bundle.heuristic value,
especially when its value is "creationToken". Now, teach Git to parse
the bundle.<id>.creationToken values on each bundle in a bundle list.

Before implementing any logic based on creationToken values for the
creationToken heuristic, parse and print these values for testing
purposes.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 10 ++++++++++
 bundle-uri.h                |  6 ++++++
 t/t5750-bundle-uri-parse.sh | 18 ++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 36ec542718d..d4277b2e3a7 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -83,6 +83,9 @@ static int summarize_bundle(struct remote_bundle_info *info, void *data)
 	FILE *fp = data;
 	fprintf(fp, "[bundle \"%s\"]\n", info->id);
 	fprintf(fp, "\turi = %s\n", info->uri);
+
+	if (info->creationToken)
+		fprintf(fp, "\tcreationToken = %"PRIu64"\n", info->creationToken);
 	return 0;
 }
 
@@ -203,6 +206,13 @@ static int bundle_list_update(const char *key, const char *value,
 		return 0;
 	}
 
+	if (!strcmp(subkey, "creationtoken")) {
+		if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
+			warning(_("could not parse bundle list key %s with value '%s'"),
+				"creationToken", value);
+		return 0;
+	}
+
 	/*
 	 * At this point, we ignore any information that we don't
 	 * understand, assuming it to be hints for a heuristic the client
diff --git a/bundle-uri.h b/bundle-uri.h
index 2e44a50a90b..ef32840bfa6 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -42,6 +42,12 @@ struct remote_bundle_info {
 	 * this boolean is true.
 	 */
 	unsigned unbundled:1;
+
+	/**
+	 * If the bundle is part of a list with the creationToken
+	 * heuristic, then we use this member for sorting the bundles.
+	 */
+	uint64_t creationToken;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { 0 }
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 6fc92a9c0d4..81bdf58b944 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -258,10 +258,13 @@ test_expect_success 'parse config format: creationToken heuristic' '
 		heuristic = creationToken
 	[bundle "one"]
 		uri = http://example.com/bundle.bdl
+		creationToken = 123456
 	[bundle "two"]
 		uri = https://example.com/bundle.bdl
+		creationToken = 12345678901234567890
 	[bundle "three"]
 		uri = file:///usr/share/git/bundle.bdl
+		creationToken = 1
 	EOF
 
 	test-tool bundle-uri parse-config expect >actual 2>err &&
@@ -269,4 +272,19 @@ test_expect_success 'parse config format: creationToken heuristic' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format edge cases: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+		creationToken = bogus
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	grep "could not parse bundle list key creationToken with value '\''bogus'\''" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v3 06/11] bundle-uri: download in creationToken order
  2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
                       ` (4 preceding siblings ...)
  2023-01-31 13:29     ` [PATCH v3 05/11] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
@ 2023-01-31 13:29     ` Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 07/11] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
                       ` (4 subsequent siblings)
  10 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised.

The bundles in the list are sorted by their advertised creationToken
values, then downloaded in decreasing order. This avoids the previous
strategy of downloading bundles in an arbitrary order and attempting
to apply them (likely failing in the case of required commits) until
discovering the order through attempted unbundling.

During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. The cost of testing an unbundle is quite low, and instead
the chosen order is optimizing for a future bundle download during a
'git fetch' operation with a non-empty object store.

Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store. The next time it attempts
to download from the bundle list, it makes most sense to download only
the most-recent bundles until all tips successfully unbundle. The
strategy implemented here provides that short-circuit where the client
downloads a minimal set of bundles.

However, we are not satisfied by the naive approach of downloading
bundles until one successfully unbundles, expecting the earlier bundles
to successfully unbundle now. The example repository in t5558
demonstrates this well:

 ---------------- bundle-4

       4
      / \
 ----|---|------- bundle-3
     |   |
     |   3
     |   |
 ----|---|------- bundle-2
     |   |
     2   |
     |   |
 ----|---|------- bundle-1
      \ /
       1
       |
 (previous commits)

In this repository, if we already have the objects for bundle-1 and then
try to fetch from this list, the naive approach will fail. bundle-4
requires both bundle-3 and bundle-2, though bundle-3 will successfully
unbundle without bundle-2. Thus, the algorithm needs to keep this in
mind.

A later implementation detail will store the maximum creationToken seen
during such a bundle download, and the client will avoid downloading a
bundle unless its creationToken is strictly greater than that stored
value. For now, if the client seeks to download from an identical
bundle list since its previous download, it will download the
most-recent bundle then stop since its required commits are already in
the object store.

Add tests that exercise this behavior, but we will expand upon these
tests when incremental downloads during 'git fetch' make use of
creationToken values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 156 +++++++++++++++++++++++++++++++++++-
 t/t5558-clone-bundle-uri.sh |  40 +++++++--
 t/t5601-clone.sh            |  46 +++++++++++
 3 files changed, 233 insertions(+), 9 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index d4277b2e3a7..af48938d243 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -447,6 +447,139 @@ static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data
 	return 0;
 }
 
+struct bundles_for_sorting {
+	struct remote_bundle_info **items;
+	size_t alloc;
+	size_t nr;
+};
+
+static int append_bundle(struct remote_bundle_info *bundle, void *data)
+{
+	struct bundles_for_sorting *list = data;
+	list->items[list->nr++] = bundle;
+	return 0;
+}
+
+/**
+ * For use in QSORT() to get a list sorted by creationToken
+ * in decreasing order.
+ */
+static int compare_creation_token_decreasing(const void *va, const void *vb)
+{
+	const struct remote_bundle_info * const *a = va;
+	const struct remote_bundle_info * const *b = vb;
+
+	if ((*a)->creationToken > (*b)->creationToken)
+		return -1;
+	if ((*a)->creationToken < (*b)->creationToken)
+		return 1;
+	return 0;
+}
+
+static int fetch_bundles_by_token(struct repository *r,
+				  struct bundle_list *list)
+{
+	int cur;
+	int move_direction = 0;
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = list,
+		.mode = list->mode,
+	};
+	struct bundles_for_sorting bundles = {
+		.alloc = hashmap_get_size(&list->bundles),
+	};
+
+	ALLOC_ARRAY(bundles.items, bundles.alloc);
+
+	for_all_bundles_in_list(list, append_bundle, &bundles);
+
+	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
+
+	/*
+	 * Attempt to download and unbundle the minimum number of bundles by
+	 * creationToken in decreasing order. If we fail to unbundle (after
+	 * a successful download) then move to the next non-downloaded bundle
+	 * and attempt downloading. Once we succeed in applying a bundle,
+	 * move to the previous unapplied bundle and attempt to unbundle it
+	 * again.
+	 *
+	 * In the case of a fresh clone, we will likely download all of the
+	 * bundles before successfully unbundling the oldest one, then the
+	 * rest of the bundles unbundle successfully in increasing order
+	 * of creationToken.
+	 *
+	 * If there are existing objects, then this process may terminate
+	 * early when all required commits from "new" bundles exist in the
+	 * repo's object store.
+	 */
+	cur = 0;
+	while (cur >= 0 && cur < bundles.nr) {
+		struct remote_bundle_info *bundle = bundles.items[cur];
+		if (!bundle->file) {
+			/*
+			 * Not downloaded yet. Try downloading.
+			 *
+			 * Note that bundle->file is non-NULL if a download
+			 * was attempted, even if it failed to download.
+			 */
+			if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
+				/* Mark as unbundled so we do not retry. */
+				bundle->unbundled = 1;
+
+				/* Try looking deeper in the list. */
+				move_direction = 1;
+				goto move;
+			}
+
+			/* We expect bundles when using creationTokens. */
+			if (!is_bundle(bundle->file, 1)) {
+				warning(_("file downloaded from '%s' is not a bundle"),
+					bundle->uri);
+				break;
+			}
+		}
+
+		if (bundle->file && !bundle->unbundled) {
+			/*
+			 * This was downloaded, but not successfully
+			 * unbundled. Try unbundling again.
+			 */
+			if (unbundle_from_file(ctx.r, bundle->file)) {
+				/* Try looking deeper in the list. */
+				move_direction = 1;
+			} else {
+				/*
+				 * Succeeded in unbundle. Retry bundles
+				 * that previously failed to unbundle.
+				 */
+				move_direction = -1;
+				bundle->unbundled = 1;
+			}
+		}
+
+		/*
+		 * Else case: downloaded and unbundled successfully.
+		 * Skip this by moving in the same direction as the
+		 * previous step.
+		 */
+
+move:
+		/* Move in the specified direction and repeat. */
+		cur += move_direction;
+	}
+
+	free(bundles.items);
+
+	/*
+	 * We succeed if the loop terminates because 'cur' drops below
+	 * zero. The other case is that we terminate because 'cur'
+	 * reaches the end of the list, so we have a failure no matter
+	 * which bundles we apply from the list.
+	 */
+	return cur >= 0;
+}
+
 static int download_bundle_list(struct repository *r,
 				struct bundle_list *local_list,
 				struct bundle_list *global_list,
@@ -484,7 +617,15 @@ static int fetch_bundle_list_in_config_format(struct repository *r,
 		goto cleanup;
 	}
 
-	if ((result = download_bundle_list(r, &list_from_bundle,
+	/*
+	 * If this list uses the creationToken heuristic, then the URIs
+	 * it advertises are expected to be bundles, not nested lists.
+	 * We can drop 'global_list' and 'depth'.
+	 */
+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
+		result = fetch_bundles_by_token(r, &list_from_bundle);
+		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
+	} else if ((result = download_bundle_list(r, &list_from_bundle,
 					   global_list, depth)))
 		goto cleanup;
 
@@ -626,6 +767,14 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	int result;
 	struct bundle_list global_list;
 
+	/*
+	 * If the creationToken heuristic is used, then the URIs
+	 * advertised by 'list' are not nested lists and instead
+	 * direct bundles. We do not need to use global_list.
+	 */
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		return fetch_bundles_by_token(r, list);
+
 	init_bundle_list(&global_list);
 
 	/* If a bundle is added to this global list, then it is required. */
@@ -634,7 +783,10 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	if ((result = download_bundle_list(r, list, &global_list, 0)))
 		goto cleanup;
 
-	result = unbundle_all_bundles(r, &global_list);
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		result = fetch_bundles_by_token(r, list);
+	else
+		result = unbundle_all_bundles(r, &global_list);
 
 cleanup:
 	for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 474432c8ace..6f9417a0afb 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -401,17 +401,43 @@ test_expect_success 'clone bundle list (http, creationToken)' '
 	git -C clone-list-http-2 cat-file --batch-check <oids &&
 
 	cat >expect <<-EOF &&
-	$HTTPD_URL/bundle-1.bundle
-	$HTTPD_URL/bundle-2.bundle
-	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-list
 	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone incomplete bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+	EOF
+
+	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		--single-branch --branch=base --no-tags \
+		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
+
+	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-1.bundle
 	EOF
 
-	# Since the creationToken heuristic is not yet understood by the
-	# client, the order cannot be verified at this moment. Sort the
-	# list for consistent results.
-	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_remote_https_urls <trace-clone.txt >actual &&
 	test_cmp expect actual
 '
 
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 1928ea1dd7c..b7d5551262c 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -831,6 +831,52 @@ test_expect_success 'auto-discover multiple bundles from HTTP clone' '
 	grep -f pattern trace.txt
 '
 
+test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '
+	test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+	test_when_finished rm -rf clone-heuristic trace*.txt &&
+
+	test_commit -C src newest &&
+	git -C src bundle create "$HTTPD_DOCUMENT_ROOT_PATH/newest.bundle" HEAD~1..HEAD &&
+	git clone --bare --no-local src "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/repo4.git/config" <<-EOF &&
+	[uploadPack]
+		advertiseBundleURIs = true
+
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "everything"]
+		uri = $HTTPD_URL/everything.bundle
+		creationtoken = 1
+
+	[bundle "new"]
+		uri = $HTTPD_URL/new.bundle
+		creationtoken = 2
+
+	[bundle "newest"]
+		uri = $HTTPD_URL/newest.bundle
+		creationtoken = 3
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+		git -c protocol.version=2 \
+		    -c transfer.bundleURI=true clone \
+		"$HTTPD_URL/smart/repo4.git" clone-heuristic &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/newest.bundle
+	$HTTPD_URL/new.bundle
+	$HTTPD_URL/everything.bundle
+	EOF
+
+	# We should fetch all bundles in the expected order.
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v3 07/11] clone: set fetch.bundleURI if appropriate
  2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
                       ` (5 preceding siblings ...)
  2023-01-31 13:29     ` [PATCH v3 06/11] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
@ 2023-01-31 13:29     ` Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 08/11] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  10 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Bundle providers may organize their bundle lists in a way that is
intended to improve incremental fetches, not just initial clones.
However, they do need to state that they have organized with that in
mind, or else the client will not expect to save time by downloading
bundles after the initial clone. This is done by specifying a
bundle.heuristic value.

There are two types of bundle lists: those at a static URI and those
that are advertised from a Git remote over protocol v2.

The new fetch.bundleURI config value applies for static bundle URIs that
are not advertised over protocol v2. If the user specifies a static URI
via 'git clone --bundle-uri', then Git can set this config as a reminder
for future 'git fetch' operations to check the bundle list before
connecting to the remote(s).

For lists provided over protocol v2, we will want to take a different
approach and create a property of the remote itself by creating a
remote.<id>.* type config key. That is not implemented in this change.

Later changes will update 'git fetch' to consume this option.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/fetch.txt |  8 +++++++
 builtin/clone.c                |  6 +++++-
 bundle-uri.c                   |  5 ++++-
 bundle-uri.h                   |  8 ++++++-
 t/t5558-clone-bundle-uri.sh    | 39 ++++++++++++++++++++++++++++++++++
 5 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index cd65d236b43..244f44d460f 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -96,3 +96,11 @@ fetch.writeCommitGraph::
 	merge and the write may take longer. Having an updated commit-graph
 	file helps performance of many Git commands, including `git merge-base`,
 	`git push -f`, and `git log --graph`. Defaults to false.
+
+fetch.bundleURI::
+	This value stores a URI for downloading Git object data from a bundle
+	URI before performing an incremental fetch from the origin Git server.
+	This is similar to how the `--bundle-uri` option behaves in
+	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
+	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
+	list that is organized for incremental fetches.
diff --git a/builtin/clone.c b/builtin/clone.c
index 5453ba5277f..5370617664d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1248,12 +1248,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	 * data from the --bundle-uri option.
 	 */
 	if (bundle_uri) {
+		int has_heuristic = 0;
+
 		/* At this point, we need the_repository to match the cloned repo. */
 		if (repo_init(the_repository, git_dir, work_tree))
 			warning(_("failed to initialize the repo, skipping bundle URI"));
-		else if (fetch_bundle_uri(the_repository, bundle_uri))
+		else if (fetch_bundle_uri(the_repository, bundle_uri, &has_heuristic))
 			warning(_("failed to fetch objects from bundle URI '%s'"),
 				bundle_uri);
+		else if (has_heuristic)
+			git_config_set_gently("fetch.bundleuri", bundle_uri);
 	}
 
 	strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
diff --git a/bundle-uri.c b/bundle-uri.c
index af48938d243..7a1b6d94bf5 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -736,7 +736,8 @@ static int unlink_bundle(struct remote_bundle_info *info, void *data)
 	return 0;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+int fetch_bundle_uri(struct repository *r, const char *uri,
+		     int *has_heuristic)
 {
 	int result;
 	struct bundle_list list;
@@ -756,6 +757,8 @@ int fetch_bundle_uri(struct repository *r, const char *uri)
 	result = unbundle_all_bundles(r, &list);
 
 cleanup:
+	if (has_heuristic)
+		*has_heuristic = (list.heuristic != BUNDLE_HEURISTIC_NONE);
 	for_all_bundles_in_list(&list, unlink_bundle, NULL);
 	clear_bundle_list(&list);
 	clear_remote_bundle_info(&bundle, NULL);
diff --git a/bundle-uri.h b/bundle-uri.h
index ef32840bfa6..6dbc780f661 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -124,8 +124,14 @@ int bundle_uri_parse_config_format(const char *uri,
  * based on that information.
  *
  * Returns non-zero if no bundle information is found at the given 'uri'.
+ *
+ * If the pointer 'has_heuristic' is non-NULL, then the value it points to
+ * will be set to be non-zero if and only if the fetched list has a
+ * heuristic value. Such a value indicates that the list was designed for
+ * incremental fetches.
  */
-int fetch_bundle_uri(struct repository *r, const char *uri);
+int fetch_bundle_uri(struct repository *r, const char *uri,
+		     int *has_heuristic);
 
 /**
  * Given a bundle list that was already advertised (likely by the
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 6f9417a0afb..b2d15e141ca 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -432,6 +432,8 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 		--single-branch --branch=base --no-tags \
 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
 
+	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
 	$HTTPD_URL/bundle-1.bundle
@@ -441,6 +443,43 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
+	test_when_finished rm -rf fetch-http-4 trace*.txt &&
+
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
+
+	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual &&
+
+	# only received base ref from bundle-1
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	EOF
+	test_cmp expect refs
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v3 08/11] bundle-uri: drop bundle.flag from design doc
  2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
                       ` (6 preceding siblings ...)
  2023-01-31 13:29     ` [PATCH v3 07/11] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
@ 2023-01-31 13:29     ` Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 09/11] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  10 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The Implementation Plan section lists a 'bundle.flag' option that is not
documented anywhere else. What is documented elsewhere in the document
and implemented by previous changes is the 'bundle.heuristic' config
key. For now, a heuristic is required to indicate that a bundle list is
organized for use during 'git fetch', and it is also sufficient for all
existing designs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/technical/bundle-uri.txt | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
index b78d01d9adf..91d3a13e327 100644
--- a/Documentation/technical/bundle-uri.txt
+++ b/Documentation/technical/bundle-uri.txt
@@ -479,14 +479,14 @@ outline for submitting these features:
    (This choice is an opt-in via a config option and a command-line
    option.)
 
-4. Allow the client to understand the `bundle.flag=forFetch` configuration
+4. Allow the client to understand the `bundle.heuristic` configuration key
    and the `bundle.<id>.creationToken` heuristic. When `git clone`
-   discovers a bundle URI with `bundle.flag=forFetch`, it configures the
-   client repository to check that bundle URI during later `git fetch <remote>`
+   discovers a bundle URI with `bundle.heuristic`, it configures the client
+   repository to check that bundle URI during later `git fetch <remote>`
    commands.
 
 5. Allow clients to discover bundle URIs during `git fetch` and configure
-   a bundle URI for later fetches if `bundle.flag=forFetch`.
+   a bundle URI for later fetches if `bundle.heuristic` is set.
 
 6. Implement the "inspect headers" heuristic to reduce data downloads when
    the `bundle.<id>.creationToken` heuristic is not available.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v3 09/11] fetch: fetch from an external bundle URI
  2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
                       ` (7 preceding siblings ...)
  2023-01-31 13:29     ` [PATCH v3 08/11] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
@ 2023-01-31 13:29     ` Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 10/11] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 11/11] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
  10 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a user specifies a URI via 'git clone --bundle-uri', that URI may
be a bundle list that advertises a 'bundle.heuristic' value. In that
case, the Git client stores a 'fetch.bundleURI' config value storing
that URI.

Teach 'git fetch' to check for this config value and download bundles
from that URI before fetching from the Git remote(s). Likely, the bundle
provider has configured a heuristic (such as "creationToken") that will
allow the Git client to download only a portion of the bundles before
continuing the fetch.

Since this URI is completely independent of the remote server, we want
to be sure that we connect to the bundle URI before creating a
connection to the Git remote. We do not want to hold a stateful
connection for too long if we can avoid it.

To test that this works correctly, extend the previous tests that set
'fetch.bundleURI' to do follow-up fetches. The bundle list is updated
incrementally at each phase to demonstrate that the heuristic avoids
downloading older bundles. This includes the middle fetch downloading
the objects in bundle-3.bundle from the Git remote, and therefore not
needing that bundle in the third fetch.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/fetch.c             |   6 ++
 t/t5558-clone-bundle-uri.sh | 113 +++++++++++++++++++++++++++++++++++-
 2 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 7378cafeec9..0477c379369 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -29,6 +29,7 @@
 #include "commit-graph.h"
 #include "shallow.h"
 #include "worktree.h"
+#include "bundle-uri.h"
 
 #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000)
 
@@ -2109,6 +2110,7 @@ static int fetch_one(struct remote *remote, int argc, const char **argv,
 int cmd_fetch(int argc, const char **argv, const char *prefix)
 {
 	int i;
+	const char *bundle_uri;
 	struct string_list list = STRING_LIST_INIT_DUP;
 	struct remote *remote = NULL;
 	int result = 0;
@@ -2194,6 +2196,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	if (dry_run)
 		write_fetch_head = 0;
 
+	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri) &&
+	    fetch_bundle_uri(the_repository, bundle_uri, NULL))
+		warning(_("failed to fetch bundles from '%s'"), bundle_uri);
+
 	if (all) {
 		if (argc == 1)
 			die(_("fetch --all does not take a repository argument"));
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index b2d15e141ca..7deeb4b8ad1 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -440,7 +440,55 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 	EOF
 
 	test_remote_https_urls <trace-clone.txt >actual &&
-	test_cmp expect actual
+	test_cmp expect actual &&
+
+	# We now have only one bundle ref.
+	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	EOF
+	test_cmp expect refs &&
+
+	# Add remaining bundles, exercising the "deepening" strategy
+	# for downloading via the creationToken heurisitc.
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
+		git -C clone-token-http fetch origin --no-tags \
+		refs/heads/merge:refs/heads/merge &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	EOF
+
+	test_remote_https_urls <trace1.txt >actual &&
+	test_cmp expect actual &&
+
+	# We now have all bundle refs.
+	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	refs/bundles/right
+	EOF
+	test_cmp expect refs
 '
 
 test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
@@ -477,6 +525,69 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	cat >expect <<-\EOF &&
 	refs/bundles/base
 	EOF
+	test_cmp expect refs &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	# Fetch the objects for bundle-2 _and_ bundle-3.
+	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/left:refs/heads/left \
+		refs/heads/right:refs/heads/right &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-2.bundle
+	EOF
+
+	test_remote_https_urls <trace1.txt >actual &&
+	test_cmp expect actual &&
+
+	# received left from bundle-2
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect refs &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	# This fetch should skip bundle-3.bundle, since its objects are
+	# already local (we have the requisite commits for bundle-4.bundle).
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/merge:refs/heads/merge &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	EOF
+
+	test_remote_https_urls <trace2.txt >actual &&
+	test_cmp expect actual &&
+
+	# received merge ref from bundle-4, but right is missing
+	# because we did not download bundle-3.
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	EOF
 	test_cmp expect refs
 '
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v3 10/11] bundle-uri: store fetch.bundleCreationToken
  2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
                       ` (8 preceding siblings ...)
  2023-01-31 13:29     ` [PATCH v3 09/11] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
@ 2023-01-31 13:29     ` Derrick Stolee via GitGitGadget
  2023-01-31 13:29     ` [PATCH v3 11/11] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
  10 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a bundle list specifies the "creationToken" heuristic, the Git
client downloads the list and then starts downloading bundles in
descending creationToken order. This process stops as soon as all
downloaded bundles can be applied to the repository (because all
required commits are present in the repository or in the downloaded
bundles).

When checking the same bundle list twice, this strategy requires
downloading the bundle with the maximum creationToken again, which is
wasteful. The creationToken heuristic promises that the client will not
have a use for that bundle if its creationToken value is at most the
previous creationToken value.

To prevent these wasteful downloads, create a fetch.bundleCreationToken
config setting that the Git client sets after downloading bundles. This
value allows skipping that maximum bundle download when this config
value is the same value (or larger).

To test that this works correctly, we can insert some "duplicate"
fetches into existing tests and demonstrate that only the bundle list is
downloaded.

The previous logic for downloading bundles by creationToken worked even
if the bundle list was empty, but now we have logic that depends on the
first entry of the list. Terminate early in the (non-sensical) case of
an empty bundle list.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/fetch.txt | 16 ++++++++++++
 bundle-uri.c                   | 48 ++++++++++++++++++++++++++++++++--
 t/t5558-clone-bundle-uri.sh    | 29 +++++++++++++++++++-
 3 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index 244f44d460f..568f0f75b30 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -104,3 +104,19 @@ fetch.bundleURI::
 	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
 	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
 	list that is organized for incremental fetches.
++
+If you modify this value and your repository has a `fetch.bundleCreationToken`
+value, then remove that `fetch.bundleCreationToken` value before fetching from
+the new bundle URI.
+
+fetch.bundleCreationToken::
+	When using `fetch.bundleURI` to fetch incrementally from a bundle
+	list that uses the "creationToken" heuristic, this config value
+	stores the maximum `creationToken` value of the downloaded bundles.
+	This value is used to prevent downloading bundles in the future
+	if the advertised `creationToken` is not strictly larger than this
+	value.
++
+The creation token values are chosen by the provider serving the specific
+bundle URI. If you modify the URI at `fetch.bundleURI`, then be sure to
+remove the value for the `fetch.bundleCreationToken` value before fetching.
diff --git a/bundle-uri.c b/bundle-uri.c
index 7a1b6d94bf5..d6f7df7350f 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -481,6 +481,8 @@ static int fetch_bundles_by_token(struct repository *r,
 {
 	int cur;
 	int move_direction = 0;
+	const char *creationTokenStr;
+	uint64_t maxCreationToken = 0, newMaxCreationToken = 0;
 	struct bundle_list_context ctx = {
 		.r = r,
 		.list = list,
@@ -494,8 +496,27 @@ static int fetch_bundles_by_token(struct repository *r,
 
 	for_all_bundles_in_list(list, append_bundle, &bundles);
 
+	if (!bundles.nr) {
+		free(bundles.items);
+		return 0;
+	}
+
 	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
 
+	/*
+	 * If fetch.bundleCreationToken exists, parses to a uint64t, and
+	 * is not strictly smaller than the maximum creation token in the
+	 * bundle list, then do not download any bundles.
+	 */
+	if (!repo_config_get_value(r,
+				   "fetch.bundlecreationtoken",
+				   &creationTokenStr) &&
+	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
+	    bundles.items[0]->creationToken <= maxCreationToken) {
+		free(bundles.items);
+		return 0;
+	}
+
 	/*
 	 * Attempt to download and unbundle the minimum number of bundles by
 	 * creationToken in decreasing order. If we fail to unbundle (after
@@ -516,6 +537,16 @@ static int fetch_bundles_by_token(struct repository *r,
 	cur = 0;
 	while (cur >= 0 && cur < bundles.nr) {
 		struct remote_bundle_info *bundle = bundles.items[cur];
+
+		/*
+		 * If we need to dig into bundles below the previous
+		 * creation token value, then likely we are in an erroneous
+		 * state due to missing or invalid bundles. Halt the process
+		 * instead of continuing to download extra data.
+		 */
+		if (bundle->creationToken <= maxCreationToken)
+			break;
+
 		if (!bundle->file) {
 			/*
 			 * Not downloaded yet. Try downloading.
@@ -555,6 +586,9 @@ static int fetch_bundles_by_token(struct repository *r,
 				 */
 				move_direction = -1;
 				bundle->unbundled = 1;
+
+				if (bundle->creationToken > newMaxCreationToken)
+					newMaxCreationToken = bundle->creationToken;
 			}
 		}
 
@@ -569,14 +603,24 @@ move:
 		cur += move_direction;
 	}
 
-	free(bundles.items);
-
 	/*
 	 * We succeed if the loop terminates because 'cur' drops below
 	 * zero. The other case is that we terminate because 'cur'
 	 * reaches the end of the list, so we have a failure no matter
 	 * which bundles we apply from the list.
 	 */
+	if (cur < 0) {
+		struct strbuf value = STRBUF_INIT;
+		strbuf_addf(&value, "%"PRIu64"", newMaxCreationToken);
+		if (repo_config_set_multivar_gently(ctx.r,
+						    "fetch.bundleCreationToken",
+						    value.buf, NULL, 0))
+			warning(_("failed to store maximum creation token"));
+
+		strbuf_release(&value);
+	}
+
+	free(bundles.items);
 	return cur >= 0;
 }
 
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 7deeb4b8ad1..9c2b7934b9b 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -433,6 +433,7 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
 
 	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+	test_cmp_config -C clone-token-http 1 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -468,6 +469,7 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
 		git -C clone-token-http fetch origin --no-tags \
 		refs/heads/merge:refs/heads/merge &&
+	test_cmp_config -C clone-token-http 4 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -511,6 +513,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
 
 	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+	test_cmp_config -C fetch-http-4 1 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -538,6 +541,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 		git -C fetch-http-4 fetch origin --no-tags \
 		refs/heads/left:refs/heads/left \
 		refs/heads/right:refs/heads/right &&
+	test_cmp_config -C fetch-http-4 2 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -555,6 +559,18 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	EOF
 	test_cmp expect refs &&
 
+	# No-op fetch
+	GIT_TRACE2_EVENT="$(pwd)/trace1b.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/left:refs/heads/left \
+		refs/heads/right:refs/heads/right &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	EOF
+	test_remote_https_urls <trace1b.txt >actual &&
+	test_cmp expect actual &&
+
 	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle "bundle-3"]
 		uri = bundle-3.bundle
@@ -570,6 +586,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
 		git -C fetch-http-4 fetch origin --no-tags \
 		refs/heads/merge:refs/heads/merge &&
+	test_cmp_config -C fetch-http-4 4 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -588,7 +605,17 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	refs/bundles/left
 	refs/bundles/merge
 	EOF
-	test_cmp expect refs
+	test_cmp expect refs &&
+
+	# No-op fetch
+	GIT_TRACE2_EVENT="$(pwd)/trace2b.txt" \
+		git -C fetch-http-4 fetch origin &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	EOF
+	test_remote_https_urls <trace2b.txt >actual &&
+	test_cmp expect actual
 '
 
 # Do not add tests here unless they use the HTTP server, as they will
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v3 11/11] bundle-uri: test missing bundles with heuristic
  2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
                       ` (9 preceding siblings ...)
  2023-01-31 13:29     ` [PATCH v3 10/11] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
@ 2023-01-31 13:29     ` Derrick Stolee via GitGitGadget
  10 siblings, 0 replies; 74+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The creationToken heuristic uses a different mechanism for downloading
bundles from the "standard" approach. Specifically: it uses a concrete
order based on the creationToken values and attempts to download as few
bundles as possible. It also modifies local config to store a value for
future fetches to avoid downloading bundles, if possible.

However, if any of the individual bundles has a failed download, then
the logic for the ordering comes into question. It is important to avoid
infinite loops, assigning invalid creation token values in config, but
also to be opportunistic as possible when downloading as many bundles as
seem appropriate.

These tests were used to inform the implementation of
fetch_bundles_by_token() in bundle-uri.c, but are being added
independently here to allow focusing on faulty downloads. There may be
more cases that could be added that result in modifications to
fetch_bundles_by_token() as interesting data shapes reveal themselves in
real scenarios.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t5558-clone-bundle-uri.sh | 400 ++++++++++++++++++++++++++++++++++++
 1 file changed, 400 insertions(+)

diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9c2b7934b9b..afd56926c53 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -618,6 +618,406 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	test_cmp expect actual
 '
 
+test_expect_success 'creationToken heuristic with failed downloads (clone)' '
+	test_when_finished rm -rf download-* trace*.txt &&
+
+	# Case 1: base bundle does not exist, nothing can unbundle
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = fake.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone-1.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" download-1 &&
+
+	# Bundle failure does not set these configs.
+	test_must_fail git -C download-1 config fetch.bundleuri &&
+	test_must_fail git -C download-1 config fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/fake.bundle
+	EOF
+	test_remote_https_urls <trace-clone-1.txt >actual &&
+	test_cmp expect actual &&
+
+	# All bundles failed to unbundle
+	git -C download-1 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	test_must_be_empty refs &&
+
+	# Case 2: middle bundle does not exist, only two bundles can unbundle
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = fake.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone-2.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" download-2 &&
+
+	# Bundle failure does not set these configs.
+	test_must_fail git -C download-2 config fetch.bundleuri &&
+	test_must_fail git -C download-2 config fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+	test_remote_https_urls <trace-clone-2.txt >actual &&
+	test_cmp expect actual &&
+
+	# bundle-1 and bundle-3 could unbundle, but bundle-4 could not
+	git -C download-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/right
+	EOF
+	test_cmp expect refs &&
+
+	# Case 3: top bundle does not exist, rest unbundle fine.
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = fake.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone-3.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" download-3 &&
+
+	# As long as we have continguous successful downloads,
+	# we _do_ set these configs.
+	test_cmp_config -C download-3 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+	test_cmp_config -C download-3 3 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+	test_remote_https_urls <trace-clone-3.txt >actual &&
+	test_cmp expect actual &&
+
+	# fake.bundle did not unbundle, but the others did.
+	git -C download-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/right
+	EOF
+	test_cmp expect refs
+'
+
+# Expand the bundle list to include other interesting shapes, specifically
+# interesting for use when fetching from a previous state.
+#
+# ---------------- bundle-7
+#       7
+#     _/|\_
+# ---/--|--\------ bundle-6
+#   5   |   6
+# --|---|---|----- bundle-4
+#   |   4   |
+#   |  / \  /
+# --|-|---|/------ bundle-3 (the client will be caught up to this point.)
+#   \ |   3
+# ---\|---|------- bundle-2
+#     2   |
+# ----|---|------- bundle-1
+#      \ /
+#       1
+#       |
+# (previous commits)
+test_expect_success 'expand incremental bundle list' '
+	(
+		cd clone-from &&
+		git checkout -b lefter left &&
+		test_commit 5 &&
+		git checkout -b righter right &&
+		test_commit 6 &&
+		git checkout -b top lefter &&
+		git merge -m "7" merge righter &&
+
+		git bundle create bundle-6.bundle lefter righter --not left right &&
+		git bundle create bundle-7.bundle top --not lefter merge righter &&
+
+		cp bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/"
+	) &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" fetch origin +refs/heads/*:refs/heads/*
+'
+
+test_expect_success 'creationToken heuristic with failed downloads (fetch)' '
+	test_when_finished rm -rf download-* trace*.txt &&
+
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+	EOF
+
+	git clone --single-branch --branch=left \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" fetch-base &&
+	test_cmp_config -C fetch-base "$HTTPD_URL/bundle-list" fetch.bundleURI &&
+	test_cmp_config -C fetch-base 3 fetch.bundleCreationToken &&
+
+	# Case 1: all bundles exist: successful unbundling of all bundles
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = bundle-6.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = bundle-7.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-1 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-1.txt" \
+		git -C fetch-1 fetch origin &&
+	test_cmp_config -C fetch-1 7 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-7.bundle
+	$HTTPD_URL/bundle-6.bundle
+	$HTTPD_URL/bundle-4.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-1.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-1 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/lefter
+	refs/bundles/merge
+	refs/bundles/right
+	refs/bundles/righter
+	refs/bundles/top
+	EOF
+	test_cmp expect refs &&
+
+	# Case 2: middle bundle does not exist, only bundle-4 can unbundle
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = fake.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = bundle-7.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-2.txt" \
+		git -C fetch-2 fetch origin &&
+
+	# Since bundle-7 fails to unbundle, do not update creation token.
+	test_cmp_config -C fetch-2 3 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-7.bundle
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-4.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-2.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	refs/bundles/right
+	EOF
+	test_cmp expect refs &&
+
+	# Case 3: top bundle does not exist, rest unbundle fine.
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = bundle-6.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = fake.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-3 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-3.txt" \
+		git -C fetch-3 fetch origin &&
+
+	# As long as we have continguous successful downloads,
+	# we _do_ set the maximum creation token.
+	test_cmp_config -C fetch-3 6 fetch.bundlecreationtoken &&
+
+	# NOTE: the fetch skips bundle-4 since bundle-6 successfully
+	# unbundles itself and bundle-7 failed to download.
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-6.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-3.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/lefter
+	refs/bundles/right
+	refs/bundles/righter
+	EOF
+	test_cmp expect refs
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2023-01-31 13:30 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
2023-01-06 20:36 ` [PATCH 1/8] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-17 18:17   ` Victoria Dye
2023-01-17 21:00     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-09  2:38   ` Junio C Hamano
2023-01-09 14:20     ` Derrick Stolee
2023-01-17 19:13   ` Victoria Dye
2023-01-06 20:36 ` [PATCH 3/8] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-09  3:08   ` Junio C Hamano
2023-01-09 14:41     ` Derrick Stolee
2023-01-17 19:24   ` Victoria Dye
2023-01-06 20:36 ` [PATCH 4/8] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-09  3:22   ` Junio C Hamano
2023-01-09 14:58     ` Derrick Stolee
2023-01-19 18:32   ` Victoria Dye
2023-01-20 14:56     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 5/8] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-19 19:42   ` Victoria Dye
2023-01-20 15:42     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 6/8] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-19 19:44   ` Victoria Dye
2023-01-06 20:36 ` [PATCH 7/8] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-19 20:34   ` Victoria Dye
2023-01-20 15:47     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 8/8] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-19 22:24   ` Victoria Dye
2023-01-20 15:53     ` Derrick Stolee
2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 01/10] bundle: optionally skip reachability walk Derrick Stolee via GitGitGadget
2023-01-23 18:03     ` Junio C Hamano
2023-01-23 18:24       ` Derrick Stolee
2023-01-23 20:13         ` Junio C Hamano
2023-01-23 22:30           ` Junio C Hamano
2023-01-24 12:27             ` Derrick Stolee
2023-01-24 14:14               ` [PATCH v2.5 01/11] bundle: test unbundling with incomplete history Derrick Stolee
2023-01-24 17:16                 ` Junio C Hamano
2023-01-24 14:16               ` [PATCH v2.5 02/11] bundle: verify using connected() Derrick Stolee
2023-01-24 17:33                 ` Junio C Hamano
2023-01-24 18:46                   ` Derrick Stolee
2023-01-24 20:41                     ` Junio C Hamano
2023-01-24 15:22               ` [PATCH v2 01/10] bundle: optionally skip reachability walk Junio C Hamano
2023-01-23 21:08         ` Junio C Hamano
2023-01-23 15:21   ` [PATCH v2 02/10] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-27 19:15     ` Victoria Dye
2023-01-23 15:21   ` [PATCH v2 03/10] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 04/10] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 05/10] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-27 19:17     ` Victoria Dye
2023-01-27 19:32       ` Junio C Hamano
2023-01-30 18:43         ` Derrick Stolee
2023-01-30 19:02           ` Junio C Hamano
2023-01-30 19:12             ` Derrick Stolee
2023-01-23 15:21   ` [PATCH v2 06/10] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 07/10] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 08/10] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-27 19:18     ` Victoria Dye
2023-01-23 15:21   ` [PATCH v2 09/10] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 10/10] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
2023-01-27 19:21     ` Victoria Dye
2023-01-30 18:47       ` Derrick Stolee
2023-01-27 19:28   ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Victoria Dye
2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 01/11] bundle: test unbundling with incomplete history Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 02/11] bundle: verify using check_connected() Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 03/11] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 04/11] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 05/11] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 06/11] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 07/11] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 08/11] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 09/11] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 10/11] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 11/11] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).