git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 00/24] [RFC] Bundle URIs Combined RFC
@ 2022-05-20 18:40 Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 01/24] docs: document bundle URI standard Derrick Stolee via GitGitGadget
                   ` (23 more replies)
  0 siblings, 24 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee

Here is the latest RFC round of Bundle URIs. It combines most of the ideas
from my previous RFC [5] and Ævar's latest RFC [7]. I'll start by talking
about the differences from those two approaches before going into the full
detail of the RFC (assuming no context of the previous discussion).

This series is based on ds/bundle-uri [8].


Comparison to Previous RFC Rounds
=================================

The two design documents from Ævar's latest RFC are combined into one and
are moved up to the first patch.

The "table of contents" concept is renamed to a "bundle list". When serving
from an HTTPS GET request, that still takes the form of a config file
format, but the exact same key=value pairs are also used in the bundle-uri
protocol v2 verb. Both use a common underlying logic for downloading from
lists that are found in either way. Since I modified Ævar's bundle-uri
protocol v2 logic to match this common model, I was able to preserve his
authorship on several patches, but some were different enough that I added
my co-authorship.

The bundle list has a new bundle.list.mode setting that can be all (get
everything) or any (get one). This allows the bundle-uri verb to advertise
multiple bundle servers where any one of them would be sufficient to get a
recent view of the repository.

The timestamp heuristic is removed from this series for the sake of brevity.
The logic for downloading the bundles can be modified in the future to take
advantage of such a heuristic if it is added in the future.

Ævar's use of shelling out to curl is completely removed. This was causing
test failures on Windows but also has no way of integrating with the
credential manager at the moment, which is critical to serving private
repository data via bundles. Similar to the timestamp heuristic, this "fetch
the headers first" heuristic can be added later.

Because these heuristics are removed, the incremental bundle fetches during
git fetch are also removed. This needs at least one heuristic, so is an
intended target after the git clone integration is more stable.

Instead of creating git bundle fetch, I implemented the same functionality
into git fetch --bundle-uri=X, which is perhaps more understandable to
users. This command will allow users to fetch from bundles on their own
timeline if they don't want to rely on Git auto-discovering or auto-fetching
from bundles.

Both the git clone --bundle-uri and the discovery from the bundle-uri
protocol v2 system are provided.

The transport.injectBundleURI config option is removed. Instead, the
--bundle-uri command-line option is provided.

Ævar's series had some use of logic to feed the bundle ref advertisement
directly into the fetch negotiation. This is removed in favor of storing the
bundled refs in refs/bundles/ (along with
log.excludeDecoration=refs/bundles/) to automatically gain that benefit,
including when using --bundle-uri.

I kept and extended the tests that Ævar provided. There is a new test in the
end-to-end test that includes the server advertising multiple bundles. The
series as a whole is still light on tests, but I wanted to continue to focus
on the viability of the design before committing too much on tests. I've
marked commit messages with RFC-TODO comments which will be resolved before
sending any patches for full review.


Context-Free Description
========================

There have been several suggestions to improve Git clone speeds and
reliability by supplementing the Git protocol with static content. The
Packfile URI [0] feature lets the Git response include URIs that point to
packfiles that the client must download to complete the request.

Last year, Ævar suggested using bundles instead of packfiles [1] [2]. This
design has the same benefits to the packfile URI feature because it offloads
most object downloads to static content fetches. The main advantage over
packfile URIs is that the remote Git server does not need to know what is in
those bundles. The Git client tells the server what it downloaded during the
fetch negotiation afterwards. This includes any chance that the client did
not have access to those bundles or otherwise failed to access them. I
agreed that this was a much more desirable way to serve static content but
had concerns about the flexibility of that design [3]. I started
investigating this idea myself in December, resulting in an RFC in February
[5]. After some discussion, we agreed [6] that Ævar would combine my ideas
with his, including the patches that had not been shared publicly. Ævar's
latest RFC [7] includes the two implementations mostly side-by-side, with
some early refactoring patches that help both implementations. I submitted
those early patches for full review [8], now known as ds/bundle-uri.

Notable to this topic, although not used directly in this RFC, is the notion
of partial bundles [4] which landed as a feature in Git 2.36.0. This allows
us to create bundles that respect a partial clone filter. While this RFC
does not implement the awareness of filters when downloading bundles, it
would be relatively simple to modify the implementation to include those
checks. My previous RFC [5] had that integration, for reference.

The first patch of this series is a design document detailing how the bundle
URI feature works in its ideal, feature-complete state (along with ways it
can be extended with more features in the future). I focused on maximizing
flexibility for the service that organizes and serves bundles. This
includes:

 * Git servers can advertise bundle URIs using the Git protocol.

 * Users can set up bundle servers independent of the remote Git server if
   they specify the bundle URI via a --bundle-uri argument.

 * Bundle URIs work for full and partial clones.

 * Bundle URIs can assist with git fetch in addition to git clone.

The general breakdown is as follows:

 * Patch 1 adds documentation for the feature in its entirety.

 * Patches 2-8 add the git fetch --bundle-uri=<X> and git clone
   --bundle-uri=<X> functionality for a single bundle (no bunde lists).

 * Patches 9-15 add the bundle list data structures and basic parsing so
   that git clone --bundle-uri=<X> can download bundles based on a bundle
   list.

 * Patches 16-24 add the bundle-uri protocol v2 verb and allows git clone to
   discover and download a bundle list advertised by the server.

I consider the patches in their current form to be “RFC quality”. There are
multiple places where tests are missing or special cases are not checked.
The goal for this RFC is to seek feedback on the high-level ideas before
committing to the deep work of creating mergeable patches.

My previous RFC [5] included a running server prototype using static HTTPS
content that is generated by a daily job. I updated that prototype to use
the new bundle list format, but since this RFC does not include
understanding the timestamp heuristic or the partial clone filters, the RFC
isn't appropriate for performance testing with that data. The clones will
still work, but they will download extra data (both the full and partial
bundles).


Intended Focus of this RFC
==========================

This RFC is very large, and that's even with many patches not including full
documentation or tests. These commits are not intended to be reviewed as if
I intended to merge this as-is.

One thing this feature establishes is a new standard by which the Git client
will communicate with external servers. The goal of this RFC is to determine
if this standard is well designed, and whether we need to make it more
robust. Alternatively, the design might need to be changed for reasons I
cannot predict.

[0]
https://github.com/git/git/blob/master/Documentation/technical/packfile-uri.txt
The packfile URI feature in Git (Created June 2020)

[1]
https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/
An earlier RFC for a bundle URI feature. (August 2021)

[2]
https://lore.kernel.org/git/cover-0.3-00000000000-20211025T211159Z-avarab@gmail.com/
An earlier patch series creating a 'bundle-uri' protocol v2 capability.
(October 2021)

[3]
https://lore.kernel.org/git/e7fe220b-2877-107e-8f7e-ea507a65feff@gmail.com/
My earlier thoughts on the previous RFCs, many of which are integrated into
this RFC (August 2021)

[4]
https://lore.kernel.org/git/pull.1159.git.1645638911.gitgitgadget@gmail.com/
Add object filters to bundles (February 2022)

[5]
https://lore.kernel.org/git/pull.1160.git.1645641063.gitgitgadget@gmail.com
[RFC] Bundle URIs (February 2022)

[6]
https://lore.kernel.org/git/ddebc223-1e13-e758-f9b1-d3f23961e459@github.com/
Summary of my conversation with Ævar (March 2022)

[7]
https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/
[RFC PATCH v2 00/36] bundle-uri: a "dumb CDN" for git + TOC format (April
2022)

[8]
https://lore.kernel.org/git/pull.1233.git.1652731865.gitgitgadget@gmail.com/
[PATCH 0/8] Bundle URIs: Prepatory patches (May 2022)

Thanks, -Stolee

Derrick Stolee (18):
  docs: document bundle URI standard
  remote-curl: add 'get' capability
  bundle-uri: create basic file-copy logic
  bundle-uri: add support for http(s):// and file://
  fetch: add --bundle-uri option
  fetch: add 'refs/bundle/' to log.excludeDecoration
  clone: add --bundle-uri option
  clone: --bundle-uri cannot be combined with --depth
  bundle-uri: create bundle_list struct and helpers
  bundle-uri: create base key-value pair parsing
  bundle-uri: limit recursion depth for bundle lists
  bundle-uri: parse bundle list in config format
  bundle-uri: fetch a list of bundles
  bundle-uri: serve URI advertisement from bundle.* config
  bundle-uri: allow relative URLs in bundle lists
  bundle-uri: download bundles from an advertised list
  clone: unbundle the advertised bundles
  t5601: basic bundle URI tests

Ævar Arnfjörð Bjarmason (6):
  bundle-uri: create "key=value" line parsing
  bundle-uri: unit test "key=value" parsing
  protocol v2: add server-side "bundle-uri" skeleton
  bundle-uri client: add minimal NOOP client
  bundle-uri client: add "git ls-remote-bundle-uri"
  bundle-uri client: add boolean transfer.bundleURI setting

 .gitignore                                 |   1 +
 Documentation/config/transfer.txt          |   6 +
 Documentation/fetch-options.txt            |   6 +
 Documentation/git-fetch.txt                |   1 +
 Documentation/git-ls-remote-bundle-uri.txt |  62 ++
 Documentation/git-ls-remote.txt            |   1 +
 Documentation/gitremote-helpers.txt        |   6 +
 Documentation/technical/bundle-uri.txt     | 477 +++++++++++++++
 Makefile                                   |   3 +
 builtin.h                                  |   1 +
 builtin/clone.c                            |  45 ++
 builtin/fetch.c                            |  10 +
 builtin/ls-remote-bundle-uri.c             |  71 +++
 bundle-uri.c                               | 663 +++++++++++++++++++++
 bundle-uri.h                               | 139 +++++
 command-list.txt                           |   1 +
 connect.c                                  |  47 ++
 git.c                                      |   1 +
 remote-curl.c                              |  32 +
 remote.h                                   |   5 +
 serve.c                                    |   6 +
 t/helper/test-bundle-uri.c                 |  92 +++
 t/helper/test-tool.c                       |   1 +
 t/helper/test-tool.h                       |   1 +
 t/lib-t5730-protocol-v2-bundle-uri.sh      | 294 +++++++++
 t/t5601-clone.sh                           |  59 ++
 t/t5701-git-serve.sh                       |  40 +-
 t/t5730-protocol-v2-bundle-uri-file.sh     |  36 ++
 t/t5731-protocol-v2-bundle-uri-git.sh      |  17 +
 t/t5732-protocol-v2-bundle-uri-http.sh     |  17 +
 t/t5750-bundle-uri-parse.sh                | 195 ++++++
 t/t6021-fetch-bundle.sh                    |  23 +
 t/test-lib-functions.sh                    |  11 +
 transport-helper.c                         |  18 +-
 transport-internal.h                       |   7 +
 transport.c                                |  87 +++
 transport.h                                |  23 +
 37 files changed, 2503 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/git-ls-remote-bundle-uri.txt
 create mode 100644 Documentation/technical/bundle-uri.txt
 create mode 100644 builtin/ls-remote-bundle-uri.c
 create mode 100644 bundle-uri.c
 create mode 100644 bundle-uri.h
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100644 t/lib-t5730-protocol-v2-bundle-uri.sh
 create mode 100755 t/t5730-protocol-v2-bundle-uri-file.sh
 create mode 100755 t/t5731-protocol-v2-bundle-uri-git.sh
 create mode 100755 t/t5732-protocol-v2-bundle-uri-http.sh
 create mode 100755 t/t5750-bundle-uri-parse.sh
 create mode 100755 t/t6021-fetch-bundle.sh


base-commit: 89c6e450fe4a919ecb6fa698005a935531c732cf
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1234%2Fderrickstolee%2Fbundle-redo%2Frfc-3-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1234/derrickstolee/bundle-redo/rfc-3-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1234
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 01/24] docs: document bundle URI standard
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 02/24] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Introduce the idea of bundle URIs to the Git codebase through an
aspirational design document. This document includes the full design
intended to include the feature in its fully-implemented form. This will
take several steps as detailed in the Implementation Plan section.

By committing this document now, it can be used to motivate changes
necessary to reach these final goals. The design can still be altered as
new information is discovered.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/technical/bundle-uri.txt | 477 +++++++++++++++++++++++++
 1 file changed, 477 insertions(+)
 create mode 100644 Documentation/technical/bundle-uri.txt

diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
new file mode 100644
index 00000000000..0862336c9eb
--- /dev/null
+++ b/Documentation/technical/bundle-uri.txt
@@ -0,0 +1,477 @@
+Bundle URIs
+===========
+
+Bundle URIs are locations where Git can download one or more bundles in
+order to bootstrap the object database in advance of fetching the remaining
+objects from a remote.
+
+One goal is to speed up clones and fetches for users with poor network
+connectivity to the origin server. Another benefit is to allow heavy users,
+such as CI build farms, to use local resources for the majority of Git data
+and thereby reducing the load on the origin server.
+
+To enable the bundle URI feature, users can specify a bundle URI using
+command-line options or the origin server can advertise one or more URIs
+via a protocol v2 capability.
+
+Design Goals
+------------
+
+The bundle URI standard aims to be flexible enough to satisfy multiple
+workloads. The bundle provider and the Git client have several choices in
+how they create and consume bundle URIs.
+
+* Bundles can have whatever name the server desires. This name could refer
+  to immutable data by using a hash of the bundle contents. However, this
+  means that a new URI will be needed after every update of the content.
+  This might be acceptable if the server is advertising the URI (and the
+  server is aware of new bundles being generated) but would not be
+  ergonomic for users using the command line option.
+
+* The bundles could be organized specifically for bootstrapping full
+  clones, but could also be organized with the intention of bootstrapping
+  incremental fetches. The bundle provider must decide on one of several
+  organization schemes to minimize client downloads during incremental
+  fetches, but the Git client can also choose whether to use bundles for
+  either of these operations.
+
+* The bundle provider can choose to support full clones, partial clones,
+  or both. The client can detect which bundles are appropriate for the
+  repository's partial clone filter, if any.
+
+* The bundle provider can use a single bundle (for clones only), or a
+  list of bundles. When using a list of bundles, the provider can specify
+  whether or not the client needs _all_ of the bundle URIs for a full
+  clone, or if _any_ one of the bundle URIs is sufficient. This allows the
+  bundle provider to use different URIs for different geographies.
+
+* The bundle provider can organize the bundles using heuristics, such as
+  timestamps or tokens, to help the client prevent downloading bundles it
+  does not need. When the bundle provider does not provide these
+  heuristics, the client can use optimizations to minimize how much of the
+  data is downloaded.
+
+* The bundle provider does not need to be associated with the Git server.
+  The client can choose to use the bundle provider without it being
+  advertised by the Git server.
+
+* The client can choose to discover bundle providers that are advertised
+  by the Git server. This could happen during `git clone`, during
+  `git fetch`, both, or neither. The user can choose which combination
+  works best for them.
+
+* The client can choose to configure a bundle provider manually at any
+  time. The client can also choose to specify a bundle provider manually
+  as a command-line option to `git clone`.
+
+Each repository is different and every Git server has different needs.
+Hopefully the bundle URI feature is flexible enough to satisfy all needs.
+If not, then the feature can be extended through its versioning mechanism.
+
+Server requirements
+-------------------
+
+To provide a server-side implementation of bundle servers, no other parts
+of the Git protocol are required. This allows server maintainers to use
+static content solutions such as CDNs in order to serve the bundle files.
+
+At the current scope of the bundle URI feature, all URIs are expected to
+be HTTP(S) URLs where content is downloaded to a local file using a `GET`
+request to that URL. The server could include authentication requirements
+to those requests with the aim of triggering the configured credential
+helper for secure access. (Future extensions could use "file://" URIs or
+SSH URIs.)
+
+Assuming a `200 OK` response from the server, the content at the URL is
+expected to be of one of two forms:
+
+1. Bundle: A Git bundle file of version 2 or higher.
+
+2. Table of Contents: A plain-text file that is parsable using Git's
+   config file parser. This file describes one or more bundles that are
+   accessible from other URIs.
+
+Any other data provided by the server is considered erroneous.
+
+Bundle Lists
+------------
+
+The Git server can advertise bundle URIs using a set of `key=value` pairs.
+A bundle URI can also serve a plain-text file in the Git config format
+containing these same `key=value` pairs. In both cases, we consider this
+to be a _bundle list_. The pairs specify information about the bundles
+that the client can use to make decisions for which bundles to download
+and which to ignore.
+
+A few keys focus on properties of the list itself.
+
+bundle.list.version::
+	(Required) This value provides a version number for the table of
+	contents. If a future Git change enables a feature that needs the
+	Git client to react to a new key in the table of contents file,
+	then this version will increment. The only current version number
+	is 1, and if any other value is specified then Git will fail to
+	use this file.
+
+bundle.list.mode::
+	(Required) This value has one of two values: `all` and `any`. When
+	`all` is specified, then the client should expect to need all of
+	the listed bundle URIs that match their repository's requirements.
+	When `any` is specified, then the client should expect that any
+	one of the bundle URIs that match their repository's requirements
+	will suffice. Typically, the `any` option is used to list a number
+	of different bundle servers located in different geographies.
+
+bundle.list.forFetch::
+	This boolean value is a signal to the Git client that
+	the bundle server has designed its bundle organization to assist `git fetch`
+	commands in addition to `git clone` commands. If this is missing,
+	Git should not use this table of contents for `git fetch` as it
+	may lead to excess data downloads.
+
+The remaining keys include an `<id>` segment which is a server-designated
+name for each available bundle.
+
+bundle.<id>.uri::
+	(Required) This string value is the URI for downloading bundle
+	`<id>`. If the URI begins with a protocol (`http://` or
+	`https://`) then the URI is absolute. Otherwise, the URI is
+	interpreted as relative to the URI used for the table of contents.
+	If the URI begins with `/`, then that relative path is relative to
+	the domain name used for the table of contents. (This use of
+	relative paths is intended to make it easier to distribute a set
+	of bundles across a large number of servers or CDNs with different
+	domain names.)
+
+bundle.<id>.timestamp::
+	This value is the number of seconds since Unix epoch (UTC) that
+	this bundle was created. This is used as an approximation of a
+	point in time that the bundle matches the data available at the
+	origin server.
+
+bundle.<id>.requires::
+	This string value represents the ID of another bundle. When
+	present, the server is indicating that this bundle contains a thin
+	packfile. If the client does not have all necessary objects to
+	unbundle this packfile, then the client can download the bundle
+	with the `requires` ID and try again. (Note: it may be beneficial
+	to allow the server to specify multiple `requires` bundles.)
+
+bundle.<id>.filter::
+	This string value represents an object filter that should also
+	appear in the header of this bundle. The server uses this value to
+	differentiate different kinds of bundles from which the client can
+	choose those that match their object filters.
+
+bundle.<id>.list::
+	This boolean value indicates whether the client should expect the
+	content from this URI to be a list (if `true`) or a bundle (if
+	`false`). This is typically used when `bundle.list.mode` is `any`.
+
+bundle.<id>.location::
+	This string value advertises a real-world location from where the
+	bundle URI is served. This can be used to present the user with an
+	option for which bundle URI to use. This is only valuable when
+	`bundle.list.mode` is `any`.
+
+Here is an example bundle list using the Git config format:
+
+```
+[bundle "list"]
+	version = 1
+	mode = all
+	forFetch = true
+
+[bundle "2022-02-09-1644442601-daily"]
+	uri = https://bundles.fake.com/git/git/2022-02-09-1644442601-daily.bundle
+	timestamp = 1644442601
+	requires = 2022-02-02-1643842562
+
+[bundle "2022-02-02-1643842562"]
+	uri = https://bundles.fake.com/git/git/2022-02-02-1643842562.bundle
+	timestamp = 1643842562
+
+[bundle "2022-02-09-1644442631-daily-blobless"]
+	uri = 2022-02-09-1644442631-daily-blobless.bundle
+	timestamp = 1644442631
+	requires = 2022-02-02-1643842568-blobless
+	filter = blob:none
+
+[bundle "2022-02-02-1643842568-blobless"]
+	uri = /git/git/2022-02-02-1643842568-blobless.bundle
+	timestamp = 1643842568
+	filter = blob:none
+```
+
+This example uses `bundle.list.mode=all` as well as the
+`bundle.<id>.timestamp` heuristic. It also uses the `bundle.<id>.filter`
+options to present two parallel sets of bundles: one for full clones and
+another for blobless partial clones.
+
+Suppose that this bundle list was found at the URI
+`https://bundles.fake.com/git/git/` and so the two blobless bundles have
+the following fully-expanded URIs:
+
+* `https://bundles.fake.com/git/git/2022-02-09-1644442631-daily-blobless.bundle`
+* `https://bundles.fake.com/git/git/2022-02-02-1643842568-blobless.bundle`
+
+Advertising Bundle URIs
+-----------------------
+
+If a user knows a bundle URI for the repository they are cloning, then they
+can specify that URI manually through a command-line option. However, a
+Git host may want to advertise bundle URIs during the clone operation,
+helping users unaware of the feature.
+
+The only thing required for this feature is that the server can advertise
+one or more bundle URIs. This advertisement takes the form of a new
+protocol v2 capability specifically for discovering bundle URIs.
+
+The client could choose an arbitrary bundle URI as an option _or_ select
+the URI with lowest latency by some exploratory checks. It is up to the
+bundle provider to decide if having multiple URIs is preferable to a
+single URI that is geodistributed through server-side infrastructure.
+
+Cloning with Bundle URIs
+------------------------
+
+The primary need for bundle URIs is to speed up clones. The Git client
+will interact with bundle URIs according to the following flow:
+
+1. The user specifies a bundle URI with the `--bundle-uri` command-line
+   option _or_ the client discovers a bundle list advertised by the
+   Git server.
+
+2. If the downloaded data from a bundle URI is a bundle, then the client
+   inspects the bundle headers to check that the negative commit OIDs are
+   present in the client repository. If some are missing, then the client
+   delays unbundling until other bundles have been unbundled, making those
+   OIDs present. When all required OIDs are present, the client unbundles
+   that data using a refspec. The default refspec is
+   `+refs/heads/*:refs/bundles/*`, but this can be configured.
+
+3. If the file is instead a bundle list, then the client inspects the
+   `bundle.list.mode` to see if the list is of the `all` or `any` form.
+
+   a. If `bundle.list.mode=all`, then the client considers all bundle
+      URIs. The list is reduced based on the `bundle.<id>.filter` options
+      matching the client repository's partial clone filter. Then, all
+      bundle URIs are requested. If the `bundle.<id>.timestamp` heuristic
+      is provided, then the bundles are downloaded in reverse-
+      chronological order, stopping when a bundle has all required OIDs.
+      The bundles can then be unbundled in chronological order. The client
+      stores the latest timestamp as a heuristic for avoiding future
+      downloads if the bundle list does not advertise newer bundles.
+
+   b. If `bundle.list.mode=any`, then the client can choose any one of the
+      bundle URIs to inspect. The client can use a variety of ways to
+      choose among these URIs. The client can also fallback to another URI
+      if the initial choice fails to return a result.
+
+Note that during a clone we expect that all bundles will be required, and
+heuristics such as `bundle.<uri>.timestamp` can be used to download bundles
+in chronological order or in parallel.
+
+If a given bundle URI is a bundle list with `bundle.list.forFetch=true`,
+then the client can choose to store that URI as its chosen bundle URI. The
+client can then navigate directly to that URI during later `git fetch`
+calls.
+
+When downloading bundle URIs, the client can choose to inspect the initial
+content before committing to downloading the entire content. This may
+provide enough information to determine if the URI is a bundle list or
+a bundle. In the case of a bundle, the client may inspect the bundle
+header to determine that all advertised tips are already in the client
+repository and cancel the remaining download.
+
+Fetching with Bundle URIs
+-------------------------
+
+When the client fetches new data, it can decide to fetch from bundle
+servers before fetching from the origin remote. This could be done via
+a command-line option, but it is more likely useful to use a config value
+such as the one specified during the clone.
+
+The fetch operation follows the same procedure to download bundles from a
+bundle list (although we do _not_ want to use parallel downloads here). We
+expect that the process will end when all negative commit OIDs in a thin
+bundle are already in the object database.
+
+A further optimization is that the client can avoid downloading any
+bundles if their timestamps are not larger than the stored timestamp.
+After fetching new bundles, this local timestamp value is updated.
+
+If the bundle provider does not provide the `bundle.<uri>.timestamp`
+heuristic, then the client should attempt to inspect the bundle headers
+before downloading the full bundle data in case the bundle tips already
+exist in the client repository.
+
+Error Conditions
+----------------
+
+If the Git client discovers something unexpected while downloading
+information according to a bundle URI or the table of contents found at
+that location, then Git can ignore that data and continue as if it was not
+given a bundle URI. The remote Git server is the ultimate source of truth,
+not the bundle URI.
+
+Here are a few example error conditions:
+
+* The client fails to connect with a server at the given URI or a connection
+  is lost without any chance to recover.
+
+* The client receives a response other than `200 OK` (such as `404 Not Found`,
+  `401 Not Authorized`, or `500 Internal Server Error`). The client should
+  use the `credential.helper` to attempt authentication after the first
+  `401 Not Authorized` response, but a second such response is a failure.
+
+* The client receives data that is not parsable as a bundle or table of
+  contents.
+
+* The bundle list describes a directed cycle in the
+  `bundle.<id>.requires` links.
+
+* A bundle includes a filter that does not match expectations.
+
+* The client cannot unbundle the bundles because the negative commit OIDs
+  are not in the object database and there are no more
+  `bundle.<id>.requires` links to follow.
+
+There are also situations that could be seen as wasteful, but are not
+error conditions:
+
+* The downloaded bundles contain more information than is requested by
+  the clone or fetch request. A primary example is if the user requests
+  a clone with `--single-branch` but downloads bundles that store every
+  reachable commit from all `refs/heads/*` references. This might be
+  initially wasteful, but perhaps these objects will become reachable by
+  a later ref update that the client cares about.
+
+* A bundle download during a `git fetch` contains objects already in the
+  object database. This is probably unavoidable if we are using bundles
+  for fetches, since the client will almost always be slightly ahead of
+  the bundle servers after performing its "catch-up" fetch to the remote
+  server. This extra work is most wasteful when the client is fetching
+  much more frequently than the server is computing bundles, such as if
+  the client is using hourly prefetches with background maintenance, but
+  the server is computing bundles weekly. For this reason, the client
+  should not use bundle URIs for fetch unless the server has explicitly
+  recommended it through the `bundle.list.forFetch = true` value.
+
+Implementation Plan
+-------------------
+
+This design document is being submitted on its own as an aspirational
+document, with the goal of implementing all of the mentioned client
+features over the course of several patch series. Here is a potential
+outline for submitting these features:
+
+1. Integrate bundle URIs into `git clone` with a `--bundle-uri` option.
+   This will include a new `git fetch --bundle-uri` mode for use as the
+   implementation underneath `git clone`. The initial version here will
+   expect a single bundle at the given URI.
+
+2. Implement the ability to parse a bundle list from a bundle URI and
+   update the `git fetch --bundle-uri` logic to properly distinguish
+   between `bundle.list.mode` options. Specifically design the feature so
+   that the config format parsing feeds a list of key-value pairs into the
+   bundle list logic.
+
+3. Create the `bundle-uri` protocol v2 verb so Git servers can advertise
+   bundle URIs using the key-value pairs. Plug into the existing key-value
+   input to the bundle list logic. Allow `git clone` to discover these
+   bundle URIs and bootstrap the client repository from the bundle data.
+   (This choice is an opt-in via a config option and a command-line
+   option.)
+
+4. Allow the client to understand the `bundle.list.forFetch` configuration
+   and the `bundle.<id>.timestamp` heuristic. When `git clone` discovers a
+   bundle URI with `bundle.list.forFetch=true`, it configures the client
+   repository to check that bundle URI during later `git fetch <remote>`
+   commands.
+
+5. Allow clients to discover bundle URIs during `git fetch` and configure
+   a bundle URI for later fetches if `bundle.list.forFetch=true`.
+
+6. Implement the "inspect headers" heuristic to reduce data downloads when
+   the `bundle.<id>.timestamp` heuristic is not available.
+
+As these features are reviewed, this plan might be updated. We also expect
+that new designs will be discovered and implemented as this feature
+matures and becomes used in real-world scenarios.
+
+Related Work: Packfile URIs
+---------------------------
+
+The Git protocol already has a capability where the Git server can list
+a set of URLs along with the packfile response when serving a client
+request. The client is then expected to download the packfiles at those
+locations in order to have a complete understanding of the response.
+
+This mechanism is used by the Gerrit server (implemented with JGit) and
+has been effective at reducing CPU load and improving user performance for
+clones.
+
+A major downside to this mechanism is that the origin server needs to know
+_exactly_ what is in those packfiles, and the packfiles need to be available
+to the user for some time after the server has responded. This coupling
+between the origin and the packfile data is difficult to manage.
+
+Further, this implementation is extremely hard to make work with fetches.
+
+Related Work: GVFS Cache Servers
+--------------------------------
+
+The GVFS Protocol [2] is a set of HTTP endpoints designed independently of
+the Git project before Git's partial clone was created. One feature of this
+protocol is the idea of a "cache server" which can be colocated with build
+machines or developer offices to transfer Git data without overloading the
+central server.
+
+The endpoint that VFS for Git is famous for is the `GET /gvfs/objects/{oid}`
+endpoint, which allows downloading an object on-demand. This is a critical
+piece of the filesystem virtualization of that product.
+
+However, a more subtle need is the `GET /gvfs/prefetch?lastPackTimestamp=<t>`
+endpoint. Given an optional timestamp, the cache server responds with a list
+of precomputed packfiles containing the commits and trees that were introduced
+in those time intervals.
+
+The cache server computes these "prefetch" packfiles using the following
+strategy:
+
+1. Every hour, an "hourly" pack is generated with a given timestamp.
+2. Nightly, the previous 24 hourly packs are rolled up into a "daily" pack.
+3. Nightly, all prefetch packs more than 30 days old are rolled up into
+   one pack.
+
+When a user runs `gvfs clone` or `scalar clone` against a repo with cache
+servers, the client requests all prefetch packfiles, which is at most
+`24 + 30 + 1` packfiles downloading only commits and trees. The client
+then follows with a request to the origin server for the references, and
+attempts to checkout that tip reference. (There is an extra endpoint that
+helps get all reachable trees from a given commit, in case that commit
+was not already in a prefetch packfile.)
+
+During a `git fetch`, a hook requests the prefetch endpoint using the
+most-recent timestamp from a previously-downloaded prefetch packfile.
+Only the list of packfiles with later timestamps are downloaded. Most
+users fetch hourly, so they get at most one hourly prefetch pack. Users
+whose machines have been off or otherwise have not fetched in over 30 days
+might redownload all prefetch packfiles. This is rare.
+
+It is important to note that the clients always contact the origin server
+for the refs advertisement, so the refs are frequently "ahead" of the
+prefetched pack data. The missing objects are downloaded on-demand using
+the `GET gvfs/objects/{oid}` requests, when needed by a command such as
+`git checkout` or `git log`. Some Git optimizations disable checks that
+would cause these on-demand downloads to be too aggressive.
+
+See Also
+--------
+
+[1] https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/
+    An earlier RFC for a bundle URI feature.
+
+[2] https://github.com/microsoft/VFSForGit/blob/master/Protocol.md
+    The GVFS Protocol
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 02/24] remote-curl: add 'get' capability
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 01/24] docs: document bundle URI standard Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 03/24] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

A future change will want a way to download a file over HTTP(S) using
the simplest of download mechanisms. We do not want to assume that the
server on the other side understands anything about the Git protocol but
could be a simple static web server.

Create the new 'get' capability for the remote helpers which advertises
that the 'get' command is avalable. A caller can send a line containing
'get <url> <path>' to download the file at <url> into the file at
<path>.

RFC-TODO: This change requires tests directly on the remote helper.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/gitremote-helpers.txt |  6 ++++++
 remote-curl.c                       | 32 +++++++++++++++++++++++++++++
 t/t6021-fetch-bundle.sh             | 23 +++++++++++++++++++++
 transport-helper.c                  |  5 ++++-
 4 files changed, 65 insertions(+), 1 deletion(-)
 create mode 100755 t/t6021-fetch-bundle.sh

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index 6f1e269ae43..f82588601a9 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -168,6 +168,9 @@ Supported commands: 'list', 'import'.
 	Can guarantee that when a clone is requested, the received
 	pack is self contained and is connected.
 
+'get'::
+	Can use the 'get' command to download a file from a given URI.
+
 If a helper advertises 'connect', Git will use it if possible and
 fall back to another capability if the helper requests so when
 connecting (see the 'connect' command under COMMANDS).
@@ -418,6 +421,9 @@ Supported if the helper has the "connect" capability.
 +
 Supported if the helper has the "stateless-connect" capability.
 
+'get' <uri> <path>::
+	Downloads the file from the given `<uri>` to the given `<path>`.
+
 If a fatal error occurs, the program writes the error message to
 stderr and exits. The caller should expect that a suitable error
 message has been printed if the child closes the connection without
diff --git a/remote-curl.c b/remote-curl.c
index 67f178b1120..53750d88e76 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -1276,6 +1276,33 @@ static void parse_fetch(struct strbuf *buf)
 	strbuf_reset(buf);
 }
 
+static void parse_get(struct strbuf *buf)
+{
+	struct http_get_options opts = { 0 };
+	struct strbuf url = STRBUF_INIT;
+	struct strbuf path = STRBUF_INIT;
+	const char *p, *space;
+
+	if (!skip_prefix(buf->buf, "get ", &p))
+		die(_("http transport does not support %s"), buf->buf);
+
+	space = strchr(p, ' ');
+
+	if (!space)
+		die(_("protocol error: expected '<url> <path>', missing space"));
+
+	strbuf_add(&url, p, space - p);
+	strbuf_addstr(&path, space + 1);
+
+	http_get_file(url.buf, path.buf, &opts);
+
+	strbuf_release(&url);
+	strbuf_release(&path);
+	printf("\n");
+	fflush(stdout);
+	strbuf_reset(buf);
+}
+
 static int push_dav(int nr_spec, const char **specs)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -1549,9 +1576,14 @@ int cmd_main(int argc, const char **argv)
 				printf("unsupported\n");
 			fflush(stdout);
 
+		} else if (skip_prefix(buf.buf, "get ", &arg)) {
+			parse_get(&buf);
+			fflush(stdout);
+
 		} else if (!strcmp(buf.buf, "capabilities")) {
 			printf("stateless-connect\n");
 			printf("fetch\n");
+			printf("get\n");
 			printf("option\n");
 			printf("push\n");
 			printf("check-connectivity\n");
diff --git a/t/t6021-fetch-bundle.sh b/t/t6021-fetch-bundle.sh
new file mode 100755
index 00000000000..8ae16009ba0
--- /dev/null
+++ b/t/t6021-fetch-bundle.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+test_description='test fetching files and bundles over HTTP'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'set up source repo' '
+	git init src &&
+	test_commit -C src one
+'
+
+test_expect_success 'fail to get a file over HTTP' '
+	cat >input <<-EOF &&
+	get $HTTPD_URL/does-not-exist.txt download-failed.txt
+
+	EOF
+	git remote-https "$HTTPD_URL" <input 2>err &&
+	test_path_is_missing download-failed.txt
+'
+
+test_done
diff --git a/transport-helper.c b/transport-helper.c
index b4dbbabb0c2..dfbeaebe40c 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -33,7 +33,8 @@ struct helper_data {
 		check_connectivity : 1,
 		no_disconnect_req : 1,
 		no_private_update : 1,
-		object_format : 1;
+		object_format : 1,
+		get : 1;
 
 	/*
 	 * As an optimization, the transport code may invoke fetch before
@@ -210,6 +211,8 @@ static struct child_process *get_helper(struct transport *transport)
 			data->no_private_update = 1;
 		} else if (starts_with(capname, "object-format")) {
 			data->object_format = 1;
+		} else if (!strcmp(capname, "get")) {
+			data->get = 1;
 		} else if (mandatory) {
 			die(_("unknown mandatory capability %s; this remote "
 			      "helper probably needs newer version of Git"),
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 03/24] bundle-uri: create basic file-copy logic
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 01/24] docs: document bundle URI standard Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 02/24] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 04/24] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Before implementing a way to fetch bundles into a repository, create the
basic logic. Assume that the URI is actually a file path. Future logic
will make this more careful to other protocols.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Makefile     |  1 +
 bundle-uri.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 bundle-uri.h | 14 ++++++++
 3 files changed, 107 insertions(+)
 create mode 100644 bundle-uri.c
 create mode 100644 bundle-uri.h

diff --git a/Makefile b/Makefile
index f8bccfab5e9..8f27310836d 100644
--- a/Makefile
+++ b/Makefile
@@ -887,6 +887,7 @@ LIB_OBJS += blob.o
 LIB_OBJS += bloom.o
 LIB_OBJS += branch.o
 LIB_OBJS += bulk-checkin.o
+LIB_OBJS += bundle-uri.o
 LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
 LIB_OBJS += cbtree.o
diff --git a/bundle-uri.c b/bundle-uri.c
new file mode 100644
index 00000000000..91aba061833
--- /dev/null
+++ b/bundle-uri.c
@@ -0,0 +1,92 @@
+#include "cache.h"
+#include "bundle-uri.h"
+#include "bundle.h"
+#include "object-store.h"
+#include "refs.h"
+#include "run-command.h"
+
+static void find_temp_filename(struct strbuf *name)
+{
+	int fd;
+	/*
+	 * Find a temporray filename that is available. This is briefly
+	 * racy, but unlikely to collide.
+	 */
+	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
+	if (fd < 0)
+		die(_("failed to create temporary file"));
+	close(fd);
+	unlink(name->buf);
+}
+
+static int copy_uri_to_file(const char *uri, const char *file)
+{
+	/* Copy as a file */
+	return copy_file(uri, file, 0444);
+}
+
+static int unbundle_from_file(struct repository *r, const char *file)
+{
+	int result = 0;
+	int bundle_fd;
+	struct bundle_header header = BUNDLE_HEADER_INIT;
+	struct strvec extra_index_pack_args = STRVEC_INIT;
+	struct string_list_item *refname;
+	struct strbuf bundle_ref = STRBUF_INIT;
+	size_t bundle_prefix_len;
+
+	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
+		return 1;
+
+	result = unbundle(r, &header, bundle_fd, &extra_index_pack_args);
+
+	/*
+	 * Convert all refs/heads/ from the bundle into refs/bundles/
+	 * in the local repository.
+	 */
+	strbuf_addstr(&bundle_ref, "refs/bundles/");
+	bundle_prefix_len = bundle_ref.len;
+
+	for_each_string_list_item(refname, &header.references) {
+		struct object_id *oid = refname->util;
+		struct object_id old_oid;
+		const char *branch_name;
+		int has_old;
+
+		if (!skip_prefix(refname->string, "refs/heads/", &branch_name))
+			continue;
+
+		strbuf_setlen(&bundle_ref, bundle_prefix_len);
+		strbuf_addstr(&bundle_ref, branch_name);
+
+		has_old = !read_ref(bundle_ref.buf, &old_oid);
+		update_ref("fetched bundle", bundle_ref.buf, oid,
+			   has_old ? &old_oid : NULL,
+			   REF_SKIP_OID_VERIFICATION,
+			   UPDATE_REFS_MSG_ON_ERR);
+	}
+
+	bundle_header_release(&header);
+	return result;
+}
+
+int fetch_bundle_uri(struct repository *r, const char *uri)
+{
+	int result = 0;
+	struct strbuf filename = STRBUF_INIT;
+
+	find_temp_filename(&filename);
+	if ((result = copy_uri_to_file(uri, filename.buf)))
+		goto cleanup;
+
+	if ((result = !is_bundle(filename.buf, 0)))
+		goto cleanup;
+
+	if ((result = unbundle_from_file(r, filename.buf)))
+		goto cleanup;
+
+cleanup:
+	unlink(filename.buf);
+	strbuf_release(&filename);
+	return result;
+}
\ No newline at end of file
diff --git a/bundle-uri.h b/bundle-uri.h
new file mode 100644
index 00000000000..8a152f1ef14
--- /dev/null
+++ b/bundle-uri.h
@@ -0,0 +1,14 @@
+#ifndef BUNDLE_URI_H
+#define BUNDLE_URI_H
+
+struct repository;
+
+/**
+ * Fetch data from the given 'uri' and unbundle the bundle data found
+ * based on that information.
+ *
+ * Returns non-zero if no bundle information is found at the given 'uri'.
+ */
+int fetch_bundle_uri(struct repository *r, const char *uri);
+
+#endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 04/24] bundle-uri: add support for http(s):// and file://
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 03/24] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 05/24] fetch: add --bundle-uri option Derrick Stolee via GitGitGadget
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change created the logic for copying a file by name. Now,
first inspect the URI for an HTTP(S) prefix and use git-remote-https as
the way to download the data at that URI. Otherwise, check to see if
file:// is present and modify the prefix accordingly.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 64 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 91aba061833..08de7257c74 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -19,10 +19,73 @@ static void find_temp_filename(struct strbuf *name)
 	unlink(name->buf);
 }
 
+static int download_https_uri_to_file(const char *uri, const char *file)
+{
+	int result = 0;
+	struct child_process cp = CHILD_PROCESS_INIT;
+	FILE *child_in = NULL, *child_out = NULL;
+	struct strbuf line = STRBUF_INIT;
+	int found_get = 0;
+
+	strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL);
+	cp.in = -1;
+	cp.out = -1;
+
+	if (start_command(&cp))
+		return 1;
+
+	child_in = fdopen(cp.in, "w");
+	if (!child_in) {
+		result = 1;
+		goto cleanup;
+	}
+
+	child_out = fdopen(cp.out, "r");
+	if (!child_out) {
+		result = 1;
+		goto cleanup;
+	}
+
+	fprintf(child_in, "capabilities\n");
+	fflush(child_in);
+
+	while (!strbuf_getline(&line, child_out)) {
+		if (!line.len)
+			break;
+		if (!strcmp(line.buf, "get"))
+			found_get = 1;
+	}
+	strbuf_release(&line);
+
+	if (!found_get) {
+		result = error(_("insufficient capabilities"));
+		goto cleanup;
+	}
+
+	fprintf(child_in, "get %s %s\n\n", uri, file);
+
+cleanup:
+	if (child_in)
+		fclose(child_in);
+	if (finish_command(&cp))
+		return 1;
+	if (child_out)
+		fclose(child_out);
+	return result;
+}
+
 static int copy_uri_to_file(const char *uri, const char *file)
 {
+	const char *out;
+	if (skip_prefix(uri, "https:", &out) ||
+	    skip_prefix(uri, "http:", &out))
+		return download_https_uri_to_file(uri, file);
+
+	if (!skip_prefix(uri, "file://", &out))
+		out = uri;
+
 	/* Copy as a file */
-	return copy_file(uri, file, 0444);
+	return !!copy_file(out, file, 0);
 }
 
 static int unbundle_from_file(struct repository *r, const char *file)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 05/24] fetch: add --bundle-uri option
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 04/24] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 06/24] fetch: add 'refs/bundle/' to log.excludeDecoration Derrick Stolee via GitGitGadget
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Teach 'git fetch' a new --bundle-uri=<uri> option which changes the mode
from fetching from a remote using the Git protocol to fetching a bundle
from the given <uri>. See Documentation/technical/bundle-uri.txt for
more information on the design of this feature.

This implementation is limited to the most basic version of the feature.
We expect the content at that URI to be a bundle file, not a bundle
list. Bundle lists will be implemented later.

This implementation is sufficient for a bundle provider to create a
single bootstrap bundle for a large repository. The user would bootstrap
a repository using a sequence of Git commands, such as:

 1. git init <repo> && cd <repo>
 2. git fetch --bundle-uri=<uri>
 3. git remote add origin <url>
 4. git fetch origin
 5. git checkout FETCH_HEAD

Later changes will make this seamless within a 'git clone' command, but
this implementation is large enough to delay that integration.

Currently, this option supports URIs using "http(s)://", "file://" or
simply specifying a filename. Other protocols could be added in the
future.

RFC TODO: add end-to-end tests of this workflow.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/fetch-options.txt |  5 +++++
 Documentation/git-fetch.txt     |  1 +
 builtin/fetch.c                 | 10 ++++++++++
 3 files changed, 16 insertions(+)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 622bd84768b..09bd1feeed8 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -317,3 +317,8 @@ endif::git-pull[]
 -6::
 --ipv6::
 	Use IPv6 addresses only, ignoring IPv4 addresses.
+
+--bundle-uri=<uri>::
+	Instead of fetching from a remote, fetch a bundle from the given
+	`<uri>` and unbundle the data into the local repository. The refs
+	in the bundle will be stored under the `refs/bundle/*` namespace.
diff --git a/Documentation/git-fetch.txt b/Documentation/git-fetch.txt
index e9d364669af..4fd8911b336 100644
--- a/Documentation/git-fetch.txt
+++ b/Documentation/git-fetch.txt
@@ -13,6 +13,7 @@ SYNOPSIS
 'git fetch' [<options>] <group>
 'git fetch' --multiple [<options>] [(<repository> | <group>)...]
 'git fetch' --all [<options>]
+'git fetch' --bundle-uri=<uri> [<options>]
 
 
 DESCRIPTION
diff --git a/builtin/fetch.c b/builtin/fetch.c
index e3791f09ed5..cb0d2fbe82c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -29,6 +29,7 @@
 #include "commit-graph.h"
 #include "shallow.h"
 #include "worktree.h"
+#include "bundle-uri.h"
 
 #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000)
 
@@ -37,6 +38,7 @@ static const char * const builtin_fetch_usage[] = {
 	N_("git fetch [<options>] <group>"),
 	N_("git fetch --multiple [<options>] [(<repository> | <group>)...]"),
 	N_("git fetch --all [<options>]"),
+	N_("git fetch --bundle-uri=<uri> [<options>]"),
 	NULL
 };
 
@@ -86,6 +88,7 @@ static struct string_list negotiation_tip = STRING_LIST_INIT_NODUP;
 static int fetch_write_commit_graph = -1;
 static int stdin_refspecs = 0;
 static int negotiate_only;
+static const char *bundle_uri;
 
 static int git_fetch_config(const char *k, const char *v, void *cb)
 {
@@ -224,6 +227,8 @@ static struct option builtin_fetch_options[] = {
 		 N_("write the commit-graph after fetching")),
 	OPT_BOOL(0, "stdin", &stdin_refspecs,
 		 N_("accept refspecs from stdin")),
+	OPT_STRING(0, "bundle-uri", &bundle_uri, N_("uri"),
+		   N_("download bundle data from the given URI instead of from a remote")),
 	OPT_END()
 };
 
@@ -2181,6 +2186,11 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	if (dry_run)
 		write_fetch_head = 0;
 
+	if (bundle_uri) {
+		result = fetch_bundle_uri(the_repository, bundle_uri);
+		goto cleanup;
+	}
+
 	if (all) {
 		if (argc == 1)
 			die(_("fetch --all does not take a repository argument"));
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 06/24] fetch: add 'refs/bundle/' to log.excludeDecoration
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 05/24] fetch: add --bundle-uri option Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 07/24] clone: add --bundle-uri option Derrick Stolee via GitGitGadget
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When fetching from a bundle URI, the branches of that bundle are stored
in a different ref namespace: refs/bundles/. This namespace is intended
to assist with later 'git fetch' negotiations with a Git server,
allowing the client to advertise which data it already has from a bundle
URI.

These references can be confusing for a user when they appear as a
decoration in 'git log' output. Add "refs/bundles/" to the multi-valued
log.excludeDecoration config value. This is similar to the way
"refs/prefetch/" is hidden by background prefetch operations in 'git
maintenance' as added by 96eaffebb (maintenance: set
log.excludeDecoration durin prefetch, 2021-01-19).

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/fetch-options.txt | 3 ++-
 bundle-uri.c                    | 7 +++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 09bd1feeed8..8b801bcc2f3 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -321,4 +321,5 @@ endif::git-pull[]
 --bundle-uri=<uri>::
 	Instead of fetching from a remote, fetch a bundle from the given
 	`<uri>` and unbundle the data into the local repository. The refs
-	in the bundle will be stored under the `refs/bundle/*` namespace.
+	in the bundle will be stored under the hidden `refs/bundle/*`
+	namespace.
diff --git a/bundle-uri.c b/bundle-uri.c
index 08de7257c74..0692a62071a 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "bundle-uri.h"
 #include "bundle.h"
+#include "config.h"
 #include "object-store.h"
 #include "refs.h"
 #include "run-command.h"
@@ -148,6 +149,12 @@ int fetch_bundle_uri(struct repository *r, const char *uri)
 	if ((result = unbundle_from_file(r, filename.buf)))
 		goto cleanup;
 
+	git_config_set_multivar_gently("log.excludedecoration",
+					"refs/bundle/",
+					"refs/bundle/",
+					CONFIG_FLAGS_FIXED_VALUE |
+					CONFIG_FLAGS_MULTI_REPLACE);
+
 cleanup:
 	unlink(filename.buf);
 	strbuf_release(&filename);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 07/24] clone: add --bundle-uri option
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 06/24] fetch: add 'refs/bundle/' to log.excludeDecoration Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 08/24] clone: --bundle-uri cannot be combined with --depth Derrick Stolee via GitGitGadget
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Cloning a remote repository is one of the most expensive operations in
Git. The server can spend a lot of CPU time generating a pack-file for
the client's request. The amount of data can clog the network for a long
time, and the Git protocol is not resumable. For users with poor network
connections or are located far away from the origin server, this can be
especially painful.

The 'git bundle fetch' command allows users to bootstrap a repository
using a set of bundles. However, this would require them to use 'git
init' first, followed by the 'git bundle fetch', and finally add a
remote, fetch, and checkout the branch they want.

Instead, integrate this workflow directly into 'git clone' with the
--bundle-uri' option. If the user is aware of a bundle server, then they
can tell Git to bootstrap the new repository with these bundles before
fetching the remaining objects from the origin server.

RFC-TODO: Document this option in git-clone.txt.

RFC-TODO: I added a comment about the location of this code being
necessary for the later step of auto-discovering the bundle URI from the
origin server. This is probably not actually a requirement, but rather a
pain point around how I implemented the feature. If a --bundle-uri
option is specified, but SSH is used for the clone, then the SSH
connection is left open while Git downloads bundles from another server.
This is sub-optimal and should be reconsidered when fully reviewed.

RFC-TODO: create tests for this option with a variety of URI types.

RFC-TODO: a simple end-to-end test is available at the end of the
series.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/clone.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 52316563795..fd1ae82e57b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -34,6 +34,7 @@
 #include "list-objects-filter-options.h"
 #include "hook.h"
 #include "bundle.h"
+#include "bundle-uri.h"
 
 /*
  * Overall FIXMEs:
@@ -77,6 +78,7 @@ static int option_filter_submodules = -1;    /* unspecified */
 static int config_filter_submodules = -1;    /* unspecified */
 static struct string_list server_options = STRING_LIST_INIT_NODUP;
 static int option_remote_submodules;
+static const char *bundle_uri;
 
 static int recurse_submodules_cb(const struct option *opt,
 				 const char *arg, int unset)
@@ -160,6 +162,8 @@ static struct option builtin_clone_options[] = {
 		    N_("any cloned submodules will use their remote-tracking branch")),
 	OPT_BOOL(0, "sparse", &option_sparse_checkout,
 		    N_("initialize sparse-checkout file to include only files at root")),
+	OPT_STRING(0, "bundle-uri", &bundle_uri,
+		   N_("uri"), N_("a URI for downloading bundles before fetching from origin remote")),
 	OPT_END()
 };
 
@@ -1232,6 +1236,24 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 	refs = transport_get_remote_refs(transport, &transport_ls_refs_options);
 
+	/*
+	 * NOTE: The bundle URI download takes place after transport_get_remote_refs()
+	 * because a later change will introduce a check for recommended features,
+	 * which might include a recommended bundle URI.
+	 */
+
+	/*
+	 * Before fetching from the remote, download and install bundle
+	 * data from the --bundle-uri option.
+	 */
+	if (bundle_uri) {
+		/* At this point, we need the_repository to match the cloned repo. */
+		repo_init(the_repository, git_dir, work_tree);
+		if (fetch_bundle_uri(the_repository, bundle_uri))
+			warning(_("failed to fetch objects from bundle URI '%s'"),
+				bundle_uri);
+	}
+
 	if (refs)
 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 08/24] clone: --bundle-uri cannot be combined with --depth
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 07/24] clone: add --bundle-uri option Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 09/24] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change added the '--bundle-uri' option, but did not check
if the --depth parameter was included. Since bundles are not compatible
with shallow clones, provide an error message to the user who is
attempting this combination.

I am leaving this as its own change, separate from the one that
implements '--bundle-uri', because this is more of an advisory for the
user. There is nothing wrong with bootstrapping with bundles and then
fetching a shallow clone. However, that is likely going to involve too
much work for the client _and_ the server. The client will download all
of this bundle information containing the full history of the
repository only to ignore most of it. The server will get a shallow
fetch request, but with a list of haves that might cause a more painful
computation of that shallow pack-file.

RFC-TODO: add a test case for this error message.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/clone.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index fd1ae82e57b..a9caa5dfed6 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -926,6 +926,11 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		option_no_checkout = 1;
 	}
 
+	if (bundle_uri) {
+		if (deepen)
+			die(_("--bundle-uri is incompatible with --depth, --shallow-since, and --shallow-exclude"));
+	}
+
 	repo_name = argv[0];
 
 	path = get_repo_path(repo_name, &is_bundle);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 09/24] bundle-uri: create bundle_list struct and helpers
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 08/24] clone: --bundle-uri cannot be combined with --depth Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 10/24] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

It will likely be rare where a user uses a single bundle URI and expects
that URI to point to a bundle. Instead, that URI will likely be a list
of bundles provided in some format. Alternatively, the Git server could
advertise a list of bundles.

In anticipation of these two ways of advertising multiple bundles,
create a data structure that represents such a list. This will be
populated using a common API, but for now focus on what data can be
represented.

Each list contains a number of remote_bundle_info structs. These contain
an 'id' that is used to uniquely identify them in the list, and also a
'uri' that contains the location of its data. Finally, there is a strbuf
containing the filename used when Git downloads the contents to disk.

The list itself stores these remote_bundle_info structs in a hashtable
using 'id' as the key. The order of the structs in the input is
considered unimportant, but future modifications to the format and these
data structures will place ordering possibilities on the set. The list
also has a few "global" properties, including the version (used when
parsing the list) and the mode. The mode is one of these two options:

1. BUNDLE_MODE_ALL: all listed URIs are intended to be combined
   together. The client should download all of the advertised data to
   have a complete copy of the data.

2. BUNDLE_MODE_ANY: any one listed item is sufficient to have a complete
   copy of the data. The client can choose arbitrarily from these
   options. In the future, the client may use pings to find the closest
   URI among geodistributed replicas, or use some other heuristic
   information added to the format.

This API is currently unused, but will soon be expanded with parsing
logic and then be consumed by the bundle URI download logic.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++
 bundle-uri.h | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 128 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 0692a62071a..b9a219d3202 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -5,6 +5,67 @@
 #include "object-store.h"
 #include "refs.h"
 #include "run-command.h"
+#include "hashmap.h"
+#include "pkt-line.h"
+
+static int compare_bundles(const void *hashmap_cmp_fn_data,
+			   const struct hashmap_entry *he1,
+			   const struct hashmap_entry *he2,
+			   const void *id)
+{
+	const struct remote_bundle_info *e1 =
+		container_of(he1, const struct remote_bundle_info, ent);
+	const struct remote_bundle_info *e2 =
+		container_of(he2, const struct remote_bundle_info, ent);
+
+	return strcmp(e1->id, id ? (const char *)id : e2->id);
+}
+
+void init_bundle_list(struct bundle_list *list)
+{
+	memset(list, 0, sizeof(*list));
+
+	/* Implied defaults. */
+	list->mode = BUNDLE_MODE_ALL;
+	list->version = 1;
+
+	hashmap_init(&list->bundles, compare_bundles, NULL, 0);
+}
+
+static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
+				    void *data)
+{
+	free(bundle->id);
+	free(bundle->uri);
+	strbuf_release(&bundle->file);
+	return 0;
+}
+
+void clear_bundle_list(struct bundle_list *list)
+{
+	if (!list)
+		return;
+
+	for_all_bundles_in_list(list, clear_remote_bundle_info, NULL);
+	hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent);
+}
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data)
+{
+	struct remote_bundle_info *info;
+	struct hashmap_iter i;
+
+	hashmap_for_each_entry(&list->bundles, &i, info, ent) {
+		int result = iter(info, data);
+
+		if (result)
+			return result;
+	}
+
+	return 0;
+}
 
 static void find_temp_filename(struct strbuf *name)
 {
diff --git a/bundle-uri.h b/bundle-uri.h
index 8a152f1ef14..0f95197744c 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -1,7 +1,74 @@
 #ifndef BUNDLE_URI_H
 #define BUNDLE_URI_H
 
+#include "hashmap.h"
+#include "strbuf.h"
+
 struct repository;
+struct strbuf;
+struct string_list;
+
+/**
+ * The remote_bundle_info struct contains information for a single bundle
+ * URI. This may be initialized simply by a given URI or might have
+ * additional metadata associated with it if the bundle was advertised by
+ * a bundle list.
+ */
+struct remote_bundle_info {
+	struct hashmap_entry ent;
+
+	/**
+	 * The 'id' is a name given to the bundle for reference
+	 * by other bundle infos.
+	 */
+	char *id;
+
+	/**
+	 * The 'uri' is the location of the remote bundle so
+	 * it can be downloaded on-demand. This will be NULL
+	 * if there was no table of contents.
+	 */
+	char *uri;
+
+	/**
+	 * If the bundle has been downloaded, then 'file' is a
+	 * filename storing its contents. Otherwise, 'file' is
+	 * an empty string.
+	 */
+	struct strbuf file;
+};
+
+#define REMOTE_BUNDLE_INFO_INIT { \
+	.file = STRBUF_INIT, \
+}
+
+enum bundle_list_mode {
+	BUNDLE_MODE_NONE = 0,
+	BUNDLE_MODE_ALL,
+	BUNDLE_MODE_ANY
+};
+
+/**
+ * A bundle_list contains an unordered set of remote_bundle_info structs,
+ * as well as information about the bundle listing, such as version and
+ * mode.
+ */
+struct bundle_list {
+	int version;
+	enum bundle_list_mode mode;
+	unsigned forFetch : 1;
+	struct hashmap bundles;
+};
+
+void init_bundle_list(struct bundle_list *list);
+void clear_bundle_list(struct bundle_list *list);
+
+typedef int (*bundle_iterator)(struct remote_bundle_info *bundle,
+			       void *data);
+
+int for_all_bundles_in_list(struct bundle_list *list,
+			    bundle_iterator iter,
+			    void *data);
 
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 10/24] bundle-uri: create base key-value pair parsing
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (8 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 09/24] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 11/24] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

There will be two primary ways to advertise a bundle list: as a list of
packet lines in Git's protocol v2 and as a config file served from a
bundle URI. Both of these fundamentally use a list of key-value pairs.
We will use the same set of key-value pairs across these formats.

Create a new bundle_list_update() method that is currently unusued, but
will be used in the next change. It inspects each key to see if it is
understood and then applies it to the given bundle_list. Here are the
keys that we teach Git to understand:

* bundle.list.version: This value should be an integer. Git currently
  understands only version 1 and will ignore the list if the version is
  any other value. This version can be increased in the future if we
  need to add new keys that Git should not ignore. We can add new
  "heuristic" keys without incrementing the version.

* bundle.list.mode: This value should be one of "all" or "any". If this
  mode is not understood, then Git will ignore the list. This mode
  indicates whether Git needs all of the bundle list items to make a
  complete view of the content or if any single item is sufficient.

The rest of the keys use a bundle identifier "<id>" as part of the key
name, and this cannot equal "list". Keys using the same "<id>" describe
a common bundle list item.

* bundle.<id>.uri: This stores the URI of the bundle item. This
  currently is expected to be an absolute URI, but will be relaxed to be
  a relative URI in the future.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index b9a219d3202..f18bba7071d 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -67,6 +67,86 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+/**
+ * Given a key-value pair, update the state of the given bundle list.
+ * Returns 0 if the key-value pair is understood. Returns 1 if the key
+ * is not understood or the value is malformed.
+ */
+MAYBE_UNUSED
+static int bundle_list_update(const char *key, const char *value,
+			      struct bundle_list *list)
+{
+	const char *pkey, *dot;
+	struct strbuf id = STRBUF_INIT;
+	struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
+	struct remote_bundle_info *bundle;
+
+	if (!skip_prefix(key, "bundle.", &pkey))
+		return 1;
+
+	if (!strcmp(pkey, "list.version")) {
+		int version = atoi(value);
+		if (version != 1)
+			return 1;
+
+		list->version = version;
+		return 0;
+	}
+
+	if (!strcmp(pkey, "list.mode")) {
+		if (!strcmp(value, "all"))
+			list->mode = BUNDLE_MODE_ALL;
+		else if (!strcmp(value, "any"))
+			list->mode = BUNDLE_MODE_ANY;
+		else
+			return 1;
+		return 0;
+	}
+
+	/*
+	 * All remaining keys must be of the form "bundle.<id>.*" where
+	 * <id> != "list"
+	 */
+
+	dot = strchr(pkey, '.');
+	if (!dot)
+		return 1;
+	if (dot - pkey == 4 &&
+	    !strncmp(pkey, "list", 4))
+		return 1;
+
+	strbuf_add(&id, pkey, dot - pkey);
+	dot++;
+
+	/*
+	 * Check for an existing bundle with this <id>, or create one
+	 * if necessary.
+	 */
+	lookup.id = id.buf;
+	hashmap_entry_init(&lookup.ent, strhash(lookup.id));
+	if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
+		CALLOC_ARRAY(bundle, 1);
+		bundle->id = strbuf_detach(&id, NULL);
+		strbuf_init(&bundle->file, 0);
+		hashmap_entry_init(&bundle->ent, strhash(bundle->id));
+		hashmap_add(&list->bundles, &bundle->ent);
+	}
+	strbuf_release(&id);
+
+	if (!strcmp(dot, "uri")) {
+		free(bundle->uri);
+		bundle->uri = xstrdup(value);
+		return 0;
+	}
+
+	/*
+	 * At this point, we ignore any information that we don't
+	 * understand, assuming it to be hints for a heuristic the client
+	 * does not currently understand.
+	 */
+	return 0;
+}
+
 static void find_temp_filename(struct strbuf *name)
 {
 	int fd;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 11/24] bundle-uri: create "key=value" line parsing
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (9 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 10/24] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-05-20 18:40 ` [PATCH 12/24] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
 <avarab@gmail.com>

When advertising a bundle list over Git's protocol v2, we will use
packet lines. Each line will be of the form "key=value" representing a
bundle list. Connect the API necessary for Git's transport to the
key-value pair parsing created in the previous change.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 29 +++++++++++++++++++++++++++--
 bundle-uri.h | 14 +++++++++++++-
 2 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index f18bba7071d..948f974c478 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -72,7 +72,6 @@ int for_all_bundles_in_list(struct bundle_list *list,
  * Returns 0 if the key-value pair is understood. Returns 1 if the key
  * is not understood or the value is malformed.
  */
-MAYBE_UNUSED
 static int bundle_list_update(const char *key, const char *value,
 			      struct bundle_list *list)
 {
@@ -300,4 +299,30 @@ cleanup:
 	unlink(filename.buf);
 	strbuf_release(&filename);
 	return result;
-}
\ No newline at end of file
+}
+
+/**
+ * General API for {transport,connect}.c etc.
+ */
+int bundle_uri_parse_line(struct bundle_list *list, const char *line)
+{
+	int result;
+	const char *equals;
+	struct strbuf key = STRBUF_INIT;
+
+	if (!strlen(line))
+		return error(_("bundle-uri: got an empty line"));
+
+	equals = strchr(line, '=');
+
+	if (!equals)
+		return error(_("bundle-uri: line is not of the form 'key=value'"));
+	if (line == equals || !*(equals + 1))
+		return error(_("bundle-uri: line has empty key or value"));
+
+	strbuf_add(&key, line, equals - line);
+	result = bundle_list_update(key.buf, equals + 1, list);
+	strbuf_release(&key);
+
+	return result;
+}
diff --git a/bundle-uri.h b/bundle-uri.h
index 0f95197744c..344d5943922 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -78,4 +78,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
  */
 int fetch_bundle_uri(struct repository *r, const char *uri);
 
-#endif
+/**
+ * General API for {transport,connect}.c etc.
+ */
+
+/**
+ * Parse a "key=value" packet line from the bundle-uri verb.
+ *
+ * Returns 0 on success and non-zero on error.
+ */
+int bundle_uri_parse_line(struct bundle_list *list,
+			  const char *line);
+
+#endif /* BUNDLE_URI_H */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 12/24] bundle-uri: unit test "key=value" parsing
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (10 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 11/24] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-05-20 18:40 ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-05-20 18:40 ` [PATCH 13/24] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
 <avarab@gmail.com>

Create a new 'test-tool bundle-uri' test helper. This helper will assist
in testing logic deep in the bundle URI feature.

This change introduces the 'parse-key-values' subcommand, which parses
stdin as a list of lines. These are fed into bundle_uri_parse_line() to
test how we construct a 'struct bundle_list' from that data. The list is
then output to stdout as if the key-value pairs were a Git config file.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Makefile                    |  1 +
 bundle-uri.c                | 34 ++++++++++++++
 bundle-uri.h                |  3 ++
 t/helper/test-bundle-uri.c  | 63 +++++++++++++++++++++++++
 t/helper/test-tool.c        |  1 +
 t/helper/test-tool.h        |  1 +
 t/t5750-bundle-uri-parse.sh | 91 +++++++++++++++++++++++++++++++++++++
 t/test-lib-functions.sh     | 11 +++++
 8 files changed, 205 insertions(+)
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh

diff --git a/Makefile b/Makefile
index 8f27310836d..c8a14793005 100644
--- a/Makefile
+++ b/Makefile
@@ -706,6 +706,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 TEST_BUILTINS_OBJS += test-advise.o
 TEST_BUILTINS_OBJS += test-bitmap.o
 TEST_BUILTINS_OBJS += test-bloom.o
+TEST_BUILTINS_OBJS += test-bundle-uri.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-crontab.o
diff --git a/bundle-uri.c b/bundle-uri.c
index 948f974c478..ff215422888 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -67,6 +67,40 @@ int for_all_bundles_in_list(struct bundle_list *list,
 	return 0;
 }
 
+static int summarize_bundle(struct remote_bundle_info *info, void *data)
+{
+	FILE *fp = data;
+	fprintf(fp, "[bundle \"%s\"]\n", info->id);
+	fprintf(fp, "\turi = %s\n", info->uri);
+	return 0;
+}
+
+void print_bundle_list(FILE *fp, struct bundle_list *list)
+{
+	const char *mode;
+
+	switch (list->mode) {
+	case BUNDLE_MODE_ALL:
+		mode = "all";
+		break;
+
+	case BUNDLE_MODE_ANY:
+		mode = "any";
+		break;
+
+	case BUNDLE_MODE_NONE:
+	default:
+		mode = "<unknown>";
+	}
+
+	printf("[bundle \"list\"]\n");
+	printf("\tversion = %d\n", list->version);
+	printf("\tmode = %s\n", mode);
+
+	for_all_bundles_in_list(list, summarize_bundle, fp);
+}
+
+
 /**
  * Given a key-value pair, update the state of the given bundle list.
  * Returns 0 if the key-value pair is understood. Returns 1 if the key
diff --git a/bundle-uri.h b/bundle-uri.h
index 344d5943922..4a12a90bf42 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -70,6 +70,9 @@ int for_all_bundles_in_list(struct bundle_list *list,
 			    bundle_iterator iter,
 			    void *data);
 
+struct FILE;
+void print_bundle_list(FILE *fp, struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
new file mode 100644
index 00000000000..5cb0c9196fa
--- /dev/null
+++ b/t/helper/test-bundle-uri.c
@@ -0,0 +1,63 @@
+#include "test-tool.h"
+#include "parse-options.h"
+#include "bundle-uri.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+static int cmd__bundle_uri_parse_key_values(int argc, const char **argv)
+{
+	const char *usage[] = {
+		"test-tool bundle-uri parse-key-values <in",
+		NULL
+	};
+	struct option options[] = {
+		OPT_END(),
+	};
+	struct strbuf sb = STRBUF_INIT;
+	struct bundle_list list;
+	int err = 0;
+
+	argc = parse_options(argc, argv, NULL, options, usage, 0);
+	if (argc)
+		goto usage;
+
+	init_bundle_list(&list);
+	while (strbuf_getline(&sb, stdin) != EOF) {
+		if (bundle_uri_parse_line(&list, sb.buf) < 0)
+			err = error("bad line: '%s'", sb.buf);
+	}
+	strbuf_release(&sb);
+
+	print_bundle_list(stdout, &list);
+
+	clear_bundle_list(&list);
+
+	return !!err;
+
+usage:
+	usage_with_options(usage, options);
+}
+
+int cmd__bundle_uri(int argc, const char **argv)
+{
+	const char *usage[] = {
+		"test-tool bundle-uri <subcommand> [<options>]",
+		NULL
+	};
+	struct option options[] = {
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION |
+			     PARSE_OPT_KEEP_ARGV0);
+	if (argc == 1)
+		goto usage;
+
+	if (!strcmp(argv[1], "parse-key-values"))
+		return cmd__bundle_uri_parse_key_values(argc - 1, argv + 1);
+	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
+
+usage:
+	usage_with_options(usage, options);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 0424f7adf5d..bff823fbd3e 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -17,6 +17,7 @@ static struct test_cmd cmds[] = {
 	{ "advise", cmd__advise_if_enabled },
 	{ "bitmap", cmd__bitmap },
 	{ "bloom", cmd__bloom },
+	{ "bundle-uri", cmd__bundle_uri },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
 	{ "crontab", cmd__crontab },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index c876e8246fb..eb747e07dd1 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -7,6 +7,7 @@
 int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bitmap(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
+int cmd__bundle_uri(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__crontab(int argc, const char **argv);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
new file mode 100755
index 00000000000..ebb80596125
--- /dev/null
+++ b/t/t5750-bundle-uri-parse.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+
+test_description="Test bundle-uri bundle_uri_parse_line()"
+
+TEST_NO_CREATE_REPO=1
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'bundle_uri_parse_line() just URIs' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+	bundle.two.uri=https://example.com/bundle.bdl
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle "list"]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-key-values <in >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty key or value' '
+	cat >in <<-\EOF &&
+	=bogus-value
+	bogus-key=
+	EOF
+
+	cat >err.expect <<-EOF &&
+	error: bundle-uri: line has empty key or value
+	error: bad line: '\''=bogus-value'\''
+	error: bundle-uri: line has empty key or value
+	error: bad line: '\''bogus-key='\''
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle "list"]
+		version = 1
+		mode = all
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values <in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=http://example.com/bundle.bdl
+
+	bundle.two.uri=https://example.com/bundle.bdl
+
+	bundle.three.uri=file:///usr/share/git/bundle.bdl
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	error: bundle-uri: got an empty line
+	error: bad line: '\'''\''
+	error: bundle-uri: got an empty line
+	error: bad line: '\'''\''
+	EOF
+
+	# We fail, but try to continue parsing regardless
+	cat >expect <<-\EOF &&
+	[bundle "list"]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-key-values <in >actual 2>err &&
+	test_cmp err.expect err &&
+	test_cmp_config_output expect actual
+'
+
+test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 93c03380d44..747739f81b7 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1944,3 +1944,14 @@ test_is_magic_mtime () {
 	rm -f .git/test-mtime-actual
 	return $ret
 }
+
+# Given two filenames, parse both using 'git config --list --file'
+# and compare the sorted output of those commands. Useful when
+# wanting to ignore whitespace differences and sorting concerns.
+test_cmp_config_output () {
+	git config --list --file="$1" >config-expect &&
+	git config --list --file="$2" >config-actual &&
+	sort config-expect >sorted-expect &&
+	sort config-actual >sorted-actual &&
+	test_cmp sorted-expect sorted-actual
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 13/24] bundle-uri: limit recursion depth for bundle lists
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (11 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 12/24] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 14/24] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The next change will start allowing us to parse bundle lists that are
downloaded from a provided bundle URI. Those lists might point to other
lists, which could proceed to an arbitrary depth (and even create
cycles). Restructure fetch_bundle_uri() to have an internal version that
has a recursion depth. Compare that to a new max_bundle_uri_depth
constant that is twice as high as we expect this depth to be for any
legitimate use of bundle list linking.

We can consider making max_bundle_uri_depth a configurable value if
there is demonstrated value in the future.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index ff215422888..74a26ce6c00 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -308,11 +308,25 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+/**
+ * This limits the recursion on fetch_bundle_uri_internal() when following
+ * bundle lists.
+ */
+static int max_bundle_uri_depth = 4;
+
+static int fetch_bundle_uri_internal(struct repository *r,
+				     const char *uri,
+				     int depth)
 {
 	int result = 0;
 	struct strbuf filename = STRBUF_INIT;
 
+	if (depth >= max_bundle_uri_depth) {
+		warning(_("exceeded bundle URI recursion limit (%d)"),
+			max_bundle_uri_depth);
+		return -1;
+	}
+
 	find_temp_filename(&filename);
 	if ((result = copy_uri_to_file(uri, filename.buf)))
 		goto cleanup;
@@ -335,6 +349,11 @@ cleanup:
 	return result;
 }
 
+int fetch_bundle_uri(struct repository *r, const char *uri)
+{
+	return fetch_bundle_uri_internal(r, uri, 0);
+}
+
 /**
  * General API for {transport,connect}.c etc.
  */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 14/24] bundle-uri: parse bundle list in config format
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (12 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 13/24] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 15/24] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a bundle provider wants to operate independently from a Git remote,
they want to provide a single, consistent URI that users can use in
their 'git clone --bundle-uri' commands. At this point, the Git client
expects that URI to be a single bundle that can be unbundled and used to
bootstrap the rest of the clone from the Git server. This single bundle
cannot be re-used to assist with future incremental fetches.

To allow for the incremental fetch case, teach Git to understand a
bundle list that could be advertised at an independent bundle URI. Such
a bundle list is likely to be inspected by human readers, even if only
by the bundle provider creating the list. For this reason, we can take
our expected "key=value" pairs and instead format them using Git config
format.

Create parse_bundle_list_in_config_format() to parse a file in config
format and convert that into a 'struct bundle_list' filled with its
understanding of the contents.

Be careful to call git_config_from_file_with_options() because the
default action for git_config_from_file() is to die() on a parsing
error. The current warning isn't particularly helpful if it arises to a
user, but it will be made more verbose at a higher layer later.

Update 'test-tool bundle-uri' to take this config file format as input.
It uses a filename instead of stdin because there is no existing way to
parse a FILE pointer in the config machinery. Using
git_config_from_mem() is overly complicated and more likely to introduce
bugs than this simpler version. I would rather have a slightly confusing
test helper than complicated product code.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 28 +++++++++++++++++++++
 bundle-uri.h                | 10 ++++++++
 t/helper/test-bundle-uri.c  | 45 ++++++++++++++++++++++++++-------
 t/t5750-bundle-uri-parse.sh | 50 +++++++++++++++++++++++++++++++++++++
 4 files changed, 124 insertions(+), 9 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 74a26ce6c00..ab4b6385fc0 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -180,6 +180,34 @@ static int bundle_list_update(const char *key, const char *value,
 	return 0;
 }
 
+static int config_to_bundle_list(const char *key, const char *value, void *data)
+{
+	struct bundle_list *list = data;
+	return bundle_list_update(key, value, list);
+}
+
+int parse_bundle_list_in_config_format(const char *uri,
+				       const char *filename,
+				       struct bundle_list *list)
+{
+	int result;
+	struct config_options opts = {
+		.error_action = CONFIG_ERROR_ERROR,
+	};
+
+	list->mode = BUNDLE_MODE_NONE;
+	result = git_config_from_file_with_options(config_to_bundle_list,
+						   filename, list,
+						   &opts);
+
+	if (!result && list->mode == BUNDLE_MODE_NONE) {
+		warning(_("bundle list at '%s' has no mode"), uri);
+		result = 1;
+	}
+
+	return result;
+}
+
 static void find_temp_filename(struct strbuf *name)
 {
 	int fd;
diff --git a/bundle-uri.h b/bundle-uri.h
index 4a12a90bf42..b0183870336 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -73,6 +73,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
 struct FILE;
 void print_bundle_list(FILE *fp, struct bundle_list *list);
 
+/**
+ * A bundle URI may point to a bundle list where the key=value
+ * pairs are provided in config file format. This method is
+ * exposed publicly for testing purposes.
+ */
+
+int parse_bundle_list_in_config_format(const char *uri,
+				       const char *filename,
+				       struct bundle_list *list);
+
 /**
  * Fetch data from the given 'uri' and unbundle the bundle data found
  * based on that information.
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
index 5cb0c9196fa..23ce0eebca3 100644
--- a/t/helper/test-bundle-uri.c
+++ b/t/helper/test-bundle-uri.c
@@ -4,27 +4,52 @@
 #include "strbuf.h"
 #include "string-list.h"
 
-static int cmd__bundle_uri_parse_key_values(int argc, const char **argv)
+enum input_mode {
+	KEY_VALUE_PAIRS,
+	CONFIG_FILE,
+};
+
+static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mode)
 {
-	const char *usage[] = {
+	const char *key_value_usage[] = {
 		"test-tool bundle-uri parse-key-values <in",
 		NULL
 	};
+	const char *config_usage[] = {
+		"test-tool bundle-uri parse-config <input>",
+		NULL
+	};
 	struct option options[] = {
 		OPT_END(),
 	};
+	const char **usage = key_value_usage;
 	struct strbuf sb = STRBUF_INIT;
 	struct bundle_list list;
 	int err = 0;
 
-	argc = parse_options(argc, argv, NULL, options, usage, 0);
-	if (argc)
-		goto usage;
+	if (mode == CONFIG_FILE)
+		usage = config_usage;
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	init_bundle_list(&list);
-	while (strbuf_getline(&sb, stdin) != EOF) {
-		if (bundle_uri_parse_line(&list, sb.buf) < 0)
-			err = error("bad line: '%s'", sb.buf);
+
+	switch (mode) {
+	case KEY_VALUE_PAIRS:
+		if (argc)
+			goto usage;
+		while (strbuf_getline(&sb, stdin) != EOF) {
+			if (bundle_uri_parse_line(&list, sb.buf) < 0)
+				err = error("bad line: '%s'", sb.buf);
+		}
+		break;
+
+	case CONFIG_FILE:
+		if (argc != 1)
+			goto usage;
+		err = parse_bundle_list_in_config_format("<uri>", argv[0], &list);
+		break;
 	}
 	strbuf_release(&sb);
 
@@ -55,7 +80,9 @@ int cmd__bundle_uri(int argc, const char **argv)
 		goto usage;
 
 	if (!strcmp(argv[1], "parse-key-values"))
-		return cmd__bundle_uri_parse_key_values(argc - 1, argv + 1);
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, KEY_VALUE_PAIRS);
+	if (!strcmp(argv[1], "parse-config"))
+		return cmd__bundle_uri_parse(argc - 1, argv + 1, CONFIG_FILE);
 	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
 
 usage:
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index ebb80596125..c2b7a8ce968 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -88,4 +88,54 @@ test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty lines' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: just URIs' '
+	cat >expect <<-\EOF &&
+	[bundle "list"]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success 'parse config format edge cases: empty key or value' '
+	cat >in1 <<-\EOF &&
+	= bogus-value
+	EOF
+
+	cat >err1 <<-EOF &&
+	error: bad config line 1 in file in1
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle "list"]
+		version = 1
+		mode = <unknown>
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in1 >actual 2>err &&
+	test_cmp err1 err &&
+	test_cmp_config_output expect actual &&
+
+	cat >in2 <<-\EOF &&
+	bogus-key =
+	EOF
+
+	cat >err2 <<-EOF &&
+	warning: bundle list at '\''<uri>'\'' has no mode
+	EOF
+
+	test_must_fail test-tool bundle-uri parse-config in2 >actual 2>err &&
+	test_cmp err2 err &&
+	test_cmp_config_output expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 15/24] bundle-uri: fetch a list of bundles
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (13 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 14/24] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 16/24] protocol v2: add server-side "bundle-uri" skeleton Ævar Arnfjörð Bjarmason via GitGitGadget
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When the content at a given bundle URI is not understood as a bundle
(based on inspecting the initial content), then Git currently gives up
and ignores that content. Independent bundle providers may want to split
up the bundle content into multiple bundles, but still make them
available from a single URI.

Teach Git to attempt parsing the bundle URI content as a Git config file
providing the key=value pairs for a bundle list. Git then looks at the
mode of the list to see if ANY single bundle is sufficient or if ALL
bundles are required. The content at the selected URIs are downloaded
and the content is inspected again, creating a recursive process.

To guard the recursion against malformed or malicious content, limit the
recursion depth to a reasonable four for now. This can be converted to a
configured value in the future if necessary. The value of four is twice
as high as expected to be useful (a bundle list is unlikely to point to
more bundle lists).

RFC TODO: add tests for this feature right now.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 213 ++++++++++++++++++++++++++++++++++++++++++++++-----
 bundle-uri.h |   6 ++
 2 files changed, 200 insertions(+), 19 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index ab4b6385fc0..601e6e87822 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -35,9 +35,10 @@ void init_bundle_list(struct bundle_list *list)
 static int clear_remote_bundle_info(struct remote_bundle_info *bundle,
 				    void *data)
 {
-	free(bundle->id);
-	free(bundle->uri);
+	FREE_AND_NULL(bundle->id);
+	FREE_AND_NULL(bundle->uri);
 	strbuf_release(&bundle->file);
+	bundle->unbundled = 0;
 	return 0;
 }
 
@@ -336,18 +337,102 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	return result;
 }
 
+struct bundle_list_context {
+	struct repository *r;
+	struct bundle_list *list;
+	enum bundle_list_mode mode;
+	int count;
+	int depth;
+};
+
+/*
+ * This early definition is necessary because we use indirect recursion:
+ *
+ * While iterating through a bundle list that was downloaded as part
+ * of fetch_bundle_uri_internal(), iterator methods eventually call it
+ * again, but with depth + 1.
+ */
+static int fetch_bundle_uri_internal(struct repository *r,
+				     struct remote_bundle_info *bundle,
+				     int depth,
+				     struct bundle_list *list);
+
+static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data)
+{
+	struct bundle_list_context *ctx = data;
+
+	if (ctx->mode == BUNDLE_MODE_ANY && ctx->count)
+		return 0;
+
+	ctx->count++;
+	return fetch_bundle_uri_internal(ctx->r, bundle, ctx->depth + 1, ctx->list);
+}
+
+static int download_bundle_list(struct repository *r,
+				struct bundle_list *local_list,
+				struct bundle_list *global_list,
+				int depth)
+{
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = global_list,
+		.depth = depth + 1,
+		.mode = local_list->mode,
+	};
+
+	return for_all_bundles_in_list(local_list, download_bundle_to_file, &ctx);
+}
+
+static int fetch_bundle_list_in_config_format(struct repository *r,
+					      struct bundle_list *global_list,
+					      struct remote_bundle_info *bundle,
+					      int depth)
+{
+	int result;
+	struct bundle_list list_from_bundle;
+
+	init_bundle_list(&list_from_bundle);
+
+	if ((result = parse_bundle_list_in_config_format(bundle->uri,
+							 bundle->file.buf,
+							 &list_from_bundle)))
+		goto cleanup;
+
+	if (list_from_bundle.mode == BUNDLE_MODE_NONE) {
+		warning(_("unrecognized bundle mode from URI '%s'"),
+			bundle->uri);
+		result = -1;
+		goto cleanup;
+	}
+
+	if ((result = download_bundle_list(r, &list_from_bundle,
+					   global_list, depth)))
+		goto cleanup;
+
+cleanup:
+	clear_bundle_list(&list_from_bundle);
+	return result;
+}
+
 /**
  * This limits the recursion on fetch_bundle_uri_internal() when following
  * bundle lists.
  */
 static int max_bundle_uri_depth = 4;
 
+/**
+ * Recursively download all bundles advertised at the given URI
+ * to files. If the file is a bundle, then add it to the given
+ * 'list'. Otherwise, expect a bundle list and recurse on the
+ * URIs in that list according to the list mode (ANY or ALL).
+ */
 static int fetch_bundle_uri_internal(struct repository *r,
-				     const char *uri,
-				     int depth)
+				     struct remote_bundle_info *bundle,
+				     int depth,
+				     struct bundle_list *list)
 {
 	int result = 0;
-	struct strbuf filename = STRBUF_INIT;
+	struct remote_bundle_info *bcopy;
 
 	if (depth >= max_bundle_uri_depth) {
 		warning(_("exceeded bundle URI recursion limit (%d)"),
@@ -355,31 +440,121 @@ static int fetch_bundle_uri_internal(struct repository *r,
 		return -1;
 	}
 
-	find_temp_filename(&filename);
-	if ((result = copy_uri_to_file(uri, filename.buf)))
+	if (!bundle->file.len)
+		find_temp_filename(&bundle->file);
+	if ((result = copy_uri_to_file(bundle->uri, bundle->file.buf)))
 		goto cleanup;
 
-	if ((result = !is_bundle(filename.buf, 0)))
-		goto cleanup;
-
-	if ((result = unbundle_from_file(r, filename.buf)))
+	if ((result = !is_bundle(bundle->file.buf, 1))) {
+		result = fetch_bundle_list_in_config_format(
+				r, list, bundle, depth);
 		goto cleanup;
+	}
 
-	git_config_set_multivar_gently("log.excludedecoration",
-					"refs/bundle/",
-					"refs/bundle/",
-					CONFIG_FLAGS_FIXED_VALUE |
-					CONFIG_FLAGS_MULTI_REPLACE);
+	/* Copy the bundle and insert it into the global list. */
+	CALLOC_ARRAY(bcopy, 1);
+	bcopy->id = xstrdup(bundle->id);
+	strbuf_init(&bcopy->file, 0);
+	strbuf_add(&bcopy->file, bundle->file.buf, bundle->file.len);
+	strbuf_detach(&bundle->file, NULL);
+	hashmap_entry_init(&bcopy->ent, strhash(bcopy->id));
+	hashmap_add(&list->bundles, &bcopy->ent);
 
 cleanup:
-	unlink(filename.buf);
-	strbuf_release(&filename);
+	if (bundle->file.len)
+		unlink(bundle->file.buf);
 	return result;
 }
 
+struct attempt_unbundle_context {
+	struct repository *r;
+	int success_count;
+	int failure_count;
+};
+
+static int attempt_unbundle(struct remote_bundle_info *info, void *data)
+{
+	struct attempt_unbundle_context *ctx = data;
+
+	if (info->unbundled || !unbundle_from_file(ctx->r, info->file.buf)) {
+		ctx->success_count++;
+		info->unbundled = 1;
+	} else {
+		ctx->failure_count++;
+	}
+
+	return 0;
+}
+
+static int unbundle_all_bundles(struct repository *r,
+				struct bundle_list *list)
+{
+	int last_success_count = -1;
+	struct attempt_unbundle_context ctx = {
+		.r = r,
+	};
+
+	/*
+	 * Iterate through all bundles looking for ones that can
+	 * successfully unbundle. If any succeed, then perhaps another
+	 * will succeed in the next attempt.
+	 */
+	while (last_success_count < ctx.success_count) {
+		last_success_count = ctx.success_count;
+
+		ctx.success_count = 0;
+		ctx.failure_count = 0;
+		for_all_bundles_in_list(list, attempt_unbundle, &ctx);
+	}
+
+	if (ctx.success_count)
+		git_config_set_multivar_gently("log.excludedecoration",
+						"refs/bundle/",
+						"refs/bundle/",
+						CONFIG_FLAGS_FIXED_VALUE |
+						CONFIG_FLAGS_MULTI_REPLACE);
+
+	if (ctx.failure_count) {
+		warning(_("failed to unbundle %d bundles"),
+			ctx.failure_count);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int unlink_bundle(struct remote_bundle_info *info, void *data)
+{
+	if (info->file.buf)
+		unlink_or_warn(info->file.buf);
+	return 0;
+}
+
 int fetch_bundle_uri(struct repository *r, const char *uri)
 {
-	return fetch_bundle_uri_internal(r, uri, 0);
+	int result;
+	struct bundle_list list;
+	struct remote_bundle_info bundle = {
+		.uri = xstrdup(uri),
+		.id = xstrdup("<root>"),
+		.file = STRBUF_INIT,
+	};
+
+	init_bundle_list(&list);
+
+	/* If a bundle is added to this global list, then it is required. */
+	list.mode = BUNDLE_MODE_ALL;
+
+	if ((result = fetch_bundle_uri_internal(r, &bundle, 0, &list)))
+		goto cleanup;
+
+	result = unbundle_all_bundles(r, &list);
+
+cleanup:
+	for_all_bundles_in_list(&list, unlink_bundle, NULL);
+	clear_bundle_list(&list);
+	clear_remote_bundle_info(&bundle, NULL);
+	return result;
 }
 
 /**
diff --git a/bundle-uri.h b/bundle-uri.h
index b0183870336..adce5dc07e2 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -36,6 +36,12 @@ struct remote_bundle_info {
 	 * an empty string.
 	 */
 	struct strbuf file;
+
+	/**
+	 * If the bundle has been unbundled successfully, then
+	 * this boolean is true.
+	 */
+	unsigned unbundled:1;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { \
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 16/24] protocol v2: add server-side "bundle-uri" skeleton
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (14 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 15/24] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-05-20 18:40 ` [PATCH 17/24] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason via GitGitGadget
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
 <avarab@gmail.com>

Add a skeleton server-side implementation of a new "bundle-uri" command
to protocol v2. This will allow conforming clients to optionally seed
their initial clones or incremental fetches from URLs containing
"*.bundle" files created with "git bundle create".

This change only performs the basic boilerplate of advertising a new
protocol v2 capability. The new 'bundle-uri' capability allows a client
to request a list of bundles. Right now, the server only returns a flush
packet, which corresponds to an empty advertisement.

The critical bit right now is that the new boolean
uploadPack.adverstiseBundleURIs config value signals whether or not this
capability should be advertised at all.

RFC TODO: Write documentation.

RFC TODO: Compare to original implementation.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c         | 36 ++++++++++++++++++++++++++++++++++++
 bundle-uri.h         |  7 +++++++
 serve.c              |  6 ++++++
 t/t5701-git-serve.sh | 40 +++++++++++++++++++++++++++++++++++++++-
 4 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 601e6e87822..79a6b7ed8b8 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -557,6 +557,42 @@ cleanup:
 	return result;
 }
 
+/**
+ * API for serve.c.
+ */
+
+static int advertise_bundle_uri = -1;
+
+int bundle_uri_advertise(struct repository *r, struct strbuf *value)
+{
+	if (advertise_bundle_uri != -1)
+		goto cached;
+
+	advertise_bundle_uri = 0;
+	git_config_get_maybe_bool("uploadpack.advertisebundleuris", &advertise_bundle_uri);
+
+cached:
+	return advertise_bundle_uri;
+}
+
+int bundle_uri_command(struct repository *r,
+		       struct packet_reader *request)
+{
+	struct packet_writer writer;
+	packet_writer_init(&writer, 1);
+
+	while (packet_reader_read(request) == PACKET_READ_NORMAL)
+		die(_("bundle-uri: unexpected argument: '%s'"), request->line);
+	if (request->status != PACKET_READ_FLUSH)
+		die(_("bundle-uri: expected flush after arguments"));
+
+	/* TODO: Implement the communication */
+
+	packet_writer_flush(&writer);
+
+	return 0;
+}
+
 /**
  * General API for {transport,connect}.c etc.
  */
diff --git a/bundle-uri.h b/bundle-uri.h
index adce5dc07e2..9f6fc2d75f9 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -4,6 +4,7 @@
 #include "hashmap.h"
 #include "strbuf.h"
 
+struct packet_reader;
 struct repository;
 struct strbuf;
 struct string_list;
@@ -97,6 +98,12 @@ int parse_bundle_list_in_config_format(const char *uri,
  */
 int fetch_bundle_uri(struct repository *r, const char *uri);
 
+/**
+ * API for serve.c.
+ */
+int bundle_uri_advertise(struct repository *r, struct strbuf *value);
+int bundle_uri_command(struct repository *r, struct packet_reader *request);
+
 /**
  * General API for {transport,connect}.c etc.
  */
diff --git a/serve.c b/serve.c
index b3fe9b5126a..f3e0203d2c6 100644
--- a/serve.c
+++ b/serve.c
@@ -8,6 +8,7 @@
 #include "protocol-caps.h"
 #include "serve.h"
 #include "upload-pack.h"
+#include "bundle-uri.h"
 
 static int advertise_sid = -1;
 static int client_hash_algo = GIT_HASH_SHA1;
@@ -136,6 +137,11 @@ static struct protocol_capability capabilities[] = {
 		.advertise = always_advertise,
 		.command = cap_object_info,
 	},
+	{
+		.name = "bundle-uri",
+		.advertise = bundle_uri_advertise,
+		.command = bundle_uri_command,
+	},
 };
 
 void protocol_v2_advertise_capabilities(void)
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 1896f671cb3..f21e5e9d33d 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -13,7 +13,7 @@ test_expect_success 'test capability advertisement' '
 	wrong_algo sha1:sha256
 	wrong_algo sha256:sha1
 	EOF
-	cat >expect <<-EOF &&
+	cat >expect.base <<-EOF &&
 	version 2
 	agent=git/$(git version | cut -d" " -f3)
 	ls-refs=unborn
@@ -21,8 +21,11 @@ test_expect_success 'test capability advertisement' '
 	server-option
 	object-format=$(test_oid algo)
 	object-info
+	EOF
+	cat >expect.trailer <<-EOF &&
 	0000
 	EOF
+	cat expect.base expect.trailer >expect &&
 
 	GIT_TEST_SIDEBAND_ALL=0 test-tool serve-v2 \
 		--advertise-capabilities >out &&
@@ -342,4 +345,39 @@ test_expect_success 'basics of object-info' '
 	test_cmp expect actual
 '
 
+test_expect_success 'test capability advertisement with uploadpack.advertiseBundleURIs' '
+	test_config uploadpack.advertiseBundleURIs true &&
+
+	cat >expect.extra <<-EOF &&
+	bundle-uri
+	EOF
+	cat expect.base \
+	    expect.extra \
+	    expect.trailer >expect &&
+
+	GIT_TEST_SIDEBAND_ALL=0 test-tool serve-v2 \
+		--advertise-capabilities >out &&
+	test-tool pkt-line unpack <out >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'basics of bundle-uri: dies if not enabled' '
+	test-tool pkt-line pack >in <<-EOF &&
+	command=bundle-uri
+	0000
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	fatal: invalid command '"'"'bundle-uri'"'"'
+	EOF
+
+	cat >expect <<-\EOF &&
+	ERR serve: invalid command '"'"'bundle-uri'"'"'
+	EOF
+
+	test_must_fail test-tool serve-v2 --stateless-rpc <in >out 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 17/24] bundle-uri client: add minimal NOOP client
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (15 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 16/24] protocol v2: add server-side "bundle-uri" skeleton Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-05-20 18:40 ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-05-20 18:40 ` [PATCH 18/24] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason via GitGitGadget
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
 <avarab@gmail.com>

Set up all the needed client parts of the "bundle-uri" protocol
extension, without actually doing anything with the bundle URIs.

I.e. if the server says it supports "bundle-uri" we'll issue a
command=bundle-uri after command=ls-refs when we're cloning. We'll
parse the returned output using the code already tested for in
t5750-bundle-uri-parse.sh.

What we aren't doing is actually acting on that data, i.e. downloading
the bundle(s) before we get to doing the command=fetch, and adjusting
our negotiation dialog appropriately. I'll do that in subsequent
commits.

There's a question of what level of encapsulation we should use here,
I've opted to use connect.h in clone.c, but we could also e.g. make
transport_get_remote_refs() invoke this, i.e. make it implicitly get
the bundle-uri list for later steps.

This approach means that we don't "support" this in "git fetch" for
now. I'm starting with the case of initial clones, although as noted
in preceding commits to the protocol documentation nothing about this
approach precludes getting bundles on incremental fetches.

For the t5732-protocol-v2-bundle-uri-http.sh it's not easy to set
environment variables for git-upload-pack (it's started by Apache), so
let's skip the test under T5730_HTTP, and add unused T5730_{FILE,GIT}
prerequisites for consistency and future use.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/clone.c                        |   7 ++
 bundle-uri.c                           |   4 +
 connect.c                              |  47 ++++++++
 remote.h                               |   5 +
 t/lib-t5730-protocol-v2-bundle-uri.sh  | 148 +++++++++++++++++++++++++
 t/t5730-protocol-v2-bundle-uri-file.sh |  36 ++++++
 t/t5731-protocol-v2-bundle-uri-git.sh  |  17 +++
 t/t5732-protocol-v2-bundle-uri-http.sh |  17 +++
 transport-helper.c                     |  13 +++
 transport-internal.h                   |   7 ++
 transport.c                            |  55 +++++++++
 transport.h                            |  19 ++++
 12 files changed, 375 insertions(+)
 create mode 100644 t/lib-t5730-protocol-v2-bundle-uri.sh
 create mode 100755 t/t5730-protocol-v2-bundle-uri-file.sh
 create mode 100755 t/t5731-protocol-v2-bundle-uri-git.sh
 create mode 100755 t/t5732-protocol-v2-bundle-uri-http.sh

diff --git a/builtin/clone.c b/builtin/clone.c
index a9caa5dfed6..7bedd9eb29e 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -27,6 +27,7 @@
 #include "iterator.h"
 #include "sigchain.h"
 #include "branch.h"
+#include "connect.h"
 #include "remote.h"
 #include "run-command.h"
 #include "connected.h"
@@ -1262,6 +1263,12 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (refs)
 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
 
+	/*
+	 * Populate transport->got_remote_bundle_uri and
+	 * transport->bundle_uri. We might get nothing.
+	 */
+	transport_get_remote_bundle_uri(transport);
+
 	if (mapped_refs) {
 		int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
 
diff --git a/bundle-uri.c b/bundle-uri.c
index 79a6b7ed8b8..fed508cd51a 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -565,6 +565,10 @@ static int advertise_bundle_uri = -1;
 
 int bundle_uri_advertise(struct repository *r, struct strbuf *value)
 {
+	if (value &&
+	    git_env_bool("GIT_TEST_BUNDLE_URI_UNKNOWN_CAPABILITY_VALUE", 0))
+		strbuf_addstr(value, "test-unknown-capability-value");
+
 	if (advertise_bundle_uri != -1)
 		goto cached;
 
diff --git a/connect.c b/connect.c
index e6d0b1d34bd..00dd00cfa44 100644
--- a/connect.c
+++ b/connect.c
@@ -15,6 +15,7 @@
 #include "version.h"
 #include "protocol.h"
 #include "alias.h"
+#include "bundle-uri.h"
 
 static char *server_capabilities_v1;
 static struct strvec server_capabilities_v2 = STRVEC_INIT;
@@ -491,6 +492,52 @@ static void send_capabilities(int fd_out, struct packet_reader *reader)
 	}
 }
 
+int get_remote_bundle_uri(int fd_out, struct packet_reader *reader,
+			  struct bundle_list *bundles, int stateless_rpc)
+{
+	int line_nr = 1;
+
+	/* Assert bundle-uri support */
+	server_supports_v2("bundle-uri", 1);
+
+	/* (Re-)send capabilities */
+	send_capabilities(fd_out, reader);
+
+	/* Send command */
+	packet_write_fmt(fd_out, "command=bundle-uri\n");
+	packet_delim(fd_out);
+
+	/* Send options */
+	if (git_env_bool("GIT_TEST_PROTOCOL_BAD_BUNDLE_URI", 0))
+		packet_write_fmt(fd_out, "test-bad-client\n");
+	packet_flush(fd_out);
+
+	/* Process response from server */
+	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
+		const char *line = reader->line;
+		line_nr++;
+
+		if (!bundle_uri_parse_line(bundles, line))
+			continue;
+
+		return error(_("error on bundle-uri response line %d: %s"),
+			     line_nr, line);
+	}
+
+	if (reader->status != PACKET_READ_FLUSH)
+		return error(_("expected flush after bundle-uri listing"));
+
+	/*
+	 * Might die(), but obscure enough that that's OK, e.g. in
+	 * serve.c we'll call BUG() on its equivalent (the
+	 * PACKET_READ_RESPONSE_END check).
+	 */
+	check_stateless_delimiter(stateless_rpc, reader,
+				  _("expected response end packet after ref listing"));
+
+	return 0;
+}
+
 struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 			     struct ref **list, int for_push,
 			     struct transport_ls_refs_options *transport_options,
diff --git a/remote.h b/remote.h
index dd4402436f1..5f743438cd4 100644
--- a/remote.h
+++ b/remote.h
@@ -236,6 +236,11 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 			     const struct string_list *server_options,
 			     int stateless_rpc);
 
+/* Used for protocol v2 in order to retrieve refs from a remote */
+struct bundle_list;
+int get_remote_bundle_uri(int fd_out, struct packet_reader *reader,
+			  struct bundle_list *bundles, int stateless_rpc);
+
 int resolve_remote_symref(struct ref *ref, struct ref *list);
 
 /*
diff --git a/t/lib-t5730-protocol-v2-bundle-uri.sh b/t/lib-t5730-protocol-v2-bundle-uri.sh
new file mode 100644
index 00000000000..27294e9c976
--- /dev/null
+++ b/t/lib-t5730-protocol-v2-bundle-uri.sh
@@ -0,0 +1,148 @@
+# Included from t573*-protocol-v2-bundle-uri-*.sh
+
+T5730_PARENT=
+T5730_URI=
+T5730_BUNDLE_URI=
+case "$T5730_PROTOCOL" in
+file)
+	T5730_PARENT=file_parent
+	T5730_URI="file://$PWD/file_parent"
+	T5730_BUNDLE_URI="$T5730_URI/fake.bdl"
+	test_set_prereq T5730_FILE
+	;;
+git)
+	. "$TEST_DIRECTORY"/lib-git-daemon.sh
+	start_git_daemon --export-all --enable=receive-pack
+	T5730_PARENT="$GIT_DAEMON_DOCUMENT_ROOT_PATH/parent"
+	T5730_URI="$GIT_DAEMON_URL/parent"
+	T5730_BUNDLE_URI="https://example.com/fake.bdl"
+	test_set_prereq T5730_GIT
+	;;
+http)
+	. "$TEST_DIRECTORY"/lib-httpd.sh
+	start_httpd
+	T5730_PARENT="$HTTPD_DOCUMENT_ROOT_PATH/http_parent"
+	T5730_URI="$HTTPD_URL/smart/http_parent"
+	T5730_BUNDLE_URI="https://example.com/fake.bdl"
+	test_set_prereq T5730_HTTP
+	;;
+*)
+	BUG "Need to pass valid T5730_PROTOCOL (was $T5730_PROTOCOL)"
+	;;
+esac
+
+test_expect_success "setup protocol v2 $T5730_PROTOCOL:// tests" '
+	git init "$T5730_PARENT" &&
+	test_commit -C "$T5730_PARENT" one &&
+	git -C "$T5730_PARENT" config uploadpack.advertiseBundleURIs true
+'
+
+# Poor man's URI escaping. Good enough for the test suite whose trash
+# directory has a space in it. See 93c3fcbe4d4 (git-svn: attempt to
+# mimic SVN 1.7 URL canonicalization, 2012-07-28) for prior art.
+test_uri_escape() {
+	sed 's/ /%20/g'
+}
+
+case "$T5730_PROTOCOL" in
+http)
+	test_expect_success "setup config for $T5730_PROTOCOL:// tests" '
+		git -C "$T5730_PARENT" config http.receivepack true
+	'
+	;;
+*)
+	;;
+esac
+T5730_BUNDLE_URI_ESCAPED=$(echo "$T5730_BUNDLE_URI" | test_uri_escape)
+
+test_expect_success "connect with $T5730_PROTOCOL:// using protocol v2: no bundle-uri" '
+	test_when_finished "rm -f log" &&
+	test_when_finished "git -C \"$T5730_PARENT\" config uploadpack.advertiseBundleURIs true" &&
+	git -C "$T5730_PARENT" config uploadpack.advertiseBundleURIs false &&
+
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		ls-remote --symref "$T5730_URI" \
+		>actual 2>err &&
+
+	# Server responded using protocol v2
+	grep "< version 2" log &&
+
+	! grep bundle-uri log
+'
+
+test_expect_success "connect with $T5730_PROTOCOL:// using protocol v2: have bundle-uri" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" \
+		uploadpack.bundleURI "$T5730_BUNDLE_URI_ESCAPED" &&
+
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		ls-remote --symref "$T5730_URI" \
+		>actual 2>err &&
+
+	# Server responded using protocol v2
+	grep "< version 2" log &&
+
+	# Server advertised bundle-uri capability
+	grep bundle-uri log
+'
+
+test_expect_success !T5730_HTTP "bad client with $T5730_PROTOCOL:// using protocol v2" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI \
+		"$T5730_BUNDLE_URI_ESCAPED" &&
+
+	cat >err.expect <<-\EOF &&
+	Cloning into '"'"'child'"'"'...
+	EOF
+	case "$T5730_PROTOCOL" in
+	file)
+		cat >fatal-bundle-uri.expect <<-\EOF
+		fatal: bundle-uri: unexpected argument: '"'"'test-bad-client'"'"'
+		EOF
+		;;
+	*)
+		cat >fatal.expect <<-\EOF
+		fatal: read error: Connection reset by peer
+		EOF
+		;;
+	esac &&
+
+	test_when_finished "rm -rf child" &&
+	test_must_fail ok=sigpipe env \
+		GIT_TRACE_PACKET="$PWD/log" \
+		GIT_TEST_PROTOCOL_BAD_BUNDLE_URI=true \
+		git -c protocol.version=2 \
+		clone "$T5730_URI" child \
+		>out 2>err &&
+	test_must_be_empty out &&
+
+	grep -v -e ^fatal: -e ^error: err >err.actual &&
+	test_cmp err.expect err.actual &&
+
+	case "$T5730_PROTOCOL" in
+	file)
+		# Due to general race conditions with client/server replies we
+		# may or may not get "fatal: the remote end hung up
+		# expectedly" here
+		grep "^fatal: bundle-uri:" err >fatal-bundle-uri.actual &&
+		test_cmp fatal-bundle-uri.expect fatal-bundle-uri.actual
+		;;
+	*)
+		grep "^fatal:" err >fatal.actual &&
+		# Due to the same race conditions this might be
+		# "fatal: read error: Connection reset by peer", "fatal: the remote end
+		# hung up unexpectedly" etc.
+		cat fatal.actual &&
+		test_file_not_empty fatal.actual
+		;;
+	esac &&
+
+	grep "clone> test-bad-client$" log >sent-bad-request &&
+	test_file_not_empty sent-bad-request
+'
diff --git a/t/t5730-protocol-v2-bundle-uri-file.sh b/t/t5730-protocol-v2-bundle-uri-file.sh
new file mode 100755
index 00000000000..89203d3a23c
--- /dev/null
+++ b/t/t5730-protocol-v2-bundle-uri-file.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+
+test_description="Test bundle-uri with protocol v2 and 'file://' transport"
+
+TEST_NO_CREATE_REPO=1
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+# Test protocol v2 with 'file://' transport
+#
+T5730_PROTOCOL=file
+. "$TEST_DIRECTORY"/lib-t5730-protocol-v2-bundle-uri.sh
+
+test_expect_success "unknown capability value with $T5730_PROTOCOL:// using protocol v2" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" \
+		uploadpack.bundleURI "$T5730_BUNDLE_URI_ESCAPED" &&
+
+	GIT_TRACE_PACKET="$PWD/log" \
+	GIT_TEST_BUNDLE_URI_UNKNOWN_CAPABILITY_VALUE=true \
+	git \
+		-c protocol.version=2 \
+		ls-remote --symref "$T5730_URI" \
+		>actual 2>err &&
+
+	# Server responded using protocol v2
+	grep "< version 2" log &&
+
+	grep "> bundle-uri=test-unknown-capability-value" log
+'
+
+test_done
diff --git a/t/t5731-protocol-v2-bundle-uri-git.sh b/t/t5731-protocol-v2-bundle-uri-git.sh
new file mode 100755
index 00000000000..282847b311f
--- /dev/null
+++ b/t/t5731-protocol-v2-bundle-uri-git.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+
+test_description="Test bundle-uri with protocol v2 and 'git://' transport"
+
+TEST_NO_CREATE_REPO=1
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+# Test protocol v2 with 'git://' transport
+#
+T5730_PROTOCOL=git
+. "$TEST_DIRECTORY"/lib-t5730-protocol-v2-bundle-uri.sh
+
+test_done
diff --git a/t/t5732-protocol-v2-bundle-uri-http.sh b/t/t5732-protocol-v2-bundle-uri-http.sh
new file mode 100755
index 00000000000..fcc1cf3faef
--- /dev/null
+++ b/t/t5732-protocol-v2-bundle-uri-http.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+
+test_description="Test bundle-uri with protocol v2 and 'git://' transport"
+
+TEST_NO_CREATE_REPO=1
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+# Test protocol v2 with 'git://' transport
+#
+T5730_PROTOCOL=http
+. "$TEST_DIRECTORY"/lib-t5730-protocol-v2-bundle-uri.sh
+
+test_done
diff --git a/transport-helper.c b/transport-helper.c
index dfbeaebe40c..ee7d0b09a24 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -1270,9 +1270,22 @@ static struct ref *get_refs_list_using_list(struct transport *transport,
 	return ret;
 }
 
+static int get_bundle_uri(struct transport *transport)
+{
+	get_helper(transport);
+
+	if (process_connect(transport, 0)) {
+		do_take_over(transport);
+		return transport->vtable->get_bundle_uri(transport);
+	}
+
+	return -1;
+}
+
 static struct transport_vtable vtable = {
 	.set_option	= set_helper_option,
 	.get_refs_list	= get_refs_list,
+	.get_bundle_uri = get_bundle_uri,
 	.fetch_refs	= fetch_refs,
 	.push_refs	= push_refs,
 	.connect	= connect_helper,
diff --git a/transport-internal.h b/transport-internal.h
index c4ca0b733ac..90ea749e5cf 100644
--- a/transport-internal.h
+++ b/transport-internal.h
@@ -26,6 +26,13 @@ struct transport_vtable {
 	struct ref *(*get_refs_list)(struct transport *transport, int for_push,
 				     struct transport_ls_refs_options *transport_options);
 
+	/**
+	 * Populates the remote side's bundle-uri under protocol v2,
+	 * if the "bundle-uri" capability was advertised. Returns 0 if
+	 * OK, negative values on error.
+	 */
+	int (*get_bundle_uri)(struct transport *transport);
+
 	/**
 	 * Fetch the objects for the given refs. Note that this gets
 	 * an array, and should ignore the list structure.
diff --git a/transport.c b/transport.c
index 3d64a43ab39..106bb8c5577 100644
--- a/transport.c
+++ b/transport.c
@@ -22,6 +22,7 @@
 #include "protocol.h"
 #include "object-store.h"
 #include "color.h"
+#include "bundle-uri.h"
 
 static int transport_use_color = -1;
 static char transport_colors[][COLOR_MAXLEN] = {
@@ -359,6 +360,25 @@ static struct ref *get_refs_via_connect(struct transport *transport, int for_pus
 	return handshake(transport, for_push, options, 1);
 }
 
+static int get_bundle_uri(struct transport *transport)
+{
+	struct git_transport_data *data = transport->data;
+	struct packet_reader reader;
+	int stateless_rpc = transport->stateless_rpc;
+
+	if (!transport->bundles) {
+		CALLOC_ARRAY(transport->bundles, 1);
+		init_bundle_list(transport->bundles);
+	}
+
+	packet_reader_init(&reader, data->fd[0], NULL, 0,
+			   PACKET_READ_CHOMP_NEWLINE |
+			   PACKET_READ_GENTLE_ON_EOF);
+
+	return get_remote_bundle_uri(data->fd[1], &reader,
+				     transport->bundles, stateless_rpc);
+}
+
 static int fetch_refs_via_pack(struct transport *transport,
 			       int nr_heads, struct ref **to_fetch)
 {
@@ -899,6 +919,7 @@ static int disconnect_git(struct transport *transport)
 
 static struct transport_vtable taken_over_vtable = {
 	.get_refs_list	= get_refs_via_connect,
+	.get_bundle_uri = get_bundle_uri,
 	.fetch_refs	= fetch_refs_via_pack,
 	.push_refs	= git_transport_push,
 	.disconnect	= disconnect_git
@@ -1052,6 +1073,7 @@ static struct transport_vtable bundle_vtable = {
 
 static struct transport_vtable builtin_smart_vtable = {
 	.get_refs_list	= get_refs_via_connect,
+	.get_bundle_uri = get_bundle_uri,
 	.fetch_refs	= fetch_refs_via_pack,
 	.push_refs	= git_transport_push,
 	.connect	= connect_git,
@@ -1066,6 +1088,9 @@ struct transport *transport_get(struct remote *remote, const char *url)
 	ret->progress = isatty(2);
 	string_list_init_dup(&ret->pack_lockfiles);
 
+	CALLOC_ARRAY(ret->bundles, 1);
+	init_bundle_list(ret->bundles);
+
 	if (!remote)
 		BUG("No remote provided to transport_get()");
 
@@ -1473,6 +1498,34 @@ int transport_fetch_refs(struct transport *transport, struct ref *refs)
 	return rc;
 }
 
+int transport_get_remote_bundle_uri(struct transport *transport)
+{
+	const struct transport_vtable *vtable = transport->vtable;
+
+	/* Check config only once. */
+	if (transport->got_remote_bundle_uri++)
+		return 0;
+
+	/*
+	 * "Support" protocol v0 and v2 without bundle-uri support by
+	 * silently degrading to a NOOP.
+	 */
+	if (!server_supports_v2("bundle-uri", 0))
+		return 0;
+
+	/*
+	 * This is intentionally below the transport.injectBundleURI,
+	 * we want to be able to inject into protocol v0, or into the
+	 * dialog of a server who doesn't support this.
+	 */
+	if (!vtable->get_bundle_uri)
+		return error(_("bundle-uri operation not supported by protocol"));
+
+	if (vtable->get_bundle_uri(transport) < 0)
+		return error(_("could not retrieve server-advertised bundle-uri list"));
+	return 0;
+}
+
 void transport_unlock_pack(struct transport *transport, unsigned int flags)
 {
 	int in_signal_handler = !!(flags & TRANSPORT_UNLOCK_PACK_IN_SIGNAL_HANDLER);
@@ -1503,6 +1556,8 @@ int transport_disconnect(struct transport *transport)
 		ret = transport->vtable->disconnect(transport);
 	if (transport->got_remote_refs)
 		free_refs((void *)transport->remote_refs);
+	clear_bundle_list(transport->bundles);
+	free(transport->bundles);
 	free(transport);
 	return ret;
 }
diff --git a/transport.h b/transport.h
index 12bc08fc339..015375e083d 100644
--- a/transport.h
+++ b/transport.h
@@ -62,6 +62,7 @@ enum transport_family {
 	TRANSPORT_FAMILY_IPV6
 };
 
+struct bundle_list;
 struct transport {
 	const struct transport_vtable *vtable;
 
@@ -76,6 +77,18 @@ struct transport {
 	 */
 	unsigned got_remote_refs : 1;
 
+	/**
+	 * Indicates whether we already called get_bundle_uri_list(); set by
+	 * transport.c::transport_get_remote_bundle_uri().
+	 */
+	unsigned got_remote_bundle_uri : 1;
+
+	/*
+	 * The results of "command=bundle-uri", if both sides support
+	 * the "bundle-uri" capability.
+	 */
+	struct bundle_list *bundles;
+
 	/*
 	 * Transports that call take-over destroys the data specific to
 	 * the transport type while doing so, and cannot be reused.
@@ -280,6 +293,12 @@ void transport_ls_refs_options_release(struct transport_ls_refs_options *opts);
 const struct ref *transport_get_remote_refs(struct transport *transport,
 					    struct transport_ls_refs_options *transport_options);
 
+/**
+ * Retrieve bundle URI(s) from a remote. Populates "struct
+ * transport"'s "bundle_uri" and "got_remote_bundle_uri".
+ */
+int transport_get_remote_bundle_uri(struct transport *transport);
+
 /*
  * Fetch the hash algorithm used by a remote.
  *
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 18/24] bundle-uri client: add "git ls-remote-bundle-uri"
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (16 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 17/24] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-05-20 18:40 ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-05-20 18:40 ` [PATCH 19/24] bundle-uri: serve URI advertisement from bundle.* config Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
 <avarab@gmail.com>

TODO: make this a test-helper instead of its own builtin?

Add a git-ls-remote-bundle-uri command, this is a thin wrapper for
issuing protocol v2 "bundle-uri" commands to a server, and to the
parsing routines in bundle-uri.c.

Since in the "git clone" case we'll have already done the handshake(),
but not here, introduce a "got_advertisement" state along with
"got_remote_heads". It seems to me that the "got_remote_heads" is
badly named in the first place, and the whole logic of eagerly getting
ls-refs on handshake() or not could be refactored somewhat, but let's
not do that now, and instead just add another self-documenting state
variable.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 .gitignore                                 |   1 +
 Documentation/git-ls-remote-bundle-uri.txt |  62 +++++++++
 Documentation/git-ls-remote.txt            |   1 +
 Makefile                                   |   1 +
 builtin.h                                  |   1 +
 builtin/clone.c                            |   2 +-
 builtin/ls-remote-bundle-uri.c             |  71 ++++++++++
 command-list.txt                           |   1 +
 git.c                                      |   1 +
 t/lib-t5730-protocol-v2-bundle-uri.sh      | 147 ++++++++++++++++++++-
 transport.c                                |  43 ++++--
 transport.h                                |   6 +-
 12 files changed, 324 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/git-ls-remote-bundle-uri.txt
 create mode 100644 builtin/ls-remote-bundle-uri.c

diff --git a/.gitignore b/.gitignore
index e81de1063a4..e1312b2795c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -91,6 +91,7 @@
 /git-log
 /git-ls-files
 /git-ls-remote
+/git-ls-remote-bundle-uri
 /git-ls-tree
 /git-mailinfo
 /git-mailsplit
diff --git a/Documentation/git-ls-remote-bundle-uri.txt b/Documentation/git-ls-remote-bundle-uri.txt
new file mode 100644
index 00000000000..793d7677f2f
--- /dev/null
+++ b/Documentation/git-ls-remote-bundle-uri.txt
@@ -0,0 +1,62 @@
+git-ls-remote-bundle-uri(1)
+===========================
+
+NAME
+----
+git-ls-remote-bundle-uri - List 'bundle-uri' in a remote repository
+
+SYNOPSIS
+--------
+[verse]
+'git ls-remote-bundle-uri' [-q |--quiet] [--uri] [--upload-pack=<exec>]
+			 [[-o | --server-option=]<option>] <repository>
+
+
+DESCRIPTION
+-----------
+
+Displays the `bundle-uri`s advertised by a remote repository. See
+`bundle-uri` in link:technical/protocol-v2.html[the Git Wire Protocol,
+Version 2] documentation for what the output format looks like.
+
+OPTIONS
+-------
+
+-q::
+--quiet::
+	Do not print remote URL to stderr in cases where the remote
+	name is inferred from config.
++
+When the remote name is not inferred (e.g. `git ls-remote-bundle-uri
+origin`, or `git ls-remote-bundle-uri https://[...]`) the remote URL
+is not printed in any case.
+
+--uri::
+	Print only the URIs, and not any of their optional attributes.
+
+--upload-pack=<exec>::
+	Specify the full path of 'git-upload-pack' on the remote
+	host. This allows listing references from repositories accessed via
+	SSH and where the SSH daemon does not use the PATH configured by the
+	user.
+
+-o <option>::
+--server-option=<option>::
+	Transmit the given string to the server when communicating using
+	protocol version 2.  The given string must not contain a NUL or LF
+	character.
+	When multiple `--server-option=<option>` are given, they are all
+	sent to the other side in the order listed on the command line.
+
+<repository>::
+	The "remote" repository to query.  This parameter can be
+	either a URL or the name of a remote (see the GIT URLS and
+	REMOTES sections of linkgit:git-fetch[1]).
+
+SEE ALSO
+--------
+linkgit:git-ls-remote[1].
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Documentation/git-ls-remote.txt b/Documentation/git-ls-remote.txt
index 492e573856f..86c07eff832 100644
--- a/Documentation/git-ls-remote.txt
+++ b/Documentation/git-ls-remote.txt
@@ -114,6 +114,7 @@ c5db5456ae3b0873fc659c19fafdde22313cc441	refs/tags/v0.99.2
 
 SEE ALSO
 --------
+linkgit:git-ls-remote-bundle-uri[1].
 linkgit:git-check-ref-format[1].
 
 GIT
diff --git a/Makefile b/Makefile
index c8a14793005..badb87bed78 100644
--- a/Makefile
+++ b/Makefile
@@ -1161,6 +1161,7 @@ BUILTIN_OBJS += builtin/init-db.o
 BUILTIN_OBJS += builtin/interpret-trailers.o
 BUILTIN_OBJS += builtin/log.o
 BUILTIN_OBJS += builtin/ls-files.o
+BUILTIN_OBJS += builtin/ls-remote-bundle-uri.o
 BUILTIN_OBJS += builtin/ls-remote.o
 BUILTIN_OBJS += builtin/ls-tree.o
 BUILTIN_OBJS += builtin/mailinfo.o
diff --git a/builtin.h b/builtin.h
index 40e9ecc8485..c80bec94abe 100644
--- a/builtin.h
+++ b/builtin.h
@@ -173,6 +173,7 @@ int cmd_log(int argc, const char **argv, const char *prefix);
 int cmd_log_reflog(int argc, const char **argv, const char *prefix);
 int cmd_ls_files(int argc, const char **argv, const char *prefix);
 int cmd_ls_tree(int argc, const char **argv, const char *prefix);
+int cmd_ls_remote_bundle_uri(int argc, const char **argv, const char *prefix);
 int cmd_ls_remote(int argc, const char **argv, const char *prefix);
 int cmd_mailinfo(int argc, const char **argv, const char *prefix);
 int cmd_mailsplit(int argc, const char **argv, const char *prefix);
diff --git a/builtin/clone.c b/builtin/clone.c
index 7bedd9eb29e..ca2291552f7 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1267,7 +1267,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	 * Populate transport->got_remote_bundle_uri and
 	 * transport->bundle_uri. We might get nothing.
 	 */
-	transport_get_remote_bundle_uri(transport);
+	transport_get_remote_bundle_uri(transport, 1);
 
 	if (mapped_refs) {
 		int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
diff --git a/builtin/ls-remote-bundle-uri.c b/builtin/ls-remote-bundle-uri.c
new file mode 100644
index 00000000000..dbd499d369e
--- /dev/null
+++ b/builtin/ls-remote-bundle-uri.c
@@ -0,0 +1,71 @@
+#include "builtin.h"
+#include "cache.h"
+#include "bundle-uri.h"
+#include "transport.h"
+#include "ref-filter.h"
+#include "remote.h"
+#include "refs.h"
+
+static const char * const ls_remote_bundle_uri_usage[] = {
+	N_("git ls-remote-bundle-uri <repository>"),
+	NULL
+};
+
+int cmd_ls_remote_bundle_uri(int argc, const char **argv, const char *prefix)
+{
+	int quiet = 0;
+	int uri = 0;
+	const char *uploadpack = NULL;
+	struct string_list server_options = STRING_LIST_INIT_DUP;
+	struct option options[] = {
+		OPT__QUIET(&quiet, N_("do not print remote URL")),
+		OPT_BOOL(0, "uri", &uri, N_("limit to showing uri field")),
+		OPT_STRING(0, "upload-pack", &uploadpack, N_("exec"),
+			   N_("path of git-upload-pack on the remote host")),
+		OPT_STRING_LIST('o', "server-option", &server_options,
+				N_("server-specific"),
+				N_("option to transmit")),
+		OPT_END()
+	};
+	const char *dest = NULL;
+	struct remote *remote;
+	struct transport *transport;
+	int status = 0;
+
+	argc = parse_options(argc, argv, prefix, options, ls_remote_bundle_uri_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+	dest = argv[0];
+
+	packet_trace_identity("ls-remote-bundle-uri");
+
+	remote = remote_get(dest);
+	if (!remote) {
+		if (dest)
+			die(_("bad repository '%s'"), dest);
+		die(_("no remote configured to get bundle URIs from"));
+	}
+	if (!remote->url_nr)
+		die(_("remote '%s' has no configured URL"), dest);
+
+	transport = transport_get(remote, NULL);
+	if (uploadpack)
+		transport_set_option(transport, TRANS_OPT_UPLOADPACK, uploadpack);
+	if (server_options.nr)
+		transport->server_options = &server_options;
+
+	if (!dest && !quiet)
+		fprintf(stderr, "From %s\n", *remote->url);
+
+	if (transport_get_remote_bundle_uri(transport, 0) < 0) {
+		error(_("could not get the bundle-uri list"));
+		status = 1;
+		goto cleanup;
+	}
+
+	print_bundle_list(stdout, transport->bundles);
+
+cleanup:
+	if (transport_disconnect(transport))
+		return 1;
+	return status;
+}
diff --git a/command-list.txt b/command-list.txt
index 9bd6f3c48f4..a50eebd4aa2 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -115,6 +115,7 @@ git-interpret-trailers                  purehelpers
 git-log                                 mainporcelain           info
 git-ls-files                            plumbinginterrogators
 git-ls-remote                           plumbinginterrogators
+git-ls-remote-bundle-uri                plumbinginterrogators
 git-ls-tree                             plumbinginterrogators
 git-mailinfo                            purehelpers
 git-mailsplit                           purehelpers
diff --git a/git.c b/git.c
index 3d8e48cf555..352e76cedaf 100644
--- a/git.c
+++ b/git.c
@@ -551,6 +551,7 @@ static struct cmd_struct commands[] = {
 	{ "log", cmd_log, RUN_SETUP },
 	{ "ls-files", cmd_ls_files, RUN_SETUP },
 	{ "ls-remote", cmd_ls_remote, RUN_SETUP_GENTLY },
+	{ "ls-remote-bundle-uri", cmd_ls_remote_bundle_uri, RUN_SETUP_GENTLY },
 	{ "ls-tree", cmd_ls_tree, RUN_SETUP },
 	{ "mailinfo", cmd_mailinfo, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "mailsplit", cmd_mailsplit, NO_PARSEOPT },
diff --git a/t/lib-t5730-protocol-v2-bundle-uri.sh b/t/lib-t5730-protocol-v2-bundle-uri.sh
index 27294e9c976..23f2de0d9d7 100644
--- a/t/lib-t5730-protocol-v2-bundle-uri.sh
+++ b/t/lib-t5730-protocol-v2-bundle-uri.sh
@@ -34,7 +34,9 @@ esac
 test_expect_success "setup protocol v2 $T5730_PROTOCOL:// tests" '
 	git init "$T5730_PARENT" &&
 	test_commit -C "$T5730_PARENT" one &&
-	git -C "$T5730_PARENT" config uploadpack.advertiseBundleURIs true
+	git -C "$T5730_PARENT" config uploadpack.advertiseBundleURIs true &&
+	git -C "$T5730_PARENT" config bundle.list.version 1 &&
+	git -C "$T5730_PARENT" config bundle.list.mode all
 '
 
 # Poor man's URI escaping. Good enough for the test suite whose trash
@@ -94,7 +96,7 @@ test_expect_success "connect with $T5730_PROTOCOL:// using protocol v2: have bun
 test_expect_success !T5730_HTTP "bad client with $T5730_PROTOCOL:// using protocol v2" '
 	test_when_finished "rm -f log" &&
 
-	test_config -C "$T5730_PARENT" uploadpack.bundleURI \
+	test_config -C "$T5730_PARENT" bundle.only.uri \
 		"$T5730_BUNDLE_URI_ESCAPED" &&
 
 	cat >err.expect <<-\EOF &&
@@ -146,3 +148,144 @@ test_expect_success !T5730_HTTP "bad client with $T5730_PROTOCOL:// using protoc
 	grep "clone> test-bad-client$" log >sent-bad-request &&
 	test_file_not_empty sent-bad-request
 '
+
+test_expect_success "ls-remote-bundle-uri with $T5730_PROTOCOL:// using protocol v2" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" bundle.only.uri \
+		"$T5730_BUNDLE_URI_ESCAPED" &&
+
+	# All data about bundle URIs
+	cat >expect <<-EOF &&
+	[bundle "list"]
+		version = 1
+		mode = all
+	[bundle "only"]
+		uri = $T5730_BUNDLE_URI_ESCAPED
+	EOF
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		ls-remote-bundle-uri \
+		"$T5730_URI" \
+		>actual &&
+	test_cmp_config_output expect actual &&
+
+	# Only the URIs
+	git \
+		-c protocol.version=2 \
+		ls-remote-bundle-uri --uri \
+		"$T5730_URI" \
+		>actual2 &&
+	test_cmp actual actual2
+'
+
+test_expect_success "ls-remote-bundle-uri with $T5730_PROTOCOL:// using protocol v2" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" bundle.only.uri \
+		"$T5730_BUNDLE_URI_ESCAPED" &&
+
+	# Extra data should be ignored
+	test_config -C "$T5730_PARENT" bundle.only.extra bogus &&
+
+	# All data about bundle URIs
+	cat >expect <<-EOF &&
+	[bundle "list"]
+		version = 1
+		mode = all
+	[bundle "only"]
+		uri = $T5730_BUNDLE_URI_ESCAPED
+	EOF
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		ls-remote-bundle-uri \
+		"$T5730_URI" \
+		>actual &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success "ls-remote-bundle-uri with $T5730_PROTOCOL:// using protocol v2: --uri" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" bundle.only.uri \
+		"$T5730_BUNDLE_URI_ESCAPED" &&
+
+	# All data about bundle URIs
+	cat >expect <<-EOF &&
+	[bundle "list"]
+		version = 1
+		mode = all
+	[bundle "only"]
+		uri = $T5730_BUNDLE_URI_ESCAPED
+	EOF
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		ls-remote-bundle-uri \
+		--uri \
+		"$T5730_URI" \
+		>actual &&
+	test_cmp_config_output expect actual
+'
+
+test_expect_success "ls-remote-bundle-uri --[no-]quiet with $T5730_PROTOCOL:// using protocol v2" '
+	test_when_finished "rm -f log" &&
+	test_when_finished "rm -rf child" &&
+	env GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		 clone "$T5730_URI" child &&
+
+	test_config -C "$T5730_PARENT" bundle.only.uri \
+		"$T5730_BUNDLE_URI_ESCAPED" &&
+
+	# Without --[no-]quiet
+	cat >out.expect <<-EOF &&
+	[bundle "list"]
+		version = 1
+		mode = all
+	[bundle "only"]
+		uri = $T5730_BUNDLE_URI_ESCAPED
+	EOF
+	cat >err.expect <<-EOF &&
+	From $T5730_URI
+	EOF
+	git \
+		-C child \
+		 -c protocol.version=2 \
+		ls-remote-bundle-uri \
+		>out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_cmp_config_output out.expect out.actual &&
+
+	# --no-quiet is the default
+	git \
+		-C child \
+		-c protocol.version=2 \
+		ls-remote-bundle-uri \
+		--no-quiet \
+		>out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_cmp_config_output out.expect out.actual &&
+
+	# --quiet quiets the "From" line
+	git \
+		-C child \
+		-c protocol.version=2 \
+		ls-remote-bundle-uri \
+		--quiet \
+		>out.actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output out.expect out.actual &&
+
+	# --quiet is implicit if the remote is not implicit
+	git \
+		-c protocol.version=2 \
+		ls-remote-bundle-uri \
+		"$T5730_URI" \
+		>out.actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output out.expect out.actual
+'
diff --git a/transport.c b/transport.c
index 106bb8c5577..c9d2e1804eb 100644
--- a/transport.c
+++ b/transport.c
@@ -198,6 +198,7 @@ struct git_transport_data {
 	struct git_transport_options options;
 	struct child_process *conn;
 	int fd[2];
+	unsigned got_advertisement : 1;
 	unsigned got_remote_heads : 1;
 	enum protocol_version version;
 	struct oid_array extra_have;
@@ -346,6 +347,7 @@ static struct ref *handshake(struct transport *transport, int for_push,
 		BUG("unknown protocol version");
 	}
 	data->got_remote_heads = 1;
+	data->got_advertisement = 1;
 	transport->hash_algo = reader.hash_algo;
 
 	if (reader.line_peeked)
@@ -371,6 +373,33 @@ static int get_bundle_uri(struct transport *transport)
 		init_bundle_list(transport->bundles);
 	}
 
+	if (!data->got_advertisement) {
+		struct ref *refs;
+		struct git_transport_data *data = transport->data;
+		enum protocol_version version;
+
+		refs = handshake(transport, 0, NULL, 0);
+		version = data->version;
+
+		switch (version) {
+		case protocol_v2:
+			assert(!refs);
+			break;
+		case protocol_v0:
+		case protocol_v1:
+		case protocol_unknown_version:
+			assert(refs);
+			break;
+		}
+	}
+
+	/*
+	 * "Support" protocol v0 and v2 without bundle-uri support by
+	 * silently degrading to a NOOP.
+	 */
+	if (!server_supports_v2("bundle-uri", 0))
+		return 0;
+
 	packet_reader_init(&reader, data->fd[0], NULL, 0,
 			   PACKET_READ_CHOMP_NEWLINE |
 			   PACKET_READ_GENTLE_ON_EOF);
@@ -1498,7 +1527,7 @@ int transport_fetch_refs(struct transport *transport, struct ref *refs)
 	return rc;
 }
 
-int transport_get_remote_bundle_uri(struct transport *transport)
+int transport_get_remote_bundle_uri(struct transport *transport, int quiet)
 {
 	const struct transport_vtable *vtable = transport->vtable;
 
@@ -1506,20 +1535,16 @@ int transport_get_remote_bundle_uri(struct transport *transport)
 	if (transport->got_remote_bundle_uri++)
 		return 0;
 
-	/*
-	 * "Support" protocol v0 and v2 without bundle-uri support by
-	 * silently degrading to a NOOP.
-	 */
-	if (!server_supports_v2("bundle-uri", 0))
-		return 0;
-
 	/*
 	 * This is intentionally below the transport.injectBundleURI,
 	 * we want to be able to inject into protocol v0, or into the
 	 * dialog of a server who doesn't support this.
 	 */
-	if (!vtable->get_bundle_uri)
+	if (!vtable->get_bundle_uri) {
+		if (quiet)
+			return -1;
 		return error(_("bundle-uri operation not supported by protocol"));
+	}
 
 	if (vtable->get_bundle_uri(transport) < 0)
 		return error(_("could not retrieve server-advertised bundle-uri list"));
diff --git a/transport.h b/transport.h
index 015375e083d..f40bf2c69b8 100644
--- a/transport.h
+++ b/transport.h
@@ -296,8 +296,12 @@ const struct ref *transport_get_remote_refs(struct transport *transport,
 /**
  * Retrieve bundle URI(s) from a remote. Populates "struct
  * transport"'s "bundle_uri" and "got_remote_bundle_uri".
+ *
+ * With `quiet=1` it will not complain if the serve doesn't support
+ * the protocol, but only if we discover the server uses it, and
+ * encounter issues then.
  */
-int transport_get_remote_bundle_uri(struct transport *transport);
+int transport_get_remote_bundle_uri(struct transport *transport, int quiet);
 
 /*
  * Fetch the hash algorithm used by a remote.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 19/24] bundle-uri: serve URI advertisement from bundle.* config
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (17 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 18/24] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 20/24] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason via GitGitGadget
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

TODO: fill in details

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index fed508cd51a..70be53aa38d 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -579,6 +579,16 @@ cached:
 	return advertise_bundle_uri;
 }
 
+static int config_to_packet_line(const char *key, const char *value, void *data)
+{
+	struct packet_reader *writer = data;
+
+	if (!strncmp(key, "bundle.", 7))
+		packet_write_fmt(writer->fd, "%s=%s", key, value);
+
+	return 0;
+}
+
 int bundle_uri_command(struct repository *r,
 		       struct packet_reader *request)
 {
@@ -590,7 +600,11 @@ int bundle_uri_command(struct repository *r,
 	if (request->status != PACKET_READ_FLUSH)
 		die(_("bundle-uri: expected flush after arguments"));
 
-	/* TODO: Implement the communication */
+	/*
+	 * Read all "bundle.*" config lines to the client as key=value
+	 * packet lines.
+	 */
+	git_config(config_to_packet_line, &writer);
 
 	packet_writer_flush(&writer);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 20/24] bundle-uri client: add boolean transfer.bundleURI setting
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (18 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 19/24] bundle-uri: serve URI advertisement from bundle.* config Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-05-20 18:40 ` [PATCH 21/24] bundle-uri: allow relative URLs in bundle lists Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
 <avarab@gmail.com>

The yet-to-be introduced client support for bundle-uri will always
fall back on a full clone, but we'd still like to be able to ignore a
server's bundle-uri advertisement entirely.

The new transfer.bundleURI config option defaults to 'false', but a user
can set it to 'true' to enable checking for bundle URIs from the origin
Git server using protocol v2.

To enable this setting by default in the correct tests, add a
GIT_TEST_BUNDLE_URI environment variable.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/transfer.txt     |  6 ++++++
 t/lib-t5730-protocol-v2-bundle-uri.sh |  3 +++
 transport.c                           | 10 +++++++---
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/transfer.txt b/Documentation/config/transfer.txt
index b49429eb4db..337a7886546 100644
--- a/Documentation/config/transfer.txt
+++ b/Documentation/config/transfer.txt
@@ -77,3 +77,9 @@ transfer.unpackLimit::
 transfer.advertiseSID::
 	Boolean. When true, client and server processes will advertise their
 	unique session IDs to their remote counterpart. Defaults to false.
+
+transfer.bundleURI::
+	When set to `false` ignores any server advertisement of
+	`bundle-uri` and proceed with a "normal" clone/fetch even if
+	using bundles to bootstap is possible. Defaults to `true`,
+	i.e. bundle-uri is tried whenever a server offers it.
diff --git a/t/lib-t5730-protocol-v2-bundle-uri.sh b/t/lib-t5730-protocol-v2-bundle-uri.sh
index 23f2de0d9d7..49d4d848cac 100644
--- a/t/lib-t5730-protocol-v2-bundle-uri.sh
+++ b/t/lib-t5730-protocol-v2-bundle-uri.sh
@@ -1,5 +1,8 @@
 # Included from t573*-protocol-v2-bundle-uri-*.sh
 
+GIT_TEST_BUNDLE_URI=1
+export GIT_TEST_BUNDLE_URI
+
 T5730_PARENT=
 T5730_URI=
 T5730_BUNDLE_URI=
diff --git a/transport.c b/transport.c
index c9d2e1804eb..f0114778ff4 100644
--- a/transport.c
+++ b/transport.c
@@ -1529,6 +1529,7 @@ int transport_fetch_refs(struct transport *transport, struct ref *refs)
 
 int transport_get_remote_bundle_uri(struct transport *transport, int quiet)
 {
+	int value = 0;
 	const struct transport_vtable *vtable = transport->vtable;
 
 	/* Check config only once. */
@@ -1536,10 +1537,13 @@ int transport_get_remote_bundle_uri(struct transport *transport, int quiet)
 		return 0;
 
 	/*
-	 * This is intentionally below the transport.injectBundleURI,
-	 * we want to be able to inject into protocol v0, or into the
-	 * dialog of a server who doesn't support this.
+	 * Don't use bundle-uri at all, if configured not to.
+	 * This logic defaults "transfer.bundleURI" to false.
 	 */
+	if (!git_env_bool("GIT_TEST_BUNDLE_URI", 0) &&
+	    (git_config_get_bool("transfer.bundleuri", &value) || !value))
+		return 0;
+
 	if (!vtable->get_bundle_uri) {
 		if (quiet)
 			return -1;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 21/24] bundle-uri: allow relative URLs in bundle lists
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (19 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 20/24] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 22/24] bundle-uri: download bundles from an advertised list Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Bundle providers may want to distribute that data across multiple CDNs.
This might require a change in the base URI, all the way to the domain
name. If all bundles require an absolute URI in their 'uri' value, then
every push to a CDN would require altering the table of contents to
match the expected domain and exact location within it.

Allow a bundle list to specify a relative URI for the bundles.
This allows easier distribution of bundle data.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                |  6 ++++-
 bundle-uri.h                |  9 +++++++
 t/helper/test-bundle-uri.c  |  2 ++
 t/t5750-bundle-uri-parse.sh | 54 +++++++++++++++++++++++++++++++++++++
 transport.c                 |  3 +++
 5 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 70be53aa38d..9e7dc0fb4ca 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -7,6 +7,7 @@
 #include "run-command.h"
 #include "hashmap.h"
 #include "pkt-line.h"
+#include "remote.h"
 
 static int compare_bundles(const void *hashmap_cmp_fn_data,
 			   const struct hashmap_entry *he1,
@@ -49,6 +50,7 @@ void clear_bundle_list(struct bundle_list *list)
 
 	for_all_bundles_in_list(list, clear_remote_bundle_info, NULL);
 	hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent);
+	free(list->baseURI);
 }
 
 int for_all_bundles_in_list(struct bundle_list *list,
@@ -169,7 +171,7 @@ static int bundle_list_update(const char *key, const char *value,
 
 	if (!strcmp(dot, "uri")) {
 		free(bundle->uri);
-		bundle->uri = xstrdup(value);
+		bundle->uri = relative_url(list->baseURI, value, NULL);
 		return 0;
 	}
 
@@ -197,6 +199,8 @@ int parse_bundle_list_in_config_format(const char *uri,
 	};
 
 	list->mode = BUNDLE_MODE_NONE;
+	if (!list->baseURI)
+		list->baseURI = xstrdup(uri);
 	result = git_config_from_file_with_options(config_to_bundle_list,
 						   filename, list,
 						   &opts);
diff --git a/bundle-uri.h b/bundle-uri.h
index 9f6fc2d75f9..fdcaea048a8 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -65,6 +65,15 @@ struct bundle_list {
 	enum bundle_list_mode mode;
 	unsigned forFetch : 1;
 	struct hashmap bundles;
+
+	/**
+	 * The baseURI of a bundle_list is used as the base for any
+	 * relative URIs advertised by the bundle list at that location.
+	 *
+	 * When the list is generated from a Git server, then use that
+	 * server's location.
+	 */
+	char *baseURI;
 };
 
 void init_bundle_list(struct bundle_list *list);
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
index 23ce0eebca3..b535dc47134 100644
--- a/t/helper/test-bundle-uri.c
+++ b/t/helper/test-bundle-uri.c
@@ -35,6 +35,8 @@ static int cmd__bundle_uri_parse(int argc, const char **argv, enum input_mode mo
 
 	init_bundle_list(&list);
 
+	list.baseURI = xstrdup("<uri>");
+
 	switch (mode) {
 	case KEY_VALUE_PAIRS:
 		if (argc)
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index c2b7a8ce968..ea279c2b762 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -30,6 +30,30 @@ test_expect_success 'bundle_uri_parse_line() just URIs' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'bundle_uri_parse_line(): relative URIs' '
+	cat >in <<-\EOF &&
+	bundle.one.uri=bundle.bdl
+	bundle.two.uri=../bundle.bdl
+	bundle.three.uri=sub/dir/bundle.bdl
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle "list"]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = <uri>/bundle.bdl
+	[bundle "two"]
+		uri = bundle.bdl
+	[bundle "three"]
+		uri = <uri>/sub/dir/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-key-values <in >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
 test_expect_success 'bundle_uri_parse_line() parsing edge cases: empty key or value' '
 	cat >in <<-\EOF &&
 	=bogus-value
@@ -106,6 +130,36 @@ test_expect_success 'parse config format: just URIs' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: relative URIs' '
+	cat >in <<-\EOF &&
+	[bundle "list"]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = bundle.bdl
+	[bundle "two"]
+		uri = ../bundle.bdl
+	[bundle "three"]
+		uri = sub/dir/bundle.bdl
+	EOF
+
+	cat >expect <<-\EOF &&
+	[bundle "list"]
+		version = 1
+		mode = all
+	[bundle "one"]
+		uri = <uri>/bundle.bdl
+	[bundle "two"]
+		uri = bundle.bdl
+	[bundle "three"]
+		uri = <uri>/sub/dir/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config in >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
 test_expect_success 'parse config format edge cases: empty key or value' '
 	cat >in1 <<-\EOF &&
 	= bogus-value
diff --git a/transport.c b/transport.c
index f0114778ff4..52b99cd3298 100644
--- a/transport.c
+++ b/transport.c
@@ -1544,6 +1544,9 @@ int transport_get_remote_bundle_uri(struct transport *transport, int quiet)
 	    (git_config_get_bool("transfer.bundleuri", &value) || !value))
 		return 0;
 
+	if (!transport->bundles->baseURI)
+		transport->bundles->baseURI = xstrdup(transport->url);
+
 	if (!vtable->get_bundle_uri) {
 		if (quiet)
 			return -1;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 22/24] bundle-uri: download bundles from an advertised list
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (20 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 21/24] bundle-uri: allow relative URLs in bundle lists Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 23/24] clone: unbundle the advertised bundles Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 24/24] t5601: basic bundle URI tests Derrick Stolee via GitGitGadget
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The logic in fetch_bundle_uri() is useful for the --bundle-uri option of
'git clone', but is not helpful when the clone operation discovers a
list of URIs from the bundle-uri protocol v2 verb. To actually download
and unbundle the advertised bundles, we need a different mechanism.

Create the new fetch_bundle_list() method which is very similar to
fetch_bundle_uri() except that it relies on download_bundle_list()
instead of fetch_bundle_uri_internal(). The download_bundle_list()
method will recursively call fetch_bundle_uri_internal() if any of the
advertised URIs serve a bundle list instead of a bundle. This will also
follow the bundle.list.mode setting from the input list: "any" will
download only one such URI while "all" will download data from all of
the URIs.

In an identical way to the fetch_bundle_uri(), the bundles are unbundled
after all of the bundle lists have been expanded and all necessary URIs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c | 21 +++++++++++++++++++++
 bundle-uri.h | 11 +++++++++++
 2 files changed, 32 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 9e7dc0fb4ca..43815b3c5fd 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -561,6 +561,27 @@ cleanup:
 	return result;
 }
 
+int fetch_bundle_list(struct repository *r, const char *uri, struct bundle_list *list)
+{
+	int result;
+	struct bundle_list global_list;
+
+	init_bundle_list(&global_list);
+
+	/* If a bundle is added to this global list, then it is required. */
+	global_list.mode = BUNDLE_MODE_ALL;
+
+	if ((result = download_bundle_list(r, list, &global_list, 0)))
+		goto cleanup;
+
+	result = unbundle_all_bundles(r, &global_list);
+
+cleanup:
+	for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
+	clear_bundle_list(&global_list);
+	return result;
+}
+
 /**
  * API for serve.c.
  */
diff --git a/bundle-uri.h b/bundle-uri.h
index fdcaea048a8..1bcbd43320f 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -107,6 +107,17 @@ int parse_bundle_list_in_config_format(const char *uri,
  */
 int fetch_bundle_uri(struct repository *r, const char *uri);
 
+/**
+ * Given a bundle list that was already advertised (likely by the
+ * bundle-uri protocol v2 verb) at the given uri, fetch and unbundle the
+ * bundles according to the bundle strategy of that list.
+ *
+ * Returns non-zero if no bundle information is found at the given 'uri'.
+ */
+int fetch_bundle_list(struct repository *r,
+		      const char *uri,
+		      struct bundle_list *list);
+
 /**
  * API for serve.c.
  */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 23/24] clone: unbundle the advertised bundles
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (21 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 22/24] bundle-uri: download bundles from an advertised list Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  2022-05-20 18:40 ` [PATCH 24/24] t5601: basic bundle URI tests Derrick Stolee via GitGitGadget
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

A previous change introduced the transport methods to acquire a bundle
list from the 'bundle-uri' protocol v2 verb, when advertised _and_ when
the client has chosen to enable the feature.

Teach Git to download and unbundle the data advertised by those bundles
during 'git clone'.

Also, since the --bundle-uri option exists, we do not want to mix the
advertised bundles with the user-specified bundles.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/clone.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index ca2291552f7..cbe392cf60f 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1263,11 +1263,22 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (refs)
 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
 
-	/*
-	 * Populate transport->got_remote_bundle_uri and
-	 * transport->bundle_uri. We might get nothing.
-	 */
-	transport_get_remote_bundle_uri(transport, 1);
+	if (!bundle_uri) {
+		/*
+		* Populate transport->got_remote_bundle_uri and
+		* transport->bundle_uri. We might get nothing.
+		*/
+		transport_get_remote_bundle_uri(transport, 1);
+
+		if (transport->bundles) {
+			/* At this point, we need the_repository to match the cloned repo. */
+			repo_init(the_repository, git_dir, work_tree);
+			if (fetch_bundle_list(the_repository,
+					      remote->url[0],
+					      transport->bundles))
+				warning(_("failed to fetch advertised bundles"));
+		}
+	}
 
 	if (mapped_refs) {
 		int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 24/24] t5601: basic bundle URI tests
  2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
                   ` (22 preceding siblings ...)
  2022-05-20 18:40 ` [PATCH 23/24] clone: unbundle the advertised bundles Derrick Stolee via GitGitGadget
@ 2022-05-20 18:40 ` Derrick Stolee via GitGitGadget
  23 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-20 18:40 UTC (permalink / raw)
  To: git
  Cc: gitster, me, newren, Ævar Arnfjörð Bjarmason,
	Teng Long, Johannes Schindelin, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

This test demonstrates an end-to-end form of the bundle URI feature
given by an HTTP server advertising the 'features' capability with a
bundle URI that is a bundle file on that same HTTP server. We verify
that we unbundled a bundle, which could only have happened if we
successfully downloaded that file.

RFC-TODO: Create similar tests throughout the series that perform
similar tests, including examples with table of contents and partial
clones.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t5601-clone.sh | 59 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 4a61f2c901e..4ee9e231203 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -767,6 +767,65 @@ test_expect_success 'reject cloning shallow repository using HTTP' '
 	git clone --no-reject-shallow $HTTPD_URL/smart/repo.git repo
 '
 
+test_expect_success 'auto-discover bundle URI from HTTP clone' '
+	test_when_finished rm -rf trace.txt repo2 "$HTTPD_DOCUMENT_ROOT_PATH/repo2.git" &&
+	git -C src bundle create "$HTTPD_DOCUMENT_ROOT_PATH/everything.bundle" --all &&
+	git clone --bare --no-local src "$HTTPD_DOCUMENT_ROOT_PATH/repo2.git" &&
+
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo2.git" config \
+		uploadpack.advertiseBundleURIs true &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo2.git" config \
+		bundle.list.version 1 &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo2.git" config \
+		bundle.list.mode all &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo2.git" config \
+		bundle.everything.uri "$HTTPD_URL/everything.bundle" &&
+
+	GIT_TEST_BUNDLE_URI=1 \
+	GIT_TRACE2_EVENT="$(pwd)/trace.txt" \
+		git -c protocol.version=2 clone \
+		$HTTPD_URL/smart/repo2.git repo2 &&
+	cat >pattern <<-EOF &&
+	"event":"child_start".*"argv":\["git-remote-https","origin","$HTTPD_URL/everything.bundle"\]
+	EOF
+	grep -f pattern trace.txt
+'
+
+test_expect_success 'auto-discover multiple bundles from HTTP clone' '
+	test_when_finished rm -rf trace.txt repo3 "$HTTPD_DOCUMENT_ROOT_PATH/repo3.git" &&
+
+	test_commit -C src new &&
+	git -C src bundle create "$HTTPD_DOCUMENT_ROOT_PATH/new.bundle" HEAD~1..HEAD &&
+	git clone --bare --no-local src "$HTTPD_DOCUMENT_ROOT_PATH/repo3.git" &&
+
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo3.git" config \
+		uploadpack.advertiseBundleURIs true &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo3.git" config \
+		bundle.list.version 1 &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo3.git" config \
+		bundle.list.mode all &&
+
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo3.git" config \
+		bundle.everything.uri "$HTTPD_URL/everything.bundle" &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo3.git" config \
+		bundle.new.uri "$HTTPD_URL/new.bundle" &&
+
+	GIT_TEST_BUNDLE_URI=1 \
+	GIT_TRACE2_EVENT="$(pwd)/trace.txt" \
+		git -c protocol.version=2 clone \
+		$HTTPD_URL/smart/repo3.git repo3 &&
+
+	# We should fetch _both_ bundles
+	cat >pattern <<-EOF &&
+	"event":"child_start".*"argv":\["git-remote-https","origin","$HTTPD_URL/everything.bundle"\]
+	EOF
+	grep -f pattern trace.txt &&
+	cat >pattern <<-EOF &&
+	"event":"child_start".*"argv":\["git-remote-https","origin","$HTTPD_URL/new.bundle"\]
+	EOF
+	grep -f pattern trace.txt
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
 
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-05-20 18:42 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-20 18:40 [PATCH 00/24] [RFC] Bundle URIs Combined RFC Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 01/24] docs: document bundle URI standard Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 02/24] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 03/24] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 04/24] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 05/24] fetch: add --bundle-uri option Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 06/24] fetch: add 'refs/bundle/' to log.excludeDecoration Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 07/24] clone: add --bundle-uri option Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 08/24] clone: --bundle-uri cannot be combined with --depth Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 09/24] bundle-uri: create bundle_list struct and helpers Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 10/24] bundle-uri: create base key-value pair parsing Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 11/24] bundle-uri: create "key=value" line parsing Ævar Arnfjörð Bjarmason via GitGitGadget
2022-05-20 18:40 ` [PATCH 12/24] bundle-uri: unit test "key=value" parsing Ævar Arnfjörð Bjarmason via GitGitGadget
2022-05-20 18:40 ` [PATCH 13/24] bundle-uri: limit recursion depth for bundle lists Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 14/24] bundle-uri: parse bundle list in config format Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 15/24] bundle-uri: fetch a list of bundles Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 16/24] protocol v2: add server-side "bundle-uri" skeleton Ævar Arnfjörð Bjarmason via GitGitGadget
2022-05-20 18:40 ` [PATCH 17/24] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason via GitGitGadget
2022-05-20 18:40 ` [PATCH 18/24] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason via GitGitGadget
2022-05-20 18:40 ` [PATCH 19/24] bundle-uri: serve URI advertisement from bundle.* config Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 20/24] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason via GitGitGadget
2022-05-20 18:40 ` [PATCH 21/24] bundle-uri: allow relative URLs in bundle lists Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 22/24] bundle-uri: download bundles from an advertised list Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 23/24] clone: unbundle the advertised bundles Derrick Stolee via GitGitGadget
2022-05-20 18:40 ` [PATCH 24/24] t5601: basic bundle URI tests Derrick Stolee via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).