git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
@ 2021-08-05 15:07 Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 01/13] serve: add command to advertise bundle URIs Ævar Arnfjörð Bjarmason
                   ` (14 more replies)
  0 siblings, 15 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-05 15:07 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Ævar Arnfjörð Bjarmason

We're in the 2.33.0 rc cycle, and I'd hoped to have some more prep
work for this integrated already, but for now here's something
interesting I've been working on for early commentary/feedback.

This adds the the ability to protocol v2 for servers to optimistically
pre-seed supporting clients with one or more bundles via a new
"bundle-uri" protocol extension.

Right now only "clone" supports this, but it's a rather easy change on
top to add incremental "fetch" support as well.

The elevator pitch for this feature is (I guess it's a long elevator
ride..);

 * Allows for offloading most/all of a PACK "fetch" to "dumb" CDN's
   that *don't* have very close coordination with the server running
   "git-upload-pack" (unlike packfile-uri, more on that below).

   I.e. distributing an up-to-date-enough bundle via something like
   Debian's FTP mirror system, or a best-effort occasionally updated
   CDN.

   Should the bundle(s) be outdated, corrupt or whatever the client
   gracefully recovers by either ignoring the bad data, or catching up
   via negotiation with whatever data it did get.

   Server operators should be confident in using bundle URIs, even if
   the CDN they're pointing to is flaky, out of date, or even
   sometimes outright broken or unreachable. The client will recover
   in all those cases.

 * Makes performant git infrastructure more accessible, i.e. this
   feature helps the last with an up-to-date repack with up-to-date
   bitmaps when talking to a network-local git server, but a lot of
   users have more option for scaling or distributing things via dumb
   CDNs than a server that can run "git-upload-pack".

 * You can even bootstrap a clone of a remote server that doesn't
   support bundle-uri with a local or remote bundle with the
   "transfer.injectBundleURI" config.

 * Trivial path to resumable clones. Note that that's "resumable" in
   the sense that curl(1) will resume a partially downloaded bundle,
   we don't resume whatever state index-pack was in when the
   connection was broken.

   I have a POC of this working locally, it's just a matter of
   invoking curl(1) with "--continue-at -".

   The hindrance for resumably clones is now just the UI for git-clone
   (i.e. stashing the partial data somewhere), not protocol
   limitations.

This goes on top of the outstanding series I have for serve.c API
cleanup & fixes at
https://lore.kernel.org/git/cover-v4-00.10-00000000000-20210805T011823Z-avarab@gmail.com

I also needed to grab one patch from my "bundle unbundle progress"
series:
https://lore.kernel.org/git/cover-0.4-0000000000-20210727T004015Z-avarab@gmail.com/

Something like this approach had been suggested before in late 2011 by
Jeff King, see:
https://lore.kernel.org/git/20111110074330.GA27925@sigill.intra.peff.net/;
There's significant differences in the approach, mainly due to
protocol v2 not existing at the time. I wrote most of this before
finding/seeing Jeff's earlier patches.

For a demo of how this works head over to 12/13:
https://lore.kernel.org/git/RFC-patch-12.13-8dc5613e87-20210805T150534Z-avarab@gmail.com

In 13/13 there's a design doc discussing the approach, and major
differences with the existing packfile-uri mechanism:
https://lore.kernel.org/git/RFC-patch-13.13-1e657ed27a-20210805T150534Z-avarab@gmail.com

This can also be grabbed from the "avar/bundle-uri-client-clone"
branch of https://github.com/avar/git/

Ævar Arnfjörð Bjarmason (13):
  serve: add command to advertise bundle URIs
  bundle-uri client: add "bundle-uri" parsing + tests
  connect.c: refactor sending of agent & object-format
  bundle-uri client: add minimal NOOP client
  bundle-uri client: add "git ls-remote-bundle-uri"
  bundle-uri client: add transfer.injectBundleURI support
  bundle-uri client: add boolean transfer.bundleURI setting
  bundle.h: make "fd" version of read_bundle_header() public
  fetch-pack: add a deref_without_lazy_fetch_extended()
  fetch-pack: move --keep=* option filling to a function
  index-pack: add --progress-title option
  bundle-uri client: support for bundle-uri with "clone"
  bundle-uri docs: add design notes

 Documentation/config/transfer.txt          |  26 ++
 Documentation/git-index-pack.txt           |   6 +
 Documentation/git-ls-remote-bundle-uri.txt |  63 +++
 Documentation/git-ls-remote.txt            |   1 +
 Documentation/technical/bundle-uri.txt     | 119 ++++++
 Documentation/technical/protocol-v2.txt    | 145 +++++++
 Makefile                                   |   3 +
 builtin.h                                  |   1 +
 builtin/clone.c                            |   7 +
 builtin/index-pack.c                       |   6 +
 builtin/ls-remote-bundle-uri.c             |  90 +++++
 bundle-uri.c                               | 151 ++++++++
 bundle-uri.h                               |  30 ++
 bundle.c                                   |   8 +-
 bundle.h                                   |   2 +
 command-list.txt                           |   1 +
 connect.c                                  |  80 +++-
 fetch-pack.c                               | 304 ++++++++++++++-
 fetch-pack.h                               |   6 +
 git.c                                      |   1 +
 remote.h                                   |   4 +
 serve.c                                    |   6 +
 t/helper/test-bundle-uri.c                 |  80 ++++
 t/helper/test-tool.c                       |   1 +
 t/helper/test-tool.h                       |   1 +
 t/lib-t5730-protocol-v2-bundle-uri.sh      | 425 +++++++++++++++++++++
 t/t5701-git-serve.sh                       | 124 +++++-
 t/t5730-protocol-v2-bundle-uri-file.sh     |  36 ++
 t/t5731-protocol-v2-bundle-uri-git.sh      |  17 +
 t/t5732-protocol-v2-bundle-uri-http.sh     |  17 +
 t/t5750-bundle-uri-parse.sh                |  98 +++++
 transport-helper.c                         |  13 +
 transport-internal.h                       |   7 +
 transport.c                                | 120 ++++++
 transport.h                                |  22 ++
 35 files changed, 1988 insertions(+), 33 deletions(-)
 create mode 100644 Documentation/git-ls-remote-bundle-uri.txt
 create mode 100644 Documentation/technical/bundle-uri.txt
 create mode 100644 builtin/ls-remote-bundle-uri.c
 create mode 100644 bundle-uri.c
 create mode 100644 bundle-uri.h
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100644 t/lib-t5730-protocol-v2-bundle-uri.sh
 create mode 100755 t/t5730-protocol-v2-bundle-uri-file.sh
 create mode 100755 t/t5731-protocol-v2-bundle-uri-git.sh
 create mode 100755 t/t5732-protocol-v2-bundle-uri-http.sh
 create mode 100755 t/t5750-bundle-uri-parse.sh

-- 
2.33.0.rc0.646.g585563e77f


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH 01/13] serve: add command to advertise bundle URIs
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
@ 2021-08-05 15:07 ` Ævar Arnfjörð Bjarmason
  2021-08-10 13:58   ` Derrick Stolee
  2021-08-05 15:07 ` [RFC PATCH 02/13] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-05 15:07 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Ævar Arnfjörð Bjarmason

When the uploadpack.bundleURI config is set to a URI (or URIs, if set
>1 times), advertise a "bundle-uri" command, then when the client
requests "bundle-uri" emit those URIs back at them.

The client CAN then request those URIs out of bounds, and after
they've done (after either disconnecting & coming back, or leaving us
hanging), proceed with the rest of request flow. I.e. issuing a
"ls-refs" followed by a "fetch". The client MAY then send us "have"
lines with the tips they've unpacked from their newly acquired
bundle(s).

This commit doesn't implement any of that required client behavior,
only the trivial server behavior of spewing a list of URLs at the
client on request.

There is already a uploadpack.blobPackfileUri setting for the server,
so why is this needed? The Documentation/technical/bundle-uri.txt
added in a preceding commit discusses that in more detail, but in
summary:

 1. There is no "real" support for in git.git. The
    uploadpack.blobPackfileUri setting allows carving out a list of
    blobs (actually any OIDs), but as alluded to in bfc2a36ff2a (Doc:
    clarify contents of packfile sent as URI, 2021-01-20) the only
    "real" implementation is JGit based.

 2. The uploadpack.blobPackfileUri is a MUST where this is a
    "CAN". I.e. once a client says they support packfile-uri of given
    list of protocols the server will send them a PACK response
    assuming they've downloaded the URI they client was sent, if the
    client doesn't do that they don't have a valid repository.

    Pointing at a bundle and having the client send us "have"
    lines (or not, maybe they couldn't fetch it, or decided they
    didn't want to) is more flexible, and can gracefully recover
    e.g. if the CDN isn't reachable (maybe you do support "https", but
    the CDN provider is down, or blocked your whole country).

 3. Because of the disconnect in #2 "dumb" servers can seed
    pre-clients, e.g. we might point to a repo.bundle whose exact
    state we aren't sure of (a cronjob updates it, sometimes). The
    client will discover its contents, and give us the "have" lines,
    the "packfile-uri" method effectively requires the server to have
    those exact "have" lines (or rather, it will produce a similar
    PACK using give-or-take the same exclusions).

 4. This provides an easy way to the long sought after "resumable
    clones". I.e. since we can assume that it's in the server's
    interest to keep their bundle(s) as up-to-date as possible, most
    or all of the history we need to fetch will be in the bundle. If
    we fail midway through the "clone" we can offload the problem of
    resuming to wget/curl/rsync/whatever, instead of (as has been
    suggested, but not implemented for the "normal" dialog)
    "repairing" a partial PACK response or something.

There was a suggestion of implementing a similar feature long ago[1]
by Jeff King. The main difference between it and this approach is that
we've since gained protocol v2, so we can add this as an optional path
in the dialog between client and server. The 2011 implementation
hooked into the transport mechanism to try to clone from a bundle
directly. See also [2] and [3] for some later mentions of that
approach.

See also [4] for the series that implemented
uploadpack.blobPackfileUri, and [5] for a series on top that did the
.gitmodules check in that context. See [6] for the "ls-refs unborn"
feature which modified code in similar areas of the request flow.

1. https://lore.kernel.org/git/20111110074330.GA27925@sigill.intra.peff.net/
2. https://lore.kernel.org/git/20190514092900.GA11679@sigill.intra.peff.net/
3. https://lore.kernel.org/git/YFJWz5yIGng+a16k@coredump.intra.peff.net/
4. https://lore.kernel.org/git/cover.1591821067.git.jonathantanmy@google.com/
   Merged as 34e849b05a4 (Merge branch 'jt/cdn-offload', 2020-06-25)
5. https://lore.kernel.org/git/cover.1614021092.git.jonathantanmy@google.com/
   Merged as 6ee353d42f3 (Merge branch 'jt/transfer-fsck-across-packs',
   2021-03-01)
6. 69571dfe219 (Merge branch 'jt/clone-unborn-head', 2021-02-17)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/technical/protocol-v2.txt | 140 ++++++++++++++++++++++++
 Makefile                                |   1 +
 bundle-uri.c                            |  65 +++++++++++
 bundle-uri.h                            |  14 +++
 serve.c                                 |   6 +
 t/t5701-git-serve.sh                    | 124 ++++++++++++++++++++-
 6 files changed, 349 insertions(+), 1 deletion(-)
 create mode 100644 bundle-uri.c
 create mode 100644 bundle-uri.h

diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 213538f1d0..d10d5e9ef6 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -556,3 +556,143 @@ and associated requested information, each separated by a single space.
 	attr = "size"
 
 	obj-info = obj-id SP obj-size
+
+bundle-uri
+~~~~~~~~~~
+
+If the 'bundle-uri' capability is advertised, the server supports the
+`bundle-uri' command.
+
+The capability is currently advertised with no value (i.e. not
+"bundle-uri=somevalue"), a value may be added in the future for
+supporting command-wide extensions. Clients MUST ignore any unknown
+capability values and proceed with the 'bundle-uri` dialog they
+support.
+
+The 'bundle-uri' command is intended to be issued before `fetch` to
+get URIs to bundle files (see linkgit:git-bundle[1]) to "seed" and
+inform the subsequent `fetch` command.
+
+The client CAN issue `bundle-uri` before or after any other valid
+command. It's expected that it'll be issued after an `ls-refs` and
+before `fetch`.
+
+DISCUSSION of bundle-uri
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The intent of the feature is optimize for server resource consumption
+in the common case by changing the common case of fetching a very
+large PACK during linkgit:git-clone[1] into a smaller incremental
+fetch.
+
+It also allows servers to achieve better caching in combination with
+an `uploadpack.packObjectsHook` (see linkgit:git-config[1]).
+
+By having new clones or fetches be a more predictable and common
+negotiation against the tips of recently produces *.bundle file(s).
+Servers might even pre-generate the results of such negotiations for
+the `uploadpack.packObjectsHook` as new pushes come in.
+
+I.e. the server would anticipate that fresh clones will download a
+known bundle, followed by catching up to the current state of the
+repository using ref tips found in that bundle (or bundles).
+
+PROTOCOL for bundle-uri
+^^^^^^^^^^^^^^^^^^^^^^^
+
+A `bundle-uri` request takes no arguments, and as noted above does not
+currently advertise a capability value. Both may be added in the
+future.
+
+When the client issues a `command=bundle-uri` the response is a list
+of URIs the server would like the client to fetch out-of-bounds before
+proceeding with the `fetch` request in this format:
+
+	output = bundle-uri-line
+		 bundle-uri-line* flush-pkt
+
+	bundle-uri-line = PKT-LINE(bundle-uri)
+			  *(SP bundle-feature-key *(=bundle-feature-val))
+			  LF
+
+	bundle-uri = A URI such as a https://, ssh:// etc. URI
+
+	bundle-feature-key = Any printable ASCII characters except SP or "="
+	bundle-feature-val = Any printable ASCII characters except SP or "="
+
+No `bundle-feature-key`=`bundle-feature-value` fields are currently
+defined. See the discussion of features below.
+
+bundle-uri CLIENT AND SERVER EXPECTATIONS
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The advertised bundles MUST contain one or more reference tips for use
+by the client. Bundles that are not self-contained MUST use the
+standard "-" prefixes in the bundle format to indicate their
+prerequisites. I.e. they must be in the standard format "git bundle
+create" would create.
+
+If after an `ls-refs` the client finds that the ref tips it wants can
+be retrieved entirety from advertised bundle(s), it MAY
+disconnect. The results of such a "clone" or "fetch" should be
+indistinguishable from the state attained without using bundle-uri.
+
+The client MAY also keep the connection open pending download of the
+bundle-uris, e.g. should on or more downloads (or their validation)
+fail.
+
+The client MAY provide reference tips found in the bundle(s) to be
+downloaded out-of-bounds as `have` lines in the `fetch` request. They
+MAY also ignore the bundle(s) entirely (e.g. if they can't be
+downloaded) or some combination of the two.
+
+For the convenience of clients bundles SHOULD be provided in the order
+that they must be unpacked in if processed one-at-a-time by a dumber client.
+
+That usually means a "big bundle" first with most of the history
+that's self-contained, optionally followed by incremental updates on
+that "big bundle".
+
+This ordering is a mere convention and not a MUST, e.g. a repository
+with N branches with disconnected histories might have N "big
+bundles", each with their own self-contained history. A server might
+also only provide "incremental updates".
+
+A client MUST consider the content of the bundles themselves and their
+header as the ultimate source of truth. Servers MAY be tolerant of
+simpler clients by using the convention outlined above.
+
+As noted before a client MUST gracefully degrade on errors, whether
+that error is because of bad missing/data in the bundle URI(s), or
+because that client is too dumb to e.g. understand and fully parse out
+bundle headers and their prerequisite relationships.
+
+bundle-uri PROTOCOL FEATURES
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+As noted above no `bundle-feature-key`=`bundle-feature-value` fields
+are currently defined.
+
+They are intended for future per-URI metadata which older clients MUST
+ignore and gracefully degrade on. Any fields they do recognize they
+CAN also ignore.
+
+Any backwards-incompatible addition of pre-URI key-value will be
+guarded by a new value or values in 'bundle-uri' capability
+advertisement itself, and/or by new future `bundle-uri` request
+arguments.
+
+While no per-URI key-value are currently supported currently they're
+intended to support future features such as:
+
+ * Add a "hash=<val>" or "size=<bytes>" advertise the expected hash or
+   size of the bundle file.
+
+ * Advertise that one or more bundle files are the same (to e.g. have
+   clients round-robin or otherwise choose one of N possible files).
+
+ * A "tip=<OID>" shortcut. A client who'd have the advertised <OID>
+   would know there was no need to download the relevant bundle(s),
+   they've got that OID already (for multi-tips the client would need
+   to fetch the bundle, or do e.g. HTTP range requests to get its
+   header).
diff --git a/Makefile b/Makefile
index 9573190f1d..877c6c47b6 100644
--- a/Makefile
+++ b/Makefile
@@ -850,6 +850,7 @@ LIB_OBJS += blob.o
 LIB_OBJS += bloom.o
 LIB_OBJS += branch.o
 LIB_OBJS += bulk-checkin.o
+LIB_OBJS += bundle-uri.o
 LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
 LIB_OBJS += cbtree.o
diff --git a/bundle-uri.c b/bundle-uri.c
new file mode 100644
index 0000000000..2d93e8b003
--- /dev/null
+++ b/bundle-uri.c
@@ -0,0 +1,65 @@
+#include "cache.h"
+#include "bundle-uri.h"
+#include "pkt-line.h"
+#include "config.h"
+
+/**
+ * serve.[ch] API.
+ */
+
+/*
+ * "uploadpack.bundleURI" is advertised only if there's URIs to serve
+ * up per the config.
+ */
+static int advertise_bundle_uri = -1;
+
+static void send_bundle_uris(struct packet_writer *writer,
+			     struct string_list *uris)
+{
+	struct string_list_item *item;
+	for_each_string_list_item(item, uris) {
+		const char *uri = item->string;
+
+		packet_writer_write(writer, "%s", uri);
+	}
+}
+
+static struct string_list bundle_uris = STRING_LIST_INIT_DUP;
+
+static int bundle_uri_startup_config(const char *var, const char *value,
+				     void *data)
+{
+	if (!strcmp(var, "uploadpack.bundleuri")) {
+		advertise_bundle_uri = 1;
+		string_list_append(&bundle_uris, value);
+	}
+	return 0;
+}
+
+int bundle_uri_advertise(struct repository *r, struct strbuf *value)
+{
+	if (advertise_bundle_uri == -1) {
+		git_config(bundle_uri_startup_config, NULL);
+		if (advertise_bundle_uri == -1)
+			advertise_bundle_uri = 0;
+	}
+	return advertise_bundle_uri;
+}
+
+int bundle_uri_command(struct repository *r,
+		       struct packet_reader *request)
+{
+	struct packet_writer writer;
+	packet_writer_init(&writer, 1);
+
+	while (packet_reader_read(request) == PACKET_READ_NORMAL)
+		die("bundle-uri: unexpected argument: '%s'", request->line);
+	if (request->status != PACKET_READ_FLUSH)
+		die("bundle-uri: expected flush after arguments");
+
+	send_bundle_uris(&writer, &bundle_uris);
+
+	packet_writer_flush(&writer);
+
+	return 0;
+}
diff --git a/bundle-uri.h b/bundle-uri.h
new file mode 100644
index 0000000000..6a40efeb39
--- /dev/null
+++ b/bundle-uri.h
@@ -0,0 +1,14 @@
+#ifndef BUNDLE_URI_H
+#define BUNDLE_URI_H
+
+struct repository;
+struct packet_reader;
+struct packet_writer;
+
+/**
+ * serve.[ch] API.
+ */
+int bundle_uri_advertise(struct repository *r, struct strbuf *value);
+int bundle_uri_command(struct repository *r, struct packet_reader *request);
+
+#endif /* BUNDLE_URI_H */
diff --git a/serve.c b/serve.c
index 1817edc7f5..789bf5fc38 100644
--- a/serve.c
+++ b/serve.c
@@ -8,6 +8,7 @@
 #include "protocol-caps.h"
 #include "serve.h"
 #include "upload-pack.h"
+#include "bundle-uri.h"
 
 static int advertise_sid = -1;
 
@@ -104,6 +105,11 @@ static struct protocol_capability capabilities[] = {
 		.advertise = always_advertise,
 		.command = cap_object_info,
 	},
+	{
+		.name = "bundle-uri",
+		.advertise = bundle_uri_advertise,
+		.command = bundle_uri_command,
+	},
 };
 
 void protocol_v2_advertise_capabilities(void)
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 930721f053..21d5314d83 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -12,7 +12,7 @@ test_expect_success 'test capability advertisement' '
 	wrong_algo sha1:sha256
 	wrong_algo sha256:sha1
 	EOF
-	cat >expect <<-EOF &&
+	cat >expect.base <<-EOF &&
 	version 2
 	agent=git/$(git version | cut -d" " -f3)
 	ls-refs=unborn
@@ -20,8 +20,11 @@ test_expect_success 'test capability advertisement' '
 	server-option
 	object-format=$(test_oid algo)
 	object-info
+	EOF
+	cat >expect.trailer <<-EOF &&
 	0000
 	EOF
+	cat expect.base expect.trailer >expect &&
 
 	GIT_TEST_SIDEBAND_ALL=0 test-tool serve-v2 \
 		--advertise-capabilities >out &&
@@ -266,4 +269,123 @@ test_expect_success 'basics of object-info' '
 	test_cmp expect actual
 '
 
+# Test the basics of bundle-uri
+#
+test_expect_success 'test capability advertisement with uploadpack.bundleURI' '
+	test_config uploadpack.bundleURI FAKE &&
+
+	cat >expect.extra <<-EOF &&
+	bundle-uri
+	EOF
+	cat expect.base \
+	    expect.extra \
+	    expect.trailer >expect &&
+
+	GIT_TEST_SIDEBAND_ALL=0 test-tool serve-v2 \
+		--advertise-capabilities >out &&
+	test-tool pkt-line unpack <out >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'basics of bundle-uri: dies if not enabled' '
+	test-tool pkt-line pack >in <<-EOF &&
+	command=bundle-uri
+	0000
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	fatal: invalid command '"'"'bundle-uri'"'"'
+	EOF
+
+	cat >expect <<-\EOF &&
+	ERR serve: invalid command '"'"'bundle-uri'"'"'
+	EOF
+
+	test_must_fail test-tool serve-v2 --stateless-rpc <in >out 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out
+'
+
+
+test_expect_success 'basics of bundle-uri: enabled with single URI' '
+	test_config uploadpack.bundleURI https://cdn.example.com/repo.bdl &&
+
+	test-tool pkt-line pack >in <<-EOF &&
+	command=bundle-uri
+	object-format=$(test_oid algo)
+	0000
+	EOF
+
+	cat >expect <<-EOF &&
+	https://cdn.example.com/repo.bdl
+	0000
+	EOF
+
+	test-tool serve-v2 --stateless-rpc <in >out &&
+	test-tool pkt-line unpack <out >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'basics of bundle-uri: enabled with single URI' '
+	test_config uploadpack.bundleURI https://cdn.example.com/repo.bdl &&
+
+	test-tool pkt-line pack >in <<-EOF &&
+	command=bundle-uri
+	object-format=$(test_oid algo)
+	0000
+	EOF
+
+	cat >expect <<-EOF &&
+	https://cdn.example.com/repo.bdl
+	0000
+	EOF
+
+	test-tool serve-v2 --stateless-rpc <in >out &&
+	test-tool pkt-line unpack <out >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'basics of bundle-uri: enabled with two URIs' '
+	test_config uploadpack.bundleURI https://cdn.example.com/repo.bdl &&
+	test_config uploadpack.bundleURI https://cdn.example.com/recent.bdl --add &&
+
+	test-tool pkt-line pack >in <<-EOF &&
+	command=bundle-uri
+	object-format=$(test_oid algo)
+	0000
+	EOF
+
+	cat >expect <<-EOF &&
+	https://cdn.example.com/repo.bdl
+	https://cdn.example.com/recent.bdl
+	0000
+	EOF
+
+	test-tool serve-v2 --stateless-rpc <in >out &&
+	test-tool pkt-line unpack <out >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'basics of bundle-uri: unknown future feature(s)' '
+	test_config uploadpack.bundleURI https://cdn.example.com/fake.bdl &&
+
+	test-tool pkt-line pack >in <<-EOF &&
+	command=bundle-uri
+	object-format=$(test_oid algo)
+	0001
+	some-feature
+	we-do-not
+	know=about
+	0000
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	fatal: bundle-uri: unexpected argument: '"'"'some-feature'"'"'
+	EOF
+
+	test_must_fail test-tool serve-v2 --stateless-rpc <in >out 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out
+'
+
 test_done
-- 
2.33.0.rc0.646.g585563e77f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 02/13] bundle-uri client: add "bundle-uri" parsing + tests
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 01/13] serve: add command to advertise bundle URIs Ævar Arnfjörð Bjarmason
@ 2021-08-05 15:07 ` Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 03/13] connect.c: refactor sending of agent & object-format Ævar Arnfjörð Bjarmason
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-05 15:07 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Ævar Arnfjörð Bjarmason

Add a "test-tool bundle-uri parse" which parses the format defined in
the newly specified "bundle-uri" command.

As note in the "bundle-uri" section in protocol-v2.txt we haven't
specified any key-values yet, just URI lines, but we need to make sure
our client doesn't die if this optional data is provided by the
server. Let's add a bundle_uri_parse_line() to do that, in subsequent
commits the actual client code in {connect,transport}.c will make use
of it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile                    |  1 +
 bundle-uri.c                | 80 ++++++++++++++++++++++++++++++
 bundle-uri.h                | 16 ++++++
 t/helper/test-bundle-uri.c  | 80 ++++++++++++++++++++++++++++++
 t/helper/test-tool.c        |  1 +
 t/helper/test-tool.h        |  1 +
 t/t5750-bundle-uri-parse.sh | 98 +++++++++++++++++++++++++++++++++++++
 7 files changed, 277 insertions(+)
 create mode 100644 t/helper/test-bundle-uri.c
 create mode 100755 t/t5750-bundle-uri-parse.sh

diff --git a/Makefile b/Makefile
index 877c6c47b6..6d3612b962 100644
--- a/Makefile
+++ b/Makefile
@@ -699,6 +699,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 TEST_BUILTINS_OBJS += test-advise.o
 TEST_BUILTINS_OBJS += test-bitmap.o
 TEST_BUILTINS_OBJS += test-bloom.o
+TEST_BUILTINS_OBJS += test-bundle-uri.o
 TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-crontab.o
diff --git a/bundle-uri.c b/bundle-uri.c
index 2d93e8b003..d48bb78012 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -63,3 +63,83 @@ int bundle_uri_command(struct repository *r,
 
 	return 0;
 }
+
+/**
+ * General API for {transport,connect}.c etc.
+ */
+int bundle_uri_parse_line(struct string_list *bundle_uri, const char *line)
+{
+	int i;
+	struct string_list uri = STRING_LIST_INIT_DUP;
+	struct string_list_item *item = NULL;
+	int err = 0;
+
+	/*
+	 * Right now we don't understand anything beyond the first SP,
+	 * but let's be tolerant and ignore any future unknown
+	 * fields. See the "MUST" note about "bundle-feature-key" in
+	 * technical/protocol-v2.txt
+	 */
+	if (string_list_split(&uri, line, ' ', -1) < 1)
+		return error(_("bundle-uri line not in SP-delimited format: %s"), line);
+
+	for (i = 0; i < uri.nr; i++) {
+		struct string_list kv = STRING_LIST_INIT_DUP;
+		struct string_list_item *kv_item = NULL;
+		const char *arg = uri.items[i].string;
+		int fields;
+
+		/*
+		 * The "string" for each list item is the parsed URI
+		 * at the start of the line
+		 */
+		if (i == 0) {
+			item = string_list_append(bundle_uri, arg);
+			continue;
+		}
+
+		/*
+		 * Anything else on the line is keys or key-value
+		 * pairs separated by "=".
+		 *
+		 * Let's parse the format, even if we don't understand
+		 * any of the keys or values yet.
+		 */
+		assert(item);
+		arg = uri.items[i].string;
+		if (i == 1) {
+			item->util = xcalloc(1, sizeof(struct string_list));
+			string_list_init(item->util, 1);
+		}
+
+		fields = string_list_split(&kv, arg, '=', 2);
+		if (fields < 1 || fields > 2) {
+			err = error("expected `k` or `k=v` in column %d of bundle-uri line '%s', got '%s'",
+				     i, line, arg);
+			string_list_clear(&kv, 0);
+			continue;
+		}
+
+		kv_item = string_list_append(item->util, kv.items[0].string);
+		if (kv.nr == 2)
+			kv_item->util = xstrdup(kv.items[1].string);
+
+		string_list_clear(&kv, 0);
+	}
+	string_list_clear(&uri, 0);
+	return err;
+}
+
+static void bundle_uri_string_list_clear_cb(void *util, const char *string)
+{
+	struct string_list *fields = util;
+	if (!fields)
+		return;
+	string_list_clear(fields, 1);
+	free(fields);
+}
+
+void bundle_uri_string_list_clear(struct string_list *bundle_uri)
+{
+	string_list_clear_func(bundle_uri, bundle_uri_string_list_clear_cb);
+}
diff --git a/bundle-uri.h b/bundle-uri.h
index 6a40efeb39..2d271801b8 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -4,6 +4,7 @@
 struct repository;
 struct packet_reader;
 struct packet_writer;
+struct string_list;
 
 /**
  * serve.[ch] API.
@@ -11,4 +12,19 @@ struct packet_writer;
 int bundle_uri_advertise(struct repository *r, struct strbuf *value);
 int bundle_uri_command(struct repository *r, struct packet_reader *request);
 
+/**
+ * General API for {transport,connect}.c etc.
+ */
+
+/**
+ * bundle_uri_parse_line() returns 0 when a valid bundle-uri has been
+ * added to `bundle_uri`, <0 on error.
+ */
+int bundle_uri_parse_line(struct string_list *bundle_uri, const char *line);
+
+/**
+ * Clear the `bundle_uri` list. Just a very thin wrapper on
+ * string_list_clear().
+ */
+void bundle_uri_string_list_clear(struct string_list *bundle_uri);
 #endif /* BUNDLE_URI_H */
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
new file mode 100644
index 0000000000..014da1b6ab
--- /dev/null
+++ b/t/helper/test-bundle-uri.c
@@ -0,0 +1,80 @@
+#include "test-tool.h"
+#include "parse-options.h"
+#include "bundle-uri.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+static int cmd__bundle_uri_parse(int argc, const char **argv)
+{
+	const char *usage[] = {
+		"test-tool bundle-uri parse <in",
+		NULL
+	};
+	struct option options[] = {
+		OPT_END(),
+	};
+	struct strbuf sb = STRBUF_INIT;
+	struct string_list list = STRING_LIST_INIT_DUP;
+	int err = 0;
+	struct string_list_item *item;
+
+	argc = parse_options(argc, argv, NULL, options, usage, 0);
+	if (argc)
+		goto usage;
+
+	while (strbuf_getline(&sb, stdin) != EOF) {
+		if (bundle_uri_parse_line(&list, sb.buf) < 0)
+			err = error("bad line: %s", sb.buf);
+	}
+	for_each_string_list_item(item, &list) {
+		struct string_list_item *kv_item;
+		struct string_list *kv = item->util;
+
+		fprintf(stdout, "%s", item->string);
+		if (!kv) {
+			fprintf(stdout, "\n");
+			continue;
+		}
+		for_each_string_list_item(kv_item, kv) {
+			const char *k = kv_item->string;
+			const char *v = kv_item->util;
+
+			if (v)
+				fprintf(stdout, " [kv: %s => %s]", k, v);
+			else
+				fprintf(stdout, " [attr: %s]", k);
+		}
+		fprintf(stdout, "\n");
+	}
+	strbuf_release(&sb);
+
+	bundle_uri_string_list_clear(&list);
+
+	return err < 0 ? 1 : 0;
+usage:
+	usage_with_options(usage, options);
+}
+
+int cmd__bundle_uri(int argc, const char **argv)
+{
+	const char *usage[] = {
+		"test-tool bundle-uri <subcommand> [<options>]",
+		NULL
+	};
+	struct option options[] = {
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL, options, usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION |
+			     PARSE_OPT_KEEP_ARGV0);
+	if (argc == 1)
+		goto usage;
+
+	if (!strcmp(argv[1], "parse"))
+		return cmd__bundle_uri_parse(argc - 1, argv + 1);
+	error("there is no test-tool bundle-uri tool '%s'", argv[1]);
+
+usage:
+	usage_with_options(usage, options);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 3ce5585e53..b6e1ee7b25 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -17,6 +17,7 @@ static struct test_cmd cmds[] = {
 	{ "advise", cmd__advise_if_enabled },
 	{ "bitmap", cmd__bitmap },
 	{ "bloom", cmd__bloom },
+	{ "bundle-uri", cmd__bundle_uri },
 	{ "chmtime", cmd__chmtime },
 	{ "config", cmd__config },
 	{ "crontab", cmd__crontab },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 9f0f522850..ef839ac726 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -7,6 +7,7 @@
 int cmd__advise_if_enabled(int argc, const char **argv);
 int cmd__bitmap(int argc, const char **argv);
 int cmd__bloom(int argc, const char **argv);
+int cmd__bundle_uri(int argc, const char **argv);
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__crontab(int argc, const char **argv);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
new file mode 100755
index 0000000000..ef1b9fedea
--- /dev/null
+++ b/t/t5750-bundle-uri-parse.sh
@@ -0,0 +1,98 @@
+#!/bin/sh
+
+test_description="Test bundle-uri bundle_uri_parse_line()"
+
+TEST_NO_CREATE_REPO=1
+. ./test-lib.sh
+
+test_expect_success 'bundle_uri_parse_line() just URIs' '
+	cat >in <<-\EOF &&
+	http://example.com/bundle.bdl
+	https://example.com/bundle.bdl
+	file:///usr/share/git/bundle.bdl
+	EOF
+
+	# For the simple case
+	cp in expect &&
+
+	test-tool bundle-uri parse <in >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() with attributes' '
+	cat >in <<-\EOF &&
+	http://example.com/bundle1.bdl attr
+	http://example.com/bundle2.bdl ibute
+	EOF
+
+
+	cat >expect <<-\EOF &&
+	http://example.com/bundle1.bdl [attr: attr]
+	http://example.com/bundle2.bdl [attr: ibute]
+	EOF
+
+	test-tool bundle-uri parse <in >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() with attributes and key-value attributes' '
+	cat >in <<-\EOF &&
+	http://example.com/bundle1.bdl x a=b y c=d z e=f a=b
+	EOF
+
+
+	cat >expect <<-\EOF &&
+	http://example.com/bundle1.bdl [attr: x] [kv: a => b] [attr: y] [kv: c => d] [attr: z] [kv: e => f] [kv: a => b]
+	EOF
+
+	test-tool bundle-uri parse <in >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: extra SP' '
+	cat >in <<-\EOF &&
+	http://example.com/bundle1.bdl one-space
+	http://example.com/bundle1.bdl  two-space
+	http://example.com/bundle1.bdl   three-space
+	EOF
+
+	# We are anal just the one SP
+	cat >expect <<-\EOF &&
+	http://example.com/bundle1.bdl [attr: one-space]
+	http://example.com/bundle1.bdl [attr: ] [attr: two-space]
+	http://example.com/bundle1.bdl [attr: ] [attr: ] [attr: three-space]
+	EOF
+
+	test-tool bundle-uri parse <in >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp expect actual
+'
+
+test_expect_success 'bundle_uri_parse_line() parsing edge cases: multiple = in key-values' '
+	cat >in <<-\EOF &&
+	http://example.com/bundle1.bdl k=v=extra
+	http://example.com/bundle2.bdl a=b k=v=extra c=d
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	error: expected `k` or `k=v` in column 1 of bundle-uri line '"'"'http://example.com/bundle1.bdl k=v=extra'"'"', got '"'"'k=v=extra'"'"'
+	error: bad line: http://example.com/bundle1.bdl k=v=extra
+	error: expected `k` or `k=v` in column 2 of bundle-uri line '"'"'http://example.com/bundle2.bdl a=b k=v=extra c=d'"'"', got '"'"'k=v=extra'"'"'
+	error: bad line: http://example.com/bundle2.bdl a=b k=v=extra c=d
+	EOF
+
+	# We fail, but try to continue parsing regardless
+	cat >expect <<-\EOF &&
+	http://example.com/bundle1.bdl
+	http://example.com/bundle2.bdl [kv: a => b] [kv: c => d]
+	EOF
+
+	test_must_fail test-tool bundle-uri parse <in >actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.33.0.rc0.646.g585563e77f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 03/13] connect.c: refactor sending of agent & object-format
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 01/13] serve: add command to advertise bundle URIs Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 02/13] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
@ 2021-08-05 15:07 ` Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 04/13] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-05 15:07 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Ævar Arnfjörð Bjarmason

Refactor the sending of the "agent" and "object-format" capabilities
into a function.

This was added in its current form in ab67235bc4 (connect: parse v2
refs with correct hash algorithm, 2020-05-25). When we connect to a v2
server we need to know about its object-format, and it needs to know
about ours. Since most things in connect.c and transport.c piggy-back
on the eager getting of remote refs via the handshake() those commands
can make use of the just-sent-over object-format by ls-refs.

But I'm about to add a command that may come after ls-refs, and may
not, but we need the server to know about our user-agent and
object-format. So let's split this into a function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 connect.c | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/connect.c b/connect.c
index 70b13389ba..18dfeae2b8 100644
--- a/connect.c
+++ b/connect.c
@@ -471,6 +471,24 @@ void check_stateless_delimiter(int stateless_rpc,
 		die("%s", error);
 }
 
+static void send_capabilities(int fd_out, struct packet_reader *reader)
+{
+	const char *hash_name;
+
+	if (server_supports_v2("agent", 0))
+		packet_write_fmt(fd_out, "agent=%s", git_user_agent_sanitized());
+
+	if (server_feature_v2("object-format", &hash_name)) {
+		int hash_algo = hash_algo_by_name(hash_name);
+		if (hash_algo == GIT_HASH_UNKNOWN)
+			die(_("unknown object format '%s' specified by server"), hash_name);
+		reader->hash_algo = &hash_algos[hash_algo];
+		packet_write_fmt(fd_out, "object-format=%s", reader->hash_algo->name);
+	} else {
+		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
+	}
+}
+
 struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 			     struct ref **list, int for_push,
 			     struct transport_ls_refs_options *transport_options,
@@ -478,7 +496,6 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 			     int stateless_rpc)
 {
 	int i;
-	const char *hash_name;
 	struct strvec *ref_prefixes = transport_options ?
 		&transport_options->ref_prefixes : NULL;
 	char **unborn_head_target = transport_options ?
@@ -488,18 +505,8 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 	if (server_supports_v2("ls-refs", 1))
 		packet_write_fmt(fd_out, "command=ls-refs\n");
 
-	if (server_supports_v2("agent", 0))
-		packet_write_fmt(fd_out, "agent=%s", git_user_agent_sanitized());
-
-	if (server_feature_v2("object-format", &hash_name)) {
-		int hash_algo = hash_algo_by_name(hash_name);
-		if (hash_algo == GIT_HASH_UNKNOWN)
-			die(_("unknown object format '%s' specified by server"), hash_name);
-		reader->hash_algo = &hash_algos[hash_algo];
-		packet_write_fmt(fd_out, "object-format=%s", reader->hash_algo->name);
-	} else {
-		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
-	}
+	/* Send capabilities */
+	send_capabilities(fd_out, reader);
 
 	if (server_options && server_options->nr &&
 	    server_supports_v2("server-option", 1))
-- 
2.33.0.rc0.646.g585563e77f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 04/13] bundle-uri client: add minimal NOOP client
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2021-08-05 15:07 ` [RFC PATCH 03/13] connect.c: refactor sending of agent & object-format Ævar Arnfjörð Bjarmason
@ 2021-08-05 15:07 ` Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 05/13] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-05 15:07 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Ævar Arnfjörð Bjarmason

Set up all the needed client parts of the "bundle-uri" protocol
extension, without actually doing anything with the bundle URIs.

I.e. if the server says it supports "bundle-uri" we'll issue a
command=bundle-uri after command=ls-refs when we're cloning. We'll
parse the returned output using the code already tested for in
t5750-bundle-uri-parse.sh.

What we aren't doing is actually acting on that data, i.e. downloading
the bundle(s) before we get to doing the command=fetch, and adjusting
our negotiation dialog appropriately. I'll do that in subsequent
commits.

There's a question of what level of encapsulation we should use here,
I've opted to use connect.h in clone.c, but we could also e.g. make
transport_get_remote_refs() invoke this, i.e. make it implicitly get
the bundle-uri list for later steps.

This approach means that we don't "support" this in "git fetch" for
now. I'm starting with the case of initial clones, although as noted
in preceding commits to the protocol documentation nothing about this
approach precludes getting bundles on incremental fetches.

For the t5732-protocol-v2-bundle-uri-http.sh it's not easy to set
environment variables for git-upload-pack (it's started by Apache), so
let's skip the test under T5730_HTTP, and add unused T5730_{FILE,GIT}
prerequisites for consistency and future use.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/clone.c                        |   7 ++
 bundle-uri.c                           |  10 +-
 connect.c                              |  47 +++++++++
 remote.h                               |   4 +
 t/lib-t5730-protocol-v2-bundle-uri.sh  | 140 +++++++++++++++++++++++++
 t/t5730-protocol-v2-bundle-uri-file.sh |  36 +++++++
 t/t5731-protocol-v2-bundle-uri-git.sh  |  17 +++
 t/t5732-protocol-v2-bundle-uri-http.sh |  17 +++
 transport-helper.c                     |  13 +++
 transport-internal.h                   |   7 ++
 transport.c                            |  48 +++++++++
 transport.h                            |  18 ++++
 12 files changed, 362 insertions(+), 2 deletions(-)
 create mode 100644 t/lib-t5730-protocol-v2-bundle-uri.sh
 create mode 100755 t/t5730-protocol-v2-bundle-uri-file.sh
 create mode 100755 t/t5731-protocol-v2-bundle-uri-git.sh
 create mode 100755 t/t5732-protocol-v2-bundle-uri-http.sh

diff --git a/builtin/clone.c b/builtin/clone.c
index 66fe66679c..1e4e6be57d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -27,6 +27,7 @@
 #include "iterator.h"
 #include "sigchain.h"
 #include "branch.h"
+#include "connect.h"
 #include "remote.h"
 #include "run-command.h"
 #include "connected.h"
@@ -1292,6 +1293,12 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 	refs = transport_get_remote_refs(transport, &transport_ls_refs_options);
 
+	/*
+	 * Populate transport->got_remote_bundle_uri and
+	 * transport->bundle_uri. We might get nothing.
+	 */
+	transport_get_remote_bundle_uri(transport);
+
 	if (refs) {
 		int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
 
diff --git a/bundle-uri.c b/bundle-uri.c
index d48bb78012..56619bde22 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -38,11 +38,16 @@ static int bundle_uri_startup_config(const char *var, const char *value,
 
 int bundle_uri_advertise(struct repository *r, struct strbuf *value)
 {
+	if (value &&
+	    git_env_bool("GIT_TEST_BUNDLE_URI_UNKNOWN_CAPABILITY_VALUE", 0))
+		strbuf_addstr(value, "test-unknown-capability-value");
+
 	if (advertise_bundle_uri == -1) {
 		git_config(bundle_uri_startup_config, NULL);
 		if (advertise_bundle_uri == -1)
 			advertise_bundle_uri = 0;
 	}
+
 	return advertise_bundle_uri;
 }
 
@@ -53,9 +58,10 @@ int bundle_uri_command(struct repository *r,
 	packet_writer_init(&writer, 1);
 
 	while (packet_reader_read(request) == PACKET_READ_NORMAL)
-		die("bundle-uri: unexpected argument: '%s'", request->line);
+		die(_("bundle-uri: unexpected argument: '%s'"), request->line);
+
 	if (request->status != PACKET_READ_FLUSH)
-		die("bundle-uri: expected flush after arguments");
+		die(_("bundle-uri: expected flush after arguments"));
 
 	send_bundle_uris(&writer, &bundle_uris);
 
diff --git a/connect.c b/connect.c
index 18dfeae2b8..c7601ffd83 100644
--- a/connect.c
+++ b/connect.c
@@ -15,6 +15,7 @@
 #include "version.h"
 #include "protocol.h"
 #include "alias.h"
+#include "bundle-uri.h"
 
 static char *server_capabilities_v1;
 static struct strvec server_capabilities_v2 = STRVEC_INIT;
@@ -489,6 +490,52 @@ static void send_capabilities(int fd_out, struct packet_reader *reader)
 	}
 }
 
+int get_remote_bundle_uri(int fd_out, struct packet_reader *reader,
+			  struct string_list *bundle_uri, int stateless_rpc)
+{
+	int line_nr = 1;
+
+	/* Assert bundle-uri support */
+	server_supports_v2("bundle-uri", 1);
+
+	/* (Re-)send capabilities */
+	send_capabilities(fd_out, reader);
+
+	/* Send command */
+	packet_write_fmt(fd_out, "command=bundle-uri\n");
+	packet_delim(fd_out);
+
+	/* Send options */
+	if (git_env_bool("GIT_TEST_PROTOCOL_BAD_BUNDLE_URI", 0))
+		packet_write_fmt(fd_out, "test-bad-client\n");
+	packet_flush(fd_out);
+
+	/* Process response from server */
+	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
+		const char *line = reader->line;
+		line_nr++;
+
+		if (!bundle_uri_parse_line(bundle_uri, line))
+			continue;
+
+		return error(_("error on bundle-uri response line %d: %s"),
+			     line_nr, line);
+	}
+
+	if (reader->status != PACKET_READ_FLUSH)
+		return error(_("expected flush after bundle-uri listing"));
+
+	/*
+	 * Might die(), but obscure enough that that's OK, e.g. in
+	 * serve.c we'll call BUG() on its equivalent (the
+	 * PACKET_READ_RESPONSE_END check).
+	 */
+	check_stateless_delimiter(stateless_rpc, reader,
+				  _("expected response end packet after ref listing"));
+
+	return 0;
+}
+
 struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 			     struct ref **list, int for_push,
 			     struct transport_ls_refs_options *transport_options,
diff --git a/remote.h b/remote.h
index 5a59198252..a0823fe4e9 100644
--- a/remote.h
+++ b/remote.h
@@ -202,6 +202,10 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 			     const struct string_list *server_options,
 			     int stateless_rpc);
 
+/* Used for protocol v2 in order to retrieve refs from a remote */
+int get_remote_bundle_uri(int fd_out, struct packet_reader *reader,
+			  struct string_list *bundle_uri, int stateless_rpc);
+
 int resolve_remote_symref(struct ref *ref, struct ref *list);
 
 /*
diff --git a/t/lib-t5730-protocol-v2-bundle-uri.sh b/t/lib-t5730-protocol-v2-bundle-uri.sh
new file mode 100644
index 0000000000..6bd9d2dcfb
--- /dev/null
+++ b/t/lib-t5730-protocol-v2-bundle-uri.sh
@@ -0,0 +1,140 @@
+# Included from t573*-protocol-v2-bundle-uri-*.sh
+
+T5730_PARENT=
+T5730_URI=
+T5730_BUNDLE_URI=
+case "$T5730_PROTOCOL" in
+file)
+	T5730_PARENT=file_parent
+	T5730_URI="file://$PWD/file_parent"
+	T5730_BUNDLE_URI="$T5730_URI/fake.bdl"
+	test_set_prereq T5730_FILE
+	;;
+git)
+	. "$TEST_DIRECTORY"/lib-git-daemon.sh
+	start_git_daemon --export-all --enable=receive-pack
+	T5730_PARENT="$GIT_DAEMON_DOCUMENT_ROOT_PATH/parent"
+	T5730_URI="$GIT_DAEMON_URL/parent"
+	T5730_BUNDLE_URI="https://example.com/fake.bdl"
+	test_set_prereq T5730_GIT
+	;;
+http)
+	. "$TEST_DIRECTORY"/lib-httpd.sh
+	start_httpd
+	T5730_PARENT="$HTTPD_DOCUMENT_ROOT_PATH/http_parent"
+	T5730_URI="$HTTPD_URL/smart/http_parent"
+	test_set_prereq T5730_HTTP
+	;;
+*)
+	BUG "Need to pass valid T5730_PROTOCOL (was $T5730_PROTOCOL)"
+	;;
+esac
+
+test_expect_success "setup protocol v2 $T5730_PROTOCOL:// tests" '
+	git init "$T5730_PARENT" &&
+	test_commit -C "$T5730_PARENT" one
+'
+
+# Poor man's URI escaping. Good enough for the test suite whose trash
+# directory has a space in it. See 93c3fcbe4d4 (git-svn: attempt to
+# mimic SVN 1.7 URL canonicalization, 2012-07-28) for prior art.
+test_uri_escape() {
+	sed 's/ /%20/g'
+}
+
+case "$T5730_PROTOCOL" in
+http)
+	test_expect_success "setup config for $T5730_PROTOCOL:// tests" '
+		git -C "$T5730_PARENT" config http.receivepack true
+	'
+	;;
+*)
+	;;
+esac
+T5730_BUNDLE_URI_ESCAPED=$(echo "$T5730_BUNDLE_URI" | test_uri_escape)
+
+test_expect_success "connect with $T5730_PROTOCOL:// using protocol v2: no bundle-uri" '
+	test_when_finished "rm -f log" &&
+
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		ls-remote --symref "$T5730_URI" \
+		>actual 2>err &&
+
+	# Server responded using protocol v2
+	grep "< version 2" log &&
+
+	! grep bundle-uri log
+'
+
+test_expect_success "connect with $T5730_PROTOCOL:// using protocol v2: have bundle-uri" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" \
+		uploadpack.bundleURI "$T5730_BUNDLE_URI_ESCAPED" &&
+
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		ls-remote --symref "$T5730_URI" \
+		>actual 2>err &&
+
+	# Server responded using protocol v2
+	grep "< version 2" log &&
+
+	# Server advertised bundle-uri capability
+	grep bundle-uri log
+'
+
+test_expect_success !T5730_HTTP "bad client with $T5730_PROTOCOL:// using protocol v2" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI \
+		"$T5730_BUNDLE_URI_ESCAPED" &&
+
+	cat >err.expect <<-\EOF &&
+	Cloning into '"'"'child'"'"'...
+	EOF
+	case "$T5730_PROTOCOL" in
+	file)
+		cat >fatal-bundle-uri.expect <<-\EOF
+		fatal: bundle-uri: unexpected argument: '"'"'test-bad-client'"'"'
+		EOF
+		;;
+	*)
+		cat >fatal.expect <<-\EOF
+		fatal: read error: Connection reset by peer
+		EOF
+		;;
+	esac &&
+
+	test_when_finished "rm -rf child" &&
+	test_must_fail ok=sigpipe env \
+		GIT_TRACE_PACKET="$PWD/log" \
+		GIT_TEST_PROTOCOL_BAD_BUNDLE_URI=true \
+		git -c protocol.version=2 \
+		clone "$T5730_URI" child \
+		>out 2>err &&
+	test_must_be_empty out &&
+
+	grep -v -e ^fatal: -e ^error: err >err.actual &&
+	test_cmp err.expect err.actual &&
+
+	case "$T5730_PROTOCOL" in
+	file)
+		# Due to general race conditions with client/server replies we
+		# may or may not get "fatal: the remote end hung up
+		# expectedly" here
+		grep "^fatal: bundle-uri:" err >fatal-bundle-uri.actual &&
+		test_cmp fatal-bundle-uri.expect fatal-bundle-uri.actual
+		;;
+	*)
+		grep "^fatal:" err >fatal.actual &&
+		test_cmp fatal.expect fatal.actual
+		;;
+	esac &&
+
+	grep "clone> test-bad-client$" log >sent-bad-request &&
+	test_file_not_empty sent-bad-request
+'
diff --git a/t/t5730-protocol-v2-bundle-uri-file.sh b/t/t5730-protocol-v2-bundle-uri-file.sh
new file mode 100755
index 0000000000..89203d3a23
--- /dev/null
+++ b/t/t5730-protocol-v2-bundle-uri-file.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+
+test_description="Test bundle-uri with protocol v2 and 'file://' transport"
+
+TEST_NO_CREATE_REPO=1
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+# Test protocol v2 with 'file://' transport
+#
+T5730_PROTOCOL=file
+. "$TEST_DIRECTORY"/lib-t5730-protocol-v2-bundle-uri.sh
+
+test_expect_success "unknown capability value with $T5730_PROTOCOL:// using protocol v2" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" \
+		uploadpack.bundleURI "$T5730_BUNDLE_URI_ESCAPED" &&
+
+	GIT_TRACE_PACKET="$PWD/log" \
+	GIT_TEST_BUNDLE_URI_UNKNOWN_CAPABILITY_VALUE=true \
+	git \
+		-c protocol.version=2 \
+		ls-remote --symref "$T5730_URI" \
+		>actual 2>err &&
+
+	# Server responded using protocol v2
+	grep "< version 2" log &&
+
+	grep "> bundle-uri=test-unknown-capability-value" log
+'
+
+test_done
diff --git a/t/t5731-protocol-v2-bundle-uri-git.sh b/t/t5731-protocol-v2-bundle-uri-git.sh
new file mode 100755
index 0000000000..282847b311
--- /dev/null
+++ b/t/t5731-protocol-v2-bundle-uri-git.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+
+test_description="Test bundle-uri with protocol v2 and 'git://' transport"
+
+TEST_NO_CREATE_REPO=1
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+# Test protocol v2 with 'git://' transport
+#
+T5730_PROTOCOL=git
+. "$TEST_DIRECTORY"/lib-t5730-protocol-v2-bundle-uri.sh
+
+test_done
diff --git a/t/t5732-protocol-v2-bundle-uri-http.sh b/t/t5732-protocol-v2-bundle-uri-http.sh
new file mode 100755
index 0000000000..fcc1cf3fae
--- /dev/null
+++ b/t/t5732-protocol-v2-bundle-uri-http.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+
+test_description="Test bundle-uri with protocol v2 and 'git://' transport"
+
+TEST_NO_CREATE_REPO=1
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+# Test protocol v2 with 'git://' transport
+#
+T5730_PROTOCOL=http
+. "$TEST_DIRECTORY"/lib-t5730-protocol-v2-bundle-uri.sh
+
+test_done
diff --git a/transport-helper.c b/transport-helper.c
index e8dbdd1153..aff27bcf78 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -1260,9 +1260,22 @@ static struct ref *get_refs_list_using_list(struct transport *transport,
 	return ret;
 }
 
+static int get_bundle_uri(struct transport *transport)
+{
+	get_helper(transport);
+
+	if (process_connect(transport, 0)) {
+		do_take_over(transport);
+		return transport->vtable->get_bundle_uri(transport);
+	}
+
+	return -1;
+}
+
 static struct transport_vtable vtable = {
 	.set_option	= set_helper_option,
 	.get_refs_list	= get_refs_list,
+	.get_bundle_uri = get_bundle_uri,
 	.fetch_refs	= fetch_refs,
 	.push_refs	= push_refs,
 	.connect	= connect_helper,
diff --git a/transport-internal.h b/transport-internal.h
index c4ca0b733a..90ea749e5c 100644
--- a/transport-internal.h
+++ b/transport-internal.h
@@ -26,6 +26,13 @@ struct transport_vtable {
 	struct ref *(*get_refs_list)(struct transport *transport, int for_push,
 				     struct transport_ls_refs_options *transport_options);
 
+	/**
+	 * Populates the remote side's bundle-uri under protocol v2,
+	 * if the "bundle-uri" capability was advertised. Returns 0 if
+	 * OK, negative values on error.
+	 */
+	int (*get_bundle_uri)(struct transport *transport);
+
 	/**
 	 * Fetch the objects for the given refs. Note that this gets
 	 * an array, and should ignore the list structure.
diff --git a/transport.c b/transport.c
index f9400b9b0b..444cf74756 100644
--- a/transport.c
+++ b/transport.c
@@ -22,6 +22,7 @@
 #include "protocol.h"
 #include "object-store.h"
 #include "color.h"
+#include "bundle-uri.h"
 
 static int transport_use_color = -1;
 static char transport_colors[][COLOR_MAXLEN] = {
@@ -345,6 +346,21 @@ static struct ref *get_refs_via_connect(struct transport *transport, int for_pus
 	return handshake(transport, for_push, options, 1);
 }
 
+static int get_bundle_uri(struct transport *transport)
+{
+	struct git_transport_data *data = transport->data;
+	struct packet_reader reader;
+	int stateless_rpc = transport->stateless_rpc;
+	string_list_init(&transport->bundle_uri, 1);
+
+	packet_reader_init(&reader, data->fd[0], NULL, 0,
+			   PACKET_READ_CHOMP_NEWLINE |
+			   PACKET_READ_GENTLE_ON_EOF);
+
+	return get_remote_bundle_uri(data->fd[1], &reader,
+				     &transport->bundle_uri, stateless_rpc);
+}
+
 static int fetch_refs_via_pack(struct transport *transport,
 			       int nr_heads, struct ref **to_fetch)
 {
@@ -884,6 +900,7 @@ static int disconnect_git(struct transport *transport)
 
 static struct transport_vtable taken_over_vtable = {
 	.get_refs_list	= get_refs_via_connect,
+	.get_bundle_uri = get_bundle_uri,
 	.fetch_refs	= fetch_refs_via_pack,
 	.push_refs	= git_transport_push,
 	.disconnect	= disconnect_git
@@ -1037,6 +1054,7 @@ static struct transport_vtable bundle_vtable = {
 
 static struct transport_vtable builtin_smart_vtable = {
 	.get_refs_list	= get_refs_via_connect,
+	.get_bundle_uri = get_bundle_uri,
 	.fetch_refs	= fetch_refs_via_pack,
 	.push_refs	= git_transport_push,
 	.connect	= connect_git,
@@ -1050,6 +1068,7 @@ struct transport *transport_get(struct remote *remote, const char *url)
 
 	ret->progress = isatty(2);
 	string_list_init_dup(&ret->pack_lockfiles);
+	string_list_init_dup(&ret->bundle_uri);
 
 	if (!remote)
 		BUG("No remote provided to transport_get()");
@@ -1453,6 +1472,34 @@ int transport_fetch_refs(struct transport *transport, struct ref *refs)
 	return rc;
 }
 
+int transport_get_remote_bundle_uri(struct transport *transport)
+{
+	const struct transport_vtable *vtable = transport->vtable;
+
+	/* Lazily configured */
+	if (transport->got_remote_bundle_uri++)
+		return 0;
+
+	/*
+	 * "Support" protocol v0 and v2 without bundle-uri support by
+	 * silently degrading to a NOOP.
+	 */
+	if (!server_supports_v2("bundle-uri", 0))
+		return 0;
+
+	/*
+	 * This is intentionally below the transport.injectBundleURI,
+	 * we want to be able to inject into protocol v0, or into the
+	 * dialog of a server who doesn't support this.
+	 */
+	if (!vtable->get_bundle_uri)
+		return error(_("bundle-uri operation not supported by protocol"));
+
+	if (vtable->get_bundle_uri(transport) < 0)
+		return error(_("could not retrieve server-advertised bundle-uri list"));
+	return 0;
+}
+
 void transport_unlock_pack(struct transport *transport)
 {
 	int i;
@@ -1478,6 +1525,7 @@ int transport_disconnect(struct transport *transport)
 		ret = transport->vtable->disconnect(transport);
 	if (transport->got_remote_refs)
 		free_refs((void *)transport->remote_refs);
+	bundle_uri_string_list_clear(&transport->bundle_uri);
 	free(transport);
 	return ret;
 }
diff --git a/transport.h b/transport.h
index 1cbab11373..fb9f89f956 100644
--- a/transport.h
+++ b/transport.h
@@ -75,6 +75,18 @@ struct transport {
 	 */
 	unsigned got_remote_refs : 1;
 
+	/**
+	 * Indicates whether we already called get_bundle_uri_list(); set by
+	 * transport.c::transport_get_remote_bundle_uri().
+	 */
+	unsigned got_remote_bundle_uri : 1;
+
+	/*
+	 * The results of "command=bundle-uri", if both sides support
+	 * the "bundle-uri" capability.
+	 */
+	struct string_list bundle_uri;
+
 	/*
 	 * Transports that call take-over destroys the data specific to
 	 * the transport type while doing so, and cannot be reused.
@@ -270,6 +282,12 @@ struct transport_ls_refs_options {
 const struct ref *transport_get_remote_refs(struct transport *transport,
 					    struct transport_ls_refs_options *transport_options);
 
+/**
+ * Retrieve bundle URI(s) from a remote. Populates "struct
+ * transport"'s "bundle_uri" and "got_remote_bundle_uri".
+ */
+int transport_get_remote_bundle_uri(struct transport *transport);
+
 /*
  * Fetch the hash algorithm used by a remote.
  *
-- 
2.33.0.rc0.646.g585563e77f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 05/13] bundle-uri client: add "git ls-remote-bundle-uri"
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
                   ` (3 preceding siblings ...)
  2021-08-05 15:07 ` [RFC PATCH 04/13] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason
@ 2021-08-05 15:07 ` Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 06/13] bundle-uri client: add transfer.injectBundleURI support Ævar Arnfjörð Bjarmason
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-05 15:07 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Ævar Arnfjörð Bjarmason

Add a git-ls-remote-bundle-uri command, this is a thin wrapper for
issuing protocol v2 "bundle-uri" commands to a server, and to the
parsing routines in bundle-uri.c.

Since in the "git clone" case we'll have already done the handshake(),
but not here, introduce a "got_advertisement" state along with
"got_remote_heads". It seems to me that the "got_remote_heads" is
badly named in the first place, and the whole logic of eagerly getting
ls-refs on handshake() or not could be refactored somewhat, but let's
not do that now, and instead just add another self-documenting state
variable.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-ls-remote-bundle-uri.txt |  63 ++++++++++
 Documentation/git-ls-remote.txt            |   1 +
 Makefile                                   |   1 +
 builtin.h                                  |   1 +
 builtin/clone.c                            |   2 +-
 builtin/ls-remote-bundle-uri.c             |  90 ++++++++++++++
 command-list.txt                           |   1 +
 git.c                                      |   1 +
 t/lib-t5730-protocol-v2-bundle-uri.sh      | 132 +++++++++++++++++++++
 transport.c                                |  43 +++++--
 transport.h                                |   6 +-
 11 files changed, 330 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/git-ls-remote-bundle-uri.txt
 create mode 100644 builtin/ls-remote-bundle-uri.c

diff --git a/Documentation/git-ls-remote-bundle-uri.txt b/Documentation/git-ls-remote-bundle-uri.txt
new file mode 100644
index 0000000000..7d3d16a36f
--- /dev/null
+++ b/Documentation/git-ls-remote-bundle-uri.txt
@@ -0,0 +1,63 @@
+git-ls-remote-bundle-uri(1)
+===========================
+
+NAME
+----
+git-ls-remote-bundle-uri - List 'bundle-uri' in a remote repository
+
+
+SYNOPSIS
+--------
+[verse]
+'git ls-remote-bundle-uri [--quiet] [--uri] [--upload-pack=<exec>]
+	[[-o | --server-option=]<option>] <repository>
+
+
+DESCRIPTION
+-----------
+
+Displays the `bundle-uri`s advertised by a remote repository. See
+`bundle-uri` in link:technical/protocol-v2.html[the Git Wire Protocol,
+Version 2] documentation for what the output format looks like.
+
+OPTIONS
+-------
+
+-q::
+--quiet::
+	Do not print remote URL to stderr in cases where the remote
+	name is inferred from config.
++
+When the remote name is not inferred (e.g. `git ls-remote-bundle-uri
+origin`, or `git ls-remote-bundle-uri https://[...]`) the remote URL
+is not printed in any case.
+
+--uri::
+	Print only the URIs, and not any of their optional attributes.
+
+--upload-pack=<exec>::
+	Specify the full path of 'git-upload-pack' on the remote
+	host. This allows listing references from repositories accessed via
+	SSH and where the SSH daemon does not use the PATH configured by the
+	user.
+
+-o <option>::
+--server-option=<option>::
+	Transmit the given string to the server when communicating using
+	protocol version 2.  The given string must not contain a NUL or LF
+	character.
+	When multiple `--server-option=<option>` are given, they are all
+	sent to the other side in the order listed on the command line.
+
+<repository>::
+	The "remote" repository to query.  This parameter can be
+	either a URL or the name of a remote (see the GIT URLS and
+	REMOTES sections of linkgit:git-fetch[1]).
+
+SEE ALSO
+--------
+linkgit:git-ls-remote[1].
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Documentation/git-ls-remote.txt b/Documentation/git-ls-remote.txt
index 492e573856..86c07eff83 100644
--- a/Documentation/git-ls-remote.txt
+++ b/Documentation/git-ls-remote.txt
@@ -114,6 +114,7 @@ c5db5456ae3b0873fc659c19fafdde22313cc441	refs/tags/v0.99.2
 
 SEE ALSO
 --------
+linkgit:git-ls-remote-bundle-uri[1].
 linkgit:git-check-ref-format[1].
 
 GIT
diff --git a/Makefile b/Makefile
index 6d3612b962..04ccb5af3a 100644
--- a/Makefile
+++ b/Makefile
@@ -1117,6 +1117,7 @@ BUILTIN_OBJS += builtin/init-db.o
 BUILTIN_OBJS += builtin/interpret-trailers.o
 BUILTIN_OBJS += builtin/log.o
 BUILTIN_OBJS += builtin/ls-files.o
+BUILTIN_OBJS += builtin/ls-remote-bundle-uri.o
 BUILTIN_OBJS += builtin/ls-remote.o
 BUILTIN_OBJS += builtin/ls-tree.o
 BUILTIN_OBJS += builtin/mailinfo.o
diff --git a/builtin.h b/builtin.h
index 16ecd5586f..9fd0529551 100644
--- a/builtin.h
+++ b/builtin.h
@@ -171,6 +171,7 @@ int cmd_log(int argc, const char **argv, const char *prefix);
 int cmd_log_reflog(int argc, const char **argv, const char *prefix);
 int cmd_ls_files(int argc, const char **argv, const char *prefix);
 int cmd_ls_tree(int argc, const char **argv, const char *prefix);
+int cmd_ls_remote_bundle_uri(int argc, const char **argv, const char *prefix);
 int cmd_ls_remote(int argc, const char **argv, const char *prefix);
 int cmd_mailinfo(int argc, const char **argv, const char *prefix);
 int cmd_mailsplit(int argc, const char **argv, const char *prefix);
diff --git a/builtin/clone.c b/builtin/clone.c
index 1e4e6be57d..80ef939904 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1297,7 +1297,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	 * Populate transport->got_remote_bundle_uri and
 	 * transport->bundle_uri. We might get nothing.
 	 */
-	transport_get_remote_bundle_uri(transport);
+	transport_get_remote_bundle_uri(transport, 1);
 
 	if (refs) {
 		int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
diff --git a/builtin/ls-remote-bundle-uri.c b/builtin/ls-remote-bundle-uri.c
new file mode 100644
index 0000000000..4e0950d581
--- /dev/null
+++ b/builtin/ls-remote-bundle-uri.c
@@ -0,0 +1,90 @@
+#include "builtin.h"
+#include "cache.h"
+#include "transport.h"
+#include "ref-filter.h"
+#include "remote.h"
+#include "refs.h"
+
+static const char * const ls_remote_bundle_uri_usage[] = {
+	N_("git ls-remote-bundle-uri <repository>"),
+	NULL
+};
+
+int cmd_ls_remote_bundle_uri(int argc, const char **argv, const char *prefix)
+{
+	const char *dest = NULL;
+	int quiet = 0;
+	int uri = 0;
+	const char *uploadpack = NULL;
+	struct string_list server_options = STRING_LIST_INIT_DUP;
+	struct remote *remote;
+	struct transport *transport;
+	int status = 0;
+	struct option options[] = {
+		OPT__QUIET(&quiet, N_("do not print remote URL")),
+		OPT_BOOL(0, "uri", &uri, N_("limit to showing uri field")),
+		OPT_STRING(0, "upload-pack", &uploadpack, N_("exec"),
+			   N_("path of git-upload-pack on the remote host")),
+		OPT_STRING_LIST('o', "server-option", &server_options,
+				N_("server-specific"),
+				N_("option to transmit")),
+		OPT_END()
+	};
+	struct string_list_item *item;
+
+	argc = parse_options(argc, argv, prefix, options, ls_remote_bundle_uri_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+	dest = argv[0];
+
+	packet_trace_identity("ls-remote-bundle-uri");
+
+	remote = remote_get(dest);
+	if (!remote) {
+		if (dest)
+			die("bad repository '%s'", dest);
+		die("No remote configured to get bundle URIs from.");
+	}
+	if (!remote->url_nr)
+		die("remote %s has no configured URL", dest);
+
+	transport = transport_get(remote, NULL);
+	if (uploadpack)
+		transport_set_option(transport, TRANS_OPT_UPLOADPACK, uploadpack);
+	if (server_options.nr)
+		transport->server_options = &server_options;
+
+	if (!dest && !quiet)
+		fprintf(stderr, "From %s\n", *remote->url);
+
+	if (transport_get_remote_bundle_uri(transport, 0) < 0) {
+		error(_("could not get the bundle-uri list"));
+		status = 1;
+		goto cleanup;
+	}
+
+	for_each_string_list_item(item, &transport->bundle_uri) {
+		struct string_list_item *kv_item;
+		struct string_list *kv = item->util;
+
+		fprintf(stdout, "%s", item->string);
+		if (uri || !kv) {
+			fprintf(stdout, "\n");
+			continue;
+		}
+		for_each_string_list_item(kv_item, kv) {
+			const char *k = kv_item->string;
+			const char *v = kv_item->util;
+
+			if (v)
+				fprintf(stdout, " %s=%s", k, v);
+			else
+				fprintf(stdout, " %s", k);
+		}
+		fprintf(stdout, "\n");
+	}
+
+cleanup:
+	if (transport_disconnect(transport))
+		return 1;
+	return status;
+}
diff --git a/command-list.txt b/command-list.txt
index a289f09ed6..fa5d9596f1 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -115,6 +115,7 @@ gitk                                    mainporcelain
 git-log                                 mainporcelain           info
 git-ls-files                            plumbinginterrogators
 git-ls-remote                           plumbinginterrogators
+git-ls-remote-bundle-uri                plumbinginterrogators
 git-ls-tree                             plumbinginterrogators
 git-mailinfo                            purehelpers
 git-mailsplit                           purehelpers
diff --git a/git.c b/git.c
index 18bed9a996..b902732936 100644
--- a/git.c
+++ b/git.c
@@ -545,6 +545,7 @@ static struct cmd_struct commands[] = {
 	{ "log", cmd_log, RUN_SETUP },
 	{ "ls-files", cmd_ls_files, RUN_SETUP },
 	{ "ls-remote", cmd_ls_remote, RUN_SETUP_GENTLY },
+	{ "ls-remote-bundle-uri", cmd_ls_remote_bundle_uri, RUN_SETUP_GENTLY },
 	{ "ls-tree", cmd_ls_tree, RUN_SETUP },
 	{ "mailinfo", cmd_mailinfo, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "mailsplit", cmd_mailsplit, NO_PARSEOPT },
diff --git a/t/lib-t5730-protocol-v2-bundle-uri.sh b/t/lib-t5730-protocol-v2-bundle-uri.sh
index 6bd9d2dcfb..fccf7ccaa2 100644
--- a/t/lib-t5730-protocol-v2-bundle-uri.sh
+++ b/t/lib-t5730-protocol-v2-bundle-uri.sh
@@ -138,3 +138,135 @@ test_expect_success !T5730_HTTP "bad client with $T5730_PROTOCOL:// using protoc
 	grep "clone> test-bad-client$" log >sent-bad-request &&
 	test_file_not_empty sent-bad-request
 '
+
+test_expect_success "ls-remote-bundle-uri with $T5730_PROTOCOL:// using protocol v2" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI \
+		"$T5730_BUNDLE_URI_ESCAPED" &&
+
+	# All data about bundle URIs
+	cat >expect <<-EOF &&
+	$T5730_BUNDLE_URI_ESCAPED
+	EOF
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		ls-remote-bundle-uri \
+		"$T5730_URI" \
+		>actual &&
+	test_cmp expect actual &&
+
+	# Only the URIs
+	git \
+		-c protocol.version=2 \
+		ls-remote-bundle-uri --uri \
+		"$T5730_URI" \
+		>actual2 &&
+	test_cmp actual actual2
+'
+
+test_expect_success "ls-remote-bundle-uri with $T5730_PROTOCOL:// using protocol v2" '
+	test_when_finished "rm -f log" &&
+
+	ATTR="foo bar=baz" &&
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI \
+		"$T5730_BUNDLE_URI_ESCAPED $ATTR" &&
+
+	# All data about bundle URIs
+	cat >expect <<-EOF &&
+	$T5730_BUNDLE_URI_ESCAPED $ATTR
+	EOF
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		ls-remote-bundle-uri \
+		"$T5730_URI" \
+		>actual &&
+	test_cmp expect actual
+'
+
+test_expect_success "ls-remote-bundle-uri with $T5730_PROTOCOL:// using protocol v2: --uri" '
+	test_when_finished "rm -f log" &&
+
+	ATTR="foo bar=baz" &&
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI \
+		"$T5730_BUNDLE_URI_ESCAPED $ATTR" &&
+
+	# All data about bundle URIs
+	cat >expect <<-EOF &&
+	$T5730_BUNDLE_URI_ESCAPED
+	EOF
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		ls-remote-bundle-uri \
+		--uri \
+		"$T5730_URI" \
+		>actual &&
+	test_cmp expect actual
+'
+
+test_expect_success "ls-remote-bundle-uri --[no-]quiet with $T5730_PROTOCOL:// using protocol v2" '
+	test_when_finished "rm -f log" &&
+
+	cat >err.expect <<-\EOF &&
+	Cloning into '"'"'child'"'"'...
+	EOF
+
+	test_when_finished "rm -rf child" &&
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		 clone "$T5730_URI" child \
+		 >out 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out &&
+
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI \
+		"$T5730_BUNDLE_URI_ESCAPED" &&
+
+	# Without --[no-]quiet
+	cat >out.expect <<-EOF &&
+	$T5730_BUNDLE_URI_ESCAPED
+	EOF
+	cat >err.expect <<-EOF &&
+	From $T5730_URI
+	EOF
+	git \
+		-C child \
+		 -c protocol.version=2 \
+		ls-remote-bundle-uri \
+		>out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_cmp out.expect out.actual &&
+
+	# --no-quiet is the default
+	git \
+		-C child \
+		-c protocol.version=2 \
+		ls-remote-bundle-uri \
+		--no-quiet \
+		>out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_cmp out.expect out.actual &&
+
+	# --quiet quiets the "From" line
+	git \
+		-C child \
+		-c protocol.version=2 \
+		ls-remote-bundle-uri \
+		--quiet \
+		>out.actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp out.expect out.actual &&
+
+	# --quiet is implicit if the remote is not implicit
+	git \
+		-c protocol.version=2 \
+		ls-remote-bundle-uri \
+		"$T5730_URI" \
+		>out.actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp out.expect out.actual
+'
diff --git a/transport.c b/transport.c
index 444cf74756..286c96269d 100644
--- a/transport.c
+++ b/transport.c
@@ -187,6 +187,7 @@ struct git_transport_data {
 	struct git_transport_options options;
 	struct child_process *conn;
 	int fd[2];
+	unsigned got_advertisement : 1;
 	unsigned got_remote_heads : 1;
 	enum protocol_version version;
 	struct oid_array extra_have;
@@ -332,6 +333,7 @@ static struct ref *handshake(struct transport *transport, int for_push,
 		BUG("unknown protocol version");
 	}
 	data->got_remote_heads = 1;
+	data->got_advertisement = 1;
 	transport->hash_algo = reader.hash_algo;
 
 	if (reader.line_peeked)
@@ -353,6 +355,33 @@ static int get_bundle_uri(struct transport *transport)
 	int stateless_rpc = transport->stateless_rpc;
 	string_list_init(&transport->bundle_uri, 1);
 
+	if (!data->got_advertisement) {
+		struct ref *refs;
+		struct git_transport_data *data = transport->data;
+		enum protocol_version version;
+
+		refs = handshake(transport, 0, NULL, 0);
+		version = data->version;
+
+		switch (version) {
+		case protocol_v2:
+			assert(!refs);
+			break;
+		case protocol_v0:
+		case protocol_v1:
+		case protocol_unknown_version:
+			assert(refs);
+			break;
+		}
+	}
+
+	/*
+	 * "Support" protocol v0 and v2 without bundle-uri support by
+	 * silently degrading to a NOOP.
+	 */
+	if (!server_supports_v2("bundle-uri", 0))
+		return 0;
+
 	packet_reader_init(&reader, data->fd[0], NULL, 0,
 			   PACKET_READ_CHOMP_NEWLINE |
 			   PACKET_READ_GENTLE_ON_EOF);
@@ -1472,7 +1501,7 @@ int transport_fetch_refs(struct transport *transport, struct ref *refs)
 	return rc;
 }
 
-int transport_get_remote_bundle_uri(struct transport *transport)
+int transport_get_remote_bundle_uri(struct transport *transport, int quiet)
 {
 	const struct transport_vtable *vtable = transport->vtable;
 
@@ -1480,20 +1509,16 @@ int transport_get_remote_bundle_uri(struct transport *transport)
 	if (transport->got_remote_bundle_uri++)
 		return 0;
 
-	/*
-	 * "Support" protocol v0 and v2 without bundle-uri support by
-	 * silently degrading to a NOOP.
-	 */
-	if (!server_supports_v2("bundle-uri", 0))
-		return 0;
-
 	/*
 	 * This is intentionally below the transport.injectBundleURI,
 	 * we want to be able to inject into protocol v0, or into the
 	 * dialog of a server who doesn't support this.
 	 */
-	if (!vtable->get_bundle_uri)
+	if (!vtable->get_bundle_uri) {
+		if (quiet)
+			return -1;
 		return error(_("bundle-uri operation not supported by protocol"));
+	}
 
 	if (vtable->get_bundle_uri(transport) < 0)
 		return error(_("could not retrieve server-advertised bundle-uri list"));
diff --git a/transport.h b/transport.h
index fb9f89f956..14c3298583 100644
--- a/transport.h
+++ b/transport.h
@@ -285,8 +285,12 @@ const struct ref *transport_get_remote_refs(struct transport *transport,
 /**
  * Retrieve bundle URI(s) from a remote. Populates "struct
  * transport"'s "bundle_uri" and "got_remote_bundle_uri".
+ *
+ * With `quiet=1` it will not complain if the serve doesn't support
+ * the protocol, but only if we discover the server uses it, and
+ * encounter issues then.
  */
-int transport_get_remote_bundle_uri(struct transport *transport);
+int transport_get_remote_bundle_uri(struct transport *transport, int quiet);
 
 /*
  * Fetch the hash algorithm used by a remote.
-- 
2.33.0.rc0.646.g585563e77f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 06/13] bundle-uri client: add transfer.injectBundleURI support
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
                   ` (4 preceding siblings ...)
  2021-08-05 15:07 ` [RFC PATCH 05/13] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason
@ 2021-08-05 15:07 ` Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 07/13] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-05 15:07 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Ævar Arnfjörð Bjarmason

Add the ability to inject "fake" bundle URIs into the newly supported
bundle-uri dialog. As discussed in the added documentation this allows
us to pretend as though the remote supports bundle URIs.

This will be useful both for ad-hoc testing, and for the real use-case
of retrofitting bundle URI support on-the-fly, i.e. to have:

	git -c transfer.injectBundleURI "file://$(pwd)/local.bdl" \
	clone https://example.com/git.git"

Be similar in spirit to:

	git clone --reference local-clone.git --disassociate \
	https://example.com/git.git"

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/transfer.txt     | 20 ++++++++++++
 t/lib-t5730-protocol-v2-bundle-uri.sh | 46 +++++++++++++++++++++++++++
 transport.c                           | 33 +++++++++++++++++++
 3 files changed, 99 insertions(+)

diff --git a/Documentation/config/transfer.txt b/Documentation/config/transfer.txt
index 505126a780..a9aff7f982 100644
--- a/Documentation/config/transfer.txt
+++ b/Documentation/config/transfer.txt
@@ -73,3 +73,23 @@ transfer.unpackLimit::
 transfer.advertiseSID::
 	Boolean. When true, client and server processes will advertise their
 	unique session IDs to their remote counterpart. Defaults to false.
+
+transfer.injectBundleURI::
+	Allows for the injection of `bundle-uri` lines into the
+	protocol v2 transport dialog (see `protocol.version` in
+	linkgit:git-config[1]). See `bundle-uri` in
+	link:technical/protocol-v2.html[the Git Wire Protocol, Version
+	2] documentation for what the format looks like.
++
+Can be given more than once, each key being injected as one line into
+the dialog.
++
+This is useful for testing the `bundle-uri` facility, and to e.g. use
+linkgit:git-clone[1] to clone from a server which does not support
+`bundle-uri`, but where the clone can benefit from getting some or
+most of the data from a static bundle retrieved from elsewhere.
++
+Impacts any command that uses the transport to communicate with remote
+linkgit:git-upload-pack[1] processes, e.g. linkgit:git-clone[1],
+linkgit:git-fetch[1] and the linkgit:git-ls-remote-bundle-uri[1]
+inspection command, this includes the `file://` protocol.
diff --git a/t/lib-t5730-protocol-v2-bundle-uri.sh b/t/lib-t5730-protocol-v2-bundle-uri.sh
index fccf7ccaa2..e5d17f6a81 100644
--- a/t/lib-t5730-protocol-v2-bundle-uri.sh
+++ b/t/lib-t5730-protocol-v2-bundle-uri.sh
@@ -270,3 +270,49 @@ test_expect_success "ls-remote-bundle-uri --[no-]quiet with $T5730_PROTOCOL:// u
 	test_must_be_empty err &&
 	test_cmp out.expect out.actual
 '
+
+test_expect_success "ls-remote-bundle-uri with -c transfer.injectBundleURI using with $T5730_PROTOCOL:// using protocol v2" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI \
+		"$T5730_BUNDLE_URI_ESCAPED" &&
+
+	cat >expect <<-\EOF &&
+	https://injected.example.com/fake-1.bdl
+	https://injected.example.com/fake-2.bdl
+	EOF
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		-c transfer.injectBundleURI="https://injected.example.com/fake-1.bdl" \
+		-c transfer.injectBundleURI="https://injected.example.com/fake-2.bdl" \
+		ls-remote-bundle-uri \
+		"$T5730_URI" \
+		>actual 2>err &&
+	test_cmp expect actual &&
+	test_path_is_missing log
+'
+
+test_expect_success "ls-remote-bundle-uri with bad -c transfer.injectBundleURI protocol v2 with $T5730_PROTOCOL://" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI \
+		"$T5730_BUNDLE_URI_ESCAPED" &&
+
+	cat >err.expect <<-\EOF &&
+	error: bad (empty) transfer.injectBundleURI
+	error: could not get the bundle-uri list
+	EOF
+
+	test_must_fail env \
+		GIT_TRACE_PACKET="$PWD/log" \
+		git \
+		-c protocol.version=2 \
+		-c transfer.injectBundleURI \
+		ls-remote-bundle-uri \
+		"$T5730_URI" \
+		>out 2>err.actual &&
+	test_must_be_empty out &&
+	test_cmp err.expect err.actual &&
+	test_path_is_missing log
+'
diff --git a/transport.c b/transport.c
index 286c96269d..bd8a914652 100644
--- a/transport.c
+++ b/transport.c
@@ -1501,14 +1501,47 @@ int transport_fetch_refs(struct transport *transport, struct ref *refs)
 	return rc;
 }
 
+struct config_cb {
+	struct transport *transport;
+	int configured;
+	int ret;
+};
+
+static int bundle_uri_config(const char *var, const char *value, void *data)
+{
+	struct config_cb *cb = data;
+	struct transport *transport = cb->transport;
+	struct string_list *uri = &transport->bundle_uri;
+
+	if (!strcmp(var, "transfer.injectbundleuri")) {
+		cb->configured = 1;
+		if (!value)
+			cb->ret = error(_("bad (empty) transfer.injectBundleURI"));
+		else if (bundle_uri_parse_line(uri, value) < 0)
+			cb->ret = error(_("bad transfer.injectBundleURI: '%s'"),
+					value);
+		return 0;
+	}
+	return 0;
+}
+
 int transport_get_remote_bundle_uri(struct transport *transport, int quiet)
 {
 	const struct transport_vtable *vtable = transport->vtable;
+	struct config_cb cb = {
+		.transport = transport,
+	};
 
 	/* Lazily configured */
 	if (transport->got_remote_bundle_uri++)
 		return 0;
 
+	git_config(bundle_uri_config, &cb);
+
+	/* Our own config can fake it up with transport.injectBundleURI */
+	if (cb.configured)
+		return cb.ret;
+
 	/*
 	 * This is intentionally below the transport.injectBundleURI,
 	 * we want to be able to inject into protocol v0, or into the
-- 
2.33.0.rc0.646.g585563e77f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 07/13] bundle-uri client: add boolean transfer.bundleURI setting
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
                   ` (5 preceding siblings ...)
  2021-08-05 15:07 ` [RFC PATCH 06/13] bundle-uri client: add transfer.injectBundleURI support Ævar Arnfjörð Bjarmason
@ 2021-08-05 15:07 ` Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 08/13] bundle.h: make "fd" version of read_bundle_header() public Ævar Arnfjörð Bjarmason
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-05 15:07 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Ævar Arnfjörð Bjarmason

The yet-to-be introduced client support for bundle-uri will always
fall back on a full clone, but we'd still like to be able to ignore a
server's bundle-uri advertisement entirely.

This is useful for testing, and if a server is pointing to bad
bundles, they take a while to time out etc.

Since we might see the config in any order we need to clear out any
accumulated bundle_uri list when we see transfer.bundleURI=false
setting, and not add any more things to the list.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/transfer.txt |  6 ++++++
 transport.c                       | 15 ++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/transfer.txt b/Documentation/config/transfer.txt
index a9aff7f982..1ed026421c 100644
--- a/Documentation/config/transfer.txt
+++ b/Documentation/config/transfer.txt
@@ -74,6 +74,12 @@ transfer.advertiseSID::
 	Boolean. When true, client and server processes will advertise their
 	unique session IDs to their remote counterpart. Defaults to false.
 
+transfer.bundleURI::
+	When set to `false` ignores any server advertisement of
+	`bundle-uri` and proceed with a "normal" clone/fetch even if
+	using bundles to bootstap is possible. Defaults to `true`,
+	i.e. bundle-uri is tried whenever a server offers it.
+
 transfer.injectBundleURI::
 	Allows for the injection of `bundle-uri` lines into the
 	protocol v2 transport dialog (see `protocol.version` in
diff --git a/transport.c b/transport.c
index bd8a914652..80d1857374 100644
--- a/transport.c
+++ b/transport.c
@@ -1503,6 +1503,7 @@ int transport_fetch_refs(struct transport *transport, struct ref *refs)
 
 struct config_cb {
 	struct transport *transport;
+	int disabled;
 	int configured;
 	int ret;
 };
@@ -1513,7 +1514,15 @@ static int bundle_uri_config(const char *var, const char *value, void *data)
 	struct transport *transport = cb->transport;
 	struct string_list *uri = &transport->bundle_uri;
 
-	if (!strcmp(var, "transfer.injectbundleuri")) {
+	if (!strcmp(var, "transfer.bundleuri")) {
+		cb->disabled = !git_config_bool(var, value);
+		if (cb->disabled)
+			bundle_uri_string_list_clear(uri);
+		return 0;
+	}
+
+	if (!cb->disabled &&
+	    !strcmp(var, "transfer.injectbundleuri")) {
 		cb->configured = 1;
 		if (!value)
 			cb->ret = error(_("bad (empty) transfer.injectBundleURI"));
@@ -1538,6 +1547,10 @@ int transport_get_remote_bundle_uri(struct transport *transport, int quiet)
 
 	git_config(bundle_uri_config, &cb);
 
+	/* Don't use bundle-uri at all */
+	if (cb.disabled)
+		return 0;
+
 	/* Our own config can fake it up with transport.injectBundleURI */
 	if (cb.configured)
 		return cb.ret;
-- 
2.33.0.rc0.646.g585563e77f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 08/13] bundle.h: make "fd" version of read_bundle_header() public
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
                   ` (6 preceding siblings ...)
  2021-08-05 15:07 ` [RFC PATCH 07/13] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason
@ 2021-08-05 15:07 ` Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 09/13] fetch-pack: add a deref_without_lazy_fetch_extended() Ævar Arnfjörð Bjarmason
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-05 15:07 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Ævar Arnfjörð Bjarmason

Change the parse_bundle_header() function to be non-static, and rename
it to parse_bundle_header_fd(). The parse_bundle_header() function is
already public, and it's a thin wrapper around this function. This
will be used by code that wants to pass a fd to the bundle API.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 bundle.c | 8 ++++----
 bundle.h | 2 ++
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/bundle.c b/bundle.c
index ab63f40226..f3a87308cf 100644
--- a/bundle.c
+++ b/bundle.c
@@ -61,8 +61,8 @@ static int parse_bundle_signature(struct bundle_header *header, const char *line
 	return -1;
 }
 
-static int parse_bundle_header(int fd, struct bundle_header *header,
-			       const char *report_path)
+int read_bundle_header_fd(int fd, struct bundle_header *header,
+			  const char *report_path)
 {
 	struct strbuf buf = STRBUF_INIT;
 	int status = 0;
@@ -138,7 +138,7 @@ int read_bundle_header(const char *path, struct bundle_header *header)
 
 	if (fd < 0)
 		return error(_("could not open '%s'"), path);
-	return parse_bundle_header(fd, header, path);
+	return read_bundle_header_fd(fd, header, path);
 }
 
 int is_bundle(const char *path, int quiet)
@@ -148,7 +148,7 @@ int is_bundle(const char *path, int quiet)
 
 	if (fd < 0)
 		return 0;
-	fd = parse_bundle_header(fd, &header, quiet ? NULL : path);
+	fd = read_bundle_header_fd(fd, &header, quiet ? NULL : path);
 	if (fd >= 0)
 		close(fd);
 	bundle_header_release(&header);
diff --git a/bundle.h b/bundle.h
index 1927d8cd6a..8adf0e7393 100644
--- a/bundle.h
+++ b/bundle.h
@@ -22,6 +22,8 @@ void bundle_header_release(struct bundle_header *header);
 
 int is_bundle(const char *path, int quiet);
 int read_bundle_header(const char *path, struct bundle_header *header);
+int read_bundle_header_fd(int fd, struct bundle_header *header,
+			  const char *report_path);
 int create_bundle(struct repository *r, const char *path,
 		  int argc, const char **argv, struct strvec *pack_options,
 		  int version);
-- 
2.33.0.rc0.646.g585563e77f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 09/13] fetch-pack: add a deref_without_lazy_fetch_extended()
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
                   ` (7 preceding siblings ...)
  2021-08-05 15:07 ` [RFC PATCH 08/13] bundle.h: make "fd" version of read_bundle_header() public Ævar Arnfjörð Bjarmason
@ 2021-08-05 15:07 ` Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 10/13] fetch-pack: move --keep=* option filling to a function Ævar Arnfjörð Bjarmason
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-05 15:07 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Ævar Arnfjörð Bjarmason

Add a version of the deref_without_lazy_fetch function which can be
called with custom oi_flags and to grab information about the
"object_type". This will be used for the bundle-uri client in a
subsequent commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fetch-pack.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index b0c7be717c..b0b21cc969 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -114,17 +114,18 @@ static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
 		cb(negotiator, cache.items[i]);
 }
 
-static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
-					       int mark_tags_complete)
+static struct commit *deref_without_lazy_fetch_extended(const struct object_id *oid,
+							int mark_tags_complete,
+							enum object_type *type,
+							unsigned int oi_flags)
 {
-	enum object_type type;
-	struct object_info info = { .typep = &type };
+	struct object_info info = { .typep = type };
 
 	while (1) {
 		if (oid_object_info_extended(the_repository, oid, &info,
-					     OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK))
+					     oi_flags))
 			return NULL;
-		if (type == OBJ_TAG) {
+		if (*type == OBJ_TAG) {
 			struct tag *tag = (struct tag *)
 				parse_object(the_repository, oid);
 
@@ -137,11 +138,21 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
 			break;
 		}
 	}
-	if (type == OBJ_COMMIT)
+	if (*type == OBJ_COMMIT)
 		return (struct commit *) parse_object(the_repository, oid);
 	return NULL;
 }
 
+
+static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
+					       int mark_tags_complete)
+{
+	enum object_type type;
+	unsigned flags = OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK;
+	return deref_without_lazy_fetch_extended(oid, mark_tags_complete,
+						 &type, flags);
+}
+
 static int rev_list_insert_ref(struct fetch_negotiator *negotiator,
 			       const struct object_id *oid)
 {
-- 
2.33.0.rc0.646.g585563e77f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 10/13] fetch-pack: move --keep=* option filling to a function
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
                   ` (8 preceding siblings ...)
  2021-08-05 15:07 ` [RFC PATCH 09/13] fetch-pack: add a deref_without_lazy_fetch_extended() Ævar Arnfjörð Bjarmason
@ 2021-08-05 15:07 ` Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 11/13] index-pack: add --progress-title option Ævar Arnfjörð Bjarmason
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-05 15:07 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Ævar Arnfjörð Bjarmason

Move the populating of the --keep=* option argument to "index-pack" to
a static function, a subsequent commit will make use of it in another
function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fetch-pack.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index b0b21cc969..ef07b45072 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -826,6 +826,16 @@ static void parse_gitmodules_oids(int fd, struct oidset *gitmodules_oids)
 	} while (1);
 }
 
+static void add_index_pack_keep_option(struct strvec *args)
+{
+	char hostname[HOST_NAME_MAX + 1];
+
+	if (xgethostname(hostname, sizeof(hostname)))
+		xsnprintf(hostname, sizeof(hostname), "localhost");
+	strvec_pushf(args, "--keep=fetch-pack %"PRIuMAX " on %s",
+		     (uintmax_t)getpid(), hostname);
+}
+
 /*
  * If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
  * The strings to pass as the --index-pack-arg arguments to http-fetch will be
@@ -895,14 +905,8 @@ static int get_pack(struct fetch_pack_args *args,
 			strvec_push(&cmd.args, "-v");
 		if (args->use_thin_pack)
 			strvec_push(&cmd.args, "--fix-thin");
-		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) {
-			char hostname[HOST_NAME_MAX + 1];
-			if (xgethostname(hostname, sizeof(hostname)))
-				xsnprintf(hostname, sizeof(hostname), "localhost");
-			strvec_pushf(&cmd.args,
-				     "--keep=fetch-pack %"PRIuMAX " on %s",
-				     (uintmax_t)getpid(), hostname);
-		}
+		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit))
+			add_index_pack_keep_option(&cmd.args);
 		if (!index_pack_args && args->check_self_contained_and_connected)
 			strvec_push(&cmd.args, "--check-self-contained-and-connected");
 		else
-- 
2.33.0.rc0.646.g585563e77f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 11/13] index-pack: add --progress-title option
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
                   ` (9 preceding siblings ...)
  2021-08-05 15:07 ` [RFC PATCH 10/13] fetch-pack: move --keep=* option filling to a function Ævar Arnfjörð Bjarmason
@ 2021-08-05 15:07 ` Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 12/13] bundle-uri client: support for bundle-uri with "clone" Ævar Arnfjörð Bjarmason
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-05 15:07 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Ævar Arnfjörð Bjarmason

Add a --progress-title option to index-pack, when data is piped into
index-pack its progress is a proxy for whatever's feeding it
data. This option will allow us to set a more relevant progress bar
title.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-index-pack.txt | 6 ++++++
 builtin/index-pack.c             | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 7fa74b9e79..9fd5d8a2d4 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -82,6 +82,12 @@ OPTIONS
 --strict::
 	Die, if the pack contains broken objects or links.
 
+--progress-title::
+	For internal use only.
++
+Set the title of the "Receiving objects" progress bar (it's "Indexing
+objects" under `--stdin`).
+
 --check-self-contained-and-connected::
 	Die if the pack contains broken links. For internal use only.
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 8336466865..0841c039ae 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -122,6 +122,7 @@ static int strict;
 static int do_fsck_object;
 static struct fsck_options fsck_options = FSCK_OPTIONS_MISSING_GITMODULES;
 static int verbose;
+static const char *progress_title;
 static int show_resolving_progress;
 static int show_stat;
 static int check_self_contained_and_connected;
@@ -1157,6 +1158,7 @@ static void parse_pack_objects(unsigned char *hash)
 
 	if (verbose)
 		progress = start_progress(
+				progress_title ? progress_title :
 				from_stdin ? _("Receiving objects") : _("Indexing objects"),
 				nr_objects);
 	for (i = 0; i < nr_objects; i++) {
@@ -1806,6 +1808,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				input_len = sizeof(*hdr);
 			} else if (!strcmp(arg, "-v")) {
 				verbose = 1;
+			} else if (!strcmp(arg, "--progress-title")) {
+				if (progress_title || (i+1) >= argc)
+					usage(index_pack_usage);
+				progress_title = argv[++i];
 			} else if (!strcmp(arg, "--show-resolving-progress")) {
 				show_resolving_progress = 1;
 			} else if (!strcmp(arg, "--report-end-of-input")) {
-- 
2.33.0.rc0.646.g585563e77f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 12/13] bundle-uri client: support for bundle-uri with "clone"
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
                   ` (10 preceding siblings ...)
  2021-08-05 15:07 ` [RFC PATCH 11/13] index-pack: add --progress-title option Ævar Arnfjörð Bjarmason
@ 2021-08-05 15:07 ` Ævar Arnfjörð Bjarmason
  2021-08-05 15:07 ` [RFC PATCH 13/13] bundle-uri docs: add design notes Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-05 15:07 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Ævar Arnfjörð Bjarmason

In an earlier commit ("bundle-uri client: add minimal NOOP client") a
transport_get_remote_bundle_uri() call was added to builtin/clone.c to
get any advertised bundle URIs from the server during cloning, but
nothing was being done with them yet.

This implements real support for bundle-uri during the "clone"
phase. It's not used at all by "fetch", but the code to support it is
mostly here already, and will be finished later.

Using the new transfer.injectBundleURI support it's easy to test this
method of cloning on a live server that doesn't support bundle-uri. In
a git.git checkout.

First let's prepare two bundles:

    git bundle create /tmp/git-master-only.bdl origin/master
    git bundle create /tmp/git-master-to-next.bdl origin/master..origin/next

And next, let's do a "fake" clone where we bootstrap from these
bundles. The fetch.uriProtocols is needed because we'd otherwise
ignore "file://" URIs. This uses --no-tags --single-branch for
simplicity:

    rm -rf /tmp/git.git &&
    git \
	-c protocol.version=2 \
        -c fetch.uriProtocols=file \
        -c transfer.injectBundleURI="file:///tmp/git-master-only.bdl" \
	-c transfer.injectBundleURI="file:///tmp/git-master-to-next.bdl" \
	clone --bare --no-tags --single-branch --branch next --template= \
	--verbose --verbose \
	https://github.com/git/git.git /tmp/git.git

We'll then get output like:

    Receiving bundle (1/2): 100% (300529/300529), 87.57 MiB | 32.70 MiB/s, done.
    Resolving deltas: 100% (226765/226765), done.
    have eb27b338a3e71c7c4079fbac8aeae3f8fbb5c687 commit via bundle-uri
    Receiving bundle (2/2): 100% (725/725), 221.11 KiB | 22.11 MiB/s, done.
    Resolving deltas: 100% (539/539), completed with 153 local objects.
    have e1b32706d8dd5db1dc2e13f8e391651214f1d987 commit via bundle-uri
    Marking e1b32706d8dd5db1dc2e13f8e391651214f1d987 as complete
    already have e1b32706d8dd5db1dc2e13f8e391651214f1d987 (refs/heads/next)
    Checking connectivity: 301210, done.

I.e. we did an ls-refs on connection to the server, then retrieved the
advertised bundles (faked up via config in this case).

We then got all the data leading up to the current "master" from
there, and also the commit that's currently on "next. In this case we
found that we didn't need to proceed further with the dialog.

I.e. other than an ls-refs and the server waiting until we downloaded
the bundles, the server didn't need to do any work creating a PACK for
us.

If we change "--branch next" into "--branch seen" in the above command
we'll get the same output at the start until the "want" line, then:

    [...]
    want 93021c12c9f91e0d750d3ca8750a62416f4ea81a (refs/heads/seen)
    POST git-upload-pack (212 bytes)
    remote: Enumerating objects: 2265, done.
    remote: Counting objects: 100% (1576/1576), done.
    remote: Compressing objects: 100% (233/233), done.
    remote: Total 2265 (delta 1378), reused 1480 (delta 1341), pack-reused 689
    Receiving objects: 100% (2265/2265), 2.17 MiB | 10.77 MiB/s, done.
    Resolving deltas: 100% (1673/1673), completed with 339 local objects.
    Checking connectivity: 303225, done.

I.e. the server needed to send us an incremental update on top after
we'd unpacked the bundles, but this was a fairly minimal set of ~2k
objects. It didn't need to service a full clone.

We can see the savings on the server by setting up a local server at
the tip of "next":

    rm -rf /tmp/git-server.git &&
    git init --bare /tmp/git-server.git &&
    git -C /tmp/git-server.git bundle unbundle /tmp/git-master-only.bdl &&
    git -C /tmp/git-server.git bundle unbundle /tmp/git-master-to-next.bdl
    git -C /tmp/git-server.git update-ref refs/heads/master $(git ls-remote /tmp/git-master-only.bdl | cut -f 1) &&
    git -C /tmp/git-server.git update-ref refs/heads/next $(git ls-remote /tmp/git-master-to-next.bdl | cut -f 1) &&
    git -C /tmp/git-server.git for-each-ref

Let's then clone from it, and record the time we spend.

    rm -rf /tmp/git.git /tmp/{client,server}.time &&
    /usr/bin/time -o /tmp/client.time -v git \
	-c protocol.version=2 \
        -c fetch.uriProtocols=file \
        -c transfer.injectBundleURI="file:///tmp/git-master-only.bdl" \
	clone \
	--upload-pack '/usr/bin/time -o /tmp/server.time -v git-upload-pack' \
	--bare --no-tags --single-branch --branch next --template= \
	--verbose --verbose \
	file:///tmp/git-server.git /tmp/git.git &&
    for i in client server
    do
        echo $i: &&
        grep -e seconds -e wall -e Maximum -e context /tmp/$i.time
    done

This gives us something like these results:

    client:
        User time (seconds): 46.34
        System time (seconds): 0.67
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:18.67
        Maximum resident set size (kbytes): 207096
        Voluntary context switches: 116058
        Involuntary context switches: 220
    server:
        User time (seconds): 0.13
        System time (seconds): 0.00
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:14.08
        Maximum resident set size (kbytes): 53168
        Voluntary context switches: 255
        Involuntary context switches: 7

Whereas doing a normal "clone" (by e.g. adding "-c
transfer.bundleURI=false" to the above) will give something like:

    client:
        User time (seconds): 47.24
        System time (seconds): 0.92
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:18.55
        Maximum resident set size (kbytes): 288104
        Voluntary context switches: 136350
        Involuntary context switches: 296
    server:
        User time (seconds): 5.73
        System time (seconds): 0.24
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.45
        Maximum resident set size (kbytes): 288104
        Voluntary context switches: 26568
        Involuntary context switches: 111

I.e. we can see that the win on the client in this case is negative,
but we use around over 2% of the CPU time on the server, and around
20% of the memory. The client-visible time is a bit slower, by around
2%.

In practice I think this will be more of a win-win. These results are
on an unloaded local machine, and don't account for the benefit of the
server being more likely to have a network-local version of most of
the repository via dumb CDNs.

Real servers are also usually in a messier state of having various
loose objects and more fragmented pack collections, and needing to
spend CPU to assemble these. Frequent repacking and e.g. local caching
e.g. via the uploadpack.packObjectsHook helps, but using this should
make it more accessible to run a highly performance git server.

This feature also makes things like resumable clones rather trivial to
implement, this approach was discussed in the past[1] as a means to
get that feature.

1. https://lore.kernel.org/git/20111110074330.GA27925@sigill.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fetch-pack.c                          | 259 ++++++++++++++++++++++++++
 fetch-pack.h                          |   6 +
 t/lib-t5730-protocol-v2-bundle-uri.sh | 109 ++++++++++-
 transport.c                           |   1 +
 4 files changed, 374 insertions(+), 1 deletion(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index ef07b45072..4ce4946963 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -25,6 +25,7 @@
 #include "shallow.h"
 #include "commit-reach.h"
 #include "commit-graph.h"
+#include "bundle.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -1000,6 +1001,137 @@ static int get_pack(struct fetch_pack_args *args,
 	return 0;
 }
 
+static int unbundle_bundle_uri(const char *bundle_uri, unsigned int nth,
+			       unsigned int total_nr, FILE *in, int in_fd,
+			       struct oid_array *bundle_oids,
+			       unsigned int use_thin_pack)
+{
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	struct bundle_header header = BUNDLE_HEADER_INIT;
+	int ret = 0;
+	struct string_list_item *item;
+	struct strbuf progress_title = STRBUF_INIT;
+	int code;
+
+	ret = read_bundle_header_fd(in_fd, &header, bundle_uri);
+	if (ret < 0) {
+		ret = error("could not read_bundle_header(%s)", bundle_uri);
+		goto cleanup;
+	}
+
+	for_each_string_list_item(item, &header.references) {
+		/*
+		 * The bundle's idea of the ref name is
+		 * item->string.
+		 *
+		 * Here's where we could do concurrent negotiation
+		 * with the server (and possibly start the fetch!)
+		 * before or while we unpack the bundle with
+		 * index-pack.
+		 *
+		 * The negotiator would need a small change to trust
+		 * arbitrary OIDs instead of assuming it has existing
+		 * in-repo "struct commit *", but ad-hoc testing
+		 * reveals that it'll work & speed up the fetch even
+		 * more, as we could proceed in parallel with the full
+		 * bundle fetching as soon as we get the headers.
+		 */
+		struct object_id *oid = item->util;
+
+		oid_array_append(bundle_oids, oid);
+	}
+
+	if (git_env_bool("GIT_TEST_BUNDLE_URI_FAIL_UNBUNDLE", 0))
+		lseek(in_fd, 0, SEEK_SET);
+
+	strbuf_addf(&progress_title, "Receiving bundle (%d/%d)", nth, total_nr);
+
+	strvec_push(&cmd.args, "index-pack");
+	strvec_push(&cmd.args, "--stdin");
+	strvec_push(&cmd.args, "-v");
+	strvec_push(&cmd.args, "--progress-title");
+	strvec_push(&cmd.args, progress_title.buf);
+
+	if (header.prerequisites.nr && use_thin_pack)
+		strvec_push(&cmd.args, "--fix-thin");
+	strvec_push(&cmd.args, "--check-self-contained-and-connected");
+	add_index_pack_keep_option(&cmd.args);
+
+	cmd.git_cmd = 1;
+	cmd.in = in_fd;
+	cmd.no_stdout = 1;
+	cmd.git_cmd = 1;
+
+	if (start_command(&cmd)) {
+		ret = error(_("fetch-pack: unable to spawn index-pack"));
+		goto cleanup;
+	}
+
+	code = finish_command(&cmd);
+
+	if (header.prerequisites.nr && code == 1)
+		/*
+		 * index-pack returns -1 on
+		 * --check-self-contained-and-connected to indicate
+		 * that the pack was indeed not self contained and
+		 * connected. We know from the bundle header
+		 * prerequisites.
+		 */
+		code = 0;
+
+	if (code) {
+		ret = error(_("fetch-pack: unable to finish index-pack, exited with %d"), code);
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&progress_title);
+	bundle_header_release(&header);
+	return ret;
+}
+
+static int get_bundle_uri(struct string_list_item *item, unsigned int nth,
+			  unsigned int total_nr, struct oid_array *bundle_oids,
+			  unsigned int use_thin_pack)
+{
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	struct strbuf tempfile = STRBUF_INIT;
+	int ret = 0;
+	const char *uri = item->string;
+	FILE *out;
+	int out_fd;
+
+	strvec_push(&cmd.args, "curl");
+	strvec_push(&cmd.args, "--silent");
+	strvec_push(&cmd.args, "--output");
+	strvec_push(&cmd.args, "-");
+	strvec_push(&cmd.args, "--");
+	strvec_push(&cmd.args, item->string);
+	cmd.git_cmd = 0;
+	cmd.no_stdin = 1;
+	cmd.out = -1;
+
+	if (start_command(&cmd)) {
+		ret = error("fetch-pack: unable to spawn http-fetch");
+		goto cleanup;
+	}
+
+	out = xfdopen(cmd.out, "r");
+	out_fd = fileno(out);
+	ret = unbundle_bundle_uri(uri, nth, total_nr, out, out_fd,
+				  bundle_oids, use_thin_pack);
+
+	if (finish_command(&cmd)) {
+		ret = error("fetch-pack: unable to finish http-fetch");
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&tempfile);
+
+	return ret;
+}
+
 static int cmp_ref_by_name(const void *a_, const void *b_)
 {
 	const struct ref *a = *((const struct ref **)a_);
@@ -1549,6 +1681,130 @@ static void do_check_stateless_delimiter(int stateless_rpc,
 				  _("git fetch-pack: expected response end packet"));
 }
 
+static int get_bundle_uri_add_known_common(struct string_list_item *item,
+					   unsigned int nth, unsigned int total_nr,
+					   struct fetch_negotiator *negotiator,
+					   struct fetch_pack_args *args,
+					   unsigned int use_thin_pack)
+{
+	int i;
+	struct oid_array bundle_oids = OID_ARRAY_INIT;
+
+	/*
+	 * We don't use OBJECT_INFO_QUICK here unlike in the rest of
+	 * the fetch routines, that's because the rest of them don't
+	 * need to consider a commit object that's just been
+	 * downloaded for further negotiation, but bundle-uri does for
+	 * adding newly downloaded OIDs to the negotiator.
+	 */
+	unsigned oi_flags = OBJECT_INFO_SKIP_FETCH_OBJECT;
+
+	if (get_bundle_uri(item, nth, total_nr, &bundle_oids, use_thin_pack) < 0)
+		return error(_("could not get the bundle URI #%d"), nth);
+
+	for (i = 0; i < bundle_oids.nr; i++) {
+		struct object_id *oid = &bundle_oids.oid[i];
+		enum object_type type = OBJ_NONE;
+		struct commit *c = deref_without_lazy_fetch_extended(oid, 0,
+								     &type,
+								     oi_flags);
+		if (!c) {
+			if (type == OBJ_BLOB || type == OBJ_TREE) {
+				print_verbose(args, "have %s %s via bundle-uri (ignoring due to type)",
+					      oid_to_hex(oid), type_name(type));
+				continue;
+			} else if (type) {
+				/*
+				 * OBJ_TAG should have been peeled,
+				 * and OBJ_COMMIT should have a
+				 * non-NULL "c".
+				 *
+				 * Should be a BUG() if we were not
+				 * bending over backwards to make
+				 * bundle-uri soft-fail.
+				 */
+				return error(_("bundle-uri says it has %s, got it at unexpected type %s"),
+					     oid_to_hex(oid), type_name(type));
+			}
+		}
+
+		print_verbose(args, "have %s %s via bundle-uri",
+			      oid_to_hex(oid), type_name(type));
+
+		negotiator->known_common(negotiator, c);
+		mark_complete(oid);
+	}
+	return 0;
+}
+
+static void do_fetch_pack_v2_bundle_uri(struct fetch_pack_args *args,
+					struct string_list  *bundle_uri,
+					struct fetch_negotiator *negotiator)
+{
+	struct string_list_item *item;
+	struct string_list list = STRING_LIST_INIT_NODUP;
+	struct string_list default_protocols = STRING_LIST_INIT_NODUP;
+	struct string_list *ok_protocols;
+
+	if (!bundle_uri)
+		return;
+
+	if (!bundle_uri->nr)
+		return;
+
+	if (uri_protocols.nr) {
+		ok_protocols = &uri_protocols;
+	} else {
+		string_list_append(&default_protocols, "http");
+		string_list_append(&default_protocols, "https");
+		ok_protocols = &default_protocols;
+	}
+
+	for_each_string_list_item(item, bundle_uri) {
+		const char *uri = item->string;
+		int protocol_ok = 0;
+		struct string_list_item *item2;
+
+		for_each_string_list_item(item2, ok_protocols) {
+			const char *s = item2->string;
+			const char *p;
+
+			if (skip_prefix(item->string, s, &p) &&
+			    starts_with(p, "://")) {
+				protocol_ok = 1;
+				break;
+			}
+		}
+
+		if (!protocol_ok) {
+			print_verbose(args, "skipping bundle-uri not on protocol whitelist: %s",
+				      item->string);
+			continue;
+		}
+
+		string_list_append(&list, uri)->util = item->util;
+	}
+
+	if (list.nr) {
+		int i;
+		unsigned int total_nr = list.nr;
+
+		trace2_region_enter("fetch-pack", "bundle-uri", the_repository);
+		for (i = 0; i < total_nr; i++) {
+			struct string_list_item item = list.items[i];
+			unsigned int nth = i + 1;
+
+			get_bundle_uri_add_known_common(&item, nth, total_nr,
+							negotiator, args,
+							args->use_thin_pack);
+		}
+		trace2_region_leave("fetch-pack", "bundle-uri", the_repository);
+	}
+
+	string_list_clear(&default_protocols, 0);;
+}
+
+
 static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 				    int fd[2],
 				    const struct ref *orig_ref,
@@ -1572,10 +1828,13 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
 	int i;
 	struct strvec index_pack_args = STRVEC_INIT;
+	struct string_list *bundle_uri = args->bundle_uri;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
 
+	do_fetch_pack_v2_bundle_uri(args, bundle_uri, negotiator);
+
 	packet_reader_init(&reader, fd[0], NULL, 0,
 			   PACKET_READ_CHOMP_NEWLINE |
 			   PACKET_READ_DIE_ON_ERR_PACKET);
diff --git a/fetch-pack.h b/fetch-pack.h
index 7f94a2a583..b19fd7d93b 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -24,6 +24,12 @@ struct fetch_pack_args {
 	 */
 	const struct oid_array *negotiation_tips;
 
+	/*
+	 * A pointer to the already populated transport.bundle_uri
+	 * struct.
+	 */
+	struct string_list *bundle_uri;
+
 	unsigned deepen_relative:1;
 	unsigned quiet:1;
 	unsigned keep_pack:1;
diff --git a/t/lib-t5730-protocol-v2-bundle-uri.sh b/t/lib-t5730-protocol-v2-bundle-uri.sh
index e5d17f6a81..db845fd4c3 100644
--- a/t/lib-t5730-protocol-v2-bundle-uri.sh
+++ b/t/lib-t5730-protocol-v2-bundle-uri.sh
@@ -7,6 +7,8 @@ case "$T5730_PROTOCOL" in
 file)
 	T5730_PARENT=file_parent
 	T5730_URI="file://$PWD/file_parent"
+	T5730_URI_BDL_PROTO="file://"
+	T5730_URI_BDL="$T5730_URI_BDL_PROTO$PWD/file_parent"
 	T5730_BUNDLE_URI="$T5730_URI/fake.bdl"
 	test_set_prereq T5730_FILE
 	;;
@@ -15,6 +17,8 @@ git)
 	start_git_daemon --export-all --enable=receive-pack
 	T5730_PARENT="$GIT_DAEMON_DOCUMENT_ROOT_PATH/parent"
 	T5730_URI="$GIT_DAEMON_URL/parent"
+	T5730_URI_BDL_PROTO="file://"
+	T5730_URI_BDL="$T5730_URI_BDL_PROTO$GIT_DAEMON_DOCUMENT_ROOT_PATH/parent"
 	T5730_BUNDLE_URI="https://example.com/fake.bdl"
 	test_set_prereq T5730_GIT
 	;;
@@ -23,6 +27,8 @@ http)
 	start_httpd
 	T5730_PARENT="$HTTPD_DOCUMENT_ROOT_PATH/http_parent"
 	T5730_URI="$HTTPD_URL/smart/http_parent"
+	T5730_URI_BDL_PROTO="http://"
+	T5730_URI_BDL="$HTTPD_URL/dumb/http_parent"
 	test_set_prereq T5730_HTTP
 	;;
 *)
@@ -32,7 +38,20 @@ esac
 
 test_expect_success "setup protocol v2 $T5730_PROTOCOL:// tests" '
 	git init "$T5730_PARENT" &&
-	test_commit -C "$T5730_PARENT" one
+	test_commit -C "$T5730_PARENT" one &&
+	test_commit -C "$T5730_PARENT" two &&
+	test_commit -C "$T5730_PARENT" three &&
+	test_commit -C "$T5730_PARENT" four &&
+	test_commit -C "$T5730_PARENT" five &&
+	test_commit -C "$T5730_PARENT" six &&
+
+	mkdir "$T5730_PARENT"/bdl &&
+	git -C "$T5730_PARENT" bundle create bdl/1.bdl one &&
+	git -C "$T5730_PARENT" bundle create bdl/1-2.bdl one..two &&
+	git -C "$T5730_PARENT" bundle create bdl/2-3.bdl two..three &&
+	git -C "$T5730_PARENT" bundle create bdl/3-4.bdl three..four &&
+	git -C "$T5730_PARENT" bundle create bdl/4-5.bdl four..five &&
+	git -C "$T5730_PARENT" bundle create bdl/5-6.bdl five..six
 '
 
 # Poor man's URI escaping. Good enough for the test suite whose trash
@@ -316,3 +335,91 @@ test_expect_success "ls-remote-bundle-uri with bad -c transfer.injectBundleURI p
 	test_cmp err.expect err.actual &&
 	test_path_is_missing log
 '
+
+test_cmp_repo_refs() {
+	one="$1"
+	two="$2"
+	shift 2
+
+	git -C "$one" for-each-ref "$@" >expect &&
+	git -C "$two" for-each-ref "$@" >actual &&
+	test_cmp expect actual
+}
+
+test_expect_success "clone with bundle-uri protocol v2 over $T5730_PROTOCOL:// 1.bdl via $T5730_URI_BDL_PROTO" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI "$(echo $T5730_URI_BDL/bdl/1.bdl | test_uri_escape)" &&
+
+	test_when_finished "rm -rf log child" &&
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		-c fetch.uriProtocols=file,http \
+		clone --verbose --verbose \
+		"$T5730_URI" child >out 2>err &&
+	grep -F "Receiving bundle (1/1)" err &&
+	grep "clone> want " log &&
+	test_cmp_repo_refs "$T5730_PARENT" child refs/heads refs/tags
+'
+
+test_expect_success "fetch with bundle-uri protocol v2 over $T5730_PROTOCOL:// 1.bdl via $T5730_URI_BDL_PROTO" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI "$(echo $T5730_URI_BDL/bdl/1.bdl | test_uri_escape)" &&
+
+	test_when_finished "rm -rf log child" &&
+	git init --bare child &&
+	git -C child remote add --mirror=fetch origin "$T5730_URI" &&
+	GIT_TRACE_PACKET="$PWD/log" \
+	git -C child \
+		-c protocol.version=2 \
+		-c fetch.uriProtocols=file,http \
+		fetch --verbose --verbose >out 2>err &&
+	# Fetch is not supported yet
+	! grep -F "Receiving bundle (1/1)" err &&
+	grep "fetch> want " log &&
+	test_cmp_repo_refs "$T5730_PARENT" child refs/heads refs/tags
+'
+
+test_expect_success "clone with bundle-uri protocol v2 with $T5730_PROTOCOL:// 1 + 1-2 + [...].bdl via $T5730_URI_BDL_PROTO" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI "$(echo $T5730_URI_BDL/bdl/1.bdl | test_uri_escape)" &&
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI "$(echo $T5730_URI_BDL/bdl/1-2.bdl | test_uri_escape)" --add &&
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI "$(echo $T5730_URI_BDL/bdl/2-3.bdl | test_uri_escape)" --add &&
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI "$(echo $T5730_URI_BDL/bdl/3-4.bdl | test_uri_escape)" --add &&
+
+	test_when_finished "rm -rf log child" &&
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		-c fetch.uriProtocols=file,http \
+		clone --verbose --verbose \
+		"$T5730_URI" child >out 2>err &&
+	grep -F "Receiving bundle (4/4)" err &&
+	test_cmp_repo_refs "$T5730_PARENT" child refs/heads refs/tags &&
+	grep "clone> want " log
+'
+
+test_expect_success "clone with bundle-uri protocol v2 with $T5730_PROTOCOL:// ALL.bdl via $T5730_URI_BDL_PROTO" '
+	test_when_finished "rm -f log" &&
+
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI "$(echo $T5730_URI_BDL/bdl/1.bdl | test_uri_escape)" &&
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI "$(echo $T5730_URI_BDL/bdl/1-2.bdl | test_uri_escape)" --add &&
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI "$(echo $T5730_URI_BDL/bdl/2-3.bdl | test_uri_escape)" --add &&
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI "$(echo $T5730_URI_BDL/bdl/3-4.bdl | test_uri_escape)" --add &&
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI "$(echo $T5730_URI_BDL/bdl/4-5.bdl | test_uri_escape)" --add &&
+	test_config -C "$T5730_PARENT" uploadpack.bundleURI "$(echo $T5730_URI_BDL/bdl/5-6.bdl | test_uri_escape)" --add &&
+
+	test_when_finished "rm -rf log child" &&
+	GIT_TRACE_PACKET="$PWD/log" \
+	git \
+		-c protocol.version=2 \
+		-c fetch.uriProtocols=file,http \
+		clone --verbose --verbose \
+		"$T5730_URI" child >out 2>err &&
+	grep -F "Receiving bundle (6/6)" err &&
+	test_cmp_repo_refs "$T5730_PARENT" child refs/heads refs/tags &&
+	! grep "clone> want " log
+'
diff --git a/transport.c b/transport.c
index 80d1857374..043afa4544 100644
--- a/transport.c
+++ b/transport.c
@@ -422,6 +422,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.server_options = transport->server_options;
 	args.negotiation_tips = data->options.negotiation_tips;
 	args.reject_shallow_remote = transport->smart_options->reject_shallow;
+	args.bundle_uri = &transport->bundle_uri;
 
 	if (!data->got_remote_heads) {
 		int i;
-- 
2.33.0.rc0.646.g585563e77f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 13/13] bundle-uri docs: add design notes
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
                   ` (11 preceding siblings ...)
  2021-08-05 15:07 ` [RFC PATCH 12/13] bundle-uri client: support for bundle-uri with "clone" Ævar Arnfjörð Bjarmason
@ 2021-08-05 15:07 ` Ævar Arnfjörð Bjarmason
  2021-08-24 21:48   ` brian m. carlson
  2021-08-06 14:38 ` [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Jonathan Nieder
  2021-08-10 13:55 ` Derrick Stolee
  14 siblings, 1 reply; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-05 15:07 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Ævar Arnfjörð Bjarmason

Add a design doc for the bundle-uri protocol extension to go along
with the packfile-uri extension added in cd8402e0fd8 (Documentation:
add Packfile URIs design doc, 2020-06-10).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/technical/bundle-uri.txt  | 119 ++++++++++++++++++++++++
 Documentation/technical/protocol-v2.txt |   5 +
 2 files changed, 124 insertions(+)
 create mode 100644 Documentation/technical/bundle-uri.txt

diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
new file mode 100644
index 0000000000..5ae9a15eaf
--- /dev/null
+++ b/Documentation/technical/bundle-uri.txt
@@ -0,0 +1,119 @@
+Bundle URI Design Notes
+=======================
+
+Protocol
+--------
+
+See `bundle-uri` in the link:protocol-v2.html[protocol-v2]
+documentation for a discussion of the bundle-uri command, and the
+expectations of clients and servers.
+
+This document is a a more general discussion of how the `bundle-uri`
+command fits in with the rest of the git ecosystem, its design goals
+and non-goals, comparison to alternatives etc.
+
+Comparison with Packfile URIs
+-----------------------------
+
+There is a similar "Packfile URIs" facility, see the
+link:packfile-uri.html[packfile-uri] documentation for details.
+
+The Packfile URIs facility requires a much closer cooperation between
+CDN and server than the bundle URI facility.
+
+I.e. the server MUST know what objects exist in the packfile URI it's
+pointing to, as well as its pack checksum. Failure to do so will not
+only result in a client error (the packfile hash won't match), but
+even if it got past that would likely result in a corrupt repository
+with tips pointing to unreachable objects.
+
+By comparison the bundle URIs are meant to be a "dumb" solution
+friendly to e.g. having a weekly cronjob take a snapshot of a git
+repository, that snapshot being uploaded to a network of FTP mirrors
+(which may be inconsistent or out of date).
+
+The server does not need to know what state the side-channel download
+is at, because the client will first validate it, and then optionally
+negotiate with the server using what it discovers there.
+
+Using the local `transfer.injectBundleURI` configuration variable (see
+linkgit:git-config[1]) the `bundle-uri` mechanism doesn't even need
+the server to support it.
+
+Security
+--------
+
+The omission of something equivalent to the packfile <OID> in the
+Packfile URIs protocol is intentional, as having it would require
+closer server and CDN cooperation than some server operators are
+comfortable with.
+
+Furthermore, it is not needed for security. The server doesn't need to
+trust its CDN. If the server were to attempt to send harmful content
+to the client, the result would not validate against the server's
+provided ref tips gotten from ls-refs.
+
+The lack of a such a hash does leave room open to a malicious CDN
+operation to be annoying however. E.g. they could inject irrelevant
+objects into the bundles, which would enlarge the downloaded
+repository until a "gc" would eventually throw them away.
+
+In practice the lack of a hash is considered to be a non-issue. Anyone
+concerned about such security problems between their server and their
+CDN is going to be pointing to a "https" URL under their control. For
+a client the "threat" is the same as without bundle-uri, i.e. a server
+is free to be annoying today and send you garbage in the PACK that you
+won't need.
+
+Security issues peculiar to bundle-uri
+--------------------------------------
+
+Both packfile-uri and bundle-uri use the `fetch.uriProtocols`
+configuration variable (see linkgit:git-config[1]) to configure which
+protocols they support.
+
+By default this is set to "http,https" for both, but bundle-uri
+supports adding "file" to that list. The server can thus point to
+"file://" URIs it expects the client to have access to.
+
+This is primarily intended for use with the `transfer.injectBundleURI`
+mechanism, but can also be useful e.g. in a centralized environment
+where a server might point to a "file:///mnt/bundles/big-repo.bdl" it
+knows to be mounted on the local machine (e.g. a racked server),
+points to it in its "bundle-uri" response.
+
+The client can then add "file" to the `fetch.uriProtocols` list to
+obey such responses. That does mean that a malicious server can point
+to any arbitrary file on the local machine. The threat of this is
+considered minimal, since anyone adding `file` to `fetch.uriProtocols`
+likely knows what they're doing and controls both ands, and the worst
+they can do is make a curl(1) pipe garbage into "index-pack" (which
+will likely promptly die on the non-PACK-file).
+
+Security comparison with packfile-uri
+-------------------------------------
+
+The initial implementation of packfile-uri needed special adjusting to
+run "git fsck" on incoming .gitmodules files, this was to deal with a
+general security issue in git, See CVE-2018-17456.
+
+The current packfile-uri mechanism requires special handling around
+"fsck" to do such cross-PACK fsck's, this is because it first indexes
+the "incremental" PACK, and then any PACK(s) provided via
+packfile-uri, before finally doing a full connectivity check.
+
+This is effect doing the fsck one might do via "clone" and "fetch" in
+reverse, or the equivalent of starting with the incremental "fetch",
+followed by the "clone".
+
+Since the packfile-uri mechanism can result in the .gitmodules blob
+referenced by such a "fetch" to be in the pack for the "clone" the
+fetch-pack process needs to keep state between the indexing of
+multiple packs, to remember to fsck the blob (via the "clone") later
+after seeing it in a tree (from the "fetch).
+
+There are no known security issues with the way packfile-uri does
+this, but since bundle-uri effectively emulates what a which doesn't
+support either "bundle-uri" or "packfile-uri" would do on clone/fetch,
+any future security issues peculiar to the packfile-uri approach are
+unlikely to be shared by it.
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index d10d5e9ef6..5536ea4b7e 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -696,3 +696,8 @@ intended to support future features such as:
    they've got that OID already (for multi-tips the client would need
    to fetch the bundle, or do e.g. HTTP range requests to get its
    header).
+
+bundle-uri SEE ALSO
+^^^^^^^^^^^^^^^^^^^
+
+See the link:bundle-uri.html[Bundle URI Design Notes] for more.
-- 
2.33.0.rc0.646.g585563e77f


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
                   ` (12 preceding siblings ...)
  2021-08-05 15:07 ` [RFC PATCH 13/13] bundle-uri docs: add design notes Ævar Arnfjörð Bjarmason
@ 2021-08-06 14:38 ` Jonathan Nieder
  2021-08-06 16:26   ` Ævar Arnfjörð Bjarmason
  2021-08-10 13:55 ` Derrick Stolee
  14 siblings, 1 reply; 26+ messages in thread
From: Jonathan Nieder @ 2021-08-06 14:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan

Hi,

Ævar Arnfjörð Bjarmason wrote:

> We're in the 2.33.0 rc cycle, and I'd hoped to have some more prep
> work for this integrated already, but for now here's something
> interesting I've been working on for early commentary/feedback.
>
> This adds the the ability to protocol v2 for servers to optimistically
> pre-seed supporting clients with one or more bundles via a new
> "bundle-uri" protocol extension.

My initial thought here is that even though this includes a comparison
to packfile URIs, I suspect you're underestimating them. :)

Would it be possible to do the same pre-seeding using the packfile
URIs protocol?  Nothing stops a server from sending more objects than
the client asked for.  Is the issue that you want the client to be
able to list "have"s based on that pack?  Can't the server obtain that
same information at the same time as it obtains the bundle URL?

The reason I ask is that this contains a number of differences
relative to packfile URIs, most noticeably the use of bundles instead
of packfiles as the format for the static content.  If we were
starting from scratch and chose this design _instead_ of packfile URIs
then that could make sense (though there are issues with the bundle
format that we can also go into), but in a world where people are also
using packfile URIs it makes for a kind of confusing UX.  Is a server
operator expected to put both kinds of files on CDN and double their
storage bill?  Is this meant as an alternative, a replacement, or
something that combines well together with the packfile URIs feature?
What does the intended end state look like?

Projects like chromium have been using packfile URIs in production for
about 11 months now and it's been working well.  Because of that, I'd
be interested in understanding its shortcomings and improving it in
place --- or in other words, I want _you_ to benefit from them instead
of having to create an alternative to them.  Alternatively, if the
packfile URIs protocol is fundamentally flawed, then I'd like us to
understand that early and act on it instead of creating a parallel
alternative and waiting for it to bitrot.

I'll try to find time to look more closely at the patches to
understand the use case in more detail, but it will take some time
since I'm currently focused on the -rc.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
  2021-08-06 14:38 ` [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Jonathan Nieder
@ 2021-08-06 16:26   ` Ævar Arnfjörð Bjarmason
  2021-08-06 20:40     ` Jonathan Nieder
  0 siblings, 1 reply; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-06 16:26 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan


On Fri, Aug 06 2021, Jonathan Nieder wrote:

> Hi,
>
> Ævar Arnfjörð Bjarmason wrote:
>
>> We're in the 2.33.0 rc cycle, and I'd hoped to have some more prep
>> work for this integrated already, but for now here's something
>> interesting I've been working on for early commentary/feedback.
>>
>> This adds the the ability to protocol v2 for servers to optimistically
>> pre-seed supporting clients with one or more bundles via a new
>> "bundle-uri" protocol extension.
>
> My initial thought here is that even though this includes a comparison
> to packfile URIs, I suspect you're underestimating them. :)
>
> Would it be possible to do the same pre-seeding using the packfile
> URIs protocol?  Nothing stops a server from sending more objects than
> the client asked for.  Is the issue that you want the client to be
> able to list "have"s based on that pack?  Can't the server obtain that
> same information at the same time as it obtains the bundle URL?

Hi. Thanks for taking a quick look.

I think that the changes to Documentation/technical/protocol-v2.txt in
01/13[1] and the Documentation/technical/bundle-uri.txt document added
in 13/13[2] should address these questions.

Or perhaps not, but they're my currently my best effort to explain the
differences between the two and how they interact. So I think it's best
to point to those instead of coming up with something in this reply,
which'll inevitably be an incomplete rewrite of much of that.

In short, there are use-cases that packfile-uri is inherently unsuitable
for, or rather changing the packfile-uri feature to support them would
pretty much make it indistinguishable from this bundle-uri mechanism,
which I think would just add more confusion to the protocol.

> The reason I ask is that this contains a number of differences
> relative to packfile URIs, most noticeably the use of bundles instead
> of packfiles as the format for the static content.  If we were
> starting from scratch and chose this design _instead_ of packfile URIs
> then that could make sense (though there are issues with the bundle
> format that we can also go into), but in a world where people are also
> using packfile URIs it makes for a kind of confusing UX.  Is a server
> operator expected to put both kinds of files on CDN and double their
> storage bill?  Is this meant as an alternative, a replacement, or
> something that combines well together with the packfile URIs feature?
> What does the intended end state look like?

The two are complimentary, i.e. it's meant as something that combines
well with packfile-uri. One (packfile-uri) is meant as a dumb
pre-seeding of objects from static URLs that happens pre-negotiation.

The other (packfile-uri) is a post-negotiation mechanism for giving a
client not only a PACK, but complimenting it with a URI, together they
two form one logical complete PACK. (I know that you know this, but for
anyone reading along...).

> Projects like chromium have been using packfile URIs in production for
> about 11 months now and it's been working well.  Because of that, I'd
> be interested in understanding its shortcomings and improving it in
> place --- or in other words, I want _you_ to benefit from them instead
> of having to create an alternative to them.  Alternatively, if the
> packfile URIs protocol is fundamentally flawed, then I'd like us to
> understand that early and act on it instead of creating a parallel
> alternative and waiting for it to bitrot.

I don't think there's any real shortcomings of packfile-uri in the senes
that it makes sense to improve it in the direction of this bundle-uri
mechanism, they're simply doing different things at different phases in
the dialog.

When packfile-uri was initially being discussed I was interested in
getting something like what this bundle-uri mechanism to be a part it,
but it was clear from discussions with Jonathan Tan that he was taking
packfile-uri in a very different direction. Links to those discussions:

    https://lore.kernel.org/git/87k1hv6eel.fsf@evledraar.gmail.com/
    https://lore.kernel.org/git/87pnpbsra8.fsf@evledraar.gmail.com/
    https://lore.kernel.org/git/87zh35okzy.fsf@evledraar.gmail.com/

A demo of this feature playing nice with a server that supports
packfile-uri, although I haven't actually gotten it to also send me a
packfile-uri if I pre-seed with bundles (perhaps it's only doing it on
full clones?):

    git -c fetch.uriprotocols=https \
        -c transfer.injectBundleURI="https://vm.nix.is/~avar/noindex/codesearch-git-master.bdl" \
        -c transfer.injectBundleURI="https://vm.nix.is/~avar/noindex/codesearch-git-recent.bdl" \
        clone --bare https://chromium.googlesource.com/chromiumos/codesearch.git /tmp/codesearch.git

Will emit:
    
    Cloning into bare repository '/tmp/codesearch.git'...
    Receiving bundle (1/2): 100% (322661/322661), 89.66 MiB | 0 bytes/s, done.
    Resolving deltas: 100% (142478/142478), done.
    Receiving bundle (2/2): 100% (69378/69378), 5.51 MiB | 0 bytes/s, done.
    Resolving deltas: 100% (52558/52558)completed with 4 local objects
    remote: Total 1958 (delta 4), reused 1958 (delta 4)
    Receiving objects: 100% (1958/1958), 1.60 MiB | 0 bytes/s, done.
    Resolving deltas: 100% (4/4)completed with 3 local objects
    Checking connectivity: 393997, done.

I.e. I produced a combination of bundles going up to a commit at the
start of this month, so that last "Receiving objects" is the only
dynamic fetching of objects via the negotiation.

1. https://lore.kernel.org/git/RFC-patch-01.13-4e1a0dbef5-20210805T150534Z-avarab@gmail.com/
2. https://lore.kernel.org/git/RFC-patch-13.13-1e657ed27a-20210805T150534Z-avarab@gmail.com/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
  2021-08-06 16:26   ` Ævar Arnfjörð Bjarmason
@ 2021-08-06 20:40     ` Jonathan Nieder
  2021-08-07  2:19       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 26+ messages in thread
From: Jonathan Nieder @ 2021-08-06 20:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan

Hi,

Ævar Arnfjörð Bjarmason wrote:

> Or perhaps not, but they're my currently my best effort to explain the
> differences between the two and how they interact. So I think it's best
> to point to those instead of coming up with something in this reply,
> which'll inevitably be an incomplete rewrite of much of that.
>
> In short, there are use-cases that packfile-uri is inherently unsuitable
> for, or rather changing the packfile-uri feature to support them would
> pretty much make it indistinguishable from this bundle-uri mechanism,
> which I think would just add more confusion to the protocol.

Hm.  I was hoping you might say more about those use cases --- e.g. is
there a concrete installation that wants to take advantage of this?
By focusing on the real-world example, we'd get a better shared
understanding of the underlying constraints.

After all, both are ways to reduce the bandwidth of a clone or other
large fetch operation by offloading the bulk of content to static
serving.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
  2021-08-06 20:40     ` Jonathan Nieder
@ 2021-08-07  2:19       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-07  2:19 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan


On Fri, Aug 06 2021, Jonathan Nieder wrote:

> Ævar Arnfjörð Bjarmason wrote:
>
>> Or perhaps not, but they're my currently my best effort to explain the
>> differences between the two and how they interact. So I think it's best
>> to point to those instead of coming up with something in this reply,
>> which'll inevitably be an incomplete rewrite of much of that.
>>
>> In short, there are use-cases that packfile-uri is inherently unsuitable
>> for, or rather changing the packfile-uri feature to support them would
>> pretty much make it indistinguishable from this bundle-uri mechanism,
>> which I think would just add more confusion to the protocol.
>
> Hm.  I was hoping you might say more about those use cases --- e.g. is
> there a concrete installation that wants to take advantage of this?
> By focusing on the real-world example, we'd get a better shared
> understanding of the underlying constraints.

I hacked this up for potential use on GitLab's infrastructure, mainly as
a mechanism to relieve CPU pressure on CI thundering herds.

Often you need full clones, and you sometimes need to do those from
scratch. When you've just had a push come in it's handy to convert those
to incremental fetches on top of a bundle you made recently.

It's not deployed on anything currently, it's just something I've been
hacking up. I'll be on vacation much of the rest of this month, the plan
is to start stressing it on real-world use-cases after that. I thought
I'd send this RFC first.

> After all, both are ways to reduce the bandwidth of a clone or other
> large fetch operation by offloading the bulk of content to static
> serving.

The support we have for packfile-uri in git.git now as far as the server
side goes, I think it's fair to say, fairly immature, I gather that
JGit's version is more advanced, and that's what's serving up things at
Google at e.g. https://chromium.googlesource.com.

I.e. right now for git-upload-pack you need to exhaustively enumerate
all the objects to exclude, although there's some on-list patches
recently for being able to supply tips.

More importantly your CDN reliability MUST match that of your git
server, otherwise your clone fails (as the server has already sent the
"other side" of the expected CDN pack).

Furthermore you MUST as the server be able to tell the client what pack
checksum on the CDN they should expect, which requires a very tight
coupling between git server and CDN.

You can't e.g. say "bootstrap with this bundle/pack" and point to
something like Debian's async-updated FTP network as a source. The
bootstrap data may or may not be there, and it may or may not be as
up-to-date as you'd like.

I think any proposed integration into git.git should mainly consider the
bulk of users, the big hosting providers can always run with their own
patches.

I think this approach opens up the potential for easier and therefore
wider CDN integration for git servers for providers that aren't one of
the big fish. E.g. your CDN generation can be daily cronjob, and the
server can point to it blindly and hope for the best. The client will
optimistically use the CDN data, and recover if not.

I think one thing I oversold is the "path to resumable clones",
i.e. that's all true, but I don't think that's really any harder go do
with packfile-uri in theory (in practice just serving up a sensible pack
with it is pretty tedious with git-upload-pack as it stands).

The negotiation aspect of it is also interesting and something I've been
experimenting with. I.e. the bundles are what the client sends as its
HAVE tips. This allows a server to anticipate what dialog newly cloning
clients are likely to have, and even pre-populate a cache of that
locally (it could even serve that diff up as a packfile-uri :).

Right now it retrieves each bundle in full before adding the tips to
negotiate to a subsequent dialog, but I've successfully experimental
locally with negotiating on the basis of objects we don't even have
yet. I.e. download the bundle(s), and as soon as we have the header fire
off the dialog with the server to get its PACK on the basis of those
promised tips.

It makes the recovery a bit more involved in case the bundles don't have
what we're after, but this allows us to disconnect really fast from the
server and twiddle our own thumbs while we finish downloading the
bundles to get the full clone. We already disconnect early in cases
where the bundle(s) already have what we're after.

This E-Mail got rather long, but hey, I did point you parts of this
series that cover some/most of this, and since it wasn't clear what if
anything of that you'd read ... :) Hopefully this summary is useful.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
  2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
                   ` (13 preceding siblings ...)
  2021-08-06 14:38 ` [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Jonathan Nieder
@ 2021-08-10 13:55 ` Derrick Stolee
  2021-08-23 13:28   ` Ævar Arnfjörð Bjarmason
  14 siblings, 1 reply; 26+ messages in thread
From: Derrick Stolee @ 2021-08-10 13:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Jonathan Nieder

On 8/5/2021 11:07 AM, Ævar Arnfjörð Bjarmason wrote:> We're in the 2.33.0 rc cycle, and I'd hoped to have some more prep
> work for this integrated already, but for now here's something
> interesting I've been working on for early commentary/feedback.

I've taking a brief look at the code, but mostly have thoughts about the
core idea based on reading the documentation. I'll keep my feedback here
restricted to that, especially because I saw some thoughts from Jonathan
that question the idea.

> This adds the the ability to protocol v2 for servers to optimistically
> pre-seed supporting clients with one or more bundles via a new
> "bundle-uri" protocol extension.

To make sure I understand things correctly, I'm going to rewrite the
description of the feature in my own words. Please correct me if I have
misread something.

This presents a new capability for protocol v2 that allows a server to
notify a client that they can download a bundle to bootstrap their object
database, and then come back to the origin server to "catch up" the
remaining new objects since that bundle via a fetch negotiation.

This idea is similar to the packfile-uri feature in that it offloads some
server load to a dumb data store (like a CDN) but differs in a few key
ways:

1. The bundle download does not need to be controlled by the server, and
   the server doesn't even need to know its contents. The client will
   discover the content and share its view in a later negotiation.

2. The packfile-uri could advertise a pack that contains only the large
   reachable blobs, and hence could use the "happy path" for bitmaps to
   compute a packfile containing all reachable objects except these large
   blobs. For the bundle-uri feature to mimic this, the blobs would need
   to be reachable from refs (perhaps shallow tags?) but it would not be
   clear that the negotiation would prevent redownloading those files.
   This is an area to explore and expand upon.

3. The bundle-uri feature is focused on "clone" scenarios, and I don't see
   how it could be used to help "fetch" scenarios. To be fair, I also have
   been vocal about how the packfile-uri feature is very hard to make
   helpful for the "fetch" scenario. The benefits to "clone" seem to be
   worth the effort alone. I think the bundle-api doubles-down on that
   being the core functionality that matters, and that is fine by me. It
   sacrifices the flexibility of the packfile-uri with a lower maintenance
   burden for servers that want to implement it.

The biggest benefit that I see is that the Git server does not need to
worry about whether the bundle has an exact set of data that it expects.
There is no timing issue about whether or not the exact packfile is seeded.
Instead, the server could have a fixed URL for its bundle and update its
contents (atomically) at any schedule without worrying about concurrent
clones being interrupted. From an operational perspective, I find the
bundle-uri a better way to offload "clone" load.

This also depends on that following "git fetch" being easy to serve. In
that sense, it can be beneficial to be somewhat aware of the bundles that
are being served: can we store the bundled refs as reachability bitmaps so
we have those available for the negotiation in the following "git fetch"
operations? This choice seems specific to how the server is deciding to
create these bundles.

It also presents interesting ideas for how to create the bundle to focus
on some core portion of the repo. The "thundering herd" of CI machines
that re-clone repos at a high rate will also be likely to use the
"--single-branch" option to reduce the refs that they will ask for in the
negotiation. In that sense, we won't want a snapshot of all refs at a
given time and instead might prefer a snapshot of the default branch or
some set of highly-active branches.

One question I saw Jonathan ask was "can we modify the packfile-uri
capability to be better?" I think this feature has enough different goals
that they could co-exist. That's the point of protocol v2, right? Servers
can implement and advertise the subset of functionality that they think is
best for their needs.

I hope my ramblings provide some amount of clarity to the discussion, but
also I intend to show support of the idea. If I was given the choice of
which feature to support (I mostly work on the client experience, so take
my choice with a grain of salt), then I would focus on implementing the
bundle-uri capability _before_ the packfile-uri capability. And that's the
best part: more options present more flexibility for different hosts to
make different decisions.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 01/13] serve: add command to advertise bundle URIs
  2021-08-05 15:07 ` [RFC PATCH 01/13] serve: add command to advertise bundle URIs Ævar Arnfjörð Bjarmason
@ 2021-08-10 13:58   ` Derrick Stolee
  2021-08-23 13:25     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 26+ messages in thread
From: Derrick Stolee @ 2021-08-10 13:58 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan

On 8/5/2021 11:07 AM, Ævar Arnfjörð Bjarmason wrote:
...
> +bundle-uri CLIENT AND SERVER EXPECTATIONS
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +The advertised bundles MUST contain one or more reference tips for use
> +by the client. Bundles that are not self-contained MUST use the
> +standard "-" prefixes in the bundle format to indicate their
> +prerequisites. I.e. they must be in the standard format "git bundle
> +create" would create.
> +
> +If after an `ls-refs` the client finds that the ref tips it wants can
> +be retrieved entirety from advertised bundle(s), it MAY
> +disconnect. The results of such a "clone" or "fetch" should be
> +indistinguishable from the state attained without using bundle-uri.
> +
> +The client MAY also keep the connection open pending download of the
> +bundle-uris, e.g. should on or more downloads (or their validation)
> +fail.

The only technical thought I had (so far) about this proposal was that
leaving the connection open while downloading the bundle would leave
unnecessary load on the servers when no communication is happening.
There is a cost to keeping an open SSH connection, so here it would be
good to at least have the Git client close the connection after
getting a 200 response from the bundle (but not waiting for all of its
contents).

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 01/13] serve: add command to advertise bundle URIs
  2021-08-10 13:58   ` Derrick Stolee
@ 2021-08-23 13:25     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-23 13:25 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan


On Tue, Aug 10 2021, Derrick Stolee wrote:

> On 8/5/2021 11:07 AM, Ævar Arnfjörð Bjarmason wrote:
> ...
>> +bundle-uri CLIENT AND SERVER EXPECTATIONS
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +The advertised bundles MUST contain one or more reference tips for use
>> +by the client. Bundles that are not self-contained MUST use the
>> +standard "-" prefixes in the bundle format to indicate their
>> +prerequisites. I.e. they must be in the standard format "git bundle
>> +create" would create.
>> +
>> +If after an `ls-refs` the client finds that the ref tips it wants can
>> +be retrieved entirety from advertised bundle(s), it MAY
>> +disconnect. The results of such a "clone" or "fetch" should be
>> +indistinguishable from the state attained without using bundle-uri.
>> +
>> +The client MAY also keep the connection open pending download of the
>> +bundle-uris, e.g. should on or more downloads (or their validation)
>> +fail.
>
> The only technical thought I had (so far) about this proposal was that
> leaving the connection open while downloading the bundle would leave
> unnecessary load on the servers when no communication is happening.
> There is a cost to keeping an open SSH connection, so here it would be
> good to at least have the Git client close the connection after
> getting a 200 response from the bundle (but not waiting for all of its
> contents).

Thanks. Yes it's something I'll have to fix. I was hoping that I'd get
away with it for an initial implementation, but e.g. using
transfer.injectBundleURI to bootstrap chromium.git's repo from a bundle
will take so long that Google's server will give up and hang up on you.

I wonder if it's something the transport layer should be doing in
general to resume connections if they go stale if it's at a point of
clean separation in the dialog, but in any case I'll need it for
bundle-uri.

Closing the connection is also going to be more expensive in some cases,
e.g. if the bundle takes 1s we'll open/close/download
bundle-uri/open/close the connection, instead of of open/download
bundle-uri/close. I wonder if anyone cares though, we can always apply
some heuristic later I guess...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
  2021-08-10 13:55 ` Derrick Stolee
@ 2021-08-23 13:28   ` Ævar Arnfjörð Bjarmason
  2021-08-24  2:03     ` Derrick Stolee
  0 siblings, 1 reply; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-23 13:28 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Jonathan Nieder


On Tue, Aug 10 2021, Derrick Stolee wrote:

> On 8/5/2021 11:07 AM, Ævar Arnfjörð Bjarmason wrote:> We're in the 2.33.0 rc cycle, and I'd hoped to have some more prep
>> work for this integrated already, but for now here's something
>> interesting I've been working on for early commentary/feedback.
>
> I've taking a brief look at the code, but mostly have thoughts about the
> core idea based on reading the documentation. I'll keep my feedback here
> restricted to that, especially because I saw some thoughts from Jonathan
> that question the idea.

Thanks a lot for taking a look, and sorry about the very late reply. I
was away from the computer for a while (on vacation).

>> This adds the the ability to protocol v2 for servers to optimistically
>> pre-seed supporting clients with one or more bundles via a new
>> "bundle-uri" protocol extension.
>
> To make sure I understand things correctly, I'm going to rewrite the
> description of the feature in my own words. Please correct me if I have
> misread something.

Thanks. I'll probably try to steal some of this summary...

> This presents a new capability for protocol v2 that allows a server to
> notify a client that they can download a bundle to bootstrap their object
> database, and then come back to the origin server to "catch up" the
> remaining new objects since that bundle via a fetch negotiation.
>
> This idea is similar to the packfile-uri feature in that it offloads some
> server load to a dumb data store (like a CDN) but differs in a few key
> ways:
>
> 1. The bundle download does not need to be controlled by the server, and
>    the server doesn't even need to know its contents. The client will
>    discover the content and share its view in a later negotiation.

Correct.

> 2. The packfile-uri could advertise a pack that contains only the large
>    reachable blobs, and hence could use the "happy path" for bitmaps to
>    compute a packfile containing all reachable objects except these large
>    blobs. For the bundle-uri feature to mimic this, the blobs would need
>    to be reachable from refs (perhaps shallow tags?) but it would not be
>    clear that the negotiation would prevent redownloading those files.
>    This is an area to explore and expand upon.

Correct that this isn't handled currently.

The few paragraphs below are mostly a (hopefully helpful) digression.

Despite what I said in
https://lore.kernel.org/git/87h7g2zd8f.fsf@evledraar.gmail.com/ about
packfile-uri and bundle-uri playing nicely together there are cases
where they won't. They won't error out or anything, just that combining
them in those cases won't make much sense.

In particular if you now use packfile-uri to exclude one giant blob from
your early history, you can't use bundle-uri to serve up that early
history without also including that blob (as we'll expect the bundles to
be connected).

That property isn't inherent though, in the same way that we're clever
about .gitmodules checking we could make note of any missing content and
do a full connectivity check at the end.

But if you're using packfile-uri for that "one big blob on a CDN" I
don't see why you wouldn't be happy to serve it up in your bundle, so I
don't see much of a practical reason for further work in that area.

And yes, you do run into limitations in the general negotiation
mechanism eventually (which to be fair, we probably wouldn't care about
before bundle-uri). E.g. we can't say we HAVE a blob OID currently
(server expects commit OID).

I don't see why it *couldn't* be supported though, and would be handled
under the hood with the same happy path for bitmaps that we use for
packfile-uri exclusions now...

> 3. The bundle-uri feature is focused on "clone" scenarios, and I don't see
>    how it could be used to help "fetch" scenarios. To be fair, I also have
>    been vocal about how the packfile-uri feature is very hard to make
>    helpful for the "fetch" scenario. The benefits to "clone" seem to be
>    worth the effort alone. I think the bundle-api doubles-down on that
>    being the core functionality that matters, and that is fine by me. It
>    sacrifices the flexibility of the packfile-uri with a lower maintenance
>    burden for servers that want to implement it.

Not correct, I'm going to have it handle incremental fetches.

What *is* correct is that the initial RFC patches here entirely punt on
it (I only hook them into clone.c, no fetch.c).

I focused on the "clone" case to get to a MVP earlier, and as you note I
think in practice most users of this will want to only use it for clone
bootstrapping, and won't care so much about the incremental case.

But for the "fetch" case we'd simply start fetching the N bundles in
parallel, and read their headers, then abort fetching any whose header
declares tips that we already have.

So e.g. for a repo with a "big" bundle that's 1 week old, and 7
incremental bundles for the last 7 days of updates we'd quickly find
that we only need the last 3 if we updated 4 days ago, based on only
fetching their headers.

The client spec is very liberal on "MAY" instead of "MUST" for
supporting clients. E.g. a client might sensibly conclude that their
most recent commits are recent enough, and that it would probably be a
waste of time to fetch the headers of the pointed-to bundles, and just
do a normal fetch instead.

The response format is also currently:

    <URI>[ <optional key-value>*]

And nothing uses the "optional key-value". One of the main things I had
in mind was to support something like (none of these key-values are
specified currently):

    https://example.com/big.bundle tip=deadbeef1
    https://example.com/incremental-1.bundle prereq=deadbeef1 tip=deadbeef2
    https://example.com/incremental-2.bundle prereq=deadbeef2 tip=deadbeef3

Or:

    https://example.com/big.bundle type=complete
    https://example.com/incremental-1.bundle type=incremental
    https://example.com/incremental-2.bundle type=incremental

I.e. allow the server to up-front to optionally specify some details
about the pointed-to-bundles, purely so the client can avoid the work
and roundtrip of trying to fetch the headers of N pointed-to bundles
just to see which if any it needs for a clone/fetch.

This peeking behind the curtain would always yield to the source of
truth of inpecting the bundle header/content itself, so it would be
purely an optional client hint. It's also not as flexible as a full
bundle header (e.g. can't supply N tips or N prereqs, but for the common
case of "one big history" it should work nicely.

But maybe this use for key-value headers would be a premature
optimization, I haven't actually benchmarked it...

> The biggest benefit that I see is that the Git server does not need to
> worry about whether the bundle has an exact set of data that it expects.
> There is no timing issue about whether or not the exact packfile is seeded.
> Instead, the server could have a fixed URL for its bundle and update its
> contents (atomically) at any schedule without worrying about concurrent
> clones being interrupted. From an operational perspective, I find the
> bundle-uri a better way to offload "clone" load.

Thanks, yeah. That's pretty the main reason I hacked this up when we
already have packefile-uri, being able to e.g. have a dumb cronjob
(which may not even run) optimistically handle your CDN offloading.

> This also depends on that following "git fetch" being easy to serve. In
> that sense, it can be beneficial to be somewhat aware of the bundles that
> are being served: can we store the bundled refs as reachability bitmaps so
> we have those available for the negotiation in the following "git fetch"
> operations? This choice seems specific to how the server is deciding to
> create these bundles.

Yeah, one of the first things I'm experimenting with on the server side
(but haven't yet) is to generate bundles on the one hand, and being
aware of the last generated bundle do the equivalent of pre-populating
the pack-objects hook cache in anticipation of pre-seeded clients coming
in.

> It also presents interesting ideas for how to create the bundle to focus
> on some core portion of the repo. The "thundering herd" of CI machines
> that re-clone repos at a high rate will also be likely to use the
> "--single-branch" option to reduce the refs that they will ask for in the
> negotiation. In that sense, we won't want a snapshot of all refs at a
> given time and instead might prefer a snapshot of the default branch or
> some set of highly-active branches.

Yes, the current RFC MVP I have is rather dumb about this, but it's an
easy follow-up change along with the "fetch" support to e.g. have a
server have one "big" bundle of just its main branch in anticipation of
--single-branch --no-tags fetches, and then the next bundle is the delta
between that and the other branches.

So the first is master, the second is master..next, master..seen + tags
etc. The client from ls-refs what tips it wants, so it can then ignore
or retrieve the bundles as appropriate.

A server operator can then play around with what they think might be the
optimal arrangement and split-up of bundles.

They obviously can't handle all cases, e.g. if you have a bundle with
"master" and clone just git.git's "todo" branch with these current
patches you'll get redundant objects you don't need currently, but a
slightly smarter client will be ignoring that bundle-uri.

> One question I saw Jonathan ask was "can we modify the packfile-uri
> capability to be better?" I think this feature has enough different goals
> that they could co-exist. That's the point of protocol v2, right? Servers
> can implement and advertise the subset of functionality that they think is
> best for their needs.

*Nod*, I also think as shown with these patches the overall complexity
after setting up the scaffolding for a new feature isn't very high, and
much of it is going away in some generally useful prep patches I'm
trying to get in...

> I hope my ramblings provide some amount of clarity to the discussion, but
> also I intend to show support of the idea. If I was given the choice of
> which feature to support (I mostly work on the client experience, so take
> my choice with a grain of salt), then I would focus on implementing the
> bundle-uri capability _before_ the packfile-uri capability. And that's the
> best part: more options present more flexibility for different hosts to
> make different decisions.

Thanks again. I'll keep you CC'd on future submissions if that's OK.

One thing I could really use to move this forward is code review of some
of the outstanding serieses I've got, in particular these two are
immediately applicable to this one. It's based on the former, and I
needed to steal one patch from the latter:

https://lore.kernel.org/git/cover-v4-00.10-00000000000-20210805T011823Z-avarab@gmail.com/
https://lore.kernel.org/git/cover-v2-0.4-00000000000-20210823T110136Z-avarab@gmail.com/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
  2021-08-23 13:28   ` Ævar Arnfjörð Bjarmason
@ 2021-08-24  2:03     ` Derrick Stolee
  2021-08-24 22:00       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 26+ messages in thread
From: Derrick Stolee @ 2021-08-24  2:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Jonathan Nieder

On 8/23/2021 9:28 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Aug 10 2021, Derrick Stolee wrote:
> 
>> On 8/5/2021 11:07 AM, Ævar Arnfjörð Bjarmason wrote:> We're in the 2.33.0 rc cycle, and I'd hoped to have some more prep

(I'll skip the parts where you agree with me.)

>> 3. The bundle-uri feature is focused on "clone" scenarios, and I don't see
>>    how it could be used to help "fetch" scenarios. To be fair, I also have
>>    been vocal about how the packfile-uri feature is very hard to make
>>    helpful for the "fetch" scenario. The benefits to "clone" seem to be
>>    worth the effort alone. I think the bundle-api doubles-down on that
>>    being the core functionality that matters, and that is fine by me. It
>>    sacrifices the flexibility of the packfile-uri with a lower maintenance
>>    burden for servers that want to implement it.
> 
> Not correct, I'm going to have it handle incremental fetches.
> 
> What *is* correct is that the initial RFC patches here entirely punt on
> it (I only hook them into clone.c, no fetch.c).
> 
> I focused on the "clone" case to get to a MVP earlier, and as you note I
> think in practice most users of this will want to only use it for clone
> bootstrapping, and won't care so much about the incremental case.
> 
> But for the "fetch" case we'd simply start fetching the N bundles in
> parallel, and read their headers, then abort fetching any whose header
> declares tips that we already have.

To save network time, we would need to read data as it is streaming
from the network and stop the download when we realize we don't need
the bundle? This seems wasteful, but I'm willing to see it work in
reality before judging it too harshly.

It would be better if the origin Git server could see the tips that
the user has and then only recommend bundles that serve new data.
That requires a closer awareness of which bundles contain which
content.

> So e.g. for a repo with a "big" bundle that's 1 week old, and 7
> incremental bundles for the last 7 days of updates we'd quickly find
> that we only need the last 3 if we updated 4 days ago, based on only
> fetching their headers.

I can attest that using time-based packs can be a good way to catch
up based on pre-computed pack-files instead of generating one on
demand. The GVFS protocol has a "prefetch" endpoint that gives all
pre-computed pack-files since a given timestamp. (One big difference
is that those pack-files contain only commits and trees, not blobs.
We should ensure that your proposal can extend to bundles that serve
filtered repos such as --filter=blob:none.)

The thing that works especially well with the GVFS protocol is that
if we are missing a reachable object from one of those pack-files,
then it is downloaded as a one-off request _when needed_. This allows
the pack-files to be computed by independent cache servers that have
their own clocks and compute their own timestamps and generate these
pack-files as "new objects I fetched since the last timestamp".

This is all to say that your timestamp-based approach will work as
long as you are very careful in how the incremental bundles are
constructed. There can really only be one list of bundles at a time,
and somehow we need those servers to atomically swap their list of
bundles when they are reorganized.

> The client spec is very liberal on "MAY" instead of "MUST" for
> supporting clients. E.g. a client might sensibly conclude that their
> most recent commits are recent enough, and that it would probably be a
> waste of time to fetch the headers of the pointed-to bundles, and just
> do a normal fetch instead.

Sensible. I wouldn't expect anything less.

> The response format is also currently:
> 
>     <URI>[ <optional key-value>*]
> 
> And nothing uses the "optional key-value". One of the main things I had
> in mind was to support something like (none of these key-values are
> specified currently):
> 
>     https://example.com/big.bundle tip=deadbeef1
>     https://example.com/incremental-1.bundle prereq=deadbeef1 tip=deadbeef2
>     https://example.com/incremental-2.bundle prereq=deadbeef2 tip=deadbeef3
> 
> Or:
> 
>     https://example.com/big.bundle type=complete
>     https://example.com/incremental-1.bundle type=incremental
>     https://example.com/incremental-2.bundle type=incremental
> 
> I.e. allow the server to up-front to optionally specify some details
> about the pointed-to-bundles, purely so the client can avoid the work
> and roundtrip of trying to fetch the headers of N pointed-to bundles
> just to see which if any it needs for a clone/fetch.

Ok, this seems like a reasonable way forward. If the bundles are
truly focused on data within a core "trunk" (say, the reflog of
the default branch) then this works. It gets more complicated for
repos with many long-lived branches that merge together over the
span of weeks. I'm talking about extreme outliers like the Windows
OS repository, so take this concern with a grain of salt. The
point is that in that world, a single tip isn't enough sometimes.

...
>> One question I saw Jonathan ask was "can we modify the packfile-uri
>> capability to be better?" I think this feature has enough different goals
>> that they could co-exist. That's the point of protocol v2, right? Servers
>> can implement and advertise the subset of functionality that they think is
>> best for their needs.
> 
> *Nod*, I also think as shown with these patches the overall complexity
> after setting up the scaffolding for a new feature isn't very high, and
> much of it is going away in some generally useful prep patches I'm
> trying to get in...
> 
>> I hope my ramblings provide some amount of clarity to the discussion, but
>> also I intend to show support of the idea. If I was given the choice of
>> which feature to support (I mostly work on the client experience, so take
>> my choice with a grain of salt), then I would focus on implementing the
>> bundle-uri capability _before_ the packfile-uri capability. And that's the
>> best part: more options present more flexibility for different hosts to
>> make different decisions.
> 
> Thanks again. I'll keep you CC'd on future submissions if that's OK.

Sounds good. I'll try to prioritize review.

> One thing I could really use to move this forward is code review of some
> of the outstanding serieses I've got, in particular these two are
> immediately applicable to this one. It's based on the former, and I
> needed to steal one patch from the latter:
> 
> https://lore.kernel.org/git/cover-v4-00.10-00000000000-20210805T011823Z-avarab@gmail.com/
> https://lore.kernel.org/git/cover-v2-0.4-00000000000-20210823T110136Z-avarab@gmail.com/

I'll see what I can contribute here.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 13/13] bundle-uri docs: add design notes
  2021-08-05 15:07 ` [RFC PATCH 13/13] bundle-uri docs: add design notes Ævar Arnfjörð Bjarmason
@ 2021-08-24 21:48   ` brian m. carlson
  2021-08-24 22:33     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 26+ messages in thread
From: brian m. carlson @ 2021-08-24 21:48 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan

[-- Attachment #1: Type: text/plain, Size: 4159 bytes --]

On 2021-08-05 at 15:07:29, Ævar Arnfjörð Bjarmason wrote:
> Add a design doc for the bundle-uri protocol extension to go along
> with the packfile-uri extension added in cd8402e0fd8 (Documentation:
> add Packfile URIs design doc, 2020-06-10).
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  Documentation/technical/bundle-uri.txt  | 119 ++++++++++++++++++++++++
>  Documentation/technical/protocol-v2.txt |   5 +
>  2 files changed, 124 insertions(+)
>  create mode 100644 Documentation/technical/bundle-uri.txt
> 
> diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
> new file mode 100644
> index 0000000000..5ae9a15eaf
> --- /dev/null
> +++ b/Documentation/technical/bundle-uri.txt
> @@ -0,0 +1,119 @@
> +Bundle URI Design Notes
> +=======================
> +
> +Protocol
> +--------
> +
> +See `bundle-uri` in the link:protocol-v2.html[protocol-v2]
> +documentation for a discussion of the bundle-uri command, and the
> +expectations of clients and servers.
> +
> +This document is a a more general discussion of how the `bundle-uri`
> +command fits in with the rest of the git ecosystem, its design goals
> +and non-goals, comparison to alternatives etc.
> +
> +Comparison with Packfile URIs
> +-----------------------------
> +
> +There is a similar "Packfile URIs" facility, see the
> +link:packfile-uri.html[packfile-uri] documentation for details.
> +
> +The Packfile URIs facility requires a much closer cooperation between
> +CDN and server than the bundle URI facility.
> +
> +I.e. the server MUST know what objects exist in the packfile URI it's
> +pointing to, as well as its pack checksum. Failure to do so will not
> +only result in a client error (the packfile hash won't match), but
> +even if it got past that would likely result in a corrupt repository
> +with tips pointing to unreachable objects.
> +
> +By comparison the bundle URIs are meant to be a "dumb" solution
> +friendly to e.g. having a weekly cronjob take a snapshot of a git
> +repository, that snapshot being uploaded to a network of FTP mirrors
> +(which may be inconsistent or out of date).
> +
> +The server does not need to know what state the side-channel download
> +is at, because the client will first validate it, and then optionally
> +negotiate with the server using what it discovers there.
> +
> +Using the local `transfer.injectBundleURI` configuration variable (see
> +linkgit:git-config[1]) the `bundle-uri` mechanism doesn't even need
> +the server to support it.

One thing I'm not seeing with this doc that I brought up during the
packfile URI discussion is that HTTPS is broken for a decent number of
Git users, and for them SSH is the only viable option.  This is true for
users of certain antivirus programs on Windows, as well as people who
have certain corporate proxies in their workplace.  For those people, as
soon as the server offers a bundle URI, their connection will stop
working.

I know that you're probably thinking, "Gee, how often does that happen?"
but judging by the number of people on StackOverflow, this is actually
very common.  The antivirus programs that break Git are actually not
uncommon and they are widely deployed on corporate machines, plus the
fact that lots of companies sell TLS intercepting proxies, which are
almost always broken in this way.  Many of these users don't even know
what's going on, so they simply lack the knowledge to take any action or
ask their network administrator for a fix.  For them, HTTPS just doesn't
work with Git, while it does for a web browser.

So we will probably want to make this behavior opt-in with a config
option for SSH, or just not available for SSH at all, so that we don't
magically break users on upgrade who are relying on the SSH protocol not
using HTTPS under the hood[0], especially the users who won't even know
what's wrong.

[0] I can't tell you how many times users have complained about the Git
LFS SSH protocol also using HTTPS implicitly.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
  2021-08-24  2:03     ` Derrick Stolee
@ 2021-08-24 22:00       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-24 22:00 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan, Jonathan Nieder


On Mon, Aug 23 2021, Derrick Stolee wrote:

> On 8/23/2021 9:28 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Tue, Aug 10 2021, Derrick Stolee wrote:
>> 
>>> On 8/5/2021 11:07 AM, Ævar Arnfjörð Bjarmason wrote:> We're in the 2.33.0 rc cycle, and I'd hoped to have some more prep
>
> (I'll skip the parts where you agree with me.)
>
>>> 3. The bundle-uri feature is focused on "clone" scenarios, and I don't see
>>>    how it could be used to help "fetch" scenarios. To be fair, I also have
>>>    been vocal about how the packfile-uri feature is very hard to make
>>>    helpful for the "fetch" scenario. The benefits to "clone" seem to be
>>>    worth the effort alone. I think the bundle-api doubles-down on that
>>>    being the core functionality that matters, and that is fine by me. It
>>>    sacrifices the flexibility of the packfile-uri with a lower maintenance
>>>    burden for servers that want to implement it.
>> 
>> Not correct, I'm going to have it handle incremental fetches.
>> 
>> What *is* correct is that the initial RFC patches here entirely punt on
>> it (I only hook them into clone.c, no fetch.c).
>> 
>> I focused on the "clone" case to get to a MVP earlier, and as you note I
>> think in practice most users of this will want to only use it for clone
>> bootstrapping, and won't care so much about the incremental case.
>> 
>> But for the "fetch" case we'd simply start fetching the N bundles in
>> parallel, and read their headers, then abort fetching any whose header
>> declares tips that we already have.
>
> To save network time, we would need to read data as it is streaming
> from the network and stop the download when we realize we don't need
> the bundle? This seems wasteful, but I'm willing to see it work in
> reality before judging it too harshly.

The common case would be making a Range request to get (what's most
likely to contain) the header, so not deciding on the fly while we're
redundantly getting data we may not use from the server.

But yes, I think for that incremental case it's likely to be better to
skip it entirely for small-ish fetches, both for client & server.

An out by default here is that a server operator would be unlikely to
configure this if they don't suspect that it's helping their use-case.

I also wonder if I shouldn't make it explicit from day 1 in the spec
that either the first listed bundle is always one without prereqs, so we
could short-circuit this on "fetch" for what's likely to be the common
case of a server operator who's only interested in this for "clone",
i.e. without any incremental bundles.

It would mostly close the door (or rather, result in redundant requests
for) someone who wants incremental-only bundles, but I can't imagine
anyone using this at all would want that, but maybe...

> It would be better if the origin Git server could see the tips that
> the user has and then only recommend bundles that serve new data.
> That requires a closer awareness of which bundles contain which
> content.

Once we're imagining this happening after the user sends over the tips
they have we're really starting to talk about something that's pretty
much indistinguishable from packfile-uri.

I.e. that happen after ls-refs & fetch negotiation, whereas bundle-uri
happens after ls-refs, but pre-negotiation.

"Pretty much" because we can imagine that instead of the server sending
the packfile-uri they'd send some hybrid of bundle-uri and packfile-uri
where instead of sending a PACK they say "oh if you want these tips
here's a bundle that has most/all of it, fetch that and get back to me".

Except the "get back to me" probably can't happen, the client would
either need to keep the server waiting as they fetch that
hybrid-bundle-uri, or disconnect, fetch it, and then re-do the
negotiation.

I can see how it might be useful if you're generating incremental
bundles anyway, but I think mostly I'd just say "well, use packfile-uri
then", but maybe I'm not being imaginative enough ... :)

>> So e.g. for a repo with a "big" bundle that's 1 week old, and 7
>> incremental bundles for the last 7 days of updates we'd quickly find
>> that we only need the last 3 if we updated 4 days ago, based on only
>> fetching their headers.
>
> I can attest that using time-based packs can be a good way to catch
> up based on pre-computed pack-files instead of generating one on
> demand. The GVFS protocol has a "prefetch" endpoint that gives all
> pre-computed pack-files since a given timestamp. (One big difference
> is that those pack-files contain only commits and trees, not blobs.
> We should ensure that your proposal can extend to bundles that serve
> filtered repos such as --filter=blob:none.)

I may be wrong, and I'm at all familiar with the guts of these filtering
mechanisms, but would that even be possible? Doesn't that now involve
the server sending over a filtered PACK payload without the blobs per
client request?

Whereas a bundle per my understanding must always be complete as far as
a commit range goes, i.e. it can have prereqs, but those prereqs are
commits, not arbitrary OIDs. But maybe arbitrary OIDs (i.e. the OIDs of
the missing blob/trees etc.) would work, I haven't tried to manually
craft such a thing...

> The thing that works especially well with the GVFS protocol is that
> if we are missing a reachable object from one of those pack-files,
> then it is downloaded as a one-off request _when needed_. This allows
> the pack-files to be computed by independent cache servers that have
> their own clocks and compute their own timestamps and generate these
> pack-files as "new objects I fetched since the last timestamp".

I *think* that's covered by the above, but not sure, i.e. we're still
discussing these filtered PACKs I think...

> This is all to say that your timestamp-based approach will work as
> long as you are very careful in how the incremental bundles are
> constructed. There can really only be one list of bundles at a time,
> and somehow we need those servers to atomically swap their list of
> bundles when they are reorganized.

The current implementation theoretically racy in that we could have
bundle uris on the server defined in different parts of config (e.g. via
includes), and have a A..F "big", bundle, then F..G, G..H etc, but in
the middle of a swap-out a client might see A..E, F..G.

I think in practice the general way we read/write config in one location
should alleviate that, and for any remaining races the client will
handle it gracefully, i.e. we'll fetch that A..E try to apply (or
rather, read fail on reading the prereqs in the header) F..G on top of
it, fail and just fall back on a negotiation from E, rather than the
expected G.

>> The client spec is very liberal on "MAY" instead of "MUST" for
>> supporting clients. E.g. a client might sensibly conclude that their
>> most recent commits are recent enough, and that it would probably be a
>> waste of time to fetch the headers of the pointed-to bundles, and just
>> do a normal fetch instead.
>
> Sensible. I wouldn't expect anything less.

*Nod*, but it's useful to note that by comparison with packfile-uri, the
plan here is to have the server and its CDN not have a hard dependency
on one another. I.e. with packfile-uri if your server and CDN have a 99%
availability, your total availability is .99^2 =~ 98.01%.

So it's inherently a softer and more optimistic way to handle failures,
and allows server operators who aren't 100% sure of their CDN never
failing them (least clones start failing) to experiment with using one.

>> The response format is also currently:
>> 
>>     <URI>[ <optional key-value>*]
>> 
>> And nothing uses the "optional key-value". One of the main things I had
>> in mind was to support something like (none of these key-values are
>> specified currently):
>> 
>>     https://example.com/big.bundle tip=deadbeef1
>>     https://example.com/incremental-1.bundle prereq=deadbeef1 tip=deadbeef2
>>     https://example.com/incremental-2.bundle prereq=deadbeef2 tip=deadbeef3
>> 
>> Or:
>> 
>>     https://example.com/big.bundle type=complete
>>     https://example.com/incremental-1.bundle type=incremental
>>     https://example.com/incremental-2.bundle type=incremental
>> 
>> I.e. allow the server to up-front to optionally specify some details
>> about the pointed-to-bundles, purely so the client can avoid the work
>> and roundtrip of trying to fetch the headers of N pointed-to bundles
>> just to see which if any it needs for a clone/fetch.
>
> Ok, this seems like a reasonable way forward. If the bundles are
> truly focused on data within a core "trunk" (say, the reflog of
> the default branch) then this works. It gets more complicated for
> repos with many long-lived branches that merge together over the
> span of weeks. I'm talking about extreme outliers like the Windows
> OS repository, so take this concern with a grain of salt. The
> point is that in that world, a single tip isn't enough sometimes.

*Nod*, but note that this would just be a shortcut over using the Range
request to get the header discussed above.

For the Windows repository you could presumably (using git.git as an
example) have bundles for:

    ..master (the big bundle, no prereqs)
    master..next
    next..ds/add-with-sparse-index
    next..ab/serve-cleanup

I.e. ds/add-with-sparse-index are two divergent branches from "next"
that we'd like to get.

A bundle header could of course have N branches and N prereqs, but for
the prereq/tip suggested above we'd handle the common case of a small
number of "big topics" that are likely to diverge quite a bit, which I
imagine the use-case you're noting is like.

>>> One question I saw Jonathan ask was "can we modify the packfile-uri
>>> capability to be better?" I think this feature has enough different goals
>>> that they could co-exist. That's the point of protocol v2, right? Servers
>>> can implement and advertise the subset of functionality that they think is
>>> best for their needs.
>> 
>> *Nod*, I also think as shown with these patches the overall complexity
>> after setting up the scaffolding for a new feature isn't very high, and
>> much of it is going away in some generally useful prep patches I'm
>> trying to get in...
>> 
>>> I hope my ramblings provide some amount of clarity to the discussion, but
>>> also I intend to show support of the idea. If I was given the choice of
>>> which feature to support (I mostly work on the client experience, so take
>>> my choice with a grain of salt), then I would focus on implementing the
>>> bundle-uri capability _before_ the packfile-uri capability. And that's the
>>> best part: more options present more flexibility for different hosts to
>>> make different decisions.
>> 
>> Thanks again. I'll keep you CC'd on future submissions if that's OK.
>
> Sounds good. I'll try to prioritize review.
>
>> One thing I could really use to move this forward is code review of some
>> of the outstanding serieses I've got, in particular these two are
>> immediately applicable to this one. It's based on the former, and I
>> needed to steal one patch from the latter:
>> 
>> https://lore.kernel.org/git/cover-v4-00.10-00000000000-20210805T011823Z-avarab@gmail.com/
>> https://lore.kernel.org/git/cover-v2-0.4-00000000000-20210823T110136Z-avarab@gmail.com/
>
> I'll see what I can contribute here.

Thanks, the reviews on the "unbundle progress" are already very
useful...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 13/13] bundle-uri docs: add design notes
  2021-08-24 21:48   ` brian m. carlson
@ 2021-08-24 22:33     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 26+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-24 22:33 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Jeff King, Patrick Steinhardt, Christian Couder, Albert Cui,
	Jonathan Tan


On Tue, Aug 24 2021, brian m. carlson wrote:

> [[PGP Signed Part:Undecided]]
> On 2021-08-05 at 15:07:29, Ævar Arnfjörð Bjarmason wrote:
>> Add a design doc for the bundle-uri protocol extension to go along
>> with the packfile-uri extension added in cd8402e0fd8 (Documentation:
>> add Packfile URIs design doc, 2020-06-10).
>> 
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  Documentation/technical/bundle-uri.txt  | 119 ++++++++++++++++++++++++
>>  Documentation/technical/protocol-v2.txt |   5 +
>>  2 files changed, 124 insertions(+)
>>  create mode 100644 Documentation/technical/bundle-uri.txt
>> 
>> diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
>> new file mode 100644
>> index 0000000000..5ae9a15eaf
>> --- /dev/null
>> +++ b/Documentation/technical/bundle-uri.txt
>> @@ -0,0 +1,119 @@
>> +Bundle URI Design Notes
>> +=======================
>> +
>> +Protocol
>> +--------
>> +
>> +See `bundle-uri` in the link:protocol-v2.html[protocol-v2]
>> +documentation for a discussion of the bundle-uri command, and the
>> +expectations of clients and servers.
>> +
>> +This document is a a more general discussion of how the `bundle-uri`
>> +command fits in with the rest of the git ecosystem, its design goals
>> +and non-goals, comparison to alternatives etc.
>> +
>> +Comparison with Packfile URIs
>> +-----------------------------
>> +
>> +There is a similar "Packfile URIs" facility, see the
>> +link:packfile-uri.html[packfile-uri] documentation for details.
>> +
>> +The Packfile URIs facility requires a much closer cooperation between
>> +CDN and server than the bundle URI facility.
>> +
>> +I.e. the server MUST know what objects exist in the packfile URI it's
>> +pointing to, as well as its pack checksum. Failure to do so will not
>> +only result in a client error (the packfile hash won't match), but
>> +even if it got past that would likely result in a corrupt repository
>> +with tips pointing to unreachable objects.
>> +
>> +By comparison the bundle URIs are meant to be a "dumb" solution
>> +friendly to e.g. having a weekly cronjob take a snapshot of a git
>> +repository, that snapshot being uploaded to a network of FTP mirrors
>> +(which may be inconsistent or out of date).
>> +
>> +The server does not need to know what state the side-channel download
>> +is at, because the client will first validate it, and then optionally
>> +negotiate with the server using what it discovers there.
>> +
>> +Using the local `transfer.injectBundleURI` configuration variable (see
>> +linkgit:git-config[1]) the `bundle-uri` mechanism doesn't even need
>> +the server to support it.
>
> One thing I'm not seeing with this doc that I brought up during the
> packfile URI discussion is that HTTPS is broken for a decent number of
> Git users, and for them SSH is the only viable option.  This is true for
> users of certain antivirus programs on Windows, as well as people who
> have certain corporate proxies in their workplace.  For those people, as
> soon as the server offers a bundle URI, their connection will stop
> working.
>
> I know that you're probably thinking, "Gee, how often does that happen?"
> but judging by the number of people on StackOverflow, this is actually
> very common.  The antivirus programs that break Git are actually not
> uncommon and they are widely deployed on corporate machines, plus the
> fact that lots of companies sell TLS intercepting proxies, which are
> almost always broken in this way.  Many of these users don't even know
> what's going on, so they simply lack the knowledge to take any action or
> ask their network administrator for a fix.  For them, HTTPS just doesn't
> work with Git, while it does for a web browser.
>
> So we will probably want to make this behavior opt-in with a config
> option for SSH, or just not available for SSH at all, so that we don't
> magically break users on upgrade who are relying on the SSH protocol not
> using HTTPS under the hood[0], especially the users who won't even know
> what's wrong.

Good point, I think this sort of thing will be a non-issue with
bundle-uri, because in general it handles any sort of network / fetching
/ validation failures gracefully. I.e. with these patches you can point
at a bad URI, broken non-bundle etc. We'll just move on to a full clone.

Whereas with packfile-uri the inline PACK and the URI are things you
MUST both get, as the provided packfile-uri completes the incomplete
inline PACK. So once you say that you're willing to accept things over
https, you MUST be able to get that thing.

We'll still waste a bit of time trying though with bundle-uri. But I
think for the common case of bundle-uri helping more than not (which
presumably, the server operator has tested), it's a better default to
try https:// even if the main dialog is over ssh://.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-08-24 22:39 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 01/13] serve: add command to advertise bundle URIs Ævar Arnfjörð Bjarmason
2021-08-10 13:58   ` Derrick Stolee
2021-08-23 13:25     ` Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 02/13] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 03/13] connect.c: refactor sending of agent & object-format Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 04/13] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 05/13] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 06/13] bundle-uri client: add transfer.injectBundleURI support Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 07/13] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 08/13] bundle.h: make "fd" version of read_bundle_header() public Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 09/13] fetch-pack: add a deref_without_lazy_fetch_extended() Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 10/13] fetch-pack: move --keep=* option filling to a function Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 11/13] index-pack: add --progress-title option Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 12/13] bundle-uri client: support for bundle-uri with "clone" Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 13/13] bundle-uri docs: add design notes Ævar Arnfjörð Bjarmason
2021-08-24 21:48   ` brian m. carlson
2021-08-24 22:33     ` Ævar Arnfjörð Bjarmason
2021-08-06 14:38 ` [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Jonathan Nieder
2021-08-06 16:26   ` Ævar Arnfjörð Bjarmason
2021-08-06 20:40     ` Jonathan Nieder
2021-08-07  2:19       ` Ævar Arnfjörð Bjarmason
2021-08-10 13:55 ` Derrick Stolee
2021-08-23 13:28   ` Ævar Arnfjörð Bjarmason
2021-08-24  2:03     ` Derrick Stolee
2021-08-24 22:00       ` Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).