git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* [WIP RFC 0/5] Design for offloading part of packfile response to CDN
@ 2018-12-03 23:37 Jonathan Tan
  2018-12-03 23:37 ` [WIP RFC 1/5] Documentation: order protocol v2 sections Jonathan Tan
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Jonathan Tan @ 2018-12-03 23:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Some of us have been working on a design to improve the scalability of
Git servers by allowing them to offload part of the packfile response to
CDNs in this way: returning HTTP(S) URIs in fetch responses in addition
to packfiles.

This can reduce the load on individual Git servers and improves
proximity (by having data served from closer to the user).

I have included here a design document (patch 2) and a rough
implementation of the server (patch 5). Currently, the implementation
only allows replacing single blobs with URIs, but the protocol
improvement is designed in such a way as to allow independent
improvement of Git server implementations.

There is a potential issue: a server which produces both the URIs and
the packfile at roughly the same time (like the implementation in this
patch set) will not have sideband access until it has concluded sending
the URIs. Among other things, this means that the server cannot send
keepalive packets until quite late in the response. One solution to this
might be to add a feature that allows the server to use a sideband
throughout the whole response - and this has other benefits too like
allowing servers to inform the client throughout the whole fetch, not
just at the end.

Jonathan Tan (5):
  Documentation: order protocol v2 sections
  Documentation: add Packfile URIs design doc
  upload-pack: refactor reading of pack-objects out
  upload-pack: refactor writing of "packfile" line
  upload-pack: send part of packfile response as uri

 Documentation/technical/packfile-uri.txt |  83 +++++++++++++
 Documentation/technical/protocol-v2.txt  |  22 ++--
 builtin/pack-objects.c                   |  48 ++++++++
 fetch-pack.c                             |   9 ++
 t/t5702-protocol-v2.sh                   |  25 ++++
 upload-pack.c                            | 150 ++++++++++++++++-------
 6 files changed, 285 insertions(+), 52 deletions(-)
 create mode 100644 Documentation/technical/packfile-uri.txt

-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [WIP RFC 1/5] Documentation: order protocol v2 sections
  2018-12-03 23:37 [WIP RFC 0/5] Design for offloading part of packfile response to CDN Jonathan Tan
@ 2018-12-03 23:37 ` Jonathan Tan
  2018-12-05  4:10   ` Junio C Hamano
  2018-12-03 23:37 ` [WIP RFC 2/5] Documentation: add Packfile URIs design doc Jonathan Tan
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Jonathan Tan @ 2018-12-03 23:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

The git command line expects Git servers to follow a specific order of
sections when transmitting protocol v2 responses, but this is not
explicit in the documentation. Make the order explicit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/technical/protocol-v2.txt | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 09e4e0273f..345c00e08c 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -309,11 +309,11 @@ the 'wanted-refs' section in the server's response as explained below.
 
 The response of `fetch` is broken into a number of sections separated by
 delimiter packets (0001), with each section beginning with its section
-header.
+header. Most sections are sent only when the packfile is sent.
 
-    output = *section
-    section = (acknowledgments | shallow-info | wanted-refs | packfile)
-	      (flush-pkt | delim-pkt)
+    output = acknowledgements flush-pkt |
+	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
+	     [wanted-refs delim-pkt] packfile flush-pkt
 
     acknowledgments = PKT-LINE("acknowledgments" LF)
 		      (nak | *ack)
@@ -335,9 +335,10 @@ header.
 	       *PKT-LINE(%x01-03 *%x00-ff)
 
     acknowledgments section
-	* If the client determines that it is finished with negotiations
-	  by sending a "done" line, the acknowledgments sections MUST be
-	  omitted from the server's response.
+	* If the client determines that it is finished with negotiations by
+	  sending a "done" line (thus requiring the server to send a packfile),
+	  the acknowledgments sections MUST be omitted from the server's
+	  response.
 
 	* Always begins with the section header "acknowledgments"
 
@@ -388,9 +389,6 @@ header.
 	  which the client has not indicated was shallow as a part of
 	  its request.
 
-	* This section is only included if a packfile section is also
-	  included in the response.
-
     wanted-refs section
 	* This section is only included if the client has requested a
 	  ref using a 'want-ref' line and if a packfile section is also
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [WIP RFC 2/5] Documentation: add Packfile URIs design doc
  2018-12-03 23:37 [WIP RFC 0/5] Design for offloading part of packfile response to CDN Jonathan Tan
  2018-12-03 23:37 ` [WIP RFC 1/5] Documentation: order protocol v2 sections Jonathan Tan
@ 2018-12-03 23:37 ` Jonathan Tan
  2018-12-04  0:21   ` Stefan Beller
                     ` (2 more replies)
  2018-12-03 23:37 ` [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out Jonathan Tan
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 22+ messages in thread
From: Jonathan Tan @ 2018-12-03 23:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/technical/packfile-uri.txt | 83 ++++++++++++++++++++++++
 Documentation/technical/protocol-v2.txt  |  6 +-
 2 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/technical/packfile-uri.txt

diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt
new file mode 100644
index 0000000000..6535801486
--- /dev/null
+++ b/Documentation/technical/packfile-uri.txt
@@ -0,0 +1,83 @@
+Packfile URIs
+=============
+
+This feature allows servers to serve part of their packfile response as URIs.
+This allows server designs that improve scalability in bandwidth and CPU usage
+(for example, by serving some data through a CDN), and (in the future) provides
+some measure of resumability to clients.
+
+This feature is available only in protocol version 2.
+
+Protocol
+--------
+
+The server advertises `packfile-uris`.
+
+If the client replies with the following arguments:
+
+ * packfile-uris
+ * thin-pack
+ * ofs-delta
+
+when the server sends the packfile, it MAY send a `packfile-uris` section
+directly before the `packfile` section (right after `wanted-refs` if it is
+sent) containing HTTP(S) URIs. See protocol-v2.txt for the documentation of
+this section.
+
+Clients then should understand that the returned packfile could be incomplete,
+and that it needs to download all the given URIs before the fetch or clone is
+complete. Each URI should point to a Git packfile (which may be a thin pack and
+which may contain offset deltas).
+
+Server design
+-------------
+
+The server can be trivially made compatible with the proposed protocol by
+having it advertise `packfile-uris`, tolerating the client sending
+`packfile-uris`, and never sending any `packfile-uris` section. But we should
+include some sort of non-trivial implementation in the Minimum Viable Product,
+at least so that we can test the client.
+
+This is the implementation: a feature, marked experimental, that allows the
+server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
+<uri>` entries. Whenever the list of objects to be sent is assembled, a blob
+with the given sha1 can be replaced by the given URI. This allows, for example,
+servers to delegate serving of large blobs to CDNs.
+
+Client design
+-------------
+
+While fetching, the client needs to remember the list of URIs and cannot
+declare that the fetch is complete until all URIs have been downloaded as
+packfiles.
+
+The division of work (initial fetch + additional URIs) introduces convenient
+points for resumption of an interrupted clone - such resumption can be done
+after the Minimum Viable Product (see "Future work").
+
+The client can inhibit this feature (i.e. refrain from sending the
+`packfile-urls` parameter) by passing --no-packfile-urls to `git fetch`.
+
+Future work
+-----------
+
+The protocol design allows some evolution of the server and client without any
+need for protocol changes, so only a small-scoped design is included here to
+form the MVP. For example, the following can be done:
+
+ * On the server, a long-running process that takes in entire requests and
+   outputs a list of URIs and the corresponding inclusion and exclusion sets of
+   objects. This allows, e.g., signed URIs to be used and packfiles for common
+   requests to be cached.
+ * On the client, resumption of clone. If a clone is interrupted, information
+   could be recorded in the repository's config and a "clone-resume" command
+   can resume the clone in progress. (Resumption of subsequent fetches is more
+   difficult because that must deal with the user wanting to use the repository
+   even after the fetch was interrupted.)
+
+There are some possible features that will require a change in protocol:
+
+ * Additional HTTP headers (e.g. authentication)
+ * Byte range support
+ * Different file formats referenced by URIs (e.g. raw object)
+
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 345c00e08c..2cb1c41742 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -313,7 +313,8 @@ header. Most sections are sent only when the packfile is sent.
 
     output = acknowledgements flush-pkt |
 	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
-	     [wanted-refs delim-pkt] packfile flush-pkt
+	     [wanted-refs delim-pkt] [packfile-uris delim-pkt]
+	     packfile flush-pkt
 
     acknowledgments = PKT-LINE("acknowledgments" LF)
 		      (nak | *ack)
@@ -331,6 +332,9 @@ header. Most sections are sent only when the packfile is sent.
 		  *PKT-LINE(wanted-ref LF)
     wanted-ref = obj-id SP refname
 
+    packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
+    packfile-uri = PKT-LINE("uri" SP *%x20-ff LF)
+
     packfile = PKT-LINE("packfile" LF)
 	       *PKT-LINE(%x01-03 *%x00-ff)
 
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out
  2018-12-03 23:37 [WIP RFC 0/5] Design for offloading part of packfile response to CDN Jonathan Tan
  2018-12-03 23:37 ` [WIP RFC 1/5] Documentation: order protocol v2 sections Jonathan Tan
  2018-12-03 23:37 ` [WIP RFC 2/5] Documentation: add Packfile URIs design doc Jonathan Tan
@ 2018-12-03 23:37 ` Jonathan Tan
  2018-12-04  0:30   ` Stefan Beller
  2018-12-05  6:30   ` Junio C Hamano
  2018-12-03 23:37 ` [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line Jonathan Tan
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 22+ messages in thread
From: Jonathan Tan @ 2018-12-03 23:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Subsequent patches will change how the output of pack-objects is
processed, so extract that processing into its own function.

Currently, at most 1 character can be buffered (in the "buffered" local
variable). One of those patches will require a larger buffer, so replace
that "buffered" local variable with a buffer array.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 upload-pack.c | 80 +++++++++++++++++++++++++++++----------------------
 1 file changed, 46 insertions(+), 34 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 5e81f1ff24..cec43e0a46 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -101,14 +101,51 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 	return 0;
 }
 
+struct output_state {
+	char buffer[8193];
+	int used;
+};
+
+static int read_pack_objects_stdout(int outfd, struct output_state *os)
+{
+	/* Data ready; we keep the last byte to ourselves
+	 * in case we detect broken rev-list, so that we
+	 * can leave the stream corrupted.  This is
+	 * unfortunate -- unpack-objects would happily
+	 * accept a valid packdata with trailing garbage,
+	 * so appending garbage after we pass all the
+	 * pack data is not good enough to signal
+	 * breakage to downstream.
+	 */
+	ssize_t readsz;
+
+	readsz = xread(outfd, os->buffer + os->used,
+		       sizeof(os->buffer) - os->used);
+	if (readsz < 0) {
+		return readsz;
+	}
+	os->used += readsz;
+
+	if (os->used > 1) {
+		send_client_data(1, os->buffer, os->used - 1);
+		os->buffer[0] = os->buffer[os->used - 1];
+		os->used = 1;
+	} else {
+		send_client_data(1, os->buffer, os->used);
+		os->used = 0;
+	}
+
+	return readsz;
+}
+
 static void create_pack_file(const struct object_array *have_obj,
 			     const struct object_array *want_obj)
 {
 	struct child_process pack_objects = CHILD_PROCESS_INIT;
-	char data[8193], progress[128];
+	struct output_state output_state = {0};
+	char progress[128];
 	char abort_msg[] = "aborting due to possible repository "
 		"corruption on the remote side.";
-	int buffered = -1;
 	ssize_t sz;
 	int i;
 	FILE *pipe_fd;
@@ -235,39 +272,15 @@ static void create_pack_file(const struct object_array *have_obj,
 			continue;
 		}
 		if (0 <= pu && (pfd[pu].revents & (POLLIN|POLLHUP))) {
-			/* Data ready; we keep the last byte to ourselves
-			 * in case we detect broken rev-list, so that we
-			 * can leave the stream corrupted.  This is
-			 * unfortunate -- unpack-objects would happily
-			 * accept a valid packdata with trailing garbage,
-			 * so appending garbage after we pass all the
-			 * pack data is not good enough to signal
-			 * breakage to downstream.
-			 */
-			char *cp = data;
-			ssize_t outsz = 0;
-			if (0 <= buffered) {
-				*cp++ = buffered;
-				outsz++;
-			}
-			sz = xread(pack_objects.out, cp,
-				  sizeof(data) - outsz);
-			if (0 < sz)
-				;
-			else if (sz == 0) {
+			int result = read_pack_objects_stdout(pack_objects.out,
+							      &output_state);
+
+			if (result == 0) {
 				close(pack_objects.out);
 				pack_objects.out = -1;
-			}
-			else
+			} else if (result < 0) {
 				goto fail;
-			sz += outsz;
-			if (1 < sz) {
-				buffered = data[sz-1] & 0xFF;
-				sz--;
 			}
-			else
-				buffered = -1;
-			send_client_data(1, data, sz);
 		}
 
 		/*
@@ -292,9 +305,8 @@ static void create_pack_file(const struct object_array *have_obj,
 	}
 
 	/* flush the data */
-	if (0 <= buffered) {
-		data[0] = buffered;
-		send_client_data(1, data, 1);
+	if (output_state.used > 0) {
+		send_client_data(1, output_state.buffer, output_state.used);
 		fprintf(stderr, "flushed.\n");
 	}
 	if (use_sideband)
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line
  2018-12-03 23:37 [WIP RFC 0/5] Design for offloading part of packfile response to CDN Jonathan Tan
                   ` (2 preceding siblings ...)
  2018-12-03 23:37 ` [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out Jonathan Tan
@ 2018-12-03 23:37 ` Jonathan Tan
  2018-12-06  6:35   ` Junio C Hamano
  2018-12-03 23:37 ` [WIP RFC 5/5] upload-pack: send part of packfile response as uri Jonathan Tan
  2018-12-04  0:01 ` [WIP RFC 0/5] Design for offloading part of packfile response to CDN Stefan Beller
  5 siblings, 1 reply; 22+ messages in thread
From: Jonathan Tan @ 2018-12-03 23:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

A subsequent patch allows pack-objects to output additional information
(in addition to the packfile that it currently outputs). This means that
we must hold off on writing the "packfile" section header to the client
before we process the output of pack-objects, so move the writing of
the "packfile" section header to read_pack_objects_stdout().

Unfortunately, this also means that we cannot send keepalive packets
until pack-objects starts sending out the packfile, since the sideband
is only established when the "packfile" section header is sent.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 upload-pack.c | 47 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index cec43e0a46..aa2589b858 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -104,9 +104,12 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 struct output_state {
 	char buffer[8193];
 	int used;
+	unsigned packfile_started : 1;
+	struct strbuf progress_buf;
 };
 
-static int read_pack_objects_stdout(int outfd, struct output_state *os)
+static int read_pack_objects_stdout(int outfd, struct output_state *os,
+				    int use_protocol_v2)
 {
 	/* Data ready; we keep the last byte to ourselves
 	 * in case we detect broken rev-list, so that we
@@ -126,6 +129,12 @@ static int read_pack_objects_stdout(int outfd, struct output_state *os)
 	}
 	os->used += readsz;
 
+	if (!os->packfile_started) {
+		os->packfile_started = 1;
+		if (use_protocol_v2)
+			packet_write_fmt(1, "packfile\n");
+	}
+
 	if (os->used > 1) {
 		send_client_data(1, os->buffer, os->used - 1);
 		os->buffer[0] = os->buffer[os->used - 1];
@@ -138,8 +147,17 @@ static int read_pack_objects_stdout(int outfd, struct output_state *os)
 	return readsz;
 }
 
+static void flush_progress_buf(struct strbuf *progress_buf)
+{
+	if (!progress_buf->len)
+		return;
+	send_client_data(2, progress_buf->buf, progress_buf->len);
+	strbuf_reset(progress_buf);
+}
+
 static void create_pack_file(const struct object_array *have_obj,
-			     const struct object_array *want_obj)
+			     const struct object_array *want_obj,
+			     int use_protocol_v2)
 {
 	struct child_process pack_objects = CHILD_PROCESS_INIT;
 	struct output_state output_state = {0};
@@ -260,9 +278,13 @@ static void create_pack_file(const struct object_array *have_obj,
 			 */
 			sz = xread(pack_objects.err, progress,
 				  sizeof(progress));
-			if (0 < sz)
-				send_client_data(2, progress, sz);
-			else if (sz == 0) {
+			if (0 < sz) {
+				if (output_state.packfile_started)
+					send_client_data(2, progress, sz);
+				else
+					strbuf_add(&output_state.progress_buf,
+						   progress, sz);
+			} else if (sz == 0) {
 				close(pack_objects.err);
 				pack_objects.err = -1;
 			}
@@ -273,8 +295,10 @@ static void create_pack_file(const struct object_array *have_obj,
 		}
 		if (0 <= pu && (pfd[pu].revents & (POLLIN|POLLHUP))) {
 			int result = read_pack_objects_stdout(pack_objects.out,
-							      &output_state);
-
+							      &output_state,
+							      use_protocol_v2);
+			if (output_state.packfile_started)
+				flush_progress_buf(&output_state.progress_buf);
 			if (result == 0) {
 				close(pack_objects.out);
 				pack_objects.out = -1;
@@ -293,12 +317,14 @@ static void create_pack_file(const struct object_array *have_obj,
 		 * protocol to say anything, so those clients are just out of
 		 * luck.
 		 */
-		if (!ret && use_sideband) {
+		if (!ret && use_sideband && output_state.packfile_started) {
 			static const char buf[] = "0005\1";
 			write_or_die(1, buf, 5);
 		}
 	}
 
+	flush_progress_buf(&output_state.progress_buf);
+
 	if (finish_command(&pack_objects)) {
 		error("git upload-pack: git-pack-objects died with error.");
 		goto fail;
@@ -1094,7 +1120,7 @@ void upload_pack(struct upload_pack_options *options)
 	if (want_obj.nr) {
 		struct object_array have_obj = OBJECT_ARRAY_INIT;
 		get_common_commits(&have_obj, &want_obj);
-		create_pack_file(&have_obj, &want_obj);
+		create_pack_file(&have_obj, &want_obj, 0);
 	}
 }
 
@@ -1475,8 +1501,7 @@ int upload_pack_v2(struct repository *r, struct argv_array *keys,
 			send_wanted_ref_info(&data);
 			send_shallow_info(&data, &want_obj);
 
-			packet_write_fmt(1, "packfile\n");
-			create_pack_file(&have_obj, &want_obj);
+			create_pack_file(&have_obj, &want_obj, 1);
 			state = FETCH_DONE;
 			break;
 		case FETCH_DONE:
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [WIP RFC 5/5] upload-pack: send part of packfile response as uri
  2018-12-03 23:37 [WIP RFC 0/5] Design for offloading part of packfile response to CDN Jonathan Tan
                   ` (3 preceding siblings ...)
  2018-12-03 23:37 ` [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line Jonathan Tan
@ 2018-12-03 23:37 ` Jonathan Tan
  2018-12-04 20:09   ` Stefan Beller
  2018-12-04  0:01 ` [WIP RFC 0/5] Design for offloading part of packfile response to CDN Stefan Beller
  5 siblings, 1 reply; 22+ messages in thread
From: Jonathan Tan @ 2018-12-03 23:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This is a partial implementation of upload-pack sending part of its
packfile response as URIs.

The client is not fully implemented - it knows to ignore the
"packfile-uris" section, but because it does not actually fetch those
URIs, the returned packfile is incomplete. A test is included to show
that the appropriate URI is indeed transmitted, and that the returned
packfile is lacking exactly the expected object.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/pack-objects.c | 48 ++++++++++++++++++++++++++++++++++++++++++
 fetch-pack.c           |  9 ++++++++
 t/t5702-protocol-v2.sh | 25 ++++++++++++++++++++++
 upload-pack.c          | 37 ++++++++++++++++++++++++++++----
 4 files changed, 115 insertions(+), 4 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e7ea206c08..2abbddd3cb 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -117,6 +117,15 @@ enum missing_action {
 static enum missing_action arg_missing_action;
 static show_object_fn fn_show_object;
 
+struct configured_exclusion {
+	struct oidmap_entry e;
+	char *uri;
+};
+static struct oidmap configured_exclusions;
+
+static int exclude_configured_blobs;
+static struct oidset excluded_by_config;
+
 /*
  * stats
  */
@@ -831,6 +840,23 @@ static off_t write_reused_pack(struct hashfile *f)
 	return reuse_packfile_offset - sizeof(struct pack_header);
 }
 
+static void write_excluded_by_configs(void)
+{
+	struct oidset_iter iter;
+	const struct object_id *oid;
+
+	oidset_iter_init(&excluded_by_config, &iter);
+	while ((oid = oidset_iter_next(&iter))) {
+		struct configured_exclusion *ex =
+			oidmap_get(&configured_exclusions, oid);
+
+		if (!ex)
+			BUG("configured exclusion wasn't configured");
+		write_in_full(1, ex->uri, strlen(ex->uri));
+		write_in_full(1, "\n", 1);
+	}
+}
+
 static const char no_split_warning[] = N_(
 "disabling bitmap writing, packs are split due to pack.packSizeLimit"
 );
@@ -1124,6 +1150,12 @@ static int want_object_in_pack(const struct object_id *oid,
 		}
 	}
 
+	if (exclude_configured_blobs &&
+	    oidmap_get(&configured_exclusions, oid)) {
+		oidset_insert(&excluded_by_config, oid);
+		return 0;
+	}
+
 	return 1;
 }
 
@@ -2728,6 +2760,19 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 			    pack_idx_opts.version);
 		return 0;
 	}
+	if (!strcmp(k, "uploadpack.blobpackfileuri")) {
+		struct configured_exclusion *ex = xmalloc(sizeof(*ex));
+		const char *end;
+
+		if (parse_oid_hex(v, &ex->e.oid, &end) || *end != ' ')
+			die(_("value of uploadpack.blobpackfileuri must be "
+			      "of the form '<sha-1> <uri>' (got '%s')"), v);
+		if (oidmap_get(&configured_exclusions, &ex->e.oid))
+			die(_("object already configured in another "
+			      "uploadpack.blobpackfileuri (got '%s')"), v);
+		ex->uri = xstrdup(end + 1);
+		oidmap_put(&configured_exclusions, ex);
+	}
 	return git_default_config(k, v, cb);
 }
 
@@ -3314,6 +3359,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			 N_("do not pack objects in promisor packfiles")),
 		OPT_BOOL(0, "delta-islands", &use_delta_islands,
 			 N_("respect islands during delta compression")),
+		OPT_BOOL(0, "exclude-configured-blobs", &exclude_configured_blobs,
+			 N_("respect uploadpack.blobpackfileuri")),
 		OPT_END(),
 	};
 
@@ -3487,6 +3534,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		return 0;
 	if (nr_result)
 		prepare_pack(window, depth);
+	write_excluded_by_configs();
 	write_pack_file();
 	if (progress)
 		fprintf_ln(stderr,
diff --git a/fetch-pack.c b/fetch-pack.c
index 9691046e64..6e1985ab55 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1413,6 +1413,15 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 				receive_wanted_refs(&reader, sought, nr_sought);
 
 			/* get the pack */
+			if (process_section_header(&reader, "packfile-uris", 1)) {
+				/* skip the whole section */
+				process_section_header(&reader, "packfile-uris", 0);
+				while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
+					/* do nothing */
+				}
+				if (reader.status != PACKET_READ_DELIM)
+					die("expected DELIM");
+			}
 			process_section_header(&reader, "packfile", 0);
 			if (get_pack(args, fd, pack_lockfile))
 				die(_("git fetch-pack: fetch failed."));
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 0f2b09ebb8..ccb1fc510e 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -588,6 +588,31 @@ test_expect_success 'when server does not send "ready", expect FLUSH' '
 	test_i18ngrep "expected no other sections to be sent after no .ready." err
 '
 
+test_expect_success 'part of packfile response provided as URI' '
+	rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" http_child log &&
+
+	git init "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	echo my-blob >"$HTTPD_DOCUMENT_ROOT_PATH/http_parent/my-blob" &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent/" add my-blob &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent/" commit -m x &&
+
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent/" hash-object my-blob >h &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent/" config \
+		"uploadpack.blobpackfileuri" \
+		"$(cat h) https://example.com/a-uri" &&
+
+	# NEEDSWORK: "git clone" fails here because it ignores the URI provided
+	# instead of fetching it.
+	test_must_fail env GIT_TRACE_PACKET="$(pwd)/log" \
+		git -c protocol.version=2 clone \
+		"$HTTPD_URL/smart/http_parent" http_child 2>err &&
+	# Although "git clone" fails, we can still check that the server
+	# provided the URI we requested and that the error message pinpoints
+	# the object that is missing.
+	grep "clone< uri https://example.com/a-uri" log &&
+	test_i18ngrep "did not receive expected object $(cat h)" err
+'
+
 stop_httpd
 
 test_done
diff --git a/upload-pack.c b/upload-pack.c
index aa2589b858..a8eef697ec 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -104,6 +104,7 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 struct output_state {
 	char buffer[8193];
 	int used;
+	unsigned packfile_uris_started : 1;
 	unsigned packfile_started : 1;
 	struct strbuf progress_buf;
 };
@@ -129,10 +130,35 @@ static int read_pack_objects_stdout(int outfd, struct output_state *os,
 	}
 	os->used += readsz;
 
-	if (!os->packfile_started) {
-		os->packfile_started = 1;
-		if (use_protocol_v2)
-			packet_write_fmt(1, "packfile\n");
+	while (!os->packfile_started) {
+		char *p;
+		if (os->used >= 4 && !memcmp(os->buffer, "PACK", 4)) {
+			os->packfile_started = 1;
+			if (use_protocol_v2) {
+				if (os->packfile_uris_started)
+					packet_delim(1);
+				packet_write_fmt(1, "packfile\n");
+			}
+			break;
+		}
+		if ((p = memchr(os->buffer, '\n', os->used))) {
+			if (!os->packfile_uris_started) {
+				os->packfile_uris_started = 1;
+				if (!use_protocol_v2)
+					BUG("packfile_uris requires protocol v2");
+				packet_write_fmt(1, "packfile-uris\n");
+			}
+			*p = '\0';
+			packet_write_fmt(1, "uri %s\n", os->buffer);
+
+			os->used -= p - os->buffer + 1;
+			memmove(os->buffer, p, os->used);
+		} else {
+			/*
+			 * Incomplete line.
+			 */
+			return readsz;
+		}
 	}
 
 	if (os->used > 1) {
@@ -205,6 +231,9 @@ static void create_pack_file(const struct object_array *have_obj,
 					 filter_options.filter_spec);
 		}
 	}
+	if (use_protocol_v2)
+		argv_array_push(&pack_objects.args,
+				"--exclude-configured-blobs");
 
 	pack_objects.in = -1;
 	pack_objects.out = -1;
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 0/5] Design for offloading part of packfile response to CDN
  2018-12-03 23:37 [WIP RFC 0/5] Design for offloading part of packfile response to CDN Jonathan Tan
                   ` (4 preceding siblings ...)
  2018-12-03 23:37 ` [WIP RFC 5/5] upload-pack: send part of packfile response as uri Jonathan Tan
@ 2018-12-04  0:01 ` Stefan Beller
  5 siblings, 0 replies; 22+ messages in thread
From: Stefan Beller @ 2018-12-04  0:01 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Mon, Dec 3, 2018 at 3:37 PM Jonathan Tan <jonathantanmy@google.com> wrote:

>
> There is a potential issue: a server which produces both the URIs and
> the packfile at roughly the same time (like the implementation in this
> patch set) will not have sideband access until it has concluded sending
> the URIs. Among other things, this means that the server cannot send
> keepalive packets until quite late in the response. One solution to this
> might be to add a feature that allows the server to use a sideband
> throughout the whole response - and this has other benefits too like
> allowing servers to inform the client throughout the whole fetch, not
> just at the end.

While side band sounds like the right thing to do, we could also
sending (NULL)-URIs within this feature.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
  2018-12-03 23:37 ` [WIP RFC 2/5] Documentation: add Packfile URIs design doc Jonathan Tan
@ 2018-12-04  0:21   ` Stefan Beller
  2018-12-04  1:54   ` brian m. carlson
  2018-12-05  5:02   ` Junio C Hamano
  2 siblings, 0 replies; 22+ messages in thread
From: Stefan Beller @ 2018-12-04  0:21 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Thanks for bringing this design to the list!

> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
> index 345c00e08c..2cb1c41742 100644
> --- a/Documentation/technical/protocol-v2.txt
> +++ b/Documentation/technical/protocol-v2.txt
> @@ -313,7 +313,8 @@ header. Most sections are sent only when the packfile is sent.
>
>      output = acknowledgements flush-pkt |
>              [acknowledgments delim-pkt] [shallow-info delim-pkt]
> -            [wanted-refs delim-pkt] packfile flush-pkt
> +            [wanted-refs delim-pkt] [packfile-uris delim-pkt]
> +            packfile flush-pkt

While this is an RFC and incomplete, we'd need to remember to
add packfile-uris to the capabilities list above, stating that it requires
thin-pack and ofs-delta to be sent, and what to expect from it.

The mention of  --no-packfile-urls in the Client design above
seems to imply we'd want to turn it on by default, which I thought
was not the usual stance how we introduce new things.

An odd way of disabling it would be --no-thin-pack, hoping the
client side implementation abides by the implied requirements.

>      acknowledgments = PKT-LINE("acknowledgments" LF)
>                       (nak | *ack)
> @@ -331,6 +332,9 @@ header. Most sections are sent only when the packfile is sent.
>                   *PKT-LINE(wanted-ref LF)
>      wanted-ref = obj-id SP refname
>
> +    packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
> +    packfile-uri = PKT-LINE("uri" SP *%x20-ff LF)

Is the *%x20-ff a fancy way of saying obj-id?

While the server is configured with pairs of (oid URL),
we would not need to send the exact oid to the client
as that is what the client can figure out on its own by reading
the downloaded pack.

Instead we could send an integrity hash (i.e. the packfile
downloaded from "uri" is expected to hash to $oid here)

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out
  2018-12-03 23:37 ` [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out Jonathan Tan
@ 2018-12-04  0:30   ` Stefan Beller
  2018-12-05  6:30   ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Stefan Beller @ 2018-12-04  0:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Mon, Dec 3, 2018 at 3:37 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Subsequent patches will change how the output of pack-objects is
> processed, so extract that processing into its own function.
>
> Currently, at most 1 character can be buffered (in the "buffered" local
> variable). One of those patches will require a larger buffer, so replace
> that "buffered" local variable with a buffer array.

This buffering sounds oddly similar to the pkt reader which can buffer
at most one pkt, the difference being that we'd buffer bytes
instead of pkts.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
  2018-12-03 23:37 ` [WIP RFC 2/5] Documentation: add Packfile URIs design doc Jonathan Tan
  2018-12-04  0:21   ` Stefan Beller
@ 2018-12-04  1:54   ` brian m. carlson
  2018-12-04 19:29     ` Jonathan Tan
  2018-12-05  5:02   ` Junio C Hamano
  2 siblings, 1 reply; 22+ messages in thread
From: brian m. carlson @ 2018-12-04  1:54 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3058 bytes --]

On Mon, Dec 03, 2018 at 03:37:35PM -0800, Jonathan Tan wrote:
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  Documentation/technical/packfile-uri.txt | 83 ++++++++++++++++++++++++
>  Documentation/technical/protocol-v2.txt  |  6 +-
>  2 files changed, 88 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/technical/packfile-uri.txt
> 
> diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt
> new file mode 100644
> index 0000000000..6535801486
> --- /dev/null
> +++ b/Documentation/technical/packfile-uri.txt
> @@ -0,0 +1,83 @@
> +Packfile URIs
> +=============
> +
> +This feature allows servers to serve part of their packfile response as URIs.
> +This allows server designs that improve scalability in bandwidth and CPU usage
> +(for example, by serving some data through a CDN), and (in the future) provides
> +some measure of resumability to clients.
> +
> +This feature is available only in protocol version 2.
> +
> +Protocol
> +--------
> +
> +The server advertises `packfile-uris`.
> +
> +If the client replies with the following arguments:
> +
> + * packfile-uris
> + * thin-pack
> + * ofs-delta
> +
> +when the server sends the packfile, it MAY send a `packfile-uris` section
> +directly before the `packfile` section (right after `wanted-refs` if it is
> +sent) containing HTTP(S) URIs. See protocol-v2.txt for the documentation of
> +this section.
> +
> +Clients then should understand that the returned packfile could be incomplete,
> +and that it needs to download all the given URIs before the fetch or clone is
> +complete. Each URI should point to a Git packfile (which may be a thin pack and
> +which may contain offset deltas).


Some thoughts here:

First, I'd like to see a section (and a bit in the implementation)
requiring HTTPS if the original protocol is secure (SSH or HTTPS).
Allowing the server to downgrade to HTTP, even by accident, would be a
security problem.

Second, this feature likely should be opt-in for SSH. One issue I've
seen repeatedly is that people don't want to use HTTPS to fetch things
when they're using SSH for Git. Many people in corporate environments
have proxies that break HTTP for non-browser use cases[0], and using SSH
is the only way that they can make a functional Git connection.

Third, I think the server needs to be required to both support Range
headers and never change the content of a URI, so that we can have
resumable clone implicit in this design. There are some places in the
world where connections are poor and fetching even the initial packfile
at once might be a problem. (I've seen such questions on Stack
Overflow, for example.)

Having said that, I think overall this is a good idea and I'm glad to
see a proposal for it.

[0] For example, a naughty-word filter may corrupt or block certain byte
sequences that occur incidentally in the pack stream.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
  2018-12-04  1:54   ` brian m. carlson
@ 2018-12-04 19:29     ` Jonathan Tan
  0 siblings, 0 replies; 22+ messages in thread
From: Jonathan Tan @ 2018-12-04 19:29 UTC (permalink / raw)
  To: sandals; +Cc: jonathantanmy, git

> Some thoughts here:
> 
> First, I'd like to see a section (and a bit in the implementation)
> requiring HTTPS if the original protocol is secure (SSH or HTTPS).
> Allowing the server to downgrade to HTTP, even by accident, would be a
> security problem.
> 
> Second, this feature likely should be opt-in for SSH. One issue I've
> seen repeatedly is that people don't want to use HTTPS to fetch things
> when they're using SSH for Git. Many people in corporate environments
> have proxies that break HTTP for non-browser use cases[0], and using SSH
> is the only way that they can make a functional Git connection.

Good points about SSH support and the client needing to control which
protocols the server will send URIs for. I'll include a line in the
client request in which the client can specify which protocols it is OK
with.

> Third, I think the server needs to be required to both support Range
> headers and never change the content of a URI, so that we can have
> resumable clone implicit in this design. There are some places in the
> world where connections are poor and fetching even the initial packfile
> at once might be a problem. (I've seen such questions on Stack
> Overflow, for example.)

Good points. I'll add these in the next revision.

> Having said that, I think overall this is a good idea and I'm glad to
> see a proposal for it.

Thanks, and thanks for your comments too.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 5/5] upload-pack: send part of packfile response as uri
  2018-12-03 23:37 ` [WIP RFC 5/5] upload-pack: send part of packfile response as uri Jonathan Tan
@ 2018-12-04 20:09   ` Stefan Beller
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Beller @ 2018-12-04 20:09 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Mon, Dec 3, 2018 at 3:38 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> This is a partial implementation of upload-pack sending part of its
> packfile response as URIs.

It does so by implementing a new flag `--exclude-configured-blobs`
in pack-objects, which would change the output of pack-objects to
output a list of URLs (of the excluded blobs) followed by the
pack to be asked for.

This design seems easy to implement as then upload-pack
can just parse the output and only needs to insert
"packfile" and "packfile-uris\n" at the appropriate places
of the stream, otherwise it just passes through the information
obtained from pack-objects.

The design as-is would make for hard documentation of
pack-objects (its output is not just a pack anymore when that
flag is given, but a highly optimized byte stream).

Initially I did not anticipate this to be one of the major design problems
as I assumed we'd want to use this pack feature over broadly (e.g.
eventually by offloading most of the objects into a base pack that
is just always included as the likelihood for any object in there is
very high on initial clone), but it makes total sense to only
output the URIs that we actually need.

An alternative that comes very close to the current situation
would be to either pass a file path or file descriptor (that upload-pack
listens to in parallel) to pack-objects as an argument of the new flag.
Then we would not need to splice the protocol sections into the single
output stream, but we could announce the sections, then flush
the URIs and then flush the pack afterwards.

I looked at this quickly, but that would need either extensions in
run-command.c for setting up the new fd for us, as there we already
have OS specific code for these setups, or we'd have to duplicate
some of the logic here, which doesn't enthuse me either.

So maybe we'd create a temp file via mkstemp and pass
the file name to pack-objects for writing the URIs and then
we'd just need to stream that file?

> +       # NEEDSWORK: "git clone" fails here because it ignores the URI provided
> +       # instead of fetching it.
> +       test_must_fail env GIT_TRACE_PACKET="$(pwd)/log" \
> +               git -c protocol.version=2 clone \
> +               "$HTTPD_URL/smart/http_parent" http_child 2>err &&
> +       # Although "git clone" fails, we can still check that the server
> +       # provided the URI we requested and that the error message pinpoints
> +       # the object that is missing.
> +       grep "clone< uri https://example.com/a-uri" log &&
> +       test_i18ngrep "did not receive expected object $(cat h)" err

That is a nice test!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 1/5] Documentation: order protocol v2 sections
  2018-12-03 23:37 ` [WIP RFC 1/5] Documentation: order protocol v2 sections Jonathan Tan
@ 2018-12-05  4:10   ` Junio C Hamano
  2018-12-06 22:54     ` Jonathan Tan
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2018-12-05  4:10 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> The git command line expects Git servers to follow a specific order of

"Command line"?  It sounds like you are talking about the order of
command line arguments and options, but apparently that is not what
you are doing.  Is it "The git over-the-wire protocol"?

> +    output = acknowledgements flush-pkt |
> +	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
> +	     [wanted-refs delim-pkt] packfile flush-pkt

So the output can be either 

 - acks followed by flush (and nothing else) or

 - (possibly) acks, followed by (possibly) shallow, followed by
   (possibly) wanted-refs, followed by the pack stream and flush at
   the end.

> @@ -335,9 +335,10 @@ header.
>  	       *PKT-LINE(%x01-03 *%x00-ff)
>  
>      acknowledgments section
> -	* If the client determines that it is finished with negotiations
> -	  by sending a "done" line, the acknowledgments sections MUST be
> -	  omitted from the server's response.
> +	* If the client determines that it is finished with negotiations by
> +	  sending a "done" line (thus requiring the server to send a packfile),
> +	  the acknowledgments sections MUST be omitted from the server's
> +	  response.

OK.  

>  	* Always begins with the section header "acknowledgments"
>  
> @@ -388,9 +389,6 @@ header.
>  	  which the client has not indicated was shallow as a part of
>  	  its request.
>  
> -	* This section is only included if a packfile section is also
> -	  included in the response.
> -

Earlier, we said that shallow-info is not given when packfile is not
there.  That is captured in the updated EBNF above.  We don't have a
corresponding removal of a bullet point for wanted-refs section below
but probably that is because the original did not have corresponding
bullet point to begin with.

>      wanted-refs section
>  	* This section is only included if the client has requested a
>  	  ref using a 'want-ref' line and if a packfile section is also

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
  2018-12-03 23:37 ` [WIP RFC 2/5] Documentation: add Packfile URIs design doc Jonathan Tan
  2018-12-04  0:21   ` Stefan Beller
  2018-12-04  1:54   ` brian m. carlson
@ 2018-12-05  5:02   ` Junio C Hamano
  2018-12-05  5:55     ` Junio C Hamano
  2018-12-06 23:16     ` Jonathan Tan
  2 siblings, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2018-12-05  5:02 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> +This feature allows servers to serve part of their packfile response as URIs.
> +This allows server designs that improve scalability in bandwidth and CPU usage
> +(for example, by serving some data through a CDN), and (in the future) provides
> +some measure of resumability to clients.

Without reading the remainder, this makes readers anticipate a few
good things ;-)

 - "part of", so pre-generated constant material can be given from
   CDN and then followed-up by "filling the gaps" small packfile,
   perhaps?

 - The "part of" transmission may not bring the repository up to
   date wrt to the "want" objects; would this feature involve "you
   asked history up to these commits, but with this pack-uri, you'll
   be getting history up to these (somewhat stale) commits"?

Anyway, let's read on.

> +This feature is available only in protocol version 2.
> +
> +Protocol
> +--------
> +
> +The server advertises `packfile-uris`.
> +
> +If the client replies with the following arguments:
> +
> + * packfile-uris
> + * thin-pack
> + * ofs-delta

"with the following" meaning "with all of the following", or "with
any of the following"?  Is there a reason why the server side must
require that the client understands and is willing to accept a
thin-pack when wanting to use packfile-uris?  The same question for
the ofs-delta.

When the pregenerated constant material the server plans to hand out
the uris for was prepared by using ofs-delta encoding, the server
cannot give the uri to it when the client does not want ofs-delta
encoded packfile, but it feels somewhat strange that we require the
most capable client at the protocol level.  After all, the server
side could prepare one with ofs-delta and another without ofs-delta
and depending on what the client is capable of, hand out different
URIs, if it wanted to.

The reason why I care is because thin and ofs will *NOT* stay
forever be the only optional features of the pack format.  We may
invent yet another such optional 'frotz' feature, which may greatly
help the efficiency of the packfile encoding, hence it may be
preferrable to always generate a CDN packfile with that feature, in
addition to thin and ofs.  Would we add 'frotz' to the above list in
the documentation, then?  What would happen to existing servers and
clients written before that time then?

My recommendation is to drop the mention of "thin" and "ofs" from
the above list, and also from the following paragraph.  The "it MAY
send" will serve as a practical escape clause to allow a server/CDN
implementation that *ALWAYS* prepares pregenerated material that can
only be digested by clients that supports thin and ofs.  Such a server
can send packfile-URIs only when all of the three are given by the
client and be compliant.  And such an update to the proposed document
would allow a more diskful server to prepare both thin and non-thin
pregenerated packs and choose which one to give to the client depending
on the capability.

> +when the server sends the packfile, it MAY send a `packfile-uris` section
> +directly before the `packfile` section (right after `wanted-refs` if it is
> +sent) containing HTTP(S) URIs. See protocol-v2.txt for the documentation of
> +this section.

So, this is OK, but

> +Clients then should understand that the returned packfile could be incomplete,
> +and that it needs to download all the given URIs before the fetch or clone is
> +complete. Each URI should point to a Git packfile (which may be a thin pack and
> +which may contain offset deltas).

weaken or remove the (parenthetical comment) in the last sentence,
and replace the beginning of the section with something like

	If the client replies with 'packfile-uris', when the server
	sends the packfile, it MAY send a `packfile-uris` section...

You may steal what I wrote in the above response to help the
server-side folks to decide how to actually implement the "it MAY
send a packfile-uris" part in the document.

> +Server design
> +-------------
> +
> +The server can be trivially made compatible with the proposed protocol by
> +having it advertise `packfile-uris`, tolerating the client sending
> +`packfile-uris`, and never sending any `packfile-uris` section. But we should
> +include some sort of non-trivial implementation in the Minimum Viable Product,
> +at least so that we can test the client.
> +
> +This is the implementation: a feature, marked experimental, that allows the
> +server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
> +<uri>` entries. Whenever the list of objects to be sent is assembled, a blob
> +with the given sha1 can be replaced by the given URI. This allows, for example,
> +servers to delegate serving of large blobs to CDNs.

;-)

> +Client design
> +-------------
> +
> +While fetching, the client needs to remember the list of URIs and cannot
> +declare that the fetch is complete until all URIs have been downloaded as
> +packfiles.
> +
> +The division of work (initial fetch + additional URIs) introduces convenient
> +points for resumption of an interrupted clone - such resumption can be done
> +after the Minimum Viable Product (see "Future work").
> +
> +The client can inhibit this feature (i.e. refrain from sending the
> +`packfile-urls` parameter) by passing --no-packfile-urls to `git fetch`.

OK, this comes back to what I alluded to at the beginning.  We could
respond to a full-clone request by feeding a series of packfile-uris
and some ref information, perhaps like this:

	* Grab this packfile and update your remote-tracking refs
          and tags to these values; you'd be as if you cloned the
          project when it was at v1.0.

	* When you are done with the above, grab this packfile and
          update your remote-tracking refs and tags to these values;
          you'd be as if you cloned the project when it was at v2.0.

	* When you are done with the above, grab this packfile and
          update your remote-tracking refs and tags to these values;
          you'd be as if you cloned the project when it was at v3.0.

	...

	* When you are done with the above, here is the remaining
          packdata to bring you fully up to date with your original
          "want"s.

and before fully reading the proposal, I anticipated that it was
what you were going to describe.  The major difference is "up to the
packdata given to you so far, you'd be as if you fetched these" ref
information, which would allow you to be interrupted and then simply
resume, without having to remember the set of packfile-uris yet to
be processed across a fetch/clone failure.  If you sucessfully fetch
packfile for ..v1.0, you can update the remote-tracking refs to
match as if you fetched back when that was the most recent state of
the project, and then if you failed while transferring packfile for
v1.0..v2.0, the resuming would just reissue "git fetch" internally.

I think what you proposed, i.e. without the "with the data up to
this packfile, you have history to these objects", would also work,
even though it requires us to remember more of what we learned
during the initial attempt throughout retrying failed transfers.

> +Future work
> +-----------
> +
> +The protocol design allows some evolution of the server and client without any
> +need for protocol changes, so only a small-scoped design is included here to
> +form the MVP. For example, the following can be done:
> +
> + * On the server, a long-running process that takes in entire requests and
> +   outputs a list of URIs and the corresponding inclusion and exclusion sets of
> +   objects. This allows, e.g., signed URIs to be used and packfiles for common
> +   requests to be cached.
> + * On the client, resumption of clone. If a clone is interrupted, information
> +   could be recorded in the repository's config and a "clone-resume" command
> +   can resume the clone in progress. (Resumption of subsequent fetches is more
> +   difficult because that must deal with the user wanting to use the repository
> +   even after the fetch was interrupted.)
> +
> +There are some possible features that will require a change in protocol:
> +
> + * Additional HTTP headers (e.g. authentication)
> + * Byte range support
> + * Different file formats referenced by URIs (e.g. raw object)
> +
> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
> index 345c00e08c..2cb1c41742 100644
> --- a/Documentation/technical/protocol-v2.txt
> +++ b/Documentation/technical/protocol-v2.txt
> @@ -313,7 +313,8 @@ header. Most sections are sent only when the packfile is sent.
>  
>      output = acknowledgements flush-pkt |
>  	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
> -	     [wanted-refs delim-pkt] packfile flush-pkt
> +	     [wanted-refs delim-pkt] [packfile-uris delim-pkt]
> +	     packfile flush-pkt
>  
>      acknowledgments = PKT-LINE("acknowledgments" LF)
>  		      (nak | *ack)
> @@ -331,6 +332,9 @@ header. Most sections are sent only when the packfile is sent.
>  		  *PKT-LINE(wanted-ref LF)
>      wanted-ref = obj-id SP refname
>  
> +    packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
> +    packfile-uri = PKT-LINE("uri" SP *%x20-ff LF)
> +
>      packfile = PKT-LINE("packfile" LF)
>  	       *PKT-LINE(%x01-03 *%x00-ff)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
  2018-12-05  5:02   ` Junio C Hamano
@ 2018-12-05  5:55     ` Junio C Hamano
  2018-12-06 23:16     ` Jonathan Tan
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2018-12-05  5:55 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> So, this is OK, but
>
>> +Clients then should understand that the returned packfile could be incomplete,
>> +and that it needs to download all the given URIs before the fetch or clone is
>> +complete. Each URI should point to a Git packfile (which may be a thin pack and
>> +which may contain offset deltas).
>
> weaken or remove the (parenthetical comment) in the last sentence,
> and replace the beginning of the section with something like
>
> 	If the client replies with 'packfile-uris', when the server
> 	sends the packfile, it MAY send a `packfile-uris` section...
>
> You may steal what I wrote in the above response to help the
> server-side folks to decide how to actually implement the "it MAY
> send a packfile-uris" part in the document.

By the way, I do agree with the practical consideration the design
you described makes.  For a pregenerated pack that brings you from
v1.0 to v2.0, "thin" would roughly save the transfer of one full
checkout (compressed, of course), and "ofs" would also save several
bytes per object.  Compared to a single pack that delivers everything
the fetcher wants, concatenation of packs without "thin" to transfer
the same set of objects would cost quite a lot more.

And I do not think we should care too much about fetchers that lack
either thin or ofs, so I'd imagine that any client that ask for
packfile-uris would also advertise thin and ofs as well, so in
practice, a request with packfile-uris that lack thin or ofs would
be pretty rare and requiring all three and requiring only one would
not make much practical difference.  It's just that I think singling
out these two capabilities as hard requirements at the protocol
level is wrong.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out
  2018-12-03 23:37 ` [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out Jonathan Tan
  2018-12-04  0:30   ` Stefan Beller
@ 2018-12-05  6:30   ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2018-12-05  6:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> +struct output_state {
> +	char buffer[8193];
> +	int used;
> +};
> +
> +static int read_pack_objects_stdout(int outfd, struct output_state *os)
> +{

The naming feels quite odd.  We are downstream of pack-objects and
reading its standard output stream, so "read stdout-of-pack-objects"
is not wrong per-se, but it may be just me.  Stepping back and what
the function is really about often helps us to understand what we
are doing.

I think the function you extracted from the existing logic is
responsible for relaying the data read from pack-objects to the
fetch-pack client on the other side of the wire.  So from that point
of view, singling out "read" as if it is a primary thing the
function does is already suboptimal.  Perhaps

	static int relay_pack_data(int in, struct output_state *os)

because the fd is what we "read" from (hence, 'in', not 'out'), and
relaying is what we do and reading is only one half of doing so?

> +	/* Data ready; we keep the last byte to ourselves
> +	 * in case we detect broken rev-list, so that we
> +	 * can leave the stream corrupted.  This is
> +	 * unfortunate -- unpack-objects would happily
> +	 * accept a valid packdata with trailing garbage,
> +	 * so appending garbage after we pass all the
> +	 * pack data is not good enough to signal
> +	 * breakage to downstream.
> +	 */

Yeah, I recall writing this funky and unfortunate logic in 2006.
Perhaps we want to update the comment style to make it a bit more
modern?

The "Data ready;" part of the comment applies more to what the
caller of this logic does (i.e. poll returned and revents indicates
we can read, and that is why we are calling into this logic).  The
remainder is the comment that is relevat to this logic.

> +	ssize_t readsz;
> +
> +	readsz = xread(outfd, os->buffer + os->used,
> +		       sizeof(os->buffer) - os->used);

So we read what we could read in the remaining buffer.

I notice that you alloated 8k+1 bytes; would this code end up
reading that 8k+1 bytes in full under the right condition?  I am
mainly wondering where the need for +1 comes from.

> +	if (readsz < 0) {
> +		return readsz;
> +	}

OK, report an error to the caller by returning negative.  I am
guessing that you'd have more code in this block that currently have
only one statement in the future steps, perhaps.

> +	os->used += readsz;
> +
> +	if (os->used > 1) {
> +		send_client_data(1, os->buffer, os->used - 1);
> +		os->buffer[0] = os->buffer[os->used - 1];

OK, this corresponds to the "*cp++ = buffered"  in the original just
before xread().

> +		os->used = 1;
> +	} else {
> +		send_client_data(1, os->buffer, os->used);
> +		os->used = 0;

I am not sure if the code is correct when os->used happens to be 1
(shouldn't we hold the byte, skip the call to send_client_data(),
and go back to poll() to expect more data?), but this is a faithful
code movement and rewrite of the original.

I've read the caller (below, snipped) and the conversion looked
sensible there as well.  The final flushing of the byte(s) held back
in *os also looks good.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line
  2018-12-03 23:37 ` [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line Jonathan Tan
@ 2018-12-06  6:35   ` Junio C Hamano
  2018-12-06 23:25     ` Jonathan Tan
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2018-12-06  6:35 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> @@ -126,6 +129,12 @@ static int read_pack_objects_stdout(int outfd, struct output_state *os)
>  	}
>  	os->used += readsz;
>  
> +	if (!os->packfile_started) {
> +		os->packfile_started = 1;
> +		if (use_protocol_v2)
> +			packet_write_fmt(1, "packfile\n");

If we fix this function so that the only byte in the buffer is held
back without emitted when os->used == 1 as I alluded to, this may
have to be done a bit later, as with such a change, it is no longer
guaranteed that send_client_data() will be called after this point.

> +	}
> +
>  	if (os->used > 1) {
>  		send_client_data(1, os->buffer, os->used - 1);
>  		os->buffer[0] = os->buffer[os->used - 1];

> +static void flush_progress_buf(struct strbuf *progress_buf)
> +{
> +	if (!progress_buf->len)
> +		return;
> +	send_client_data(2, progress_buf->buf, progress_buf->len);
> +	strbuf_reset(progress_buf);
> +}

Interesting.

>  static void create_pack_file(const struct object_array *have_obj,
> -			     const struct object_array *want_obj)
> +			     const struct object_array *want_obj,
> +			     int use_protocol_v2)
>  {
>  	struct child_process pack_objects = CHILD_PROCESS_INIT;
>  	struct output_state output_state = {0};
> @@ -260,9 +278,13 @@ static void create_pack_file(const struct object_array *have_obj,
>  			 */
>  			sz = xread(pack_objects.err, progress,
>  				  sizeof(progress));
> -			if (0 < sz)
> -				send_client_data(2, progress, sz);
> -			else if (sz == 0) {
> +			if (0 < sz) {
> +				if (output_state.packfile_started)
> +					send_client_data(2, progress, sz);
> +				else
> +					strbuf_add(&output_state.progress_buf,
> +						   progress, sz);

Isn't progress output that goes to the channel #2 pretty much
independent from the payload stream that carries the pkt-line 
command like "packfile" plus the raw pack stream?  It somehow
feels like an oxymoron to _buffer_ progress indicator, as it
defeats the whole point of progress report to buffer it.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 1/5] Documentation: order protocol v2 sections
  2018-12-05  4:10   ` Junio C Hamano
@ 2018-12-06 22:54     ` Jonathan Tan
  2018-12-09  0:15       ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Tan @ 2018-12-06 22:54 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> > The git command line expects Git servers to follow a specific order of
> 
> "Command line"?  It sounds like you are talking about the order of
> command line arguments and options, but apparently that is not what
> you are doing.  Is it "The git over-the-wire protocol"?

I meant to say the current Git implementation, as opposed to what is
written in the specification. I'll replace it with "The current C Git
implementation".

> Earlier, we said that shallow-info is not given when packfile is not
> there.  That is captured in the updated EBNF above.  We don't have a
> corresponding removal of a bullet point for wanted-refs section below
> but probably that is because the original did not have corresponding
> bullet point to begin with.

That's because the corresponding bullet point had other information.
Quoted in full below:

> 	* This section is only included if the client has requested a
> 	  ref using a 'want-ref' line and if a packfile section is also
> 	  included in the response.

I could reword it to "If a packfile section is included in the response,
this section is only included if the client has requested a ref using a
'want-ref' line", but I don't think that is significantly clearer.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
  2018-12-05  5:02   ` Junio C Hamano
  2018-12-05  5:55     ` Junio C Hamano
@ 2018-12-06 23:16     ` Jonathan Tan
  1 sibling, 0 replies; 22+ messages in thread
From: Jonathan Tan @ 2018-12-06 23:16 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> > +This feature allows servers to serve part of their packfile response as URIs.
> > +This allows server designs that improve scalability in bandwidth and CPU usage
> > +(for example, by serving some data through a CDN), and (in the future) provides
> > +some measure of resumability to clients.
> 
> Without reading the remainder, this makes readers anticipate a few
> good things ;-)
> 
>  - "part of", so pre-generated constant material can be given from
>    CDN and then followed-up by "filling the gaps" small packfile,
>    perhaps?

Yes :-)

>  - The "part of" transmission may not bring the repository up to
>    date wrt to the "want" objects; would this feature involve "you
>    asked history up to these commits, but with this pack-uri, you'll
>    be getting history up to these (somewhat stale) commits"?

It could be, but not necessarily. In my current WIP implementation, for
example, pack URIs don't give you any commits at all (and thus, no
history) - only blobs. Quite a few people first think of the "stale
clone then top-up" case, though - I wonder if it would be a good idea to
give the blob example in this paragraph in order to put people in the
right frame of mind.

> > +If the client replies with the following arguments:
> > +
> > + * packfile-uris
> > + * thin-pack
> > + * ofs-delta
> 
> "with the following" meaning "with all of the following", or "with
> any of the following"?  Is there a reason why the server side must
> require that the client understands and is willing to accept a
> thin-pack when wanting to use packfile-uris?  The same question for
> the ofs-delta.

"All of the following", but from your later comments, we probably don't
need this section anyway.

> My recommendation is to drop the mention of "thin" and "ofs" from
> the above list, and also from the following paragraph.  The "it MAY
> send" will serve as a practical escape clause to allow a server/CDN
> implementation that *ALWAYS* prepares pregenerated material that can
> only be digested by clients that supports thin and ofs.  Such a server
> can send packfile-URIs only when all of the three are given by the
> client and be compliant.  And such an update to the proposed document
> would allow a more diskful server to prepare both thin and non-thin
> pregenerated packs and choose which one to give to the client depending
> on the capability.

That is true - we can just let the server decide. I'll update the patch
accordingly, and state that the URIs should point to packs with features
like thin-pack and ofs-delta only if the client has declared that it
supports them.

> > +Clients then should understand that the returned packfile could be incomplete,
> > +and that it needs to download all the given URIs before the fetch or clone is
> > +complete. Each URI should point to a Git packfile (which may be a thin pack and
> > +which may contain offset deltas).
> 
> weaken or remove the (parenthetical comment) in the last sentence,
> and replace the beginning of the section with something like
> 
> 	If the client replies with 'packfile-uris', when the server
> 	sends the packfile, it MAY send a `packfile-uris` section...
> 
> You may steal what I wrote in the above response to help the
> server-side folks to decide how to actually implement the "it MAY
> send a packfile-uris" part in the document.

OK, will do.

> OK, this comes back to what I alluded to at the beginning.  We could
> respond to a full-clone request by feeding a series of packfile-uris
> and some ref information, perhaps like this:
> 
> 	* Grab this packfile and update your remote-tracking refs
>           and tags to these values; you'd be as if you cloned the
>           project when it was at v1.0.
> 
> 	* When you are done with the above, grab this packfile and
>           update your remote-tracking refs and tags to these values;
>           you'd be as if you cloned the project when it was at v2.0.
> 
> 	* When you are done with the above, grab this packfile and
>           update your remote-tracking refs and tags to these values;
>           you'd be as if you cloned the project when it was at v3.0.
> 
> 	...
> 
> 	* When you are done with the above, here is the remaining
>           packdata to bring you fully up to date with your original
>           "want"s.
> 
> and before fully reading the proposal, I anticipated that it was
> what you were going to describe.  The major difference is "up to the
> packdata given to you so far, you'd be as if you fetched these" ref
> information, which would allow you to be interrupted and then simply
> resume, without having to remember the set of packfile-uris yet to
> be processed across a fetch/clone failure.  If you sucessfully fetch
> packfile for ..v1.0, you can update the remote-tracking refs to
> match as if you fetched back when that was the most recent state of
> the project, and then if you failed while transferring packfile for
> v1.0..v2.0, the resuming would just reissue "git fetch" internally.

The "up to" would work if we had the stale clone + periodic "upgrades"
arrangement you describe, but not when, for example, we just want to
separate large blobs out. If we were to insist on attaching ref
information to each packfile URI (or turn the packfiles into bundles),
it is true that we can have resumable fetch, although I haven't fully
thought out the implications of letting the user modify the repository
while a fetch is in progress. (What happens if the user wipes out their
object store in between fetching one packfile and the next, for
example?) That is why I only talked about resumable clone, not resumable
fetch.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line
  2018-12-06  6:35   ` Junio C Hamano
@ 2018-12-06 23:25     ` Jonathan Tan
  2018-12-07  0:22       ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Tan @ 2018-12-06 23:25 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > @@ -126,6 +129,12 @@ static int read_pack_objects_stdout(int outfd, struct output_state *os)
> >  	}
> >  	os->used += readsz;
> >  
> > +	if (!os->packfile_started) {
> > +		os->packfile_started = 1;
> > +		if (use_protocol_v2)
> > +			packet_write_fmt(1, "packfile\n");
> 
> If we fix this function so that the only byte in the buffer is held
> back without emitted when os->used == 1 as I alluded to, this may
> have to be done a bit later, as with such a change, it is no longer
> guaranteed that send_client_data() will be called after this point.

I'm not sure what you mean about there being no guarantee that
send_client_data() is not called - in create_pack_file(), there is an
"if (output_state.used > 0)" line (previously "if (0 <= buffered)") that
outputs anything remaining.

> Isn't progress output that goes to the channel #2 pretty much
> independent from the payload stream that carries the pkt-line 
> command like "packfile" plus the raw pack stream?  It somehow
> feels like an oxymoron to _buffer_ progress indicator, as it
> defeats the whole point of progress report to buffer it.

Yes, it is - I don't fully like this part of the design. I alluded to a
similar issue (keepalive) in the toplevel email [1] and that it might be
better if the server can send sideband throughout the whole response -
perhaps that should be investigated first. If we had sideband throughout
the whole response, we wouldn't need to buffer the progress indicator.

[1] https://public-inbox.org/git/cover.1543879256.git.jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line
  2018-12-06 23:25     ` Jonathan Tan
@ 2018-12-07  0:22       ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2018-12-07  0:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

>> Jonathan Tan <jonathantanmy@google.com> writes:
>> 
>> > @@ -126,6 +129,12 @@ static int read_pack_objects_stdout(int outfd, struct output_state *os)
>> >  	}
>> >  	os->used += readsz;
>> >  
>> > +	if (!os->packfile_started) {
>> > +		os->packfile_started = 1;
>> > +		if (use_protocol_v2)
>> > +			packet_write_fmt(1, "packfile\n");
>> 
>> If we fix this function so that the only byte in the buffer is held
>> back without emitted when os->used == 1 as I alluded to, this may
>> have to be done a bit later, as with such a change, it is no longer
>> guaranteed that send_client_data() will be called after this point.
>
> I'm not sure what you mean about there being no guarantee that
> send_client_data() is not called - in create_pack_file(), there is an
> "if (output_state.used > 0)" line (previously "if (0 <= buffered)") that
> outputs anything remaining.

I was referring to this part of the review on the previous step,
which you may not yet have read.

    OK, this corresponds to the "*cp++ = buffered"  in the original just
    before xread().

    > +		os->used = 1;
    > +	} else {
    > +		send_client_data(1, os->buffer, os->used);
    > +		os->used = 0;

    I am not sure if the code is correct when os->used happens to be 1
    (shouldn't we hold the byte, skip the call to send_client_data(),
    and go back to poll() to expect more data?), but this is a faithful
    code movement and rewrite of the original.

The point of this logic is to make sure we always hold back some
bytes and do not feed *all* the bytes to the other side by calling
"send-client-data" until we made sure the upstream of what we are
relaying (pack-objects?) successfully exited, but it looks to me
that the "else" clause above ends up flushing everything when
os->used is 1, which goes against the whole purpose of the code.

And the "fix" I was alluding to was to update that "else" clause to
make it a no-op that keeps os->used non-zero, which would not call
send-client-data.

When that fix happens, the part that early in the function this
patch added "now we know we will call send-client-data, so let's say
'here comes packdata' unless we have already said that" is making
the decision too early.  Depending on the value of os->used when we
enter the code and the number of bytes xread() reads from the
upstream, we might not call send-client-data yet (namely, when we
have no buffered data and we happened to get only one byte).

> ... it might be
> better if the server can send sideband throughout the whole response -
> perhaps that should be investigated first.

Yup.  It just looked quite crazy, and it is even more crazy to
buffer keepalives ;-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [WIP RFC 1/5] Documentation: order protocol v2 sections
  2018-12-06 22:54     ` Jonathan Tan
@ 2018-12-09  0:15       ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2018-12-09  0:15 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

>> > The git command line expects Git servers to follow a specific order of
>> 
>> "Command line"?  It sounds like you are talking about the order of
>> command line arguments and options, but apparently that is not what
>> you are doing.  Is it "The git over-the-wire protocol"?
>
> I meant to say the current Git implementation, as opposed to what is
> written in the specification. I'll replace it with "The current C Git
> implementation".

Yeah, that would avoid confusing future readers; sounds good.

>> Earlier, we said that shallow-info is not given when packfile is not
>> there.  That is captured in the updated EBNF above.  We don't have a
>> corresponding removal of a bullet point for wanted-refs section below
>> but probably that is because the original did not have corresponding
>> bullet point to begin with.
>
> That's because the corresponding bullet point had other information.
> Quoted in full below:
>
>> 	* This section is only included if the client has requested a
>> 	  ref using a 'want-ref' line and if a packfile section is also
>> 	  included in the response.
>
> I could reword it to "If a packfile section is included in the response,
> this section is only included if the client has requested a ref using a
> 'want-ref' line", but I don't think that is significantly clearer.

I don't either.  I didn't mean to suggest to change anything in this
part.  I was just giving an observation---two parallel things do not
get updates in tandem, and that is because they were not described
the same way to begin with, which was a good enough explanation.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, back to index

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-03 23:37 [WIP RFC 0/5] Design for offloading part of packfile response to CDN Jonathan Tan
2018-12-03 23:37 ` [WIP RFC 1/5] Documentation: order protocol v2 sections Jonathan Tan
2018-12-05  4:10   ` Junio C Hamano
2018-12-06 22:54     ` Jonathan Tan
2018-12-09  0:15       ` Junio C Hamano
2018-12-03 23:37 ` [WIP RFC 2/5] Documentation: add Packfile URIs design doc Jonathan Tan
2018-12-04  0:21   ` Stefan Beller
2018-12-04  1:54   ` brian m. carlson
2018-12-04 19:29     ` Jonathan Tan
2018-12-05  5:02   ` Junio C Hamano
2018-12-05  5:55     ` Junio C Hamano
2018-12-06 23:16     ` Jonathan Tan
2018-12-03 23:37 ` [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out Jonathan Tan
2018-12-04  0:30   ` Stefan Beller
2018-12-05  6:30   ` Junio C Hamano
2018-12-03 23:37 ` [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line Jonathan Tan
2018-12-06  6:35   ` Junio C Hamano
2018-12-06 23:25     ` Jonathan Tan
2018-12-07  0:22       ` Junio C Hamano
2018-12-03 23:37 ` [WIP RFC 5/5] upload-pack: send part of packfile response as uri Jonathan Tan
2018-12-04 20:09   ` Stefan Beller
2018-12-04  0:01 ` [WIP RFC 0/5] Design for offloading part of packfile response to CDN Stefan Beller

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox