git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/5] Protocol v2 fix: http and auth
@ 2019-02-14 19:06 Jonathan Tan
  2019-02-14 19:06 ` [PATCH 1/5] remote-curl: reduce scope of rpc_state.argv Jonathan Tan
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Jonathan Tan @ 2019-02-14 19:06 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff

Peff noticed an issue with my http auth in protocol v2 patch earlier
[1], and in the ensuing discussion, I thought that it would be best to
make v2 use post_rpc() as well (to be the same as v0/v1) instead of
using its own functions, to fix this issue and try to avoid a similar
issue in the future. Besides that, there is also a net reduction in
lines of code.

So here are the patches. First 4 are refactoring - the last one has the
actual change. This is on the master branch.

[1] https://public-inbox.org/git/20190206212928.GB12737@sigill.intra.peff.net/

Jonathan Tan (5):
  remote-curl: reduce scope of rpc_state.argv
  remote-curl: reduce scope of rpc_state.stdin_preamble
  remote-curl: reduce scope of rpc_state.result
  remote-curl: refactor reading into rpc_state's buf
  remote-curl: use post_rpc() for protocol v2 also

 pkt-line.c             |   2 +-
 pkt-line.h             |   1 +
 remote-curl.c          | 351 ++++++++++++++++++-----------------------
 t/t5702-protocol-v2.sh |  26 ++-
 4 files changed, 181 insertions(+), 199 deletions(-)

-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/5] remote-curl: reduce scope of rpc_state.argv
  2019-02-14 19:06 [PATCH 0/5] Protocol v2 fix: http and auth Jonathan Tan
@ 2019-02-14 19:06 ` Jonathan Tan
  2019-02-14 19:06 ` [PATCH 2/5] remote-curl: reduce scope of rpc_state.stdin_preamble Jonathan Tan
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Jonathan Tan @ 2019-02-14 19:06 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff

The argv field in struct rpc_state is only used in rpc_service(), and
not in any functions it directly or indirectly calls. Refactor it to
become an argument of rpc_service() instead.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 remote-curl.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 2e04d53ac8..3bc5055da6 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -491,7 +491,6 @@ static void output_refs(struct ref *refs)
 
 struct rpc_state {
 	const char *service_name;
-	const char **argv;
 	struct strbuf *stdin_preamble;
 	char *service_url;
 	char *hdr_content_type;
@@ -815,7 +814,8 @@ static int post_rpc(struct rpc_state *rpc)
 	return err;
 }
 
-static int rpc_service(struct rpc_state *rpc, struct discovery *heads)
+static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
+		       const char **client_argv)
 {
 	const char *svc = rpc->service_name;
 	struct strbuf buf = STRBUF_INIT;
@@ -826,7 +826,7 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads)
 	client.in = -1;
 	client.out = -1;
 	client.git_cmd = 1;
-	client.argv = rpc->argv;
+	client.argv = client_argv;
 	if (start_command(&client))
 		exit(1);
 	if (preamble)
@@ -964,11 +964,10 @@ static int fetch_git(struct discovery *heads,
 
 	memset(&rpc, 0, sizeof(rpc));
 	rpc.service_name = "git-upload-pack",
-	rpc.argv = args.argv;
 	rpc.stdin_preamble = &preamble;
 	rpc.gzip_request = 1;
 
-	err = rpc_service(&rpc, heads);
+	err = rpc_service(&rpc, heads, args.argv);
 	if (rpc.result.len)
 		write_or_die(1, rpc.result.buf, rpc.result.len);
 	strbuf_release(&rpc.result);
@@ -1098,10 +1097,9 @@ static int push_git(struct discovery *heads, int nr_spec, char **specs)
 
 	memset(&rpc, 0, sizeof(rpc));
 	rpc.service_name = "git-receive-pack",
-	rpc.argv = args.argv;
 	rpc.stdin_preamble = &preamble;
 
-	err = rpc_service(&rpc, heads);
+	err = rpc_service(&rpc, heads, args.argv);
 	if (rpc.result.len)
 		write_or_die(1, rpc.result.buf, rpc.result.len);
 	strbuf_release(&rpc.result);
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/5] remote-curl: reduce scope of rpc_state.stdin_preamble
  2019-02-14 19:06 [PATCH 0/5] Protocol v2 fix: http and auth Jonathan Tan
  2019-02-14 19:06 ` [PATCH 1/5] remote-curl: reduce scope of rpc_state.argv Jonathan Tan
@ 2019-02-14 19:06 ` Jonathan Tan
  2019-02-14 19:06 ` [PATCH 3/5] remote-curl: reduce scope of rpc_state.result Jonathan Tan
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Jonathan Tan @ 2019-02-14 19:06 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff

The stdin_preamble field in struct rpc_state is only used in
rpc_service(), and not in any functions it directly or indirectly calls.
Refactor it to become an argument of rpc_service() instead.

An observation of all callers of rpc_service() shows that the preamble
is always set, so we no longer need the "if (preamble)" check.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 remote-curl.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 3bc5055da6..bff0bb9af6 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -491,7 +491,6 @@ static void output_refs(struct ref *refs)
 
 struct rpc_state {
 	const char *service_name;
-	struct strbuf *stdin_preamble;
 	char *service_url;
 	char *hdr_content_type;
 	char *hdr_accept;
@@ -815,11 +814,10 @@ static int post_rpc(struct rpc_state *rpc)
 }
 
 static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
-		       const char **client_argv)
+		       const char **client_argv, const struct strbuf *preamble)
 {
 	const char *svc = rpc->service_name;
 	struct strbuf buf = STRBUF_INIT;
-	struct strbuf *preamble = rpc->stdin_preamble;
 	struct child_process client = CHILD_PROCESS_INIT;
 	int err = 0;
 
@@ -829,8 +827,7 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
 	client.argv = client_argv;
 	if (start_command(&client))
 		exit(1);
-	if (preamble)
-		write_or_die(client.in, preamble->buf, preamble->len);
+	write_or_die(client.in, preamble->buf, preamble->len);
 	if (heads)
 		write_or_die(client.in, heads->buf, heads->len);
 
@@ -964,10 +961,9 @@ static int fetch_git(struct discovery *heads,
 
 	memset(&rpc, 0, sizeof(rpc));
 	rpc.service_name = "git-upload-pack",
-	rpc.stdin_preamble = &preamble;
 	rpc.gzip_request = 1;
 
-	err = rpc_service(&rpc, heads, args.argv);
+	err = rpc_service(&rpc, heads, args.argv, &preamble);
 	if (rpc.result.len)
 		write_or_die(1, rpc.result.buf, rpc.result.len);
 	strbuf_release(&rpc.result);
@@ -1097,9 +1093,8 @@ static int push_git(struct discovery *heads, int nr_spec, char **specs)
 
 	memset(&rpc, 0, sizeof(rpc));
 	rpc.service_name = "git-receive-pack",
-	rpc.stdin_preamble = &preamble;
 
-	err = rpc_service(&rpc, heads, args.argv);
+	err = rpc_service(&rpc, heads, args.argv, &preamble);
 	if (rpc.result.len)
 		write_or_die(1, rpc.result.buf, rpc.result.len);
 	strbuf_release(&rpc.result);
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/5] remote-curl: reduce scope of rpc_state.result
  2019-02-14 19:06 [PATCH 0/5] Protocol v2 fix: http and auth Jonathan Tan
  2019-02-14 19:06 ` [PATCH 1/5] remote-curl: reduce scope of rpc_state.argv Jonathan Tan
  2019-02-14 19:06 ` [PATCH 2/5] remote-curl: reduce scope of rpc_state.stdin_preamble Jonathan Tan
@ 2019-02-14 19:06 ` Jonathan Tan
  2019-02-14 19:06 ` [PATCH 4/5] remote-curl: refactor reading into rpc_state's buf Jonathan Tan
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Jonathan Tan @ 2019-02-14 19:06 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff

The result field in struct rpc_state is only used in rpc_service(), and
not in any functions it directly or indirectly calls. Refactor it to
become an argument of rpc_service() instead.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 remote-curl.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index bff0bb9af6..2eb39774d0 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -502,7 +502,6 @@ struct rpc_state {
 	int in;
 	int out;
 	int any_written;
-	struct strbuf result;
 	unsigned gzip_request : 1;
 	unsigned initial_buffer : 1;
 };
@@ -814,7 +813,8 @@ static int post_rpc(struct rpc_state *rpc)
 }
 
 static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
-		       const char **client_argv, const struct strbuf *preamble)
+		       const char **client_argv, const struct strbuf *preamble,
+		       struct strbuf *rpc_result)
 {
 	const char *svc = rpc->service_name;
 	struct strbuf buf = STRBUF_INIT;
@@ -835,7 +835,6 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
 	rpc->buf = xmalloc(rpc->alloc);
 	rpc->in = client.in;
 	rpc->out = client.out;
-	strbuf_init(&rpc->result, 0);
 
 	strbuf_addf(&buf, "%s%s", url.buf, svc);
 	rpc->service_url = strbuf_detach(&buf, NULL);
@@ -863,7 +862,7 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
 	close(client.in);
 	client.in = -1;
 	if (!err) {
-		strbuf_read(&rpc->result, client.out, 0);
+		strbuf_read(rpc_result, client.out, 0);
 	} else {
 		char buf[4096];
 		for (;;)
@@ -916,6 +915,7 @@ static int fetch_git(struct discovery *heads,
 	struct strbuf preamble = STRBUF_INIT;
 	int i, err;
 	struct argv_array args = ARGV_ARRAY_INIT;
+	struct strbuf rpc_result = STRBUF_INIT;
 
 	argv_array_pushl(&args, "fetch-pack", "--stateless-rpc",
 			 "--stdin", "--lock-pack", NULL);
@@ -963,10 +963,10 @@ static int fetch_git(struct discovery *heads,
 	rpc.service_name = "git-upload-pack",
 	rpc.gzip_request = 1;
 
-	err = rpc_service(&rpc, heads, args.argv, &preamble);
-	if (rpc.result.len)
-		write_or_die(1, rpc.result.buf, rpc.result.len);
-	strbuf_release(&rpc.result);
+	err = rpc_service(&rpc, heads, args.argv, &preamble, &rpc_result);
+	if (rpc_result.len)
+		write_or_die(1, rpc_result.buf, rpc_result.len);
+	strbuf_release(&rpc_result);
 	strbuf_release(&preamble);
 	argv_array_clear(&args);
 	return err;
@@ -1061,6 +1061,7 @@ static int push_git(struct discovery *heads, int nr_spec, char **specs)
 	struct argv_array args;
 	struct string_list_item *cas_option;
 	struct strbuf preamble = STRBUF_INIT;
+	struct strbuf rpc_result = STRBUF_INIT;
 
 	argv_array_init(&args);
 	argv_array_pushl(&args, "send-pack", "--stateless-rpc", "--helper-status",
@@ -1094,10 +1095,10 @@ static int push_git(struct discovery *heads, int nr_spec, char **specs)
 	memset(&rpc, 0, sizeof(rpc));
 	rpc.service_name = "git-receive-pack",
 
-	err = rpc_service(&rpc, heads, args.argv, &preamble);
-	if (rpc.result.len)
-		write_or_die(1, rpc.result.buf, rpc.result.len);
-	strbuf_release(&rpc.result);
+	err = rpc_service(&rpc, heads, args.argv, &preamble, &rpc_result);
+	if (rpc_result.len)
+		write_or_die(1, rpc_result.buf, rpc_result.len);
+	strbuf_release(&rpc_result);
 	strbuf_release(&preamble);
 	argv_array_clear(&args);
 	return err;
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/5] remote-curl: refactor reading into rpc_state's buf
  2019-02-14 19:06 [PATCH 0/5] Protocol v2 fix: http and auth Jonathan Tan
                   ` (2 preceding siblings ...)
  2019-02-14 19:06 ` [PATCH 3/5] remote-curl: reduce scope of rpc_state.result Jonathan Tan
@ 2019-02-14 19:06 ` Jonathan Tan
  2019-02-14 19:06 ` [PATCH 5/5] remote-curl: use post_rpc() for protocol v2 also Jonathan Tan
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Jonathan Tan @ 2019-02-14 19:06 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff

Currently, whenever remote-curl reads pkt-lines from its response file
descriptor, only the payload is written to its buf, not the 4 characters
denoting the length. A future patch will require the ability to also
write those 4 characters, so in preparation for that, refactor this read
into its own function.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 remote-curl.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 2eb39774d0..32c133f636 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -506,6 +506,25 @@ struct rpc_state {
 	unsigned initial_buffer : 1;
 };
 
+/*
+ * Appends the result of reading from rpc->out to the string represented by
+ * rpc->buf and rpc->len if there is enough space. Returns 1 if there was
+ * enough space, 0 otherwise.
+ *
+ * Writes the number of bytes appended into appended.
+ */
+static int rpc_read_from_out(struct rpc_state *rpc, size_t *appended) {
+	size_t left = rpc->alloc - rpc->len;
+	char *buf = rpc->buf + rpc->len;
+
+	if (left < LARGE_PACKET_MAX)
+		return 0;
+
+	*appended = packet_read(rpc->out, NULL, NULL, buf, left, 0);
+	rpc->len += *appended;
+	return 1;
+}
+
 static size_t rpc_out(void *ptr, size_t eltsize,
 		size_t nmemb, void *buffer_)
 {
@@ -515,11 +534,12 @@ static size_t rpc_out(void *ptr, size_t eltsize,
 
 	if (!avail) {
 		rpc->initial_buffer = 0;
-		avail = packet_read(rpc->out, NULL, NULL, rpc->buf, rpc->alloc, 0);
+		rpc->len = 0;
+		if (!rpc_read_from_out(rpc, &avail))
+			BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
 		if (!avail)
 			return 0;
 		rpc->pos = 0;
-		rpc->len = avail;
 	}
 
 	if (max < avail)
@@ -663,20 +683,15 @@ static int post_rpc(struct rpc_state *rpc)
 	 * chunked encoding mess.
 	 */
 	while (1) {
-		size_t left = rpc->alloc - rpc->len;
-		char *buf = rpc->buf + rpc->len;
-		int n;
+		size_t n;
 
-		if (left < LARGE_PACKET_MAX) {
+		if (!rpc_read_from_out(rpc, &n)) {
 			large_request = 1;
 			use_gzip = 0;
 			break;
 		}
-
-		n = packet_read(rpc->out, NULL, NULL, buf, left, 0);
 		if (!n)
 			break;
-		rpc->len += n;
 	}
 
 	if (large_request) {
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/5] remote-curl: use post_rpc() for protocol v2 also
  2019-02-14 19:06 [PATCH 0/5] Protocol v2 fix: http and auth Jonathan Tan
                   ` (3 preceding siblings ...)
  2019-02-14 19:06 ` [PATCH 4/5] remote-curl: refactor reading into rpc_state's buf Jonathan Tan
@ 2019-02-14 19:06 ` Jonathan Tan
  2019-02-14 22:47   ` Junio C Hamano
  2019-02-21 13:46   ` Jeff King
  2019-02-21 13:47 ` [PATCH 0/5] Protocol v2 fix: http and auth Jeff King
  2019-02-21 20:24 ` [PATCH v2 " Jonathan Tan
  6 siblings, 2 replies; 24+ messages in thread
From: Jonathan Tan @ 2019-02-14 19:06 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff

When transmitting and receiving POSTs for protocol v0 and v1,
remote-curl uses post_rpc() (and associated functions), but when doing
the same for protocol v2, it uses a separate set of functions
(proxy_rpc() and others). Besides duplication of code, this has caused
at least one bug: the auth retry mechanism that was implemented in v0/v1
was not implemented in v2.

To fix this issue and avoid it in the future, make remote-curl also use
post_rpc() when handling protocol v2. Because line lengths are written
to the HTTP request in protocol v2 (unlike in protocol v0/v1), this
necessitates changes in post_rpc() and some of the functions it uses;
perform these changes too.

A test has been included to ensure that the code for both the unchunked
and chunked variants of the HTTP request is exercised.

Note: stateless_connect() has been updated to use the lower-level packet
reading functions instead of struct packet_reader. The low-level control
is necessary here because we cannot change the destination buffer of
struct packet_reader while it is being used; struct packet_buffer has a
peeking mechanism which relies on the destination buffer being present
in between a peek and a read.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 pkt-line.c             |   2 +-
 pkt-line.h             |   1 +
 remote-curl.c          | 298 +++++++++++++++++------------------------
 t/t5702-protocol-v2.sh |  26 +++-
 4 files changed, 150 insertions(+), 177 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index d4b71d3e82..60329b301b 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -117,7 +117,7 @@ void packet_buf_delim(struct strbuf *buf)
 	strbuf_add(buf, "0001", 4);
 }
 
-static void set_packet_header(char *buf, const int size)
+void set_packet_header(char *buf, const int size)
 {
 	static char hexchar[] = "0123456789abcdef";
 
diff --git a/pkt-line.h b/pkt-line.h
index ad9a4a2cd7..c36cb788ed 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -25,6 +25,7 @@ void packet_delim(int fd);
 void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_delim(struct strbuf *buf);
+void set_packet_header(char *buf, int size);
 void packet_write(int fd_out, const char *buf, size_t size);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_write_len(struct strbuf *buf, const char *data, size_t len);
diff --git a/remote-curl.c b/remote-curl.c
index 32c133f636..13836e4c28 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -504,6 +504,18 @@ struct rpc_state {
 	int any_written;
 	unsigned gzip_request : 1;
 	unsigned initial_buffer : 1;
+
+	/*
+	 * Whenever a pkt-line is read into buf, append the 4 characters
+	 * denoting its length before appending the payload.
+	 */
+	unsigned write_line_lengths : 1;
+
+	/*
+	 * rpc_out uses this to keep track of whether it should continue
+	 * reading to populate the current request. Initialize to 0.
+	 */
+	unsigned stop_reading : 1;
 };
 
 /*
@@ -511,17 +523,54 @@ struct rpc_state {
  * rpc->buf and rpc->len if there is enough space. Returns 1 if there was
  * enough space, 0 otherwise.
  *
- * Writes the number of bytes appended into appended.
+ * If rpc->write_line_lengths is true, appends the line length as a 4-byte
+ * hexadecimal string before appending the result described above.
+ *
+ * Writes the total number of bytes appended into appended.
  */
-static int rpc_read_from_out(struct rpc_state *rpc, size_t *appended) {
-	size_t left = rpc->alloc - rpc->len;
-	char *buf = rpc->buf + rpc->len;
+static int rpc_read_from_out(struct rpc_state *rpc, int options,
+			     size_t *appended,
+			     enum packet_read_status *status) {
+	size_t left;
+	char *buf;
+	int pktlen_raw;
+
+	if (rpc->write_line_lengths) {
+		left = rpc->alloc - rpc->len - 4;
+		buf = rpc->buf + rpc->len + 4;
+	} else {
+		left = rpc->alloc - rpc->len;
+		buf = rpc->buf + rpc->len;
+	}
 
 	if (left < LARGE_PACKET_MAX)
 		return 0;
 
-	*appended = packet_read(rpc->out, NULL, NULL, buf, left, 0);
-	rpc->len += *appended;
+	*status = packet_read_with_status(rpc->out, NULL, NULL, buf,
+			left, &pktlen_raw, options);
+	if (*status != PACKET_READ_EOF) {
+		*appended = pktlen_raw + (rpc->write_line_lengths ? 4 : 0);
+		rpc->len += *appended;
+	}
+
+	if (rpc->write_line_lengths) {
+		switch (*status) {
+		case PACKET_READ_EOF:
+			if (!(options & PACKET_READ_GENTLE_ON_EOF))
+				die("shouldn't have EOF when not gentle on EOF");
+			break;
+		case PACKET_READ_NORMAL:
+			set_packet_header(buf - 4, *appended);
+			break;
+		case PACKET_READ_DELIM:
+			memcpy(buf - 4, "0001", 4);
+			break;
+		case PACKET_READ_FLUSH:
+			memcpy(buf - 4, "0000", 4);
+			break;
+		}
+	}
+
 	return 1;
 }
 
@@ -531,15 +580,32 @@ static size_t rpc_out(void *ptr, size_t eltsize,
 	size_t max = eltsize * nmemb;
 	struct rpc_state *rpc = buffer_;
 	size_t avail = rpc->len - rpc->pos;
+	enum packet_read_status status;
 
 	if (!avail) {
 		rpc->initial_buffer = 0;
 		rpc->len = 0;
-		if (!rpc_read_from_out(rpc, &avail))
-			BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
-		if (!avail)
-			return 0;
 		rpc->pos = 0;
+		if (!rpc->stop_reading) {
+			if (!rpc_read_from_out(rpc, 0, &avail, &status))
+				BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
+			if (status == PACKET_READ_FLUSH)
+				/*
+				 * We are done reading for this request, but we
+				 * still need to send this line out (if
+				 * rpc->write_line_lengths is true) so do not
+				 * return yet.
+				 */
+				rpc->stop_reading = 1;
+		}
+	}
+	if (!avail && rpc->stop_reading) {
+		/*
+		 * "return 0" will notify Curl that this RPC request is done,
+		 * so reset stop_reading back to 0 for the next request.
+		 */
+		rpc->stop_reading = 0;
+		return 0;
 	}
 
 	if (max < avail)
@@ -684,13 +750,14 @@ static int post_rpc(struct rpc_state *rpc)
 	 */
 	while (1) {
 		size_t n;
+		enum packet_read_status status;
 
-		if (!rpc_read_from_out(rpc, &n)) {
+		if (!rpc_read_from_out(rpc, 0, &n, &status)) {
 			large_request = 1;
 			use_gzip = 0;
 			break;
 		}
-		if (!n)
+		if (status == PACKET_READ_FLUSH)
 			break;
 	}
 
@@ -1165,165 +1232,11 @@ static void parse_push(struct strbuf *buf)
 	free(specs);
 }
 
-/*
- * Used to represent the state of a connection to an HTTP server when
- * communicating using git's wire-protocol version 2.
- */
-struct proxy_state {
-	char *service_name;
-	char *service_url;
-	struct curl_slist *headers;
-	struct strbuf request_buffer;
-	int in;
-	int out;
-	struct packet_reader reader;
-	size_t pos;
-	int seen_flush;
-};
-
-static void proxy_state_init(struct proxy_state *p, const char *service_name,
-			     enum protocol_version version)
-{
-	struct strbuf buf = STRBUF_INIT;
-
-	memset(p, 0, sizeof(*p));
-	p->service_name = xstrdup(service_name);
-
-	p->in = 0;
-	p->out = 1;
-	strbuf_init(&p->request_buffer, 0);
-
-	strbuf_addf(&buf, "%s%s", url.buf, p->service_name);
-	p->service_url = strbuf_detach(&buf, NULL);
-
-	p->headers = http_copy_default_headers();
-
-	strbuf_addf(&buf, "Content-Type: application/x-%s-request", p->service_name);
-	p->headers = curl_slist_append(p->headers, buf.buf);
-	strbuf_reset(&buf);
-
-	strbuf_addf(&buf, "Accept: application/x-%s-result", p->service_name);
-	p->headers = curl_slist_append(p->headers, buf.buf);
-	strbuf_reset(&buf);
-
-	p->headers = curl_slist_append(p->headers, "Transfer-Encoding: chunked");
-
-	/* Add the Git-Protocol header */
-	if (get_protocol_http_header(version, &buf))
-		p->headers = curl_slist_append(p->headers, buf.buf);
-
-	packet_reader_init(&p->reader, p->in, NULL, 0,
-			   PACKET_READ_GENTLE_ON_EOF |
-			   PACKET_READ_DIE_ON_ERR_PACKET);
-
-	strbuf_release(&buf);
-}
-
-static void proxy_state_clear(struct proxy_state *p)
-{
-	free(p->service_name);
-	free(p->service_url);
-	curl_slist_free_all(p->headers);
-	strbuf_release(&p->request_buffer);
-}
-
-/*
- * CURLOPT_READFUNCTION callback function.
- * Attempts to copy over a single packet-line at a time into the
- * curl provided buffer.
- */
-static size_t proxy_in(char *buffer, size_t eltsize,
-		       size_t nmemb, void *userdata)
-{
-	size_t max;
-	struct proxy_state *p = userdata;
-	size_t avail = p->request_buffer.len - p->pos;
-
-
-	if (eltsize != 1)
-		BUG("curl read callback called with size = %"PRIuMAX" != 1",
-		    (uintmax_t)eltsize);
-	max = nmemb;
-
-	if (!avail) {
-		if (p->seen_flush) {
-			p->seen_flush = 0;
-			return 0;
-		}
-
-		strbuf_reset(&p->request_buffer);
-		switch (packet_reader_read(&p->reader)) {
-		case PACKET_READ_EOF:
-			die("unexpected EOF when reading from parent process");
-		case PACKET_READ_NORMAL:
-			packet_buf_write_len(&p->request_buffer, p->reader.line,
-					     p->reader.pktlen);
-			break;
-		case PACKET_READ_DELIM:
-			packet_buf_delim(&p->request_buffer);
-			break;
-		case PACKET_READ_FLUSH:
-			packet_buf_flush(&p->request_buffer);
-			p->seen_flush = 1;
-			break;
-		}
-		p->pos = 0;
-		avail = p->request_buffer.len;
-	}
-
-	if (max < avail)
-		avail = max;
-	memcpy(buffer, p->request_buffer.buf + p->pos, avail);
-	p->pos += avail;
-	return avail;
-}
-
-static size_t proxy_out(char *buffer, size_t eltsize,
-			size_t nmemb, void *userdata)
-{
-	size_t size;
-	struct proxy_state *p = userdata;
-
-	if (eltsize != 1)
-		BUG("curl read callback called with size = %"PRIuMAX" != 1",
-		    (uintmax_t)eltsize);
-	size = nmemb;
-
-	write_or_die(p->out, buffer, size);
-	return size;
-}
-
-/* Issues a request to the HTTP server configured in `p` */
-static int proxy_request(struct proxy_state *p)
-{
-	struct active_request_slot *slot;
-
-	slot = get_active_slot();
-
-	curl_easy_setopt(slot->curl, CURLOPT_ENCODING, "");
-	curl_easy_setopt(slot->curl, CURLOPT_NOBODY, 0);
-	curl_easy_setopt(slot->curl, CURLOPT_POST, 1);
-	curl_easy_setopt(slot->curl, CURLOPT_URL, p->service_url);
-	curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, p->headers);
-
-	/* Setup function to read request from client */
-	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, proxy_in);
-	curl_easy_setopt(slot->curl, CURLOPT_READDATA, p);
-
-	/* Setup function to write server response to client */
-	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, proxy_out);
-	curl_easy_setopt(slot->curl, CURLOPT_WRITEDATA, p);
-
-	if (run_slot(slot, NULL) != HTTP_OK)
-		return -1;
-
-	return 0;
-}
-
 static int stateless_connect(const char *service_name)
 {
 	struct discovery *discover;
-	struct proxy_state p;
+	struct rpc_state rpc;
+	struct strbuf buf = STRBUF_INIT;
 
 	/*
 	 * Run the info/refs request and see if the server supports protocol
@@ -1343,23 +1256,58 @@ static int stateless_connect(const char *service_name)
 		fflush(stdout);
 	}
 
-	proxy_state_init(&p, service_name, discover->version);
+	rpc.service_name = service_name;
+	rpc.service_url = xstrfmt("%s%s", url.buf, rpc.service_name);
+	rpc.hdr_content_type = xstrfmt("Content-Type: application/x-%s-request", rpc.service_name);
+	rpc.hdr_accept = xstrfmt("Accept: application/x-%s-result", rpc.service_name);
+	if (get_protocol_http_header(discover->version, &buf)) {
+		rpc.protocol_header = strbuf_detach(&buf, NULL);
+	} else {
+		rpc.protocol_header = NULL;
+		strbuf_release(&buf);
+	}
+	rpc.buf = xmalloc(http_post_buffer);
+	rpc.alloc = http_post_buffer;
+	rpc.len = 0;
+	rpc.pos = 0;
+	rpc.in = 1;
+	rpc.out = 0;
+	rpc.any_written = 0;
+	rpc.gzip_request = 1;
+	rpc.initial_buffer = 0;
+	rpc.write_line_lengths = 1;
+	rpc.stop_reading = 0;
 
 	/*
 	 * Dump the capability listing that we got from the server earlier
 	 * during the info/refs request.
 	 */
-	write_or_die(p.out, discover->buf, discover->len);
+	write_or_die(rpc.in, discover->buf, discover->len);
 
-	/* Peek the next packet line.  Until we see EOF keep sending POSTs */
-	while (packet_reader_peek(&p.reader) != PACKET_READ_EOF) {
-		if (proxy_request(&p)) {
+	/* Until we see EOF keep sending POSTs */
+	while (1) {
+		size_t avail;
+		enum packet_read_status status;
+
+		if (!rpc_read_from_out(&rpc, PACKET_READ_GENTLE_ON_EOF, &avail,
+				       &status))
+			BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
+		if (status == PACKET_READ_EOF)
+			break;
+		if (post_rpc(&rpc))
 			/* We would have an err here */
 			break;
-		}
+		/* Reset the buffer for next request */
+		rpc.len = 0;
 	}
 
-	proxy_state_clear(&p);
+	free(rpc.service_url);
+	free(rpc.hdr_content_type);
+	free(rpc.hdr_accept);
+	free(rpc.protocol_header);
+	free(rpc.buf);
+	strbuf_release(&buf);
+
 	return 0;
 }
 
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index db4ae09f2f..61acf99d80 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -542,7 +542,31 @@ test_expect_success 'clone with http:// using protocol v2' '
 	# Client requested to use protocol v2
 	grep "Git-Protocol: version=2" log &&
 	# Server responded using protocol v2
-	grep "git< version 2" log
+	grep "git< version 2" log &&
+	# Verify that the chunked encoding sending codepath is NOT exercised
+	! grep "Send header: Transfer-Encoding: chunked" log
+'
+
+test_expect_success 'clone big repository with http:// using protocol v2' '
+	test_when_finished "rm -f log" &&
+
+	git init "$HTTPD_DOCUMENT_ROOT_PATH/big" &&
+	# Ensure that the list of wants is greater than http.postbuffer below
+	for i in $(seq 1 1500)
+	do
+		test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/big" "commit$i"
+	done &&
+
+	GIT_TRACE_PACKET="$(pwd)/log" GIT_TRACE_CURL="$(pwd)/log" git \
+		-c protocol.version=2 -c http.postbuffer=65536 \
+		clone "$HTTPD_URL/smart/big" big_child &&
+
+	# Client requested to use protocol v2
+	grep "Git-Protocol: version=2" log &&
+	# Server responded using protocol v2
+	grep "git< version 2" log &&
+	# Verify that the chunked encoding sending codepath is exercised
+	grep "Send header: Transfer-Encoding: chunked" log
 '
 
 test_expect_success 'fetch with http:// using protocol v2' '
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/5] remote-curl: use post_rpc() for protocol v2 also
  2019-02-14 19:06 ` [PATCH 5/5] remote-curl: use post_rpc() for protocol v2 also Jonathan Tan
@ 2019-02-14 22:47   ` Junio C Hamano
  2019-02-21 13:46   ` Jeff King
  1 sibling, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2019-02-14 22:47 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff

Jonathan Tan <jonathantanmy@google.com> writes:

> +	# Ensure that the list of wants is greater than http.postbuffer below
> +	for i in $(seq 1 1500)

test_seq.

> +	do
> +		test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/big" "commit$i"
> +	done &&
> +
> +	GIT_TRACE_PACKET="$(pwd)/log" GIT_TRACE_CURL="$(pwd)/log" git \
> +		-c protocol.version=2 -c http.postbuffer=65536 \
> +		clone "$HTTPD_URL/smart/big" big_child &&
> +
> +	# Client requested to use protocol v2
> +	grep "Git-Protocol: version=2" log &&
> +	# Server responded using protocol v2
> +	grep "git< version 2" log &&
> +	# Verify that the chunked encoding sending codepath is exercised
> +	grep "Send header: Transfer-Encoding: chunked" log
>  '
>  
>  test_expect_success 'fetch with http:// using protocol v2' '

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/5] remote-curl: use post_rpc() for protocol v2 also
  2019-02-14 19:06 ` [PATCH 5/5] remote-curl: use post_rpc() for protocol v2 also Jonathan Tan
  2019-02-14 22:47   ` Junio C Hamano
@ 2019-02-21 13:46   ` Jeff King
  2019-02-21 19:26     ` Jonathan Tan
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff King @ 2019-02-21 13:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Thu, Feb 14, 2019 at 11:06:39AM -0800, Jonathan Tan wrote:

> diff --git a/remote-curl.c b/remote-curl.c
> index 32c133f636..13836e4c28 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -504,6 +504,18 @@ struct rpc_state {
>  	int any_written;
>  	unsigned gzip_request : 1;
>  	unsigned initial_buffer : 1;
> +
> +	/*
> +	 * Whenever a pkt-line is read into buf, append the 4 characters
> +	 * denoting its length before appending the payload.
> +	 */
> +	unsigned write_line_lengths : 1;

Hmm, so we read a packet, and then we "append its length" before
appending the contents. But that would always be the length we just
read, right? I wonder if it would be simpler to just call this option
something like "proxy_packets" or "full_packets", teach the packet code
to give us the full packets, and then just treat that whole buffer as a
unit. I dunno. There might be some gotchas in practice, and it's not
like it's that much simpler. Just a thought.

> +	/*
> +	 * rpc_out uses this to keep track of whether it should continue
> +	 * reading to populate the current request. Initialize to 0.
> +	 */
> +	unsigned stop_reading : 1;

OK, so we need this because the v2 proxying will require us to stop
reading but keep the channel open? Kind of awkward, but I don't see a
way around it.

> +static int rpc_read_from_out(struct rpc_state *rpc, int options,
> +			     size_t *appended,
> +			     enum packet_read_status *status) {
> +	size_t left;
> +	char *buf;
> +	int pktlen_raw;
> +
> +	if (rpc->write_line_lengths) {
> +		left = rpc->alloc - rpc->len - 4;
> +		buf = rpc->buf + rpc->len + 4;
> +	} else {
> +		left = rpc->alloc - rpc->len;
> +		buf = rpc->buf + rpc->len;
> +	}

OK, so we push the packets 4 bytes further into the buffer in that case,
leaving room for the header. Makes sense.

>  	if (left < LARGE_PACKET_MAX)
>  		return 0;
>  
> -	*appended = packet_read(rpc->out, NULL, NULL, buf, left, 0);
> -	rpc->len += *appended;
> +	*status = packet_read_with_status(rpc->out, NULL, NULL, buf,
> +			left, &pktlen_raw, options);
> +	if (*status != PACKET_READ_EOF) {
> +		*appended = pktlen_raw + (rpc->write_line_lengths ? 4 : 0);
> +		rpc->len += *appended;
> +	}
> +
> +	if (rpc->write_line_lengths) {
> +		switch (*status) {
> +		case PACKET_READ_EOF:
> +			if (!(options & PACKET_READ_GENTLE_ON_EOF))
> +				die("shouldn't have EOF when not gentle on EOF");
> +			break;
> +		case PACKET_READ_NORMAL:
> +			set_packet_header(buf - 4, *appended);
> +			break;
> +		case PACKET_READ_DELIM:
> +			memcpy(buf - 4, "0001", 4);
> +			break;
> +		case PACKET_READ_FLUSH:
> +			memcpy(buf - 4, "0000", 4);
> +			break;
> +		}
> +	}

And here we fill it in. Make sense. It's a little awkward that we have
to re-translate READ_DELIM, etc, back into their headers.

> @@ -531,15 +580,32 @@ static size_t rpc_out(void *ptr, size_t eltsize,
>  	size_t max = eltsize * nmemb;
>  	struct rpc_state *rpc = buffer_;
>  	size_t avail = rpc->len - rpc->pos;
> +	enum packet_read_status status;
>  
>  	if (!avail) {
>  		rpc->initial_buffer = 0;
>  		rpc->len = 0;
> -		if (!rpc_read_from_out(rpc, &avail))
> -			BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
> -		if (!avail)
> -			return 0;
>  		rpc->pos = 0;
> +		if (!rpc->stop_reading) {
> +			if (!rpc_read_from_out(rpc, 0, &avail, &status))
> +				BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");

Do we actually need it to be LARGE_PACKET_MAX+4 here? I guess not,
because LARGE_PACKET_DATA_MAX is the "-4" version. So I think this BUG()
was perhaps already wrong?

> +			if (status == PACKET_READ_FLUSH)
> +				/*
> +				 * We are done reading for this request, but we
> +				 * still need to send this line out (if
> +				 * rpc->write_line_lengths is true) so do not
> +				 * return yet.
> +				 */
> +				rpc->stop_reading = 1;
> +		}
> +	}
> +	if (!avail && rpc->stop_reading) {
> +		/*
> +		 * "return 0" will notify Curl that this RPC request is done,
> +		 * so reset stop_reading back to 0 for the next request.
> +		 */
> +		rpc->stop_reading = 0;
> +		return 0;

OK, and here's where we handle the stop_reading thing. It is indeed
awkward, but I think your comments make it clear what's going on.

If we get stop_reading, do we care about "avail"? I.e., shouldn't we be
able to return non-zero to say "we got the whole input, this is not a
too-large request"?

> +test_expect_success 'clone big repository with http:// using protocol v2' '
> +	test_when_finished "rm -f log" &&
> +
> +	git init "$HTTPD_DOCUMENT_ROOT_PATH/big" &&
> +	# Ensure that the list of wants is greater than http.postbuffer below
> +	for i in $(seq 1 1500)
> +	do
> +		test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/big" "commit$i"
> +	done &&

As Junio noted, this should be test_seq. But I think it would be nice to
avoid looping on test_commit here at all. It kicks off at least 3
processes; multiplying that by 1500 is going to be slow.

Making a big input is often much faster by generating a fast-import
stream (which can often be done entirely in-shell). There's some prior
art in t3302, t5551, t5608, and others.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/5] Protocol v2 fix: http and auth
  2019-02-14 19:06 [PATCH 0/5] Protocol v2 fix: http and auth Jonathan Tan
                   ` (4 preceding siblings ...)
  2019-02-14 19:06 ` [PATCH 5/5] remote-curl: use post_rpc() for protocol v2 also Jonathan Tan
@ 2019-02-21 13:47 ` Jeff King
  2019-02-21 20:24 ` [PATCH v2 " Jonathan Tan
  6 siblings, 0 replies; 24+ messages in thread
From: Jeff King @ 2019-02-21 13:47 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Thu, Feb 14, 2019 at 11:06:34AM -0800, Jonathan Tan wrote:

> Peff noticed an issue with my http auth in protocol v2 patch earlier
> [1], and in the ensuing discussion, I thought that it would be best to
> make v2 use post_rpc() as well (to be the same as v0/v1) instead of
> using its own functions, to fix this issue and try to avoid a similar
> issue in the future. Besides that, there is also a net reduction in
> lines of code.
> 
> So here are the patches. First 4 are refactoring - the last one has the
> actual change. This is on the master branch.

I think this is the right approach. Patches 1-4 look pretty nice. I
didn't see anything wrong with the code in the 5th one, but I did have
some thoughts about how we might be able to do it a bit more cleanly
(though I will not at all be surprised if they turn out to be bad
suggestions).

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/5] remote-curl: use post_rpc() for protocol v2 also
  2019-02-21 13:46   ` Jeff King
@ 2019-02-21 19:26     ` Jonathan Tan
  0 siblings, 0 replies; 24+ messages in thread
From: Jonathan Tan @ 2019-02-21 19:26 UTC (permalink / raw)
  To: peff; +Cc: jonathantanmy, git

> > diff --git a/remote-curl.c b/remote-curl.c
> > index 32c133f636..13836e4c28 100644
> > --- a/remote-curl.c
> > +++ b/remote-curl.c
> > @@ -504,6 +504,18 @@ struct rpc_state {
> >  	int any_written;
> >  	unsigned gzip_request : 1;
> >  	unsigned initial_buffer : 1;
> > +
> > +	/*
> > +	 * Whenever a pkt-line is read into buf, append the 4 characters
> > +	 * denoting its length before appending the payload.
> > +	 */
> > +	unsigned write_line_lengths : 1;
> 
> Hmm, so we read a packet, and then we "append its length" before
> appending the contents. But that would always be the length we just
> read, right? I wonder if it would be simpler to just call this option
> something like "proxy_packets" or "full_packets", teach the packet code
> to give us the full packets, and then just treat that whole buffer as a
> unit. I dunno. There might be some gotchas in practice, and it's not
> like it's that much simpler. Just a thought.

Yes, the length is the length we just read. And it might not even be
simpler - this shifts the complexity to the code that does not use the 4
characters (unless we always return 2 pointers, which seems redundant,
and which now changes the problem of ensuring that the correct pointer
is used). I think that this is a good default - proxying still seems
rarer than just consuming payloads.

I'm OK with changing the name, although I think that both "proxy" and
"full" are less clear than "write_line_lengths" - aren't you still
proxying even if you're changing the format a little, and isn't a packet
"full" even without the line lengths?

> > +	/*
> > +	 * rpc_out uses this to keep track of whether it should continue
> > +	 * reading to populate the current request. Initialize to 0.
> > +	 */
> > +	unsigned stop_reading : 1;
> 
> OK, so we need this because the v2 proxying will require us to stop
> reading but keep the channel open? Kind of awkward, but I don't see a
> way around it.

I'll improve the comments here and elsewhere in the next version. This
basically means that "we read a flush but we haven't sent the 0000 to
libcurl yet, so don't read anything more until we have sent the 0000",
and "flush_read_but_not_sent" is probably a better name.

> > @@ -531,15 +580,32 @@ static size_t rpc_out(void *ptr, size_t eltsize,
> >  	size_t max = eltsize * nmemb;
> >  	struct rpc_state *rpc = buffer_;
> >  	size_t avail = rpc->len - rpc->pos;
> > +	enum packet_read_status status;
> >  
> >  	if (!avail) {
> >  		rpc->initial_buffer = 0;
> >  		rpc->len = 0;
> > -		if (!rpc_read_from_out(rpc, &avail))
> > -			BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
> > -		if (!avail)
> > -			return 0;
> >  		rpc->pos = 0;
> > +		if (!rpc->stop_reading) {
> > +			if (!rpc_read_from_out(rpc, 0, &avail, &status))
> > +				BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
> 
> Do we actually need it to be LARGE_PACKET_MAX+4 here? I guess not,
> because LARGE_PACKET_DATA_MAX is the "-4" version. So I think this BUG()
> was perhaps already wrong?

In this patch, yes (if not, the non-chunked code is useless).
Previously, the BUG should have been LARGE_PACKET_DATA_MAX, yes. The
BUG() was introduced in patch 4 of this set - I'll update that one to
LARGE_PACKET_DATA_MAX and this one to LARGE_PACKET_MAX.

> > +			if (status == PACKET_READ_FLUSH)
> > +				/*
> > +				 * We are done reading for this request, but we
> > +				 * still need to send this line out (if
> > +				 * rpc->write_line_lengths is true) so do not
> > +				 * return yet.
> > +				 */
> > +				rpc->stop_reading = 1;
> > +		}
> > +	}
> > +	if (!avail && rpc->stop_reading) {
> > +		/*
> > +		 * "return 0" will notify Curl that this RPC request is done,
> > +		 * so reset stop_reading back to 0 for the next request.
> > +		 */
> > +		rpc->stop_reading = 0;
> > +		return 0;
> 
> OK, and here's where we handle the stop_reading thing. It is indeed
> awkward, but I think your comments make it clear what's going on.
> 
> If we get stop_reading, do we care about "avail"? I.e., shouldn't we be
> able to return non-zero to say "we got the whole input, this is not a
> too-large request"?

This code is in rpc_out(), which is a callback passed as
CURLOPT_READFUNCTION. So returning non-zero means "send these bytes",
and returning zero means EOF.

We set stop_reading when we receive a flush, as you can see from the
quoted code snippet. But this does not mean that the buffer is empty -
the buffer may contain "0000" (if rpc->write_line_lengths is true, as
the comment states). We have to let libcurl repeatedly call this
function until all of the "0000" is sent (and return non-zero each
time). But once all of the "0000" is sent - we know this by avail being
zero - and libcurl calls this function once more, we have to remember to
do nothing except to reset stop_reading and return 0 to indicate EOF.

> > +test_expect_success 'clone big repository with http:// using protocol v2' '
> > +	test_when_finished "rm -f log" &&
> > +
> > +	git init "$HTTPD_DOCUMENT_ROOT_PATH/big" &&
> > +	# Ensure that the list of wants is greater than http.postbuffer below
> > +	for i in $(seq 1 1500)
> > +	do
> > +		test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/big" "commit$i"
> > +	done &&
> 
> As Junio noted, this should be test_seq. But I think it would be nice to
> avoid looping on test_commit here at all. It kicks off at least 3
> processes; multiplying that by 1500 is going to be slow.
> 
> Making a big input is often much faster by generating a fast-import
> stream (which can often be done entirely in-shell). There's some prior
> art in t3302, t5551, t5608, and others.

OK - I'll look at generating a fast-import stream. Thanks for all your
comments.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 0/5] Protocol v2 fix: http and auth
  2019-02-14 19:06 [PATCH 0/5] Protocol v2 fix: http and auth Jonathan Tan
                   ` (5 preceding siblings ...)
  2019-02-21 13:47 ` [PATCH 0/5] Protocol v2 fix: http and auth Jeff King
@ 2019-02-21 20:24 ` Jonathan Tan
  2019-02-21 20:24   ` [PATCH v2 1/5] remote-curl: reduce scope of rpc_state.argv Jonathan Tan
                     ` (4 more replies)
  6 siblings, 5 replies; 24+ messages in thread
From: Jonathan Tan @ 2019-02-21 20:24 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff

Thanks to Junio and Peff for their reviews.

The main changes are the name and comments around the stop_reading part.
I have renamed it to flush_read_but_not_sent and have added a lot of
comments - I tried to err on the side of overcommenting here, since it
is rather complicated.

Other changes:
 - used test_seq instead of seq
 - used fast_import instead of test_commit 1500 times
 - in patch 4, the BUG statement now reads "...larger than
   LARGE_PACKET_DATA_MAX; the corresponding BUG in patch 5 is
   LARGE_PACKET_MAX (this means that the total diff does not change, and
   is thus not visible in the interdiff below)

I discovered a slight issue in which the full http.postBuffer is not
used, because Git immediately switches to chunked mode if the buffer
cannot contain a maximally sized pkt-line (without first reading the
pkt-line to see if it fits). This means that I could replace "test-seq 1
1500" with "test-seq 1 2" and the test would still pass, but I'm still
using 1500 in this patch set so that the test will pass if we ever
decide to use the http.postBuffer slightly more efficiently.

Jonathan Tan (5):
  remote-curl: reduce scope of rpc_state.argv
  remote-curl: reduce scope of rpc_state.stdin_preamble
  remote-curl: reduce scope of rpc_state.result
  remote-curl: refactor reading into rpc_state's buf
  remote-curl: use post_rpc() for protocol v2 also

 pkt-line.c             |   2 +-
 pkt-line.h             |   1 +
 remote-curl.c          | 362 +++++++++++++++++++----------------------
 t/t5702-protocol-v2.sh |  33 +++-
 4 files changed, 199 insertions(+), 199 deletions(-)

Interdiff against v1:
diff --git a/remote-curl.c b/remote-curl.c
index 2394a00650..8c03c78fc6 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -526,10 +526,13 @@ struct rpc_state {
 	unsigned write_line_lengths : 1;
 
 	/*
-	 * rpc_out uses this to keep track of whether it should continue
-	 * reading to populate the current request. Initialize to 0.
+	 * Used by rpc_out; initialize to 0. This is true if a flush has been
+	 * read, but the corresponding line length (if write_line_lengths is
+	 * true) and EOF have not been sent to libcurl. Since each flush marks
+	 * the end of a request, each flush must be completely sent before any
+	 * further reading occurs.
 	 */
-	unsigned stop_reading : 1;
+	unsigned flush_read_but_not_sent : 1;
 };
 
 /*
@@ -600,26 +603,34 @@ static size_t rpc_out(void *ptr, size_t eltsize,
 		rpc->initial_buffer = 0;
 		rpc->len = 0;
 		rpc->pos = 0;
-		if (!rpc->stop_reading) {
+		if (!rpc->flush_read_but_not_sent) {
 			if (!rpc_read_from_out(rpc, 0, &avail, &status))
 				BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
 			if (status == PACKET_READ_FLUSH)
-				/*
-				 * We are done reading for this request, but we
-				 * still need to send this line out (if
-				 * rpc->write_line_lengths is true) so do not
-				 * return yet.
-				 */
-				rpc->stop_reading = 1;
+				rpc->flush_read_but_not_sent = 1;
 		}
+		/*
+		 * If flush_read_but_not_sent is true, we have already read one
+		 * full request but have not fully sent it + EOF, which is why
+		 * we need to refrain from reading.
+		 */
 	}
-	if (!avail && rpc->stop_reading) {
+	if (rpc->flush_read_but_not_sent) {
+		if (!avail) {
+			/*
+			 * The line length either does not need to be sent at
+			 * all or has already been completely sent. Now we can
+			 * return 0, indicating EOF, meaning that the flush has
+			 * been fully sent.
+			 */
+			rpc->flush_read_but_not_sent = 0;
+			return 0;
+		}
 		/*
-		 * "return 0" will notify Curl that this RPC request is done,
-		 * so reset stop_reading back to 0 for the next request.
+		 * If avail is non-zerp, the line length for the flush still
+		 * hasn't been fully sent. Proceed with sending the line
+		 * length.
 		 */
-		rpc->stop_reading = 0;
-		return 0;
 	}
 
 	if (max < avail)
@@ -1290,7 +1301,7 @@ static int stateless_connect(const char *service_name)
 	rpc.gzip_request = 1;
 	rpc.initial_buffer = 0;
 	rpc.write_line_lengths = 1;
-	rpc.stop_reading = 0;
+	rpc.flush_read_but_not_sent = 0;
 
 	/*
 	 * Dump the capability listing that we got from the server earlier
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 61acf99d80..e112b6086c 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -552,10 +552,17 @@ test_expect_success 'clone big repository with http:// using protocol v2' '
 
 	git init "$HTTPD_DOCUMENT_ROOT_PATH/big" &&
 	# Ensure that the list of wants is greater than http.postbuffer below
-	for i in $(seq 1 1500)
+	for i in $(test_seq 1 1500)
 	do
-		test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/big" "commit$i"
-	done &&
+		# do not use here-doc, because it requires a process
+		# per loop iteration
+		echo "commit refs/heads/too-many-refs-$i" &&
+		echo "committer git <git@example.com> $i +0000" &&
+		echo "data 0" &&
+		echo "M 644 inline bla.txt" &&
+		echo "data 4" &&
+		echo "bla"
+	done | git -C "$HTTPD_DOCUMENT_ROOT_PATH/big" fast-import &&
 
 	GIT_TRACE_PACKET="$(pwd)/log" GIT_TRACE_CURL="$(pwd)/log" git \
 		-c protocol.version=2 -c http.postbuffer=65536 \
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 1/5] remote-curl: reduce scope of rpc_state.argv
  2019-02-21 20:24 ` [PATCH v2 " Jonathan Tan
@ 2019-02-21 20:24   ` Jonathan Tan
  2019-02-21 20:24   ` [PATCH v2 2/5] remote-curl: reduce scope of rpc_state.stdin_preamble Jonathan Tan
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Jonathan Tan @ 2019-02-21 20:24 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff

The argv field in struct rpc_state is only used in rpc_service(), and
not in any functions it directly or indirectly calls. Refactor it to
become an argument of rpc_service() instead.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 remote-curl.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index bb7421023b..8b7baf6298 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -505,7 +505,6 @@ static void output_refs(struct ref *refs)
 
 struct rpc_state {
 	const char *service_name;
-	const char **argv;
 	struct strbuf *stdin_preamble;
 	char *service_url;
 	char *hdr_content_type;
@@ -829,7 +828,8 @@ static int post_rpc(struct rpc_state *rpc)
 	return err;
 }
 
-static int rpc_service(struct rpc_state *rpc, struct discovery *heads)
+static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
+		       const char **client_argv)
 {
 	const char *svc = rpc->service_name;
 	struct strbuf buf = STRBUF_INIT;
@@ -840,7 +840,7 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads)
 	client.in = -1;
 	client.out = -1;
 	client.git_cmd = 1;
-	client.argv = rpc->argv;
+	client.argv = client_argv;
 	if (start_command(&client))
 		exit(1);
 	if (preamble)
@@ -978,11 +978,10 @@ static int fetch_git(struct discovery *heads,
 
 	memset(&rpc, 0, sizeof(rpc));
 	rpc.service_name = "git-upload-pack",
-	rpc.argv = args.argv;
 	rpc.stdin_preamble = &preamble;
 	rpc.gzip_request = 1;
 
-	err = rpc_service(&rpc, heads);
+	err = rpc_service(&rpc, heads, args.argv);
 	if (rpc.result.len)
 		write_or_die(1, rpc.result.buf, rpc.result.len);
 	strbuf_release(&rpc.result);
@@ -1112,10 +1111,9 @@ static int push_git(struct discovery *heads, int nr_spec, char **specs)
 
 	memset(&rpc, 0, sizeof(rpc));
 	rpc.service_name = "git-receive-pack",
-	rpc.argv = args.argv;
 	rpc.stdin_preamble = &preamble;
 
-	err = rpc_service(&rpc, heads);
+	err = rpc_service(&rpc, heads, args.argv);
 	if (rpc.result.len)
 		write_or_die(1, rpc.result.buf, rpc.result.len);
 	strbuf_release(&rpc.result);
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/5] remote-curl: reduce scope of rpc_state.stdin_preamble
  2019-02-21 20:24 ` [PATCH v2 " Jonathan Tan
  2019-02-21 20:24   ` [PATCH v2 1/5] remote-curl: reduce scope of rpc_state.argv Jonathan Tan
@ 2019-02-21 20:24   ` Jonathan Tan
  2019-02-21 20:24   ` [PATCH v2 3/5] remote-curl: reduce scope of rpc_state.result Jonathan Tan
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Jonathan Tan @ 2019-02-21 20:24 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff

The stdin_preamble field in struct rpc_state is only used in
rpc_service(), and not in any functions it directly or indirectly calls.
Refactor it to become an argument of rpc_service() instead.

An observation of all callers of rpc_service() shows that the preamble
is always set, so we no longer need the "if (preamble)" check.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 remote-curl.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 8b7baf6298..d33d5bbfa8 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -505,7 +505,6 @@ static void output_refs(struct ref *refs)
 
 struct rpc_state {
 	const char *service_name;
-	struct strbuf *stdin_preamble;
 	char *service_url;
 	char *hdr_content_type;
 	char *hdr_accept;
@@ -829,11 +828,10 @@ static int post_rpc(struct rpc_state *rpc)
 }
 
 static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
-		       const char **client_argv)
+		       const char **client_argv, const struct strbuf *preamble)
 {
 	const char *svc = rpc->service_name;
 	struct strbuf buf = STRBUF_INIT;
-	struct strbuf *preamble = rpc->stdin_preamble;
 	struct child_process client = CHILD_PROCESS_INIT;
 	int err = 0;
 
@@ -843,8 +841,7 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
 	client.argv = client_argv;
 	if (start_command(&client))
 		exit(1);
-	if (preamble)
-		write_or_die(client.in, preamble->buf, preamble->len);
+	write_or_die(client.in, preamble->buf, preamble->len);
 	if (heads)
 		write_or_die(client.in, heads->buf, heads->len);
 
@@ -978,10 +975,9 @@ static int fetch_git(struct discovery *heads,
 
 	memset(&rpc, 0, sizeof(rpc));
 	rpc.service_name = "git-upload-pack",
-	rpc.stdin_preamble = &preamble;
 	rpc.gzip_request = 1;
 
-	err = rpc_service(&rpc, heads, args.argv);
+	err = rpc_service(&rpc, heads, args.argv, &preamble);
 	if (rpc.result.len)
 		write_or_die(1, rpc.result.buf, rpc.result.len);
 	strbuf_release(&rpc.result);
@@ -1111,9 +1107,8 @@ static int push_git(struct discovery *heads, int nr_spec, char **specs)
 
 	memset(&rpc, 0, sizeof(rpc));
 	rpc.service_name = "git-receive-pack",
-	rpc.stdin_preamble = &preamble;
 
-	err = rpc_service(&rpc, heads, args.argv);
+	err = rpc_service(&rpc, heads, args.argv, &preamble);
 	if (rpc.result.len)
 		write_or_die(1, rpc.result.buf, rpc.result.len);
 	strbuf_release(&rpc.result);
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 3/5] remote-curl: reduce scope of rpc_state.result
  2019-02-21 20:24 ` [PATCH v2 " Jonathan Tan
  2019-02-21 20:24   ` [PATCH v2 1/5] remote-curl: reduce scope of rpc_state.argv Jonathan Tan
  2019-02-21 20:24   ` [PATCH v2 2/5] remote-curl: reduce scope of rpc_state.stdin_preamble Jonathan Tan
@ 2019-02-21 20:24   ` Jonathan Tan
  2019-02-21 20:24   ` [PATCH v2 4/5] remote-curl: refactor reading into rpc_state's buf Jonathan Tan
  2019-02-21 20:24   ` [PATCH v2 5/5] remote-curl: use post_rpc() for protocol v2 also Jonathan Tan
  4 siblings, 0 replies; 24+ messages in thread
From: Jonathan Tan @ 2019-02-21 20:24 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff

The result field in struct rpc_state is only used in rpc_service(), and
not in any functions it directly or indirectly calls. Refactor it to
become an argument of rpc_service() instead.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 remote-curl.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index d33d5bbfa8..8e0e37ed3d 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -516,7 +516,6 @@ struct rpc_state {
 	int in;
 	int out;
 	int any_written;
-	struct strbuf result;
 	unsigned gzip_request : 1;
 	unsigned initial_buffer : 1;
 };
@@ -828,7 +827,8 @@ static int post_rpc(struct rpc_state *rpc)
 }
 
 static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
-		       const char **client_argv, const struct strbuf *preamble)
+		       const char **client_argv, const struct strbuf *preamble,
+		       struct strbuf *rpc_result)
 {
 	const char *svc = rpc->service_name;
 	struct strbuf buf = STRBUF_INIT;
@@ -849,7 +849,6 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
 	rpc->buf = xmalloc(rpc->alloc);
 	rpc->in = client.in;
 	rpc->out = client.out;
-	strbuf_init(&rpc->result, 0);
 
 	strbuf_addf(&buf, "%s%s", url.buf, svc);
 	rpc->service_url = strbuf_detach(&buf, NULL);
@@ -877,7 +876,7 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
 	close(client.in);
 	client.in = -1;
 	if (!err) {
-		strbuf_read(&rpc->result, client.out, 0);
+		strbuf_read(rpc_result, client.out, 0);
 	} else {
 		char buf[4096];
 		for (;;)
@@ -930,6 +929,7 @@ static int fetch_git(struct discovery *heads,
 	struct strbuf preamble = STRBUF_INIT;
 	int i, err;
 	struct argv_array args = ARGV_ARRAY_INIT;
+	struct strbuf rpc_result = STRBUF_INIT;
 
 	argv_array_pushl(&args, "fetch-pack", "--stateless-rpc",
 			 "--stdin", "--lock-pack", NULL);
@@ -977,10 +977,10 @@ static int fetch_git(struct discovery *heads,
 	rpc.service_name = "git-upload-pack",
 	rpc.gzip_request = 1;
 
-	err = rpc_service(&rpc, heads, args.argv, &preamble);
-	if (rpc.result.len)
-		write_or_die(1, rpc.result.buf, rpc.result.len);
-	strbuf_release(&rpc.result);
+	err = rpc_service(&rpc, heads, args.argv, &preamble, &rpc_result);
+	if (rpc_result.len)
+		write_or_die(1, rpc_result.buf, rpc_result.len);
+	strbuf_release(&rpc_result);
 	strbuf_release(&preamble);
 	argv_array_clear(&args);
 	return err;
@@ -1075,6 +1075,7 @@ static int push_git(struct discovery *heads, int nr_spec, char **specs)
 	struct argv_array args;
 	struct string_list_item *cas_option;
 	struct strbuf preamble = STRBUF_INIT;
+	struct strbuf rpc_result = STRBUF_INIT;
 
 	argv_array_init(&args);
 	argv_array_pushl(&args, "send-pack", "--stateless-rpc", "--helper-status",
@@ -1108,10 +1109,10 @@ static int push_git(struct discovery *heads, int nr_spec, char **specs)
 	memset(&rpc, 0, sizeof(rpc));
 	rpc.service_name = "git-receive-pack",
 
-	err = rpc_service(&rpc, heads, args.argv, &preamble);
-	if (rpc.result.len)
-		write_or_die(1, rpc.result.buf, rpc.result.len);
-	strbuf_release(&rpc.result);
+	err = rpc_service(&rpc, heads, args.argv, &preamble, &rpc_result);
+	if (rpc_result.len)
+		write_or_die(1, rpc_result.buf, rpc_result.len);
+	strbuf_release(&rpc_result);
 	strbuf_release(&preamble);
 	argv_array_clear(&args);
 	return err;
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 4/5] remote-curl: refactor reading into rpc_state's buf
  2019-02-21 20:24 ` [PATCH v2 " Jonathan Tan
                     ` (2 preceding siblings ...)
  2019-02-21 20:24   ` [PATCH v2 3/5] remote-curl: reduce scope of rpc_state.result Jonathan Tan
@ 2019-02-21 20:24   ` Jonathan Tan
  2019-02-21 20:24   ` [PATCH v2 5/5] remote-curl: use post_rpc() for protocol v2 also Jonathan Tan
  4 siblings, 0 replies; 24+ messages in thread
From: Jonathan Tan @ 2019-02-21 20:24 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff

Currently, whenever remote-curl reads pkt-lines from its response file
descriptor, only the payload is written to its buf, not the 4 characters
denoting the length. A future patch will require the ability to also
write those 4 characters, so in preparation for that, refactor this read
into its own function.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 remote-curl.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 8e0e37ed3d..1f0161475d 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -520,6 +520,25 @@ struct rpc_state {
 	unsigned initial_buffer : 1;
 };
 
+/*
+ * Appends the result of reading from rpc->out to the string represented by
+ * rpc->buf and rpc->len if there is enough space. Returns 1 if there was
+ * enough space, 0 otherwise.
+ *
+ * Writes the number of bytes appended into appended.
+ */
+static int rpc_read_from_out(struct rpc_state *rpc, size_t *appended) {
+	size_t left = rpc->alloc - rpc->len;
+	char *buf = rpc->buf + rpc->len;
+
+	if (left < LARGE_PACKET_MAX)
+		return 0;
+
+	*appended = packet_read(rpc->out, NULL, NULL, buf, left, 0);
+	rpc->len += *appended;
+	return 1;
+}
+
 static size_t rpc_out(void *ptr, size_t eltsize,
 		size_t nmemb, void *buffer_)
 {
@@ -529,11 +548,12 @@ static size_t rpc_out(void *ptr, size_t eltsize,
 
 	if (!avail) {
 		rpc->initial_buffer = 0;
-		avail = packet_read(rpc->out, NULL, NULL, rpc->buf, rpc->alloc, 0);
+		rpc->len = 0;
+		if (!rpc_read_from_out(rpc, &avail))
+			BUG("The entire rpc->buf should be larger than LARGE_PACKET_DATA_MAX");
 		if (!avail)
 			return 0;
 		rpc->pos = 0;
-		rpc->len = avail;
 	}
 
 	if (max < avail)
@@ -677,20 +697,15 @@ static int post_rpc(struct rpc_state *rpc)
 	 * chunked encoding mess.
 	 */
 	while (1) {
-		size_t left = rpc->alloc - rpc->len;
-		char *buf = rpc->buf + rpc->len;
-		int n;
+		size_t n;
 
-		if (left < LARGE_PACKET_MAX) {
+		if (!rpc_read_from_out(rpc, &n)) {
 			large_request = 1;
 			use_gzip = 0;
 			break;
 		}
-
-		n = packet_read(rpc->out, NULL, NULL, buf, left, 0);
 		if (!n)
 			break;
-		rpc->len += n;
 	}
 
 	if (large_request) {
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 5/5] remote-curl: use post_rpc() for protocol v2 also
  2019-02-21 20:24 ` [PATCH v2 " Jonathan Tan
                     ` (3 preceding siblings ...)
  2019-02-21 20:24   ` [PATCH v2 4/5] remote-curl: refactor reading into rpc_state's buf Jonathan Tan
@ 2019-02-21 20:24   ` Jonathan Tan
  2019-02-22 13:18     ` Eric Sunshine
  2019-02-25 22:08     ` Jeff King
  4 siblings, 2 replies; 24+ messages in thread
From: Jonathan Tan @ 2019-02-21 20:24 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff

When transmitting and receiving POSTs for protocol v0 and v1,
remote-curl uses post_rpc() (and associated functions), but when doing
the same for protocol v2, it uses a separate set of functions
(proxy_rpc() and others). Besides duplication of code, this has caused
at least one bug: the auth retry mechanism that was implemented in v0/v1
was not implemented in v2.

To fix this issue and avoid it in the future, make remote-curl also use
post_rpc() when handling protocol v2. Because line lengths are written
to the HTTP request in protocol v2 (unlike in protocol v0/v1), this
necessitates changes in post_rpc() and some of the functions it uses;
perform these changes too.

A test has been included to ensure that the code for both the unchunked
and chunked variants of the HTTP request is exercised.

Note: stateless_connect() has been updated to use the lower-level packet
reading functions instead of struct packet_reader. The low-level control
is necessary here because we cannot change the destination buffer of
struct packet_reader while it is being used; struct packet_buffer has a
peeking mechanism which relies on the destination buffer being present
in between a peek and a read.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 pkt-line.c             |   2 +-
 pkt-line.h             |   1 +
 remote-curl.c          | 309 ++++++++++++++++++-----------------------
 t/t5702-protocol-v2.sh |  33 ++++-
 4 files changed, 168 insertions(+), 177 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index d4b71d3e82..60329b301b 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -117,7 +117,7 @@ void packet_buf_delim(struct strbuf *buf)
 	strbuf_add(buf, "0001", 4);
 }
 
-static void set_packet_header(char *buf, const int size)
+void set_packet_header(char *buf, const int size)
 {
 	static char hexchar[] = "0123456789abcdef";
 
diff --git a/pkt-line.h b/pkt-line.h
index ad9a4a2cd7..c36cb788ed 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -25,6 +25,7 @@ void packet_delim(int fd);
 void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_delim(struct strbuf *buf);
+void set_packet_header(char *buf, int size);
 void packet_write(int fd_out, const char *buf, size_t size);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_write_len(struct strbuf *buf, const char *data, size_t len);
diff --git a/remote-curl.c b/remote-curl.c
index 1f0161475d..8c03c78fc6 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -518,6 +518,21 @@ struct rpc_state {
 	int any_written;
 	unsigned gzip_request : 1;
 	unsigned initial_buffer : 1;
+
+	/*
+	 * Whenever a pkt-line is read into buf, append the 4 characters
+	 * denoting its length before appending the payload.
+	 */
+	unsigned write_line_lengths : 1;
+
+	/*
+	 * Used by rpc_out; initialize to 0. This is true if a flush has been
+	 * read, but the corresponding line length (if write_line_lengths is
+	 * true) and EOF have not been sent to libcurl. Since each flush marks
+	 * the end of a request, each flush must be completely sent before any
+	 * further reading occurs.
+	 */
+	unsigned flush_read_but_not_sent : 1;
 };
 
 /*
@@ -525,17 +540,54 @@ struct rpc_state {
  * rpc->buf and rpc->len if there is enough space. Returns 1 if there was
  * enough space, 0 otherwise.
  *
- * Writes the number of bytes appended into appended.
+ * If rpc->write_line_lengths is true, appends the line length as a 4-byte
+ * hexadecimal string before appending the result described above.
+ *
+ * Writes the total number of bytes appended into appended.
  */
-static int rpc_read_from_out(struct rpc_state *rpc, size_t *appended) {
-	size_t left = rpc->alloc - rpc->len;
-	char *buf = rpc->buf + rpc->len;
+static int rpc_read_from_out(struct rpc_state *rpc, int options,
+			     size_t *appended,
+			     enum packet_read_status *status) {
+	size_t left;
+	char *buf;
+	int pktlen_raw;
+
+	if (rpc->write_line_lengths) {
+		left = rpc->alloc - rpc->len - 4;
+		buf = rpc->buf + rpc->len + 4;
+	} else {
+		left = rpc->alloc - rpc->len;
+		buf = rpc->buf + rpc->len;
+	}
 
 	if (left < LARGE_PACKET_MAX)
 		return 0;
 
-	*appended = packet_read(rpc->out, NULL, NULL, buf, left, 0);
-	rpc->len += *appended;
+	*status = packet_read_with_status(rpc->out, NULL, NULL, buf,
+			left, &pktlen_raw, options);
+	if (*status != PACKET_READ_EOF) {
+		*appended = pktlen_raw + (rpc->write_line_lengths ? 4 : 0);
+		rpc->len += *appended;
+	}
+
+	if (rpc->write_line_lengths) {
+		switch (*status) {
+		case PACKET_READ_EOF:
+			if (!(options & PACKET_READ_GENTLE_ON_EOF))
+				die("shouldn't have EOF when not gentle on EOF");
+			break;
+		case PACKET_READ_NORMAL:
+			set_packet_header(buf - 4, *appended);
+			break;
+		case PACKET_READ_DELIM:
+			memcpy(buf - 4, "0001", 4);
+			break;
+		case PACKET_READ_FLUSH:
+			memcpy(buf - 4, "0000", 4);
+			break;
+		}
+	}
+
 	return 1;
 }
 
@@ -545,15 +597,40 @@ static size_t rpc_out(void *ptr, size_t eltsize,
 	size_t max = eltsize * nmemb;
 	struct rpc_state *rpc = buffer_;
 	size_t avail = rpc->len - rpc->pos;
+	enum packet_read_status status;
 
 	if (!avail) {
 		rpc->initial_buffer = 0;
 		rpc->len = 0;
-		if (!rpc_read_from_out(rpc, &avail))
-			BUG("The entire rpc->buf should be larger than LARGE_PACKET_DATA_MAX");
-		if (!avail)
-			return 0;
 		rpc->pos = 0;
+		if (!rpc->flush_read_but_not_sent) {
+			if (!rpc_read_from_out(rpc, 0, &avail, &status))
+				BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
+			if (status == PACKET_READ_FLUSH)
+				rpc->flush_read_but_not_sent = 1;
+		}
+		/*
+		 * If flush_read_but_not_sent is true, we have already read one
+		 * full request but have not fully sent it + EOF, which is why
+		 * we need to refrain from reading.
+		 */
+	}
+	if (rpc->flush_read_but_not_sent) {
+		if (!avail) {
+			/*
+			 * The line length either does not need to be sent at
+			 * all or has already been completely sent. Now we can
+			 * return 0, indicating EOF, meaning that the flush has
+			 * been fully sent.
+			 */
+			rpc->flush_read_but_not_sent = 0;
+			return 0;
+		}
+		/*
+		 * If avail is non-zerp, the line length for the flush still
+		 * hasn't been fully sent. Proceed with sending the line
+		 * length.
+		 */
 	}
 
 	if (max < avail)
@@ -698,13 +775,14 @@ static int post_rpc(struct rpc_state *rpc)
 	 */
 	while (1) {
 		size_t n;
+		enum packet_read_status status;
 
-		if (!rpc_read_from_out(rpc, &n)) {
+		if (!rpc_read_from_out(rpc, 0, &n, &status)) {
 			large_request = 1;
 			use_gzip = 0;
 			break;
 		}
-		if (!n)
+		if (status == PACKET_READ_FLUSH)
 			break;
 	}
 
@@ -1179,165 +1257,11 @@ static void parse_push(struct strbuf *buf)
 	free(specs);
 }
 
-/*
- * Used to represent the state of a connection to an HTTP server when
- * communicating using git's wire-protocol version 2.
- */
-struct proxy_state {
-	char *service_name;
-	char *service_url;
-	struct curl_slist *headers;
-	struct strbuf request_buffer;
-	int in;
-	int out;
-	struct packet_reader reader;
-	size_t pos;
-	int seen_flush;
-};
-
-static void proxy_state_init(struct proxy_state *p, const char *service_name,
-			     enum protocol_version version)
-{
-	struct strbuf buf = STRBUF_INIT;
-
-	memset(p, 0, sizeof(*p));
-	p->service_name = xstrdup(service_name);
-
-	p->in = 0;
-	p->out = 1;
-	strbuf_init(&p->request_buffer, 0);
-
-	strbuf_addf(&buf, "%s%s", url.buf, p->service_name);
-	p->service_url = strbuf_detach(&buf, NULL);
-
-	p->headers = http_copy_default_headers();
-
-	strbuf_addf(&buf, "Content-Type: application/x-%s-request", p->service_name);
-	p->headers = curl_slist_append(p->headers, buf.buf);
-	strbuf_reset(&buf);
-
-	strbuf_addf(&buf, "Accept: application/x-%s-result", p->service_name);
-	p->headers = curl_slist_append(p->headers, buf.buf);
-	strbuf_reset(&buf);
-
-	p->headers = curl_slist_append(p->headers, "Transfer-Encoding: chunked");
-
-	/* Add the Git-Protocol header */
-	if (get_protocol_http_header(version, &buf))
-		p->headers = curl_slist_append(p->headers, buf.buf);
-
-	packet_reader_init(&p->reader, p->in, NULL, 0,
-			   PACKET_READ_GENTLE_ON_EOF |
-			   PACKET_READ_DIE_ON_ERR_PACKET);
-
-	strbuf_release(&buf);
-}
-
-static void proxy_state_clear(struct proxy_state *p)
-{
-	free(p->service_name);
-	free(p->service_url);
-	curl_slist_free_all(p->headers);
-	strbuf_release(&p->request_buffer);
-}
-
-/*
- * CURLOPT_READFUNCTION callback function.
- * Attempts to copy over a single packet-line at a time into the
- * curl provided buffer.
- */
-static size_t proxy_in(char *buffer, size_t eltsize,
-		       size_t nmemb, void *userdata)
-{
-	size_t max;
-	struct proxy_state *p = userdata;
-	size_t avail = p->request_buffer.len - p->pos;
-
-
-	if (eltsize != 1)
-		BUG("curl read callback called with size = %"PRIuMAX" != 1",
-		    (uintmax_t)eltsize);
-	max = nmemb;
-
-	if (!avail) {
-		if (p->seen_flush) {
-			p->seen_flush = 0;
-			return 0;
-		}
-
-		strbuf_reset(&p->request_buffer);
-		switch (packet_reader_read(&p->reader)) {
-		case PACKET_READ_EOF:
-			die("unexpected EOF when reading from parent process");
-		case PACKET_READ_NORMAL:
-			packet_buf_write_len(&p->request_buffer, p->reader.line,
-					     p->reader.pktlen);
-			break;
-		case PACKET_READ_DELIM:
-			packet_buf_delim(&p->request_buffer);
-			break;
-		case PACKET_READ_FLUSH:
-			packet_buf_flush(&p->request_buffer);
-			p->seen_flush = 1;
-			break;
-		}
-		p->pos = 0;
-		avail = p->request_buffer.len;
-	}
-
-	if (max < avail)
-		avail = max;
-	memcpy(buffer, p->request_buffer.buf + p->pos, avail);
-	p->pos += avail;
-	return avail;
-}
-
-static size_t proxy_out(char *buffer, size_t eltsize,
-			size_t nmemb, void *userdata)
-{
-	size_t size;
-	struct proxy_state *p = userdata;
-
-	if (eltsize != 1)
-		BUG("curl read callback called with size = %"PRIuMAX" != 1",
-		    (uintmax_t)eltsize);
-	size = nmemb;
-
-	write_or_die(p->out, buffer, size);
-	return size;
-}
-
-/* Issues a request to the HTTP server configured in `p` */
-static int proxy_request(struct proxy_state *p)
-{
-	struct active_request_slot *slot;
-
-	slot = get_active_slot();
-
-	curl_easy_setopt(slot->curl, CURLOPT_ENCODING, "");
-	curl_easy_setopt(slot->curl, CURLOPT_NOBODY, 0);
-	curl_easy_setopt(slot->curl, CURLOPT_POST, 1);
-	curl_easy_setopt(slot->curl, CURLOPT_URL, p->service_url);
-	curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, p->headers);
-
-	/* Setup function to read request from client */
-	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, proxy_in);
-	curl_easy_setopt(slot->curl, CURLOPT_READDATA, p);
-
-	/* Setup function to write server response to client */
-	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, proxy_out);
-	curl_easy_setopt(slot->curl, CURLOPT_WRITEDATA, p);
-
-	if (run_slot(slot, NULL) != HTTP_OK)
-		return -1;
-
-	return 0;
-}
-
 static int stateless_connect(const char *service_name)
 {
 	struct discovery *discover;
-	struct proxy_state p;
+	struct rpc_state rpc;
+	struct strbuf buf = STRBUF_INIT;
 
 	/*
 	 * Run the info/refs request and see if the server supports protocol
@@ -1357,23 +1281,58 @@ static int stateless_connect(const char *service_name)
 		fflush(stdout);
 	}
 
-	proxy_state_init(&p, service_name, discover->version);
+	rpc.service_name = service_name;
+	rpc.service_url = xstrfmt("%s%s", url.buf, rpc.service_name);
+	rpc.hdr_content_type = xstrfmt("Content-Type: application/x-%s-request", rpc.service_name);
+	rpc.hdr_accept = xstrfmt("Accept: application/x-%s-result", rpc.service_name);
+	if (get_protocol_http_header(discover->version, &buf)) {
+		rpc.protocol_header = strbuf_detach(&buf, NULL);
+	} else {
+		rpc.protocol_header = NULL;
+		strbuf_release(&buf);
+	}
+	rpc.buf = xmalloc(http_post_buffer);
+	rpc.alloc = http_post_buffer;
+	rpc.len = 0;
+	rpc.pos = 0;
+	rpc.in = 1;
+	rpc.out = 0;
+	rpc.any_written = 0;
+	rpc.gzip_request = 1;
+	rpc.initial_buffer = 0;
+	rpc.write_line_lengths = 1;
+	rpc.flush_read_but_not_sent = 0;
 
 	/*
 	 * Dump the capability listing that we got from the server earlier
 	 * during the info/refs request.
 	 */
-	write_or_die(p.out, discover->buf, discover->len);
+	write_or_die(rpc.in, discover->buf, discover->len);
+
+	/* Until we see EOF keep sending POSTs */
+	while (1) {
+		size_t avail;
+		enum packet_read_status status;
 
-	/* Peek the next packet line.  Until we see EOF keep sending POSTs */
-	while (packet_reader_peek(&p.reader) != PACKET_READ_EOF) {
-		if (proxy_request(&p)) {
+		if (!rpc_read_from_out(&rpc, PACKET_READ_GENTLE_ON_EOF, &avail,
+				       &status))
+			BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
+		if (status == PACKET_READ_EOF)
+			break;
+		if (post_rpc(&rpc))
 			/* We would have an err here */
 			break;
-		}
+		/* Reset the buffer for next request */
+		rpc.len = 0;
 	}
 
-	proxy_state_clear(&p);
+	free(rpc.service_url);
+	free(rpc.hdr_content_type);
+	free(rpc.hdr_accept);
+	free(rpc.protocol_header);
+	free(rpc.buf);
+	strbuf_release(&buf);
+
 	return 0;
 }
 
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index db4ae09f2f..e112b6086c 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -542,7 +542,38 @@ test_expect_success 'clone with http:// using protocol v2' '
 	# Client requested to use protocol v2
 	grep "Git-Protocol: version=2" log &&
 	# Server responded using protocol v2
-	grep "git< version 2" log
+	grep "git< version 2" log &&
+	# Verify that the chunked encoding sending codepath is NOT exercised
+	! grep "Send header: Transfer-Encoding: chunked" log
+'
+
+test_expect_success 'clone big repository with http:// using protocol v2' '
+	test_when_finished "rm -f log" &&
+
+	git init "$HTTPD_DOCUMENT_ROOT_PATH/big" &&
+	# Ensure that the list of wants is greater than http.postbuffer below
+	for i in $(test_seq 1 1500)
+	do
+		# do not use here-doc, because it requires a process
+		# per loop iteration
+		echo "commit refs/heads/too-many-refs-$i" &&
+		echo "committer git <git@example.com> $i +0000" &&
+		echo "data 0" &&
+		echo "M 644 inline bla.txt" &&
+		echo "data 4" &&
+		echo "bla"
+	done | git -C "$HTTPD_DOCUMENT_ROOT_PATH/big" fast-import &&
+
+	GIT_TRACE_PACKET="$(pwd)/log" GIT_TRACE_CURL="$(pwd)/log" git \
+		-c protocol.version=2 -c http.postbuffer=65536 \
+		clone "$HTTPD_URL/smart/big" big_child &&
+
+	# Client requested to use protocol v2
+	grep "Git-Protocol: version=2" log &&
+	# Server responded using protocol v2
+	grep "git< version 2" log &&
+	# Verify that the chunked encoding sending codepath is exercised
+	grep "Send header: Transfer-Encoding: chunked" log
 '
 
 test_expect_success 'fetch with http:// using protocol v2' '
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 5/5] remote-curl: use post_rpc() for protocol v2 also
  2019-02-21 20:24   ` [PATCH v2 5/5] remote-curl: use post_rpc() for protocol v2 also Jonathan Tan
@ 2019-02-22 13:18     ` Eric Sunshine
  2019-02-22 19:15       ` Eric Sunshine
  2019-02-25 22:08     ` Jeff King
  1 sibling, 1 reply; 24+ messages in thread
From: Eric Sunshine @ 2019-02-22 13:18 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git List, Jeff King

On Thu, Feb 21, 2019 at 3:25 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
> @@ -542,7 +542,38 @@ test_expect_success 'clone with http:// using protocol v2' '
> +test_expect_success 'clone big repository with http:// using protocol v2' '
> +       test_when_finished "rm -f log" &&
> +
> +       git init "$HTTPD_DOCUMENT_ROOT_PATH/big" &&
> +       # Ensure that the list of wants is greater than http.postbuffer below
> +       for i in $(test_seq 1 1500)
> +       do
> +               # do not use here-doc, because it requires a process
> +               # per loop iteration
> +               echo "commit refs/heads/too-many-refs-$i" &&
> +               echo "committer git <git@example.com> $i +0000" &&
> +               echo "data 0" &&
> +               echo "M 644 inline bla.txt" &&
> +               echo "data 4" &&
> +               echo "bla"
> +       done | git -C "$HTTPD_DOCUMENT_ROOT_PATH/big" fast-import &&

Although not outright wrong, maintaining an &&-chain in the loop body
is a bit misleading here for a couple reasons. First, without "|| exit
1" at the end of the body, neither the loop nor the test itself will
abort early if some command within the body fails. Second, the exit
status of the loop is lost anyhow due to being upstream of a pipe.

By the way, I see that this same script already contains a couple
tests with loops neglecting the trailing "|| exit 1", hence they too
fail to abort early if something goes wrong inside the body (which, in
both cases, contains a test_commit() invocation). Fixing those might
be fodder as a follow-up patch for someone.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 5/5] remote-curl: use post_rpc() for protocol v2 also
  2019-02-22 13:18     ` Eric Sunshine
@ 2019-02-22 19:15       ` Eric Sunshine
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Sunshine @ 2019-02-22 19:15 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git List, Jeff King

On Fri, Feb 22, 2019 at 8:18 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Thu, Feb 21, 2019 at 3:25 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> > +       for i in $(test_seq 1 1500)
> > +       do
> > +               # do not use here-doc, because it requires a process
> > +               # per loop iteration
> > +               echo "commit refs/heads/too-many-refs-$i" &&
> > +               echo "committer git <git@example.com> $i +0000" &&
> > +               echo "data 0" &&
> > +               echo "M 644 inline bla.txt" &&
> > +               echo "data 4" &&
> > +               echo "bla"
> > +       done | git -C "$HTTPD_DOCUMENT_ROOT_PATH/big" fast-import &&
>
> Although not outright wrong, maintaining an &&-chain in the loop body
> is a bit misleading here for a couple reasons. First, without "|| exit
> 1" at the end of the body, neither the loop nor the test itself will
> abort early if some command within the body fails. [...]

I meant "|| return 1", not "|| exit 1".

> By the way, I see that this same script already contains a couple
> tests with loops neglecting the trailing "|| exit 1", hence they too
> fail to abort early if something goes wrong inside the body [...]

Ditto.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 5/5] remote-curl: use post_rpc() for protocol v2 also
  2019-02-21 20:24   ` [PATCH v2 5/5] remote-curl: use post_rpc() for protocol v2 also Jonathan Tan
  2019-02-22 13:18     ` Eric Sunshine
@ 2019-02-25 22:08     ` Jeff King
  2019-02-25 23:49       ` [FIXUP] Fixup on tip of jt/http-auth-proto-v2-fix Jonathan Tan
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff King @ 2019-02-25 22:08 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Thu, Feb 21, 2019 at 12:24:41PM -0800, Jonathan Tan wrote:

> When transmitting and receiving POSTs for protocol v0 and v1,
> remote-curl uses post_rpc() (and associated functions), but when doing
> the same for protocol v2, it uses a separate set of functions
> (proxy_rpc() and others). Besides duplication of code, this has caused
> at least one bug: the auth retry mechanism that was implemented in v0/v1
> was not implemented in v2.
> 
> To fix this issue and avoid it in the future, make remote-curl also use
> post_rpc() when handling protocol v2. Because line lengths are written
> to the HTTP request in protocol v2 (unlike in protocol v0/v1), this
> necessitates changes in post_rpc() and some of the functions it uses;
> perform these changes too.

This patch seems to cause noop fetches from GitHub to report "the remote
end hung up unexpectedly" at the end of the fetch (I have
protocol.version=2 set in the examples below):

  [parent is good]
  $ git checkout 0cdb2a12ad0300b5d0cb5bb6e8999034ae4b9bef^
  $ make
  $ bin-wrappers/git fetch https://github.com/git/git
  From https://github.com/git/git
   * branch                  HEAD       -> FETCH_HEAD

  [this patch is bad]
  $ git checkout 0cdb2a12ad0300b5d0cb5bb6e8999034ae4b9bef
  $ make
  $ bin-wrappers/git fetch https://github.com/git/git
  From https://github.com/git/git
   * branch                  HEAD       -> FETCH_HEAD
  fatal: the remote end hung up unexpectedly

I haven't dug, so it's possible that GitHub's server-side is doing
something slightly odd or wrong. But given the nature of the patch, I'd
suspect we just aren't noticing EOF correctly.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [FIXUP] Fixup on tip of jt/http-auth-proto-v2-fix
  2019-02-25 22:08     ` Jeff King
@ 2019-02-25 23:49       ` Jonathan Tan
  2019-02-26  7:04         ` Jonathan Nieder
  2019-02-27 12:02         ` Jeff King
  0 siblings, 2 replies; 24+ messages in thread
From: Jonathan Tan @ 2019-02-25 23:49 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff

Thanks, Peff, for noticing this. It's because the client sometimes sends
"0000" as a single request (that is, it flushes, and then before it
sends any data, it flushes again). And post_rpc() assumes that it can
always read something - which is usually correct, but not in this case;
we read in stateless_connect() first, and if we read "0000", we need to
tell post_rpc() to not read at all.

This is a fixup on the tip of jt/http-auth-proto-v2-fix that fixes that.

As for why the client sends a lone "0000", I'm not sure, but that's
outside the scope of this patch set, I think.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 remote-curl.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 2394a00650..6c0207f4f9 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -747,7 +747,11 @@ static curl_off_t xcurl_off_t(size_t len)
 	return (curl_off_t)size;
 }
 
-static int post_rpc(struct rpc_state *rpc)
+/*
+ * If flush_received is true, do not attempt to read any more; just use what's
+ * in rpc->buf.
+ */
+static int post_rpc(struct rpc_state *rpc, int flush_received)
 {
 	struct active_request_slot *slot;
 	struct curl_slist *headers = http_copy_default_headers();
@@ -762,17 +766,19 @@ static int post_rpc(struct rpc_state *rpc)
 	 * allocated buffer space we can use HTTP/1.0 and avoid the
 	 * chunked encoding mess.
 	 */
-	while (1) {
-		size_t n;
-		enum packet_read_status status;
-
-		if (!rpc_read_from_out(rpc, 0, &n, &status)) {
-			large_request = 1;
-			use_gzip = 0;
-			break;
+	if (!flush_received) {
+		while (1) {
+			size_t n;
+			enum packet_read_status status;
+
+			if (!rpc_read_from_out(rpc, 0, &n, &status)) {
+				large_request = 1;
+				use_gzip = 0;
+				break;
+			}
+			if (status == PACKET_READ_FLUSH)
+				break;
 		}
-		if (status == PACKET_READ_FLUSH)
-			break;
 	}
 
 	if (large_request) {
@@ -952,7 +958,7 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
 			break;
 		rpc->pos = 0;
 		rpc->len = n;
-		err |= post_rpc(rpc);
+		err |= post_rpc(rpc, 0);
 	}
 
 	close(client.in);
@@ -1308,7 +1314,7 @@ static int stateless_connect(const char *service_name)
 			BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
 		if (status == PACKET_READ_EOF)
 			break;
-		if (post_rpc(&rpc))
+		if (post_rpc(&rpc, status == PACKET_READ_FLUSH))
 			/* We would have an err here */
 			break;
 		/* Reset the buffer for next request */
-- 
2.19.0.271.gfe8321ec05.dirty


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [FIXUP] Fixup on tip of jt/http-auth-proto-v2-fix
  2019-02-25 23:49       ` [FIXUP] Fixup on tip of jt/http-auth-proto-v2-fix Jonathan Tan
@ 2019-02-26  7:04         ` Jonathan Nieder
  2019-02-26 18:18           ` Jonathan Tan
  2019-02-27 12:02         ` Jeff King
  1 sibling, 1 reply; 24+ messages in thread
From: Jonathan Nieder @ 2019-02-26  7:04 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff

Hi,

Jonathan Tan wrote:

> Thanks, Peff, for noticing this. It's because the client sometimes sends
> "0000" as a single request (that is, it flushes, and then before it
> sends any data, it flushes again). And post_rpc() assumes that it can
> always read something - which is usually correct, but not in this case;
> we read in stateless_connect() first, and if we read "0000", we need to
> tell post_rpc() to not read at all.
>
> This is a fixup on the tip of jt/http-auth-proto-v2-fix that fixes that.
>
> As for why the client sends a lone "0000", I'm not sure, but that's
> outside the scope of this patch set, I think.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  remote-curl.c | 32 +++++++++++++++++++-------------
>  1 file changed, 19 insertions(+), 13 deletions(-)

Tested-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks for fixing it.

Is there a particular patch this should be squashed into, or does it
stand alone?  It the latter, mind writing a commit message for it?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [FIXUP] Fixup on tip of jt/http-auth-proto-v2-fix
  2019-02-26  7:04         ` Jonathan Nieder
@ 2019-02-26 18:18           ` Jonathan Tan
  2019-03-04  3:45             ` Junio C Hamano
  0 siblings, 1 reply; 24+ messages in thread
From: Jonathan Tan @ 2019-02-26 18:18 UTC (permalink / raw)
  To: jrnieder; +Cc: jonathantanmy, git, peff

> Thanks for fixing it.
> 
> Is there a particular patch this should be squashed into, or does it
> stand alone?  It the latter, mind writing a commit message for it?

Not sure if I'm using "fixup" correctly in the subject, but this is
meant to be squashed onto the tip of jt/http-auth-proto-v2-fix -
specifically, deb7d2094a ("remote-curl: use post_rpc() for protocol v2
also", 2019-02-14).

I don't think it should be a standalone commit, as the tip is buggy -
which might break bisect. But if we really want a standalone commit,
this commit message should work:

remote-curl: handle consecutive flushes correctly

When the client, using protocol v2, sends two consecutive flushes and
then an EOF, remote-curl dies. This is because, at the start of a
new request, stateless_connect() reads, and if no EOF is found, then
stateless_connect() calls post_rpc() which reads until a flush is
encountered. This is a problem when stateless_connect() reads the second
consecutive flush (hence, no EOF), and then post_rpc() reads, not
expecting an EOF at all.

Teach stateless_connect() to inform post_rpc() to read only if what it
read isn't a flush.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [FIXUP] Fixup on tip of jt/http-auth-proto-v2-fix
  2019-02-25 23:49       ` [FIXUP] Fixup on tip of jt/http-auth-proto-v2-fix Jonathan Tan
  2019-02-26  7:04         ` Jonathan Nieder
@ 2019-02-27 12:02         ` Jeff King
  1 sibling, 0 replies; 24+ messages in thread
From: Jeff King @ 2019-02-27 12:02 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Mon, Feb 25, 2019 at 03:49:01PM -0800, Jonathan Tan wrote:

> Thanks, Peff, for noticing this. It's because the client sometimes sends
> "0000" as a single request (that is, it flushes, and then before it
> sends any data, it flushes again). And post_rpc() assumes that it can
> always read something - which is usually correct, but not in this case;
> we read in stateless_connect() first, and if we read "0000", we need to
> tell post_rpc() to not read at all.
> 
> This is a fixup on the tip of jt/http-auth-proto-v2-fix that fixes that.

Thanks, I can confirm that this makes the problem go away (and your
explanation makes perfect sense to me).

> As for why the client sends a lone "0000", I'm not sure, but that's
> outside the scope of this patch set, I think.

Yeah, that does seem odd. I noticed it on noop fetches. So after we've
done "ls-refs", would "fetch" need to send a flush to say "I don't want
anything?" I guess not, since we're stateless, and it is literally
making a new HTTP request just to say nothing.

It does seem unique to protocol v2.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [FIXUP] Fixup on tip of jt/http-auth-proto-v2-fix
  2019-02-26 18:18           ` Jonathan Tan
@ 2019-03-04  3:45             ` Junio C Hamano
  0 siblings, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2019-03-04  3:45 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: jrnieder, git, peff

Jonathan Tan <jonathantanmy@google.com> writes:

>> Thanks for fixing it.
>> 
>> Is there a particular patch this should be squashed into, or does it
>> stand alone?  It the latter, mind writing a commit message for it?
>
> Not sure if I'm using "fixup" correctly in the subject, but this is
> meant to be squashed onto the tip of jt/http-auth-proto-v2-fix -
> specifically, deb7d2094a ("remote-curl: use post_rpc() for protocol v2
> also", 2019-02-14).

Saying "Fixup on tip of" to mean that you are fixing the 5th one in
a 5-patch series is sufficient to convey what you want between
humans, but that does not help the machine to help humans reduce
mistakes.  It would have been even more helpful if you created this
commit with

	git commit --fixup 0cdb2a12ad

which would have produced something like the attached patch,
and that would have allowed me to do

	git checkout jt/http-auth-proto-v2-fix
	git am ./+jt-http-auth-proto-v2-fix-fixup
	git rebase -i --autosquash HEAD~5

or somesuch.

Thanks, will queue.

-- >8 --
Subject: [PATCH] fixup! remote-curl: use post_rpc() for protocol v2 also

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 remote-curl.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 8c03c78fc6..3c3872cca6 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -758,7 +758,11 @@ static curl_off_t xcurl_off_t(size_t len)
 	return (curl_off_t)size;
 }
 
-static int post_rpc(struct rpc_state *rpc)
+/*
+ * If flush_received is true, do not attempt to read any more; just use what's
+ * in rpc->buf.
+ */
+static int post_rpc(struct rpc_state *rpc, int flush_received)
 {
 	struct active_request_slot *slot;
 	struct curl_slist *headers = http_copy_default_headers();
@@ -773,17 +777,19 @@ static int post_rpc(struct rpc_state *rpc)
 	 * allocated buffer space we can use HTTP/1.0 and avoid the
 	 * chunked encoding mess.
 	 */
-	while (1) {
-		size_t n;
-		enum packet_read_status status;
-
-		if (!rpc_read_from_out(rpc, 0, &n, &status)) {
-			large_request = 1;
-			use_gzip = 0;
-			break;
+	if (!flush_received) {
+		while (1) {
+			size_t n;
+			enum packet_read_status status;
+
+			if (!rpc_read_from_out(rpc, 0, &n, &status)) {
+				large_request = 1;
+				use_gzip = 0;
+				break;
+			}
+			if (status == PACKET_READ_FLUSH)
+				break;
 		}
-		if (status == PACKET_READ_FLUSH)
-			break;
 	}
 
 	if (large_request) {
@@ -963,7 +969,7 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
 			break;
 		rpc->pos = 0;
 		rpc->len = n;
-		err |= post_rpc(rpc);
+		err |= post_rpc(rpc, 0);
 	}
 
 	close(client.in);
@@ -1319,7 +1325,7 @@ static int stateless_connect(const char *service_name)
 			BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
 		if (status == PACKET_READ_EOF)
 			break;
-		if (post_rpc(&rpc))
+		if (post_rpc(&rpc, status == PACKET_READ_FLUSH))
 			/* We would have an err here */
 			break;
 		/* Reset the buffer for next request */
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-03-04  3:45 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-14 19:06 [PATCH 0/5] Protocol v2 fix: http and auth Jonathan Tan
2019-02-14 19:06 ` [PATCH 1/5] remote-curl: reduce scope of rpc_state.argv Jonathan Tan
2019-02-14 19:06 ` [PATCH 2/5] remote-curl: reduce scope of rpc_state.stdin_preamble Jonathan Tan
2019-02-14 19:06 ` [PATCH 3/5] remote-curl: reduce scope of rpc_state.result Jonathan Tan
2019-02-14 19:06 ` [PATCH 4/5] remote-curl: refactor reading into rpc_state's buf Jonathan Tan
2019-02-14 19:06 ` [PATCH 5/5] remote-curl: use post_rpc() for protocol v2 also Jonathan Tan
2019-02-14 22:47   ` Junio C Hamano
2019-02-21 13:46   ` Jeff King
2019-02-21 19:26     ` Jonathan Tan
2019-02-21 13:47 ` [PATCH 0/5] Protocol v2 fix: http and auth Jeff King
2019-02-21 20:24 ` [PATCH v2 " Jonathan Tan
2019-02-21 20:24   ` [PATCH v2 1/5] remote-curl: reduce scope of rpc_state.argv Jonathan Tan
2019-02-21 20:24   ` [PATCH v2 2/5] remote-curl: reduce scope of rpc_state.stdin_preamble Jonathan Tan
2019-02-21 20:24   ` [PATCH v2 3/5] remote-curl: reduce scope of rpc_state.result Jonathan Tan
2019-02-21 20:24   ` [PATCH v2 4/5] remote-curl: refactor reading into rpc_state's buf Jonathan Tan
2019-02-21 20:24   ` [PATCH v2 5/5] remote-curl: use post_rpc() for protocol v2 also Jonathan Tan
2019-02-22 13:18     ` Eric Sunshine
2019-02-22 19:15       ` Eric Sunshine
2019-02-25 22:08     ` Jeff King
2019-02-25 23:49       ` [FIXUP] Fixup on tip of jt/http-auth-proto-v2-fix Jonathan Tan
2019-02-26  7:04         ` Jonathan Nieder
2019-02-26 18:18           ` Jonathan Tan
2019-03-04  3:45             ` Junio C Hamano
2019-02-27 12:02         ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).