git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH] remote-curl: die on server-side errors
@ 2018-11-12 22:44 steadmon
  2018-11-12 22:55 ` Stefan Beller
                   ` (4 more replies)
  0 siblings, 5 replies; 41+ messages in thread
From: steadmon @ 2018-11-12 22:44 UTC (permalink / raw)
  To: git

When a smart HTTP server sends an error message via pkt-line,
remote-curl will fail to detect the error (which usually results in
incorrectly falling back to dumb-HTTP mode).

This patch adds a check in discover_refs() for server-side error
messages, as well as a test case for this issue.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 remote-curl.c                   | 4 +++-
 t/lib-httpd.sh                  | 1 +
 t/lib-httpd/apache.conf         | 4 ++++
 t/lib-httpd/error-smart-http.sh | 3 +++
 t/t5551-http-fetch-smart.sh     | 5 +++++
 5 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 t/lib-httpd/error-smart-http.sh

diff --git a/remote-curl.c b/remote-curl.c
index 762a55a75f..bb3a86505e 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -436,7 +436,9 @@ static struct discovery *discover_refs(const char *service, int for_push)
 	} else if (maybe_smart &&
 		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
 		last->proto_git = 1;
-	}
+	} else if (maybe_smart && last->len > 5 &&
+		   starts_with(last->buf + 4, "ERR "))
+		die(_("remote error: %s"), last->buf + 8);
 
 	if (last->proto_git)
 		last->refs = parse_git_refs(last, for_push);
diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index a8729f8232..4e946623bb 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -131,6 +131,7 @@ prepare_httpd() {
 	mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH"
 	cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
 	install_script broken-smart-http.sh
+	install_script error-smart-http.sh
 	install_script error.sh
 	install_script apply-one-time-sed.sh
 
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 581c010d8f..6de2bc775c 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -117,6 +117,7 @@ Alias /auth/dumb/ www/auth/dumb/
 </LocationMatch>
 ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1
 ScriptAlias /broken_smart/ broken-smart-http.sh/
+ScriptAlias /error_smart/ error-smart-http.sh/
 ScriptAlias /error/ error.sh/
 ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
 <Directory ${GIT_EXEC_PATH}>
@@ -125,6 +126,9 @@ ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
 <Files broken-smart-http.sh>
 	Options ExecCGI
 </Files>
+<Files error-smart-http.sh>
+	Options ExecCGI
+</Files>
 <Files error.sh>
   Options ExecCGI
 </Files>
diff --git a/t/lib-httpd/error-smart-http.sh b/t/lib-httpd/error-smart-http.sh
new file mode 100644
index 0000000000..0a1af318f7
--- /dev/null
+++ b/t/lib-httpd/error-smart-http.sh
@@ -0,0 +1,3 @@
+printf "Content-Type: text/%s\n" "html"
+echo
+printf "%s" "0019ERR server-side error"
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 8630b0cc39..ba83e567e5 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -429,5 +429,10 @@ test_expect_success 'GIT_TRACE_CURL_NO_DATA prevents data from being traced' '
 	! grep "=> Send data" err
 '
 
+test_expect_success 'server-side error detected' '
+	test_must_fail git clone $HTTPD_URL/error_smart/repo.git 2>actual &&
+	grep "server-side error" actual
+'
+
 stop_httpd
 test_done
-- 
2.19.1.930.g4563a0d9d0-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] remote-curl: die on server-side errors
  2018-11-12 22:44 [PATCH] remote-curl: die on server-side errors steadmon
@ 2018-11-12 22:55 ` Stefan Beller
  2018-11-13  2:52 ` Junio C Hamano
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Stefan Beller @ 2018-11-12 22:55 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git

On Mon, Nov 12, 2018 at 2:45 PM <steadmon@google.com> wrote:
>
> When a smart HTTP server sends an error message via pkt-line,
> remote-curl will fail to detect the error (which usually results in
> incorrectly falling back to dumb-HTTP mode).
>
> This patch adds a check in discover_refs() for server-side error
> messages, as well as a test case for this issue.
>
> Signed-off-by: Josh Steadmon <steadmon@google.com>

Reviewed-by: Stefan Beller <sbeller@google.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] remote-curl: die on server-side errors
  2018-11-12 22:44 [PATCH] remote-curl: die on server-side errors steadmon
  2018-11-12 22:55 ` Stefan Beller
@ 2018-11-13  2:52 ` Junio C Hamano
  2018-11-13  3:02 ` Junio C Hamano
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2018-11-13  2:52 UTC (permalink / raw)
  To: steadmon; +Cc: git

steadmon@google.com writes:

> When a smart HTTP server sends an error message via pkt-line,
> remote-curl will fail to detect the error (which usually results in
> incorrectly falling back to dumb-HTTP mode).
>
> This patch adds a check in discover_refs() for server-side error
> messages, as well as a test case for this issue.
>
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
>  remote-curl.c                   | 4 +++-
>  t/lib-httpd.sh                  | 1 +
>  t/lib-httpd/apache.conf         | 4 ++++
>  t/lib-httpd/error-smart-http.sh | 3 +++
>  t/t5551-http-fetch-smart.sh     | 5 +++++
>  5 files changed, 16 insertions(+), 1 deletion(-)
>  create mode 100644 t/lib-httpd/error-smart-http.sh
>
> diff --git a/remote-curl.c b/remote-curl.c
> index 762a55a75f..bb3a86505e 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -436,7 +436,9 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  	} else if (maybe_smart &&
>  		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
>  		last->proto_git = 1;
> -	}
> +	} else if (maybe_smart && last->len > 5 &&
> +		   starts_with(last->buf + 4, "ERR "))
> +		die(_("remote error: %s"), last->buf + 8);

Two observations and a half.

 - All of these if/else if/ blocks (currently 2, now you added the
   third one) are to special case the "maybe_smart" case.  Perhaps
   the whole thing should be restrucutred to

	if (maybe_smart) {
		if ...
		else if ...
		else if ...
	}

 - All conditions skip the first four bytes in last->buf and does
   starts_with (the first branch is "starts_with("#") in disguise).
   Can we do something to the hardcoded magic number 4, which looks
   somewhat irritating?

 - This is less of the problem with the suggested change in the
   first bullet item above, but the current code structure makes
   readers wonder if maybe_start that starts as 1 is turned off
   somewhere in the if/else if/ cascade when we find out that we are
   not dealing with smart HTTP after all.  That however is not what
   is happening.  The "maybe" variable is primarily there to avoid
   repeating the logic that determined its initial value in the
   early part of the function.  The last->proto_git field is used to
   record if we are using smart HTTP or not.

Otherwise the change looks sensible.


> diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
> index a8729f8232..4e946623bb 100644
> --- a/t/lib-httpd.sh
> +++ b/t/lib-httpd.sh
> @@ -131,6 +131,7 @@ prepare_httpd() {
>  	mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH"
>  	cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
>  	install_script broken-smart-http.sh
> +	install_script error-smart-http.sh
>  	install_script error.sh
>  	install_script apply-one-time-sed.sh
>  
> diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
> index 581c010d8f..6de2bc775c 100644
> --- a/t/lib-httpd/apache.conf
> +++ b/t/lib-httpd/apache.conf
> @@ -117,6 +117,7 @@ Alias /auth/dumb/ www/auth/dumb/
>  </LocationMatch>
>  ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1
>  ScriptAlias /broken_smart/ broken-smart-http.sh/
> +ScriptAlias /error_smart/ error-smart-http.sh/
>  ScriptAlias /error/ error.sh/
>  ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
>  <Directory ${GIT_EXEC_PATH}>
> @@ -125,6 +126,9 @@ ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
>  <Files broken-smart-http.sh>
>  	Options ExecCGI
>  </Files>
> +<Files error-smart-http.sh>
> +	Options ExecCGI
> +</Files>
>  <Files error.sh>
>    Options ExecCGI
>  </Files>
> diff --git a/t/lib-httpd/error-smart-http.sh b/t/lib-httpd/error-smart-http.sh
> new file mode 100644
> index 0000000000..0a1af318f7
> --- /dev/null
> +++ b/t/lib-httpd/error-smart-http.sh
> @@ -0,0 +1,3 @@
> +printf "Content-Type: text/%s\n" "html"
> +echo
> +printf "%s" "0019ERR server-side error"
> diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
> index 8630b0cc39..ba83e567e5 100755
> --- a/t/t5551-http-fetch-smart.sh
> +++ b/t/t5551-http-fetch-smart.sh
> @@ -429,5 +429,10 @@ test_expect_success 'GIT_TRACE_CURL_NO_DATA prevents data from being traced' '
>  	! grep "=> Send data" err
>  '
>  
> +test_expect_success 'server-side error detected' '
> +	test_must_fail git clone $HTTPD_URL/error_smart/repo.git 2>actual &&
> +	grep "server-side error" actual
> +'
> +
>  stop_httpd
>  test_done

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] remote-curl: die on server-side errors
  2018-11-12 22:44 [PATCH] remote-curl: die on server-side errors steadmon
  2018-11-12 22:55 ` Stefan Beller
  2018-11-13  2:52 ` Junio C Hamano
@ 2018-11-13  3:02 ` Junio C Hamano
  2018-11-13 22:15   ` Josh Steadmon
  2018-11-13 14:26 ` Jeff King
  2018-11-13 14:30 ` [PATCH] remote-curl: die on server-side errors Junio C Hamano
  4 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2018-11-13  3:02 UTC (permalink / raw)
  To: steadmon; +Cc: git

steadmon@google.com writes:

> When a smart HTTP server sends an error message via pkt-line,
> remote-curl will fail to detect the error (which usually results in
> incorrectly falling back to dumb-HTTP mode).
>
> This patch adds a check in discover_refs() for server-side error
> messages, as well as a test case for this issue.
>
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---

Forgot to mention one procedural comment.

As you can see in the To: line of this reply, your MUA is placing
only the e-mail address without name on your From: line.

Preferrably I'd like to see the same string as your sign-off on the
"From:" line in your messages for a bit of human touch ;-) Can you
tweak your MUA to make that happen?

The second preference is to add an in-body header (i.e. as the first
line of the body of the message) so that the body of the message
starts like this:

    From: Josh Steadmon <steadmon@google.com>

    When a smart HTTP server sends an error message via pkt-line,
    remote-curl will fail to detect the error (which usually results in
    incorrectly falling back to dumb-HTTP mode).

    This patch adds a check in discover_refs() for server-side error
    messages, as well as a test case for this issue.

    Signed-off-by: Josh Steadmon <steadmon@google.com>
    ---
     remote-curl.c                   | 4 +++-
     t/lib-httpd.sh                  | 1 +
     t/lib-httpd/apache.conf         | 4 ++++

Either way would make sure that the resulting patch's author line
will be attributed correctly to the same identity as who is signing
it off first as the author.

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] remote-curl: die on server-side errors
  2018-11-12 22:44 [PATCH] remote-curl: die on server-side errors steadmon
                   ` (2 preceding siblings ...)
  2018-11-13  3:02 ` Junio C Hamano
@ 2018-11-13 14:26 ` Jeff King
  2018-11-13 22:25   ` Josh Steadmon
  2018-11-13 14:30 ` [PATCH] remote-curl: die on server-side errors Junio C Hamano
  4 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2018-11-13 14:26 UTC (permalink / raw)
  To: steadmon; +Cc: git

On Mon, Nov 12, 2018 at 02:44:56PM -0800, steadmon@google.com wrote:

> When a smart HTTP server sends an error message via pkt-line,
> remote-curl will fail to detect the error (which usually results in
> incorrectly falling back to dumb-HTTP mode).
> 
> This patch adds a check in discover_refs() for server-side error
> messages, as well as a test case for this issue.

Aside from the reformatting of the conditional that Junio mentioned,
this seems pretty good to me. But while looking at that, I found a few
things, some old and some new. :)

> diff --git a/remote-curl.c b/remote-curl.c
> index 762a55a75f..bb3a86505e 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -436,7 +436,9 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  	} else if (maybe_smart &&
>  		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
>  		last->proto_git = 1;
> -	}
> +	} else if (maybe_smart && last->len > 5 &&
> +		   starts_with(last->buf + 4, "ERR "))
> +		die(_("remote error: %s"), last->buf + 8);

The magic "4" here and in the existing "version 2" check is because we
are expecting pkt-lines. The original conditional always calls
packed_read_line_buf() and will complain if we didn't actually get a
pkt-line.

Should we confirm that we got a real packet-line? Or at least that those
first four are even plausible hex chars?

I admit that it's pretty unlikely that the server is going to fool us
here. It would need something like "foo ERRORS ARE FUN!". And even then
we'd report an error (whereas the correct behavior is falling back to
dumb http, but we know that won't work anyway because that's not a valid
ref advertisement). So I doubt this is really a bug per se, but it might
make it easier to understand what's going on if we actually parsed the
packet.

Similarly, we seem eager to accept "version 2" even if we are only
expecting v0. I know you have another series working in that direction,
but I don't think it touches this "proto_git". I guess accepting
"version 2" as "we're speaking git protocol" and then barfing later with
"wait, I didn't mean to speak v2" is probably OK.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] remote-curl: die on server-side errors
  2018-11-12 22:44 [PATCH] remote-curl: die on server-side errors steadmon
                   ` (3 preceding siblings ...)
  2018-11-13 14:26 ` Jeff King
@ 2018-11-13 14:30 ` Junio C Hamano
  2018-11-13 22:28   ` Josh Steadmon
  4 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2018-11-13 14:30 UTC (permalink / raw)
  To: steadmon; +Cc: git

steadmon@google.com writes:

> When a smart HTTP server sends an error message via pkt-line,
> remote-curl will fail to detect the error (which usually results in
> incorrectly falling back to dumb-HTTP mode).

OK, that is a valid thing to worry about.

>
> This patch adds a check in discover_refs() for server-side error
> messages, as well as a test case for this issue.

This makes me wonder if discoer_refs() is the only place where we
ought to be checking for ERR packets but we are not.  Are there
other places that we also need a similar fix?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] remote-curl: die on server-side errors
  2018-11-13  3:02 ` Junio C Hamano
@ 2018-11-13 22:15   ` Josh Steadmon
  0 siblings, 0 replies; 41+ messages in thread
From: Josh Steadmon @ 2018-11-13 22:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 2018.11.13 12:02, Junio C Hamano wrote:
> steadmon@google.com writes:
> 
> > When a smart HTTP server sends an error message via pkt-line,
> > remote-curl will fail to detect the error (which usually results in
> > incorrectly falling back to dumb-HTTP mode).
> >
> > This patch adds a check in discover_refs() for server-side error
> > messages, as well as a test case for this issue.
> >
> > Signed-off-by: Josh Steadmon <steadmon@google.com>
> > ---
> 
> Forgot to mention one procedural comment.
> 
> As you can see in the To: line of this reply, your MUA is placing
> only the e-mail address without name on your From: line.
> 
> Preferrably I'd like to see the same string as your sign-off on the
> "From:" line in your messages for a bit of human touch ;-) Can you
> tweak your MUA to make that happen?
> 
> The second preference is to add an in-body header (i.e. as the first
> line of the body of the message) so that the body of the message
> starts like this:
> 
>     From: Josh Steadmon <steadmon@google.com>
> 
>     When a smart HTTP server sends an error message via pkt-line,
>     remote-curl will fail to detect the error (which usually results in
>     incorrectly falling back to dumb-HTTP mode).
> 
>     This patch adds a check in discover_refs() for server-side error
>     messages, as well as a test case for this issue.
> 
>     Signed-off-by: Josh Steadmon <steadmon@google.com>
>     ---
>      remote-curl.c                   | 4 +++-
>      t/lib-httpd.sh                  | 1 +
>      t/lib-httpd/apache.conf         | 4 ++++
> 
> Either way would make sure that the resulting patch's author line
> will be attributed correctly to the same identity as who is signing
> it off first as the author.
> 
> Thanks.

This should be fixed for future patch submissions. Thanks for the
heads-up.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] remote-curl: die on server-side errors
  2018-11-13 14:26 ` Jeff King
@ 2018-11-13 22:25   ` Josh Steadmon
  2018-11-14  0:49     ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Josh Steadmon @ 2018-11-13 22:25 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On 2018.11.13 09:26, Jeff King wrote:
> On Mon, Nov 12, 2018 at 02:44:56PM -0800, steadmon@google.com wrote:
> 
> > When a smart HTTP server sends an error message via pkt-line,
> > remote-curl will fail to detect the error (which usually results in
> > incorrectly falling back to dumb-HTTP mode).
> > 
> > This patch adds a check in discover_refs() for server-side error
> > messages, as well as a test case for this issue.
> 
> Aside from the reformatting of the conditional that Junio mentioned,
> this seems pretty good to me. But while looking at that, I found a few
> things, some old and some new. :)
> 
> > diff --git a/remote-curl.c b/remote-curl.c
> > index 762a55a75f..bb3a86505e 100644
> > --- a/remote-curl.c
> > +++ b/remote-curl.c
> > @@ -436,7 +436,9 @@ static struct discovery *discover_refs(const char *service, int for_push)
> >  	} else if (maybe_smart &&
> >  		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
> >  		last->proto_git = 1;
> > -	}
> > +	} else if (maybe_smart && last->len > 5 &&
> > +		   starts_with(last->buf + 4, "ERR "))
> > +		die(_("remote error: %s"), last->buf + 8);
> 
> The magic "4" here and in the existing "version 2" check is because we
> are expecting pkt-lines. The original conditional always calls
> packed_read_line_buf() and will complain if we didn't actually get a
> pkt-line.
> 
> Should we confirm that we got a real packet-line? Or at least that those
> first four are even plausible hex chars?
> 
> I admit that it's pretty unlikely that the server is going to fool us
> here. It would need something like "foo ERRORS ARE FUN!". And even then
> we'd report an error (whereas the correct behavior is falling back to
> dumb http, but we know that won't work anyway because that's not a valid
> ref advertisement). So I doubt this is really a bug per se, but it might
> make it easier to understand what's going on if we actually parsed the
> packet.

Unfortunately we can't just directly parse the data in last->buf,
because other parts of the code are expecting to see the raw pkt-line
data there. I tried adding a duplicate pointer and length variable for
this data and parsing that through packet_read_line_buf(), but even
without using the results this apparently has side-effects that break
all of t5550 (and probably other tests as well). It also fails if I
completely duplicate last->buf into a new char* and call
packet_readline_buf() on that, so there's clearly some interaction in
the pkt-line guts that I'm not properly accounting for.

> Similarly, we seem eager to accept "version 2" even if we are only
> expecting v0. I know you have another series working in that direction,
> but I don't think it touches this "proto_git". I guess accepting
> "version 2" as "we're speaking git protocol" and then barfing later with
> "wait, I didn't mean to speak v2" is probably OK.
> 
> -Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] remote-curl: die on server-side errors
  2018-11-13 14:30 ` [PATCH] remote-curl: die on server-side errors Junio C Hamano
@ 2018-11-13 22:28   ` Josh Steadmon
  0 siblings, 0 replies; 41+ messages in thread
From: Josh Steadmon @ 2018-11-13 22:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 2018.11.13 23:30, Junio C Hamano wrote:
> steadmon@google.com writes:
> 
> > When a smart HTTP server sends an error message via pkt-line,
> > remote-curl will fail to detect the error (which usually results in
> > incorrectly falling back to dumb-HTTP mode).
> 
> OK, that is a valid thing to worry about.
> 
> >
> > This patch adds a check in discover_refs() for server-side error
> > messages, as well as a test case for this issue.
> 
> This makes me wonder if discoer_refs() is the only place where we
> ought to be checking for ERR packets but we are not.  Are there
> other places that we also need a similar fix?

Quite possibly; I'll review the other client code to see if there are
similar issues before sending v2.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] remote-curl: die on server-side errors
  2018-11-13 22:25   ` Josh Steadmon
@ 2018-11-14  0:49     ` Jeff King
  2018-11-14  7:00       ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2018-11-14  0:49 UTC (permalink / raw)
  To: git

On Tue, Nov 13, 2018 at 02:25:40PM -0800, Josh Steadmon wrote:

> > The magic "4" here and in the existing "version 2" check is because we
> > are expecting pkt-lines. The original conditional always calls
> > packed_read_line_buf() and will complain if we didn't actually get a
> > pkt-line.
> > 
> > Should we confirm that we got a real packet-line? Or at least that those
> > first four are even plausible hex chars?
> > 
> > I admit that it's pretty unlikely that the server is going to fool us
> > here. It would need something like "foo ERRORS ARE FUN!". And even then
> > we'd report an error (whereas the correct behavior is falling back to
> > dumb http, but we know that won't work anyway because that's not a valid
> > ref advertisement). So I doubt this is really a bug per se, but it might
> > make it easier to understand what's going on if we actually parsed the
> > packet.
> 
> Unfortunately we can't just directly parse the data in last->buf,
> because other parts of the code are expecting to see the raw pkt-line
> data there. I tried adding a duplicate pointer and length variable for
> this data and parsing that through packet_read_line_buf(), but even
> without using the results this apparently has side-effects that break
> all of t5550 (and probably other tests as well). It also fails if I
> completely duplicate last->buf into a new char* and call
> packet_readline_buf() on that, so there's clearly some interaction in
> the pkt-line guts that I'm not properly accounting for.

Yes, the packet_read_line_buf() interface will both advance the buf
pointer and decrement the length.  So if we want to "peek", we have to
do so with a copy (there's a peek function if you use the packet_reader
interface, but that might be overkill here).

You can rewrite it like this, which is a pretty faithful conversion and
passes the tests (but see below).

diff --git a/remote-curl.c b/remote-curl.c
index 762a55a75f..66c57c9d62 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -330,9 +330,78 @@ static int get_protocol_http_header(enum protocol_version version,
 	return 0;
 }
 
+static void check_smart_http(struct discovery *d, const char *service,
+			     struct strbuf *type)
+{
+	char *src_buf;
+	size_t src_len;
+	char pkt[LARGE_PACKET_MAX];
+	int pkt_len;
+	enum packet_read_status r;
+
+	/*
+	 * We speculatively try to read a packet, which means we must preserve
+	 * the original buf/len pair in some cases.
+	 */
+	src_buf = d->buf;
+	src_len = d->len;
+	r = packet_read_with_status(-1, &src_buf, &src_len,
+				    pkt, sizeof(pkt), &pkt_len,
+				    PACKET_READ_GENTLE |
+				    PACKET_READ_CHOMP_NEWLINE);
+
+	/*
+	 * This could probably just be handled as "not smart" like all the
+	 * other pkt-line error cases, but traditionally we've complained
+	 * about it (technically we used to do so only when we got the right
+	 * content-type, but it probably doesn't matter).
+	 */
+	if (r == PACKET_READ_FLUSH)
+		die("invalid server response; expected service, got flush packet");
+	if (r != PACKET_READ_NORMAL)
+		return; /* not smart */
+
+	if (pkt[0] == '#') {
+		/* v0 smart http */
+		struct strbuf exp = STRBUF_INIT;
+
+		strbuf_addf(&exp, "application/x-%s-advertisement", service);
+		if (strbuf_cmp(&exp, type)) {
+			strbuf_release(&exp);
+			return;
+		}
+
+		strbuf_reset(&exp);
+		strbuf_addf(&exp, "# service=%s", service);
+		if (strcmp(pkt, exp.buf))
+			die("invalid server response; got '%s'", pkt);
+
+		strbuf_release(&exp);
+
+		/*
+		 * The header can include additional metadata lines, up
+		 * until a packet flush marker.  Ignore these now, but
+		 * in the future we might start to scan them.
+		 */
+		while (packet_read_line_buf(&src_buf, &src_len, NULL))
+			;
+
+		d->buf = src_buf;
+		d->len = src_len;
+		d->proto_git = 1;
+
+	} else if (starts_with(pkt, "version 2")) {
+		/*
+		 * v2 smart http; do not consume version packet, which will
+		 * be handled elsewhere. Should we be checking the content-type
+		 * in this code-path, too?
+		 */
+		d->proto_git = 1;
+	}
+}
+
 static struct discovery *discover_refs(const char *service, int for_push)
 {
-	struct strbuf exp = STRBUF_INIT;
 	struct strbuf type = STRBUF_INIT;
 	struct strbuf charset = STRBUF_INIT;
 	struct strbuf buffer = STRBUF_INIT;
@@ -405,38 +474,8 @@ static struct discovery *discover_refs(const char *service, int for_push)
 	last->buf_alloc = strbuf_detach(&buffer, &last->len);
 	last->buf = last->buf_alloc;
 
-	strbuf_addf(&exp, "application/x-%s-advertisement", service);
-	if (maybe_smart &&
-	    (5 <= last->len && last->buf[4] == '#') &&
-	    !strbuf_cmp(&exp, &type)) {
-		char *line;
-
-		/*
-		 * smart HTTP response; validate that the service
-		 * pkt-line matches our request.
-		 */
-		line = packet_read_line_buf(&last->buf, &last->len, NULL);
-		if (!line)
-			die("invalid server response; expected service, got flush packet");
-
-		strbuf_reset(&exp);
-		strbuf_addf(&exp, "# service=%s", service);
-		if (strcmp(line, exp.buf))
-			die("invalid server response; got '%s'", line);
-		strbuf_release(&exp);
-
-		/* The header can include additional metadata lines, up
-		 * until a packet flush marker.  Ignore these now, but
-		 * in the future we might start to scan them.
-		 */
-		while (packet_read_line_buf(&last->buf, &last->len, NULL))
-			;
-
-		last->proto_git = 1;
-	} else if (maybe_smart &&
-		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
-		last->proto_git = 1;
-	}
+	if (maybe_smart)
+		check_smart_http(last, service, &type);
 
 	if (last->proto_git)
 		last->refs = parse_git_refs(last, for_push);
@@ -444,7 +483,6 @@ static struct discovery *discover_refs(const char *service, int for_push)
 		last->refs = parse_info_refs(last);
 
 	strbuf_release(&refs_url);
-	strbuf_release(&exp);
 	strbuf_release(&type);
 	strbuf_release(&charset);
 	strbuf_release(&effective_url);

So the few tricky things are:

  - the ordering with respect to checking the packet and the
    content-type is a little different here. Should v2 protocol be
    expecting that content-type, too? If so, then I think it would make
    sense to check that first, before even considering if there's a
    packet to read (and the current code is overly-permissive in that
    case).

  - there is no way to speculatively read a packet and not die() if the
    buffer doesn't have pkt-lines in it. So we'd additionally need the
    patch below (which annoyingly has to touch a bunch of switch
    statements to keep the compiler happy). If we _can_ check the
    content-type first, then all of that could go away (i.e., once we
    see the right content-type, we're confident enough in expecting
    packets to die() in the existing packet code-paths).

diff --git a/connect.c b/connect.c
index 24281b6082..1caafc3ce7 100644
--- a/connect.c
+++ b/connect.c
@@ -125,6 +125,8 @@ enum protocol_version discover_version(struct packet_reader *reader)
 	switch (packet_reader_peek(reader)) {
 	case PACKET_READ_EOF:
 		die_initial_contact(0);
+	case PACKET_READ_FORMAT_ERROR:
+		BUG("format error from non-gentle packet read");
 	case PACKET_READ_FLUSH:
 	case PACKET_READ_DELIM:
 		version = protocol_v0;
@@ -304,6 +306,8 @@ struct ref **get_remote_heads(struct packet_reader *reader,
 		switch (packet_reader_read(reader)) {
 		case PACKET_READ_EOF:
 			die_initial_contact(1);
+		case PACKET_READ_FORMAT_ERROR:
+			BUG("format error from non-gentle packet read");
 		case PACKET_READ_NORMAL:
 			len = reader->pktlen;
 			if (len > 4 && skip_prefix(reader->line, "ERR ", &arg))
diff --git a/pkt-line.c b/pkt-line.c
index 04d10bbd03..2daacc6188 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -293,7 +293,8 @@ static int get_packet_data(int fd, char **src_buf, size_t *src_size,
 
 	/* And complain if we didn't get enough bytes to satisfy the read. */
 	if (ret != size) {
-		if (options & PACKET_READ_GENTLE_ON_EOF)
+		if ((options & PACKET_READ_GENTLE_ON_EOF) ||
+		    (options & PACKET_READ_GENTLE))
 			return -1;
 
 		die(_("the remote end hung up unexpectedly"));
@@ -324,6 +325,10 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
 	len = packet_length(linelen);
 
 	if (len < 0) {
+		if (options & PACKET_READ_GENTLE) {
+			*pktlen = -1;
+			return PACKET_READ_FORMAT_ERROR;
+		}
 		die(_("protocol error: bad line length character: %.4s"), linelen);
 	} else if (!len) {
 		packet_trace("0000", 4, 0);
@@ -334,12 +339,21 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
 		*pktlen = 0;
 		return PACKET_READ_DELIM;
 	} else if (len < 4) {
+		if (options & PACKET_READ_GENTLE) {
+			*pktlen = -1;
+			return PACKET_READ_FORMAT_ERROR;
+		}
 		die(_("protocol error: bad line length %d"), len);
 	}
 
 	len -= 4;
-	if ((unsigned)len >= size)
+	if ((unsigned)len >= size) {
+		if (options & PACKET_READ_GENTLE) {
+			*pktlen = -1;
+			return PACKET_READ_FORMAT_ERROR;
+		}
 		die(_("protocol error: bad line length %d"), len);
+	}
 
 	if (get_packet_data(fd, src_buffer, src_len, buffer, len, options) < 0) {
 		*pktlen = -1;
diff --git a/pkt-line.h b/pkt-line.h
index 5b28d43472..7580f83cdc 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -44,7 +44,7 @@ int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
  * If src_buffer (or *src_buffer) is NULL, then data is read from the
  * descriptor "fd".
  *
- * If options does not contain PACKET_READ_GENTLE_ON_EOF, we will die under any
+ * If options does not contain PACKET_READ_GENTLE, we will die under any
  * of the following conditions:
  *
  *   1. Read error from descriptor.
@@ -56,6 +56,8 @@ int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
  *   4. Truncated output from the remote (e.g., we expected a packet but got
  *      EOF, or we got a partial packet followed by EOF).
  *
+ * If options does contain PACKET_READ_GENTLE, we'll instead return -1.
+ *
  * If options does contain PACKET_READ_GENTLE_ON_EOF, we will not die on
  * condition 4 (truncated input), but instead return -1. However, we will still
  * die for the other 3 conditions.
@@ -65,6 +67,7 @@ int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
  */
 #define PACKET_READ_GENTLE_ON_EOF (1u<<0)
 #define PACKET_READ_CHOMP_NEWLINE (1u<<1)
+#define PACKET_READ_GENTLE        (1u<<2)
 int packet_read(int fd, char **src_buffer, size_t *src_len, char
 		*buffer, unsigned size, int options);
 
@@ -79,6 +82,7 @@ enum packet_read_status {
 	PACKET_READ_NORMAL,
 	PACKET_READ_FLUSH,
 	PACKET_READ_DELIM,
+	PACKET_READ_FORMAT_ERROR,
 };
 enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
 						size_t *src_len, char *buffer,
diff --git a/remote-curl.c b/remote-curl.c
index 762a55a75f..66c57c9d62 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -1214,6 +1252,8 @@ static size_t proxy_in(char *buffer, size_t eltsize,
 		switch (packet_reader_read(&p->reader)) {
 		case PACKET_READ_EOF:
 			die("unexpected EOF when reading from parent process");
+		case PACKET_READ_FORMAT_ERROR:
+			BUG("format error from non-gentle packet read");
 		case PACKET_READ_NORMAL:
 			packet_buf_write_len(&p->request_buffer, p->reader.line,
 					     p->reader.pktlen);
diff --git a/serve.c b/serve.c
index bda085f09c..e88936262b 100644
--- a/serve.c
+++ b/serve.c
@@ -181,6 +181,8 @@ static int process_request(void)
 		switch (packet_reader_peek(&reader)) {
 		case PACKET_READ_EOF:
 			BUG("Should have already died when seeing EOF");
+		case PACKET_READ_FORMAT_ERROR:
+			BUG("format error from non-gentle packet read");
 		case PACKET_READ_NORMAL:
 			/* collect request; a sequence of keys and values */
 			if (is_command(reader.line, &command) ||
diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
index 282d536384..1522176a2f 100644
--- a/t/helper/test-pkt-line.c
+++ b/t/helper/test-pkt-line.c
@@ -37,6 +37,8 @@ static void unpack(void)
 		switch (reader.status) {
 		case PACKET_READ_EOF:
 			break;
+		case PACKET_READ_FORMAT_ERROR:
+			BUG("format error from non-gentle packet read");
 		case PACKET_READ_NORMAL:
 			printf("%s\n", reader.line);
 			break;
@@ -64,6 +66,8 @@ static void unpack_sideband(void)
 		switch (reader.status) {
 		case PACKET_READ_EOF:
 			break;
+		case PACKET_READ_FORMAT_ERROR:
+			BUG("format error from non-gentle packet read");
 		case PACKET_READ_NORMAL:
 			band = reader.line[0] & 0xff;
 			if (band < 1 || band > 2)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] remote-curl: die on server-side errors
  2018-11-14  0:49     ` Jeff King
@ 2018-11-14  7:00       ` Jeff King
  2018-11-15 21:51         ` Josh Steadmon
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2018-11-14  7:00 UTC (permalink / raw)
  To: git

On Tue, Nov 13, 2018 at 07:49:15PM -0500, Jeff King wrote:

> Yes, the packet_read_line_buf() interface will both advance the buf
> pointer and decrement the length.  So if we want to "peek", we have to
> do so with a copy (there's a peek function if you use the packet_reader
> interface, but that might be overkill here).
> 
> You can rewrite it like this, which is a pretty faithful conversion and
> passes the tests (but see below).
> [...]

Here's a version which is less faithful, but I think does sensible
things in all cases, and is much easier to follow. I get a little
nervous just because it tightens some cases, and one never knows if
other implementations might be relying on the looseness. E.g.:

  - in the current code we'd still drop back to dumb http if the server
    tells us "application/x-git-upload-pack" but the initial pktline
    doesn't start with "#" (even though if it _does_ have "#" at
    position 5 but isn't a valid pktline, we'd complain!)

  - right now the v2 protocol does not require the server to say
    "application/x-git-upload-pack" for the content-type

This patch tightens both of those (I also made a few stylistic tweaks,
and added the ERR condition to show where it would go). I dunno. Part of
me sees this as a nice cleanup, but maybe it is better to just leave it
alone. A lot of these behaviors are just how it happens to work now, and
not part of the spec, but we don't know what might be relying on them.

diff --git a/remote-curl.c b/remote-curl.c
index 762a55a75f..1adb96311b 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -330,9 +330,61 @@ static int get_protocol_http_header(enum protocol_version version,
 	return 0;
 }
 
+static void check_smart_http(struct discovery *d, const char *service,
+			     struct strbuf *type)
+{
+	char *src_buf;
+	size_t src_len;
+	char *line;
+	const char *p;
+
+	if (!skip_prefix(type->buf, "application/x-", &p) ||
+	    !skip_prefix(p, service, &p) ||
+	    strcmp(p, "-advertisement"))
+		return;
+
+	/*
+	 * We speculatively try to read a packet, which means we must preserve
+	 * the original buf/len pair in some cases.
+	 */
+	src_buf = d->buf;
+	src_len = d->len;
+	line = packet_read_line_buf(&src_buf, &src_len, NULL);
+	if (!line)
+		die("invalid server response; expected service, got flush packet");
+
+	if (skip_prefix(line, "# service=", &p) && !strcmp(p, service)) {
+		/*
+		 * The header can include additional metadata lines, up
+		 * until a packet flush marker.  Ignore these now, but
+		 * in the future we might start to scan them.
+		 */
+		while (packet_read_line_buf(&src_buf, &src_len, NULL))
+			;
+
+		/*
+		 * v0 smart http; callers expect us to soak up the
+		 * service and header packets
+		 */
+		d->buf = src_buf;
+		d->len = src_len;
+		d->proto_git = 1;
+
+	} else if (starts_with(line, "version 2")) { /* should be strcmp()? */
+		/*
+		 * v2 smart http; do not consume version packet, which will
+		 * be handled elsewhere.
+		 */
+		d->proto_git = 1;
+	} else if (skip_prefix(line, "ERR ", &p)) {
+		die(_("remote error: %s"), p);
+	} else {
+		die("invalid server response; got '%s'", line);
+	}
+}
+
 static struct discovery *discover_refs(const char *service, int for_push)
 {
-	struct strbuf exp = STRBUF_INIT;
 	struct strbuf type = STRBUF_INIT;
 	struct strbuf charset = STRBUF_INIT;
 	struct strbuf buffer = STRBUF_INIT;
@@ -405,38 +457,8 @@ static struct discovery *discover_refs(const char *service, int for_push)
 	last->buf_alloc = strbuf_detach(&buffer, &last->len);
 	last->buf = last->buf_alloc;
 
-	strbuf_addf(&exp, "application/x-%s-advertisement", service);
-	if (maybe_smart &&
-	    (5 <= last->len && last->buf[4] == '#') &&
-	    !strbuf_cmp(&exp, &type)) {
-		char *line;
-
-		/*
-		 * smart HTTP response; validate that the service
-		 * pkt-line matches our request.
-		 */
-		line = packet_read_line_buf(&last->buf, &last->len, NULL);
-		if (!line)
-			die("invalid server response; expected service, got flush packet");
-
-		strbuf_reset(&exp);
-		strbuf_addf(&exp, "# service=%s", service);
-		if (strcmp(line, exp.buf))
-			die("invalid server response; got '%s'", line);
-		strbuf_release(&exp);
-
-		/* The header can include additional metadata lines, up
-		 * until a packet flush marker.  Ignore these now, but
-		 * in the future we might start to scan them.
-		 */
-		while (packet_read_line_buf(&last->buf, &last->len, NULL))
-			;
-
-		last->proto_git = 1;
-	} else if (maybe_smart &&
-		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
-		last->proto_git = 1;
-	}
+	if (maybe_smart)
+		check_smart_http(last, service, &type);
 
 	if (last->proto_git)
 		last->refs = parse_git_refs(last, for_push);
@@ -444,7 +466,6 @@ static struct discovery *discover_refs(const char *service, int for_push)
 		last->refs = parse_info_refs(last);
 
 	strbuf_release(&refs_url);
-	strbuf_release(&exp);
 	strbuf_release(&type);
 	strbuf_release(&charset);
 	strbuf_release(&effective_url);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] remote-curl: die on server-side errors
  2018-11-14  7:00       ` Jeff King
@ 2018-11-15 21:51         ` Josh Steadmon
  2018-11-16  8:44           ` [PATCH 0/3] remote-curl smart-http discovery cleanup Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Josh Steadmon @ 2018-11-15 21:51 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On 2018.11.14 02:00, Jeff King wrote:
> On Tue, Nov 13, 2018 at 07:49:15PM -0500, Jeff King wrote:
> 
> > Yes, the packet_read_line_buf() interface will both advance the buf
> > pointer and decrement the length.  So if we want to "peek", we have to
> > do so with a copy (there's a peek function if you use the packet_reader
> > interface, but that might be overkill here).
> > 
> > You can rewrite it like this, which is a pretty faithful conversion and
> > passes the tests (but see below).
> > [...]
> 
> Here's a version which is less faithful, but I think does sensible
> things in all cases, and is much easier to follow. I get a little
> nervous just because it tightens some cases, and one never knows if
> other implementations might be relying on the looseness. E.g.:
> 
>   - in the current code we'd still drop back to dumb http if the server
>     tells us "application/x-git-upload-pack" but the initial pktline
>     doesn't start with "#" (even though if it _does_ have "#" at
>     position 5 but isn't a valid pktline, we'd complain!)
> 
>   - right now the v2 protocol does not require the server to say
>     "application/x-git-upload-pack" for the content-type
> 
> This patch tightens both of those (I also made a few stylistic tweaks,
> and added the ERR condition to show where it would go). I dunno. Part of
> me sees this as a nice cleanup, but maybe it is better to just leave it
> alone. A lot of these behaviors are just how it happens to work now, and
> not part of the spec, but we don't know what might be relying on them.

At least according to the protocol-v2 and http-protocol docs, the
stricter behavior seems correct:

For the first point above, dumb servers should never use an
"application/x-git-*" content type (http-protocol.txt line 163-167).

For the second point, the docs require v2 servers to use
"application/x-git-*" content types. protocol-v2.txt lines 63-65 state
that v2 clients should make a smart http request, while
http-protocol.txt lines 247-252 state that a smart server's response
type must be "application/x-git-*".

Of course we don't know if other implementations follow the spec, but
ISTM that this patch at least doesn't contradict how we've promised the
protocols should work.

If no one has any objections, I'll include the diff below in v2. Thanks
for the help Jeff!

> diff --git a/remote-curl.c b/remote-curl.c
> index 762a55a75f..1adb96311b 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -330,9 +330,61 @@ static int get_protocol_http_header(enum protocol_version version,
>  	return 0;
>  }
>  
> +static void check_smart_http(struct discovery *d, const char *service,
> +			     struct strbuf *type)
> +{
> +	char *src_buf;
> +	size_t src_len;
> +	char *line;
> +	const char *p;
> +
> +	if (!skip_prefix(type->buf, "application/x-", &p) ||
> +	    !skip_prefix(p, service, &p) ||
> +	    strcmp(p, "-advertisement"))
> +		return;
> +
> +	/*
> +	 * We speculatively try to read a packet, which means we must preserve
> +	 * the original buf/len pair in some cases.
> +	 */
> +	src_buf = d->buf;
> +	src_len = d->len;
> +	line = packet_read_line_buf(&src_buf, &src_len, NULL);
> +	if (!line)
> +		die("invalid server response; expected service, got flush packet");
> +
> +	if (skip_prefix(line, "# service=", &p) && !strcmp(p, service)) {
> +		/*
> +		 * The header can include additional metadata lines, up
> +		 * until a packet flush marker.  Ignore these now, but
> +		 * in the future we might start to scan them.
> +		 */
> +		while (packet_read_line_buf(&src_buf, &src_len, NULL))
> +			;
> +
> +		/*
> +		 * v0 smart http; callers expect us to soak up the
> +		 * service and header packets
> +		 */
> +		d->buf = src_buf;
> +		d->len = src_len;
> +		d->proto_git = 1;
> +
> +	} else if (starts_with(line, "version 2")) { /* should be strcmp()? */
> +		/*
> +		 * v2 smart http; do not consume version packet, which will
> +		 * be handled elsewhere.
> +		 */
> +		d->proto_git = 1;
> +	} else if (skip_prefix(line, "ERR ", &p)) {
> +		die(_("remote error: %s"), p);
> +	} else {
> +		die("invalid server response; got '%s'", line);
> +	}
> +}
> +
>  static struct discovery *discover_refs(const char *service, int for_push)
>  {
> -	struct strbuf exp = STRBUF_INIT;
>  	struct strbuf type = STRBUF_INIT;
>  	struct strbuf charset = STRBUF_INIT;
>  	struct strbuf buffer = STRBUF_INIT;
> @@ -405,38 +457,8 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  	last->buf_alloc = strbuf_detach(&buffer, &last->len);
>  	last->buf = last->buf_alloc;
>  
> -	strbuf_addf(&exp, "application/x-%s-advertisement", service);
> -	if (maybe_smart &&
> -	    (5 <= last->len && last->buf[4] == '#') &&
> -	    !strbuf_cmp(&exp, &type)) {
> -		char *line;
> -
> -		/*
> -		 * smart HTTP response; validate that the service
> -		 * pkt-line matches our request.
> -		 */
> -		line = packet_read_line_buf(&last->buf, &last->len, NULL);
> -		if (!line)
> -			die("invalid server response; expected service, got flush packet");
> -
> -		strbuf_reset(&exp);
> -		strbuf_addf(&exp, "# service=%s", service);
> -		if (strcmp(line, exp.buf))
> -			die("invalid server response; got '%s'", line);
> -		strbuf_release(&exp);
> -
> -		/* The header can include additional metadata lines, up
> -		 * until a packet flush marker.  Ignore these now, but
> -		 * in the future we might start to scan them.
> -		 */
> -		while (packet_read_line_buf(&last->buf, &last->len, NULL))
> -			;
> -
> -		last->proto_git = 1;
> -	} else if (maybe_smart &&
> -		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
> -		last->proto_git = 1;
> -	}
> +	if (maybe_smart)
> +		check_smart_http(last, service, &type);
>  
>  	if (last->proto_git)
>  		last->refs = parse_git_refs(last, for_push);
> @@ -444,7 +466,6 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  		last->refs = parse_info_refs(last);
>  
>  	strbuf_release(&refs_url);
> -	strbuf_release(&exp);
>  	strbuf_release(&type);
>  	strbuf_release(&charset);
>  	strbuf_release(&effective_url);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 0/3] remote-curl smart-http discovery cleanup
  2018-11-15 21:51         ` Josh Steadmon
@ 2018-11-16  8:44           ` Jeff King
  2018-11-16  8:47             ` [PATCH 1/3] remote-curl: refactor smart-http discovery Jeff King
                               ` (4 more replies)
  0 siblings, 5 replies; 41+ messages in thread
From: Jeff King @ 2018-11-16  8:44 UTC (permalink / raw)
  To: git

On Thu, Nov 15, 2018 at 01:51:52PM -0800, Josh Steadmon wrote:

> > This patch tightens both of those (I also made a few stylistic tweaks,
> > and added the ERR condition to show where it would go). I dunno. Part of
> > me sees this as a nice cleanup, but maybe it is better to just leave it
> > alone. A lot of these behaviors are just how it happens to work now, and
> > not part of the spec, but we don't know what might be relying on them.
> 
> At least according to the protocol-v2 and http-protocol docs, the
> stricter behavior seems correct:
> 
> For the first point above, dumb servers should never use an
> "application/x-git-*" content type (http-protocol.txt line 163-167).
> 
> For the second point, the docs require v2 servers to use
> "application/x-git-*" content types. protocol-v2.txt lines 63-65 state
> that v2 clients should make a smart http request, while
> http-protocol.txt lines 247-252 state that a smart server's response
> type must be "application/x-git-*".

Thanks for digging into the spec. I agree that it's pretty clear that
the appropriate content-type is expected.

> Of course we don't know if other implementations follow the spec, but
> ISTM that this patch at least doesn't contradict how we've promised the
> protocols should work.

These seem like pretty unlikely ways for a buggy server to behave, so I
think it's a reasonable risk. I also checked GitHub's implementation
(which recently learned to speak v2) and made sure that it works. :)
I didn't check JGit, but given the provenance, I assume it's fine.

Amusingly, this does break the test you just added, because it tries to
issue an ERR after claiming "text/html" (and after my patch, we
correctly fall back to dumb-http).

> If no one has any objections, I'll include the diff below in v2. Thanks
> for the help Jeff!

I think it makes sense to do the refactoring first as a separate step.
And of course it needs a commit message. So how about this series (your
original is rebased on top)?

  [1/3]: remote-curl: refactor smart-http discovery
  [2/3]: remote-curl: tighten "version 2" check for smart-http
  [3/3]: remote-curl: die on server-side errors

 remote-curl.c                   | 96 +++++++++++++++++++++------------
 t/lib-httpd.sh                  |  1 +
 t/lib-httpd/apache.conf         |  4 ++
 t/lib-httpd/error-smart-http.sh |  3 ++
 t/t5551-http-fetch-smart.sh     |  5 ++
 5 files changed, 75 insertions(+), 34 deletions(-)
 create mode 100644 t/lib-httpd/error-smart-http.sh

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/3] remote-curl: refactor smart-http discovery
  2018-11-16  8:44           ` [PATCH 0/3] remote-curl smart-http discovery cleanup Jeff King
@ 2018-11-16  8:47             ` Jeff King
  2018-11-16 20:27               ` Josh Steadmon
  2019-02-05 23:29               ` Junio C Hamano
  2018-11-16  8:48             ` [PATCH 2/3] remote-curl: tighten "version 2" check for smart-http Jeff King
                               ` (3 subsequent siblings)
  4 siblings, 2 replies; 41+ messages in thread
From: Jeff King @ 2018-11-16  8:47 UTC (permalink / raw)
  To: git

After making initial contact with an http server, we have to decide if
the server supports smart-http, and if so, which version. Our rules are
a bit inconsistent:

  1. For v0, we require that the content-type indicates a smart-http
     response. We also require the response to look vaguely like a
     pkt-line starting with "#". If one of those does not match, we fall
     back to dumb-http.

     But according to our http protocol spec[1]:

       Dumb servers MUST NOT return a return type starting with
       `application/x-git-`.

     If we see the expected content-type, we should consider it
     smart-http. At that point we can parse the pkt-line for real, and
     complain if it is not syntactically valid.

  2. For v2, we do not actually check the content-type. Our v2 protocol
     spec says[2]:

       When using the http:// or https:// transport a client makes a
       "smart" info/refs request as described in `http-protocol.txt`[...]

     and the http spec is clear that for a smart-http[3]:

       The Content-Type MUST be `application/x-$servicename-advertisement`.

     So it is required according to the spec.

These inconsistencies were easy to miss because of the way the original
code was written as an inline conditional. Let's pull it out into its
own function for readability, and improve a few things:

 - we now predicate the smart/dumb decision entirely on the presence of
   the correct content-type

 - we do a real pkt-line parse before deciding how to proceed (and die
   if it isn't valid)

 - use skip_prefix() for comparing service strings, instead of
   constructing expected output in a strbuf; this avoids dealing with
   memory cleanup

Note that this _is_ tightening what the client will allow. It's all
according to the spec, but it's possible that other implementations
might violate these. However, violating these particular rules seems
like an odd choice for a server to make.

[1] Documentation/technical/http-protocol.txt, l. 166-167
[2] Documentation/technical/protocol-v2.txt, l. 63-64
[3] Documentation/technical/http-protocol.txt, l. 247

Helped-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 remote-curl.c | 93 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 59 insertions(+), 34 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 762a55a75f..dd9bc41aa1 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -330,9 +330,65 @@ static int get_protocol_http_header(enum protocol_version version,
 	return 0;
 }
 
+static void check_smart_http(struct discovery *d, const char *service,
+			     struct strbuf *type)
+{
+	char *src_buf;
+	size_t src_len;
+	char *line;
+	const char *p;
+
+	/*
+	 * If we don't see x-$service-advertisement, then it's not smart-http.
+	 * But once we do, we commit to it and assume any other protocol
+	 * violations are hard errors.
+	 */
+	if (!skip_prefix(type->buf, "application/x-", &p) ||
+	    !skip_prefix(p, service, &p) ||
+	    strcmp(p, "-advertisement"))
+		return;
+
+	/*
+	 * "Peek" at the first packet by using a separate buf/len pair; some
+	 * cases below require us leaving the originals intact.
+	 */
+	src_buf = d->buf;
+	src_len = d->len;
+	line = packet_read_line_buf(&src_buf, &src_len, NULL);
+	if (!line)
+		die("invalid server response; expected service, got flush packet");
+
+	if (skip_prefix(line, "# service=", &p) && !strcmp(p, service)) {
+		/*
+		 * The header can include additional metadata lines, up
+		 * until a packet flush marker.  Ignore these now, but
+		 * in the future we might start to scan them.
+		 */
+		while (packet_read_line_buf(&src_buf, &src_len, NULL))
+			;
+
+		/*
+		 * v0 smart http; callers expect us to soak up the
+		 * service and header packets
+		 */
+		d->buf = src_buf;
+		d->len = src_len;
+		d->proto_git = 1;
+
+	} else if (starts_with(line, "version 2")) {
+		/*
+		 * v2 smart http; do not consume version packet, which will
+		 * be handled elsewhere.
+		 */
+		d->proto_git = 1;
+
+	} else {
+		die("invalid server response; got '%s'", line);
+	}
+}
+
 static struct discovery *discover_refs(const char *service, int for_push)
 {
-	struct strbuf exp = STRBUF_INIT;
 	struct strbuf type = STRBUF_INIT;
 	struct strbuf charset = STRBUF_INIT;
 	struct strbuf buffer = STRBUF_INIT;
@@ -405,38 +461,8 @@ static struct discovery *discover_refs(const char *service, int for_push)
 	last->buf_alloc = strbuf_detach(&buffer, &last->len);
 	last->buf = last->buf_alloc;
 
-	strbuf_addf(&exp, "application/x-%s-advertisement", service);
-	if (maybe_smart &&
-	    (5 <= last->len && last->buf[4] == '#') &&
-	    !strbuf_cmp(&exp, &type)) {
-		char *line;
-
-		/*
-		 * smart HTTP response; validate that the service
-		 * pkt-line matches our request.
-		 */
-		line = packet_read_line_buf(&last->buf, &last->len, NULL);
-		if (!line)
-			die("invalid server response; expected service, got flush packet");
-
-		strbuf_reset(&exp);
-		strbuf_addf(&exp, "# service=%s", service);
-		if (strcmp(line, exp.buf))
-			die("invalid server response; got '%s'", line);
-		strbuf_release(&exp);
-
-		/* The header can include additional metadata lines, up
-		 * until a packet flush marker.  Ignore these now, but
-		 * in the future we might start to scan them.
-		 */
-		while (packet_read_line_buf(&last->buf, &last->len, NULL))
-			;
-
-		last->proto_git = 1;
-	} else if (maybe_smart &&
-		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
-		last->proto_git = 1;
-	}
+	if (maybe_smart)
+		check_smart_http(last, service, &type);
 
 	if (last->proto_git)
 		last->refs = parse_git_refs(last, for_push);
@@ -444,7 +470,6 @@ static struct discovery *discover_refs(const char *service, int for_push)
 		last->refs = parse_info_refs(last);
 
 	strbuf_release(&refs_url);
-	strbuf_release(&exp);
 	strbuf_release(&type);
 	strbuf_release(&charset);
 	strbuf_release(&effective_url);
-- 
2.19.1.1636.gc7a073d580


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 2/3] remote-curl: tighten "version 2" check for smart-http
  2018-11-16  8:44           ` [PATCH 0/3] remote-curl smart-http discovery cleanup Jeff King
  2018-11-16  8:47             ` [PATCH 1/3] remote-curl: refactor smart-http discovery Jeff King
@ 2018-11-16  8:48             ` Jeff King
  2018-11-16 20:28               ` Josh Steadmon
  2018-11-16  8:49             ` [PATCH 3/3] remote-curl: die on server-side errors Jeff King
                               ` (2 subsequent siblings)
  4 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2018-11-16  8:48 UTC (permalink / raw)
  To: git

In a v2 smart-http conversation, the server should reply to our initial
request with a pkt-line saying "version 2" (this is the start of the
"capabilities advertisement"). We check for the string using
starts_with(), but that's overly permissive (it would match "version
20", for example).

Let's tighten this check to use strcmp(). Note that we don't need to
worry about a trailing newline here, because the ptk-line code will have
chomped it for us already.

Signed-off-by: Jeff King <peff@peff.net>
---
 remote-curl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index dd9bc41aa1..3c9c4a07c3 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -375,7 +375,7 @@ static void check_smart_http(struct discovery *d, const char *service,
 		d->len = src_len;
 		d->proto_git = 1;
 
-	} else if (starts_with(line, "version 2")) {
+	} else if (!strcmp(line, "version 2")) {
 		/*
 		 * v2 smart http; do not consume version packet, which will
 		 * be handled elsewhere.
-- 
2.19.1.1636.gc7a073d580


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 3/3] remote-curl: die on server-side errors
  2018-11-16  8:44           ` [PATCH 0/3] remote-curl smart-http discovery cleanup Jeff King
  2018-11-16  8:47             ` [PATCH 1/3] remote-curl: refactor smart-http discovery Jeff King
  2018-11-16  8:48             ` [PATCH 2/3] remote-curl: tighten "version 2" check for smart-http Jeff King
@ 2018-11-16  8:49             ` Jeff King
  2018-11-16 20:04             ` [PATCH 0/3] remote-curl smart-http discovery cleanup Josh Steadmon
  2018-12-12  0:25             ` [PATCH v3 0/4] Unify pkt-line error handling and refactor smart-http Josh Steadmon
  4 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2018-11-16  8:49 UTC (permalink / raw)
  To: git

From: Josh Steadmon <steadmon@google.com>

When a smart HTTP server sends an error message via pkt-line,
remote-curl will fail to detect the error (which usually results in
incorrectly falling back to dumb-HTTP mode).

This patch adds a check in check_smart_http() for server-side error
messages, as well as a test case for this issue.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 remote-curl.c                   | 3 +++
 t/lib-httpd.sh                  | 1 +
 t/lib-httpd/apache.conf         | 4 ++++
 t/lib-httpd/error-smart-http.sh | 3 +++
 t/t5551-http-fetch-smart.sh     | 5 +++++
 5 files changed, 16 insertions(+)
 create mode 100644 t/lib-httpd/error-smart-http.sh

diff --git a/remote-curl.c b/remote-curl.c
index 3c9c4a07c3..b1309f2bdc 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -382,6 +382,9 @@ static void check_smart_http(struct discovery *d, const char *service,
 		 */
 		d->proto_git = 1;
 
+	} else if (skip_prefix(line, "ERR ", &p)) {
+		die(_("remote error: %s"), p);
+
 	} else {
 		die("invalid server response; got '%s'", line);
 	}
diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index a8729f8232..4e946623bb 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -131,6 +131,7 @@ prepare_httpd() {
 	mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH"
 	cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
 	install_script broken-smart-http.sh
+	install_script error-smart-http.sh
 	install_script error.sh
 	install_script apply-one-time-sed.sh
 
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 581c010d8f..6de2bc775c 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -117,6 +117,7 @@ Alias /auth/dumb/ www/auth/dumb/
 </LocationMatch>
 ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1
 ScriptAlias /broken_smart/ broken-smart-http.sh/
+ScriptAlias /error_smart/ error-smart-http.sh/
 ScriptAlias /error/ error.sh/
 ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
 <Directory ${GIT_EXEC_PATH}>
@@ -125,6 +126,9 @@ ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
 <Files broken-smart-http.sh>
 	Options ExecCGI
 </Files>
+<Files error-smart-http.sh>
+	Options ExecCGI
+</Files>
 <Files error.sh>
   Options ExecCGI
 </Files>
diff --git a/t/lib-httpd/error-smart-http.sh b/t/lib-httpd/error-smart-http.sh
new file mode 100644
index 0000000000..e65d447fc4
--- /dev/null
+++ b/t/lib-httpd/error-smart-http.sh
@@ -0,0 +1,3 @@
+echo "Content-Type: application/x-git-upload-pack-advertisement"
+echo
+printf "%s" "0019ERR server-side error"
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 8630b0cc39..ba83e567e5 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -429,5 +429,10 @@ test_expect_success 'GIT_TRACE_CURL_NO_DATA prevents data from being traced' '
 	! grep "=> Send data" err
 '
 
+test_expect_success 'server-side error detected' '
+	test_must_fail git clone $HTTPD_URL/error_smart/repo.git 2>actual &&
+	grep "server-side error" actual
+'
+
 stop_httpd
 test_done
-- 
2.19.1.1636.gc7a073d580

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/3] remote-curl smart-http discovery cleanup
  2018-11-16  8:44           ` [PATCH 0/3] remote-curl smart-http discovery cleanup Jeff King
                               ` (2 preceding siblings ...)
  2018-11-16  8:49             ` [PATCH 3/3] remote-curl: die on server-side errors Jeff King
@ 2018-11-16 20:04             ` Josh Steadmon
  2018-12-12  0:25             ` [PATCH v3 0/4] Unify pkt-line error handling and refactor smart-http Josh Steadmon
  4 siblings, 0 replies; 41+ messages in thread
From: Josh Steadmon @ 2018-11-16 20:04 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On 2018.11.16 03:44, Jeff King wrote:
[...]
> Amusingly, this does break the test you just added, because it tries to
> issue an ERR after claiming "text/html" (and after my patch, we
> correctly fall back to dumb-http).

Heh yeah, I copied the script from a dumb-http test without reading the
spec.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] remote-curl: refactor smart-http discovery
  2018-11-16  8:47             ` [PATCH 1/3] remote-curl: refactor smart-http discovery Jeff King
@ 2018-11-16 20:27               ` Josh Steadmon
  2019-02-05 23:29               ` Junio C Hamano
  1 sibling, 0 replies; 41+ messages in thread
From: Josh Steadmon @ 2018-11-16 20:27 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On 2018.11.16 03:47, Jeff King wrote:
> After making initial contact with an http server, we have to decide if
> the server supports smart-http, and if so, which version. Our rules are
> a bit inconsistent:
> 
>   1. For v0, we require that the content-type indicates a smart-http
>      response. We also require the response to look vaguely like a
>      pkt-line starting with "#". If one of those does not match, we fall
>      back to dumb-http.
> 
>      But according to our http protocol spec[1]:
> 
>        Dumb servers MUST NOT return a return type starting with
>        `application/x-git-`.
> 
>      If we see the expected content-type, we should consider it
>      smart-http. At that point we can parse the pkt-line for real, and
>      complain if it is not syntactically valid.
> 
>   2. For v2, we do not actually check the content-type. Our v2 protocol
>      spec says[2]:
> 
>        When using the http:// or https:// transport a client makes a
>        "smart" info/refs request as described in `http-protocol.txt`[...]
> 
>      and the http spec is clear that for a smart-http[3]:
> 
>        The Content-Type MUST be `application/x-$servicename-advertisement`.
> 
>      So it is required according to the spec.
> 
> These inconsistencies were easy to miss because of the way the original
> code was written as an inline conditional. Let's pull it out into its
> own function for readability, and improve a few things:
> 
>  - we now predicate the smart/dumb decision entirely on the presence of
>    the correct content-type
> 
>  - we do a real pkt-line parse before deciding how to proceed (and die
>    if it isn't valid)
> 
>  - use skip_prefix() for comparing service strings, instead of
>    constructing expected output in a strbuf; this avoids dealing with
>    memory cleanup
> 
> Note that this _is_ tightening what the client will allow. It's all
> according to the spec, but it's possible that other implementations
> might violate these. However, violating these particular rules seems
> like an odd choice for a server to make.
> 
> [1] Documentation/technical/http-protocol.txt, l. 166-167
> [2] Documentation/technical/protocol-v2.txt, l. 63-64
> [3] Documentation/technical/http-protocol.txt, l. 247
> 
> Helped-by: Josh Steadmon <steadmon@google.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  remote-curl.c | 93 ++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 59 insertions(+), 34 deletions(-)
> 
> diff --git a/remote-curl.c b/remote-curl.c
> index 762a55a75f..dd9bc41aa1 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -330,9 +330,65 @@ static int get_protocol_http_header(enum protocol_version version,
>  	return 0;
>  }
>  
> +static void check_smart_http(struct discovery *d, const char *service,
> +			     struct strbuf *type)
> +{
> +	char *src_buf;
> +	size_t src_len;
> +	char *line;
> +	const char *p;
> +
> +	/*
> +	 * If we don't see x-$service-advertisement, then it's not smart-http.
> +	 * But once we do, we commit to it and assume any other protocol
> +	 * violations are hard errors.
> +	 */
> +	if (!skip_prefix(type->buf, "application/x-", &p) ||
> +	    !skip_prefix(p, service, &p) ||
> +	    strcmp(p, "-advertisement"))
> +		return;
> +
> +	/*
> +	 * "Peek" at the first packet by using a separate buf/len pair; some
> +	 * cases below require us leaving the originals intact.
> +	 */
> +	src_buf = d->buf;
> +	src_len = d->len;
> +	line = packet_read_line_buf(&src_buf, &src_len, NULL);
> +	if (!line)
> +		die("invalid server response; expected service, got flush packet");
> +
> +	if (skip_prefix(line, "# service=", &p) && !strcmp(p, service)) {
> +		/*
> +		 * The header can include additional metadata lines, up
> +		 * until a packet flush marker.  Ignore these now, but
> +		 * in the future we might start to scan them.
> +		 */
> +		while (packet_read_line_buf(&src_buf, &src_len, NULL))
> +			;
> +
> +		/*
> +		 * v0 smart http; callers expect us to soak up the
> +		 * service and header packets
> +		 */
> +		d->buf = src_buf;
> +		d->len = src_len;
> +		d->proto_git = 1;
> +
> +	} else if (starts_with(line, "version 2")) {
> +		/*
> +		 * v2 smart http; do not consume version packet, which will
> +		 * be handled elsewhere.
> +		 */
> +		d->proto_git = 1;
> +
> +	} else {
> +		die("invalid server response; got '%s'", line);
> +	}
> +}
> +
>  static struct discovery *discover_refs(const char *service, int for_push)
>  {
> -	struct strbuf exp = STRBUF_INIT;
>  	struct strbuf type = STRBUF_INIT;
>  	struct strbuf charset = STRBUF_INIT;
>  	struct strbuf buffer = STRBUF_INIT;
> @@ -405,38 +461,8 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  	last->buf_alloc = strbuf_detach(&buffer, &last->len);
>  	last->buf = last->buf_alloc;
>  
> -	strbuf_addf(&exp, "application/x-%s-advertisement", service);
> -	if (maybe_smart &&
> -	    (5 <= last->len && last->buf[4] == '#') &&
> -	    !strbuf_cmp(&exp, &type)) {
> -		char *line;
> -
> -		/*
> -		 * smart HTTP response; validate that the service
> -		 * pkt-line matches our request.
> -		 */
> -		line = packet_read_line_buf(&last->buf, &last->len, NULL);
> -		if (!line)
> -			die("invalid server response; expected service, got flush packet");
> -
> -		strbuf_reset(&exp);
> -		strbuf_addf(&exp, "# service=%s", service);
> -		if (strcmp(line, exp.buf))
> -			die("invalid server response; got '%s'", line);
> -		strbuf_release(&exp);
> -
> -		/* The header can include additional metadata lines, up
> -		 * until a packet flush marker.  Ignore these now, but
> -		 * in the future we might start to scan them.
> -		 */
> -		while (packet_read_line_buf(&last->buf, &last->len, NULL))
> -			;
> -
> -		last->proto_git = 1;
> -	} else if (maybe_smart &&
> -		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
> -		last->proto_git = 1;
> -	}
> +	if (maybe_smart)
> +		check_smart_http(last, service, &type);
>  
>  	if (last->proto_git)
>  		last->refs = parse_git_refs(last, for_push);
> @@ -444,7 +470,6 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  		last->refs = parse_info_refs(last);
>  
>  	strbuf_release(&refs_url);
> -	strbuf_release(&exp);
>  	strbuf_release(&type);
>  	strbuf_release(&charset);
>  	strbuf_release(&effective_url);
> -- 
> 2.19.1.1636.gc7a073d580
> 

Looks good to me.

Reviewed-by: Josh Steadmon <steadmon@google.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] remote-curl: tighten "version 2" check for smart-http
  2018-11-16  8:48             ` [PATCH 2/3] remote-curl: tighten "version 2" check for smart-http Jeff King
@ 2018-11-16 20:28               ` Josh Steadmon
  0 siblings, 0 replies; 41+ messages in thread
From: Josh Steadmon @ 2018-11-16 20:28 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On 2018.11.16 03:48, Jeff King wrote:
> In a v2 smart-http conversation, the server should reply to our initial
> request with a pkt-line saying "version 2" (this is the start of the
> "capabilities advertisement"). We check for the string using
> starts_with(), but that's overly permissive (it would match "version
> 20", for example).
> 
> Let's tighten this check to use strcmp(). Note that we don't need to
> worry about a trailing newline here, because the ptk-line code will have
> chomped it for us already.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  remote-curl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/remote-curl.c b/remote-curl.c
> index dd9bc41aa1..3c9c4a07c3 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -375,7 +375,7 @@ static void check_smart_http(struct discovery *d, const char *service,
>  		d->len = src_len;
>  		d->proto_git = 1;
>  
> -	} else if (starts_with(line, "version 2")) {
> +	} else if (!strcmp(line, "version 2")) {
>  		/*
>  		 * v2 smart http; do not consume version packet, which will
>  		 * be handled elsewhere.
> -- 
> 2.19.1.1636.gc7a073d580
> 

Looks good to me.

Reviewed-by: Josh Steadmon <steadmon@google.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 0/4] Unify pkt-line error handling and refactor smart-http
  2018-11-16  8:44           ` [PATCH 0/3] remote-curl smart-http discovery cleanup Jeff King
                               ` (3 preceding siblings ...)
  2018-11-16 20:04             ` [PATCH 0/3] remote-curl smart-http discovery cleanup Josh Steadmon
@ 2018-12-12  0:25             ` Josh Steadmon
  2018-12-12  0:25               ` [PATCH v3 1/4] pack-protocol.txt: accept error packets in any context Josh Steadmon
                                 ` (4 more replies)
  4 siblings, 5 replies; 41+ messages in thread
From: Josh Steadmon @ 2018-12-12  0:25 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, masayasuzuki

This is a reroll of js/smart-http-detect-remote-error that also includes
a fixed version of ms/proto-err-packet-anywhere [1].

The first patch clarifies the use of ERR messages in the pkt-line
protocol and unifies error handling in pkt-line.c

The second patch refactors smart-http discovery in remote-curl.c. Among
other improvements, it makes more use of the pkt-line functions, which
allows us to catch remote errors that were previously ignored.

The third patch makes the version check in remote-curl more strict.

The final patch adds a test to verify that the fix in patch #2 does
actually catch remote HTTP errors.

[1]: https://public-inbox.org/git/20181127045301.103807-1-masayasuzuki@google.com/

Jeff King (2):
  remote-curl: refactor smart-http discovery
  remote-curl: tighten "version 2" check for smart-http

Josh Steadmon (1):
  lib-httpd, t5551: check server-side HTTP errors

Masaya Suzuki (1):
  pack-protocol.txt: accept error packets in any context

 Documentation/technical/pack-protocol.txt | 20 ++---
 builtin/archive.c                         |  2 -
 connect.c                                 |  3 -
 fetch-pack.c                              |  2 -
 pkt-line.c                                |  4 +
 remote-curl.c                             | 93 ++++++++++++++---------
 t/lib-httpd.sh                            |  1 +
 t/lib-httpd/apache.conf                   |  4 +
 t/lib-httpd/error-smart-http.sh           |  3 +
 t/t5551-http-fetch-smart.sh               |  5 ++
 t/t5703-upload-pack-ref-in-want.sh        |  4 +-
 11 files changed, 89 insertions(+), 52 deletions(-)
 create mode 100644 t/lib-httpd/error-smart-http.sh

-- 
2.20.0.rc2.403.gdbc3b29805-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 1/4] pack-protocol.txt: accept error packets in any context
  2018-12-12  0:25             ` [PATCH v3 0/4] Unify pkt-line error handling and refactor smart-http Josh Steadmon
@ 2018-12-12  0:25               ` Josh Steadmon
  2018-12-12 11:02                 ` Jeff King
  2018-12-12  0:25               ` [PATCH v3 2/4] remote-curl: refactor smart-http discovery Josh Steadmon
                                 ` (3 subsequent siblings)
  4 siblings, 1 reply; 41+ messages in thread
From: Josh Steadmon @ 2018-12-12  0:25 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, masayasuzuki

From: Masaya Suzuki <masayasuzuki@google.com>

In the Git pack protocol definition, an error packet may appear only in
a certain context. However, servers can face a runtime error (e.g. I/O
error) at an arbitrary timing. This patch changes the protocol to allow
an error packet to be sent instead of any packet.

Following this protocol spec change, the error packet handling code is
moved to pkt-line.c.

Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 Documentation/technical/pack-protocol.txt | 20 +++++++++++---------
 builtin/archive.c                         |  2 --
 connect.c                                 |  3 ---
 fetch-pack.c                              |  2 --
 pkt-line.c                                |  4 ++++
 t/t5703-upload-pack-ref-in-want.sh        |  4 ++--
 6 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 6ac774d5f6..7a2375a55d 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -22,6 +22,16 @@ protocol-common.txt. When the grammar indicate `PKT-LINE(...)`, unless
 otherwise noted the usual pkt-line LF rules apply: the sender SHOULD
 include a LF, but the receiver MUST NOT complain if it is not present.
 
+An error packet is a special pkt-line that contains an error string.
+
+----
+  error-line     =  PKT-LINE("ERR" SP explanation-text)
+----
+
+Throughout the protocol, where `PKT-LINE(...)` is expected, an error packet MAY
+be sent. Once this packet is sent by a client or a server, the data transfer
+process defined in this protocol is terminated.
+
 Transports
 ----------
 There are three transports over which the packfile protocol is
@@ -89,13 +99,6 @@ process on the server side over the Git protocol is this:
      "0039git-upload-pack /schacon/gitbook.git\0host=example.com\0" |
      nc -v example.com 9418
 
-If the server refuses the request for some reasons, it could abort
-gracefully with an error message.
-
-----
-  error-line     =  PKT-LINE("ERR" SP explanation-text)
-----
-
 
 SSH Transport
 -------------
@@ -398,12 +401,11 @@ from the client).
 Then the server will start sending its packfile data.
 
 ----
-  server-response = *ack_multi ack / nak / error-line
+  server-response = *ack_multi ack / nak
   ack_multi       = PKT-LINE("ACK" SP obj-id ack_status)
   ack_status      = "continue" / "common" / "ready"
   ack             = PKT-LINE("ACK" SP obj-id)
   nak             = PKT-LINE("NAK")
-  error-line     =  PKT-LINE("ERR" SP explanation-text)
 ----
 
 A simple clone may look like this (with no 'have' lines):
diff --git a/builtin/archive.c b/builtin/archive.c
index d2455237ce..5d179bbd16 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -59,8 +59,6 @@ static int run_remote_archiver(int argc, const char **argv,
 	if (strcmp(buf, "ACK")) {
 		if (starts_with(buf, "NACK "))
 			die(_("git archive: NACK %s"), buf + 5);
-		if (starts_with(buf, "ERR "))
-			die(_("remote error: %s"), buf + 4);
 		die(_("git archive: protocol error"));
 	}
 
diff --git a/connect.c b/connect.c
index 24281b6082..4813f005ab 100644
--- a/connect.c
+++ b/connect.c
@@ -296,7 +296,6 @@ struct ref **get_remote_heads(struct packet_reader *reader,
 	struct ref **orig_list = list;
 	int len = 0;
 	enum get_remote_heads_state state = EXPECTING_FIRST_REF;
-	const char *arg;
 
 	*list = NULL;
 
@@ -306,8 +305,6 @@ struct ref **get_remote_heads(struct packet_reader *reader,
 			die_initial_contact(1);
 		case PACKET_READ_NORMAL:
 			len = reader->pktlen;
-			if (len > 4 && skip_prefix(reader->line, "ERR ", &arg))
-				die(_("remote error: %s"), arg);
 			break;
 		case PACKET_READ_FLUSH:
 			state = EXPECTING_DONE;
diff --git a/fetch-pack.c b/fetch-pack.c
index 9691046e64..e66cd7b71b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -178,8 +178,6 @@ static enum ack_type get_ack(int fd, struct object_id *result_oid)
 			return ACK;
 		}
 	}
-	if (skip_prefix(line, "ERR ", &arg))
-		die(_("remote error: %s"), arg);
 	die(_("git fetch-pack: expected ACK/NAK, got '%s'"), line);
 }
 
diff --git a/pkt-line.c b/pkt-line.c
index 04d10bbd03..ce9e42d10e 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -346,6 +346,10 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
 		return PACKET_READ_EOF;
 	}
 
+	if (starts_with(buffer, "ERR ")) {
+		die(_("remote error: %s"), buffer + 4);
+	}
+
 	if ((options & PACKET_READ_CHOMP_NEWLINE) &&
 	    len && buffer[len-1] == '\n')
 		len--;
diff --git a/t/t5703-upload-pack-ref-in-want.sh b/t/t5703-upload-pack-ref-in-want.sh
index 3f58f05cbb..d2a9d0c127 100755
--- a/t/t5703-upload-pack-ref-in-want.sh
+++ b/t/t5703-upload-pack-ref-in-want.sh
@@ -208,7 +208,7 @@ test_expect_success 'server is initially ahead - no ref in want' '
 	cp -r "$LOCAL_PRISTINE" local &&
 	inconsistency master 1234567890123456789012345678901234567890 &&
 	test_must_fail git -C local fetch 2>err &&
-	grep "ERR upload-pack: not our ref" err
+	grep "fatal: remote error: upload-pack: not our ref" err
 '
 
 test_expect_success 'server is initially ahead - ref in want' '
@@ -254,7 +254,7 @@ test_expect_success 'server loses a ref - ref in want' '
 	echo "s/master/raster/" >"$HTTPD_ROOT_PATH/one-time-sed" &&
 	test_must_fail git -C local fetch 2>err &&
 
-	grep "ERR unknown ref refs/heads/raster" err
+	grep "fatal: remote error: unknown ref refs/heads/raster" err
 '
 
 stop_httpd
-- 
2.20.0.rc2.403.gdbc3b29805-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 2/4] remote-curl: refactor smart-http discovery
  2018-12-12  0:25             ` [PATCH v3 0/4] Unify pkt-line error handling and refactor smart-http Josh Steadmon
  2018-12-12  0:25               ` [PATCH v3 1/4] pack-protocol.txt: accept error packets in any context Josh Steadmon
@ 2018-12-12  0:25               ` Josh Steadmon
  2018-12-12  0:25               ` [PATCH v3 3/4] remote-curl: tighten "version 2" check for smart-http Josh Steadmon
                                 ` (2 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Josh Steadmon @ 2018-12-12  0:25 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, masayasuzuki

From: Jeff King <peff@peff.net>

After making initial contact with an http server, we have to decide if
the server supports smart-http, and if so, which version. Our rules are
a bit inconsistent:

  1. For v0, we require that the content-type indicates a smart-http
     response. We also require the response to look vaguely like a
     pkt-line starting with "#". If one of those does not match, we fall
     back to dumb-http.

     But according to our http protocol spec[1]:

       Dumb servers MUST NOT return a return type starting with
       `application/x-git-`.

     If we see the expected content-type, we should consider it
     smart-http. At that point we can parse the pkt-line for real, and
     complain if it is not syntactically valid.

  2. For v2, we do not actually check the content-type. Our v2 protocol
     spec says[2]:

       When using the http:// or https:// transport a client makes a
       "smart" info/refs request as described in `http-protocol.txt`[...]

     and the http spec is clear that for a smart-http[3]:

       The Content-Type MUST be `application/x-$servicename-advertisement`.

     So it is required according to the spec.

These inconsistencies were easy to miss because of the way the original
code was written as an inline conditional. Let's pull it out into its
own function for readability, and improve a few things:

 - we now predicate the smart/dumb decision entirely on the presence of
   the correct content-type

 - we do a real pkt-line parse before deciding how to proceed (and die
   if it isn't valid)

 - use skip_prefix() for comparing service strings, instead of
   constructing expected output in a strbuf; this avoids dealing with
   memory cleanup

Note that this _is_ tightening what the client will allow. It's all
according to the spec, but it's possible that other implementations
might violate these. However, violating these particular rules seems
like an odd choice for a server to make.

[1] Documentation/technical/http-protocol.txt, l. 166-167
[2] Documentation/technical/protocol-v2.txt, l. 63-64
[3] Documentation/technical/http-protocol.txt, l. 247

Helped-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 remote-curl.c | 93 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 59 insertions(+), 34 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 1220dffcdc..38f51dffb8 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -330,9 +330,65 @@ static int get_protocol_http_header(enum protocol_version version,
 	return 0;
 }
 
+static void check_smart_http(struct discovery *d, const char *service,
+			     struct strbuf *type)
+{
+	char *src_buf;
+	size_t src_len;
+	char *line;
+	const char *p;
+
+	/*
+	 * If we don't see x-$service-advertisement, then it's not smart-http.
+	 * But once we do, we commit to it and assume any other protocol
+	 * violations are hard errors.
+	 */
+	if (!skip_prefix(type->buf, "application/x-", &p) ||
+	    !skip_prefix(p, service, &p) ||
+	    strcmp(p, "-advertisement"))
+		return;
+
+	/*
+	 * "Peek" at the first packet by using a separate buf/len pair; some
+	 * cases below require us leaving the originals intact.
+	 */
+	src_buf = d->buf;
+	src_len = d->len;
+	line = packet_read_line_buf(&src_buf, &src_len, NULL);
+	if (!line)
+		die("invalid server response; expected service, got flush packet");
+
+	if (skip_prefix(line, "# service=", &p) && !strcmp(p, service)) {
+		/*
+		 * The header can include additional metadata lines, up
+		 * until a packet flush marker.  Ignore these now, but
+		 * in the future we might start to scan them.
+		 */
+		while (packet_read_line_buf(&src_buf, &src_len, NULL))
+			;
+
+		/*
+		 * v0 smart http; callers expect us to soak up the
+		 * service and header packets
+		 */
+		d->buf = src_buf;
+		d->len = src_len;
+		d->proto_git = 1;
+
+	} else if (starts_with(line, "version 2")) {
+		/*
+		 * v2 smart http; do not consume version packet, which will
+		 * be handled elsewhere.
+		 */
+		d->proto_git = 1;
+
+	} else {
+		die("invalid server response; got '%s'", line);
+	}
+}
+
 static struct discovery *discover_refs(const char *service, int for_push)
 {
-	struct strbuf exp = STRBUF_INIT;
 	struct strbuf type = STRBUF_INIT;
 	struct strbuf charset = STRBUF_INIT;
 	struct strbuf buffer = STRBUF_INIT;
@@ -405,38 +461,8 @@ static struct discovery *discover_refs(const char *service, int for_push)
 	last->buf_alloc = strbuf_detach(&buffer, &last->len);
 	last->buf = last->buf_alloc;
 
-	strbuf_addf(&exp, "application/x-%s-advertisement", service);
-	if (maybe_smart &&
-	    (5 <= last->len && last->buf[4] == '#') &&
-	    !strbuf_cmp(&exp, &type)) {
-		char *line;
-
-		/*
-		 * smart HTTP response; validate that the service
-		 * pkt-line matches our request.
-		 */
-		line = packet_read_line_buf(&last->buf, &last->len, NULL);
-		if (!line)
-			die("invalid server response; expected service, got flush packet");
-
-		strbuf_reset(&exp);
-		strbuf_addf(&exp, "# service=%s", service);
-		if (strcmp(line, exp.buf))
-			die("invalid server response; got '%s'", line);
-		strbuf_release(&exp);
-
-		/* The header can include additional metadata lines, up
-		 * until a packet flush marker.  Ignore these now, but
-		 * in the future we might start to scan them.
-		 */
-		while (packet_read_line_buf(&last->buf, &last->len, NULL))
-			;
-
-		last->proto_git = 1;
-	} else if (maybe_smart &&
-		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
-		last->proto_git = 1;
-	}
+	if (maybe_smart)
+		check_smart_http(last, service, &type);
 
 	if (last->proto_git)
 		last->refs = parse_git_refs(last, for_push);
@@ -444,7 +470,6 @@ static struct discovery *discover_refs(const char *service, int for_push)
 		last->refs = parse_info_refs(last);
 
 	strbuf_release(&refs_url);
-	strbuf_release(&exp);
 	strbuf_release(&type);
 	strbuf_release(&charset);
 	strbuf_release(&effective_url);
-- 
2.20.0.rc2.403.gdbc3b29805-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 3/4] remote-curl: tighten "version 2" check for smart-http
  2018-12-12  0:25             ` [PATCH v3 0/4] Unify pkt-line error handling and refactor smart-http Josh Steadmon
  2018-12-12  0:25               ` [PATCH v3 1/4] pack-protocol.txt: accept error packets in any context Josh Steadmon
  2018-12-12  0:25               ` [PATCH v3 2/4] remote-curl: refactor smart-http discovery Josh Steadmon
@ 2018-12-12  0:25               ` Josh Steadmon
  2018-12-12  0:25               ` [PATCH v3 4/4] lib-httpd, t5551: check server-side HTTP errors Josh Steadmon
  2018-12-12  8:43               ` [PATCH v3 0/4] Unify pkt-line error handling and refactor smart-http Junio C Hamano
  4 siblings, 0 replies; 41+ messages in thread
From: Josh Steadmon @ 2018-12-12  0:25 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, masayasuzuki

From: Jeff King <peff@peff.net>

In a v2 smart-http conversation, the server should reply to our initial
request with a pkt-line saying "version 2" (this is the start of the
"capabilities advertisement"). We check for the string using
starts_with(), but that's overly permissive (it would match "version
20", for example).

Let's tighten this check to use strcmp(). Note that we don't need to
worry about a trailing newline here, because the ptk-line code will have
chomped it for us already.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 remote-curl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 38f51dffb8..b77b173f33 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -375,7 +375,7 @@ static void check_smart_http(struct discovery *d, const char *service,
 		d->len = src_len;
 		d->proto_git = 1;
 
-	} else if (starts_with(line, "version 2")) {
+	} else if (!strcmp(line, "version 2")) {
 		/*
 		 * v2 smart http; do not consume version packet, which will
 		 * be handled elsewhere.
-- 
2.20.0.rc2.403.gdbc3b29805-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 4/4] lib-httpd, t5551: check server-side HTTP errors
  2018-12-12  0:25             ` [PATCH v3 0/4] Unify pkt-line error handling and refactor smart-http Josh Steadmon
                                 ` (2 preceding siblings ...)
  2018-12-12  0:25               ` [PATCH v3 3/4] remote-curl: tighten "version 2" check for smart-http Josh Steadmon
@ 2018-12-12  0:25               ` Josh Steadmon
  2018-12-12  8:43               ` [PATCH v3 0/4] Unify pkt-line error handling and refactor smart-http Junio C Hamano
  4 siblings, 0 replies; 41+ messages in thread
From: Josh Steadmon @ 2018-12-12  0:25 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, masayasuzuki

Add an HTTP path to the test server config that returns an ERR pkt-line
unconditionally. Verify in t5551 that remote-curl properly detects this
ERR message and aborts.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/lib-httpd.sh                  | 1 +
 t/lib-httpd/apache.conf         | 4 ++++
 t/lib-httpd/error-smart-http.sh | 3 +++
 t/t5551-http-fetch-smart.sh     | 5 +++++
 4 files changed, 13 insertions(+)
 create mode 100644 t/lib-httpd/error-smart-http.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index a8729f8232..4e946623bb 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -131,6 +131,7 @@ prepare_httpd() {
 	mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH"
 	cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
 	install_script broken-smart-http.sh
+	install_script error-smart-http.sh
 	install_script error.sh
 	install_script apply-one-time-sed.sh
 
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 581c010d8f..6de2bc775c 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -117,6 +117,7 @@ Alias /auth/dumb/ www/auth/dumb/
 </LocationMatch>
 ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1
 ScriptAlias /broken_smart/ broken-smart-http.sh/
+ScriptAlias /error_smart/ error-smart-http.sh/
 ScriptAlias /error/ error.sh/
 ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
 <Directory ${GIT_EXEC_PATH}>
@@ -125,6 +126,9 @@ ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
 <Files broken-smart-http.sh>
 	Options ExecCGI
 </Files>
+<Files error-smart-http.sh>
+	Options ExecCGI
+</Files>
 <Files error.sh>
   Options ExecCGI
 </Files>
diff --git a/t/lib-httpd/error-smart-http.sh b/t/lib-httpd/error-smart-http.sh
new file mode 100644
index 0000000000..e65d447fc4
--- /dev/null
+++ b/t/lib-httpd/error-smart-http.sh
@@ -0,0 +1,3 @@
+echo "Content-Type: application/x-git-upload-pack-advertisement"
+echo
+printf "%s" "0019ERR server-side error"
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 8630b0cc39..ba83e567e5 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -429,5 +429,10 @@ test_expect_success 'GIT_TRACE_CURL_NO_DATA prevents data from being traced' '
 	! grep "=> Send data" err
 '
 
+test_expect_success 'server-side error detected' '
+	test_must_fail git clone $HTTPD_URL/error_smart/repo.git 2>actual &&
+	grep "server-side error" actual
+'
+
 stop_httpd
 test_done
-- 
2.20.0.rc2.403.gdbc3b29805-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 0/4] Unify pkt-line error handling and refactor smart-http
  2018-12-12  0:25             ` [PATCH v3 0/4] Unify pkt-line error handling and refactor smart-http Josh Steadmon
                                 ` (3 preceding siblings ...)
  2018-12-12  0:25               ` [PATCH v3 4/4] lib-httpd, t5551: check server-side HTTP errors Josh Steadmon
@ 2018-12-12  8:43               ` Junio C Hamano
  4 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2018-12-12  8:43 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, peff, masayasuzuki

Josh Steadmon <steadmon@google.com> writes:

> This is a reroll of js/smart-http-detect-remote-error that also includes
> a fixed version of ms/proto-err-packet-anywhere [1].

Yay.  Thanks for reducing a topic I have to worry about by 1 ;-).

> The first patch clarifies the use of ERR messages in the pkt-line
> protocol and unifies error handling in pkt-line.c
>
> The second patch refactors smart-http discovery in remote-curl.c. Among
> other improvements, it makes more use of the pkt-line functions, which
> allows us to catch remote errors that were previously ignored.
>
> The third patch makes the version check in remote-curl more strict.
>
> The final patch adds a test to verify that the fix in patch #2 does
> actually catch remote HTTP errors.

Thanks.  All look sensible.  Will queue.

>
> [1]: https://public-inbox.org/git/20181127045301.103807-1-masayasuzuki@google.com/
>
> Jeff King (2):
>   remote-curl: refactor smart-http discovery
>   remote-curl: tighten "version 2" check for smart-http
>
> Josh Steadmon (1):
>   lib-httpd, t5551: check server-side HTTP errors
>
> Masaya Suzuki (1):
>   pack-protocol.txt: accept error packets in any context
>
>  Documentation/technical/pack-protocol.txt | 20 ++---
>  builtin/archive.c                         |  2 -
>  connect.c                                 |  3 -
>  fetch-pack.c                              |  2 -
>  pkt-line.c                                |  4 +
>  remote-curl.c                             | 93 ++++++++++++++---------
>  t/lib-httpd.sh                            |  1 +
>  t/lib-httpd/apache.conf                   |  4 +
>  t/lib-httpd/error-smart-http.sh           |  3 +
>  t/t5551-http-fetch-smart.sh               |  5 ++
>  t/t5703-upload-pack-ref-in-want.sh        |  4 +-
>  11 files changed, 89 insertions(+), 52 deletions(-)
>  create mode 100644 t/lib-httpd/error-smart-http.sh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/4] pack-protocol.txt: accept error packets in any context
  2018-12-12  0:25               ` [PATCH v3 1/4] pack-protocol.txt: accept error packets in any context Josh Steadmon
@ 2018-12-12 11:02                 ` Jeff King
  2018-12-13  1:17                   ` Masaya Suzuki
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2018-12-12 11:02 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, gitster, masayasuzuki

On Tue, Dec 11, 2018 at 04:25:15PM -0800, Josh Steadmon wrote:

> From: Masaya Suzuki <masayasuzuki@google.com>
> 
> In the Git pack protocol definition, an error packet may appear only in
> a certain context. However, servers can face a runtime error (e.g. I/O
> error) at an arbitrary timing. This patch changes the protocol to allow
> an error packet to be sent instead of any packet.
> 
> Following this protocol spec change, the error packet handling code is
> moved to pkt-line.c.

This is a change in the spec with an accompanying change in the code,
which raises the question: what do other implementations do with this
change (both older Git, and implementations like JGit, libgit2, etc)?

I think the answer for older Git is "hang up unceremoniously", which is
probably OK given the semantics of the change. And I'd suspect most
other implementations would do the same. I just wonder if anybody tested
it with other implementations.

> +An error packet is a special pkt-line that contains an error string.
> +
> +----
> +  error-line     =  PKT-LINE("ERR" SP explanation-text)
> +----
> +
> +Throughout the protocol, where `PKT-LINE(...)` is expected, an error packet MAY
> +be sent. Once this packet is sent by a client or a server, the data transfer
> +process defined in this protocol is terminated.

The packfile data is typically packetized, too, and contains arbitrary
data (that could have "ERR" in it). It looks like we don't specifically
say PKT-LINE() in that part of the protocol spec, though, so I think
this is OK.

Likewise, in the implementation:

> diff --git a/pkt-line.c b/pkt-line.c
> index 04d10bbd03..ce9e42d10e 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -346,6 +346,10 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
>  		return PACKET_READ_EOF;
>  	}
>  
> +	if (starts_with(buffer, "ERR ")) {
> +		die(_("remote error: %s"), buffer + 4);
> +	}
> +
>  	if ((options & PACKET_READ_CHOMP_NEWLINE) &&
>  	    len && buffer[len-1] == '\n')
>  		len--;

This ERR handling has been moved to a very low level. What happens if
we're passing arbitrary data via the packet_read() code? Could we
erroneously trigger an error if a packfile happens to have the bytes
"ERR " at a packet boundary?

For packfiles via upload-pack, I _think_ we're OK, because we only
packetize it when a sideband is in use. In which case this would never
match, because we'd have "\1" in the first byte slot.

But are there are other cases we need to worry about? Just
brainstorming, I can think of:

  1. We also pass packetized packfiles between git-remote-https and
     the stateless-rpc mode of fetch-pack/send-pack. And I don't think
     we use sidebands there.

  2. The packet code is used for long-lived clean/smudge filters these
     days, which also pass arbitrary data.

So I think it's probably not a good idea to unconditionally have callers
of packet_read_with_status() handle this. We'd need a flag like
PACKET_READ_RESPECT_ERR, and to trigger it from the appropriate callers.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/4] pack-protocol.txt: accept error packets in any context
  2018-12-12 11:02                 ` Jeff King
@ 2018-12-13  1:17                   ` Masaya Suzuki
  2018-12-13  8:04                     ` Jeff King
  2018-12-13 22:18                     ` Josh Steadmon
  0 siblings, 2 replies; 41+ messages in thread
From: Masaya Suzuki @ 2018-12-13  1:17 UTC (permalink / raw)
  To: peff; +Cc: Josh Steadmon, git, Junio C Hamano

On Wed, Dec 12, 2018 at 3:02 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Dec 11, 2018 at 04:25:15PM -0800, Josh Steadmon wrote:
>
> > From: Masaya Suzuki <masayasuzuki@google.com>
> >
> > In the Git pack protocol definition, an error packet may appear only in
> > a certain context. However, servers can face a runtime error (e.g. I/O
> > error) at an arbitrary timing. This patch changes the protocol to allow
> > an error packet to be sent instead of any packet.
> >
> > Following this protocol spec change, the error packet handling code is
> > moved to pkt-line.c.
>
> This is a change in the spec with an accompanying change in the code,
> which raises the question: what do other implementations do with this
> change (both older Git, and implementations like JGit, libgit2, etc)?

JGit is similar to Git. It parses "ERR " in limited places. When it sees an ERR
packet in an unexpected place, it'll fail somewhere in the parsing code.

https://github.com/eclipse/jgit/blob/30c6c7542190c149e2aee792f992a312a5fc5793/org.eclipse.jgit/src/org/eclipse/jgit/transport/PacketLineIn.java#L145-L147
https://github.com/eclipse/jgit/blob/f40b39345cd9b54473ee871bff401fe3d394ffe3/org.eclipse.jgit/src/org/eclipse/jgit/transport/BasePackConnection.java#L208

I'm not familiar with libgit2 code, but it seems it handles this at a
lower level. An error type packet is parsed out at a low level, and
the error handling is done by the callers of the packet parser.

https://github.com/libgit2/libgit2/blob/bea65980c7a42e34edfafbdc40b199ba7b2a564e/src/transports/smart_pkt.c#L482-L483

I cannot find an ERR packet handling in go-git. It seems to me that if
an ERR packet appears it treats it as a parsing error.

https://github.com/src-d/go-git/blob/master/plumbing/protocol/packp/common.go#L60-L62

>
> I think the answer for older Git is "hang up unceremoniously", which is
> probably OK given the semantics of the change. And I'd suspect most
> other implementations would do the same. I just wonder if anybody tested
> it with other implementations.

I'm thinking aloud here. There would be two aspects of the protocol
compatibility: (1) new clients speak to old servers (2) old clients
speak to a new server that speaks the updated protocol.

For (1), I assume that in the Git pack protocol, a packet starting
from "ERR " does not appear naturally except for a very special case
that the server doesn't support sideband, but using the updated
protocol. As you mentioned, at first it looks like this can mistakenly
parse the pack file of git-receive-pack as an ERR packet, assuming
that git-receive-pack's pack file is packetized. Actually
git-receive-pack's pack file is not packetized in the Git pack
protocol (https://github.com/git/git/blob/master/builtin/receive-pack.c#L1695).
I recently wrote a Git protocol parser
(https://github.com/google/gitprotocolio), and I confirmed that this
is the case at least for the HTTP transport. git-upload-pack's pack
file is indeed packetized, but packetized with sideband. Except for
the case where sideband is not used, the packfiles wouldn't be
considered as an ERR packet accidentally.

For (2), if the old clients see an unexpected ERR packet, they cannot
parse it. They would handle this unparsable data as if the server is
not speaking Git protocol correctly. Even if the old clients just
ignore the packet, due to the nature of the ERR packet, the server
won't send further data. The client won't be able to proceed. Overall,
the clients anyway face an error, and the only difference would be
whether the clients can show an error nicely or not. The new clients
will show the errors nicely to users. Old clients will not.

>
> > +An error packet is a special pkt-line that contains an error string.
> > +
> > +----
> > +  error-line     =  PKT-LINE("ERR" SP explanation-text)
> > +----
> > +
> > +Throughout the protocol, where `PKT-LINE(...)` is expected, an error packet MAY
> > +be sent. Once this packet is sent by a client or a server, the data transfer
> > +process defined in this protocol is terminated.
>
> The packfile data is typically packetized, too, and contains arbitrary
> data (that could have "ERR" in it). It looks like we don't specifically
> say PKT-LINE() in that part of the protocol spec, though, so I think
> this is OK.

As I described above, as far as I can see, the packfile in
git-upload-pack is not packetized. The packfile in git-receive-pack is
packetized but typically with sideband. At least at the Git pack
protocol level, this should be OK.

>
> Likewise, in the implementation:
>
> > diff --git a/pkt-line.c b/pkt-line.c
> > index 04d10bbd03..ce9e42d10e 100644
> > --- a/pkt-line.c
> > +++ b/pkt-line.c
> > @@ -346,6 +346,10 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
> >               return PACKET_READ_EOF;
> >       }
> >
> > +     if (starts_with(buffer, "ERR ")) {
> > +             die(_("remote error: %s"), buffer + 4);
> > +     }
> > +
> >       if ((options & PACKET_READ_CHOMP_NEWLINE) &&
> >           len && buffer[len-1] == '\n')
> >               len--;
>
> This ERR handling has been moved to a very low level. What happens if
> we're passing arbitrary data via the packet_read() code? Could we
> erroneously trigger an error if a packfile happens to have the bytes
> "ERR " at a packet boundary?
>
> For packfiles via upload-pack, I _think_ we're OK, because we only
> packetize it when a sideband is in use. In which case this would never
> match, because we'd have "\1" in the first byte slot.
>
> But are there are other cases we need to worry about? Just
> brainstorming, I can think of:
>
>   1. We also pass packetized packfiles between git-remote-https and
>      the stateless-rpc mode of fetch-pack/send-pack. And I don't think
>      we use sidebands there.
>
>   2. The packet code is used for long-lived clean/smudge filters these
>      days, which also pass arbitrary data.
>
> So I think it's probably not a good idea to unconditionally have callers
> of packet_read_with_status() handle this. We'd need a flag like
> PACKET_READ_RESPECT_ERR, and to trigger it from the appropriate callers.

This is outside of the Git pack protocol so having a separate parsing
mode makes sense to me.

>
> -Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/4] pack-protocol.txt: accept error packets in any context
  2018-12-13  1:17                   ` Masaya Suzuki
@ 2018-12-13  8:04                     ` Jeff King
  2018-12-13 22:18                     ` Josh Steadmon
  1 sibling, 0 replies; 41+ messages in thread
From: Jeff King @ 2018-12-13  8:04 UTC (permalink / raw)
  To: Masaya Suzuki; +Cc: Josh Steadmon, git, Junio C Hamano

On Wed, Dec 12, 2018 at 05:17:01PM -0800, Masaya Suzuki wrote:

> > This is a change in the spec with an accompanying change in the code,
> > which raises the question: what do other implementations do with this
> > change (both older Git, and implementations like JGit, libgit2, etc)?
> 
> JGit is similar to Git. It parses "ERR " in limited places. When it sees an ERR
> packet in an unexpected place, it'll fail somewhere in the parsing code.
> 
> https://github.com/eclipse/jgit/blob/30c6c7542190c149e2aee792f992a312a5fc5793/org.eclipse.jgit/src/org/eclipse/jgit/transport/PacketLineIn.java#L145-L147
> https://github.com/eclipse/jgit/blob/f40b39345cd9b54473ee871bff401fe3d394ffe3/org.eclipse.jgit/src/org/eclipse/jgit/transport/BasePackConnection.java#L208
> 
> I'm not familiar with libgit2 code, but it seems it handles this at a
> lower level. An error type packet is parsed out at a low level, and
> the error handling is done by the callers of the packet parser.
> 
> https://github.com/libgit2/libgit2/blob/bea65980c7a42e34edfafbdc40b199ba7b2a564e/src/transports/smart_pkt.c#L482-L483
> 
> I cannot find an ERR packet handling in go-git. It seems to me that if
> an ERR packet appears it treats it as a parsing error.
> 
> https://github.com/src-d/go-git/blob/master/plumbing/protocol/packp/common.go#L60-L62

Thanks for digging into these. It does make sense that other
implementations would give a parsing error. Hopefully they also produce
a sensible error message (ideally printing the bogus pktline), but even
if they don't we're probably no worse off than the status quo. With the
current scheme, the server can't give any message, and just has to hang
up anyway.

> > I think the answer for older Git is "hang up unceremoniously", which is
> > probably OK given the semantics of the change. And I'd suspect most
> > other implementations would do the same. I just wonder if anybody tested
> > it with other implementations.
> 
> I'm thinking aloud here. There would be two aspects of the protocol
> compatibility: (1) new clients speak to old servers (2) old clients
> speak to a new server that speaks the updated protocol.
> 
> For (1), I assume that in the Git pack protocol, a packet starting
> from "ERR " does not appear naturally except for a very special case
> that the server doesn't support sideband, but using the updated
> protocol. As you mentioned, at first it looks like this can mistakenly
> parse the pack file of git-receive-pack as an ERR packet, assuming
> that git-receive-pack's pack file is packetized. Actually
> git-receive-pack's pack file is not packetized in the Git pack
> protocol (https://github.com/git/git/blob/master/builtin/receive-pack.c#L1695).
> I recently wrote a Git protocol parser
> (https://github.com/google/gitprotocolio), and I confirmed that this
> is the case at least for the HTTP transport. git-upload-pack's pack
> file is indeed packetized, but packetized with sideband. Except for
> the case where sideband is not used, the packfiles wouldn't be
> considered as an ERR packet accidentally.

Right, that matches my understanding.

> For (2), if the old clients see an unexpected ERR packet, they cannot
> parse it. They would handle this unparsable data as if the server is
> not speaking Git protocol correctly. Even if the old clients just
> ignore the packet, due to the nature of the ERR packet, the server
> won't send further data. The client won't be able to proceed. Overall,
> the clients anyway face an error, and the only difference would be
> whether the clients can show an error nicely or not. The new clients
> will show the errors nicely to users. Old clients will not.

Yeah, this was the case I was more concerned about, and I think it is
probably OK (by this rationale, and what I wrote above).

> > So I think it's probably not a good idea to unconditionally have callers
> > of packet_read_with_status() handle this. We'd need a flag like
> > PACKET_READ_RESPECT_ERR, and to trigger it from the appropriate callers.
> 
> This is outside of the Git pack protocol so having a separate parsing
> mode makes sense to me.

Yeah. Here's a sample script which works with current Git (the index
contains the uppercased content "ERR FOO"), but fails after this patch
(Git thinks the filter reported an error and dies; it's not great that
we die in the packet-reading code at all for this case, but your patch
is hardly the first call to die() in that function).

-- >8 --
git init -q repo
cd repo

echo '*.magic filter=magic' >.git/info/attributes
git config filter.magic.process $PWD/filter

# toy filter to uppercase content
cat >filter <<-\EOF
#!/usr/bin/perl
sub read_pkt {
  my @r;
  while (1) {
    read(STDIN, my $len, 4);
    last if $len eq "0000";
    read(STDIN, my $buf, hex($len)-4);
    push @r, $buf;
  }
  return @r;
}
sub write_pkt {
  local $| = 1;
  while (@_) {
    my $buf = shift;
    printf "%04x", length($buf) + 4;
    print $buf;
  }
  print "0000";
}

read_pkt(); # handshake
write_pkt(qw(git-filter-server version=2));
read_pkt(); # capabilities
write_pkt(qw(capability=clean));

read_pkt(); # clean command
@content = read_pkt();

write_pkt(qw(status=success));
write_pkt(map { uc } @content);
write_pkt(); # final status
EOF
chmod +x filter

echo 'err foo' >foo.magic
git add foo.magic
git cat-file blob :foo.magic

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/4] pack-protocol.txt: accept error packets in any context
  2018-12-13  1:17                   ` Masaya Suzuki
  2018-12-13  8:04                     ` Jeff King
@ 2018-12-13 22:18                     ` Josh Steadmon
  2018-12-17 21:33                       ` Jeff King
  1 sibling, 1 reply; 41+ messages in thread
From: Josh Steadmon @ 2018-12-13 22:18 UTC (permalink / raw)
  To: Masaya Suzuki; +Cc: peff, git, Junio C Hamano

On 2018.12.12 17:17, Masaya Suzuki wrote:
> On Wed, Dec 12, 2018 at 3:02 AM Jeff King <peff@peff.net> wrote:
> > This ERR handling has been moved to a very low level. What happens if
> > we're passing arbitrary data via the packet_read() code? Could we
> > erroneously trigger an error if a packfile happens to have the bytes
> > "ERR " at a packet boundary?
> >
> > For packfiles via upload-pack, I _think_ we're OK, because we only
> > packetize it when a sideband is in use. In which case this would never
> > match, because we'd have "\1" in the first byte slot.
> >
> > But are there are other cases we need to worry about? Just
> > brainstorming, I can think of:
> >
> >   1. We also pass packetized packfiles between git-remote-https and
> >      the stateless-rpc mode of fetch-pack/send-pack. And I don't think
> >      we use sidebands there.
> >
> >   2. The packet code is used for long-lived clean/smudge filters these
> >      days, which also pass arbitrary data.
> >
> > So I think it's probably not a good idea to unconditionally have callers
> > of packet_read_with_status() handle this. We'd need a flag like
> > PACKET_READ_RESPECT_ERR, and to trigger it from the appropriate callers.
> 
> This is outside of the Git pack protocol so having a separate parsing
> mode makes sense to me.

This sounds like it could be a significant refactoring. Should we go
back to V2 of this series, and then work on the new parsing mode
separately?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/4] pack-protocol.txt: accept error packets in any context
  2018-12-13 22:18                     ` Josh Steadmon
@ 2018-12-17 21:33                       ` Jeff King
  2018-12-19 23:30                         ` Josh Steadmon
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2018-12-17 21:33 UTC (permalink / raw)
  To: Masaya Suzuki, git, Junio C Hamano

On Thu, Dec 13, 2018 at 02:18:26PM -0800, Josh Steadmon wrote:

> On 2018.12.12 17:17, Masaya Suzuki wrote:
> > On Wed, Dec 12, 2018 at 3:02 AM Jeff King <peff@peff.net> wrote:
> > > This ERR handling has been moved to a very low level. What happens if
> > > we're passing arbitrary data via the packet_read() code? Could we
> > > erroneously trigger an error if a packfile happens to have the bytes
> > > "ERR " at a packet boundary?
> > >
> > > For packfiles via upload-pack, I _think_ we're OK, because we only
> > > packetize it when a sideband is in use. In which case this would never
> > > match, because we'd have "\1" in the first byte slot.
> > >
> > > But are there are other cases we need to worry about? Just
> > > brainstorming, I can think of:
> > >
> > >   1. We also pass packetized packfiles between git-remote-https and
> > >      the stateless-rpc mode of fetch-pack/send-pack. And I don't think
> > >      we use sidebands there.
> > >
> > >   2. The packet code is used for long-lived clean/smudge filters these
> > >      days, which also pass arbitrary data.
> > >
> > > So I think it's probably not a good idea to unconditionally have callers
> > > of packet_read_with_status() handle this. We'd need a flag like
> > > PACKET_READ_RESPECT_ERR, and to trigger it from the appropriate callers.
> > 
> > This is outside of the Git pack protocol so having a separate parsing
> > mode makes sense to me.
> 
> This sounds like it could be a significant refactoring. Should we go
> back to V2 of this series, and then work on the new parsing mode
> separately?

Which one is v2? :)

Just the remote-curl cleanups from me, and then your "die on server-side
errors" patch?

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/4] pack-protocol.txt: accept error packets in any context
  2018-12-17 21:33                       ` Jeff King
@ 2018-12-19 23:30                         ` Josh Steadmon
  2018-12-20 15:49                           ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Josh Steadmon @ 2018-12-19 23:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Masaya Suzuki, git, Junio C Hamano

On 2018.12.17 16:33, Jeff King wrote:
> On Thu, Dec 13, 2018 at 02:18:26PM -0800, Josh Steadmon wrote:
> 
> > On 2018.12.12 17:17, Masaya Suzuki wrote:
> > > On Wed, Dec 12, 2018 at 3:02 AM Jeff King <peff@peff.net> wrote:
> > > > This ERR handling has been moved to a very low level. What happens if
> > > > we're passing arbitrary data via the packet_read() code? Could we
> > > > erroneously trigger an error if a packfile happens to have the bytes
> > > > "ERR " at a packet boundary?
> > > >
> > > > For packfiles via upload-pack, I _think_ we're OK, because we only
> > > > packetize it when a sideband is in use. In which case this would never
> > > > match, because we'd have "\1" in the first byte slot.
> > > >
> > > > But are there are other cases we need to worry about? Just
> > > > brainstorming, I can think of:
> > > >
> > > >   1. We also pass packetized packfiles between git-remote-https and
> > > >      the stateless-rpc mode of fetch-pack/send-pack. And I don't think
> > > >      we use sidebands there.
> > > >
> > > >   2. The packet code is used for long-lived clean/smudge filters these
> > > >      days, which also pass arbitrary data.
> > > >
> > > > So I think it's probably not a good idea to unconditionally have callers
> > > > of packet_read_with_status() handle this. We'd need a flag like
> > > > PACKET_READ_RESPECT_ERR, and to trigger it from the appropriate callers.
> > > 
> > > This is outside of the Git pack protocol so having a separate parsing
> > > mode makes sense to me.
> > 
> > This sounds like it could be a significant refactoring. Should we go
> > back to V2 of this series, and then work on the new parsing mode
> > separately?
> 
> Which one is v2? :)
> 
> Just the remote-curl cleanups from me, and then your "die on server-side
> errors" patch?

Yes, that one :)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/4] pack-protocol.txt: accept error packets in any context
  2018-12-19 23:30                         ` Josh Steadmon
@ 2018-12-20 15:49                           ` Jeff King
  0 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2018-12-20 15:49 UTC (permalink / raw)
  To: Masaya Suzuki, git, Junio C Hamano

On Wed, Dec 19, 2018 at 03:30:05PM -0800, Josh Steadmon wrote:

> > > > This is outside of the Git pack protocol so having a separate parsing
> > > > mode makes sense to me.
> > > 
> > > This sounds like it could be a significant refactoring. Should we go
> > > back to V2 of this series, and then work on the new parsing mode
> > > separately?
> > 
> > Which one is v2? :)
> > 
> > Just the remote-curl cleanups from me, and then your "die on server-side
> > errors" patch?
> 
> Yes, that one :)

Then yes, that sounds reasonable to me.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] remote-curl: refactor smart-http discovery
  2018-11-16  8:47             ` [PATCH 1/3] remote-curl: refactor smart-http discovery Jeff King
  2018-11-16 20:27               ` Josh Steadmon
@ 2019-02-05 23:29               ` Junio C Hamano
  2019-02-06 19:16                 ` Jeff King
  1 sibling, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-02-05 23:29 UTC (permalink / raw)
  To: Jeff King, Josh Steadmon; +Cc: git

Jeff King <peff@peff.net> writes:

> After making initial contact with an http server, we have to decide if
> the server supports smart-http, and if so, which version. Our rules are
> a bit inconsistent:
> ...
>
>  - we now predicate the smart/dumb decision entirely on the presence of
>    the correct content-type
>
>  - we do a real pkt-line parse before deciding how to proceed (and die
>    if it isn't valid)
>
>  - use skip_prefix() for comparing service strings, instead of
>    constructing expected output in a strbuf; this avoids dealing with
>    memory cleanup
>
> Note that this _is_ tightening what the client will allow. It's all
> according to the spec, but it's possible that other implementations
> might violate these. However, violating these particular rules seems
> like an odd choice for a server to make.
>
> [1] Documentation/technical/http-protocol.txt, l. 166-167
> [2] Documentation/technical/protocol-v2.txt, l. 63-64
> [3] Documentation/technical/http-protocol.txt, l. 247
>
> Helped-by: Josh Steadmon <steadmon@google.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  remote-curl.c | 93 ++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 59 insertions(+), 34 deletions(-)

It turns out that this has interactions with 01f9ec64 ("Use
packet_reader instead of packet_read_line", 2018-12-29) on the
ms/packet-err-check branch.  Can we get this rebased on top of
a more recent 'master'?

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] remote-curl: refactor smart-http discovery
  2019-02-05 23:29               ` Junio C Hamano
@ 2019-02-06 19:16                 ` Jeff King
  2019-02-06 19:18                   ` Jeff King
                                     ` (3 more replies)
  0 siblings, 4 replies; 41+ messages in thread
From: Jeff King @ 2019-02-06 19:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Steadmon, git

On Tue, Feb 05, 2019 at 03:29:18PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > After making initial contact with an http server, we have to decide if
> > the server supports smart-http, and if so, which version. Our rules are
> > a bit inconsistent:
> > ...
> >
> >  - we now predicate the smart/dumb decision entirely on the presence of
> >    the correct content-type
> >
> >  - we do a real pkt-line parse before deciding how to proceed (and die
> >    if it isn't valid)
> >
> >  - use skip_prefix() for comparing service strings, instead of
> >    constructing expected output in a strbuf; this avoids dealing with
> >    memory cleanup
> >
> > Note that this _is_ tightening what the client will allow. It's all
> > according to the spec, but it's possible that other implementations
> > might violate these. However, violating these particular rules seems
> > like an odd choice for a server to make.
> >
> > [1] Documentation/technical/http-protocol.txt, l. 166-167
> > [2] Documentation/technical/protocol-v2.txt, l. 63-64
> > [3] Documentation/technical/http-protocol.txt, l. 247
> >
> > Helped-by: Josh Steadmon <steadmon@google.com>
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> >  remote-curl.c | 93 ++++++++++++++++++++++++++++++++-------------------
> >  1 file changed, 59 insertions(+), 34 deletions(-)
> 
> It turns out that this has interactions with 01f9ec64 ("Use
> packet_reader instead of packet_read_line", 2018-12-29) on the
> ms/packet-err-check branch.  Can we get this rebased on top of
> a more recent 'master'?

Yep. Here it is.

Rather than a range-diff, which is quite large due to the code movement,
I'll include below the interesting hunk of a diff between the two
endpoints (i.e., what we would have seen applying the packet-err-check
changes on top of my code movement, which is more or less what I did to
generate it).

Josh's original 3/3 isn't needed anymore, since ms/packet-err-check
covers that case already. However, he did write tests, which
ms/packet-err-check does not have. So I've converted his final patch
into just a test addition.

  [1/3]: remote-curl: refactor smart-http discovery
  [2/3]: remote-curl: tighten "version 2" check for smart-http
  [3/3]: t5551: test server-side ERR packet

 remote-curl.c                   | 100 ++++++++++++++++++--------------
 t/lib-httpd.sh                  |   1 +
 t/lib-httpd/apache.conf         |   4 ++
 t/lib-httpd/error-smart-http.sh |   3 +
 t/t5551-http-fetch-smart.sh     |   5 ++
 5 files changed, 70 insertions(+), 43 deletions(-)
 create mode 100644 t/lib-httpd/error-smart-http.sh

Here's that hunk:

diff --git a/remote-curl.c b/remote-curl.c
index b1309f2bdc..bb7421023b 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -333,10 +334,8 @@ static int get_protocol_http_header(enum protocol_version version,
 static void check_smart_http(struct discovery *d, const char *service,
 			     struct strbuf *type)
 {
-	char *src_buf;
-	size_t src_len;
-	char *line;
 	const char *p;
+	struct packet_reader reader;
 
 	/*
 	 * If we don't see x-$service-advertisement, then it's not smart-http.
@@ -348,45 +347,42 @@ static void check_smart_http(struct discovery *d, const char *service,
 	    strcmp(p, "-advertisement"))
 		return;
 
-	/*
-	 * "Peek" at the first packet by using a separate buf/len pair; some
-	 * cases below require us leaving the originals intact.
-	 */
-	src_buf = d->buf;
-	src_len = d->len;
-	line = packet_read_line_buf(&src_buf, &src_len, NULL);
-	if (!line)
+	packet_reader_init(&reader, -1, d->buf, d->len,
+			   PACKET_READ_CHOMP_NEWLINE |
+			   PACKET_READ_DIE_ON_ERR_PACKET);
+	if (packet_reader_read(&reader) != PACKET_READ_NORMAL)
 		die("invalid server response; expected service, got flush packet");
 
-	if (skip_prefix(line, "# service=", &p) && !strcmp(p, service)) {
+	if (skip_prefix(reader.line, "# service=", &p) && !strcmp(p, service)) {
 		/*
 		 * The header can include additional metadata lines, up
 		 * until a packet flush marker.  Ignore these now, but
 		 * in the future we might start to scan them.
 		 */
-		while (packet_read_line_buf(&src_buf, &src_len, NULL))
-			;
+		for (;;) {
+			packet_reader_read(&reader);
+			if (reader.pktlen <= 0) {
+				break;
+			}
+		}
 
 		/*
 		 * v0 smart http; callers expect us to soak up the
 		 * service and header packets
 		 */
-		d->buf = src_buf;
-		d->len = src_len;
+		d->buf = reader.src_buffer;
+		d->len = reader.src_len;
 		d->proto_git = 1;
 
-	} else if (!strcmp(line, "version 2")) {
+	} else if (!strcmp(reader.line, "version 2")) {
 		/*
 		 * v2 smart http; do not consume version packet, which will
 		 * be handled elsewhere.
 		 */
 		d->proto_git = 1;
 
-	} else if (skip_prefix(line, "ERR ", &p)) {
-		die(_("remote error: %s"), p);
-
 	} else {
-		die("invalid server response; got '%s'", line);
+		die("invalid server response; got '%s'", reader.line);
 	}
 }
 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/3] remote-curl: refactor smart-http discovery
  2019-02-06 19:16                 ` Jeff King
@ 2019-02-06 19:18                   ` Jeff King
  2019-02-06 19:29                     ` Josh Steadmon
  2019-02-06 19:18                   ` [PATCH 2/3] remote-curl: tighten "version 2" check for smart-http Jeff King
                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-02-06 19:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Steadmon, git

After making initial contact with an http server, we have to decide if
the server supports smart-http, and if so, which version. Our rules are
a bit inconsistent:

  1. For v0, we require that the content-type indicates a smart-http
     response. We also require the response to look vaguely like a
     pkt-line starting with "#". If one of those does not match, we fall
     back to dumb-http.

     But according to our http protocol spec[1]:

       Dumb servers MUST NOT return a return type starting with
       `application/x-git-`.

     If we see the expected content-type, we should consider it
     smart-http. At that point we can parse the pkt-line for real, and
     complain if it is not syntactically valid.

  2. For v2, we do not actually check the content-type. Our v2 protocol
     spec says[2]:

       When using the http:// or https:// transport a client makes a
       "smart" info/refs request as described in `http-protocol.txt`[...]

     and the http spec is clear that for a smart-http response[3]:

       The Content-Type MUST be `application/x-$servicename-advertisement`.

     So it is required according to the spec.

These inconsistencies were easy to miss because of the way the original
code was written as an inline conditional. Let's pull it out into its
own function for readability, and improve a few things:

 - we now predicate the smart/dumb decision entirely on the presence of
   the correct content-type

 - we do a real pkt-line parse before deciding how to proceed (and die
   if it isn't valid)

 - use skip_prefix() for comparing service strings, instead of
   constructing expected output in a strbuf; this avoids dealing with
   memory cleanup

Note that this _is_ tightening what the client will allow. It's all
according to the spec, but it's possible that other implementations
might violate these. However, violating these particular rules seems
like an odd choice for a server to make.

[1] Documentation/technical/http-protocol.txt, l. 166-167
[2] Documentation/technical/protocol-v2.txt, l. 63-64
[3] Documentation/technical/http-protocol.txt, l. 247

Helped-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 remote-curl.c | 100 ++++++++++++++++++++++++++++----------------------
 1 file changed, 57 insertions(+), 43 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 2e04d53ac8..c78ba83744 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -331,9 +331,63 @@ static int get_protocol_http_header(enum protocol_version version,
 	return 0;
 }
 
+static void check_smart_http(struct discovery *d, const char *service,
+			     struct strbuf *type)
+{
+	const char *p;
+	struct packet_reader reader;
+
+	/*
+	 * If we don't see x-$service-advertisement, then it's not smart-http.
+	 * But once we do, we commit to it and assume any other protocol
+	 * violations are hard errors.
+	 */
+	if (!skip_prefix(type->buf, "application/x-", &p) ||
+	    !skip_prefix(p, service, &p) ||
+	    strcmp(p, "-advertisement"))
+		return;
+
+	packet_reader_init(&reader, -1, d->buf, d->len,
+			   PACKET_READ_CHOMP_NEWLINE |
+			   PACKET_READ_DIE_ON_ERR_PACKET);
+	if (packet_reader_read(&reader) != PACKET_READ_NORMAL)
+		die("invalid server response; expected service, got flush packet");
+
+	if (skip_prefix(reader.line, "# service=", &p) && !strcmp(p, service)) {
+		/*
+		 * The header can include additional metadata lines, up
+		 * until a packet flush marker.  Ignore these now, but
+		 * in the future we might start to scan them.
+		 */
+		for (;;) {
+			packet_reader_read(&reader);
+			if (reader.pktlen <= 0) {
+				break;
+			}
+		}
+
+		/*
+		 * v0 smart http; callers expect us to soak up the
+		 * service and header packets
+		 */
+		d->buf = reader.src_buffer;
+		d->len = reader.src_len;
+		d->proto_git = 1;
+
+	} else if (starts_with(reader.line, "version 2")) {
+		/*
+		 * v2 smart http; do not consume version packet, which will
+		 * be handled elsewhere.
+		 */
+		d->proto_git = 1;
+
+	} else {
+		die("invalid server response; got '%s'", reader.line);
+	}
+}
+
 static struct discovery *discover_refs(const char *service, int for_push)
 {
-	struct strbuf exp = STRBUF_INIT;
 	struct strbuf type = STRBUF_INIT;
 	struct strbuf charset = STRBUF_INIT;
 	struct strbuf buffer = STRBUF_INIT;
@@ -405,47 +459,8 @@ static struct discovery *discover_refs(const char *service, int for_push)
 	last->buf_alloc = strbuf_detach(&buffer, &last->len);
 	last->buf = last->buf_alloc;
 
-	strbuf_addf(&exp, "application/x-%s-advertisement", service);
-	if (maybe_smart &&
-	    (5 <= last->len && last->buf[4] == '#') &&
-	    !strbuf_cmp(&exp, &type)) {
-		struct packet_reader reader;
-		packet_reader_init(&reader, -1, last->buf, last->len,
-				   PACKET_READ_CHOMP_NEWLINE |
-				   PACKET_READ_DIE_ON_ERR_PACKET);
-
-		/*
-		 * smart HTTP response; validate that the service
-		 * pkt-line matches our request.
-		 */
-		if (packet_reader_read(&reader) != PACKET_READ_NORMAL)
-			die("invalid server response; expected service, got flush packet");
-
-		strbuf_reset(&exp);
-		strbuf_addf(&exp, "# service=%s", service);
-		if (strcmp(reader.line, exp.buf))
-			die("invalid server response; got '%s'", reader.line);
-		strbuf_release(&exp);
-
-		/* The header can include additional metadata lines, up
-		 * until a packet flush marker.  Ignore these now, but
-		 * in the future we might start to scan them.
-		 */
-		for (;;) {
-			packet_reader_read(&reader);
-			if (reader.pktlen <= 0) {
-				break;
-			}
-		}
-
-		last->buf = reader.src_buffer;
-		last->len = reader.src_len;
-
-		last->proto_git = 1;
-	} else if (maybe_smart &&
-		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
-		last->proto_git = 1;
-	}
+	if (maybe_smart)
+		check_smart_http(last, service, &type);
 
 	if (last->proto_git)
 		last->refs = parse_git_refs(last, for_push);
@@ -453,7 +468,6 @@ static struct discovery *discover_refs(const char *service, int for_push)
 		last->refs = parse_info_refs(last);
 
 	strbuf_release(&refs_url);
-	strbuf_release(&exp);
 	strbuf_release(&type);
 	strbuf_release(&charset);
 	strbuf_release(&effective_url);
-- 
2.20.1.1122.g2972e48916


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 2/3] remote-curl: tighten "version 2" check for smart-http
  2019-02-06 19:16                 ` Jeff King
  2019-02-06 19:18                   ` Jeff King
@ 2019-02-06 19:18                   ` Jeff King
  2019-02-06 19:19                   ` [PATCH 3/3] t5551: test server-side ERR packet Jeff King
  2019-02-06 22:19                   ` [PATCH 1/3] remote-curl: refactor smart-http discovery Junio C Hamano
  3 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-02-06 19:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Steadmon, git

In a v2 smart-http conversation, the server should reply to our initial
request with a pkt-line saying "version 2". We check that with
starts_with(), but really that should be the only thing in that packet.
A response of "version 20" should not match.

Let's tighten this check to use strcmp(). Note that we don't need to
worry about a trailing newline here, because the ptk-line code will have
chomped it for us already.

Signed-off-by: Jeff King <peff@peff.net>
---
 remote-curl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index c78ba83744..bb7421023b 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -374,7 +374,7 @@ static void check_smart_http(struct discovery *d, const char *service,
 		d->len = reader.src_len;
 		d->proto_git = 1;
 
-	} else if (starts_with(reader.line, "version 2")) {
+	} else if (!strcmp(reader.line, "version 2")) {
 		/*
 		 * v2 smart http; do not consume version packet, which will
 		 * be handled elsewhere.
-- 
2.20.1.1122.g2972e48916


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 3/3] t5551: test server-side ERR packet
  2019-02-06 19:16                 ` Jeff King
  2019-02-06 19:18                   ` Jeff King
  2019-02-06 19:18                   ` [PATCH 2/3] remote-curl: tighten "version 2" check for smart-http Jeff King
@ 2019-02-06 19:19                   ` Jeff King
  2019-02-06 22:19                   ` [PATCH 1/3] remote-curl: refactor smart-http discovery Junio C Hamano
  3 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-02-06 19:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Steadmon, git

From: Josh Steadmon <steadmon@google.com>

When a smart HTTP server sends an error message via pkt-line, we detect
the error due to using PACKET_READ_DIE_ON_ERR_PACKET. This case was
added by 2d103c31c2 (pack-protocol.txt: accept error packets in any
context, 2018-12-29), but not covered by tests.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 t/lib-httpd.sh                  | 1 +
 t/lib-httpd/apache.conf         | 4 ++++
 t/lib-httpd/error-smart-http.sh | 3 +++
 t/t5551-http-fetch-smart.sh     | 5 +++++
 4 files changed, 13 insertions(+)
 create mode 100644 t/lib-httpd/error-smart-http.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index e465116ef9..216281eabc 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -131,6 +131,7 @@ prepare_httpd() {
 	mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH"
 	cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
 	install_script broken-smart-http.sh
+	install_script error-smart-http.sh
 	install_script error.sh
 	install_script apply-one-time-sed.sh
 
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 5d63ed90c5..06a81b54c7 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -119,6 +119,7 @@ Alias /auth/dumb/ www/auth/dumb/
 ScriptAliasMatch /error_git_upload_pack/(.*)/git-upload-pack error.sh/
 ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1
 ScriptAlias /broken_smart/ broken-smart-http.sh/
+ScriptAlias /error_smart/ error-smart-http.sh/
 ScriptAlias /error/ error.sh/
 ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
 <Directory ${GIT_EXEC_PATH}>
@@ -127,6 +128,9 @@ ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
 <Files broken-smart-http.sh>
 	Options ExecCGI
 </Files>
+<Files error-smart-http.sh>
+	Options ExecCGI
+</Files>
 <Files error.sh>
   Options ExecCGI
 </Files>
diff --git a/t/lib-httpd/error-smart-http.sh b/t/lib-httpd/error-smart-http.sh
new file mode 100644
index 0000000000..e65d447fc4
--- /dev/null
+++ b/t/lib-httpd/error-smart-http.sh
@@ -0,0 +1,3 @@
+echo "Content-Type: application/x-git-upload-pack-advertisement"
+echo
+printf "%s" "0019ERR server-side error"
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 8630b0cc39..ba83e567e5 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -429,5 +429,10 @@ test_expect_success 'GIT_TRACE_CURL_NO_DATA prevents data from being traced' '
 	! grep "=> Send data" err
 '
 
+test_expect_success 'server-side error detected' '
+	test_must_fail git clone $HTTPD_URL/error_smart/repo.git 2>actual &&
+	grep "server-side error" actual
+'
+
 stop_httpd
 test_done
-- 
2.20.1.1122.g2972e48916

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] remote-curl: refactor smart-http discovery
  2019-02-06 19:18                   ` Jeff King
@ 2019-02-06 19:29                     ` Josh Steadmon
  2019-02-06 20:42                       ` Junio C Hamano
  2019-02-06 21:14                       ` Jeff King
  0 siblings, 2 replies; 41+ messages in thread
From: Josh Steadmon @ 2019-02-06 19:29 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On 2019.02.06 14:18, Jeff King wrote:
> After making initial contact with an http server, we have to decide if
> the server supports smart-http, and if so, which version. Our rules are
> a bit inconsistent:
> 
>   1. For v0, we require that the content-type indicates a smart-http
>      response. We also require the response to look vaguely like a
>      pkt-line starting with "#". If one of those does not match, we fall
>      back to dumb-http.
> 
>      But according to our http protocol spec[1]:
> 
>        Dumb servers MUST NOT return a return type starting with
>        `application/x-git-`.
> 
>      If we see the expected content-type, we should consider it
>      smart-http. At that point we can parse the pkt-line for real, and
>      complain if it is not syntactically valid.
> 
>   2. For v2, we do not actually check the content-type. Our v2 protocol
>      spec says[2]:
> 
>        When using the http:// or https:// transport a client makes a
>        "smart" info/refs request as described in `http-protocol.txt`[...]
> 
>      and the http spec is clear that for a smart-http response[3]:
> 
>        The Content-Type MUST be `application/x-$servicename-advertisement`.
> 
>      So it is required according to the spec.
> 
> These inconsistencies were easy to miss because of the way the original
> code was written as an inline conditional. Let's pull it out into its
> own function for readability, and improve a few things:
> 
>  - we now predicate the smart/dumb decision entirely on the presence of
>    the correct content-type
> 
>  - we do a real pkt-line parse before deciding how to proceed (and die
>    if it isn't valid)
> 
>  - use skip_prefix() for comparing service strings, instead of
>    constructing expected output in a strbuf; this avoids dealing with
>    memory cleanup
> 
> Note that this _is_ tightening what the client will allow. It's all
> according to the spec, but it's possible that other implementations
> might violate these. However, violating these particular rules seems
> like an odd choice for a server to make.
> 
> [1] Documentation/technical/http-protocol.txt, l. 166-167
> [2] Documentation/technical/protocol-v2.txt, l. 63-64
> [3] Documentation/technical/http-protocol.txt, l. 247
> 
> Helped-by: Josh Steadmon <steadmon@google.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  remote-curl.c | 100 ++++++++++++++++++++++++++++----------------------
>  1 file changed, 57 insertions(+), 43 deletions(-)
> 
> diff --git a/remote-curl.c b/remote-curl.c
> index 2e04d53ac8..c78ba83744 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -331,9 +331,63 @@ static int get_protocol_http_header(enum protocol_version version,
>  	return 0;
>  }
>  
> +static void check_smart_http(struct discovery *d, const char *service,
> +			     struct strbuf *type)
> +{
> +	const char *p;
> +	struct packet_reader reader;
> +
> +	/*
> +	 * If we don't see x-$service-advertisement, then it's not smart-http.
> +	 * But once we do, we commit to it and assume any other protocol
> +	 * violations are hard errors.
> +	 */
> +	if (!skip_prefix(type->buf, "application/x-", &p) ||
> +	    !skip_prefix(p, service, &p) ||
> +	    strcmp(p, "-advertisement"))
> +		return;
> +
> +	packet_reader_init(&reader, -1, d->buf, d->len,
> +			   PACKET_READ_CHOMP_NEWLINE |
> +			   PACKET_READ_DIE_ON_ERR_PACKET);
> +	if (packet_reader_read(&reader) != PACKET_READ_NORMAL)
> +		die("invalid server response; expected service, got flush packet");

This can also trigger on an EOF or a delim packet, should we clarify the
error message?


> +
> +	if (skip_prefix(reader.line, "# service=", &p) && !strcmp(p, service)) {
> +		/*
> +		 * The header can include additional metadata lines, up
> +		 * until a packet flush marker.  Ignore these now, but
> +		 * in the future we might start to scan them.
> +		 */
> +		for (;;) {
> +			packet_reader_read(&reader);
> +			if (reader.pktlen <= 0) {
> +				break;
> +			}
> +		}

Could we make this loop cleaner as:

while (packet_reader_read(&reader) != PACKET_READ_NORMAL)
  ;


> +
> +		/*
> +		 * v0 smart http; callers expect us to soak up the
> +		 * service and header packets
> +		 */
> +		d->buf = reader.src_buffer;
> +		d->len = reader.src_len;
> +		d->proto_git = 1;
> +
> +	} else if (starts_with(reader.line, "version 2")) {
> +		/*
> +		 * v2 smart http; do not consume version packet, which will
> +		 * be handled elsewhere.
> +		 */
> +		d->proto_git = 1;
> +
> +	} else {
> +		die("invalid server response; got '%s'", reader.line);
> +	}
> +}
> +
>  static struct discovery *discover_refs(const char *service, int for_push)
>  {
> -	struct strbuf exp = STRBUF_INIT;
>  	struct strbuf type = STRBUF_INIT;
>  	struct strbuf charset = STRBUF_INIT;
>  	struct strbuf buffer = STRBUF_INIT;
> @@ -405,47 +459,8 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  	last->buf_alloc = strbuf_detach(&buffer, &last->len);
>  	last->buf = last->buf_alloc;
>  
> -	strbuf_addf(&exp, "application/x-%s-advertisement", service);
> -	if (maybe_smart &&
> -	    (5 <= last->len && last->buf[4] == '#') &&
> -	    !strbuf_cmp(&exp, &type)) {
> -		struct packet_reader reader;
> -		packet_reader_init(&reader, -1, last->buf, last->len,
> -				   PACKET_READ_CHOMP_NEWLINE |
> -				   PACKET_READ_DIE_ON_ERR_PACKET);
> -
> -		/*
> -		 * smart HTTP response; validate that the service
> -		 * pkt-line matches our request.
> -		 */
> -		if (packet_reader_read(&reader) != PACKET_READ_NORMAL)
> -			die("invalid server response; expected service, got flush packet");
> -
> -		strbuf_reset(&exp);
> -		strbuf_addf(&exp, "# service=%s", service);
> -		if (strcmp(reader.line, exp.buf))
> -			die("invalid server response; got '%s'", reader.line);
> -		strbuf_release(&exp);
> -
> -		/* The header can include additional metadata lines, up
> -		 * until a packet flush marker.  Ignore these now, but
> -		 * in the future we might start to scan them.
> -		 */
> -		for (;;) {
> -			packet_reader_read(&reader);
> -			if (reader.pktlen <= 0) {
> -				break;
> -			}
> -		}
> -
> -		last->buf = reader.src_buffer;
> -		last->len = reader.src_len;
> -
> -		last->proto_git = 1;
> -	} else if (maybe_smart &&
> -		   last->len > 5 && starts_with(last->buf + 4, "version 2")) {
> -		last->proto_git = 1;
> -	}
> +	if (maybe_smart)
> +		check_smart_http(last, service, &type);
>  
>  	if (last->proto_git)
>  		last->refs = parse_git_refs(last, for_push);
> @@ -453,7 +468,6 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  		last->refs = parse_info_refs(last);
>  
>  	strbuf_release(&refs_url);
> -	strbuf_release(&exp);
>  	strbuf_release(&type);
>  	strbuf_release(&charset);
>  	strbuf_release(&effective_url);
> -- 
> 2.20.1.1122.g2972e48916
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] remote-curl: refactor smart-http discovery
  2019-02-06 19:29                     ` Josh Steadmon
@ 2019-02-06 20:42                       ` Junio C Hamano
  2019-02-06 21:14                       ` Jeff King
  1 sibling, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-02-06 20:42 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: Jeff King, git

Josh Steadmon <steadmon@google.com> writes:

>> +	packet_reader_init(&reader, -1, d->buf, d->len,
>> +			   PACKET_READ_CHOMP_NEWLINE |
>> +			   PACKET_READ_DIE_ON_ERR_PACKET);
>> +	if (packet_reader_read(&reader) != PACKET_READ_NORMAL)
>> +		die("invalid server response; expected service, got flush packet");
>
> This can also trigger on an EOF or a delim packet, should we clarify the
> error message?

You mean that it may not be "flush packet"?  I guess we can remove
", got X" part and that would probably be an improvement.

>> +
>> +	if (skip_prefix(reader.line, "# service=", &p) && !strcmp(p, service)) {
>> +		/*
>> +		 * The header can include additional metadata lines, up
>> +		 * until a packet flush marker.  Ignore these now, but
>> +		 * in the future we might start to scan them.
>> +		 */
>> +		for (;;) {
>> +			packet_reader_read(&reader);
>> +			if (reader.pktlen <= 0) {
>> +				break;
>> +			}
>> +		}
>
> Could we make this loop cleaner as:
>
> while (packet_reader_read(&reader) != PACKET_READ_NORMAL)
>   ;

The only case as far as I can tell that would make the difference
between the original and your simplified one would be if a packet
had a single newline and nothing else in it, in which case pktlen
would be zero (since we pass CHOMP_NEWLINE) and the return value
would be READ_NORMAL.  The original would break, while yours will
spin one more time.

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] remote-curl: refactor smart-http discovery
  2019-02-06 19:29                     ` Josh Steadmon
  2019-02-06 20:42                       ` Junio C Hamano
@ 2019-02-06 21:14                       ` Jeff King
  1 sibling, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-02-06 21:14 UTC (permalink / raw)
  To: Josh Steadmon, Junio C Hamano, git

On Wed, Feb 06, 2019 at 11:29:03AM -0800, Josh Steadmon wrote:

> > +	packet_reader_init(&reader, -1, d->buf, d->len,
> > +			   PACKET_READ_CHOMP_NEWLINE |
> > +			   PACKET_READ_DIE_ON_ERR_PACKET);
> > +	if (packet_reader_read(&reader) != PACKET_READ_NORMAL)
> > +		die("invalid server response; expected service, got flush packet");
> 
> This can also trigger on an EOF or a delim packet, should we clarify the
> error message?

Maybe, though I'd prefer to do it as a patch on top; these lines were
moved straight from the existing code.

> > +	if (skip_prefix(reader.line, "# service=", &p) && !strcmp(p, service)) {
> > +		/*
> > +		 * The header can include additional metadata lines, up
> > +		 * until a packet flush marker.  Ignore these now, but
> > +		 * in the future we might start to scan them.
> > +		 */
> > +		for (;;) {
> > +			packet_reader_read(&reader);
> > +			if (reader.pktlen <= 0) {
> > +				break;
> > +			}
> > +		}
> 
> Could we make this loop cleaner as:
> 
> while (packet_reader_read(&reader) != PACKET_READ_NORMAL)
>   ;

Likewise here.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] remote-curl: refactor smart-http discovery
  2019-02-06 19:16                 ` Jeff King
                                     ` (2 preceding siblings ...)
  2019-02-06 19:19                   ` [PATCH 3/3] t5551: test server-side ERR packet Jeff King
@ 2019-02-06 22:19                   ` Junio C Hamano
  3 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-02-06 22:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Josh Steadmon, git

Jeff King <peff@peff.net> writes:

> Yep. Here it is.
>
> Rather than a range-diff, which is quite large due to the code movement,
> I'll include below the interesting hunk of a diff between the two
> endpoints (i.e., what we would have seen applying the packet-err-check
> changes on top of my code movement, which is more or less what I did to
> generate it).

Thanks.  The hunks shown below are to use packet_reader interface in
check_smart_http(), which is exactly what we want to see in this
step, and what I was too lazy to recreate as a part of conflict
resolution ;-)

Thanks, queued.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2019-02-06 22:19 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-12 22:44 [PATCH] remote-curl: die on server-side errors steadmon
2018-11-12 22:55 ` Stefan Beller
2018-11-13  2:52 ` Junio C Hamano
2018-11-13  3:02 ` Junio C Hamano
2018-11-13 22:15   ` Josh Steadmon
2018-11-13 14:26 ` Jeff King
2018-11-13 22:25   ` Josh Steadmon
2018-11-14  0:49     ` Jeff King
2018-11-14  7:00       ` Jeff King
2018-11-15 21:51         ` Josh Steadmon
2018-11-16  8:44           ` [PATCH 0/3] remote-curl smart-http discovery cleanup Jeff King
2018-11-16  8:47             ` [PATCH 1/3] remote-curl: refactor smart-http discovery Jeff King
2018-11-16 20:27               ` Josh Steadmon
2019-02-05 23:29               ` Junio C Hamano
2019-02-06 19:16                 ` Jeff King
2019-02-06 19:18                   ` Jeff King
2019-02-06 19:29                     ` Josh Steadmon
2019-02-06 20:42                       ` Junio C Hamano
2019-02-06 21:14                       ` Jeff King
2019-02-06 19:18                   ` [PATCH 2/3] remote-curl: tighten "version 2" check for smart-http Jeff King
2019-02-06 19:19                   ` [PATCH 3/3] t5551: test server-side ERR packet Jeff King
2019-02-06 22:19                   ` [PATCH 1/3] remote-curl: refactor smart-http discovery Junio C Hamano
2018-11-16  8:48             ` [PATCH 2/3] remote-curl: tighten "version 2" check for smart-http Jeff King
2018-11-16 20:28               ` Josh Steadmon
2018-11-16  8:49             ` [PATCH 3/3] remote-curl: die on server-side errors Jeff King
2018-11-16 20:04             ` [PATCH 0/3] remote-curl smart-http discovery cleanup Josh Steadmon
2018-12-12  0:25             ` [PATCH v3 0/4] Unify pkt-line error handling and refactor smart-http Josh Steadmon
2018-12-12  0:25               ` [PATCH v3 1/4] pack-protocol.txt: accept error packets in any context Josh Steadmon
2018-12-12 11:02                 ` Jeff King
2018-12-13  1:17                   ` Masaya Suzuki
2018-12-13  8:04                     ` Jeff King
2018-12-13 22:18                     ` Josh Steadmon
2018-12-17 21:33                       ` Jeff King
2018-12-19 23:30                         ` Josh Steadmon
2018-12-20 15:49                           ` Jeff King
2018-12-12  0:25               ` [PATCH v3 2/4] remote-curl: refactor smart-http discovery Josh Steadmon
2018-12-12  0:25               ` [PATCH v3 3/4] remote-curl: tighten "version 2" check for smart-http Josh Steadmon
2018-12-12  0:25               ` [PATCH v3 4/4] lib-httpd, t5551: check server-side HTTP errors Josh Steadmon
2018-12-12  8:43               ` [PATCH v3 0/4] Unify pkt-line error handling and refactor smart-http Junio C Hamano
2018-11-13 14:30 ` [PATCH] remote-curl: die on server-side errors Junio C Hamano
2018-11-13 22:28   ` Josh Steadmon

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git