git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] credential: treat "?" and "#" in URLs as end of host
@ 2020-04-14 21:43 Jeff King
  2020-04-15  0:13 ` Taylor Blau
  2020-04-15 18:39 ` Junio C Hamano
  0 siblings, 2 replies; 3+ messages in thread
From: Jeff King @ 2020-04-14 21:43 UTC (permalink / raw)
  To: git

It's unusual to see:

  https://example.com?query-parameters

without an intervening slash, like:

  https://example.com/some-path?query-parameters

or even:

  https://example.com/?query-parameters

but it is a valid end to the hostname (actually "authority component")
according to RFC 3986. Likewise for "#".

And curl will parse the URL according to the standard, meaning it will
contact example.com, but our credential code would ask about a bogus
hostname with a "?" in it. Let's make sure we follow the standard, and
more importantly ask about the same hosts that curl will be talking to.

It would be nice if we could just ask curl to parse the URL for us. But
it didn't grow a URL-parsing API until 7.62, so we'd be stuck with
fallback code either way. Plus we'd need this code in the main Git
binary, where we've tried to avoid having a link dependency on libcurl.

But let's at least fix our parser. Moving to curl's parser would prevent
other potential discrepancies, but this gives us immediate relief for
the known problem, and would help our fallback code if we eventually use
curl.

Signed-off-by: Jeff King <peff@peff.net>
---
Just a follow-on to today's release. This isn't security critical after
the earlier fix, but it made some of the attack vectors much easier.

 credential.c           |  9 ++++++++-
 t/t0300-credentials.sh | 36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/credential.c b/credential.c
index 21b3ba152f..8aa9777548 100644
--- a/credential.c
+++ b/credential.c
@@ -388,7 +388,14 @@ int credential_from_url_gently(struct credential *c, const char *url,
 	cp = proto_end + 3;
 	at = strchr(cp, '@');
 	colon = strchr(cp, ':');
-	slash = strchrnul(cp, '/');
+
+	/*
+	 * A query or fragment marker before the slash ends the host portion.
+	 * We'll just continue to call this "slash" for simplicity. Notably our
+	 * "trim leading slashes" part won't skip over this part of the path,
+	 * but that's what we'd want.
+	 */
+	slash = cp + strcspn(cp, "/?#");
 
 	if (!at || slash <= at) {
 		/* Case (1) */
diff --git a/t/t0300-credentials.sh b/t/t0300-credentials.sh
index 5b78ebbc3f..b6ec676989 100755
--- a/t/t0300-credentials.sh
+++ b/t/t0300-credentials.sh
@@ -443,11 +443,45 @@ test_expect_success 'url parser ignores embedded newlines' '
 	username=askpass-username
 	password=askpass-password
 	--
-	warning: url contains a newline in its host component: https://one.example.com?%0ahost=two.example.com/
+	warning: url contains a newline in its path component: https://one.example.com?%0ahost=two.example.com/
 	warning: skipping credential lookup for url: https://one.example.com?%0ahost=two.example.com/
 	askpass: Username:
 	askpass: Password:
 	EOF
 '
 
+# usage: check_host_and_path <url> <expected-host> <expected-path>
+check_host_and_path () {
+	# we always parse the path component, but we need this to make sure it
+	# is passed to the helper
+	test_config credential.useHTTPPath true &&
+	check fill "verbatim user pass" <<-EOF
+	url=$1
+	--
+	protocol=https
+	host=$2
+	path=$3
+	username=user
+	password=pass
+	--
+	verbatim: get
+	verbatim: protocol=https
+	verbatim: host=$2
+	verbatim: path=$3
+	EOF
+}
+
+test_expect_success 'url parser handles bare query marker' '
+	check_host_and_path https://example.com?foo.git example.com ?foo.git
+'
+
+test_expect_success 'url parser handles bare fragment marker' '
+	check_host_and_path https://example.com#foo.git example.com "#foo.git"
+'
+
+test_expect_success 'url parser not confused by encoded markers' '
+	check_host_and_path https://example.com%23%3f%2f/foo.git \
+		"example.com#?/" foo.git
+'
+
 test_done
-- 
2.26.1.429.g609150846d

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] credential: treat "?" and "#" in URLs as end of host
  2020-04-14 21:43 [PATCH] credential: treat "?" and "#" in URLs as end of host Jeff King
@ 2020-04-15  0:13 ` Taylor Blau
  2020-04-15 18:39 ` Junio C Hamano
  1 sibling, 0 replies; 3+ messages in thread
From: Taylor Blau @ 2020-04-15  0:13 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Hi Peff,

On Tue, Apr 14, 2020 at 05:43:04PM -0400, Jeff King wrote:
> It's unusual to see:
>
>   https://example.com?query-parameters
>
> without an intervening slash, like:
>
>   https://example.com/some-path?query-parameters
>
> or even:
>
>   https://example.com/?query-parameters
>
> but it is a valid end to the hostname (actually "authority component")
> according to RFC 3986. Likewise for "#".
>
> And curl will parse the URL according to the standard, meaning it will
> contact example.com, but our credential code would ask about a bogus
> hostname with a "?" in it. Let's make sure we follow the standard, and
> more importantly ask about the same hosts that curl will be talking to.
>
> It would be nice if we could just ask curl to parse the URL for us. But
> it didn't grow a URL-parsing API until 7.62, so we'd be stuck with
> fallback code either way. Plus we'd need this code in the main Git
> binary, where we've tried to avoid having a link dependency on libcurl.
>
> But let's at least fix our parser. Moving to curl's parser would prevent
> other potential discrepancies, but this gives us immediate relief for
> the known problem, and would help our fallback code if we eventually use
> curl.
>
> Signed-off-by: Jeff King <peff@peff.net>

All makes sense to me. I agree that it would probably be preferable if
we could just use cURL's own parser and forget about this code entirely.
But, having it still be a fallback knocks out any benefit that we'd be
getting by relying on their parser rather than our own.

> ---
> Just a follow-on to today's release. This isn't security critical after
> the earlier fix, but it made some of the attack vectors much easier.

Yep, thanks for noting.

>  credential.c           |  9 ++++++++-
>  t/t0300-credentials.sh | 36 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/credential.c b/credential.c
> index 21b3ba152f..8aa9777548 100644
> --- a/credential.c
> +++ b/credential.c
> @@ -388,7 +388,14 @@ int credential_from_url_gently(struct credential *c, const char *url,
>  	cp = proto_end + 3;
>  	at = strchr(cp, '@');
>  	colon = strchr(cp, ':');
> -	slash = strchrnul(cp, '/');
> +
> +	/*
> +	 * A query or fragment marker before the slash ends the host portion.
> +	 * We'll just continue to call this "slash" for simplicity. Notably our
> +	 * "trim leading slashes" part won't skip over this part of the path,
> +	 * but that's what we'd want.
> +	 */
> +	slash = cp + strcspn(cp, "/?#");
>
>  	if (!at || slash <= at) {
>  		/* Case (1) */
> diff --git a/t/t0300-credentials.sh b/t/t0300-credentials.sh
> index 5b78ebbc3f..b6ec676989 100755
> --- a/t/t0300-credentials.sh
> +++ b/t/t0300-credentials.sh
> @@ -443,11 +443,45 @@ test_expect_success 'url parser ignores embedded newlines' '
>  	username=askpass-username
>  	password=askpass-password
>  	--
> -	warning: url contains a newline in its host component: https://one.example.com?%0ahost=two.example.com/
> +	warning: url contains a newline in its path component: https://one.example.com?%0ahost=two.example.com/
>  	warning: skipping credential lookup for url: https://one.example.com?%0ahost=two.example.com/
>  	askpass: Username:
>  	askpass: Password:
>  	EOF
>  '
>
> +# usage: check_host_and_path <url> <expected-host> <expected-path>
> +check_host_and_path () {
> +	# we always parse the path component, but we need this to make sure it
> +	# is passed to the helper
> +	test_config credential.useHTTPPath true &&
> +	check fill "verbatim user pass" <<-EOF
> +	url=$1
> +	--
> +	protocol=https
> +	host=$2
> +	path=$3
> +	username=user
> +	password=pass
> +	--
> +	verbatim: get
> +	verbatim: protocol=https
> +	verbatim: host=$2
> +	verbatim: path=$3
> +	EOF
> +}
> +
> +test_expect_success 'url parser handles bare query marker' '
> +	check_host_and_path https://example.com?foo.git example.com ?foo.git
> +'
> +
> +test_expect_success 'url parser handles bare fragment marker' '
> +	check_host_and_path https://example.com#foo.git example.com "#foo.git"
> +'
> +
> +test_expect_success 'url parser not confused by encoded markers' '
> +	check_host_and_path https://example.com%23%3f%2f/foo.git \
> +		"example.com#?/" foo.git
> +'
> +
>  test_done

These look good, too. Thanks for working on this.

> --
> 2.26.1.429.g609150846d

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] credential: treat "?" and "#" in URLs as end of host
  2020-04-14 21:43 [PATCH] credential: treat "?" and "#" in URLs as end of host Jeff King
  2020-04-15  0:13 ` Taylor Blau
@ 2020-04-15 18:39 ` Junio C Hamano
  1 sibling, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2020-04-15 18:39 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> -	slash = strchrnul(cp, '/');
> +
> +	/*
> +	 * A query or fragment marker before the slash ends the host portion.
> +	 * We'll just continue to call this "slash" for simplicity. Notably our
> +	 * "trim leading slashes" part won't skip over this part of the path,
> +	 * but that's what we'd want.
> +	 */
> +	slash = cp + strcspn(cp, "/?#");

Whatever happened when the original did not find any slash should
happen in the new code when none of these three terminating bytes
is found.  The original made slash point at the end of the string,
and the new code does the same.

So, by definition, the code is correct ;-)

> diff --git a/t/t0300-credentials.sh b/t/t0300-credentials.sh
> index 5b78ebbc3f..b6ec676989 100755
> --- a/t/t0300-credentials.sh
> +++ b/t/t0300-credentials.sh
> @@ -443,11 +443,45 @@ test_expect_success 'url parser ignores embedded newlines' '
>  	username=askpass-username
>  	password=askpass-password
>  	--
> -	warning: url contains a newline in its host component: https://one.example.com?%0ahost=two.example.com/
> +	warning: url contains a newline in its path component: https://one.example.com?%0ahost=two.example.com/

Nice demonstration ;-).


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-04-15 19:17 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-14 21:43 [PATCH] credential: treat "?" and "#" in URLs as end of host Jeff King
2020-04-15  0:13 ` Taylor Blau
2020-04-15 18:39 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).