git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Kyle J. McKay" <mackyle@gmail.com>, Alex Waite <alex@waite.eu>,
	git@vger.kernel.org
Subject: [PATCH] urlmatch: add underscore to URL_HOST_CHARS
Date: Tue, 12 Oct 2021 17:12:26 -0400	[thread overview]
Message-ID: <YWX6OkJANJGN0RnT@coredump.intra.peff.net> (raw)
In-Reply-To: <YWX13C7xsLcu+jZA@coredump.intra.peff.net>

On Tue, Oct 12, 2021 at 04:53:48PM -0400, Jeff King wrote:

> > because earlier we define URL_HOST_CHARS without underscore:
> > 
> >   #define URL_HOST_CHARS URL_ALPHADIGIT ".-[:]" /* IPv6 literals need [:] */
> > 
> > I'm not sure why, given that this otherwise seems to match according to
> > the rfc. This code comes from 3402a8dc48 (config: add helper to
> > normalize and match URLs, 2013-07-31), but there's no mention of
> > underscore there. Possibly it came from earlier rules (rfc1738, for
> > example, has a stricter grammar that allows only alphabit and dashes).
> 
> Sorry, I meant to cc the author of 3402a8dc48, which I've now done. It's
> been a while, but maybe he remembers something (I couldn't find anything
> digging in the archive, either).

Absent any other input, I'd propose the patch below.

-- >8 --
Subject: urlmatch: add underscore to URL_HOST_CHARS

When parsing a URL to normalize it, we allow hostnames to contain only
dot (".") or dash ("-"), plus brackets and colons for IPv6 literals.
This matches the old URL standard in RFC 1738, which says:

  host           = hostname | hostnumber
  hostname       = *[ domainlabel "." ] toplabel
  domainlabel    = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit

But this was later updated by RFC 3986, which is more liberal:

  host        = IP-literal / IPv4address / reg-name
  reg-name    = *( unreserved / pct-encoded / sub-delims )
  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

While names with underscore in them are not common and possibly violate
some DNS rules, they do work in practice, and we will happily contact
them over http://, git://, or ssh://. It seems odd to ignore them for
purposes of URL matching, especially when the URL RFC seems to allow
them.

There shouldn't be any downside here. It's not a syntactically
significant character in a URL, so we won't be confused about parsing;
we'd have simply rejected such a URL previously (the test here checks
the url code directly, but the obvious user-visible effect would be
failing to match credential.http://foo_bar.example.com.helper, or
similar config in http.<url>.*).

Arguably we'd want to allow tilde ("~") here, too. There's likewise
probably no downside, but I didn't add it simply because it seems like
an even less likely character to appear in a hostname.

Reported-by: Alex Waite <alex@waite.eu>
Signed-off-by: Jeff King <peff@peff.net>
---
I'm on the fence regarding "~". I didn't actually test that things like
curl even allow it (I did for underscore by creating a throwaway DNS
name).

 t/t0110-urlmatch-normalization.sh | 2 +-
 urlmatch.c                        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t0110-urlmatch-normalization.sh b/t/t0110-urlmatch-normalization.sh
index f99529d838..4dc9fecf72 100755
--- a/t/t0110-urlmatch-normalization.sh
+++ b/t/t0110-urlmatch-normalization.sh
@@ -47,7 +47,7 @@ test_expect_success 'url authority' '
 	test-tool urlmatch-normalization "scheme://@host" &&
 	test-tool urlmatch-normalization "scheme://%00@host" &&
 	! test-tool urlmatch-normalization "scheme://%%@host" &&
-	! test-tool urlmatch-normalization "scheme://host_" &&
+	test-tool urlmatch-normalization "scheme://host_" &&
 	test-tool urlmatch-normalization "scheme://user:pass@host/" &&
 	test-tool urlmatch-normalization "scheme://@host/" &&
 	test-tool urlmatch-normalization "scheme://host/" &&
diff --git a/urlmatch.c b/urlmatch.c
index 33a2ccd306..03ad3f30a9 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -5,7 +5,7 @@
 #define URL_DIGIT "0123456789"
 #define URL_ALPHADIGIT URL_ALPHA URL_DIGIT
 #define URL_SCHEME_CHARS URL_ALPHADIGIT "+.-"
-#define URL_HOST_CHARS URL_ALPHADIGIT ".-[:]" /* IPv6 literals need [:] */
+#define URL_HOST_CHARS URL_ALPHADIGIT ".-_[:]" /* IPv6 literals need [:] */
 #define URL_UNSAFE_CHARS " <>\"%{}|\\^`" /* plus 0x00-0x1F,0x7F-0xFF */
 #define URL_GEN_RESERVED ":/?#[]@"
 #define URL_SUB_RESERVED "!$&'()*+,;="
-- 
2.33.0.1387.g4e339dd0af


  reply	other threads:[~2021-10-12 21:12 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-12 14:25 [BUG] credential wildcard does not match hostnames containing an underscore Alex Waite
2021-10-12 17:47 ` Junio C Hamano
2021-10-12 18:00   ` Alex Waite
2021-10-12 18:28     ` Junio C Hamano
2021-10-12 20:45     ` Jeff King
2021-10-12 20:42   ` Jeff King
2021-10-12 20:53     ` Jeff King
2021-10-12 21:12       ` Jeff King [this message]
2021-10-12 21:21     ` brian m. carlson
2021-10-12 21:32       ` Jeff King
2021-10-12 21:48         ` brian m. carlson
2021-10-12 21:55           ` Jeff King
2021-10-12 21:57           ` brian m. carlson
2021-10-12 22:25             ` Aaron Schrab
2021-10-13 16:21               ` Alex Waite
2021-10-14 11:43                 ` Philip Oakley
2021-10-12 21:12 ` brian m. carlson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YWX6OkJANJGN0RnT@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=alex@waite.eu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mackyle@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).