From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Kyle J. McKay" <mackyle@gmail.com>, Alex Waite <alex@waite.eu>,
	git@vger.kernel.org
Subject: [PATCH] urlmatch: add underscore to URL_HOST_CHARS
Date: Tue, 12 Oct 2021 17:12:26 -0400
Message-ID: <YWX6OkJANJGN0RnT@coredump.intra.peff.net>
In-Reply-To: <YWX13C7xsLcu+jZA@coredump.intra.peff.net>

On Tue, Oct 12, 2021 at 04:53:48PM -0400, Jeff King wrote:

> > because earlier we define URL_HOST_CHARS without underscore:
> >
> > #define URL_HOST_CHARS URL_ALPHADIGIT ".-[:]" /* IPv6 literals need [:] */
> >
> > I'm not sure why, given that this otherwise seems to match according to
> > the rfc. This code comes from 3402a8dc48 (config: add helper to
> > normalize and match URLs, 2013-07-31), but there's no mention of
> > underscore there. Possibly it came from earlier rules (rfc1738, for
> > example, has a stricter grammar that allows only alphanumerics and dashes).
>
> Sorry, I meant to cc the author of 3402a8dc48, which I've now done. It's
> been a while, but maybe he remembers something (I couldn't find anything
> digging in the archive, either).

Absent any other input, I'd propose the patch below.

-- >8 --
Subject: urlmatch: add underscore to URL_HOST_CHARS

When parsing a URL to normalize it, we allow hostnames to contain only
alphanumerics, dot ("."), or dash ("-"), plus brackets and colons for
IPv6 literals. This matches the old URL standard in RFC 1738, which says:

  host           = hostname | hostnumber
  hostname       = *[ domainlabel "." ] toplabel
  domainlabel    = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit

But this was later updated by RFC 3986, which is more liberal:

  host        = IP-literal / IPv4address / reg-name
  reg-name    = *( unreserved / pct-encoded / sub-delims )
  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

While names with underscore in them are not common and possibly violate
some DNS rules, they do work in practice, and we will happily contact
them over http://, git://, or ssh://. It seems odd to ignore them for
purposes of URL matching, especially when the URL RFC seems to allow
them.

There shouldn't be any downside here. It's not a syntactically
significant character in a URL, so we won't be confused about parsing;
we'd have simply rejected such a URL previously (the test here checks
the url code directly, but the obvious user-visible effect would be
failing to match credential.http://foo_bar.example.com.helper, or
similar config in http.<url>.*).

Arguably we'd want to allow tilde ("~") here, too. There's likewise
probably no downside, but I didn't add it simply because it seems like
an even less likely character to appear in a hostname.

Reported-by: Alex Waite <alex@waite.eu>
Signed-off-by: Jeff King <peff@peff.net>
---
I'm on the fence regarding "~". I didn't actually test that things like
curl even allow it (I did for underscore by creating a throwaway DNS
name).
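
To make the effect of the one-character change concrete, here is a
minimal standalone sketch. It is not the code in urlmatch.c: the
host_chars_ok() helper and the OLD_/NEW_ macro names are made up for
illustration, URL_ALPHA is not part of the hunk below and is assumed
here to be the ASCII letters, and the strspn()-style span check is my
assumption about how such a character set would typically be applied.

```c
#include <string.h>

/* Assumed definition; URL_ALPHA is not shown in the hunk. */
#define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
#define URL_DIGIT "0123456789"
#define URL_ALPHADIGIT URL_ALPHA URL_DIGIT

/* The two versions of URL_HOST_CHARS from the patch (names are hypothetical). */
#define OLD_URL_HOST_CHARS URL_ALPHADIGIT ".-[:]"
#define NEW_URL_HOST_CHARS URL_ALPHADIGIT ".-_[:]"

/*
 * Hypothetical helper: return 1 if every byte of "host" appears in
 * "allowed", 0 otherwise. strspn() gives the length of the leading
 * run of allowed bytes, so the host passes only if that run covers
 * the whole string.
 */
int host_chars_ok(const char *host, const char *allowed)
{
	return strspn(host, allowed) == strlen(host);
}
```

With these definitions, host_chars_ok("foo_bar.example.com",
OLD_URL_HOST_CHARS) returns 0 and the same call with
NEW_URL_HOST_CHARS returns 1; a host containing "~" is rejected by
both sets.
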

 t/t0110-urlmatch-normalization.sh | 2 +-
 urlmatch.c                        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t0110-urlmatch-normalization.sh b/t/t0110-urlmatch-normalization.sh
index f99529d838..4dc9fecf72 100755
--- a/t/t0110-urlmatch-normalization.sh
+++ b/t/t0110-urlmatch-normalization.sh
@@ -47,7 +47,7 @@ test_expect_success 'url authority' '
 	test-tool urlmatch-normalization "scheme://@host" &&
 	test-tool urlmatch-normalization "scheme://%00@host" &&
 	! test-tool urlmatch-normalization "scheme://%%@host" &&
-	! test-tool urlmatch-normalization "scheme://host_" &&
+	test-tool urlmatch-normalization "scheme://host_" &&
 	test-tool urlmatch-normalization "scheme://user:pass@host/" &&
 	test-tool urlmatch-normalization "scheme://@host/" &&
 	test-tool urlmatch-normalization "scheme://host/" &&
diff --git a/urlmatch.c b/urlmatch.c
index 33a2ccd306..03ad3f30a9 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -5,7 +5,7 @@
 #define URL_DIGIT "0123456789"
 #define URL_ALPHADIGIT URL_ALPHA URL_DIGIT
 #define URL_SCHEME_CHARS URL_ALPHADIGIT "+.-"
-#define URL_HOST_CHARS URL_ALPHADIGIT ".-[:]" /* IPv6 literals need [:] */
+#define URL_HOST_CHARS URL_ALPHADIGIT ".-_[:]" /* IPv6 literals need [:] */
 #define URL_UNSAFE_CHARS " <>\"%{}|\\^`" /* plus 0x00-0x1F,0x7F-0xFF */
 #define URL_GEN_RESERVED ":/?#[]@"
 #define URL_SUB_RESERVED "!$&'()*+,;="
--
2.33.0.1387.g4e339dd0af