git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [BUG] credential wildcard does not match hostnames containing an underscore
@ 2021-10-12 14:25 Alex Waite
  2021-10-12 17:47 ` Junio C Hamano
  2021-10-12 21:12 ` brian m. carlson
  0 siblings, 2 replies; 17+ messages in thread
From: Alex Waite @ 2021-10-12 14:25 UTC (permalink / raw)
  To: git

Hey Everyone,

This is my first time reporting a bug to this list. I attempted to follow the direction on the git website [1], and have followed the template provided by "git bugreport".

If my report should be formatted differently, or reported elsewhere, please let me know. :-)

---Alex

[1] https://git-scm.com/community

-------

What did you do before the bug happened? (Steps to reproduce your issue)

  I configured my ~/.gitconfig so that git credentials invoke a helper for a
  subdomain using wildcards. For example:

  [credential "https://*.example.com"]
          helper = "/usr/local/bin/custom_helper"

  This works for all tested subdomains /except/ for those which contain an
  underscore.

  authenticates without prompting:
    git clone https://testA.example.com
    git clone https://test-b.example.com

  prompts for authentication:
    git clone https://test_c.example.com


What did you expect to happen? (Expected behavior)

  I expected the pattern matching to work for all resolved URLs.


What happened instead? (Actual behavior)

  It does not match URLs which contain an underscore.


What's different between what you expected and what actually happened?

  It only matches URLs which do not contain an underscore.


Anything else you want to add:

  If I don't use pattern matching, and instead state the URL explicitly in
  ~/.gitconfig, it works as expected. For example, the following works:

  [credential "https://test_c.example.com"]
          helper = "/usr/local/bin/custom_helper"

  As part of writing this bug report, I learned that underscores are not valid
  DNS characters for hostnames (but are valid for other record types, which are
  largely irrelevant to git).

  What is notable is that git pattern matching enforces the spec more strictly
  than without pattern matching (and more strictly than the OS and every DNS
  server between my system and the authoritative DNS server).

  At minimum, git should be consistent with itself.

  As for which behavior is "correct", the question is whether git wishes to
  follow/enforce the spec tightly, or not get in the way of a real-world oddity
  that everything else seems to tolerate.

  This also likely affects patterns for http.<url>.*
  https://git-scm.com/docs/git-config#Documentation/git-config.txt-httplturlgt


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-12 14:25 [BUG] credential wildcard does not match hostnames containing an underscore Alex Waite
@ 2021-10-12 17:47 ` Junio C Hamano
  2021-10-12 18:00   ` Alex Waite
  2021-10-12 20:42   ` Jeff King
  2021-10-12 21:12 ` brian m. carlson
  1 sibling, 2 replies; 17+ messages in thread
From: Junio C Hamano @ 2021-10-12 17:47 UTC (permalink / raw)
  To: Alex Waite; +Cc: git

"Alex Waite" <alex@waite.eu> writes:

>   This works for all tested subdomains /except/ for those which contain an
>   underscore.
>
>   authenticates without prompting:
>     git clone https://testA.example.com
>     git clone https://test-b.example.com
>
>   prompts for authentication:
>     git clone https://test_c.example.com

Hmph, given that hostnames cannot have '_' (cf. RFC1123 2.1 "Host
Names and Numbers", for example), the third URL seems invalid.  Is
this even a bug?



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-12 17:47 ` Junio C Hamano
@ 2021-10-12 18:00   ` Alex Waite
  2021-10-12 18:28     ` Junio C Hamano
  2021-10-12 20:45     ` Jeff King
  2021-10-12 20:42   ` Jeff King
  1 sibling, 2 replies; 17+ messages in thread
From: Alex Waite @ 2021-10-12 18:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Thanks for the response. :-)

> Hmph, given that hostnames cannot have '_' (cf. RFC1123 2.1 "Host
> Names and Numbers", for example), the third URL seems invalid.  Is
> this even a bug?

That is a fair question, and I do acknowledge that later on in my bug report (where I provide some additional information).

The core issue, IMO, is that git is not consistent with itself. I can write a static rule that will match ("https://test_c.example.com") but cannot write a pattern that will do so.

From a user perspective, a URL containing an underscore in the hostname works for everything else (including all other git operations), but not with pattern matching. It took me a couple hours to figure out /why/, as GIT_TRACE does not provide debugging for how git config rules do (or don't) match patterns.

Specifying in the docs which characters are matched would have helped understand why it was behaving so.

In any case, I felt it was opaque, and was worth sharing.

---Alex

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-12 18:00   ` Alex Waite
@ 2021-10-12 18:28     ` Junio C Hamano
  2021-10-12 20:45     ` Jeff King
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2021-10-12 18:28 UTC (permalink / raw)
  To: Alex Waite, brian m. carlson; +Cc: git

"Alex Waite" <alex@waite.eu> writes:

> Thanks for the response. :-)
>
>> Hmph, given that hostnames cannot have '_' (cf. RFC1123 2.1 "Host
>> Names and Numbers", for example), the third URL seems invalid.  Is
>> this even a bug?
>
> That is a fair question, and I do acknowledge that later on in my bug report (where I provide some additional information).
>
> The core issue, IMO, is that git is not consistent with itself. I
> can write a static rule that will match
> ("https://test_c.example.com") but cannot write a pattern that
> will do so.

I do not know if that is avoidable.

To be able to match a concrete URL that came from the running
session with a pattern taken from the configuration, we'd need to do
some parsing to figure out which part of the URL matches the
wildcard, and to be able to tell which substrings are "parts", there
needs some syntactic rule that says what constitutes a valid word.

I guess that we could make them consistent by treating a literal as
a pattern that does not happen to have any wildcard and reject the
"test_c" hostname the same way in both cases.  I do not offhand know
how desirable such a change would be.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-12 17:47 ` Junio C Hamano
  2021-10-12 18:00   ` Alex Waite
@ 2021-10-12 20:42   ` Jeff King
  2021-10-12 20:53     ` Jeff King
  2021-10-12 21:21     ` [BUG] credential wildcard does not match hostnames containing an underscore brian m. carlson
  1 sibling, 2 replies; 17+ messages in thread
From: Jeff King @ 2021-10-12 20:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alex Waite, git

On Tue, Oct 12, 2021 at 10:47:01AM -0700, Junio C Hamano wrote:

> "Alex Waite" <alex@waite.eu> writes:
> 
> >   This works for all tested subdomains /except/ for those which contain an
> >   underscore.
> >
> >   authenticates without prompting:
> >     git clone https://testA.example.com
> >     git clone https://test-b.example.com
> >
> >   prompts for authentication:
> >     git clone https://test_c.example.com
> 
> Hmph, given that hostnames cannot have '_' (cf. RFC1123 2.1 "Host
> Names and Numbers", for example), the third URL seems invalid.  Is
> this even a bug?

That may be so for hostnames in general, but URLs seem to allow it. RFC
3986 says:

      host        = IP-literal / IPv4address / reg-name
      reg-name    = *( unreserved / pct-encoded / sub-delims )
      unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

So underscore is definitely allowed in the host portion. Our code
complains during url_normalize(), in this code:

          if (allow_globs)
                  spanned = strspn(url, URL_HOST_CHARS "*");
          else
                  spanned = strspn(url, URL_HOST_CHARS);
  
          if (spanned < colon_ptr - url) {
                  /* Host name has invalid characters */
                  if (out_info) {
                          out_info->url = NULL;
                          out_info->err = _("invalid characters in host name");
                  }
                  strbuf_release(&norm);
                  return NULL;
          }

because earlier we define URL_HOST_CHARS without underscore:

  #define URL_HOST_CHARS URL_ALPHADIGIT ".-[:]" /* IPv6 literals need [:] */

I'm not sure why, given that this otherwise seems to match according to
the rfc. This code comes from 3402a8dc48 (config: add helper to
normalize and match URLs, 2013-07-31), but there's no mention of
underscore there. Possibly it came from earlier rules (rfc1738, for
example, has a stricter grammar that allows only alphabit and dashes).

I can't imagine it would cause any problems to allow it here (as noted,
we're perfectly happy to use the name in other contexts, and I don't
think there any syntactic gotchas here).

Adding "_" to that #define does make it work as expected.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-12 18:00   ` Alex Waite
  2021-10-12 18:28     ` Junio C Hamano
@ 2021-10-12 20:45     ` Jeff King
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff King @ 2021-10-12 20:45 UTC (permalink / raw)
  To: Alex Waite; +Cc: Junio C Hamano, git

On Tue, Oct 12, 2021 at 08:00:55PM +0200, Alex Waite wrote:

> From a user perspective, a URL containing an underscore in the
> hostname works for everything else (including all other git
> operations), but not with pattern matching. It took me a couple hours
> to figure out /why/, as GIT_TRACE does not provide debugging for how
> git config rules do (or don't) match patterns.

One thing I noticed here: url_normalize() does complain about parsing
this URL, but we don't propagate its error message to stderr. Perhaps we
should do so with warning(), but I'm a bit afraid that we may be relying
on this code to silently ignore invalid urls (i.e., showing the error
would cause some other innocuous cases to start issuing noisy and
useless warnings).

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-12 20:42   ` Jeff King
@ 2021-10-12 20:53     ` Jeff King
  2021-10-12 21:12       ` [PATCH] urlmatch: add underscore to URL_HOST_CHARS Jeff King
  2021-10-12 21:21     ` [BUG] credential wildcard does not match hostnames containing an underscore brian m. carlson
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff King @ 2021-10-12 20:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kyle J. McKay, Alex Waite, git

On Tue, Oct 12, 2021 at 04:42:01PM -0400, Jeff King wrote:

> That may be so for hostnames in general, but URLs seem to allow it. RFC
> 3986 says:
> 
>       host        = IP-literal / IPv4address / reg-name
>       reg-name    = *( unreserved / pct-encoded / sub-delims )
>       unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
> 
> So underscore is definitely allowed in the host portion. Our code
> complains during url_normalize(), in this code:
> 
>           if (allow_globs)
>                   spanned = strspn(url, URL_HOST_CHARS "*");
>           else
>                   spanned = strspn(url, URL_HOST_CHARS);
>   
>           if (spanned < colon_ptr - url) {
>                   /* Host name has invalid characters */
>                   if (out_info) {
>                           out_info->url = NULL;
>                           out_info->err = _("invalid characters in host name");
>                   }
>                   strbuf_release(&norm);
>                   return NULL;
>           }
> 
> because earlier we define URL_HOST_CHARS without underscore:
> 
>   #define URL_HOST_CHARS URL_ALPHADIGIT ".-[:]" /* IPv6 literals need [:] */
> 
> I'm not sure why, given that this otherwise seems to match according to
> the rfc. This code comes from 3402a8dc48 (config: add helper to
> normalize and match URLs, 2013-07-31), but there's no mention of
> underscore there. Possibly it came from earlier rules (rfc1738, for
> example, has a stricter grammar that allows only alphabit and dashes).

Sorry, I meant to cc the author of 3402a8dc48, which I've now done. It's
been a while, but maybe he remembers something (I couldn't find anything
digging in the archive, either).

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] urlmatch: add underscore to URL_HOST_CHARS
  2021-10-12 20:53     ` Jeff King
@ 2021-10-12 21:12       ` Jeff King
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2021-10-12 21:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kyle J. McKay, Alex Waite, git

On Tue, Oct 12, 2021 at 04:53:48PM -0400, Jeff King wrote:

> > because earlier we define URL_HOST_CHARS without underscore:
> > 
> >   #define URL_HOST_CHARS URL_ALPHADIGIT ".-[:]" /* IPv6 literals need [:] */
> > 
> > I'm not sure why, given that this otherwise seems to match according to
> > the rfc. This code comes from 3402a8dc48 (config: add helper to
> > normalize and match URLs, 2013-07-31), but there's no mention of
> > underscore there. Possibly it came from earlier rules (rfc1738, for
> > example, has a stricter grammar that allows only alphabit and dashes).
> 
> Sorry, I meant to cc the author of 3402a8dc48, which I've now done. It's
> been a while, but maybe he remembers something (I couldn't find anything
> digging in the archive, either).

Absent any other input, I'd propose the patch below.

-- >8 --
Subject: urlmatch: add underscore to URL_HOST_CHARS

When parsing a URL to normalize it, we allow hostnames to contain only
dot (".") or dash ("-"), plus brackets and colons for IPv6 literals.
This matches the old URL standard in RFC 1738, which says:

  host           = hostname | hostnumber
  hostname       = *[ domainlabel "." ] toplabel
  domainlabel    = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit

But this was later updated by RFC 3986, which is more liberal:

  host        = IP-literal / IPv4address / reg-name
  reg-name    = *( unreserved / pct-encoded / sub-delims )
  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

While names with underscore in them are not common and possibly violate
some DNS rules, they do work in practice, and we will happily contact
them over http://, git://, or ssh://. It seems odd to ignore them for
purposes of URL matching, especially when the URL RFC seems to allow
them.

There shouldn't be any downside here. It's not a syntactically
significant character in a URL, so we won't be confused about parsing;
we'd have simply rejected such a URL previously (the test here checks
the url code directly, but the obvious user-visible effect would be
failing to match credential.http://foo_bar.example.com.helper, or
similar config in http.<url>.*).

Arguably we'd want to allow tilde ("~") here, too. There's likewise
probably no downside, but I didn't add it simply because it seems like
an even less likely character to appear in a hostname.

Reported-by: Alex Waite <alex@waite.eu>
Signed-off-by: Jeff King <peff@peff.net>
---
I'm on the fence regarding "~". I didn't actually test that things like
curl even allow it (I did for underscore by creating a throwaway DNS
name).

 t/t0110-urlmatch-normalization.sh | 2 +-
 urlmatch.c                        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t0110-urlmatch-normalization.sh b/t/t0110-urlmatch-normalization.sh
index f99529d838..4dc9fecf72 100755
--- a/t/t0110-urlmatch-normalization.sh
+++ b/t/t0110-urlmatch-normalization.sh
@@ -47,7 +47,7 @@ test_expect_success 'url authority' '
 	test-tool urlmatch-normalization "scheme://@host" &&
 	test-tool urlmatch-normalization "scheme://%00@host" &&
 	! test-tool urlmatch-normalization "scheme://%%@host" &&
-	! test-tool urlmatch-normalization "scheme://host_" &&
+	test-tool urlmatch-normalization "scheme://host_" &&
 	test-tool urlmatch-normalization "scheme://user:pass@host/" &&
 	test-tool urlmatch-normalization "scheme://@host/" &&
 	test-tool urlmatch-normalization "scheme://host/" &&
diff --git a/urlmatch.c b/urlmatch.c
index 33a2ccd306..03ad3f30a9 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -5,7 +5,7 @@
 #define URL_DIGIT "0123456789"
 #define URL_ALPHADIGIT URL_ALPHA URL_DIGIT
 #define URL_SCHEME_CHARS URL_ALPHADIGIT "+.-"
-#define URL_HOST_CHARS URL_ALPHADIGIT ".-[:]" /* IPv6 literals need [:] */
+#define URL_HOST_CHARS URL_ALPHADIGIT ".-_[:]" /* IPv6 literals need [:] */
 #define URL_UNSAFE_CHARS " <>\"%{}|\\^`" /* plus 0x00-0x1F,0x7F-0xFF */
 #define URL_GEN_RESERVED ":/?#[]@"
 #define URL_SUB_RESERVED "!$&'()*+,;="
-- 
2.33.0.1387.g4e339dd0af


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-12 14:25 [BUG] credential wildcard does not match hostnames containing an underscore Alex Waite
  2021-10-12 17:47 ` Junio C Hamano
@ 2021-10-12 21:12 ` brian m. carlson
  1 sibling, 0 replies; 17+ messages in thread
From: brian m. carlson @ 2021-10-12 21:12 UTC (permalink / raw)
  To: Alex Waite; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2724 bytes --]

On 2021-10-12 at 14:25:04, Alex Waite wrote:
> What did you do before the bug happened? (Steps to reproduce your issue)
> 
>   I configured my ~/.gitconfig so that git credentials invoke a helper for a
>   subdomain using wildcards. For example:
> 
>   [credential "https://*.example.com"]
>           helper = "/usr/local/bin/custom_helper"
> 
>   This works for all tested subdomains /except/ for those which contain an
>   underscore.
> 
>   authenticates without prompting:
>     git clone https://testA.example.com
>     git clone https://test-b.example.com
> 
>   prompts for authentication:
>     git clone https://test_c.example.com
> 
> 
> What did you expect to happen? (Expected behavior)
> 
>   I expected the pattern matching to work for all resolved URLs.

As mentioned below and elsewhere in this thread, this isn't a valid
hostname, and as a result, this isn't even a valid URL according to RFC
3986 unless you intended it to be resolved in a system other than DNS.

I don't personally see a reason to accept locally specified hostnames
(e.g., in a hosts file) which don't conform to RFC 1123 or which
otherwise don't conform to the DNS standards for hostnames, but perhaps
others can see a good reason to do so.

> Anything else you want to add:
> 
>   If I don't use pattern matching, and instead state the URL explicitly in
>   ~/.gitconfig, it works as expected. For example, the following works:
> 
>   [credential "https://test_c.example.com"]
>           helper = "/usr/local/bin/custom_helper"
> 
>   As part of writing this bug report, I learned that underscores are not valid
>   DNS characters for hostnames (but are valid for other record types, which are
>   largely irrelevant to git).
> 
>   What is notable is that git pattern matching enforces the spec more strictly
>   than without pattern matching (and more strictly than the OS and every DNS
>   server between my system and the authoritative DNS server).
> 
>   At minimum, git should be consistent with itself.
> 
>   As for which behavior is "correct", the question is whether git wishes to
>   follow/enforce the spec tightly, or not get in the way of a real-world oddity
>   that everything else seems to tolerate.

There are a variety of systems which won't accept such a hostname, so I
think at best we should reject such hostnames altogether and prevent
this from working at all, since they are likely to be subtly broken in a
variety of ways and we won't want to try to fix all of the cases in
which things are broken.  To me, this appears to be simply a case where
we should improve error handling.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-12 20:42   ` Jeff King
  2021-10-12 20:53     ` Jeff King
@ 2021-10-12 21:21     ` brian m. carlson
  2021-10-12 21:32       ` Jeff King
  1 sibling, 1 reply; 17+ messages in thread
From: brian m. carlson @ 2021-10-12 21:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Alex Waite, git

[-- Attachment #1: Type: text/plain, Size: 2145 bytes --]

On 2021-10-12 at 20:42:01, Jeff King wrote:
> On Tue, Oct 12, 2021 at 10:47:01AM -0700, Junio C Hamano wrote:
> 
> > "Alex Waite" <alex@waite.eu> writes:
> > 
> > >   This works for all tested subdomains /except/ for those which contain an
> > >   underscore.
> > >
> > >   authenticates without prompting:
> > >     git clone https://testA.example.com
> > >     git clone https://test-b.example.com
> > >
> > >   prompts for authentication:
> > >     git clone https://test_c.example.com
> > 
> > Hmph, given that hostnames cannot have '_' (cf. RFC1123 2.1 "Host
> > Names and Numbers", for example), the third URL seems invalid.  Is
> > this even a bug?
> 
> That may be so for hostnames in general, but URLs seem to allow it. RFC
> 3986 says:
> 
>       host        = IP-literal / IPv4address / reg-name
>       reg-name    = *( unreserved / pct-encoded / sub-delims )
>       unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

That's what the schema says.  The text says this:

  A host identified by a registered name is a sequence of characters
  usually intended for lookup within a locally defined host or service
  name registry, though the URI's scheme-specific semantics may require
  that a specific registry (or fixed name table) be used instead.  The
  most common name registry mechanism is the Domain Name System (DNS).
  A registered name intended for lookup in the DNS uses the syntax
  defined in Section 3.5 of [RFC1034] and Section 2.1 of [RFC1123].

Those RFCs disallow the underscore.

If we plan to allow names that are not registered in the DNS, we should
clearly specify what those are and document how they work in conjunction
with libcurl (which presumably does a DNS lookup on them).  It's my
guess that there are going to be system resolvers which are not going to
accept this syntax in getaddrinfo and as a result, we're going to have
various breakage across systems if we try to accept this.

I'm happy to put in a change to reject these hostnames altogether, but I
won't get to it before Friday.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-12 21:21     ` [BUG] credential wildcard does not match hostnames containing an underscore brian m. carlson
@ 2021-10-12 21:32       ` Jeff King
  2021-10-12 21:48         ` brian m. carlson
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2021-10-12 21:32 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, Alex Waite, git

On Tue, Oct 12, 2021 at 09:21:59PM +0000, brian m. carlson wrote:

> > That may be so for hostnames in general, but URLs seem to allow it. RFC
> > 3986 says:
> > 
> >       host        = IP-literal / IPv4address / reg-name
> >       reg-name    = *( unreserved / pct-encoded / sub-delims )
> >       unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
> 
> That's what the schema says.  The text says this:
> 
>   A host identified by a registered name is a sequence of characters
>   usually intended for lookup within a locally defined host or service
>   name registry, though the URI's scheme-specific semantics may require
>   that a specific registry (or fixed name table) be used instead.  The
>   most common name registry mechanism is the Domain Name System (DNS).
>   A registered name intended for lookup in the DNS uses the syntax
>   defined in Section 3.5 of [RFC1034] and Section 2.1 of [RFC1123].
> 
> Those RFCs disallow the underscore.

Thanks, I skimmed looking for some resolution to this mismatch, but
didn't find that paragraph.

> If we plan to allow names that are not registered in the DNS, we should
> clearly specify what those are and document how they work in conjunction
> with libcurl (which presumably does a DNS lookup on them).  It's my
> guess that there are going to be system resolvers which are not going to
> accept this syntax in getaddrinfo and as a result, we're going to have
> various breakage across systems if we try to accept this.

I don't think this makes anything worse. Either the underscore works or
it doesn't for general use on your system. This just means we'll allow
http.<url>.* config for it.

And it does indeed work fine on my system, via DNS. My stub resolver is
glibc, and curl itself is fine with it. The server side answering the
query was djbdns (tinydns, with dnscache as a recursive resolver in
between). I could believe that other implementations may be more strict,
though.

> I'm happy to put in a change to reject these hostnames altogether, but I
> won't get to it before Friday.

IMHO _that_ is the thing that will produce breakage. People who are not
using URL-specific config but are happily using foo_bar.example.com will
now get a failure for something that used to work.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-12 21:32       ` Jeff King
@ 2021-10-12 21:48         ` brian m. carlson
  2021-10-12 21:55           ` Jeff King
  2021-10-12 21:57           ` brian m. carlson
  0 siblings, 2 replies; 17+ messages in thread
From: brian m. carlson @ 2021-10-12 21:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Alex Waite, git

[-- Attachment #1: Type: text/plain, Size: 2136 bytes --]

On 2021-10-12 at 21:32:24, Jeff King wrote:
> On Tue, Oct 12, 2021 at 09:21:59PM +0000, brian m. carlson wrote:
> > I'm happy to put in a change to reject these hostnames altogether, but I
> > won't get to it before Friday.
> 
> IMHO _that_ is the thing that will produce breakage. People who are not
> using URL-specific config but are happily using foo_bar.example.com will
> now get a failure for something that used to work.

There's a well-known bug on Tumblr, where it allocated hostnames for
users that happened to start or end with a dash, which is not allowed.
This worked great on Windows systems, which don't care, but every Unix
system was broken.

When we decide to allow this particular case, we end up with the problem
that people won't see consistent behavior across systems and tools.  It
may be that libgit2 rejects this, or Git LFS, or some other tool in the
ecosystem, and then we'll have people complaining that "Well, Git
accepts it, so why don't you?" which I am not eager to see.  I, for
example, have absolutely zero control over the URL parsing library
that's used in Git LFS, and the Go team has demonstrated that they don't
care one bit about supporting Git-related tooling.  That doesn't even
include a variety of proprietary Unix systems that might have different
rules or resolvers.

I am also not eager to see additional bug reports for this case that
will need to be fixed under the precedent that we accepted a patch to
fix it before.  If there's a concern that rejecting these hostnames
altogether would break existing users, then we can just do nothing, and
tell users that their syntax is not valid and they need to fix their
hostnames.  This rule has been documented since before ISO standardized
C, so it shouldn't be new to anyone deploying systems or DNS.

So I'm fine with doing nothing, or rejecting these hostnames, but not
allowing more lenient syntax, because it will probably be broken
somewhere and we (or someone else in the ecosystem) will have to deal
with it again down the line.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-12 21:48         ` brian m. carlson
@ 2021-10-12 21:55           ` Jeff King
  2021-10-12 21:57           ` brian m. carlson
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff King @ 2021-10-12 21:55 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, Alex Waite, git

On Tue, Oct 12, 2021 at 09:48:33PM +0000, brian m. carlson wrote:

> I am also not eager to see additional bug reports for this case that
> will need to be fixed under the precedent that we accepted a patch to
> fix it before.  If there's a concern that rejecting these hostnames
> altogether would break existing users, then we can just do nothing, and
> tell users that their syntax is not valid and they need to fix their
> hostnames.  This rule has been documented since before ISO standardized
> C, so it shouldn't be new to anyone deploying systems or DNS.
> 
> So I'm fine with doing nothing, or rejecting these hostnames, but not
> allowing more lenient syntax, because it will probably be broken
> somewhere and we (or someone else in the ecosystem) will have to deal
> with it again down the line.

FWIW, I'm fine with doing nothing. But it will still come back further
down the line, because it _does_ work most of the way, and has for a
long time.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-12 21:48         ` brian m. carlson
  2021-10-12 21:55           ` Jeff King
@ 2021-10-12 21:57           ` brian m. carlson
  2021-10-12 22:25             ` Aaron Schrab
  1 sibling, 1 reply; 17+ messages in thread
From: brian m. carlson @ 2021-10-12 21:57 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano, Alex Waite, git

[-- Attachment #1: Type: text/plain, Size: 2429 bytes --]

On 2021-10-12 at 21:48:33, brian m. carlson wrote:
> On 2021-10-12 at 21:32:24, Jeff King wrote:
> > On Tue, Oct 12, 2021 at 09:21:59PM +0000, brian m. carlson wrote:
> > > I'm happy to put in a change to reject these hostnames altogether, but I
> > > won't get to it before Friday.
> > 
> > IMHO _that_ is the thing that will produce breakage. People who are not
> > using URL-specific config but are happily using foo_bar.example.com will
> > now get a failure for something that used to work.
> 
> There's a well-known bug on Tumblr, where it allocated hostnames for
> users that happened to start or end with a dash, which is not allowed.
> This worked great on Windows systems, which don't care, but every Unix
> system was broken.
> 
> When we decide to allow this particular case, we end up with the problem
> that people won't see consistent behavior across systems and tools.  It
> may be that libgit2 rejects this, or Git LFS, or some other tool in the
> ecosystem, and then we'll have people complaining that "Well, Git
> accepts it, so why don't you?" which I am not eager to see.  I, for
> example, have absolutely zero control over the URL parsing library
> that's used in Git LFS, and the Go team has demonstrated that they don't
> care one bit about supporting Git-related tooling.  That doesn't even
> include a variety of proprietary Unix systems that might have different
> rules or resolvers.
> 
> I am also not eager to see additional bug reports for this case that
> will need to be fixed under the precedent that we accepted a patch to
> fix it before.  If there's a concern that rejecting these hostnames
> altogether would break existing users, then we can just do nothing, and
> tell users that their syntax is not valid and they need to fix their
> hostnames.  This rule has been documented since before ISO standardized
> C, so it shouldn't be new to anyone deploying systems or DNS.

I also just checked, and RFC 5280 specifies the rules for RFC 1123
regarding host names in certificates.  So even if we did accept this, no
publicly trusted CA could issue a certificate for such a domain, because
to do so would be misissuance.  So this at best could help people who
are either using plain HTTP or an internal CA using broken tools,
neither of which I think argue in favor of supporting this.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-12 21:57           ` brian m. carlson
@ 2021-10-12 22:25             ` Aaron Schrab
  2021-10-13 16:21               ` Alex Waite
  0 siblings, 1 reply; 17+ messages in thread
From: Aaron Schrab @ 2021-10-12 22:25 UTC (permalink / raw)
  To: brian m. carlson, Jeff King, Junio C Hamano, Alex Waite, git

[-- Attachment #1: Type: text/plain, Size: 541 bytes --]

At 21:57 +0000 12 Oct 2021, "brian m. carlson" <sandals@crustytoothpaste.net> wrote:
>I also just checked, and RFC 5280 specifies the rules for RFC 1123
>regarding host names in certificates.  So even if we did accept this, no
>publicly trusted CA could issue a certificate for such a domain, because
>to do so would be misissuance.  So this at best could help people who
>are either using plain HTTP or an internal CA using broken tools,
>neither of which I think argue in favor of supporting this.

Or people using a wildcard certificate.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 898 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-12 22:25             ` Aaron Schrab
@ 2021-10-13 16:21               ` Alex Waite
  2021-10-14 11:43                 ` Philip Oakley
  0 siblings, 1 reply; 17+ messages in thread
From: Alex Waite @ 2021-10-13 16:21 UTC (permalink / raw)
  To: Aaron Schrab, brian m. carlson, Jeff King, Junio C Hamano, git

2021.10.13, aaron@schrab.com:
> At 21:57 +0000 12 Oct 2021, "brian m. carlson" 
> <sandals@crustytoothpaste.net> wrote:
>>I also just checked, and RFC 5280 specifies the rules for RFC 1123
>>regarding host names in certificates.  So even if we did accept this, no
>>publicly trusted CA could issue a certificate for such a domain, because
>>to do so would be misissuance.  So this at best could help people who
>>are either using plain HTTP or an internal CA using broken tools,
>>neither of which I think argue in favor of supporting this.
>
> Or people using a wildcard certificate.

I didn't expect to kick off such a big discussion with this bug. ;-)

Just to add my 2 cents: the environment where I bumped into this is using a wildcard cert. It's among a fleet of (busy) internal domains that host data for scientific analysis. Lots of tools talk with those domains, and this is the first time I've been made aware of something struggling with the domains containing an underscore.

I'm not saying whether git should or should not change its behavior here. But the above is why I was surprised to learn the an underscore is not valid. Because everything (DNS servers, dig, Apache, and git itself) seems to happily use it.

In my view, the primary bug is how difficult it was to debug what was going wrong. This is most easily solved by improving the git docs to specify which characters will be matched. Even better if GIT_TRACE (or something similar) can inform/warn the user about matching.

In any case, thanks everyone for looking into this. :-)

---Alex

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [BUG] credential wildcard does not match hostnames containing an underscore
  2021-10-13 16:21               ` Alex Waite
@ 2021-10-14 11:43                 ` Philip Oakley
  0 siblings, 0 replies; 17+ messages in thread
From: Philip Oakley @ 2021-10-14 11:43 UTC (permalink / raw)
  To: Alex Waite, Aaron Schrab, brian m. carlson, Jeff King,
	Junio C Hamano, git

On 13/10/2021 17:21, Alex Waite wrote:
> In my view, the primary bug is how difficult it was to debug what was going wrong. This is most easily solved by improving the git docs to specify which characters will be matched. Even better if GIT_TRACE (or something similar) can inform/warn the user about matching.
It did sound to me that the documentation route was a way out of the
confusing situation.
--
Philip

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-10-14 11:44 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-12 14:25 [BUG] credential wildcard does not match hostnames containing an underscore Alex Waite
2021-10-12 17:47 ` Junio C Hamano
2021-10-12 18:00   ` Alex Waite
2021-10-12 18:28     ` Junio C Hamano
2021-10-12 20:45     ` Jeff King
2021-10-12 20:42   ` Jeff King
2021-10-12 20:53     ` Jeff King
2021-10-12 21:12       ` [PATCH] urlmatch: add underscore to URL_HOST_CHARS Jeff King
2021-10-12 21:21     ` [BUG] credential wildcard does not match hostnames containing an underscore brian m. carlson
2021-10-12 21:32       ` Jeff King
2021-10-12 21:48         ` brian m. carlson
2021-10-12 21:55           ` Jeff King
2021-10-12 21:57           ` brian m. carlson
2021-10-12 22:25             ` Aaron Schrab
2021-10-13 16:21               ` Alex Waite
2021-10-14 11:43                 ` Philip Oakley
2021-10-12 21:12 ` brian m. carlson

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).