git@vger.kernel.org mailing list mirror (one of many)
* Regarding the depreciation of ssh+git/git+ssh protocols
@ 2021-03-15 16:27 Drew DeVault
  2021-03-15 17:56 ` Jonathan Nieder
  0 siblings, 1 reply; 28+ messages in thread
From: Drew DeVault @ 2021-03-15 16:27 UTC (permalink / raw)
  To: git

c05186cc38ca4605bff1f275619d7d0faeaf2fa5 introduced ssh+git, and
07c7782cc8e1f37c7255dfc69c5d0e3f4d4d728c admitted this was a mistake. I
argue that it was not a mistake.

The main use-case for the git-specific protocol is to disambiguate with
other version control systems which also use SSH (or HTTPS), such as
Mercurial, or simply downloading a tarball over HTTP.

Some things that are affected by this include package manager source
lists and configurations for CI tooling (the latter being my main
interest in this).

A lot of software already recognizes ssh+git or https+git for this
purpose, and in the latter case, rewrites it to https before handing it
off to git.

I would like to see this feature un-disowned, and https+git support
added as well.

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-15 16:27 Regarding the depreciation of ssh+git/git+ssh protocols Drew DeVault
@ 2021-03-15 17:56 ` Jonathan Nieder
  2021-03-15 18:14   ` Drew DeVault
  0 siblings, 1 reply; 28+ messages in thread
From: Jonathan Nieder @ 2021-03-15 17:56 UTC (permalink / raw)
  To: Drew DeVault; +Cc: git

Hi,

Drew DeVault wrote:

> c05186cc38ca4605bff1f275619d7d0faeaf2fa5 introduced ssh+git, and
> 07c7782cc8e1f37c7255dfc69c5d0e3f4d4d728c admitted this was a mistake. I
> argue that it was not a mistake.
>
> The main use-case for the git-specific protocol is to disambiguate with
> other version control systems which also use SSH (or HTTPS), such as
> Mercurial, or simply downloading a tarball over HTTP.

Following the trail of links, I reach
https://public-inbox.org/git/CA+55aFyWqK0bu2V1SYagrYCBGpj0=2orobK2vT-KRkqpq=kgtw@mail.gmail.com/,
but that email mostly just makes assertions rather than explaining the
rationale.  So it's probably worth talking it through now.

> Some things that are affected by this include package manager source
> lists and configurations for CI tooling (the latter being my main
> interest in this).

The original idea of URI schemes like svn+https is that we can treat
these version control URLs as part of the general category of uniform
resource identifiers --- in other words, you might be able to type
them in a browser's URL bar, browse the content of a repository, use
an <img> tag to point to a file within a version control repository,
and so on.

_That_ idea, at least, does not work all that well.  There's not an
equivalent to a fragment identifier to refer to a particular file
within a repository.  Further, if I have an https URL referring to a
Git repository, I'm better off viewing it without a "git+" prefix
because then I can see the content of the repository using a web based
repository browser.

In other words, a "Git URL" is not a URI at all; it's simply the
identifier that Git uses to clone a repository.  A package manager or
CI tool is perfectly within its rights to provide its own naming
scheme for sources, such as "git::https://example.com/path/to/repo" or
even the same with "git+" prefix; or it can use an https URL and infer
from the content it gets there what version control system it uses.

The missing piece is an HTTP header to unambiguously mark that URL as
being usable by Git.  I'm not aware of a standard way to do that; e.g.
golang's "go get" tool[*] uses a custom 'meta name="go-import"' HTML
element.
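
For illustration, here is a minimal Python sketch of how a client might
read such an element after fetching a page. It assumes the content
attribute holds three space-separated fields (import prefix, VCS name,
repository root), as the go documentation linked below describes; the
class and function names and the sample values are hypothetical.

```python
from html.parser import HTMLParser


class GoImportParser(HTMLParser):
    """Collects (prefix, vcs, repo_root) tuples from go-import meta tags."""

    def __init__(self):
        super().__init__()
        self.imports = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "go-import":
            fields = (attributes.get("content") or "").split()
            # Ignore malformed tags rather than guessing at their meaning.
            if len(fields) == 3:
                self.imports.append(tuple(fields))


def find_go_imports(html):
    """Return every well-formed go-import declaration found in the page."""
    parser = GoImportParser()
    parser.feed(html)
    return parser.imports
```

Note that this only works after a network request has already been
made, since the information lives in the response body.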

Thanks and hope that helps,
Jonathan

[*] https://golang.org/cmd/go/#hdr-Remote_import_paths

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-15 17:56 ` Jonathan Nieder
@ 2021-03-15 18:14   ` Drew DeVault
  2021-03-15 22:01     ` brian m. carlson
  0 siblings, 1 reply; 28+ messages in thread
From: Drew DeVault @ 2021-03-15 18:14 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon Mar 15, 2021 at 1:56 PM EDT, Jonathan Nieder wrote:
> The original idea of URI schemes like svn+https is that we can treat
> these version control URLs as part of the general category of uniform
> resource identifiers --- in other words, you might be able to type
> them in a browser's URL bar, browse the content of a repository, use
> an <img> tag to point to a file within a version control repository,
> and so on.

That was indeed the original idea, but I think it's fair to assume that
it's evolved well beyond this. There are many schemes in common use
which don't meet these criteria, such as mailto:, magnet:, bitcoin:,
postgresql:, and so on. None of these examples make productive use of
all of the URI, such as your fragment example, but they still make
productive use of parts of the URI.

To my mind, the contemporary purpose of a URI is to:

1. Identify a resource
2. Identify the protocol used to access it
3. Store domain-specific information that an implementation of that
   protocol can use to accomplish something

> The missing piece is an HTTP header to unambiguously mark that URL as
> being usable by Git. I'm not aware of a standard way to do that; e.g.
> golang's "go get" tool[*] uses a custom 'meta name="go-import"' HTML
> element.

I don't agree that this is the case. It would be much better to be able
to identify a URL as being useful for git without having to perform a
network request to find out.

A standard approach to the go-import kind of deal is also a meritorious
idea, but a separate matter - and one I'm also involved in trying to
address!

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-15 18:14   ` Drew DeVault
@ 2021-03-15 22:01     ` brian m. carlson
  2021-03-16  0:52       ` Drew DeVault
  2021-03-16  0:54       ` Drew DeVault
  0 siblings, 2 replies; 28+ messages in thread
From: brian m. carlson @ 2021-03-15 22:01 UTC (permalink / raw)
  To: Drew DeVault; +Cc: Jonathan Nieder, git

On 2021-03-15 at 18:14:31, Drew DeVault wrote:
> On Mon Mar 15, 2021 at 1:56 PM EDT, Jonathan Nieder wrote:
> > The missing piece is an HTTP header to unambiguously mark that URL as
> > being usable by Git. I'm not aware of a standard way to do that; e.g.
> > golang's "go get" tool[*] uses a custom 'meta name="go-import"' HTML
> > element.
> 
> I don't agree that this is the case. It would be much better to be able
> to identify a URL as being useful for git without having to perform a
> network request to find out.

But you can't find whether a URL is useful for a particular purpose in
general.  For example, if I see an HTTPS URL, that tells me nothing
about the resources that one might find at that URL.

One might find:

* A plain dumb Git remote.
* A plain smart Git remote.
* A smart Git remote and Git LFS support.
* A human-readable text response.
* A machine-readable JSON response.
* A binary document which is intended to be human intelligible.
* Something else.
* Nothing at all.

In addition, it's possible that the data you want exists, but is not
suitable for you in whatever way (not in a language you understand, in
an unsuitable format, is illegal or offensive, etc.), or you are not
authorized to access it.  You can't know any of this without making some
sort of request.

All a URL can tell you is literally where a resource is located.  Even
if we saw a URL that used the hypothetical https+git as the scheme, we
couldn't determine whether we could access the data, whether the data
even still exists, or, even if we knew all of those things, whether it
was using the smart or dumb protocol, without making a request.

So I don't think this is a thing we can do, simply because in general
URLs aren't suitable for sharing this kind of information.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-15 22:01     ` brian m. carlson
@ 2021-03-16  0:52       ` Drew DeVault
  2021-03-16  1:02         ` Jonathan Nieder
  2021-03-16  0:54       ` Drew DeVault
  1 sibling, 1 reply; 28+ messages in thread
From: Drew DeVault @ 2021-03-16  0:52 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Jonathan Nieder, git

On Mon Mar 15, 2021 at 6:01 PM EDT, brian m. carlson wrote:
> But you can't find whether a URL is useful for a particular purpose in
> general. For example, if I see an HTTPS URL, that tells me nothing
> about the resources that one might find at that URL.
>
> In addition, it's possible that the data you want exists, but is not
> suitable for you in whatever way (not in a language you understand, in
> an unsuitable format, is illegal or offensive, etc.), or you are not
> authorized to access it. You can't know any of this without making some
> sort of request.
>
> All a URL can tell you is literally where a resource is located. Even
> if we saw a URL that used the hypothetical https+git as the scheme, we
> couldn't determine whether we could access the data, whether the data
> even still exists, or, even if we knew all of those things, whether it
> was using the smart or dumb protocol, without making a request.

What we know is that we can pass it to git to deal with, and then git
will determine the next steps. It will negotiate dumb or smart HTTP
in-band, deal with errors that arise, and so on. It signals that git is
the tool best equipped to deal with the situation, and without that we'd
end up guessing.

> So I don't think this is a thing we can do, simply because in general
> URLs aren't suitable for sharing this kind of information.

That's simply not true. They are quite capable at this task, and are
fulfilling this duty for a wide variety of applications today.

I don't really understand the disconnect here. No, URLs are not magic,
but they are perfectly sufficient for this use-case.

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-15 22:01     ` brian m. carlson
  2021-03-16  0:52       ` Drew DeVault
@ 2021-03-16  0:54       ` Drew DeVault
  1 sibling, 0 replies; 28+ messages in thread
From: Drew DeVault @ 2021-03-16  0:54 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Jonathan Nieder, git

On Mon Mar 15, 2021 at 6:01 PM EDT, brian m. carlson wrote:
> All a URL can tell you is literally where a resource is located.

To further clarify: a URL tells you not only where to find a resource,
but how to access it. This is the purpose of the scheme field.

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-16  0:52       ` Drew DeVault
@ 2021-03-16  1:02         ` Jonathan Nieder
  2021-03-16  1:05           ` Drew DeVault
  2021-03-16  4:38           ` Eli Schwartz
  0 siblings, 2 replies; 28+ messages in thread
From: Jonathan Nieder @ 2021-03-16  1:02 UTC (permalink / raw)
  To: Drew DeVault; +Cc: brian m. carlson, git

Drew DeVault wrote:
> On Mon Mar 15, 2021 at 6:01 PM EDT, brian m. carlson wrote:

>> So I don't think this is a thing we can do, simply because in general
>> URLs aren't suitable for sharing this kind of information.
>
> That's simply not true. They are quite capable at this task, and are
> fulfilling this duty for a wide variety of applications today.
>
> I don't really understand the disconnect here. No, URLs are not magic,
> but they are perfectly sufficient for this use-case.

I'm not sure it's a disconnect; instead, it just looks like we
disagree.  That said, with more details about the use case it might be
possible to sway me in another direction.

To maintain the URI analogy: the URI does not tell me the content-type
of what I can access from there.  Until I know that content-type, I
may not know what the best tool is to access it.

The root of the disagreement, though, is "Git URLs" looking like a URI
in the first place.  They're not meant to be universal at all.  They
are specifically for Git.

Thanks,
Jonathan

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-16  1:02         ` Jonathan Nieder
@ 2021-03-16  1:05           ` Drew DeVault
  2021-03-16 21:23             ` Jeff King
  2021-03-16  4:38           ` Eli Schwartz
  1 sibling, 1 reply; 28+ messages in thread
From: Drew DeVault @ 2021-03-16  1:05 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: brian m. carlson, git

On Mon Mar 15, 2021 at 9:02 PM EDT, Jonathan Nieder wrote:
> I'm not sure it's a disconnect; instead, it just looks like we
> disagree. That said, with more details about the use case it might be
> possible to sway me in another direction.
>
> To maintain the URI analogy: the URI does not tell me the content-type
> of what I can access from there. Until I know that content-type, I
> may not know what the best tool is to access it.

git isn't a content type, it's a protocol. git over HTTP or git over SSH
is a protocol in its own right, distinct from these base protocols, in
the same sense that SSH lives on top of TCP which lives on top of IP
which is transmitted to your computer over ethernet or 802.11. It's
turtles all the way down.

> The root of the disagreement, though, is "Git URLs" looking like a URI
> in the first place. They're not meant to be universal at all. They
> are specifically for Git.

At worst I would call this a happy coincidence. We have this convenient
universal format at our disposal, and we would be wise to take advantage
of it. Rejecting it on the premise that we never wanted to have it
doesn't make sense when we consider that (1) we do have it and (2) it
can be of good use to us.

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-16  1:02         ` Jonathan Nieder
  2021-03-16  1:05           ` Drew DeVault
@ 2021-03-16  4:38           ` Eli Schwartz
  2021-03-16 11:54             ` brian m. carlson
  1 sibling, 1 reply; 28+ messages in thread
From: Eli Schwartz @ 2021-03-16  4:38 UTC (permalink / raw)
  To: Jonathan Nieder, Drew DeVault; +Cc: brian m. carlson, git


On 3/15/21 9:02 PM, Jonathan Nieder wrote:
> Drew DeVault wrote:
>> On Mon Mar 15, 2021 at 6:01 PM EDT, brian m. carlson wrote:
> 
>>> So I don't think this is a thing we can do, simply because in general
>>> URLs aren't suitable for sharing this kind of information.
>>
>> That's simply not true. They are quite capable at this task, and are
>> fulfilling this duty for a wide variety of applications today.
>>
>> I don't really understand the disconnect here. No, URLs are not magic,
>> but they are perfectly sufficient for this use-case.
> 
> I'm not sure it's a disconnect; instead, it just looks like we
> disagree.  That said, with more details about the use case it might be
> possible to sway me in another direction.
> 
> To maintain the URI analogy: the URI does not tell me the content-type
> of what I can access from there.  Until I know that content-type, I
> may not know what the best tool is to access it.

This is a pretty odd argument. Drew is recommending that the URI
"git+https://" tells a person the right tool to obtain the resource ("do
I use curl/wget, or git clone"), and now you're arguing that it is
somehow insufficient because "git+https://" doesn't tell the person
which media viewer application is best suited to display the contents
after it's been downloaded and no longer has an associated URI at all
(but does exchange that particular variety of metadata for a mimetype).

Why does this even matter? Again, the point here is the assertion by
Drew that, for the purpose of listing a manifest of remotely fetchable
resources, he sees a benefit to having some standard format for the URI
itself, describing how it's intended to be fetched.

- ftp:// -> use the `ftp` tool
- scp:// -> use the `scp` tool
- http:// -> use the `wget` tool
- git+http:// -> use the `git` tool

But instead of needing every program with a git integration to
reimplement "recognize git+http and do substring prefix removal before
passing to git", the suggestion is for git to do this.
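
As a rough illustration of that per-tool workaround, here is a minimal
Python sketch. The TOOLS table and the pick_tool name are hypothetical;
the table simply mirrors the list above (with https added), and real
package managers will differ in the details.

```python
from urllib.parse import urlsplit

# Hypothetical scheme-to-tool table mirroring the list above.
TOOLS = {"ftp": "ftp", "scp": "scp", "http": "wget", "https": "wget"}


def pick_tool(url):
    """Return (tool, url_to_pass) for one manifest entry."""
    scheme = urlsplit(url).scheme
    if scheme.startswith("git+"):
        # The step every integrating program reimplements today:
        # strip the "git+" prefix before handing the URL to git.
        return "git", url[len("git+"):]
    # Raises KeyError for schemes the table does not know about.
    return TOOLS[scheme], url
```

With this sketch, pick_tool("git+http://example.com/repo.git") selects
git and passes it "http://example.com/repo.git", while plain http://
entries go to wget unchanged.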

There is definitely a (strange) disconnect here.

-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User


* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-16  4:38           ` Eli Schwartz
@ 2021-03-16 11:54             ` brian m. carlson
  2021-03-16 14:21               ` Drew DeVault
  2021-03-16 18:03               ` Eli Schwartz
  0 siblings, 2 replies; 28+ messages in thread
From: brian m. carlson @ 2021-03-16 11:54 UTC (permalink / raw)
  To: Eli Schwartz; +Cc: Jonathan Nieder, Drew DeVault, git

On 2021-03-16 at 04:38:08, Eli Schwartz wrote:
> Why does this even matter? Again, the point here is the assertion by
> Drew that, for the purpose of listing a manifest of remotely fetchable
> resources, he sees a benefit to having some standard format for the URI
> itself, describing how it's intended to be fetched.
> 
> - ftp:// -> use the `ftp` tool
> - scp:// -> use the `scp` tool
> - http:// -> use the `wget` tool
> - git+http:// -> use the `git` tool
> 
> But instead of needing every program with a git integration to
> reimplement "recognize git+http and do substring prefix removal before
> passing to git", the suggestion is for git to do this.

I believe this construct is nonstandard.  It is better to use standard
URL syntax when possible because it makes it much, much easier for
people to use standard tooling to parse and handle URLs.  Such tooling
may have special cases for the HTTP syntax that it doesn't use in MAILTO
syntax, so it's important to pick something that works automatically.

It's difficult enough to handle parsing of SSH specifications and
distinguish them uniformly from Windows paths (think of an alias named
"c"), so I'd prefer we didn't add additional complexity to handle this
case.

Lest you think that only Git has to handle parsing these, the Git LFS
project (and every other implementation compatible with Git) has to
handle parsing them as well (and related things like url.*.insteadOf),
and providing bug-for-bug compatible behavior is generally a hassle.
We've run into numerous problems where things aren't exactly the same,
and making things more complex by adding an esoteric syntax that few
users are likely to use isn't helping.  Despite the fact that ssh+git is
specified as deprecated, we had people expect it to magically work and
had to support it in Git LFS.

So I'm very much opposed to adding, expanding, or giving any sort of
official blessing to this syntax, especially when there are perfectly
valid and equivalent schemes that are already blessed and registered
with IANA.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-16 11:54             ` brian m. carlson
@ 2021-03-16 14:21               ` Drew DeVault
  2021-03-16 21:28                 ` Jeff King
                                   ` (2 more replies)
  2021-03-16 18:03               ` Eli Schwartz
  1 sibling, 3 replies; 28+ messages in thread
From: Drew DeVault @ 2021-03-16 14:21 UTC (permalink / raw)
  To: brian m. carlson, Eli Schwartz; +Cc: Jonathan Nieder, git

On Tue Mar 16, 2021 at 7:54 AM EDT, brian m. carlson wrote:
> I believe this construct is nonstandard. It is better to use standard
> URL syntax when possible because it makes it much, much easier for
> people to use standard tooling to parse and handle URLs. Such tooling
> may have special cases for the HTTP syntax that it doesn't use in MAILTO
> syntax, so it's important to pick something that works automatically.

It is standard - RFC 3986 section 3.1 permits the + character in
URI schemes. The use of protocol "composition", e.g. git+https, is a
convention, but not a standard.
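
To make the grammar point concrete, here is a small Python sketch. The
regular expression transcribes the scheme rule from RFC 3986 section 3.1
(a letter followed by letters, digits, "+", "-", or "."); the
is_valid_scheme name is only illustrative.

```python
import re
from urllib.parse import urlsplit

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, 3.1)
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme):
    """Check a candidate scheme against the RFC 3986 grammar."""
    return SCHEME_RE.fullmatch(scheme) is not None


# A generic parser such as Python's urlsplit accepts the composed scheme
# and still splits out the host and path as usual.
parts = urlsplit("git+https://example.com/repo.git")
```

Here is_valid_scheme("git+https") holds, and parts.scheme is "git+https"
with parts.netloc "example.com". A scheme may not begin with "+", so
"+git" is rejected.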
>
> So I'm very much opposed to adding, expanding, or giving any sort of
> official blessing to this syntax, especially when there are perfectly
> valid and equivalent schemes that are already blessed and registered
> with IANA.

This convention is blessed by the IANA, given that they have
accepted protocol registrations which use this convention:

https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

> It's difficult enough to handle parsing of SSH specifications and
> distinguish them uniformly from Windows paths (think of an alias named
> "c"), so I'd prefer we didn't add additional complexity to handle this
> case.

There's no additional complexity here: git remotes are URIs, and any
implementation which parses them as such already deals with this case
correctly. Any implementation which doesn't may face all kinds of
problems as a consequence: SSH without a user specified, HTTPS with
Basic auth in the URI username/password fields (or just the password,
which is also allowed), and so on. Any sane and correct implementation
is pulling in a URI parser here, and if not, I don't think it's fair for
git to constrain itself in order to work around some other project's
bugs.

> Lest you think that only Git has to handle parsing these

I don't, given that my argument stems from making it easier for
third-party applications to deal with git URIs :)

> Despite the fact that ssh+git is specified as deprecated, we had
> people expect it to magically work and had to support it in Git LFS.

Aye, people do expect it to work. The problem is not going to go away.

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-16 11:54             ` brian m. carlson
  2021-03-16 14:21               ` Drew DeVault
@ 2021-03-16 18:03               ` Eli Schwartz
  2021-03-17 22:15                 ` Jonathan Nieder
  1 sibling, 1 reply; 28+ messages in thread
From: Eli Schwartz @ 2021-03-16 18:03 UTC (permalink / raw)
  To: brian m. carlson, Jonathan Nieder, Drew DeVault, git


On 3/16/21 7:54 AM, brian m. carlson wrote:
> On 2021-03-16 at 04:38:08, Eli Schwartz wrote:
>> Why does this even matter? Again, the point here is the assertion by
>> Drew that, for the purpose of listing a manifest of remotely fetchable
>> resources, he sees a benefit to having some standard format for the URI
>> itself, describing how it's intended to be fetched.
>>
>> - ftp:// -> use the `ftp` tool
>> - scp:// -> use the `scp` tool
>> - http:// -> use the `wget` tool
>> - git+http:// -> use the `git` tool
>>
>> But instead of needing every program with a git integration to
>> reimplement "recognize git+http and do substring prefix removal before
>> passing to git", the suggestion is for git to do this.
> 
> I believe this construct is nonstandard.  It is better to use standard
> URL syntax when possible because it makes it much, much easier for
> people to use standard tooling to parse and handle URLs.  Such tooling
> may have special cases for the HTTP syntax that it doesn't use in MAILTO
> syntax, so it's important to pick something that works automatically.
> 
> It's difficult enough to handle parsing of SSH specifications and
> distinguish them uniformly from Windows paths (think of an alias named
> "c"), so I'd prefer we didn't add additional complexity to handle this
> case.
> 
> Lest you think that only Git has to handle parsing these, the Git LFS
> project (and every other implementation compatible with Git) has to
> handle parsing them as well (and related things like url.*.insteadOf),
> and providing bug-for-bug compatible behavior is generally a hassle.
> We've run into numerous problems where things aren't exactly the same,
> and making things more complex by adding an esoteric syntax that few
> users are likely to use isn't helping.  Despite the fact that ssh+git is
> specified as deprecated, we had people expect it to magically work and
> had to support it in Git LFS.
> 
> So I'm very much opposed to adding, expanding, or giving any sort of
> official blessing to this syntax, especially when there are perfectly
> valid and equivalent schemes that are already blessed and registered
> with IANA.

Suddenly I'm hearing a much more reasonable response than "but it
doesn't give me content-type so I can't know which media application is
capable of opening it".

(I'm not especially attached to the proposal. I'm a maintainer for one
of these package managers that currently special-case git+https?:// and
rewrite the url that git sees, which has worked adequately for a long
time. However, I figured if you want to reject this proposal, reject it
for a good reason...)

-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User


* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-16  1:05           ` Drew DeVault
@ 2021-03-16 21:23             ` Jeff King
  2021-03-17 14:49               ` Drew DeVault
  2021-03-18 21:30               ` Junio C Hamano
  0 siblings, 2 replies; 28+ messages in thread
From: Jeff King @ 2021-03-16 21:23 UTC (permalink / raw)
  To: Drew DeVault; +Cc: Jonathan Nieder, brian m. carlson, git

On Mon, Mar 15, 2021 at 09:05:34PM -0400, Drew DeVault wrote:

> On Mon Mar 15, 2021 at 9:02 PM EDT, Jonathan Nieder wrote:
> > I'm not sure it's a disconnect; instead, it just looks like we
> > disagree. That said, with more details about the use case it might be
> > possible to sway me in another direction.
> >
> > To maintain the URI analogy: the URI does not tell me the content-type
> > of what I can access from there. Until I know that content-type, I
> > may not know what the best tool is to access it.
> 
> git isn't a content type, it's a protocol. git over HTTP or git over SSH
> is a protocol in its own right, distinct from these base protocols, in
> the same sense that SSH lives on top of TCP which lives on top of IP
> which is transmitted to your computer over ethernet or 802.11. It's
> turtles all the way down.

I think this is the key observation. A browser can access an HTTP URL,
and then based on the content type, decide what to do with the result.
But one cannot do so with a git-over-http URL. Git will not even
directly access the resource specified in the URL! It will construct a
related one (by appending "info/refs" and a "service" field) and
request that.
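
A minimal Python sketch of that derived URL, assuming the smart HTTP
discovery request for a fetch ("git-upload-pack" is the service name
used for fetching; pushes use "git-receive-pack"). The discovery_url
name is hypothetical, and the sketch ignores repository URLs that
already carry a query string.

```python
def discovery_url(repo_url, service="git-upload-pack"):
    """Build the info/refs URL a client requests instead of repo_url itself."""
    return repo_url.rstrip("/") + "/info/refs?service=" + service
```

So for "https://example.com/repo.git" the first request goes to
"https://example.com/repo.git/info/refs?service=git-upload-pack", not to
the URL the user typed.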

So you definitely need to "somehow" know that a URL is meant to be used
with Git. And that makes me somewhat sympathetic to your request.

The downsides I see are:

  - one of the advantages of straight http:// URLs is that they can be
    accessed by multiple tools. Most "forge" tools let you use the same
    URL both for getting a human-readable page in a browser, as well as
    accessing the repository with the Git CLI. I'd hate to see https+git
    URLs become common, because they add friction there (though simply
    supporting them at all gives people the choice of whether to use
    them).

  - I'm also sympathetic to brian's point that there's a wider
    ecosystem. It's not just "git" that needs to learn them. It's jgit,
    and libgit2, and many tools that work with git remotes.

-Peff

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-16 14:21               ` Drew DeVault
@ 2021-03-16 21:28                 ` Jeff King
  2021-03-17 14:50                   ` Drew DeVault
  2021-03-17  0:45                 ` Jakub Narębski
  2021-03-17 22:06                 ` brian m. carlson
  2 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2021-03-16 21:28 UTC (permalink / raw)
  To: Drew DeVault; +Cc: brian m. carlson, Eli Schwartz, Jonathan Nieder, git

On Tue, Mar 16, 2021 at 10:21:13AM -0400, Drew DeVault wrote:

> > It's difficult enough to handle parsing of SSH specifications and
> > distinguish them uniformly from Windows paths (think of an alias named
> > "c"), so I'd prefer we didn't add additional complexity to handle this
> > case.
> 
> There's no additional complexity here: git remotes are URIs, and any
> implementation which parses them as such already deals with this case
> correctly. Any implementation which doesn't may face all kinds of
> problems as a consequence: SSH without a user specified, HTTPS with
> Basic auth in the URI username/password fields (or just the password,
> which is also allowed), and so on. Any sane and correct implementation
> is pulling in a URI parser here, and if not, I don't think it's fair for
> git to constrain itself in order to work around some other project's
> bugs.

Git remotes are most definitely not just URIs. Some valid remotes are:
".", "foo", "/tmp/foo", "c:\foo", "example.com:foo". The parser inside
Git has rules to distinguish these from actual rfc3986-compliant URIs.
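
To give a feel for the kind of rules involved, here is a simplified
Python sketch. It is not Git's actual parser; the classify function, its
labels, and the exact ordering of checks are assumptions made for
illustration only.

```python
import re

URL_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")
DOS_DRIVE_RE = re.compile(r"^[A-Za-z]:[\\/]")


def classify(remote):
    """Roughly sort a remote string into one of four illustrative buckets."""
    if URL_RE.match(remote):
        return "url"
    if DOS_DRIVE_RE.match(remote):
        return "local path (drive letter)"
    colon = remote.find(":")
    slash = remote.find("/")
    # Treat host:path as scp-like only when no slash precedes the colon.
    if colon > 0 and (slash == -1 or colon < slash):
        return "scp-like"
    return "local path or remote name"
```

Even in this toy version, "c:/foo" and "c:foo" land in different
buckets, which is the ambiguity with an SSH alias named "c" mentioned
earlier in the thread.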

Now I don't know much about the parsing code in, say, git-lfs, or how
much of a pain it would be to add a new scheme for something that _does_
conform to rfc3986. But it's not necessarily as easy as "you should be
using a compliant URI parser".

-Peff

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-16 14:21               ` Drew DeVault
  2021-03-16 21:28                 ` Jeff King
@ 2021-03-17  0:45                 ` Jakub Narębski
  2021-03-17 14:53                   ` Drew DeVault
  2021-03-17 22:06                 ` brian m. carlson
  2 siblings, 1 reply; 28+ messages in thread
From: Jakub Narębski @ 2021-03-17  0:45 UTC (permalink / raw)
  To: Drew DeVault; +Cc: brian m. carlson, Eli Schwartz, Jonathan Nieder, git

"Drew DeVault" <sir@cmpwn.com> writes:
> On Tue Mar 16, 2021 at 7:54 AM EDT, brian m. carlson wrote:

>> I believe this construct is nonstandard. It is better to use standard
>> URL syntax when possible because it makes it much, much easier for
>> people to use standard tooling to parse and handle URLs. Such tooling
>> may have special cases for the HTTP syntax that it doesn't use in MAILTO
>> syntax, so it's important to pick something that works automatically.
>
> It is standard - RFC 3986 section 3.1 permits the + character in
> URI schemes. The use of protocol "composition", e.g. git+https, is a
> convention, but not a standard.

All right, that is true... but Git itself and Git-related tools do
not usually employ a full-fledged URI parser, as far as I know.  They
just check for the few schemes they support if the repository location
is given as a URI / URL.

That said, if the RFC states it, then it is a standard construct.

>> So I'm very much opposed to adding, expanding, or giving any sort of
>> official blessing to this syntax, especially when there are perfectly
>> valid and equivalent schemes that are already blessed and registered
>> with IANA.
>
> This convention is blessed by the IANA, given that they have
> accepted protocol registrations which use this convention:
>
> https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

Well, there is a total of one protocol (CoAP) that uses '+' based
schemes, namely: coap+tcp, coap+ws, coaps+tcp, coaps+ws (well, at least
out of those protocols that made it into IANA).

Though in this case neither of the parts of the scheme joined by the
'+' sign is an application name...

>> It's difficult enough to handle parsing of SSH specifications and
>> distinguish them uniformly from Windows paths (think of an alias named
>> "c"), so I'd prefer we didn't add additional complexity to handle this
>> case.
>
> There's no additional complexity here: git remotes are URIs, and any
> implementation which parses them as such already deals with this case
> correctly. Any implementation which doesn't may face all kinds of
> problems as a consequence: SSH without a user specified, HTTPS with
> Basic auth in the URI username/password fields (or just the password,
> which is also allowed), and so on. Any sane and correct implementation
> is pulling in a URI parser here, and if not, I don't think it's fair for
> git to constrain itself in order to work around some other project's
> bugs.

The Git documentation explicitly enumerates all possible URL types that
you can use with Git.

On the other hand Git-related tools can support more types of URL, for
example ones for AWS S3 buckets.

>
>> Lest you think that only Git has to handle parsing these
>
> I don't, given that my argument stems from making it easier for
> third-party applications to deal with git URIs :)
>
>> Despite the fact that ssh+git is specified as deprecated, we had
>> people expect it to magically work and had to support it in Git LFS.
>
> Aye, people do expect it to work. The problem is not going to go away.

To reiterate, the idea of "prefixed URLs", that is using git+https://
and git+ssh:// is to denote that said URL is only usable by Git, without
any additional out-of-band information (like other attributes on <a>
element or its encompassing element)?

Best,
--
Jakub Narębski

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-16 21:23             ` Jeff King
@ 2021-03-17 14:49               ` Drew DeVault
  2021-03-18 21:30               ` Junio C Hamano
  1 sibling, 0 replies; 28+ messages in thread
From: Drew DeVault @ 2021-03-17 14:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, brian m. carlson, git

On Tue Mar 16, 2021 at 5:23 PM EDT, Jeff King wrote:
> - one of the advantages of straight http:// URLs is that they can
> be accessed by multiple tools. Most "forge" tools let you use the same
> URL both for getting a human-readable page in a browser, as well as
> accessing the repository with the Git CLI. I'd hate to see https+git
> URLs become common, because they add friction there (though simply
> supporting them at all gives people the choice of whether to use
> them).

I think their main use-cases would be limited to places where the
distinction is necessary, such as for those packaging or CI tools. I
don't expect us to end up in a situation where users are passing each
other git+https URLs in everyday conversation.

> - I'm also sympathetic to brian's point that there's a wider
> ecosystem. It's not just "git" that needs to learn them. It's jgit,
> and libgit2, and many tools that work with git remotes.

I would be happy to write the necessary patch for libgit2, at least.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-16 21:28                 ` Jeff King
@ 2021-03-17 14:50                   ` Drew DeVault
  0 siblings, 0 replies; 28+ messages in thread
From: Drew DeVault @ 2021-03-17 14:50 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, Eli Schwartz, Jonathan Nieder, git

On Tue Mar 16, 2021 at 5:28 PM EDT, Jeff King wrote:
> Git remotes are most definitely not just URIs. Some valid remotes are:
> ".", "foo", "/tmp/foo", "c:\foo", "example.com:foo". The parser inside
> Git has rules to distinguish these from actual rfc3986-compliant URIs.
>
> Now I don't know much about the parsing code in, say, git-lfs, or how
> much of a pain it would be to add a new scheme for something that _does_
> conform to rfc3986. But it's not necessarily as easy as "you should be
> using a compliant URI parser".

Sorry, I meant to say that git remotes are a superset of URIs, so a
conformant URI parser already has to be involved - I didn't mean that
all git remotes are URIs.
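
As a rough illustration of the "superset" point, the sketch below sorts
the examples from the quoted message into three buckets.  It is not
Git's actual parser; the function name and the ordering of the checks
are assumptions.  The one rule taken from Git's documentation is that
the scp-like syntax is only recognized when there is no slash before
the first colon.

```python
import re

# Anything that starts with an RFC 3986 scheme followed by "://".
_URL_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")
# A Windows drive path such as c:\foo or c:/foo (assumed to win over scp-like).
_DRIVE_RE = re.compile(r"^[A-Za-z]:[\\/]")


def classify_remote(remote: str) -> str:
    """Classify a remote string as 'url', 'scp-like' or 'local'."""
    if _URL_RE.match(remote):
        return "url"
    if _DRIVE_RE.match(remote):
        return "local"
    colon = remote.find(":")
    slash = remote.find("/")
    # scp-like form: a colon that appears before any slash.
    if colon > 0 and (slash == -1 or colon < slash):
        return "scp-like"
    return "local"
```

Note that a string like "c:foo" still lands in the scp-like bucket here,
which is exactly the ambiguity with an SSH alias named "c" that was
raised earlier in the thread.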

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-17  0:45                 ` Jakub Narębski
@ 2021-03-17 14:53                   ` Drew DeVault
  0 siblings, 0 replies; 28+ messages in thread
From: Drew DeVault @ 2021-03-17 14:53 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: brian m. carlson, Eli Schwartz, Jonathan Nieder, git

On Tue Mar 16, 2021 at 8:45 PM EDT, Jakub Narębski wrote:
> Well, there is a total of one protocol (CoAP) that uses '+' based
> schemes, namely: coap+tcp, coap+ws, coaps+tcp, coaps+ws (well, at least
> out of those protocols that made it into IANA).

One is greater than zero! It is blessed, even if only a little. We can
just go ask the IANA about it if we want to further entertain the idea
that this approach is non-kosher, but, like you said: if the RFC states
it, then it is a standard construct.

> Though in this case neither of the parts of the scheme joined by the
> '+' sign is an application name...

git is both an application name and a protocol name. ¯\_(ツ)_/¯

> > Aye, people do expect it to work. The problem is not going to go away.
>
> To reiterate, the idea of "prefixed URLs", that is using git+https://
> and git+ssh:// is to denote that said URL is only usable by Git, without
> any additional out-of-band information (like other attributes on <a>
> element or its encompassing element)?

Correct.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-16 14:21               ` Drew DeVault
  2021-03-16 21:28                 ` Jeff King
  2021-03-17  0:45                 ` Jakub Narębski
@ 2021-03-17 22:06                 ` brian m. carlson
  2021-03-18 12:53                   ` Drew DeVault
  2 siblings, 1 reply; 28+ messages in thread
From: brian m. carlson @ 2021-03-17 22:06 UTC (permalink / raw)
  To: Drew DeVault; +Cc: Eli Schwartz, Jonathan Nieder, git

On 2021-03-16 at 14:21:13, Drew DeVault wrote:
> On Tue Mar 16, 2021 at 7:54 AM EDT, brian m. carlson wrote:
> > So I'm very much opposed to adding, expanding, or giving any sort of
> > official blessing to this syntax, especially when there are perfectly
> > valid and equivalent schemes that are already blessed and registered
> > with IANA.
> 
> This convention is blessed by the IANA, given that they have
> accepted protocol registrations which use this convention:
> 
> https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

I assume that you're volunteering to write the RFC to register these
with IANA?  If not, then they are indeed non-standard and will remain
so.

I should point out that I don't believe the IANA will accept such a
registration, because they will believe it to be duplicative of the
existing scheme.  But if you want to go this route, we should only
proceed if we register them with IANA.

> > It's difficult enough to handle parsing of SSH specifications and
> > distinguish them uniformly from Windows paths (think of an alias named
> > "c"), so I'd prefer we didn't add additional complexity to handle this
> > case.
> 
> There's no additional complexity here: git remotes are URIs, and any
> implementation which parses them as such already deals with this case
> correctly. Any implementation which doesn't may face all kinds of
> problems as a consequence: SSH without a user specified, HTTPS with
> Basic auth in the URI username/password fields (or just the password,
> which is also allowed), and so on. Any sane and correct implementation
> is pulling in a URI parser here, and if not, I don't think it's fair for
> git to constrain itself in order to work around some other project's
> bugs.

We accept local paths in a variety of situations and SSH specifications,
neither of which are URLs.  The ultimate problem is that we support
Windows paths and need to handle them correctly on Windows but don't
support them on other operating systems and need to not handle them
there.  So, somehow, in portable code which does not vary based on
operating system, we need to decide what should be a local path and what
should be an SSH specification and do that in a way compatible with Git.

Git LFS has also run into the problem that the URL parser we use has
gotten stricter in a point release due to CVEs against it which broke
various kinds of parsing of our SSH URLs that were previously accepted.
This almost certainly bit other Go-based tools that work with Git
repositories as well, since everyone uses the standard library URI
parser.

If we only supported valid URLs, this would be much, much easier.  That
is not at all the case, and it has never been the case for Git.

> > Lest you think that only Git has to handle parsing these
> 
> I don't, given that my argument stems from making it easier for
> third-party applications to deal with git URIs :)

This does not make my life as a maintainer of said third-party
application easier.  It complicates it significantly, because people
often upgrade Git without upgrading Git LFS and then are unhappy when
the five-year-old version they use from their distro doesn't support
every new feature.

Adding this feature which duplicates existing functionality does not
improve my life as a user of Git, as a developer of Git, as a maintainer
of a number of third-party tools which interact with Git, or as someone
who maintains part of a hosting platform.

It also will inevitably confuse users who will want to know the relevant
difference between the URLs and which they should use.  They will then
see the new type of URL and wonder why it does not work with the version
they are using.  And many users already don't understand the difference
between HTTPS and SSH URLs, which is compounded by the fact that many
Windows users have never before and will never otherwise use SSH.

In case it was not already clear, I'm very strongly opposed to this
proposal.  It seems to make a lot of needless work without a clear and
convincing benefit.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-16 18:03               ` Eli Schwartz
@ 2021-03-17 22:15                 ` Jonathan Nieder
  2021-03-31  4:23                   ` Eli Schwartz
  2021-04-07 13:46                   ` Mark Lodato
  0 siblings, 2 replies; 28+ messages in thread
From: Jonathan Nieder @ 2021-03-17 22:15 UTC (permalink / raw)
  To: Eli Schwartz; +Cc: brian m. carlson, Drew DeVault, git

Hi,

Eli Schwartz wrote:

> I'm not especially attached to the proposal. I'm a maintainer for one
> of these package managers that currently special-case git+https?:// and
> rewrite the url that git sees, which has worked adequately for a long
> time.

This is useful context.  What URL forms does this package manager
support (e.g., do you have a link to its documentation)?  What would
the effect be for the package manager and its users if Git started
supporting a git+https:// synonym for https://?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-17 22:06                 ` brian m. carlson
@ 2021-03-18 12:53                   ` Drew DeVault
  0 siblings, 0 replies; 28+ messages in thread
From: Drew DeVault @ 2021-03-18 12:53 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Eli Schwartz, Jonathan Nieder, git

I feel like the tone here is getting a bit hostile. Let's try to keep
things friendly.

On Wed Mar 17, 2021 at 6:06 PM EDT, brian m. carlson wrote:
> I assume that you're volunteering to write the RFC to register these
> with IANA? If not, then they are indeed non-standard and will remain
> so.
>
> I should point out that I don't believe the IANA will accept such a
> registration, because they will believe it to be duplicative of the
> existing scheme. But if you want to go this route, we should only
> proceed if we register them with IANA.

This is a needlessly high bar to set, and saying we can only proceed
with the IANA's involvement seems like a convenient excuse to shut the
conversation down entirely. Registering with IANA is nice, but there are
thousands of protocols which don't bother.

In any case, this is not quite as high of a bar as you may believe (or
hope?). The process is pretty straightforward, and a scheme with "+" in
it meets the criteria laid forth in the RFC, and the argument is even
stronger given that WHATWG standards make use of the convention these
days. If this is truly desirable, we can do it after the feature lands,
but given that the git:// protocol was registered as an apparent
afterthought by a third party from Microsoft with zero commits in the
git tree, it just seems like a requirement put forth in bad faith.

> > > Lest you think that only Git has to handle parsing these
> > 
> > I don't, given that my argument stems from making it easier for
> > third-party applications to deal with git URIs :)
>
> This does not make my life as a maintainer of said third-party
> application easier. It complicates it significantly, because people
> often upgrade Git without upgrading Git LFS and then are unhappy when
> the five-year-old version they use from their distro doesn't support
> every new feature.

What third-party software do you represent? Can we make an objective
estimation of the complexity of the change for your project in practice?

> Adding this feature which duplicates existing functionality

What existing method is there to identify a URL as being a git remote?

> It also will inevitably confuse users who will want to know the relevant
> difference between the URLs and which they should use. They will then
> see the new type of URL and wonder why it does not work with the version
> they are using. And many users already don't understand the difference
> between HTTPS and SSH URLs, which is compounded by the fact that many
> Windows users have never before and will never otherwise use SSH.

As you explained, this confusion is already happening. If users don't
know what a URI is, then they're already confused, and this is unlikely
to make it worse. If anything, this could make it easier, as a URL which
explicitly represents its relationship with git could hint at its
intended usage.

And again, I don't expect users to actually be handing these URLs around
to each other for regular use. This is specifically necessary in cases
where software needs to handle multiple kinds of version control.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-16 21:23             ` Jeff King
  2021-03-17 14:49               ` Drew DeVault
@ 2021-03-18 21:30               ` Junio C Hamano
  2021-03-18 21:53                 ` Drew DeVault
  1 sibling, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2021-03-18 21:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Drew DeVault, Jonathan Nieder, brian m. carlson, git

Jeff King <peff@peff.net> writes:

> So you definitely need to "somehow" know that a URL is meant to be used
> with Git. And that makes me somewhat sympathetic to your request.

Nicely summarized.  I am also sympathetic to the cause, but I do not
see upside in tucking the information to the URL syntax.  Even if we
limit ourselves to the CI context, I do not see how the repository
location alone is sufficient (e.g. "build the tip of this branch of
that repository every time it gets updated" already needs more than
the repository location).

> The downsides I see are:
>
>   - one of the advantages of straight http:// URLs is that they can
>     be accessed by multiple tools. Most "forge" tools let you use the same
>     URL both for getting a human-readable page in a browser, as well as
>     accessing the repository with the Git CLI. I'd hate to see https+git
>     URLs become common, because they add friction there (though simply
>     supporting them at all gives people the choice of whether to use
>     them).
>
>   - I'm also sympathetic to brian's point that there's a wider
>     ecosystem. It's not just "git" that needs to learn them. It's jgit,
>     and libgit2, and many tools that work with git remotes.

Yup.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-18 21:30               ` Junio C Hamano
@ 2021-03-18 21:53                 ` Drew DeVault
  0 siblings, 0 replies; 28+ messages in thread
From: Drew DeVault @ 2021-03-18 21:53 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King; +Cc: Jonathan Nieder, brian m. carlson, git

The status quo is similarly frustrating. We have no choice but to allow
these strange unofficial +git URLs to proliferate among package managers
and build systems. It has already caused confusion with users, and it
can only cause more the longer it remains unaddressed upstream.

There are two options:

1. We make the change, users are confused for a while, and software has
   to be updated, but the confusion gradually diminishes over time as
   the ecosystem adjusts and people learn the change.
2. We don't make the change, and the inconsistency continues to require
   special cases in new tools, with no central organization for keeping
   them consistent from one to the next, and users will continue to stub
   their toe on it indefinitely.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-17 22:15                 ` Jonathan Nieder
@ 2021-03-31  4:23                   ` Eli Schwartz
  2021-04-07 13:46                   ` Mark Lodato
  1 sibling, 0 replies; 28+ messages in thread
From: Eli Schwartz @ 2021-03-31  4:23 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: brian m. carlson, Drew DeVault, git


On 3/17/21 6:15 PM, Jonathan Nieder wrote:
> Hi,
> 
> Eli Schwartz wrote:
> 
>> I'm not especially attached to the proposal. I'm a maintainer for one
>> of these package managers that currently special-case git+https?:// and
>> rewrite the url that git sees, which has worked adequately for a long
>> time.
> 
> This is useful context.  What URL forms does this package manager
> support (e.g., do you have a link to its documentation)?  What would
> the effect be for the package manager and its users if Git started
> supporting a git+https:// synonym for https://?


https://archlinux.org/pacman/PKGBUILD.5.html#VCS

We support cloning arbitrary version controlled sources via either

vcs://

or vcs+proto://

but not

proto+vcs://

so that encompasses git:// or git+https:// or git+ssh:// and also
permits hg+https or svn+https:// or bzr+http:// or fossil+https://

(ignore the documentation not mentioning fossil, this is a development
branch addition and obviously the docs are for the stable release)

We then do prefix removal of everything before the plus sign since
currently no VCS supports this directly (I think?), but we could remove
that pass from our git source plugin if git implemented it internally.

Implementing https+git:// as a synonym for https:// is IMO confusing, so
I don't intend to implement it even if git does. I think one way to
specify the VCS + transport protocol is enough... and prefix removal is
easier than removing the middle of the string.

The net effect would be, I guess, less code in the package manager, and
users would be able to go to a public registry of source packages like
https://aur.archlinux.org/packages/pacman-git, see the clickable link
under "Sources (5)" and copy/paste that into a `git clone` command line
without knowing they need to edit the link first.
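
For illustration, the prefix-removal pass described above can be
sketched roughly as follows.  This is not the package manager's actual
code; the function name and the KNOWN_VCS set are assumptions made up
for the example.

```python
from typing import Optional, Tuple

# Hypothetical set of VCS prefixes, taken from the forms listed above.
KNOWN_VCS = {"git", "hg", "svn", "bzr", "fossil"}


def split_vcs_url(source: str) -> Tuple[Optional[str], str]:
    """Return (vcs, url_to_hand_to_the_vcs) for a source entry.

    Accepts vcs:// and vcs+proto://, but not proto+vcs://.
    Returns (None, source) when no VCS prefix is recognized.
    """
    scheme, sep, rest = source.partition("://")
    if not sep:
        return None, source
    vcs, plus, proto = scheme.partition("+")
    if vcs not in KNOWN_VCS:
        return None, source
    if not plus:
        # Plain vcs:// form, e.g. git://; nothing to strip.
        return vcs, source
    # Drop everything up to and including the plus sign.
    return vcs, proto + "://" + rest
```

With this, "git+https://example.com/repo.git" yields the pair
("git", "https://example.com/repo.git"), while an "https+git://" source
is left unrecognized, since stripping a prefix is simpler than cutting
the VCS name out of the middle of the scheme.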


-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-03-17 22:15                 ` Jonathan Nieder
  2021-03-31  4:23                   ` Eli Schwartz
@ 2021-04-07 13:46                   ` Mark Lodato
  2021-04-07 19:46                     ` Junio C Hamano
  1 sibling, 1 reply; 28+ messages in thread
From: Mark Lodato @ 2021-04-07 13:46 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Eli Schwartz, brian m. carlson, Drew DeVault, git

Jonathan Nieder wrote:
> This is useful context.  What URL forms does this package manager
> support (e.g., do you have a link to its documentation)?  What would
> the effect be for the package manager and its users if Git started
> supporting a git+https:// synonym for https://?

Here are two more examples:

- pip: https://pip.pypa.io/en/latest/cli/pip_install/#git
- SPDX: https://spdx.github.io/spdx-spec/3-package-information/#37-package-download-location

The common thread is that systems need a way to uniquely identify a git
repository or some object therein. I believe this means some combination
of:

- VCS type (git)
- Transport location (e.g. https://github.com/git/git)
- Ref (e.g. master)
- Resolved commit ID (e.g. 48bf2fa8bad054d66bd79c6ba903c89c704201f7)
- Path (e.g. contrib/diff-highlight)
- (possibly) Clone depth

As Drew has said, the current state of affairs is that, lacking a
standard, multiple systems are all inventing incompatible schemes using
the `git+https` name. This is not a good situation because the "URI" is
no longer "unique". Given such a URI in isolation, one cannot know how
to parse it.

It's not clear to me that git itself needs to support this scheme. It
would go a long way for git to simply recommend a particular scheme so
that all these systems can use a common format. (We could register that
with IANA.) The pip format seems to be the closest, but it doesn't
support both ref AND resolved commit ID, and it is currently specific to
pip (`egg=` could be replaced with `path=`).

Best,
Mark

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-04-07 13:46                   ` Mark Lodato
@ 2021-04-07 19:46                     ` Junio C Hamano
  2021-04-13  8:52                       ` Kerry, Richard
  0 siblings, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2021-04-07 19:46 UTC (permalink / raw)
  To: Mark Lodato
  Cc: Jonathan Nieder, Eli Schwartz, brian m. carlson, Drew DeVault,
	git

Mark Lodato <lodato@google.com> writes:

> The common thread is that systems need a way to uniquely identify a git
> repository or some object therein. I believe this means some combination
> of:
>
> - VCS type (git)
> - Transport location (e.g. https://github.com/git/git)
> - Ref (e.g. master)
> - Resolved commit ID (e.g. 48bf2fa8bad054d66bd79c6ba903c89c704201f7)
> - Path (e.g. contrib/diff-highlight)
> - (possibly) Clone depth

Nice.  So there is no reason to expect that these downstream systems
can sanely force on the various VCSes a notation for "transport
location" that identifies which VCS type uses that location.  All the
other details (like refs, which other VCSes may not even have) besides
the VCS type depend on the VCS used.

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: Regarding the depreciation of ssh+git/git+ssh protocols
  2021-04-07 19:46                     ` Junio C Hamano
@ 2021-04-13  8:52                       ` Kerry, Richard
  0 siblings, 0 replies; 28+ messages in thread
From: Kerry, Richard @ 2021-04-13  8:52 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: Jonathan Nieder, Eli Schwartz, brian m. carlson, Drew DeVault,
	Mark Lodato, Junio C Hamano


s/depreciation/deprecation/



Regards,
Richard.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Regarding the depreciation of ssh+git/git+ssh protocols
@ 2023-10-13 20:49 David Rogers
  0 siblings, 0 replies; 28+ messages in thread
From: David Rogers @ 2023-10-13 20:49 UTC (permalink / raw)
  To: git

Git repositories have become indispensable resources for citing parts
of a development history with links.  However, the format of git
remote entries is not always distinguishable from other types of
citation -- for example a git reference vs. a plain URL.

Rather than rely on context to tell me that `https://github.com/git/git`
refers to a git repository which I could clone with git over https, it
would be nice to use a url like `git+https://github.com/git/git`
or even
`git+https://github.com/git/git?commit=d0e8084c65cbf949038ae4cc344ac2c2efd77415`
to unambiguously specify that the type of data and its method of
access are native to git.

This issue is extremely important for version control systems which
build dependency lists from git, e.g.
https://pip.pypa.io/en/stable/topics/vcs-support/

That project lists several invented URL schemes (all beginning with
git+) and assigns special meaning to reserved characters
(https://datatracker.ietf.org/doc/html/rfc3986#section-2.2):
git+https://git.example.com/MyProject.git@master
git+https://git.example.com/MyProject.git@v1.0
git+https://git.example.com/MyProject.git@da39a3ee5e6b4b0d3255bfef95601890afd80709
git+https://git.example.com/MyProject.git@refs/pull/123/head
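
As a rough sketch of what a consumer has to do with the forms listed
above, the function below separates the transport URL from the trailing
"@<rev>" part.  It is not pip's parser; the function name and the choice
to split on the last '@' in the path (so that a user@host authority is
left alone) are assumptions for illustration.

```python
from typing import Optional, Tuple


def split_git_plus_url(url: str) -> Tuple[str, Optional[str]]:
    """Split 'git+<proto>://host/path[@rev]' into (transport_url, rev)."""
    if not url.startswith("git+"):
        raise ValueError("not a git+ URL: " + url)
    transport = url[len("git+"):]
    scheme, sep, rest = transport.partition("://")
    if not sep:
        raise ValueError("missing '://' in: " + url)
    # Keep the authority (which may contain user@host) out of the search.
    authority, _slash, path = rest.partition("/")
    repo_path, at, rev = path.rpartition("@")
    if not at:
        return transport, None
    return scheme + "://" + authority + "/" + repo_path, rev
```

For the first example above this gives
("https://git.example.com/MyProject.git", "master"); only the first
element is something git itself would accept today.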

It would be helpful for the git project itself to define its own URL
scheme to codify these use cases and, possibly in addition, provide a
standard way to reference within git repositories.

For reference, some of the ways URLs are already used/defined within
git are documented here:
- https://github.com/git/git/blob/d0e8084c65cbf949038ae4cc344ac2c2efd77415/connect.c#L107
  (alternately, using gitweb syntax not actually available on github,
https://github.com/git/git.git/blob/d0e8084c65cbf949038ae4cc344ac2c2efd77415:/git/connect.c)
- https://mirrors.edge.kernel.org/pub/software/scm/git/docs/gitremote-helpers.html
- https://git-scm.com/docs/git-http-backend
- https://git-scm.com/docs/gitweb

Currently, a comment in connect.c notes "git+" schemes were
deprecated.  However, I would argue that at a minimum, these "git+"
schemes should be a supported and documented feature of git.  Also,
something has to be fixed (or better communicated) about URLs of the
form "git@github.com:user/project.git".  These are implicitly treated
as "git+ssh://git@github.com/user/project.git", but the use of ":" is
confusing from the perspective of translating between these two forms.

In addition, the use of paths, queries, and fragments should be
considered to allow (IMHO) at least 3 distinct uses:
1. naming commit-ish objects (and potentially metadata like author and
parents within the commit)
2. naming tree-ish objects and paths within them
3. naming blobs (and potentially fragment identifiers like lines or
HTML tags within those blobs)

These further refinements don't have to be supported by any special
functions within git.  However, their existence may influence git data
structures and APIs in the future.

The last discussion I can find of this issue on the git mailing list
(https://lore.kernel.org/git/C9Y2DPYH4XO1.3KFD8LT770P2@taiga)
indicates that defining conventions like these within git's
documentation would be a good place to start.  On a separate thread, I
will send a draft "git+" URI naming scheme for discussion and eventual
submission to IANA
(https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml).

~ David M. Rogers


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2023-10-13 20:49 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-15 16:27 Regarding the depreciation of ssh+git/git+ssh protocols Drew DeVault
2021-03-15 17:56 ` Jonathan Nieder
2021-03-15 18:14   ` Drew DeVault
2021-03-15 22:01     ` brian m. carlson
2021-03-16  0:52       ` Drew DeVault
2021-03-16  1:02         ` Jonathan Nieder
2021-03-16  1:05           ` Drew DeVault
2021-03-16 21:23             ` Jeff King
2021-03-17 14:49               ` Drew DeVault
2021-03-18 21:30               ` Junio C Hamano
2021-03-18 21:53                 ` Drew DeVault
2021-03-16  4:38           ` Eli Schwartz
2021-03-16 11:54             ` brian m. carlson
2021-03-16 14:21               ` Drew DeVault
2021-03-16 21:28                 ` Jeff King
2021-03-17 14:50                   ` Drew DeVault
2021-03-17  0:45                 ` Jakub Narębski
2021-03-17 14:53                   ` Drew DeVault
2021-03-17 22:06                 ` brian m. carlson
2021-03-18 12:53                   ` Drew DeVault
2021-03-16 18:03               ` Eli Schwartz
2021-03-17 22:15                 ` Jonathan Nieder
2021-03-31  4:23                   ` Eli Schwartz
2021-04-07 13:46                   ` Mark Lodato
2021-04-07 19:46                     ` Junio C Hamano
2021-04-13  8:52                       ` Kerry, Richard
2021-03-16  0:54       ` Drew DeVault
  -- strict thread matches above, loose matches on Subject: below --
2023-10-13 20:49 David Rogers
