git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Kyle J. McKay" <mackyle@gmail.com>
To: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Cc: "git list" <git@vger.kernel.org>,
	"David Aguilar" <davvid@gmail.com>, "Petr Baudis" <pasky@ucw.cz>,
	"Richard Hartmann" <richih.mailinglist@gmail.com>,
	"Daniel Knittl-Frank" <knittl89@googlemail.com>,
	"Jan Krüger" <jk@jk.gs>, "Alejandro Mery" <amery@geeks.cl>,
	"Aaron Schrab" <aaron@schrab.com>
Subject: Re: [PATCH v3] config: add support for http.<url>.* settings
Date: Mon, 15 Jul 2013 02:50:48 -0700	[thread overview]
Message-ID: <CDCD6E5C-A829-43A1-94F6-93670A339A35@gmail.com> (raw)
In-Reply-To: <20130715051233.GC21127@sigill.intra.peff.net>

(I'm attempting to combine the various separate email replies into a  
single response here, please forgive me if I mangle something up.)



On Jul 14, 2013, at 22:12, Jeff King wrote:
> On Sun, Jul 14, 2013 at 09:02:19PM -0700, Junio C Hamano wrote:
>
>>> Or proceed with what's there right now (there are a few pending
>>> updates from reviewers) and then, as Junio says above, adjust it  
>>> later
>>> if needed?
>>
>> I have been assuming that "strictly textual match" will be a subset
>> of the matching semantics Aaron and Peff suggested.  That is, if we
>> include your version in the upcoming release, the user writes the
>> http.<URLpattern>.<variable> configuration so that the entries match
>> what they want them to match, the enhanced URL matcher Aaron and
>> Peff suggested will still make them match.
>>
>> Am I mistaken?  Will there be some <URLpattern> that will not match
>> with the same URL literally?
>
> I think we need to decide now, because the two schemes are not
> compatible, and switching will break setups. Yes, the matcher that  
> Aaron
> and I suggest is a strict superset (i.e., we will not stop matching
> things that used to match), which is good. But we would not be able to
> implement "longest prefix wins" overriding anymore, which would change
> the meaning of cases like:
>
>  [http "https://example.com"] foo = 1
>  [http] foo = 2
>
> (under Kyle's scheme it is "1", and under ours "2"). We can probably
> come up with some clever rules for overriding a broken-down URL that
> would stay backwards compatible. E.g., longest-prefix-match if there  
> are
> no wildcarded components, and last-one-wins if there are. But that is
> not a rule I would want readers to have to puzzle out in the
> documentation.
>
> So I think we are much better off to decide the semantics now.

Yes.  Consider these two commands:

> git config http.foo = 2
> git config http.https://example.com/.foo = 1

I am proposing that this means https://example.com has foo set to 1  
(assuming there are no other http*.foo configurations).

The other proposal probably means that, but it might not.

Given this sequence:

> git config http.foo = 2
> git config http.https://example.com/.foo = 1
>

or this sequence:

> git config http.https://example.com/.foo = 1
> git config http.foo = 2

what actually happens when using the other proposal will depend on  
whether or not the user has previously configured any other "http.*"  
setting or any other "http.https://example.com/.*" setting since doing  
so would have established such a section in the config file and it  
will affect the order the directives are processed since they will be  
added to the existing section rather than creating a new same-named  
section at the end of the file.

For this reason the order the above two "git config" commands are  
given cannot be relied upon to determine what setting http.foo will  
have when a https://example.com/ url is accessed.

Since git config does not have a "git config --placing-after-section  
section-name http.foo 2" option it seems to me that the only way to be  
sure what you would end up with using this method is to examine the  
created config file (git config -l would be sufficient although  
probably harder to read than viewing the config file itself).

For an individual .git/config file I don't expect this to be an  
issue.  However for the --global config file, I believe quite some  
time could go by between setting one http.* option and another so it  
seems quite likely to me that the user may not remember or have ever  
been aware of what order the various [http*] sections are currently in.

This is why I originally proposed longest-match-wins semantics, but  
please read on.



On Jul 14, 2013, at 22:06, Jeff King wrote:
> I admit that these are unlikely to come up in practice, but I am  
> worried
> that there is some room for mischief here. For example:
>
>  https://example.com%2ftricky.host/repo.git

This is actually an invalid URL.  Host names may not contain '%'  
characters and can only be followed by optionally ':port' and then one  
of '/', '?', or '#'.  A URL parser would actually die when it sees the  
'%' there as there's no way to match that to the URL grammar rules.   
(It wouldn't actually be taken as part of the host name even though it  
may appear to be.)

> If we canonicalize that into:
>
>  https://example.com/tricky.host/repo.git

I don't think this will be a concern since that is an invalid  
normalization.  Perhaps there is another example that is also  
concerning that needs to be addressed?


> One of the things that gets encoded are the delimiting characters.  
> So if
> I have the URL:
>
>  https://foo%3abar@example.com
>
> you would "canonicalize" it into:
>
>  https://foo:bar@example.com
>
> But those are two different URLs entirely; the first has the username
> "foo:bar", and the second has the username "foo" and the password  
> "bar".

That would be an incorrect normalization according to RFC 3986.  '@',  
'/' and ':' must be escaped in the user:password portion and decoding  
them there is verboten (prohibited).  And delimiters in general may  
not have their escaping changed during normalization.

> So I think the three options are basically:
>
>  1. No decoding, require the user to use a consistent prefix between
>     config and other uses of the URL. I.e., your current patch. The
>     downside is that it doesn't handle any variation of input.

I have previously agreed with Aaron about this and am no longer  
proposing this.

>
>  2. Full decoding into constituent parts. This handles  
> canonicalization
>     of encoding, and also allows "wildcard" components (e.g., a URL
>     with username can match the generic "https://example.com" in the
>     config). The downside is that you cannot do a "longest prefix  
> wins"
>     rule for overriding.
>
>  3. Full decoding as in (2), but then re-assemble into a canonicalized
>     encoded URL. The upside is that you get to do "longest prefix
>     wins", but you can no longer have wildcard components. I think  
> this
>     is what you are suggesting in your mail.
>
> I'm still in favor of (2), because I think the wildcard components are
> important (and while I agree that the "longest prefix wins" is  
> nicer, we
> already have "last one wins" for the rest of the config, including the
> credential URL matcher). But I certainly think (3) is better than (1).

AIUI, currently "last one wins" only applies to same-named options.   
That is, if I have these:

> [svn-remote "svn"]
> 	useSvnsyncProps = true
> [svn]
> 	useSvnsyncProps = false

When fetching using git-svn, useSvnsyncProps will be true since "svn- 
remote.svn.useSvnsyncProps" and "svn.useSvnsyncProps" are different  
property names.



I think we've agreed that any matches must match:

1) the scheme (http:,https:,...)

2) the host name

3) the port number

4) at least a prefix of the path (that does not seem to be in  
question, the question seems to be whether or not longest match wins  
or last seen when processed by git_config() wins)

In that case the only wildcard in question is the user name.  (I don't  
think anyone is proposing matching on the user password when it has  
been included directly in the URL.)



I propose that:

1) URLs be normalized before attempting any matching (RFC 3986 rules  
here)

2) Allow a config <url> without a user to match a given url with a  
user (otherwise, the user part must match exactly)

2) Longest length path match wins (to avoid users needing to be  
cognizant of the actual order sections appear in their config file)

3) If there are multiple same-path-length matches, the last one  
encountered wins except that exact user matches are preferred over  
matches where the url contains a user but the config <url> does not.



On Jul 14, 2013, at 21:02, Junio C Hamano wrote:
> "Kyle J. McKay" <mackyle@gmail.com> writes:
>
>> On Jul 12, 2013, at 13:58, Aaron Schrab wrote:
>> ...
>> This should guarantee a match in the scenario Aaron proposes above  
>> and
>> still has pretty much the same easy explanation to the user.
>>
>> Shall I go ahead and add that to the next patch version?
>>
>> Or proceed with what's there right now (there are a few pending
>> updates from reviewers) and then, as Junio says above, adjust it  
>> later
>> if needed?
>
> I have been assuming that "strictly textual match" will be a subset
> of the matching semantics Aaron and Peff suggested.  That is, if we
> include your version in the upcoming release, the user writes the
> http.<URLpattern>.<variable> configuration so that the entries match
> what they want them to match, the enhanced URL matcher Aaron and
> Peff suggested will still make them match.

Yes.  Normalizing the URLs will only create more matches, it will not  
cause any current matches to stop matching.

> Am I mistaken?  Will there be some <URLpattern> that will not match
> with the same URL literally?

I believe you are correct.

> Assuming that Aaron and Peff's enhancement will not be a backward
> incompatible update, my preference is to take the posted matching
> semantics as-is (you may have some other changes that does not
> change the "strictly textual match" semantics).



So along these lines I have prepared a forthcoming [PATCH v5] that  
adds url normalization before the comparisons.  The url normalization  
is added as a separate patch in the patch series on top of the textual  
matching with a test on top of that (it also includes the "fix parsing  
of http.sslCertPasswordProtected" change as a preparatory patch).

The forthcoming patch does not include wildcard user matching, but the  
same principle applies in that adding wildcard user matching in the  
future will only create more matches without causing any current  
matches to stop matching.

I appreciate the time you reviewers have spent looking at these  
patches and sending feedback.  I would like to see this recognized in  
the patch with a suitable annotation (Reviewed-By: or Feedback-From:  
or whatever's appropriate).  I am unclear on how to make this happen  
since, as has been pointed out, I cannot actually include a 'Reviewed- 
By:' line in a new patch I send out since such a new patch has not yet  
actually been reviewed no matter the content.

Thanks for your help,
Kyle

  reply	other threads:[~2013-07-15  9:50 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-11 21:50 [PATCH v3] config: add support for http.<url>.* settings Kyle J. McKay
2013-07-11 23:12 ` Junio C Hamano
     [not found]   ` <455666C5-7663-4361-BF34-378D3EAE2891@gmail.com>
     [not found]     ` <7vsizjn390.fsf@alter.siamese.dyndns.org>
     [not found]       ` <7v4nbyic57.fsf@alter.siamese.dyndns.org>
2013-07-13 19:46         ` Kyle J. McKay
2013-07-15  4:02           ` Junio C Hamano
2013-07-15  5:12             ` Jeff King
2013-07-15  9:50               ` Kyle J. McKay [this message]
2013-07-15  5:06           ` Jeff King
2013-07-12  9:59 ` Jeff King
2013-07-12 13:07   ` Kyle J. McKay
2013-07-12 20:58     ` Aaron Schrab
2013-07-15  4:43     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CDCD6E5C-A829-43A1-94F6-93670A339A35@gmail.com \
    --to=mackyle@gmail.com \
    --cc=aaron@schrab.com \
    --cc=amery@geeks.cl \
    --cc=davvid@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jk@jk.gs \
    --cc=knittl89@googlemail.com \
    --cc=pasky@ucw.cz \
    --cc=peff@peff.net \
    --cc=richih.mailinglist@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).