* [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 @ 2012-09-20 2:55 Shawn O. Pearce 2012-09-20 3:22 ` Shawn Pearce ` (2 more replies) 0 siblings, 3 replies; 36+ messages in thread From: Shawn O. Pearce @ 2012-09-20 2:55 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Shawn O. Pearce From: "Shawn O. Pearce" <spearce@spearce.org> If the user doesn't want to use the dumb HTTP protocol, she may set GIT_CURL_FALLBACK=0 in the environment before invoking a Git protocol operation. This is mostly useful when testing against servers that are known to not support the dumb protocol. If the smart service detection fails the client should not continue with dumb behavior, but instead provide accurate HTTP failure data. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> --- remote-curl.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/remote-curl.c b/remote-curl.c index 4a0927e..2f91128 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -17,7 +17,8 @@ struct options { unsigned progress : 1, followtags : 1, dry_run : 1, - thin : 1; + thin : 1, + fallback : 1; }; static struct options options; @@ -115,7 +116,8 @@ static struct discovery* discover_refs(const char *service) http_ret = http_get_strbuf(refs_url, &buffer, HTTP_NO_CACHE); /* try again with "plain" url (no ? or & appended) */ - if (http_ret != HTTP_OK && http_ret != HTTP_NOAUTH) { + if (options.fallback && http_ret != HTTP_OK + && http_ret != HTTP_NOAUTH) { free(refs_url); strbuf_reset(&buffer); @@ -868,6 +870,12 @@ int main(int argc, const char **argv) options.verbosity = 1; options.progress = !!isatty(2); options.thin = 1; + options.fallback = 1; + + if (getenv("GIT_CURL_FALLBACK")) { + char *fb = getenv("GIT_CURL_FALLBACK"); + options.fallback = *fb != '0'; + } remote = remote_get(argv[1]); -- 1.7.12.1.512.g9b230e6 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 2012-09-20 2:55 [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 Shawn O. Pearce @ 2012-09-20 3:22 ` Shawn Pearce 2012-09-20 3:52 ` Jeff King 2012-09-20 3:48 ` Jeff King 2012-09-20 4:14 ` Re* [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 Junio C Hamano 2 siblings, 1 reply; 36+ messages in thread From: Shawn Pearce @ 2012-09-20 3:22 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Shawn O. Pearce On Wed, Sep 19, 2012 at 7:55 PM, Shawn O. Pearce <spearce@spearce.org> wrote: > From: "Shawn O. Pearce" <spearce@spearce.org> I can't explain why git send-email did this. I obviously didn't need the extra From header here. format-patch didn't write it to the patch file, it was injected by send-email. My .git/config is pretty simple, the name/email are derived from there: [user] name = Shawn O. Pearce email = spearce@spearce.org Ick. I really don't want to debug this right now so I'm just going to pretend it wasn't written. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 2012-09-20 3:22 ` Shawn Pearce @ 2012-09-20 3:52 ` Jeff King 0 siblings, 0 replies; 36+ messages in thread From: Jeff King @ 2012-09-20 3:52 UTC (permalink / raw) To: Shawn Pearce; +Cc: Junio C Hamano, git On Wed, Sep 19, 2012 at 08:22:58PM -0700, Shawn O. Pearce wrote: > On Wed, Sep 19, 2012 at 7:55 PM, Shawn O. Pearce <spearce@spearce.org> wrote: > > From: "Shawn O. Pearce" <spearce@spearce.org> > > I can't explain why git send-email did this. I obviously didn't need > the extra From header here. format-patch didn't write it to the patch > file, it was injected by send-email. My .git/config is pretty simple, > the name/email are derived from there: > > [user] > name = Shawn O. Pearce > email = spearce@spearce.org > > Ick. I really don't want to debug this right now so I'm just going to > pretend it wasn't written. I was looking at the send-email author comparison code a month or three ago, and I remember noticing that it totally fails to canonicalize before comparing. Without even looking at it again, I'm fairly sure that it thinks '"Shawn O. Pearce"' and 'Shawn O. Pearce' (i.e., with and without the quotes) are different, and therefore author != committer. In your case it is particularly egregious because the quotes are introduced (correctly) by format-patch, so it is not even like you have configured two different versions of your name. I think the same bug exists for different rfc2047 encodings of a name with non-ascii characters. Fixing both would involve canonicalizing the names before comparing. I wonder if it would be simpler to just compare the email addresses and ignore the names entirely. -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 2012-09-20 2:55 [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 Shawn O. Pearce 2012-09-20 3:22 ` Shawn Pearce @ 2012-09-20 3:48 ` Jeff King 2012-09-20 5:57 ` Shawn Pearce 2012-09-20 4:14 ` Re* [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 Junio C Hamano 2 siblings, 1 reply; 36+ messages in thread From: Jeff King @ 2012-09-20 3:48 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Junio C Hamano, git On Wed, Sep 19, 2012 at 07:55:53PM -0700, Shawn O. Pearce wrote: > From: "Shawn O. Pearce" <spearce@spearce.org> > > If the user doesn't want to use the dumb HTTP protocol, she may > set GIT_CURL_FALLBACK=0 in the environment before invoking a Git > protocol operation. This is mostly useful when testing against > servers that are known to not support the dumb protocol. If the > smart service detection fails the client should not continue with > dumb behavior, but instead provide accurate HTTP failure data. I have been looking into this recently, as well. GitHub does not allow dumb http at all these days, so transient errors on the initial smart contact can cause us to fall back to dumb, and end up reporting a totally useless 403 Forbidden error. I guess Google Code has a similar issue. Note that it is not really do not "fall back to dumb"; we detect the dumb nature from the response. It is really "fall back to trying the URL without the query string, because there are some servers that cannot handle it". With your patch, we might still end up performing a dumb transfer. I think what you're doing here is sane, because you have to turn it on manually, and thus there are no possible backwards compatibility issues. But it might be nice to make things work better out of the box. Here are two client-side changes I've been toying with: 1. If both smart and dumb requests fail, report the error for the smart request. Now that smart-http clients are common, I'd expect most http servers to be smart these days. Of course I don't have any sort of numbers to back this up (nor am I sure how to get them; obviously big sites like GitHub and Google Code do a lot of traffic, but who knows how many one-off repo-on-a-generic-web-host sites still exist?). An alternative would be to simply be more verbose, and mention that we tried to fallback and list both failures (or we could do this with just "fetch -v"). 2. Be more discerning about which errors will cause a fallback. Something like "504 Gateway Timeout" should not give a fallback. The problem is that you are really guessing at what kinds of http errors you are going to get from a dumb server when you try the smart URL. I dug back into the list thread that spawned the "retry without query string" patch (703e6e7). The thread is here: http://thread.gmane.org/gmane.comp.version-control.git/137609 If you read the thread, it turns out that the problem in this case (which is the only reported case I could find in the archive) is that the server was misconfigured to treat _anything_ with a query string as a gitweb URL. And then it got fixed pretty much immediately. So as far as we know, there may be zero servers for which this fallback is actually doing anything useful. I'm tempted to just reverse the logic. Try the request with the query string and immediately fail if it doesn't work. For the few (if any) people who are hitting a server that will not serve the dumb file in that case, add a "remote.*.dumbhttp" setting that will turn off smart completely as a workaround. That would serve the (presumed) majority who are using smart http, everyone using dumb http on a reasonably-configured server, and still allow an easy workaround for people with badly configured servers. What do you think? > --- > remote-curl.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) If we do go this route, the patch itself looks fairly obvious, although: > @@ -868,6 +870,12 @@ int main(int argc, const char **argv) > options.verbosity = 1; > options.progress = !!isatty(2); > options.thin = 1; > + options.fallback = 1; > + > + if (getenv("GIT_CURL_FALLBACK")) { > + char *fb = getenv("GIT_CURL_FALLBACK"); > + options.fallback = *fb != '0'; > + } This can just be: options.fallback = git_env_bool("GIT_CURL_FALLBACK", 1); Fewer lines, and you get all of the true/false parsing for free. -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 2012-09-20 3:48 ` Jeff King @ 2012-09-20 5:57 ` Shawn Pearce 2012-09-20 5:58 ` [PATCH] Revert "retry request without query when info/refs?query fails" Shawn O. Pearce 2012-09-20 17:24 ` [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 Jeff King 0 siblings, 2 replies; 36+ messages in thread From: Shawn Pearce @ 2012-09-20 5:57 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git On Wed, Sep 19, 2012 at 8:48 PM, Jeff King <peff@peff.net> wrote: > On Wed, Sep 19, 2012 at 07:55:53PM -0700, Shawn O. Pearce wrote: > >> If the user doesn't want to use the dumb HTTP protocol, she may >> set GIT_CURL_FALLBACK=0 in the environment before invoking a Git >> protocol operation. This is mostly useful when testing against >> servers that are known to not support the dumb protocol. If the >> smart service detection fails the client should not continue with >> dumb behavior, but instead provide accurate HTTP failure data. > > I have been looking into this recently, as well. GitHub does not allow > dumb http at all these days, Interesting that GitHub doesn't support dumb transfer either. > so transient errors on the initial smart > contact can cause us to fall back to dumb, Transient errors is actually what is leading me down this path. We see about 0.0455% of our requests to the Android hosting service as these dumb fallbacks. This means the client never got a proper smart service reply. Server logs suggest we sent a valid response, so I am suspecting transient proxy errors, but its hard to debug because clients discard the error. > and end up reporting a > totally useless 403 Forbidden error. Today I posted a change to JGit [1] to make this 406 Not Acceptable as I think it better matches the description of the failure. It also no longer reuses the common 403 Forbidden error that a server might choose to return if a request was given with invalid credentials. [1] https://git.eclipse.org/r/7847 > I guess Google Code has a similar > issue. Yes, and {android,kernel,gerrit}.googlesource.com also only support the smart protocol. > Note that it is not really do not "fall back to dumb"; we detect the > dumb nature from the response. It is really "fall back to trying the URL > without the query string, because there are some servers that cannot > handle it". With your patch, we might still end up performing a dumb > transfer. Yes, the server could ignore the parameter and return a dumb info/refs response on the initial request, forcing the client to a dumb transfer. I forgot this case being common on a dumb server. I have only been working with or worrying about smart servers, whose transient failures have this info/refs request that they always fail... giving a poor user experience. > I think what you're doing here is sane, because you have to turn it on > manually, and thus there are no possible backwards compatibility issues. > But it might be nice to make things work better out of the box. Here are > two client-side changes I've been toying with: > > 1. If both smart and dumb requests fail, report the error for the > smart request. Now that smart-http clients are common, I'd expect > most http servers to be smart these days. Of course I don't have > any sort of numbers to back this up (nor am I sure how to get them; > obviously big sites like GitHub and Google Code do a lot of > traffic, but who knows how many one-off repo-on-a-generic-web-host > sites still exist?). I suspect there are still a number of servers that rely on dumb protocol to host repositories because its very simple to setup. Breaking support for this wouldn't be a good idea. kernel.org and eclipse.org both support the smart protocol, but I think this is a common trend. Sites that host a lot of Git repositories take the time to setup the smart protocol. Everyone else, its hit-or-miss, mostly miss. > An alternative would be to simply be more verbose, and mention that > we tried to fallback and list both failures (or we could do this > with just "fetch -v"). I would bet the big hosting providers would appreciate having the failure listed in non-verbose mode. Many Git users are going to use a hosting service like GitHub or Google Code, or work with their organization like kernel.org for central Git hosting. Consumers of hosted projects that are less familiar with Git will be accessing these sites often enough. I considered logging this failure to stderr, but didn't. > 2. Be more discerning about which errors will cause a fallback. > Something like "504 Gateway Timeout" should not give a fallback. > The problem is that you are really guessing at what kinds of http > errors you are going to get from a dumb server when you try the > smart URL. > > I dug back into the list thread that spawned the "retry > without query string" patch (703e6e7). > > The thread is here: > > http://thread.gmane.org/gmane.comp.version-control.git/137609 > > If you read the thread, it turns out that the problem in this case > (which is the only reported case I could find in the archive) is > that the server was misconfigured to treat _anything_ with a query > string as a gitweb URL. And then it got fixed pretty much > immediately. > > So as far as we know, there may be zero servers for which this > fallback is actually doing anything useful. > > I'm tempted to just reverse the logic. After re-reading that thread, it was a mistake to apply 703e6e76. We should just revert it. > Try the request with the query > string and immediately fail if it doesn't work. For the few (if any) > people who are hitting a server that will not serve the dumb file in > that case, add a "remote.*.dumbhttp" setting that will turn off smart > completely as a workaround. > > That would serve the (presumed) majority who are using smart http, > everyone using dumb http on a reasonably-configured server, and still > allow an easy workaround for people with badly configured servers. > > What do you think? Yes, this is the better idea. Revert 703e6e76 and add a new feature to disable the smart protocol entirely as an escape hatch. ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] Revert "retry request without query when info/refs?query fails" 2012-09-20 5:57 ` Shawn Pearce @ 2012-09-20 5:58 ` Shawn O. Pearce 2012-09-20 6:29 ` Junio C Hamano 2012-09-20 17:24 ` [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 Jeff King 1 sibling, 1 reply; 36+ messages in thread From: Shawn O. Pearce @ 2012-09-20 5:58 UTC (permalink / raw) To: Junio C Hamano, Jeff King; +Cc: git, Shawn O. Pearce From: "Shawn O. Pearce" <spearce@spearce.org> This reverts commit 703e6e76a14825e5b0c960d525f34e607154b4f7. Retrying without the query parameter was added as a workaround for a single broken HTTP server at git.debian.org[1]. The server was misconfigured to route every request with a query parameter into gitweb.cgi. Admins fixed the server's configuration within 16 hours of the bug report to the Git mailing list, but we still patched Git with this fallback and have been paying for it since. Most Git hosting services configure the smart HTTP protocol and the retry logic confuses users when there is a transient HTTP error as Git dropped the real error from the smart HTTP request. Removing the retry makes root causes easier to identify. [1] http://thread.gmane.org/gmane.comp.version-control.git/137609 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> --- remote-curl.c | 18 ++---------------- 1 file changed, 2 insertions(+), 16 deletions(-) diff --git a/remote-curl.c b/remote-curl.c index 3ec474f..2359f59 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -95,7 +95,7 @@ static struct discovery* discover_refs(const char *service) struct strbuf buffer = STRBUF_INIT; struct discovery *last = last_discovery; char *refs_url; - int http_ret, is_http = 0, proto_git_candidate = 1; + int http_ret, is_http = 0; if (last && !strcmp(service, last->service)) return last; @@ -113,19 +113,6 @@ static struct discovery* discover_refs(const char *service) refs_url = strbuf_detach(&buffer, NULL); http_ret = http_get_strbuf(refs_url, &buffer, HTTP_NO_CACHE); - - /* try again with "plain" url (no ? or & appended) */ - if (http_ret != HTTP_OK && http_ret != HTTP_NOAUTH) { - free(refs_url); - strbuf_reset(&buffer); - - proto_git_candidate = 0; - strbuf_addf(&buffer, "%sinfo/refs", url); - refs_url = strbuf_detach(&buffer, NULL); - - http_ret = http_get_strbuf(refs_url, &buffer, HTTP_NO_CACHE); - } - switch (http_ret) { case HTTP_OK: break; @@ -144,8 +131,7 @@ static struct discovery* discover_refs(const char *service) last->buf_alloc = strbuf_detach(&buffer, &last->len); last->buf = last->buf_alloc; - if (is_http && proto_git_candidate - && 5 <= last->len && last->buf[4] == '#') { + if (is_http && 5 <= last->len && last->buf[4] == '#') { /* smart HTTP response; validate that the service * pkt-line matches our request. */ -- 1.7.12.1.512.g9b230e6 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH] Revert "retry request without query when info/refs?query fails" 2012-09-20 5:58 ` [PATCH] Revert "retry request without query when info/refs?query fails" Shawn O. Pearce @ 2012-09-20 6:29 ` Junio C Hamano 2012-09-20 6:31 ` Junio C Hamano 2012-09-20 16:24 ` Jeff King 0 siblings, 2 replies; 36+ messages in thread From: Junio C Hamano @ 2012-09-20 6:29 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Jeff King, git "Shawn O. Pearce" <spearce@spearce.org> writes: > This reverts commit 703e6e76a14825e5b0c960d525f34e607154b4f7. > > Retrying without the query parameter was added as a workaround > for a single broken HTTP server at git.debian.org[1]. The server > was misconfigured to route every request with a query parameter > into gitweb.cgi. Admins fixed the server's configuration within > 16 hours of the bug report to the Git mailing list, but we still > patched Git with this fallback and have been paying for it since. As the consequence of the above, the only two things we know about the servers in the wild are (1) a misconfiguration that requires this retry was once made, so it is not very unlikely others did the same misconfiguration, and (2) those unknown number of servers have been happily serving the current clients because the workaround patch have been hiding the misconfiguration ever since. But as long as the failure diagnosis from updated clients that revert this workaround is sufficient to allow such misconfigured servers, I think it is OK. We might see a large number of small people having to run around and fix the configuration as a fallout, though. > Most Git hosting services configure the smart HTTP protocol and the > retry logic confuses users when there is a transient HTTP error as > Git dropped the real error from the smart HTTP request. Removing the > retry makes root causes easier to identify. Does that hold true also for dumb only small people installations? They are the ones that need more help than the large installations staffed sufficiently and run smart http gateway. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Revert "retry request without query when info/refs?query fails" 2012-09-20 6:29 ` Junio C Hamano @ 2012-09-20 6:31 ` Junio C Hamano 2012-09-20 16:24 ` Jeff King 1 sibling, 0 replies; 36+ messages in thread From: Junio C Hamano @ 2012-09-20 6:31 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Jeff King, git Junio C Hamano <gitster@pobox.com> writes: > "Shawn O. Pearce" <spearce@spearce.org> writes: > >> This reverts commit 703e6e76a14825e5b0c960d525f34e607154b4f7. >> >> Retrying without the query parameter was added as a workaround >> for a single broken HTTP server at git.debian.org[1]. The server >> was misconfigured to route every request with a query parameter >> into gitweb.cgi. Admins fixed the server's configuration within >> 16 hours of the bug report to the Git mailing list, but we still >> patched Git with this fallback and have been paying for it since. > > As the consequence of the above, the only two things we know about > the servers in the wild are (1) a misconfiguration that requires > this retry was once made, so it is not very unlikely others did the > same misconfiguration, and (2) those unknown number of servers have > been happily serving the current clients because the workaround > patch have been hiding the misconfiguration ever since. > > But as long as the failure diagnosis from updated clients that > revert this workaround is sufficient to allow such misconfigured > servers, I think it is OK. We might see a large number of small s/servers,/servers diagnosed,/; > people having to run around and fix the configuration as a fallout, > though. > >> Most Git hosting services configure the smart HTTP protocol and the >> retry logic confuses users when there is a transient HTTP error as >> Git dropped the real error from the smart HTTP request. Removing the >> retry makes root causes easier to identify. > > Does that hold true also for dumb only small people installations? > They are the ones that need more help than the large installations > staffed sufficiently and run smart http gateway. In any case, will queue. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Revert "retry request without query when info/refs?query fails" 2012-09-20 6:29 ` Junio C Hamano 2012-09-20 6:31 ` Junio C Hamano @ 2012-09-20 16:24 ` Jeff King 2012-09-20 16:59 ` [PATCH 0/2] smart http toggle switch fails" Jeff King 1 sibling, 1 reply; 36+ messages in thread From: Jeff King @ 2012-09-20 16:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: Shawn O. Pearce, git On Wed, Sep 19, 2012 at 11:29:34PM -0700, Junio C Hamano wrote: > "Shawn O. Pearce" <spearce@spearce.org> writes: > > > This reverts commit 703e6e76a14825e5b0c960d525f34e607154b4f7. > > > > Retrying without the query parameter was added as a workaround > > for a single broken HTTP server at git.debian.org[1]. The server > > was misconfigured to route every request with a query parameter > > into gitweb.cgi. Admins fixed the server's configuration within > > 16 hours of the bug report to the Git mailing list, but we still > > patched Git with this fallback and have been paying for it since. > > As the consequence of the above, the only two things we know about > the servers in the wild are (1) a misconfiguration that requires > this retry was once made, so it is not very unlikely others did the > same misconfiguration, and (2) those unknown number of servers have > been happily serving the current clients because the workaround > patch have been hiding the misconfiguration ever since. The misconfiguration was pretty wild in this case. I'd be much more worried about stupidly non-compliant servers that will not serve "foo/bar" when asked for "foo/bar?key=value". > But as long as the failure diagnosis from updated clients that > revert this workaround is sufficient to allow such misconfigured > servers, I think it is OK. We might see a large number of small > people having to run around and fix the configuration as a fallout, > though. I think Shawn's revert is the right thing to do. But it is not complete without the manual workaround. I'm putting that patch together now and should have it out in a few minutes. > > Most Git hosting services configure the smart HTTP protocol and the > > retry logic confuses users when there is a transient HTTP error as > > Git dropped the real error from the smart HTTP request. Removing the > > retry makes root causes easier to identify. > > Does that hold true also for dumb only small people installations? > They are the ones that need more help than the large installations > staffed sufficiently and run smart http gateway. For the most part, yes. They will get a useful error out of the smart request if there is a transient error, the repo does not exist, etc. The real fallout is the people who are hitting a broken or misconfigured server and may get a confusing error code (in the one case we know about, it was a 404, but it really could be anything, depending on the exact nature of the misconfiguration). -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 0/2] smart http toggle switch fails" 2012-09-20 16:24 ` Jeff King @ 2012-09-20 16:59 ` Jeff King 2012-09-20 17:00 ` [PATCH 1/2] remote-curl: rename is_http variable Jeff King 2012-09-20 17:05 ` [PATCH 2/2] remote-curl: let users turn off smart http Jeff King 0 siblings, 2 replies; 36+ messages in thread From: Jeff King @ 2012-09-20 16:59 UTC (permalink / raw) To: Junio C Hamano; +Cc: Shawn O. Pearce, git On Thu, Sep 20, 2012 at 12:24:56PM -0400, Jeff King wrote: > I think Shawn's revert is the right thing to do. But it is not complete > without the manual workaround. I'm putting that patch together now and > should have it out in a few minutes. And here it is. This goes on top of Shawn's revert patch (it might be nice to mention the commit id of that in the commit message of the second patch. I couldn't do so because it is not yet in your repo). [1/2]: remote-curl: rename is_http variable [2/2]: remote-curl: let users turn off smart http -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 1/2] remote-curl: rename is_http variable 2012-09-20 16:59 ` [PATCH 0/2] smart http toggle switch fails" Jeff King @ 2012-09-20 17:00 ` Jeff King 2012-09-20 17:05 ` [PATCH 2/2] remote-curl: let users turn off smart http Jeff King 1 sibling, 0 replies; 36+ messages in thread From: Jeff King @ 2012-09-20 17:00 UTC (permalink / raw) To: Junio C Hamano; +Cc: Shawn O. Pearce, git We don't actually care whether the connection is http or not; what we care about is whether it might be smart http. Rename the variable to be more accurate, which will make it easier to later make smart-http optional. Signed-off-by: Jeff King <peff@peff.net> --- remote-curl.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/remote-curl.c b/remote-curl.c index 2359f59..c0b98cc 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -95,7 +95,7 @@ static struct discovery* discover_refs(const char *service) struct strbuf buffer = STRBUF_INIT; struct discovery *last = last_discovery; char *refs_url; - int http_ret, is_http = 0; + int http_ret, maybe_smart = 0; if (last && !strcmp(service, last->service)) return last; @@ -103,7 +103,7 @@ static struct discovery* discover_refs(const char *service) strbuf_addf(&buffer, "%sinfo/refs", url); if (!prefixcmp(url, "http://") || !prefixcmp(url, "https://")) { - is_http = 1; + maybe_smart = 1; if (!strchr(url, '?')) strbuf_addch(&buffer, '?'); else @@ -131,7 +131,7 @@ static struct discovery* discover_refs(const char *service) last->buf_alloc = strbuf_detach(&buffer, &last->len); last->buf = last->buf_alloc; - if (is_http && 5 <= last->len && last->buf[4] == '#') { + if (maybe_smart && 5 <= last->len && last->buf[4] == '#') { /* smart HTTP response; validate that the service * pkt-line matches our request. */ -- 1.7.11.7.15.g085c6bd ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 2/2] remote-curl: let users turn off smart http 2012-09-20 16:59 ` [PATCH 0/2] smart http toggle switch fails" Jeff King 2012-09-20 17:00 ` [PATCH 1/2] remote-curl: rename is_http variable Jeff King @ 2012-09-20 17:05 ` Jeff King 2012-09-20 17:53 ` Junio C Hamano 1 sibling, 1 reply; 36+ messages in thread From: Jeff King @ 2012-09-20 17:05 UTC (permalink / raw) To: Junio C Hamano; +Cc: Shawn O. Pearce, git Usually there is no need for users to specify whether an http remote is smart or dumb; the protocol is designed so that a single initial request is made, and the client can determine the server's capability from the response. However, some misconfigured dumb-only servers may not like the initial request by a smart client, as it contains a query string. Until recently, commit 703e6e7 worked around this by making a second request. However, that commit was recently reverted due to its side effect of masking the initial request's error code. This patch takes a different approach to the workaround. We assume that the common case is that the server is either smart-http or a reasonably configured dumb-http. If that is not the case, we provide both a per-remote config option and an environment variable with which the user can manually work around the issue. Signed-off-by: Jeff King <peff@peff.net> --- I added the config item as remote.foo.smarthttp. You could also allow "http.$url.smart" (and just "http.smart", for that matter), which could be more flexible if you have multiple remotes pointing to the same broken server. However, it is also more complex to use, and is a lot more code. Since we don't know if any such servers even exist, I tried to give the minimal escape hatch, and we can easily build more features on it later if people complain. Documentation/config.txt | 11 +++++++++++ remote-curl.c | 3 ++- remote.c | 3 +++ remote.h | 1 + t/t5551-http-fetch.sh | 17 +++++++++++++++++ 5 files changed, 34 insertions(+), 1 deletion(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 6416cae..651b23c 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1871,6 +1871,17 @@ remote.<name>.uploadpack:: The default program to execute on the remote side when fetching. See option \--upload-pack of linkgit:git-fetch-pack[1]. +remote.<name>.smartHttp:: + If true, this remote will attempt to use git's smart http + protocol when making remote http requests. Normally git sends an + initial smart-http request, and falls back to the older "dumb" + protocol if the server does not claim to support the smart + protocol. However, some misconfigured dumb-only servers may + produce confusing results for the initial request. Setting this + option to false disables the initial smart request, which can + workaround problems with such servers. You should not generally + need to set this. Defaults to `true`. + remote.<name>.tagopt:: Setting this value to \--no-tags disables automatic tag following when fetching from remote <name>. Setting it to \--tags will fetch every diff --git a/remote-curl.c b/remote-curl.c index c0b98cc..8829bfb 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -102,7 +102,8 @@ static struct discovery* discover_refs(const char *service) free_discovery(last); strbuf_addf(&buffer, "%sinfo/refs", url); - if (!prefixcmp(url, "http://") || !prefixcmp(url, "https://")) { + if ((!prefixcmp(url, "http://") || !prefixcmp(url, "https://")) && + git_env_bool("GIT_SMART_HTTP", remote->smart_http)) { maybe_smart = 1; if (!strchr(url, '?')) strbuf_addch(&buffer, '?'); diff --git a/remote.c b/remote.c index 04fd9ea..a334390 100644 --- a/remote.c +++ b/remote.c @@ -152,6 +152,7 @@ static struct remote *make_remote(const char *name, int len) ret->name = xstrndup(name, len); else ret->name = xstrdup(name); + ret->smart_http = 1; return ret; } @@ -453,6 +454,8 @@ static int handle_config(const char *key, const char *value, void *cb) key, value); } else if (!strcmp(subkey, ".vcs")) { return git_config_string(&remote->foreign_vcs, key, value); + } else if (!strcmp(subkey, ".smarthttp")) { + remote->smart_http = git_config_bool(key, value); } return 0; } diff --git a/remote.h b/remote.h index 251d8fd..9031d18 100644 --- a/remote.h +++ b/remote.h @@ -40,6 +40,7 @@ struct remote { int fetch_tags; int skip_default_update; int mirror; + int smart_http; const char *receivepack; const char *uploadpack; diff --git a/t/t5551-http-fetch.sh b/t/t5551-http-fetch.sh index 2db5c35..48173ed 100755 --- a/t/t5551-http-fetch.sh +++ b/t/t5551-http-fetch.sh @@ -129,6 +129,23 @@ test_expect_success 'clone from auth-only-for-push repository' ' test_cmp expect actual ' +test_expect_success 'disable dumb http on server' ' + git --git-dir="$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \ + config http.getanyfile false +' + +test_expect_success 'GIT_SMART_HTTP can disable smart http' ' + (GIT_SMART_HTTP=0 && + export GIT_SMART_HTTP && + cd clone && + test_must_fail git fetch) +' + +test_expect_success 'remote.*.smartHTTP can disable smart http' ' + (cd clone && + test_must_fail git -c remote.origin.smartHTTP=false fetch) +' + test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE test_expect_success EXPENSIVE 'create 50,000 tags in the repo' ' -- 1.7.11.7.15.g085c6bd ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] remote-curl: let users turn off smart http 2012-09-20 17:05 ` [PATCH 2/2] remote-curl: let users turn off smart http Jeff King @ 2012-09-20 17:53 ` Junio C Hamano 2012-09-20 18:12 ` Jeff King 0 siblings, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2012-09-20 17:53 UTC (permalink / raw) To: Jeff King; +Cc: Shawn O. Pearce, git Jeff King <peff@peff.net> writes: > I added the config item as remote.foo.smarthttp. You could also allow > "http.$url.smart" (and just "http.smart", for that matter), which could > be more flexible if you have multiple remotes pointing to the same > broken server. What would the user experience be when we introduce "even smarter" http server protocol extension? Will we add remote.foo.starterhttp? Perhaps remote.$name.httpvariants = [smart] [dumb] to allow users to say "smart only", "dumb only", or "smart and/or dumb" might be more code but less burden on the users. The code obviously looks correct, and the documentation reads fine. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] remote-curl: let users turn off smart http 2012-09-20 17:53 ` Junio C Hamano @ 2012-09-20 18:12 ` Jeff King 2012-09-20 18:36 ` Junio C Hamano 0 siblings, 1 reply; 36+ messages in thread From: Jeff King @ 2012-09-20 18:12 UTC (permalink / raw) To: Junio C Hamano; +Cc: Shawn O. Pearce, git On Thu, Sep 20, 2012 at 10:53:15AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > I added the config item as remote.foo.smarthttp. You could also allow > > "http.$url.smart" (and just "http.smart", for that matter), which could > > be more flexible if you have multiple remotes pointing to the same > > broken server. > > What would the user experience be when we introduce "even smarter" > http server protocol extension? Will we add remote.foo.starterhttp? I would hope that it would actually be negotiated reliably at the protocol level so we do not have to deal with this mess again. > Perhaps > > remote.$name.httpvariants = [smart] [dumb] > > to allow users to say "smart only", "dumb only", or "smart and/or > dumb" might be more code but less burden on the users. I don't mind that format if we are going that direction, but is there anybody who actually wants to say "smart only?" -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] remote-curl: let users turn off smart http 2012-09-20 18:12 ` Jeff King @ 2012-09-20 18:36 ` Junio C Hamano 2012-09-20 20:51 ` Jeff King 0 siblings, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2012-09-20 18:36 UTC (permalink / raw) To: Jeff King; +Cc: Shawn O. Pearce, git Jeff King <peff@peff.net> writes: > On Thu, Sep 20, 2012 at 10:53:15AM -0700, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: >> >> > I added the config item as remote.foo.smarthttp. You could also allow >> > "http.$url.smart" (and just "http.smart", for that matter), which could >> > be more flexible if you have multiple remotes pointing to the same >> > broken server. >> >> What would the user experience be when we introduce "even smarter" >> http server protocol extension? Will we add remote.foo.starterhttp? > > I would hope that it would actually be negotiated reliably at the > protocol level so we do not have to deal with this mess again. The original dumb vs smart was supposed to be "negotiated reliably at the protocol level", no? Yet we need this band-aid, so... >> Perhaps >> >> remote.$name.httpvariants = [smart] [dumb] >> >> to allow users to say "smart only", "dumb only", or "smart and/or >> dumb" might be more code but less burden on the users. > > I don't mind that format if we are going that direction, but is there > anybody who actually wants to say "smart only?" With 703e6e7 reverted, we take a failure from the initial smart request to mean the server is simply not serving, so "smart only" to fail quickly without trying dumb fallback is not needed. "smart only" to say "I wouldn't want to talk to dumb-only server---I do not have infinite amount of time, and I'd rather try another server" is still a possibility, but likely not worth supporting. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] remote-curl: let users turn off smart http 2012-09-20 18:36 ` Junio C Hamano @ 2012-09-20 20:51 ` Jeff King 2012-09-20 21:15 ` Junio C Hamano 0 siblings, 1 reply; 36+ messages in thread From: Jeff King @ 2012-09-20 20:51 UTC (permalink / raw) To: Junio C Hamano; +Cc: Shawn O. Pearce, git On Thu, Sep 20, 2012 at 11:36:34AM -0700, Junio C Hamano wrote: > >> What would the user experience be when we introduce "even smarter" > >> http server protocol extension? Will we add remote.foo.starterhttp? > > > > I would hope that it would actually be negotiated reliably at the > > protocol level so we do not have to deal with this mess again. > > The original dumb vs smart was supposed to be "negotiated reliably > at the protocol level", no? Yet we need this band-aid, so... I started to write much more in my original response, but deleted it as being too wordy. I guess I will have to rewrite it now. :) The difference is that jumping from dumb to smart had to give context clues at the HTTP layer. That is, by sending a query string, the client sends a single bit that tells the server "I understand smart http", and the server responds with output that indicates it also understands. We had to embed this in the HTTP layer, because the previous iteration wasn't running any custom git code at all. Whereas if we were to enhance the protocol again, it would probably _still_ begin with the same type of initial query, but we would negotiate more at the git-protocol level. And there we are in charge of how the implementation responds and handles backwards compatibility. This has already happened to some degree. We have added new capabilities at the git-protocol level, and it worked without these problems. It's not a "new protocol", but it is a backwards-compatible enhancement. And it's the likely mode for new enhancements in the future. It's possible we could have something drastically different in the future that does not even start with the same initial git conversation. But even then, I think we'd do it with a new "git-upload-pack2" service tag, or git:// and ssh access would be left behind. > >> Perhaps > >> > >> remote.$name.httpvariants = [smart] [dumb] > >> > >> to allow users to say "smart only", "dumb only", or "smart and/or > >> dumb" might be more code but less burden on the users. > > > > I don't mind that format if we are going that direction, but is there > > anybody who actually wants to say "smart only?" > > With 703e6e7 reverted, we take a failure from the initial smart > request to mean the server is simply not serving, so "smart only" to > fail quickly without trying dumb fallback is not needed. "smart > only" to say "I wouldn't want to talk to dumb-only server---I do not > have infinite amount of time, and I'd rather try another server" is > still a possibility, but likely not worth supporting. Yes. I do still need to resurrect my fetch-a-bundle-by-http code, which could also be covered by such a switch. But I guess I am just not sure if there is any point in spending effort to implement toggles that nobody has actually asked for. I'm also a little iffy on it because we would be inventing new config syntax. I don't think we want to split the list across multiple config items (which makes our usual later-config-overwrites-earlier rules behave badly). So what is the value format? Is it a whitespace-delimited case-insensitive list completely specifying the transports allowed? What happens if a new value is added. Do people who have said "smart" not get the new value, even though all they really wanted to say was "not dumb"? What about people who write "bundle smart" because their new version of git understands it, but then have old versions of git barf on it? Most of our current config is very toggle-oriented, and I'm not sure there is precedent for an option exactly like this. We can try to come up with answers to those questions, but I don't think doing it is as simple as just changing a few lines of code to support !dumb and !smart modes. I'm half-tempted to just drop the config entirely, leave GIT_SMART_HTTP=false as an escape hatch, and see if anybody even cares. At least then we're not promising support for a config option that we may want to change later. What do you want to do? -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] remote-curl: let users turn off smart http 2012-09-20 20:51 ` Jeff King @ 2012-09-20 21:15 ` Junio C Hamano 2012-09-20 21:30 ` Jeff King 0 siblings, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2012-09-20 21:15 UTC (permalink / raw) To: Jeff King; +Cc: Shawn O. Pearce, git Jeff King <peff@peff.net> writes: > I'm half-tempted to just drop the config entirely, leave > GIT_SMART_HTTP=false as an escape hatch, and see if anybody even cares. Sounds like a very attractive minimalistic way to go forward. We can always add per-remote configuration when we find it necessary, but once we add support, we cannot easily yank it out. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] remote-curl: let users turn off smart http 2012-09-20 21:15 ` Junio C Hamano @ 2012-09-20 21:30 ` Jeff King 2012-09-21 17:34 ` Junio C Hamano 0 siblings, 1 reply; 36+ messages in thread From: Jeff King @ 2012-09-20 21:30 UTC (permalink / raw) To: Junio C Hamano; +Cc: Shawn O. Pearce, git On Thu, Sep 20, 2012 at 02:15:20PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > I'm half-tempted to just drop the config entirely, leave > > GIT_SMART_HTTP=false as an escape hatch, and see if anybody even cares. > > Sounds like a very attractive minimalistic way to go forward. We > can always add per-remote configuration when we find it necessary, > but once we add support, we cannot easily yank it out. Like this? -- >8 -- Subject: [PATCH] remote-curl: let users turn off smart http Usually there is no need for users to specify whether an http remote is smart or dumb; the protocol is designed so that a single initial request is made, and the client can determine the server's capability from the response. However, some misconfigured dumb-only servers may not like the initial request by a smart client, as it contains a query string. Until recently, commit 703e6e7 worked around this by making a second request. However, that commit was recently reverted due to its side effect of masking the initial request's error code. Since git has had that workaround for several years, we don't know exactly how many such misconfigured servers are out there. The reversion of 703e6e7 assumes they are rare enough not to worry about. Still, that reversion leaves somebody who does run into such a server with no escape hatch at all. Let's give them an environment variable they can tweak to perform the "dumb" request. This is intentionally not a documented interface. It's overly simple and is really there for debugging in case somebody does complain about git not working with their server. A real user-facing interface would entail a per-remote or per-URL config variable. Signed-off-by: Jeff King <peff@peff.net> --- remote-curl.c | 3 ++- t/t5551-http-fetch.sh | 12 ++++++++++++ 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/remote-curl.c b/remote-curl.c index c0b98cc..7b19ebb 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -102,7 +102,8 @@ static struct discovery* discover_refs(const char *service) free_discovery(last); strbuf_addf(&buffer, "%sinfo/refs", url); - if (!prefixcmp(url, "http://") || !prefixcmp(url, "https://")) { + if ((!prefixcmp(url, "http://") || !prefixcmp(url, "https://")) && + git_env_bool("GIT_SMART_HTTP", 1)) { maybe_smart = 1; if (!strchr(url, '?')) strbuf_addch(&buffer, '?'); diff --git a/t/t5551-http-fetch.sh b/t/t5551-http-fetch.sh index 2db5c35..8427943 100755 --- a/t/t5551-http-fetch.sh +++ b/t/t5551-http-fetch.sh @@ -129,6 +129,18 @@ test_expect_success 'clone from auth-only-for-push repository' ' test_cmp expect actual ' +test_expect_success 'disable dumb http on server' ' + git --git-dir="$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \ + config http.getanyfile false +' + +test_expect_success 'GIT_SMART_HTTP can disable smart http' ' + (GIT_SMART_HTTP=0 && + export GIT_SMART_HTTP && + cd clone && + test_must_fail git fetch) +' + test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE test_expect_success EXPENSIVE 'create 50,000 tags in the repo' ' -- 1.7.11.7.15.g085c6bd ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] remote-curl: let users turn off smart http 2012-09-20 21:30 ` Jeff King @ 2012-09-21 17:34 ` Junio C Hamano 2012-09-21 17:41 ` Jeff King 0 siblings, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2012-09-21 17:34 UTC (permalink / raw) To: Jeff King; +Cc: Shawn O. Pearce, git Jeff King <peff@peff.net> writes: > On Thu, Sep 20, 2012 at 02:15:20PM -0700, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: >> >> > I'm half-tempted to just drop the config entirely, leave >> > GIT_SMART_HTTP=false as an escape hatch, and see if anybody even cares. >> >> Sounds like a very attractive minimalistic way to go forward. We >> can always add per-remote configuration when we find it necessary, >> but once we add support, we cannot easily yank it out. > > Like this? Yeah. Will queue this one instead. The simpler, the better ;-) > > -- >8 -- > Subject: [PATCH] remote-curl: let users turn off smart http > > Usually there is no need for users to specify whether an > http remote is smart or dumb; the protocol is designed so > that a single initial request is made, and the client can > determine the server's capability from the response. > > However, some misconfigured dumb-only servers may not like > the initial request by a smart client, as it contains a > query string. Until recently, commit 703e6e7 worked around > this by making a second request. However, that commit was > recently reverted due to its side effect of masking the > initial request's error code. > > Since git has had that workaround for several years, we > don't know exactly how many such misconfigured servers are > out there. The reversion of 703e6e7 assumes they are rare > enough not to worry about. Still, that reversion leaves > somebody who does run into such a server with no escape > hatch at all. Let's give them an environment variable they > can tweak to perform the "dumb" request. > > This is intentionally not a documented interface. It's > overly simple and is really there for debugging in case > somebody does complain about git not working with their > server. A real user-facing interface would entail a > per-remote or per-URL config variable. > > Signed-off-by: Jeff King <peff@peff.net> > --- > remote-curl.c | 3 ++- > t/t5551-http-fetch.sh | 12 ++++++++++++ > 2 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/remote-curl.c b/remote-curl.c > index c0b98cc..7b19ebb 100644 > --- a/remote-curl.c > +++ b/remote-curl.c > @@ -102,7 +102,8 @@ static struct discovery* discover_refs(const char *service) > free_discovery(last); > > strbuf_addf(&buffer, "%sinfo/refs", url); > - if (!prefixcmp(url, "http://") || !prefixcmp(url, "https://")) { > + if ((!prefixcmp(url, "http://") || !prefixcmp(url, "https://")) && > + git_env_bool("GIT_SMART_HTTP", 1)) { > maybe_smart = 1; > if (!strchr(url, '?')) > strbuf_addch(&buffer, '?'); > diff --git a/t/t5551-http-fetch.sh b/t/t5551-http-fetch.sh > index 2db5c35..8427943 100755 > --- a/t/t5551-http-fetch.sh > +++ b/t/t5551-http-fetch.sh > @@ -129,6 +129,18 @@ test_expect_success 'clone from auth-only-for-push repository' ' > test_cmp expect actual > ' > > +test_expect_success 'disable dumb http on server' ' > + git --git-dir="$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \ > + config http.getanyfile false > +' > + > +test_expect_success 'GIT_SMART_HTTP can disable smart http' ' > + (GIT_SMART_HTTP=0 && > + export GIT_SMART_HTTP && > + cd clone && > + test_must_fail git fetch) > +' > + > test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE > > test_expect_success EXPENSIVE 'create 50,000 tags in the repo' ' ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] remote-curl: let users turn off smart http 2012-09-21 17:34 ` Junio C Hamano @ 2012-09-21 17:41 ` Jeff King 0 siblings, 0 replies; 36+ messages in thread From: Jeff King @ 2012-09-21 17:41 UTC (permalink / raw) To: Junio C Hamano; +Cc: Shawn O. Pearce, git On Fri, Sep 21, 2012 at 10:34:22AM -0700, Junio C Hamano wrote: > >> > I'm half-tempted to just drop the config entirely, leave > >> > GIT_SMART_HTTP=false as an escape hatch, and see if anybody even cares. > >> > >> Sounds like a very attractive minimalistic way to go forward. We > >> can always add per-remote configuration when we find it necessary, > >> but once we add support, we cannot easily yank it out. > > > > Like this? > > Yeah. Will queue this one instead. The simpler, the better ;-) Thanks. I almost followed up with a rebased version of my config patch, should we want to apply it later separately. But I think I would really rather gather data from even a single bug report before we move any further (and with any luck, there will be zero bug reports :) ). -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 2012-09-20 5:57 ` Shawn Pearce 2012-09-20 5:58 ` [PATCH] Revert "retry request without query when info/refs?query fails" Shawn O. Pearce @ 2012-09-20 17:24 ` Jeff King 2012-09-20 23:05 ` Shawn Pearce 1 sibling, 1 reply; 36+ messages in thread From: Jeff King @ 2012-09-20 17:24 UTC (permalink / raw) To: Shawn Pearce; +Cc: Junio C Hamano, git On Wed, Sep 19, 2012 at 10:57:35PM -0700, Shawn O. Pearce wrote: > > I have been looking into this recently, as well. GitHub does not allow > > dumb http at all these days, > > Interesting that GitHub doesn't support dumb transfer either. Our objects are still in regular repos, and is served by fairly stock git. The main show-stopper is that we share objects via alternates, and we aggressively repack the alternates repos. So a dumb client would end up being quite inefficient. We also toyed with having mixed public/private fork networks at one point, which would obviously necessitate disabling dumb access. But we gave it up as too risky, as you can do all sorts of weird tricks to convince git to disclose information about what's in the private repos (e.g., you can reliably find out which sha1s exist, and you can lie about which sha1s you have to ask git to create deltas against them). Doing a shared-object store right would require marking which repo "knows" which objects (I assume that's what Google Code does, but I don't remember if Dave talked about that aspect at last year's GitTogether). > > so transient errors on the initial smart > > contact can cause us to fall back to dumb, > > Transient errors is actually what is leading me down this path. We see > about 0.0455% of our requests to the Android hosting service as these > dumb fallbacks. This means the client never got a proper smart service > reply. Server logs suggest we sent a valid response, so I am > suspecting transient proxy errors, but its hard to debug because > clients discard the error. Yup, we see the same thing. I've tracked a few down manually to actual things like gateway timeouts on our reverse proxy. > > and end up reporting a > > totally useless 403 Forbidden error. > > Today I posted a change to JGit [1] to make this 406 Not Acceptable as > I think it better matches the description of the failure. It also no > longer reuses the common 403 Forbidden error that a server might > choose to return if a request was given with invalid credentials. That might be worth doing for git-http-backend, too. It might even make sense for the git client to recognize the 406 (and possibly the 403) and print a more useful message. > > 1. If both smart and dumb requests fail, report the error for the > > smart request. Now that smart-http clients are common, I'd expect > > most http servers to be smart these days. Of course I don't have > > any sort of numbers to back this up (nor am I sure how to get them; > > obviously big sites like GitHub and Google Code do a lot of > > traffic, but who knows how many one-off repo-on-a-generic-web-host > > sites still exist?). > > I suspect there are still a number of servers that rely on dumb > protocol to host repositories because its very simple to setup. > Breaking support for this wouldn't be a good idea. I don't think it would break on most servers, though. Even for a dumb server, the initial error will be a useful one. It's only the weirdly-configured ones where you get wildly different results depending on whether the query string is there. In other words, it is really no worse than reverting 703e6e76, and it might be better. In the common case, you get a better error message. In the broken-server case, we still try the fallback. So it will keep working on a broken server without any manual intervention, whereas reverting and adding a manual escape hatch means the user has to do something. BUT. I still think it's better to revert 703e6e76, because I really do think the broken-server case is an extreme enough minority that it is not even worth wasting the time of clients to make a second request (especially because the first request may very well have failed because of a network error that causes a long timeout, and the user then has to wait double to find out what the error is). Of course, I have no actual data aside from reading the original thread that led to 703e6e76, and the fact that nobody else mentioned it (not during the time when it was broken, and not even after, when people often still complain because they haven't upgraded yet). But who knows? Maybe I will eat my words, and we will end up getting that data in the form of complaints. :) We can always switch to fallback-but-prefer-the-initial-error then. And we'll have more data on exactly how the misconfigured servers behave. -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 2012-09-20 17:24 ` [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 Jeff King @ 2012-09-20 23:05 ` Shawn Pearce 2012-09-21 5:26 ` Jeff King 0 siblings, 1 reply; 36+ messages in thread From: Shawn Pearce @ 2012-09-20 23:05 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git On Thu, Sep 20, 2012 at 10:24 AM, Jeff King <peff@peff.net> wrote: > On Wed, Sep 19, 2012 at 10:57:35PM -0700, Shawn O. Pearce wrote: > >> > so transient errors on the initial smart >> > contact can cause us to fall back to dumb, >> >> Transient errors is actually what is leading me down this path. We see >> about 0.0455% of our requests to the Android hosting service as these >> dumb fallbacks. This means the client never got a proper smart service >> reply. Server logs suggest we sent a valid response, so I am >> suspecting transient proxy errors, but its hard to debug because >> clients discard the error. > > Yup, we see the same thing. I've tracked a few down manually to actual > things like gateway timeouts on our reverse proxy. Our reverse proxies also fail sometimes. :-) But right now I am seeing failures in libcurl's SSL connection that may also be causing the smart connection failures. For example this trace, where libcurl was just not able to connect to respond to the 401 with a password. I suspect what is happening is the SSL session dropped out of cache on our servers, and libcurl couldn't reuse the existing SSL session. Instead of discarding the bad session and retrying, Git aborts. I'm willing to bet modern browsers just discard the bad session and start a new one, because clients can't assume the remote server will be able to remember their session forever. * Couldn't find host android.googlesource.com in the .netrc file; using defaults * About to connect() to android.googlesource.com port 443 (#0) * Trying 2607:f8b0:400e:c02::52... * Connected to android.googlesource.com (2607:f8b0:400e:c02::52) port 443 (#0) * successfully set certificate verify locations: * CAfile: none CApath: /etc/ssl/certs * SSL connection using RC4-SHA * Server certificate: * subject: C=US; ST=California; L=Mountain View; O=Google Inc; CN=*.googlecode.com * start date: 2012-08-16 12:25:39 GMT * expire date: 2013-06-07 19:43:27 GMT * subjectAltName: android.googlesource.com matched * issuer: C=US; O=Google Inc; CN=Google Internet Authority * SSL certificate verify ok. > GET /a/platform/tools/build/info/refs?service=git-upload-pack HTTP/1.1 User-Agent: git/1.7.12.1.1.g9b7ccb3 Host: android.googlesource.com Accept: */* Pragma: no-cache * The requested URL returned error: 401 * Closing connection #0 * Couldn't find host android.googlesource.com in the .netrc file; using defaults * About to connect() to android.googlesource.com port 443 (#0) * Trying 2607:f8b0:400e:c02::52... * Connected to android.googlesource.com (2607:f8b0:400e:c02::52) port 443 (#0) * successfully set certificate verify locations: * CAfile: none CApath: /etc/ssl/certs * SSL re-using session ID * Unknown SSL protocol error in connection to android.googlesource.com:443 * Expire cleared * Closing connection #0 error: Unknown SSL protocol error in connection to android.googlesource.com:443 while accessing https://android.googlesource.com/a/platform/tools/build/info/refs?service=git-upload-pack fatal: HTTP request failed ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 2012-09-20 23:05 ` Shawn Pearce @ 2012-09-21 5:26 ` Jeff King 2012-09-21 14:19 ` Shawn Pearce 0 siblings, 1 reply; 36+ messages in thread From: Jeff King @ 2012-09-21 5:26 UTC (permalink / raw) To: Shawn Pearce; +Cc: Junio C Hamano, git On Thu, Sep 20, 2012 at 04:05:03PM -0700, Shawn O. Pearce wrote: > But right now I am seeing failures in libcurl's SSL connection that > may also be causing the smart connection failures. For example this > trace, where libcurl was just not able to connect to respond to the > 401 with a password. I suspect what is happening is the SSL session > dropped out of cache on our servers, and libcurl couldn't reuse the > existing SSL session. Instead of discarding the bad session and > retrying, Git aborts. I'm willing to bet modern browsers just discard > the bad session and start a new one, because clients can't assume the > remote server will be able to remember their session forever. That's something I haven't seen. But then, I don't usually see the client side; I just see the fallback dumb fetch in our logs, and have occasionally followed up. Is there a long pause while the user is typing their password? > * SSL re-using session ID > * Unknown SSL protocol error in connection to android.googlesource.com:443 > * Expire cleared > * Closing connection #0 > error: Unknown SSL protocol error in connection to > android.googlesource.com:443 while accessing > https://android.googlesource.com/a/platform/tools/build/info/refs?service=git-upload-pack > fatal: HTTP request failed You could try turning off CURLOPT_SSL_SESSIONID_CACHE and seeing if that improves it. Of course, it is probably hard to reproduce, so it would be tough to know if that helped or not. It would also be nice if you could dump more information on the error from the ssl library (I typically build curl against openssl; I wonder if it could be related to using gnutls or something). -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 2012-09-21 5:26 ` Jeff King @ 2012-09-21 14:19 ` Shawn Pearce 2012-10-01 21:23 ` [PATCH] Retry HTTP requests on SSL connect failures Shawn O. Pearce 0 siblings, 1 reply; 36+ messages in thread From: Shawn Pearce @ 2012-09-21 14:19 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git On Thu, Sep 20, 2012 at 10:26 PM, Jeff King <peff@peff.net> wrote: > On Thu, Sep 20, 2012 at 04:05:03PM -0700, Shawn O. Pearce wrote: > >> But right now I am seeing failures in libcurl's SSL connection that >> may also be causing the smart connection failures. For example this >> trace, where libcurl was just not able to connect to respond to the >> 401 with a password. I suspect what is happening is the SSL session >> dropped out of cache on our servers, and libcurl couldn't reuse the >> existing SSL session. Instead of discarding the bad session and >> retrying, Git aborts. I'm willing to bet modern browsers just discard >> the bad session and start a new one, because clients can't assume the >> remote server will be able to remember their session forever. > > That's something I haven't seen. But then, I don't usually see the > client side; I just see the fallback dumb fetch in our logs, and > have occasionally followed up. I hadn't seen this either until I deleted the fallback code from remote-curl.c and ran git ls-remote in a while true loop for 6 hours. Its obviously happening though. > Is there a long pause while the user is typing their password? No. The password comes off a credential helper that has access to it from a credential store. There is very little lag here, under 100 ms. >> * SSL re-using session ID >> * Unknown SSL protocol error in connection to android.googlesource.com:443 >> * Expire cleared >> * Closing connection #0 >> error: Unknown SSL protocol error in connection to >> android.googlesource.com:443 while accessing >> https://android.googlesource.com/a/platform/tools/build/info/refs?service=git-upload-pack >> fatal: HTTP request failed > > You could try turning off CURLOPT_SSL_SESSIONID_CACHE and seeing if that > improves it. Of course, it is probably hard to reproduce, so it would be > tough to know if that helped or not. It would also be nice if you could > dump more information on the error from the ssl library (I typically > build curl against openssl; I wonder if it could be related to using > gnutls or something). This is OpenSSL, because I also always build against OpenSSL. :-) I'll try the CURLOPT_SSL_SESSIONID_CACHE today. It is hard to reproduce, so not producing it doesn't necessarily mean it isn't still there. ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] Retry HTTP requests on SSL connect failures 2012-09-21 14:19 ` Shawn Pearce @ 2012-10-01 21:23 ` Shawn O. Pearce 2012-10-01 21:47 ` Junio C Hamano ` (3 more replies) 0 siblings, 4 replies; 36+ messages in thread From: Shawn O. Pearce @ 2012-10-01 21:23 UTC (permalink / raw) To: Junio C Hamano, Jeff King; +Cc: git, Shawn O. Pearce From: "Shawn O. Pearce" <spearce@spearce.org> When libcurl fails to connect to an SSL server always retry the request once. Since the connection failed before the HTTP headers can be sent, no data has exchanged hands, so the remote side has not learned of the request and will not perform it twice. In the wild we have seen git-remote-https fail to connect to some load-balanced SSL servers sporadically, while modern popular browsers (e.g. Firefox and Chromium) have no trouble with the same server pool. Lets assume the site operators (Hi Google!) have a clue and are doing everything they already can to ensure secure, successful SSL connections from a wide range of HTTP clients. Implementing a single level of retry in the client can make it more robust against transient failure modes. --- http.c | 19 ++++++++++++------- remote-curl.c | 2 ++ 2 files changed, 14 insertions(+), 7 deletions(-) diff --git a/http.c b/http.c index 345c171..953f2e6 100644 --- a/http.c +++ b/http.c @@ -784,7 +784,7 @@ static int http_request(const char *url, void *result, int target, int options) struct slot_results results; struct curl_slist *headers = NULL; struct strbuf buf = STRBUF_INIT; - int ret; + int ret, attempts; slot = get_active_slot(); slot->results = &results; @@ -820,12 +820,17 @@ static int http_request(const char *url, void *result, int target, int options) curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, headers); curl_easy_setopt(slot->curl, CURLOPT_ENCODING, "gzip"); - if (start_active_slot(slot)) { - run_active_slot(slot); - ret = handle_curl_result(slot); - } else { - error("Unable to start HTTP request for %s", url); - ret = HTTP_START_FAILED; + for (attempts = 0; attempts < 2; attempts++) { + if (start_active_slot(slot)) { + run_active_slot(slot); + if (slot->results->curl_result == CURLE_SSL_CONNECT_ERROR) + continue; + ret = handle_curl_result(slot); + } else { + error("Unable to start HTTP request for %s", url); + ret = HTTP_START_FAILED; + } + break; } curl_slist_free_all(headers); diff --git a/remote-curl.c b/remote-curl.c index a269608..04a379c 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -353,6 +353,8 @@ static int run_slot(struct active_request_slot *slot) slot->results = &results; slot->curl_result = curl_easy_perform(slot->curl); + if (slot->curl_result == CURLE_SSL_CONNECT_ERROR) + slot->curl_result = curl_easy_perform(slot->curl); finish_active_slot(slot); err = handle_curl_result(slot); -- 1.7.12.1.590.g4bb1bc4 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH] Retry HTTP requests on SSL connect failures 2012-10-01 21:23 ` [PATCH] Retry HTTP requests on SSL connect failures Shawn O. Pearce @ 2012-10-01 21:47 ` Junio C Hamano 2012-10-01 21:53 ` Junio C Hamano ` (2 subsequent siblings) 3 siblings, 0 replies; 36+ messages in thread From: Junio C Hamano @ 2012-10-01 21:47 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Jeff King, git "Shawn O. Pearce" <spearce@spearce.org> writes: > Lets assume the site operators (Hi Google!) have a clue and are > doing everything they already can to ensure secure, successful > SSL connections from a wide range of HTTP clients. Implementing a > single level of retry in the client can make it more robust against > transient failure modes. > --- Sign off? > http.c | 19 ++++++++++++------- > remote-curl.c | 2 ++ > 2 files changed, 14 insertions(+), 7 deletions(-) > > diff --git a/http.c b/http.c > index 345c171..953f2e6 100644 > --- a/http.c > +++ b/http.c > @@ -784,7 +784,7 @@ static int http_request(const char *url, void *result, int target, int options) > struct slot_results results; > struct curl_slist *headers = NULL; > struct strbuf buf = STRBUF_INIT; > - int ret; > + int ret, attempts; > > slot = get_active_slot(); > slot->results = &results; > @@ -820,12 +820,17 @@ static int http_request(const char *url, void *result, int target, int options) > curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, headers); > curl_easy_setopt(slot->curl, CURLOPT_ENCODING, "gzip"); > > - if (start_active_slot(slot)) { > - run_active_slot(slot); > - ret = handle_curl_result(slot); > - } else { > - error("Unable to start HTTP request for %s", url); > - ret = HTTP_START_FAILED; > + for (attempts = 0; attempts < 2; attempts++) { > + if (start_active_slot(slot)) { > + run_active_slot(slot); > + if (slot->results->curl_result == CURLE_SSL_CONNECT_ERROR) > + continue; > + ret = handle_curl_result(slot); > + } else { > + error("Unable to start HTTP request for %s", url); > + ret = HTTP_START_FAILED; > + } > + break; > } Two naïve questions, that applies to this and the one in remote-curl.c::run_slot(). (1) why only twice? (2) no need for "wait a bit and then retry"? > diff --git a/remote-curl.c b/remote-curl.c > index a269608..04a379c 100644 > --- a/remote-curl.c > +++ b/remote-curl.c > @@ -353,6 +353,8 @@ static int run_slot(struct active_request_slot *slot) > > slot->results = &results; > slot->curl_result = curl_easy_perform(slot->curl); > + if (slot->curl_result == CURLE_SSL_CONNECT_ERROR) > + slot->curl_result = curl_easy_perform(slot->curl); > finish_active_slot(slot); > > err = handle_curl_result(slot); ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Retry HTTP requests on SSL connect failures 2012-10-01 21:23 ` [PATCH] Retry HTTP requests on SSL connect failures Shawn O. Pearce 2012-10-01 21:47 ` Junio C Hamano @ 2012-10-01 21:53 ` Junio C Hamano 2012-10-01 22:23 ` Jeff King 2012-10-01 22:18 ` Jeff King 2012-10-02 0:14 ` Drew Northup 3 siblings, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2012-10-01 21:53 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Jeff King, git "Shawn O. Pearce" <spearce@spearce.org> writes: > + for (attempts = 0; attempts < 2; attempts++) { > + if (start_active_slot(slot)) { > + run_active_slot(slot); > + if (slot->results->curl_result == CURLE_SSL_CONNECT_ERROR) > + continue; Is it safe to continue and let start_active_slot() to add the same curl handle again when USE_CURL_MULTI is in effect? > + ret = handle_curl_result(slot); > + } else { > + error("Unable to start HTTP request for %s", url); > + ret = HTTP_START_FAILED; > + } > + break; > } > > curl_slist_free_all(headers); > diff --git a/remote-curl.c b/remote-curl.c > index a269608..04a379c 100644 > --- a/remote-curl.c > +++ b/remote-curl.c > @@ -353,6 +353,8 @@ static int run_slot(struct active_request_slot *slot) > > slot->results = &results; > slot->curl_result = curl_easy_perform(slot->curl); > + if (slot->curl_result == CURLE_SSL_CONNECT_ERROR) > + slot->curl_result = curl_easy_perform(slot->curl); > finish_active_slot(slot); > > err = handle_curl_result(slot); ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Retry HTTP requests on SSL connect failures 2012-10-01 21:53 ` Junio C Hamano @ 2012-10-01 22:23 ` Jeff King 2012-10-01 23:20 ` Junio C Hamano 0 siblings, 1 reply; 36+ messages in thread From: Jeff King @ 2012-10-01 22:23 UTC (permalink / raw) To: Junio C Hamano; +Cc: Shawn O. Pearce, git On Mon, Oct 01, 2012 at 02:53:17PM -0700, Junio C Hamano wrote: > "Shawn O. Pearce" <spearce@spearce.org> writes: > > > + for (attempts = 0; attempts < 2; attempts++) { > > + if (start_active_slot(slot)) { > > + run_active_slot(slot); > > + if (slot->results->curl_result == CURLE_SSL_CONNECT_ERROR) > > + continue; > > Is it safe to continue and let start_active_slot() to add the same > curl handle again when USE_CURL_MULTI is in effect? I _think_ so. We reuse the slots anyway. So the usual workflow would be get_active_slot, then start_active_slot, then run_active_slot. This loop omits get_active_slot, which is responsible for (re-)initializing a bunch of aspects of the slot. But we wouldn't want that here, since it would mean we'd have to set up our URL, callbacks, etc, again. My only worry would be that the failed curl request actually ended up writing some data or made some other state change. But since we are explicitly catching only ssl connection failures, presumably that would not have happened. -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Retry HTTP requests on SSL connect failures 2012-10-01 22:23 ` Jeff King @ 2012-10-01 23:20 ` Junio C Hamano 0 siblings, 0 replies; 36+ messages in thread From: Junio C Hamano @ 2012-10-01 23:20 UTC (permalink / raw) To: Jeff King; +Cc: Shawn O. Pearce, git Jeff King <peff@peff.net> writes: > On Mon, Oct 01, 2012 at 02:53:17PM -0700, Junio C Hamano wrote: > >> "Shawn O. Pearce" <spearce@spearce.org> writes: >> >> > + for (attempts = 0; attempts < 2; attempts++) { >> > + if (start_active_slot(slot)) { >> > + run_active_slot(slot); >> > + if (slot->results->curl_result == CURLE_SSL_CONNECT_ERROR) >> > + continue; >> >> Is it safe to continue and let start_active_slot() to add the same >> curl handle again when USE_CURL_MULTI is in effect? > > I _think_ so. It seems that at the beginning of curl_multi_add_handle() there is a check to see if the incoming slot->curl has already been added to some curl-multi-handle and the function would return an error code CURLM_BAD_EASY_HANDLE without doing anything useful. Doesn't the second attempt to call start_active_slot() set the slot->in_use to zero and return false, skipping the call to run_active_slot() in that case? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Retry HTTP requests on SSL connect failures 2012-10-01 21:23 ` [PATCH] Retry HTTP requests on SSL connect failures Shawn O. Pearce 2012-10-01 21:47 ` Junio C Hamano 2012-10-01 21:53 ` Junio C Hamano @ 2012-10-01 22:18 ` Jeff King 2012-10-02 2:38 ` Shawn Pearce 2012-10-02 0:14 ` Drew Northup 3 siblings, 1 reply; 36+ messages in thread From: Jeff King @ 2012-10-01 22:18 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Junio C Hamano, git On Mon, Oct 01, 2012 at 02:23:06PM -0700, Shawn O. Pearce wrote: > From: "Shawn O. Pearce" <spearce@spearce.org> > > When libcurl fails to connect to an SSL server always retry the > request once. Since the connection failed before the HTTP headers > can be sent, no data has exchanged hands, so the remote side has > not learned of the request and will not perform it twice. > > In the wild we have seen git-remote-https fail to connect to > some load-balanced SSL servers sporadically, while modern popular > browsers (e.g. Firefox and Chromium) have no trouble with the same > server pool. > > Lets assume the site operators (Hi Google!) have a clue and are > doing everything they already can to ensure secure, successful > SSL connections from a wide range of HTTP clients. Implementing a > single level of retry in the client can make it more robust against > transient failure modes. I find this a little distasteful just because we haven't figured out the actual _reason_ for the failure. That is, I'm not convinced this isn't something that curl or the ssl library can't handle internally if we would only configure them correctly. Did you ever follow up on tweaking the session caching options for curl? Have you tried running your fails-after-a-few-hours request with other clients that don't have the problem and seeing what they do (I'm thinking a small webkit harness or something would be the most feasible)? That being said, you did make it so that it only kicks in during ssl connect errors: > + for (attempts = 0; attempts < 2; attempts++) { > + if (start_active_slot(slot)) { > + run_active_slot(slot); > + if (slot->results->curl_result == CURLE_SSL_CONNECT_ERROR) > + continue; > + ret = handle_curl_result(slot); > + } else { > + error("Unable to start HTTP request for %s", url); > + ret = HTTP_START_FAILED; > + } > + break; which means it shouldn't really be affecting the general populace. So even though it feels like a dirty hack, at least it is self-contained, and it does fix a real-world problem. If your answer to the above questions is "hunting this further is just not worth the effort", I can live with that. > diff --git a/remote-curl.c b/remote-curl.c > index a269608..04a379c 100644 > --- a/remote-curl.c > +++ b/remote-curl.c > @@ -353,6 +353,8 @@ static int run_slot(struct active_request_slot *slot) > > slot->results = &results; > slot->curl_result = curl_easy_perform(slot->curl); > + if (slot->curl_result == CURLE_SSL_CONNECT_ERROR) > + slot->curl_result = curl_easy_perform(slot->curl); > finish_active_slot(slot); How come the first hunk gets a nice for-loop and this one doesn't? Also, are these hunks the only two spots where this error can come up? The first one does http_request, which handles smart-http GET requests. the second does run_slot, which handles smart-http POST requests. Some of the dumb http fetches will go through http_request. But some will not. And I think almost none of dumb http push will. -Peff ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Retry HTTP requests on SSL connect failures 2012-10-01 22:18 ` Jeff King @ 2012-10-02 2:38 ` Shawn Pearce 2012-10-02 13:57 ` Drew Northup 0 siblings, 1 reply; 36+ messages in thread From: Shawn Pearce @ 2012-10-02 2:38 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git On Mon, Oct 1, 2012 at 3:18 PM, Jeff King <peff@peff.net> wrote: > On Mon, Oct 01, 2012 at 02:23:06PM -0700, Shawn O. Pearce wrote: >> >> When libcurl fails to connect to an SSL server always retry the >> request once. Since the connection failed before the HTTP headers >> can be sent, no data has exchanged hands, so the remote side has >> not learned of the request and will not perform it twice. > > I find this a little distasteful just because we haven't figured out the > actual _reason_ for the failure. That is, I'm not convinced this isn't > something that curl or the ssl library can't handle internally if we > would only configure them correctly. Did you ever follow up on tweaking > the session caching options for curl? No. I didn't try because I reproduced the issue on the initial "GET /.../info/refs?service=git-upload-pack" request with no authentication required. So the very first thing the remote-https process did was fail on an SSL error. During this run I was using a patched Git that had a different version of the retry logic, but it immediately retried and the retry was successful. At that point I decided the SSL session cache wasn't possibly relevant since the first request failed and the immediate retry was OK. > Have you tried running your fails-after-a-few-hours request with other > clients that don't have the problem and seeing what they do This is harder to reproduce than you think. It took me about 5 days of continuous polling to reproduce the error. And I have thus far only reproduced it against our production servers. This makes it very hard to test anything. Or to prove that any given patch is better than a different version. > (I'm > thinking a small webkit harness or something would be the most > feasible)? So I suspect the contrib/persistent-https proxy thing in Go actually papers over this problem by having the Go SSL client handle the connection. But this is only based on a test I ran for several days through that proxy that did not reproduce the bug. This doesn't mean it doesn't reproduce with the proxy, it just means _I_ didn't get lucky with an error in a ~48 hour run. > which means it shouldn't really be affecting the general populace. So > even though it feels like a dirty hack, at least it is self-contained, > and it does fix a real-world problem. If your answer to the above > questions is "hunting this further is just not worth the effort", I can > live with that. I am sort of at that point, but the hack is so ugly... yea, we shouldn't have to do this. Or pollute our code with it. I'm willing to go back and iterate on this further, but its going to be a while before I can provide any more information. >> diff --git a/remote-curl.c b/remote-curl.c >> index a269608..04a379c 100644 >> --- a/remote-curl.c >> +++ b/remote-curl.c >> @@ -353,6 +353,8 @@ static int run_slot(struct active_request_slot *slot) >> >> slot->results = &results; >> slot->curl_result = curl_easy_perform(slot->curl); >> + if (slot->curl_result == CURLE_SSL_CONNECT_ERROR) >> + slot->curl_result = curl_easy_perform(slot->curl); >> finish_active_slot(slot); > > How come the first hunk gets a nice for-loop and this one doesn't? Both hunks retry exactly once after an SSL connect error. I just tried to pick something reasonably clean to implement. This hunk seemed simple with the if, the other was uglier and a loop seemed the most simple way to get a retry in there. > Also, are these hunks the only two spots where this error can come up? > The first one does http_request, which handles smart-http GET requests. > the second does run_slot, which handles smart-http POST requests. Grrr. I thought I caught all of the curl perform calls but I guess I missed the dumb transport. > Some of the dumb http fetches will go through http_request. But some > will not. And I think almost none of dumb http push will. Well, don't use those? :-) ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Retry HTTP requests on SSL connect failures 2012-10-02 2:38 ` Shawn Pearce @ 2012-10-02 13:57 ` Drew Northup 0 siblings, 0 replies; 36+ messages in thread From: Drew Northup @ 2012-10-02 13:57 UTC (permalink / raw) To: Shawn Pearce; +Cc: Jeff King, Junio C Hamano, git On Mon, Oct 1, 2012 at 10:38 PM, Shawn Pearce <spearce@spearce.org> wrote: > On Mon, Oct 1, 2012 at 3:18 PM, Jeff King <peff@peff.net> wrote: >> On Mon, Oct 01, 2012 at 02:23:06PM -0700, Shawn O. Pearce wrote: >>> >>> When libcurl fails to connect to an SSL server always retry the >>> request once. Since the connection failed before the HTTP headers >>> can be sent, no data has exchanged hands, so the remote side has >>> not learned of the request and will not perform it twice. >> >> I find this a little distasteful just because we haven't figured out the >> actual _reason_ for the failure. > > No. I didn't try because I reproduced the issue on the initial "GET > /.../info/refs?service=git-upload-pack" request with no authentication > required. So the very first thing the remote-https process did was > fail on an SSL error. During this run I was using a patched Git that > had a different version of the retry logic, but it immediately retried > and the retry was successful. At that point I decided the SSL session > cache wasn't possibly relevant since the first request failed and the > immediate retry was OK. > >> Have you tried running your fails-after-a-few-hours request with other >> clients that don't have the problem and seeing what they do > > This is harder to reproduce than you think. It took me about 5 days of > continuous polling to reproduce the error. And I have thus far only > reproduced it against our production servers. This makes it very hard > to test anything. Or to prove that any given patch is better than a > different version. The only sure way to make sure your patch works is to get your load balancers Slashdotted first (reason noted in my previous mail on this subject). For the sake of your relationship with your networking crew I'd not advise doing that intentionally. >> which means it shouldn't really be affecting the general populace. So >> even though it feels like a dirty hack, at least it is self-contained, >> and it does fix a real-world problem. If your answer to the above >> questions is "hunting this further is just not worth the effort", I can >> live with that. > > I am sort of at that point, but the hack is so ugly... yea, we > shouldn't have to do this. Or pollute our code with it. I'm willing to > go back and iterate on this further, but its going to be a while > before I can provide any more information. >> How come the first hunk gets a nice for-loop and this one doesn't? > > Both hunks retry exactly once after an SSL connect error. I just tried > to pick something reasonably clean to implement. This hunk seemed > simple with the if, the other was uglier and a loop seemed the most > simple way to get a retry in there. If indeed the problem you are having is with a load balanced setup then applying TCP/IP like back-off semantics is the right way to go. The only reason the network stack isn't doing it for you is because the load balancers wait for the SSL/TLS start before dumping the "excess" (exceeding of license) SSL connections. -- -Drew Northup -------------------------------------------------------------- "As opposed to vegetable or mineral error?" -John Pescatore, SANS NewsBites Vol. 12 Num. 59 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Retry HTTP requests on SSL connect failures 2012-10-01 21:23 ` [PATCH] Retry HTTP requests on SSL connect failures Shawn O. Pearce ` (2 preceding siblings ...) 2012-10-01 22:18 ` Jeff King @ 2012-10-02 0:14 ` Drew Northup 3 siblings, 0 replies; 36+ messages in thread From: Drew Northup @ 2012-10-02 0:14 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Junio C Hamano, Jeff King, git On Mon, Oct 1, 2012 at 5:23 PM, Shawn O. Pearce <spearce@spearce.org> wrote: > From: "Shawn O. Pearce" <spearce@spearce.org> > > When libcurl fails to connect to an SSL server always retry the > request once. Since the connection failed before the HTTP headers > can be sent, no data has exchanged hands, so the remote side has > not learned of the request and will not perform it twice. > > In the wild we have seen git-remote-https fail to connect to > some load-balanced SSL servers sporadically, while modern popular > browsers (e.g. Firefox and Chromium) have no trouble with the same > server pool. > > Lets assume the site operators (Hi Google!) have a clue and are > doing everything they already can to ensure secure, successful > SSL connections from a wide range of HTTP clients. Implementing a > single level of retry in the client can make it more robust against > transient failure modes. Ok, this begs for some background info... @Dayjob one of the many things I do is mange our load balancers (redundant pair in our case). If the attempted SSL connections in one "bin" (time-slot) exceeds the licensed size of that "bin" then the excess attempts are just "dropped on the floor." Normal web browsers detect this initial failure and try again. This may be implemented internally—I haven't checked. Google, as I am sure you are well aware, doesn't rely upon a traditional L2/L3 network level load balancing architecture. Therefore, I would not attempt to argue that the results that apply to their systems would apply much of anywhere else. (They have done presentations publicly, which are archived on the 'net, about how they do things.) -- -Drew Northup -------------------------------------------------------------- "As opposed to vegetable or mineral error?" -John Pescatore, SANS NewsBites Vol. 12 Num. 59 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re* [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 2012-09-20 2:55 [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 Shawn O. Pearce 2012-09-20 3:22 ` Shawn Pearce 2012-09-20 3:48 ` Jeff King @ 2012-09-20 4:14 ` Junio C Hamano 2012-09-20 4:14 ` [PATCH 1/2] Disable dumb HTTP fallback with GIT_DUMB_HTTP_FALLBACK=false Junio C Hamano 2012-09-20 4:14 ` [PATCH 2/2] remote-curl: make dumb-http fallback configurable per URL Junio C Hamano 2 siblings, 2 replies; 36+ messages in thread From: Junio C Hamano @ 2012-09-20 4:14 UTC (permalink / raw) To: git; +Cc: Shawn O. Pearce "Shawn O. Pearce" <spearce@spearce.org> writes: > From: "Shawn O. Pearce" <spearce@spearce.org> > > If the user doesn't want to use the dumb HTTP protocol, she may > set GIT_CURL_FALLBACK=0 in the environment before invoking a Git > protocol operation. This is mostly useful when testing against > servers that are known to not support the dumb protocol. If the > smart service detection fails the client should not continue with > dumb behavior, but instead provide accurate HTTP failure data. > > Signed-off-by: Shawn O. Pearce <spearce@spearce.org> > --- > remote-curl.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) I can see how this feature may be useful, but I have to say the external interface triply sucks. - If it is primarily for debugging smart HTTP, a solution with an environment variable without more permanent configuration variable would be sufficient, but then the environment variable is better named GIT_SMART_HTTP_DEBUG, or something no? - If it is useful outside the context of debugging, perhaps a per remote configuration variable remote.$name.$variable (or http.$prefix_of_server_url.$variable) might be necessary? - I do not see this as "fallback (to) curl"; you still talk your smart protocol over curl library. "fallback to dumb http" is more understandable. In any case, I think CURL_FALLBACK was named with CURL in its name primarily because the environment applies only to remote-curl, but that means we cannot have any fallback logic other than the current "smart does not work, fall back on dumb" in the future. Here is a bit of rewrite. [1/2] is yours but with a bit more sensible name. [2/2] is entirely optional. Junio C Hamano (1): remote-curl: make dumb-http fallback configurable per URL Shawn O. Pearce (1): Disable dumb HTTP fallback with GIT_DUMB_HTTP_FALLBACK=false remote-curl.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 53 insertions(+), 2 deletions(-) -- 1.7.12.1.389.g3dff30b ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 1/2] Disable dumb HTTP fallback with GIT_DUMB_HTTP_FALLBACK=false 2012-09-20 4:14 ` Re* [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 Junio C Hamano @ 2012-09-20 4:14 ` Junio C Hamano 2012-09-20 4:14 ` [PATCH 2/2] remote-curl: make dumb-http fallback configurable per URL Junio C Hamano 1 sibling, 0 replies; 36+ messages in thread From: Junio C Hamano @ 2012-09-20 4:14 UTC (permalink / raw) To: git; +Cc: Shawn O. Pearce From: "Shawn O. Pearce" <spearce@spearce.org> If the user doesn't want to use the dumb HTTP protocol, she may set GIT_DUMB_HTTP_FALLBACK=false in the environment before invoking a Git protocol operation. This is mostly useful when testing against servers that are known to not support the dumb protocol. If the smart service detection fails the client should not continue with dumb behavior, but instead provide accurate HTTP failure data. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- remote-curl.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/remote-curl.c b/remote-curl.c index 3ec474f..f25cf3c 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -17,7 +17,8 @@ struct options { unsigned progress : 1, followtags : 1, dry_run : 1, - thin : 1; + thin : 1, + fallback : 1; }; static struct options options; @@ -115,7 +116,8 @@ static struct discovery* discover_refs(const char *service) http_ret = http_get_strbuf(refs_url, &buffer, HTTP_NO_CACHE); /* try again with "plain" url (no ? or & appended) */ - if (http_ret != HTTP_OK && http_ret != HTTP_NOAUTH) { + if (options.fallback && http_ret != HTTP_OK + && http_ret != HTTP_NOAUTH) { free(refs_url); strbuf_reset(&buffer); @@ -853,6 +855,8 @@ static void parse_push(struct strbuf *buf) free(specs); } +static const char DUMB_HTTP_FALLBACK_ENV[] = "GIT_DUMB_HTTP_FALLBACK"; + int main(int argc, const char **argv) { struct strbuf buf = STRBUF_INIT; @@ -868,6 +872,12 @@ int main(int argc, const char **argv) options.verbosity = 1; options.progress = !!isatty(2); options.thin = 1; + options.fallback = 1; + + if (getenv(DUMB_HTTP_FALLBACK_ENV)) { + char *fb = getenv(DUMB_HTTP_FALLBACK_ENV); + options.fallback = git_config_bool(DUMB_HTTP_FALLBACK_ENV, fb); + } remote = remote_get(argv[1]); -- 1.7.12.1.389.g3dff30b ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 2/2] remote-curl: make dumb-http fallback configurable per URL 2012-09-20 4:14 ` Re* [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 Junio C Hamano 2012-09-20 4:14 ` [PATCH 1/2] Disable dumb HTTP fallback with GIT_DUMB_HTTP_FALLBACK=false Junio C Hamano @ 2012-09-20 4:14 ` Junio C Hamano 1 sibling, 0 replies; 36+ messages in thread From: Junio C Hamano @ 2012-09-20 4:14 UTC (permalink / raw) To: git Introduce http.$url_prefix.dumbhttpfallback configuration variables so that users do not have to set GIT_DUMB_HTTP_FALLBACK environment depending on which remote they are talking with. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- remote-curl.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 47 insertions(+), 6 deletions(-) diff --git a/remote-curl.c b/remote-curl.c index f25cf3c..44544c7 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -855,6 +855,46 @@ static void parse_push(struct strbuf *buf) free(specs); } +struct dumb_fallback_cb { + const char *url; + int value; +}; + +static int dumb_fallback_cb(const char *key, const char *value, void *cb_) +{ + struct dumb_fallback_cb *cb = cb_; + int i, len; + + /* Is this "http.$url.dumbHttpFallback"? */ + if (prefixcmp(key, "http.")) + return 0; + len = strrchr(key, '.') - key; + if (len <= 5) + /* we found the dot at the end of "http." */ + return 0; + key += 5; /* skip over http. part */ + len -= 5; + if (strcmp(key + len, ".dumbhttpfallback")) + return 0; + + /* Does the $url part match? */ + for (i = 0; i < len; i++) + if (cb->url[i] != key[i]) + return 0; + cb->value = git_config_bool(key, value); + return 0; +} + +static int dumb_fallback_config(const char *url) +{ + struct dumb_fallback_cb cb; + + cb.url = url; + cb.value = 1; /* defaults to true */ + git_config(dumb_fallback_cb, &cb); + return cb.value; +} + static const char DUMB_HTTP_FALLBACK_ENV[] = "GIT_DUMB_HTTP_FALLBACK"; int main(int argc, const char **argv) @@ -872,12 +912,6 @@ int main(int argc, const char **argv) options.verbosity = 1; options.progress = !!isatty(2); options.thin = 1; - options.fallback = 1; - - if (getenv(DUMB_HTTP_FALLBACK_ENV)) { - char *fb = getenv(DUMB_HTTP_FALLBACK_ENV); - options.fallback = git_config_bool(DUMB_HTTP_FALLBACK_ENV, fb); - } remote = remote_get(argv[1]); @@ -889,6 +923,13 @@ int main(int argc, const char **argv) url = strbuf_detach(&buf, NULL); + if (getenv(DUMB_HTTP_FALLBACK_ENV)) { + char *fb = getenv(DUMB_HTTP_FALLBACK_ENV); + options.fallback = git_config_bool(DUMB_HTTP_FALLBACK_ENV, fb); + } else { + options.fallback = dumb_fallback_config(url); + } + http_init(remote, url, 0); do { -- 1.7.12.1.389.g3dff30b ^ permalink raw reply related [flat|nested] 36+ messages in thread
end of thread, other threads:[~2012-10-02 13:57 UTC | newest] Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-09-20 2:55 [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 Shawn O. Pearce 2012-09-20 3:22 ` Shawn Pearce 2012-09-20 3:52 ` Jeff King 2012-09-20 3:48 ` Jeff King 2012-09-20 5:57 ` Shawn Pearce 2012-09-20 5:58 ` [PATCH] Revert "retry request without query when info/refs?query fails" Shawn O. Pearce 2012-09-20 6:29 ` Junio C Hamano 2012-09-20 6:31 ` Junio C Hamano 2012-09-20 16:24 ` Jeff King 2012-09-20 16:59 ` [PATCH 0/2] smart http toggle switch fails" Jeff King 2012-09-20 17:00 ` [PATCH 1/2] remote-curl: rename is_http variable Jeff King 2012-09-20 17:05 ` [PATCH 2/2] remote-curl: let users turn off smart http Jeff King 2012-09-20 17:53 ` Junio C Hamano 2012-09-20 18:12 ` Jeff King 2012-09-20 18:36 ` Junio C Hamano 2012-09-20 20:51 ` Jeff King 2012-09-20 21:15 ` Junio C Hamano 2012-09-20 21:30 ` Jeff King 2012-09-21 17:34 ` Junio C Hamano 2012-09-21 17:41 ` Jeff King 2012-09-20 17:24 ` [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 Jeff King 2012-09-20 23:05 ` Shawn Pearce 2012-09-21 5:26 ` Jeff King 2012-09-21 14:19 ` Shawn Pearce 2012-10-01 21:23 ` [PATCH] Retry HTTP requests on SSL connect failures Shawn O. Pearce 2012-10-01 21:47 ` Junio C Hamano 2012-10-01 21:53 ` Junio C Hamano 2012-10-01 22:23 ` Jeff King 2012-10-01 23:20 ` Junio C Hamano 2012-10-01 22:18 ` Jeff King 2012-10-02 2:38 ` Shawn Pearce 2012-10-02 13:57 ` Drew Northup 2012-10-02 0:14 ` Drew Northup 2012-09-20 4:14 ` Re* [PATCH] Disable dumb HTTP fallback with GIT_CURL_FALLBACK=0 Junio C Hamano 2012-09-20 4:14 ` [PATCH 1/2] Disable dumb HTTP fallback with GIT_DUMB_HTTP_FALLBACK=false Junio C Hamano 2012-09-20 4:14 ` [PATCH 2/2] remote-curl: make dumb-http fallback configurable per URL Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).