From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff King Subject: Re: [PATCH] Retry HTTP requests on SSL connect failures Date: Mon, 1 Oct 2012 18:18:17 -0400 Message-ID: <20121001221817.GA12496@sigill.intra.peff.net> References: <1349126586-755-1-git-send-email-spearce@spearce.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Cc: Junio C Hamano , git@vger.kernel.org To: "Shawn O. Pearce" X-From: git-owner@vger.kernel.org Tue Oct 02 00:18:43 2012 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1TIoJv-0003mJ-K3 for gcvg-git-2@plane.gmane.org; Tue, 02 Oct 2012 00:18:35 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757434Ab2JAWSW (ORCPT ); Mon, 1 Oct 2012 18:18:22 -0400 Received: from 75-15-5-89.uvs.iplsin.sbcglobal.net ([75.15.5.89]:37508 "EHLO peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753051Ab2JAWSU (ORCPT ); Mon, 1 Oct 2012 18:18:20 -0400 Received: (qmail 24015 invoked by uid 107); 1 Oct 2012 22:18:50 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) (smtp-auth username relayok, mechanism cram-md5) by peff.net (qpsmtpd/0.84) with ESMTPA; Mon, 01 Oct 2012 18:18:50 -0400 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Mon, 01 Oct 2012 18:18:17 -0400 Content-Disposition: inline In-Reply-To: <1349126586-755-1-git-send-email-spearce@spearce.org> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Mon, Oct 01, 2012 at 02:23:06PM -0700, Shawn O. Pearce wrote: > From: "Shawn O. Pearce" > > When libcurl fails to connect to an SSL server always retry the > request once. Since the connection failed before the HTTP headers > can be sent, no data has exchanged hands, so the remote side has > not learned of the request and will not perform it twice. > > In the wild we have seen git-remote-https fail to connect to > some load-balanced SSL servers sporadically, while modern popular > browsers (e.g. Firefox and Chromium) have no trouble with the same > server pool. > > Lets assume the site operators (Hi Google!) have a clue and are > doing everything they already can to ensure secure, successful > SSL connections from a wide range of HTTP clients. Implementing a > single level of retry in the client can make it more robust against > transient failure modes. I find this a little distasteful just because we haven't figured out the actual _reason_ for the failure. That is, I'm not convinced this isn't something that curl or the ssl library can't handle internally if we would only configure them correctly. Did you ever follow up on tweaking the session caching options for curl? Have you tried running your fails-after-a-few-hours request with other clients that don't have the problem and seeing what they do (I'm thinking a small webkit harness or something would be the most feasible)? That being said, you did make it so that it only kicks in during ssl connect errors: > + for (attempts = 0; attempts < 2; attempts++) { > + if (start_active_slot(slot)) { > + run_active_slot(slot); > + if (slot->results->curl_result == CURLE_SSL_CONNECT_ERROR) > + continue; > + ret = handle_curl_result(slot); > + } else { > + error("Unable to start HTTP request for %s", url); > + ret = HTTP_START_FAILED; > + } > + break; which means it shouldn't really be affecting the general populace. So even though it feels like a dirty hack, at least it is self-contained, and it does fix a real-world problem. If your answer to the above questions is "hunting this further is just not worth the effort", I can live with that. > diff --git a/remote-curl.c b/remote-curl.c > index a269608..04a379c 100644 > --- a/remote-curl.c > +++ b/remote-curl.c > @@ -353,6 +353,8 @@ static int run_slot(struct active_request_slot *slot) > > slot->results = &results; > slot->curl_result = curl_easy_perform(slot->curl); > + if (slot->curl_result == CURLE_SSL_CONNECT_ERROR) > + slot->curl_result = curl_easy_perform(slot->curl); > finish_active_slot(slot); How come the first hunk gets a nice for-loop and this one doesn't? Also, are these hunks the only two spots where this error can come up? The first one does http_request, which handles smart-http GET requests. the second does run_slot, which handles smart-http POST requests. Some of the dumb http fetches will go through http_request. But some will not. And I think almost none of dumb http push will. -Peff