From: Jeff King
Subject: Re: clone hang prevention / timeout?
Date: Wed, 13 Apr 2016 18:24:14 -0400
Message-ID: <20160413222414.GA9882@sigill.intra.peff.net>
Cc: git@vger.kernel.org
To: Jason Vas Dias

On Mon, Apr 11, 2016 at 10:49:19PM +0100, Jason Vas Dias wrote:

> It appears GIT has no way of specifying a timeout for a clone operation -
> if the server decides not to complete a get request, the clone can
> hang forever -
> is this correct ?

Yes.
Git's protocol has no timeouts, though each side is generally either writing or reading at any moment, and so an interrupted connection should cause either EPIPE or EOF, ending the process. The exceptions I have seen are:

  - protocol / implementation bugs that cause a true deadlock. At this point we've fixed all known cases, but that doesn't mean there aren't bugs lurking.

  - the network drops out in such a way that the OS doesn't realize the connection is gone, and the reading side is left waiting for input forever.

I think the TCP keepalive stuff that Eric mentioned should address the latter, though I don't know how well it works in practice. We used to sometimes see processes hung for days on GitHub, but it's been a long time. I don't recall if it was pre-v1.8.5 (which introduced SO_KEEPALIVE), or if we made some other change (we have a load-balancing layer in front that has more aggressive timeouts).

> This appears to be what I am seeing, in a script that is attempting to do many
> successive clone operations, eg. of
> git://anongit.freedesktop.org/xorg/* , the script
> occasionally hangs in a clone - I can see with netstat + strace that the TCP
> connection is open and GIT is trying to read .
> Is there any option I can specify to get the clone to timeout, or do I manually
> have to strace the git process and send it a signal after a hang is detected?

There are periods where a git client may have to wait for a while in read() while the other side is quiet (e.g., when the other side is badly packed and needs to do a lot of up-front CPU work to prepare the packfile). Since v1.8.4.2, the server side of a clone should generate application-level keepalive packets, so that the client never sees silence for more than ~5 seconds. The freedesktop servers appear to be on v2.1.4, so a long read() like the one you're seeing is probably a real hang.
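Since git offers no timeout of its own here, the fallback is the one the question suggests: have the calling script watch each clone and signal it once it looks hung. What follows is a minimal sketch in Python, assuming a POSIX system; the name run_with_timeout, the 600-second limit, and the 10-second grace period are hypothetical choices for illustration, not anything git provides. It puts the clone in its own process group, sends SIGTERM to the whole group first so git gets a chance to exit cleanly, and falls back to SIGKILL if that does not work.

```python
import os
import signal
import subprocess
import sys
from typing import Optional, Sequence


def _signal_group(pgid: int, sig: int) -> None:
    """Send sig to every process in the group; ignore an already-empty group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_with_timeout(cmd: Sequence[str], timeout_s: float,
                     grace_s: float = 10.0) -> Optional[int]:
    """Run cmd and return its exit status, or None if it ran longer than
    timeout_s seconds and had to be killed. POSIX only."""
    # start_new_session=True makes the child the leader of a new process
    # group (pgid == pid), so the signals below also reach its helpers.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        pass
    # Hypothetical policy: ask politely with SIGTERM, then force SIGKILL.
    _signal_group(proc.pid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        _signal_group(proc.pid, signal.SIGKILL)
        proc.wait()
    return None


if __name__ == "__main__":
    # Hypothetical usage: python clone_watchdog.py URL...
    # The 600 s per-clone limit is an assumption; tune it to your repos.
    for url in sys.argv[1:]:
        status = run_with_timeout(["git", "clone", url], 600.0)
        if status is None:
            print(f"{url}: clone timed out", file=sys.stderr)
        elif status != 0:
            print(f"{url}: clone exited with {status}", file=sys.stderr)
```

The limit is on total wall-clock time, not on silence, so it has to be generous enough for the largest repository over the slowest link; a clone killed this way may also leave a partial directory behind that needs removing before a retry.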
Note that pushing has a similar problem (the client may wait a long time while the server chews on the uploaded packfile before reporting status). There are no keepalives in that direction, though I have a patch series for that which I need to polish and submit.

-Peff