From: Jeff King
Subject: Re: [PATCH 2/2] http-backend: spool ref negotiation requests to buffer
Date: Fri, 15 May 2015 15:16:46 -0400
Message-ID: <20150515191646.GA29934@peff.net>
References: <20150515062901.GA30768@peff.net> <20150515063339.GB30890@peff.net>
To: Junio C Hamano
Cc: Konstantin Ryabitsev, git@vger.kernel.org

On Fri, May 15, 2015 at 11:22:42AM -0700, Junio C Hamano wrote:

> Jeff King writes:
>
> > The solution is fairly straightforward: we read the request
> > body into an in-memory buffer in http-backend, freeing up
> > Apache, and then feed the data ourselves to upload-pack. But
> > there are a few important things to note:
> >
> >   1. We limit the in-memory buffer to no larger than 1 megabyte
> >      to prevent an obvious denial-of-service attack. This
> >      is a new hard limit on requests, but it's likely that
> >      requests of this size didn't work before at all (i.e.,
> >      they would have run into the pipe buffer thing and
> >      deadlocked).

So this 1MB limit is clearly a problem, and the reasoning above is
not right. The case we are helping is when a large amount of input
creates a large amount of output. But we're _hurting_ the case where
there's just a large amount of input (as shown by Dennis's test
case).

What do we want to do about that? We can switch to streaming after
hitting our limit (which reopens the opportunity for deadlock in some
cases, but does no harm to cases that currently work). Or we can just
bump the input size and say "you'd be crazy to send more than 10MB"
(or 50, or whatever). We could make a configuration knob, too, I
guess.

> One unrelated thing I noticed was that three codepaths independently
> have close(0) in run_service() now, and made me follow the two
> helper functions to see they both do the close at the end. It might
> have made the flow easier to follow if run_service() were
>
>         ...
>         close(1);
>         if (gzip)
>                 inflate();
>         else if (buffer)
>                 copy();
>         close(0);
>         ...
>
> But that is minor.

I don't see the close(0) in the other (buffered) code paths.
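For reference, the buffered path boils down to something like this (a
rough sketch, not the literal patch; MAX_REQUEST_BUFFER stands in for
whatever limit we settle on):

        static void copy_request(const char *prog_name, int out)
        {
                struct strbuf buf = STRBUF_INIT;

                /* slurp the whole request body from Apache... */
                if (strbuf_read(&buf, 0, 8192) < 0)
                        die_errno("error reading request body");
                if (buf.len > MAX_REQUEST_BUFFER)
                        die("%s: request body too large", prog_name);

                /* ...then feed it to the child at our leisure... */
                write_or_die(out, buf.buf, buf.len);

                /*
                 * ...and close the pipe to tell the child we are
                 * done sending; otherwise it waits for more input.
                 */
                close(out);
                strbuf_release(&buf);
        }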
We close the _output_ to the child, but of course we have to do that
to tell it we're done sending (I actually forgot it in an earlier
version of copy_request(), and things hang :) ).

I don't think there's any need to close(0) in the buffered cases. We
read until EOF in the copy() case. For gzip, we read until the end of
the gzipped data. I guess it would be better to close if we're not
expecting more input, as otherwise Apache might block trying to write
to us if the client sends bogus input (i.e., a zlib stream with more
cruft at the end).

> Also, is it worth allocating small and then growing up to the maximum?
> I think this only relays one request at a time anyway, and I suspect
> that a single 1MB allocation at the first call kept getting reused
> may be sufficient (and much simpler).

My initial attempt did exactly that, but I had a much smaller buffer.
I started to get worried around 1MB, and if we bump it to 10MB (or
make it configurable), I worry even more. I dunno. It is not _that_
much memory, but we allocate it per request we are serving, so it
might add up on a busy server. OTOH, pack-objects thinks nothing of
allocating 800MB just for the book-keeping to serve a clone of
torvalds/linux.

-Peff
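P.S. For concreteness, the allocate-small-and-grow reader would look
roughly like this (a sketch only; alloc_nr() is our usual growth
macro, and MAX_REQUEST_BUFFER is again whatever limit we pick):

        static ssize_t read_request(int fd, unsigned char **out)
        {
                size_t len = 0, alloc = 8192;
                unsigned char *buf = xmalloc(alloc);

                while (1) {
                        ssize_t cnt;

                        cnt = read_in_full(fd, buf + len, alloc - len);
                        if (cnt < 0) {
                                free(buf);
                                return -1;
                        }

                        len += cnt;
                        if (len < alloc) {
                                /* hit EOF before filling the buffer */
                                *out = buf;
                                return len;
                        }

                        /* buffer is full; grow it, up to our limit */
                        if (alloc >= MAX_REQUEST_BUFFER)
                                die("request larger than maximum (%lu)",
                                    (unsigned long)MAX_REQUEST_BUFFER);
                        alloc = alloc_nr(alloc);
                        if (alloc > MAX_REQUEST_BUFFER)
                                alloc = MAX_REQUEST_BUFFER;
                        buf = xrealloc(buf, alloc);
                }
        }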