From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-4.1 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 39AEF1FBEC for ; Tue, 7 Mar 2017 12:32:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755384AbdCGMbt (ORCPT ); Tue, 7 Mar 2017 07:31:49 -0500 Received: from cloud.peff.net ([104.130.231.41]:39670 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754831AbdCGMao (ORCPT ); Tue, 7 Mar 2017 07:30:44 -0500 Received: (qmail 4346 invoked by uid 109); 7 Mar 2017 09:42:49 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Tue, 07 Mar 2017 09:42:49 +0000 Received: (qmail 24248 invoked by uid 111); 7 Mar 2017 09:42:58 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Tue, 07 Mar 2017 04:42:58 -0500 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Tue, 07 Mar 2017 04:42:47 -0500 Date: Tue, 7 Mar 2017 04:42:47 -0500 From: Jeff King To: Junio C Hamano Cc: Mark Thomas , git@vger.kernel.org Subject: Re: [RFC 0/4] Shallow clones with on-demand fetch Message-ID: <20170307094247.atdtqpttchk5r6qe@sigill.intra.peff.net> References: <20170304191901.9622-1-markbt@efaref.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Mon, Mar 06, 2017 at 11:18:30AM -0800, Junio C Hamano wrote: > Mark Thomas writes: > > > This is a proof-of-concept, so it is in no way complete. It contains a > > few hacks to make it work, but these can be ironed out with a bit more > > work. What I have so far is sufficient to try out the idea. > > Two things that immediately come to mind (which may or may not be > real issues) are > > (1) What (if any) security model you have in mind. > > From object-confidentiality's point of view, this needs to be > enabled only on a host that allows > uploadpack.allowAnySHA1InWant but even riskier. > > From DoS point of view, you can make a short 40-byte request to > cause the other side emit megabytes of stuff. I do not think > it is a new problem (anybody can repeatedly request a clone of > large stuff), but there may be new ramifications. > > (2) If the interface to ask just one object kills the whole idea > due to roundtrip latency. > > You may want to be able to say "I want all objects reachable > from this tree; please give me a packfile of needed objects > assuming that I have all objects reachable from this other tree > (or these other trees)". Not just latency, but you also lose all of the benefits of delta compression. So if I asked for: git log -p -- foo.c and git is going to fault in all of the various versions of foo.c over time, it's _much_ more efficient to batch them into a single request, so that the server can reuse on-disk deltas between the various versions. That makes the transmission smaller, and it also makes it more likely for the server to be able to transmit the bits straight off the disk (rather than assembling each delta itself then zlib-compressing the result). Similarly, there's a latency tension in just finding out whether an object exists. When we call has_sha1_file() as part of a fetch, for example, we really want to be able to answer it quickly. So you'd probably want some mechanism to say "tell me the sha1, type, and size" of each object I _could_ get via upload-file. The size of that data is far from trivial for a large repository, but you're probably better off getting it once than paying the latency cost to fetch it piecemeal. -Peff