git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Mark Thomas <markbt@efaref.net>, git@vger.kernel.org
Subject: Re: [RFC 0/4] Shallow clones with on-demand fetch
Date: Tue, 7 Mar 2017 04:42:47 -0500
Message-ID: <20170307094247.atdtqpttchk5r6qe@sigill.intra.peff.net>
In-Reply-To: <xmqqr32anri1.fsf@junio-linux.mtv.corp.google.com>

On Mon, Mar 06, 2017 at 11:18:30AM -0800, Junio C Hamano wrote:

> Mark Thomas <markbt@efaref.net> writes:
> 
> > This is a proof-of-concept, so it is in no way complete.  It contains a
> > few hacks to make it work, but these can be ironed out with a bit more
> > work.  What I have so far is sufficient to try out the idea.
> 
> Two things that immediately come to mind (which may or may not be
> real issues) are 
> 
>  (1) What (if any) security model you have in mind.
> 
>      From object-confidentiality's point of view, this needs to be
>      enabled only on a host that allows
>      uploadpack.allowAnySHA1InWant but even riskier.
> 
>      From DoS point of view, you can make a short 40-byte request to
>      cause the other side to emit megabytes of stuff.  I do not think
>      it is a new problem (anybody can repeatedly request a clone of
>      large stuff), but there may be new ramifications.
> 
>  (2) If the interface to ask just one object kills the whole idea
>      due to roundtrip latency.
> 
>      You may want to be able to say "I want all objects reachable
>      from this tree; please give me a packfile of needed objects
>      assuming that I have all objects reachable from this other tree
>      (or these other trees)".

Not just latency, but you also lose all of the benefits of delta
compression. So if I asked for:

  git log -p -- foo.c

and git is going to fault in all of the various versions of foo.c over
time, it's _much_ more efficient to batch them into a single request, so
that the server can reuse on-disk deltas between the various versions.
That makes the transmission smaller, and it also makes it more likely
for the server to be able to transmit the bits straight off the disk
(rather than assembling each delta itself then zlib-compressing the
result).
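
To make the batching argument concrete, here is a rough sketch of the size
difference. It is not git's packfile or delta format: plain zlib over a
concatenated stream stands in for delta reuse between versions, and the
helper names (make_versions, piecemeal_size, batched_size) and the generated
file contents are made up for illustration.

```python
import hashlib
import zlib


def make_versions(count=20, lines_per_file=200):
    """Build hypothetical revisions of one file; each differs from the last by one line."""
    lines = [hashlib.sha1(str(i).encode()).hexdigest() for i in range(lines_per_file)]
    versions = []
    for rev in range(count):
        # Change a single line per revision, as a small commit to foo.c might.
        lines[(rev * 7) % lines_per_file] = f"edited in revision {rev}"
        versions.append("\n".join(lines).encode())
    return versions


def piecemeal_size(versions):
    """Bytes sent if every version is requested and compressed on its own."""
    return sum(len(zlib.compress(v)) for v in versions)


def batched_size(versions):
    """Bytes sent if all versions travel in one stream.

    Assumption: one zlib stream is only a stand-in for delta compression.
    It lets later versions refer back to earlier ones, which is the effect
    a single batched request gives the server.
    """
    return len(zlib.compress(b"".join(versions)))


if __name__ == "__main__":
    versions = make_versions()
    print("piecemeal:", piecemeal_size(versions), "bytes")
    print("batched:  ", batched_size(versions), "bytes")
```

The exact numbers depend on the invented data, but the batched stream should
come out much smaller, because nearly all of each version repeats the one
before it.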

Similarly, there's a latency tension in just finding out whether an
object exists. When we call has_sha1_file() as part of a fetch, for
example, we really want to be able to answer it quickly. So you'd
probably want some mechanism to say "tell me the sha1, type, and size
of each object I _could_ get via upload-file". The size of that data is
far from trivial for a large repository, but you're probably better off
getting it once than paying the latency cost to fetch it piecemeal.
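
A minimal sketch of that trade-off, assuming a hypothetical listing call that
returns (sha1, type, size) for every object the server could send. Nothing
here is an existing git interface: FakeRemote, list_all, query_one, and
CachedCatalog are invented names, and the remote is simulated in-process just
to count round trips.

```python
import hashlib
from typing import Dict, NamedTuple, Optional


class ObjectInfo(NamedTuple):
    type: str
    size: int


class FakeRemote:
    """Simulated server that counts how many round trips the client makes."""

    def __init__(self, objects: Dict[str, ObjectInfo]):
        self._objects = objects
        self.round_trips = 0

    def query_one(self, sha1: str) -> Optional[ObjectInfo]:
        # Piecemeal: one round trip per existence check.
        self.round_trips += 1
        return self._objects.get(sha1)

    def list_all(self) -> Dict[str, ObjectInfo]:
        # Hypothetical bulk listing: one large response, one round trip.
        self.round_trips += 1
        return dict(self._objects)


class CachedCatalog:
    """Fetch the full (sha1, type, size) listing once, then answer locally."""

    def __init__(self, remote: FakeRemote):
        self._index = remote.list_all()

    def has_object(self, sha1: str) -> bool:
        return sha1 in self._index

    def info(self, sha1: str) -> Optional[ObjectInfo]:
        return self._index.get(sha1)


def fake_sha1(content: bytes) -> str:
    # Illustrative ids only; real git object ids also hash a type/size header.
    return hashlib.sha1(content).hexdigest()


if __name__ == "__main__":
    blob = fake_sha1(b"hello")
    remote = FakeRemote({blob: ObjectInfo("blob", 5)})
    catalog = CachedCatalog(remote)
    for _ in range(1000):
        catalog.has_object(blob)
    print("round trips with cached catalog:", remote.round_trips)
```

The listing costs one potentially large transfer up front, after which every
existence check is a local dictionary lookup instead of a network request.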

-Peff
