git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: avarab@gmail.com, git@vger.kernel.org
Subject: Re: [PATCH v2 0/8] CDN offloading of fetch response
Date: Tue, 23 Apr 2019 01:21:59 -0400	[thread overview]
Message-ID: <20190423052159.GA12530@sigill.intra.peff.net> (raw)
In-Reply-To: <cover.1552073690.git.jonathantanmy@google.com>

On Fri, Mar 08, 2019 at 01:55:12PM -0800, Jonathan Tan wrote:

> Here's my current progress - the only thing that is lacking is more
> tests, maybe, so I think it's ready for review.

A bit belated, but here are some overall thoughts. A lot of my thinking
comes from comparing this to previous work I had done on teaching the
client to fetch first from a bundle and then do a follow-up fetch from
the server.

> This code now starts CDN downloads after closing the first HTTP request,
> and it holds on to the .keep files until after the refs are set. I'll
> leave parallelization of the CDN downloads for later work.

Good. One of the problems with the bundle+followup approach was either
having to hold open or re-establish the connection to the original
server, since that fetch had to be put on hold.

I agree that parallelizing can come later. I do wonder if it's worth
making a new tool rather than trying to reuse git-http-fetch. Yes, it
basically does what we want already, but there's quite a lot of cruft in
that dumb-http code base. Using http_get_file(), or even curl more
directly might be a bit easier.

One problem in particular I'd worry about is that the http code
generally expects authentication to happen through the initial
ref-discovery, and then be set for all subsequent requests. So I doubt
an http_pack_request actually handles auth at all, and retro-fitting it
may be tricky.

> One relatively significant change: someone pointed out that the issue fixed by 
> 50d3413740 ("http: make redirects more obvious", 2016-12-06) could also
> occur here, so I have changed it so that the server is required to send
> the packfile's hash along with the URI.

I have mixed feelings on that. I like how it clearly makes the server
the source of authority. You don't have to care about CDN tampering,
because you know you are getting the bytes the server wanted you to.

I think even without that this is still _mostly_ true, though. You're
going to compute the hash of every object the CDN sends you, and those
objects must fit into the missing gaps in what the server sends you. So
the worst case is that a malicious CDN could send you some extra
objects. That's probably not the end of the world, but I do like the
extra guarantee of knowing that you got byte-for-byte what the server
wanted you to.

> This does mean that Ævar's workflow described in [1] would not work.
> Quoting Ævar:
> 
> > More concretely, I'd like to have a setup where a server can just dumbly
> > point to some URL that probably has most of the data, without having any
> > idea what OIDs are in it. So that e.g. some machine entirely
> > disconnected from the server (and with just a regular clone) can
> > continually generating an up-to-date-enough packfile.
> 
> With 50d3413740, it seems to me that it's important for the server to
> know details about the URIs that it points to, so such a disconnection
> would not work.

I think even without 50d3413740, this is important for your scheme. One
of the weaknesses (and strengths, I suppose) of the bundle+followup
scheme was that the initial bundle generally had to meet the git
repository guarantee. After you fetched the bundle, you'd tell the
server "OK, now I have commit X" just like you were a regular client.

But I really like that your proposal does away with that, and asks for
tighter cooperation between the server and the CDN. It means the CDN can
serve some random subset of the objects. But once we do that, now the
server _has_ to know what was in those URLs it pointed to, because our
protocol has no good way of communicating a random subset of objects (if
they're just blobs, we could enumerate all of them to the server as
haves; yuck.  But as soon as you get a tree or commit from the CDN, you
have to have all of the reachable objects to be able to claim it).

So I think this is pretty inherent to your proposal, and but it's the
right tradeoff to be making here (I think there could still be room for
the other thing, too, but it's just a different feature).

I have a few other comments on the design, but I'll send them in
response to patch 5.

-Peff

  parent reply	other threads:[~2019-04-23  5:22 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-23 23:38 [WIP 0/7] CDN offloading of fetch response Jonathan Tan
2019-02-23 23:38 ` [WIP 1/7] http: use --stdin and --keep when downloading pack Jonathan Tan
2019-02-23 23:38 ` [WIP 2/7] http: improve documentation of http_pack_request Jonathan Tan
2019-02-23 23:38 ` [WIP 3/7] http-fetch: support fetching packfiles by URL Jonathan Tan
2019-02-23 23:38 ` [WIP 4/7] Documentation: order protocol v2 sections Jonathan Tan
2019-02-23 23:38 ` [WIP 5/7] Documentation: add Packfile URIs design doc Jonathan Tan
2019-02-23 23:39 ` [WIP 6/7] upload-pack: refactor reading of pack-objects out Jonathan Tan
2019-02-23 23:39 ` [WIP 7/7] upload-pack: send part of packfile response as uri Jonathan Tan
2019-02-24 15:54   ` Junio C Hamano
2019-02-25 21:04   ` Christian Couder
2019-02-26  1:53     ` Jonathan Nieder
2019-02-26  7:08       ` Christian Couder
2019-03-01  0:09   ` Josh Steadmon
2019-03-01  0:17     ` Jonathan Tan
2019-02-25 21:30 ` [WIP 0/7] CDN offloading of fetch response Christian Couder
2019-02-25 23:45   ` Jonathan Nieder
2019-02-26  8:30     ` Christian Couder
2019-02-26  9:12       ` Ævar Arnfjörð Bjarmason
2019-03-04  8:24         ` Christian Couder
2019-02-28 23:21       ` Jonathan Nieder
2019-03-04  8:54         ` Christian Couder
2019-03-08 21:55 ` [PATCH v2 0/8] " Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 1/8] http: use --stdin when getting dumb HTTP pack Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 2/8] http: improve documentation of http_pack_request Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 3/8] http-fetch: support fetching packfiles by URL Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 4/8] Documentation: order protocol v2 sections Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 5/8] Documentation: add Packfile URIs design doc Jonathan Tan
2019-04-23  5:31     ` Jeff King
2019-04-23 20:38       ` Jonathan Tan
2019-04-23 22:18         ` Ævar Arnfjörð Bjarmason
2019-04-23 22:22           ` Jonathan Nieder
2019-04-23 22:30             ` Ævar Arnfjörð Bjarmason
2019-04-23 22:51               ` Jonathan Nieder
2019-04-23 22:11       ` Jonathan Nieder
2019-04-23 22:25         ` Ævar Arnfjörð Bjarmason
2019-04-23 22:48           ` Jonathan Nieder
2019-04-24  7:48             ` Ævar Arnfjörð Bjarmason
2019-04-24  3:01       ` Junio C Hamano
2019-03-08 21:55   ` [PATCH v2 6/8] upload-pack: refactor reading of pack-objects out Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 7/8] fetch-pack: support more than one pack lockfile Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 8/8] upload-pack: send part of packfile response as uri Jonathan Tan
2019-03-19 20:48   ` [PATCH v2 0/8] CDN offloading of fetch response Josh Steadmon
2019-04-23  5:21   ` Jeff King [this message]
2019-04-23 19:23     ` Jonathan Tan
2019-04-24  9:09   ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190423052159.GA12530@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).