git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v2 0/8] CDN offloading of fetch response
Date: Wed, 24 Apr 2019 11:09:51 +0200	[thread overview]
Message-ID: <87pnpbsra8.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <cover.1552073690.git.jonathantanmy@google.com>


On Fri, Mar 08 2019, Jonathan Tan wrote:

> One relatively significant change: someone pointed out that the issue fixed by
> 50d3413740 ("http: make redirects more obvious", 2016-12-06) could also
> occur here, so I have changed it so that the server is required to send
> the packfile's hash along with the URI.
>
> This does mean that Ævar's workflow described in
> https://public-inbox.org/git/87k1hv6eel.fsf@evledraar.gmail.com/ would
> not work.  Quoting Ævar:
>
>> More concretely, I'd like to have a setup where a server can just dumbly
>> point to some URL that probably has most of the data, without having any
>> idea what OIDs are in it. So that e.g. some machine entirely
>> disconnected from the server (and with just a regular clone) can
>> continually generating an up-to-date-enough packfile.
>
> With 50d3413740, it seems to me that it's important for the server to
> know details about the URIs that it points to, so such a disconnection
> would not work.

For what it's worth I'm fine with that, and as is obvious in some of the
previous threads I hadn't thought about that threat model. This "I know
XYZ has a pack that should have most of it, check what's in it and get
back to me with HAVEs" was a nice-to-have, but not a must.

And you can still get most of the way there with this proposal and the
techniques described in the follow-up to
https://public-inbox.org/git/87bmhiykvw.fsf@evledraar.gmail.com/ if I'm
reading this series correctly.

I.e. the server would know the CDNs are going to be generating a pack
from the "master" history with the tip being the Nth commit, and since
both agree on the pack algorithm they can generate the same OID/pack
hash pair without further coordination, the server would then
optimistically advertise that. This would give you the sort of "lazy"
CDN setup I was describing with slightly more work than just "here's
some random latest pack".

But isn't in the case that if a malicious server ever got a hold of a
pack SHA-1 it knows to be on a private CDN *and* a commit that's in it
we'd have the same problem? Shouldn't this security model be prominently
documented in Documentation/technical/packfile-uri.txt? I.e. "THOU MUST
NEVER LEAK A COMBINATION OF THE PACK SHA-1 OF A PRIVATE CDN AND A
PRIVATE COMMIT SHA-1!".

It seems likely that once we have git CDN support the first thing CDNs
are going to do is to serve up such packs via a URL that includes the
pack SHA-1. Once the combination of that & a commit in it leaks we're
back to square one, no? *My* default approach in setting up such
infrastructure without knowing about 50d3413740 would be not to care
about the privacy of the SHA-1s, even in the case of private data.

Isn't there also overlap here with non-CDN shallow/narrow fetching?
Where a server can detect such a client, rely on it to get the objects
from elsewhere (e.g. via adding an unrelated remote), and then get a
push that gives it secret data?

I don't see any fool-proof way out of it, but maybe a mode where we:

 1. Get the CDN data URLs (which for the purposes of this example I'm
    assuming are "base packs" containing at least some commits...)

 2. Do a follow-up request to the server where we e.g. pretend not to
    have the last N commits for tips it advertises and ask it to serve
    that up from a non-CDN. If it can't we know it's lying and trying an
    attack similar to the one described in 50d3413740.

    Conversely, can't know if it can that it isn't, but at that point it
    seems unlikely, since surely the useful thing is a delta against the
    recent-but-server-lied-about-having-it commit, at least in most
    cases....

This would also have the advantage of addressing some of the reliability
concerns brought up elsewhere. I.e. once we're doing a sanity-check
post-fetch anyway a failure to download 1/3 packs from a CDN becomes
recoverable, although the design outlined here (understandably) would
prefer that to be done in another "fetch" dialog so the server can keep
the "CDN + my data should be 100% of the data" state.

      parent reply	other threads:[~2019-04-24  9:09 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-23 23:38 [WIP 0/7] CDN offloading of fetch response Jonathan Tan
2019-02-23 23:38 ` [WIP 1/7] http: use --stdin and --keep when downloading pack Jonathan Tan
2019-02-23 23:38 ` [WIP 2/7] http: improve documentation of http_pack_request Jonathan Tan
2019-02-23 23:38 ` [WIP 3/7] http-fetch: support fetching packfiles by URL Jonathan Tan
2019-02-23 23:38 ` [WIP 4/7] Documentation: order protocol v2 sections Jonathan Tan
2019-02-23 23:38 ` [WIP 5/7] Documentation: add Packfile URIs design doc Jonathan Tan
2019-02-23 23:39 ` [WIP 6/7] upload-pack: refactor reading of pack-objects out Jonathan Tan
2019-02-23 23:39 ` [WIP 7/7] upload-pack: send part of packfile response as uri Jonathan Tan
2019-02-24 15:54   ` Junio C Hamano
2019-02-25 21:04   ` Christian Couder
2019-02-26  1:53     ` Jonathan Nieder
2019-02-26  7:08       ` Christian Couder
2019-03-01  0:09   ` Josh Steadmon
2019-03-01  0:17     ` Jonathan Tan
2019-02-25 21:30 ` [WIP 0/7] CDN offloading of fetch response Christian Couder
2019-02-25 23:45   ` Jonathan Nieder
2019-02-26  8:30     ` Christian Couder
2019-02-26  9:12       ` Ævar Arnfjörð Bjarmason
2019-03-04  8:24         ` Christian Couder
2019-02-28 23:21       ` Jonathan Nieder
2019-03-04  8:54         ` Christian Couder
2019-03-08 21:55 ` [PATCH v2 0/8] " Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 1/8] http: use --stdin when getting dumb HTTP pack Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 2/8] http: improve documentation of http_pack_request Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 3/8] http-fetch: support fetching packfiles by URL Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 4/8] Documentation: order protocol v2 sections Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 5/8] Documentation: add Packfile URIs design doc Jonathan Tan
2019-04-23  5:31     ` Jeff King
2019-04-23 20:38       ` Jonathan Tan
2019-04-23 22:18         ` Ævar Arnfjörð Bjarmason
2019-04-23 22:22           ` Jonathan Nieder
2019-04-23 22:30             ` Ævar Arnfjörð Bjarmason
2019-04-23 22:51               ` Jonathan Nieder
2019-04-23 22:11       ` Jonathan Nieder
2019-04-23 22:25         ` Ævar Arnfjörð Bjarmason
2019-04-23 22:48           ` Jonathan Nieder
2019-04-24  7:48             ` Ævar Arnfjörð Bjarmason
2019-04-24  3:01       ` Junio C Hamano
2019-03-08 21:55   ` [PATCH v2 6/8] upload-pack: refactor reading of pack-objects out Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 7/8] fetch-pack: support more than one pack lockfile Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 8/8] upload-pack: send part of packfile response as uri Jonathan Tan
2019-03-19 20:48   ` [PATCH v2 0/8] CDN offloading of fetch response Josh Steadmon
2019-04-23  5:21   ` Jeff King
2019-04-23 19:23     ` Jonathan Tan
2019-04-24  9:09   ` Ævar Arnfjörð Bjarmason [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pnpbsra8.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).