git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
Date: Tue, 19 Feb 2019 15:28:34 +0100
Message-ID: <87k1hv6eel.fsf@evledraar.gmail.com>
In-Reply-To: <0461b362569362c6d0e73951469c547a03a1b59d.1543879256.git.jonathantanmy@google.com>


On Tue, Dec 04 2018, Jonathan Tan wrote:

I meant to follow-up after Git Merge, but didn't remember until this
thread was bumped.

But some things I'd like to clarify / am concerned about...

> +when the server sends the packfile, it MAY send a `packfile-uris` section
> +directly before the `packfile` section (right after `wanted-refs` if it is
> +sent) containing HTTP(S) URIs. See protocol-v2.txt for the documentation of
> +this section.
> +
> +Clients then should understand that the returned packfile could be incomplete,
> +and that it needs to download all the given URIs before the fetch or clone is
> +complete. Each URI should point to a Git packfile (which may be a thin pack and
> +which may contain offset deltas).
> [...]
> +This is the implementation: a feature, marked experimental, that allows the
> +server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
> +<uri>` entries. Whenever the list of objects to be sent is assembled, a blob
> +with the given sha1 can be replaced by the given URI. This allows, for example,
> +servers to delegate serving of large blobs to CDNs.

Okay, so the server advertisement is not just "<url>" lines but
"<oid> <url>" pairs. More on this later...
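As I read the quoted design, the server side boils down to roughly the following. This is a toy Python model, not upload-pack's actual code. The function names are made up, and it assumes each config value is exactly "<sha1> <uri>" and that the configured OIDs are blobs:

```python
def parse_blob_packfile_uris(config_values):
    """Map blob OID -> URI from 'uploadpack.blobPackfileUri' values.

    Assumes each value is "<sha1> <uri>", as in the quoted design doc.
    """
    mapping = {}
    for value in config_values:
        oid, uri = value.split(" ", 1)
        mapping[oid] = uri.strip()
    return mapping


def split_objects(objects_to_send, blob_uris):
    """Split the assembled object list into (inline OIDs, URIs to advertise).

    Toy model: any object whose OID has a configured URI is left out of the
    inline pack, and its URI is advertised instead (each URI only once).
    """
    inline, uris = [], []
    for oid in objects_to_send:
        uri = blob_uris.get(oid)
        if uri is None:
            inline.append(oid)
        elif uri not in uris:
            uris.append(uri)
    return inline, uris
```

The point is that the server has to know up front which OIDs live behind each URI. That requirement is what I'm questioning.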

> +While fetching, the client needs to remember the list of URIs and cannot
> +declare that the fetch is complete until all URIs have been downloaded as
> +packfiles.

And this. I don't quite understand this well enough, but maybe it helps
if I talk about what I'd expect out of CDN offloading. It comes down to
three things:

 * The server should be able to point to some "seed" packfiles *without*
   necessarily knowing what OIDs are in it, or have to tell the client.

 * The client should be able to just blindly get this data ("I guess
   this is where most of it is"), unpack it, see what OIDs it has, and
   *then* without initiating a new connection continue a want/have
   dialog.

   This effectively "bootstraps" a "clone" mid way into an arbitrary
   "fetch".

 * There should be no requirement that a client successfully downloads
   the advertised CDN URLs, for fault handling (also discussed in
   https://public-inbox.org/git/87lg2b6gg0.fsf@evledraar.gmail.com/)
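To make those three points concrete, here is a rough Python model of the flow I have in mind. Everything in it is hypothetical: `bootstrap_fetch()` and the `download` callback are made up, and "wants" is treated as the full set of objects needed, whereas real Git negotiates over commits.

```python
def bootstrap_fetch(wants, server_objects, seed_packs, download):
    """Toy model: try "seed" packs first, then continue the want/have dialog.

    wants:          OIDs the client needs (toy model, no history walk).
    server_objects: the set of OIDs the server can send itself.
    seed_packs:     URLs the server pointed at, without saying what's in them.
    download:       callable(url) -> set of OIDs in that pack; may raise OSError.
    Returns (objects the client ends up with, objects the server had to send).
    """
    have = set()
    for url in seed_packs:
        try:
            # "I guess this is where most of it is": unpack, see what we got.
            have |= download(url)
        except OSError:
            # A failed CDN download is not fatal; we just ask the server more.
            continue
    # Continue on the same connection, asking only for what is still missing.
    missing = set(wants) - have
    from_server = missing & server_objects
    return have | from_server, from_server
```

If the seed pack is there, the server only sends the difference. If it's gone, the fetch degrades into an ordinary one.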

More concretely, I'd like to have a setup where a server can just dumbly
point to some URL that probably has most of the data, without having any
idea what OIDs are in it. That way, e.g., some machine entirely
disconnected from the server (and with just a regular clone) could
continually generate an up-to-date-enough packfile.

I don't see how this is compatible with the server needing to send a
bunch of "<oid> <url>" lines, or why a client "cannot declare that the
fetch is complete until all URIs have been downloaded as
packfiles". Can't it fall back on the normal dialog?
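Put differently, the two completion rules look something like this. It's a toy Python model, and both function names are made up:

```python
def fetch_complete_per_design_doc(advertised_uris, downloaded_uris):
    # As quoted: the fetch is done only once every advertised URI
    # has been downloaded as a packfile.
    return set(advertised_uris) <= set(downloaded_uris)


def fetch_complete_with_fallback(wanted_oids, local_oids):
    # What I'm suggesting: the fetch is done once we have the objects,
    # whether they came from a CDN pack or from the regular want/have dialog.
    return set(wanted_oids) <= set(local_oids)
```

With the second rule, a CDN outage just means a bigger regular fetch rather than a failed one.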

Other thoughts:

 * If there isn't such a close coordination between git server & CDN, is
   there a case for having pack *.idx files on the CDN, so clients can
   inspect them to see if they'd like to download the full referenced
   pack?
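For the *.idx idea, note that a version-2 pack index starts with the "\377tOc" magic and a 4-byte version of 2. Then come a 256-entry fan-out table of 4-byte big-endian cumulative counts and the sorted 20-byte SHA-1 object names. A client could fetch just that and look at the names. The sketch below handles SHA-1 repositories only and ignores the CRC32, offset and checksum tables that follow the names. `worth_downloading()` and its 50% threshold are made up for illustration:

```python
import struct

IDX_V2_MAGIC = b"\xfftOc"  # the "\377tOc" signature of a version-2 .idx


def read_idx_v2_oids(data):
    """Return the hex SHA-1 object names listed in a version-2 pack .idx."""
    if data[:4] != IDX_V2_MAGIC:
        raise ValueError("not a version-2 pack index")
    (version,) = struct.unpack(">I", data[4:8])
    if version != 2:
        raise ValueError("unsupported .idx version %d" % version)
    # 256 cumulative big-endian counts; the last one is the object count.
    fanout = struct.unpack(">256I", data[8:8 + 256 * 4])
    count = fanout[255]
    start = 8 + 256 * 4
    names = data[start:start + count * 20]
    if len(names) != count * 20:
        raise ValueError("truncated pack index")
    return [names[i:i + 20].hex() for i in range(0, len(names), 20)]


def worth_downloading(idx_data, wanted_oids, min_fraction=0.5):
    """Hypothetical heuristic: fetch the pack if it has enough of what we want."""
    if not wanted_oids:
        return False
    in_pack = set(read_idx_v2_oids(idx_data))
    hits = len(in_pack & set(wanted_oids))
    return hits / len(wanted_oids) >= min_fraction
```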

 * Without the server needing to know enough about the packs to
   advertise "<oid> <url>", is there a way to, e.g., advertise four packs
   to clients:

       big.pack, last-month.pack, last-week.pack, last-day.pack

   Or some other optimistic negotiation where clients, even ones just
   doing regular fetches, can seek to get more up-to-date with one of
   the more recent packs before doing the first fetch in 3 days?

   In the past I'd toyed with creating a similar "not quite CDN" setup
   using git-bundle.
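One way a client could pick among those tiers is sketched below. It's hypothetical: it assumes each advertised pack comes with a "covers since" timestamp and holds every object added since then, with big.pack covering everything. Whatever the chosen pack doesn't cover would be picked up by the normal fetch afterwards:

```python
from datetime import datetime, timedelta


def pick_seed_pack(tiers, last_fetch):
    """Pick the smallest advertised pack that covers the client's gap.

    tiers:      list of (name, covers_since); covers_since=None means the pack
                holds everything (e.g. big.pack).
    last_fetch: when the client last fetched, or None for a fresh clone.
    Returns the pack name, or None to just do a normal fetch.
    """
    candidates = [
        (name, since)
        for name, since in tiers
        if since is None or (last_fetch is not None and since <= last_fetch)
    ]
    if not candidates:
        return None
    # The latest covers_since still at or before last_fetch is the smallest
    # window that covers the gap.
    name, _ = max(candidates, key=lambda t: t[1] or datetime.min)
    return name
```

A client whose last fetch was 3 days ago would grab last-week.pack, while a fresh clone would get big.pack.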


Thread overview: 30+ messages
2018-12-03 23:37 [WIP RFC 0/5] Design for offloading part of packfile response to CDN Jonathan Tan
2018-12-03 23:37 ` [WIP RFC 1/5] Documentation: order protocol v2 sections Jonathan Tan
2018-12-05  4:10   ` Junio C Hamano
2018-12-06 22:54     ` Jonathan Tan
2018-12-09  0:15       ` Junio C Hamano
2018-12-03 23:37 ` [WIP RFC 2/5] Documentation: add Packfile URIs design doc Jonathan Tan
2018-12-04  0:21   ` Stefan Beller
2018-12-04  1:54   ` brian m. carlson
2018-12-04 19:29     ` Jonathan Tan
2019-02-19 13:22       ` Christian Couder
2019-02-19 20:10         ` Jonathan Tan
2019-02-22 11:35           ` Christian Couder
2019-02-19 13:44     ` Ævar Arnfjörð Bjarmason
2019-02-21  1:09       ` brian m. carlson
2019-02-22  9:34         ` Ævar Arnfjörð Bjarmason
2018-12-05  5:02   ` Junio C Hamano
2018-12-05  5:55     ` Junio C Hamano
2018-12-06 23:16     ` Jonathan Tan
2019-02-19 14:28   ` Ævar Arnfjörð Bjarmason [this message]
2019-02-19 22:06     ` Jonathan Tan
2018-12-03 23:37 ` [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out Jonathan Tan
2018-12-04  0:30   ` Stefan Beller
2018-12-05  6:30   ` Junio C Hamano
2018-12-03 23:37 ` [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line Jonathan Tan
2018-12-06  6:35   ` Junio C Hamano
2018-12-06 23:25     ` Jonathan Tan
2018-12-07  0:22       ` Junio C Hamano
2018-12-03 23:37 ` [WIP RFC 5/5] upload-pack: send part of packfile response as uri Jonathan Tan
2018-12-04 20:09   ` Stefan Beller
2018-12-04  0:01 ` [WIP RFC 0/5] Design for offloading part of packfile response to CDN Stefan Beller
