git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: avarab@gmail.com
Cc: jonathantanmy@google.com, git@vger.kernel.org
Subject: Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
Date: Tue, 19 Feb 2019 14:06:05 -0800	[thread overview]
Message-ID: <20190219220605.8905-1-jonathantanmy@google.com> (raw)
In-Reply-To: <87k1hv6eel.fsf@evledraar.gmail.com>

> > +when the server sends the packfile, it MAY send a `packfile-uris` section
> > +directly before the `packfile` section (right after `wanted-refs` if it is
> > +sent) containing HTTP(S) URIs. See protocol-v2.txt for the documentation of
> > +this section.
> > +
> > +Clients then should understand that the returned packfile could be incomplete,
> > +and that it needs to download all the given URIs before the fetch or clone is
> > +complete. Each URI should point to a Git packfile (which may be a thin pack and
> > +which may contain offset deltas).
> > [...]
> > +This is the implementation: a feature, marked experimental, that allows the
> > +server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
> > +<uri>` entries. Whenever the list of objects to be sent is assembled, a blob
> > +with the given sha1 can be replaced by the given URI. This allows, for example,
> > +servers to delegate serving of large blobs to CDNs.
> 
> Okey, so the server advertisement is not just "<urls>" but <oid><url>
> pairs. More on this later...

Actually, the server advertisement is just "<urls>". (The OID is there
to tell the server which object to omit if it sends the URL.) But I see
that the rest of your comments still stand.

> More concretely, I'd like to have a setup where a server can just dumbly
> point to some URL that probably has most of the data, without having any
> idea what OIDs are in it. So that e.g. some machine entirely
> disconnected from the server (and with just a regular clone) can
> continually generating an up-to-date-enough packfile.

Thanks for the concrete use case. Server ignorance would work in this
case, since the client can concisely communicate to the server what
objects it obtained from the CDN (in this case, through "have" lines),
but it does not seem to work in the general case (e.g. offloading large
blobs; or the CDN serving a pack suitable for a shallow clone -
containing all objects referenced by the last few commits, whether
changed in that commit or not).

In this case, maybe the batch job can also inform the server which
commit the CDN is prepared to serve.

> I don't see how this is compatible with the server needing to send a
> bunch of "<oid> <url>" lines, or why a client "cannot declare that the
> fetch is complete until all URIs have been downloaded as
> packfiles". Can't it fall back on the normal dialog?

As stated above, the server advertisement is just "<url>", but you're
right that the server still needs to know their corresponding OIDs (or
have some knowledge like "this pack contains all objects in between this
commit and that commit").

I was thinking that there is no normal dialog to be had with this
protocol, since (as above) in the general case, the client cannot
concisely communicate what objects it obtained from the CDN.

> Other thoughts:
> 
>  * If there isn't such a close coordination between git server & CDN, is
>    there a case for having pack *.idx files on the CDN, so clients can
>    inspect them to see if they'd like to download the full referenced
>    pack?

I'm not sure if I understand this fully, but off the top of my head, the
.idx file doesn't contain relations between objects, so I don't think
the client has enough information to decide if it wants to download the
corresponding packfile.

>  * Without the server needing to know enough about the packs to
>    advertise "<oid> <url>" is there a way to e.g. advertise 4x packs to
>    clients:
> 
>        big.pack, last-month.pack, last-week.pack, last-day.pack
> 
>    Or some other optimistic negotiation where clients, even ones just
>    doing regular fetches, can seek to get more up-to-date with one of
>    the more recent packs before doing the first fetch in 3 days?
> 
>    In the past I'd toyed with creating a similar "not quite CDN" setup
>    using git-bundle.

I think such optimistic downloading of packs during a regular fetch
would only work in a partial clone where "holes" are tolerated. (That
does bring up the possibility of having a fetch mode in which we
download potentially incomplete packfiles into a partial repo and then
"completing" the repo through a not-yet-implemented process, but I
haven't thought through this.)

  reply	other threads:[~2019-02-19 22:06 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-03 23:37 [WIP RFC 0/5] Design for offloading part of packfile response to CDN Jonathan Tan
2018-12-03 23:37 ` [WIP RFC 1/5] Documentation: order protocol v2 sections Jonathan Tan
2018-12-05  4:10   ` Junio C Hamano
2018-12-06 22:54     ` Jonathan Tan
2018-12-09  0:15       ` Junio C Hamano
2018-12-03 23:37 ` [WIP RFC 2/5] Documentation: add Packfile URIs design doc Jonathan Tan
2018-12-04  0:21   ` Stefan Beller
2018-12-04  1:54   ` brian m. carlson
2018-12-04 19:29     ` Jonathan Tan
2019-02-19 13:22       ` Christian Couder
2019-02-19 20:10         ` Jonathan Tan
2019-02-22 11:35           ` Christian Couder
2019-02-19 13:44     ` Ævar Arnfjörð Bjarmason
2019-02-21  1:09       ` brian m. carlson
2019-02-22  9:34         ` Ævar Arnfjörð Bjarmason
2018-12-05  5:02   ` Junio C Hamano
2018-12-05  5:55     ` Junio C Hamano
2018-12-06 23:16     ` Jonathan Tan
2019-02-19 14:28   ` Ævar Arnfjörð Bjarmason
2019-02-19 22:06     ` Jonathan Tan [this message]
2018-12-03 23:37 ` [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out Jonathan Tan
2018-12-04  0:30   ` Stefan Beller
2018-12-05  6:30   ` Junio C Hamano
2018-12-03 23:37 ` [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line Jonathan Tan
2018-12-06  6:35   ` Junio C Hamano
2018-12-06 23:25     ` Jonathan Tan
2018-12-07  0:22       ` Junio C Hamano
2018-12-03 23:37 ` [WIP RFC 5/5] upload-pack: send part of packfile response as uri Jonathan Tan
2018-12-04 20:09   ` Stefan Beller
2018-12-04  0:01 ` [WIP RFC 0/5] Design for offloading part of packfile response to CDN Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190219220605.8905-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).