git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Jonathan Tan <jonathantanmy@google.com>, Ben Peart <peartben@gmail.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	Mark Thomas <markbt@efaref.net>,
	Jeff Hostetler <git@jeffhostetler.com>
Subject: Re: Proposal for "fetch-any-blob Git protocol" and server design
Date: Tue, 28 Mar 2017 16:19:33 -0700	[thread overview]
Message-ID: <CAGZ79ka-YaF5dBXJmhXsfwwrwSBy0tkun0yDysG581E0mpqDVw@mail.gmail.com> (raw)
In-Reply-To: <ffd92ad9-39fe-c76b-178d-6e3d6a425037@google.com>

+cc Ben Peart, who sent
"[RFC] Add support for downloading blobs on demand" to the list recently.
This proposal here seems like it has the same goal, so maybe your review
could go a long way here?

Thanks,
Stefan

On Tue, Mar 14, 2017 at 3:57 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> As described in "Background" below, there have been at least 2 patch sets to
> support "partial clones" and on-demand blob fetches, where the server part
> that supports on-demand blob fetches was treated at least in outline. Here
> is a proposal treating that server part in detail.
>
> == Background
>
> The desire for Git to support (i) missing blobs and (ii) fetching them as
> needed from a remote repository has surfaced on the mailing list a few
> times, most recently in the form of RFC patch sets [1] [2].
>
> A local repository that supports (i) will be created by a "partial clone",
> that is, a clone with some special parameters (exact parameters are still
> being discussed) that does not download all blobs normally downloaded. Such
> a repository should support (ii), which is what this proposal describes.
>
> == Design
>
> A new endpoint "server" is created. The client will send a message in the
> following format:
>
> ----
> fbp-request = PKT-LINE("fetch-blob-pack")
>               1*want
>               flush-pkt
> want = PKT-LINE("want" SP obj-id)
> ----
>
> The client may send one or more SHA-1s for which it wants blobs, then a
> flush-pkt.
>
> The server will then reply:
>
> ----
> server-reply = flush-pkt | PKT-LINE("ERR" SP message)
> ----
>
> If there was no error, the server will then send them in a packfile,
> formatted like described in "Packfile Data" in pack-protocol.txt with
> "side-band-64k" enabled.
>
> Any server that supports "partial clone" will also support this, and the
> client will automatically assume this. (How a client discovers "partial
> clone" is not covered by this proposal.)
>
> The server will perform reachability checks on requested blobs through the
> equivalent of "git rev-list --use-bitmap-index" (like "git upload-pack" when
> using the allowreachablesha1inwant option), unless configured to suppress
> reachability checks through a config option. The server administrator is
> highly recommended to regularly regenerate the bitmap (or suppress
> reachability checks).
>
> === Endpoint support for forward compatibility
>
> This "server" endpoint requires that the first line be understood, but will
> ignore any other lines starting with words that it does not understand. This
> allows new "commands" to be added (distinguished by their first lines) and
> existing commands to be "upgraded" with backwards compatibility.
>
> === Related improvements possible with new endpoint
>
> Previous protocol upgrade suggestions have had to face the difficulty of
> allowing updated clients to discover the server support while not slowing
> down (for example, through extra network round-trips) any client, whether
> non-updated or updated. The introduction of "partial clone" allows clients
> to rely on the guarantee that any server that supports "partial clone"
> supports "fetch-blob-pack", and we can extend the guarantee to other
> protocol upgrades that such repos would want.
>
> One such upgrade is "ref-in-want" [3]. The full details can be obtained from
> that email thread, but to summarize, the patch set eliminates the need for
> the initial ref advertisement and allows communication in ref name globs,
> making it much easier for multiple load-balanced servers to serve large
> repos to clients - this is something that would greatly benefit the Android
> project, for example, and possibly many others.
>
> Bundling support for "ref-in-want" with "fetch-blob-pack" simplifies matters
> for the client in that a client needs to only handle one "version" of server
> (a server that supports both). If "ref-in-want" were added later, instead of
> now, clients would need to be able to handle two "versions" (one with only
> "fetch-blob-pack" and one with both "fetch-blob-pack" and "ref-in-want").
>
> As for its implementation, that email thread already contains a patch set
> that makes it work with the existing "upload-pack" endpoint; I can update
> that patch set to use the proposed "server" endpoint (with a
> "fetch-commit-pack" message) if need be.
>
> == Client behavior
>
> This proposal is concerned with server behavior only, but it is useful to
> envision how the client would use this to ensure that the server behavior is
> useful.
>
> === Indication to use the proposed endpoint
>
> The client will probably already record that at least one of its remotes
> (the one that it successfully performed a "partial clone" from) supports
> this new endpoint (if not, it can’t determine whether a missing blob was
> caused by repo corruption or by the "partial clone"). This knowledge can be
> used both to know that the server supports "fetch-blob-pack" and
> "fetch-commit-pack" (for the latter, the client can fall back to
> "fetch-pack"/"upload-pack" when fetching from other servers).
>
> === Multiple remotes
>
> Fetches of missing blobs should (at least by default?) go to the remote that
> sent the tree that points to them. This means that if there are multiple
> remotes, the client needs to remember which remote it learned about a given
> missing blob from.
>
> == Alternatives considered
>
> The "fetch-blob-pack" and "fetch-commit-pack" messages could be split into
> their own endpoints. It seemed more reasonable to combine them together
> since they serve similar use cases (large repos), and (for example) reduces
> the number of binaries in PATH, but I do not feel strongly about this.
>
> The client could supply commit information about the blobs it wants (or
> other information that could help the reachability analysis). However, these
> lines wouldn’t be used by the proposed server design. And if we do discover
> that these lines are useful, the protocol could be extended with new lines
> that contain this information (since old servers will ignore all lines that
> they do not understand).
>
> We could extend "upload-pack" to allow blobs in "want" lines instead of
> having a new endpoint. Due to a quirk in the Git implementation (but
> possibly not other implementations like JGit), this is already supported
> [4]. However, each invocation would require the server to generate an
> unnecessary ref list, and would require both the server and the client to
> undergo more network traffic.
>
> Also, the new "server" endpoint might be made to be discovered through
> another mechanism (for example, a capability advertisement on another
> endpoint). It is probably simpler to tie it to the "partial clone" feature,
> though, since they are so likely to be used together.
>
> [1] <20170304191901.9622-1-markbt@efaref.net>
> [2] <1488999039-37631-1-git-send-email-git@jeffhostetler.com>
> [3] <cover.1485381677.git.jonathantanmy@google.com>
> [4] <20170309003547.6930-1-jonathantanmy@google.com>

  parent reply	other threads:[~2017-03-28 23:19 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-14 22:57 Proposal for "fetch-any-blob Git protocol" and server design Jonathan Tan
2017-03-15 17:59 ` Junio C Hamano
2017-03-16 17:31   ` Jonathan Tan
2017-03-16 21:17     ` Junio C Hamano
2017-03-16 22:48       ` Jonathan Tan
2017-03-28 23:19 ` Stefan Beller [this message]
     [not found]   ` <00bf01d2aed7$b13492a0$139db7e0$@gmail.com>
2017-04-12 22:02     ` Kevin David
2017-04-13 20:12       ` Jonathan Tan
2017-04-21 16:41         ` Kevin David
2017-04-26 22:51           ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79ka-YaF5dBXJmhXsfwwrwSBy0tkun0yDysG581E0mpqDVw@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=markbt@efaref.net \
    --cc=peartben@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).