git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Josh Steadmon <steadmon@google.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, gitster@pobox.com, me@ttaylorr.com,
	newren@gmail.com, avarab@gmail.com, dyroneteng@gmail.com,
	Johannes.Schindelin@gmx.de
Subject: Re: [PATCH v2 1/6] docs: document bundle URI standard
Date: Fri, 22 Jul 2022 09:15:26 -0400	[thread overview]
Message-ID: <5a6a687d-4543-f375-13dd-b60b37ddce18@github.com> (raw)
In-Reply-To: <YtnHeUUnaLn6mSYK@google.com>

On 7/21/2022 5:39 PM, Josh Steadmon wrote:
> On 2022.06.29 20:40, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>

>> +bundle.heuristic::
>> +	If this string-valued key exists, then the bundle list is designed to
>> +	work well with incremental `git fetch` commands. The heuristic signals
>> +	that there are additional keys available for each bundle that help
>> +	determine which subset of bundles the client should download.
> 
> To be clear, the values of `bundle.heuristic` are keys underneath the
> following `bundle.<id>` segments?

At the moment, the only planned heuristic is associated with a single
key (bundle.<id>.creationToken) but future heuristics could be more
complicated. The general idea is to say "these bundles were organized
with this heuristic in mind, so either take all of them (ignoring the
heuristic) or apply a specific algorithm to download a subset at a
time." The creationToken heuristic uses the total order on the bundles
to do a greedy algorithm for downloading the most-recent bundles until
the required commits are all satisfied.

>> +Related Work: Packfile URIs
>> +---------------------------
> 
> Thanks for including this section; my very first question before I even
> started reading the doc was "How does this compare to Packfile URIs?".
> 
> 
>> +
>> +The Git protocol already has a capability where the Git server can list
>> +a set of URLs along with the packfile response when serving a client
>> +request. The client is then expected to download the packfiles at those
>> +locations in order to have a complete understanding of the response.
>> +
>> +This mechanism is used by the Gerrit server (implemented with JGit) and
>> +has been effective at reducing CPU load and improving user performance for
>> +clones.
>> +
>> +A major downside to this mechanism is that the origin server needs to know
>> +_exactly_ what is in those packfiles, and the packfiles need to be available
>> +to the user for some time after the server has responded. This coupling
>> +between the origin and the packfile data is difficult to manage.
> 
> It was not immediately clear to me why bundle URIs would avoid these two
> downsides, but after thinking about it and discussing it, I believe this
> is the reasoning (please correct me if I'm wrong):
> 
> Bundle URIs are intended to be fully processed before negotiation
> happens, so the server can rely solely on the client's reported Haves /
> Wants to determine its response, as usual, and therefore doesn't need to
> know the bundle contents. Packfile URIs are not processed by the client
> before negotiation, so the server needs to be aware of the packfile
> contents in order to determine which additional objects to send, and
> can't rely solely on the Haves/Wants from the client.
> 
> If that's accurate, then it seems fine that when the bundle URI is
> provided by the user on the command-line, Git can download and process
> the bundle before even attempting to contact the server. But in future
> series, when we add the ability for the server to provide a URI via a
> capability, the client will have to pause after seeing the server's
> advertised URI, fetch and process the bundle, and then proceed with a
> fetch command. This seems fine assuming the bundles can be handled in a
> reasonable amount of time. And even if the connection breaks before the
> fetch command can be issued, presumably the client would not need to
> attempt to re-download the bundle a second time when the client makes a
> second connection to re-attempt the fetch.

After the server has advertised the bundle URIs, the client should
consider disconnecting while downloading bundles, then reconnect to
the server after that is complete.

For stateless connections (https) we need to reconnect every time, so
this isn't a problem.

I anticipate that in the long-term view of this feature, the server
advertised bundle URIs will usually point to an independent bundle
list at a static URI. In that situation, we can store the chosen
bundle list URI in Git config and check that bundle list for new data
before even connecting to the origin Git server. We would only
"rediscover" the URIs when either the bundle URI returns 404 and/or
the origin server lists new URIs.

Of course, there is the option that the origin Git server wants to
track and advertise the individual bundles directly, in which case the
client can't do this opportunistic bundle pre-check.

> For the point about the packfiles needing to be available for some time,
> this makes sense because the server's response is processed by the
> client before the referenced packfile is downloaded, but the response is
> incomplete without the packfile. But with bundle URIs, the server's
> response doesn't depend on whether or not the client actually processed
> the referenced bundle, only on what was negotiated. So the server's
> response is still valid and useful even if the client is unable to get
> the bundle.

You're pointing out the critical feature: the bundles are completely
optional. If the client fails to download the bundles due to something
like this race, then it will just negotiate for the origin Git server
to provide the data.

What's perhaps not entirely fair about this point is that the bundle
provider should keep this race in mind between serving a bundle list
and having the client request the associated bundles. Something like
a 24-hour window between updating the bundle list and deleting an
unreferenced bundle file should satisfy the vast majority of clients.

Thanks for the close look. While I removed them from the context, I've
made note of your smaller comments on things like typos and will fix
them in v3.

Thanks,
-Stolee

  reply	other threads:[~2022-07-22 13:16 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-06 19:55 [PATCH 0/6] bundle URIs: design doc and initial git fetch --bundle-uri implementation Derrick Stolee via GitGitGadget
2022-06-06 19:55 ` [PATCH 1/6] docs: document bundle URI standard Derrick Stolee via GitGitGadget
2022-06-06 22:18   ` Junio C Hamano
2022-06-08 19:20     ` Derrick Stolee
2022-06-08 19:27       ` Junio C Hamano
2022-06-08 20:44         ` Junio C Hamano
2022-06-08 20:39       ` Junio C Hamano
2022-06-08 20:52         ` Derrick Stolee
2022-06-07  0:33   ` Junio C Hamano
2022-06-08 19:46     ` Derrick Stolee
2022-06-08 21:01       ` Junio C Hamano
2022-06-09 16:00         ` Derrick Stolee
2022-06-09 17:56           ` Junio C Hamano
2022-06-09 18:27             ` Ævar Arnfjörð Bjarmason
2022-06-09 19:39             ` Derrick Stolee
2022-06-09 20:13               ` Junio C Hamano
2022-06-21 19:34       ` Derrick Stolee
2022-06-21 20:16         ` Junio C Hamano
2022-06-21 21:10           ` Derrick Stolee
2022-06-21 21:33             ` Junio C Hamano
2022-06-06 19:55 ` [PATCH 2/6] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
2022-07-21 22:59   ` Junio C Hamano
2022-06-06 19:55 ` [PATCH 3/6] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
2022-06-06 19:55 ` [PATCH 4/6] fetch: add --bundle-uri option Derrick Stolee via GitGitGadget
2022-06-06 19:55 ` [PATCH 5/6] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
2022-06-06 19:55 ` [PATCH 6/6] fetch: add 'refs/bundle/' to log.excludeDecoration Derrick Stolee via GitGitGadget
2022-06-29 20:40 ` [PATCH v2 0/6] bundle URIs: design doc and initial git fetch --bundle-uri implementation Derrick Stolee via GitGitGadget
2022-06-29 20:40   ` [PATCH v2 1/6] docs: document bundle URI standard Derrick Stolee via GitGitGadget
2022-07-18  9:20     ` SZEDER Gábor
2022-07-21 12:09     ` Matthew John Cheetham
2022-07-22 13:52       ` Derrick Stolee
2022-07-22 16:03       ` Derrick Stolee
2022-07-21 21:39     ` Josh Steadmon
2022-07-22 13:15       ` Derrick Stolee [this message]
2022-07-22 15:01       ` Derrick Stolee
2022-06-29 20:40   ` [PATCH v2 2/6] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
2022-07-21 21:41     ` Josh Steadmon
2022-06-29 20:40   ` [PATCH v2 3/6] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
2022-07-21 21:45     ` Josh Steadmon
2022-07-22 13:18       ` Derrick Stolee
2022-06-29 20:40   ` [PATCH v2 4/6] fetch: add --bundle-uri option Derrick Stolee via GitGitGadget
2022-06-29 20:40   ` [PATCH v2 5/6] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
2022-06-29 20:40   ` [PATCH v2 6/6] fetch: add 'refs/bundle/' to log.excludeDecoration Derrick Stolee via GitGitGadget
2022-07-21 21:47     ` Josh Steadmon
2022-07-22 13:20       ` Derrick Stolee
2022-07-21 21:48   ` [PATCH v2 0/6] bundle URIs: design doc and initial git fetch --bundle-uri implementation Josh Steadmon
2022-07-21 21:56     ` Junio C Hamano
2022-07-25 13:53   ` [PATCH v3 0/2] " Derrick Stolee via GitGitGadget
2022-07-25 13:53     ` [PATCH v3 1/2] docs: document bundle URI standard Derrick Stolee via GitGitGadget
2022-07-28  1:23       ` tenglong.tl
2022-08-01 13:42         ` Derrick Stolee
2022-07-25 13:53     ` [PATCH v3 2/2] bundle-uri: add example bundle organization Derrick Stolee via GitGitGadget
2022-08-04 16:09       ` Matthew John Cheetham
2022-08-04 17:39         ` Derrick Stolee
2022-08-04 20:29           ` Ævar Arnfjörð Bjarmason
2022-08-05 18:29             ` Derrick Stolee
2022-07-25 20:05     ` [PATCH v3 0/2] bundle URIs: design doc and initial git fetch --bundle-uri implementation Josh Steadmon
2022-08-09 13:12     ` [PATCH v4 0/2] bundle URIs: design doc Derrick Stolee via GitGitGadget
2022-08-09 13:12       ` [PATCH v4 1/2] docs: document bundle URI standard Derrick Stolee via GitGitGadget
2022-10-04 19:48         ` Philip Oakley
2022-08-09 13:12       ` [PATCH v4 2/2] bundle-uri: add example bundle organization Derrick Stolee via GitGitGadget
2022-08-09 13:49       ` [PATCH v4 0/2] bundle URIs: design doc Phillip Wood
2022-08-09 15:50         ` Derrick Stolee
2022-08-11 15:42           ` Phillip Wood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5a6a687d-4543-f375-13dd-b60b37ddce18@github.com \
    --to=derrickstolee@github.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=dyroneteng@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).