git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, me@ttaylorr.com, newren@gmail.com,
	avarab@gmail.com, dyroneteng@gmail.com,
	Johannes.Schindelin@gmx.de
Subject: Re: [PATCH 1/6] docs: document bundle URI standard
Date: Thu, 9 Jun 2022 12:00:27 -0400	[thread overview]
Message-ID: <3d67b69b-fac8-3171-92dc-303ea672efbf@github.com> (raw)
In-Reply-To: <xmqq5ylarhsg.fsf@gitster.g>

On 6/8/2022 5:01 PM, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
>>> That sounds quite straight-forward.  Do you envision that their
>>> incremental snapshot packfile chains can somehow be shared with the
>>> bundle URI implementations?  Doesn't it make it more cumbersome that
>>> this proposal uses the bundles as the encapsulation format, rather
>>> than packfiles?  As you are sending extra pieces of information on
>>> top of the payload in the form of table-of-contents already, I
>>> wonder if bundle.<id>.uri should point at a bare packfile (instead
>>> of a bundle), while multi-valued bundle.<id>.prerequisite give the
>>> prerequisite objects?  The machinery that is already generating the
>>> prefetch packfiles already know which packfile has what
>>> prerequisites in it, so it rather looks simpler if the solution did
>>> not involve bundles.
>>
>> The prefetch packfiles could be replaced with bundle URIs, if desired.
>> ...
>> So in this world, the bundle URIs could be used as a replacement for
>> downloading these prefetch packfiles (bundles with filter=blob:none)
>> but the bundled refs become useless to the client.
> 
> That's all understandable, but what I was alluding to was to go in
> the other direction.  Since "bundle URI" thing is new, while the
> GVFS Cache Servers already use these prefetch packfiles, it could be
> beneficial if the new thing can be done without bundle files and
> instead with packfiles.  You are already generating these snapshot
> packfiles for GVFS Cache Servers.  So if we can reuse them to also
> serve "git clone" and "git fetch" clients, we can do so without
> doubling the disk footprint.

Now I'm confused as to what you are trying to say, so let me back
up and start from the beginning. Hopefully, that brings clarity so
we can get to the root of my confusion.

The GVFS Cache Servers started as a way to have low-latency per-object
downloads to satisfy the filesystem virtualization feature of the
clients. This initially was going to be the _only_ way clients got
objects until we realized that commit and tree "misses" are very
expensive.

So, the "prefetch packfile" system was developed to use timestamp-
based packs that contain commits and trees. Clients would provide
their latest timestamp and the servers would provide the list of
packfiles to download.

Because the GVFS Protocol still has the "download objects on-demand"
feature, any objects that were needed that were not already in those
prefetch packfiles (including recently-pushed commits and trees)
could be downloaded by the clients on-demand.

This has been successful in production, and in particular is helpful
that cache servers can be maintained completely independently of the
origin Git server. There is some configuration to allow the origin
server to advertise the list of cache servers via the <url>/gvfs/config
REST API, but otherwise they are completely independent.

For years, I've been interested in bringing this kind of functionality
to Git proper, but struggled on multiple fronts:

1. The independence of the cache servers could not use packfile-URIs.

2. The way packfile-URIs happens _within_ a fetch negotiation makes it
   hard to integrate even if we didn't have this independence.

3. If the Git client directly downloaded these packfiles from the
   cache server, then how does it get the remaining objects from the
   origin server?

Ævar's observation that bundles also add ref tips to the packfile is
the key to breaking down this concern: these ref tips give us a way
to negotiate the difference between what the client already has
(including the bundles downloaded from a bundle provider) and what it
wants from the origin Git server. This all happens without any change
necessary to the origin Git server.

And thus, this bundle URI design came about. It takes all of the best
things about the GVFS Cache Server but then layers refs on top of the
time-based prefetch packfiles so a normal Git client can do that
"catch-up fetch" afterwards.

This motivated my "could we use the new bundle URI feature in the
old GVFS Cache Server environment?" comment:

I could imagine updating GVFS Cache Servers to generate bundles
instead (or also) and updating the VFS for Git clients to use the
bundle URI feature to download the data. However, for the sake of not
overloading the origin server with those incremental fetches, we would
probably keep the "only download missing objects on-demand" feature
in that environment. (Hence, the refs are useless to those clients.)

However, you seem to be hinting at "the GVFS Cache Servers seem to
work just fine, so why do we need bundles?" but I think that the
constraints of what is expected at the end of "git clone" or "git
fetch" require us to not "catch up later" and instead complete the
full download during the process. The refs in the bundles are critical
to making that work.
 
> Even if you scrapped the "bundle URI" and rebuilt it as the
> "packfile URI" mechanism, the only change you need is to make
> positive and negative refs, which were available in bundle files but
> not stored in packfiles, available as a part of the metadata for
> each packfile, no?  You'd be keeping track of associated metadata
> (like the .timestamp and .requires fields) in addition to what is in
> the bundle anyway, so...

From this comment, it seems you are suggesting that we augment the
packfile data being served by the packfile-URI feature in order to
include those positive/negative refs (as well as other metadata that
is included in the bundle URI design).

I see two major issues with that:

1. We don't have a way to add that metadata directly into packfiles,
   so we would need to update the file standard or update the
   packfile-URI protocol to include that metadata.

2. The only source of packfile-URI listings come as a response to the
   "git fetch" request to the origin Git server, so there is no way
   to allow an independent server to provide that data.

I hope I am going in the right direction here, but I likely
misunderstood some of your proposed alternatives.

Thanks,
-Stolee

  reply	other threads:[~2022-06-09 16:00 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-06 19:55 [PATCH 0/6] bundle URIs: design doc and initial git fetch --bundle-uri implementation Derrick Stolee via GitGitGadget
2022-06-06 19:55 ` [PATCH 1/6] docs: document bundle URI standard Derrick Stolee via GitGitGadget
2022-06-06 22:18   ` Junio C Hamano
2022-06-08 19:20     ` Derrick Stolee
2022-06-08 19:27       ` Junio C Hamano
2022-06-08 20:44         ` Junio C Hamano
2022-06-08 20:39       ` Junio C Hamano
2022-06-08 20:52         ` Derrick Stolee
2022-06-07  0:33   ` Junio C Hamano
2022-06-08 19:46     ` Derrick Stolee
2022-06-08 21:01       ` Junio C Hamano
2022-06-09 16:00         ` Derrick Stolee [this message]
2022-06-09 17:56           ` Junio C Hamano
2022-06-09 18:27             ` Ævar Arnfjörð Bjarmason
2022-06-09 19:39             ` Derrick Stolee
2022-06-09 20:13               ` Junio C Hamano
2022-06-21 19:34       ` Derrick Stolee
2022-06-21 20:16         ` Junio C Hamano
2022-06-21 21:10           ` Derrick Stolee
2022-06-21 21:33             ` Junio C Hamano
2022-06-06 19:55 ` [PATCH 2/6] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
2022-07-21 22:59   ` Junio C Hamano
2022-06-06 19:55 ` [PATCH 3/6] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
2022-06-06 19:55 ` [PATCH 4/6] fetch: add --bundle-uri option Derrick Stolee via GitGitGadget
2022-06-06 19:55 ` [PATCH 5/6] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
2022-06-06 19:55 ` [PATCH 6/6] fetch: add 'refs/bundle/' to log.excludeDecoration Derrick Stolee via GitGitGadget
2022-06-29 20:40 ` [PATCH v2 0/6] bundle URIs: design doc and initial git fetch --bundle-uri implementation Derrick Stolee via GitGitGadget
2022-06-29 20:40   ` [PATCH v2 1/6] docs: document bundle URI standard Derrick Stolee via GitGitGadget
2022-07-18  9:20     ` SZEDER Gábor
2022-07-21 12:09     ` Matthew John Cheetham
2022-07-22 13:52       ` Derrick Stolee
2022-07-22 16:03       ` Derrick Stolee
2022-07-21 21:39     ` Josh Steadmon
2022-07-22 13:15       ` Derrick Stolee
2022-07-22 15:01       ` Derrick Stolee
2022-06-29 20:40   ` [PATCH v2 2/6] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
2022-07-21 21:41     ` Josh Steadmon
2022-06-29 20:40   ` [PATCH v2 3/6] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
2022-07-21 21:45     ` Josh Steadmon
2022-07-22 13:18       ` Derrick Stolee
2022-06-29 20:40   ` [PATCH v2 4/6] fetch: add --bundle-uri option Derrick Stolee via GitGitGadget
2022-06-29 20:40   ` [PATCH v2 5/6] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
2022-06-29 20:40   ` [PATCH v2 6/6] fetch: add 'refs/bundle/' to log.excludeDecoration Derrick Stolee via GitGitGadget
2022-07-21 21:47     ` Josh Steadmon
2022-07-22 13:20       ` Derrick Stolee
2022-07-21 21:48   ` [PATCH v2 0/6] bundle URIs: design doc and initial git fetch --bundle-uri implementation Josh Steadmon
2022-07-21 21:56     ` Junio C Hamano
2022-07-25 13:53   ` [PATCH v3 0/2] " Derrick Stolee via GitGitGadget
2022-07-25 13:53     ` [PATCH v3 1/2] docs: document bundle URI standard Derrick Stolee via GitGitGadget
2022-07-28  1:23       ` tenglong.tl
2022-08-01 13:42         ` Derrick Stolee
2022-07-25 13:53     ` [PATCH v3 2/2] bundle-uri: add example bundle organization Derrick Stolee via GitGitGadget
2022-08-04 16:09       ` Matthew John Cheetham
2022-08-04 17:39         ` Derrick Stolee
2022-08-04 20:29           ` Ævar Arnfjörð Bjarmason
2022-08-05 18:29             ` Derrick Stolee
2022-07-25 20:05     ` [PATCH v3 0/2] bundle URIs: design doc and initial git fetch --bundle-uri implementation Josh Steadmon
2022-08-09 13:12     ` [PATCH v4 0/2] bundle URIs: design doc Derrick Stolee via GitGitGadget
2022-08-09 13:12       ` [PATCH v4 1/2] docs: document bundle URI standard Derrick Stolee via GitGitGadget
2022-10-04 19:48         ` Philip Oakley
2022-08-09 13:12       ` [PATCH v4 2/2] bundle-uri: add example bundle organization Derrick Stolee via GitGitGadget
2022-08-09 13:49       ` [PATCH v4 0/2] bundle URIs: design doc Phillip Wood
2022-08-09 15:50         ` Derrick Stolee
2022-08-11 15:42           ` Phillip Wood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d67b69b-fac8-3171-92dc-303ea672efbf@github.com \
    --to=derrickstolee@github.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=dyroneteng@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).