git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Derrick Stolee <derrickstolee@github.com>
Cc: "Matthew John Cheetham" <mjcheetham@outlook.com>,
	"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>,
	gitster@pobox.com, me@ttaylorr.com, newren@gmail.com,
	dyroneteng@gmail.com, Johannes.Schindelin@gmx.de,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Josh Steadmon" <steadmon@google.com>,
	git@vger.kernel.org
Subject: Re: [PATCH v3 2/2] bundle-uri: add example bundle organization
Date: Thu, 04 Aug 2022 22:29:26 +0200
Message-ID: <220804.86y1w3sozy.gmgdl@evledraar.gmail.com>
In-Reply-To: <9b1cf24c-dfa9-0a5b-06f7-8942a8ba72ec@github.com>


On Thu, Aug 04 2022, Derrick Stolee wrote:

> On 8/4/2022 12:09 PM, Matthew John Cheetham wrote:
>> On 2022-07-25 14:53, Derrick Stolee via GitGitGadget wrote:
>>> From: Derrick Stolee <derrickstolee@github.com>
>>>
>>> The previous change introduced the bundle URI design document. It
>>> creates a flexible set of options that allow bundle providers many ways
>>> to organize Git object data and speed up clones and fetches. It is
>>> particularly important that we have flexibility so we can apply future
>>> advancements as new ideas for efficiently organizing Git data are
>>> discovered.
>>>
>>> However, the design document does not provide even an example of how
>>> bundles could be organized, and that makes it difficult to envision how
>>> the feature should work at the end of the implementation plan.
>>>
>>> Add a section that details how a bundle provider could work, including
>>> using the Git server advertisement for multiple geo-distributed servers.
>>> This organization is based on the GVFS Cache Servers which have
>>> successfully used similar ideas to provide fast object access and
>>> reduced server load for very large repositories.
>> Thanks! This patch is helpful guidance for bundle server implementors.
>>> +This example organization is a simplified model of what is used by the
>>> +GVFS Cache Servers (see section near the end of this document) which have
>>> +been beneficial in speeding up clones and fetches for very large
>>> +repositories, although using extra software outside of Git.
>> 
>> Nit: might be a good idea to use "VFS for Git" rather than the old name
>> "GVFS" [1].
>
> The rename from "GVFS" to "VFS for Git" is made even more confusing
> because "the GVFS Protocol" keeps the name since it is independent of
> the virtual filesystem part (and has "gvfs" in the API routes). In
> particular, "the GVFS Cache Servers" provide a repository mirror using
> the GVFS protocol and can be used by things like Scalar (when using
> the microsoft/git fork).
>  
>>> +The bundle provider deploys servers across multiple geographies. Each
>>> +server manages its own bundle set. The server can track a number of Git
>>> +repositories, but provides a bundle list for each based on a pattern. For
>>> +example, when mirroring a repository at `https://<domain>/<org>/<repo>`
>>> +the bundle server could have its bundle list available at
>>> +`https://<server-url>/<domain>/<org>/<repo>`. The origin Git server can
>>> +list all of these servers under the "any" mode:
>>> +
>>> +	[bundle]
>>> +		version = 1
>>> +		mode = any
>>> +		
>>> +	[bundle "eastus"]
>>> +		uri = https://eastus.example.com/<domain>/<org>/<repo>
>>> +		
>>> +	[bundle "europe"]
>>> +		uri = https://europe.example.com/<domain>/<org>/<repo>
>>> +		
>>> +	[bundle "apac"]
>>> +		uri = https://apac.example.com/<domain>/<org>/<repo>
>>> +
>>> +This "list of lists" is static and only changes if a bundle server is
>>> +added or removed.
>>> +
>>> +Each bundle server manages its own set of bundles. The initial bundle list
>>> +contains only a single bundle, containing all of the objects received from
>>> +cloning the repository from the origin server. The list uses the
>>> +`creationToken` heuristic and a `creationToken` is made for the bundle
>>> +based on the server's timestamp.
>> 
>> Just to confirm, in this example the origin server advertises a single
>> URL (over v2 protocol) that points to this example "list of lists"?
>
> No, here the origin server provides the list of lists using the 'bundle-uri'
> protocol v2 command. Using the config file format was an unfortunate choice
> on my part because that actually uses "key=value" lines.
>
> This could be more clear by using that format:
>
>   bundle.version=1
>   bundle.mode=any
>   bundle.eastus.uri=https://eastus.example.com/<domain>/<org>/<repo>
>   bundle.europe.uri=https://europe.example.com/<domain>/<org>/<repo>
>   bundle.apac.uri=https://apac.example.com/<domain>/<org>/<repo>
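
As a rough illustration of the "key=value" form quoted above, here is a
minimal Python sketch that splits such lines into list-level settings and
per-bundle entries. The function name `parse_bundle_list` and the returned
data shape are assumptions made for this example; this is not how Git itself
parses the advertisement.

```python
from collections import defaultdict


def parse_bundle_list(text):
    """Split 'bundle.*' key=value lines into (settings, bundles).

    Hypothetical helper for illustration only; not Git's parser.
    """
    settings = {}
    bundles = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first '=' only, so a '=' inside a URI is preserved.
        key, sep, value = line.partition("=")
        if not sep or not key.startswith("bundle."):
            continue  # skip anything that is not a bundle.* pair
        parts = key.split(".")
        if len(parts) == 2:
            # bundle.<key>: applies to the whole list (version, mode, ...)
            settings[parts[1]] = value
        else:
            # bundle.<id>.<key>: applies to one entry in the list
            bundle_id = ".".join(parts[1:-1])
            bundles[bundle_id][parts[-1]] = value
    return settings, dict(bundles)


EXAMPLE = """\
bundle.version=1
bundle.mode=any
bundle.eastus.uri=https://eastus.example.com/<domain>/<org>/<repo>
bundle.europe.uri=https://europe.example.com/<domain>/<org>/<repo>
bundle.apac.uri=https://apac.example.com/<domain>/<org>/<repo>
"""

settings, bundles = parse_bundle_list(EXAMPLE)
```

With the quoted example as input, `settings` holds the `version` and `mode`
values and `bundles` has one entry per server ("eastus", "europe", "apac"),
each with its `uri`.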

[I've tried to stay away from the bundle-uri topic for a while, to give
others some space to comment]

On the topic generally: your cover letter goes into some of the saga,
but briefly, the design I initially put forward assumed that this sort
of thing would be offloaded to other protocols.

So, to take a prominent example from the domain of your "From"
address: AFAICT there isn't an eastus.api.github.com or a
europe.api.github.com; instead, it just uses DNS load-balancing for
api.github.com.

See the different IPs you'll get e.g. at
https://www.whatsmydns.net/#A/api.github.com (or from many other such
geoloc-inspecting DNS lookup tools). You can also do the same with
multicast etc.
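
For anyone who would rather check this locally than through a web tool, the
following is a small Python sketch that lists the IPv4 addresses the local
resolver returns for a host name. The function name `resolve_ipv4` and the
injectable `resolver` parameter are assumptions made for this example, and
the addresses you see depend on where the lookup is made from.

```python
import socket


def resolve_ipv4(host, port=443, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses reported for host.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so
    the function can be exercised without network access.
    """
    infos = resolver(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr), and for
    # AF_INET the sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})


# Example (needs network access; results vary by location and over time):
#   print(resolve_ipv4("api.github.com"))
```

Running the same lookup from hosts in different regions is one way to
observe the single-name, many-addresses setup described above.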

We've had some back-and-forth on that before. You clearly think this
sort of thing is needed in (some version of) a bundle-uri. I don't
really see why. This sort of load spreading by different DNS naming
hasn't been common in serious production use for a decade or two.

But let's leave that aside, and other things I think we've had diverging
ideas about before (e.g. your spec's explicit cache management, which I
imagined offloading to standard HTTP features).

I do think that:

1) This proposed version would be much stronger if it generally tried to
   justify the features it's putting forward. E.g. just in this case
   (but it applies more generally) it seems to be taken as a given that
   {eastus,europe,apac}.<domain> etc. is the natural way to do that sort
   of load-balancing.

   But the spec doesn't really go into it. Why would someone use that
   instead of setting up GeoDNS (or similar), why does it need to be in
   git's protocol, and not in DNS?

2) I'd really like it clarified in the doc whether it considers itself a
   "living document" amenable to change, or a "spec" that we have to
   stick to.

   I'd like it to be the former, and I think it should be prominently
   noted there (e.g. that it's "EXPERIMENTAL" or whatever).

   I don't think it's a good time investment to argue over every little
   detail of how and why some aspects of bundle-uri should look before
   any of it is in-tree; we can just start with some base functionality
   and tweak it.

   So if e.g. we find (from real-world benchmarking etc.) that some
   feature of the current spec isn't required (such as this
   GeoDNS-alike) we should be able to remove it, and not say "we can't,
   that's part of 'the spec'".

   Or maybe we keep that (and change something else). The point is that
   I don't think we have the full overview *right now*, and it would be
   regrettable if we prematurely decreed certain things "stable" or
   "specc'd".

I suspect we agree on #2, since your cover letter mentions a PR that
integrates basically the docs I had as part of [1]. So presumably you're
aiming for getting to the end of those PRs, followed by some phase where
we attempt to unify the two, which might mean stripping out some
feature(s) from one or both, and adding others.

Thanks again for pushing this forward!

1. https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/

