From: Matthew John Cheetham <mjcheetham@outlook.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: gitster@pobox.com, me@ttaylorr.com, newren@gmail.com,
avarab@gmail.com, dyroneteng@gmail.com,
Johannes.Schindelin@gmx.de, "SZEDER Gábor" <szeder.dev@gmail.com>,
"Josh Steadmon" <steadmon@google.com>,
"Derrick Stolee" <derrickstolee@github.com>,
git@vger.kernel.org
Subject: Re: [PATCH v3 2/2] bundle-uri: add example bundle organization
Date: Thu, 4 Aug 2022 17:09:18 +0100
Message-ID: <AS8PR03MB86898A2F7156918A390296CAC09F9@AS8PR03MB8689.eurprd03.prod.outlook.com>
In-Reply-To: <a933471c3afdd2c95d4115719c24d79e5e430b4d.1658757188.git.gitgitgadget@gmail.com>
On 2022-07-25 14:53, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
>
> The previous change introduced the bundle URI design document. It
> creates a flexible set of options that allow bundle providers many ways
> to organize Git object data and speed up clones and fetches. It is
> particularly important that we have flexibility so we can apply future
> advancements as new ideas for efficiently organizing Git data are
> discovered.
>
> However, the design document does not provide even an example of how
> bundles could be organized, and that makes it difficult to envision how
> the feature should work at the end of the implementation plan.
>
> Add a section that details how a bundle provider could work, including
> using the Git server advertisement for multiple geo-distributed servers.
> This organization is based on the GVFS Cache Servers which have
> successfully used similar ideas to provide fast object access and
> reduced server load for very large repositories.
Thanks! This patch is helpful guidance for bundle server implementors.
> +This example organization is a simplified model of what is used by the
> +GVFS Cache Servers (see section near the end of this document) which have
> +been beneficial in speeding up clones and fetches for very large
> +repositories, although using extra software outside of Git.
Nit: might be a good idea to use "VFS for Git" rather than the old name
"GVFS" [1].
> +The bundle provider deploys servers across multiple geographies. Each
> +server manages its own bundle set. The server can track a number of Git
> +repositories, but provides a bundle list for each based on a pattern. For
> +example, when mirroring a repository at `https://<domain>/<org>/<repo>`
> +the bundle server could have its bundle list available at
> +`https://<server-url>/<domain>/<org>/<repo>`. The origin Git server can
> +list all of these servers under the "any" mode:
> +
> + [bundle]
> + version = 1
> + mode = any
> +
> + [bundle "eastus"]
> + uri = https://eastus.example.com/<domain>/<org>/<repo>
> +
> + [bundle "europe"]
> + uri = https://europe.example.com/<domain>/<org>/<repo>
> +
> + [bundle "apac"]
> + uri = https://apac.example.com/<domain>/<org>/<repo>
> +
> +This "list of lists" is static and only changes if a bundle server is
> +added or removed.
> +
> +Each bundle server manages its own set of bundles. The initial bundle list
> +contains only a single bundle, containing all of the objects received from
> +cloning the repository from the origin server. The list uses the
> +`creationToken` heuristic and a `creationToken` is made for the bundle
> +based on the server's timestamp.
Just to confirm, in this example the origin server advertises a single
URL (over v2 protocol) that points to this example "list of lists"?
Remote -> 1 URL -> List(any/split by geo) -> List(all/split by time)
> +The bundle server runs regularly-scheduled updates for the bundle list,
> +such as once a day. During this task, the server fetches the latest
> +contents from the origin server and generates a bundle containing the
> +objects reachable from the latest origin refs, but not contained in a
> +previously-computed bundle. This bundle is added to the list, with care
> +that the `creationToken` is strictly greater than the previous maximum
> +`creationToken`.
> +
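To make the "strictly greater" requirement concrete, here is a minimal
sketch, assuming the bundle server keeps the tokens it has already
published. The function name `next_creation_token` and its parameters are
hypothetical and not part of Git; the only rules taken from the text are
that the token is based on the server's timestamp and must exceed the
previous maximum.

```python
import time


def next_creation_token(existing_tokens, now=None):
    """Return a timestamp-based token strictly greater than all existing ones.

    `existing_tokens` is an iterable of integers already used in the list.
    `now` is the server's current Unix time (hypothetical parameter, mainly
    useful for testing); it defaults to the wall clock.
    """
    if now is None:
        now = int(time.time())
    existing = list(existing_tokens)
    if not existing:
        return now
    # Guard against clock skew or two runs within the same second:
    # never reuse or go below the previous maximum.
    return max(now, max(existing) + 1)
```

Falling back to "previous maximum plus one" is one possible way to keep
the ordering strict if the server clock ever moves backwards.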
> +When the bundle list grows too large, say more than 30 bundles, then the
> +oldest "_N_ minus 30" bundles are combined into a single bundle. This
> +bundle's `creationToken` is equal to the maximum `creationToken` among the
> +merged bundles.
> +
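The merge rule above could be sketched as follows. This is only an
illustration of the bookkeeping under stated assumptions: the `Bundle`
type, the `compact` function, and the "merged-base" name are hypothetical,
and the sketch tracks list entries only, not the object data that a real
server would repack. Read literally, merging the oldest "N minus 30"
bundles leaves the 30 newest bundles plus one merged base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    name: str
    creation_token: int


def compact(bundles, keep=30):
    """Merge the oldest (N - keep) bundles into a single entry.

    The merged entry takes the maximum creationToken among the bundles it
    replaces, so it still sorts before every bundle that was kept.
    """
    ordered = sorted(bundles, key=lambda b: b.creation_token)
    excess = len(ordered) - keep
    if excess <= 1:
        # Nothing to do: the list is short enough, or only one bundle
        # would be "merged", which would be a no-op.
        return ordered
    oldest, newest = ordered[:excess], ordered[excess:]
    merged = Bundle(
        name="merged-base",  # hypothetical name
        creation_token=max(b.creation_token for b in oldest),
    )
    return [merged] + newest
```

For example, a list of 33 bundles with `keep=30` collapses its three
oldest entries into one, giving 31 entries in total.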
> +An example bundle list is provided here, although it only has two daily
> +bundles and not a full list of 30:
> +
> + [bundle]
> + version = 1
> + mode = all
> + heuristic = creationToken
> +
> + [bundle "2022-02-13-1644770820-daily"]
> + uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-13-1644770820-daily.bundle
> + creationToken = 1644770820
> +
> + [bundle "2022-02-09-1644442601-daily"]
> + uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-09-1644442601-daily.bundle
> + creationToken = 1644442601
> +
> + [bundle "2022-02-02-1643842562"]
> + uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-02-1643842562.bundle
> + creationToken = 1643842562
> +
> +To avoid storing and serving object data in perpetuity despite becoming
> +unreachable in the origin server, this bundle merge can be more careful.
> +Instead of taking an absolute union of the old bundles, the bundle
> +can be created by looking at the newer bundles and ensuring that their
> +necessary commits are all available in this merged bundle (or in another
> +one of the newer bundles). This allows "expiring" object data that is not
> +being used by new commits in this window of time. That data could be
> +reintroduced by a later push.
> +
> +This data organization has two main goals. First, initial
> +clones of the repository become faster by downloading precomputed object
> +data from a closer source. Second, `git fetch` commands can be faster,
> +especially if the client has not fetched for a few days. However, if a
> +client does not fetch for 30 days, then the bundle list organization would
> +cause redownloading a large amount of object data.
> +
> +One way to make this organization more useful to users who fetch frequently
> +is to have more frequent bundle creation. For example, bundles could be
> +created every hour, and then once a day those "hourly" bundles could be
> +merged into a "daily" bundle. The daily bundles are merged into the
> +oldest bundle after 30 days.
> +
> +It is recommended that this bundle strategy is repeated with the `blob:none`
> +filter if clients of this repository are expecting to use blobless partial
> +clones. This list of blobless bundles stays in the same list as the full
> +bundles, but uses the `bundle.<id>.filter` key to separate the two groups.
> +For very large repositories, the bundle provider may want to _only_ provide
> +blobless bundles.
> +
> Implementation Plan
> -------------------
>
In general this looks good to me!
[1] https://github.com/microsoft/VFSForGit/issues/72