git@vger.kernel.org mailing list mirror (one of many)
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, newren@gmail.com,
	avarab@gmail.com, dyroneteng@gmail.com,
	Johannes.Schindelin@gmx.de, "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Matthew John Cheetham" <mjcheetham@outlook.com>,
	"Josh Steadmon" <steadmon@google.com>,
	"Derrick Stolee" <derrickstolee@github.com>
Subject: [PATCH v3 0/2] bundle URIs: design doc and initial git fetch --bundle-uri implementation
Date: Mon, 25 Jul 2022 13:53:06 +0000
Message-ID: <pull.1248.v3.git.1658757188.gitgitgadget@gmail.com>
In-Reply-To: <pull.1248.v2.git.1656535245.gitgitgadget@gmail.com>

This is the first in a series toward building the bundle URI feature as
discussed in previous RFCs, specifically pulled directly out of [5]:

[1]
https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/
[2]
https://lore.kernel.org/git/cover-0.3-00000000000-20211025T211159Z-avarab@gmail.com/
[3]
https://lore.kernel.org/git/pull.1160.git.1645641063.gitgitgadget@gmail.com
[4]
https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/
[5]
https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com

THIS ONLY INCLUDES THE DESIGN DOCUMENT. See "Updates in v3". There are two
patches:

 1. The main design document that details the bundle URI standard and how
    the client interacts with the bundle data.
 2. An addendum to the design document that details one strategy for
    organizing bundles from the perspective of a bundle provider.

As outlined in [5], the next steps after this are:

 1. Add 'git clone --bundle-uri=' to run a 'git bundle fetch ' step before
    doing a fetch negotiation with the origin remote. [6]
 2. Allow parsing a bundle list as a config file at the given URI. The
    key-value format is unified with the protocol v2 verb (coming in (3)).
    [7]
 3. Implement the protocol v2 verb, re-using the bundle list logic from (2).
    Use this to auto-discover bundle URIs during 'git clone' (behind a
    config option). [8]
 4. Implement the 'creationToken' heuristic, allowing incremental 'git
    fetch' commands to download a bundle list from a configured URI, and
    only download bundles that are new based on the creation token values.
    [9]
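
As a rough illustration of step (4), the selection of "new" bundles could be
sketched as below. This is only a sketch under stated assumptions, not the
planned implementation: the `Bundle` type, `select_new_bundles`, and
`next_token` are hypothetical names, and returning bundles oldest-first is a
choice made for this example rather than something this cover letter
specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    # Hypothetical in-memory form of one bundle.<id>.* entry.
    bundle_id: str
    uri: str
    creation_token: int


def select_new_bundles(bundles, last_seen_token):
    """Return bundles whose creationToken is strictly greater than the
    token remembered from the previous fetch, ordered oldest first."""
    fresh = [b for b in bundles if b.creation_token > last_seen_token]
    return sorted(fresh, key=lambda b: b.creation_token)


def next_token(bundles, last_seen_token):
    """Return the token to remember after this fetch: the largest
    creationToken seen so far."""
    return max([last_seen_token] + [b.creation_token for b in bundles])
```

A client that remembered the token 1643842562 from an earlier fetch would skip
the bundle carrying that token and download only bundles with a larger one.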

I have prepared some of this work as pull requests on my personal fork so
curious readers can look ahead to where we are going:

[6] https://github.com/derrickstolee/git/pull/18
[7] https://github.com/derrickstolee/git/pull/20
[8] https://github.com/derrickstolee/git/pull/21
[9] https://github.com/derrickstolee/git/pull/22

As mentioned in the design document, this is not all that is possible. For
instance, Ævar's suggestion to download only the bundle headers can be used
as a second heuristic (and as an augmentation of the creationToken
heuristic, previously called the timestamp heuristic).


Updates in v3
=============

 * This version only includes the design document. Thanks to all the
   reviewers for their significant attention, which improved the doc a lot.
 * The second patch has an addition to the design document that details a
   potential way to organize bundles from the provider's perspective.
 * Based on some off-list feedback, I was going to switch git fetch
   --bundle-uri into git bundle fetch, but that has a major conflict with
   [10] which was just submitted.
 * I will move the git bundle fetch implementation into [6] which also has
   the git clone --bundle-uri implementation.

[10]
https://lore.kernel.org/git/20220725123857.2773963-1-szeder.dev@gmail.com/


Updates in v2
=============

 * The design document has been updated based on Junio's feedback.
 * The "bundle.list." keys are now just "bundle.".
 * The "timestamp" heuristic is now "creationToken".
 * More clarity on how Git parses data from the bundle URI.
 * Dropped some unnecessary bundle list keys (*.list, *.requires).

Thanks, -Stolee

Derrick Stolee (2):
  docs: document bundle URI standard
  bundle-uri: add example bundle organization

 Documentation/Makefile                 |   1 +
 Documentation/technical/bundle-uri.txt | 573 +++++++++++++++++++++++++
 2 files changed, 574 insertions(+)
 create mode 100644 Documentation/technical/bundle-uri.txt


base-commit: e72d93e88cb20b06e88e6e7d81bd1dc4effe453f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1248%2Fderrickstolee%2Fbundle-redo%2Ffetch-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1248/derrickstolee/bundle-redo/fetch-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1248

Range-diff vs v2:

 1:  d444042dc4d ! 1:  e0f003e1b5f docs: document bundle URI standard
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     + ## Documentation/Makefile ##
     +@@ Documentation/Makefile: TECH_DOCS += SubmittingPatches
     + TECH_DOCS += ToolsForGit
     + TECH_DOCS += technical/bitmap-format
     + TECH_DOCS += technical/bundle-format
     ++TECH_DOCS += technical/bundle-uri
     + TECH_DOCS += technical/cruft-packs
     + TECH_DOCS += technical/hash-function-transition
     + TECH_DOCS += technical/http-protocol
     +
       ## Documentation/technical/bundle-uri.txt (new) ##
      @@
      +Bundle URIs
      +===========
      +
     ++Git bundles are files that store a pack-file along with some extra metadata,
     ++including a set of refs and a (possibly empty) set of necessary commits. See
     ++linkgit:git-bundle[1] and link:bundle-format.txt[the bundle format] for more
     ++information.
     ++
      +Bundle URIs are locations where Git can download one or more bundles in
      +order to bootstrap the object database in advance of fetching the remaining
      +objects from a remote.
     @@ Documentation/technical/bundle-uri.txt (new)
      +	If this string-valued key exists, then the bundle list is designed to
      +	work well with incremental `git fetch` commands. The heuristic signals
      +	that there are additional keys available for each bundle that help
     -+	determine which subset of bundles the client should download.
     ++	determine which subset of bundles the client should download. The only
     ++  heuristic currently planned is `creationToken`.
      +
      +The remaining keys include an `<id>` segment which is a server-designated
     -+name for each available bundle.
     ++name for each available bundle. The `<id>` must contain only alphanumeric
     ++and `-` characters.
      +
      +bundle.<id>.uri::
      +	(Required) This string value is the URI for downloading bundle `<id>`.
     @@ Documentation/technical/bundle-uri.txt (new)
      +
      +Here is an example bundle list using the Git config format:
      +
     -+```
     -+[bundle]
     -+	version = 1
     -+	mode = all
     -+	heuristic = creationToken
     ++	[bundle]
     ++		version = 1
     ++		mode = all
     ++		heuristic = creationToken
      +
     -+[bundle "2022-02-09-1644442601-daily"]
     -+	uri = https://bundles.example.com/git/git/2022-02-09-1644442601-daily.bundle
     -+	timestamp = 1644442601
     ++	[bundle "2022-02-09-1644442601-daily"]
     ++		uri = https://bundles.example.com/git/git/2022-02-09-1644442601-daily.bundle
     ++		creationToken = 1644442601
      +
     -+[bundle "2022-02-02-1643842562"]
     -+	uri = https://bundles.example.com/git/git/2022-02-02-1643842562.bundle
     -+	timestamp = 1643842562
     ++	[bundle "2022-02-02-1643842562"]
     ++		uri = https://bundles.example.com/git/git/2022-02-02-1643842562.bundle
     ++		creationToken = 1643842562
      +
     -+[bundle "2022-02-09-1644442631-daily-blobless"]
     -+	uri = 2022-02-09-1644442631-daily-blobless.bundle
     -+	timestamp = 1644442631
     -+	filter = blob:none
     ++	[bundle "2022-02-09-1644442631-daily-blobless"]
     ++		uri = 2022-02-09-1644442631-daily-blobless.bundle
     ++		creationToken = 1644442631
     ++		filter = blob:none
      +
     -+[bundle "2022-02-02-1643842568-blobless"]
     -+	uri = /git/git/2022-02-02-1643842568-blobless.bundle
     -+	timestamp = 1643842568
     -+	filter = blob:none
     -+```
     ++	[bundle "2022-02-02-1643842568-blobless"]
     ++		uri = /git/git/2022-02-02-1643842568-blobless.bundle
     ++		creationToken = 1643842568
     ++		filter = blob:none
      +
      +This example uses `bundle.mode=all` as well as the
      +`bundle.<id>.creationToken` heuristic. It also uses the `bundle.<id>.filter`
     @@ Documentation/technical/bundle-uri.txt (new)
      +* The client fails to connect with a server at the given URI or a connection
      +  is lost without any chance to recover.
      +
     -+* The client receives a response other than `200 OK` (such as `404 Not Found`,
     -+  `401 Not Authorized`, or `500 Internal Server Error`). The client should
     -+  use the `credential.helper` to attempt authentication after the first
     -+  `401 Not Authorized` response, but a second such response is a failure.
     ++* The client receives a 400-level response (such as `404 Not Found` or
     ++  `401 Not Authorized`). The client should use the credential helper to
     ++  find and provide a credential for the URI, but match the semantics of
     ++  Git's other HTTP protocols in terms of handling specific 400-level
     ++  errors.
      +
     -+* The client receives data that is not parsable as a bundle or bundle list.
     ++* The server reports any other failure response.
      +
     -+* The bundle list describes a directed cycle in the
     -+  `bundle.<id>.requires` links.
     ++* The client receives data that is not parsable as a bundle or bundle list.
      +
      +* A bundle includes a filter that does not match expectations.
      +
      +* The client cannot unbundle the bundles because the prerequisite commit OIDs
     -+  are not in the object database and there are no more
     -+  `bundle.<id>.requires` links to follow.
     ++  are not in the object database and there are no more bundles to download.
      +
      +There are also situations that could be seen as wasteful, but are not
      +error conditions:
     @@ Documentation/technical/bundle-uri.txt (new)
      +  the client is using hourly prefetches with background maintenance, but
      +  the server is computing bundles weekly. For this reason, the client
      +  should not use bundle URIs for fetch unless the server has explicitly
     -+  recommended it through the `bundle.flags = forFetch` value.
     ++  recommended it through a `bundle.heuristic` value.
      +
      +Implementation Plan
      +-------------------
     @@ Documentation/technical/bundle-uri.txt (new)
      +   that the config format parsing feeds a list of key-value pairs into the
      +   bundle list logic.
      +
     -+3. Create the `bundle-uri` protocol v2 verb so Git servers can advertise
     ++3. Create the `bundle-uri` protocol v2 command so Git servers can advertise
      +   bundle URIs using the key-value pairs. Plug into the existing key-value
      +   input to the bundle list logic. Allow `git clone` to discover these
      +   bundle URIs and bootstrap the client repository from the bundle data.
 2:  0a2cf60437f < -:  ----------- remote-curl: add 'get' capability
 3:  abec47564fd < -:  ----------- bundle-uri: create basic file-copy logic
 4:  f6255ec5188 < -:  ----------- fetch: add --bundle-uri option
 5:  bfbd11b48bf < -:  ----------- bundle-uri: add support for http(s):// and file://
 6:  a217e9a0640 < -:  ----------- fetch: add 'refs/bundle/' to log.excludeDecoration
 -:  ----------- > 2:  a933471c3af bundle-uri: add example bundle organization

-- 
gitgitgadget

Thread overview: 64+ messages
2022-06-06 19:55 [PATCH 0/6] bundle URIs: design doc and initial git fetch --bundle-uri implementation Derrick Stolee via GitGitGadget
2022-06-06 19:55 ` [PATCH 1/6] docs: document bundle URI standard Derrick Stolee via GitGitGadget
2022-06-06 22:18   ` Junio C Hamano
2022-06-08 19:20     ` Derrick Stolee
2022-06-08 19:27       ` Junio C Hamano
2022-06-08 20:44         ` Junio C Hamano
2022-06-08 20:39       ` Junio C Hamano
2022-06-08 20:52         ` Derrick Stolee
2022-06-07  0:33   ` Junio C Hamano
2022-06-08 19:46     ` Derrick Stolee
2022-06-08 21:01       ` Junio C Hamano
2022-06-09 16:00         ` Derrick Stolee
2022-06-09 17:56           ` Junio C Hamano
2022-06-09 18:27             ` Ævar Arnfjörð Bjarmason
2022-06-09 19:39             ` Derrick Stolee
2022-06-09 20:13               ` Junio C Hamano
2022-06-21 19:34       ` Derrick Stolee
2022-06-21 20:16         ` Junio C Hamano
2022-06-21 21:10           ` Derrick Stolee
2022-06-21 21:33             ` Junio C Hamano
2022-06-06 19:55 ` [PATCH 2/6] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
2022-07-21 22:59   ` Junio C Hamano
2022-06-06 19:55 ` [PATCH 3/6] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
2022-06-06 19:55 ` [PATCH 4/6] fetch: add --bundle-uri option Derrick Stolee via GitGitGadget
2022-06-06 19:55 ` [PATCH 5/6] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
2022-06-06 19:55 ` [PATCH 6/6] fetch: add 'refs/bundle/' to log.excludeDecoration Derrick Stolee via GitGitGadget
2022-06-29 20:40 ` [PATCH v2 0/6] bundle URIs: design doc and initial git fetch --bundle-uri implementation Derrick Stolee via GitGitGadget
2022-06-29 20:40   ` [PATCH v2 1/6] docs: document bundle URI standard Derrick Stolee via GitGitGadget
2022-07-18  9:20     ` SZEDER Gábor
2022-07-21 12:09     ` Matthew John Cheetham
2022-07-22 13:52       ` Derrick Stolee
2022-07-22 16:03       ` Derrick Stolee
2022-07-21 21:39     ` Josh Steadmon
2022-07-22 13:15       ` Derrick Stolee
2022-07-22 15:01       ` Derrick Stolee
2022-06-29 20:40   ` [PATCH v2 2/6] remote-curl: add 'get' capability Derrick Stolee via GitGitGadget
2022-07-21 21:41     ` Josh Steadmon
2022-06-29 20:40   ` [PATCH v2 3/6] bundle-uri: create basic file-copy logic Derrick Stolee via GitGitGadget
2022-07-21 21:45     ` Josh Steadmon
2022-07-22 13:18       ` Derrick Stolee
2022-06-29 20:40   ` [PATCH v2 4/6] fetch: add --bundle-uri option Derrick Stolee via GitGitGadget
2022-06-29 20:40   ` [PATCH v2 5/6] bundle-uri: add support for http(s):// and file:// Derrick Stolee via GitGitGadget
2022-06-29 20:40   ` [PATCH v2 6/6] fetch: add 'refs/bundle/' to log.excludeDecoration Derrick Stolee via GitGitGadget
2022-07-21 21:47     ` Josh Steadmon
2022-07-22 13:20       ` Derrick Stolee
2022-07-21 21:48   ` [PATCH v2 0/6] bundle URIs: design doc and initial git fetch --bundle-uri implementation Josh Steadmon
2022-07-21 21:56     ` Junio C Hamano
2022-07-25 13:53   ` Derrick Stolee via GitGitGadget [this message]
2022-07-25 13:53     ` [PATCH v3 1/2] docs: document bundle URI standard Derrick Stolee via GitGitGadget
2022-07-28  1:23       ` tenglong.tl
2022-08-01 13:42         ` Derrick Stolee
2022-07-25 13:53     ` [PATCH v3 2/2] bundle-uri: add example bundle organization Derrick Stolee via GitGitGadget
2022-08-04 16:09       ` Matthew John Cheetham
2022-08-04 17:39         ` Derrick Stolee
2022-08-04 20:29           ` Ævar Arnfjörð Bjarmason
2022-08-05 18:29             ` Derrick Stolee
2022-07-25 20:05     ` [PATCH v3 0/2] bundle URIs: design doc and initial git fetch --bundle-uri implementation Josh Steadmon
2022-08-09 13:12     ` [PATCH v4 0/2] bundle URIs: design doc Derrick Stolee via GitGitGadget
2022-08-09 13:12       ` [PATCH v4 1/2] docs: document bundle URI standard Derrick Stolee via GitGitGadget
2022-10-04 19:48         ` Philip Oakley
2022-08-09 13:12       ` [PATCH v4 2/2] bundle-uri: add example bundle organization Derrick Stolee via GitGitGadget
2022-08-09 13:49       ` [PATCH v4 0/2] bundle URIs: design doc Phillip Wood
2022-08-09 15:50         ` Derrick Stolee
2022-08-11 15:42           ` Phillip Wood
