blob 0e6d7c8dda3c56e7ed14313fc7e2bc45e2551186 5659 bytes (raw)
name: Documentation/technical/packfile-uri.txt 	 # note: path name is non-authoritative(*)

Packfile URIs
=============

This feature allows servers to serve part of their packfile response as URIs.
This allows server designs that improve scalability in bandwidth and CPU usage
(for example, by serving some data through a CDN), and (in the future) provides
some measure of resumability to clients.

This feature is available only in protocol version 2.

Protocol
--------

The server advertises the `packfile-uris` capability.

If the client then communicates which protocols (HTTPS, etc.) it supports with
a `packfile-uris` argument, the server MAY send a `packfile-uris` section
directly before the `packfile` section (right after `wanted-refs` if it is
sent) containing URIs of any of the given protocols. The URIs point to
packfiles that use only features that the client has declared that it supports
(e.g. ofs-delta and thin-pack). See protocol-v2.txt for the documentation of
this section.
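
The protocol filtering described above can be sketched as follows. This is
only an illustration, not the upload-pack implementation: the function names,
the example hosts, and the assumption that the client's `packfile-uris`
argument carries a comma-separated list of protocols are all hypothetical
here; protocol-v2.txt is the authoritative description.

```python
from urllib.parse import urlsplit


def parse_client_protocols(argument_value):
    """Split the client's protocol list (assumed to be comma-separated)."""
    return {p.strip().lower() for p in argument_value.split(",") if p.strip()}


def select_uris(candidates, client_protocols):
    """Keep only the (pack_hash, uri) pairs whose URI scheme the client accepts."""
    return [
        (pack_hash, uri)
        for pack_hash, uri in candidates
        if urlsplit(uri).scheme.lower() in client_protocols
    ]


# Hypothetical packfiles the server could offload; hashes and hosts are made up.
candidates = [
    ("a" * 40, "https://cdn.example.com/pack-a.pack"),
    ("b" * 40, "ftp://mirror.example.com/pack-b.pack"),
]

# Only the HTTPS URI survives, because the client did not list "ftp".
offered = select_uris(candidates, parse_client_protocols("https,http"))
```

If the client lists no protocols, the filter yields nothing and the server
would simply send everything in the `packfile` section.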

Clients should then download and index all the given URIs (in addition to
downloading and indexing the packfile given in the `packfile` section of the
response) before performing the connectivity check.

Server design
-------------

The server can be trivially made compatible with the proposed protocol by
having it advertise `packfile-uris`, tolerating the client sending
`packfile-uris`, and never sending any `packfile-uris` section. But we should
include some sort of non-trivial implementation in the Minimum Viable Product,
at least so that we can test the client.

This is the implementation: a feature, marked experimental, that allows
the server to be configured by one or more entries with the format:

    uploadpack.excludeobject=<object-hash> <level> <pack-hash> <uri>

`<object-hash>` is the key of the entry; the object it names can be a
blob, tree, commit, or tag. The rest of the value has three parts:
`<level>` sets the scope of the exclusion (see below), `<pack-hash>`
identifies the packfile that contains the `<object-hash>` object, and
`<uri>` is the URI from which the client downloads that packfile. For
example, when a blob is configured with `uploadpack.excludeobject`, the
blob is excluded whenever the pack to be sent is assembled.

Sometimes it is desirable to exclude not only a single object such as a
blob, but also the objects related to it, such as all the objects a tree
contains or the ancestors that a commit can reach. The `<level>` part
sets the scope of the exclusion. Three levels are supported:

- Level 0: Excluding the object itself, without any objects that
  have a relationship with it.

- Level 1: Excluding the object itself, and the objects it contains.

- Level 2: Excluding the object itself, the objects it contains, and the
  ancestors it can reach.
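
As a rough illustration of the entry format, the value could be split into its
four fields as below. This is a sketch in Python rather than Git's own config
parsing; the `ExcludeEntry` type, the `parse_exclude_object` helper, and the
example hashes and URI are hypothetical, and the sketch assumes the fields are
separated by whitespace and that the URI itself contains none.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExcludeEntry:
    object_hash: str  # object to exclude (blob, tree, commit, or tag)
    level: int        # 0, 1, or 2: scope of the exclusion
    pack_hash: str    # packfile that contains the object
    uri: str          # where the client downloads that packfile


def parse_exclude_object(value):
    """Parse '<object-hash> <level> <pack-hash> <uri>' into an ExcludeEntry."""
    parts = value.split()
    if len(parts) != 4:
        raise ValueError("expected '<object-hash> <level> <pack-hash> <uri>'")
    object_hash, level_text, pack_hash, uri = parts
    if level_text not in ("0", "1", "2"):
        raise ValueError(f"unsupported level: {level_text!r}")
    return ExcludeEntry(object_hash, int(level_text), pack_hash, uri)


entry = parse_exclude_object(
    "1234abcd 1 5678ef90 https://cdn.example.com/pack-5678ef90.pack"
)
```

Rejecting unknown levels up front keeps a mistyped entry from silently
widening or narrowing the set of excluded objects.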

If `<level>` is configured as 0, only the object itself will be
excluded, no matter what the object type is. This is a common scenario
for large blobs, but it does not make much sense for other object types
(e.g. downloading a single commit without the trees and blobs in it).

If `<level>` is configured as 1, not only the object itself but also all
the objects it contains will be excluded. This applies when the goal is
to exclude a specific non-blob object that contains some large objects.

- If `<object-hash>` is a blob, the result is the same as level 0, because
  a blob contains nothing but itself.

- If `<object-hash>` is a tree, the tree itself and all blobs and trees
  in it will be excluded.

- If `<object-hash>` is a commit, the commit itself, the referenced
  root-tree, and all blobs and trees in the root-tree will be excluded.

- If `<object-hash>` is a tag, the tag itself, the dereferenced commit,
  and all trees and blobs contained in its root-tree will be excluded.

If `<level>` is configured as 2, not only the objects in the scope of
level 1 but also the reachable ancestors will be excluded, if
`<object-hash>` is a commit or tag.
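
The three levels can be illustrated on a toy object graph. The sketch below is
not Git code: the `OBJECTS` table and both helper functions are hypothetical,
and it assumes that at level 2 each reachable ancestor commit is excluded
together with the trees and blobs it contains, which the description above
does not spell out.

```python
# Toy object store (hypothetical): a tag pointing at commit c2, whose parent is c1.
OBJECTS = {
    "tag1": {"type": "tag", "target": "c2"},
    "c2": {"type": "commit", "tree": "t2", "parents": ["c1"]},
    "c1": {"type": "commit", "tree": "t1", "parents": []},
    "t2": {"type": "tree", "entries": ["b1", "b2"]},
    "t1": {"type": "tree", "entries": ["b1"]},
    "b1": {"type": "blob"},
    "b2": {"type": "blob"},
}


def contained(obj_hash, objects):
    """Return the object itself plus everything it contains (level 1 scope)."""
    obj = objects[obj_hash]
    result = {obj_hash}
    if obj["type"] == "tag":
        result |= contained(obj["target"], objects)
    elif obj["type"] == "commit":
        result |= contained(obj["tree"], objects)
    elif obj["type"] == "tree":
        for entry in obj["entries"]:
            result |= contained(entry, objects)
    return result


def excluded(obj_hash, level, objects):
    """Return the set of objects excluded for the given object and level."""
    if level == 0:
        return {obj_hash}
    result = contained(obj_hash, objects)
    if level == 2:
        # Peel tags until a non-tag object is reached.
        current = obj_hash
        while objects[current]["type"] == "tag":
            current = objects[current]["target"]
        if objects[current]["type"] == "commit":
            # Assumption: each ancestor is excluded along with its contents.
            stack = list(objects[current]["parents"])
            seen = set()
            while stack:
                commit = stack.pop()
                if commit in seen:
                    continue
                seen.add(commit)
                result |= contained(commit, objects)
                stack.extend(objects[commit]["parents"])
    return result
```

In this graph, `excluded("c2", 1, OBJECTS)` covers `c2`, `t2`, `b1`, and `b2`,
while level 2 additionally covers `c1` and `t1`. For a tree or blob, level 2
gives the same result as level 1 because there are no ancestors to walk.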

Configuration compatibility
---------------------------

The old configuration of packfile-uri:

    uploadpack.blobPackfileUri=<object-hash> <pack-hash> <uri>

The old configuration remains compatible with the new one, but it only
supports the exclusion of blob objects.
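
One way to picture the relationship between the two formats is to rewrite an
old-style value in the new four-field form. The helper below is hypothetical
and rests on an assumption this document does not state outright: that an old
entry behaves like a new entry at level 0, which for a blob excludes exactly
the blob itself.

```python
def upgrade_blob_packfile_uri(old_value):
    """Rewrite '<object-hash> <pack-hash> <uri>' as a level-0 excludeobject value."""
    parts = old_value.split()
    if len(parts) != 3:
        raise ValueError("expected '<object-hash> <pack-hash> <uri>'")
    object_hash, pack_hash, uri = parts
    # Assumption: level 0 reproduces the old blob-only behaviour.
    return f"{object_hash} 0 {pack_hash} {uri}"


new_value = upgrade_blob_packfile_uri(
    "1234abcd 5678ef90 https://cdn.example.com/pack-5678ef90.pack"
)
```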

Client design
-------------

The client has a config variable `fetch.uriprotocols` that determines which
protocols the end user is willing to use. By default, this is empty.
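
A minimal sketch of how a client might turn this setting into the
`packfile-uris` argument is shown below. The helper name is hypothetical, and
the sketch assumes that the config value is a comma-separated list and that an
empty value means the argument is not sent at all; neither detail is specified
in this document.

```python
def packfile_uris_argument(uriprotocols):
    """Build the fetch argument from a fetch.uriprotocols value, or None if empty."""
    protocols = [p.strip() for p in uriprotocols.split(",") if p.strip()]
    if not protocols:
        # Assumption: with nothing configured, the client omits the argument,
        # so the server has no reason to send a packfile-uris section.
        return None
    return "packfile-uris " + ",".join(protocols)


default_argument = packfile_uris_argument("")
https_argument = packfile_uris_argument("https, http")
```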

When the client downloads the given URIs, it should store them with "keep"
files, just like it does with the packfile in the `packfile` section. These
additional "keep" files can only be removed after the refs have been updated -
just like the "keep" file for the packfile in the `packfile` section.

The division of work (initial fetch + additional URIs) introduces convenient
points for resumption of an interrupted clone - such resumption can be done
after the Minimum Viable Product (see "Future work").

Future work
-----------

The protocol design allows some evolution of the server and client without any
need for protocol changes, so only a small-scoped design is included here to
form the MVP. For example, the following can be done:

 * On the client, resumption of clone. If a clone is interrupted, information
   could be recorded in the repository's config and a "clone-resume" command
   can resume the clone in progress. (Resumption of subsequent fetches is more
   difficult because that must deal with the user wanting to use the repository
   even after the fetch was interrupted.)

There are some possible features that will require a change in protocol:

 * Additional HTTP headers (e.g. authentication)
 * Byte range support
 * Different file formats referenced by URIs (e.g. raw object)

(*) Git path names are given by the tree(s) the blob belongs to.
    Blobs themselves have no identifier aside from the hash of their contents.
