git@vger.kernel.org mailing list mirror (one of many)
* Git clonebundles
@ 2017-01-31  7:00 Stefan Saasen
  2017-02-04 17:39 ` Shawn Pearce
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Saasen @ 2017-01-31  7:00 UTC (permalink / raw)
  To: Git Mailing List

Hi all,

Bitbucket recently added support for Mercurial’s clonebundle extension
(http://gregoryszorc.com/blog/2015/10/22/cloning-improvements-in-mercurial-3.6/).
Mercurial’s clone bundles allow the Mercurial client to seed a repository using
a bundle file instead of dynamically generating a bundle for the client.


Mercurial clonebundles?
~~~~~~~~~~~~~~~~~~~~~~~

With Mercurial clonebundles the high level clone sequence looks like this:

1. The command "hg clone URL" attempts to clone the repository at URL.
2. If a bundle file exists for the repository, the existence of the file
`clonebundles.manifest` causes the server to advertise the `clonebundle`
capability (capabilities lookup is the first command the client issues).
3. In the above case the client then executes the command "clonebundles".
4. The manifest file will be returned.
5. The client then selects a bundle file to download from the list of URLs
advertised in the manifest file, to seed the repository.
6. To update the repository the last step involves fetching the latest changes.
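
The sequence above can be sketched from the client's side roughly as follows.
This is a minimal Python sketch under stated assumptions, not Mercurial's
implementation: the manifest is assumed to be a newline-delimited list of
`URL key=value ...` entries, the capability name is taken from the text above,
and the function names, the sample URLs and the fall-back behaviour are
illustrative only.

```python
CAPABILITY = "clonebundle"  # capability name as given in the text above


def parse_manifest(text):
    """Parse a manifest with one 'URL key=value ...' entry per line."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        entries.append({"url": fields[0], "attrs": attrs})
    return entries


def plan_clone(capabilities, manifest_text, supported_specs):
    """Return the ordered steps a client would take (steps 2-6 above)."""
    if CAPABILITY not in capabilities:
        return [("full-clone", None)]  # server advertises no bundle
    for entry in parse_manifest(manifest_text):
        spec = entry["attrs"].get("BUNDLESPEC")
        if spec is None or spec in supported_specs:
            # Seed from the first usable bundle, then fetch what is newer.
            return [("download-bundle", entry["url"]), ("pull-latest", None)]
    return [("full-clone", None)]  # no entry this client can use


# Hypothetical manifest contents, for illustration only.
MANIFEST = """\
https://cdn.example.com/repo.zstd.hg BUNDLESPEC=zstd-v2
https://cdn.example.com/repo.gzip.hg BUNDLESPEC=gzip-v2
"""

print(plan_clone({CAPABILITY}, MANIFEST, {"gzip-v2"}))
```

The sketch takes the first entry the client can handle, which leaves the
ordering decision with whoever writes the manifest.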


Why is this useful?
~~~~~~~~~~~~~~~~~~~

The fact that clone bundles can be distributed as static files enables us to
use static file servers for bundle distribution. Users have also reported
latency improvements for clone operations of popular Mercurial repositories.
Additionally this significantly reduces the resource usage of clone operations,
as clone operations are reduced to simpler fetches to resolve the delta between
the current repository and the downloaded bundle state.


clonebundles for git?
~~~~~~~~~~~~~~~~~~~~~

We recently looked into how this concept could be translated to git. This is
not a new idea and has been discussed before (more on that later) but our
success with the Mercurial clonebundle rollout prompted us to revisit this
topic.

We believe that bringing a similar concept to git could have the following
benefits:

* Improved clone times for users that clone large git repositories, especially
  if bundle file distribution leverages global CDNs.
* Improved scalability of git for managing large popular repositories, by
  offloading a significant portion of the clone resource usage to CDNs or
  static file hosts.


Our current proof-of-concept, built to explore this space, closely follows
the approach from Mercurial outlined above.

* An `/info/bundle` path returns a bundle manifest (over HTTP)
* The bundle manifest contains a simple list of URLs with some additional meta
  data that allows the client to select a suitable bundle download URL
* The bundle download URL points to a bundle file generated using `git bundle
  create` including all the relevant refs as a self contained repository seed.
* The client probes the target URL with a `GET` request to $URL/info/bundle and
  downloads the bundle file if present.
* The repository will be created based on the downloaded bundle (downloading a
  static file allows resumable downloads or parallel downloads of chunks if the
  file/web server supports range requests).
* A `git fetch` and the appropriate checkout then updates the "cloned"
  repository to match the latest upstream state.
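
The flow in the list above could be sketched as below. This is an illustrative
Python sketch, not the `git-clone2` code: the function names and paths are
assumptions, and the checkout step is left out because it depends on the
default branch. The git invocations themselves (`git bundle create`, cloning
from a bundle file, `git remote set-url`, `git fetch`) are standard commands.

```python
import subprocess


def manifest_url(repo_url):
    """Location the client probes for the bundle manifest."""
    return repo_url.rstrip("/") + "/info/bundle"


def server_bundle_command(bundle_path):
    """Command a host could run to produce a self-contained seed bundle."""
    return ["git", "bundle", "create", bundle_path, "--all"]


def client_clone_commands(repo_url, bundle_file, target_dir):
    """Commands that seed target_dir from a downloaded bundle, then catch up."""
    return [
        # Seed the new repository from the local bundle file.
        ["git", "clone", bundle_file, target_dir],
        # Point 'origin' at the real upstream instead of the bundle file.
        ["git", "-C", target_dir, "remote", "set-url", "origin", repo_url],
        # Fetch whatever was pushed after the bundle was generated.
        ["git", "-C", target_dir, "fetch", "origin"],
    ]


def run_all(commands):
    """Execute the commands in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


for command in client_clone_commands(
        "https://example.com/repo.git", "repo.bundle", "repo"):
    print(" ".join(command))
```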

The proof-of-concept was built as an external binary `git-clone2` that
mimics the behaviour of the `git clone` command, so unfortunately I
can't provide any patches to git to demonstrate the behaviour.


Ultimately our proof-of-concept is built around a few core ideas:

* Re-use the existing bundle format as a single-file, self-contained
  repository representation.
* Introduce a bundle manifest (accessible at `$URL/info/bundle`) that allows
  the client to resolve a suitable bundle download URL.
* Teach the `git clone` command to accept and prefer seeding a repository using
  a static bundle file that is advertised in a bundle manifest.
* Re-use as much as possible of the existing commands and in particular the
  `git bundle` machinery to seed the repository and to create the static bundle
  file.
* We accept additional storage requirements for the bundle files in addition to
  the actual repository content in pack-files or loose objects. Hosting
  providers or system administrators are free to decide how many bundles to
  advertise and how frequently the bundles are updated.
* It targets the "seed from a bundle file" use case, with resumable clones just
  being a potential side-effect.
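
To illustrate why resumable downloads fall out of serving a static file, here
is a minimal Python sketch of resuming a partial bundle download with an HTTP
`Range` request. The function names and the ".part" file convention are
assumptions for illustration; the server must support range requests for the
resume path to be taken.

```python
import os
import urllib.request


def resume_request(url, partial_path):
    """Build a GET request that continues a partially downloaded bundle."""
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if offset:
        # Ask the server for the remaining bytes only.
        request.add_header("Range", f"bytes={offset}-")
    return request, offset


def download(url, partial_path):
    """Append the missing bytes to partial_path (performs network I/O)."""
    request, offset = resume_request(url, partial_path)
    with urllib.request.urlopen(request) as response:
        # 206 means the range was honoured; 200 means the full file was sent.
        mode = "ab" if offset and response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while chunk := response.read(1 << 16):
                out.write(chunk)
```

A server that ignores the `Range` header answers with the whole file, in which
case the sketch simply starts over instead of appending.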


Some of the problems that need to be solved with an approach like this are:

* Bundle advertisement/bundle negotiation: We considered advertising a new
  capability "clonebundle" as part of the rev advertisement capabilities list.
  This would allow clients that support clonebundles to abort the clone attempt
  and resolve a suitable bundle URL from a bundle manifest at `$URL/info/bundle`
  instead. For HTTP this would amount to an early termination when retrieving
  the ref-advertisement.
  Note: We didn't pursue this for our proof-of-concept so we didn't explore
  whether this is feasible.
* Uniform approach for the supported transports: Our proof-of-concept only
  supports HTTP as a transport. Ideally the clonebundle capability could be
  supported by all available transports (of which at least ssh would be highly
  desirable).
* Bundle manifest and bundle download: It is unclear whose responsibility it is
  to generate the bundle manifest with the bundle download URLs. Most likely the
  bundle files will be served using a webserver or CDN, so download URL
  generation should not be a core git responsibility. For hosting purposes we
  envision that the bundle manifest might contain dynamic download URLs with
  personalised access tokens with expiry.
* Bundle generation: Similar to the above it is unclear how bundle generation
  is handled. For hosting purposes, the operator would likely want to influence
  when and how bundles are generated.
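
As one possible reading of "dynamic download URLs with personalised access
tokens with expiry", the sketch below signs a bundle URL with an HMAC over the
URL, the user and an expiry timestamp. This is an assumption-laden
illustration in Python, not part of our proof-of-concept: the query parameter
names, the secret and the URL are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit


def _sign(secret, base_url, user, expires):
    message = f"{base_url}|{user}|{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def signed_bundle_url(base_url, user, secret, ttl_seconds, now=None):
    """Return base_url with a per-user token that expires after ttl_seconds."""
    issued = int(time.time() if now is None else now)
    expires = issued + ttl_seconds
    query = urlencode({"user": user, "expires": expires,
                       "token": _sign(secret, base_url, user, expires)})
    return f"{base_url}?{query}"


def is_valid(url, secret, now=None):
    """Check the token and expiry of a URL produced by signed_bundle_url."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    try:
        user = params["user"][0]
        expires = int(params["expires"][0])
        token = params["token"][0]
    except (KeyError, ValueError):
        return False
    current = int(time.time() if now is None else now)
    expected = _sign(secret, base_url, user, expires)
    # Constant-time comparison of the expected and presented tokens.
    same = hmac.compare_digest(expected.encode(), token.encode())
    return current <= expires and same


SECRET = b"example-secret"  # hypothetical server-side key
url = signed_bundle_url("https://cdn.example.com/repo.bundle", "alice",
                        SECRET, 300, now=1000)
print(is_valid(url, SECRET, now=1200))  # checked inside the 300 s window
print(is_valid(url, SECRET, now=2000))  # checked after the expiry time
```

With a scheme like this the manifest can be generated per request while the
bundle itself stays a static file behind the CDN.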



Prior art
~~~~~~~~~

Our proof-of-concept is built on top of ideas that have been
circulating for a while. We are aware of a number of proposed changes
in this space:


* Jeff King's work on network bundles:
https://github.com/peff/git/commit/17e2409df37edd0c49ef7d35f47a7695f9608900
* Nguyễn Thái Ngọc Duy's work on "[PATCH 0/8] Resumable clone
revisited, proof of concept":
https://www.spinics.net/lists/git/msg267260.html
* Resumable clone work by Kevin Wern:
https://public-inbox.org/git/1473984742-12516-1-git-send-email-kevin.m.wern@gmail.com/


Whilst the above mentioned proposals/proposed changes are in a similar
space, I would be interested to understand whether there is any
consensus on the general idea of supporting static bundle files as a
mechanism to seed a repository?
I would also appreciate any pointers to other discussions in this area.


Best regards,
Stefan Saasen & Erik van Zijst; Atlassian Bitbucket


* Re: Git clonebundles
  2017-01-31  7:00 Git clonebundles Stefan Saasen
@ 2017-02-04 17:39 ` Shawn Pearce
  2017-02-05 16:37   ` Christian Couder
  0 siblings, 1 reply; 8+ messages in thread
From: Shawn Pearce @ 2017-02-04 17:39 UTC (permalink / raw)
  To: Stefan Saasen; +Cc: Git Mailing List

On Mon, Jan 30, 2017 at 11:00 PM, Stefan Saasen <ssaasen@atlassian.com> wrote:
>
> Bitbucket recently added support for Mercurial’s clonebundle extension
> (http://gregoryszorc.com/blog/2015/10/22/cloning-improvements-in-mercurial-3.6/).
> Mercurial’s clone bundles allow the Mercurial client to seed a repository using
> a bundle file instead of dynamically generating a bundle for the client.
...
> Prior art
> ~~~~~~~~~
>
> Our proof-of-concept is built on top of ideas that have been
> circulating for a while. We are aware of a number of proposed changes
> in this space:
>
>
> * Jeff King's work on network bundles:
> https://github.com/peff/git/commit/17e2409df37edd0c49ef7d35f47a7695f9608900
> * Nguyễn Thái Ngọc Duy's work on "[PATCH 0/8] Resumable clone
> revisited, proof of concept":
> https://www.spinics.net/lists/git/msg267260.html
> * Resumable clone work by Kevin Wern:
> https://public-inbox.org/git/1473984742-12516-1-git-send-email-kevin.m.wern@gmail.com/

I think you missed the most common deployment of prior art, which is
Android using the git-repo tool[1]. The git-repo tool has had
clone.bundle support since Sep 2011[2] and the Android Git servers
have been answering /clone.bundle requests[3] since just before that.
The bundle files are generated with `git bundle create` on a regular
schedule by cron.

[1] https://gerrit.googlesource.com/git-repo/+/04071c1c72437a930db017bd4c562ad06087986a/project.py#2091
[2] https://gerrit.googlesource.com/git-repo/+/f322b9abb4cadc67b991baf6ba1b9f2fbd5d7812
[3] https://android.googlesource.com/platform/frameworks/base/clone.bundle


> Whilst the above mentioned proposals/proposed changes are in a similar
> space, I would be interested to understand whether there is any
> consensus on the general idea of supporting static bundle files as a
> mechanism to seed a repository?

I don't think we have a consensus on how to advertise that a bundle file is
available, which is why there are so many instances of prior art. In
2011 I just threw together /clone.bundle on HTTP because it was easy
to make the Python wrapper ask for the file and handle 404 gracefully
as not found and fall back to `git clone`.
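
The probe-and-fall-back behaviour described above can be sketched like this.
It is a minimal Python illustration, not the git-repo code: the function names
and the injectable `opener` parameter are assumptions made so the logic can be
exercised without a network.

```python
import shutil
import urllib.error
import urllib.request


def download_clone_bundle(repo_url, dest_path, opener=urllib.request.urlopen):
    """Try to fetch $URL/clone.bundle; return False if the server has none."""
    url = repo_url.rstrip("/") + "/clone.bundle"
    try:
        with opener(url) as response, open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)
        return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # no bundle published for this repository
        raise  # other HTTP errors are real failures


def choose_strategy(repo_url, dest_path, opener=urllib.request.urlopen):
    """Seed from the bundle when available, otherwise do a plain clone."""
    if download_clone_bundle(repo_url, dest_path, opener):
        return "seed-from-bundle"
    return "plain-clone"
```

Treating only 404 as "not found" keeps other server errors visible instead of
silently falling back to a full clone.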


* Re: Git clonebundles
  2017-02-04 17:39 ` Shawn Pearce
@ 2017-02-05 16:37   ` Christian Couder
  2017-02-06 22:16     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Christian Couder @ 2017-02-05 16:37 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Stefan Saasen, Git Mailing List

On Sat, Feb 4, 2017 at 6:39 PM, Shawn Pearce <spearce@spearce.org> wrote:
> On Mon, Jan 30, 2017 at 11:00 PM, Stefan Saasen <ssaasen@atlassian.com> wrote:
>>
>> Bitbucket recently added support for Mercurial’s clonebundle extension
>> (http://gregoryszorc.com/blog/2015/10/22/cloning-improvements-in-mercurial-3.6/).
>> Mercurial’s clone bundles allow the Mercurial client to seed a repository using
>> a bundle file instead of dynamically generating a bundle for the client.
> ...
>> Prior art
>> ~~~~~~~~~
>>
>> Our proof-of-concept is built on top of ideas that have been
>> circulating for a while. We are aware of a number of proposed changes
>> in this space:
>>
>>
>> * Jeff King's work on network bundles:
>> https://github.com/peff/git/commit/17e2409df37edd0c49ef7d35f47a7695f9608900
>> * Nguyễn Thái Ngọc Duy's work on "[PATCH 0/8] Resumable clone
>> revisited, proof of concept":
>> https://www.spinics.net/lists/git/msg267260.html
>> * Resumable clone work by Kevin Wern:
>> https://public-inbox.org/git/1473984742-12516-1-git-send-email-kevin.m.wern@gmail.com/
>
> I think you missed the most common deployment of prior art, which is
> Android using the git-repo tool[1]. The git-repo tool has had
> clone.bundle support since Sep 2011[2] and the Android Git servers
> have been answering /clone.bundle requests[3] since just before that.
> The bundle files are generated with `git bundle create` on a regular
> schedule by cron.
>
> [1] https://gerrit.googlesource.com/git-repo/+/04071c1c72437a930db017bd4c562ad06087986a/project.py#2091
> [2] https://gerrit.googlesource.com/git-repo/+/f322b9abb4cadc67b991baf6ba1b9f2fbd5d7812
> [3] https://android.googlesource.com/platform/frameworks/base/clone.bundle

There is also Junio's work on Bundle v3 that was unfortunately
recently discarded.
Look for "jc/bundle" in:

http://public-inbox.org/git/xmqq4m0cry60.fsf@gitster.mtv.corp.google.com/

and previous "What's cooking in git.git" emails.

I am also working on adding external object database support using
previous work by Peff:

http://public-inbox.org/git/20161130210420.15982-1-chriscool@tuxfamily.org/

that could be extended to support clone bundles.

[...]


* Re: Git clonebundles
  2017-02-05 16:37   ` Christian Couder
@ 2017-02-06 22:16     ` Junio C Hamano
  2017-02-07 12:04       ` Johannes Schindelin
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2017-02-06 22:16 UTC (permalink / raw)
  To: Christian Couder; +Cc: Shawn Pearce, Stefan Saasen, Git Mailing List

Christian Couder <christian.couder@gmail.com> writes:

> There is also Junio's work on Bundle v3 that was unfortunately
> recently discarded.
> Look for "jc/bundle" in:
>
> http://public-inbox.org/git/xmqq4m0cry60.fsf@gitster.mtv.corp.google.com/
>
> and previous "What's cooking in git.git" emails.

If people think it might be useful to have it around to experiment,
I can resurrect and keep that in 'pu' (or rather 'jch'), as long as
it does not overlap and conflict with other topics in flight.  Let
me try that in today's integration cycle.


* Re: Git clonebundles
  2017-02-06 22:16     ` Junio C Hamano
@ 2017-02-07 12:04       ` Johannes Schindelin
  2017-02-07 15:34         ` Stefan Beller
  2017-02-07 20:54         ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Johannes Schindelin @ 2017-02-07 12:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christian Couder, Shawn Pearce, Stefan Saasen, Git Mailing List

Hi Junio,

On Mon, 6 Feb 2017, Junio C Hamano wrote:

> Christian Couder <christian.couder@gmail.com> writes:
> 
> > There is also Junio's work on Bundle v3 that was unfortunately
> > recently discarded.  Look for "jc/bundle" in:
> >
> > http://public-inbox.org/git/xmqq4m0cry60.fsf@gitster.mtv.corp.google.com/
> >
> > and previous "What's cooking in git.git" emails.
> 
> If people think it might be useful to have it around to experiment, I
> can resurrect and keep that in 'pu' (or rather 'jch'), as long as it
> does not overlap and conflict with other topics in flight.  Let me try
> that in today's integration cycle.

I would like to remind you of my suggestion to make this more publicly
visible and substantially easier to play with, by adding it as an
experimental feature (possibly guarded via an explicit opt-in config
setting).

Ciao,
Johannes


* Re: Git clonebundles
  2017-02-07 12:04       ` Johannes Schindelin
@ 2017-02-07 15:34         ` Stefan Beller
  2017-02-07 20:54         ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Stefan Beller @ 2017-02-07 15:34 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Christian Couder, Shawn Pearce, Stefan Saasen, Git Mailing List

On Tue, Feb 7, 2017 at 4:04 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Junio,
>
> On Mon, 6 Feb 2017, Junio C Hamano wrote:
>
>> Christian Couder <christian.couder@gmail.com> writes:
>>
>> > There is also Junio's work on Bundle v3 that was unfortunately
>> > recently discarded.  Look for "jc/bundle" in:
>> >
>> > http://public-inbox.org/git/xmqq4m0cry60.fsf@gitster.mtv.corp.google.com/
>> >
>> > and previous "What's cooking in git.git" emails.
>>
>> If people think it might be useful to have it around to experiment, I
>> can resurrect and keep that in 'pu' (or rather 'jch'), as long as it
>> does not overlap and conflict with other topics in flight.  Let me try
>> that in today's integration cycle.
>
> I would like to remind you of my suggestion to make this more publicly
> visible and substantially easier to play with, by adding it as an
> experimental feature (possibly guarded via an explicit opt-in config
> setting).
>
> Ciao,
> Johannes

For making this more publicly visible, I want to look into publishing
the cooking reports on git-scm.com. Maybe we can have a "dev"
section there that has
* a "getting started" section
  linking to
    Documentation/SubmittingPatches
    How to set up your Travis
* a "current state of development" section,
  e.g. the cooking reports, the
  release calendar, description of the workflow
  (which branches exist and which purpose they serve).

Most of the static information is already covered quite
well in Documentation/ so there is definitely overlap,
hence lots of links to the ground truth.

The dynamic information however (release calendar,
cooking reports) is not described well enough in
Documentation/ so I think we'd want to focus on these
in that dev section.


* Re: Git clonebundles
  2017-02-07 12:04       ` Johannes Schindelin
  2017-02-07 15:34         ` Stefan Beller
@ 2017-02-07 20:54         ` Junio C Hamano
  2017-02-08 14:28           ` Johannes Schindelin
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2017-02-07 20:54 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Christian Couder, Shawn Pearce, Stefan Saasen, Git Mailing List

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> If people think it might be useful to have it around to experiment, I
>> can resurrect and keep that in 'pu' (or rather 'jch'), as long as it
>> does not overlap and conflict with other topics in flight.  Let me try
>> that in today's integration cycle.
>
> I would like to remind you of my suggestion to make this more publicly
> visible and substantially easier to play with, by adding it as an
> experimental feature (possibly guarded via an explicit opt-in config
> setting).

I do not understand why you want to give this topic undue prominence
over any other random topic that cooks in 'pu' and is later merged down
to 'next' and then 'master' only after it turns out to be useful
(or at least harmless).

If there were somebody who is the champion of that topic, advocating
that any clone-bundle solution must be based on this topic, it would
be different.  Even though I am not opposed to the topic myself, I
am not that somebody.  That is why I kept it around to wait to see
if somebody finds it potentially useful and then discarded it after
seeing that no such person stepped up.

That champion of the topic would spend the necessary engineering
effort to document it as experimental, to make sure that there is a
reasonable upgrade/transition route if the "v3" format turns out to
be not very useful, etc. by rerolling the patches or following-up on
them to advance it from 'pu' down to 'next' and to 'master' just
like any other topic.

Judging from the tone of his message (i.e. "unfortunately" in it),
Christian may want to be one, or somebody else may want to be one.


* Re: Git clonebundles
  2017-02-07 20:54         ` Junio C Hamano
@ 2017-02-08 14:28           ` Johannes Schindelin
  0 siblings, 0 replies; 8+ messages in thread
From: Johannes Schindelin @ 2017-02-08 14:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christian Couder, Shawn Pearce, Stefan Saasen, Git Mailing List

Hi Junio,

On Tue, 7 Feb 2017, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> >> If people think it might be useful to have it around to experiment, I
> >> can resurrect and keep that in 'pu' (or rather 'jch'), as long as it
> >> does not overlap and conflict with other topics in flight.  Let me try
> >> that in today's integration cycle.
> >
> > I would like to remind you of my suggestion to make this more publicly
> > visible and substantially easier to play with, by adding it as an
> > experimental feature (possibly guarded via an explicit opt-in config
> > setting).
> 
> I do not understand why you want to give this topic undue prominence
> over any other random topic that cook in 'pu' [...]

Since you ask so nicely for an explanation: clonebundles got a really
lively and active discussion at the Contributors' Summit. So it is not
your run-of-the-mill typo fix; the bundle issue is something that clearly
receives a lot of interest, in particular from developers who are
unfamiliar with the idiosyncrasies of core Git development.

And I got the very distinct impression that Git would benefit a lot from
these developers, *in particular* since they come with fresh perspectives.

Now, we can make it hard for them (e.g. expecting them to sift through a
few months' worth of What's Cooking mails, to find out whether there has
been any related work, and what is the branch name, if any, and where to
find that branch), and we can alternatively make it easy for them to help
us make Git better.

I would like us to choose the easier route for them. Because it would
benefit us.

Ciao,
Johannes

