git@vger.kernel.org mailing list mirror (one of many)
From: Jeff Hostetler <git@jeffhostetler.com>
To: Vitaly Arbuzov <vit@uber.com>, git@vger.kernel.org
Subject: Re: How hard would it be to implement sparse fetching/pulling?
Date: Thu, 30 Nov 2017 09:24:08 -0500
Message-ID: <e2d5470b-9252-07b4-f3cf-57076d103a17@jeffhostetler.com>
In-Reply-To: <CANxXvsMbpBOSRKaAi8iVUikfxtQp=kofZ60N0pHXs+R+q1k3_Q@mail.gmail.com>



On 11/29/2017 10:16 PM, Vitaly Arbuzov wrote:
> Hi guys,
> 
> I'm looking for ways to improve fetch/pull/clone time for large git
> (mono)repositories with unrelated source trees (that span across
> multiple services).
> I've found the sparse checkout approach appealing and helpful for most
> client-side operations (e.g. status, reset, commit, etc.).
> The problem is that there is no feature like sparse fetch/pull in git,
> which means that ALL objects in unrelated trees are always fetched.
> It may take a lot of time for large repositories and results in some
> practical scalability limits for git.
> This forced some large companies like Facebook and Google to move to
> Mercurial as they were unable to improve client-side experience with
> git while Microsoft has developed GVFS, which seems to be a step back
> to CVCS world.
> 
> I'd like to get feedback (from more experienced git users than I am)
> on what it would take to implement sparse fetching/pulling.
> (Downloading only objects related to the sparse-checkout list)
> Are there any issues with missing hashes?
> Are there any fundamental problems why it can't be done?
> Can we get away with only client-side changes or would it require
> special features on the server side?
> 
> If we had such a feature then all we would need on top is a separate
> tool that builds the right "sparse" scope for the workspace based on
> paths that developer wants to work on.
> 
> In the world where more and more companies are moving towards large
> monorepos this improvement would provide a good way of scaling git to
> meet this demand.
> 
> PS. Please don't advise splitting things up, as there are some good
> reasons why many companies decide to keep their code in a monorepo,
> which you can easily find online. So let's keep that part out of
> scope.
> 
> -Vitaly
> 


This work is in progress now.  A short summary of the current
parts 1, 2, and 3 can be found in [1].

> * jh/object-filtering (2017-11-22) 6 commits
> * jh/fsck-promisors (2017-11-22) 10 commits
> * jh/partial-clone (2017-11-22) 14 commits

[1] https://public-inbox.org/git/xmqq1skh6fyz.fsf@gitster.mtv.corp.google.com/T/

I have a branch that contains V5 of all 3 parts:
https://github.com/jeffhostetler/git/tree/core/pc5_p3

This is a WIP, so there are some rough edges....
I hope to have a V6 out before the weekend with some
bug fixes and cleanup.

Please give it a try and see if it fits your needs.
Currently, there are filter methods to filter all blobs,
all large blobs, and one to match a sparse-checkout
specification.
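
As a rough illustration of those three filter kinds (this is not taken
from the WIP branch, whose syntax may differ): in later released
versions of Git they are spelled --filter=blob:none,
--filter=blob:limit=<n>, and --filter=sparse:oid=<blob-ish>. The sketch
below assumes such a Git version and uses a throwaway repository with
made-up file names. It runs the first two filters locally through
git rev-list, so no server support is needed to see the effect.

```shell
#!/bin/sh
# Sketch: count reachable objects with and without a blob filter.
# Assumes a Git release that supports `git rev-list --filter=...`.
set -e

# Throwaway repository with one small and one larger file (hypothetical names).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
printf 'small\n' > small.txt
head -c 4096 /dev/zero > large.bin
git add small.txt large.bin
git -c user.name=demo -c user.email=demo@example.com commit -q -m "demo"

# Everything reachable from HEAD: the commit, its tree, and both blobs.
all=$(git rev-list --objects HEAD | wc -l | tr -d ' ')

# Omit all blobs: only the commit and the tree remain.
no_blobs=$(git rev-list --objects --filter=blob:none HEAD | wc -l | tr -d ' ')

# Omit blobs of 1 KiB or more: large.bin is dropped, small.txt is kept.
small_only=$(git rev-list --objects --filter=blob:limit=1k HEAD | wc -l | tr -d ' ')

echo "all=$all no_blobs=$no_blobs small_only=$small_only"
```

The same filter specs can be passed to git clone (for example
"git clone --filter=blob:none <url>") when the server side supports
partial clone. The sparse variant takes a blob containing a
sparse-checkout pattern list, and is left out here to keep the sketch
short.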

Let me know if you have any questions or problems.

Thanks,
Jeff

