From: Jeff Hostetler <git@jeffhostetler.com>
To: Vitaly Arbuzov <vit@uber.com>, git@vger.kernel.org
Subject: Re: How hard would it be to implement sparse fetching/pulling?
Date: Thu, 30 Nov 2017 09:24:08 -0500
Message-ID: <e2d5470b-9252-07b4-f3cf-57076d103a17@jeffhostetler.com>
In-Reply-To: <CANxXvsMbpBOSRKaAi8iVUikfxtQp=kofZ60N0pHXs+R+q1k3_Q@mail.gmail.com>
On 11/29/2017 10:16 PM, Vitaly Arbuzov wrote:
> Hi guys,
>
> I'm looking for ways to improve fetch/pull/clone time for large git
> (mono)repositories with unrelated source trees (that span across
> multiple services).
> I've found the sparse checkout approach appealing and helpful for most
> client-side operations (e.g. status, reset, commit, etc.)
> The problem is that there is no feature like sparse fetch/pull in git,
> this means that ALL objects in unrelated trees are always fetched.
> It may take a lot of time for large repositories and results in some
> practical scalability limits for git.
> This forced some large companies like Facebook and Google to move to
> Mercurial as they were unable to improve client-side experience with
> git while Microsoft has developed GVFS, which seems to be a step back
> to CVCS world.
>
> I want to get feedback (from more experienced git users than I am)
> on what it would take to implement sparse fetching/pulling.
> (Downloading only objects related to the sparse-checkout list)
> Are there any issues with missing hashes?
> Are there any fundamental problems why it can't be done?
> Can we get away with only client-side changes or would it require
> special features on the server side?
>
> If we had such a feature then all we would need on top is a separate
> tool that builds the right "sparse" scope for the workspace based on
> paths that developer wants to work on.
>
> In the world where more and more companies are moving towards large
> monorepos this improvement would provide a good way of scaling git to
> meet this demand.
>
> PS. Please don't advise splitting things up, as there are some good
> reasons why many companies decide to keep their code in a monorepo,
> which you can easily find online. So let's keep that part out of
> scope.
>
> -Vitaly
>
This work is in progress now. A short summary of the current
parts 1, 2, and 3 can be found in [1].
> * jh/object-filtering (2017-11-22) 6 commits
> * jh/fsck-promisors (2017-11-22) 10 commits
> * jh/partial-clone (2017-11-22) 14 commits
[1] https://public-inbox.org/git/xmqq1skh6fyz.fsf@gitster.mtv.corp.google.com/T/
I have a branch that contains V5 of all 3 parts:
https://github.com/jeffhostetler/git/tree/core/pc5_p3
This is a WIP, so there are some rough edges....
I hope to have a V6 out before the weekend with some
bug fixes and cleanup.
Please give it a try and see if it fits your needs.
Currently, there are filters to omit all blobs, to omit
all large blobs, and to omit blobs that do not match a
sparse-checkout specification.
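As a rough illustration of those three filter kinds, here is a toy Python model of which blobs each one would leave out of a fetch. It is not code from the patch series. The spec spellings `blob:none` and `blob:limit=<n>` follow what git's `--filter` option accepts in released versions and may differ on the WIP branch; the `sparse` spec, the `Blob` type, and the helper names are hypothetical, and `fnmatch` is only a stand-in for real sparse-checkout pattern matching.

```python
from fnmatch import fnmatch
from typing import NamedTuple, Sequence


class Blob(NamedTuple):
    path: str
    size: int  # size in bytes


def keep_blob(blob: Blob, filter_spec: str,
              sparse_patterns: Sequence[str] = ()) -> bool:
    """Return True if the blob would be sent, False if the filter omits it."""
    if filter_spec == "blob:none":
        # Omit every blob; only commits and trees are transferred.
        return False
    if filter_spec.startswith("blob:limit="):
        # Omit blobs whose size is at least the limit (in bytes).
        limit = int(filter_spec.split("=", 1)[1])
        return blob.size < limit
    if filter_spec == "sparse":
        # Hypothetical spec name: keep only blobs whose path matches one of
        # the patterns. fnmatch is a simplification of git's pattern rules.
        return any(fnmatch(blob.path, pat) for pat in sparse_patterns)
    raise ValueError(f"unknown filter spec: {filter_spec}")


# Made-up repository contents, for illustration only.
blobs = [
    Blob("services/api/main.go", 2_000),
    Blob("services/web/app.js", 5_000),
    Blob("assets/video.bin", 50_000_000),
]


def kept_paths(filter_spec: str,
               sparse_patterns: Sequence[str] = ()) -> list:
    """List the paths of the blobs that survive the given filter."""
    return [b.path for b in blobs
            if keep_blob(b, filter_spec, sparse_patterns)]


if __name__ == "__main__":
    print(kept_paths("blob:none"))
    print(kept_paths("blob:limit=1000000"))
    print(kept_paths("sparse", ["services/api/*"]))
```

In this model an omitted blob is simply absent from the result; in a real partial clone it would be a missing object that the client has to fetch later when it is actually needed.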
Let me know if you have any questions or problems.
Thanks,
Jeff