git@vger.kernel.org mailing list mirror (one of many)
From: Jonathan Nieder <jrnieder@gmail.com>
To: Vitaly Arbuzov <vit@uber.com>
Cc: Philip Oakley <philipoakley@iee.org>,
	Jeff Hostetler <git@jeffhostetler.com>,
	Git List <git@vger.kernel.org>
Subject: Re: How hard would it be to implement sparse fetching/pulling?
Date: Thu, 30 Nov 2017 18:51:06 -0800
Message-ID: <20171201025106.GD20640@aiede.mtv.corp.google.com>
In-Reply-To: <CANxXvsNuEmo+uaRY8t44csqzXAk3rS+D9E=LMvaLcZeg-aLvRw@mail.gmail.com>

Hi Vitaly,

Vitaly Arbuzov wrote:

> I think it would be great if we high level agree on desired user
> experience, so let me put a few possible use cases here.

I think one thing this thread is pointing to is a lack of overview
documentation about how the 'partial clone' series currently works.
The basic components are:

 1. extending git protocol to (1) allow fetching only a subset of the
    objects reachable from the commits being fetched and (2) later,
    going back and fetching the objects that were left out.

    We've also discussed some other protocol changes, e.g. to allow
    obtaining the sizes of un-fetched objects without fetching the
    objects themselves.

 2. extending git's on-disk format to allow having some objects not be
    present but only be "promised" to be obtainable from a remote
    repository.  When running a command that requires those objects,
    the user can choose to have it either (a) error out ("airplane
    mode") or (b) fetch the required objects.

    It is still possible to work fully locally in such a repo, make
    changes, get useful results out of "git fsck", etc.  It is kind of
    similar to the existing "shallow clone" feature, except that there
    is a more straightforward way to obtain objects that are outside
    the "shallow" clone when needed on demand.

 3. improving everyday commands to require fewer objects.  For
    example, if I run "git log -p", then I want to see the history of
    most files but I don't necessarily want to download large binary
    files just to print 'Binary files differ' for them.

    And by the same token, we might want to have a mode for commands
    like "git log -p" to default to restricting to a particular
    directory, instead of downloading files outside that directory.

    There are some fundamental changes to make in this category ---
    e.g. modifying the index format to not require entries for files
    outside the sparse checkout, to avoid having to download the
    trees for them.

The overall goal is to make git scale better.

The existing patches do (1) and (2), though it is possible to do more
in those categories. :)  We have plans to work on (3) as well.
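
To make (1) and (2) more concrete, here is a toy model in Python.  It
is only a sketch of the idea, not how the patches implement it: the
class ToyObjectStore, its fields, and the dict standing in for the
remote are all made up for illustration.  An object is either present
locally, "promised" by the remote, or genuinely missing.  Reading a
promised object either errors out (offline, "airplane mode") or
fetches it on demand.

```python
class MissingObjectError(Exception):
    """Raised when an object cannot be produced."""


class ToyObjectStore:
    """Hypothetical stand-in for a repository with promised objects."""

    def __init__(self, remote, promised, offline=False):
        self.local = {}                # objects present on disk
        self.remote = remote           # dict standing in for the remote repo
        self.promised = set(promised)  # absent, but obtainable from the remote
        self.offline = offline         # True models "airplane mode"
        self.fetch_count = 0           # number of on-demand fetches so far

    def read(self, oid):
        if oid in self.local:
            return self.local[oid]
        if oid not in self.promised:
            # Neither present nor promised: a real problem.
            raise MissingObjectError(f"{oid}: missing and not promised")
        if self.offline:
            # Choice (a): error out instead of touching the network.
            raise MissingObjectError(f"{oid}: promised but not fetched")
        # Choice (b): go back and fetch the object that was left out.
        data = self.remote[oid]
        self.local[oid] = data
        self.promised.discard(oid)
        self.fetch_count += 1
        return data

    def fsck(self, referenced):
        """Report only objects that are neither present nor promised."""
        return [oid for oid in referenced
                if oid not in self.local and oid not in self.promised]


remote = {"blob1": b"hello", "blob2": b"big binary"}
store = ToyObjectStore(remote, promised={"blob2"})
store.local["blob1"] = remote["blob1"]   # fetched during the initial clone

store.read("blob1")                      # served locally, no fetch
store.fsck(["blob1", "blob2", "blob3"])  # only "blob3" is reported
store.read("blob2")                      # fetched on demand
```

The fsck method here reflects the point in (2) that a promised object
is not treated as corruption, so local checks stay useful.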

These are overall changes that happen at a fairly low level in git.
They mostly don't require changes command-by-command.

Thanks,
Jonathan

