Re: How hard would it be to implement sparse fetching/pulling?

From: Jeff Hostetler <git@jeffhostetler.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Philip Oakley <philipoakley@iee.org>,
	Vitaly Arbuzov <vit@uber.com>, Git List <git@vger.kernel.org>
Subject: Re: How hard would it be to implement sparse fetching/pulling?
Date: Mon, 4 Dec 2017 10:53:32 -0500
Message-ID: <b277ecb7-addc-f494-cf30-b48a794abdce@jeffhostetler.com>
In-Reply-To: <20171201182446.GB18220@aiede.mtv.corp.google.com>



On 12/1/2017 1:24 PM, Jonathan Nieder wrote:
> Jeff Hostetler wrote:
>> On 11/30/2017 6:43 PM, Philip Oakley wrote:
> 
>>> The 'companies' problem is that it tends to force a client-server, always-on
>>> on-line mentality. I'm also wanting the original DVCS off-line capability to
>>> still be available, with _user_ control, in a generic sense, of what they
>>> have locally available (including files/directories they have not yet looked
>>> at, but expect to have). IIUC Jeff's work is that on-line view, without the
>>> off-line capability.
>>>
>>> I'd commented early in the series at [1,2,3].
>>
>> Yes, this does tend to lead towards an always-online mentality.
>> However, there are 2 parts:
>> [a] dynamic object fetching for missing objects, such as during a
>>      random command like diff or blame or merge.  We need this
>>      regardless of usage -- because we can't always predict (or
>>      dry-run) every command the user might run in advance.
>> [b] batch fetch mode, such as using partial-fetch to match your
>>      sparse-checkout so that you always have the blobs of interest
>>      to you.  And assuming you don't wander outside of this subset
>>      of the tree, you should be able to work offline as usual.
>> If you can work within the confines of [b], you wouldn't need to
>> always be online.
> 
> Just to amplify this: for our internal use we care a lot about
> disconnected usage working.  So it is not like we have forgotten about
> this use case.
> 
>> We might also add a part [c] with explicit commands to back-fill or
>> alter your incomplete view of the ODB
> 
> Agreed, this will be a nice thing to add.
> 
> [...]
>>> At its core, my idea was to use the object store to hold markers for the
>>> 'not yet fetched' objects (mainly trees and blobs). These would be in a
>>> known fixed format, and have the same effect (conceptually) as the
>>> sub-module markers - they _confirm_ the oid, yet say 'not here, try
>>> elsewhere'.
>>
>> We do have something like this.  Jonathan can explain better than I, but
>> basically, we denote possibly incomplete packfiles from partial clones
>> and fetches as "promisor" and have special rules in the code to assert
>> that a missing blob referenced from a "promisor" packfile is OK and can
>> be fetched later if necessary from the "promising" remote.
>>
>> The main problem with markers or other lists of missing objects is
>> that it has scale problems for large repos.
> 
> Any chance that we can get a design doc in Documentation/technical/
> giving an overview of the design, with a brief "alternatives
> considered" section describing this kind of thing?

Yeah, I'll start one.  I have notes within the individual protocol
docs and man-pages, but no summary doc.  Thanks!

> 
> E.g. some of the earlier descriptions like
>   https://public-inbox.org/git/20170915134343.3814dc38@twelve2.svl.corp.google.com/
>   https://public-inbox.org/git/cover.1506714999.git.jonathantanmy@google.com/
>   https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/
> may help as a starting point.
> 
> Thanks,
> Jonathan
>
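
A minimal sketch, in Python rather than Git's C, of the "promisor" rule and
the [a]/[b] split discussed above. Every name in it (Obj, Remote,
PartialStore, is_promised, prefetch) is hypothetical; none of this is Git's
actual API or on-disk format. It assumes a single promisor packfile and a
single "promising" remote, and it only illustrates the idea that no list of
missing objects is kept: a missing object is acceptable if something in the
promisor pack refers to it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class MissingObjectError(Exception):
    """The object is absent and nothing in a promisor pack refers to it."""


@dataclass
class Obj:
    data: bytes
    refs: tuple[str, ...] = ()  # OIDs this object points at (e.g. tree -> blobs)


@dataclass
class Remote:
    """Hypothetical stand-in for the "promising" remote."""
    objects: dict[str, Obj]
    online: bool = True

    def fetch(self, oids):
        if not self.online:
            raise ConnectionError("remote unreachable")
        return {oid: self.objects[oid] for oid in oids}


@dataclass
class PartialStore:
    """Toy object store for a partial clone (assumption: one promisor pack)."""
    remote: Remote
    promisor_pack: dict[str, Obj] = field(default_factory=dict)

    def is_promised(self, oid):
        # No marker per missing object: the promise is implied by a
        # reference from an object that came from the promisor remote.
        return any(oid in obj.refs for obj in self.promisor_pack.values())

    def prefetch(self, oids):
        # [b] batch fetch: pull a known subset ahead of time.
        wanted = [oid for oid in oids if oid not in self.promisor_pack]
        if wanted:
            self.promisor_pack.update(self.remote.fetch(wanted))

    def read(self, oid):
        if oid in self.promisor_pack:
            return self.promisor_pack[oid].data
        if not self.is_promised(oid):
            raise MissingObjectError(oid)
        # [a] dynamic fetch: promised but not local, so ask the remote now.
        self.prefetch([oid])
        return self.promisor_pack[oid].data


remote = Remote({
    "tree1": Obj(b"tree", refs=("blobA", "blobB")),
    "blobA": Obj(b"A"),
    "blobB": Obj(b"B"),
})
store = PartialStore(remote, promisor_pack={"tree1": remote.objects["tree1"]})
store.prefetch(["blobA"])   # fetch the subset of interest while online
remote.online = False
print(store.read("blobA"))  # already local, so this works offline
```

With the remote offline, reading "blobB" in this sketch raises
ConnectionError (promised, but it needs the remote), while reading an OID
that nothing refers to raises MissingObjectError. That is the distinction
between "missing but OK" and "missing and broken" that the promisor rules
are meant to draw.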
