From: Jeff Hostetler <git@jeffhostetler.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Philip Oakley <philipoakley@iee.org>,
Vitaly Arbuzov <vit@uber.com>, Git List <git@vger.kernel.org>
Subject: Re: How hard would it be to implement sparse fetching/pulling?
Date: Mon, 4 Dec 2017 10:53:32 -0500
Message-ID: <b277ecb7-addc-f494-cf30-b48a794abdce@jeffhostetler.com>
In-Reply-To: <20171201182446.GB18220@aiede.mtv.corp.google.com>
On 12/1/2017 1:24 PM, Jonathan Nieder wrote:
> Jeff Hostetler wrote:
>> On 11/30/2017 6:43 PM, Philip Oakley wrote:
>
>>> The 'companies' problem is that it tends to force a client-server, always-on
>>> on-line mentality. I'm also wanting the original DVCS off-line capability to
>>> still be available, with _user_ control, in a generic sense, of what they
>>> have locally available (including files/directories they have not yet looked
>>> at, but expect to have). IIUC Jeff's work is that on-line view, without the
>>> off-line capability.
>>>
>>> I'd commented early in the series at [1,2,3].
>>
>> Yes, this does tend to lead towards an always-online mentality.
>> However, there are 2 parts:
>> [a] dynamic object fetching for missing objects, such as during a
>> random command like diff or blame or merge. We need this
>> regardless of usage -- because we can't always predict (or
>> dry-run) every command the user might run in advance.
>> [b] batch fetch mode, such as using partial-fetch to match your
>> sparse-checkout so that you always have the blobs of interest
>> to you. And assuming you don't wander outside of this subset
>> of the tree, you should be able to work offline as usual.
>> If you can work within the confines of [b], you wouldn't need to
>> always be online.
>
> Just to amplify this: for our internal use we care a lot about
> disconnected usage working. So it is not like we have forgotten about
> this use case.
>
>> We might also add a part [c] with explicit commands to back-fill or
>> alter your incomplete view of the ODB
>
> Agreed, this will be a nice thing to add.
>
> [...]
>>> At its core, my idea was to use the object store to hold markers for the
>>> 'not yet fetched' objects (mainly trees and blobs). These would be in a
>>> known fixed format, and have the same effect (conceptually) as the
>>> sub-module markers - they _confirm_ the oid, yet say 'not here, try
>>> elsewhere'.
>>
>> We do have something like this. Jonathan can explain better than I, but
>> basically, we denote possibly incomplete packfiles from partial clones
>> and fetches as "promisor" and have special rules in the code to assert
>> that a missing blob referenced from a "promisor" packfile is OK and can
>> be fetched later if necessary from the "promising" remote.
>>
>> The main problem with markers or other lists of missing objects is
>> that they have scale problems for large repos.
>
> Any chance that we can get a design doc in Documentation/technical/
> giving an overview of the design, with a brief "alternatives
> considered" section describing this kind of thing?
Yeah, I'll start one. I have notes within the individual protocol
docs and man-pages, but no summary doc. Thanks!
>
> E.g. some of the earlier descriptions like
> https://public-inbox.org/git/20170915134343.3814dc38@twelve2.svl.corp.google.com/
> https://public-inbox.org/git/cover.1506714999.git.jonathantanmy@google.com/
> https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/
> may help as a starting point.
>
> Thanks,
> Jonathan
>
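
As a rough illustration of the "promisor" rule and of parts [a] and [b]
discussed above, here is a toy Python model. It is only a sketch under
stated assumptions, not Git's implementation: ToyRepo, add_promisor_pack,
is_promised, read, and prefetch are hypothetical names, the remote is a
plain dict, and objects are reduced to (data, refs) tuples. Note that it
keeps no list of missing objects; a missing object is accepted only
because some object from a promisor pack refers to it.

```python
class MissingObjectError(Exception):
    """Object is absent and nothing from a promisor pack refers to it."""


class OfflineError(Exception):
    """Object is promised, but the promising remote cannot be reached."""


class ToyRepo:
    def __init__(self, remote):
        self.remote = remote        # hypothetical remote: oid -> (data, refs)
        self.online = True
        self.objects = {}           # local store: oid -> (data, refs)
        self.promisor_oids = set()  # oids that arrived in promisor packs

    def add_promisor_pack(self, pack):
        """Store a possibly incomplete pack and mark it as promisor."""
        self.objects.update(pack)
        self.promisor_oids.update(pack)

    def is_promised(self, oid):
        """True if some object from a promisor pack references oid."""
        return any(oid in self.objects[p][1] for p in self.promisor_oids)

    def read(self, oid):
        """Part [a]: fetch a missing object on demand, if it is promised."""
        if oid in self.objects:
            return self.objects[oid][0]
        if not self.is_promised(oid):
            raise MissingObjectError(oid)
        if not self.online:
            raise OfflineError(oid)
        self.add_promisor_pack({oid: self.remote[oid]})
        return self.objects[oid][0]

    def prefetch(self, oids):
        """Part [b]: batch-fetch the objects of interest ahead of time."""
        for oid in oids:
            self.read(oid)


# Usage: a tree that references two blobs; only blobA is prefetched.
remote = {
    "tree1": ("tree", ("blobA", "blobB")),
    "blobA": ("hello", ()),
    "blobB": ("world", ()),
}
repo = ToyRepo(remote)
repo.add_promisor_pack({"tree1": remote["tree1"]})
repo.prefetch(["blobA"])
repo.online = False
# Reading blobA now works offline. Reading blobB raises OfflineError
# (promised but never fetched), and reading an unknown oid raises
# MissingObjectError.
```

In this model, staying within the prefetched subset is what makes
offline work possible; stepping outside it needs the remote again.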