git@vger.kernel.org mailing list mirror (one of many)
From: Ben Peart <peartben@gmail.com>
To: Philip Oakley <philipoakley@iee.org>,
	Jonathan Tan <jonathantanmy@google.com>,
	git@vger.kernel.org
Subject: Re: [RFC PATCH 0/3] Partial clone: promised blobs (formerly "missing blobs")
Date: Mon, 17 Jul 2017 13:43:42 -0400
Message-ID: <b928c073-6156-30f2-c850-993e59079ed1@gmail.com>
In-Reply-To: <C299C45128634A21AF9D65E1B2B52C5B@PhilipOakley>



On 7/16/2017 11:23 AM, Philip Oakley wrote:
> From: "Jonathan Tan" <jonathantanmy@google.com>
> Sent: Tuesday, July 11, 2017 8:48 PM
>> These patches are part of a set of patches implementing partial clone,
>> as you can see here:
>>
>> https://github.com/jonathantanmy/git/tree/partialclone
>>
>> In that branch, clone with batch checkout works, as you can see in the
>> README. The code and tests are generally done, but some patches are
>> still missing documentation and commit messages.
>>
>> These 3 patches implement the foundational concept - formerly known as
>> "missing blobs" in the "missing blob manifest", I decided to call them
>> "promised blobs". The repo knows their object names and sizes. It also
>> does not have the blobs themselves, but can be configured to know how to
>> fetch them.
>>
> If I understand correctly, this method doesn't give any direct user 
> visibility of missing blobs in the file system. Is that correct?

That is correct.

> 
> I was hoping that eventually the various 'on demand' approaches would 
> still allow users to continue to work as they go off-line such that they 
> can see directly (in the FS) where the missing blobs (and trees) are 
> located, so that they can continue to commit new work on existing files.
> 

This is a challenge because git assumes all objects are always available 
(a key design principle of a DVCS), so any missing object is treated as 
corruption that typically results in a call to die().

The GVFS solution gets around this by ensuring any missing object is 
retrieved on behalf of git so that it never sees it as missing.  The 
obvious tradeoff is that this requires a network connection so the 
object can be retrieved.
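
To make that tradeoff concrete, here is a minimal Python sketch of a 
read-through object store. It is only an illustration of the idea, not 
how GVFS or git is implemented; ReadThroughStore, fetch_remote, and 
ObjectUnavailable are hypothetical names, and the "remote" is a dict.

```python
class ObjectUnavailable(Exception):
    """Raised when an object is neither stored locally nor fetchable."""


class ReadThroughStore:
    """Hypothetical store that fetches missing objects on demand."""

    def __init__(self, local, fetch_remote, online=True):
        self.local = dict(local)          # oid -> contents already on disk
        self.fetch_remote = fetch_remote  # callable: oid -> contents
        self.online = online              # stand-in for network availability

    def read(self, oid):
        # Objects that are already local never touch the network.
        if oid in self.local:
            return self.local[oid]
        # A missing object can only be satisfied while online.
        if not self.online:
            raise ObjectUnavailable(oid)
        data = self.fetch_remote(oid)
        self.local[oid] = data            # keep it so later reads work offline
        return data


# Assumed stand-in for a server holding the promised objects.
remote = {"1234abcd": b"hello\n"}
store = ReadThroughStore(local={}, fetch_remote=remote.__getitem__)

first = store.read("1234abcd")    # fetched on demand
store.online = False
second = store.read("1234abcd")   # now served from the local copy

try:
    store.read("deadbeef")        # never fetched, and we are offline
    offline_failed = False
except ObjectUnavailable:
    offline_failed = True
```

The caller never sees a missing object while online, which is the 
property described above; once offline, only objects that were already 
fetched remain readable.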

> I had felt that some sort of 'gitlink' should be present (human readable) 
> as a placeholder for the missing blob/tree, e.g. 'gitblob: 1234abcd' 
> (showing the missing oid, just like submodules can do; it's no 
> different really).
> 

We explored that option briefly, but when you have a large number of 
files, even writing out some sort of placeholder can take a very long 
time.  In fact, since the typical source file is relatively small (a few 
kilobytes), writing out a placeholder doesn't save much time vs. just 
writing out the actual file contents.

Another challenge is that even if there is a placeholder written to 
disk, you still need a network connection to retrieve the actual 
contents if/when it is needed.
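
As a rough Python sketch of that second point, assuming the 
'gitblob: <oid>' placeholder text suggested above (the helper names 
make_placeholder, parse_placeholder, and materialize are hypothetical, 
and a real format would also need a way to tell a placeholder apart 
from a file that genuinely starts with "gitblob: "):

```python
PREFIX = "gitblob: "


def make_placeholder(oid):
    """Return the human-readable placeholder text for a missing blob."""
    return f"{PREFIX}{oid}\n"


def parse_placeholder(text):
    """Return the oid if text is a placeholder, otherwise None."""
    if text.startswith(PREFIX) and text.endswith("\n"):
        return text[len(PREFIX):-1]
    return None


def materialize(text, fetch, online):
    """Return real file contents, fetching them if text is a placeholder."""
    oid = parse_placeholder(text)
    if oid is None:
        return text.encode()      # already the real contents
    if not online:
        # The placeholder names the blob but cannot supply its contents.
        raise ConnectionError(f"blob {oid} is not available offline")
    return fetch(oid)


# Assumed stand-in for a server holding the real blob.
remote = {"1234abcd": b"int main(void) { return 0; }\n"}
placeholder = make_placeholder("1234abcd")

contents = materialize(placeholder, remote.__getitem__, online=True)

try:
    materialize(placeholder, remote.__getitem__, online=False)
    needs_network = False
except ConnectionError:
    needs_network = True
```

The placeholder makes the missing oid visible in the file system, but 
turning it into real contents still goes through the same fetch step.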

> I'm concerned that the various GVFS extensions haven't fully achieved a 
> separation of concerns surrounding the DVCS capability for 
> on-line/off-line conversion as comms drop in and out. The GVFS looks 
> great for a fully networked, always on, environment, but it would be 
> good to also have the separation for those who (will) have shallow/narrow 
> clones that may also need to work with a local upstream that is also 
> shallow/narrow.
> 

You are correct that this hasn't been tackled yet. It is a challenging 
problem. I can envision something along the lines of what was done for 
the shallow clone feature, where there are distinct ways to change the 
set of objects that are available, but that would hopefully come in some 
future patch series.

> -- 
> Philip
> I wanted to at least get my thoughts into the discussion before it all 
> passes by.
> 
>> An older version of these patches was sent as a single demonstration
>> patch in versions 1 to 3 of [1]. In there, Junio suggested that I have
>> only one file containing missing blob information. I have made that
>> suggested change in this version.
>>
>> One thing remaining is to add a repository extension [2] so that older
>> versions of Git fail immediately instead of trying to read missing
>> blobs, but I thought I'd send these first in order to get some initial
>> feedback.
>>
>> [1] https://public-inbox.org/git/cover.1497035376.git.jonathantanmy@google.com/
>>
>> [2] Documentation/technical/repository-version.txt
>>
>> Jonathan Tan (3):
>>  promised-blob, fsck: introduce promised blobs
>>  sha1-array: support appending unsigned char hash
>>  sha1_file: add promised blob hook support
>>
>> Documentation/config.txt               |   8 ++
>> Documentation/gitrepository-layout.txt |   8 ++
>> Makefile                               |   1 +
>> builtin/cat-file.c                     |   9 ++
>> builtin/fsck.c                         |  13 +++
>> promised-blob.c                        | 170 +++++++++++++++++++++++++++++++++
>> promised-blob.h                        |  27 ++++++
>> sha1-array.c                           |   7 ++
>> sha1-array.h                           |   1 +
>> sha1_file.c                            |  44 ++++++---
>> t/t3907-promised-blob.sh               |  65 +++++++++++++
>> t/test-lib-functions.sh                |   6 ++
>> 12 files changed, 345 insertions(+), 14 deletions(-)
>> create mode 100644 promised-blob.c
>> create mode 100644 promised-blob.h
>> create mode 100755 t/t3907-promised-blob.sh
>>
>> -- 
>> 2.13.2.932.g7449e964c-goog
>>
> 


Thread overview: 46+ messages
2017-07-11 19:48 [RFC PATCH 0/3] Partial clone: promised blobs (formerly "missing blobs") Jonathan Tan
2017-07-11 19:48 ` [RFC PATCH 1/3] promised-blob, fsck: introduce promised blobs Jonathan Tan
2017-07-11 22:02   ` Stefan Beller
2017-07-19 23:37     ` Jonathan Tan
2017-07-12 17:29   ` Jeff Hostetler
2017-07-12 19:28     ` Jonathan Nieder
2017-07-13 14:48       ` Jeff Hostetler
2017-07-13 15:05         ` Jeff Hostetler
2017-07-13 19:39     ` Jonathan Tan
2017-07-14 20:03       ` Jeff Hostetler
2017-07-14 21:30         ` Jonathan Nieder
2017-07-11 19:48 ` [RFC PATCH 2/3] sha1-array: support appending unsigned char hash Jonathan Tan
2017-07-11 22:06   ` Stefan Beller
2017-07-19 23:56     ` Jonathan Tan
2017-07-20  0:06       ` Stefan Beller
2017-07-11 19:48 ` [RFC PATCH 3/3] sha1_file: add promised blob hook support Jonathan Tan
2017-07-11 22:38   ` Stefan Beller
2017-07-12 17:40   ` Ben Peart
2017-07-12 20:38     ` Jonathan Nieder
2017-07-16 15:23 ` [RFC PATCH 0/3] Partial clone: promised blobs (formerly "missing blobs") Philip Oakley
2017-07-17 17:43   ` Ben Peart [this message]
2017-07-25 20:48     ` Philip Oakley
2017-07-17 18:03   ` Jonathan Nieder
2017-07-29 12:51     ` Philip Oakley
2017-07-20  0:21 ` [RFC PATCH v2 0/4] Partial clone: promised objects (not only blobs) Jonathan Tan
2017-07-20  0:21 ` [RFC PATCH v2 1/4] object: remove "used" field from struct object Jonathan Tan
2017-07-20  0:36   ` Stefan Beller
2017-07-20  0:55     ` Jonathan Tan
2017-07-20 17:44       ` Ben Peart
2017-07-20 21:20   ` Junio C Hamano
2017-07-20  0:21 ` [RFC PATCH v2 2/4] promised-object, fsck: introduce promised objects Jonathan Tan
2017-07-20 18:07   ` Stefan Beller
2017-07-20 19:17     ` Jonathan Tan
2017-07-20 19:58   ` Ben Peart
2017-07-20 21:13     ` Jonathan Tan
2017-07-21 16:24       ` Ben Peart
2017-07-21 20:33         ` Jonathan Tan
2017-07-25 15:10           ` Ben Peart
2017-07-29 13:26             ` Philip Oakley
2017-07-20  0:21 ` [RFC PATCH v2 3/4] sha1-array: support appending unsigned char hash Jonathan Tan
2017-07-20  0:21 ` [RFC PATCH v2 4/4] sha1_file: support promised object hook Jonathan Tan
2017-07-20 18:23   ` Stefan Beller
2017-07-20 20:58     ` Ben Peart
2017-07-20 21:18       ` Jonathan Tan
2017-07-21 16:27         ` Ben Peart