From: Jeff Hostetler <git@jeffhostetler.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
Jeff Hostetler <jeffhost@microsoft.com>,
git@vger.kernel.org, markbt@efaref.net, benpeart@microsoft.com,
jonathantanmy@google.com
Subject: Re: [PATCH 06/10] rev-list: add --allow-partial option to relax connectivity checks
Date: Thu, 9 Mar 2017 13:38:35 -0500
Message-ID: <cb3418b8-8ecf-a38b-67d3-e73dc28d4f69@jeffhostetler.com>
In-Reply-To: <20170309075642.jy5o353ann524k7f@sigill.intra.peff.net>

On 3/9/2017 2:56 AM, Jeff King wrote:
> On Wed, Mar 08, 2017 at 03:10:54PM -0500, Jeff Hostetler wrote:
>
>>> Even though I do very much like the basic "high level" premise to
>>> omit often useless large blobs that are buried deep in the history
>>> we would not necessarily need from the initial cloning and
>>> subsequent fetches, I find it somewhat disturbing that the code
>>> "Assume"s that any missing blob is due to a previous partial clone.
>>> Adding this option smells like telling the users that they are not
>>> supposed to run "git fsck" because a partially cloned repository is
>>> inherently a corrupt repository.
>>>
>>> Can't we do a bit better? If we want to make the world safer again,
>>> what additional complexity is required to allow us to tell the
>>> "missing by design" and "corrupt repository" apart?
>>
>> I'm open to suggestions here. It would be nice to extend the
>> fetch-pack/upload-pack protocol to return a list of the SHAs
>> (and maybe the sizes) of the omitted blobs, so that a partial
>> clone or fetch would still be able to be integrity checked.
>
> Yeah, the early external-odb patches did this. It lets you do a more
> accurate fsck, and it also helps diff avoid faulting in large-object
> cases (because we can mark them as binary for "free" by comparing the
> size to big_file_threshold).
>
> So I think it makes a lot of sense in the large-blob case, where
> transmitting a type/size/sha1 tuple is way more efficient than sending
> the blob itself. But it's less clear for "sparse" cases where just
> enumerating the set of blobs may be prohibitively large.
>
> I have a feeling that the "sparse" thing needs to be handled separately
> from "partial". IOW, the client needs to tell the server "I'm only
> interested in the path foo/bar, so just send that". Then you don't find
> out about the types and sizes outside of that path, but you don't need
> to; the sparse path is stored locally and fsck knows to avoid looking
> into it.
>
> -Peff
>
That makes sense. I'd like to get both concepts (by-size/special vs
sparse-file) in, but they don't really overlap that much (internally).
So I could see doing this in 2 separate efforts.

Thanks,
Jeff
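
To make the distinction discussed above concrete, here is a minimal
sketch of how a checker might tell "missing by design" apart from
"corrupt repository" once the client has a list of omitted blobs. It is
not code from this patch series or from Git. The names OmittedBlob,
classify_missing, treat_as_binary and inside_sparse_paths are
hypothetical, the manifest is assumed to be a plain mapping from SHA-1
to size, and BIG_FILE_THRESHOLD is assumed to mirror the 512 MiB default
of core.bigFileThreshold. The exact comparison Git uses against that
threshold may differ.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

# Assumption: same value as the default of Git's core.bigFileThreshold.
BIG_FILE_THRESHOLD = 512 * 1024 * 1024


@dataclass(frozen=True)
class OmittedBlob:
    """Hypothetical manifest entry for a blob the server left out."""
    sha1: str
    size: int


def classify_missing(
    referenced: Iterable[str],
    present: Set[str],
    omitted: Dict[str, OmittedBlob],
) -> Tuple[List[str], List[str]]:
    """Split absent blobs into (missing by design, possibly corrupt)."""
    by_design: List[str] = []
    corrupt: List[str] = []
    for sha1 in referenced:
        if sha1 in present:
            continue  # object exists locally, nothing to report
        if sha1 in omitted:
            by_design.append(sha1)  # the server told us it skipped this one
        else:
            corrupt.append(sha1)  # absent and unexplained
    return by_design, corrupt


def treat_as_binary(
    sha1: str,
    omitted: Dict[str, OmittedBlob],
    threshold: int = BIG_FILE_THRESHOLD,
) -> bool:
    """Decide from the recorded size alone, without fetching the blob."""
    entry = omitted.get(sha1)
    return entry is not None and entry.size >= threshold


def inside_sparse_paths(path: str, sparse_paths: Iterable[str]) -> bool:
    """True if path lies within one of the locally stored sparse paths."""
    for prefix in sparse_paths:
        prefix = prefix.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False
```

In the by-size case the checker consults the manifest. In the sparse
case no manifest is needed: a checker would only report an absent blob
when inside_sparse_paths() is true for its path, which is one reason the
two concepts overlap so little internally.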
Thread overview: 32+ messages
2017-03-08 17:37 [PATCH 00/10] RFC Partial Clone and Fetch Jeff Hostetler
2017-03-08 17:37 ` [PATCH 01/10] pack-objects: eat CR in addition to LF after fgets Jeff Hostetler
2017-03-09 7:01 ` Jeff King
2017-03-09 15:46 ` Jeff Hostetler
2017-03-08 17:37 ` [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special Jeff Hostetler
2017-03-08 18:47 ` Junio C Hamano
2017-03-08 20:21 ` Jeff Hostetler
2017-03-09 7:04 ` Jeff King
2017-03-10 17:58 ` Brandon Williams
2017-03-10 18:03 ` Jeff King
2017-03-10 19:38 ` Junio C Hamano
2017-03-10 19:47 ` Jeff King
2017-03-09 7:31 ` Jeff King
2017-03-09 18:26 ` Jeff Hostetler
2017-03-08 17:37 ` [PATCH 03/10] pack-objects: test for --partial-by-size --partial-special Jeff Hostetler
2017-03-09 7:35 ` Jeff King
2017-03-09 18:11 ` Johannes Sixt
2017-03-08 17:37 ` [PATCH 04/10] upload-pack: add partial (sparse) fetch Jeff Hostetler
2017-03-09 7:48 ` Jeff King
2017-03-09 18:34 ` Jeff Hostetler
2017-03-09 19:09 ` Jeff King
2017-03-08 17:38 ` [PATCH 05/10] fetch-pack: add partial-by-size and partial-special Jeff Hostetler
2017-03-08 17:38 ` [PATCH 06/10] rev-list: add --allow-partial option to relax connectivity checks Jeff Hostetler
2017-03-08 18:55 ` Junio C Hamano
2017-03-08 20:10 ` Jeff Hostetler
2017-03-09 7:56 ` Jeff King
2017-03-09 18:38 ` Jeff Hostetler [this message]
2017-03-08 17:38 ` [PATCH 07/10] index-pack: add --allow-partial option to relax blob existence checks Jeff Hostetler
2017-03-08 17:38 ` [PATCH 08/10] fetch: add partial-by-size and partial-special arguments Jeff Hostetler
2017-03-08 17:38 ` [PATCH 09/10] clone: " Jeff Hostetler
2017-03-08 17:38 ` [PATCH 10/10] ls-partial: created command to list missing blobs Jeff Hostetler