git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [Design RFC] Being more defensive about fetching commits in partial clone
Date: Mon, 28 Nov 2022 20:31:04 -0500	[thread overview]
Message-ID: <Y4Vg2FkoX/Z8G2Cd@coredump.intra.peff.net> (raw)
In-Reply-To: <20221128185320.2735382-1-jonathantanmy@google.com>

On Mon, Nov 28, 2022 at 10:53:19AM -0800, Jonathan Tan wrote:

> > In general, I think partial clones tend to know which filters were used
> > to create them, because we save that filter in the config. Would it be
> > reasonable before lazy-fetching to say "I am looking for an object of
> > type X; would my configured filters have skipped such an object?".
> > 
> > Then not only would you get the behavior you want for commits (which are
> > never skipped), but a blob:none clone would not try to lazy-fetch trees
> > (but a tree:depth one would lazy-fetch both trees and blobs).
> > 
> > The gotcha I'd worry about is that the config doesn't necessarily match
> > how the repository was originally created. There is nothing right now
> > saying you cannot partial-clone with one filter, then change the config
> > going forward.
> 
> Thanks for weighing in. In the general case, we indeed do know what kind of
> object we're fetching, so it does make sense to generalize my idea to trees and
> blobs as well. On the other hand, though, besides the issue that the user may
> subsequently change the config, the benefits of distinguishing between a blob
> and tree are not that great, I think - we would fail fast when, say, a tree
> is missing due to object store corruption when trying to check out something
> in a blob:none clone, but the same object store corruption probably would
> mean that commits are missing too, so even if we just did this for commits, we
> would already fail equally as fast. Because of this (and the user being able
> to change the config), it makes sense to me to handle only the commit case, at
> least for now.
> 
> Having said that, this is not forwards incompatible with restricting lazy fetch
> for trees and blobs, so if we see fit later to do that, we can still do it.

Yeah, I think your reasoning is sound. Certainly it seems reasonable to
start with commit objects, since we know they can never be filtered, and
see how that fares in practice. And then think about handling other
object types later (or never; this is really just a "we might catch
corruption sooner" nice-to-have, and not a correctness problem in normal
use).

-Peff

      reply	other threads:[~2022-11-29  1:31 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-24  0:42 [Design RFC] Being more defensive about fetching commits in partial clone Jonathan Tan
2022-11-24  1:04 ` Jeff King
2022-11-28 18:53   ` Jonathan Tan
2022-11-29  1:31     ` Jeff King [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y4Vg2FkoX/Z8G2Cd@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).