git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: Jonathan Tan <jonathantanmy@google.com>,
	Patrick Steinhardt <ps@pks.im>, Taylor Blau <me@ttaylorr.com>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	Git Users <git@vger.kernel.org>,
	"brian m . carlson" <sandals@crustytoothpaste.net>
Subject: Re: "bad revision" fetch error when fetching missing objects from partial clones
Date: Fri, 14 May 2021 03:27:12 -0400	[thread overview]
Message-ID: <YJ4mUJ+EEAnudI3G@coredump.intra.peff.net> (raw)
In-Reply-To: <YJ0FL3zr/SnWN7t6@coredump.intra.peff.net>

[+cc Jonathan Tan for partial clone wisdom]

On Thu, May 13, 2021 at 06:53:36AM -0400, Jeff King wrote:

> > So I think it will require some digging. My _guess_ is that it has to do
> > with the "never filter out an object that was explicitly requested" rule
> > not being consistently followed. But we'll see.
> 
> OK, I think I've unraveled the mysteries.

Nope. This part is wrong:

> It is indeed a problem with the "never filter out an object that was
> explicitly requested" rule. But really, that rule is stronger: it is
> "always include an explicitly requested object". I.e., it must override
> even the usual "you said commit X was uninteresting, so we did not
> include objects reachable from X" logic.

The rule really is the softer "don't filter out explicitly-requested
objects". It's just that the non-bitmap traversal code is way less
careful about finding uninteresting objects.

Here's the same scenario failing without using bitmaps at all:

  # same as before; repo with one commit
  git init repo
  cd repo
  
  echo content >file
  git add file
  git commit -m base

  # and now we make a partial clone omitting the blob
  git config uploadpack.allowfilter true
  git clone --no-local --bare --filter=blob:none . clone

  # but here's the twist. Now we make a second commit...
  echo more >file
  git commit -am more

  # ...and then we fetch both the object we need _and_ that second
  # commit. That causes pack-objects to traverse from base..more.
  # The boundary is at "base", so we mark its tree and blob as
  # UNINTERESTING, and thus we _don't_ send them.
  cd clone
  git fetch origin $(git rev-parse HEAD:file) HEAD

So I guess the first question is: is this supposed to work? Without
bitmaps, it often will. Because we walk commits first, and then only
mark trees uninteresting at the boundary; so if there were more commits
here, and we were asking to get a blob from one of the middle ones, it
would probably work. But fundamentally the client is lying to the server
here (as all partial clones must); it is saying "I have that first
commit", but of course we don't have all of the reachable objects.

If this is supposed to work, I think we need to teach the traversal code
to "add back" all of the objects that were explicitly given when a
filter is in use (either explicitly, or perhaps just clearing or
avoiding the UNINTERESTING flag on user-given objects in the first
place). And my earlier patch does that for the bitmap side, but not the
regular traversal.

Alternatively, part of that fundamental "partial clones are lying about
their reachability" property is that we assume they can fetch the
objects they need on-demand. And indeed, the on-demand fetch we'd do
will use the noop negotiation algorithm, and will succeed. So should we,
after receiving an empty pack from the other side (because we claimed to
have objects reachable from the commit), then do a follow-up on-demand
fetch? If so, then why isn't that kicking in (I can guess there might be
some limits to on-demand fetching during a fetch itself, in order to
avoid infinite recursion)?

-Peff

  reply	other threads:[~2021-05-14  7:27 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-07 12:56 "bad revision" fetch error when fetching missing objects from partial clones Bagas Sanjaya
2021-05-11 18:44 ` Jeff Hostetler
2021-05-13  9:57   ` Jeff King
2021-05-13 10:53     ` Jeff King
2021-05-14  7:27       ` Jeff King [this message]
2021-05-17  6:03         ` Patrick Steinhardt
2021-05-17  6:31           ` Jeff King
2021-05-17 16:25         ` Derrick Stolee
2021-05-18  8:11           ` Jeff King
2021-05-18 10:14             ` Bagas Sanjaya
2021-05-18 14:04               ` Derrick Stolee
2021-05-15  6:52     ` Bagas Sanjaya

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YJ4mUJ+EEAnudI3G@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=bagasdotme@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=ps@pks.im \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).