git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: robert.coup@koordinates.com
Cc: git@vger.kernel.org, Jonathan Tan <jonathantanmy@google.com>
Subject: Re: partial clone: promisor fetch during push (pack-objects)
Date: Fri, 26 Feb 2021 12:54:58 -0800	[thread overview]
Message-ID: <20210226205458.2909811-1-jonathantanmy@google.com> (raw)
In-Reply-To: <CAFLLRpJgfseK5P8ZJm7iEW1onf7ROVSkyeuPfh1+qoHHjsC8uw@mail.gmail.com>

> For a partially-cloned repository, push (pack-objects) does a fetch
> for missing objects to do delta compression with.
> 
> Test-case to reproduce, with annotated output:
> https://gist.github.com/rcoup/5593a0627cca52f226580c72743d8e33
> (git v2.30.1.602.g966e671106)

I found it difficult to understand the gist (there were what I think
are unrelated packings and unpackagings of objects, for example), but
what I gleaned is that you did a bare partial clone (thus, no checkout,
so no blobs get fetched at all), wrote a new blob to the repo, manually
created a tree with the ID of that blob and the IDs of some missing
blobs taken from the current HEAD tree, committed, and then pushed that
commit.

> My use case is partially-cloning some large repositories with the
> majority of the blobs not fetched locally, writing some blobs to
> replace some specific paths I don't currently have cloned, committing
> & pushing (while leaving most trees as-is). I didn't expect git to
> attempt promisor fetches during a push :)
> 
> I happened to see it because allowAnySha1InWant wasn't enabled, so it
> printed errors but happily continued and completed the push (after a
> fetch attempt for each object).
> 
> My current workaround is setting `pack.window=0` during push. Looks
> from builtin/pack_objects.c:prepare_pack() that `pack.depth=0` should
> skip it too, but that didn't seem to work.

Ah, thanks for this. So the cause is not that we are unnecessarily
pushing the missing objects, but because they appear in the window when
we do delta compression (just like you said at the start).

> Invoking pack-objects directly with any --missing= value still tries
> to fetch. And regardless, it carries on if the fetches fail. The
> fetches happen at the end of builtin/pack_objects.c:check_object().

To clarify, the "end" here is the call to prefetch_to_pack(), which
indeed bypasses fetch_if_missing (set by some --missing= arguments).

> 1. Feels to me that fetching from a promisor remote is never going to
> be quicker than skipping delta compression during a push. Maybe
> there's a case for doing it during other pack invocations to minimise
> size though?

It makes sense to me that we shouldn't fetch in this situation.

> 2. Seems like a bug that check_object() doesn't honour
> fetch_if_missing and skip the call to prefetch_to_pack().

We do need to fetch if we're trying to pack a missing object. I think
the real bug here is that we shouldn't prefetch if the to_pack entry has
preferred_base set (which means that it will not go into the final
packfile and is just available for delta compression).

> 3. But push doesn't pass --missing= to pack-objects anyway, so that
> wouldn't actually solve the original issue. Should it?

If we fix the bug I described above, I think it's still OK if push
doesn't pass --missing=.

  reply	other threads:[~2021-02-26 20:59 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-26 12:01 partial clone: promisor fetch during push (pack-objects) Robert Coup
2021-02-26 20:54 ` Jonathan Tan [this message]
2021-03-01 11:57   ` Robert Coup
2021-03-01 20:55     ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210226205458.2909811-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=robert.coup@koordinates.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).