git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* partial clone: promisor fetch during push (pack-objects)
@ 2021-02-26 12:01 Robert Coup
  2021-02-26 20:54 ` Jonathan Tan
  0 siblings, 1 reply; 4+ messages in thread
From: Robert Coup @ 2021-02-26 12:01 UTC (permalink / raw)
  To: git

For a partially-cloned repository, push (pack-objects) does a fetch
for missing objects to do delta compression with.

Test-case to reproduce, with annotated output:
https://gist.github.com/rcoup/5593a0627cca52f226580c72743d8e33
(git v2.30.1.602.g966e671106)

My use case is partially-cloning some large repositories with the
majority of the blobs not fetched locally, writing some blobs to
replace some specific paths I don't currently have cloned, committing
& pushing (while leaving most trees as-is). I didn't expect git to
attempt promisor fetches during a push :)

I happened to see it because allowAnySha1InWant wasn't enabled, so it
printed errors but happily continued and completed the push (after a
fetch attempt for each object).

My current workaround is setting `pack.window=0` during push. Looks
from builtin/pack_objects.c:prepare_pack() that `pack.depth=0` should
skip it too, but that didn't seem to work.

Invoking pack-objects directly with any --missing= value still tries
to fetch. And regardless, it carries on if the fetches fail. The
fetches happen at the end of builtin/pack_objects.c:check_object().

1. Feels to me that fetching from a promisor remote is never going to
be quicker than skipping delta compression during a push. Maybe
there's a case for doing it during other pack invocations to minimise
size though?

2. Seems like a bug that check_object() doesn't honour
fetch_if_missing and skip the call to prefetch_to_pack().

3. But push doesn't pass --missing= to pack-objects anyway, so that
wouldn't actually solve the original issue. Should it?

Thanks!
Rob :)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: partial clone: promisor fetch during push (pack-objects)
  2021-02-26 12:01 partial clone: promisor fetch during push (pack-objects) Robert Coup
@ 2021-02-26 20:54 ` Jonathan Tan
  2021-03-01 11:57   ` Robert Coup
  0 siblings, 1 reply; 4+ messages in thread
From: Jonathan Tan @ 2021-02-26 20:54 UTC (permalink / raw)
  To: robert.coup; +Cc: git, Jonathan Tan

> For a partially-cloned repository, push (pack-objects) does a fetch
> for missing objects to do delta compression with.
> 
> Test-case to reproduce, with annotated output:
> https://gist.github.com/rcoup/5593a0627cca52f226580c72743d8e33
> (git v2.30.1.602.g966e671106)

I found it difficult to understand the gist (there were what I think
are unrelated packings and unpackagings of objects, for example), but
what I gleaned is that you did a bare partial clone (thus, no checkout,
so no blobs get fetched at all), wrote a new blob to the repo, manually
created a tree with the ID of that blob and the IDs of some missing
blobs taken from the current HEAD tree, committed, and then pushed that
commit.

> My use case is partially-cloning some large repositories with the
> majority of the blobs not fetched locally, writing some blobs to
> replace some specific paths I don't currently have cloned, committing
> & pushing (while leaving most trees as-is). I didn't expect git to
> attempt promisor fetches during a push :)
> 
> I happened to see it because allowAnySha1InWant wasn't enabled, so it
> printed errors but happily continued and completed the push (after a
> fetch attempt for each object).
> 
> My current workaround is setting `pack.window=0` during push. Looks
> from builtin/pack_objects.c:prepare_pack() that `pack.depth=0` should
> skip it too, but that didn't seem to work.

Ah, thanks for this. So the cause is not that we are unnecessarily
pushing the missing objects, but because they appear in the window when
we do delta compression (just like you said at the start).

> Invoking pack-objects directly with any --missing= value still tries
> to fetch. And regardless, it carries on if the fetches fail. The
> fetches happen at the end of builtin/pack_objects.c:check_object().

To clarify, the "end" here is the call to prefetch_to_pack(), which
indeed bypasses fetch_if_missing (set by some --missing= arguments).

> 1. Feels to me that fetching from a promisor remote is never going to
> be quicker than skipping delta compression during a push. Maybe
> there's a case for doing it during other pack invocations to minimise
> size though?

It makes sense to me that we shouldn't fetch in this situation.

> 2. Seems like a bug that check_object() doesn't honour
> fetch_if_missing and skip the call to prefetch_to_pack().

We do need to fetch if we're trying to pack a missing object. I think
the real bug here is that we shouldn't prefetch if the to_pack entry has
preferred_base set (which means that it will not go into the final
packfile and is just available for delta compression).

> 3. But push doesn't pass --missing= to pack-objects anyway, so that
> wouldn't actually solve the original issue. Should it?

If we fix the bug I described above, I think it's still OK if push
doesn't pass --missing=.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: partial clone: promisor fetch during push (pack-objects)
  2021-02-26 20:54 ` Jonathan Tan
@ 2021-03-01 11:57   ` Robert Coup
  2021-03-01 20:55     ` Jonathan Tan
  0 siblings, 1 reply; 4+ messages in thread
From: Robert Coup @ 2021-03-01 11:57 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Hi Jonathan,

On Fri, 26 Feb 2021 at 20:55, Jonathan Tan <jonathantanmy@google.com> wrote:
> So the cause is not that we are unnecessarily
> pushing the missing objects, but because they appear in the window when
> we do delta compression (just like you said at the start).

Thanks for persevering :)

>
> > 2. Seems like a bug that check_object() doesn't honour
> > fetch_if_missing and skip the call to prefetch_to_pack().
>
> We do need to fetch if we're trying to pack a missing object. I think
> the real bug here is that we shouldn't prefetch if the to_pack entry has
> preferred_base set (which means that it will not go into the final
> packfile and is just available for delta compression).

That makes sense, I had a feeling it wouldn't be quite as simple as my
suggestion.

>
> > 3. But push doesn't pass --missing= to pack-objects anyway, so that
> > wouldn't actually solve the original issue. Should it?
>
> If we fix the bug I described above, I think it's still OK if push
> doesn't pass --missing=.

I'll have a go at a patch. Any suggestions on how to approach
writing/expanding a test to cover this?

Thanks, Rob :)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: partial clone: promisor fetch during push (pack-objects)
  2021-03-01 11:57   ` Robert Coup
@ 2021-03-01 20:55     ` Jonathan Tan
  0 siblings, 0 replies; 4+ messages in thread
From: Jonathan Tan @ 2021-03-01 20:55 UTC (permalink / raw)
  To: robert.coup; +Cc: jonathantanmy, git

> > > 3. But push doesn't pass --missing= to pack-objects anyway, so that
> > > wouldn't actually solve the original issue. Should it?
> >
> > If we fix the bug I described above, I think it's still OK if push
> > doesn't pass --missing=.
> 
> I'll have a go at a patch. Any suggestions on how to approach
> writing/expanding a test to cover this?

I think the main thing is to test that such a push doesn't also fetch.
The most straightforward and reliable way I can think of is to run "git
count-objects -v" before and after the push. There are other ways like
using GIT_TRACE_PACKET and then grep-ping for the absence of "fetch>
done" (or something like that) (but I don't like this approach because
grep-ping for absence is not robust in the face of typos) or checking
the number of files in .git/objects/pack before and after (but that is
not robust if Git decides to GC in between).

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-03-01 21:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-26 12:01 partial clone: promisor fetch during push (pack-objects) Robert Coup
2021-02-26 20:54 ` Jonathan Tan
2021-03-01 11:57   ` Robert Coup
2021-03-01 20:55     ` Jonathan Tan

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).