git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] push: don't fetch commit object when checking existence
@ 2024-05-22 13:36 Tom Hughes
  2024-05-22 19:16 ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Hughes @ 2024-05-22 13:36 UTC
  To: git; +Cc: Tom Hughes

If we're checking to see whether to tell the user to do a fetch
before pushing there's no need for us to actually fetch the object
from the remote if the clone is partial.

Because the promisor doesn't do negotiation actually trying to do
the fetch of the new head can be very expensive as it will try and
include history that we already have and it just results in rejecting
the push with a different message, and in behavior that is different
to a clone that is not partial.

Signed-off-by: Tom Hughes <tom@compton.nu>
---
 remote.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/remote.c b/remote.c
index 2b650b813b..20395bbbd0 100644
--- a/remote.c
+++ b/remote.c
@@ -1773,7 +1773,7 @@ void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
 		if (!reject_reason && !ref->deletion && !is_null_oid(&ref->old_oid)) {
 			if (starts_with(ref->name, "refs/tags/"))
 				reject_reason = REF_STATUS_REJECT_ALREADY_EXISTS;
-			else if (!repo_has_object_file(the_repository, &ref->old_oid))
+			else if (!repo_has_object_file_with_flags(the_repository, &ref->old_oid, OBJECT_INFO_SKIP_FETCH_OBJECT))
 				reject_reason = REF_STATUS_REJECT_FETCH_FIRST;
 			else if (!lookup_commit_reference_gently(the_repository, &ref->old_oid, 1) ||
 				 !lookup_commit_reference_gently(the_repository, &ref->new_oid, 1))
-- 
2.45.1




* Re: [PATCH] push: don't fetch commit object when checking existence
  2024-05-22 13:36 [PATCH] push: don't fetch commit object when checking existence Tom Hughes
@ 2024-05-22 19:16 ` Junio C Hamano
  2024-05-22 20:15   ` [PATCH v2] " Tom Hughes
  2024-05-22 20:18   ` [PATCH] " Tom Hughes
  0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2024-05-22 19:16 UTC
  To: Tom Hughes; +Cc: git

Tom Hughes <tom@compton.nu> writes:

> If we're checking to see whether to tell the user to do a fetch
> before pushing there's no need for us to actually fetch the object
> from the remote if the clone is partial.
>
> Because the promisor doesn't do negotiation actually trying to do
> the fetch of the new head can be very expensive as it will try and
> include history that we already have and it just results in rejecting
> the push with a different message, and in behavior that is different
> to a clone that is not partial.

Interesting.  Is this something that is easily testable, perhaps by
preparing a partial clone, trying to push from there, and checking
the non-existence of the object after seeing that the push failed?

Thanks.

> Signed-off-by: Tom Hughes <tom@compton.nu>
> ---
>  remote.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/remote.c b/remote.c
> index 2b650b813b..20395bbbd0 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -1773,7 +1773,7 @@ void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
>  		if (!reject_reason && !ref->deletion && !is_null_oid(&ref->old_oid)) {
>  			if (starts_with(ref->name, "refs/tags/"))
>  				reject_reason = REF_STATUS_REJECT_ALREADY_EXISTS;
> -			else if (!repo_has_object_file(the_repository, &ref->old_oid))
> +			else if (!repo_has_object_file_with_flags(the_repository, &ref->old_oid, OBJECT_INFO_SKIP_FETCH_OBJECT))
>  				reject_reason = REF_STATUS_REJECT_FETCH_FIRST;
>  			else if (!lookup_commit_reference_gently(the_repository, &ref->old_oid, 1) ||
>  				 !lookup_commit_reference_gently(the_repository, &ref->new_oid, 1))



* [PATCH v2] push: don't fetch commit object when checking existence
  2024-05-22 19:16 ` Junio C Hamano
@ 2024-05-22 20:15   ` Tom Hughes
  2024-05-22 20:55     ` Junio C Hamano
  2024-05-23  8:58     ` Jeff King
  2024-05-22 20:18   ` [PATCH] " Tom Hughes
  1 sibling, 2 replies; 8+ messages in thread
From: Tom Hughes @ 2024-05-22 20:15 UTC
  To: gitster; +Cc: git, Tom Hughes

If we're checking to see whether to tell the user to do a fetch
before pushing there's no need for us to actually fetch the object
from the remote if the clone is partial.

Because the promisor doesn't do negotiation actually trying to do
the fetch of the new head can be very expensive as it will try and
include history that we already have and it just results in rejecting
the push with a different message, and in behavior that is different
to a clone that is not partial.

Signed-off-by: Tom Hughes <tom@compton.nu>
---
 remote.c                 |  2 +-
 t/t0410-partial-clone.sh | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/remote.c b/remote.c
index 2b650b813b..20395bbbd0 100644
--- a/remote.c
+++ b/remote.c
@@ -1773,7 +1773,7 @@ void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
 		if (!reject_reason && !ref->deletion && !is_null_oid(&ref->old_oid)) {
 			if (starts_with(ref->name, "refs/tags/"))
 				reject_reason = REF_STATUS_REJECT_ALREADY_EXISTS;
-			else if (!repo_has_object_file(the_repository, &ref->old_oid))
+			else if (!repo_has_object_file_with_flags(the_repository, &ref->old_oid, OBJECT_INFO_SKIP_FETCH_OBJECT))
 				reject_reason = REF_STATUS_REJECT_FETCH_FIRST;
 			else if (!lookup_commit_reference_gently(the_repository, &ref->old_oid, 1) ||
 				 !lookup_commit_reference_gently(the_repository, &ref->new_oid, 1))
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 88a66f0904..7797391c03 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -689,6 +689,25 @@ test_expect_success 'lazy-fetch when accessing object not in the_repository' '
 	! grep "[?]$FILE_HASH" out
 '
 
+test_expect_success 'push should not fetch new commit objects' '
+	rm -rf server client &&
+	test_create_repo server &&
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	test_commit -C server server1 &&
+
+	git clone --filter=blob:none "file://$(pwd)/server" client &&
+	test_commit -C client client1 &&
+
+	test_commit -C server server2 &&
+	COMMIT=$(git -C server rev-parse server2) &&
+
+	test_must_fail git -C client push 2>err &&
+	grep "fetch first" err &&
+	git -C client rev-list --objects --missing=print "$COMMIT" >objects &&
+	grep "^[?]$COMMIT" objects
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.45.1




* Re: [PATCH] push: don't fetch commit object when checking existence
  2024-05-22 19:16 ` Junio C Hamano
  2024-05-22 20:15   ` [PATCH v2] " Tom Hughes
@ 2024-05-22 20:18   ` Tom Hughes
  1 sibling, 0 replies; 8+ messages in thread
From: Tom Hughes @ 2024-05-22 20:18 UTC
  To: Junio C Hamano; +Cc: git

On 22/05/2024 20:16, Junio C Hamano wrote:
> Tom Hughes <tom@compton.nu> writes:
> 
>> If we're checking to see whether to tell the user to do a fetch
>> before pushing there's no need for us to actually fetch the object
>> from the remote if the clone is partial.
>>
>> Because the promisor doesn't do negotiation actually trying to do
>> the fetch of the new head can be very expensive as it will try and
>> include history that we already have and it just results in rejecting
>> the push with a different message, and in behavior that is different
>> to a clone that is not partial.
> 
> Interesting.  Is this something that is easily testable, perhaps by
> preparing a partial clone and try to push from there and checking
> the non-existence of the object after seeing that push failed?

Sure. I think I've managed to figure out a test and have sent
a second version of the patch with it added.

Tom

-- 
Tom Hughes (tom@compton.nu)
http://compton.nu/




* Re: [PATCH v2] push: don't fetch commit object when checking existence
  2024-05-22 20:15   ` [PATCH v2] " Tom Hughes
@ 2024-05-22 20:55     ` Junio C Hamano
  2024-05-22 21:46       ` Tom Hughes
  2024-05-23  8:58     ` Jeff King
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2024-05-22 20:55 UTC
  To: Tom Hughes; +Cc: git

Tom Hughes <tom@compton.nu> writes:

> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index 88a66f0904..7797391c03 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -689,6 +689,25 @@ test_expect_success 'lazy-fetch when accessing object not in the_repository' '
>  	! grep "[?]$FILE_HASH" out
>  '
>  
> +test_expect_success 'push should not fetch new commit objects' '
> +	rm -rf server client &&
> +	test_create_repo server &&
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +	test_commit -C server server1 &&

OK, we create the source that allows a partial clone.

> +	git clone --filter=blob:none "file://$(pwd)/server" client &&
> +	test_commit -C client client1 &&

And make a clone out of it, without blobs.

> +	test_commit -C server server2 &&
> +	COMMIT=$(git -C server rev-parse server2) &&

Then we create a new commit that the client does not yet have.

> +	test_must_fail git -C client push 2>err &&

We try to overwrite it.  We expect it to fail with "not a fast forward".

> +	grep "fetch first" err &&

May want to use "test_grep" but this script does not use it, so
being consistent with the surrounding tests is good.

> +	git -C client rev-list --objects --missing=print "$COMMIT" >objects &&
> +	grep "^[?]$COMMIT" objects
> +'

OK.

>  . "$TEST_DIRECTORY"/lib-httpd.sh
>  start_httpd

Looking good.  Thanks, will queue.



* Re: [PATCH v2] push: don't fetch commit object when checking existence
  2024-05-22 20:55     ` Junio C Hamano
@ 2024-05-22 21:46       ` Tom Hughes
  2024-05-22 21:58         ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Hughes @ 2024-05-22 21:46 UTC
  To: Junio C Hamano; +Cc: git

On 22/05/2024 21:55, Junio C Hamano wrote:
> Tom Hughes <tom@compton.nu> writes:
>
>> +test_expect_success 'push should not fetch new commit objects' '
>> +	rm -rf server client &&
>> +	test_create_repo server &&
>> +	test_config -C server uploadpack.allowfilter 1 &&
>> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
>> +	test_commit -C server server1 &&
> 
> OK, we create the source that allows a partial clone.
> 
>> +	git clone --filter=blob:none "file://$(pwd)/server" client &&
>> +	test_commit -C client client1 &&
> 
> And make a clone out of it, without blobs.
> 
>> +	test_commit -C server server2 &&
>> +	COMMIT=$(git -C server rev-parse server2) &&
> 
> Then we create a new commit that the client does not yet have.
> 
>> +	test_must_fail git -C client push 2>err &&
> 
> We try to overwrite it.  We expect it to fail with "not a fast forward".

Well that is what it would fail with at the moment, but it's not
what would happen with a non-partial clone - a non-partial clone
would fail with "fetch first" instead.

This patch makes both cases consistent although that wasn't the
main driver - the main driver was to stop it fetching 100MB or
more of history in the large repository I was working with when
the upstream has one new commit.
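
The cost difference described above can be pictured with a toy model.
This is only a sketch under simplifying assumptions: history is a
single linear chain of commits numbered 1..tip, and the two function
names are hypothetical, not anything in git. It shows why a fetch that
advertises no "have" lines can transfer the whole history even when
only one commit is actually new.

```c
#include <stddef.h>

/*
 * Toy model (assumption): a linear history of commits numbered 1..tip.
 * Neither function exists in git; they only illustrate the arithmetic.
 */

/*
 * Commits a server would need to send for `tip` when the client has
 * told it (via negotiation) that it already holds 1..client_has.
 */
size_t commits_sent_with_negotiation(size_t tip, size_t client_has)
{
	return tip > client_has ? tip - client_has : 0;
}

/*
 * Commits sent when the client advertises nothing: the server has to
 * assume the client holds none of the history leading up to `tip`.
 */
size_t commits_sent_without_negotiation(size_t tip)
{
	return tip;
}
```

With a 100000-commit history and one new upstream commit, the first
function yields 1 and the second yields 100000, which is the shape of
the problem the patch avoids by not fetching at all.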

>> +	grep "fetch first" err &&
> 
> May want to use "test_grep" but this script does not use it, so
> being consistent with the surrounding tests is good.

So here we are testing that it's a "fetch first" rather than
a "not a fast forward".

>> +	git -C client rev-list --objects --missing=print "$COMMIT" >objects &&
>> +	grep "^[?]$COMMIT" objects
>> +'
> 
> OK.

and also that it hasn't fetched the new commit.

Tom

-- 
Tom Hughes (tom@compton.nu)
http://compton.nu/




* Re: [PATCH v2] push: don't fetch commit object when checking existence
  2024-05-22 21:46       ` Tom Hughes
@ 2024-05-22 21:58         ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2024-05-22 21:58 UTC
  To: Tom Hughes; +Cc: git

Tom Hughes <tom@compton.nu> writes:

>>> +	test_must_fail git -C client push 2>err &&
>> We try to overwrite it.  We expect it to fail with "not a fast
>> forward".
>
> Well that is what it would fail with at the moment, but it's not
> what would happen with a non-partial clone - a non-partial clone
> would fail with "fetch first" instead.

Oh, don't get me wrong.  I wasn't trying to split hairs between the
two error modes and their phrasing.  The "fetch-first" from
set_ref_status_for_push() is done before we even initiate the
transfer to stop the operation, with a cheap check, that will
eventually lead to "not a fast forward" error.  IOW, in my mind,
they are the same errors, just diagnosed at two different places in
the code and their messages phrased differently.
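
The ordering of these checks can be sketched as a small classifier.
This is a simplified model, not the code in remote.c: the struct and
enum names are hypothetical, the boolean fields stand in for the real
object lookups, and the last two outcomes (needs-force and
non-fast-forward) are assumptions about the branches that follow the
hunk quoted in this thread. Like the quoted condition, it only models
the case where the remote ref already exists and is not being deleted.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the REF_STATUS_REJECT_* reasons. */
enum toy_reject {
	TOY_REJECT_NONE,
	TOY_REJECT_ALREADY_EXISTS,
	TOY_REJECT_FETCH_FIRST,
	TOY_REJECT_NEEDS_FORCE,
	TOY_REJECT_NONFASTFORWARD
};

/* Each boolean replaces a real lookup against the object store. */
struct toy_ref {
	const char *name;
	bool old_oid_present_locally; /* do we have the remote's tip? */
	bool both_are_commits;        /* old and new both peel to commits */
	bool new_descends_from_old;   /* the update is a fast-forward */
};

/* Classify a would-be update before any data is transferred. */
enum toy_reject toy_classify_update(const struct toy_ref *ref)
{
	if (!strncmp(ref->name, "refs/tags/", 10))
		return TOY_REJECT_ALREADY_EXISTS;
	if (!ref->old_oid_present_locally)
		return TOY_REJECT_FETCH_FIRST; /* cheap, purely local check */
	if (!ref->both_are_commits)
		return TOY_REJECT_NEEDS_FORCE;
	if (!ref->new_descends_from_old)
		return TOY_REJECT_NONFASTFORWARD;
	return TOY_REJECT_NONE;
}
```

In this model a ref whose remote tip is missing locally is reported as
"fetch first" without the ancestry question ever being asked, while a
ref whose remote tip is present but has diverged falls through to the
non-fast-forward outcome.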

> So here we are testing that it's a "fetch first" and rather
> than "not a fast forward".

I think that is being overly specific, but that is fine.  As I said,
to the end users, these two errors mean the same thing (they would
need to fetch first and then integrate their changes before pushing
it out again), so it is plausible that we may in the future decide
that we want to use the same message.  When it happens, this test
must change, which may even be a good thing (it makes it clear what
the fallout from such a change looks like).

>>> +	git -C client rev-list --objects --missing=print "$COMMIT" >objects &&
>>> +	grep "^[?]$COMMIT" objects
>>> +'
>> OK.
>
> and also that it hasn't fetched the new commit.

Yes, and this is a good check that will stand the test of time, even
across a change to rephrase the error message.

Thanks.




* Re: [PATCH v2] push: don't fetch commit object when checking existence
  2024-05-22 20:15   ` [PATCH v2] " Tom Hughes
  2024-05-22 20:55     ` Junio C Hamano
@ 2024-05-23  8:58     ` Jeff King
  1 sibling, 0 replies; 8+ messages in thread
From: Jeff King @ 2024-05-23  8:58 UTC
  To: Tom Hughes; +Cc: gitster, git

On Wed, May 22, 2024 at 09:15:40PM +0100, Tom Hughes wrote:

> diff --git a/remote.c b/remote.c
> index 2b650b813b..20395bbbd0 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -1773,7 +1773,7 @@ void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
>  		if (!reject_reason && !ref->deletion && !is_null_oid(&ref->old_oid)) {
>  			if (starts_with(ref->name, "refs/tags/"))
>  				reject_reason = REF_STATUS_REJECT_ALREADY_EXISTS;
> -			else if (!repo_has_object_file(the_repository, &ref->old_oid))
> +			else if (!repo_has_object_file_with_flags(the_repository, &ref->old_oid, OBJECT_INFO_SKIP_FETCH_OBJECT))
>  				reject_reason = REF_STATUS_REJECT_FETCH_FIRST;
>  			else if (!lookup_commit_reference_gently(the_repository, &ref->old_oid, 1) ||
>  				 !lookup_commit_reference_gently(the_repository, &ref->new_oid, 1))

This makes sense to me, as we're just speculatively asking "do we have
the object". I think for that reason it would also be reasonable to use
OBJECT_INFO_QUICK here, which would avoid a fruitless re-scan of the
local objects/ directory. We often pair the two[1].

In practice, though, I think fetching the missing object is going to be
much more expensive than a local re-scan. We tend to notice the latter
only when you have a large number of objects to check, and here we're
basically limited by the number of non-fast-forward refs you're trying
to push.

So I also think it would be OK to leave it here and only do QUICK if
somebody ever notices it.

-Peff

[1] We've talked about unifying those two flags, since they so often
    come together. There's some discussion in:

      https://lore.kernel.org/git/20191011220822.154063-1-jonathantanmy@google.com/

    that they could become one flag, but these two:

      https://lore.kernel.org/git/20190909222101.GB31319@sigill.intra.peff.net/

      https://lore.kernel.org/git/20200322054916.GB578498@coredump.intra.peff.net/

    argue that QUICK implies SKIP_FETCH, but not always the other way
    around. (Obviously getting a bit off topic for your patch; if
    anything, I think this call site would just use both for now).
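
To make the difference between the two flags concrete, here is a
minimal sketch with a mock object store. Everything in it is an
assumption for illustration: the TOY_* flag values, the struct, and
the lookup function are hypothetical and only mirror the behaviour
described in this message (QUICK skips the local re-scan, SKIP_FETCH
skips the lazy fetch). It is not git's object lookup code.

```c
#include <stdbool.h>

/* Hypothetical flag bits; only the idea comes from the thread. */
#define TOY_SKIP_FETCH_OBJECT (1u << 0)
#define TOY_QUICK             (1u << 1)

struct toy_store {
	bool in_loaded_packs;  /* found without any extra work */
	bool in_new_pack;      /* found only after re-scanning objects/ */
	unsigned rescans;      /* how many re-scans were performed */
	unsigned lazy_fetches; /* how many promisor fetches were made */
};

/* Report whether the object is available, counting the work done. */
bool toy_has_object(struct toy_store *s, unsigned flags)
{
	if (s->in_loaded_packs)
		return true;
	if (!(flags & TOY_QUICK)) {
		s->rescans++; /* the re-scan QUICK would avoid */
		if (s->in_new_pack)
			return true;
	}
	if (!(flags & TOY_SKIP_FETCH_OBJECT)) {
		s->lazy_fetches++; /* the expensive step */
		return true; /* toy assumption: the fetch always succeeds */
	}
	return false;
}
```

For a missing object, passing only TOY_SKIP_FETCH_OBJECT costs one
re-scan and no fetch, while passing both flags costs neither, which
matches the point that the re-scan is the smaller of the two costs.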


