git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] fetch: remove fetch_if_missing=0
@ 2019-11-01 20:38 Jonathan Tan
  2019-11-01 22:05 ` Jonathan Nieder
  2019-11-05 18:56 ` [PATCH v2] " Jonathan Tan
  0 siblings, 2 replies; 12+ messages in thread
From: Jonathan Tan @ 2019-11-01 20:38 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, gitster

In fetch_pack() (and all functions it calls), pass
OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be
a tree or blob that we do not want to be lazy-fetched even if it is
absent. Thus, the only lazy-fetches occurring for trees and blobs are
when resolving deltas.

Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove
this, and also add a test ensuring that such objects are not
lazy-fetched. (We might be able to remove fetch_if_missing=0 from other
places too, but I have limited myself to builtin/fetch.c in this commit
because I have not written tests for the other commands yet.)

Note that commits and tags may still be lazy-fetched. I limited myself
to objects that could be trees or blobs here because Git does not
support creating such commit- and tag-excluding clones yet, and even if
such a clone were manually created, Git does not have good support for
fetching a single commit (when fetching a commit, it and all its
ancestors would be sent).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
I've verified that this also solves the bug explained in:
https://public-inbox.org/git/20191007181825.13463-1-jonathantanmy@google.com/

Quoting what I wrote from there:

> (Alternatively, we
> could add the equivalent of OBJECT_INFO_SKIP_FETCH_OBJECT to functions
> like parse_commit() that are used by files like negotiator/default.c, or
> split up commit parsing into object reading - which already has that
> flag - and commit parsing.)

This commit adds OBJECT_INFO_SKIP_FETCH_OBJECT to functions, but not the
parse_commit() mentioned - as stated above, this commit only handles
objects that could be trees or blobs.
---
 builtin/fetch.c          |  5 ++-
 fetch-pack.c             |  3 +-
 t/t5616-partial-clone.sh | 69 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 0c345b5dfe..5ff7367dd7 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map)
 	 * we need all direct targets to exist.
 	 */
 	for (r = rm; r; r = r->next) {
-		if (!has_object_file(&r->old_oid))
+		if (!has_object_file_with_flags(&r->old_oid,
+						OBJECT_INFO_SKIP_FETCH_OBJECT))
 			return -1;
 	}
 
@@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 	packet_trace_identity("fetch");
 
-	fetch_if_missing = 0;
-
 	/* Record the command line for the reflog */
 	strbuf_addstr(&default_rla, "fetch");
 	for (i = 1; i < argc; i++)
diff --git a/fetch-pack.c b/fetch-pack.c
index 0130b44112..37178e2d34 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -673,7 +673,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 		struct object *o;
 
 		if (!has_object_file_with_flags(&ref->old_oid,
-						OBJECT_INFO_QUICK))
+						OBJECT_INFO_QUICK |
+							OBJECT_INFO_SKIP_FETCH_OBJECT))
 			continue;
 		o = parse_object(the_repository, &ref->old_oid);
 		if (!o)
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 79f7b65f8c..1081ed2de0 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -296,6 +296,75 @@ test_expect_success 'partial clone with unresolvable sparse filter fails cleanly
 	test_i18ngrep "unable to parse sparse filter data in" err
 '
 
+setup_triangle () {
+	rm -rf big-blob.txt server client promisor-remote &&
+
+	touch big-blob.txt &&
+	for i in $(seq 1 100)
+	do
+		echo line $i >>big-blob.txt
+	done &&
+
+	# Create a server with 2 commits: a commit with a big blob and a child
+	# commit with an incremental change. Also, create a partial clone
+	# client that only contains the first commit.
+	git init server &&
+	git -C server config --local uploadpack.allowfilter 1 &&
+	cp big-blob.txt server &&
+	git -C server add big-blob.txt &&
+	git -C server commit -m "initial" &&
+	git clone --bare --filter=tree:0 "file://$(pwd)/server" client &&
+	echo another line >>server/big-blob.txt &&
+	git -C server commit -am "append line to big blob" &&
+
+	# Create a promisor remote that only contains the blob from the first
+	# commit, and set it as the promisor remote of client. Thus, whenever
+	# the client lazy fetches, the lazy fetch will succeed only if it is
+	# for this blob.
+	git init promisor-remote &&
+	test_commit -C promisor-remote one && # so that ref advertisement is not empty
+	git -C promisor-remote config --local uploadpack.allowanysha1inwant 1 &&
+	git -C promisor-remote hash-object -w --stdin <big-blob.txt &&
+	git -C client remote set-url origin "file://$(pwd)/promisor-remote"
+}
+
+test_expect_success 'fetch lazy-fetches only to resolve deltas' '
+	setup_triangle &&
+
+	# Exercise to make sure it works. Git will not fetch anything from the
+	# promisor remote other than for the big blob (because it needs to
+	# resolve the delta).
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
+		fetch "file://$(pwd)/server" master &&
+
+	# Verify the assumption that the client needed to fetch the delta base
+	# to resolve the delta.
+	git hash-object big-blob.txt >hash &&
+	grep "want $(cat hash)" trace
+'
+
+test_expect_success 'fetch lazy-fetches only to resolve deltas, protocol v2' '
+	setup_triangle &&
+
+	git -C server config --local protocol.version 2 &&
+	git -C client config --local protocol.version 2 &&
+	git -C promisor-remote config --local protocol.version 2 &&
+
+	# Exercise to make sure it works. Git will not fetch anything from the
+	# promisor remote other than for the big blob (because it needs to
+	# resolve the delta).
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
+		fetch "file://$(pwd)/server" master &&
+
+	# Verify that protocol version 2 was used.
+	grep "fetch< version 2" trace &&
+
+	# Verify the assumption that the client needed to fetch the delta base
+	# to resolve the delta.
+	git hash-object big-blob.txt >hash &&
+	grep "want $(cat hash)" trace
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch: remove fetch_if_missing=0
  2019-11-01 20:38 [PATCH] fetch: remove fetch_if_missing=0 Jonathan Tan
@ 2019-11-01 22:05 ` Jonathan Nieder
  2019-11-02  5:55   ` Junio C Hamano
                     ` (2 more replies)
  2019-11-05 18:56 ` [PATCH v2] " Jonathan Tan
  1 sibling, 3 replies; 12+ messages in thread
From: Jonathan Nieder @ 2019-11-01 22:05 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster

Hi,

Jonathan Tan wrote:

> In fetch_pack() (and all functions it calls), pass
> OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be
> a tree or blob that we do not want to be lazy-fetched even if it is
> absent. Thus, the only lazy-fetches occurring for trees and blobs are
> when resolving deltas.
>
> Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove
> this, and also add a test ensuring that such objects are not
> lazy-fetched. (We might be able to remove fetch_if_missing=0 from other
> places too, but I have limited myself to builtin/fetch.c in this commit
> because I have not written tests for the other commands yet.)

Hooray!  Thanks much, this looks easier to maintain.

> Note that commits and tags may still be lazy-fetched. I limited myself
> to objects that could be trees or blobs here because Git does not
> support creating such commit- and tag-excluding clones yet, and even if
> such a clone were manually created, Git does not have good support for
> fetching a single commit (when fetching a commit, it and all its
> ancestors would be sent).

Is there a place we could put a NEEDSWORK comment to avoid confusion
when debugging if this gets introduced later?

Even if not, this seems like a sensible choice.

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> I've verified that this also solves the bug explained in:
> https://public-inbox.org/git/20191007181825.13463-1-jonathantanmy@google.com/

Might be worth mentioning the example from there in the commit message
as well, to help explain the context behind the change.

I would still be in favor of applying that more conservative change to
"master", even this late in the -rc cycle.

[...]
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map)
>  	 * we need all direct targets to exist.
>  	 */
>  	for (r = rm; r; r = r->next) {
> -		if (!has_object_file(&r->old_oid))
> +		if (!has_object_file_with_flags(&r->old_oid,
> +						OBJECT_INFO_SKIP_FETCH_OBJECT))

Yes.

[...]
> @@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  
>  	packet_trace_identity("fetch");
>  
> -	fetch_if_missing = 0;
> -

This is the scary part, but in an "uncomfortably exciting" sense rather
than a worrying one.  Thanks for adding a test.

[...]
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -673,7 +673,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>  		struct object *o;
>  
>  		if (!has_object_file_with_flags(&ref->old_oid,
> -						OBJECT_INFO_QUICK))
> +						OBJECT_INFO_QUICK |
> +							OBJECT_INFO_SKIP_FETCH_OBJECT))

Should we make OBJECT_INFO_QUICK always imply
OBJECT_INFO_SKIP_FETCH_OBJECT?  I would suspect that if we are willing to
avoid checking thoroughly locally, checking remotely would be even more
undesirable.

[...]
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -296,6 +296,75 @@ test_expect_success 'partial clone with unresolvable sparse filter fails cleanly
>  	test_i18ngrep "unable to parse sparse filter data in" err
>  '
>  
> +setup_triangle () {
> +	rm -rf big-blob.txt server client promisor-remote &&
> +
> +	touch big-blob.txt &&

Tests seem to prefer spelling this as

	>big-blob.txt &&

because that specifes the content of the file.

> +	for i in $(seq 1 100)
> +	do
> +		echo line $i >>big-blob.txt
> +	done &&

Should this use test_seq for better portability?

nit: can avoid a subshell:

	test_seq 1 100 | sed -e 's/^/line /' >big-blob.txt

[...]
> +test_expect_success 'fetch lazy-fetches only to resolve deltas' '
> +	setup_triangle &&
> +
> +	# Exercise to make sure it works. Git will not fetch anything from the
> +	# promisor remote other than for the big blob (because it needs to
> +	# resolve the delta).
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
> +		fetch "file://$(pwd)/server" master &&
> +
> +	# Verify the assumption that the client needed to fetch the delta base
> +	# to resolve the delta.
> +	git hash-object big-blob.txt >hash &&
> +	grep "want $(cat hash)" trace

nit: can avoid using cat:

	hash=$(git hash-object big-blob.txt) &&
	grep "want $hash" trace

Thanks and hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch: remove fetch_if_missing=0
  2019-11-01 22:05 ` Jonathan Nieder
@ 2019-11-02  5:55   ` Junio C Hamano
  2019-11-02  6:11     ` Eric Sunshine
  2019-11-02  5:59   ` Junio C Hamano
  2019-11-05 18:53   ` Jonathan Tan
  2 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2019-11-02  5:55 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jonathan Tan, git

Jonathan Nieder <jrnieder@gmail.com> writes:

>> @@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>>  
>>  	packet_trace_identity("fetch");
>>  
>> -	fetch_if_missing = 0;
>> -
>
> This is the scary part, but in an "uncomfortably exciting" sense rather
> than a worrying one.  Thanks for adding a test.
>
> [...]
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -673,7 +673,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>>  		struct object *o;
>>  
>>  		if (!has_object_file_with_flags(&ref->old_oid,
>> -						OBJECT_INFO_QUICK))
>> +						OBJECT_INFO_QUICK |
>> +							OBJECT_INFO_SKIP_FETCH_OBJECT))
>
> Should we make OBJECT_INFO_QUICK always imply
> OBJECT_INFO_SKIP_FETCH_OBJECT?  I would suspect that if we are willing to
> avoid checking thoroughly locally, checking remotely would be even more
> undesirable.

I think I've seen this mentioned a few times during this cycle in
the list archive ;-)

>> +	for i in $(seq 1 100)
>> +	do
>> +		echo line $i >>big-blob.txt
>> +	done &&
>
> Should this use test_seq for better portability?

Yup.

> nit: can avoid a subshell:
>
> 	test_seq 1 100 | sed -e 's/^/line /' >big-blob.txt

Yeah, but it costs process start-up and "sed" that may be rather
heavyweight.  At least

	for i in $(test_seq ...)
	do
		echo line $i
	done >big-blob.txt

would save repeated opening and closing the file, I'd think.

>> +	git hash-object big-blob.txt >hash &&
>> +	grep "want $(cat hash)" trace
>
> nit: can avoid using cat:
>
> 	hash=$(git hash-object big-blob.txt) &&
> 	grep "want $hash" trace
>
> Thanks and hope that helps,

Thanks, both.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch: remove fetch_if_missing=0
  2019-11-01 22:05 ` Jonathan Nieder
  2019-11-02  5:55   ` Junio C Hamano
@ 2019-11-02  5:59   ` Junio C Hamano
  2019-11-05 18:53   ` Jonathan Tan
  2 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2019-11-02  5:59 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jonathan Tan, git

Jonathan Nieder <jrnieder@gmail.com> writes:

> I would still be in favor of applying that more conservative change to
> "master", even this late in the -rc cycle.

I believe <20191023233403.145457-1-jonathantanmy@google.com> is the
latest incarnation of the "conservative" approach.  I think I can
buy that, but let me see if there are interactions with other topics
first.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch: remove fetch_if_missing=0
  2019-11-02  5:55   ` Junio C Hamano
@ 2019-11-02  6:11     ` Eric Sunshine
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Sunshine @ 2019-11-02  6:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Jonathan Tan, Git List

On Sat, Nov 2, 2019 at 1:55 AM Junio C Hamano <gitster@pobox.com> wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:
> > nit: can avoid a subshell:
> >
> >       test_seq 1 100 | sed -e 's/^/line /' >big-blob.txt
>
> Yeah, but it costs process start-up and "sed" that may be rather
> heavyweight.  At least
>
>         for i in $(test_seq ...)
>         do
>                 echo line $i
>         done >big-blob.txt
>
> would save repeated opening and closing the file, I'd think.

More bikeshedding:

    printf "line %d\n" $(test_seq 1 100) >big-blob.txt

is reasonably concise, perhaps easier to grok than the one-liner
incorporating 'sed', and shouldn't run afoul of command-line length
limitation.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch: remove fetch_if_missing=0
  2019-11-01 22:05 ` Jonathan Nieder
  2019-11-02  5:55   ` Junio C Hamano
  2019-11-02  5:59   ` Junio C Hamano
@ 2019-11-05 18:53   ` Jonathan Tan
  2019-11-05 18:58     ` Jonathan Nieder
  2 siblings, 1 reply; 12+ messages in thread
From: Jonathan Tan @ 2019-11-05 18:53 UTC (permalink / raw)
  To: jrnieder; +Cc: jonathantanmy, git, gitster

Thanks for your review.

> > Note that commits and tags may still be lazy-fetched. I limited myself
> > to objects that could be trees or blobs here because Git does not
> > support creating such commit- and tag-excluding clones yet, and even if
> > such a clone were manually created, Git does not have good support for
> > fetching a single commit (when fetching a commit, it and all its
> > ancestors would be sent).
> 
> Is there a place we could put a NEEDSWORK comment to avoid confusion
> when debugging if this gets introduced later?
> 
> Even if not, this seems like a sensible choice.

Done in the test.

> > I've verified that this also solves the bug explained in:
> > https://public-inbox.org/git/20191007181825.13463-1-jonathantanmy@google.com/
> 
> Might be worth mentioning the example from there in the commit message
> as well, to help explain the context behind the change.
> 
> I would still be in favor of applying that more conservative change to
> "master", even this late in the -rc cycle.

If we're applying that change first, then this no longer fixes any bug,
but is just a code cleanup. (If we don't apply that change, then I'll
include that example in the commit message.)

> Should we make OBJECT_INFO_QUICK always imply
> OBJECT_INFO_SKIP_FETCH_OBJECT?  I would suspect that if we are willing to
> avoid checking thoroughly locally, checking remotely would be even more
> undesirable.

As I wrote in [1], the implication does not really go both ways. I think
it's better to keep them separate. (Or, at least, we don't need to make
the decision in this patch.)

[1] https://public-inbox.org/git/20191011220822.154063-1-jonathantanmy@google.com/

> > +test_expect_success 'fetch lazy-fetches only to resolve deltas' '
> > +	setup_triangle &&
> > +
> > +	# Exercise to make sure it works. Git will not fetch anything from the
> > +	# promisor remote other than for the big blob (because it needs to
> > +	# resolve the delta).
> > +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
> > +		fetch "file://$(pwd)/server" master &&
> > +
> > +	# Verify the assumption that the client needed to fetch the delta base
> > +	# to resolve the delta.
> > +	git hash-object big-blob.txt >hash &&
> > +	grep "want $(cat hash)" trace
> 
> nit: can avoid using cat:
> 
> 	hash=$(git hash-object big-blob.txt) &&
> 	grep "want $hash" trace

I think it's less error-prone if we always have a "git" command on its
own on a line, to avoid losing its error code. When piped into another
invocation, or when command-substituted into an argument (e.g. "echo
$(git hash-object foo)"), we lose its error code.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] fetch: remove fetch_if_missing=0
  2019-11-01 20:38 [PATCH] fetch: remove fetch_if_missing=0 Jonathan Tan
  2019-11-01 22:05 ` Jonathan Nieder
@ 2019-11-05 18:56 ` Jonathan Tan
  2019-11-05 20:06   ` Eric Sunshine
                     ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: Jonathan Tan @ 2019-11-05 18:56 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, jrnieder, gitster, sunshine

In fetch_pack() (and all functions it calls), pass
OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be
a tree or blob that we do not want to be lazy-fetched even if it is
absent. Thus, the only lazy-fetches occurring for trees and blobs are
when resolving deltas.

Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove
this, and also add a test ensuring that such objects are not
lazy-fetched. (We might be able to remove fetch_if_missing=0 from other
places too, but I have limited myself to builtin/fetch.c in this commit
because I have not written tests for the other commands yet.)

Note that commits and tags may still be lazy-fetched. I limited myself
to objects that could be trees or blobs here because Git does not
support creating such commit- and tag-excluding clones yet, and even if
such a clone were manually created, Git does not have good support for
fetching a single commit (when fetching a commit, it and all its
ancestors would be sent).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Changes from v1:
- Added NEEDSWORK in test, suggested by Jonathan Nieder
- Used printf in test, suggested by Eric Sunshine

Range-diff against v1:
1:  f52c8d4c11 ! 1:  8e93cb8b02 fetch: remove fetch_if_missing=0
    @@ Commit message
         ancestors would be sent).
     
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
      ## builtin/fetch.c ##
     @@ builtin/fetch.c: static int check_exist_and_connected(struct ref *ref_map)
    @@ t/t5616-partial-clone.sh: test_expect_success 'partial clone with unresolvable s
     +setup_triangle () {
     +	rm -rf big-blob.txt server client promisor-remote &&
     +
    -+	touch big-blob.txt &&
    -+	for i in $(seq 1 100)
    -+	do
    -+		echo line $i >>big-blob.txt
    -+	done &&
    ++	printf "line %d\n" $(test_seq 1 100) >big-blob.txt
     +
     +	# Create a server with 2 commits: a commit with a big blob and a child
     +	# commit with an incremental change. Also, create a partial clone
    @@ t/t5616-partial-clone.sh: test_expect_success 'partial clone with unresolvable s
     +	git -C client remote set-url origin "file://$(pwd)/promisor-remote"
     +}
     +
    ++# NEEDSWORK: The tests beginning with "fetch lazy-fetches" below only
    ++# test that "fetch" avoid fetching trees and blobs, but not commits or
    ++# tags. Revisit this if Git is ever taught to support partial clones
    ++# with commits and/or tags filtered out.
    ++
     +test_expect_success 'fetch lazy-fetches only to resolve deltas' '
     +	setup_triangle &&
     +

 builtin/fetch.c          |  5 ++-
 fetch-pack.c             |  3 +-
 t/t5616-partial-clone.sh | 70 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 0c345b5dfe..5ff7367dd7 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map)
 	 * we need all direct targets to exist.
 	 */
 	for (r = rm; r; r = r->next) {
-		if (!has_object_file(&r->old_oid))
+		if (!has_object_file_with_flags(&r->old_oid,
+						OBJECT_INFO_SKIP_FETCH_OBJECT))
 			return -1;
 	}
 
@@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 	packet_trace_identity("fetch");
 
-	fetch_if_missing = 0;
-
 	/* Record the command line for the reflog */
 	strbuf_addstr(&default_rla, "fetch");
 	for (i = 1; i < argc; i++)
diff --git a/fetch-pack.c b/fetch-pack.c
index 0130b44112..37178e2d34 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -673,7 +673,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 		struct object *o;
 
 		if (!has_object_file_with_flags(&ref->old_oid,
-						OBJECT_INFO_QUICK))
+						OBJECT_INFO_QUICK |
+							OBJECT_INFO_SKIP_FETCH_OBJECT))
 			continue;
 		o = parse_object(the_repository, &ref->old_oid);
 		if (!o)
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 79f7b65f8c..e785fba7ed 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -296,6 +296,76 @@ test_expect_success 'partial clone with unresolvable sparse filter fails cleanly
 	test_i18ngrep "unable to parse sparse filter data in" err
 '
 
+setup_triangle () {
+	rm -rf big-blob.txt server client promisor-remote &&
+
+	printf "line %d\n" $(test_seq 1 100) >big-blob.txt
+
+	# Create a server with 2 commits: a commit with a big blob and a child
+	# commit with an incremental change. Also, create a partial clone
+	# client that only contains the first commit.
+	git init server &&
+	git -C server config --local uploadpack.allowfilter 1 &&
+	cp big-blob.txt server &&
+	git -C server add big-blob.txt &&
+	git -C server commit -m "initial" &&
+	git clone --bare --filter=tree:0 "file://$(pwd)/server" client &&
+	echo another line >>server/big-blob.txt &&
+	git -C server commit -am "append line to big blob" &&
+
+	# Create a promisor remote that only contains the blob from the first
+	# commit, and set it as the promisor remote of client. Thus, whenever
+	# the client lazy fetches, the lazy fetch will succeed only if it is
+	# for this blob.
+	git init promisor-remote &&
+	test_commit -C promisor-remote one && # so that ref advertisement is not empty
+	git -C promisor-remote config --local uploadpack.allowanysha1inwant 1 &&
+	git -C promisor-remote hash-object -w --stdin <big-blob.txt &&
+	git -C client remote set-url origin "file://$(pwd)/promisor-remote"
+}
+
+# NEEDSWORK: The tests beginning with "fetch lazy-fetches" below only
+# test that "fetch" avoid fetching trees and blobs, but not commits or
+# tags. Revisit this if Git is ever taught to support partial clones
+# with commits and/or tags filtered out.
+
+test_expect_success 'fetch lazy-fetches only to resolve deltas' '
+	setup_triangle &&
+
+	# Exercise to make sure it works. Git will not fetch anything from the
+	# promisor remote other than for the big blob (because it needs to
+	# resolve the delta).
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
+		fetch "file://$(pwd)/server" master &&
+
+	# Verify the assumption that the client needed to fetch the delta base
+	# to resolve the delta.
+	git hash-object big-blob.txt >hash &&
+	grep "want $(cat hash)" trace
+'
+
+test_expect_success 'fetch lazy-fetches only to resolve deltas, protocol v2' '
+	setup_triangle &&
+
+	git -C server config --local protocol.version 2 &&
+	git -C client config --local protocol.version 2 &&
+	git -C promisor-remote config --local protocol.version 2 &&
+
+	# Exercise to make sure it works. Git will not fetch anything from the
+	# promisor remote other than for the big blob (because it needs to
+	# resolve the delta).
+	GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
+		fetch "file://$(pwd)/server" master &&
+
+	# Verify that protocol version 2 was used.
+	grep "fetch< version 2" trace &&
+
+	# Verify the assumption that the client needed to fetch the delta base
+	# to resolve the delta.
+	git hash-object big-blob.txt >hash &&
+	grep "want $(cat hash)" trace
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] fetch: remove fetch_if_missing=0
  2019-11-05 18:53   ` Jonathan Tan
@ 2019-11-05 18:58     ` Jonathan Nieder
  0 siblings, 0 replies; 12+ messages in thread
From: Jonathan Nieder @ 2019-11-05 18:58 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster

Jonathan Tan wrote:

> I think it's less error-prone if we always have a "git" command on its
> own on a line, to avoid losing its error code. When piped into another
> invocation, or when command-substituted into an argument (e.g. "echo
> $(git hash-object foo)"), we lose its error code.

Sure, but assignment is a special case:

	if myvar=$(false)
	then
		echo yes
	else
		echo no
	fi

prints "no".

See "Don't use command substitution in a way that discards git's exit
code" in t/README for more details.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] fetch: remove fetch_if_missing=0
  2019-11-05 18:56 ` [PATCH v2] " Jonathan Tan
@ 2019-11-05 20:06   ` Eric Sunshine
  2019-11-06  1:45   ` Junio C Hamano
  2019-11-08  6:33   ` Junio C Hamano
  2 siblings, 0 replies; 12+ messages in thread
From: Eric Sunshine @ 2019-11-05 20:06 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git List, Jonathan Nieder, Junio C Hamano

On Tue, Nov 5, 2019 at 1:56 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> @@ -296,6 +296,76 @@ test_expect_success 'partial clone with unresolvable sparse filter fails cleanly
> +setup_triangle () {
> +       rm -rf big-blob.txt server client promisor-remote &&
> +
> +       printf "line %d\n" $(test_seq 1 100) >big-blob.txt

Broken &&-chain.

> +       # Create a server with 2 commits: a commit with a big blob and a child
> +       # commit with an incremental change. Also, create a partial clone
> +       # client that only contains the first commit.
> +       git init server &&
> +       git -C server config --local uploadpack.allowfilter 1 &&
> +       cp big-blob.txt server &&
> +       git -C server add big-blob.txt &&
> +       git -C server commit -m "initial" &&
> +       git clone --bare --filter=tree:0 "file://$(pwd)/server" client &&
> +       echo another line >>server/big-blob.txt &&
> +       git -C server commit -am "append line to big blob" &&
> +
> +       # Create a promisor remote that only contains the blob from the first
> +       # commit, and set it as the promisor remote of client. Thus, whenever
> +       # the client lazy fetches, the lazy fetch will succeed only if it is
> +       # for this blob.
> +       git init promisor-remote &&
> +       test_commit -C promisor-remote one && # so that ref advertisement is not empty
> +       git -C promisor-remote config --local uploadpack.allowanysha1inwant 1 &&
> +       git -C promisor-remote hash-object -w --stdin <big-blob.txt &&
> +       git -C client remote set-url origin "file://$(pwd)/promisor-remote"
> +}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] fetch: remove fetch_if_missing=0
  2019-11-05 18:56 ` [PATCH v2] " Jonathan Tan
  2019-11-05 20:06   ` Eric Sunshine
@ 2019-11-06  1:45   ` Junio C Hamano
  2019-11-08  6:33   ` Junio C Hamano
  2 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2019-11-06  1:45 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, sunshine

Jonathan Tan <jonathantanmy@google.com> writes:

> In fetch_pack() (and all functions it calls), pass
> OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be
> a tree or blob that we do not want to be lazy-fetched even if it is
> absent. Thus, the only lazy-fetches occurring for trees and blobs are
> when resolving deltas.
>
> Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove
> this, and also add a test ensuring that such objects are not
> lazy-fetched. (We might be able to remove fetch_if_missing=0 from other
> places too, but I have limited myself to builtin/fetch.c in this commit
> because I have not written tests for the other commands yet.)
>
> Note that commits and tags may still be lazy-fetched. I limited myself
> to objects that could be trees or blobs here because Git does not
> support creating such commit- and tag-excluding clones yet, and even if
> such a clone were manually created, Git does not have good support for
> fetching a single commit (when fetching a commit, it and all its
> ancestors would be sent).
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> Changes from v1:
> - Added NEEDSWORK in test, suggested by Jonathan Nieder
> - Used printf in test, suggested by Eric Sunshine

Nice.  The less we use big "plug the machinery to lazily fetch for
now" switch, the better, I think.

> +setup_triangle () {
> +	rm -rf big-blob.txt server client promisor-remote &&
> +
> +	printf "line %d\n" $(test_seq 1 100) >big-blob.txt

I'll tweak this line while queueing with a trailing " &&" (no need
to resend only for this).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] fetch: remove fetch_if_missing=0
  2019-11-05 18:56 ` [PATCH v2] " Jonathan Tan
  2019-11-05 20:06   ` Eric Sunshine
  2019-11-06  1:45   ` Junio C Hamano
@ 2019-11-08  6:33   ` Junio C Hamano
  2019-11-08  7:40     ` Junio C Hamano
  2 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2019-11-08  6:33 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, sunshine

Jonathan Tan <jonathantanmy@google.com> writes:

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 0c345b5dfe..5ff7367dd7 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map)
>  	 * we need all direct targets to exist.
>  	 */
>  	for (r = rm; r; r = r->next) {
> -		if (!has_object_file(&r->old_oid))
> +		if (!has_object_file_with_flags(&r->old_oid,
> +						OBJECT_INFO_SKIP_FETCH_OBJECT))
>  			return -1;
>  	}
>  
> @@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  
>  	packet_trace_identity("fetch");
>  
> -	fetch_if_missing = 0;
> -
>  	/* Record the command line for the reflog */
>  	strbuf_addstr(&default_rla, "fetch");
>  	for (i = 1; i < argc; i++)

As this is on a stale codebase, let's base the fix on top of
c32ca691c2 or later.  The part of the patch for builtin/fetch.c
adjusted for that decision would look like attached.

An interesting thing is that c32ca691c2^2 that moved the assignment
to this big red switch variable around causes 3-way merge to fail in
a miserable way.  The "moving around" would involve removing from
the same location as the rebased patch below removes, plus adding
the assignment elsewhere, so "both sides removed the assignment from
this hunk, so take that" would correctly leave the original assignment
we see in the second hunk above removed, but fails to notice that
the assignment made elsewhere (the result of "moving around" patch)
is no longer needed, because "c32ca691c2^2 added one and this change
does not do anything there, so take the addition" cleanly resolves
to an incorrect merge result.

 builtin/fetch.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 863c858fde..5ff7367dd7 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map)
 	 * we need all direct targets to exist.
 	 */
 	for (r = rm; r; r = r->next) {
-		if (!has_object_file(&r->old_oid))
+		if (!has_object_file_with_flags(&r->old_oid,
+						OBJECT_INFO_SKIP_FETCH_OBJECT))
 			return -1;
 	}
 
@@ -1822,8 +1823,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	fetch_if_missing = 0;
-
 	if (remote) {
 		if (filter_options.choice || has_promisor_remote())
 			fetch_one_setup_partial(remote);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] fetch: remove fetch_if_missing=0
  2019-11-08  6:33   ` Junio C Hamano
@ 2019-11-08  7:40     ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2019-11-08  7:40 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, sunshine

Junio C Hamano <gitster@pobox.com> writes:

> An interesting thing is that c32ca691c2^2 that moved the assignment
> to this big red switch variable around causes 3-way merge to fail in
> a miserable way.  The "moving around" would involve removing from
> the same location as the rebased patch below removes, plus adding
> the assignment elsewhere, so "both sides removed the assignment from
> this hunk, so take that" would correctly leave the original assignment
> we see in the second hunk above removed, but fails to notice that
> the assignment made elsewhere (the result of "moving around" patch)
> is no longer needed, because "c32ca691c2^2 added one and this change
> does not do anything there, so take the addition" cleanly resolves
> to an incorrect merge result.

Total tangent.  One frustrating thing is that we do this a bit
better at the tree level merge.  After read-tree does three-way
merge at the tree level, what is passed to the merge-recursive
machinery has "side A added" and "side A removed" left unresolved,
so that the post-processing phase could try to match them up and say
"aha, side A moved that path elsewhere while side B just removed,
which is a conflict".  

I wish if xdiff/xmerge.c could learn a similar trick, but the
necessary change feels quite involved, error prone and too magical.




^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-11-08  7:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-01 20:38 [PATCH] fetch: remove fetch_if_missing=0 Jonathan Tan
2019-11-01 22:05 ` Jonathan Nieder
2019-11-02  5:55   ` Junio C Hamano
2019-11-02  6:11     ` Eric Sunshine
2019-11-02  5:59   ` Junio C Hamano
2019-11-05 18:53   ` Jonathan Tan
2019-11-05 18:58     ` Jonathan Nieder
2019-11-05 18:56 ` [PATCH v2] " Jonathan Tan
2019-11-05 20:06   ` Eric Sunshine
2019-11-06  1:45   ` Junio C Hamano
2019-11-08  6:33   ` Junio C Hamano
2019-11-08  7:40     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).