git@vger.kernel.org mailing list mirror (one of many)
From: Jonathan Nieder <jrnieder@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH] fetch: remove fetch_if_missing=0
Date: Fri, 1 Nov 2019 15:05:37 -0700
Message-ID: <20191101220537.GA249573@google.com>
In-Reply-To: <20191101203830.231676-1-jonathantanmy@google.com>

Hi,

Jonathan Tan wrote:

> In fetch_pack() (and all functions it calls), pass
> OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be
> a tree or blob that we do not want to be lazy-fetched even if it is
> absent. Thus, the only lazy-fetches occurring for trees and blobs are
> when resolving deltas.
>
> Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove
> this, and also add a test ensuring that such objects are not
> lazy-fetched. (We might be able to remove fetch_if_missing=0 from other
> places too, but I have limited myself to builtin/fetch.c in this commit
> because I have not written tests for the other commands yet.)

Hooray!  Thanks much, this looks easier to maintain.

> Note that commits and tags may still be lazy-fetched. I limited myself
> to objects that could be trees or blobs here because Git does not
> support creating such commit- and tag-excluding clones yet, and even if
> such a clone were manually created, Git does not have good support for
> fetching a single commit (when fetching a commit, it and all its
> ancestors would be sent).

Is there a place we could put a NEEDSWORK comment to avoid confusion
when debugging if this gets introduced later?

Even if not, this seems like a sensible choice.

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> I've verified that this also solves the bug explained in:
> https://public-inbox.org/git/20191007181825.13463-1-jonathantanmy@google.com/

Might be worth mentioning the example from there in the commit message
as well, to help explain the context behind the change.

I would still be in favor of applying that more conservative change to
"master", even this late in the -rc cycle.

[...]
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map)
>  	 * we need all direct targets to exist.
>  	 */
>  	for (r = rm; r; r = r->next) {
> -		if (!has_object_file(&r->old_oid))
> +		if (!has_object_file_with_flags(&r->old_oid,
> +						OBJECT_INFO_SKIP_FETCH_OBJECT))

Yes.

[...]
> @@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>  
>  	packet_trace_identity("fetch");
>  
> -	fetch_if_missing = 0;
> -

This is the scary part, but in an "uncomfortably exciting" sense rather
than a worrying one.  Thanks for adding a test.

[...]
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -673,7 +673,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>  		struct object *o;
>  
>  		if (!has_object_file_with_flags(&ref->old_oid,
> -						OBJECT_INFO_QUICK))
> +						OBJECT_INFO_QUICK |
> +							OBJECT_INFO_SKIP_FETCH_OBJECT))

Should we make OBJECT_INFO_QUICK always imply
OBJECT_INFO_SKIP_FETCH_OBJECT?  I would suspect that if we are willing to
avoid checking thoroughly locally, checking remotely would be even more
undesirable.

[...]
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -296,6 +296,75 @@ test_expect_success 'partial clone with unresolvable sparse filter fails cleanly
>  	test_i18ngrep "unable to parse sparse filter data in" err
>  '
>  
> +setup_triangle () {
> +	rm -rf big-blob.txt server client promisor-remote &&
> +
> +	touch big-blob.txt &&

Tests seem to prefer spelling this as

	>big-blob.txt &&

because that specifies the content of the file.
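
A small sketch of the difference, run in a scratch directory (the
"work" variable and the "stale" content are made up for illustration).
In setup_triangle the preceding rm -rf already removes the file, so
both spellings behave the same there; the redirection only makes the
empty starting content explicit.

```shell
# Scratch directory so nothing outside it is touched (hypothetical setup).
work=$(mktemp -d)

# Start with a file that already has content.
echo stale >"$work/big-blob.txt"

# touch only updates timestamps; the old content survives.
touch "$work/big-blob.txt"
after_touch=$(cat "$work/big-blob.txt")

# The redirection truncates, so the file is known to be empty afterwards.
>"$work/big-blob.txt"
after_redirect=$(cat "$work/big-blob.txt")

echo "after touch: '$after_touch', after redirect: '$after_redirect'"
```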

> +	for i in $(seq 1 100)
> +	do
> +		echo line $i >>big-blob.txt
> +	done &&

Should this use test_seq for better portability?

nit: can avoid a subshell:

	test_seq 1 100 | sed -e 's/^/line /' >big-blob.txt
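
As a rough check that the two spellings produce the same file, here is
a sketch that runs outside t/. The test_seq function below is only a
stand-in written for this example (inside a test script the real
test-lib helper would be used), and "seqdir", loop.txt and pipe.txt are
made-up names.

```shell
# Stand-in for test-lib's test_seq so the snippet runs outside t/.
test_seq () {
	i=$1
	while test "$i" -le "$2"
	do
		echo "$i"
		i=$((i + 1))
	done
}

seqdir=$(mktemp -d)

# Loop form from the patch, written to its own file for comparison.
for i in $(test_seq 1 100)
do
	echo line $i >>"$seqdir/loop.txt"
done

# Pipeline form suggested above.
test_seq 1 100 | sed -e 's/^/line /' >"$seqdir/pipe.txt"

# cmp exits zero only if both files are byte-for-byte identical.
cmp "$seqdir/loop.txt" "$seqdir/pipe.txt" && echo same
```

The pipeline also overwrites big-blob.txt with ">" instead of appending
with ">>", so it does not depend on the file having been emptied first.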

[...]
> +test_expect_success 'fetch lazy-fetches only to resolve deltas' '
> +	setup_triangle &&
> +
> +	# Exercise to make sure it works. Git will not fetch anything from the
> +	# promisor remote other than for the big blob (because it needs to
> +	# resolve the delta).
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
> +		fetch "file://$(pwd)/server" master &&
> +
> +	# Verify the assumption that the client needed to fetch the delta base
> +	# to resolve the delta.
> +	git hash-object big-blob.txt >hash &&
> +	grep "want $(cat hash)" trace

nit: can avoid using cat:

	hash=$(git hash-object big-blob.txt) &&
	grep "want $hash" trace

Thanks and hope that helps,
Jonathan
