From: Jonathan Nieder <jrnieder@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH] fetch: remove fetch_if_missing=0
Date: Fri, 1 Nov 2019 15:05:37 -0700
Message-ID: <20191101220537.GA249573@google.com>
In-Reply-To: <20191101203830.231676-1-jonathantanmy@google.com>
Hi,
Jonathan Tan wrote:
> In fetch_pack() (and all functions it calls), pass
> OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be
> a tree or blob that we do not want to be lazy-fetched even if it is
> absent. Thus, the only lazy-fetches occurring for trees and blobs are
> when resolving deltas.
>
> Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove
> this, and also add a test ensuring that such objects are not
> lazy-fetched. (We might be able to remove fetch_if_missing=0 from other
> places too, but I have limited myself to builtin/fetch.c in this commit
> because I have not written tests for the other commands yet.)
Hooray! Thanks much, this looks easier to maintain.
> Note that commits and tags may still be lazy-fetched. I limited myself
> to objects that could be trees or blobs here because Git does not
> support creating such commit- and tag-excluding clones yet, and even if
> such a clone were manually created, Git does not have good support for
> fetching a single commit (when fetching a commit, it and all its
> ancestors would be sent).
Is there a place we could put a NEEDSWORK comment to avoid confusion
when debugging if this gets introduced later?
Even if not, this seems like a sensible choice.
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> I've verified that this also solves the bug explained in:
> https://public-inbox.org/git/20191007181825.13463-1-jonathantanmy@google.com/
Might be worth mentioning the example from there in the commit message
as well, to help explain the context behind the change.
I would still be in favor of applying that more conservative change to
"master", even this late in the -rc cycle.
[...]
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1074,7 +1074,8 @@ static int check_exist_and_connected(struct ref *ref_map)
> * we need all direct targets to exist.
> */
> for (r = rm; r; r = r->next) {
> - if (!has_object_file(&r->old_oid))
> + if (!has_object_file_with_flags(&r->old_oid,
> + OBJECT_INFO_SKIP_FETCH_OBJECT))
Yes.
[...]
> @@ -1755,8 +1756,6 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>
> packet_trace_identity("fetch");
>
> - fetch_if_missing = 0;
> -
This is the scary part, but in an "uncomfortably exciting" sense rather
than a worrying one. Thanks for adding a test.
[...]
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -673,7 +673,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
> struct object *o;
>
> if (!has_object_file_with_flags(&ref->old_oid,
> - OBJECT_INFO_QUICK))
> + OBJECT_INFO_QUICK |
> + OBJECT_INFO_SKIP_FETCH_OBJECT))
Should we make OBJECT_INFO_QUICK always imply
OBJECT_INFO_SKIP_FETCH_OBJECT? I would suspect that if we are willing to
avoid checking thoroughly locally, checking remotely would be even more
undesirable.
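To make the idea concrete, here is a rough sketch of the flag logic in shell arithmetic. The bit values and the helper name imply_skip_fetch are made up for illustration; the real constants live in Git's C headers and may differ, and this is not a proposal for how the C code should look.

```shell
# Hypothetical bit values for illustration only; the real constants
# are defined in Git's C headers and may differ.
OBJECT_INFO_QUICK=8
OBJECT_INFO_SKIP_FETCH_OBJECT=16

# imply_skip_fetch is a made-up helper: if QUICK is set in $1, print
# the flags with SKIP_FETCH_OBJECT added; otherwise print $1 unchanged.
imply_skip_fetch () {
	flags=$1
	if test $((flags & OBJECT_INFO_QUICK)) -ne 0
	then
		flags=$((flags | OBJECT_INFO_SKIP_FETCH_OBJECT))
	fi
	echo "$flags"
}

# QUICK alone gains SKIP_FETCH_OBJECT.
imply_skip_fetch "$OBJECT_INFO_QUICK"
# Without QUICK, nothing is added.
imply_skip_fetch 0
```

With something like this applied at the point where the flags are consumed, callers passing only the QUICK flag would not need to spell out the second flag.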
[...]
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -296,6 +296,75 @@ test_expect_success 'partial clone with unresolvable sparse filter fails cleanly
> test_i18ngrep "unable to parse sparse filter data in" err
> '
>
> +setup_triangle () {
> + rm -rf big-blob.txt server client promisor-remote &&
> +
> + touch big-blob.txt &&
Tests seem to prefer spelling this as

	>big-blob.txt &&

because that specifies the content of the file.
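A minimal sketch of the difference, run in a scratch directory (the file name demo.txt is made up for the example): touch leaves an existing file's content alone, while the redirection always leaves an empty file behind.

```shell
# Work in a throwaway directory so nothing real is modified.
tmp=$(mktemp -d) &&
cd "$tmp" &&

# Pretend a stale file was left behind by an earlier test.
echo stale >demo.txt &&

# touch only updates the timestamp; the stale content survives.
touch demo.txt &&
test -s demo.txt &&

# The redirection truncates, so the file is now known to be empty.
>demo.txt &&
! test -s demo.txt
```

In the patch the preceding rm -rf already removes the file, so both spellings behave the same there; the redirection just does not depend on that.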
> + for i in $(seq 1 100)
> + do
> + echo line $i >>big-blob.txt
> + done &&
Should this use test_seq for better portability?
nit: can avoid a subshell:

	test_seq 1 100 | sed -e 's/^/line /' >big-blob.txt
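For trying this outside the test suite, where test_seq is not defined, here is a rough sketch with a stand-in helper. my_seq is a made-up name and not the real test_seq implementation; the file names loop.txt and pipe.txt are also just for the example. It checks that the pipeline produces the same content as the loop.

```shell
# my_seq is a hypothetical stand-in for test_seq, written with only
# POSIX shell features; it prints the integers from $1 to $2.
my_seq () {
	i=$1
	while test "$i" -le "$2"
	do
		echo "$i"
		i=$((i + 1))
	done
}

tmp=$(mktemp -d) &&
cd "$tmp" &&

# Original form: append one line per iteration.
for i in $(my_seq 1 100)
do
	echo line $i >>loop.txt
done &&

# Suggested form: one pipeline, one redirection.
my_seq 1 100 | sed -e 's/^/line /' >pipe.txt &&

# Both spellings should produce identical content.
cmp loop.txt pipe.txt
```

The pipeline form also overwrites rather than appends, so it does not rely on the file being empty beforehand.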
[...]
> +test_expect_success 'fetch lazy-fetches only to resolve deltas' '
> + setup_triangle &&
> +
> + # Exercise to make sure it works. Git will not fetch anything from the
> + # promisor remote other than for the big blob (because it needs to
> + # resolve the delta).
> + GIT_TRACE_PACKET="$(pwd)/trace" git -C client \
> + fetch "file://$(pwd)/server" master &&
> +
> + # Verify the assumption that the client needed to fetch the delta base
> + # to resolve the delta.
> + git hash-object big-blob.txt >hash &&
> + grep "want $(cat hash)" trace
nit: can avoid using cat:

	hash=$(git hash-object big-blob.txt) &&
	grep "want $hash" trace
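A standalone sketch of the same pattern, assuming only that git is on PATH. The file demo.txt and the one-line fake trace are made up here; a real trace would be written by GIT_TRACE_PACKET during the fetch and has more on each line.

```shell
tmp=$(mktemp -d) &&
cd "$tmp" &&
echo "some content" >demo.txt &&

# Capture the object ID directly instead of going through a file.
hash=$(git hash-object demo.txt) &&

# Fake a one-line packet trace for the sake of the example.
echo "packet: fetch> want $hash" >trace &&

# Succeeds only if the trace mentions the object we hashed.
grep "want $hash" trace
```

A side benefit of the variable form is that the assignment takes the exit status of git hash-object, so a failure there breaks the && chain, whereas a failing command substitution inside the grep argument would go unnoticed.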
Thanks and hope that helps,
Jonathan