git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] fetch-pack: make packfile URIs work with transfer.fsckobjects
Date: Fri, 14 Aug 2020 12:59:20 -0700	[thread overview]
Message-ID: <xmqq8sehvw13.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <20200814193234.3072139-1-jonathantanmy@google.com> (Jonathan Tan's message of "Fri, 14 Aug 2020 12:32:34 -0700")

Jonathan Tan <jonathantanmy@google.com> writes:

> When fetching with packfile URIs and transfer.fsckobjects=1, use the
> --fsck-objects instead of the --strict flag when invoking index-pack so
> that links are not checked, only objects. This is because incomplete
> links are expected. (A subsequent connectivity check will be done when
> all the packs have been downloaded regardless of whether
> transfer.fsckobjects is set.)

Good reasoning.  The change looks surprisingly small, thanks to the
existing need already.

I realize that the code's quality (from readability and
discoverability's point of view) has deteriorated in this area quite
a lot over the past few years, though.  The "from_promisor" field is
set when fetch-pack is run with the corresponding command line
option, but the option is documented nowhere, so it is not
immediately obvious why the new need can be fulfilled by just
piggybacking on the existing codepath.  The meaning of the
only_packfile parameter get_pack() takes is never explained
anywhere, either.

> diff --git a/fetch-pack.c b/fetch-pack.c
> index 7f20eca4f8..66631d0034 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -892,7 +892,7 @@ static int get_pack(struct fetch_pack_args *args,
>  	    : transfer_fsck_objects >= 0
>  	    ? transfer_fsck_objects
>  	    : 0) {
> -		if (args->from_promisor)
> +		if (args->from_promisor || !only_packfile)
>  			/*
>  			 * We cannot use --strict in index-pack because it
>  			 * checks both broken objects and links, but we only

I think this is a good way to work around the "we do not have full
set of objects until we grab out-of-line packfiles, but we process
the in-protocol packdata before we grab them, so we cannot validate
yet" problem.  My guess is that "only_packfile" means "after reading
this packfile, the repository should be fully complete and we can
afford to check for connectivity", which would be always true for
protocol below v2 that lack the packfile-uri extension (hence the
call to get_pack() in do_fetch_pack() passes hardcoded 1 in this
parameter).  The v2 codepath in do_fetch_pack_v2() calls get_pack()
with true only when there is no packfile-uri, so we loosen the
validation when we know we will further grab one or more packfiles
out of line.

> +test_expect_success 'packfile-uri with transfer.fsckobjects' '
> +	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +	rm -rf "$P" http_child log &&
> +
> +	git init "$P" &&
> +	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
> +
> +	echo my-blob >"$P/my-blob" &&
> +	git -C "$P" add my-blob &&
> +	git -C "$P" commit -m x &&
> +
> +	configure_exclusion "$P" my-blob >h &&
> +
> +	sane_unset GIT_TEST_SIDEBAND_ALL &&
> +	git -c protocol.version=2 -c transfer.fsckobjects=1 \
> +		-c fetch.uriprotocols=http,https \
> +		clone "$HTTPD_URL/smart/http_parent" http_child &&
> +
> +	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
> +	ls http_child/.git/objects/pack/* >filelist &&

Subtle but correct.

> +	test_line_count = 4 filelist
> +'
> +
> +test_expect_success 'packfile-uri with transfer.fsckobjects fails on bad object' '
> +	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +	rm -rf "$P" http_child log &&
> +
> +	git init "$P" &&
> +	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
> +
> +	cat >bogus-commit <<EOF &&
> +tree $EMPTY_TREE
> +author Bugs Bunny 1234567890 +0000
> +committer Bugs Bunny <bugs@bun.ni> 1234567890 +0000
> +
> +This commit object intentionally broken
> +EOF

Use <<-EOF for readablity, please.

> +	BOGUS=$(git -C "$P" hash-object -t commit -w --stdin <bogus-commit) &&
> +	git -C "$P" branch bogus-branch "$BOGUS" &&
> +
> +	echo my-blob >"$P/my-blob" &&
> +	git -C "$P" add my-blob &&
> +	git -C "$P" commit -m x &&
> +
> +	configure_exclusion "$P" my-blob >h &&
> +
> +	sane_unset GIT_TEST_SIDEBAND_ALL &&
> +	test_must_fail git -c protocol.version=2 -c transfer.fsckobjects=1 \
> +		-c fetch.uriprotocols=http,https \
> +		clone "$HTTPD_URL/smart/http_parent" http_child 2>error &&
> +	test_i18ngrep "invalid author/committer line - missing email" error
> +'
> +
>  # DO NOT add non-httpd-specific tests here, because the last part of this
>  # test script is only executed when httpd is available and enabled.

  reply	other threads:[~2020-08-14 19:59 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-14 19:32 [PATCH] fetch-pack: make packfile URIs work with transfer.fsckobjects Jonathan Tan
2020-08-14 19:59 ` Junio C Hamano [this message]
2020-08-17 19:48 ` [PATCH v2 0/3] " Jonathan Tan
2020-08-17 19:48   ` [PATCH v2 1/3] (various): document from_promisor parameter Jonathan Tan
2020-08-17 19:48   ` [PATCH v2 2/3] fetch-pack: document only_packfile in get_pack() Jonathan Tan
2020-08-17 19:48   ` [PATCH v2 3/3] fetch-pack: make packfile URIs work with transfer.fsckobjects Jonathan Tan
2020-08-17 19:52     ` Eric Sunshine
2020-08-17 20:22       ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq8sehvw13.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).