git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, me@ttaylorr.com, vdye@github.com,
	avarab@gmail.com, steadmon@google.com, chooglen@google.com,
	Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH v3 02/11] bundle: verify using check_connected()
Date: Tue, 31 Jan 2023 09:35:24 -0800	[thread overview]
Message-ID: <xmqqr0vay5cz.fsf@gitster.g> (raw)
In-Reply-To: <20c29d37f9c1ba1367145331d25dd27f966312cd.1675171759.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Tue, 31 Jan 2023 13:29:10 +0000")

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Thus, the code in verify_bundle() has previously had the additional
> check that all prerequisite commits are reachable from repository
> references. This is done via a revision walk from all references,
> stopping only if all prerequisite commits are discovered or all commits
> are walked. This uses a custom walk to verify_bundle().
>
> This check is more strict than what Git applies to fetched pack-files.
> In the fetch case, Git guarantees that the new references are closed
> under reachability by walking from the new references until walking
> commits that are reachable from repository refs. This is done through
> the well-used check_connected() method.
>
> To better align with the restrictions required by 'git fetch',
> reimplement this check in verify_bundle() to use check_connected(). This
> also simplifies the code significantly.

As I often say, breaking repository faster is not the kind of
performance gain we want to have in Git, and I am in favor of this
iteration compared to the earlier version that mostly punted on
ensuring the correctness (rather, it relied on assumption that "most
of the time this should be OK").

>  bundle.c               | 75 ++++++++++++++++--------------------------
>  t/t6020-bundle-misc.sh |  8 ++---
>  2 files changed, 33 insertions(+), 50 deletions(-)

The diffstat is very pleasing to see.

Let me read the postimage along aloud (preimage omitted).

> diff --git a/bundle.c b/bundle.c
> index 4ef7256aa11..76c3a904898 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -187,6 +188,21 @@ static int list_refs(struct string_list *r, int argc, const char **argv)
>  /* Remember to update object flag allocation in object.h */
>  #define PREREQ_MARK (1u<<16)
>  
> +struct string_list_iterator {
> +	struct string_list *list;
> +	size_t cur;
> +};
> +
> +static const struct object_id *iterate_ref_map(void *cb_data)
> +{
> +	struct string_list_iterator *iter = cb_data;
> +
> +	if (iter->cur >= iter->list->nr)
> +		return NULL;
> +
> +	return iter->list->items[iter->cur++].util;
> +}

This is to let check_connected() collect all the prerequisite object
names.  OK.

>  int verify_bundle(struct repository *r,
>  		  struct bundle_header *header,
>  		  enum verify_bundle_flags flags)
>  {
>  	/*
>  	 * Do fast check, then if any prereqs are missing then go line by line
>  	 * to be verbose about the errors
>  	 */
>  	struct string_list *p = &header->prerequisites;
> +	int i, ret = 0;
>  	const char *message = _("Repository lacks these prerequisite commits:");
> +	struct string_list_iterator iter = {
> +		.list = p,
> +	};
> +	struct check_connected_options opts = {
> +		.quiet = 1,
> +	};
>  
>  	if (!r || !r->objects || !r->objects->odb)
>  		return error(_("need a repository to verify a bundle"));
>  
>  	for (i = 0; i < p->nr; i++) {
>  		struct string_list_item *e = p->items + i;
>  		const char *name = e->string;
>  		struct object_id *oid = e->util;
>  		struct object *o = parse_object(r, oid);
> +		if (o)
>  			continue;
>  		ret++;
>  		if (flags & VERIFY_BUNDLE_QUIET)
>  			continue;
>  		if (ret == 1)
>  			error("%s", message);
>  		error("%s %s", oid_to_hex(oid), name);
>  	}
> +	if (ret)
>  		goto cleanup;

The "quick fail" logic as before.  Looking sensible.

>  
> +	if ((ret = check_connected(iterate_ref_map, &iter, &opts)))
> +		error(_("some prerequisite commits exist in the object store, "
> +			"but are not connected to the repository's history"));

And then we let check_connected() to ensure that traversing from
these prerequisite objects down to the DAG formed by existing refs
will not die from missing objects.  Makes sense.

> +	/* TODO: preserve this verbose language. */

I am lost -- aren't we preserving the BUNDLE_VERBOSE code below?

>  	if (flags & VERIFY_BUNDLE_VERBOSE) {



> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
> index 38dbbf89155..7d40994991e 100755
> --- a/t/t6020-bundle-misc.sh
> +++ b/t/t6020-bundle-misc.sh
> @@ -595,14 +595,14 @@ test_expect_success 'verify catches unreachable, broken prerequisites' '
>  		# Verify should fail
>  		test_must_fail git bundle verify \
>  			../clone-from/tip.bundle 2>err &&
> -		grep "Could not read $BAD_OID" err &&
> -		grep "Failed to traverse parents of commit $TIP_OID" err &&
> +		grep "some prerequisite commits .* are not connected" err &&
> +		test_line_count = 1 err &&
>  
>  		# Unbundling should fail
>  		test_must_fail git bundle unbundle \
>  			../clone-from/tip.bundle 2>err &&
> -		grep "Could not read $BAD_OID" err &&
> -		grep "Failed to traverse parents of commit $TIP_OID" err
> +		grep "some prerequisite commits .* are not connected" err &&
> +		test_line_count = 1 err
>  	)
>  '

Especially with the new test added in the previous step, we know we
are not trading correctness off.  Excellent.

I wonder how much the performance hit does this version incur over
the "not safe at all" version and over the "use custom and
stricter-than-needed" version, by the way?

Thanks.

  reply	other threads:[~2023-01-31 17:36 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
2023-01-06 20:36 ` [PATCH 1/8] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-17 18:17   ` Victoria Dye
2023-01-17 21:00     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-09  2:38   ` Junio C Hamano
2023-01-09 14:20     ` Derrick Stolee
2023-01-17 19:13   ` Victoria Dye
2023-01-06 20:36 ` [PATCH 3/8] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-09  3:08   ` Junio C Hamano
2023-01-09 14:41     ` Derrick Stolee
2023-01-17 19:24   ` Victoria Dye
2023-01-06 20:36 ` [PATCH 4/8] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-09  3:22   ` Junio C Hamano
2023-01-09 14:58     ` Derrick Stolee
2023-01-19 18:32   ` Victoria Dye
2023-01-20 14:56     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 5/8] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-19 19:42   ` Victoria Dye
2023-01-20 15:42     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 6/8] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-19 19:44   ` Victoria Dye
2023-01-06 20:36 ` [PATCH 7/8] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-19 20:34   ` Victoria Dye
2023-01-20 15:47     ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 8/8] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-19 22:24   ` Victoria Dye
2023-01-20 15:53     ` Derrick Stolee
2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 01/10] bundle: optionally skip reachability walk Derrick Stolee via GitGitGadget
2023-01-23 18:03     ` Junio C Hamano
2023-01-23 18:24       ` Derrick Stolee
2023-01-23 20:13         ` Junio C Hamano
2023-01-23 22:30           ` Junio C Hamano
2023-01-24 12:27             ` Derrick Stolee
2023-01-24 14:14               ` [PATCH v2.5 01/11] bundle: test unbundling with incomplete history Derrick Stolee
2023-01-24 17:16                 ` Junio C Hamano
2023-01-24 14:16               ` [PATCH v2.5 02/11] bundle: verify using connected() Derrick Stolee
2023-01-24 17:33                 ` Junio C Hamano
2023-01-24 18:46                   ` Derrick Stolee
2023-01-24 20:41                     ` Junio C Hamano
2023-01-24 15:22               ` [PATCH v2 01/10] bundle: optionally skip reachability walk Junio C Hamano
2023-01-23 21:08         ` Junio C Hamano
2023-01-23 15:21   ` [PATCH v2 02/10] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-27 19:15     ` Victoria Dye
2023-01-23 15:21   ` [PATCH v2 03/10] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 04/10] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 05/10] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-27 19:17     ` Victoria Dye
2023-01-27 19:32       ` Junio C Hamano
2023-01-30 18:43         ` Derrick Stolee
2023-01-30 19:02           ` Junio C Hamano
2023-01-30 19:12             ` Derrick Stolee
2023-01-23 15:21   ` [PATCH v2 06/10] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 07/10] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 08/10] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-27 19:18     ` Victoria Dye
2023-01-23 15:21   ` [PATCH v2 09/10] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-23 15:21   ` [PATCH v2 10/10] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
2023-01-27 19:21     ` Victoria Dye
2023-01-30 18:47       ` Derrick Stolee
2023-01-27 19:28   ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Victoria Dye
2023-01-31 13:29   ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 01/11] bundle: test unbundling with incomplete history Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 02/11] bundle: verify using check_connected() Derrick Stolee via GitGitGadget
2023-01-31 17:35       ` Junio C Hamano [this message]
2023-01-31 19:31         ` Derrick Stolee
2023-01-31 19:36           ` Junio C Hamano
2023-01-31 13:29     ` [PATCH v3 03/11] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 04/11] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 05/11] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-31 21:22       ` Junio C Hamano
2023-01-31 13:29     ` [PATCH v3 06/11] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 07/11] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 08/11] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 09/11] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 10/11] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-31 13:29     ` [PATCH v3 11/11] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
2023-01-31 22:01     ` [PATCH v3 00/11] Bundle URIs V: creationToken heuristic for incremental fetches Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqr0vay5cz.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=avarab@gmail.com \
    --cc=chooglen@google.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=steadmon@google.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).