From: Junio C Hamano <gitster@pobox.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, me@ttaylorr.com, vdye@github.com,
avarab@gmail.com, steadmon@google.com, chooglen@google.com,
Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH v3 02/11] bundle: verify using check_connected()
Date: Tue, 31 Jan 2023 09:35:24 -0800 [thread overview]
Message-ID: <xmqqr0vay5cz.fsf@gitster.g> (raw)
In-Reply-To: <20c29d37f9c1ba1367145331d25dd27f966312cd.1675171759.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Tue, 31 Jan 2023 13:29:10 +0000")
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Thus, the code in verify_bundle() has previously had the additional
> check that all prerequisite commits are reachable from repository
> references. This is done via a revision walk from all references,
> stopping only if all prerequisite commits are discovered or all commits
> are walked. This uses a custom walk to verify_bundle().
>
> This check is more strict than what Git applies to fetched pack-files.
> In the fetch case, Git guarantees that the new references are closed
> under reachability by walking from the new references until walking
> commits that are reachable from repository refs. This is done through
> the well-used check_connected() method.
>
> To better align with the restrictions required by 'git fetch',
> reimplement this check in verify_bundle() to use check_connected(). This
> also simplifies the code significantly.
As I often say, breaking repository faster is not the kind of
performance gain we want to have in Git, and I am in favor of this
iteration compared to the earlier version that mostly punted on
ensuring the correctness (rather, it relied on assumption that "most
of the time this should be OK").
> bundle.c | 75 ++++++++++++++++--------------------------
> t/t6020-bundle-misc.sh | 8 ++---
> 2 files changed, 33 insertions(+), 50 deletions(-)
The diffstat is very pleasing to see.
Let me read the postimage along aloud (preimage omitted).
> diff --git a/bundle.c b/bundle.c
> index 4ef7256aa11..76c3a904898 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -187,6 +188,21 @@ static int list_refs(struct string_list *r, int argc, const char **argv)
> /* Remember to update object flag allocation in object.h */
> #define PREREQ_MARK (1u<<16)
>
> +struct string_list_iterator {
> + struct string_list *list;
> + size_t cur;
> +};
> +
> +static const struct object_id *iterate_ref_map(void *cb_data)
> +{
> + struct string_list_iterator *iter = cb_data;
> +
> + if (iter->cur >= iter->list->nr)
> + return NULL;
> +
> + return iter->list->items[iter->cur++].util;
> +}
This is to let check_connected() collect all the prerequisite object
names. OK.
> int verify_bundle(struct repository *r,
> struct bundle_header *header,
> enum verify_bundle_flags flags)
> {
> /*
> * Do fast check, then if any prereqs are missing then go line by line
> * to be verbose about the errors
> */
> struct string_list *p = &header->prerequisites;
> + int i, ret = 0;
> const char *message = _("Repository lacks these prerequisite commits:");
> + struct string_list_iterator iter = {
> + .list = p,
> + };
> + struct check_connected_options opts = {
> + .quiet = 1,
> + };
>
> if (!r || !r->objects || !r->objects->odb)
> return error(_("need a repository to verify a bundle"));
>
> for (i = 0; i < p->nr; i++) {
> struct string_list_item *e = p->items + i;
> const char *name = e->string;
> struct object_id *oid = e->util;
> struct object *o = parse_object(r, oid);
> + if (o)
> continue;
> ret++;
> if (flags & VERIFY_BUNDLE_QUIET)
> continue;
> if (ret == 1)
> error("%s", message);
> error("%s %s", oid_to_hex(oid), name);
> }
> + if (ret)
> goto cleanup;
The "quick fail" logic as before. Looking sensible.
>
> + if ((ret = check_connected(iterate_ref_map, &iter, &opts)))
> + error(_("some prerequisite commits exist in the object store, "
> + "but are not connected to the repository's history"));
And then we let check_connected() to ensure that traversing from
these prerequisite objects down to the DAG formed by existing refs
will not die from missing objects. Makes sense.
> + /* TODO: preserve this verbose language. */
I am lost -- aren't we preserving the BUNDLE_VERBOSE code below?
> if (flags & VERIFY_BUNDLE_VERBOSE) {
> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
> index 38dbbf89155..7d40994991e 100755
> --- a/t/t6020-bundle-misc.sh
> +++ b/t/t6020-bundle-misc.sh
> @@ -595,14 +595,14 @@ test_expect_success 'verify catches unreachable, broken prerequisites' '
> # Verify should fail
> test_must_fail git bundle verify \
> ../clone-from/tip.bundle 2>err &&
> - grep "Could not read $BAD_OID" err &&
> - grep "Failed to traverse parents of commit $TIP_OID" err &&
> + grep "some prerequisite commits .* are not connected" err &&
> + test_line_count = 1 err &&
>
> # Unbundling should fail
> test_must_fail git bundle unbundle \
> ../clone-from/tip.bundle 2>err &&
> - grep "Could not read $BAD_OID" err &&
> - grep "Failed to traverse parents of commit $TIP_OID" err
> + grep "some prerequisite commits .* are not connected" err &&
> + test_line_count = 1 err
> )
> '
Especially with the new test added in the previous step, we know we
are not trading correctness off. Excellent.
I wonder how much the performance hit does this version incur over
the "not safe at all" version and over the "use custom and
stricter-than-needed" version, by the way?
Thanks.
next prev parent reply other threads:[~2023-01-31 17:36 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
2023-01-06 20:36 ` [PATCH 1/8] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-17 18:17 ` Victoria Dye
2023-01-17 21:00 ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-09 2:38 ` Junio C Hamano
2023-01-09 14:20 ` Derrick Stolee
2023-01-17 19:13 ` Victoria Dye
2023-01-06 20:36 ` [PATCH 3/8] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-09 3:08 ` Junio C Hamano
2023-01-09 14:41 ` Derrick Stolee
2023-01-17 19:24 ` Victoria Dye
2023-01-06 20:36 ` [PATCH 4/8] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-09 3:22 ` Junio C Hamano
2023-01-09 14:58 ` Derrick Stolee
2023-01-19 18:32 ` Victoria Dye
2023-01-20 14:56 ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 5/8] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-19 19:42 ` Victoria Dye
2023-01-20 15:42 ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 6/8] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-19 19:44 ` Victoria Dye
2023-01-06 20:36 ` [PATCH 7/8] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-19 20:34 ` Victoria Dye
2023-01-20 15:47 ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 8/8] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-19 22:24 ` Victoria Dye
2023-01-20 15:53 ` Derrick Stolee
2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
2023-01-23 15:21 ` [PATCH v2 01/10] bundle: optionally skip reachability walk Derrick Stolee via GitGitGadget
2023-01-23 18:03 ` Junio C Hamano
2023-01-23 18:24 ` Derrick Stolee
2023-01-23 20:13 ` Junio C Hamano
2023-01-23 22:30 ` Junio C Hamano
2023-01-24 12:27 ` Derrick Stolee
2023-01-24 14:14 ` [PATCH v2.5 01/11] bundle: test unbundling with incomplete history Derrick Stolee
2023-01-24 17:16 ` Junio C Hamano
2023-01-24 14:16 ` [PATCH v2.5 02/11] bundle: verify using connected() Derrick Stolee
2023-01-24 17:33 ` Junio C Hamano
2023-01-24 18:46 ` Derrick Stolee
2023-01-24 20:41 ` Junio C Hamano
2023-01-24 15:22 ` [PATCH v2 01/10] bundle: optionally skip reachability walk Junio C Hamano
2023-01-23 21:08 ` Junio C Hamano
2023-01-23 15:21 ` [PATCH v2 02/10] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-27 19:15 ` Victoria Dye
2023-01-23 15:21 ` [PATCH v2 03/10] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-23 15:21 ` [PATCH v2 04/10] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-23 15:21 ` [PATCH v2 05/10] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-27 19:17 ` Victoria Dye
2023-01-27 19:32 ` Junio C Hamano
2023-01-30 18:43 ` Derrick Stolee
2023-01-30 19:02 ` Junio C Hamano
2023-01-30 19:12 ` Derrick Stolee
2023-01-23 15:21 ` [PATCH v2 06/10] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-23 15:21 ` [PATCH v2 07/10] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-23 15:21 ` [PATCH v2 08/10] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-27 19:18 ` Victoria Dye
2023-01-23 15:21 ` [PATCH v2 09/10] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-23 15:21 ` [PATCH v2 10/10] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
2023-01-27 19:21 ` Victoria Dye
2023-01-30 18:47 ` Derrick Stolee
2023-01-27 19:28 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Victoria Dye
2023-01-31 13:29 ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 01/11] bundle: test unbundling with incomplete history Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 02/11] bundle: verify using check_connected() Derrick Stolee via GitGitGadget
2023-01-31 17:35 ` Junio C Hamano [this message]
2023-01-31 19:31 ` Derrick Stolee
2023-01-31 19:36 ` Junio C Hamano
2023-01-31 13:29 ` [PATCH v3 03/11] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 04/11] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 05/11] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-31 21:22 ` Junio C Hamano
2023-01-31 13:29 ` [PATCH v3 06/11] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 07/11] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 08/11] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 09/11] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 10/11] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 11/11] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
2023-01-31 22:01 ` [PATCH v3 00/11] Bundle URIs V: creationToken heuristic for incremental fetches Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqr0vay5cz.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=avarab@gmail.com \
--cc=chooglen@google.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=me@ttaylorr.com \
--cc=steadmon@google.com \
--cc=vdye@github.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).