git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] bug with rev-list --verify-objects and commit-graph
@ 2022-09-06 20:58 Jeff King
  2022-09-06 21:02 ` [PATCH 1/2] lookup_commit_in_graph(): use prepare_commit_graph() to check for graph Jeff King
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jeff King @ 2022-09-06 20:58 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano

While looking at something unrelated, I noticed that "git rev-list
--verify-objects" will not actually notice corruptions of commits that
are found in a commit graph. This fixes it.

The first one is a cleanup that is not strictly related, but is needed
for the tests in the second to work reliably (and is a good idea
anyway).

The second is the fix. I don't think it's super-important, as we do not
use --verify-objects for anything, since d21c463d55 (fetch/receive:
remove over-pessimistic connectivity check, 2012-03-15). And it's not
even documented, so perhaps we should just consider getting rid of it.
But in the meantime, it was easy enough to correct.

  [1/2]: lookup_commit_in_graph(): use prepare_commit_graph() to check for graph
  [2/2]: rev-list: disable commit graph with --verify-objects

 commit-graph.c  |  2 +-
 revision.c      |  1 +
 t/t1450-fsck.sh | 28 ++++++++++++++++++++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] lookup_commit_in_graph(): use prepare_commit_graph() to check for graph
  2022-09-06 20:58 [PATCH 0/2] bug with rev-list --verify-objects and commit-graph Jeff King
@ 2022-09-06 21:02 ` Jeff King
  2022-09-06 21:16   ` Taylor Blau
  2022-09-06 21:04 ` [PATCH 2/2] rev-list: disable commit graph with --verify-objects Jeff King
  2022-09-06 21:20 ` [PATCH 0/2] bug with rev-list --verify-objects and commit-graph Taylor Blau
  2 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2022-09-06 21:02 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano

We exit early from lookup_commit_in_graph() if the commit_graph pointer
is NULL, under the assumption that we don't have a graph to look at. But
the graph pointer is lazy-loaded; if no other code happens to have
called prepare_commit_graph(), we'll incorrectly assume that one isn't
available at all.

This has a pretty small performance impact in practice, because the
fallback will generally be to call parse_object() instead. That ends up
in parse_commit_buffer(), which loads the graph data itself. So the
first commit we see won't use the graph, but subsequent ones will. Since
using the graph is just an optimization there's generally no
user-visible difference, but if you instrument rev-list like so:

  diff --git a/revision.c b/revision.c
  index ee702e498a..63c488ffb6 100644
  --- a/revision.c
  +++ b/revision.c
  @@ -381,6 +381,9 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
           * parsing commit data from disk.
           */
          commit = lookup_commit_in_graph(revs->repo, oid);
  +       warning("%s %s in commit graph",
  +               commit ? "found" : "did not find",
  +               name);
          if (commit)
                  object = &commit->object;
          else

and run (in git.git):

  git commit-graph write --reachable
  git rev-list origin/master origin/next >/dev/null

you'll see that we fail to find the first one:

  warning: did not find origin/master in commit graph
  warning: found origin/next in commit graph

After this patch, you'll see that we find both:

  warning: found origin/master in commit graph
  warning: found origin/next in commit graph

Even though the performance implication is small here, there are two
important reasons to do this:

  - it's downright confusing if you are hunting a bug triggered by the
    use of the commit graph. It may or may not trigger depending on the
    number and ordering of tips you ask for.

  - prepare_commit_graph() has other policy logic, too. In particular,
    if we've loaded a commit graph and then disabled the graph via
    disable_commit_graph(), that should take precedence.

    I'm not sure if this can trigger bad behavior in practice. The only
    caller there is upload-pack's deepen_by_rev_list(), which should be
    avoiding the commit graph for its traversal tips, but probably
    wasn't before this patch. Whether you could come up with a case
    where that mattered is unclear. Still, this is obviously the right
    thing to be doing.

Signed-off-by: Jeff King <peff@peff.net>
---
I couldn't find any other reason to avoid calling prepare_commit_graph()
here (especially since it ends up lazy-loaded as discussed above). The
cost of the call should not be high; after the first call, it is
simplified down to a few integer checks.

 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index f2a36032f8..aef076e145 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -901,7 +901,7 @@ struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje
 	struct commit *commit;
 	uint32_t pos;
 
-	if (!repo->objects->commit_graph)
+	if (!prepare_commit_graph(repo))
 		return NULL;
 	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
 		return NULL;
-- 
2.37.3.1134.gfd534b3986


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] rev-list: disable commit graph with --verify-objects
  2022-09-06 20:58 [PATCH 0/2] bug with rev-list --verify-objects and commit-graph Jeff King
  2022-09-06 21:02 ` [PATCH 1/2] lookup_commit_in_graph(): use prepare_commit_graph() to check for graph Jeff King
@ 2022-09-06 21:04 ` Jeff King
  2022-09-06 21:20   ` Taylor Blau
  2022-09-07 16:38   ` Junio C Hamano
  2022-09-06 21:20 ` [PATCH 0/2] bug with rev-list --verify-objects and commit-graph Taylor Blau
  2 siblings, 2 replies; 7+ messages in thread
From: Jeff King @ 2022-09-06 21:04 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Junio C Hamano

Since the point of --verify-objects is to actually load and checksum the
bytes of each object, optimizing out reads using the commit graph runs
contrary to our goal.

The most targeted way to implement this would be for the revision
traversal code to check revs->verify_objects and avoid using the commit
graph. But it's difficult to be sure we've hit all of the correct spots.
For instance, I started this patch by writing the first of the included
test cases, where the corrupted commit is directly on rev-list's command
line. And that is easy to fix by teaching get_reference() to check
revs->verify_objects before calling lookup_commit_in_graph().

But that doesn't cover the second test case: when we traverse to a
corrupted commit, we'd parse the parent in process_parents(). So we'd
need to check there, too. And it keeps going. In handle_commit() we
sometimes parses commits, too, though I couldn't figure out a way to
trigger it that did not already parse via get_reference() or tag
peeling. And try_to_simplify_commit() has its own parse call, and so on.

So it seems like the safest thing is to just disable the commit graph
for the whole process when we see the --verify-objects option. We can do
that either in builtin/rev-list.c, where we use the option, or in
revision.c, where we parse it. There are some subtleties:

  - putting it in rev-list.c is less surprising in some ways, because
    there we know we are just doing a single traversal. In a command
    which does multiple traversals in a single process, it's rather
    unexpected to globally disable the commit graph.

  - putting it in revision.c is less surprising in some ways, because
    the caller does not have to remember to disable the graph
    themselves. But this is already tricky! The verify_objects flag in
    rev_info doesn't do anything by itself. The caller has to provide an
    object callback which does the right thing.

  - for that reason, in practice nobody but rev-list uses this option in
    the first place. So the distinction is probably not important either
    way. Arguably it should just be an option of rev-list, and not the
    general revision machinery; right now you can run "git log
    --verify-objects", but it does not actually do anything useful.

  - checking for a parsed revs.verify_objects flag in rev-list.c is too
    late. By that time we've already passed the arguments to
    setup_revisions(), which will have parsed the commits using the
    graph.

So this commit disables the graph as soon as we see the option in
revision.c. That's a pretty broad hammer, but it does what we want, and
in practice nobody but rev-list is using this flag anyway.

The tests cover both the "tip" and "parent" cases. Obviously our hammer
hits them both in this case, but it's good to check both in case
somebody later tries the more focused approach.

Signed-off-by: Jeff King <peff@peff.net>
---
 revision.c      |  1 +
 t/t1450-fsck.sh | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/revision.c b/revision.c
index ee702e498a..00f9c7943b 100644
--- a/revision.c
+++ b/revision.c
@@ -2426,6 +2426,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->tree_objects = 1;
 		revs->blob_objects = 1;
 		revs->verify_objects = 1;
+		disable_commit_graph(revs->repo);
 	} else if (!strcmp(arg, "--unpacked")) {
 		revs->unpacked = 1;
 	} else if (starts_with(arg, "--unpacked=")) {
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 53c2aa10b7..f9a1bc5de7 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -507,6 +507,34 @@ test_expect_success 'rev-list --verify-objects with bad sha1' '
 	test_i18ngrep -q "error: hash mismatch $(dirname $new)$(test_oid ff_2)" out
 '
 
+test_expect_success 'set up repository with commit-graph' '
+	git init corrupt-graph &&
+	(
+		cd corrupt-graph &&
+		test_commit one &&
+		test_commit two &&
+		git commit-graph write --reachable
+	)
+'
+
+corrupt_graph_obj () {
+	oid=$(git -C corrupt-graph rev-parse "$1") &&
+	obj=corrupt-graph/.git/objects/$(test_oid_to_path $oid) &&
+	test_when_finished 'mv backup $obj' &&
+	mv $obj backup &&
+	echo garbage >$obj
+}
+
+test_expect_success 'rev-list --verify-objects with commit graph (tip)' '
+	corrupt_graph_obj HEAD &&
+	test_must_fail git -C corrupt-graph rev-list --verify-objects HEAD
+'
+
+test_expect_success 'rev-list --verify-objects with commit graph (parent)' '
+	corrupt_graph_obj HEAD^ &&
+	test_must_fail git -C corrupt-graph rev-list --verify-objects HEAD
+'
+
 test_expect_success 'force fsck to ignore double author' '
 	git cat-file commit HEAD >basis &&
 	sed "s/^author .*/&,&/" <basis | tr , \\n >multiple-authors &&
-- 
2.37.3.1134.gfd534b3986

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] lookup_commit_in_graph(): use prepare_commit_graph() to check for graph
  2022-09-06 21:02 ` [PATCH 1/2] lookup_commit_in_graph(): use prepare_commit_graph() to check for graph Jeff King
@ 2022-09-06 21:16   ` Taylor Blau
  0 siblings, 0 replies; 7+ messages in thread
From: Taylor Blau @ 2022-09-06 21:16 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Patrick Steinhardt, Junio C Hamano

On Tue, Sep 06, 2022 at 05:02:13PM -0400, Jeff King wrote:
> I couldn't find any other reason to avoid calling prepare_commit_graph()
> here (especially since it ends up lazy-loaded as discussed above). The
> cost of the call should not be high; after the first call, it is
> simplified down to a few integer checks.

Neither could I. Obviously f559d6d45e (revision: avoid hitting packfiles
when commits are in commit-graph, 2021-08-09) is adding a call to look
for a commit by object ID in the commit-graph where there wasn't one
before, so there isn't anything to compare to there.

But the next-closest function `load_commit_graph_info()` calls
`prepare_commit_graph()` (via `repo_find_commit_pos_in_graph()`).

And that all matches my understanding that `r->objects->commit_graph` is
lazily loaded. Perhaps it should be made more difficult to access via
the struct member, and instead done behind a function like
`prepare_commit_graph()` (modified to return the `struct commit_graph*`
if one was found).

But that's for another day :-). This is obviously correct in the
meantime.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] rev-list: disable commit graph with --verify-objects
  2022-09-06 21:04 ` [PATCH 2/2] rev-list: disable commit graph with --verify-objects Jeff King
@ 2022-09-06 21:20   ` Taylor Blau
  2022-09-07 16:38   ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Taylor Blau @ 2022-09-06 21:20 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Patrick Steinhardt, Junio C Hamano

On Tue, Sep 06, 2022 at 05:04:35PM -0400, Jeff King wrote:
> So it seems like the safest thing is to just disable the commit graph
> for the whole process when we see the --verify-objects option. We can do
> that either in builtin/rev-list.c, where we use the option, or in
> revision.c, where we parse it. There are some subtleties:

Agreed that putting it in the rev-list code makes sense.

I was wondering whether disabling the commit graph would survive
a `raw_object_store_clear()` or `close_commit_graph()`. But it does,
since we're not actually touching the `commit_graph` pointer itself, but
a second variable `commit_graph_disabled`, which isn't ever set back to
0.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] bug with rev-list --verify-objects and commit-graph
  2022-09-06 20:58 [PATCH 0/2] bug with rev-list --verify-objects and commit-graph Jeff King
  2022-09-06 21:02 ` [PATCH 1/2] lookup_commit_in_graph(): use prepare_commit_graph() to check for graph Jeff King
  2022-09-06 21:04 ` [PATCH 2/2] rev-list: disable commit graph with --verify-objects Jeff King
@ 2022-09-06 21:20 ` Taylor Blau
  2 siblings, 0 replies; 7+ messages in thread
From: Taylor Blau @ 2022-09-06 21:20 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Patrick Steinhardt, Junio C Hamano

On Tue, Sep 06, 2022 at 04:58:59PM -0400, Jeff King wrote:
> While looking at something unrelated, I noticed that "git rev-list
> --verify-objects" will not actually notice corruptions of commits that
> are found in a commit graph. This fixes it.

I reviewed both of these patches, and they look good to me. Thanks!

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] rev-list: disable commit graph with --verify-objects
  2022-09-06 21:04 ` [PATCH 2/2] rev-list: disable commit graph with --verify-objects Jeff King
  2022-09-06 21:20   ` Taylor Blau
@ 2022-09-07 16:38   ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2022-09-07 16:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Patrick Steinhardt

> diff --git a/revision.c b/revision.c
> index ee702e498a..00f9c7943b 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -2426,6 +2426,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg

It is consistent with the description in the proposed log message ...

>  		revs->tree_objects = 1;
>  		revs->blob_objects = 1;
>  		revs->verify_objects = 1;
> +		disable_commit_graph(revs->repo);

... to disable commit graph when "--verify-objects" is requested.

> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 53c2aa10b7..f9a1bc5de7 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -507,6 +507,34 @@ test_expect_success 'rev-list --verify-objects with bad sha1' '
>  	test_i18ngrep -q "error: hash mismatch $(dirname $new)$(test_oid ff_2)" out
>  '
>  
> +test_expect_success 'set up repository with commit-graph' '
> +	git init corrupt-graph &&
> +	(
> +		cd corrupt-graph &&
> +		test_commit one &&
> +		test_commit two &&
> +		git commit-graph write --reachable
> +	)
> +'
> +
> +corrupt_graph_obj () {
> +	oid=$(git -C corrupt-graph rev-parse "$1") &&
> +	obj=corrupt-graph/.git/objects/$(test_oid_to_path $oid) &&
> +	test_when_finished 'mv backup $obj' &&

OK.  I missed this when-finished thing in my first read and wondered
who takes care of the 'backup' file.  We obviously cannot use it in
tests that do not follow the "break single loose object and inspect
the command behaviour in the repository" pattern, but that is OK.

> +	mv $obj backup &&
> +	echo garbage >$obj
> +}
> +
> +test_expect_success 'rev-list --verify-objects with commit graph (tip)' '
> +	corrupt_graph_obj HEAD &&
> +	test_must_fail git -C corrupt-graph rev-list --verify-objects HEAD
> +'
> +
> +test_expect_success 'rev-list --verify-objects with commit graph (parent)' '
> +	corrupt_graph_obj HEAD^ &&
> +	test_must_fail git -C corrupt-graph rev-list --verify-objects HEAD
> +'
> +

Looks good, thanks.

>  test_expect_success 'force fsck to ignore double author' '
>  	git cat-file commit HEAD >basis &&
>  	sed "s/^author .*/&,&/" <basis | tr , \\n >multiple-authors &&

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-09-07 16:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-06 20:58 [PATCH 0/2] bug with rev-list --verify-objects and commit-graph Jeff King
2022-09-06 21:02 ` [PATCH 1/2] lookup_commit_in_graph(): use prepare_commit_graph() to check for graph Jeff King
2022-09-06 21:16   ` Taylor Blau
2022-09-06 21:04 ` [PATCH 2/2] rev-list: disable commit graph with --verify-objects Jeff King
2022-09-06 21:20   ` Taylor Blau
2022-09-07 16:38   ` Junio C Hamano
2022-09-06 21:20 ` [PATCH 0/2] bug with rev-list --verify-objects and commit-graph Taylor Blau

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).