git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2 0/2] commit-graph: detect commits missing in ODB
@ 2023-10-23 11:27 Patrick Steinhardt
  2023-10-23 11:27 ` [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Patrick Steinhardt @ 2023-10-23 11:27 UTC (permalink / raw
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 6482 bytes --]

Hi,

this is version 2 of my patch series to more readily detect commits
parsed from the commit graph which are missing in the object database.
I've split this out into a separate thread as version 1 was sent in
reply to a different series to extend git-rev-list(1)'s `--missing`
option so that I don't continue to hijack this thread.

Changes compared to v1:

    - I've added a preparatory patch that introduced a new
      GIT_COMMIT_GRAPH_PARANOIA environment variable as suggested by
      Peff. This envvar is retrofitted to the preexisting check in
      `lookup_commit_in_graph()` so that the behaviour for this sanity
      check is consistent.

    - `repo_parse_commit_internal()` now also honors this new envvar.

    - I've amended the commit message in v2 to include the benchmark
      that demonstrates the performance regression.

    - We now un-parse the commit when parsing it via the commit graph
      succeeded, but it doesn't exist in the ODB.

Thanks for all the feedback and discussion around this.

Patrick

[1]: <b0bf576c51a706367a758b8e30eca37edb9c2734.1697200576.git.ps@pks.im>

Patrick Steinhardt (2):
  commit-graph: introduce envvar to disable commit existence checks
  commit: detect commits that exist in commit-graph but not in the ODB

 Documentation/git.txt   |  9 ++++++++
 commit-graph.c          |  6 +++++-
 commit-graph.h          |  6 ++++++
 commit.c                | 16 +++++++++++++-
 t/t5318-commit-graph.sh | 48 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 83 insertions(+), 2 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  a89c435528 commit-graph: introduce envvar to disable commit existence checks
1:  6ec1e340f8 ! 2:  0476d48555 commit: detect commits that exist in commit-graph but not in the ODB
    @@ Commit message
         behaviour by checking for object existence via the object database, as
         well.
     
    +    This check of course comes with a performance penalty. The following
    +    benchmarks have been executed in a clone of linux.git with stable tags
    +    added:
    +
    +        Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
    +          Time (mean ± σ):      2.913 s ±  0.018 s    [User: 2.363 s, System: 0.548 s]
    +          Range (min … max):    2.894 s …  2.950 s    10 runs
    +
    +        Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
    +          Time (mean ± σ):      3.834 s ±  0.052 s    [User: 3.276 s, System: 0.556 s]
    +          Range (min … max):    3.780 s …  3.961 s    10 runs
    +
    +        Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
    +          Time (mean ± σ):     13.841 s ±  0.084 s    [User: 13.152 s, System: 0.687 s]
    +          Range (min … max):   13.714 s … 13.995 s    10 runs
    +
    +        Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
    +          Time (mean ± σ):     13.762 s ±  0.116 s    [User: 13.094 s, System: 0.667 s]
    +          Range (min … max):   13.645 s … 14.038 s    10 runs
    +
    +        Summary
    +          git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
    +            1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
    +            4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
    +            4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)
    +
    +    We look at a ~30% regression in general, but in general we're still a
    +    whole lot faster than without the commit graph. To counteract this, the
    +    new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.
    +
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## commit.c ##
    +@@
    + #include "shallow.h"
    + #include "tree.h"
    + #include "hook.h"
    ++#include "parse.h"
    + 
    + static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
    + 
     @@ commit.c: int repo_parse_commit_internal(struct repository *r,
      		return -1;
      	if (item->object.parsed)
      		return 0;
     -	if (use_commit_graph && parse_commit_in_graph(r, item))
     +	if (use_commit_graph && parse_commit_in_graph(r, item)) {
    -+		if (!has_object(r, &item->object.oid, 0))
    ++		static int object_paranoia = -1;
    ++
    ++		if (object_paranoia == -1)
    ++			object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
    ++
    ++		if (object_paranoia && !has_object(r, &item->object.oid, 0)) {
    ++			unparse_commit(r, &item->object.oid);
     +			return quiet_on_missing ? -1 :
     +				error(_("commit %s exists in commit-graph but not in the object database"),
     +				      oid_to_hex(&item->object.oid));
    ++		}
    ++
      		return 0;
     +	}
      
    @@ commit.c: int repo_parse_commit_internal(struct repository *r,
      		return quiet_on_missing ? -1 :
     
      ## t/t5318-commit-graph.sh ##
    -@@ t/t5318-commit-graph.sh: test_expect_success 'overflow during generation version upgrade' '
    +@@ t/t5318-commit-graph.sh: test_expect_success 'stale commit cannot be parsed when given directly' '
      	)
      '
      
    -+test_expect_success 'commit exists in commit-graph but not in object database' '
    ++test_expect_success 'stale commit cannot be parsed when traversing graph' '
     +	test_when_finished "rm -rf repo" &&
     +	git init repo &&
     +	(
    @@ t/t5318-commit-graph.sh: test_expect_success 'overflow during generation version
     +		oid=$(git rev-parse B) &&
     +		rm .git/objects/"$(test_oid_to_path "$oid")" &&
     +
    ++		# Again, we should be able to parse the commit when not
    ++		# being paranoid about commit graph staleness...
    ++		GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 &&
    ++		# ... but fail when we are paranoid.
     +		test_must_fail git rev-parse HEAD~2 2>error &&
     +		grep "error: commit $oid exists in commit-graph but not in the object database" error
     +	)
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks
  2023-10-23 11:27 [PATCH v2 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
@ 2023-10-23 11:27 ` Patrick Steinhardt
  2023-10-30 21:29   ` Taylor Blau
  2023-10-23 11:27 ` [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
  2023-10-31  7:16 ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
  2 siblings, 1 reply; 18+ messages in thread
From: Patrick Steinhardt @ 2023-10-23 11:27 UTC (permalink / raw
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 5776 bytes --]

Our `lookup_commit_in_graph()` helper tries to look up commits from the
commit graph and, if it doesn't exist there, falls back to parsing it
from the object database instead. This is intended to speed up the
lookup of any such commit that exists in the database. There is an edge
case though where the commit exists in the graph, but not in the object
database. To avoid returning such stale commits the helper function thus
double checks that any such commit parsed from the graph also exists in
the object database. This makes the function safe to use even when
commit graphs aren't updated regularly.

We're about to introduce the same pattern into other parts of our code
base though, namely `repo_parse_commit_internal()`. Here the extra
sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was
a newly introduced helper, and as such there was no performance hit by
adding this sanity check. If we added `repo_parse_commit_internal()`
with that sanity check right from the beginning as well, this would
probably never have been an issue to begin with. But by retrofitting it
with this sanity check now we do add a performance regression to
preexisting code, and thus there is a desire to avoid this or at least
give an escape hatch.

In practice, there is no inherent reason why either of those functions
should have the sanity check whereas the other one does not: either both
of them are able to detect this issue or none of them should be. This
also means that the default of whether we do the check should likely be
the same for both. To err on the side of caution, we thus rather want to
make `repo_parse_commit_internal()` stricter than to loosen the checks
that we already have in `lookup_commit_in_graph()`.

The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA
environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is
the default, we will double check that commits looked up in the commit
graph via `lookup_commit_in_graph()` also exist in the object database.
This same check will also be added in `repo_parse_commit_internal()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/git.txt   |  9 +++++++++
 commit-graph.c          |  6 +++++-
 commit-graph.h          |  6 ++++++
 t/t5318-commit-graph.sh | 21 +++++++++++++++++++++
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 11228956cd..22c2b537aa 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -911,6 +911,15 @@ for full details.
 	should not normally need to set this to `0`, but it may be
 	useful when trying to salvage data from a corrupted repository.
 
+`GIT_COMMIT_GRAPH_PARANOIA`::
+	If this Boolean environment variable is set to false, ignore the
+	case where commits exist in the commit graph but not in the
+	object database. Normally, Git will check whether commits loaded
+	from the commit graph exist in the object database to avoid
+	issues with stale commit graphs, but this check comes with a
+	performance penalty. The default is `1` (i.e., be paranoid about
+	stale commits in the commit graph).
+
 `GIT_ALLOW_PROTOCOL`::
 	If set to a colon-separated list of protocols, behave as if
 	`protocol.allow` is set to `never`, and each of the listed
diff --git a/commit-graph.c b/commit-graph.c
index fd2f700b2e..12ec31902e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -939,14 +939,18 @@ int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c,
 
 struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
 {
+	static int object_paranoia = -1;
 	struct commit *commit;
 	uint32_t pos;
 
+	if (object_paranoia == -1)
+		object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
+
 	if (!prepare_commit_graph(repo))
 		return NULL;
 	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
 		return NULL;
-	if (!has_object(repo, id, 0))
+	if (object_paranoia && !has_object(repo, id, 0))
 		return NULL;
 
 	commit = lookup_commit(repo, id);
diff --git a/commit-graph.h b/commit-graph.h
index 20ada7e891..bd4289620c 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -8,6 +8,12 @@
 #define GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE "GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE"
 #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS"
 
+/*
+ * This environment variable controls whether commits looked up via the
+ * commit graph will be double checked to exist in the object database.
+ */
+#define GIT_COMMIT_GRAPH_PARANOIA "GIT_COMMIT_GRAPH_PARANOIA"
+
 /*
  * This method is only used to enhance coverage of the commit-graph
  * feature in the test suite with the GIT_TEST_COMMIT_GRAPH and
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index ba65f17dd9..c0cc454538 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -821,4 +821,25 @@ test_expect_success 'overflow during generation version upgrade' '
 	)
 '
 
+test_expect_success 'stale commit cannot be parsed when given directly' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit A &&
+		test_commit B &&
+		git commit-graph write --reachable &&
+
+		oid=$(git rev-parse B) &&
+		rm .git/objects/"$(test_oid_to_path "$oid")" &&
+
+		# Verify that it is possible to read the commit from the
+		# commit graph when not being paranoid, ...
+		GIT_COMMIT_GRAPH_PARANOIA=false git rev-list B &&
+		# ... but parsing the commit when double checking that
+		# it actually exists in the object database should fail.
+		test_must_fail git rev-list -1 B
+	)
+'
+
 test_done
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB
  2023-10-23 11:27 [PATCH v2 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
  2023-10-23 11:27 ` [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
@ 2023-10-23 11:27 ` Patrick Steinhardt
  2023-10-24 19:10   ` Junio C Hamano
  2023-10-30 21:31   ` Taylor Blau
  2023-10-31  7:16 ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
  2 siblings, 2 replies; 18+ messages in thread
From: Patrick Steinhardt @ 2023-10-23 11:27 UTC (permalink / raw
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 6787 bytes --]

Commit graphs can become stale and contain references to commits that do
not exist in the object database anymore. Theoretically, this can lead
to a scenario where we are able to successfully look up any such commit
via the commit graph even though such a lookup would fail if done via
the object database directly.

As the commit graph is mostly intended as a sort of cache to speed up
parsing of commits we do not want to have diverging behaviour in a
repository with and a repository without commit graphs, no matter
whether they are stale or not. As commits are otherwise immutable, the
only thing that we really need to care about is thus the presence or
absence of a commit.

To address potentially stale commit data that may exist in the graph,
our `lookup_commit_in_graph()` function will check for the commit's
existence in both the commit graph, but also in the object database. So
even if we were able to look up the commit's data in the graph, we would
still pretend as if the commit didn't exist if it is missing in the
object database.

We don't have the same safety net in `parse_commit_in_graph_one()`
though. This function is mostly used internally in "commit-graph.c"
itself to validate the commit graph, and this usage is fine. We do
expose its functionality via `parse_commit_in_graph()` though, which
gets called by `repo_parse_commit_internal()`, and that function is in
turn used in many places in our codebase.

For all I can see this function is never used to directly turn an object
ID into a commit object without additional safety checks before or after
this lookup. What it is being used for though is to walk history via the
parent chain of commits. So when commits in the parent chain of a graph
walk are missing it is possible that we wouldn't notice if that missing
commit was part of the commit graph. Thus, a query like `git rev-parse
HEAD~2` can succeed even if the intermittent commit is missing.

It's unclear whether there are additional ways in which such stale
commit graphs can lead to problems. In any case, it feels like this is a
bigger bug waiting to happen when we gain additional direct or indirect
callers of `repo_parse_commit_internal()`. So let's fix the inconsistent
behaviour by checking for object existence via the object database, as
well.

This check of course comes with a performance penalty. The following
benchmarks have been executed in a clone of linux.git with stable tags
added:

    Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
      Time (mean ± σ):      2.913 s ±  0.018 s    [User: 2.363 s, System: 0.548 s]
      Range (min … max):    2.894 s …  2.950 s    10 runs

    Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
      Time (mean ± σ):      3.834 s ±  0.052 s    [User: 3.276 s, System: 0.556 s]
      Range (min … max):    3.780 s …  3.961 s    10 runs

    Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
      Time (mean ± σ):     13.841 s ±  0.084 s    [User: 13.152 s, System: 0.687 s]
      Range (min … max):   13.714 s … 13.995 s    10 runs

    Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
      Time (mean ± σ):     13.762 s ±  0.116 s    [User: 13.094 s, System: 0.667 s]
      Range (min … max):   13.645 s … 14.038 s    10 runs

    Summary
      git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
        1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
        4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
        4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)

We look at a ~30% regression in general, but in general we're still a
whole lot faster than without the commit graph. To counteract this, the
new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 commit.c                | 16 +++++++++++++++-
 t/t5318-commit-graph.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index b3223478bc..7399e90212 100644
--- a/commit.c
+++ b/commit.c
@@ -28,6 +28,7 @@
 #include "shallow.h"
 #include "tree.h"
 #include "hook.h"
+#include "parse.h"
 
 static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
 
@@ -572,8 +573,21 @@ int repo_parse_commit_internal(struct repository *r,
 		return -1;
 	if (item->object.parsed)
 		return 0;
-	if (use_commit_graph && parse_commit_in_graph(r, item))
+	if (use_commit_graph && parse_commit_in_graph(r, item)) {
+		static int object_paranoia = -1;
+
+		if (object_paranoia == -1)
+			object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
+
+		if (object_paranoia && !has_object(r, &item->object.oid, 0)) {
+			unparse_commit(r, &item->object.oid);
+			return quiet_on_missing ? -1 :
+				error(_("commit %s exists in commit-graph but not in the object database"),
+				      oid_to_hex(&item->object.oid));
+		}
+
 		return 0;
+	}
 
 	if (oid_object_info_extended(r, &item->object.oid, &oi, flags) < 0)
 		return quiet_on_missing ? -1 :
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c0cc454538..79467d7926 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -842,4 +842,31 @@ test_expect_success 'stale commit cannot be parsed when given directly' '
 	)
 '
 
+test_expect_success 'stale commit cannot be parsed when traversing graph' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit A &&
+		test_commit B &&
+		test_commit C &&
+		git commit-graph write --reachable &&
+
+		# Corrupt the repository by deleting the intermittent commit
+		# object. Commands should notice that this object is absent and
+		# thus that the repository is corrupt even if the commit graph
+		# exists.
+		oid=$(git rev-parse B) &&
+		rm .git/objects/"$(test_oid_to_path "$oid")" &&
+
+		# Again, we should be able to parse the commit when not
+		# being paranoid about commit graph staleness...
+		GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 &&
+		# ... but fail when we are paranoid.
+		test_must_fail git rev-parse HEAD~2 2>error &&
+		grep "error: commit $oid exists in commit-graph but not in the object database" error
+	)
+'
+
 test_done
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB
  2023-10-23 11:27 ` [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
@ 2023-10-24 19:10   ` Junio C Hamano
  2023-10-30 21:32     ` Taylor Blau
  2023-10-30 21:31   ` Taylor Blau
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2023-10-24 19:10 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Taylor Blau, Jeff King

Patrick Steinhardt <ps@pks.im> writes:

> Commit graphs can become stale and contain references to commits that do
> not exist in the object database anymore. Theoretically, this can lead
> to a scenario where we are able to successfully look up any such commit
> via the commit graph even though such a lookup would fail if done via
> the object database directly.
>
> As the commit graph is mostly intended as a sort of cache to speed up
> parsing of commits we do not want to have diverging behaviour in a
> repository with and a repository without commit graphs, no matter
> whether they are stale or not. As commits are otherwise immutable, the
> only thing that we really need to care about is thus the presence or
> absence of a commit.
>
> To address potentially stale commit data that may exist in the graph,
> our `lookup_commit_in_graph()` function will check for the commit's
> existence in both the commit graph, but also in the object database. So
> even if we were able to look up the commit's data in the graph, we would
> still pretend as if the commit didn't exist if it is missing in the
> object database.
>
> We don't have the same safety net in `parse_commit_in_graph_one()`
> though. This function is mostly used internally in "commit-graph.c"
> itself to validate the commit graph, and this usage is fine. We do
> expose its functionality via `parse_commit_in_graph()` though, which
> gets called by `repo_parse_commit_internal()`, and that function is in
> turn used in many places in our codebase.
>
> For all I can see this function is never used to directly turn an object
> ID into a commit object without additional safety checks before or after
> this lookup. What it is being used for though is to walk history via the
> parent chain of commits. So when commits in the parent chain of a graph
> walk are missing it is possible that we wouldn't notice if that missing
> commit was part of the commit graph. Thus, a query like `git rev-parse
> HEAD~2` can succeed even if the intermittent commit is missing.
>
> It's unclear whether there are additional ways in which such stale
> commit graphs can lead to problems. In any case, it feels like this is a
> bigger bug waiting to happen when we gain additional direct or indirect
> callers of `repo_parse_commit_internal()`. So let's fix the inconsistent
> behaviour by checking for object existence via the object database, as
> well.
>
> This check of course comes with a performance penalty. The following
> benchmarks have been executed in a clone of linux.git with stable tags
> added:
>
>     Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
>       Time (mean ± σ):      2.913 s ±  0.018 s    [User: 2.363 s, System: 0.548 s]
>       Range (min … max):    2.894 s …  2.950 s    10 runs
>
>     Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
>       Time (mean ± σ):      3.834 s ±  0.052 s    [User: 3.276 s, System: 0.556 s]
>       Range (min … max):    3.780 s …  3.961 s    10 runs
>
>     Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
>       Time (mean ± σ):     13.841 s ±  0.084 s    [User: 13.152 s, System: 0.687 s]
>       Range (min … max):   13.714 s … 13.995 s    10 runs
>
>     Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
>       Time (mean ± σ):     13.762 s ±  0.116 s    [User: 13.094 s, System: 0.667 s]
>       Range (min … max):   13.645 s … 14.038 s    10 runs
>
>     Summary
>       git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
>         1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
>         4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
>         4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)
>
> We look at a ~30% regression in general, but in general we're still a
> whole lot faster than without the commit graph. To counteract this, the
> new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.

Very nicely described.  Will queue.  I'll go offline for the rest of
the week but if there are no significant issues discovered by the
time I come back, let's declare a victory and merge these two
patches down to 'next'.

Thanks.

>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  commit.c                | 16 +++++++++++++++-
>  t/t5318-commit-graph.sh | 27 +++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+), 1 deletion(-)
>
> diff --git a/commit.c b/commit.c
> index b3223478bc..7399e90212 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -28,6 +28,7 @@
>  #include "shallow.h"
>  #include "tree.h"
>  #include "hook.h"
> +#include "parse.h"
>  
>  static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
>  
> @@ -572,8 +573,21 @@ int repo_parse_commit_internal(struct repository *r,
>  		return -1;
>  	if (item->object.parsed)
>  		return 0;
> -	if (use_commit_graph && parse_commit_in_graph(r, item))
> +	if (use_commit_graph && parse_commit_in_graph(r, item)) {
> +		static int object_paranoia = -1;
> +
> +		if (object_paranoia == -1)
> +			object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
> +
> +		if (object_paranoia && !has_object(r, &item->object.oid, 0)) {
> +			unparse_commit(r, &item->object.oid);
> +			return quiet_on_missing ? -1 :
> +				error(_("commit %s exists in commit-graph but not in the object database"),
> +				      oid_to_hex(&item->object.oid));
> +		}
> +
>  		return 0;
> +	}
>  
>  	if (oid_object_info_extended(r, &item->object.oid, &oi, flags) < 0)
>  		return quiet_on_missing ? -1 :
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index c0cc454538..79467d7926 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -842,4 +842,31 @@ test_expect_success 'stale commit cannot be parsed when given directly' '
>  	)
>  '
>  
> +test_expect_success 'stale commit cannot be parsed when traversing graph' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		test_commit A &&
> +		test_commit B &&
> +		test_commit C &&
> +		git commit-graph write --reachable &&
> +
> +		# Corrupt the repository by deleting the intermittent commit
> +		# object. Commands should notice that this object is absent and
> +		# thus that the repository is corrupt even if the commit graph
> +		# exists.
> +		oid=$(git rev-parse B) &&
> +		rm .git/objects/"$(test_oid_to_path "$oid")" &&
> +
> +		# Again, we should be able to parse the commit when not
> +		# being paranoid about commit graph staleness...
> +		GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 &&
> +		# ... but fail when we are paranoid.
> +		test_must_fail git rev-parse HEAD~2 2>error &&
> +		grep "error: commit $oid exists in commit-graph but not in the object database" error
> +	)
> +'
> +
>  test_done


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks
  2023-10-23 11:27 ` [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
@ 2023-10-30 21:29   ` Taylor Blau
  2023-10-31  6:19     ` Patrick Steinhardt
  0 siblings, 1 reply; 18+ messages in thread
From: Taylor Blau @ 2023-10-30 21:29 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano, Jeff King

On Mon, Oct 23, 2023 at 01:27:16PM +0200, Patrick Steinhardt wrote:
> Our `lookup_commit_in_graph()` helper tries to look up commits from the
> commit graph and, if it doesn't exist there, falls back to parsing it
> from the object database instead. This is intended to speed up the
> lookup of any such commit that exists in the database. There is an edge
> case though where the commit exists in the graph, but not in the object
> database. To avoid returning such stale commits the helper function thus
> double checks that any such commit parsed from the graph also exists in
> the object database. This makes the function safe to use even when
> commit graphs aren't updated regularly.
>
> We're about to introduce the same pattern into other parts of our code
> base though, namely `repo_parse_commit_internal()`. Here the extra
> sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was
> a newly introduced helper, and as such there was no performance hit by
> adding this sanity check. If we added `repo_parse_commit_internal()`
> with that sanity check right from the beginning as well, this would
> probably never have been an issue to begin with. But by retrofitting it
> with this sanity check now we do add a performance regression to
> preexisting code, and thus there is a desire to avoid this or at least
> give an escape hatch.
>
> In practice, there is no inherent reason why either of those functions
> should have the sanity check whereas the other one does not: either both
> of them are able to detect this issue or none of them should be. This
> also means that the default of whether we do the check should likely be
> the same for both. To err on the side of caution, we thus rather want to
> make `repo_parse_commit_internal()` stricter than to loosen the checks
> that we already have in `lookup_commit_in_graph()`.

All well reasoned. I think the most compelling reason is that we're
already doing this extra check in lookup_commit_in_graph(), and having
that be somewhat inconsistent with repo_parse_commit_internal() feels
error-prone to me.

> The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA
> environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is
> the default, we will double check that commits looked up in the commit
> graph via `lookup_commit_in_graph()` also exist in the object database.
> This same check will also be added in `repo_parse_commit_internal()`.

Sounds good.

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  Documentation/git.txt   |  9 +++++++++
>  commit-graph.c          |  6 +++++-
>  commit-graph.h          |  6 ++++++
>  t/t5318-commit-graph.sh | 21 +++++++++++++++++++++
>  4 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index 11228956cd..22c2b537aa 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -911,6 +911,15 @@ for full details.
>  	should not normally need to set this to `0`, but it may be
>  	useful when trying to salvage data from a corrupted repository.
>
> +`GIT_COMMIT_GRAPH_PARANOIA`::
> +	If this Boolean environment variable is set to false, ignore the
> +	case where commits exist in the commit graph but not in the
> +	object database. Normally, Git will check whether commits loaded
> +	from the commit graph exist in the object database to avoid
> +	issues with stale commit graphs, but this check comes with a
> +	performance penalty. The default is `1` (i.e., be paranoid about
> +	stale commits in the commit graph).
> +

The first two sentences seem to be flipped. Perhaps:

    When loading a commit object from the commit-graph, Git will perform
    an existence check on the object in the ODB before parsing it out of
    the commit-graph. The default is "true", which enables the
    aforementioned behavior. Setting this to "false" disables the
    existential check when parsing commits from a commit-graph.

>  `GIT_ALLOW_PROTOCOL`::
>  	If set to a colon-separated list of protocols, behave as if
>  	`protocol.allow` is set to `never`, and each of the listed
> diff --git a/commit-graph.c b/commit-graph.c
> index fd2f700b2e..12ec31902e 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -939,14 +939,18 @@ int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c,
>
>  struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
>  {
> +	static int object_paranoia = -1;
>  	struct commit *commit;
>  	uint32_t pos;
>
> +	if (object_paranoia == -1)
> +		object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
> +

I don't think that this is a reroll-able issue, but calling this
variable object_paranoia to store a setting for *graph* paranoia feels
like a good itch to scratch. But obviously not a big deal ;-).

> @@ -821,4 +821,25 @@ test_expect_success 'overflow during generation version upgrade' '
>  	)
>  '
>
> +test_expect_success 'stale commit cannot be parsed when given directly' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit A &&
> +		test_commit B &&
> +		git commit-graph write --reachable &&
> +
> +		oid=$(git rev-parse B) &&
> +		rm .git/objects/"$(test_oid_to_path "$oid")" &&
> +
> +		# Verify that it is possible to read the commit from the
> +		# commit graph when not being paranoid, ...
> +		GIT_COMMIT_GRAPH_PARANOIA=false git rev-list B &&
> +		# ... but parsing the commit when double checking that
> +		# it actually exists in the object database should fail.
> +		test_must_fail git rev-list -1 B

Would "cat-file -p" be more direct here than "rev-list -1"?

Thanks,
Taylor


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB
  2023-10-23 11:27 ` [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
  2023-10-24 19:10   ` Junio C Hamano
@ 2023-10-30 21:31   ` Taylor Blau
  1 sibling, 0 replies; 18+ messages in thread
From: Taylor Blau @ 2023-10-30 21:31 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano, Jeff King

On Mon, Oct 23, 2023 at 01:27:20PM +0200, Patrick Steinhardt wrote:
> @@ -572,8 +573,21 @@ int repo_parse_commit_internal(struct repository *r,
>  		return -1;
>  	if (item->object.parsed)
>  		return 0;
> -	if (use_commit_graph && parse_commit_in_graph(r, item))
> +	if (use_commit_graph && parse_commit_in_graph(r, item)) {
> +		static int object_paranoia = -1;
> +
> +		if (object_paranoia == -1)
> +			object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);

The same note here about object_paranoia versus graph_paranoia, but
otherwise this patch looks good to me, modulo one typo below.

> @@ -842,4 +842,31 @@ test_expect_success 'stale commit cannot be parsed when given directly' '
>  	)
>  '
>
> +test_expect_success 'stale commit cannot be parsed when traversing graph' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		test_commit A &&
> +		test_commit B &&
> +		test_commit C &&
> +		git commit-graph write --reachable &&
> +
> +		# Corrupt the repository by deleting the intermittent commit

s/intermittent/intermediate

> +		# object. Commands should notice that this object is absent and
> +		# thus that the repository is corrupt even if the commit graph
> +		# exists.
> +		oid=$(git rev-parse B) &&
> +		rm .git/objects/"$(test_oid_to_path "$oid")" &&
> +
> +		# Again, we should be able to parse the commit when not
> +		# being paranoid about commit graph staleness...
> +		GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 &&
> +		# ... but fail when we are paranoid.
> +		test_must_fail git rev-parse HEAD~2 2>error &&
> +		grep "error: commit $oid exists in commit-graph but not in the object database" error
> +	)

Thanks,
Taylor


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB
  2023-10-24 19:10   ` Junio C Hamano
@ 2023-10-30 21:32     ` Taylor Blau
  2023-10-31  0:21       ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Taylor Blau @ 2023-10-30 21:32 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git, Karthik Nayak, Jeff King

On Tue, Oct 24, 2023 at 12:10:13PM -0700, Junio C Hamano wrote:
> > We look at a ~30% regression in general, but in general we're still a
> > whole lot faster than without the commit graph. To counteract this, the
> > new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.
>
> Very nicely described.  Will queue.  I'll go offline for the rest of
> the week but if there are no significant issues discovered by the
> time I come back, let's declare a victory and merge these two
> patches down to 'next'.

I think we're close here. There are a couple of small comments that I
made throughout these two patches, but nothing major.

Thanks,
Taylor


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB
  2023-10-30 21:32     ` Taylor Blau
@ 2023-10-31  0:21       ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2023-10-31  0:21 UTC (permalink / raw
  To: Taylor Blau; +Cc: Patrick Steinhardt, git, Karthik Nayak, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> On Tue, Oct 24, 2023 at 12:10:13PM -0700, Junio C Hamano wrote:
>> > We look at a ~30% regression in general, but in general we're still a
>> > whole lot faster than without the commit graph. To counteract this, the
>> > new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.
>>
>> Very nicely described.  Will queue.  I'll go offline for the rest of
>> the week but if there are no significant issues discovered by the
>> time I come back, let's declare a victory and merge these two
>> patches down to 'next'.
>
> I think we're close here. There are a couple of small comments that I
> made throughout these two patches, but nothing major.

Thanks for a comment.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks
  2023-10-30 21:29   ` Taylor Blau
@ 2023-10-31  6:19     ` Patrick Steinhardt
  0 siblings, 0 replies; 18+ messages in thread
From: Patrick Steinhardt @ 2023-10-31  6:19 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, Karthik Nayak, Junio C Hamano, Jeff King

[-- Attachment #1: Type: text/plain, Size: 6644 bytes --]

On Mon, Oct 30, 2023 at 05:29:33PM -0400, Taylor Blau wrote:
> On Mon, Oct 23, 2023 at 01:27:16PM +0200, Patrick Steinhardt wrote:
> > Our `lookup_commit_in_graph()` helper tries to look up commits from the
> > commit graph and, if it doesn't exist there, falls back to parsing it
> > from the object database instead. This is intended to speed up the
> > lookup of any such commit that exists in the database. There is an edge
> > case though where the commit exists in the graph, but not in the object
> > database. To avoid returning such stale commits the helper function thus
> > double checks that any such commit parsed from the graph also exists in
> > the object database. This makes the function safe to use even when
> > commit graphs aren't updated regularly.
> >
> > We're about to introduce the same pattern into other parts of our code
> > base though, namely `repo_parse_commit_internal()`. Here the extra
> > sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was
> > a newly introduced helper, and as such there was no performance hit by
> > adding this sanity check. If we added `repo_parse_commit_internal()`
> > with that sanity check right from the beginning as well, this would
> > probably never have been an issue to begin with. But by retrofitting it
> > with this sanity check now we do add a performance regression to
> > preexisting code, and thus there is a desire to avoid this or at least
> > give an escape hatch.
> >
> > In practice, there is no inherent reason why either of those functions
> > should have the sanity check whereas the other one does not: either both
> > of them are able to detect this issue or none of them should be. This
> > also means that the default of whether we do the check should likely be
> > the same for both. To err on the side of caution, we thus rather want to
> > make `repo_parse_commit_internal()` stricter than to loosen the checks
> > that we already have in `lookup_commit_in_graph()`.
> 
> All well reasoned. I think the most compelling reason is that we're
> already doing this extra check in lookup_commit_in_graph(), and having
> that be somewhat inconsistent with repo_parse_commit_internal() feels
> error-prone to me.
> 
> > The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA
> > environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is
> > the default, we will double check that commits looked up in the commit
> > graph via `lookup_commit_in_graph()` also exist in the object database.
> > This same check will also be added in `repo_parse_commit_internal()`.
> 
> Sounds good.
> 
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  Documentation/git.txt   |  9 +++++++++
> >  commit-graph.c          |  6 +++++-
> >  commit-graph.h          |  6 ++++++
> >  t/t5318-commit-graph.sh | 21 +++++++++++++++++++++
> >  4 files changed, 41 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/git.txt b/Documentation/git.txt
> > index 11228956cd..22c2b537aa 100644
> > --- a/Documentation/git.txt
> > +++ b/Documentation/git.txt
> > @@ -911,6 +911,15 @@ for full details.
> >  	should not normally need to set this to `0`, but it may be
> >  	useful when trying to salvage data from a corrupted repository.
> >
> > +`GIT_COMMIT_GRAPH_PARANOIA`::
> > +	If this Boolean environment variable is set to false, ignore the
> > +	case where commits exist in the commit graph but not in the
> > +	object database. Normally, Git will check whether commits loaded
> > +	from the commit graph exist in the object database to avoid
> > +	issues with stale commit graphs, but this check comes with a
> > +	performance penalty. The default is `1` (i.e., be paranoid about
> > +	stale commits in the commit graph).
> > +
> 
> The first two sentences seem to be flipped. Perhaps:
> 
>     When loading a commit object from the commit-graph, Git will perform
>     an existence check on the object in the ODB before parsing it out of
>     the commit-graph. The default is "true", which enables the
>     aforementioned behavior. Setting this to "false" disables the
>     existential check when parsing commits from a commit-graph.

I was modelling this after the text we had in `GIT_REF_PARANOIA`, but I
like your version more indeed. I'll massage it a bit to mention _why_
one would want to disable this.

> >  `GIT_ALLOW_PROTOCOL`::
> >  	If set to a colon-separated list of protocols, behave as if
> >  	`protocol.allow` is set to `never`, and each of the listed
> > diff --git a/commit-graph.c b/commit-graph.c
> > index fd2f700b2e..12ec31902e 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -939,14 +939,18 @@ int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c,
> >
> >  struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
> >  {
> > +	static int object_paranoia = -1;
> >  	struct commit *commit;
> >  	uint32_t pos;
> >
> > +	if (object_paranoia == -1)
> > +		object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
> > +
> 
> I don't think that this is a reroll-able issue, but calling this
> variable object_paranoia to store a setting for *graph* paranoia feels
> like a good itch to scratch. But obviously not a big deal ;-).

Ugh, yeah. I first had the envvar as "GIT_OBJECT_PARANOIA", but
discarded that name because I feared that it might become overloaded
with semi-related checks.

Will fix.

> > @@ -821,4 +821,25 @@ test_expect_success 'overflow during generation version upgrade' '
> >  	)
> >  '
> >
> > +test_expect_success 'stale commit cannot be parsed when given directly' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		test_commit A &&
> > +		test_commit B &&
> > +		git commit-graph write --reachable &&
> > +
> > +		oid=$(git rev-parse B) &&
> > +		rm .git/objects/"$(test_oid_to_path "$oid")" &&
> > +
> > +		# Verify that it is possible to read the commit from the
> > +		# commit graph when not being paranoid, ...
> > +		GIT_COMMIT_GRAPH_PARANOIA=false git rev-list B &&
> > +		# ... but parsing the commit when double checking that
> > +		# it actually exists in the object database should fail.
> > +		test_must_fail git rev-list -1 B
> 
> Would "cat-file -p" be more direct here than "rev-list -1"?

No, because it doesn't use `lookup_commit_in_graph()`. I had to go
searching a bit to find something that exposes this inconsistency, and
git-rev-list(1) was the easiest one I found.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3 0/2] commit-graph: detect commits missing in ODB
  2023-10-23 11:27 [PATCH v2 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
  2023-10-23 11:27 ` [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
  2023-10-23 11:27 ` [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
@ 2023-10-31  7:16 ` Patrick Steinhardt
  2023-10-31  7:16   ` [PATCH v3 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
                     ` (3 more replies)
  2 siblings, 4 replies; 18+ messages in thread
From: Patrick Steinhardt @ 2023-10-31  7:16 UTC (permalink / raw
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 4682 bytes --]

Hi,

this is version 3 of my patch series to more readily detect commits
parsed from the commit graph which are missing in the object database.

Changes compared to v2:

    - Rewrote the help text for `GIT_COMMIT_GRAPH_PARANOIA` to be more
      accessible.

    - Renamed the `object_paranoia` variable to `commit_graph_paranoia`.

    - Fixed a typo.

Thanks!

Patrick

Patrick Steinhardt (2):
  commit-graph: introduce envvar to disable commit existence checks
  commit: detect commits that exist in commit-graph but not in the ODB

 Documentation/git.txt   | 10 +++++++++
 commit-graph.c          |  6 +++++-
 commit-graph.h          |  6 ++++++
 commit.c                | 16 +++++++++++++-
 t/t5318-commit-graph.sh | 48 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 84 insertions(+), 2 deletions(-)

Range-diff against v2:
1:  a89c435528 ! 1:  c433ec1254 commit-graph: introduce envvar to disable commit existence checks
    @@ Documentation/git.txt: for full details.
      	useful when trying to salvage data from a corrupted repository.
      
     +`GIT_COMMIT_GRAPH_PARANOIA`::
    -+	If this Boolean environment variable is set to false, ignore the
    -+	case where commits exist in the commit graph but not in the
    -+	object database. Normally, Git will check whether commits loaded
    -+	from the commit graph exist in the object database to avoid
    -+	issues with stale commit graphs, but this check comes with a
    -+	performance penalty. The default is `1` (i.e., be paranoid about
    -+	stale commits in the commit graph).
    ++	When loading a commit object from the commit-graph, Git performs an
    ++	existence check on the object in the object database. This is done to
    ++	avoid issues with stale commit-graphs that contain references to
    ++	already-deleted commits, but comes with a performance penalty.
    +++
    ++The default is "true", which enables the aforementioned behavior.
    ++Setting this to "false" disables the existence check. This can lead to
    ++a performance improvement at the cost of consistency.
     +
      `GIT_ALLOW_PROTOCOL`::
      	If set to a colon-separated list of protocols, behave as if
    @@ commit-graph.c: int repo_find_commit_pos_in_graph(struct repository *r, struct c
      
      struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
      {
    -+	static int object_paranoia = -1;
    ++	static int commit_graph_paranoia = -1;
      	struct commit *commit;
      	uint32_t pos;
      
    -+	if (object_paranoia == -1)
    -+		object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
    ++	if (commit_graph_paranoia == -1)
    ++		commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
     +
      	if (!prepare_commit_graph(repo))
      		return NULL;
      	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
      		return NULL;
     -	if (!has_object(repo, id, 0))
    -+	if (object_paranoia && !has_object(repo, id, 0))
    ++	if (commit_graph_paranoia && !has_object(repo, id, 0))
      		return NULL;
      
      	commit = lookup_commit(repo, id);
2:  0476d48555 ! 2:  8629fd0892 commit: detect commits that exist in commit-graph but not in the ODB
    @@ commit.c: int repo_parse_commit_internal(struct repository *r,
      		return 0;
     -	if (use_commit_graph && parse_commit_in_graph(r, item))
     +	if (use_commit_graph && parse_commit_in_graph(r, item)) {
    -+		static int object_paranoia = -1;
    ++		static int commit_graph_paranoia = -1;
     +
    -+		if (object_paranoia == -1)
    -+			object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
    ++		if (commit_graph_paranoia == -1)
    ++			commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
     +
    -+		if (object_paranoia && !has_object(r, &item->object.oid, 0)) {
    ++		if (commit_graph_paranoia && !has_object(r, &item->object.oid, 0)) {
     +			unparse_commit(r, &item->object.oid);
     +			return quiet_on_missing ? -1 :
     +				error(_("commit %s exists in commit-graph but not in the object database"),
    @@ t/t5318-commit-graph.sh: test_expect_success 'stale commit cannot be parsed when
     +		test_commit C &&
     +		git commit-graph write --reachable &&
     +
    -+		# Corrupt the repository by deleting the intermittent commit
    ++		# Corrupt the repository by deleting the intermediate commit
     +		# object. Commands should notice that this object is absent and
     +		# thus that the repository is corrupt even if the commit graph
     +		# exists.
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3 1/2] commit-graph: introduce envvar to disable commit existence checks
  2023-10-31  7:16 ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
@ 2023-10-31  7:16   ` Patrick Steinhardt
  2023-10-31  7:16   ` [PATCH v3 2/2] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Patrick Steinhardt @ 2023-10-31  7:16 UTC (permalink / raw
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 5847 bytes --]

Our `lookup_commit_in_graph()` helper tries to look up commits from the
commit graph and, if it doesn't exist there, falls back to parsing it
from the object database instead. This is intended to speed up the
lookup of any such commit that exists in the database. There is an edge
case though where the commit exists in the graph, but not in the object
database. To avoid returning such stale commits the helper function thus
double checks that any such commit parsed from the graph also exists in
the object database. This makes the function safe to use even when
commit graphs aren't updated regularly.

We're about to introduce the same pattern into other parts of our code
base though, namely `repo_parse_commit_internal()`. Here the extra
sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was
a newly introduced helper, and as such there was no performance hit by
adding this sanity check. If we added `repo_parse_commit_internal()`
with that sanity check right from the beginning as well, this would
probably never have been an issue to begin with. But by retrofitting it
with this sanity check now we do add a performance regression to
preexisting code, and thus there is a desire to avoid this or at least
give an escape hatch.

In practice, there is no inherent reason why either of those functions
should have the sanity check whereas the other one does not: either both
of them are able to detect this issue or none of them should be. This
also means that the default of whether we do the check should likely be
the same for both. To err on the side of caution, we thus rather want to
make `repo_parse_commit_internal()` stricter than to loosen the checks
that we already have in `lookup_commit_in_graph()`.

The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA
environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is
the default, we will double check that commits looked up in the commit
graph via `lookup_commit_in_graph()` also exist in the object database.
This same check will also be added in `repo_parse_commit_internal()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/git.txt   | 10 ++++++++++
 commit-graph.c          |  6 +++++-
 commit-graph.h          |  6 ++++++
 t/t5318-commit-graph.sh | 21 +++++++++++++++++++++
 4 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 11228956cd..3bac24cf8a 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -911,6 +911,16 @@ for full details.
 	should not normally need to set this to `0`, but it may be
 	useful when trying to salvage data from a corrupted repository.
 
+`GIT_COMMIT_GRAPH_PARANOIA`::
+	When loading a commit object from the commit-graph, Git performs an
+	existence check on the object in the object database. This is done to
+	avoid issues with stale commit-graphs that contain references to
+	already-deleted commits, but comes with a performance penalty.
++
+The default is "true", which enables the aforementioned behavior.
+Setting this to "false" disables the existence check. This can lead to
+a performance improvement at the cost of consistency.
+
 `GIT_ALLOW_PROTOCOL`::
 	If set to a colon-separated list of protocols, behave as if
 	`protocol.allow` is set to `never`, and each of the listed
diff --git a/commit-graph.c b/commit-graph.c
index fd2f700b2e..6d21ea6301 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -939,14 +939,18 @@ int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c,
 
 struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
 {
+	static int commit_graph_paranoia = -1;
 	struct commit *commit;
 	uint32_t pos;
 
+	if (commit_graph_paranoia == -1)
+		commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
+
 	if (!prepare_commit_graph(repo))
 		return NULL;
 	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
 		return NULL;
-	if (!has_object(repo, id, 0))
+	if (commit_graph_paranoia && !has_object(repo, id, 0))
 		return NULL;
 
 	commit = lookup_commit(repo, id);
diff --git a/commit-graph.h b/commit-graph.h
index 20ada7e891..bd4289620c 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -8,6 +8,12 @@
 #define GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE "GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE"
 #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS"
 
+/*
+ * This environment variable controls whether commits looked up via the
+ * commit graph will be double checked to exist in the object database.
+ */
+#define GIT_COMMIT_GRAPH_PARANOIA "GIT_COMMIT_GRAPH_PARANOIA"
+
 /*
  * This method is only used to enhance coverage of the commit-graph
  * feature in the test suite with the GIT_TEST_COMMIT_GRAPH and
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index ba65f17dd9..c0cc454538 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -821,4 +821,25 @@ test_expect_success 'overflow during generation version upgrade' '
 	)
 '
 
+test_expect_success 'stale commit cannot be parsed when given directly' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit A &&
+		test_commit B &&
+		git commit-graph write --reachable &&
+
+		oid=$(git rev-parse B) &&
+		rm .git/objects/"$(test_oid_to_path "$oid")" &&
+
+		# Verify that it is possible to read the commit from the
+		# commit graph when not being paranoid, ...
+		GIT_COMMIT_GRAPH_PARANOIA=false git rev-list B &&
+		# ... but parsing the commit when double checking that
+		# it actually exists in the object database should fail.
+		test_must_fail git rev-list -1 B
+	)
+'
+
 test_done
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 2/2] commit: detect commits that exist in commit-graph but not in the ODB
  2023-10-31  7:16 ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
  2023-10-31  7:16   ` [PATCH v3 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
@ 2023-10-31  7:16   ` Patrick Steinhardt
  2023-10-31 19:16   ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Taylor Blau
  2023-10-31 23:55   ` Junio C Hamano
  3 siblings, 0 replies; 18+ messages in thread
From: Patrick Steinhardt @ 2023-10-31  7:16 UTC (permalink / raw
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 6811 bytes --]

Commit graphs can become stale and contain references to commits that do
not exist in the object database anymore. Theoretically, this can lead
to a scenario where we are able to successfully look up any such commit
via the commit graph even though such a lookup would fail if done via
the object database directly.

As the commit graph is mostly intended as a sort of cache to speed up
parsing of commits we do not want to have diverging behaviour in a
repository with and a repository without commit graphs, no matter
whether they are stale or not. As commits are otherwise immutable, the
only thing that we really need to care about is thus the presence or
absence of a commit.

To address potentially stale commit data that may exist in the graph,
our `lookup_commit_in_graph()` function will check for the commit's
existence in both the commit graph, but also in the object database. So
even if we were able to look up the commit's data in the graph, we would
still pretend as if the commit didn't exist if it is missing in the
object database.

We don't have the same safety net in `parse_commit_in_graph_one()`
though. This function is mostly used internally in "commit-graph.c"
itself to validate the commit graph, and this usage is fine. We do
expose its functionality via `parse_commit_in_graph()` though, which
gets called by `repo_parse_commit_internal()`, and that function is in
turn used in many places in our codebase.

For all I can see this function is never used to directly turn an object
ID into a commit object without additional safety checks before or after
this lookup. What it is being used for though is to walk history via the
parent chain of commits. So when commits in the parent chain of a graph
walk are missing it is possible that we wouldn't notice if that missing
commit was part of the commit graph. Thus, a query like `git rev-parse
HEAD~2` can succeed even if the intermittent commit is missing.

It's unclear whether there are additional ways in which such stale
commit graphs can lead to problems. In any case, it feels like this is a
bigger bug waiting to happen when we gain additional direct or indirect
callers of `repo_parse_commit_internal()`. So let's fix the inconsistent
behaviour by checking for object existence via the object database, as
well.

This check of course comes with a performance penalty. The following
benchmarks have been executed in a clone of linux.git with stable tags
added:

    Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
      Time (mean ± σ):      2.913 s ±  0.018 s    [User: 2.363 s, System: 0.548 s]
      Range (min … max):    2.894 s …  2.950 s    10 runs

    Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
      Time (mean ± σ):      3.834 s ±  0.052 s    [User: 3.276 s, System: 0.556 s]
      Range (min … max):    3.780 s …  3.961 s    10 runs

    Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
      Time (mean ± σ):     13.841 s ±  0.084 s    [User: 13.152 s, System: 0.687 s]
      Range (min … max):   13.714 s … 13.995 s    10 runs

    Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
      Time (mean ± σ):     13.762 s ±  0.116 s    [User: 13.094 s, System: 0.667 s]
      Range (min … max):   13.645 s … 14.038 s    10 runs

    Summary
      git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
        1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
        4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
        4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)

We look at a ~30% regression in general, but in general we're still a
whole lot faster than without the commit graph. To counteract this, the
new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 commit.c                | 16 +++++++++++++++-
 t/t5318-commit-graph.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index b3223478bc..8405d7c3fc 100644
--- a/commit.c
+++ b/commit.c
@@ -28,6 +28,7 @@
 #include "shallow.h"
 #include "tree.h"
 #include "hook.h"
+#include "parse.h"
 
 static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
 
@@ -572,8 +573,21 @@ int repo_parse_commit_internal(struct repository *r,
 		return -1;
 	if (item->object.parsed)
 		return 0;
-	if (use_commit_graph && parse_commit_in_graph(r, item))
+	if (use_commit_graph && parse_commit_in_graph(r, item)) {
+		static int commit_graph_paranoia = -1;
+
+		if (commit_graph_paranoia == -1)
+			commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
+
+		if (commit_graph_paranoia && !has_object(r, &item->object.oid, 0)) {
+			unparse_commit(r, &item->object.oid);
+			return quiet_on_missing ? -1 :
+				error(_("commit %s exists in commit-graph but not in the object database"),
+				      oid_to_hex(&item->object.oid));
+		}
+
 		return 0;
+	}
 
 	if (oid_object_info_extended(r, &item->object.oid, &oi, flags) < 0)
 		return quiet_on_missing ? -1 :
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c0cc454538..7b1c331b07 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -842,4 +842,31 @@ test_expect_success 'stale commit cannot be parsed when given directly' '
 	)
 '
 
+test_expect_success 'stale commit cannot be parsed when traversing graph' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit A &&
+		test_commit B &&
+		test_commit C &&
+		git commit-graph write --reachable &&
+
+		# Corrupt the repository by deleting the intermediate commit
+		# object. Commands should notice that this object is absent and
+		# thus that the repository is corrupt even if the commit graph
+		# exists.
+		oid=$(git rev-parse B) &&
+		rm .git/objects/"$(test_oid_to_path "$oid")" &&
+
+		# Again, we should be able to parse the commit when not
+		# being paranoid about commit graph staleness...
+		GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 &&
+		# ... but fail when we are paranoid.
+		test_must_fail git rev-parse HEAD~2 2>error &&
+		grep "error: commit $oid exists in commit-graph but not in the object database" error
+	)
+'
+
 test_done
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/2] commit-graph: detect commits missing in ODB
  2023-10-31  7:16 ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
  2023-10-31  7:16   ` [PATCH v3 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
  2023-10-31  7:16   ` [PATCH v3 2/2] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
@ 2023-10-31 19:16   ` Taylor Blau
  2023-10-31 23:57     ` Junio C Hamano
  2023-10-31 23:55   ` Junio C Hamano
  3 siblings, 1 reply; 18+ messages in thread
From: Taylor Blau @ 2023-10-31 19:16 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano, Jeff King

On Tue, Oct 31, 2023 at 08:16:09AM +0100, Patrick Steinhardt wrote:
> Patrick Steinhardt (2):
>   commit-graph: introduce envvar to disable commit existence checks
>   commit: detect commits that exist in commit-graph but not in the ODB
>
>  Documentation/git.txt   | 10 +++++++++
>  commit-graph.c          |  6 +++++-
>  commit-graph.h          |  6 ++++++
>  commit.c                | 16 +++++++++++++-
>  t/t5318-commit-graph.sh | 48 +++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 84 insertions(+), 2 deletions(-)
>
> Range-diff against v2:

Thanks, the range-diff here looks exactly as expected. Thanks for
working on this, this version LGTM.

Thanks,
Taylor


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/2] commit-graph: detect commits missing in ODB
  2023-10-31  7:16 ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2023-10-31 19:16   ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Taylor Blau
@ 2023-10-31 23:55   ` Junio C Hamano
  3 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2023-10-31 23:55 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Taylor Blau, Jeff King

Patrick Steinhardt <ps@pks.im> writes:

> Hi,
>
> this is version 3 of my patch series to more readily detect commits
> parsed from the commit graph which are missing in the object database.
>
> Changes compared to v2:
>
>     - Rewrote the help text for `GIT_COMMIT_GRAPH_PARANOIA` to be more
>       accessible.
>
>     - Renamed the `object_paranoia` variable to `commit_graph_paranoia`.
>
>     - Fixed a typo.
>
> Thanks!
>
> Patrick
>
> Patrick Steinhardt (2):
>   commit-graph: introduce envvar to disable commit existence checks
>   commit: detect commits that exist in commit-graph but not in the ODB

Yikes, isn't this already in 'next'???



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/2] commit-graph: detect commits missing in ODB
  2023-10-31 19:16   ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Taylor Blau
@ 2023-10-31 23:57     ` Junio C Hamano
  2023-11-01  2:06       ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2023-10-31 23:57 UTC (permalink / raw
  To: Taylor Blau; +Cc: Patrick Steinhardt, git, Karthik Nayak, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> On Tue, Oct 31, 2023 at 08:16:09AM +0100, Patrick Steinhardt wrote:
>> Patrick Steinhardt (2):
>>   commit-graph: introduce envvar to disable commit existence checks
>>   commit: detect commits that exist in commit-graph but not in the ODB
>>
>>  Documentation/git.txt   | 10 +++++++++
>>  commit-graph.c          |  6 +++++-
>>  commit-graph.h          |  6 ++++++
>>  commit.c                | 16 +++++++++++++-
>>  t/t5318-commit-graph.sh | 48 +++++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 84 insertions(+), 2 deletions(-)
>>
>> Range-diff against v2:
>
> Thanks, the range-diff here looks exactly as expected. Thanks for
> working on this, this version LGTM.

OK, I'd like a version as incremental to v2 (since it already is in
'next') that results in the same tree state as v3 then.

Thanks for working on it, and reviewing it.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/2] commit-graph: detect commits missing in ODB
  2023-10-31 23:57     ` Junio C Hamano
@ 2023-11-01  2:06       ` Junio C Hamano
  2023-11-01  4:55         ` Patrick Steinhardt
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2023-11-01  2:06 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: Taylor Blau, git, Karthik Nayak, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

>> Thanks, the range-diff here looks exactly as expected. Thanks for
>> working on this, this version LGTM.
>
> OK, I'd like a version as incremental to v2 (since it already is in
> 'next') that results in the same tree state as v3 then.
>
> Thanks for working on it, and reviewing it.

In the meantime, here is a mechanically produced incremental I'll
tentatively queue.  Hopefully I did not screw up while generating
it.

Thanks.

--- >8 ---
From: Patrick Steinhardt <ps@pks.im>
Date: Tue, 31 Oct 2023 08:16:09 +0100
Subject: [PATCH] commit-graph: clarify GIT_COMMIT_GRAPH_PARANOIA documentation

In response to reviews of the previous round that has already hit
'next', clarify the help text for GIT_COMMIT_GRAPH_PARANOIA and
rename object_paranoia variable to commit_graph_paranoia for
consistency.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git.txt   | 15 ++++++++-------
 commit-graph.c          |  8 ++++----
 commit.c                |  8 ++++----
 t/t5318-commit-graph.sh |  2 +-
 4 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 22c2b537aa..3bac24cf8a 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -912,13 +912,14 @@ for full details.
 	useful when trying to salvage data from a corrupted repository.
 
 `GIT_COMMIT_GRAPH_PARANOIA`::
-	If this Boolean environment variable is set to false, ignore the
-	case where commits exist in the commit graph but not in the
-	object database. Normally, Git will check whether commits loaded
-	from the commit graph exist in the object database to avoid
-	issues with stale commit graphs, but this check comes with a
-	performance penalty. The default is `1` (i.e., be paranoid about
-	stale commits in the commit graph).
+	When loading a commit object from the commit-graph, Git performs an
+	existence check on the object in the object database. This is done to
+	avoid issues with stale commit-graphs that contain references to
+	already-deleted commits, but comes with a performance penalty.
++
+The default is "true", which enables the aforementioned behavior.
+Setting this to "false" disables the existence check. This can lead to
+a performance improvement at the cost of consistency.
 
 `GIT_ALLOW_PROTOCOL`::
 	If set to a colon-separated list of protocols, behave as if
diff --git a/commit-graph.c b/commit-graph.c
index 376f59af73..b37fdcb214 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -907,18 +907,18 @@ int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c,
 
 struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
 {
-	static int object_paranoia = -1;
+	static int commit_graph_paranoia = -1;
 	struct commit *commit;
 	uint32_t pos;
 
-	if (object_paranoia == -1)
-		object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
+	if (commit_graph_paranoia == -1)
+		commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
 
 	if (!prepare_commit_graph(repo))
 		return NULL;
 	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
 		return NULL;
-	if (object_paranoia && !has_object(repo, id, 0))
+	if (commit_graph_paranoia && !has_object(repo, id, 0))
 		return NULL;
 
 	commit = lookup_commit(repo, id);
diff --git a/commit.c b/commit.c
index 7399e90212..8405d7c3fc 100644
--- a/commit.c
+++ b/commit.c
@@ -574,12 +574,12 @@ int repo_parse_commit_internal(struct repository *r,
 	if (item->object.parsed)
 		return 0;
 	if (use_commit_graph && parse_commit_in_graph(r, item)) {
-		static int object_paranoia = -1;
+		static int commit_graph_paranoia = -1;
 
-		if (object_paranoia == -1)
-			object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
+		if (commit_graph_paranoia == -1)
+			commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
 
-		if (object_paranoia && !has_object(r, &item->object.oid, 0)) {
+		if (commit_graph_paranoia && !has_object(r, &item->object.oid, 0)) {
 			unparse_commit(r, &item->object.oid);
 			return quiet_on_missing ? -1 :
 				error(_("commit %s exists in commit-graph but not in the object database"),
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 55e3c7ec78..2c62b91ef9 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -847,7 +847,7 @@ test_expect_success 'stale commit cannot be parsed when traversing graph' '
 		test_commit C &&
 		git commit-graph write --reachable &&
 
-		# Corrupt the repository by deleting the intermittent commit
+		# Corrupt the repository by deleting the intermediate commit
 		# object. Commands should notice that this object is absent and
 		# thus that the repository is corrupt even if the commit graph
 		# exists.
-- 
2.42.0-530-g692be87cbb



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/2] commit-graph: detect commits missing in ODB
  2023-11-01  2:06       ` Junio C Hamano
@ 2023-11-01  4:55         ` Patrick Steinhardt
  2023-11-01  5:01           ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick Steinhardt @ 2023-11-01  4:55 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Taylor Blau, git, Karthik Nayak, Jeff King

[-- Attachment #1: Type: text/plain, Size: 5402 bytes --]

On Wed, Nov 01, 2023 at 11:06:05AM +0900, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> >> Thanks, the range-diff here looks exactly as expected. Thanks for
> >> working on this, this version LGTM.
> >
> > OK, I'd like a version as incremental to v2 (since it already is in
> > 'next') that results in the same tree state as v3 then.
> >
> > Thanks for working on it, and reviewing it.
> 
> In the meantime, here is a mechanically produced incremental I'll
> tentatively queue.  Hopefully I did not screw up while generating
> it.
> 
> Thanks.

Ah, sorry, didn't notice it was in 'next' already. Anyway, the diff
below looks good to me, thanks!

Patrick

> --- >8 ---
> From: Patrick Steinhardt <ps@pks.im>
> Date: Tue, 31 Oct 2023 08:16:09 +0100
> Subject: [PATCH] commit-graph: clarify GIT_COMMIT_GRAPH_PARANOIA documentation
> 
> In response to reviews of the previous round that has already hit
> 'next', clarify the help text for GIT_COMMIT_GRAPH_PARANOIA and
> rename object_paranoia variable to commit_graph_paranoia for
> consistency.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/git.txt   | 15 ++++++++-------
>  commit-graph.c          |  8 ++++----
>  commit.c                |  8 ++++----
>  t/t5318-commit-graph.sh |  2 +-
>  4 files changed, 17 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index 22c2b537aa..3bac24cf8a 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -912,13 +912,14 @@ for full details.
>  	useful when trying to salvage data from a corrupted repository.
>  
>  `GIT_COMMIT_GRAPH_PARANOIA`::
> -	If this Boolean environment variable is set to false, ignore the
> -	case where commits exist in the commit graph but not in the
> -	object database. Normally, Git will check whether commits loaded
> -	from the commit graph exist in the object database to avoid
> -	issues with stale commit graphs, but this check comes with a
> -	performance penalty. The default is `1` (i.e., be paranoid about
> -	stale commits in the commit graph).
> +	When loading a commit object from the commit-graph, Git performs an
> +	existence check on the object in the object database. This is done to
> +	avoid issues with stale commit-graphs that contain references to
> +	already-deleted commits, but comes with a performance penalty.
> ++
> +The default is "true", which enables the aforementioned behavior.
> +Setting this to "false" disables the existence check. This can lead to
> +a performance improvement at the cost of consistency.
>  
>  `GIT_ALLOW_PROTOCOL`::
>  	If set to a colon-separated list of protocols, behave as if
> diff --git a/commit-graph.c b/commit-graph.c
> index 376f59af73..b37fdcb214 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -907,18 +907,18 @@ int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c,
>  
>  struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
>  {
> -	static int object_paranoia = -1;
> +	static int commit_graph_paranoia = -1;
>  	struct commit *commit;
>  	uint32_t pos;
>  
> -	if (object_paranoia == -1)
> -		object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
> +	if (commit_graph_paranoia == -1)
> +		commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
>  
>  	if (!prepare_commit_graph(repo))
>  		return NULL;
>  	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
>  		return NULL;
> -	if (object_paranoia && !has_object(repo, id, 0))
> +	if (commit_graph_paranoia && !has_object(repo, id, 0))
>  		return NULL;
>  
>  	commit = lookup_commit(repo, id);
> diff --git a/commit.c b/commit.c
> index 7399e90212..8405d7c3fc 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -574,12 +574,12 @@ int repo_parse_commit_internal(struct repository *r,
>  	if (item->object.parsed)
>  		return 0;
>  	if (use_commit_graph && parse_commit_in_graph(r, item)) {
> -		static int object_paranoia = -1;
> +		static int commit_graph_paranoia = -1;
>  
> -		if (object_paranoia == -1)
> -			object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
> +		if (commit_graph_paranoia == -1)
> +			commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
>  
> -		if (object_paranoia && !has_object(r, &item->object.oid, 0)) {
> +		if (commit_graph_paranoia && !has_object(r, &item->object.oid, 0)) {
>  			unparse_commit(r, &item->object.oid);
>  			return quiet_on_missing ? -1 :
>  				error(_("commit %s exists in commit-graph but not in the object database"),
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 55e3c7ec78..2c62b91ef9 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -847,7 +847,7 @@ test_expect_success 'stale commit cannot be parsed when traversing graph' '
>  		test_commit C &&
>  		git commit-graph write --reachable &&
>  
> -		# Corrupt the repository by deleting the intermittent commit
> +		# Corrupt the repository by deleting the intermediate commit
>  		# object. Commands should notice that this object is absent and
>  		# thus that the repository is corrupt even if the commit graph
>  		# exists.
> -- 
> 2.42.0-530-g692be87cbb
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/2] commit-graph: detect commits missing in ODB
  2023-11-01  4:55         ` Patrick Steinhardt
@ 2023-11-01  5:01           ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2023-11-01  5:01 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: Taylor Blau, git, Karthik Nayak, Jeff King

Patrick Steinhardt <ps@pks.im> writes:

>> In the meantime, here is a mechanically produced incremental I'll
>> tentatively queue.  Hopefully I did not screw up while generating
>> it.
>> 
>> Thanks.
>
> Ah, sorry, didn't notice it was in 'next' already. Anyway, the diff
> below looks good to me, thanks!

I changed my mind ;-) I'll kick out a few topics and rebuild 'next',
with your v3 patches.

Thanks.


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-11-01  5:01 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-23 11:27 [PATCH v2 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
2023-10-23 11:27 ` [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
2023-10-30 21:29   ` Taylor Blau
2023-10-31  6:19     ` Patrick Steinhardt
2023-10-23 11:27 ` [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
2023-10-24 19:10   ` Junio C Hamano
2023-10-30 21:32     ` Taylor Blau
2023-10-31  0:21       ` Junio C Hamano
2023-10-30 21:31   ` Taylor Blau
2023-10-31  7:16 ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Patrick Steinhardt
2023-10-31  7:16   ` [PATCH v3 1/2] commit-graph: introduce envvar to disable commit existence checks Patrick Steinhardt
2023-10-31  7:16   ` [PATCH v3 2/2] commit: detect commits that exist in commit-graph but not in the ODB Patrick Steinhardt
2023-10-31 19:16   ` [PATCH v3 0/2] commit-graph: detect commits missing in ODB Taylor Blau
2023-10-31 23:57     ` Junio C Hamano
2023-11-01  2:06       ` Junio C Hamano
2023-11-01  4:55         ` Patrick Steinhardt
2023-11-01  5:01           ` Junio C Hamano
2023-10-31 23:55   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).