git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH] repo-settings: set defaults even when not in a repo
@ 2022-03-23 18:03 Josh Steadmon
  2022-03-23 19:22 ` Derrick Stolee
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Josh Steadmon @ 2022-03-23 18:03 UTC (permalink / raw)
  To: git; +Cc: lessleydennington, gitster

prepare_repo_settings() initializes a `struct repository` with various
default config options and settings read from a repository-local config
file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
in git repos), prepare_repo_settings was changed to issue a BUG() if it
is called by a process whose CWD is not a Git repository. This approach
was suggested in [1].

This breaks fuzz-commit-graph, which attempts to parse arbitrary
fuzzing-engine-provided bytes as a commit graph file.
commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
since we run the fuzz tests without a valid repository, we are hitting
the BUG() from 44c7e62 for every test case.

Fix this by refactoring prepare_repo_settings() such that it sets
default options unconditionally; if its process is in a Git repository,
it will also load settings from the local config. This eliminates the
need for a BUG() when not in a repository.

Concerns:

Are any callers strictly dependent on having a BUG() here? I suspect
that the worst that would happen is that rather than this BUG(), the
caller would later hit its own BUG() or die(), so I do not think this is
a blocker. Additionally, every builtin that directly calls
prepare_repo_settings is either marked as RUN_SETUP, which means we
would die() prior to calling it anyway, or checks on its own before
calling it (builtin/diff.c). There are several callers in library code,
though, and I have not tracked down how all of those are used.

Alternatives considered:

Setting up a valid repository for fuzz testing would avoid triggering
this bug, but would unacceptably slow down the test cases.

Refactoring parse_commit_graph() in such a way that the fuzz test has an
alternate entry point that avoids calling prepare_repo_settings() might
be possible, but would be a much larger change than this one. It would
also run the risk that the changes would be so extensive that the fuzzer
would be merely testing its own custom commit-graph implementation,
rather than the one that's actually used in the real world.

[1]: https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 repo-settings.c | 111 ++++++++++++++++++++++++------------------------
 1 file changed, 55 insertions(+), 56 deletions(-)

diff --git a/repo-settings.c b/repo-settings.c
index b4fbd16cdc..d154b37007 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -17,9 +17,6 @@ void prepare_repo_settings(struct repository *r)
 	char *strval;
 	int manyfiles;
 
-	if (!r->gitdir)
-		BUG("Cannot add settings for uninitialized repository");
-
 	if (r->settings.initialized++)
 		return;
 
@@ -28,27 +25,63 @@ void prepare_repo_settings(struct repository *r)
 	r->settings.core_untracked_cache = UNTRACKED_CACHE_KEEP;
 	r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
 
-	/* Booleans config or default, cascades to other settings */
-	repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
-	repo_cfg_bool(r, "feature.experimental", &experimental, 0);
+	if (r->gitdir) {
+		/* Booleans config or default, cascades to other settings */
+		repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
+		repo_cfg_bool(r, "feature.experimental", &experimental, 0);
 
-	/* Defaults modified by feature.* */
-	if (experimental) {
-		r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
-	}
-	if (manyfiles) {
-		r->settings.index_version = 4;
-		r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
-	}
+		/* Defaults modified by feature.* */
+		if (experimental) {
+			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
+		}
+		if (manyfiles) {
+			r->settings.index_version = 4;
+			r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
+		}
+
+		/* Boolean config or default, does not cascade (simple)  */
+		repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
+		repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
+		repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
+		repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
+		repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
+		repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
+		repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
 
-	/* Boolean config or default, does not cascade (simple)  */
-	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
-	repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
-	repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
-	repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
-	repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
-	repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
-	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
+		/*
+		 * Non-boolean config
+		 */
+		if (!repo_config_get_int(r, "index.version", &value))
+			r->settings.index_version = value;
+
+		if (!repo_config_get_string(r, "core.untrackedcache", &strval)) {
+			int v = git_parse_maybe_bool(strval);
+
+			/*
+			 * If it's set to "keep", or some other non-boolean
+			 * value then "v < 0". Then we do nothing and keep it
+			 * at the default of UNTRACKED_CACHE_KEEP.
+			 */
+			if (v >= 0)
+				r->settings.core_untracked_cache = v ?
+					UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE;
+			free(strval);
+		}
+
+		if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) {
+			int fetch_default = r->settings.fetch_negotiation_algorithm;
+			if (!strcasecmp(strval, "skipping"))
+				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
+			else if (!strcasecmp(strval, "noop"))
+				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP;
+			else if (!strcasecmp(strval, "consecutive"))
+				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
+			else if (!strcasecmp(strval, "default"))
+				r->settings.fetch_negotiation_algorithm = fetch_default;
+			else
+				die("unknown fetch negotiation algorithm '%s'", strval);
+		}
+	}
 
 	/*
 	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
@@ -60,40 +93,6 @@ void prepare_repo_settings(struct repository *r)
 	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
 		r->settings.core_multi_pack_index = 1;
 
-	/*
-	 * Non-boolean config
-	 */
-	if (!repo_config_get_int(r, "index.version", &value))
-		r->settings.index_version = value;
-
-	if (!repo_config_get_string(r, "core.untrackedcache", &strval)) {
-		int v = git_parse_maybe_bool(strval);
-
-		/*
-		 * If it's set to "keep", or some other non-boolean
-		 * value then "v < 0". Then we do nothing and keep it
-		 * at the default of UNTRACKED_CACHE_KEEP.
-		 */
-		if (v >= 0)
-			r->settings.core_untracked_cache = v ?
-				UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE;
-		free(strval);
-	}
-
-	if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) {
-		int fetch_default = r->settings.fetch_negotiation_algorithm;
-		if (!strcasecmp(strval, "skipping"))
-			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
-		else if (!strcasecmp(strval, "noop"))
-			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP;
-		else if (!strcasecmp(strval, "consecutive"))
-			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
-		else if (!strcasecmp(strval, "default"))
-			r->settings.fetch_negotiation_algorithm = fetch_default;
-		else
-			die("unknown fetch negotiation algorithm '%s'", strval);
-	}
-
 	/*
 	 * This setting guards all index reads to require a full index
 	 * over a sparse index. After suitable guards are placed in the

base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7
-- 
2.35.1.894.gb6a874cedc-goog


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-23 18:03 [RFC PATCH] repo-settings: set defaults even when not in a repo Josh Steadmon
@ 2022-03-23 19:22 ` Derrick Stolee
  2022-03-23 19:52   ` Taylor Blau
  2022-03-23 20:11 ` [RFC PATCH] repo-settings: set defaults even when not in a repo Victoria Dye
  2022-03-23 20:51 ` Junio C Hamano
  2 siblings, 1 reply; 24+ messages in thread
From: Derrick Stolee @ 2022-03-23 19:22 UTC (permalink / raw)
  To: Josh Steadmon, git; +Cc: lessleydennington, gitster

On 3/23/2022 2:03 PM, Josh Steadmon wrote:
> prepare_repo_settings() initializes a `struct repository` with various
> default config options and settings read from a repository-local config
> file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> in git repos), prepare_repo_settings was changed to issue a BUG() if it
> is called by a process whose CWD is not a Git repository. This approach
> was suggested in [1].
> 
> This breaks fuzz-commit-graph, which attempts to parse arbitrary
> fuzzing-engine-provided bytes as a commit graph file.
> commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> since we run the fuzz tests without a valid repository, we are hitting
> the BUG() from 44c7e62 for every test case.
> 
> Fix this by refactoring prepare_repo_settings() such that it sets
> default options unconditionally; if its process is in a Git repository,
> it will also load settings from the local config. This eliminates the
> need for a BUG() when not in a repository.

I think you have the right idea and this can work.
 
> -	/* Booleans config or default, cascades to other settings */
> -	repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
> -	repo_cfg_bool(r, "feature.experimental", &experimental, 0);
> +	if (r->gitdir) {
> +		/* Booleans config or default, cascades to other settings */
> +		repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
> +		repo_cfg_bool(r, "feature.experimental", &experimental, 0);

However, this giant if block is going to be a bit unwieldy,
especially as we add settings in the future.

Ignoring whitespace, this is your diff:

diff --git a/repo-settings.c b/repo-settings.c
index b4fbd16cdc..d154b37007 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -17,9 +17,6 @@ void prepare_repo_settings(struct repository *r)
 	char *strval;
 	int manyfiles;
 
-	if (!r->gitdir)
-		BUG("Cannot add settings for uninitialized repository");
-
 	if (r->settings.initialized++)
 		return;
 
@@ -28,6 +25,7 @@ void prepare_repo_settings(struct repository *r)
 	r->settings.core_untracked_cache = UNTRACKED_CACHE_KEEP;
 	r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
 
+	if (r->gitdir) {
 		/* Booleans config or default, cascades to other settings */
 		repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
 		repo_cfg_bool(r, "feature.experimental", &experimental, 0);
@@ -50,16 +48,6 @@ void prepare_repo_settings(struct repository *r)
 		repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
 		repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);

(Adding the if)

-	/*
-	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
-	 * either it *or* the config sets
-	 * r->settings.core_multi_pack_index if true. We don't take
-	 * the environment variable if it exists (even if false) over
-	 * any config, as in most other cases.
-	 */
-	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		r->settings.core_multi_pack_index = 1;
-
 		/*
 		 * Non-boolean config
 		 */
@@ -93,6 +81,17 @@ void prepare_repo_settings(struct repository *r)
 			else
 				die("unknown fetch negotiation algorithm '%s'", strval);
 		}
+	}
+
+	/*
+	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
+	 * either it *or* the config sets
+	 * r->settings.core_multi_pack_index if true. We don't take
+	 * the environment variable if it exists (even if false) over
+	 * any config, as in most other cases.
+	 */
+	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
+		r->settings.core_multi_pack_index = 1;
 
(Moving this to group with other no-dir settings.)

I think what you're really trying to do is

	if (r->gitdir)
		(Load settings that require a repo)

	(Load settings that work without a repo.)

I think that you'd be better off extracting the two types of config
into helper methods (prepare_gitdir_settings()/prepare_nodir_settings()?)
across two patches, then in a third removing the BUG and inserting the
if (r->gitdir) above.

Does that seem reasonable?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-23 19:22 ` Derrick Stolee
@ 2022-03-23 19:52   ` Taylor Blau
  2022-03-28 19:15     ` Josh Steadmon
                       ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Taylor Blau @ 2022-03-23 19:52 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Josh Steadmon, git, lessleydennington, gitster

On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
> On 3/23/2022 2:03 PM, Josh Steadmon wrote:
> > prepare_repo_settings() initializes a `struct repository` with various
> > default config options and settings read from a repository-local config
> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> > in git repos), prepare_repo_settings was changed to issue a BUG() if it
> > is called by a process whose CWD is not a Git repository. This approach
> > was suggested in [1].
> >
> > This breaks fuzz-commit-graph, which attempts to parse arbitrary
> > fuzzing-engine-provided bytes as a commit graph file.
> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> > since we run the fuzz tests without a valid repository, we are hitting
> > the BUG() from 44c7e62 for every test case.
> >
> > Fix this by refactoring prepare_repo_settings() such that it sets
> > default options unconditionally; if its process is in a Git repository,
> > it will also load settings from the local config. This eliminates the
> > need for a BUG() when not in a repository.
>
> I think you have the right idea and this can work.

Hmmm. To me this feels like bending over backwards in
`prepare_repo_settings()` to accommodate one particular caller. I'm not
necessarily opposed to it, but it does feel strange to make
`prepare_repo_settings()` a noop here, since I would expect that any
callers who do want to call `prepare_repo_settings()` are likely
convinced that they are inside of a repository, and it probably should
be a BUG() if they aren't.

I was initially thinking that Josh's alternative of introducing and
calling a lower-level version of `prepare_commit_graph()` that doesn't
require the use of any repository settings would make sense.

But when I started looking at implementing it, I became confused at how
this is supposed to work at all without using a repository. We depend on
the values parsed from:

  - commitGraph.generationVersion
  - commitGraph.readChangedPaths

to determine which part(s) of the commit graph we do and don't read.

Looking around, I think I probably inadvertently broke this in
ab14d0676c (commit-graph: pass a 'struct repository *' in more places,
2020-09-09). But prior to ab14d0676c, neither of those settings existed,
so parsing the commit graph was a pure function of the commit graph's
contents alone, and didn't rely on the existence of a repository.

We could pretend as if `commitGraph.generationVersion` is always "2" and
`commitGraph.readChangedPaths` is always "true", and I think that would
still give us good-enough coverage.

I assume that libFuzzer doesn't give us a convenient way to stub out a
repository. Maybe we should have a variant of `parse_commit_graph` that
instead takes a `struct repo_settings` filled out with the defaults?

We could use that to teach libFuzzer about additional states that the
commit graph parser can be in, too (though probably outside the scope of
this patch).

I tried to sketch all of this out, which seems to work. Let me know what
you think:

--- 8< ---

diff --git a/commit-graph.c b/commit-graph.c
index adffd020dd..3d5aa536c2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -96,13 +96,6 @@ define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
 static struct commit_graph_data_slab commit_graph_data_slab =
 	COMMIT_SLAB_INIT(1, commit_graph_data_slab);

-static int get_configured_generation_version(struct repository *r)
-{
-	int version = 2;
-	repo_config_get_int(r, "commitgraph.generationversion", &version);
-	return version;
-}
-
 uint32_t commit_graph_position(const struct commit *c)
 {
 	struct commit_graph_data *data =
@@ -335,6 +328,13 @@ static int graph_read_bloom_data(const unsigned char *chunk_start,

 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size)
+{
+	prepare_repo_settings(r);
+	return parse_commit_graph_settings(&r->settings, graph_map, graph_size);
+}
+
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+						 void *graph_map, size_t graph_size)
 {
 	const unsigned char *data;
 	struct commit_graph *graph;
@@ -371,8 +371,6 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 		return NULL;
 	}

-	prepare_repo_settings(r);
-
 	graph = alloc_commit_graph();

 	graph->hash_len = the_hash_algo->rawsz;
@@ -402,7 +400,7 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
 	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);

-	if (get_configured_generation_version(r) >= 2) {
+	if (s->commit_graph_generation_version >= 2) {
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
 			&graph->chunk_generation_data);
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
@@ -412,7 +410,7 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 			graph->read_generation_data = 1;
 	}

-	if (r->settings.commit_graph_read_changed_paths) {
+	if (s->commit_graph_read_changed_paths) {
 		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
 			   &graph->chunk_bloom_indexes);
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
@@ -2299,7 +2297,7 @@ int write_commit_graph(struct object_directory *odb,
 	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
 	ctx->opts = opts;
 	ctx->total_bloom_filter_data_size = 0;
-	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
+	ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2);
 	ctx->num_generation_data_overflows = 0;

 	bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
diff --git a/commit-graph.h b/commit-graph.h
index 2e3ac35237..1fdca7a927 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -95,6 +95,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r,
 					   struct object_directory *odb);
 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size);
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+					void *graph_map, size_t graph_size);

 /*
  * Return 1 if and only if the repository has a commit-graph
diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
index e7cf6d5b0f..07ef4643d8 100644
--- a/fuzz-commit-graph.c
+++ b/fuzz-commit-graph.c
@@ -11,7 +11,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
 	struct commit_graph *g;

 	initialize_the_repository();
-	g = parse_commit_graph(the_repository, (void *)data, size);
+	g = parse_commit_graph_settings(the_repository->settings, (void *)data, size);
 	repo_clear(the_repository);
 	free_commit_graph(g);

diff --git a/repo-settings.c b/repo-settings.c
index b4fbd16cdc..a7e5d10b27 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -10,6 +10,13 @@ static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
 		*dest = def;
 }

+static void repo_cfg_int(struct repository *r, const char *key, int *dest,
+			 int def)
+{
+	if (repo_config_get_int(r, key, dest))
+		*dest = def;
+}
+
 void prepare_repo_settings(struct repository *r)
 {
 	int experimental;
@@ -43,6 +50,7 @@ void prepare_repo_settings(struct repository *r)

 	/* Boolean config or default, does not cascade (simple)  */
 	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
+	repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
 	repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
 	repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
 	repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
diff --git a/repository.h b/repository.h
index e29f361703..6c40ea3f1b 100644
--- a/repository.h
+++ b/repository.h
@@ -29,6 +29,7 @@ struct repo_settings {
 	int initialized;

 	int core_commit_graph;
+	int commit_graph_generation_version;
 	int commit_graph_read_changed_paths;
 	int gc_write_commit_graph;
 	int fetch_write_commit_graph;

--- >8 ---

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-23 18:03 [RFC PATCH] repo-settings: set defaults even when not in a repo Josh Steadmon
  2022-03-23 19:22 ` Derrick Stolee
@ 2022-03-23 20:11 ` Victoria Dye
  2022-03-23 20:54   ` Junio C Hamano
  2022-03-23 20:51 ` Junio C Hamano
  2 siblings, 1 reply; 24+ messages in thread
From: Victoria Dye @ 2022-03-23 20:11 UTC (permalink / raw)
  To: Josh Steadmon, git; +Cc: lessleydennington, gitster

Josh Steadmon wrote:
> prepare_repo_settings() initializes a `struct repository` with various
> default config options and settings read from a repository-local config
> file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> in git repos), prepare_repo_settings was changed to issue a BUG() if it
> is called by a process whose CWD is not a Git repository. This approach
> was suggested in [1].
> 
> This breaks fuzz-commit-graph, which attempts to parse arbitrary
> fuzzing-engine-provided bytes as a commit graph file.
> commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> since we run the fuzz tests without a valid repository, we are hitting
> the BUG() from 44c7e62 for every test case.
> 
> Fix this by refactoring prepare_repo_settings() such that it sets
> default options unconditionally; if its process is in a Git repository,
> it will also load settings from the local config. This eliminates the
> need for a BUG() when not in a repository.
> 

I think the decision of whether to go with this approach or the alternative
listed below depends on the validity of a 'repository' without a gitdir. 

As far as I can tell, there is an implicit conflict between the changes in:

1. b66d84756f (commit-graph: respect 'commitGraph.readChangedPaths',
   2020-09-09)
2. 44c7e62e51 (repo-settings: prepare_repo_settings only in git repos,
   2021-12-06) (as you pointed out in your message)

The former says that commit-graph should use a repository setting (implying
it needs a valid repository), and the latter says that you need a valid
gitdir to get repository settings.

So to me, how to proceed depends on whether a repository can be "valid"
without a gitdir or not:

* If it is valid (or not-completely-broken/semi-valid - i.e., not a BUG()),
  then your approach is probably fine - the 'BUG()' should be removed from
  'prepare_repo_settings()'. But, to make sure we don't run into this
  question again, 'prepare_repo_settings()' should be audited for any
  settings/setup that implicitly require a gitdir and updated accordingly,
  along with a comment somewhere saying that the 'gitdir' isn't required.
* If it is not valid, then this use in 'fuzz-commit-graph' tests is fragile
  (even if it's not a problem right now) - 'prepare_repo_settings()' could
  start needing the gitdir, or "the_hash_algo" could become invalid, or some
  other problem could arise from the invalid repo and the tests would break.

I don't have a good answer to this question, but someone with more
experience in this area might be able to help.

> Concerns:
> 
> Are any callers strictly dependent on having a BUG() here? I suspect
> that the worst that would happen is that rather than this BUG(), the
> caller would later hit its own BUG() or die(), so I do not think this is
> a blocker. Additionally, every builtin that directly calls
> prepare_repo_settings is either marked as RUN_SETUP, which means we
> would die() prior to calling it anyway, or checks on its own before
> calling it (builtin/diff.c). There are several callers in library code,
> though, and I have not tracked down how all of those are used.
> 
> Alternatives considered:
> 
> Setting up a valid repository for fuzz testing would avoid triggering
> this bug, but would unacceptably slow down the test cases.
> 
> Refactoring parse_commit_graph() in such a way that the fuzz test has an
> alternate entry point that avoids calling prepare_repo_settings() might
> be possible, but would be a much larger change than this one. It would
> also run the risk that the changes would be so extensive that the fuzzer
> would be merely testing its own custom commit-graph implementation,
> rather than the one that's actually used in the real world.
> 

If the 'repository' really is *completely* invalid without a gitdir, I don't
think the refactor of 'commit-graph.c' would necessarily be too bad -
certainly not so extensive that it's not testing "real world" behavior:

* Make 'parse_commit_graph()' only call 'prepare_repo_settings()' if the
  repo has a valid gitdir (guarded like 'git diff' is)
* Guard all use use of 'r' in 'parse_commit_graph()' based on the existence
  of 'gitdir', with fallbacks on e.g. global config settings. From what I
  can see, the only usage is:
    * The check for 'r->settings.commit_graph_read_changed_paths'
    * The call to 'get_configured_generation_version()'
    * The use of 'the_hash_algo->raw_sz'

> [1]: https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/
> 
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
>  repo-settings.c | 111 ++++++++++++++++++++++++------------------------
>  1 file changed, 55 insertions(+), 56 deletions(-)
> 
> diff --git a/repo-settings.c b/repo-settings.c
> index b4fbd16cdc..d154b37007 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -17,9 +17,6 @@ void prepare_repo_settings(struct repository *r)
>  	char *strval;
>  	int manyfiles;
>  
> -	if (!r->gitdir)
> -		BUG("Cannot add settings for uninitialized repository");
> -
>  	if (r->settings.initialized++)
>  		return;
>  
> @@ -28,27 +25,63 @@ void prepare_repo_settings(struct repository *r)
>  	r->settings.core_untracked_cache = UNTRACKED_CACHE_KEEP;
>  	r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
>  
> -	/* Booleans config or default, cascades to other settings */
> -	repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
> -	repo_cfg_bool(r, "feature.experimental", &experimental, 0);
> +	if (r->gitdir) {
> +		/* Booleans config or default, cascades to other settings */
> +		repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
> +		repo_cfg_bool(r, "feature.experimental", &experimental, 0);
>  
> -	/* Defaults modified by feature.* */
> -	if (experimental) {
> -		r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> -	}
> -	if (manyfiles) {
> -		r->settings.index_version = 4;
> -		r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
> -	}
> +		/* Defaults modified by feature.* */
> +		if (experimental) {
> +			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> +		}
> +		if (manyfiles) {
> +			r->settings.index_version = 4;
> +			r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
> +		}
> +
> +		/* Boolean config or default, does not cascade (simple)  */
> +		repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> +		repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
> +		repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
> +		repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
> +		repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
> +		repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
> +		repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
>  
> -	/* Boolean config or default, does not cascade (simple)  */
> -	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> -	repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
> -	repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
> -	repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
> -	repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
> -	repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
> -	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
> +		/*
> +		 * Non-boolean config
> +		 */
> +		if (!repo_config_get_int(r, "index.version", &value))
> +			r->settings.index_version = value;
> +
> +		if (!repo_config_get_string(r, "core.untrackedcache", &strval)) {
> +			int v = git_parse_maybe_bool(strval);
> +
> +			/*
> +			 * If it's set to "keep", or some other non-boolean
> +			 * value then "v < 0". Then we do nothing and keep it
> +			 * at the default of UNTRACKED_CACHE_KEEP.
> +			 */
> +			if (v >= 0)
> +				r->settings.core_untracked_cache = v ?
> +					UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE;
> +			free(strval);
> +		}
> +
> +		if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) {
> +			int fetch_default = r->settings.fetch_negotiation_algorithm;
> +			if (!strcasecmp(strval, "skipping"))
> +				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> +			else if (!strcasecmp(strval, "noop"))
> +				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP;
> +			else if (!strcasecmp(strval, "consecutive"))
> +				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
> +			else if (!strcasecmp(strval, "default"))
> +				r->settings.fetch_negotiation_algorithm = fetch_default;
> +			else
> +				die("unknown fetch negotiation algorithm '%s'", strval);
> +		}
> +	}
>  
>  	/*
>  	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
> @@ -60,40 +93,6 @@ void prepare_repo_settings(struct repository *r)
>  	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
>  		r->settings.core_multi_pack_index = 1;
>  
> -	/*
> -	 * Non-boolean config
> -	 */
> -	if (!repo_config_get_int(r, "index.version", &value))
> -		r->settings.index_version = value;
> -
> -	if (!repo_config_get_string(r, "core.untrackedcache", &strval)) {
> -		int v = git_parse_maybe_bool(strval);
> -
> -		/*
> -		 * If it's set to "keep", or some other non-boolean
> -		 * value then "v < 0". Then we do nothing and keep it
> -		 * at the default of UNTRACKED_CACHE_KEEP.
> -		 */
> -		if (v >= 0)
> -			r->settings.core_untracked_cache = v ?
> -				UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE;
> -		free(strval);
> -	}
> -
> -	if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) {
> -		int fetch_default = r->settings.fetch_negotiation_algorithm;
> -		if (!strcasecmp(strval, "skipping"))
> -			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> -		else if (!strcasecmp(strval, "noop"))
> -			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP;
> -		else if (!strcasecmp(strval, "consecutive"))
> -			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
> -		else if (!strcasecmp(strval, "default"))
> -			r->settings.fetch_negotiation_algorithm = fetch_default;
> -		else
> -			die("unknown fetch negotiation algorithm '%s'", strval);
> -	}
> -
>  	/*
>  	 * This setting guards all index reads to require a full index
>  	 * over a sparse index. After suitable guards are placed in the
> 
> base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-23 18:03 [RFC PATCH] repo-settings: set defaults even when not in a repo Josh Steadmon
  2022-03-23 19:22 ` Derrick Stolee
  2022-03-23 20:11 ` [RFC PATCH] repo-settings: set defaults even when not in a repo Victoria Dye
@ 2022-03-23 20:51 ` Junio C Hamano
  2 siblings, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2022-03-23 20:51 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, lessleydennington

Josh Steadmon <steadmon@google.com> writes:

> prepare_repo_settings() initializes a `struct repository` with various
> default config options and settings read from a repository-local config
> file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> in git repos), prepare_repo_settings was changed to issue a BUG() if it
> is called by a process whose CWD is not a Git repository. This approach
> was suggested in [1].
>
> This breaks fuzz-commit-graph, which attempts to parse arbitrary
> fuzzing-engine-provided bytes as a commit graph file.
> commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> since we run the fuzz tests without a valid repository, we are hitting
> the BUG() from 44c7e62 for every test case.

I think the right approach for such a breakage is to ensure it is in
a repository, not to force prepare_repo_settings() to lie.  In the
day-to-day real code paths the end user uses, we do want to catch
mistakes in our code that calls prepare_repo_settings() when we are
not in a repository.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-23 20:11 ` [RFC PATCH] repo-settings: set defaults even when not in a repo Victoria Dye
@ 2022-03-23 20:54   ` Junio C Hamano
  2022-03-23 21:19     ` Victoria Dye
  0 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2022-03-23 20:54 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Josh Steadmon, git, lessleydennington

Victoria Dye <vdye@github.com> writes:

> I think the decision of whether to go with this approach or the alternative
> listed below depends on the validity of a 'repository' without a gitdir. 
>
> As far as I can tell, there is an implicit conflict between the changes in:
>
> 1. b66d84756f (commit-graph: respect 'commitGraph.readChangedPaths',
>    2020-09-09)
> 2. 44c7e62e51 (repo-settings: prepare_repo_settings only in git repos,
>    2021-12-06) (as you pointed out in your message)
>
> The former says that commit-graph should use a repository setting (implying
> it needs a valid repository), and the latter says that you need a valid
> gitdir to get repository settings.
>
> So to me, how to proceed depends on whether a repository can be "valid"
> without a gitdir or not:

Sorry, I do not get it.  What does "a repository without a git dir"
look like?  It does not make any sense to me.  A repository without
working tree, I can understand.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-23 20:54   ` Junio C Hamano
@ 2022-03-23 21:19     ` Victoria Dye
  0 siblings, 0 replies; 24+ messages in thread
From: Victoria Dye @ 2022-03-23 21:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Steadmon, git, lessleydennington

Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
> 
>> I think the decision of whether to go with this approach or the alternative
>> listed below depends on the validity of a 'repository' without a gitdir. 
>>
>> As far as I can tell, there is an implicit conflict between the changes in:
>>
>> 1. b66d84756f (commit-graph: respect 'commitGraph.readChangedPaths',
>>    2020-09-09)
>> 2. 44c7e62e51 (repo-settings: prepare_repo_settings only in git repos,
>>    2021-12-06) (as you pointed out in your message)
>>
>> The former says that commit-graph should use a repository setting (implying
>> it needs a valid repository), and the latter says that you need a valid
>> gitdir to get repository settings.
>>
>> So to me, how to proceed depends on whether a repository can be "valid"
>> without a gitdir or not:
> 
> Sorry, I do not get it.  What does "a repository without a git dir"
> look like?  It does not make any sense to me.  A repository without
> working tree, I can understand.

I think that answers my question - if it doesn't make sense to have a
"repository without a git dir", then the code shouldn't allow
'the_repository' to be used without a valid 'the_repository.gitdir'. The
'BUG()' in 'prepare_repo_settings()' enforces that condition right now, so
removing it like this RFC does would make the tests fragile and the code
more prone to future bugs.

That said, if it's conceptually sensible for 'fuzz-commit-graph' to work
without a repository, then it could be updated to get its defaults from
somewhere other than 'the_repository' when the repo doesn't exist (like what
I mentioned in [1] ("If the 'repository' really is..."), or Taylor's
suggestion in [2]).

[1] https://lore.kernel.org/git/fcfdcbb9-761a-0d34-7d36-61e0ef279922@github.com/
[2] https://lore.kernel.org/git/Yjt6mLIfw0V3aVTO@nand.local/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-23 19:52   ` Taylor Blau
@ 2022-03-28 19:15     ` Josh Steadmon
  2022-03-29  1:21       ` Taylor Blau
  2022-03-28 19:53     ` Josh Steadmon
                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Josh Steadmon @ 2022-03-28 19:15 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Derrick Stolee, git, lessleydennington, gitster, vdye

On 2022.03.23 15:52, Taylor Blau wrote:
> On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
> > On 3/23/2022 2:03 PM, Josh Steadmon wrote:
> > > prepare_repo_settings() initializes a `struct repository` with various
> > > default config options and settings read from a repository-local config
> > > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> > > in git repos), prepare_repo_settings was changed to issue a BUG() if it
> > > is called by a process whose CWD is not a Git repository. This approach
> > > was suggested in [1].
> > >
> > > This breaks fuzz-commit-graph, which attempts to parse arbitrary
> > > fuzzing-engine-provided bytes as a commit graph file.
> > > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> > > since we run the fuzz tests without a valid repository, we are hitting
> > > the BUG() from 44c7e62 for every test case.
> > >
> > > Fix this by refactoring prepare_repo_settings() such that it sets
> > > default options unconditionally; if its process is in a Git repository,
> > > it will also load settings from the local config. This eliminates the
> > > need for a BUG() when not in a repository.
> >
> > I think you have the right idea and this can work.
> 
> Hmmm. To me this feels like bending over backwards in
> `prepare_repo_settings()` to accommodate one particular caller. I'm not
> necessarily opposed to it, but it does feel strange to make
> `prepare_repo_settings()` a noop here, since I would expect that any
> callers who do want to call `prepare_repo_settings()` are likely
> convinced that they are inside of a repository, and it probably should
> be a BUG() if they aren't.
> 
> I was initially thinking that Josh's alternative of introducing and
> calling a lower-level version of `prepare_commit_graph()` that doesn't
> require the use of any repository settings would make sense.
> 
> But when I started looking at implementing it, I became confused at how
> this is supposed to work at all without using a repository. We depend on
> the values parsed from:
> 
>   - commitGraph.generationVersion
>   - commitGraph.readChangedPaths
> 
> to determine which part(s) of the commit graph we do and don't read.
> 
> Looking around, I think I probably inadvertently broke this in
> ab14d0676c (commit-graph: pass a 'struct repository *' in more places,
> 2020-09-09). But prior to ab14d0676c, neither of those settings existed,
> so parsing the commit graph was a pure function of the commit graph's
> contents alone, and didn't rely on the existence of a repository.

Yeah, I have not done a great job keeping the fuzzers up to date with
commit-graph changes :(.

> We could pretend as if `commitGraph.generationVersion` is always "2" and
> `commitGraph.readChangedPaths` is always "true", and I think that would
> still give us good-enough coverage.

It might also be worthwhile for the fuzzer to test each interesting
combination of settings, using the same arbitrary input.

> I assume that libFuzzer doesn't give us a convenient way to stub out a
> repository. Maybe we should have a variant of `parse_commit_graph` that
> instead takes a `struct repo_settings` filled out with the defaults?
> 
> We could use that to teach libFuzzer about additional states that the
> commit graph parser can be in, too (though probably outside the scope of
> this patch).
> 
> I tried to sketch all of this out, which seems to work. Let me know what
> you think:

Yes, your patch looks like a much smaller change than I feared would be
the case for a parse_commit_graph refactor. I'll test it out, and
probably add some additional fuzzer improvements on top. Thanks for the
patch!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-23 19:52   ` Taylor Blau
  2022-03-28 19:15     ` Josh Steadmon
@ 2022-03-28 19:53     ` Josh Steadmon
  2022-03-29  1:22       ` Taylor Blau
  2022-03-29  9:03     ` Ævar Arnfjörð Bjarmason
                       ` (3 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Josh Steadmon @ 2022-03-28 19:53 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Derrick Stolee, git, lessleydennington, gitster, vdye

On 2022.03.23 15:52, Taylor Blau wrote:
> I tried to sketch all of this out, which seems to work. Let me know what
> you think:

BTW, is it OK if I include your Signed-off-by on this when I send my V2?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-28 19:15     ` Josh Steadmon
@ 2022-03-29  1:21       ` Taylor Blau
  0 siblings, 0 replies; 24+ messages in thread
From: Taylor Blau @ 2022-03-29  1:21 UTC (permalink / raw)
  To: Josh Steadmon, Derrick Stolee, git, lessleydennington, gitster, vdye

On Mon, Mar 28, 2022 at 12:15:53PM -0700, Josh Steadmon wrote:
> > Looking around, I think I probably inadvertently broke this in
> > ab14d0676c (commit-graph: pass a 'struct repository *' in more places,
> > 2020-09-09). But prior to ab14d0676c, neither of those settings existed,
> > so parsing the commit graph was a pure function of the commit graph's
> > contents alone, and didn't rely on the existence of a repository.
>
> Yeah, I have not done a great job keeping the fuzzers up to date with
> commit-graph changes :(.

I think that puts you and I in the same boat, since the original
breakage from ab14d0676c blames back to me. I'm sorry that I didn't
notice that my change had broken the fuzzing code at the time, and I
appreciate you working on fixing it!

> > We could pretend as if `commitGraph.generationVersion` is always "2" and
> > `commitGraph.readChangedPaths` is always "true", and I think that would
> > still give us good-enough coverage.
>
> It might also be worthwhile for the fuzzer to test each interesting
> combination of settings, using the same arbitrary input.

Definitely. I don't think it hurts to just focus on getting the common
case ("2", "true") working again. And if libFuzzer makes it easy-ish to
test more of the possible input space, I'm all for it.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-28 19:53     ` Josh Steadmon
@ 2022-03-29  1:22       ` Taylor Blau
  0 siblings, 0 replies; 24+ messages in thread
From: Taylor Blau @ 2022-03-29  1:22 UTC (permalink / raw)
  To: Josh Steadmon, Derrick Stolee, git, lessleydennington, gitster, vdye

On Mon, Mar 28, 2022 at 12:53:33PM -0700, Josh Steadmon wrote:
> On 2022.03.23 15:52, Taylor Blau wrote:
> > I tried to sketch all of this out, which seems to work. Let me know what
> > you think:
>
> BTW, is it OK if I include your Signed-off-by on this when I send my V2?

Yes, absolutely. Thanks for asking; it's fine to include my
Signed-off-by on any patches / diffs that I send to the mailing list.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-23 19:52   ` Taylor Blau
  2022-03-28 19:15     ` Josh Steadmon
  2022-03-28 19:53     ` Josh Steadmon
@ 2022-03-29  9:03     ` Ævar Arnfjörð Bjarmason
  2022-03-30  2:26       ` Taylor Blau
  2022-03-29  9:04     ` Ævar Arnfjörð Bjarmason
                       ` (2 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-29  9:03 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Derrick Stolee, Josh Steadmon, git, lessleydennington, gitster


On Wed, Mar 23 2022, Taylor Blau wrote:

>  	/* Boolean config or default, does not cascade (simple)  */
>  	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> +	repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);

If this ends up being kept let's add it to its own commented "section",
the comment 2-lines above it is now incorrect.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-23 19:52   ` Taylor Blau
                       ` (2 preceding siblings ...)
  2022-03-29  9:03     ` Ævar Arnfjörð Bjarmason
@ 2022-03-29  9:04     ` Ævar Arnfjörð Bjarmason
  2022-03-30  2:34       ` Taylor Blau
  2022-04-09  6:52     ` [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings Josh Steadmon
  2022-06-14 22:37     ` [PATCH v3] " Josh Steadmon
  5 siblings, 1 reply; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-29  9:04 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Derrick Stolee, Josh Steadmon, git, lessleydennington, gitster


On Wed, Mar 23 2022, Taylor Blau wrote:

> On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
>> On 3/23/2022 2:03 PM, Josh Steadmon wrote:
>> > prepare_repo_settings() initializes a `struct repository` with various
>> > default config options and settings read from a repository-local config
>> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
>> > in git repos), prepare_repo_settings was changed to issue a BUG() if it
>> > is called by a process whose CWD is not a Git repository. This approach
>> > was suggested in [1].
>> >
>> > This breaks fuzz-commit-graph, which attempts to parse arbitrary
>> > fuzzing-engine-provided bytes as a commit graph file.
>> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
>> > since we run the fuzz tests without a valid repository, we are hitting
>> > the BUG() from 44c7e62 for every test case.
>> >
>> > Fix this by refactoring prepare_repo_settings() such that it sets
>> > default options unconditionally; if its process is in a Git repository,
>> > it will also load settings from the local config. This eliminates the
>> > need for a BUG() when not in a repository.
>>
>> I think you have the right idea and this can work.
>
> Hmmm. To me this feels like bending over backwards in
> `prepare_repo_settings()` to accommodate one particular caller. I'm not
> necessarily opposed to it, but it does feel strange to make
> `prepare_repo_settings()` a noop here, since I would expect that any
> callers who do want to call `prepare_repo_settings()` are likely
> convinced that they are inside of a repository, and it probably should
> be a BUG() if they aren't.

I think adding that BUG() was overzelous in the first place, per
https://lore.kernel.org/git/211207.86r1apow9f.gmgdl@evledraar.gmail.com/;

I don't see what purpose it solves to be this overly anal in this code,
and 44c7e62e51e (repo-settings: prepare_repo_settings only in git repos,
2021-12-06) just discusses "what" and not "why".

I think a perfectly fine solution to this is just to revert it:
	
	diff --git a/repo-settings.c b/repo-settings.c
	index b4fbd16cdcc..e162c1479bf 100644
	--- a/repo-settings.c
	+++ b/repo-settings.c
	@@ -18,7 +18,7 @@ void prepare_repo_settings(struct repository *r)
	 	int manyfiles;
	 
	 	if (!r->gitdir)
	-		BUG("Cannot add settings for uninitialized repository");
	+		return;
	 
	 	if (r->settings.initialized++)
	 		return;

I have that in my local integration branch, because I ended up wanting
to add prepare_repo_settings() to usage.c, which may or may not run
inside a repo (and maybe we'll have that config, maybe not).

But really, in common-main.c we do a initialize_the_repository(), so a
"struct repository" is already a thing we have before we get to the
"RUN_SETUP_GENTLY" or whatever in git.c, and a bunch of things all over
the place assume that it's the equivalent of { 0 }-initialized.

If we actually want to turn repository.[ch] into some strict API where
"Tho Shalt Not Use the_repository unless" we're actually in a repo
surely we should have it be NULL then, and to add that BUG() to the
likes of initialize_the_repository().

Except I think there's no point in that, and it would just lead to
pointless churn, so why do it for the settings in particular? Why can't
they just be { 0 }-init'd too?

If some caller cares about the distinction between r->settings being
like it is because of us actually having a repo, or us using the
defaults why can't they just check r->gitdir themselves?

For the rest the default of "just provide the defaults then" is a much
saner API.

I think *maybe* what this actually wanted to do was to bridge the gap
between "startup_info->have_repository" and a caller in builtin/ calling
prepare_repo_settings(), i.e. that it was a logic error to have that
RUN_SETUP_GENTLY caller do that.

I can see how that *might* be useful as some sanity assertion, but then
maybe we could add a more narrow BUG() just for that case, even having a
builtin_prepare_repo_settings() wrapper in builtin.h or whatever.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-29  9:03     ` Ævar Arnfjörð Bjarmason
@ 2022-03-30  2:26       ` Taylor Blau
  2022-04-09  6:33         ` Josh Steadmon
  0 siblings, 1 reply; 24+ messages in thread
From: Taylor Blau @ 2022-03-30  2:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, Derrick Stolee, Josh Steadmon, git,
	lessleydennington, gitster

On Tue, Mar 29, 2022 at 11:03:14AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Mar 23 2022, Taylor Blau wrote:
>
> >  	/* Boolean config or default, does not cascade (simple)  */
> >  	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> > +	repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
>
> If this ends up being kept let's add it to its own commented "section",
> the comment 2-lines above it is now incorrect.

Thanks for pointing it out; indeed that comment is definitely stale with
respect to the newer code. This patch was just a sketch of an
alternative approach for Josh to consider, so I can't guarantee that it
isn't immune to nit-picks ;).

I think that Josh is planning on picking this up, so hopefully he
doesn't mind applying these and any other touch-ups locally before
resubmitting.

(Josh, if you do: thank you very much in advance!)

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-29  9:04     ` Ævar Arnfjörð Bjarmason
@ 2022-03-30  2:34       ` Taylor Blau
  2022-03-30 17:38         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 24+ messages in thread
From: Taylor Blau @ 2022-03-30  2:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, Derrick Stolee, Josh Steadmon, git,
	lessleydennington, gitster

On Tue, Mar 29, 2022 at 11:04:18AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Mar 23 2022, Taylor Blau wrote:
>
> > On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
> >> On 3/23/2022 2:03 PM, Josh Steadmon wrote:
> >> > prepare_repo_settings() initializes a `struct repository` with various
> >> > default config options and settings read from a repository-local config
> >> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> >> > in git repos), prepare_repo_settings was changed to issue a BUG() if it
> >> > is called by a process whose CWD is not a Git repository. This approach
> >> > was suggested in [1].
> >> >
> >> > This breaks fuzz-commit-graph, which attempts to parse arbitrary
> >> > fuzzing-engine-provided bytes as a commit graph file.
> >> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> >> > since we run the fuzz tests without a valid repository, we are hitting
> >> > the BUG() from 44c7e62 for every test case.
> >> >
> >> > Fix this by refactoring prepare_repo_settings() such that it sets
> >> > default options unconditionally; if its process is in a Git repository,
> >> > it will also load settings from the local config. This eliminates the
> >> > need for a BUG() when not in a repository.
> >>
> >> I think you have the right idea and this can work.
> >
> > Hmmm. To me this feels like bending over backwards in
> > `prepare_repo_settings()` to accommodate one particular caller. I'm not
> > necessarily opposed to it, but it does feel strange to make
> > `prepare_repo_settings()` a noop here, since I would expect that any
> > callers who do want to call `prepare_repo_settings()` are likely
> > convinced that they are inside of a repository, and it probably should
> > be a BUG() if they aren't.
>
> I think adding that BUG() was overzelous in the first place, per
> https://lore.kernel.org/git/211207.86r1apow9f.gmgdl@evledraar.gmail.com/;

I think Junio raised a good point in

    https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/

, though some of the detail was lost in 44c7e62e51 (repo-settings:
prepare_repo_settings only in git repos, 2021-12-06).

> I have that in my local integration branch, because I ended up wanting
> to add prepare_repo_settings() to usage.c, which may or may not run
> inside a repo (and maybe we'll have that config, maybe not).

I see what you're saying, though I think we would be equally OK to have
a default value of the repo_settings struct that we could rely on. I
said some of this back in

    https://lore.kernel.org/git/Yjt6mLIfw0V3aVTO@nand.local/

, namely the parts around "I would expect that any callers who do want
to call `prepare_repo_settings()` are likely convinced that they are
inside of a repository, and it probably should be a BUG() if they
aren't."

Thinking in terms of your message, though, I think the distinction (from
my perspective, at least) is between (a) using something called
_repo_-settings in a non-repo context, and (b) calling a function which
is supposed to fill in its values from a repository (which the caller
implicitly expects to exist).

Neither feel _good_ to me, but (b) feels worse, since it is making it OK
to operate in a likely-unexpected context with respect to the caller's
expectations.

Anyway, I think that we are pretty far into the weeds, and it's likely
time to turn around. I don't have that strong a feeling either way, and
in all honesty either approach is probably just fine.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-30  2:34       ` Taylor Blau
@ 2022-03-30 17:38         ` Ævar Arnfjörð Bjarmason
  2022-03-30 20:14           ` Junio C Hamano
  0 siblings, 1 reply; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-30 17:38 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Derrick Stolee, Josh Steadmon, git, lessleydennington, gitster


On Tue, Mar 29 2022, Taylor Blau wrote:

> On Tue, Mar 29, 2022 at 11:04:18AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Wed, Mar 23 2022, Taylor Blau wrote:
>>
>> > On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
>> >> On 3/23/2022 2:03 PM, Josh Steadmon wrote:
>> >> > prepare_repo_settings() initializes a `struct repository` with various
>> >> > default config options and settings read from a repository-local config
>> >> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
>> >> > in git repos), prepare_repo_settings was changed to issue a BUG() if it
>> >> > is called by a process whose CWD is not a Git repository. This approach
>> >> > was suggested in [1].
>> >> >
>> >> > This breaks fuzz-commit-graph, which attempts to parse arbitrary
>> >> > fuzzing-engine-provided bytes as a commit graph file.
>> >> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
>> >> > since we run the fuzz tests without a valid repository, we are hitting
>> >> > the BUG() from 44c7e62 for every test case.
>> >> >
>> >> > Fix this by refactoring prepare_repo_settings() such that it sets
>> >> > default options unconditionally; if its process is in a Git repository,
>> >> > it will also load settings from the local config. This eliminates the
>> >> > need for a BUG() when not in a repository.
>> >>
>> >> I think you have the right idea and this can work.
>> >
>> > Hmmm. To me this feels like bending over backwards in
>> > `prepare_repo_settings()` to accommodate one particular caller. I'm not
>> > necessarily opposed to it, but it does feel strange to make
>> > `prepare_repo_settings()` a noop here, since I would expect that any
>> > callers who do want to call `prepare_repo_settings()` are likely
>> > convinced that they are inside of a repository, and it probably should
>> > be a BUG() if they aren't.
>>
>> I think adding that BUG() was overzelous in the first place, per
>> https://lore.kernel.org/git/211207.86r1apow9f.gmgdl@evledraar.gmail.com/;
>
> I think Junio raised a good point in
>
>     https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/
>
> , though some of the detail was lost in 44c7e62e51 (repo-settings:
> prepare_repo_settings only in git repos, 2021-12-06).
>
>> I have that in my local integration branch, because I ended up wanting
>> to add prepare_repo_settings() to usage.c, which may or may not run
>> inside a repo (and maybe we'll have that config, maybe not).
>
> I see what you're saying, though I think we would be equally OK to have
> a default value of the repo_settings struct that we could rely on. I
> said some of this back in
>
>     https://lore.kernel.org/git/Yjt6mLIfw0V3aVTO@nand.local/
>
> , namely the parts around "I would expect that any callers who do want
> to call `prepare_repo_settings()` are likely convinced that they are
> inside of a repository, and it probably should be a BUG() if they
> aren't."
>
> Thinking in terms of your message, though, I think the distinction (from
> my perspective, at least) is between (a) using something called
> _repo_-settings in a non-repo context, and (b) calling a function which
> is supposed to fill in its values from a repository (which the caller
> implicitly expects to exist).
>
> Neither feel _good_ to me, but (b) feels worse, since it is making it OK
> to operate in a likely-unexpected context with respect to the caller's
> expectations.

I agree that it's a bit iffy. I'm basically advocating for treating
"the_repository->settings" as though it's a new "the_config" or
whatever.

Maybe we'd be better off just making that move, or having
the_repository->settings contain only settings relevant to cases where
we only have a repository.

But I think predicating useful uses of it on that refactoring is
overdoing it a bit, especially as I think your "(b)" concern here is
already something we deal with when it comes to
initialize_the_repository() and checks for
"the_repository->gitdir".

Can't we just have callers that really care about the distinction check
"->gitdir" instead? As they're already doing in some cases already?

Or just:

    git mv {repo,global}-settings.c

Since that's what it seems to want to be anyway.

> Anyway, I think that we are pretty far into the weeds, and it's likely
> time to turn around. I don't have that strong a feeling either way, and
> in all honesty either approach is probably just fine.

*nod*

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-30 17:38         ` Ævar Arnfjörð Bjarmason
@ 2022-03-30 20:14           ` Junio C Hamano
  0 siblings, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2022-03-30 20:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, Derrick Stolee, Josh Steadmon, git, lessleydennington

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Or just:
>
>     git mv {repo,global}-settings.c
>
> Since that's what it seems to want to be anyway.

Hmph, care to elaborate a bit more on "seems to"?

Here is my take

 - The code makes extensive use of repo_cfg_bool(), which is a thin
   wrapper around repo_config_get_bool(); despite its name, it is
   not about reading from the configuration file of that repository
   and nowhere else.  It can be affected by global configuration.

 - Other uses of repo_config_get_*() it uses is the same way.

So, it wants to grab a set of configuration that would +apply+ to
this specific instance of "struct repository".

But that is quite different from "give us settings that would apply
in general, I do not have a specific repository in mind", which is
what "global-settings.c" would imply at least to me.

And in order for the "this specific instance" to make sense, the
caller should have made sure that it is indeed a repository.
Lifting that BUG() from the code path not only smells sloppy way to
work around some corner case code that does not prepare the
repository properly, but does not make much sense, at least to me.
In exchange for scrapping the safety to help a caller that forgets
to prepare repository before it is ready to call this function, what
are we gaining?

I went back to the thread-starter message and re-read its
justification.  It talks about:

> Concerns:
>
> Are any callers strictly dependent on having a BUG() here? I suspect
> that the worst that would happen is that rather than this BUG(), the
> caller would later hit its own BUG() or die(), so I do not think this is
> a blocker. Additionally, every builtin that directly calls
> prepare_repo_settings is either marked as RUN_SETUP, which means we
> would die() prior to calling it anyway, or checks on its own before
> calling it (builtin/diff.c). There are several callers in library code,
> though, and I have not tracked down how all of those are used.

Asking for existing callers being dependent on having a BUG() is a
pure nonsense.  The existing callers are there in shipped versions
of Git exactly because they do things correctly not to hit the BUG(),
so BY DEFINITION, they do not care if the BUG() is there or not.

So that is not "a blocker", but is a non-argument to ask if existing
code paths care if the BUG() is gone.

What BUG() is protecting us against is a careless developer who
writes a new code or alters an existing code path that ends up
making the control flow in such a way that a proper set-up of the
repository structure is bypassed by mistake before calling this
function.  The function is call-once by r->settings.initialized
guarding it, calling it and then doing a set-up will result in an
unexplainable bug even if the caller tries to compensate by calling
it twice, as r->settings that is set incorrectly will be sticky.

Having said all that, I can be pursuaded to consider an approach to
allow callers to explicitly ask for running outside repository, just
like the more strict setup_git_directory() for majority of callers
has looser setup_git_directory_gently() counterpart.  The current
callers should retain the "you must have discovered gitdir" there,
but a special purpose code that is not even Git (like fuzzer) can
say

    prepare_repo_settings_gently(r, &nongit_ok);

instead.

diff --git c/repo-settings.c w/repo-settings.c
index b4fbd16cdc..c492bc7671 100644
--- c/repo-settings.c
+++ w/repo-settings.c
@@ -10,15 +10,24 @@ static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
 		*dest = def;
 }
 
-void prepare_repo_settings(struct repository *r)
+void prepare_repo_settings_gently(struct repository *r, int *nongit)
 {
 	int experimental;
 	int value;
 	char *strval;
 	int manyfiles;
 
-	if (!r->gitdir)
-		BUG("Cannot add settings for uninitialized repository");
+	if (!r->gitdir) {
+		/*
+		 * The caller can pass nongit (out paremeter) to ask if r is already
+		 * initialized (and act on it after this function returns).
+		 */
+		if (!nongit)
+			BUG("Cannot add settings for uninitialized repository");
+		*nongit = 1;
+	} else if (nongit) {
+		*nongit = 0;
+	}
 
 	if (r->settings.initialized++)
 		return;
diff --git c/repository.h w/repository.h
index e29f361703..98f6ec12cc 100644
--- c/repository.h
+++ w/repository.h
@@ -222,7 +222,8 @@ int repo_read_index_unmerged(struct repository *);
  */
 void repo_update_index_if_able(struct repository *, struct lock_file *);
 
-void prepare_repo_settings(struct repository *r);
+#define prepare_repo_settings(r) prepare_repo_settings_gently((r), NULL)
+void prepare_repo_settings_gently(struct repository *r, int *nongit);
 
 /*
  * Return 1 if upgrade repository format to target_version succeeded,



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
  2022-03-30  2:26       ` Taylor Blau
@ 2022-04-09  6:33         ` Josh Steadmon
  0 siblings, 0 replies; 24+ messages in thread
From: Josh Steadmon @ 2022-04-09  6:33 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Ævar Arnfjörð Bjarmason, Derrick Stolee, git,
	lessleydennington, gitster

On 2022.03.29 22:26, Taylor Blau wrote:
> On Tue, Mar 29, 2022 at 11:03:14AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >
> > On Wed, Mar 23 2022, Taylor Blau wrote:
> >
> > >  	/* Boolean config or default, does not cascade (simple)  */
> > >  	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> > > +	repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
> >
> > If this ends up being kept let's add it to its own commented "section",
> > the comment 2-lines above it is now incorrect.
> 
> Thanks for pointing it out; indeed that comment is definitely stale with
> respect to the newer code. This patch was just a sketch of an
> alternative approach for Josh to consider, so I can't guarantee that it
> isn't immune to nit-picks ;).
> 
> I think that Josh is planning on picking this up, so hopefully he
> doesn't mind applying these and any other touch-ups locally before
> resubmitting.
> 
> (Josh, if you do: thank you very much in advance!)
> 
> Thanks,
> Taylor

Done in V2, to be sent out shortly.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings
  2022-03-23 19:52   ` Taylor Blau
                       ` (3 preceding siblings ...)
  2022-03-29  9:04     ` Ævar Arnfjörð Bjarmason
@ 2022-04-09  6:52     ` Josh Steadmon
  2022-06-07 20:02       ` Jonathan Tan
  2022-06-14 22:37     ` [PATCH v3] " Josh Steadmon
  5 siblings, 1 reply; 24+ messages in thread
From: Josh Steadmon @ 2022-04-09  6:52 UTC (permalink / raw)
  To: git; +Cc: me, derrickstolee, lessleydennington, gitster, vdye, avarab

From: Taylor Blau <me@ttaylorr.com>

Prior to ab14d0676c (commit-graph: pass a 'struct repository *' in more
places, 2020-09-09), parsing a commit-graph was a pure function
depending only on the contents of the commit-graph itself. Commit
ab14d0676c introduced a dependency on a `struct repository` pointer, and
later commits such as b66d84756f (commit-graph: respect
'commitGraph.readChangedPaths', 2020-09-09) added dependencies on config
settings, which were accessed through the `settings` field of the
repository pointer. This field was initialized via a call to
`prepare_repo_settings()`.

In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only in git
repos), prepare_repo_settings was changed to issue a BUG() if it is
called by a process whose CWD is not a Git repository.

This series of changes broke fuzz-commit-graph, which attempts to parse
arbitrary fuzzing-engine-provided bytes as a commit graph file.
commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
since we run the fuzz tests without a valid repository, we are hitting
the BUG() from 44c7e62 for every test case.

Fix this by moving the majority of the implementaiton of
`parse_commit_graph()` into a new function,
`parse_commit_graph_settings()` that accepts a repo_settings pointer.
This allows fuzz-commit-graph to continue to test the commit-graph
parser implementation without relying on prepare_repo_settings().

Additionally, properly initialize the
repo_settings.commit_graph_generation_version field in
prepare_repo_settings(). Load the value from the config if present, and
default to version 2 otherwise.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
I've taken the diff from Taylor's message Yjt6mLIfw0V3aVTO@nand.local
with a small tweak to the fuzzer: I didn't see that the commit graph
settings were being initialized anywhere outside of
prepare_repo_settings(), so I manually initialized them in
fuzz-commit-graph. I've also moved the commit-graph settings in
prepare_repo_settings() to their own section, as suggested by Ævar.

I've tried to combine Taylor's explanation from his email with the
commit message from my original patch. Taylor, if you feel like anything
needs to be changed please let me know or feel free to resend with your
changes. Thanks again for providing this fix!

 commit-graph.c      | 22 ++++++++++------------
 commit-graph.h      |  2 ++
 fuzz-commit-graph.c |  8 +++++++-
 repo-settings.c     | 12 +++++++++++-
 repository.h        |  1 +
 5 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 265c010122..c54a734619 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -96,13 +96,6 @@ define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
 static struct commit_graph_data_slab commit_graph_data_slab =
 	COMMIT_SLAB_INIT(1, commit_graph_data_slab);
 
-static int get_configured_generation_version(struct repository *r)
-{
-	int version = 2;
-	repo_config_get_int(r, "commitgraph.generationversion", &version);
-	return version;
-}
-
 uint32_t commit_graph_position(const struct commit *c)
 {
 	struct commit_graph_data *data =
@@ -335,6 +328,13 @@ static int graph_read_bloom_data(const unsigned char *chunk_start,
 
 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size)
+{
+	prepare_repo_settings(r);
+	return parse_commit_graph_settings(&r->settings, graph_map, graph_size);
+}
+
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+						 void *graph_map, size_t graph_size)
 {
 	const unsigned char *data;
 	struct commit_graph *graph;
@@ -371,8 +371,6 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 		return NULL;
 	}
 
-	prepare_repo_settings(r);
-
 	graph = alloc_commit_graph();
 
 	graph->hash_len = the_hash_algo->rawsz;
@@ -402,14 +400,14 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
 	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
 
-	if (get_configured_generation_version(r) >= 2) {
+	if (s->commit_graph_generation_version >= 2) {
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
 			&graph->chunk_generation_data);
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
 			&graph->chunk_generation_data_overflow);
 	}
 
-	if (r->settings.commit_graph_read_changed_paths) {
+	if (s->commit_graph_read_changed_paths) {
 		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
 			   &graph->chunk_bloom_indexes);
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
@@ -2288,7 +2286,7 @@ int write_commit_graph(struct object_directory *odb,
 	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
 	ctx->opts = opts;
 	ctx->total_bloom_filter_data_size = 0;
-	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
+	ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2);
 	ctx->num_generation_data_overflows = 0;
 
 	bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
diff --git a/commit-graph.h b/commit-graph.h
index 04a94e1830..0f0d28b129 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -95,6 +95,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r,
 					   struct object_directory *odb);
 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size);
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+					void *graph_map, size_t graph_size);
 
 /*
  * Return 1 if and only if the repository has a commit-graph
diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
index e7cf6d5b0f..e53a2635f6 100644
--- a/fuzz-commit-graph.c
+++ b/fuzz-commit-graph.c
@@ -11,7 +11,13 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
 	struct commit_graph *g;
 
 	initialize_the_repository();
-	g = parse_commit_graph(the_repository, (void *)data, size);
+	/*
+	 * Manually initialize commit-graph settings, to avoid the need to run
+	 * in an actual repository.
+	 */
+	the_repository->settings.commit_graph_generation_version = 2;
+	the_repository->settings.commit_graph_read_changed_paths = 1;
+	g = parse_commit_graph_settings(&the_repository->settings, (void *)data, size);
 	repo_clear(the_repository);
 	free_commit_graph(g);
 
diff --git a/repo-settings.c b/repo-settings.c
index b4fbd16cdc..26241c1c2c 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -10,6 +10,13 @@ static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
 		*dest = def;
 }
 
+static void repo_cfg_int(struct repository *r, const char *key, int *dest,
+			 int def)
+{
+	if (repo_config_get_int(r, key, dest))
+		*dest = def;
+}
+
 void prepare_repo_settings(struct repository *r)
 {
 	int experimental;
@@ -41,11 +48,14 @@ void prepare_repo_settings(struct repository *r)
 		r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
 	}
 
-	/* Boolean config or default, does not cascade (simple)  */
+	/* Commit graph config or default, does not cascade (simple) */
 	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
+	repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
 	repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
 	repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
 	repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
+
+	/* Boolean config or default, does not cascade (simple)  */
 	repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
 	repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
 	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
diff --git a/repository.h b/repository.h
index ca837cb9e9..4f8275f97c 100644
--- a/repository.h
+++ b/repository.h
@@ -29,6 +29,7 @@ struct repo_settings {
 	int initialized;
 
 	int core_commit_graph;
+	int commit_graph_generation_version;
 	int commit_graph_read_changed_paths;
 	int gc_write_commit_graph;
 	int fetch_write_commit_graph;

base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7
-- 
2.35.1.1178.g4f1659d476-goog


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings
  2022-04-09  6:52     ` [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings Josh Steadmon
@ 2022-06-07 20:02       ` Jonathan Tan
  2022-06-14 22:38         ` Josh Steadmon
  0 siblings, 1 reply; 24+ messages in thread
From: Jonathan Tan @ 2022-06-07 20:02 UTC (permalink / raw)
  To: Josh Steadmon
  Cc: Jonathan Tan, git, me, derrickstolee, lessleydennington, gitster,
	vdye, avarab

Josh Steadmon <steadmon@google.com> writes:
> diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
> index e7cf6d5b0f..e53a2635f6 100644
> --- a/fuzz-commit-graph.c
> +++ b/fuzz-commit-graph.c
> @@ -11,7 +11,13 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
>  	struct commit_graph *g;
>  
>  	initialize_the_repository();
> -	g = parse_commit_graph(the_repository, (void *)data, size);
> +	/*
> +	 * Manually initialize commit-graph settings, to avoid the need to run
> +	 * in an actual repository.
> +	 */
> +	the_repository->settings.commit_graph_generation_version = 2;
> +	the_repository->settings.commit_graph_read_changed_paths = 1;
> +	g = parse_commit_graph_settings(&the_repository->settings, (void *)data, size);
>  	repo_clear(the_repository);
>  	free_commit_graph(g);

The comment doesn't explain why we need to avoid an actual repository. Maybe
better: Initialize the commit-graph settings that would normally be read from
the repository's gitdir.

Other than that, this patch looks good to me. Isolating a part of the
commit graph mechanism that can be fuzzed without access to the disk is,
I think, a good idea, and this patch is a good way to do it.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3] commit-graph: refactor to avoid prepare_repo_settings
  2022-03-23 19:52   ` Taylor Blau
                       ` (4 preceding siblings ...)
  2022-04-09  6:52     ` [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings Josh Steadmon
@ 2022-06-14 22:37     ` Josh Steadmon
  2022-06-14 23:32       ` Taylor Blau
  2022-06-23 21:59       ` Junio C Hamano
  5 siblings, 2 replies; 24+ messages in thread
From: Josh Steadmon @ 2022-06-14 22:37 UTC (permalink / raw)
  To: git
  Cc: me, derrickstolee, lessleydennington, gitster, vdye, avarab,
	jonathantanmy

From: Taylor Blau <me@ttaylorr.com>

Prior to ab14d0676c (commit-graph: pass a 'struct repository *' in more
places, 2020-09-09), parsing a commit-graph was a pure function
depending only on the contents of the commit-graph itself. Commit
ab14d0676c introduced a dependency on a `struct repository` pointer, and
later commits such as b66d84756f (commit-graph: respect
'commitGraph.readChangedPaths', 2020-09-09) added dependencies on config
settings, which were accessed through the `settings` field of the
repository pointer. This field was initialized via a call to
`prepare_repo_settings()`.

In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only in git
repos), prepare_repo_settings was changed to issue a BUG() if it is
called by a process whose CWD is not a Git repository.

This series of changes broke fuzz-commit-graph, which attempts to parse
arbitrary fuzzing-engine-provided bytes as a commit graph file.
commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
since we run the fuzz tests without a valid repository, we are hitting
the BUG() from 44c7e62 for every test case.

Fix this by moving the majority of the implementaiton of
`parse_commit_graph()` into a new function,
`parse_commit_graph_settings()` that accepts a repo_settings pointer.
This allows fuzz-commit-graph to continue to test the commit-graph
parser implementation without relying on prepare_repo_settings().

Additionally, properly initialize the
repo_settings.commit_graph_generation_version field in
prepare_repo_settings(). Load the value from the config if present, and
default to version 2 otherwise.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
Range-diff against v2:
1:  2c2bfc7b43 ! 1:  9b56496b08 commit-graph: refactor to avoid prepare_repo_settings
    @@ fuzz-commit-graph.c: int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size
      	initialize_the_repository();
     -	g = parse_commit_graph(the_repository, (void *)data, size);
     +	/*
    -+	 * Manually initialize commit-graph settings, to avoid the need to run
    -+	 * in an actual repository.
    ++	 * Initialize the_repository with commit-graph settings that would
    ++	 * normally be read from the repository's gitdir. We want to avoid
    ++	 * touching the disk to keep the individual fuzz-test cases as fast as
    ++	 * possible.
     +	 */
     +	the_repository->settings.commit_graph_generation_version = 2;
     +	the_repository->settings.commit_graph_read_changed_paths = 1;

 commit-graph.c      | 22 ++++++++++------------
 commit-graph.h      |  2 ++
 fuzz-commit-graph.c | 10 +++++++++-
 repo-settings.c     | 12 +++++++++++-
 repository.h        |  1 +
 5 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 265c010122..c54a734619 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -96,13 +96,6 @@ define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
 static struct commit_graph_data_slab commit_graph_data_slab =
 	COMMIT_SLAB_INIT(1, commit_graph_data_slab);
 
-static int get_configured_generation_version(struct repository *r)
-{
-	int version = 2;
-	repo_config_get_int(r, "commitgraph.generationversion", &version);
-	return version;
-}
-
 uint32_t commit_graph_position(const struct commit *c)
 {
 	struct commit_graph_data *data =
@@ -335,6 +328,13 @@ static int graph_read_bloom_data(const unsigned char *chunk_start,
 
 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size)
+{
+	prepare_repo_settings(r);
+	return parse_commit_graph_settings(&r->settings, graph_map, graph_size);
+}
+
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+						 void *graph_map, size_t graph_size)
 {
 	const unsigned char *data;
 	struct commit_graph *graph;
@@ -371,8 +371,6 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 		return NULL;
 	}
 
-	prepare_repo_settings(r);
-
 	graph = alloc_commit_graph();
 
 	graph->hash_len = the_hash_algo->rawsz;
@@ -402,14 +400,14 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
 	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
 
-	if (get_configured_generation_version(r) >= 2) {
+	if (s->commit_graph_generation_version >= 2) {
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
 			&graph->chunk_generation_data);
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
 			&graph->chunk_generation_data_overflow);
 	}
 
-	if (r->settings.commit_graph_read_changed_paths) {
+	if (s->commit_graph_read_changed_paths) {
 		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
 			   &graph->chunk_bloom_indexes);
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
@@ -2288,7 +2286,7 @@ int write_commit_graph(struct object_directory *odb,
 	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
 	ctx->opts = opts;
 	ctx->total_bloom_filter_data_size = 0;
-	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
+	ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2);
 	ctx->num_generation_data_overflows = 0;
 
 	bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
diff --git a/commit-graph.h b/commit-graph.h
index 04a94e1830..0f0d28b129 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -95,6 +95,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r,
 					   struct object_directory *odb);
 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size);
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+					void *graph_map, size_t graph_size);
 
 /*
  * Return 1 if and only if the repository has a commit-graph
diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
index e7cf6d5b0f..810b1a8001 100644
--- a/fuzz-commit-graph.c
+++ b/fuzz-commit-graph.c
@@ -11,7 +11,15 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
 	struct commit_graph *g;
 
 	initialize_the_repository();
-	g = parse_commit_graph(the_repository, (void *)data, size);
+	/*
+	 * Initialize the_repository with commit-graph settings that would
+	 * normally be read from the repository's gitdir. We want to avoid
+	 * touching the disk to keep the individual fuzz-test cases as fast as
+	 * possible.
+	 */
+	the_repository->settings.commit_graph_generation_version = 2;
+	the_repository->settings.commit_graph_read_changed_paths = 1;
+	g = parse_commit_graph_settings(&the_repository->settings, (void *)data, size);
 	repo_clear(the_repository);
 	free_commit_graph(g);
 
diff --git a/repo-settings.c b/repo-settings.c
index b4fbd16cdc..26241c1c2c 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -10,6 +10,13 @@ static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
 		*dest = def;
 }
 
+static void repo_cfg_int(struct repository *r, const char *key, int *dest,
+			 int def)
+{
+	if (repo_config_get_int(r, key, dest))
+		*dest = def;
+}
+
 void prepare_repo_settings(struct repository *r)
 {
 	int experimental;
@@ -41,11 +48,14 @@ void prepare_repo_settings(struct repository *r)
 		r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
 	}
 
-	/* Boolean config or default, does not cascade (simple)  */
+	/* Commit graph config or default, does not cascade (simple) */
 	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
+	repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
 	repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
 	repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
 	repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
+
+	/* Boolean config or default, does not cascade (simple)  */
 	repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
 	repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
 	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
diff --git a/repository.h b/repository.h
index ca837cb9e9..4f8275f97c 100644
--- a/repository.h
+++ b/repository.h
@@ -29,6 +29,7 @@ struct repo_settings {
 	int initialized;
 
 	int core_commit_graph;
+	int commit_graph_generation_version;
 	int commit_graph_read_changed_paths;
 	int gc_write_commit_graph;
 	int fetch_write_commit_graph;

base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7
-- 
2.36.1.476.g0c4daa206d-goog


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings
  2022-06-07 20:02       ` Jonathan Tan
@ 2022-06-14 22:38         ` Josh Steadmon
  0 siblings, 0 replies; 24+ messages in thread
From: Josh Steadmon @ 2022-06-14 22:38 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: git, me, derrickstolee, lessleydennington, gitster, vdye, avarab

On 2022.06.07 13:02, Jonathan Tan wrote:
> Josh Steadmon <steadmon@google.com> writes:
> > diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
> > index e7cf6d5b0f..e53a2635f6 100644
> > --- a/fuzz-commit-graph.c
> > +++ b/fuzz-commit-graph.c
> > @@ -11,7 +11,13 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
> >  	struct commit_graph *g;
> >  
> >  	initialize_the_repository();
> > -	g = parse_commit_graph(the_repository, (void *)data, size);
> > +	/*
> > +	 * Manually initialize commit-graph settings, to avoid the need to run
> > +	 * in an actual repository.
> > +	 */
> > +	the_repository->settings.commit_graph_generation_version = 2;
> > +	the_repository->settings.commit_graph_read_changed_paths = 1;
> > +	g = parse_commit_graph_settings(&the_repository->settings, (void *)data, size);
> >  	repo_clear(the_repository);
> >  	free_commit_graph(g);
> 
> The comment doesn't explain why we need to avoid an actual repository. Maybe
> better: Initialize the commit-graph settings that would normally be read from
> the repository's gitdir.
> 
> Other than that, this patch looks good to me. Isolating a part of the
> commit graph mechanism that can be fuzzed without access to the disk is,
> I think, a good idea, and this patch is a good way to do it.

Done in V3, thanks for taking a look!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] commit-graph: refactor to avoid prepare_repo_settings
  2022-06-14 22:37     ` [PATCH v3] " Josh Steadmon
@ 2022-06-14 23:32       ` Taylor Blau
  2022-06-23 21:59       ` Junio C Hamano
  1 sibling, 0 replies; 24+ messages in thread
From: Taylor Blau @ 2022-06-14 23:32 UTC (permalink / raw)
  To: Josh Steadmon
  Cc: git, me, derrickstolee, lessleydennington, gitster, vdye, avarab,
	jonathantanmy

On Tue, Jun 14, 2022 at 03:37:21PM -0700, Josh Steadmon wrote:
> Range-diff against v2:
> 1:  2c2bfc7b43 ! 1:  9b56496b08 commit-graph: refactor to avoid prepare_repo_settings
>     @@ fuzz-commit-graph.c: int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size
>       	initialize_the_repository();
>      -	g = parse_commit_graph(the_repository, (void *)data, size);
>      +	/*
>     -+	 * Manually initialize commit-graph settings, to avoid the need to run
>     -+	 * in an actual repository.
>     ++	 * Initialize the_repository with commit-graph settings that would
>     ++	 * normally be read from the repository's gitdir. We want to avoid
>     ++	 * touching the disk to keep the individual fuzz-test cases as fast as
>     ++	 * possible.
>      +	 */
>      +	the_repository->settings.commit_graph_generation_version = 2;
>      +	the_repository->settings.commit_graph_read_changed_paths = 1;

This version looks good to me. Thanks for working on this, Josh!

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] commit-graph: refactor to avoid prepare_repo_settings
  2022-06-14 22:37     ` [PATCH v3] " Josh Steadmon
  2022-06-14 23:32       ` Taylor Blau
@ 2022-06-23 21:59       ` Junio C Hamano
  1 sibling, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2022-06-23 21:59 UTC (permalink / raw)
  To: Josh Steadmon
  Cc: git, me, derrickstolee, lessleydennington, vdye, avarab, jonathantanmy

Josh Steadmon <steadmon@google.com> writes:

> This series of changes broke fuzz-commit-graph, which attempts to parse
> arbitrary fuzzing-engine-provided bytes as a commit graph file.
> commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> since we run the fuzz tests without a valid repository, we are hitting
> the BUG() from 44c7e62 for every test case.
>
> Fix this by moving the majority of the implementaiton of
> `parse_commit_graph()` into a new function,
> `parse_commit_graph_settings()` that accepts a repo_settings pointer.
> This allows fuzz-commit-graph to continue to test the commit-graph
> parser implementation without relying on prepare_repo_settings().

It sounds like this is not a "fix" but a workaround to bend the
production code so that a non-production test shim can be inserted
more easily.

I am OK with the idea, but have a huge problem with the new name.

Is it just me who thinks parse_commit_graph_settings() is a function
that parses some kind of settings that affects the way the commit
graph gets used or written?

Stepping back a bit, why can't fuzz-commit-graph prepare a
repository object that looks sufficiently real?  Something along the
lines of...

                struct repository fake_repo;

                fake_repo.settings.initialized = 1;
                fake_repo.gitdir = ".";
                parse_commit_graph(&fake_repo, (void *)data, size);
		...

Also, I feel somewhat uneasy to see these changes:

> -	if (get_configured_generation_version(r) >= 2) {
> +	if (s->commit_graph_generation_version >= 2) {
> -	if (r->settings.commit_graph_read_changed_paths) {
> +	if (s->commit_graph_read_changed_paths) {
> -	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
> +	ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2);
>  	ctx->num_generation_data_overflows = 0;

that makes the production code bend over backwards to _avoid_
referencing 'r', only to cater to the test shim.  That's an
artificial limitation we are forcing on our developers who works on
this code.  It might be that what is in the repository settings is
sufficient for today's code to work, but I do not think needs for
fuzz tests should tie the hand of this production code by forbidding
it to look at other things in the repository in the future.  After
all, tests are to serve the production code, not the other way
around.

On the other hand, I think a change that is slightly smaller than
the posted patch, which justifies itself with a completely different
rationale, would be totally acceptable.  You can justify this change
with NO mention of fuzzers.

    The parse_commit_graph() function takes a "struct repository *"
    pointer, but all it cares about is the .settings member of it.

    Update the function and all its existing callers so that it
    takes "struct repo_settings *" instead.

Now, in the future, some developers _might_ find it necessary to
access stuff other than the repository settings to validate the
contents of the graph file, and we may need to change it to take a
full repository structure again.  The test should adjust to such
needs of the production code, not the other way around.  But until
then...

Thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2022-06-23 21:59 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-23 18:03 [RFC PATCH] repo-settings: set defaults even when not in a repo Josh Steadmon
2022-03-23 19:22 ` Derrick Stolee
2022-03-23 19:52   ` Taylor Blau
2022-03-28 19:15     ` Josh Steadmon
2022-03-29  1:21       ` Taylor Blau
2022-03-28 19:53     ` Josh Steadmon
2022-03-29  1:22       ` Taylor Blau
2022-03-29  9:03     ` Ævar Arnfjörð Bjarmason
2022-03-30  2:26       ` Taylor Blau
2022-04-09  6:33         ` Josh Steadmon
2022-03-29  9:04     ` Ævar Arnfjörð Bjarmason
2022-03-30  2:34       ` Taylor Blau
2022-03-30 17:38         ` Ævar Arnfjörð Bjarmason
2022-03-30 20:14           ` Junio C Hamano
2022-04-09  6:52     ` [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings Josh Steadmon
2022-06-07 20:02       ` Jonathan Tan
2022-06-14 22:38         ` Josh Steadmon
2022-06-14 22:37     ` [PATCH v3] " Josh Steadmon
2022-06-14 23:32       ` Taylor Blau
2022-06-23 21:59       ` Junio C Hamano
2022-03-23 20:11 ` [RFC PATCH] repo-settings: set defaults even when not in a repo Victoria Dye
2022-03-23 20:54   ` Junio C Hamano
2022-03-23 21:19     ` Victoria Dye
2022-03-23 20:51 ` Junio C Hamano

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).