From: Victoria Dye <vdye@github.com>
To: Josh Steadmon <steadmon@google.com>, git@vger.kernel.org
Cc: lessleydennington@gmail.com, gitster@pobox.com
Subject: Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
Date: Wed, 23 Mar 2022 13:11:10 -0700 [thread overview]
Message-ID: <fcfdcbb9-761a-0d34-7d36-61e0ef279922@github.com> (raw)
In-Reply-To: <1b27e0b115f858a422e0a2891688227be8f3db01.1648055915.git.steadmon@google.com>
Josh Steadmon wrote:
> prepare_repo_settings() initializes a `struct repository` with various
> default config options and settings read from a repository-local config
> file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> in git repos), prepare_repo_settings was changed to issue a BUG() if it
> is called by a process whose CWD is not a Git repository. This approach
> was suggested in [1].
>
> This breaks fuzz-commit-graph, which attempts to parse arbitrary
> fuzzing-engine-provided bytes as a commit graph file.
> commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> since we run the fuzz tests without a valid repository, we are hitting
> the BUG() from 44c7e62 for every test case.
>
> Fix this by refactoring prepare_repo_settings() such that it sets
> default options unconditionally; if its process is in a Git repository,
> it will also load settings from the local config. This eliminates the
> need for a BUG() when not in a repository.
>
I think the decision of whether to go with this approach or the alternative
listed below depends on the validity of a 'repository' without a gitdir.
As far as I can tell, there is an implicit conflict between the changes in:
1. b66d84756f (commit-graph: respect 'commitGraph.readChangedPaths',
2020-09-09)
2. 44c7e62e51 (repo-settings: prepare_repo_settings only in git repos,
2021-12-06) (as you pointed out in your message)
The former says that commit-graph should use a repository setting (implying
it needs a valid repository), and the latter says that you need a valid
gitdir to get repository settings.
So to me, how to proceed depends on whether a repository can be "valid"
without a gitdir or not:
* If it is valid (or not-completely-broken/semi-valid - i.e., not a BUG()),
then your approach is probably fine - the 'BUG()' should be removed from
'prepare_repo_settings()'. But, to make sure we don't run into this
question again, 'prepare_repo_settings()' should be audited for any
settings/setup that implicitly require a gitdir and updated accordingly,
along with a comment somewhere saying that the 'gitdir' isn't required.
* If it is not valid, then this use in 'fuzz-commit-graph' tests is fragile
(even if it's not a problem right now) - 'prepare_repo_settings()' could
start needing the gitdir, or "the_hash_algo" could become invalid, or some
other problem could arise from the invalid repo and the tests would break.
I don't have a good answer to this question, but someone with more
experience in this area might be able to help.
> Concerns:
>
> Are any callers strictly dependent on having a BUG() here? I suspect
> that the worst that would happen is that rather than this BUG(), the
> caller would later hit its own BUG() or die(), so I do not think this is
> a blocker. Additionally, every builtin that directly calls
> prepare_repo_settings is either marked as RUN_SETUP, which means we
> would die() prior to calling it anyway, or checks on its own before
> calling it (builtin/diff.c). There are several callers in library code,
> though, and I have not tracked down how all of those are used.
>
> Alternatives considered:
>
> Setting up a valid repository for fuzz testing would avoid triggering
> this bug, but would unacceptably slow down the test cases.
>
> Refactoring parse_commit_graph() in such a way that the fuzz test has an
> alternate entry point that avoids calling prepare_repo_settings() might
> be possible, but would be a much larger change than this one. It would
> also run the risk that the changes would be so extensive that the fuzzer
> would be merely testing its own custom commit-graph implementation,
> rather than the one that's actually used in the real world.
>
If the 'repository' really is *completely* invalid without a gitdir, I don't
think the refactor of 'commit-graph.c' would necessarily be too bad -
certainly not so extensive that it's not testing "real world" behavior:
* Make 'parse_commit_graph()' only call 'prepare_repo_settings()' if the
repo has a valid gitdir (guarded like 'git diff' is)
* Guard all use use of 'r' in 'parse_commit_graph()' based on the existence
of 'gitdir', with fallbacks on e.g. global config settings. From what I
can see, the only usage is:
* The check for 'r->settings.commit_graph_read_changed_paths'
* The call to 'get_configured_generation_version()'
* The use of 'the_hash_algo->raw_sz'
> [1]: https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/
>
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
> repo-settings.c | 111 ++++++++++++++++++++++++------------------------
> 1 file changed, 55 insertions(+), 56 deletions(-)
>
> diff --git a/repo-settings.c b/repo-settings.c
> index b4fbd16cdc..d154b37007 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -17,9 +17,6 @@ void prepare_repo_settings(struct repository *r)
> char *strval;
> int manyfiles;
>
> - if (!r->gitdir)
> - BUG("Cannot add settings for uninitialized repository");
> -
> if (r->settings.initialized++)
> return;
>
> @@ -28,27 +25,63 @@ void prepare_repo_settings(struct repository *r)
> r->settings.core_untracked_cache = UNTRACKED_CACHE_KEEP;
> r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
>
> - /* Booleans config or default, cascades to other settings */
> - repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
> - repo_cfg_bool(r, "feature.experimental", &experimental, 0);
> + if (r->gitdir) {
> + /* Booleans config or default, cascades to other settings */
> + repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
> + repo_cfg_bool(r, "feature.experimental", &experimental, 0);
>
> - /* Defaults modified by feature.* */
> - if (experimental) {
> - r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> - }
> - if (manyfiles) {
> - r->settings.index_version = 4;
> - r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
> - }
> + /* Defaults modified by feature.* */
> + if (experimental) {
> + r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> + }
> + if (manyfiles) {
> + r->settings.index_version = 4;
> + r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
> + }
> +
> + /* Boolean config or default, does not cascade (simple) */
> + repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> + repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
> + repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
> + repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
> + repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
> + repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
> + repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
>
> - /* Boolean config or default, does not cascade (simple) */
> - repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> - repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
> - repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
> - repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
> - repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
> - repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
> - repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
> + /*
> + * Non-boolean config
> + */
> + if (!repo_config_get_int(r, "index.version", &value))
> + r->settings.index_version = value;
> +
> + if (!repo_config_get_string(r, "core.untrackedcache", &strval)) {
> + int v = git_parse_maybe_bool(strval);
> +
> + /*
> + * If it's set to "keep", or some other non-boolean
> + * value then "v < 0". Then we do nothing and keep it
> + * at the default of UNTRACKED_CACHE_KEEP.
> + */
> + if (v >= 0)
> + r->settings.core_untracked_cache = v ?
> + UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE;
> + free(strval);
> + }
> +
> + if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) {
> + int fetch_default = r->settings.fetch_negotiation_algorithm;
> + if (!strcasecmp(strval, "skipping"))
> + r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> + else if (!strcasecmp(strval, "noop"))
> + r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP;
> + else if (!strcasecmp(strval, "consecutive"))
> + r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
> + else if (!strcasecmp(strval, "default"))
> + r->settings.fetch_negotiation_algorithm = fetch_default;
> + else
> + die("unknown fetch negotiation algorithm '%s'", strval);
> + }
> + }
>
> /*
> * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
> @@ -60,40 +93,6 @@ void prepare_repo_settings(struct repository *r)
> if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
> r->settings.core_multi_pack_index = 1;
>
> - /*
> - * Non-boolean config
> - */
> - if (!repo_config_get_int(r, "index.version", &value))
> - r->settings.index_version = value;
> -
> - if (!repo_config_get_string(r, "core.untrackedcache", &strval)) {
> - int v = git_parse_maybe_bool(strval);
> -
> - /*
> - * If it's set to "keep", or some other non-boolean
> - * value then "v < 0". Then we do nothing and keep it
> - * at the default of UNTRACKED_CACHE_KEEP.
> - */
> - if (v >= 0)
> - r->settings.core_untracked_cache = v ?
> - UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE;
> - free(strval);
> - }
> -
> - if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) {
> - int fetch_default = r->settings.fetch_negotiation_algorithm;
> - if (!strcasecmp(strval, "skipping"))
> - r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> - else if (!strcasecmp(strval, "noop"))
> - r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP;
> - else if (!strcasecmp(strval, "consecutive"))
> - r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
> - else if (!strcasecmp(strval, "default"))
> - r->settings.fetch_negotiation_algorithm = fetch_default;
> - else
> - die("unknown fetch negotiation algorithm '%s'", strval);
> - }
> -
> /*
> * This setting guards all index reads to require a full index
> * over a sparse index. After suitable guards are placed in the
>
> base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7
next prev parent reply other threads:[~2022-03-23 20:11 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-23 18:03 [RFC PATCH] repo-settings: set defaults even when not in a repo Josh Steadmon
2022-03-23 19:22 ` Derrick Stolee
2022-03-23 19:52 ` Taylor Blau
2022-03-28 19:15 ` Josh Steadmon
2022-03-29 1:21 ` Taylor Blau
2022-03-28 19:53 ` Josh Steadmon
2022-03-29 1:22 ` Taylor Blau
2022-03-29 9:03 ` Ævar Arnfjörð Bjarmason
2022-03-30 2:26 ` Taylor Blau
2022-04-09 6:33 ` Josh Steadmon
2022-03-29 9:04 ` Ævar Arnfjörð Bjarmason
2022-03-30 2:34 ` Taylor Blau
2022-03-30 17:38 ` Ævar Arnfjörð Bjarmason
2022-03-30 20:14 ` Junio C Hamano
2022-04-09 6:52 ` [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings Josh Steadmon
2022-06-07 20:02 ` Jonathan Tan
2022-06-14 22:38 ` Josh Steadmon
2022-06-14 22:37 ` [PATCH v3] " Josh Steadmon
2022-06-14 23:32 ` Taylor Blau
2022-06-23 21:59 ` Junio C Hamano
2022-07-14 21:44 ` Josh Steadmon
2022-07-14 21:43 ` [PATCH v4] commit-graph: pass repo_settings instead of repository Josh Steadmon
2022-07-14 22:48 ` Junio C Hamano
2022-03-23 20:11 ` Victoria Dye [this message]
2022-03-23 20:54 ` [RFC PATCH] repo-settings: set defaults even when not in a repo Junio C Hamano
2022-03-23 21:19 ` Victoria Dye
2022-03-23 20:51 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fcfdcbb9-761a-0d34-7d36-61e0ef279922@github.com \
--to=vdye@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=lessleydennington@gmail.com \
--cc=steadmon@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).