git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH 0/2] Conditional config includes based on remote URL
@ 2021-10-12 22:57 Jonathan Tan
  2021-10-12 22:57 ` [RFC PATCH 1/2] config: make git_config_include() static Jonathan Tan
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Jonathan Tan @ 2021-10-12 22:57 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Previously [1], I sent a patch set for remote-suggested configs that are
transmitted when fetching, but there were some security concerns. Here
is another way that remote repo administators can provide recommended
configs - through conditionally included files based on the configured
remote. Git itself neither transmits nor prompts for these files, which
hopefully reduces people's concerns.

More information is in the commit message of patch 2.

[1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/

Jonathan Tan (2):
  config: make git_config_include() static
  config: include file if remote URL matches a glob

 config.c          | 80 ++++++++++++++++++++++++++++++++++++++++++-----
 config.h          | 37 +++-------------------
 t/t1300-config.sh | 27 ++++++++++++++++
 3 files changed, 103 insertions(+), 41 deletions(-)

-- 
2.33.0.882.g93a45727a2-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH 1/2] config: make git_config_include() static
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
@ 2021-10-12 22:57 ` Jonathan Tan
  2021-10-12 23:07   ` Jeff King
                     ` (2 more replies)
  2021-10-12 22:57 ` [RFC PATCH 2/2] config: include file if remote URL matches a glob Jonathan Tan
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 15+ messages in thread
From: Jonathan Tan @ 2021-10-12 22:57 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

It is not used from outside the file in which it is declared.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c | 12 +++++++++++-
 config.h | 37 ++++---------------------------------
 2 files changed, 15 insertions(+), 34 deletions(-)

diff --git a/config.c b/config.c
index 2edf835262..365d57833b 100644
--- a/config.c
+++ b/config.c
@@ -120,6 +120,16 @@ static long config_buf_ftell(struct config_source *conf)
 	return conf->u.buf.pos;
 }
 
+struct config_include_data {
+	int depth;
+	config_fn_t fn;
+	void *data;
+	const struct config_options *opts;
+};
+#define CONFIG_INCLUDE_INIT { 0 }
+
+static int git_config_include(const char *var, const char *value, void *data);
+
 #define MAX_INCLUDE_DEPTH 10
 static const char include_depth_advice[] = N_(
 "exceeded maximum include depth (%d) while including\n"
@@ -306,7 +316,7 @@ static int include_condition_is_true(const struct config_options *opts,
 	return 0;
 }
 
-int git_config_include(const char *var, const char *value, void *data)
+static int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
 	const char *cond, *key;
diff --git a/config.h b/config.h
index 147f5e0490..b11b0be09a 100644
--- a/config.h
+++ b/config.h
@@ -126,6 +126,8 @@ int git_default_config(const char *, const char *, void *);
 /**
  * Read a specific file in git-config format.
  * This function takes the same callback and data parameters as `git_config`.
+ *
+ * Unlike git_config(), this function does not respect includes.
  */
 int git_config_from_file(config_fn_t fn, const char *, void *);
 
@@ -158,6 +160,8 @@ void read_very_early_config(config_fn_t cb, void *data);
  * will first feed the user-wide one to the callback, and then the
  * repo-specific one; by overwriting, the higher-priority repo-specific
  * value is left at the end).
+ *
+ * Unlike git_config_from_file(), this function respects includes.
  */
 void git_config(config_fn_t fn, void *);
 
@@ -339,39 +343,6 @@ const char *current_config_origin_type(void);
 const char *current_config_name(void);
 int current_config_line(void);
 
-/**
- * Include Directives
- * ------------------
- *
- * By default, the config parser does not respect include directives.
- * However, a caller can use the special `git_config_include` wrapper
- * callback to support them. To do so, you simply wrap your "real" callback
- * function and data pointer in a `struct config_include_data`, and pass
- * the wrapper to the regular config-reading functions. For example:
- *
- * -------------------------------------------
- * int read_file_with_include(const char *file, config_fn_t fn, void *data)
- * {
- * struct config_include_data inc = CONFIG_INCLUDE_INIT;
- * inc.fn = fn;
- * inc.data = data;
- * return git_config_from_file(git_config_include, file, &inc);
- * }
- * -------------------------------------------
- *
- * `git_config` respects includes automatically. The lower-level
- * `git_config_from_file` does not.
- *
- */
-struct config_include_data {
-	int depth;
-	config_fn_t fn;
-	void *data;
-	const struct config_options *opts;
-};
-#define CONFIG_INCLUDE_INIT { 0 }
-int git_config_include(const char *name, const char *value, void *data);
-
 /*
  * Match and parse a config key of the form:
  *
-- 
2.33.0.882.g93a45727a2-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH 2/2] config: include file if remote URL matches a glob
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
  2021-10-12 22:57 ` [RFC PATCH 1/2] config: make git_config_include() static Jonathan Tan
@ 2021-10-12 22:57 ` Jonathan Tan
  2021-10-12 23:30   ` Jeff King
  2021-10-12 23:48   ` Junio C Hamano
  2021-10-13  0:46 ` [RFC PATCH 0/2] Conditional config includes based on remote URL brian m. carlson
  2021-10-18 20:48 ` Jonathan Tan
  3 siblings, 2 replies; 15+ messages in thread
From: Jonathan Tan @ 2021-10-12 22:57 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This is a feature that supports config file inclusion conditional on
whether the repo has a remote with a URL that matches a glob.

Similar to my previous work on remote-suggested hooks [1], the main
motivation is to allow remote repo administrators to provide recommended
configs in a way that can be consumed more easily (e.g. through a
package installable by a package manager). But this can also be used by,
say, an individual that wants certain configs to apply to a certain set
of local repos but not others.

I marked this as RFC because there are some design points that need to
be resolved:

 - The existing "include" and "includeIf" instructions are executed
   immediately, whereas in order to be useful, the execution of
   "includeIf hasremoteurl" needs to be delayed until all config files
   are read. Are there better ways to do this?

 - Is the conditionally-included file allowed to have its own
   "include{,If}" instructions? I'm thinking that we should forbid it
   because, for example, if we had 4 files as follows: A includes B and
   C includes D, and we include A and C in our main config (in that
   order), it wouldn't be clear whether B (because A was first included)
   or C (because we should execute everything at the same depth first)
   should be executed first. (In this patch, I didn't do anything about
   includes.)

 - A small one: the exact format of the glob. I probably will treat the
   URL like a path.

[1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c          | 70 +++++++++++++++++++++++++++++++++++++++++------
 t/t1300-config.sh | 27 ++++++++++++++++++
 2 files changed, 89 insertions(+), 8 deletions(-)

diff --git a/config.c b/config.c
index 365d57833b..448509d549 100644
--- a/config.c
+++ b/config.c
@@ -125,8 +125,20 @@ struct config_include_data {
 	config_fn_t fn;
 	void *data;
 	const struct config_options *opts;
+
+	/*
+	 * All remote URLs discovered when reading all config files.
+	 */
+	struct string_list remote_urls;
+
+	/*
+	 * All "includeif.hasremoteurl:" entries. The item is the URL glob and the
+	 * util is the path (must be freed).
+	 */
+	struct string_list include_url_glob_to_path;
 };
-#define CONFIG_INCLUDE_INIT { 0 }
+#define CONFIG_INCLUDE_INIT { .remote_urls = STRING_LIST_INIT_DUP, \
+	.include_url_glob_to_path = STRING_LIST_INIT_DUP }
 
 static int git_config_include(const char *var, const char *value, void *data);
 
@@ -319,10 +331,18 @@ static int include_condition_is_true(const struct config_options *opts,
 static int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
+	const char *remote_name;
+	size_t remote_name_len;
 	const char *cond, *key;
 	size_t cond_len;
 	int ret;
 
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		string_list_append(&inc->remote_urls, value);
+
 	/*
 	 * Pass along all values, including "include" directives; this makes it
 	 * possible to query information on the includes themselves.
@@ -335,9 +355,18 @@ static int git_config_include(const char *var, const char *value, void *data)
 		ret = handle_path_include(value, inc);
 
 	if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
-	    (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
-	    !strcmp(key, "path"))
-		ret = handle_path_include(value, inc);
+	    cond && !strcmp(key, "path")) {
+		if (skip_prefix_mem(cond, cond_len, "hasremoteurl:", &cond,
+				    &cond_len)) {
+			struct string_list_item *item = string_list_append_nodup(
+				&inc->include_url_glob_to_path,
+				xmemdupz(cond, cond_len));
+			item->util = xstrdup(value);
+			ret = 0;
+		} else if (include_condition_is_true(inc->opts, cond, cond_len)) {
+			ret = handle_path_include(value, inc);
+		}
+	}
 
 	return ret;
 }
@@ -1951,6 +1980,8 @@ int config_with_options(config_fn_t fn, void *data,
 			const struct config_options *opts)
 {
 	struct config_include_data inc = CONFIG_INCLUDE_INIT;
+	int ret;
+	struct string_list_item *glob_item;
 
 	if (opts->respect_includes) {
 		inc.fn = fn;
@@ -1968,17 +1999,40 @@ int config_with_options(config_fn_t fn, void *data,
 	 * regular lookup sequence.
 	 */
 	if (config_source && config_source->use_stdin) {
-		return git_config_from_stdin(fn, data);
+		ret = git_config_from_stdin(fn, data);
 	} else if (config_source && config_source->file) {
-		return git_config_from_file(fn, config_source->file, data);
+		ret = git_config_from_file(fn, config_source->file, data);
 	} else if (config_source && config_source->blob) {
 		struct repository *repo = config_source->repo ?
 			config_source->repo : the_repository;
-		return git_config_from_blob_ref(fn, repo, config_source->blob,
+		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 						data);
+	} else {
+		ret = do_git_config_sequence(opts, fn, data);
+	}
+
+	for_each_string_list_item(glob_item, &inc.include_url_glob_to_path) {
+		struct strbuf pattern = STRBUF_INIT;
+		struct string_list_item *url_item;
+		int found = 0;
+
+		strbuf_addstr(&pattern, glob_item->string);
+		add_trailing_starstar_for_dir(&pattern);
+		for_each_string_list_item(url_item, &inc.remote_urls) {
+			if (!wildmatch(pattern.buf, url_item->string, WM_PATHNAME)) {
+				found = 1;
+				break;
+			}
+		}
+		strbuf_release(&pattern);
+		if (found) {
+			handle_path_include(glob_item->util, &inc);
+		}
 	}
 
-	return do_git_config_sequence(opts, fn, data);
+	string_list_clear(&inc.remote_urls, 0);
+	string_list_clear(&inc.include_url_glob_to_path, 1);
+	return ret;
 }
 
 static void configset_iter(struct config_set *cs, config_fn_t fn, void *data)
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 9ff46f3b04..4803155f89 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -2387,4 +2387,31 @@ test_expect_success '--get and --get-all with --fixed-value' '
 	test_must_fail git config --file=config --get-regexp --fixed-value fixed+ non-existent
 '
 
+test_expect_success 'includeIf.hasremoteurl' '
+	test_create_repo hasremoteurlTest &&
+
+	cat >"$(pwd)"/include-this <<-\EOF &&
+	[user]
+		this = this-is-included
+	EOF
+	cat >"$(pwd)"/dont-include-that <<-\EOF &&
+	[user]
+		that = that-is-not-included
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[includeIf "hasremoteurl:foo"]
+		path = "$(pwd)/include-this"
+	[includeIf "hasremoteurl:bar"]
+		path = "$(pwd)/dont-include-that"
+	[remote "foo"]
+		url = foo
+	EOF
+
+	echo this-is-included >expect-this &&
+	git -C hasremoteurlTest config --get user.this >actual-this &&
+	test_cmp expect-this actual-this &&
+
+	test_must_fail git -C hasremoteurlTest config --get user.that
+'
+
 test_done
-- 
2.33.0.882.g93a45727a2-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/2] config: make git_config_include() static
  2021-10-12 22:57 ` [RFC PATCH 1/2] config: make git_config_include() static Jonathan Tan
@ 2021-10-12 23:07   ` Jeff King
  2021-10-12 23:26   ` Junio C Hamano
  2021-10-13  8:26   ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2021-10-12 23:07 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Oct 12, 2021 at 03:57:22PM -0700, Jonathan Tan wrote:

> It is not used from outside the file in which it is declared.

Makes sense. We used to use it from builtin/config.c, but that went away
in e895589883 (git-config: use git_config_with_options, 2012-10-23).

> diff --git a/config.h b/config.h
> index 147f5e0490..b11b0be09a 100644
> --- a/config.h
> +++ b/config.h
> @@ -126,6 +126,8 @@ int git_default_config(const char *, const char *, void *);
>  /**
>   * Read a specific file in git-config format.
>   * This function takes the same callback and data parameters as `git_config`.
> + *
> + * Unlike git_config(), this function does not respect includes.
>   */

Breaking out the relevant caller-facing parts of the documentation like
this is a nice touch.

And the rest of the patch looks good to me.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/2] config: make git_config_include() static
  2021-10-12 22:57 ` [RFC PATCH 1/2] config: make git_config_include() static Jonathan Tan
  2021-10-12 23:07   ` Jeff King
@ 2021-10-12 23:26   ` Junio C Hamano
  2021-10-13  8:26   ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2021-10-12 23:26 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> It is not used from outside the file in which it is declared.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  config.c | 12 +++++++++++-
>  config.h | 37 ++++---------------------------------
>  2 files changed, 15 insertions(+), 34 deletions(-)

Nice.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 2/2] config: include file if remote URL matches a glob
  2021-10-12 22:57 ` [RFC PATCH 2/2] config: include file if remote URL matches a glob Jonathan Tan
@ 2021-10-12 23:30   ` Jeff King
  2021-10-13 18:33     ` Jonathan Tan
  2021-10-12 23:48   ` Junio C Hamano
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff King @ 2021-10-12 23:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Oct 12, 2021 at 03:57:23PM -0700, Jonathan Tan wrote:

> This is a feature that supports config file inclusion conditional on
> whether the repo has a remote with a URL that matches a glob.
> 
> Similar to my previous work on remote-suggested hooks [1], the main
> motivation is to allow remote repo administrators to provide recommended
> configs in a way that can be consumed more easily (e.g. through a
> package installable by a package manager). But this can also be used by,
> say, an individual that wants certain configs to apply to a certain set
> of local repos but not others.

OK. I was a little wary after reading the subject that this would be
"when we are using such a URL", which is full of all kinds of odd corner
cases. But if it is "a remote is defined with a matching URL" that makes
it a property of the repository, not the operation.

I think in general this kind of feature is currently served by just
correlating filesystem paths with their function. So with your patch I
could do:

  [includeIf "hasremoteurl:https://myjob.example.com"]
  path = foo.conf

But in general, I'd imagine most people put their repository in ~/work
or similar, and just do:

  [includeIf "gitdir:~/work"]
  path = foo.conf

(and of course you can imagine more subdivisions as necessary). So I
find the use-case only sort-of compelling. In general, I'm in favor of
adding new includeIf directions even if they're only moderately
convenient. But this one is rather sticky, because it is dependent on
other config keys being defined. So it introduces a new and complicated
ordering issue. Is it worth it? Maybe I'm not being imaginative enough
in seeing the use cases.

> I marked this as RFC because there are some design points that need to
> be resolved:
> 
>  - The existing "include" and "includeIf" instructions are executed
>    immediately, whereas in order to be useful, the execution of
>    "includeIf hasremoteurl" needs to be delayed until all config files
>    are read. Are there better ways to do this?

Note that this violates the "as if they had been found at the location
of the include directive" rule which we advertise to users. I'd imagine
that most of the time it doesn't matter, but this is a pretty big
exception we'll need to document.

Just brainstorming some alternatives:

  - We could stop the world while we are parsing and do a _new_ parse
    that just looks at the remote config (in fact, this is the natural
    thing if you were consulting the regular remote.c code for the list
    of remotes, because it does its own config parse).

    That does mean that the remote-conditional includes cannot
    themselves define new remotes. But I think that is already the case
    with your patch (and violating that gets you into weird circular
    problems).

  - We could simply document that if you want to depend on conditional
    includes based on a particular remote.*.url existing, then that
    remote config must appear earlier in the sequence.

    This is a bit ugly, because I'm sure it will bite somebody
    eventually. But at the same time, it resolves all of the weird
    timing issues, and does so in a way that will be easy to match if we
    have any other config dependencies.

>  - Is the conditionally-included file allowed to have its own
>    "include{,If}" instructions? I'm thinking that we should forbid it
>    because, for example, if we had 4 files as follows: A includes B and
>    C includes D, and we include A and C in our main config (in that
>    order), it wouldn't be clear whether B (because A was first included)
>    or C (because we should execute everything at the same depth first)
>    should be executed first. (In this patch, I didn't do anything about
>    includes.)

I'd say that A would expand B at the moment it is parsed, by the usual
as-if rule. If it has a recursive includeIf on remotes, then my head may
explode. I'd argue that we should refuse to do recursive remote-ifs in
that case (though all of this is a consequence of the after-the-fact
parsing; I'd much prefer one of the alternatives I gave earlier).

>  - A small one: the exact format of the glob. I probably will treat the
>    URL like a path.

You might want to use the matcher from urlmatch.[ch], which understands
things like wildcards. Of course remote "URLs" are not always real
syntactically valid URLs, which may make that awkward.

Barring that the usual fnmatch glob is probably our best bet.

> @@ -319,10 +331,18 @@ static int include_condition_is_true(const struct config_options *opts,
>  static int git_config_include(const char *var, const char *value, void *data)
>  {
>  	struct config_include_data *inc = data;
> +	const char *remote_name;
> +	size_t remote_name_len;
>  	const char *cond, *key;
>  	size_t cond_len;
>  	int ret;
>  
> +	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
> +			      &key) &&
> +	    remote_name &&
> +	    !strcmp(key, "url"))
> +		string_list_append(&inc->remote_urls, value);

So we make a copy of every remote name on the off chance that somebody
has an includeIf which looks at it. That feels wasteful, though in
practice it's probably not that big a deal.

By doing the config parsing ourselves here we're missing out on any
other forms of remote, like .git/remotes. Those are old and not widely
used, and I'd be OK with skipping them. But we should clearly document
that this is matching remote.*.url, not any of the other mechanisms.

> [...]

I only lightly read the rest of the patch. I didn't see anything
obviously wrong, but I think the goal at this point is figuring out the
design.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 2/2] config: include file if remote URL matches a glob
  2021-10-12 22:57 ` [RFC PATCH 2/2] config: include file if remote URL matches a glob Jonathan Tan
  2021-10-12 23:30   ` Jeff King
@ 2021-10-12 23:48   ` Junio C Hamano
  2021-10-13 19:52     ` Jonathan Tan
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2021-10-12 23:48 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> I marked this as RFC because there are some design points that need to
> be resolved:
>
>  - The existing "include" and "includeIf" instructions are executed
>    immediately, whereas in order to be useful, the execution of
>    "includeIf hasremoteurl" needs to be delayed until all config files
>    are read. Are there better ways to do this?

An interesting chicken-and-egg problem.  Even if an included
configuration file does not have further "include", you may discover
there are more remotes, which may add new includes to fire from the
top-level configuration file.

What if we have multiple remotes?  Is it a sufficient match for only
one of them match what is mentioned in the includeIf condition?
Should all of them must match the pattern instead?  Majority,
perhaps?  Something else?

>  - Is the conditionally-included file allowed to have its own
>    "include{,If}" instructions? I'm thinking that we should forbid it
>    because, for example, if we had 4 files as follows: A includes B and
>    C includes D, and we include A and C in our main config (in that
>    order), it wouldn't be clear whether B (because A was first included)
>    or C (because we should execute everything at the same depth first)
>    should be executed first. (In this patch, I didn't do anything about
>    includes.)

Interesting.  The order of real inclusion obviously would affect the
outcome of the "last one wins" rule.  And this does not have to be
limited to this "hasremote" condition, so we need to design it with
a bit of care.

Would it be possible for a newly included file to invalidate an
earlier condition that was used to decide whether to include another
file or not?  If not, then you can take a two-pass approach where
the first pass is used to decide solely to discover which
conditionally included files are taken, clear the slate and the
parse these files in the textual order.  In the case of your example
above, the early part of the primary config would be the first to be
read, then comes A's early part, then comes B in its entirety, then
the rest of A, and then the middle part of the primary config, then
C's early part, then D, and then the rest of C,... you got the idea.

If it is possible for an included file to invalidate a condition we
have already evaluated to make a decision, it would become messy.
For example, we might decide to include another file based on the
value we discovered for a config variable:

    === .git/config ===
    [my] variable
    [your] variable = false

    [includeIf "configEq:my.variable==true"]
            file = fileA

but the included file may override the condition, e.g.

    === fileA ===
    [my] variable = false
    [your] variable = true

and applying the "last one wins" rule becomes messy.  I do not
offhand know what these

    $ git config --bool my.variable
    $ git config --bool your.variable

should say, and do not have a good explanation for possible
outcomes.

Maybe the above example can serve as a way to guide us when we
design the types of conditionals we allow in includeIf.  This
example tells us that it is probably a terrible idea to allow using
values of configuration variables as part of "includeIf" condition.

There may lead to similar "'hasremoteurl' makes a well-behaved
condition, because [remote] are additive and not 'last-one-wins',
but we cannot add 'lacksremoteurl' as a condition, because a file we
decide to include based on a 'lacks' predicate may invalidate the
'lacks' condition by defining such a remote" design decisions you'd
need to make around the URLs of the remotes defined for the
repository.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/2] Conditional config includes based on remote URL
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
  2021-10-12 22:57 ` [RFC PATCH 1/2] config: make git_config_include() static Jonathan Tan
  2021-10-12 22:57 ` [RFC PATCH 2/2] config: include file if remote URL matches a glob Jonathan Tan
@ 2021-10-13  0:46 ` brian m. carlson
  2021-10-13 18:17   ` Jonathan Tan
  2021-10-18 20:48 ` Jonathan Tan
  3 siblings, 1 reply; 15+ messages in thread
From: brian m. carlson @ 2021-10-13  0:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 920 bytes --]

On 2021-10-12 at 22:57:21, Jonathan Tan wrote:
> Previously [1], I sent a patch set for remote-suggested configs that are
> transmitted when fetching, but there were some security concerns. Here
> is another way that remote repo administators can provide recommended
> configs - through conditionally included files based on the configured
> remote. Git itself neither transmits nor prompts for these files, which
> hopefully reduces people's concerns.
> 
> More information is in the commit message of patch 2.

I won't go into the details of the patches, since I'm a little low on
time at the moment, but I think from what I've seen of the cover letter
and the commit messages, this approach is much better from a security
perspective and, provided we can get the kinks mentioned downthread
ironed out, I'd be happy to see this merged.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/2] config: make git_config_include() static
  2021-10-12 22:57 ` [RFC PATCH 1/2] config: make git_config_include() static Jonathan Tan
  2021-10-12 23:07   ` Jeff King
  2021-10-12 23:26   ` Junio C Hamano
@ 2021-10-13  8:26   ` Ævar Arnfjörð Bjarmason
  2021-10-13 17:00     ` Junio C Hamano
  2 siblings, 1 reply; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-13  8:26 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Tue, Oct 12 2021, Jonathan Tan wrote:

> It is not used from outside the file in which it is declared.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  config.c | 12 +++++++++++-
>  config.h | 37 ++++---------------------------------
>  2 files changed, 15 insertions(+), 34 deletions(-)
>
> diff --git a/config.c b/config.c
> index 2edf835262..365d57833b 100644
> --- a/config.c
> +++ b/config.c
> @@ -120,6 +120,16 @@ static long config_buf_ftell(struct config_source *conf)
>  	return conf->u.buf.pos;
>  }
>  
> +struct config_include_data {
> +	int depth;
> +	config_fn_t fn;
> +	void *data;
> +	const struct config_options *opts;
> +};
> +#define CONFIG_INCLUDE_INIT { 0 }
> +
> +static int git_config_include(const char *var, const char *value, void *data);

Can't we just move the function here?

>  #define MAX_INCLUDE_DEPTH 10
>  static const char include_depth_advice[] = N_(
>  "exceeded maximum include depth (%d) while including\n"
> @@ -306,7 +316,7 @@ static int include_condition_is_true(const struct config_options *opts,
>  	return 0;
>  }
>  
> -int git_config_include(const char *var, const char *value, void *data)
> +static int git_config_include(const char *var, const char *value, void *data)

...and avoid the forward declaration?

I've seen that in a few places, making the diff smaller here doesn't
seem worth it v.s. maintaining the definition in two places.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/2] config: make git_config_include() static
  2021-10-13  8:26   ` Ævar Arnfjörð Bjarmason
@ 2021-10-13 17:00     ` Junio C Hamano
  2021-10-13 18:13       ` Jonathan Tan
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2021-10-13 17:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Jonathan Tan, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Tue, Oct 12 2021, Jonathan Tan wrote:
>
>> It is not used from outside the file in which it is declared.
>>
>> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>> ---
>>  config.c | 12 +++++++++++-
>>  config.h | 37 ++++---------------------------------
>>  2 files changed, 15 insertions(+), 34 deletions(-)
>>
>> diff --git a/config.c b/config.c
>> index 2edf835262..365d57833b 100644
>> --- a/config.c
>> +++ b/config.c
>> @@ -120,6 +120,16 @@ static long config_buf_ftell(struct config_source *conf)
>>  	return conf->u.buf.pos;
>>  }
>>  
>> +struct config_include_data {
>> +	int depth;
>> +	config_fn_t fn;
>> +	void *data;
>> +	const struct config_options *opts;
>> +};
>> +#define CONFIG_INCLUDE_INIT { 0 }
>> +
>> +static int git_config_include(const char *var, const char *value, void *data);
>
> Can't we just move the function here?
>
>>  #define MAX_INCLUDE_DEPTH 10
>>  static const char include_depth_advice[] = N_(
>>  "exceeded maximum include depth (%d) while including\n"
>> @@ -306,7 +316,7 @@ static int include_condition_is_true(const struct config_options *opts,
>>  	return 0;
>>  }
>>  
>> -int git_config_include(const char *var, const char *value, void *data)
>> +static int git_config_include(const char *var, const char *value, void *data)
>
> ...and avoid the forward declaration?
>
> I've seen that in a few places, making the diff smaller here doesn't
> seem worth it v.s. maintaining the definition in two places.

Sounds good.  If we are moving things around anyway, it is probably
a good time to do that, too ;-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 1/2] config: make git_config_include() static
  2021-10-13 17:00     ` Junio C Hamano
@ 2021-10-13 18:13       ` Jonathan Tan
  0 siblings, 0 replies; 15+ messages in thread
From: Jonathan Tan @ 2021-10-13 18:13 UTC (permalink / raw)
  To: gitster; +Cc: avarab, jonathantanmy, git

> > ...and avoid the forward declaration?
> >
> > I've seen that in a few places, making the diff smaller here doesn't
> > seem worth it v.s. maintaining the definition in two places.
> 
> Sounds good.  If we are moving things around anyway, it is probably
> a good time to do that, too ;-)

OK, I'll do this.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/2] Conditional config includes based on remote URL
  2021-10-13  0:46 ` [RFC PATCH 0/2] Conditional config includes based on remote URL brian m. carlson
@ 2021-10-13 18:17   ` Jonathan Tan
  0 siblings, 0 replies; 15+ messages in thread
From: Jonathan Tan @ 2021-10-13 18:17 UTC (permalink / raw)
  To: sandals; +Cc: git, Jonathan Tan

> I won't go into the details of the patches, since I'm a little low on
> time at the moment, but I think from what I've seen of the cover letter
> and the commit messages, this approach is much better from a security
> perspective and, provided we can get the kinks mentioned downthread
> ironed out, I'd be happy to see this merged.

Thanks - I really appreciate this note. Thanks also for all your
thoughts up to now about the security perspective.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 2/2] config: include file if remote URL matches a glob
  2021-10-12 23:30   ` Jeff King
@ 2021-10-13 18:33     ` Jonathan Tan
  0 siblings, 0 replies; 15+ messages in thread
From: Jonathan Tan @ 2021-10-13 18:33 UTC (permalink / raw)
  To: peff; +Cc: jonathantanmy, git

> OK. I was a little wary after reading the subject that this would be
> "when we are using such a URL", which is full of all kinds of odd corner
> cases. But if it is "a remote is defined with a matching URL" that makes
> it a property of the repository, not the operation.
> 
> I think in general this kind of feature is currently served by just
> correlating filesystem paths with their function. So with your patch I
> could do:
> 
>   [includeIf "hasremoteurl:https://myjob.example.com"]
>   path = foo.conf
> 
> But in general, I'd imagine most people put their repository in ~/work
> or similar, and just do:
> 
>   [includeIf "gitdir:~/work"]
>   path = foo.conf
> 
> (and of course you can imagine more subdivisions as necessary). So I
> find the use-case only sort-of compelling. In general, I'm in favor of
> adding new includeIf directions even if they're only moderately
> convenient. But this one is rather sticky, because it is dependent on
> other config keys being defined. So it introduces a new and complicated
> ordering issue. Is it worth it? Maybe I'm not being imaginative enough
> in seeing the use cases.

My main use case is for a remote repo administrator to offer a
recommended config to anyone who clones that repo. For this, I don't
think we can prescribe a local directory structure (e.g. "~/work")
without being too restrictive or broad (that is, if the user ends up
creating a repo that so happens to match our glob but did not intend the
config to apply to it).

I did bring up the idea that an individual could use this to have config
in one place that affects a subset of remotes, but you're right that
they could just do this by putting repositories at different places in
the filesystem.

> > I marked this as RFC because there are some design points that need to
> > be resolved:
> > 
> >  - The existing "include" and "includeIf" instructions are executed
> >    immediately, whereas in order to be useful, the execution of
> >    "includeIf hasremoteurl" needs to be delayed until all config files
> >    are read. Are there better ways to do this?
> 
> Note that this violates the "as if they had been found at the location
> of the include directive" rule which we advertise to users. I'd imagine
> that most of the time it doesn't matter, but this is a pretty big
> exception we'll need to document.

Yes, that's true. Another thing I just thought of is to add a new
"deferIncludeIf" which makes clear the different semantics (deferred
include, and perhaps not allow recursive includes).

> Just brainstorming some alternatives:
> 
>   - We could stop the world while we are parsing and do a _new_ parse
>     that just looks at the remote config (in fact, this is the natural
>     thing if you were consulting the regular remote.c code for the list
>     of remotes, because it does its own config parse).
> 
>     That does mean that the remote-conditional includes cannot
>     themselves define new remotes. But I think that is already the case
>     with your patch (and violating that gets you into weird circular
>     problems).

Hmm...yes, having a special-case rule that such an included file cannot
define new remotes would be complex.

>   - We could simply document that if you want to depend on conditional
>     includes based on a particular remote.*.url existing, then that
>     remote config must appear earlier in the sequence.
> 
>     This is a bit ugly, because I'm sure it will bite somebody
>     eventually. But at the same time, it resolves all of the weird
>     timing issues, and does so in a way that will be easy to match if we
>     have any other config dependencies.

My main issue with this is that different config files are read at
different times, and the repo config (that usually contains the remote)
is read last.

> >  - Is the conditionally-included file allowed to have its own
> >    "include{,If}" instructions? I'm thinking that we should forbid it
> >    because, for example, if we had 4 files as follows: A includes B and
> >    C includes D, and we include A and C in our main config (in that
> >    order), it wouldn't be clear whether B (because A was first included)
> >    or C (because we should execute everything at the same depth first)
> >    should be executed first. (In this patch, I didn't do anything about
> >    includes.)
> 
> I'd say that A would expand B at the moment it is parsed, by the usual
> as-if rule. If it has a recursive includeIf on remotes, then my head may
> explode. I'd argue that we should refuse to do recursive remote-ifs in
> that case (though all of this is a consequence of the after-the-fact
> parsing; I'd much prefer one of the alternatives I gave earlier).

If we can't expand in place, I would say that any recursive includes
should be refused. But as you said, we could still think about whether
in-place expansion can be done before addressing this question.

> >  - A small one: the exact format of the glob. I probably will treat the
> >    URL like a path.
> 
> You might want to use the matcher from urlmatch.[ch], which understands
> things like wildcards. Of course remote "URLs" are not always real
> syntactically valid URLs, which may make that awkward.
> 
> Barring that the usual fnmatch glob is probably our best bet.

OK.

> So we make a copy of every remote name on the off chance that somebody
> has an includeIf which looks at it. That feels wasteful, though in
> practice it's probably not that big a deal.
> 
> By doing the config parsing ourselves here we're missing out on any
> other forms of remote, like .git/remotes. Those are old and not widely
> used, and I'd be OK with skipping them. But we should clearly document
> that this is matching remote.*.url, not any of the other mechanisms.

Sounds good.

> I only lightly read the rest of the patch. I didn't see anything
> obviously wrong, but I think the goal at this point is figuring out the
> design.

Yes, that's right.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 2/2] config: include file if remote URL matches a glob
  2021-10-12 23:48   ` Junio C Hamano
@ 2021-10-13 19:52     ` Jonathan Tan
  0 siblings, 0 replies; 15+ messages in thread
From: Jonathan Tan @ 2021-10-13 19:52 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > I marked this as RFC because there are some design points that need to
> > be resolved:
> >
> >  - The existing "include" and "includeIf" instructions are executed
> >    immediately, whereas in order to be useful, the execution of
> >    "includeIf hasremoteurl" needs to be delayed until all config files
> >    are read. Are there better ways to do this?
> 
> An interesting chicken-and-egg problem.  Even if an included
> configuration file does not have further "include", you may discover
> there are more remotes, which may add new includes to fire from the
> top-level configuration file.

That's true. We might need to say that such conditional includes are
based only on what happened during the main config parsing.

> What if we have multiple remotes?  Is it a sufficient match for only
> one of them match what is mentioned in the includeIf condition?
> Should all of them must match the pattern instead?  Majority,
> perhaps?  Something else?

I think at least one remote should match.

> >  - Is the conditionally-included file allowed to have its own
> >    "include{,If}" instructions? I'm thinking that we should forbid it
> >    because, for example, if we had 4 files as follows: A includes B and
> >    C includes D, and we include A and C in our main config (in that
> >    order), it wouldn't be clear whether B (because A was first included)
> >    or C (because we should execute everything at the same depth first)
> >    should be executed first. (In this patch, I didn't do anything about
> >    includes.)
> 
> Interesting.  The order of real inclusion obviously would affect the
> outcome of the "last one wins" rule.  And this does not have to be
> limited to this "hasremote" condition, so we need to design it with
> a bit of care.
> 
> Would it be possible for a newly included file to invalidate an
> earlier condition that was used to decide whether to include another
> file or not?  If not, then you can take a two-pass approach where
> the first pass is used to decide solely to discover which
> conditionally included files are taken, clear the slate and the
> parse these files in the textual order.  In the case of your example
> above, the early part of the primary config would be the first to be
> read, then comes A's early part, then comes B in its entirety, then
> the rest of A, and then the middle part of the primary config, then
> C's early part, then D, and then the rest of C,... you got the idea.
>
> If it is possible for an included file to invalidate a condition we
> have already evaluated to make a decision, it would become messy.
> For example, we might decide to include another file based on the
> value we discovered for a config variable:
> 
>     === .git/config ===
>     [my] variable
>     [your] variable = false
> 
>     [includeIf "configEq:my.variable==true"]
>             file = fileA
> 
> but the included file may override the condition, e.g.
> 
>     === fileA ===
>     [my] variable = false
>     [your] variable = true
> 
> and applying the "last one wins" rule becomes messy.  I do not
> offhand know what these
> 
>     $ git config --bool my.variable
>     $ git config --bool your.variable
> 
> should say, and do not have a good explanation for possible
> outcomes.

In this case, it makes sense to me to think that files are included
entirely or not at all, so my.variable would be false and your.variable
would be true. I guess the tricky part is something like:

  === .git/config ===
  [my] variable = true
  [your] variable = false
  [includeIf "configEq:my.variable==true"]
    file = fileA
  [includeIf "configEq:my.variable==false"]
    file = fileB
  === fileA ===
    my.variable = false
  === fileB ===
    your.variable = true

and what my.variable and your.variable would end up being.

> Maybe the above example can serve as a way to guide us when we
> design the types of conditionals we allow in includeIf.  This
> example tells us that it is probably a terrible idea to allow using
> values of configuration variables as part of "includeIf" condition.

Hmm...well, remote.foo.url is a configuration variable. I think that the
two-pass approach you describe would work if we prohibit subsequent
inclusions.

> There may lead to similar "'hasremoteurl' makes a well-behaved
> condition, because [remote] are additive and not 'last-one-wins',
> but we cannot add 'lacksremoteurl' as a condition, because a file we
> decide to include based on a 'lacks' predicate may invalidate the
> 'lacks' condition by defining such a remote" design decisions you'd
> need to make around the URLs of the remotes defined for the
> repository.

And if we implement two-pass with no subsequent inclusions, "lacks"
would work the same way.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/2] Conditional config includes based on remote URL
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
                   ` (2 preceding siblings ...)
  2021-10-13  0:46 ` [RFC PATCH 0/2] Conditional config includes based on remote URL brian m. carlson
@ 2021-10-18 20:48 ` Jonathan Tan
  3 siblings, 0 replies; 15+ messages in thread
From: Jonathan Tan @ 2021-10-18 20:48 UTC (permalink / raw)
  To: jonathantanmy; +Cc: git, peff, gitster

After some in-office discussion, here are the alternatives as I see it:

 (1) Introduce a "includeAfterIf" (or "deferIncludeIf", or some other
     name) command that is executed after all config files are read. (If
     there are multiple, they are executed in order of appearance.)
     Files included by this mechanism cannot directly or indirectly
     contain another "includeAfterIf". This is the same as what was
     introduced in this patch set, except for the name of the directive.

 (2) Leave the name as "includeIf", and when it is encountered with a
     remote-URL condition: continue parsing the config files, skipping
     all "includeIf hasRemoteUrl", only looking for remote.*.url. After
     that, resume the reading of config files at the first "includeIf
     hasRemoteUrl", using the prior remote.*.url information gathered to
     determine which files to include when "includeIf hasRemoteUrl" is
     encountered. Files included by this mechanism cannot contain any
     "remote.*.url" variables.

In all cases, the include is executed if at least one remote URL
matches.

There are other ideas including:

 (3) remote.*.url must appear before a "includeIf hasRemoteUrl" that
     wants to match it. (But this doesn't fit our use case, in which a
     repo config has the URL but a system or user config has the
     include.)

 (4) "includeIf hasRemoteUrl" triggers a search of the repo config just
     for remote.*.url. (I think this out-of-order config search is more
     complicated than (2), though.)

For (2), I think that prohibiting "remote.*.url" from any "includeIf
hasRemoteUrl" files sidesteps questions like "what happens when an
included file overrides the URL that made us include this file in the
first place" or "what happens if an included file includes a
remote.*.url that validates or invalidates a prior or subsequent file",
because now that cannot happen at all. My main concern with this
prohibition was that if we were to introduce another similar condition
(say, one based on remote names), what would happen? But I think this is
solvable - make the prohibitions based only on all the conditions that
the actually used, so if the user only uses conditions on remote URLs,
then the user can still set refspecs (for example), even after the
remote-name-condition feature is introduced in Git.

For (1), it is simpler in concept (and also in implementation, I think).
The user just needs to know that certain includes are on-the-spot and
certain includes (the ones with "after" in the name) are deferred - in
particular, if a config variable isn't the value they expect, they'll
need to check that it wasn't introduced in an includeAfterIf file. (And
the user also needs to figure out that if they want to override such a
variable, they'll need to make their own includeAfterIf with an
always-true condition.)

From the prior replies, I think that people will be more interested in
(2) as it preserves the "last config wins" rule, and I'm inclined to go
for (2) too. I'll see if others have any other opinions, and if not I'll
see how the implementation of (2) will look like.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-10-18 20:48 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
2021-10-12 22:57 ` [RFC PATCH 1/2] config: make git_config_include() static Jonathan Tan
2021-10-12 23:07   ` Jeff King
2021-10-12 23:26   ` Junio C Hamano
2021-10-13  8:26   ` Ævar Arnfjörð Bjarmason
2021-10-13 17:00     ` Junio C Hamano
2021-10-13 18:13       ` Jonathan Tan
2021-10-12 22:57 ` [RFC PATCH 2/2] config: include file if remote URL matches a glob Jonathan Tan
2021-10-12 23:30   ` Jeff King
2021-10-13 18:33     ` Jonathan Tan
2021-10-12 23:48   ` Junio C Hamano
2021-10-13 19:52     ` Jonathan Tan
2021-10-13  0:46 ` [RFC PATCH 0/2] Conditional config includes based on remote URL brian m. carlson
2021-10-13 18:17   ` Jonathan Tan
2021-10-18 20:48 ` Jonathan Tan

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).