git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* [PATCH 1/1] config: work around bug with includeif:onbranch and early config
  2019-07-31 19:53 [PATCH 0/1] Make the includeif:onbranch feature more robust Johannes Schindelin via GitGitGadget
@ 2019-07-31 19:53 ` Johannes Schindelin via GitGitGadget
  2019-07-31 21:37   ` Junio C Hamano
  2019-07-31 20:06 ` [PATCH v2 0/1] Make the includeif:onbranch feature more robust Johannes Schindelin via GitGitGadget
  1 sibling, 1 reply; 17+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-07-31 19:53 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johasc@microsoft.com>

Since 07b2c0eacac (config: learn the "onbranch:" includeIf condition,
2019-06-05), there is a potential catch-22 in the early config path: if
the `include.onbranch:` feature is used, Git assumes that the Git
directory has been initialized already. However, in the early config
code path that is not true.

One way to trigger this is to call the following commands in any
repository:

	git config includeif.onbranch:refs/heads/master.path broken
	git help -a

The symptom triggered by the `git help -a` invocation reads like this:

BUG: refs.c:1851: attempting to get main_ref_store outside of repository

Let's work around this, simply by ignoring the `includeif.onbranch:`
setting when parsing the config when the ref store has not been
initialized (yet).

Technically, there is a way to solve this properly: teach the refs
machinery to initialize the ref_store from a given gitdir/commondir pair
(which we _do_ have in the early config code path), and then use that in
`include_by_branch()`. This, however, is a pretty involved project, and
we're already in the feature freeze for Git v2.23.0.

Note: when calling above-mentioned two commands _outside_ of any Git
worktree (passing the `--global` flag to `git config`, as there is
obviously no repository config available), at the point when
`include_by_branch()` is called, `the_repository` is `NULL`, therefore
we have to be extra careful not to dereference it in that case.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c                | 3 ++-
 t/t1309-early-config.sh | 5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/config.c b/config.c
index ed7f58e0fc..3900e4947b 100644
--- a/config.c
+++ b/config.c
@@ -275,7 +275,8 @@ static int include_by_branch(const char *cond, size_t cond_len)
 	int flags;
 	int ret;
 	struct strbuf pattern = STRBUF_INIT;
-	const char *refname = resolve_ref_unsafe("HEAD", 0, NULL, &flags);
+	const char *refname = !the_repository || !the_repository->gitdir ?
+		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);
 	const char *shortname;
 
 	if (!refname || !(flags & REF_ISSYMREF)	||
diff --git a/t/t1309-early-config.sh b/t/t1309-early-config.sh
index 413642aa56..0c37e7180d 100755
--- a/t/t1309-early-config.sh
+++ b/t/t1309-early-config.sh
@@ -89,4 +89,9 @@ test_expect_failure 'ignore .git/ with invalid config' '
 	test_with_config "["
 '
 
+test_expect_success 'early config and onbranch' '
+	echo "[broken" >broken &&
+	test_with_config "[includeif \"onbranch:refs/heads/master\"]path=../broken"
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/1] Make the includeif:onbranch feature more robust
@ 2019-07-31 19:53 Johannes Schindelin via GitGitGadget
  2019-07-31 19:53 ` [PATCH 1/1] config: work around bug with includeif:onbranch and early config Johannes Schindelin via GitGitGadget
  2019-07-31 20:06 ` [PATCH v2 0/1] Make the includeif:onbranch feature more robust Johannes Schindelin via GitGitGadget
  0 siblings, 2 replies; 17+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-07-31 19:53 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

I actually stumbled over this while rebasing the VFS for Git patches, in
relation with the pre-command hook that we still support there (we're slowly
migrating toward Trace2, of course).

Once I figured out what the problem was, I hunted for a way to trigger this
bug in plain Git, and git help -a is the one I settled on. It had to be a
command that uses the early config machinery (and git help -a uses it to
list all aliases), and it had to be a command that does not discover a .git 
directory automatically (cmd_help is listed without any flags in git.c, so:
check).

Of course, part of me wants to just go and dig into the refs part of the
code to introduce an equivalent to the "early config" machinery (calling it
"early ref store"), but:

 1. We're really in feature freeze, and I want this bug fix to go into
    v2.23.0.
 2. It is actually a pretty obscure thing to want: a branch-dependent config
    that is used that early that the Git directory was not yet discovered. I 
    could imagine that some power user wants to play some games at some
    stage, say, with the pager depending on the name of the current branch,
    but even then, to run into the issue with this here patch where it
    simply ignores the includeif.onbranch: setting in the early config code
    path, a command has to be run that does not immediately set up 
    the_repository->gitdir but still wants to use the configured pager.

So yes, this patch introduces a known issue, but it does fix a BUG() where
no bug should be reported.

Johannes Schindelin (1):
  config: work around bug with includeif:onbranch and early config

 config.c                | 3 ++-
 t/t1309-early-config.sh | 5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)


base-commit: 026dd738a6e5f1e42ef0f390feacb5ed6acc4ee8
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-300%2Fdscho%2Fonbranch-and-early-config-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-300/dscho/onbranch-and-early-config-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/300
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 0/1] Make the includeif:onbranch feature more robust
  2019-07-31 19:53 [PATCH 0/1] Make the includeif:onbranch feature more robust Johannes Schindelin via GitGitGadget
  2019-07-31 19:53 ` [PATCH 1/1] config: work around bug with includeif:onbranch and early config Johannes Schindelin via GitGitGadget
@ 2019-07-31 20:06 ` Johannes Schindelin via GitGitGadget
  2019-07-31 20:06   ` [PATCH v2 1/1] config: work around bug with includeif:onbranch and early config Johannes Schindelin via GitGitGadget
  1 sibling, 1 reply; 17+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-07-31 20:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

I actually stumbled over this while rebasing the VFS for Git patches, in
relation with the pre-command hook that we still support there (we're slowly
migrating toward Trace2, of course).

Once I figured out what the problem was, I hunted for a way to trigger this
bug in plain Git, and git help -a is the one I settled on. It had to be a
command that uses the early config machinery (and git help -a uses it to
list all aliases), and it had to be a command that does not discover a .git 
directory automatically (cmd_help is listed without any flags in git.c, so:
check).

Of course, part of me wants to just go and dig into the refs part of the
code to introduce an equivalent to the "early config" machinery (calling it
"early ref store"), but:

 1. We're really in feature freeze, and I want this bug fix to go into
    v2.23.0.
 2. It is actually a pretty obscure thing to want: a branch-dependent config
    that is used that early that the Git directory was not yet discovered. I 
    could imagine that some power user wants to play some games at some
    stage, say, with the pager depending on the name of the current branch,
    but even then, to run into the issue with this here patch where it
    simply ignores the includeif.onbranch: setting in the early config code
    path, a command has to be run that does not immediately set up 
    the_repository->gitdir but still wants to use the configured pager.

So yes, this patch introduces a known issue, but it does fix a BUG() where
no bug should be reported.

Changes since v1:

 * Now using my correct email address for Git contributions.

Johannes Schindelin (1):
  config: work around bug with includeif:onbranch and early config

 config.c                | 3 ++-
 t/t1309-early-config.sh | 5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)


base-commit: 026dd738a6e5f1e42ef0f390feacb5ed6acc4ee8
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-300%2Fdscho%2Fonbranch-and-early-config-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-300/dscho/onbranch-and-early-config-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/300

Range-diff vs v1:

 1:  4aa8834013 ! 1:  ea1a746113 config: work around bug with includeif:onbranch and early config
     @@ -1,4 +1,4 @@
     -Author: Johannes Schindelin <johasc@microsoft.com>
     +Author: Johannes Schindelin <johannes.schindelin@gmx.de>
      
          config: work around bug with includeif:onbranch and early config
      

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/1] config: work around bug with includeif:onbranch and early config
  2019-07-31 20:06 ` [PATCH v2 0/1] Make the includeif:onbranch feature more robust Johannes Schindelin via GitGitGadget
@ 2019-07-31 20:06   ` Johannes Schindelin via GitGitGadget
  2019-07-31 22:02     ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-07-31 20:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Since 07b2c0eacac (config: learn the "onbranch:" includeIf condition,
2019-06-05), there is a potential catch-22 in the early config path: if
the `include.onbranch:` feature is used, Git assumes that the Git
directory has been initialized already. However, in the early config
code path that is not true.

One way to trigger this is to call the following commands in any
repository:

	git config includeif.onbranch:refs/heads/master.path broken
	git help -a

The symptom triggered by the `git help -a` invocation reads like this:

BUG: refs.c:1851: attempting to get main_ref_store outside of repository

Let's work around this, simply by ignoring the `includeif.onbranch:`
setting when parsing the config when the ref store has not been
initialized (yet).

Technically, there is a way to solve this properly: teach the refs
machinery to initialize the ref_store from a given gitdir/commondir pair
(which we _do_ have in the early config code path), and then use that in
`include_by_branch()`. This, however, is a pretty involved project, and
we're already in the feature freeze for Git v2.23.0.

Note: when calling above-mentioned two commands _outside_ of any Git
worktree (passing the `--global` flag to `git config`, as there is
obviously no repository config available), at the point when
`include_by_branch()` is called, `the_repository` is `NULL`, therefore
we have to be extra careful not to dereference it in that case.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.c                | 3 ++-
 t/t1309-early-config.sh | 5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/config.c b/config.c
index ed7f58e0fc..3900e4947b 100644
--- a/config.c
+++ b/config.c
@@ -275,7 +275,8 @@ static int include_by_branch(const char *cond, size_t cond_len)
 	int flags;
 	int ret;
 	struct strbuf pattern = STRBUF_INIT;
-	const char *refname = resolve_ref_unsafe("HEAD", 0, NULL, &flags);
+	const char *refname = !the_repository || !the_repository->gitdir ?
+		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);
 	const char *shortname;
 
 	if (!refname || !(flags & REF_ISSYMREF)	||
diff --git a/t/t1309-early-config.sh b/t/t1309-early-config.sh
index 413642aa56..0c37e7180d 100755
--- a/t/t1309-early-config.sh
+++ b/t/t1309-early-config.sh
@@ -89,4 +89,9 @@ test_expect_failure 'ignore .git/ with invalid config' '
 	test_with_config "["
 '
 
+test_expect_success 'early config and onbranch' '
+	echo "[broken" >broken &&
+	test_with_config "[includeif \"onbranch:refs/heads/master\"]path=../broken"
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] config: work around bug with includeif:onbranch and early config
  2019-07-31 19:53 ` [PATCH 1/1] config: work around bug with includeif:onbranch and early config Johannes Schindelin via GitGitGadget
@ 2019-07-31 21:37   ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2019-07-31 21:37 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> Let's work around this, simply by ignoring the `includeif.onbranch:`
> setting when parsing the config when the ref store has not been
> initialized (yet).

Interesting chicken-and-egg problem.

> Technically, there is a way to solve this properly: teach the refs
> machinery to initialize the ref_store from a given gitdir/commondir pair
> (which we _do_ have in the early config code path), and then use that in
> `include_by_branch()`. This, however, is a pretty involved project, and
> we're already in the feature freeze for Git v2.23.0.

Postponing the proper fix to the next cycle (or later) is good.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] config: work around bug with includeif:onbranch and early config
  2019-07-31 20:06   ` [PATCH v2 1/1] config: work around bug with includeif:onbranch and early config Johannes Schindelin via GitGitGadget
@ 2019-07-31 22:02     ` Jeff King
  2019-07-31 22:13       ` Johannes Schindelin
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2019-07-31 22:02 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Junio C Hamano, Johannes Schindelin

On Wed, Jul 31, 2019 at 01:06:42PM -0700, Johannes Schindelin via GitGitGadget wrote:

> Since 07b2c0eacac (config: learn the "onbranch:" includeIf condition,
> 2019-06-05), there is a potential catch-22 in the early config path: if
> the `include.onbranch:` feature is used, Git assumes that the Git
> directory has been initialized already. However, in the early config
> code path that is not true.
> 
> One way to trigger this is to call the following commands in any
> repository:
> 
> 	git config includeif.onbranch:refs/heads/master.path broken
> 	git help -a
> 
> The symptom triggered by the `git help -a` invocation reads like this:
> 
> BUG: refs.c:1851: attempting to get main_ref_store outside of repository
> 
> Let's work around this, simply by ignoring the `includeif.onbranch:`
> setting when parsing the config when the ref store has not been
> initialized (yet).

I think this "fix the BUG() now, consider making more uses cases work
later" is the right thing to do, and it matches many of the other
"oops, we are looking at repository bits while not in a repo" fixes
we've done over the past few years.

> Technically, there is a way to solve this properly: teach the refs
> machinery to initialize the ref_store from a given gitdir/commondir pair
> (which we _do_ have in the early config code path), and then use that in
> `include_by_branch()`. This, however, is a pretty involved project, and
> we're already in the feature freeze for Git v2.23.0.

This gets tricky, because some commands are intentionally avoiding the
normal lookup procedure (e.g., clone or init, and probably things like
upload-pack that want to enter another repo). So I think it is OK as
long as the early-config code is explicitly saying "and please look at
the refs in this specific direectory now", and it doesn't affect other
possible code paths that might look at refs. I _think_ that's what
you're suggesting above, but I just want to make sure (not that it
matters either way for this patch).

As a workaround, I suspect in many cases it might work to simply do a
gentle setup (e.g., in "help"), but we simply weren't doing it before
because there was no use case. That obviously wouldn't work for
something like clone, though.

> diff --git a/config.c b/config.c
> index ed7f58e0fc..3900e4947b 100644
> --- a/config.c
> +++ b/config.c
> @@ -275,7 +275,8 @@ static int include_by_branch(const char *cond, size_t cond_len)
>  	int flags;
>  	int ret;
>  	struct strbuf pattern = STRBUF_INIT;
> -	const char *refname = resolve_ref_unsafe("HEAD", 0, NULL, &flags);
> +	const char *refname = !the_repository || !the_repository->gitdir ?
> +		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);

I think the_repository is always non-NULL. The way similar sites check
this is with "!startup_info->have_repository" or have_git_dir(). The
early-config code uses the latter, so we should probably match it here.

  Side note: I suspect there are some cleanup opportunities. IIRC, I had
  to add have_git_dir() to cover some cases where $GIT_DIR was set but
  we hadn't explicitly done a setup step, but there's been a lot of
  refactoring and cleanup in the initialization code since then. I'm not
  sure if it's still necessary.

> diff --git a/t/t1309-early-config.sh b/t/t1309-early-config.sh
> index 413642aa56..0c37e7180d 100755
> --- a/t/t1309-early-config.sh
> +++ b/t/t1309-early-config.sh
> @@ -89,4 +89,9 @@ test_expect_failure 'ignore .git/ with invalid config' '
>  	test_with_config "["
>  '
>  
> +test_expect_success 'early config and onbranch' '
> +	echo "[broken" >broken &&
> +	test_with_config "[includeif \"onbranch:refs/heads/master\"]path=../broken"
> +'

OK, so we know we didn't see the broken config because we'd have barfed
if we actually included it. Makes sense.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] config: work around bug with includeif:onbranch and early config
  2019-07-31 22:02     ` Jeff King
@ 2019-07-31 22:13       ` Johannes Schindelin
  2019-07-31 23:12         ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Schindelin @ 2019-07-31 22:13 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano

Hi Peff,

On Wed, 31 Jul 2019, Jeff King wrote:

> On Wed, Jul 31, 2019 at 01:06:42PM -0700, Johannes Schindelin via GitGitGadget wrote:
>
> > Technically, there is a way to solve this properly: teach the refs
> > machinery to initialize the ref_store from a given gitdir/commondir pair
> > (which we _do_ have in the early config code path), and then use that in
> > `include_by_branch()`. This, however, is a pretty involved project, and
> > we're already in the feature freeze for Git v2.23.0.
>
> This gets tricky, because some commands are intentionally avoiding the
> normal lookup procedure (e.g., clone or init, and probably things like
> upload-pack that want to enter another repo). So I think it is OK as
> long as the early-config code is explicitly saying "and please look at
> the refs in this specific direectory now", and it doesn't affect other
> possible code paths that might look at refs. I _think_ that's what
> you're suggesting above, but I just want to make sure (not that it
> matters either way for this patch).

I think we already have the `git clone` problem with
`includeif.gitdir:`. AFAICT we _will_ discover a Git directory when
cloning inside an existing Git worktree.

> As a workaround, I suspect in many cases it might work to simply do a
> gentle setup (e.g., in "help"), but we simply weren't doing it before
> because there was no use case. That obviously wouldn't work for
> something like clone, though.

Right.

And as you say, there was no use case, and I would even contend that
there still is no use case. In the cover letter, I tried to concoct
something (using a branch-dependent pager) that sounds _really_
far-fetched to even me.
>
> > diff --git a/config.c b/config.c
> > index ed7f58e0fc..3900e4947b 100644
> > --- a/config.c
> > +++ b/config.c
> > @@ -275,7 +275,8 @@ static int include_by_branch(const char *cond, size_t cond_len)
> >  	int flags;
> >  	int ret;
> >  	struct strbuf pattern = STRBUF_INIT;
> > -	const char *refname = resolve_ref_unsafe("HEAD", 0, NULL, &flags);
> > +	const char *refname = !the_repository || !the_repository->gitdir ?
> > +		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);
>
> I think the_repository is always non-NULL.

No, it totally can be `NULL`. I know because my first version of the
patch did not have that extra check, and `git help -a` would segfault
outside a Git worktree when I had an `includeif.onbranch:` in my
`~/.gitconfig`.

I tried to explain that in the commit message, but obviously did a poor
job:

    Note: when calling above-mentioned two commands _outside_ of any Git
    worktree (passing the `--global` flag to `git config`, as there is
    obviously no repository config available), at the point when
    `include_by_branch()` is called, `the_repository` is `NULL`, therefore
    we have to be extra careful not to dereference it in that case.

> The way similar sites check this is withV
> "!startup_info->have_repository" or have_git_dir(). The early-config
> code uses the latter, so we should probably match it here.

Quite frankly, I'd rather not. At this point, it is not important
whether or not we discovered a Git directory, but whether or not we have
populated a dereference'able `the_repository`. Those are two different
things.

>   Side note: I suspect there are some cleanup opportunities. IIRC, I had
>   to add have_git_dir() to cover some cases where $GIT_DIR was set but
>   we hadn't explicitly done a setup step, but there's been a lot of
>   refactoring and cleanup in the initialization code since then. I'm not
>   sure if it's still necessary.

Yeah, well, I am not necessarily certain that we always ask the right
questions, such as asking whether we found a startup repository when we
need, in fact, to know whether `the_repository->refs` would cause a
segmentation fault because we would dereference a `NULL` pointer ;-)
>
> > diff --git a/t/t1309-early-config.sh b/t/t1309-early-config.sh
> > index 413642aa56..0c37e7180d 100755
> > --- a/t/t1309-early-config.sh
> > +++ b/t/t1309-early-config.sh
> > @@ -89,4 +89,9 @@ test_expect_failure 'ignore .git/ with invalid config' '
> >  	test_with_config "["
> >  '
> >
> > +test_expect_success 'early config and onbranch' '
> > +	echo "[broken" >broken &&
> > +	test_with_config "[includeif \"onbranch:refs/heads/master\"]path=../broken"
> > +'
>
> OK, so we know we didn't see the broken config because we'd have barfed
> if we actually included it. Makes sense.

That was the idea :-)

Thanks for the review!
Dscho

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] config: work around bug with includeif:onbranch and early config
  2019-07-31 22:13       ` Johannes Schindelin
@ 2019-07-31 23:12         ` Jeff King
  2019-08-01  0:49           ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2019-07-31 23:12 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano

On Thu, Aug 01, 2019 at 12:13:19AM +0200, Johannes Schindelin wrote:

> > This gets tricky, because some commands are intentionally avoiding the
> > normal lookup procedure (e.g., clone or init, and probably things like
> > upload-pack that want to enter another repo). So I think it is OK as
> > long as the early-config code is explicitly saying "and please look at
> > the refs in this specific direectory now", and it doesn't affect other
> > possible code paths that might look at refs. I _think_ that's what
> > you're suggesting above, but I just want to make sure (not that it
> > matters either way for this patch).
> 
> I think we already have the `git clone` problem with
> `includeif.gitdir:`. AFAICT we _will_ discover a Git directory when
> cloning inside an existing Git worktree.

Yeah, I could well believe that. I think it's hard for the config code
to say what's the right think to do here. If I'm running "git clone"
from inside another repository, should I respect, say, an alias defined
in that repository's config? Probably. But should I find that alias
behind "includeif.gitdir"? I dunno. Maybe?

So I'm not 100% sure the current behavior is buggy. And mostly I'd be
happy to ignore it until somebody comes up with a compelling
(real-world) example either way.

> And as you say, there was no use case, and I would even contend that
> there still is no use case. In the cover letter, I tried to concoct
> something (using a branch-dependent pager) that sounds _really_
> far-fetched to even me.

Yeah. I'd be totally fine if we left it with your fix here and nobody
ever found time to work on this. :)

> > > -	const char *refname = resolve_ref_unsafe("HEAD", 0, NULL, &flags);
> > > +	const char *refname = !the_repository || !the_repository->gitdir ?
> > > +		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);
> >
> > I think the_repository is always non-NULL.
> 
> No, it totally can be `NULL`. I know because my first version of the
> patch did not have that extra check, and `git help -a` would segfault
> outside a Git worktree when I had an `includeif.onbranch:` in my
> `~/.gitconfig`.

Hrm. But common-main calls initialize_the_repository(), which points it
at &the_repo. And I can't find any other assignments. So how does it
become NULL? And is every caller of have_git_dir() at risk of
segfaulting?

Ah, I see. I think it is that trace2 reads the configuration very early.
I think we ought to do this:

diff --git a/common-main.c b/common-main.c
index 582a7b1886..89fd415e55 100644
--- a/common-main.c
+++ b/common-main.c
@@ -39,14 +39,14 @@ int main(int argc, const char **argv)
 
 	git_resolve_executable_dir(argv[0]);
 
+	initialize_the_repository();
+
 	trace2_initialize();
 	trace2_cmd_start(argv);
 	trace2_collect_process_info(TRACE2_PROCESS_INFO_STARTUP);
 
 	git_setup_gettext();
 
-	initialize_the_repository();
-
 	attr_start();
 
 	result = cmd_main(argc, argv);

or possibly even move the trace2 bits to the very end of that function.
The point of common-main is to do very basic setup. Doing tentative repo
discovery and config reading there at all is surprising to me, to say
the least. But I think we can at least make sure the library code is
initialized first.

> > The way similar sites check this is withV
> > "!startup_info->have_repository" or have_git_dir(). The early-config
> > code uses the latter, so we should probably match it here.
> 
> Quite frankly, I'd rather not. At this point, it is not important
> whether or not we discovered a Git directory, but whether or not we have
> populated a dereference'able `the_repository`. Those are two different
> things.

What I'm concerned about it is whether there are cases where
the_repository->gitdir is NULL, but we _could_ still look up refs. I.e.,
why is the rest of the config code using have_git_dir(), and why is this
code path special?

Again, I _think_ we might be able to get rid of have_git_dir() now. Back
when it was introduced get_git_dir() did lazy setup, and these days it
looks like it's just peeking at the_repository->gitdir. But it makes
sense to me for this fix to be consistent with the surrounding code, and
then to investigate have_git_dir() separately.

> >   Side note: I suspect there are some cleanup opportunities. IIRC, I had
> >   to add have_git_dir() to cover some cases where $GIT_DIR was set but
> >   we hadn't explicitly done a setup step, but there's been a lot of
> >   refactoring and cleanup in the initialization code since then. I'm not
> >   sure if it's still necessary.
> 
> Yeah, well, I am not necessarily certain that we always ask the right
> questions, such as asking whether we found a startup repository when we
> need, in fact, to know whether `the_repository->refs` would cause a
> segmentation fault because we would dereference a `NULL` pointer ;-)

If there are cases where startup_info->have_repository is non-zero but
we'd segfault, then I think that's a bug that is going to affect more
spots than this, and we need to investigate and fix. But I don't think
that is the case. We should only be setting it after calling
set_git_dir(), and poking at the current sites which set that leads me
to believe this is true.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] config: work around bug with includeif:onbranch and early config
  2019-07-31 23:12         ` Jeff King
@ 2019-08-01  0:49           ` Jeff King
  2019-08-01 17:24             ` Jeff Hostetler
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2019-08-01  0:49 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano

On Wed, Jul 31, 2019 at 07:12:57PM -0400, Jeff King wrote:

> Hrm. But common-main calls initialize_the_repository(), which points it
> at &the_repo. And I can't find any other assignments. So how does it
> become NULL? And is every caller of have_git_dir() at risk of
> segfaulting?
> 
> Ah, I see. I think it is that trace2 reads the configuration very early.
> I think we ought to do this:
> 
> diff --git a/common-main.c b/common-main.c
> index 582a7b1886..89fd415e55 100644
> --- a/common-main.c
> +++ b/common-main.c
> @@ -39,14 +39,14 @@ int main(int argc, const char **argv)
>  
>  	git_resolve_executable_dir(argv[0]);
>  
> +	initialize_the_repository();
> +
>  	trace2_initialize();
>  	trace2_cmd_start(argv);
>  	trace2_collect_process_info(TRACE2_PROCESS_INFO_STARTUP);
>  
>  	git_setup_gettext();
>  
> -	initialize_the_repository();
> -
>  	attr_start();
>  
>  	result = cmd_main(argc, argv);

By the way, I wondered why trace2's existing config reading did not
cause us to segfault because of this. It is because it invented the
"very early config" function which always ignores some config sources
(working around this problem, but also making it weirdly unlike most
other config).

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] config: work around bug with includeif:onbranch and early config
  2019-08-01  0:49           ` Jeff King
@ 2019-08-01 17:24             ` Jeff Hostetler
  2019-08-06 12:26               ` [PATCH 0/3] the_repository initialization cleanup Jeff King
  2019-08-06 12:56               ` [PATCH v2 1/1] config: work around bug with includeif:onbranch and early config Jeff King
  0 siblings, 2 replies; 17+ messages in thread
From: Jeff Hostetler @ 2019-08-01 17:24 UTC (permalink / raw)
  To: Jeff King, Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano



On 7/31/2019 8:49 PM, Jeff King wrote:
> On Wed, Jul 31, 2019 at 07:12:57PM -0400, Jeff King wrote:
> 
>> Hrm. But common-main calls initialize_the_repository(), which points it
>> at &the_repo. And I can't find any other assignments. So how does it
>> become NULL? And is every caller of have_git_dir() at risk of
>> segfaulting?
>>
>> Ah, I see. I think it is that trace2 reads the configuration very early.
>> I think we ought to do this:
>>
>> diff --git a/common-main.c b/common-main.c
>> index 582a7b1886..89fd415e55 100644
>> --- a/common-main.c
>> +++ b/common-main.c
>> @@ -39,14 +39,14 @@ int main(int argc, const char **argv)
>>   
>>   	git_resolve_executable_dir(argv[0]);
>>   
>> +	initialize_the_repository();
>> +
>>   	trace2_initialize();
>>   	trace2_cmd_start(argv);
>>   	trace2_collect_process_info(TRACE2_PROCESS_INFO_STARTUP);
>>   
>>   	git_setup_gettext();
>>   
>> -	initialize_the_repository();
>> -
>>   	attr_start();
>>   
>>   	result = cmd_main(argc, argv);
> 
> By the way, I wondered why trace2's existing config reading did not
> cause us to segfault because of this. It is because it invented the
> "very early config" function which always ignores some config sources
> (working around this problem, but also making it weirdly unlike most
> other config).

Yes, I added the "very early config" to try to work around some of
the chicken-n-egg problems.  I can't say that I was completely happy
with having to do that.  I haven't had time to play with your patch
suggestion here, but I think it would be fine to do if it will help
with the original problem.

In [1] I added code to just start the clock in isolation (rather than
being part of the trace2_initialize() -- which does all the config
loading and subsystem initialization).  So it is OK to let the
trace2_initialize() run a little later.  (Part of the reason for that
split was to allow git_resolve_executable_dir() to run first, since
that data was needed to find the location of the system config relative
to the exe path (sigh).)

[1] a089724958a trace2: refactor setting process starting time


So, as you suggested in your previous response, something like
this would/should be fine.

$ git diff
diff --git a/common-main.c b/common-main.c
index 582a7b1886..71e21dd20a 100644
--- a/common-main.c
+++ b/common-main.c
@@ -39,16 +39,16 @@ int main(int argc, const char **argv)

         git_resolve_executable_dir(argv[0]);

-       trace2_initialize();
-       trace2_cmd_start(argv);
-       trace2_collect_process_info(TRACE2_PROCESS_INFO_STARTUP);
-
         git_setup_gettext();

         initialize_the_repository();

         attr_start();

+       trace2_initialize();
+       trace2_cmd_start(argv);
+       trace2_collect_process_info(TRACE2_PROCESS_INFO_STARTUP);
+
         result = cmd_main(argc, argv);

         trace2_cmd_exit(result);


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/3] the_repository initialization cleanup
  2019-08-01 17:24             ` Jeff Hostetler
@ 2019-08-06 12:26               ` Jeff King
  2019-08-06 12:26                 ` [PATCH 1/3] t1309: use short branch name in includeIf.onbranch test Jeff King
                                   ` (2 more replies)
  2019-08-06 12:56               ` [PATCH v2 1/1] config: work around bug with includeif:onbranch and early config Jeff King
  1 sibling, 3 replies; 17+ messages in thread
From: Jeff King @ 2019-08-06 12:26 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Junio C Hamano

On Thu, Aug 01, 2019 at 01:24:17PM -0400, Jeff Hostetler wrote:

> > By the way, I wondered why trace2's existing config reading did not
> > cause us to segfault because of this. It is because it invented the
> > "very early config" function which always ignores some config sources
> > (working around this problem, but also making it weirdly unlike most
> > other config).
> 
> Yes, I added the "very early config" to try to work around some of
> the chicken-n-egg problems.  I can't say that I was completely happy
> with having to do that.  I haven't had time to play with your patch
> suggestion here, but I think it would be fine to do if it will help
> with the original problem.
> 
> In [1] I added code to just start the clock in isolation (rather than
> being part of the trace2_initialize() -- which does all the config
> loading and subsystem initialization).  So it is OK to let the
> trace2_initialize() run a little later.  (Part of the reason for that
> split was to allow git_resolve_executable_dir() to run first, since
> that data was needed to find the location of the system config relative
> to the exe path (sigh).)
> 
> [1] a089724958a trace2: refactor setting process starting time
> 
> So, as you suggested in your previous response, something like
> this would/should be fine.

Thanks, that pointer helped me write the commit message. So here are a
few cleanups that can go on top (of master now, since Dscho's patch
graduated). There's no rush to get these into v2.23.

Note that there _is_ still a funny corner case with the way the original
patch checks the_repository->git_dir, but I don't think it's worth
fixing at this point. I'll send a followup email with more details.

  [1/3]: t1309: use short branch name in includeIf.onbranch test
  [2/3]: common-main: delay trace2 initialization
  [3/3]: config: stop checking whether the_repository is NULL

 common-main.c           | 8 ++++----
 config.c                | 2 +-
 t/t1309-early-config.sh | 7 ++++++-
 3 files changed, 11 insertions(+), 6 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] t1309: use short branch name in includeIf.onbranch test
  2019-08-06 12:26               ` [PATCH 0/3] the_repository initialization cleanup Jeff King
@ 2019-08-06 12:26                 ` Jeff King
  2019-08-06 12:27                 ` [PATCH 2/3] common-main: delay trace2 initialization Jeff King
  2019-08-06 12:27                 ` [PATCH 3/3] config: stop checking whether the_repository is NULL Jeff King
  2 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2019-08-06 12:26 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Junio C Hamano

Commit 85fe0e800c (config: work around bug with includeif:onbranch and
early config, 2019-07-31) tests that our early config-reader does not
access the file mentioned by includeIf.onbranch:refs/heads/master.path.
But it would never do so even if the feature were implemented, since the
onbranch matching code uses the short refname "master".

The test still serves its purpose, since the bug fixed by 85fe0e800c is
actually that we hit a BUG() before even deciding whether to match the
ref. But let's use the correct name to avoid confusion (and which we'll
eventually want to trigger once we do the "real" fix described in that
commit).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1309-early-config.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1309-early-config.sh b/t/t1309-early-config.sh
index 0c37e7180d..eeb60e4143 100755
--- a/t/t1309-early-config.sh
+++ b/t/t1309-early-config.sh
@@ -91,7 +91,7 @@ test_expect_failure 'ignore .git/ with invalid config' '
 
 test_expect_success 'early config and onbranch' '
 	echo "[broken" >broken &&
-	test_with_config "[includeif \"onbranch:refs/heads/master\"]path=../broken"
+	test_with_config "[includeif \"onbranch:master\"]path=../broken"
 '
 
 test_done
-- 
2.23.0.rc1.436.g24d2e81391


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/3] common-main: delay trace2 initialization
  2019-08-06 12:26               ` [PATCH 0/3] the_repository initialization cleanup Jeff King
  2019-08-06 12:26                 ` [PATCH 1/3] t1309: use short branch name in includeIf.onbranch test Jeff King
@ 2019-08-06 12:27                 ` Jeff King
  2019-08-06 12:27                 ` [PATCH 3/3] config: stop checking whether the_repository is NULL Jeff King
  2 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2019-08-06 12:27 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Junio C Hamano

We initialize the trace2 system in the common main() function so that
all programs (even ones that aren't builtins) will enable tracing. But
trace2 startup is relatively heavy-weight, as we have to actually read
on-disk config to decide whether to trace. This can cause unexpected
interactions with other common-main initialization. For instance, we'll
end up in the config code before calling initialize_the_repository(),
and the usual invariant that the_repository is never NULL will not hold.

Let's push the trace2 initialization further down in common-main, to
just before we execute cmd_main(). The other parts of the initialization
are much more self-contained and less likely to call library code that
depends on those kinds of invariants.

Originally the trace2 code tried to start as early as possible to get
accurate timings. But the timer initialization was split out from the
config reading in a089724958 (trace2: refactor setting process starting
time, 2019-04-15), so there shouldn't be any impact from this patch.

Signed-off-by: Jeff King <peff@peff.net>
---
 common-main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/common-main.c b/common-main.c
index 582a7b1886..71e21dd20a 100644
--- a/common-main.c
+++ b/common-main.c
@@ -39,16 +39,16 @@ int main(int argc, const char **argv)
 
 	git_resolve_executable_dir(argv[0]);
 
-	trace2_initialize();
-	trace2_cmd_start(argv);
-	trace2_collect_process_info(TRACE2_PROCESS_INFO_STARTUP);
-
 	git_setup_gettext();
 
 	initialize_the_repository();
 
 	attr_start();
 
+	trace2_initialize();
+	trace2_cmd_start(argv);
+	trace2_collect_process_info(TRACE2_PROCESS_INFO_STARTUP);
+
 	result = cmd_main(argc, argv);
 
 	trace2_cmd_exit(result);
-- 
2.23.0.rc1.436.g24d2e81391


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 3/3] config: stop checking whether the_repository is NULL
  2019-08-06 12:26               ` [PATCH 0/3] the_repository initialization cleanup Jeff King
  2019-08-06 12:26                 ` [PATCH 1/3] t1309: use short branch name in includeIf.onbranch test Jeff King
  2019-08-06 12:27                 ` [PATCH 2/3] common-main: delay trace2 initialization Jeff King
@ 2019-08-06 12:27                 ` Jeff King
  2019-08-06 12:49                   ` Jeff King
  2 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2019-08-06 12:27 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Junio C Hamano

Since the previous commit, our invariant that the_repository is never
NULL is restored, and we can stop being defensive in include_by_branch().

We can confirm the fix by showing that an onbranch config include will
not cause a segfault when run outside a git repository. I've put this in
t1309-early-config since it's related to the case added by 85fe0e800c
(config: work around bug with includeif:onbranch and early config,
2019-07-31), though technically the issue was with
read_very_early_config() and not read_early_config().

Signed-off-by: Jeff King <peff@peff.net>
---
 config.c                | 2 +-
 t/t1309-early-config.sh | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/config.c b/config.c
index 3900e4947b..cc637363bb 100644
--- a/config.c
+++ b/config.c
@@ -275,7 +275,7 @@ static int include_by_branch(const char *cond, size_t cond_len)
 	int flags;
 	int ret;
 	struct strbuf pattern = STRBUF_INIT;
-	const char *refname = !the_repository || !the_repository->gitdir ?
+	const char *refname = !the_repository->gitdir ?
 		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);
 	const char *shortname;
 
diff --git a/t/t1309-early-config.sh b/t/t1309-early-config.sh
index eeb60e4143..3a0de0ddaa 100755
--- a/t/t1309-early-config.sh
+++ b/t/t1309-early-config.sh
@@ -94,4 +94,9 @@ test_expect_success 'early config and onbranch' '
 	test_with_config "[includeif \"onbranch:master\"]path=../broken"
 '
 
+test_expect_success 'onbranch config outside of git repo' '
+	test_config_global includeIf.onbranch:master.path non-existent &&
+	nongit git help
+'
+
 test_done
-- 
2.23.0.rc1.436.g24d2e81391

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] config: stop checking whether the_repository is NULL
  2019-08-06 12:27                 ` [PATCH 3/3] config: stop checking whether the_repository is NULL Jeff King
@ 2019-08-06 12:49                   ` Jeff King
  2019-08-08 19:48                     ` Johannes Schindelin
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2019-08-06 12:49 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeff Hostetler, Johannes Schindelin via GitGitGadget, git,
	Junio C Hamano

On Tue, Aug 06, 2019 at 08:27:58AM -0400, Jeff King wrote:

> diff --git a/config.c b/config.c
> index 3900e4947b..cc637363bb 100644
> --- a/config.c
> +++ b/config.c
> @@ -275,7 +275,7 @@ static int include_by_branch(const char *cond, size_t cond_len)
>  	int flags;
>  	int ret;
>  	struct strbuf pattern = STRBUF_INIT;
> -	const char *refname = !the_repository || !the_repository->gitdir ?
> +	const char *refname = !the_repository->gitdir ?
>  		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);

I stopped here, even though I argued earlier that this probably ought to
be have_git_dir() to match the rest of the config code. What I found
when I started poking at it is that there are a few odd cases still in
the startup code, and that IMHO the final form we want to end up with
actually matches what's written here (but I'm not quite ready to sink
the time into taking us there just yet). Read on for the gory details.

The most immediate issue I saw was that the_repository->gitdir doesn't
actually tell you whether or not we have a repository! I.e., that entry
may be non-NULL even though startup_info->have_repository is 0.

So these two tests fail:

diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh
index d20b4d150d..7ec91c4c1c 100755
--- a/t/t1305-config-include.sh
+++ b/t/t1305-config-include.sh
@@ -348,6 +348,18 @@ test_expect_success 'conditional include, onbranch, implicit /** for /' '
 	test_cmp expect actual
 '
 
+test_expect_success 'onbranch include not fooled by fake GIT_DIR' '
+	mkdir not-a-git-dir &&
+	echo "ref: refs/heads/master" >not-a-git-dir/HEAD &&
+	git config --file=inner the.value inner &&
+	git config --file=outer the.value outer &&
+	git config --file=outer includeIf.onbranch:master.path inner &&
+	GIT_DIR=not-a-git-dir \
+		git config --file=outer --includes the.value >actual &&
+	echo outer >expect &&
+	test_cmp expect actual
+'
+
 test_expect_success 'include cycles are detected' '
 	git init --bare cycle &&
 	git -C cycle config include.path cycle &&
diff --git a/t/t1309-early-config.sh b/t/t1309-early-config.sh
index 3a0de0ddaa..91af2c2f89 100755
--- a/t/t1309-early-config.sh
+++ b/t/t1309-early-config.sh
@@ -99,4 +99,12 @@ test_expect_success 'onbranch config outside of git repo' '
 	nongit git help
 '
 
+test_expect_success 'early config is not fooled by bogus GIT_DIR' '
+	mkdir not-a-git-dir &&
+	git config --file=not-a-git-dir/config \
+		alias.should-not-run \
+		"!echo should-not-run" &&
+	test_must_fail env GIT_DIR="not-a-git-dir" git should-not-run
+'
+
 test_done

As you can see from that case, it's a curiosity that we still set the
internal gitdir variable from $GIT_DIR, even if we found that it's not a
valid repo.

The onbranch case is fooled by our direct check of gitdir, but likewise
have_git_dir() allows either have_repository or a non-NULL gitdir (as I
mentioned earlier, I think this is vestigial at this point).

So doing this fixes the onbranch case:

diff --git a/config.c b/config.c
index cc637363bb..134637f3ad 100644
--- a/config.c
+++ b/config.c
@@ -275,7 +275,7 @@ static int include_by_branch(const char *cond, size_t cond_len)
 	int flags;
 	int ret;
 	struct strbuf pattern = STRBUF_INIT;
-	const char *refname = !the_repository->gitdir ?
+	const char *refname = !have_git_dir() ?
 		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);
 	const char *shortname;
 
diff --git a/environment.c b/environment.c
index 89af47cb85..08bdf79d6b 100644
--- a/environment.c
+++ b/environment.c
@@ -203,8 +203,7 @@ int is_bare_repository(void)
 
 int have_git_dir(void)
 {
-	return startup_info->have_repository
-		|| the_repository->gitdir;
+	return startup_info->have_repository;
 }
 
 const char *get_git_dir(void)


but curiously it _doesn't_ fix the alias one, because after seeing that
we have no configured git dir, we then fall back to discover_git_dir(),
which happily reports back the same bogus $GIT_DIR.

So I think there'd be some more cleanup required there. But the general
direction is that in the near future we probably ought to be checking
startup_info->have_repository and not the_repository->gitdir.

But I think in the long-term, we probably ought to be getting rid of
startup_info->have_repository itself. It's really just hiding a subtle
dependency on whether the_repository is valid, and we're better off
actually seeing all the spots that depend on the_repository (and
eventually converting them to take a "struct repository" as
appropriate).

Where we want to end up is (I think):

  - stop setting the_repository->gitdir when we know that there's no
    repo (and possibly clear $GIT_DIR as well to avoid confusion). That
    should make it a robust way to check whether a "struct repository"
    is valid.

    This is the step I've stalled on, because I'm a bit worried that
    there's some subtle case that depends on it working this way.

  - drop startup_info->have_repository entirely. At that point it would
    be redundant with checking the_repository->gitdir.

    If you do this without the first step, t4201 will notice and fail
    (because it expects GIT_DIR=not-a-repo to notice that there is no
    repo).

  - optionally, drop have_git_dir() in favor of just checking
    the_repository->gitdir. This loses a layer of abstraction, but I
    think it makes it much more clear where we'd depending on
    the_repository.

And so in the end, the current state of include_by_branch() is what we
want, and all the other call sites should change to match it. Whether it
should be changed in the interim (until we fix the discrepancy between
gitdir/have_repository), I don't feel strongly about. There are
user-visible cases that I believe we get wrong today, but as you can see
from the tests above, they're pretty ridiculous and unlikely in
practice.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] config: work around bug with includeif:onbranch and early config
  2019-08-01 17:24             ` Jeff Hostetler
  2019-08-06 12:26               ` [PATCH 0/3] the_repository initialization cleanup Jeff King
@ 2019-08-06 12:56               ` Jeff King
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff King @ 2019-08-06 12:56 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git,
	Junio C Hamano

On Thu, Aug 01, 2019 at 01:24:17PM -0400, Jeff Hostetler wrote:

> > By the way, I wondered why trace2's existing config reading did not
> > cause us to segfault because of this. It is because it invented the
> > "very early config" function which always ignores some config sources
> > (working around this problem, but also making it weirdly unlike most
> > other config).
> 
> Yes, I added the "very early config" to try to work around some of
> the chicken-n-egg problems.  I can't say that I was completely happy
> with having to do that.

I meant to comment a little further on this earlier. While I do think
it's unfortunate to have yet another set of special rules, I think what
you ended up with is probably the least-bad thing.  This trace2 config
read is happening so early (even after my movement patch) that it makes
me nervous to do anything at all complicated.

Not just for performance reasons, which you cited in the original
commit, but also for correctness and even security reasons.  This is
code we're running for every single binary before we even hit main(),
and it's probably a good thing that we are not loading .git/config
values from a potentially untrusted repository (e.g., upload-pack is
supposed to be safe to run in an untrusted repo).

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] config: stop checking whether the_repository is NULL
  2019-08-06 12:49                   ` Jeff King
@ 2019-08-08 19:48                     ` Johannes Schindelin
  0 siblings, 0 replies; 17+ messages in thread
From: Johannes Schindelin @ 2019-08-08 19:48 UTC (permalink / raw)
  To: Jeff King
  Cc: Jeff Hostetler, Johannes Schindelin via GitGitGadget, git,
	Junio C Hamano

Hi Peff,

ACK on the three patches, and I think your outline is valid (post
v2.23.0, possibly an Outreachy project?).

Thanks,
Dscho

P.S.: Sorry for top-posting, but I felt that I did not even respond to
anything you said, concretely, so...

On Tue, 6 Aug 2019, Jeff King wrote:

> On Tue, Aug 06, 2019 at 08:27:58AM -0400, Jeff King wrote:
>
> > diff --git a/config.c b/config.c
> > index 3900e4947b..cc637363bb 100644
> > --- a/config.c
> > +++ b/config.c
> > @@ -275,7 +275,7 @@ static int include_by_branch(const char *cond, size_t cond_len)
> >  	int flags;
> >  	int ret;
> >  	struct strbuf pattern = STRBUF_INIT;
> > -	const char *refname = !the_repository || !the_repository->gitdir ?
> > +	const char *refname = !the_repository->gitdir ?
> >  		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);
>
> I stopped here, even though I argued earlier that this probably ought to
> be have_git_dir() to match the rest of the config code. What I found
> when I started poking at it is that there are a few odd cases still in
> the startup code, and that IMHO the final form we want to end up with
> actually matches what's written here (but I'm not quite ready to sink
> the time into taking us there just yet). Read on for the gory details.
>
> The most immediate issue I saw was that the_repository->gitdir doesn't
> actually tell you whether or not we have a repository! I.e., that entry
> may be non-NULL even though startup_info->have_repository is 0.
>
> So these two tests fail:
>
> diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh
> index d20b4d150d..7ec91c4c1c 100755
> --- a/t/t1305-config-include.sh
> +++ b/t/t1305-config-include.sh
> @@ -348,6 +348,18 @@ test_expect_success 'conditional include, onbranch, implicit /** for /' '
>  	test_cmp expect actual
>  '
>
> +test_expect_success 'onbranch include not fooled by fake GIT_DIR' '
> +	mkdir not-a-git-dir &&
> +	echo "ref: refs/heads/master" >not-a-git-dir/HEAD &&
> +	git config --file=inner the.value inner &&
> +	git config --file=outer the.value outer &&
> +	git config --file=outer includeIf.onbranch:master.path inner &&
> +	GIT_DIR=not-a-git-dir \
> +		git config --file=outer --includes the.value >actual &&
> +	echo outer >expect &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'include cycles are detected' '
>  	git init --bare cycle &&
>  	git -C cycle config include.path cycle &&
> diff --git a/t/t1309-early-config.sh b/t/t1309-early-config.sh
> index 3a0de0ddaa..91af2c2f89 100755
> --- a/t/t1309-early-config.sh
> +++ b/t/t1309-early-config.sh
> @@ -99,4 +99,12 @@ test_expect_success 'onbranch config outside of git repo' '
>  	nongit git help
>  '
>
> +test_expect_success 'early config is not fooled by bogus GIT_DIR' '
> +	mkdir not-a-git-dir &&
> +	git config --file=not-a-git-dir/config \
> +		alias.should-not-run \
> +		"!echo should-not-run" &&
> +	test_must_fail env GIT_DIR="not-a-git-dir" git should-not-run
> +'
> +
>  test_done
>
> As you can see from that case, it's a curiosity that we still set the
> internal gitdir variable from $GIT_DIR, even if we found that it's not a
> valid repo.
>
> The onbranch case is fooled by our direct check of gitdir, but likewise
> have_git_dir() allows either have_repository or a non-NULL gitdir (as I
> mentioned earlier, I think this is vestigial at this point).
>
> So doing this fixes the onbranch case:
>
> diff --git a/config.c b/config.c
> index cc637363bb..134637f3ad 100644
> --- a/config.c
> +++ b/config.c
> @@ -275,7 +275,7 @@ static int include_by_branch(const char *cond, size_t cond_len)
>  	int flags;
>  	int ret;
>  	struct strbuf pattern = STRBUF_INIT;
> -	const char *refname = !the_repository->gitdir ?
> +	const char *refname = !have_git_dir() ?
>  		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);
>  	const char *shortname;
>
> diff --git a/environment.c b/environment.c
> index 89af47cb85..08bdf79d6b 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -203,8 +203,7 @@ int is_bare_repository(void)
>
>  int have_git_dir(void)
>  {
> -	return startup_info->have_repository
> -		|| the_repository->gitdir;
> +	return startup_info->have_repository;
>  }
>
>  const char *get_git_dir(void)
>
>
> but curiously it _doesn't_ fix the alias one, because after seeing that
> we have no configured git dir, we then fall back to discover_git_dir(),
> which happily reports back the same bogus $GIT_DIR.
>
> So I think there'd be some more cleanup required there. But the general
> direction is that in the near future we probably ought to be checking
> startup_info->have_repository and not the_repository->gitdir.
>
> But I think in the long-term, we probably ought to be getting rid of
> startup_info->have_repository itself. It's really just hiding a subtle
> dependency on whether the_repository is valid, and we're better off
> actually seeing all the spots that depend on the_repository (and
> eventually converting them to take a "struct repository" as
> appropriate).
>
> Where we want to end up is (I think):
>
>   - stop setting the_repository->gitdir when we know that there's no
>     repo (and possibly clear $GIT_DIR as well to avoid confusion). That
>     should make it a robust way to check whether a "struct repository"
>     is valid.
>
>     This is the step I've stalled on, because I'm a bit worried that
>     there's some subtle case that depends on it working this way.
>
>   - drop startup_info->have_repository entirely. At that point it would
>     be redundant with checking the_repository->gitdir.
>
>     If you do this without the first step, t4201 will notice and fail
>     (because it expects GIT_DIR=not-a-repo to notice that there is no
>     repo).
>
>   - optionally, drop have_git_dir() in favor of just checking
>     the_repository->gitdir. This loses a layer of abstraction, but I
>     think it makes it much more clear where we'd depending on
>     the_repository.
>
> And so in the end, the current state of include_by_branch() is what we
> want, and all the other call sites should change to match it. Whether it
> should be changed in the interim (until we fix the discrepancy between
> gitdir/have_repository), I don't feel strongly about. There are
> user-visible cases that I believe we get wrong today, but as you can see
> from the tests above, they're pretty ridiculous and unlikely in
> practice.
>
> -Peff
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, back to index

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-31 19:53 [PATCH 0/1] Make the includeif:onbranch feature more robust Johannes Schindelin via GitGitGadget
2019-07-31 19:53 ` [PATCH 1/1] config: work around bug with includeif:onbranch and early config Johannes Schindelin via GitGitGadget
2019-07-31 21:37   ` Junio C Hamano
2019-07-31 20:06 ` [PATCH v2 0/1] Make the includeif:onbranch feature more robust Johannes Schindelin via GitGitGadget
2019-07-31 20:06   ` [PATCH v2 1/1] config: work around bug with includeif:onbranch and early config Johannes Schindelin via GitGitGadget
2019-07-31 22:02     ` Jeff King
2019-07-31 22:13       ` Johannes Schindelin
2019-07-31 23:12         ` Jeff King
2019-08-01  0:49           ` Jeff King
2019-08-01 17:24             ` Jeff Hostetler
2019-08-06 12:26               ` [PATCH 0/3] the_repository initialization cleanup Jeff King
2019-08-06 12:26                 ` [PATCH 1/3] t1309: use short branch name in includeIf.onbranch test Jeff King
2019-08-06 12:27                 ` [PATCH 2/3] common-main: delay trace2 initialization Jeff King
2019-08-06 12:27                 ` [PATCH 3/3] config: stop checking whether the_repository is NULL Jeff King
2019-08-06 12:49                   ` Jeff King
2019-08-08 19:48                     ` Johannes Schindelin
2019-08-06 12:56               ` [PATCH v2 1/1] config: work around bug with includeif:onbranch and early config Jeff King

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git