git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git rebase regression: cannot pass a shell expression directly to --exec
@ 2017-05-15 18:08 Eric Rannaud
  2017-05-16  3:25 ` Jeff King
  2017-05-16 10:23 ` Johannes Schindelin
  0 siblings, 2 replies; 29+ messages in thread
From: Eric Rannaud @ 2017-05-15 18:08 UTC (permalink / raw)
  To: git, Johannes Schindelin; +Cc: Jeremy Serror

Hi all,

It used to be possible to run a sequence like:

  foo() { echo X; }
  export -f foo
  git rebase --exec foo HEAD~10

Since upgrading to 2.13.0, I had to update my scripts to run:

  git rebase --exec "bash -c foo" HEAD~10

I'm not sure if this was an intended change. Bisecting with the
following script:

  #!/usr/bin/env bash

  make -j8 || exit 3

  function foo() {
          echo OK
  }
  export -f foo

  pushd tmp
  ../git --exec-path=.. rebase --exec foo HEAD^^
  ret=$?
  # Cleanup if failure
  ../git --exec-path=.. rebase --abort &> /dev/null
  popd
  exit $ret

It points to this commit:

commit 18633e1a22a68bbe8e6311a1039d13ebbf6fd041 (refs/bisect/bad)
Author: Johannes Schindelin <johannes.schindelin@gmx.de>
Date:   Thu Feb 9 23:23:11 2017 +0100

    rebase -i: use the rebase--helper builtin

    Now that the sequencer learned to process a "normal" interactive rebase,
    we use it. The original shell script is still used for "non-normal"
    interactive rebases, i.e. when --root or --preserve-merges was passed.

    Please note that the --root option (via the $squash_onto variable) needs
    special handling only for the very first command, hence it is still okay
    to use the helper upon continue/skip.

    Also please note that the --no-ff setting is volatile, i.e. when the
    interactive rebase is interrupted at any stage, there is no record of
    it. Therefore, we have to pass it from the shell script to the
    rebase--helper.

    Note: the test t3404 had to be adjusted because the the error messages
    produced by the sequencer comply with our current convention to start with
    a lower-case letter.

    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>


Thanks,
Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-15 18:08 git rebase regression: cannot pass a shell expression directly to --exec Eric Rannaud
@ 2017-05-16  3:25 ` Jeff King
  2017-05-16  3:37   ` Jeff King
  2017-05-16  3:40   ` Junio C Hamano
  2017-05-16 10:23 ` Johannes Schindelin
  1 sibling, 2 replies; 29+ messages in thread
From: Jeff King @ 2017-05-16  3:25 UTC (permalink / raw)
  To: Eric Rannaud; +Cc: git, Johannes Schindelin, Jeremy Serror

On Mon, May 15, 2017 at 11:08:39AM -0700, Eric Rannaud wrote:

> It used to be possible to run a sequence like:
> 
>   foo() { echo X; }
>   export -f foo
>   git rebase --exec foo HEAD~10
> 
> Since upgrading to 2.13.0, I had to update my scripts to run:
> 
>   git rebase --exec "bash -c foo" HEAD~10

Interesting. The "exec" string is still run with a shell. E.g.:

  $ git rebase --exec 'for i in 1 2 3; do echo >&2 $i; done' HEAD~5
  Executing: for i in 1 2 3; do echo >&2 $i; done
  1
  2
  3
  Executing: for i in 1 2 3; do echo >&2 $i; done
  1
  2
  3
  [...and so on...]

I wonder if this is falling afoul of the optimization in run-command's
prepare_shell_cmd() to skip shell invocations for "simple" commands.

E.g., if I do:

   strace -fe execve git rebase -x foo HEAD^ 2>&1 | grep foo

I see:

   ...
   Executing: foo
   [pid 21820] execve("foo", ["foo"], [/* 57 vars */]) = -1 ENOENT (No such file or directory)
   fatal: cannot run foo: No such file or directory

But if I were to add a shell meta-character, like this:

   strace -fe execve git rebase -x 'foo;' HEAD^ 2>&1 | grep foo

then I see:

  ...
  Executing: foo;
  [pid 22569] execve("/bin/sh", ["/bin/sh", "-c", "foo;", "foo;"], [/* 57 vars */]) = 0
  foo;: 1: foo;: foo: not found

So I suspect if you added an extraneous semi-colon, your case would work
again (and that would confirm for us that this is indeed the problem).

> I'm not sure if this was an intended change. Bisecting with the
> following script:
> [...]
> It points to this commit:
>
> commit 18633e1a22a68bbe8e6311a1039d13ebbf6fd041 (refs/bisect/bad)
> Author: Johannes Schindelin <johannes.schindelin@gmx.de>
> Date:   Thu Feb 9 23:23:11 2017 +0100
>
>     rebase -i: use the rebase--helper builtin

The optimization in run-command is very old, but the switch to
rebase--helper presumably moved us from doing that exec in the actual
shell script to doing it via the C code.

Which means your exported-function technique has been broken for _most_
of Git all along, but it just now affected this particular spot.

I'm not sure how to feel about it. In the face of exported functions, we
can never do the shell-skipping optimization, because we don't know how
the shell is going to interpret even a simple-looking command. And it is
kind of a neat trick. But I also hate to add extra useless shell
invocations for the predominantly common case that people aren't using
this trick (or aren't even using a shell that supports function
exports).

One hack would be to look for BASH_FUNC_* in the environment and disable
the optimization in that case. I think that would make your case Just
Work. It doesn't help other oddball cases, like:

  - you're trying to run a shell builtin that behaves differently than
    its exec-able counterpart

  - your shell has some other mechanism for defining commands that we
    would not find via exec. I don't know of one offhand. Obviously $ENV
    could point to a file which defines some, but for most shells would
    not read any startup files for a non-interactive "sh -c" invocation.

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16  3:25 ` Jeff King
@ 2017-05-16  3:37   ` Jeff King
  2017-05-16 16:41     ` Jonathan Nieder
  2017-05-16  3:40   ` Junio C Hamano
  1 sibling, 1 reply; 29+ messages in thread
From: Jeff King @ 2017-05-16  3:37 UTC (permalink / raw)
  To: Eric Rannaud; +Cc: git, Johannes Schindelin, Jeremy Serror

On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote:

> One hack would be to look for BASH_FUNC_* in the environment and disable
> the optimization in that case. I think that would make your case Just
> Work. It doesn't help other oddball cases, like:
> 
>   - you're trying to run a shell builtin that behaves differently than
>     its exec-able counterpart
> 
>   - your shell has some other mechanism for defining commands that we
>     would not find via exec. I don't know of one offhand. Obviously $ENV
>     could point to a file which defines some, but for most shells would
>     not read any startup files for a non-interactive "sh -c" invocation.

So I was thinking something like the patch below, though I guess
technically you could look for BASH_FUNC_$argv[0]%%, which seems to be
bash's magic variable name. I hate to get too intimate with those
details, though.

Another option is to speculatively run "foo" without the shell, and if
execve fails to find it, then fall back to running the shell. That would
catch any number of cases where the shell "somehow" finds a command that
we can't.

You'd still have confusing behavior if your shell builtin behaved
differently than the exec-able version, though (because we'd quietly use
the exec-able one), but I would imagine that's exceedingly rare.

I dunno. Maybe the whole thing is a fool's errand.

diff --git a/run-command.c b/run-command.c
index 1c02bfb2e..8328d27fb 100644
--- a/run-command.c
+++ b/run-command.c
@@ -240,12 +240,24 @@ int sane_execvp(const char *file, char * const argv[])
 	return -1;
 }
 
+static int env_has_bash_functions(void)
+{
+	extern char **environ;
+	char **key;
+
+	for (key = environ; *key; key++)
+		if (starts_with(*key, "BASH_FUNC_"))
+			return 1;
+	return 0;
+}
+
 static const char **prepare_shell_cmd(struct argv_array *out, const char **argv)
 {
 	if (!argv[0])
 		die("BUG: shell command is empty");
 
-	if (strcspn(argv[0], "|&;<>()$`\\\"' \t\n*?[#~=%") != strlen(argv[0])) {
+	if (strcspn(argv[0], "|&;<>()$`\\\"' \t\n*?[#~=%") != strlen(argv[0]) ||
+	    env_has_bash_functions()) {
 #ifndef GIT_WINDOWS_NATIVE
 		argv_array_push(out, SHELL_PATH);
 #else

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16  3:25 ` Jeff King
  2017-05-16  3:37   ` Jeff King
@ 2017-05-16  3:40   ` Junio C Hamano
  2017-05-16  3:53     ` Jeff King
  2017-05-16 10:46     ` Johannes Schindelin
  1 sibling, 2 replies; 29+ messages in thread
From: Junio C Hamano @ 2017-05-16  3:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Rannaud, git, Johannes Schindelin, Jeremy Serror

Jeff King <peff@peff.net> writes:

> Interesting. The "exec" string is still run with a shell. E.g.:
> ...
> I wonder if this is falling afoul of the optimization in run-command's
> prepare_shell_cmd() to skip shell invocations for "simple" commands.
> ...
> So I suspect if you added an extraneous semi-colon, your case would work
> again (and that would confirm for us that this is indeed the problem).

Wow, that's a brilliant analysis.

> The optimization in run-command is very old, but the switch to
> rebase--helper presumably moved us from doing that exec in the actual
> shell script to doing it via the C code.
>
> Which means your exported-function technique has been broken for _most_
> of Git all along, but it just now affected this particular spot.
>
> I'm not sure how to feel about it. In the face of exported functions, we
> can never do the shell-skipping optimization, because we don't know how
> the shell is going to interpret even a simple-looking command. And it is
> kind of a neat trick. But I also hate to add extra useless shell
> invocations for the predominantly common case that people aren't using
> this trick (or aren't even using a shell that supports function
> exports).

I was about to write this off as "an unfortunate regression, a
fallout that will likely left unfixed, due to lack of a good
practical workaround."

The point of rewriting things in C and using run_command() interface
was to avoid shell overhead.  We are doing an exec already, but
adding a shell invocation in the middle will double the number of
exec (and probably add an extra fork as well), which probably is
measurable on systems with slow fork/exec.

The "semicolon" trick is way too obscure, but perhaps can be
documented as an escape hatch?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16  3:40   ` Junio C Hamano
@ 2017-05-16  3:53     ` Jeff King
  2017-05-16  4:08       ` Jeff King
  2017-05-16 16:45       ` Eric Rannaud
  2017-05-16 10:46     ` Johannes Schindelin
  1 sibling, 2 replies; 29+ messages in thread
From: Jeff King @ 2017-05-16  3:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Rannaud, git, Johannes Schindelin, Jeremy Serror

On Tue, May 16, 2017 at 12:40:51PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Interesting. The "exec" string is still run with a shell. E.g.:
> > ...
> > I wonder if this is falling afoul of the optimization in run-command's
> > prepare_shell_cmd() to skip shell invocations for "simple" commands.
> > ...
> > So I suspect if you added an extraneous semi-colon, your case would work
> > again (and that would confirm for us that this is indeed the problem).
> 
> Wow, that's a brilliant analysis.

If it's right. :) It's all theory at this point.

My /bin/sh isn't bash, but I should be able to build with
SHELL_PATH=/bin/bash to reproduce. But I can't:

   $ bash
   $ foo() { echo >&2 "in foo"; }
   $ export -f foo
   $ bash -c foo
   in foo

   $ strace -f 2>&1 git.compile rebase -x 'foo;' HEAD^ | grep foo
   Executing: foo;
   [pid  1788] execve("/bin/bash", ["/bin/bash", "-c", "foo;", "foo;"], [/* 60 vars */] <unfinished ...>
   foo;: foo: command not found

So I'm not sure why the direct "bash -c" can find it, but somehow the
variable doesn't make it through to the "bash -c" at the lower level.
Replacing "foo;" with "env" shows the environment, but BASH_FUNC_foo
isn't set in it. I'm not sure where it's getting eaten, though.

> The "semicolon" trick is way too obscure, but perhaps can be
> documented as an escape hatch?

Yeah, I agree this should be documented if it can't be fixed. I wasn't
sure if we were giving up just yet.

Either way, though, it wouldn't hurt to mention optimizing out "maybe
shell" optimization, because it can occasionally produce user-visible
effects. Where would be a good place? In git(1), I guess?

It's almost something that could go in gitcli(7), but it's not really
about the CLI in particular. In most cases the shell-exec'd commands are
from config, but not always (as this case shows). So git-config(1)
probably isn't the right place.

AFAIK, we don't talk about this behavior at all in the existing
documentation. And I really don't know where we'd put it that somebody
would find it without having to read the documentation exhaustively.

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16  3:53     ` Jeff King
@ 2017-05-16  4:08       ` Jeff King
  2017-05-16 16:45       ` Eric Rannaud
  1 sibling, 0 replies; 29+ messages in thread
From: Jeff King @ 2017-05-16  4:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Rannaud, git, Johannes Schindelin, Jeremy Serror

On Mon, May 15, 2017 at 11:53:57PM -0400, Jeff King wrote:

> My /bin/sh isn't bash, but I should be able to build with
> SHELL_PATH=/bin/bash to reproduce. But I can't:
> 
>    $ bash
>    $ foo() { echo >&2 "in foo"; }
>    $ export -f foo
>    $ bash -c foo
>    in foo
> 
>    $ strace -f 2>&1 git.compile rebase -x 'foo;' HEAD^ | grep foo
>    Executing: foo;
>    [pid  1788] execve("/bin/bash", ["/bin/bash", "-c", "foo;", "foo;"], [/* 60 vars */] <unfinished ...>
>    foo;: foo: command not found
> 
> So I'm not sure why the direct "bash -c" can find it, but somehow the
> variable doesn't make it through to the "bash -c" at the lower level.
> Replacing "foo;" with "env" shows the environment, but BASH_FUNC_foo
> isn't set in it. I'm not sure where it's getting eaten, though.

Hmph. It looks like "dash" eats it. My "git.compile" above is a symlink
to /path/to/git/bin-wrappers/git. But it looks like our Makefile isn't
smart enough to rebuild bin-wrappers when you switch SHELL_PATH, so it
will had "/bin/sh" in it. Which points to dash on my machine. And
indeed, dash seems to wipe the environment:

  $ foo() { echo >&2 "in foo"; }
  $ export -f foo
  $ bash -c foo
  in foo
  $ dash -c "bash -c foo"
  bash: foo: command not found

I wonder if it's related to ShellShock protections, or if dash just
rejects variable names with the "%" in them or something. Anyway. That
explains why I had trouble reproducing. Using bash consistently as my
shell lets me reproduce Eric's results. And doing the semicolon trick
does make it work again.

So I feel confident that I've analyzed the problem correctly, at least.

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-15 18:08 git rebase regression: cannot pass a shell expression directly to --exec Eric Rannaud
  2017-05-16  3:25 ` Jeff King
@ 2017-05-16 10:23 ` Johannes Schindelin
  2017-05-16 16:18   ` Jeff King
  1 sibling, 1 reply; 29+ messages in thread
From: Johannes Schindelin @ 2017-05-16 10:23 UTC (permalink / raw)
  To: Eric Rannaud; +Cc: git, Jeremy Serror

Hi Eric,

On Mon, 15 May 2017, Eric Rannaud wrote:

> It used to be possible to run a sequence like:
> 
>   foo() { echo X; }
>   export -f foo
>   git rebase --exec foo HEAD~10

It would appear to me that you used a side effect of an implementation
detail: that `git rebase -i` was implemented entirely as a shell script.

In my mind, this was a fragile assumption, and that is why...

> commit 18633e1a22a68bbe8e6311a1039d13ebbf6fd041 (refs/bisect/bad)
> Author: Johannes Schindelin <johannes.schindelin@gmx.de>
> Date:   Thu Feb 9 23:23:11 2017 +0100
> 
>     rebase -i: use the rebase--helper builtin

... this commit, which is a long-overdue start of cleaning up our act,
broke your code's assumption.

I am afraid that your code placed too much of a dependency on an
implementation detail that changed.

In short: I think that your fix to your script is correct.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16  3:40   ` Junio C Hamano
  2017-05-16  3:53     ` Jeff King
@ 2017-05-16 10:46     ` Johannes Schindelin
  1 sibling, 0 replies; 29+ messages in thread
From: Johannes Schindelin @ 2017-05-16 10:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Eric Rannaud, git, Jeremy Serror

Hi Junio,

On Tue, 16 May 2017, Junio C Hamano wrote:

> The point of rewriting things in C and using run_command() interface was
> to avoid shell overhead.

That statement is missing the point.

It is true that converting shell scripts to proper builtins avoids the
shell overhead.

It is even more true that that results in a nice speed-up that can even be
felt on Linux (I feel I have to stress this, as you seem to be mostly
focused on Linux usage, and I do not want you to get the impression that
you are only doing "the Windows folks" a favor when you gracefully accept
the patches to convert scripts to proper builtins).

But the real truth is: shell scripting is not portable.

Shell scripting is never only shell scripting, of course. A quite
undocumented set of utilities is expected to be present for our scripts to
run, too: sed, awk, tr, cat, expr, just to name a few.

It does not end there. For example, sed is not equal to sed. BSD sed has
different semantics than GNU sed, and we jump through hoops to try to
ensure that our shell scripts run with both versions. Which must make many
a contributor feel a lot less positive about their contributions e.g. when
a reviewer points out that their \t in their sed statement would break Git
on MacOSX. Sad!

We place a burden on maintainers targeting platforms that are not Linux.
If I remember correctly, we need to override the default shell on one
Unix, even, because we simply failed to make our shell scripts run with
it.

And then, of course, there are all the problems inflicted on users on that
most prevalent Operating System of them all: Windows. Yes, I know, you
"non-Windows folk" would like to happily ignore that little tiny problem,
but that's really doing Git a disservice. So stop doing it.

What hurts me most about this is that Subversion, a software that us "Git
folk" pride ourselves in making obsolete, never had those problems. It was
designed with an eye toward portability from the start.

It is a good thing to support users who want to automate their workflows
using scripts, of course. Oh, and please, do not forget to remember that
there are tons of scripting languages out there, e.g. AppleScript,
Javascript, Tcl, Python, Ruby; just because you prefer Bash does not mean
the majority of developers do.

So let's continue to broaden our scripting support. And at the same time,
let's also change our mindset so that we can reduce the portability
problems of Git.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 10:23 ` Johannes Schindelin
@ 2017-05-16 16:18   ` Jeff King
  2017-05-16 16:59     ` Eric Rannaud
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff King @ 2017-05-16 16:18 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Eric Rannaud, git, Jeremy Serror

On Tue, May 16, 2017 at 12:23:02PM +0200, Johannes Schindelin wrote:

> On Mon, 15 May 2017, Eric Rannaud wrote:
> 
> > It used to be possible to run a sequence like:
> > 
> >   foo() { echo X; }
> >   export -f foo
> >   git rebase --exec foo HEAD~10
> 
> It would appear to me that you used a side effect of an implementation
> detail: that `git rebase -i` was implemented entirely as a shell script.

I don't think that's true at all. He expected the user-provided "--exec"
command to be run by a shell, which seems like a reasonable thing for
Git to promise (and we already make a similar promise for most
user-provided commands that we run).  What happens in between, be it
shell or C code, doesn't matter, and the conversion away from a shell
script in this case only tickled an existing bad interaction between
"export -f" and Git's run-command code.

See my other replies for the full story.

I don't think this has anything in particular to do with git-rebase,
though. Our solutions are either:

  - declare "export -f" as too tricky for our optimization, and teach
    people about the ";" trick

  - figure out some workaround/fallback to disable the shell-skipping
    optimization in this case

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16  3:37   ` Jeff King
@ 2017-05-16 16:41     ` Jonathan Nieder
  2017-05-16 16:47       ` Jeff King
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Nieder @ 2017-05-16 16:41 UTC (permalink / raw)
  To: Jeff King
  Cc: Eric Rannaud, git, Johannes Schindelin, Jeremy Serror,
	Brandon Williams

(+Brandon Williams, who may have more context for execvp-related things)
Hi,

Jeff King wrote:
> On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote:

>> One hack would be to look for BASH_FUNC_* in the environment and disable
>> the optimization in that case. I think that would make your case Just
>> Work. It doesn't help other oddball cases, like:
>>
>>   - you're trying to run a shell builtin that behaves differently than
>>     its exec-able counterpart
>>
>>   - your shell has some other mechanism for defining commands that we
>>     would not find via exec. I don't know of one offhand. Obviously $ENV
>>     could point to a file which defines some, but for most shells would
>>     not read any startup files for a non-interactive "sh -c" invocation.
>
> So I was thinking something like the patch below, though I guess
> technically you could look for BASH_FUNC_$argv[0]%%, which seems to be
> bash's magic variable name. I hate to get too intimate with those
> details, though.
>
> Another option is to speculatively run "foo" without the shell, and if
> execve fails to find it, then fall back to running the shell. That would
> catch any number of cases where the shell "somehow" finds a command that
> we can't.

Hm.  execvp explicitly does this when it hits ENOEXEC, but not for
ENOENT.  Do you know why that is?

I think we want to behave consistently for shell builtins and for
exported functions --- they are different sides of the same problem,
and behaving differently between the two feels confusing.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16  3:53     ` Jeff King
  2017-05-16  4:08       ` Jeff King
@ 2017-05-16 16:45       ` Eric Rannaud
  1 sibling, 0 replies; 29+ messages in thread
From: Eric Rannaud @ 2017-05-16 16:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Johannes Schindelin, Jeremy Serror

On Mon, May 15, 2017 at 8:53 PM, Jeff King <peff@peff.net> wrote:
>> > So I suspect if you added an extraneous semi-colon, your case would work
>> > again (and that would confirm for us that this is indeed the problem).
>>
>> Wow, that's a brilliant analysis.
>
> If it's right. :) It's all theory at this point.
>
> My /bin/sh isn't bash, but I should be able to build with
> SHELL_PATH=/bin/bash to reproduce. But I can't:

Just to clarify if there was any doubt, the semicolon trick does
indeed work fine when bash is your shell.

It would be nice if it were consistent for all git commands that take
a <cmd> argument:

  foo() { echo foo; }
  export -f foo
  git bisect start master master^^
  git bisect run foo  # works
  git bisect run 'foo;'  # doesn't work
running foo;
/usr/lib/git-core/git-bisect: line 493: foo;: command not found

Also, what's a "simple command", exactly?

  foo() { echo foo; }
  export -f foo
  git rebase --exec foo master^^  # fails
  git rebase --exec 'foo;' master^^  # OK
  git rebase --exec 'foo 1' master^^  # OK

Not sure if this can be made easy to understand in the manpage.

Thanks,
Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 16:41     ` Jonathan Nieder
@ 2017-05-16 16:47       ` Jeff King
  2017-05-16 17:11         ` Eric Rannaud
                           ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Jeff King @ 2017-05-16 16:47 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Eric Rannaud, git, Johannes Schindelin, Jeremy Serror,
	Brandon Williams

On Tue, May 16, 2017 at 09:41:24AM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> > On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote:
> 
> >> One hack would be to look for BASH_FUNC_* in the environment and disable
> >> the optimization in that case. I think that would make your case Just
> >> Work. It doesn't help other oddball cases, like:
> >>
> >>   - you're trying to run a shell builtin that behaves differently than
> >>     its exec-able counterpart
> >>
> >>   - your shell has some other mechanism for defining commands that we
> >>     would not find via exec. I don't know of one offhand. Obviously $ENV
> >>     could point to a file which defines some, but for most shells would
> >>     not read any startup files for a non-interactive "sh -c" invocation.
> >
> > So I was thinking something like the patch below, though I guess
> > technically you could look for BASH_FUNC_$argv[0]%%, which seems to be
> > bash's magic variable name. I hate to get too intimate with those
> > details, though.
> >
> > Another option is to speculatively run "foo" without the shell, and if
> > execve fails to find it, then fall back to running the shell. That would
> > catch any number of cases where the shell "somehow" finds a command that
> > we can't.
> 
> Hm.  execvp explicitly does this when it hits ENOEXEC, but not for
> ENOENT.  Do you know why that is?

When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo".
It is actually running "sh foo" to run the script contents. So it's
about letting you do:

  echo "no shebang line" >script
  chmod +x script
  ./script

And nothing to do with shell builtins.

> I think we want to behave consistently for shell builtins and for
> exported functions --- they are different sides of the same problem,
> and behaving differently between the two feels confusing.

Yes, I think ideally we'd handle it all transparently. Although I'd also
point out that Git the behavior under discussion dates back to 2009 and
this is (as far as I can recall) the first report. So documenting the
current behavior might not be that bad an option.

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 16:18   ` Jeff King
@ 2017-05-16 16:59     ` Eric Rannaud
  2017-05-16 17:14       ` Kevin Daudt
  2017-05-16 17:21       ` Eric Rannaud
  0 siblings, 2 replies; 29+ messages in thread
From: Eric Rannaud @ 2017-05-16 16:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git, Jeremy Serror

On Tue, May 16, 2017 at 9:18 AM, Jeff King <peff@peff.net> wrote:
> On Tue, May 16, 2017 at 12:23:02PM +0200, Johannes Schindelin wrote:
>> It would appear to me that you used a side effect of an implementation
>> detail: that `git rebase -i` was implemented entirely as a shell script.
>
> I don't think that's true at all. He expected the user-provided "--exec"
> command to be run by a shell, which seems like a reasonable thing for
> Git to promise (and we already make a similar promise for most
> user-provided commands that we run).  What happens in between, be it

As a "user", my expectation was simply that the command would be run
not just in "a shell", but in *my* shell (or the shell that calls git,
maybe). So I don't see any portability question with respect to Git.
My script that uses git rebase --exec may not be portable, but that's
my problem.

When I use "git rebase --exec <cmd>" I'm basically writing a "foreach
commit in range { <cmd> }" in my shell. Same idea with git bisect run.

A transparent optimization that tries execve() then falls back to the
user's shell sounds like a good idea.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 16:47       ` Jeff King
@ 2017-05-16 17:11         ` Eric Rannaud
  2017-05-16 17:15         ` Brandon Williams
  2017-05-16 17:37         ` Jonathan Nieder
  2 siblings, 0 replies; 29+ messages in thread
From: Eric Rannaud @ 2017-05-16 17:11 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, git, Johannes Schindelin, Jeremy Serror,
	Brandon Williams

On Tue, May 16, 2017 at 9:47 AM, Jeff King <peff@peff.net> wrote:
>> I think we want to behave consistently for shell builtins and for
>> exported functions --- they are different sides of the same problem,
>> and behaving differently between the two feels confusing.
>
> Yes, I think ideally we'd handle it all transparently. Although I'd also
> point out that Git the behavior under discussion dates back to 2009 and
> this is (as far as I can recall) the first report. So documenting the
> current behavior might not be that bad an option.

This is on Arch Linux which is usually pretty aggressive with their
Git upgrades and I first saw the problem this week. The script that
hits this case runs constantly on my machine so my guess is that if
anyone relies on the old behavior, they're just starting to see a
new-enough Git to have a problem.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 16:59     ` Eric Rannaud
@ 2017-05-16 17:14       ` Kevin Daudt
  2017-05-16 17:29         ` Eric Rannaud
  2017-05-16 17:21       ` Eric Rannaud
  1 sibling, 1 reply; 29+ messages in thread
From: Kevin Daudt @ 2017-05-16 17:14 UTC (permalink / raw)
  To: Eric Rannaud; +Cc: Jeff King, Johannes Schindelin, git, Jeremy Serror

On Tue, May 16, 2017 at 09:59:03AM -0700, Eric Rannaud wrote:
> On Tue, May 16, 2017 at 9:18 AM, Jeff King <peff@peff.net> wrote:
> > On Tue, May 16, 2017 at 12:23:02PM +0200, Johannes Schindelin wrote:
> >> It would appear to me that you used a side effect of an implementation
> >> detail: that `git rebase -i` was implemented entirely as a shell script.
> >
> > I don't think that's true at all. He expected the user-provided "--exec"
> > command to be run by a shell, which seems like a reasonable thing for
> > Git to promise (and we already make a similar promise for most
> > user-provided commands that we run).  What happens in between, be it
> 
> As a "user", my expectation was simply that the command would be run
> not just in "a shell", but in *my* shell (or the shell that calls git,
> maybe). So I don't see any portability question with respect to Git.
> My script that uses git rebase --exec may not be portable, but that's
> my problem.
> 
> When I use "git rebase --exec <cmd>" I'm basically writing a "foreach
> commit in range { <cmd> }" in my shell. Same idea with git bisect run.
> 
> A transparent optimization that tries execve() then falls back to the
> user's shell sounds like a good idea.

It does not really work that way. Git runs in a separate process that
does not have access to your current shell. That's why you need to do
'export -f foo'.

If you want git to be able to ecute the foo shell function, git needs to
start a _new_ shell process, which reads the environment, recognize the
exported function and run that.

This is not the same as git executing the command in your shell. Not
exported variables would not be available in this function (as it would
be in your equivalent).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 16:47       ` Jeff King
  2017-05-16 17:11         ` Eric Rannaud
@ 2017-05-16 17:15         ` Brandon Williams
  2017-05-16 17:23           ` Jeff King
  2017-05-16 17:37         ` Jonathan Nieder
  2 siblings, 1 reply; 29+ messages in thread
From: Brandon Williams @ 2017-05-16 17:15 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Eric Rannaud, git, Johannes Schindelin,
	Jeremy Serror

On 05/16, Jeff King wrote:
> On Tue, May 16, 2017 at 09:41:24AM -0700, Jonathan Nieder wrote:
> 
> > Jeff King wrote:
> > > On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote:
> > 
> > >> One hack would be to look for BASH_FUNC_* in the environment and disable
> > >> the optimization in that case. I think that would make your case Just
> > >> Work. It doesn't help other oddball cases, like:
> > >>
> > >>   - you're trying to run a shell builtin that behaves differently than
> > >>     its exec-able counterpart
> > >>
> > >>   - your shell has some other mechanism for defining commands that we
> > >>     would not find via exec. I don't know of one offhand. Obviously $ENV
> > >>     could point to a file which defines some, but for most shells would
> > >>     not read any startup files for a non-interactive "sh -c" invocation.
> > >
> > > So I was thinking something like the patch below, though I guess
> > > technically you could look for BASH_FUNC_$argv[0]%%, which seems to be
> > > bash's magic variable name. I hate to get too intimate with those
> > > details, though.

One concern with that is what about all other shells that are not BASH?
I'm sure they use a different env var for storing functions so we may
need to handle other shell's too? That is assuming we want to keep the
old behavior.

> > >
> > > Another option is to speculatively run "foo" without the shell, and if
> > > execve fails to find it, then fall back to running the shell. That would
> > > catch any number of cases where the shell "somehow" finds a command that
> > > we can't.
> > 
> > Hm.  execvp explicitly does this when it hits ENOEXEC, but not for
> > ENOENT.  Do you know why that is?
> 
> When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo".
> It is actually running "sh foo" to run the script contents. So it's
> about letting you do:
> 
>   echo "no shebang line" >script
>   chmod +x script
>   ./script
> 
> And nothing to do with shell builtins.

That's correct, and is the exact behavior I was trying to mimic with the
changes to run_command.
  1. try executing the command.
  2. if it fails with ENOEXEC then attempt to execute it with a shell

> 
> > I think we want to behave consistently for shell builtins and for
> > exported functions --- they are different sides of the same problem,
> > and behaving differently between the two feels confusing.
> 
> Yes, I think ideally we'd handle it all transparently. Although I'd also
> point out that Git the behavior under discussion dates back to 2009 and
> this is (as far as I can recall) the first report. So documenting the
> current behavior might not be that bad an option.
> 
> -Peff

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 16:59     ` Eric Rannaud
  2017-05-16 17:14       ` Kevin Daudt
@ 2017-05-16 17:21       ` Eric Rannaud
  2017-05-16 17:37         ` Jeff King
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Rannaud @ 2017-05-16 17:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git, Jeremy Serror

On Tue, May 16, 2017 at 9:59 AM, Eric Rannaud <eric.rannaud@gmail.com> wrote:
> When I use "git rebase --exec <cmd>" I'm basically writing a "foreach
> commit in range { <cmd> }" in my shell. Same idea with git bisect run.
>
> A transparent optimization that tries execve() then falls back to the
> user's shell sounds like a good idea.

One issue with the execve-else-shell optimization is that sometimes a
binary exists that will shadow an exported function or a shell
builtin:

  git rebase --exec true master^^  # OK but in fact this runs /usr/bin/true

In this case it doesn't really matter. If the optimization is only
applied to "simple commands" (i.e. no arguments), then it's probably
OK. I can't think of problematic cases. Except weird things like:

  $ git rebase --exec time master^
  Executing: time
  Usage: time [-apvV] [-f format] [-o file] [--append] [--verbose]
         [--portability] [--format=format] [--output=file] [--version]
         [--help] command [arg...]
  warning: execution failed: time

/usr/bin/time requires an argument. Even though the bash builtin time
runs fine without argument.

  $ time

  real    0m0.000s
  user    0m0.000s
  sys     0m0.000s

But if the optimization is applied to more complex commands, then we
will have problems. For instance, the builtin echo supports \E, but
/usr/bin/echo doesn't support it.

In any case, the manpage says --exec <cmd> and "<cmd> will be
interpreted as one or more shell commands.", it doesn't say "--exec
<executable>".

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 17:15         ` Brandon Williams
@ 2017-05-16 17:23           ` Jeff King
  2017-05-16 17:30             ` Brandon Williams
  2017-05-16 19:14             ` Linus Torvalds
  0 siblings, 2 replies; 29+ messages in thread
From: Jeff King @ 2017-05-16 17:23 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Jonathan Nieder, Eric Rannaud, git, Johannes Schindelin,
	Jeremy Serror

On Tue, May 16, 2017 at 10:15:40AM -0700, Brandon Williams wrote:

> > > > So I was thinking something like the patch below, though I guess
> > > > technically you could look for BASH_FUNC_$argv[0]%%, which seems to be
> > > > bash's magic variable name. I hate to get too intimate with those
> > > > details, though.
> 
> One concern with that is what about all other shells that are not BASH?
> I'm sure they use a different env var for storing functions so we may
> need to handle other shell's too? That is assuming we want to keep the
> old behavior.

Most other shells don't do function-exporting at all. Certainly dash and
most traditional bourne shells don't. I wouldn't be surprised if zsh
does. But yeah, we'd have to support them one by one (and possibly
variants across different versions of each shell). Workable, but gross.

> > When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo".
> > It is actually running "sh foo" to run the script contents. So it's
> > about letting you do:
> > 
> >   echo "no shebang line" >script
> >   chmod +x script
> >   ./script
> > 
> > And nothing to do with shell builtins.
> 
> That's correct, and is the exact behavior I was trying to mimic with the
> changes to run_command.
>   1. try executing the command.
>   2. if it fails with ENOEXEC then attempt to execute it with a shell

I think the logic here would be more like:

  1. During prepare_shell_cmd(), even if we optimize out the shell call,
     still prepare a fallback argv (since we can't allocate memory
     post-fork).

  2. In the forked child, if we get ENOENT from exec and cmd->use_shell
     is set, then exec the fallback shell argv instead. Propagate its
     results, even if it's 127.

That still means we'd prefer a $PATH copy of a command to its shell
builtin variant, but that can't be helped (and I kind of doubt anybody
would care too much).

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 17:14       ` Kevin Daudt
@ 2017-05-16 17:29         ` Eric Rannaud
  2017-05-16 17:41           ` Jeff King
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Rannaud @ 2017-05-16 17:29 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: Jeff King, Johannes Schindelin, git, Jeremy Serror

On Tue, May 16, 2017 at 10:14 AM, Kevin Daudt <me@ikke.info> wrote:
>> A transparent optimization that tries execve() then falls back to the
>> user's shell sounds like a good idea.
>
> It does not really work that way. Git runs in a separate process that
> does not have access to your current shell. That's why you need to do
> 'export -f foo'.
>
> If you want git to be able to ecute the foo shell function, git needs to
> start a _new_ shell process, which reads the environment, recognize the
> exported function and run that.
>
> This is not the same as git executing the command in your shell. Not
> exported variables would not be available in this function (as it would
> be in your equivalent).

I'm sorry, I didn't mean (or say) "my shell process". Indeed, it
doesn't work that way. And to be clear, there is no problem with
having to "export -f foo". The only question is how should git run the
<cmd> passed to --exec: should it run directly or using a shell?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 17:23           ` Jeff King
@ 2017-05-16 17:30             ` Brandon Williams
  2017-05-16 19:14             ` Linus Torvalds
  1 sibling, 0 replies; 29+ messages in thread
From: Brandon Williams @ 2017-05-16 17:30 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Eric Rannaud, git, Johannes Schindelin,
	Jeremy Serror

On 05/16, Jeff King wrote:
> On Tue, May 16, 2017 at 10:15:40AM -0700, Brandon Williams wrote:
> 
> > > > > So I was thinking something like the patch below, though I guess
> > > > > technically you could look for BASH_FUNC_$argv[0]%%, which seems to be
> > > > > bash's magic variable name. I hate to get too intimate with those
> > > > > details, though.
> > 
> > One concern with that is what about all other shells that are not BASH?
> > I'm sure they use a different env var for storing functions so we may
> > need to handle other shell's too? That is assuming we want to keep the
> > old behavior.
> 
> Most other shells don't do function-exporting at all. Certainly dash and
> most traditional bourne shells don't. I wouldn't be surprised if zsh
> does. But yeah, we'd have to support them one by one (and possibly
> variants across different versions of each shell). Workable, but gross.
> 
> > > When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo".
> > > It is actually running "sh foo" to run the script contents. So it's
> > > about letting you do:
> > > 
> > >   echo "no shebang line" >script
> > >   chmod +x script
> > >   ./script
> > > 
> > > And nothing to do with shell builtins.
> > 
> > That's correct, and is the exact behavior I was trying to mimic with the
> > changes to run_command.
> >   1. try executing the command.
> >   2. if it fails with ENOEXEC then attempt to execute it with a shell
> 
> I think the logic here would be more like:

Oh yes, I was just saying what I did not taking into account how we
would solve this issue.

> 
>   1. During prepare_shell_cmd(), even if we optimize out the shell call,
>      still prepare a fallback argv (since we can't allocate memory
>      post-fork).
> 
>   2. In the forked child, if we get ENOENT from exec and cmd->use_shell
>      is set, then exec the fallback shell argv instead. Propagate its
>      results, even if it's 127.
> 
> That still means we'd prefer a $PATH copy of a command to its shell
> builtin variant, but that can't be helped (and I kind of doubt anybody
> would care too much).
> 
> -Peff

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 16:47       ` Jeff King
  2017-05-16 17:11         ` Eric Rannaud
  2017-05-16 17:15         ` Brandon Williams
@ 2017-05-16 17:37         ` Jonathan Nieder
  2 siblings, 0 replies; 29+ messages in thread
From: Jonathan Nieder @ 2017-05-16 17:37 UTC (permalink / raw)
  To: Jeff King
  Cc: Eric Rannaud, git, Johannes Schindelin, Jeremy Serror,
	Brandon Williams

Jeff King wrote:
>>> On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote:

>>>> One hack would be to look for BASH_FUNC_* in the environment and disable
>>>> the optimization in that case. I think that would make your case Just
>>>> Work. It doesn't help other oddball cases, like:
[...]
>> Hm.  execvp explicitly does this when it hits ENOEXEC, but not for
>> ENOENT.  Do you know why that is?
>
> When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo".
> It is actually running "sh foo" to run the script contents.

Oh --- now I get it.  This is about the optimization to bypass the shell
in prepare_shell_command.  It has nothing to do with execvp --- our
execvp emulation already does the right thing, and Brandon's patches
did not make this problem any worse or better.

Unconditionally falling back to sh -c when we get ENOENT in the
RUN_USING_SHELL case makes sense to me.

Sorry for the confusion,
Jonathan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 17:21       ` Eric Rannaud
@ 2017-05-16 17:37         ` Jeff King
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2017-05-16 17:37 UTC (permalink / raw)
  To: Eric Rannaud; +Cc: Johannes Schindelin, git, Jeremy Serror

On Tue, May 16, 2017 at 10:21:51AM -0700, Eric Rannaud wrote:

> On Tue, May 16, 2017 at 9:59 AM, Eric Rannaud <eric.rannaud@gmail.com> wrote:
> > When I use "git rebase --exec <cmd>" I'm basically writing a "foreach
> > commit in range { <cmd> }" in my shell. Same idea with git bisect run.
> >
> > A transparent optimization that tries execve() then falls back to the
> > user's shell sounds like a good idea.
> 
> One issue with the execve-else-shell optimization is that sometimes a
> binary exists that will shadow an exported function or a shell
> builtin:
> 
>   git rebase --exec true master^^  # OK but in fact this runs /usr/bin/true

Yeah, this is the builtin thing I mentioned elsewhere. I think it's
pretty rare to run a builtin with no arguments and care about the
behavior differences.

> /usr/bin/time requires an argument. Even though the bash builtin time
> runs fine without argument.
> 
>   $ time
> 
>   real    0m0.000s
>   user    0m0.000s
>   sys     0m0.000s

I've run into the "time" distinction even when running things from the
shell, because the time builtin is special. E.g.:

  $ time true
  real	0m0.000s
  user	0m0.000s
  sys	0m0.000s

  $ (echo foo | time true) 2>&1
  0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 1136maxresident)k
  0inputs+0outputs (0major+61minor)pagefaults 0swaps

So to some degree, depending on builtins versus external commands
(especially when you're round-tripping through another program running a
second shell) is going to have some surprises.

> But if the optimization is applied to more complex commands, then we
> will have problems. For instance, the builtin echo supports \E, but
> /usr/bin/echo doesn't support it.

No, it shouldn't. If any of

     |&;<>()$`\\\"' \t\n*?[#~=%

appear in the string, we always run the shell.

> In any case, the manpage says --exec <cmd> and "<cmd> will be
> interpreted as one or more shell commands.", it doesn't say "--exec
> <executable>".

Right, it's clearly supposed to use the shell, or behave as if the shell
were invoked (within reason).

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 17:29         ` Eric Rannaud
@ 2017-05-16 17:41           ` Jeff King
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff King @ 2017-05-16 17:41 UTC (permalink / raw)
  To: Eric Rannaud; +Cc: Kevin Daudt, Johannes Schindelin, git, Jeremy Serror

On Tue, May 16, 2017 at 10:29:31AM -0700, Eric Rannaud wrote:

> > It does not really work that way. Git runs in a separate process that
> > does not have access to your current shell. That's why you need to do
> > 'export -f foo'.
> >
> > If you want git to be able to ecute the foo shell function, git needs to
> > start a _new_ shell process, which reads the environment, recognize the
> > exported function and run that.
> >
> > This is not the same as git executing the command in your shell. Not
> > exported variables would not be available in this function (as it would
> > be in your equivalent).
> 
> I'm sorry, I didn't mean (or say) "my shell process". Indeed, it
> doesn't work that way. And to be clear, there is no problem with
> having to "export -f foo". The only question is how should git run the
> <cmd> passed to --exec: should it run directly or using a shell?

It's definitely "using a shell". Most things Git runs on your behalf
behave the same, but there are some exceptions due to historical warts
(that would break backwards compatibility if we switched them). E.g.,
$GIT_SSH does not use the shell, but we introduced GIT_SSH_COMMAND as an
alternative which does use the shell.

But note that "a shell" may not be necessarily be your login shell. We
always execute "/bin/sh -c" (and you can override the shell path at
build time). So your "export -f" trick only works because your /bin/sh
is a symlink to bash (or because you specifically built with the
SHELL_PATH knob pointed at bash).

-Peff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 17:23           ` Jeff King
  2017-05-16 17:30             ` Brandon Williams
@ 2017-05-16 19:14             ` Linus Torvalds
  2017-05-16 19:35               ` [TANGENT] run-command: use vfork instead of fork Eric Wong
  2017-05-16 20:12               ` git rebase regression: cannot pass a shell expression directly to --exec Johannes Schindelin
  1 sibling, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2017-05-16 19:14 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Williams, Jonathan Nieder, Eric Rannaud, Git Mailing List,
	Johannes Schindelin, Jeremy Serror

On Tue, May 16, 2017 at 10:23 AM, Jeff King <peff@peff.net> wrote:
>
> I think the logic here would be more like:
>
>   1. During prepare_shell_cmd(), even if we optimize out the shell call,
>      still prepare a fallback argv (since we can't allocate memory
>      post-fork).
>
>   2. In the forked child, if we get ENOENT from exec and cmd->use_shell
>      is set, then exec the fallback shell argv instead. Propagate its
>      results, even if it's 127.
>
> That still means we'd prefer a $PATH copy of a command to its shell
> builtin variant, but that can't be helped (and I kind of doubt anybody
> would care too much).

I think it would be better to just

 (a) get rid of the magic strcspn() entirely

 (b) make the 'can we optimize this' test be simply just looking up
'argv[0]' in $PATH

Hmm? If you have executables with special characters in them in your
PATH, you have bigger issues.

Also, if people really want to optimize the code that executes an
external program (whether in shell or directly), I think it might be
worth it to look at replacing the "fork()" with a "vfork()".

Something like this

-       cmd->pid = fork();
+       cmd->pid = (cmd->git_cmd || cmd->env) ? fork() : vfork();

might work (the native git_cmd case needs a real fork, and if we
change the environment variables we need it too, but the other cases
look like they might work with vfork()).

Using vfork() can be hugely more efficient, because you don't have the
extra page table copies and teardown, but also avoid a lot of possible
copy-on-write faults.

                 Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [TANGENT] run-command: use vfork instead of fork
  2017-05-16 19:14             ` Linus Torvalds
@ 2017-05-16 19:35               ` Eric Wong
  2017-05-16 20:47                 ` Linus Torvalds
  2017-05-16 20:12               ` git rebase regression: cannot pass a shell expression directly to --exec Johannes Schindelin
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Wong @ 2017-05-16 19:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff King, Brandon Williams, Jonathan Nieder, Eric Rannaud,
	Git Mailing List, Johannes Schindelin, Jeremy Serror

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Also, if people really want to optimize the code that executes an
> external program (whether in shell or directly), I think it might be
> worth it to look at replacing the "fork()" with a "vfork()".
> 
> Something like this
> 
> -       cmd->pid = fork();
> +       cmd->pid = (cmd->git_cmd || cmd->env) ? fork() : vfork();
> 
> might work (the native git_cmd case needs a real fork, and if we
> change the environment variables we need it too, but the other cases
> look like they might work with vfork()).
> 
> Using vfork() can be hugely more efficient, because you don't have the
> extra page table copies and teardown, but also avoid a lot of possible
> copy-on-write faults.

Fwiw, most of the vfork preparation was already done by Brandon
and myself a few weeks ago, and cooking in pu.

I think only the patch below would be needed to enable vfork
(along with any build-time detection)

However, I haven't noticed enough forking in git to make a
difference (but maybe others do).  I think it would make a
bigger difference if such changes were made to bash, dash,
make, and perl5.

--------8<------------
Subject: [PATCH] run-command: use vfork instead of fork

To enable vfork, we merely have to avoid modifying memory we
share with the parent, so the guard functions
`child_(error|warn|die)_fn` can now be disabled.

FIXME: still missing autoconf + Makefile portability tweaks.

Signed-off-by: Eric Wong <e@80x24.org>
---
 run-command.c | 28 +---------------------------
 1 file changed, 1 insertion(+), 27 deletions(-)

diff --git a/run-command.c b/run-command.c
index 9e36151bf9..0292dd94b6 100644
--- a/run-command.c
+++ b/run-command.c
@@ -324,25 +324,6 @@ static void fake_fatal(const char *err, va_list params)
 	vreportf("fatal: ", err, params);
 }
 
-static void child_error_fn(const char *err, va_list params)
-{
-	const char msg[] = "error() should not be called in child\n";
-	xwrite(2, msg, sizeof(msg) - 1);
-}
-
-static void child_warn_fn(const char *err, va_list params)
-{
-	const char msg[] = "warn() should not be called in child\n";
-	xwrite(2, msg, sizeof(msg) - 1);
-}
-
-static void NORETURN child_die_fn(const char *err, va_list params)
-{
-	const char msg[] = "die() should not be called in child\n";
-	xwrite(2, msg, sizeof(msg) - 1);
-	_exit(2);
-}
-
 /* this runs in the parent process */
 static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
 {
@@ -658,17 +639,10 @@ int start_command(struct child_process *cmd)
 	 * never be released in the child process.  This means only
 	 * Async-Signal-Safe functions are permitted in the child.
 	 */
-	cmd->pid = fork();
+	cmd->pid = vfork();
 	failed_errno = errno;
 	if (!cmd->pid) {
 		int sig;
-		/*
-		 * Ensure the default die/error/warn routines do not get
-		 * called, they can take stdio locks and malloc.
-		 */
-		set_die_routine(child_die_fn);
-		set_error_routine(child_error_fn);
-		set_warn_routine(child_warn_fn);
 
 		close(notify_pipe[0]);
 		set_cloexec(notify_pipe[1]);
-- 
Also fetchable:  git://bogomips.org/git-svn vfork-test
commit 5f88d79182aaabc5ea467d1d29e13e45bd2b99bf

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 19:14             ` Linus Torvalds
  2017-05-16 19:35               ` [TANGENT] run-command: use vfork instead of fork Eric Wong
@ 2017-05-16 20:12               ` Johannes Schindelin
  2017-05-16 20:27                 ` Linus Torvalds
  1 sibling, 1 reply; 29+ messages in thread
From: Johannes Schindelin @ 2017-05-16 20:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff King, Brandon Williams, Jonathan Nieder, Eric Rannaud,
	Git Mailing List, Jeremy Serror

Hi Linus,

On Tue, 16 May 2017, Linus Torvalds wrote:

> On Tue, May 16, 2017 at 10:23 AM, Jeff King <peff@peff.net> wrote:
> >
> > I think the logic here would be more like:
> >
> >   1. During prepare_shell_cmd(), even if we optimize out the shell call,
> >      still prepare a fallback argv (since we can't allocate memory
> >      post-fork).
> >
> >   2. In the forked child, if we get ENOENT from exec and cmd->use_shell
> >      is set, then exec the fallback shell argv instead. Propagate its
> >      results, even if it's 127.
> >
> > That still means we'd prefer a $PATH copy of a command to its shell
> > builtin variant, but that can't be helped (and I kind of doubt anybody
> > would care too much).
> 
> I think it would be better to just
> 
>  (a) get rid of the magic strcspn() entirely
> 
>  (b) make the 'can we optimize this' test be simply just looking up
> 'argv[0]' in $PATH

What about

	ABC=1 my-executable my-arg

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: git rebase regression: cannot pass a shell expression directly to --exec
  2017-05-16 20:12               ` git rebase regression: cannot pass a shell expression directly to --exec Johannes Schindelin
@ 2017-05-16 20:27                 ` Linus Torvalds
  0 siblings, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2017-05-16 20:27 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeff King, Brandon Williams, Jonathan Nieder, Eric Rannaud,
	Git Mailing List, Jeremy Serror

On Tue, May 16, 2017 at 1:12 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>>
>> I think it would be better to just
>>
>>  (a) get rid of the magic strcspn() entirely
>>
>>  (b) make the 'can we optimize this' test be simply just looking up
>> 'argv[0]' in $PATH
>
> What about
>
>         ABC=1 my-executable my-arg

What about it?

Do you have a binary like

    '/usr/bin/ABC=1 my-executable my-arg'

or something? If so, it gets executed. If not, it would get passed off
to "/bin/sh".

That sounds a lot more obvious than "some random set of characters are magical".

                  Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [TANGENT] run-command: use vfork instead of fork
  2017-05-16 19:35               ` [TANGENT] run-command: use vfork instead of fork Eric Wong
@ 2017-05-16 20:47                 ` Linus Torvalds
  2017-05-16 21:11                   ` Brandon Williams
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2017-05-16 20:47 UTC (permalink / raw)
  To: Eric Wong
  Cc: Jeff King, Brandon Williams, Jonathan Nieder, Eric Rannaud,
	Git Mailing List, Johannes Schindelin, Jeremy Serror

On Tue, May 16, 2017 at 12:35 PM, Eric Wong <e@80x24.org> wrote:
>
> Fwiw, most of the vfork preparation was already done by Brandon
> and myself a few weeks ago, and cooking in pu.

Oh, interesting. Was that done for vfork(), or is it for something
else? Some of the changes seem almost overly careful. Is this for
prep-work porting to some odd environment that doesn't really have a
MMU at all? There's nothing fundamentally wrong with allocating memory
after fork().

But yes, it looks like it helps the vfork case.

               Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [TANGENT] run-command: use vfork instead of fork
  2017-05-16 20:47                 ` Linus Torvalds
@ 2017-05-16 21:11                   ` Brandon Williams
  0 siblings, 0 replies; 29+ messages in thread
From: Brandon Williams @ 2017-05-16 21:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Wong, Jeff King, Jonathan Nieder, Eric Rannaud,
	Git Mailing List, Johannes Schindelin, Jeremy Serror

On 05/16, Linus Torvalds wrote:
> On Tue, May 16, 2017 at 12:35 PM, Eric Wong <e@80x24.org> wrote:
> >
> > Fwiw, most of the vfork preparation was already done by Brandon
> > and myself a few weeks ago, and cooking in pu.
> 
> Oh, interesting. Was that done for vfork(), or is it for something
> else? Some of the changes seem almost overly careful. Is this for
> prep-work porting to some odd environment that doesn't really have a
> MMU at all? There's nothing fundamentally wrong with allocating memory
> after fork().
> 
> But yes, it looks like it helps the vfork case.
> 
>                Linus

I started working on the run-command code when I ran into a deadlock in
'git grep --recurse-submodules'.  When I added support for submodules to
grep I just assumed that launching a process (which atm is unfortunately
the only way to work on a submodule) would work in a multi-threaded
environment.  I was naive and wrong!

The deadlock was due to a malloc lock being held by thread 'A' while
thread 'b' tried to launch a process.  Since that lock was in a
locked-state at the time of forking, it remained in a locked-state with
no hope of ever being released.  So when the child process that thread
'b' spawned tried to malloc a chunk of memory after forking, it
deadlocked.

I didn't catch this in initial testing because gclib registers
atfork_handelers in order to prevent this sort of thing, while libraries
like tcmalloc don't do this.

So to account for this, I worked to make run-command safe to use in the
presence of threads, which had the benefit of also preparing it to be
vfork() ready.

Ultimately I'd like to drop the requirement to spawn a child process to
work on a submodule, but that's going to take a lot more effort.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-05-16 21:11 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-15 18:08 git rebase regression: cannot pass a shell expression directly to --exec Eric Rannaud
2017-05-16  3:25 ` Jeff King
2017-05-16  3:37   ` Jeff King
2017-05-16 16:41     ` Jonathan Nieder
2017-05-16 16:47       ` Jeff King
2017-05-16 17:11         ` Eric Rannaud
2017-05-16 17:15         ` Brandon Williams
2017-05-16 17:23           ` Jeff King
2017-05-16 17:30             ` Brandon Williams
2017-05-16 19:14             ` Linus Torvalds
2017-05-16 19:35               ` [TANGENT] run-command: use vfork instead of fork Eric Wong
2017-05-16 20:47                 ` Linus Torvalds
2017-05-16 21:11                   ` Brandon Williams
2017-05-16 20:12               ` git rebase regression: cannot pass a shell expression directly to --exec Johannes Schindelin
2017-05-16 20:27                 ` Linus Torvalds
2017-05-16 17:37         ` Jonathan Nieder
2017-05-16  3:40   ` Junio C Hamano
2017-05-16  3:53     ` Jeff King
2017-05-16  4:08       ` Jeff King
2017-05-16 16:45       ` Eric Rannaud
2017-05-16 10:46     ` Johannes Schindelin
2017-05-16 10:23 ` Johannes Schindelin
2017-05-16 16:18   ` Jeff King
2017-05-16 16:59     ` Eric Rannaud
2017-05-16 17:14       ` Kevin Daudt
2017-05-16 17:29         ` Eric Rannaud
2017-05-16 17:41           ` Jeff King
2017-05-16 17:21       ` Eric Rannaud
2017-05-16 17:37         ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).