git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Enzo Matsumiya <ematsumiya@suse.de>
Subject: Re: [PATCH 5/5] run-command API: remove "argv" member, always use "args"
Date: Mon, 22 Nov 2021 19:19:14 +0100	[thread overview]
Message-ID: <211122.86wnl0xd24.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <YZvUN0kuTpmf9Q7P@coredump.intra.peff.net>


On Mon, Nov 22 2021, Jeff King wrote:

> On Mon, Nov 22, 2021 at 05:04:07PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> This change is larger than I'd like, but there's no easy way to avoid
>> it that wouldn't involve even more verbose intermediate steps. We use
>> the "argv" as the source of truth over the "args", so we need to
>> change all parts of run-command.[ch] itself, as well as the trace2
>> logging at the same time.
>
> Yeah, agreed most of this needs to come in a bundle. But at least few spots
> could be split out into the earlier patches:
>
>>  builtin/worktree.c          |  2 --
>>  t/helper/test-run-command.c | 10 +++++---
>
> as they are just users of the API.

Will split these out, I had test-run-command.c split initially, but
squashed it locally, which did away with having to explain leaving this
area untested as an intermediate step. But will split again.

>> diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
>> index 3c4fb862234..b5519f92a19 100644
>> --- a/t/helper/test-run-command.c
>> +++ b/t/helper/test-run-command.c
>> [...]
>> @@ -396,7 +396,8 @@ int cmd__run_command(int argc, const char **argv)
>>  	}
>>  	if (argc < 3)
>>  		return 1;
>> -	proc.argv = (const char **)argv + 2;
>> +	strvec_clear(&proc.args);
>> +	strvec_pushv(&proc.args, (const char **)argv + 2);
>>  
>>  	if (!strcmp(argv[1], "start-command-ENOENT")) {
>>  		if (start_command(&proc) < 0 && errno == ENOENT)
>> @@ -408,7 +409,8 @@ int cmd__run_command(int argc, const char **argv)
>>  		exit(run_command(&proc));
>>  
>>  	jobs = atoi(argv[2]);
>> -	proc.argv = (const char **)argv + 3;
>> +	strvec_clear(&proc.args);
>> +	strvec_pushv(&proc.args, (const char **)argv + 3);
>>  
>>  	if (!strcmp(argv[1], "run-command-parallel"))
>>  		exit(run_processes_parallel(jobs, parallel_next,
>
> These two in particular are tricky. This second strvec_clear() is
> necessary because we are overwriting what was put into proc.args by the
> earlier strvec_pushv(). I think this is one of two interesting ways
> these kinds of trivial conversions can fail:
>
>   - somebody is relying on the fact that "argv = whatever" is an
>     assignment, not an append (which is the case here)
>
>   - somebody is relying on the fact that assignment of the pointer is
>     shallow, whereas strvec_push() is doing a deep copy. So converting:
>
>       cp.argv = argv;
>       argv[1] = "foo";
>
>     to:
>
>       strvec_pushv(&cp.args, argv);
>       argv[1] = "foo";
>
>     is not correct. We wouldn't use the updated "foo". I didn't notice
>     any cases like this during my rough own rough conversion, but they
>     could be lurking.

Yes, I tried to eyeball all the code involved & see if there were any. I
could amend this to start renaming variables, but that'll be a much
larger change.

> The strvec_clear() in the first hunk above is also not necessary (nobody
> has written anything before it), but I don't mind it for consistency /
> being defensive.

*nod*, will remove.

>> @@ -902,8 +900,9 @@ int start_command(struct child_process *cmd)
>>  #else
>>  {
>>  	int fhin = 0, fhout = 1, fherr = 2;
>> -	const char **sargv = cmd->argv;
>> +	const char **sargv = strvec_detach(&cmd->args);
>>  	struct strvec nargv = STRVEC_INIT;
>> +	const char **temp_argv = NULL;
>
> I wondered about detaching here, because other parts of the code
> (finish_command(), tracing, etc) will be looking at cmd->args.v[0] for
> the command name.
>
> But we eventually overwrite it with munged results:
>
>> @@ -929,20 +928,26 @@ int start_command(struct child_process *cmd)
>>  		fhout = dup(cmd->out);
>>  
>>  	if (cmd->git_cmd)
>> -		cmd->argv = prepare_git_cmd(&nargv, cmd->argv);
>> +		temp_argv = prepare_git_cmd(&nargv, sargv);
>>  	else if (cmd->use_shell)
>> -		cmd->argv = prepare_shell_cmd(&nargv, cmd->argv);
>> +		temp_argv = prepare_shell_cmd(&nargv, sargv);
>> +	else
>> +		temp_argv = sargv;
>> +	if (!temp_argv)
>> +		BUG("should have some cmd->args to set");
>> +	strvec_pushv(&cmd->args, temp_argv);
>> +	strvec_clear(&nargv);
>
> So we have to do some replacement. I think the memory management gets
> confusing here, though.
>
> At this point "temp_argv" might point to nargv.v (in which case it is a
> dangling pointer after we clear nargv) or to the original sargv (in
> which case it is memory owned by us that must be freed). The former is
> OK, because we never look at it again. And the latter we eventually
> reattach into &cmd->args, but:
>
>> -	strvec_clear(&nargv);
>> -	cmd->argv = sargv;
>> +	strvec_pushv(&cmd->args, sargv);;
>> +	free(sargv);
>
> I think we still have all the entries we pushed into cmd->args from
> temp_argv earlier. So we'd need to strvec_clear() it before pushing
> sargv again.
>
> And then the free(sargv) is too shallow. It's just freeing the outer
> array, but sargv[0], etc, are leaked.

I'll try to fix that, but updates to this part are very painful, since
I've needed to wait 30m for each change in the Windows CI.

> I think what you really want is the equivalent of strvec_attach(). We
> don't have that, but it's basically just:
>
>   void strvec_attach(struct strvec *s, const char **v)
>   {
> 	/*
> 	 * this is convenient for our caller, but if we wanted to find
> 	 * potential misuses, we could also BUG() if s.nr > 0
> 	 */
> 	strvec_clear(&s);
>
>         /* take over ownership */
> 	s->v = v;
>
> 	/* v must be NULL-terminated; count up to set number */
> 	s->nr = 0;
> 	for (; *v; v++)
> 		s->nr++;
> 	/*
> 	 * we don't know how big the original allocation was, so we can
> 	 * assume only nr. But add 1 to account for the NULL.
> 	 */
> 	s->alloc = s->nr + 1;
>   }
>
> I do think this whole thing would be easier to read if we left cmd->argv
> intact, and just operated on a separate argv strvec. That's what the
> non-Windows side does. But that distinction is nothing new in your
> patch, and I'm not sure if there is a reason that the Windows code does
> it differently.

I did have this with a strvec_attach() locally, but figured I'd get
comments about growing scope & with just one caller.

This version seems to be duplicating things in the existing API though,
I just had the below, which I think works just as well without the
duplication. Perhaps you missed strvec_push_nodup()?

diff --git a/strvec.c b/strvec.c
index 61a76ce6cb9..c10008d792f 100644
--- a/strvec.c
+++ b/strvec.c
@@ -106,3 +106,9 @@ const char **strvec_detach(struct strvec *array)
                return ret;
        }
 }
+
+void strvec_attach(struct strvec *array, const char **items)
+{
+       for (; *items; items++)
+               strvec_push_nodup(array, *items);
+}

  reply	other threads:[~2021-11-22 18:25 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-20 19:40 [PATCH v2] pager: fix crash when pager program doesn't exist Enzo Matsumiya
2021-11-21 18:37 ` Jeff King
2021-11-22  2:10   ` Junio C Hamano
2021-11-22  4:35     ` Jeff King
2021-11-22 14:52       ` Enzo Matsumiya
2021-11-22 17:05         ` Junio C Hamano
2021-11-23 16:40           ` Enzo Matsumiya
2021-11-24  1:55             ` Ævar Arnfjörð Bjarmason
2021-11-24 15:51               ` Jeff King
2021-11-22 16:04       ` [PATCH 0/5] run-command API: get rid of "argv" Ævar Arnfjörð Bjarmason
2021-11-22 16:04         ` [PATCH 1/5] archive-tar: use our own cmd.buf in error message Ævar Arnfjörð Bjarmason
2021-11-22 21:04           ` Junio C Hamano
2021-11-22 16:04         ` [PATCH 2/5] upload-archive: use regular "struct child_process" pattern Ævar Arnfjörð Bjarmason
2021-11-22 17:02           ` Jeff King
2021-11-22 20:53           ` Ævar Arnfjörð Bjarmason
2021-11-22 21:10             ` Jeff King
2021-11-22 21:36               ` Ævar Arnfjörð Bjarmason
2021-11-22 16:04         ` [PATCH 3/5] run-command API users: use strvec_pushv(), not argv assignment Ævar Arnfjörð Bjarmason
2021-11-22 21:19           ` Junio C Hamano
2021-11-22 21:30             ` Ævar Arnfjörð Bjarmason
2021-11-22 16:04         ` [PATCH 4/5] run-command API users: use strvec_pushl(), not argv construction Ævar Arnfjörð Bjarmason
2021-11-22 16:04         ` [PATCH 5/5] run-command API: remove "argv" member, always use "args" Ævar Arnfjörð Bjarmason
2021-11-22 17:32           ` Jeff King
2021-11-22 18:19             ` Ævar Arnfjörð Bjarmason [this message]
2021-11-22 18:47               ` Jeff King
2021-11-22 17:52         ` [PATCH 0/5] run-command API: get rid of "argv" Jeff King
2021-11-22 18:11           ` Junio C Hamano
2021-11-22 18:33             ` Ævar Arnfjörð Bjarmason
2021-11-22 18:49               ` Jeff King
2021-11-22 18:26           ` Ævar Arnfjörð Bjarmason
2021-11-23 12:06         ` [PATCH v2 0/9] run-command API: get rid of "argv" and "env" Ævar Arnfjörð Bjarmason
2021-11-23 12:06           ` [PATCH v2 1/9] worktree: remove redundant NULL-ing of "cp.argv Ævar Arnfjörð Bjarmason
2021-11-23 15:26             ` Eric Sunshine
2021-11-24  1:54               ` Junio C Hamano
2021-11-24  6:00                 ` Eric Sunshine
2021-11-24  6:12                   ` Eric Sunshine
2021-11-24  5:44               ` Eric Sunshine
2021-11-23 12:06           ` [PATCH v2 2/9] upload-archive: use regular "struct child_process" pattern Ævar Arnfjörð Bjarmason
2021-11-23 12:06           ` [PATCH v2 3/9] run-command API users: use strvec_pushv(), not argv assignment Ævar Arnfjörð Bjarmason
2021-11-23 12:06           ` [PATCH v2 4/9] run-command tests: " Ævar Arnfjörð Bjarmason
2021-11-24  1:33             ` Eric Sunshine
2021-11-23 12:06           ` [PATCH v2 5/9] run-command API users: use strvec_pushl(), not argv construction Ævar Arnfjörð Bjarmason
2021-11-23 12:06           ` [PATCH v2 6/9] run-command API users: use strvec_push(), " Ævar Arnfjörð Bjarmason
2021-11-23 12:06           ` [PATCH v2 7/9] run-command API: remove "argv" member, always use "args" Ævar Arnfjörð Bjarmason
2021-11-23 12:06           ` [PATCH v2 8/9] difftool: use "env_array" to simplify memory management Ævar Arnfjörð Bjarmason
2021-11-23 12:06           ` [PATCH v2 9/9] run-command API: remove "env" member, always use "env_array" Ævar Arnfjörð Bjarmason
2021-11-25 22:52           ` [PATCH v3 0/9] run-command API: get rid of "argv" and "env" Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 1/9] worktree: stop being overly intimate with run_command() internals Ævar Arnfjörð Bjarmason
2021-11-26  9:48               ` Eric Sunshine
2021-11-25 22:52             ` [PATCH v3 2/9] upload-archive: use regular "struct child_process" pattern Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 3/9] run-command API users: use strvec_pushv(), not argv assignment Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 4/9] run-command tests: " Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 5/9] run-command API users: use strvec_pushl(), not argv construction Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 6/9] run-command API users: use strvec_push(), " Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 7/9] run-command API: remove "argv" member, always use "args" Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 8/9] difftool: use "env_array" to simplify memory management Ævar Arnfjörð Bjarmason
2021-11-25 22:52             ` [PATCH v3 9/9] run-command API: remove "env" member, always use "env_array" Ævar Arnfjörð Bjarmason
2021-11-22 15:31     ` [PATCH v2] pager: fix crash when pager program doesn't exist Enzo Matsumiya
2021-11-22 16:22       ` Ævar Arnfjörð Bjarmason
2021-11-22 16:46         ` Enzo Matsumiya
2021-11-22 17:10           ` Ævar Arnfjörð Bjarmason
2021-11-22 17:41             ` Jeff King
2021-11-22 18:00             ` Junio C Hamano
2021-11-22 18:26               ` Jeff King
2021-11-22 17:55           ` Junio C Hamano
2021-11-22 18:19             ` Junio C Hamano
2021-11-22 18:37               ` Jeff King
2021-11-22 20:39                 ` Junio C Hamano
2021-11-22 17:08         ` Junio C Hamano
2021-11-22 18:35           ` Ævar Arnfjörð Bjarmason
2021-11-22 16:30       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211122.86wnl0xd24.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=ematsumiya@suse.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).