From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Enzo Matsumiya <ematsumiya@suse.de>
Subject: Re: [PATCH 5/5] run-command API: remove "argv" member, always use "args"
Date: Mon, 22 Nov 2021 19:19:14 +0100 [thread overview]
Message-ID: <211122.86wnl0xd24.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <YZvUN0kuTpmf9Q7P@coredump.intra.peff.net>
On Mon, Nov 22 2021, Jeff King wrote:
> On Mon, Nov 22, 2021 at 05:04:07PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> This change is larger than I'd like, but there's no easy way to avoid
>> it that wouldn't involve even more verbose intermediate steps. We use
>> the "argv" as the source of truth over the "args", so we need to
>> change all parts of run-command.[ch] itself, as well as the trace2
>> logging at the same time.
>
> Yeah, agreed most of this needs to come in a bundle. But at least few spots
> could be split out into the earlier patches:
>
>> builtin/worktree.c | 2 --
>> t/helper/test-run-command.c | 10 +++++---
>
> as they are just users of the API.
Will split these out, I had test-run-command.c split initially, but
squashed it locally, which did away with having to explain leaving this
area untested as an intermediate step. But will split again.
>> diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
>> index 3c4fb862234..b5519f92a19 100644
>> --- a/t/helper/test-run-command.c
>> +++ b/t/helper/test-run-command.c
>> [...]
>> @@ -396,7 +396,8 @@ int cmd__run_command(int argc, const char **argv)
>> }
>> if (argc < 3)
>> return 1;
>> - proc.argv = (const char **)argv + 2;
>> + strvec_clear(&proc.args);
>> + strvec_pushv(&proc.args, (const char **)argv + 2);
>>
>> if (!strcmp(argv[1], "start-command-ENOENT")) {
>> if (start_command(&proc) < 0 && errno == ENOENT)
>> @@ -408,7 +409,8 @@ int cmd__run_command(int argc, const char **argv)
>> exit(run_command(&proc));
>>
>> jobs = atoi(argv[2]);
>> - proc.argv = (const char **)argv + 3;
>> + strvec_clear(&proc.args);
>> + strvec_pushv(&proc.args, (const char **)argv + 3);
>>
>> if (!strcmp(argv[1], "run-command-parallel"))
>> exit(run_processes_parallel(jobs, parallel_next,
>
> These two in particular are tricky. This second strvec_clear() is
> necessary because we are overwriting what was put into proc.args by the
> earlier strvec_pushv(). I think this is one of two interesting ways
> these kinds of trivial conversions can fail:
>
> - somebody is relying on the fact that "argv = whatever" is an
> assignment, not an append (which is the case here)
>
> - somebody is relying on the fact that assignment of the pointer is
> shallow, whereas strvec_push() is doing a deep copy. So converting:
>
> cp.argv = argv;
> argv[1] = "foo";
>
> to:
>
> strvec_pushv(&cp.args, argv);
> argv[1] = "foo";
>
> is not correct. We wouldn't use the updated "foo". I didn't notice
> any cases like this during my rough own rough conversion, but they
> could be lurking.
Yes, I tried to eyeball all the code involved & see if there were any. I
could amend this to start renaming variables, but that'll be a much
larger change.
> The strvec_clear() in the first hunk above is also not necessary (nobody
> has written anything before it), but I don't mind it for consistency /
> being defensive.
*nod*, will remove.
>> @@ -902,8 +900,9 @@ int start_command(struct child_process *cmd)
>> #else
>> {
>> int fhin = 0, fhout = 1, fherr = 2;
>> - const char **sargv = cmd->argv;
>> + const char **sargv = strvec_detach(&cmd->args);
>> struct strvec nargv = STRVEC_INIT;
>> + const char **temp_argv = NULL;
>
> I wondered about detaching here, because other parts of the code
> (finish_command(), tracing, etc) will be looking at cmd->args.v[0] for
> the command name.
>
> But we eventually overwrite it with munged results:
>
>> @@ -929,20 +928,26 @@ int start_command(struct child_process *cmd)
>> fhout = dup(cmd->out);
>>
>> if (cmd->git_cmd)
>> - cmd->argv = prepare_git_cmd(&nargv, cmd->argv);
>> + temp_argv = prepare_git_cmd(&nargv, sargv);
>> else if (cmd->use_shell)
>> - cmd->argv = prepare_shell_cmd(&nargv, cmd->argv);
>> + temp_argv = prepare_shell_cmd(&nargv, sargv);
>> + else
>> + temp_argv = sargv;
>> + if (!temp_argv)
>> + BUG("should have some cmd->args to set");
>> + strvec_pushv(&cmd->args, temp_argv);
>> + strvec_clear(&nargv);
>
> So we have to do some replacement. I think the memory management gets
> confusing here, though.
>
> At this point "temp_argv" might point to nargv.v (in which case it is a
> dangling pointer after we clear nargv) or to the original sargv (in
> which case it is memory owned by us that must be freed). The former is
> OK, because we never look at it again. And the latter we eventually
> reattach into &cmd->args, but:
>
>> - strvec_clear(&nargv);
>> - cmd->argv = sargv;
>> + strvec_pushv(&cmd->args, sargv);;
>> + free(sargv);
>
> I think we still have all the entries we pushed into cmd->args from
> temp_argv earlier. So we'd need to strvec_clear() it before pushing
> sargv again.
>
> And then the free(sargv) is too shallow. It's just freeing the outer
> array, but sargv[0], etc, are leaked.
I'll try to fix that, but updates to this part are very painful, since
I've needed to wait 30m for each change in the Windows CI.
> I think what you really want is the equivalent of strvec_attach(). We
> don't have that, but it's basically just:
>
> void strvec_attach(struct strvec *s, const char **v)
> {
> /*
> * this is convenient for our caller, but if we wanted to find
> * potential misuses, we could also BUG() if s.nr > 0
> */
> strvec_clear(&s);
>
> /* take over ownership */
> s->v = v;
>
> /* v must be NULL-terminated; count up to set number */
> s->nr = 0;
> for (; *v; v++)
> s->nr++;
> /*
> * we don't know how big the original allocation was, so we can
> * assume only nr. But add 1 to account for the NULL.
> */
> s->alloc = s->nr + 1;
> }
>
> I do think this whole thing would be easier to read if we left cmd->argv
> intact, and just operated on a separate argv strvec. That's what the
> non-Windows side does. But that distinction is nothing new in your
> patch, and I'm not sure if there is a reason that the Windows code does
> it differently.
I did have this with a strvec_attach() locally, but figured I'd get
comments about growing scope & with just one caller.
This version seems to be duplicating things in the existing API though,
I just had the below, which I think works just as well without the
duplication. Perhaps you missed strvec_push_nodup()?
diff --git a/strvec.c b/strvec.c
index 61a76ce6cb9..c10008d792f 100644
--- a/strvec.c
+++ b/strvec.c
@@ -106,3 +106,9 @@ const char **strvec_detach(struct strvec *array)
return ret;
}
}
+
+void strvec_attach(struct strvec *array, const char **items)
+{
+ for (; *items; items++)
+ strvec_push_nodup(array, *items);
+}
next prev parent reply other threads:[~2021-11-22 18:25 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-20 19:40 [PATCH v2] pager: fix crash when pager program doesn't exist Enzo Matsumiya
2021-11-21 18:37 ` Jeff King
2021-11-22 2:10 ` Junio C Hamano
2021-11-22 4:35 ` Jeff King
2021-11-22 14:52 ` Enzo Matsumiya
2021-11-22 17:05 ` Junio C Hamano
2021-11-23 16:40 ` Enzo Matsumiya
2021-11-24 1:55 ` Ævar Arnfjörð Bjarmason
2021-11-24 15:51 ` Jeff King
2021-11-22 16:04 ` [PATCH 0/5] run-command API: get rid of "argv" Ævar Arnfjörð Bjarmason
2021-11-22 16:04 ` [PATCH 1/5] archive-tar: use our own cmd.buf in error message Ævar Arnfjörð Bjarmason
2021-11-22 21:04 ` Junio C Hamano
2021-11-22 16:04 ` [PATCH 2/5] upload-archive: use regular "struct child_process" pattern Ævar Arnfjörð Bjarmason
2021-11-22 17:02 ` Jeff King
2021-11-22 20:53 ` Ævar Arnfjörð Bjarmason
2021-11-22 21:10 ` Jeff King
2021-11-22 21:36 ` Ævar Arnfjörð Bjarmason
2021-11-22 16:04 ` [PATCH 3/5] run-command API users: use strvec_pushv(), not argv assignment Ævar Arnfjörð Bjarmason
2021-11-22 21:19 ` Junio C Hamano
2021-11-22 21:30 ` Ævar Arnfjörð Bjarmason
2021-11-22 16:04 ` [PATCH 4/5] run-command API users: use strvec_pushl(), not argv construction Ævar Arnfjörð Bjarmason
2021-11-22 16:04 ` [PATCH 5/5] run-command API: remove "argv" member, always use "args" Ævar Arnfjörð Bjarmason
2021-11-22 17:32 ` Jeff King
2021-11-22 18:19 ` Ævar Arnfjörð Bjarmason [this message]
2021-11-22 18:47 ` Jeff King
2021-11-22 17:52 ` [PATCH 0/5] run-command API: get rid of "argv" Jeff King
2021-11-22 18:11 ` Junio C Hamano
2021-11-22 18:33 ` Ævar Arnfjörð Bjarmason
2021-11-22 18:49 ` Jeff King
2021-11-22 18:26 ` Ævar Arnfjörð Bjarmason
2021-11-23 12:06 ` [PATCH v2 0/9] run-command API: get rid of "argv" and "env" Ævar Arnfjörð Bjarmason
2021-11-23 12:06 ` [PATCH v2 1/9] worktree: remove redundant NULL-ing of "cp.argv Ævar Arnfjörð Bjarmason
2021-11-23 15:26 ` Eric Sunshine
2021-11-24 1:54 ` Junio C Hamano
2021-11-24 6:00 ` Eric Sunshine
2021-11-24 6:12 ` Eric Sunshine
2021-11-24 5:44 ` Eric Sunshine
2021-11-23 12:06 ` [PATCH v2 2/9] upload-archive: use regular "struct child_process" pattern Ævar Arnfjörð Bjarmason
2021-11-23 12:06 ` [PATCH v2 3/9] run-command API users: use strvec_pushv(), not argv assignment Ævar Arnfjörð Bjarmason
2021-11-23 12:06 ` [PATCH v2 4/9] run-command tests: " Ævar Arnfjörð Bjarmason
2021-11-24 1:33 ` Eric Sunshine
2021-11-23 12:06 ` [PATCH v2 5/9] run-command API users: use strvec_pushl(), not argv construction Ævar Arnfjörð Bjarmason
2021-11-23 12:06 ` [PATCH v2 6/9] run-command API users: use strvec_push(), " Ævar Arnfjörð Bjarmason
2021-11-23 12:06 ` [PATCH v2 7/9] run-command API: remove "argv" member, always use "args" Ævar Arnfjörð Bjarmason
2021-11-23 12:06 ` [PATCH v2 8/9] difftool: use "env_array" to simplify memory management Ævar Arnfjörð Bjarmason
2021-11-23 12:06 ` [PATCH v2 9/9] run-command API: remove "env" member, always use "env_array" Ævar Arnfjörð Bjarmason
2021-11-25 22:52 ` [PATCH v3 0/9] run-command API: get rid of "argv" and "env" Ævar Arnfjörð Bjarmason
2021-11-25 22:52 ` [PATCH v3 1/9] worktree: stop being overly intimate with run_command() internals Ævar Arnfjörð Bjarmason
2021-11-26 9:48 ` Eric Sunshine
2021-11-25 22:52 ` [PATCH v3 2/9] upload-archive: use regular "struct child_process" pattern Ævar Arnfjörð Bjarmason
2021-11-25 22:52 ` [PATCH v3 3/9] run-command API users: use strvec_pushv(), not argv assignment Ævar Arnfjörð Bjarmason
2021-11-25 22:52 ` [PATCH v3 4/9] run-command tests: " Ævar Arnfjörð Bjarmason
2021-11-25 22:52 ` [PATCH v3 5/9] run-command API users: use strvec_pushl(), not argv construction Ævar Arnfjörð Bjarmason
2021-11-25 22:52 ` [PATCH v3 6/9] run-command API users: use strvec_push(), " Ævar Arnfjörð Bjarmason
2021-11-25 22:52 ` [PATCH v3 7/9] run-command API: remove "argv" member, always use "args" Ævar Arnfjörð Bjarmason
2021-11-25 22:52 ` [PATCH v3 8/9] difftool: use "env_array" to simplify memory management Ævar Arnfjörð Bjarmason
2021-11-25 22:52 ` [PATCH v3 9/9] run-command API: remove "env" member, always use "env_array" Ævar Arnfjörð Bjarmason
2021-11-22 15:31 ` [PATCH v2] pager: fix crash when pager program doesn't exist Enzo Matsumiya
2021-11-22 16:22 ` Ævar Arnfjörð Bjarmason
2021-11-22 16:46 ` Enzo Matsumiya
2021-11-22 17:10 ` Ævar Arnfjörð Bjarmason
2021-11-22 17:41 ` Jeff King
2021-11-22 18:00 ` Junio C Hamano
2021-11-22 18:26 ` Jeff King
2021-11-22 17:55 ` Junio C Hamano
2021-11-22 18:19 ` Junio C Hamano
2021-11-22 18:37 ` Jeff King
2021-11-22 20:39 ` Junio C Hamano
2021-11-22 17:08 ` Junio C Hamano
2021-11-22 18:35 ` Ævar Arnfjörð Bjarmason
2021-11-22 16:30 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=211122.86wnl0xd24.gmgdl@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=ematsumiya@suse.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).