git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: John Cai <johncai86@gmail.com>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: "John Cai via GitGitGadget" <gitgitgadget@gmail.com>,
	"Git List" <git@vger.kernel.org>, "Taylor Blau" <me@ttaylorr.com>,
	"Phillip Wood" <phillip.wood123@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Eric Wong" <e@80x24.org>, "Bagas Sanjaya" <bagasdotme@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH 2/2] catfile.c: add --batch-command mode
Date: Fri, 04 Feb 2022 16:41:56 -0500	[thread overview]
Message-ID: <3F83E252-D99C-47CB-B47F-11B8EA554A4F@gmail.com> (raw)
In-Reply-To: <CAPig+cQHcstj28F-==oCakp3iMLHtd1SVQV+snzewYihHxNG6g@mail.gmail.com>

Hi Eric,

I appreciate the feedback and the thorough review!

On 4 Feb 2022, at 1:45, Eric Sunshine wrote:

> ()()On Thu, Feb 3, 2022 at 7:20 PM John Cai via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> Add new flag --batch-command that will accept commands and arguments
>> from stdin, similar to git-update-ref --stdin.
>>
>> contents <object> NL
>> info <object> NL
>>
>> With the following commands, process (A) can begin a "session" and
>> send a list of object names over stdin. When "get contents" or "get info"
>> is issued, this list of object names will be fed into batch_one_object()
>> to retrieve either info or contents. Finally an fflush() will be called
>> to end the session.
>>
>> begin
>> <sha1>
>> get info
>>
>> begin
>> <sha1>
>> get contents
>
> I had the same reaction to this new command set as Junio expressed
> upstream[1], and was prepared to suggest an alternative but Junio said
> everything I wanted to say, so I won't say anything more about it
> here.
>
> That aside, the implementation presented here is overly loose about
> malformed input and accepts all sorts of bogus invocations which it
> should be reporting as errors. For instance, a lone:
>
>     get info
>
> without a preceding `begin` should be an error but such bogus input is
> not diagnosed. `get contents` is likewise affected. Another example:
>
>     begin
>     <oid>
>     <EOF>
>
> which lacks a closing `get info` or `get contents` is silently
> ignored. Similarly, malformed:
>
>     begin
>     begin
>     ...
>
> should be reported as an error but is not.
>
> There is also a bug in which the accumulated list of OID's is never
> cleared. Thus:
>
>     begin
>     <oid>
>     get info
>     get info
>
> emits info about <oid> twice, once for each `get info` invocation. Similarly:
>
>     begin
>     <oid1>
>     get info
>     <oid2>
>     get info
>
> emits information about <oid1> twice and <oid2> once. Invoking `begin`
> between the `get info` commands doesn't help because `begin` is a
> no-op. Thus:
>
>     begin
>     <oid1>
>     get info
>     begin
>     <oid2>
>     get info
>
> likewise emits information about <oid1> twice and <oid2> once.
>
> It also incorrectly accepts non-"session" commands in the middle of a
> session. For instance:
>
>     begin
>     <oid1>
>     info <oid2>
>     <oid3>
>
> immediately emits information about <oid2> and then bombs out claiming
> that <oid3> is an "unknown command" because the `info` command --
> which should not have been allowed within a session -- prematurely
> ended the "session".
>
> The `info` and `contents` commands neglect to do any sort of
> validation of their arguments, thus any and all bogus invocations are
> accepted. Thus:
>
>     info
>     info <arg1> <arg2>
>     info <non-oid>
>
> are all accepted as valid invocations, misleadingly producing "<foo>
> missing" messages, rather than erroring out as they should. The "<oid>
> missing" message should be reserved for the case when the lone
> argument to `info` or `contents` is something which looks like a
> legitimate OID.

So actually the argument to info and contents doesn't have to be an OID but an object name that
eventually gets passed to get_oid_with_context() to resolve to an actual oid. This is the
same behavior as git cat-file --batch and --batch-check, neither of which throws an error. My
goal was to maintain this behavior in --batch-command.
>
> The above is a long-winded way of saying that it is important not only
> to check the obvious "expect success" cases when implementing a new
> feature, but it's also important to add checks for the "expect
> failure" cases, such as all the above malformed inputs.

I overall agree and will add test coverage for invalid input.

> It's subjective, but it feels like this implementation is trying to be
> too clever by handling all these cases via a single strbuf_getline()
> loop in batch_objects_command(). It would be easier to reason about,
> and would have avoided some of the above problems, for instance, if
> handling of `begin` was split out to its own function which looped
> over strbuf_getline() itself, thus could easily have detected
> premature EOF (lacking `get info` or `get contents`) and likewise
> would not have allowed `info` or `contents` commands to be executed
> within a "session".
>
> Similarly (again subjective), the generic command dispatcher seems
> over-engineered and perhaps contributed to the oversight of `info` and
> `contents` failing to perform validation of their arguments. A simpler
> hand-rolled command-response loop is more common on this project and
> often easier to reason about. Perhaps something like this
> (pseudo-code):
>
>     while (strbuf_getline(&input, stdin)) {
>         char *end_cmd = strchrnul(&input.buf, ' ');
>         const char *argstart = *end_cmd ? end_cmd + 1 : end_cmd;
>         *end_cmd = '\0';
>         if (strcmp(input.buf, "info"))
>             show_info(argstart);
>         else if (strcmp(input.buf, "contents"))
>             show_contents(argstart);
>         else if (strcmp(input.buf, "begin"))
>             begin_session(argstart);
>         else
>             die(...);
>     }

I think I agree that the code in this patch is a little hard to follow,
but now I'm planning to simplify the design by getting rid of "begin"[2].

I'm hoping this change will make the code easier to reason about.

2. https://lore.kernel.org/git/767d5f5a-8395-78bc-865f-a39acc39e061@gmail.com/

>
> and each of the command-handler functions would perform its own
> argument validation (including the case when no argument should be
> present).
>
> [1]: https://lore.kernel.org/git/xmqqo83nsoxs.fsf@gitster.g/
>
>> +static void parse_cmd_begin(struct batch_options *opt,
>> +                          const char *line,
>> +                          struct strbuf *output,
>> +                          struct expand_data *data,
>> +                          struct string_list revs)
>> +{
>> +       /* nothing needs to be done here */
>> +}
>
> Either this function should be clearing the list of collected OID's or...
>
>> +static void parse_cmd_get(struct batch_options *opt,
>> +                          const char *line,
>> +                          struct strbuf *output,
>> +                          struct expand_data *data,
>> +                          struct string_list revs)
>> +{
>> +       struct string_list_item *item;
>> +       for_each_string_list_item(item, &revs) {
>> +               batch_one_object(item->string, output, opt, data);
>> +       }
>> +       if (opt->buffer_output)
>> +               fflush(stdout);
>
> ... this function should do so after it's done processing the OID list.
>
>> +}
>> +static void parse_cmd_get_info(struct batch_options *opt,
>> +                          const char *line,
>> +                          struct strbuf *output,
>> +                          struct expand_data *data,
>> +                          struct string_list revs)
>
> Missing blank line before the function definition.
>
>> +static void parse_cmd_get_objects(struct batch_options *opt,
>> +                          const char *line,
>> +                          struct strbuf *output,
>> +                          struct expand_data *data,
>> +                          struct string_list revs)
>> +{
>> +       opt->print_contents = 1;
>> +       parse_cmd_get(opt, line, output, data, revs);
>> +       if (opt->buffer_output)
>> +               fflush(stdout);
>> +}
>
> Why does this function duplicate the flushing logic of parse_cmd_get()
> immediately after calling parse_cmd_get()?
>
>> +static void batch_objects_command(struct batch_options *opt,
>> +                                   struct strbuf *output,
>> +                                   struct expand_data *data)
>> +{
>> +       /* Read each line dispatch its command */
>
> Missing "and".
>
>> +       while (!strbuf_getline(&input, stdin)) {
>> +               if (state == BATCH_STATE_COMMAND) {
>> +                       if (*input.buf == '\n')
>> +                               die("empty command in input");
>
> This never triggers because strbuf_getline() removes the line
> terminator before returning.

yes, this was an oversight. thanks for catching it!

>
>> +                       else if (isspace(*input.buf))
>> +                               die("whitespace before command: %s", input.buf);
>> +               }
>> +
>> +               for (i = 0; i < ARRAY_SIZE(commands); i++) {
>> +                       if (!skip_prefix(input.buf, prefix, &cmd_end))
>> +                               continue;
>> +                       /*
>> +                        * If the command has arguments, verify that it's
>> +                        * followed by a space. Otherwise, it shall be followed
>> +                        * by a line terminator.
>> +                        */
>> +                       c = commands[i].takes_args ? ' ' : '\n';
>> +                       if (*cmd_end && *cmd_end != c)
>> +                               die("arguments invalid for command: %s", commands[i].prefix);
>
> Ditto regarding strbuf_getline() removing line-terminator.
>
>> +                       cmd = &commands[i];
>> +                       if (cmd->takes_args)
>> +                               p = cmd_end + 1;
>> +                       break;
>> +               }
>> +
>> +               if (input.buf[input.len - 1] == '\n')
>> +                       input.buf[--input.len] = '\0';
>
> Ditto again. This is especially scary since it's accessing element
> `input.len-1` without even checking if `input.len` is greater than
> zero.

I realize we actually don't need this if block at all because strubf_getline()
strips the line terminator for us already.

>
> Also, it's better to use published strbuf API rather than mucking with
> the internals:
>
>     strbuf_setlen(&input, input.len - 1);
>
>> +               if (state == BATCH_STATE_INPUT && !cmd){
>> +                       string_list_append(&revs, input.buf);
>> +                       continue;
>> +               }
>> +
>> +               if (!cmd)
>> +                       die("unknown command: %s", input.buf);
>> +
>> +               state = cmd->next_state;
>> +               cmd->fn(opt, p, output, data, revs);
>> +       }
>> +       strbuf_release(&input);
>> +       string_list_clear(&revs, 0);
>> +}

  reply	other threads:[~2022-02-04 21:42 UTC|newest]

Thread overview: 97+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-03 19:08 [PATCH 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-03 19:08 ` [PATCH 1/2] cat-file.c: rename cmdmode to mode John Cai via GitGitGadget
2022-02-03 19:28   ` Junio C Hamano
2022-02-04 12:10   ` Ævar Arnfjörð Bjarmason
2022-02-03 19:08 ` [PATCH 2/2] catfile.c: add --batch-command mode John Cai via GitGitGadget
2022-02-03 19:57   ` Junio C Hamano
2022-02-04  4:11     ` John Cai
2022-02-04 16:46       ` Phillip Wood
2022-02-04  6:45   ` Eric Sunshine
2022-02-04 21:41     ` John Cai [this message]
2022-02-05  6:52       ` Eric Sunshine
2022-02-04 12:11   ` Ævar Arnfjörð Bjarmason
2022-02-04 16:51     ` Phillip Wood
2022-02-07 16:33 ` [PATCH v2 0/2] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-07 16:33   ` [PATCH v2 1/2] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-07 23:58     ` Junio C Hamano
2022-02-07 16:33   ` [PATCH v2 2/2] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-07 23:34     ` Jonathan Tan
2022-02-08 11:00       ` Phillip Wood
2022-02-08 17:56         ` Jonathan Tan
2022-02-08 18:09           ` Junio C Hamano
2022-02-09  0:11             ` Jonathan Tan
2022-02-08  0:49     ` Junio C Hamano
2022-02-08 11:06     ` Phillip Wood
2022-02-08 20:58   ` [PATCH v3 0/3] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-08 20:58     ` [PATCH v3 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-08 20:58     ` [PATCH v3 2/3] cat-file: introduce batch_command enum to replace print_contents John Cai via GitGitGadget
2022-02-08 23:43       ` Junio C Hamano
2022-02-08 20:58     ` [PATCH v3 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-08 23:59       ` Junio C Hamano
2022-02-09 21:40     ` [PATCH v3 0/3] Add cat-file --batch-command flag Junio C Hamano
2022-02-09 22:22       ` John Cai
2022-02-09 23:10         ` John Cai
2022-02-10  4:01     ` [PATCH v4 " John Cai via GitGitGadget
2022-02-10  4:01       ` [PATCH v4 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-10  4:01       ` [PATCH v4 2/3] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-10 10:10         ` Christian Couder
2022-02-10  4:01       ` [PATCH v4 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-10 10:57         ` Phillip Wood
2022-02-10 17:05           ` Junio C Hamano
2022-02-11 17:45             ` John Cai
2022-02-11 20:07               ` Junio C Hamano
2022-02-11 21:30                 ` John Cai
2022-02-10 18:55           ` John Cai
2022-02-10 22:46         ` Eric Sunshine
2022-02-10 20:30       ` [PATCH v4 0/3] Add cat-file --batch-command flag Junio C Hamano
2022-02-11 20:01       ` [PATCH v5 " John Cai via GitGitGadget
2022-02-11 20:01         ` [PATCH v5 1/3] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-11 20:01         ` [PATCH v5 2/3] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-11 20:01         ` [PATCH v5 3/3] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-14 13:59           ` Phillip Wood
2022-02-14 16:19             ` John Cai
2022-02-14 18:23         ` [PATCH v6 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-14 18:23           ` [PATCH v6 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-15 19:39             ` Eric Sunshine
2022-02-15 22:58               ` John Cai
2022-02-15 23:20                 ` Eric Sunshine
2022-02-16  0:53           ` [PATCH v7 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-16  0:53             ` [PATCH v7 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-16  1:28               ` Junio C Hamano
2022-02-16  2:48                 ` John Cai
2022-02-16  3:00                   ` Junio C Hamano
2022-02-16  3:17                     ` Eric Sunshine
2022-02-16  3:01                   ` Eric Sunshine
2022-02-16 15:02             ` [PATCH v8 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-16 15:02               ` [PATCH v8 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-16 17:15                 ` Junio C Hamano
2022-02-16 17:25                   ` Eric Sunshine
2022-02-16 20:30                     ` John Cai
2022-02-16 20:59               ` [PATCH v9 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-16 20:59                 ` [PATCH v9 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-18 11:26                   ` Phillip Wood
2022-02-18 16:53                     ` John Cai
2022-02-18 17:32                       ` Junio C Hamano
2022-02-18 17:23                     ` Junio C Hamano
2022-02-18 18:23                 ` [PATCH v10 0/4] Add cat-file --batch-command flag John Cai via GitGitGadget
2022-02-18 18:23                   ` [PATCH v10 1/4] cat-file: rename cmdmode to transform_mode John Cai via GitGitGadget
2022-02-18 18:23                   ` [PATCH v10 2/4] cat-file: introduce batch_mode enum to replace print_contents John Cai via GitGitGadget
2022-02-18 18:23                   ` [PATCH v10 3/4] cat-file: add remove_timestamp helper John Cai via GitGitGadget
2022-02-19  6:33                     ` Ævar Arnfjörð Bjarmason
2022-02-22  3:31                       ` John Cai
2022-02-18 18:23                   ` [PATCH v10 4/4] cat-file: add --batch-command mode John Cai via GitGitGadget
2022-02-19  6:35                     ` Ævar Arnfjörð Bjarmason
2022-02-18 19:38                   ` [PATCH v10 0/4] Add cat-file --batch-command flag Junio C Hamano
2022-02-22 11:07                   ` Phillip Wood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3F83E252-D99C-47CB-B47F-11B8EA554A4F@gmail.com \
    --to=johncai86@gmail.com \
    --cc=avarab@gmail.com \
    --cc=bagasdotme@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=phillip.wood123@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).