git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Eric Sunshine <sunshine@sunshineco.com>
To: Phillip Wood <phillip.wood@dunelm.org.uk>
Cc: Junio C Hamano <gitster@pobox.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Git List <git@vger.kernel.org>
Subject: Re: [PATCH 3/3] sequencer: use read_author_script()
Date: Thu, 13 Sep 2018 20:31:58 -0400	[thread overview]
Message-ID: <CAPig+cS8ncvp8Se2Q24z9CFkiVey2zHP8i3oYVDPfGoP1kuDdA@mail.gmail.com> (raw)
In-Reply-To: <20180912101029.28052-4-phillip.wood@talktalk.net>

On Wed, Sep 12, 2018 at 6:11 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
> Use the new function to read the author script, updating
> read_env_script() and read_author_ident(). This means we now have a
> single code path that reads the author script and uses sq_dequote() to
> dequote it. This fixes potential problems with user edited scripts
> as read_env_script() which did not track quotes properly.
> [...]
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  /*
>   * Read a list of environment variable assignments (such as the author-script
>   * file) into an environment block. Returns -1 on error, 0 otherwise.
>   */

According to this comment, this function is capable of parsing a file
of arbitrary "NAME=Value" lines, and indeed the original code does
just that, but...

>  static int read_env_script(struct argv_array *env)
>  {
> +       char *name, *email, *date;
>
> -       if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
> +       if (read_author_script(rebase_path_author_script(),
> +                              &name, &email, &date, 0))

...the new implementation is able to handle only GIT_AUTHOR_NAME,
GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE, in exactly that order.

This seems like a pretty serious (and possibly buggy) change of
behavior, and makes the function much less useful (in general). Is it
true that it will only ever be used for files containing that limited
set of names? If so, the behavior change deserves mention in the
commit message, the function comment needs updating, and the function
itself probably ought to be renamed.

> +       strbuf_addstr(&script, "GIT_AUTHOR_NAME=");
> +       strbuf_addstr(&script, name);
> +       argv_array_push(env, script.buf);
> +       strbuf_reset(&script);
> +       strbuf_addstr(&script, "GIT_AUTHOR_EMAIL=");
> +       strbuf_addstr(&script, email);
> +       argv_array_push(env, script.buf);
> +       strbuf_reset(&script);
> +       strbuf_addstr(&script, "GIT_AUTHOR_DATE=");
> +       strbuf_addstr(&script, date);
> +       argv_array_push(env, script.buf);
> +       strbuf_release(&script);

Mentioned earlier[1], this can all collapse down to:

argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email);
argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);

However, it's unfortunate that this manual and hard-coded
reconstruction is needed at all. If you restructure the factoring of
this patch series, such ugliness can be avoided altogether. For
instance, the series could be structured like this:

1. Introduce a general-purpose function for reading a file containing
arbitrary "NAME=Value" lines (not carrying about specific key names or
their order) and returning them in some data structure (perhaps via
'string_list' as parse_key_value_squoted() in patch 2/3 does).

2. Build read_author_script() atop #1, making it expect and extract
GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE (possibly in
that order, or possibly not if we don't care).

3. Retrofit existing parsers to call one of those two functions (this
step may happen over several patches). So, for instance,
read_env_script() would call the generic parser from #1, whereas
sequencer.c:read_author_ident() would call the more specific parser
from #2.

More below...

> @@ -790,54 +771,25 @@ static char *get_author(const char *message)
>  /* Read author-script and return an ident line (author <email> timestamp) */
>  static const char *read_author_ident(struct strbuf *buf)
>  {
> -       const char *keys[] = {
> -               "GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
> -       };
> -       if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
> +       if (read_author_script(rebase_path_author_script(),
> +                              &name, &email, &date, 0))
>                 return NULL;
> -       /* dequote values and construct ident line in-place */
> -       for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
> -               if (!skip_prefix(in, keys[i], (const char **)&in)) {
> -                       warning(_("could not parse '%s' (looking for '%s')"),
> -                               rebase_path_author_script(), keys[i]);
> -                       return NULL;
> -               }
> -               if (!sq_dequote(in)) {
> -                       warning(_("bad quoting on %s value in '%s'"),
> -                               keys[i], rebase_path_author_script());
> -                       return NULL;
> -               }
> -       if (i < 3) {
> -               warning(_("could not parse '%s' (looking for '%s')"),
> -                       rebase_path_author_script(), keys[i]);
> -               return NULL;
> -       }

The parsing code being thrown away here does a better job of
diagnosing problems (thus helping the user figure out what went wrong)
than the new shared parser introduced by patch 2/3. The shared
function only ever reports a generic "unable to parse", whereas the
above code gets specific, saying that it was looking for a particular
key or that quoting was broken. I'd have expected the new shared
parser to encompass the best features of the existing parsers (such as
presenting better error messages).

>         /* validate date since fmt_ident() will die() on bad value */
> -       if (parse_date(val[2], &out)){
> +       if (parse_date(date, buf)){

Re-purposing the strbuf 'buf', which is passed into this function,
binds this function too tightly with its caller by assuming that the
caller will never need the original content of 'buf' anymore. Thus, it
would be better for this code continue using its own local strbuf
'out' rather than re-purposing the incoming 'buf'.

>                 warning(_("invalid date format '%s' in '%s'"),
> -                       val[2], rebase_path_author_script());
> -               strbuf_release(&out);
> +                       date, rebase_path_author_script());
> +               strbuf_release(buf);

Likewise, it's doubly odd to see this function releasing 'buf' which
it does not own.

>                 return NULL;
>         }

[1]: https://public-inbox.org/git/CAPig+cRvUr26GZyW6ecYhpwABueBqaEfZH1+JjLaqZo8+RTD6Q@mail.gmail.com/

  parent reply	other threads:[~2018-09-14  0:32 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-12 10:10 [PATCH 0/3] am/rebase: share read_author_script() Phillip Wood
2018-09-12 10:10 ` [PATCH 1/3] am: rename read_author_script() Phillip Wood
2018-09-12 10:10 ` [PATCH 2/3] add read_author_script() to libgit Phillip Wood
2018-09-13 23:49   ` Eric Sunshine
2018-10-10 10:14     ` Phillip Wood
2018-10-10 10:22       ` Eric Sunshine
2018-09-12 10:10 ` [PATCH 3/3] sequencer: use read_author_script() Phillip Wood
2018-09-12 12:14   ` Eric Sunshine
2018-09-14  0:31   ` Eric Sunshine [this message]
2018-10-10 10:35     ` Phillip Wood
2018-10-18 10:00 ` [PATCH v2 0/5] am/rebase: share read_author_script() Phillip Wood
2018-10-18 10:00   ` [PATCH v2 1/5] am: don't die in read_author_script() Phillip Wood
2018-10-25  8:44     ` Junio C Hamano
2018-10-18 10:00   ` [PATCH v2 2/5] am: improve author-script error reporting Phillip Wood
2018-10-25  8:55     ` Junio C Hamano
2018-10-18 10:00   ` [PATCH v2 3/5] am: rename read_author_script() Phillip Wood
2018-10-18 10:00   ` [PATCH v2 4/5] add read_author_script() to libgit Phillip Wood
2018-10-18 10:00   ` [PATCH v2 5/5] sequencer: use read_author_script() Phillip Wood
2018-10-25  8:59   ` [PATCH v2 0/5] am/rebase: share read_author_script() Junio C Hamano
2018-10-26  9:00     ` Phillip Wood
2018-10-26 13:32       ` Junio C Hamano
2018-10-30 10:39 ` [PATCH v3 " Phillip Wood
2018-10-30 10:39   ` [PATCH v3 1/5] am: don't die in read_author_script() Phillip Wood
2018-10-30 10:39   ` [PATCH v3 2/5] am: improve author-script error reporting Phillip Wood
2018-10-30 10:39   ` [PATCH v3 3/5] am: rename read_author_script() Phillip Wood
2018-10-30 10:39   ` [PATCH v3 4/5] add read_author_script() to libgit Phillip Wood
2018-10-30 10:39   ` [PATCH v3 5/5] sequencer: use read_author_script() Phillip Wood
2018-10-31  2:50   ` [PATCH v3 0/5] am/rebase: share read_author_script() Junio C Hamano
2018-10-31 10:14     ` Phillip Wood
2018-10-31 10:15 ` [PATCH v4 " Phillip Wood
2018-10-31 10:15   ` [PATCH v4 1/5] am: don't die in read_author_script() Phillip Wood
2018-10-31 10:15   ` [PATCH v4 2/5] am: improve author-script error reporting Phillip Wood
2018-11-04  4:12     ` Eric Sunshine
2018-11-05  1:53       ` Junio C Hamano
2018-10-31 10:15   ` [PATCH v4 3/5] am: rename read_author_script() Phillip Wood
2018-10-31 10:15   ` [PATCH v4 4/5] add read_author_script() to libgit Phillip Wood
2018-10-31 10:15   ` [PATCH v4 5/5] sequencer: use read_author_script() Phillip Wood
2018-11-01  3:01   ` [PATCH v4 0/5] am/rebase: share read_author_script() Junio C Hamano
2018-11-01  3:12     ` [PATCH v4fixed 2/5] am: improve author-script error reporting Junio C Hamano
2018-11-01  3:12       ` [PATCH v4fixed 4/5] add read_author_script() to libgit Junio C Hamano
2018-11-01 16:27     ` [PATCH v4 0/5] am/rebase: share read_author_script() Phillip Wood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPig+cS8ncvp8Se2Q24z9CFkiVey2zHP8i3oYVDPfGoP1kuDdA@mail.gmail.com \
    --to=sunshine@sunshineco.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).